GWAS, SNPs, and disease
Three ways to obtain SNP information
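Studies have reported that rs7574865 in STAT4 and rs9275319 in HLA-DQ are genetic susceptibility loci for hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC) in human populations.
In other words, variation at these two sites is associated with a higher risk of HBV-related HCC; an association by itself does not show that the variants are the cause. The two rsIDs are dbSNP identifiers for the variant sites (after variants are called, annotation tools such as VEP or SnpEff can attach them to each variant). Given an rsID, you can therefore find the position of the site in the genome, and dbSNP can be used to look up the genomic coordinates of an rsID.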
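To make the rsID-to-coordinate lookup concrete, here is a minimal sketch in Python that scans VCF-formatted text (the format dbSNP distributes) for a set of rsIDs. The function name `find_rsids` and the demo records are assumptions made for illustration; the positions in the demo are invented and are not the real coordinates of these SNPs.

```python
import io


def find_rsids(vcf_lines, wanted):
    """Return {rsID: (chrom, pos)} for the wanted IDs found in VCF-formatted lines."""
    wanted = set(wanted)
    found = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a data record
        chrom, pos, ids = fields[0], fields[1], fields[2]
        # The ID column may hold several identifiers separated by ';'
        for rsid in ids.split(";"):
            if rsid in wanted:
                found[rsid] = (chrom, int(pos))
    return found


# Hypothetical records for illustration only; positions are NOT real coordinates.
demo_vcf = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "2\t1000\trs7574865\tT\tG\t.\t.\t.\n"
    "6\t2000\trs9275319\tA\tG\t.\t.\t.\n"
    "7\t3000\trs0000001\tC\tT\t.\t.\t.\n"
)
hits = find_rsids(demo_vcf, ["rs7574865", "rs9275319"])
print(hits)
```

For a real dbSNP file, the same function could be fed an open (possibly gzip-decompressed) file handle instead of the in-memory demo; for a file of that size an indexed lookup would be much faster than a linear scan.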
Method 1:
Download the All_ file (a very large file):
mkdir -p ~/annotation/variation/human/dbSNP
cd ~/annotation/variation/human/dbSNP
## ftp://v/snp/organisms/human_9606_b147_GRCh38p2/
## ftp://v/snp/organisms/human_9606_b147_GRCh37p13/
nohup wget ftp://v/snp/organisms/human_9606_b147_GRCh37p13/VCF/All_ &
wget ftp://v/snp/organisms/human_9606_b147_GRCh37p13/VCF/All_bi
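Running this produced an error: No such directory ‘snp/organisms/human_9606_b147_GRCh37p13/VCF’.
Method 2:
You can also use the web version of the database and edit the URL directly (suited to a small number of lookups):
Method 3:
SNPedia, again by editing the URL directly (advantage: it collects links to many other databases)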
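Extension: how to run a GWAS analysis
Method 1:
Analysis with plink
SNP filtering and GWAS with plink
GWAS analysis with plink
Method 2:
Analysis with R packages (drawing Manhattan plots)
Postgwas: Advanced GWAS Interpretation in R
How to call SNPs and indels
How to filter SNPs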
Missing rates
GENO > 0.05
Shortly we will apply more stringent criteria, such that GENO > 0.05. In this case, 0.05*89 = 4.45 samples, meaning that if a SNP is missing in 4.45 or more samples, that SNP will be removed from the dataset.
Here 89 is the total number of samples, and 89 x GENO gives a threshold of 4.45. A SNP whose genotype call is missing in 4 or fewer samples is kept; a SNP missing in 5 or more samples is removed.
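The threshold arithmetic above can be checked with a short Python sketch. The helper names `max_missing_allowed` and `keep_snp` are made up for this illustration and are not part of plink.

```python
import math


def max_missing_allowed(n_samples, geno):
    """Largest number of missing genotype calls a SNP may have and still be kept."""
    return math.floor(n_samples * geno)


def keep_snp(n_missing, n_samples, geno=0.05):
    """Keep the SNP when its missing rate does not exceed the GENO threshold."""
    return n_missing / n_samples <= geno


n_samples = 89
allowed = max_missing_allowed(n_samples, 0.05)  # floor(4.45)
print("missing calls allowed:", allowed)
print("missing in 4 samples, keep?", keep_snp(4, n_samples))
print("missing in 5 samples, keep?", keep_snp(5, n_samples))
```

Because a SNP can only be missing in a whole number of samples, the fractional threshold 4.45 behaves like "at most 4 missing".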
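Minor allele frequencies
Tip: remove SNPs with MAF < 0.03; if there are many SNPs, the cutoff can be raised to MAF < 0.05.
MAF is the Minor Allele Frequency. It can be used to exclude SNPs which are not informative because they show little variation in the sample set being analyzed. For instance, if a SNP shows variation in only 1 of the 89 individuals, it is not useful statistically and should be removed.
In other words, a SNP whose minor allele appears in only a few samples should be removed. Strictly, MAF is a frequency over alleles rather than samples, so for diploid genotypes the cutoff corresponds to a minor allele count below MAF x 2 x (total number of samples).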
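As a worked example of the MAF cutoff, the sketch below computes MAF from genotype counts for a diploid sample. The function name `minor_allele_frequency` and the genotype counts are assumptions chosen to mirror the "1 of 89 individuals" case in the text.

```python
def minor_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """MAF from diploid genotype counts: each individual carries two alleles."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    alt_freq = (n_het + 2 * n_hom_alt) / total_alleles
    # The minor allele is whichever of the two alleles is rarer
    return min(alt_freq, 1.0 - alt_freq)


# Assumed example: 89 individuals, exactly one heterozygous carrier
maf = minor_allele_frequency(n_hom_ref=88, n_het=1, n_hom_alt=0)
print(f"MAF = {maf:.4f}")           # 1 minor allele out of 178
print("removed at MAF < 0.03?", maf < 0.03)
```

With a single heterozygous carrier the minor allele count is 1 out of 178 alleles, far below either cutoff, so such a SNP would be filtered out.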
Removing SNPs out of Hardy-Weinberg equilibrium (p-value < 10^-6 to 10^-4)
Population genetic theory suggests that under ‘normal’ conditions, there is a predictable relationship between allele frequencies and genotype frequencies. In cases where the genotype distribution is different from what one would expect based on the allele frequencies, one potential explanation for this is genotyping error. Natural selection is another explanation. For this reason, we typically check for deviation from Hardy-Weinberg equilibrium in the controls for a case-control study. For a quantitative trait, PLINK just uses everyone. PLINK can report a p-value for deviation from HWE for each SNP (its --hardy option). Low p-values indicate that a SNP is out of HWE.
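To illustrate what "deviation from HWE" means numerically, here is a sketch of the textbook chi-square goodness-of-fit test with one degree of freedom. This is a simplification chosen for clarity: PLINK itself uses an exact test, so its p-values will differ, especially for rare alleles. The function name `hwe_chi_square` and the genotype counts are assumptions.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) of observed genotype counts against HWE expectations."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts that match HWE exactly (p = q = 0.5)
chi2_ok, p_ok = hwe_chi_square(25, 50, 25)
# Counts with no heterozygotes at all: a strong deviation
chi2_bad, p_bad = hwe_chi_square(50, 0, 50)
print(chi2_ok, p_ok)
print(chi2_bad, p_bad, "fails a 1e-4 threshold:", p_bad < 1e-4)
```

A SNP like the second one would be removed under a threshold such as 10^-4, while the first would be kept.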
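Filtering SNPs starting from a VCF file
Use vcftools to convert the VCF into plink input format (this writes .ped and .map files), then use those files as input for filtering; the plink step writes the filtered data as binary .bed files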
vcftools --vcf my.vcf --plink --out plink
plink --noweb --file plink --geno 0.05 --maf 0.05 --hwe 0.0001 --make-bed --out QC
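If you do not yet know what GWAS and SNPs are, here are the definitions:
Genome-wide association studies (GWAS) use sequence variation present across the whole human genome, namely single nucleotide polymorphisms (SNPs), and screen it for SNPs associated with disease.
Which diseases are related to SNPs?
In recent years, genome-wide association studies (GWAS), using large cohorts and high-density SNP (single nucleotide polymorphism) markers, have mapped thousands of SNP loci associated with complex diseases, and these association signals have been highly reproducible across repeated studies. Examples include common human conditions such as obesity, diabetes, and schizophrenia.
What are the sources of error for SNPs?
Because of sampling error from random sampling (unavoidable in practice) and the complex linkage disequilibrium (LD) among SNPs, the SNP loci that GWAS identifies are usually not the causal sites themselves.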
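One way to see why LD blurs the signal is to measure how strongly two SNPs are correlated across samples. The sketch below computes r^2 as the squared Pearson correlation of genotype dosages (0, 1, or 2 copies of an allele). This genotype-based r^2 is a common approximation to haplotype-based LD, not the same quantity, and the function name `ld_r2` and the dosage vectors are invented for illustration.

```python
def ld_r2(dosages_a, dosages_b):
    """Squared Pearson correlation between two SNPs' genotype dosages (0/1/2)."""
    n = len(dosages_a)
    if n == 0 or n != len(dosages_b):
        raise ValueError("dosage vectors must be non-empty and of equal length")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return cov * cov / (var_a * var_b)


# Hypothetical dosages for six samples
causal = [0, 1, 2, 0, 1, 2]
tag_in_ld = [0, 1, 2, 0, 1, 2]   # always inherited together with the causal SNP
unlinked = [1, 0, 1, 1, 0, 1]    # no linear relationship with the causal SNP

r2_tag = ld_r2(causal, tag_in_ld)
r2_unlinked = ld_r2(causal, unlinked)
print(r2_tag, r2_unlinked)
```

When r^2 is close to 1, an association test cannot tell the two SNPs apart, so the SNP reported by a GWAS may simply be a marker sitting near the true causal variant.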
A 2016 paper in PLOS ONE describes SNPs and osteoarthritis.
It is not a top-tier journal, but the paper is of good quality.
Functional Characterization of the Osteoarthritis Susceptibility Mapping to CHST11—A Bioinformatics and Molecular Study
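As the title says, the study concerns osteoarthritis, and the target gene is CHST11. Carbohydrate sulfotransferase 11 is an enzyme that in humans is encoded by the CHST11 gene. The gene is located at chr12:104,455,295-104,762,014 (GRCh38). For functional studies of CHST11, the Sanger Institute in Cambridge, UK, has generated a knockout mouse for this gene, Chst11^tm1a(KOMP)Wtsi. The gene is mainly related to bone and cartilage phenotypes. Phenotyping of the mouse found an abnormality: Homozygous viability at P14.
A 2012 paper in The Lancet, a journal that needs no introduction, also reported that variation in this gene is associated with osteoarthritis.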
Identification of new susceptibility loci for osteoarthritis (arcOGEN): a genome-wide association
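Next, let us look at these two papers in turn, together with the gene, its SNPs, and the research on and discussion of their function.
(I) Background on osteoarthritis:
What is OA?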
(1) Osteoarthritis (OA) is a common disease of older individuals that is characterized by the focal loss of articular cartilage. This loss usually occurs gradually over many years and typically results in chronic pain and severely impaired joint function by the sixth or seventh decade of life.
(2) Osteoarthritis is the most common form of arthritis worldwide and is a major cause of pain and disability in elderly people.
Genetic features of OA
(1) OA is polygenic and, unlike many other common arthritic diseases, there are no OA risk-conferring loci of large singular impact.
(2) It is a complex disease of the musculoskeletal system with both genetic and environmental risk factors. From the results of heritability studies in twins, sibling pairs, and families, genetic factors are estimated to account for about 50% of the risk of developing osteoarthritis in the hip or knee, although precise estimates vary according to sex, affected site, and severity of disease.
(II) Research methods:
(1) Emphasis on functional analysis
Identification of SNPs in LD with rs835487
Identification of Sequences Homologous to the Enhancer in Non-Human Mammals
Cloning of pGL3-Promoter Luciferase Reporter Plasmids
Transfection of Cell Lines
Electrophoretic Mobility Shift Assays (EMSAs)
Ethics Statement, Cartilage Collection and Nucleic Acid Extraction
Gene Expression, Genotyping and AEI Analysis
Chondrogenic Differentiation of MSCs
(2) Emphasis on analysis (GWAS)
We undertook a large genome-wide association study (GWAS) in 7,410 unrelated and retrospectively and prospectively selected patients with severe osteoarthritis in the arcOGEN study, 80% of whom had undergone total joint replacement, and 11,009 unrelated controls from the UK. We replicated the most promising signals in an independent set of up to 7,473 cases and 42,938 controls, from studies in Iceland, Estonia, the Netherlands, and the UK. All patients and controls were of European descent.
(III) Conclusions
(1)rs835487 (allele G; THR) located within intron two of CHST11 is associated with hip OA