CNV (Copy Number Variation) Analysis (GISTIC Online Analysis, maftools)

Updated: 2023-07-13 19:49:08

What is CNV (copy number variation) analysis? Here is the description from the TCGA website:
“The copy number variation (CNV) pipeline uses Affymetrix SNP 6.0 array data to identify genomic regions that are repeated and infer the copy number of the repeats. This pipeline is built onto the existing TCGA level 2 data generated by Birdsuite and uses the DNAcopy R-package to perform a circular binary segmentation (CBS) analysis. CBS translates noisy intensity measurements into chromosomal regions of equal copy number. The final output files are segmented into genomic regions with the estimated copy number for each region. The GDC further transforms the copy number values into segment mean values, which are equal to log2(copy-number/ 2). Diploid regions will have a segment mean of zero, amplified regions will have positive values, and deletions will have negative values.”
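The relationship between copy number and segment mean described in the quote can be checked with a few lines of Python. This is only a minimal sketch: the function names are illustrative and do not belong to any GDC or GISTIC tool.

```python
import math

def segment_mean(copy_number: float) -> float:
    """Segment mean as defined in the GDC description: log2(copy-number / 2)."""
    return math.log2(copy_number / 2)

def copy_number(seg_mean: float) -> float:
    """Inverse transform: recover the estimated copy number from a segment mean."""
    return 2 * 2 ** seg_mean

# One-copy loss, diploid, one-copy gain, two-copy gain
for cn in (1, 2, 3, 4):
    print(cn, round(segment_mean(cn), 3))
# prints:
# 1 -1.0
# 2 0.0
# 3 0.585
# 4 1.0
```

A diploid region (two copies) maps to zero, which is why amplifications appear as positive values and deletions as negative values in the segment file.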
1. Downloading and processing the segment file
1.1 Downloading the data from TCGA
File types available for download:
Copy Number Segment: A table that associates contiguous chromosomal segments with genomic coordinates, mean array intensity, and the number of probes that bind to each segment.
Masked Copy Number Segment: A table with the same information as the Copy Number Segment except that segments with probes known to contain germline mutations are removed.
Here I use Masked Copy Number Segment as the example.
rm(list = ls())
options(stringsAsFactors = FALSE)
options(scipen = 200)
library(SummarizedExperiment)
library(TCGAbiolinks)
# Query the Masked Copy Number Segment data for TCGA-BLCA
query <- GDCquery(project = "TCGA-BLCA",
                  data.category = "Copy Number Variation",
                  data.type = "Masked Copy Number Segment")
GDCdownload(query, method = "api")
# Build the segment table and save it as an .rda file
BLCA_CNV_download <- GDCprepare(query = query, save = TRUE, save.filename = "BLCA_CNV_download.rda")
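1.2 Data processing
# Load the .rda file; load() returns the name of the restored object
A = load("C:/Users/Meredith/Desktop/BLCA_CNV_download.rda")
tumorCNV <- eval(parse(text = A))
# Keep the six columns needed for the segment file and put them in the standard order
tumorCNV = tumorCNV[,2:7]
tumorCNV = tumorCNV[,c('Sample','Chromosome','Start','End','Num_Probes','Segment_Mean')]
write.table(tumorCNV, file = '', sep = '\t', quote = F, row.names = F)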
# Extract the samples ending in 01A (I used Python here; you can do the same in R)
filename = ''
finalResultName = ''
# Open the output in "w" mode so that it is created if it does not exist yet
with open(filename) as read_file, open(finalResultName, "w") as out_file:
    for line in read_file:
        data = line.split()
        # Characters 13-15 of the TCGA barcode hold the sample type and vial, e.g. 01A
        if data and data[0][13:16] == '01A':
            # Write the six segment-file columns, tab-separated
            out_file.write('\t'.join(data[:6]) + '\n')
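The slice [13:16] works because of the fixed layout of TCGA barcodes: the fourth hyphen-separated field holds a two-digit sample type (01 is a primary solid tumor) followed by a vial letter. As a minimal sketch, the same test can be written by splitting on hyphens, which makes the intent easier to read. The helper names and the barcodes below are made up for illustration.

```python
def sample_code(barcode: str) -> str:
    """Return the sample-type code plus vial letter of a TCGA barcode, e.g. '01A'."""
    fields = barcode.split("-")
    # Field 4 (index 3) holds the two-digit sample type and the vial letter
    return fields[3] if len(fields) > 3 else ""

def is_primary_tumor_vial_a(barcode: str) -> bool:
    """True for barcodes whose sample field is exactly '01A'."""
    return sample_code(barcode) == "01A"

barcodes = [
    "TCGA-XX-0001-01A-11D-0000-01",  # made-up barcode: primary tumor, vial A
    "TCGA-XX-0001-10A-01D-0000-01",  # made-up barcode: blood-derived normal
    "Sample",                        # header cell of the segment table
]
print([b for b in barcodes if is_primary_tumor_vial_a(b)])
```

Like the slice-based filter, this version also drops the header row, because "Sample" has no fourth field.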
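2. Downloading and processing the marker file
2.1 Downloading the data from TCGA
TCGA currently uses hg38 as its reference genome, so the marker file has to be downloaded from the official website. On the download page, choose the latest version of the “SNP6 GRCh38 Remapped Probeset File for Copy Number Variation Analysis” file, and note the instruction: “If you are using Masked Copy Number Segment for GISTIC analysis, please only keep probesets with freqcnv =FALSE”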
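2.2 Extracting the freqcnv=FALSE rows and converting them to the standard format
# I used Python here as well (my computer is slow and this takes a long time in R)
filename = "ap."
finalResultName = ""
# Open the output in "w" mode so that it is created if it does not exist yet
with open(filename) as read_file, open(finalResultName, "w") as out_file:
    for line in read_file:
        data = line.split()
        # Column 6 (index 5) is the freqcnv flag; keep only the FALSE rows
        if len(data) > 5 and data[5] == 'FALSE':
            # Write the first three columns, tab-separated
            out_file.write('\t'.join(data[:3]) + '\n')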
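The filter above depends on freqcnv being the sixth column. If the probeset file has a header row, the same step can be keyed on column names instead, as in the sketch below. The column names (probeid, chr, pos, freqcnv) and the two demo rows are assumptions made for illustration; check them against the header of the file you actually downloaded.

```python
import csv
import io

def filter_markers(lines, flag_column="freqcnv"):
    """Yield (probe id, chromosome, position) for rows whose flag is FALSE.

    The column names used here are assumed, not confirmed by the source.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        if row[flag_column] == "FALSE":
            yield row["probeid"], row["chr"], row["pos"]

# Tiny made-up table standing in for the real probeset file
demo = io.StringIO(
    "probeid\tchr\tpos\tstrand\ttype\tfreqcnv\tpar\n"
    "CN_000001\t1\t61736\t+\tCN\tFALSE\tFALSE\n"
    "CN_000002\t1\t61809\t+\tCN\tTRUE\tFALSE\n"
)
for marker in filter_markers(demo):
    print("\t".join(marker))
```

To run it on a real file, pass an open file handle instead of the StringIO object and write each joined row to the output file.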
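3. GenePattern GISTIC_2.0 online analysis
Choose the refgene file that matches your data; my data was downloaded from TCGA, so I chose hg38. Drag the segment file and the marker file into the seg file and markers file fields respectively. The confidence level defaults to 0.9 and can be adjusted as needed. Click RUN.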
The run takes about half an hour.
Let's look at two of the output files.
