比较差异_比较不同流程(limmavoom,edgeR,DESeq2)差异分析的区别

更新时间:2023-07-07 06:10:34 阅读: 评论:0

⽐较差异_⽐较不同流程(limmavoom,edgeR,DESeq2)
差异分析的区别
下⾯是9⽉学员“冰吉林”的投稿
前⾔:
距离第⼀次听说⽣信已经⼗⼏年了,现在是邋遢⼤叔重新开始学代码,精⼒确实已不像从前,各位⼊坑还是要乘早。后来约莫在5年前,课题组当时有个RNA-Seq数据,lab meeting时听瑞典⼩哥在汇报DEGs筛选,当时感觉好是神奇。其实陆陆续续也有过学习的念头,但在对⾃⼰的各种纵容下,想法⼜逐渐隐没。直到2⽉前,机缘巧合参加了⽣信技能树培训,才进⼀步强化了⾃⼰学习⽣信技术的信念。
⼏天前,曾⽼师在群⾥给我布置了⼀份学徒作业,⽐较不同流程(limma/voom,edgeR,DESeq2  )差异分析的区别,拟使⽤的数据集是TCGA-BRCA的counts值矩阵。作为⾮肿瘤⼝的⽣信新⼈,秉着⽆知者⽆畏的态度试了⼀试。以下是具体过程。
代码主要来源于⼩洁⽼师(不是我吹,听了⼩洁⽼师的课,傻⼦也能学会R代码)。
R包安装
# R包太多,这⾥略了。
for (pkg in cran_packages){
if (! require(ly=T) ) {
install.packages(pkg,ask = F,update = F)
require(ly=T)
}
}
# first prepare BioManager on CRAN
if(!require("BiocManager")) install.packages("BiocManager",update = F,ask = F)
# u BiocManager to install
for (pkg in Biocductor_packages){
if (! require(ly=T) ) {
开题报告课题来源BiocManager::install(pkg,ask = F,update = F)
require(ly=T)
}
}
#前⾯的报错都先不要管。主要看这⾥
for (pkg in c(Biocductor_packages,cran_packages)){
require(ly=T)
}
#哪个报错,就回去安装哪个。如果你没有安装xx包,却提⽰你xx包不存在,这也正常,是因为复杂的依赖关系,缺啥补啥。
if(!require(tinyarray))devtools::install_local("tinyarray-master.zip",upgrade = F)
数据下载
1.从⽹页选择数据,下载manifest⽂件
在Repository勾选⾃⼰需要的ca和file类型。
2.使⽤gdc-client⼯具下载
#step1
options(stringsAsFactors = F)
library(stringr)
cancer_type="TCGA-BRCA"
if(!ists("clinical"))ate("clinical")
if(!ists("expdata"))ate("expdata")
dir()
#此代码块是在Terminal中使⽤
conda create -n gdc python=3.7
source activate gdc
git /NCI-GDC/gdc-client
cd gdc-client
pip install -
python tup.py install 2>&1 | tee -a install.log
###安装完毕后开始下载数据
gdc-client download -m gdc_manifest. -d clinical
gdc-client download -m gdc_manifest. -d expdata
#下载结束后
length(dir("./clinical/"))
length(dir("./expdata/"))
表达矩阵获得
1.整理临床信息
library(XML)
result "./clinical/00049989-fa21-48fb-8dda-710c0dd5932e/nationwidechildrens_l")  #单个运⾏rootnode rootsize print(rootnode[1])
#print(rootnode[2])
xmldataframe 2])
head(t(xmlToDataFrame(rootnode[2])))
xmls = dir("clinical/",pattern = "*.xml$",recursive = T)  #可层级
td = function(x){
result "clinical/",x))
rootnode  xmldataframe 2])
return(t(xmldataframe))
}
cl = lapply(xmls,td)
cl_df cl_df[1:3,1:3]
clinical = data.frame(cl_df)
kux格式用什么播放器clinical[1:4,1:4]
2.整理表达矩阵
批量读取所有的⽂件。
count_files = dir("expdata/",pattern = "*.$",recursive = T)奶油芝士怎么吃
十一月份什么星座
ex = function(x){
result "expdata/",x),row.names = 1,p = "\t")
return(result)
}
#  head(ex("03aee74e-4e37-4a58-a720-c90e807d2f40/"))
处理近义词exp = lapply(count_files,ex)
exp dim(exp)
exp[1:4,1:4]
下载cart-json⽂件已将⽂件名与样本ID⼀⼀对应。
meta "metadata.cart.2020-12-17.json")
colnames(meta)
ids ids[[1]]
class(ids[[1]][,1])
ID = sapply(ids,function(x){x[,2]})
file2id = data.frame(file_name = meta$file_name,
ID = ID)
head(file2id$file_name)
head(count_files)
count_files2 = stringr::str_split(count_files,"/",simplify = T)[,2]
count_files2[1] %in% file2id$file_name
file2id = file2id[match(count_files2,file2id$file_name),]
colnames(exp) = file2id$ID
exp[1:4,1:4]
3.过滤表达矩阵
dim(exp)
exp = exp[apply(exp, 1, function(x) sum(x > 1) > 9), ]
dim(exp)
exp[1:4,1:4]
# 过滤在⾄少在75%的样本中都有表达的基因
keep_exp 0) >= floor(0.75*ncol(exp))
table(keep_exp)
exp exp[1:4,1:4]
dim(exp)
4.添加分组信息
根据样本ID的第14-15位,给样本分组(tumor和normal)
table(str_sub(colnames(exp),14,15))
group_list = ifel(as.numeric(str_sub(colnames(exp),14,15)) 10,'tumor','normal') group_list = factor(group_list,levels = c("normal","tumor"))一个人的爱
table(group_list)
save(exp,clinical,group_list,cancer_type,file = paste0(cancer_type,"gdc.Rdata")) DEGs分析及对⽐
#三⼤R包差异分析
if(!require(stringr))install.packages('stringr')
if(!require(ggplotify))install.packages("ggplotify")
if(!require(patchwork))install.packages("patchwork")
if(!require(cowplot))install.packages("cowplot")
if(!require(DESeq2))BiocManager::install('DESeq2')
if(!require(edgeR))BiocManager::install('edgeR')
if(!require(limma))BiocManager::install('limma')
rm(list = ls())
load("TCGA-BRCAgdc.Rdata")
table(group_list)
#deq2----
library(DESeq2)
colData                      condition=group_list)
if(!ists(paste0(cancer_type,"dd.Rdata"))){
dds    countData = exp,
小孩烫伤
dds    countData = exp,
colData = colData,
design = ~ condition)
dds  save(dds,file = paste0(cancer_type,"dd.Rdata"))
}
load(paste0(cancer_type,"dd.Rdata"))
# 两两⽐较
res "condition",rev(levels(group_list))))
resOrdered # 按照P值排序
DEG head(DEG)
#添加change列标记基因上调下调
logFC_cutoff 2*sd(abs(log2FoldChange)) )
#logFC_cutoff
k1 = (DEG$pvalue 0.05)&(DEG$log2FoldChange k2 = (DEG$pvalue 0.05)&(DEG$log2FoldChange > logFC_cutoff) DEG$change = ifel(k1,"DOWN",ifel(k2,"UP","NOT"))
table(DEG$change)
head(DEG)
DESeq2_DEG
#edgeR----
library(edgeR)
dge dge$samples$lib.size dge
design 0+group_list)
rownames(design)colnames(design)
dge dge dge
fit fit2 1,1))
戏曲知识DEG=topTags(fit2, n=nrow(exp))
DEG=as.data.frame(DEG)
logFC_cutoff 2*sd(abs(logFC)) )
#logFC_cutoff
k1 = (DEG$PValue 0.05)&(DEG$logFC k2 = (DEG$PValue 0.05)&(DEG$logFC > logFC_cutoff)
DEG$change = ifel(k1,"DOWN",ifel(k2,"UP","NOT"))
head(DEG)
table(DEG$change)
edgeR_DEG
###limma----
library(limma)
design 0+group_list)
colnames(design)=levels(group_list)
rownames(design)=colnames(exp)
dge dge
v "quantile")
fit
constrasts = paste(rev(levels(group_list)),collap = "-")
cont.matrix fit2=contrasts.fit(fit,cont.matrix)
fit2=eBayes(fit2)
DEG = topTable(fit2, coef=constrasts, n=Inf)
DEG = na.omit(DEG)
logFC_cutoff 2*sd(abs(logFC)) )
#logFC_cutoff
k1 = (DEG$P.Value 0.05)&(DEG$logFC k2 = (DEG$P.Value 0.05)&(DEG$logFC > logFC_cutoff)
DEG$change = ifel(k1,"DOWN",ifel(k2,"UP","NOT"))
table(DEG$change)
head(DEG)
limma_voom_DEG

本文发布于:2023-07-07 06:10:34,感谢您对本站的认可!

本文链接:https://www.wtabcd.cn/fanwen/fan/89/1071269.html

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。

标签:学习   下载   矩阵   表达   数据   安装   开始
相关文章
留言与评论(共有 0 条评论)
   
验证码:
推荐文章
排行榜
Copyright ©2019-2022 Comsenz Inc.Powered by © 专利检索| 网站地图