How to get the gene length needed for FPKM/RPKM calculation (accounting for overlap between exons)
Copyright notice: this is an original article by the blogger; please cite the source when reposting.
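Here we follow the same principle as Cufflinks: the gene length is the total exon length, with bases shared by overlapping exons counted only once. The code is written in R and takes the gene annotation GTF file as input.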
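To make the idea concrete, here is a minimal base-R sketch of the "union of exons" length for a single gene. The coordinates are made up for illustration, and the merging is written by hand so that it runs without any packages; the actual workflow in this post relies on IRanges instead.

```r
# Toy exons of one gene (1-based, inclusive coordinates); the values are made up.
exon_start <- c(100, 150, 400)
exon_end   <- c(200, 300, 500)

# Sort by start, then open a new group whenever an exon starts
# after every earlier exon has already ended.
ord <- order(exon_start)
s <- exon_start[ord]
e <- exon_end[ord]
prev_max_end <- c(-Inf, head(cummax(e), -1))
group <- cumsum(s > prev_max_end)

# One merged interval per group.
merged_start <- tapply(s, group, min)
merged_end   <- tapply(e, group, max)

# Summing raw exons double-counts the overlap; the union counts each base once.
naive_len <- sum(exon_end - exon_start + 1)      # 101 + 151 + 101 = 353
union_len <- sum(merged_end - merged_start + 1)  # 201 + 101 = 302
```

The first two exons overlap on positions 150 to 200, so the naive sum overstates the length by 51 bases. Unlike this sketch, IRanges reduce() with default settings also fuses exons that are directly adjacent, but that does not change the summed length.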
Package installation
Dependencies: data.table, IRanges, rtracklayer
install.packages("data.table")
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("rtracklayer")
BiocManager::install("IRanges")
Code
library(data.table)
library("IRanges")
require("rtracklayer")
hg19 <- readGFF("f")
# readGFF returns a DataFrame, so convert it to a data.frame before setDT
anno <- setDT(as.data.frame(hg19))
anno <- anno[type=="exon",]
setnames(anno,c("seqid","start","end","gene_name","exon_number"),c("Chr","ExonStart","ExonEnd","Gene","Exon_number"))
# collect the unique exon records, then merge overlapping exons within each gene
Exon_region <- unique(anno[,.(Chr,ExonStart,ExonEnd,Exon_number,Gene)])
Exon_region <- Exon_region[,{x <- IRanges(ExonStart,ExonEnd);y <- reduce(x); list(ExonStart=y@start,ExonEnd=y@start+y@width-1)},by=.(Gene,Chr)]
Exon_region[,Exon_num:=1:.N,by=Gene]
Exon_region <- Exon_region[,.(Chr,ExonStart,ExonEnd,Exon_num,Gene)]
Exon_len <- Exon_region[,.(ExonLen = ExonEnd - ExonStart + 1),by=.(Exon_num,Gene)]
gene_len <- Exon_len[,.(Length = sum(ExonLen)),by=Gene]
# write out
fwrite(Exon_region,file="All_hg19gene_exon.bed", sep = "\t", col.names = TRUE)
fwrite(gene_len, file = "All_", sep = "\t", col.names = TRUE)
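As a hedged illustration of how the resulting lengths are typically used, the sketch below plugs a gene length table into the standard FPKM formula. The gene names, lengths, and fragment counts are hypothetical, and the library size is taken as the sum of the toy counts; in a real analysis it would be the total number of mapped fragments in the sample.

```r
# Hypothetical inputs: a gene length table shaped like gene_len above,
# plus made-up fragment counts per gene.
gene_len <- data.frame(Gene = c("geneA", "geneB"), Length = c(2000, 500))
counts   <- c(geneA = 400, geneB = 100)

# Library size (assumption: only these two genes were sequenced).
total_fragments <- sum(counts)

# Look up the length of each counted gene by name.
len <- gene_len$Length[match(names(counts), gene_len$Gene)]

# FPKM = fragments * 1e9 / (total mapped fragments * gene length in bp)
fpkm <- counts * 1e9 / (total_fragments * len)
```

geneA has four times the fragments of geneB but is also four times as long, so both end up with the same FPKM. This is the length normalization that the merged exon length is meant to support.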
Result files
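1. All_hg19gene_exon.bed: the merged, non-overlapping exon regions of each gene (columns Chr, ExonStart, ExonEnd, Exon_num, Gene).
2. The gene length table: one row per gene with columns Gene and Length, where Length is the summed length of the merged exons.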
References