keg

更新时间:2023-01-04 16:14:26 阅读: 评论:0


2023年1月4日发(作者:初中英语不好怎么办)

⽆参转录组GO、KEGG富集分析——

diamond+idmapping+GOstats

准备

(1)⽤⽆参转录组分析软件得到unigenefasta⽂件,命名为my_,格式如下表所⽰:

>MSTRG.5.1gene=MSTRG.5

TGATGTCATCGATCCGTGACGTTTAGTATTCAACCAATAGGAATCAACCACGTAGGATTGGCGATCCTCG

TCAAACGTGTAAACGGTAGATCTGAACCGTTGACTGGCTGAGAGAAAACACATATTGTGGTATTTTAGTC

GGTACGATTAAGAAACACGAGAAACACACCGATGAGCGATAACAACCGTAGGATCGTCAACTTGCTTTTG

ACATCGTTGCCTATAAATTAAATATCAACAGCCCTACACGTGAGATTACTTTCATTATCTTCCCTTTTCG

AGATCGGAGCACTAATTTGAATTCAGAGAAGCGAAAGCGACGCAGAGAAGATGGCTCGTACCAAGCAGAC

CGCTCGTAAGTCTACCGGAGGAAAGGCCCCGAGGAAGCAGCTTGCTACTAAGGTATGGACTCTCTCTCTC

TCTCTCTCTCTCTCTCTCTCTCTC

>MSTRG.6.1gene=MSTRG.6

GTAGAAATATAATGGGCTTAAAGATAAGGCCCATTAATTACAGAATCCTACGGGCACGTTACGTGCCGGT

TTAGTTTATTTTGCATTAAAAACCAATTTCGGAGTCGTCATCTTCCTCTTCGCCCAGATTCACTTCCTTC

GAGAAGATTCGCATCATTTTCCCAGTTATGGATTGGCAAGGACAGAAACTAGCGGAGCAGCTGATGCAGA

TCCTGCTCTTGATCGCCGCCGTTGTGGCGTTCGTCGTTGGTTACACGACGGCGTCGTTTAGGACGATGAT

GTTGATTTACGCGGGAGGGGTTGGTGTCACGACGTTGATCACGGTGCCGAACTGGCCATTCTTTAACCGT

CATCCACTCAAGTGGTTGGAACCAAGTGAAGCGGAGAAGCATCCTAAACCGGAAGTCGTTGTTAGCTCGA

AGAAGAAGTCATCTAAAAAGTAGCAAATCGTCTTTTGTAATCTCTCTTTTTTTTCTAGACCTTTGATATA

AAAAAAAAAGACATGTTCTGGATTTTGCTTATGAATAAGATAGTCTAAATGACATAATAATTTCGATTGA

TTCTGAGACATCCTTGCTTAATTGTTATGTA

>BnaA01g00010D.1gene=BnaA01g00010D

TTTAGTTTATTTTGCATTAAAAACCAATTTCGGAGTCGTCATCTTCCTCTTCGCCCAGATTCACTTCCTT

CGAGAAGATTCGCATCATTTTCCCAGGTATAGATTCTCTGACGAGATCTGATTCTAGTTTGTTGCTTATT

GTTCTTGTAGATTCGAATCCGGCGATTATCAATTGCATTTCGTGCTGGATTCAATTCGAAAGATCCGATC

TAATCGTTTTGGTTGGTGTTGATTCAGTTATGGATTGGCAAGGACAGAAACTAGCGGAGCAGCTGATGCA

GATCCTGCTCTTGATCGCCGCCGTTGTGGCGTTCGTCGTTGGTTACACGACGGCGTCGTTTAGGACGATG

ATGTTGATTTACGCGGGAGGGGTTGGTGTCACGACGTTGATCACGGTGCCGAACTGGCCATTCTTTAACC

GTCATCCACTCAAGTGGTTGGAACCAAGTGAAGCGGAGAAGCATCCTAAACCGGAAGTCGTTGTTAGCTC

GAAGAAGAAGTCATCTAAAAAGTAG

(2)下载并安装DIAMOND⼯具,可提升BLAST⽐对速度。

wget/bbuchfink/diamond/releas/download/v0.9.18/

1.将unigenes⽐对到swissprot数据库(NR数据库同)

(1)获取swissprot数据库:

wgetftp:///blast/db/FASTA/

(2)建库:

diamondmakedb--inswissprot-dswissprot

(3)blastx⽐对:

Evalue设定为1e-5,每query输出1条对应hit。阈值设定规则见参考2。

diamondblastx-dswissprot-qmy_-k1-e0.00001-oswiss_dia_matches.m8

得到的swiss_dia_matches.m8⽂件格式如下表所⽰,第⼆列为swissprotaccessionnumber:

MSTRG.5.1Q5MYA4.396.2261.6e-0651.6

MSTRG.6.1Q944J0.183.79224.2e-36152.5

BnaA01g00010D.1Q944J0.183.79221.4e-35150.6

注释

(1)获取⽂件和Uniprot2GO_⽂件(py⽂件下载见参考3):

wgetftp:///databas/idmapping/

idmapping⽂件另⼀下载路径:

wget/pub/databas/uniprot/knowledgeba/idmapping/idmapping_

⽂件格式如下表所⽰:

Q6GZX4001R_FRG3G2947773YP_031579.181941549;49237298PF04947GO:0006355;GO:0046782;GO:0006351UniRef100_Q6GZX4UniRef9

Q6GZX3002L_FRG3G2947774YP_031580.149237299;81941548;47060117PF03003GO:0033644;GO:0016021UniRef100_Q6GZX3UniRef90_Q6GZX3

Q197F8002R_IIV34156251YP_654574.1109287880;123808694;106073503UniRef100_Q197F8UniRef90_Q197F8UniRef50_Q197F8UPI0000D83464

⽂件以tab键分隔,第1列为swissportaccessionnumber(UniportkbID),第4列为NRID,第8列为GO注释。

(2)更改swiss_dia_matches.m8⽂件,将其第⼆列以“.”分开,只取“.”前⾯的字符串。更改后如下表所⽰:

MSTRG.5.1Q5MYA496.2261.6e-0651.6

MSTRG.6.1Q944J083.79224.2e-36152.5

BnaA01g00010D.1Q944J083.79221.4e-35150.6

(3)修改下载的Uniprot2GO_⽂件:

由⽂件格式可知:

为swissprot⽐对结果做GO注释时,第16⾏应改为:

UniProt_GO[lsplit[0]]=lsplit[7]

为NR⽐对结果作GO注释时,第16⾏应改为:

UniProt_GO[lsplit[3]]=lsplit[7]

此外,若在windowsCMD中运⾏,第14⾏需加⼊decode()函数,linux中则不需要:

lsplit=().rstrip().split("t")

(4)运⾏py⽂件进⾏GO注释:

pythonUniProt2GO_s_dia_matches.m8swiss_

swiss_⽂件格式如下表所⽰:

BnaA08g27970DGO:0045271,GO:0016021,GO:0005739,GO:0031966,GO:0009853,GO:0005747,GO:0055114

BnaAnng26910DGO:0031087,GO:0010606,GO:0000932,GO:0017148,GO:0042803,GO:0006397

BnaC07g50970DGO:0042938,GO:0042939,GO:0016021,GO:0042936,GO:0042937,GO:0022857,GO:0006807,GO:0005886,GO:0015031,GO:0009506

(5)⽤R包为GO注释增添EVIDENCE注释:

⽤python将swiss_⽂件的GO条⽬按“,”拆开,存⼊⽂件swiss_go_forStats:

defbefore_GOstat(f1,f2):

nes():

j=('t')

forkinj[1].split(','):

m=j[0]+'t'+k

if(m[-1]!='n'):

m=m+'n'

print(m)

(m)

f1=open('swiss_','r')

f2=open('swiss_go_forStats','w')

()

()

⽂件swiss_go_forStats格式如下表所⽰:

BnaA08g27970DGO:0005747

BnaA08g27970DGO:0055114

BnaAnng26910DGO:0031087

EVIDENCE注释:

>library()

>#keytypes()

>swiss_id<-('swiss_go_forStats',header=F)

>colnames(swiss_id)<-c('gene_id','GO')

>ev_id<-lect(,keys=(swiss_id$GO),columns=c('EVIDENCE'),keytype="GO")

>library(dplyr)

>swiss_goev<-left_join(swiss_id,ev_id[,1:2])

>(swiss_goev,'swiss_goev_forStats',=F,quote=F)

swiss_goev_forStats⽂件格式如下表所⽰:

gene_idGOEVIDENCE

BnaA08g27970DGO:0045271NA

BnaA08g27970DGO:0016021IDA

BnaA08g27970DGO:0016021IBA

BnaA08g27970DGO:0016021IEA

注释(以Brassicanapus为例)

(1)KAAS⽹站⾃动注释my_⽂件,得到基因名称对应的KOlist,保存为all_⽂件,格式如下表所⽰:

MSTRG.6.1K12946

BnaA01g00010D.1K12946

BnaA01g00050D.1K09338

⽂件格式如下:

+DGENESKO

#

KEGGOrthology(KO)-Brassicanapus(rape)

%

本文发布于:2023-01-04 16:14:26,感谢您对本站的认可!

本文链接:http://www.wtabcd.cn/fanwen/fan/90/91599.html

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。

上一篇:calcium
下一篇:listerine
标签:keg
相关文章
留言与评论(共有 0 条评论)
   
验证码:
Copyright ©2019-2022 Comsenz Inc.Powered by © 专利检索| 网站地图