Trinity_Tutorial_RNA-Seq

Updated: 2023-05-16 07:25:06

Tutorial 1: Processing RNA-Seq Illumina Paired End Data through Trinity De Novo

Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. Briefly, the process works like so:

•Inchworm assembles the RNA-Seq data into the unique sequences of transcripts, often generating full-length transcripts for a dominant isoform, but then reports just the unique portions of alternatively spliced transcripts.

•Chrysalis clusters the Inchworm contigs and constructs complete de Bruijn graphs for each cluster. Each cluster represents the full transcriptional complexity for a given gene (or sets of genes that share sequences in common). Chrysalis then partitions the full read set among the disjoint graphs.

•Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms, and teasing apart transcripts that correspond to paralogous genes.
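
To make the de Bruijn graph idea more concrete, here is a minimal Python sketch that builds a tiny graph from two toy reads and finds the point where they diverge. The function name build_de_bruijn, the reads, and k=4 are all made up for illustration; this is not how Inchworm, Chrysalis, or Butterfly are implemented.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)

# Two toy "isoforms" that share a prefix and then diverge (hypothetical data).
reads = ["ACGTTGCA", "ACGTTCCA"]
graph = build_de_bruijn(reads, k=4)

# A node with more than one successor marks a branch point in the graph.
branches = {node: sorted(nxt) for node, nxt in graph.items() if len(nxt) > 1}
print(branches)
```

In this toy case the only branch point is the node where the two reads stop agreeing. Resolving such branches into separate full-length transcripts is, roughly speaking, the job the text assigns to Butterfly.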

To run Trinity, we can use either paired or unpaired reads. When using paired reads, you should be able to see /1 in the read names of one file and /2 in the other.

<Example Left.fq>

@61DFRAAXX100204:1:100:10494:3070/1
ACTGCATCCTGGAAAGAATCAATGGTGGCCGGAAAGTGTTTTTCAAATACAAGAGTGACAATGTGCCCTGTTGTTT
+
ACCCCCCCCCCCCCCCCCCCCCCCCCCCCCBC?CCCCCCCCC@@CACCCCCACCCCCCCCCCCCCCCCCCCCCCCC
@61DFRAAXX100204:1:100:10497:13422/1
GTAATTTCCGTACCTGCCACAGTGTGGGCTCACCCTGCTTAGAGGACAGGGAAGGACCCTAAAGGTAGGCTGATGC
+
CCCCCCCCCCCCCCCCCCCCCCDCDCCCCCCCCCCCCCCCCCCCDDCCDDCDCBDCCDDDDBADDADDB@DBBBA@
@61DFRAAXX100204:1:100:10546:4478/1
CTGGGCTGCAGCTAAGTTCTCTGCATCCTCCTTCTTGCTTGTGGCTGGGAAGAAGACAATGTTGTCGATGGTCTGG
+
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC7CB@CA:>AB?C=C@@@@?A@?5:88:

<Example Right.fq>

@61DFRAAXX100204:1:100:10494:3070/2
CTCAAATGGTTAATTCTCAGGCTGCAAATATTCGTTCAGGATGGAAGAACATTTTCTCAGTATTCCATCTAGCTGC
+
C<CCCCCCCACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCBCCCCCCCCCCCCCCCCACCCCCACCC=
@61DFRAAXX100204:1:100:10497:13422/2
GAGTTACTGGTAAGACGCTTACACCTATAACTCAAGGTCGGAATAGTCCCTCCAGTCCCTTTAGTAACCCAGTGGC
+
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDCCCCCCCCCCCCCCCCCCCCCCCCCCDCCCCCCCACCC
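
Before running an assembly it can be worth checking that the two files really are in sync. The Python sketch below shows one way to do that, assuming plain four-line FASTQ records and the /1 and /2 naming shown above. The helper names read_names and pairs_in_sync are hypothetical, and the sequences are shortened toy data rather than the real reads.

```python
def read_names(fastq_text):
    """Return read names (header lines without '@'), assuming 4 lines per record."""
    lines = fastq_text.strip().splitlines()
    return [lines[i][1:] for i in range(0, len(lines), 4)]

def pairs_in_sync(left_names, right_names):
    """True if both lists hold the same fragments in order, /1 on the left, /2 on the right."""
    if len(left_names) != len(right_names):
        return False
    for left, right in zip(left_names, right_names):
        if not (left.endswith("/1") and right.endswith("/2")):
            return False
        if left[:-2] != right[:-2]:
            return False
    return True

# Shortened toy records; only the read names match the example above.
left_fq = """@61DFRAAXX100204:1:100:10494:3070/1
ACTG
+
CCCC
@61DFRAAXX100204:1:100:10497:13422/1
GTAA
+
CCCC
"""
right_fq = """@61DFRAAXX100204:1:100:10494:3070/2
CTCA
+
CCCC
@61DFRAAXX100204:1:100:10497:13422/2
GAGT
+
CCCC
"""

in_sync = pairs_in_sync(read_names(left_fq), read_names(right_fq))
print(in_sync)
```

Stepping through the file four lines at a time matters because a quality line may itself begin with '@', so searching for '@' alone would not reliably find the headers.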

If you have strand-specific data, specify the library type. There are four library types:

•Paired reads:

o RF: first read (/1) of fragment pair is sequenced as antisense (reverse (R)), and second read (/2) is in the sense strand (forward (F)); typical of the dUTP/UDG sequencing method.

o FR: first read (/1) of fragment pair is sequenced as sense (forward), and second read (/2) is in the antisense strand (reverse)

•Unpaired (single) reads:

o F: the single read is in the sense (forward) orientation

o R: the single read is in the antisense (reverse) orientation
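
The four library types can be read as a small lookup: each letter says whether a read is already in the sense orientation (F) or needs to be reverse-complemented (R). The Python sketch below illustrates that reading of the definitions. The function names are hypothetical and this is not Trinity's own code.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A<->T, C<->G, N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]

def to_sense(seq, lib_type, mate=None):
    """Return seq in the sense orientation.

    lib_type is 'RF' or 'FR' for paired reads (mate must be 1 or 2),
    or 'F' or 'R' for single reads (mate is ignored).
    """
    if lib_type in ("F", "R"):
        orientation = lib_type
    elif lib_type in ("RF", "FR") and mate in (1, 2):
        # First letter describes read /1, second letter describes read /2.
        orientation = lib_type[mate - 1]
    else:
        raise ValueError(f"unsupported library type/mate: {lib_type!r}, {mate!r}")
    return seq if orientation == "F" else reverse_complement(seq)

print(to_sense("AACG", "RF", mate=1))  # /1 is antisense in RF, so it is flipped
print(to_sense("AACG", "RF", mate=2))  # /2 is already sense in RF
```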

Once we have transferred the files to the server, running Trinity is relatively simple since it runs stepwise through the "Trinity" pipeline. Navigate to the directory where you have your sequence files (tutorial1and2 folder), and run the following command:

Example Paired Run:

Trinity.pl --seqType fq --left reads.left.fq --right reads.right.fq --SS_lib_type RF --paired_fragment_length 280 --min_contig_length 305 --CPU 4 --bfly_opts "-V 10 --stderr"

Note: --bfly_opts "-V 10 --stderr" is set so that the verbose level is high and the output will print to the screen.

Where --left <FILENAME> is the file holding the first reads of each pair (/1), and --right <FILENAME> is the file holding the second reads of each pair (/2). The example read files shown earlier in this tutorial illustrate the difference.

By setting the --SS_lib_type parameter to one of the above, you are indicating that the reads are strand-specific. By default, reads are treated as not strand-specific.
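
If you script your runs, it can help to assemble the command from named settings rather than editing a long line by hand. The Python sketch below only builds and prints the command from the example run. The function trinity_command is hypothetical, it assumes the flag is spelled --seqType, and it does not run Trinity or check that the files exist.

```python
import shlex

def trinity_command(left, right, lib_type=None, cpu=2, frag_len=300, min_contig=200):
    """Build a Trinity.pl argument list for paired FASTQ reads (not executed here)."""
    cmd = ["Trinity.pl", "--seqType", "fq", "--left", left, "--right", right]
    if lib_type is not None:
        # Paired reads accept RF or FR; leaving it out means not strand-specific.
        if lib_type not in ("RF", "FR"):
            raise ValueError("paired reads take --SS_lib_type RF or FR")
        cmd += ["--SS_lib_type", lib_type]
    cmd += [
        "--paired_fragment_length", str(frag_len),
        "--min_contig_length", str(min_contig),
        "--CPU", str(cpu),
    ]
    return cmd

cmd = trinity_command("reads.left.fq", "reads.right.fq",
                      lib_type="RF", cpu=4, frag_len=280, min_contig=305)
print(shlex.join(cmd))
```

The defaults in the signature mirror the defaults listed in the option reference in this tutorial (2 CPUs, fragment length 300, minimum contig length 200).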

If strand-specific data, set:

--SS_lib_type <string> :if paired: RF or FR, if single: F or R

Butterfly-related options:

--bfly_opts <string> :parameters to pass through to Butterfly (see Butterfly documentation).

--bflyHeapSpace <string> :Java heap space setting for Butterfly (default: 1000M) => yields command java -Xmx1000M -jar Butterfly.jar ... $bfly_opts

--no_run_butterfly :stops after the Chrysalis stage. You'll need to run the Butterfly computes separately, such as on a computing grid.

Inchworm-related options:

--no_meryl :do not use meryl for computing the k-mer catalog (default: uses meryl, providing improved runtime performance)

--min_kmer_cov <int> :min count for k-mers to be assembled by Inchworm (default: 1)

Misc:

--CPU <int> :number of CPUs to use, default: 2

--min_contig_length <int> :minimum assembled contig length to report (def=200)

--paired_fragment_length <int> :maximum length expected between fragment pairs (aim for the 90th percentile) (def=300)

--jaccard_clip :option, set if you have paired reads and you expect high gene density with UTR overlap (use FASTQ input file format for reads).
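
The advice to aim for the 90th percentile when choosing --paired_fragment_length can be turned into a small calculation if you have an estimate of your library's fragment lengths. The Python sketch below uses the nearest-rank definition of a percentile. The function name and the list of lengths are made up for illustration.

```python
import math

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical fragment lengths (in bases) estimated from a library.
fragment_lengths = [180, 210, 220, 240, 250, 260, 270, 280, 300, 450]
suggested = percentile_nearest_rank(fragment_lengths, 90)
print(suggested)
```

Using a percentile rather than the maximum keeps a single outlier (450 in this toy list) from inflating the setting.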

Other important considerations:

•Trinity performs best with strand-specific data, in which case sense and antisense transcripts can be resolved. If you know your library type, use the --SS_lib_type flag to describe the data.

•Whether you use Fastq or Fasta formatted input files, be sure to keep the reads oriented as they are reported by Illumina, if the data are strand-specific. This is because Trinity will properly orient the sequences according to the specified library type. If the data are not strand-specific, no worries, because the reads will be parsed in both orientations.

•If you do not have strand-specific data, and you do not plan to use the --jaccard_clip option, you can combine all your reads into a single fastq or fasta file and use the --single option. You can also combine paired reads and single reads, as long as the paired reads are recognized by having the same accession prefix with /1 and /2 to discriminate between paired ends.

•If you have multiple paired-end library fragment sizes, set the --paired_fragment_length according to the larger insert library. Pairings that exceed that distance will be treated as if they were unpaired by the Butterfly process. Trinity's defaults are tuned to a library with an ~300 base fragment length.

•By setting the --CPU option, you are indicating:

o the number of threads for Inchworm to use (in most cases, Inchworm multithreading does not currently lead to performance gains. In future releases, this may change).

o most importantly, the number of Butterfly executions that will occur simultaneously.

For loblolly, take note that the --CPU option can be set to 4 instead of the default of 2.

Tutorial 2: Process for Read Alignment, Visualization, and Abundance Estimation with Paired End

Once you have finished running Trinity.pl, you can process this output to visualize and get abundance estimations.

1. Align reads to the Trinity transcripts using the util/alignReads.pl script, which can leverage Bowtie, BLAT, or BWA as the aligner.


Caution should be taken in using this wrapper and the modified tools, because there are advantages and disadvantages to each, as described below:

a. Bowtie: Abundance estimation using RSEM (as described below) currently leverages Bowtie gap-free alignments. Running Bowtie (the original, not the newer Bowtie 2, which is still being investigated) with paired fragment reads will exclude alignments where only one of the mate pairs aligns. Since Trinity doesn't perform scaffolding across sequencing gaps yet, there will be cases (more so in fragmented transcripts corresponding to lowly expressed transcripts) where only one of the mate pairs aligns. The alignReads.pl script operates similarly to TopHat in that it runs Bowtie to align each of the paired fragment reads separately, and then groups them into pairs afterwards. We capture both the paired and the unpaired fragment read alignments from Bowtie for visualization and for examining read support for the transcript assemblies. The properly-mapped pairs are further extracted and can be used as a substrate for RSEM-based abundance estimation (see below).

b. BLAT: we've found BLAT to be particularly useful in generating spliced short-read alignments to targets where short introns exist. We include BLAT here only for exploratory purposes.

c. BWA: the modified version of BWA provides SAM entries for each of the multiply mapped reads' alternative mappings, but grouping of pairs is performed by the alignReads.pl script, and the total number of alignments reported tends to be substantially less than running the latest version of BWA in paired mode without having the multiply mapped individual reads. BWA is recommended specifically for SNP-calling exercises, and we're continuing to explore the various options available, including further tweaks here.
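
The "align each read separately, then group into pairs" strategy described for Bowtie can be sketched in a few lines of Python. This is only a simplified illustration and not the logic of alignReads.pl: it assumes at most one alignment per read, and it treats a pair as properly mapped whenever both mates land on the same transcript, ignoring orientation and distance. The read and transcript names are hypothetical.

```python
from collections import defaultdict

def group_pairs(alignments):
    """Group single-read alignments into pairs and leftover singles.

    alignments: iterable of (read_name, transcript, position), where read_name
    ends in /1 or /2. Returns (pairs, singles), both keyed by fragment name.
    """
    by_fragment = defaultdict(dict)
    for name, transcript, pos in alignments:
        fragment, mate = name.rsplit("/", 1)
        by_fragment[fragment][mate] = (transcript, pos)

    pairs, singles = {}, {}
    for fragment, mates in by_fragment.items():
        both_aligned = set(mates) == {"1", "2"}
        if both_aligned and mates["1"][0] == mates["2"][0]:
            pairs[fragment] = (mates["1"], mates["2"])
        else:
            singles[fragment] = mates
    return pairs, singles

alignments = [
    ("r1/1", "comp0_c0_seq1", 10),
    ("r1/2", "comp0_c0_seq1", 250),
    ("r2/1", "comp1_c0_seq1", 5),    # mate /2 did not align
    ("r3/1", "comp0_c0_seq1", 40),
    ("r3/2", "comp9_c0_seq1", 90),   # mates on different transcripts
]
pairs, singles = group_pairs(alignments)
print(sorted(pairs), sorted(singles))
```

Keeping the singles rather than discarding them mirrors the point made above: unpaired alignments are still useful for visualization and read support, while only the pairs are passed on for abundance estimation.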

2. Run from the tutorial1and2 directory:

/opt/trinityrnaseq_r2011-10-29/util/alignReads.pl --left reads.left.fq --right reads.right.fq --seqType fq --target trinity_out_dir/Trinity.fasta --aligner bowtie --SS_lib_type RF

Note: if your data are strand-specific, be sure to set --SS_lib_type as done with Trinity.pl

3. This alignment generates a lot of output files. The bowtie_out.coordSorted.bam file contains both properly-mapped pairs and single unpaired fragment reads. This file can be used for visualizing the alignments and coverage data using IGV. The *nameSorted*PropMapPairsForRSEM.bam file contains only the properly-mapped pairs for use with the RSEM software. We will be using the coordSorted.bam file to visualize the data with IGV.

4. To use IGV, get it from: http:///igv/ .

5. Once you have the program running, use the Import Genome tool to load the Trinity.fasta file as a genome. Also, load the coordSorted.bam file containing the aligned reads. You will need to transfer the associated coordSorted.bam.bai file (the index), so that IGV will load the bam file.

Note: If after loading the genome and the bam file you still cannot see any data, use the zoom tool in the top right corner to zoom in. Also, clicking on the nucleotides in the bottom sequence window will toggle a 3-frame translation. This can then be flipped by right-clicking in the same window to get the other 3-frame translation.

6. RSEM is enormously useful for abundance estimation in the context of transcriptome assemblies. RSEM can be downloaded here: deweylab.biostat.wisc.edu/rsem/. However, RSEM is currently included with Trinity, since Trinity uses a slightly modified version.

7. Run

/opt/trinityrnaseq_r2011-10-29/util/RSEM_util/run_RSEM.pl --transcripts trinity_out_dir/Trinity.fasta --name_sorted_bam bowtie_out/bowtie_out.nameSorted.sam.+.sam.PropMapPairsForRSEM.bam --paired --group_by_component

This will run RSEM to estimate read abundance.

8. Execute

/opt/trinityrnaseq_r2011-10-29/util/RSEM_util/summarize_RSEM_fpkm.pl --transcripts trinity_out_dir/Trinity.fasta --RSEM sults --fragment_length 300 --group_by_component | tee Trinity.RSEM.fpkm

This will summarize the RSEM FPKM values into an easy-to-read text file named Trinity.RSEM.fpkm.

Trinity.RSEM.fpkm File

#Total fragments mapped to transcriptome: 24114.01

transcript      length  eff_length  count    fraction  fpkm      %comp_fpkm
comp20_c0_seq1  349     50          3.00     5.67e-03  2488.18   100.00
comp0_c0_seq1   3739    3440        531.56   2.03e-02  6408.03   11.02
comp0_c0_seq2   3697    3398        4240.44  1.64e-01  51750.92  88.98
comp9_c0_seq1   5528    5229        192.07   4.83e-03  1523.25   12.45
comp9_c0_seq2   5399    5100        1317.93  3.40e-02  10716.49  87.55
comp19_c0_seq1  433     134         2.00     1.87e-03  618.95    100.00
comp1_c0_seq1   6716    6417        699.32   1.43e-02  4519.33   17.66
comp1_c0_seq2   6665    6366        2949.41  6.10e-02  19213.17  75.07
comp1_c0_seq3   3969    3670        6.08     2.18e-04  68.70     0.27
comp1_c0_seq4   3918    3619        123.99   4.51e-03  1420.79   5.55
comp1_c0_seq5   3152    2853        0.42     1.93e-05  6.10      0.02
comp1_c0_seq6   3101    2802        24.79    1.16e-03  366.89    1.43
comp32_c0_seq1  562     263         7.00     3.45e-03  1103.76   100.00
comp10_c0_seq1  3823    3524        610.19   2.28e-02  7180.58   90.22
comp10_c0_seq2  3715    3416        50.42    1.94e-03  612.09    7.69
comp10_c0_seq3  2749    2450        0.00     1.29e-07  0.00      0.00
comp10_c0_seq4  2641    2342        9.39     5.27e-04  166.27    2.09
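
To see how the columns relate to each other, the Python sketch below recomputes fpkm and %comp_fpkm for three rows of the table. The relations used here are inferred from the numbers in the table, not taken from the RSEM or Trinity source: eff_length = length - fragment_length + 1 (with the fragment length of 300 from step 8), and fpkm = count * 1e9 / (eff_length * total fragments). Transcript names are written with the "seq" spelling, and the function names are hypothetical.

```python
from collections import defaultdict

TOTAL_FRAGMENTS = 24114.01  # from the "#Total fragments mapped" line

# (transcript, length, count) copied from three rows of the table.
rows = [
    ("comp20_c0_seq1", 349, 3.00),
    ("comp0_c0_seq1", 3739, 531.56),
    ("comp0_c0_seq2", 3697, 4240.44),
]

def effective_length(length, fragment_length=300):
    """Number of start positions a fragment of the given length can occupy."""
    return length - fragment_length + 1

def fpkm(count, eff_length, total_fragments):
    """Fragments per kilobase of effective length per million mapped fragments."""
    return count * 1e9 / (eff_length * total_fragments)

values = {
    name: fpkm(count, effective_length(length), TOTAL_FRAGMENTS)
    for name, length, count in rows
}

# Sum FPKM per component (the name without its trailing _seqN part).
component_totals = defaultdict(float)
for name, value in values.items():
    component_totals[name.rsplit("_", 1)[0]] += value

for name, value in values.items():
    share = 100 * value / component_totals[name.rsplit("_", 1)[0]]
    print(f"{name}\t{value:.2f}\t{share:.2f}")
```

Read this way, %comp_fpkm is each isoform's share of its component's total expression, which is why the values within one component (for example comp0_c0) add up to 100.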
