Trinity_Tutorial_RNA-Seq

Updated: 2023-05-21 17:13:15

Tutorial 1: Processing RNA-Seq Illumina Paired End Data through Trinity De Novo
Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. Briefly, the process works like so:
•Inchworm assembles the RNA-Seq data into the unique sequences of transcripts, often generating full-length transcripts for a dominant isoform, but then reports just the unique portions of alternatively spliced transcripts.
•Chrysalis clusters the Inchworm contigs and constructs complete de Bruijn graphs for each cluster. Each cluster represents the full transcriptional complexity for a given gene (or sets of genes that share sequences in common). Chrysalis then partitions the full read set among the disjoint graphs.
•Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms, and teasing apart transcripts that correspond to paralogous genes.
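To make the de Bruijn graph idea concrete, here is a minimal toy sketch in Python. It is not Trinity's implementation: the function name `build_de_bruijn`, the tiny reads, and the choice of k = 4 are all assumptions made for illustration. It only shows how two sequences that share a prefix and then diverge (as two isoforms might) produce a branching node in the graph.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: prefix (k-1)-mer -> suffix (k-1)-mer
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two hypothetical toy sequences that share a prefix and then diverge
reads = ["ACGTAC", "ACGTTG"]
graph = build_de_bruijn(reads, k=4)

# Nodes with more than one successor mark points where paths split
branching = sorted(node for node, successors in graph.items() if len(successors) > 1)
print(branching)
```

Each path through such a graph corresponds to one candidate sequence; resolving which paths are actually supported by reads is the part that the Butterfly step is described as handling.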
To run Trinity, we can use either paired or unpaired reads. When using paired reads, you should be able to see /1 in the read names of one file and /2 in the other.
<Example Left.fq>
@61DFRAAXX100204:1:100:10494:3070/1
ACTGCATCCTGGAAAGAATCAATGGTGGCCGGAAAGTGTTTTTCAAATACAAGAGTGACAATGTGCCCTGTTGTTT
+
ACCCCCCCCCCCCCCCCCCCCCCCCCCCCCBC?CCCCCCCCC@@CACCCCCACCCCCCCCCCCCCCCCCCCCCCCC
@61DFRAAXX100204:1:100:10497:13422/1
GTAATTTCCGTACCTGCCACAGTGTGGGCTCACCCTGCTTAGAGGACAGGGAAGGACCCTAAAGGTAGGCTGATGC
+
CCCCCCCCCCCCCCCCCCCCCCDCDCCCCCCCCCCCCCCCCCCCDDCCDDCDCBDCCDDDDBADDADDB@DBBBA@
@61DFRAAXX100204:1:100:10546:4478/1
CTGGGCTGCAGCTAAGTTCTCTGCATCCTCCTTCTTGCTTGTGGCTGGGAAGAAGACAATGTTGTCGATGGTCTGG
+
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC7CB@CA:>AB?C=C@@@@?A@?5:88:
<Example Right.fq>
@61DFRAAXX100204:1:100:10494:3070/2
CTCAAATGGTTAATTCTCAGGCTGCAAATATTCGTTCAGGATGGAAGAACATTTTCTCAGTATTCCATCTAGCTGC
+
C<CCCCCCCACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCBCCCCCCCCCCCCCCCCACCCCCACCC=
@61DFRAAXX100204:1:100:10497:13422/2
GAGTTACTGGTAAGACGCTTACACCTATAACTCAAGGTCGGAATAGTCCCTCCAGTCCCTTTAGTAACCCAGTGGC
+
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDCCCCCCCCCCCCCCCCCCCCCCCCCCDCCCCCCCACCC
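Before starting a run it can be worth confirming that the two files really are paired in the way described above. The following Python sketch assumes plain four-line FASTQ records and read names ending in /1 and /2; the function names and the shortened toy records are hypothetical and are not part of Trinity.

```python
import io
from itertools import zip_longest


def read_names(handle):
    """Yield the read name (header without '@') of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 0:
            yield line.strip()[1:]


def check_pairing(left_handle, right_handle):
    """Return the number of matched pairs; raise ValueError on the first mismatch."""
    count = 0
    # zip_longest pads the shorter file with "", which then fails the checks below
    for left, right in zip_longest(read_names(left_handle),
                                   read_names(right_handle), fillvalue=""):
        same_fragment = left[:-2] == right[:-2]
        if not (left.endswith("/1") and right.endswith("/2") and same_fragment):
            raise ValueError(f"records are not paired: {left!r} vs {right!r}")
        count += 1
    return count


# Toy records: real read names from the example, sequences shortened for brevity
left_text = ("@61DFRAAXX100204:1:100:10494:3070/1\nACTG\n+\nCCCC\n"
             "@61DFRAAXX100204:1:100:10497:13422/1\nGTAA\n+\nCCCC\n")
right_text = ("@61DFRAAXX100204:1:100:10494:3070/2\nCTCA\n+\nCCCC\n"
              "@61DFRAAXX100204:1:100:10497:13422/2\nGAGT\n+\nCCCC\n")
n_pairs = check_pairing(io.StringIO(left_text), io.StringIO(right_text))
print(n_pairs)
```

For real data, the same function can be called on two files opened with `open("reads.left.fq")` and `open("reads.right.fq")`.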
If you have strand-specific data, specify the library type. There are four library types:
•Paired reads:
o RF: first read (/1) of fragment pair is sequenced as anti-sense (reverse (R)), and second read (/2) is in the sense strand (forward (F)); typical of the dUTP/UDG sequencing method.
o FR: first read (/1) of fragment pair is sequenced as sense (forward), and second read (/2) is in the antisense strand (reverse)
•Unpaired (single) reads:
o F: the single read is in the sense (forward) orientation
o R: the single read is in the antisense (reverse) orientation
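The four library types can be read as a rule for which reads need to be reverse-complemented to obtain the sense-strand sequence. The Python sketch below encodes that rule directly from the definitions above; `to_sense` and `reverse_complement` are hypothetical helper names, and this is not how Trinity itself is implemented.

```python
# Translation table for complementing DNA bases (N stays N)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_sense(seq, lib_type, mate=1):
    """Return seq as it would read on the sense strand.

    lib_type: 'RF' or 'FR' for paired reads, 'F' or 'R' for single reads.
    mate: 1 for the /1 read, 2 for the /2 read (ignored for single reads).
    """
    if lib_type in ("F", "R"):
        orientation = lib_type
    elif lib_type in ("RF", "FR") and mate in (1, 2):
        # First letter describes the /1 read, second letter the /2 read
        orientation = lib_type[mate - 1]
    else:
        raise ValueError(f"unsupported library type or mate: {lib_type!r}, {mate!r}")
    return seq if orientation == "F" else reverse_complement(seq)


# In an RF library the /1 read is antisense, so it is flipped; the /2 read is kept
print(to_sense("AAGC", "RF", mate=1), to_sense("AAGC", "RF", mate=2))
```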
Once we have transferred the files to the server, running Trinity is relatively simple since it runs stepwise through the "Trinity" pipeline. Navigate to the directory where you have your sequence files (tutorial1and2 folder), and run the following command:
Example Paired Run:
Trinity.pl --seqType fq --left reads.left.fq --right reads.right.fq --SS_lib_type RF --paired_fragment_length 280 --min_contig_length 305 --CPU 4 --bfly_opts "-V 10 --stderr"
Note: --bfly_opts "-V 10 --stderr" is set so that the Verbose level is high and the output will print to the screen.
Here --left <FILENAME> is the file containing the first (/1) reads of each pair, and --right <FILENAME> is the file containing the second (/2) reads. The start of this tutorial describes the difference.
By setting the --SS_lib_type parameter to one of the above, you are indicating that the reads are strand-specific. By default, reads are treated as not strand-specific.
If the data are strand-specific, set:
--SS_lib_type <string>  :if paired: RF or FR,  if single: F or R
Butterfly-related options:
--bfly_opts <string>    :parameters to pass through to Butterfly (see Butterfly documentation).
--bflyHeapSpace <string> :Java heap space setting for Butterfly (default: 1000M) => yields command java -Xmx1000M -jar Butterfly.jar ... $bfly_opts
--no_run_butterfly      :stops after the Chrysalis stage. You'll need to run the Butterfly computes separately, such as on a computing grid.
Inchworm-related options:
--no_meryl              :do not use meryl for computing the k-mer catalog (default: uses meryl, providing improved runtime performance)
--min_kmer_cov <int>            :min count for K-mers to be assembled by Inchworm (default: 1)
Misc:
--CPU <int>              :number of CPUs to use, default: 2
--min_contig_length <int> :minimum assembled contig length to report (def=200)
--paired_fragment_length <int>  :maximum length expected between fragment pairs (aim for the 90th percentile)  (def=300)
--jaccard_clip    :option, set if you have paired reads and you expect high gene density with UTR overlap (use FASTQ input file format for reads).
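When the same run has to be repeated with different settings, it can help to assemble the command from named parameters instead of retyping it. The Python sketch below only builds and prints the argument list and does not launch Trinity. The function `trinity_command` is a hypothetical helper; the flags are the ones listed above, and the defaults mirror the stated defaults (fragment length 300, minimum contig length 200, 2 CPUs).

```python
import shlex


def trinity_command(left, right, lib_type=None, fragment_length=300,
                    min_contig_length=200, cpu=2, bfly_opts=None):
    """Build the argument list for a paired-end Trinity.pl run (FASTQ input)."""
    cmd = ["Trinity.pl", "--seqType", "fq", "--left", left, "--right", right]
    if lib_type is not None:
        # Paired reads accept only RF or FR
        if lib_type not in ("RF", "FR"):
            raise ValueError("paired reads take --SS_lib_type RF or FR")
        cmd += ["--SS_lib_type", lib_type]
    cmd += ["--paired_fragment_length", str(fragment_length),
            "--min_contig_length", str(min_contig_length),
            "--CPU", str(cpu)]
    if bfly_opts:
        # Passed as a single argument so Butterfly receives the whole string
        cmd += ["--bfly_opts", bfly_opts]
    return cmd


cmd = trinity_command("reads.left.fq", "reads.right.fq", lib_type="RF",
                      fragment_length=280, min_contig_length=305, cpu=4,
                      bfly_opts="-V 10 --stderr")
print(shlex.join(cmd))  # shell-quoted form, suitable for pasting into a terminal
```

Keeping the command as a list means it could later be handed to `subprocess.run(cmd, check=True)` without worrying about shell quoting of the Butterfly options.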
Other important considerations:
•Trinity performs best with strand-specific data, in which case sense and antisense transcripts can be resolved. If your data are strand-specific, use the --SS_lib_type flag to describe them.
•Whether you use Fastq or Fasta formatted input files, be sure to keep the reads oriented as they are reported by Illumina, if the data are strand-specific. This is because Trinity will properly orient the sequences according to the specified library type. If the data are not strand-specific, no worries, because the reads will be parsed in both orientations.
•If you do not have strand-specific data, and you do not plan to use the --jaccard_clip option, you can combine all your reads into a single fastq or fasta file and use the --single option. You can also combine paired reads and single reads, as long as the paired reads are recognized by having the same accession prefix with /1 and /2 to discriminate between paired ends.
•If you have multiple paired-end library fragment sizes, set the --paired_fragment_length according to the larger insert library. Pairings that exceed that distance will be treated as if they were unpaired by the Butterfly process. Trinity's defaults are tuned to a library with an ~300 base fragment length.
•By setting the --CPU option, you are indicating:
o the number of threads for Inchworm to use (in most cases, Inchworm multithreading does not currently lead to performance gains. In future releases, this may change).
o most importantly, the number of Butterfly executions that will occur simultaneously.
For loblolly, take note that the --CPU option can be set to 4 instead of the default of 2.
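The advice on --paired_fragment_length (aim for the 90th percentile, and follow the larger insert library when there are several) can be turned into a small calculation. This Python sketch uses the nearest-rank percentile; the fragment lengths are made-up numbers, and in practice they would have to come from your own library QC or from aligned read pairs.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values given")
    # Integer ceiling of pct * n / 100, so no floating-point rounding is involved
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical fragment lengths for two libraries of different insert size
libraries = {
    "short_insert": [180, 190, 200, 205, 210, 215, 220, 230, 240, 260],
    "long_insert": [210, 230, 250, 255, 260, 270, 280, 290, 300, 340],
}
per_library = {name: percentile_nearest_rank(lengths, 90)
               for name, lengths in libraries.items()}

# With several libraries, follow the larger one
suggested = max(per_library.values())
print(per_library, suggested)
```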
Tutorial 2: Process for Read Alignment, Visualization, and Abundance Estimation with Paired End
Once you have finished running Trinity.pl, you can process this output to visualize and get abundance estimations.
1. Align reads to the Trinity transcripts using the util/alignReads.pl script, which can leverage Bowtie, BLAT, or BWA as the aligner.
Caution should be taken in using this wrapper and the modified tools, because there are advantages and disadvantages to each, as described below:
a. Bowtie: Abundance estimation using RSEM (as described below) currently leverages Bowtie gap-free alignments. Running bowtie (original, not the newer bowtie 2, which is still being investigated) with paired fragment reads will exclude alignments where only one of the mate pairs aligns. Since Trinity doesn't perform scaffolding across sequencing gaps yet, there will be cases (more so in fragmented transcripts corresponding to lowly expressed transcripts) where only one of the mate-pairs aligns. The alignReads.pl script operates similarly to TopHat in that it runs Bowtie to align each of the paired fragment reads separately, and then groups them into pairs afterwards. We capture both the paired and the unpaired fragment read alignments from Bowtie for visualization and examining read support for the transcript assemblies. The properly-mapped pairs are further extracted and can be used as a substrate for RSEM-based abundance estimation (see below).
b. BLAT: we've found BLAT to be particularly useful in generating spliced short-read alignments to targets where short introns exist. We include BLAT here only for exploratory purposes.
c. BWA: the modified version of BWA provides SAM entries for each of the multiply mapped reads' alternative mappings, but grouping of pairs is performed by the alignReads.pl script, and the total number of alignments reported tends to be substantially less than running the latest version of BWA in paired mode without having the multiply mapped individual reads. BWA is recommended specifically for SNP-calling exercises, and we're continuing to explore the various options available, including further tweaks here.
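The "align each mate separately, then group into pairs" idea described for Bowtie above can be illustrated with a toy sketch. This is not the logic of alignReads.pl: it assumes one alignment per read, ignores orientation and distance between mates, and treats two mates as a pair simply when both hit the same transcript. The function name and the read names r1 to r3 are hypothetical.

```python
from collections import defaultdict


def group_pairs(alignments):
    """Split fragments into those with both mates on one transcript and the rest.

    alignments: iterable of (read_name, transcript), read_name ending in /1 or /2.
    Returns (pairs, singles) as lists of fragment names.
    """
    by_fragment = defaultdict(dict)
    for read_name, transcript in alignments:
        fragment, mate = read_name[:-2], read_name[-1]
        by_fragment[fragment][mate] = transcript
    pairs, singles = [], []
    for fragment, mates in by_fragment.items():
        first, second = mates.get("1"), mates.get("2")
        if first is not None and first == second:
            pairs.append(fragment)    # both mates aligned to the same transcript
        else:
            singles.append(fragment)  # one mate missing, or mates disagree
    return pairs, singles


alignments = [
    ("r1/1", "comp0_c0_seq1"), ("r1/2", "comp0_c0_seq1"),  # pair
    ("r2/1", "comp1_c0_seq1"),                             # only one mate aligned
    ("r3/1", "comp9_c0_seq1"), ("r3/2", "comp9_c0_seq2"),  # mates disagree
]
pairs, singles = group_pairs(alignments)
print(pairs, singles)
```

Keeping the single-mate fragments, as the text notes, preserves read support for fragmented transcripts, while only the grouped pairs would be passed on for abundance estimation.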
2. Run from the tutorial1and2 directory:
/opt/trinityrnaseq_r2011-10-29/util/alignReads.pl --left reads.left.fq --right reads.right.fq --seqType fq --target trinity_out_dir/Trinity.fasta --aligner bowtie --SS_lib_type RF
Note: if your data are strand-specific, be sure to set --SS_lib_type as done with Trinity.pl
3. This alignment generates a lot of output files. The bowtie_out.coordSorted.bam file contains both properly-mapped pairs and single unpaired fragment reads. This file can be used for visualizing the alignments and coverage data using IGV. The *nameSorted*PropMapPairsForRSEM.bam file contains only the properly-mapped pairs for use with the RSEM software. We will be using the bowtie_out.coordSorted.bam file to visualize the data with IGV.
4. To use IGV, get it from: http:///igv/
5. Once you have the program running, use the Import Genome tool to load the Trinity.fasta file as a genome. Also, load the bowtie_out.coordSorted.bam file containing the aligned reads. You will need to transfer the associated bowtie_out.coordSorted.bam.bai file (the index), so that IGV will load the bam file.
Note: If after loading the genome and the bam file you still cannot see any data, use the zoom tool in the top right corner to zoom in. Also, clicking on the nucleotides in the bottom sequence window will toggle a three-frame translation. This can then be flipped by right-clicking in the same window to get the other three-frame translation.
6. RSEM is enormously useful for abundance estimation in the context of transcriptome assemblies. RSEM can be downloaded here: deweylab.biostat.wisc.edu/rsem/. However, RSEM is currently included with Trinity, since Trinity uses a slightly modified version.
7. Run
/opt/trinityrnaseq_r2011-10-29/util/RSEM_util/run_RSEM.pl --transcripts trinity_out_dir/Trinity.fasta --name_sorted_bam bowtie_out/bowtie_out.nameSorted.sam.+.sam.PropMapPairsForRSEM.bam --paired --group_by_component
This will run RSEM to estimate read abundance.
8. Execute
/opt/trinityrnaseq_r2011-10-29/util/RSEM_util/summarize_RSEM_fpkm.pl --transcripts trinity_out_dir/Trinity.fasta --RSEM sults --fragment_length 300 --group_by_component | tee Trinity.RSEM.fpkm
This will summarize the RSEM FPKM values into an easy-to-read text file named Trinity.RSEM.fpkm.
Trinity.RSEM.fpkm File
#Total fragments mapped to transcriptome: 24114.01
transcript      length  eff_length  count    fraction  fpkm      %comp_fpkm
comp20_c0_seq1  349     50          3.00     5.67e-03  2488.18   100.00
comp0_c0_seq1   3739    3440        531.56   2.03e-02  6408.03   11.02
comp0_c0_seq2   3697    3398        4240.44  1.64e-01  51750.92  88.98
comp9_c0_seq1   5528    5229        192.07   4.83e-03  1523.25   12.45
comp9_c0_seq2   5399    5100        1317.93  3.40e-02  10716.49  87.55
comp19_c0_seq1  433     134         2.00     1.87e-03  618.95    100.00
comp1_c0_seq1   6716    6417        699.32   1.43e-02  4519.33   17.66
comp1_c0_seq2   6665    6366        2949.41  6.10e-02  19213.17  75.07
comp1_c0_seq3   3969    3670        6.08     2.18e-04  68.70     0.27
comp1_c0_seq4   3918    3619        123.99   4.51e-03  1420.79   5.55
comp1_c0_seq5   3152    2853        0.42     1.93e-05  6.10      0.02
comp1_c0_seq6   3101    2802        24.79    1.16e-03  366.89    1.43
comp32_c0_seq1  562     263         7.00     3.45e-03  1103.76   100.00
comp10_c0_seq1  3823    3524        610.19   2.28e-02  7180.58   90.22
comp10_c0_seq2  3715    3416        50.42    1.94e-03  612.09    7.69
comp10_c0_seq3  2749    2450        0.00     1.29e-07  0.00      0.00
comp10_c0_seq4  2641    2342        9.39     5.27e-04  166.27    2.09
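The columns of this file can be related to one another with a short calculation. The Python sketch below assumes the usual FPKM definition (fragments per kilobase of effective length per million mapped fragments) and assumes that %comp_fpkm is each transcript's share of the summed FPKM of its component, where the component is the transcript name without its last underscore-separated field. Both assumptions reproduce the values in the rows used here, but they are inferred from the numbers and are not taken from the RSEM or Trinity documentation.

```python
from collections import defaultdict


def fpkm(count, eff_length, total_fragments):
    """Fragments per kilobase of effective length per million mapped fragments."""
    return count / (eff_length / 1e3) / (total_fragments / 1e6)


def percent_of_component(fpkm_by_transcript):
    """Return each transcript's percentage of its component's total FPKM."""
    totals = defaultdict(float)
    for name, value in fpkm_by_transcript.items():
        totals[name.rsplit("_", 1)[0]] += value  # e.g. comp0_c0_seq1 -> comp0_c0
    shares = {}
    for name, value in fpkm_by_transcript.items():
        total = totals[name.rsplit("_", 1)[0]]
        shares[name] = 100.0 * value / total if total else 0.0
    return shares


TOTAL_FRAGMENTS = 24114.01  # from the header line of Trinity.RSEM.fpkm

# First table row: count 3.00, effective length 50
first_row_fpkm = fpkm(3.00, 50, TOTAL_FRAGMENTS)

# The two isoforms of component comp0_c0, with FPKM values copied from the table
shares = percent_of_component({"comp0_c0_seq1": 6408.03, "comp0_c0_seq2": 51750.92})
print(round(first_row_fpkm, 2), {k: round(v, 2) for k, v in shares.items()})
```

In the rows shown, eff_length is always length minus 299, which is consistent with the --fragment_length 300 setting used in step 8 (length - 300 + 1).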
