Splice Site Prediction Methods

Updated: 2023-06-11 18:53:23

Splice Site Tools A Comparative Analysis Report
Beth Hellen
Contents
Introduction
Methods
Results
Conclusions
References
Appendix 1  Variants found in literature
Introduction
Splicing is a process which modifies pre-mRNA after transcription.  It removes introns and joins exons together to form mature mRNA, ready for translation into protein.
The splice site junction, found where an intron meets an exon, contains multiple sequence motifs.  These motifs provide the signals that allow correct splicing to occur.  The best characterised of these are the acceptor and donor splice site signals.  These signals consist of invariant dinucleotides at positions +1, +2, -1 and -2 of the intron and less well conserved nucleotides both within the immediately adjoining exonic sequence and deeper into the intron from the +3 and -3 positions (Seif et al., 1979).  The specific splicing of a gene can easily be affected by mutations in the sequence surrounding the splice site junction.  This can lead to alternate splicing and thus adversely affect the translated protein (Novoyatleva et al., 2006; Tazi et al., 2009).
In-silico splice site prediction tools can be used to predict the effect of a genetic variant on splicing.  A large number of prediction tools are currently available, either as standalone programs or as part of the Alamut (www.interactive-
[...] algorithms, the performance of the algorithms has not been formally assessed and they may give divergent results.  This analysis aims to provide an assessment of the performance of these algorithms in the prediction of splicing-related variant pathogenicity.  It will also assess the scope of the splice site prediction tools to ensure that they can be used in the most appropriate way.  The analysis will allow scientists to use splice site prediction tools in the prediction of pathogenesis with more confidence.
In this analysis, six of the most common donor and acceptor prediction algorithms have been assessed for their ability to predict the pathogenicity of splice site variants.  The algorithms chosen were those suggested by the UV guidelines, plus MaxEntScan, which is used as part of the Alamut and HSF splicing interfaces.  The six algorithms were: GeneSplicer (Pertea et al., 2001), Human Splicing Finder (HSF) (Desmet et al., 2009), MaxEntScan (Yeo & Burge, 2004), NetGene2 (Brunak et al., 1991), NNSplice (Reese et al., 1997) and SSFL, an algorithm based on Alex Dong Li's Splice Site Finder (no longer available).  In each algorithm the splice site signal given by the wild type sequence is compared to the splice site signal given by a mutated sequence supplied by the user.
Methods
Six algorithms were assessed for their ability to predict disruption to normal splicing patterns caused by genetic variants.  SSFL, MaxEntScan, NNSplice and GeneSplicer were accessed through the Alamut interface.  HSF and a second implementation of MaxEntScan were accessed through the HSF interface.  NetGene2 was run through a standalone web interface.  The majority of the methods were chosen because they had been recommended by the UV guidelines; MaxEntScan was included because it is used in both the HSF and Alamut splicing interfaces.  A set of 265 pathogenic variants and 15 non-pathogenic variants from a total of 180 genes (see Figure 1 and Appendix 1) was retrieved from the literature.  These variants were used to assess the splice site prediction algorithms using their default settings and recommended lengths of sequence.  Sensitivity (equation 1), specificity (equation 2) and accuracy (equation 3) were calculated, as were the standard errors for each of the statistics.  For the purposes of this analysis a true positive was defined as a pathogenic variant correctly classified as pathogenic and a true negative was a non-pathogenic variant correctly classified as non-pathogenic.  A change in splice site signal of ≥10% was considered to predict a pathogenic effect.
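\[ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{1} \]
\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{2} \]
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3} \]
where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives respectively.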
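The classification rule and the three statistics can be sketched in a few lines of Python.  This is a minimal illustration, not the procedure used in the report: the function names are hypothetical, the change is assumed to be measured relative to the wild type score in either direction, the standard error is assumed to be the usual binomial estimate (the report does not say which estimator was used), and the counts are invented for the example.

```python
import math


def predicts_pathogenic(wild_type_score, mutant_score, threshold=0.10):
    """Call a variant pathogenic when the signal changes by >= threshold.

    Assumption: the change is measured relative to the wild-type score and
    in either direction; the report only states "a change of >=10%".
    """
    if wild_type_score == 0:
        # No wild-type signal to compare against; treated here as no call.
        return False
    change = abs(mutant_score - wild_type_score) / abs(wild_type_score)
    return change >= threshold


def performance(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def standard_error(proportion, n):
    """Binomial standard error of a proportion (assumed estimator)."""
    return math.sqrt(proportion * (1 - proportion) / n)


# Hypothetical counts for one algorithm on 265 pathogenic and 15
# non-pathogenic variants; these are not results from the report.
tp, fn, tn, fp = 240, 25, 12, 3
sens, spec, acc = performance(tp, fn, tn, fp)
print(f"sensitivity {sens:.3f} +/- {standard_error(sens, tp + fn):.3f}")
print(f"specificity {spec:.3f} +/- {standard_error(spec, tn + fp):.3f}")
print(f"accuracy    {acc:.3f} +/- {standard_error(acc, tp + fn + tn + fp):.3f}")
```

Under this assumed estimator, a specificity of 0.8 measured on only 15 non-pathogenic variants carries a standard error of roughly 0.10, which illustrates why a small negative set makes the specificity figures less certain than the sensitivity figures.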
A second set of sensitivity, specificity and accuracy calculations was made for those variants which did not fall into the invariant dinucleotide positions at -1, -2, +1 and +2.  This dataset consisted of 110 pathogenic variants and 15 non-pathogenic variants.  The variants occurred in 83 different genes.  This analysis allows the algorithms to be assessed on their performance with the more difficult splice site variants.
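Selecting this second dataset amounts to a simple filter on variant position.  The sketch below assumes a hypothetical `Variant` record with a signed intronic offset (exonic variants encoded as 0); neither the record type nor the example variants come from the report.

```python
from dataclasses import dataclass

# The invariant dinucleotide positions excluded from the second analysis.
INVARIANT_POSITIONS = {-2, -1, 1, 2}


@dataclass(frozen=True)
class Variant:
    name: str          # hypothetical label
    position: int      # signed intronic offset; 0 is used here for exonic
    pathogenic: bool


def outside_invariant_dinucleotides(variants):
    """Keep only variants that are not at +1, +2, -1 or -2."""
    return [v for v in variants if v.position not in INVARIANT_POSITIONS]


# Invented example records, for illustration only.
example = [
    Variant("variant_a", 1, True),
    Variant("variant_b", 5, True),
    Variant("variant_c", -2, True),
    Variant("variant_d", -12, False),
]
harder_subset = outside_invariant_dinucleotides(example)
print([v.name for v in harder_subset])
```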
The UV guidelines for splice site analysis recommend the use of three prediction algorithms to give a consensus prediction.  Combinations of three high performing algorithms were compared to determine whether the accuracy was improved.  The criterion for categorising a variant as pathogenic or non-pathogenic was that at least two of the three algorithms must agree on the prediction.
The accuracy scores were calculated and compared to those given by the single algorithms.
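The two-of-three consensus rule can be sketched as a majority vote evaluated over every combination of three algorithms.  The function names and the prediction table below are hypothetical and invented for illustration; they are not data from this analysis.

```python
from itertools import combinations


def consensus_call(votes):
    """Majority vote over three boolean calls (True = pathogenic)."""
    return sum(votes) >= 2


def consensus_accuracy(predictions, truth, trio):
    """Fraction of variants where the trio's majority call matches truth."""
    correct = 0
    for i, is_pathogenic in enumerate(truth):
        votes = [predictions[name][i] for name in trio]
        if consensus_call(votes) == is_pathogenic:
            correct += 1
    return correct / len(truth)


# Invented calls for five variants; not results from the report.
truth = [True, True, False, True, False]
predictions = {
    "MaxEntScan": [True, True, False, True, False],
    "NNSplice": [True, False, False, True, True],
    "GeneSplicer": [False, True, False, True, False],
    "SSFL": [True, True, True, False, False],
}

for trio in combinations(predictions, 3):
    print(trio, consensus_accuracy(predictions, truth, trio))
```

With three voters a tie is impossible, so every variant receives a call; this is one practical reason to use an odd number of algorithms.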
To test the range of predictions made by the algorithms at each intronic position near the splice site junction, an in-silico analysis was performed.  Thirteen acceptor and donor splice site junctions from BRCA1 and BRCA2 were analysed.  Only junctions where the wild type splice site signal was found by all four of the highest performing algorithms were used.  The wild type base at each position from +1 to +10 or -1 to -10 was artificially mutated in-silico to each of the remaining 3 nucleotides and the proportional change in splice site signal given by each algorithm was recorded.  The mean change in splice site prediction (equation 4) at each position was plotted for each algorithm.  The mean change in splice site signal strength is described in equation 4, where SS_M is the mutated splice site signal, SS_W is the wild type splice site signal and N is the number of examples analysed.
\[ \overline{\Delta SS} = \frac{1}{N} \sum_{i=1}^{N} \frac{SS_{M,i} - SS_{W,i}}{SS_{W,i}} \tag{4} \]
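The in-silico mutation scan can be sketched as follows.  The real prediction algorithms are not reproduced here: `toy_donor_score` is a hypothetical stand-in that counts matches to an assumed six-base donor motif, and the sign convention (mutated minus wild type, divided by wild type) is an assumption about how the proportional change was computed.

```python
BASES = "ACGT"
TOY_DONOR_MOTIF = "GTAAGT"  # assumed motif for the toy scorer only


def toy_donor_score(sequence):
    """Hypothetical scorer: count of leading bases matching the toy motif."""
    return sum(1 for a, b in zip(sequence, TOY_DONOR_MOTIF) if a == b)


def single_base_mutants(sequence, index):
    """Return the three sequences with the base at `index` substituted."""
    wild_type_base = sequence[index]
    return [
        sequence[:index] + base + sequence[index + 1:]
        for base in BASES
        if base != wild_type_base
    ]


def mean_change_at_position(score, sequences, index):
    """Mean proportional change in signal over all mutants at one position."""
    changes = []
    for sequence in sequences:
        ss_w = score(sequence)  # wild type signal
        for mutant in single_base_mutants(sequence, index):
            ss_m = score(mutant)  # mutated signal
            changes.append((ss_m - ss_w) / ss_w)
    return sum(changes) / len(changes)  # len(changes) plays the role of N


# Invented intronic sequence starting at position +1 (index 0).
junctions = ["GTAAGTCCAA"]
for index in range(10):
    change = mean_change_at_position(toy_donor_score, junctions, index)
    print(f"+{index + 1}: {change:+.3f}")
```

Passing the scorer in as a function keeps the scan independent of any one tool, so the same loop could in principle be pointed at each algorithm's score in turn.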
Results
Pathogenic and non-pathogenic splice site related variants retrieved from the literature were found at a range of positions relative to the splice site junction (Figure 1).  The majority of splice site related pathogenic mutations used in this analysis were found within intronic positions between 1 and 10 nucleotides from the splice site junction.  However, >40 of the variants were found in positions within the exon, and pathogenic mutations were also found at >100bp from the splice site junction.  Only 15 non-pathogenic variants were found and they mainly occurred at positions further from the splice site junction.  The small number of non-pathogenic variants arises from the problem of non-reporting of negative results.  This is likely to increase the error associated with the specificity scores.
[Figure 1 plot: x-axis, Intronic_position; y-axis, Frequency]
Figure 1  Chart showing the position of variants retrieved from the literature.  Variants in exonic positions are shown at 0; variants >50bp from the splice site junction are binned and represented as a single frequency at 50bp from the splice site.  Black lines represent the frequency of pathogenic variants and red lines represent the frequency of non-pathogenic variants.
The sensitivity, specificity and accuracy scores showed that the four highest performing algorithms were NNSplice, MaxEntScan, GeneSplicer and SSFL (Figure 2).  These algorithms achieved between 80 and 92% accuracy and sensitivity.  The specificity scores (between 73 and 93%) were less reliable due to the smaller number of variants tested.  These four algorithms are those implemented through the Alamut interface.  It is possible that the ease of interpretation of the results, when using the Alamut interface, has influenced this result.  With the HSF interface it was more difficult to determine the predicted difference in splice site signal.
Figure 2  Accuracy, sensitivity and specificity values for each of the splice site prediction algorithms tested.  Sensitivity measures the ability to predict pathogenic variants (TP) and specificity measures the ability to predict non-pathogenic variants (TN).
The removal of variants occurring at the +1, +2, -1 and -2 positions reduced the performance of the algorithms, as was expected (Figure 3).  However, two algorithms (MaxEntScan and NNSplice) still achieved an accuracy score of >80%.  Therefore it can be seen that the algorithms perform reasonably well, even with variants where it is more difficult to predict the splicing effect.
Figure 3  Accuracy, sensitivity and specificity values for each of the splice site prediction algorithms tested.  Only variants which did not occur at one of the +1, +2, -1 or -2 positions were analysed.
The accuracy given by the consensus prediction of splice site signals was found to be between 86% and 92% for all combinations (Figure 4).  The highest accuracy obtained through a consensus method was comparable to that given by MaxEntScan when implemented through Alamut.  None of the consensus methods achieved an accuracy that was significantly higher than that of the individual algorithms.
