Determining the Origin of Replication in Bacterial Chromosomes
What is GC Skewing?
If DNA were random strings of letters, you would expect about half of the G's in a genome to be on the leading strand and the other half on the lagging strand. However, one strand of DNA often has significantly more than its share of G's (thereby causing the other strand to have significantly more than its share of C's). For example, the origin and terminus of replication in a circular chromosome often have an unusually even or unusually uneven distribution of G's and C's. The unevenness, or skew, is measured in a "window," or subsequence. By sliding the window along the sequence, unusually even or unusually uneven distributions can be located. GC skew is calculated as (G - C) / (G + C), where G is the number of G's in the window and C is the number of C's.
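The sliding-window calculation described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name gc_skew, its default window and step values, and the choice to report 0.0 for a window that contains no G or C are assumptions made here, not part of the original tool.

```python
def gc_skew(seq, window=1000, step=100):
    """Return (window_start, skew) pairs, with skew = (G - C) / (G + C)."""
    seq = seq.upper()
    results = []
    # Slide a fixed-size window along the sequence in increments of `step`.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Assumption: a window with no G or C has undefined skew; report 0.0.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, skew))
    return results


if __name__ == "__main__":
    toy = "GGGGGCAAAA" + "CCCCCGTTTT"  # G-rich half followed by C-rich half
    for start, skew in gc_skew(toy, window=10, step=10):
        print(start, round(skew, 3))
```

On the toy sequence, the first window gives a positive skew and the second a negative one, which is the kind of sign change the text associates with the origin or terminus.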
Interpreting GC-Skew Graphs
The sample sequence on this web page is approximately 10 kb from the lagging strand of Mycoplasma pneumoniae. The annotated origin of replication is approximately in the center of the sequence. Experiment with different window sizes (from 100 to 3000) and step sizes (from 20 to 200) to see which combination is best for finding the origin of replication. The origin of replication is typically associated with a change in sign of the GC skew. However, there are usually many such changes in sign, especially for smaller window sizes. Therefore, a second measure, the cumulative skew, is used. The cumulative skew is simply the running sum of the skew values in each window. The origin of replication is associated with the global minimum of the cumulative skew (or the global maximum if the lagging strand is analyzed, as in this example).
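As a rough sketch of the cumulative skew, the running sum and its global extremum can be computed as follows. The helper names cumulative_skew and extremum_index and the lagging flag are hypothetical, introduced only for this illustration; the input is assumed to be a list of per-window skew values.

```python
from itertools import accumulate


def cumulative_skew(skews):
    """Running sum of per-window GC skew values."""
    return list(accumulate(skews))


def extremum_index(values, lagging=False):
    """Index of the global minimum, or of the global maximum when the
    analyzed sequence is taken from the lagging strand."""
    pick = max if lagging else min
    return pick(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    skews = [-0.5, -0.25, 0.25, 0.5]   # toy values: negative, then positive
    cum = cumulative_skew(skews)       # [-0.5, -0.75, -0.5, 0.0]
    print(cum, extremum_index(cum))    # minimum is at window index 1
```

The index returned refers to a window, so the position it implies is only as precise as the window and step sizes used to compute the skew values.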
quote:/gc_skew/gc_skew.html
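With the widespread use of genome sequencing, bioinformatic methods for identifying the replication origin of a genome have been developed. These include GC skew, cumulative GC skew, the Z curve, and approaches that combine several methods.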
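GC skew is calculated as (nG - nC)/(nG + nC), where nG and nC are the numbers of G and C in a DNA fragment of a given size (the window). At the origin and at the terminus of chromosome replication, GC skew shows a clear change of sign, from negative to positive or from positive to negative. The method was established in 1996 by Lobry [1], who analyzed the genomes of three bacteria (Escherichia coli, Bacillus subtilis, and Haemophilus influenzae) and found that nucleotide composition is asymmetric between different regions of their DNA strands: the leading strand contains more G and the lagging strand contains more C. Because the DNA strand switches from lagging-strand to leading-strand replication at the origin oriC, GC skew changes from negative to positive at oriC. The phenomenon was subsequently confirmed by other researchers in many bacteria, and the method came to be widely used to identify bacterial chromosome replication origins. The asymmetry is attributed mainly to differences in selection and mutation pressure between the leading and lagging strands.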
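Building on GC skew, Grigoriev [1] developed the cumulative skew method. Starting from any position in the DNA sequence, (nG - nC)/(nG + nC) is calculated for each window and the values of adjacent windows are summed in order; the maximum lies at the replication terminus and the minimum at the replication origin. Its advantage is that it works for microorganisms whose GC skew is weak: in an ordinary GC skew plot the point where the sign changes is hard to see, but it is easy to see in the cumulative GC skew. In addition, the cumulative skew plot is a V-shaped curve rather than the up-and-down fluctuating curve of ordinary GC skew, so it is easier to read [2,3].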
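In some bacteria, the pronounced shift in nucleotide asymmetry does not coincide with the replication origin. To locate the origin more accurately, Mackiewicz et al. [4] developed a new method that combines characteristic features of origin sequences: DNA nucleotide asymmetry (a), the distribution of DnaA boxes (b), and the position of the dnaA gene (d). When the positions given by all three criteria agree, the putative origin has the highest confidence. The rationale is that bacterial origin sequences are conserved only among closely related species, yet nearly all of them contain several clustered DnaA boxes and an AT-rich region, and the origin often lies near the dnaA gene. The predictions fall into five groups: abd, all three criteria give the same position; ab, the extremum of DNA asymmetry lies near the DnaA box cluster; ad, the extremum of DNA asymmetry coincides with the dnaA gene; bd, the dnaA gene lies near the DnaA box cluster; O, each criterion gives a different position. The abd group accounted for 55% of the bacteria analyzed, so combining the three features gives fairly reliable origin predictions.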
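The five-group scheme can be illustrated with a small classifier. This is a sketch under stated assumptions, not the procedure of Mackiewicz et al.: the function names, the 5000 bp tolerance used to decide that two positions "agree", and the rule that abd requires all three pairs to agree are hypothetical choices made here for illustration.

```python
def circular_distance(p, q, genome_length):
    """Shortest distance between two positions on a circular chromosome."""
    d = abs(p - q) % genome_length
    return min(d, genome_length - d)


def classify_origin(asym, box, dnaa, genome_length, tolerance=5000):
    """Assign a prediction to group abd, ab, ad, bd, or O.

    asym: position of the DNA asymmetry extremum (a)
    box:  position of the DnaA box cluster (b)
    dnaa: position of the dnaA gene (d)
    tolerance: hypothetical cutoff (bp) for two positions to count as agreeing
    """
    ab = circular_distance(asym, box, genome_length) <= tolerance
    ad = circular_distance(asym, dnaa, genome_length) <= tolerance
    bd = circular_distance(box, dnaa, genome_length) <= tolerance
    if ab and ad and bd:
        return "abd"
    if ab:
        return "ab"
    if ad:
        return "ad"
    if bd:
        return "bd"
    return "O"


if __name__ == "__main__":
    # Positions straddling coordinate 0 still agree on a circular genome.
    print(classify_origin(999_900, 100, 500_000, genome_length=1_000_000))
```

Distances are measured around the circle because an origin annotated near the end of the sequence file can sit next to features annotated near its start.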
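The Z curve was developed by Academician Chun-Ting Zhang. It is a three-dimensional space curve that is uniquely equivalent to a DNA sequence, which allows genome sequences to be studied geometrically [5]. The Z curve is calculated as follows:
x_n = (A_n + G_n) - (C_n + T_n)
y_n = (A_n + C_n) - (G_n + T_n)
z_n = (A_n + T_n) - (G_n + C_n)
with x_n, y_n, z_n ∈ [-N, N] and n = 0, 1, 2, ..., N. Here A_n, G_n, C_n, and T_n are the numbers of A, G, C, and T from the first base through the n-th base, A_0 = G_0 = C_0 = T_0 = 0, and N is the length of the sequence. The three coordinates have distinct biological meanings: x, y, and z represent the distribution along the genome sequence of purine/pyrimidine (R/Y), amino/keto (M/K), and weak/strong hydrogen-bond (W/S) bases, respectively. The Z curve also includes the GC-disparity curve (x_n - y_n)/2 and the AT-disparity curve (x_n + y_n)/2. The bacterial chromosome replication origin is associated with an extremum of one of four curves: RY-disparity (x_n), MK-disparity (y_n), GC-disparity, and AT-disparity. In addition, the Z curve has been combined with other oriC sequence features (location near the dnaA gene, an AT-rich region, at least one DnaA box, and sequence conservation among closely related species) to determine replication origins systematically, and the database DoriC was built on this approach [6]. The database covers 578 genomes available as of January 18, 2008, with a location rate as high as 98.4%.
The resolution of the GC skew and cumulative skew methods is limited by window size: the longer the window, the lower the resolution. The Z curve method resolves to a single nucleotide, so it locates origins more accurately. It successfully identified the replication origin of Methanosarcina mazei, which the GC skew method could not determine, predicting that the origin lies between 1,564,657 bp and 1,556,241 bp [7].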
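The Z curve components can be computed in a single pass over the sequence, with no window. The sketch below assumes the standard third component z_n = (A_n + T_n) - (G_n + C_n); the function names z_curve and disparities are illustrative, and skipping characters other than A, C, G, and T is an assumption made here.

```python
def z_curve(seq):
    """Return lists (xs, ys, zs) of length N + 1, starting at n = 0."""
    a = g = c = t = 0
    xs, ys, zs = [0], [0], [0]          # A_0 = G_0 = C_0 = T_0 = 0
    for base in seq.upper():
        if base == "A":
            a += 1
        elif base == "G":
            g += 1
        elif base == "C":
            c += 1
        elif base == "T":
            t += 1
        else:
            continue                    # assumption: ignore N and other symbols
        xs.append((a + g) - (c + t))    # purine vs. pyrimidine (R/Y)
        ys.append((a + c) - (g + t))    # amino vs. keto (M/K)
        zs.append((a + t) - (g + c))    # weak vs. strong H-bond (W/S)
    return xs, ys, zs


def disparities(xs, ys):
    """GC-disparity (x - y) / 2 = G - C and AT-disparity (x + y) / 2 = A - T."""
    gc = [(x - y) // 2 for x, y in zip(xs, ys)]   # x - y is always even
    at = [(x + y) // 2 for x, y in zip(xs, ys)]   # x + y is always even
    return gc, at


if __name__ == "__main__":
    xs, ys, zs = z_curve("AGCT")
    print(xs, ys, zs)
    print(disparities(xs, ys))
```

Because every position gets its own value, an extremum of any of these curves can be read off at single-nucleotide resolution, for example with min(range(len(gc)), key=gc.__getitem__).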
1. Bentley, S.D., and Parkhill, J. (2004) Comparative genomic structure of prokaryotes. Annu Rev Genet, 38, 771-792.
2. Grigoriev, A. (1998) Analyzing genomes with cumulative skew diagrams. Nucleic Acids Res 26: 2286-90.
3. 包其郁,杨焕明 (2002) 原核生物基因组DNA链组成的非对称性. 微生物学报 42: 755-8.
4. Mackiewicz, P. et al. (2004) Where does bacterial replication start? Rules for predicting the oriC region. Nucleic Acids Res 32: 3781-91.
5. Zhang, C.T., Zhang, R., and Ou, H.Y. (2003) The Z curve database: a graphic representation of genome sequences. Bioinformatics 19(5): 593-599.
6. Gao, F. and Zhang, C.T. (2007) DoriC: a database of oriC regions in bacterial genomes. Bioinformatics 23: 1866-7.
7. Zhang, C.T. and Zhang, R. Single replication origin of the archaeon Methanosarcina mazei revealed by the Z curve method. Biochem Biophys Res Commun 297(2): 396-400.