GeneDoc: Analysis and Visualization of Genetic Variation
Karl B. Nicholas1, Hugh B. Nicholas Jr.2, and David W. Deerfield II2
1Bank of America; 315 Montgomery; San Francisco, CA 94127
2Pittsburgh Supercomputing Center; 4400 Fifth Avenue; Pittsburgh, PA, 15213.
* 20 * 40 * 60
4bp2 : A I SE FNNYGCYCGLGGSGTPVD : 64
1poa : N V.RS DYGCYCGRGGSGTPVD I C.....W : 62
3p2pb : A I SH FNNYGCYCGLGGSGTPVD L C.....Y : 63
1ceh : A I SE FNNYG CGLGGSGTPVD L VLVDN : 68
1pis : AL AIPGSHPL YGCYCGLGGSGTPVDE KNL CK VDNP : 68
1psha : N V.RS DYGCYCGRGGSGTPVD KI C.....W : 62
1ae7 : N N KRP MDYG CGAGGSGTPVD : 62
1buna : N TIP KT N : 63
1clpa : S.GKN YGAYG GC.....N : 59
1ppa : S.TGKN YGSYGCNCG DC.....N : 59
1pp2l : S.GRS DC.....N : 59
1aypc : N.GKE Y GT : 61
1psj : S.AKKS GC.....D : 59
l f mI YGCyCG gg G P D DRCC Hd CY C p
* 80 * 100 * 120 *
4bp2 : ..NNYSYSCS EITCS NN SK..VPY.N KNLD. C........ : 117
1poa : KTYSYECS TLTCKGGNN..APY.N : 118
3p2pb : ESYSYSCS EITCN NN K..APY.N : 119
1ceh : NNYSYSCS EITCS NN K..VPY.N : 123
1pis : YTESYSYSCSNTEITCN NN N..APY. : 124
1psha : KTYSYECS TLTCK NN G..APY.N : 119
1ae7 : KMSAYDYYCG GPYCRNIKK..APY.N : 119
1buna : QSYSYKLT TIICYGA T N..SEY.I : 120
1clpa : KDRYSYSWK TIVC.GENN N TY.NK.YYL LCK..KADAC : 121
1ppa : TDRYSYSWK AIIC.EEKN Y.N.KAY LKCK..KPDTC : 121
1pp2l : TVSYTYSEENGEIIC.GGDD DN YD..WLFP CR.EEPEPC : 122
1aypc : LSYKFSNS RITC.AKQD Y.NK.YYS CR.GSTPRC : 124
1psj : MDVYSFSEE DIVC.GGDD N YN.Y FG CP SEPC : 124
Ysy i C C C Cd aAiCf Y n C
Introduction
GeneDoc provides tools for visualizing, editing, and analyzing multiple sequence alignments of protein and nucleic acid sequences. GeneDoc embeds these tools in an explicitly evolutionary context. This context is most directly expressed as the ability to divide the sequences into groups that reflect the division of superfamilies of genes (and proteins) into distinct families. GeneDoc can analyze and visualize the groups either separately or together. Groups can also be contrasted. GeneDoc’s analysis capabilities include statistical tools that allow users to evaluate explicit biological or evolutionary hypotheses expressed in terms of specific groupings of sequences (Nicholas and Graves, 1983; Nicholas and McClain, 1995). The visualization tools are strongly integrated with the analysis tools and present the analysis results in a form that is easy to comprehend and to use in presentations. GeneDoc provides an evolutionary context for alignment editing by evaluating changes to the alignment in terms of explicit evolutionary models. GeneDoc’s analysis functions help users discover which sequence residues are important in the structural and functional roles carried out by biological macromolecules.
Editing Tools
GeneDoc’s alignment editing features help overcome the current limitations in multiple sequence alignment programs (Nicholas et al., 1995; McClure et al., 1994). Editing can incorporate structural or biochemical information about which residues should be aligned. GeneDoc's alignment scores are based on the accumulated knowledge of evolutionary processes incorporated in the empirical log-odds scoring matrices. GeneDoc provides such matrices for both protein and nucleic acid sequences (Dayhoff et al., 1978; Henikoff and Henikoff, 1992; States et al., 1991; Altschul, 1991). Scores are an objective measure of whether or not specific changes are justified for a given degree of divergence.
GeneDoc offers two different ways to compute a score for any section of your alignment. The first is sum-of-pairs scoring, which involves scoring all of the alignments between the independent pairs of sequences and adding the scores together to yield the total alignment score. While sum-of-pairs scoring is less than ideal, it results in alignments that are closer to those produced by superposition of three dimensional structures than are alignments produced by the heuristic methods. The second is weighted parsimony scoring, an alignment criterion that is more biologically desirable but imposes higher computational requirements (Sankoff and Cedergren, 1983). Weighted parsimony will result in an alignment that is most congruent with a user-specified phylogenetic tree relating the sequences. Phylogenetic trees for use with weighted parsimony scoring can be imported in either Phylip or Nexus style tree files, or can be built with the graphical tree building interface in GeneDoc. The tree can also be edited in this interface.
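As an illustration of sum-of-pairs scoring, the sketch below scores every pair of rows column by column and adds the results. It is not GeneDoc's implementation: the function names and the toy scores (+2 match, -1 mismatch, -2 residue against gap) are assumptions chosen for readability, whereas GeneDoc uses empirical log-odds matrices.

```python
from itertools import combinations

def pair_score(a, b):
    """Toy column score (assumed values, not a published matrix)."""
    if a == "-" and b == "-":
        return 0            # two gaps carry no information
    if a == "-" or b == "-":
        return -2           # residue aligned against a gap
    return 2 if a == b else -1

def sum_of_pairs(alignment):
    """Add the pairwise alignment scores over every pair of rows."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for row_a, row_b in combinations(alignment, 2):
        total += sum(pair_score(a, b) for a, b in zip(row_a, row_b))
    return total

if __name__ == "__main__":
    print(sum_of_pairs(["YGCYCG", "YGCNCG", "YG-YCG"]))
```

Swapping `pair_score` for a lookup into a log-odds matrix would leave the rest of the calculation unchanged.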
GeneDoc has two editing modes that are kept separate from each other to prevent unintended changes in the separate aspects of the alignment. The first mode is alignment editing mode. Characters in one sequence are moved relative to characters in the other sequences in this mode. The overall lengths of the sequences may be changed by either adding or removing gap characters. Gap characters may be added or removed in three ways: in the sequence currently marked by the cursor; in all of the sequences except the one marked by the cursor; or in all of the sequences. “Grab and drag” arrangement allows sequence residues to be moved without necessarily changing the number of gap characters in the sequence. The second editing mode is residue editing mode, in which the sequence residues may be changed from one value to another. This includes changing one sequence character to another and changing gap characters into sequence characters or vice versa. However, no operation that would change the sum of the sequence characters and gap characters is allowed in this mode.
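The three ways of adding a gap in alignment editing mode can be sketched as follows. The function name, the `scope` values, and the choice to pad shorter rows with trailing gaps so the alignment stays rectangular are all assumptions made for this example, not a description of GeneDoc's internals.

```python
def insert_gap(alignment, column, cursor_row, scope="cursor", gap="-"):
    """Insert one gap at `column` in the rows selected by `scope`.

    scope = "cursor": only the row marked by the cursor
    scope = "others": every row except the cursor row
    scope = "all":    every row
    """
    if scope not in ("cursor", "others", "all"):
        raise ValueError("scope must be 'cursor', 'others', or 'all'")
    rows = []
    for index, row in enumerate(alignment):
        selected = (
            scope == "all"
            or (scope == "cursor" and index == cursor_row)
            or (scope == "others" and index != cursor_row)
        )
        rows.append(row[:column] + gap + row[column:] if selected else row)
    # Assumption: pad with trailing gaps so every row has the same length.
    width = max(len(row) for row in rows)
    return [row.ljust(width, gap) for row in rows]

if __name__ == "__main__":
    for scope in ("cursor", "others", "all"):
        print(scope, insert_gap(["ACD", "ACD"], 1, 0, scope))
```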
Visualization
GeneDoc’s visualization capabilities are built around two residue display modes and six shading modes. The two residue display modes are to display all residues and to display only those residues that differ from the master sequence. The master sequence is either the consensus sequence or the first sequence, taken over the whole alignment or over a group within the alignment. The two residue display modes can be combined with any of the six shading modes.
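The "differences only" display mode can be pictured with a short sketch. The function name and the placeholder character used for residues that match the master sequence are assumptions for this example.

```python
def difference_view(master, sequences, same="."):
    """Show a residue only where it differs from the master sequence."""
    return [
        "".join(same if residue == ref else residue
                for residue, ref in zip(sequence, master))
        for sequence in sequences
    ]

if __name__ == "__main__":
    # Here the master is simply the first sequence of the alignment.
    rows = ["YGCYCG", "YGCNCG", "YGAYGG"]
    print(difference_view(rows[0], rows[1:]))
```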
Three of the shading modes are actually visual displays of widely used analyses of multiple sequence alignments. Conservation mode produces a display that highlights alignment columns that show from 1 to 4 user-defined levels of conservation. Quantify mode highlights the 1, 2, or 3 most frequent residues found in each column of the alignment, which focuses attention on the sequence positions that have evolved with a similar pattern of differentiation even though the actual residues at the position may differ. In both conservation and quantify mode the user sets the colors used for the highlighting and determines whether or not to treat conservative substitutions as if they are identical (e.g., I, L, V, M). Physiochemical properties mode analyzes each alignment position in terms of a hierarchical set of amino acid properties similar to those proposed by Dickerson and Geis (1969), and each position is shaded to identify the most exclusive set to which all of the amino acids at that position can be assigned.
The other three shading modes also highlight alignment positions according to an analysis. However, the analysis is either largely (property shading mode) or entirely (structure and manual shading modes) under the control of the user. The property shading mode allows the user to divide the possible sequence residues into an arbitrary number of sets, each assigned its own coloring scheme. The colors can then be applied to those columns where the property identified with the set is conserved, or they can be applied to every residue in the alignment.
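To make the idea of conservation levels concrete, the sketch below assigns each alignment column a level from the share of rows holding its most frequent residue. The function name and the cutoffs of 100%, 80%, and 60% are assumptions for illustration; in GeneDoc the levels are user defined, and conservative substitutions may optionally be pooled, which this sketch does not do.

```python
from collections import Counter

def conservation_levels(alignment, cutoffs=(100, 80, 60), gap_chars="-."):
    """Return one level per column: len(cutoffs) is the most conserved, 0 is unshaded.

    `cutoffs` are percentages in descending order (assumed values).
    """
    levels = []
    for column in zip(*alignment):
        residues = [c for c in column if c not in gap_chars]
        level = 0
        if residues:
            top_count = Counter(residues).most_common(1)[0][1]
            for rank, cutoff in enumerate(cutoffs):
                # Integer comparison avoids floating-point edge cases.
                if top_count * 100 >= cutoff * len(column):
                    level = len(cutoffs) - rank
                    break
        levels.append(level)
    return levels

if __name__ == "__main__":
    print(conservation_levels(["YGCA", "YGCD", "YGCE", "YGSF", "YASG"]))
```

A display layer would then map each level to one of the user's chosen colors.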
The structure shading mode allows users to define an arbitrary number of states that the sequence residues may inhabit and assign colors to each state. Users can import information about protein secondary structure or RNA folding and color specific residues in a particular sequence, a group of sequences, or the entire alignment according to that structural information. GeneDoc has provisions for importing state information from the Protein Structure database (PSdb) (Deerfield and Geigel, 1996) and from DSSP (Kabsch and Sander, 1983), both of which are derived from Brookhaven PDB files. State information may also be imported from many of the structure prediction programs on the EMBL server, as user-defined values, or from the reformatted version of the 3D_ALI database (Pascarella and Argos, 1992) available on the GeneDoc web site. User-defined values require a file that assigns the residues of a specific sequence to states defined in a file of user-created state definitions. The residues in the specific sequence will be highlighted in the corresponding color. This shading may be extended to the other sequences in the alignment or only to those in the same group as the original sequence. It is possible to shade every sequence in the alignment individually in this manner.
Manual shading allows the user to assign specific colors to individual residues with point and click ease.
Analysis
Many of GeneDoc's analyses are Kolmogorov-Smirnov (K-S) analyses of pairs of cumulative distribution functions (Sokal and Rohlf, 1995). K-S analyses provide a rigorous assessment of whether two distributions are different. The difference can be in either the location or the shape of the distributions. Thus, K-S tests are more broadly based than more common tests like Student's t test or the F test. The K-S tests use distributions of alignment scores or comparisons of sequences in terms of the percentage of identities between a pair of aligned sequences. Probably the most useful test is the analysis of whether the scores for pairs of sequences within the same group are smaller than the scores for pairs of sequences that are in different groups. A positive result for this test indicates that the grouping categories are systematically reflected in the sequences (Nicholas and Graves, 1983; Nicholas and McClain, 1995).
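The within-group versus between-group comparison can be sketched in a few lines. The example below uses percent identity as the pairwise measure and computes only the two-sided K-S statistic D, the largest gap between the two empirical cumulative distribution functions; it does not compute a significance level. All function names are assumptions, and this is not GeneDoc's own code.

```python
from bisect import bisect_right
from itertools import combinations

def percent_identity(a, b, gap_chars="-."):
    """Percentage of identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b)
             if x not in gap_chars and y not in gap_chars]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def within_and_between(sequences, groups):
    """Split all pairwise identities by whether the pair shares a group label."""
    within, between = [], []
    for (i, s), (j, t) in combinations(enumerate(sequences), 2):
        target = within if groups[i] == groups[j] else between
        target.append(percent_identity(s, t))
    return within, between

def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

if __name__ == "__main__":
    seqs = ["AAAA", "AAAC", "GGGG", "GGGT"]
    labels = ["x", "x", "y", "y"]
    within, between = within_and_between(seqs, labels)
    print(within, between, ks_statistic(within, between))
```

Note that percent identity is a similarity, so well-separated groups show larger within-group values; with a distance-like score the within-group values would be the smaller ones.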
There are two types of contrast analysis that contrast the sequences within one group with those in the other groups on a position by position basis. The PCR contrast highlights sites that meet two criteria. First, a single residue is completely conserved within the group. Second, this conserved residue does not appear, at that position, in any sequence outside of the group in which it is conserved.
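The two PCR contrast criteria translate directly into a column scan. This is a minimal sketch with assumed names; skipping columns in which the group is "conserved" as a gap character is an assumption of the example, not something the text specifies.

```python
def pcr_contrast_sites(alignment, groups, target, gap_chars="-."):
    """Return 0-based columns where `target` has one fully conserved residue
    that appears in no sequence outside the group at that column."""
    inside = [s for s, g in zip(alignment, groups) if g == target]
    outside = [s for s, g in zip(alignment, groups) if g != target]
    if not inside:
        raise ValueError("no sequences carry the target group label")
    sites = []
    for col in range(len(inside[0])):
        in_residues = {s[col] for s in inside}
        if len(in_residues) != 1:
            continue                      # criterion 1: complete conservation
        residue = next(iter(in_residues))
        if residue in gap_chars:
            continue                      # assumption: ignore conserved gaps
        if all(s[col] != residue for s in outside):
            sites.append(col)             # criterion 2: absent outside the group
    return sites

if __name__ == "__main__":
    rows = ["ACDE", "ACDF", "GCKE", "GCRE"]
    labels = ["a", "a", "b", "b"]
    print(pcr_contrast_sites(rows, labels, "a"))
```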
The group contrast analysis is less restrictive within the group than is the PCR contrast analysis. In the group contrast analysis all of the sequence residues at a site within the group are required to have a positive similarity score with each other. Residues outside of the group must have a negative similarity score with every residue from within the group.
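A sketch of the group contrast rule follows. The similarity function is a deliberately crude stand-in (residues in the same hypothetical class score +1, all other pairs score -1) rather than one of the log-odds matrices GeneDoc provides; the names and the decision to skip columns containing gaps are likewise assumptions.

```python
# Hypothetical residue classes used only for this toy similarity score.
CLASSES = ("ILVM", "DE", "KR")

def toy_similarity(a, b):
    """+1 for identical residues or residues in the same class, else -1."""
    if a == b:
        return 1
    return 1 if any(a in c and b in c for c in CLASSES) else -1

def group_contrast_sites(alignment, groups, target,
                         score=toy_similarity, gap_chars="-."):
    """Return 0-based columns where all in-group residues score positively with
    each other and every outside residue scores negatively against all of them."""
    inside = [s for s, g in zip(alignment, groups) if g == target]
    outside = [s for s, g in zip(alignment, groups) if g != target]
    if not inside:
        raise ValueError("no sequences carry the target group label")
    sites = []
    for col in range(len(inside[0])):
        in_res = {s[col] for s in inside}
        out_res = {s[col] for s in outside}
        if (in_res | out_res) & set(gap_chars):
            continue                      # assumption: skip gapped columns
        within_ok = all(score(a, b) > 0 for a in in_res for b in in_res)
        outside_ok = all(score(a, b) < 0 for a in in_res for b in out_res)
        if within_ok and outside_ok:
            sites.append(col)
    return sites

if __name__ == "__main__":
    rows = ["IDKI", "LEKL", "VDAV", "DKRM"]
    labels = ["a", "a", "a", "b"]
    print(group_contrast_sites(rows, labels, "a"))
```

Passing a different `score` callable, for example a lookup into a substitution matrix, changes which substitutions count as similar without altering the rule itself.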
Files
GeneDoc uses GCG’s msf file format as its primary file type, using the header region to store information about residue display and shading modes along with a large number of user configuration choices. In addition to the msf files, sequences may be read from or written to Clustal W aln files, Pearson FASTA files, and PIR formatted files. Aligned sequences can also be written to Phylip interleaved files.
Graphic results can be sent to the printer or to a PostScript file by using an appropriate printer driver. Highlighted results can also be exported in Windows Enhanced Meta Files or in Macintosh style PICT files.
Summary
GeneDoc is a full featured multiple sequence alignment visualization, editing, and analysis tool. It has an easy-to-use point and click user interface with extensive keyboard mapping for advanced users. In addition to the features described above there are many more features and additional details in the extensive context sensitive help files that come with the program.
Figure 1 shows an alignment of 13 phospholipases A2. The shading for each sequence indicates the secondary structure state of the residue as derived from the three dimensional coordinates taken from the Brookhaven PDB file that is used to label the sequence. The secondary structure states were computed using the four state PSdb model (Deerfield and Geigel, 1996). The alignment and PSdb files used to create the figure are available on the GeneDoc web site.
GeneDoc version 2.1 runs on any IBM compatible personal computer under Windows 3.1, Windows 95, or Windows NT. It can be obtained at no cost over the World Wide Web at: /~Ketchup/genedoc.shtml. Thanks to Russell Malmberg, a version that runs on DEC Alpha workstations under Windows NT is available at: dogwood.botany.uga.edu/malmberg/software.html. GeneDoc has benefited from the comments, suggestions, and error reports from a number of early users. Additional feedback is welcomed by KBN at “”.
References
Altschul, S.F. 1991. J. Mol. Biol., vol. 219, pp. 555 - 565.
Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C. 1978. In "Atlas of Protein Sequence and Structure" vol. 5(3)
M.O. Dayhoff (ed.), National Biomedical Research Foundation, Washington. pp. 345 - 352.
Deerfield, D.W., II and Geigel, J., 1996. www.psc.edu/biomed/pages/rearch/PSdb/PSdbPaper/
Henikoff S. and Henikoff, J.G. 1992. Proc. Natl. Acad. Sci. USA. vol. 89, pp. 10915 - 10919.
Dickerson, R.E. and Geis, I. 1969 The Structure and Action of Proteins. pp. 16 - 17. Harper &
Row Publishers, New York, NY
Kabsch, W. and Sander, C. 1983 Biopolymers, vol. 22, pp. 2577 - 2637.
McClure, M.A., Vasi, T.K., and Fitch, W.M. 1994 Mol. Biol. Evol. vol. 11, pp. 571 - 592.
Nicholas, H.B. Jr., and Graves, S.B. 1983 J. Mol. Biol., vol. 171, pp. 111 - 118.
Nicholas, H.B. Jr. and McClain, W.H. 1995. Journal of Molecular Evolution, vol. 40, pp. 482-486.
Nicholas, H.B. Jr., Ropelewski, A.J., Deerfield, D.W. II., and Behrmann, J.G. 1995 Proceedings of the 10th International Conference on Methods in Protein Structure, Eds. M.Z. Atassi and E. Appella,
Plenum Press, New York pp. 515 - 525.
Pascarella, S. and Argos, P. 1992 Prot. Engng., vol. 5, pp. 121 - 137.
Sankoff, D. and Cedergren, R.J. 1983. In "Time Warps, String Edits, and Macromolecules: The
Theory and Practice of Sequence Comparison." D. Sankoff and J.B. Kruskal (eds.) pp. 253 - 263.
Addison-Wesley, Reading, MA
Sokal, R.R. and Rohlf, F.J. 1995 Biometry, 3rd ed. W.H. Freeman & Co. New York, NY.
States, D.J., Gish, W., and Altschul, S.F. 1991. Methods, vol. 3, pp. 66 - 70.