J. Japan. Soc. Hort. Sci. 78 (1): 23–31. 2009.
Available online at www.jp/brow/jjshs1
JSHS © 2009
Review
Web Databases for Omics Data in Tomato
Ayako Suzuki1**, Keita Suwabe2** and Kentaro Yano1*
1Faculty of Agriculture, Meiji University, Kawasaki 214-8571, Japan
2Graduate School of Life Sciences, Tohoku University, Sendai 980-8577, Japan
Tomato (Solanum lycopersicum) is a model plant of the Solanaceae family. Various biological aspects of tomato, such as the mechanisms underlying its development and physiology, have been investigated with molecular biological approaches. In addition, the numbers of genome sequences and expressed sequence tags in the International Nucleotide Sequence Databases have been increasing rapidly. Genome and transcript sequence data have facilitated other large-scale omics studies using bioinformatics analyses. Recently, omics data, including experimental materials, sequences, gene expression, gene and protein functions, and metabolic pathways, have become available from various web databases. This wealth of comprehensive online resources allows essential new biological information to be extracted. In this review, we summarize the current status of omics databases for tomato researchers.
Key Words: bioinformatics, database, omics, tomato.
Introduction
Tomato (Solanum lycopersicum, formerly Lycopersicon esculentum, 2n = 24) is a vegetable crop consumed worldwide and a model plant of the Solanaceae family. The family includes various vegetable crops (e.g., potato, pepper, and eggplant), ornamental plants (petunia), medicinal plants (Capsicum and Datura), and plants for other uses, such as tobacco. Various biological aspects of tomato, such as carotenoid biosynthesis, hormonal effects, fruit development, and pathogenesis, have long been investigated with physiological and genetic approaches (Bramley, 2002; Gorguet et al., 2005; Hagimori et al., 2005; Pedley and Martin, 2003). In 1986, tomato was used to generate the first extensive linkage map in higher plants, based on restriction fragment length polymorphism (RFLP) markers (Bernatzky and Tanksley, 1986).
Since DNA sequencing approaches began in the 1980s, tomato sequences have been deposited in the International Nucleotide Sequence Databases (INSD) (Brunak et al., 2002). INSD is maintained by DDBJ (Sugawara et al., 2008), EMBL (Cochrane et al., 2008), and GenBank (Benson et al., 2008). In 2004, the international research community launched the International Solanaceae Genome Project (SOL) consortium, one aim of which is to sequence the genome of the tomato cultivar ‘Heinz 1706’ (Mueller et al., 2005b). Owing to recent progress in this sequencing project, the number of tomato genome survey sequences (GSS) has been increasing rapidly (Table 1). Apart from genome sequencing, approximately 260,000 tomato expressed sequence tags (ESTs) are now available from INSD (Table 1), the largest number of ESTs among vegetable crops (Table 2). Full-length cDNAs (high-throughput cDNA sequences; HTCs) from the miniature tomato cultivar ‘Micro-Tom’ have also accumulated in INSD. The rapid accumulation of tomato sequence data in INSD enables comparative genomic studies between tomato and other plants. In such analyses, three complete genome sequences help to identify orthologous genes and tomato-specific genes: Arabidopsis thaliana (120 Mb) (Arabidopsis Genome Initiative, 2000), rice (Oryza sativa; 390 Mb) (International Rice Genome Sequencing Project, 2005), and Populus trichocarpa (480 Mb) (Tuskan et al., 2006). Information on tomato-specific genes would facilitate the elucidation of gene functions and metabolic pathways that are characteristic of tomato. Genome and transcript sequence information has also facilitated other omics studies in tomato. Several cDNA microarray platforms have been designed to analyze genome-wide gene expression profiles in tomato (Alba et al., 2004; Frick and Schaller, 2002; Yano et al., 2006a)
Received; September 14, 2008. Accepted; November 8, 2008.
*Corresponding author (E-mail: iji.ac.jp).
**The authors contributed equally to this review article.
and, more recently, metabolite profiling has also been investigated intensively (Iijima et al., 2008; Saito et al., 2008b, 2008c). Various kinds of tomato omics data are available from web databases, including functional and structural annotations of the genome and gene products, gene expression, and metabolic pathways. These public data can contribute not only to individual omics studies but also to the emerging field of systems biology. Pioneering studies and developments in tomato functional genomics have been well covered in previous reviews (Fray and Grierson, 1993; Mysore et al., 2001; Rick, 1991; Shibata, 2005; Yano et al., 2007). Here, we review the current status of web databases that provide fundamental omics data, and we focus on their potential for research in experimental tomato biology. The websites mentioned in this article are summarized in Table 3.
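Comparative analyses like those described above are often framed in terms of putative orthologs. One widely used heuristic is the reciprocal best hit (RBH), although the text does not say which method any particular study used. Under RBH, gene a in genome A and gene b in genome B are paired when each is the other's highest-scoring match. The minimal sketch below assumes the pairwise similarity scores (for example, BLAST bit scores) have already been computed. All gene IDs and scores are hypothetical. A gene without an RBH is only a candidate tomato-specific gene, since it may simply have diverged or be missing from an incomplete assembly.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """For each query gene, keep the subject gene with the highest score."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: dict[tuple[str, str], float],
                         b_vs_a: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    """Return (a, b) pairs where each gene is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical similarity scores: tomato genes (tom*) vs. Arabidopsis genes (at*).
tomato_vs_at = {
    ("tomA", "atX"): 250.0, ("tomA", "atY"): 90.0,
    ("tomB", "atY"): 180.0,
    ("tomC", "atY"): 120.0,
}
at_vs_tomato = {
    ("atX", "tomA"): 245.0,
    ("atY", "tomB"): 175.0, ("atY", "tomC"): 118.0,
    ("atZ", "tomC"): 60.0,
}

orthologs = reciprocal_best_hits(tomato_vs_at, at_vs_tomato)
tomato_genes = {query for query, _ in tomato_vs_at}
no_rbh = sorted(tomato_genes - {a for a, _ in orthologs})
print(orthologs)  # [('tomA', 'atX'), ('tomB', 'atY')]
print(no_rbh)     # ['tomC']
```

Here the best Arabidopsis hit for tomC (atY) prefers tomB, so tomC receives no RBH. That flags tomC for closer inspection; it does not confirm tomC as tomato-specific.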
Linkage maps
Information on tomato linkage maps is available on two websites: the SOL Genomics Network (SGN) and the National Center for Biotechnology Information (NCBI). SGN provides information on seven genetic linkage maps. These were constructed from segregating populations and inbred lines derived from crosses between tomato cultivars and closely related wild species (Mueller et al., 2008). Many kinds of DNA markers, such as RFLPs, single nucleotide polymorphisms (SNPs), cleaved
Table 1. Tomato sequences provided by NCBI.
Date        Protein  Nucleotide z    EST y    GSS x    Total
Jan., 1995      459          500        0        0      500
Jan., 2000    1,199        1,221   50,241    1,251   52,713
Jan., 2004    1,895        3,254  150,519   11,895  165,668
Jan., 2005    2,054        5,425  188,556   11,895  205,876
Jan., 2006    2,407        5,821  198,848  184,832  389,501
Jan., 2007    2,798        6,969  250,953  319,031  576,953
Jan., 2008    2,964        9,904  258,789  319,461  588,154
z High-quality nucleotide sequences.
y Expressed sequence tag.
x Genome survey sequence.
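As a consistency check on Table 1, the sketch below transcribes its rows. It confirms that the Total column equals Nucleotide + EST + GSS in every row, so protein records are not included in the total. It then computes simple fold changes that quantify the growth described in the Introduction. The helper name `fold_change` is illustrative only.

```python
# Counts transcribed from Table 1 (tomato sequences in NCBI, January of each year).
# Column order: protein, nucleotide, EST, GSS, total.
table1 = {
    1995: (459, 500, 0, 0, 500),
    2000: (1_199, 1_221, 50_241, 1_251, 52_713),
    2004: (1_895, 3_254, 150_519, 11_895, 165_668),
    2005: (2_054, 5_425, 188_556, 11_895, 205_876),
    2006: (2_407, 5_821, 198_848, 184_832, 389_501),
    2007: (2_798, 6_969, 250_953, 319_031, 576_953),
    2008: (2_964, 9_904, 258_789, 319_461, 588_154),
}
PROTEIN, NUCLEOTIDE, EST, GSS, TOTAL = range(5)

# In every row, Total equals Nucleotide + EST + GSS; protein records are not counted.
for year, row in table1.items():
    assert row[NUCLEOTIDE] + row[EST] + row[GSS] == row[TOTAL], year


def fold_change(column: int, start: int, end: int) -> float:
    """Ratio of the count in `end` to the count in `start` for one column."""
    return table1[end][column] / table1[start][column]


print(f"EST, 2000 -> 2008: {fold_change(EST, 2000, 2008):.1f}x")  # 5.2x
print(f"GSS, 2005 -> 2008: {fold_change(GSS, 2005, 2008):.1f}x")  # 26.9x
```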
Table 2. Twenty plants ranked on the basis of the number of publicly available ESTs.
z The numbers of expressed sequence tags (ESTs), genome survey sequences (GSS), high-quality nucleotide sequences (Nucleotide) and proteins were derived from NCBI on August 20, 2008.
y Genome sizes and the numbers of haploid chromosomes (n) were derived from the Entrez Genome Project database (bi.v/entrez/query.fcgi?db=genomeprj, December 2, 2008).
x ND: no data in the Entrez Genome Project database.
Species
EST z GSS z Nucleotide z Protein z Genome size (Mb)y
n y Arabidopsis thaliana (thale cress)1,526,133487,777230,830138,4931205Zea mays (maize)1,464,8592,091,938161,76622,0082,40010Oryza sativa (rice)1,220,877286,566215,275224,28339012Triticum aestivum (wheat)1,050,79151,8308,7345,28816,00021Glycine max (soybean)
838,858368,512
5,8443,4811,20020Hordeum vulgare + subsp. vulgare (barley)392,5011789,0712,8535,0007Vitis vinifera (wine grape)352,984229,27291,21755,88050019Pinus taeda (loblolly pine)328,6285,393
4,0352,088ND x ND Physcomitrella patens subsp. Patens 305,606078,50672,12551027Picea glauca
272,46441,8791,8313,00017Solanum lycopersicum (tomato)259,921319,461
12,1643,26695012Malus × domestica (apple tree)255,659351,6411,30975017Saccharum officinarum (sugarcane)252,6870642491ND ND Medicago truncatula (barrel medic)249,625168,8155,3885,2015008Solanum tuberosum (potato)230,804141,4713,6723,236840
12Sorghum bicolor (sorghum)209,814794,962
10,6924,73276010Citrus sinensis 203,75203284753809Lotus japonicus 157,95146,569
112,9847334706Helianthus annuus 94,11505,7402,5603,00017Populus trichocarpa
89,943
297
115,060
4,394
480
19
Table 3. Major public databases containing biological information on tomato.
Database Contents
Sequence databases
INSD
(www.insdc/)The International Nucleotide Sequence Databases, maintained by DDBJ, EMBL and GenBank.
DDBJ
(www.ddbj.nig.ac.jp/)Nucleotide sequence database at the National Institute of Genetics (NIG) in Japan.
EMBL
(www.ebi.ac.uk/embl)Nucleotide sequence database at the European Bioinformatics Institute (EBI) in the UK.
GenBank
(bi.v/Genbank/index.html)Nucleotide sequence database maintained by the National Center for Biotechnology Information (NCBI) in the USA.
dbEST
(bi.v/dbEST/index.html)
EST database for each organism in NCBI.
UniGene
(bi.v/entrez/query.fcgi?db=unigene)
UniGene database for each organism in NCBI.
dbGSS
(bi.v/dbGSS/index.html)
Genome survey sequence (GSS) database for each organism in NCBI.
Tomato Integrated Databases
The International Tomato Sequencing Project
(ll.edu/about/tomato_quencing.pl)
Web-site of the International Tomato Sequencing Project.
SOL
(ll.edu/solanaceae-project/)
Web-site of the International Solanaceae Genome Project.
SGN
(ll.edu/index.pl)Genomic, genetic and taxonomic information for species in the Solanaceae and related families.
DFCI Tomato Gene Index
(compbio.dfci.harvard.edu/tgi/cgi-bin/tgi/gimain.pl?gudb=tomato)
Database of publicly available tomato ESTs and unigenes.
MiBASE
(jp/jsol/microtom/)Database of tomato unigenes, including ESTs from Micro-Tom, gene expression, metabolic pathways and gene ontology annotations.
TomDB
(mips.gsf.de/proj/plant/jsf/tomato/index.jsp)
Tomato genome database.
Tomato Full-length cDNA
KaFTom
(www.jp/kaftom/index.html)Database of ESTs and full-length sequences from tomato full-length cDNA libraries and their annotations.
Chromosome maps
Solanum lycopersicum (tomato) genome view
(bi.v/mapview/i?taxid=4081)
Database of tomato linkage maps in NCBI.
Gene expression
GEO
(bi.v/geo/)Provides data from microarray, serial analysis of gene expression (SAGE), and mass spectrometry proteomics experiments.
ArrayExpress
(www.ebi.ac.uk/Databas/microarray.html)
Database of gene expression data from microarray experiments in EBI.
SGED
(www.tigr/tdb/potato/SGED_index2.shtml)
Database of Solanaceae expression data obtained with potato cDNA microarrays.
TED
(ll.edu/)Tomato microarray data warehouse and databases of tomato microarray expression data and tomato digital expression data.
CGEP
(ll.edu/CGEP/CGEP.html)Web-site of The Center for Gene Expression Profiling (CGEP) for high-quality tomato cDNA microarrays.
Metabolite and Metabolic Pathway
KEGG
(jp/kegg/)Databases of metabolic pathways, genes, protein families, ligands, drugs, diseases, and so on.
SolCyc
(ll.edu/LYCO/rver.html)
Tomato metabolic pathway database.
TOMET
(ll.edu)Tomato Metabolite Database (TOMET) contains data on metabolites such as ascorbate, carotenoids and sugars.
Metabolome Tomato Database (MoTo DB) (appliedbioinformatics.wur.nl/moto/)Metabolite database dedicated to liquid chromatography-mass spectrometry-based metabolomics of tomato fruit.
Genomic Resources
TGRC
(tgrc.ucdavis.edu)Genebank of wild relatives, monogenic mutants and miscellaneous genetic stocks of tomato at Tomato Genetics Resource Center.
Others
RAP-DB
(rapdb.lab.nig.ac.jp/)Database of The Rice Annotation Project providing access to rice annotation data.
TAIR
(www.arabidopsis/)The Arabidopsis Information Resource (TAIR), which releases a database of genetic and molecular biology data for Arabidopsis thaliana.
Populus trichocarpa Genome
(genome.jgi-psf/Poptr1/Poptr1.home.html)
Database of the Populus trichocarpa genome.
TIGR
(www.tigr/index.shtml)Web-site of The Institute for Genomic Research (TIGR) providing information for analyzing genomes.
The NCBI Entrez Genome Project database
(bi.v/entrez/query.fcgi?db=genomeprj)Database of complete and incomplete large-scale sequencing projects for cellular organisms.
Gene ontology
(tology/index.shtml?all)
Web-site of Gene Ontology Consortium.
Plant ontology
(www.plantontology/index.html)
Web-site of Plant Ontology Consortium.
amplified polymorphic sequences (CAPS), and simple sequence repeats (SSRs), have been assigned to these maps. The map “Tomato-EXPEN1992” is based on an F2 population from the cultivar ‘VF36-Tm2a’ and S. pennellii (formerly L. pennellii) ‘LA716’; it currently has two CAPS and 919 RFLP markers (Tanksley et al., 1992). “Tomato-EXHIR1997” is derived from an interspecific backcross of the cultivar ‘TA209 (E6203)’ and S. habrochaites (formerly L. hirsutum) ‘LA1777’ and consists of 134 RFLP markers (Bernacchi and Tanksley, 1997). “Tomato-EXPEN2000” is based on an F2 population of S. lycopersicum ‘LA925’ and S. pennellii ‘LA716’ and includes 1,088 CAPS, 1,342 RFLP, 19 SNP, and 155 SSR markers (Fulton et al., 2002). “Tomato-EXPIMP2001” is derived from a cross of the cultivar ‘TA209’ with backcross (BC) and backcross recombinant inbred lines (BCRILs) of S. pimpinellifolium (formerly L. pimpinellifolium) ‘LA1589’. It includes one CAPS and 143 RFLP markers (Doganlar et al., 2002; Grandillo and Tanksley, 1996; Tanksley et al., 1996). “Tomato-EXPIMP2008” is derived from a cross of the cultivar ‘TA492’ and S. pimpinellifolium ‘LA1589’ and includes 36 CAPS, 68 RFLP, and 77 SSR markers. Two other linkage maps were constructed from introgression lines (ILs) between S. pennellii and S. lycopersicum ‘M82’, based on DNA markers in the maps “EXPEN1992” and “EXPEN2000”. All of this information is in the process of being integrated. NCBI provides seven linkage maps:
- “Gene_92”, with 231 markers
- “LEXHIR_97”, with 135 markers
- “LEXLP_86”, with 112 markers
- “LEXLP_92”, with 692 markers
- “LEXPIMP_01”, with 145 markers
- “Paterson_88”, with 67 markers
- “Bonierbale_88”, with 126 markers
Both SGN and NCBI provide map viewers that show the relationships among the different linkage maps.
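The map viewers mentioned above relate maps to one another through the markers they share. The sketch below illustrates that idea numerically. It is not how the SGN or NCBI viewers are implemented. It finds the markers common to two maps and reports the fraction of shared-marker pairs placed in the same order on both maps, so that local inversions show up as values below 1. Marker names and cM positions are hypothetical.

```python
from itertools import combinations


def shared_markers(map_a: dict[str, float], map_b: dict[str, float]) -> list[str]:
    """Markers placed on both maps."""
    return sorted(set(map_a) & set(map_b))


def order_concordance(map_a: dict[str, float], map_b: dict[str, float]) -> float:
    """Fraction of shared-marker pairs that appear in the same order on both maps.

    Pairs with identical positions on either map count as discordant.
    """
    pairs = list(combinations(shared_markers(map_a, map_b), 2))
    if not pairs:
        raise ValueError("need at least two shared markers")
    same_order = sum(
        (map_a[x] - map_a[y]) * (map_b[x] - map_b[y]) > 0 for x, y in pairs
    )
    return same_order / len(pairs)


# Hypothetical marker positions (cM) on one chromosome in three maps.
map_a = {"M1": 0.0, "M2": 12.5, "M3": 30.1, "M4": 45.0}
map_b = {"M1": 2.0, "M2": 10.0, "M3": 28.0, "M5": 60.0}
map_c = {"M1": 2.0, "M2": 28.0, "M3": 10.0}  # M2 and M3 inverted

print(shared_markers(map_a, map_b))     # ['M1', 'M2', 'M3']
print(order_concordance(map_a, map_b))  # 1.0
print(order_concordance(map_a, map_c))
```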
Bacterial artificial chromosomes and physical maps
SGN provides fundamental information relevant to tomato genome sequencing, such as:
- plant materials, genes, ESTs, and DNA markers
- bacterial artificial chromosome (BAC) clones anchored to the linkage maps (called “seed BAC” clones) and their nucleotide sequences
- linkage maps and physical maps
For physical mapping and whole-genome sequencing, BAC libraries were constructed from the tomato cultivar ‘Heinz 1706’ using DNA partially digested with HindIII, MboI, or EcoRI (Budiman et al., 2000). The BAC libraries represent more than 15-fold coverage of the tomato haploid genome and contain 400,000 clones, with an average insert size of 117.5 kb. For genome sequencing, 88,642 BACs from the libraries were fingerprinted at the Arizona Genome Institute to support BAC-by-BAC sequencing (Mueller et al., arizona.edu/fpc/tomato/). To construct a tomato physical map and anchor it to “Tomato-EXPEN2000”, the fingerprinted BACs were hybridized with 1,536 overlapping oligonucleotide (“overgo”) probes. These probes were generated from markers on the current high-density linkage map (Cai et al., 1998). Overgo probes that matched corresponding BACs are referred to as “anchor points”, and more than 650 anchor points on the map are currently available. The anchored BAC clones (seed BAC clones) have been sequenced as the starting point of the tomato genome sequencing project. The chromosomal positions of the BAC clones were also verified by fluorescence in situ hybridization (FISH). A FISH map has been constructed with labeled BAC probes at the pachytene phase (Mueller et al., 2005a; ll.edu/cview/map.pl?map_id=p9), and 10,258 BACs are currently linked to it.
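The library figures above support a back-of-the-envelope check. Total cloned sequence divided by genome size gives the number of genome equivalents. The standard Clarke-Carbon formula, N = ln(1 - P) / ln(1 - f), estimates how many random clones are needed for a given locus to be represented with probability P, where f is the insert size as a fraction of the genome. This minimal sketch takes the approximate 950 Mb genome size from the following section. It assumes random, unbiased cloning across all 400,000 clones pooled together. The resulting figure (about 49 genome equivalents) is therefore an idealized value. It is consistent with the "more than 15-fold" coverage stated above but is not a measured coverage.

```python
import math

N_CLONES = 400_000   # clones in the 'Heinz 1706' BAC libraries
INSERT_KB = 117.5    # average insert size (kb)
GENOME_MB = 950.0    # approximate tomato genome size (Mb)


def genome_equivalents(n_clones: int, insert_kb: float, genome_mb: float) -> float:
    """Total cloned sequence divided by genome size (naive; ignores overlap and bias)."""
    return n_clones * insert_kb / (genome_mb * 1_000)


def clones_needed(prob: float, insert_kb: float, genome_mb: float) -> int:
    """Clarke-Carbon estimate of the number of random clones needed so that a
    given locus is present in the library with probability `prob`."""
    f = insert_kb / (genome_mb * 1_000)  # fraction of the genome in one insert
    return math.ceil(math.log(1 - prob) / math.log(1 - f))


print(f"{genome_equivalents(N_CLONES, INSERT_KB, GENOME_MB):.1f} genome equivalents")
print(clones_needed(0.99, INSERT_KB, GENOME_MB), "clones for 99% probability")
```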
Genomic sequencing and annotation
SGN provides up-to-date information on the progress of the genome sequencing project. The project's methodology is based on deep BAC-end sequencing of 400,000 BAC clones. Its current aim is to completely sequence the euchromatic regions (approximately 220 Mb) on the distal portions of each chromosome arm, which contain the majority of the genes in the tomato genome. The rest of the tomato genome (approximately 950 Mb in total; Arumuganathan and Earle, 1991) consists of pericentromeric heterochromatin. This heterochromatin is largely devoid of genes (http://ll.edu/about/tomato_project_overview.pl, December 2, 2008) and contains a higher proportion of repetitive sequences, which hamper whole-genome sequencing. Approximately 1,500 seed BAC clones have been anchored to the tomato high-density genetic map to guide genome sequencing (Mueller et al., 2005a). To date, 31% of the euchromatic arms have been sequenced through the collaboration of 12 countries.
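The proportions above can be converted into absolute sizes. This small worked calculation assumes that the reported 31% refers to sequence length within the approximately 220 Mb euchromatic target. The text does not state this explicitly, so treat the resulting megabase figures as rough.

```python
EUCHROMATIN_MB = 220.0   # euchromatic target of the sequencing project
GENOME_MB = 950.0        # approximate total genome size
FRACTION_DONE = 0.31     # share of the euchromatic arms sequenced to date (assumed length-based)

euchromatic_share = EUCHROMATIN_MB / GENOME_MB
sequenced_mb = FRACTION_DONE * EUCHROMATIN_MB
remaining_mb = EUCHROMATIN_MB - sequenced_mb

print(f"euchromatin: {euchromatic_share:.0%} of the genome")  # euchromatin: 23% of the genome
print(f"sequenced: ~{sequenced_mb:.0f} Mb; remaining: ~{remaining_mb:.0f} Mb")
```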
SGN also provides information about gene sequences, exons inferred by computational methods, homologous sequences of tomato ESTs, tomato unigenes (described below), Arabidopsis protein sequences, and potato ESTs. Genomic sequences, BAC-end sequences, and homologous sequences are visualized graphically with the multiple alignment viewer “Genome browser (GBrowse)” (Stein et al., 2002).
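Genome browsers such as GBrowse display tracks built from feature annotations. These annotations are commonly exchanged in the GFF3 format: nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). The minimal sketch below parses GFF3 feature lines into Python objects. It is not GBrowse's own loader. The sequence ID, source label, and coordinates are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import unquote


@dataclass
class Feature:
    seqid: str
    source: str
    type: str
    start: int                 # 1-based, inclusive (GFF3 convention)
    end: int                   # 1-based, inclusive
    score: Optional[float]     # None when the column is "."
    strand: str
    phase: str
    attributes: dict = field(default_factory=dict)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def parse_gff3_line(line: str) -> Optional[Feature]:
    """Parse one GFF3 feature line; return None for blank lines and comments."""
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attr_col = cols
    attributes = {}
    for item in attr_col.split(";"):
        if item:
            key, _, value = item.partition("=")
            attributes[unquote(key)] = unquote(value)  # undo %XX escapes
    return Feature(unquote(seqid), source, ftype, int(start), int(end),
                   None if score == "." else float(score),
                   strand, phase, attributes)


# Hypothetical three-feature track (sequence ID and coordinates are made up).
example = (
    "##gff-version 3\n"
    "scaffold_demo\tsketch\tgene\t1001\t2500\t.\t+\t.\tID=gene1;Name=demo%20gene\n"
    "scaffold_demo\tsketch\tmRNA\t1001\t2500\t.\t+\t.\tID=mRNA1;Parent=gene1\n"
    "scaffold_demo\tsketch\texon\t1201\t1450\t.\t+\t.\tID=exon1;Parent=mRNA1\n"
)
features = [f for f in map(parse_gff3_line, example.splitlines()) if f is not None]
for f in features:
    print(f.type, f.start, f.end, f.length, f.attributes)
```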
In addition to this project, a proposal for sequencing the genome of a close wild relative, S. pennellii, is in progress. These sequencing projects are part of the SOL-100 project, which aims to sequence and phenotype 100 diverse Solanaceae species.
Full-length cDNA sequences
Information on tomato full-length cDNA clones is provided by INSD and KaFTom. Full-length cDNA clones are fundamental resources for functional genomics, as well as for the detection of intron-exon boundaries. The first full-length tomato cDNA library
was constructed from the fruit of ‘Micro-Tom’ (Tsugane et al., 2005). Full-length cDNA libraries were subsequently constructed from fruit at different developmental stages and from pathogen-treated leaves. To date, 57,422 ESTs and 2,268 HTCs have been deposited in INSD (August 2008). The KaFTom database provides intron-exon structures, InterProScan (Zdobnov and Apweiler, 2001) annotations, and BLAST (Altschul et al., 1990) annotations. The intron-exon structures were estimated from alignments between HTCs and genomic sequences using est2genome (Mott, 1997) and BLASTN. As part of the National BioResource Projects (NBRP) in Japan, the sequencing, maintenance, and distribution of full-length cDNA clones from ‘Micro-Tom’ have been carried out at the Kazusa DNA Research Institute since 2007.
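The idea of inferring intron-exon structure from a cDNA-to-genome alignment can be illustrated with a deliberately simplified stand-in. This is not est2genome or BLASTN, which score mismatches and gaps and model splice sites explicitly. The sketch covers an error-free cDNA with successive exact, collinear matches in the genome. It treats the genomic gaps between matches as introns and checks whether each intron carries the canonical GT...AG splice-site dinucleotides. The sequences are hypothetical, and the greedy matching can misplace a boundary when exon and intron ends share bases.

```python
def infer_exons(cdna: str, genome: str, min_len: int = 8) -> list[tuple[int, int]]:
    """Toy spliced alignment: cover the cDNA with successive exact, collinear
    matches in the genome. Returns 0-based, half-open genomic intervals."""
    exons: list[tuple[int, int]] = []
    c_pos = g_pos = 0
    while c_pos < len(cdna):
        best_len, best_start = 0, -1
        k = min_len
        # Grow the cDNA segment while it still occurs downstream of the last exon.
        while c_pos + k <= len(cdna):
            start = genome.find(cdna[c_pos:c_pos + k], g_pos)
            if start == -1:
                break
            best_len, best_start = k, start
            k += 1
        if best_len == 0:
            raise ValueError(f"no exact match of length >= {min_len} at cDNA position {c_pos}")
        exons.append((best_start, best_start + best_len))
        c_pos += best_len
        g_pos = best_start + best_len
    return exons


def introns(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Genomic gaps between consecutive exons."""
    return [(prev_end, next_start) for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def has_gt_ag(genome: str, intron: tuple[int, int]) -> bool:
    """True if the intron starts with GT and ends with AG (canonical splice sites)."""
    start, end = intron
    return genome[start:start + 2] == "GT" and genome[end - 2:end] == "AG"


# Hypothetical two-exon gene embedded in flanking sequence.
exon1, intron1, exon2 = "ATGGCAGCTTCA", "GTAAGTATTTTTCAG", "CCGGAATTCTGA"
genome = "CCCC" + exon1 + intron1 + exon2 + "TTTT"
cdna = exon1 + exon2

exons = infer_exons(cdna, genome)
print(exons)                                           # [(4, 16), (31, 43)]
print([has_gt_ag(genome, i) for i in introns(exons)])  # [True]
```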
ESTs and unigenes
INSD and other public databases provide information on tomato ESTs and on non-redundant sequence sets. ESTs provide information on transcript sequences and on the spatiotemporal expression patterns of genes at various developmental stages. Currently, INSD provides information on 259,921 tomato ESTs (Table 2). Non-redundant consensus sequences can be established by clustering and/or assembling EST sequences. They enable computational approaches such as homology searches and functional annotation to be applied to the analysis of putative gene functions. Several different terms, such as “unigenes”, “Unigenes”, “UNIGENEs”, and “tentative consensus (TC)”, refer to non-redundant sequence sets in databases.
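The sketch below illustrates only the clustering half of building a non-redundant set. ESTs that share at least one exact k-mer are linked, and linked groups are merged with a union-find structure (single-linkage clustering). The k-mer length of 12, the EST names and the sequences are hypothetical. Production pipelines also handle reverse-complement reads and sequencing errors, and they assemble a consensus sequence for each cluster. This code does none of that, so its output resembles a cluster of accessions rather than a consensus unigene sequence.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests: dict[str, str], k: int = 12) -> list[list[str]]:
    """Single-linkage clustering: ESTs sharing at least one k-mer end up together."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen: dict[str, str] = {}  # k-mer -> first EST that contained it
    for name, seq in ests.items():
        for kmer in kmers(seq.upper(), k):
            if kmer in first_seen:
                union(first_seen[kmer], name)
            else:
                first_seen[kmer] = name

    groups: dict[str, list[str]] = defaultdict(list)
    for name in ests:
        groups[find(name)].append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical ESTs: the first two overlap by 14 nt; the third is unrelated.
ests = {
    "EST_001": "ATGGCTAGCTAGGATCCGATCGATCG",
    "EST_002": "GATCCGATCGATCGTTAAGGCCTTAG",
    "EST_003": "CACACACAGTGTGTGTCACACACAGT",
}
clusters = cluster_ests(ests)
print(clusters)  # [['EST_001', 'EST_002'], ['EST_003']]
```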
Information on tomato unigene sequences is provided by the DFCI Tomato Gene Index database (Lee et al., 2005), SGN (Mueller et al., 2005a), and MiBASE (Yano et al., 2006b). The DFCI Tomato Gene Index (release 12.0, June 15, 2008) provides information on 46,849 unigene sequences, which were generated by assembling and clustering 333,385 tomato ESTs. It also provides ORFs (open reading frames), alternative splicing sequences, SNPs, corresponding homologous protein sequences, gene ontology (GO) terms (Gene Ontology Consortium, 2008), and cDNA libraries. Further entries include Enzyme Commission (EC) numbers, KEGG metabolic pathways, unique 70-mer oligonucleotide sequences, and orthologues in other organisms. SGN provides information on 34,829 unigene sequences assembled from 239,172 ESTs (version Tomato 200607). Alongside these, it offers EST libraries, microarray resources, DNA markers, manual annotations, predicted peptide sequences, and InterProScan annotations (including domains, GO terms, and gene families). Its BLAST annotations are based on the non-redundant amino acid sequence database (nr) at NCBI and on the Arabidopsis protein database at The Arabidopsis Information Resource (TAIR) (Rhee et al., 2003). MiBASE provides information on 26,363 tomato unigenes (version KTU1, January 2005). These were assembled from 150,581 ESTs in dbEST (Boguski et al., 1993) and 35,824 ‘Micro-Tom’ ESTs from fruit and leaves (Yamamoto et al., 2005; Yano et al., 2006b). MiBASE contains BLAST annotations based on the Arabidopsis translated protein sequence database in TAIR and on the TIGR Gene Indices, as well as annotations from the NCBI nr database. MiBASE also contains 1,935 putative SNPs obtained by comparing EST sequences between ‘Micro-Tom’ and other tomato lines: the inbred lines ‘E6203’, ‘R11-13’, ‘Rio Grande PtoR’, and ‘R11-12’, and a wild relative, S. pennellii ‘TA56’. Further resources in MiBASE are EST libraries, BLAST annotations, GO terms, metabolic pathways, and gene expression information obtained from ‘Micro-Tom’ cDNA arrays (described below). In addition, updated unigene information (version KTU2) with BLAST annotations has been made available. NCBI has released the UniGene database (Wheeler et al., 2003), which provides the accession numbers of ESTs that appear to be derived from the same locus. In the UniGene database, each unigene (cluster) entry (nucleotide sequences, including ESTs) includes other data, such as protein similarities. The current version of UniGene (Solanum lycopersicum: UniGene Build #32) contains 17,766 tomato unigenes constructed from 237,321 EST sequences. Researchers should keep in mind that each public database defines its own set of tomato unigenes. Because each uses a different computational method for unigene construction, the DFCI Tomato Gene Index, SGN, and MiBASE contain different unigene sequences. NCBI’s UniGene, in contrast, does not provide consensus unigene sequences. Instead, it is updated weekly or monthly and thus provides more recent entry information (accession numbers) for each unigene (cluster). Despite these differences, unigene sequences serve as a useful basis for analyzing ORFs, sequence similarities, and functional domains in protein sequences.
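The putative SNPs in MiBASE come from comparing EST sequences between ‘Micro-Tom’ and other lines. The sketch below shows only the final step of such a comparison and assumes the two sequences have already been aligned without gaps: it reports mismatched positions and skips ambiguous bases such as N. Real SNP mining also weighs base quality and the number of supporting reads, which this sketch does not model. The sequences are hypothetical.

```python
def candidate_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Positions (1-based) where two pre-aligned, gap-free sequences differ.

    Ambiguous bases such as N are skipped rather than reported as SNPs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1):
        if a != b and a in "ACGT" and b in "ACGT":
            snps.append((pos, a, b))
    return snps


# Hypothetical aligned EST fragments from two cultivars.
micro_tom = "ATGCCGTANGCTA"
other_line = "ATGCTGTACGCTG"
print(candidate_snps(micro_tom, other_line))  # [(5, 'C', 'T'), (13, 'A', 'G')]
```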
Gene expression
Gene expression profiles from tomato cDNA microarrays are also available in public databases. The Center for Gene Expression Profiling (CGEP) provides information on the tomato cDNA microarray TOM1, which contains approximately 12,000 probes (Alba et al., 2004). Information on each probe is available with BLAST annotations, SGN unigene names, and the 5' and 3' end sequences of TOM1 cDNA clones, which are available in the SGN database. The tomato microarray expression database (TED) (Fei et al., 2006) also provides expression profile data for TOM1. Currently, TED provides expression data from 15 sets of experiments, including annotations, unigene names, sequence similarities, and expression patterns. Hierarchical clustering, k-means clustering, self-organizing map (SOM) clustering, and GO term analysis can also be performed on the experimental data in TED. MiBASE provides experimental data obtained from cDNA arrays of ‘Micro-Tom’. From fruit and leaf cDNA