Bioinformatics
TrichOME: A Comparative Omics Database for Plant Trichomes1[C][W][OA]
Xinbin Dai, Guodong Wang, Dong Sik Yang, Yuhong Tang, Pierre Broun, M. David Marks, Lloyd W. Sumner, Richard A. Dixon, and Patrick Xuechun Zhao*
Plant Biology Division, The Samuel Roberts Noble Foundation, Ardmore, Oklahoma 73401 (X.D., D.S.Y., Y.T., L.W.S., R.A.D., P.X.Z.); Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China (G.W.); Nestle R&D Center Tours, Plant Science and Technology, 37390 Notre-Dame D’Oe, France (P.B.); and Department of Plant Biology, University of Minnesota, St. Paul, Minnesota 55108 (M.D.M.)
Plant secretory trichomes have a unique capacity for chemical synthesis and secretion and have been described as biofactories for the production of natural products. However, until recently, most trichome-specific metabolic pathways and genes involved in various trichome developmental stages have remained unknown. Furthermore, only a very limited amount of plant trichome genomics information is available in scattered databases. We present an integrated “omics” database, TrichOME, to facilitate the study of plant trichomes. The database hosts a large volume of functional omics data, including expressed sequence tag/unigene sequences, microarray hybridizations from both trichome and control tissues, mass spectrometry-based trichome metabolite profiles, and trichome-related genes curated from published literature. The expressed sequence tag/unigene sequences have been annotated based upon sequence similarity with popular databases (e.g. Gene Ontology, Kyoto Encyclopedia of Genes and Genomes, and Transporter Classification Database). The unigenes, metabolites, curated genes, and probe sets have been mapped against each other to enable comparative analysis. The database also integrates bioinformatics tools with a focus on the mining of trichome-specific genes in unigenes and microarray-based gene expression profiles. TrichOME is a valuable and unique resource for plant trichome research, since the genes and metabolites expressed in trichomes are often underrepresented in regular non-tissue-targeted cDNA libraries. TrichOME is freely available at http://www.planttrichome.org/.
Plant trichomes are epidermal tissues located on the surfaces of leaves, petals, stems, petioles, peduncles, and seed coats depending on species. By virtue of their physical properties (size, density), trichome hairs can directly serve to protect buds of plants from insect damage, reduce leaf temperature, increase light reflectance, prevent loss of water, and reduce leaf abrasion (Wagner, 1991; Wagner et al., 2004).
Although the morphology of trichomes varies greatly, they can be generally classified into two types: simple trichomes (STs) and glandular secreting trichomes (GSTs; Wagner et al., 2004). STs of Arabidopsis (Arabidopsis thaliana) have been chosen as models for studying cell fate and differentiation (Wagner, 1991; Breuer et al., 2009; Marks et al., 2009). In Arabidopsis, STs on leaves consist of a unicellular structure with a stalk and three to four branches (Fig. 1B). Although the STs are referred to as “nonglandular” (presumably nonsecreting), expression of genes involved in anthocyanin, flavonoid, and glucosinolate pathways can nevertheless be detected in STs, indicating the roles of STs in the biosynthesis of secondary compounds and defense (Wang et al., 2002; Jakoby et al., 2008). GSTs are found on about one-third of vascular plants. GSTs have a multicellular structure with a stalk terminating in a glandular head (Fig. 1, A and C–G). GSTs are initiated from a single protodermal cell that undergoes vertical enlargement and multiple divisions to give rise to fully developed trichomes. GSTs often produce and accumulate terpenoid and phenylpropanoid oils (Wagner et al., 2004). However, alkaloids, the third major class of plant secondary compounds, are not common in GST exudates (Laue et al., 2000). The amount of exudates produced by GSTs may reach 30% of mature leaf dry weight, as found in certain Australian desert plants (Dell and McComb, 1978). Plant GSTs can impact pathogen defense, pest resistance, pollinator attraction, and water retention based on the phytochemicals they secrete.
GSTs on the aerial organ surfaces have a unique capacity for synthesis and secretion of chemicals (largely plant secondary metabolites), and they have been described as “chemical factories” for the production of high-value natural products (Mahmoud and Croteau, 2002; Wagner et al., 2004; Schilmiller et al., 2008). Secondary metabolites play important roles in protecting the plant against insect predation and other biotic challenges (Peter and Shanower, 1998), and they are potential sources for pharmaceutical and nutraceutical product development. For example, the trichome-borne artemisinin from Artemisia annua is still the most effective drug against malaria, and the early steps of its biosynthetic pathway have been extensively studied (Duke et al., 1994; Arsenault et al., 2008). Recently, the mechanisms by which plant glandular trichomes make, transport, store, and secrete a great variety of unique compounds, especially terpenoids and flavonoids, have received extended research interest because of the potential use of these compounds in pharmaceutical and nutraceutical applications. Seminal studies have reported the assignment of gene functions to specific metabolic pathways in glandular trichomes of several plant species, including mint (Mentha × piperita; Alonso et al., 1992; Rajaonarivony et al., 1992; Lange et al., 2000), basil (Ocimum basilicum; Gang et al., 2002; Iijima et al., 2004; Xie et al., 2008), Artemisia (Teoh et al., 2006; Zhang et al., 2008), tomato (Solanum lycopersicum; Fridman et al., 2005; Besser et al., 2009; Schilmiller et al., 2009), and hop (Humulus lupulus; Nagel et al., 2008; Wang et al., 2008). Table I summarizes the trichome structure, classification, and secondary metabolites in these species.

“Omics” approaches are being extensively employed for systematically studying the molecular aspects of trichome development, metabolism, and secretion at the “systems biology” level, and data are being rapidly generated; however, currently, only limited omics data for plant trichomes are publicly available in the literature and scattered databases. For example, the National Center for Biotechnology Information (NCBI) dbEST (Boguski et al., 1993) hosts approximately 130,000 ESTs sampled from trichome cDNA libraries of 13 species (as of February 2009) or mixed tissues including trichomes. The public Minimum Information About a Microarray Experiment-compliant (Brazma et al., 2001) repository ArrayExpress (Parkinson et al., 2009) hosts 16 microarray hybridization experiments for Arabidopsis nonglandular trichomes, but there are no glandular trichome microarray data available in public databases so far. Lastly, we could not find large-scale metabolite profile data for plant trichomes from public omics databases, such as the AraCyc and MedicCyc databases (Mueller et al., 2003; Urbanczyk-Wochniak and Sumner, 2007).

1 This work was supported by the National Science Foundation Plant Genome Program (grant no. 0605033) and The Samuel Roberts Noble Foundation.
* Corresponding author; e-mail [email protected].
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Patrick Xuechun Zhao ([email protected]).
[C] Some figures in this article are displayed in color online but in black and white in the print edition.
[W] The online version of this article contains Web-only data.
[OA] Open Access articles can be viewed online without a subscription.
Plant Physiology, January 2010, Vol. 152, pp. 44–54, www.plantphysiol.org © 2009 American Society of Plant Biologists
The omics data in publicly available databases are usually far from comprehensive and not integrated, as most of these data are in raw format. For example, EST/unigene and microarray data sets are not interlinked, and comparative analysis between trichome and nontrichome tissues for the identification of trichome-specific genes is absent. Therefore, to mine useful information such as trichome specificity of gene expression, signal strengths need to be normalized among hybridization experiments from both trichome and nontrichome tissues; to search a specific gene and deduce its potential function(s), sequences from cDNA libraries need to be annotated based on metabolic pathways and further cross-linked to metabolite profiles by catalytic reaction. Such data collection, integration, and mining processes pose great challenges to biologists interested in trichomes.
To overcome the above limitations, we have developed TrichOME, an integrated genomic database for plant trichomes. TrichOME integrates omics data, such as EST/unigene sequences, microarray gene expression profiles, metabolite profiles, and literature-mining results. To assemble unigenes and pursue comparative analysis of gene expression, the database has integrated a large amount of EST sequence and gene expression profile data from nontrichome tissues as well. We processed, curated, and annotated the hosted data by referring to multiple popular biological databases to ensure high data quality. TrichOME provides an interactive Web site (http://www.planttrichome.org/) for searching trichome-related ESTs/unigenes, microarray hybridizations, metabolites, and literature information.

Figure 1. Representative scanning electron microscopy images of trichomes on plants. A, Erect glandular trichome on the stem of M. sativa. B, Nonglandular trichome on a rosette leaf of Arabidopsis. C, Procumbent trichome on the petiole of M. truncatula. D, Field of glandular trichomes on a female bract of Cannabis sativa. E, Glandular trichomes on a bract of H. lupulus. F, Nonglandular trichome on a leaf of M. truncatula. G, Types VI (small arrow) and I (large arrow) trichomes on a leaf of S. lycopersicum. All the scanning electron microscopy images were generated as described previously (Ahlstrand, 1996; Esch et al., 2004) using an Emitech Technologies (www.emitech.co.uk) K1150 cryopreparation system and a Hitachi High Technologies (www.hitachi-hhta.com) S3500N scanning electron microscope. Bars = 100 μm.

Table I. A summary of trichome structure, classification, and accumulated metabolites in representative plant species

| Species | Trichome Structure and Classification | Metabolites |
| --- | --- | --- |
| Arabidopsis thaliana | Nonglandular trichome, large, single epidermal cells with a stalk and three or four branches on the surface of most shoot-derived organs | |
| Artemisia annua | Biseriate 10-celled glandular trichome, head including three apical cell pairs (Duke and Paul, 1993) | Artemisinin, an endoperoxide sesquiterpene lactone |
| Cistus creticus | Glandular trichome composed of a long multicellular stalk of over 200 μm topped by a small glandular head cell; two types of nonglandular trichome: multicellular stellate and simple unicellular spike (Gulz et al., 1996) | Labdane-type components, such as ent-3′-acetoxy-13-epi-manoyl oxide and ent-13-epi-manoyl oxide (Falara et al., 2008) |
| Humulus lupulus | Peltate glandular trichome with a glandular head consisting of 30 to 72 cells, four stalk cells, and four basal cells; bulbous glandular trichome, consisting of four (occasionally eight) head glandular cells, two stalk cells, and two basal cells; nonglandular trichome (cystolith hair) with a hard calcium carbonate structure at base of a hair (Oliveira et al., 1988; Nagel et al., 2008) | Essential oil, including myrcene, humulene, and caryophyllene; bitter acids, including humulones and lupulones; prenylflavonoids, including xanthohumol and desmethylxanthohumol (Wang et al., 2008) |
| Medicago sativa | Erect glandular trichome containing multicellular stalk typically over 200 μm long topped by a glandular head composed of a few cells with a diameter of approximately 15 μm; nonglandular trichome composed of a short base cell and a unicellular elongated shaft (Ranger and Hower, 2001) | N-(3-Methylbutyl) amide of linoleic acid (Ranger et al., 2005) |
| Mentha × piperita | Peltate glandular trichome consisting of a basal cell, a stalk cell, and disc of eight glandular cells approximately 60 μm in diameter (McCaskill et al., 1992) | Essential oils, such as p-menthanes; monoterpenes, including menthone and menthol |
| Nicotiana tabacum | Tall glandular trichome, a multicellular stalk topped by unicellular or multicellular head; short glandular trichome, a unicellular stalk topped by multicellular head (Akers et al., 1978) | Labdene-diol diterpenes and amphipathic sugar esters (Lin and Wagner, 1994) |
| Ocimum basilicum | Peltate glandular trichome consisting of a base cell, stalk cell, and a four-celled head; capitate glandular trichome, consisting of a single base and stalk cell and one- to two-celled head; multicellular nonglandular spiked trichome (Werker et al., 1993) | Phenylpropanoid eugenol, monoterpenoid linalool, and phenylpropanoid methylcinnamate (Xie et al., 2008) |
| Salvia fruticosa | Type I consisting of one to two stalk cells and one to two enlarged, rounded to pear-shaped secretory head cells; type II consisting of one to two stalk cells and one elongated head cell as narrow as the stalk cells at its base and slightly enlarged above; type III consisting of two to five elongated stalk cells and rounded head in young leaves, which becomes cup shaped in mature leaves (Werker et al., 1985) | Essential oils, such as α-pinene, 1,8-cineole, camphor, and borneol (Arikat et al., 2004) |
| Solanum habrochaites | Type I glandular trichome, large in size with multicellular base; type III, intermediate in size with a single basal cell; type IV, a short, multicellular stalk that secretes droplets of sticky exudate at the tip; type V, short, slender, one to four celled; type VI, short with a two- to four-celled glandular head; type VII, 0.05- to 0.1-mm smaller glandular hair with a four- to eight-celled glandular head; type VI is particularly abundant (Reeves, 1977) | Mainly α-santalene, α-bergamotene, and β-bergamotene; small amounts of α-humulene and β-caryophyllene (Besser et al., 2009) |
| Solanum lycopersicum | Same as above | Monoterpenes (Besser et al., 2009) |
| Solanum pennellii | Same as above but possesses high density of type IV glandular trichome (Lemke and Mutschler, 1984) | 2,3,4-Triacylglucoses (Goffreda et al., 1989) |
DATA COMPILATION, INTEGRATION, AND MINING
EST Sequences
Plant trichome-related EST sequences were downloaded from NCBI dbEST and were grouped by cDNA libraries and species using an in-house Perl script. The cDNA libraries from nontrichome tissues and full-length cDNAs were collected for the purpose of unigene assembly and in silico expression analysis. The cDNA libraries of low quality, smaller libraries (<500 EST members), and nontrichome cDNA libraries (due to some errors in the dbEST database) were manually removed.

The collected ESTs were further processed prior to assembly in order to remove the remaining cloning vectors, adaptors, and contaminating sequences (Lee et al., 2005). For each cDNA library, the cross_match (Ewing and Green, 1998) program was used to search against the corresponding vector sequence. Subsequently, the rigorous Smith-Waterman SSearch program (Pearson and Lipman, 1988) was employed to clean short adapters, restriction endonuclease sites, and PCR primers provided in the original cDNA library descriptions. After the above procedures, TrichOME currently hosts about one million cleansed EST sequences from both trichome and corresponding nontrichome control tissues of 13 species (A. annua, Cistus creticus, H. lupulus, Medicago sativa, Medicago truncatula, M. × piperita, Nicotiana benthamiana, Nicotiana tabacum, O. basilicum, Salvia fruticosa, Solanum habrochaites, S. lycopersicum, and Solanum pennellii).

The cleansed EST sequences were further assembled into unigenes, which consist of singletons and tentative consensuses by species, using The Institute for Genomic Research Gene Index clustering tools (Lee et al., 2005). Some unigenes were further refined by referring to the available full-length cDNA from the same species in order to improve assembly quality.

The unigene and EST sequences were annotated using BLASTX queries against UniProtKB and Swiss-Prot databases with a cutoff E-value of less than 1e-04. The top five query results were listed as valid annotations. Using the same protocol, the unigene/EST sequences were further annotated using the Gene Ontology database (Ashburner et al., 2000), Kyoto Encyclopedia of Genes and Genomes Pathway Database (KEGG; Kanehisa et al., 2008), Transporter Classification Database (TCDB; Saier et al., 2006), and plant transcription factor database (PlantTFDB; Guo et al., 2008). The conserved domains of sequences were searched by InterProScan with its default cutoff E-value (Hunter et al., 2009).
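The annotation rule described above (hits with an E-value below 1e-04, with the top five retained per sequence) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the function name select_annotations, the tuple layout, and the sample identifiers are assumptions introduced here, and the hits are assumed to have been parsed already from BLASTX output.

```python
from collections import defaultdict


def select_annotations(hits, max_evalue=1e-4, top_n=5):
    """Keep up to top_n hits per query whose E-value is below max_evalue.

    hits: iterable of (query_id, subject_id, evalue) tuples (hypothetical layout).
    Returns {query_id: [subject_id, ...]} ordered from best to worst E-value.
    """
    by_query = defaultdict(list)
    for query_id, subject_id, evalue in hits:
        if evalue < max_evalue:  # strict "less than" cutoff, as in the text
            by_query[query_id].append((evalue, subject_id))
    # Lower E-value means a better hit, so sort ascending before truncating.
    return {
        query_id: [subject_id for _, subject_id in sorted(pairs)[:top_n]]
        for query_id, pairs in by_query.items()
    }


# Hypothetical example hits; identifiers are placeholders, not database records.
example_hits = [
    ("unigene_1", "protein_A", 1e-30),
    ("unigene_1", "protein_B", 1e-3),   # fails the cutoff
    ("unigene_1", "protein_C", 1e-10),
    ("unigene_2", "protein_D", 0.5),    # fails the cutoff
]
annotations = select_annotations(example_hits)
```

Sequences with no hit below the cutoff simply do not appear in the result, which mirrors leaving them unannotated.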
Microarray Hybridizations
The microarray data sets in TrichOME were collected from three different sources: ArrayExpress (Parkinson et al., 2009), Arabidopsis Gene Atlas (Schmid et al., 2005), and our internal experiments. TrichOME currently hosts microarray-based expression profiles for two model plants, Arabidopsis (using the Affymetrix ATH1 GeneChip) and M. truncatula, as well as the forage crop M. sativa (alfalfa; using the Affymetrix Medicago GeneChip). Our expression analyses with three biological replicates were performed on glandular and nonglandular trichome tissues as well as nontrichome tissues (as a control), such as stem, flower, root, and nodule (Schmid et al., 2005; Benedito et al., 2008; Wang et al., 2008).

For each Affymetrix array hybridization, the resultant image .cel file was exported using the GeneChip Operating Software version 1.4 (Affymetrix) and then imported into Robust Multiarray Average software for global normalization (Irizarry et al., 2003). Presence/absence calls for each probe set were analyzed using dCHIP software (Li and Wong, 2001) for its high reliability.

To annotate the unigenes and array data, the trichome-related unigenes were mapped to Affymetrix GeneChip probe sets using a probe set remapping Perl script developed by Affymetrix. Meanwhile, the Affymetrix probe set target sequences were mapped to the Gene Ontology database (Ashburner et al., 2000), KEGG gene and metabolic pathway database (Kanehisa et al., 2008), TCDB (Saier et al., 2006), and PlantTFDB (Guo et al., 2008) by performing BLASTX searches against the reference sequences with a cutoff E-value of less than 1e-04.
Mass Spectrometry-Based Metabolite Profiles
TrichOME hosts mass spectrometry (MS)-based metabolite profiles for plant trichomes. Currently, the database hosts gas chromatography (GC)-MS data obtained for potato leafhopper-susceptible and -resistant lines of M. sativa. The data are composed of triplicate profiles obtained by multiple published methods (Broeckling et al., 2005). These methods included GC-MS of a polar extract, GC-MS of a nonpolar extract with lipid hydrolysis, and GC-MS of nonpolar extracts without lipid hydrolysis. Metabolite profiling methods have been described in the details page of each experiment. Polar GC-MS metabolite profiles include over 300 components composed of amino acids, organic acids, alcohols, nucleotides, sugars, and sugar phosphates. Nonpolar, hydrolyzed GC-MS metabolite profiles include over 300 components composed of fatty acids, long-chain alcohols, waxes, terpenoids, and sterols. In particular, fatty acid amides were reported, as these have been recently suggested to contribute to leafhopper resistance (Ranger et al., 2005). Additional experimental data, including solid-phase microextraction-GC-MS of volatiles and ultra-performance liquid chromatography/quadrupole time-of-flight MS for profiling secondary metabolites, have been collected and will be added to the database in the near future.
Raw GC-MS metabolite profiles were deconvoluted using AMDIS software (the Automated Mass Spectral Deconvolution and Identification System from the National Institute of Standards and Technology; http://www.amdis.net/). Deconvoluted peaks were identified using AMDIS spectral matching with a custom spectral database generated from authentic standards (Broeckling et al., 2005). The unidentified peaks were further analyzed through spectral comparisons with data in MassBank (http://www.massbank.jp/) with a similarity score cutoff of 0.85 or greater. The relative peak areas of metabolites were extracted using custom MET-IDEA software (Broeckling et al., 2006) and normalized against the peak areas of the internal standard. Normalization allows for the quantitative comparison of accumulated metabolites or tentative peaks.
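The internal-standard normalization mentioned above amounts to dividing each peak area by the peak area of the internal standard measured in the same run. A minimal sketch follows; the function name, the dictionary layout, and the example values are assumptions made for illustration and do not come from MET-IDEA or TrichOME.

```python
def normalize_peak_areas(peak_areas, internal_standard):
    """Express each peak area relative to the internal standard of the same run.

    peak_areas: {peak_name: raw_area} for one GC-MS run (hypothetical layout).
    internal_standard: key of the internal standard peak in peak_areas.
    """
    reference = peak_areas[internal_standard]
    if reference <= 0:
        raise ValueError("internal standard peak area must be positive")
    return {
        name: area / reference
        for name, area in peak_areas.items()
        if name != internal_standard  # the standard itself is not reported
    }


# Hypothetical raw areas for one run; names and numbers are placeholders.
run = {"internal_standard": 200.0, "peak_1": 50.0, "peak_2": 400.0}
relative = normalize_peak_areas(run, "internal_standard")
```

Because every run is scaled by its own standard, the resulting relative areas can be compared across runs even when injection volume or detector response drifts.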
The annotated peaks were linked to corresponding compounds in the PubChem database (http://pubchem.ncbi.nlm.nih.gov/) and KEGG compound database by manually matching compound names. We further mapped the identified compounds to trichome-related unigenes and probe set targets. This was performed by downloading the KEGG mapping files of compounds, reactions, and enzymes as well as standard KEGG Enzyme Commission (EC) number or KEGG Orthology (KO) sequence data sets from the KEGG ftp site. By referring to these mapping files, compounds were mapped to EC/KO numbers through reaction numbers. The compounds were further associated with unigenes and probe set target sequences by BLASTX searching these sequences against the downloaded standard EC/KO reference sequences (E-value ≤ 1e-04). TrichOME allows the user to download raw data by experiments or explore details of the peak data (Fig. 2).
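The chain of lookups described above (compound to reaction, reaction to EC/KO number, EC/KO number to unigene) is essentially a three-step join. The sketch below shows one way to express it; all identifiers are made-up placeholders, the three dictionaries are assumed to have been built beforehand from the KEGG mapping files and the BLASTX results, and nothing here reflects the actual file formats or the TrichOME code.

```python
def map_compounds_to_unigenes(compound_to_reactions, reaction_to_enzymes,
                              enzyme_to_unigenes):
    """Follow compound -> reaction -> EC/KO -> unigene links.

    Each argument is a dict of lists (hypothetical, pre-parsed mappings).
    Returns {compound: sorted list of associated unigene identifiers}.
    """
    result = {}
    for compound, reactions in compound_to_reactions.items():
        unigenes = set()
        for reaction in reactions:
            # Missing keys mean "no known link", so fall back to an empty tuple.
            for enzyme in reaction_to_enzymes.get(reaction, ()):
                unigenes.update(enzyme_to_unigenes.get(enzyme, ()))
        result[compound] = sorted(unigenes)
    return result


# Placeholder identifiers, not real KEGG or TrichOME accessions.
compound_to_reactions = {"cpd_A": ["rxn_1", "rxn_2"], "cpd_B": ["rxn_3"]}
reaction_to_enzymes = {"rxn_1": ["enz_X"], "rxn_2": ["enz_X", "enz_Y"]}
enzyme_to_unigenes = {"enz_X": ["unigene_2", "unigene_1"], "enz_Y": ["unigene_3"]}
links = map_compounds_to_unigenes(
    compound_to_reactions, reaction_to_enzymes, enzyme_to_unigenes
)
```

Collecting the unigenes in a set avoids duplicate links when several reactions of the same compound share an enzyme.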
Literature Mining of Trichome-Related Genes and Proteins
TrichOME hosts trichome-related genes and proteins curated from the published literature. The genes were initially searched using information such as gene/protein name, alias names, and annotations from the NCBI Gene Database (http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene), followed by extensive human curation. Hosted genes were further associated with trichome-related unigene/EST sequences as well as microarray probe set target sequences by BLASTX queries of the standard protein sequences of hosted genes downloaded from the NCBI Gene Database with a cutoff E-value of less than 1e-6.
System Implementation and Web Interfaces
The TrichOME system consists of a back-end database and a front-end Web site developed using Java. The front-end Web site interfaces integrate AJAX technology powered by YUI libraries (http://developer.yahoo.com/yui/) to improve user-server interactions.
The ESTs/unigenes and associated cDNA library information are available for batch download by species and libraries (Fig. 3A). Web tree views and comprehensive search interfaces were also implemented to facilitate interactive queries. For example, the KEGG tree view allows users to explore trichome-related ESTs/unigenes and present the sequences by metabolic pathways (Fig. 3B). A BLAST interface enables users to search user-submitted sequences against the trichome-related sequences in the database. TrichOME allows users to download the sequences filtered by query individually or in batch. It also includes an online data submission page to collect trichome-related omics data from the research communities.
TrichOME integrates an in silico gene expression analysis algorithm to search the unique or highly expressed unigenes in trichome tissue (Fig. 4). The algorithm counts EST members of each unigene across multiple cDNA libraries to calculate an R value (Stekel et al., 2000), which represents the library specificity of the unigene.
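As a rough illustration of this library-specificity statistic, the sketch below computes an R value for one unigene as the log-likelihood ratio given by Stekel et al. (2000): R is the sum over libraries of x_i * ln(x_i / (N_i * f)), where x_i is the EST count of the unigene in library i, N_i is the total number of ESTs in library i, and f is the pooled frequency of the unigene across all libraries. The function name, the use of the natural logarithm, and the example counts are assumptions for illustration; the TrichOME implementation itself is not described in this article.

```python
import math


def r_value(counts, library_sizes):
    """Log-likelihood ratio statistic for one unigene across cDNA libraries.

    counts: EST members of the unigene in each library (x_i).
    library_sizes: total number of ESTs in each library (N_i).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    pooled_frequency = total / sum(library_sizes)  # f under the null hypothesis
    r = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:  # a zero count contributes nothing (x * ln x -> 0)
            r += x * math.log(x / (n * pooled_frequency))
    return r


# Hypothetical counts: one trichome library followed by one control library.
uniform = r_value([10, 10], [1000, 1000])   # same abundance everywhere
skewed = r_value([20, 0], [1000, 1000])     # found only in the first library
```

A unigene whose ESTs are spread in proportion to library size scores zero, while one concentrated in a single library scores high, which is why a threshold on R can be used to shortlist library-specific unigenes.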
For each hybridization experiment, TrichOME hosts both raw and normalized data, and it provides a description page with links for batch downloading. In addition, the database provides a series of “compare by…” functions for the query of hybridization signals by probe set identifiers, annotation of probe set target sequences, associated trichome-related unigenes, or the ratio of signal strength between trichome and nontrichome tissues.
The trichome-related unigenes, microarray probe sets, metabolites, and curated trichome-related genes have been mapped to each other to allow users to gain an overall systems understanding of trichome biology and biochemistry.
Case Studies
Case Study 1: Mining of Putative Trichome-Specific Genes by in Silico and Microarray Gene Expression Analysis
Because M. truncatula has the most (52 in total) comprehensive nontrichome cDNA libraries available as controls, this organism was chosen as an example to search for putative trichome-specific or highly preferentially expressed sequences. In the “in silico expression” of the “EST analysis” section, we selected “MT_TRI” as trichome cDNA library and all of the nontrichome cDNA libraries as a control. After clicking the Analyze button, the server calculates R values of all unigenes by comparing the abundance of gene transcripts among cDNA libraries and returns 111 unigenes with significantly higher expression levels in the trichome cDNA library than in the nontrichome cDNA libraries (R ≥ 4; Supplemental File S1). The sequences are available for batch download by clicking the “download all” link on the same page. Users may also select the “show unigenes only detected in trichome” option to view the unigenes only present in the trichome library; the R value statistical result will be ignored under this option.

To cross-validate the above in silico expression analysis results, the 111 unigene identifiers were copied into the “compare by unigene” section under the “microarray analysis” menu to retrieve expression signals of corresponding probe sets from all tissue-specific microarray hybridization experiments (mean values). Among the 111 unigenes, 82 unigenes, for which corresponding probe set targets were expressed at least two times higher in signal strength in trichome tissue than in nontrichome tissue (Supplemental File S2), were identified as putative trichome-related genes in M. truncatula for future experimental analysis.

Remarkably, among the 20 genes that were between 10- and 758-fold preferentially expressed in trichomes compared with nontrichome tissue, 16 were annotated as being involved in biotic or abiotic stress responses (Table II). These include pathogenesis-related proteins of the PR1a, -4, and -5a (thaumatin) classes as well as chitinases and endoglucanases, all of which are associated with induced responses to pathogen attack (Rigden and Coutts, 1988). Chitinases and glucanases likely modify the cell walls of invading pathogens and are often potent inhibitors of fungal growth (Mauch et al., 1988; Zhu et al., 1994). Two genes annotated as germin-like proteins were preferentially expressed 72- and 78-fold in Medicago trichomes. Germin and germin-like proteins may exhibit oxalate oxidase or superoxide dismutase activity, have been implicated in the generation of hydrogen peroxide during plant defense responses, and have been shown genetically to play important roles in plant defense (Lou and Baldwin, 2006; Godfrey et al., 2007). Two lipid transfer protein genes were preferentially expressed by 32- and 37-fold in the trichomes. These genes are often among the most highly expressed in plant trichomes (Lange et al., 2000; Aziz et al., 2005; Wang et al., 2008). Although they may play roles in the transfer of lipid metabolites between cell compartments, they have also been shown to act in plant defense, either through direct antimicrobial activity (Kader, 1997) or as signal transduction components (Maldonado et al., 2002). The Ser protease inhibitor gene that was preferentially expressed over 750-fold in the Medicago trichomes is likely involved in defense against herbivory, whereas the gene annotated as encoding a desiccation-related protein may help protect against abiotic stress. Genes with similar potential functions (e.g. dehydrins) are also expressed in trichomes of other species (Aziz et al., 2005).

Figure 2. The interface for metabolite profiles by GC-MS in TrichOME. [See online article for color version of this figure.]
On the basis of this preliminary analysis, the M. truncatula glandular trichomes appear to provide the plant with a defensive barrier through primary expression of high levels of proteins with direct defensive properties. This is in contrast to the trichomes from other species, which, although expressing some of the same types of defense genes (Supplemental File S3), primarily appear to harness their gene expression toward the biosynthesis and accumulation of antimicrobial secondary metabolites. For example, genes associated with terpene metabolism are among the most highly expressed in trichomes of mint, hop, and tomato (Lange et al., 2000; Wang et al., 2008; Besser et al., 2009).
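The microarray cross-validation step used in this case study (keeping probe sets whose mean signal in trichome tissue is at least two times that in nontrichome tissue) can be sketched as a simple ratio filter. The function name, the input layout, and the example signals below are hypothetical and serve only to illustrate the rule; they are not taken from TrichOME.

```python
def trichome_enriched(signals, min_ratio=2.0):
    """Return probe sets whose trichome/nontrichome signal ratio >= min_ratio.

    signals: {probe_set_id: (trichome_mean, nontrichome_mean)} using
    normalized, non-log signal values (hypothetical layout).
    """
    enriched = {}
    for probe_set, (trichome, control) in signals.items():
        if control <= 0:
            continue  # ratio undefined; skip rather than guess
        ratio = trichome / control
        if ratio >= min_ratio:
            enriched[probe_set] = ratio
    return enriched


# Placeholder probe set names and signal values.
example_signals = {
    "probe_a": (800.0, 100.0),   # 8-fold higher in trichome
    "probe_b": (150.0, 100.0),   # 1.5-fold, below the threshold
    "probe_c": (50.0, 0.0),      # no usable control signal
}
selected = trichome_enriched(example_signals)
```

The same function with a higher min_ratio would reproduce the kind of ranking shown in Table II, where the ratio column reports the fold difference.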
We also compared the transcriptional profiles in nonglandular trichomes (of Arabidopsis) and leaves from which trichomes had been removed using the Arabidopsis ATH1 GeneChip. Six transcription factor genes (three Myb genes, two WRKY genes, and one homeodomain/Leu zipper [HD-ZIP] gene) were found in the top 20 probe sets/genes that were highly expressed in trichome tissue (Table III). Furthermore, a portion of these genes or their family members are reportedly involved in trichome development (Jakoby et al., 2008). For example, AT1G79840 (HD-ZIP, GLABRA2 [GL2] gene) and AT2G37260 (WRKY, TRANSPARENT TESTA GLABRA2 gene) are well-studied regulators of leaf trichome formation (Rerie et al., 1994; Johnson et al., 2002). Interestingly, the top 20 most highly expressed genes in nonglandular trichomes are quite different from the genes that are highly expressed in glandular trichomes (Table II). The latter include many more genes that are involved in biotic or abiotic stress responses, and no transcription factor was found in the latter list.
Case Study 2: Comparative Analysis of Monoterpenoid Biosynthesis Genes in Hop Trichomes
Hop is a perennial, dioecious plant that belongs to the Cannabaceae family. Hop trichomes produce and accumulate large amounts of terpenoids (the monoterpene pinene and the sesquiterpenes humulene and caryophyllene), prenylflavonoids, and two important classes of bitter acids (humulone and lupulone derivatives), all of which give additive flavor to beer (Stevens et al., 1998; Wang et al., 2008). To study the monoterpenoid biosynthesis pathway, we first searched the hop trichome-related unigenes involved in the pathway by exploring the KEGG classification tree in the “EST analysis” section. Some of the returned unigenes, such as TCHL10609 and TCHL10281, are annotated as encoding pinene synthase (monoterpene synthase) and given the KEGG orthology term K07384. K07384 is located under the node representing monoterpenoid biosynthesis pathway (“Metabolism / Biosynthesis of Secondary Metabolites / Monoterpenoid Biosynthesis”). We then designed primers targeting these unigenes and successfully cloned the full-length cDNAs of the monoterpene synthases HlMTS1 and HlMTS2. The detailed comparative analysis of the monoterpenoid biosynthesis genes/enzymes in hop trichomes has been published (Wang et al., 2008).

Figure 3. The interface for searching trichome-related ESTs/unigenes. A, List of species with trichome-related ESTs/unigenes and links for batch downloading. Acc., Accession number. B, Trichome-related unigenes classified by KEGG metabolic pathways. [See online article for color version of this figure.]
Figure 4. The interface for in silico gene expression analysis. The function calculates R values for the identification of trichome-specific or highly expressed unigenes by comparing the EST members of each unigene in the trichome library with those in nontrichome control libraries.
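The kind of calculation behind the R values in the Figure 4 legend can be illustrated with a short Python sketch. The legend does not give the formula, so this sketch assumes the log-likelihood ratio statistic described by Stekel et al. (2000) for comparing EST counts across multiple cDNA libraries; the function name r_statistic and the example counts are hypothetical and are not taken from TrichOME.

```python
import math


def r_statistic(counts, library_sizes):
    """Log-likelihood ratio R for one unigene across several EST libraries.

    counts[i]        -- ESTs of this unigene observed in library i
    library_sizes[i] -- total ESTs sequenced in library i
    Assumed form (after Stekel et al., 2000):
        R = sum_i x_i * ln(x_i / (N_i * f)),  f = sum_i x_i / sum_i N_i
    """
    if len(counts) != len(library_sizes):
        raise ValueError("one count is needed per library")
    total_count = sum(counts)
    if total_count == 0:
        return 0.0
    # Expected frequency if the unigene were equally abundant in every library.
    freq = total_count / sum(library_sizes)
    r = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:  # a zero count contributes nothing (x * ln x -> 0)
            r += x * math.log(x / (n * freq))
    return r


# Hypothetical counts: library 0 is a trichome library, library 1 a control.
even = r_statistic([10, 20], [1000, 2000])    # same abundance in both
skewed = r_statistic([30, 0], [1000, 2000])   # found only in the trichome library
print(even, skewed)
```

Under this assumed statistic, a unigene whose ESTs are spread in proportion to library size scores close to zero, while a unigene confined to the trichome library scores high, which is the behavior needed to rank candidate trichome-specific unigenes.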
Table II. Top 20 probe sets/unigenes highly expressed in trichome of M. truncatula

Probe Set Identifier  Unigene Accession No.  Annotation                                      Ratio^a
Mtr.16277.1.S1_at     TCMT54757              Ser proteinase inhibitor                        758.5
Mtr.49787.1.S1_s_at   TCMT59300, ES611317    Endo-1,3-β-glucanase                            323.5
Mtr.43770.1.S1_at     TCMT49613              Pathogenesis-related protein 1a                 242.7
Mtr.31496.1.S1_s_at   TCMT40799              Putative arsenate reductase                     153.6
Mtr.22721.1.S1_at     TCMT48089              Foot protein 4 variant 2                        153.28
Mtr.6179.1.S1_at      TCMT56621              Desiccation-related protein PCC13-62            144.43
Mtr.34454.1.S1_at     TCMT43273              Class Ib chitinase                              128.25
Mtr.32629.1.S1_s_at   TCMT42631              Thaumatin-like protein PR-5a                    104.22
Mtr.22726.1.S1_at     TCMT59229              Unknown                                         94.63
Mtr.34878.1.S1_at     TCMT60298              Germin-like protein                             77.63
Msa.1226.1.S1_at      CO513853               Germin-like protein                             71.59
Mtr.44506.1.S1_x_at   TCMT59300              Endo-1,3-β-glucanase                            66.05
Mtr.10325.1.S1_at     TCMT41963              Chitinase-related agglutinin                    63.20
Mtr.35661.1.S1_at     TCMT59126              Probable lipid transfer protein family protein  54.78
MsaAffx.3538.1.S1_at  TCMT42628, TCMT40025   Putative senescence-associated protein          40.86
Mtr.8763.1.S1_at      ES613671               Thaumatin-like protein PR-5a                    38.56
Mtr.34695.1.S1_at     TCMT55690              Plant lipid transfer/seed storage/trypsin-α-amylase
The genes expressed in trichomes may be underrepresented in non-tissue-targeted libraries because of the distinct features of trichome tissue in secondary metabolism. That is to say, many of the trichome-related transcripts may not appear in regular EST databases because of their relatively low expression levels in whole plant tissue. For example, 68% of ESTs in the M. truncatula trichome cDNA library are singletons or belong to trichome-specific unigenes, whereas the average value is 15% for all 52 nontrichome cDNA libraries. Therefore, the unique data in TrichOME are a resource for the trichome biology and plant defense research communities and are also expected to be a valuable resource for the annotation/reannotation of “model” plant genomes, especially for the model legume M. truncatula.
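The percentage quoted above (ESTs that are singletons or belong to library-specific unigenes) can be computed from an EST-to-unigene assignment table. The Python sketch below shows one way to do this under stated assumptions: the record layout (EST identifier, unigene identifier, library name), the function name, and the toy records are hypothetical and do not reproduce the TrichOME data or code.

```python
from collections import defaultdict


def library_specific_fraction(est_records, library):
    """Fraction of a library's ESTs whose unigene occurs in no other library.

    est_records -- iterable of (est_id, unigene_id, library_name) tuples
                   (hypothetical layout, assumed for illustration)
    A singleton EST forms a one-member unigene, so it is counted as
    library specific automatically.
    """
    records = list(est_records)
    libraries_by_unigene = defaultdict(set)
    for _, unigene, lib in records:
        libraries_by_unigene[unigene].add(lib)
    in_library = [unigene for _, unigene, lib in records if lib == library]
    if not in_library:
        return 0.0
    specific = sum(1 for u in in_library if libraries_by_unigene[u] == {library})
    return specific / len(in_library)


# Toy data: U1 is shared, U2 is a singleton, U3 is trichome specific.
toy_records = [
    ("est1", "U1", "trichome"), ("est2", "U1", "leaf"),
    ("est3", "U2", "trichome"),
    ("est4", "U3", "trichome"), ("est5", "U3", "trichome"),
    ("est6", "U4", "leaf"),
]
print(library_specific_fraction(toy_records, "trichome"))  # 0.75
```

Applying the same function to each control library and averaging the results would give a figure comparable to the nontrichome average quoted in the text.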
Since EST sequences are often short and of low quality, we have included additional sequences from EST libraries of nontrichome tissues as well as individual cDNA sequences to generate longer and more credible unigene sequences. These nontrichome EST libraries were also used as control libraries for in silico analysis of trichome-specific genes. We will routinely update the TrichOME database when more trichome or nontrichome EST sequences are available in public repositories, aiming to further improve the quality of unigene assembly and comparative analysis of trichome genes.
TrichOME hosts Metabolomics Standards Initiative (Sansone et al., 2007)-compliant data. The GC-MS-based metabolite profiles are much more reproducible than LC-MS profiles with respect to the retention times and spectra of peaks. The retention time index of each peak can be calibrated based on reference compounds and therefore enables comparison across experiments. Meanwhile, the spectrum data of each peak from GC-MS are less variable and are comparable across laboratories, because the common electron ionization protocol (70 eV) is applied for spectrum generation. In the TrichOME database, some of the peaks were identified based on standard electron ionization spectrum libraries, for example, our internal library and MassBank (http://www.massbank.jp/).
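To illustrate how calibration against reference compounds makes retention data comparable across runs, the following Python sketch converts a raw retention time into a retention index by linear interpolation between the two bracketing reference compounds. The text does not state which indexing scheme TrichOME uses, so the linear scheme, the function name, and the reference values are assumptions made for illustration only.

```python
from bisect import bisect_right


def linear_retention_index(rt, references):
    """Interpolate a retention index for a peak eluting at time rt.

    references -- list of (retention_time, index_value) pairs for reference
                  compounds measured in the same run, with distinct times
                  (hypothetical calibration data)
    """
    refs = sorted(references)
    if len(refs) < 2:
        raise ValueError("at least two reference compounds are needed")
    times = [t for t, _ in refs]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time outside the calibrated range")
    # Pick the pair of references that brackets the peak.
    upper = min(bisect_right(times, rt), len(refs) - 1)
    (t_lo, ri_lo), (t_hi, ri_hi) = refs[upper - 1], refs[upper]
    return ri_lo + (ri_hi - ri_lo) * (rt - t_lo) / (t_hi - t_lo)


# Two hypothetical runs with shifted retention times for the same references.
run_a = [(5.0, 1000), (7.0, 1100), (9.5, 1200)]
run_b = [(5.4, 1000), (7.6, 1100), (10.2, 1200)]
print(linear_retention_index(6.0, run_a))
print(linear_retention_index(6.5, run_b))
```

In this toy example the same compound elutes at different times in the two runs but receives essentially the same index, which is what allows peaks to be matched across experiments.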
TrichOME also provides tools for the analysis of microarray data and unigenes in addition to hosting sequences, gene expression profiles, metabolite profiles, and curated trichome-related genes. The data from different sections have been carefully cross-mapped so that users can cross-reference them. For example, TrichOME hosts the Affymetrix Medicago GeneChip hybridization results from two organisms. The probe sets are mapped to the curated trichome-related genes from the literature and are also linked to unigenes. These data and analytical functions enable users to identify the differentially expressed genes between trichome and nontrichome tissues.
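A simple way to picture the comparison between trichome and nontrichome tissues is to rank probe sets by their trichome-to-control signal ratio, as in the "top 20" lists of Tables II and III. The Python sketch below is only an illustration under stated assumptions: the function name, the signal floor used to avoid division by very small values, and the probe names and intensities are hypothetical, and the sketch does not reproduce the normalization or statistics used in TrichOME.

```python
def top_expressed(trichome, control, n=20, floor=1.0):
    """Rank probe sets by trichome/control signal ratio, highest first.

    trichome, control -- dicts mapping probe set id -> signal intensity
    floor             -- assumed lower bound applied to both signals so that
                         near-zero control values do not inflate the ratio
    Only probe sets present in both dicts are compared.
    """
    ratios = {}
    for probe, t_signal in trichome.items():
        if probe in control:
            ratios[probe] = max(t_signal, floor) / max(control[probe], floor)
    ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical intensities for three made-up probe sets.
trichome_signal = {"probe_a": 800.0, "probe_b": 50.0, "probe_c": 300.0}
control_signal = {"probe_a": 8.0, "probe_b": 100.0, "probe_c": 0.2}
print(top_expressed(trichome_signal, control_signal, n=2))
```

Because the probe sets are cross-mapped to unigenes and curated genes, each entry in such a ranked list can be followed to its sequence and annotation.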
Transcription factors and transporters are receiving extensive interest from the trichome research community, as the former may regulate secondary metabolic pathways and the latter function to transfer natural products across the plasma membrane and tonoplast. TrichOME annotates unigenes based on the published PlantTFDB and TCDB databases (Saier et al., 2006; Guo et al., 2008). These in-depth annotations facilitate the
Table III. Top 20 probe sets that are highly expressed in nonglandular trichome compared with leaf tissue in Arabidopsis

Probe Set Identifier  Gene Accession No.  Annotation                                                  Signal Ratio (Trichome-Leaf)
249408_at             At5g40330           Myb-related protein                                         2.65
245181_at             At5g12420           Putative protein similarity to various predicted proteins,  2.61
                                          contains ATP synthase-δ (OSCP) subunit signature AA211-230
257490_x_at           At1g01380           Myb homolog                                                 2.55
260166_at             At1g79840           GL2, HD-ZIP protein, GL2 gene                               2.40
260386_at             At1g74010           Putative strictosidine synthase                             2.36
267459_at             At2g33850           Unknown protein                                             2.31
256359_at             At1g66460           Protein kinase, Pfam profile PF00069                        2.31
265954_at             At2g37260           Putative WRKY-type DNA-binding protein                      2.31
264387_at             At1g11990           Putative growth regulator protein (axi 1 gene)              2.24
253392_at             At4g32650           Potassium channel protein AtKC potassium channel            2.24
253595_at             At4g30830           M1 protein, Streptococcus pyrogenes, PIR2:S46489            2.23
263775_at             At2g46410           Putative MYB family transcription factor                    2.23
263480_at             At2g04032           Putative root iron transporter protein                      2.23
265008_at             At1g61560           Mlo protein                                                 2.21
248236_at             At5g53870           Putative phytocyanin/early nodulin-like protein             2.19
248896_at             At5g46350           WRKY-type DNA-binding protein                               2.17
246000_at             At5g20820           Putative protein-predicted proteins                         2.15
259410_at             At1g13340           Hypothetical protein                                        2.14
266544_at             At2g35300           Late embryogenesis abundant proteins                        2.13