
The Tomato Translational Landscape Revealed by Transcriptome Assembly and Ribosome Profiling1[OPEN]

Hsin-Yen Larry Wu,a Gaoyuan Song,b Justin W. Walley,b and Polly Yingshan Hsua,2,3

aDepartment of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824
bDepartment of Plant Pathology and Microbiology, Iowa State University, Ames, Iowa 50011

ORCID IDs: 0000-0001-9407-338X (H.-Y.L.W.); 0000-0003-1633-9159 (G.S.); 0000-0001-7553-2237 (J.W.W.); 0000-0001-7071-5798 (P.Y.H.).

Recent applications of translational control in Arabidopsis (Arabidopsis thaliana) highlight the potential power of manipulating mRNA translation for crop improvement. However, to what extent translational regulation is conserved between Arabidopsis and other species is largely unknown, and the translatome of most crops remains poorly studied. Here, we combined de novo transcriptome assembly and ribosome profiling to study global mRNA translation in tomato (Solanum lycopersicum) roots. Exploiting features corresponding to active translation, we discovered widespread unannotated translation events, including 1,329 upstream open reading frames (uORFs) within the 5′ untranslated regions of annotated coding genes and 354 small ORFs (sORFs) among unannotated transcripts. uORFs may repress translation of their downstream main ORFs, whereas sORFs may encode signaling peptides. Besides evolutionarily conserved sORFs, we uncovered 96 Solanaceae-specific sORFs, revealing the importance of studying translatomes directly in crops. Proteomic analysis confirmed that some of the unannotated ORFs generate stable proteins in planta. In addition to defining the translatome, our results reveal the global regulation by uORFs and microRNAs. Despite diverging over 100 million years ago, many translational features are well conserved between Arabidopsis and tomato. Thus, our approach provides a high-throughput method to discover unannotated ORFs, elucidates evolutionarily conserved and unique translational features, and identifies regulatory mechanisms hidden in a crop genome.

1 This work was supported by the U.S. Department of Agriculture-National Institute of Food and Agriculture (postdoctoral fellowship 2016-67012-24720) and Michigan State University (startup fund) to P.Y.H.; the National Science Foundation (1759023), the U.S. Department of Agriculture-National Institute of Food and Agriculture (Hatch project 3808), and Iowa State University (Plant Sciences Institute Award) to J.W.W.

2 Author for contact: [email protected].

3 Senior author.

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Polly Hsu ([email protected]).

H.-Y.L.W. and P.Y.H. designed the research; P.Y.H. performed the sequencing experiments; H.-Y.L.W. and P.Y.H. analyzed the sequencing data; G.S. and J.W.W. performed the proteomic experiments and analyzed the proteomic data; H.-Y.L.W. and P.Y.H. wrote the article with input from all authors.

[OPEN] Articles can be viewed without a subscription.
www.plantphysiol.org/cgi/doi/10.1104/pp.19.00541

Plant Physiology®, September 2019, Vol. 181, pp. 367–380, www.plantphysiol.org © 2019 American Society of Plant Biologists. All Rights Reserved.

https://plantphysiol.org Downloaded on December 21, 2020. - Published by Copyright (c) 2020 American Society of Plant Biologists. All rights reserved.

Besides being an essential step in gene expression, mRNA translation directly shapes the proteome, which contributes to cellular structure, function, and activity in all organisms. The characterization of translational regulation in Arabidopsis (Arabidopsis thaliana) has enabled crop improvement, including increasing tomato (Solanum lycopersicum) sweetness, rice (Oryza sativa) immunity, and lettuce (Lactuca sativa) resistance to oxidative stress (Sagor et al., 2016; Xu et al., 2017b; Zhang et al., 2018). However, not everything in Arabidopsis is applicable to other plants, and how the Arabidopsis translatome compares with other species is largely unknown. Moreover, due to limited genomic resources and methods, translational landscapes and their underlying regulation in crops remain understudied.

Ribosome profiling, or Ribo-seq, has emerged as a high-throughput technique to study global translation (Ingolia et al., 2009; Brar and Weissman, 2015; Andreev et al., 2017). In a Ribo-seq experiment, ribosomes in the sample of interest are immobilized and the lysate is treated with nucleases to obtain ribosome-protected mRNA fragments (i.e. ribosome footprints). Finally, sequencing of the ribosome footprints reveals the quantity and positions of ribosomes on a given transcript. Because ribosomes decipher mRNA every three nucleotides, the periodic feature of ribosome footprints can be used to uncover previously unannotated translation events (Bazzini et al., 2014; Fields et al., 2015; Ji et al., 2015; Calviello et al., 2016; Hsu et al., 2016). For example, upstream open reading frames (uORFs) in the 5′ leader sequence or 5′ untranslated region (UTR) have been shown to be widespread in many protein-coding genes in humans (Homo sapiens), mouse (Mus musculus), zebrafish (Danio rerio), yeast (Saccharomyces cerevisiae), and plants (Brar et al., 2012; Liu et al., 2013; Ji et al., 2015; Lei et al., 2015; Chew et al., 2016; Hsu et al., 2016; Johnstone et al., 2016). Several well-characterized examples and global analyses indicate that uORFs can modulate the translation of their downstream main ORFs (Liu et al., 2013; von Arnim et al., 2014; Lei et al., 2015; Chew et al., 2016; Johnstone et al., 2016; Hsu and Benfey, 2018). Moreover, numerous presumed noncoding RNAs have been found to possess translated small ORFs (sORFs), usually below 100 codons (Bazzini et al., 2014; Hsu et al., 2016; Bazin et al., 2017; Ruiz-Orera and Albà, 2019). The small size of the protein products of sORFs suggests that they may serve as signaling peptides (Hsu and Benfey, 2018; Ruiz-Orera and Albà, 2019). Despite their importance, uORFs and sORFs are often missing in annotations because computational predictions often assume that (1) protein-coding sequences encode proteins greater than 100 amino acids and (2) only the longest ORF in a transcript is translated (Basrai et al., 1997; Claverie, 1997). Thus, ribosome profiling provides an unparalleled opportunity to experimentally identify translated ORFs genome wide in an unbiased manner.
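To make the two annotation assumptions above concrete, the following is a minimal sketch, not any published annotation pipeline. It enumerates every AUG-initiated ORF in the three forward frames of a transcript and then applies a "longest ORF above a size cutoff" filter. The function names, the toy sequence, and the cutoff of 5 codons used in the demonstration are all hypothetical and chosen only for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(transcript, min_codons=2):
    """Return (start, end, n_codons) for ATG-initiated ORFs in the 3 forward frames.

    start is 0-based, end is exclusive and includes the stop codon,
    n_codons counts sense codons (start codon included, stop excluded).
    For each stop codon only the most upstream in-frame ATG is reported.
    """
    seq = transcript.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None
    return sorted(orfs)


def longest_orf_filter(orfs, min_codons=100):
    """Mimic both assumptions: keep only the longest ORF, and only if it is long enough."""
    if not orfs:
        return None
    best = max(orfs, key=lambda orf: orf[2])
    return best if best[2] >= min_codons else None


# Hypothetical toy transcript: a 2-codon uORF followed by a 6-codon main ORF.
toy = "GG" + "ATGGCTTAA" + "CC" + "ATG" + "GCT" * 5 + "TGA" + "AA"
all_orfs = find_orfs(toy)
kept = longest_orf_filter(all_orfs, min_codons=5)
missed = [orf for orf in all_orfs if orf != kept]
```

In this toy case the filter keeps only the main ORF and the upstream ORF ends up in `missed`, which is the kind of omission that experimental evidence from ribosome footprints can recover.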

In plant research, ribosome profiling has been used to study translational regulation in diverse aspects of plant development and response to stress, including photomorphogenesis, chloroplast differentiation, cotyledon development, hypoxia, hormone responses, nutrient deprivation, drought, pathogen responses, and biogenesis of small interfering RNAs (Liu et al., 2013; Zoschke et al., 2013; Juntawong et al., 2014; Lei et al., 2015; Merchante et al., 2015; Chotewutmontri and Barkan, 2016; Li et al., 2016; Bazin et al., 2017; Xu et al., 2017a; Shamimuzzaman and Vodkin, 2018). We previously optimized the resolution of this technique in Arabidopsis to resolve three-nucleotide periodicity, which enabled us to precisely define translated regions within individual transcripts. As a result, we were able to identify previously unannotated translation events, including usage of non-AUG start sites, uORFs in 5′ UTRs, and sORFs in annotated noncoding RNAs (Hsu et al., 2016). To date, systematically identifying translated ORFs in plants has only been attempted in Arabidopsis (Hsu et al., 2016; Bazin et al., 2017).

Tomato is the most widely cultivated vegetable worldwide (Schwarz et al., 2014). It belongs to the Solanaceae, whose members produce important foods, spices, and medicines. Like other crops, tomato has limited genomic resources and few optimized methods. For instance, the latest annotation from the International Tomato Annotation Group (ITAG), ITAG 3.2 for cv Heinz 1706, only contains predicted protein-coding genes, whereas noncoding RNAs and uORFs are not included (Fernandez-Pozo et al., 2015). We chose seedling roots to establish the protocol for translatomic analysis for two reasons: (1) the root plays an essential role in water/nutrient uptake as well as interaction between plants and other organisms or the environment; and (2) the root is composed of diverse cell types, which is beneficial for surveying translation events, as we observed in our previous work in Arabidopsis seedlings (Hsu et al., 2016). Here, we performed ribosome profiling in combination with de novo transcriptome assembly to discover noncoding RNAs, uORFs, and sORFs and chart the translational landscape in tomato roots. The mapping and quantification of ribosome footprints in tomato not only uncovered numerous unannotated translation events but also revealed global features involved in translational regulation.

RESULTS

Establishment of an Experimental and Data Analysis Pipeline to Map the Tomato Translatome

To map actively translated ORFs, we isolated the roots of tomato seedlings (cv Heinz 1706) and performed strand-specific RNA sequencing (RNA-seq) and Ribo-seq in parallel (Fig. 1, A and B). RNA-seq reveals transcript identity and abundance, whereas Ribo-seq maps and quantifies ribosome occupancy on a given transcript (Brar and Weissman, 2015). We adapted our protocol and pipeline for Arabidopsis (Hsu et al., 2016) with two major modifications: (1) we increased the amount of RNase I used in tomato ribosome footprinting to achieve comparable resolution (see “Materials and Methods” for details); and (2) we performed paired-end 100-bp RNA-seq followed by reference-guided de novo transcriptome assembly to capture transcripts missing from the ITAG3.2 reference annotation (Fig. 1C; see “Materials and Methods” for details). This strategy allowed us to map the translated regions in both annotated and previously unannotated transcripts in an unbiased manner using the ORF-finding tool, RiboTaper (Calviello et al., 2016).

As the quality of ribosome footprints is critical for finding ORFs (Hsu et al., 2016), we first systematically evaluated the Ribo-seq results by mapping the reads to the ITAG3.2 annotation. Consistent with observations in nonplant organisms and Arabidopsis (Ingolia et al., 2009; Bazzini et al., 2014; Hsu et al., 2016), the dominant ribosome footprints in tomato were 28 nucleotides long (Fig. 2A). Moreover, in contrast to RNA-seq, the Ribo-seq reads predominantly mapped to the annotated coding sequences (CDSs) and were sparse in the 5′ UTRs and 3′ UTRs (Fig. 2, B and C). The three biological replicates were highly correlated, as indicated by the Pearson correlation, in both Ribo-seq (r = 0.998 to 1) and RNA-seq (r = 0.998 to 0.999; Supplemental Fig. S1, A and B). Overall, the RNA-seq and Ribo-seq data sets also showed a strong positive correlation (Pearson correlation after removing two extreme outliers, r = 0.878–0.88; Spearman correlation with all data points, r = 0.912–0.915; Supplemental Fig. S1, C–F). Most importantly, the distribution of ribosome footprints within the CDS displayed clear three-nucleotide periodicity, a signature of translating ribosomes that decipher three nucleotides at a time (Fig. 2C; Supplemental Fig. S2). Analyzing the distribution of footprints relative to the annotated translation start/stop sites allowed us to infer that the codon at the P-site within the ribosome is located between nucleotides 13 and 15 for 28-nucleotide footprints, and so on for specific footprint lengths (Supplemental Figs. S2 and S3). To visualize the position of the codon being translated, hereafter we use the first nucleotides of the P-sites (denoted as P-site signals) to indicate the positions of the footprints on the transcripts (Fig. 2C). The robustness of the three-nucleotide periodicity can be quantified based on the percentage of reads in the expected reading frame (shown in red in Fig. 2C and hereafter). At a global level, our 28-nucleotide footprints resulted in 85.5% in-frame reads. Together, these results demonstrate that our tomato Ribo-seq data set is of high quality compared with data sets from plants and other organisms (Bazzini et al., 2014; Guydosh and Green, 2014; Chung et al., 2015; Schafer et al., 2015; Hsu et al., 2016).

Figure 1. Experimental and data analysis procedures for ribosome profiling in tomato roots. A, Four-day-old tomato seedling roots (~3 cm from the tip) were used in this study. B, Experimental workflow for RNA-seq and Ribo-seq and schematics of their expected read distributions in the three reading frames. This figure was adapted from Hsu et al. (2016). C, Data analysis workflow for reference-guided de novo transcriptome assembly and ORF discovery using RiboTaper.

Figure 2. Ribosome footprints are enriched in coding sequences and display strong three-nucleotide periodicity. A, Distribution of read length of the ribosome footprints. nt., Nucleotides. B, Distribution of the Ribo-seq and RNA-seq reads in different genomic features annotated in ITAG3.2. C, Meta-gene analysis of the 28-nucleotide ribosome footprints near the annotated translation start and stop sites defined by ITAG3.2. The red, blue, and green bars represent reads mapped to the first (expected), second, and third reading frames, respectively. The majority of footprints were mapped to the CDS in the expected reading frame (85.5% in frame). For each read, only the first nucleotide in the P-site was plotted (for details, see Supplemental Figs. S2 and S3). The A-site (aminoacyl-tRNA entry site), P-site (peptidyl-tRNA formation site), and E-site (uncharged tRNA exit site) within the ribosomes at translation initiation and termination, and the inferred P-site (nucleotides 13–15) and A-site (nucleotides 16–18), are illustrated. The original meta-plots generated by RiboTaper for all footprint lengths are shown in Supplemental Figure S2.

Next, we performed reference-guided de novo transcriptome assembly for the RNA-seq data using stringtie, a transcript assembler (Pertea et al., 2015). Then, the newly assembled transcriptomes from the replicates were merged and compared with the ITAG3.2 annotations using gffcompare software (Pertea et al., 2016; Fig. 1C). In total, we uncovered 2,263 unannotated transcripts that could potentially encode novel proteins. These transcripts could be classified into six groups based on their strands and genomic positions relative to existing gene features, such as intergenic (class u), cis-natural antisense transcripts (class x), intronic (class i), and others (class y and class o; Fig. 3, A and C); the nomenclature and descriptions of these discovered transcripts are adapted based on the gffcompare software (Pertea et al., 2016). Class s is expected to result from mapping errors (Pertea et al., 2016) and was included in our downstream analysis as a negative control. The most abundant classes of uncharacterized transcript in our data were intergenic transcripts (class u; 1,260) and cis-natural antisense transcripts (class x; 568). All six classes of uncharacterized transcripts, along with the annotated genes in ITAG3.2, were used to find translated ORFs.
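The class-based grouping described above can be sketched as a simple tally over transcript records. This is an illustrative sketch only: it assumes that each transcript line of the comparison output carries a `class_code "…"` attribute in GTF style, and the toy records, transcript IDs, and function name are hypothetical. The labels follow the wording used in the text rather than the tool's own documentation.

```python
import re
from collections import Counter

# Labels follow the descriptions given in the text above.
CLASS_LABELS = {
    "u": "intergenic",
    "x": "cis-natural antisense",
    "i": "intronic",
    "y": "other",
    "o": "other",
    "s": "expected mapping error (negative control)",
}

CLASS_RE = re.compile(r'class_code "([^"]+)"')


def count_classes(gtf_lines, keep=CLASS_LABELS):
    """Count transcript records per class code, ignoring codes not in `keep`."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only full GTF records of feature type "transcript" are tallied.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = CLASS_RE.search(fields[8])
        if match and match.group(1) in keep:
            counts[match.group(1)] += 1
    return counts


# Hypothetical toy records; "=" marks a transcript matching the reference.
toy_gtf = [
    "# toy comparison output",
    'chr1\tStringTie\ttranscript\t100\t900\t.\t+\t.\ttranscript_id "T1"; class_code "u";',
    'chr1\tStringTie\texon\t100\t900\t.\t+\t.\ttranscript_id "T1";',
    'chr1\tStringTie\ttranscript\t2000\t2600\t.\t-\t.\ttranscript_id "T2"; class_code "x";',
    'chr2\tStringTie\ttranscript\t50\t400\t.\t+\t.\ttranscript_id "T3"; class_code "u";',
    'chr2\tStringTie\ttranscript\t900\t1500\t.\t+\t.\ttranscript_id "T4"; class_code "=";',
]
toy_counts = count_classes(toy_gtf)
```

Keeping class s in the tally, rather than discarding it, mirrors its use in the text as a negative control for the downstream ORF search.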

Translational Landscape of Tomato Roots as Defined by Ribosome Profiling

After collecting the transcript information, we used RiboTaper (Calviello et al., 2016) to interrogate both the annotated transcripts in ITAG3.2 and the newly assembled transcripts to search for all possible ORFs in the transcriptome. RiboTaper examines the P-site signals within each possible ORF and tests whether the signals display a statistically significant three-nucleotide periodicity (Calviello et al., 2016). As a quality control, we first examined translated ORFs detected at annotated coding regions. In total, 20,659 annotated ORFs were identified as translated in our data set (Fig. 3B; Supplemental Data Set S1A). Among 20,285 annotated protein-coding transcripts that have reasonable transcript levels (transcripts per million [TPM] > 0.5 in RNA-seq), 18,626 (92%) have translated ORFs identified. This indicates that our approach to identifying translated ORFs is efficient and robust. In addition to annotated ORFs, there were 1,329 unannotated uORFs translated from the 5′ UTR of annotated genes (Fig. 3B; Supplemental Data Sets S1B and S2). Notably, since only approximately half of the transcripts in ITAG3.2 (17,684 out of 35,768) have an annotated 5′ UTR and because RiboTaper can only identify ORFs in defined transcript ranges, the total number of uORFs in tomato root is clearly an underestimate.
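The idea of P-site signals and reading-frame assignment can be illustrated with a short sketch. It is a simple frame count and does not reproduce RiboTaper's statistical test. The only value taken from the text is the P-site position for 28-nucleotide footprints (nucleotides 13 to 15, i.e. a 0-based offset of 12 from the 5′ end); the coordinates, function name, and toy reads are hypothetical.

```python
from collections import Counter

# 0-based offset from the footprint 5' end to the first P-site nucleotide.
# Only the 28-nt value comes from the text; other lengths would need their own offsets.
P_SITE_OFFSET = {28: 12}


def frame_fractions(footprints, cds_start, cds_end, offsets=P_SITE_OFFSET):
    """Fraction of P-site signals in frames 0, 1, 2 relative to the start codon.

    footprints: iterable of (five_prime_position, length), 0-based transcript coordinates.
    cds_start: 0-based position of the first nucleotide of the start codon.
    cds_end: exclusive end of the coding region.
    """
    counts = Counter()
    for five_prime, length in footprints:
        if length not in offsets:
            continue  # no calibrated offset for this footprint length
        p_site = five_prime + offsets[length]
        if cds_start <= p_site < cds_end:
            counts[(p_site - cds_start) % 3] += 1
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(counts[frame] / total for frame in range(3))


# Hypothetical toy data: 8 in-frame reads, 1 read in each other frame,
# and one 30-nt read that is skipped because it has no offset.
toy_reads = [(38 + 3 * k, 28) for k in range(8)] + [(39, 28), (40, 28), (38, 30)]
toy_fractions = frame_fractions(toy_reads, cds_start=50, cds_end=350)
```

The first element of the returned tuple corresponds to the in-frame percentage discussed for the 28-nucleotide footprints; a value near one third in all three frames would indicate no periodicity.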

Excitingly, we identified 354 unannotated translated ORFs from the newly assembled transcripts (Fig. 3B; Supplemental Data Sets S1C and S3). These unannotated ORFs were found in different classes of transcripts, but none were detected in the negative control, class s (Fig. 3C). As expected, most of the newly discovered ORFs were relatively small; ~71% of them (250) encode proteins of less than 100 amino acids (Fig. 3D). Due to their relatively small size, hereafter we call them small ORFs (sORFs). The average lengths of the uORFs, sORFs, and annotated ORFs are 31, 95, and 422 amino acids, respectively. Among the 354 sORFs, 87 have a predicted signal peptide and are expected to be secreted proteins/peptides (Fig. 3E; Supplemental Data Set S1D). To test if the sORFs and annotated ORFs have similar translational properties, we compared their translation efficiency (see the definition in “Materials and Methods”) and found that they were statistically indistinguishable (Fig. 3F). This result supports the newly identified sORFs as genuine protein-coding genes in the tomato genome.
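The comparison of translation efficiency can be sketched as follows. The authors' exact definition is given in their “Materials and Methods” and is not reproduced in this section, so the sketch assumes the commonly used ratio of Ribo-seq TPM to RNA-seq TPM over the coding region. All function names are hypothetical, and the distance function returns only the two-sample Kolmogorov-Smirnov statistic D, not a P value.

```python
from bisect import bisect_right


def tpm(read_counts, lengths):
    """Transcripts per million from read counts and feature lengths (same order)."""
    rates = [count / length for count, length in zip(read_counts, lengths)]
    total = sum(rates)
    return [rate / total * 1e6 for rate in rates]


def translation_efficiency(ribo_tpm, rna_tpm):
    """Assumed definition: Ribo-seq TPM / RNA-seq TPM; None where RNA TPM is zero."""
    return [ribo / rna if rna > 0 else None for ribo, rna in zip(ribo_tpm, rna_tpm)]


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a + b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

A small D between the efficiency values of two ORF groups is consistent with similar distributions; judging significance additionally requires the sample sizes and a P value, for which a statistics library would normally be used.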

The majority of the identified ORFs have high fractions of P-site signals mapped to the expected reading frame (Supplemental Fig. S4). Visualizing the profiles of individual transcripts confirmed that both the sORFs and numerous annotated ORFs display strong three-nucleotide periodicity within the identified coding regions (Fig. 3, G and H). Therefore, by combining the high-quality Ribo-seq data with RiboTaper analysis, we not only validated many of the annotated gene models but also discovered new ORFs. These previously unannotated translated regions have been compiled and are ready to be incorporated into the official tomato annotation (Supplemental Data Sets S1, A–C, S2, and S3).

Evolutionarily Conserved and Solanaceae-Specific sORFs

Previously, we identified 27 sORFs in Arabidopsis by applying RiboTaper on Ribo-seq data (Hsu et al., 2016). Eight of the Arabidopsis sORFs have known tomato homologs. Our tomato root data showed that seven of the conserved sORFs were both transcribed and translated (Supplemental Fig. S5, A–D). Since Arabidopsis and tomato diverged approximately 100 million years ago (Ku et al., 2000), our data support that some sORFs are conserved across evolution.

Figure 3. The translational landscape of the tomato root. A, Classes of newly assembled transcripts identified by stringtie and gffcompare and used in downstream ORF identification. This figure was adapted from the gffcompare Web site (Pertea et al., 2016). B, Summary of translated ORFs identified by RiboTaper in our data set and peptide support from mass spectrometry (MS) data. The uORFs and annotated ORFs were identified from the 5′ UTRs and expected CDSs of annotated protein-coding genes in

If the newly identified tomato sORFs encode proteins for conserved biological processes, we would expect them to be preserved during evolution. We performed TBLASTN using 157 single-exon sORFs that were 16 to 100 amino acids long on 10 diverse plant genomes, including a wild tomato (Solanum pennellii), potato (Solanum tuberosum, which belongs to the same family as tomato, the Solanaceae), four dicots in other families, two monocots, a lycophyte, and a moss (Supplemental Fig. S6). In total, we found 96 Solanaceae-specific sORFs, including 18 sORFs unique to tomato and 78 sORFs shared by tomato and either wild tomato or potato. Out of 157 sORFs analyzed, 139 have homologs in at least one other plant genome. Some of the sORFs are highly conserved across these 10 genomes (Supplemental Fig. S6), suggesting the functional significance of these sORFs throughout evolution. Importantly, the conserved patterns among the homologs correlate well with their phylogenetic relationships, indicating that these sORF homologs are unlikely to be false positives that randomly occurred in the BLAST search. While some sORFs are widely conserved, 96 sORFs are unique to Solanaceae, highlighting that our approach to study translatomes directly in tomato revealed translational events that were impossible to learn about by studying Arabidopsis alone. Taken together, our results reveal both evolutionarily conserved and Solanaceae-specific sORFs.
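The bookkeeping behind the conservation categories above can be sketched as a small classifier over homology hits. This is an illustrative sketch under stated assumptions: the genome labels, sORF names, and function names are hypothetical, and a "hit" simply means that a homolog was reported in that genome by whatever search criteria were applied. The two assertions at the end only check that the counts quoted in the text are mutually consistent.

```python
# Hypothetical labels for the two Solanaceae genomes other than cultivated tomato.
SOLANACEAE_RELATIVES = {"wild_tomato", "potato"}


def classify_sorf(hit_genomes):
    """Assign one sORF to a conservation category from the genomes with a homolog."""
    hits = set(hit_genomes)
    if not hits:
        return "tomato-unique"
    if hits <= SOLANACEAE_RELATIVES:
        return "Solanaceae-shared"
    return "beyond-Solanaceae"


def summarize(hits_by_sorf):
    """Count sORFs per category and derive the Solanaceae-specific total."""
    counts = {"tomato-unique": 0, "Solanaceae-shared": 0, "beyond-Solanaceae": 0}
    for hit_genomes in hits_by_sorf.values():
        counts[classify_sorf(hit_genomes)] += 1
    counts["Solanaceae-specific"] = counts["tomato-unique"] + counts["Solanaceae-shared"]
    return counts


# Hypothetical toy input, not data from the study.
toy_hits = {
    "sORF_a": [],
    "sORF_b": ["potato"],
    "sORF_c": ["wild_tomato", "potato"],
    "sORF_d": ["potato", "monocot_1"],
}
toy_summary = summarize(toy_hits)

# Consistency of the counts reported in the text.
assert 18 + 78 == 96      # tomato-unique + shared within Solanaceae
assert 157 - 18 == 139    # sORFs with a homolog in at least one other genome
```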

Some sORFs and uORFs Generate Stable Proteins in Planta

To evaluate whether the previously unknown ORFs, including sORFs and uORFs, accumulate stable proteins in planta and to validate our Ribo-seq results, we performed a proteogenomic analysis (Walley and Briggs, 2015) to identify novel peptides arising from these unannotated ORFs. Because the sORFs and uORFs are quite small, their protein products do not always generate peptides with ideal size and/or mass-to-charge ratios that are suitable for detection by MS. To increase the diversity of peptides for MS analysis, we extracted proteins from the roots and shoots of tomato seedlings and digested the proteins into peptides using trypsin or GluC, independently, prior to two-dimensional liquid chromatography-tandem mass spectrometry (MS/MS). As the sORFs and uORFs are currently missing from the tomato annotation, we created a custom protein database (Supplemental Data Set S4) derived from our Ribo-seq data to assist in identifying these unannotated proteins. In addition, we used our custom protein database to search publicly available proteomic data from the tomato fruit (ProteomeXchange PXD004887) and pericarp (ProteomeXchange PXD004947; Mata et al., 2017; Szymanski et al., 2017). In total, we identified 12,172 proteins, including 29 sORFs and 30 uORFs, with at least one unique peptide from these six proteomic data sets (Fig. 3B; Supplemental Data Sets S1, E and F, and S5, A–C). The MS detection rates (at least one unique peptide) for sORFs below 100 amino acids, 100 to 200 amino acids, and higher than 200 amino acids in our data are 4.8%, 16.3%, and 35.3%, respectively, suggesting that proteins with a larger size have better chances to be detected by MS. Despite the limitations of MS in small protein identification, our results support that some uORFs and sORFs accumulate stable proteins in planta.
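A custom protein database of the kind mentioned above is, at its simplest, a FASTA file of translated ORF sequences. The sketch below shows one way such a file could be produced from ORF nucleotide sequences using the standard genetic code. It is not the authors' procedure: the function names, record names, and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence up to (not including) the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks ambiguous codons
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def to_fasta(orfs):
    """Build FASTA text from a mapping of record name to ORF nucleotide sequence."""
    return "".join(f">{name}\n{translate(seq)}\n" for name, seq in orfs.items())


# Hypothetical toy ORFs, not sequences from the study.
toy_orfs = {"uORF_example": "ATGGCTTAA", "sORF_example": "ATGAAATGGTGA"}
toy_fasta = to_fasta(toy_orfs)
```

In practice such records would be appended to the annotated proteome before the peptide search, so that spectra can match either annotated or newly identified proteins.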

Ribo-Seq Fine-Tunes and Improves Genome Annotation

Comparing the RiboTaper output and the annotated gene models, we found cases in which the translated ORFs were dramatically different from the predicted gene models. For example, translation may occur in a different reading frame or at a distinct region on the transcript (Fig. 3I; Supplemental Fig. S7, A–F). Thus, Ribo-seq provides a high-throughput experimental approach to validate and improve genome annotation. Furthermore, in several cases, using visual inspection, we found regions that appear to contain a short ORF that overlaps with the long annotated ORF but uses a different reading frame (Fig. 3J). These overlapping ORFs are similar to nonupstream coding ORFs identified in the human genome (Michel et al., 2012), and their functional importance is still unknown.

The translation start sites in the genome annotation are typically defined computationally, and often the most upstream AUG is predicted to be the start codon. Unexpectedly, in 64 genes, the RiboTaper-defined translation start sites were actually upstream of the annotated start sites (Fig. 4A; Supplemental Data Set S1G). In contrast, some ORFs appeared to use start sites downstream of the annotated start sites (Fig. 4B; Supplemental Fig. S5B). Currently, ITAG3.2 contains only one isoform per gene, and hence only one transcription start site is predicted per gene. It is possible that, in some cases, translation starts downstream of the

Figure 3. (Continued.)ITAG3.2, respectively. The previously unknown ORFs were identified from the newly assembled transcripts. The bottom rowindicates the number of proteins in each category supported by MS data sets, either from our own proteomic analysis or searchesagainst publicly available data. C, Summary of newly assembled transcripts andORFs identified in each class of newly assembledtranscripts. The total number of transcripts, number of transcripts identified as translated, and total number of translated ORFs arelisted. D, Size distribution of each class of sORFs, uORFs, and annotated ORFs (aORFs). E, Predicted subcellular localization ofproteins encoded by the sORFs. The prediction was performed using TargetP (Emanuelsson et al., 2000) with specificity 0.9 as acutoff. F, Translation efficiency of sORFs comparedwith annotatedORFs. Only the coding regionswere used to compute the TPMand translation efficiency of each transcript. For the x axis, only the range from 0 to 3 (arbitrary units) is shown. A two-sampleKolmogorov-Smirnov test was used to determine statistical significance. G to J, RNA-seq coverage and Ribo-seq periodicity indifferent genes: an intergenic sORFon chromosome 4 (G); an annotated coding gene that has good support from the Ribo-seq datafor the predicted gene model (H); a misannotated ORF (I), note that the Ribo-seq reads do not match the CDS in the gene modeland a different reading frame is used; a transcript with a potentially overlapping ORF within the annotated ORF (J). In G to J, the xaxis indicates the genomic coordinate of the gene and the y axis shows the normalized read count (counts per hundred millionreads). Ribo-seq reads are shown by plotting the first nucleotide of their P-sites (denoted as the P-site signals). The black and graydashed vertical lines mark the predicted translation start and stop sites, respectively. 
The red, blue, and green lines in the Ribo-seqplot indicate the P-site signals mapped to the first (expected) reading frame and the second and third reading frames, respectively;the gray lines indicate the P-site signals mapped to outside of the annotated or identified coding regions. Hence, a higher ratio ofred means better three-nucleotide periodicity. For the gene model beneath the Ribo-seq data, the gray, black, and white areasindicate the 59 UTR, CDS, and 39 UTR, respectively. In J, the yellow box above the gene model indicates the region with apotential ORF overlapping with the annotated ORF.

372 Plant Physiol. Vol. 181, 2019

Wu et al.

https://plantphysiol.orgDownloaded on December 21, 2020. - Published by Copyright (c) 2020 American Society of Plant Biologists. All rights reserved.

Page 7: The Tomato Translational Landscape Revealed by · The Tomato Translational Landscape Revealed by Transcriptome Assembly and Ribosome Profiling1[OPEN] Hsin-Yen Larry Wu,a Gaoyuan

annotated site, because transcription initiates down-stream of the annotated transcription start site. None-theless, it appears that the most upstream AUG is notalways used as the translation start site.Non-AUG translation initiation has been discovered

in animals and plants (Simpson et al., 2010; Laing et al.,2015; Kearse and Wilusz, 2017; Spealman et al., 2018).Twelve evolutionarily conserved noncanonical trans-lation starts upstream of themost likely AUGhave beenpredicted in Arabidopsis (Simpson et al., 2010), and wepreviously showed that at least one of them, inAT3G10985, has high Ribo-seq coverage using a CUGcodon (Hsu et al., 2016). The profile of the tomato ho-molog of AT3G10985 confirmed the possible usage ofthe CUG start site (Fig. 4C). Next, we identified tomatohomologs of all 12 predicted noncanonical-start genesand systematically checked their Ribo-seq coverageupstream of the annotated AUG start sites. We selectedgenes that met the following criteria: (1) the Ribo-seqreads cover at least seven in-frame P-site positionswithin the first 20 codons upstream of the AUG; and (2)there is no stop codon within the first 20 codons up-stream of the AUG. We found that eight tomato genesthat met the above criteria contain abundant readsupstream of the annotated AUG, suggesting that theyuse non-AUG start sites (Fig. 4D). Thus, despite theevolutionary distance between Arabidopsis and to-mato, the usage of noncanonical translation initiationremains conserved in these homologs.
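The two selection criteria for candidate non-AUG starts can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function name, the input format (a 5′ UTR sequence ending immediately before the annotated AUG, plus a dictionary of P-site counts keyed by 0-based UTR position), and the decision to reject UTRs shorter than 20 codons are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def supports_upstream_start(utr_seq, psite_counts, n_codons=20, min_positions=7):
    """Apply the two criteria to the n_codons codons directly upstream of the AUG.

    utr_seq      -- 5' UTR sequence (DNA alphabet) ending right before the annotated AUG
    psite_counts -- hypothetical input: {0-based position in utr_seq: P-site read count}
    """
    window_len = 3 * n_codons
    if len(utr_seq) < window_len:
        return False  # assumption: too little upstream sequence to evaluate
    start = len(utr_seq) - window_len  # first base of the in-frame window
    window = utr_seq[start:].upper().replace("U", "T")

    # Criterion 2: no in-frame stop codon within the window.
    codons = [window[i:i + 3] for i in range(0, window_len, 3)]
    if any(codon in STOP_CODONS for codon in codons):
        return False

    # Criterion 1: count in-frame codon positions carrying at least one P-site read.
    covered = sum(
        1 for pos in range(start, len(utr_seq), 3) if psite_counts.get(pos, 0) > 0
    )
    return covered >= min_positions
```

Only P-site signals in the same frame as the annotated AUG are counted, so reads in the other two frames do not contribute to criterion 1.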

uORFs Regulate Translation Efficiency

Using RiboTaper, we identified 1,329 translated uORFs based on their significant three-nucleotide periodicity (Fig. 3B; Supplemental Data Sets S1B and S2). These uORFs included previously predicted conserved uORFs in the tomato SAC51 homolog (Fig. 5A; Imai et al., 2006) as well as previously unknown uORFs in numerous coding genes (Fig. 5B). Manual inspection of these transcripts suggested that the high stringency of RiboTaper might miss uORFs with lower periodicity, overlapping uORFs, and non-AUG-start uORFs. For example, the second of the three uORFs in the tomato SAC51 transcript (Fig. 5A) was not identified as coding by RiboTaper, presumably due to the imperfect periodicity in this area. Nevertheless, those identified are high-confidence translated uORFs.

Global analyses have reported that translated uORFs repress the translation of their downstream main ORFs (Liu et al., 2013; Lei et al., 2015; Chew et al., 2016; Johnstone et al., 2016). Consistent with these reports, we found that globally, transcripts containing uORFs have lower translation efficiency than those without uORFs (Fig. 5C). In addition, more uORFs in a transcript correlate with stronger translational repression (Fig. 5C). To investigate which physiological pathways might be regulated by uORFs, we checked the Gene Ontology (GO) terms of the uORF-containing genes. Intriguingly, uORF-containing genes were enriched for protein kinases and phosphatases as well as signal transduction (Fig. 5D). This is similar to a previous prediction in Arabidopsis (Kim et al., 2007), except that transcription factors are not enriched in our data. Our results imply that substantial portions of the protein phosphorylation/dephosphorylation and signal transduction pathways in tomato are likely translationally regulated through uORFs.

Figure 4. Upstream/downstream start sites and non-AUG start sites. A and B, Examples of the usage of an upstream start site (A) or a downstream start site (B). The gene model and data presentation are the same as those described in the legend of Figure 3. The blue triangles mark the locations of the annotated translation start sites. The orange triangles mark the locations of the RiboTaper-identified translation start sites. C, A tomato homolog of an Arabidopsis gene that was predicted to use an upstream CUG start site (orange triangle). Note the abundant in-frame P-site signals upstream of the annotated AUG start (blue triangle) in the 5′ UTR. D, Conservation of potential CUG/non-AUG start sites. The Arabidopsis gene identifier, tomato gene identifier, percentage amino acid identity, and number of in-frame P-site positions with Ribo-seq reads within the first 20 codons upstream of the AUG in our tomato root data are shown.

Translation start sites have a well-defined Kozak consensus sequence in different organisms (Kozak, 1987; Lütcke et al., 1987). For example, the conserved nucleotides at positions −3 and +4 of the Kozak sequence in plants (counting the A of the AUG start codon as +1) are purines (A/G) and G, respectively (Lütcke et al., 1987). As expected, we observed this conserved pattern among the annotated ORFs (Fig. 5E). Next, we examined the Kozak consensus sequences of the translated uORFs and their downstream main ORFs. Whereas the downstream main ORFs also favor the conserved nucleotides at −3 and +4 of the Kozak sequence, this pattern is missing in the uORFs (Fig. 5, E and F). Similar results were observed in the Kozak sequences of the uORFs and downstream main ORFs in Arabidopsis (Liu et al., 2013). The poorly conserved Kozak sequences might allow for more leaky scanning, a phenomenon in which a weak initiation context is sometimes skipped by the ribosome during translation initiation, so that the downstream main ORFs still have some chance of being translated.
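As a small illustration of how the −3 and +4 positions relate to sequence coordinates, the following Python sketch checks the two conserved positions around a given start codon. The function name and return format are hypothetical, and it assumes `start` is the 0-based index of the A in the AUG within a transcript sequence.

```python
def kozak_context(seq, start):
    """Report whether a start codon has a purine at -3 and a G at +4.

    seq   -- transcript sequence (DNA or RNA alphabet)
    start -- 0-based index of the A of the start codon (position +1)
    """
    seq = seq.upper().replace("U", "T")
    # Position -3 lies three bases upstream of the A (there is no position 0).
    minus3 = seq[start - 3] if start >= 3 else None
    # Position +4 is the base immediately after the three-base start codon.
    plus4 = seq[start + 3] if start + 3 < len(seq) else None
    return {
        "purine_at_minus3": minus3 in ("A", "G"),
        "g_at_plus4": plus4 == "G",
    }

# Example: a strong context (GCCACC AUG G) versus a weak one.
strong = kozak_context("GCCACCATGGCG", 6)
weak = kozak_context("TTTTTTATGCTT", 6)
```

Tallying these two flags separately for uORF starts and main ORF starts would give the kind of comparison summarized in Figure 5, E and F, although the statistical test used there is not reproduced in this sketch.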

Regulation of Gene Expression by MicroRNAs

MicroRNAs regulate gene expression through mRNA cleavage and translational repression (Yu et al., 2017; Li et al., 2018). The roles of microRNAs in tomato are less well understood than in Arabidopsis. We first predicted 6,312 microRNA target genes in tomato (Supplemental Data Set S1, H and I) using psRNATarget (Dai et al., 2018). Next, we compared their RNA-seq and Ribo-seq levels and the translation efficiency of the microRNA targets and other coding genes globally. The transcript levels of the microRNA targets were slightly but significantly reduced, consistent with the possibility that microRNAs regulate gene expression through mRNA cleavage (Fig. 6A). In addition, both the Ribo-seq levels and translation efficiency of the microRNA target genes were reduced (Fig. 6, B and C), consistent with prior observations of translational repression mediated by microRNAs (Faghihi and Wahlestedt, 2009). Thus, our results suggest that globally, microRNAs regulate gene expression at both the transcript and translational levels in tomato.

Figure 5. uORFs repress translation efficiency of their downstream main ORFs and contain less-pronounced Kozak sequences. A and B, Profiles of genes containing conserved uORFs (A) or a previously uncharacterized uORF (B). The gene model and data presentation are the same as those described in the legend of Figure 3. The uORFs are labeled with yellow and orange boxes in the gene models. For the uORFs, the orange and green dashed vertical lines mark the translation start and stop sites, respectively. C, The translation efficiency of the main ORFs for transcripts containing a different number of translated uORFs. Only the coding regions were used to compute the TPM and translation efficiency of each transcript. The colored bars before the P values indicate the pairs of data used to determine statistical significance. The P values were determined with two-sample Kolmogorov-Smirnov tests. A.U., Arbitrary units. D, Selected nonredundant GO categories for genes containing one or more uORFs. FDR, False discovery rate. E and F, Kozak sequences of annotated ORFs, uORFs, and uORF-associated main ORFs. The statistical significance in F was determined using χ² tests.

DISCUSSION

Most of the plant research on mRNA translation was performed in Arabidopsis, and the knowledge has been transferred into several crops to improve crop performance. However, on a genome-wide level, it is unclear how well the Arabidopsis translatome compares with other species. In this study, we combined de novo transcriptome assembly and ribosome profiling to study the tomato translatome. We found that despite Arabidopsis and tomato diverging over 100 million years ago, many translational features are well conserved. Overall, we observed shared features between our Arabidopsis and tomato Ribo-seq data, including the most abundant ribosome footprint size and the inferred P-site within ribosome footprints. We found that previously unannotated translation events, such as uORFs and sORFs, are also widespread in tomato. In addition, we observed that usage of non-AUG translation start sites is shared between Arabidopsis and tomato. Finally, translational regulatory mechanisms, including the regulation of downstream main ORFs by uORFs and of target genes by microRNAs, are also well conserved in these two species.

Interestingly, we discovered 96 previously unknown sORFs only present in Solanaceae, including 78 shared by tomato and either wild tomato or potato and 18 sORFs uniquely found in tomato. These family-specific sORFs may provide functions unique to Solanaceae. The idea of family-specific regulatory molecules was proposed based on systemin, the first peptide hormone identified in plants. Systemin is only present in Solaneae, a subtribe of the Solanaceae (Pearce et al., 1991; Constabel et al., 1998). Such family- or subfamily-specific regulatory molecules may have evolved within a specific lineage of plants. Even species-specific sORFs have been proposed to be important (Andrews and Rothnagel, 2014). The functions of the widely conserved and Solanaceae-specific sORFs require further studies.

Peptide signaling is crucial for cell-cell communication in numerous aspects of plant development and stress responses (Tavormina et al., 2015; Hsu and Benfey, 2018). We found 87 sORFs that encode potential secreted peptides. However, as about 50% of secreted proteins in plants lack a well-defined signal peptide (Agrawal et al., 2010), some sORFs without a predicted signal peptide may still be secreted. In addition, sORF products without a signal peptide have been found to play an important role in a wide range of physiological processes in plants, such as vegetative and reproductive development, small interfering RNA biogenesis, and stress tolerance (Casson et al., 2002; Blanvillain et al., 2011; Ikeuchi et al., 2011; Valdivia et al., 2012; De Coninck et al., 2013). Therefore, the identification of sORFs using ribosome profiling facilitates potential applications of these peptides in improving crop performance.

Several studies have illustrated the power of altering mRNA translation via uORFs to improve agriculture (Sagor et al., 2016; Xu et al., 2017b; Zhang et al., 2018). For example, engineering rice that specifically induces defense proteins when a uORF is repressed by pathogen attack enables immediate plant resistance without compromising plant growth in the absence of pathogens (Xu et al., 2017b). The identification of translated ORFs provides new possibilities to fine-tune the synthesis of proteins involved in diverse physiological pathways. Notably, the number of uORFs in tomato is still an underestimate. Approximately half of the tomato genes still lack annotated 5′ UTRs, and RiboTaper only searches for potential translated ORFs in defined transcript regions. Thus, uORFs could be an even more widespread mechanism to control translation in tomato. Future studies using a combination of cap analysis gene expression sequencing or paired-end analysis of transcription start sites sequencing with long-read sequencing could facilitate defining the 5′ UTRs associated with specific isoforms (Ozsolak and Milos, 2011) and enable the identification of missing uORFs.

Figure 6. Regulation of gene expression by microRNAs (miRNAs). Cumulative distributions are shown for RNA-seq (A), Ribo-seq (B), and translation efficiency (TE; C) of microRNA targets and nonmicroRNA target genes. For the x axis in A and B, only the range from 0 to 50 (TPM) is shown. Only the coding regions were used to compute the TPM and translation efficiency of each transcript. The P values were determined with two-sample Kolmogorov-Smirnov tests. A.U., Arbitrary units.

Ribo-seq has been integrated into proteomic research to achieve deeper proteome coverage (Menschaert et al., 2013; Van Damme et al., 2014; Crappé et al., 2015; Calviello et al., 2016). Unlike DNA or RNA molecules, which can be sequenced using genomic technologies, proteins are typically identified by matching MS spectra to theoretical spectra from candidate peptides in a reference protein database. Before ribosome profiling became available, to include potential protein sequences, the conventional proteogenomics approach exploited either three-frame translation using transcriptome data or six-frame translation using genomic sequences (Walley and Briggs, 2015; Ruggles et al., 2017). Integrating Ribo-seq data into the construction of protein databases for proteogenomic studies has two advantages: (1) Ribo-seq discovers unannotated translation events and thus enables the identification of unknown proteins that were previously missed in the annotation; and (2) compared with three-frame or six-frame translation, Ribo-seq reduces the search space and false positives. Therefore, our custom protein database, built based on the Ribo-seq data, may aid in proteomic research in tomato.
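To make the search-space argument concrete, the following Python sketch shows what three-frame and six-frame translation produce for a single sequence. It is a generic illustration using the standard genetic code; the function names are hypothetical, and it is not the database-construction procedure used in this study.

```python
from itertools import product

# Standard genetic code, enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(seq):
    """Translate complete codons from the first base; unknown codons become X."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

def three_frame(seq):
    """Translate the three forward frames (the transcriptome-based approach)."""
    return [translate(seq[frame:]) for frame in range(3)]

def six_frame(seq):
    """Translate both strands (the genome-based approach)."""
    reverse = seq.upper().replace("U", "T").translate(COMPLEMENT)[::-1]
    return three_frame(seq) + three_frame(reverse)

print(three_frame("ATGGCCTAA"))  # the frame confirmed by Ribo-seq would be only one of these
```

Every additional frame adds candidate sequences that must be searched and can generate spurious matches, whereas a Ribo-seq-defined ORF contributes a single frame per translated region.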

CONCLUSION

In summary, our approach combining transcriptome assembly and ribosome profiling enabled the identification of translated ORFs genome wide in tomato and revealed conserved and unique translational features across evolution. Our results not only provide valuable information to the plant community but also present a practical strategy to study translatomes in other less-well-annotated organisms.

MATERIALS AND METHODS

Plant Materials and Preparation of Lysates for RNA-Seqand Ribo-Seq

Tomato seeds (Solanum lycopersicum ‘Heinz 1706’) were obtained from the C.M. Rick Tomato Genetics Resource Center (accession LA4345) and bulked. For each replicate, ~300 tomato seeds were surface sterilized in 70% (v/v) ethanol for 5 min followed by bleach solution (2.4% [v/v] NaHClO and 0.3% [v/v] Tween 20) for 30 min with shaking. The seeds were then washed with sterile water five times. Next, the seeds were stratified on 1× Murashige and Skoog medium (4.3 g L−1 Murashige and Skoog salt, 1% [w/v] Suc, 0.5 g L−1 MES, pH 5.7, and 1% [w/v] agar) and kept at 22°C in the dark for 3 d before being grown under 16-h-light/8-h-dark conditions at 22°C for 4 d. Seedlings that germinated at approximately the same time and of similar size were selected for the experiments. Roots (~3 cm from the tip) from ~180 plants were harvested at Zeitgeber time 3 (3 h after lights on) in batches and immediately frozen in liquid nitrogen. The frozen tissues were pooled and pulverized in liquid nitrogen using a mortar and pestle. Approximately 0.4 g of tissue powder was resuspended in 1.2 mL of lysis buffer (100 mM Tris-HCl [pH 8], 40 mM KCl, 20 mM MgCl2, 2% [v/v] polyoxyethylene [10] tridecyl ether [Sigma, P2393], 1% [w/v] sodium deoxycholate [Sigma, D6750], 1 mM dithiothreitol, 100 µg mL−1 cycloheximide, and 10 units mL−1 DNase I [Epicenter, D9905K]) as described by Hsu et al. (2016). After incubation on ice with gentle shaking for 10 min, the lysate was spun at 4°C at 20,000g for 10 min. The supernatant was transferred to a new tube and divided into 100-µL aliquots. The aliquoted lysates were flash frozen in liquid nitrogen and stored at −80°C until processing.

RNA Purification and RNA-Seq Library Construction

For RNA-seq samples, 10 µL of 10% (w/v) SDS was added to the 100-µL lysate aliquots described above. RNA greater than 200 nucleotides was extracted using a Zymo RNA Clean & Concentrator kit (Zymo Research, R1017). The obtained RNA was checked with a Bioanalyzer (Agilent) RNA pico chip to assess the RNA integrity, and an RNA integrity number value ranging from 9.2 to 9.4 was obtained for each replicate. Ribosomal RNAs (rRNAs) were depleted using a RiboZero Plant Leaf kit (Illumina, MRZPL1224). Next, 100 ng of the rRNA-depleted RNA was used as the starting material, fragmented to ~200 nucleotides based on the RNA integrity number reported by the Bioanalyzer, and processed using the NEBNext Ultra Directional RNA Library Prep Kit (New England Biolabs, E7420S) to create strand-specific libraries. The libraries were barcoded and enriched using 11 cycles of PCR amplification. The libraries were brought to equal molarity, pooled, and sequenced on one lane of a Hi-Seq 4000 using PE-100 sequencing.

Ribosome Footprinting and Ribo-Seq Library Construction

The Ribo-seq samples were prepared based on Hsu et al. (2016) with modifications described as follows, which optimize the method for tomato. Briefly, the RNA concentration of each lysate was first determined using a Qubit RNA HS assay (Invitrogen, Q32852) using a 10-fold dilution. Next, 100 µL of the lysate described above was treated with 100 units of nuclease (provided in the TruSeq Mammalian Ribo Profile Kit, Illumina, RPHMR12126) per 40 µg of RNA with gentle shaking at room temperature for 1 h. The nuclease reaction was stopped by immediately transferring to ice and adding 15 µL of SUPERase-IN (Invitrogen, AM2696). The ribosomes were isolated using Illustra MicroSpin S-400 HR columns (GE Healthcare, 27514001). RNA greater than 17 nucleotides was purified first (Zymo Research, R1017), and then RNA smaller than 200 nucleotides was enriched (Zymo Research, R1015). Next, the rRNAs were depleted using a RiboZero Plant Leaf kit (Illumina, MRZPL1224). The rRNA-depleted RNA was then separated via 15% (w/v) Tris-borate-EDTA-urea PAGE (Invitrogen, EC68852BOX), and gel slices ranging from 28 to 30 nucleotides were excised. Ribosome footprints were recovered from the excised gel slices using the overnight elution method, and the sequencing libraries were constructed according to the TruSeq Mammalian Ribo Profile Kit manual. The final libraries were amplified via nine cycles of PCR. The libraries were brought to equal molarity, pooled, and sequenced on two lanes of a Hi-Seq 4000 using SE-50 sequencing.

RNA-Seq and Ribo-Seq Data Analysis

The raw RNA-seq and Ribo-seq data and detailed mapping parameters have been deposited in the Gene Expression Omnibus (GEO) database (www.ncbi.nlm.nih.gov/geo) under accession number GSE124962. The tomato reference genome sequence and annotation files used in this study were downloaded from the Sol Genomics Network (Fernandez-Pozo et al., 2015). The adaptor sequence AGATCGGAAGAGCACACGTCT was first removed from the Ribo-seq data using FASTX_clipper v0.0.14 (http://hannonlab.cshl.edu/fastx_toolkit/). For both RNA-seq and Ribo-seq, the rRNA, tRNA, small nuclear RNA, small nucleolar RNA, and repeat sequences were removed using Bowtie2 v2.3.4.1 (Langmead and Salzberg, 2012). The rRNA, tRNA, small nuclear RNA, and small nucleolar RNA sequences were extracted from the SL2.5 genome assembly with the ITAG2.4 annotation (Fernandez-Pozo et al., 2015), and the repeat sequences were extracted from the SL3.0 genome assembly with the ITAG3.2 annotation. After these contaminating sequences were removed using Bowtie2, the preprocessed RNA-seq and Ribo-seq files were used to calculate the read distribution in different gene features (Fig. 2B) using the featureCounts function of the Subread package v1.5.3 (Liao et al., 2014).
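Because ribosome footprints are shorter than the 50-nucleotide reads, the 3′ end of each read runs into the adaptor, which is why the adaptor is clipped first. The Python sketch below is a simplified stand-in for that step, not a reimplementation of FASTX_clipper: it assumes exact matching only, and the function name and `min_overlap` parameter are hypothetical.

```python
ADAPTER = "AGATCGGAAGAGCACACGTCT"  # adaptor sequence reported in the text

def clip_adapter(read, adapter=ADAPTER, min_overlap=5):
    """Remove a 3' adaptor from a read using exact matching only.

    A full-length adaptor is clipped wherever it first occurs; otherwise the
    longest adaptor prefix (at least min_overlap bases) found at the very end
    of the read is clipped. Reads without a match are returned unchanged.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read

footprint = "ACGTACGTACGTACGTACGTACGTACGT"    # hypothetical 28-nt footprint
clipped = clip_adapter(footprint + ADAPTER)  # recovers the footprint
```

Dedicated clipping tools typically also tolerate sequencing errors and apply length filters, which this sketch deliberately omits.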

Next, the preprocessed RNA-seq and Ribo-seq reads were mapped to the tomato reference genome sequence SL3.0 with the ITAG3.2 annotation using STAR v2.6.0.c (Dobin et al., 2013). The reference-guided de novo assembly of the mapped RNA-seq reads was performed with stringtie v1.3.3b (Pertea et al., 2015), and the newly assembled gtf files were compared with ITAG3.2 using gffcompare v0.10.1 (Pertea et al., 2016). The i, x, y, o, u, and s classes of new transcripts (see Fig. 3A for details) and their descriptions were extracted from the gffcompare output gtf and concatenated with ITAG3.2. This combined gtf (referred to as Tomato_Root_ixyous+ITAG3.2.gtf; submitted to GEO as a processed file within GSE124962) was used to map the RNA-seq and Ribo-seq reads again with STAR. Notably, all six classes of uncharacterized transcripts in Tomato_Root_ixyous+ITAG3.2.gtf were assigned as noncoding RNAs, and this gtf was used for downstream RiboTaper analysis. The three biological replicates of the mapped bam files for RNA-seq were merged into one large bam file with SAMtools v1.8 (Li et al., 2009). The three mapped Ribo-seq bam files were also merged. The two merged bam files above were then used for ORF discovery with RiboTaper v1.3 (Calviello et al., 2016).
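The extraction of the six transcript classes could look roughly like the Python sketch below. It assumes that the gffcompare output gtf stores the class in a `class_code "..."` attribute on `transcript` records, alongside `transcript_id`; the function name and the toy records are hypothetical, and the exact procedure used in the study is not described in the text.

```python
import re

KEEP_CLASSES = set("ixyous")  # the six classes named in the text
CLASS_RE = re.compile(r'class_code "([^"]+)"')
TID_RE = re.compile(r'transcript_id "([^"]+)"')

def select_transcripts(gtf_lines, keep=KEEP_CLASSES):
    """Return the IDs of transcript records whose class code is in `keep`."""
    selected = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # class codes are read from transcript records only
        m_class = CLASS_RE.search(fields[8])
        m_tid = TID_RE.search(fields[8])
        if m_class and m_tid and m_class.group(1) in keep:
            selected.add(m_tid.group(1))
    return selected

# Toy records with made-up identifiers, for illustration only.
toy = [
    'chr1\tStringTie\ttranscript\t100\t900\t.\t+\t.\ttranscript_id "T1"; class_code "u";',
    'chr1\tStringTie\ttranscript\t2000\t2900\t.\t+\t.\ttranscript_id "T2"; class_code "=";',
]
novel_ids = select_transcripts(toy)
```

Exon records belonging to the selected transcript IDs would then be written out together with the transcript records before concatenation with the reference annotation.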

For RiboTaper analysis, the RiboTaper annotation files and the offset parameters (i.e. the inferred P-site position for each footprint length) were first obtained. The RiboTaper annotation files were generated using the create_annotations_files.bash function in the RiboTaper package using the SL3.0 assembly and the Tomato_Root_ixyous+ITAG3.2.gtf. To obtain the offset parameters, the create_metaplots.bash and metag.R functions in the RiboTaper package were used to generate meta-gene plots. The offset parameters were identified through the meta-gene plots. For 24-, 25-, 26-, 27-, and 28-nucleotide footprints, the offset values were 8, 9, 10, 11, and 12, respectively (Supplemental Fig. S3). Next, we performed RiboTaper analysis using the RiboTaper annotation, offset parameters, and RNA-seq and Ribo-seq bam files. The coding sequences identified by RiboTaper from the newly assembled transcripts were extracted from the translated_ORFs_filtered_sorted.bed file and integrated with Tomato_Root_ixyous+ITAG3.2.gtf to generate Supplemental Data Sets S2 and S3.
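The role of the offset parameters can be illustrated with a short Python sketch. It assumes transcript-level, 0-based coordinates and interprets the offset as the distance from the 5′ end of a footprint to the first nucleotide of its P-site; the function names are hypothetical, and RiboTaper's internal implementation is not reproduced here.

```python
# Offsets reported in the text: footprint length -> P-site offset.
OFFSETS = {24: 8, 25: 9, 26: 10, 27: 11, 28: 12}

def psite_position(read_start, read_length, offsets=OFFSETS):
    """Return the P-site position of a footprint, or None for unused lengths."""
    offset = offsets.get(read_length)
    if offset is None:
        return None
    return read_start + offset

def in_frame_fraction(reads, cds_start):
    """Fraction of usable footprints whose P-site falls in frame 0 of the CDS.

    reads     -- iterable of (read_start, read_length) in transcript coordinates
    cds_start -- 0-based position of the first nucleotide of the start codon
    """
    frames = []
    for read_start, read_length in reads:
        psite = psite_position(read_start, read_length)
        if psite is not None:
            frames.append((psite - cds_start) % 3)
    return frames.count(0) / len(frames) if frames else 0.0
```

A high in-frame fraction corresponds to the strong three-nucleotide periodicity that is used as evidence of active translation.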

We then mapped the Ribo-seq and RNA-seq data again to the CDS ranges with STAR, and the transcripts per million (TPM) for the CDS of each transcript was quantified via RSEM v1.3.0 (Li and Dewey, 2011). The formula to calculate translation efficiency is TE = (TPM_CDS of Ribo-seq)/(TPM_CDS of RNA-seq). To avoid inflation due to a small denominator, only genes with an RNA-seq TPM greater than 0.5 were used in the statistical analysis of translation efficiency. The plotting of three-nucleotide periodicity of the Ribo-seq and coverage of RNA-seq was generated by incorporating the plot function in R v3.4.3 (R Core Team, 2013) with functions from GenomicRanges v1.30.3, GenomicFeatures v1.30.3, and GenomicAlignments v1.14.2 libraries (Lawrence et al., 2013) to read in the gtf file and RNA-seq bam file. The merged RNA-seq bam file from STAR and the processed P_sites_all file from RiboTaper were used to plot the RNA-seq coverage and P-sites of Ribo-seq, respectively. The Linux command line code to preprocess the P_sites_all file before use for plotting was:
cut -f 1,3,6 P_sites_all | sort | uniq -c | sed -r 's/^( *[^ ]+) +/\1\t/' > name_output_file
For plotting the CUG/non-AUG start gene, the CDS range of the gene in the gtf file was manually modified before plotting.
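The translation efficiency calculation, including the RNA-seq TPM cutoff, can be written as a minimal Python sketch. The function name and the dictionary inputs are hypothetical, and the example values are made up purely to show the filter.

```python
def translation_efficiency(ribo_tpm, rna_tpm, min_rna_tpm=0.5):
    """Compute TE = Ribo-seq CDS TPM / RNA-seq CDS TPM for each gene.

    Genes whose RNA-seq TPM is not greater than min_rna_tpm are excluded,
    so that a small denominator does not inflate the ratio.
    """
    te = {}
    for gene, rna in rna_tpm.items():
        if rna > min_rna_tpm and gene in ribo_tpm:
            te[gene] = ribo_tpm[gene] / rna
    return te

# Made-up values for illustration only.
ribo = {"geneA": 20.0, "geneB": 1.0, "geneC": 5.0}
rna = {"geneA": 10.0, "geneB": 0.4, "geneC": 2.0, "geneD": 3.0}
te_values = translation_efficiency(ribo, rna)  # geneB is dropped (RNA-seq TPM <= 0.5)
```

In this toy example geneD is also dropped because it has no Ribo-seq value; how such genes were handled in the study is not stated, so that behavior is an assumption of the sketch.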

Statistical Analysis

The statistical analysis in this study was performed in R (R Core Team, 2013). The chisq.test and ks.test functions of the stats package in R were used for the χ² analysis and the Kolmogorov-Smirnov test, respectively. The Pearson and Spearman correlation coefficients were calculated using the cor function. Pairwise comparisons were performed using the corrplot function in the corrplot v0.84 package (Wei, 2013). The empirical cumulative probabilities of translation efficiency were calculated using the ecdf function (in the stats package) and plotted with the base R plot function.
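For readers unfamiliar with these two quantities, the following Python sketch shows what an empirical cumulative distribution function and the two-sample Kolmogorov-Smirnov statistic compute. It uses only the standard library, is not the R code used in the study, and does not compute a P value (R's ks.test does that).

```python
from bisect import bisect_right

def ecdf(sample):
    """Return F, where F(x) is the fraction of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cumulative(x):
        return bisect_right(ordered, x) / n

    return cumulative

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest vertical gap between the two ECDFs.

    Both ECDFs are step functions that change only at observed values, so it
    is sufficient to evaluate the gap at every observed data point.
    """
    f_a, f_b = ecdf(sample_a), ecdf(sample_b)
    return max(abs(f_a(x) - f_b(x)) for x in set(sample_a) | set(sample_b))

# Made-up translation efficiency values for two groups of genes.
group_1 = [1.0, 2.0, 3.0, 4.0]
group_2 = [3.0, 4.0, 5.0, 6.0]
d = ks_statistic(group_1, group_2)
```

A larger statistic indicates that the two cumulative distributions, such as those plotted for microRNA targets and nontargets, are further apart.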

Protein Extraction and Digestion

Roots (~3 cm near the tip) and shoots (shoot tip including ~1 cm of hypocotyl) of 4-d-old tomato seedlings were harvested at Zeitgeber time 3 (3 h after lights on). The proteomics experiments were carried out based on established methods as follows (Castellana et al., 2014; Song et al., 2018a, 2018b). Five volumes (v/w) of Tris-buffered phenol, pH 8, was added to 150 mg of ground tissue, vortexed for 1 min, then mixed with 5 volumes (buffer/tissue, v/w) of extraction buffer (50 mM Tris, pH 7.5, 1 mM EDTA, pH 8, and 0.9 M Suc), and centrifuged at 13,000g for 10 min at 4°C. The phenol phase was transferred to a new tube, and a second phenol extraction was performed on the aqueous phase. The two phenol phase extractions were combined, and 5 volumes of prechilled methanol with 0.1 M ammonium acetate was added. This was mixed well and kept at −80°C for 1 h prior to centrifugation at 4,500g for 10 min at 4°C. Precipitation with 0.1 M ammonium acetate in methanol was performed twice with incubation at −20°C for 30 min. The sample was resuspended in 70% (v/v) methanol and kept at −20°C for 30 min prior to centrifuging at 4,500g for 10 min at 4°C. The supernatant was discarded, and the pellet was placed in a vacuum concentrator until it was nearly dry. Two volumes (buffer/pellet, v/v) of protein digestion buffer [8 M urea, 50 mM Tris, pH 7, and 5 mM Tris(2-carboxyethyl)phosphine hydrochloride] was added to the pellet. The samples were then probe sonicated to aid in resuspension of the pellet. The protein concentration was then determined using the Bradford assay (Thermo Scientific).

The solubilized protein (~1 mg) was added to an Amicon Ultracel-30K centrifugal filter (catalog no. UFC803008) and centrifuged at 4,000g for 20 to 40 min. This step was repeated once. Then, 4 mL of urea solution with 2 mM Tris(2-carboxyethyl)phosphine hydrochloride was added to the filter unit and centrifuged at 4,000g for 20 to 40 min. Next, 2 mL of iodoacetamide solution (50 mM iodoacetamide in 8 M urea) was added and incubated without mixing at room temperature for 30 min in the dark prior to centrifuging at 4,000g for 20 to 40 min. Two milliliters of urea solution was added to the filter unit, which was then centrifuged at 4,000g for 20 to 40 min. This step was repeated once. Two milliliters of 0.05 M NH4HCO3 was added to the filter unit and centrifuged at 4,000g for 20 to 40 min. This step was repeated once. Then, 2 mL of 0.05 M NH4HCO3 with trypsin (enzyme:protein ratio 1:100) or GluC (enzyme:protein ratio 1:20) was added. Samples were incubated at 37°C overnight. Undigested protein was estimated using Bradford assays, and then trypsin (1 mg mL−1) was added to a ratio of 1:100 and an equal volume of Lys-C (0.1 mg mL−1) was added to the trypsin/Lys-C-digested sample, and GluC was added at a ratio of 1:20 to the sample digested with GluC. The digests were incubated for an additional 4 h at 37°C. The filter unit was added to a new collection tube and centrifuged at 4,000g for 20 to 40 min. One milliliter of 0.05 M NH4HCO3 was added and centrifuged at 4,000g for 20 to 40 min. The samples were acidified to pH 2 to 3 with 99% (v/v) formic acid and centrifuged at 21,000g for 20 min. Finally, samples were desalted using 50-mg Sep-Pak C18 cartridges (Waters). Eluted peptides were dried using a vacuum centrifuge (Thermo Scientific) and resuspended in 0.1% (v/v) formic acid. Peptide amount was quantified using the Pierce BCA Protein Assay Kit.

Liquid Chromatography-MS/MS

An Agilent 1260 quaternary HPLC device was used to deliver a flow rate of ~600 nL min⁻¹ via a splitter. All columns were packed in house using a Next Advance pressure cell, and the nanospray tips were fabricated using a fused silica capillary that was pulled to a sharp tip using a laser puller (Sutter, P-2000). Twenty-five micrograms of peptides was loaded onto 20-cm capillary columns packed with 5 µm Zorbax SB-C18 (Agilent), which was connected using a zero dead volume 1-µm filter (Upchurch, M548) to a 5-cm-long strong cation exchange (SCX) column packed with 5 µm polysulfoethyl (PolyLC). The SCX column was then connected to a 20-cm nanospray tip packed with 2.5 µm C18 (Waters). The three sections were joined and mounted on a custom electrospray source for online nested peptide elution. A new set of columns was used for every sample. Peptides were eluted from the loading column onto the SCX column using a 0% to 80% acetonitrile gradient over 60 min. Peptides were then fractionated from the SCX column using a series of ammonium acetate salt steps as follows: 10, 30, 32.5, 35, 37.5, 40, 42.5, 45, 50, 55, 65, 75, 85, 90, 95, 100, 150, and 1,000 mM. For these analyses, buffers A (99.9% water and 0.1% formic acid), B (99.9% acetonitrile and 0.1% formic acid), C (100 mM ammonium acetate and 2% formic acid), and D (1 M ammonium acetate and 2% formic acid) were utilized. For each salt step, a 150-min gradient program comprised a 0 to 5 min increase to the specified ammonium acetate concentration (using buffer C or D), 5 to 10 min hold, 10 to 14 min at 100% buffer A, 15 to 120 min at 5% to 35% buffer B, 120 to 140 min at 35% to 80% buffer B, 140 to 145 min at 80% buffer B, and 145 to 150 min at buffer A.
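The salt-step program described above can be summarized in a short sketch. Two points are assumptions rather than statements from the text: each target concentration is taken to be reached by blending buffer C (100 mM) or buffer D (1 M) proportionally with salt-free buffer A, and buffer C is taken to be used for every step at or below 100 mM. The constant and function names are illustrative.

```python
# Ammonium acetate salt steps (mM) and gradient length as listed in the text.
SALT_STEPS_MM = [10, 30, 32.5, 35, 37.5, 40, 42.5, 45, 50,
                 55, 65, 75, 85, 90, 95, 100, 150, 1000]
GRADIENT_MIN = 150

# Stock concentrations of the two salt buffers (mM).
BUFFER_C_MM = 100.0
BUFFER_D_MM = 1000.0


def salt_buffer_percent(target_mm: float) -> tuple:
    """Return (buffer, percent) for a target concentration.

    Assumes simple proportional mixing with salt-free buffer A, and that
    buffer C is chosen whenever it can reach the target (assumption).
    """
    if target_mm <= BUFFER_C_MM:
        return "C", 100.0 * target_mm / BUFFER_C_MM
    if target_mm <= BUFFER_D_MM:
        return "D", 100.0 * target_mm / BUFFER_D_MM
    raise ValueError("target exceeds the most concentrated salt buffer")


def total_salt_step_minutes() -> int:
    """Instrument time for all salt steps, excluding the 60-min loading gradient."""
    return len(SALT_STEPS_MM) * GRADIENT_MIN


for step in SALT_STEPS_MM:
    print(step, salt_buffer_percent(step))
print(total_salt_step_minutes() / 60, "h per sample")
```

With 18 salt steps of 150 min each, the salt-step program alone occupies 2,700 min (45 h) of instrument time per sample.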

Eluted peptides were analyzed using a Thermo Scientific Q-Exactive Plus high-resolution quadrupole Orbitrap mass spectrometer, which was directly coupled to the HPLC device. Data-dependent acquisition was obtained using Xcalibur 4.0 software in positive ion mode with a spray voltage of 2 kV, a capillary temperature of 275°C, and a radio frequency of 60. MS1 spectra were measured at a resolution of 70,000, an automatic gain control of 3e6, with a maximum ion time of 100 ms and a mass range of 400 to 2,000 mass-to-charge ratio. Up to 15 MS2 scans were triggered at a resolution of 17,500, an automatic gain control of 1e5 with a maximum ion time of 50 ms, an isolation window of 1.5 mass-to-charge ratio, and a normalized collision energy of 28. Charge exclusion was set to unassigned, 1, 5 to 8, and greater than 8. MS1 that triggered MS2 scans were dynamically excluded for 25 s.

Database Search and False Discovery Rate Filtering

The raw data were analyzed using MaxQuant version 1.6.3.3 (Tyanova et al., 2016). A customized protein database containing 22,513 proteins (Supplemental Data Set S4) was generated from the RiboTaper output file ORFs_max_filt. Spectra were searched against the customized protein database, which was complemented with reverse decoy sequences and common contaminants by MaxQuant. Carbamidomethyl Cys was set as a fixed modification, while Met oxidation and protein N-terminal acetylation were set as variable modifications. Digestion parameters were set to specific and Trypsin/P;LysC or GluC. Up to two missed cleavages were allowed. A false discovery rate less than 0.01 and protein identification level was required. The second peptide option was used to identify cofragmented peptides. The match between runs feature of MaxQuant was not utilized. Raw data files and MaxQuant Search results have been deposited in the Mass Spectrometry Interactive Virtual Environment repository (https://massive.ucsd.edu/ProteoSAFe/static/massive.jsp) with data set identifier MSV000083363.
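The idea behind searching against reverse decoy sequences and requiring a false discovery rate below 0.01 can be illustrated with a generic target-decoy filter. This is a minimal sketch of the general technique and not MaxQuant's implementation, which is more elaborate. The `Hit` class, the scores, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    name: str
    score: float      # higher is better (hypothetical search score)
    is_decoy: bool    # True if the match is to a reversed decoy sequence


def filter_by_fdr(hits: List[Hit], max_fdr: float = 0.01) -> List[Hit]:
    """Keep target hits from the longest score-ranked prefix whose
    estimated FDR (decoys / targets) is below max_fdr."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    decoys = 0
    targets = 0
    keep = 0  # number of top-ranked hits to retain
    for i, hit in enumerate(ranked, start=1):
        if hit.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr = decoys / targets if targets else 1.0
        if fdr < max_fdr:
            keep = i
    return [h for h in ranked[:keep] if not h.is_decoy]


# Toy data: 150 confident targets, then decoys interleaved with weak targets.
hits = [Hit(f"t{i}", 1000 - i, False) for i in range(150)]
hits += [Hit("d1", 500, True), Hit("t_low", 400, False),
         Hit("d2", 300, True), Hit("t_lowest", 200, False)]
accepted = filter_by_fdr(hits)
print(len(accepted))
```

In the toy data, the second decoy pushes the estimated rate above 1%, so the lowest-scoring target is rejected while the one ranked above that decoy is retained.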

Prediction of the Subcellular Localization of sORFs

A fasta file containing the sORF amino acid sequences was uploaded to the TargetP Web site (Emanuelsson et al., 2000). We selected Plant as the organism group and >0.90 as the specificity cutoff and then submitted the sequences for analysis.

Evolutionary Analysis

The TBLASTN function for BLAST v2.7.1 (OS Linux_x86_64; Camacho et al., 2009) was used for the homology search. Because several plant genomes still lack exon-intron junction information in their annotations, we only selected single-exon tomato sORFs that encoded 16 to 100 amino acid residues for this analysis, and the reference genomes (Athaliana_167_TAIR9.fa, Atrichopoda_291_v1.0.fa, Csinensis_154_v1.fa, Mtruncatula_285_Mt4.0.fa, Osativa_323_v7.0.fa, Ppatens_318_v3.fa, S_lycopersicum_chromosomes.3.00.fa, Sitalica_312_v2.fa, Smoellendorffii_91_v1.fa, and Stuberosum_448_v4.03.fa) were downloaded from Phytozome v12 (Goodstein et al., 2012). The fa (fasta) files for each genome were used to generate BLAST databases with the following code: makeblastdb -in genome.fa -parse_seqids -dbtype nucl, where genome.fa was replaced with the fasta file for each genome. Next, the code tblastn -query input.fa -db species_database -out species_blast_result.txt -evalue 0.001 -outfmt '6 qseqid sseqid length qlen qstart qend sstart send pident gapopen mismatch evalue bitscore' -num_threads 10 was used to search for sequence homologs in the target genomes. The names of species_database and species_blast_result.txt were changed correspondingly. The final heatmap for amino acid identity was plotted in R using the pheatmap v1.0.10 (Kolde, 2015) and RColorBrewer v1.1.2 (Neuwirth, 2014) libraries.
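The tabular output requested by the -outfmt string above has 13 tab-separated columns in the order given. The following sketch shows one way such a file could be reduced to a single best hit per sORF before plotting. Selecting the hit with the highest bit score and computing query coverage are assumptions made for illustration; how hits were summarized for the heatmap is not stated here. The function name and the example rows are hypothetical.

```python
import csv
from typing import Dict, Iterable

# Column order taken from the -outfmt string in the text.
BLAST_COLUMNS = ["qseqid", "sseqid", "length", "qlen", "qstart", "qend",
                 "sstart", "send", "pident", "gapopen", "mismatch",
                 "evalue", "bitscore"]


def best_hits(lines: Iterable[str]) -> Dict[str, dict]:
    """Return the highest-bitscore hit per query from tabular tblastn output."""
    best: Dict[str, dict] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(BLAST_COLUMNS, row))
        rec["pident"] = float(rec["pident"])
        rec["bitscore"] = float(rec["bitscore"])
        # Fraction of the query peptide covered by the alignment
        # (query coordinates are in amino acids for tblastn).
        rec["coverage"] = (int(rec["qend"]) - int(rec["qstart"]) + 1) / int(rec["qlen"])
        current = best.get(rec["qseqid"])
        if current is None or rec["bitscore"] > current["bitscore"]:
            best[rec["qseqid"]] = rec
    return best


# Hypothetical rows standing in for species_blast_result.txt.
example = [
    "sORF1\tchr1\t50\t50\t1\t50\t1000\t1149\t90.0\t0\t5\t1e-20\t95.0",
    "sORF1\tchr2\t30\t50\t1\t30\t200\t289\t70.0\t0\t9\t1e-5\t40.0",
    "sORF2\tchr3\t20\t40\t11\t30\t500\t559\t100.0\t0\t0\t1e-8\t45.5",
]
hits = best_hits(example)
for query, rec in hits.items():
    print(query, rec["sseqid"], rec["pident"], rec["coverage"])
```

For a real result file, the same function can be applied to an open file handle, for example `best_hits(open("species_blast_result.txt"))`, and the per-species `pident` values collected into a matrix for plotting.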

MicroRNA Target Identification

The tomato microRNA sequences were extracted from Kaur et al. (2017) and Liu et al. (2017). Next, we used psRNATarget (Dai et al., 2018) against ITAG3.2 mRNA sequences to identify potential microRNA targets. We used Schema V2 (2017 release; Dai et al., 2018) and selected calculate target accessibility as the analysis parameter.

GO Term Analysis

agriGO v2.0 (Tian et al., 2017) was used for the GO analysis of uORF-containing genes.

Accession Numbers

The raw RNA-seq and Ribo-seq data have been deposited in the GEO database under accession number GSE124962. Proteomics raw data files and MaxQuant Search results have been deposited at the Mass Spectrometry Interactive Virtual Environment repository with data set identifier MSV000083363.

Supplemental Data

The following supplemental materials are available.

Supplemental Figure S1. Correlation between RNA-seq and Ribo-seq data.

Supplemental Figure S2. Meta-gene analysis and inference of the P-site for ribosome footprints of different lengths.

Supplemental Figure S3. Summary of the inferred P-site position for each footprint length.

Supplemental Figure S4. Fractions of in-frame P-sites for different groups of translated ORFs.

Supplemental Figure S5. Translation of tomato homologs of Arabidopsis sORFs.

Supplemental Figure S6. Evolutionary conservation of sORFs.

Supplemental Figure S7. Examples of conflicts between annotated gene models and translational profiles.

Supplemental Data Set S1. Lists of ORFs_proteomics_miRNAs, xlsx, spreadsheets (A) to (I).

Supplemental Data Set S2. uORF.gtf (gtf for uORFs).

Supplemental Data Set S3. sORF.gtf (gtf for sORFs).

Supplemental Data Set S4. Amino_acid_sequences_for_translated_ORFs.fa (amino acid sequences for all translated ORFs identified by RiboTaper in this study).

Supplemental Data Set S5. Proteogenomics, xlsx, spreadsheets (A) to (C).

ACKNOWLEDGMENTS

We thank Philip N. Benfey at Duke University for generous support to help initiate this project. This work used the Vincent J. Coates Genomics Sequencing Laboratory at the University of California, Berkeley, supported by a National Institutes of Health instrumentation grant (S10 OD018174).

Received May 3, 2019; accepted June 10, 2019; published June 27, 2019.

LITERATURE CITED

Agrawal GK, Jwa NS, Lebrun MH, Job D, Rakwal R (2010) Plant secretome: Unlocking secrets of the secreted proteins. Proteomics 10: 799–827

Andreev DE, O’Connor PBF, Loughran G, Dmitriev SE, Baranov PV, Shatsky IN (2017) Insights into the mechanisms of eukaryotic translation gained with ribosome profiling. Nucleic Acids Res 45: 513–526

Andrews SJ, Rothnagel JA (2014) Emerging evidence for functional peptides encoded by short open reading frames. Nat Rev Genet 15: 193–204

Basrai MA, Hieter P, Boeke JD (1997) Small open reading frames: Beautiful needles in the haystack. Genome Res 7: 768–771

Bazin J, Baerenfaller K, Gosai SJ, Gregory BD, Crespi M, Bailey-Serres J (2017) Global analysis of ribosome-associated noncoding RNAs unveils new modes of translational regulation. Proc Natl Acad Sci USA 114: E10018–E10027

Bazzini AA, Johnstone TG, Christiano R, Mackowiak SD, Obermayer B, Fleming ES, Vejnar CE, Lee MT, Rajewsky N, Walther TC, et al (2014) Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. EMBO J 33: 981–993

Blanvillain R, Young B, Cai YM, Hecht V, Varoquaux F, Delorme V, Lancelin JM, Delseny M, Gallois P (2011) The Arabidopsis peptide kiss of death is an inducer of programmed cell death. EMBO J 30: 1173–1183

Brar GA, Weissman JS (2015) Ribosome profiling reveals the what, when, where and how of protein synthesis. Nat Rev Mol Cell Biol 16: 651–664

Brar GA, Yassour M, Friedman N, Regev A, Ingolia NT, Weissman JS (2012) High-resolution view of the yeast meiotic program revealed by ribosome profiling. Science 335: 552–557

Calviello L, Mukherjee N, Wyler E, Zauber H, Hirsekorn A, Selbach M, Landthaler M, Obermayer B, Ohler U (2016) Detecting actively translated open reading frames in ribosome profiling data. Nat Methods 13: 165–170

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: Architecture and applications. BMC Bioinformatics 10: 421

Casson SA, Chilley PM, Topping JF, Evans IM, Souter MA, Lindsey K (2002) The POLARIS gene of Arabidopsis encodes a predicted peptide required for correct root growth and leaf vascular patterning. Plant Cell 14: 1705–1721

Castellana NE, Shen Z, He Y, Walley JW, Cassidy CJ, Briggs SP, Bafna V (2014) An automated proteogenomic method uses mass spectrometry to reveal novel genes in Zea mays. Mol Cell Proteomics 13: 157–167

Chew GL, Pauli A, Schier AF (2016) Conservation of uORF repressiveness and sequence features in mouse, human and zebrafish. Nat Commun 7: 11663

Chotewutmontri P, Barkan A (2016) Dynamics of chloroplast translation during chloroplast differentiation in maize. PLoS Genet 12: e1006106

Chung BY, Hardcastle TJ, Jones JD, Irigoyen N, Firth AE, Baulcombe DC, Brierley I (2015) The use of duplex-specific nuclease in ribosome profiling and a user-friendly software package for Ribo-seq data analysis. RNA 21: 1731–1745

Claverie JM (1997) Computational methods for the identification of genes in vertebrate genomic sequences. Hum Mol Genet 6: 1735–1744

Constabel CP, Yip L, Ryan CA (1998) Prosystemin from potato, black nightshade, and bell pepper: Primary structure and biological activity of predicted systemin polypeptides. Plant Mol Biol 36: 55–62

Crappé J, Ndah E, Koch A, Steyaert S, Gawron D, De Keulenaer S, De Meester E, De Meyer T, Van Criekinge W, Van Damme P, et al (2015) PROTEOFORMER: Deep proteome coverage through ribosome profiling and MS integration. Nucleic Acids Res 43: e29

Dai X, Zhuang Z, Zhao PX (2018) psRNATarget: A plant small RNA target analysis server (2017 release). Nucleic Acids Res 46: W49–W54

De Coninck B, Carron D, Tavormina P, Willem L, Craik DJ, Vos C, Thevissen K, Mathys J, Cammue BPA (2013) Mining the genome of Arabidopsis thaliana as a basis for the identification of novel bioactive peptides involved in oxidative stress tolerance. J Exp Bot 64: 5297–5307

Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (2013) STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29: 15–21

Emanuelsson O, Nielsen H, Brunak S, von Heijne G (2000) Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 300: 1005–1016

Faghihi MA, Wahlestedt C (2009) Regulatory roles of natural antisense transcripts. Nat Rev Mol Cell Biol 10: 637–643

Fernandez-Pozo N, Menda N, Edwards JD, Saha S, Tecle IY, Strickler SR, Bombarely A, Fisher-York T, Pujar A, Foerster H, et al (2015) The Sol Genomics Network (SGN): From genotype to phenotype to breeding. Nucleic Acids Res 43: D1036–D1041

Fields AP, Rodriguez EH, Jovanovic M, Stern-Ginossar N, Haas BJ, Mertins P, Raychowdhury R, Hacohen N, Carr SA, Ingolia NT, et al (2015) A regression-based analysis of ribosome-profiling data reveals a conserved complexity to mammalian translation. Mol Cell 60: 816–827

Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N, et al (2012) Phytozome: A comparative platform for green plant genomics. Nucleic Acids Res 40: D1178–D1186

Guydosh NR, Green R (2014) Dom34 rescues ribosomes in 3′ untranslated regions. Cell 156: 950–962

Hsu PY, Benfey PN (2018) Small but mighty: Functional peptides encoded by small ORFs in plants. Proteomics 18: e1700038

Hsu PY, Calviello L, Wu HL, Li FW, Rothfels CJ, Ohler U, Benfey PN (2016) Super-resolution ribosome profiling reveals unannotated translation events in Arabidopsis. Proc Natl Acad Sci USA 113: E7126–E7135

Ikeuchi M, Yamaguchi T, Kazama T, Ito T, Horiguchi G, Tsukaya H (2011) ROTUNDIFOLIA4 regulates cell proliferation along the body axis in Arabidopsis shoot. Plant Cell Physiol 52: 59–69

Imai A, Hanzawa Y, Komura M, Yamamoto KT, Komeda Y, Takahashi T (2006) The dwarf phenotype of the Arabidopsis acl5 mutant is suppressed by a mutation in an upstream ORF of a bHLH gene. Development 133: 3575–3585

Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS (2009) Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324: 218–223

Ji Z, Song R, Regev A, Struhl K (2015) Many lncRNAs, 5’UTRs, and pseudogenes are translated and some are likely to express functional proteins. eLife 4: e08890

Johnstone TG, Bazzini AA, Giraldez AJ (2016) Upstream ORFs are prevalent translational repressors in vertebrates. EMBO J 35: 706–723

Juntawong P, Girke T, Bazin J, Bailey-Serres J (2014) Translational dynamics revealed by genome-wide profiling of ribosome footprints in Arabidopsis. Proc Natl Acad Sci USA 111: E203–E212

Kaur P, Shukla N, Joshi G, VijayaKumar C, Jagannath A, Agarwal M, Goel S, Kumar A (2017) Genome-wide identification and characterization of miRNAome from tomato (Solanum lycopersicum) roots and root-knot nematode (Meloidogyne incognita) during susceptible interaction. PLoS ONE 12: e0175178

Kearse MG, Wilusz JE (2017) Non-AUG translation: A new start for protein synthesis in eukaryotes. Genes Dev 31: 1717–1731

Kim BH, Cai X, Vaughn JN, von Arnim AG (2007) On the functions of the h subunit of eukaryotic initiation factor 3 in late stages of translation initiation. Genome Biol 8: R60

Kolde R (2015) pheatmap: Pretty heatmaps. https://cran.r-project.org/web/packages/pheatmap/index.html

Kozak M (1987) An analysis of 5′-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Res 15: 8125–8148

Ku HM, Vision T, Liu J, Tanksley SD (2000) Comparing sequenced segments of the tomato and Arabidopsis genomes: Large-scale duplication followed by selective gene loss creates a network of synteny. Proc Natl Acad Sci USA 97: 9121–9126

Laing WA, Martínez-Sánchez M, Wright MA, Bulley SM, Brewster D, Dare AP, Rassam M, Wang D, Storey R, Macknight RC, et al (2015) An upstream open reading frame is essential for feedback regulation of ascorbate biosynthesis in Arabidopsis. Plant Cell 27: 772–786

Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9: 357–359

Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, Morgan MT, Carey VJ (2013) Software for computing and annotating genomic ranges. PLOS Comput Biol 9: e1003118

Lei L, Shi J, Chen J, Zhang M, Sun S, Xie S, Li X, Zeng B, Peng L, Hauck A, et al (2015) Ribosome profiling reveals dynamic translational landscape in maize seedlings under drought stress. Plant J 84: 1206–1218

Li B, Dewey CN (2011) RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12: 323

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079

Li S, Le B, Ma X, Li S, You C, Yu Y, Zhang B, Liu L, Gao L, Shi T, et al (2016) Biogenesis of phased siRNAs on membrane-bound polysomes in Arabidopsis. eLife 5: e22750

Li Z, Xu R, Li N (2018) MicroRNAs from plants to animals, do they define a new messenger for communication? Nutr Metab (Lond) 15: 68

Liao Y, Smyth GK, Shi W (2014) featureCounts: An efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Liu MJ, Wu SH, Wu JF, Lin WD, Wu YC, Tsai TY, Tsai HL, Wu SH (2013) Translational landscape of photomorphogenic Arabidopsis. Plant Cell 25: 3699–3710

Liu M, Yu H, Zhao G, Huang Q, Lu Y, Ouyang B (2017) Profiling of drought-responsive microRNA and mRNA in tomato using high-throughput sequencing. BMC Genomics 18: 481

Lütcke HA, Chow KC, Mickel FS, Moss KA, Kern HF, Scheele GA (1987) Selection of AUG initiation codons differs in plants and animals. EMBO J 6: 43–48

Mata CI, Fabre B, Hertog ML, Parsons HT, Deery MJ, Lilley KS, Nicolaï BM (2017) In-depth characterization of the tomato fruit pericarp proteome. Proteomics 17: 1600406

Menschaert G, Van Criekinge W, Notelaers T, Koch A, Crappé J, Gevaert K, Van Damme P (2013) Deep proteome coverage based on ribosome profiling aids mass spectrometry-based protein and peptide discovery and provides evidence of alternative translation products and near-cognate translation initiation events. Mol Cell Proteomics 12: 1780–1790

Merchante C, Brumos J, Yun J, Hu Q, Spencer KR, Enríquez P, Binder BM, Heber S, Stepanova AN, Alonso JM (2015) Gene-specific translation regulation mediated by the hormone-signaling molecule EIN2. Cell 163: 684–697

Michel AM, Choudhury KR, Firth AE, Ingolia NT, Atkins JF, Baranov PV (2012) Observation of dually decoded regions of the human genome using ribosome profiling data. Genome Res 22: 2219–2229

Neuwirth E (2014) RColorBrewer: ColorBrewer Palettes. https://cran.r-project.org/web/packages/RColorBrewer/index.html

Ozsolak F, Milos PM (2011) RNA sequencing: Advances, challenges and opportunities. Nat Rev Genet 12: 87–98

Pearce G, Strydom D, Johnson S, Ryan CA (1991) A polypeptide from tomato leaves induces wound-inducible proteinase inhibitor proteins. Science 253: 895–897

Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL (2015) StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33: 290–295

Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL (2016) Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc 11: 1650–1667

R Core Team (2013) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna

Ruggles KV, Krug K, Wang X, Clauser KR, Wang J, Payne SH, Fenyö D, Zhang B, Mani DR (2017) Methods, tools and current perspectives in proteogenomics. Mol Cell Proteomics 16: 959–981

Ruiz-Orera J, Albà MM (2019) Translation of small open reading frames: Roles in regulation and evolutionary innovation. Trends Genet 35: 186–198

Sagor GHM, Berberich T, Tanaka S, Nishiyama M, Kanayama Y, Kojima S, Muramoto K, Kusano T (2016) A novel strategy to produce sweeter tomato fruits with high sugar contents by fruit-specific expression of a single bZIP transcription factor gene. Plant Biotechnol J 14: 1116–1126

Schafer S, Adami E, Heinig M, Rodrigues KEC, Kreuchwig F, Silhavy J, van Heesch S, Simaite D, Rajewsky N, Cuppen E, et al (2015) Translational regulation shapes the molecular landscape of complex disease phenotypes. Nat Commun 6: 7200

Schwarz D, Thompson AJ, Kläring HP (2014) Guidelines to use tomato in experiments with a controlled environment. Front Plant Sci 5: 625

Shamimuzzaman M, Vodkin L (2018) Ribosome profiling reveals changes in translational status of soybean transcripts during immature cotyledon development. PLoS ONE 13: e0194596

Simpson GG, Laurie RE, Dijkwel PP, Quesada V, Stockwell PA, Dean C, Macknight RC (2010) Noncanonical translation initiation of the Arabidopsis flowering time and alternative polyadenylation regulator FCA. Plant Cell 22: 3764–3777

Song G, Brachova L, Nikolau BJ, Jones AM, Walley JW (2018a) Heterotrimeric G-protein-dependent proteome and phosphoproteome in unstimulated Arabidopsis roots. Proteomics 18: e1800323

Song G, Hsu PY, Walley JW (2018b) Assessment and refinement of sample preparation methods for deep and quantitative plant proteome profiling. Proteomics 18: e1800220

Spealman P, Naik AW, May GE, Kuersten S, Freeberg L, Murphy RF, McManus J (2018) Conserved non-AUG uORFs revealed by a novel regression analysis of ribosome profiling data. Genome Res 28: 214–222

Szymanski J, Levin Y, Savidor A, Breitel D, Chappell-Maor L, Heinig U, Töpfer N, Aharoni A (2017) Label-free deep shotgun proteomics reveals protein dynamics during tomato fruit tissues development. Plant J 90: 396–417

Tavormina P, De Coninck B, Nikonorova N, De Smet I, Cammue BPA (2015) The plant peptidome: An expanding repertoire of structural features and biological functions. Plant Cell 27: 2095–2118

Tian T, Liu Y, Yan H, You Q, Yi X, Du Z, Xu W, Su Z (2017) agriGO v2.0: A GO analysis toolkit for the agricultural community, 2017 update. Nucleic Acids Res 45: W122–W129

Tyanova S, Temu T, Cox J (2016) The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat Protoc 11: 2301–2319

Valdivia ER, Chevalier D, Sampedro J, Taylor I, Niederhuth CE, Walker JC (2012) DVL genes play a role in the coordination of socket cell recruitment and differentiation. J Exp Bot 63: 1405–1412

Van Damme P, Gawron D, Van Criekinge W, Menschaert G (2014) N-terminal proteomics and ribosome profiling provide a comprehensive view of the alternative translation initiation landscape in mice and men. Mol Cell Proteomics 13: 1245–1261

von Arnim AG, Jia Q, Vaughn JN (2014) Regulation of plant translation by upstream open reading frames. Plant Sci 214: 1–12

Walley JW, Briggs SP (2015) Dual use of peptide mass spectra: Protein atlas and genome annotation. Curr Plant Biol 2: 21–24

Wei T (2013) corrplot: Visualization of a correlation matrix. https://cran.r-project.org/web/packages/corrplot/index.html

Xu G, Greene GH, Yoo H, Liu L, Marqués J, Motley J, Dong X (2017a) Global translational reprogramming is a fundamental layer of immune regulation in plants. Nature 545: 487–490

Xu G, Yuan M, Ai C, Liu L, Zhuang E, Karapetyan S, Wang S, Dong X (2017b) uORF-mediated translation allows engineered plant disease resistance without fitness costs. Nature 545: 491–494

Yu Y, Jia T, Chen X (2017) The ‘how’ and ‘where’ of plant microRNAs. New Phytol 216: 1002–1017

Zhang H, Si X, Ji X, Fan R, Liu J, Chen K, Wang D, Gao C (2018) Genome editing of upstream open reading frames enables translational control in plants. Nat Biotechnol 36: 894–898

Zoschke R, Watkins KP, Barkan A (2013) A rapid ribosome profiling method elucidates chloroplast ribosome behavior in vivo. Plant Cell 25: 2265–2275
