Mass spectrometry and ribosome profiling, a perfect combination towards a more comprehensive identification strategy of true in vivo protein forms.

Jeroen Crappé*(a); Alexander Koch*(a); Sandra Steyaert(a); Wim Van Criekinge(a); Petra Van Damme(b,c); Gerben Menschaert(a)

CONTACT: [email protected], [email protected]

MATERIALS & METHODS

WORKFLOW OVERVIEW
1. Ribosome profiling of ribosome-captured mRNA fragments, performed on the HiSeq Illumina sequencing platform (RIBO-seq).
2. Quality control based on existing tools such as FastQC and custom metagenic functional assessment.
3. Transcriptome mapping based on the STAR [4] and/or TopHat2 [5] local aligners.
4. Single nucleotide polymorphism (SNP) calling based on SAMtools mpileup and/or GATK.
5. Translation initiation site (TIS) calling based on a trained Support Vector Machine (SVM) [2] or a rule-based algorithm [1].
6. Translation product assembly, taking SNP, TIS and INDEL information into account. Construct the complete proteome, optionally combined with SwissProt.
7. MS-based proteomics/peptidomics using the SearchGUI [6] and PeptideShaker [7] tools.
8. Genome-centric visualization of all generated information tracks (Ensembl, UCSC, IGV genome browsers).
=> All results are stored in a relational SQLite database, allowing further detailed analysis.

An increasing number of studies involve integrative analysis of gene and protein expression data, taking advantage of new technologies such as next-generation transcriptome sequencing (RNA-seq) and highly sensitive mass spectrometry (MS). Recently, a strategy termed ribosome profiling has been described. It is based on deep sequencing of ribosome-protected mRNA fragments and indirectly monitors protein synthesis. When used in combination with initiation-specific translation inhibitors, it enables the identification of (alternative) translation initiation sites.
INTRODUCTION

[Figure: RIBO-seq experimental workflow [3]]

In contrast to the protein databases routinely employed in proteomics searches, RIBO-seq derived data gives a more representative expression state and accounts for sequence variation information (single nucleotide polymorphisms, insertions, deletions and RNA splice variants) and for alternative translation initiation leading to N-terminally extended and/or truncated protein forms. Furthermore, RIBO-seq reveals translation start at near-cognate start sites. Without taking this information into account, MS-based proteomic studies may fail to detect novel, important protein forms.

[Figure: Mode of action (MOA) of initiation-specific translation inhibitors and example of gene-mapped RIBO-seq signals [1,2]]

GOALS
✓ Compile a sample-specific protein search database based on ribosome profiling sequencing data.
✓ Introduce new translation products in the MS search space: N-terminal extensions/truncations, translated uORFs, near-cognate start sites.
✓ Bridge two omics worlds, transcriptomics and MS-based proteomics, by means of RIBO-seq.

CONCLUSION & FUTURE WORK
✓ Deep proteome coverage based on ribosome profiling aids mass spectrometry-based protein and peptide discovery and provides evidence of alternative translation products and near-cognate translation initiation events [8].
✓ Future work will mainly focus on:
  - Further investigation of the differential expression on translation level of UniProtKB-SwissProt and RIBO-seq derived translation products: technological and/or biological relevance?
  - Detailed assessment of the difference between in vivo measurement of protein synthesis (RIBO-seq) and protein presence (MS-based proteomics).
  - Generalizing the pipeline to all types of next-generation sequencing data: (directional) A+/A- RNA-seq, CLIP-seq, exome-seq, RIBO-seq.
  - Quantitative correlation of RIBO-seq and (non-)labelled MS-based proteomics.
  - Incorporating the pipeline into GalaxyP [9].

REFERENCES
[1] Lee, S., Liu, B., Lee, S., Huang, S. X., Shen, B., and Qian, S. B. (2012) Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc. Natl. Acad. Sci. U.S.A. 109, E2424–E2432.
[2] Ingolia, N. T., Lareau, L. F., and Weissman, J. S. (2011) Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789–802.
[3] Michel, A. M., Baranov, P. V. (2013) Ribosome profiling: a Hi-Def monitor for protein synthesis at the genome-wide scale. Wiley Interd. Rev. RNA. Epub ahead of print.
[4] Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., Gingeras, T. R. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1), 15–21.
[5] Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., Salzberg, S. L. (2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology 14, R36.
[6] Vaudel, M., Barsnes, H., Berven, F. S., Sickmann, A., Martens, L. (2011) SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics 11(5), 996–9.
[7] http://peptideshaker.googlecode.com
[8] Menschaert, G., Van Criekinge, W., Notelaers, T., Koch, A., Crappé, J., Gevaert, K., Van Damme, P. (2013) Deep proteome coverage based on ribosome profiling aids MS-based protein and peptide discovery and provides evidence of alternative translation products and near-cognate translation initiation events. Mol. Cell. Proteomics. Epub ahead of print.
[9] http://getgalaxyp.org

AFFILIATIONS
(a) Laboratory of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, B-9000 Ghent
(b) Department of Medical Protein Research, Flemish Institute for Biotechnology, B-9000 Ghent
(c) Department of Biochemistry, Ghent University, B-9000 Ghent
* Co-first authors

MS experiments
2 data sets:
Shotgun proteomics: 24 LC runs -> 112,974 MS/MS spectra.
Positional proteomics (N-term COFRADIC): 45 LC runs -> 68,523 MS/MS spectra

RESULTS

Version 1
Custom DB [8] based on:
- SVM-based TIS calling [2]
- Using reference genomic sequence
- Combination of RIBO-seq derived and SwissProt sequences

(A) COFRADIC: Pie charts depicting the identification of novel translation products: N-terminal extensions/truncations, uORF translations, internal out-of-frame translations (left panel is RIBO-seq based, right panel is MS-based using the custom DB).
[Figure A labels: RIBO-seq (n = 13,454); N-terminomics (n = 1,835); values: 1556, 259, 16, 3, 1, 4]

(B) SHOTGUN: Pie chart showing a +2.5% gain in protein identification using the combined RIBO-seq derived and SwissProt search database.
[Figure B labels: n = 3252; values: 3117, 86, 27, 11, 11, 49; categories: swiss, new, mutation, n-term-ext, alternative isoform; inset values: 83, 2, 1; inset categories: non-swiss, n-term-ext, mutation]

(C) Weblogos depicting the sequence context (three bases upstream and four bases downstream) of the newly identified translation initiation sites, clearly pointing to near-cognate translation initiation.
[Figure C: two WebLogo 3.3 sequence logos; y-axes: probability (0.0 to 1.0) and bits (0.0 to 2.0)]

Version 2
Custom DB based on:
- Rule-based TIS calling [1]
- Including SNP/INDEL information
- Only RIBO-seq derived sequences

Examples:
(A) UCSC genome browser screenshot showing the extended form of the Fxr2 gene translation product starting at a near-cognate GTG start site. The identified tryptic peptide is also depicted.
(B) 2 examples of (a)synonymous SNPs with overlapping tryptic peptide identification resulting from the custom RIBO-seq derived protein product database.
(A) [Figure: near-cognate GTG start triplet]

(B) [Figure: synonymous mutation: Vps53 gene]

(B) Asynonymous: Srp19 gene

generic_Q9D7A6|uc008ekg.1_36|Q9D7A6  MACAAARSPADQDRFICIYPAYLNNKKTIAEGRRIPISKAVENPTATEIQDVCSAVGLNA
sp|Q9D7A6|SRP19_MOUSE                MACSAARPPADQDRFIFIYPAYLNNKKTIAEGRRIPISKAVENPTATEIQDVCSAVGLNA
                                     ***:*** ******** *******************************************

generic_Q9D7A6|uc008ekg.1_36|Q9D7A6  FLEKNKMYSREWNRDVQFRGRVRVQLKQEDGSLCLVQFPSRKSVMLYVAEMIPKLKTRTQ
sp|Q9D7A6|SRP19_MOUSE                FLEKNKMYSREWNRDVQFRGRVRVQLKQEDGSLCLVQFPSRKSVMLYVAEMIPKLKTRTQ
                                     ************************************************************

generic_Q9D7A6|uc008ekg.1_36|Q9D7A6  KSGGADPSLQQGEGSKKGKGKKKK
sp|Q9D7A6|SRP19_MOUSE                KSGGADPILQQGEGSKKGKGKKKK
                                     ******* ****************

(C) Shotgun identifications using custom protein DB
[Figure C: bar chart of protein* identifications (axis 0 to 4000) for the four search databases listed in the table]

Search database                                             protein*  peptide*  PSM*
SVM-based TIS calling (version 1, no SwissProt)       OLD   2194      14519     32818
SVM-based TIS calling (version 1, + SwissProt)        OLD   3252      -         -
Stringent rule-based TIS calling                      NEW   3295      15913     32348
Stringent rule-based TIS calling, SNP/INDEL aware     NEW   3341      14967     32620

* 1% FDR level on protein, peptide and PSM level
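
The protein-level counts in the table can be compared as relative gains. The sketch below is only an illustration of that arithmetic: the function names and the choice of the SVM-based database without SwissProt as baseline are assumptions made here, not part of the pipeline described on this poster.

```python
from typing import Dict

# Protein-level identifications at 1% FDR, copied from the table above.
PROTEIN_IDS: Dict[str, int] = {
    "SVM, version 1, no SwissProt": 2194,
    "SVM, version 1, + SwissProt": 3252,
    "Rule-based": 3295,
    "Rule-based, SNP/INDEL aware": 3341,
}


def relative_gain(new: int, baseline: int) -> float:
    """Return the percentage change of `new` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be a positive count")
    return 100.0 * (new - baseline) / baseline


def gains_over_baseline(counts: Dict[str, int], baseline_key: str) -> Dict[str, float]:
    """Relative gain of every database over the chosen baseline (hypothetical helper)."""
    baseline = counts[baseline_key]
    return {
        name: relative_gain(value, baseline)
        for name, value in counts.items()
        if name != baseline_key
    }


if __name__ == "__main__":
    # Baseline choice is an assumption made for this illustration.
    gains = gains_over_baseline(PROTEIN_IDS, "SVM, version 1, no SwissProt")
    for name, gain in gains.items():
        print(f"{name}: {gain:+.1f}% protein identifications")

    # Effect of adding SNP/INDEL awareness to the rule-based database.
    snp_gain = relative_gain(
        PROTEIN_IDS["Rule-based, SNP/INDEL aware"], PROTEIN_IDS["Rule-based"]
    )
    print(f"SNP/INDEL awareness: {snp_gain:+.1f}%")
```

With these numbers, the rule-based databases identify about 50.2% and 52.3% more proteins than the SVM-based database without SwissProt, and SNP/INDEL awareness adds about 1.4% on top of rule-based TIS calling. The same helper can be applied to the peptide and PSM columns where values are reported.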