METHODS IN MOLECULAR BIOLOGY
Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, AL10 9AB, UK
For further volumes: http://www.springer.com/series/7651
RNA Bioinformatics
Edited by
Ernesto Picardi
Department of Biosciences, Biotechnology and Biopharmaceutics,
University of Bari, Bari, Italy
ISSN 1064-3745
ISSN 1940-6029 (electronic)
ISBN 978-1-4939-2290-1
ISBN 978-1-4939-2291-8 (eBook)
DOI 10.1007/978-1-4939-2291-8
Springer New York Heidelberg Dordrecht London
Library of Congress Control Number: 2014958476
© Springer Science+Business Media New York 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Printed on acid-free paper
Humana Press is a brand of Springer
Springer is part of Springer Science+Business Media (www.springer.com)
Editor
Ernesto Picardi
Department of Biosciences, Biotechnology and Biopharmaceutics
University of Bari
Bari, Italy
Institute of Biomembranes and Bioenergetics
National Research Council
Bari, Italy
National Institute of Biostructures and Biosystems (INBB)
Rome, Italy
Preface
RNA is a versatile nucleic acid polymer with a structure analogous to single-stranded DNA, although its backbone contains ribose sugar and the organic base thymine (T) is replaced by uracil (U). In contrast to DNA, RNA molecules are less stable and are characterized by secondary and tertiary structures, which underlie their different functions. According to the central dogma of molecular biology, RNA is directly involved in the flux of genetic information from DNA to proteins. However, recent developments in molecular biology indicate that RNA molecules have a plethora of functional roles and are indispensable for living organisms and cell homeostasis. Indeed, on the basis of their functions, RNA molecules can be divided into two groups, coding and noncoding. While the coding fraction is represented by messenger RNAs (mRNAs), noncoding RNAs (ncRNAs) include at least two main classes: (1) structural ncRNAs such as transfer RNAs (tRNAs), ribosomal RNAs (rRNAs), and small nucleolar RNAs (snoRNAs); (2) regulatory ncRNAs such as microRNAs (miRNAs), piwi-interacting RNAs (piRNAs), and long noncoding RNAs (lncRNAs).
A large fraction of eukaryotic transcriptomes consists of ncRNAs that play relevant biological roles, such as the regulation of gene expression in normal as well as pathological conditions. ncRNA functions are generally mediated by interactions with other RNA molecules, DNA regions, or proteins. In all cases, RNA secondary and tertiary structures are essential for correct function and interaction. However, the characterization of RNA molecules is a challenging task and, thus, over the past decades a variety of bioinformatics tools have been developed. Nowadays, RNA bioinformatics represents one of the most active fields of bioinformatics and computational biology.
Interest in RNA bioinformatics has increased rapidly thanks to the advent of high-throughput sequencing technologies that enable the investigation of complete transcriptomes at single-nucleotide resolution.
The present book has been conceived with the aim of providing an overview of RNA bioinformatics methodologies, from “classical” approaches for predicting secondary and tertiary structures to novel strategies and algorithms based on massive RNA sequencing. The content of the book is organized as follows:
– Part I: RNA secondary and tertiary structures. This section includes chapters devoted to the main computational algorithms to predict, draw, and edit secondary and tertiary RNA structures or infer RNA-RNA interactions.
– Part II: Analysis of high-throughput RNA sequencing data. The aim of this section is to provide a global overview of current methodologies to handle and analyze large sequencing datasets generated by next-generation sequencing (NGS) technologies. The section includes chapters on quality control of RNA sequencing data, the mapping of RNA-Seq reads to complete reference genomes, and gene expression profiling. In addition, methodologies to investigate posttranscriptional events such as alternative splicing or RNA editing, as well as entire meta-transcriptomes, are described in detail.
– Part III: Web resources for RNA data analysis. Finally, this section provides chapters on several available web resources for working with RNA data without specific computer requirements (hardware and software) or specialized bioinformatics skills.
Hoping that the book content meets readers’ expectations, I would like to acknowledge those who helped make this book possible: all chapter authors for their work and excellent contributions; the Series Editor, John Walker, for his constant support and suggestions; and my wife Angela and daughter Adele for their patience and encouragement.
Special thanks go to my mentor, Graziano Pesole, for his always indispensable, fatherly advice.
Finally, I would like to dedicate this effort to my parents, who have always believed in me, and to the memory of my first supervisor, Carla Quagliariello, who first introduced me to the wonderful and fascinating world of RNA bioinformatics.
Bari, Italy Ernesto Picardi
Contents
Preface
Contributors
PART I RNA SECONDARY AND TERTIARY STRUCTURES
1 Free Energy Minimization to Predict RNA Secondary Structures and Computational RNA Design
Alexander Churkin, Lina Weinbrand, and Danny Barash
2 RNA Secondary Structure Prediction from Multi-aligned Sequences
Michiaki Hamada
3 A Simple Protocol for the Inference of RNA Global Pairwise Alignments
Eugenio Mattei, Manuela Helmer-Citterich, and Fabrizio Ferrè
4 De Novo Secondary Structure Motif Discovery Using RNAProfile
Federico Zambelli and Giulio Pavesi
5 Drawing and Editing the Secondary Structure(s) of RNA
Yann Ponty and Fabrice Leclerc
6 Modeling and Predicting RNA Three-Dimensional Structures
Jérôme Waldispühl and Vladimir Reinharz
7 Fast Prediction of RNA–RNA Interaction Using Heuristic Algorithm
Soheila Montaseri
PART II ANALYSIS OF HIGH-THROUGHPUT RNA SEQUENCING DATA
8 Quality Control of RNA-Seq Experiments
Xing Li, Asha Nair, Shengqin Wang, and Liguo Wang
9 Accurate Mapping of RNA-Seq Data
Kin Fai Au
10 Quantifying Entire Transcriptomes by Aligned RNA-Seq Data
Raffaele A. Calogero and Francesca Zolezzi
11 Transcriptome Assembly and Alternative Splicing Analysis
Paola Bonizzoni, Gianluca Della Vedova, Graziano Pesole, Ernesto Picardi, Yuri Pirola, and Raffaella Rizzi
12 Detection of Post-Transcriptional RNA Editing Events
Ernesto Picardi, Anna Maria D’Erchia, Angela Gallo, and Graziano Pesole
13 Prediction of miRNA Targets
Anastasis Oulas, Nestoras Karathanasis, Annita Louloupi, Georgios A. Pavlopoulos, Panayiota Poirazi, Kriton Kalantidis, and Ioannis Iliopoulos
14 Using Deep Sequencing Data for Identification of Editing Sites in Mature miRNAs
Shahar Alon and Eli Eisenberg
15 NGS-Trex: An Automatic Analysis Workflow for RNA-Seq Data
Ilenia Boria, Lara Boatti, Igor Saggese, and Flavio Mignone
16 e-DNA Meta-Barcoding: From NGS Raw Data to Taxonomic Profiling
Fosso Bruno, Marzano Marinella, and Monica Santamaria
17 Deciphering Metatranscriptomic Data
Evguenia Kopylova, Laurent Noé, Corinne Da Silva, Jean-Frédéric Berthelot, Adriana Alberti, Jean-Marc Aury, and Hélène Touzet
18 RIP-Seq Data Analysis to Determine RNA–Protein Associations
Federico Zambelli and Giulio Pavesi
PART III WEB RESOURCES FOR RNA DATA ANALYSIS
19 The ViennaRNA Web Services
Andreas R. Gruber, Stephan H. Bernhart, and Ronny Lorenz
20 Exploring the RNA Editing Potential of RNA-Seq Data by ExpEdit
Mattia D’Antonio, Ernesto Picardi, Tiziana Castrignanò, Anna Maria D’Erchia, and Graziano Pesole
21 A Guideline for the Annotation of UTR Regulatory Elements in the UTRsite Collection
Matteo Giulietti, Giorgio Grillo, Sabino Liuni, and Graziano Pesole
22 Rfam: Annotating Families of Non-Coding RNA Sequences
Jennifer Daub, Ruth Y. Eberhardt, John G. Tate, and Sarah W. Burge
23 ASPicDB: A Database Web Tool for Alternative Splicing Analysis
Mattia D’Antonio, Tiziana Castrignanò, Matteo Pallocca, Anna Maria D’Erchia, Ernesto Picardi, and Graziano Pesole
24 Analysis of Alternative Splicing Events in Custom Gene Datasets by AStalavista
Sylvain Foissac and Michael Sammeth
25 Computational Design of Artificial RNA Molecules for Gene Regulation
Alessandro Laganà, Dario Veneziano, Francesco Russo, Alfredo Pulvirenti, Rosalba Giugno, Carlo Maria Croce, and Alfredo Ferro
Index
Contributors
ADRIANA ALBERTI • Genoscope - National Sequencing Center, Evry, France
SHAHAR ALON • Department of Neurobiology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel; Sagol School of Neuroscience, Tel-Aviv University, Tel-Aviv, Israel
KIN FAI AU • Department of Internal Medicine, University of Iowa, Iowa City, IA, USA; Department of Biostatistics, University of Iowa, Iowa City, IA, USA
JEAN-MARC AURY • Genoscope - National Sequencing Center, Evry, France
DANNY BARASH • Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
STEPHAN H. BERNHART • Department of Bioinformatics, University of Leipzig, Leipzig, Germany
JEAN-FRÉDÉRIC BERTHELOT • Inria Lille Nord-Europe, Villeneuve d’Ascq, France
LARA BOATTI • Department of Sciences and Technological Innovation (DiSIT), University of Piemonte Orientale “A. Avogadro”, Alessandria, Italy
PAOLA BONIZZONI • Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Milano, Italy
ILENIA BORIA • Department of Chemistry, University of Milano, Milano, Italy
FOSSO BRUNO • Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari, Bari, Italy
SARAH W. BURGE • Wellcome Trust Sanger Institute, Cambridgeshire, UK; European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridgeshire, UK
RAFFAELE A. CALOGERO • Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino, Italy
TIZIANA CASTRIGNANÒ • Consorzio Interuniversitario CINECA, Rome, Italy
ALEXANDER CHURKIN • Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
CARLO MARIA CROCE • Department of Molecular Virology, Immunology and Medical Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
MATTIA D’ANTONIO • Consorzio Interuniversitario CINECA, Rome, Italy
ANNA MARIA D’ERCHIA • Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari, Bari, Italy; Institute of Biomembranes and Bioenergetics, National Research Council, Bari, Italy
JENNIFER DAUB • Wellcome Trust Sanger Institute, Cambridgeshire, UK; European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridgeshire, UK
RUTH Y. EBERHARDT • Wellcome Trust Sanger Institute, Cambridgeshire, UK; European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridgeshire, UK
ELI EISENBERG • Raymond and Beverly Sackler School of Physics and Astronomy, Tel-Aviv University, Tel-Aviv, Israel; Sagol School of Neuroscience, Tel-Aviv University, Tel-Aviv, Israel
FABRIZIO FERRÈ • Center for Molecular Bioinformatics (CBM), Department of Biology, University of Rome Tor Vergata, Rome, Italy
ALFREDO FERRO • Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
SYLVAIN FOISSAC • UMR1388 GenPhySE, French National Institute for Agricultural Research (INRA), Castanet Tolosan, France
ANGELA GALLO • RNA Editing Laboratory, Oncohaematology Department, IRCCS Ospedale Pediatrico “Bambino Gesù”, Rome, Italy
ROSALBA GIUGNO • Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
MATTEO GIULIETTI • Institute of Biomembranes and Bioenergetics, National Research Council, Bari, Italy
GIORGIO GRILLO • Institute of Biomedical Technologies, National Research Council, Bari, Italy
ANDREAS R. GRUBER • Biozentrum, University of Basel, Basel, Switzerland; Swiss Institute of Bioinformatics, Basel, Switzerland
MICHIAKI HAMADA • Faculty of Science and Engineering, Waseda University, Tokyo, Japan; Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
MANUELA HELMER-CITTERICH • Center for Molecular Bioinformatics (CBM), Department of Biology, University of Rome Tor Vergata, Rome, Italy
IOANNIS ILIOPOULOS • Division of Basic Sciences, University of Crete Medical School, Heraklion, Greece
KRITON KALANTIDIS • Institute of Molecular Biology and Biotechnology, FORTH, Heraklion, Greece; Department of Biology, University of Crete, Heraklion, Greece
NESTORAS KARATHANASIS • Institute of Molecular Biology and Biotechnology, FORTH, Heraklion, Greece; Department of Biology, University of Crete, Heraklion, Greece
EVGUENIA KOPYLOVA • Laboratoire d’Informatique Fondamentale de Lille (LIFL), UMR CNRS 8022, Villeneuve d’Ascq Cédex, France; Inria Lille Nord-Europe, Villeneuve d’Ascq Cédex, France
ALESSANDRO LAGANÀ • Department of Molecular Virology, Immunology and Medical Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
FABRICE LECLERC • Institut de Génétique et Microbiologie (IGM) UMR 8621, Université Paris Sud, Orsay, France
XING LI • Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Minneapolis, MN, USA
SABINO LIUNI • Institute of Biomedical Technologies, National Research Council, Bari, Italy
RONNY LORENZ • Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
ANNITA LOULOUPI • Biomedical Science-Medical Biology, University of Amsterdam, Amsterdam, The Netherlands
MARZANO MARINELLA • Institute of Biomembranes and Bioenergetics, CNR, Bari, Italy
EUGENIO MATTEI • Center for Molecular Bioinformatics (CBM), Department of Biology, University of Rome Tor Vergata, Rome, Italy
FLAVIO MIGNONE • Department of Sciences and Technological Innovation (DiSIT), University of Piemonte Orientale “A. Avogadro”, Alessandria, Italy
SOHEILA MONTASERI • Department of Mathematics, Statistics and Computer Science, University of Tehran, Tehran, Iran
ASHA NAIR • Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Minneapolis, MN, USA
LAURENT NOÉ • Laboratoire d’Informatique Fondamentale de Lille (LIFL), UMR CNRS 8022, Villeneuve d’Ascq Cédex, France; Inria Lille Nord-Europe, Villeneuve d’Ascq Cédex, France
ANASTASIS OULAS • Institute of Marine Biology, Biotechnology and Aquaculture, HCMR, Heraklion, Greece
MATTEO PALLOCCA • Translational Oncogenomics Unit, Italian National Cancer Institute “Regina Elena”, Rome, Italy
GIULIO PAVESI • Dipartimento di Bioscienze, Università di Milano, Milano, Italy
GEORGIOS A. PAVLOPOULOS • Division of Basic Sciences, University of Crete Medical School, Heraklion, Greece
GRAZIANO PESOLE • Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari, Bari, Italy; Institute of Biomembranes and Bioenergetics, National Research Council, Bari, Italy; National Institute of Biostructures and Biosystems (INBB), Rome, Italy
ERNESTO PICARDI • Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari, Bari, Italy; Institute of Biomembranes and Bioenergetics, National Research Council, Bari, Italy; National Institute of Biostructures and Biosystems (INBB), Rome, Italy
YURI PIROLA • Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Milano, Italy
PANAYIOTA POIRAZI • Institute of Molecular Biology and Biotechnology, FORTH, Heraklion, Greece
YANN PONTY • Laboratoire d’Informatique de l’X (LIX) UMR 7161, CNRS and INRIA AMIB, Ecole Polytechnique, Palaiseau, France
ALFREDO PULVIRENTI • Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
VLADIMIR REINHARZ • School of Computer Science, McGill University, Montreal, QC, Canada
RAFFAELLA RIZZI • Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Milano, Italy
FRANCESCO RUSSO • Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy; Laboratory for Integrative System Medicine (LISM), Institute of Informatics and Telematics (IIT) and Institute of Clinical Physiology (IFC), National Research Council (CNR), Pisa, Italy
IGOR SAGGESE • Department of Sciences and Technological Innovation (DiSIT), University of Piemonte Orientale “A. Avogadro”, Alessandria, Italy
MICHAEL SAMMETH • Bioinformatics, National Laboratory of Scientific Computing (LNCC), Rio de Janeiro, Brazil
MONICA SANTAMARIA • Institute of Biomembranes and Bioenergetics, CNR, Bari, Italy
CORINNE DA SILVA • Genoscope - National Sequencing Center, Evry, France
JOHN G. TATE • Wellcome Trust Sanger Institute, Cambridgeshire, UK; European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridgeshire, UK
HÉLÈNE TOUZET • Laboratoire d’Informatique Fondamentale de Lille (LIFL), UMR CNRS 8022, Villeneuve d’Ascq Cédex, France; Inria Lille Nord-Europe, Villeneuve d’Ascq Cédex, France
GIANLUCA DELLA VEDOVA • Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Milano, Italy
DARIO VENEZIANO • Department of Molecular Virology, Immunology and Medical Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA; Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
JÉRÔME WALDISPÜHL • School of Computer Science, McGill University, Montreal, QC, Canada
SHENGQIN WANG • School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu, PR China
LIGUO WANG • Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Minneapolis, MN, USA
LINA WEINBRAND • Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
FEDERICO ZAMBELLI • Istituto di Biomembrane e Bioenergetica, Consiglio Nazionale delle Ricerche - CNR, Bari, Italy
FRANCESCA ZOLEZZI • Singapore Immunology Network (SIgN), Agency of Science, Technology and Research (A*STAR), Biopolis, Singapore, Singapore