Comparative Genomics
METHODS IN MOLECULAR BIOLOGY™
John M. Walker, SERIES EDITOR
404. Topics in Biostatistics, edited by Walter T. Ambrosius, 2007
403. Patch-Clamp Methods and Protocols, edited by Peter Molnar and James J. Hickman, 2007
402. PCR Primer Design, edited by Anton Yuryev, 2007
401. Neuroinformatics, edited by Chiquito J. Crasto, 2007
400. Methods in Lipid Membranes, edited by Alex Dopico, 2007
399. Neuroprotection Methods and Protocols, edited by Tiziana Borsello, 2007
398. Lipid Rafts, edited by Thomas J. McIntosh, 2007
397. Hedgehog Signaling Protocols, edited by Jamila I. Horabin, 2007
396. Comparative Genomics, Volume 2, edited by Nicholas H. Bergman, 2007
395. Comparative Genomics, Volume 1, edited by Nicholas H. Bergman, 2007
394. Salmonella: Methods and Protocols, edited by Heide Schatten and Abe Eisenstark, 2007
393. Plant Secondary Metabolites, edited by Harinder P. S. Makkar, P. Siddhuraju, and Klaus Becker, 2007
392. Molecular Motors: Methods and Protocols, edited by Ann O. Sperry, 2007
391. MRSA Protocols, edited by Yinduo Ji, 2007
390. Protein Targeting Protocols, Second Edition, edited by Mark van der Giezen, 2007
389. Pichia Protocols, Second Edition, edited by James M. Cregg, 2007
388. Baculovirus and Insect Cell Expression Protocols, Second Edition, edited by David W. Murhammer, 2007
387. Serial Analysis of Gene Expression (SAGE): Digital Gene Expression Profiling, edited by Kare Lehmann Nielsen, 2007
386. Peptide Characterization and Application Protocols, edited by Gregg B. Fields, 2007
385. Microchip-Based Assay Systems: Methods and Applications, edited by Pierre N. Floriano, 2007
384. Capillary Electrophoresis: Methods and Protocols, edited by Philippe Schmitt-Kopplin, 2007
383. Cancer Genomics and Proteomics: Methods and Protocols, edited by Paul B. Fisher, 2007
382. Microarrays, Second Edition: Volume 2, Applications and Data Analysis, edited by Jang B. Rampal, 2007
381. Microarrays, Second Edition: Volume 1, Synthesis Methods, edited by Jang B. Rampal, 2007
380. Immunological Tolerance: Methods and Protocols, edited by Paul J. Fairchild, 2007
379. Glycovirology Protocols, edited by Richard J. Sugrue, 2007
378. Monoclonal Antibodies: Methods and Protocols, edited by Maher Albitar, 2007
377. Microarray Data Analysis: Methods and Applications, edited by Michael J. Korenberg, 2007
376. Linkage Disequilibrium and Association Mapping: Analysis and Application, edited by Andrew R. Collins, 2007
375. In Vitro Transcription and Translation Protocols: Second Edition, edited by Guido Grandi, 2007
374. Quantum Dots: edited by Marcel Bruchez and Charles Z. Hotz, 2007
373. Pyrosequencing® Protocols, edited by Sharon Marsh, 2007
372. Mitochondria: Practical Protocols, edited by Dario Leister and Johannes Herrmann, 2007
371. Biological Aging: Methods and Protocols, edited by Trygve O. Tollefsbol, 2007
370. Adhesion Protein Protocols, Second Edition, edited by Amanda S. Coutts, 2007
369. Electron Microscopy: Methods and Protocols, Second Edition, edited by John Kuo, 2007
368. Cryopreservation and Freeze-Drying Protocols, Second Edition, edited by John G. Day and Glyn Stacey, 2007
367. Mass Spectrometry Data Analysis in Proteomics, edited by Rune Matthiesen, 2007
366. Cardiac Gene Expression: Methods and Protocols, edited by Jun Zhang and Gregg Rokosh, 2007
365. Protein Phosphatase Protocols: edited by Greg Moorhead, 2007
364. Macromolecular Crystallography Protocols: Volume 2, Structure Determination, edited by Sylvie Doublié, 2007
363. Macromolecular Crystallography Protocols: Volume 1, Preparation and Crystallization of Macromolecules, edited by Sylvie Doublié, 2007
362. Circadian Rhythms: Methods and Protocols, edited by Ezio Rosato, 2007
361. Target Discovery and Validation Reviews and Protocols: Emerging Molecular Targets and Treatment Options, Volume 2, edited by Mouldy Sioud, 2007
360. Target Discovery and Validation Reviews and Protocols: Emerging Strategies for Targets and Biomarker Discovery, Volume 1, edited by Mouldy Sioud, 2007
359. Quantitative Proteomics by Mass Spectrometry, edited by Salvatore Sechi, 2007
358. Metabolomics: Methods and Protocols, edited by Wolfram Weckwerth, 2007
357. Cardiovascular Proteomics: Methods and Protocols, edited by Fernando Vivanco, 2007
356. High-Content Screening: A Powerful Approach to Systems Cell Biology and Drug Discovery, edited by D. Lansing Taylor, Jeffrey Haskins, and Ken Guiliano, 2007
355. Plant Proteomics: Methods and Protocols, edited by Hervé Thiellement, Michel Zivy, Catherine Damerval, and Valerie Mechin, 2007
354. Plant–Pathogen Interactions: Methods and Protocols, edited by Pamela C. Ronald, 2006
METHODS IN MOLECULAR BIOLOGY™
Comparative Genomics
Volume 2
Edited by
Nicholas H. Bergman
Bioinformatics Program and Department of Microbiology and Immunology,
University of Michigan Medical School, Ann Arbor, MI
©2007 Humana Press Inc.
999 Riverview Drive, Suite 208
Totowa, New Jersey 07512
www.humanapress.com
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise without written permission from the Publisher. Methods in Molecular Biology™ is a trademark of The Humana Press Inc.
All papers, comments, opinions, conclusions, or recommendations are those of the author(s), and do not necessarily reflect the views of the publisher.
This publication is printed on acid-free paper. ANSI Z39.48-1984 (American Standards Institute) Permanence of Paper for Printed Library Materials
Cover illustration: From Figure 1, Volume 1, Chapter 10, “PSI-BLAST Tutorial,” by Medha Bhagwat and L. Aravind. Ribbon diagrams comparing the three-dimensional structures of the human PCNA protein and the E. coli DNA polymerase III beta subunit. The coordinates for these structures are taken from a public database.
Cover Design: Karen Schulz
Production Editor: Christina M. Thomas
For additional copies, pricing for bulk purchases, and/or information about other Humana titles, contact Humana at the above address or at any of the following numbers: Tel.: 973-256-1699; Fax: 973-256-8341; E-mail: [email protected]; or visit our Website: www.humanapress.com
Photocopy Authorization Policy: Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by Humana Press Inc., provided that the base fee of US $30 per copy is paid directly to the Copyright Clearance Center at 222 Rosewood Drive, Danvers, MA 01923. For those organizations that have been granted a photocopy license from the CCC, a separate system of payment has been arranged and is acceptable to Humana Press Inc. The fee code for users of the Transactional Reporting Service is: [978-1-934115-37-4/07 $30].
Printed in the United States of America. 10 9 8 7 6 5 4 3 2 1
Library of Congress Control Number: 2007930590
Preface
Over the last ten years the amount of biological sequence data available to researchers has increased by several orders of magnitude, and complete genome sequences (nearly nonexistent ten years ago) have become commonplace. The techniques involved in analyzing these sequences have evolved almost as rapidly, and several (e.g., BLAST) have become so commonly used in molecular biology that their names have become verbs. Even so, a number of extremely powerful tools and techniques developed for comparative genomic analysis remain unfamiliar to molecular biologists, and thus are underutilized.
The primary aim of these volumes is to provide a set of tutorials that will be useful to molecular biologists beginning to use comparative genomic analysis tools in a number of different areas. Volume 1 contains the first four of seven sections. In the first section, the reader is introduced to genomes via a number of visualization tools that allow one to browse through a particular genome of interest. The second and third sections deal with comparative analysis at the level of individual sequences, and present methods useful in sequence alignment, the discovery of conserved sequence motifs, and the analysis of codon usage. The fourth section deals with the identification and structural characterization of non-coding RNA genes; this class of genes is particularly difficult to predict, and discovery of these elements is almost completely reliant on comparative genomics. (Note that the much larger question of identifying protein-coding genes is not addressed here, because there is a separate volume in the MiMB series devoted to this issue.)
In the second volume, the fifth section describes a number of tools for comparative analysis of domain and gene families. These tools are particularly useful for predicting protein function as well as potential protein-protein interactions. In the sixth section, methods for comparing groups of genes and gene order are discussed, as are several tools for analyzing genome evolution. Finally, the seventh section deals with experimental comparative genomics. This section includes methods for comparing gene copy number across an entire genome, comparative genomic hybridization, SNP analysis, as well as genome-wide mapping and typing systems for bacterial genomes.
Each chapter includes not only detailed instructions for using a particular tool or method, but also an introduction to the theory behind the technique. Importantly, there are also a number of Notes at the end of each chapter that guide the beginning user through commonly encountered difficulties, and provide key tips for using the method most efficiently. Readers are encouraged to note that although some of the tools presented in a given section are quite similar in aim, they are often designed quite differently, and will have different strengths and weaknesses. This is particularly true in considering the computational tools, where the same overall goal (e.g., discovery of conserved motifs) can be pursued using a number of very different statistical approaches. Users should therefore explore several different options in attempting comparative analyses; a combined approach is often best.
These volumes are the collective effort of many people. I would like to extend a special thanks to all of the contributors, and to the staff at Humana Press, who helped at every stage of the publication process. I would also like to especially thank Erica Anderson and Ellen Swenson at the University of Michigan Medical School and Tim Read at the US Naval Medical Research Center for valuable advice and help in putting these books together.
Nicholas H. Bergman, PhD
Table of Contents
Preface
Contributors
Table of Contents, Volume 1

Part I: Comparative Analysis of Domain and Protein Families
1 Computational Prediction of Domain Interactions
    Philipp Pagel, Normann Strack, Matthias Oesterheld, Volker Stümpflen, and Dmitrij Frishman
2 Domain Team: Synteny of Domains is a New Approach in Comparative Genomics
    Sophie Pasek
3 Inference of Gene Function Based on Gene Fusion Events: The Rosetta-Stone Method
    Karsten Suhre
4 Pfam: A Domain-Centric Method for Analyzing Proteins and Proteomes
    Jaina Mistry and Robert Finn
5 InterPro and InterProScan: Tools for Protein Sequence Classification and Comparison
    Nicola Mulder and Rolf Apweiler
6 Gene Annotation and Pathway Mapping in KEGG
    Kiyoko F. Aoki-Kinoshita and Minoru Kanehisa

Part II: Orthologs, Synteny, and Genome Evolution
7 Ortholog Detection Using the Reciprocal Smallest Distance Algorithm
    Dennis P. Wall and Todd DeLuca
8 Finding Conserved Gene Order Across Multiple Genomes
    Giulio Pavesi and Graziano Pesole
9 Analysis of Genome Rearrangement by Block-Interchanges
    Chin Lung Lu, Ying Chih Lin, Yen Lin Huang, and Chuan Yi Tang
10 Analyzing Patterns of Microbial Evolution Using the Mauve Genome Alignment System
    Aaron E. Darling, Todd J. Treangen, Xavier Messeguer, and Nicole T. Perna
11 Visualization of Syntenic Relationships With SynBrowse
    Volker Brendel, Stefan Kurtz, and Xioakang Pan
12 Gecko and GhostFam: Rigorous and Efficient Gene Cluster Detection in Prokaryotic Genomes
    Thomas Schmidt and Jens Stoye

Part III: Experimental Analysis of Whole Genomes: Analysis of Copy Number and Sequence Polymorphisms
13 Genome-wide Copy Number Analysis on GeneChip® Platform Using Copy Number Analyzer for Affymetrix GeneChip 2.0 Software
    Seishi Ogawa, Yasuhito Nanya, and Go Yamamoto
14 Oligonucleotide Array Comparative Genomic Hybridization
    Paul van den IJssel and Bauke Ylstra
15 Studying Bacterial Genome Dynamics Using Microarray-Based Comparative Genomic Hybridization
    Eduardo N. Taboada, Christian C. Luebbert, and John H. E. Nash
16 DNA Copy Number Data Analysis Using the CGHAnalyzer Software Suite
    Joel Greshock
17 Microarray-Based Approach for Genome-Wide Survey of Nucleotide Polymorphisms
    Brian W. Brunelle and Tracy L. Nicholson
18 High-Throughput Genotyping of Single Nucleotide Polymorphisms with High Sensitivity
    Honghua Li, Hui-Yun Wang, Xiangfeng Cui, Minjie Luo, Guohong Hu, Danielle M. Greenawalt, Irina V. Tereshchenko, James Y. Li, Yi Chu, and Richeng Gao
19 Single Nucleotide Polymorphism Mapping Array Assay
    Xiaofeng Zhou and David T. W. Wong
20 Molecular Inversion Probe Assay
    Farnaz Absalan and Mostafa Ronaghi
21 novoSNP3: Variant Detection and Sequence Annotation in Resequencing Projects
    Peter De Rijk and Jurgen Del-Favero
22 Rapid Identification of Single Nucleotide Substitutions Using SeqDoC
    Mark L. Crowe
23 SNPHunter: A Versatile Web-Based Tool for Acquiring and Managing Single Nucleotide Polymorphisms
    Tianhua Niu
24 Identification of Disease Genes: Example-Driven Web-Based Tutorial
    Medha Bhagwat
25 Variable Number Tandem Repeat Typing of Bacteria
    Siamak P. Yazdankhah and Bjørn-Arne Lindstedt
26 Fluorescent Amplified Fragment Length Polymorphism Genotyping of Bacterial Species
    Meeta Desai
27 FLP-Mapping: A Universal, Cost-Effective, and Automatable Method for Gene Mapping
    Knud Nairz, Peder Zipperlen, and Manuel Schneider

Index
Contributors
Farnaz Absalan • Stanford Genome Technology Center
Kiyoko Aoki-Kinoshita • Department of Bioinformatics, Soka University, Faculty of Engineering
Rolf Apweiler • European Bioinformatics Institute
Medha Bhagwat • National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
Volker Brendel • Department of Genetics, Development, and Cell Biology, Iowa State University
Brian Brunelle • Virus and Prion Diseases of Livestock Research Unit, National Animal Disease Center, USDA Agricultural Research Service
Yi Chu • University of Medicine and Dentistry of New Jersey, Robert Wood Johnson Medical School, Department of Molecular Genetics, Microbiology & Immunology
Mark L. Crowe • Genetic Solutions Pty Ltd, Australia
Xiangfeng Cui • University of Medicine and Dentistry of New Jersey, Robert Wood Johnson Medical School, Department of Molecular Genetics, Microbiology & Immunology
Aaron Darling • Department of Computer Science, University of Wisconsin-Madison
Peter De Rijk • Department of Molecular Genetics, University of Antwerp
Jurgen Del-Favero • Department of Molecular Genetics, University of Antwerp
Todd DeLuca • Department of Systems Biology, Harvard Medical School
Meeta Desai • Applied and Functional Genomics, Health Protection Agency, United Kingdom
Robert Finn • Wellcome Trust Sanger Institute
Dmitrij Frishman • Technical University of Munich, Department of Genome Oriented Bioinformatics, Institute for Bioinformatics/MIPS, GSF-Research Center for Environment and Health
Richeng Gao • University of Medicine and Dentistry of New Jersey, Robert Wood Johnson Medical School, Department of Molecular Genetics, Microbiology & Immunology
Danielle M. Greenawalt • University of Medicine and Dentistry of New Jersey, Robert Wood Johnson Medical School, Department of Molecular Genetics, Microbiology & Immunology
Joel Greshock • GlaxoSmithKline, Abramson Family Cancer Research Institute, University of Pennsylvania School of Medicine
Guohong Hu • University of Medicine and Dentistry of New Jersey, Robert Wood Johnson Medical School, Department of Molecular Genetics, Microbiology & Immunology
Yen Lin Huang • Department of Computer Science, National Tsing Hua University
Minoru Kanehisa • Kyoto University Bioinformatics Center, Human Genome Center, Institute of Medical Science, University of Tokyo
Stefan Kurtz • Department of Genetics, Development, and Cell Biology, Iowa State University
Honghua Li • University of Medicine and Dentistry of New Jersey, Robert Wood Johnson Medical School, Department of Molecular Genetics, Microbiology & Immunology
James Y. Li • University of Medicine and Dentistry of New Jersey, Robert Wood Johnson Medical School, Department of Molecular Genetics, Microbiology & Immunology
Ying Chih Lin • Department of Computer Science, National Tsing Hua University
Bjørn-Arne Lindstedt • Norwegian Institute of Public Health
Chin Lung Lu • Department of Biological Science and Technology, National Chiao Tung University
Christian C. Luebbert • Genomics and Proteomics Group, Institute for Biological Sciences, Canadian National Research Council
Minjie Luo • University of Medicine and Dentistry of New Jersey, Robert Wood Johnson Medical School, Department of Molecular Genetics, Microbiology & Immunology
Xavier Messeguer • Department of Software, Technical University of Catalonia-Barcelona, Barcelona Supercomputing Center (BSC)
Jaina Mistry • Wellcome Trust Sanger Institute
Nicola Mulder • European Bioinformatics Institute
Knud Nairz • Institute of Neuropathology, University Hospital of Zurich
Yasuhito Nanya • University of Tokyo, Department of Regeneration Medicine
John H.E. Nash • Genomics and Proteomics Group, Institute for Biological Sciences, Canadian National Research Council
Tracy Nicholson • Respiratory Diseases of Livestock Research Unit, National Animal Disease Center, USDA Agricultural Research Service
Tianhua Niu • Division of Preventative Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School
Matthias Oesterheld • Institute for Bioinformatics/MIPS, GSF-Research Center for Environment and Health
Seishi Ogawa • University of Tokyo, Department of Regeneration Medicine
Philipp Pagel • Technical University of Munich, Department of Genome Oriented Bioinformatics, Institute for Bioinformatics/MIPS, GSF-Research Center for Environment and Health
Xioakang Pan • Department of Genetics, Development, and Cell Biology, Iowa State University
Sophie Pasek • Laboratoire Statistique et Génome, CNRS
Giulio Pavesi • Dipartimento di Scienze Biomolecolari e Biotecnologie, University of Milan
Nicole T. Perna • Department of Animal Health and Biomedical Sciences, Genome Center, University of Wisconsin-Madison
Graziano Pesole • Dipartimento di Biochimica e Biologia Molecolare, University of Bari and Istituto Tecnologie Biomediche del C.N.R. (sede di Bari)
Mostafa Ronaghi • Stanford Genome Technology Center
Thomas Schmidt • Technische Fakultät, Universität Bielefeld, International NRW Graduate School in Bioinformatics and Genome Research, Germany
Manuel Schneider • Kantonsschule Zug
Jens Stoye • Technische Fakultät, Universität Bielefeld, Germany
Normann Strack • Technical University of Munich, Department of Genome Oriented Bioinformatics
Volker Stümpflen • Institute for Bioinformatics/MIPS, GSF-Research Center for Environment and Health
Karsten Suhre • Information Génomique et Structurale, CNRS
Eduardo N. Taboada • Genomics and Proteomics Group, Institute for Biological Sciences, Canadian National Research Council
Chuan Yi Tang • Department of Computer Science, National Tsing Hua University
Irina V. Tereshchenko • University of Medicine and Dentistry of New Jersey, Robert Wood Johnson Medical School, Department of Molecular Genetics, Microbiology & Immunology
Todd Treangen • Department of Software, Technical University of Catalonia-Barcelona
Paul van den IJssel • VU University Medical Center
Dennis P. Wall • Department of Systems Biology, Harvard Medical School
Hui-Yun Wang • University of Medicine and Dentistry of New Jersey, Robert Wood Johnson Medical School, Department of Molecular Genetics, Microbiology & Immunology
David T.W. Wong • UCLA School of Dentistry
Go Yamamoto • University of Tokyo, Department of Regeneration Medicine
Siamak P. Yazdankhah • Norwegian Institute of Public Health
Bauke Ylstra • VU University Medical Center
Xiaofeng Zhou • UCLA School of Dentistry
Peder Zipperlen • Tecan Schweiz AG
Table of Contents, Volume 1
Preface
Contributors

Part 1: Genome Visualization and Annotation
1 Comparative Analysis and Visualization of Genomic Sequences Using VISTA Browser and Associated Computational Tools
    Inna Dubchak
2 Comparative Genomic Analysis Using the UCSC Genome Browser
    Donna Karolchik, Gill Bejerano, Angie S. Hinrichs, Robert M. Kuhn, Webb Miller, Kate R. Rosenbloom, Ann S. Zweig, David Haussler, and W. James Kent
3 Comparative Genome Analysis in the Integrated Microbial Genomes (IMG) System
    Victor M. Markowitz and Nikos C. Kyrpides
4 WebACT: An Online Genome Comparison Suite
    James C. Abbott, David M. Aanensen, and Stephen D. Bentley
5 GenColors: Annotation and Comparative Genomics of Prokaryotes Made Easy
    Alessandro Romualdi, Marius Felder, Dominic Rose, Ulrike Gausmann, Markus Schilhabel, Gernot Glöckner, Matthias Platzer, and Jürgen Sühnel
6 Comparative Microbial Genome Visualization Using GenomeViz
    Rohit Ghai and Trinad Chakraborty
7 BugView: A Tool for Genome Visualization and Comparison
    David P. Leader
8 CGAS: A Comparative Genome Annotation System
    Kwangmin Choi, Youngik Yang, and Sun Kim

Part 2: Sequence Alignments
9 BLAST QuickStart: Example-Driven Web-Based BLAST Tutorial
    David Wheeler and Medha Bhagwat
10 PSI-BLAST Tutorial
    Medha Bhagwat and L. Aravind
11 Organizing and Updating Whole Genome BLAST Searches With ReHAB
    David J. Esteban, Aijazuddin Syed, and Chris Upton
12 Alignment of Genomic Sequences Using DIALIGN
    Burkhard Morgenstern
13 An Introduction to the Lagan Alignment Toolkit
    Michael Brudno
14 Aligning Multiple Whole Genomes with Mercator and MAVID
    Colin N. Dewey
15 Mulan: Multiple-Sequence Alignment to Predict Functional Elements in Genomic Sequences
    Gabriela G. Loots and Ivan Ovcharenko
16 Improving Pairwise Sequence Alignment between Distantly Related Proteins
    Jin-an Feng

Part 3: Identification of Conserved Sequences and Biases in Codon Usage
17 Discovering Sequence Motifs
    Timothy L. Bailey
18 Discovery of Conserved Motifs in Promoters of Orthologous Genes in Prokaryotes
    Rekin’s Janky and Jacques van Helden
19 PhyME: A Software Tool for Finding Motifs in Sets of Orthologous Sequences
    Saurabh Sinha
20 Comparative Genomics-Based Orthologous Promoter Analysis Using the DoOP Database and the DoOPSearch Web Tool
    Endre Barta
21 Discovery of Motifs in Promoters of Coregulated Genes
    Olivier Sand and Jacques van Helden
22 Fastcompare: A Nonalignment Approach for Genome-Scale Discovery of DNA and mRNA Regulatory Elements Using Network-Level Conservation
    Olivier Elemento and Saeed Tavazoie
23 Phylogenetic Footprinting to Find Functional DNA Elements
    Austen R. D. Ganley and Takehiko Kobayashi
24 Detecting Regulatory Sites Using PhyloGibbs
    Rahul Siddharthan and Erik van Nimwegen
25 Using the Gibbs Motif Sampler for Phylogenetic Footprinting
    William Thompson, Sean Conlan, Lee Ann McCue, and Charles E. Lawrence
26 Web-Based Identification of Evolutionary Conserved DNA cis-Regulatory Elements
    Panayiotis V. Benos, David L. Corcoran, and Eleanor Feingold
27 Exploring Conservation of Transcription Factor Binding Sites with CONREAL
    Eugene Berezikov, Victor Guryev, and Edwin Cuppen
28 Computational and Statistical Methodologies for ORFeome Primary Structure Analysis
    Gabriela Moura, Miguel Pinheiro, Adelaide Valente Freitas, José Luís Oliveira, and Manuel A. S. Santos

Part 4: Identification and Structural Characterization of Noncoding RNAs
29 Comparative Analysis of RNA Genes: The caRNAc Software
    Hélène Touzet
30 Efficient Annotation of Bacterial Genomes for Small, Noncoding RNAs Using the Integrative Computational Tool sRNAPredict2
    Jonathan Livny
31 Methods for Multiple Alignment and Consensus Structure Prediction of RNAs Implemented in MARNA
    Sven Siebert and Rolf Backofen
32 Prediction of Structural Noncoding RNAs With RNAz
    Stefan Washietl
33 RNA Consensus Structure Prediction With RNAalifold
    Ivo L. Hofacker

Index