Introduction to the Laboratory for Integrated Bioinformatics
Todd Taylor, Team Leader
RIKEN Center for Integrative Medical Sciences

Metagenomics is the culture-independent cloning and analysis of bacterial DNA
• Environmental or community genomics
• Often we cannot isolate individual bacteria, so we must study their genomes as a mixture
• DNA is extracted directly from the environmental sample, with no culturing

Why the interest in metagenomics?
• Bacteria account for the vast majority of life forms on Earth
• Only 1-2% of microbes can be cultured
• Most species do not survive in isolation
• Widespread applications
  – Identification and synthesis of novel drugs and chemicals
  – Human health - probiotics
  – Biodegradation - sewage, ocean pollutants, plastics, garbage, nuclear waste, etc.
  – Energy generation - production of clean “green” fuels

Metagenomics differs radically from traditional genomics
• Ocean: Microbes floating in a vast sea
• Human Gut: Closed system for microbes
• Whale Fall: Microbes come together to degrade whale mass
• Soil: Dry environment with numerous microbes

Complex environmental systems have millions of microbes
Several microbes interact to complete biochemical pathways

Vast numbers of bacteria colonize different parts of the body
Numbers represent the number of organisms per gram of homogenized tissue or fluid, or per square centimeter of skin surface

Some of the mechanisms by which the normal flora competes with invading pathogens

We aim to reconstruct the combined community metabolic model

We can compare phylogenetic trees from multiple metagenomic samples

Segmented filamentous bacteria (SFB)
• A spore-forming, Gram-positive bacterium found in the mammalian gut.
• Induces the appearance of CD4+ T helper cells that produce IL-17 and IL-22 (Th17 cells).
• Colonization leads to increased expression of inflammatory and antimicrobial defense-related genes, such as Serum Amyloid A.

SFB-monocolonized mice and rats
Feces and cecal content collected, microbial DNA extracted*
Genome sequencing by whole-genome shotgun strategy using Sanger and 454 pyrosequencing

SFB Genome: An unculturable bacterium that induces the differentiation of Th17 cells in the mouse gut
Accepted in Cell Host & Microbe 2011

Pathway construction in SFB-rat-Yit

Conservation of TLR5-binding motifs in SFB flagellin proteins
Flagellins are the agonists of TLR5, which in turn directs Th17 cell production

Chemotaxis and flagellar assembly genes in SFB

Expression of flagellin genes in mouse SFB

There are many computational & bioinformatic challenges & bottlenecks to overcome
• Data management & storage
  – In-house & off-site
• Data transfer
  – Current internet is not capable
• Metagenomic sequence assembly
  – Diverse, complex, huge datasets
  – Huge memory is required
• Data analysis and massive parallel processing
  – Unprecedented scale
• Highly fragmented & incomplete data
  – How to make sense of it?
• Data integration
  – Comparison with other datasets and information resources
• More are sure to come...

What do we hope to achieve?
Use bioinformatic approaches to:
• Model entire ecosystems
  – Environment
  – Health
• Predict and understand
  – Fluctuations over time
  – Impact from external forces
  – Modifications at the genetic level
  – Manipulation of environments for the benefit of living species and the long-term sustainability of the Earth
• Develop tools and pipelines
  – Support other labs with high-throughput analyses