Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014. Lecture 1: EVE 161: Microbial Phylogenomics. Lecture #1: Introduction. UC Davis, Winter 2014. Instructor: Jonathan Eisen

EVE161 Lecture 1

May 10, 2015


Jonathan Eisen


Microbial Phylogenomics course at UC Davis
Page 1: EVE161 Lecture 1

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Lecture 1:

EVE 161: Microbial Phylogenomics

Lecture #1: Introduction

UC Davis, Winter 2014

Instructor: Jonathan Eisen

Page 2: EVE161 Lecture 1

Where we are going and where we have been

• Previous lecture:

• Current Lecture: 1. Introduction

• Next Lecture: 2. Evolution of DNA sequencing

Page 3: EVE161 Lecture 1

Lecture 1 Outline

• Course details

• Four eras of sequencing

• Introduction to phylogeny

Page 4: EVE161 Lecture 1

Main topics of the course

• DNA sequence based studies of microbial diversity

• Four Eras of sequencing
– The Tree of Life
– rRNA from environments
– Genome Sequencing
– Metagenomics

Page 5: EVE161 Lecture 1

Textbook/Reading

• Each lecture will have some associated background reading and 1+ primary literature papers

• Whenever possible, the primary literature used will be “Open Access” material

• There will also be news stories, blogs and other “media” to review / read

Page 6: EVE161 Lecture 1

What you should learn from the course

• History of sequence based studies of microbial diversity

• Current practice in sequence based studies of microbial diversity

• Broad view of what we know about microbial diversity

• How to read and analyze a research paper

Page 7: EVE161 Lecture 1

Grading

• Attendance and class participation 20%

• Weekly assignments 20%

• Midterm 20%

• Final presentation 20%

• Final exam 20%

Page 8: EVE161 Lecture 1

Student project

• Select 1-2 papers on one of the topics of the course (approval needed)

• Review the paper and write up a summary of your assessment of the paper (more detail on this later)

• Post your assessment on the course blog

• Present a short summary of what you did to the class

• Ask and answer questions about your and other people’s reviews on the course blog

Page 9: EVE161 Lecture 1

Contact information

• Jonathan Eisen, Professor
– [email protected]
– Phone 752-3498
– Office Hours: TBD

• Holly Ganz
– [email protected]
– Office Hours: TBD

Page 10: EVE161 Lecture 1

Course Information

• SmartSite

• Will also be posting for the broader community at http://microbe.net/eve161

Page 11: EVE161 Lecture 1

Introduction to EVE161

Page 12: EVE161 Lecture 1

Microbial Diversity

Page 13: EVE161 Lecture 1

Microbial Diversity

• Microbes are small

• But diversity and numbers are very high

• Appearance not a good indicator of type or function

• Field observations of limited value

Page 14: EVE161 Lecture 1

Diversity of Form

Page 15: EVE161 Lecture 1

Diversity of Function

The Bad The Good The Unusual

The Consumable The Burnable The Planet

Page 16: EVE161 Lecture 1

Phylogenetic Diversity

Page 17: EVE161 Lecture 1

Phylogeny was central to Darwin’s Work on Natural Selection

Page 18: EVE161 Lecture 1

Phylogeny

• Phylogeny is a description of the evolutionary history of relationships among organisms (or their parts).

• This is portrayed in a diagram called a phylogenetic tree.

• Phylogenetic trees are used to depict the evolutionary history of populations, species and genes.

• The Tree of Life refers to the concept that all living organisms are related to one another through shared ancestry.

Ch. 25.1
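
A phylogenetic tree can also be written down as text. The sketch below is an illustration only and is not taken from the slides: it assumes a toy three-tip tree (the grouping of Archaea with Eukaryotes is used purely as an example topology) and two hypothetical helper names, `to_newick` and `tips`. It serializes the tree in the widely used Newick parenthesis notation and lists the tips.

```python
# Toy tree (assumed for illustration): nested tuples are internal nodes,
# strings are tips. The topology is an example, not a claim from the slides.
toy_tree = ("Bacteria", ("Archaea", "Eukaryotes"))


def to_newick(node):
    """Serialize a nested-tuple tree to Newick text (without the final ';')."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def tips(node):
    """Return the tip names of the tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(tips(child))
    return names


newick = to_newick(toy_tree) + ";"
tip_names = tips(toy_tree)
print(newick)     # (Bacteria,(Archaea,Eukaryotes));
print(tip_names)  # ['Bacteria', 'Archaea', 'Eukaryotes']
```

Each pair of parentheses stands for one common ancestor, so the nesting carries the same information as the branching diagram.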

Page 19: EVE161 Lecture 1

Four Eras of Sequence & Microbial Diversity

Page 20: EVE161 Lecture 1

Relevant Reading

• Eisen JA. Environmental shotgun sequencing: its potential and challenges for studying the hidden world of microbes. PLOS Biology 5(3): e82.

Page 21: EVE161 Lecture 1

Moore’s Law

Page 22: EVE161 Lecture 1

Era I: rRNA Tree of Life

Page 23: EVE161 Lecture 1

Era I: rRNA Tree of Life

Lectures 3-4

Page 24: EVE161 Lecture 1

Plantae Protista Animalia

Ernst Haeckel 1866

Page 25: EVE161 Lecture 1

Monera Protista Plantae Fungi Animalia

Whittaker – Five Kingdoms 1969

Page 26: EVE161 Lecture 1

Carl Woese

http://mcb.illinois.edu/faculty/profile/1204

Page 27: EVE161 Lecture 1


Page 28: EVE161 Lecture 1


Page 29: EVE161 Lecture 1


Page 30: EVE161 Lecture 1

• Abstract: A phylogenetic analysis based upon ribosomal RNA sequence characterization reveals that living systems represent one of three aboriginal lines of descent: (i) the eubacteria, comprising all typical bacteria; (ii) the archaebacteria, containing methanogenic bacteria; and (iii) the urkaryotes, now represented in the cytoplasmic component of eukaryotic cells.

Woese and Fox

Page 31: EVE161 Lecture 1

Propose “three aboriginal lines of descent”

– Eubacteria
– Archaebacteria
– Urkaryotes

Page 32: EVE161 Lecture 1

Woese 1987

Page 33: EVE161 Lecture 1

• Appearance of microbes not informative (enough)

• rRNA Tree of Life identified two major groups of organisms w/o nuclei

• rRNA powerful for many reasons, though not perfect

Page 34: EVE161 Lecture 1

Tree of Life

• Three main kinds of organisms
– Bacteria
– Archaea
– Eukaryotes

• Viruses not alive, but some call them microbes

• Many misclassifications occurred before the use of molecular methods

Page 35: EVE161 Lecture 1

Tree of Life

adapted from Baldauf, et al., in Assembling the Tree of Life, 2004

Page 36: EVE161 Lecture 1

Most of the phylogenetic diversity of life is microbial

adapted from Baldauf, et al., in Assembling the Tree of Life, 2004

Page 37: EVE161 Lecture 1

Simplified, Rooted Tree of Life

Page 38: EVE161 Lecture 1

Alternative rooted tree of life

Archaea

Archaea

Page 39: EVE161 Lecture 1

Era II: rRNA in the Environment

Page 40: EVE161 Lecture 1

Era II: rRNA in the Environment

Lectures 5-9

Page 41: EVE161 Lecture 1

Plant/Animal Field Studies

Page 42: EVE161 Lecture 1

Microbial Field Studies

Page 43: EVE161 Lecture 1

Culturing Microbes

Page 44: EVE161 Lecture 1

Great Plate Count Anomaly

Page 45: EVE161 Lecture 1

Great Plate Count Anomaly

Culturing Microscopy

Page 46: EVE161 Lecture 1

Great Plate Count Anomaly

Culturing: Count

Microscopy: Count

Page 47: EVE161 Lecture 1

Great Plate Count Anomaly

Culturing count <<<< Microscopy count

Page 48: EVE161 Lecture 1

Great Plate Count Anomaly

Culturing count <<<< Microscopy count

Problem because appearance not effective for “who is out there?” or “what are they doing?”

Page 49: EVE161 Lecture 1

Great Plate Count Anomaly

Culturing count <<<< Microscopy count

Problem because appearance not effective for “who is out there?” or “what are they doing?”

Solution?

Page 50: EVE161 Lecture 1

Great Plate Count Anomaly

Culturing count <<<< Microscopy count

Problem because appearance not effective for “who is out there?” or “what are they doing?”

Solution?

DNA

Page 51: EVE161 Lecture 1

Analysis of uncultured microbes

Collect from environment

Page 52: EVE161 Lecture 1

PCR and phylogenetic analysis of rRNA genes

DNA extraction

PCR: Makes lots of copies of the rRNA genes in sample

Sequence rRNA genes

rRNA1 5’ ...TACAGTATAGGTGGAGCTAGCGATCGATCGA... 3’

Sequence alignment = Data matrix

[Data matrix figure: aligned nucleotide columns for rRNA1, Yeast, E. coli and Humans]

Phylogenetic tree

[Tree figure with tips rRNA1, Yeast, E. coli and Humans]

Page 53: EVE161 Lecture 1

PCR and phylogenetic analysis of rRNA genes

DNA extraction

PCR: Makes lots of copies of the rRNA genes in sample

Sequence rRNA genes

rRNA1 5’ ...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’

rRNA2 5’ ...TACAGTATAGGTGGAGCTAGCGATCGATCGA... 3’

Sequence alignment = Data matrix

[Data matrix figure: aligned nucleotide columns for rRNA1, rRNA2, E. coli and Humans]

Yeast T A C A G T

Phylogenetic tree

[Tree figure with tips rRNA1, rRNA2, E. coli, Humans and Yeast]

Page 54: EVE161 Lecture 1

PCR and phylogenetic analysis of rRNA genes

DNA extraction

PCR: Makes lots of copies of the rRNA genes in sample

Sequence rRNA genes

rRNA1 5’ ...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’

rRNA2 5’ ..TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’

rRNA3 5’ ...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’

rRNA4 5’ ...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’

Sequence alignment = Data matrix

[Data matrix figure: aligned nucleotide columns for rRNA1, rRNA2, E. coli and Humans]

rRNA3 C A C T G T

rRNA4 C A C A G T

Yeast T A C A G T

Phylogenetic tree

[Tree figure with tips rRNA1, rRNA2, rRNA3, rRNA4, E. coli, Humans and Yeast]
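
The step from data matrix to tree can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used in the lecture: the rows for Yeast, rRNA3 and rRNA4 are the ones spelled out on the slide, while the `Ecoli_toy` row is made up for the example. The distance used (fraction of differing columns) and the average-linkage (UPGMA) clustering are one simple choice among many tree-building methods, and the function names are hypothetical.

```python
from itertools import combinations

# Aligned data matrix: one row per sequence, one column per aligned position.
# Yeast, rRNA3 and rRNA4 follow the slide; Ecoli_toy is an invented row.
matrix = {
    "Yeast": "TACAGT",
    "rRNA3": "CACTGT",
    "rRNA4": "CACAGT",
    "Ecoli_toy": "AGACAG",  # assumed values, for illustration only
}


def p_distance(a, b):
    """Fraction of alignment columns at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Average-linkage clustering; returns the topology as Newick text."""
    # Each cluster is keyed by its Newick label and stores its member tips.
    clusters = {name: [name] for name in seqs}

    def avg_dist(members_a, members_b):
        total = sum(p_distance(seqs[a], seqs[b])
                    for a in members_a for b in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        # Join the two clusters with the smallest average distance.
        left, right = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters.pop(left) + clusters.pop(right)
        clusters[f"({left},{right})"] = merged
    return next(iter(clusters)) + ";"


tree = upgma(matrix)
print(p_distance(matrix["Yeast"], matrix["rRNA4"]))
print(tree)
```

With these toy rows, the environmental sequences rRNA3 and rRNA4 cluster with Yeast, and the invented `Ecoli_toy` row joins last because it differs from the others at every column.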

Page 55: EVE161 Lecture 1

PCR and phylogenetic analysis of rRNA genes

PCR

Page 56: EVE161 Lecture 1

Major phyla of bacteria & archaea (as of 2002)

No cultures

Some cultures

Page 57: EVE161 Lecture 1

The Hidden Majority

Richness estimates

Bohannan and Hughes 2003; Hugenholtz 2002

Page 58: EVE161 Lecture 1

Human microbiome case study

Censored

Censored

Page 59: EVE161 Lecture 1

Built Environment Case Study

ORIGINAL ARTICLE

Architectural design influences the diversity and structure of the built environment microbiome

Steven W Kembel1, Evan Jones1, Jeff Kline1,2, Dale Northcutt1,2, Jason Stenson1,2, Ann M Womack1, Brendan JM Bohannan1, G Z Brown1,2 and Jessica L Green1,3

1Biology and the Built Environment Center, Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, OR, USA; 2Energy Studies in Buildings Laboratory, Department of Architecture, University of Oregon, Eugene, OR, USA and 3Santa Fe Institute, Santa Fe, NM, USA

Buildings are complex ecosystems that house trillions of microorganisms interacting with each other, with humans and with their environment. Understanding the ecological and evolutionary processes that determine the diversity and composition of the built environment microbiome (the community of microorganisms that live indoors) is important for understanding the relationship between building design, biodiversity and human health. In this study, we used high-throughput sequencing of the bacterial 16S rRNA gene to quantify relationships between building attributes and airborne bacterial communities at a health-care facility. We quantified airborne bacterial community structure and environmental conditions in patient rooms exposed to mechanical or window ventilation and in outdoor air. The phylogenetic diversity of airborne bacterial communities was lower indoors than outdoors, and mechanically ventilated rooms contained less diverse microbial communities than did window-ventilated rooms. Bacterial communities in indoor environments contained many taxa that are absent or rare outdoors, including taxa closely related to potential human pathogens. Building attributes, specifically the source of ventilation air, airflow rates, relative humidity and temperature, were correlated with the diversity and composition of indoor bacterial communities. The relative abundance of bacteria closely related to human pathogens was higher indoors than outdoors, and higher in rooms with lower airflow rates and lower relative humidity. The observed relationship between building design and airborne bacterial diversity suggests that we can manage indoor environments, altering through building design and operation the community of microbial species that potentially colonize the human microbiome during our time indoors.

The ISME Journal advance online publication, 26 January 2012; doi:10.1038/ismej.2011.211

Subject Category: microbial population and community ecology

Keywords: aeromicrobiology; bacteria; built environment microbiome; community ecology; dispersal; environmental filtering

Introduction

Humans spend up to 90% of their lives indoors (Klepeis et al., 2001). Consequently, the way we design and operate the indoor environment has a profound impact on our health (Guenther and Vittori, 2008). One step toward better understanding of how building design impacts human health is to study buildings as ecosystems. Built environments are complex ecosystems that contain numerous organisms including trillions of microorganisms (Rintala et al., 2008; Tringe et al., 2008; Amend et al., 2010). The collection of microbial life that exists indoors (the built environment microbiome) includes human pathogens and commensals interacting with each other and with their environment (Eames et al., 2009). There have been few attempts to comprehensively survey the built environment microbiome (Rintala et al., 2008; Tringe et al., 2008; Amend et al., 2010), with most studies focused on measures of total bioaerosol concentrations or the abundance of culturable or pathogenic strains (Berglund et al., 1992; Toivola et al., 2002; Mentese et al., 2009), rather than a more comprehensive measure of microbial diversity in indoor spaces. For this reason, the factors that determine the diversity and composition of the built environment microbiome are poorly understood. However, the situation is changing. The development of culture-independent, high-throughput molecular sequencing approaches has transformed the study of microbial diversity in a variety of environments, as demonstrated by the recent explosion of research on the microbial ecology of aquatic and terrestrial ecosystems (Nemergut et al., 2011)

Received 23 October 2011; revised 13 December 2011; accepted 13 December 2011

Correspondence: SW Kembel, Biology and the Built Environment Center, Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, OR 97405, USA. E-mail: [email protected]

The ISME Journal (2012), 1–11 © 2012 International Society for Microbial Ecology All rights reserved 1751-7362/12

www.nature.com/ismej

Microbial Biogeography of Public Restroom Surfaces

Gilberto E. Flores1, Scott T. Bates1, Dan Knights2, Christian L. Lauber1, Jesse Stombaugh3, Rob Knight3,4, Noah Fierer1,5*

1 Cooperative Institute for Research in Environmental Science, University of Colorado, Boulder, Colorado, United States of America, 2 Department of Computer Science, University of Colorado, Boulder, Colorado, United States of America, 3 Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado, United States of America, 4 Howard Hughes Medical Institute, University of Colorado, Boulder, Colorado, United States of America, 5 Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, United States of America

Abstract

We spend the majority of our lives indoors where we are constantly exposed to bacteria residing on surfaces. However, the diversity of these surface-associated communities is largely unknown. We explored the biogeographical patterns exhibited by bacteria across ten surfaces within each of twelve public restrooms. Using high-throughput barcoded pyrosequencing of the 16S rRNA gene, we identified 19 bacterial phyla across all surfaces. Most sequences belonged to four phyla: Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria. The communities clustered into three general categories: those found on surfaces associated with toilets, those on the restroom floor, and those found on surfaces routinely touched with hands. On toilet surfaces, gut-associated taxa were more prevalent, suggesting fecal contamination of these surfaces. Floor surfaces were the most diverse of all communities and contained several taxa commonly found in soils. Skin-associated bacteria, especially the Propionibacteriaceae, dominated surfaces routinely touched with our hands. Certain taxa were more common in female than in male restrooms as vagina-associated Lactobacillaceae were widely distributed in female restrooms, likely from urine contamination. Use of the SourceTracker algorithm confirmed many of our taxonomic observations as human skin was the primary source of bacteria on restroom surfaces. Overall, these results demonstrate that restroom surfaces host relatively diverse microbial communities dominated by human-associated bacteria with clear linkages between communities on or in different body sites and those communities found on restroom surfaces. More generally, this work is relevant to the public health field as we show that human-associated microbes are commonly found on restroom surfaces suggesting that bacterial pathogens could readily be transmitted between individuals by the touching of surfaces. Furthermore, we demonstrate that we can use high-throughput analyses of bacterial communities to determine sources of bacteria on indoor surfaces, an approach which could be used to track pathogen transmission and test the efficacy of hygiene practices.

Citation: Flores GE, Bates ST, Knights D, Lauber CL, Stombaugh J, et al. (2011) Microbial Biogeography of Public Restroom Surfaces. PLoS ONE 6(11): e28132. doi:10.1371/journal.pone.0028132

Editor: Mark R. Liles, Auburn University, United States of America

Received September 12, 2011; Accepted November 1, 2011; Published November 23, 2011

Copyright: © 2011 Flores et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported with funding from the Alfred P. Sloan Foundation and their Indoor Environment program, and in part by the National Institutes of Health and the Howard Hughes Medical Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

Introduction

More than ever, individuals across the globe spend a large portion of their lives indoors, yet relatively little is known about the microbial diversity of indoor environments. Of the studies that have examined microorganisms associated with indoor environments, most have relied upon cultivation-based techniques to detect organisms residing on a variety of household surfaces [1–5]. Not surprisingly, these studies have identified surfaces in kitchens and restrooms as being hot spots of bacterial contamination. Because several pathogenic bacteria are known to survive on surfaces for extended periods of time [6–8], these studies are of obvious importance in preventing the spread of human disease. However, it is now widely recognized that the majority of microorganisms cannot be readily cultivated [9] and thus, the overall diversity of microorganisms associated with indoor environments remains largely unknown. Recent use of cultivation-independent techniques based on cloning and sequencing of the 16S rRNA gene have helped to better describe these communities and revealed a greater diversity of bacteria on indoor surfaces than captured using cultivation-based techniques [10–13]. Most of the organisms identified in these studies are related to human commensals suggesting that the organisms are not actively growing on the surfaces but rather were deposited directly (i.e. touching) or indirectly (e.g. shedding of skin cells) by humans. Despite these efforts, we still have an incomplete understanding of bacterial communities associated with indoor environments because limitations of traditional 16S rRNA gene cloning and sequencing techniques have made replicate sampling and in-depth characterizations of the communities prohibitive. With the advent of high-throughput sequencing techniques, we can now investigate indoor microbial communities at an unprecedented depth and begin to understand the relationship between humans, microbes and the built environment.

In order to begin to comprehensively describe the microbial diversity of indoor environments, we characterized the bacterial communities found on ten surfaces in twelve public restrooms (six male and six female) in Colorado, USA using barcoded

PLoS ONE | www.plosone.org 1 November 2011 | Volume 6 | Issue 11 | e28132

the stall in), they were likely dispersed manually after women used the toilet. Coupling these observations with those of the distribution of gut-associated bacteria indicate that routine use of toilets results in the dispersal of urine- and fecal-associated bacteria throughout the restroom. While these results are not unexpected, they do highlight the importance of hand-hygiene when using public restrooms since these surfaces could also be potential vehicles for the transmission of human pathogens. Unfortunately, previous studies have documented that college students (who are likely the most frequent users of the studied restrooms) are not always the most diligent of hand-washers [42,43].

Results of SourceTracker analysis support the taxonomic patterns highlighted above, indicating that human skin was the primary source of bacteria on all public restroom surfaces examined, while the human gut was an important source on or around the toilet, and urine was an important source in women’s restrooms (Figure 4, Table S4). Contrary to expectations (see above), soil was not identified by the SourceTracker algorithm as being a major source of bacteria on any of the surfaces, including floors (Figure 4). Although the floor samples contained family-level taxa that are common in soil, the SourceTracker algorithm probably underestimates the relative importance of sources, like

Figure 3. Cartoon illustrations of the relative abundance of discriminating taxa on public restroom surfaces. Light blue indicates low abundance while dark blue indicates high abundance of taxa. (A) Although skin-associated taxa (Propionibacteriaceae, Corynebacteriaceae, Staphylococcaceae and Streptococcaceae) were abundant on all surfaces, they were relatively more abundant on surfaces routinely touched with hands. (B) Gut-associated taxa (Clostridiales, Clostridiales group XI, Ruminococcaceae, Lachnospiraceae, Prevotellaceae and Bacteroidaceae) were most abundant on toilet surfaces. (C) Although soil-associated taxa (Rhodobacteraceae, Rhizobiales, Microbacteriaceae and Nocardioidaceae) were in low abundance on all restroom surfaces, they were relatively more abundant on the floor of the restrooms we surveyed. Figure not drawn to scale. doi:10.1371/journal.pone.0028132.g003

Figure 4. Results of SourceTracker analysis showing the average contributions of different sources to the surface-associated bacterial communities in twelve public restrooms. The “unknown” source is not shown but would bring the total of each sample up to 100%. doi:10.1371/journal.pone.0028132.g004

Bacteria of Public Restrooms


high diversity of floor communities is likely due to the frequency of contact with the bottom of shoes, which would track in a diversity of microorganisms from a variety of sources including soil, which is known to be a highly-diverse microbial habitat [27,39]. Indeed, bacteria commonly associated with soil (e.g. Rhodobacteraceae, Rhizobiales, Microbacteriaceae and Nocardioidaceae) were, on average, more abundant on floor surfaces (Figure 3C, Table S2). Interestingly, some of the toilet flush handles harbored bacterial communities similar to those found on the floor (Figure 2, Figure 3C), suggesting that some users of these toilets may operate the handle with a foot (a practice well known to germaphobes and those who have had the misfortune of using restrooms that are less than sanitary).

While the overall community level comparisons between the communities found on the surfaces in male and female restrooms were not statistically significant (Table S3), there were gender-related differences in the relative abundances of specific taxa on some surfaces (Figure 1B, Table S2). Most notably, Lactobacillaceae were clearly more abundant on certain surfaces within female restrooms than male restrooms (Figure 1B). Some species of this family are the most common, and often most abundant, bacteria found in the vagina of healthy reproductive age women [40,41] and are relatively less abundant in male urine [28,29]. Our analysis of female urine samples collected as part of a previous study [26] (Figure 1A), found that Lactobacillaceae were dominant in urine, therefore implying that surfaces in the restrooms where Lactobacillaceae were observed were contaminated with urine. Other studies have demonstrated a similar phenomenon, with vagina-associated bacteria having also been observed in airplane restrooms [11] and a child day care facility [10]. As we found that Lactobacillaceae were most abundant on toilet surfaces and those touched by hands after using the toilet (with the exception of

Figure 2. Relationship between bacterial communities associated with ten public restroom surfaces. Communities were clustered using PCoA of the unweighted UniFrac distance matrix. Each point represents a single sample. Note that the floor (triangles) and toilet (asterisks) surfaces form clusters distinct from surfaces touched with hands. doi:10.1371/journal.pone.0028132.g002

Table 1. Results of pairwise comparisons for unweighted UniFrac distances of bacterial communities associated with various surfaces of public restrooms on the University of Colorado campus using the ANOSIM test in Primer v6.

                    Door in  Door out  Stall in  Stall out  Faucet handle  Soap dispenser  Toilet flush handle  Toilet seat  Toilet floor
Door in
Door out            -0.139
Stall in            0.149    -0.053
Stall out           -0.074   -0.083    -0.037
Faucet handle       -0.062   -0.011    -0.092    -0.040
Soap dispenser      -0.020   0.014     -0.060    -0.001     0.070
Toilet flush handle 0.376*   0.405*    0.221     0.350*     0.172*         0.470*
Toilet seat         0.742*   0.672*    0.457*    0.586*     0.401*         0.653*          0.187*
Toilet floor        0.995*   0.988*    0.993*    0.961*     0.758*         0.998*          0.577*               0.950*
Sink floor          1.000*   0.995*    1.000*    0.974*     0.770*         1.000*          0.655*               0.982*       -0.033

The R-statistic is shown for each comparison with asterisks denoting comparisons that were statistically significant at P≤0.01. doi:10.1371/journal.pone.0028132.t001
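
For readers unfamiliar with the R-statistic in Table 1, the sketch below shows how an ANOSIM R value is commonly defined: all pairwise distances are ranked, and R is the mean between-group rank minus the mean within-group rank, divided by n(n-1)/4 for n samples. It is not the Primer v6 implementation and omits the permutation test used to judge significance. The sample names and distances are invented for illustration, and `average_ranks` and `anosim_r` are hypothetical helper names.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(dist, groups):
    """dist maps frozenset({a, b}) to a distance; groups maps sample to label."""
    samples = sorted(groups)
    pairs = list(combinations(samples, 2))
    ranks = average_ranks([dist[frozenset(pair)] for pair in pairs])
    within = [r for r, (a, b) in zip(ranks, pairs) if groups[a] == groups[b]]
    between = [r for r, (a, b) in zip(ranks, pairs) if groups[a] != groups[b]]
    n = len(samples)
    mean_between = sum(between) / len(between)
    mean_within = sum(within) / len(within)
    return (mean_between - mean_within) / (n * (n - 1) / 4)


# Invented toy data: two floor samples and two door samples.
groups = {"floor1": "floor", "floor2": "floor", "door1": "door", "door2": "door"}
dist = {
    frozenset({"floor1", "floor2"}): 0.1,
    frozenset({"door1", "door2"}): 0.2,
    frozenset({"floor1", "door1"}): 0.6,
    frozenset({"floor1", "door2"}): 0.7,
    frozenset({"floor2", "door1"}): 0.8,
    frozenset({"floor2", "door2"}): 0.9,
}
r_value = anosim_r(dist, groups)
print(r_value)  # 1.0
```

In this toy case every between-group distance exceeds every within-group distance, so R reaches its maximum of 1, the same kind of value reported for the floor-versus-door comparisons in the table. Values near 0 indicate no separation between groups.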


10 FEBRUARY 2012 VOL 335 SCIENCE www.sciencemag.org 650

NEWSFOCUS

CREDITS (TOP TO BOTTOM): (PHOTO) COURTESY GILBERTO FLORES; (CHART) G. E. FLORES ET AL., PLOS ONE 6, 11 (2011); PHOTO BY SISIRA GORTHALA

In just that short time, the microbes had begun to take on a “signature” of outside air (more types from plants and soil), and 2 hours after the windows were shut again, the proportion of microbes from the human body increased back to previous levels.

The study, which appeared online 26 January in The ISME Journal, found that mechanically ventilated rooms had lower microbial diversity than ones with open windows. The availability of fresh air translated into lower proportions of microbes associated with the human body, and consequently, fewer potential pathogens. Although this result suggests that having natural airflow may be healthier, Green says answering that question requires clinical data; she’s hoping to convince a hospital to participate in a study to see if the incidence of hospital-acquired infections is associated with a room’s microbial community.

For his part, Peccia, who is also a Sloan grantee, is merging microbiology and the physics of aerosols to look more closely at how the movement of air affects microbes. Peccia says his group is building on work by air-quality engineers and scientists, but “we want to add biology to the equation.”

Bacteria in air behave like other particles; their size dictates how they disperse or settle. Humans in a room not only shed microbes from their skin and mouths, but they also drum up microbial material from the floor as they move around. But to quantify those contributions, Peccia’s team has had to develop new methods to collect airborne bacteria and extract their DNA, as the microbes are much less abundant in air than on surfaces.

In one recent study, they used air filters to sample airborne particles and microbes in a classroom during 4 days during which students were present and 4 days during which the room was vacant. They measured the abundance and type of fungal and bacterial genomes present and estimated the microbes’ concentrations in the entire room. By accounting for bacteria entering and leaving the room through ventilation, they calculated that people shed or resuspended about 35 million bacterial cells per person per hour. That number is much higher than the several-hundred-thousand maximum previously estimated to be present in indoor air, Peccia reported last fall at the American Association for Aerosol Research Conference in Orlando, Florida.

His group’s data also suggest that rooms have “memories” of past human inhabitants. By kicking into the air settled microbes from the floor, occupants expose themselves not just to the microbes of a person coughing next to them, but also possibly to those from a person who coughed in the room a few hours or even days ago.

Peccia hopes to come up with ways to describe the distribution of bacteria indoors that can be used in conjunction with existing knowledge about particulate matter and chemicals in designing healthier buildings. “My hope is that we can bring this enough to the forefront that people who do aerosol science will find it as important to know biology as to know physics and chemistry,” he says.

Still, even though he’s a willing participant in indoor microbial ecology research, Peccia thinks that the field has yet to gel. And the Sloan Foundation’s Olsiewski shares some of his concern. “Everybody’s generating vast amounts of data,” she says, but looking across data sets can be difficult because groups choose different analytical tools. With Sloan support, though, a data archive and integrated analytical tools are in the works.

To foster collaborations between microbiologists, architects, and building scientists, the foundation also sponsored a symposium on the microbiome of the built environment at the 2011 Indoor Air conference in Austin, Texas, and launched a Web site, MicroBE.net, that’s a clearinghouse of information on the field. Although Olsiewski won’t say how long the foundation will fund its indoor microbial ecology program, she says Sloan is committed to supporting all of the current projects for the next few years. The program’s ultimate goal, she says, is to create a new field of scientific inquiry that eventually will be funded by traditional government funding agencies focused on basic biology and environmental policy.

Matthew Kane, a microbial ecologist and program director at the U.S. National Science Foundation (NSF), says that although there was interest in these questions prior to the Sloan program, the Sloan Foundation has taken a directed approach to funding the research, and “I have no doubt that their investment is going to reap great returns.” So far, though, NSF has funded only one study on indoor microbes: a study of Pseudomonas bacteria in human households.

As studies like Green’s building ecology analysis progress, they should shed light on how indoor environments differ from those traditionally studied by microbial ecologists. “It’s important to have a quantitative understanding of how building design impacts microbial communities indoors, and how these communities impact human health,” Green says. But it remains to be seen whether we’ll someday design and maintain our buildings with microbes in mind.

–COURTNEY HUMPHRIES

Courtney Humphries is a freelance writer in Boston and author of Superdove.

[Chart. Y-axis: Average contribution (%), 0 to 100. X-axis: Door in, Door out, Stall in, Stall out, Faucet handles, Soap dispenser, Toilet seat, Toilet flush handle, Toilet floor, Sink floor. Legend (SOURCES): Soil, Water, Mouth, Urine, Gut, Skin.]

Outside influence. Students prepare to sample air outside a classroom in China as part of an indoor ecology study.

Bathroom biogeography. By swabbing different surfaces in public restrooms, researchers determined that microbes vary in where they come from depending on the surface (chart).

Published by AAAS


Page 60: EVE161 Lecture 1

Era III: Genome Sequencing

Page 61: EVE161 Lecture 1

Era III: Genome Sequencing

Lectures 10-14

Page 62: EVE161 Lecture 1

1st Genome Sequence

Fleischmann et al. 1995

Page 63: EVE161 Lecture 1

Genomes Revolutionized Microbiology

• Predictions of metabolic processes

• Better vaccine and drug design

• New insights into mechanisms of evolution

• Genomes serve as template for functional studies

• New enzymes and materials for engineering and synthetic biology

Page 64: EVE161 Lecture 1

Page 65: EVE161 Lecture 1

Metabolic Predictions

Page 66: EVE161 Lecture 1

Lateral Gene Transfer

Perna et al. 2003

Page 67: EVE161 Lecture 1

Network of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Archaea

Eukaryotes

Bacteria

Page 68: EVE161 Lecture 1

GEBA Case Study

• Phylogenetic diversity poorly sampled

• GEBA project at DOE-JGI correcting this

Page 69: EVE161 Lecture 1

Era IV: Genomes in the environment

Page 70: EVE161 Lecture 1

Era IV: Genomes in the environment

Lectures 15-19

Page 71: EVE161 Lecture 1

Era IV: Metagenomics

DNA extraction

PCR

Sequence all genes

5’ ...TACAGTATAGGTGGAGCTAGCGATCGATCGA... 3’

Page 72: EVE161 Lecture 1

Delong Lab

Restriction mapping. Large genomic fragments isolated from fosmid clones were mapped by partial and double digestion with various restriction endonucleases. When the subclone sizes exceeded 10 kb, the F-factor-based vector pBAC108L (30) was used to accommodate the fosmid subfragments. Partial digestions were performed by adding 2.5 U of restriction enzyme to 1 µg of NotI-digested clone DNA in a 30-µl reaction mixture. The reaction mixture was incubated at 37°C, and 10-µl aliquots were removed at 10, 40, and 60 min. Restriction digestions were terminated by adding 1 µl of 0.5 M EDTA to the reaction mixtures and placing the tubes on ice. The partially digested DNA was separated by pulsed-field gel electrophoresis as described above except using a 1- to 3-s ramped switch time at 100 V for 16 h. The sizes of the separated fragments were determined relative to those of known standards. The distances of the restriction sites relative to the terminal T7 and SP6 promoter sites on the excised cassette were determined by end labeling 10 pmol of T7- or SP6-specific oligonucleotides with [γ-32P]ATP (7,000 Ci/mmol) and hybridizing with Southern blots of the gels.

Southern blots of agarose gels containing fosmid and pBAC clones digested with two or more restriction enzymes were probed with labeled T7 and SP6 oligonucleotides as well as random-prime-labeled subclones and PCR fragments carrying gene sequences identified from the shotgun sequencing described above. This information was correlated with the size estimates from the partial digestions to generate physical and genetic maps of the fosmids and their subclones.

Phylogenetic analysis. Sequence alignment and DeSoete distance (9) analyses were performed on a Sun Sparc 10 workstation using GDE 2.2 and Treetool 1.0, obtained from the ribosomal database project (RDP) (23). DeSoete least squares distance analyses (9) were performed by using pairwise evolutionary distances, calculated by using the correction of Olsen to account for empirical base frequencies (34). Reference sequences were obtained from the RDP, version 4.0 (23). Maximum likelihood analyses (10) of ssu rRNA sequences were performed by using fastDNAml 1.0 (25), obtained from the RDP. For distance analyses of the inferred amino acid sequence of EF2, evolutionary distances were estimated by using the Phylip program (12) Protdist, and tree topology was inferred by the Fitch-Margoliash method, using random taxon addition and global branch swapping. For maximum parsimony analyses of protein sequences, the Phylip program Protpars was used with random taxon addition and ordinary parsimony options.

Nucleotide sequence accession numbers. Partial sequences reported in Table 1 have been submitted to GenBank under the following accession numbers: U40238, U40239, U40240, U40241, U40242, U40243, U40244, and U40245. The nucleotide sequences encoding ssu rRNA and EF2 have been submitted to GenBank under accession numbers U39635 and U41261.

RESULTS

Figure 1 shows an overview of the procedures used to construct an environmental library from the mixed picoplankton sample. Our goal was to construct a stable, large insert DNA library representing picoplankton genomic DNA, in order to gain information about the genetic and physiological potential of one constituent group in this community, the planktonic marine Crenarchaeota. Agarose plugs containing high-molecular-weight picoplankton DNA were prepared by concentrating cells from 30 liters of seawater, using hollow fiber filtration. These agarose plugs, representing picoplankton collected from a variety of sites and depths in the eastern North Pacific, were screened for the presence of archaebacteria by using both eubacterium-biased (to test for positive amplification) and archaeon-biased rDNA primers. PCR amplification results from several of the agarose plugs (data not shown) indicated the presence of significant amounts of archaeal DNA. Quantitative hybridization experiments using rRNA extracted from one sample, collected at a depth of 200 m off the Oregon coast, indicated that planktonic archaea in this assemblage comprised approximately 4.7% of the total picoplankton biomass (this sample corresponds to “PAC1”-200 m in Table 1 of reference 8). Results from archaeon-biased rDNA PCR amplification performed on agarose plug lysates confirmed the presence of relatively large amounts of archaeal DNA in this sample. Agarose plugs prepared from this picoplankton sample were chosen for subsequent fosmid library preparation. Each 1-ml agarose plug from this site contained approximately 7.5 × 10^9 cells; therefore, approximately 5.4 × 10^8 cells were present in the 72-µl slice used in the preparation of the partially digested DNA.

Recombinant fosmids, each containing ca. 40 kb of picoplankton DNA insert, yielded a library of 3,552 fosmid clones, containing approximately 1.4 × 10^8 bp of cloned DNA. All of the clones examined contained inserts ranging in size from 38 to 42 kbp (Fig. 2). Both the multiplex PCR (Fig. 3) and the hybridization experiments suggested that well B7 on microtiter

FIG. 1. Flowchart depicting the construction and screening of an environmental library from a mixed picoplankton sample. MW, molecular weight; PFGE, pulsed-field gel electrophoresis.

FIG. 2. Pulsed-field gel showing the separation of selected fosmid clones digested with NotI and BamHI. The pFOS1 vector band is at 7.2 kbp. The top two bands of clone 4B7 are doublets.
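The cell-count and library-size figures quoted in the RESULTS excerpt above can be checked with a few lines of arithmetic. This is a minimal sketch in Python. The variable names are illustrative only, and the 40 kb insert size is the approximate ("ca.") value given in the text rather than a measured mean.

```python
# Figures taken from the excerpt above.
CELLS_PER_1ML_PLUG = 7.5e9   # cells in one 1-ml agarose plug
SLICE_VOLUME_UL = 72         # volume of the slice used for partial digestion, in µl
UL_PER_ML = 1000

# Cells in the slice scale with its share of the plug volume.
cells_in_slice = CELLS_PER_1ML_PLUG * SLICE_VOLUME_UL / UL_PER_ML

N_CLONES = 3552              # fosmid clones in the library
APPROX_INSERT_BP = 40_000    # "ca. 40 kb" per clone (approximate)

# Total cloned DNA is simply clones times insert size.
library_bp = N_CLONES * APPROX_INSERT_BP

print(f"cells in 72-µl slice: {cells_in_slice:.2e}")
print(f"cloned DNA in library: {library_bp:.2e} bp")
```

Both results agree with the excerpt: about 5.4 × 10^8 cells in the slice and about 1.4 × 10^8 bp of cloned DNA.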

VOL. 178, 1996 GENOMIC FRAGMENTS FROM PLANKTONIC MARINE ARCHAEA 593

Page 73: EVE161 Lecture 1

Delong Lab

tion with multiple sequence alignments, indicates that the majority of active site residues are well conserved between proteorhodopsin and archaeal bacteriorhodopsins (15).

A phylogenetic comparison with archaeal rhodopsins placed proteorhodopsin on an independent long branch, with moderate statistical support for an affiliation with sensory rhodopsins (16) (Fig. 1B). The finding of archaeal-like rhodopsins in organisms as diverse as marine proteobacteria and eukarya (6) suggests a potential role for lateral gene transfer in their dissemination. Available genome sequence data are insufficient to identify the evolutionary origins of the proteorhodopsin genes. The environments from which the archaeal and bacterial rhodopsins originate are, however, strikingly different. Proteorhodopsin is of marine origin, whereas the archaeal rhodopsins of extreme halophiles experience salinity 4 to 10 times greater than that in the sea (14).

Functional analysis. To determine whether proteorhodopsin binds retinal, we expressed the protein in Escherichia coli (17). After 3 hours of induction in the presence of retinal, cells expressing the protein acquired a reddish pigmentation (Fig. 3A). When retinal was added to the membranes of cells expressing the proteorhodopsin apoprotein, an absorbance peak at 520 nm was observed after 10 min of incubation (Fig. 3B). On further incubation, the peak at 520 nm increased and had a ~100-nm half-bandwidth. The 520-nm pigment was generated only in membranes containing proteorhodopsin apoprotein, and only in the presence of retinal, and its ~100-nm half-bandwidth is typical of retinylidene protein absorption spectra found in other rhodopsins. The red-shifted λmax of retinal (λmax = 370 nm in the free state) is indicative of a protonated Schiff base linkage of the retinal, presumably to the lysine residue in helix G (18).

Light-mediated proton translocation was determined by measuring pH changes in a cell suspension exposed to light. Net outward transport of protons was observed solely in proteorhodopsin-containing E. coli cells and only in the presence of retinal and light (Fig. 4A). Light-induced acidification of the medium was completely abolished by the presence of a 10 µM concentration of the protonophore carbonyl cyanide m-chlorophenylhydrazone (19). Illumination generated a membrane electrical potential in proteorhodopsin-containing right-side-out membrane vesicles, in the presence of retinal, reaching –90 mV 2 min after light onset (20) (Fig. 4B). These data indicate that proteorhodopsin translocates protons and is capable of generating membrane potential in a physiologically relevant range. Because these activities were observed in E. coli membranes containing overexpressed protein, the levels of proteorhodopsin activity in its native state remain to be determined. The ability of proteorhodopsin to generate a physiologically significant membrane potential, however, even when heterologously expressed in nonnative membranes, is consistent with a postulated proton-pumping function for proteorhodopsin.

Archaeal bacteriorhodopsin, and to a lesser extent sensory rhodopsins (21), can both mediate light-driven proton-pumping activity. However, sensory rhodopsins are generally cotranscribed with genes encoding their own transducer of light stimuli [for example, Htr (22, 23)]. Although sequence analysis of proteorhodopsin shows moderate statistical support for a specific relationship with sensory rhodopsins, there is no gene for an Htr-like regulator adjacent to the proteorhodopsin gene. The absence of an Htr-like gene in close proximity to the proteorhodopsin gene suggests that proteorhodopsin may function primarily as a light-driven proton pump. It is possible, however, that such a regulator might be encoded elsewhere in the proteobacterial genome.

To further verify a proton-pumping function for proteorhodopsin, we characterized the kinetics of its photochemical reaction cycle. The transport rhodopsins (bacteriorhodopsins and halorhodopsins) are characterized by cyclic photochemical reaction se-

Fig. 1. (A) Phylogenetic tree of bacterial 16S rRNA gene sequences, including that encoded on the 130-kb bacterioplankton BAC clone (EBAC31A08) (16). (B) Phylogenetic analysis of proteorhodopsin with archaeal (BR, HR, and SR prefixes) and Neurospora crassa (NOP1 prefix) rhodopsins (16). Nomenclature: Name_Species.abbreviation_Genbank.gi (HR, halorhodopsin; SR, sensory rhodopsin; BR, bacteriorhodopsin). Halsod, Halorubrum sodomense; Halhal, Halobacterium salinarum (halobium); Halval, Haloarcula vallismortis; Natpha, Natronomonas pharaonis; Halsp, Halobacterium sp; Neucra, Neurospora crassa.

RESEARCH ARTICLES

www.sciencemag.org SCIENCE VOL 289 15 SEPTEMBER 2000 1903

Page 74: EVE161 Lecture 1

Page 75: EVE161 Lecture 1

Shotgun metagenomics

shotgun

sequence

Metagenomics

Page 76: EVE161 Lecture 1

Community structure and metabolism through reconstruction of microbial genomes from the environment

Gene W. Tyson1, Jarrod Chapman3,4, Philip Hugenholtz1, Eric E. Allen1, Rachna J. Ram1, Paul M. Richardson4, Victor V. Solovyev4, Edward M. Rubin4, Daniel S. Rokhsar3,4 & Jillian F. Banfield1,2

1Department of Environmental Science, Policy and Management, 2Department of Earth and Planetary Sciences, and 3Department of Physics, University of California, Berkeley, California 94720, USA
4Joint Genome Institute, Walnut Creek, California 94598, USA

Microbial communities are vital in the functioning of all ecosystems; however, most microorganisms are uncultivated, and their roles in natural systems are unclear. Here, using random shotgun sequencing of DNA from a natural acidophilic biofilm, we report reconstruction of near-complete genomes of Leptospirillum group II and Ferroplasma type II, and partial recovery of three other genomes. This was possible because the biofilm was dominated by a small number of species populations and the frequency of genomic rearrangements and gene insertions or deletions was relatively low. Because each sequence read came from a different individual, we could determine that single-nucleotide polymorphisms are the predominant form of heterogeneity at the strain level. The Leptospirillum group II genome had remarkably few nucleotide polymorphisms, despite the existence of low-abundance variants. The Ferroplasma type II genome seems to be a composite from three ancestral strains that have undergone homologous recombination to form a large population of mosaic genomes. Analysis of the gene complement for each organism revealed the pathways for carbon and nitrogen fixation and energy generation, and provided insights into survival strategies in an extreme environment.

The study of microbial evolution and ecology has been revolutionized by DNA sequencing and analysis1–3. However, isolates have been the main source of sequence data, and only a small fraction of microorganisms have been cultivated4–6. Consequently, focus has shifted towards the analysis of uncultivated microorganisms via cloning of conserved genes5 and genome fragments directly from the environment7–9. To date, only a small fraction of genes have been recovered from individual environments, limiting the analysis of microbial communities as networks characterized by symbioses, competition and partitioning of community-essential roles. Comprehensive genomic data would resolve organism-specific pathways and provide insights into population structure, speciation and evolution. So far, sequencing of whole communities has not been practical because most communities comprise hundreds to thousands of species10.

Acid mine drainage (AMD) is a worldwide environmental problem that arises largely from microbial activity11. Here, we focused on a low-complexity AMD microbial biofilm growing hundreds of feet underground within a pyrite (FeS2) ore body12–15. This represents a self-contained biogeochemical system characterized by tight coupling between microbial iron oxidation and acidification due to pyrite dissolution11,16,17. Random shotgun sequencing of DNA from entire microbial communities is one approach for the recovery of the gene complement of uncultivated organisms, and for determining the degree of variability within populations at the genome level. We used random shotgun sequencing of the biofilm to obtain the first reconstruction of multiple genomes directly from a natural sample. The results provide novel insights into community structure, and reveal the strategies that underpin microbial activity in this environment.

Initial characterization of the biofilm

Biofilms growing on the surface of flowing AMD in the five-way region of the Richmond mine at Iron Mountain, California12, were sampled in March 2000. Screening using group-specific18 fluorescence in situ hybridization (FISH) revealed that all biofilms contained mixtures of bacteria (Leptospirillum, Sulfobacillus and, in a few cases, Acidimicrobium) and archaea (Ferroplasma and other members of the Thermoplasmatales). The genome of one of these archaea, Ferroplasma acidarmanus fer1, isolated from the Richmond mine, has been sequenced previously (http://www.jgi.doe.gov/JGI_microbial/html/ferroplasma/ferro_homepage.html).

A pink biofilm (Fig. 1a) typical of AMD communities was selected for detailed genomic characterization (see Supplementary Information). The biofilm was dominated by Leptospirillum species and contained F. acidarmanus at a relatively low abundance (Fig. 1b, c). This biofilm was growing in pH 0.83, 42 °C, 317 mM Fe, 14 mM Zn, 4 mM Cu and 2 mM As solution, and was collected from a surface area of approximately 0.05 m².

A 16S ribosomal RNA gene clone library was constructed from DNA extracted from the pink biofilm, and 384 clones were end-sequenced (see Supplementary Information). Results indicated the presence of three bacterial and three archaeal lineages. The most abundant clones are close relatives of L. ferriphilum19 and belong to Leptospirillum group II (ref. 13). Although 94% of the Leptospirillum group II clones were identical, 17 minor variants were detected with up to 1.2% 16S rRNA gene-sequence divergence from the dominant type. Tightly defined groups (up to 1% sequence divergence) related to Leptospirillum group III (ref. 13), Sulfobacillus, Ferroplasma (some identical to fer1), ‘A-plasma’15 and ‘G-plasma’15 were also detected. Leptospirillum group III, G-plasma and A-plasma have only recently been detected in culture-independent molecular surveys. FISH-based quantification (Fig. 1c; see also Supplementary Information) confirmed the dominance of Leptospirillum group II in the biofilm.

Community genome sequencing and assembly

In conventional shotgun sequencing projects of microbial isolates, all shotgun fragments are derived from clones of the same genome. When using the shotgun sequencing approach on genomes from an

articles

NATURE | doi:10.1038/nature02340 | www.nature.com/nature 1

© 2004 Nature Publishing Group

Environmental Genome Shotgun Sequencing of the Sargasso Sea

J. Craig Venter,1* Karin Remington,1 John F. Heidelberg,3 Aaron L. Halpern,2 Doug Rusch,2 Jonathan A. Eisen,3 Dongying Wu,3 Ian Paulsen,3 Karen E. Nelson,3 William Nelson,3 Derrick E. Fouts,3 Samuel Levy,2 Anthony H. Knap,6 Michael W. Lomas,6 Ken Nealson,5 Owen White,3 Jeremy Peterson,3 Jeff Hoffman,1 Rachel Parsons,6 Holly Baden-Tillson,1 Cynthia Pfannkoch,1 Yu-Hui Rogers,4 Hamilton O. Smith1

We have applied “whole-genome shotgun sequencing” to microbial populations collected en masse on tangential flow and impact filters from seawater samples collected from the Sargasso Sea near Bermuda. A total of 1.045 billion base pairs of nonredundant sequence was generated, annotated, and analyzed to elucidate the gene content, diversity, and relative abundance of the organisms within these environmental samples. These data are estimated to derive from at least 1800 genomic species based on sequence relatedness, including 148 previously unknown bacterial phylotypes. We have identified over 1.2 million previously unknown genes represented in these samples, including more than 782 new rhodopsin-like photoreceptors. Variation in species present and stoichiometry suggests substantial oceanic microbial diversity.

Microorganisms are responsible for most of the biogeochemical cycles that shape the environment of Earth and its oceans. Yet, these organisms are the least well understood on Earth, as the ability to study and understand the metabolic potential of microorganisms has been hampered by the inability to generate pure cultures. Recent studies have begun to explore environmental bacteria in a culture-independent manner by isolating DNA from environmental samples and transforming it into large insert clones. For example, a previously unknown light-driven proton pump, proteorhodopsin, was discovered within a bacterial artificial chromosome (BAC) from the genome of a SAR86 ribotype (1), and soil microbial DNA libraries have been constructed and screened for specific activities (2).

Here we have applied whole-genome shotgun sequencing to environmental-pooled DNA samples to test whether new genomic approaches can be effectively applied to gene and species discovery and to overall environmental characterization. To help ensure a tractable pilot study, we sampled in the Sargasso Sea, a nutrient-limited, open ocean environment. Further, we concentrated on the genetic material captured on filters sized to isolate primarily microbial inhabitants of the environment, leaving detailed analysis of dissolved DNA and viral particles on one end of the size spectrum and eukaryotic inhabitants on the other, for subsequent studies.

The Sargasso Sea. The northwest Sargasso Sea, at the Bermuda Atlantic Time-series Study site (BATS), is one of the best-studied and arguably most well-characterized regions of the global ocean. The Gulf Stream represents the western and northern boundaries of this region and provides a strong physical boundary, separating the low nutrient, oligotrophic open ocean from the more nutrient-rich waters of the U.S. continental shelf. The Sargasso Sea has been intensively studied as part of the 50-year time series of ocean physics and biogeochemistry (3, 4) and provides an opportunity for interpretation of environmental genomic data in an oceanographic context. In this region, formation of subtropical mode water occurs each winter as the passage of cold fronts across the region erodes the seasonal thermocline and causes convective mixing, resulting in mixed layers of 150 to 300 m depth. The introduction of nutrient-rich deep water, following the breakdown of seasonal thermoclines into the brightly lit surface waters, leads to the blooming of single cell phytoplankton, including two cyanobacteria species, Synechococcus and Prochlorococcus, that numerically dominate the photosynthetic biomass in the Sargasso Sea.

Surface water samples (170 to 200 liters) were collected aboard the RV Weatherbird II from three sites off the coast of Bermuda in February 2003. Additional samples were collected aboard the SV Sorcerer II from “Hydrostation S” in May 2003. Sample site locations are indicated on Fig. 1 and described in table S1; sampling protocols were fine-tuned from one expedition to the next (5). Genomic DNA was extracted from filters of 0.1 to 3.0 µm, and genomic libraries with insert sizes ranging from 2 to 6 kb were made as described (5). The prepared plasmid clones were sequenced from both ends to provide paired-end reads at the J. Craig Venter Science Foundation Joint Technology Center on ABI 3730XL DNA sequencers (Applied Biosystems, Foster City, CA). Whole-genome random shotgun sequencing of the Weatherbird II samples (table S1, samples 1 to 4) produced 1.66 million reads averaging 818 bp in length, for a total of approximately 1.36 Gbp of microbial DNA sequence. An additional 325,561 sequences were generated from the Sorcerer II samples (table S1, samples 5 to 7), yielding approximately 265 Mbp of DNA sequence.

Environmental genome shotgun assembly. Whole-genome shotgun sequencing projects have traditionally been applied to identify the genome sequence(s) from one particular organism, whereas the approach taken here is intended to capture representative sequence from many diverse organisms simultaneously. Variation in genome size and relative abundance determines the depth of coverage of any particular organism in the sample at a given level of sequencing and has strong implications for both the application of assembly algorithms and for the metrics used in evaluating the resulting assembly. Although we would expect abundant species to be deeply covered and well assembled, species of lower abundance may be represented by only a few sequences. For a single genome analysis, assembly coverage depth in unique regions should approximate a Poisson distribution. The mean of this distribution can be estimated from the observed data, looking at the depth of coverage of contigs generated before any scaffolding. The assembler used in this study, the Celera Assembler (6), uses this value to heuristically identify clearly unique regions to form the backbone of the final assembly within the scaffolding phase. However, when the starting material consists of a mixture of genomes of varying abundance, a threshold estimated in this way would classify samples from the most abundant organism(s) as repetitive, due to their greater-than-average depth of coverage, paradoxically leaving the most abundant organisms poorly assembled. We therefore used manual curation of an initial

1The Institute for Biological Energy Alternatives, 2The Center for the Advancement of Genomics, 1901 Research Boulevard, Rockville, MD 20850, USA. 3The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA. 4The J. Craig Venter Science Foundation Joint Technology Center, 5 Research Place, Rockville, MD 20850, USA. 5University of Southern California, 223 Science Hall, Los Angeles, CA 90089–0740, USA. 6Bermuda Biological Station for Research, Inc., 17 Biological Lane, St George GE 01, Bermuda.

*To whom correspondence should be addressed. E-mail: [email protected]

RESEARCH ARTICLE

2 APRIL 2004 VOL 304 SCIENCE www.sciencemag.org 66
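The excerpt above argues that a depth threshold estimated from the whole sample would mark contigs from the most abundant organism as repetitive. The following is a minimal sketch of that effect under stated assumptions: the contig names and depths are invented, the 0.001 tail cutoff is arbitrary, and the plain Poisson tail test is a stand-in for illustration, not the statistic the Celera Assembler actually uses.

```python
import math

def poisson_pmf(k, mean):
    """Probability of observing depth k under a Poisson distribution with this mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def poisson_upper_tail(k, mean):
    """P(depth >= k) under Poisson(mean)."""
    return 1.0 - sum(poisson_pmf(i, mean) for i in range(k))

# Hypothetical average depths of contigs from four organisms in one mixed sample.
contig_depths = {"abundant_org": 20, "mid_org": 5, "rare_org_1": 1, "rare_org_2": 2}

# The mean is estimated from the pooled data, as it would be for a single genome.
mean_depth = sum(contig_depths.values()) / len(contig_depths)

# Assumed rule: a contig looks "repetitive" if its depth is very unlikely
# under Poisson(mean_depth). The 0.001 cutoff is arbitrary.
TAIL_CUTOFF = 0.001
flagged_repetitive = {
    name: poisson_upper_tail(depth, mean_depth) < TAIL_CUTOFF
    for name, depth in contig_depths.items()
}

for name, flagged in flagged_repetitive.items():
    print(name, contig_depths[name], "repetitive?" , flagged)
```

With these toy numbers only the most abundant organism is flagged, which is the paradox described in the excerpt: high abundance is mistaken for repeat content.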

Page 77: EVE161 Lecture 1


A B C D E F G

T U V W X Y Z

Binning challenge


Page 78: EVE161 Lecture 1


A B C D E F G

T U V W X Y Z

Binning challenge

Best binning method: reference genomes


Page 79: EVE161 Lecture 1


ARTICLES

A human gut microbial gene catalogue established by metagenomic sequencing

Junjie Qin1*, Ruiqiang Li1*, Jeroen Raes2,3, Manimozhiyan Arumugam2, Kristoffer Solvsten Burgdorf4, Chaysavanh Manichanh5, Trine Nielsen4, Nicolas Pons6, Florence Levenez6, Takuji Yamada2, Daniel R. Mende2, Junhua Li1,7, Junming Xu1, Shaochuan Li1, Dongfang Li1,8, Jianjun Cao1, Bo Wang1, Huiqing Liang1, Huisong Zheng1, Yinlong Xie1,7, Julien Tap6, Patricia Lepage6, Marcelo Bertalan9, Jean-Michel Batto6, Torben Hansen4, Denis Le Paslier10, Allan Linneberg11, H. Bjørn Nielsen9, Eric Pelletier10, Pierre Renault6, Thomas Sicheritz-Ponten9, Keith Turner12, Hongmei Zhu1, Chang Yu1, Shengting Li1, Min Jian1, Yan Zhou1, Yingrui Li1, Xiuqing Zhang1, Songgang Li1, Nan Qin1, Huanming Yang1, Jian Wang1, Søren Brunak9, Joel Dore6, Francisco Guarner5, Karsten Kristiansen13, Oluf Pedersen4,14, Julian Parkhill12, Jean Weissenbach10, MetaHIT Consortium†, Peer Bork2, S. Dusko Ehrlich6 & Jun Wang1,13

To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals. The gene set, ~150 times larger than the human gene complement, contains an overwhelming majority of the prevalent (more frequent) microbial genes of the cohort and probably includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, indicating that the entire cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions present in all individuals and most bacteria, respectively.

It has been estimated that the microbes in our bodies collectively make up to 100 trillion cells, tenfold the number of human cells, and suggested that they encode 100-fold more unique genes than our own genome1. The majority of microbes reside in the gut, have a profound influence on human physiology and nutrition, and are crucial for human life2,3. Furthermore, the gut microbes contribute to energy harvest from food, and changes of gut microbiome may be associated with bowel diseases or obesity4–8.

To understand and exploit the impact of the gut microbes on human health and well-being it is necessary to decipher the content, diversity and functioning of the microbial gut community. 16S ribosomal RNA gene (rRNA) sequence-based methods9 revealed that two bacterial divisions, the Bacteroidetes and the Firmicutes, constitute over 90% of the known phylogenetic categories and dominate the distal gut microbiota10. Studies also showed substantial diversity of the gut microbiome between healthy individuals4,8,10,11. Although this difference is especially marked among infants12, later in life the gut microbiome converges to more similar phyla.

Metagenomic sequencing represents a powerful alternative to rRNA sequencing for analysing complex microbial communities13–15. Applied to the human gut, such studies have already generated some 3 gigabases (Gb) of microbial sequence from faecal samples of 33 individuals from the United States or Japan8,16,17. To get a broader overview of the human gut microbial genes we used the Illumina Genome Analyser (GA) technology to carry out deep sequencing of total DNA from faecal samples of 124 European adults. We generated 576.7 Gb of sequence, almost 200 times more than in all previous studies, assembled it into contigs and predicted 3.3 million unique open reading frames (ORFs). This gene catalogue contains virtually all of the prevalent gut microbial genes in our cohort, provides a broad view of the functions important for bacterial life in the gut and indicates that many bacterial species are shared by different individuals. Our results also show that short-read metagenomic sequencing can be used for global characterization of the genetic potential of ecologically complex environments.

Metagenomic sequencing of gut microbiomes

As part of the MetaHIT (Metagenomics of the Human Intestinal Tract) project, we collected faecal specimens from 124 healthy, overweight and obese individual human adults, as well as inflammatory bowel disease (IBD) patients, from Denmark and Spain (Supplementary Table 1). Total DNA was extracted from the faecal specimens18 and an average of 4.5 Gb (ranging between 2 and 7.3 Gb) of sequence was generated for each sample, allowing us to capture most of the

*These authors contributed equally to this work.
†Lists of authors and affiliations appear at the end of the paper.

1BGI-Shenzhen, Shenzhen 518083, China. 2European Molecular Biology Laboratory, 69117 Heidelberg, Germany. 3VIB—Vrije Universiteit Brussel, 1050 Brussels, Belgium. 4Hagedorn Research Institute, DK 2820 Copenhagen, Denmark. 5Hospital Universitari Val d’Hebron, Ciberehd, 08035 Barcelona, Spain. 6Institut National de la Recherche Agronomique, 78350 Jouy en Josas, France. 7School of Software Engineering, South China University of Technology, Guangzhou 510641, China. 8Genome Research Institute, Shenzhen University Medical School, Shenzhen 518000, China. 9Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark. 10Commissariat a l’Energie Atomique, Genoscope, 91000 Evry, France. 11Research Center for Prevention and Health, DK-2600 Glostrup, Denmark. 12The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. 13Department of Biology, University of Copenhagen, DK-2200 Copenhagen, Denmark. 14Institute of Biomedical Sciences, University of Copenhagen & Faculty of Health Science, University of Aarhus, 8000 Aarhus, Denmark.

Vol 464 | 4 March 2010 | doi:10.1038/nature08821

59 ©2010 Macmillan Publishers Limited. All rights reserved


Case Study - Human Microbiome Metagenomics

Page 80: EVE161 Lecture 1

Almost all (99.96%) of the phylogenetically assigned genes belonged to the Bacteria and Archaea, reflecting their predominance in the gut. Genes that were not mapped to orthologous groups were clustered into gene families (see Methods). To investigate the functional content of the prevalent gene set we computed the total number of orthologous groups and/or gene families present in any combination of n individuals (with n = 2–124; see Fig. 2c). This rarefaction analysis shows that the ‘known’ functions (annotated in eggNOG or KEGG) quickly saturate (a value of 5,569 groups was observed): when sampling any subset of 50 individuals, most have been detected. However, three-quarters of the prevalent gut functionalities consists of uncharacterized orthologous groups and/or completely novel gene families (Fig. 2c). When including these groups, the rarefaction curve only starts to plateau at the very end, at a much higher level (19,338 groups were detected), confirming that the extensive sampling of a large number of individuals was necessary to capture this considerable amount of novel/unknown functionality.
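The rarefaction analysis described above can be illustrated with a small sketch. It assumes each individual is represented by the set of orthologous groups detected in their sample, and it estimates the curve from random subsets rather than every combination; the function name, the toy data and the number of draws are hypothetical and not taken from the paper.

```python
import random

def rarefaction_curve(groups_per_individual, n_values, n_draws=200, seed=0):
    """Mean number of distinct groups detected in random subsets of n individuals."""
    rng = random.Random(seed)
    individuals = list(groups_per_individual)
    curve = {}
    for n in n_values:
        total = 0
        for _ in range(n_draws):
            subset = rng.sample(individuals, n)
            # Union of all groups seen in at least one sampled individual.
            detected = set().union(*(groups_per_individual[i] for i in subset))
            total += len(detected)
        curve[n] = total / n_draws
    return curve

# Toy data: four individuals, six orthologous groups (A to F) in total.
toy_samples = {
    "ind1": {"A", "B", "C"},
    "ind2": {"A", "B", "D"},
    "ind3": {"A", "C", "E"},
    "ind4": {"A", "B", "C", "F"},
}

curve = rarefaction_curve(toy_samples, n_values=[1, 2, 3, 4])
for n, mean_groups in curve.items():
    print(n, round(mean_groups, 2))
```

A curve that flattens early means that additional individuals add few new groups. A curve that is still rising at the largest n, as reported above for the uncharacterized groups, means that sampling has not yet saturated.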

Bacterial functions important for life in the gut

The extensive non-redundant catalogue of the bacterial genes from the human intestinal tract provides an opportunity to identify bacterial functions important for life in this environment. There are functions necessary for a bacterium to thrive in a gut context (that is, the ‘minimal gut genome’) and those involved in the homeostasis of the whole ecosystem, encoded across many species (the ‘minimal gut metagenome’). The first set of functions is expected to be present in most or all gut bacterial species; the second set in most or all individuals’ gut samples.

To identify the functions encoded by the minimal gut genome we use the fact that they should be present in most or all gut bacterial species and therefore appear in the gene catalogue at a frequency above that of the functions present in only some of the gut bacterial species. The relative frequency of different functions can be deduced from the number of genes recruited to different eggNOG clusters, after normalization for gene length and copy number (Supplementary Fig. 10a, b). We ranked all the clusters by gene frequencies and determined the range that included the clusters specifying well-known essential bacterial functions, such as those determined experimentally for a well-studied firmicute, Bacillus subtilis27, hypothesizing that additional clusters in this range are equally important. As expected, the range that included most of B. subtilis essential clusters (86%) was at the very top of the ranking order (Fig. 5). Some 76% of the clusters with essential genes of Escherichia coli28 were within this range, confirming the validity of our approach. This suggests that 1,244 metagenomic clusters found within the range (Supplementary Table 10; termed ‘range clusters’ hereafter) specify functions important for life in the gut.
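The ranking step described above can be sketched as follows. The sketch assumes that normalized gene frequencies per cluster are already available and that the set of clusters containing essential genes is known; the cluster names, values and function names are invented for illustration, and the window is 2 rather than the 100 clusters used for Fig. 5.

```python
def essential_fraction_by_window(ranked_clusters, essential, window=100):
    """Proportion of clusters holding essential genes in successive rank windows."""
    fractions = []
    for start in range(0, len(ranked_clusters), window):
        group = ranked_clusters[start:start + window]
        hits = sum(1 for cluster in group if cluster in essential)
        fractions.append(hits / len(group))
    return fractions

def essential_share_within(ranked_clusters, essential, cutoff_rank):
    """Share of all essential clusters that fall within the top cutoff_rank clusters."""
    top = set(ranked_clusters[:cutoff_rank])
    return len(top & essential) / len(essential)

# Hypothetical normalized gene frequencies for six clusters.
frequency = {"c1": 9.0, "c2": 7.5, "c3": 6.0, "c4": 3.0, "c5": 1.0, "c6": 0.5}
essential_clusters = {"c1", "c2", "c5"}

# Rank clusters from most to least frequent.
ranked = sorted(frequency, key=frequency.get, reverse=True)

fractions = essential_fraction_by_window(ranked, essential_clusters, window=2)
share_top2 = essential_share_within(ranked, essential_clusters, cutoff_rank=2)
print(fractions, round(share_top2, 3))
```

In the paper the comparable figure is the 86% of B. subtilis essential clusters that fall inside the range; how the range boundary was chosen is not stated in this excerpt, so the cutoff here is supplied by hand.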

We found two types of functions among the range clusters: those required in all bacteria (housekeeping) and those potentially specific for the gut. Among many examples of the first category are the functions that are part of main metabolic pathways (for example, central carbon metabolism, amino acid synthesis), and important protein complexes (RNA and DNA polymerase, ATP synthase, general secretory apparatus). Not surprisingly, projection of the range clusters on the KEGG metabolic pathways gives a highly integrated picture of the global gut cell metabolism (Fig. 6a).

The putative gut-specific functions include those involved in adhesion to the host proteins (collagen, fibrinogen, fibronectin) or in harvesting sugars of the globoseries glycolipids, which are carried on blood and epithelial cells. Furthermore, 15% of range clusters encode functions that are present in <10% of the eggNOG genomes (see Supplementary Fig. 11) and are largely (74.3%) not defined (Fig. 6b). Detailed studies of these should lead to a deeper comprehension of bacterial life in the gut.

To identify the functions encoded by the minimal gut metagenome, we computed the orthologous groups that are shared by individuals of our cohort. This minimal set, of 6,313 functions, is much larger than the one estimated in a previous study8. There are only 2,069 functionally annotated orthologous groups, showing that they gravely underestimate the true size of the common functional complement among individuals (Fig. 6c). The minimal gut metagenome includes a considerable fraction of functions (~45%) that are present in <10% of the sequenced bacterial genomes (Fig. 6c, inset). These otherwise rare functionalities that are found in each of the 124 individuals may be necessary for the gut ecosystem. Eighty per cent of these orthologous groups contain genes with at best poorly characterized function, underscoring our limited knowledge of gut functioning.

Of the known fraction, about 5% codes for (pro)phage-related proteins, implying a universal presence and possible important ecological role of bacteriophages in gut homeostasis. The most striking secondary metabolism that seems crucial for the minimal metagenome relates, not unexpectedly, to biodegradation of complex sugars and glycans harvested from the host diet and/or intestinal lining. Examples include degradation and uptake pathways for pectin (and its monomer, rhamnose) and sorbitol, sugars which are omnipresent in fruits and vegetables, but which are not or poorly absorbed by humans. As some gut microorganisms were found to degrade both of them29,30, this capacity seems to be selected for by the gut ecosystem as a non-competitive source of energy. Besides these, capacity to ferment, for example, mannose, fructose, cellulose and sucrose is also part of the minimal metagenome. Together, these emphasize the

[Figure 5 plot: x-axis, Cluster rank (1 to 10,001); y-axis, Cluster (%) (0 to 40); a region of the distribution is labelled "Range".]

Figure 5 | Clusters that contain the B. subtilis essential genes. The clusters were ranked by the number of genes they contain, normalized by average length and copy number (see Supplementary Fig. 10), and the proportion of clusters with the essential B. subtilis genes was determined for successive groups of 100 clusters. Range indicates the part of the cluster distribution that contains 86% of the B. subtilis essential genes.

[Figure 4 plot: PC1 (x-axis) against PC2 (y-axis); legend: Healthy, Crohn’s disease, Ulcerative colitis; P value: 0.031.]

Figure 4 | Bacterial species abundance differentiates IBD patients and healthy individuals. Principal component analysis with health status as instrumental variables, based on the abundance of 155 species with ≥1% genome coverage by the Illumina reads in at least 1 individual of the cohort, was carried out with 14 healthy individuals and 25 IBD patients (21 ulcerative colitis and 4 Crohn’s disease) from Spain (Supplementary Table 1). Two first components (PC1 and PC2) were plotted and represented 7.3% of whole inertia. Individuals (represented by points) were clustered and centre of gravity computed for each class; P-value of the link between health status and species abundance was assessed using a Monte-Carlo test (999 replicates).
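The Monte-Carlo test mentioned in the Figure 4 caption can be illustrated with a generic label-permutation sketch. This is an assumption-laden stand-in, not the paper's procedure: the statistic (squared distance between class centroids), the two-class setup, the coordinates and all function names are invented for illustration. Only the 999 replicates come from the caption.

```python
import random

def centroid(points):
    """Centre of gravity of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def separation(points, labels):
    """Squared distance between the centroids of the two label classes."""
    first, second = sorted(set(labels))
    a = centroid([p for p, lab in zip(points, labels) if lab == first])
    b = centroid([p for p, lab in zip(points, labels) if lab == second])
    return sum((x - y) ** 2 for x, y in zip(a, b))

def monte_carlo_p_value(points, labels, n_replicates=999, seed=0):
    """Share of label shufflings whose separation is at least the observed one."""
    rng = random.Random(seed)
    observed = separation(points, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_replicates):
        rng.shuffle(shuffled)
        if separation(points, shuffled) >= observed:
            at_least_as_extreme += 1
    # The observed labelling counts as one replicate, so p is never zero.
    return (at_least_as_extreme + 1) / (n_replicates + 1)

# Toy ordination coordinates (hypothetical PC1, PC2 values) for ten individuals.
points = [(0.1, 0.2), (0.0, -0.3), (-0.2, 0.1), (0.3, 0.0), (0.1, -0.1),
          (5.0, 5.2), (4.8, 5.1), (5.3, 4.9), (5.1, 5.0), (4.9, 4.7)]
labels = ["healthy"] * 5 + ["IBD"] * 5

p_value = monte_carlo_p_value(points, labels)
print(p_value)
```

The toy groups are deliberately well separated, so few shufflings match the observed separation and the resulting p-value is small.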

ARTICLES NATURE | Vol 464 | 4 March 2010

62 ©2010 Macmillan Publishers Limited. All rights reserved
