

Genomes from metagenomes: recovery and analysis of population genomes

Kate Ormerod
Australian Centre for Ecogenomics, School of Chemistry & Molecular Biosciences

Genomes from metagenomes

• Represent a consensus version of a particular species present within a dataset

Isolate sequencing

Metagenomic sequencing

Why assemble?

• We can pull functional and phylogenetic information from a metagenomic sample without any assembly, so why assemble?
– Expands the tree of life with new reference genomes
– Gives context to the functional data: how much variation exists within a given family?
– Functional assignment based on full length genes
– Provides a basis for strain level comparisons

Generating population genomes

• Experimental design
– Multiple samples
  • Different sites/biological entities
  • Time series
  • Different sample treatment
• Initial de novo assembly
– Combined samples
• Binning
– Based on coverage in individual samples and sequence composition

Experimental design

• How diverse is the environment you are sampling? How rare are the genomes you seek?
– Assess using:
  • Test batch of shotgun samples
  • 16S rRNA profiling

• How easy is it to obtain samples?

• Is there likely to be host contamination?

• Depth can be achieved across multiple samples
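A back-of-the-envelope calculation can make the depth question concrete. The sketch below is not from the slides: it assumes mean coverage is roughly (reads × read length × relative abundance) / genome size, and every number in it is hypothetical.

```python
def expected_coverage(total_reads, read_length, relative_abundance, genome_size):
    """Rough mean per-base coverage of one genome in one shotgun sample.

    Assumes reads are drawn in proportion to each genome's share of the
    community DNA (relative_abundance, as a fraction between 0 and 1).
    """
    return total_reads * read_length * relative_abundance / genome_size


# Hypothetical scenario: 20 million 150 bp reads per sample, and a 4 Mbp
# genome that makes up 0.1% of the community DNA.
per_sample = expected_coverage(20_000_000, 150, 0.001, 4_000_000)

# If the genome is present at a similar level in ten samples, a combined
# assembly sees roughly the sum of the per-sample coverages.
pooled = 10 * per_sample

print(f"per sample: ~{per_sample:.2f}x, ten samples pooled: ~{pooled:.1f}x")
```

Under these assumed numbers a single sample gives well under 1x coverage of the rare genome, while ten pooled samples approach a depth at which assembly becomes plausible.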


Assembly & binning

• Assembly
– Software
  • SPAdes
  • MetaVelvet
  • CLC Genomics Workbench
– Read QC
  • eg Trimmomatic
– Combined samples
  • All at once?
  • Subset?
– Gap filling
  • eg ABySS

Assembly & binning

• Binning
– Software
  • MetaBAT
  • CONCOCT
  • Canopy
  • GroopM
– Coverage/co-abundance
  • Coverage within a single sample
  • Differential coverage
    – Coverage across different samples
– Sequence composition
  • Tetranucleotide frequency patterns
– Beware small contigs
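The two binning signals above can be illustrated with a small sketch. This is not how any of the listed tools are implemented; it only shows, under simplifying assumptions, what a per-contig feature vector built from tetranucleotide frequencies and per-sample coverage might look like. The function names and coverage values are made up for illustration, and reverse-complement k-mers are not merged, which real binners typically do.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return the 256 tetranucleotide frequencies of a contig.

    Windows containing characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


def feature_vector(seq, sample_coverages):
    """Concatenate composition (256 values) with coverage in each sample."""
    return tetranucleotide_frequencies(seq) + list(sample_coverages)


# Hypothetical contig with mean coverage measured in three samples.
features = feature_vector("ACGTACGTTTGACCA", [12.0, 0.5, 3.1])
print(len(features))
```

The sketch also shows why small contigs are a concern: a 500 bp contig contributes only 497 windows spread over 256 possible tetranucleotides, so its composition profile is noisy.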


Problems with population genomes

• Completeness
– Insufficient coverage
– Regions with different profile from the rest of the genome, eg duplications, lateral transfer, transposable elements
• Contamination
– Misassembly – chimeric contigs – lateral transfer, transposable elements

Assessing genome quality

• CheckM
– Measures completeness and contamination
– Lineage specific marker gene assessment
• Read mapping
– Detection of structural variation, low/high coverage, different insert sizes
• Reference genome coverage
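The idea behind marker-gene assessment can be sketched in a few lines. This is a deliberately simplified illustration and not CheckM's actual calculation, which is more involved: here completeness is taken as the share of expected single-copy markers found at least once, and contamination as the number of extra marker copies relative to the number of expected markers. The marker names are placeholders.

```python
from collections import Counter


def completeness_and_contamination(expected_markers, observed_hits):
    """Estimate bin quality from single-copy marker genes (simplified).

    expected_markers: set of marker ids expected once per genome.
    observed_hits: list of marker ids detected in the bin, with repeats.
    Returns (completeness %, contamination %).
    """
    if not expected_markers:
        raise ValueError("expected_markers must not be empty")
    counts = Counter(h for h in observed_hits if h in expected_markers)
    n = len(expected_markers)
    found = sum(1 for m in expected_markers if counts[m] >= 1)
    extra = sum(counts[m] - 1 for m in expected_markers if counts[m] > 1)
    return 100 * found / n, 100 * extra / n


# Placeholder markers: three of four are present, one of them twice.
markers = {"m1", "m2", "m3", "m4"}
hits = ["m1", "m2", "m2", "m3"]
completeness, contamination = completeness_and_contamination(markers, hits)
print(completeness, contamination)
```

A missing marker lowers completeness, while a duplicated marker suggests that sequence from a second genome has been binned together with the first.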


Improving population genomes

• Reassembly of individual genome bins
– Extract reads mapping to individual bins (BamM)
• Reference guided assembly
• Different binning tool and/or assembly tool

metaQUAST


Analysis of population genomes

• Major advantage = culture independent
– Less biased view of the diversity of the chosen environment
• Major challenge = culture independent
– Therefore functional information may not be available
– Describing new genus, family, phylum?

Finding the story in your genomes

• Inferring core metabolism
– Energy
– Food
• Environment specific points of interest
– Antibiotic resistance
– Virulence factors
• Comparisons to phylogenetic and environmental neighbours

Analysis of population genomes: an example

Ormerod, K.L., D.L.A. Wood, N. Lachner, et al. "Genomic characterization of the uncultured Bacteroidales family S24-7 inhabiting the guts of homeothermic animals." Microbiome (2016) 4:36.

Genome annotation

• Annotation
– Prokka
– NCBI
• Annotation + analysis
– RAST
– IMG
– KBase

Functional categorisation

• Databases assign function to protein annotations using homology
– CAZy: carbohydrate active enzymes
– KEGG
– COG
– EggNOG
• Differentiation between:
– Family members
– Phylogenetic neighbours
– Environmental neighbours
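Once each genome has a set of functional assignments, the comparisons listed above reduce to set operations. The sketch below is one simple way to do that: it finds the functions shared by every genome and those unique to each. The genome names and their CAZy family assignments are invented for illustration.

```python
def compare_functions(annotations):
    """Split functional assignments into shared and genome-specific sets.

    annotations: dict mapping genome name -> set of function identifiers
    (e.g. CAZy families, KEGG orthologs or COGs).
    Returns (core, unique), where core is the set found in every genome
    and unique maps each genome to the functions found only in it.
    """
    genomes = list(annotations)
    if not genomes:
        return set(), {}
    core = set(annotations[genomes[0]])
    for g in genomes[1:]:
        core &= annotations[g]
    unique = {}
    for g in genomes:
        others = set().union(*(annotations[o] for o in genomes if o != g))
        unique[g] = set(annotations[g]) - others
    return core, unique


# Invented example: three population genomes from one family.
example = {
    "genome_A": {"GH13", "GH2", "PL1"},
    "genome_B": {"GH13", "GH2"},
    "genome_C": {"GH13", "GH43"},
}
core, unique = compare_functions(example)
print(sorted(core), {g: sorted(s) for g, s in unique.items()})
```

The same function works whether the genomes being compared are family members, phylogenetic neighbours or environmental neighbours; only the input set changes.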


CAZy: carbohydrate active enzymes


KEGG & COG: intrafamily comparisons


KEGG & COG: interfamily comparisons


CAZy: niche companion comparison


Database bias


From annotations to pathways


• KEGG
– 490 reference pathways
• BioCyc
– MetaCyc
  • 2,453 pathways
  • 2,063 organisms
– Pathway Tools
• RAST
– ModelSEED
• IMG
• KBase

Pathway Tools


Metabolic gap filling

• Are there missing enzymes within core pathways?
– Are these truly missing or an assembly artefact?
– Is there something missing suggestive of reliance on other members of the community?
• How does your prediction compare to other, possibly cultured, species from similar environment?
– Are there known culturing requirements for related species?
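A first pass at the "missing enzymes" question can be automated by comparing the EC numbers annotated in a bin against the steps of a reference pathway. The sketch below assumes a pathway can be represented as a list of (enzyme, EC number) pairs; the list is a truncated, illustrative selection of glycolysis steps rather than a full KEGG or MetaCyc definition, and the bin's annotations are hypothetical.

```python
# Truncated, illustrative list of glycolysis steps (enzyme name, EC number).
GLYCOLYSIS_STEPS = [
    ("glucose-6-phosphate isomerase", "5.3.1.9"),
    ("6-phosphofructokinase", "2.7.1.11"),
    ("fructose-bisphosphate aldolase", "4.1.2.13"),
    ("triose-phosphate isomerase", "5.3.1.1"),
    ("enolase", "4.2.1.11"),
    ("pyruvate kinase", "2.7.1.40"),
]


def missing_steps(pathway_steps, annotated_ecs):
    """Return the pathway steps whose EC number is absent from the bin."""
    return [(name, ec) for name, ec in pathway_steps if ec not in annotated_ecs]


# Hypothetical set of EC numbers annotated in one population genome.
bin_ecs = {"5.3.1.9", "4.1.2.13", "5.3.1.1", "4.2.1.11", "2.7.1.40"}

gaps = missing_steps(GLYCOLYSIS_STEPS, bin_ecs)
for name, ec in gaps:
    print(f"not annotated: {name} (EC {ec})")
```

A reported gap is only a starting point: as the questions above suggest, it may reflect an incomplete assembly, an alternative enzyme, or a genuine dependence on other community members.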


Metabolic overview


Summary

Plan
• Test your environment before committing
• Depth can be achieved across multiple samples

Assemble
• Combine all samples or subsets?

Bin
• More samples means improved differential coverage binning
• Try multiple binning programs

Annotate
• Databases may not capture everything; make use of multiple databases

Infer
• Core metabolism can be inferred from reference pathways, phylogenetic neighbours and environmental neighbours

Useful references

• Sangwan, N., F. Xia and J. A. Gilbert (2016). "Recovering complete and draft population genomes from metagenome datasets." Microbiome 4(1): 1-11.

• Turaev, D. and T. Rattei (2016). "High definition for systems biology of microbial communities: metagenomics gets genome-centric and strain-resolved." Current Opinion in Biotechnology 39: 174-181.

• Franzosa, E. A., T. Hsu, A. Sirota-Madi, et al. (2015). "Sequencing and beyond: integrating molecular 'omics' for microbial community profiling." Nature Reviews Microbiology 13(6): 360-372.

For the utility of different extraction methods:

• Albertsen, M., P. Hugenholtz, A. Skarshewski, et al. (2013). "Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes." Nature Biotechnology 31(6): 533-538.

Acknowledgements

ACE: Phil Hugenholtz, Gene Tyson, David Wood, Nancy Lachner, Joshua Daly, Nicola Angel, Serene Low

TRI, University of Queensland: Mark Morrison

University of Newcastle: Phil Hansbro, Shaan Gellatly

IMB, University of Queensland: Matt Cooper

AIBN, University of Queensland: Lars Nielsen, Cristiana Dal’Molin, Robin Palfreyman

QFAB: Jeremy Parsons
