Metagenomics,usingNextGeneraon, Sequencing, · PDF fileOIE Seminar, Berlin 07th June 2013 Metagenomics,usingNextGeneraon, Sequencing, technology!!...

CARGE FLI

10. OIE Seminar, Berlin 07th June 2013

Metagenomics using Next Genera2on Sequencing technology

Mar2n Beer, Bernd Hoffmann, Ma=hias Scheuch, Dirk Höper

Ins%tute of Diagnos%c Virology

Preview

!   Introduc2on – pathogen detec2on

!   The metagenomic approach

!   Challenges of metagenomics

!   From sample prepara2on to NGS to data analysis

!   Summary and conclusions

BTV, SBV and Usutu outbreak as „indicator“

African Horse sickness

West Nile Virus

Rift Valley Fever

EHDV ASFV

CCHFV ??????


How to detect the unexpected or unknown?

The basic definition of metagenomics is the analysis of genomic DNA from a whole community.

Gilbert JA, Dupont CL (2011). Ann Rev Mar Sci 3: 347-71. 10.1146/annurev-marine-120709-142811

What is Metagenomics (1)?


What is Metagenomics (2)?

Metagenomics is the applica0on of modern genomics techniques to the study of communi0es of microbial organisms directly in their natural environments, bypassing the need for isola2on and lab cul2va2on of individual species.

Chen K, Pachter L (2005). PLoS Comput Biol 1(2): e24. doi:10.1371/journal.pcbi.0010024


Why Do Metagenomic Pathogen Detec2on (1)?

Because every causative agent of infectious disease relies on nucleic acids (NAs) both for its genome and gene expression (only known exception so far are transmissible spongiform encephalopathies) AND Modern shotgun sequencing methods detect all NAs in a given sample with ± equal probability


Suppose the following •  We have animals suffering from an unknown disease •  Standard targeted diagnos0cs do not reveal the causa0ve agent

•  We expect as the causa2ve agent a pathogen containing nucleic acids

→ it is straighQorward to comprehensively sequence total NAs from these animals to detect the pathogen


Because it works!

Metagenomics Workflow

Sample !

NGS method !

Data analysis!


For example: Roche Genome Sequencer Gs flex (up to 500 Mio. bp per run)

Next generation sequencing (NGS)

-  Read: short fragments (80 to 500 bp) -  Contig: larger sequence pieces assembled from reads

Key Issues for Diagnos2c Metagenomics

•  Pathogen detec0on in a metagenomic dataset is finding the needle in a haystack


Zoom 1000 X


Example •  CaZle genome: 2.97 Gbp •  pCV2 genome: 1768 b → mass ra0o pCV2:CaZle = 1:1,679,864


Sensi0vity is determined by •  the ra0o of the genome sizes (or in the RNA case

genome/total RNA) and •  the copy numbers of the genomes (RNA molecules)

1. Sensi2vity can easily be scaled up by simply producing more sequences to yield at least one pathogen read 2. Choosing the appropriate sample material, i.e. material with high loads of pathogen (and minimum host genome) is crucial



•  Huge datasets are generated, resul2ng in complex and 2me consuming data analysis

•  Classifica0on of the sequences relies on finding similari0es to known pathogens

•  Limited read length (with some instruments) impedes data analysis

•  How to deal with unclassifiable sequences: are they random ar0ficial or natural but unknown sequences? These sequences require intense manual evalua2on


Sample Choice – Sample Prepara2on


Sample Choice

-‐  Body fluid/0ssue material? -‐  Preferably the matrix with the lowest quan2ty of host nucleic acids (NA)

-‐  The highest possible quan0ty of pathogen NA -‐  DNA or RNA?

-‐  RNA is to be preferred over DNA because all replica0ng viruses generate RNA but only a few generate DNA


Sample Choice

Example 1 -‐  Fish are dying for unknown reason -‐  Sample from metabolically highly ac0ve gill 0ssue -‐  DNA virus expected

-‐  Sequencing library from RNA -‐  229,000 sequencing reads -‐  2 reads RNA virus (0.00087%)

Example 2 -‐  sequencing RNA isolated from serum from BVDV persistently infected caZle

-‐  5% viral reads

Sample Choice

Sample Choice

Example 3 -‐  Schmallenberg virus samples RNA from serum -‐  7/27,000 (0.026%) reads orthobunyavirus -‐  Library only sufficient to yield a total of approx. 85,000 reads

Sample Prepara2on

Sample Prepara2on

-‐  Usually high background of host nucleic acids -‐  Different nuclease diges0on dependent techniques for the deple0on of these: -‐  DNase SISPA -‐  Vidisca -‐  Kits for sample normaliza2on

-‐  BUT: nuclease digest means risk of irreversible informa0on loss

Sequencing

Sequencing

-‐  Various technologies available -‐  Read length is a cri2cal determinant -‐  Sanger with too low throughput -‐  Some plaQorms with high throughput require long sequencing 0me

For diagnos0cs 2me necessary 0ll comple0on may be an issue

Sequencing: Impact of Read Length

Success of read classifica0on depending on read length (GCReoV-‐Daten Histogramm)

Sequencing: Impact of Read Length

What if … the Schmallenberg-‐virus sequencing reads had been shorter?

Long reads Short Reads Mean length 315 96 Total No. reads 27420 27420 No. classified reads 26128 25310 No. Orthobunyavirus hits 7 1 Similarity with subject sequence 68.37 % -‐ 95.60 % 98.68 % No. unclassified reads 1292 2110

Data Analysis

Data Analysis

A known virus/bacterium with a sufficiently similar genome or amino acid sequence is necessary The more sequences are available for diverse viruses/ organisms, the higher the chance gets to iden0fy a novel/unexpected pathogen by sequence comparison

Data Analysis

-‐  A bigger database causes longer computa0on to find the most similar sequence

-‐  Too small databases produce significant hits that are not meaningful!

-‐  The stringency of the search algorithm is crucial

Metagenomic analysis

Beyond detec2on: fulfillment of Koch’s postulates

-‐  Discussion is necessary about the possibility and necessity to fulfill Koch’s postulates

-‐  Failure to fulfill these might result in false diagnosis

-‐  In animal diagnos0cs fulfillment can be possible and desirable (see SBV)


Mokili -‐ Koch’s postulates •  The diseased host metagenome in a sample is

significantly different from a control sample •  Inocula0on of the clinical sample in a healthy subject

should result in the same disease state •  Differen0al metagenomics traits iden0fied in step 1 have

to be recognized, only arer infec0on, in the metagenome of the inoculated subject

•  Inocula0on of a sample of the diseased subject from step 2 must induce disease in a newly infected subject

J. L. Mokili, F. Rohwer, B. E. Du0lh, Curr. Opin. Virol. 2, 63 (2012)

A new Orthobunyavirus - SBV


Conclusions

•  Metagenomics is a universal powerful tool for discovery of unexpected/unknown pathogens

•  The key to success is a high rela0ve abundance of the pathogen compared to the host nucleic acid in a given sample

•  In order to reduce the cost and to circumvent the boZleneck of data analysis, protocols for op0mized sample prepara0on are necessary

•  Op0mized data analysis and databases are crucial

Acknowledgements

All colleagues from the FLI contribu0ng to this work

All colleagues from na0onal and interna0onal ins0tutes sending in samples

Metagenomics,usingNextGeneraon, Sequencing, · PDF fileOIE Seminar, Berlin 07th June 2013 Metagenomics,usingNextGeneraon, Sequencing, technology!!...

Documents