
Identifying Bacteriophages in Metagenomic Data Sets

RCAM 2017
Vanessa Jurtz

Technical University of Denmark

October 17, 2017


Contents

- Phages and why they matter
- MetaPhinder: identifying phages
- Further characterization of sequences
- Phage cocktail data sets


Phages and why they matter

“Bacteria rule the world and phages rule bacteria” (Mya Breitbart)

- most abundant organisms in the biosphere (about 10^31)
- outnumber bacteria 10:1
- kill up to 50% of bacteria produced every day
- impact biogeochemical cycling of key elements such as carbon, nitrogen and phosphorus


Phages and why they matter

Figure adapted from http://upload.wikimedia.org/wikipedia/commons/e/e7/Phage


Phages and why they matter

- 13 families based on morphology and nucleic acid composition (ICTV)
- phage genomes: single- or double-stranded DNA or RNA
- most phages sequenced to date belong to the order Caudovirales (tailed dsDNA phages)


Phages and why they matter

Figure adapted from Ceyssens et al. 2010, Introduction to Bacteriophage Biology and Diversity; ASM Press


Phages and why they matter

Phages have greatly impacted our understanding of biology!

- central dogma of molecular biology: DNA → RNA → proteins
- first organisms to be sequenced: phage MS2 (ssRNA) in 1976 and phage φX174 (ssDNA) in 1977
- phage typing, phage display, CRISPR-Cas

Figure adapted from https://www.quora.com/How-are-scientists-able-to-identify-specific-bacteria


Phages and why they matter

- Phages were discovered by F. Twort (1915) and F. d’Herelle (1917)
- F. d’Herelle was the first to apply phages for therapeutic purposes
- G. Eliava founded the Eliava Institute in Tbilisi, Georgia, in 1923

Pictured: Felix d’Herelle, Frederick Twort, George Eliava


Phages and why they matter - Phage therapy

- antibiotics are easy to produce and store
- antibiotic resistance causes problems → post-antibiotic era (WHO 2014)
- phages are specific to certain bacteria
- safety concerns (integrases, virulence factors)
- difficult licensing in Western countries
- complex and dose-independent pharmacokinetics

Eliava Institute


Challenges in phage identification and characterization

- small phage genome size
- phages contribute around 2-5% of total DNA in a metagenomic sample*
- few fully sequenced phage genomes in public databases (< 6000)
- little annotation in general (protein function, host etc.)

*https://www.ncbi.nlm.nih.gov/pubmed/22864264


MetaPhinder

Identifying phage sequences in metagenomic samples by database comparison


MetaPhinder - similar methods

MetaPhinder’s aim is only to identify phage contigs; the method itself therefore remains very simple.


MetaPhinder


MetaPhinder

mosaic genomes:

genomic rearrangement:


MetaPhinder

ANI = average nucleotide identity
N = number of hits
id = blastn identity
al = alignment length
mcov = merged coverage
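The symbol list above suggests how %ANI might be computed from the blastn hits of one contig. The following is a minimal Python sketch, assuming the formula %ANI = (sum of id_i * al_i over all N hits / sum of al_i) * mcov; the formula itself is not in this transcript, so treat it as an assumption. The `Hit` tuple, the function names, and the use of the query span as alignment length are illustrative choices, not MetaPhinder's actual code.

```python
from typing import List, Tuple

# Hypothetical hit record: (percent identity, query start, query end),
# with 1-based inclusive query coordinates.
Hit = Tuple[float, int, int]


def merged_coverage(hits: List[Hit], query_length: int) -> float:
    """Fraction of the query covered by at least one hit (overlaps counted once)."""
    intervals = sorted((min(s, e), max(s, e)) for _, s, e in hits)
    covered = 0
    last_end = 0  # last query position that has already been counted
    for start, end in intervals:
        start = max(start, last_end + 1)  # skip positions counted before
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / query_length


def percent_ani(hits: List[Hit], query_length: int) -> float:
    """Length-weighted mean identity of all hits, scaled by merged coverage.

    Assumption: alignment length is approximated by the query span of the hit.
    """
    if not hits:
        return 0.0
    lengths = [abs(e - s) + 1 for _, s, e in hits]
    weighted_identity = sum(
        pid * al for (pid, _, _), al in zip(hits, lengths)
    ) / sum(lengths)
    return weighted_identity * merged_coverage(hits, query_length)


# Toy example (made-up numbers): two overlapping hits on a 200 bp contig.
example_hits = [(90.0, 1, 100), (80.0, 51, 150)]
example_ani = percent_ani(example_hits, 200)
```

In the toy example the hits cover positions 1 to 150 once merged, so only 75% of the contig contributes, which pulls %ANI well below the raw identities of the individual hits.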


MetaPhinder

- Can we find a %ANI threshold to classify a contig as phage?
- Which method should be used for database comparison?


MetaPhinder

Which method should be used for database comparison?

[Figure: AUC (y-axis, 0.7 to 1.0) by contig length bin in kbp (0.5-5, 5-25, 25-50, 50-100, 100+) for blastn, KmerFinder and tBLASTx]


MetaPhinder

Can we find a %ANI threshold to classify a contig as phage?

[Figure, panel A: ROC curve (true positive rate vs. false positive rate), AUC: 0.969]
[Figure, panel B: true positive rate and false positive rate vs. threshold [%ANI]]

threshold = 1.7 %ANI
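How a candidate %ANI threshold trades true positives against false positives can be checked with a few lines of Python. This is only a sketch: the function name and the toy values are made up for illustration, and the slides do not state which criterion was used to select 1.7 %ANI.

```python
from typing import Sequence, Tuple


def rates_at_threshold(
    ani_values: Sequence[float],
    is_phage: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) when every contig
    with %ANI above the threshold is called phage."""
    tp = fp = fn = tn = 0
    for ani, label in zip(ani_values, is_phage):
        predicted = ani > threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy data (not from the talk): two non-phage and two phage contigs.
toy_ani = [0.0, 1.0, 5.0, 60.0]
toy_labels = [False, False, True, True]
tpr, fpr = rates_at_threshold(toy_ani, toy_labels, threshold=1.7)
```

Sweeping the threshold from 0 to 100 and recording both rates gives the two curves of a rate-versus-threshold plot; plotting the rates against each other gives the ROC curve.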


MetaPhinder


MetaPhinder

Predicting prophage data sets:


MetaPhinder

Practical experience on a data set of sewage samples from all over the world:

→ %ANI threshold is too low
→ development of MetaPhinder version 2


MetaPhinder

No threshold specification!

- no need to redefine threshold if database is updated
- contig selection left at discretion of user


MetaPhinder


MetaPhinder

min. 10%ANI and %ANI > bacterial coverage
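Read as a contig selection rule, the line above can be written as a small predicate. The sketch below assumes that both %ANI (against the phage database) and bacterial coverage are expressed in percent for the same contig; the function name, parameter names, and the inclusive lower bound are illustrative assumptions, not part of MetaPhinder.

```python
def select_as_phage(
    percent_ani: float,
    bacterial_coverage: float,
    min_ani: float = 10.0,
) -> bool:
    """Keep a contig if it reaches the minimum %ANI against the phage
    database and that %ANI exceeds its coverage by the bacterial database.

    Assumption: both values are percentages on the same 0 to 100 scale.
    """
    return percent_ani >= min_ani and percent_ani > bacterial_coverage


# Toy values (made up): strong phage signal, weak bacterial signal.
keep = select_as_phage(percent_ani=45.0, bacterial_coverage=12.0)
```

Because the cutoff is a parameter, users who prefer a stricter or looser selection can change `min_ani` without touching the comparison against the bacterial database.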


MetaPhinder

MetaPhinder limitations:

- small size of phage database
- no discovery of completely new phages possible
- removal of prophage k-mers from bacterial DB incomplete (due to incomplete annotation)

What about prophages?

- MetaPhinder is not designed for prophage annotation
- use specialized software: PHASTER, VirSorter, PhiSpy etc.


Further characterization of sequences


VirulenceFinder

Searches for virulence genes of Listeria, S. aureus, E. coli and Enterococcus using blastn.

Webservice: https://cge.cbs.dtu.dk/services/VirulenceFinder/


ResFinder

ResFinder identifies acquired antimicrobial resistance genes.

Webservice: https://cge.cbs.dtu.dk/services/ResFinder/


VirulenceFinder and ResFinder results


HostPhinder

Julia Villaroel (PhD student, DTU)

HostPhinder identifies the bacterial host of a query phage genome based on its genomic similarity to a database of phage genomes with known host.

Webservice: https://cge.cbs.dtu.dk/services/HostPhinder/


HostPhinder

- k-mer based comparison to database
- calculate coverage
- use scoring criterion where normalized coverages of database hits with the same host are summed
- correct predictions: genus 81%, species 74%
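The scoring step listed above (sum the normalized coverages of database hits that share a host, then pick the best host) can be sketched in Python. How HostPhinder normalizes coverage is not stated on the slide; dividing by the total coverage of all hits is an assumption made here for illustration, and the function name and toy hits are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def predict_host(
    hits: List[Tuple[str, float]],
) -> Tuple[Optional[str], Dict[str, float]]:
    """Sum normalized coverages per host and return the top-scoring host.

    hits: (host of the database phage, coverage of the query by that phage).
    Assumption: coverages are normalized by their total, so scores sum to 1.
    """
    total = sum(coverage for _, coverage in hits)
    if total == 0:
        return None, {}
    scores: Dict[str, float] = defaultdict(float)
    for host, coverage in hits:
        scores[host] += coverage / total
    best_host = max(scores, key=scores.get)
    return best_host, dict(scores)


# Toy hits (made up): two database phages infect Escherichia, one Salmonella.
toy_hits = [("Escherichia", 0.5), ("Salmonella", 0.25), ("Escherichia", 0.25)]
best, per_host = predict_host(toy_hits)
```

Summing over hits means several moderately similar phages with the same host can outvote a single phage with a different host, and a host absent from the database can never be returned.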


HostPhinder

HostPhinder can only predict hosts that are part of the database!


Phage Cocktail data sets

Phage solution for medical application consisting of several different phage species


Phage Cocktail data sets

Henrike Zschach (PhD student, DTU)
Julia Villaroel (PhD student, DTU)

INTESTI cocktail:

- active against E. coli, Enterococcus, Proteus, P. aeruginosa, Shigella, Salmonella, Staphylococcus
- in use since 1937 (regularly updated every 6 months)
- against intestinal infections
- analysis in 2015/2016

PYO cocktail:

- active against Staphylococcus, Streptococcus, Proteus, E. coli, P. aeruginosa
- against skin or wound infections
- analysis in 2017


Phage Cocktail data sets

INTESTI PYO


Phage Cocktail data sets

INTESTI PYO

- predicted hosts correspond well with advertised specificity
- no harmful genes discovered


Phage Cocktail data sets

PYO cocktail: which DB phage is most similar to a given bin?
→ reverse engineer MetaPhinder!


Conclusion

- MetaPhinder compares contigs to a phage database
- new version also compares sequences to a bacterial database
- flexibility: users can create their own database
- small number of sequenced phages in public databases
- phage therapy provides an alternative to antibiotics; a better understanding of phages is therefore important


Acknowledgments

Morten Nielsen (Professor, DTU)
Henrike Zschach (PhD student, DTU)
Julia Villaroel (PhD student, DTU)
Mette Voldby Larsen (CEO, GoSeqIt)
Ole Lund (Professor, DTU)
Frank Møller Aarestrup (Professor, DTU)


e-value

[Figure: AUC (y-axis, 0.7 to 1.0) by contig length bin in kbp (0.5-5, 5-25, 25-50, 50-100, 100+) for blastn %ANI with e-value cutoffs 0.05, 1 and 1e-10]


KmerFinder vs. blastn

[Figure: scatter plot of blastn %ANI (y-axis, 0 to 100) against KmerFinder q_cov (x-axis, 0.0 to 1.0), with two reference curves: %ANI = 100 * q_cov^(1/16) and %ANI = 100 * q_cov]
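One way to read the curve with the 1/16 exponent: if a k-mer is shared only when all of its bases match, and mismatches are spread independently, the fraction of shared k-mers is roughly identity^k, so identity is roughly q_cov^(1/k). The Python sketch below works this through for k = 16. Both the exponent interpretation and the k-mer length are assumptions drawn from the garbled legend, and the function names are illustrative.

```python
K = 16  # assumed k-mer length, inferred from the 1/16 exponent in the legend


def ani_from_kmer_coverage(q_cov: float, k: int = K) -> float:
    """Estimate %ANI from the fraction of query k-mers found in the database.

    Assumption: independent mismatches, so q_cov is about (identity) ** k.
    """
    if not 0.0 <= q_cov <= 1.0:
        raise ValueError("q_cov must lie between 0 and 1")
    return 100.0 * q_cov ** (1.0 / k)


def kmer_coverage_from_ani(percent_ani: float, k: int = K) -> float:
    """Inverse relation: expected k-mer coverage for a given %ANI."""
    return (percent_ani / 100.0) ** k


# A sequence at 90% identity is expected to share under a fifth of its 16-mers.
expected_q_cov = kmer_coverage_from_ani(90.0)
```

Under this reading, k-mer coverage drops much faster than nucleotide identity, which is why the linear curve %ANI = 100 * q_cov sits far below the exponent curve for diverged sequences.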


Top hit ANI vs. ANI all

[Figure: scatter plot of %ANI computed from all hits (y-axis, 0 to 100) against %ANI of the top hit (x-axis, 0 to 100), with marginal density plots on both axes]