Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characterization of the Intestinal Microbiome Dr. Rita R. Colwell University of Maryland, College Park Johns Hopkins University Bloomberg School of Public Health CosmosID Inc.
Jan 09, 2017
[Figure: timeline of methods, with axis ticks at 1960, 1965, 1970, 1975, 1985, 1996, 2000, and 2008: Culture - Numerical Taxonomy; Nucleic Acid (Base Composition); Density Gradient Hybridization; Fluorescent Antibody Microscopy; Polymerase Chain Reaction (PCR); Next Gen Sequencing; Metagenomics.]
A Timeline of Microbiology
First Numerical Approaches to Microbial Taxonomy
Solving the Mystery of the “Viable But Non-Culturable (VBNC)” Bacteria
JD Oliver, Journal of Microbiology, 2005 Feb; 43 Spec No: 93-100
Over 400 papers have appeared on the VBNC phenomenon and over 1000 papers describing the various aspects of it.
Bacteria Described to Enter The VBNC state
Cholera: A Global Disease
• Acute water-related diarrheal disease
• Seventh pandemic started in 1960s
• Occurs in more than 50 countries affecting approximately 7 million people
• Bengal Delta is known as “native homeland” of cholera outbreaks
• Since cholera bacteria exist naturally in aquatic habitats and there is evidence of new biotypes emerging, it is highly unlikely that cholera will be eradicated, but it clearly can be controlled by provision of safe drinking water.
The Copepod Vector
Copepods were found to carry approximately 10,000 to 50,000 CFU of V. cholerae per copepod
Source: The Institute for Genomic Research
Vibrio cholerae
Sequenced and published in 2000
Small Chromosome
Large Chromosome
Role of Microbiome in Health and Wellness
Acne; Alzheimer’s Disease; Antibiotic-associated Diarrhea; Atherosclerosis & Arthritis; Asthma/Allergies; Attention Deficit Hyperactivity Disorder; Autism; Autoimmune Diseases (Multiple Sclerosis, Lupus, Rheumatoid Arthritis); Bipolar Disorder; Cancer; Chronic Fatigue / Fibromyalgia; Coeliac Disease; Crohn’s Disease; Cystic Fibrosis; Dental Cavities; Depression and Anxiety; Diabetes Type 1 & 2; Epilepsy; Eczema; Irritable Bowel Syndrome; Gastric Ulcers; Malnutrition; Narcolepsy; Obesity; Parkinson’s Disease; Ulcerative Colitis
How It Works
[Figure: workflow diagram. Labels: Biological specimen; Community DNA; DNA Sequencing; Raw Sequence Reads; GenBook® Biomarker Matching; GenBook® Database; GenBook® AR/VF Library (TetR, CIPR, mecA, ctxA); Identified Bacteria; Microbial Identification & Pathogen Characterization.]
CosmosID for Automated Metagenomics in the 21st Century
CosmosID Analyzes Microbial DNA sequences
• Growing proprietary database of 65,000 microbial genomes (bacteria, viruses, fungi and parasites)
• In minutes, our software solution delivers unrivaled specificity and sensitivity
• Microbial Identification at subspecies/strain level
• Relative abundance of the microbial community
• Presence of antibiotic resistance and virulence factors
• Sequencing platform agnostic
• Can handle both short read and long read NGS data
• Illumina, ThermoFisher, Pacific Biosciences and Oxford Nanopore
• Commercial products (software and databases are well maintained and continuously updated)
Infectious Disease – Rapid Evolution
§ Previously recognized pathogens are evolving faster.
§ New, potentially dangerous pathogens are emerging every year.
§ Nosocomial and mixed microbial infections are dramatically increasing.
§ Many acute infectious diseases have unknown or poorly known etiology
§ Resident microflora in health and wellness
Source: Clinical Infectious Diseases 2013;57(S3):S139–70
[Figure: classifier benchmark. Two bar-chart panels show F1-score (Presence / Absence), axis 0.0 to 1.0, for the HC1 and ds.soil datasets. Three further panels show percent (0 to 100) at subspecies, species, and genus level; legend: F1-score, precision, recall, AUPR; panel labels A to D. Tools shown: BlastMegan (filtered, filtered liberal), CLARK (M1 default, M4 spaced), CLARK-S, CosmosID (unfiltered and filtered), DiamondMegan, Kraken (unfiltered and filtered), LMAT, MetaPhlAn, NBC, OneCodex (abundance and counts, unfiltered and filtered), PhyloSift (unfiltered, filtered, 90pct).]
CosmosID Performance
Unpublished Data:
• 34 datasets of varying complexity and diversity
• 12 tools
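For readers unfamiliar with the presence/absence metrics named in the benchmark panels, here is a minimal sketch of how precision, recall, and F1-score are commonly computed for a set of called taxa against a known community. The taxon sets are made-up illustration values, and the function name `presence_absence_scores` is hypothetical; this is not the benchmark's own code.

```python
def presence_absence_scores(truth, predicted):
    """Precision, recall, and F1 for called taxa versus a known taxon set."""
    truth, predicted = set(truth), set(predicted)
    true_positives = len(truth & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: four taxa truly present, three taxa called by a tool.
truth = {"Vibrio cholerae", "Escherichia coli",
         "Shigella flexneri", "Salmonella enterica"}
calls = {"Vibrio cholerae", "Escherichia coli", "Klebsiella pneumoniae"}

precision, recall, f1 = presence_absence_scores(truth, calls)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a tool that reports many spurious taxa is penalized even when it recovers every true member.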
Collaboration with Chris Mason, Weill Cornell Medicine
Accuracy of Relative Abundance
[Figure: two box-plot panels of Percent Difference of Estimated to True Abundance, one for BioPool & NARG1 samples (axis −100 to 700) and one for HC LC samples (axis −100 to 800); panel group labels: Synthetic Datasets, Biological Datasets. Tools shown: BLAST-Megan, BLAST-Megan-Liberal, CLARK, CLARK-S, CosmosID-filtered, CosmosID, Diamond, Kraken-filtered, Kraken, LMAT, MetaPalette, MetaPalette-Specific, Metaphlan2, NBC, OneCodex-filtered, OneCodex, Phylosift-filtered, Phylosift.]
Detection is one problem; abundance is much harder
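The quantity on the abundance plots' axis, percent difference of estimated to true abundance, can be sketched as below. The formula used here, 100 × (estimated − true) / true, is an assumption about how the axis was computed; the abundance values and the function name `percent_difference` are hypothetical.

```python
def percent_difference(estimated, true):
    """Per-taxon percent difference of estimated to true relative abundance.

    A taxon that is present in `true` but missing from `estimated`
    is treated as estimated at 0, which yields -100.
    """
    return {
        taxon: 100.0 * (estimated.get(taxon, 0.0) - true_value) / true_value
        for taxon, true_value in true.items()
    }


# Hypothetical mock community (true) and one tool's estimate.
true_abundance = {"taxon_A": 0.50, "taxon_B": 0.30, "taxon_C": 0.20}
estimated_abundance = {"taxon_A": 0.40, "taxon_B": 0.45}

for taxon, diff in percent_difference(estimated_abundance, true_abundance).items():
    print(f"{taxon}: {diff:+.1f}%")
```

Under this definition a missed taxon scores −100 while overestimates are unbounded above, so the error distribution is asymmetric.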
Sample to Analysis with CLC + CosmosID Plugin
Biological Specimens (e.g., stool, CSF)
Sequencing
WGS fasta or fastq file
CosmosID CLC Plugin
[Diagram labels: X, Y, Z; BEST MATCH]
• Microbial Identification (subspecies & strain)
• Antibiotic Resistance
• Virulence Factors
• Relative Abundance
• Bacteria, fungi, protists, viruses
Further analysis using CLC
• Functional
• Mapping
• Assembly
Curated Genome Databases
§ Most comprehensive and largest curated databases (>65,000 genomes)
§ Organized as phylogenetic trees
§ Two types of biomarkers:
  § Unique to the organism, and
  § Shared across the phylogenetic lineage in the tree
[Figure label: Protists]
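The distinction between unique and shared biomarkers can be illustrated with a toy k-mer sketch. This is only an illustration under stated assumptions: the slides do not say how the biomarkers are derived, so the use of fixed-length k-mers, the tiny made-up sequences, and the helper names `kmers` and `unique_and_shared` are all hypothetical, and reverse complements are ignored for simplicity.

```python
def kmers(seq, k):
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_and_shared(genomes, clade, k):
    """Return (unique k-mers per genome, k-mers shared by the whole clade only).

    genomes: dict of name -> sequence; clade: names forming one lineage.
    """
    sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    unique = {}
    for name, own in sets.items():
        others = set().union(*(s for n, s in sets.items() if n != name))
        unique[name] = own - others
    inside = set.intersection(*(sets[n] for n in clade))
    outside = set().union(*(s for n, s in sets.items() if n not in clade))
    return unique, inside - outside


# Made-up toy sequences, not real genomes.
genomes = {
    "strain_a": "ACGTACGGA",
    "strain_b": "ACGTACCTA",
    "outgroup": "TTGCATGCA",
}
unique, shared = unique_and_shared(genomes, clade=["strain_a", "strain_b"], k=4)
print(sorted(unique["strain_a"]))  # markers found only in strain_a
print(sorted(shared))              # markers common to the clade, absent outside it
```

In this toy setting, a read matching a shared marker supports the lineage as a whole, while a read matching a unique marker supports one specific strain.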
CosmosID Use Cases
Some Applications:
Microbiome Research
Clinical Metagenomics
Emerging and Re-emerging Pathogen Discovery
Polymicrobial Infection Dynamics
Hospital-associated Infections
Outbreak Investigation
Food Safety
Functional Food
Human Microbiome
Home Microbiome
Subway Microbiome
Animal Microbiome
Pharmaceuticals R&D
Oil Metagenome
Environmental Screening
Cosmetics
Clinical Trial
Analyzed >30K Biological Samples
History/Cholera
Metagenomics
Microbiomes in Health and Disease: Microbiome Analysis of Acute Diarrheal Patients Compared with Healthy Individuals
Microbiome of Acute Diarrheal Patients Compared with Healthy Individuals
In collaboration with the National Institute of Cholera and Enteric Diseases (NICED), Calcutta, India
Total # of NICED samples: 74
  Indian Healthy Control (HC): 20
  Sick with Unknown Etiology (UE): 28
  Sick with Known Etiology (KE): 26
Healthy Human Microbiome Project (HMP): 20
Enteric Pathogens Monitored By NICED
Bacteria: Vibrio cholerae, Vibrio parahaemolyticus, Vibrio fluvialis, Aeromonas spp., Campylobacter jejuni, Campylobacter coli, Shigella, Salmonella, Escherichia coli
Viruses: Rotavirus, Adenovirus, Norovirus, Sapovirus, Astrovirus
Parasites: Giardia lamblia, Cryptosporidium parvum, Entamoeba histolytica, Blastocystis hominis
A Subpopulation is Overrepresented in Diarrheal Patients Compared to Healthy Individuals
γ-Proteobacteria – Firmicutes – Bacteroidetes
[Figure: family-level heatmap of taxa (rows) across samples (columns), with sample clusters labeled DIARRHEAL PATIENTS and HEALTHY INDIVIDUALS; group legend: HC, HMP, KE, UE. Sample identifiers (N2_GY*, N2_CSN*, N3_IDH_*, N3_NICED_*, HMP_SRS*) and individual taxon labels are omitted.]
Many pathogens can readily be identified from disease patients
[Figure: heatmap for Escherichia coli. Rows are samples labeled by group (HC, KE, UE; legend: Known Etiology, Unknown Etiology) and bracketed as Unknown Etiology, Known Etiology, and Healthy Controls. Columns fall into two sections, "Identify Organism" (E. coli strain references and one Shigella reference) and "Characterize for Accessory Genes" (gene markers such as ampC, cat, lta, sta, aggR, papC). Axis labels: org, Sample.]
Unknown Etiology samples predominantly contain members of the E. coli super family.
Microbiome of Healthy People in India Different From That of Western Europeans
[Figure: NICED PCA, Bray-Curtis distance; axis ticks run from -1 to 1 in steps of 0.25. Points are colored by Treatment Group: HMP, Healthy Control (HC), Unknown Etiology (UE), Known Etiology (KE).]
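As a reminder of what the Bray-Curtis distance behind this ordination measures, here is a minimal sketch of the standard dissimilarity between two abundance profiles. The family-level counts are hypothetical and the function name `bray_curtis` is chosen for illustration; the slides do not show the analysis code.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles (dicts).

    0 means identical profiles; 1 means no taxa in common.
    """
    taxa = set(u) | set(v)
    numerator = sum(abs(u.get(t, 0) - v.get(t, 0)) for t in taxa)
    denominator = sum(u.get(t, 0) + v.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0


# Hypothetical family-level read counts for two stool samples.
sample_a = {"Bacteroidaceae": 40, "Lachnospiraceae": 30, "Enterobacteriaceae": 30}
sample_b = {"Bacteroidaceae": 10, "Enterobacteriaceae": 80, "Aeromonadaceae": 10}

print(bray_curtis(sample_a, sample_b))
```

Computing this value for every pair of samples gives the distance matrix that an ordination then projects onto two or three axes.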
Number of Individuals with AMR genes present in microbiome
[Figure: number of individuals (axis ticks 0, 4.8, 9.6, 15) with AMR genes present, by class: beta-lactamase, tetracycline, sulphonamide, rifampicin, quinolone, fosfomycin, nitroimidazole, phenicol, macrolide, trimethoprim, aminoglycoside; groups: Unknown Etiology, Known Etiology, Healthy or Asymptomatic Control.]
Genes which match at > 50% coverage.
HMP samples had no genes present which matched at this level of coverage.
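The counting behind such a chart can be sketched as follows: keep only gene hits above the coverage cutoff, then count distinct individuals per resistance class. The hit records, the tuple layout, and the function name `individuals_per_class` are hypothetical; only the > 50% coverage threshold comes from the slide.

```python
from collections import defaultdict


def individuals_per_class(hits, min_coverage=0.5):
    """Count distinct individuals carrying at least one gene per AMR class.

    hits: iterable of (sample_id, amr_class, coverage_fraction) tuples.
    Only hits with coverage strictly above min_coverage are counted.
    """
    carriers = defaultdict(set)
    for sample_id, amr_class, coverage in hits:
        if coverage > min_coverage:
            carriers[amr_class].add(sample_id)
    return {amr_class: len(samples) for amr_class, samples in carriers.items()}


# Hypothetical hits: two tetracycline genes in S1 count as one individual,
# and the 0.48-coverage hit in S2 falls below the cutoff.
hits = [
    ("S1", "tetracycline", 0.92),
    ("S1", "tetracycline", 0.61),
    ("S2", "tetracycline", 0.48),
    ("S2", "beta-lactamase", 0.75),
    ("S3", "beta-lactamase", 0.55),
]
print(individuals_per_class(hits))
```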
Predominance of genes related to carbohydrate metabolism
Functional analysis
[Figure: Average Abundance (%), axis ticks 5, 10, 15, of functional groups, in two panels: Functional Groups Level 1 Abundance > 5% and Functional Groups Level 1 Abundance < 5%. Level 1 groups broken down into sub-categories: Amino Acids and Derivatives; Carbohydrates; Cell Wall and Capsule; Cofactors, Vitamins, Prosthetic Groups, Pigments; DNA Metabolism; Membrane Transport; Protein Metabolism; unclassified. Other Level 1 groups listed include Cell Division and Cell Cycle; Fatty Acids, Lipids, and Isoprenoids; Iron acquisition and metabolism; Motility and Chemotaxis; Nucleosides and Nucleotides; Respiration; RNA Metabolism; Stress Response; and Virulence, Disease and Defense.]
Qiagen Functional analysis – Reinforces the Community Composition
Beta-diversity analysis
Permutation analysis (significance of clustering)
Differential abundance analysis
Evaluation of differentially abundant function
[Figure: two panels, "Clustering based on Pfam" and "Clustering based on Gene Ontology (GO)", each comparing HMP and Indian healthy samples.]
Summary
• Microbial communities found in the healthy volunteers suggest that the microbiome of healthy humans of Indian descent is markedly different from that of humans of Western European descent.
• The Indian population may tolerate low numbers of pathogenic microorganisms that might indicate a “disease state” in people of Western European descent.
• Metadata revealed that patients who exhibited profound watery diarrhea carried in their microbiome pathogens primarily of the Escherichia coli complex, namely pathogenic E. coli and Shigella species.
• Multiple pathogens can readily be identified from disease patients.
• The microbial community of the Indian population encodes antibiotic resistance genes at an alarming rate.
• Functional analysis of the Indian microbiome indicates a predominance of carbohydrate metabolism genes.
• Overabundance of cyto-/hemolysis genes observed in unknown etiology samples helps explain the diseased state.
Pilot Studies Underway: NGS based (culture free) direct detection
Study Area | Sample Type
Wound and Surgical Infections | Tissues, aspirates, swabs
Infective Endocarditis | Cardiac valves
Broad Range Pathogen Detection | CSF and biopsies
HCAP | BAL specimens
Necrotizing Fasciitis | Tissues: muscle, lung, liver
Neonatal Sepsis | Blood, CSF, urine
Orthopedic Infection | Pure isolates
Cystic Fibrosis and UTI | Stool and urine

Study Area | Sample Type
Healthcare Associated Infections: strain subtyping and molecular epidemiology | Hospital isolates
HAI: infection control and source tracking | Isolates and biofilms
Broad Range Pathogen Detection | Blood; ulcer, isolates
Empyema | Pleural effusion
Neutropenic Infection & Aseptic Meningitis | Blood, CSF, throat swab, etc.
Strain ID and Sub-typing | Clinical isolates
Prosthetic Joint Infection | Tissue, swab
Lyme Disease | Blood, CSF
CosmosID is working with the top research hospitals in the United States and Europe
Enabling End to End Microbiome Research with Great Partners
Sample Collection & Biobanking
DNA Isolation
Sequencing
High Resolution Microbial Characterization and Identification
Functional Analysis
Sequencing Partners
Sample
Action
How CosmosID Works
Reference database
Metagenomic Sample
Unique and Shared Regions Identified
Sample Matched with Database
Identification
Staphylococcus aureus subsp aureus USA300 TCH959
Propionibacterium acnes KPA171202
Enterococcus faecalis OG1RF