Cell Host & Microbe, Volume 20 Supplemental Information Stable Engraftment of Bifidobacterium longum AH1206 in the Human Gut Depends on Individualized Features of the Resident Microbiome María X. Maldonado-Gómez, Inés Martínez, Francesca Bottacini, Amy O'Callaghan, Marco Ventura, Douwe van Sinderen, Benjamin Hillmann, Pajau Vangay, Dan Knights, Robert W. Hutkins, and Jens Walter

Cell Host & Microbe, Volume 20

Supplemental Information

Stable Engraftment of Bifidobacterium longum AH1206 in the Human Gut Depends on Individualized Features of the Resident Microbiome

María X. Maldonado-Gómez, Inés Martínez, Francesca Bottacini, Amy O'Callaghan, Marco Ventura, Douwe van Sinderen, Benjamin Hillmann, Pajau Vangay, Dan Knights, Robert W. Hutkins, and Jens Walter

Supplemental Figures

Figure S1 [plots not reproduced]. One panel per symptom: Abdominal pain; Heartburn; Acid regurgitation; Sucking sensations in the stomach; Nausea and vomiting; Borborygmus; Abdominal distension; Eructation; Increased flatus; Passage of stools; Loose stools; Hard stools; Urgent need for defecation; Feeling of incomplete defecation. Y-axis: symptom score (min 0, max 3). X-axis: BL, 7 days and 14 days of consumption, 8 days of TOP. Series: Placebo, Probiotic.

Figure S2 [plots not reproduced]. (A, B) Ordination plots with axes MDS1 and MDS2; legends list the groups (baseline, placebo, probiotic) and the subjects (A, B, C, D, E, F, G, H, I, J, K, M, O, P, Q, R, S, T, U, V, W, X); plot labels: 16S rRNA gene tags, WMS. (C) Relative abundance (%) of Clostridiales Incertae Sedis XIII, Anaerovorax, and Collinsella at BL, Pla, and Pro; the letters a, b, and ab mark significance groups. (D) Relative abundance (%) at BL, Pla, and Pro for Bifidobacterium, B. longum, B. pseudocatenulatum 99%, B. adolescentis, and B. adolescentis 99%. (E) B. longum species relative abundance (%) in persisters at baseline, Placebo, Probiotic, and TOP 28d.

Figure S3 [plots not reproduced]. (A) Collinsella aerofaciens (p 0.0357, m.i. 1.27). (B) Eggerthella lenta (p 0.0497, m.i. 2.67). (C) Bifidobacterium longum by 16S rRNA - OTU (p 0.0022, m.i. 6.12), 16S rRNA - MED, and qRT PCR. (D) Percentage of sequences (y-axis, 0 to 12) against percentage of identity to AH1206 16S rRNA (x-axis bins: 100 %, >99 %, >98 %, >97 %, >95 %, >93 %), in two plots labeled Baseline and Placebo. Series: Non-persisters, Persisters. Significance marks (*, **, n.s.) appear above the compared groups.

Figure S4 [plots not reproduced]. Panels A to F are titled g_2015, g_1077, g_1467, g_1830, g_1136, and g_0877. Y-axis: Log10 (gene copy number/gram of feces), 0 to 12. X-axis groups: NP_b and P_b (baseline), NP_28 and P_28 (28 days TOP), NP_200 and P_200 (~200 days TOP). Significance marks as printed, left to right: g_2015: *, n.s., n.s.; g_1077: *, n.s., n.s.; g_1467: *, n.s., n.s.; g_1830: **, n.s., n.s.; g_1136: *, n.s., n.s.; g_0877: **, *, **.

Figure S5 [gene-cluster diagrams not reproduced]. (A) Melezitose/Raffinose cluster: B. longum subsp. longum AH1206 (BL1206_2005 to BL1206_2019, with g 2015 marked) aligned with B. breve UCC2003 (Bbr_1855 to Bbr_1869); scale marks at 4000, 8000, 12000, and 16000. Percentages shown between homologous genes: 90.3%, 98.9%, 98.7%, 100%, 100%, 99.8%, 100%, 99.3%, 98.2%, 98.5%, 99.6%, 98.6%, 98.1%, 90.0%, 91.3%. (B) Galactan cluster: B. longum subsp. longum AH1206 (BL1206_0408 to BL1206_0413, with g 0877 marked) aligned with B. breve UCC2003 (Bbr_0417 to Bbr_0422). Percentages shown between homologous genes: 97.1%, 90.9%, 88.1%, 96.9%, 97.7%, 97.1%. Gene color key: Alpha glucosidase, Alpha galactosidase, Beta galactosidase, Endogalactanase, Transporter, Transcriptional regulator, Hypothetical protein, Transposase, Other.


Supplemental Figure Legends

Figure S1. Reported symptoms before, during and after placebo and AH1206 administration (Related to Figure 1). Symptoms as assessed with a Gastrointestinal Symptom Rating Scale (GSRS) obtained during the double-blind, placebo-controlled, crossover study. Results for each evaluated symptom are shown in individual plots. Data are represented as mean and standard error. No significant differences were detected.

Figure S2. Impact of administration of B. longum subsp. longum strain AH1206 on the composition of the resident fecal microbiota (Related to Figure 2). Non-metric multidimensional scaling (NMDS) plots of beta diversity of the fecal microbiota of all subjects during baseline, probiotic treatment, and placebo, color coded by (A) treatment and (B) individual. Administration of AH1206 did not change the structure of the overall bacterial communities of the fecal microbiota (p = 0.999), which continued to cluster by subject independently of treatment (p = 0.005), as determined by NMDS analysis and permutational multivariate analysis of variance (Adonis PERMANOVA) based on the Bray-Curtis distance. (C) Some taxa had significantly different proportions between the probiotic and placebo treatments (14 days of consumption) (the family Clostridiales Incertae Sedis XIII and the genus Anaerovorax) or compared with the baseline (the genus Collinsella), but none was consistently different in both comparisons (Figure S2C), suggesting that these shifts were not induced specifically by AH1206. Significantly different groups are represented by different letters. (D) Average relative abundance of the genus Bifidobacterium, and of OTUs that represent the dominant Bifidobacterium species present in the fecal samples. Although AH1206 was detectable at 7.7 ± 0.9 log10 cells per gram of feces during consumption, the abundances of total bifidobacteria, B. longum, and other Bifidobacterium species were not different between probiotic treatment, placebo, and baseline. (E) Relative abundance of B. longum species (OTU11) in the 8 subjects classified as persisters, before, during and after treatment consumption. Although significant changes in Bifidobacterium species were not observed, there was a trend (p < 0.1) for the species B. longum to increase in persisters. Data are represented as mean and standard error in (C)-(E). All figures are based on samples collected at 14 days of probiotic or placebo consumption, and at baseline immediately before probiotic treatment.

Figure S3. Differences in community membership between persisters and non-persisters (Related to Figure 3). Relative abundance of bacterial species (determined as OTUs) that differ most between persisters and non-persisters. (A) Collinsella aerofaciens, (B) Eggerthella lenta, and (C) Bifidobacterium longum. Abundance of B. longum was quantified by three independent methods: 16S rRNA OTU clustering (OTU 11), 16S rRNA oligotyping (Oligo 3648), and qPCR with B. longum species-specific primers. Data in boxplots show the median and the first and third quartiles (hinges); whiskers are extended to the most extreme values within 1.5 x IQR of the upper and lower hinges. (D) Relative abundance of sequences in the sequence dataset obtained from fecal samples with decreasing similarity to the 16S rRNA sequence of B. longum subsp. longum AH1206 in persisters and non-persisters. Data are represented as mean and standard error. p ≤ 0.01 [**], p < 0.05 [*].

Figure S4. Absolute quantification of representative genes that discriminate persisters and non-persisters by qPCR throughout the duration of the trial (Related to Figure 5). Gene abundance as gene copy number per gram of feces in persisters and non-persisters at baseline, 28 days, and around 200 days after consumption was discontinued. (A) g2015 - LacI-type response repressor; (B) g1077 - LacI-family transcriptional regulator; (C) g1467 and (D) g1830 - Permeases of the major facilitator superfamily (COG0477); (E) g1136 - ABC-type sugar transport system permease component; (F) g0877 - β-galactosidase. The bacterial strain used as reference for gene copy number (standard curve) was AH1206, with the exception of g0877, for which B. longum ATCC 15707 was used. Data are represented as mean and standard error. p < 0.001 [***], p ≤ 0.01 [**], p < 0.05 [*], n.s. not significant.
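The boxplot convention stated in the Figure S3 legend (median, quartile hinges, whiskers reaching the most extreme value within 1.5 x IQR of the hinges) can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function name `boxplot_stats` is hypothetical, and the quartile method (`method="inclusive"`, linear interpolation) is an assumption, since the legend does not say how the hinges were computed.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Return median, hinges, whisker ends and outliers for one group.

    Assumption: hinges are quartiles computed with linear interpolation
    over the observed range; the legend does not specify the method.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Made-up values for illustration only (not study data).
    print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50]))
```

With this convention, a single extreme value is drawn as an individual point rather than stretching the whisker.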


Figure S5. Gene clusters in Bifidobacterium longum subsp. longum AH1206 with high similarity to clusters in Bifidobacterium breve UCC2003 involved in the metabolism of plant-derived carbohydrates (Related to Figure 4). Comparison between two carbohydrate-utilization gene clusters of B. longum AH1206, which contain discriminative genes, with homologous gene clusters functionally characterized in B. breve UCC2003. (A) BLASTP-based gene comparisons between B. longum AH1206 and B. breve UCC2003 (e-value 0.0001 and 50 % of similarity over 50 % of protein length) revealed an orthologue of g2015 and its surrounding region (BL1206_2015-2019) in a highly homologous cluster in B. breve UCC2003 (similarity above 90 % in BLASTP alignments). This cluster has been functionally characterized as being involved in the degradation of melezitose (plus additional, as yet unknown carbohydrates of presumed plant origin). The orthologue of g2015 in B. breve is adjacent to a gene cluster, also conserved in AH1206, that is involved in the metabolism of the plant-derived oligosaccharides raffinose and stachyose (Bbr_1855-69) (O’Connell et al., 2014). (B) The orthologue of g0877 in the genome of AH1206 and its surrounding region (BL1206_0408-13) are highly homologous to a cluster in B. breve UCC2003 (similarity above 90 % in BLASTP alignments) involved in the utilization of galactan (Bbr_0417-22), a plant-derived galactose-containing polysaccharide (O’Connell Motherway et al., 2010). Both clusters show strain-to-strain variability in the B. breve and B. longum pangenome (20 % and 40 % of strains in B. breve and B. longum, respectively), potentially explaining why B. longum AH1206, a strain containing both clusters, can colonize in cases where other B. longum strains are present but do not possess these clusters.
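The BLASTP cut-offs quoted in the Figure S5 legend (e-value 0.0001, 50 % similarity over 50 % of protein length) amount to a simple filter on alignment hits. The sketch below is only an illustration: the `Hit` record, its field names, and the choice of inclusive comparisons are assumptions, and the example values are made up rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one BLASTP alignment (field names are assumptions)."""
    query: str
    subject: str
    evalue: float
    similarity_pct: float   # percent similarity over the aligned region
    aligned_length: int     # aligned residues of the query protein
    query_length: int       # full length of the query protein


def passes_thresholds(hit, max_evalue=1e-4, min_similarity=50.0, min_coverage=50.0):
    """Apply the three cut-offs from the legend; inclusive bounds are assumed."""
    coverage_pct = 100.0 * hit.aligned_length / hit.query_length
    return (
        hit.evalue <= max_evalue
        and hit.similarity_pct >= min_similarity
        and coverage_pct >= min_coverage
    )


if __name__ == "__main__":
    # Illustrative hits only.
    hits = [
        Hit("query_1", "subject_1", 1e-50, 97.1, 300, 320),  # kept
        Hit("query_2", "subject_2", 1e-2, 80.0, 300, 320),   # e-value too high
        Hit("query_3", "subject_3", 1e-20, 90.0, 100, 320),  # covers under 50 % of the protein
    ]
    for hit in hits:
        print(hit.query, passes_thresholds(hit))
```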


Supplemental Tables

Table S1. Related to Figure 2. Multilocus sequencing profiles of the original B. longum subsp. longum AH1206 strain and isolates recovered from subjects after ~200 days of persistence. Unknown sequences are denoted NA; these sequences were not obtained because of sequencing errors.

Strain | ST | clpC | fusA | gyrB | ileS | purF | rplB | rpoB
B. longum subsp. longum AH1206 | 115 | 19 | 27 | 27 | 37 | 27 | 9 | 50
isolate 1 - Subject A | 116 | 17 | 26 | 24 | 37 | 32 | 9 | 31
isolate 2 - Subject A | - | 17 | 26 | 24 | 37 | 32 | NA | 31
isolate 3 - Subject A | 117 | 17 | 26 | 24 | 37 | 32 | 1 | 31
isolate 4 - Subject A | 117 | 17 | 26 | 24 | 37 | 32 | 1 | 31
isolate 5 - Subject A | 117 | 17 | 26 | 24 | 37 | 32 | 1 | 31
isolate 1 - Subject H | 121 | 19 | 27 | 27 | 37 | 27 | 1 | 50
isolate 2 - Subject H | 121 | 19 | 27 | 27 | 37 | 27 | 1 | 50
isolate 3 - Subject H | - | 19 | 27 | 27 | 37 | 27 | NA | 50
isolate 4 - Subject H | 115 | 19 | 27 | 27 | 37 | 27 | 9 | 50
isolate 5 - Subject H | 115 | 19 | 27 | 27 | 37 | 27 | 9 | 50
isolate 1 - Subject I | 115 | 19 | 27 | 27 | 37 | 27 | 9 | 50
isolate 2 - Subject I | 115 | 19 | 27 | 27 | 37 | 27 | 9 | 50
isolate 3 - Subject I | 115 | 19 | 27 | 27 | 37 | 27 | 9 | 50
isolate 4 - Subject I | 115 | 19 | 27 | 27 | 37 | 27 | 9 | 50
isolate 5 - Subject I | 115 | 19 | 27 | 27 | 37 | 27 | 9 | 50
isolate 1 - Subject Q | 115 | 19 | 27 | 27 | 37 | 27 | 9 | 50
isolate 2 - Subject Q | 115 | 19 | 27 | 27 | 37 | 27 | 9 | 50
isolate 3 - Subject Q | 115 | 19 | 27 | 27 | 37 | 27 | 9 | 50
isolate 4 - Subject Q | 115 | 19 | 27 | 27 | 37 | 27 | 9 | 50
isolate 5 - Subject Q | 115 | 19 | 27 | 27 | 37 | 27 | 9 | 50

NA: No usable sequence.
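One way to read Table S1 is to compare each isolate's allele profile with that of the administered strain, locus by locus. The sketch below does this for four rows copied from the table; the function name `compare_to_reference` and the handling of NA (reported as unknown rather than counted as a mismatch) are illustrative assumptions, not a procedure described by the authors.

```python
LOCI = ["clpC", "fusA", "gyrB", "ileS", "purF", "rplB", "rpoB"]

# Allele numbers copied from Table S1; None stands for NA (no usable sequence).
REFERENCE = [19, 27, 27, 37, 27, 9, 50]  # B. longum subsp. longum AH1206
ISOLATES = {
    "isolate 1 - Subject A": [17, 26, 24, 37, 32, 9, 31],
    "isolate 1 - Subject H": [19, 27, 27, 37, 27, 1, 50],
    "isolate 3 - Subject H": [19, 27, 27, 37, 27, None, 50],
    "isolate 1 - Subject I": [19, 27, 27, 37, 27, 9, 50],
}


def compare_to_reference(profile, reference=REFERENCE):
    """Return (loci whose allele differs from the reference, loci with no sequence)."""
    differing = [
        locus
        for locus, allele, ref in zip(LOCI, profile, reference)
        if allele is not None and allele != ref
    ]
    unknown = [locus for locus, allele in zip(LOCI, profile) if allele is None]
    return differing, unknown


if __name__ == "__main__":
    for name, profile in ISOLATES.items():
        differing, unknown = compare_to_reference(profile)
        print(f"{name}: differs at {differing or 'no loci'}, unknown at {unknown or 'no loci'}")
```

For these rows, the Subject I isolate matches AH1206 at all seven loci, the first Subject H isolate differs only at rplB, and the Subject A isolate differs at five loci.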


Table S2. Related to Figure 3. List of genes that were significantly increased (p < 0.01) after 14 days of B. longum subsp. longum AH1206 consumption. Mean abundance and standard deviation before and after AH1206 consumption are also reported. Coverage of the best hit in the NCBI database is > 99% unless otherwise indicated.

gene ID | Function | Taxonomic group of the closest match in NCBI database (% homology) | p-value | Baseline relative abundance (mean ± SD) | Treatment relative abundance (mean ± SD)
g 045563 | Cell division protein Fic | Contig: B. longum (99%); Gene: B. longum (100%) | 3.11E-06 | 0.021 ± 0.076 | 0.164 ± 0.311
g 010923 | Anthranilate phosphoribosyltransferase (EC 2.4.2.18) | Contig and gene: Bifidobacterium longum (99%) | 0.0069 | 0.072 ± 0.167 | 0.186 ± 0.296
g 075375 | Von Willebrand factor, type A | Contig and gene: Bifidobacterium longum (99%) | 0.0071 | 0.095 ± 0.245 | 0.213 ± 0.361
g 054034 | COG1272: Predicted membrane protein hemolysin III homolog | Contig: B. longum (99%); Gene: B. longum (100%) | 3.75E-05 | 0.029 ± 0.104 | 0.207 ± 0.399
g 071483 | protein CapI | Contig: B. longum (99%); Gene: B. longum (100%, coverage 99%) | 0.0090 | 0.044 ± 0.098 | 0.341 ± 1.226
g 088569 | FIG00424610: hypothetical protein | Contig: B. longum (99%); Gene: B. longum (100%, coverage 99%) | 0.0082 | 0.053 ± 0.082 | 0.149 ± 0.197
g 092289 | FIG00425372: hypothetical protein, DNA-binding protein | Contig: B. longum (99%); Gene: B. longum (99%), and B. breve (99%) | 0.0072 | 0.055 ± 0.121 | 0.138 ± 0.189
g 069513 | FIG00426986: hypothetical protein | Contig: B. longum (99%, coverage 98%); Gene: B. longum (100%) | 0.0046 | 0.053 ± 0.203 | 0.101 ± 0.168
g 026887 | FIG00672354: hypothetical protein | Contig: B. longum (99%); Gene: B. longum (100%, 98% coverage), and B. angulatum (100%, 98% coverage) | 0.0092 | 0.048 ± 0.123 | 0.128 ± 0.202
g 069086 | Glucosyl-3-phosphoglycerate synthase (EC 2.4.1.266) | Contig: B. longum (99%); Gene: Bifidobacterium (100%), B. breve (79%), B. longum (76%) | 0.0002 | 0.025 ± 0.081 | 0.154 ± 0.290
g 112640 | Histone acetyltransferase HPA2 and related acetyltransferases | Contig: B. longum (99%); Gene: B. longum (99%) | 0.0022 | 0.049 ± 0.093 | 0.224 ± 0.336
g 081628 | LacI-type transcriptional regulator | Contig: B. longum (99%); Gene: B. longum (99%) | 0.0025 | 0.129 ± 0.364 | 0.326 ± 0.537
g 102524 | Membrane protein involved in the export of O-antigen and teichoic acid | Contig: B. longum (99%); Gene: B. longum (100%, 85% coverage) | 0.0044 | 0.081 ± 0.254 | 0.208 ± 0.386
g 119008 | Mobile element protein, integrase | Contig: B. longum (98%); Gene: B. longum (99%), and B. breve (94%) | 0.0069 | 0.086 ± 0.273 | 0.137 ± 0.283
g 005426 | Multiple sugar ABC transporter, membrane-spanning permease protein MsmF | Contig and gene: B. longum and B. breve (100%) | 0.0067 | 0.080 ± 0.134 | 0.255 ± 0.348
g 000364 | Multiple sugar ABC transporter, substrate-binding protein | Contig and gene: B. longum and B. breve (100%) | 0.0089 | 0.044 ± 0.077 | 0.155 ± 0.249
g 084249 | O-Glycosyl hydrolase family 30 | Contig: B. longum (99%); Gene: B. longum (99%) | 0.0088 | 0.040 ± 0.073 | 0.141 ± 0.229
g 020111 | Low molecular weight protein-tyrosine-phosphatase Wzb (EC 3.1.3.48) | Contig: B. longum (99%, 93% coverage); Gene: B. longum (99%) | 0.0091 | 0.134 ± 0.382 | 0.253 ± 0.403
g 018379 | Pyridoxal phosphate-dependent transferase | Contig: B. longum (99%); Gene: B. longum (99%) | 0.0020 | 0.041 ± 0.095 | 0.134 ± 0.205
g 107147 | Bifunctional P-protein, chorismate mutase/prephenate dehydratase | Contig: B. longum (98%); Gene: B. longum (99%), and B. breve (98%) | 0.0071 | 0.087 ± 0.173 | 0.231 ± 0.326
g 087183 | Glycosyltransferase family 1 | Contig: B. longum (99%); Gene: B. longum (99%) | 0.0091 | 0.031 ± 0.068 | 0.270 ± 1.004
g 063968 | Putative glycosyl hydrolase of unknown function (DUF1680) | Contig: B. longum (99%); Gene: B. longum (100%) | 0.0017 | 0.121 ± 0.384 | 0.286 ± 0.525
g 114280 | Signal transduction histidine kinase | Contig: B. longum (99%); Gene: B. longum (99%), and B. breve (99%) | 0.0049 | 0.047 ± 0.078 | 0.162 ± 0.247
g 118213 | Similar to Glutamate--cysteine ligase (EC 6.3.2.2) | Contig: B. longum (99%); Gene: B. longum (100%) | 0.0054 | 0.044 ± 0.076 | 0.145 ± 0.226
g 096374 | Thiazole biosynthesis adenylyltransferase ThiF | Contig: B. longum (99%); Gene: B. longum (99%) | 0.0092 | 0.134 ± 0.267 | 0.306 ± 0.446
g 015253 | Twin-arginine translocation protein TatA | Contig: B. longum (99%); Gene: B. longum (100%) | 0.0022 | 0.080 ± 0.255 | 0.202 ± 0.349
g 051132 | Twin-arginine translocation protein TatC | Contig: B. longum (99%); Gene: B. longum (99%, coverage 99%) | 0.0044 | 0.134 ± 0.438 | 0.287 ± 0.495
g 071776 | UDP-glucose dehydrogenase (EC 1.1.1.22) | Contig: B. longum (99%); Gene: B. longum (99%) | 0.0075 | 0.026 ± 0.058 | 0.206 ± 0.739
g 094305 | Uncharacterized protein family UPF0145 | Contig: B. longum (99%); Gene: B. longum (99%), and B. breve (98%) | 0.0010 | 0.054 ± 0.122 | 0.236 ± 0.438


Table S3. Related to Figure 3. Taxonomic groups that are significantly different between persisters and non-persisters before probiotic consumption (baseline sample). Mean abundance and standard deviation for each persistence group, p-values (Wilcoxon test), and mean importance (Random Forest) are reported.

RDP MultiClassifier

Taxa - Family level | p-value | Persisters relative abundance (%) (mean ± SD) | Non-persisters relative abundance (mean ± SD) | Mean importance (Random Forest, excluding outlier) | Mean importance (Random Forest, all subjects)
Coriobacteriaceae | 0.0243 | 2.89 ± 2.15 | 7.42 ± 4.84 | 2.5 | 3.17

Taxa - Genus level | p-value | Persisters relative abundance (%) (mean ± SD) | Non-persisters relative abundance (mean ± SD) | Mean importance (Random Forest, excluding outlier) | Mean importance (Random Forest, all subjects)
Collinsella | 0.0169 | 2.10 ± 2.20 | 6.70 ± 4.72 | 1.78 | 2.57
Eggerthella | 0.0229 | 0.41 ± 0.44 | 0.10 ± 0.15 | 3.86 | 3.30
Lachnospiracea incertae sedis | 0.0290 | 15.64 ± 3.49 | 10.78 ± 4.55 | 1.00 | 1.14

Operational Taxonomic Units (OTUs)

OTU | Phylum | Closest strain or deepest possible taxonomic assignation (% sequence homology) | Closest match in NCBI database (% sequence homology) | p-value | Persisters relative abundance (mean ± SD) | Non-persisters relative abundance (mean ± SD) | Mean importance (Random Forest, excluding outlier) | Mean importance (Random Forest, all subjects)
OTU 11 | Actinobacteria | Bifidobacterium longum subsp. longum YIT 12736 (100%); Bifidobacterium longum subsp. suis Su851 (100%) | AB924530.1 (100%); AB924532.1 (100%) | 0.0022 | 0.31 ± 0.51 | 1.99 ± 1.78 | 6.12 | 5.46
OTU 52 | Actinobacteria | Eggerthella lenta DSM 2243 (100%) | KP233451.1 (100%) | 0.0497 | 0.46 ± 0.44 | 0.09 ± 0.15 | 2.67 | 1.96
OTU 1 | Actinobacteria | Collinsella aerofaciens D9-82 (KP233454.1) (100%) | KJ808028.1 (100%) | 0.0357 | 2.02 ± 2.35 | 6.04 ± 4.51 | 1.27 | 1.52
OTU 44 | Firmicutes | Clostridiales (order) (98%) | AB969374.1 (100%) | 0.0347 | 0.19 ± 0.51 | 0.68 ± 1.35 | 0.80 | 1.07
OTU 12 | Proteobacteria | Proteobacteria (phylum) (53%) | KF095417.1 (100%) | 0.0138 | 1.84 ± 1.22 | 1.38 ± 3.08 | 0.78 | 1.93
OTU 10 | Firmicutes | Lachnospiracea incertae sedis (genus) (99%) | JX230487.1 (100%) | 0.0205 | 5.80 ± 4.22 | 3.21 ± 3.46 | 0.04 | 0.48
OTU 102 | Firmicutes | Ruminococcaceae (family) (96%) | JQ186929.1 (100%) | 0.0383 | 0.00 ± 0.00 | 0.12 ± 0.23 | < 0.01 | < 0.01
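The p-values in Table S3 come from a Wilcoxon test comparing two groups of subjects. As a minimal sketch of how such a two-group rank-sum comparison works, the code below computes the rank sum of one group and an exact two-sided p-value by enumerating all possible group assignments. It is an illustration under stated assumptions: the function name `rank_sum_exact` is hypothetical, the exact enumeration is a choice made here (the document does not say whether an exact or approximate p-value was used), and the example values are made up rather than taken from the study.

```python
from itertools import combinations


def rank_sum_exact(x, y):
    """Wilcoxon rank-sum statistic for group x and an exact two-sided p-value.

    Ties receive the average of the ranks they span (midranks). The p-value
    is the fraction of all possible assignments of len(x) ranks whose rank
    sum lies at least as far from its expectation as the observed one.
    Intended for small samples only, since every assignment is enumerated.
    """
    pooled = sorted(list(x) + list(y))
    midrank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    all_ranks = [midrank[v] for v in pooled]
    n = len(x)
    observed = sum(midrank[v] for v in x)
    expected = n * (len(pooled) + 1) / 2
    deviation = abs(observed - expected)
    extreme = total = 0
    for combo in combinations(all_ranks, n):
        total += 1
        if abs(sum(combo) - expected) >= deviation - 1e-12:
            extreme += 1
    return observed, extreme / total


if __name__ == "__main__":
    # Made-up relative abundances for two small groups (not study data).
    group_a = [0.1, 0.2, 0.3]
    group_b = [0.9, 1.5, 2.0, 2.4]
    w, p = rank_sum_exact(group_a, group_b)
    print(f"rank sum = {w}, exact two-sided p = {p:.4f}")
```

Because the test uses only ranks, it makes no assumption that relative abundances are normally distributed, which suits skewed microbiome data.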


Table S4. Related to Figures 3-5. List of genes within B. longum contigs identified as discriminant between persisters and non-persisters at baseline. Mean gene abundance and standard deviation for each persistence group, p-values (Wilcoxon test), and mean importance (Random Forest) are reported. Genes coding for carbohydrate metabolism related proteins are in blue. Abundances of genes that are underrepresented in non-persisters are in red. Discriminant genes that have equivalents in the B. longum AH1206 genome are indicated, and their percentage of amino acid identity is specified in parentheses after the function description.

Function | gene ID | Persisters relative abundance (mean ± SD) | Non-persisters relative abundance (mean ± SD) | p-value | Mean importance (Random Forest)
COG0477: Permeases of the major facilitator superfamily (AH1206, 100%) | g 1830 | 0.029 ± 0.055 | 0.268 ± 0.605 | 0.0080 | 2.27
RNA methyltransferase (AH1206, 99.1%) | g 0009 | 0.018 ± 0.025 | 0.098 ± 0.128 | 0.0028 | 2.18
FIG00534652: hypothetical protein MFS transporter (99%, 99% WP 032684830.1) (AH1206, 100%) | g 0323 | 0.018 ± 0.018 | 0.066 ± 0.028 | 0.0016 | 2.03
Beta-galactosidase (EC 3.2.1.23) | g 0877 | 0.003 ± 0.009 | 0.090 ± 0.104 | 0.0013 | 1.72
3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) (AH1206, 100%) | g 1048 | 0.036 ± 0.052 | 0.122 ± 0.047 | 0.0047 | 1.68
Amino acid transporter (AH1206, 100%) | g 1568 | 0.004 ± 0.010 | 0.045 ± 0.071 | 0.0040 | 1.30
Dipeptide-binding ABC transporter, periplasmic substrate-binding component (TC 3.A.1.5.2) (AH1206, 100%) | g 0143 | 0.014 ± 0.025 | 0.061 ± 0.048 | 0.0179 | 1.26
LacI family transcriptional regulator (AH1206, 100%) | g 1077 | 0.038 ± 0.043 | 0.090 ± 0.037 | 0.0166 | 1.16
dTDP-4-dehydrorhamnose 3,5-epimerase (EC 5.1.3.13) / dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133); required for exopolysaccharide precursor (AH1206, 99.9%) | g 0411 | 0.128 ± 0.131 | 0.022 ± 0.026 | 0.0219 | 1.14
FIG00425886: hypothetical protein membrane protein (86%, 100% WP 007054290.1) (AH1206, 100%) | g 0791 | 0.026 ± 0.034 | 0.074 ± 0.043 | 0.0166 | 1.06
LacI-type transcriptional regulator (AH1206, 100%) | g 1833 | 0.023 ± 0.032 | 0.091 ± 0.054 | 0.0035 | 1.04
Ribosome protection-type tetracycline resistance related proteins, group 2 / Elongation factor G (AH1206, 100%) | g 0186 | 0.029 ± 0.027 | 0.066 ± 0.035 | 0.0184 | 1.03
FIG00423810: hypothetical protein ABC-type sugar transport system, periplasmic component (86%, 99% CCK35170.1) (AH1206, 99.7%) | g 1574 | 0.012 ± 0.012 | 0.026 ± 0.017 | 0.0274 | 1.01
Sugar kinase 2, ROK family (AH1206, 100%) | g 1627 | 0.020 ± 0.021 | 0.056 ± 0.043 | 0.0306 | 0.99
ATP synthase delta chain (EC 3.6.3.14) (AH1206, 99.9%) | g 0555 | 0.021 ± 0.017 | 0.053 ± 0.029 | 0.0037 | 0.94
dTDP-4-dehydrorhamnose 3,5-epimerase (EC 5.1.3.13) / dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133); required for exopolysaccharide precursor (AH1206, 99.9%) | g 0404 | 0.116 ± 0.120 | 0.017 ± 0.024 | 0.0104 | 0.90
ATP-dependent DNA helicase RecG (EC 3.6.1.-) (AH1206, 100%) | g 1196 | 0.032 ± 0.026 | 0.082 ± 0.044 | 0.0070 | 0.89
LacI-type response repressor (AH1206, 100%) | g 2015 | 0.022 ± 0.040 | 0.053 ± 0.042 | 0.0463 | 0.80
Succinate dehydrogenase iron-sulfur protein (EC 1.3.99.1) (AH1206, 100%) | g 0381 | 0.017 ± 0.021 | 0.044 ± 0.030 | 0.0344 | 0.74
Beta-glucoside bgl operon antiterminator, BglG family (AH1206, 100%) | g 0267 | 0.021 ± 0.027 | 0.075 ± 0.058 | 0.0067 | 0.66
Riboflavin kinase (EC 2.7.1.26) / FMN adenylyltransferase (EC 2.7.7.2) (AH1206, 100%) | g 0189 | 0.030 ± 0.033 | 0.081 ± 0.045 | 0.0113 | 0.65
ABC transporter permease protein (AH1206, 100%) | g 1422 | 0.034 ± 0.035 | 0.075 ± 0.054 | 0.0429 | 0.55
Succinate dehydrogenase flavoprotein subunit (EC 1.3.99.1) (AH1206, 100%) | g 0016 | 0.031 ± 0.023 | 0.060 ± 0.033 | 0.0182 | 0.54
FIG00424097: hypothetical protein Pilus assembly protein TadE (99%, 100% WP 007054681.1) (AH1206, 100%) | g 1826 | 0.016 ± 0.022 | 0.065 ± 0.067 | 0.0138 | 0.53
Transmembrane component STY3231 of energizing module of queuosine-regulated ECF transporter (AH1206, 100%) | g 1705 | 0.026 ± 0.056 | 0.056 ± 0.040 | 0.0329 | 0.52
Multi antimicrobial extrusion protein (Na(+)/drug antiporter), MATE family of MDR efflux pumps (AH1206, 100%) | g 1272 | 0.005 ± 0.010 | 0.021 ± 0.018 | 0.0411 | 0.47
Protein containing transglutaminase-like domain, putative cysteine protease (AH1206, 100%) | g 1746 | 0.040 ± 0.040 | 0.096 ± 0.050 | 0.0285 | 0.41
Proposed amino acid ligase found clustered with an amidotransferase; involved in peptidoglycan biosynthesis (AH1206, 100%) | g 1971 | 0.041 ± 0.034 | 0.081 ± 0.040 | 0.0438 | 0.39
SSU ribosomal protein S19p (S15e) (AH1206, 100%) | g 1977 | 0.131 ± 0.120 | 0.037 ± 0.033 | 0.0326 | 0.39
Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4) (AH1206, 100%) | g 1042 | 0.027 ± 0.031 | 0.070 ± 0.053 | 0.0400 | 0.39
FIG00672330: hypothetical protein PF12502 family protein (98%, 100%, EIJ30731.1) (AH1206, 99.6%) | g 1367 | 0.042 ± 0.046 | 0.008 ± 0.014 | 0.0345 | 0.36
Inosine-uridine preferring nucleoside hydrolase (EC 3.2.2.1) (AH1206, 100%) | g 0712 | 0.026 ± 0.024 | 0.069 ± 0.067 | 0.0239 | 0.36
Glycosyltransferase (EC 2.4.1.-) (AH1206, 100%) | g 1179 | 0.033 ± 0.039 | 0.076 ± 0.039 | 0.0363 | 0.34
Hypothetical sugar kinase in cluster with indigoidine synthase indA, PfkB family of kinases (AH1206, 100%) | g 0733 | 0.021 ± 0.042 | 0.057 ± 0.048 | 0.0276 | 0.34
Substrate-specific component YkoE of thiamin-regulated ECF transporter for HydroxyMethylPyrimidine (AH1206, 100%) | g 0759 | 0.022 ± 0.027 | 0.071 ± 0.047 | 0.0097 | 0.26
Beta-galactosidase (EC 3.2.1.23) | g 0222 | 0.005 ± 0.013 | 0.036 ± 0.036 | 0.0291 | 0.26
Glucosamine--fructose-6-phosphate aminotransferase [isomerizing] (EC 2.6.1.16) (AH1206, 100%) | g 1621 | 0.037 ± 0.031 | 0.073 ± 0.029 | 0.0153 | 0.25
Transposase IS66 (AH1206, 100%) | g 0788 | 0.013 ± 0.026 | 0.189 ± 0.501 | 0.0487 | 0.22
ABC-type sugar transport system, permease component (AH1206, 100%) | g 1136 | 0.023 ± 0.027 | 0.076 ± 0.062 | 0.0355 | 0.21
Activator of (R)-2-hydroxyglutaryl-CoA dehydratase (AH1206, 100%) | g 1019 | 0.024 ± 0.023 | 0.045 ± 0.018 | 0.0477 | 0.20
putative ATP-binding protein (AH1206, 100%) | g 2054 | 0.009 ± 0.016 | 0.051 ± 0.061 | 0.0216 | 0.20
Acetylornithine deacetylase/Succinyl-diaminopimelate desuccinylase and related deacylases; Dipeptidase (AH1206, 100%) | g 1409 | 0.015 ± 0.027 | 0.041 ± 0.029 | 0.0408 | 0.19
Alpha amylase catalytic subunit, Pseudo-gene (AH1206, 100%) | g 1268 | 0.016 ± 0.029 | 0.062 ± 0.056 | 0.0449 | 0.18
DNA ligase (EC 6.5.1.2) (AH1206, 100%) | g 0191 | 0.030 ± 0.025 | 0.061 ± 0.032 | 0.0153 | 0.18
Phosphoribosylformylglycinamidine cyclo-ligase (EC 6.3.3.1) (AH1206, 100%) | g 1516 | 0.018 ± 0.022 | 0.054 ± 0.033 | 0.0179 | 0.17
Segregation and condensation protein B (AH1206, 100%) | g 0268 | 0.017 ± 0.021 | 0.052 ± 0.051 | 0.0429 | 0.17
Glycine cleavage system H protein (AH1206, 100%) | g 0805 | 0.026 ± 0.032 | 0.057 ± 0.035 | 0.0494 | 0.16
FIG00427083: hypothetical protein Alpha-amylase (99%, 93% WP 013140039.1) | g 0811 | 0.030 ± 0.040 | 0.003 ± 0.007 | 0.0395 | 0.15
COG0477: Permeases of the major facilitator superfamily (AH1206, 100%) | g 1467 | 0.031 ± 0.059 | 0.138 ± 0.214 | 0.0463 | 0.15
DNA polymerase I (EC 2.7.7.7) (AH1206, 100%) | g 1843 | 0.034 ± 0.031 | 0.066 ± 0.024 | 0.0239 | 0.13
Pullulanase, Pseudo-gene (AH1206, 100%) | g 0146 | 0.021 ± 0.026 | 0.050 ± 0.032 | 0.0481 | 0.06
Uracil phosphoribosyltransferase (EC 2.4.2.9) (AH1206, 100%) | g 2048 | 0.029 ± 0.041 | 0.100 ± 0.082 | 0.0274 | 0.05
GH5 family, putative beta-glucosidase, substrate unknown (AH1206, 100%) | g 1756 | 0.009 ± 0.027 | 0.043 ± 0.046 | 0.0442 | 0.05
Phosphoribosyl-ATP pyrophosphatase (EC 3.6.1.31) (AH1206, 100%) | g 0036 | 0.013 ± 0.020 | 0.061 ± 0.088 | 0.0359 | 0.04
Sulfate and thiosulfate import ATP-binding protein CysA (EC 3.6.3.25) (AH1206, 100%) | g 1520 | 0.033 ± 0.029 | 0.078 ± 0.040 | 0.0104 | 0.02
FIG00426247: hypothetical protein (Uncharacterized protein) (AH1206, 100%) | g 0957 | 0.010 ± 0.030 | 0.046 ± 0.047 | 0.0442 | 0.02
Probable conserved integral membrane alanine and leucine rich protein / ABC permease, possibly (metal) ion uptake (AH1206, 97.1%) | g 1451 | 0.013 ± 0.020 | 0.039 ± 0.032 | 0.0494 | 0.02
LacI-type transcriptional regulator (AH1206, 100%) | g 1805 | 0.012 ± 0.017 | 0.052 ± 0.041 | 0.0202 | 0.01
Multiple sugar ABC transporter, membrane-spanning permease protein MsmF (AH1206, 100%) | g 1768 | 0.036 ± 0.032 | 0.086 ± 0.070 | 0.0311 | 0.01
FIG00424607: hypothetical protein (endonuclease 99%, 99% WP 038426361.1) (AH1206, 100%) | g 0752 | 0.022 ± 0.034 | 0.060 ± 0.040 | 0.0481 | 0.00
FIG00672110: hypothetical protein (membrane protein 99%, 100%, WP 029680156.1) (AH1206, 100%) | g 0018 | 0.032 ± 0.036 | 0.083 ± 0.052 | 0.0251 | < 0
Nicotinate phosphoribosyltransferase (EC 2.4.2.11) (AH1206, 100%) | g 0088 | 0.038 ± 0.033 | 0.076 ± 0.038 | 0.0200 | < 0
Ribose ABC transport system, permease protein RbsC (TC 3.A.1.2.1) (AH1206, 100%) | g 0108 | 0.038 ± 0.037 | 0.105 ± 0.096 | 0.0339 | < 0
Protein serine/threonine phosphatase PrpC, regulation of stationary phase (AH1206, 100%) | g 0238 | 0.027 ± 0.043 | 0.078 ± 0.038 | 0.0229 | < 0
Mobile element protein | g 0309 | 0.006 ± 0.013 | 0.029 ± 0.033 | 0.0487 | < 0
Formate--tetrahydrofolate ligase (EC 6.3.4.3) (AH1206, 100%) | g 0374 | 0.040 ± 0.032 | 0.069 ± 0.032 | 0.0435 | < 0
Periplasmic beta-glucosidase (EC 3.2.1.21) (AH1206, 100%) | g 1284 | 0.007 ± 0.020 | 0.029 ± 0.027 | 0.0442 | < 0
COG family: cell wall-associated hydrolases (invasion-associated proteins); PFAM ID: NLPC P60 (AH1206, 100%) | g 1318 | 0.048 ± 0.041 | 0.111 ± 0.099 | 0.0368 | < 0
Macrolide 2'-phosphotransferase; aminoglycoside phosphotransferase (AH1206, 100%) | g 1582 | 0.036 ± 0.033 | 0.067 ± 0.026 | 0.0239 | < 0


Table S5. Related to Figures 3-5. Top-ranked features able to predict B. longum subsp. longum AH1206 persistence, based on their abundance before consumption. The predictiveness of each feature was estimated by calculating the mean importance score when the values of each variable were permuted at random (1000 iterations). Overall, functional B. longum genes were better predictors of AH1206 persistence than bacterial taxa, since 19 of the top 20 predictive features were genes.

Feature | Mean importance | Best hit of taxa or gene sequence to BLASTX [taxa] (identity, coverage) (accession number) | Annotation according to InterPro
g 1830 | 2.04 | Permease of the major facilitator superfamily protein [Bifidobacterium longum] (99%, 100%) (ALE35936.1) | Major facilitator superfamily
OTU 11 | 1.89 | Bifidobacterium longum subsp. longum YIT 12736 (100%, 100%) (AB924530.1); Bifidobacterium longum subsp. suis Su851 (100%, 100%) (AB924532.1) |
g 0009 | 1.83 | RNA methyltransferase [Bifidobacterium longum] (100%, 99%) (WP_049135472.1) | Nucleic acid-binding, OB-fold
g 0320 | 1.69 | tRNA (guanine(37)-N(1))-methyltransferase [Bifidobacterium longum] (100%, 99%) (WP_015512427.1) | tRNA (guanine-N1-)-methyltransferase, bacteria
g 0877 | 1.67 | beta-galactosidase [Bifidobacterium longum] (98%, 99%) (WP_014485870.1) | Beta-galactosidase C-terminal
g 1048 | 1.44 | MULTISPECIES: 3-oxoacyl-ACP reductase [Bifidobacterium] (100%, 100%) (WP_008783504.1) | Glucose/ribitol dehydrogenase (Family comprises 3-oxoacyl-ACP reductase)
g 1574 | 1.31 | hypothetical protein [Bifidobacterium longum] (100%, 99%) (WP_023658436.1) | None predicted
g 0509 | 1.3 | MULTISPECIES: Rrf2 family transcriptional regulator [Bifidobacterium] (100%, 99%) (WP_007052318.1) | Transcription regulator Rrf2-type
g 1568 | 1.22 | amino acid transporter [Bifidobacterium adolescentis] (99%, 99%) (WP_039775744.1) | Amino acid/polyamine transporter I
g 1196 | 1.19 | ATP-dependent DNA helicase RecG [Bifidobacterium longum] (100%, 99%) (WP_011068431.1) | Helicase superfamily 1/2, ATP-binding domain
g 1896 | 1.18 | ATP-dependent Clp protease ATP-binding subunit ClpC [Bifidobacterium longum] (100%, 100%) (WP_014485461.1) | Double Clp-N motif; Clp, N-terminal; AAA+ ATPase domain; P-loop containing nucleoside triphosphate hydrolase
g 0824 | 1.1 | Dihydroxy-acid dehydratase [Bifidobacterium longum E18] (99%, 99%) (ESV34022.1) | Dihydroxy-acid/6-phosphogluconate dehydratase
g 0188 | 1.04 | transcriptional regulator [Bifidobacterium longum] (99%, 98%) (WP_007054012.1) | Periplasmic binding protein-like I
g 1833 | 1.02 | LacI family transcriptional regulator [Bifidobacterium longum] (100%, 99%) (WP_047379820.1) | Transcription regulator HTH, LacI; Lambda repressor-like, DNA-binding domain; Periplasmic binding protein-like I
g 0185 | 0.91 | amino acid ABC transporter substrate-binding protein [Bifidobacterium longum] (100%, 99%) (WP_007051584.1) | Periplasmic metal-binding protein Tp34-type
g 1077 | 0.88 | LacI family transcriptional regulator [Bifidobacterium longum] (100%, 99%) (WP_007051644.1) | Transcription regulator HTH, LacI; Lambda repressor-like, DNA-binding domain; Periplasmic binding protein-like I
g 0555 | 0.84 | ATP synthase F1, delta subunit [Bifidobacterium longum subsp. infantis CCUG 52486] (100%, 99%) (EEQ54416.1) | ATPase, OSCP/delta subunit
g 0411 | 0.82 | dTDP-4-dehydrorhamnose reductase [Bifidobacterium ruminantium] (99%, 99%) (WP_026647254.1) | dTDP-4-dehydrorhamnose 3,5-epimerase-related RmlC-like jelly roll fold; RmlD-like substrate binding domain
g 0791 | 0.80 | Membrane protein [Bifidobacterium longum] (99%, 86%) (WP_032682371.1) | None predicted
g 1627 | 0.79 | MULTISPECIES: NagC family transcriptional regulator [Bifidobacterium] (99%, 100%) (WP_032685212.1); Sugar kinase [B. longum subsp. longum KACC 91563] (99%, 99%) (AEI97249.1) | ROK family protein


Supplemental Experimental Procedures

Experimental design. A double-blind, placebo-controlled, human crossover study was conducted with the primary objective of determining the tolerance and safety of B. longum AH1206 (isolated from a fecal sample of a human infant). The treatment (10¹⁰ cells/day of B. longum AH1206 in maltodextrin) and placebo (maltodextrin) were supplied by Mead Johnson Nutrition and given to subjects after randomization, specifically coded for each subject. A total of 24 healthy human adults were recruited. One subject dropped out of the study for reasons unrelated to the experiment and was excluded from data analysis. Thus, the study was completed by 23 individuals (11 males and 12 females) who were between 22 and 38 years of age. The study was conducted over two 7-week test periods. Each period included a baseline, a 14-day treatment and a 28-day test of persistence (TOP) (Figure 1A). Subjects were advised to consume the powders suspended in cold or room temperature water. Although feces is a poor surrogate of the small intestinal microbiota, where probiotics may exert their main effect, it is used in the vast majority of probiotic trials as it allows the non-invasive analysis of a large number of samples and provides a reliable determination of bacterial persistence in the gastrointestinal tract. Fecal samples were collected during baseline (day 0), at days 7 and 14 (last day of treatment) of the treatment period, and at days 4, 8, 15, and 28 of the TOP period. Two additional samples were collected 11 and 20 weeks after the study was completed to test for long-term persistence (the sample corresponding to subject D at 20 weeks could not be collected for reasons unrelated to the study).

To evaluate tolerance of AH1206 intake, subjects provided weekly reports of gastrointestinal symptoms (bowel movement, stool consistency, discomfort, flatulence, abdominal pain, and bloating) over the first 4 weeks of each test period using a standardized scale from 0 (none, normal, good well-being) to 3 (severe symptoms and discomfort) (Rattanaprasert et al., 2014).

Culture and bacterial DNA extraction from fecal samples. Fecal samples were processed within two hours of defecation. A ten-fold dilution of each sample in sterile phosphate-buffered saline (PBS, pH 7.0) was immediately frozen at -80°C for DNA extractions. Total bifidobacteria were cultured anaerobically (48 h) from selected samples (see below) on Bifidobacterium Selective Iodoacetate Mupirocin (BSIM) agar (Lewis et al., 2015) for enumeration of viable bacteria and colony typing. Two different methods for DNA extraction were used. In order to optimize DNA yield from bifidobacteria for qPCR and 16S rRNA gene tag sequencing, the method described by Martínez et al. (2015) was used. For whole metagenomic shotgun sequencing, DNA was extracted with the QIAamp DNA Stool Mini Kit (QIAGEN, Germantown, MD, USA) with additional mechanical and enzymatic cell-lysis steps as described by Martínez et al. (2010).

Quantification of strain AH1206 and the species B. longum by quantitative real-time PCR (qPCR). The genome sequence of B. longum subsp. longum AH1206 was determined using a Roche 454 FLX Titanium instrument through a commercial sequencing service provider (Eurofins MWG Operon, Germany). Following assembly by Newbler v 2.3 (http://454.com/products/analysis-software/index.asp), remaining gaps and sequencing ambiguities were resolved by Sanger sequencing of the corresponding PCR products. This resulted in a single contig sequence of approx. 2.42 Mb (the accession number for the genome sequence reported in this paper is NCBI: CP016019).

A PCR system specific for strain AH1206 was developed by scientists from Alimentary Health based on the genome sequence of AH1206 using the ProbeFinder software version 2.5 and the Roche Universal Probe Library Assay Design Centre (Roche Diagnostics). The strain-specific primers (see table below) and probe (probe #89; Roche, Universal ProbeLibrary, Indianapolis, IN, USA) combination were selected to target a gene unique to strain AH1206 based on database searches.

List of primers used for absolute abundance quantification of B. longum subsp. longum AH1206 and B. longum species

Bacterial group | Primer sequences (5'-3') | Product size (nt) | Annealing Temp (°C) | Source | Detection chemistry
B. longum AH1206 | CAA TGG GTC CAT CAA CCA C; CGA ACC TCG GTA ATG TCC AG | 60 | 61 | Authors | TaqMan®-based (#89 Roche Universal Probe library)
B. longum group | TTC CAG TTG ATC GCA TGG TCT TCT; GGC TAC CCG TCG AAG CCA CG | 110 | 65 | Rinne et al. (2005) | SYBR® Green-based


Primer/probe specificity was validated in silico by performing a BLAST search against the NCBI database (hits had <50% identity and e-value higher than 0.0001). qPCR was performed using a Mastercycler Realplex2 (Eppendorf AG, Hamburg, Germany). Each PCR was performed in 25-µl volumes using QuantiFast Probe PCR master mix (Qiagen, Germantown, MD, USA), strain-specific primers (0.8 µM), #89 Roche Universal probe (100 nM), and DNA template (1 µl). Specificity of the primer/probe combination was validated with DNA from twelve related Bifidobacterium strains (see table below).

Bifidobacterium strains used in the validation of qPCR system
• B. longum subsp. longum (ATCC® 15707™)
• B. adolescentis IVS-1
• B. longum longum JDM301
• B. adolescentis ATCC 15703
• B. longum longum F8
• B. adolescentis L2-32
• B. longum DJO10A
• Bifidobacterium sp. 113
• B. longum ATCC 15697
• Bifidobacterium sp. 12_1_47BFAA
• B. longum infantis UC272
• Bifidobacterium sp. HMLN14

DNA amplification was obtained only for DNA isolated from B. longum AH1206. To test this PCR system in the context of complex fecal bacterial communities, DNA from 29 human fecal samples was tested. These samples included the baseline samples (i.e. before AH1206 administration) as well as five other human fecal samples from an independent study. One study participant had positive amplification (data not shown). The PCR program that resulted in the most selective amplification after optimization of the system consisted of an initial denaturation step of 4 min at 94°C, followed by 40 cycles of denaturation at 94°C for 45 sec, annealing at 61°C for 45 sec, and extension at 72°C for 45 sec. For absolute quantification, a standard curve was prepared using a ten-fold dilution series of DNA isolated from an overnight liquid culture (14 h) of B. longum subsp. longum AH1206 for which cell numbers were determined by quantitative culture. The same standard curve was used for all absolute quantification experiments (taxa and genes) throughout the study. The detection limit of the qPCR system was determined by spiking fecal samples from five human subjects with a range of B. longum AH1206 cells to achieve 10³, 10⁴, 10⁵, 10⁶, 10⁷, and 10⁸ cells per gram of feces. This analysis demonstrated that the detection limit for the system was around 10⁴ cells/gram.

Bifidobacterium longum was quantified with species-specific primers (Rinne et al., 2005) using the RealMasterMix SYBR real-time master mix containing SYBR green (5 Prime Inc., Gaithersburg, MD). Primer sequences and further details for both the B. longum species and B. longum AH1206 are shown in the table above. The same standard curve described above was used for the species-specific PCR.

Typing of isolates. To determine whether AH1206 remains viable during transit through the digestive tract, colonies obtained on BSIM agar plates were typed by strain-specific PCR. Thirty colonies per subject from fecal samples obtained during the two baseline periods and the treatment and placebo periods were picked with sterile toothpicks and transferred into individual wells of a 96-well plate. These isolates were microwaved for 1 min at 1.6 kW to lyse cells and release DNA, and qPCR analysis was carried out as described above. Colonies that showed concentrations equal to or higher than 10⁶ CFU were considered B. longum AH1206 positive. To confirm persistence of strain AH1206 for longer time periods, fecal samples collected 20 weeks after study completion were plated on BSIM agar after serial dilution, and 30 colonies per subject were picked. Each colony was split and transferred to MRS broth, as well as typed by strain-specific colony PCR. Isolates that typed positive were purified by successive liquid and plate cultures, and positive reactions with strain-specific PCR were confirmed. Isolates were then typed by Multi-Locus Sequence Analysis (MLSA) as described by Delétoile et al. (2010). Seven housekeeping genes, clpC (coding for a protease), fusA (GTP-binding elongation factor EF-G), gyrB (DNA gyrase, subunit B), ileS (isoleucyl-tRNA synthetase), purF (amidophosphoribosyltransferase), rplB (50S ribosomal subunit protein L2) and rpoB (beta-subunit of RNA polymerase), were sequenced by methods originally described by Ventura et al. (2006) and Santos and Ochman (2004) and modified by Delétoile et al. (2010) (see table below). Amplification products were purified (QIAquick PCR Purification Kit, Qiagen Inc., Maryland) and sequenced by a commercial provider (Eurofins MWG Operon, Huntsville, AL). Allele types and sequence types (ST) were assigned using the tools available in the Bifidobacterium MLST sequence type database (www.pasteur.fr/mlst). Sequences that did not match alleles in the database were added to the list of alleles and assigned new numbers. The accession number for the raw sequences reported in this section is NCBI SRA: SRS1539366, and SRS1578670 to SRS1578688.
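The standard-curve quantification and the 10⁶ threshold used for colony typing can be illustrated with a minimal Python sketch. It assumes a linear relationship between the quantification cycle (Ct) and log10 cell number; the dilution-series Ct values, the function names, and the use of Ct as the readout are illustrative assumptions and are not taken from the study.

```python
# Hypothetical ten-fold dilution series (log10 cells) and made-up Ct values.
STANDARD_LOG10_CELLS = [3, 4, 5, 6, 7, 8]
STANDARD_CT = [33.0, 29.7, 26.4, 23.1, 19.8, 16.5]


def fit_standard_curve(log10_cells, ct_values):
    """Least-squares fit of Ct = slope * log10(cells) + intercept."""
    n = len(log10_cells)
    mean_x = sum(log10_cells) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_cells)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_cells, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def cells_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate cell numbers from a Ct value."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """PCR efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


def is_ah1206_positive(ct, slope, intercept, threshold=1e6):
    """Apply the colony-typing cutoff (>= 10^6) to an interpolated value."""
    return cells_from_ct(ct, slope, intercept) >= threshold


SLOPE, INTERCEPT = fit_standard_curve(STANDARD_LOG10_CELLS, STANDARD_CT)
```

With these invented values the fitted slope is about -3.3, which corresponds to an efficiency close to 100%; a real curve would be fitted to the measured dilution series.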

List of primers used in the study for MLST sequencing

Gene | Function | Primer sequences (5'-3') | Product size (nt) | Annealing Temp (°C)
clpC | Coding for a protease | GAG TAC CGC AAG TAC ATC GAG; CAT CCT CAT CGT CGA ACA GGA AC | 600 | 60
fusA | GTP-binding elongation factor EF-G | ATC GGC ATC ATG GCY CAC ATY GAT; CCA GCA TCG GCT GMA CRC CCT T | 666 | 60
gyrB | DNA gyrase, subunit B | AGC TGC ACG CBG GCG GCA AGT TCG; GTT GCC GAG CTT GGT CTT GGT CTG | 627 | 60
ileS | Isoleucyl-tRNA synthetase | ATC CCG CGY TAC CAG ACS ATG; CGG TGT CGA CGT AGT CGG CG | 489 | 60
purF | Amidophosphoribosyltransferase | CAT TCG AAC TCC GAC ACC GA; GTG GGG TAG TCG CCG TTG | 591 | 60
rplB | 50S ribosomal subunit protein L2 | GGA CAA GGA CGG CRT SCC SGC CAA; ACG ACC RCC GTG CGG GTG RTC GAC | 357 | 60
rpoB | Beta-subunit of RNA polymerase | GGC GAG CTG ATC CAG AAC CA; GCA TCC TCG TAG TTG TAS CC | 501 | 60
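The allele and sequence type assignment described for the MLSA typing can be sketched as a small lookup structure. This is only an illustration of the bookkeeping (known sequences keep their allele number, unmatched sequences receive the next free number, and each distinct seven-gene profile maps to one ST); the class and method names are hypothetical, and the curated numbering of the Bifidobacterium MLST database is not reproduced here.

```python
# The seven housekeeping genes used for typing, in a fixed profile order.
GENES = ("clpC", "fusA", "gyrB", "ileS", "purF", "rplB", "rpoB")


class MlstDatabase:
    """Toy allele/ST registry (hypothetical; not the Pasteur database API)."""

    def __init__(self):
        # Per gene: sequence -> allele number.
        self.alleles = {gene: {} for gene in GENES}
        # Allelic profile (tuple of seven allele numbers) -> sequence type.
        self.sequence_types = {}

    def allele_number(self, gene, sequence):
        """Return the allele number, registering a new allele if unmatched."""
        known = self.alleles[gene]
        key = sequence.upper()
        if key not in known:
            known[key] = len(known) + 1
        return known[key]

    def sequence_type(self, isolate):
        """Return the ST for an isolate given as {gene: sequence}."""
        profile = tuple(self.allele_number(g, isolate[g]) for g in GENES)
        if profile not in self.sequence_types:
            self.sequence_types[profile] = len(self.sequence_types) + 1
        return self.sequence_types[profile]
```

Two isolates with identical sequences at all seven loci receive the same ST, which is the criterion that matters when checking whether a recovered isolate is indistinguishable from AH1206 at these loci.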

Sequencing of 16S rRNA gene tags and whole-metagenome sequencing (WMS). The fecal microbiota in samples collected during the baseline (of the AH1206 treatment period), on the last day of AH1206 consumption, on the last day of placebo consumption, and at 28 days of TOP was characterized by next-generation sequencing as described by Krumbeck et al. (2015). Briefly, the V5-V6 region of the 16S rRNA gene was amplified by PCR with primers F (5'-CGACRRCCATGCANCACCT-3') and R (5'-RGGATTAGATACCC-3'). Primer R contained unique barcode sequences that support pooling of different samples. DNA concentration was quantified by qPCR and adjusted to the required concentration; amplicons were paired-end sequenced with MiSeq (Illumina) technology at the University of Minnesota Genomics Center. Sequence quality was visualized with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). The 3' ends of the reads were trimmed using FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) to remove low-quality bases, and paired reads were assembled using the merge-illumina-pairs tool from Illumina-utils (https://github.com/meren/illumina-utils), at which step primers were removed and a Q30 check was enforced. Subsequently, the number of reads was standardized across samples by subsampling 20,000 sequences/sample. UPARSE v7.0.1001 (Edgar, 2010) was used to further process reads (including chimera removal) and assign them to Operational Taxonomic Units (OTUs) at 98% identity. Representative sequences of each OTU were classified using the RDP Classifier MultiClassifier v1.1 tool (Wang et al., 2007). Sequences were also analyzed using Minimum Entropy Decomposition (MED) (Eren et al., 2015), which allowed us to investigate the diversity of ecological units at a higher resolution than OTUs do. In parallel, all quality-controlled reads were taxonomically classified using the RDP MultiClassifier tool (phylum to genus levels).
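The read-depth standardization step (subsampling 20,000 sequences per sample) can be sketched as random sampling without replacement. This is a minimal illustration and not the tool used in the study; the function names, the fixed random seed, and the decision to drop samples with fewer reads than the target depth are assumptions made for the example.

```python
import random


def subsample_reads(reads, depth=20000, seed=0):
    """Draw `depth` reads without replacement; None if the sample is too shallow."""
    if len(reads) < depth:
        return None  # assumption: shallow samples are excluded, not padded
    rng = random.Random(seed)
    return rng.sample(reads, depth)


def subsample_all(samples, depth=20000, seed=0):
    """Standardize every sample in {name: list_of_reads} to the same depth."""
    kept = {}
    for name in sorted(samples):
        picked = subsample_reads(samples[name], depth, seed)
        if picked is not None:
            kept[name] = picked
    return kept
```

Sampling without replacement keeps every retained read a real observation, so the relative abundances in the subsample are an unbiased draw from the original library.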
WMS was used to characterize the microbiome in all subjects during baseline, on the last day of consumption of the probiotic, and in the 20-week follow-up sample. The Nextera DNA Sample Prep Kit (Illumina, San Diego, CA, USA) was used for library preparation. For each sample, 50 ng of fecal DNA was tagged and fragmented simultaneously using the Nextera transposome, and DNA was purified using the Zymo™ Purification Kit (ZR-96 DNA Clean & Concentrator™-5, Zymo Research Corporation, Irvine, CA, USA). Fragments were amplified, and indices and common adapters were added in a limited PCR program. In a final purification step, DNA was cleaned using AMPure XP beads (Agencourt AMPure XP, Beckman Coulter Genomics, Indianapolis, IN, USA) following the manufacturer's instructions. The concentration of the purified DNA was determined with the dsDNA High Sensitivity Qubit® fluorometer kit (Invitrogen, Grand Island, NY, USA), and samples were pooled to a final concentration of 5 ng/µl per library. For each library, eight samples were pooled in order to achieve around 15 million reads/lane. An Illumina HiSeq2500 platform was used to perform 100 bp paired-end read sequencing at the David H. Murdock Research Institute (Kannapolis, NC, USA).

Read pre-processing consisted of quality control based on Q values (Q > 20), removal of adaptor fragments, control for a minimum length (> 45 nt), and removal of orphan reads and human contaminant reads. Sequence reads were quality filtered, with reads having >90% of bases with a minimum Q score of 20 being acceptable, using the quality filter script from FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Sequences that contained fractions (at least 7 nucleotides) of the adaptor were identified with the find_adaptor script of Biopieces (https://code.google.com/p/biopieces/), and these fragments were removed using clip_adaptor (Biopieces). Sequences less than 45 nucleotides long and orphan sequences were removed using custom scripts, and sequences were identified as human DNA by aligning each read pair to the latest assembly version (GRCh38) of the human genome (ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/) with Bowtie 2 (Langmead and Salzberg, 2012). A pair was discarded when both of its reads matched the reference database with a maximum separation of 1000 (-X 1000). The sample corresponding to subject Q at day 14 of AH1206 treatment was not successfully amplified, so no WMS data are available for this sample.

The MetaPhlAn pipeline (Segata et al., 2013), which relies on unique clade-specific marker genes, was used to assign reads to taxonomic units and obtain the composition of the fecal microbiome. Furthermore, to characterize the composition of gene functions within the communities, two different approaches were employed using either single or assembled reads. Sequence sets were directly processed through the MG-RAST pipeline to predict and annotate functions (SEED SubSystems) (Meyer et al., 2008). In parallel, reads corresponding to the baseline samples of all subjects were assembled with Ray Meta (v2.3) (Boisvert et al., 2012) with a k-mer size of 27.
Short reads were then mapped against the assembled contigs (>5 Kbp) (≥ 98% sequence similarity) with Bowtie 2 (-N 1 and default parameters) (Langmead and Salzberg, 2012). Protein prediction and annotations were done using MyRast, implemented in Anvi'o (Eren et al., 2015), which also provided the mean coverage (relative abundance) of each predicted gene across samples. Anvi'o was further used to generate, visualize, and quantify genomic bins in the metagenomic dataset. The estimated relative abundance of genes belonging to all members of the community was used to evaluate changes in the gut microbiota induced by the treatment. In addition, reads of the 3 sequenced samples of each subject (baseline, 14 days of probiotic consumption, and 20-week follow-up) were co-assembled (Ray Meta, v2.3, Boisvert et al., 2012), and genomic bins identified as B. longum in Anvi'o were extracted to construct a non-redundant database of this species. Sequence sets from baseline samples were mapped against this B. longum database (with Bowtie 2), and reads with ≥ 98% sequence similarity were further analyzed and visualized with Anvi'o. The latter approach was used to evaluate differences in functional composition between persisters and non-persisters. WMS was also used to study the relative abundance of AH1206-specific genes over time. For this purpose, we identified genes unique to AH1206 that were undetected at baseline by mapping baseline metagenomic reads to the genome of AH1206 (using Bowtie 2 with 100% homology). Subsequently, reads of all samples for which we had WMS data (baseline, 14 days of consumption and ~200 days TOP) were mapped against the AH1206-specific genes. Mean coverage for each sample and visualization were obtained with Anvi'o. Raw 16S gene tags and whole-metagenome sequences used for analysis were deposited in the NCBI SRA database.
The accession number for the raw sequences reported in this section is NCBI SRA: SRS1483393, SRS1489974, SRS1490014, SRS1490016, SRS1490018, SRS1490022, SRS1490042, SRS1490064, SRS1499910, SRS1501559, SRS1531233, SRS1531244, SRS1531255, SRS1531266, SRS1531221 to SRS1531224, and SRS1531273 to SRS1531276 for baseline samples; SRS1531225 to SRS1531232, SRS1531234 to SRS1531243, and SRS1531245 to SRS1531248 for last day of AH1206 consumption; SRS1531249 to SRS1531254, SRS1531256 to SRS1531265 and SRS1531267 to SRS1531272 for last day of placebo consumption; and SRS1563115, SRS1563117, SRS1563118, SRS1563120, SRS1563122, SRS1563124, SRS1563125, SRS1563127, SRS1563129 to SRS1563141 for 20 week follow-up samples.

Ecosystem characteristics. Community evenness was assessed using the Gini index (Gini, 1912). To evaluate α-diversity (diversity within samples), three different measurements were used: total OTUs, Shannon, and Simpson indexes. Dissimilarity among samples (β-diversity) was computed based on Bray-Curtis and Morisita-Horn metrics. The β-diversity measurements were also used to assess temporal stability of an individual's fecal microbiota (after removing the OTU or MED node that accounts for AH1206). Community stability was calculated as the average of the distance (β-diversity) between all consecutive time points. Gini index, alpha and beta diversity were calculated using QIIME tools v 1.8.0 (Caporaso et al., 2011).
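The ecosystem measurements above can be written out as a short Python sketch. It is not the QIIME implementation: the Shannon index here uses the natural logarithm, the Simpson index is computed as 1 minus the sum of squared proportions, and the Gini index uses the sorted-rank formula, all of which are assumptions of this example, as are the function names. Stability follows the definition in the text, the mean Bray-Curtis distance between consecutive time points.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over non-zero features."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson index expressed as 1 - sum(p^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini(counts):
    """Gini index of abundance inequality (0 means perfectly even)."""
    values = sorted(counts)
    n = len(values)
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2.0 * weighted / (n * sum(values)) - (n + 1.0) / n


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


def stability(time_series):
    """Mean Bray-Curtis distance between all consecutive time points."""
    steps = [bray_curtis(time_series[i], time_series[i + 1])
             for i in range(len(time_series) - 1)]
    return sum(steps) / len(steps)
```

Lower values of `stability` indicate a more stable community, because the quantity is an average distance rather than a similarity.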


Bray-Curtis distances between samples were calculated and visualized by nonmetric multidimensional scaling (NMDS). The analysis was run in the R statistical program (R version 3.0.1, R Core Team, 2013): distances were calculated with the function vegdist (package vegan 2.0-10) (Oksanen et al., 2012), and the NMDS was plotted with the ggplot2 package.

Absolute quantification of functional genes by qPCR. To confirm findings obtained with the metagenomic analysis, qPCR systems (using SYBR detection chemistry) were developed for representative genes that discriminated between persisters and non-persisters. Gene sequences were extracted with Anvi'o, and candidate primer sequences were selected with the Primer 3 software (Rozen and Skaletsky, 2000). Candidate primers were evaluated for hairpin and dimer formation with Netprimer (Premier Biosoft International, Palo Alto, CA). Afterwards, primers were tested in silico against the NCBI nucleotide database. High-identity matches corresponded exclusively to bacterial strains that belonged to the species B. longum. The efficiency of the PCR systems was tested for each primer pair, and qPCR was conducted to quantify absolute levels of each of the genes in fecal samples collected at baseline and at 28 and ~200 days of TOP. One µl of DNA template and primers at a concentration of 0.8 µM were used in 25 µl PCR reactions. Details regarding the selected genes, primer sequences and annealing temperatures are shown in the following table.

List of primers used in the study for absolute abundance quantification of discriminant genes

Gene | Function | Primer sequences (5'-3') | Product size (nt) | Annealing Temp (°C)
g2015 | LacI-type response repressor | CAT TGC GCG TGG TAC ATG AG; AGT CGT ATG AGG GTC GTC CA | 206 | 62
g1077 | LacI family transcriptional regulator | AAG CTC GAC TGG TTC ACC TC; GGT GTT GAG TCG CCT GAT TT | 204 | 62
g1830 | COG0477: Permeases of the major facilitator superfamily | CAA GAG AAC CCT GCG TGA CT; GAT TGC AGG TAC GGT GTC CA | 229 | 62
g1467 | COG0477: Permeases of the major facilitator superfamily | ACA GTG GAA CCA GCG TCT TC; GGC TGA TTG TGA AGA GGG GA | 224 | 62
g0877 | β-galactosidase | ACG GCG ATT CTG TTT AGT GC; GGT AGC CGC TCC AAT CGT A | 166 | 62
g1136 | ABC-type sugar transport system, permease component | AAG CCG ATC ATG GTG TTC GT; CAC AGG ATC TGC CAG TCG TT | 236 | 62

Statistical methods. For parameters measured at different time points within treatments (such as self-reported symptom scores and levels of B. longum AH1206 in fecal samples), repeated measures two-way ANOVAs with Bonferroni correction (with time and treatment as factors) were used. For measurements done only once during each treatment (such as the proportion of strain AH1206 among Bifidobacterium isolates), repeated measures one-way ANOVAs were used to determine treatment effects. When the data were not normally distributed, non-parametric tests were used to determine the impact of B. longum AH1206 treatment on the gut microbiota. Friedman tests, which account for variability between subjects, were used to determine the effect of treatment on microbiota taxonomic and functional composition and on diversity measurements. Due to the long-term persistence of B. longum AH1206, a crossover effect was observed in three of the subjects (the strain persisted and remained detectable during the placebo treatment). Therefore, the placebo data points of these three subjects were removed for statistical purposes. In addition, bacterial abundances were compared pair-wise between time points using Mann-Whitney-Wilcoxon tests.

Comparisons between persisters and non-persisters were performed to determine which taxa, functions encoded by the microbiome, and specific genes differed between the two groups. Mann-Whitney-Wilcoxon tests were conducted to compare bacterial and gene abundances, α-diversity, stability, evenness and functional richness between groups. Permutational multivariate analysis of variance (Adonis PERMANOVA) based on sample distances (β-diversity) was used to study changes in community composition due to treatment consumption (200 permutations). Adonis PERMANOVA was run using the adonis function in the vegan package (Oksanen et al., 2012). All analyses were conducted using the R statistical program (R version 3.0.1, R Core Team, 2013).
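The two-group comparison between persisters and non-persisters can be sketched as a Mann-Whitney U test. The study ran its tests in R; the version below is only a standalone Python illustration that uses average ranks for ties and a two-sided normal approximation with tie correction and no continuity correction, so its p-values can differ from the exact values that R reports for small samples. The function names are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1.0)))
    if variance <= 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - n1 * n2 / 2.0) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p
```

For example, `mann_whitney_u(nonpersister_abundance, persister_abundance)` would be called once per gene or taxon, where both argument names are placeholders for lists of baseline abundances.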


In order to identify discriminative independent variables, generate predictive models, and compare the importance of individual variables for predicting the persistence of AH1206, the supervised learning technique Random Forest (RF) was used (Knights et al., 2011). The generalization error of the classifier was estimated using leave-one-out cross-validation, in which each group of samples was predicted by a model trained on the 21 remaining samples. RF was trained using the default settings of the RF package in R (Liaw and Wiener, 2002). The predictive power of each variable was estimated by calculating the mean importance score when the values of each feature (OTU or gene function) were permuted at random (1000 iterations). RF is especially useful for recognizing patterns in highly complex data sets such as human microbiota surveys, and it overcomes limitations of classical hypothesis-testing methods. For this reason, and taking into account the relatively small sample size of our study and the complex nature of the gut ecosystem, the p-values obtained with the Wilcoxon test were not corrected for multiple comparisons; instead, RF was used as an additional statistical approach to identify and confirm discriminative taxonomic groups and genes. There was considerable overlap in the features (taxa and genes) identified with the two methodologies (100% of the discriminant taxonomic groups and 90% of the genes identified with Wilcoxon were also detected by RF), and both methods led to the same conclusions.
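The two evaluation ideas in this paragraph, leave-one-out cross-validation and permutation-based feature importance, can be illustrated with a small Python sketch. To keep the example self-contained, a nearest-centroid classifier stands in for Random Forest; this substitution, the function names, and the use of cross-validated accuracy (rather than the out-of-bag error used by the randomForest package) are all assumptions of the sketch and do not reproduce the study's analysis.

```python
import random


def train_centroids(X, y):
    """Stand-in classifier: mean feature vector per class (not Random Forest)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest (squared Euclidean)."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(sorted(centroids), key=dist)


def loocv_accuracy(X, y):
    """Leave-one-out CV: each sample is predicted by a model trained on the rest."""
    correct = 0
    for i in range(len(X)):
        model = train_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def permutation_importance(X, y, feature, iterations=1000, seed=0):
    """Mean drop in LOOCV accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = loocv_accuracy(X, y)
    drops = []
    for _ in range(iterations):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        permuted = [row[:feature] + [v] + row[feature + 1:]
                    for row, v in zip(X, column)]
        drops.append(baseline - loocv_accuracy(permuted, y))
    return sum(drops) / len(drops)
```

A feature whose shuffling leaves accuracy unchanged scores zero, whereas a feature that carries the class signal scores higher; ranking features by this mean score is the idea behind the "mean importance" column of Table S5.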


Supplemental References

Boisvert, S., Raymond, F., Godzaridis, E., Laviolette, F., Corbeil, J. (2012). Ray Meta: scalable de novo metagenome assembly and profiling. Genome Biol. 13, R122.

Delétoile, A., Passet, V., Aires, J., Chambaud, I., Butel, M.J., Smokvina, T., Brisse, S. (2010). Species delineation and clonal diversity in four Bifidobacterium species as revealed by multilocus sequencing. Res. Microbiol. 161, 82–90.

Edgar, R.C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–1.

Eren, A.M., Morrison, H.G., Lescault, P.J., Reveillaud, J., Vineis, J.H., Sogin, M.L. (2015). Minimum entropy decomposition: Unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences. ISME J. 9, 968–979.

Gini, C.W. (1912). Variability and mutability, contribution to the study of statistical distributions and relations, in: Studi Economico-Giuridici della R. Università di Cagliari.

Langmead, B., Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359.

Lewis, Z.T., Totten, S.M., Smilowitz, J.T., Popovic, M., Parker, E., Lemay, D.G., Van Tassell, M.L., Miller, M.J., Jin, Y.-S., German, J.B., et al. (2015). Maternal fucosyltransferase 2 status affects the gut bifidobacterial communities of breastfed infants. Microbiome 3, 15–17.

Liaw, A., Wiener, M. (2002). Classification and Regression by randomForest. R News 2, 18–22.

Martínez, I., Kim, J., Duffy, P.R., Schlegel, V.L., Walter, J. (2010). Resistant starches types 2 and 4 have differential effects on the composition of the fecal microbiota in human subjects. PLoS One 5, e15046.

Martínez, I., Stegen, J.C., Maldonado-Gómez, M.X., Eren, A.M., Siba, P.M., Greenhill, A.R., Walter, J. (2015). The gut microbiota of rural Papua New Guineans: Composition, diversity patterns, and ecological processes. Cell Rep. 527–538.

Meyer, F., Paarmann, D., D'Souza, M., Olson, R., Glass, E., Kubal, M., Paczian, T., Rodriguez, A., Stevens, R., Wilke, A., et al. (2008). The metagenomics RAST server: a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9, 386.

Oksanen, J., Blanchet, F.G., Kindt, R., Legendre, P., Minchin, P.R., O’Hara, R.B., Simpson, G.L., Solymos, P., Stevens, M.H.H., Wagner, H. (2012). vegan: Community ecology package.

Rinne, M.M., Gueimonde, M., Kalliomäki, M., Hoppu, U., Salminen, S.J., Isolauri, E. (2005). Similar bifidogenic effects of prebiotic-supplemented partially hydrolyzed infant formula and breastfeeding on infant gut microbiota. FEMS Immunol. Med. Microbiol. 43, 59–65.

Rozen, S., Skaletsky, H. (2000). Primer3 on the WWW for general users and for biologist programmers. Methods Mol. Biol. 132, 365–386.

Santos, S.R., Ochman, H. (2004). Identification and phylogenetic sorting of bacterial lineages with universally conserved genes and proteins. Environ. Microbiol. 6, 754–759.

Segata, N., Waldron, L., Ballarini, A., Narasimhan, V., Jousson, O., Huttenhower, C. (2013). Metagenomic microbial community profiling using unique clade-specific marker genes. Nat Methods 9, 811–814.

Ventura, M., Canchaya, C., Del Casale, A., Dellaglio, F., Neviani, E., Fitzgerald, G.F., van Sinderen, D. (2006). Analysis of bifidobacterial evolution using a multilocus approach. Int. J. Syst. Evol. Microbiol. 56, 2783–2792.

Wang, Q., Garrity, G.M., Tiedje, J.M., Cole, J.R. (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol. 73, 5261–7.