Liu et al., Sci. Adv. 2020; 6 : eaba4901 29 May 2020
SCIENCE ADVANCES | RESEARCH ARTICLE
MICROBIOLOGY
Mycobacterium tuberculosis clinical isolates carry mutational signatures of host immune environments
Qingyun Liu1,2*, Jianhao Wei3*, Yawei Li4,5*, Mei Wang3, Jun Su3, Yonghui Lu3, Mariana G. López6, Xueqin Qian3, Zhaoqin Zhu3, Haiying Wang7, Mingyun Gan8, Qi Jiang1,9, Yun-Xin Fu10, Howard E. Takiff11,12, Iñaki Comas6,13, Feng Li3†, Xuemei Lu14,15†, Sarah M. Fortune2,16,17†, Qian Gao1,9†
Mycobacterium tuberculosis (Mtb) infection results in a spectrum of clinical and histopathologic manifestations. It has been proposed that the environmental and immune pressures associated with different contexts of infection have different consequences for the associated bacterial populations, affecting drug susceptibility and the emergence of resistance. However, there is little concrete evidence for this model. We prospectively collected sputum samples from 18 newly diagnosed and treatment-naïve patients with tuberculosis and sequenced 795 colony-derived Mtb isolates. Mutant accumulation rates varied considerably between different bacilli isolated from the same individual, and where high rates of mutation were observed, the mutational spectrum was consistent with reactive oxygen species–induced mutagenesis. Elevated bacterial mutation rates were identified in isolates from HIV-negative but not HIV-positive individuals, suggesting that they were immune-driven. These results support the model that mutagenesis of Mtb in vivo is modulated by the host environment, which could drive the emergence of variants associated with drug resistance in a host-dependent manner.
INTRODUCTION
Infection with Mycobacterium tuberculosis (Mtb) causes a spectrum of clinical outcomes, from latent infection to active disease. During latent infection, Mtb can survive in the infected host for decades, creating a reservoir that continuously fuels the global tuberculosis (TB) epidemic (1). Control of the TB epidemic has been complicated by the emergence of high-level antibiotic resistance, which occurs through de novo chromosomal mutations (2). We understand little about the rates and drivers of mutation in Mtb within the infected host where the bacterial population is thought to face a range of different environments and immunologic stresses in different disease states, as reflected by different histopathologic manifestations of infection, including granuloma formation and less-organized areas of inflammation. It is postulated that host immune stressors drive the bacterial population toward a nonreplicative state. Thus, replication-associated mutations are predicted to accumulate at very low rates (3–5). However, a study in the macaque TB model found that Mtb accumulated mutations at roughly the same rate during both active disease and early latent infection as during rapid growth in vitro (5). This mutant accumulation rate is remarkably similar to the molecular clock estimated from the study of human Mtb isolates, suggesting that there are additional drivers of mutation in vivo (6).
Whole-genome sequencing has been used to track the evolution of Mtb bacilli within the host during antibiotic treatment and demonstrate how drug-resistant mutations arise and become fixed in the population (7–10). When a patient develops active TB, they are estimated to harbor 10^10 to 10^12 bacilli (11), and such a large population should contain numerous genetic mutations that could be used to study the mutagenesis of Mtb in vivo (8, 10, 12). Because antibiotic treatment can reduce Mtb population and diversity very quickly, such mutational records should be preserved only in treatment-naïve patients. By applying whole-genome sequencing to large numbers of individual colonies recovered from 18 patients, we characterized the genetic diversity of Mtb populations at the onset of TB disease and reconstructed the within-host evolution of Mtb. This approach provided sufficient resolution to characterize the mutational signature of Mtb at the individual colony level, which should represent single bacilli in vivo. Our data indicate that mutagenesis of Mtb in vivo is modulated by the host environment, suggesting that the risk of de novo drug resistance varies in a host-dependent fashion.
1Shanghai Public Health Clinical Center, Key Laboratory of Medical Molecular Virology (MOE/NHC/CAMS), Shanghai Medical College and School of Basic Medical Sciences, Fudan University, Shanghai, China. 2Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA. 3Shanghai Public Health Clinical Center, Fudan University, Shanghai, China. 4CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China. 5University of Chinese Academy of Sciences, Beijing 100049, China. 6Tuberculosis Genomic Unit, Instituto de Biomedicina de Valencia (IBV-CSIC), Valencia, Spain. 7Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, China. 8Molecular Medical Center, Children’s Hospital of Fudan University, Shanghai, China. 9Shenzhen Center for Chronic Disease Control, Shenzhen, China. 10Department of Biostatistics and Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77030, USA. 11Integrated Mycobacterial Pathogenomics Unit, Institut Pasteur, Paris, France. 12Nanshan Center for Chronic Disease Control, Shenzhen, China. 13CIBER in Epidemiology and Public Health (CIBERESP), Madrid, Spain. 14State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China. 15CAS Center for Excellence in Animal Evolution and Genetics, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China. 16Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139, USA. 17Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
*These authors contributed equally to this work.
†Corresponding author. Email: [email protected] (F.L.); [email protected] (X.L.); [email protected] (S.M.F.); [email protected] (Q.G.)
RESULTS
Sampling Mtb population before antibiotics treatment
Between 1 February 2017 and 31 December 2018, we recruited 18 new, smear-positive patients with TB who had not taken any anti-TB drugs before the diagnosis. All of the isolates were pan-susceptible except for one isolate that was resistant only to isoniazid (table S1). We collected three to five sputum samples from the new patients with TB on the day of diagnosis, which allowed us to deeply sample the Mtb populations in their pulmonary lesions that communicated with the airways. The sampling size of the Mtb population was estimated to be between 25,000 and 150,000 bacilli per patient, according to the conversion index between smear microscopic scores and bacterial load (13, 14). The sputum sample from each patient was digested, dispersed, serially diluted, and then cultured on Löwenstein-Jensen (L-J) solid medium (Fig. 1A). The remainder of the undiluted sputum was cultured separately to represent the whole population of bacilli present. For each patient, we randomly picked ~50 well-separated colonies for whole-genome sequencing (Fig. 1A). In addition, we scraped all the colonies from the whole population plates of nine patients for deep whole-genome sequencing (Fig. 1A).
Sequencing single colonies can delineate the Mtb population structure
In total, we obtained whole-genome sequence data for 795 single colonies from 18 patients (average sequencing depth is 119.4) and nine scraped whole populations (average sequencing depth is 726.3). Sequencing reads were aligned to the reconstructed ancestral genome of the Mtb complex to detect single-nucleotide polymorphisms (SNPs) (15). The SNPs shared by all colonies from a given patient were defined as “inherited SNPs,” indicating phylogenetic SNPs that were fixed before the infection. The study then focused on the SNPs that were present in only a proportion of colonies that were termed “de novo SNPs” to indicate their de novo accumulation during infection. To verify that sequencing of ~50 colonies would represent the Mtb population structure in the original samples, we compared the frequency of SNPs detected in the deep-sequenced scraped whole population samples with their ratio in the single colonies. We detected 107 SNPs with a frequency above 1.5% in the nine scraped samples, of which 67 (62.6%) could be detected with a similar frequency in the corresponding single colonies (Pearson’s correlation coefficient: 0.982; Fig. 1B and fig. S1). The SNPs with higher frequencies showed better correlation, while the SNPs with frequencies between 1.5 and 15% presented considerable variation between the scraped population samples and the individual colonies (Pearson’s correlation coefficient: 0.503; Fig. 1B). In addition, we detected 224 SNPs that were present in only one or two single colonies but were not detected in the scraped samples, indicating the increased sensitivity of single-colony sequencing for capturing low-frequency mutations in the population.
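The inherited versus de novo distinction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each colony is represented as a set of SNP positions relative to the ancestral genome, the function names `classify_snps` and `colony_ratio` are hypothetical, and the toy positions are invented rather than taken from the study.

```python
from typing import Dict, Set, Tuple


def classify_snps(colony_snps: Dict[str, Set[int]]) -> Tuple[Set[int], Set[int]]:
    """Split SNP positions into inherited (present in every colony) and de novo (the rest)."""
    all_sets = list(colony_snps.values())
    if not all_sets:
        return set(), set()
    inherited = set.intersection(*all_sets)
    de_novo = set.union(*all_sets) - inherited
    return inherited, de_novo


def colony_ratio(colony_snps: Dict[str, Set[int]], position: int) -> float:
    """Fraction of colonies from one patient that carry the SNP at `position`."""
    carriers = sum(1 for snps in colony_snps.values() if position in snps)
    return carriers / len(colony_snps)


# Hypothetical toy data: colony names and genome positions are made up.
toy = {
    "c1": {100, 200, 300},
    "c2": {100, 200},
    "c3": {100, 200, 400},
    "c4": {100, 200, 300},
}
inherited, de_novo = classify_snps(toy)
```

In this toy example, positions 100 and 200 are shared by all four colonies and so count as inherited, while 300 and 400 are de novo. The value returned by `colony_ratio` is the kind of per-SNP ratio that would be compared against the frequency in the scraped whole-population sample.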
Mtb diversity at the onset of TB disease
There was a broad range in the total number of de novo mutations found in the bacterial populations isolated from each patient (0 to 116 SNPs; Fig. 1C), with the average number of de novo mutations varying from 0 to 11.3 SNPs (fig. S2A). However, the SNP distance between any two colonies from the same patient was much lower than between any two strains from different patients (fig. S2, B and C). Mapping of both the fixed and unfixed SNPs from each patient to our phylogenomic database for the identification of multiple evolutionary paths (16) did not suggest any mixed infections in our samples. Of the 18 Mtb strains, 17 were lineage 2 strains and one belonged to lineage 4 (table S1).
Because the SNP distance between different bacilli within a single individual is an important reference for establishing the SNP threshold for epidemiologically defining transmission clusters (17), we calculated the pairwise SNP distance between any two single colonies from each patient (Fig. 1D). For 91.0% of the pairs, the difference was ≤5 SNPs, and for 98.8% of the pairs, the difference was ≤12 SNPs, indicating that these two widely used thresholds (5 or 12 SNPs) encompass most of the genetic distance between different bacilli within a patient. However, in 7 of 18 patients, there were colony pairs that differed by more than 5 SNPs and 3 of 18 patients had colony pairs that differed by more than 12 SNPs, indicating that these larger SNP distances can occur within an individual and thus could occasionally be found between isolates from cases linked by direct transmission.
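As a rough illustration of the pairwise comparison, the sketch below computes SNP distances between colonies and the share of pairs that fall within a threshold. It assumes each colony is a set of de novo SNP positions and takes the distance to be the size of the symmetric difference; the helper names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def pairwise_distances(colony_snps: Dict[str, Set[int]]) -> List[int]:
    """SNP distance for every colony pair: positions present in exactly one of the two."""
    return [len(a ^ b) for a, b in combinations(colony_snps.values(), 2)]


def fraction_within(distances: List[int], threshold: int) -> float:
    """Share of pairs whose distance is at most `threshold` SNPs."""
    if not distances:
        return 0.0
    return sum(d <= threshold for d in distances) / len(distances)


# Hypothetical colonies from one patient; positions are made up.
toy = {
    "c1": set(),
    "c2": {1},
    "c3": {1, 2},
    "c4": {10, 11, 12, 13, 14, 15},
}
distances = sorted(pairwise_distances(toy))
```

With these toy colonies, half of the pairs are within 5 SNPs and all are within 12, which mirrors how a single divergent colony can push some within-host pairs past the lower threshold.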
Two patterns of Mtb population growth
We next used the minimum evolution method to reconstruct phylogenetic trees for the single colonies from each patient, setting the inferred ancestral genome as the root (Fig. 2). We found that the in vivo populations of Mtb separated into two different patterns: “starlike expansion” and “stepwise growth”. Ten patients were characterized by the starlike expansion model (Fig. 2A). For these cases, only a few colonies showed de novo SNPs, while the majority maintained the ancestral genome (Fig. 2A). This pattern suggests a bacterial population derived from a recent expansion, with a starting population that was genetically homogeneous. This observation was consistent with a model in which a small number of Mtb bacilli established the infection, proliferated, and caused TB disease relatively rapidly (18).
By contrast, eight patients presented a pattern of stepwise growth, typically suggesting two or three stages (Fig. 2B): (i) growth after the initial infection with divergence into subpopulations containing different SNP markers, (ii) a limited number of subpopulations achieving a second wave of growth with further genetic differentiation of the within-host Mtb population, and (iii) recent expansion of one or a very few of the subpopulations. It appeared that 68.7 to 96.0% of the colonies were the result of the expansion at the third stage. These inferences rely on the assumption that the initial infection was established by a single or a few genetically homogeneous bacilli of Mtb, as appeared to be typical in the cases with starlike expansions. However, the possibility of initial infections with several different, closely related bacilli cannot be conclusively ruled out. In either scenario, the patterns suggested that a large proportion of the Mtb population at the time of clinically detectable TB disease derived from a recently expanded subpopulation.
Mutation rate varies between bacilli within a single individual
There was a variation in the number of de novo SNPs that accumulated in different colonies from the same patient (Fig. 1E). For example, for patients J and P in the starlike group, and patients A, C, I, K, Q, H, and S in the stepwise group, a few colonies (marked by gray stars) accumulated 4 to 14 de novo SNPs, while most colonies maintained the ancestral genotypes (Fig. 2). Because the in vivo mutation rate of Mtb is approximately 0.24 to 0.5 SNPs per genome per year (17, 19–21), this disparity suggested that the mutation rate in vivo might not be uniform. To test this, we simulated Mtb population growth in silico using a growth model with a constant mutation rate (Wright-Fisher model) (22). The simulated data showed that the distribution of numbers of de novo SNPs appeared as a very convergent Poisson distribution with only one peak (Fig. 3). In contrast, the observed data for patients H, I, J, and S showed multiple peaks in the distribution of the number of de novo SNPs. A comparison of the 0.9 quantiles in the observed and simulated data shows that H, I, J, and S had several colonies with more de novo SNPs than were predicted (Fig. 3), suggesting a deviation from a constant mutation rate model. An alternative explanation might be positive selection that preferentially selected beneficial mutations. To test this possibility, we compared the proportion of nonsynonymous to synonymous mutations (pNS, similar to dN/dS; see details in Materials and Methods) in the de novo SNPs from the four patients whose colonies had the most SNPs (H, I, J, and S) against those of the other patients. We found that the average pNS value for the four patients with high SNP colonies was 0.64 (H, 0.67; I, 0.56; J, 0.83; S, 0.46), while it was 0.79 for the other patients. This indicates a purifying selection in all patients and argues against positive selection as an explanation for the high SNP colonies.
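The logic of comparing observed SNP counts against a constant-rate expectation can be illustrated with a simplified stand-in. The sketch below is not the Wright-Fisher simulation used in the study: it assumes that, under a constant mutation rate and equal elapsed time, per-colony de novo SNP counts follow a single Poisson distribution with mean `lam`, then counts observed colonies above the 0.9 quantile. The value of `lam` and the observed counts are hypothetical.

```python
import math
from typing import List


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k mutations under a Poisson model with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_quantile(q: float, lam: float) -> int:
    """Smallest k such that P(X <= k) >= q."""
    k = 0
    cdf = poisson_pmf(0, lam)
    while cdf < q:
        k += 1
        cdf += poisson_pmf(k, lam)
    return k


def count_above(observed: List[int], cutoff: int) -> int:
    """Number of colonies with more de novo SNPs than the cutoff."""
    return sum(1 for x in observed if x > cutoff)


# Hypothetical per-colony SNP counts: most colonies near zero, a few with many SNPs.
lam = 1.0  # assumed mean number of de novo SNPs per colony
observed = [0] * 20 + [1] * 5 + [2] * 2 + [8, 10, 14]
cutoff = poisson_quantile(0.9, lam)
excess = count_above(observed, cutoff)
```

Under this single-peak model, very few colonies should exceed the 0.9 quantile by a wide margin, so a cluster of colonies with 8 to 14 SNPs is the kind of second peak that a constant-rate model does not predict.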
Enhanced mutation rate likely induced by reactive oxygen species
The four patients, H, I, J, and S, whose colonies suggested non-constant mutation rates also had larger numbers of total de novo SNPs (Fig. 1C), and the majority of the SNPs were C>T and G>A, the base changes associated with oxidative damage (Fig. 4A) (23). Although oxidative damage mutations are the major source of genetic changes during in vivo growth (6, 23), the ratio of these mutations in patients H, I, J, and S was significantly higher than the average level for all patients (80.6% versus 67.1%, P < 0.0001) and even higher than that of the fixed SNPs in a collection of 8399 genomes from global isolates (80.6% versus 47.2%, P < 0.0001; fig. S3). The most notable case was patient H, where 101 of the 116 (87.1%) total de novo SNPs were due to the base changes associated with oxidative damage. We further created histograms of mutation-type compositions for each colony and mapped them to the phylogenetic tree for each patient (Fig. 4B and fig. S4). For the 40 colonies from patient H, 27 had only 0 to 3 de novo SNPs, while 13 colonies had 4 to 14 de novo SNPs. The extra SNPs in these 13 colonies were exclusively mutations associated with oxidative damage (C-T and double CC-TT mutations; Fig. 4B), which are thought to indicate exposure to reactive oxygen species (ROS) (24, 25). We also observed a similar pattern with I, J, and S (Fig. 4B). Moreover, the colonies with longer mutation branches from patients A, K, Q, and P can also be explained by C-T and CC-TT mutations (fig. S4).
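To make the mutation-type bookkeeping concrete, the sketch below collapses each substitution with its reverse complement into one of six classes (for example, C>T and G>A become "C>T/G>A") and reports the share of the oxidative-damage class. The function names and the toy SNP list are hypothetical; only the 101 of 116 figure for patient H comes from the text.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mutation_class(ref: str, alt: str) -> str:
    """Collapse a substitution and its reverse complement into one label, e.g. 'C>T/G>A'."""
    forward = f"{ref}>{alt}"
    reverse = f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    return "/".join(sorted((forward, reverse)))


def oxidative_fraction(snps: List[Tuple[str, str]]) -> float:
    """Share of SNPs in the C>T/G>A class, the class associated with oxidative damage."""
    if not snps:
        return 0.0
    hits = sum(1 for ref, alt in snps if mutation_class(ref, alt) == "C>T/G>A")
    return hits / len(snps)


# Hypothetical (reference, alternate) base pairs for five SNPs.
toy_snps = [("C", "T"), ("G", "A"), ("C", "T"), ("A", "G"), ("G", "C")]
toy_fraction = oxidative_fraction(toy_snps)

# Arithmetic check of the figure reported for patient H: 101 of 116 de novo SNPs.
patient_h_percent = round(101 / 116 * 100, 1)
```

The six labels produced this way match the categories in the Fig. 4A legend, and the patient H calculation reproduces the reported 87.1%.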
[Figure 1 image. (A) Workflow: new patients with TB before treatment; 3–5 qualified sputum samples; serial dilution to obtain single colonies; WGS of 50 single colonies (>70×); WGS of the whole population (>700×). (B) Frequency in population isolates versus ratio in single colonies (r = 0.982; inset, r = 0.503). (C) Number of mutations by subject ID; legend: not detected in scraped samples, detected in scraped samples, scraped samples not sequenced. (D) Number of pairs versus pairwise SNP distance. (E) Number of de novo SNPs by subject ID.]
Fig. 1. Schematic diagram of the sampling approach and genetic diversity of the Mtb population within the host. (A) We repeatedly collected three to five sputum samples from new and treatment-naïve patients with TB. All sputum samples from a single individual were mixed together and processed with a standard procedure and then serially diluted. The target dilutions were spread onto five plates, while the remaining undiluted samples were mixed together and spread onto two plates. WGS, whole-genome sequencing. (B) The correlation between the frequencies of single-nucleotide polymorphisms (SNPs) in scraped samples and the relative single-colony samples was tested by Pearson’s correlation coefficient (r); the small inset shows an enlargement of the dashed box on the left. (C) A bar plot showing the numbers of de novo SNPs that were detected in colony samples in different patients with the number of SNPs that were not detected in the corresponding scraped whole population samples highlighted. (D) A histogram showing the distribution of pairwise SNP distance between any two single colonies from the same patient with the two dashed lines indicating the two commonly used SNP thresholds for defining transmission clusters. (E) A violin plot showing the distribution of numbers of de novo SNPs in single colonies from each patient.
The elevated mutation rate was host dependent
This elevated mutation rate could be explained either by an increased susceptibility of these particular Mtb strains to oxidative damage or by increased bacterial exposure to oxidative stress in a subset of the patients. If some of the strains had increased ROS susceptibility, all of the colonies derived from these Mtb strains should exhibit similar patterns of mutations and comparable numbers of SNPs, but this was not what was observed; different colonies from the same strain showed a wide variation in de novo SNPs (Fig. 4B). In addition, strains that gave rise to colonies with higher mutation rates should have more C-T and CC-TT inherited mutations than the other strains, but there was no difference in the ratio of C-T and CC-TT between the two groups for the inherited mutations (fig. S5). Last, the four strains whose colonies had more mutations (H, I, J, and S) were not clustered in the phylogenetic tree, showing that the increased mutation rate was not a feature of a particular phylogenetic clade (fig. S6); besides, we found no homoplastic mutation between any two of these four strains. Together, these results suggest that increased bacterial susceptibility to ROS is not a plausible explanation for the higher mutation rates in a subset of colonies.
[Figure 2 image: phylogenetic trees with colony IDs as tip labels, MRCA marking the root, and scale bars in SNPs. (A) Starlike expansion: patients B, E, J, L, M, O, P, and T. (B) Stepwise growth: patients A, C, F, H, I, K, Q, and S.]
Fig. 2. Phylogenetic trees of Mtb populations from different patients. All trees are rooted to the inferred ancestral genome and all the “inherited SNPs” were excluded before the phylogenetic reconstruction. The length of solid lines represents the number of de novo SNPs. (A) “Starlike expansion” trees for these patients are shown in a circle format. Trees for patients G and R were not shown because no de novo SNPs were detected. (B) “Stepwise growth” trees for these patients are shown in rectangular format. Gray stars indicate those colonies with excessive de novo SNPs and the taxa names with blue backgrounds highlight the recently expanded populations.
[Figure 3 image: proportion of colonies versus number of de novo SNPs, simulated and observed. (A) Patient H, P = 0.0003. (B) Patient I, P = 0.0008. (C) Patient J, P = 0.0044. (D) Patient S, P = 0.0192.]
Fig. 3. Comparison of the distribution of de novo SNP numbers between simulated and observed populations. (A to D) The comparisons of the Mtb populations from patients H, I, J, and S, respectively. The height of histograms shows the proportions of colonies with the relative number of de novo SNPs. The P values indicate the hypergeometric test for 0.9-quantile SNP numbers for the simulated and observed populations.
As the production and regulation of ROS and reactive nitrogen species are important components of the host immune response to pathogen infections, we hypothesized that the variation in the bacterial mutation rate could be host dependent. We compared the ratio of C-T mutations in Mtb samples collected from HIV-negative hosts (all 18 patients) in this study with the ratio from HIV-positive hosts in a previous study of 2587 Mtb single colonies from 42 HIV-positive hosts (12). This comparison showed that Mtb strains from HIV-negative hosts accumulated more C-T mutations than those from HIV-positive hosts (P < 0.0001, Fig. 4, C and D), and the difference remained when we restricted this comparison to lineage 2 strains from the two groups (P = 0.0005; fig. S7). Furthermore, the double CC-TT mutations were not observed in the Mtb colonies from HIV-positive hosts, and in these hosts, the phylogenetic trees of the de novo SNPs did not show colonies with “long branches” (fig. S8). These analyses thus suggest that the signature of ROS-associated mutations could be driven by immune pressure.
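A per-host comparison of this kind can be sketched as a two-sample t statistic on the C-T mutation ratios. The source reports a t test without naming the variant, so the use of Welch's form here is an assumption, as are the function name `welch_t` and the toy ratios, which are not data from either study.

```python
import math
import statistics
from typing import Sequence


def welch_t(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples (unequal variances allowed)."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical per-host ratios of C>T/G>A mutations, one value per patient.
hiv_negative = [0.7, 0.6, 0.8]
hiv_positive = [0.4, 0.5, 0.3]
t_stat = welch_t(hiv_negative, hiv_positive)
```

A positive statistic indicates a higher mean ratio in the first group; turning it into a P value would additionally require the t distribution with the Welch-Satterthwaite degrees of freedom, which is omitted from this sketch.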
DISCUSSION
This work analyzed the population structure and mutation signatures of Mtb by genome sequencing colonies isolated directly from clinical specimens, thereby allowing us to make inferences about the growth and evolution of the bacterial population during infection. The data present at least four clear findings: (i) distinct levels of bacterial diversity were observed in the Mtb populations from different patients; (ii) the pairwise SNP distance between any two colonies within a host could occasionally exceed the SNP thresholds used to define a transmission cluster; (iii) the number of de novo SNPs varied significantly between different bacilli within the same host, suggesting that the mutation rate varies between different bacteria within a host; and (iv) the elevated mutation rate in some colonies carries the signature of ROS-induced mutagenesis.
The estimated mutation rate of Mtb, about 0.5 mutations in every 10,000 genomes copied (6), is low compared to most other bacteria, yet the high rates of acquired resistance in clinical strains raise the question of whether the mutation rate within the host could somehow be higher (4). The rate calculated from an in vivo macaque infection model was very similar to that inferred in vitro through fluctuation experiments, despite the differences in bacterial replication rates in vitro and in vivo (3, 5), and deep whole-genome sequencing of serial sputum samples from patients with TB taken during treatment revealed a high degree of genetic diversity with tens of unfixed SNPs (8, 26). In the current study, we found an increased rate of ROS-associated mutations that could perhaps explain the high rate of acquired resistance during clinical treatment. Those patients in whom Mtb is subjected to high levels of ROS mutagenesis should carry larger pools of mutations and, thus, perhaps have a higher risk for developing drug resistance. Further studies are warranted to understand why some patients induce higher levels of ROS mutagenesis, how to identify them, and how to reduce their risk of developing drug resistance.
Unexpectedly, only 4 of the 18 patients had bacterial populations with evidence of an elevated mutation rate, and only in a subset of the colonies from each of these four patients. Because the excess mutations were predominantly the C-T changes, the results suggest that ROS stress can vary between hosts and also that ROS heterogeneity exists in different microenvironments within the same host. This model is consistent with the growing knowledge of granuloma heterogeneity, where each granuloma represents a microenvironment that can be independently influenced by the local immune response (27, 28). This inference was further supported by the lower ratio of C-T mutations in strains of HIV-positive patients, implying that the host immune status plays a role in the frequency of oxidative damage mutations.
[Figure 4 image. (A) Number of mutations by patient, colored by mutation type (C>T/G>A, C>G/G>C, C>A/G>T, A>T/T>A, A>G/T>C, A>C/T>G); labeled proportions: 87%, 73%, 82%, and 72%. (B) Number of SNPs per colony aligned to the phylogenetic tree for patients H, I, J, and S (major clade), split into CC>TT, C>T/G>A, and other types. (C) Ratio of C>T/G>A mutations, HIV negative versus HIV positive (P < 0.0001). (D) Proportion of mutation type, HIV negative versus HIV positive; labeled values: 67.1% and 40.6%.]
Fig. 4. Oxidative damage mutations are the source of excessive de novo SNPs. (A) The proportions of different mutation types in the samples from patients H, I, J, and S showed a deviation from the average level. (B) Integration of phylogenetic tree and mutation composition histograms for patients H, I, J, and S. For patient S, only the major clade (the 48 colonies) is shown here. Different colors refer to different mutation types. The gray dot “MRCA” represents the reconstructed ancestor of each Mtb population. For patients H, J, and S, the MRCA strains were detected in the single colonies. (C) A comparison of C-T mutation ratio between samples from HIV- negative and HIV-positive hosts (P < 0.0001 by t test), with each dot representing one patient. (D) The bar plot shows the ratio of C-T mutations in all samples from HIV- negative and HIV-positive hosts, respectively.
Because unfixed SNPs were detected in 39% of the single colonies, we scanned them for CC-TT mutations and, as expected, found them in the genomes of colonies isolated from the patients with an elevated mutation rate: H (9), I (11), J (14), and S (3). Unexpectedly, we also found CC-TT mutations in colonies from patients without elevated mutation rates: 26 in colonies from E, 2 from F, and 3 from O (fig. S9A). This suggests that the rates of ROS-induced mutations could have been underestimated in our study because of the limitations of our sample size and the mixing of different colonies.
We considered positive selection as an alternative explanation for the high numbers of SNPs found in some colonies but discarded this explanation for the following reasons. First, pNS analysis suggested that negative selection was operating on the bacterial populations of strains H, I, J, and S, which is consistent with previous findings that within-host selective pressure on Mtb is governed by negative selection (8, 29, 30). Second, in the colonies with large numbers of SNPs, the excess SNPs were almost exclusively C-T or CC-TT. It seems unlikely that a particular mutation type would be favored by positive selection, which should depend only on the functional impact of the mutations on the genes, without a preference for a specific type of mutation. Third, if positive selection were operating, we would expect to see some homoplasy or gene enrichment in the different colonies with large numbers of SNPs isolated from a single patient. However, we found neither homoplastic mutations nor gene enrichment across the colonies from each of the four strains that had high-SNP colonies.
Our analysis suggested that increased bacterial susceptibility to ROS might not be a plausible explanation for the higher mutation rates in a subset of colonies, but we cannot completely rule out its role in the phenomenon we observed. Heterogeneity in the strains’ susceptibility to ROS could act together with host environmental stress and perhaps help explain why colonies with high numbers of SNPs were isolated only from some patients. Although we found no homoplastic mutations between any two of these four strains (H, I, J, and S) and no mutations in the same gene but in different codons, it is still possible that mutations in different genes could have similar functional effects in increasing the susceptibility to ROS stress. The absence of a C-T or CC-TT mutational signature in the inherited mutations is a strong argument against this explanation, but it cannot exclude the possibility that these four strains acquired an increased “ROS susceptibility” only very recently, in which case they would not yet have accumulated many C-T or CC-TT changes among the inherited mutations.
The biological process of Mtb infection is difficult to study in humans, but PET-CT (positron emission tomography–computed tomography) tracking of the dynamic course of infection in cynomolgus macaques revealed that different lesions in the same host followed diverse trajectories. While some granulomas were sterilized in both active and latent cases, others grew and ultimately determined the clinical outcome of infection (27). Typically, only one or a few granulomas that are unable to contain the bacteria appear to be responsible for bacterial dissemination and the onset of active disease (12, 27, 31). Our results are consistent with these observations. We found that the burden of TB disease in humans often appeared to be the result of the recent expansion of a subpopulation of the bacilli present in the individual. Together, these findings suggest a “turning point” theory for the progression of TB disease, in which the deterioration of a particular lesion leads to clinical disease. Hence, finding biomarkers that can predict such a transition or detect it early would allow interventions that might prevent the development of clinical TB disease.
For those patients with more mutational diversity, the probability was higher that the genomes of any two selected colonies differed by more than 5 SNPs, and pairs that differed by more than 12 SNPs were not rare (fig. S9, B and C). This finding implies that the dominant subpopulation cultured and sequenced from one patient could differ by >12 SNPs from a variant that infects a second patient. In the second host, this transmitted bacillus could again mutate and generate a more distant variant that dominates when the second patient’s sputum is cultured and sequenced. The commonly used SNP thresholds of ≤5 SNP differences for direct transmission and ≤12 SNPs for transmission-clustered strains have been supported by epidemiologic data (17, 32), but our results suggest that strains with slightly more SNP differences might occasionally belong to the same transmission chain and deserve to be included when conducting epidemiological investigations.
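The two thresholds mentioned above can be illustrated with a minimal Python sketch. It assumes each isolate is summarized as a set of variant positions relative to a common reference and that all sites are biallelic; the function names and category labels are hypothetical and are not taken from the study's scripts.

```python
def snp_distance(sites_a, sites_b):
    """Count positions at which exactly one of the two isolates carries a variant.

    Assumes biallelic sites, so the symmetric difference of the two
    position sets equals the pairwise SNP distance.
    """
    return len(set(sites_a) ^ set(sites_b))


def classify_pair(distance, direct_max=5, cluster_max=12):
    """Apply the commonly used thresholds: <=5 SNPs and <=12 SNPs."""
    if distance <= direct_max:
        return "direct transmission"
    if distance <= cluster_max:
        return "transmission cluster"
    return "beyond cluster threshold"


if __name__ == "__main__":
    d = snp_distance({101, 2050, 33012}, {2050, 33012, 40877, 51200})
    print(d, classify_pair(d))
```

Under the argument in the text, pairs that land just above the upper threshold would not be discarded automatically but flagged for epidemiological follow-up.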
The extent to which the sputum bacillary population is representative of the total mycobacterial population within an individual is unknown. An ideal and complete characterization of the Mtb diversity within a host would require sampling from each lesion, which is not feasible in patients with TB. The bacilli in sputum mainly reflect the Mtb population in lesions that are open and connected to the airway, so bacilli in lesions that were not open or connected to airways may not have been sampled and are therefore not represented in our data. However, expectorated tubercle bacilli are thought to originate from sites of extensive bacterial growth that are the major contributors to active disease (33, 34), so we assume that we sampled at least some of the most clinically relevant sites. In addition, different sputum samples collected on the same day could vary in the Mtb growth sites from which they derive, so to avoid bias, we repeatedly sampled from the same patient and pooled the specimens.
In conclusion, this study provides a baseline characterization of Mtb population diversity within the host during active TB disease, and the two patterns of population growth extend our understanding of Mtb proliferation in vivo. We show that the mutation rate of Mtb in vivo appears to vary, presumably in relation to the host environment. The possibility that the mutation rate may increase under certain in vivo conditions could help explain how drug resistance evolves.
MATERIALS AND METHODS
Study cohort
The patients were enrolled in the Shanghai Public Health Clinical Center (a designated hospital for TB), in Shanghai, China, and the study was approved by the ethics committee of this clinical center. The recruitment criteria for the patients with TB were as follows: (i) no previous history of TB disease, (ii) no previous anti-TB antibiotics, (iii) positive sputum smear for bacilli on microscopy with a score >1+, (iv) a qualified sputum volume of at least 5 ml, and (v) drug-susceptible Mtb isolates. Initially, 20 patients were enrolled, but samples from two patients were excluded because one contained Mycobacterium kansasii and the other was contaminated with fungus, leaving 18 patient specimens for analysis (table S1). In the isolate from patient F, an inhA −15 C-T promoter mutation was found and drug sensitivity testing reported isoniazid resistance after whole-genome sequencing was performed. Because this patient had not previously taken any anti-TB drugs, the strain was included. The methods of this study were carried out in accordance with the approved guidelines, and written informed consent was obtained from the patients before the study.
Sample collection and processing
Patients suspected to have TB were first screened according to recruitment criteria 1 and 2. Routine smear microscopy was performed on the sputum samples, and the remaining sediment was temporarily stored at 4°C. Sputum quality was monitored and specimens with purulent sputum were preferentially frozen. The clinical center routinely performed GeneXpert MTB/RIF tests and culture-based drug susceptibility testing for isoniazid, rifampicin, ethambutol, and pyrazinamide. GeneXpert MTB/RIF results were reported on the same day as diagnosis. When a sputum sample had a microscopic score >1+ and was rifampicin sensitive by GeneXpert, a further three to five sputum samples were collected from these patients before the treatment was started. All sputum samples from each patient were combined to reach a total of at least 5 ml for each patient. The combined sputum samples were subjected to digestion with 1:1 of 2% NaOH-NaCl and left standing for 15 min. Phosphate-buffered saline (PBS; pH 6.8) was added to a final volume of 50 ml and the samples were then centrifuged at 3000g for 15 min. The supernatant was discarded and the sediment was suspended in 1 ml of PBS solution. We then performed serial dilutions for each of the samples, and the dilution index was estimated based on the sputum smear results. For each target dilution, eight L-J medium plates were spread to obtain an estimated 50 colonies on each plate. The remainder of the original, undiluted sputum specimens from each patient was centrifuged and spread onto two L-J medium plates. From most patient isolates, 50 well-separated colonies were selected from different plates, but only 10 colonies were obtained from sample G. Each selected colony was spread onto a fresh L-J plate for a short amplification culture of 1 to 2 weeks and then collected for DNA extraction.
For nine patients, all of the colonies from the two plates spread with the undiluted sputa were scraped into a single tube for DNA extraction and designated scraped, whole population samples.
Illumina sequencing and SNP calling
Genomic DNA from both the single-colony isolates and scraped whole population samples was extracted with the cetyltrimethylammonium bromide lysozyme method (35). A 300–base pair fragment length library was constructed for each DNA sample and paired-end–sequenced on an Illumina HiSeq 2500 instrument. A previously validated pipeline was used for SNP calling (35). Fixed mutations with a frequency of ≥90% and at least 10 supporting reads were identified. Whole-genome sequence data of 8399 Mtb isolates from previous studies were downloaded and subjected to SNP calling to identify the ratio of C-T/G-A mutations in each isolate (35).
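As an illustration of the per-isolate C-T/G-A ratio mentioned above, the following Python sketch computes the fraction of SNPs that are C>T or G>A. It assumes the SNPs of one isolate are available as (reference base, alternate base) pairs; the function name and input layout are hypothetical and do not reproduce the validated pipeline cited in the text.

```python
def ct_ga_ratio(snps):
    """Return the fraction of SNPs that are C>T or G>A.

    `snps` is an iterable of (ref, alt) base pairs for one isolate.
    An isolate with no SNPs is given a ratio of 0.0 by convention here.
    """
    snps = [(ref.upper(), alt.upper()) for ref, alt in snps]
    if not snps:
        return 0.0
    oxidative = {("C", "T"), ("G", "A")}
    hits = sum(1 for pair in snps if pair in oxidative)
    return hits / len(snps)


if __name__ == "__main__":
    example = [("C", "T"), ("G", "A"), ("A", "G"), ("C", "T")]
    print(ct_ga_ratio(example))
```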
Filter for unfixed SNPs
A previously validated pipeline was used to filter out false positives and detect unfixed SNPs (8). We considered only unfixed SNPs whose frequencies were estimated to be ≥1.5% in whole population samples. Because the false positives shared similar patterns in the strains with close genetic backgrounds, we further used repetitive unfixed SNPs called from Shanghai Mtb isolates that had been previously sampled and sequenced as a background filter (32). Ideally, we should not find unfixed mutations in single-colony samples if each colony derived from a single bacillus (8). However, we detected ≥1 unfixed SNP with allele frequencies above 10% but below 90% in 39.0% (310 of 795) of the single-colony samples. Because these unfixed SNPs survived our filters for false positives, we further examined their prevalence. We found that some unfixed SNPs detected in one colony from a patient were fixed SNPs (≥90%) in other colonies from the same patient but absent in colonies from other patients, suggesting that these unfixed SNPs were present because the colony had grown from more than one original, isolated bacillus or that colonies from separate bacilli grew into one another. Given these possibilities, we included the unfixed SNPs with frequencies above 50%, which represent the major clones for the subsequent analysis.
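The frequency bands described for single-colony samples can be summarized in a short Python sketch. The cutoffs (90%, 50%, 10%) come from the text; the function name, the category labels, and the decision to look at allele frequency alone (ignoring read depth and the background filter) are simplifying assumptions made for illustration.

```python
def classify_colony_snp(freq):
    """Assign a single-colony SNP to a frequency band.

    freq: allele frequency as a fraction between 0 and 1.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    if freq >= 0.90:
        return "fixed"
    if freq > 0.50:
        return "unfixed, major clone (included)"
    if freq > 0.10:
        return "unfixed, minor (excluded)"
    return "below unfixed band"


if __name__ == "__main__":
    for f in (0.97, 0.62, 0.30, 0.04):
        print(f, classify_colony_snp(f))
```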
Phylogenetic reconstruction
For phylogenetic reconstructions, all SNP locations for each isolate were combined into a nonredundant consensus list and recalled with the mpileup2cns function of VarScan (version 2.3.9) (36). Nucleotide positions with missing calls in more than 5% of the isolates were removed. An alignment of the remaining polymorphic positions from all strains was used for phylogeny reconstruction with MEGA 6.0 (37). To make the branch lengths represent the number of de novo SNPs, we used the minimum evolution method to infer the phylogenetic trees under the “No. of differences” model with both transitions and transversions included. The bootstrap method was used with 500 replications for each test. Phylogeny trees were visualized in FigTree (version 1.4.3) (http://tree.bio.ed.ac.uk/software/figtree/).
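The two data-handling steps in this section, dropping positions with too many missing calls and counting pairwise differences, can be sketched in Python as follows. This is not the VarScan or MEGA implementation; the alignment layout (a dict of equal-length strings), the missing-call symbols, and the function names are assumptions made for illustration.

```python
def filter_positions(alignment, max_missing=0.05, missing="N-"):
    """Drop alignment columns with missing calls in more than `max_missing` of isolates.

    alignment: dict mapping isolate name to a string of base calls,
    all of the same length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = [
        i for i in range(length)
        if sum(alignment[n][i] in missing for n in names) / len(names) <= max_missing
    ]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


def n_differences(seq_a, seq_b, missing="N-"):
    """Count sites that differ, skipping sites where either call is missing."""
    return sum(
        1 for x, y in zip(seq_a, seq_b)
        if x != y and x not in missing and y not in missing
    )


if __name__ == "__main__":
    aln = {"col1": "ACGT", "col2": "ANGT", "col3": "ACGA"}
    clean = filter_positions(aln)
    print(clean, n_differences(clean["col1"], clean["col3"]))
```

Counting raw differences, rather than applying a substitution model, is what lets branch lengths be read directly as numbers of SNPs.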
Proportion of nonsynonymous to synonymous mutations
We wished to determine whether the excessive mutations in some colonies of the four patients, H, I, J, and S, were due to positive selection, which would accelerate the mutation frequency and fixation in the population. We therefore used pNS to evaluate the selective pressure in these patients’ samples and compared the values to those of the remaining patients (8). The basic principle of the pNS method is similar to that of dN/dS, but dN/dS is used to compare two real sequences that exist in nature, while pNS can be applied to concatenated sequences of all the mutations in the genomes. A codon substitution matrix was generated using a base substitution model that takes into account the proportion of guanine and cytosine in the genome (percentage GC content, 0.656). Briefly, for each variant codon, we used a custom Python script to simulate 50,000 individual introductions of a single mutation into the codon and scored the outcomes as either synonymous or nonsynonymous. We considered the average number of nonsynonymous outcomes of the simulations as an estimate of the probability that a mutation in the given codon would be nonsynonymous. The formula used to calculate the pNS was described previously (8).
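The per-codon simulation step can be sketched in Python as below. This is not the authors' script: the way GC content enters the substitution model (replacement bases drawn with weights proportional to genome composition) is an assumption, as are the function name and the seeded random generator. Stop codons are scored as nonsynonymous. The codon-to-amino-acid mapping is the standard one, which the bacterial translation table shares.

```python
import random

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard codon table built in TCAG order (first, second, third base).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def p_nonsynonymous(codon, n_sim=50_000, gc=0.656, seed=0):
    """Estimate the probability that one random point mutation changes the amino acid."""
    rng = random.Random(seed)
    # Assumed model: replacement base weighted by genome base composition.
    weight = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    nonsyn = 0
    for _ in range(n_sim):
        pos = rng.randrange(3)  # mutated codon position
        choices = [b for b in BASES if b != codon[pos]]
        new_base = rng.choices(choices, weights=[weight[b] for b in choices])[0]
        mutant = codon[:pos] + new_base + codon[pos + 1:]
        if CODON_TABLE[mutant] != CODON_TABLE[codon]:
            nonsyn += 1
    return nonsyn / n_sim


if __name__ == "__main__":
    for codon in ("ATG", "CTT", "GGG"):
        print(codon, p_nonsynonymous(codon))
```

For a codon such as ATG, which has no synonymous neighbors, the estimate is exactly 1; for a fourfold-degenerate codon such as CTT, it should be close to 2/3 because only third-position changes are silent.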
Simulating Mtb growth
We simulated Mtb growth in silico with continuously accumulating neutral mutations under a constant growth rate, starting with a single bacillus and ending when the population size reached $10^8$. The mutation rate that we chose was $2.01 \times 10^{-10}$ mutations per site per generation (6). Because our analysis excluded the mutations in PPE/PE-PGRS family genes and other mobile sequences, which account for 8.9% of the whole genome, the genome size was set as $4.41 \times 10^6 \times 91.1\% = 4.02 \times 10^6$. The mutation rate ($\mu$) used was therefore 0.0008 per genome per generation. Mtb bacilli in exponential growth are represented by

$$N_t = e^{at}$$

where $N_t$ is the number of bacilli in generation $t$, with $N_0 = 1$, and $a$ is a constant. If the average number of de novo mutations in the TB lineage of a patient is $k$, the number of generations from the first bacillus to $10^8$ is $k/\mu$. Therefore, the parameter $a = \ln N_t / t = 8 \times \ln 10 \times 0.0008 / k \approx 0.0147 / k$.
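The arithmetic behind the per-genome mutation rate and the growth constant can be checked with a few lines of Python. The function name and return layout are hypothetical; the numbers are those given in the text, and the rounding of the mutation rate to 0.0008 follows the text.

```python
import math


def growth_parameters(k, per_site_rate=2.01e-10, genome_bp=4.41e6,
                      included_fraction=0.911, final_size=1e8):
    """Return (mu, generations, a) for a lineage with k average de novo mutations."""
    genome_size = genome_bp * included_fraction      # about 4.02e6 sites
    mu = round(per_site_rate * genome_size, 4)       # about 0.0008 per genome per generation
    generations = k / mu                             # generations needed to accumulate k mutations
    a = math.log(final_size) / generations           # from N_t = exp(a * t) with N_0 = 1
    return mu, generations, a


if __name__ == "__main__":
    mu, generations, a = growth_parameters(k=1)
    print(mu, generations, round(a, 4))
```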
The Wright-Fisher model (22) was used to simulate discrete TB growth. In generation $t$, the expected number of bacilli is $N_t = e^{0.0147t/k}$. The bacilli in generation $t$ were randomly sampled as progeny of cells from generation $t - 1$. A newborn cell had a probability $\mu$ of accumulating a de novo mutation in each cell division. The workflow of the simulation is displayed in fig. S10.
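A scaled-down version of this kind of simulation can be sketched in Python. This is only an illustration of the sampling scheme, not the workflow of fig. S10: it tracks the number of mutations per cell rather than their identities, stops at a much smaller population than $10^8$, and the function name, default arguments, and rounding of the population size are assumptions.

```python
import math
import random


def simulate_growth(a, mu, final_size, seed=0):
    """Wright-Fisher style growth with neutral mutation accumulation.

    Each cell in generation t picks a random parent from generation t - 1,
    inherits its mutation count, and gains one new mutation with probability mu.
    Returns (number of generations, list of mutation counts per cell).
    """
    rng = random.Random(seed)
    counts = [0]  # generation 0: a single bacillus with no de novo mutations
    t = 0
    while len(counts) < final_size:
        t += 1
        n_t = min(final_size, max(1, round(math.exp(a * t))))
        counts = [
            counts[rng.randrange(len(counts))] + (1 if rng.random() < mu else 0)
            for _ in range(n_t)
        ]
    return t, counts


if __name__ == "__main__":
    generations, counts = simulate_growth(a=0.0147, mu=0.0008, final_size=2000)
    print(generations, sum(counts) / len(counts), max(counts))
```

The resulting distribution of per-cell mutation counts is the quantity that would be compared with the observed colonies.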
Test the heavy-tailed distribution
To test whether the de novo mutation number distribution of an observed TB lineage is a heavy-tailed distribution, we sorted the mutation number of observed data and simulated data from lowest to highest separately and used a hypergeometric test to compare the 0.9-quantile mutation number of the observed data and simulated data.
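The text does not spell out how the hypergeometric test was set up, so the following Python sketch shows only one plausible reading, with every detail an assumption: the 0.9-quantile of the simulated counts is taken as a threshold, and the test asks whether the observed colonies contain more values above that threshold than expected if observed and simulated counts were drawn from the same pool. Function names are hypothetical.

```python
import math


def quantile(values, q):
    """Empirical q-quantile using the lowest value with cumulative share >= q."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(q * len(ordered)) - 1)]


def heavy_tail_p_value(observed, simulated, q=0.9):
    """One-sided hypergeometric P value for an excess of observed values in the tail."""
    threshold = quantile(simulated, q)
    x = sum(v > threshold for v in observed)          # observed values in the tail
    big_k = x + sum(v > threshold for v in simulated)  # tail values in the pooled data
    n = len(observed)
    big_m = n + len(simulated)
    total = math.comb(big_m, n)
    upper = min(big_k, n)
    tail = sum(
        math.comb(big_k, i) * math.comb(big_m - big_k, n - i)
        for i in range(x, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    simulated = list(range(100))
    observed = [3, 5, 8, 95, 97, 99, 120, 150, 4, 6]
    print(heavy_tail_p_value(observed, simulated))
```

A small P value would indicate that the observed lineage has more high-mutation colonies than the neutral simulation predicts.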
Ethics statement
This study was approved by the Research Ethics Review Committee of the Shanghai Public Health Clinical Center, Fudan University.
SUPPLEMENTARY MATERIALS
Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/22/eaba4901/DC1
REFERENCES AND NOTES
1. M. Gengenbacher, S. H. Kaufmann, Mycobacterium tuberculosis: Success through dormancy. FEMS Microbiol. Rev. 36, 514–532 (2012).
2. N. Dookie, S. Rambaran, N. Padayatchi, S. Mahomed, K. Naidoo, Evolution of drug resistance in Mycobacterium tuberculosis: A review on the molecular determinants of resistance and implications for personalized care. J. Antimicrob. Chemother. 73, 1138–1151 (2018).
3. R. R. Kempker, M. Kipiani, V. Mirtskhulava, N. Tukvadze, M. J. Magee, H. M. Blumberg, Acquired drug resistance in Mycobacterium tuberculosis and poor outcomes among patients with multidrug-resistant tuberculosis. Emerg. Infect. Dis. 21, 992–1001 (2015).
4. M. McGrath, N. C. Gey van Pittius, P. D. van Helden, R. M. Warren, D. F. Warner, Mutation rate and the emergence of drug resistance in Mycobacterium tuberculosis. J. Antimicrob. Chemother. 69, 292–302 (2014).
5. C. B. Ford, R. R. Shah, M. K. Maeda, S. Gagneux, M. B. Murray, T. Cohen, J. C. Johnston, J. Gardy, M. Lipsitch, S. M. Fortune, Mycobacterium tuberculosis mutation rate estimates from different lineages predict substantial differences in the emergence of drug-resistant tuberculosis. Nat. Genet. 45, 784–790 (2013).
6. C. B. Ford, P. L. Lin, M. R. Chase, R. R. Shah, O. Iartchouk, J. Galagan, N. Mohaideen, T. R. Ioerger, J. C. Sacchettini, M. Lipsitch, J. L. Flynn, S. M. Fortune, Use of whole genome sequencing to estimate the mutation rate of Mycobacterium tuberculosis during latent infection. Nat. Genet. 43, 482–486 (2011).
7. V. Eldholm, G. Norheim, B. von der Lippe, W. Kinander, U. R. Dahle, D. A. Caugant, T. Mannsåker, A. T. Mengshoel, A. M. Dyrhol-Riise, F. Balloux, Evolution of extensively drug-resistant Mycobacterium tuberculosis from a susceptible ancestor in a single patient. Genome Biol. 15, 490 (2014).
8. A. Trauner, Q. Liu, L. E. Via, X. Liu, X. Ruan, L. Liang, H. Shi, Y. Chen, Z. Wang, R. Liang, W. Zhang, W. Wei, J. Gao, G. Sun, D. Brites, K. England, G. Zhang, S. Gagneux, C. E. Barry III, Q. Gao, The within-host population dynamics of Mycobacterium tuberculosis vary with treatment efficacy. Genome Biol. 18, 71 (2017).
9. X. Didelot, A. S. Walker, T. E. Peto, D. W. Crook, D. J. Wilson, Within-host evolution of bacterial pathogens. Nat. Rev. Microbiol. 14, 150–162 (2016).
10. Q. Liu, L. E. Via, T. Luo, L. Liang, X. Liu, S. Wu, Q. Shen, W. Wei, X. Ruan, X. Yuan, G. Zhang, C. E. Barry III, Q. Gao, Within patient microevolution of Mycobacterium tuberculosis correlates with heterogeneous responses to treatment. Sci. Rep. 5, 17507 (2015).
11. C. Colijn, T. Cohen, A. Ganesh, M. Murray, Spontaneous emergence of multiple drug resistance in tuberculosis before and during therapy. PLOS ONE 6, e18327 (2011).
12. T. D. Lieberman, D. Wilson, R. Misra, L. L. Xiong, P. Moodley, T. Cohen, R. Kishony, Genomic diversity in autopsy samples reveals within-host dissemination of HIV-associated Mycobacterium tuberculosis. Nat. Med. 22, 1470–1474 (2016).
13. G. L. Hobby, A. P. Holman, M. D. Iseman, J. M. Jones, Enumeration of tubercle bacilli in sputum of patients with pulmonary tuberculosis. Antimicrob. Agents Chemother. 4, 94–104 (1973).
14. R. Singhal, V. P. Myneedu, Microscopy as a diagnostic tool in pulmonary tuberculosis. Int. J. Mycobacteriol. 4, 1–6 (2015).
15. I. Comas, M. Coscolla, T. Luo, S. Borrell, K. E. Holt, M. Kato-Maeda, J. Parkhill, B. Malla, S. Berg, G. Thwaites, D. Yeboah-Manu, G. Bothamley, J. Mei, L. Wei, S. Bentley, S. R. Harris, S. Niemann, R. Diel, A. Aseffa, Q. Gao, D. Young, S. Gagneux, Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans. Nat. Genet. 45, 1176–1182 (2013).
16. M. Gan, Q. Liu, C. Yang, Q. Gao, T. Luo, Deep whole-genome sequencing to detect mixed infection of Mycobacterium tuberculosis. PLOS ONE 11, e0159029 (2016).
17. T. M. Walker, C. L. C. Ip, R. H. Harrell, J. T. Evans, G. Kapatai, M. J. Dedicoat, D. W. Eyre, D. J. Wilson, P. M. Hawkey, D. W. Crook, J. Parkhill, D. Harris, A. S. Walker, R. Bowden, P. Monk, E. G. Smith, T. E. Peto, Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: A retrospective observational study. Lancet Infect. Dis. 13, 137–146 (2013).
18. C. J. Martin, A. M. Cadena, V. W. Leung, P. L. Lin, P. Maiello, N. Hicks, M. R. Chase, J. L. Flynn, S. M. Fortune, Digitally barcoding Mycobacterium tuberculosis reveals in vivo infection dynamics in the macaque model of tuberculosis. MBio 8, e00312–17 (2017).
19. D. B. Folkvardsen, A. Norman, Å. B. Andersen, E. Michael Rasmussen, L. Jelsbak, T. Lillebaek, Genomic epidemiology of a major Mycobacterium tuberculosis outbreak: Retrospective cohort study in a low-incidence setting using sparse time-series sampling. J Infect Dis 216, 366–374 (2017).
20. A. Roetzer, R. Diel, T. A. Kohl, C. Rückert, U. Nübel, J. Blom, T. Wirth, S. Jaenicke, S. Schuback, S. Rüsch-Gerdes, P. Supply, J. Kalinowski, S. Niemann, Whole genome sequencing versus traditional genotyping for investigation of a Mycobacterium tuberculosis outbreak: A longitudinal molecular epidemiological study. PLOS Med. 10, e1001387 (2013).
21. Y. Xu, I. Cancino-Munoz, M. Torres-Puente, L. M. Villamayor, R. Borrás, M. Borrás-Máñez, M. Bosque, J. J. Camarena, E. Colomer-Roig, J. Colomina, I. Escribano, O. Esparcia-Rodríguez, A. Gil-Brusola, C. Gimeno, A. Gimeno-Gascón, B. Gomila-Sard, D. González-Granda, N. Gonzalo-Jiménez, M. R. Guna-Serrano, J. L. López-Hontangas, C. Martín-González, R. Moreno-Muñoz, D. Navarro, M. Navarro, N. Orta, E. Pérez, J. Prat, J. C. Rodríguez, M. M. Ruiz-García, H. Vanaclocha, C. Colijn, I. Comas, High-resolution mapping of tuberculosis transmission: Whole genome sequencing and phylogenetic modelling of a cohort from Valencia Region, Spain. PLOS Med. 16, e1002961 (2019).
22. D. Hartl, A. Clark, Principles of Population Genetics, Fourth Edition. (Sinauer Associates, Inc. Publishers, ed. 4, 2007).
23. D. A. Kreutzer, J. M. Essigmann, Oxidized, deaminated cytosines are a source of C → T transitions in vivo. Proc. Natl. Acad. Sci. U.S.A. 95, 3578–3582 (1998).
24. F. Hutchinson, Induction of tandem-base change mutations. Mutat. Res. 309, 11–15 (1994).
25. T. M. Reid, L. A. Loeb, Tandem double CC→TT mutations are produced by reactive oxygen species. Proc. Natl. Acad. Sci. U.S.A. 90, 3904–3907 (1993).
26. G. Sun, T. Luo, C. Yang, X. Dong, J. Li, Y. Zhu, H. Zheng, W. Tian, S. Wang, C. E. Barry III, J. Mei, Q. Gao, Dynamic population changes in Mycobacterium tuberculosis during acquisition and fixation of drug resistance in patients. J Infect Dis 206, 1724–1733 (2012).
27. P. L. Lin, C. B. Ford, M. T. Coleman, A. J. Myers, R. Gawande, T. Ioerger, J. Sacchettini, S. M. Fortune, J. L. Flynn, Sterilization of granulomas is common in active and latent tuberculosis despite within-host variability in bacterial killing. Nat. Med. 20, 75–79 (2014).
28. A. M. Cadena, S. M. Fortune, J. L. Flynn, Heterogeneity in tuberculosis. Nat. Rev. Immunol. 17, 691–702 (2017).
29. R. Hershberg, M. Lipatov, P. M. Small, H. Sheffer, S. Niemann, S. Homolka, J. C. Roach, K. Kremer, D. A. Petrov, M. W. Feldman, S. Gagneux, High functional diversity in Mycobacterium tuberculosis driven by genetic drift and human demography. PLOS Biol. 6, e311 (2008).
30. C. S. Pepperell, A. M. Casto, A. Kitchen, J. M. Granka, O. E. Cornejo, E. C. Holmes, B. Birren, J. Galagan, M. W. Feldman, The role of selection in shaping diversity of natural M. tuberculosis populations. PLOS Pathog. 9, e1003543 (2013).
31. M. T. Coleman, P. Maiello, J. Tomko, L. J. Frye, D. Fillmore, C. Janssen, E. Klein, P. L. Lin, Early changes by 18Fluorodeoxyglucose positron emission tomography coregistered with computed tomography predict outcome after Mycobacterium tuberculosis infection in cynomolgus macaques. Infect. Immun. 82, 2400–2404 (2014).
32. C. Yang, T. Luo, X. Shen, J. Wu, M. Gan, P. Xu, Z. Wu, S. Lin, J. Tian, Q. Liu, Z. Yuan, J. Mei, K. DeRiemer, Q. Gao, Transmission of multidrug-resistant Mycobacterium tuberculosis in Shanghai, China: A retrospective observational study using whole-genome sequencing and epidemiological investigation. Lancet Infect. Dis. 17, 275–284 (2017).
33. G. Canetti, The Tubercle Bacillus in the Pulmonary Lesion of Man: Histobacteriology and its Bearing on the Therapy of Pulmonary Tuberculosis (Springer Publishing Company, 1955).
34. D. B. Young, K. Duncan, Prospects for new interventions in the treatment and prevention of mycobacterial disease. Annu. Rev. Microbiol. 49, 641–673 (1995).
35. Q. Liu, A. Ma, L. Wei, Y. Pang, B. Wu, T. Luo, Y. Zhou, H. X. Zheng, Q. Jiang, M. Gan, T. Zuo, M. Liu, C. Yang, L. Jin, I. Comas, S. Gagneux, Y. Zhao, C. S. Pepperell, Q. Gao, China’s tuberculosis epidemic stems from historical expansion of four strains of Mycobacterium tuberculosis. Nat. Ecol. Evol. 2, 1982–1992 (2018).
36. D. C. Koboldt, Q. Zhang, D. E. Larson, D. Shen, M. D. McLellan, L. Lin, C. A. Miller, E. R. Mardis, L. Ding, R. K. Wilson, VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
37. K. Tamura, G. Stecher, D. Peterson, A. Filipski, S. Kumar, MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013).
Acknowledgments: We thank L.-D. Lyu (School of Basic Medical Sciences, Fudan University) and C. Yang (School of Public Health, Yale University) for the fruitful discussions. Funding: This work was supported by the National Natural Science Foundation of China (91631301 and 81661128043 to Q.G., 81701975 to Q.L., and 31771416 to X.L.), the National Science and Technology Major Project of China (2017ZX10201302 to Q.G. and 2018ZX10714002-001-005 to Z.Z.), the Sanming Project of Medicine in Shenzhen (SZSM201611030 to Q.G.), European Research Council 638553-TB-ACCELERATE (to I.C.), the Key Research Program of the Chinese Academy of Sciences (KFZD-SW-220-1 to X.L.), and the CAS Light of West China Program (to X.L.). Y.F. is supported in part by NIH R01HG009524. Support was also received from NIH awards P01 AI132130 and AI142793 to S.M.F. Author contributions: Q.L. and Q.G. designed and implemented the study. Q.G. and S.M.F. supervised this work. Y.-X.F. provided important suggestions and consultation during the design of this study. M.W., Y.Lu, and F.L. recruited patients and collected sputum samples. J.W., J.S., X.Q., Z.Z., and H.W. performed the culture experiments to obtain single colonies. Q.L., M.G., and M.G.L. analyzed the sequencing reads and performed the genetic analysis. Y.Li and X.L. designed and conducted the mathematical simulation of population growth of Mtb. Q.J. performed statistical analysis. Q.L., I.C., H.E.T., S.M.F., and Q.G. drafted the manuscript. All authors critically reviewed and approved the final version of the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors. Sequencing reads have been submitted to the European Nucleotide Archive (EMBL-EBI) under study accession PRJEB34582 and PRJEB34609. The analysis scripts used in this study are available online at GitHub (https://github.com/StopTB/Single_Colony_Project).
Submitted 11 December 2019
Accepted 25 March 2020
Published 29 May 2020
10.1126/sciadv.aba4901
Citation: Q. Liu, J. Wei, Y. Li, M. Wang, J. Su, Y. Lu, M. G. López, X. Qian, Z. Zhu, H. Wang, M. Gan, Q. Jiang, Y.-X. Fu, H. E. Takiff, I. Comas, F. Li, X. Lu, S. M. Fortune, Q. Gao, Mycobacterium tuberculosis clinical isolates carry mutational signatures of host immune environments. Sci. Adv. 6, eaba4901 (2020).
Science Advances (ISSN 2375-2548) is published by the American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. The title Science Advances is a registered trademark of AAAS.