Clonal Evolution of Acute Myeloid Leukemia Revealed by High-Throughput Single-Cell Genomics

Kiyomi Morita1,8*, Feng Wang2*, Katharina Jahn6*, Jack Kuipers6, Yuanqing Yan7, Jairo Matthews1, Latasha Little2, Curtis Gumbs2, Shujuan Chen2, Jianhua Zhang2, Xingzhi Song2, Erika Thompson3, Keyur Patel4, Carlos Bueso-Ramos4, Courtney D DiNardo1, Farhad Ravandi1, Elias Jabbour1, Michael Andreeff1, Jorge Cortes1, Marina Konopleva1, Kapil Bhalla1, Guillermo Garcia-Manero1, Hagop Kantarjian1, Niko Beerenwinkel6†, Nicholas Navin3,5, P Andrew Futreal2† and Koichi Takahashi1,2†

Departments of 1Leukemia, 2Genomic Medicine, 3Genetics, 4Hematopathology, 5Bioinformatics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
6Department of Biosystems Science and Engineering, Swiss Federal Institute of Technology in Zurich, Zurich, Switzerland
7Department of Neurosurgery, The University of Texas Health Science Center at Houston, Houston, Texas, USA
8Department of Hematology and Oncology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan

*These authors contributed equally to this work.

†Correspondence to:
Niko Beerenwinkel, Ph.D.
Department of Biosystems Science and Engineering, Swiss Federal Institute of Technology in Zurich, Mattenstrasse 26, 4058, Basel, Switzerland; Email: [email protected]

P Andrew Futreal, Ph.D.
Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, 1881 East Road, Unit 1954, Houston, TX 77054, USA; Email: [email protected]

Koichi Takahashi, M.D.
Department of Leukemia and Genomic Medicine, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Unit 428, Houston, TX 77030, USA; Email: [email protected]

Summary
One of the pervasive features of cancer is the diversity of mutations found in malignant cells within the same tumor, a phenomenon called clonal diversity or intratumor heterogeneity. Clonal diversity allows tumors to adapt to the selective pressure of treatment and likely contributes to the development of treatment resistance and cancer recurrence. Thus, the ability to precisely delineate the clonal substructure of a tumor, including the evolutionary history of its development and the co-occurrence of its mutations, is necessary to understand and overcome treatment resistance. However, DNA sequencing of bulk tumor samples cannot accurately resolve complex clonal architectures. Here, we performed high-throughput single-cell DNA sequencing to quantitatively assess the clonal architecture of acute myeloid leukemia (AML). We sequenced a total of 556,951 cells from 77 patients with AML for 19 genes known to be recurrently mutated in AML. The data revealed clonal relationships among AML driver mutations and identified mutations that often co-occurred (e.g., NPM1/FLT3-ITD, DNMT3A/NPM1, SRSF2/IDH2, and WT1/FLT3-ITD) and those that were mutually exclusive (e.g., NRAS/KRAS, FLT3-D835/ITD, and IDH1/IDH2) at single-cell resolution. Reconstruction of the tumor phylogeny uncovered a history of tumor development characterized by linear and branching clonal evolution patterns, with the latter involving functional convergence of separately evolved clones. Analysis of longitudinal samples revealed remodeling of clonal architecture in response to therapeutic pressure that is driven by clonal selection. Furthermore, in this AML cohort, higher clonal diversity (≥4 subclones) was associated with significantly worse overall survival. These data portray the clonal relationships, architecture, and evolution of AML driver genes with unprecedented resolution, and illuminate the role of clonal diversity in therapeutic resistance, relapse, and clinical outcome in AML.
Main
A growing body of evidence supports the role of clonal diversity in therapeutic resistance, recurrence, and poor outcomes in cancer 1. Clonal diversity also reflects the history of the accumulation of somatic mutations within a tumor. Thus, a precise characterization of clonal diversity reveals not only the extent of a tumor’s clonal complexity but also the evolutionary history of the tumor’s development. Much of the work characterizing the clonal architecture of tumors has been done by computational inference using variant allele fraction (VAF) data from massively parallel DNA sequencing of bulk tumor samples 2,3. However, the ability to infer clonal heterogeneity and tumor phylogeny from bulk sequencing data is inherently limited, because bulk sequencing techniques cannot reliably infer mutation co-occurrences and hence often fail in reconstructing clonal substructure.
Single-cell DNA sequencing (scDNA-seq) can address some of these challenges 4-8. However, until recently, the available methods required laborious single-cell isolation protocols and suffered from low cell throughput, limited gene coverage, and technical artifacts from whole-genome amplification that hindered their ability to characterize clonal architecture with precision 9. Recent technological advances now allow rapid single-cell genotyping of targeted cancer-related genes in thousands of cells. We previously described the performance and feasibility of a new scDNA-seq platform (Tapestri®, Mission Bio, Inc.) in primary samples from 2 patients with acute myeloid leukemia (AML) 10. Here, using this method, we conducted scDNA-seq in 91 AML samples from 77 patients and uncovered the landscape of AML clonal architecture at single-cell resolution. Using the data, we reconstructed the mutational history of driver genes, some of which are therapeutic targets, and identified both linear and branching clonal evolution patterns in AML. Additionally, we studied dynamic changes of clonal architecture in response to therapies and analyzed the clinical implications of clonal diversity in AML.

The landscape of driver mutations in AML at single-cell resolution
We analyzed bone marrow mononuclear cells (BMNC) from 77 AML patients, of whom 64 (83%) were previously untreated and 13 (17%) had relapsed or refractory disease (detailed clinical characteristics are summarized in Supplementary Table 1). The cohort was enriched with samples with normal diploid karyotype (N = 68, 88%) to avoid allelic imbalance affecting the genotype calling. The median bone marrow blast percentage was 44% (interquartile range [IQR]: 29%-67%). A median of 7,584 BMNC (IQR: 6,194-8,361) per sample were sequenced by the scDNA-seq platform (Fig. 1a). Across 40 amplicons targeting 19 known AML driver genes, scDNA-seq resulted in a median of 25× coverage per amplicon per cell (IQR: 12×-43×, Extended Data Fig. 1). The amplicons covering guanine-cytosine–rich sequences, such as GATA2, SRSF2, and parts of RUNX1 and TP53, had lower coverage than others, such that relatively large numbers of cells had inconclusive genotype information for the mutations covered by these amplicons (Extended Data Fig. 2). The estimated median allele dropout (ADO) rate was 4.7% (IQR: 3.6%-5.7%) (Extended Data Fig. 3). The estimated lower limit of detection (LOD) of the platform was 0.1% of the cellular population, based on a serial dilution assay of a cell line and on mutation validation by droplet digital PCR (Supplementary Table 2 and Extended Data Fig. 4).
In total, we sequenced 556,951 BMNC from 77 AML patient samples (Fig. 1b). The scDNA-seq approach detected 331 somatic mutations in 19 cancer genes, which included 238 (72%) single-nucleotide variants (SNV) and 93 (28%) small indels. Among those, 314 mutations (95%) were orthogonally validated: 274 (87%) by conventional bulk next-generation sequencing 11 (bulk-seq, median 407×), 29 (9%) by droplet digital PCR, and 11 (4%) by a quantitative PCR assay (all FLT3-internal tandem duplications [ITD]). Therefore, the subsequent analyses used a final set of 314 validated mutations (Supplementary Table 3). Of note, among the shared genomic regions covered by the scDNA-seq and the bulk-seq platforms, all 274 (100%) mutations called by bulk-seq were also detected by scDNA-seq. The VAF from bulk-seq (bulk VAF) and the VAF inferred from the scDNA-seq data (scDNA-seq VAF) showed good concordance (rs = 0.78, p < 0.001), suggesting that the sequenced cells are a good representation of the total bulk samples (Fig. 1c and Extended Data Fig. 5).
The most frequently detected mutations by scDNA-seq in the 77 patients were in FLT3 (N = 37, 48%; 30 [39%] with ITD and 16 [21%] with non-ITD mutations), followed by NRAS
finding, and variant calling. Loom files that were generated by the pipeline via GATK-based haplotype calling were then processed using in-house filtering criteria. We included cells for downstream analysis that met the following criteria for genotyping: total read count (depth, DP) ≥ 10× and alternative allele count ≥ 3 (scVAF ≥ 15% if 20× ≤ DP ≤ 99×; scVAF ≥ 10% if DP ≥ 100×). Cells that did not satisfy these criteria were considered to have missing genotypes.
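The depth and allele-count thresholds above can be written as a small predicate. The following is a minimal sketch, not the in-house filter itself: the function name and its arguments are hypothetical, and it covers only whether a variant call in one cell at one locus meets the stated thresholds. The text gives no scVAF threshold for 10× ≤ DP < 20×, so the sketch assumes that only the allele-count rule applies there (three alternative reads at a depth below 20× already correspond to an scVAF above 15%).

```python
def passes_genotype_filter(depth: int, alt_count: int) -> bool:
    """Check one variant call in one cell against the stated thresholds.

    depth     -- total read count (DP) at the locus in this cell
    alt_count -- number of reads supporting the alternative allele
    """
    # Minimum depth and minimum alternative-allele support.
    if depth < 10 or alt_count < 3:
        return False
    # Depth-dependent scVAF thresholds, compared with integer arithmetic
    # to avoid floating-point edge cases at exactly 10% or 15%.
    if depth >= 100:
        return alt_count * 100 >= 10 * depth  # scVAF >= 10%
    if depth >= 20:
        return alt_count * 100 >= 15 * depth  # scVAF >= 15%
    # 10x <= depth < 20x: no scVAF threshold is stated in the text (assumption:
    # none is applied); alt_count >= 3 already implies scVAF > 15% here.
    return True


def genotype_or_missing(depth: int, alt_count: int) -> str:
    """Label a call 'called' if it passes, otherwise 'missing'."""
    return "called" if passes_genotype_filter(depth, alt_count) else "missing"
```

For example, a call with DP = 50 and 7 alternative reads (scVAF 14%) would be treated as missing, whereas 8 alternative reads (16%) would pass.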
The ADO rate was calculated on the basis of common SNP information using 10 amplicons designed to cover 10 highly polymorphic loci in the Tapestri Single-Cell DNA AML Panel.
Mutation detection by bulk sequencing
As an orthogonal validation, all samples were concurrently sequenced by conventional bulk next-generation sequencing (NGS) using target-capture deep sequencing (N = 66, median coverage: 432×, IQR: 283×-610×) or whole-exome sequencing (N = 11, median coverage: 146×, IQR: 86×-158×). Target-capture NGS was performed using a SureSelect (Agilent Technologies) custom panel of 295 genes that are recurrently mutated in hematological malignancies (Supplementary Table 5). Detailed methods were previously described 11. Briefly, genomic DNA was extracted using an Autopure extractor (QIAGEN/Gentra) and was fragmented and bait-captured in solution according to the manufacturer’s protocols. Captured DNA libraries were then sequenced using a HiSeq 2000 sequencer (Illumina) with 76-bp paired-end reads. Whole-exome sequencing was performed using SureSelect V4 exome probes (Agilent Technologies) and a HiSeq 2000 sequencer (Illumina) with 76-bp paired-end reads. Modified Mutect and Pindel algorithms were used for mutation calling as described previously 11.
Comparison of genotype results from scDNA-seq and bulk sequencing
To determine how the models of clonal architecture obtained using the 2 sequencing methods differed, we compared the VAF from bulk sequencing (bulk VAF) and the VAF from single-cell genotype data (scDNA-seq VAF). The scDNA-seq VAF was calculated from the sequencing reads pooled across single cells as follows: (number of single-cell sequencing reads with the alternate allele) / (total number of single-cell sequencing reads).
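As an illustration of the pooled calculation, the sketch below sums alternate-allele reads and total reads over all cells for a single variant before dividing. The function name and the (alt_reads, total_reads) input layout are hypothetical choices made for this example.

```python
def pooled_scdna_vaf(cells):
    """Pooled scDNA-seq VAF for one variant.

    cells -- iterable of (alt_reads, total_reads) pairs, one pair per cell
    """
    cells = list(cells)  # allow generators; the data are traversed twice
    alt = sum(a for a, _ in cells)
    total = sum(t for _, t in cells)
    # Undefined when no reads cover the locus in any cell.
    return alt / total if total else float("nan")


# Three cells: heterozygous-looking, wild type, heterozygous-looking.
example = [(10, 20), (0, 30), (25, 50)]
print(pooled_scdna_vaf(example))  # 35 / 100 = 0.35
```

Note that pooling reads weights each cell by its depth, so the result generally differs from the mean of the per-cell VAFs (here 1/3).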
Inference of mutational histories
We used the SCITE (Single Cell Inference of Tumor Evolution) software to infer phylogenetic trees of the driver mutations from scDNA-seq data as previously described 26. SCITE is an MCMC-based Bayesian inference scheme that can be used to find a mutation tree (a partial temporal order of mutations) that best fits the observed single-cell genotypes. This focus on the mutation tree (as opposed to a cell lineage tree) makes SCITE very efficient for our data, which are characterized by few mutational events and many cells.
SCITE operates with 2 parameters, one for the false positive rate (FPR) and one for the false negative rate (FNR), which can be either set to predefined values or inferred in the MCMC model along with the tree structure. We used a global estimate of the sequencing error rate as the FPR (1%) and dataset-specific estimates of the dropout rate (ADO provided by the platform) as the FNR. In cases where no dropout rate was estimated, we let SCITE learn the value from the data by giving it the average value of the estimates across all patients as a prior estimate. We ran SCITE separately for each patient, providing the table of mutation calls as the input (encoding 0 for wild-type, 1 for mutation, and 3 for missing data point). To obtain a robust model, we ran SCITE with 4 different combinations of parameters: 1) all cells, including those with missing genotype information, with 1% FPR and SCITE-inferred FNR; 2) all cells, including those with missing genotype information, with 1% FPR and platform-provided FNR; 3) only cells with full genotype information, with 1% FPR and SCITE-inferred FNR; and 4) only cells with full genotype information, with 1% FPR and platform-provided FNR. When provided with an incomplete genotype for a cell, SCITE is still able to use the partial genotyping information in the tree inference and assigns cells to subclones based on the available information.
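The input encoding and the 4 parameter combinations can be sketched as follows. This is an illustrative example only: the helper name, the per-cell dictionary layout, and the configuration keys are hypothetical, and the matrix orientation (one row per mutation, one column per cell) is an assumption about the expected input rather than something stated in the text.

```python
from itertools import product

# Genotype codes stated in the text.
WILD_TYPE, MUTATED, MISSING = 0, 1, 3


def encode_matrix(calls, mutations, complete_only=False):
    """Encode per-cell calls as a 0/1/3 matrix.

    calls         -- list of dicts, one per cell, mapping mutation name to
                     True (mutated), False (wild type) or None (missing)
    mutations     -- ordered list of mutation names
    complete_only -- if True, keep only cells with no missing genotype
    """
    cells = [
        c for c in calls
        if not complete_only or all(c.get(m) is not None for m in mutations)
    ]

    def code(value):
        if value is None:
            return MISSING
        return MUTATED if value else WILD_TYPE

    # Assumed orientation: rows are mutations, columns are cells.
    return [[code(c.get(m)) for c in cells] for m in mutations]


# The 4 model settings, in the order listed in the text.
configs = [
    {"complete_only": complete_only, "fnr": fnr, "fpr": 0.01}
    for complete_only, fnr in product((False, True), ("inferred", "platform"))
]

calls = [{"NPM1": True, "FLT3-ITD": None}, {"NPM1": False, "FLT3-ITD": True}]
print(encode_matrix(calls, ["NPM1", "FLT3-ITD"]))                      # [[1, 0], [3, 1]]
print(encode_matrix(calls, ["NPM1", "FLT3-ITD"], complete_only=True))  # [[0], [1]]
```

In this sketch, `configs[1]` corresponds to model 2 (all cells, 1% FPR, platform-provided FNR).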
The inference procedure underlying SCITE is fully Bayesian, which allowed us to quantify uncertainty in the inferred clonal architectures by sampling trees from the model’s posterior distribution. We summarized the sampled trees by reporting 95% credible intervals for the inferred subclones.
The tree structure (branching vs. linear) was mostly consistent among the 4 models (47 of 76 [62%] cases showing consistent tree structure, Extended Data Fig. 12). Phylogeny figures that are shown in Fig. 3 are based on model 2 (all cells, 1% FPR, and platform-provided FNR). For longitudinal samples, we combined the scDNA-seq data from all time points from the same patient, ran SCITE on the pooled data, and reconstructed the tumor phylogeny. To obtain time point-specific estimates of subclone sizes, we performed the cell-to-subclone assignment in the posterior sampling separately for each time point. Because in some cases not all mutations were observed at all time points, we adjusted the assignment probabilities such that a cell cannot be placed below any mutation unobserved at the cell’s sampling time. This leads to subclones with a temporary prevalence of 0%. This does not necessarily mean that the subclone was non-existent or extinct at that time, but simply reflects the lack of evidence for its existence based on the cells sampled at the respective time point. The number of subclones was defined as the number of distinct cellular populations carrying at least one mutation based on model 2.
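The time point-specific adjustment can be illustrated with a small sketch. The text does not describe how the adjustment was implemented, so the following is only one plausible reading: attachment probabilities for nodes that lie at or below a mutation unobserved at the cell’s sampling time are set to zero and the remainder is renormalized. The tree representation (a child-to-parent map), the "root" label for the wild-type attachment point, the function names, and the example probabilities are all hypothetical.

```python
def allowed_nodes(parent, observed):
    """Mutation nodes whose whole ancestry, including the node itself,
    consists of mutations observed at the time point.

    parent   -- dict mapping each mutation to its parent mutation
                (None for a mutation attached directly to the root)
    observed -- set of mutations observed at the cell's sampling time
    """
    allowed = set()
    for node in parent:
        current, ok = node, True
        while current is not None:
            if current not in observed:
                ok = False
                break
            current = parent[current]
        if ok:
            allowed.add(node)
    return allowed


def mask_and_renormalize(probs, parent, observed, root="root"):
    """Zero out disallowed attachment points and renormalize the rest."""
    keep = allowed_nodes(parent, observed) | {root}
    masked = {n: (p if n in keep else 0.0) for n, p in probs.items()}
    total = sum(masked.values())
    return {n: (p / total if total else 0.0) for n, p in masked.items()}


# Hypothetical tree: DNMT3A -> NPM1 -> {FLT3-ITD, NRAS}
parent = {"DNMT3A": None, "NPM1": "DNMT3A", "FLT3-ITD": "NPM1", "NRAS": "NPM1"}
probs = {"root": 0.1, "DNMT3A": 0.2, "NPM1": 0.3, "FLT3-ITD": 0.3, "NRAS": 0.1}
# FLT3-ITD was not observed at this time point.
adjusted = mask_and_renormalize(probs, parent, {"DNMT3A", "NPM1", "NRAS"})
print(adjusted["FLT3-ITD"])  # 0.0
```

Under this reading, a subclone defined by an unobserved mutation receives no cells at that time point, which is how a temporary prevalence of 0% can arise.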
SNP array
Genomic DNA from 28 samples in which scDNA-seq data showed at least 5% of homozygously mutated clones was analyzed using the Illumina Omni2.5-8 SNP array. The raw data retrieved from the array were processed using GenomeStudio 2.0. The raw log R ratio and B allele frequency were used as input to the ASCAT (allele-specific copy number analysis of tumors) algorithm 37 to identify copy-number alterations.
Droplet digital PCR
We performed droplet digital PCR (ddPCR) using the QX200™ Droplet Digital™ System (Bio-Rad Laboratories) to confirm the variants that were detected by scDNA-seq but were not detected by bulk NGS. ddPCR™ Supermix for Probes (No dUTP) was used with 50 ng of genomic DNA as a template for the ddPCR assay in a 96-well plate according to the manufacturer’s protocol. As a positive control, 7 ng of synthesized mutant DNA (designed through Bio-Rad Laboratories and ordered through Integrated DNA Technologies) in a background of 130 ng of normal human genomic DNA (Promega) was used. As a negative control, 50 ng of normal human genomic DNA (Promega) was used. Water was used instead of DNA for no-template control reactions. Each reaction was tested in duplicate. Variant-specific primers/probes (ddPCR™ Mutation Detection Assays, FAM/HEX for mutant/wildtype) were designed and ordered through Bio-Rad Laboratories and are summarized in Supplementary Table 6. Data were analyzed using QuantaSoft Analysis Pro software v1.0.596 (Bio-Rad Laboratories).
Statistical analysis
Categorical variables were compared using Chi-squared or Fisher’s exact tests. Continuous variables were analyzed by Student’s t-test or the Mann-Whitney U test, depending on whether the assumptions of each test were satisfied. Spearman’s rank correlation coefficient (rs) was used to assess the relationships between two continuous variables that did not follow a normal distribution. To evaluate cell-level co-occurrence and mutual exclusivity, a contingency table was constructed to compute the log2-transformed odds ratios. Fisher’s exact test was used to evaluate the statistical significance of associations. The Benjamini-Hochberg method was used to adjust for multiple testing 38. To assess the prognostic relevance of clonal heterogeneity, we collected survival information for the 64 previously untreated AML patients. Overall survival was calculated from the date of pretreatment sample collection to the date of death from any cause, and was censored on the date of last follow-up if the patient was alive. Patients who underwent stem cell transplantation were censored on the date of transplantation. Kaplan-Meier plots were used to visualize survival distributions. Differences in survival between groups were analyzed using log-rank tests. We considered a P value of less than 0.05 to be statistically significant. R (ver. 3.4.3) and EZR 39 software packages were used for statistical analysis.
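The cell-level co-occurrence analysis described above (a 2×2 contingency table, log2 odds ratio, Fisher’s exact test, and Benjamini-Hochberg adjustment) can be sketched in plain Python as follows. This is an illustrative re-implementation using only the standard library, not the R code used in the study. The function names are hypothetical, and the optional 0.5 pseudo-count that keeps the odds ratio finite when a table cell is zero is an assumption that the text does not mention. The exact test as written is intended for small tables.

```python
from math import comb, log2


def log2_odds_ratio(a, b, c, d, pseudo=0.5):
    """log2 odds ratio for a 2x2 table of cell counts.

    a -- cells with both mutations     b -- cells with only the first
    c -- cells with only the second    d -- cells with neither
    pseudo -- count added to every entry (assumption, not from the text)
    """
    return log2(((a + pseudo) * (d + pseudo)) / ((b + pseudo) * (c + pseudo)))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test: sum of the probabilities of all tables
    with the same margins that are no more likely than the observed one."""
    n, row1, col1 = a + b + c + d, a + b, a + c
    denom = comb(n, col1)

    def pmf(x):  # hypergeometric probability of x double-mutant cells
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = pmf(a)
    total = sum(p for p in (pmf(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))  # small tolerance for float ties
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value downwards
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        adjusted[i] = running
    return adjusted


print(fisher_exact_two_sided(3, 1, 1, 3))   # 34/70, about 0.486
print(log2_odds_ratio(0, 5, 5, 0) < 0)      # True: pattern of mutual exclusivity
```

A positive log2 odds ratio indicates co-occurrence within cells and a negative value indicates mutual exclusivity; the adjusted p-values play the role of the false discovery rate used to flag significant pairs.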
Code availability
Publicly available code used for data analysis is cited in the text. In-house code used for variant calling from single-cell sequencing data is available from the corresponding author on reasonable request.
Data availability
Deidentified clinical and genetic data are available in the supplementary information.
Acknowledgments
This study was supported in part by the Cancer Prevention and Research Institute of Texas (grant R120501 to PAF), the Welch Foundation (grant G-0040 to PAF), the University of Texas System STARS Award (grant PS100149 to PAF), the Physician Scientist Program at MD Anderson (to KT), the Lyda Hill Foundation (to PAF), the Charif Souki Cancer Research Fund (to HK), the MD Anderson Cancer Center Leukemia SPORE grant (NIH P50 CA100632) (to HK), the MD Anderson Cancer Center Support Grant (NIH/NCI P30 CA016672), Research Fellowships of the Japan Society for the Promotion of Science for Young Scientists (to KM), and generous philanthropic contributions to MD Anderson’s Moon Shot Program (to PAF, KT, GGM, and HK). We thank Amy Ninetto at the Department of Scientific Publications at MD Anderson for providing scientific editing of the manuscript. We also thank Charles Silver, Dennis Eastburn, Robert Durruthy-Durruthy, Matt Cato, Hannah Viernes, Anup Parikh, Sombeet Sahu, Kelly Kaihara, and all other members of Mission Bio Inc. for technical support.

Author contributions
KM performed the experiments, analyzed the data, and wrote the initial draft of the manuscript. KT designed the study and wrote the manuscript. KJ, JK, and NB performed the phylogenetic analysis. FW, JZ, and XS performed the bioinformatic analysis. YY performed the statistical analysis. JM collected samples. LL, CG, SC, and ET performed sequencing. KP and CBR performed pathologic analyses. CD, FR, EJ, MA, JC, MK, KB, GGM, and HK collected samples and treated patients. NN and PAF critically reviewed the manuscript. PAF and KT provided leadership and managed the study team. All authors read and approved the manuscript.
References
1 McGranahan, N. & Swanton, C. Clonal Heterogeneity and Tumor Evolution: Past, Present, and the Future. Cell 168, 613-628 (2017).
2 Welch, J. S. et al. The origin and evolution of mutations in acute myeloid leukemia. Cell 150, 264-278 (2012).
3 Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994-1007 (2012).
4 Paguirigan, A. L. et al. Single-cell genotyping demonstrates complex clonal diversity in acute myeloid leukemia. Sci Transl Med 7, 281re282 (2015).
5 Wang, Y. et al. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature 512, 155 (2014).
6 Potter, N. et al. Single cell analysis of clonal architecture in acute myeloid leukaemia. Leukemia (2018).
7 Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90 (2011).
8 Eirew, P. et al. Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution. Nature 518, 422-426 (2015).
9 Wang, Y. & Navin, N. E. Advances and Applications of Single-Cell Sequencing Technologies. Molecular Cell 58, 598-609 (2015).
10 Pellegrino, M. et al. High-throughput single-cell DNA sequencing of acute myeloid leukemia tumors with droplet microfluidics. Genome Res (2018).
11 Takahashi, K. et al. Preleukaemic clonal haemopoiesis and risk of therapy-related myeloid neoplasms: a case-control study. Lancet Oncol 18, 100-111 (2017).
12 Shouval, R. et al. Single cell analysis exposes intratumor heterogeneity and suggests that FLT3-ITD is a late event in leukemogenesis. Exp Hematol 42, 457-463 (2014).
13 Papaemmanuil, E. et al. Genomic Classification and Prognosis in Acute Myeloid Leukemia. N Engl J Med 374, 2209-2221 (2016).
14 The Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med 368, 2059-2074 (2013).
15 Mupo, A. et al. A powerful molecular synergy between mutant Nucleophosmin and Flt3-ITD drives acute myeloid leukemia in mice. Leukemia 27, 1917-1920 (2013).
16 Meyer, S. E. et al. DNMT3A Haploinsufficiency Transforms FLT3ITD Myeloproliferative Disease into a Rapid, Spontaneous, and Fully Penetrant Acute Myeloid Leukemia. Cancer Discovery 6, 501-515 (2016).
17 Yoshimi, A. et al. Spliceosomal Dysfunction Is a Critical Mediator of IDH2 Mutant Leukemogenesis. Blood 130, 473-473 (2017).
18 Dovey, O. M. et al. Molecular synergy underlies the co-occurrence patterns and phenotype of NPM1-mutant acute myeloid leukemia. Blood 130, 1911-1922 (2017).
19 Huang, Y. J. et al. RUNX1 Deficiency and SRSF2 Mutation Cooperate to Promote Myelodysplastic Syndrome Development. Blood 130, 119-119 (2017).
20 Vicent, S. et al. Wilms tumor 1 (WT1) regulates KRAS-driven oncogenesis and senescence in mouse and human models. J Clin Invest 120, 3940-3952 (2010).
21 Pronier, E. et al. Genetic and epigenetic evolution as a contributor to WT1-mutant leukemogenesis. Blood 132, 1265-1278 (2018).
22 Dvinge, H., Kim, E., Abdel-Wahab, O. & Bradley, R. K. RNA splicing factors as oncoproteins and tumour suppressors. Nat Rev Cancer 16, 413-430 (2016).
23 Bejar, R. et al. Validation of a prognostic model and the impact of mutations in patients with lower-risk myelodysplastic syndromes. J Clin Oncol 30, 3376-3382 (2012).
24 Cisowski, J., Sayin, V. I., Liu, M., Karlsson, C. & Bergo, M. O. Oncogene-induced senescence underlies the mutual exclusive nature of oncogenic KRAS and BRAF. Oncogene 35, 1328 (2015).
25 Unni, A. M., Lockwood, W. W., Zejnullahu, K., Lee-Lin, S. Q. & Varmus, H. Evidence that synthetic lethality underlies the mutual exclusivity of oncogenic KRAS and EGFR mutations in lung adenocarcinoma. Elife 4, e06907 (2015).
26 Jahn, K., Kuipers, J. & Beerenwinkel, N. Tree inference for single-cell data. Genome Biol 17, 86 (2016).
27 Shlush, L. I. et al. Identification of pre-leukaemic haematopoietic stem cells in acute leukaemia. Nature 506, 328-333 (2014).
28 Abelson, S. et al. Prediction of acute myeloid leukaemia risk in healthy individuals. Nature 559, 400-404 (2018).
29 Welch, J. S. Mutation position within evolutionary subclonal architecture in AML. Semin Hematol 51, 273-281 (2014).
30 Smith, C. C., Lin, K., Stecula, A., Sali, A. & Shah, N. P. FLT3 D835 mutations confer differential resistance to type II FLT3 inhibitors. Leukemia 29, 2390-2392 (2015).
31 Anderson, K. et al. Genetic variegation of clonal architecture and propagating cells in leukaemia. Nature 469, 356 (2010).
32 Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med 366, 883-892 (2012).
33 Campbell, P. J. et al. The patterns and dynamics of genomic instability in metastatic pancreatic cancer. Nature 467, 1109-1113 (2010).
34 Yates, L. R. et al. Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat Med 21, 751-759 (2015).
35 van Galen, P. et al. Single-Cell RNA-Seq Reveals AML Hierarchies Relevant to Disease Progression and Immunity. Cell 176, 1265-1281 e1224 (2019).
36 Gaiti, F. et al. Epigenetic evolution and lineage histories of chronic lymphocytic leukaemia. Nature 569, 576-580 (2019).
37 Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc Natl Acad Sci U S A 107, 16910-16915 (2010).
38 Benjamini, Y. & Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57, 289-300 (1995).
39 Kanda, Y. Investigation of the freely available easy-to-use software 'EZR' for medical statistics. Bone Marrow Transplantation 48, 452-458 (2013).

Figure Legends
Fig. 1. The genetic landscape of AML based on single-cell DNA sequencing. a, Distribution of the number of total sequenced cells. Each point represents a sample from a unique patient. b, Somatic mutations in 556,951 cells from 77 AML patients detected by single-cell DNA sequencing. Each column represents a cell, and cells from the same case are clustered together within the areas surrounded by the grey lines. Note that some cases are difficult to distinguish in print. Cells that were genotyped as being mutated or wild type for the indicated gene are colored in blue and white, respectively. Cells with missing genotypes are colored in grey. When one sample had multiple different mutations in the same gene, they were annotated differently (e.g., NRAS_a, NRAS_b). A total of 57,953 cells that were genotyped as wild type for all the variants screened are not shown. c, Correlation of the variant allele fraction (VAF) from bulk sequencing and single-cell DNA sequencing. The X-axis shows the VAF from the single-cell genotype data (scDNA-seq VAF). The Y-axis shows the VAF from the bulk next-generation sequencing (bulk VAF). Each dot represents a detected variant. The linear trendline was added to best fit the distribution of the dots. The shaded area around the trendline represents the 95% confidence interval. d, Cellular-level co-occurrence of DNMT3A, NPM1, and FLT3-ITD mutations. The heat map (left) shows the genotype of each sequenced cell for each variant, with clustering based on the genotypes of driver mutations. Each column represents a cell at the indicated scale. Cells with mutations and wild-type cells are indicated in blue and white, respectively. Cells with missing genotypes are indicated in grey. The subclones located to the right of the red line comprised <1% of the total sequenced cells; such small subclones can represent false-positive or false-negative genotypes resulting from ADO or multiplets. The figure on the right shows the pairwise association of mutations. The color and size of each panel represent the degree of the logarithmic odds ratio (log OR). The bar on the right side is a key indicating the association of the colors with the log OR. Co-occurrence and mutual exclusivity are indicated by red and blue, respectively. The statistical significance of the associations based on the false discovery rate (FDR) is indicated by asterisks (*FDR < 0.1, **FDR < 0.05, ***FDR < 0.001). e, Frequency of mutation combinations showing statistically significant cell-level co-occurrence (FDR < 0.001). The x-axis represents the combination of mutations based on mutated genes, and the y-axis shows the number of patients showing significant cell-level co-occurrence of each mutation combination. Mutation combinations that were detected in 3 or more patients are plotted. Bars are colored based on the frequency (red if significantly co-occurring in >10 patients, orange if 6-10 patients, green if 4-5 patients, blue if 3 patients). f, Circos plot showing the patterns of mutation co-occurrence for all genes based on the single-cell genotype data. When 2 mutations co-occurred in the same cell, a ribbon connects the genes. The width of each ribbon is proportional to the frequency of mutational events. g, Cellular-level co-occurrence of SF3B1 and SRSF2 mutations. h-l, Cell-level mutual exclusivity patterns of somatic mutations in individual samples for 5 representative cases. (h) KRAS and NRAS, (i) KIT and FLT3-ITD, (j) IDH1 and IDH2, (k) FLT3-non-ITD, FLT3-ITD, and NRAS, and (l) RUNX1 p.K152fs and RUNX1 p.D198N variants did not co-occur in the same cellular populations. Mut, mutated; WT, wild type; Missing, missing genotype.

Fig. 2. Homozygous variants involving copy-neutral loss of heterozygosity. a-d, Representative cases with highly homozygous variants analyzed by SNP array. The bar graphs on the left show the distribution of zygosity for each indicated variant. Cells that were genotyped as having heterozygous and homozygous mutations are shown in blue and red, respectively. The numbers on the bars represent the number of cells with each genotype. The figures in the middle show the distribution of the allele counts for the two alleles (green or red). The allele count is shown on the vertical axes, and the chromosomes are shown on the horizontal axes. The chromosomes on which the highly homozygous variants were located are highlighted with blue rectangles. Distributions of depth are shown in the figures on the right based on the genotype calling. Heat maps incorporating the zygosity information are also shown for cases with validated homozygosity. Copy-neutral loss of heterozygosity (CN-LOH) involving the homozygously called variant loci was detected by SNP array in cases with highly homozygous (a) RUNX1 p.Q355X and (b) FLT3-ITD variants. Cases with homozygously called (c) SRSF2 p.P95R and (d) NPM1 p.L287fs variants did not have CN-LOH or any other copy-number