Supplementary Materials
Extensive tissue-related and allele-related mtDNA heteroplasmy suggests positive selection for
somatic mutations
Mingkun Li, Roland Schröder, Shengyu Ni, Burkhard Madea, and Mark Stoneking
Supplementary Materials includes:
Supplementary Notes 1-2
Supplementary Figures 1-4
Supplementary Tables 1-8
Supplementary Note 1. Authenticity of the Results
In this note we consider several potential technical artifacts that might explain the tissue-specific
and allele-specific heteroplasmies. First, the postmortem interval between death and tissue sampling
varied from 24-72 hours, so heteroplasmies might reflect postmortem degradation of DNA (which
might be more pronounced in some tissues, such as liver). However, the correlation between the
number of heteroplasmies identified and the postmortem interval is not significant for either
individuals overall (Spearman’s r=0.075, P=0.364) or for each specific tissue (all p-values > 0.04).
Moreover, postmortem damage should occur randomly with respect to positions in the sequence,
whereas we find distinctive nonrandom patterns, such as more heteroplasmies in the control region
(Fig. 1). The cause of death may also influence heteroplasmy; however, none of the individuals died
from any disease known to be associated with mtDNA mutations, and only one death was attributed
to cancer. We investigated whether there was any association between cause of death (according to
the major categories in Table S1) and the number of heteroplasmies detected (Fig. S1.1).
Figure S1.1 Boxplots of the number of heteroplasmies detected in each tissue according to cause of
death (A: Cardiovascular; B: Traumatic injuries; C: Natural causes (non-cardiovascular); D:
Intoxication; E: Unclear or other). See Fig. 1 for tissue abbreviations.
Individuals dying from intoxication had significantly fewer heteroplasmies in small intestine, liver,
and myocardial muscle tissue than did individuals dying from other causes; however, individuals
dying from intoxication were also significantly younger than individuals dying from other causes
(age at death = 42 vs. 60; P = 0.0039, Mann-Whitney test), and heteroplasmies are strongly age-
related (Fig. 3). To control for this age effect, for each tissue we generated 10,000 random subsets of
individuals dying from other causes, containing the same number of samples as the set of individuals
dying from intoxication, and required that the difference in average age between the two sets be
less than or equal to three years. There were no significant differences in the number of
heteroplasmies detected in individuals dying from intoxication vs. individuals dying from other
causes in these age-controlled subsets. We thus conclude that, when age is controlled for, there is no
effect of cause of death on heteroplasmy incidence.
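The age-controlled resampling described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the `(age, heteroplasmy_count)` tuple layout, the rejection-sampling loop, and the `max_tries` cap are all assumptions made here, and the source does not state which test was applied to each subset.

```python
import random
from statistics import mean


def age_matched_subsets(case_ages, controls, n_subsets=10_000,
                        max_age_gap=3.0, seed=0, max_tries=1_000_000):
    """Draw random control subsets that match the case group in size and mean age.

    case_ages : ages of the individuals dying from intoxication.
    controls  : list of (age, heteroplasmy_count) tuples for one tissue,
                taken from individuals dying from other causes.
    A subset is kept only if its mean age differs from the case mean age
    by at most max_age_gap years (three years in the text).
    """
    rng = random.Random(seed)
    target_age = mean(case_ages)
    size = len(case_ages)
    subsets = []
    tries = 0
    # Rejection sampling (an assumed strategy): draw, check the age gap, keep or discard.
    while len(subsets) < n_subsets and tries < max_tries:
        tries += 1
        candidate = rng.sample(controls, size)
        if abs(mean(age for age, _ in candidate) - target_age) <= max_age_gap:
            subsets.append(candidate)
    return subsets


def subset_mean_counts(subsets):
    """Mean heteroplasmy count of each age-matched control subset."""
    return [mean(count for _, count in subset) for subset in subsets]
```

The per-subset mean counts returned by `subset_mean_counts` could then be compared with the mean count in the intoxication group, for example by asking how often an age-matched subset has a count as low as that group.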
Second, all samples were pooled into a single library and sequenced together on multiple lanes,
eliminating potential batch effects due to variation between sequencing runs/lanes. Bias introduced
during capture enrichment can also be excluded because individuals of different ages were included
in each pool of libraries that was subject to capture enrichment, and allele-specific heteroplasmies
are correlated with age (Fig. S3), which would not be expected if biased capture were responsible for
the heteroplasmy observations.
Third, systematic differences in coverage across tissues, or across specific mtDNA regions, could
result in tissue-related differences in the detection of heteroplasmy. As shown in Fig. S1.2, coverage
is highest for myocardial muscle and lowest for blood, with the coverage for the remaining tissues
approximately the same, and there are characteristic “peaks” and “valleys” in the coverage across the
mtDNA genome.
Figure S1.2 Variation in coverage across the mtDNA genome per tissue. The X-axis is the position
in the mtDNA genome, while the Y-axis is the average coverage.
However, the number of heteroplasmies detected is not systematically correlated with coverage (Fig.
S1.3), nor does coverage differ with respect to the age of an individual (Fig. S1.4). Thus, variation in
coverage cannot explain the age-related correlations we see in heteroplasmy. Moreover, variation in
coverage does not differ between alleles for the allele-specific heteroplasmies (Fig. S1.5). We
therefore conclude that variation in coverage cannot explain the age-related, tissue-related, and
allele-specific heteroplasmies found in this study.
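The rank correlations reported in this note (heteroplasmy count vs. postmortem interval, coverage, or age) are Spearman coefficients. As a minimal sketch of how such a coefficient is obtained, the following computes it as the Pearson correlation of average ranks; the function names are illustrative, and p-values are not computed here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

For one tissue, `x` would hold the average coverage of each sample and `y` the number of heteroplasmies detected in that sample.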
Figure S1.3 Number of heteroplasmies identified (Y-axis) vs. average coverage (X-axis) for each
tissue sample. The numbers after each tissue abbreviation are the Spearman rank correlation
coefficient, followed by the associated p-value.
Figure S1.4 Average coverage (Y-axis) vs. age for each individual (X-axis). The numbers after
each tissue abbreviation are the Spearman rank correlation coefficient, followed by the associated p-
value.
Figure S1.5 Boxplots of the variation in coverage for each of the seven tissue-specific and allele-
specific heteroplasmic positions shown in Fig. 2 of the main manuscript. In each plot, the coverage is
shown for each tissue and each consensus allele.
Fifth, if we focus on heteroplasmies shared by two or more tissues in the same individual, this
should enrich for heteroplasmies that were either transmitted from the mother or occurred early in
development (prior to the divergence of the tissues that share the heteroplasmy). A tree relating
tissues based on such shared heteroplasmies closely corresponds to patterns of tissue development
(Fig. S1.6), indicating that shared heteroplasmies are behaving as expected.
Sixth, previous studies of fewer tissues and individuals have found some of the same tissue-
related and allele-related heteroplasmies that we find (Table S2). However, these previous studies
analyzed too few samples to identify the significant tissue-related and allele-related patterns that we
have identified.
As a further check on the reproducibility of the results, we selected 15 samples for resequencing.
New libraries were prepared from the DNA extracts, pooled, captured for mtDNA sequencing as
before (1) and sequenced on the HiSeq platform (paired-end reads, 96 bp); the results are shown in
Fig. S1.7. There is a very high and significant correlation between the alternative allele frequencies
at each heteroplasmic site detected in the two HiSeq runs (r = 0.971, p < 0.0001); thus the original
findings are reproducible with the same technology.
Figure S1.6 Neighbor-joining tree based on the alternative allele frequency at heteroplasmic sites
shared by two or more tissues. See legend to Fig. 1 for tissue abbreviations.
Figure S1.7 Correlation between alternative allele frequencies at heteroplasmic sites for 15 samples
resequenced on the HiSeq platform.
Finally, we also applied a different technology, namely droplet digital PCR (ddPCR, see
Methods) to independently estimate the heteroplasmy alternative allele frequency in a subset of the
data. We chose 8 positions (Table S7); one position (16086) exhibits heteroplasmy in virtually every
tissue, and was analyzed in every tissue from five individuals. The remaining positions show tissue-
related heteroplasmy (including NS mutations at positions 4142, 10851, 11126, and 12569 that occur
preferentially in liver). The results are provided in Table S8 and shown in Fig. S1.8; the correlation
between alternative allele frequencies (where the consensus allele is defined as the consensus among
all of the tissues from an individual) estimated from sequencing vs. ddPCR is quite high (Pearson’s r
= 0.996; p < 0.00001). Even when restricting the comparison to low-level heteroplasmies, with
alternative allele frequencies less than 0.05, the correlation remains high (n = 57, r =
0.835, p < 0.00001). Thus, these results provide independent confirmation of the heteroplasmies
inferred from sequencing.
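The comparison between the two methods can be illustrated with a small sketch. The values below are the eleven rows of Table S8 for position 16086 in individual 200, so the coefficients it produces are for this subset only and are not the values reported above. Applying the 0.05 cutoff to the sequencing-based frequency is an assumption, since the text does not say which estimate the cutoff refers to.

```python
from math import sqrt

# (sample, ddPCR, sequencing) for position 16086, individual 200 (Table S8).
ROWS = [
    ("KI200", 0.032, 0.036), ("CEL200", 0.061, 0.089), ("CER200", 0.250, 0.232),
    ("BL200", 0.033, 0.018), ("MM200", 0.242, 0.244), ("SM200", 0.898, 0.909),
    ("SK200", 0.021, 0.024), ("CO200", 0.239, 0.228), ("LI200", 0.093, 0.078),
    ("SI200", 0.139, 0.135), ("LIV200", 0.024, 0.028),
]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def method_agreement(rows, max_freq=None):
    """Correlate ddPCR with sequencing estimates.

    If max_freq is given, keep only rows whose sequencing-based frequency
    is below it (assumed interpretation of the low-level restriction).
    Returns (number of rows used, Pearson r).
    """
    kept = [(d, s) for _, d, s in rows if max_freq is None or s < max_freq]
    ddpcr = [d for d, _ in kept]
    seq = [s for _, s in kept]
    return len(kept), pearson_r(ddpcr, seq)


n_all, r_all = method_agreement(ROWS)
n_low, r_low = method_agreement(ROWS, max_freq=0.05)
print(f"all rows: n={n_all}, r={r_all:.3f}")
print(f"low-level rows: n={n_low}, r={r_low:.3f}")
```

With so few low-level rows in this subset, the restricted coefficient is only illustrative; the full comparison in the text uses 57 such measurements.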
In sum, we are not able to find any experimental or analytical artifact that could explain the age-
related, tissue-specific, and allele-specific heteroplasmies that we find in this study.
Figure S1.8 Correlation between heteroplasmy level (alternative allele frequency) estimated from
sequencing, vs. that estimated from ddPCR. Note that the consensus allele is defined from the
consensus sequences from all of the tissues in an individual, and hence the alternative allele
frequency in a specific tissue can be greater than 0.5.
Table S8 Comparison of alternative allele frequencies determined by ddPCR vs. sequencing on the
Illumina platform. Samples are indicated by tissue abbreviation (from the legend to Fig. 1) and ID
number; ddPCR and Sequencing are alternative allele frequencies inferred by ddPCR and Illumina
sequencing, respectively.
Position Sample Consensus allele Alternative allele ddPCR Sequencing
16086 KI200 C T 0.032 0.036
16086 KI202 C T 0.048 0.044
16086 KI282 C T 0.021 0.023
16086 KI354 C T 0.036 0.024
16086 KI375 C T 0.020 0.021
16086 CEL200 C T 0.061 0.089
16086 CEL202 C T 0.050 0.051
16086 CEL282 C T 0.081 0.083
16086 CEL354 C T 0.139 0.137
16086 CEL375 C T 0.050 0.066
16086 CER200 C T 0.250 0.232
16086 CER202 C T 0.117 0.126
16086 CER282 C T 0.183 0.194
16086 CER354 C T 0.372 0.355
16086 CER375 C T 0.136 0.153
16086 BL200 C T 0.033 0.018
16086 BL202 C T 0.010 0.011
16086 BL354 C T 0.023 0.010
16086 BL375 C T 0.000 0.000
16086 MM200 C T 0.242 0.244
16086 MM282 C T 0.085 0.086
16086 MM354 C T 0.451 0.445
16086 MM375 C T 0.019 0.015
16086 SM200 C T 0.898 0.909
16086 SM202 C T 0.390 0.464
16086 SM282 C T 0.900 0.917
16086 SM354 C T 0.950 0.945
16086 SM375 C T 0.313 0.353
16086 SK200 C T 0.021 0.024
16086 SK202 C T 0.022 0.010
16086 SK282 C T 0.024 0.005
16086 SK354 C T 0.027 0.018
16086 SK375 C T 0.012 0.006
16086 CO200 C T 0.239 0.228
16086 CO202 C T 0.158 0.143
16086 CO282 C T 0.156 0.151
16086 CO354 C T 0.286 0.291
16086 CO375 C T 0.148 0.156
16086 LI200 C T 0.093 0.078
16086 LI202 C T 0.083 0.077
16086 LI282 C T 0.092 0.091
16086 SI200 C T 0.139 0.135
16086 SI202 C T 0.093 0.116
16086 SI282 C T 0.034 0.030
16086 SI354 C T 0.159 0.150
16086 SI375 C T 0.017 0.007
16086 LIV200 C T 0.024 0.028
16086 LIV282 C T 0.025 0.028
16086 LIV354 C T 0.048 0.047
16086 LIV202 C T 0.059 0.062
16086 LIV375 C T 0.064 0.058
16086 LIV375 C T 0.072 0.058
11126 SM290 G A 0.027 0.029
11126 LIV240 G A 0.028 0.034
11126 LIV248 G A 0.039 0.037
11126 LIV289 G A 0.072 0.056
11126 LIV307 G A 0.024 0.024
11126 LIV361 G A 0.086 0.079
11126 LIV197 G A 0.020 0.024
4142 LIV264 G A 0.023 0.026
4142 LIV289 G A 0.039 0.034
4142 LIV334 G A 0.029 0.030
10851 LIV338 G A 0.057 0.054
10851 LIV344 G A 0.024 0.026
10851 LIV197 G A 0.018 0.024
12569 LIV261 T C 0.135 0.127
12569 LIV315 T C 0.021 0.040
408 SM192 T A 0.066 0.105
408 CO192 T A 0.010 0.010
408 SM239 T A 0.089 0.122
408 SM289 T A 0.226 0.282
408 SM248 T A 0.263 0.231
408 CO248 T A 0.022 0.011
408 SM379 T A 0.062 0.090
408 SM323 T A 0.060 0.057
408 SM279 T A 0.058 0.063
408 SM222 T A 0.033 0.038
408 SM347 T A 0.043 0.035
408 SM268 T A 0.024 0.020
408 SM214 T A 0.017 0.021
564 MM315 G A 0.011 0.014
564 SM315 G A 0.058 0.052
564 MM290 G A 0.047 0.044
564 MM193 G A 0.046 0.043
564 MM381 G A 0.029 0.029
564 MM279 G A 0.024 0.030
564 MM220 G A 0.024 0.020
564 MM227 G A 0.023 0.022
564 SM326 G A 0.030 0.026
16327 SM362 C T 0.185 0.199
16327 SM289 C T 0.124 0.139
16327 SM244 C T 0.093 0.114
16327 SM235 C T 0.045 0.067
16327 SM248 C T 0.065 0.082
16327 SM227 C T 0.033 0.053
16327 SM355 C T 0.034 0.050
16327 SM230 C T 0.034 0.040
16327 SM380 C T 0.025 0.030
16327 LI380 C T 0.002 0.007
16327 SM314 C T 0.021 0.020
16327 SM204 C T 0.072 0.093
16327 SK204 C T 0.011 0.010
Additional References
1. Maricic T, Whitten M, & Paabo S (2010) Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PLoS One 5:e14004.
2. Nielsen R (2005) Molecular signatures of natural selection. Annu Rev Genet 39:197-218.
3. Ingman M & Gyllensten U (2007) Rate variation between mitochondrial domains and adaptive evolution in humans. Hum Mol Genet 16:2281-2287.
4. Kryazhimskiy S & Plotkin JB (2008) The population genetics of dN/dS. PLoS Genet 4:e1000304.
5. Mugal CF, Wolf JB, & Kaj I (2014) Why time matters: codon evolution and the temporal dynamics of dN/dS. Mol Biol Evol 31:212-231.
6. Reva B, Antipin Y, & Sander C (2011) Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res 39:e118.