
Mingkun Li, Roland Schröder, Shengyu Ni, Burkhard Madea, and … · 2015. 2. 5. · Mingkun Li, Roland Schröder, Shengyu Ni, Burkhard Madea, and Mark Stoneking Supplementary Materials


Supplementary Materials

Extensive tissue-related and allele-related mtDNA heteroplasmy suggests positive selection for somatic mutations

Mingkun Li, Roland Schröder, Shengyu Ni, Burkhard Madea, and Mark Stoneking

Supplementary Materials includes:

Supplementary Notes 1-2

Supplementary Figures 1-4

Supplementary Tables 1-8


Supplementary Note 1. Authenticity of the Results

In this note we consider several potential technical artifacts that might explain the tissue-specific and allele-specific heteroplasmies. First, the postmortem interval between death and tissue sampling varied from 24 to 72 hours, so heteroplasmies might reflect postmortem degradation of DNA (which might be more pronounced in some tissues, such as liver). However, the correlation between the number of heteroplasmies identified and the postmortem interval is not significant either for individuals overall (Spearman’s r = 0.075, P = 0.364) or for any specific tissue (all P-values > 0.04). Moreover, postmortem damage should occur randomly with respect to positions in the sequence, whereas we find distinctive nonrandom patterns, such as more heteroplasmies in the control region (Fig. 1). The cause of death may also influence heteroplasmy; however, none of the individuals died from any disease known to be associated with mtDNA mutations, and only one death was attributed to cancer. We investigated whether there was any association between cause of death (according to the major categories in Table S1) and the number of heteroplasmies detected (Fig. S1.1).

Figure S1.1 Boxplots of the number of heteroplasmies detected in each tissue according to cause of death (A: Cardiovascular; B: Traumatic injuries; C: Natural causes (non-cardiovascular); D: Intoxication; E: Unclear or other). See Fig. 1 for tissue abbreviations.


Individuals dying from intoxication had significantly fewer heteroplasmies in small intestine, liver, and myocardial muscle tissue than did individuals dying from other causes; however, individuals dying from intoxication were also significantly younger than individuals dying from other causes (age at death = 42 vs. 60; P = 0.0039, Mann-Whitney test), and heteroplasmies are strongly age-related (Fig. 3). To control for this age effect, for each tissue we generated 10,000 random subsets of individuals dying from other causes, each containing the same number of samples as the set of individuals dying from intoxication, and required that the difference in average age between the two sets be less than or equal to three years. There were no significant differences in the number of heteroplasmies detected in individuals dying from intoxication vs. individuals dying from other causes in these age-controlled subsets. We thus conclude that, when age is controlled for, there is no effect of cause of death on heteroplasmy incidence.
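The age-controlled subsampling described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the rejection-sampling strategy, and the `max_tries` safeguard are assumptions made for illustration, and the supplementary text does not specify which statistic was compared across subsets (here, the difference in mean heteroplasmy count is used as a placeholder).

```python
import random
from statistics import mean


def age_matched_subsets(case_ages, control_ages, n_subsets=10_000,
                        max_age_diff=3.0, seed=0, max_tries=1_000_000):
    """Draw random control subsets whose mean age is within max_age_diff
    years of the mean age of the cases (simple rejection sampling)."""
    rng = random.Random(seed)
    target_age = mean(case_ages)
    size = len(case_ages)              # same number of samples as the case set
    indices = range(len(control_ages))
    subsets = []
    for _ in range(max_tries):         # guard against an impossible constraint
        if len(subsets) == n_subsets:
            break
        pick = rng.sample(indices, size)
        if abs(mean(control_ages[i] for i in pick) - target_age) <= max_age_diff:
            subsets.append(pick)
    return subsets


def mean_count_differences(case_counts, control_counts, subsets):
    """For each subset, mean heteroplasmy count of the controls minus that
    of the cases (an assumed summary statistic, not taken from the paper)."""
    case_mean = mean(case_counts)
    return [mean(control_counts[i] for i in s) - case_mean for s in subsets]
```

In use, `case_ages` and `case_counts` would hold the ages and heteroplasmy counts for one tissue in the intoxication group, and `control_ages` and `control_counts` the same for all other individuals; the distribution of differences over the subsets can then be inspected for a systematic shift away from zero.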

Second, all samples were pooled into a single library and sequenced together on multiple lanes, eliminating potential batch effects due to variation between sequencing runs or lanes. Biased capture during capture enrichment can also be excluded: individuals of different ages were included in each pool of libraries that was subjected to capture enrichment, and allele-specific heteroplasmies are correlated with age (Fig. S3), which would not be expected if biased capture were responsible for the heteroplasmy observations.

Third, systematic differences in coverage across tissues, or across specific mtDNA regions, could result in tissue-related differences in the detection of heteroplasmy. As shown in Fig. S1.2, coverage is highest for myocardial muscle and lowest for blood, with the coverage for the remaining tissues approximately the same, and there are characteristic “peaks” and “valleys” in the coverage across the mtDNA genome.

Figure S1.2 Variation in coverage across the mtDNA genome per tissue. The X-axis is the position in the mtDNA genome, while the Y-axis is the average coverage.

However, the number of heteroplasmies detected is not systematically correlated with coverage (Fig. S1.3), nor does coverage differ with respect to the age of an individual (Fig. S1.4). Thus, variation in coverage cannot explain the age-related correlations we see in heteroplasmy. Moreover, coverage does not differ between alleles for the allele-specific heteroplasmies (Fig. S1.5). We therefore conclude that variation in coverage cannot explain the age-related, tissue-related, and allele-specific heteroplasmies found in this study.

Figure S1.3 Number of heteroplasmies identified (Y-axis) vs. average coverage (X-axis) for each tissue sample. The numbers after each tissue abbreviation are the Spearman rank correlation coefficient, followed by the associated p-value.


Figure S1.4 Average coverage (Y-axis) vs. age for each individual (X-axis). The numbers after each tissue abbreviation are the Spearman rank correlation coefficient, followed by the associated p-value.


Figure S1.5 Boxplots of the variation in coverage for each of the seven tissue-specific and allele-specific heteroplasmic positions shown in Fig. 2 of the main manuscript. In each plot, the coverage is shown for each tissue and each consensus allele.

Fourth, if we focus on heteroplasmies shared by two or more tissues in the same individual, this should enrich for heteroplasmies that were either transmitted from the mother or occurred early in development (prior to the divergence of the tissues that share the heteroplasmy). A tree relating tissues based on such shared heteroplasmies closely corresponds to patterns of tissue development (Fig. S1.6), indicating that shared heteroplasmies are behaving as expected.

Fifth, previous studies of fewer tissues and individuals have found some of the same tissue-related and allele-related heteroplasmies that we find (Table S2). However, these previous studies analyzed too few samples to identify the significant tissue-related and allele-related patterns that we have identified.

As a further check on the reproducibility of the results, we selected 15 samples for resequencing. New libraries were prepared from the DNA extracts, pooled, captured for mtDNA sequencing as before (1), and sequenced on the HiSeq platform (paired-end reads, 96 bp); the results are shown in Fig. S1.7. There is a very high and significant correlation between the alternative allele frequencies at each heteroplasmic site detected in the two HiSeq runs (r = 0.971, p < 0.0001); thus the original findings are reproducible with the same technology.


Figure S1.6 Neighbor-joining tree based on the alternative allele frequency at heteroplasmic sites shared by two or more tissues. See legend to Fig. 1 for tissue abbreviations.

Figure S1.7 Correlation between alternative allele frequencies at heteroplasmic sites for 15 samples resequenced on the HiSeq platform.


Finally, we also applied a different technology, namely droplet digital PCR (ddPCR; see Methods), to independently estimate the heteroplasmy alternative allele frequency in a subset of the data. We chose 8 positions (Table S7); one position (16086) exhibits heteroplasmy in virtually every tissue, and was analyzed in every tissue from five individuals. The remaining positions show tissue-related heteroplasmy (including nonsynonymous (NS) mutations at positions 4142, 10851, 11126, and 12569 that occur preferentially in liver). The results are provided in Table S8 and shown in Fig. S1.8; the correlation between alternative allele frequencies (where the consensus allele is defined as the consensus among all of the tissues from an individual) estimated from sequencing vs. ddPCR is very high (Pearson’s r = 0.996; p < 0.00001). Even when the comparison is restricted to low-level heteroplasmies, with alternative allele frequencies less than 0.05, the correlation remains strong (n = 57, r = 0.835, p < 0.00001). Thus, these results provide independent confirmation of the heteroplasmies inferred from sequencing.
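As an illustration of the comparison described above, the following Python sketch computes a Pearson correlation between ddPCR and sequencing estimates. The five value pairs are the CER samples at position 16086 copied from Table S8; the function name `pearson_r` is an assumption for this sketch, and the result for this small subset is not expected to match the r = 0.996 reported for the full data set.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Alternative allele frequencies at position 16086 for samples
# CER200, CER202, CER282, CER354, CER375 (values from Table S8).
ddpcr = [0.250, 0.117, 0.183, 0.372, 0.136]
sequencing = [0.232, 0.126, 0.194, 0.355, 0.153]

r = pearson_r(ddpcr, sequencing)
print(f"Pearson r for this subset = {r:.3f}")
```

Restricting the input lists to pairs with alternative allele frequency below 0.05 would mirror the low-level comparison mentioned in the text.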

In sum, we are not able to find any experimental or analytical artifact that could explain the age-related, tissue-specific, and allele-specific heteroplasmies that we find in this study.

Figure S1.8 Correlation between heteroplasmy level (alternative allele frequency) estimated from sequencing vs. that estimated from ddPCR. Note that the consensus allele is defined from the consensus sequences from all of the tissues in an individual, and hence the alternative allele frequency in a specific tissue can be greater than 0.5.


Supplementary Note 2. Detecting positive selection involving nonsynonymous heteroplasmies.

Our introduction of the hN/hS statistic is motivated by a similar statistic, the dN/dS ratio, which is commonly used as a test for selection on protein-coding genes (2): dN is the number of nonsynonymous differences per nonsynonymous site between two sequences, while dS is the number of synonymous differences per synonymous site. An analogous statistic is the ka/ks ratio, in which ka is the number of nonsynonymous changes along a lineage per nonsynonymous site, and ks is the number of synonymous changes along the same lineage per synonymous site; ka/ks ratios have, for example, been used to evaluate claims of climate-related selection on human mtDNA variation (3). For our hN/hS statistic, the numerator, hN, is the number of nonsynonymous heteroplasmies divided by the total number of sites in the sequence where a mutation would produce a nonsynonymous difference. The denominator, hS, is the number of synonymous heteroplasmies divided by the total number of sites in the sequence where a mutation would produce a synonymous difference. Thus, the ratio hN/hS is normalized for the number of nonsynonymous and synonymous sites in the sequence (typically, about 70% of the positions in a protein-coding sequence are nonsynonymous sites and 30% are synonymous, but the actual numbers vary depending on the specific codons used).

The purpose of dividing hN by hS is to distinguish positive selection on nonsynonymous heteroplasmies (which would increase hN only) from an elevated mutation rate (which would increase both hN and hS). The conventional interpretation of hN/hS ratios is as follows:

hN/hS < 1: some degree of purifying (negative) selection on nonsynonymous heteroplasmies (fewer nonsynonymous than synonymous heteroplasmies per site)

hN/hS = 1: complete neutrality (no selection against or for nonsynonymous heteroplasmies)

hN/hS > 1: positive selection (more nonsynonymous than synonymous heteroplasmies per site)
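The definition and conventional interpretation above can be written out as a short Python sketch. The function names are illustrative, and the site counts in the example (7000 nonsynonymous, 3000 synonymous) are hypothetical placeholders based only on the "about 70% / 30%" rule of thumb; they are not the site counts used in the paper. The heteroplasmy counts (103 NS, 11 S) are the liver-specific totals from Table S6.

```python
import math


def hn_hs(n_ns, n_s, sites_ns, sites_s):
    """hN/hS: NS heteroplasmies per NS site divided by S heteroplasmies per S site."""
    if sites_ns <= 0 or sites_s <= 0:
        raise ValueError("site counts must be positive")
    h_n = n_ns / sites_ns
    h_s = n_s / sites_s
    if h_s == 0:
        # No synonymous heteroplasmies observed: the ratio is unbounded
        # (or undefined when there are no nonsynonymous ones either).
        return math.inf if h_n > 0 else math.nan
    return h_n / h_s


def conventional_interpretation(ratio):
    """Textbook reading of the ratio; see the caveats for intraspecific data."""
    if math.isnan(ratio):
        return "undefined"
    if ratio < 1:
        return "purifying (negative) selection"
    if ratio > 1:
        return "positive selection"
    return "neutrality"


# Hypothetical site counts (assumption), liver-specific counts from Table S6.
ratio = hn_hs(n_ns=103, n_s=11, sites_ns=7000, sites_s=3000)
print(ratio, conventional_interpretation(ratio))
```

Because the ratio depends on the site counts, the value printed here differs from the 3.11 reported in Table S6 unless the actual numbers of nonsynonymous and synonymous sites are supplied.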

Note that the lower the hN/hS ratio, the greater the degree of negative selection against amino acid changes. If functional constraints are relaxed, such that the negative selection pressure is decreased, then hN/hS will increase, but with relaxed constraints the hN/hS ratio is not expected to become greater than one. If nonsynonymous and synonymous heteroplasmies are occurring at the same rate (as would be expected, for example, with postmortem degradation), then the expectation is hN/hS = 1. However, it has been shown that the above interpretation of dN/dS (in our case, hN/hS) ratios holds strictly only when distantly related lineages are compared, such that dN and dS can be taken to represent fixed differences between lineages; when comparing polymorphisms within a species, the above relationships may not hold. For example, it is possible to get dN/dS ratios that are less than one even with positive selection, or dN/dS ratios greater than one without positive selection (4, 5). The standard tests for significance of a dN/dS ratio therefore may not give accurate results when applied to intraspecific data, and we would expect this to also hold for intra-individual data.

Therefore, in order to investigate the significance of the observed hN/hS ratio of 3.11 in liver-specific heteroplasmies, we used a resampling approach that takes into account the observed spectrum of heteroplasmic mutations. There were 114 liver-specific heteroplasmies in the mtDNA coding region, with the following spectrum of mutations:

A>C: 2

A>G: 3

C>T: 3

G>A: 62

T>C: 43

T>G: 1

We took the coding portion of the rCRS sequence, applied the above spectrum of changes to positions at random, and calculated the resulting hN/hS ratio. We repeated this procedure 100,000 times to generate a distribution of hN/hS ratios that would be expected if the observed spectrum of mutational changes were occurring at random with respect to nonsynonymous vs. synonymous sites, taking into account the actual codons used in human mtDNA sequences. The results are shown in Fig. S2.1, and there are two important conclusions. First, the average hN/hS ratio is 1.4, and the probability of a random hN/hS ratio that is greater than one is 0.93. This is in accordance with previous observations that even under neutrality, dN/dS ratios greater than one can be obtained for within-population comparisons (4, 5). Second, the empirical probability that a random hN/hS ratio exceeds the observed value of 3.11 for liver-specific heteroplasmies is only 0.00241. This result is significant after Bonferroni correction for the number of independent tests of hN/hS ratios: there are 16 such tests (12 based on tissue-shared heteroplasmies and 4 based on tissue-specific heteroplasmies; see Table S5), resulting in an adjusted significance level of 0.05/16 = 0.00313. Thus, this analysis provides strong evidence against the null hypothesis that relaxed constraints against nonsynonymous mutations are producing the observed hN/hS ratio in liver tissue. Instead, these results favor the alternative hypothesis of positive selection for nonsynonymous somatic mutations in liver.
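The resampling procedure described above can be sketched in Python as follows. This is an illustrative reconstruction under several stated assumptions, not the authors' implementation: the sequence is treated as a single in-frame coding string (real mtDNA has separate genes, one of them on the opposite strand), site counts use a fractional per-codon-position counting scheme, `toy_seq` is a random placeholder standing in for the coding portion of the rCRS, and all function names are hypothetical. The mutational spectrum is the one listed above; the codon table is the vertebrate mitochondrial genetic code (NCBI translation table 2).

```python
import random

BASES = "TCAG"
# Vertebrate mitochondrial genetic code, codons in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def classify(codon, pos, new_base):
    """Return 'S' if changing codon[pos] to new_base keeps the amino acid, else 'NS'."""
    mutant = codon[:pos] + new_base + codon[pos + 1:]
    return "S" if CODE[mutant] == CODE[codon] else "NS"


def site_counts(seq):
    """Fractional (NS sites, S sites): each position contributes the share of
    its three possible changes that are synonymous (assumed counting scheme)."""
    s_sites = 0.0
    for start in range(0, len(seq) - 2, 3):
        codon = seq[start:start + 3]
        for pos in range(3):
            alts = [b for b in BASES if b != codon[pos]]
            s_sites += sum(classify(codon, pos, b) == "S" for b in alts) / 3
    return 3 * (len(seq) // 3) - s_sites, s_sites


def resample_ratios(seq, spectrum, n_iter, seed=0):
    """Apply the spectrum to random positions n_iter times; return hN/hS per replicate."""
    rng = random.Random(seed)
    ns_sites, s_sites = site_counts(seq)
    usable = len(seq) - len(seq) % 3
    by_base = {b: [i for i in range(usable) if seq[i] == b] for b in BASES}
    ratios = []
    for _ in range(n_iter):
        counts = {"NS": 0, "S": 0}
        for (ref, alt), n in spectrum.items():
            for i in rng.sample(by_base[ref], n):
                start = i - i % 3
                counts[classify(seq[start:start + 3], i % 3, alt)] += 1
        if counts["S"] == 0:
            ratios.append(float("inf"))
        else:
            ratios.append((counts["NS"] / ns_sites) / (counts["S"] / s_sites))
    return ratios


def empirical_p(ratios, observed):
    """Fraction of resampled ratios at least as large as the observed ratio."""
    return sum(r >= observed for r in ratios) / len(ratios)


# Placeholder sequence (assumption): random bases instead of the rCRS coding region.
toy_rng = random.Random(1)
toy_seq = "".join(toy_rng.choice(BASES) for _ in range(3000))
spectrum = {("A", "C"): 2, ("A", "G"): 3, ("C", "T"): 3,
            ("G", "A"): 62, ("T", "C"): 43, ("T", "G"): 1}
ratios = resample_ratios(toy_seq, spectrum, n_iter=200)
print("empirical p-value on toy data:", empirical_p(ratios, observed=3.11))
```

With the real coding sequence and 100,000 iterations, the resulting empirical p-value would be compared against the Bonferroni-adjusted threshold of 0.05/16 given in the text; the toy output here carries no biological meaning.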

Figure S2.1 Distribution of hN/hS ratios obtained from resampling with the observed mutational spectrum for liver-specific heteroplasmic mutations in the mtDNA coding region. The red line shows the hN/hS ratio of 3.11 that is observed for liver-specific heteroplasmies, with the associated empirical p-value.

We adopted a similar resampling approach to investigate if there was an excess of liver-specific NS mutations that were predicted to have a high or medium risk of a functional effect on the protein. We observed 103 NS liver-specific mutations; for 100 of these, the risk of a functional effect could be assigned by the Mutationassessor software (6), and 84% are predicted to have a high or medium risk of a functional effect. Using the observed mutational spectrum for these 100 mutations, we resampled 100 NS mutations at random (based on the rCRS), predicted the risk of a functional effect for each mutation, and repeated this process 100,000 times, to generate a null distribution for the frequency of high/medium risk NS mutations. The results are shown in Fig. S2.2, and indicate that the probability of observing by chance 84% of NS mutations with a high or medium risk of a functional effect is only 0.00179. This analysis thus suggests that high/medium risk NS somatic mutations are occurring preferentially in liver tissue.

Figure S2.2 Distribution of the proportion of predicted high/medium risk NS mutations, based on random resampling of NS mutations conditioned on the mutation spectrum for liver-specific NS heteroplasmies. The red line indicates the observed proportion of 0.84 for liver-specific NS mutations and the associated empirical p-value.


Figure S1 Age distribution of the individuals in this study.


Figure S2 Sequencing coverage for each tissue. See legend to Fig. 1 for tissue abbreviations.


Figure S3 Correlation between age and level of heteroplasmy. The label for each plot indicates the tissue, correlation coefficient, and p-value for the null hypothesis that the correlation coefficient is 0.

a np 72, consensus allele T
b np 189, consensus allele A
c np 94, consensus allele G
d np 408, consensus allele T
e np 64, consensus allele C
f np 16327, consensus allele C
g np 60, consensus allele T
h np 564, consensus allele G
i np 204, consensus allele T
j np 16148, consensus allele C


Figure S4 Predicted secondary structure in the genomic region surrounding each heteroplasmic position exhibiting a significant allele-specific effect.

a np 72: left T, right C
b np 185: left A, right G
c np 189: left A, right G
d np 16086: left C, right T
e np 16092: left C, right T
f np 16093: left C, right T
g np 16129: left A, right G


Table S1. Major categories of cause of death for the subjects in this study.

Cause of death                                                  No.  Percentage  Avg. Age
Cardiovascular (myocardial infarction, coronary disease, etc.)  49   32.2        60
Traumatic injuries                                              35   23.0        54
Natural causes (non-cardiovascular)                             33   21.7        63
Intoxication                                                    16   10.5        42
Unclear or other                                                19   12.5        65
TOTAL                                                           152  100         58


Table S2 List of tissue-related and allele-related heteroplasmies found in this study that have also been reported in the same tissue (or in tumors of that tissue) in previous studies. Numbers in brackets refer to the references listed below the table.

Position  Nucleotides  Tissues showing allele specificity in this study  Tissues previously reported        Tumors previously reported
60        T>C          KI, LIV                                           KI [1,2], LIV [1,2]
64        C>A          KI, SM                                            SM [1,2]
72        T>C          KI, LIV, SM                                       KI [1,2], LIV [1,2], SM [1,3,10]   LIV [4,5]
94        G>A          KI, LIV                                           KI [2], LIV [2]
185       A>G          All                                               SM [3,10]                          Pancreas [6]
189       A>G          All except BL, KI, SK, OV                         CER [1], LIV [1], SM [1,2,3,7]     LIV [4,5]
203       G>A          KI, LIV                                           LIV [2]
408       T>A          SM                                                SM [1,2]
16092     C>T          All                                               CER [1], CEL [1]
16093     C>T          All                                               multiple [1,2,8]
16129     A>G          All except BL, SK                                 Skeletal remains [9]

[1] He, Y. et al. Nature 464, 610-614 (2010).
[2] Samuels, D.C. et al. PLoS Genetics 9, e1003929 (2013).
[3] Zsurka, G. et al. Nature Genetics 37, 873-877 (2005).
[4] Lee, H.C. et al. Mutation Research 547, 71-78 (2004).
[5] Zhang, R. et al. Journal of Experimental & Clinical Cancer Research 29, 130 (2010).
[6] Navaglia, F. et al. American Journal of Clinical Pathology 126, 593-601 (2006).
[7] Theves, C. et al. Journal of Forensic Sciences 51, 865-873 (2006).
[8] Krjutškov, K. et al. Current Genetics 60, 1-6 (2013).
[9] Nelson, K. & Melton, T. Journal of Forensic Sciences 52, 557-561 (2007).
[10] Durham, S.E., Samuels, D.C. & Chinnery, P.F. Neuromuscul Disord 16, 381-386 (2006).


Table S3 Mutation pairs that occurred more often than expected from their frequencies in the population (p-value < 0.001, q-value < 0.005).

Tissue  Sample size  First mutation np  Count  Second mutation np  Count  Observed count with both mutations  Expected count with both mutations
MM      149          204                26     564                 37     16                                  6
SM      150          408                65     16327               46     44                                  20
KI      151          185                8      189                 11     8                                   1
LI      150          185                9      16126               13     6                                   1
SM      150          185                12     16126               9      5                                   1
CO      152          185                9      16126               11     5                                   1
LIV     151          185                8      16126               14     5                                   1
SI      150          152                9      564                 12     5                                   1
BL      139          12684              4      12705               6      3                                   0
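For orientation, the expected counts in Table S3 are consistent with the product of the two mutation counts divided by the sample size, rounded to the nearest integer; this relation is inferred from the table values rather than stated in the source. The Python sketch below computes that expectation and, as a further assumption, a one-sided hypergeometric tail probability for the observed co-occurrence. The test actually used by the authors is not described here, so `p_at_least` should be read as one plausible choice, not as their method.

```python
from math import comb


def expected_both(n, count_a, count_b):
    """Expected number of samples carrying both mutations under independence."""
    return count_a * count_b / n


def p_at_least(n, count_a, count_b, observed):
    """Hypergeometric upper tail: probability that at least `observed` of the
    count_b carriers of mutation B fall among the count_a carriers of
    mutation A, out of n samples (assumed test, not taken from the source)."""
    upper = min(count_a, count_b)
    tail = sum(comb(count_a, k) * comb(n - count_a, count_b - k)
               for k in range(observed, upper + 1))
    return tail / comb(n, count_b)


# First row of Table S3: MM, 149 samples, np 204 (26 samples), np 564 (37 samples),
# 16 samples observed with both mutations.
print(expected_both(149, 26, 37))
print(p_at_least(149, 26, 37, 16))
```

The same two calls can be repeated for any other row of the table by substituting its sample size and counts.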


Table S4 Mutation spectrum for tissue-specific and tissue-shared heteroplasmies, and for polymorphisms (differences among consensus sequences) from the same individuals. AAF, alternative allele frequency.

Mutations      Tissue-specific  Proportion (%)  Tissue-shared  Proportion (%)  Polymorphism  Proportion (%)  Heteroplasmies AAF>10%  Proportion (%)
AT             49               11.7            22             2.8             3             0.5             26                      3.4
GA             185              44              342            44              281           44              361                     46.8
CA             27               6.5             34             4.4             19            3               20                      2.6
GT             14               3.3             2              0.3             5             0.8             0                       0
CT             144              34.2            372            47.8            325           51.1            353                     45.8
GC             2                0.5             5              0.6             3             0.5             11                      1.4
Transversions  92               21.9            63             8.1             30            4.7             57                      7.4
Transitions    329              78.1            714            91.9            606           95.3            714                     92.6


Table S5 Nonsynonymous (NS) and synonymous (S) heteroplasmies that are either shared by two or more tissues, or specific to a single tissue. “New” indicates heteroplasmies that have not been reported as polymorphisms (Phylotree Build 15). hN/hS is the ratio of NS heteroplasmies per NS site to S heteroplasmies per S site. Cells that are empty in the original table are shown as "-".

Shared heteroplasmies

Tissue  Number  NS   S    New NS  New S  hN/hS
BL      234     46   44   29      8      0.35
SI      340     46   32   26      7      0.48
LI      330     46   33   27      9      0.46
KI      458     42   30   23      7      0.47
LIV     497     38   25   20      6      0.5
MM      351     38   26   21      6      0.49
SM      486     40   27   22      6      0.49
CO      416     41   28   22      7      0.49
CEL     331     43   30   24      7      0.48
CER     374     40   27   20      6      0.49
SK      265     46   41   26      8      0.37
OV      74      12   8    10      1      0.5
All     4156    478  351  270     78     0.45

Tissue-specific heteroplasmies

Tissue  Number  NS   S    New NS  New S  hN/hS
BL      47      19   8    14      2      0.79
SI      -       -    -    -       -      -
LI      2       -    1    -       -      -
KI      38      1    -    1       -      -
LIV     146     103  11   95      2      3.11
MM      28      1    -    -       -      -
SM      129     1    -    1       -      -
CO      -       -    -    -       -      -
CEL     -       -    -    -       -      -
CER     6       2    2    -       -      0.33
SK      25      4    1    2       -      1.32
OV      -       -    -    -       -      -
All     421     131  23   113     4      1.89

All heteroplasmies

Tissue  Number  NS   S    New NS  New S  hN/hS
BL      281     65   52   43      10     0.42
SI      340     46   32   26      7      0.48
LI      332     46   34   27      9      0.45
KI      496     43   30   24      7      0.48
LIV     643     141  36   115     8      1.3
MM      379     39   26   21      6      0.5
SM      615     41   27   23      6      0.5
CO      416     41   28   22      7      0.49
CEL     331     43   30   24      7      0.48
CER     380     42   29   20      6      0.48
SK      290     50   42   28      8      0.4
OV      74      12   8    10      1      0.5
All     4577    609  374  383     82     0.54


Table S6 Liver-specific nonsynonymous and synonymous heteroplasmies per mtDNA protein-coding gene. The P-value is for the null hypothesis that the hN/hS ratio is equal to one.

Gene  NS   S   hN/hS  P-value
ND2   5    1   1.67   1
ND5   24   0   >7.87  0.0026
ND4   16   1   5.46   0.08843
ND1   15   1   5.35   0.08514
ND4L  2    1   0.71   1
COX3  5    2   0.81   0.6814
ATP6  4    0   >1.39  0.577
ND3   7    1   2.23   0.6857
CYTB  11   2   1.79   0.7425
ND6   6    1   1.95   1
COX1  6    0   1.96   0.3459
COX2  2    1   0.64   0.5644
ATP8  0    0   NA     NA
All   103  11  3.11   0.0024


Table S7 List of positions and associated primers and probes analyzed in ddPCR experiments. Numbers indicate positions; F and R indicate forward and reverse PCR primers; Probe indicates an allele-specific probe; 5′ Modification indicates the fluorescent label attached to the probe.

Position and primer/probe   Sequence (5′-3′)   5′ Modification

16086_F TCATGGGGAAGCAGATTTGGG

16086_R ATATTCATGGTGGCTGGCAGT

16086_Probe_c CCATCAACAACCGCcATGTATTTCGTACA [6FAM]

16086_Probe_t CCCATCAACAACCGCtATGTATTTCGTACA [HEX]

11126_F CGCCACTTATCCAGTGAACC

11126_R ATCGGGTGATGATAGCCAAG

11126_Probe_g ATATCTTCTTCgAAACCACACTTATCCCC [6FAM]

11126_Probe_a ATATCTTCTTCaAAACCACACTTATCCCC [HEX]

4142_F CTCCCCTGAACTCTACACAACA

4142_R GGGGAAATGCTGGAGATTGT

4142_Probe_g AGCATACCCCCgATTCCGCT [6FAM]

4142_Probe_a AGCATACCCCCaATTCCGCT [HEX]

10851_F GCTAAAACTAATCGTCCCAACAA

10851_R AAAGGTTGGGGAACAGCTAAA

10851_Probe_g CACAACCACCCACAgCCTAATTATTAGC [6FAM]

10851_Probe_a CACAACCACCCACAaCCTAATTATTAGC [HEX]

12569_F TCAGTCTCTTCCCCACAACA

12569_R CGAACAATGCTACAGGGATG

12569_Probe_c CCAGCTCTCCCcAAGCTTCAAACTAG [6FAM]

12569_Probe_t CCAGCTCTCCCtAAGCTTCAAACTAG [HEX]

408_F CCAAACCCCAAAAACAAAGA

408_R TGGGAGGGGAAAATAATGTG

408_Probe_t CAAATTTTATCTTTtGGCGGTATGCACTT [6FAM]

408_Probe_a CAAATTTTATCTTTaGGCGGTATGCACTT [HEX]

564_F CTAACCCCATACCCCGAAC

564_R GGTGATGTGAGCCCGTCTA

564_Probe_g CCAAACCCCAAAgACACCCCC [6FAM]

564_Probe_a CCAAACCCCAAAaACACCCCC [HEX]

16327_F CAAACCTACCCACCCTTAACA

16327_R ATTGATTTCACGGAGGATGG

16327_Probe_c CATAAAGCCATTTAcCGTACATAGCACATT [6FAM]

16327_Probe_t CATAAAGCCATTTAtCGTACATAGCACATT [HEX]


Table S8 Comparison of alternative allele frequencies determined by ddPCR vs. sequencing on the Illumina platform. Samples are indicated by tissue abbreviation (from the legend to Fig. 1) and ID number; ddPCR and Sequencing are alternative allele frequencies inferred by ddPCR and Illumina sequencing, respectively.

Position Sample Consensus allele Alternative allele ddPCR Sequencing

16086 KI200 C T 0.032 0.036

16086 KI202 C T 0.048 0.044

16086 KI282 C T 0.021 0.023

16086 KI354 C T 0.036 0.024

16086 KI375 C T 0.020 0.021

16086 CEL200 C T 0.061 0.089

16086 CEL202 C T 0.050 0.051

16086 CEL282 C T 0.081 0.083

16086 CEL354 C T 0.139 0.137

16086 CEL375 C T 0.050 0.066

16086 CER200 C T 0.250 0.232

16086 CER202 C T 0.117 0.126

16086 CER282 C T 0.183 0.194

16086 CER354 C T 0.372 0.355

16086 CER375 C T 0.136 0.153

16086 BL200 C T 0.033 0.018

16086 BL202 C T 0.010 0.011

16086 BL354 C T 0.023 0.010

16086 BL375 C T 0.000 0.000

16086 MM200 C T 0.242 0.244

16086 MM282 C T 0.085 0.086

16086 MM354 C T 0.451 0.445

16086 MM375 C T 0.019 0.015

16086 SM200 C T 0.898 0.909

16086 SM202 C T 0.390 0.464

16086 SM282 C T 0.900 0.917

16086 SM354 C T 0.950 0.945

16086 SM375 C T 0.313 0.353

16086 SK200 C T 0.021 0.024

16086 SK202 C T 0.022 0.010

16086 SK282 C T 0.024 0.005

16086 SK354 C T 0.027 0.018

16086 SK375 C T 0.012 0.006

16086 CO200 C T 0.239 0.228

16086 CO202 C T 0.158 0.143

16086 CO282 C T 0.156 0.151

16086 CO354 C T 0.286 0.291

16086 CO375 C T 0.148 0.156

16086 LI200 C T 0.093 0.078

16086 LI202 C T 0.083 0.077

16086 LI282 C T 0.092 0.091

16086 SI200 C T 0.139 0.135

16086 SI202 C T 0.093 0.116

16086 SI282 C T 0.034 0.030

16086 SI354 C T 0.159 0.150

16086 SI375 C T 0.017 0.007

16086 LIV200 C T 0.024 0.028

16086 LIV282 C T 0.025 0.028

16086 LIV354 C T 0.048 0.047

16086 LIV202 C T 0.059 0.062

16086 LIV375 C T 0.064 0.058

16086 LIV375 C T 0.072 0.058

11126 SM290 G A 0.027 0.029

11126 LIV240 G A 0.028 0.034

11126 LIV248 G A 0.039 0.037

11126 LIV289 G A 0.072 0.056

11126 LIV307 G A 0.024 0.024

11126 LIV361 G A 0.086 0.079

11126 LIV197 G A 0.020 0.024

4142 LIV264 G A 0.023 0.026

4142 LIV289 G A 0.039 0.034

4142 LIV334 G A 0.029 0.030

10851 LIV338 G A 0.057 0.054

10851 LIV344 G A 0.024 0.026

10851 LIV197 G A 0.018 0.024

12569 LIV261 T C 0.135 0.127

12569 LIV315 T C 0.021 0.040

408 SM192 T A 0.066 0.105

408 CO192 T A 0.010 0.010

408 SM239 T A 0.089 0.122

408 SM289 T A 0.226 0.282

408 SM248 T A 0.263 0.231

408 CO248 T A 0.022 0.011

408 SM379 T A 0.062 0.090

408 SM323 T A 0.060 0.057

408 SM279 T A 0.058 0.063

408 SM222 T A 0.033 0.038

408 SM347 T A 0.043 0.035

408 SM268 T A 0.024 0.020

408 SM214 T A 0.017 0.021

564 MM315 G A 0.011 0.014

564 SM315 G A 0.058 0.052

564 MM290 G A 0.047 0.044

564 MM193 G A 0.046 0.043

564 MM381 G A 0.029 0.029


564 MM279 G A 0.024 0.030

564 MM220 G A 0.024 0.020

564 MM227 G A 0.023 0.022

564 SM326 G A 0.030 0.026

16327 SM362 C T 0.185 0.199

16327 SM289 C T 0.124 0.139

16327 SM244 C T 0.093 0.114

16327 SM235 C T 0.045 0.067

16327 SM248 C T 0.065 0.082

16327 SM227 C T 0.033 0.053

16327 SM355 C T 0.034 0.050

16327 SM230 C T 0.034 0.040

16327 SM380 C T 0.025 0.030

16327 LI380 C T 0.002 0.007

16327 SM314 C T 0.021 0.020

16327 SM204 C T 0.072 0.093

16327 SK204 C T 0.011 0.010


Additional References

1. Maricic T, Whitten M, & Pääbo S (2010) Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PLoS One 5:e14004.

2. Nielsen R (2005) Molecular signatures of natural selection. Annu Rev Genet 39:197-218.

3. Ingman M & Gyllensten U (2007) Rate variation between mitochondrial domains and adaptive evolution in humans. Hum Mol Genet 16:2281-2287.

4. Kryazhimskiy S & Plotkin JB (2008) The population genetics of dN/dS. PLoS Genet 4:e1000304.

5. Mugal CF, Wolf JB, & Kaj I (2014) Why time matters: codon evolution and the temporal dynamics of dN/dS. Mol Biol Evol 31:212-231.

6. Reva B, Antipin Y, & Sander C (2011) Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res 39:e118.