Transgenerational Epigenetic Instability Is a Source of Novel Methylation Variants
Robert J. Schmitz et al.
Science 334, 369 (2011); DOI: 10.1126/science.1212959

This copy is for your personal, non-commercial use only. If you wish to distribute this article to others, you can order high-quality copies for your colleagues, clients, or customers by clicking here. Permission to republish or repurpose articles or portions of articles can be obtained by following the guidelines here.

The following resources related to this article are available online at www.sciencemag.org (this information is current as of November 6, 2011):

Updated information and services, including high-resolution figures, can be found in the online version of this article at: http://www.sciencemag.org/content/334/6054/369.full.html
Supporting Online Material can be found at: http://www.sciencemag.org/content/suppl/2011/09/14/science.1212959.DC1.html
A list of selected additional articles on the Science Web sites related to this article can be found at: http://www.sciencemag.org/content/334/6054/369.full.html#related
This article cites 40 articles, 13 of which can be accessed free: http://www.sciencemag.org/content/334/6054/369.full.html#ref-list-1
This article appears in the following subject collections: Genetics, http://www.sciencemag.org/cgi/collection/genetics

Science (print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. Copyright 2011 by the American Association for the Advancement of Science; all rights reserved. The title Science is a registered trademark of AAAS. Downloaded from www.sciencemag.org on November 6, 2011.
Transgenerational Epigenetic Instability Is a Source of Novel …signal.salk.edu/publications/Schmitz_etal_SOM.pdf · 2011. 12. 21. · DOI: 10.1126/science.1212959 Science 334, 369
FM pulsing can be implemented by a simple circuit of three genes (rsbW, rsbV, and sigB), with input from a phosphatase complex. This system provides a fundamental signal-processing capability to bacterial cells, enabling them to convert steady “DC” inputs into pulsatile, predominantly “AC” outputs. Noise plays a key functional role in this signal processing system (3). The σB circuit conserves its core architecture in diverse bacteria (7), and other alternative sigma factors similarly feature both posttranslational regulation by anti-sigma factors and autoregulatory feedback. Thus, related stochastic pulse modulation schemes are likely employed more generally in bacteria (10). The relatively slow time scale of σB pulses (Fig. 1E) could confer advantages in responding to unpredictable environments and maintaining a broad, but dynamic, distribution of states in the population through bet-hedging (25, 26). Given the negative effect of σB activation on growth rate in some conditions, even under energy stress (27), these results suggest that cells balance the benefits and costs of σB activation dynamically. It will be interesting to see whether other dynamic encoding schemes are similarly implemented by relatively simple circuit modules.
References and Notes
1. A. Raj, A. van Oudenaarden, Cell 135, 216 (2008).
2. N. Rosenfeld, J. W. Young, U. Alon, P. S. Swain, M. B. Elowitz, Science 307, 1962 (2005).
3. A. Eldar, M. B. Elowitz, Nature 467, 167 (2010).
4. R. Losick, C. Desplan, Science 320, 65 (2008).
5. L. Cai, C. K. Dalal, M. B. Elowitz, Nature 455, 485 (2008).
6. E. Rotem et al., Proc. Natl. Acad. Sci. U.S.A. 107, 12541 (2010).
7. M. Hecker, J. Pané-Farré, U. Völker, Annu. Rev. Microbiol. 61, 215 (2007).
8. W. G. Haldenwang, R. Losick, Nature 282, 256 (1979).
9. O. A. Igoshin, M. S. Brody, C. W. Price, M. A. Savageau, J. Mol. Biol. 369, 1333 (2007).
10. T. M. Gruber, C. A. Gross, Annu. Rev. Microbiol. 57, 441 (2003).
11. M. J. Kazmierczak, S. C. Mithoe, K. J. Boor, M. Wiedmann, J. Bacteriol. 185, 5722 (2003).
12. U. Lorenz et al., Microbes Infect. 10, 217 (2008).
13. M. Hecker, U. Völker, Mol. Microbiol. 29, 1129 (1998).
14. J. C. Locke, M. B. Elowitz, Nat. Rev. Microbiol. 7, 383 (2009).
15. S. Zhang, W. G. Haldenwang, J. Bacteriol. 187, 7554 (2005).
16. A. L. Hodgkin, A. F. Huxley, J. Physiol. 117, 500 (1952).
17. G. M. Süel, J. Garcia-Ojalvo, L. M. Liberman, M. B. Elowitz, Nature 440, 545 (2006).
18. M. B. Elowitz, S. Leibler, Nature 403, 335 (2000).
19. G. M. Süel, R. P. Kulkarni, J. Dworkin, J. Garcia-Ojalvo, M. B. Elowitz, Science 315, 1716 (2007).
20. A. Goldbeter, D. E. Koshland Jr., Proc. Natl. Acad. Sci. U.S.A. 78, 6840 (1981).
21. G. J. Melen, S. Levy, N. Barkai, B. Z. Shilo, Mol. Syst. Biol. 1, 2005.0028 (2005).
22. Z. Cheng, F. Liu, X. P. Zhang, W. Wang, Biophys. J. 97, 2867 (2009).
23. J. C. Ray, O. A. Igoshin, PLOS Comput. Biol. 6, e1000676 (2010).
24. S. Alper, L. Duncan, R. Losick, Cell 77, 195 (1994).
25. M. Acar, A. Becskei, A. van Oudenaarden, Nature 435, 228 (2005).
26. E. Kussell, S. Leibler, Science 309, 2075 (2005).
27. T. Schweder, A. Kolyschkow, U. Völker, M. Hecker, Arch. Microbiol. 171, 439 (1999).
28. A. Dufour, W. G. Haldenwang, J. Bacteriol. 176, 1813 (1994).
29. M. S. Brody, K. Vijay, C. W. Price, J. Bacteriol. 183, 6422 (2001).
30. A. A. Wise, C. W. Price, J. Bacteriol. 177, 123 (1995).

Acknowledgments: We thank C. Price and D. Rudner for providing strains. We thank A. Eldar, R. Kishony, C. Price, N. Wingreen, J. Levine, and other members of M.B.E.’s laboratory for helpful discussions. Work in M.B.E.’s laboratory was supported by NIH grants R01GM079771 and P50 GM068763, U.S. National Science Foundation CAREER Award 0644463, and the Packard Foundation. J.C.W.L. was supported by the International Human Frontier Science Program Organization and the European Molecular Biology Organization.
Supporting Online Material
www.sciencemag.org/cgi/content/full/science.1208144/DC1
Materials and Methods
SOM Text
Figs. S1 to S20
Table S1
References
Movies S1 and S2

10 May 2011; accepted 1 September 2011
Published online 6 October 2011; 10.1126/science.1208144
Transgenerational Epigenetic Instability Is a Source of Novel Methylation Variants

Robert J. Schmitz,1,2 Matthew D. Schultz,1,2,3 Mathew G. Lewsey,1,2 Ronan C. O’Malley,2 Mark A. Urich,1,2 Ondrej Libiger,4 Nicholas J. Schork,4 Joseph R. Ecker1,2,5*
Epigenetic information, which may affect an organism’s phenotype, can be stored and stably inherited in the form of cytosine DNA methylation. Changes in DNA methylation can produce meiotically stable epialleles that affect transcription and morphology, but the rates of spontaneous gain or loss of DNA methylation are unknown. We examined spontaneously occurring variation in DNA methylation in Arabidopsis thaliana plants propagated by single-seed descent for 30 generations. We identified 114,287 CG single methylation polymorphisms and 2485 CG differentially methylated regions (DMRs), both of which show patterns of divergence compared with the ancestral state. Thus, transgenerational epigenetic variation in DNA methylation may generate new allelic states that alter transcription, providing a mechanism for phenotypic diversity in the absence of genetic mutation.
Cytosine methylation is a DNA base modification with roles in development and disease in animals as well as in silencing transposons and repetitive sequences in plants and fungi (1). In plants, CG methylation is commonly found within gene bodies (2–5), whereas non-CG methylation, CHG and CHH (where H is A, C, or T), is enriched in transposons and repetitive sequences (1). The RNA-directed DNA methylation (RdDM) pathway targets both CG and non-CG sites for methylation and is commonly associated with transcriptional silencing (6). This pathway can also target and silence protein-coding genes, giving rise to epigenetic alleles or so-called epialleles that can be heritable through mitosis and/or meiosis (7, 8) and can be dependent on the methylation of a single CG dinucleotide (9).
Two meiotically heritable epialleles resulting in morphological variation are the peloric (Linaria vulgaris) and colorless non-ripening (Solanum lycopersicum) loci (10, 11). Both show spontaneous epigenetic silencing events within their respective populations (10, 12). However, the frequency at which such spontaneous meiotically heritable epialleles naturally arise in populations is unknown. Although epiallelic variation has been identified between genetically diverse populations within Arabidopsis thaliana (13), it is unclear whether these identified epialleles are due to underlying genetic variation. Epialleles have also been artificially generated after mutagenesis or because of mutations in the cellular components required for the maintenance of DNA methylation (14–16).
An A. thaliana (Columbia-0) population, the MA lines, derived by single-seed descent for 30 generations (17) was used to examine the extent of naturally occurring variation in DNA methylation and the frequency at which spontaneous epialleles emerge over time. We used the MethylC-Seq method (3) to determine the whole-genome base resolution DNA methylomes for three ancestral MA lines (numbers 1, 12, and 19) and five descendant MA lines (numbers 29, 49, 59, 69, and 119) (fig. S1). We refer to lines 1, 12, and 19 as ancestors throughout this study, although they are not direct ancestors because they are three generations removed from the original founder line (fig. S1). These specific descendant lines were selected because their genomes have been sequenced and they have a known level of spontaneous mutation (18). Biological replicates (sibling plants) for each leaf methylome were sequenced to an average of ~34-fold coverage, which allowed for an average per line examination of 39,897,093 (96.35%) uniquely mapped cytosines and 5,307,077 (98.39%) uniquely mapped CGs (table S1).

1Plant Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA.
2Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA.
3Bioinformatics Program, University of California at San Diego, La Jolla, CA 92093, USA.
4The Scripps Translational Science Institute and the Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA 92037, USA.
5Howard Hughes Medical Institute, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA.

*To whom the correspondence should be addressed. E-mail: [email protected]

www.sciencemag.org SCIENCE VOL 334 21 OCTOBER 2011 369
A total of 1,730,761 CGs were methylated (mCGs) in at least one MA line (Fig. 1A), and about 91% of the covered mCGs were invariably methylated across all eight lines (19). The variable mCGs revealed a set of 114,287 high-confidence CG single methylation polymorphisms (SMPs) that showed a consensus of the methylation status of CG dinucleotides between biological replicates (Fig. 1A). Next, a reference MA founder DNA methylome was created by pooling the completely conserved mCG site calls for all ancestral MA lines and used to determine the frequency of discordant CG-SMP sites within the descendant population (Fig. 1B). Within the descendant lines, ~1.62% of the CG methylome shows susceptibility to dynamic acquisitions and losses of mCGs over time (table S2). On average, ~66,000 methylated CG-SMPs (mCG-SMPs) were identified for each ancestral and descendant line (fig. S2). Although the total number of mCG-SMPs was similar between all lines, the conservation of these polymorphisms among and between ancestral and descendant populations was different (Fig. 1C and table S3). A pairwise comparison of both populations for methylation conservation, estimated by global similarity of mCG-SMP sites (19), revealed that all of the ancestral lines are highly similar (table S4). Descendant lines showed greater similarity in CG-SMP methylation status to ancestral lines than to other descendant lines (table S4).
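The pairwise comparison summarized in Fig. 1C can be illustrated with a minimal sketch. It assumes that each line's methylation calls are stored as a mapping from CG position to a Boolean status; the positions and calls are invented toy values, not data from this study, and `count_discordant` is a hypothetical helper name rather than a function from the authors' pipeline.

```python
from itertools import combinations


def count_discordant(calls_a, calls_b):
    """Count CG sites covered in both samples whose methylation status differs."""
    shared = calls_a.keys() & calls_b.keys()
    return sum(1 for site in shared if calls_a[site] != calls_b[site])


# Toy methylation calls (CG position -> methylated?). Invented for illustration only.
toy_calls = {
    "line_1": {101: True, 205: True, 330: False, 412: True},
    "line_12": {101: True, 205: True, 330: False, 412: True},
    "line_29": {101: True, 205: False, 330: True, 412: True},
    "line_49": {101: False, 205: True, 330: False, 412: True},
}

# One count per unordered pair of lines, as in a symmetric heatmap.
pairwise = {
    (a, b): count_discordant(toy_calls[a], toy_calls[b])
    for a, b in combinations(sorted(toy_calls), 2)
}

for pair, n_diff in pairwise.items():
    print(pair, n_diff)
```

In this toy set the two "descendant" entries differ from each other at more sites than either differs from the "ancestral" entries, which is the qualitative pattern described in the text.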
We calculated an estimate of the epimutation rate per generation in this population by using linear regression and TREE PUZZLE, which revealed 704 and 2876 methylation changes each generation, respectively (19). We estimated a lower bound of the epimutation rate with the linear regression results, which revealed 4.46 × 10⁻⁴ methylation polymorphisms per CG site per generation (P < 0.0000216) (table S5). This finding contrasts with the previously reported spontaneous genetic mutation rate of 7 × 10⁻⁹ base substitutions per site per generation for these same MA lines (18). The TREE PUZZLE analysis revealed higher estimated epimutation rates in earlier generations (19). One possible source of this variation could be due to seed age, storage, and/or selection for seed survival. Therefore, although DNA methylation is predominantly static over relatively long periods of time, changes in cytosine methylation do occur and at a frequency greater than that of mutation observed at the DNA sequence level.
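To put the two quoted rates side by side, the short sketch below divides the epimutation rate by the genetic mutation rate, giving a difference of roughly 6 × 10⁴-fold. It also shows how a per-generation slope could be obtained by ordinary least squares. The regression inputs are invented toy points chosen to give a round slope; the exact regression design used in the study is described only in the supporting material, so this is an assumption-laden illustration and `least_squares_slope` is a hypothetical helper.

```python
def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Rates quoted in the text.
epimutation_rate = 4.46e-4  # methylation polymorphisms per CG site per generation
mutation_rate = 7e-9        # base substitutions per site per generation
fold_difference = epimutation_rate / mutation_rate
print(f"epimutation rate / mutation rate = {fold_difference:.3g}")

# Toy regression (invented points, NOT study data): generations separating
# two lines versus the number of CG sites at which they differ.
generations_apart = [0, 30, 60]
sites_differing = [1000, 22000, 43000]
slope = least_squares_slope(generations_apart, sites_differing)
print(f"toy slope = {slope:.0f} changes per generation")
```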
By using CG-SMPs derived from both ancestral and descendant populations, we carried out a genome-wide analysis of differentially methylated regions (DMRs) and identified 2485 CG-DMRs that ranged in size from 11 to 1110 base pairs (bp) (Fig. 2A and table S6). Hierarchical clustering of CG-DMRs in this population, calculated solely on the basis of the methylation density, revealed that the ancestral lines segregate as an independent cluster from the descendant lines (Fig. 2B and fig. S3). Multivariate distance-based regression (MDMR) (20, 21) confirmed this finding, indicating a statistically significant (P < 0.00005) association between ancestor or descendant status and methylation density of the CG-DMR profiles. The ancestor or descendant status explained 47% of the variance in the dissimilarity in methylation density of CG-DMRs between pairs of samples, indicating that, over time, there is a divergence of DNA methylation patterns in both formation and elimination of CG-DMRs. Furthermore, the genome-wide locations of these CG-DMRs were not uniformly distributed (P < 2.20 × 10⁻¹⁶), because 60.5% (1504/2485) were found in genic regions compared with 3.3% (82/2485) and 36.2% (899/2485) located in intergenic regions and transposons, respectively (Fig. 2B).
Next, we performed a genome-wide survey for nonCG-DMRs and uncovered a total of 284 among all eight lines (table S7). In general, the nonCG-DMRs were largely localized to intergenic regions (141/284) of the genome, because only 57/284 overlapped with genes and 86/284 overlapped with transposons. The size ranges of the nonCG-DMRs were similar to those of the CG-DMRs because the vast majority occurred in smaller segments of the genome (10 to 682 bp). Therefore, variation in DNA methylation appears to occur in all three methylation sequence contexts.
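The category counts quoted for CG-DMRs and nonCG-DMRs can be converted to percentages with a few lines of arithmetic. The sketch below uses only the counts given in the text; the dictionary layout and the helper name `category_percentages` are illustrative choices.

```python
def category_percentages(counts):
    """Convert raw category counts into percentages of the total."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Counts as quoted in the text.
cg_dmr_counts = {"genic": 1504, "intergenic": 82, "transposon": 899}
noncg_dmr_counts = {"genic": 57, "intergenic": 141, "transposon": 86}

cg_pct = category_percentages(cg_dmr_counts)
noncg_pct = category_percentages(noncg_dmr_counts)

for label, pct in (("CG-DMRs", cg_pct), ("nonCG-DMRs", noncg_pct)):
    summary = ", ".join(f"{name}: {value:.1f}%" for name, value in pct.items())
    print(f"{label}: {summary}")
```

The counts sum to 2485 and 284, respectively, so each DMR falls into exactly one of the three categories as reported.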
CG methylation is present within gene bodies and is enriched toward the 3′ end (2–5), whereas CG and nonCG methylation is associated with heterochromatin, transposons, and repetitive sequences (1). In agreement with these findings, we observed that the 3′ portion of genes contained the greatest source of CG-DMRs and that the majority of nonCG-DMRs were enriched outside of the gene bodies (Fig. 2C). Furthermore, we observed an approximately twofold depletion of CG-DMRs in exons compared with introns (Fig. 2D). The genome-wide distributions of CG-SMPs, CG-DMRs, and nonCG-DMRs were depleted in heterochromatic regions in the genome (Fig. 2, E and F). These depletions were mostly observed at the pericentromeres and centromeres (Fig. 2, E and F, and figs. S4 and S5). CG-DMRs are enriched in transposons located in euchromatin but depleted in transposons present near the centromere. Because the centromeric regions of the genome contain the highest density of DNA methylation (Fig. 2, E and F), these observations combined with the observations that CG-DMRs are enriched in intron sequences may indicate that DNA methylation that is associated with nucleosomes (22) (i.e., exons or tightly packaged chromatin in the pericentromeres and centromeres) may be maintained at a higher fidelity and that DNA methylation not associated with nucleosomes may undergo greater epigenetic drift.

Fig. 1. Epigenetic variation of CG-SMPs. (A) An example of a CG-SMP. Gold lines indicate CG methylation, the maroon rectangle indicates the untranslated regions, and green rectangles indicate exons. (B) A breakdown of the methylation distribution of CG dinucleotides among all samples. (C) A heatmap indicating the number of CG-SMPs that differ between two samples (table S3).
A genome-wide screen for DMRs simultaneously occurring in all three methylation sequence contexts (C-DMRs are CG, CHG, and CHH) was performed to assess the extent of epiallelic variation that is characteristic of RdDM across the MA population. In total, 72 C-DMRs were identified, of which functional categorization revealed that two-thirds overlapped with transposon and intergenic sequences whereas about one-third overlapped with gene bodies and promoters (Fig. 3A and table S8). To determine whether transposition-induced methylation could potentially give rise to the methylated C-DMRs (mC-DMRs) (23), genomic DNA encompassing all C-DMRs was amplified and compared in all ancestral and descendant lines. In every case, the observed amplicon size was identical for all MA lines and was equal to the expected size of the locus (table S8), indicating that these C-DMRs are unlinked to cis-genetic variation located within 500 bp, a distance that would be expected to reveal methylation induced by transposon insertions at these loci (23). Additionally, none of the genetic variants identified by genome resequencing of this population (18) overlapped with any of these C-DMRs. Lastly, restriction enzyme digestion and Southern blot analyses were performed to rule out the possibility that copy number variants were the cause of spontaneous epiallele formation, as is the case for the PAI epialleles (24). In all cases examined, the observed hybridization pattern and gene copy number were identical for each of the MA lines (fig. S6). Therefore, we conclude that the 72 C-DMRs represent a set of spontaneously occurring epialleles within the MA lines, because they were not associated with any genetic variation.
By using a set of C-DMRs that exhibited an identical methylation status (fig. S7), we determined the frequency of discordance of the ancestral state with the descendant lines and found that 29 of the C-DMRs were highly variable (>1 descendant line was discordant with the ancestral state) (Fig. 3B). C-DMRs discordant in only one of the five descendant lines were the most frequent class, but there was an unexpectedly high number of C-DMRs (63%) that were discordant in more than one descendant (Fig. 3B). Within the set of 576 C-DMRs identified (eight lines by 72 C-DMRs), 7 were discordant between the biological replicates (table S8). These data suggest that, although many C-DMRs represent the formation of spontaneous epialleles, a small subset may reflect the presence of “hotspots” (metastable epialleles).
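The discordance tally behind Fig. 3B can be sketched as follows. The code assumes each C-DMR has one shared ancestral state and one call per descendant line; the C-DMR names and calls are invented toy values, and `discordant_descendants` is a hypothetical helper, not the authors' implementation.

```python
from collections import Counter


def discordant_descendants(ancestral_state, descendant_states):
    """Number of descendant lines whose state differs from the ancestral state."""
    return sum(1 for state in descendant_states if state != ancestral_state)


M, U = "methylated", "unmethylated"

# Toy C-DMRs: (ancestral state, states in five descendant lines). Invented values.
toy_cdmrs = {
    "cdmr_a": (M, [M, M, M, M, U]),
    "cdmr_b": (U, [M, M, U, U, U]),
    "cdmr_c": (M, [M, M, M, M, M]),
    "cdmr_d": (M, [U, M, M, M, M]),
}

discordance = {
    name: discordant_descendants(ancestral, descendants)
    for name, (ancestral, descendants) in toy_cdmrs.items()
}

# Histogram: how many C-DMRs are discordant in 0, 1, 2, ... descendant lines.
histogram = Counter(discordance.values())

# "Highly variable" as defined in the text: more than one discordant descendant.
highly_variable = sorted(name for name, n in discordance.items() if n > 1)

# Eight lines by 72 C-DMRs gives the 576 line-by-region calls mentioned in the text.
line_by_cdmr_calls = 8 * 72

print(discordance, dict(histogram), highly_variable, line_by_cdmr_calls)
```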
Fig. 2. CG-DMRs diverge over time and are enriched in gene bodies. (A) Example CG-DMR present in an unmethylated state in both replicates of line 69. (B) A heatmap representation of a two-dimensional hierarchical clustering based on DMRs. Columns represent samples. Rows indicate DMRs. The column to the left of the heatmap indicates the genomic location of the DMR (blue, gene body; gold, transposon; gray, intergenic; red, transposon in gene body). (C) The average distribution of CG-DMRs (red) and nonCG-DMRs (blue) across gene bodies (from the start of the 5′ UTR to the end of the 3′ UTR, including 500 bp up- and downstream). (D) CG gene-body DMRs are specifically depleted in exons. (E) Genome-wide distributions of mCG (red), CG-SMPs (green), and CG-DMRs (blue) across chromosome I. (F) Genome-wide distributions of methylated nonCGs (mnonCG, red) and nonCG-DMRs (green) across chromosome I. The centromere is indicated by the pink vertical bar for (E) and (F).

We sequenced small RNA (smRNA) populations for all eight lines and found that smRNAs [represented as RPKCMs (reads per kilobase of each C-DMR per million reads) in tables S9 to S12] were associated with an increase in the average methylation density of C-DMRs (Fig. 3C). Furthermore, this association resembled a binary switch, because the most densely methylated C-DMRs contained abundant 24-nucleotide (nt) smRNAs (Fig. 3C).
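The RPKCM unit defined above normalizes a read count by region length and by sequencing depth. A minimal sketch of that normalization follows; it assumes that "million reads" refers to the total mapped smRNA reads in the library, which the text does not specify, and the function name and example numbers are illustrative only.

```python
def rpkcm(reads_in_cdmr, cdmr_length_bp, total_mapped_reads):
    """Reads per kilobase of C-DMR per million reads.

    Assumption: the depth term is the total number of mapped smRNA reads
    in the library; the source does not state which total was used.
    """
    kilobases = cdmr_length_bp / 1000.0
    millions = total_mapped_reads / 1_000_000.0
    return reads_in_cdmr / (kilobases * millions)


# Illustrative numbers only: 50 reads over a 500-bp C-DMR in a 10-million-read library.
example_value = rpkcm(reads_in_cdmr=50, cdmr_length_bp=500, total_mapped_reads=10_000_000)
print(example_value)
```

Because both length and depth sit in the denominator, doubling either one while holding the read count fixed halves the RPKCM value, which is what makes values comparable across C-DMRs of different sizes and libraries of different depths.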
Of the eight previously documented plant epialleles resulting in phenotypic variation, all affected transcriptional output of the differentially methylated locus (9–11, 23–28). mRNA abundance was measured in all eight lines with quantitative reverse transcription polymerase chain reaction (qRT-PCR) at eight C-DMRs that overlapped with protein-coding regions. In four of these genes, the gain or loss of DNA methylation was correlated with a large decrease or increase in mRNA abundance, respectively, and with the presence of 24-nt smRNAs at each silenced epiallele (Fig. 3, D to F, and fig. S8). These findings reveal that changes in epiallelic state can lead to major effects on transcriptional output (fig. S9).
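Expression differences of this kind are reported in Fig. 3E as log2 fold changes relative to line 1. The sketch below shows only that final transformation, assuming relative expression values have already been obtained; how the qRT-PCR signal was normalized is not described in this section, and the expression values here are invented for illustration.

```python
import math


def log2_fold_change(expression, reference):
    """log2 of the expression ratio relative to a reference sample."""
    return math.log2(expression / reference)


# Invented relative expression values; line_1 serves as the reference.
toy_expression = {"line_1": 1.0, "line_29": 0.5, "line_59": 64.0}

toy_log2fc = {
    line: log2_fold_change(value, toy_expression["line_1"])
    for line, value in toy_expression.items()
}
print(toy_log2fc)

# On this scale, the >50-fold increase noted in the Fig. 3 legend
# corresponds to a log2 fold change above log2(50).
threshold = math.log2(50)
print(f"log2(50) = {threshold:.2f}")
```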
We also observed that the methylation status of one C-DMR resulted in alternative promoter usage of ACTIN RELATED PROTEIN 9 (At5g43500) (fig. S10C). The loss of DNA methylation within the 5′ untranslated region (UTR) of the At5g43500.1 isoform led to an increase in mRNA expression, whereas expression of isoform At5g43500.2, with a transcriptional start site located further downstream, was unaffected (fig. S10, D and E).
Although epialleles can have major impacts on phenotypic diversity, until now their identification was not trivial. Even more puzzling is the origin of “pure” alleles, which are defined by their formation in the absence of any genetic variation in cis or trans (8). One route to epiallele formation may be the failure to correctly maintain the proper methylation status throughout epigenetic reprogramming that occurs post-fertilization (29, 30). It is noteworthy that 63 of the 72 C-DMRs overlap with regions previously shown to have altered methylation patterns in methylation enzyme mutants (Fig. 4) (3). Of the 14 C-DMRs that overlap with genes, 5 become reexpressed in met1-3 and 1 transcript becomes silenced in rdd (3). These results suggest that a failure to faithfully maintain genome-wide methylation patterns by MET1 and/or RDD is likely one source of spontaneous epiallele formation.

[Fig. 3 graphic not reproduced. Panel (A) categories: transposons, n = 27; intergenic, n = 21; genes, n = 14; promoters, n = 7; ncRNAs, n = 2; pseudogene, n = 1. Panel (B) axes: number of descendant lines discordant with ancestral state; number of C-DMRs. Panel (C) axes: mC-DMR density quantiles (%); average smRNA RPKCMs (21-, 22-, 23-, and 24-nt smRNAs). Panel (E) axis: log2 fold change in mRNA levels of At5g24240 (relative to line 1), shown for lines 1, 19, 12, 29, 49, 59, 69, and 119. Panel (F) axis: smRNA levels at the At5g24240 C-DMR (RPKCMs), shown for the same lines.]

Fig. 3. Epiallelic variation at protein-coding loci is associated with transcriptional variation. (A) Classification of C-DMRs and their genomic locations. (B) The number of descendant lines discordant with the ancestral C-DMR state and the C-DMR methylation status. The black portions of the bar indicate the descendant C-DMRs that became methylated, whereas the white portions indicate regions that became unmethylated, compared with the ancestral population. (C) The 24-nt smRNA levels are associated with increasing methylation density. The 24-nt smRNA RPKCMs for all 576 C-DMRs (8 MA lines by 72 C-DMRs) were ranked and binned into 10% quantiles, and then the average mC densities were plotted. (D) A representative C-DMR at At5g24240 in which both biological replicates of descendant line 59 were unmethylated. (E) qRT-PCR analysis of At5g24240 reveals >50-fold increase in mRNA abundance in unmethylated line 59. Error bars indicate SEM. (F) The 24-nt smRNAs are enriched specifically in the MA lines that are transcriptionally silenced in (E) for the At5g24240 locus with the exception of line 59, which is abundantly expressed in (E).

[Fig. 4 graphic not reproduced. y axis: number of C-DMRs. Labels: met1; ddc; rdd; number of mC-DMRs that become unmethylated in; number of C-DMRs that become remethylated in; partially methylated; not methylated in Col-0.]

Fig. 4. Methylation status of all 72 epialleles in methylation and demethylation mutant backgrounds. Most of the epialleles become unmethylated in met1-3, whereas a smaller number become remethylated in the DNA demethylase triple mutant rdd.
Regardless of their origin, the majority of epialleles identified in this study are meiotically stable and heritable across many generations in this population. Understanding the basis for such transgenerational instability and the mechanism(s) that trigger and/or release these epiallelic states will be of great importance for future studies.
References and Notes
1. J. A. Law, S. E. Jacobsen, Nat. Rev. Genet. 11, 204 (2010).
2. S. J. Cokus et al., Nature 452, 215 (2008).
3. R. Lister et al., Cell 133, 523 (2008).
4. X. Zhang et al., Cell 126, 1189 (2006).
5. D. Zilberman, M. Gehring, R. K. Tran, T. Ballinger, S. Henikoff, Nat. Genet. 39, 61 (2007).
6. S. W.-L. Chan et al., Science 303, 1336 (2004).
7. J. Paszkowski, U. Grossniklaus, Curr. Opin. Plant Biol. 14, 195 (2011).
8. E. J. Richards, Nat. Rev. Genet. 7, 395 (2006).
9. K. Shibuya, S. Fukushima, H. Takatsuji, Proc. Natl. Acad. Sci. U.S.A. 106, 1660 (2009).
10. P. Cubas, C. Vincent, E. Coen, Nature 401, 157 (1999).
11. K. Manning et al., Nat. Genet. 38, 948 (2006).
12. A. J. Thompson et al., Plant Physiol. 120, 383 (1999).
13. M. W. Vaughn et al., PLoS Biol. 5, e174 (2007).
14. F. Johannes et al., PLoS Genet. 5, e1000530 (2009).
15. F. K. Teixeira et al., Science 323, 1600 (2009); 10.1126/science.1165313.
16. A. Vongs, T. Kakutani, R. A. Martienssen, E. J. Richards, Science 260, 1926 (1993).
17. R. G. Shaw, D. L. Byers, E. Darmo, Genetics 155, 369 (2000).
18. S. Ossowski et al., Science 327, 92 (2010).
19. Additional experiments and descriptions of methods used to support our conclusions are presented as supporting material on Science Online.
20. C. M. Nievergelt et al., Am. J. Med. Genet. B. Neuropsychiatr. Genet. 141B, 234 (2006).
21. M. A. Zapala, N. J. Schork, Proc. Natl. Acad. Sci. U.S.A. 103, 19430 (2006).
22. R. K. Chodavarapu et al., Nature 466, 388 (2010).
23. J. Liu, Y. He, R. Amasino, X. Chen, Genes Dev. 18, 2873 (2004).
24. J. Bender, G. R. Fink, Cell 83, 725 (1995).
25. S. Melquist, B. Luff, J. Bender, Genetics 153, 4017 (1999).
26. S. E. Jacobsen, E. M. Meyerowitz, Science 277, 1100 (1997).
27. H. Saze, T. Kakutani, EMBO J. 26, 3641 (2007).
28. W. J. Soppe et al., Mol. Cell 6, 791 (2000).
29. R. A. Mosher et al., Nature 460, 283 (2009).
30. R. K. Slotkin et al., Cell 136, 461 (2009).
Acknowledgments: We thank M. White, R. Lister, M. Galli, and R. Amasino for discussions; R. Shaw and E. Darmo for seeds; J. Nery for sequencing operations; and M. Axtell for Southern blot protocol. R.J.S. was supported by an NIH National Research Service Award postdoctoral fellowship (F32-HG004830). M.D.S. was supported by an NSF Integrative Graduate Education and Research Traineeship grant (DGE-0504645). M.G.L. was supported by a European Union Framework Programme 7 Marie Curie International Outgoing Fellowship (project 252475). O.L. and N.J.S. are supported by NIH/National Center for Research Resources grant number UL1 RR025774. This work was supported by the Mary K. Chapman Foundation, the NSF (grants MCB-0929402 and MCB1122246), the Howard Hughes Medical Institute, and the Gordon and Betty Moore Foundation (GBMF) to J.R.E. J.R.E. is a HHMI–GBMF Investigator. Analyzed data sets can be viewed at http://neomorph.salk.edu/30_generations/browser.html. Sequence data can be downloaded from National Center for Biotechnology Information Sequence Read Archive (SRA035939). Correspondence and requests for materials should be addressed to J.R.E. ([email protected]).
Supporting Online Material
www.sciencemag.org/cgi/content/full/science.1212959/DC1
Materials and Methods
SOM Text
Figs. S1 to S11
Tables S1 to S16
References

22 August 2011; accepted 7 September 2011
Published online 15 September 2011; 10.1126/science.1212959
Computation-Guided Backbone Grafting of a Discontinuous Motif onto a Protein Scaffold

Mihai L. Azoitei,1* Bruno E. Correia,1,2* Yih-En Andrew Ban,1† Chris Carrico,1,3 Oleksandr Kalyuzhniy,1 Lei Chen,4 Alexandria Schroeter,1 Po-Ssu Huang,1 Jason S. McLellan,4 Peter D. Kwong,4 David Baker,1,5 Roland K. Strong,3 William R. Schief1,6,7‡
The manipulation of protein backbone structure to control interaction and function is a challenge for protein engineering. We integrated computational design with experimental selection for grafting the backbone and side chains of a two-segment HIV gp120 epitope, targeted by the cross-neutralizing antibody b12, onto an unrelated scaffold protein. The final scaffolds bound b12 with high specificity and with affinity similar to that of gp120, and crystallographic analysis of a scaffold bound to b12 revealed high structural mimicry of the gp120-b12 complex structure. The method can be generalized to design other functional proteins through backbone grafting.
Computational protein design tests our understanding of protein structure and folding and provides valuable reagents for biomedical and biochemical research; long-term goals include the design of field- or clinic-ready biosensors (1), enzymes (2), therapeutics (3), and vaccines (4, 5). A major limitation has been an inability to manipulate backbone structure; most computational protein design has involved sequence design on predetermined backbone structures or with minor backbone movement (1–5). Accurate backbone remodeling presents a substantial challenge for computational methods owing to limited conformational sampling and imperfect energy functions (6).

Novel recognition modules (7), inhibitors (8, 9), enzymes (2), and immunogens (4, 5, 10, 11) have been designed by grafting functional constellations of side chains onto protein scaffolds of predefined backbone structure. In all cases, the restriction to using predetermined scaffold backbone structures limited the complexity of the functional motifs that could be transplanted. For example, the de novo enzymes could accommodate grafting of only three or four catalytic groups, whereas many natural enzymes have six or more (12), and the immunogens were limited to continuous (single-segment) epitopes even though most antibody epitopes are discontinuous (involving two or more antigen segments) (13, 14).

To address the challenge of incorporating backbone flexibility modeling into grafting design, we developed a hybrid computational-experimental method for grafting the backbone and side chains of functional motifs onto scaffolds (Fig. 1). We tested this method by grafting a discontinuous HIV gp120 epitope, targeted by the broadly neutralizing monoclonal antibody b12 (15), onto an unrelated scaffold. b12 binds to a conserved epitope within the CD4-binding site (CD4bs) of gp120 (16), an area of great interest for vaccine design. We focused on transplantation of two segments from gp120: residues 365 to 372, known as the CD4b (CD4 binding) loop (17), and residues 472 to 476, known as the ODe (outer domain exit) loop (16). The b12-gp120 interaction involves six or seven backbone segments on gp120 (16), but 60% of the buried surface area on gp120 lies on the CD4b and ODe loops, and a Rosetta energy calculation (18) suggested that these two
1 Department of Biochemistry, University of Washington, Seattle, WA 98195, USA. 2 Ph.D. Program in Computational Biology, Instituto Gulbenkian de Ciência, Oeiras, Portugal. 3 Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA. 4 Vaccine Research Center, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20892, USA. 5 Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA. 6 IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037, USA. 7 Department of Immunology and Microbial Science, The Scripps Research Institute, La Jolla, CA 92037, USA.
*These authors contributed equally to this work.
†Present address: Arzeda Corporation, Seattle, WA 98102, USA.
‡To whom correspondence should be addressed. E-mail: [email protected]
www.sciencemag.org SCIENCE VOL 334 21 OCTOBER 2011 373
Transgenerational Epigenetic Instability Is a Source of Novel Methylation Variants
Robert J. Schmitz, Matthew D. Schultz, Mathew G. Lewsey, Ronan C. O’Malley, Mark A. Urich, Ondrej Libiger, Nicholas J. Schork, Joseph R. Ecker*
*To whom correspondence should be addressed. E-mail: [email protected]
Published 15 September 2011 on Science Express DOI: 10.1126/science.1212959
This PDF file includes:
Materials and Methods SOM Text Figs. S1 to S11 Tables S1 to S16 References
Supporting Online Material
Materials and Methods
SOM Text
References
Figs. S1 to S11
Tables S1 to S16

Materials and Methods

Plant material
All seeds used in these experiments were descendants of a Columbia (Col-0) accession and have been previously described by Shaw et al. (17). All eight lines were grown in replicate under long-day conditions within a single tray. Leaf tissue was flash frozen in liquid nitrogen at approximately the 10-leaf stage within a 15-minute period. DNA was isolated using a Qiagen Plant DNeasy kit (Qiagen, Valencia, CA) following the manufacturer’s recommendations. RNA was isolated using the Qiagen Plant RNeasy kit (Qiagen) following the manufacturer’s instructions.

MethylC-Seq library construction
Approximately two micrograms of genomic DNA was sonicated to ~100 bp with the Covaris S2 System using the following parameters: cycle number = 6, duty cycle = 20%, intensity = 5, cycles/burst = 200 and time = 60 seconds. Sonicated DNA was purified using Qiagen DNeasy minielute columns (Qiagen). Each sequencing library was constructed using the NEBNext DNA Sample Prep Reagent Set 1 (New England Biolabs, Ipswich, MA) according to the manufacturer’s instructions with the following slight modifications. Methylated adapters were used in place of the standard genomic DNA adapters from Illumina (Illumina, San Diego, CA). Ligation products were purified with AMPure XP beads (Beckman, Brea, CA). DNA (450 ng) was bisulfite treated using the MethylCode Kit (Invitrogen, Carlsbad, CA) following the manufacturer’s guidelines and then PCR amplified using Pfu Cx Turbo (Agilent, Santa Clara, CA), instead of the Phusion Taq included in the NEBNext kit, under the following PCR conditions: 2 minutes at 95 °C, 4 cycles of 15 seconds at 98 °C, 30 seconds at 60 °C, 4 minutes at 72 °C and 10 minutes at 72 °C.

Small RNA library construction
Approximately 200 mg of finely ground tissue from single rosette plants containing 10 leaves was used for small RNA isolation following the instructions provided with the Ambion mirVana miRNA Isolation Kit (Ambion, Austin, TX). Small RNAs isolated with this kit were ethanol precipitated and loaded onto a 15% TBE-urea gel (Life Technologies, Carlsbad, CA). smRNAs in the range of ~15-50 nucleotides in length were excised from the gel. These smRNAs were used for library construction following the protocol provided in the TruSeq Small RNA Sample Preparation Kit (Illumina).

Sequencing
MethylC-Seq libraries were sequenced using the Illumina HiSeq 2000 (Illumina) as per the manufacturer’s instructions. Sequencing of libraries was performed up to 101 cycles. Image analysis and base calling were performed with the standard Illumina pipeline version RTA 2.8.0.

Sequencing analysis
Fastq files were aligned to TAIR10 using Bowtie (31), and custom algorithms were used for identification of mC sites as described previously (32).

Generating the CG-SMPs
In each of the samples, methylated cytosines (mCs) were identified using the same method described in (32). Only positions where the mC calls in both biological replicates of a particular sample agreed (i.e., both positions were either methylated or unmethylated) and only those in the CG context were considered. nonCG-SMPs were not queried because the levels of methylation at these sites are much more variable than at CGs, which is likely due to the mechanism by which they are maintained. From this list of potential CG-SMPs, all positions were removed where all 8 individuals had the same methylation state. Additionally, any site that did not have coverage in all samples and all biological replicates was removed. A site was considered covered in a sample if a methylated cytosine had been called for that position or if that position had coverage of four or more sequencing reads. This list of CG-SMPs was used for all subsequent analyses included in Figure 1.

Calculating the “epimutation rate” using linear regression
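To make the CG-SMP filtering rules from the "Generating the CG-SMPs" section concrete, the following is a minimal sketch, not the authors' custom code. The data layout (one pair of replicate calls per sample, with `None` meaning "not covered") and the names `sample_state` and `call_cg_smps` are assumptions introduced for illustration; the coverage rule (a called mC or at least four reads) is assumed to have been applied upstream when the calls were made.

```python
from typing import Dict, List, Optional, Tuple

# One call per replicate: True = methylated, False = unmethylated,
# None = site not covered in that replicate (hypothetical encoding).
Call = Optional[bool]


def sample_state(rep1: Call, rep2: Call) -> Call:
    """Return the agreed state of two replicates, or None if the site is
    uncovered in either replicate or the replicates disagree."""
    if rep1 is None or rep2 is None or rep1 != rep2:
        return None
    return rep1


def call_cg_smps(
    sites: Dict[str, List[Tuple[Call, Call]]]
) -> Dict[str, List[bool]]:
    """Keep CG sites that are covered and replicate-concordant in every
    sample and whose methylation state differs in at least one sample."""
    smps = {}
    for site, replicate_pairs in sites.items():
        states = [sample_state(r1, r2) for r1, r2 in replicate_pairs]
        if any(state is None for state in states):
            continue  # missing coverage or replicate disagreement
        if len(set(states)) == 1:
            continue  # same state in all samples: not polymorphic
        smps[site] = states
    return smps


# Toy input with three samples (the study used eight).
toy_sites = {
    "chr1:100": [(True, True), (True, True), (False, False)],   # kept
    "chr1:200": [(True, True), (True, True), (True, True)],     # invariant
    "chr1:300": [(True, False), (True, True), (False, False)],  # replicates disagree
    "chr1:400": [(True, True), (None, False), (False, False)],  # not covered
}
cg_smps = call_cg_smps(toy_sites)
```

Only the first toy site survives all three filters; the others are dropped for the reason given in the trailing comment.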
To estimate the number of SMPs that arise per generation, the number of SMPs was regressed against the generation number of each individual. To calculate the number of SMPs in a given sample, a reference methylome was created. The methylation status of the reference methylome was determined by examining the three ancestor samples and finding the majority methylation status at each CG position (e.g., a site with two methylated ancestors and one unmethylated ancestor would cause that site to be methylated in the reference methylome). The number of SMPs in a sample was determined by counting the number of differences between this probable founder methylome and the methylome of each sample at all sites where there was coverage in both replicates of all eight samples and where both replicates agreed (see the Generating the CG-SMPs section for determination of coverage). A simple linear relationship between the number of generations and the number of SMPs was assumed (i.e., Number of SMPs = Generation Number × β1 + β0). The linear regression was performed in R using the lm function. When all samples were included, this regression yielded an epimutation rate per generation (β1) of 788 with a standard error of ± 109 and an adjusted R² of 0.87. However, when the regression was repeated removing one sample at a time, the regression significantly improved with the removal of sample 69 (β1 = 703.66; standard error = 59.27; adjusted R² = 0.9524) and changed little with the exclusion of other samples (see table S5). It is unclear whether this difference exists for biological or technical reasons; it seems reasonable to exclude this sample for the purposes of this regression as it is likely confounding the results. Based on the epimutation rate per generation calculated using linear regression, we estimated the epimutation rate per site per generation by normalizing the estimate from the linear regression by the number of sites that had coverage in all 8 samples and both replicates. This calculation yielded an epimutation rate per site per generation of 4.99 × 10⁻⁴ (error ± 6.91 × 10⁻⁵) for all samples and of 4.46 × 10⁻⁴ (error ± 3.75 × 10⁻⁵) for all samples except 69.

Calculating the “epimutation rate” using Tree Puzzle
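As an illustration of the linear-regression estimate described in the preceding section (the analysis itself was run with R's lm function), the sketch below fits Number of SMPs = Generation Number × β1 + β0 by ordinary least squares and converts the slope to a per-site rate. The function names, the per-line SMP counts and the site count are hypothetical placeholders, since the per-sample values are reported in table S5 rather than in the text; the generation numbers follow the caption of Supplemental Figure 1.

```python
from math import sqrt
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = b1 * x + b0.

    Returns (b1, b0, standard error of b1). Needs at least three points.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual sum of squares, then the usual slope standard error.
    rss = sum((yi - (b1 * xi + b0)) ** 2 for xi, yi in zip(x, y))
    se_b1 = sqrt(rss / (n - 2) / sxx)
    return b1, b0, se_b1


def per_site_rate(smps_per_generation: float, n_sites: int) -> float:
    """Normalize a per-generation SMP count by the number of covered sites."""
    return smps_per_generation / n_sites


# Hypothetical inputs: three ancestors at generation 3 and five
# descendants at generation 30; SMP counts are made up.
generations = [3, 3, 3, 30, 30, 30, 30, 30]
smp_counts = [2500, 2300, 2700, 23000, 24500, 22000, 30000, 23500]
slope, intercept, slope_se = fit_line(generations, smp_counts)

hypothetical_n_sites = 1_500_000  # placeholder, not a value from the text
rate_per_site = per_site_rate(slope, hypothetical_n_sites)
```

Repeating the fit with one sample left out at a time, as the text describes, only requires calling `fit_line` on the reduced lists.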
To estimate the rate of epimutations per generation (e.g., number of CGs that change methylation status per generation), we used the program Tree Puzzle (33) to create a phylogram of our 8 samples. As input to this program, we used the 1,109,132 sites for which we had coverage in all 8 samples and for which the replicates agreed in their methylation state (i.e., both replicates were methylated or both replicates were unmethylated). The tree was generated using the default settings of the program (fig. S11). The branch lengths for each node were multiplied by the number of sites input to Tree Puzzle (again, 1,109,132) to obtain an expected number of changes along each branch from the root. This expected number was then divided by the number of generations that separated the node from the root to generate the expected number of epimutations per generation (table S13). The average number of epimutations per generation was calculated by averaging the aforementioned epimutation rates along each branch (with the exception of 69, see Linear Regression), which was 2,876 epimutations per generation.

Generating the CG-SMPs heatmap
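The branch-length conversion described in the Tree Puzzle section above reduces to a short calculation, sketched here. The site count (1,109,132) comes from the text; the branch lengths and generation counts in `branches` are hypothetical stand-ins for the values in table S13, and the function name is an assumption.

```python
from typing import Dict, Tuple

N_SITES = 1_109_132  # number of sites supplied to Tree Puzzle (from the text)


def epimutations_per_generation(
    branch_length: float, generations: int, n_sites: int = N_SITES
) -> float:
    """Convert a branch length (expected changes per site, measured from
    the root) into an expected number of epimutations per generation."""
    expected_changes = branch_length * n_sites
    return expected_changes / generations


# Hypothetical (branch length, generations from root) per line.
# Line 69 is left out, mirroring the exclusion described in the text.
branches: Dict[str, Tuple[float, int]] = {
    "29": (0.075, 30),
    "49": (0.080, 30),
    "59": (0.078, 30),
    "119": (0.079, 30),
}
rates = {
    line: epimutations_per_generation(length, gens)
    for line, (length, gens) in branches.items()
}
mean_rate = sum(rates.values()) / len(rates)
```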
All pairwise comparisons of the eight samples were considered for each CG-SMP. The total number of differences (i.e., sites where one sample is methylated at a CG-SMP and the other is not) is shown in the heatmap, with deeper red intensity indicating greater dissimilarity.

Statistical analysis
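The pairwise counts behind the CG-SMP heatmap described above can be sketched as follows. The input layout (one list of booleans per sample, in the same site order) and the function name are assumptions made for illustration; the toy data are made up.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def pairwise_differences(
    states: Dict[str, List[bool]]
) -> Dict[Tuple[str, str], int]:
    """Count, for every pair of samples, the CG-SMPs at which one sample
    is methylated and the other is not."""
    counts = {}
    for a, b in combinations(sorted(states), 2):
        counts[(a, b)] = sum(x != y for x, y in zip(states[a], states[b]))
    return counts


# Toy example: three samples, four CG-SMPs.
toy_states = {
    "1": [True, False, True, True],
    "12": [True, False, True, False],
    "29": [False, True, True, False],
}
difference_counts = pairwise_differences(toy_states)
```

The resulting dictionary holds one entry per unordered sample pair, which is the information a symmetric heatmap needs.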
To determine the distribution of CG-DMRs in genomic features, the expected proportion of CG-DMRs in genes, transposons and intergenic regions was calculated by summing the number of bases covered by these genomic contexts and normalizing that sum by 119 Mb. The observed proportions of CG-DMRs in each of these contexts were tested against the expected proportions using a chi-square test (table S14). Furthermore, a Pearson product-moment correlation test was used to determine the strength of the linear relationship of the methylation status (expressed as a binary string of methylation statuses at each site) at CG-SMPs between pairs of samples.

Multivariate distance-based regression (MDMR (21, 34)) tests the hypothesis that the distance (greater or lesser) between individuals is associated with additional variables. The Euclidean distance was used to assess the distance between strains in terms of methylation status patterns (expressed as the fraction of reads containing a methylated cytosine divided by the length of the DMR). These distances were then tested for association with ancestral versus descendant status. To assess the distance between methylation densities of the DMRs, the Pearson product-moment correlation-based distance was used. These distances were then tested for association with the location of DMRs (e.g., genic, exonic, intronic, intergenic and in transposons). P-values were determined via permutation tests (number of permutations = 1000). A Python implementation of MDMR based on the program DISTLM developed by Marti J. Anderson (35) was used. The program can be found at http://www.stsiweb.org/index.php/infrastructure/software_data/multivariate_distance_matrix_regression_mdmr/.

The R package pvclust was used to calculate bootstrap values for the clusters in Figure 2B. The values in green above each part of the tree represent the bootstrap probability that the cluster exists in our data set, and the values in red represent the approximately unbiased (AU) p-values. The null hypothesis for both of these tests is that the cluster does not exist in our data set. Consequently, a high percentage indicates high confidence that the cluster does indeed exist. Red rectangles indicate clusters that have an AU value greater than 95%. Not only do all of the replicates cluster strongly with one another, as expected, but the ancestors also cluster strongly together in a group that is separate from the descendants.

Identification of DMRs
DMRs were identified using the methylPipe package in R (36). Each specific methylation context (CG, CHG and CHH) was scanned genome-wide, requiring at least 10 mC differences within a 100 bp window. The 100 bp window is an initial query that is later reduced to the span between the first and last cytosine in the DMR (which can be less than 100 bp). The methylation level of the sites within a window was then compared across all samples using a Kruskal-Wallis test. Next, these potential DMRs were consolidated by joining neighboring DMRs that occur within 50 bp of each other. The P-values of joined DMRs were combined using Fisher’s method. The P-values of these joined DMRs were then adjusted for multiple hypothesis testing with the Benjamini-Hochberg method as implemented in R, and any DMR with an adjusted P-value below 0.01 was kept. Furthermore, an 8-fold difference in methylation density between the least methylated and most methylated sample was required. The list of mC DMRs was determined by finding the intersection of the DMRs in all three contexts. The final lists of CG-, nonCG- and C-DMRs can be found in tables S6 to S8.

DMR distribution across gene bodies
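Two steps of the DMR identification described above, combining the P-values of joined windows with Fisher's method and adjusting them with the Benjamini-Hochberg procedure, can be sketched in plain Python. This is not methylPipe's code: the function names and the toy P-values are assumptions, and the Benjamini-Hochberg function is written to follow the standard step-up definition that R's p.adjust also uses.

```python
from math import exp, log
from typing import List, Sequence


def fisher_combine(p_values: Sequence[float]) -> float:
    """Combine independent p-values (each > 0) with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose upper tail has a closed form for even
    degrees of freedom: exp(-s/2) * sum_{i<k} (s/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, exp(-half_stat) * total)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, index in enumerate(order):
        rank = m - offset  # 1-based rank in ascending order
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Toy example: per-window p-values for three joined candidate DMRs.
window_p_values = [[0.001, 0.002], [0.04], [0.0005, 0.2, 0.01]]
combined = [fisher_combine(ps) for ps in window_p_values]
adjusted = benjamini_hochberg(combined)
kept = [i for i, p in enumerate(adjusted) if p < 0.01]  # threshold from the text
```

The 8-fold methylation-density requirement would be applied as a further filter on the DMRs listed in `kept`.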
To calculate the relative density of DMRs across gene bodies, DMRs (both CG and nonCG) were overlapped with protein-coding genes from the TAIR10 reference. For our gene annotations, we used the file found at: ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_gff3/TAIR10_GFF3_genes.gff. The list of nonCG DMRs was created by taking the intersection of DMRs in the CHG and CHH contexts. These overlaps were then used to calculate the density of these DMRs within genes by dividing each gene into 12 bins (10 evenly spaced bins in the gene body and a 500 bp bin upstream and downstream of each gene) and calculating the density of DMRs within each of those bins (number of bp of DMR overlap / number of bp within a bin across all genes). The densities for a particular DMR class (i.e., CG or nonCG) were then normalized by the minimum density within that class.

Distribution of DMRs in introns and exons
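The 12-bin gene-body density described in the "DMR distribution across gene bodies" section can be sketched as follows. The tuple layouts for genes and DMRs, the half-open coordinate convention, the strand handling and all function names are assumptions for illustration; the sketch also assumes that the DMRs do not overlap one another.

```python
from typing import List, Sequence, Tuple

Gene = Tuple[str, int, int, str]  # (chromosome, start, end, strand), half-open
Dmr = Tuple[str, int, int]        # (chromosome, start, end), half-open


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Length of the overlap between two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def gene_bins(start: int, end: int, strand: str,
              flank: int = 500, n_body: int = 10) -> List[Tuple[int, int]]:
    """Return upstream, gene-body and downstream bins ordered 5' to 3'."""
    edges = [start + (end - start) * i // n_body for i in range(n_body + 1)]
    body = [(edges[i], edges[i + 1]) for i in range(n_body)]
    bins = [(max(0, start - flank), start)] + body + [(end, end + flank)]
    return bins if strand == "+" else bins[::-1]


def dmr_density(genes: Sequence[Gene], dmrs: Sequence[Dmr],
                flank: int = 500, n_body: int = 10) -> List[float]:
    """Per-bin density: bp of DMR overlap / bp in that bin across all genes."""
    n_bins = n_body + 2
    covered = [0] * n_bins
    total = [0] * n_bins
    for chrom, start, end, strand in genes:
        for i, (b_start, b_end) in enumerate(
                gene_bins(start, end, strand, flank, n_body)):
            total[i] += b_end - b_start
            covered[i] += sum(
                overlap(b_start, b_end, d_start, d_end)
                for d_chrom, d_start, d_end in dmrs if d_chrom == chrom)
    return [c / t if t else 0.0 for c, t in zip(covered, total)]


def normalize_by_min(densities: Sequence[float]) -> List[float]:
    """Divide every density by the smallest one, as described in the text."""
    smallest = min(densities)
    if smallest == 0:
        raise ValueError("minimum density is zero; normalization undefined")
    return [d / smallest for d in densities]


# Toy example: one plus-strand gene and one DMR covering its first body bin.
toy_genes = [("1", 1000, 2000, "+")]
toy_dmrs = [("1", 1000, 1100)]
toy_density = dmr_density(toy_genes, toy_dmrs)
```

Reversing the bin order for minus-strand genes keeps bin 0 as the upstream flank regardless of orientation, which is one reasonable reading of "upstream and downstream of each gene".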
A list of introns and exons was created using the TAIR10 GFF file obtained at ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_gff3/TAIR10_GFF3_genes.gff. Next, the number of bases that overlapped between CG-DMRs or nonCG-DMRs and either exons or introns was calculated. This count of bases was then normalized by the total number of CGs/nonCGs in each feature type.

Genome-wide distribution of methylation variation
Counts of SMPs, CG-DMRs, nonCG-DMRs, m-nonCGs, mCGs, mappable nonCGs, and mappable CGs were generated and plotted in 100 equally sized bins across each chromosome. Mappable CGs/nonCGs were defined as those that were covered by at least one read in one of our 16 samples. The SMP, CG-DMR, nonCG-DMR, m-nonCG and mCG counts in each bin were normalized by the count in the respective mappable CG/nonCG bin. These normalized counts were then scaled to one by dividing all elements within a dataset (i.e., within SMPs, CG-DMRs, nonCG-DMRs, m-nonCGs, or mCGs) by the maximum value in their respective dataset. These scaled and normalized counts were then plotted for each chromosome.

Alignment of smRNA reads
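The binning, normalization and scaling described in the "Genome-wide distribution of methylation variation" section can be sketched as below. The function names and the 0-based position convention are assumptions; whether the maximum used for scaling was taken per chromosome or across the genome is not stated in the text, so the sketch simply scales whatever list it is given.

```python
from typing import List, Sequence


def bin_counts(positions: Sequence[int], chrom_length: int,
               n_bins: int = 100) -> List[int]:
    """Count 0-based positions in n_bins equally sized bins."""
    counts = [0] * n_bins
    for pos in positions:
        index = min(pos * n_bins // chrom_length, n_bins - 1)
        counts[index] += 1
    return counts


def normalize_and_scale(feature_counts: Sequence[int],
                        mappable_counts: Sequence[int]) -> List[float]:
    """Divide each bin by its mappable-site count, then scale so that the
    largest normalized value equals one."""
    normalized = [f / m if m else 0.0
                  for f, m in zip(feature_counts, mappable_counts)]
    peak = max(normalized)
    return [value / peak if peak else 0.0 for value in normalized]


# Toy example with 4 bins on a 1,000 bp "chromosome".
toy_smps = bin_counts([10, 20, 300, 900], chrom_length=1000, n_bins=4)
toy_mappable = bin_counts(list(range(0, 1000, 10)), chrom_length=1000, n_bins=4)
toy_track = normalize_and_scale(toy_smps, toy_mappable)
```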
smRNA reads were first processed to remove the 3’ adapter sequence, and smRNAs greater than 16 bp in length were aligned to the TAIR10 reference genome using the Bowtie alignment algorithm with the following parameters: --solexa-quals -e 1 -l 20 -n 0 -a -m 1000 --best --nomaqround. Reads were retained if they matched the genome perfectly and mapped to no more than a thousand locations.

Analysis of sequence variants
Primer sets for all 72 of the C-DMRs identified were designed to encompass the entire DMR in addition to an extra 500 bp flanking the DMR. All primer sets can be found in table S15.

Southern blot analysis
Genomic DNA was isolated from single plants using the Qiagen DNeasy kit following the manufacturer’s protocol (Qiagen). Two micrograms of genomic DNA was digested with enzyme (5 units/µg) and incubated for 12 hours at 37 °C. Digested DNA was ethanol precipitated on ice for 30 minutes and resuspended in 20 µl of TE. Samples were incubated at 65 °C for 10 minutes and then loaded into a 0.7% TAE agarose gel and run overnight at a rate of <1 V/cm. The gel was soaked twice in Alkaline Transfer Buffer for 15 minutes and transferred overnight to Amersham Hybond XL (GE Healthcare, Piscataway, NJ). The membrane was crosslinked using a UV Stratalinker with the default settings. The membrane was incubated in 10 ml of the Sigma Perfect-Hyb buffer (Sigma, St. Louis, MO) for 1 hour at 65 °C. Probes were prepared following the manufacturer’s instructions included with the NEBlot Kit (New England Biolabs) and then cleaned with Sephadex G-50 spin columns (GE Healthcare, Piscataway, NJ). The probe was incubated with the membrane overnight. The membrane was then washed once with 20 ml of low stringency buffer (2X SSC, 0.1% SDS) for 5 minutes, twice with high stringency buffer (0.5X SSC, 0.1% SDS) for 20 minutes, and a final time in ultra-high stringency buffer (0.1X SSC, 0.1% SDS) for 2 minutes. Finally, membranes were exposed overnight to Biomax film (Carestream Health, Rochester, NY).

RNA expression analysis
RNA abundance was assessed by quantitative real-time (qRT)-PCR, as described in Lewsey et al. (37). Primers against transcripts of interest are listed in table S16. Data were analyzed using LinRegPCR (38, 39) and qBasePlus (40) software to give efficiency-corrected relative fold changes in RNA abundance with correct propagation of errors.

SOM Text

Epialleles that did not alter mRNA levels
Four C-DMRs showed little or no correlation between mRNA and methylation levels; these genes all showed a strong correlation between methylation density and 24nt smRNAs (fig. S9). It should be noted, however, that these genes also contained the smallest and least dense C-DMRs of those tested and may only be expressed in a tissue-specific manner, similar to FWA (41).
Explanation of possible sources of epiallele formation and stability
By sequencing the methylomes of eight MA lines, approximately 1.7% variation in DNA methylation was observed across five descendant lines separated by 30 generations; this amount of variation is ~5 orders of magnitude greater than the measured genetic variation observed in these same lines. Although full genome sequencing of the MA population did not uncover a single genetic variant that arose independently multiple times in these descendant lineages (18), our analysis of variation at the level of DNA methylation has identified numerous sites that are discordant in multiple descendant lines compared with their ancestral population. Furthermore, none of the 114,287 SMPs, 284 nonCG-DMRs or 72 C-DMRs and only one of the 2,485 CG-DMRs overlapped with the mutations previously identified by resequencing these MA lines (18). Additionally, the possibility that these C-DMRs are due to local cis-linked variation (transposon insertions) was ruled out by genomic characterization of these regions. Copy number variants that could act in cis or trans were also eliminated as a possible source of epiallele formation, indicating that C-DMRs can arise independently of such mutations. Therefore, while mutations found in the MA lines were randomly distributed (with the exception of G:C→A:T transitions (18)), the variation in DNA methylation and the spontaneous formation of epialleles is likely constrained to specific sequences or chromosomal contexts (Fig. 2E and F and figs. S4 and S5).
One possible source of naturally occurring epialleles is through the RNAi pathway (15). Interestingly, one of the C-DMRs that extends into protein-coding genes occurred at a locus with overlapping sense and antisense transcripts (fig. S8), possibly targeting this region for RdDM silencing. Expression of overlapping transcripts in the same cell and at the same developmental time could result in formation of dsRNA, which may trigger the production of small RNAs that direct DNA methylation to these target genes (1).
With only seven occurrences out of a possible 576 events of a change in the methylation status found between biological replicates (siblings), we conclude that these identified epialleles, in large part, are meiotically heritable. As shown by their functional effects on transcription, these novel epialleles, which arose spontaneously over 30 generations (~four years in chronological time), have a significant potential to alter the phenotype of the host organism.

References
1. J. A. Law, S. E. Jacobsen, Nat Rev Genet 11, 204 (2010).
15. F. K. Teixeira et al., Science 323, 1600 (2009).
17. R. G. Shaw, D. L. Byers, E. Darmo, Genetics 155, 369 (2000).
18. S. Ossowski et al., Science 327, 92 (2010).
21. M. A. Zapala, N. J. Schork, Proc Natl Acad Sci U S A 103, 19430 (2006).
31. B. Langmead, C. Trapnell, M. Pop, S. L. Salzberg, Genome Biol 10, R25 (2009).
32. R. Lister et al., Nature 462, 315 (2009).
33. H. A. Schmidt, K. Strimmer, M. Vingron, A. von Haeseler, Bioinformatics 18, 502 (2002).
34. C. M. Nievergelt, O. Libiger, N. J. Schork, PLoS Genet 3, e51 (2007).
35. B. H. McArdle, M. J. Anderson, Ecology 82, 290 (2001).
36. R. Lister et al., Nature 471, 68 (2011).
37. M. G. Lewsey et al., Mol Plant Microbe Interact 23, 835 (2010).
38. C. Ramakers, J. M. Ruijter, R. H. Deprez, A. F. Moorman, Neurosci Lett 339, 62 (2003).
39. J. M. Ruijter et al., Nucleic Acids Res 37, e45 (2009).
40. J. Hellemans, G. Mortier, A. De Paepe, F. Speleman, J. Vandesompele, Genome Biol 8, R19 (2007).
41. T. Kinoshita et al., Science 303, 521 (2003).
References and Notes
1. J. A. Law, S. E. Jacobsen, Nat Rev Genet 11, 204 (2010).
2. S. J. Cokus et al., Nature 452, 215 (2008).
3. R. Lister et al., Cell 133, 523 (2008).
4. X. Zhang et al., Cell 126, 1189 (2006).
5. D. Zilberman, M. Gehring, R. K. Tran, T. Ballinger, S. Henikoff, Nat. Genet. 39, 61 (2006).
6. S. W. Chan, X. Zhang, Y. V. Bernatavichute, S. E. Jacobsen, PLoS Biol. 4, e363 (2006).
7. S. W. Chan et al., Science 303, 1336 (2004).
8. J. Paszkowski, U. Grossniklaus, Curr Opin Plant Biol, (2011).
9. E. J. Richards, Nat. Rev. Genet. 7, 395 (2006).
10. K. Shibuya, S. Fukushima, H. Takatsuji, Proc Natl Acad Sci U S A 106, 1660 (2009).
11. P. Cubas, C. Vincent, E. Coen, Nature 401, 157 (1999).
12. K. Manning et al., Nat Genet 38, 948 (2006).
13. A. J. Thompson et al., Plant Physiol 120, 383 (1999).
14. F. Johannes et al., PLoS Genet 5, e1000530 (2009).
15. F. K. Teixeira et al., Science 323, 1600 (2009).
16. A. Vongs, T. Kakutani, R. A. Martienssen, E. J. Richards, Science 260, 1926 (1993).
17. R. G. Shaw, D. L. Byers, E. Darmo, Genetics 155, 369 (2000).
18. S. Ossowski et al., Science 327, 92 (2010).
19. Additional experiments and descriptions of methods used to support our conclusions are presented as supporting material on Science Online.
20. C. M. Nievergelt et al., Am J Med Genet B Neuropsychiatr Genet 141B, 234 (2006).
21. M. A. Zapala, N. J. Schork, Proc Natl Acad Sci U S A 103, 19430 (2006).
22. R. K. Chodavarapu et al., Nature 466, 388 (2010).
23. J. Liu, Y. He, R. Amasino, X. Chen, Genes Dev 18, 2873 (2004).
24. J. Bender, G. R. Fink, Cell 83, 725 (1995).
25. H. Yi, E. J. Richards, Genetics 183, 1227 (2009).
26. S. E. Jacobsen, E. M. Meyerowitz, Science 277, 1100 (1997).
27. H. Saze, T. Kakutani, EMBO J., (2007).
28. W. J. Soppe et al., Mol. Cell 6, 791 (2000).
29. R. A. Mosher et al., Nature 460, 283 (2009).
30. R. K. Slotkin et al., Cell 136, 461 (2009).
31. B. Langmead, C. Trapnell, M. Pop, S. L. Salzberg, Genome Biol 10, R25 (2009).
32. R. Lister et al., Nature 462, 315 (2009).
33. H. A. Schmidt, K. Strimmer, M. Vingron, A. von Haeseler, Bioinformatics 18, 502 (2002).
34. C. M. Nievergelt, O. Libiger, N. J. Schork, PLoS Genet 3, e51 (2007).
35. B. H. McArdle, M. J. Anderson, Ecology 82, 290 (2001).
36. R. Lister et al., Nature 471, 68 (2011).
37. M. G. Lewsey et al., Mol Plant Microbe Interact 23, 835 (2010).
38. C. Ramakers, J. M. Ruijter, R. H. Deprez, A. F. Moorman, Neurosci Lett 339, 62 (2003).
39. J. M. Ruijter et al., Nucleic Acids Res 37, e45 (2009).
40. J. Hellemans, G. Mortier, A. De Paepe, F. Speleman, J. Vandesompele, Genome Biol 8, R19 (2007).
Supplemental Figure 1. A scheme of the generation of the mutation accumulation population used in this study. A single founder line was used to generate this population. The three replicate ancestral lines (1, 12, 19) are separated from the original founder by three generations. The five replicate descendant lines are 30 generations removed from the original founder line.
[Figure: stacked bar chart. x-axis: lines 1, 19, 12 (Ancestors) and 29, 49, 59, 69, 119 (Descendants); y-axis: Total Number of SMPs (0 to 140,000); legend: Unmethylated SMP, Methylated SMP.]
Supplemental Figure 2. Total number of methylated and unmethylated SMPs per line. The number of methylated SMPs per line is not dependent on ancestral or descendant status. Red portions indicate unmethylated SMPs and blue portions indicate methylated SMPs.
[Figure: cluster dendrogram with AU/BP values (%). Cluster method: complete; distance: euclidean; y-axis: Height. Leaves: 29_r1, 29_r2, 59_r1, 59_r2, 119_r1, 119_r2, 69_r1, 69_r2, 49_r1, 49_r2, 1_r1, 1_r2, 12_r1, 12_r2, 19_r1, 19_r2.]
Supplemental Figure 3. Results from clustering using the R package pvclust. Values in red represent the approximately unbiased (AU) values and values in green represent bootstrap probability values. Red rectangles indicate portions of the tree that have AU values above 95. Based on these rectangles, one can see that the ancestors group together away from the descendants.
Supplemental Figure 4. Genome-wide distribution of mCGs, CG-SMPs and CG-DMRs for each chromosome. Each chromosome was broken down into 100 equally spaced bins. The CG-SMP, CG-DMR, and mCG counts in each bin were normalized by the number of mappable CGs within each bin. Red shaded boxes indicate positions of the centromeres for each chromosome. Red lines indicate mCGs, green lines indicate CG-SMPs and blue lines indicate CG-DMRs.
[Figure: line plots with panels for Chromosome II, Chromosome III, Chromosome IV and Chromosome V. x-axis: Position (Mb); y-axis: Normalized, scaled count; legend: mCG, CG-SMP, CG-DMR.]
[Figure: line plots with panels for Chromosome II, Chromosome III, Chromosome IV and Chromosome V. x-axis: Position (Mb); y-axis: Normalized, scaled count; legend: m-nonCG, nonCG-DMR.]
Supplemental Figure 5. Genome-wide distribution of m-nonCGs and nonCG-DMRs for each chromosome. Each chromosome was broken down into 100 equally spaced bins. The nonCG-DMR and m-nonCG counts in each bin were normalized by the number of mappable nonCGs within each bin. Red lines indicate m-nonCGs, green lines indicate nonCG-DMRs.
[Figure: two Southern blots. Lanes in each blot: lines 1, 12, 19, 29, 49, 59, 69, 119; size markers: 10 kb, 8 kb, 6 kb, 5 kb, 4 kb, 3 kb, 2 kb, 1.5 kb, 1 kb.]
Supplemental Figure 6. Southern blot analysis of the C-DMR region overlapping with At5g24240 (A) and the C-DMR overlapping with At3g01345 (B). Arrows indicate the expected size of fragments from genomic digestions with BamHI (A) and EcoRI (B). The higher product observed in (B) is a homologous sequence that is invariably present in all eight lines.
[Figure: Venn diagram with three sets labeled Line 1, Line 12 and Line 19.]
Supplemental Figure 7. A Venn diagram representing the overlap of C-DMRs between the three ancestral lines studied (1, 12, 19). In total, 46 C-DMRs are found in agreement in all three lines. The ancestral lines also contain C-DMRs that are discordant with one another, indicating possible hotspots of epiallelic variation.
[Figure: bar chart of log2 fold change in mRNA abundance of At1g53490 (relative to line 29) for ancestor lines 1, 19, 12 and descendant lines 29, 49, 59, 69, 119; panel labels D, E, F, G.]
Supplemental Figure 8. (A) An example of a C-DMR at At3g01345 that has lost methylation in all biological replicates of descendant lines 29, 49, 69 and 119. (B) qRT-PCR analysis of At3g01345 reveals >500-fold increase in mRNA abundance in the unmethylated lines. (C) The 24nt smRNAs are associated with transcriptional silencing in each of the corresponding MA lines. Interestingly, 21nt smRNAs appear in the MA lines that are transcriptionally active. (D and E) Examples of C-DMRs that overlap with protein-coding regions. qRT-PCR results that reveal a strong correlation between an absence of the mC-DMR and an increase in mRNA abundance. (E), smRNA levels at each C-DMR. 24nt smRNA levels correlate with mC-DMRs and less abundant mRNA levels.
[Figure: bar chart of log2 fold change in mRNA abundance of At1g53480 (relative to line 19) for ancestor lines 1, 19, 12 and descendant lines 29, 49, 59, 69, 119.]
Supplemental Figure 9. (A-D) Examples of C-DMRs that overlap with protein-coding regions visualized in DNA methylation tracks of ancestral and descendant lines. (E-H) qRT-PCR results that reveal no correlation between an absence of the mC-DMR and an increase in mRNA abundance. (I-L) 24nt smRNA levels associate with mC-DMRs.
[Figure: bar charts of log2 fold change in mRNA levels of At5g43500.2 (relative to line 1) and At5g43500.1 (relative to line 1) for ancestor lines 1, 19, 12 and descendant lines 29, 49, 59, 69, 119.]
Supplemental Figure 10. Epiallelic regions overlapping ncRNAs and alternative transcriptional start sites. (A) An example of a C-DMR at a ncRNA (At4g14548). DNA methylation is absent from both biological replicates of line 1. (B) qRT-PCR analysis of At4g14548 reveals increased mRNA abundance in the line that has lost DNA methylation. (C) A C-DMR occurring near the transcriptional start site of one of two splice variants. A zoomed in view of the C-DMR reveals an additional region of variation in DNA methylation (outlined with red box). (D) qRT-PCR analysis of the At5g43500.2 splice variant shows no correlation between mRNA expression and methylation state. (E) The methylation status of this region is associated with the transcriptional output of At5g43500.1. Error bars indicate standard error of the mean (s.e.m.).
Supplemental Figure 11. An estimate of the rate of epimutations per generation (e.g., number of CGs that change methylation status per generation) was generated using the program Tree Puzzle to create a phylogram of all 8 lines.
Table S1. MethylC-Seq data set details. The non-conversion percentage is a measure of the bisulfite conversion reaction efficiency. A 1% non-conversion rate indicates that 99% of unmethylated Cs were converted by the reaction. The non-conversion rate is determined as described by Lister et al 2008 Cell (3).
Table S2. Variation in CG methylation within the descendant lines. The number of variable CGs was calculated by looking for sites with coverage in all 5 descendant samples that did not completely agree (i.e., were not all methylated or all unmethylated). This number was then divided by the number of CGs present in the Arabidopsis genome on both strands (6,269,413).
Samples      Variable CGs   Percentage of CGs that vary out of all CGs   Difference
Omit 69_2    77270          1.23%                                        0.38%
Omit 49_1    87502          1.40%                                        0.22%
All          101256         1.62%                                        0.00%
Table S3. Number of dissimilar CG-SMPs used in the heatmap construction in Figure 1D. This table summarizes the number of dissimilar sites of all pairwise comparisons of the samples. A site is considered dissimilar if one sample has a methylated CG at a particular position and the other has an unmethylated CG. Only positions that agreed within replicates and that were covered in all 8 samples were considered.
Table S4. Pearson correlation coefficients of the methylation status of CG-SMPs for pairs of samples. Samples labeled 19, 1 and 12 represent the ancestral lines, while the labels 49, 29, 59, 69 and 119 represent the descendant lines. All pairwise correlations were highly statistically significant.
1  6721879   6722331   No
1  7138302   7138494   Yes
1  9573098   9573237   No
1  12083824  12083955  No
1  12097125  12097392  No
1  12642562  12642676  No
1  12693068  12693494  No
1  13024899  13025017  No
1  13065621  13065813  No
1  13106344  13106654  No
1  14018983  14019250  No
1  15935702  15935801  Yes
1  16790636  16790824  No
1  16881902  16882300  No
1  17229326  17229416  Yes
1  18078584  18078783  No
1  19744227  19744336  No
1  19963111  19964044  No
1  21249682  21249966  No
1  23755482  23755655  No
1  23856241  23856305  No
1  25215925  25215993  No
1  26185816  26186008  Yes
1  28515048  28516112  No
2  2228985   2229108   No
2  2232080   2232216   No
2  10241165  10241316  No
2  10591404  10591529  No
2  11812783  11812961  No
2  12623205  12623326  No
2  18343591  18343672  No
3  129070    129620    No
3  1848987   1849151   No
3  7253885   7254003   No
3  8048211   8048423   No
3  10782032  10782078  Yes
3  11023752  11023813  No
3  11097266  11097400  Yes
3  11101052  11101159  No
3  12460173  12460346  No
3  14095664  14095826  No
3  15080339  15080442  No
3  15796135  15796984  No
3  17520356  17520587  No
4  5323339   5323500   No
4  5965397   5965593   No
4  5987665   5988307   No
4  6551569   6551829   No
4  7089697   7089756   No
4  7843434   7843652   No
4  8088616   8088806   No
4  13536498  13536663  No
4  17413090  17413185  No
5  486678    486859    No
5  491553    491845    No
5  503109    503284    No
5  3752051   3752582   No
5  7039542   7039625   No
5  8081538   8081752   No
5  8233963   8234252   No
5  9818859   9819021   No
5  10807842  10807960  No
5  13834763  13834799  No
5  14089098  14089483  No
5  15154323  15154464  No
5  16602741  16603118  No
5  16995015  16995176  No
5  17469569  17469734  Yes
5  19198293  19199027  No
5  19830052  19830235  No
5  22804946  22805144  No
5  26480972  26481169  No
Table S9. A table of C-DMRs and corresponding 21nt smRNA RPKCMs
chr  start  end  1  12  19  29  49  59  69  119