Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome
Michael Weber1, Ines Hellmann2,3, Michael B Stadler1, Liliana Ramos4, Svante Pääbo2, Michael Rebhan1 & Dirk Schübeler1
To gain insight into the function of DNA methylation at cis-regulatory regions and its impact on gene expression, we measured methylation, RNA polymerase occupancy and histone modifications at 16,000 promoters in primary human somatic and germline cells. We find CpG-poor promoters hypermethylated in somatic cells, which does not preclude their activity. This methylation is present in male gametes and results in evolutionary loss of CpG dinucleotides, as measured by divergence between humans and primates. In contrast, strong CpG island promoters are mostly unmethylated, even when inactive. Weak CpG island promoters are distinct, as they are preferential targets for de novo methylation in somatic cells. Notably, most germline-specific genes are methylated in somatic cells, suggesting additional functional selection. These results show that promoter sequence and gene function are major predictors of promoter methylation states. Moreover, we observe that inactive unmethylated CpG island promoters show elevated levels of dimethylation of Lys4 of histone H3, suggesting that this chromatin mark may protect DNA from methylation.
Cytosine methylation is the only covalent DNA modification described in mammals. Genetic studies have established that this epigenetic mark is required for embryonic development1, genomic imprinting2 and X-chromosome inactivation3, and alterations in DNA methylation are linked to many human diseases, including cancer4.
In mammals, methylation is restricted to CpG dinucleotides, which are largely depleted from the genome except at short genomic regions called CpG islands, which commonly represent promoters5. Cytosine methylation can interfere with transcription factor binding, yet repression seems to occur largely indirectly, via recruitment of methyl-CpG binding domain (MBD) proteins that induce chromatin changes6. Consequently, the strength of repression could depend on the local concentration of CpGs within the promoter. Indeed, it is established that methylation of CpG-rich promoters is incompatible with gene activity, yet no conclusive picture has emerged for promoters containing low amounts of CpGs7,8. Equally uncertain is the contribution of promoter DNA methylation to tissue-specific gene expression, which predicts a dynamic reprogramming during development9. Most CpG island promoters remain unmethylated even in cell types that do not express the gene10. However, changes in DNA methylation linked to tissue-specific gene expression have been seen sporadically on CpG-rich promoters11,12, although other studies failed to show such a connection based on the analysis of a small set of genes13,14. This inconclusive picture is a consequence of the limited number of genes analyzed and is further complicated by potential artifacts resulting from studying immortalized cell lines, which accumulate aberrant methylation in culture15.
Genomic depletion of CpG dinucleotides in mammals is thought to reflect inherent mutability of methylated cytosines as observed in bacteria16 and in vitro17. Moreover, deamination of an unmethylated cytosine creates a uracil that is easily recognized by the base excision repair machinery, yet deamination of a methylated cytosine creates a thymine, leading to a potential C to T transition. Notably, two enzymes (thymine DNA glycosylase (TDG) and MBD4) have been reported to selectively remove thymine from a T:G mismatch in the context of CpG dinucleotides18,19, thus questioning whether C to T transitions are mandatory. In light of these repair pathways, the evolutionary dynamics of CpGs could depend on positive or negative selection for CpGs rather than methylation in the germline. However, current estimates are mostly derived indirectly from sequence rather than actual measurement of DNA methylation20,21.
To test models on the genomic distribution of DNA methylation and its impact on gene activity and sequence evolution, we generated an epigenomic map of DNA methylation, RNA polymerase II occupancy and chromatin state for 16,000 promoters in human primary somatic and germline cells. We find that both methylation frequency and its silencing potential are related to a gene’s promoter sequence and the function of its product, and we propose that weak CpG islands are predisposed to de novo methylation during differentiation.
Received 31 October 2006; accepted 29 January 2007; published online 4 March 2007; doi:10.1038/ng1990
1Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, CH-4058 Basel, Switzerland. 2Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany. 3University of Copenhagen, Universitetsparken 15, Copenhagen Ø, Denmark, 2100. 4Department of Obstetrics and Gynaecology, Radboud University Nijmegen Medical Center, P.O. Box 9101, 6500 HB Nijmegen, The Netherlands. Correspondence should be addressed to D.S. ([email protected]).
NATURE GENETICS VOLUME 39 | NUMBER 4 | APRIL 2007 457
RESULTS
Profiling promoter DNA methylation in the human genome
To determine the methylation status for a comprehensive set of human promoters, we enriched methylated DNA from human primary fibroblasts using methylated DNA immunoprecipitation (MeDIP) methodology22 and combined it with microarray detection. The chosen array represents 24,134 putative human promoters, each covered by 15 oligonucleotides spanning from 1.3 kb upstream to 0.2 kb downstream of the transcription start site (Fig. 1a). To eliminate potentially falsely assigned promoters that might represent intergenic regions, we used experimental and computational evidence from various sources (see Methods) to generate a subset of 15,609 high-confidence promoters (Fig. 1b). These promoters largely overlapped with start sites of RefSeq genes (Fig. 1b) and were used in all further analyses. In addition, measurements were limited to oligonucleotides from 700 bp upstream to 200 bp downstream of the transcription start site, to reduce noise caused by distal oligonucleotides residing in upstream intergenic regions (Supplementary Fig. 1 online). The measurements for each promoter proved to be highly reproducible between biological replicates (R ranging from 0.91 to 0.95; see Supplementary Fig. 2 online and Methods), from which we calculated a mean value.
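The per-promoter score described above, the mean of oligonucleotide log2 ratios within the window from 700 bp upstream to 200 bp downstream of the transcription start site, can be sketched as follows. This is a minimal illustration in Python and not the authors' pipeline; the function name, the (offset, ratio) input layout and the example values are assumptions made for this sketch.

```python
from statistics import mean


def promoter_methylation(probes, upstream=700, downstream=200):
    """Average log2(MeDIP/input) over probes near the transcription start site.

    `probes` is a list of (offset_bp, log2_ratio) pairs, where offset_bp is the
    probe position relative to the start site (negative means upstream).
    Returns None when no probe falls inside the window.
    """
    in_window = [ratio for offset, ratio in probes
                 if -upstream <= offset <= downstream]
    return mean(in_window) if in_window else None


# Hypothetical probe values for a single promoter (not data from the paper).
example_probes = [(-1250, 0.9), (-900, 0.7), (-650, 0.5),
                  (-300, 0.3), (50, 0.1), (150, 0.3)]
level = promoter_methylation(example_probes)
```

Only the four probes inside the window contribute to `level`; the two distal probes are ignored, which mirrors the stated aim of reducing noise from upstream intergenic sequence.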
Single-gene controls confirmed that the array measurements accurately reflected the enrichment in the MeDIP procedure (Fig. 1c). Among genes with high promoter DNA methylation, we detected a number of imprinted genes previously shown to have allele-specific promoter methylation (Fig. 1d). In agreement with the link between promoter DNA methylation and X chromosome inactivation in females3, we also observed that promoter DNA methylation was higher on the X chromosome than on autosomes (Supplementary Fig. 3 online). This reflects CpG island promoter methylation of genes that undergo X inactivation; genes that escape X inactivation were indistinguishable from autosomal genes (Fig. 1e). Notably, non-CpG island promoters did not show differential DNA methylation in relation to their X inactivation status (Fig. 1e), suggesting that their inactivation was not reflected in changes in DNA methylation (see below).
Promoter classes in relation to CpG frequency
Approximately 70% of human genes are linked to promoter CpG islands, whereas the remaining promoters tend to be depleted in CpGs21. This is evident in our set of 15,609 promoters, which had two distinct populations with high and low CpG frequency (Fig. 2). However, both populations showed a substantial overlap corresponding to promoters with intermediate CpG frequency. We hypothesized that these might differ from low and high CpG promoters in their regulation by DNA methylation. Therefore, we defined three classes of promoters based on CpG ratio, GC content and length of CpG-rich region (see Methods for details). High-CpG promoters (HCPs) and low-CpG promoters (LCPs) form two nonoverlapping populations that represent strong CpG islands and clear non-CpG island promoters, respectively (Fig. 2). Promoters with intermediate CpG content (ICPs) contain many promoters that are close to the CpG island criteria introduced in ref. 23, and 91% of them (compared with 8% of LCPs and 100% of HCPs) fulfill the less-stringent CpG island criteria defined in ref. 24; therefore, ICPs will also be referred to as ‘weak’ CpG islands.
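A three-way classification by CpG ratio and GC content could be sketched as below. The observed/expected CpG ratio is the standard quantity (number of CpGs × sequence length) / (number of C × number of G). The window size, step and cutoffs are illustrative defaults assumed for this sketch, as are all function names; the criteria actually applied are those given in the Methods, which are not reproduced here.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (n_c * n_g)


def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("C") + seq.count("G")) / len(seq) if seq else 0.0


def classify_promoter(seq, window=500, step=5,
                      high_ratio=0.75, high_gc=0.55, low_ratio=0.48):
    """Assign HCP, ICP or LCP from sliding windows (cutoffs are illustrative)."""
    last_start = max(len(seq) - window, 0)
    windows = [seq[i:i + window] for i in range(0, last_start + 1, step)]
    ratios = [cpg_obs_exp(w) for w in windows]
    # HCP: at least one window is both CpG-rich and GC-rich.
    if any(r > high_ratio and gc_content(w) > high_gc
           for r, w in zip(ratios, windows)):
        return "HCP"
    # LCP: no window reaches even the lower CpG ratio cutoff.
    if all(r <= low_ratio for r in ratios):
        return "LCP"
    # Everything in between is treated as intermediate.
    return "ICP"
```

Using a stringent rule for HCPs and a separate, lower cutoff for LCPs leaves a residual intermediate group, which is one simple way to obtain two nonoverlapping populations plus an intermediate class.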
To estimate differences in expression patterns between the three classes, we matched the promoters with a set of 2,018 housekeeping genes defined from public expression data (see Methods). These housekeeping genes are unevenly distributed in the classes, as they are 1.2-fold overrepresented in the HCP class, 1.2-fold underrepresented in ICPs and 2.3-fold underrepresented in LCPs (χ² test: P = 4.6 × 10^-37). This agrees with previous reports showing that CpG island promoters are more frequently, but not exclusively, associated with housekeeping genes21.
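Fold over- or underrepresentation of a gene set across classes, together with a χ² statistic, can be computed along the following lines. This is a sketch under stated assumptions: the class sizes are those reported for the three promoter classes in the Figure 2 legend, the per-class housekeeping counts are hypothetical values chosen only for illustration, and a goodness-of-fit statistic is used here because the exact layout of the authors' test is not described in this section.

```python
def fold_representation(class_sizes, subset_counts):
    """Ratio of each class's share of the subset to its share of all promoters."""
    total = sum(class_sizes.values())
    subset_total = sum(subset_counts.values())
    return {name: (subset_counts[name] / subset_total) / (size / total)
            for name, size in class_sizes.items()}


def chi_square_statistic(class_sizes, subset_counts):
    """Goodness-of-fit statistic: sum of (observed - expected)^2 / expected."""
    total = sum(class_sizes.values())
    subset_total = sum(subset_counts.values())
    stat = 0.0
    for name, size in class_sizes.items():
        expected = subset_total * size / total
        stat += (subset_counts[name] - expected) ** 2 / expected
    return stat


# Class sizes as given in the Figure 2 legend.
class_sizes = {"LCP": 3627, "ICP": 2054, "HCP": 9928}
# Hypothetical housekeeping counts per class (NOT data from the paper).
housekeeping = {"LCP": 100, "ICP": 110, "HCP": 790}

fold = fold_representation(class_sizes, housekeeping)
stat = chi_square_statistic(class_sizes, housekeeping)
```

A fold value above 1 indicates overrepresentation; the reciprocal of a value below 1 gives the fold underrepresentation in the sense used in the text.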
Figure 1 Defining the promoter methylome in human primary fibroblasts. (a) Input DNA and 5-methylcytosine (5mC)-enriched MeDIP samples were cohybridized to a high-density oligonucleotide microarray representing human promoters. Promoter methylation levels are calculated as the average of oligonucleotide ratios (5mC bound over input) between -700 bp and +200 bp relative to the transcription start site (Supplementary Fig. 1). (b) To remove potentially falsely annotated promoters, we filtered them based on RefSeq, FirstEF and mRNA annotations (see Methods). The Venn diagram illustrates that the validated promoters largely overlap with promoters of RefSeq genes. (c) Validation of microarray results. Randomly selected promoters were amplified by PCR from input (IN) and MeDIP-enriched (M) fractions. In each case, the PCR reflects the enrichment measured on the microarray (given as a log2 ratio). (d) Microarray detection of DNA hypermethylation on the promoter of the imprinted MEST gene, as previously described49. The dots mark the methylation level (log2 ratio) of RefSeq gene promoters shown below the graph. (e) Promoter DNA methylation on the X chromosome. Promoter sequences were matched to published X inactivation expression data45. Box plots show promoter methylation levels for genes subjected to (+) or escaping (–) X inactivation, depending on whether promoters contain a CpG island. Only CpG island promoters of genes that undergo X inactivation show hypermethylation. Here and in all figures, the blue line marks the median, lower and upper limits of the box mark the 25th and 75th percentiles, and lower and upper horizontal lines mark the 10th and 90th percentiles. P values were calculated using a t-test.
[Figure 1 graphic. (a) Workflow: input and MeDIP (5mC) DNA hybridized to 50-mer oligonucleotide arrays covering 24,134 promoters, scored from –700 bp to +200 bp. (b) Filtering with RefSeq, FirstEF and mRNA evidence to 15,609 validated promoters; Venn diagram of validated and RefSeq promoters (values 429, 15,180 and 1,193). (c) PCR on input (IN) and MeDIP (M) fractions with array log2 ratios; genes as extracted, in order: UBE2B, HIST1H3B, PEX13, RCD1, HOXA9, ER, KCNA1, SMPDL3A, IPO13, SFRS5, NME6, SURF1, MGC23280, POU5F1, AQP2, MYF5, LDHC, OXT, GPR109A; log2 ratios as extracted, in order: –0.879, –0.610, –0.560, –0.537, –0.371, –0.353, –0.265, –0.245, –0.065, –0.015, 0.060, 0.088, 0.242, 0.378, 0.448, 0.485, 0.723, 0.906, 1.200. (d) 5mC (log2) across Chr 7: 129500–129900 kb, with CPA4, TSGA14, MEST, TSGA13, COPG2 and KLF14 marked. (e) Box plots of 5mC (log2) by CpG island and X inactivation status (P = 1.15e-19 and P = 0.388).]
Differential methylation of promoter classes in somatic cells
Figure 3a shows the DNA methylation levels in primary fibroblasts for all autosomal promoters in the three classes relative to their CpG content. In the case of HCPs, most promoters showed MeDIP enrichments close to the median, whereas a small subset of promoters showed strong enrichment (Fig. 3a). Bisulfite genomic sequencing confirmed that the least-enriched HCPs were free of methylated cytosines, whereas those with enrichments around the median contained a few methylated cytosines, which, owing to the high CpG content, translates into a low percentage of methylation (for example, for CASP2, 4 out of 61 CpGs were methylated (7% methylation); Fig. 3b and Supplementary Fig. 4). HCPs with MeDIP enrichment above 0.4 were strongly methylated (Fig. 3b and Supplementary Fig. 4), and these represent 3% (292 out of 9,527) of autosomal HCPs. Therefore, as predicted from previous work20, CpG islands remain mostly free of DNA methylation even in terminally differentiated cells, yet 3% of HCPs show high methylation.
Weak CpG islands showed a markedly higher frequency of DNA methylation (Fig. 3a): 21% (385 out of 1,841) of autosomal ICPs showed high methylation values (log2 ratio > 0.4) indicative of complete methylation, as confirmed by bisulfite genomic sequencing (Fig. 3b). LCPs showed a different pattern of DNA methylation: we observed a positive correlation between promoter enrichment and CpG content (Fig. 3a). This dependency can be reconciled if most LCPs show a high rate of CpG methylation, and consequently their enrichment becomes a function of their number of CpGs. Indeed, bisulfite genomic sequencing on randomly chosen promoters showed that most LCPs were methylated (Fig. 3b and Supplementary Fig. 5 online). Thus, low enrichment in the LCP class does not reflect an unmethylated state but rather the low abundance of substrate to be recognized by the 5-methylcytosine (5mC) antibody. Similar to HCPs, modest enrichments around the median represent few methylated CpGs, yet in LCPs this translates into a high relative methylation level owing to low CpG content (for example, 4.5 out of 5 CpGs (90%) were methylated in EHF; Fig. 3b). We conclude that LCPs are overall methylated, HCPs are almost exclusively unmethylated and ICPs show a high frequency of methylation. Consequently, LCPs and ICPs are largely overrepresented among hypermethylated promoters (Fig. 3c).
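The percentages quoted in this section follow from simple ratios, and the contrast between CASP2 and EHF shows why a similar number of methylated CpGs yields very different relative methylation levels. The short sketch below reproduces that arithmetic from the counts given in the text; the function names are assumptions made for this illustration, and the rule of returning no call for LCPs reflects the statement that low enrichment in that class does not indicate an unmethylated state.

```python
HYPERMETHYLATION_CUTOFF = 0.4  # log2(MeDIP/input) cutoff used in the text for ICPs and HCPs


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


def is_hypermethylated(log2_ratio, promoter_class):
    """Apply the cutoff to ICPs and HCPs only.

    Returns None for LCPs, because their enrichment is limited by the number
    of CpGs and a low value does not imply an unmethylated promoter.
    """
    if promoter_class == "LCP":
        return None
    return log2_ratio > HYPERMETHYLATION_CUTOFF


# Counts quoted in the text.
hcp_fraction = percent(292, 9527)   # hypermethylated autosomal HCPs
icp_fraction = percent(385, 1841)   # hypermethylated autosomal ICPs
casp2_level = percent(4, 61)        # CASP2 (HCP): 4 of 61 CpGs methylated
ehf_level = percent(4.5, 5)         # EHF (LCP): 4.5 of 5 CpGs methylated
```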
Polymerase occupancy in relation to DNA methylation
Next, we determined the activity of all promoters by measuring RNA polymerase II occupancy using chromatin immunoprecipitation
Figure 3 Frequency of DNA methylation in promoter classes. (a) The scatter plots show the DNA methylation levels for all promoters relative to their CpG content (CpG/bp) for the three promoter classes. Each spot represents one promoter. The dashed line marks the threshold (log2 ratio = 0.4) above which promoters in ICP and HCP classes are classified as hypermethylated based on bisulfite sequencing (Fig. 3b and Supplementary Fig. 4). A similar threshold does not apply to LCPs, as in this class, MeDIP enrichment can be limited by the low number of CpGs even in the methylated state (see Fig. 3b and main text). Numbered promoters refer to the bisulfite controls shown in b. (b) Bisulfite sequencing controls for a subset of promoters in each class. The red line indicates the region covered by the oligonucleotides on the microarray, and the black line the region amplified for bisulfite sequencing. CpGs are represented as open dots (if unmethylated) or filled dots (if methylated). The percentage of CpG methylation is indicated for each promoter. Additional bisulfite controls are shown in Supplementary Figures 4 and 5. (c) Pie charts showing the relative frequency of classes among total promoters and hypermethylated promoters (defined by log2 ratio > 0.4). LCPs and ICPs are largely overrepresented among hypermethylated promoters (χ² test: P = 8 × 10^-258). Note that the percentage of LCPs among hypermethylated promoters is underestimated, as many fully methylated LCPs do not contain sufficient CpGs to pass the 0.4 enrichment threshold (see text).
[Figure 2 graphic: histograms of the number of promoters (y axis) against CpG ratio (obs/exp; x axis, 0 to 1.2), with the LCP, ICP and HCP classes marked.]
Figure 2 Promoter classification based on CpG representation. The gray histogram represents the distribution of observed versus expected CpG frequencies for all 15,609 promoters analyzed, showing a bimodal distribution of CpG-rich and CpG-poor promoters. To separate two nonoverlapping populations, lower- and higher-stringency criteria were used to define the low (LCPs, red, n = 3,627) and high (HCPs, blue, n = 9,928) CpG content groups, as well as a smaller group with intermediate CpG content (ICPs, green, n = 2,054) (see Methods for details on calculations).
[Figure 3 graphic. (a) Scatter plots of 5mC (log2) against CpG/bp for LCPs, ICPs and HCPs, with promoters 1 to 9 numbered. (b) Bisulfite sequencing of (1) MT1B, (2) OTOR, (3) EHF, (4) FGF6, (5) GSTT1, (6) HIST1H4E, (7) CDX1, (8) CASP2 and (9) BCLAF1, each labeled with its percentage of CpG methylation. (c) Pie charts of LCP, ICP and HCP frequencies among total promoters and hypermethylated promoters.]
(ChIP). The enrichment profile showed a bimodal distribution, which we used to define a set of polymerase-bound and presumably active promoters (Fig. 4a). A comparison with TAF-1 and polymerase II-bound promoters identified in unrelated human fibroblasts25 showed marked similarity. Of those promoters identified in ref. 25 that are represented on our microarray, 94% were scored active in our data set (Fig. 4b). The frequency of activity varies between promoter classes, with 66% of HCPs being active compared with 41% of ICPs and 11% of LCPs. This reflects the enrichment of housekeeping genes in CpG island promoters and the higher abundance of rarely expressed tissue-specific genes in non-CpG island promoters, as demonstrated above.
Low CpG promoters showed no significant correlation between gene activity and the abundance of methylated cytosines, suggesting that active LCPs are not preferentially unmethylated. Indeed, the distribution of DNA methylation values for active and inactive LCPs was not different (Fig. 4c). Bisulfite sequencing on a number of active LCPs confirmed their methylated state (Fig. 4d and Supplementary Fig. 5). We confirmed that these methylated promoters are sites of transcriptional initiation by showing that polymerase binding is biased toward the predicted start sites (Supplementary Fig. 5). Notably, the promoter of the highly expressed FGF7 gene was hypomethylated in primary fibroblasts (Supplementary Fig. 5), opening the possibility that a subset of LCPs is unmethylated when active. We conclude that the majority of low CpG promoters are methylated in the inactive as well as in the active state, implying that low concentrations of methylated cytosines do not preclude gene activity.
In contrast to LCPs, the activity of ICPs and HCPs was negatively correlated with their DNA methylation status. The percentage of active genes decreased to low levels for promoters showing elevated DNA methylation (Fig. 4e), indicating that DNA methylation of ICPs and HCPs is largely incompatible with their activity. However, inactive ICPs and HCPs differed in their frequency of DNA methylation. Whereas the vast majority of inactive HCPs remained unmethylated, a much higher proportion of inactive ICPs was hypermethylated (Fig. 4e). Thus, HCPs remain unmethylated even when inactive, whereas inactive ICPs are frequently methylated, implying that they are less protected against de novo methylation.
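The relationship between activity and methylation described here can be summarized by binning promoters on their methylation level and computing the percentage that are polymerase-bound in each bin. The sketch below assumes the definition of an active promoter given in the Figure 4 legend (Pol II log2 ratio above 0); the function name, the bin edges and the example values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def percent_active_by_bin(promoters, edges):
    """Percentage of active promoters in each methylation bin.

    `promoters` is a list of (methylation_log2, pol2_log2) pairs and `edges`
    a sorted list of bin boundaries. Bins are half-open, [low, high), and
    values outside the outer edges are ignored. Empty bins are omitted.
    """
    counts = [[0, 0] for _ in range(len(edges) - 1)]  # [active, total] per bin
    for methylation, pol2 in promoters:
        i = bisect_right(edges, methylation) - 1
        if 0 <= i < len(counts):
            counts[i][1] += 1
            if pol2 > 0:  # active: Pol II log2 ratio above 0
                counts[i][0] += 1
    return {(edges[i], edges[i + 1]): 100.0 * active / total
            for i, (active, total) in enumerate(counts) if total}


# Hypothetical (5mC log2, Pol II log2) values, not data from the paper.
example = [(-0.5, 1.2), (-0.3, 0.8), (-0.2, -0.1), (0.5, -0.4), (0.7, -0.2)]
summary = percent_active_by_bin(example, edges=[-1.0, 0.0, 0.4, 1.0])
```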
Inactive CpG islands reside in active chromatin
To gain insight into potential mechanisms preventing DNA methylation of CpG island promoters, we tested if they are associated with an established mark of transcriptionally active chromatin: dimethylation of Lys4 of histone H3 (H3K4)26. Active promoters show overall higher levels of dimethylated H3K4 than inactive promoters (Fig. 5a), confirming previous work in higher eukaryotes25,27, but we were surprised to find that inactive promoters formed two populations with different levels of dimethylated H3K4 (Fig. 5a) that mirrored their DNA methylation status. Inactive HCPs, which remain largely hypomethylated, showed elevated H3K4 dimethylation compared with inactive LCPs and most ICPs (Fig. 5b). The rarely methylated HCPs show no enrichment of dimethylated H3K4, but they form too small a group to be visible in the density plot. Among inactive ICPs, only unmethylated promoters showed enrichment of dimethylated H3K4 similar to HCPs, whereas hypermethylated ones showed no enrichment (Fig. 5c). We conclude that CpG-rich promoters that are protected from DNA methylation are associated with elevated levels of
[Figure 4 graphic. (a) Histogram of the number of promoters against Pol II (log2), split into Pol– and Pol+. (b) Venn diagram of Pol II, WI38 fibroblasts and TAF1, IMR90 fibroblasts (values 2,843, 4,851 and 325). (c) Density of 5mC (log2) for Pol– and Pol+ LCPs (P = 0.10). (d) Bisulfite sequencing of URP2, AGER and CHRNA10 (percentage labels 83%, 80% and 90%). (e) Percentage of Pol– and Pol+ promoters against 5mC (log2), and density of 5mC (log2), for ICPs and HCPs (P < 2 × 10^-16 in both density panels).]
Figure 4 Functional consequence of DNA methylation on promoter activity depends on CpG content. (a) Density histogram representing the promoter enrichments for RNA polymerase II (Pol II). Active promoters (marked in green) are defined as having a log2 ratio > 0. (b) The Venn diagram compares active promoters identified in this study with TAF1/Pol II binding sites identified in unrelated primary fibroblasts25. Of the TAF1/Pol II sites present on our array, 94% (4,851 out of 5,176) are also scored as active. Notably, we also identify additional active promoters that presumably use initiation factors other than TAF1 (ref. 50). (c) Density plot comparing the distribution of DNA methylation values for active (green) and inactive (orange) LCPs, which show no significant differences. The P value was calculated using a Wilcoxon test. (d) Bisulfite genomic sequencing on selected active LCPs, confirming that these are hypermethylated (Supplementary Fig. 5). (e) The left panels show the percentage of active and inactive promoters relative to increasing DNA methylation for the ICP and HCP classes. The percentage of active promoters decreases with increasing methylation levels, showing that promoter activity and hypermethylation are incompatible for ICPs and HCPs. Right panels show density plots comparing the distribution of DNA methylation values for active and inactive promoters. The vertical dashed line marks the threshold for hypermethylation (log2 ratio = 0.4). These plots illustrate the high frequency of DNA methylation among inactive ICPs, whereas most inactive HCPs remain unmethylated. P values were calculated using a Wilcoxon test.
dimethylated H3K4 in the absence of transcription. This shows that a chromatin state can predict the DNA methylation state of inactive CpG-rich promoters and opens the possibility that chromatin structure is functionally involved in protecting CpG-rich promoters from DNA methylation.
Dynamic DNA methylation between soma and germline
To establish if the observed promoter methylation profiles are unique to somatic cells, we determined the promoter methylome in mature sperm, the product of the male germline. The MeDIP experiments proved to be highly reproducible when comparing sperm samples from the same (R = 0.95) or genetically unrelated donors (R = 0.91, Supplementary Fig. 2). The LCP class showed high similarity in DNA methylation patterns between fibroblasts and sperm (Fig. 6a and Supplementary Fig. 6 online): 79% (373 out of 472) of the hypermethylated promoters from fibroblasts were also highly enriched in sperm (Supplementary Fig. 6). Similar to fibroblasts, methylation enrichment of LCPs in sperm increased with CpG content (Supplementary Fig. 6), indicating that constitutive methylation in this class was present in both somatic cells and gametes. In contrast, hypermethylation of ICPs and HCPs detected in fibroblasts was mostly absent in germ cells (Fig. 6a and Supplementary Fig. 6). Among HCPs and ICPs that were hypermethylated in the somatic sample, 86% (236 out of 276) and 49% (184 out of 373), respectively, were unmethylated in sperm (Supplementary Fig. 6). Thus, most hypermethylation of CpG-rich promoters in fibroblasts seems to be somatically acquired, indicating that a defined subset of CpG islands becomes de novo methylated during development. Notably, the frequency of this acquisition is higher in ICPs, suggesting that weak CpG islands are more prone to methylation during differentiation.
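The soma versus germline comparison summarized in Figure 6a subtracts the sperm measurement from the fibroblast measurement for promoters grouped by their fibroblast methylation level. A minimal sketch of that calculation follows; the bin boundaries (<0, 0 to 0.4, >0.4) follow the figure, whereas the function name, the use of the median as the summary and the example values are assumptions made for this illustration.

```python
from statistics import median


def soma_minus_germline(pairs):
    """Median (fibroblast - sperm) 5mC log2 difference per fibroblast bin.

    `pairs` is a list of (fibroblast_log2, sperm_log2) values for promoters
    of one class. Positive results indicate higher methylation in somatic
    cells than in sperm. Empty bins are omitted.
    """
    bins = {"<0": [], "0-0.4": [], ">0.4": []}
    for fibro, sperm in pairs:
        if fibro < 0:
            key = "<0"
        elif fibro <= 0.4:
            key = "0-0.4"
        else:
            key = ">0.4"
        bins[key].append(fibro - sperm)
    return {key: median(values) for key, values in bins.items() if values}


# Hypothetical measurements for one promoter class (not data from the paper).
example_pairs = [(-0.5, -0.4), (0.2, 0.1), (0.9, -0.3), (1.1, 0.0), (0.6, 0.5)]
differences = soma_minus_germline(example_pairs)
```

In this toy input the promoters above the 0.4 cutoff in fibroblasts show a clearly positive median difference, the pattern the text describes for somatically acquired methylation of ICPs and HCPs.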
Evolutionary impact of CpG methylation
CpG depletion in the human genome is thought to reflect a higher mutation rate of methylated cytosines16 in the germline. Using the promoter methylome of the sperm sample, we tested if promoter hypermethylation in germ cells was manifested in an increased rate of CpG loss. To infer rates of ongoing CpG loss and gain in the human lineage, we used the AMBIORE package28 to perform three-way alignments of the human, chimpanzee and rhesus macaque genomes (using rhesus as an outgroup to assess the directionality of CpG mutations). This demonstrated that CpG loss was considerably higher for LCPs than for ICPs and HCPs, whereas CpG gain and non-CpG divergence were indistinguishable (Fig. 6b and data not shown). Given that LCPs were mostly methylated in sperm, this favors the model that DNA methylation induces CpG depletion in these promoters. To further relate CpG loss to DNA methylation, we divided the ICP class based on their methylation status in sperm and observed that CpG loss was higher for methylated promoters than for the unmethylated promoters (Fig. 6c). Therefore, within the same promoter class, promoter DNA methylation in the product of the male germline was associated with an increased evolutionary loss of CpGs. Notably, ICPs seem to lose CpG noticeably faster than HCPs even when unmethylated in sperm, which could reflect either temporary methylation in the germline or an inherent selection for CpG loss at ICPs.
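The logic of using an outgroup to polarize CpG changes can be illustrated with a simple parsimony count on a gap-free three-way alignment: a CpG present in both chimpanzee and rhesus but absent in human is scored as a loss on the human lineage, and a CpG present only in human as a gain. This is a deliberately simplified sketch and not the AMBIORE procedure used in the study; the function name, the input format and the toy sequences are assumptions.

```python
def cpg_changes(human, chimp, rhesus):
    """Count CpG losses and gains on the human lineage by parsimony.

    All three sequences must be aligned, gap-free and of equal length.
    A site is ancestral when chimp and rhesus (the outgroup) both carry CG.
    """
    if not (len(human) == len(chimp) == len(rhesus)):
        raise ValueError("sequences must have equal length")
    human, chimp, rhesus = (s.upper() for s in (human, chimp, rhesus))
    ancestral = lost = gained = 0
    for i in range(len(human) - 1):
        in_human, in_chimp, in_rhesus = (
            s[i:i + 2] == "CG" for s in (human, chimp, rhesus))
        if in_chimp and in_rhesus:
            ancestral += 1
            if not in_human:
                lost += 1      # CpG present in both relatives, absent in human
        elif in_human and not in_chimp and not in_rhesus:
            gained += 1        # CpG present only in human
    return {"ancestral_cpg": ancestral, "lost": lost, "gained": gained}


# Toy alignment: the first CpG has become TpG in human (a C to T transition).
result = cpg_changes("ATTGACGT", "ACGGACGT", "ACGGACGT")
loss_rate = result["lost"] / result["ancestral_cpg"]
```

Dividing losses by the number of ancestral CpG sites gives a per-site loss rate that can be compared between promoter classes, in the spirit of Figure 6b,c.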
[Figure 5 graphic. (a) Density of H3-K4me2 (log2) for Pol+ and Pol– promoters. (b) Density of H3-K4me2 (log2) for Pol– HCPs, ICPs and LCPs, with Pol+ promoters for comparison. (c) Box plot of H3-K4me2 (log2) for promoters with 5mC (log2) < 0.4 or > 0.4. P values shown: P < 2 × 10^-16 and P = 2.11 × 10^-187.]
Figure 5 Elevated levels of H3K4 dimethylation mark inactive CpG islands. (a) Density plots comparing H3K4 dimethylation profiles for active and inactive promoters of all classes. Active promoters show uniform high H3K4 dimethylation, whereas inactive promoters show both intermediate and low levels evident as two separate peaks. The P value was calculated using a Wilcoxon test. (b) Profiles of H3K4 dimethylation for inactive promoters in each promoter class. This shows that promoters with an intermediate level of H3K4 dimethylation represent mainly HCPs and a subset of ICPs. The H3K4 dimethylation profile for active promoters is shown as a dashed line for comparison. (c) The box plot represents the distribution of H3K4 dimethylation values for inactive ICPs and HCPs that are hypomethylated (5mC log2 ratio < 0.4) or hypermethylated (5mC log2 ratio > 0.4). This demonstrates that only hypomethylated promoters show elevated H3K4 dimethylation, whereas hypermethylated promoters show no enrichment of H3K4 dimethylation. The P value was calculated using a t-test.
[Figure 6 graphic. (a) Box plots of 5mC (log2), fibroblasts minus sperm, for LCPs, ICPs and HCPs binned by 5mC (log2) in fibroblasts (<0, 0–0.4, >0.4); positive values indicate DNA methylation in somatic cells only, negative values DNA methylation in sperm only. (b) Human CpG loss and human CpG gain for LCPs, ICPs and HCPs. (c) Human CpG loss for ICPs binned by 5mC (log2) in sperm (<0, 0–0.4, >0.4).]
Figure 6 Promoter DNA methylation in the germline is associated with CpG loss. (a) Comparison of DNA methylation of autosomal promoters in human primary fibroblasts and sperm. In each class, promoters were grouped in three bins based on their DNA methylation level in fibroblasts. For promoters in each bin, we subtracted the methylation measurement in sperm from that in fibroblasts. A positive value reflects higher methylation in somatic cells than in the germline. The box plots illustrate that methylation of LCPs is very similar between fibroblasts and sperm, whereas hypermethylation of ICPs and HCPs detected in fibroblasts (log2 ratio > 0.4) is largely specific to somatic cells. (b) Comparison of human, chimpanzee and rhesus sequence was used to define CpG loss and CpG gain in the human lineage (see main text and Methods). CpG loss and gain are shown for each promoter class, illustrating the higher rate of CpG loss in the constitutively methylated LCP group compared with ICPs and HCPs. (c) CpG loss for ICPs, sorted according to methylation status in sperm (hypermethylation: 5mC log2 ratio > 0.4; hypomethylation: 5mC log2 ratio < 0.4). This illustrates the link between DNA methylation in the germline and a higher rate of ongoing CpG depletion.
Promoter methylation of germline-specific genes in soma
Finally, to gain insights into the biological roles of DNA methylation in somatic cells, we asked if methylated CpG-rich promoters in primary fibroblasts regulate genes involved in specific biological processes. Of the rarely hypermethylated HCPs, 17% are linked to genes showing a testis-specific expression, according to GNF SymAtlas29 (which does not provide expression data for the human female germline), including well-studied genes expressed in both male and female germline, such as DAZL, SPO11, SOX30, BRDT, ALF, TPTE or REC8 (refs. 30,31). To confirm this observation, we analyzed Gene Ontology annotations for methylated autosomal ICPs and HCPs and observed a significant enrichment for ontology terms related to generation of gametes (Fig. 7a). The only other enriched GO category in the methylated fraction refers to perception of smell and reflects DNA methylation of a small subgroup of olfactory receptor genes that contain CpG-rich promoters (data not shown). This unique methylation of germline-specific genes is illustrated by the histone gene cluster, where the testis-specific histone variants HIST1H2BA (known as TSH2B) and HIST1H1T (known as H1t) show high promoter DNA methylation, as reported in rodents32,33 (Fig. 7b). We confirmed this observation by PCR (Fig. 7) and bisulfite sequencing (Supplementary Fig. 4). Notably, the majority of described germline-specific genes (Supplementary Table 1 online) showed hypermethylation (Fig. 7c), indicating that this process happens quantitatively in this class of genes. This methylation of germline-specific genes was absent in mature sperm (Fig. 7c,d), suggesting that it is established after fertilization during somatic development. Moreover, it was not unique to the particular cells we studied, as we observed it in genetically unrelated male fibroblasts and primary samples from kidney and colon (Fig. 7d). We conclude that somatic cells show a systematic methylation of promoters for germline-specific genes, including strong CpG islands that are otherwise protected from DNA methylation.
DISCUSSION
Previous models of the distribution and function of DNA methylation at cis-regulatory regions have been deduced from small data sets or inferred indirectly from DNA sequence. Moreover, the impact of DNA methylation on transcription was determined using approaches such as transient transfections7,8 or genomic targeting of random integration sites34, which do not necessarily recapitulate the endogenous chromosomal situation. In each case, low sampling numbers limited the potential to generalize findings, especially when exceptions occur at low frequencies. Thus, our comprehensive analysis of DNA methylation, polymerase occupancy and chromatin state of 15,609 promoters provides a useful framework to derive quantitative and predictive models of promoter DNA methylation (Fig. 8).
We find the vast majority of strong CpG island promoters (HCPs) hypomethylated on autosomes, in agreement with previous observations10,20,35 and computational predictions36. Thus, even though DNA methylation is sufficient to inactivate CpG island promoters, it is not necessary, as most inactive CpG island promoters are unmethylated. The fact that transcription seems not to be required to maintain a hypomethylated state points to alternative mechanisms that protect CpG islands against de novo methylation. Our results imply chromatin structure as a putative pathway, as hypomethylated CpG islands show elevated levels of H3K4 dimethylation even in the absence of transcription. Dimethylation of H3K4 occurs uniformly on all CpG island promoters, arguing that it is an inherent characteristic of CpG islands. Equally notably, H3K4 dimethylation is not shared by the LCP class (Fig. 5), which contain as few methylated cytosines as HCPs; therefore, H3K4 dimethylation seems to require a local concentration of unmethylated CpGs. In line with this model, recruitment of H3K4 methylases by unmethylated CpGs has recently been suggested37,38. Moreover, the euchromatic features of CpG islands do not seem to be limited to H3K4 methylation, as a broad H3 hyperacetylation in CpG
[Figure 7 graphic. (a) Relative frequency of GO terms (perception of smell, sexual reproduction, spermatogenesis, gametogenesis) among hypermethylated versus unmethylated promoters; P values shown: 1.25 × 10–5 and 8.93 × 10–4. (b) 5mC (log2) across the histone gene cluster, with TSH2B and H1t labeled. (c) Density of 5mC (log2) for total genes and germline-specific genes in fibroblasts and sperm; P values shown: P < 2 × 10–16 and P = 0.08. (d) PCR of input (IN) and MeDIP-enriched (M) fractions from WI38, HFL-1, kidney, colon and sperm for H19 ICR, UBE2B, HIST1H3B and the testis-specific promoters TSH2B, H1t, LDHC, BRDT, AURKC and SPO11, with promoter class (ICP or HCP) indicated.]
Figure 7 Methylation of promoters associated with germline-specific genes in somatic cells.
(a) Gene ontology analysis of autosomal ICPs and HCPs hypermethylated in fibroblasts. The dark
blue bars represent the frequency of GO terms among hypermethylated promoters relative to the
frequency among unmethylated promoters (which is set to 1). LCPs are excluded from this
ontology analysis because of their constitutive methylation. P values were false discovery rate
(FDR)-adjusted with the Babelomics FatiGO tool. (b) Promoter methylation in the histone gene
cluster on 6p21.3, showing that only the testis-specific variants of H2B (TSH2B) and H1 (H1t)
are methylated in primary fibroblasts. Vertical bars represent the promoter methylation of
individual histone genes ranked by chromosomal position. (c) Comparison of promoter methylation
profiles of germline-specific genes versus total genes in WI38 primary fibroblasts and sperm cells.
The density plots show that most germline-specific genes are hypermethylated in somatic cells
and unmethylated in sperm. Only ICPs and HCPs are considered. The complete gene list is given
in Supplementary Table 1. P values were calculated using a Wilcoxon test. (d) Methylation of germline-specific genes in other somatic tissues. Candidate promoters were PCR amplified from input (IN) and MeDIP-enriched (M) fractions from WI38 and
HFL-1 primary fibroblasts, primary kidney and colon samples and sperm cells. Germline-specific promoters are methylated in all somatic tissue samples
tested. The promoter class of the tested genes is indicated on the right. The imprinted H19 ICR serves as positive control for methylation, and the
housekeeping genes UBE2B and HIST1H3B as unmethylated negative controls.
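The comparison in panel c rests on a Wilcoxon test between two sets of promoter methylation values. The sketch below shows one common form of such a test, the two-sided rank-sum test with a normal approximation and tie correction, written with the Python standard library only. It is an illustration under stated assumptions: the exact variant and software used for the figure are not specified here, no continuity correction is applied, and the 5mC (log2) values in the example are invented.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate two-sided P value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the values and remember which sample each came from (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        t = j - i                     # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Invented 5mC (log2) values, for illustration only.
germline_specific = [0.9, 1.1, 0.8, 1.3, 1.0, 0.7, 1.2, 0.95]
total_genes = [-0.5, -0.2, 0.1, -0.8, 0.0, -0.3, 0.2, -0.6, -0.1, 0.3]
u_stat, p_value = rank_sum_test(germline_specific, total_genes)
print(f"U = {u_stat}, P = {p_value:.2g}")
```

For small samples an exact test would be preferable to the normal approximation used in this sketch.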
462 VOLUME 39 | NUMBER 4 | APRIL 2007 NATURE GENETICS
islands has been reported39. These observations make it conceivable that an active chromatin state is involved in precluding DNA methyltransferase (DNMT) recruitment to CpG islands. However, it also raises the question of how spurious activation of such accessible promoters is prevented.
In contrast to CpG islands, promoters with low CpG content (LCPs) are predominantly methylated, in agreement with recent bisulfite sequencing results on human chromosomes35. We now show that this hypermethylation does not preclude gene expression. The lack of repression by low abundance of 5mC is also illustrated in the HCP class, where many active promoters contain a low percentage of methylation (4%–7%; see Figs. 3 and 4). This indicates that repression by DNA methylation requires high 5mC density. In light of the prevailing model of an indirect repression pathway by MBD proteins, this suggests that MBD binding is not sufficient at low DNA methylation density for active repression. However, this does not exclude a role for low-density methylation in reducing transcriptional noise that could be generated by spurious initiation40. If it indeed occurs, such regulation might be more prominent at tissue-specific genes, which are enriched among LCPs. Of note, we also observe a low number of LCPs that are unmethylated and active, opening the possibility that at some LCPs, demethylation occurs upon gene activation.
The dynamics and role of DNA methylation in somatic cell differentiation are controversial13. Our data argue that dynamic DNA methylation cannot be a default repression mechanism for tissue-specific gene expression, as most inactive CpG island promoters remain unmethylated in primary cells. However, we identify several hundred CpG island promoters (4% of the total number in the studied fibroblasts) that are methylated in somatic cells but not in the germline, demonstrating that somatic methylation of CpG islands does occur at a significant rate in primary cells. Genomic imprinting is unlikely to account for most of this methylation, as alleles were found equally methylated in all six cases tested by bisulfite sequencing. Notably, this soma-specific methylation occurs more frequently at ICPs, indicating that weak CpG islands are preferential targets for de novo methylation in development (Fig. 8) and that the promoter sequence is a determinant of dynamic methylation. Preliminary data in mouse suggest that preferential targeting of weak CpG islands is a general phenomenon in mammals (F. Mohn, M. Bibel and D.S., unpublished data). One possibility is that protection from de novo methylation is a direct function of the local CpG density, making it more likely for weak CpG islands to become de novo methylated.
Targets for CpG island de novo methylation in somatic cells are also partly specified by the function of the linked gene, as germline-specific genes are preferentially methylated. This observation is in agreement with recent data on five genes in mouse41,42, but we now show that it is a quantitative process, because almost all CpG island promoters of germline-specific genes are DNA methylated in somatic cells. Although it remains to be determined how DNA methylation is preferentially targeted to promoters of germline-specific genes and how this process is temporally coordinated, we speculate that DNA methylation functions to preclude deleterious activation of meiotic genes in somatic cells. This finding predicts that the frequently observed ectopic expression of testis-specific genes in tumors entails promoter demethylation43. Notably, the preferential methylation of germline promoters and the increased frequency of ICP methylation are probably independent processes, as most methylated germline-specific genes fall in the HCP class (Supplementary Table 1). Furthermore, germline-specific genes account only for a subgroup of somatically methylated CpG islands. The remaining targets do not represent defined ontology groups, yet we observe methylation of several tissue-specific transcription factors (for example, CDX1, TFDP1, FHL2, NRF3, MYF5 and RUNX3), opening the possibility that de novo methylation could be used in part to prevent alternative differentiation pathways by selectively repressing lineage-specific genes.
The promoter methylome of male gametes also sheds light on the evolutionary consequences of DNA methylation. When comparing the human and chimpanzee genomes, we observe that promoters methylated in the product of the male germline show a higher rate of evolutionary CpG loss. Although the methylation state of other stages of the male and female germline remains to be tested, this finding provides evidence that the ongoing CpG depletion in the hominid lineage is DNA methylation dependent. However, a subset of ICPs (10% of total) show high methylation in sperm, but they are CpG rich. These might reflect evolutionarily recent methylation events, and consequently these promoters might have different epigenetic states between human and chimpanzee. Further work is necessary to address this possibility. At the same time, most ICPs are unmethylated in the germline, thus raising the question of why these promoters have a lower CpG content than expected. It is possible that this reflects a specific selection for intermediate CpG content promoters in mammalian genomes.
Our results demonstrate that DNA methylation is primarily a function of promoter CpG content, which results in a constitutive hypo- or hypermethylated state. On top of this stable framework, we identify a dynamic component that mediates soma-specific de novo methylation preferential to weak CpG islands. Although the exact mechanisms of targeting dynamic methylation are still elusive, our results suggest that in primary cells, both frequency of reprogramming and its impact on transcription are influenced by the composition of individual cis-regulatory regions.
METHODS
Array design and analysis. Samples were hybridized to a microarray representing promoter regions of 24,134 human genes (NimbleGen Systems,
[Figure 8 graphic. Schematic with promoter classes (HCP, ICP, LCP) ordered by CpG content and bars for CpG loss in the human lineage, somatic methylation, repression of methylated state, and probability of de novo methylation.]
Figure 8 Regulation of promoter DNA methylation in the human genome.
Schematic representation of promoter CpG content (which translates into
the different classes) relative to frequency of hypermethylation, impact of
methylation on sequence evolution, ability of methylated state to repress
transcription and likelihood of de novo methylation in somatic cells. This
synopsis illustrates that weak CpG islands (ICPs) are prone to regulation
by DNA methylation, as they show frequent DNA methylation in somatic
cells, and this methylated state precludes their activation. The width of
each bar represents frequency of the event or strength in case of transcriptional repression.
gain transitions, (vi) CpG-gain transversions and (vii) CCG-G = CGG-C.
According to the recommendations for rather short sequences, we sampled
1,000 estimates after the burn-in phase. The median and the 95% confidence
intervals of the 1,000 samples were determined. The 10% of the samples with
the most extreme confidence intervals were removed from their respective
mutation categories.
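The summary step described above (1,000 estimates kept after burn-in, then a median and a 95% interval) can be sketched in Python as follows. This is a minimal illustration, not the original analysis code: the function name, the burn-in length and the toy chain are hypothetical, and the interval is taken here as the 2.5th to 97.5th percentile of the retained samples with linear interpolation, which is an assumption about how the interval was computed.

```python
import math
import statistics


def summarize_chain(chain, burn_in, keep=1000):
    """Median and central 95% interval of the samples kept after burn-in."""
    kept = sorted(chain[burn_in:burn_in + keep])
    if len(kept) < 2:
        raise ValueError("not enough samples after the burn-in phase")

    def percentile(q):
        # Linear interpolation between the two nearest order statistics.
        pos = q * (len(kept) - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(kept) - 1)
        return kept[lo] + (pos - lo) * (kept[hi] - kept[lo])

    return statistics.median(kept), percentile(0.025), percentile(0.975)


# Toy chain (hypothetical): 200 burn-in draws followed by 1,000 retained draws.
toy_chain = [float(i) for i in range(1200)]
median, lower, upper = summarize_chain(toy_chain, burn_in=200)
print(median, lower, upper)
```

In practice the chain would hold the sampled rate estimates for one mutation category, and the width of the resulting interval could then be used to flag the least stable estimates.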
Accession codes. Microarray data are accessible from the Gene Expression
Omnibus (GSE6715).
URLs. Processed data can be downloaded from our project website (http://www.fmi.ch/members/dirk.schubeler/supplemental.htm). The R package can be found at http://www.r-project.org. GNF SymAtlas can be found at http://symatlas.gnf.org. The Babelomics FatiGO tool can be found at http://fatigo.bioinfo.cipf.es.
Note: Supplementary information is available on the Nature Genetics website.
ACKNOWLEDGMENTS
We thank members of the Schubeler laboratory for advice during the course of the project and comments on the manuscript; E. Oakeley for generating scripts for data reformatting, A. Peters, M. Lorincz, C. Alvarez, P. de Boer, E. Selker and M. Groudine for critical reading of the manuscript. Primary samples from kidney and colon were obtained from M. Haase (Dresden University of Technology). Work in the laboratory of D.S. is supported by the Novartis Research Foundation, the EU 6th framework program NOE ‘The Epigenome’ (LSHG-CT-2004-503433) and a European Molecular Biology Organization (EMBO) Young Investigator Award. I.H. is supported by an EMBO long-term fellowship (ALTF 1160-2005).
AUTHOR CONTRIBUTIONS
M.W. designed and performed experiments and analysis and wrote the manuscript. D.S. designed the study and wrote the manuscript. M.B.S. performed housekeeping annotations and wrote custom software. M.R. performed CpG classifications and promoter confidence analysis, retrieved genomic information and contributed to the writing of the manuscript. I.H. and S.P. performed divergence analysis and contributed to the writing of the manuscript. L.R. provided purified human samples.
COMPETING INTERESTS STATEMENT
The authors declare no competing financial interests.
Published online at http://www.nature.com/naturegenetics
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions
1. Li, E., Bestor, T.H. & Jaenisch, R. Targeted mutation of the DNA methyltransferase gene results in embryonic lethality. Cell 69, 915–926 (1992).
2. Li, E., Beard, C. & Jaenisch, R. Role for DNA methylation in genomic imprinting. Nature 366, 362–365 (1993).
3. Heard, E., Clerc, P. & Avner, P. X-chromosome inactivation in mammals. Annu. Rev. Genet. 31, 571–610 (1997).
4. Egger, G., Liang, G., Aparicio, A. & Jones, P.A. Epigenetics in human disease and prospects for epigenetic therapy. Nature 429, 457–463 (2004).
5. Ioshikhes, I.P. & Zhang, M.Q. Large-scale human promoter mapping using CpG islands. Nat. Genet. 26, 61–63 (2000).
6. Klose, R.J. & Bird, A.P. Genomic DNA methylation: the mark and its mediators. Trends Biochem. Sci. 31, 89–97 (2006).
7. Boyes, J. & Bird, A. Repression of genes by DNA methylation depends on CpG density and promoter strength: evidence for involvement of a methyl-CpG binding protein. EMBO J. 11, 327–333 (1992).
8. Hsieh, C.L. Dependence of transcriptional repression on CpG methylation density. Mol. Cell. Biol. 14, 5487–5494 (1994).
9. Brandeis, M., Ariel, M. & Cedar, H. Dynamics of DNA methylation during development. Bioessays 15, 709–713 (1993).
10. Bird, A. DNA methylation patterns and epigenetic memory. Genes Dev. 16, 6–21 (2002).
11. Futscher, B.W. et al. Role for DNA methylation in the control of cell type specific maspin expression. Nat. Genet. 31, 175–179 (2002).
12. Song, F. et al. Association of tissue-specific differentially methylated regions (TDMs) with differential gene expression. Proc. Natl. Acad. Sci. USA 102, 3336–3341 (2005).
14. Warnecke, P.M. & Clark, S.J. DNA methylation profile of the mouse skeletal alpha-actin promoter during development and differentiation. Mol. Cell. Biol. 19, 164–172 (1999).
15. Smiraglia, D.J. et al. Excessive CpG island hypermethylation in cancer cell lines versus primary human malignancies. Hum. Mol. Genet. 10, 1413–1419 (2001).
16. Coulondre, C., Miller, J.H., Farabaugh, P.J. & Gilbert, W. Molecular basis of base substitution hotspots in Escherichia coli. Nature 274, 775–780 (1978).
17. Shen, J.C., Rideout, W.M., III. & Jones, P.A. The rate of hydrolytic deamination of 5-methylcytosine in double-stranded DNA. Nucleic Acids Res. 22, 972–976 (1994).
18. Hendrich, B., Hardeland, U., Ng, H.H., Jiricny, J. & Bird, A. The thymine glycosylase MBD4 can bind to the product of deamination at methylated CpG sites. Nature 401, 301–304 (1999).
19. Neddermann, P. & Jiricny, J. The purification of a mismatch-specific thymine-DNA glycosylase from HeLa cells. J. Biol. Chem. 268, 21218–21224 (1993).
20. Rollins, R.A. et al. Large-scale structure of genomic methylation patterns. Genome Res. 16, 157–163 (2006).
21. Saxonov, S., Berg, P. & Brutlag, D.L. A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. Proc. Natl. Acad. Sci. USA 103, 1412–1417 (2006).
22. Weber, M. et al. Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat. Genet. 37, 853–862 (2005).
23. Takai, D. & Jones, P.A. Comprehensive analysis of CpG islands in human chromosomes 21 and 22. Proc. Natl. Acad. Sci. USA 99, 3740–3745 (2002).
24. Gardiner-Garden, M. & Frommer, M. CpG islands in vertebrate genomes. J. Mol. Biol. 196, 261–282 (1987).
25. Kim, T.H. et al. A high-resolution map of active promoters in the human genome. Nature 436, 876–880 (2005).
26. Peters, A.H. & Schubeler, D. Methylation of histones: playing memory with DNA. Curr. Opin. Cell Biol. 17, 230–238 (2005).
27. Schubeler, D. et al. The histone modification pattern of active genes revealed through genome-wide chromatin analysis of a higher eukaryote. Genes Dev. 18, 1263–1271 (2004).
28. Hwang, D.G. & Green, P. Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc. Natl. Acad. Sci. USA 101, 13994–14001 (2004).
29. Su, A.I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. USA 101, 6062–6067 (2004).
30. Assou, S. et al. The human cumulus–oocyte complex gene-expression profile. Hum. Reprod. 21, 1705–1719 (2006).
31. Koslowski, M. et al. Frequent nonrandom activation of germ-line genes in human cancer. Cancer Res. 64, 5988–5993 (2004).
32. Choi, Y.C. & Chae, C.B. DNA hypomethylation and germ cell-specific expression of testis-specific H2B histone gene. J. Biol. Chem. 266, 20504–20511 (1991).
33. Singal, R. et al. Testis-specific histone H1t gene is hypermethylated in nongerminal cells in the mouse. Biol. Reprod. 63, 1237–1244 (2000).
34. Schubeler, D. et al. Genomic targeting of methylated DNA: influence of methylation on transcription, replication, chromatin structure, and histone acetylation. Mol. Cell. Biol. 20, 9103–9112 (2000).
35. Eckhardt, F. et al. DNA methylation profiling of human chromosomes 6, 20 and 22. Nat. Genet. 38, 1378–1385 (2006).
36. Bock, C. et al. CpG island methylation in human lymphocytes is highly correlated with DNA sequence, repeats, and predicted DNA structure. PLoS Genet. 2, e26 (2006).
37. Ayton, P.M., Chen, E.H. & Cleary, M.L. Binding to nonmethylated CpG DNA is essential for target recognition, transactivation, and myeloid transformation by an MLL oncoprotein. Mol. Cell. Biol. 24, 10470–10478 (2004).
38. Lee, J.H. & Skalnik, D.G. CpG-binding protein (CXXC finger protein 1) is a component of the mammalian Set1 histone H3-Lys4 methyltransferase complex, the analogue of the yeast Set1/COMPASS complex. J. Biol. Chem. 280, 41725–41731 (2005).
39. Roh, T.Y., Cuddapah, S. & Zhao, K. Active chromatin domains are defined by acetylation islands revealed by genome-wide mapping. Genes Dev. 19, 542–552 (2005).
41. Maatouk, D.M. et al. DNA methylation is a primary mechanism for silencing postmigratory primordial germ cell genes in both germ cell and somatic cell lineages. Development 133, 3411–3418 (2006).
42. Pohlers, M. et al. A role for E2F6 in the restriction of male-germ-cell-specific gene expression. Curr. Biol. 15, 1051–1057 (2005).
43. De Smet, C., Loriot, A. & Boon, T. Promoter-dependent mechanism leading to selective hypomethylation within the 5′ region of gene MAGE-A1 in tumor cells. Mol. Cell. Biol. 24, 4781–4790 (2004).
44. Davuluri, R.V., Grosse, I. & Zhang, M.Q. Computational identification of promoters and first exons in the human genome. Nat. Genet. 29, 412–417 (2001).
45. Carrel, L. & Willard, H.F. X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature 434, 400–404 (2005).
46. Eisenberg, E. & Levanon, E.Y. Human housekeeping genes are compact. Trends Genet. 19, 362–365 (2003).
47. Simpson, A.J., Caballero, O.L., Jungbluth, A., Chen, Y.T. & Old, L.J. Cancer/testis antigens, gametogenesis and cancer. Nat. Rev. Cancer 5, 615–625 (2005).
48. Li, Z. et al. A global transcriptional regulatory role for c-Myc in Burkitt’s lymphoma cells. Proc. Natl. Acad. Sci. USA 100, 8164–8169 (2003).
49. Riesewijk, A.M. et al. Monoallelic expression of human PEG1/MEST is paralleled by parent-specific methylation in fetuses. Genomics 42, 236–244 (1997).
50. Muller, F. & Tora, L. The multicoloured world of promoter recognition complexes. EMBO J. 23, 2–8 (2004).