SUPPLEMENTAL DATA
Supplemental Text
Growth Conditions for Human Embryonic Stem Cells
Quality Control for Human Embryonic Stem Cells
Antibodies
Chromatin Immunoprecipitation
Array Design
Data Normalization and Analysis
Identification of Bound Regions
Controlling for the Effect of Murine Embryonic Fibroblast Feeder Cells
Comparing Bound Regions to Known and Predicted Genes
Comparing Binding and Human Expression Data
Estimating Error Rates
Binding of Suz12, Eed and H3K27me3
GO Classification for RNA Polymerase II and Suz12-Bound Genes
Comparing Suz12-Bound Regions to Domains of Conservation
Generating Suz12-deficient Mouse Cells and Analysis of their Expression Pattern
Sample Preparation and Analysis of Differentiated Muscle
Comparing Suz12 Binding with Oct4, Sox2 and Nanog Binding
Index of Supplemental Tables
Table S1. Regions bound by RNA polymerase II and their relationship to known and predicted genes.
Table S2. HUGO/EntrezGene identifiers for RNA Pol II bound, annotated genes.
Table S3. RNA polymerase II-bound regions that predict novel gene candidates.
Table S4. Gene models bound by RNA polymerase II.
Table S5. MicroRNA genes bound by RNA polymerase II and Suz12 in ES cells.
Table S6. Expression of genes bound by RNA polymerase II in ES cells.
Table S7. Regions bound by Suz12 and their relationship to known and predicted genes.
Table S8. HUGO/EntrezGene identifiers for Suz12-bound, annotated genes.
Table S9. Detection of Suz12, Eed and H3K27me3 occupancy using promoter arrays.
Table S10. Enriched gene ontologies among RNA Pol II-bound and Suz12-bound genes.
Table S11. Developmental transcription factors bound by Suz12.
Table S12. Developmental signaling proteins bound by Suz12.
Table S13. Expression of Suz12-bound genes during ES cell differentiation.
Table S14. Genes bound by Suz12 in ES cells and upregulated in Suz12 -/- mouse cells.
Table S15. Developmental regulators associated with PRC2 in ES cells and muscle.
Index of Supplemental Figures
Figure S1. Human H9 ES cells cultured on a low density of irradiated MEFs.
Figure S2. Analysis of human ES cells for markers of pluripotency.
Figure S3. Analysis of human ES cells for differentiation potential.
Figure S4. The fraction of annotated promoters bound by RNA polymerase II or Suz12.
Figure S5. Estimating error rates.
Figure S6. Co-occupation of gene promoters by Suz12, Eed and H3K27me3.
Figure S7. Protein domain classification of Suz12- and Pol II-bound transcription factors.
Figure S8. Suz12 occupies large regions of DNA.
Figure S9. H3K27me3 co-occupies large domains with Suz12.
Figure S10. Generation of Suz12 -/- cells.
Figure S11. Binding of Suz12 in differentiated muscle.
Figure S12. Detection of genes bound by RNA polymerase II and Suz12 in human ES cell expression datasets.
Figure S13. Relationship between size of Suz12 binding domain and RNA polymerase II co-occupancy and gene expression.
Figure S14. Association of Oct4, Sox2 or Nanog with Suz12-bound promoters.
Figure S15. Motifs associated with DNA regions that are bound by Oct4, Sox2, Nanog and Suz12 or bound by Oct4, Sox2 and Nanog.
Supplemental References
Supplementary Text
Growth Conditions for Human Embryonic Stem Cells
Human embryonic stem (ES) cells were obtained from WiCell
(Madison, WI; NIH Code WA09) and grown as described (Cowan et al.,
2004). Briefly, passage 34 cells were grown in KO-DMEM medium
supplemented with serum replacement, basic fibroblast growth factor
(FGF), recombinant human leukemia inhibitory factor (LIF) and a
human plasma protein fraction. Detailed protocol information on
human ES cell growth conditions and culture reagents is available
at http://www.mcb.harvard.edu/melton/hues.
In order to minimize any MEF contribution to our analysis, H9
cells were cultured on a low density of irradiated murine embryonic
fibroblasts (ICR MEFs), resulting in a ratio of approximately 8:1
or greater of H9 cells to MEFs (Figure S1). Culture of H9 cells on
low-density MEFs had no adverse effects on cell morphology, growth
rate, or undifferentiated status as determined by
immunohistochemistry for pluripotency markers (e.g. Oct4, SSEA-3,
Tra-1-60; see below). In addition, H9 cells grown on a minimal
feeder layer maintained the ability to generate derivatives of
ectoderm, mesoderm, and endoderm upon differentiation (see below).
Quality Control for Human Embryonic Stem Cells
Immunohistochemical analysis of pluripotency markers
For analysis of pluripotency markers, cells were fixed in 4%
paraformaldehyde for 30 minutes at room temperature and incubated
overnight at 4°C in blocking solution (5 ml Normal Donkey
Solution:195 ml PBS + 0.1% Triton-X) (Figure S2). After a brief wash
in PBS, cells were incubated with primary antibodies to Oct-3/4
(Santa Cruz sc-9081), SSEA-3 (MC-631) (Solter and Knowles, 1979),
SSEA-4 (MC-813-70) (Solter and Knowles, 1979), Tra-1-60 (MAB4360;
Chemicon International), and Tra-1-81 (MAB4381; Chemicon
International) in blocking solution overnight at 4°C. Following
incubation with primary antibody, cells were incubated with either
rhodamine red or FITC-conjugated secondary antibody (Jackson Labs)
for 2-5 hrs at 4°C. Nuclei were stained with
4’,6-diamidino-2-phenylindole dihydrochloride (DAPI).
Epifluorescent images were obtained using a fluorescent microscope
(Nikon TE300). Data are shown for Oct4 and SSEA-3. Our analysis
indicated that >90% of the H9 cells were strongly positive for all
pluripotency markers.
Alkaline phosphatase activity of human ES cells was analyzed
using the Vector Red Alkaline Phosphatase Substrate Kit (Cat. No.
SK-5100; Vector Laboratories) according to the manufacturer’s
specifications, and the reaction product was visualized using
fluorescent microscopy.
Teratoma formation
Teratomas were induced by injecting 2-5 x 10^6 cells into the
subcutaneous tissue above the rear haunch of 6-week-old Nude Swiss
(athymic, immunocompromised) mice. Eight to twelve weeks
post-injection, teratomas were harvested and fixed overnight in 4%
paraformaldehyde at 4°C. Samples were then immersed in 30%
sucrose overnight before embedding the tissue in O.C.T. freezing
compound (Tissue-Tek). Cryosections were obtained and 10 µm sections
were incubated with the appropriate antibodies as above and analyzed
for the presence of the following differentiation markers by
confocal microscopy (LSM 210): neuronal class III β-tubulin, Tuj1
(ectoderm; MMS-435P Covance); striated muscle-specific myosin, MF20
(mesoderm; kind gift from D. Fischman); and alpha-fetoprotein
(endoderm; DAKO) (Figure S3). Nuclei were stained blue with
4’,6-diamidino-2-phenylindole dihydrochloride (DAPI). Antibody
reactivity was detected for markers of all three germ layers,
confirming that the human embryonic stem cells used in our analysis
had maintained differentiation potential.
Embryoid bodies (EB)
ES cells were harvested by enzymatic digestion and EBs were allowed
to form by plating ~1 x 10^6 cells/well in suspension on 6-well
non-adherent, low cluster dishes for 30 days. EBs were grown in the
absence of leukemia inhibitory factor (LIF) and basic fibroblast
growth factor (FGF) in culture medium containing 2x serum
replacement. EBs were then harvested, fixed for 30 minutes in 4%
paraformaldehyde at room temperature, and placed in 30% sucrose
overnight prior to embedding the tissue in O.C.T. freezing compound
(Tissue-Tek). Cryosections were obtained as described for teratoma
formation. Confocal images were obtained for all three germ layer
markers, again confirming that the H9 cells used in our analysis
have maintained differentiation potential (data not shown; results
similar to those shown in Figure S3).
Antibodies
RNA polymerase II-bound genomic DNA was isolated from whole cell
lysate using 8WG16, a mouse monoclonal antibody (Thompson et al.,
1989). This antibody preferentially binds a form of RNA polymerase
II that lacks phosphorylation at the C-terminal domain of the
largest subunit of polymerase (Patturajan et al., 1999; Cho et
al., 2001; Jones et al., 2004), although this preference can be
subject to experimental conditions.
Suz12-bound genomic DNA was isolated from whole cell lysate with
a Suz12 rabbit polyclonal antibody purchased from Upstate
(07-379).
Eed-bound genomic DNA was isolated from whole cell lysate using
an Eed mouse monoclonal antibody previously described (Hamer et al.,
2002).
H3K27me3-bound genomic DNA was isolated from whole cell lysate
using a rabbit polyclonal antibody purchased from Abcam (AB6002).
Chromatin immunoprecipitations against H3K27me3 were compared to
reference DNA obtained by chromatin immunoprecipitation of total
histone H3 (Abcam AB1791; epitope derived from C-terminal 100 amino
acids of histone H3) to normalize for nucleosome density.
Chromatin Immunoprecipitation
Protocols describing all materials and methods can be downloaded
from http://web.wi.mit.edu/young/hES_PRC.
We performed independent immunoprecipitations for each
whole-genome analysis. Human WA09 embryonic stem cells were grown to
a final count of 5 x 10^7 to 1 x 10^8 cells for each location
analysis reaction. Cells were chemically crosslinked by the addition
of one-tenth volume of fresh 11% formaldehyde solution for 15
minutes at room temperature. Cells were rinsed twice with 1x PBS,
harvested using a silicon scraper and flash frozen in liquid
nitrogen. Cells were stored at –80°C prior to use.
Cells were resuspended, lysed in lysis buffers and sonicated to
solubilize and shear crosslinked DNA. Sonication conditions vary
depending on cells, culture conditions, crosslinking and equipment.
We used a Misonix Sonicator 3000 and sonicated at power 7 for 10 x
30 second pulses (90 second pause between pulses). Samples were
kept on ice at all times.
The resulting whole cell extract was incubated overnight at 4°C
with 100 µl of Dynal Protein G magnetic beads that had been
preincubated with approximately 10 µg of the appropriate antibody.
For cases where suppliers did not provide information regarding
antibody concentration, 20 µl of the supplied solution was used per
reaction.
Beads were washed 5 times with RIPA buffer and 1 time with TE
containing 50 mM NaCl. Bound complexes were eluted from the beads by
heating at 65°C with occasional vortexing, and crosslinking was
reversed by overnight incubation at 65°C. Whole cell extract DNA
(reserved from the sonication step) was also treated for crosslink
reversal.
Immunoprecipitated DNA and whole cell extract DNA were then
purified by treatment with RNAse A, proteinase K and multiple
phenol:chloroform:isoamyl alcohol extractions. Purified DNA was
blunted, ligated to linker and amplified using a two-stage PCR
protocol. Amplified DNA was labeled and purified using Bioprime
random primer labeling kits (Invitrogen); immunoenriched DNA was
labeled with Cy5 fluorophore and whole cell extract DNA was labeled
with Cy3 fluorophore.
Labeled DNA was mixed (5-6 µg each of immunoenriched and whole
cell extract DNA) and hybridized to arrays in Agilent hybridization
chambers for up to 40 hours at 40°C. Arrays were then washed and
scanned. Whole genome arrays were hybridized in batches of 35 to 60
arrays.
Slides were scanned using an Agilent DNA microarray scanner BA.
PMT settings were set manually to normalize bulk signal in the Cy3
and Cy5 channels. For efficient batch processing of scans, we used
Genepix (version 6.0) software. Scans were automatically aligned and
then manually examined for abnormal features. Intensity data were
then extracted in batch.
Array Design
Whole genome arrays
We designed a set of 115 60-mer oligonucleotide arrays to cover
the non-repeat masked region of the sequenced human genome. Arrays
were produced by Agilent Technologies (www.agilent.com).
Selection of regions and design of subsequences
We tiled the genome with variable density: transcription units
(defined below) were tiled with higher density and non-transcription
regions were tiled with a slightly lower density.
To define transcription units, we first selected transcripts
from five different databases: RefSeq, Ensembl, MGC, VEGA
(www.vega.sanger.ac.uk) and Broad (www.broad.mit.edu). The first
three are commonly used databases for gene annotation; the last two
are manually annotated databases covering subsets of the human
genome from the Sanger Institute and Broad Institute, respectively.
We also added all microRNAs from the Rfam database (Griffiths-Jones
et al., 2003) and a small set of collected non-coding RNAs (manual
selection).
The entire collection of transcripts was sorted by chromosomal
order. We then extended each transcript 10 kb upstream to capture
proximal promoter regions. Each of these extended transcripts was
considered a “transcription unit”. In cases where two or more
transcription units overlapped, we merged the transcription
units into a single, larger unit. We extracted DNA sequence for all
transcription units. Separately, we extracted intervening genomic
DNA (“intergenic units”) between transcription units. All sequences
and coordinates are from the May 2004 build of the human
genome (NCBI build 35), using the repeatmasked (-s) option.
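
The extension and merging of transcripts described above can be sketched as follows. This is a minimal illustration, not the code used for the array design: the function name, the tuple layout and the strand-aware handling of "upstream" are assumptions made for the example.

```python
def build_transcription_units(transcripts, upstream=10_000):
    """Extend each transcript 10 kb upstream and merge overlapping units.

    transcripts: iterable of (chrom, start, end, strand) tuples
    (hypothetical layout). Returns merged (chrom, start, end) units.
    """
    extended = []
    for chrom, start, end, strand in transcripts:
        if strand == "+":
            # Upstream of a plus-strand transcript lies at lower coordinates.
            start = max(0, start - upstream)
        else:
            # Upstream of a minus-strand transcript lies at higher coordinates.
            end = end + upstream
        extended.append((chrom, start, end))

    extended.sort()  # chromosomal order
    units = []
    for chrom, start, end in extended:
        if units and units[-1][0] == chrom and start <= units[-1][2]:
            # Overlaps the previous unit: grow it into a single, larger unit.
            units[-1][2] = max(units[-1][2], end)
        else:
            units.append([chrom, start, end])
    return [tuple(u) for u in units]
```

Intergenic units would then be the intervals left between consecutive units on each chromosome.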
We then separated sequences into subsequences in order to
efficiently process sequences for oligo selection. We first removed
all unmasked regions 100 bp or smaller. The small size of these
regions makes it more difficult to identify high quality oligos for
use on the array. These small regions represented a small fraction
of the genome and were often covered by neighboring probes designed
against larger subsequences. For unmasked regions that were 101 to
300 bp long, we treated each strand (Watson and Crick) as a separate
subsequence. This ensured that we would have two oligos to
represent these subsequences if the region could not be covered by
neighboring 60-mers. For regions that were 301 to 640 bp long, we
divided the region into two evenly sized subsequences. Unmasked
regions greater than 640 bp were divided into evenly sized
subsequences such that no individual subsequence was greater than
320 bp.
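
The size rules above can be written as a short function. This is a sketch under stated assumptions: regions are half-open coordinate pairs, the strand label "+" for the larger regions is a placeholder, and the function name is hypothetical.

```python
def split_region(start, end):
    """Split one unmasked region [start, end) into subsequences.

    Returns a list of (start, end, strand) tuples following the size
    rules in the text; coordinates and labels are illustrative only.
    """
    length = end - start
    if length <= 100:
        return []  # too small to yield reliable oligos
    if length <= 300:
        # Each strand is treated as its own subsequence.
        return [(start, end, "+"), (start, end, "-")]
    # 301-640 bp gives two pieces; longer regions get enough evenly
    # sized pieces that none exceeds 320 bp.
    n = max(2, -(-length // 320))  # ceiling division
    bounds = [start + (i * length) // n for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1], "+") for i in range(n)]
```

For example, a 1,000 bp region is cut into four 250 bp subsequences.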
We used the program ArrayOligoSelector (AOS) (Bozdech et al.,
2003) to score 60-mers for use on the array, but modified the oligo
selection process. We had two primary reasons for this. First, AOS
uses a relative quality scale in selecting oligos. For any
particular subsequence, it generates scores based on four parameters
to evaluate each 60-mer in the subsequence and looks for the best
oligos within that set, ignoring the absolute quality of the oligo.
As a result, lower quality oligos can be selected. Second, AOS does
not have a parameter to set distance between oligos. Consequently,
resolution is largely set by defining subsequence size but is still
subject to highly variable placement within each subsequence. For
instance, if the desired tiling density is 300 bp, we would select
subsequences 300 bp long. For any two adjacent subsequences, probes
could be separated by as little as 0 bp (both probes placed near the
shared subsequence border) or as much as 480 bp (both probes placed
at opposite subsequence ends).
To avoid selecting lower quality oligos, we ran AOS to derive
scores for every 60-mer in all subsequences and then eliminated
oligos based on these scores. AOS uses a scoring system for four
criteria: GC content, self-binding, complexity and uniqueness. We
selected the following ranges for each parameter: GC content
between 30 percent and 100 percent, self-binding score less than
100, complexity score less than or equal to 24, uniqueness greater
than or equal to –40.
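
The four absolute cutoffs can be expressed as a single predicate. The record type and field names below are hypothetical; only the threshold values come from the text, and the actual AOS output format is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class OligoScore:
    """Scores for one 60-mer (field names are illustrative)."""
    gc_percent: float
    self_binding: float
    complexity: float
    uniqueness: float


def is_qualified(score: OligoScore) -> bool:
    """Apply the four absolute quality cutoffs listed in the text."""
    return (
        30 <= score.gc_percent <= 100
        and score.self_binding < 100
        and score.complexity <= 24
        and score.uniqueness >= -40
    )
```

Applying the predicate to every scored 60-mer, rather than keeping the best oligo per subsequence, is what removes the dependence on relative quality.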
To achieve more uniform tiling, we instituted a method to find
probes within a particular distance from each other for the
transcription unit subsequences. We sorted all qualified probes into
chromosomal order and identified gaps in the genomic sequence that
were not covered by one or more 60-mers. These gaps typically
represented regions that were repeat masked or generated regions of
consistently low quality oligos. For our purposes, gaps that were
greater than 640 bp long represented potential dead zones or
“borders”. Based on empirical experience with genome-wide location
analysis technology, we conservatively estimated that we would not
identify binding events that occurred more than 320 bp away from the
genomic location of any particular probe. As a result, gaps longer
than 640 bp likely contained one or more base pairs that would not
be detected even if we used the closest qualified oligos as probes.
Using these borders, we split the set of all probes into “packages”
containing all qualified probes between two borders.
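
Splitting qualified probes into packages at borders might look like the sketch below. It assumes each probe is represented by the start coordinate of its 60-mer on a single chromosome and measures a gap as the uncovered span between consecutive 60-mers; both choices are assumptions for illustration.

```python
PROBE_LEN = 60   # oligo length in bp
MAX_GAP = 640    # uncovered gaps longer than this act as borders


def split_into_packages(starts):
    """Group qualified probe start coordinates into packages.

    A new package begins whenever the uncovered gap between one 60-mer
    and the next exceeds MAX_GAP.
    """
    packages = []
    current = []
    for s in sorted(starts):
        if current and s - (current[-1] + PROBE_LEN) > MAX_GAP:
            packages.append(current)  # border found: close the package
            current = []
        current.append(s)
    if current:
        packages.append(current)
    return packages
```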
For packages up to 300 bp long, we designed two probes where
possible, one from each strand (Watson and Crick). This resulted in
two different probes in the region, compensating for those instances
where a small region would be found isolated by two borders from the
nearest, potentially informative, neighboring probe. For packages
longer than 300 bp, we selected the first qualified probe in
the package (lowest chromosomal coordinate), then selected the next
qualified probe that was between 150 bp and 280 bp away. If there
were multiple eligible probes, we chose the most distal probe
within the 280 bp limit. If there were no probes within this limit,
we continued scanning until we found the next acceptable probe. The
process was then repeated with the most recently selected probe. If
the most recently selected probe was within 250 bp of the next
border, we automatically selected the qualified probe closest to the
next border. This ensured that we were selecting probes as close to
the ends of packages as possible.
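
The walk through a package longer than 300 bp can be sketched as a greedy loop. Two simplifications are assumed here and are not stated in the text: the border is approximated by the last qualified probe in the package, and "the next acceptable probe" is taken to be the first qualified probe beyond the 280 bp limit. The function and parameter names are hypothetical.

```python
def select_probes(positions, min_step=150, max_step=280, end_zone=250):
    """Greedy probe selection within one package (illustrative sketch)."""
    pos = sorted(positions)
    if not pos:
        return []
    chosen = [pos[0]]          # first qualified probe in the package
    last = pos[-1]             # qualified probe closest to the next border
    while chosen[-1] < last:
        cur = chosen[-1]
        if last - cur <= end_zone:
            chosen.append(last)        # close to the border: take the end probe
            break
        window = [p for p in pos if min_step <= p - cur <= max_step]
        if window:
            chosen.append(window[-1])  # most distal probe within the limit
        else:
            # Nothing 150-280 bp away: keep scanning past the limit.
            chosen.append(next(p for p in pos if p - cur > max_step))
    return chosen
```

With qualified probes every 50 bp from 0 to 1,000, the sketch picks probes at 0, 250, 500, 750 and 1,000.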
For intergenic unit tiling, we generated subsequences and
identified borders and packages as described for genic tiling. We
divided packages into evenly sized segments where the maximum
segment size was 480 bp. We then selected the qualified probe
closest to the midpoint of each segment.
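
A minimal sketch of the intergenic selection rule follows. It assumes the package extends from its first to its last qualified probe and that a probe is chosen only once even if it is nearest to two midpoints; neither detail is specified in the text.

```python
def select_intergenic(positions, max_segment=480):
    """Pick the qualified probe nearest the midpoint of each segment."""
    pos = sorted(positions)
    if not pos:
        return []
    start, length = pos[0], pos[-1] - pos[0]
    # Number of evenly sized segments, none longer than max_segment.
    n = max(1, -(-length // max_segment))
    chosen = []
    for i in range(n):
        midpoint = start + (2 * i + 1) * length / (2 * n)
        best = min(pos, key=lambda p: abs(p - midpoint))
        if best not in chosen:
            chosen.append(best)
    return chosen
```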
All probes from both transcription unit and intergenic unit
tiling were combined, grouped by chromosome and sorted by
position.
Compiled Probes and Controls
The design process described above led to the production of a set
of 115 Agilent microarrays containing a total of 4,652,484 features.
Each array contains 40,457 features except for array #115, which
contains 40,386 features. The probes are arranged such that
array 1 begins with the left arm of chromosome 1, array 2 picks up
where array 1 ends, array 3 picks up where array 2 ends, and so on.
There are some gaps in coverage that reflect our inability to
identify high quality unique 60-mers: these tend to be unsequenced
regions, highly repetitive regions that are not repeat masked (such
as telomeres or gene families) and certain regions that are probably
genome duplications. We estimate that only 10% of the total,
non-repeat masked region is not covered by probes. As an estimate of
probe density, 95% of all 60-mers are within 450 bp of another
60-mer; 80% of all 60-mers are within 350 bp of another 60-mer.
We added several sets of control probes (1,500 total) to the
whole genome array designs. On each array, there are 40 oligos
designed against five Arabidopsis thaliana genes that are printed in
triplicate, and thus available for use with spike-in controls. These
Arabidopsis oligos were BLASTed against the human genome and
do not register any significant hits. Since E2F4 chromatin
immunoprecipitations can be accomplished with a wide range of cell
types and have provided a convenient positive control for ChIP-Chip
experiments (for putative regulators where no prior knowledge of
targets exists, for example), we added a total of 80 oligos
representing four proximal promoter regions of genes that are
known targets of the transcriptional regulator E2F4 (NM_001211,
NM_002907, NM_031423, NM_001237). Each of the four promoters is
represented by 20 different oligos that are evenly positioned across
the region from 3 kb upstream to 2 kb downstream of the
transcription start site. We also included a control probe set that
provides a means to normalize intensities across multiple slides
throughout the entire signal range. There are 384 oligos printed as
intensity controls; based on test hybridizations, this set of oligos
gives signal intensities that cover the entire dynamic range of the
array. Twenty additional intensity controls, representing the entire
range of intensities, were selected and printed fifteen times each
for an additional 300 control features. We also incorporated 616
“gene desert” controls. To design these probes, we identified
intergenic regions of 1 Mb or greater and designed probes in the
middle of these regions. These are intended to identify genomic
regions that are least likely to be bound by promoter-binding
transcriptional regulators (by virtue of their extreme distance from
any known gene). We have used these as normalization controls in
situations where a factor binds to a large number of promoter
regions. In addition to these 1,500 controls, there are 2,256
controls added by Agilent (standard) and 77 blank spots.
Promoter Array
This set of 10 arrays was designed to cover regions between -8
kb and +2 kb relative to the transcription start sites of 16,710
genes. See Boyer et al. (2005) for details of the design of the
arrays.
Transcription Factor Array
This array was designed to cover regions between –5 kb and +5 kb
relative to the transcription start sites of 2,288 human genes
encoding transcription factors as determined by GO classifications
and manual annotation. Probes were designed essentially as
described above for the whole genome array, although tiling
density was slightly improved (1 probe approximately every 250 bp).
There are a total of 2,079 control spots on the transcription factor
array. The 40 Arabidopsis oligos and 80 E2F4 oligos described above
for the whole genome design are each printed once. A total of
404 intensity controls are printed twice. A total of 1,085 “gene
desert” controls (described above in the whole genome design) are
each printed once. The intensity controls and “gene desert” controls
are expanded sets of the controls described above for the
whole genome design.
Data Normalization and Analysis
We used GenePix software (Axon) to obtain background-subtracted
intensity values for each fluorophore for every feature on the
array. To obtain set-normalized intensities, we first calculated,
for each slide, the median intensities in each channel for the set
of 1,500 control probes described above and included on each array.
For multiple slide sets (whole genome and promoter array), we then
calculated the average of these median intensities for all slides.
Intensities were then normalized such that the median intensity of
each channel for an individual slide equaled the average of the
median intensities of that channel across all slides.
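
One way to read the set normalization is as a per-slide multiplicative rescaling, sketched below for a single channel. The multiplicative form is an assumption (the text does not say whether the adjustment was a scale or a shift), and the function and argument names are hypothetical.

```python
from statistics import mean, median


def set_normalize(slides, control_idx):
    """Rescale each slide so its control-probe median matches the set average.

    slides: list of per-slide intensity lists for one channel.
    control_idx: indices of the control probes shared by every slide.
    """
    medians = [median(slide[i] for i in control_idx) for slide in slides]
    target = mean(medians)  # average of the per-slide control medians
    return [
        [value * target / m for value in slide]
        for slide, m in zip(slides, medians)
    ]
```

The same function would be applied separately to the Cy5 and Cy3 channels.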
Among the Agilent controls is a set of negative control spots
that contain 60-mer sequences that do not cross-hybridize to human
genomic DNA. We calculated the median intensity of these negative
control spots in each channel and then subtracted this number from
the set-normalized intensities of all other features.
To correct for different amounts of genomic and
immunoprecipitated DNA hybridized to the chip, the set-normalized,
negative control-subtracted median intensity value of the
IP-enriched DNA channel was then divided by the median of the
genomic DNA channel. This yielded a normalization factor that was
applied to each intensity in the genomic DNA channel.
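
The negative-control subtraction and channel balancing can be sketched together. The function name and the choice to take medians over all features on the slide are assumptions; only the order of operations follows the text.

```python
from statistics import median


def subtract_and_balance(ip, genomic, negative_idx):
    """Subtract negative-control medians, then scale the genomic channel.

    ip, genomic: set-normalized intensities for one slide.
    negative_idx: indices of the negative control spots.
    Returns (ip_adjusted, genomic_scaled, factor).
    """
    ip_bg = median(ip[i] for i in negative_idx)
    gen_bg = median(genomic[i] for i in negative_idx)
    ip_adj = [v - ip_bg for v in ip]
    gen_adj = [v - gen_bg for v in genomic]
    # Ratio of channel medians corrects for unequal DNA amounts.
    factor = median(ip_adj) / median(gen_adj)
    return ip_adj, [v * factor for v in gen_adj], factor
```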
Next, we calculated the log of the ratio of intensity in the
IP-enriched channel to intensity in the genomic DNA channel for each
probe and used a whole chip error model (Hughes et al., 2000) to
calculate confidence values for each spot on each array (single
probe p-value). This error model functions by converting the
intensity information in both channels to an X score, which is
dependent on both the absolute value of intensities and background
noise in each channel. When available, replicate data were combined,
using the X scores and ratios of individual replicates to weight
each replicate’s contribution to a combined X score and ratio. The X
scores for the combined replicate are assumed to be normally
distributed, which allows for calculation of a p-value for the
enrichment ratio seen at each feature. P-values were also
calculated based on a second model assuming that, for any range of
signal intensities, IP:control ratios below 1 represent noise (as
the immunoprecipitation should only result in enrichment of specific
signals) and the distribution of noise among ratios above 1 is the
reflection of the distribution of noise among ratios below 1.
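
The second, reflection-based model can be illustrated with a short sketch. It assumes a Gaussian fit to the mirrored noise and treats all probes as one intensity range; the text describes the reflection idea but not these details, so both are assumptions, as is the function name.

```python
import math


def mirrored_noise_pvalues(log_ratios):
    """One-sided p-values from a noise model mirrored around log ratio 0.

    Log ratios below 0 (IP:control ratios below 1) are treated as noise
    and reflected about 0; a zero-mean normal is fitted to that set.
    """
    below = [r for r in log_ratios if r < 0]
    if not below:
        raise ValueError("need at least one log ratio below 0")
    # The reflected set has mean 0, so its variance is the mean square.
    sd = math.sqrt(sum(r * r for r in below) / len(below))
    # Upper-tail probability of a normal with mean 0 and this sd.
    return [0.5 * math.erfc(r / (sd * math.sqrt(2))) for r in log_ratios]
```

In practice the fit would be repeated within each range of signal intensities rather than once per slide.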
Identification of Bound Regions
Whole Genome Arrays
To automatically determine bound regions in the datasets, we
developed an algorithm to incorporate information from neighboring
probes. For each 60-mer, we calculated the average X score of the
60-mer and its two immediate neighbors. If a feature was flagged
as abnormal during scanning, we assumed it gave a neutral
contribution to the average X score. Similarly, if an adjacent
feature was beyond a reasonable distance from the probe (1000 bp),
we assumed it gave a neutral contribution to the average X score.
The distance threshold of 1000 bp was determined based on the
maximum size of labeled DNA fragments put into the hybridization.
Since the maximum fragment size was approximately 550 bp, we
reasoned that probes separated by 1000 or more bp would not be
able to contribute reliable information about a binding event
halfway between them.
This set of averaged values gave us a new distribution that was
subsequently used to calculate p-values of average X (probe set
p-values). If the probe set p-value was less than 0.001, the three
probes were marked as potentially bound.
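
The three-probe averaging can be sketched as below. A "neutral contribution" is taken to be an X score of 0, which is an assumption; the function and argument names are also hypothetical.

```python
def averaged_x(positions, x_scores, flagged, max_dist=1000, neutral=0.0):
    """Average each probe's X score with its two immediate neighbors.

    positions: probe coordinates in chromosomal order.
    flagged: True where a feature was marked abnormal during scanning.
    Flagged features, missing neighbors and neighbors 1000 bp or more
    away contribute the neutral value.
    """
    n = len(positions)
    averages = []
    for i in range(n):
        total = neutral if flagged[i] else x_scores[i]
        for j in (i - 1, i + 1):
            usable = (
                0 <= j < n
                and not flagged[j]
                and abs(positions[j] - positions[i]) < max_dist
            )
            total += x_scores[j] if usable else neutral
        averages.append(total / 3)
    return averages
```

The probe set p-value would then be computed from the distribution of these averaged values.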
As most probes were spaced within the resolution limit of
chromatin immunoprecipitation, we next required that multiple probes
in the probe set provide evidence of a binding event. Candidate
bound probe sets were required to pass one of two additional
filters: two of the three probes in a probe set must each have
single probe p-values < 0.005, or the center probe in the probe
set has a single probe p-value < 0.001 and one of the flanking
probes has a single probe p-value < 0.1. These two filters cover
situations where a binding event occurs midway between two probes
and each weakly detects the event, or where a binding event occurs
very close to one probe and is very weakly detected by a
neighboring probe. For RNA polymerase II, this algorithm identified
22,912 bound probe sets of RNA polymerase II ChIP-enriched DNA
across the genome.
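
The probe set threshold and the two additional filters combine into one decision, sketched here with the RNA polymerase II cutoffs as defaults. The function and parameter names are hypothetical; the cutoff values are those given in the text.

```python
def is_bound_probe_set(set_p, left_p, center_p, right_p,
                       set_cut=0.001, pair_cut=0.005,
                       center_cut=0.001, flank_cut=0.1):
    """Return True if a three-probe set passes the binding criteria."""
    if set_p >= set_cut:
        return False  # averaged X score is not significant
    singles = (left_p, center_p, right_p)
    # Filter 1: at least two of the three probes are individually significant.
    two_of_three = sum(p < pair_cut for p in singles) >= 2
    # Filter 2: strong center probe supported by a weak flanking signal.
    center_plus_flank = (
        center_p < center_cut and min(left_p, right_p) < flank_cut
    )
    return two_of_three or center_plus_flank
```

Stricter analyses can reuse the same function by passing smaller cutoffs.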
Individual probe sets that passed these criteria and were spaced
closely together were collapsed into bound regions if the center
probes of the probe sets were within 1000 bp of each other. This
final step reduced the 22,912 peaks to 10,244 bound regions. The
bound regions had a median size of 950 bp.
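
Collapsing bound probe sets into regions can be sketched as a single pass over sorted center-probe coordinates. Chaining (each probe set is compared with the previous one, so a region can grow beyond 1000 bp) is an assumption about how "within 1000 bp of each other" was applied, and the sketch handles one chromosome at a time.

```python
def collapse_probe_sets(centers, max_gap=1000):
    """Merge bound probe sets whose center probes lie within max_gap bp.

    centers: center-probe coordinates of bound probe sets on one chromosome.
    Returns (first_center, last_center) for each bound region.
    """
    regions = []
    for c in sorted(centers):
        if regions and c - regions[-1][1] <= max_gap:
            regions[-1][1] = c        # extend the current region
        else:
            regions.append([c, c])    # start a new region
    return [tuple(r) for r in regions]
```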
The ES cell line we used (H9) has a female karyotype (XX).
Nineteen (0.18%) of the RNA polymerase II bound regions mapped to
the Y chromosome and 6 of these correspond to the promoters of known
genes. Each of these 6 genes (ASMTL, CXYorf2, HIT000024005, PLCXD1,
PPP2R3B and SYBL1) is also present on the X chromosome, suggesting
that all of these bound regions are duplicate measurements of X
chromosome binding events caused by hybridization of X chromosome
DNA to Y chromosome probes. Subtracting out these duplicates leaves
10,225 unique genomic regions bound by RNA polymerase II in ES
cells.
Peak finding for genome-wide Suz12 binding data was carried out
as described above for RNA polymerase II with the following
modifications. Probe sets were marked as potentially bound if the
p-value of average X (probe set p-value) was less than 0.0001, and
probe sets were required to pass one of two additional filters:
two of the three probes in a probe set must each have single probe
p-values < 0.0005, or the center probe in the probe set has a
single probe p-value < 0.0001 and one of the flanking probes has
a single probe p-value < 0.01. This algorithm identified 16,438
bound probe sets of Suz12 ChIP-enriched DNA across the genome. As
before, individual probe sets that passed these criteria and were
spaced closely together were collapsed into bound regions if the
center probes of the probe sets were within 1,000 bp of each other.
This final step reduced the 16,348 peaks to 3,446 bound regions. The
bound regions had a median size of 1,248 bp.
Unlike RNA polymerase II, Suz12 was often associated with large
regions of DNA stretching over multiple kilobases of contiguous
sequence. 28% of Suz12-bound regions were over 2 kb in size,
compared with only 7% of RNA polymerase II-bound regions. In some
instances, multiple large regions were clustered in close proximity,
as shown for the Hox clusters.
Promoter Array
Probe sets were marked as potentially bound if the p-value of
average X (probe set p-value) was less than 0.001, and probe sets
were required to pass one of two additional filters: two of the
three probes in a probe set must each have single probe p-values
< 0.005, or the center probe in the probe set has a single probe
p-value < 0.001 and one of the flanking probes has a single probe
p-value < 0.1. This algorithm identified 7,074 bound probe sets of
Suz12 ChIP-enriched DNA, 6,302 bound probe sets of Eed ChIP-enriched
DNA and 8,205 bound probe sets of H3K27me3 ChIP-enriched DNA. As
before, individual probe sets that passed these criteria and were
spaced closely together were collapsed into bound regions if the
center probes of the probe sets were within 1,000 bp of each other.
This final step reduced the peaks to 1,415 (Suz12), 1,549 (Eed) and
1,885 (H3K27me3) bound regions.
Transcription Factor Array
Probe sets were marked as potentially bound if the p-value of
average X (probe set p-value) was less than 0.001, and probe sets
were required to pass one of two additional filters: two of the
three probes in a probe set must each have single probe p-values
< 0.005, or the center probe in the probe set has a single probe
p-value < 0.001 and one of the flanking probes has a single probe
p-value < 0.1. As before, individual probe sets that passed these
criteria and were spaced closely together were collapsed into bound
regions if the center probes of the probe sets were within 1,000 bp
of each other. This algorithm identified 465 bound probe sets (299
bound regions) of Suz12 ChIP-enriched DNA in muscle cells, 7,199
bound probe sets (645 bound regions) of Suz12 ChIP-enriched DNA in
ES cells, 1,375 bound probe sets (455 bound regions) of H3K27me3
ChIP-enriched DNA in muscle cells and 5,455 bound probe sets (775
bound regions) of H3K27me3 ChIP-enriched DNA in ES cells.
Controlling for the Effect of Murine Embryonic Fibroblast Feeder Cells
We performed two sets of experiments to measure the contribution
of the murine embryonic fibroblasts (MEFs) to the RNA polymerase II
binding data. In the first experiment, we grew a population of MEFs
isolated from E13.5 embryos, irradiated and replated the cells for
24 hours, treated the cells with formaldehyde to crosslink
polymerase to DNA and performed a chromatin IP. This DNA was then
purified and labeled exactly as described for samples of ES cells.
Labeled DNA was hybridized to self-printed arrays and analyzed as
described previously (Odom et al., 2004). The results indicate that
mouse feeder cells are unlikely to contribute more than 1% false
positives to RNA polymerase II chromatin immunoprecipitation
results. Using our standard analysis, there are only 47 features
that show enrichment with the mouse feeder cell RNA polymerase II
chromatin immunoprecipitation. In contrast, there are typically
4,000-5,000 enriched features with human RNA polymerase II chromatin
immunoprecipitation on self-printed arrays. In the second set of
experiments, we obtained ES cells that were MEF-subtracted by
preplating the cells on ungelatinized culture dishes for 1-2 hours
at 37°C. The supernatant enriched for ES cells was then cross-linked
as above and harvested for immunoprecipitation. The results were
essentially the same with and without feeder cells. There are some
differences, presumably due to the extra manipulations needed
to separate the cells and the decreased cell number resulting from
these manipulations. While it is technically possible that the
oligonucleotide arrays perform differently from our self-printed
arrays, these experiments generally suggest that the contribution of
8-12% of feeder cells is unlikely to have an effect on the final
results.
Comparing Bound Regions to Known and Predicted Genes
The coordinates for the complete lists of RNA polymerase
II-bound and Suz12-bound regions on the whole-genome arrays can be
found in Table S1 and Table S7, respectively. Mapping the location
of RNA polymerase II using genome-tiling arrays directly
identified the physical location of active promoters in living
cells, thus improving our confidence in transcription start sites
previously inferred from RNA evidence. Mapping the location of Suz12
identified the location of genomic regions targeted by the
chromatin regulator PRC2. This knowledge should be valuable for
improving annotation of the genome and identifying regulatory
elements that may not be detected by alternative methods.
Comparisons to Known Genes
The locations of RNA polymerase II-bound and Suz12-bound regions
were compared relative to transcript start and stop coordinates of
known genes compiled from five different databases: RefSeq (Pruitt
et al., 2005), Mammalian Gene Collection (MGC) (Gerhard et al.,
2004), Ensembl (Hubbard et al., 2005), University of California
Santa Cruz (UCSC) Known Genes (genome.ucsc.edu) (Kent et al., 2002)
and Human Invitational (H-Inv) full-length cDNAs (Imanishi et al.,
2004). All coordinate information was downloaded in January 2005
from the UCSC Genome Browser (NCBI build 35). Of the 10,225 RNA
polymerase II-bound regions, 6,741 (66%) occurred within 1 kb of
gene starts from one of these 5 databases (Table S1). Of the 3,446
Suz12-bound regions, 2,113 (61%) occurred within 1 kb of gene starts
from one of these 5 databases (Table S7).
To convert bound transcription start sites to more useful gene
names, we used conversion tables downloaded from UCSC and Ensembl to
automatically assign EntrezGene
(http://www.ncbi.nlm.nih.gov/entrez/) gene IDs and symbols to the
RefSeq, MGC, Ensembl, UCSC Known Genes and H-Inv transcripts.
Transcripts for which no EntrezGene annotation could be found in
this manner were annotated manually. This resulted in a total of
7,106 EntrezGene genes being bound by RNA polymerase II (Table S2)
and 1,893 EntrezGene genes being bound by Suz12 (Table S8).
The Distribution of Distances to Known Genes
The distances between each bound region and the closest RefSeq,
Ensembl or MGC transcription start site were calculated and plotted
as a histogram (Text: Figure 1E). As might be expected for RNA
polymerase II, there is a higher frequency of binding events over
the start sites of known genes. This distribution gradually tails
off in both directions as the distance to the start site increases.
Suz12 shows a similar, but broader distribution, as a sizable subset
of Suz12 binding events cover large regions of the genome. For
comparison, the same distance calculation was made for all probes on
chromosome 1.
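As an illustration of the distance calculation, the sketch below finds the closest transcription start site (TSS) for each bound-region center and reports the fraction lying within 1 kb. It is a simplified example: strand is ignored, a single chromosome is assumed, and the function names and coordinates are hypothetical.

```python
from bisect import bisect_left


def nearest_tss_distance(center, tss_sorted):
    """Signed distance (bp) from a region center to the closest TSS.

    `tss_sorted` must be a non-empty, ascending list of TSS positions
    on the same chromosome. Strand is ignored in this sketch.
    """
    i = bisect_left(tss_sorted, center)
    # Only the TSS just below and just above the center can be closest.
    candidates = tss_sorted[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda t: abs(center - t))
    return center - nearest


def fraction_within(distances, window=1000):
    """Fraction of regions whose nearest TSS is within `window` bp."""
    return sum(abs(d) <= window for d in distances) / len(distances)


# Hypothetical TSS positions and bound-region centers.
tss = [1_000, 5_000, 20_000]
centers = [900, 5_600, 30_000, 4_800]
distances = [nearest_tss_distance(c, tss) for c in centers]
print(distances, fraction_within(distances))
```

Binning the signed distances (for example in 100 bp steps) would give a histogram of the kind described above.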
Fraction of transcription start sites bound by RNA polymerase II
or Suz12 in ES cells
We used several human gene databases to identify the fraction of
annotated transcription start sites bound by RNA polymerase II or
Suz12 in ES cells (Figure S4). For each database, we calculated the
percentage of annotated transcription start sites that lie within
1 kb of a bound region (RNA polymerase II: MGC 42%, RefSeq 34%,
Ensembl 28%, UCSC Known Genes 26% and H-Inv 26%; Suz12: MGC 6%,
RefSeq 8%, Ensembl 7%, UCSC Known Genes 6% and H-Inv 4%).
Comparisons to Predicted Genes
The locations of bound regions were also compared relative to
transcript start and stop coordinates of predicted genes compiled
from eight different databases: GenScan (Burge and Karlin, 1997),
GeneID (Parra et al., 2000), FirstEF (Davuluri et al., 2001),
ACEview (www.aceview.org), ECgene (Kim et al., 2005), UniGene
(www.ncbi.nlm.nih.gov/UniGene), UCSC RetroFinder (Kent et al., 2003)
and Non-human mRNAs (Kent et al., 2002). These gene models are
generally derived through ab initio computational gene modeling
(GenScan, GeneID and FirstEF) or EST clustering and alignment to the
human genome (ACEview, ECgene, UniGene, UCSC RetroFinder and
Non-human mRNAs). All predictions were derived from downloads of
coordinates of predicted human genes mapped to NCBI build 35 of the
public human genome sequence from UCSC in January 2005. Of the 3,484
RNA polymerase II-bound regions not mapping to a known gene, 2,110
mapped within 1 kb of the start site of a predicted gene (Table S1).
Therefore, a total of 8,851 (87%) RNA polymerase II-bound regions
corresponded to a known or predicted transcription start site. Of
the 1,353 Suz12-bound regions not mapping to a known gene, 1,158
mapped within 1 kb of the start site of a predicted gene (Table S7).
Therefore, a total of 3,271 (95%) Suz12-bound regions corresponded
to a known or predicted transcription start site.
Candidate Novel Genes
We reasoned that RNA polymerase II-bound regions located outside of
known genes and relatively far from known transcription start sites
might represent novel genes. We identified 1,053 genomic regions
bound by RNA polymerase II that lie over 10 kb away from and outside
of any known gene (defined as being present in any one of the
RefSeq, MGC, Ensembl, UCSC Known Genes or H-Inv databases) (Table
S3). Of these, we calculated that 432 occur within 1 kb of the
transcription start sites predicted by one or more of eight gene
prediction algorithms (Table S4). These gene predictions are based
on ab initio computational gene modeling (GenScan, GeneID and
FirstEF) or on EST clustering and alignment (ACEview
(www.aceview.org), ECgene, UniGene (www.ncbi.nlm.nih.gov/UniGene),
UCSC RetroFinder and Non-human mRNAs). All predictions were derived
from downloads of coordinates of predicted human genes mapped to
NCBI build 35 of the public human genome sequence from UCSC in
January 2005.
While we favor the interpretation that RNA polymerase II-bound
regions that are relatively distant from annotated transcript start
sites represent promoters for novel genes, there are several other
possibilities. These regions could also represent new, distal start
sites for known genes. These distal regions might represent
enhancers that are captured via long-range interactions with RNA
polymerase II bound to proximal promoters. Finally, these regions
could also represent regions that are spatially, but not linearly,
co-localized, similar to the localization of separate regions of
chromosomes in the nucleolus.
Promoters for miRNAs
RNA polymerase II and Suz12 were also found associated with
microRNA genes in ES cells. MicroRNAs (miRNAs) are a non-coding
class of small RNAs with significant regulatory potential (Bartel,
2004). In a few cases, miRNA primary transcripts have been
characterized and shown to have the hallmarks of RNA polymerase II
transcripts (Lee et al., 2004), but due to the rapid processing of
primary miRNA transcripts, the location and classification of the
majority of miRNA promoters remain unknown. We found RNA polymerase
II associated with genes specifying 66 miRNAs in human ES cells,
representing 29% of all annotated miRNAs (Table S5). RNA polymerase
II occupied the promoters of protein-coding genes harboring 35
intronic miRNAs, strengthening the proposal that miRNAs located
within protein-coding genes are typically regulated by the promoters
of the corresponding host genes. We also identified the promoters
of 31 miRNAs that occur independently of protein-coding genes,
providing global evidence that independently transcribed miRNAs are
generally RNA polymerase II transcripts. This systematic
identification of miRNA genes bound by RNA polymerase II overcomes
many of the limitations to miRNA detection, such as the small size
of the mature species and the cross-hybridization of closely related
miRNAs.
Similar analysis for Suz12-bound regions indicated that Suz12
binds the promoter regions of 34 miRNAs. These included mir-124, a
miRNA preferentially expressed in brain tissue (Sempere et al.,
2004) that can shift gene expression profiles towards that of brain
(Lim et al., 2005). The observation that Suz12 occupies genes that
specify both transcriptional and post-transcriptional regulators of
development indicates that PRC2 functions to repress developmental
transcriptional programs in ES cells at multiple levels.
Bound regions were assigned to miRNAs as follows. MiRNA clusters
(data from Rfam, May 2005) were divided into two classes: intronic
(inside known genes in the same orientation) and independent.
Intronic miRNA genes were classified as bound if the promoter of
their host gene was bound. For genes with alternative promoters, a
promoter upstream of the miRNA had to be bound. Intronic miRNAs
appeared to be transcribed from the promoters of their host genes;
we did not observe any other RNA polymerase II binding close to
intronic miRNAs. Intergenic miRNAs were classified as bound if RNA
polymerase II or Suz12 binding was identified within 10 kb
upstream of the miRNA, unless the bound region could be attributed
to a neighboring gene. However, in most cases, the bound region was
detected much closer to the DNA encoding the miRNA stem loops.
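The rule for intergenic miRNAs can be sketched as a strand-aware window test. The data layout (dictionaries with `start`, `end` and `strand`, and bound regions tagged with an optional neighboring gene) is an assumption made for this example and does not reflect the actual file formats used.

```python
def upstream_window(start, end, strand, size=10_000):
    """Interval covering `size` bp upstream of a miRNA, respecting strand."""
    if strand == "+":
        return (start - size, start)
    return (end, end + size)


def intergenic_mirna_bound(mirna, bound_regions, size=10_000):
    """True if any unattributed bound region overlaps the upstream window.

    `bound_regions` holds (start, end, neighbor_gene) tuples, where
    neighbor_gene is None unless the region was attributed to a
    neighboring gene (hypothetical representation).
    """
    lo, hi = upstream_window(mirna["start"], mirna["end"], mirna["strand"], size)
    for region_start, region_end, neighbor_gene in bound_regions:
        in_window = region_start <= hi and region_end >= lo
        if in_window and neighbor_gene is None:
            return True
    return False


# Hypothetical miRNA on either strand and one bound region 5 kb away.
plus = {"start": 50_000, "end": 50_080, "strand": "+"}
minus = {"start": 50_000, "end": 50_080, "strand": "-"}
region = [(45_000, 45_500, None)]
print(intergenic_mirna_bound(plus, region), intergenic_mirna_bound(minus, region))
```

The same region counts as upstream for the plus-strand miRNA but as downstream for the minus-strand one, which is the point of making the window strand-aware.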
Comparing Binding and Human Expression Data
Transcription of genes bound by RNA polymerase II and Suz12
We collected 7 previously published ES cell expression datasets for
comparison with our RNA polymerase II and Suz12 binding data. The
expression data, gathered using massively parallel signature
sequencing (MPSS) and Affymetrix gene expression arrays, were
processed as follows:
MPSS data: Three MPSS datasets were collected, two from a pool
of the ES cell lines H1, H7 and H9 (Brandenberger et al., 2004; Wei
et al., 2005) and one for HES-2 (Wei et al., 2005). For each study,
only MPSS tags detected at or over 4 transcripts per million
(tpm) were used. In addition, the data provided by Wei and
colleagues (Wei et al., 2005) allowed us to select only those tags
that could be mapped to a single unique location in the human
genome. For tags without a corresponding EntrezGene ID, IDs
were assigned using the gene name or RNA accession numbers provided
by the authors.
Gene expression microarray data: Four Affymetrix HG-U133 gene
expression datasets were collected for the cell lines H1 (Sato et
al., 2003), H9, HSF1 and HSF6 (Abeyta et al., 2004). Each cell line
was analyzed by the authors in triplicate. EntrezGene IDs were
assigned to the probe-sets using Affymetrix annotation or using
RNA accession numbers provided by the authors. For each probe-set,
we counted the number of “Present” calls in the three replicate
array experiments performed for each cell line. Most genes are
represented by more than one probe-set and, to enable comparison
to MPSS and RNA polymerase II binding data, we then found the
maximum number of P calls for each gene (defined by unique
EntrezGene ID). A gene was defined as detected if it was called
“Present” in at least 2 of the 3 replicate arrays.
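The detection rule can be written as a few lines of code. The sketch below assumes that detection calls are available as lists of "P", "M" or "A" per probe-set and that a probe-set-to-gene mapping exists; the identifiers are made up for illustration.

```python
from collections import defaultdict


def detected_genes(probe_calls, probe_to_gene, min_present=2):
    """Return the genes called Present in at least `min_present` replicates.

    For each gene, the probe-set with the most "P" calls is used, as
    described in the text.
    """
    best = defaultdict(int)
    for probe, calls in probe_calls.items():
        gene = probe_to_gene.get(probe)
        if gene is None:          # probe-set without an EntrezGene ID
            continue
        best[gene] = max(best[gene], calls.count("P"))
    return {gene for gene, n_present in best.items() if n_present >= min_present}


# Hypothetical probe-sets: two for gene 100, one for gene 200.
probe_calls = {
    "ps1": ["P", "A", "A"],
    "ps2": ["P", "P", "A"],
    "ps3": ["A", "A", "M"],
}
probe_to_gene = {"ps1": 100, "ps2": 100, "ps3": 200}
print(detected_genes(probe_calls, probe_to_gene))
```

Taking the maximum over probe-sets means that one well-performing probe-set is enough for a gene to count as detected.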
This provided 7 lists of genes expressed in ES cells, 3 from
MPSS data and 4 from microarray data. We found that microarray
analysis of H9 ES cells detected transcripts for 78% of the genes
bound by RNA polymerase II in H9 cells that were present on the
Affymetrix arrays. In total, the 7 expression experiments
detected transcripts for 88% of genes bound by RNA polymerase II
(Table S6). In contrast to genes bound by RNA polymerase II, the
expression of genes bound by Suz12 was detected more rarely
(Figure S12). We found that 20% (±6%) of the genes bound by Suz12
alone in H9 ES cells were expressed, depending on the expression
dataset used. The expression of some of these genes may be due to
the incomplete shut down of transcription by Suz12, variations in
the genes bound by Suz12 in different cell culture conditions, or
the detection of RNA transcripts that are present in a
minority of differentiated cells. Transcription of genes bound by
both Suz12 and RNA polymerase II is detected substantially more
often than that of genes bound by Suz12 alone, consistent with the
presence of RNA polymerase II.
ES cell expression relative to differentiated cell types
We examined the relative expression levels of genes associated with
PRC2 and H3K27me3 in human ES cells (Text: Figure 2C). In order to
compare ES cells with as many human cell and tissue types as
possible, we combined the data from three studies, all performed
using the Affymetrix HG-U133A platform: 3 replicates of H1 ES cells
(Sato et al., 2003), 3 replicates each of H9, HSF1 and HSF6 ES cells
(Abeyta et al., 2004) and 2 replicates of 79 other human cell and
tissue types (Su et al., 2004). We extracted data from the original
CEL files from each array and scaled the data to a median signal of
150 in GCOS (Affymetrix). We then exported the data, created
expression ratios using the median expression of each gene across
all arrays, transformed the data into log base 2 and median centered
both genes and arrays (so that the median log2 expression ratio for
each gene and each array is 0). EntrezGene IDs were assigned to each
probe-set and, for genes with multiple probe-sets, the expression
ratios were averaged. This resulted in a set of 12,968 unique genes.
Of these, 604 were bound by Suz12, Eed and H3K27me3 at high
confidence.
In addition to examining the relative expression levels of
Suz12-target genes in ES cells and differentiated cells, we also
examined the Affymetrix absolute Present/Absent expression calls
(Figure S12). Using this measurement, we found that RNA transcripts
of Suz12-target genes were detected in ES cells much less frequently
than RNA transcripts of RNA polymerase II target genes. However, in
differentiated cells, RNA transcripts were detected for the two
classes of genes more equally, indicating that many of the genes
silenced by Suz12 in ES cells are transcriptionally active in
differentiated cells.
Inverse correlation between the size of the Suz12 binding domain
and gene expression
We found that, unlike RNA polymerase II, Suz12 was often associated
with large regions of DNA stretching over multiple kilobases of
contiguous sequence. For example, 28.3% of Suz12-bound regions were
over 2 kb in size, compared with only 6.6% of RNA polymerase
II-bound regions (Figure S8). To explore whether the size of the
genomic region occupied by Suz12 had any functional implications, we
measured how RNA polymerase II co-occupancy and gene expression
varied according to Suz12 coverage (Figure S13). Suz12-bound regions
were assigned to RefSeq genes if they occurred within 1 kb of a
transcription start site. For genes associated with multiple bound
regions, the regions were collapsed, unless the bound regions
occupied alternative promoters, in which case the largest region was
selected. Then for each gene, we determined whether the gene was
co-occupied by RNA polymerase II and whether or not the gene was
transcribed. Genes had to pass one of two criteria to be classified
as transcribed: either RNA transcripts could be detected in all
three MPSS datasets or RNA transcripts could be detected in all four
Affymetrix gene expression microarray datasets. We discovered that
the greater the extent of Suz12 binding, the less frequently the
gene was transcribed and the less frequently the target gene was
occupied by RNA polymerase II. Genes associated with Suz12 over 4 kb
of sequence were 8 times less likely to be transcribed in ES cells
(from 24% of RefSeq genes to 3%) and 4 times less likely to be bound
by RNA polymerase II (from 36% to 9%). This suggests that
transcriptional repression of genes is facilitated by the presence
of Suz12 across large regions of DNA.
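The two-criterion definition of a transcribed gene, and the fold changes quoted above, can be checked with a minimal sketch. The function names are hypothetical; the percentages are the ones given in the text.

```python
def is_transcribed(mpss_calls, affy_calls):
    """Apply the rule: detected in all 3 MPSS or all 4 Affymetrix datasets.

    Each argument is a sequence of booleans, one per dataset.
    """
    if len(mpss_calls) != 3 or len(affy_calls) != 4:
        raise ValueError("expected 3 MPSS and 4 Affymetrix detection calls")
    return all(mpss_calls) or all(affy_calls)


def fold_reduction(before_pct, after_pct):
    """How many times smaller `after_pct` is than `before_pct`."""
    return before_pct / after_pct


print(is_transcribed([True, True, True], [True, False, True, True]))  # True
print(fold_reduction(24, 3))  # 8.0, transcription: 24% -> 3%
print(fold_reduction(36, 9))  # 4.0, RNA polymerase II occupancy: 36% -> 9%
```

Requiring agreement across every dataset of one platform makes the "transcribed" label conservative, so genes detected in only some datasets are treated as not transcribed in this comparison.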
Expression changes upon ES cell differentiation
We also compared the expression level of genes between pluripotent
ES cells and differentiated ES cells (expression data from Sato et
al., 2003). The pluripotent ES cells (H1 cell line) were cultured on
Matrigel in MEF-conditioned medium and then differentiated
(non-lineage directed) on Matrigel in non-conditioned medium for 26
days, and both samples were analyzed in triplicate on Affymetrix
HG-U133A arrays. We extracted data from the original CEL files and
scaled the data to a median signal of 150 in GCOS (Affymetrix). We
then exported the data and, for each probe-set, calculated the ratio
of the average signal in differentiated cells to the average
signal in pluripotent cells. EntrezGene IDs were assigned to each
probe-set and, for genes with multiple probe-sets, the expression
ratios were averaged. We then selected only those genes that had
transcripts detectable in either pluripotent or differentiated ES
cells (gene called “P” in at least 2 of the 3 replicates), to avoid
analyzing expression ratios consisting of only noise.
To test whether Suz12-bound genes were preferentially
upregulated upon differentiation (Text: Figure 6, Table S13), we
compared the distribution of expression ratios for genes bound by
Suz12 but not RNA Pol II with the distribution of expression ratios
for all genes. As a control, we also compared the distribution of
expression ratios for genes bound by neither Suz12 nor RNA Pol II
(i.e. genes repressed by other means) with the distribution of
expression ratios for all genes. We chose to present data for
genes not bound by RNA polymerase II because this was the stricter
comparison (genes bound by RNA polymerase II are less likely to
increase in expression as they are already being transcribed).
However, the preferential induction of genes bound by Suz12 is also
apparent without first filtering for RNA polymerase II occupancy
(data not shown).
Estimating Error Rates
We used sequence-specific PCR to estimate false positive rates
for the whole-genome array data (Figure S5). For RNA polymerase II,
a subset of the bound probe sets was selected and primer pairs were
designed to amplify between 100 and 200 bp within each bound probe
set. Primers were tested for specificity using BLAST and ePCR. A
total of 192 primer pairs were selected, where each primer had 10 or
fewer matches to the genome and the pair predicted a single
amplicon. For RNA polymerase II IP samples, 10 ng of immunoenriched
DNA was used as input to the PCR. For whole cell extract (WCE)
samples, a range of unenriched DNA amounts (90, 30 and 10 ng of DNA)
was used. The PCR was performed for 28 cycles and products were
visualized on an agarose gel stained with SYBR Gold (Amersham) and
quantified using ImageQuant (Amersham). Only PCR reactions giving
single bands with intensities ordered according to the WCE
concentration were used. Genomic regions were considered enriched if
the 10 ng IP sample showed either 1.5-fold or greater enrichment
compared to the 30 ng WCE sample or greater than 1-fold enrichment
compared to the 90 ng WCE sample. Genomic regions were considered
not enriched if the band intensity of the 10 ng IP was less than
half that of the 30 ng WCE or less than the 10 ng WCE. A total of
119 primer pairs yielded a clear enriched/not enriched decision. 114
of these showed enrichment, indicating a false positive rate of
4.4%. Using this set of PCR results, we were also able to produce
receiver-operator curves showing how changes in peak identification
criteria would affect the false positive and false negative rates.
The results suggest that our selected criteria are useful for
maximizing the identification of true positives.
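The enriched/not enriched decision can be summarized as a small classifier over band intensities. This is a sketch of the thresholds stated above; the function name, the "unusable" and "ambiguous" labels and the example intensities are assumptions introduced for illustration.

```python
def classify_pcr(ip10, wce90, wce30, wce10):
    """Classify one primer pair from quantified band intensities.

    ip10  : 10 ng IP sample
    wce90 : 90 ng whole cell extract (WCE)
    wce30 : 30 ng WCE
    wce10 : 10 ng WCE
    """
    # Reactions are used only if WCE intensities follow the input amounts.
    if not (wce90 > wce30 > wce10):
        return "unusable"
    if ip10 >= 1.5 * wce30 or ip10 > wce90:
        return "enriched"
    if ip10 < 0.5 * wce30 or ip10 < wce10:
        return "not enriched"
    # Neither criterion met: no clear decision for this primer pair.
    return "ambiguous"


# Hypothetical band intensities (arbitrary units).
print(classify_pcr(200, 300, 100, 40))
print(classify_pcr(30, 300, 100, 40))
print(classify_pcr(80, 300, 100, 40))
```

Because the WCE intensities must be strictly ordered, the "enriched" and "not enriched" conditions cannot both hold for the same primer pair, so the order in which they are tested does not change the outcome.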
Two lines of evidence suggest that the false negative rate is
approximately 30%. Estimating a false negative rate is generally
much more problematic than measuring a false positive rate because
the measurement of a false negative rate assumes perfect knowledge
of the true positives in the dataset. As every method will
have its own error rate, determining a set of true positives is
challenging, if not impossible. Despite this important caveat, we
have used both sequence-specific PCR and a comparison with
expression datasets to estimate a false negative rate.
To obtain an estimate of the false positive rate for our
sequence-specific PCR reactions, we designed 49 primer pairs against
regions of the genome that had no indication of RNA polymerase II
binding (p-value for average X and center probes X > 0.3) despite
being in densely tiled regions. We reasoned that any substantial PCR
amplification in these regions was more likely to reflect a false
positive in the PCR than to reflect binding of a very large fraction
of the genome to the initiation form of RNA polymerase II. From
these PCRs, we measured a false detection rate of ~9%. We then
designed a series of PCR primers against probes ‘expressing’ a broad
range of p-values between these absolute negatives and our positive
list. 60 of these pairs produced positive PCR amplifications.
Correcting for the expected false detection rate of the PCR, we
calculated a probe-based false negative rate of ~33%.
We also used sequence-specific PCR to estimate false positive
and false negative rates for the whole-genome Suz12 array data. For
estimating false positives, a total of 108 primer pairs yielded a
clear enriched/not enriched decision. 105 of these showed
enrichment, indicating a false positive rate of 2.8%. Correcting for
the expected false detection rate of the PCR, we calculated a
probe-based false negative rate of 27%.
Binding of Suz12, Eed and H3K27me3
We used a microarray containing probes for the promoters of
16,710 genes to measure the correlation between Suz12 binding, Eed
binding and H3K27 methylation. This array detected binding of Suz12
to 1,039 genes, Eed to 909 genes and H3K27me3 to 1,007 genes (Text:
Figure 2A, Table S9). Due to the strict significance threshold we
use to define a DNA binding event (see Identification of Bound
Regions section), any set of genes we define as being bound is
conservative, with a false negative rate of ~30% (see Estimating
Error Rates section). We therefore compared the binding ratios
between Suz12, Eed and H3K27me3 to determine whether the genes that
were only called bound by one factor were also bound by the other
factors, although at a significance that fell below our strict
threshold (Figure S6). For genes bound by any one of Suz12, Eed or
H3K27me3, we aligned the binding ratios from our Suz12 IP, our Eed
IP and our H3K27me3 IP. We found that the binding patterns of Suz12,
Eed and H3K27me3 followed one another, even at genes where the
binding of only one factor was highly significant by our analysis.
From this we conclude that Suz12, Eed and H3K27me3 are present at
essentially the same set of genes in ES cells, although we cannot
rule out that there is specific binding by these factors at a small
number of genes.
The high degree of overlap between the Suz12, Eed, and H3K27me3
targets indicates that Suz12 defines an active PRC2 complex at these
genes. As a critical subunit of the PRC2 complex, Suz12 has widely
accepted roles in euchromatic gene silencing and dosage
compensation, where Suz12 and H3K27me3 are transiently
enriched on the Xi during X-inactivation (Plath et al., 2003; Silva
et al., 2003; de la Cruz et al., 2005). However, alternative roles
for Suz12 have been proposed that suggest Suz12 may function
independently of PRC2 and H3K27me3. For example, Suz12
mutations are suppressors of position-effect variegation (PEV), and
Suz12 can interact with the heterochromatin protein 1α (HP1α),
indicating a role in heterochromatin-linked gene silencing (Birve et
al., 2001; Yamamoto et al., 2004). Suz12 is also required for germ
cell development independent of other PcG proteins and can exhibit
different protein expression profiles compared to Eed and EZH2
(Birve et al., 2001; de la Cruz et al., 2005). While the vast
majority of Suz12 co-localizes with Eed to regions of H3K27
methylation, non-overlapping targets may be representative of these
alternative Suz12 roles that are independent of other PcG
proteins.
GO Classification for RNA Polymerase II and Suz12-Bound Genes
We identified Gene Ontology classification terms
(http://www.geneontology.org) enriched for RNA polymerase II-bound
and Suz12-bound genes (defined as being within 1 kb of an annotated
TSS in either the RefSeq, MGC or Ensembl databases). Hypergeometric
distributions were calculated to determine enriched terms, using for
reference the total number of genes annotated to that GO term.
Categories with p-values < 10^-5 are indicated in Table S10.
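A hypergeometric enrichment p-value of this kind can be computed directly from binomial coefficients. The sketch below is a generic upper-tail calculation, not the exact script used for Table S10; the parameter names and the toy numbers are illustrative.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N : total number of genes considered
    K : genes annotated to the GO term
    n : bound genes
    k : bound genes annotated to the GO term
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy example: 10 genes, 3 annotated to a term, 4 bound, all 3 annotated bound.
print(hypergeom_pvalue(3, 4, 3, 10))
```

A term would be reported as enriched when this probability falls below the chosen cutoff (10^-5 in the text).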
Many of the classifications enriched for Suz12-bound genes were
related to development, transcriptional regulation and signaling and
are further described in the main text. Among the remaining enriched
classifications, we noted an additional category of interest. Over
100 ion channel genes are bound by Suz12 (L-type calcium
channels, voltage-gated and inward rectifying potassium channels).
This is consistent with a role for PRC2 in blocking differentiation.
L-type calcium channels are involved in the neural vs epidermal
cell fate decision and direct activation of these channels results
in neural induction (Moreau et al., 1994; Leclerc et al., 2001).
We identified 252 annotated human homeodomain transcription
factors using PFam (Bateman et al., 2002) and EntrezGene. Of these,
150 (60%) were bound by Suz12. Most of these were associated with
extended domains of Suz12 binding. Given the considerable number of
homeodomain transcription factors bound by Suz12, we searched for
other families of transcription factors enriched in the set of
Suz12-bound targets. Genes annotated to the molecular function GO
terms GO:0003700 (transcription factor activity), GO:0030528
(transcription regulator activity), or GO:0003705 (RNA polymerase
II transcription factor activity/enhancer binding); or the
biological process GO terms GO:0006355 (regulation of transcription,
DNA-dependent), or GO:0045449 (regulation of transcription) were
defined as transcription factors. Suz12-bound genes in this set for
which a SwissProt ID could be retrieved were input into the
PANDORA software package (Kaplan et al., 2003) using domain
annotation to search for enriched molecular domains at standard
resolution. For reference, the same analysis was performed for
transcription factor genes bound by RNA polymerase II. Results from
the first level of classification are depicted in Figure S7,
expressed as a percentage of the number of total transcription
factors placed in that category by PANDORA.
Comparing Suz12-Bound Regions to Domains of Conservation
Regions of genomic conservation were obtained from the PhastCons
database stored at UCSC (http://genome.ucsc.edu). PhastCons
identifies genomic segments of conservation based on a two-state
phylogenetic hidden Markov model with a state for conserved regions
and a state for non-conserved regions. Each conserved element is
assigned a log-odds score equal to its log probability under the
conserved model minus its log probability under the non-conserved
model. The elements are then assigned a conservation score, which is
a log transformation of the log-odds score and scales from 0 to 1000
(http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=phastConsElements).
A log-odds (LoD) score of 100 corresponds to a conservation score of
~500. Conserved elements overlapping exons in the RefSeq and UCSC
Known Gene databases were removed for most analyses. For cases in
which Suz12-bound regions or TSS proximal regions (-8 kb to +2 kb
around known start sites) contained multiple conserved elements, the
top conservation score was used.
To calculate the significance of overlap between Suz12 binding
and conserved domains, we tested randomly generated genomic regions
that reflected the variation in size of Suz12 binding regions and
the array coverage of the genome. The randomizations were performed
by finding, for each bound region, a random region of equal size in
the genome that was present on the array. Each of these random
regions was then tested to see if it overlapped a conserved element
and scored as above. Multiple runs of the randomization were
performed and P-values were determined by assuming a binomial
distribution with an expectation derived from the randomized
regions. For comparison, the same analysis was performed for RNA
polymerase II.
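The randomization test can be sketched as follows. This is a simplified illustration under stated assumptions: a single chromosome, tiled array intervals chosen uniformly among those large enough to hold the region (rather than weighted by length), and an upper-tail binomial P-value. All function names and coordinates are hypothetical.

```python
import random
from math import comb


def overlaps(start, end, elements):
    """True if [start, end) overlaps any (s, e) conserved element."""
    return any(s < end and e > start for s, e in elements)


def random_region(size, tiled, rng):
    """Pick a random region of `size` bp inside a tiled interval.

    Assumes at least one tiled interval is at least `size` bp long.
    """
    fits = [(s, e) for s, e in tiled if e - s >= size]
    s, e = rng.choice(fits)
    start = rng.randint(s, e - size)
    return start, start + size


def expected_overlap_rate(sizes, tiled, conserved, runs=200, seed=0):
    """Mean fraction of size-matched random regions hitting a conserved element."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        for size in sizes:
            start, end = random_region(size, tiled, rng)
            hits += overlaps(start, end, conserved)
    return hits / (runs * len(sizes))


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical data: bound-region sizes, array coverage and conserved elements.
sizes = [500, 2_000, 4_000]
tiled = [(0, 100_000), (200_000, 260_000)]
conserved = [(10_000, 10_300), (50_000, 50_200), (210_000, 210_500)]
p_random = expected_overlap_rate(sizes, tiled, conserved)
print(p_random, binomial_tail(3, len(sizes), p_random))
```

Here the final line asks how likely it would be for all three bound regions to overlap a conserved element if overlaps occurred at the rate seen for random regions.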
Generating Suz12-deficient Mouse Cells and Analysis of their Expression Pattern
Generation of Suz12 -/- mouse cells
To generate Suz12-deficient mouse cells, a targeting vector was
constructed (Figure S10) from a BAC DNA clone containing the Suz12
gene isolated from a mouse genomic library derived from R1 embryonic
stem cells (O. Ohara and H. Koseki, unpublished). The targeting
construct had two homology sequences, a 6.1 kb EcoRI/XbaI fragment
that lies 5’ of the 11th exon of the locus and a 3.1 kb KpnI/XbaI
fragment that lies 3’ of the 14th exon and extends just past the
stop codon, removing DNA encoding amino acids 482-741 containing the
VEFS domain, which is involved in interactions with EZH2 (Yamamoto
et al., 2004). For the negative selection, the HSV tk cassette from
the pPNT vector was added. Successful integration replaced a 6.0 kb
fragment containing four exons with a neomycin-resistance (Neo^r)
gene cassette from pHR68 (gifted by Dr. T. Kondo) in a reverse
orientation relative to Suz12 transcription. This vector was
introduced into R1 embryonic stem cells as described previously
(Akasaka et al., 1996) and four homologous recombinants were
obtained. Suz12 heterozygous ES cells were introduced into recipient
blastocysts and germline transmission of the null allele was
obtained with no apparent phenotypic differences between wild-type
and heterozygous animals. Suz12 +/- mice were backcrossed six times
onto a C57BL/6 background. Genotyping was performed by Southern
blotting against BamHI-digested genomic DNA to detect the appearance
of a 3.5 kb fragment (generated by cutting at a BamHI site
introduced with the neo cassette) and the loss of a 6.2 kb fragment
that would occur with endogenous BamHI sites (Figure S10).
Suz12 -/- cell lines were derived from blastocysts from crosses
between heterozygous Suz12 mutant animals based on conventional
protocols (Hogan et al., 1994). Loss of wild-type Suz12 protein in
Suz12 -/- cells was confirmed by Western blotting (Figure S10). The
homozygous mutant cells display reduced ability to tri-methylate
H3K27 (data not shown), indicating that PRC2 complex function is
disrupted in these cells. The cells retain some characteristics of
ES cells, such as cellular morphology, relatively normal levels of
Oct4 and Nanog expression and the ability to proliferate in culture,
while gaining some characteristics of differentiated cells, such as
upregulation of developmental transcription factors (as described
below). Moreover, Suz12 -/- embryos in this study arrest development
at 7.75 dpc, similar to that previously described for Suz12, Eed,
and Ezh2 null embryos (Schumacher et al., 1996; O'Carroll et al.,
2001; Pasini et al., 2004).
Microarray expression analysis
Total RNA was purified from the two replicate wild-type mouse ES cell
lines and the replicate Suz12 -/- cell lines using TRIzol. RNA from
each Suz12 -/- cell line was labeled with Cy5 using the Low RNA Input
Fluorescent Linear Amplification Kit (Agilent) and hybridized to
Mouse Development Arrays (G4120A, Agilent) with Cy3-labeled total RNA
from the corresponding wild-type cells. Each experiment was also
repeated, swapping the dyes, giving a total of four expression
datasets, with each of the two biological replicates being
represented by two technical replicates. The arrays were scanned with
an Agilent microarray scanner and the processed signals and
expression ratios were obtained using Feature Extraction software
(Agilent). We filtered the data to remove features with signal
intensities not significantly above background in both channels.
Average expression ratios were generated through inter-slide and
intra-slide comparisons between the signals for Suz12 -/- cells and
wild-type ES cells for each replicate. The average ratios between the
self-self comparisons within each replicate set were also calculated
and this population was then defined as the null distribution.
Expression ratios were then compared to this null distribution and
the number of standard deviations from the mean calculated. The
expression of a gene was considered to be significantly altered in
Suz12 -/- cells if the expression ratio between Suz12 -/- and
wild-type ES cells was over 2 standard deviations from the mean of
the null distribution and the expression ratio for the same gene in
the self-self comparisons was less than 1 standard deviation from the
mean of the null distribution.
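The two-part significance criterion described above can be sketched in
a few lines of Python. This is a minimal illustration only: it assumes
the ratios are supplied as log ratios, and the function names
(null_stats, z_score, is_significantly_altered) and the example values
are hypothetical and not taken from the original analysis.

```python
from statistics import mean, stdev

def null_stats(self_self_log_ratios):
    """Mean and standard deviation of the self-self (null) log ratios."""
    return mean(self_self_log_ratios), stdev(self_self_log_ratios)

def z_score(value, mu, sigma):
    """Signed number of standard deviations between a value and the null mean."""
    return (value - mu) / sigma

def is_significantly_altered(mutant_ratio, self_self_ratio, mu, sigma,
                             mutant_cutoff=2.0, self_cutoff=1.0):
    """Apply both criteria from the text: the mutant vs. wild-type ratio is
    more than 2 SD from the null mean, and the self-self ratio for the same
    gene is less than 1 SD from the null mean."""
    return (abs(z_score(mutant_ratio, mu, sigma)) > mutant_cutoff
            and abs(z_score(self_self_ratio, mu, sigma)) < self_cutoff)

# Hypothetical log2 ratios from self-self comparisons, for illustration only.
null = [-0.2, 0.1, 0.0, 0.15, -0.05, 0.05, -0.1, 0.05]
mu, sigma = null_stats(null)

print(is_significantly_altered(1.5, 0.05, mu, sigma))  # large change, quiet control
print(is_significantly_altered(0.1, 0.05, mu, sigma))  # change within the null spread
print(is_significantly_altered(1.5, 0.30, mu, sigma))  # noisy self-self control
```

The second condition guards against features that vary even when a
sample is compared with itself, which would otherwise be called as
altered.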
Comparing mouse expression data with human binding data and human
expression data
We reasoned that genes bound by Suz12 in human ES cells have
orthologs in mice that should be upregulated in Suz12 -/- mouse
cells. A significant overlap in the genes bound by Suz12 in human ES
cells and the genes upregulated in Suz12 -/- mouse cells would
support a role for Suz12 in the repression of its target genes in ES
cells. We expected that the overlap in genes bound by Suz12 in human
ES cells and genes upregulated in Suz12 -/- mouse cells would be
incomplete because of 1) potential differences in Suz12 occupancy in
human and mouse ES cells, 2) possible repression of PRC2 target genes
by additional mechanisms, 3) the effects of the Suz12 -/- mutation on
genes downstream of Suz12 target genes, due to the fact that many of
these are transcriptional regulators, and 4) false positive and false
negative errors in both binding and expression analysis.
The mouse microarrays contained 5341 features with a mouse EntrezGene
ID. Features representing duplicate genes were averaged, giving
single expression values for 4266 unique genes. Using the Agilent
GeneName field, we then used the Homologene database to identify
orthologous human genes. Homologene listed a human ortholog for 3971
of the 4266 mouse genes. We determined that 557 of these 3971 genes
with human orthologs were significantly upregulated in Suz12 -/-
mouse cells.
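The collapsing of duplicate features and the ortholog lookup described
above might be sketched as follows. The helper names, the toy feature
list and the small ortholog table are hypothetical; whether averaging
was done on raw or log ratios is not stated in the text, so plain
averaging of the supplied values is an assumption.

```python
from collections import defaultdict
from statistics import mean

def average_by_gene(features):
    """Collapse (gene_id, ratio) features to a single mean ratio per gene."""
    grouped = defaultdict(list)
    for gene_id, ratio in features:
        grouped[gene_id].append(ratio)
    return {gene_id: mean(ratios) for gene_id, ratios in grouped.items()}

def map_to_orthologs(gene_values, ortholog_table):
    """Keep only genes with a listed human ortholog, keyed by the human gene."""
    return {ortholog_table[gene]: value
            for gene, value in gene_values.items()
            if gene in ortholog_table}

# Toy data for illustration only: two features for one gene, and one
# gene ("Gm0000", a made-up symbol) with no listed human ortholog.
features = [("Hoxa1", 2.0), ("Hoxa1", 4.0), ("Pax3", 1.5), ("Gm0000", 0.9)]
orthologs = {"Hoxa1": "HOXA1", "Pax3": "PAX3"}

per_gene = average_by_gene(features)
human = map_to_orthologs(per_gene, orthologs)
print(human)
```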
We compared the set of 3971 human-mouse gene orthologs with the human
Suz12 binding data and found that 346 of these 3971 genes were bound
by Suz12 in human ES cells. By comparing the set of 346 bound genes
with the set of 557 upregulated genes, we found that 70 (20%) of the
346 genes bound by Suz12 in human ES cells were upregulated in Suz12
-/- cells. This overlap is significant (p = 6x10^-4) and, given the
complexities associated with human-mouse comparisons, strongly
supports a role for Suz12 in the repression of its target genes in ES
cells. Strikingly, 8 of the 10 most upregulated developmental
transcription factors in Suz12 -/- cells were bound by Suz12 in human
ES cells. The identities of the genes bound by Suz12 in human ES
cells and upregulated in Suz12 -/- mouse cells are listed, together
with their expression changes, in Table S14.
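One common way to assess an overlap of this kind is a one-sided
hypergeometric test; the sketch below applies it to the counts given
above. The text does not state which test was used, so the choice of
test and the function names (log_comb, hypergeom_upper_tail) are
assumptions made for illustration.

```python
from math import exp, lgamma

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` genes are sampled without replacement
    from `population` genes, of which `successes` are upregulated."""
    log_total = log_comb(population, draws)
    upper = min(successes, draws)
    return sum(
        exp(log_comb(successes, k)
            + log_comb(population - successes, draws - k)
            - log_total)
        for k in range(observed, upper + 1)
    )

# Counts taken from the text: 3971 orthologs, 557 upregulated,
# 346 bound by Suz12, 70 both bound and upregulated.
population, upregulated, bound, overlap = 3971, 557, 346, 70

expected = bound * upregulated / population
p_value = hypergeom_upper_tail(population, upregulated, bound, overlap)

print(f"overlap expected by chance: {expected:.1f} genes")
print(f"upper-tail probability for >= {overlap} genes: {p_value:.2e}")
```

Under this model roughly 49 of the 346 bound genes would be expected
among the upregulated genes by chance, compared with the 70 observed.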
To determine the degree to which Suz12-bound genes were
preferentially upregulated in Suz12 -/- cells (Text: Figure 6C), we
performed the same analysis previously used to determine whether
Suz12-bound genes were preferentially upregulated upon human ES cell
differentiation (p. 16 of Supplemental Data).
To compare the expression changes that occur in Suz12 -/- cells with
those that occur upon human H1 ES cell differentiation (Text: Figure
6D), we identified a set of 182 genes that were present in both
filtered datasets and were bound by Suz12 in human ES cells (the
human ES cell differentiation dataset was filtered as described on
page 16).
Sample Preparation and Analysis of Differentiated Muscle
Primary Human Skeletal Muscle Cells (HSkMCs) were obtained from Cell
Applications, Inc. (San Diego, CA) and expanded according to the
supplier's protocols in growth medium. Upon reaching confluence,
cells were shifted to differentiation medium in plates coated with
collagen to promote attachment of differentiating cells, and medium
was replaced every 2 days. Growth and differentiation media were
supplied by Cell Applications, Inc. After 6 days of differentiation,
cells had fused to form multinucleated myotubes. Cells were
crosslinked and ChIP experiments were performed as described above.
ChIP-chip data were analyzed as described above.
We observed Suz12 binding at a number of loci that were also
Suz12-bound in ES cells, but loss of Suz12 and H3K27me3 was seen at
several genes critical for the development of differentiated muscle
tissue, including MyoD, Pax3, Pax7 and Six1. Previous work indicates
Ezh2 is removed from the chromatin of the muscle-specific structural
genes MCK and MHCII and replaced by transcriptional activators upon
differentiation (Caretti et al., 2004). While Ezh2 levels appear to
decline over the course of muscle differentiation, we note Suz12
binding in differentiated muscle tissue. We cannot rule out that
Suz12 may be playing a different role in this terminally
differentiated tissue than it plays in ES cells, and further work
will be necessary to clarify this issue. However, the loss of Suz12
at factors necessary for development of muscle tissue and the
retention of Suz12 binding at a number of genes important for other
lineages suggest that removal of Suz12 accompanies the expression of
key developmental regulators during cellular differentiation.
Comparing Suz12 Binding with Oct4, Nanog and Sox2 Binding
To explore how Suz12 might be targeted to genes, we compared our
Suz12 binding data to our previous results from profiling the DNA
binding of the ES cell transcription factors Oct4, Sox2 and Nanog
(Boyer et al., 2005). We found that there was a significant overlap
between the genes bound by Suz12 and the genes bound by Oct4, Sox2 or
Nanog. Of the 1606 genes bound by Suz12 and present on the promoter
arrays we used previously for Oct4, Sox2 and Nanog, 196 (12%) were
also bound by Oct4, 148 (9%) by Sox2 and 271 (17%) by Nanog. 92 genes
(6%) were bound by all three factors and a total of 342 (21%) by any
one of the three factors. We found that genes bound by Suz12 and
Oct4, Suz12 and Sox2 or Suz12 and Nanog were enriched for genes
encoding transcriptional regulators of development when compared to
genes bound by Suz12 alone (p = 5.4x10^-20, 3.5x10^-10 and
2.4x10^-18, respectively). There were 315 genes encoding
developmental transcription factors that were present on the
whole-genome and promoter arrays and bound by Suz12. Of these, 103
(33%) were also bound by Oct4, 81 (26%) were also bound by Sox2 and
107 (34%) were also bound by Nanog. 57 genes (18%) were bound by all
three factors and a total of 143 (45%) were bound by any one of the
three factors (Table S11).
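The co-occupancy categories used above ("also bound by", "all three",
"any one of the three") correspond to simple set operations. The sketch
below illustrates them on toy gene sets and then recomputes the
percentages from the counts reported in the text. The function names
and the toy gene labels are hypothetical.

```python
def co_occupancy(suz12, oct4, sox2, nanog):
    """Count Suz12-bound genes shared with each factor, with all three
    factors (intersection) or with at least one of them (union)."""
    return {
        "Oct4": len(suz12 & oct4),
        "Sox2": len(suz12 & sox2),
        "Nanog": len(suz12 & nanog),
        "all three": len(suz12 & oct4 & sox2 & nanog),
        "any": len(suz12 & (oct4 | sox2 | nanog)),
    }

def percent(count, total):
    """Percentage rounded to the nearest whole number."""
    return round(100 * count / total)

# Toy gene sets, for illustration only.
toy = co_occupancy(
    suz12={"A", "B", "C", "D", "E"},
    oct4={"A", "B", "X"},
    sox2={"A", "C"},
    nanog={"A", "B", "Y"},
)
print(toy)

# Counts reported in the text for all Suz12-bound genes (n = 1606) and
# for Suz12-bound developmental transcription factors (n = 315).
all_genes = {"Oct4": 196, "Sox2": 148, "Nanog": 271, "all three": 92, "any": 342}
dev_tfs = {"Oct4": 103, "Sox2": 81, "Nanog": 107, "all three": 57, "any": 143}

all_pct = {name: percent(n, 1606) for name, n in all_genes.items()}
tf_pct = {name: percent(n, 315) for name, n in dev_tfs.items()}
print(all_pct)
print(tf_pct)
```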
Although Oct4, Sox2 and Nanog are all indispensable for ES cell
propagation, mutations in each regulator display slightly different
phenotypes (reviewed in Chambers, 2004 and Boiani and Scholer, 2005),
suggesting each may have unique contributions to stem cell identity.
This led us to ask if there were differences in the association of
Oct4, Sox2 or Nanog with Suz12 when the factors were considered
individually or in pairs. Direct analysis of the bound sites,
compared to randomized binding data, showed that Oct4 was more
associated with Suz12-bound regions than either Sox2 or Nanog. This
association was consistent whether we compared the factors alone or
in pairs. Surprisingly, sites bound by Sox2 or Nanog but not Oct4
were not particularly associated with Suz12 (Figure S14). These data
point out subtle differences in the association of Oct4, Sox2 or
Nanog with Suz12 that may eventually help identify regulatory
mechanisms specific to each transcription factor.
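A comparison against randomized binding data of the kind mentioned
above can be sketched as a simple permutation test on region overlaps.
The text does not describe how the randomization was performed, so the
scheme below (placing each region uniformly at random on a single
linear coordinate), the function names and the toy coordinates are all
assumptions for illustration. A real analysis would likely restrict
random placement to the regions covered by the arrays.

```python
import random

def overlaps(region, targets):
    """True if the half-open region (start, end) overlaps any target region."""
    start, end = region
    return any(start < t_end and t_start < end for t_start, t_end in targets)

def fraction_overlapping(regions, targets):
    """Fraction of regions that overlap at least one target region."""
    return sum(overlaps(r, targets) for r in regions) / len(regions)

def randomized_fractions(regions, targets, genome_length, n_iter=1000, seed=0):
    """Overlap fractions after re-placing each region at a random position,
    preserving region lengths."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_iter):
        shuffled = []
        for start, end in regions:
            length = end - start
            new_start = rng.randrange(0, genome_length - length + 1)
            shuffled.append((new_start, new_start + length))
        fractions.append(fraction_overlapping(shuffled, targets))
    return fractions

# Toy coordinates, for illustration only.
suz12_regions = [(100, 600), (5000, 9000)]
factor_regions = [(150, 300), (5200, 5400), (20000, 20200), (8000, 8100)]

observed = fraction_overlapping(factor_regions, suz12_regions)
null = randomized_fractions(factor_regions, suz12_regions, genome_length=100_000)

# Empirical one-sided p-value with the usual +1 correction.
empirical_p = (1 + sum(f >= observed for f in null)) / (1 + len(null))
print(f"observed fraction overlapping: {observed:.2f}")
print(f"empirical p-value: {empirical_p:.4f}")
```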
Suz12, Oct4, Nanog and Sox2 binding and sequence conservation
The observation that a set of repressed genes bound by Oct4, Sox2 and
Nanog were occupied by Suz12 and contain highly conserved non-coding
DNA sequences led us to examine whether the DNA-binding regulators
occupy conserved sequence motifs that might contribute to PRC2
targeting at these genes. When we examined the Oct4, Sox2 and Nanog
binding sites at developmental TFs bound by Suz12, we found that ~50%
of their bound regions overlapped conserved elements that had LoD
conservation scores > 100. The association of these DNA-binding
transcription factors with conserved elements at a substantial
fraction of Suz12-occupied sites suggests that they may have some
role in targeting PRC2 to conserved elements in many genes encoding
developmental regulators. Oct4, Sox2 and Nanog were not found at all
PRC2-occupied genes, however, so additional regulators must also be
involved in PRC2-mediated silencing in ES cells.
Suz12, Oct4, Nanog and Sox2 binding and DNA motifs
If Oct4, Sox2 and Nanog are involved in targeting PRC2 to genes, we
expect that other factors must influence Suz12 binding and thus
explain why Suz12 is observed at only a subset of the genes bound by
these transcription factors. We used MEME (Bailey and Elkan, 1995) to
search for additional DNA sequence motifs that might discriminate
between genes bound by Oct4, Sox2, Nanog and Suz12 and genes bound by
only Oct4, Sox2 and Nanog. There was one motif consisting of repeats
of the dinucleotide GT that was specifically associated with the
Oct4, Sox2, Nanog and Suz12-bound sites (Figure S15). This motif is
similar to one previously associated with polycomb response elements
in Drosophila (Ringrose et al., 2003). We also found DNA elements
that were specifically associated with sites bound by Oct4, Sox2 and
Nanog and not Suz12 (Figure S15). One of these elements contains DNA
binding sites for multiple transcription factors, including HoxA3 and
C/EBP. This suggests that Oct4, Sox2 and Nanog may act with other
transcriptional regulators to positively regulate transcription at
some genes, but in the absence of these other regulators, may recruit
PcG proteins and thus negatively regulate transcription at other
genes. Similar bimodal activities have been suggested for proteins
involved in PcG targeting in Drosophila, including GAGA and zeste
(Kerrigan et al., 1991; Laney and Biggin, 1992; Strutt et al., 1997).
Oct4, Sox2 and Nanog have previously been described as having both
positive and negative roles in transcription (Yuan et al., 1995;
Botquin et al., 1998; Nishimoto et al., 1999; Guo et al., 2002) and
the association of Suz12 with only a subset of promoters bound by
these regulators would be consistent with these observations.
Index of Tables
All tables can be found on the supporting website; the URLs below can
be used to download the appropriate table.
Table S1. Regions bound by RNA polymerase II and their relationship
to known and predicted genes.
http://web.wi.mit.edu/young/hES_PRC/TableS1.xls
Table S2. HUGO/EntrezGene identifiers for RNA Pol II bound, annotated
genes.
http://web.wi.mit.edu/young/hES_PRC/TableS2.xls
Table S3. RNA polymerase II-bound regions that predict novel gene
candidates.
http://web.wi.mit.edu/young/hES_PRC/TableS3.xls
Table S4. Gene models bound by RNA polymerase II.
http://web.wi.mit.edu/young/hES_PRC/TableS4.xls
Table S5. MicroRNA genes bound by RNA polymerase II and Suz12 in ES
cells.
http://web.wi.mit.edu/young/hES_PRC/TableS5.xls
Table S6. Expression of genes bound by RNA polymerase II in ES cells.
http://web.wi.mit.edu/young/hES_PRC/TableS6.xls
Table S7. Regions bound by Suz12 and their relationship to known and
predicted genes.
http://web.wi.mit.edu/young/hES_PRC/TableS7.xls
Table S8. HUGO/EntrezGene identifiers for Suz12-bound, annotated
genes.
http://web.wi.mit.edu/young/hES_PRC/TableS8.xls
Table S9. Detection of Suz12, Eed and H3K27me3 occupancy using
promoter arrays.
http://web.wi.mit.edu/young/hES_PRC/TableS9.xls
Table S10. Enriched gene ontologies among RNA Pol II-bound and
Suz12-bound genes.
http://web.wi.mit.edu/young/hES_PRC/TableS10.xls
Table S11. Developmental transcription factors bound by Suz12.
http://web.wi.mit.edu/young/hES_PRC/TableS11.xls
Table S12. Developmental signaling proteins bound by Suz12.
http://web.wi.mit.edu/young/hES_PRC/TableS12.xls
Table S13. Expression of Suz12-bound genes during ES cell
differentiation.
http://web.wi.mit.edu/young/hES_PRC/TableS13.xls
Table S14. Genes bound by Suz12 in ES cells and upregulated in Suz12
-/- mouse cells.
http://web.wi.mit.edu/young/hES_PRC/TableS14.xls
Table S15. Developmental regulators associated with PRC2 in ES cells
and muscle.
http://web.wi.mit.edu/young/hES_PRC/TableS15.xls
Figure Legends
Figure S1. Human H9 ES cells cultured on a low density of irradiated
murine embryonic fibroblasts.
Bright-field image of H9 cell culture.
Figure S2. Analysis of human ES cells for markers of pluripotency.
Human embryonic stem cells were analyzed by immunohistochemistry for
the characteristic pluripotency markers Oct4 and SSEA-3. For
reference, nuclei were stained with DAPI. Our analysis indicated that
>90% of the ES cell colonies were positive for Oct4 and SSEA-3.
Alkaline phosphatase activity was also strongly detected in human ES
cells.
Figure S3. Analysis of human ES cells for differentiation potential.
Teratomas were analyzed for the presence of markers for ectoderm
(Tuj1), mesoderm (MF20) and endoderm (AFP). For reference, nuclei are
stained with DAPI. Antibody reactivity was detected for derivatives
of all three germ layers, confirming that the human embryonic stem
cells used in our analysis have maintained differentiation potential.
Figure S4. The fraction of annotated promoters bound by RNA
polymerase II or Suz12.
The fraction of unique gene transcription start sites that lie within
1 kb of a genomic region bound by RNA polymerase II and Suz12. The
total number of start sites in each database is as follows: MGC
n=17,188; RefSeq n=19,349; Ensembl n=30,121; UCSC Known Genes
n=42,160; H-Inv n=42,777.
Figure S5. Estimating error rates.
a. Example gel images showing PCR products amplified from 16 genomic
regions judged to be bound by RNA polymerase II using the
whole-genome arrays. Each primer pair was used to amplify unenriched,
whole cell extract (WCE) DNA (90, 30 and 10 ng) and immunoenriched
(IP) DNA (10 ng). Enrichment in the IP DNA is indicated by a “+” and
a lack of enrichment by a “-”. PCR reactions judged to be
inconclusive were labeled with an “N”.
b. Example gel images showing PCR products amplified from 16 genomic
regions judged not to be bound by RNA polymerase II using the
whole-genome arrays. Each genomic region represents an annotated
transcription start site.
c. Receiver-operator curve for RNA polymerase II binding in human ES
cells. Curve compares percentage of true positives and false
positives in binding events called from ChIP/chip compared to RT-PCR
amplifications of anti-Pol II ChIP DNA. ROC curves were determined
for all regions of the genome (blue) and for the subset of regions
located within 1 kb of known transcription start sites (red).
Figure S6. Co-occupation of gene promoters by Suz12, Eed and
H3K27me3.
Suz12 occupancy (top panel), Eed occupancy (middle panel) and
H3K27me3 occupancy (bottom panel) at transcription start sites. Each
row represents a gene considered occupied by either Suz12, Eed or
H3K27me3 using our high-confidence gene calling algorithm (see
sections on Data Normalization and Analysis and Identification of
Bound Regions). The same genes are illustrated in each of the three
panels. Each column represents the data from an oligonucleotide probe
positioned relative to the start site as indicated by the gene
diagram below. The log binding ratios for each oligo are plotted for
each protein; blue indicates enrichment of the immunoprecipitated
factor (enrichment ratio >1). A scale for the binding ratios for each
panel is shown. Each factor follows the same binding pattern. From
this we conclude that Suz12, Eed and H3K27me3 are present at
essentially the same set of genes and that our stringent gene calling
algorithm sometimes calls a gene bound by one factor but not another
factor because of the inherent false negative rate of ~30% (see
Estimating Error Rates section).
Figure S7. Protein domain classification of Suz12- and Pol II-bound
transcription factors.
Fraction of transcription factor categories bound by Suz12 (green) or
RNA Pol II (blue). The percentage is expressed relative to all
transcription factor genes assigned to that category by InterPro
domain (PANDORA) annotation at the default resolution. Abbreviations
are b-HLH (basic helix-loop-helix), NHR (nuclear hormone receptor),
ETS (erythroblast transformation specific), b-Zip (basic leucine
zipper), PHD finger (plant homeodomain finger), SMAD (Sma- and
Mad-related) and FHA (forkhead-associated). n indicates the number of
transcription factor genes assigned to a given category.
Figure S8. Suz12 occupies large regions of DNA.
Number of RNA polymerase II (blue bars, left-hand axis) and Suz12
(green bars, right-hand axis) bound regions of certain sizes (x
axis). Unlike RNA polymerase II, Suz12 occupies over 2 kb of sequence
at a significant number of genes.
Figure S9. H3K27me3 co-occupies large domains with Suz12.
a. Correlation between size of domains of Suz12 binding and H3K27me3
binding. The trend was calculated by computing the moving average of
the size of H3K27me3 regions using a sliding window of 20 genes
across the set of genes bound by Suz12 and H3K27me3 and ordered by
size of Suz12-bound region. Sizes of bound regions were calculated
from promoter arrays.
b. Binding profile of H3K27me3 (black) across ~500 kb regions
encompassing Hox clusters A-D. Unprocessed enrichment ratios for all
probes within a genomic region are shown (ChIP vs. whole genomic
DNA). Approximate Hox cluster region sizes are indicated within black
bars.
Figure S10. Generation of Suz12 -/- cells.
a. Targeted deletion of the Suz12 locus. Homologous recombination was
used to replace the 5' portion of Suz12 with a neo cassette. Location
of probe used for Southern blot verification in (b) is shown.
Restriction enzymes are denoted B, BamH1; E, EcoR1; X, Xba1.
b. Southern blot analysis of BamH1-digested genomic DNA from each
genotype.
c. Western blot analysis of whole cell extracts derived from each
genotype. Immunoblots were probed with anti-Suz12 (top) or anti-Lamin
B (bottom).
d. Embryos generated from Suz12 heterozygous crosses were analyzed at
different stages of development. At 7.75 dpc, normal as well as
morphologically smaller embryos were evident. Genotyping analysis
indicated that the abnormal embryos were homozygous for the Suz12
null allele, confirming that Suz12 is required for early development.
Figure S11. Binding of Suz12 in differentiated muscle.
a. Suz12 binding profiles across the muscle regulator MYOD1 gene in
H9 human ES cells
(green) and differentiated myotubes (grey). The plots show
unprocessed enrichment ratios for all probes within a genomic region
(ChIP vs. whole genomic DNA). Genes are shown to scale below plots
(exons are represented by vertical bars). The start and direction of
transcription are noted by arrows.
b. H3K27me3 profiles across the muscle regulator MYOD1 gene in H9
human ES cells (black) and differentiated myotubes (blue). The plots
show unprocessed enrichment ratios for all probes within a genomic
region (ChIP vs. whole genomic DNA). Genes are shown to scale below
plots (exons are represented by vertical bars). The start and
direction of transcription are noted by arrows.
c. Suz12 binding profiles across the muscle regulator PAX3 gene, as
in a.
d. H3K27me3 profiles across the muscle regulator PAX3 gene, as in b.
e. Suz12 binding profiles across the muscle regulator PAX7 gene, as
in a.
f. H3K27me3 profiles across the muscle regulator PAX7 gene, as in b.
Figure S12. Detection of genes bound by RNA polymerase II and Suz12
in human ES cell expression datasets.
Percentages of genes that are bound by RNA polymerase II only, RNA
polymerase II and Suz12, and Suz12 only that are detected in 7 ES
cell expression datasets and one differentiated cell expression
dataset. The first four ES cell datasets and the differentiated cell
dataset were generated using gene expression arrays (H1: U133A arrays
(Sato et al., 2003); H9, HSF1 and HSF6: U133A+B arrays (Abeyta et
al., 2004); differentiated tissues: U133A arrays (Su et al., 2004)).
The percentages are relative to the fraction of bound genes that are
represented on the arrays. The last three ES cell datasets were
generated using MPSS (Brandenberger et al., 2004; Wei et al., 2005).
Figure S13. Relationship between size of Suz12 and RNA polymerase II
co-occupancy and gene expression.
The percentage of genes with detectable RNA (grey bars) and
associated with RNA polymerase II (blue bars) as a function of the
extent of Suz12 binding. The frequencies for genes not bound by Suz12
are indicated on the left as controls.
Figure S14. Association of Oct4, Sox2 or Nanog with Suz12-bound
regions.
a. The percentage of Oct4-bound regions (purple arrow), Sox2-bound
regions (red arrow) or Nanog-bound regions (green arrow) that overlap
with Suz12-bound regions is shown along the x-axis. Comparisons were
made between promoter array data from Boyer et al., 2005 and
whole-genome Suz12 data presented here. The dashed line indicates the
distribution of the expected overlap based on randomized data. For
comparison, we also show the results for a fourth transcription
factor, E2F4 (blue arrow).
b. The percentage of Sox2 and Oct4-cobound regions or Nanog and
Oct4-bound regions (purple arrows) that overlap with Suz12-bound
regions is shown along the x-axis. Comparisons were made between
promoter array data from Boyer et al., 2005 and whole-genome Suz12
data presented here. The dashed line indicates the distribution of
the expected overlap based on randomized data. For comparison, we
also show the results for Sox2-bound regions that are not bound by
Oct4 (red arrow) and Nanog-bound regions that are not bound by Oct4
(green arrow).
Figure S15. Motifs associated with DNA regions that are bound by
Oct4, Sox2, Nanog and Suz12 or bound by Oct4, Sox2 and Nanog.
a. Consensus sequence of a motif associated with DNA regions bound by
Oct4, Sox2, Nanog and Suz12. This motif was found in approximately
50% of the regions bound by Oct4, Sox2, Nanog and Suz12 and enriched
4.8-fold compared to regions bound by Oct4, Sox2 and Nanog but not
Suz12.
b. Consensus sequence of a motif associated with DNA regions bound by
Oct4, Sox2 and Nanog and not bound by Suz12. This motif was found in
approximately 20% of the regions bound by Oct4, Sox2, Nanog and
enriched 3.0-fold compared to regions bound by Oct4, Sox2, Nanog and
Suz12. Putative transcription factor binding sites are labeled and
indicated by black lines. Binding sites were identified with P-Match
(http://www.gene-regulation.com) using the input sequence
CCTGTAATCCCAGC and cut-off selection for matrix group to minimize the
sum of false positives and negatives.
c. Consensus sequence of a motif associated with DNA regions bound by
Oct4, Sox2 and Nanog and not bound by Suz12. This motif was found in
approximately 15% of the regions bound by Oct4, Sox2, Nanog and
enriched 2.4-fold compared to regions bound by Oct4, Sox2, Nanog and
Suz12. No putative transcription factor binding sites were identified
when examined as described in b. using the input sequence
ATCTCGGCTCACTG. More lenient selections for the cut-off selection
indicate potential binding sites for C/EBP, HoxA3, CdxA, Msx-1 and
v-Myb.
Supplementary References
Abeyta, M. J., Clark, A. T., Rodriguez, R. T., Bodnar, M. S., Pera,
R. A., and Firpo, M. T. (2004). Unique gene expression signatures of
independently-derived human embryonic stem cell lines. Hum Mol Genet
13, 601-608.
Akasaka, T., Kanno, M., Balling, R., Mieza, M. A., Taniguchi, M., and
Koseki, H. (1996). A role for mel-18, a Polycomb group-related
vertebrate gene, during the anteroposterior specification of the
axial skeleton. Development 122, 1513-1522.
Bailey, T. L., and Elkan, C. (1995). The value of prior knowledge in
discovering motifs with MEME. Proc Int Conf Intell Syst Mol Biol 3,
21-29.
Bartel, D. P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and
function. Cell 116, 281-297.
Bateman, A., Birney, E., Cerutti, L., Durbin, R., Etwiller, L., Eddy,
S. R., Griffiths-Jones, S., Howe, K. L., Marshall, M., and
Sonnhammer, E. L. (2002). The Pfam protein families database. Nucleic
Acids Res 30, 276-280.
Birve, A., Sengupta, A. K., Beuchle, D., Larsson, J.,
Kennison,