Supporting Appendix for: MicroRNA-15a and 16-1 Act Via MYB to Elevate Fetal Hemoglobin Expression in Human Trisomy 13 Vijay G. Sankaran, Tobias F. Menne, Danilo Šćepanović, Jo-Anne Vergilio, Peng Ji, Jinkuk Kim, Prathapan Thiru, Stuart H. Orkin, Eric S. Lander, and Harvey F. Lodish* *To whom correspondence should be addressed. E-mail: [email protected]
SI Materials and Methods
Cell Culture
293T cells were maintained in DMEM with 10% FCS and 2% penicillin/streptomycin. These cells
were transfected with the FuGene 6 (Roche) reagent according to the manufacturer’s protocol.
K562 erythroleukemia cells were cultured in RPMI-1640 medium with 10% FCS, 2%
penicillin/streptomycin, and 1% L-glutamine. Cells were maintained at a density of
0.1-1 × 10^6 cells/ml.
Culture and differentiation of primary human CD34+ cells were performed similarly to a
previously described protocol (1). Briefly, CD34+ cells were obtained from magnetically sorted
mononuclear samples of G-CSF-mobilized peripheral blood from donors and were frozen after
isolation. Cells were obtained from the Yale Center of Excellence in Molecular Hematology
(YCEMH). Cells were thawed, washed into RPMI-1640 with 10% FCS, and then seeded in
StemSpan SFEM Medium (StemCell Technologies Inc.) with 1X CC100 cytokine mix (StemCell
Technologies Inc.) and 2% penicillin/streptomycin. Cells were maintained in this expansion
medium at a density of 0.1-1 × 10^6 cells/ml, with media changes every second or third day
as necessary. Cells were kept in expansion medium for a total of 6 days. On day 6, cells were
reseeded into StemSpan SFEM Medium with 2% P/S, 20 ng/ml SCF, 1 U/ml Epo, 5 ng/ml IL-3,
2 μM dexamethasone, and 1 μM β-estradiol. Cells were maintained in differentiation medium,
with media changes every second or third day as needed, at a density of 0.1-1 × 10^6 cells/ml.
By day 3 of differentiation, homogeneous larger blasts were present in the culture. By day 5, the
majority of cells had proerythroblast morphology, and on day 7 the majority of the cells had
basophilic erythroblast morphology. By day 12 of differentiation, the majority of cells
demonstrated orthochromatophilic and polychromatophilic erythroblast morphology. This
morphological classification has been confirmed using phenotypic markers of erythropoiesis,
including CD235, CD71, CD45, and CD36 expression.
Constructs
A 628 bp genomic DNA fragment from human chromosome 13 containing the hairpin region of
miR-15a and 16-1 and 200 bp of flanking sequence on each side was isolated by PCR from human
genomic DNA and was cloned into the XhoI/XbaI and FseI/PacI sites of the pLVX-puro
(Clontech) and pSMPUW-puro (Cell BioLabs, Inc.) lentiviral expression vectors, respectively
(core primer sequences: forward 5’-GGGCACAGAATGGACTTCAG-3’; reverse
5’-GATGGCATTCAATACAATTATTA-3’).
The 1.21 kb 3’-UTR of MYB was cloned into the XhoI and NotI sites of the psiCheck2 vector
(Promega) after PCR amplification from human genomic DNA.
shRNA lentiviral constructs targeting human MYB were obtained from the Sigma-Aldrich
Mission shRNA collection; the clones used in this study were TRCN0000009853 and
TRCN0000040058. The sequences of the shRNAs encoded by these clones are (respectively):
Each subject is either affected (elevated HbF) or unaffected. This information is stored in a
binary vector $\vec{v}_A = [\,1\ 0\ \cdots\ 1\,]^T$, also of length $N$, where 1 signifies an
affected subject.
To summarize the data, we can construct probability distributions of trisomy of various bands in
the affected and unaffected populations (shown in Figures 1B and S1 in the paper):

$$P(\text{Trisomy} \mid \text{Affected}) = \frac{\mathbf{S}^T \vec{v}_A}{\vec{1}^T \vec{v}_A}$$

$$P(\text{Trisomy} \mid \text{Unaffected}) = \frac{\mathbf{S}^T (\vec{1} - \vec{v}_A)}{\vec{1}^T (\vec{1} - \vec{v}_A)}$$

Here, $\vec{1}$ is a column vector of length $N$ that simply serves either to sum the elements of a
vector or to reverse the bits in $\vec{v}_A$. Thus, $\vec{1}^T \vec{v}_A$ is the total number of
affected subjects in the data set, $\vec{1} - \vec{v}_A$ is a vector analogous to $\vec{v}_A$ but
with 1s for unaffected subjects, and $\vec{1}^T (\vec{1} - \vec{v}_A)$ is the total number of
unaffected subjects. For a full list of symbols used in this document, refer to the Mathematical
Glossary.
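A minimal NumPy sketch of the two summary distributions above, assuming $\mathbf{S}$ is stored as an $N \times B$ 0/1 array and $\vec{v}_A$ as a length-$N$ 0/1 array. The function name `trisomy_distributions` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np


def trisomy_distributions(S, v_A):
    """Per-band P(Trisomy | Affected) and P(Trisomy | Unaffected).

    S   : (N, B) 0/1 subject trisomy matrix, one row per subject.
    v_A : (N,) 0/1 affected vector (1 = elevated HbF).
    """
    S = np.asarray(S, dtype=float)
    v_A = np.asarray(v_A, dtype=float)
    ones = np.ones_like(v_A)              # the vector 1 of length N
    v_U = ones - v_A                      # 1 - v_A: 1s for unaffected subjects
    p_aff = S.T @ v_A / (ones @ v_A)      # S^T v_A / (1^T v_A)
    p_unaff = S.T @ v_U / (ones @ v_U)    # S^T (1 - v_A) / (1^T (1 - v_A))
    return p_aff, p_unaff


# Hypothetical toy data: 4 subjects (rows) by 3 bands (columns).
S = [[1, 1, 0],
     [1, 0, 0],
     [0, 1, 1],
     [0, 0, 1]]
v_A = [1, 1, 0, 0]
p_aff, p_unaff = trisomy_distributions(S, v_A)
```

With this toy input, band 1 is trisomic in both affected subjects and in no unaffected subject, so its two entries are 1 and 0.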
Model Notation and Assumptions
We test a number of models designed to compute the probability that a particular band X of
chromosome 13 contains the gene responsible for elevation of HbF. This is similar to the model
used by Korbel et al. in their study of segmental trisomy 21 (17). However, we also model gene
silencing by supposing that there exists a regulatory region, which is necessary for the genes in
band X to be expressed or, alternatively, to ensure that the genes in band X are not silenced.
We assume that if the gene responsible for elevation of HbF is expressed in trisomy, then that
subject has a probability $P_*$ of being affected (or $1 - P_*$ of being unaffected). On the other
hand, if the responsible gene is not expressed in trisomy (either because band $X$ is not present
in trisomy, or because it is present in trisomy but the regulatory region is not), then the subject
has a probability $P_0$ of being affected ($1 - P_0$ of being unaffected), with $P_0 < P_*$.
In all models, we have a single parameter $X$, and we wish to compute $P(X = x \mid \text{data}, M)$
as in Equation (1). There are two other potential parameters, $P_*$ and $P_0$, but their values can
be ascertained from full trisomy data. The vast majority of individuals with full trisomy 13 have
elevated HbF, and since full trisomy implies that the gene of interest is certainly expressed in
trisomy, we set $P_* = 0.8$. On the other hand, HbF elevation is very rare in individuals without
trisomy 13, so we set $P_0 = 0.05$. We also test these assumptions and show that, as long as
$P_0 < P_*$, on average we obtain the same answer as if we assume the aforementioned explicit
values for $P_*$ and $P_0$.
Finally, we consider the prior probability of $X$, $P(X = x)$ (how strongly we believe that the
gene of interest resides on a particular band $x$ before we see any data). There are many ways
to derive a valid prior; for example, we could say that the probability of a gene being on a
particular band is proportional to the number of genes contained within that band. For simplicity,
we use a uniform prior, in which each band is considered equally likely to contain the gene of
interest. In the remainder of the document, all the equations contain the general term
$P(X = x)$, but in the numerical analysis we use $P(X = x) = 1/B$ for all $x$.
General Probability Structure
For each model, we compute the probability $P(\text{data} \mid X = x, M)$. Given the definitions of
$P_*$ and $P_0$, the conceptual equation of this probability is:

$$P(\text{data} \mid X = x, M) = P_*^{N_{ATxE}}\, P_0^{N_{ADx} + N_{ATxS}}\, (1 - P_*)^{N_{UTxE}}\, (1 - P_0)^{N_{UDx} + N_{UTxS}} \tag{2}$$
where $N_{ATxE}$ is the number of subjects who are affected ($A$) with elevated HbF and in whom
band $x$ is expressed ($E$) in trisomy ($T$); $N_{ADx}$ is the number of subjects who are affected
and in whom band $x$ is present in disomy ($D$); and $N_{ATxS}$ is the number of subjects who are
affected and in whom band $x$ is present in trisomy but is silenced ($S$). $N_{UTxE}$, $N_{UDx}$,
and $N_{UTxS}$ are the analogues for the unaffected ($U$) population. In the equation for each
model, we show in matrix-vector notation how each of these counts is calculated from the data
($\mathbf{S}$ and $\vec{v}_A$).
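Because Equation (2) multiplies many probabilities, its value becomes very small. A sketch that evaluates its logarithm instead, assuming the six counts have already been computed (they may be fractional in Models 2-4), is shown below. The function name `log_likelihood`, the dictionary keys, and the example counts are hypothetical; the defaults are the values $P_* = 0.8$ and $P_0 = 0.05$ used in the text.

```python
import math


def log_likelihood(counts, p_star=0.8, p_0=0.05):
    """Natural log of Equation (2) for one band x.

    counts: mapping with keys "ATxE", "ADx", "ATxS", "UTxE", "UDx", "UTxS".
    """
    return (counts["ATxE"] * math.log(p_star)
            + (counts["ADx"] + counts["ATxS"]) * math.log(p_0)
            + counts["UTxE"] * math.log(1 - p_star)
            + (counts["UDx"] + counts["UTxS"]) * math.log(1 - p_0))


# Hypothetical counts for a single band.
example = {"ATxE": 2, "ADx": 1, "ATxS": 1, "UTxE": 1, "UDx": 3, "UTxS": 0}
ll = log_likelihood(example)
```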
The denominator of Equation (1) is simply Equation (2) multiplied by the prior $P(X = x)$ and
summed over all possible values of $x$ (by the law of total probability):

$$P(\text{data} \mid M) = \sum_{k=1}^{B} P(\text{data} \mid X = k, M)\, P(X = k)$$
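Equation (1) can be evaluated from per-band log-likelihoods with the log-sum-exp trick, which keeps the normalization numerically stable. A minimal sketch, assuming the uniform prior $P(X = k) = 1/B$ of the numerical analysis unless another prior is supplied; `posterior_and_evidence` is a hypothetical name.

```python
import numpy as np


def posterior_and_evidence(log_lik, log_prior=None):
    """Equation (1): normalise P(data | X = k, M) P(X = k) over k = 1..B.

    log_lik   : length-B array of log P(data | X = k, M).
    log_prior : length-B array of log P(X = k); uniform (1/B) if omitted.
    Returns (posterior probabilities over bands, log P(data | M)).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    if log_prior is None:
        log_prior = np.full(log_lik.size, -np.log(log_lik.size))
    joint = log_lik + log_prior                       # log numerator per band
    top = joint.max()
    log_evidence = top + np.log(np.exp(joint - top).sum())  # log-sum-exp
    return np.exp(joint - log_evidence), log_evidence


# Hypothetical two-band example.
posterior, log_evidence = posterior_and_evidence(np.log([0.1, 0.3]))
```

Here the posterior is [0.25, 0.75] and $P(\text{data} \mid M) = 0.2$.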
Model 1
In Model 1, we assume that no other regulatory region is necessary for expression of the genes
on band X . This is equivalent to assuming that a regulatory region does exist, but that it also
resides on band X . For this model, the posterior probability distribution for X is:
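$$
P(X = x \mid \text{data}, M_1) = \frac{P_*^{\vec{v}_A^T \vec{s}_x}\, P_0^{\vec{v}_A^T (\vec{1} - \vec{s}_x)}\, (1 - P_*)^{(\vec{1} - \vec{v}_A)^T \vec{s}_x}\, (1 - P_0)^{(\vec{1} - \vec{v}_A)^T (\vec{1} - \vec{s}_x)}\, P(X = x)}{\sum_{k=1}^{B} P_*^{\vec{v}_A^T \vec{s}_k}\, P_0^{\vec{v}_A^T (\vec{1} - \vec{s}_k)}\, (1 - P_*)^{(\vec{1} - \vec{v}_A)^T \vec{s}_k}\, (1 - P_0)^{(\vec{1} - \vec{v}_A)^T (\vec{1} - \vec{s}_k)}\, P(X = k)}
$$

where $\vec{s}_x = \mathbf{S}\vec{\delta}_x$ is the $x$th column of $\mathbf{S}$.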
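For Model 1, the four exponents are inner products of $\vec{v}_A$ or $\vec{1} - \vec{v}_A$ with $\vec{s}_x$ or $\vec{1} - \vec{s}_x$, so they can be computed for every band at once. A NumPy sketch returning $\log P(\text{data} \mid X = x, M_1)$ for each band; the function name and the two-subject data are hypothetical, and the defaults are $P_* = 0.8$, $P_0 = 0.05$.

```python
import numpy as np


def model1_log_lik(S, v_A, p_star=0.8, p_0=0.05):
    """log P(data | X = x, M1) for every band x at once.

    S: (N, B) 0/1 trisomy matrix; v_A: (N,) 0/1 affected vector.
    """
    S = np.asarray(S, dtype=float)
    v_A = np.asarray(v_A, dtype=float)
    v_U = 1.0 - v_A                  # 1 - v_A
    n_ATE = v_A @ S                  # v_A^T s_x for each column x
    n_AD = v_A @ (1.0 - S)           # v_A^T (1 - s_x)
    n_UTE = v_U @ S                  # (1 - v_A)^T s_x
    n_UD = v_U @ (1.0 - S)           # (1 - v_A)^T (1 - s_x)
    return (n_ATE * np.log(p_star) + n_AD * np.log(p_0)
            + n_UTE * np.log(1.0 - p_star) + n_UD * np.log(1.0 - p_0))


# Hypothetical data: subject 1 (affected) has trisomy of band 1 only,
# subject 2 (unaffected) has trisomy of band 2 only.
log_lik = model1_log_lik([[1, 0], [0, 1]], [1, 0])
```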
Model 2
In Model 2, we assume that a regulatory region exists somewhere between the p terminus and
$X$, but not within $X$. We also assume that the regulatory region is equally likely to exist in
any of the bands between the p terminus and $X$ (probability of $1/(x - 1)$ for each band). The
posterior probability distribution for $X$ is similar to that of Model 1, but with a different set of
exponents:
$$
P(X = x \mid \text{data}, M_2) = \frac{P_*^{\vec{v}_A^T \mathbf{D}_x \vec{L}_x}\, P_0^{\vec{v}_A^T (\vec{1} - \vec{s}_x + \mathbf{D}_x (\vec{1} - \vec{L}_x))}\, (1 - P_*)^{(\vec{1} - \vec{v}_A)^T \mathbf{D}_x \vec{L}_x}\, (1 - P_0)^{(\vec{1} - \vec{v}_A)^T (\vec{1} - \vec{s}_x + \mathbf{D}_x (\vec{1} - \vec{L}_x))}\, P(X = x)}{\sum_{k=1}^{B} P_*^{\vec{v}_A^T \mathbf{D}_k \vec{L}_k}\, P_0^{\vec{v}_A^T (\vec{1} - \vec{s}_k + \mathbf{D}_k (\vec{1} - \vec{L}_k))}\, (1 - P_*)^{(\vec{1} - \vec{v}_A)^T \mathbf{D}_k \vec{L}_k}\, (1 - P_0)^{(\vec{1} - \vec{v}_A)^T (\vec{1} - \vec{s}_k + \mathbf{D}_k (\vec{1} - \vec{L}_k))}\, P(X = k)} \tag{3}
$$

For clarity, we use the vector $\vec{s}_x$ as in Model 1, and define a diagonal matrix containing the
vector $\vec{s}_x$ on the main diagonal, $\mathbf{D}_x = \operatorname{diag}(\vec{s}_x)$, and a vector
containing the fraction of bands between the p terminus and band $x$ (excluding band $x$) that
are trisomied:

$$\vec{L}_x = \frac{\mathbf{S}\vec{\delta}_{<x}}{x - 1}$$

To understand the meaning of $\vec{L}_x$, consider the two subjects in Figure 1, and assume they
are both affected. Subject 1 (S1) has trisomy of bands 1-6, and subject 2 (S2) has trisomy of
bands 4-10. If we are looking at band $x = 5$ and we assume that the regulatory region is between
the p terminus (left side) and band 5, then S1 contains the regulatory region within its trisomied
segment with probability 1, and therefore S1 contributes 1 to the $N_{ATxE}$ count. S2, however,
contains trisomy of band 5 but only of one of the four bands that could contain the regulatory
region. Since each of the four bands (1-4) has equal probability of containing the regulatory
region, S2 has a 0.25 chance of also having trisomy of the regulatory region and being subject to
$P_*$, and a 0.75 chance of not containing the regulatory region within its trisomied region, in
which case $P_0$ applies. Therefore, S2 contributes 0.25 to the $N_{ATxE}$ count and 0.75 to the
$N_{ATxS}$ count. In general, each entry of $\vec{L}_x$ is the fraction of the bands that could
contain the regulatory region that are trisomied in that subject.
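The same bookkeeping can be checked numerically. The sketch below uses 1-based band indices as in the text, computes $\vec{L}_x$, and returns the four exponents of Equation (3), using the identity $\vec{1} - \vec{s}_x + \mathbf{D}_x(\vec{1} - \vec{L}_x) = \vec{1} - \mathbf{D}_x\vec{L}_x$. Because $\vec{L}_x$ divides by $x - 1$, the sketch is only defined for $x \ge 2$; how the terminal band is handled in the analysis is not stated here. The function names are hypothetical. Applied to the two subjects of Figure 1 at $x = 5$, it reproduces the contributions worked out above (1 + 0.25 to $N_{ATxE}$ and 0.75 to $N_{ATxS}$).

```python
import numpy as np


def fraction_before(S, x):
    """L_x = S delta_{<x} / (x - 1); x is a 1-based band index, x >= 2."""
    S = np.asarray(S, dtype=float)
    return S[:, : x - 1].sum(axis=1) / (x - 1)


def regulated_counts(S, v_A, x, F):
    """Exponents of Equation (3) for band x, given the fraction vector F.

    Returns (N_ATxE, N_ADx + N_ATxS, N_UTxE, N_UDx + N_UTxS).
    """
    S = np.asarray(S, dtype=float)
    v_A = np.asarray(v_A, dtype=float)
    v_U = 1.0 - v_A
    s_x = S[:, x - 1]                  # x-th column of S (1-based x)
    expressed = s_x * F                # D_x F, since D_x is diagonal
    not_expressed = 1.0 - expressed    # equals 1 - s_x + D_x (1 - F)
    return (v_A @ expressed, v_A @ not_expressed,
            v_U @ expressed, v_U @ not_expressed)


# The two hypothetical affected subjects of Figure 1 (10 bands).
S = np.array([[1] * 6 + [0] * 4,       # S1: trisomy of bands 1-6
              [0] * 3 + [1] * 7])      # S2: trisomy of bands 4-10
L_5 = fraction_before(S, 5)
counts = regulated_counts(S, [1, 1], 5, L_5)
```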
Figure 1 Partial trisomy of two hypothetical affected subjects, S1 and S2, shown against bands 1-10 between the p and q termini.
Model 3
Model 3 is similar to Model 2, except that we assume that the regulatory region exists
somewhere between $X$ and the q terminus. The other assumptions of Model 2 remain: the
regulatory region is not within $X$, and it is equally likely to be in any of the bands between $X$
and the q terminus. The equation for Model 3 is:
$$
P(X = x \mid \text{data}, M_3) = \frac{P_*^{\vec{v}_A^T \mathbf{D}_x \vec{G}_x}\, P_0^{\vec{v}_A^T (\vec{1} - \vec{s}_x + \mathbf{D}_x (\vec{1} - \vec{G}_x))}\, (1 - P_*)^{(\vec{1} - \vec{v}_A)^T \mathbf{D}_x \vec{G}_x}\, (1 - P_0)^{(\vec{1} - \vec{v}_A)^T (\vec{1} - \vec{s}_x + \mathbf{D}_x (\vec{1} - \vec{G}_x))}\, P(X = x)}{\sum_{k=1}^{B} P_*^{\vec{v}_A^T \mathbf{D}_k \vec{G}_k}\, P_0^{\vec{v}_A^T (\vec{1} - \vec{s}_k + \mathbf{D}_k (\vec{1} - \vec{G}_k))}\, (1 - P_*)^{(\vec{1} - \vec{v}_A)^T \mathbf{D}_k \vec{G}_k}\, (1 - P_0)^{(\vec{1} - \vec{v}_A)^T (\vec{1} - \vec{s}_k + \mathbf{D}_k (\vec{1} - \vec{G}_k))}\, P(X = k)} \tag{4}
$$
where we replace $\vec{L}_x$ with

$$\vec{G}_x = \frac{\mathbf{S}\vec{\delta}_{>x}}{B - x},$$

which contains the fraction of trisomied bands between band $x$ and the q terminus (besides the
different denominators, note that in $\vec{L}_x$ we only look at bands less than $x$, but in
$\vec{G}_x$ we only look at bands greater than $x$).
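A one-function sketch of $\vec{G}_x$, assuming 1-based band indices and $x \le B - 1$ (the denominator $B - x$ vanishes for the last band). The result can be used wherever $\vec{L}_x$ was used for Model 2. The name `fraction_after` is hypothetical, and the data are the two subjects of Figure 1.

```python
import numpy as np


def fraction_after(S, x):
    """G_x = S delta_{>x} / (B - x); x is a 1-based band index, x <= B - 1."""
    S = np.asarray(S, dtype=float)
    B = S.shape[1]
    return S[:, x:].sum(axis=1) / (B - x)   # bands x+1..B in 1-based terms


# Figure 1 subjects: S1 has trisomy of bands 1-6, S2 of bands 4-10.
S = np.array([[1] * 6 + [0] * 4,
              [0] * 3 + [1] * 7])
G_5 = fraction_after(S, 5)   # S1: 1 of bands 6-10 trisomied; S2: all 5
```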
Model 4
In Model 4, we assume that there is a regulatory region, but we make no restriction on where it
can be. We still assume that it is equally likely to be in any of the bands except $X$. The
equation for this probability distribution is similar to Equations (3) and (4), except that we define
a vector that contains the fraction of all bands outside of band $X$ that are trisomied:

$$\vec{O}_x = \frac{\mathbf{S}(\vec{1}_B - \vec{\delta}_x)}{B - 1}$$

Using this definition and the same definitions of $\vec{s}_x$ and $\mathbf{D}_x$ from above, we can
write the distribution for this model as:
$$
P(X = x \mid \text{data}, M_4) = \frac{P_*^{\vec{v}_A^T \mathbf{D}_x \vec{O}_x}\, P_0^{\vec{v}_A^T (\vec{1} - \vec{s}_x + \mathbf{D}_x (\vec{1} - \vec{O}_x))}\, (1 - P_*)^{(\vec{1} - \vec{v}_A)^T \mathbf{D}_x \vec{O}_x}\, (1 - P_0)^{(\vec{1} - \vec{v}_A)^T (\vec{1} - \vec{s}_x + \mathbf{D}_x (\vec{1} - \vec{O}_x))}\, P(X = x)}{\sum_{k=1}^{B} P_*^{\vec{v}_A^T \mathbf{D}_k \vec{O}_k}\, P_0^{\vec{v}_A^T (\vec{1} - \vec{s}_k + \mathbf{D}_k (\vec{1} - \vec{O}_k))}\, (1 - P_*)^{(\vec{1} - \vec{v}_A)^T \mathbf{D}_k \vec{O}_k}\, (1 - P_0)^{(\vec{1} - \vec{v}_A)^T (\vec{1} - \vec{s}_k + \mathbf{D}_k (\vec{1} - \vec{O}_k))}\, P(X = k)} \tag{5}
$$
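A corresponding sketch of $\vec{O}_x$ for Model 4, again with 1-based band indices; the name `fraction_outside` is hypothetical and the data are the two subjects of Figure 1. Each subject's trisomied band count, minus band $x$ itself, is divided by the $B - 1$ candidate bands.

```python
import numpy as np


def fraction_outside(S, x):
    """O_x = S (1_B - delta_x) / (B - 1); x is a 1-based band index."""
    S = np.asarray(S, dtype=float)
    B = S.shape[1]
    return (S.sum(axis=1) - S[:, x - 1]) / (B - 1)


# Figure 1 subjects: S1 has trisomy of bands 1-6, S2 of bands 4-10.
S = np.array([[1] * 6 + [0] * 4,
              [0] * 3 + [1] * 7])
O_5 = fraction_outside(S, 5)
```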
In Models 2-4, we assumed that the regulatory region does not exist within the same band as
the gene of interest. For completeness, we also consider models in which band $X$ is a possible
location of the regulatory region. We name these models M2I-M4I, since they are inclusive of
band $X$. The equations are:
Model 2I
We define $\vec{L}_{xI}$ to also include band $x$:

$$\vec{L}_{xI} = \frac{\mathbf{S}\vec{\delta}_{\le x}}{x}$$
$$
P(X = x \mid \text{data}, M_{2I}) = \frac{P_*^{\vec{v}_A^T \mathbf{D}_x \vec{L}_{xI}}\, P_0^{\vec{v}_A^T (\vec{1} - \vec{s}_x + \mathbf{D}_x (\vec{1} - \vec{L}_{xI}))}\, (1 - P_*)^{(\vec{1} - \vec{v}_A)^T \mathbf{D}_x \vec{L}_{xI}}\, (1 - P_0)^{(\vec{1} - \vec{v}_A)^T (\vec{1} - \vec{s}_x + \mathbf{D}_x (\vec{1} - \vec{L}_{xI}))}\, P(X = x)}{\sum_{k=1}^{B} P_*^{\vec{v}_A^T \mathbf{D}_k \vec{L}_{kI}}\, P_0^{\vec{v}_A^T (\vec{1} - \vec{s}_k + \mathbf{D}_k (\vec{1} - \vec{L}_{kI}))}\, (1 - P_*)^{(\vec{1} - \vec{v}_A)^T \mathbf{D}_k \vec{L}_{kI}}\, (1 - P_0)^{(\vec{1} - \vec{v}_A)^T (\vec{1} - \vec{s}_k + \mathbf{D}_k (\vec{1} - \vec{L}_{kI}))}\, P(X = k)} \tag{6}
$$
Model 3I
We define $\vec{G}_{xI}$ to also include band $x$:

$$\vec{G}_{xI} = \frac{\mathbf{S}\vec{\delta}_{\ge x}}{B - x + 1}$$
$$
P(X = x \mid \text{data}, M_{3I}) = \frac{P_*^{\vec{v}_A^T \mathbf{D}_x \vec{G}_{xI}}\, P_0^{\vec{v}_A^T (\vec{1} - \vec{s}_x + \mathbf{D}_x (\vec{1} - \vec{G}_{xI}))}\, (1 - P_*)^{(\vec{1} - \vec{v}_A)^T \mathbf{D}_x \vec{G}_{xI}}\, (1 - P_0)^{(\vec{1} - \vec{v}_A)^T (\vec{1} - \vec{s}_x + \mathbf{D}_x (\vec{1} - \vec{G}_{xI}))}\, P(X = x)}{\sum_{k=1}^{B} P_*^{\vec{v}_A^T \mathbf{D}_k \vec{G}_{kI}}\, P_0^{\vec{v}_A^T (\vec{1} - \vec{s}_k + \mathbf{D}_k (\vec{1} - \vec{G}_{kI}))}\, (1 - P_*)^{(\vec{1} - \vec{v}_A)^T \mathbf{D}_k \vec{G}_{kI}}\, (1 - P_0)^{(\vec{1} - \vec{v}_A)^T (\vec{1} - \vec{s}_k + \mathbf{D}_k (\vec{1} - \vec{G}_{kI}))}\, P(X = k)} \tag{7}
$$
Model 4I
We define $\vec{O}_{xI}$ to also include band $x$:

$$\vec{O}_{xI} = \frac{\mathbf{S}\vec{1}_B}{B}$$
$$
P(X = x \mid \text{data}, M_{4I}) = \frac{P_*^{\vec{v}_A^T \mathbf{D}_x \vec{O}_{xI}}\, P_0^{\vec{v}_A^T (\vec{1} - \vec{s}_x + \mathbf{D}_x (\vec{1} - \vec{O}_{xI}))}\, (1 - P_*)^{(\vec{1} - \vec{v}_A)^T \mathbf{D}_x \vec{O}_{xI}}\, (1 - P_0)^{(\vec{1} - \vec{v}_A)^T (\vec{1} - \vec{s}_x + \mathbf{D}_x (\vec{1} - \vec{O}_{xI}))}\, P(X = x)}{\sum_{k=1}^{B} P_*^{\vec{v}_A^T \mathbf{D}_k \vec{O}_{kI}}\, P_0^{\vec{v}_A^T (\vec{1} - \vec{s}_k + \mathbf{D}_k (\vec{1} - \vec{O}_{kI}))}\, (1 - P_*)^{(\vec{1} - \vec{v}_A)^T \mathbf{D}_k \vec{O}_{kI}}\, (1 - P_0)^{(\vec{1} - \vec{v}_A)^T (\vec{1} - \vec{s}_k + \mathbf{D}_k (\vec{1} - \vec{O}_{kI}))}\, P(X = k)} \tag{8}
$$
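The three inclusive vectors differ from their exclusive counterparts only in which columns are summed and in the denominator. A minimal sketch with 1-based band indices and a hypothetical function name; unlike $\vec{L}_x$ and $\vec{G}_x$, these never divide by zero, since the denominators $x$, $B - x + 1$, and $B$ are all at least 1 for $1 \le x \le B$.

```python
import numpy as np


def inclusive_fractions(S, x):
    """Return (L_xI, G_xI, O_xI) for a 1-based band index x."""
    S = np.asarray(S, dtype=float)
    B = S.shape[1]
    L_xI = S[:, :x].sum(axis=1) / x                 # bands 1..x
    G_xI = S[:, x - 1:].sum(axis=1) / (B - x + 1)   # bands x..B
    O_xI = S.sum(axis=1) / B                        # all B bands
    return L_xI, G_xI, O_xI


# Figure 1 subjects: S1 has trisomy of bands 1-6, S2 of bands 4-10.
S = np.array([[1] * 6 + [0] * 4,
              [0] * 3 + [1] * 7])
L_5I, G_5I, O_5I = inclusive_fractions(S, 5)
```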
Results
Figure 2 shows the Bayesian score (BS) of each of the 7 models. Judging from these values,
Model 2 is the most appropriate.
[Figure 2 plot, "Model Bayesian Scores": Bayesian score (y-axis, 0 to 7 × 10^-12) for each model (x-axis: M1, M2, M3, M4, M2I, M3I, M4I).]
Figure 2 Model M2 has the highest BS, followed closely by models M1 and M2I. Although it cannot be distinguished from the figure, the order of BS for the remaining models is M4I, M4, M3I, M3 (from highest to lowest).
We show the probability distributions according to all 7 models in Figure 3. It is interesting to
note that Models 1, 4, and 4I all suggest 13q12 as the band most likely to contain the gene
related to elevation of HbF. However, the model that best explains the data (Model 2) peaks
at 13q14. Since Model 2 is the best model according to Figure 2, we conclude that 13q14
contains the gene causing elevation of HbF.
The next logical question concerns our choice of $P_*$ and $P_0$: how sensitive are the BS and the
most likely band to the choice of these values? To answer this question, we computed the
probability distributions and BSs for each model for a set of $P_*$ and $P_0$ values, maintaining only
the condition that $P_0 < P_*$. Figure 4A shows that the actual values of $P_*$ and $P_0$ affect which is
the most likely model. There are four regions: Model 1 is most appropriate for low values of
and relatively low values of , Model 3 is most appropriate for high values of and high
values of , while Model 2 is most appropriate for the majority of ($P_*$, $P_0$) values, with a
subsection containing smaller values of owned by Model 2I. Figure 4B shows that the BSs
corresponding to the most likely models from panel A are actually fairly well contained in the
($P_*$, $P_0$) space, being highest around and . Figure 4C shows that if we
choose the most appropriate model for each ($P_*$, $P_0$) combination and then choose the band
with the highest posterior probability, band 13q14 dominates the majority of the parameter
space. Smaller sections of the parameter space favor bands 13q11-13q13 and the terminal
bands pter and qter. Figure 4D shows the actual posterior probabilities of the corresponding
Figure 3 Posterior probability distributions according to all 7 models. The majority of models favor band 13q12; however, Model 2, which is the most likely according to the BS criterion, points to band 13q14.
[Figure 4 panels, each plotted over $P_0$ and $P_*$: (A) model with the highest BS (M1, M2, M3, M4, M2I, M3I, M4I); (B) BS of the most appropriate model (scale 0 to 3.5 × 10^-11); (C) most likely band from the most appropriate model (pter, p12, p11.2, p11.1, q11, q12, q13, q14, q21, q22, q31, q32, q33, qter); (D) probability of the most likely band from the most appropriate model (scale 0 to 1).]
Figure 4 The most likely model for a particular ($P_*$, $P_0$) pair is shown in panel A. Panel B shows the actual BS of the corresponding models in A. Panel C shows the most likely band from the distributions corresponding to the most likely models in A, and panel D shows the actual probability of the corresponding band in panel C.
To summarize the ($P_*$, $P_0$) perturbation study detailed in Figure 4, we compute the average over
all values of $P_*$ and $P_0$ of the probability that each band is the highest-probability band
according to the most likely model. In other words: for each value of $P_*$ and $P_0$, compute the
BS of each model, select the model with the highest BS ($M^*$), use $P(X = x \mid \text{data}, M^*)$ to
select the band with the highest probability ($x^*$), and record the value
$P(X = x^*, \text{data} \mid M^*)$, which captures both the likelihood of the particular band containing
the gene of interest and the appropriateness of the model in explaining all the data. Then, for
each band $b$, compute the average of $P(X = x^*, \text{data} \mid M^*)$ over all the $P_*$ and $P_0$ for
which $x^* = b$. This result is shown in Figure 5, where we clearly see that, regardless of the choice
of $P_*$ and $P_0$, if we pick the most appropriate model for the situation, band 13q14 most likely
contains the gene responsible for elevation of HbF.
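The perturbation summary described above can be sketched as a loop over a grid of $(P_*, P_0)$ pairs. This is a sketch under several assumptions: the BS is taken to be the marginal likelihood $P(\text{data} \mid M)$ (the denominator of Equation (1)) under a uniform prior; `sweep` and `log_lik_fn` are hypothetical names, the latter a caller-supplied function returning the per-band values $\log P(\text{data} \mid X = k, M)$ for a model; and, following the wording above, each band's value is averaged only over the grid points at which it is the top band.

```python
import numpy as np
from collections import defaultdict


def sweep(log_lik_fn, models, grid):
    """Average P(X = x*, data | M*) per winning band over a (P*, P0) grid.

    log_lik_fn(model, p_star, p_0) -> length-B array of log P(data | X = k, M).
    grid: iterable of (p_star, p_0) pairs; pairs violating P0 < P* are skipped.
    Returns {0-based band index b: mean P(X = x*, data | M*) over points with x* = b}.
    """
    records = defaultdict(list)
    for p_star, p_0 in grid:
        if not p_0 < p_star:
            continue
        best_score, best_joint = -np.inf, None
        for model in models:
            log_lik = np.asarray(log_lik_fn(model, p_star, p_0), dtype=float)
            joint = log_lik - np.log(log_lik.size)          # uniform prior 1/B
            top = joint.max()
            score = top + np.log(np.exp(joint - top).sum())  # log P(data | M)
            if score > best_score:
                best_score, best_joint = score, joint
        x_star = int(np.argmax(best_joint))                 # top band under M*
        records[x_star].append(float(np.exp(best_joint[x_star])))
    return {b: float(np.mean(v)) for b, v in records.items()}


# Hypothetical two-model, two-band example with parameter-independent likelihoods.
def toy_log_lik(model, p_star, p_0):
    return np.log([0.1, 0.2]) if model == "A" else np.log([0.3, 0.05])


summary = sweep(toy_log_lik, ["A", "B"], [(0.8, 0.05), (0.5, 0.6)])
```

In the toy call, the second grid point is skipped because it violates $P_0 < P_*$; at the first, model B has the larger score (0.175 vs. 0.15), so band index 0 is recorded with value 0.15.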
Figure 5 Probability that a band contains the gene responsible for elevated HbF according to the most appropriate model ($M^*$), averaged over all values of $P_*$ and $P_0$.
Conclusion
We constructed seven models to calculate the probability that a particular band of chromosome
13 houses the gene responsible for elevation of HbF when the gene is expressed in trisomy.
The likelihood of each model was evaluated using a data set containing 57 partial trisomy cases
with documented levels of HbF. The most likely model assumes that a single band contains the
gene of interest, and that a regulatory region for that gene exists between the p terminus and
the gene itself. According to this model, the gene implicated in elevation of HbF is located on
band 13q14. This conclusion is not sensitive to the particular values of $P_*$ (probability of
elevated HbF if the responsible gene is expressed in trisomy) and $P_0$ (probability of elevated
HbF if the responsible gene is not expressed in trisomy) assumed in the model.
Mathematical Glossary
Here we present a quick reference for all the symbols used in the text above. All vectors are
column vectors; their transposes (superscript $T$) are shown in the definitions below.

$\mathbf{S}$: subject trisomy matrix, with one subject on each row ($N$ subjects) and one chromosome band in each column ($B$ bands); a 1 means trisomy of a particular band:

$$\mathbf{S} = \begin{bmatrix} \text{Subject } 1 \\ \text{Subject } 2 \\ \vdots \\ \text{Subject } N \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 1 & \cdots & 0 \\ 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 1 & \cdots & 1 \end{bmatrix}}_{B} \Bigg\}\, N$$

$\vec{v}_A$: affected vector; each subject is either affected (1) or unaffected (0): $\vec{v}_A = [\,1\ 0\ \cdots\ 1\,]^T$, length $N$.

$\vec{1}$: vector of ones, same size as $\vec{v}_A$: $\vec{1} = [\,1\ 1\ \cdots\ 1\,]^T$, length $N$.

$\vec{1}_B$: vector of ones, length $B$: $\vec{1}_B = [\,1\ 1\ \cdots\ 1\,]^T$.

$\vec{\delta}_x$: vector of zeros with a 1 in the position corresponding to $x$, length $B$: $\vec{\delta}_x = [\,0\ \cdots\ 0\ 1\ 0\ \cdots\ 0\,]^T$. $\vec{\delta}_{<x}$, $\vec{\delta}_{>x}$, etc. are vectors like $\vec{\delta}_x$ with ones where the subscript is true; for example, $\vec{\delta}_{<x}$ has ones for all bands less than $x$.

$\vec{s}_x = \mathbf{S}\vec{\delta}_x$: the $x$th column of $\mathbf{S}$.

$\mathbf{D}_x = \operatorname{diag}(\vec{s}_x)$: diagonal matrix with the vector $\vec{s}_x$ on the main diagonal.

Vector containing the fraction of bands between the p terminus and band $x$ that are trisomied: excluding $x$, $\vec{L}_x = \mathbf{S}\vec{\delta}_{<x}/(x - 1)$; including $x$, $\vec{L}_{xI} = \mathbf{S}\vec{\delta}_{\le x}/x$.

Vector containing the fraction of bands between band $x$ and the q terminus that are trisomied: excluding $x$, $\vec{G}_x = \mathbf{S}\vec{\delta}_{>x}/(B - x)$; including $x$, $\vec{G}_{xI} = \mathbf{S}\vec{\delta}_{\ge x}/(B - x + 1)$.

Vector containing the fraction of bands outside band $x$ that are trisomied: excluding $x$, $\vec{O}_x = \mathbf{S}(\vec{1}_B - \vec{\delta}_x)/(B - 1)$; including $x$, $\vec{O}_{xI} = \mathbf{S}\vec{1}_B/B$.
Table S1. Chromosomal Band 13q14 Gene Information and Relative Erythroid Expression.
Columns: Name; RefSeq; EntrezGene; CD71+
Table S2. Chromosomal Band 13q14 Promoters with Significant H3K4me3 Peaks in K562 Cells.
Gene Name
AK056182
AK095119
AK124928
ALG11
ATP7B
BC039553
C13orf1
CAB39L
CDADC1
CKAP2
CLLD8
CR625002
DHRS12
DKFZp434B105
DKFZp434F1622
DKFZp434H1720
DLEU2
EBPL
EBRP
GTF2F2
HNRNPA1L2
INTS6
ITM2B
KBTBD6
KIAA0564
KIAA1704
KPNA3
MRPS31
MTRF1
NARG1L
NUDT15
NUFIP1
pp13759
RB1
SETDB2
SLC25A15
SUGT1
TPT1
UTP14C
WBP4
WDF2
WDFY2
XTP6
ZC3H13
Table S3. Chromosomal Band 13q14 Genes with Elevated Expression in CD71+ Erythroid Progenitors and Significant H3K4me3 Promoter Peaks in K562 Cells.
Gene Name
CKAP2
DLEU2
MRPS31
MTRF1
RB1
SLC25A15
SUGT1
WBP4
ZC3H13
Table S4. Pathological Analysis of Hematopoiesis in Trisomy 13 Cases.
Identification Number; Age, Gender, and Phenotypic Information; Notes from Pathological Assessment

TR13-1: 3 month, female; Trisomy 13
- 90% cellular BM
- M:E = 1:1-2
- Full erythroid maturation
- 6 megas per 40x hpf, increased
- Mild EMH in spleen

TR13-2: 5 day, male; Trisomy 13
- >90% cellular BM
- M:E = 1-2:1
- Full erythroid maturation
- 2-3 megas per 40x hpf
- Mild EMH in liver

TR13-3: 1 day, male; Trisomy 13
- 100% cellular BM
- M:E = 1-2:1
- Full erythroid maturation
- 1-2 megas per 40x hpf
- Mild EMH in liver

TR13-4: 3 day, female; Trisomy D (unclear if karyotype done, phenotype compatible)
- 100% cellular BM
- M:E = 10:1
- Full erythroid maturation
- 2-3 megas per 40x hpf
- Mild EMH in liver

TR13-5: 1 day, female; Trisomy 13
- 100% cellular BM
- M:E = 10:1
- Rare erythroids, but seem mature
- Overall 1-2 megas per 40x hpf, focal clusters present
- Mild EMH in liver

TR13-6: 1 hr 18 min, male; Trisomy 13
- 100% cellular BM
- M:E = 3-5:1
- Full erythroid maturation
- 2-3 megas per 40x hpf
- Marked EMH in liver

TR13-7: 5 hr, male; Trisomy 13
- 100% cellular BM
- M:E = 10:1
- Only rare erythroids, maturation difficult to assess
- 1-2 megas per 40x hpf
- Moderate EMH in liver

TR13-8: 1 day, female; Trisomy 13 mosaic
- 100% cellular BM
- M:E = 1:3
- Full erythroid maturation
- 2-3 megas per 40x hpf
- Prominent EMH in liver

TR13-9: 4 day old, male; Trisomy 13 [46,XY,-D,t(DqDq)+]
- 100% cellular BM
- 5:1 M:E ratio
- Slight left shift of myeloid elements
- Full erythroid maturation
- Abnormal megakaryocyte nuclei are noted (hyperchromatic nuclei with “Staghorn” appearance), megas with patchy distribution t/o marrow, foci of increased megakaryocytes
- Mild EMH in liver

TR13-10: 7 day old, female; Trisomy 13 mosaic (present in 36% of cells)
- 100% cellular BM
- 1:1 to 1:2 M:E ratio
- Full erythroid maturation, but with increased immature forms (SEE PHOTO)
- Overall 2-3 megas per hpf, patchy distribution
- Megakaryocytes with abnormal nuclei (small hyperchromatic nuclei and “Staghorn” appearance, SEE PHOTO)
- No extramedullary hematopoiesis noted in liver
- Spleen with mild hematopoiesis

TR13-11: 11 day old, female; Trisomy 13
- 100% cellular BM
- >5:1 M:E ratio
- Some left shift of myeloid elements
- Full erythroid maturation
- 5-10 megas per 40x hpf (increased), abnormal nuclei (small hyperchromatic, SEE PHOTO)
- No extramedullary hematopoiesis noted in liver
- Mild hematopoiesis in spleen

TR13-12: 6 hour old, male; Trisomy D1/13 (no genetic studies reported, but phenotypically compatible)
- Scant marrow space, appears 100% cellular
- 1:1 M:E ratio (Note: large number of hematogones excluded in this estimate)
- Full erythroid maturation
- Scattered megas seen (cytologically unremarkable, ? appropriate in number)
- Marked extramedullary hematopoiesis in liver, with a 1:1 M:E ratio present in the liver

TR13-13: 5 day old, female; Trisomy 13, ABO incompatibility with jaundice
- 100% cellular BM
- 2:1 M:E ratio -> varies to 1:1 in certain regions
- Full maturation of erythroid elements
- Megakaryocytes decreased in number, 0-1 per 40x hpf
- ? Liver (not examined) -> small foci of extramedullary hematopoiesis noted on pathology report

TR13-14: 1 year old, male [Trisomy 13 never confirmed; these were pre-mortem studies]
- 100% cellular BM (slightly hypercellular for age)
- M:E ratio 5-6:1
- Myeloid cells showing some left shift
- Full erythroid maturation noted
- On avg 3-4 megakaryocytes per 40x hpf, with small and condensed hyperchromatic nuclei (some with staghorn appearance)
- No extramedullary hematopoiesis was noted

TR13-15: 9 month old, female; Trisomy D1 (Trisomy 13)
- >95% cellular
- M:E ratio 1-3:1
- Slight left shift in myeloid cells
- Full maturation of erythroid elements
- Megakaryocytes show small hyperchromatic nuclei with “Staghorn” appearance
- Slight increase in megakaryocyte numbers noted per high power field (~5-7 megakaryocytes seen in most high power fields examined)
- Liver not available for review

TR13-16: 23 day old, female; Partial Trisomy D1. Mother gravida 8: one prior stillborn, four first trimester abortions, two normal kids (one male and one female). Similarities and differences with classic phenotype of trisomy 13.
- Appropriate cellularity for age (>95% cellular)
- Increased hematogones
- M:E ratio is 5:1
- Slight left shift in myeloid cells
- Full erythroid maturation visualized
- Megakaryocytes show small, “Staghorn” appearing, hyperchromatic nuclei
- Slight increase in megakaryocyte numbers, 5-10 per high power field is noted
- Small minor clusters of hematopoiesis in the liver are noted (with both erythroid and myeloid elements seen)

TR13-17: 25 day old, female; Trisomy D1 (Trisomy 13)
- Appropriate cellularity for age (>95% cellular)
- M:E ratio is 3-5:1 (morphology is poor)
- Left shift in myeloid cells is observed
- Full erythroid maturation is seen
- Megakaryocytes show small hyperchromatic “Staghorn” nuclei
- ~2 megas per high power field, but given limited cellularity, appears increased
- No observable hematopoiesis in the liver (multiple sections assessed)
Supplementary Figure Legends
Fig. S1. Schematic of partial trisomy 13 cases with elevated and normal fetal hemoglobin (HbF)
levels. The figure on the left shows the 14 cases with elevated levels of HbF, from which the
proportion of each chromosomal region involved in cases with elevated HbF is derived in Figure
1B. The figure on the right shows the 43 cases with normal HbF levels, from which the
proportion of each chromosomal region involved in cases with normal HbF is derived in Figure
1B. Each case is shown in a vertical column and chromosomal regions included in each case
are shown in red.
Fig. S2. Relative expression in CD71+ erythroid precursors is shown for genes in the
chromosomal band 13q14 region. Relative expression is shown as a log ratio compared to a
panel of 78 other tissues (12). Genes that are known to play a role in erythropoiesis and globin
gene regulation consistently show a relative expression > 1 using this approach (including
BCL11A, GATA1, KLF1, and SOX6). All of these genes and their relative expression in CD71+
erythroid progenitors are shown in Table S1.
Fig. S3. Relative miR-15a (top, blue) and miR-16 (bottom, pink) expression at day 5 of
erythroid differentiation in primary human CD34+ derived cells transduced with pLVX-puro or
pLVX-miR-15a-16-1 lentiviruses. Quantification was performed using the ∆∆Ct method with
RNU19 as a control.
Fig. S4. Relative miR-15a (top, blue) and miR-16 (bottom, pink) expression in K562 cells
transduced with pSMPUW or pSMPUW-miR-15a-16-1 lentiviruses. Quantification was
performed using the ∆∆Ct method with RNU19 as a control.
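For reference, the ∆∆Ct quantification named in the two legends above is conventionally computed as $2^{-\Delta\Delta C_t}$, which assumes roughly 100% amplification efficiency for both the target and reference assays. A minimal sketch with RNU19 as the reference; the function name and the Ct values in the usage line are made up for illustration.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target RNA in a sample vs. a control sample (2^-ddCt).

    ct_ref and ct_ref_ctrl are Ct values of the reference RNA (here RNU19).
    Assumes ~100% amplification efficiency for both assays.
    """
    delta_sample = ct_target - ct_ref            # normalise sample to reference
    delta_control = ct_target_ctrl - ct_ref_ctrl  # normalise control to reference
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Made-up Ct values: relative to RNU19, the target crosses threshold 3 cycles
# earlier in the transduced sample than in the empty-vector control.
fold = relative_expression(22.0, 18.0, 25.0, 18.0)
```

With these made-up values, $\Delta\Delta C_t = -3$ and the fold change is $2^3 = 8$.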
Fig. S5. CD36 and CD45 staining of pLVX control (black) and pLVX-miR-15a-16-1 (cyan)
transduced erythroid progenitors on day 5 of differentiation shows similar levels of staining. The
FACS plots shown are representative of three independent samples. The extent of staining is
similar overall, with no major differences in mean fluorescence intensity for either marker. CD36
is expressed at this stage of erythropoiesis, and low levels of CD45 are also present, suggesting
that these cells are phenotypically similar. There is a slight tendency for the
pLVX-miR-15a-16-1 cells to be bigger, giving rise to correspondingly higher mean
fluorescence intensities of these markers. The high-intensity tail in the CD45 samples comes
from the small number of myeloid cells present in the culture.
Fig. S6. Comparison of aggregate PCT with relative expression in erythroid tissues
of interest. Graphs are shown for bone marrow, fetal liver, and K562 cells showing relative
expression (as a log2 expression ratio) compared to a panel of 78 other tissues (12). The MYB
gene is highlighted in red in all of the graphs. The x-axis plots aggregate PCT (14) on a linear
scale, while the y-axis shows relative expression in the various tissues as a log2 ratio.
Fig. S7. Relative normalized expression of γ-globin (HBG) and ε-globin (HBE) from a MYB
siRNA dataset in primary erythroid cord blood progenitors (GSE13110). The expression of γ-
globin (HBG) and ε-globin (HBE) was normalized with all genes in the dataset and plotted
relative to the expression of adult β-globin (HBB). The data from control, mock siRNA, and
MYB siRNA experiments are shown, respectively, from left to right. These data confirm the role
that MYB plays in silencing the fetal and embryonic globin genes in human erythroid
progenitors.
Fig. S8. Representative cytospins from control or shMYB 1 transduced cells on day 6 of
erythroid differentiation. The shMYB cells appear more mature, with smaller cell size and more
compact nuclei. Additionally, increased numbers of myeloid cells (monocytes) are found in
various cytospins, as exemplified by the image on the far right.
Fig. S9. Gene expression of known regulators of globin gene expression and switching.
Expression levels of the previously characterized regulators of globin gene expression,
BCL11A, GATA1, KLF1, ZFPM1, and SOX6 (13) are depicted after processing and normalizing
the microarray data from control cells and cells in which MYB has been knocked down with
shMYB (n = 4 per group). The data are depicted as log2 expression levels. The error bars
show the standard deviation of the mean.
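The per-group summary plotted in Fig. S9 can be sketched in Python. The sketch computes the mean of the replicate log2 values with the sample standard deviation (n - 1 denominator), and also the standard error of the mean (SD / √n) for comparison, since the two are easily confused; the function name `summarize_log2` and the replicate values are hypothetical.

```python
import math
import statistics


def summarize_log2(values):
    """Return (mean, sample SD, SEM) for one gene in one group.

    values: normalized log2 expression values, one per replicate.
    SD uses the n - 1 denominator (statistics.stdev); SEM = SD / sqrt(n).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates for a standard deviation")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean, sd, sd / math.sqrt(n)


# Hypothetical log2 values for one regulator (n = 4 per group).
groups = {
    "control": [8.0, 8.2, 7.8, 8.0],
    "shMYB": [7.1, 7.3, 6.9, 7.1],
}
for name, vals in groups.items():
    mean, sd, sem = summarize_log2(vals)
    print(f"{name}: mean = {mean:.2f}, SD = {sd:.3f}, SEM = {sem:.3f}")
```

With n = 4, the SEM is exactly half the SD, so the choice between them visibly changes the error-bar length.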
Fig. S10. Gene set enrichment analysis (GSEA) demonstrates that precocious erythroid
differentiation occurs with MYB knockdown. GSEA (6, 7) was used to examine whether a gene set
derived from the genes significantly upregulated in MYB shRNA cells relative to control cells
(188 significantly different genes in total) was enriched at later time points of erythroid
differentiation in G1E cells (9). For these comparisons, the 21-hour (top) and 30-hour (bottom)
time points were each compared with the 0-hour time point. The G1E expression data were
obtained from the Gene Expression Omnibus dataset GSE628 (9).
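The core statistic behind this analysis is the GSEA enrichment score (7): walking down a list of genes ranked by differential expression (here, a later time point versus 0 hours), a running sum steps up at members of the gene set and steps down otherwise, and the enrichment score is its maximum deviation from zero. The Python sketch below computes only this weighted running-sum score; it omits the permutation-based significance testing and normalization performed by the GSEA software, and it assumes gene identifiers in the set and the ranked list are already matched. The function name and toy genes are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set, weight=1.0):
    """Weighted running-sum enrichment score in the style of Subramanian et al. (2005).

    ranked_genes: list of (gene_id, metric) pairs sorted from most
        up-regulated to most down-regulated in the comparison.
    gene_set: set of gene_ids (e.g. genes upregulated with shMYB).
    weight: exponent p applied to |metric| at hits (p = 1 is the GSEA default).
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hit_weights = [abs(m) ** weight for g, m in ranked_genes if g in gene_set]
    n_hits = len(hit_weights)
    n_misses = len(ranked_genes) - n_hits
    total_hit_weight = sum(hit_weights)
    if n_hits == 0 or n_misses == 0 or total_hit_weight == 0:
        raise ValueError("need at least one weighted hit and at least one miss")
    running = 0.0
    best = 0.0
    for gene, metric in ranked_genes:
        if gene in gene_set:
            running += abs(metric) ** weight / total_hit_weight  # step up at a hit
        else:
            running -= 1.0 / n_misses  # step down at a miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list (hypothetical genes and metrics).
ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0), ("g4", -1.0)]
print(enrichment_score(ranked, {"g1", "g2"}))  # set at the top: positive score
print(enrichment_score(ranked, {"g3", "g4"}))  # set at the bottom: negative score
```

A gene set concentrated near the top of the ranking, as expected if MYB knockdown mimics later differentiation, produces a large positive score.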
Fig. S11. Marked upregulation of γ-globin gene expression upon knockdown of MYB in K562
cells. qRT-PCR was used to compare γ-globin gene expression in K562 cells transduced with
pSMPUW or pSMPUW-miR-15a-16-1 lentiviruses following selection. All experimental samples
are significantly different from the controls (p < 0.001, n = 3-4 per group).
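The legend does not state how relative γ-globin expression was calculated from the qRT-PCR data. A common approach is the comparative Ct (2^-ΔΔCt) method, sketched below in Python under the assumptions of near-equal amplification efficiencies and a reference (housekeeping) gene measured in every sample; the function name and Ct values are hypothetical.

```python
def fold_change_ddct(ct_target, ct_reference, ct_target_control, ct_reference_control):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Assumes roughly 100% amplification efficiency for both assays and a
    hypothetical reference gene measured in every sample.
    """
    delta_sample = ct_target - ct_reference                # normalize sample to reference gene
    delta_control = ct_target_control - ct_reference_control  # same for the control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical mean Ct values (not data from this study).
print(fold_change_ddct(25.0, 18.0, 28.0, 18.0))  # ddCt = -3, so 2 ** 3 = 8.0
```

Each cycle earlier that the target crosses threshold, after normalization, corresponds to roughly a twofold increase in starting template.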
Fig. S12. Alterations in cell cycle regulators upon MYB knockdown. Changes in the relative
expression of cell cycle regulatory genes in microarray data from shMYB and control samples are
shown. The graphs depict the decrease of certain positive cell cycle regulators and the increase
of certain negative cell cycle regulators. All data are depicted as the normalized mean ±
standard deviation (n = 4 per group).
Fig. S13. Normal megakaryocyte morphology on bone marrow histological sections. These
sections are shown at the same magnification as those in Fig. 4, and the samples were processed
and stained similarly. Two examples of normal megakaryocytes (with normal nuclear morphology)
are indicated with cyan arrows. All images are shown at 400X magnification, and slides were
stained with hematoxylin and eosin.
References
1. Sankaran VG, et al. (2008) Human fetal hemoglobin expression is regulated by the developmental stage-specific repressor BCL11A. Science 322(5909):1839-1842.
2. Sankaran VG, Orkin SH, & Walkley CR (2008) Rb intrinsically promotes erythropoiesis by coupling cell cycle exit with mitochondrial biogenesis. Genes Dev 22(4):463-475.
3. Sankaran VG, et al. (2009) Developmental and species-divergent globin switching are driven by BCL11A. Nature 460(7259):1093-1097.
4. Dai M, et al. (2005) Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res 33(20):e175.
5. Bianchi E, et al. (2010) c-myb supports erythropoiesis through the transactivation of KLF1 and LMO2 expression. Blood 116(22):e99-e110.
6. Mootha VK, et al. (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34(3):267-273.
7. Subramanian A, et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102(43):15545-15550.
8. Watkins NA, et al. (2009) A HaemAtlas: characterizing gene expression in differentiated human blood cells. Blood 113(19):e1-9.
9. Welch JJ, et al. (2004) Global regulation of erythroid gene expression by transcription factor GATA-1. Blood 104(10):3136-3147.
10. Gairdner D, Marks J, & Roscoe JD (1952) Blood formation in infancy. Part II. Normal erythropoiesis. Arch Dis Child 27(133):214-221.
11. Gairdner D, Marks J, & Roscoe JD (1952) Blood formation in infancy. Part I. The normal bone marrow. Arch Dis Child 27(132):128-133.
12. Su AI, et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A 101(16):6062-6067.
13. Sankaran VG, Xu J, & Orkin SH (2010) Advances in the understanding of haemoglobin switching. Br J Haematol 149(2):181-194.
14. Friedman RC, Farh KK, Burge CB, & Bartel DP (2009) Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19(1):92-105.
15. Rogers JF (1984) Clinical delineation of proximal and distal partial 13q trisomy. Clin Genet 25(3):221-229.
16. Tharapel SA, Lewandowski RC, Tharapel AT, & Wilroy RS, Jr. (1986) Phenotype-karyotype correlation in patients trisomic for various segments of chromosome 13. J Med Genet 23(4):310-315.
17. Korbel JO, et al. (2009) The genetic architecture of Down syndrome phenotypes revealed by high-resolution analysis of human segmental trisomies. Proc Natl Acad Sci U S A 106(29):12031-12036.