Functional Mapping: A statistical model for mapping dynamic genes
Feb 04, 2016
Simple regression model for univariate trait
Phenotype = Genotype + Error
$y_i = \sum_j x_{ij}\mu_j + e_i$, where $x_{ij}$ is the indicator for QTL genotype j of individual i, $\mu_j$ is the mean for genotype j, and $e_i \sim N(0, \sigma^2)$.
Recall: Interval mapping for a univariate trait
Problem: the QTL genotype is unobservable (missing data).
A simulation example (F2)
[Figure: Trait distribution; histogram of the overall trait distribution (frequency vs. trait value)]
The overall trait distribution is composed of three distributions, each coming from one of the three QTL genotypes QQ, Qq and qq.
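To make the mixture idea concrete, here is a minimal Python sketch that simulates an F2 sample at a single QTL. The genotype means use the QQ = m + a, Qq = m + d, qq = m − a parameterization, and the genotype frequencies are the Mendelian F2 ratios ¼ : ½ : ¼. The values m = 8, a = 3, d = 1, σ = 1.5 and the function name `simulate_f2` are hypothetical choices for illustration, not the settings behind the figure.

```python
import random

def simulate_f2(n, m=8.0, a=3.0, d=1.0, sigma=1.5, seed=2016):
    """Simulate n F2 individuals at one QTL (hypothetical parameter values).

    Genotype codes: 2 = QQ, 1 = Qq, 0 = qq, drawn with F2 frequencies 1/4, 1/2, 1/4.
    Each trait value is the genotype mean plus N(0, sigma^2) noise.
    """
    rng = random.Random(seed)
    means = {2: m + a, 1: m + d, 0: m - a}
    genotypes = rng.choices([2, 1, 0], weights=[0.25, 0.5, 0.25], k=n)
    traits = [rng.gauss(means[g], sigma) for g in genotypes]
    return genotypes, traits

genotypes, traits = simulate_f2(1000)
for g in (2, 1, 0):
    values = [y for gg, y in zip(genotypes, traits) if gg == g]
    print(g, len(values), round(sum(values) / len(values), 2))
```

A histogram of `traits` alone looks like a single (possibly lumpy) distribution; only the hidden `genotypes` reveal the three components.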
Solution: consider a finite mixture model
[Figure: three genotype-specific trait distributions along the trait axis, centred at m − a, m + d and m + a]
With QQ = m + a, Qq = m + d, qq = m − a
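We use a finite mixture model for estimating genotypic effects (F2):
$$y_i \sim p(y_i \mid \omega, \theta) = \omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)$$
QTL genotype (j): QQ, Qq, qq; code: 2, 1, 0.
$f_j(y_i)$ is a normal density with mean $\mu_j$ and variance $\sigma^2$, so $\theta = (\mu_2, \mu_1, \mu_0, \sigma^2)$, and $\omega = (\omega_{2|i}, \omega_{1|i}, \omega_{0|i})$ are the conditional probabilities of the QTL genotypes given the flanking markers.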
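The mixture density above can be evaluated directly. This is a minimal sketch assuming a common residual variance for all genotypes; the names `normal_pdf` and `mixture_density`, and the numeric values in the usage line, are hypothetical.

```python
import math

def normal_pdf(y, mu, sigma2):
    """f_j(y): density of N(mu, sigma2) evaluated at y."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def mixture_density(y, omega, mu, sigma2):
    """p(y | omega, theta) = sum over j = 2, 1, 0 of omega_j * f_j(y).

    omega:  dict {2: w2, 1: w1, 0: w0}, the individual's conditional QTL genotype
            probabilities given its flanking markers (should sum to 1).
    mu:     dict {2: mu2, 1: mu1, 0: mu0}, genotype means.
    sigma2: residual variance shared by the three genotypes.
    """
    return sum(omega[j] * normal_pdf(y, mu[j], sigma2) for j in (2, 1, 0))

mu = {2: 11.0, 1: 9.0, 0: 5.0}       # hypothetical m + a, m + d, m - a
omega = {2: 0.9, 1: 0.1, 0: 0.0}     # e.g. markers that strongly suggest QQ
print(mixture_density(10.5, omega, mu, 2.25))
```

The marker information enters only through `omega`: two individuals with the same phenotype but different flanking-marker genotypes get different mixture densities.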
Data Structure
(QQ, Qq and qq columns: conditional probability of the QTL genotype)
Subject  M1     M2     … Mm   Phenotype (y)   QQ (2)    Qq (1)    qq (0)
1        AA(2)  BB(2)  …      y1              ω_{2|1}   ω_{1|1}   ω_{0|1}
2        AA(2)  BB(2)  …      y2              ω_{2|2}   ω_{1|2}   ω_{0|2}
3        Aa(1)  Bb(1)  …      y3              ω_{2|3}   ω_{1|3}   ω_{0|3}
4        Aa(1)  Bb(1)  …      y4              ω_{2|4}   ω_{1|4}   ω_{0|4}
5        Aa(1)  Bb(1)  …      y5              ω_{2|5}   ω_{1|5}   ω_{0|5}
6        Aa(1)  bb(0)  …      y6              ω_{2|6}   ω_{1|6}   ω_{0|6}
7        aa(0)  Bb(1)  …      y7              ω_{2|7}   ω_{1|7}   ω_{0|7}
8        aa(0)  bb(0)  …      y8              ω_{2|8}   ω_{1|8}   ω_{0|8}
Human Development
Robbins 1928, Human Genetics, Yale University Press
Tree growth
It looks messy, but simple rules underlie the complexity.
The dynamics of gene expression
• Gene expression changes dynamically throughout the lifetime.
• The genetic factors that govern the development of an organism include:
  – genes expressed constantly throughout the lifetime (called deterministic genes)
  – genes expressed periodically (e.g., regulatory genes)
• Environmental factors such as nutrition, light and temperature also play a role.
• We are interested in identifying which gene(s) govern the dynamics of a developmental trait, using a procedure called Functional Mapping.
Stem diameter growth in poplar trees
Ma et al. (2002) Genetics
Poplar tree - height & diameter
Mouse growth
[Figure: weight vs. week (weeks 1 to 10) for individual mice; A: male, B: female]
Developmental Pattern of Genetic Effects
Wu and Lin (2006) Nat. Rev. Genet.
[Figure: developmental patterns of genetic effects for QTL genotypes QQ and Qq]
Data Structure
(Markers coded 2, 1, 0; QQ, Qq and qq columns: conditional probability of the QTL genotype)
Sample  Markers: 1  2  …  m   Phenotype: t1     t2      …  tT       QQ (2)    Qq (1)    qq (0)
1                2  2  …      y1(1)  y1(2)  …  y1(T)    ω_{2|1}   ω_{1|1}   ω_{0|1}
2                2  2  …      y2(1)  y2(2)  …  y2(T)    ω_{2|2}   ω_{1|2}   ω_{0|2}
3                1  1  …      y3(1)  y3(2)  …  y3(T)    ω_{2|3}   ω_{1|3}   ω_{0|3}
4                1  1  …      y4(1)  y4(2)  …  y4(T)    ω_{2|4}   ω_{1|4}   ω_{0|4}
5                1  1  …      y5(1)  y5(2)  …  y5(T)    ω_{2|5}   ω_{1|5}   ω_{0|5}
6                1  0  …      y6(1)  y6(2)  …  y6(T)    ω_{2|6}   ω_{1|6}   ω_{0|6}
7                0  1  …      y7(1)  y7(2)  …  y7(T)    ω_{2|7}   ω_{1|7}   ω_{0|7}
8                0  0  …      y8(1)  y8(2)  …  y8(T)    ω_{2|8}   ω_{1|8}   ω_{0|8}

Parents: AA × aa
F1: Aa × Aa
F2: AA, Aa, aa in the ratio ¼ : ½ : ¼
Mapping methods for dynamic traits
• Traditional approach: treat the trait measured at each time point as a separate univariate trait and map it with traditional QTL mapping approaches such as interval or composite interval mapping.
• Limitations:
  – A single-trait model ignores how gene expression changes over time, and it is too simple because it does not consider the underlying biological principles of development.
• A better approach: incorporate biological principles into the mapping procedure to understand the dynamics of gene expression, using a procedure called Functional Mapping (pioneered by Wu and his group).
A general framework, pioneered by Dr. Wu and his colleagues, to map QTLs that affect the pattern and form of development over time:
- Ma et al., Genetics 2002
- Wu et al., Genetics 2004 (highlighted in Nature Reviews Genetics)
- Wu and Lin, Nature Reviews Genetics 2006
While traditional genetic mapping combines classical genetics and statistics, functional mapping combines genetics, statistics and biological principles.
Functional Mapping (FunMap)
Data structure for an F2 population
Sample   Phenotype                    Marker
         y(1)   y(2)   …   y(T)       1    2    …   m
1        y11    y21    …   yT1        1    1    …   0
2        y12    y22    …   yT2        -1   1    …   1
3        y13    y23    …   yT3        -1   0    …   1
4        y14    y24    …   yT4        1    -1   …   0
5        y15    y25    …   yT5        1    1    …   -1
6        y16    y26    …   yT6        1    0    …   -1
7        y17    y27    …   yT7        0    -1   …   0
8        y18    y28    …   yT8        0    1    …   1
…
n        y1n    y2n    …   yTn        1    0    …   -1
There are nine groups of two-marker genotypes, 22, 21, 20, 12, 11, 10, 02, 01 and 00, with sample sizes n22, n21, …, n00. The conditional probabilities of the QTL genotypes QQ (2), Qq (1) and qq (0) given these marker genotypes are denoted ω_{2|i}, ω_{1|i} and ω_{0|i}.
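The conditional probabilities ω_{j|i} for the nine marker groups can be computed by enumerating F1 gametes. The sketch below assumes an F2 derived from MQN/mqn parents, no crossover interference, and known recombination fractions r1 (between the left marker and the QTL) and r2 (between the QTL and the right marker); the function names are hypothetical. Genotypes are coded 2, 1, 0 as the number of alleles inherited from the MQN parent.

```python
from itertools import product

def gamete_probs(r1, r2):
    """Frequencies of (M, Q, N) haplotypes among F1 gametes (1 = allele from MQN parent),
    assuming independent crossovers in the two intervals (no interference)."""
    probs = {}
    for m, q, n in product((1, 0), repeat=3):
        p1 = r1 if m != q else 1.0 - r1     # crossover between M and Q?
        p2 = r2 if q != n else 1.0 - r2     # crossover between Q and N?
        probs[(m, q, n)] = p1 * p2 / 2.0
    return probs

def qtl_conditional_probs(r1, r2):
    """omega[(mg, ng)][j] = P(QTL genotype j | left marker mg, right marker ng) in an F2."""
    gametes = gamete_probs(r1, r2)
    joint = {}
    # An F2 individual is the union of two independent F1 gametes.
    for (h1, p1), (h2, p2) in product(gametes.items(), repeat=2):
        key = (h1[0] + h2[0], h1[2] + h2[2])          # marker genotypes (2/1/0)
        by_q = joint.setdefault(key, {2: 0.0, 1: 0.0, 0: 0.0})
        by_q[h1[1] + h2[1]] += p1 * p2               # QTL genotype (2/1/0)
    omega = {}
    for key, by_q in joint.items():
        total = sum(by_q.values())
        if total > 0:                                # skip impossible marker classes
            omega[key] = {j: by_q[j] / total for j in (2, 1, 0)}
    return omega

omega = qtl_conditional_probs(0.05, 0.10)
print(omega[(2, 2)])   # nearly all mass on QQ
print(omega[(1, 1)])   # double heterozygote: mostly Qq
```

The nine keys of `omega` correspond to the marker groups 22, 21, …, 00; each individual simply looks up the row for its own marker genotypes.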
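Univariate interval mapping
$$L(y) = \prod_{i=1}^{n} \left[\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)\right]$$
$$f_j(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(y_i - \mu_j)^2}{2\sigma^2}\right\}, \quad j = 2, 1, 0 \text{ for QQ, Qq, qq}$$
The Lander-Botstein model estimates (μ2, μ1, μ0, σ², QTL position).

Multivariate interval mapping
$$L(\mathbf{y}) = \prod_{i=1}^{n} \left[\omega_{2|i} f_2(\mathbf{y}_i) + \omega_{1|i} f_1(\mathbf{y}_i) + \omega_{0|i} f_0(\mathbf{y}_i)\right]$$
Vector $\mathbf{y}_i = (y_i(1), y_i(2), \dots, y_i(T))$
$$f_j(\mathbf{y}_i) = \frac{1}{(2\pi)^{T/2} |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2} (\mathbf{y}_i - \mathbf{u}_j)^\top \Sigma^{-1} (\mathbf{y}_i - \mathbf{u}_j)\right\}$$
Vectors $\mathbf{u}_j = (\mu_{j1}, \mu_{j2}, \dots, \mu_{jT})$; residual variance-covariance matrix
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1T} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{T1} & \sigma_{T2} & \cdots & \sigma_T^2 \end{pmatrix}$$
The unknown parameters: (u2, u1, u0, Σ, QTL position), i.e., 3T + T(T−1)/2 + T parameters (3T genotype means, T(T−1)/2 covariances and T variances) in addition to the QTL position.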
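The multivariate mixture log-likelihood can be evaluated as below. This is a minimal NumPy sketch, assuming one covariance matrix Σ shared by all genotypes and ω arranged as an n × 3 array with columns j = 2, 1, 0; the function names are hypothetical. Each term is combined on the log scale to avoid numerical underflow when T is large.

```python
import numpy as np

def mvn_logpdf(y, mean, cov):
    """log f_j(y): log density of the T-variate normal N(mean, cov) at y."""
    diff = np.asarray(y, dtype=float) - np.asarray(mean, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    quad = diff @ np.linalg.solve(cov, diff)      # (y - u)^T Sigma^{-1} (y - u)
    return -0.5 * (diff.size * np.log(2.0 * np.pi) + logdet + quad)

def mixture_loglik(Y, omega, means, cov):
    """log L = sum_i log[omega_{2|i} f_2(y_i) + omega_{1|i} f_1(y_i) + omega_{0|i} f_0(y_i)].

    Y: (n, T) phenotypes; omega: (n, 3) conditional probabilities (j = 2, 1, 0);
    means: rows u_2, u_1, u_0; cov: (T, T) residual covariance matrix.
    """
    total = 0.0
    for y_i, w_i in zip(np.asarray(Y, dtype=float), np.asarray(omega, dtype=float)):
        log_terms = [np.log(w) + mvn_logpdf(y_i, u, cov)
                     for w, u in zip(w_i, means) if w > 0]
        total += np.logaddexp.reduce(log_terms)
    return float(total)

# Hypothetical toy data: two individuals measured at T = 3 time points.
Y = [[2.1, 3.9, 5.2], [1.0, 2.2, 2.9]]
omega = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]
means = [[2.0, 4.0, 5.5], [1.5, 3.0, 4.0], [1.0, 2.0, 3.0]]
cov = 0.25 * np.eye(3) + 0.05
print(mixture_loglik(Y, omega, means, cov))
```

With an unstructured Σ, every entry of `means` and `cov` is a free parameter, which is exactly the 3T + T(T−1)/2 + T count above.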
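Functional mapping: the framework
Observed phenotype: $\mathbf{y}_i = [y_i(1), \dots, y_i(T)] \sim \mathrm{MVN}(\mathbf{u}_j, \Sigma)$
Mean vector: $\mathbf{u}_j = [\mu_j(1), \mu_j(2), \dots, \mu_j(T)]$, j = 2, 1, 0
(Co)variance matrix:
$$\Sigma = \begin{pmatrix} \sigma^2(1) & \sigma(1,2) & \cdots & \sigma(1,T) \\ \sigma(2,1) & \sigma^2(2) & \cdots & \sigma(2,T) \\ \vdots & \vdots & \ddots & \vdots \\ \sigma(T,1) & \sigma(T,2) & \cdots & \sigma^2(T) \end{pmatrix}$$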
An innovative model for genetic dissection of complex traits by incorporating mathematical aspects of biological principles into a mapping framework
Functional Mapping
Provides a tool for cutting-edge research at the interplay between gene action and development
Functional mapping does not estimate (u2, u1, u0, Σ) directly; instead, it estimates the biologically meaningful parameters that determine them.
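The Finite Mixture Model
$$L(\omega, \Theta_p, \Theta_q \mid M, \mathbf{y}) = \prod_{i=1}^{n} \left[\omega_{2|i} f_2(\mathbf{y}_i) + \omega_{1|i} f_1(\mathbf{y}_i) + \omega_{0|i} f_0(\mathbf{y}_i)\right]$$
Three statistical issues:
• Modeling the mixture proportions (ω), i.e., genotype frequencies at a putative QTL
• Modeling the mean vector (Θ_p)
• Modeling the (co)variance matrix (Θ_q)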
Modeling the developmental mean vector
• Parametric approach:
  – Growth trajectories: logistic curve
  – HIV dynamics: bi-exponential function
  – Biological clock: Van der Pol equation
  – Drug response: Emax model
• Nonparametric approach:
  – Legendre function (orthogonal polynomial)
  – Spline techniques
Example: Stem diameter growth in poplar trees
Ma et al., Genetics 2002
Modeling the genotype-dependent mean vector, $\mathbf{u}_j = [u_j(1), u_j(2), \dots, u_j(T)]$:
$$\mathbf{u}_j = \left[\frac{a_j}{1 + b_j e^{-r_j}}, \frac{a_j}{1 + b_j e^{-2 r_j}}, \dots, \frac{a_j}{1 + b_j e^{-T r_j}}\right]$$
Instead of estimating $\mu_j$, we estimate the curve parameters $\Theta_p = (a_j, b_j, r_j)$.
Number of parameters to be estimated in the mean vector:
Time points   Traditional approach   Our approach
5             3 × 5 = 15             3 × 3 = 9
10            3 × 10 = 30            3 × 3 = 9
50            3 × 50 = 150           3 × 3 = 9
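The logistic mean vector and the parameter counts in the table can be reproduced with a few lines of Python. The function names and the curve values in the usage line are hypothetical; the counts assume three QTL genotypes and three curve parameters per genotype, as in the table.

```python
import math

def logistic_mean_vector(a, b, r, times):
    """u_j = [a / (1 + b exp(-r t)) for each measurement time t] for one genotype j."""
    return [a / (1.0 + b * math.exp(-r * t)) for t in times]

def n_mean_parameters(T, n_genotypes=3, curve_params=3):
    """(traditional, logistic) mean-vector parameter counts:
    one free mean per genotype and time point vs. (a_j, b_j, r_j) per genotype."""
    return n_genotypes * T, n_genotypes * curve_params

print(logistic_mean_vector(20.0, 10.0, 0.5, range(1, 6)))   # hypothetical a, b, r
for T in (5, 10, 50):
    print(T, n_mean_parameters(T))
# prints: 5 (15, 9) / 10 (30, 9) / 50 (150, 9)
```

The logistic count stays at 9 no matter how many time points are measured, which is where the dimension reduction comes from.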
Logistic Curve of Growth – A Universal Biological Law (West et al.: Nature 2001)
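Modeling the Covariance Matrix
• Stationary parametric approach: autoregressive (AR) model with log transformation
• Nonstationary parametric approach: structured antedependence (SAD) model; Ornstein-Uhlenbeck (OU) process
First-order autoregressive (AR(1)) covariance matrix:
$$\Sigma = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\ \rho & 1 & \rho & \cdots & \rho^{T-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1 \end{pmatrix}$$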
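The AR(1) matrix is fully determined by two numbers, σ² and ρ. This is a minimal pure-Python sketch of building it (the function name is hypothetical); it assumes equally spaced measurement times so that the correlation depends only on the lag |s − t|.

```python
def ar1_covariance(T, sigma2, rho):
    """T x T AR(1) covariance matrix: Sigma[s][t] = sigma2 * rho ** |s - t|."""
    if not -1.0 < rho < 1.0:
        raise ValueError("stationarity requires |rho| < 1")
    return [[sigma2 * rho ** abs(s - t) for t in range(T)] for s in range(T)]

print(ar1_covariance(3, 2.0, 0.5))
# prints: [[2.0, 1.0, 0.5], [1.0, 2.0, 1.0], [0.5, 1.0, 2.0]]
```

An unstructured Σ needs T + T(T−1)/2 parameters (55 for T = 10), whereas the AR(1) structure needs only (ρ, σ²).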
Functional interval mapping
$$L(\mathbf{y}) = \prod_{i=1}^{n} \left[\omega_{2|i} f_2(\mathbf{y}_i) + \omega_{1|i} f_1(\mathbf{y}_i) + \omega_{0|i} f_0(\mathbf{y}_i)\right]$$
Vector $\mathbf{y}_i = (y_i(1), y_i(2), \dots, y_i(T))$, and for j = 2, 1, 0:
$$f_j(\mathbf{y}_i) = \frac{1}{(2\pi)^{T/2} |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2} (\mathbf{y}_i - \mathbf{u}_j)^\top \Sigma^{-1} (\mathbf{y}_i - \mathbf{u}_j)\right\}$$
with the logistic mean vectors
$$\mathbf{u}_j = \left(\frac{a_j}{1 + b_j e^{-r_j}}, \frac{a_j}{1 + b_j e^{-2 r_j}}, \dots, \frac{a_j}{1 + b_j e^{-T r_j}}\right)$$
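Log-likelihood:
$$\log L(\Theta_p, \Theta_q \mid M, \mathbf{y}) = \sum_{i=1}^{n} \log \sum_{j=0}^{2} \omega_{j|i} f_j(\mathbf{y}_i)$$
Differentiating with respect to any parameter Θ in $(\Theta_p, \Theta_q)$:
$$\frac{\partial \log L}{\partial \Theta} = \sum_{i=1}^{n} \frac{\sum_{j=0}^{2} \omega_{j|i} \frac{\partial}{\partial \Theta} f_j(\mathbf{y}_i)}{\sum_{j'=0}^{2} \omega_{j'|i} f_{j'}(\mathbf{y}_i)} = \sum_{i=1}^{n} \sum_{j=0}^{2} \frac{\omega_{j|i} f_j(\mathbf{y}_i)}{\sum_{j'=0}^{2} \omega_{j'|i} f_{j'}(\mathbf{y}_i)} \frac{\partial}{\partial \Theta} \log f_j(\mathbf{y}_i)$$
The ratio in the last expression is the posterior probability that individual i carries QTL genotype j.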
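Combining the logistic mean vectors with an AR(1) covariance gives a log-likelihood that needs no matrix inversion. The sketch below relies on the standard AR(1) factorization: the first residual has variance σ², each later innovation e(t) − ρe(t−1) has variance σ²(1 − ρ²), and hence |Σ| = σ^{2T}(1 − ρ²)^{T−1}. It assumes equally spaced time points; the function names and the dict layout of ω are hypothetical.

```python
import math

def logistic(a, b, r, t):
    return a / (1.0 + b * math.exp(-r * t))

def ar1_mvn_logpdf(resid, sigma2, rho):
    """log f(resid) for a zero-mean AR(1) vector (variance sigma2, lag-k correlation
    rho**k), written as a product of the first value and the successive innovations."""
    innov_var = sigma2 * (1.0 - rho ** 2)
    logp = -0.5 * (math.log(2.0 * math.pi * sigma2) + resid[0] ** 2 / sigma2)
    for t in range(1, len(resid)):
        e = resid[t] - rho * resid[t - 1]
        logp += -0.5 * (math.log(2.0 * math.pi * innov_var) + e ** 2 / innov_var)
    return logp

def funmap_loglik(Y, omega, curves, sigma2, rho, times):
    """log L(Theta_p, Theta_q | M, y) = sum_i log sum_j omega_{j|i} f_j(y_i).

    Y:      list of trajectories y_i (one value per time in `times`)
    omega:  list of dicts {2: w, 1: w, 0: w}
    curves: dict {j: (a_j, b_j, r_j)}   (Theta_p)
    sigma2, rho: AR(1) parameters        (Theta_q)
    """
    total = 0.0
    for y_i, w_i in zip(Y, omega):
        log_terms = []
        for j in (2, 1, 0):
            if w_i[j] <= 0:
                continue
            a, b, r = curves[j]
            resid = [y - logistic(a, b, r, t) for y, t in zip(y_i, times)]
            log_terms.append(math.log(w_i[j]) + ar1_mvn_logpdf(resid, sigma2, rho))
        top = max(log_terms)                 # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(v - top) for v in log_terms))
    return total
```

Maximizing `funmap_loglik` over 9 curve parameters plus (ρ, σ²) replaces the 3T means and T(T+1)/2 covariance entries of the unstructured model.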
Estimation
The EM algorithm
• E step: calculate the posterior probability of QTL genotype j for individual i that carries a known marker genotype,
$$\Omega_{j|i} = \frac{\omega_{j|i} f_j(\mathbf{y}_i)}{\sum_{j'=0}^{2} \omega_{j'|i} f_{j'}(\mathbf{y}_i)}$$
• M step: solve the log-likelihood equations
$$\frac{\partial}{\partial \Theta} \log L(\Theta_p, \Theta_q \mid \mathbf{y}) = 0$$
• Iterations are made between the E and M steps until convergence.
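For the univariate interval-mapping model at a fixed QTL position, both EM steps have closed forms: the E step computes the posterior probabilities Ω_{j|i}, and the M step sets each μ_j to an Ω-weighted mean and σ² to the pooled weighted residual variance. The sketch below implements only that simpler case; for functional mapping the curve parameters (a_j, b_j, r_j) and (ρ, σ²) generally require numerical updates, which are not attempted here. The function names and the starting values are hypothetical.

```python
import math

def normal_pdf(y, mu, sigma2):
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def em_interval_mapping(y, omega, n_iter=200, tol=1e-8):
    """EM for y_i ~ sum_j omega_{j|i} N(mu_j, sigma2) at one fixed QTL position.

    y: list of phenotypes; omega: list of dicts {2: w, 1: w, 0: w}.
    Returns (mu, sigma2, log-likelihood).
    """
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    sd = math.sqrt(var_y)
    mu = {2: mean_y + 0.5 * sd, 1: mean_y, 0: mean_y - 0.5 * sd}   # crude start
    sigma2 = var_y
    old_ll = -math.inf
    for _ in range(n_iter):
        # E step: posterior probabilities Omega_{j|i}.
        post, ll = [], 0.0
        for y_i, w_i in zip(y, omega):
            terms = {j: w_i[j] * normal_pdf(y_i, mu[j], sigma2) for j in (2, 1, 0)}
            total = sum(terms.values())
            ll += math.log(total)
            post.append({j: terms[j] / total for j in (2, 1, 0)})
        # M step: weighted genotype means and pooled residual variance.
        for j in (2, 1, 0):
            weight = sum(p[j] for p in post)
            if weight > 0:
                mu[j] = sum(p[j] * v for p, v in zip(post, y)) / weight
        sigma2 = sum(p[j] * (v - mu[j]) ** 2
                     for p, v in zip(post, y) for j in (2, 1, 0)) / n
        if ll - old_ll < tol:
            break
        old_ll = ll
    return mu, sigma2, ll
```

When every ω_{j|i} is 0 or 1 (genotypes known), the posteriors equal the priors and EM reduces to ordinary group means in a single M step.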
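EM continued
The likelihood function:
$$L(\mathbf{z}) = \prod_{i=1}^{n} \left[\omega_{2|i} f_2(\mathbf{z}_i) + \omega_{1|i} f_1(\mathbf{z}_i) + \omega_{0|i} f_0(\mathbf{z}_i)\right]$$
$$f_j(\mathbf{z}_i) = \frac{1}{(2\pi)^{T/2} |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2} [\log \mathbf{z}_i - \log \mathbf{u}_j]^\top \Sigma^{-1} [\log \mathbf{z}_i - \log \mathbf{u}_j]\right\}$$
$$\mathbf{u}_j = \left(\frac{a_j}{1 + b_j e^{-r_j}}, \frac{a_j}{1 + b_j e^{-2 r_j}}, \dots, \frac{a_j}{1 + b_j e^{-T r_j}}\right)$$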
Statistical derivations
M step: update the parameters (see Ma et al. 2002, Genetics, for details).
Testing QTL effect: global test
• Instead of testing the mean difference between genotypes at every time point, we test the difference in the curve parameters.
• The existence of a QTL is tested by
$$H_0: (a_j, b_j, r_j) \equiv (a, b, r) \text{ for } j = 2, 1, 0 \quad \text{vs.} \quad H_1: \text{at least one of the equalities in } H_0 \text{ does not hold}$$
• H0 means the three mean curves overlap and there is no QTL effect.
• A likelihood ratio test is used, with permutation to assess significance:
$$LR = -2\left[\log L(\tilde{\Theta}) - \log L(\hat{\Theta})\right]$$
where the notation "~" and "^" indicates parameters estimated under the null and the alternative hypothesis, respectively.
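The permutation step can be sketched generically: shuffle phenotypes relative to markers, recompute the LR profile, and keep the maximum of each shuffled scan; the (1 − α) quantile of those maxima approximates a genome-wide threshold. In this sketch `lr_profile` is a hypothetical user-supplied function (for FunMap it would return the LR at every tested position). `toy_lr_profile` is a deliberately simple stand-in, a one-marker normal-model LR for a difference in group means, used only so the example runs.

```python
import math
import random

def permutation_threshold(y, markers, lr_profile, n_perm=1000, alpha=0.05, seed=1):
    """Approximate (1 - alpha) quantile of the maximum LR under permutation."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)   # shuffles whole phenotype entries, so trajectories stay intact
        maxima.append(max(lr_profile(y_perm, markers)))
    maxima.sort()
    k = min(int((1 - alpha) * n_perm), n_perm - 1)
    return maxima[k]

def toy_lr_profile(y, markers):
    """Hypothetical stand-in scan: for each marker, LR = n log(RSS0 / RSS1) for
    genotype-group means vs. one common mean (normal model, equal variances)."""
    n = len(y)
    grand = sum(y) / n
    rss0 = sum((v - grand) ** 2 for v in y)
    profile = []
    for geno in markers:
        groups = {}
        for g, v in zip(geno, y):
            groups.setdefault(g, []).append(v)
        rss1 = sum(sum((v - sum(vs) / len(vs)) ** 2 for v in vs) for vs in groups.values())
        profile.append(n * math.log(rss0 / rss1))
    return profile

y = [float(i % 7) + 0.1 * i for i in range(40)]                  # hypothetical data
markers = [[i % 2 for i in range(40)], [(i // 2) % 2 for i in range(40)]]
print(permutation_threshold(y, markers, toy_lr_profile, n_perm=200))
```

Taking the maximum over all positions in each permutation is what makes the threshold account for the multiple tests along the genome.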
Testing QTL effect: regional test
• Regional test: to test during which time period [t1, t2] the detected QTL triggers an effect, we can test the difference in the area under the curve (AUC) between QTL genotypes, i.e.,
$$H_0: AUC_{AA}\Big|_{t_1}^{t_2} = AUC_{Aa}\Big|_{t_1}^{t_2} = AUC_{aa}\Big|_{t_1}^{t_2}$$
where, for QTL genotype j,
$$AUC_j\Big|_{t_1}^{t_2} = \int_{t_1}^{t_2} \frac{a_j}{1 + b_j e^{-r_j t}}\, dt = \frac{a_j}{r_j}\left[\log\left(b_j + e^{r_j t_2}\right) - \log\left(b_j + e^{r_j t_1}\right)\right]$$
• Permutation tests can be applied to assess statistical significance.
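The closed-form AUC follows because the derivative of (a/r)·log(b + e^{rt}) is a·e^{rt}/(b + e^{rt}) = a/(1 + b e^{−rt}). A quick numerical check against the trapezoidal rule, with hypothetical curve parameters and function names:

```python
import math

def logistic_auc(a, b, r, t1, t2):
    """Closed-form AUC of a / (1 + b exp(-r t)) over [t1, t2]."""
    return (a / r) * (math.log(b + math.exp(r * t2)) - math.log(b + math.exp(r * t1)))

def trapezoid_auc(a, b, r, t1, t2, steps=10_000):
    """Numerical check of the same integral with the trapezoidal rule."""
    h = (t2 - t1) / steps
    g = lambda t: a / (1.0 + b * math.exp(-r * t))
    return h * (0.5 * g(t1) + sum(g(t1 + k * h) for k in range(1, steps)) + 0.5 * g(t2))

print(logistic_auc(20.0, 10.0, 0.5, 1.0, 5.0), trapezoid_auc(20.0, 10.0, 0.5, 1.0, 5.0))
```

For very large r·t the term e^{rt} can overflow; rewriting log(b + e^{rt}) as rt + log(1 + b e^{−rt}) avoids that without changing the value.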
Applications
• Several real examples are used to show the utility of the functional mapping approach.
• Application I is about a poplar growth data set.
• Application II is about a mouse growth data set.
• Application III is about a rice tiller number growth data set.
Application I: A Genetic Study in Poplars
Genetic design
Parents: AA × aa
F1: Aa × AA
BC: AA, Aa in the ratio ½ : ½
Stem diameter growth in poplar trees
Ma, Casella & Wu, Genetics 2002
$$g(t) = \frac{a}{1 + b e^{-rt}}$$
a: asymptotic growth
b: initial growth
r: relative growth rate
[Figure: Poplar data; differences in growth across ages, untransformed vs. log-transformed]
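Modeling the covariance structure
• Stationary parametric approach: first-order autoregressive model (AR(1))
• Multivariate Box-Cox transformation to stabilize the variance (Box and Cox, 1964)
• Transform-both-sides (TBS) technique to preserve the interpretability of the growth parameters (Carroll and Ruppert, 1984; Wu et al., 2004). For a log transformation (i.e., λ = 0), both the observed growth data and the logistic curve are log-transformed.
$$\Sigma = \sigma^2 \begin{pmatrix} 1 & \rho & \cdots & \rho^{T-1} \\ \rho & 1 & \cdots & \rho^{T-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \cdots & 1 \end{pmatrix}, \qquad \Theta_q = (\rho, \sigma^2)$$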
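The TBS idea is that the same transformation is applied to the data and to the curve, so the residual is computed on the transformed scale while a, b and r keep their meaning on the original growth scale. A minimal sketch, simplified to a single Box-Cox λ applied to every time point (the multivariate version in the slides may allow more); the function names are hypothetical.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value; lam = 0 gives the log transform."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def tbs_residuals(y_obs, times, a, b, r, lam=0.0):
    """Transform-both-sides residuals: transform(observed) - transform(logistic curve)."""
    curve = [a / (1.0 + b * math.exp(-r * t)) for t in times]
    return [box_cox(y, lam) - box_cox(g, lam) for y, g in zip(y_obs, curve)]

print(tbs_residuals([2.5, 4.0, 6.1], [1, 2, 3], 20.0, 10.0, 0.5))   # hypothetical values
```

These residual vectors are what the AR(1) covariance Σ above would then describe, as in the log-scale likelihood of the EM step.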
Functional mapping incorporating logistic curves and an AR(1) model
[Figure: QTL profiles; results by FunMap vs. results by interval mapping]
FunMap has higher power to detect the QTL than the traditional interval mapping method does.
Ma, Casella & Wu, Genetics 2002
Application II: Mouse Genetic Study (Detecting Growth Genes)
Data supplied by Dr. Cheverud at Washington University
Mouse Linkage Map
Body Mass Growth for Mouse
[Figure: body weight vs. week (weeks 1 to 10)]
510 individuals measured over 10 weeks
Parents: AA × aa
F1: Aa × Aa
F2: AA, Aa, aa in the ratio ¼ : ½ : ¼
Functional mapping: genetic control of body mass growth in mice
Zhao, Ma, Cheverud & Wu, Physiological Genomics 2004
Application III: functional mapping of PCD QTL
• Rice tiller development is thought to be controlled by genetic factors as well as environments.
• The development of tiller number involves a process called programmed cell death (PCD).
Genetic design
Parents: AA × aa
F1: Aa
DH: AA, aa in the ratio ½ : ½
Joint model for the mean vector
• We developed a joint modeling approach in which the growth and death phases are modeled by different functions.
• The growth phase is modeled by a logistic growth curve, following the universal growth law.
• The death phase is modeled by orthogonal Legendre polynomials to increase fitting flexibility.
Cui et al. (2006) Physiological Genomics
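One way to picture such a joint mean is a piecewise function: logistic growth up to a switch time, then a Legendre polynomial expansion over the death phase. This is only an illustrative sketch under stated assumptions; the switch time `t_switch`, the end time `t_end`, the rescaling of the death phase to [−1, 1], and all names are hypothetical and are not taken from Cui et al. (2006). Legendre values come from Bonnet's recursion (n + 1)P_{n+1}(x) = (2n + 1)xP_n(x) − nP_{n−1}(x).

```python
import math

def legendre_values(x, order):
    """[P_0(x), ..., P_order(x)] via Bonnet's recursion."""
    values = [1.0, x]
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]

def joint_mean(t, growth, death_coefs, t_switch, t_end):
    """Hypothetical piecewise mean: logistic growth (a, b, r) up to t_switch,
    then sum_k c_k P_k(x) with the death phase [t_switch, t_end] mapped to x in [-1, 1]."""
    if t <= t_switch:
        a, b, r = growth
        return a / (1.0 + b * math.exp(-r * t))
    x = 2.0 * (t - t_switch) / (t_end - t_switch) - 1.0
    basis = legendre_values(x, len(death_coefs) - 1)
    return sum(c * p for c, p in zip(death_coefs, basis))

print([round(joint_mean(t, (20.0, 10.0, 0.5), [15.0, -3.0, 0.5], 6.0, 12.0), 3)
       for t in range(1, 13)])
```

This sketch does not force the two pieces to meet at the switch time; a real fit would need to constrain or estimate that junction.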
QTL trajectory plot
Advantages of Functional Mapping
• Incorporates biological principles of growth and development into genetic mapping, thus increasing the biological relevance of QTL detection
• Provides a quantitative framework for hypothesis tests at the interplay between gene action and developmental pattern:
  – When does a QTL turn on?
  – When does a QTL turn off?
  – What is the duration of genetic expression of a QTL?
  – How does a growth QTL pleiotropically affect developmental events?
• Models the mean-covariance structures with parsimonious parameters, increasing the precision, robustness and stability of parameter estimation
Functional Mapping: toward high-dimensional biology
• A new conceptual model for genetic mapping of complex traits
• A systems approach for studying sophisticated biological problems
• A framework for testing biological hypotheses at the interplay among genetics, development, physiology and biomedicine
Functional Mapping: Simplicity from complexity
• Estimating fewer, biologically meaningful parameters that model the mean vector
• Modeling the structure of the (co)variance matrix with powerful statistical methods, leading to fewer parameters to be estimated
• The reduction of dimension increases the power and precision of parameter estimation