2001
Bio 4118 Applied BiostatisticsL12.1
Université d’Ottawa / University of Ottawa
Lecture 12: Generalized Linear Models (GLM)
What are they?
When do we use them?
The full model
The ANCOVA model
The common regression model
The extra sum of squares principle
Assumptions

What are General(ized) Linear Models
GLMs are models of the form:
$$Y = bX + e$$
with Y, a vector of dependent variables; b, a vector of estimated coefficients; X, a vector of independent variables; and e, a vector of error terms.
GLMs include:
Multivariate models
Simple linear regression
Multiple regression
Analysis of variance (ANOVA)
Analysis of covariance (ANCOVA)

Some GLM procedures
Procedure | Dependent variable | Independent variable(s)
--- | --- | ---
Simple regression | 1 continuous | 1 continuous
Single-classification ANOVA | 1 continuous | 1 categorical*
Multiple-classification ANOVA | 1 continuous | 2 or more categorical*
ANCOVA | 1 continuous | At least 1 categorical*, at least 1 continuous
Multiple regression | 1 continuous | 2 or more continuous
*either categorical or treated as a categorical variable
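One way to see why these procedures share the same linear form is to look at how the independent variables can be coded as columns of X. The sketch below is only illustrative: the function name `design_matrix`, the 0/1 (dummy) coding, and the choice of the first level encountered as the reference level are assumptions made here, not something specified in the lecture.

```python
def design_matrix(covariate, factor):
    """Build rows [1, x, d, x*d] for one covariate and one two-level factor.

    d is a 0/1 dummy variable for the factor (0 for the first level seen),
    and x*d is the covariate-by-factor interaction column.
    Hypothetical helper for illustration only.
    """
    levels = []
    for level in factor:
        if level not in levels:
            levels.append(level)
    if len(levels) != 2:
        raise ValueError("this sketch handles exactly two levels")
    rows = []
    for x, level in zip(covariate, factor):
        d = levels.index(level)  # 0 for the reference level, 1 otherwise
        rows.append([1.0, float(x), float(d), float(x) * d])
    return rows


# Made-up values: three observations, two levels ("M" is the reference).
X = design_matrix([1.2, 1.5, 1.3], ["M", "F", "M"])
```

With this coding, all four columns correspond to a model with separate intercepts and slopes, the first three columns to a model with separate intercepts and a common slope, and the first two columns to a single regression line.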

When do we use ANCOVA?
To compare the relationship between a dependent (Y) and independent (X1) variable for different levels of one or more categorical variables (X2),
e.g. the relationship between body mass (Y) and body size (X1) for different taxonomic groups (birds and mammals, X2).
[Figure: body mass (Y) plotted against body size (X1).]

When do we use ANCOVA?
In doing comparisons, we assume that the qualitative form of the model is the same for all levels of the categorical variables...
...otherwise, one is comparing apples and oranges!
[Figure: Y against X1 for Level 1 and Level 2 of X2, one panel showing qualitatively similar models and the other qualitatively different models.]

When do we use ANCOVA?
ANCOVA is used to compare linear models...
...although ANCOVA-like extensions have been developed for nonlinear models.
[Figure: Y against X1 for Level 1 and Level 2 of X2, one panel showing linear models and the other nonlinear models.]

The simple regression model
The regression model is:
$$Y_i = \alpha + \beta X_i + \varepsilon_i$$
So, all simple regression models are described by 2 parameters, the intercept ($\alpha$) and slope ($\beta$).
[Figure: Y against X showing observed and expected values, the slope $\beta = \Delta Y / \Delta X$, the intercept $\alpha$, and the residual $\varepsilon_i$ for the point ($X_i$, $Y_i$).]
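As a small illustration of the two parameters, the least-squares estimates of the intercept and slope can be computed directly from sums of squares and cross-products. This is a minimal sketch; the function name `fit_line` and the data are made up for the example.

```python
def fit_line(x, y):
    """Least-squares estimates (alpha, beta) for Y_i = alpha + beta * X_i + e_i."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx                  # slope: change in Y per unit change in X
    alpha = mean_y - beta * mean_x    # intercept: expected Y at X = 0
    return alpha, beta


# Made-up data lying exactly on the line Y = 1 + 2X.
alpha, beta = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```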

Simple GLMs
Two linear models may differ as follows:
differences in both intercepts ($\alpha$) and slopes ($\beta$)
different intercepts but the same slopes (ANCOVA model)
[Figure: two panels of Y against X1, one with different $\alpha$ and $\beta$, the other with different $\alpha$ and the same $\beta$.]

Simple GLMs
Two linear models may also differ as follows:
different slopes ($\beta$) but the same intercepts ($\alpha$)
same slopes and intercepts (common regression model)
[Figure: two panels of Y against X1, one with the same $\alpha$ and different $\beta$, the other with the same $\alpha$ and the same $\beta$.]

Fitting GLMs
Proceeds in hierarchical fashion, fitting the most complex model first.
Evaluate the significance of a term by fitting two models: one with the term in, the other with it removed.
Test for the change in model fit ($\Delta$MF) associated with removal of the term in question.
[Diagram: Model A (term in) and Model B (term out) are compared via $\Delta$MF; delete the term if $\Delta$MF is small, retain it if $\Delta$MF is large.]

Model fitting: evaluating the significance of model terms
Fit the higher order model (hom) including all possible terms; retain $SS_{residual}$ and $MS_{residual}$.
Fit the reduced model (rm); retain $SS_{residual}$.
Test for significance of the removed term by computing:
$$F = \frac{\left(SS_{residual}^{rm} - SS_{residual}^{hom}\right) / \left(df_{residual}^{rm} - df_{residual}^{hom}\right)}{MS_{residual}^{hom}}$$
where the numerator degrees of freedom equal the number of parameters removed with the term.
[Diagram: the higher order model and the reduced model are compared via F; delete the term if the test is not significant, retain it if the test is significant.]
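The comparison of a higher order model with a reduced model can be sketched in a few lines. The function name `extra_ss_f` and the numbers in the usage line are hypothetical; the calculation assumes the two models are nested and that the numerator degrees of freedom equal the difference in residual degrees of freedom.

```python
def extra_ss_f(ss_res_rm, df_res_rm, ss_res_hom, df_res_hom):
    """F statistic for the term(s) dropped from the higher order model.

    ss_res_rm, df_res_rm   : residual SS and df of the reduced model (rm)
    ss_res_hom, df_res_hom : residual SS and df of the higher order model (hom)
    Returns (F, numerator df, denominator df).
    """
    ms_res_hom = ss_res_hom / df_res_hom   # residual mean square of hom
    df_term = df_res_rm - df_res_hom       # number of parameters removed
    f_stat = ((ss_res_rm - ss_res_hom) / df_term) / ms_res_hom
    return f_stat, df_term, df_res_hom


# Made-up sums of squares: removing one term raises the residual SS from 10 to 12.
f_stat, df1, df2 = extra_ss_f(12.0, 21, 10.0, 20)
```

The resulting F is then compared with the F distribution on `df1` and `df2` degrees of freedom.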

The full model with 2 independent variables
The full model is:
$$Y_{ij} = \mu + \alpha_i + \beta_i (X_{ij} - \bar{X}_i) + \varepsilon_{ij}$$
$\beta_i$ is the slope of the regression of Y on X1 (the covariate) estimated for level i of the categorical variable X2.
$\alpha_i$ is the difference between the mean of each level i of the categorical variable X2 and the overall mean.
[Figure: regressions of Y on X1 for Level 1 and Level 2 of variable X2, with the level means, level effects, slopes and a residual marked.]

The full model: null hypotheses
For the full model with 2 independent variables, there are 3 null hypotheses:
$$H_{01}: \alpha_i = 0 \text{ for all } i$$
$$H_{02}: \beta_i = \text{constant}$$
$$H_{03}: \beta_i = 0$$
[Figure: regressions of Y on X1 for Level 1 and Level 2 of variable X2, with the level means, level effects, slopes and a residual marked.]

[Figure: panels of Y against the covariate illustrating the null hypotheses of the full model, $H_{01}$: $\alpha_i = 0$ for all $i$; $H_{02}$: $\beta_i$ = constant; $H_{03}$: $\beta_i = 0$.]

Assumptions for full model hypothesis testing
Residuals are independent and normally distributed.
Residual variance is equal for all values of X and independent of the value of the categorical variable (homoscedasticity).
No error in independent variables.
Relationship between Y and covariate is linear.

Procedure
Fit the full model and test for differences among slopes ($H_{02}$: $\beta_i$ = constant).
If H02 is rejected, run separate regressions for each level of the categorical variable(s).
If H02 is accepted, proceed to fit the ANCOVA model.
[Diagram: regressions of Y on X1 for Level 1 and Level 2 of variable X2; if H02 is accepted, fit the ANCOVA model; if H02 is rejected, fit separate regressions.]

The ANCOVA model with 2 independent variables
The ANCOVA model is:
$$Y_{ij} = \mu + \alpha_i + \beta (X_{ij} - \bar{X}_i) + \varepsilon_{ij}$$
$\beta$ is the slope of the regression of Y on X1 (the covariate) pooled over levels of the categorical variable X2.
$\alpha_i$ is the difference between the mean of each level i of the categorical variable X2 and the overall mean.
[Figure: regressions of Y on X1 with a common slope for Level 1 and Level 2 of variable X2, with the level means, level effects and a residual marked.]

The ANCOVA model: null hypotheses
For the ANCOVA model with 2 independent variables, there are 2 null hypotheses:
$$H_{01}: \alpha_i = 0 \text{ for all } i$$
$$H_{02}: \beta = 0$$
[Figure: regressions of Y on X1 with a common slope for Level 1 and Level 2 of variable X2, with the level means, level effects and a residual marked.]

[Figure: panels of Y against the covariate illustrating the null hypotheses of the ANCOVA model, $H_{01}$: $\alpha_i = 0$ for all $i$; $H_{02}$: $\beta = 0$.]

Assumptions for hypothesis testing in ANCOVA model
Residuals are independent and normally distributed.
Residual variance is equal for all values of X and independent of the value of the categorical variable (homoscedasticity).
No error in independent variables.
Relationship between Y and covariate is linear.
The slope of the regression of Y on X1 (the covariate) is the same for all levels of the categorical variable X2 (not an assumption for the full model!).

Procedure
Fit the ANCOVA model and test for differences among intercepts.
If H01 is rejected, do multiple comparisons to see which intercepts differ (if there are more than 2 levels for X2).
If H01 is accepted, proceed to fit the common regression model.
[Diagram: regressions of Y on X1 for Level 1 and Level 2 of variable X2; if H01 is accepted, fit the common regression; if H01 is rejected, do multiple comparisons.]
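The stepwise procedure (full model, then ANCOVA model, then common regression) can be sketched for one covariate and one categorical variable as follows. This is an illustration under stated assumptions, not the output of any particular statistics package: the function names `residual_ss` and `f_ratio` and the data are made up, and the residual sums of squares use the standard closed forms for separate lines, a pooled within-level slope, and a single line.

```python
def _sums(x, y):
    """Corrected sums of squares and cross-products for one set of points."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return sxx, sxy, syy


def residual_ss(groups):
    """Residual SS and residual df for the three nested models.

    groups maps each level of the categorical variable to (x_values, y_values).
    """
    parts = [_sums(x, y) for x, y in groups.values()]
    n = sum(len(x) for x, _ in groups.values())
    k = len(groups)

    # Full model: separate intercept and slope for each level.
    ss_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in parts)

    # ANCOVA model: separate intercepts, one pooled within-level slope.
    sxx_w = sum(p[0] for p in parts)
    sxy_w = sum(p[1] for p in parts)
    syy_w = sum(p[2] for p in parts)
    ss_ancova = syy_w - sxy_w ** 2 / sxx_w

    # Common regression: one intercept and one slope for all observations.
    all_x = [a for x, _ in groups.values() for a in x]
    all_y = [b for _, y in groups.values() for b in y]
    sxx, sxy, syy = _sums(all_x, all_y)
    ss_common = syy - sxy ** 2 / sxx

    return {
        "full": (ss_full, n - 2 * k),
        "ancova": (ss_ancova, n - k - 1),
        "common": (ss_common, n - 2),
    }


def f_ratio(reduced, higher):
    """Extra sum of squares F for two nested models given as (SS, df) pairs."""
    (ss_r, df_r), (ss_h, df_h) = reduced, higher
    return ((ss_r - ss_h) / (df_r - df_h)) / (ss_h / df_h)


# Made-up data for two levels of the categorical variable.
data = {
    "level 1": ([1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8]),
    "level 2": ([1, 2, 3, 4], [5.0, 7.1, 8.9, 11.2]),
}
fits = residual_ss(data)
f_slopes = f_ratio(fits["ancova"], fits["full"])        # test of equal slopes
f_intercepts = f_ratio(fits["common"], fits["ancova"])  # test of equal intercepts
```

In practice the test of equal intercepts is only interpreted if the test of equal slopes is not significant, following the order described in the procedure.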

The common regression model with 2 independent variables
The model is:
$$Y_{ij} = \alpha + \beta (X_{ij} - \bar{X}) + \varepsilon_{ij}$$
$\beta$ is the slope of the regression of Y on X1 pooled over levels of the categorical variable X2.
$\alpha$ is the pooled intercept.
$\bar{X}$ is the pooled average of X1.
[Figure: a single regression of Y on X1 through the observations of Level 1 and Level 2 of variable X2, with the pooled mean $\bar{X}$ and a residual marked.]

The common regression model: null hypotheses
For the common regression model, there are 2 null hypotheses:
$$H_{01}: \alpha = 0$$
$$H_{02}: \beta = 0$$
[Figure: a single regression of Y on X1 through the observations of Level 1 and Level 2 of variable X2, with the pooled mean $\bar{X}$ and a residual marked.]

Assumptions for hypothesis testing in common regression model
Residuals are independent and normally distributed.
Residual variance is equal for all values of X.
No error in independent variable.
Relationship between Y and X is linear.

Example 1: effects of sex and age on sturgeon size at The Pas
[Figure: LFKL plotted against LAGE, with separate panels for males and females.]

Analysis
Log(forklength) (LFKL) is the dependent variable; log(age) (LAGE) is the covariate, and sex (SEX$) is the categorical variable (2 levels).
Q1: is the slope of the regression of LFKL on LAGE the same for both sexes?
[Figure: LFKL plotted against LAGE, with separate panels for females and males.]

Effects of sex and age on size of sturgeon at The Pas

Analysis
Conclusion 1: the slope of the regression of LFKL on LAGE is the same for both sexes (the equal-slopes null hypothesis, H02, is accepted) since p(SEX$*LAGE) > .05.
Q2: is the intercept the same for both males and females?
[Figure: LFKL plotted against LAGE, with separate panels for females and males.]

Effects of sex and age on size of sturgeon at The Pas (ANCOVA model)

Analysis
Conclusion 2: the intercept is the same for both males and females. The equal-intercepts null hypothesis (H01 of the ANCOVA model) is accepted since p(SEX$) > 0.05, implying that...
...the best model is the common regression model.
Note that the reduction in fit ($R^2$) from the full model to the ANCOVA model is negligible (.697 to .696), indicating that deleting the SEX$*LAGE term has a negligible impact on model fit.
[Figure: LFKL plotted against LAGE, with separate panels for females and males.]

Effects of sex and age on size of sturgeon at The Pas (common regression)

Example 2: effect of location and age on sturgeon size
[Figure: LFKL plotted against LAGE, with separate panels for Lake of the Woods (LofW) and Nelson River.]

Analysis
Log(forklength) (LFKL) is the dependent variable; log(age) (LAGE) is the covariate, and location (LOCATION$) is the categorical variable (2 levels).
Q: is the slope of the regression of LFKL on LAGE the same at both locations?
[Figure: LFKL plotted against LAGE, with separate panels for Nelson River and Lake of the Woods (LofW).]

Effect of location and age on sturgeon size

Analysis
Conclusion: the slope of the regression of LFKL on LAGE is different at the two locations (the equal-slopes null hypothesis, H02, is rejected) since p(LOCATION$*LAGE) < .05.
So, we should fit individual regressions for each location.
[Figure: LFKL plotted against LAGE, with separate panels for Nelson River and Lake of the Woods (LofW).]

What do you do if?
More than 2 levels of categorical variable?
Follow the above procedure, but if the equal-slopes null hypothesis (H02) is rejected, do pairwise contrasts of individual slopes.
If the equal-slopes hypothesis is accepted but the equal-intercepts hypothesis (H01) is rejected, do pairwise comparisons of intercepts.
Always control for experiment-wise Type I error rate.
[Figure: Y plotted against X.]
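To make the point about the experiment-wise Type I error rate concrete, the sketch below computes the number of pairwise comparisons among k levels and the per-comparison significance level under two common adjustments. The lecture does not say which adjustment to use; the Bonferroni and Dunn-Šidák corrections and the function names are assumptions chosen for illustration.

```python
def n_pairwise(k):
    """Number of pairwise comparisons among k levels."""
    return k * (k - 1) // 2


def bonferroni_alpha(alpha, m):
    """Per-comparison alpha so that the experiment-wise rate is at most alpha."""
    return alpha / m


def sidak_alpha(alpha, m):
    """Per-comparison alpha under the Dunn-Sidak adjustment (independent tests)."""
    return 1 - (1 - alpha) ** (1 / m)


# Three levels of the categorical variable give three pairwise comparisons.
m = n_pairwise(3)
alpha_bonferroni = bonferroni_alpha(0.05, m)
alpha_sidak = sidak_alpha(0.05, m)
```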

What do you do if?
Biological hypothesis implies one-tailed null(s)?
Follow the above procedure, but if the equal-slopes null hypothesis (H02) is rejected, do one-tailed pairwise contrasts of individual slopes.
If the equal-slopes hypothesis is accepted but the equal-intercepts hypothesis (H01) is rejected, do one-tailed pairwise comparisons of intercepts.
[Figure: Y plotted against X.]
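For a contrast tested with a symmetric statistic such as t, a one-tailed p-value can be obtained from the usual two-tailed one. The sketch below is a hypothetical helper (the name `one_tailed_p` and its arguments are not from the lecture); it assumes the direction of the alternative hypothesis was specified before looking at the data.

```python
def one_tailed_p(p_two_tailed, estimate, expected_sign):
    """One-tailed p-value from a two-tailed p-value of a symmetric test.

    estimate      : observed difference (e.g. slope 1 minus slope 2)
    expected_sign : +1 or -1, the direction predicted by the alternative
    """
    if expected_sign not in (1, -1):
        raise ValueError("expected_sign must be +1 or -1")
    if estimate * expected_sign > 0:
        # Observed difference is in the predicted direction.
        return p_two_tailed / 2
    # Observed difference is opposite to (or not in) the predicted direction.
    return 1 - p_two_tailed / 2


p_predicted = one_tailed_p(0.08, 0.5, 1)
p_opposite = one_tailed_p(0.08, -0.5, 1)
```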

Power analysis in GLM
In any GLM, hypotheses are tested by means of an F-test.
Remember: the appropriate $SS_{error}$ and $df_{error}$ depend on the type of analysis and the hypothesis under investigation.
Knowing F, we can compute $R^2$, the proportion of the total variance in Y explained by the factor (source) under consideration.
$$F = \frac{MS_{factor}}{MS_{error}} = \frac{SS_{factor}/df_{factor}}{SS_{error}/df_{error}} = \frac{SS_{factor}}{SS_{error}} \cdot \frac{df_{error}}{df_{factor}}$$
$$R^2 = \frac{F}{F + df_{error}/df_{factor}}$$
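The relationship between F and $R^2$ can be checked numerically. The sketch below assumes that $R^2$ is taken as $SS_{factor} / (SS_{factor} + SS_{error})$, which is what links it to F through the degrees of freedom; the function names and the numbers are illustrative only.

```python
def r2_from_f(f_stat, df_factor, df_error):
    """R^2 implied by an F statistic, assuming R^2 = SS_factor / (SS_factor + SS_error)."""
    return f_stat * df_factor / (f_stat * df_factor + df_error)


def f_from_r2(r2, df_factor, df_error):
    """Inverse relationship: F = (R^2 / df_factor) / ((1 - R^2) / df_error)."""
    return (r2 / df_factor) / ((1 - r2) / df_error)


# Made-up example: F = 4 with 2 and 20 degrees of freedom.
r2 = r2_from_f(4.0, 2, 20)
f_back = f_from_r2(r2, 2, 20)
```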

Partial and total $R^2$
The total $R^2$ ($R^2_{Y \cdot B}$) is the proportion of variance in Y accounted for (explained) by a set of independent variables B.
The partial $R^2$ ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$) is the proportion of variance in Y accounted for by B when the variance accounted for by another set A is removed.
[Diagram: the proportion of variance accounted for by both A and B is $R^2_{Y \cdot A,B}$; the proportion accounted for by A only is $R^2_{Y \cdot A}$ (total $R^2$); the proportion accounted for by B independent of A is $R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$ (partial $R^2$).]

Partial and total $R^2$
The total $R^2$ ($R^2_{Y \cdot B}$) for set B equals the partial $R^2$ ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$) for set B if either (1) the total $R^2$ for A ($R^2_{Y \cdot A}$) is zero; or (2) A and B are independent (in which case $R^2_{Y \cdot A,B} = R^2_{Y \cdot A} + R^2_{Y \cdot B}$).
[Diagram: comparison of the proportion of variance accounted for by B ($R^2_{Y \cdot B}$, total $R^2$) with the proportion accounted for by B independent of A ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$, partial $R^2$).]

Partial and total $R^2$
In simple linear regression and single-factor ANOVA, there is only one independent variable X (either continuous or categorical).
In these cases, set B includes only one variable X, so the total $R^2$ ($R^2_{Y \cdot B}$) = total $R^2$ ($R^2_{Y \cdot X}$) and the partial and total $R^2$ are the same.
[Figure: growth rate (cm/day) plotted against water temperature (°C).]

Partial and total $R^2$
In ANCOVA and multiple-factor ANOVA, there are several independent variables X1, X2, ... (either continuous or categorical), so set B includes several variables.
In this case, the total and partial $R^2$ may be very different.
[Figure: growth rate (cm/day) plotted against water temperature (°C) for pH = 6.5 and pH = 4.5.]

Example: partial and total $R^2$ in ANCOVA
Two independent variables: X1 (continuous) and X2 (categorical)
$$A = \{X_1\}, \quad B = \{X_2\}$$
$$R^2_{Y \cdot A,B} = R^2_{Y \cdot X_1, X_2}$$
$$R^2_{Y \cdot A} = R^2_{Y \cdot X_1}$$
$$R^2_{Y \cdot B} = R^2_{Y \cdot X_2}$$
$$R^2_{Y \cdot A,B} - R^2_{Y \cdot A} = R^2_{Y \cdot X_1, X_2} - R^2_{Y \cdot X_1}$$
[Figure: Y against X1 for X2 = L1 and X2 = L2.]

Defining effect size in GLM
The effect size, denoted $f^2$, is given by the ratio of the factor (source) $R^2_{factor}$ to 1 minus the appropriate error $R^2_{error}$:
$$f^2 = \frac{R^2_{factor}}{1 - R^2_{error}}$$
Note: both $R^2_{factor}$ and $R^2_{error}$ depend on the null hypothesis under investigation.

Effects of sex and age on size of sturgeon at The Pas (common regression)

Defining effect size in GLM: case 1
Case 1: a set B is related to Y, and the total $R^2$ ($R^2_{Y \cdot B}$) is determined. The error variance proportion is then $1 - R^2_{Y \cdot B}$.
$H_0$: $R^2_{Y \cdot B} = 0$
Example: effect of age on sturgeon size at The Pas
B = {LAGE}
$$f^2 = \frac{R^2_{factor}}{1 - R^2_{error}} = \frac{R^2_{LAGE}}{1 - R^2_{LAGE}} = \frac{.690}{1 - .690} = 2.23$$

Effects of sex and age on size of sturgeon at The Pas

Effects of sex and age on size of sturgeon at The Pas (ANCOVA model)

Defining effect size in GLM: case 2
Case 2: the proportion of variance of Y due to B over and above that due to A is determined ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$).
The error variance proportion is then $1 - R^2_{Y \cdot A,B}$.
$H_0$: $R^2_{Y \cdot A,B} - R^2_{Y \cdot A} = 0$
Example: effect of SEX$*LAGE on sturgeon size at The Pas
B = {SEX$*LAGE}, A,B = {SEX$, LAGE, SEX$*LAGE}
$$f^2 = \frac{R^2_{\{SEX\$,\, LAGE,\, SEX\$*LAGE\}} - R^2_{\{SEX\$,\, LAGE\}}}{1 - R^2_{\{SEX\$,\, LAGE,\, SEX\$*LAGE\}}} = \frac{.697 - .696}{1 - .697} = .003$$
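The two cases can be written as two small functions and applied to the $R^2$ values quoted for the sturgeon example (.690 for LAGE alone, .697 and .696 for the models with and without SEX$*LAGE). The function names are made up for this sketch.

```python
def f2_case1(r2_b):
    """Case 1: effect size for set B alone, f^2 = R2_B / (1 - R2_B)."""
    return r2_b / (1 - r2_b)


def f2_case2(r2_ab, r2_a):
    """Case 2: effect size for B over and above A,
    f^2 = (R2_AB - R2_A) / (1 - R2_AB)."""
    return (r2_ab - r2_a) / (1 - r2_ab)


f2_lage = f2_case1(0.690)                 # effect of LAGE, about 2.23
f2_interaction = f2_case2(0.697, 0.696)   # effect of SEX$*LAGE, about 0.003
```

The contrast between the two values shows why the interaction term could be dropped with almost no loss of fit while age remains a strong predictor.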

Determining power
Once $f^2$ has been determined, either a priori (as an alternate hypothesis) or a posteriori (the observed effect size), calculate the non-central F parameter $\lambda$:
$$\lambda = f^2 (\nu_1 + \nu_2 + 1)$$
Knowing $\lambda$ and the factor (source) ($\nu_1$) and error ($\nu_2$) degrees of freedom, we can determine power from appropriate tables for a given $\alpha$.
[Figure: power curves ($1 - \beta$) for $\nu_1 = 2$ at $\alpha = .05$ and $\alpha = .01$, with separate curves for decreasing $\nu_2$.]

Example: effect of pH and nutrient levels on growth rate of bass
Sample of 35 lakes
3 pH levels: acid, circumneutral, basic
For each lake, an estimate of growth rate is obtained (e.g. from size-age regression).
What is the probability of detecting a true effect size as large as the sample effect size for pH*N once effects of N and pH have been controlled for, given $\alpha$ = .05?
Variable | df | p
--- | --- | ---
pH | 2 | 0.15
Nutrient (N) | 1 | <.01
pH*N | 2 | 0.20
Error | 29 |
$R^2_{\{pH, N, pH*N\}}$ = 0.44
$R^2_{\{pH, N\}}$ = 0.36
$R^2_{\{N\}}$ = 0.27

Example: effect of pH and nutrient levels on growth rate of bass
Sample effect size $f^2$ for pH*N once effects of N and pH have been controlled for = 0.14
Source (pH*N) df = $\nu_1$ = 2; error df = $\nu_2$ = 35 - 2 - 2 - 1 - 1 = 29
Use tables of $\lambda$ based on $R^2$ to get power (NOT the same tables as for ANOVA).
$$f^2 = \frac{R^2_{\{pH, N, pH*N\}} - R^2_{\{N, pH\}}}{1 - R^2_{\{pH, N, pH*N\}}} = \frac{.44 - .36}{1 - .44} = .14$$
$$\lambda = f^2 (\nu_1 + \nu_2 + 1) = .14 (2 + 29 + 1) = 4.48$$
From tables, given $\alpha$, $\nu_1$ and $\nu_2$, $1 - \beta$ = .44.
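The arithmetic of this example can be reproduced as below. The function name `noncentrality` is made up for the sketch. Note that carrying the unrounded $f^2$ through the calculation gives a slightly larger $\lambda$ (about 4.57) than rounding $f^2$ to .14 first (4.48); the final step of converting $\lambda$ to power still requires tables or a non-central F routine, which is not shown here.

```python
def noncentrality(f2, df_factor, df_error):
    """Non-central F parameter: lambda = f^2 * (nu1 + nu2 + 1)."""
    return f2 * (df_factor + df_error + 1)


# R^2 values from the bass example: {pH, N, pH*N} = 0.44 and {pH, N} = 0.36.
f2 = (0.44 - 0.36) / (1 - 0.44)           # about 0.143
lam = noncentrality(f2, 2, 29)            # unrounded f^2, about 4.57
lam_rounded = noncentrality(0.14, 2, 29)  # f^2 rounded to .14, gives 4.48
```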