Stat 504, Lecture 16
Introduction to
log-linear models
Key Concepts:
• Benefits of models
• Two-way Log-linear models
• Parameter Constraints, Estimation, and
Interpretation
• Inference for log-linear models
Objectives:
• Understand the structure of log-linear models
in two-way tables
• Understand the concepts of independence and
association as described by log-linear models in
two-way tables
Useful Links:
• The CATMOD procedure in SAS: http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/catmod_index.htm
• The GENMOD procedure in SAS: http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/genmod_index.htm
• The SAS source on log-linear model analysis: http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/catmod_sect30.htm#stat_catmod_catmodllma
• Fitting log-linear models in R: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/loglin.html
• Fitting log-linear models in R via generalized linear models (glm()): http://spider.stat.umn.edu/R/library/stats/html/glm.html
Readings:
• Agresti (2002) Ch. 8, 9
• Agresti (1996) Ch. 6, 7
Benefits of models over significance tests
Thus far our focus has been on describing interactions
or associations between two or three categorical
variables mostly via single summary statistics and
with significance testing.
Models can handle more complicated situations and
analyze the simultaneous effects of multiple variables,
including mixtures of categorical and continuous
variables.
For example, the Breslow-Day statistic works only
for 2 × 2 × K tables, while log-linear models allow
us to test for homogeneous association in I × J × K
and higher-dimensional tables.
The structural form of the model describes the
patterns of interactions and associations. The model
parameters provide measures of strength of
associations.
In models, the focus is on estimating the model
parameters. The basic inference tools (e.g., point
estimation, hypothesis testing, and confidence
intervals) will be applied to these parameters.
When discussing models, we will keep in mind
• Objective
• Model structure (e.g. variables, formula,
equation)
• Model assumptions
• Parameter estimates and interpretation
• Model fit (e.g. goodness-of-fit tests and statistics)
• Model selection
For example, recall the simple linear regression model:
• Objective: model the expected value of a
continuous variable, $Y$, as a linear function of the
continuous predictor, $X$: $E(Y_i) = \beta_0 + \beta_1 x_i$
• Model structure: $Y_i = \beta_0 + \beta_1 x_i + e_i$
• Model assumptions: the errors are independent and
normally distributed with constant variance,
$e_i \sim N(0, \sigma^2)$, so $Y$ is normally distributed; $X$ is fixed.
• Parameter estimates and interpretation: $\hat{\beta}_0$ is the
estimate of the intercept $\beta_0$, and $\hat{\beta}_1$ is the estimate
of the slope $\beta_1$, etc. What is the interpretation of
the slope?
• Model fit: $R^2$, residual analysis, F-statistic
• Model selection
See handout labeled as Lec16LinRegExample.doc on
modeling average water usage given the amount of
bread production:
Water = 2273 + 0.0799 Production
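As a minimal sketch of how the fitted line above is read, the snippet below plugs a production value into Water = 2273 + 0.0799 Production. The function name `predict_water` and the production value 10000 are illustrative assumptions, not part of the handout; units are whatever the handout uses.

```python
# Coefficients copied from the fitted line quoted above.
INTERCEPT = 2273.0  # estimated intercept (beta_0 hat)
SLOPE = 0.0799      # estimated slope (beta_1 hat)


def predict_water(production):
    """Return the fitted mean water usage for a given production level."""
    return INTERCEPT + SLOPE * production


# Hypothetical production level, chosen only for illustration.
example_production = 10000
fitted_usage = predict_water(example_production)

# The slope is the change in fitted mean usage per one-unit increase in production.
change_per_unit = predict_water(example_production + 1) - fitted_usage

print(fitted_usage, change_per_unit)
```

The second quantity printed equals the slope (up to floating-point rounding), which is one way to answer the interpretation question posed above.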
Two-way ANOVA
Does the amount of sunlight and watering affect the
growth of geraniums?
Objective: model the continuous response as a function
of two factors.
Model structure: $Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk}$ with
$e_{ijk} \sim N(0, \sigma^2)$, $i = 1, \ldots, I$, $j = 1, \ldots, J$, and
$k = 1, \ldots, n_{ij}$
Model assumptions: At each combination of levels the
outcome is normally distributed with the same
variance: $y_{ijk} \sim N(\mu_{ij}, \sigma^2)$, where
$\mu_{ij} = E(y_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_{ij}$
This model is over-parametrized because the term $\gamma_{ij}$
already has I × J parameters corresponding to the
cell means $\mu_{ij}$. The constant, $\mu$, and the main effects,
$\alpha_i$ and $\beta_j$, give us an additional 1 + I + J parameters.
We use constraints such as
$\sum_i \alpha_i = \sum_j \beta_j = \sum_i \sum_j \gamma_{ij} = 0$
to deal with this overparametrization.
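The effect of such constraints can be sketched in a few lines: once they are imposed, a table of cell means splits uniquely into a constant, main effects, and interactions. The cell means below are hypothetical, and the sketch assumes the common stronger convention that the interaction terms sum to zero over each index separately (which implies the double-sum constraint).

```python
# Hypothetical 2 x 2 table of cell means mu_ij (rows: factor A, columns: factor B).
cell_means = [
    [10.0, 14.0],
    [20.0, 16.0],
]


def decompose(means):
    """Split cell means into mu, alpha_i, beta_j, gamma_ij under sum-to-zero constraints."""
    n_rows, n_cols = len(means), len(means[0])
    mu = sum(sum(row) for row in means) / (n_rows * n_cols)       # grand mean
    alpha = [sum(row) / n_cols - mu for row in means]              # row effects
    beta = [sum(col) / n_rows - mu for col in zip(*means)]         # column effects
    gamma = [
        [means[i][j] - mu - alpha[i] - beta[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]                                                              # interactions
    return mu, alpha, beta, gamma


mu, alpha, beta, gamma = decompose(cell_means)
print(mu, alpha, beta, gamma)
```

Adding the four pieces back together reproduces each cell mean, so the constrained parametrization carries exactly the I × J pieces of information in the table.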
Does level of watering affect the growth of potted
geraniums? (Is there a significant main effect for
factor A? E.g., $H_0: \alpha_i = 0$ for all $i$.)
Does level of sunlight affect the growth of potted
geraniums? (Is there a significant main effect for
factor B?)
Does the effect of level of sunlight depend on level of
watering? (Is there a significant interaction between
factors A and B?)
Analysis of Variance for YIELD

Source        DF      SS      MS       F       P
WATER          1   342.3   342.3   24.02   0.000
SUNLIGHT       1    20.3    20.3    1.42   0.256
Interaction    1   132.3   132.3    9.28   0.010
Error         12   171.0    14.3
Total         15   665.8

Level means (shown in the original output with individual 95% CI interval plots)

WATER       Mean
HIGH        22.0
LOW         12.8

SUNLIGHT    Mean
HIGH        18.5
LOW         16.3
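As a quick check on how the ANOVA table is built, the sketch below recomputes each F statistic as the effect mean square divided by the error mean square, using only the SS and DF values printed in the table. The variable names are illustrative; p-values are not recomputed here.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table.
effects = {
    "WATER": (342.3, 1),
    "SUNLIGHT": (20.3, 1),
    "Interaction": (132.3, 1),
}
ss_error, df_error = 171.0, 12

# Error mean square: 171.0 / 12 = 14.25 (printed as 14.3 in the table).
ms_error = ss_error / df_error

# F = (SS_effect / DF_effect) / MS_error for each source of variation.
f_stats = {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}

for name, f_value in f_stats.items():
    print(f"{name:12s} F = {f_value:.2f}")
```

Using the unrounded error mean square matters: dividing by the rounded 14.3 would give slightly different F values than those printed.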
Two-way Log-Linear Model
Now let $\mu_{ij}$ be the expected counts, $E(n_{ij})$, in an
I × J table. An analogous model to two-way ANOVA
is
$\log(\mu_{ij}) = \mu + \alpha_i + \beta_j + \gamma_{ij}$
or in the notation used by Agresti
$\log(\mu_{ij}) = \lambda + \lambda_i^A + \lambda_j^B + \lambda_{ij}^{AB}$
with constraints
$\sum_i \lambda_i^A = \sum_j \lambda_j^B = \sum_i \sum_j \lambda_{ij}^{AB} = 0$
to deal with overparametrization.
Log-linear models specify how the cell counts depend
on the levels of categorical variables. They model the
association and interaction patterns among
categorical variables.
Log-linear modeling is natural for Poisson,
multinomial, and product-multinomial sampling.
These models are appropriate when there is no clear
distinction between response and explanatory
variables, or when there are more than two responses.
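To make the parametrization concrete, the sketch below computes the sum-to-zero parameters of log(µij) for a small table. The 2 × 3 counts are hypothetical (not from the lecture), the function names are illustrative, and the interaction terms are assumed to sum to zero over each index separately. Under independence the fitted counts are (row total × column total) / n, and all interaction terms vanish; applied to the observed counts directly (the saturated model), the interaction terms are generally nonzero.

```python
import math

# Hypothetical 2 x 3 table of observed counts n_ij.
counts = [
    [20, 30, 50],
    [30, 20, 50],
]


def independence_fit(table):
    """Fitted counts mu_ij = n_i+ * n_+j / n under independence."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]


def loglinear_params(mu):
    """Sum-to-zero parameters of log(mu_ij) = lam + lam_A[i] + lam_B[j] + lam_AB[i][j]."""
    logs = [[math.log(m) for m in row] for row in mu]
    n_rows, n_cols = len(logs), len(logs[0])
    lam = sum(map(sum, logs)) / (n_rows * n_cols)
    lam_a = [sum(row) / n_cols - lam for row in logs]
    lam_b = [sum(col) / n_rows - lam for col in zip(*logs)]
    lam_ab = [
        [logs[i][j] - lam - lam_a[i] - lam_b[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return lam, lam_a, lam_b, lam_ab


fitted = independence_fit(counts)
lam, lam_a, lam_b, lam_ab = loglinear_params(fitted)   # interaction terms are ~0
_, _, _, lam_ab_sat = loglinear_params(counts)         # saturated: nonzero interactions

print(fitted)
print(lam_ab)
print(lam_ab_sat)
```

In practice these models are fitted by maximum likelihood with software such as the SAS and R routines linked earlier; the closed-form arithmetic here only illustrates what the parameters mean.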
Example: General Social Survey
Cross-classification of respondents according to
their choice for president in the 1992 presidential election
(Bush, Clinton, Perot) and their political views on a
7-point scale (extremely liberal, liberal, slightly liberal,