euclid.psych.yorku.ca/www/psy6136/lectures/Loglin-1up.pdf
Loglinear Models and Mosaic Displays
Michael Friendly
Psych 6136
October 5, 2017
[Title-slide figures: mosaic displays of the UC Berkeley admissions data (Dept A–F × Gender × Admit) for the models (DeptGender)(Admit) and (DeptGender)(DeptAdmit), and a mosaic matrix of Admit, Gender and Dept.]
n-way tables Mosaic displays: Basic ideas
Mosaic displays: Basic ideas
Hartigan and Kleiner (1981), Friendly (1994, 1999)
Area-proportional display of frequencies in an n-way table
Michael Friendly (York University) Visualizing Categorical Data VCD, 2012 134 / 350
Mosaic displays: Basic ideas
Independence: Two-way table
Expected frequencies:
$$\hat{m}_{ij} = \frac{n_{i+}\, n_{+j}}{n_{++}} = n_{++} \times \text{row \%} \times \text{col \%}$$
⇒ rows & columns align when variables are independent
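A minimal R sketch of the expected-frequency formula, assuming the Hair × Eye margin of R's built-in `HairEyeColor` data as the example table (the slide itself does not name one). It also checks that every row of the fitted table has the same column profile, which is why the tiles align under independence.

```r
# Two-way Hair x Eye table: collapse the built-in 3-way HairEyeColor over Sex
tab <- margin.table(HairEyeColor, c(1, 2))

# Expected frequencies under independence: m_ij = n_i+ * n_+j / n_++
n <- sum(tab)
expected <- outer(rowSums(tab), colSums(tab)) / n

# Same values as the expected counts computed inside chisq.test()
all.equal(unname(expected), unname(chisq.test(tab)$expected))

# Every row has the same column proportions (= the Eye margin): rows "align"
round(prop.table(expected, 1), 3)
```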
Mosaic displays: Residuals & shading
Pearson residuals:
$$d_{ij} = \frac{n_{ij} - \hat{m}_{ij}}{\sqrt{\hat{m}_{ij}}}$$
$$\text{Pearson } \chi^2 = \sum\sum d_{ij}^2 = \sum\sum \frac{(n_{ij} - \hat{m}_{ij})^2}{\hat{m}_{ij}}$$
Other residuals: deviance (LR), Freeman-Tukey (FT), adjusted (ADJ), ...
Shading:
Sign: − negative in red; + positive in blue
Magnitude: intensity of shading: $|d_{ij}| > 0, 2, 4, \ldots$
⇒ Independence: rows align, or cells are empty!
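A sketch computing Pearson residuals by hand for the same assumed Hair × Eye table from `HairEyeColor`; their sum of squares should reproduce the χ²(9) = 138.29 quoted later for these data. The three-level code at the end only illustrates the |d| > 2, 4 cut-offs; it is not the mosaic plotting code.

```r
tab <- margin.table(HairEyeColor, c(1, 2))
m_hat <- outer(rowSums(tab), colSums(tab)) / sum(tab)   # fitted values under independence

# Pearson residuals d_ij = (n_ij - m_ij) / sqrt(m_ij)
d <- (tab - m_hat) / sqrt(m_hat)
round(d, 2)

# Their sum of squares is the Pearson chi-square statistic
X2 <- sum(d^2)
c(by_hand = X2, chisq.test = unname(chisq.test(tab)$statistic))

# Illustrative shading level: sign gives color, |d| > 2 and |d| > 4 give intensity
sign(d) * ((abs(d) > 2) + (abs(d) > 4))
```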
Overview Mosaic displays
Mosaic displays: Animation
A 3 × 2 table of answers to a question (Yes, ?, No), by sex.
Marginal proportions of answers are fixed at (.40, .25, .35).
Proportion of M, F is varied from frame to frame.
Overview Loglinear models
Loglinear models: Perspectives
Loglinear approach
Loglinear models were first developed as an analog of classical ANOVA models, where multiplicative relations (under independence) are re-expressed in additive form as models for log(frequency).
$$\log m_{ij} = \mu + \lambda^A_i + \lambda^B_j \equiv [A][B] \equiv\; \sim A + B$$
This expresses the model of independence for a two-way table (no A*B association).
The notations [A][B] ≡ ~ A + B are shorthands.
Fit using MASS::loglm()
loglm(Freq ~ A + B + C, data = ...)
loglm(Freq ~ A + B * C, data = ...)
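A sketch of the two `MASS::loglm()` calling styles, using R's built-in `UCBAdmissions` table as an assumed example: a table with a one-sided formula, or a frequency data frame with `Freq` on the left-hand side. Both should give the same likelihood-ratio G².

```r
library(MASS)   # loglm()

# Mutual independence [Admit][Gender][Dept]
# 1) table form: no left-hand side; variables are named by the table's dimnames
m_tab <- loglm(~ Admit + Gender + Dept, data = UCBAdmissions)

# 2) frequency data frame form: Freq on the left-hand side
ucb_df <- as.data.frame(UCBAdmissions)   # columns Admit, Gender, Dept, Freq
m_df <- loglm(Freq ~ Admit + Gender + Dept, data = ucb_df)

m_tab                                    # likelihood-ratio and Pearson tests
c(G2 = m_tab$lrt, df = m_tab$df)

# Allowing the Gender x Dept association
m_gd <- loglm(Freq ~ Admit + Gender * Dept, data = ucb_df)
c(G2 = m_gd$lrt, df = m_gd$df)
```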
Loglinear models: Perspectives
GLM approach
More generally, loglinear models are also generalized linear models (GLMs) for log(frequency), with a Poisson distribution for the cell counts.
log m = Xβ
This looks just like the general linear ANOVA/regression model, but for log frequency.
This approach allows quantitative predictors and special ways of treating ordinal factors.
Fit using glm(), with family = poisson → a model for log(Freq)
glm(Freq ~ A + B + C, family = poisson)
glm(Freq ~ A + B * C, family = poisson)
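The same models fit as Poisson GLMs (a sketch, again assuming `UCBAdmissions`). For a loglinear model the residual deviance is the likelihood-ratio G², so it should match the `loglm()` value.

```r
library(MASS)   # loglm(), for the comparison only
ucb_df <- as.data.frame(UCBAdmissions)

g1 <- glm(Freq ~ Admit + Gender + Dept, family = poisson, data = ucb_df)
g2 <- glm(Freq ~ Admit + Gender * Dept, family = poisson, data = ucb_df)

# Residual deviance = G2; residual df = # cells - # parameters
c(G2 = deviance(g1), df = df.residual(g1))
c(G2 = deviance(g2), df = df.residual(g2))

# Same G2 as the loglm() fit of [Admit][Gender][Dept]
loglm(~ Admit + Gender + Dept, data = UCBAdmissions)$lrt
```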
Loglinear models: Perspectives
Logit models
When one table variable is a binary response, a logit model for that response is equivalent to a loglinear model (as discussed later).
$$\log(m_{1jk}/m_{2jk}) = \alpha + \beta^B_j + \beta^C_k \equiv [AB][AC][BC]$$
$\log(m_{1jk}/m_{2jk})$ represents the log odds of response category 1 vs. 2.
The model formula includes only terms for the effects on A of variables B and C.
The equivalent loglinear model is [AB][AC][BC].
The logit model assumes the [BC] association, and [AB] → $\beta^B_j$, [AC] → $\beta^C_k$.
Fit using glm(outcome == "survived" ~ B + C, family = binomial)
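A sketch of the logit/loglinear equivalence on the Berkeley data (assumed built-in `UCBAdmissions`), with Admit playing the role of the binary response A. The binomial logit model in Gender and Dept and the Poisson loglinear model [Admit Gender][Admit Dept][Gender Dept] should have the same residual deviance and df.

```r
ucb_df <- as.data.frame(UCBAdmissions)   # Admit varies fastest, then Gender, Dept

# One row per Gender x Dept cell with Admitted / Rejected counts
adm <- ucb_df[ucb_df$Admit == "Admitted", ]
rej <- ucb_df[ucb_df$Admit == "Rejected", ]
ucb_wide <- data.frame(Gender = adm$Gender, Dept = adm$Dept,
                       Admitted = adm$Freq, Rejected = rej$Freq)

# Logit model: log(m_1jk / m_2jk) = alpha + beta^Gender_j + beta^Dept_k
logit_fit <- glm(cbind(Admitted, Rejected) ~ Gender + Dept,
                 family = binomial, data = ucb_wide)

# Equivalent loglinear model [Admit Gender][Admit Dept][Gender Dept]
loglin_fit <- glm(Freq ~ Admit * Gender + Admit * Dept + Gender * Dept,
                  family = poisson, data = ucb_df)

c(logit = deviance(logit_fit), loglinear = deviance(loglin_fit))
c(logit = df.residual(logit_fit), loglinear = df.residual(loglin_fit))
```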
Loglinear models: Overview
Two-way tables: Loglinear approach
For two discrete variables, A and B, suppose a multinomial sample of total size n over the IJ cells of a two-way I × J contingency table, with cell frequencies $n_{ij}$ and cell probabilities $\pi_{ij}$ (estimated by $n_{ij}/n$).
The table variables are statistically independent when the cell (joint) probability equals the product of the marginal probabilities, Pr(A = i & B = j) = Pr(A = i) × Pr(B = j), or,
$$\pi_{ij} = \pi_{i+}\, \pi_{+j}.$$
An equivalent model in terms of expected frequencies, $m_{ij} = n\pi_{ij}$, is
$$m_{ij} = (1/n)\, m_{i+}\, m_{+j}.$$
This multiplicative model can be expressed in additive form as a model for $\log m_{ij}$,
$$\log m_{ij} = -\log n + \log m_{i+} + \log m_{+j}. \tag{1}$$
Loglinear models: Overview
Independence model
By analogy with ANOVA models, the independence model (1) can be expressed as
$$\log m_{ij} = \mu + \lambda^A_i + \lambda^B_j, \tag{2}$$
$\mu$ is the grand mean of $\log m_{ij}$
the parameters $\lambda^A_i$ and $\lambda^B_j$ express the marginal frequencies of variables A and B ("main effects"), typically defined so that $\sum_i \lambda^A_i = \sum_j \lambda^B_j = 0$ as in ANOVA
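A sketch of how µ and the zero-sum main effects in (2) can be recovered from fitted log frequencies, assuming the Hair × Eye margin of `HairEyeColor`. Under independence the log fitted values are exactly additive, so no interaction term is needed to rebuild them.

```r
tab <- margin.table(HairEyeColor, c(1, 2))
log_m <- log(outer(rowSums(tab), colSums(tab)) / sum(tab))   # log m_ij under [A][B]

mu       <- mean(log_m)            # grand mean of log m_ij
lambda_A <- rowMeans(log_m) - mu   # row (Hair) effects
lambda_B <- colMeans(log_m) - mu   # column (Eye) effects

c(sum_A = sum(lambda_A), sum_B = sum(lambda_B))   # zero-sum constraints

# Rebuild log m_ij = mu + lambda_A[i] + lambda_B[j]
rebuilt <- mu + outer(lambda_A, lambda_B, "+")
max(abs(rebuilt - log_m))
```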
Loglinear models: Overview
Saturated model
Dependence between the table variables is expressed by adding association parameters, $\lambda^{AB}_{ij}$, giving the saturated model,
$$\log m_{ij} = \mu + \lambda^A_i + \lambda^B_j + \lambda^{AB}_{ij} \equiv [AB] \equiv\; \sim A * B. \tag{3}$$
The saturated model fits the table perfectly ($\hat{m}_{ij} = n_{ij}$): there are as many parameters as cell frequencies. Residual df = 0.
A global test for association tests $H_0: \lambda^{AB}_{ij} = 0$.
If we reject $H_0$, which $\lambda^{AB}_{ij} \neq 0$?
For ordinal variables, the $\lambda^{AB}_{ij}$ may be structured more simply, giving tests for ordinal association.
Example: Independence
Generate a table of Education by Party preference, strictly independent
In the GLM approach, the vector of cell frequencies, $n = \{n_{ij}\}$, is specified to have a Poisson distribution with means $m = \{m_{ij}\}$ given by
$$\log m = X\beta$$
X is a known design (model) matrix, expressing the table factors.
β is a column vector containing the unknown λ parameters.
This is the same as the familiar matrix formulation of ANOVA/regression, except that
the response, log m, makes multiplicative relations additive;
the distribution is taken as Poisson rather than Gaussian (normal).
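A sketch of the design-matrix view, assuming the Hair × Eye margin of `HairEyeColor` and R's default treatment contrasts (so the λ's are parameterized differently from the zero-sum constraints above). The fitted Poisson means should satisfy log m = Xβ exactly.

```r
hec <- as.data.frame(margin.table(HairEyeColor, c(1, 2)))   # Hair, Eye, Freq

# Design matrix X for the independence model (treatment contrasts by default)
X <- model.matrix(~ Hair + Eye, data = hec)
dim(X)   # 16 cells x (1 + 3 + 3) parameters

fit <- glm(Freq ~ Hair + Eye, family = poisson, data = hec)

# Fitted means satisfy log m = X beta
m_from_X <- exp(drop(X %*% coef(fit)))
max(abs(m_from_X - fitted(fit)))
```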
Example: 2 x 2 table
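For a 2 × 2 table, the saturated model (3) with the usual zero-sum constraints can be represented as
$$\log \begin{pmatrix} m_{11} \\ m_{12} \\ m_{21} \\ m_{22} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} \mu \\ \lambda^A_1 \\ \lambda^B_1 \\ \lambda^{AB}_{11} \end{pmatrix}$$
Only the linearly independent parameters are represented: $\lambda^A_2 = -\lambda^A_1$, because $\lambda^A_1 + \lambda^A_2 = 0$, and so forth.
Association is represented by the parameter $\lambda^{AB}_{11}$.
One can show that $\lambda^{AB}_{11} = \frac{1}{4} \log(\theta)$, where $\log\theta$ is the log odds ratio.
Advantages of the GLM formulation: easier to express models with ordinal or quantitative variables, special terms, etc. Can also allow for overdispersion.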
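A numerical check of λ^AB_11 = ¼ log θ, assuming the Admit × Gender margin of `UCBAdmissions` as the 2 × 2 table. Sum-to-zero coding (`contr.sum`) in `glm()` gives the ±1 design used above, although the rows come out in the order m11, m21, m12, m22.

```r
# Admit x Gender 2 x 2 table, collapsed over Dept
tab2 <- margin.table(UCBAdmissions, c(1, 2))
df2  <- as.data.frame(tab2)

# Saturated model with zero-sum coding: level 1 -> +1, level 2 -> -1
sat <- glm(Freq ~ Admit * Gender, family = poisson, data = df2,
           contrasts = list(Admit = "contr.sum", Gender = "contr.sum"))
coef(sat)           # mu, lambda^A_1, lambda^B_1, lambda^AB_11 (4th coefficient)
model.matrix(sat)   # rows m11, m21, m12, m22 (Admit varies fastest)

log_theta <- log(tab2[1, 1] * tab2[2, 2] / (tab2[1, 2] * tab2[2, 1]))
c(lambda_AB11 = coef(sat)[[4]], quarter_log_OR = log_theta / 4)
```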
Assessing goodness of fit
Goodness of fit of a specified model may be tested by the likelihood ratio $G^2$,
$$G^2 = 2 \sum_i n_i \log\left(\frac{n_i}{\hat{m}_i}\right), \tag{4}$$
or the Pearson $X^2$,
$$X^2 = \sum_i \frac{(n_i - \hat{m}_i)^2}{\hat{m}_i}, \tag{5}$$
with degrees of freedom df = # cells − # estimated parameters.
E.g., for the model of independence, [A][B], df = IJ − [1 + (I − 1) + (J − 1)] = (I − 1)(J − 1).
The terms summed in (5) are the squared Pearson residuals; $G^2$ can likewise be written as a sum of squared deviance residuals.
Other measures balance goodness of fit against parsimony, e.g., Akaike's Information Criterion (smaller is better)
AIC = G2 − 2df or AIC = G2 + 2 # parameters
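Equations (4) and (5) computed directly for the independence model on the assumed Hair × Eye table from `HairEyeColor`, together with the residual df and the two AIC variants. The AIC forms differ only by the constant 2 × (# cells), so they rank models fit to the same table identically.

```r
tab <- margin.table(HairEyeColor, c(1, 2))
n_i <- as.vector(tab)
m_i <- as.vector(outer(rowSums(tab), colSums(tab)) / sum(tab))   # [Hair][Eye] fit

G2 <- 2 * sum(n_i * log(n_i / m_i))   # eq. (4); assumes no zero cells
X2 <- sum((n_i - m_i)^2 / m_i)        # eq. (5)

I <- nrow(tab); J <- ncol(tab)
n_par <- 1 + (I - 1) + (J - 1)        # mu + row effects + column effects
df <- length(n_i) - n_par             # = (I - 1) * (J - 1)

c(G2 = G2, X2 = X2, df = df)
c(AIC_a = G2 - 2 * df, AIC_b = G2 + 2 * n_par)
```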
R functions for loglinear models
chisq.test() and vcd::assocstats(): only χ² tests for two-way tables, not a model (no parameters)
MASS::loglm(): general loglinear models for n-way tables
loglm(formula, data, subset, na.action, ...)
glm(): all generalized linear models; loglinear with family = poisson
glm(formula, family = poisson, data, weights, subset, ...)
Formulas have the form: ~ A + B + ... (independence); ~ A*B + C (allow A*B association)
Both return an R object with named components: use names(object)
Both have print(), summary(), coef(), residuals(), plot() and other methods
Twoway tables
Example: Arthritis treatment
Data on effects of treatment for rheumatoid arthritis (in case form)
## Re-fitting to get frequencies and fitted values
##        Eye
## Hair    Brown  Blue Hazel Green
##   Black  4.00 -3.39 -0.49 -2.21
##   Brown  1.21 -2.02  1.31 -0.35
##   Red   -0.08 -1.85  0.82  2.04
##   Blond -7.33  6.17 -2.47  0.60
Mosaic displays: Hair color and eye color
[Figure: mosaic display of Hair color (Black, Brown, Red, Blond) × Eye color (Brown, Hazel, Green, Blue), shaded by residuals from independence.]
We know that hair color and eye color are associated (χ²(9) = 138.29). The question is how?
Dark hair goes with dark eyes, light hair with light eyes
Red hair, hazel eyes: an exception?
Effect ordering: rows/cols permuted by CA Dimension 1
⇒ Opposite corner pattern
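One common way to obtain a CA Dimension 1 ordering is the singular value decomposition of the standardized residuals; the sketch below assumes that approach on the Hair × Eye margin of `HairEyeColor` (the slide does not show the author's code). The sign of a CA dimension is arbitrary, so the sorted order may come out reversed.

```r
tab <- margin.table(HairEyeColor, c(1, 2))
P  <- unclass(tab) / sum(tab)   # correspondence matrix (proportions)
r  <- rowSums(P)                # row masses (Hair)
cm <- colSums(P)                # column masses (Eye)

# Standardized residuals: (p_ij - r_i c_j) / sqrt(r_i c_j)
S   <- (P - outer(r, cm)) / sqrt(outer(r, cm))
dec <- svd(S)

# Dimension-1 standard coordinates for rows and columns
hair_dim1 <- setNames(dec$u[, 1] / sqrt(r),  rownames(P))
eye_dim1  <- setNames(dec$v[, 1] / sqrt(cm), colnames(P))
names(sort(hair_dim1))   # candidate row order
names(sort(eye_dim1))    # candidate column order

# Total inertia (sum of squared singular values) times n equals Pearson X^2
sum(dec$d^2) * sum(tab)
```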
Three-way tables Saturated model
Three-way tables
Saturated model
For a 3-way table of size I × J × K for variables A, B, C, the saturated loglinear model includes associations between all pairs of variables, as well as a 3-way association term, $\lambda^{ABC}_{ijk}$:
$$\log m_{ijk} = \mu + \lambda^A_i + \lambda^B_j + \lambda^C_k + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{BC}_{jk} + \lambda^{ABC}_{ijk}. \tag{6}$$
One-way terms ($\lambda^A_i$, $\lambda^B_j$, $\lambda^C_k$): differences in the marginal frequencies of the table variables.
Two-way terms ($\lambda^{AB}_{ij}$, $\lambda^{AC}_{ik}$, $\lambda^{BC}_{jk}$) pertain to the partial association for each pair of variables, controlling for the remaining variable.
The three-way term, $\lambda^{ABC}_{ijk}$, allows the partial association between any pair of variables to vary over the categories of the third variable.
Fits perfectly, but doesn't explain anything, so we hope for a simpler model!
Three-way tables Reduced models
Three-way tables: Reduced models
Reduced models
Loglinear models are usually hierarchical: including a high-order term, such as $\lambda^{ABC}_{ijk}$, means that all of its low-order relatives are automatically included.
Thus, a short-hand notation for a loglinear model lists only the high-order terms, i.e., the saturated model (6) ≡ [ABC], which implies all two-way and one-way terms.
The usual goal is to fit the smallest model (fewest high-order terms) that is sufficient to explain/describe the observed frequencies.
This is similar to ANOVA/regression models with all possible interactions.
Three-way tables: Reduced models
Reduced models
For a 3-way table there are a variety of models between the mutual independence model, [A][B][C], and the saturated model, [ABC].
Each such model has an independence interpretation: A ⊥ B means the hypothesis that A is independent of B.
Table: Log-linear Models for Three-Way Tables
Model                       Model symbol    Interpretation
Mutual independence         [A][B][C]       A ⊥ B ⊥ C
Joint independence          [AB][C]         (A B) ⊥ C
Conditional independence    [AC][BC]        (A ⊥ B) | C
All two-way associations    [AB][AC][BC]    homogeneous assoc.
Saturated model             [ABC]           ABC interaction
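A sketch fitting each model type in the table to the 3-way `HairEyeColor` data with `MASS::loglm()`; the mapping A = Hair, B = Eye, C = Sex is an assumption made for illustration.

```r
library(MASS)   # loglm()

models <- list(
  mutual      = loglm(~ Hair + Eye + Sex, data = HairEyeColor),          # [A][B][C]
  joint       = loglm(~ Hair * Eye + Sex, data = HairEyeColor),          # [AB][C]
  conditional = loglm(~ Hair * Sex + Eye * Sex, data = HairEyeColor),    # [AC][BC]
  homogeneous = loglm(~ (Hair + Eye + Sex)^2, data = HairEyeColor),      # [AB][AC][BC]
  saturated   = loglm(~ Hair * Eye * Sex, data = HairEyeColor)           # [ABC]
)

gof <- t(sapply(models, function(m) c(G2 = m$lrt, df = m$df)))
round(gof, 2)
```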
Three-way tables: Model types
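Joint independence: (AB) ⊥ C allows the A*B association, but asserts no A*C and B*C associations
$$[AB][C] \equiv \log m_{ijk} = \mu + \lambda^A_i + \lambda^B_j + \lambda^C_k + \lambda^{AB}_{ij}$$
Conditional independence: A ⊥ B, controlling for C
$$[AC][BC] \equiv \log m_{ijk} = \mu + \lambda^A_i + \lambda^B_j + \lambda^C_k + \lambda^{AC}_{ik} + \lambda^{BC}_{jk}$$
Homogeneous association: all two-way associations, but each two-way association is the same over the levels of the other factor
$$[AB][AC][BC] \equiv \log m_{ijk} = \mu + \lambda^A_i + \lambda^B_j + \lambda^C_k + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{BC}_{jk}$$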
Collapsibility: Marginal and Conditional Associations
Q: When can we legitimately collapse a table, ABC, over some variable (C)?
A: When the marginal association of AB is the same as the conditional association, AB | C
Recall: Berkeley data
The marginal association of Admit and Gender, ignoring Dept, showed a strong association
The partial associations within Dept were mostly NS
This is an example of Simpson's paradox
Three-way tables: the AB marginal and AB | C conditional associations are the same if either
A and C are conditionally independent, A ⊥ C | B ≡ [AB][CB], or
B and C are conditionally independent, B ⊥ C | A ≡ [AB][AC]
⇒ no three-way interaction
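A sketch of the collapsibility question for the Berkeley data (assumed built-in `UCBAdmissions`): the marginal Admit × Gender odds ratio versus the conditional odds ratios within each Dept. Odds ratios above 1 favor men.

```r
# Odds ratio for a 2 x 2 Admit x Gender slice: odds(admit | Male) / odds(admit | Female)
odds_ratio <- function(t2) (t2[1, 1] * t2[2, 2]) / (t2[1, 2] * t2[2, 1])

# Marginal association: collapse over Dept
marginal <- odds_ratio(margin.table(UCBAdmissions, c(1, 2)))

# Conditional (partial) associations: one odds ratio per Dept
conditional <- apply(UCBAdmissions, 3, odds_ratio)

round(marginal, 2)
round(conditional, 2)
```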
Three-way tables Response vs. association
Response vs. Association models
In association models, the interest is just in which variables are associated, and how
Hair-eye data: [Hair Eye]? [Hair Sex]? [Eye Sex]? ⇒ fit the homogeneous association model (or the saturated model)
Test the individual terms, delete those which are NS
In response models, the interest is in which predictors are associated with the response
The minimal (null or baseline) model is the model of joint independence of the response (say, A) from all predictors, [A] [B C D ...]
Associations among the predictors are fitted exactly (not analyzed)
Similar to regression, where predictors can be arbitrarily correlated
E.g., Berkeley data: fit the baseline model [Admit] [Gender Dept]
Lack of fit ⇒ associations [Admit Gender] and/or [Admit Dept]
Three-way tables GOF & ANOVA tests
Goodness of fit tests
As noted earlier, the overall goodness of fit of a specified model may be tested by the likelihood ratio $G^2$, or the Pearson $X^2$,
$$G^2 = 2 \sum_i n_i \log\left(\frac{n_i}{\hat{m}_i}\right) \qquad X^2 = \sum_i \frac{(n_i - \hat{m}_i)^2}{\hat{m}_i},$$
with residual degrees of freedom ν = # cells − # estimated parameters.
These measure the lack of fit of a given model: a large value ⇒ a poor model
Both are approximately distributed as χ²(ν) (in large samples: all $\hat{m}_i > 5$)
E(χ²(ν)) = ν, so G²/ν (or X²/ν) measures lack of fit per degree of freedom (overdispersion)
But: how to compare or test competing models?
Nested models and ANOVA-type tests
Nested models
Two models, M1 and M2, are nested when one (say, M2) is a special case of the other
Model M2 (with ν2 df) fits a subset of the parameters of M1 (with ν1 df)
M2 is more restrictive, so it cannot fit better than M1: G²(M2) ≥ G²(M1)
The least restrictive model is the saturated model [ABC...] with G² = 0 and ν = 0
Therefore, we can test the difference in G² as a specific test of the added restrictions in M2 compared to M1. This test has an approximate χ² distribution with df = ν2 − ν1.
$$\begin{aligned} \Delta G^2 \equiv G^2(M_2 \mid M_1) &= G^2(M_2) - G^2(M_1) \\ &= 2 \sum_i n_i \log(\hat{m}_{i1}/\hat{m}_{i2}) \end{aligned} \tag{7}$$
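A sketch verifying the two forms of (7) with Poisson GLM fits, assuming the Berkeley data (`UCBAdmissions`) with M1 = [AD][AG][DG] and M2 = [A][DG]. `anova()` reports the same deviance difference.

```r
ucb_df <- as.data.frame(UCBAdmissions)

# M1: all two-way associations [AD][AG][DG]; M2 (nested in M1): [A][DG]
M1 <- glm(Freq ~ (Admit + Gender + Dept)^2, family = poisson, data = ucb_df)
M2 <- glm(Freq ~ Admit + Gender * Dept, family = poisson, data = ucb_df)

dG2_diff <- deviance(M2) - deviance(M1)                           # G2(M2) - G2(M1)
dG2_sum  <- 2 * sum(ucb_df$Freq * log(fitted(M1) / fitted(M2)))   # second line of (7)
ddf      <- df.residual(M2) - df.residual(M1)                     # nu2 - nu1

c(dG2_diff = dG2_diff, dG2_sum = dG2_sum, df = ddf,
  p = pchisq(dG2_diff, ddf, lower.tail = FALSE))

anova(M2, M1, test = "Chisq")   # same difference test
```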
Example: Berkeley admissions data
For the UC Berkeley data, with table variables [A]dmit, [D]ept and [G]ender, the following models form a nested chain
$$[A][D][G] \subset [A][DG] \subset [AD][AG][DG] \subset [ADG]$$
Table: Hierarchical G² tests for loglinear models fit to the UC Berkeley data
Only testing the decrease in G² from one model to the next
Here, each model is significantly better than the previous one
Joint vs. all 2-way: does Admit depend on Dept and/or Gender?
Absolute fit of the all 2-way model is not terrible. Investigate further!
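The hierarchical tests can be sketched with `MASS::loglm()` on the built-in `UCBAdmissions` table; this computes G², df and the successive differences for the chain above, without reproducing the original table's layout.

```r
library(MASS)   # loglm()

chain <- list(
  "[A][D][G]"    = loglm(~ Admit + Dept + Gender, data = UCBAdmissions),
  "[A][DG]"      = loglm(~ Admit + Dept * Gender, data = UCBAdmissions),
  "[AD][AG][DG]" = loglm(~ (Admit + Dept + Gender)^2, data = UCBAdmissions),
  "[ADG]"        = loglm(~ Admit * Dept * Gender, data = UCBAdmissions)
)
g2  <- sapply(chain, function(m) m$lrt)
dfs <- sapply(chain, function(m) m$df)

# Each row after the first tests the decrease in G2 from the previous model
data.frame(G2 = round(g2, 2), df = dfs,
           delta_G2 = round(c(NA, -diff(g2)), 2),
           delta_df = c(NA, -diff(dfs)))
```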
Fitting loglinear models
Fitting loglinear models in R
loglm(): data in contingency table form (MASS package)
loglm() is simpler for nominal variables
glm() allows a wider class of models and quantitative predictors (covariates)
gnm() fits models for structured association and generalized non-linear models
The vcdExtra package provides visualizations for all.
Example: Berkeley admission data
Fit the model of mutual independence using loglm()
Did departments differ in the total number of applicants?
Did men and women apply differentially to departments?
[Figure: mosaic display of Dept (A–F) × Gender (Male, Female) for the model (Dept)(Gender).]
Model [Dept] [Gender]: G²(5) = 1220.6.
Note: Departments are ordered A–F by overall rate of admission.
Men were more likely to apply to departments A and B; women were more likely to apply to depts C–F.
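A sketch reproducing the [Dept][Gender] fit on the Gender × Dept margin of `UCBAdmissions`, plus the observed-minus-expected applicant counts behind the note above.

```r
library(MASS)   # loglm()

dg <- margin.table(UCBAdmissions, c(2, 3))   # Gender x Dept applicants
fit_dg <- loglm(~ Gender + Dept, data = dg)
c(G2 = fit_dg$lrt, df = fit_dg$df)

# Observed minus expected applicants under [Dept][Gender]
expected <- outer(rowSums(dg), colSums(dg)) / sum(dg)
resid_counts <- dg - expected
round(resid_counts, 1)   # positive: more applicants than independence predicts
```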
Fitting loglinear models Mosaic displays
Mosaic displays for multiway tables
Generalizes to n-way tables: divide cells recursively
Can fit any log-linear model (e.g., 2-way, 3-way, ...)
For a 3-way table: [A][B][C], [AB][C], [AB][AC], ..., [ABC]
Each mosaic shows:
DATA (size of tiles)
(some) marginal frequencies (spacing → visual grouping)
RESIDUALS (shading): what associations have been omitted?
Visual fitting:
Pattern of lack-of-fit (residuals) → "better" model: smaller residuals
"Cleaning the mosaic" → "better" model: empty cells
Best done interactively!
The GLM approach allows fitting a special term for Dept A.
Note: These displays use standardized residuals: better statistical properties.
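One way to express a special term for Dept A in a Poisson GLM is an indicator for a single cell of Dept A's Admit × Gender subtable, which adds one Admit × Gender parameter for that department only. The variable `AG_deptA` is hypothetical and the baseline [AD][DG] is an assumption; the slide does not show its model code.

```r
ucb_df <- as.data.frame(UCBAdmissions)

# Hypothetical indicator: one cell of Dept A's 2 x 2 Admit x Gender subtable
ucb_df$AG_deptA <- as.numeric(ucb_df$Dept == "A" & ucb_df$Admit == "Admitted" &
                              ucb_df$Gender == "Male")

# [AD][DG]: admission depends on Dept only
base <- glm(Freq ~ Dept * Gender + Dept * Admit, family = poisson, data = ucb_df)
# Same model plus an Admit x Gender association within Dept A only
withA <- glm(Freq ~ Dept * Gender + Dept * Admit + AG_deptA,
             family = poisson, data = ucb_df)

c(G2_base = deviance(base), df_base = df.residual(base),
  G2_A = deviance(withA), df_A = df.residual(withA))
```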
Other variations: Double decker plots
Visualize dependence of one categorical (typically binary) variable on predictors
Formally: mosaic plots with vertical splits for all predictor dimensions, highlighting the response by shading
[Figure: doubledecker plot of Admit (Admitted, Rejected) by Dept (A–F) and Gender (Male, Female).]
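The quantities a doubledecker plot displays can be computed directly: the proportion admitted within each Gender × Dept cell (the shaded heights) and the number of applicants in each cell (the bar widths). A sketch, assuming `UCBAdmissions`.

```r
# Proportion admitted within each Gender x Dept cell
admit_rate <- prop.table(UCBAdmissions, margin = c(2, 3))["Admitted", , ]
round(admit_rate, 2)

# Applicants per Gender x Dept cell
margin.table(UCBAdmissions, c(2, 3))
```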
Fitting loglinear models 4-way example
4-way example: Survival on the Titanic
Data on the fate of passengers and crew on the RMS Titanic, a 4 × 2 × 2 × 2 frequency table:
Two types of zero cells:
structural zeros: could not occur (children in crew)
sampling zeros: did not happen to occur (children in 1st & 2nd class who died)
Structural zeros can cause problems: loss of df; 0/0 = NaN in χ² tests
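A quick check of both kinds of zero cells in R's built-in `Titanic` table.

```r
dim(Titanic)        # Class x Sex x Age x Survived: 4 x 2 x 2 x 2
sum(Titanic == 0)   # number of empty cells

# Structural zeros: no children among the crew
Titanic["Crew", , "Child", ]

# Sampling zeros: children in 1st and 2nd class who died
Titanic[c("1st", "2nd"), , "Child", "No"]
```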
Exploratory plots
One-way doubledecker plots against survival show what might be expected:
With S as response, the baseline model includes all associations among [CGA]
But this model asserts no associations of these with survival: G²(15) = 671.96, a very poor fit
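A sketch of the baseline model [CGA][S] with `MASS::loglm()`, using the dimension numbers 1 = Class, 2 = Sex, 3 = Age, 4 = Survived of the built-in `Titanic` table.

```r
library(MASS)   # loglm()

mod0 <- loglm(~ 1 * 2 * 3 + 4, data = Titanic)   # baseline [CGA][S]
c(G2 = mod0$lrt, df = mod0$df)

# The Pearson statistic may be NaN: the crew-child cells have zero
# fitted values, giving 0/0 terms
mod0$pearson
```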
Adding associations
# main effects of C, G, A on survival: [CGA][CS][GS][AS]
library(MASS)   # loglm()
library(vcd)    # mosaic() method for loglm objects
mod2 <- loglm(~ 1*2*3 + (1+2+3)*4, data = Titanic)
mosaic(mod2, main = "Model [CGA][CS][GS][AS]")
[Figure: mosaic display for Model [CGA][CS][GS][AS] of the Titanic data (Class, Sex, Age, Survived), shaded by Pearson residuals (legend −3.4 to 4.0); p-value < 2e−16.]
This model allows associations of each of C, G, A with Survived
G²(10) = 112.57, still not good
Pattern of residuals suggests 2-way interactions (3-way terms):
"Women & children first": suggests a term [GAS]
Allow interactions of Class with Gender [CGS] and Class with Age [CAS]
mod3, [CGA][CGS][CAS], looks best by AIC and BIC, and also shows NS lack of fit!
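A sketch comparing mod2 with the suggested mod3 = [CGA][CGS][CAS], using the same numeric-index formula style as the code above. The df follow from counting parameters; `loglm()` makes no adjustment for the zero cells.

```r
library(MASS)   # loglm()

mod2 <- loglm(~ 1 * 2 * 3 + (1 + 2 + 3) * 4, data = Titanic)          # [CGA][CS][GS][AS]
mod3 <- loglm(~ 1 * 2 * 3 + 1 * 2 * 4 + 1 * 3 * 4, data = Titanic)    # [CGA][CGS][CAS]

rbind(mod2 = c(G2 = mod2$lrt, df = mod2$df),
      mod3 = c(G2 = mod3$lrt, df = mod3$df))

# Likelihood-ratio test of the added [CGS] and [CAS] terms
dG2 <- mod2$lrt - mod3$lrt
pchisq(dG2, mod2$df - mod3$df, lower.tail = FALSE)
```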
Model interpretation
Regardless of Gender and Age, lower Class ⇒ decreased survival
Differences in survival by Class were moderated by both Gender and Age
term [CGS]: Women in 3rd class did not have an advantage, while men in 1st class did vs. other classes
term [CAS]: No children in 1st or 2nd class died, but nearly 2/3 in 3rd class did
Summary:
Not so much “women and children first” as “women and children, ordered by class, and 1st class men”
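The [CAS] statement can be checked from the margins of `Titanic` directly (a sketch; the crew has no children, so its proportion is 0/0 = NaN).

```r
csa <- margin.table(Titanic, c(1, 3, 4))   # Class x Age x Survived (summed over Sex)

# Proportion of children who died, by Class (NaN for the crew: no children)
died_child <- prop.table(csa[, "Child", ], 1)[, "No"]
round(died_child, 3)

# Proportion surviving by Class and Sex (adults and children together)
round(prop.table(margin.table(Titanic, c(1, 2, 4)), c(1, 2))[, , "Yes"], 2)
```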
Sequential plots and models
Sequential plots and models
Mosaic for an n-way table → hierarchical decomposition of association
Joint cell probabilities are decomposed as
$$p_{ijk\ell\cdots} = \underbrace{\overbrace{p_i \times p_{j|i}}^{\{v_1 v_2\}} \times p_{k|ij}}_{\{v_1 v_2 v_3\}} \times p_{\ell|ijk} \times \cdots \times p_{n|ijk\cdots}$$
First 2 terms → mosaic for v1 and v2
First 3 terms → mosaic for v1, v2 and v3
...
Roughly analogous to sequential fitting in regression: X1, X2|X1, X3|X1X2, ...
The order of variables matters for interpretation
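A numerical sketch of the decomposition for the 3-way `HairEyeColor` table, ordering the variables Hair, Eye, Sex: the product of the marginal and conditional probabilities rebuilds the joint probabilities.

```r
n <- HairEyeColor     # Hair x Eye x Sex
p <- prop.table(n)    # joint p_ijk

p_i    <- prop.table(margin.table(n, 1))            # p_i
p_j_i  <- prop.table(margin.table(n, c(1, 2)), 1)   # p_{j|i}
p_k_ij <- prop.table(n, c(1, 2))                    # p_{k|ij}

# p_ijk = p_i * p_{j|i} * p_{k|ij}
rebuilt <- sweep(sweep(p_k_ij, c(1, 2), p_j_i, "*"), 1, p_i, "*")
max(abs(rebuilt - p))
```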
Sequential plots and models
Sequential models of joint independence → additive decomposition of the total association, G²
Sequential plots and models: Example
Hair color × Eye color marginal table (ignoring Sex)
[Figure: mosaic display of Hair color × Eye color for the model (Hair)(Eye), G²(9) = 146.44.]
Sequential plots and models: Example
3-way table, Joint Independence Model [Hair Eye] [Sex]
[Figure: mosaic display of Hair color × Eye color × Sex (M, F) for the model (HairEye)(Sex), G²(15) = 19.86.]
Sequential plots and models: Example
3-way table, Mutual Independence Model [Hair] [Eye] [Sex]
[Figure: mosaic display of Hair color × Eye color × Sex (M, F) for the model (Hair)(Eye)(Sex), G²(24) = 166.30.]
Sequential plots and models: Example
[Figure: the three mosaic displays side by side, showing the additive decomposition of G²]
Marginal: [Hair] [Eye], G²(9) = 146.44
+ Joint: [Hair Eye] [Sex], G²(15) = 19.86
= Mutual (total): [Hair] [Eye] [Sex], G²(24) = 166.30
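A sketch of the additive decomposition with `MASS::loglm()`: the G² for [Hair][Eye] on the marginal table plus the G² for [Hair Eye][Sex] should equal the G² for mutual independence, and likewise for df.

```r
library(MASS)   # loglm()

marg   <- loglm(~ Hair + Eye, data = margin.table(HairEyeColor, c(1, 2)))   # [Hair][Eye]
joint  <- loglm(~ Hair * Eye + Sex, data = HairEyeColor)                    # [Hair Eye][Sex]
mutual <- loglm(~ Hair + Eye + Sex, data = HairEyeColor)                    # [Hair][Eye][Sex]

rbind(marginal = c(G2 = marg$lrt, df = marg$df),
      joint    = c(G2 = joint$lrt, df = joint$df),
      sum      = c(G2 = marg$lrt + joint$lrt, df = marg$df + joint$df),
      mutual   = c(G2 = mutual$lrt, df = mutual$df))
```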
Sequential plots and models Applications
Applications
Response models
When one variable, R, is a response and E1, E2, ... are explanatory, the baseline model is the model of joint independence, [E1, E2, ...][R]
Sequential mosaics then show the associations among the predictors
The last mosaic shows all associations with R
Better-fitting models will need to add associations of the form [Ei R], [Ei Ej R], ...
Causal models
Sometimes there is an assumed causal ordering of variables:
A → B → C → D
Each path of arrows: A → B, A → B → C is a sequential model of joint independence: [A][B], [AB][C], [ABC][D].
Testing these decomposes all joint probabilities
Example: Marital status, pre- and extra-marital sex
? studied divorce patterns in relation to premarital and extramarital sex, a $2^4$ table
## Gender          Women                 Men
## PremaritalSex   Yes        No         Yes       No
## ExtramaritalSex Yes  No    Yes  No    Yes  No   Yes  No
## MaritalStatus
## Divorced        17   54    36   214   28   60   17   68
## Married          4   25     4   322   11   42    4  130
Sub-models:
[G][P]: do men and women differ in pre-marital sex?
[GP][E]: given G & P, are there differences in extra-marital sex?
[GPE][M]: given G, P & E, are there differences in divorce?
Example: Marital status, pre- and extra-marital sex
Order the table variables as G→ P → E → M
PreSex <- aperm(PreSex, 4:1) # order variables G, P, E, M
Fit each sequential model to the marginal sub-table
mod.1 <- loglm(~ Gender + PremaritalSex, data = margin.table(PreSex, 1:2))
mod.2 <- loglm(~ Gender * PremaritalSex + ExtramaritalSex, data = margin.table(PreSex, 1:3))
...
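A sketch of the full sequential analysis. Because PreSex lives in the vcd package, the table is rebuilt here from the printed counts above, with the dimension order (MaritalStatus, ExtramaritalSex, PremaritalSex, Gender) assumed so that `aperm(..., 4:1)` gives G, P, E, M. The three sequential G² values should add up to the G² of mutual independence.

```r
library(MASS)   # loglm()

# Counts from the printed table; MaritalStatus varies fastest
PreSex <- array(c(17, 4, 54, 25, 36, 4, 214, 322, 28, 11, 60, 42, 17, 4, 68, 130),
  dim = c(2, 2, 2, 2),
  dimnames = list(MaritalStatus   = c("Divorced", "Married"),
                  ExtramaritalSex = c("Yes", "No"),
                  PremaritalSex   = c("Yes", "No"),
                  Gender          = c("Women", "Men")))
PreSex <- as.table(aperm(PreSex, 4:1))   # order variables G, P, E, M

mod.1 <- loglm(~ Gender + PremaritalSex, data = margin.table(PreSex, 1:2))
mod.2 <- loglm(~ Gender * PremaritalSex + ExtramaritalSex,
               data = margin.table(PreSex, 1:3))
mod.3 <- loglm(~ Gender * PremaritalSex * ExtramaritalSex + MaritalStatus,
               data = PreSex)
mutual <- loglm(~ Gender + PremaritalSex + ExtramaritalSex + MaritalStatus,
                data = PreSex)

g2  <- c(mod.1$lrt, mod.2$lrt, mod.3$lrt)
dfs <- c(mod.1$df, mod.2$df, mod.3$df)
c(sum_G2 = sum(g2), mutual_G2 = mutual$lrt, sum_df = sum(dfs), mutual_df = mutual$df)
```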
How does the association between two (or more) variables vary over levels of other variables?
Mosaic plots for the main variables show partial association at each level of the other variables.
E.g., Hair color, Eye color subset by Sex
[Figure: partial mosaic displays of Hair color × Eye color, in separate panels for Sex: Male and Sex: Female, shaded by residuals from independence within each panel.]
Marginal and partial displays Partial association
Partial association, Partial mosaics
Stratified analysis: conditional decomposition of G2
Fit models of partial (conditional) independence, A ⊥ B | C_k, at each level of (controlling for) C.
⇒ partial G²s add to the overall G² for conditional independence, A ⊥ B | C
$$G^2_{A \perp B \mid C} = \sum_k G^2_{A \perp B \mid C(k)}$$
Table: Partial and Overall conditional tests, Hair ⊥ Eye | Sex
Model                  df    G²         p-value
[Hair][Eye] | Male      9    44.445     0.000
[Hair][Eye] | Female    9    112.233    0.000
[Hair][Eye] | Sex      18    156.668    0.000
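A sketch of the stratified decomposition with `MASS::loglm()` on `HairEyeColor`: fit [Hair][Eye] separately within each Sex, then compare the sum with the conditional independence model [Hair Sex][Eye Sex] fitted to the full table.

```r
library(MASS)   # loglm()

# Hair x Eye independence within each level of Sex
partial <- sapply(dimnames(HairEyeColor)$Sex, function(s) {
  fit <- loglm(~ Hair + Eye, data = as.table(HairEyeColor[, , s]))
  c(G2 = fit$lrt, df = fit$df)
})
partial

# Conditional independence Hair _|_ Eye | Sex, i.e. [Hair Sex][Eye Sex]
overall <- loglm(~ Hair * Sex + Eye * Sex, data = HairEyeColor)
c(sum_partial_G2 = sum(partial["G2", ]), overall_G2 = overall$lrt,
  sum_partial_df = sum(partial["df", ]), overall_df = overall$df)
```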