Doing HLM using SAS Doing HLM using SAS PROC MIXED PROC MIXED Kaz Kaz www.estat.us Feb 2005 Feb 2005
Mar 26, 2015
Doing HLM using SAS Doing HLM using SAS PROC MIXEDPROC MIXED
Doing HLM using SAS Doing HLM using SAS PROC MIXEDPROC MIXED
KazKazwww.estat.us
Feb 2005 Feb 2005
My points
(1) Easy to compare HLM and other models that are not HLM; thus, helpful. This is because PROC MIXED lets you run models that are not HLM.
(2) Easy to understand what makes HLM HLM. In SAS, what is not essential to HLM is done outside PROC MIXED (e.g., centering)
OLS vs. HLM in PROC MIXED.
The difference is a RANDOM statement. • OLS regression syntax
PROC MIXED ;MODEL Y= X;Run;
• HLM syntax PROC MIXED;
Model Y=X;random intercept X/ subject=school;Run;
(1)
OLS
Y_jk = b0 + error_jk
(2)
HLMLevel1: Y_jk=b0 + error_jkLevel2: b0=g0 + error_kor Y_jk=b0 + error_jk + error_k
Again, turning a simple linear model into HLM
(1)PROC MIXED;
Model Y=X W;Run;(2)
PROC MIXED;Model Y=X W;random intercept X W/subject=GroupID;Run;
(3)Random statement below reads:I request that the intercept, as wellas the effects of X and W beEstimated for each subject whichcan be identified by GroupID.
How to write SAS PROC MIXED syntax: Intuitive way
(1) Write all the variable names at the model statement.
model Y=X W;
(2) Decide which variables’ effect you want to estimate by schools
random intercept X W/subject=school;
More careful way1. Start from level-specific specification.
e.g., level1:y=b0 + b1*X + error_ij level2: b0=g00 + g01*W + error_0j level2: b1=g10 + g11*W + error_1j
2. Insert level-2 equations into level-1 equations. 3. Write the variable names involved in model statement.4. Find “random components” (written in Roman alphabets)
RULE1: Put “intercept” in the random statement to accommodate higher level errors.RULE2: If the name of any variables sits right next to level-2 error with an asterisk (e.g., X*level-2 error), put those variable names in the random statement.(RULE3:No worry about residual. It is set by default.)
Example 1 Anova Model
Level1: Y_ij=b0j + Residual_ijLevel2: b0j= g00 + U_0j Y= g00 + U_0j + Residual
proc mixed ;class group;model Y= ;random intercept/subject=school;run;
I said:RULE1: Put “intercept” in the random statement to accommodate higher level errors.RULE2: If the name of any variables sits right next to level-2 error with an asterisk (e.g., X*level-2 error), put those variable names in the random statement.ONLY RULE1 relevant in this model.
Example 2 Slope as outcome models
Level1: Y_ij=b0j + b1j*X + Residual_ijLevel2: b0j= g00 + g01*W + U_0jLevel2: b1j=g10+ g11*W + U_1j Y= g00 + g01*W + g10*X + g11*W*X
+ U_1j*X + U_0j + Residual
proc mixed ;class group;model Y= W X W*X;random intercept X/subject=school;run;
What were..RULE1?RULE2?
How to do substitution: Cheating using HLM software!
PUSH MIXED buttonto get a little window like this.
How to do substitution by hand
Level1: Y_ij=b0j + b1j*X + Residual_ijLevel2: b0j= g00 + t g01*W + U_0jLevel2: b1j=gt10+ g11*W + U_1j
1. Insert higher level equations into the level-1 equation.
Y=[g00 + g01*W + U_0j] + [g10+g11*W + U_1j]*X + Residual_ij
2. Take out the brackets--> Y=g00 + g01*W + U_0j + g10*X +g11*W*X + U_1j *X + Residual_ij
3. Notice which parts are structural part and which parts are random components. Y=g00 + g01*W + g10*X +g11*W*X + U_1j *X + U_0j + Residual_ij
proc mixed ;model W X W*X;random intercept X /subject=school;run;
What were rule1 and rule 2?
Fixed Effects or Random EffectsOLS regression is a fixed effect model
PROC MIXED;Model Y=X;Run;
OLS regression is a model with fixed effects. So in a way OLS is a special case of HLM. This is an awfully inflexible model that does not consider the existence of various sources of errors.
HLM PROC MIXED;Model Y=X;random intercept X/ subject=groupID;Run;
If a researcher thinks the effect of X (and the intercept) is different by groups, so we should treat these coefficients as random effects.
Benefit 1 of using random effectConceptual one
Useful to think about Micro-Macro problems(1)Student: Math score=b0 + b1*parents’ education level +
…. + errorCountry:b1=g00 + g01*SELECTION + error
(2)Classroom: teacher perception of math ability of class=b0 + b1*average parents’ education level
+b2*average math score +b3*noise + error
Country:b1=g10 + g11*National Exam + errorb2=g20 + g21*National Exam + errorb3=g30 + g31*National Exam + error
Benefit 2: Statistical benefit
• Statistical Benefits– In deriving a grand mean (re: the effect of X or an intercept) HLM does
“shrinkage.” This pulls inaccurate group means towards the grand mean, so we can reduce the influence of outliners if their estimates are inaccurate (i.e., having large error variance and/or coming from a small number of observations within each group unit)
Shrunk School mean=reliability*school meanwhere reliability is a function of N of observation in a group unit and variance. (R&B HLM book, p. 48)
Quiz: 1) what happens to a school whose reliability is 1?2) What happens if all schools are 1 on reliability?3) What happens if all schools are .5 on reliability?
Quick decision rule Random or fix
• Do I open the door at 11PM?– Literature– Theory– Exploratory analysis (let’s see what
happens.)
Moore complicated: Two step decisions regarding random effects
(I need your help in phrasing this.)
• Step 1: Effect different by school?
• Step 2: Random or Fixed?– Fixed: Use a series of dummy
variables (in reality too tedious)– Random: Shrinkage applies and get a
precision guided grand mean
ExampleStudent Engagement
study using ESM
by Uekawa, Borman, and Lee
Engagement Level (Rasch model composite)
• When you were signaled the first time today, SD D A SA
• •I was paying attention……………………….. O O O O • •I did not feel like listening…………………… O O O O• •My motivation level was high……………….. O O O O • •I was bored………………………………….. O O O O• •I was enjoying class………………………… O O O O• •I was focused more on class than anything else O O
O O• •I wished the class would end soon………… . O O O O• •I was completely into class ………………… O O O O
The MEANS Procedure
Analysis Variable : engagement engagement
N Mean Std Dev Minimum Maximum ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 2316 0.1167889 10.0106694 -31.5283511 26.4164991 ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
3-level HLM• Level1: Repeated Measures (10 beeps)• Level2: Students (10 kids from a class)• Level3: courses (34 courses, Monday to
Friday)
3-level HLMLibname here "C:\";/*This is three level model*/proc mixed data=here.esm covtest
noclprint;class IDclass IDstudent;model engagement= /solution ddfm=kr;random intercept /sub=IDstudent(IDclass);random intercept /sub=IDclass ; run;
Quiz: how can we make this a 2-level hlm?
PROC MIXED statement
proc mixed data=here.esm covtest noclprint;
• “covtest” does a test for covariance components (whether variances are significantly larger than zero.). The reason why you have to request such a simple thing is that COVTEST is not based on chi-square test that one would use for a test of variance. It uses instead t-test or something that is not really appropriate. Shockingly, SAS has not corrected this problem for a while. Anyways, because SAS feels bad about it, it does not want to make it into a default option, which is why you have to request this. Not many people know this and I myself could not believe this. So I guess that means that we cannot really believe in the result of COVTEST and must use it with caution.
• When there are lots of group units, use NOCLPRINT to suppress the printing of group names.
CLASS statementclass IDclass IDstudent Hisp;
• We throw in the variables that we want SAS to treat as categorical variables. Variables that are characters (e.g., city names) must be on this line (it won’t run otherwise). Group IDs, such as IDclass in my example data, must be also in these lines; otherwise, it won’t run. Variables that are numeric but dummy-coded (e.g., black=1 if black;else 0) don’t have to be in this line, but the outputs will look easier if you do.
• One thing that is a pain in the neck with CLASS statement is that it chooses a reference category by alphabetical order. Whatever group in a classification variable that comes the last when alphabetically ordered will be used as a reference group. We can control this by data manipulation. For example, if gender=BOY or GIRL, then I tend to create a new variable to make it explicit that I get girl to be a reference group:If gender=”Boy” then gender2=”(1) Boy”;If gender=”Girl” then gender2=”(2) Girl”;
MODEL statementmodel engagement= /solution ddfm=kr;
ddfm=kr specifics the ways in which the degree of freedom is calculated. It seems most close to the degree of freedom option used by Bryk, Raudenbush, and Congdon’s HLM program.
Could be computationally very heavy if a model
is complicated. ddfm=bw would run faster, though DF would be wrong.
Random statementrandom intercept X/sub=IDstudent(IDclass);
random intercept X/sub=IDclass ;
We can estimate variance of slopes for categorical variables using “group=“ option --- without necessarily making them into dummy variables.
random intercept race /sub=IDclass group=race;
(instead of “random intercept black white hispanic/sub=IDclass;”)
Libname here "G:\SAS";proc mixed data=here.esm covtest noclprint;weight precision_weight;class IDclass IDstudent;model engagement= /solution ddfm=kr;random intercept /sub=IDstudent(IDclass); random intercept /sub=IDclass ;run;
• The Mixed Procedure
• Covariance Parameter Estimates
• Standard Z• Cov Parm Subject Estimate Error Value Pr Z
• Intercept IDstudent(IDclass) 23.3556 2.5061 9.32 <.0001• Intercept IDclass 5.3168 2.0947 2.54 0.0056• Residual 31.9932 1.0282 31.12 <.0001
Solution for Fixed Effects
Standard Effect Estimate Error DF t Value Pr > |t|
Intercept -1.2315 0.5081 30.2 -2.42 0.0216
Engagement=b0 + residual + e_student + e_course
R e s i d u a l5 3 %
S t u d e n t3 8 %
C o u r s e9 %
R e s i d u a l S t u d e n t C o u r s e
MODEL 1RQ: How does the engagement score varies by students and class?
Libname here "G:\SAS";proc mixed data=here.esm covtest noclprint;weight precision_weight;class IDclass IDstudent subject;model engagement= hisp /solution ddfm=kr;random intercept /sub=IDstudent(IDclass);random intercept hisp /sub=IDclass ;run;
Solution for Fixed Effects
Standard Effect Estimate Error DF t Value Pr > |t|
Intercept -0.6287 0.5101 33.1 -1.23 0.2265 hisp -2.2113 1.0031 17.7 -2.20 0.0410
Covariance Parameter Estimates
Standard Z Cov Parm Subject Estimate Error Value Pr Z
Intercept IDstudent(IDclass) 22.4743 2.4712 9.09 <.0001 Intercept IDclass 3.6944 1.7791 2.08 0.0189 hisp IDclass 5.9656 5.3290 1.12 0.1315 Residual 31.9871 1.0275 31.13 <.0001
MODEL 2
RQ: What is the effect of being Hispanic students?And is the effect vary by class?
MODEL 3
proc mixed data=here.esm covtest noclprint;weight precision_weight;class IDclass IDstudent subject;model engagement= hisp math hisp*math /solution ddfm=kr;random intercept /sub=IDstudent(IDclass);random intercept hisp /sub=IDclass ;run;
Solution for Fixed Effects
Standard Effect Estimate Error DF t Value Pr > |t|
Intercept -0.3249 0.7061 34.2 -0.46 0.6484 hisp -4.1236 1.4562 14.5 -2.83 0.0129 math -0.6081 1.0108 33.7 -0.60 0.5515 hisp*math 3.3305 1.9233 15.2 1.73 0.1035
The Mixed Procedure
Covariance Parameter Estimates
Standard Z Cov Parm Subject Estimate Error Value Pr Z
Intercept IDstudent(IDclass) 22.6987 2.5020 9.07 <.0001 Intercept IDclass 3.5328 1.7254 2.05 0.0203 hisp IDclass 4.2309 5.0161 0.84 0.1995 Residual 31.9761 1.0269 31.14 <.0001
Hispanic alienation phenomenon—related to subject? Math versus Science
MODEL 3 Solution for Fixed Effects
Standard Effect Estimate Error DF t Value Pr > |t|
Intercept -0.3249 0.7061 34.2 -0.46 0.6484 hisp -4.1236 1.4562 14.5 -2.83 0.0129 math -0.6081 1.0108 33.7 -0.60 0.5515 hisp*math 3.3305 1.9233 15.2 1.73 0.1035
The Mixed Procedure
Covariance Parameter Estimates
Standard Z Cov Parm Subject Estimate Error Value Pr Z
Intercept IDstudent(IDclass) 22.6987 2.5020 9.07 <.0001 A Intercept IDclass 3.5328 1.7254 2.05 0.0203 B hisp IDclass 4.2309 5.0161 0.84 0.1995 -> C Residual 31.9761 1.0269 31.14 <.0001
Level1:engagement=b0+ b1*Hispanic + residual
Level2: b0=g00 + A
Level2: b1=g10
Level3: g00=t_000 + t_100*Math + B
Level3: g10=t_100 + t_101*Math + C
Level1:engagement=t_000 + t_100*Math + t_100*Hispanic + t_101*Math*Hispanic
+ C*Hispanic + B + A + residual
VERSUS
Which is easy to understand?
In HLM software In SAS PROC MIXEDLevel-1 Intercept DisappearsLevel-2 Intercept DisappearsLevel-3 Intercept InterceptLevel-1 Error ResidualLevel-2 Error Random effects Level-3 Error Random effects
HLM way
Level1:engagement=b0+ b1*Hispanic + residual
Level2: b0=g00 + A
Level2: b1=g10
Level3: g00=t_000 + t_100*Math + B
Level3: g10=t_100 + t_101*Math + C
PROC MIXED way Level1:engagement=t_000 + t_100*Math + t_100*Hispanic + t_101*Math*Hispanic
+ C*Hispanic + B + A + residual
Why do we center variables?
Level1:engagement=t_000 + t_100*Math + t_100*Hispanic + t_101*Math*Hispanic
+ C*Hispanic + B + A + residual Imagine we have to report to teachers
their students’ average engagement score.We want to use B + t_000. To be clear aboutMeaning of t_000 part, we “could” center variables,if it makes sense.
What about Centering?
In SAS, we use PROC STANDARD to do centering and this is outside of PROC MIXED. When I learned this, I thought, “I have done it before!” because centering is similar to the concept of Z-scores.
This is GROUP MEAN CenteringProc standard data=X mean=0;by GroupID;var X;Run;
This is GRAND MEAN Centeringproc standard data=X mean=0;var X;Run;
By the way, just for your information, this is to create Z-scoresproc standard data=X mean=0 STD=1; var X;Run;
When you use SAS PROC MIXED, you notice Centering is not really a topic that is specific to HLM—because it is done outside PROC MIXED.
What does it mean to center dummy variable, like gender?
1.To adjust for gender composition.2.
-Without it, the intercept = either male or female
-With it, the intercept is adjusted for gender composition.
3. See my Excel Presentation if we have time. www.estat.us/sas/centering.xls
ENDTo go back to my HLM pagewww.estat.us/id38.html