MULTIPLE REGRESSION AND MULTILEVEL MODELLING. Workshop, PISA research conference, Kiel, Sep 2009. Eveline Gebhardt, Australian Council for Educational Research

Multi Level Modelling&Weights Workshop Kiel09

Jun 21, 2015


egebhardt72

A workshop on using sampling weights in multi-level modelling
Page 1: Multi Level Modelling&Weights Workshop Kiel09

1 MULTIPLE REGRESSION AND MULTILEVEL MODELLING

Workshop PISA research conference, Kiel, Sep 2009

Eveline Gebhardt, Australian Council for Educational Research

Please check if you have 4 files in the folder C:\KIEL\

Please start SPSS now

Page 2: Multi Level Modelling&Weights Workshop Kiel09

2 INTRODUCTION

1. Brief overview of multiple regression analysis

2. Multiple regression using PISA data

3. Brief overview of multilevel modelling

4. Multilevel modelling using PISA data

5. Differences between the two types of analyses

Page 3: Multi Level Modelling&Weights Workshop Kiel09

3 OVERVIEW MULTIPLE REGRESSION

Section 1

Page 4: Multi Level Modelling&Weights Workshop Kiel09

4 SIMPLE REGRESSION MODEL

Predicting the dependent variable using a linear relationship with the independent variables

Regression analysis with one independent variable: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

β0 is the intercept (the value of Ŷi when Xi=0)

β1 is the slope of the line that minimises the sum of the squared residuals εi
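As a minimal sketch of the slide's definition, the intercept and slope that minimise the sum of squared residuals can be computed from the closed-form least-squares formulas. The data values and the function name `fit_simple_regression` are made up for illustration and are not part of the workshop material.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Return (b0, b1) minimising the sum of squared residuals of y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept: predicted value when x = 0
    return b0, b1

# Hypothetical data lying exactly on the line y = 2 + 3x
x = [0.0, 1.0, 2.0, 3.0]
y = [2.0, 5.0, 8.0, 11.0]
b0, b1 = fit_simple_regression(x, y)
```

Because the made-up points lie exactly on a line, the fitted intercept and slope recover the 2 and 3 used to generate them.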

Page 5: Multi Level Modelling&Weights Workshop Kiel09

5 [Figure: regression line with the intercept β0 and the slope β1 marked]

Page 6: Multi Level Modelling&Weights Workshop Kiel09

6 MULTIPLE REGRESSION MODEL

[Figure: Ŷi, with panels labelled r = 0 and r ≠ 0]

Page 7: Multi Level Modelling&Weights Workshop Kiel09

7 MULTIPLE REGRESSION MODEL

Page 8: Multi Level Modelling&Weights Workshop Kiel09

8 MULTIPLE REGRESSION WITH PISA DATA

Section 2

Page 9: Multi Level Modelling&Weights Workshop Kiel09

9 PISA DATA: PLAUSIBLE VALUES

Plausible values for cognitive performance

5 randomly drawn values from a student’s most likely ability range (posterior distribution)

Unbiased population estimates (even with one PV)

Imputation variance (measurement error)

NEVER average the plausible values! Instead, average 5 statistics (means, regression coefficients, etc.)
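The rule above can be illustrated with a small unweighted sketch. The student records are invented, `PV1SCIE` is the variable name that appears later in these slides, and the other four names are assumed to follow the same pattern. The statistic here is a variance, because for a simple mean the two approaches happen to coincide.

```python
from statistics import mean, pvariance

# Hypothetical students, each with five plausible values for science.
PV_NAMES = ["PV1SCIE", "PV2SCIE", "PV3SCIE", "PV4SCIE", "PV5SCIE"]
students = [
    dict(zip(PV_NAMES, [500.0, 510.0, 495.0, 505.0, 490.0])),
    dict(zip(PV_NAMES, [450.0, 440.0, 455.0, 445.0, 460.0])),
]

# Recommended: compute the statistic once per plausible value, then average
# the five statistics.
per_pv = [pvariance([s[pv] for s in students]) for pv in PV_NAMES]
correct = mean(per_pv)

# Not recommended: average the plausible values per student first.
student_means = [mean(s[pv] for pv in PV_NAMES) for s in students]
wrong = pvariance(student_means)
```

In this toy example the variance based on averaged plausible values (`wrong`) is smaller than the average of the five variances (`correct`), which is the kind of bias the slide warns about.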

Page 10: Multi Level Modelling&Weights Workshop Kiel09

10 PISA DATA: STUDENT WEIGHTS

Final student weight: the number of students represented in the population by each student

The inverse of the product of the probability of selecting the student’s school and the probability of selecting the student given that the school is selected

Includes non-response and post-stratification adjustments and trimming
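A short worked example of the base weight described above, before any adjustment or trimming. The selection probabilities and the function name `base_student_weight` are hypothetical.

```python
def base_student_weight(p_school, p_student_given_school):
    """Inverse of the overall probability that a student ends up in the sample."""
    return 1.0 / (p_school * p_student_given_school)

# Hypothetical: the school is sampled with probability 0.05, and 35 of its
# 140 eligible students are sampled.
w = base_student_weight(0.05, 35 / 140)
```

With these made-up numbers each sampled student represents 80 students in the population. The non-response and post-stratification adjustments and the trimming mentioned on the slide are not modelled here.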

Page 11: Multi Level Modelling&Weights Workshop Kiel09

11 PISA DATA: REPLICATE WEIGHTS

80 BRR replicate weights with Fay’s k = 0.5

Used to compute sampling variance

Computation of sampling variance using BRR weights:

Takes two-stage sampling method into account

Takes stratification into account

Is identical for any statistic

Page 12: Multi Level Modelling&Weights Workshop Kiel09

12 ERROR VARIANCE

Error variance is a combination of the sampling variance and the imputation variance (measurement error)

Imputation variance can only be estimated when using a set of plausible values

Imputation variance is small compared to the sampling variance

Standard error is the square root of the error variance

Page 13: Multi Level Modelling&Weights Workshop Kiel09

13 COMPUTATION OF STANDARD ERROR

Error variance

Sampling variance

Imputation variance

Standard error is the square root of the error variance
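The sketch below shows one way to carry out this computation. It assumes the usual PISA conventions as I understand them: the sampling variance is the sum of squared deviations of the replicate estimates from the full-sample estimate, divided by G(1 - k)² (which is 20 for 80 replicates and k = 0.5), and the imputation variance is multiplied by (1 + 1/M) before it is added. The function names and the toy numbers are hypothetical.

```python
from math import sqrt
from statistics import mean

def sampling_variance(full_estimate, replicate_estimates, fay_k=0.5):
    """BRR sampling variance with Fay's adjustment."""
    g = len(replicate_estimates)
    ss = sum((r - full_estimate) ** 2 for r in replicate_estimates)
    return ss / (g * (1.0 - fay_k) ** 2)

def standard_error(full_estimates, replicates_per_pv, fay_k=0.5):
    """Combine sampling and imputation variance over M plausible values.

    full_estimates: one estimate per plausible value (final student weight).
    replicates_per_pv: for each plausible value, the estimates obtained with
    each replicate weight.
    """
    m = len(full_estimates)
    final = mean(full_estimates)
    samp_var = mean(
        sampling_variance(est, reps, fay_k)
        for est, reps in zip(full_estimates, replicates_per_pv)
    )
    imp_var = sum((est - final) ** 2 for est in full_estimates) / (m - 1)
    error_var = samp_var + (1.0 + 1.0 / m) * imp_var
    return final, sqrt(error_var)

# Toy input: 2 plausible values and 4 replicates (PISA has 5 and 80).
final, se = standard_error(
    [10.0, 12.0],
    [[9.0, 11.0, 10.0, 10.0], [12.0, 12.0, 13.0, 11.0]],
)
```

In practice the replicates add-in for SPSS mentioned in this workshop performs these steps; the sketch is only meant to make the arithmetic visible.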

Page 14: Multi Level Modelling&Weights Workshop Kiel09

14 SPSS REPLICATES ADD-IN

Password WI-FI: Hawking09+

mypisa.acer.edu.au

Public data & analysis

Software & manuals

Download and install replicates add-in

Start SPSS

Copy CD to C:\Kiel\ and unzip file

Page 15: Multi Level Modelling&Weights Workshop Kiel09

15 EXAMPLE IN SPSS

C:\Kiel\INT_Stu06_SCHWGT.sav

German data

Regress science performance on

Sex

Immigration status

ESCS

Page 16: Multi Level Modelling&Weights Workshop Kiel09

16 OVERVIEW MULTILEVEL MODELLING

Section 3

Page 17: Multi Level Modelling&Weights Workshop Kiel09

17 EXAMPLE

For Japan in 2006:

Strong relationship between ESCS and science (38.8)

Large intra-class correlation in performance

Small intra-class correlation in ESCS

For this example, only nine Japanese schools are selected

Page 18: Multi Level Modelling&Weights Workshop Kiel09

18 SINGLE LEVEL REGRESSION

Overall slope is 38.8

Page 19: Multi Level Modelling&Weights Workshop Kiel09

19 MULTILEVEL MODEL WITH RANDOM SLOPES

Page 20: Multi Level Modelling&Weights Workshop Kiel09

20 MULTILEVEL MODEL WITH FIXED SLOPES

Average slope is 7.2

Page 21: Multi Level Modelling&Weights Workshop Kiel09

21 INTERPRETATION: REGRESSION COEFFICIENTS

Single level regression gives the overall relationship between ESCS and performance in a country (38.8 in Japan)

Multilevel regression takes the two-level structure of the data into account and either

Estimates a unique slope within each school (or the variance of the slopes), or

Estimates the average slope within schools (7.2 in Japan)

Which type of analysis is more correct?
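The contrast between the overall slope and the average within-school slope can be reproduced with a small invented data set. The three schools below are hypothetical: within each school the slope is 5, while the school means lie on a line with slope 40. The pooled within-school slope is used here as a simple stand-in for the average within-school slope; it is not a multilevel estimator.

```python
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

# Hypothetical (ESCS, science score) pairs per school
schools = {
    "A": [(-1.5, 457.5), (-1.0, 460.0), (-0.5, 462.5)],
    "B": [(-0.5, 497.5), (0.0, 500.0), (0.5, 502.5)],
    "C": [(0.5, 537.5), (1.0, 540.0), (1.5, 542.5)],
}

# Single-level regression: pool all students and ignore schools.
all_x = [x for pupils in schools.values() for x, _ in pupils]
all_y = [y for pupils in schools.values() for _, y in pupils]
overall_slope = slope(all_x, all_y)

# Within-school slope: centre both variables on their school means first.
cx, cy = [], []
for pupils in schools.values():
    x_mean = mean(x for x, _ in pupils)
    y_mean = mean(y for _, y in pupils)
    cx += [x - x_mean for x, _ in pupils]
    cy += [y - y_mean for _, y in pupils]
within_slope = slope(cx, cy)
```

The overall slope is much steeper than the within-school slope because it also picks up the differences between schools, which mirrors the 38.8 versus 7.2 pattern reported for Japan.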

Page 22: Multi Level Modelling&Weights Workshop Kiel09

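22 NOTATION MLM

Random intercept model

Level 1: $Y_{ij} = \beta_{0j} + \beta_{1} X_{ij} + r_{ij}$

Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$

Random slopes and random intercept

Level 1: $Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$

Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$

$\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}$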

Page 23: Multi Level Modelling&Weights Workshop Kiel09

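23 RANDOM INTERCEPT

System of equations

Level 1: $Y_{ij} = \beta_{0j} + \beta_{1} X_{ij} + r_{ij}$

Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$

Mixed-effects model

$Y_{ij} = \underbrace{\gamma_{00} + \gamma_{01} W_j + \beta_{1} X_{ij}}_{\text{Fixed part}} + \underbrace{u_{0j} + r_{ij}}_{\text{Random part}}$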

Page 24: Multi Level Modelling&Weights Workshop Kiel09

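24 RANDOM INTERCEPT AND RANDOM SLOPES

System of equations

Level 1: $Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$

Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$

$\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}$

Mixed-effects model

$Y_{ij} = \gamma_{00} + \gamma_{01} W_j + u_{0j} + (\gamma_{10} + \gamma_{11} W_j + u_{1j}) X_{ij} + r_{ij}$

$Y_{ij} = \underbrace{\gamma_{00} + \gamma_{01} W_j + \gamma_{10} X_{ij} + \gamma_{11} W_j X_{ij}}_{\text{Fixed part}} + \underbrace{u_{1j} X_{ij} + u_{0j} + r_{ij}}_{\text{Random part}}$

Cross-level interaction: $\gamma_{11} W_j X_{ij}$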

Page 25: Multi Level Modelling&Weights Workshop Kiel09

25 VARIANCE DECOMPOSITION

In single level regression analysis, the overall variance of the dependent variable is estimated, together with the proportion of this variance that is explained by the independent variables (R-squared)

In multilevel analysis, the variance is decomposed into between-cluster (school) and within-cluster variance

The independent variables can explain variance at either level or at both levels

Page 26: Multi Level Modelling&Weights Workshop Kiel09

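26 VARIANCES

Total variance = within-cluster variance + between-cluster variance

Average within-cluster variance: $\sigma_r^2 = \dfrac{\sum_{j=1}^{n^{(2)}} \sum_{i=1}^{n^{(1)}} (y_{ij} - \bar{y}_j)^2}{n^{(2)} (n^{(1)} - 1)}$

Between-cluster variance: $\tau_0^2 = \dfrac{\sum_{j=1}^{n^{(2)}} (\bar{y}_j - \bar{y})^2}{n^{(2)}} - \dfrac{\sigma_r^2}{n^{(1)}}$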

Page 27: Multi Level Modelling&Weights Workshop Kiel09

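27 INTRACLASS CORRELATION AND EXPLAINED VARIANCE

Null model: $y_{ij} = \gamma_{00} + u_{0j} + r_{ij}$

Intraclass correlation (rho) = between-cluster variance / total variance

Explained variance (R-squared) of a model with predictors (subscript p) relative to the null model (subscript n), where W is the within-cluster and B the between-cluster variance:

Level 1: 1 - (var(W)p / var(W)n)

Level 2: 1 - (var(B)p / var(B)n)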
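To make the variance decomposition and the intraclass correlation concrete, here is a rough moment-based sketch for balanced clusters. The estimator and its denominators are an assumption (one common choice), not necessarily what MLwiN or the slides use, and the scores are invented.

```python
from statistics import mean

def variance_components(clusters):
    """Return (within, between) variance for equally sized clusters.

    within  = pooled sum of squares around cluster means / (J * (n - 1))
    between = mean squared deviation of cluster means - within / n
    """
    j = len(clusters)
    n = len(clusters[0])
    cluster_means = [mean(c) for c in clusters]
    grand_mean = mean(cluster_means)
    within = sum(
        (y - m) ** 2 for c, m in zip(clusters, cluster_means) for y in c
    ) / (j * (n - 1))
    between = sum((m - grand_mean) ** 2 for m in cluster_means) / j - within / n
    return within, between

# Hypothetical scores for three schools with three students each
clusters = [
    [457.5, 460.0, 462.5],
    [497.5, 500.0, 502.5],
    [537.5, 540.0, 542.5],
]
within, between = variance_components(clusters)
icc = between / (between + within)  # intraclass correlation (rho)
```

In this made-up example almost all of the variance lies between schools, so the intraclass correlation is close to 1.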

Page 28: Multi Level Modelling&Weights Workshop Kiel09

28 THE STANDARD ERROR

One assumption of OLS is independence of observations

In 2-stage sampling designs, observations within clusters are often not independent

MLM allows for correlated errors and therefore gives unbiased SEs

With clustered data, SEs estimated with OLS are generally too small

However, BRR replicate weights are designed to deal with the dependence of observations within schools, so OLS with BRR gives correct standard errors!
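A rough sense of how much too small the naive standard errors can be comes from the well-known design effect approximation for equally sized clusters, 1 + (m - 1) × rho. This formula is not part of the slides, and the cluster size and intraclass correlation below are hypothetical.

```python
from math import sqrt

def design_effect(cluster_size, icc):
    """Approximate variance inflation for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * icc

# Hypothetical: 35 students per school, intraclass correlation of 0.15
deff = design_effect(35, 0.15)
se_inflation = sqrt(deff)  # factor by which the naive SE of a mean is too small
```

With these numbers the sampling variance of a mean is roughly six times what simple random sampling would give, so a standard error that ignores the clustering is too small by a factor of about 2.5.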

Page 29: Multi Level Modelling&Weights Workshop Kiel09

29 WEIGHTING - 1

Single level regression: final student weights and BRR replicate weights

How do we use PISA weights in MLM?

Data analysis manual: normalise final student weights and replication weights and run the analysis in SPSS or SAS

We now know this is not the best way

Page 30: Multi Level Modelling&Weights Workshop Kiel09

30 WEIGHTING - 2

SPSS and SAS do not treat the weights as sampling weights (they treat them as precision weights)

SPSS and SAS can only weight at the student level

MLM and BRR both take the multilevel structure of the data into account, so the method in the PISA data analysis manual accounts for it twice

However, there is no final consensus about the right way to use weights in MLM

Page 31: Multi Level Modelling&Weights Workshop Kiel09

31 WEIGHTING - 3

In PISA, school-level sampling is much more informative than student-level sampling (stratification is at the school level; students often have very similar weights within schools)

Therefore, schools should be weighted by a school-level weight

Students should be weighted by a conditional student weight (inverse of the probability of being selected given that the student’s school is sampled)

Page 32: Multi Level Modelling&Weights Workshop Kiel09

32 WEIGHTING - 4

Options for conditional student level weights:

Equal weights (weight=1)

Raw conditional student weights

Rescaled weights: Pfeffermann method 1 when student sampling is not informative

Rescaled weights: Pfeffermann method 2 when student sampling is informative

Differences are small when cluster sizes are larger than 20 students

Page 33: Multi Level Modelling&Weights Workshop Kiel09

33 RAW CONDITIONAL STUDENT WEIGHTS

Raw conditional student weights: $w^{(1)}_{i|j} = \dfrac{\text{W\_FSTUWT}}{\text{W\_FSCHWT}}$

The school weight is included in the school questionnaire data file

Not exactly correct, because some adjustments are made independently of schools (e.g. non-response adjustment)

Often leads to an overestimation of the between-school variance
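As a small illustration of the ratio above, the raw conditional student weight is the final student weight divided by the final school weight. The weight values and the function name are hypothetical; `W_FSTUWT` and `W_FSCHWT` are the variable names used on the slide.

```python
def raw_conditional_weight(w_fstuwt, w_fschwt):
    """Final student weight divided by the final school weight."""
    return w_fstuwt / w_fschwt

# Hypothetical student with W_FSTUWT = 80 in a school with W_FSCHWT = 20
w_cond = raw_conditional_weight(80.0, 20.0)
```

Here the student represents 4 students within the sampled school.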

Page 34: Multi Level Modelling&Weights Workshop Kiel09

34 PFEFFERMANN METHOD 1

When student sampling is not informative at level 1

Conditional student weights are multiplied by the sum of weights within cluster divided by the sum of squared weights within cluster

$\text{PFEFF1} = w^{(1)}_{i|j} \cdot \dfrac{\sum_{i=1}^{n_j^{(1)}} w^{(1)}_{i|j}}{\sum_{i=1}^{n_j^{(1)}} \left(w^{(1)}_{i|j}\right)^2}$
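A minimal sketch of this rescaling for a single school, assuming the conditional student weights are already available as a list. The function name `pfeff1` and the weights are made up for illustration.

```python
def pfeff1(cond_weights):
    """Scale the conditional weights of one school by sum(w) / sum(w ** 2)."""
    factor = sum(cond_weights) / sum(w * w for w in cond_weights)
    return [w * factor for w in cond_weights]

# Hypothetical conditional student weights within one school
scaled = pfeff1([1.0, 2.0, 3.0])
```

After this rescaling the weights in a school sum to (sum of weights)² / (sum of squared weights), often called the effective cluster size, which is smaller than the number of sampled students whenever the weights differ.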

Page 35: Multi Level Modelling&Weights Workshop Kiel09

35 PFEFFERMANN METHOD 2

When student sampling is informative

Conditional student weights are divided by the average conditional student weight in school j:

$\text{PFEFF2} = w^{(1)}_{i|j} \cdot \dfrac{n_j^{(1)}}{\sum_{i=1}^{n_j^{(1)}} w^{(1)}_{i|j}}$

This is the same as normalising full student weights within schools
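The second rescaling can be sketched in the same way. The function name `pfeff2` and the weights are hypothetical; the second call checks the slide's remark that normalising the full student weights within a school gives the same result, under the assumption that the full weights equal the conditional weights times a constant school weight.

```python
def pfeff2(weights):
    """Divide each weight by the mean weight within the school."""
    factor = len(weights) / sum(weights)
    return [w * factor for w in weights]

# Hypothetical conditional student weights within one school
cond = [1.0, 2.0, 3.0]
scaled_cond = pfeff2(cond)

# Full student weights = conditional weight * school weight (here 20)
scaled_full = pfeff2([w * 20.0 for w in cond])
```

The rescaled weights sum to the number of sampled students in the school, and the constant school weight cancels out.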

Page 36: Multi Level Modelling&Weights Workshop Kiel09

36 LET’S TRY IT OUT IN MLWIN

Australia, because it oversamples Indigenous students, who on average perform lower than non-Indigenous students (positive correlation between conditional student weights and performance)

C:\Kiel\INT_Stu06_SCHWGT.sav

I have added the full school weights (W_FSCHWT) and the normalised school weights (N_FSCHWT)

N_FSCHWT = W_FSCHWT * SAMPSIZE / POPSIZE
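The normalisation formula can be sketched as follows. The slides do not define SAMPSIZE and POPSIZE, so this example assumes that SAMPSIZE is the number of sampled schools and POPSIZE is the sum of the full school weights. The weights are invented.

```python
def normalise_school_weights(w_fschwt):
    """N_FSCHWT = W_FSCHWT * SAMPSIZE / POPSIZE for a list of school weights."""
    sampsize = len(w_fschwt)   # assumed: number of sampled schools
    popsize = sum(w_fschwt)    # assumed: sum of the full school weights
    return [w * sampsize / popsize for w in w_fschwt]

# Hypothetical full school weights for three sampled schools
n_fschwt = normalise_school_weights([10.0, 30.0, 20.0])
```

Under these assumptions the normalised weights keep their relative sizes and sum to the number of sampled schools.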

Page 37: Multi Level Modelling&Weights Workshop Kiel09

37 COMPARING CONDITIONAL STUDENT WEIGHTS IN MLWIN - 1

                        Equal      Raw   Pfeff1   Pfeff2  Std MLwiN
Response              PV1SCIE  PV1SCIE  PV1SCIE  PV1SCIE    PV1SCIE

Fixed Part
CONS                      521      523      522      522        520

Random Part
Level: SCHOOLID
CONS/CONS                1527     1300     1517     1508       1782
Level: STIDSTD
CONS/CONS                8605    29105     8172     8472       8404

-2*loglikelihood:      169452   170788   169614   169633     173562
DIC:
Units: SCHOOLID           356      356      356      356        356
Units: STIDSTD          14170    14170    14170    14170      14170
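Since these models contain only the constant, the variance components in the table can be turned into intraclass correlations using the definition given earlier (between-school variance divided by total variance). The numbers below are copied from the table; the dictionary layout and function name are just one way to organise them.

```python
# (between-school variance, within-school variance) per weighting method
variance_components = {
    "Equal": (1527, 8605),
    "Raw": (1300, 29105),
    "Pfeff1": (1517, 8172),
    "Pfeff2": (1508, 8472),
    "Std MLwiN": (1782, 8404),
}

def intraclass_correlation(between, within):
    """Between-cluster variance as a share of the total variance."""
    return between / (between + within)

icc = {
    name: intraclass_correlation(between, within)
    for name, (between, within) in variance_components.items()
}
for name, rho in icc.items():
    print(f"{name:10s} rho = {rho:.3f}")
```

The raw weights stand out with an intraclass correlation of about 0.04, while the other four methods lie between roughly 0.15 and 0.17.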

Page 38: Multi Level Modelling&Weights Workshop Kiel09

38 COMPARING CONDITIONAL STUDENT WEIGHTS IN MLWIN - 2

Equal weights (=1) and Pfeffermann methods 1 and 2 give similar results when using PISA data

Pfeffermann method 2 is the most conservative and is recommended

Raw weights over-estimate the within-school variance (I think this is MLwiN specific, similar problem with unscaled school weights)

Page 39: Multi Level Modelling&Weights Workshop Kiel09

39 WEIGHTS STANDARDISED BY MLWIN

MLwiN’s standardisation of the weights:

At the school level, the full school weight is normalised at country level

The student level weight is the Pfeffermann method 2 conditional student weight * the normalised school weight * a factor to make the average student weight equal to one

Odd that the school weight is included at both levels, but results are the same as in HLM
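The verbal description of the student-level weight can be mirrored in a few lines. This only follows the wording of the slide; it is not MLwiN's actual code, and the function name and input values are hypothetical.

```python
def standardised_student_weights(schools):
    """schools: list of (normalised school weight, [conditional student weights])."""
    composite = []
    for school_weight, cond in schools:
        mean_cond = sum(cond) / len(cond)
        # Pfeffermann method 2 weight times the normalised school weight
        composite.extend(w / mean_cond * school_weight for w in cond)
    # Factor that makes the average student weight equal to one
    factor = len(composite) / sum(composite)
    return [w * factor for w in composite]

# Two hypothetical schools with two students each
weights = standardised_student_weights([(0.5, [1.0, 3.0]), (1.0, [2.0, 2.0])])
```

The resulting weights average to one across all students, while students in the school with the larger school weight still receive larger weights.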

Page 40: Multi Level Modelling&Weights Workshop Kiel09

40 WHICH WEIGHTS ARE BETTER?

In a simulation study, the differences in results were minimal, but the differences were large when using real data from some countries

We do not know which method is best

Probably safest in MLwiN to use standardised weights, because we do not know how the weights are built into their algorithm

Need to explore what other software packages do (gllamm in Stata)

Page 41: Multi Level Modelling&Weights Workshop Kiel09

41 REFERENCES

Rabe-Hesketh, S. & Skrondal, A. (2006). Multilevel modelling of complex survey data. Journal of the Royal Statistical Society: Series A, 169(4), 805-827.

Chantala, K., Blanchette, D. & Suchindran, C. M. (2006). Software to compute sampling weights for multilevel analysis. http://www.cpc.unc.edu/restools/data_analysis/ml_sampling_weights/Compute%20Weights%20for%20Multilevel%20Analysis.pdf

Page 42: Multi Level Modelling&Weights Workshop Kiel09

42 PRACTISE MLM WITH PISA DATA

Section 4

Page 43: Multi Level Modelling&Weights Workshop Kiel09

43 EXERCISE

For MLwiN, data has to be sorted first by the highest level ID variable, then by the second highest, etc. (SCHOOLID in PISA)

MLwiN needs a constant in the data (compute CONS=1.) to estimate the intercept

Start with data from Chile, where the intraclass correlation in both science performance and ESCS is high

Start MLwiN…
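The two data preparation steps above (sorting by the identifiers and adding a constant) would normally be done in SPSS before exporting to MLwiN. The sketch below shows the same steps on a few invented records; `SCHOOLID`, `STIDSTD`, `PV1SCIE` and `CONS` are the variable names used in these slides, and the values are hypothetical.

```python
# Hypothetical student records in arbitrary order
records = [
    {"SCHOOLID": 2, "STIDSTD": 1, "PV1SCIE": 510.0},
    {"SCHOOLID": 1, "STIDSTD": 2, "PV1SCIE": 480.0},
    {"SCHOOLID": 1, "STIDSTD": 1, "PV1SCIE": 530.0},
]

# Sort by the highest-level identifier first, then by the student identifier.
records.sort(key=lambda r: (r["SCHOOLID"], r["STIDSTD"]))

# Add the constant that is used to estimate the intercept.
for r in records:
    r["CONS"] = 1
```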

Page 44: Multi Level Modelling&Weights Workshop Kiel09

44 WARNINGS

Definition of a school is not the same in each country and not always that clear (campus)

Differences in educational systems between or even within countries or cycles (tracked)

Risk of floundering and of models that are too complicated to interpret if MLM is more data-driven than theory-driven

To interpret results carefully, you need to know enough about the educational system in a country or differences across countries

Page 45: Multi Level Modelling&Weights Workshop Kiel09

45 COMPARING MULTIPLE REGRESSION AND MULTILEVEL ANALYSIS

Section 5

Page 46: Multi Level Modelling&Weights Workshop Kiel09

46 COMPARISONS

OLS with BRR

Fixed effects

Includes measurement error

Takes stratification into account

Output is SPSS data file for easy editing

MLM

Random effects and cross-level interactions

Difficult to include measurement error

I think it doesn’t take school stratification into account

Output is often in text format

Page 47: Multi Level Modelling&Weights Workshop Kiel09

47 OPTIONS FOR FINAL PART OF THE WORKSHOP

Try an MLM on data of your own country

Try school and student level variables

Try to add cross level interactions (free the slopes)

Discuss MLMs that you have tried in the past or would like to do in the future

Ask any PISA related data analysis questions