Direct estimation using NAEP data with AM (SAS or SPSS)

AERA 2013 NAEP Data Training

Purpose: To demonstrate how to obtain direct estimates of NAEP scale scores with the AM software and how to prepare the data for direct estimation using SAS or SPSS. These instructions consist of three parts: Part 1, preparing the dictionary file; Part 2, data preparation using SAS or SPSS; and Part 3, direct estimation using AM.

Data: This demonstration uses the NAEP Primer mini-sample public-use data file. This probability sample contains data from the assessed students within the schools that participated in the 2005 NAEP grade 8 mathematics assessments. The sample includes about 10% of the students who sat for this assessment. This file is rectangular and has a row for each assessed student. It contains selected variables from the NAEP respondent data file, including student, teacher, and school background questionnaire data as well as sampling information. The use of these data is not restricted. We have chosen this data set because the full NAEP database is restricted in order to protect the anonymity of the NAEP participants. Using the full database would require obtaining a Restricted-Use Data License (see http://nces.ed.gov/statprog/instruct.asp) and careful attention to the rules for usage. The variables in this NAEP Primer sample were chosen to avoid the identification of individuals or schools, and additional safeguards such as data perturbation were taken to assure the privacy of the sample. The result is that the statistical estimates made from these data will differ slightly from those made from the full NAEP database.

Part 1. Preparing the dictionary file

The dictionary file is an input file for direct estimation. It includes the following information: item ID, item type, the subscale each item belongs to, item parameters, the A and B transformation coefficients (from the calibrating scale units to the units of the reporting scale), the weight of each subscale in determining the composite scale, and the variable names for the sampling weight and design variables (stratum and cluster).

In this exercise we will use an Excel workbook with a macro. The workbook contains all the relevant information mentioned above, and the macro creates the dictionary file “Math05g8_dictionary.dct”. The Excel file is titled “Math05g8_dictionary.xls”. The first sheet of the Excel workbook (Items) contains the item-level information. Part of the information in this sheet is displayed below.

ID       Item type  On subtest  A       C       B        D1       D2       D3      D4
M066401  MC         1           0.6784  0.1494  -0.3275
M020101  CR         1           1.3277          -0.7218  0
M093701  MC         1           1.2169  0.165   1.8095
M086001  MC         1           1.0513  0.2178  1.0011
M051901  MC         1           1.5963  0.0747  0.6095
M046001  CR         1           0.5531          -2.5249  0
M046901  CR         1           0.8571          -1.8442  0
…
M076001  CR         2           0.4702          1.3421   -2.0168  0.8414   0.1818  0.9937
M047201  MC         2           0.8848  0.0691  -1.0809
M068008  CR         2           0.3969          2.0894   0.2514   -0.2514
M093601  CR         2           0.8392          1.5419   1.4806   -1.4806
M085601  MC         2           1.2074  0.2823  -0.1178
M085501  MC         2           1.2542  0.193   0.5841
M047101  MC         2           1.0961  0.169   0.178
M047901  CR         2           1.1949          1.3106   0
M140401  MC         2           1.1048  0.1213  -0.6061
M140801  MC         2           1.3253  0.2014  0.6344
M141001  MC         2           1.2542  0.171   0.0911

The first column gives the item ID. These IDs are the variable names used for the items in the data files. The second column indicates whether the item is a Multiple-Choice (MC) or a Constructed Response (CR) item. The third column indicates which subscale the item belongs to. There are 5 subscales in the Grade 8 Mathematics assessment in 2005: (1) number properties and operations, (2) measurement, (3) geometry, (4) data analysis and probability, and (5) algebra. The other columns contain the item parameters. All items have an A (discrimination) and a B (difficulty) parameter. In addition, MC items have a nonzero C (guessing) parameter, and polytomously scored CR items have D parameters. For dichotomously scored CR items, D1 is set to zero. These item parameters are listed in PART 6 of the Data Companion titled “2005_NAEPDataCompanion.pdf”.
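To make the roles of the A, B, and C parameters concrete, the following Python sketch evaluates the standard three-parameter logistic (3PL) response function for the MC item M066401 from the table above. The function name is hypothetical, the scaling factor of 1.7 is the value entered on the user-input sheet described later in this part, and the sketch assumes the textbook 3PL form; it is not taken from AM itself.

```python
import math


def p_correct_3pl(theta, a, b, c, scale=1.7):
    """Probability of a correct response under the textbook 3PL model.

    theta: proficiency on the calibration metric
    a, b, c: discrimination, difficulty, and guessing parameters
    scale: IRT scaling factor (assumed to be 1.7, as on the user-input sheet)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-scale * a * (theta - b)))


# Parameters of item M066401 as listed on the Items sheet.
a, b, c = 0.6784, -0.3275, 0.1494

for theta in (-2.0, b, 2.0):
    print(f"theta={theta:+.4f}  P(correct)={p_correct_3pl(theta, a, b, c):.4f}")
```

At theta equal to B the probability is halfway between C and 1, which is one way to read the B parameter as the item's difficulty.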

The second sheet of the Excel Workbook includes user inputs. The information in this sheet is displayed

below.

AM dct file name                                              Math05g8_dictionary.dct
CR item parameterization (1. Masters' PCL 2. Likert version)  2
omit                8              IRT model scale (D)  1.7
main test           scale 1        location 0
number of subtest   5
Numbers             scale 37.775   location 277.107   weight 0.20
Measurement         scale 46.571   location 275.861   weight 0.15
Geometry            scale 33.221   location 275.640   weight 0.20
Data_analysis       scale 40.371   location 281.910   weight 0.15
Algebra             scale 35.64    location 281.787   weight 0.30
survey design       variable name
ORIGWT              weight
repgrp1             stratum
jkunit              cluster

The first row gives the name of the dictionary file the macro will create: Math05g8_dictionary.dct. The second row indicates the type of IRT model to be used for CR items. The third row gives the code used for omitted responses (8). In NAEP, partial credit is given to MC items that are omitted, so it is important to distinguish these from responses coded as missing. The same row gives the IRT scaling factor, 1.7. The next row sets the scale and the location of the main test (composite) to 1 and 0; this defines the calibration metric. The fifth row specifies the number of subscales as 5 (number properties and operations, measurement, geometry, data analysis and probability, and algebra). The next five rows define the scale and the location for each subscale, together with its weight in the composite. The scale and location are the A and B coefficients of the linear transformations from the calibrating scale units to the units of the reporting scale (see http://nces.ed.gov/nationsreportcard/tdw/analysis/2004_2005/trans_constants_2005math.asp). Finally, the last three rows specify the design variables (weight, stratum, and cluster) to be incorporated in the analyses.
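As an illustration of how the scale, location, and weight values fit together, the Python sketch below applies the linear transformation (reporting score = scale × theta + location) to each subscale and combines the results into a composite using the subscale weights. The constants are copied from the user-input sheet above; the function names and the example theta values are hypothetical, and the sketch shows only the arithmetic, not AM's estimation procedure.

```python
# (scale, location, weight) for each subscale, copied from the user-input sheet.
SUBSCALES = {
    "Numbers":       (37.775, 277.107, 0.20),
    "Measurement":   (46.571, 275.861, 0.15),
    "Geometry":      (33.221, 275.640, 0.20),
    "Data_analysis": (40.371, 281.910, 0.15),
    "Algebra":       (35.64,  281.787, 0.30),
}


def to_reporting_scale(theta, scale, location):
    """Linear transformation from the calibration metric to the reporting metric."""
    return scale * theta + location


def composite(thetas):
    """Weighted sum of subscale reporting scores; thetas maps subscale name to theta."""
    return sum(
        weight * to_reporting_scale(thetas[name], scale, location)
        for name, (scale, location, weight) in SUBSCALES.items()
    )


# Hypothetical example: a theta of 0 on every subscale.
example = {name: 0.0 for name in SUBSCALES}
print(round(composite(example), 3))
```

Because the weights sum to 1, the composite is a weighted average of the five subscale scores on the reporting metric.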

To run the macro, go to View Macro and click on Run. The resulting dictionary file (Math05g8_dictionary.dct) will include the user inputs and the item-level information as specified in the Excel file. Part of the dictionary file is displayed below:

test = "main test" id = 1 scale=1 location=0

subtest = "Numbers" id=1 scale=37.7749290665129 location=277.106996120817 weight=0.2

subtest = "Measurement" id=2 scale=46.5712993436862 location=275.86072019433 weight=0.15

subtest = "Geometry" id=3 scale=33.2209765581178 location=275.640022491989 weight=0.2

subtest = "Data_analysis" id=4 scale=40.3705130837592 location=281.909903652864 weight=0.15

subtest = "Algebra" id=5 scale=35.6402722009404 location=281.787303279947 weight=0.3


Variables

M066401 IRM=3pl ipa=0.67844 ipb=-0.32747 ipc=0.14938 scale=1.7 ontest=1 onsubtest=1 omitted=8

M020101 ontest=1 onsubtest=1 IRM=PCL ipa=1.32768 ipb=-0.72183 scale=1.7 omitted=8 IPD1=0

M093701 IRM=3pl ipa=1.21694 ipb=1.80954 ipc=0.16501 scale=1.7 ontest=1


M086001 IRM=3pl ipa=1.05133 ipb=1.00113 ipc=0.21776 scale=1.7 ontest=1


….
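For checking a generated dictionary file against the Excel sheet, the variable entries shown above can be read back into key/value pairs. The Python sketch below is a hypothetical helper, not part of AM or the macro. It assumes each variable entry sits on one line, starts with the item ID, and uses whitespace-separated key=value pairs with no spaces around the equals sign (which holds for the variable entries shown, but not for the test and subtest lines).

```python
def parse_dct_variable(line):
    """Split one variable entry into (item_id, attributes).

    Keys are lower-cased because the excerpt mixes cases (ipa, IPD1, IRM).
    Values are kept as strings; convert them where needed.
    """
    item_id, *pairs = line.split()
    attrs = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        attrs[key.lower()] = value
    return item_id, attrs


entry = ("M066401 IRM=3pl ipa=0.67844 ipb=-0.32747 ipc=0.14938 "
         "scale=1.7 ontest=1 onsubtest=1 omitted=8")
item_id, attrs = parse_dct_variable(entry)
print(item_id, attrs["irm"], float(attrs["ipa"]), float(attrs["ipb"]), attrs["omitted"])
```

Comparing the parsed ipa, ipb, and ipc values with the A, B, and C columns of the Items sheet is a quick way to confirm that the macro wrote the parameters to the intended fields.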

Part 2. Data preparation (scoring)

SAS version:

Create working folders for the modified SAS scoring program and the generated data file, for example, ‘C:\NAEP PRIMER\SAS’ and ‘C:\NAEP PRIMER\Data’.

Open the SAS program “FORMAT.SAS” and save it in your working folder. At the top of the “FORMAT.SAS” program, the highlighted file path below needs to be changed to the folder where you want to save the SAS formats catalog (for example, 'C:\NAEP Primer\Data'):

TITLE1 "2005 National Mathematics Assessment (NAEP Primer)";

TITLE2 "SAS Formats for All Data Fields";

LIBNAME PRIMER 'C:\NAEP Primer\Data';

Save the modified SAS program.

Open the SAS program, “M36NT2PM.SAS”, and save it in your working folder. At the top of the

M36NT2PM.SAS program, the following highlighted file paths need to be changed according to

where the data file is and where you want to save your scored SAS data set:

%INCLUDE 'C:\NAEP Primer\SAS\FORMAT.SAS'; (This %include statement

executes SAS program “FORMAT.SAS” modified above.)

TITLE1 "2005 Primer Mathematics Grade 8 Student, Teacher, & School

Data";

TITLE2 "Create system file ";

LIBNAME PRIMER 'C:\NAEP Primer\Data';

(This path is where you want to save your scored SAS data file, and it

should be the same as the SAS library in “FORMAT.SAS”)

DATA PRIMER.M36NT2PM_Scored (DROP=I);

(User can name their dataset as they want; in this manual,

“M36NT2PM_Scored” is used.)

OPTIONS FMTSEARCH=(PRIMER);

INFILE 'H:\ESSIN Task 14_2012\Research, Analysis and Psychometric Support\NAEP Primer \Data\M36NT2PM.DAT' LRECL=0960;

(The file path points to where your .DAT data file is located)

Scroll down to the section shown in the screenshot below, and change the boxed numbers to match the values in the screenshot.

Then scroll down a little further and change the boxed “I” values to match the values in the screenshot below.

Scroll down to the end of the program, remove the “*” before “%SCORE;”, click anywhere in the editor window, and then save the changes.

Run the modified SAS program to generate a scored SAS data file. The scored data file will be saved in the library you assigned (in this example, PRIMER).

SPSS version:

Copy the SPSS control file, “M36NT2PM.SPS”, from the SPSS folder into your working folder.

Open the SPSS syntax file. Modify the path highlighted in yellow to where the ‘DAT’ file is

located, as shown below.

TITLE "2005 Primer Mathematics Grade 8 Student, Teacher, & School Data".

SUBTITLE "Create system file".

FILE HANDLE MNT2 /NAME='C:\PRIMER\DATA\M36NT2PM.DAT' /LRECL=0960.

DATA LIST FILE=MNT2/

Modify the RECODE scoring statements to score the cognitive item variables. Locate the section of code containing the text “the scoring the items” in the SPSS syntax file, shown below, and make the following changes:

i) Delete or comment out the TEMPORARY command (highlighted in green on the next page) so that the scored variables are saved. Note that the TEMPORARY command instructs SPSS to apply the subsequent scoring statements on a temporary basis and to reset those variables to their original values after the EXECUTE statement is run.

ii) To assign scores to any or all of the special codes, change the corresponding value substitution codes to the desired values; any remaining response codes that are not scored will be set to “SYSMIS.” Note that by default all special response codes (e.g. #OM for an omitted response to a multiple-choice item) are recoded to the system missing value, whereas we might want to score an omitted response as “8”. Modify each RECODE statement according to the scoring guideline table shown below, which can also be found in “The NAEP Primer”. The example assignments of scores to special response codes are highlighted in yellow below. (NOTE: If the scoring guidelines for a particular data file change, the change should be reflected in the recoding. The task lead will check whether this change is necessary.)
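The scoring logic described in step ii can be summarized in a few lines of Python. This is only a sketch of the rule (omitted responses scored as 8, other special codes left missing, remaining responses scored right or wrong); it is not the RECODE syntax from the control file. The raw codes "A" to "D", "OM", and "NR" and the function name are hypothetical placeholders, and the actual codes and scores must come from the scoring guideline table.

```python
OMIT_SCORE = 8  # matches omitted=8 in the dictionary file


def score_mc_item(raw, correct_option, omitted_code, other_special_codes):
    """Score one multiple-choice response.

    raw: the recorded response, or None if nothing was recorded
    correct_option: the keyed answer for this item
    omitted_code: raw code marking an omitted response (hypothetical here)
    other_special_codes: raw codes that stay missing, like SYSMIS in SPSS
    """
    if raw == omitted_code:
        return OMIT_SCORE
    if raw is None or raw in other_special_codes:
        return None  # left as missing rather than scored
    return 1 if raw == correct_option else 0


# Hypothetical responses to an item whose keyed answer is "B".
responses = ["B", "A", "OM", "NR", None]
print([score_mc_item(r, "B", "OM", {"NR"}) for r in responses])
```

Keeping the omitted code (8) distinct from missing matters because the dictionary file tells AM to treat 8 as an omitted response rather than as no data.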

Click “Run” -> “All” after modifying the program, and save the scored data file in your working folder. This creates the SPSS data file (e.g. “M36NT2PM_scored.sav”).

To use the SPSS data file for AM direct estimation, open the AM program, click on File/Import/SPSS.sav file, and then select the SPSS data file created in the previous step.

Part 3. Direct estimation (AM)

This part uses the SAS dataset as an example. The steps are exactly the same for SPSS, except that you select the SPSS dataset instead of the SAS dataset.

1. Import data into AM

Open AM and import the scored SAS data file saved in your folder: click on “File/Import/General Import,” change the Files of Type control to “SAS for windows V7/8/9 (*.sas7bdat),” and locate and select the SAS data file as shown below. Click “open” to import the data.

The following window will pop up. SAS formats do not have to be imported for the AM direct estimation procedures. Click “done” in the SAS Format Finder window to complete the data import step.

Click on “Update Metadata” from the File menu as shown below. Select “M36NT2PM.dct” in the “D:\NAEP Primer\AM” folder, then click OPEN. The extension “dct” indicates that this is a dictionary file.

Updating the metadata reads the item parameter estimates and subtest information for each cognitive item into AM, along with the transformation constants, so that the theta-scale estimates for each subscale, obtained from the item response data and the IRT parameters, can be transformed to the NAEP reporting metric. Composite scores are estimated as a weighted average of the subscale estimates, using the relative weights provided in the .dct file.

Save the data file as “M36NTPM_Scored.am” in your working folder if you need to use it again. Doing so avoids repeating the “Update Metadata” step every time.

2. Analyze data with AM

Click on Data/Filter Observations as shown below. In the Select Observations dialog box, select and double click on “RPTSAMP” in the Variables window and then double click on the Equal Sign (“=”) in the Operators/Functions window. Type “1” in the Selection Criteria box and then click on OK.

Click on “Statistics/MML Procedures for Test Data” to start the analysis as shown below, and choose “MML Composite Means”.

AM lets the user select variables by dragging them from one dialog box to another. Under the Tests tab in the left window (as shown below), drag and drop either the whole test at once or the five subtests individually into the Dependent Variables box. Then select the Variables tab, locate the variable DSEX, and drag and drop it into the Independent Variables box. Finally, change the output type to Plain Text Output. Make sure that Strata is “REPGRP1”, Cluster is “JKUNIT” and Weight is “ORIGWT”, as shown below. Then go to the “Advanced Parameters” tab, type 500 in “Max. Iterations”, and click OK.

Fill in the title ‘2005 Primer Mathematics Grade 8 Student, Teacher & School Data’, then drag and drop the tests from the ‘Test’ tab into the box for dependent variables.

Identify the variable “DSEX” under the ‘Variables’ tab, and drag and drop it into the box for independent variables.

Change the output format to “Plain Text Output” to produce a text file, and make sure that Strata is “REPGRP1”, Cluster is “JKUNIT” and Weight is “ORIGWT”, as shown below.

Change the number of Max. Iterations to 500, and click OK to run the direct estimation.

The results window opens after several minutes. Save the output as a text file, “M36NT2PM_Scored.txt” (shown below), in your working folder.

If you change the variable DSEX to SDRACEM, but all other steps are the same, then you will get the following result.
