Direct estimation using NAEP data with AM (SAS or SPSS)
AERA 2013 NAEP Data Training
Purpose: To demonstrate how to obtain direct estimates of NAEP scale scores with AM software and
how to prepare the data for direct estimation using SAS or SPSS. These instructions consist of three
parts: Part 1. Preparing the dictionary file; Part 2. Data preparation using SAS or SPSS; and Part 3.
Direct estimation using AM.
Data: This demonstration uses the NAEP Primer mini-sample public-use data file. This probability sample
contains data from the assessed students within the schools that participated in the 2005 NAEP grade 8
mathematics assessment. The sample includes about 10% of the students who took this assessment.
The file is rectangular, with one row for each assessed student. It contains selected variables from the
NAEP respondent data file, including student, teacher, and school background questionnaire data as
well as sampling information. The use of these data is not restricted. We chose this data set
because the full NAEP database is restricted in order to protect the anonymity of NAEP participants.
Using the full database would require obtaining a Restricted-Use Data License (see
http://nces.ed.gov/statprog/instruct.asp) and careful attention to the rules for usage. The variables in
this NAEP Primer sample were chosen to avoid identifying individuals or schools, and additional
safeguards such as data perturbation were applied to protect the privacy of the sample. As a result,
statistical estimates made from these data will differ slightly from those made from the full NAEP
database.
Part 1. Preparing the dictionary file
The dictionary file is an input file for direct estimation. It includes the following information: item ID,
item type, the subscale each item belongs to, item parameters, the A and B transformation coefficients
(for converting calibration-scale units to reporting-scale units), the weight of each subscale in
determining the composite scale, and the variable names for the sampling weight and the design
variables (stratum and cluster).
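The contents listed above can be pictured as one small data structure. The Python sketch below is only an illustration: the class and field names are hypothetical, it does not reproduce the actual layout of a .dct file, and the A, B and composite-weight values are placeholders. The two items come from the item table shown later in this part, and the design-variable names are the ones used in Part 3.

```python
from dataclasses import dataclass, field
from math import isclose


@dataclass
class ItemEntry:
    """One item: ID, type, subscale, and IRT parameters (hypothetical layout)."""
    item_id: str          # variable name of the item in the data file
    item_type: str        # "MC" or "CR"
    subscale: int         # subscale the item belongs to
    a: float              # discrimination
    b: float              # difficulty
    c: float = 0.0        # guessing; nonzero only for MC items
    d: list = field(default_factory=list)  # D parameters of CR items


@dataclass
class DictionaryInfo:
    """Hypothetical in-memory view of what a dictionary file carries."""
    items: list
    transform: dict          # subscale -> (A, B) transformation coefficients
    composite_weights: dict  # subscale -> weight in the composite scale
    weight_var: str          # sampling weight variable
    stratum_var: str         # design variable: stratum
    cluster_var: str         # design variable: cluster

    def validate(self) -> None:
        """Basic consistency checks before the information is used."""
        subscales = set(self.composite_weights)
        for item in self.items:
            if item.subscale not in subscales:
                raise ValueError(f"{item.item_id}: unknown subscale {item.subscale}")
        if set(self.transform) != subscales:
            raise ValueError("each subscale needs its own A and B coefficients")
        # Assumption: a weighted average implies the weights sum to 1.
        if not isclose(sum(self.composite_weights.values()), 1.0):
            raise ValueError("composite weights should sum to 1")


info = DictionaryInfo(
    items=[
        ItemEntry("M066401", "MC", 1, a=0.6784, b=-0.3275, c=0.1494),
        ItemEntry("M076001", "CR", 2, a=0.4702, b=1.3421,
                  d=[-2.0168, 0.8414, 0.1818, 0.9937]),
    ],
    transform={1: (30.0, 280.0), 2: (30.0, 280.0)},  # placeholder A, B values
    composite_weights={1: 0.5, 2: 0.5},               # placeholder weights
    weight_var="ORIGWT", stratum_var="REPGRP1", cluster_var="JKUNIT",
)
info.validate()  # raises ValueError if the pieces do not fit together
```

Keeping the checks next to the data makes a mismatch (for example, an item assigned to a subscale with no transformation constants) fail early rather than during estimation.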
In this exercise we will use an Excel workbook with a macro; the workbook contains all the relevant
information mentioned above. The macro creates the dictionary file “Math05g8_dictionary.dct”. The Excel
file is titled “Math05g8_dictionary.xls”. The first sheet of the Excel workbook (Items) contains the
item-level information. Part of the information in this sheet is displayed below.
ID       Item type  Subtest  A       C       B        D1       D2       D3      D4
M066401  MC         1        0.6784  0.1494  -0.3275
M020101  CR         1        1.3277          -0.7218  0
M093701  MC         1        1.2169  0.165   1.8095
M086001  MC         1        1.0513  0.2178  1.0011
M051901  MC         1        1.5963  0.0747  0.6095
M046001  CR         1        0.5531          -2.5249  0
M046901  CR         1        0.8571          -1.8442  0
…
M076001  CR         2        0.4702          1.3421   -2.0168  0.8414   0.1818  0.9937
M047201  MC         2        0.8848  0.0691  -1.0809
M068008  CR         2        0.3969          2.0894   0.2514   -0.2514
M093601  CR         2        0.8392          1.5419   1.4806   -1.4806
M085601  MC         2        1.2074  0.2823  -0.1178
M085501  MC         2        1.2542  0.193   0.5841
M047101  MC         2        1.0961  0.169   0.178
M047901  CR         2        1.1949          1.3106   0
M140401  MC         2        1.1048  0.1213  -0.6061
M140801  MC         2        1.3253  0.2014  0.6344
M141001  MC         2        1.2542  0.171   0.0911
The first column gives the item ID, which is the variable name used for the item in the data files. The
second column indicates whether the item is a multiple-choice (MC) or a constructed-response (CR)
item. The third column indicates which subscale the item belongs to. There are 5 subscales in the 2005
Grade 8 Mathematics assessment: (1) number properties and operations, (2) measurement, (3)
geometry, (4) data analysis and probability, and (5) algebra. The remaining columns contain the item
parameters. All items have an A (discrimination) and a B (difficulty) parameter. In addition, MC items
have a nonzero C (guessing) parameter, and polytomously scored CR items have D parameters. For
dichotomously scored CR items, D1 is set to zero. These item parameters are listed in Part 6 of the
Data Companion, titled “2005_NAEPDataCompanion.pdf”.
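To show how the A, B, C and D columns are used, here is a minimal Python sketch of the item response functions. It assumes the parameterization commonly documented for NAEP: a three-parameter logistic (3PL) model for MC items, a generalized partial credit model (GPCM) for CR items with the D parameters entering as (theta - B + D_v), and a scaling constant of 1.7. Treat these conventions as assumptions and confirm them against Part 6 of the Data Companion; the function names are hypothetical.

```python
import math

SCALE = 1.7  # logistic scaling constant (assumed NAEP convention)


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct answer to an MC item under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-SCALE * a * (theta - b)))


def p_gpcm(theta: float, a: float, b: float, d: list) -> list:
    """Probabilities of scores 0..len(d) for a CR item under the GPCM.

    A dichotomous CR item has d == [0.0], which reduces to the 2PL model.
    """
    # Cumulative sums of SCALE*a*(theta - b + d_v); score 0 has an empty sum.
    logits = [0.0]
    for d_v in d:
        logits.append(logits[-1] + SCALE * a * (theta - b + d_v))
    top = max(logits)  # subtract the maximum for numerical stability
    expo = [math.exp(z - top) for z in logits]
    total = sum(expo)
    return [e / total for e in expo]


# Parameters taken from rows of the item table.
print(p_3pl(0.0, a=0.6784, b=-0.3275, c=0.1494))            # M066401 (MC)
print(p_gpcm(0.0, a=1.3277, b=-0.7218, d=[0.0]))            # M020101 (CR, 0/1)
print(p_gpcm(0.0, a=0.3969, b=2.0894, d=[0.2514, -0.2514])) # M068008 (CR, 0-2)
```

Setting D1 = 0 for dichotomous CR items is what lets one GPCM function cover both CR cases: with a single zero step, the category probabilities are exactly those of a two-parameter logistic item.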
The second sheet of the Excel workbook contains the user inputs. The information in this sheet is displayed below.
Modify the RECODE scoring statements to score the cognitive item variables. Locate the section of
code containing the text “the scoring the items” in the SPSS syntax, shown below, and make the
following changes:
i) Delete or comment out the TEMPORARY command (highlighted in green on the next page) so that
the scored variables are saved. Note that the ‘TEMPORARY’ command instructs SPSS to apply the
subsequent scoring statements only temporarily and to reset those variables to their original values
after the EXECUTE statement is run.
ii) To assign scores to any or all of the special codes, change the corresponding value-substitution
codes to the desired values; any remaining response codes that are not assigned a score will be
set to “SYSMIS.” Note that by default, all special response codes (e.g., #OM for an omitted
response to a multiple-choice item) are recoded to the system-missing value, although we
might want to score them as “8”. Modify each “RECODE” statement according to the scoring
guideline table shown below, which can also be found in “The NAEP Primer”. An example of
assigning scores to special response codes is highlighted in yellow below. (NOTE: If the scoring
guidelines change for a particular data file, the recoding must reflect that change. The task lead
will check whether such a change is necessary.)
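The recoding logic in steps i) and ii) can be mimicked outside SPSS to check a scoring guide. This Python sketch is not SPSS syntax: the function name recode and the response codes for the example item are hypothetical, and None stands in for SPSS system-missing (SYSMIS). The special score 8 for #OM simply follows the example value mentioned in step ii).

```python
from typing import Optional


def recode(responses: list, guide: dict,
           special: Optional[dict] = None) -> list:
    """Map each raw response code to a score, like an SPSS RECODE.

    guide   -- scores for regular response codes (key = 1, distractors = 0)
    special -- optional scores for special codes such as "#OM"
    Codes with no score become None, standing in for SYSMIS.
    """
    table = dict(guide)
    if special:
        table.update(special)
    return [table.get(code) for code in responses]


# Hypothetical MC item whose key is "B"; the codes are illustrative only.
guide = {"A": 0, "B": 1, "C": 0, "D": 0}
raw = ["B", "C", "#OM", "B"]
print(recode(raw, guide))                      # [1, 0, None, 1]
print(recode(raw, guide, special={"#OM": 8}))  # [1, 0, 8, 1]
```

The first call reproduces the default behaviour (special codes end up missing); the second shows the effect of adding a value substitution for a special code.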
Click “Run” -> “All” after modifying the program, and save the scored data file to your working
folder. This creates the SPSS data file (e.g., “M36NT2PM_scored.sav”).
To use the SPSS data file for AM direct estimation, open the AM program, click on
File/Import/SPSS.sav file, and then select the SPSS data file created in the previous step.
Part 3. Direct estimation (AM)
This part uses a SAS data set as an example; the steps for SPSS are exactly the same, except that you
select the SPSS data set instead of the SAS one.
1. Import data into AM
Open AM and import the scored SAS data file saved in your folder: click “File/Import/General Import,” change the Files of Type control to “SAS for windows V7/8/9 (*.sas7bdat),” and then locate and select the SAS data file as shown below. Click “Open” to import the data.
The following window will pop up. SAS formats do not need to be imported for AM direct estimation procedures. Click “Done” in the SAS Format Finder window to complete the data import step.
Click on “Update Metadata” from the File menu as shown below. Select “M36NT2PM.dct” in the “D:\NAEP Primer\AM” folder, then click OPEN. The extension “dct” indicates that this is a dictionary file.
Updating the metadata reads into AM the item parameter estimates and subtest assignment of each cognitive item, along with the transformation constants, so that the theta-scale estimates for each subscale, obtained from the item response data and the IRT parameters, can be transformed into the NAEP reporting metric. Composite scores are estimated as a weighted average of the subscale estimates, using the relative weights provided in the .dct file.
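The arithmetic behind the transformation and the composite can be sketched in a few lines of Python. This assumes the usual NAEP linear form, reporting score = A × theta + B for each subscale, with the composite taken as the weighted average of the subscale values on the reporting scale. The function names are hypothetical, and the theta, A, B and weight values are placeholders rather than the constants stored in the .dct file.

```python
def to_reporting(theta: float, a_coef: float, b_coef: float) -> float:
    """Linear map from the calibration (theta) scale to the reporting scale."""
    return a_coef * theta + b_coef


def composite(subscale_theta: dict, transform: dict, weights: dict) -> float:
    """Weighted average of subscale estimates on the reporting scale."""
    total_w = sum(weights.values())
    return sum(
        weights[k] * to_reporting(subscale_theta[k], *transform[k]) for k in weights
    ) / total_w


# Placeholder values for two subscales.
theta = {1: 0.2, 2: -0.1}
transform = {1: (30.0, 280.0), 2: (40.0, 270.0)}  # subscale -> (A, B)
weights = {1: 0.6, 2: 0.4}
print(round(composite(theta, transform, weights), 2))  # 278.0
```

Here subscale 1 maps to 30 × 0.2 + 280 = 286 and subscale 2 to 40 × (-0.1) + 270 = 266, so the composite is 0.6 × 286 + 0.4 × 266 = 278. Because the map is linear, the same formula applies to group means as well as to individual values.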
Save the data file as “M36NTPM_Scored.am” in your working folder if you need to use this data file again. Doing so avoids having to repeat the “Update Metadata” step each time.
2. Analyze data with AM
Click on Data/Filter Observations as shown below. In the Select Observations dialog box, double-click “RPTSAMP” in the Variables window and then double-click the equal sign (“=”) in the Operators/Functions window. Type “1” in the Selection Criteria box and then click OK.
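The filter keeps only the records whose RPTSAMP value equals 1. As a rough illustration outside AM, the same selection on hypothetical in-memory records looks like this in Python; the record layout and values are invented for the example.

```python
# Hypothetical student records; only the RPTSAMP = 1 criterion mirrors the AM step.
students = [
    {"RPTSAMP": 1, "DSEX": 1, "ORIGWT": 120.5},
    {"RPTSAMP": 0, "DSEX": 2, "ORIGWT": 98.0},
    {"RPTSAMP": 1, "DSEX": 2, "ORIGWT": 101.5},
]

# Keep only the records that satisfy the selection criterion.
reporting_sample = [s for s in students if s["RPTSAMP"] == 1]
kept_weight = sum(s["ORIGWT"] for s in reporting_sample)
print(len(reporting_sample), kept_weight)  # 2 222.0
```

Checking how many records and how much total weight survive the filter is a quick sanity check before running the MML procedures.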
Click on “Statistics/MML Procedures for Test Data” to start the analysis as shown below, and choose “MML Composite Means”.
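MML procedures estimate group statistics directly from item responses, treating each student's proficiency as a latent variable instead of first computing individual scores. The Python sketch below illustrates that idea under strong simplifications: MC items only (3PL with an assumed scaling constant of 1.7), a single subscale, a normal latent distribution evaluated on a fixed grid, and a plain EM loop. The function name mml_mean_sd and the response data are hypothetical. This is not AM's algorithm; max_iter only loosely mirrors AM's Max. Iterations setting.

```python
import math


def p_correct(theta, a, b, c):
    """3PL probability of a correct response (scaling constant 1.7 assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def mml_mean_sd(responses, items, weights, max_iter=500, tol=1e-6):
    """EM estimate of the mean and SD of a normal theta distribution.

    responses: one list of 0/1 scores per student (None = item not taken)
    items:     (a, b, c) for each item, held fixed at calibrated values
    weights:   sampling weight of each student
    """
    nodes = [-4.0 + 0.1 * q for q in range(81)]  # fixed grid on the theta scale
    # Likelihood of each student's response pattern at every grid node.
    lik = []
    for resp in responses:
        row = []
        for x in nodes:
            value = 1.0
            for score, (a, b, c) in zip(resp, items):
                if score is None:
                    continue
                p = p_correct(x, a, b, c)
                value *= p if score == 1 else 1.0 - p
            row.append(value)
        lik.append(row)

    total_w = sum(weights)
    mu, sd = 0.0, 1.0
    for _ in range(max_iter):
        prior = [math.exp(-0.5 * ((x - mu) / sd) ** 2) for x in nodes]
        # E-step: posterior mean and second moment of theta per student.
        m1, m2 = [], []
        for row in lik:
            joint = [lk * pr for lk, pr in zip(row, prior)]
            norm = sum(joint)
            m1.append(sum(j * x for j, x in zip(joint, nodes)) / norm)
            m2.append(sum(j * x * x for j, x in zip(joint, nodes)) / norm)
        # M-step: weighted mean and SD of the latent distribution.
        new_mu = sum(w * v for w, v in zip(weights, m1)) / total_w
        new_var = sum(w * v for w, v in zip(weights, m2)) / total_w - new_mu ** 2
        new_sd = math.sqrt(max(new_var, 0.01))  # keep SD at least one grid step
        done = abs(new_mu - mu) < tol and abs(new_sd - sd) < tol
        mu, sd = new_mu, new_sd
        if done:
            break
    return mu, sd


items = [(0.6784, -0.3275, 0.1494), (1.2169, 1.8095, 0.165),
         (1.0513, 1.0011, 0.2178), (1.5963, 0.6095, 0.0747),
         (0.8848, -1.0809, 0.0691)]  # MC rows from the item table
responses = [[1, 1, 1, 0, 1], [0, 0, 1, 0, 1], [1, 0, None, 1, 1]]
mu, sd = mml_mean_sd(responses, items, weights=[1.0, 1.0, 1.0])
print(round(mu, 3), round(sd, 3))
```

Group comparisons such as the DSEX analysis below would, in this simplified picture, amount to running the estimate separately for each group; AM instead handles groups, subscales and polytomous items within its own MML machinery.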
AM lets users select variables by dragging them from one dialog box to another. Here, drag the whole test at once, or each of the five subtests individually, from the Tests tab in the left window (as shown below) into the Dependent Variables box. Then select the Variables tab, locate the variable DSEX, and drag and drop it into the Independent Variables box. Finally, change the output type to Plain Text Output. Make sure that Strata is “REPGRP1”, Cluster is “JKUNIT” and Weight is “ORIGWT”, as shown below. Then go to the “Advanced Parameters” tab, enter 500 in “Max. Iterations”, and click OK.
Fill in the title ‘2005 Primer Mathematics Grade 8 Student, Teacher & School Data’, then
drag and drop the tests from the ‘Tests’ tab into the box for dependent variables.
Locate the variable “DSEX” on the ‘Variables’ tab, and drag and drop it into the box for independent variables.
Change the output format to “Plain Text Output” to produce a text file, and make sure that Strata is “REPGRP1”, Cluster is “JKUNIT” and Weight is “ORIGWT”, as shown below.
Set Max. Iterations to 500, and click OK to run direct estimation.
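The Strata, Cluster and Weight settings are what make the standard errors design-based. One common way to use them is Taylor-series linearization, and the Python sketch below applies it to a simple weighted mean of an observed variable. This is a generic textbook version, not AM's computation for MML estimates (which also accounts for the latent-variable model). The function name and the toy data are hypothetical; strata and clusters stand in for REPGRP1 and JKUNIT.

```python
import math
from collections import defaultdict


def weighted_mean_se(values, weights, strata, clusters):
    """Weighted mean and its Taylor-series (linearization) standard error.

    Assumes clusters are nested within strata and each stratum has at
    least two clusters, with clusters treated as sampled with replacement.
    """
    total_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, values)) / total_w
    # Linearized contribution of each student, summed to cluster totals.
    cluster_tot = defaultdict(float)
    for y, w, h, c in zip(values, weights, strata, clusters):
        cluster_tot[(h, c)] += w * (y - mean) / total_w
    by_stratum = defaultdict(list)
    for (h, _), z in cluster_tot.items():
        by_stratum[h].append(z)
    var = 0.0
    for zs in by_stratum.values():
        n = len(zs)
        if n < 2:
            raise ValueError("each stratum needs at least two clusters")
        z_bar = sum(zs) / n
        var += n / (n - 1) * sum((z - z_bar) ** 2 for z in zs)
    return mean, math.sqrt(var)


values = [1.0, 3.0, 2.0, 6.0]
weights = [1.0, 1.0, 1.0, 1.0]
strata = [1, 1, 2, 2]    # plays the role of REPGRP1
clusters = [1, 2, 1, 2]  # plays the role of JKUNIT
mean, se = weighted_mean_se(values, weights, strata, clusters)
print(mean, round(se, 4))  # 3.0 1.118
```

With exactly two clusters per stratum, each stratum's contribution reduces to the squared difference of its two cluster totals, which matches a paired stratum/cluster design like the one these design variables describe.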
The results window opens after several minutes. Save the output (shown below) as the text file “M36NT2PM_Scored.txt” in your working folder.
If you replace DSEX with SDRACEM and keep all other steps the same, you will get the following result.