The Art and Science of Test Development—Part D
Develop norm (standardization) sample plan
The basic structure and content of this presentation is grounded extensively on the test development procedures developed by Dr. Richard Woodcock
Kevin S. McGrew, PhD.
Educational Psychologist
Research Director, Woodcock-Muñoz Foundation
Part A: Planning, development frameworks & domain/test specification blueprints
Part B: Test and Item Development
Part C: Use of Rasch Technology
Part D: Develop norm (standardization) plan
Part E: Calculate norms and derived scores
Part F: Psychometric/technical and statistical analysis: Internal
Part G: Psychometric/technical and statistical analysis: External
The Art and Science of Test Development
The above titled topic is presented in a series of sequential PowerPoint modules. It is strongly recommended that the modules (A-G) be viewed in sequence.
The current module is designated by red bold font lettering
Develop norm (standardization) sampling plan
The goal: A nationally representative sample from which to develop test norms
Implementation of sampling plan in practical test development framework
Gather the norm (standardization) data and use for Rasch scaling, norm development, psychometric, and statistical analyses
Implementation of sampling plan in Conceptual Psychometric Validity Framework
Gather the norm (standardization) data and use for Rasch scaling, norm development, psychometric, and statistical analyses
Develop norm (standardization) sample specifications for country/nation where test is intended to be used
Sampling plan should be based on best available source of national statistics (select examples below)
Stage 1: Sampling of communities
Stage 2: Sampling of schools
Stage 3: Sampling of subjects
• School-age subjects – random sampling in grades
• Preschool, university and adult subjects - quotas selected as per selected US Census variables
Three-stage sampling plan strategy used for WJ batteries in United States
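The three stages above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the nested `frame` dictionary, the function name `three_stage_sample`, and all sample sizes are made up, and simple random draws stand in for the stratified selection an actual plan would use at each stage.

```python
import random

def three_stage_sample(communities, n_communities, n_schools, n_students, seed=0):
    """Draw communities, then schools within them, then students within schools.

    `communities` is a hypothetical nested dict: {community: {school: [student, ...]}}.
    Simple random sampling is an assumption; a real plan would stratify each stage.
    """
    rng = random.Random(seed)
    selected = []
    # Stage 1: sampling of communities
    for community in rng.sample(sorted(communities), n_communities):
        schools = communities[community]
        # Stage 2: sampling of schools within each selected community
        for school in rng.sample(sorted(schools), min(n_schools, len(schools))):
            students = schools[school]
            # Stage 3: random sampling of subjects within each selected school
            for student in rng.sample(students, min(n_students, len(students))):
                selected.append((community, school, student))
    return selected

# Toy sampling frame (made-up data): 4 communities x 3 schools x 20 students.
frame = {
    f"community_{c}": {
        f"school_{c}_{s}": [f"student_{c}_{s}_{i}" for i in range(20)]
        for s in range(3)
    }
    for c in range(4)
}
sample = three_stage_sample(frame, n_communities=2, n_schools=2, n_students=5)
print(len(sample), "subjects selected")
```

Fixing the seed makes the draw reproducible, which helps when the plan must be documented or audited later.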
Goal is a nationally representative sample upon which to base the norms for the measures (tests, clusters)
No sampling plan is perfect: “Tweak” final norm data via subject weighting
A few tips/cautions
• Oversample small groups and then down-weight
• No amount of creative weighting can “fix” a poorly executed sampling plan
• Continuous sample analysis: Monitor the data frequently as it is collected to see how close the “fit” is between the sampling plan and actual subject characteristics. This lets you make adjustments during data collection (before it is too late)
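As a rough illustration of "oversample, then down-weight," the sketch below computes simple post-stratification weights (population proportion divided by sample proportion). The group names, counts, and proportions are invented for the example, and this single-variable weighting is an assumption, not the procedure used for any published battery.

```python
def poststratification_weights(sample_counts, population_props):
    """Weight per subject in each group = population share / sample share."""
    n = sum(sample_counts.values())
    return {
        group: population_props[group] * n / count
        for group, count in sample_counts.items()
    }

# Made-up example: group_c was deliberately oversampled (10% of sample, 5% of population).
sample_counts = {"group_a": 700, "group_b": 200, "group_c": 100}
population_props = {"group_a": 0.80, "group_b": 0.15, "group_c": 0.05}

weights = poststratification_weights(sample_counts, population_props)
# The weighted group sizes sum back to the original sample size.
weighted_n = sum(weights[g] * sample_counts[g] for g in sample_counts)
for group, w in weights.items():
    print(group, round(w, 3))
```

Groups that were oversampled receive weights below 1, and under-represented groups receive weights above 1. Very large weights are a warning sign that the sampling plan itself was poorly executed.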
WJ III Three-stage sampling plan strategy: Stage 1
Sampling of communities – communities sampled according to 10 different community characteristics
[Figure: two communities, one labeled $$$$$$$$ and one labeled $$]
Are students who are categorized as “low SES” (at parent/family level) from the $$ Community likely to have the same community/life/family/educational experiences as “low SES” students (at parent/family level) from the $$$$$$$$ Community?
Important question to think about
[Figure: 3 x 3 grid crossing family/parent SES classification (Low, Middle, High) with community SES classification (Low, Middle, High); Communities A, B, and C are marked]
Illustration of potential threat to representative sample when community SES is ignored and only family/parent SES is used in sampling plan
Level of abilities of selected students
Community SES is one of the most important, yet most frequently overlooked, variables in a test standardization sampling plan
Sample research/norm demographic and data file variable coding sheet: WJ III
Tips/Cautions
• Spend considerable time designing this data collection form (be OCD)
• Think of all future analyses when deciding what to include/code
• If in doubt about a variable…include it (you typically can’t go back later to get information)
• Create coding system with quality data entry procedures in mind
• You can’t analyze what you haven’t gathered and coded from the beginning
Development and implementation of sampling plan is probably the most critical phase of test development
• GIGO – garbage in, garbage out
Retain as much control over all aspects of data collection and data entry as possible
Quality of data is only as good as your pool of examiners
• Recruiting, training, supervising, and retaining good examiners requires major attention and is very important
• Examiners need supervision
• Terminate poor examiners as soon as you can
• Central office should review every single piece of information on submitted test records, especially at the beginning of an examiner’s testing (you don’t want them practicing errors)
• Run special Rasch “person fit” reports to flag test records that look suspicious (and see if they come from certain examiners)
• Best examiners are not necessarily psychologists or doctoral students.
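One way a “person fit” flag can be computed is the outfit mean-square statistic for the dichotomous Rasch model, sketched below. The slides do not say which fit statistic was used, so treat this choice as an assumption; the ability value, item difficulties (in logits), and response patterns are made up.

```python
import math

def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def outfit_mean_square(responses, theta, difficulties):
    """Mean of squared standardized residuals across the items a person took."""
    squared_residuals = []
    for x, b in zip(responses, difficulties):
        p = rasch_probability(theta, b)
        squared_residuals.append((x - p) ** 2 / (p * (1.0 - p)))
    return sum(squared_residuals) / len(squared_residuals)

# Made-up item difficulties in logits, ordered easy to hard.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta = 0.5  # assumed ability estimate, shared by both records for simplicity

expected_pattern = [1, 1, 1, 0, 0]  # passes easy items, fails hard ones
reversed_pattern = [0, 0, 1, 1, 1]  # fails easy items, passes hard ones

fit_expected = outfit_mean_square(expected_pattern, theta, difficulties)
fit_reversed = outfit_mean_square(reversed_pattern, theta, difficulties)
print(round(fit_expected, 2), round(fit_reversed, 2))
```

Values far above 1 indicate response patterns the model finds surprising. Tabulating flagged records by examiner shows whether suspicious records cluster on particular individuals.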
Spend considerable time developing the test record and variable coding sheets
Run “continuous sample analysis reports” to monitor sampling plan adherence or drift – so you can make changes quickly before it is too late
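A continuous sample analysis report can be as simple as comparing observed proportions against the plan's targets and flagging large gaps. The sketch below assumes one demographic variable at a time; the variable name `region`, the target proportions, and the 5-point tolerance are all hypothetical.

```python
from collections import Counter

def sampling_drift_report(records, targets, variable, tolerance=0.05):
    """Compare observed proportions on one variable with sampling-plan targets."""
    counts = Counter(record[variable] for record in records)
    n = len(records)
    report = {}
    for level, target in targets.items():
        observed = counts.get(level, 0) / n
        report[level] = {
            "target": target,
            "observed": observed,
            "flag": abs(observed - target) > tolerance,  # True means drift
        }
    return report

# Made-up records collected so far.
records = (
    [{"region": "north"}] * 50
    + [{"region": "south"}] * 33
    + [{"region": "west"}] * 17
)
targets = {"north": 0.40, "south": 0.35, "west": 0.25}

report = sampling_drift_report(records, targets, "region")
for level, row in report.items():
    print(level, row)
```

Running a report like this after every batch of submitted records keeps drift visible while there is still time to redirect recruitment.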
There is no single-purpose system, or collection of published software programs, that can handle the detailed and fluid work of data entry, editing and monitoring. Be prepared to develop and pay for custom software.
Dedicated professional data-entry software is a must. SPSS, Excel, etc. won’t cut it.
• Double data entry verification a must
• Only have a few well trained and diligent individuals enter the data
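The core of double data entry verification is a field-by-field comparison of two independent entry passes. The sketch below is a minimal illustration, not a substitute for dedicated data-entry software; the record IDs, field names, and values are invented.

```python
def compare_entries(first_pass, second_pass):
    """Return (record_id, field, value_1, value_2) for every disagreement."""
    mismatches = []
    for record_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(record_id, {})
        b = second_pass.get(record_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((record_id, field, a.get(field), b.get(field)))
    return mismatches

# Made-up records keyed by subject ID, entered independently by two people.
first_pass = {
    "S001": {"age_months": 101, "item_01": 1, "item_02": 0},
    "S002": {"age_months": 87, "item_01": 1, "item_02": 1},
}
second_pass = {
    "S001": {"age_months": 101, "item_01": 1, "item_02": 0},
    "S002": {"age_months": 78, "item_01": 1, "item_02": 1},  # transposed digits
}

mismatches = compare_entries(first_pass, second_pass)
print(mismatches)
```

Each mismatch is resolved by going back to the paper test record, not by trusting either pass.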
Don’t succumb to samples of convenience
Oversample small groups – and then down-weight
Seriously consider three-stage sampling plan that controls for community SES
Poor sampling plan and data collection cannot hide from good data analysis: Select example
[Figure: three plots of Rasch item W-difficulties, pairing the Central Europe Rep A sample with US norming, the Central Europe Rep B sample with US norming, and the Central Europe Rep B sample with the Central Europe Rep A sample]
Rasch item difficulties (W-difficulties) for a WJ III test in the US norming sample and two neighboring (and very similar) Central European Republics
Something is wrong with the data from Rep B. Rep A is similar to the US data; Rep A and Rep B, which are similar countries, do not even resemble each other in the plots
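The visual comparison of item difficulties can be backed by a simple number, such as the correlation between item difficulties calibrated in two samples. The sketch below uses a hand-written Pearson correlation; every W-difficulty value is made up purely to mimic the pattern described and is not data from the samples discussed.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up W-difficulties for the same six items in three samples.
us_norming = [430, 450, 470, 490, 510, 530]
rep_a = [432, 449, 473, 488, 512, 529]
rep_b = [470, 440, 500, 455, 520, 465]

r_us_a = pearson_r(us_norming, rep_a)
r_us_b = pearson_r(us_norming, rep_b)
r_a_b = pearson_r(rep_a, rep_b)
print(round(r_us_a, 3), round(r_us_b, 3), round(r_a_b, 3))
```

When the same items order very differently in one sample, the first suspects are the sampling plan, test administration, and data entry for that sample, not the items.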
Poor sampling plan and data collection cannot hide from good data analysis: Select example
Distribution of W-abilities. Something is wrong with data from Rep B.
Rep B data is negatively skewed
[Figure: W-ability distributions for the US norming, Central Europe Rep A, and Central Europe Rep B samples]
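Skew in an ability distribution can be checked numerically as well as by eye. The sketch below computes the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second); the W-ability values are invented for illustration only.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (negative means a long left tail)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up W-abilities: one symmetric sample and one with a long low tail.
symmetric_sample = [480, 490, 500, 510, 520]
left_tailed_sample = [440, 480, 495, 500, 502, 505, 507, 510]

skew_symmetric = skewness(symmetric_sample)
skew_left_tailed = skewness(left_tailed_sample)
print(round(skew_symmetric, 3), round(skew_left_tailed, 3))
```

A clearly negative value in one sample, when comparable samples are roughly symmetric, is a prompt to review how subjects in that sample were selected and tested.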
Planned “incomplete” (missing) data collection
Matrix sampling (Partial)
Multiple and EM-based data imputation
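A partial matrix sampling plan gives every subject the core tests plus one rotating block of additional tests, so data are missing by design. The sketch below shows one hypothetical way to rotate blocks; the block names and which tests fall in which block are assumptions and do not reproduce any actual standardization design. The imputation step is not shown.

```python
from collections import Counter
from itertools import cycle

def assign_batteries(subject_ids, core_tests, matrix_blocks):
    """Give each subject the core tests plus one matrix block, in rotation."""
    rotation = cycle(sorted(matrix_blocks))
    return {
        subject_id: core_tests + matrix_blocks[block]
        for subject_id, block in zip(subject_ids, rotation)
    }

# Hypothetical core and matrix blocks (test names borrowed for illustration).
core_tests = ["Verbal Comprehension", "Spatial Relations"]
matrix_blocks = {
    "block_1": ["General Information", "Retrieval Fluency"],
    "block_2": ["Picture Recognition", "Auditory Attention"],
}

plan = assign_batteries(range(1, 7), core_tests, matrix_blocks)
n_per_test = Counter(test for tests in plan.values() for test in tests)
print(n_per_test)
```

Core tests end up with the full sample size, while each matrix test is taken by only a fraction of subjects, which is the pattern a "Total n" column in such a plan reflects.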
Test 1 A B C D E X Total n
1. Verbal Comprehension C C C C C C 1346
2. Visual-Auditory Learning C C C C C C 1336
3. Spatial Relations C C C C C C 1375
4. Sound Blending C C C C C C 1382
5. Concept Formation C C C C C C 1300
6. Visual Matching C C C C C C 1364
7. Numbers Reversed C C C C C C 1325
8. Incomplete Words C M 627
9. Auditory Working Memory C M 548
11. General Information M M 389
12. Retrieval Fluency M M 388
13. Picture Recognition M M 390
14. Auditory Attention M M 388
15. Analysis-Synthesis M M 362
16. Decision Speed M M 376
17. Memory for Words M M 364
1. Letter-Word Identification C C C C C C 1323
2. Reading Fluency C M 588
5. Calculation C C C C C C 1210
6. Math Fluency C M 588
7. Spelling C M 557
8. Writing Fluency M M 352
9. Passage Comprehension C C C C C C 1203
10. Applied Problems C C C C C C 964
11. Writing Samples M M 342
Totals 450 102 117 127 108 120 372 1396
Mixed batteries
Recent example: Australian WJ III standardization partial matrix sampling plan
Yellow column (1) represents initial “core” battery prior to implementation of “core + matrix” battery plan
End of Part D
Additional steps in test development process will be presented in subsequent modules as they are developed