Enhancing the Technical Quality of the North Carolina Testing Program: An Overview of Current Research Studies
Nadine McBride, NCDPI; Melinda Taylor, NCDPI; Carrie Perkis, NCDPI
Overview
• Comparability
• Consequential validity
• Other projects on the horizon
Comparability
• Previous Accountability Conference presentations provided early results
• Research funded by an Enhanced Assessment Grant from the US Department of Education
• Focused on the following topics:
– Translations
– Simplified language
– Computer-based
– Alternative formats
What is Comparability?
Not just “same score”:
• Same content coverage
• Same decision consistency
• Same reliability and validity
• Same other technical properties (e.g., factor structure)
• Same interpretations of test results, with the same level of confidence
Goal
• Develop and evaluate methods for determining the comparability of scores from test variations to scores from the general assessments
• It should be possible to draw the same inferences, with the same level of confidence, from variations of the same test.
Research Questions
• What methods can be used to evaluate score comparability?
• What types of information are needed to evaluate score comparability?
• How do different methods compare in the types of information about comparability they provide?
Products
• Comparability Handbook
  – Current Practice
    • State Test Variations
    • Procedures for Developing Test Variations and Evaluating Comparability
  – Literature Reviews
  – Research Reports
  – Recommendations
    • Designing Test Variations
    • Evaluating Comparability of Scores
Results – Translations
• Replication methodology is helpful when faced with small samples and widely different proficiency distributions
– Gauge variability due to sampling (random) error
– Gauge variability due to distribution differences
• Multiple methods for evaluating structure are helpful
• Effect size criteria are helpful for DIF (a minimal sketch follows this slide)
• Congruence between structural and DIF results
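The slides do not name the specific DIF statistic or effect-size criterion used in the translation studies. As one illustration of the kind of analysis involved, the sketch below flags items using a Mantel-Haenszel common odds ratio converted to the ETS delta metric; the dichotomous item scores, the total-score stratifier, and the cutoffs are assumptions, not NCDPI's documented procedure.

```python
# Illustrative only: Mantel-Haenszel DIF with an ETS-style delta effect size.
# Assumes dichotomous (0/1) item scores, total score as the matching variable,
# and ignores the significance-test portion of the full ETS A/B/C rules.
import numpy as np

def mh_dif(item, total, group):
    """item: 0/1 responses; total: stratifying score; group: 0 = reference, 1 = focal."""
    item, total, group = (np.asarray(v) for v in (item, total, group))
    num = den = 0.0
    for k in np.unique(total):
        m = total == k
        a = np.sum((group[m] == 0) & (item[m] == 1))   # reference, correct
        b = np.sum((group[m] == 0) & (item[m] == 0))   # reference, incorrect
        c = np.sum((group[m] == 1) & (item[m] == 1))   # focal, correct
        d = np.sum((group[m] == 1) & (item[m] == 0))   # focal, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    alpha = num / den if den > 0 else np.nan           # common odds ratio
    delta = -2.35 * np.log(alpha)                       # ETS delta metric
    category = "A" if abs(delta) < 1.0 else ("C" if abs(delta) > 1.5 else "B")
    return alpha, delta, category                       # A = negligible, C = large
```

Comparing which items this flags against the structural (factor) results is one way to check the congruence between structural and DIF evidence mentioned above.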
Results – Simplified Language
• Development procedures that are carefully documented, consistently followed, and focused on maintaining the item construct can support comparability arguments.
• Linking/equating approaches can be used to examine and/or establish comparability (a minimal sketch follows this slide).
• Comparing item statistics using the non-target group can provide information about comparability.
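The slides do not say which linking or equating design was applied to the simplified-language form. As a hedged illustration, the sketch below uses simple mean-sigma linear equating to place variation-form scores on the general-form scale; the score distributions and sample sizes are made up for the example.

```python
# Illustrative only: mean-sigma linear equating of a variation-form score scale
# onto the general-form scale; the actual NCDPI design is not specified here.
import numpy as np

def linear_equate(variation_scores, general_scores):
    """Return a function mapping variation-form scores to the general-form scale."""
    x = np.asarray(variation_scores, dtype=float)
    y = np.asarray(general_scores, dtype=float)
    slope = y.std(ddof=1) / x.std(ddof=1)      # match standard deviations
    intercept = y.mean() - slope * x.mean()    # match means
    return lambda s: slope * np.asarray(s, dtype=float) + intercept

# Hypothetical data, purely for demonstration
rng = np.random.default_rng(0)
simplified = rng.normal(48, 9, size=500)       # simplified-language form scores
general = rng.normal(50, 10, size=500)         # general-form scores
to_general = linear_equate(simplified, general)
print(round(float(to_general(48)), 1))         # a score of 48 maps to roughly 50
```

How closely the equated scores track the general-form scale is one piece of evidence about comparability, necessary but not sufficient on its own.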
Results – Computer-based
• Propensity score matching produced similar results to studies using within-subjects samples.
• The propensity score method provides a viable alternative to the difficult-to-implement repeated measures study (a minimal sketch follows this slide).
• Propensity score method is sensitive to group differences. For instance, the method performed better when 8th and 9th grade groups were matched separately.
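The covariates, matching rule, and caliper used in the NCDPI computer-based study are not given in the slides. The sketch below shows one common form of the approach, logistic-regression propensity scores with greedy 1:1 nearest-neighbor matching within a caliper; all specifics are assumptions for illustration.

```python
# Illustrative only: propensity-score matching of computer-based and paper test
# takers on background covariates, then comparing matched-group mean scores.
import numpy as np
from sklearn.linear_model import LogisticRegression

def matched_mode_effect(X, mode, score, caliper=0.05):
    """X: covariate matrix; mode: 1 = computer, 0 = paper; score: test scores."""
    X = np.asarray(X, dtype=float)
    mode = np.asarray(mode)
    score = np.asarray(score, dtype=float)
    ps = LogisticRegression(max_iter=1000).fit(X, mode).predict_proba(X)[:, 1]
    computer = np.where(mode == 1)[0]
    paper = np.where(mode == 0)[0]
    used, pairs = set(), []
    for i in computer:                               # greedy 1:1 nearest neighbor
        order = paper[np.argsort(np.abs(ps[paper] - ps[i]))]
        for j in order:
            if j not in used and abs(ps[j] - ps[i]) <= caliper:
                used.add(j)
                pairs.append((i, j))
                break
    diff = np.mean([score[i] - score[j] for i, j in pairs])
    return diff, len(pairs)                          # matched mean mode difference
```

Per the slide, running the matching within grade (for example, 8th and 9th graders separately) rather than pooling grades produced better results.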
Results – Alternative Formats
• The burden of proof is much heavier for this type of test variation.
• A study based on students eligible for the general test can provide some, but not solid, evidence of comparability.
• Judgment-based studies combined with empirical studies are needed to evaluate comparability.
• More research is needed on methods for evaluating which constructs each test type measures.
Lessons Learned
• It takes a village…
– Cooperative effort of SBE, IT, districts and schools to implement special studies
– Researchers to conduct studies, evaluate results
– Cooperative effort of researchers and TILSA members to review study design and results
– Assessment community to provide insight and explore new ideas
Consequential Validity
• What is consequential validity?
– An amalgamation of evidence regarding the degree to which use of test results has social consequences
– Consequences can be both positive and negative, intended and unintended
Whose Responsibility?
• Role of the Test Developer versus the Test User?
• Responsibility and roles are not clearly defined in the literature
• State may be designated as both a test developer and a user
Test Developer Responsibility
• Generally responsible for…
– Intended effects
– Likely side effects
– Persistent unanticipated effects
– Promoted use of scores
– Effects of testing
Test Users’ Responsibility
• Generally responsible for…
– Use of scores
• The further from the intended uses, the greater the responsibility
Role of Peer Review
• Element 4.1
– For each assessment, including the alternate assessment, has the state documented the issue of validity… with respect to the following categories:
• g) Has the state ascertained whether the assessment produces intended and unintended consequences?
Study Methodology
• Focus Groups
– Conducted in five regions across the state
– Led by NC State’s Urban Affairs
– Completed in December 2009 and January 2010
– Input of teachers and administration staff
– Included large, small, rural, urban, and suburban schools
Study Methodology
• Survey Creation
– Drafts currently modeled after surveys conducted in other states
– However, most of those surveys were conducted 10+ years ago
– Surveys will be finalized after focus group results are reviewed
Study Methodology
• Survey administration
– Testing Coordinators to receive survey notification
– Survey to be available in late March to April
Study Results
• Stay tuned!
– Hope to make the report publicly available on the DPI testing website
Other Research Projects
• Trying out different item types
• Item location effects
• Auditing
Contact Information
• Nadine McBride, Psychometrician, nmcbride@dpi.state.nc.us
• Melinda Taylor, Psychometrician, mtaylor@dpi.state.nc.us
• Carrie Perkis, Data Analyst, cperkis@dpi.state.nc.us