NCSA, Detroit, June 2010

Kevin King, Utah State Office of Education
Sarah Susbury, Virginia Department of Education
Dona Carling, Measured Progress
Transcript
Page 1:

NCSA, Detroit, June 2010

Page 2:

Kevin King, Utah State Office of Education
Sarah Susbury, Virginia Department of Education
Dona Carling, Measured Progress
Kelly Burling, Pearson
Chris Domaleski, National Center for the Improvement of Educational Assessment

Page 3:

For understandable reasons, many states deliver statewide assessments in both paper and computer modes.

Also unsurprisingly, the comparability of scores across the two modes is a concern.

What about comparability issues when switching computer-based testing interfaces?

Page 4:

This session will explore the idea of comparability in the context of statewide testing using two administration modes.

Challenges addressed will include:
◦ satisfying peer review,
◦ the impact of switching testing providers,
◦ changing interfaces,
◦ adding tools and functionality, and
◦ adding item types.

Page 5:

The ability of a system to deliver data that can be compared in standard units of measurement and by standard statistical techniques with the data delivered by other systems. (online statistics dictionary)

Comparability refers to the commonality of score meaning across testing conditions including delivery modes, computer platforms, and scoring presentation. (Bennett, 2003)

Page 6:

From Peer Review Guidance (2007), "Comparability of results": Many uses of State assessment results assume comparability of different types: comparability from year to year, from student to student, and from school to school. Although this is difficult to implement and to document, States have an obligation to show that they have made a reasonable effort to attain comparability, especially where locally selected assessments are part of the system.

Page 7:

Section 4, Technical Quality, 4.4: When different test forms or formats are used, the State must ensure that the meaning and interpretation of results are consistent.

Has the State taken steps to ensure consistency of test forms over time?

If the State administers both an online and a paper-and-pencil test, has the State documented the comparability of the electronic and paper forms of the test?

Page 8:

Section 4, Technical Quality, 4.4, Possible Evidence:

Documentation describing the State's approach to ensuring comparability of assessments and assessment results across groups and time.

Documentation of equating studies that confirm the comparability of the State's assessments and assessment results across groups and across time, as well as follow-up documentation describing how the State has addressed any deficiencies.

Page 9:

Bennett, R. E. (2003). Online assessment and the comparability of score meaning. Princeton, NJ: Educational Testing Service.

U.S. Department of Education (2009). Standards and Assessments Peer Review Guidance: Information and Examples for Meeting Requirements of the No Child Left Behind Act of 2001. Washington, DC: U.S. Government Printing Office.

Page 10:

Kevin King, Assessment Development Coordinator

Page 11:

Previous concerns have centered on comparability between paper-based testing (PBT) and computer-based testing (CBT).

There is potentially more variability between one CBT system and another than between CBT and PBT.

There are policy considerations about when to pursue comparability studies and when not to.

Page 12:

• Which tests: 27 multiple-choice CRTs
  • Grades 3–11 English language arts
  • Grades 3–7 math, Pre-Algebra, Algebra 1, Geometry, Algebra 2
  • Grades 4–8 science, Earth Systems Science, Physics, Chemistry, Biology
• Timeline, CBT participation (total test administrations approximately 1.2 million)
  – 2001–2006: 4%–8%
  – 2007: 8%
  – 2008: 50%
  – 2009: 66%
  – 2010: 80%

Page 13:

Focus on PBT-to-CBT comparability
◦ Prior to this year, that is what was warranted

Page 14:

2006 (8% CBT participation)
◦ Item-by-item performance comparison
◦ Matched Samples Comparability Analyses, using the NRT as a basis for the matched set (see the sketch below)
◦ ELA (3, 5, & 8), Math (3, 5, & 8), Science (5 & 8)
◦ Results
◦ Actionable conclusions
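
As a rough illustration of the matched-samples idea (not the study's actual procedure), the sketch below matches CBT and PBT examinees on a norm-referenced test (NRT) score and then compares criterion-referenced test (CRT) means between the matched groups. The file name, column names, and data are all hypothetical.

```python
# Illustrative matched-samples mode comparison (hypothetical data, not the
# actual 2006 analysis). One row per student: mode ("CBT"/"PBT"),
# nrt_score (matching variable), crt_score (outcome).
import pandas as pd
from scipy import stats

df = pd.read_csv("students.csv")  # hypothetical file

# Exact matching on NRT score: within each NRT score point, keep equal
# numbers of CBT and PBT examinees so the two groups share an NRT profile.
matched_parts = []
for _, grp in df.groupby("nrt_score"):
    cbt = grp[grp["mode"] == "CBT"]
    pbt = grp[grp["mode"] == "PBT"]
    n = min(len(cbt), len(pbt))
    if n > 0:
        matched_parts.append(cbt.sample(n, random_state=0))
        matched_parts.append(pbt.sample(n, random_state=0))
matched = pd.concat(matched_parts)

# Compare CRT performance between the matched groups.
cbt_scores = matched.loc[matched["mode"] == "CBT", "crt_score"]
pbt_scores = matched.loc[matched["mode"] == "PBT", "crt_score"]
t, p = stats.ttest_ind(cbt_scores, pbt_scores)
print(f"CBT mean {cbt_scores.mean():.2f}, PBT mean {pbt_scores.mean():.2f}, "
      f"t = {t:.2f}, p = {p:.4f}")
```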

Page 15:

2006 Study Results

Content   Grade   Higher/Lower on CBT vs. PBT   Statistically significant
ELA       3       Lower                         Yes
ELA       5       Lower                         No
ELA       8       Higher                        No
Math      3       Higher                        Yes
Math      5       Higher                        Yes
Math      8       Higher                        Yes
Science   5       Lower                         No
Science   8       Higher                        No

Conclusions: additional investigations of item-level mode effects should be conducted.
Policy overtones: rapid movement to 100% CBT impacts these investigations.

Page 16:

• 2008 (50% CBT participation)
  – Focus on mode transition (i.e., from PBT one year to CBT the next year)
  – Determine PBT and CBT raw-score-to-scale-score (rs-ss) tables for all courses (see the sketch below)
  – Benefit to CBT if the PBT scale score is lower than the CBT scale score
  – Very few variances between rs-ss tables (no variances at the proficiency cut)
• Conclusions and policy: move forward with CBT as the base for data decisions
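
A quick sketch of what diffing mode-specific rs-ss tables could look like. The table values and cut score below are invented for illustration; real tables come from separate equatings for each mode.

```python
# Illustrative comparison of raw-score-to-scale-score (rs-ss) tables for the
# two modes. Values are made up, not Utah's actual conversions.
PBT_TABLE = {0: 100, 1: 112, 2: 123, 3: 135, 4: 148, 5: 160}  # raw -> scale
CBT_TABLE = {0: 100, 1: 112, 2: 124, 3: 135, 4: 148, 5: 160}
PROFICIENCY_CUT = 135  # hypothetical scale-score cut

for raw in sorted(PBT_TABLE):
    diff = CBT_TABLE[raw] - PBT_TABLE[raw]
    if diff != 0:
        flag = " <- check proficiency cut" if PROFICIENCY_CUT in (
            PBT_TABLE[raw], CBT_TABLE[raw]) else ""
        print(f"raw {raw}: PBT {PBT_TABLE[raw]} vs CBT {CBT_TABLE[raw]} "
              f"({diff:+d}){flag}")
```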

Page 17:

Issues
◦ Variations due to local configurations
  Screen resolution (e.g., 800x600 vs. 1280x1024)
  Monitor size
  Browser differences

How much of an issue is this?

Page 18:

A current procurement dilemma
◦ How to mitigate: what could be different will be different
◦ Item displays
  How items are displayed: as graphic images, with text wrapping
  How graphics are handled by the different systems
◦ Item transfer between providers and interoperability concerns
  Brought about re-entering of items and potential variations in presentation

Page 19:

How to make decisions as the same system with the same vendor advances
◦ Tool availability
◦ Text wrapping
◦ HTML coding

When do you take advantage of technology advances at the cost of item similarity?

Page 20:

◦ Portal/interface for item access
  How students navigate the system
  How tools function (e.g., highlighter, cross out, item selection)
  How advanced tools function
    Text enlarger
    Color contrast

Page 21:

Utah will be bringing on technology-enhanced items.

How will the different tests from year to year be comparable?

What about PBT versions of the tests administered as accommodations?

Page 22:

Technical disruptions during testing
Student familiarity with workstation and environment

Page 23:

PBT and CBT
CBT and PBT
Vendors
Operating systems and browsers
Curriculum changes

Page 24:

Forced us to really address
◦ why is comparability important
◦ AND
◦ what does that mean?

Is "equal" always "fair"?

Page 25:

Sarah Susbury, Director of Test Administration, Scoring, and Reporting

Page 26:

• Which tests
  • Grades 3–8 Reading and End-of-Course (EOC) Reading
  • Grades 3–8 Math, EOC Algebra 1, EOC Geometry, EOC Algebra 2
  • Grades 3, 5, and 8 Science, EOC Earth Science, EOC Biology, EOC Chemistry
  • Grade 3 History, Grades 5–8 History, EOC VA & US History, EOC World History I, EOC World History II, EOC World Geography
• Phased approach (EOC --> MS --> ES)
• Participation by districts was voluntary
• Timeline (growth in online tests administered)
  – 2001: 1,700
  – Spring 2004: 265,000
  – Spring 2006: 924,000
  – Spring 2009: 1.67 million
  – Spring 2010: 2.05 million

Page 27:

• Conducted comparability studies with the online introduction of each new EOC subject (2001–2004)
• Students were administered a paper/pencil test form and an equivalent online test form in the same administration (a paired-analysis sketch follows below)
  – Students were not provided scores until both tests were completed
  – Students were aware they would be awarded the higher of the two scores (motivation)
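
Because each student takes both forms in this design, a paired (repeated-measures) analysis is the natural fit. A minimal sketch with hypothetical data, assuming administration order was counterbalanced; this is not Virginia's published analysis.

```python
# Illustrative paired mode comparison: each student took both a paper/pencil
# form and an equivalent online form. All data are random placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
paper = rng.normal(35, 6, size=500)            # hypothetical raw scores
online = paper + rng.normal(0.3, 3, size=500)  # hypothetical mode effect

# Paired t-test on within-student differences (online minus paper).
t, p = stats.ttest_rel(online, paper)
print(f"mean difference = {np.mean(online - paper):.2f}, "
      f"t = {t:.2f}, p = {p:.4f}")
```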

Page 28:

• Results indicated varying levels of comparability.

• Due to Virginia’s graduation requirement of passing certain EOC tests, the decision was made to equate online tests and paper/pencil tests separately.

• Required planning to ensure adequate n-counts in both modes would be available for equating purposes.

• Comparability has improved over time.

Page 29:

Some accommodations transfer easily between modes:
◦ Read-aloud and audio test administrations
◦ Allowable manipulatives (bilingual dictionary, calculator, etc.)
◦ Answer transcription
◦ Visual aids, magnifiers

Other accommodations do not readily transfer:
◦ Flexible administration (variable number of test items at a time)
◦ Braille (cost of Braille writers)
◦ Large-print forms (ability to hold the test form)

Page 30:

• Screen resolution: changing from 800x600 to 1024x768 pixels
  – Eliminating older hardware from use for online testing
  – Changing the amount of scrolling needed for full view
  – Revise flow of text?
• Desktop vs. laptop computers
  – Less of an issue than in the early 2000s
• Laptop computers vs. “netbooks”
  – Variability of “netbooks”

Page 31:

New vendor or changes in the current vendor’s system
◦ Test navigation may vary
  Advancing or returning to items
  Submitting a test for scoring
  Audio player controls
◦ Test-taking tools & strategies may vary
  Available tools (e.g., highlighter, choice eliminator, mark for review, etc.)
◦ Changes in test security
  Prominent display of student’s name on screen?

Page 32:

Virginia is implementing technology-enhanced items simultaneously with revised Standards of Learning (SOL)
◦ Mathematics SOL
  Revised mathematics standards approved in 2009
  Field testing (embedded) new technology-enhanced math items during 2010-2011 test administrations
  Revised math assessments implemented in 2011-2012 with new standard setting
◦ English and Science SOL
  Revised English and science standards approved in 2010
  Field testing (embedded) new English and science items during 2010-2011 test administrations
  Revised assessments (reading, writing, and science) implemented in 2013

Page 33:

Sometimes there is no choice:
◦ New technology; prior technology becomes obsolete
◦ Procurement changes/decisions

Sometimes there is a choice (or timing options):
◦ Advances in technology
◦ Advances in assessment concepts

Page 34:

A Shared Responsibility
◦ Provide teachers with training and exposure to changes in time to impact instruction and test prep
  Systems, test navigation, test items, accommodations, etc.
◦ Provide students with training and exposure to changes prior to test preparation and prior to testing
  Practice tests in the new environment, sample new item types, etc.
◦ Communicate changes and change processes to all stakeholders in the educational community

Page 35:

Kelly Burling, Pearson

Page 36:

When
What
Why
How
What’s Next

Page 37:

Whenever!

Comparability Studies for Operational Systems
&
Comparability Studies for Research Purposes

Page 38:

Operational
  Introducing CBT
  Curricular changes
  New item types
  New provider
  New interface
  Any time there are changes in a high-stakes environment

Research
  Introducing CBT
  Curricular changes
  New item types
  New provider
  New interface
  Changes over time
  Changes in technology use in schools

Page 39:

See slides 5, 6, 7, & 8

Page 40:

Evaluation Criteria
◦ Validity
◦ Psychometric
◦ Statistical

Wang, T., & Kolen, M. J. (2001). Evaluating comparability in computerized adaptive testing: Issues, criteria and an example. Journal of Educational Measurement, 38, 19–49.

Page 41:

User experience & systems designs
Cognitive psychology
Construct dimensionality
Relationships to external criterion variables
Sub-group differences

Page 42:

Score distribution
Reliability
Conditional SEM
Mean difference
Propensity scores
Item level
◦ Mean difference
◦ IRT parameter differences
◦ Response distributions
◦ DIF

(A sketch of two of these criteria follows below.)
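
Two of these criteria, reliability and the mean score difference, are easy to sketch. The example below is illustrative only: it generates placeholder item responses from a simple Rasch-like model, then computes Cronbach's alpha per mode and a standardized mean difference (Cohen's d) on total scores.

```python
# Illustrative statistical comparability criteria: Cronbach's alpha within
# each mode and the standardized mean total-score difference between modes.
# Item responses are simulated placeholders, not real data.
import numpy as np

def cronbach_alpha(items):
    """items: examinees x items matrix of 0/1 scores."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

def simulate(n_examinees, n_items, rng):
    """Rasch-like simulation so items correlate through a common ability."""
    theta = rng.normal(0, 1, (n_examinees, 1))
    b = rng.normal(0, 1, (1, n_items))
    p = 1 / (1 + np.exp(-(theta - b)))
    return (rng.random((n_examinees, n_items)) < p).astype(int)

rng = np.random.default_rng(0)
cbt, pbt = simulate(800, 40, rng), simulate(900, 40, rng)

print("alpha CBT:", round(cronbach_alpha(cbt), 3))
print("alpha PBT:", round(cronbach_alpha(pbt), 3))

# Standardized mean difference (Cohen's d) on total scores.
cbt_tot, pbt_tot = cbt.sum(axis=1), pbt.sum(axis=1)
pooled_sd = np.sqrt((cbt_tot.var(ddof=1) + pbt_tot.var(ddof=1)) / 2)
d = (cbt_tot.mean() - pbt_tot.mean()) / pooled_sd
print("Cohen's d (CBT - PBT):", round(d, 3))
```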

Page 43:

Evaluating the assumptions underlying the
◦ scoring model
◦ test characteristics
◦ study design

Page 44:

Performance assessments
  E-portfolio with digitally created content, e-portfolio with traditional content, physical portfolio

Platforms
Devices
Data mining

Page 45:

Chris Domaleski, National Center for the Improvement of Educational Assessment

Page 46:

• NCLB Federal Peer Review Guidance (4.4) requires the state to document the comparability of the electronic and paper forms of the test

• AERA, APA, and NCME Joint Standards (4.10): “a clear rationale and supporting evidence should be provided for any claim that scores earned on different forms of a test may be used interchangeably.”

Page 47:

• Design
  – Item and form development processes (e.g., comparable blueprints and specifications)
  – Procedures to ensure comparable presentation of and interaction with items (e.g., can examinees review the entire passage when responding to passage-dependent items?)
    • Don’t forget within-mode consistency. For example, do items render consistently for all computer users?
  – Adequate pilot and field testing of each mode
• Administration
  – Certification process for computer-based administration to ensure technical requirements are met
  – Examinees have an opportunity to gain familiarity with the assessment mode
  – Resources (e.g., calculator, marking tools) are equivalent
  – Accommodations are equivalent

Page 48:

• Analyses
  – Comparability of item statistics
    • To what extent do the same items produce different statistics by mode?
    • Are there differences by item ‘bundles’ (e.g., passage- or stimulus-dependent items)?
    • DIF studies (a sketch follows below)
  – Comparability of scores
    • Comparability of total score by test (e.g., grade, content)
    • Comparability of total score by group (e.g., SWD, ELL, etc.)
    • Comparability of total score by ability region (e.g., by quartiles, TCC correspondence)
    • DTF
    • Classification consistency studies
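
One common DIF technique treats mode as the grouping variable and total score as the stratifier. The sketch below is a minimal Mantel-Haenszel check for a single item; the data and strata are placeholders. The ETS delta transformation and the |delta| >= 1.5 convention for large DIF are standard, but whether they fit a given program is a policy call.

```python
# Illustrative Mantel-Haenszel DIF check for one item, with mode (CBT vs.
# PBT) as the grouping variable and total-score stratum as the matcher.
import numpy as np

def mh_odds_ratio(correct, is_cbt, stratum):
    """MH common odds ratio: correct = 0/1 item score, is_cbt = group flag,
    stratum = matching variable (e.g., total-score band)."""
    num = den = 0.0
    for s in np.unique(stratum):
        m = stratum == s
        a = np.sum(m & (is_cbt == 1) & (correct == 1))  # CBT right
        b = np.sum(m & (is_cbt == 1) & (correct == 0))  # CBT wrong
        c = np.sum(m & (is_cbt == 0) & (correct == 1))  # PBT right
        d = np.sum(m & (is_cbt == 0) & (correct == 0))  # PBT wrong
        t = a + b + c + d
        if t > 0:
            num += a * d / t
            den += b * c / t
    return num / den

# Placeholder data: 2,000 examinees, five total-score bands.
rng = np.random.default_rng(0)
is_cbt = rng.integers(0, 2, 2000)
stratum = rng.integers(0, 5, 2000)
correct = (rng.random(2000) < 0.4 + 0.1 * stratum).astype(int)

or_mh = mh_odds_ratio(correct, is_cbt, stratum)
delta = -2.35 * np.log(or_mh)  # ETS delta scale; |delta| >= 1.5 ~ large DIF
print(f"MH odds ratio = {or_mh:.2f}, ETS delta = {delta:.2f}")
```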

Page 49:

There is no single approach to demonstrating comparability, and no single piece of evidence is likely to be sufficient.

Don’t assume that findings apply to all grades, content areas, and subgroups.

Item type may interact with presentation mode.

Design considerations
◦ Are inferences based on equivalent groups? If so, how is this supported?
◦ Are inferences based on repeated measures? If so, are order effects addressed?

Be clear about the standard of evidence required (a sketch follows below).
◦ Effect size?
◦ Classification differences?
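
A closing sketch of what an explicit standard of evidence might look like in practice, checking both an effect size and a classification-rate difference. The scores, cut score, and thresholds are all invented for illustration; they are not recommended values.

```python
# Illustrative "standard of evidence" check: effect size of the mode
# difference plus the difference in proficiency classification rates.
import numpy as np

rng = np.random.default_rng(0)
cbt = rng.normal(240, 30, 1200)  # hypothetical scale scores by mode
pbt = rng.normal(237, 30, 1500)
CUT = 235                        # hypothetical proficiency cut score

pooled_sd = np.sqrt((cbt.var(ddof=1) + pbt.var(ddof=1)) / 2)
d = (cbt.mean() - pbt.mean()) / pooled_sd
rate_diff = (cbt >= CUT).mean() - (pbt >= CUT).mean()

# Illustrative flagging rule only; actual thresholds are a policy decision.
flag = abs(d) > 0.2 or abs(rate_diff) > 0.03
print(f"d = {d:.3f}, proficient-rate diff = {rate_diff:.3%}, flag = {flag}")
```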