Overview of Testing Standards
Brian Bontempo
Chad Buckendahl
Timothy Muckle
Sandra Neustel
What is a “Standard”?
A prescribed set of rules, conditions, or
requirements concerning definitions of terms;
classification of components; specification of
materials, performance, or operations;
delineation of procedures; or measurement
of quantity and quality in describing
materials, products, systems, services, or
practices.
Source: www.techstreet.com
Accreditation & Best Practice vs. Standards
Accreditation is the process by which a
credentialing or educational program is evaluated
against defined standards, and when in
compliance with these standards, is awarded
recognition by a third party.
A best practice is a suggested method or
technique that has consistently shown results
superior to those achieved with other means.
Technical recommendations
Codes
Guidelines
A Brief History of Standards
Antiquity
Torah – Noah’s ark and “What’s a cubit?!”
Calendars – Cavemen, Egyptians, Sumerians, Oh Mayans!
Cylindrical Stones in 7,000 B.C. Egypt
Middle Ages
Henry I, “Ell,” Length of the king’s arm
A Brief History of Standards
Industrial Revolution
International markets
Mass production
Equipment Commonality
Interchangeability / interoperability
20th Century
Global markets
Proliferation of standards
Standards Organizations: NBS / NIST (1901), ANSI (1918),
ISA (1926) / ISO (1947)
Types of Standards
Measurement
Exchange
Communication
Safety
Quality
Performance
Content / Achievement Standards
Language of Standards in Testing
Psycho“metrics”
“Standard” Setting
Question and Test “Interoperability” (QTI)
Written Standards
Technical Recommendations for Psychological Tests
and Diagnostic Techniques (APA, 1954)
Technical Recommendations for Achievement Tests
(AERA/NCMUE, 1955)
Testing Standards
K-12 Academic Testing Standards & Guidelines
US DOE Standards and Assessment Guidance
CCSSO/ATP Best Practices
AERA/NCME/APA
Certification & Licensure
National Commission for Certifying Agencies
ANSI/ISO/IEC 17024
ICE 1100
Buros Center for Testing
Employment Testing
Equal Employment Opportunity Commission
Uniform Guidelines on Employee Selection Procedures
Brian Bontempo, Ph.D.
AERA/APA/NCME Standards
Although these are called standards, they’re not.
The Standards exclusively use the word “should”.
There is no accreditation model to ascertain that testing programs have met the Standards.
The sheer volume of standards is too great to call these standards.
No organization has succeeded in meeting the relevant standards.
Although many professional organizations were involved in the review process, the Standards were written by academics and lack a sense of the real world.
AERA/APA/NCME Standards
The Standards are a great set of Best Practice
Guidelines.
The Standards are process standards.
The Standards are applicable to a wide variety of
testing programs which makes them a universal
reference point.
The Standards are quite extensive.
The Standards place a heavy emphasis on
validity and validity evidence, especially
consequential validity.
“The largest motivating force behind K12
Academic Assessment Standards is compliance
rather than quality.” – Famous Long-haired NCCA
Psychometric Reviewer
K-12 Academic Testing Standards & Guidelines
Vocabulary
Types of Standards
Content Standards
Achievement Standards
Assessment Standards
Types of Assessment
Summative Assessment
Formative Assessment
Interim Assessment
Vocabulary
Test Sponsors
State
District
School
Teacher
Populations Being Tested
Special Education Students
ELL Students
All other Students
Types of K12 Academic Assessments
Statewide Summative Assessments
District or School Formative Assessments
Classroom Assessments
Unusual Aspects of Statewide Assessments
“System” of Assessments
Multiple Grades
Multiple Subjects
Multiple Measures
Statewide Assessments
Local Assessments
Multiple Uses for a single measure
Alternate Assessments (For IEP and LEP students)
Super detailed achievement standards
Statewide K12 Academic Assessment Standards
US Department of Education Guidelines
AERA/APA/NCME Standards for Educational and
Psychological Testing
CCSSO & ATP Operational Best Practices for
Statewide Large-Scale Assessment Programs
(109 pages)
Target Audience
US DOE Guidance
State School Officers
US DOE Peer Reviewers
Test Vendors
CCSSO & ATP Best Practices
State School Officers
Test Vendors
AERA/APA/NCME Standards (Ch 13)
State School Officers
US DOE Peer Reviewers
Superintendents
Test Vendors
Target Audience
Superintendents
US DOE Peer Reviewers
State School Officers
Test Vendors
US DOE
CCSSO/ATP
AERA/APA/NCME
US Department of Education Guidelines
Elementary and Secondary Education Act (1965)
Amended in 2002 as No Child Left Behind
ED Recovery Act as part of the American
Recovery and Reinvestment Act (2009)
Race to the Top
US Department of Education Guidelines
US DOE Guidelines
ESEA
Not intended to provide a set of standards
Provide guidance to the Peer Reviewers who make
recommendations to DOE
In essence, it leaves room for 50 different assessment
systems.
Race to the Top
Not intended to provide a set of standards
Used Peer Review Guidelines in determining winners
In essence, it consolidates assessments, reducing their number
US DOE Non Regulatory Guidance Documents
http://www2.ed.gov/policy/gen/guid/significant-guidance.html
2010-05-21 – Race to the Top Assessment Program Guidance and Frequently Asked Questions
2009-01-12 – Growth Models – Non-Regulatory Guidance
2007-10-19 – Final Guidance on Maintaining, Collecting, and Reporting Racial and Ethnic Data to the U.S. Department of Education
2007-12-21 – Standards and Assessment Peer Review Guidance: Information and Examples for Meeting Requirements of the No Child Left Behind Act of 2001
2007-07-20 – Modified Academic Achievement Standards – Non-Regulatory Guidance
2007-07-20 – Additional Title I Provisions included in the Regulations Package on Modified Academic Achievement Standards Published in the Federal Register on April 9, 2007 – Non-Regulatory Guidance
2007-05 – Assessment and Accountability for Recently Arrived and Former Limited English Proficient (LEP) Students – Non-Regulatory Guidance
2003-03-10 – Standards and Assessments – Non-Regulatory Guidance
US DOE Standards & Assessments Guidance (2003)
III. Academic Assessment
IV. Issues Related to Special Populations and
Standards and Assessments
V. Assessment Data
VI. Assessments of English Language Proficiency
VII. Federal Funds for State Standards and
Assessments
US DOE Standards and Assessments Guidance (2003)
“More detailed information on validity and
reliability may be found in the forthcoming
Technical Addendum for Standards and
Assessments.”
This was never published.
US DOE Peer Review Guidelines
Ch 1 – Content Standards
Ch 2 – Achievement Standards
Ch 3 – System of Assessments
Ch 4 – Technical Quality of Assessments
Ch 5 – Alignment of Assessments with Standards
Ch 6 – Assessment of All Students
Ch 7 – Assessment Reports
US DOE Guidelines Salient Aspects
“System” of Assessments
Content Standards
Achievement Standards
Coherency across grades and subjects
Equivalency in content, difficulty, quality
Higher Order Thinking Skills
Matrix Sampling
Communication to Stakeholders
Results must be expressed in terms of the standards
Reporting Standards
CCSSO & ATP Best Practices for Statewide Large-
Scale Assessment Programs
CCSSO & ATP Best Practices
It’s not a set of Best Practice Guidelines; it’s an
instructional text
Very Practical, real world
First Attempt
Published in 2012
CCSSO & ATP Best Practices Salient Aspects
Program Management
Program Management & Customer Service
Third Party Management
Assessment Program Procurement
Transition from one Provider to Another
Large Scale Distributed Paper & Pencil Administration
Manufacturing of Materials
Materials Packing
Transportation of Materials
Receiving & Processing Materials
Scanning
CCSSO & ATP Best Practices Common Aspects
Test Development
Item Development
Test Construction
Field Testing
Assessment of Special Populations
Test Administration
Test Administration
Online Assessment & Technical Support
Scoring
Score Reporting
Technical Defensibility
Technical
Item Banking
Data Management
Security
CCSSO & ATP Best Practices Value
Roadmap for Staff at State School Offices
Roadmap for Test Vendors
Raise awareness of the Public
Increase quality
Increase similarity of programs across states
AERA/APA/NCME
Standards for Educational and Psychological Testing
AERA/APA/NCME Standards – Ch 13
School, District, or State Tests
Admissions Tests
Special Needs Assessments
Classroom Tests
AERA/APA/NCME Standards – Ch 13 Salient Aspects
Test Use
Provide evidence to support each use
Monitor unintended use
Test Administration
Qualify Test Administrators
Allow unlimited attempts if used for graduation
Test Preparation
Opportunity to Learn (OTL)
Teaching to the Test
Interpretation
Advocate Local Norms
Stats to be reported
Visual Summarization of K12 Academic Assessment Standards
Development
Administration
Security
Psychometrics
Reporting
Interpretation
System of Assessments
Group Results (AYP, AMO)
Test Preparation
Program Management
Large Scale Paper & Pencil Administration
US DOE
CCSSO/ATP
AERA/APA/NCME
K12 Academic Assessment Standards Summary
US DOE
Pseudo-accreditation model (Get funding or not)
Compliance rather than Best Practices
Lacks detail and specificity in its guidance
Regulator Lens
AERA/APA/NCME Standards
No accreditation model
Best Practices
Enormous amounts of specificity
Academic Lens
CCSSO/ATP Best Practices
No accreditation model
Best Practices
Large amount of specificity
Practitioner Lens
Sandra Neustel, Ph.D.
NCCA
21 standards
Details added with “Essential elements”
3 Main Groups
Organizational
Assessment/Psychometric
Certification
Product oriented; documentation;
procedures/evidence of procedures
NCCA History
1977: National Commission for Health Certifying
Agencies (NCHCA)
1989:
National Commission for Certifying Agencies (NCCA):
standard setting and accreditation
National Organization for Competency Assurance
(NOCA): membership organization
2009: Institute for Credentialing Excellence (ICE)
NCCA Standards
[Diagram labels: Psychometric; Credential & Certificate; Organization]
NCCA- Organization
1 Purpose of certification program
2 Structure of program- policy/procedures; no conflict of interest
3 Governance (public member; members of board)
4 Finance
5 Staff- knowledge/skill; enough staff
6 Information publishing policies; eligibility requirements; exam results, etc.
7 Description of assessment instrument
8 Award certificate equally
9 Verification of certification
NCCA-Psychometric
10 Practice analysis
11 Exam Development and administration
12 Standard setting
13 Scoring procedures covers performance exams;
score reports; aggregate reports
14 Score reliability
15 Equating
NCCA-Certification
16 Standardization of assessment (administration;
scoring; development)
17 Documentation
18 Records policy
19 Recertification policy
20 Recertification justification
21 Maintain Accreditation
ANSI/ISO/IEC 17024
Focus on “conformity assessment”
a demonstration that specified requirements relating to
a product, process, system, person or body are fulfilled
Benchmarks for organizations….
Globally accepted
Relates back to other ISO standards
Published 2003; update expected later this year
ANSI/ISO/IEC 17024
The standards emphasize the structure of the
organization:
1 Scope
2 Normative References
3 Terms and Definitions
4 Requirements for certification bodies
5 Requirements for persons employed or contracted by a
certification body
6 Certification process
4 Requirements for certification bodies
Certification body: aspirational ability to make its own definition; procedure; requirements
Organizational structure: more practical than above
Independence; finances; conflict of interest;
Development and maintenance of certification scheme
Practice analysis mentioned here; appropriate examination methodology
Management system
Subcontracting
Records
Confidentiality
Security
5 Requirements for persons employed / contracted by cert. body
General
Job descriptions; competence requirements; employee
records
Requirements for examiners
6 Certification process
Application
Evaluation reporting procedures; documentation
of exam
Decision: content of certificate
Surveillance
Recertification
Use of certificates and logos/marks
ANSI/ISO/IEC 17024
[Diagram labels: Psychometric; Credential & Certificate; Organization]
Buros
Buros Institute for Assessment Consultation and
Outreach (BIACO)
“Standards for Proprietary Testing Programs”
(2006)
Draws from AERA/NCME/APA standards
Audit/accreditation component
Buros
1 Purpose of the Testing Program
2 Structure and Resources of the Testing Program
3 Examination Development
4 Examination Administration
5 Scoring and Score Interpretation
6 Exam Security
7 Responsibilities to Examinees and the Public
Buros
[Diagram labels: Psychometric (Testing); Credential & Certificate; Organization]
ICE 1100
SCOPE
Assessment-based certificate programs
Programs that provide education and training and
use an assessment at the end
Not classes, courses, programs, or events…
ICE 1100
Organizational Structure and Resources
Conduct and Oversight of Certificate Program
Activities
Records and Document Management
QA & Program Evaluation
Development, Delivery & Evaluation of
Education/Training
Development, Delivery & Evaluation of
Assessments
Issuance and Use of Certificates
Relative Emphasis
ORGANIZATIONAL
ANSI
NCCA
Buros
PSYCHOMETRICS
Buros
NCCA
ANSI
Chad Buckendahl, Ph.D.
Employment testing
Standards for Educational and Psychological
Testing (new version anticipated fall 2013)
Cognitive, non-cognitive/personality, or a
combination
Cognitive: job-related knowledge, skills, abilities
Non-cognitive/Personality: traits that predict
success in, or fit with, the position and/or the
organization
Core similarities
Validity
Intended use (e.g., screen in, screen out)
Current competency or predictor of future success
Reliability
Often internal consistency of the instrument
Decision consistency with external criterion
Fairness
Substantive: job-related characteristics, minimize construct-irrelevant variance (CIV)
Procedural: processes, conditions, opportunity
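The slide mentions internal consistency without naming a statistic. One widely used index is Cronbach's alpha, and the sketch below assumes that choice purely for illustration. The function name and the small score matrix are hypothetical, not data from any testing program.

```python
from statistics import pvariance
from typing import List, Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a matrix of scores.

    scores: one row per examinee, one column per item.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose rows (examinees) into columns (items).
    items: List[tuple] = list(zip(*scores))
    item_variance_sum = sum(pvariance(col) for col in items)
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have no variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical data: 4 examinees, 3 dichotomously scored items.
example_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(example_scores)
```

Population variance is used for both the items and the totals; using sample variance for both gives the same ratio, so the choice does not change alpha.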
Highlighted differences
Intended reference group
Internal candidate pool for development and validation
Sample size, restriction of range
Criterion variable
Validity influences confidence in test construction
Objective (e.g., sales performance) versus more
subjective (e.g., supervisor performance ratings)
Adverse impact
4/5ths rule: adverse impact is presumed if a protected
group’s selection rate is less than 80% of the rate for
the group with the highest selection rate
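The four-fifths comparison is simple arithmetic on selection rates, and can be sketched as follows. The function names, group labels, and applicant counts are hypothetical and chosen only to illustrate the calculation; a ratio below the threshold is a screening signal, not a legal conclusion.

```python
from typing import Dict, List, Tuple

# Maps group label -> (number selected, number of applicants).
Counts = Dict[str, Tuple[int, int]]


def selection_rates(counts: Counts) -> Dict[str, float]:
    """Selection rate per group: selected / applicants."""
    rates = {}
    for group, (selected, applicants) in counts.items():
        if applicants <= 0:
            raise ValueError(f"group {group!r} has no applicants")
        rates[group] = selected / applicants
    return rates


def impact_ratios(counts: Counts) -> Dict[str, float]:
    """Each group's rate divided by the highest group's rate."""
    rates = selection_rates(counts)
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has any selections")
    return {group: rate / highest for group, rate in rates.items()}


def flagged_groups(counts: Counts, threshold: float = 0.8) -> List[str]:
    """Groups whose impact ratio falls below the four-fifths threshold."""
    ratios = impact_ratios(counts)
    return sorted(g for g, ratio in ratios.items() if ratio < threshold)


# Hypothetical applicant pool.
example_counts = {
    "group_a": (48, 80),  # selection rate 0.60
    "group_b": (12, 40),  # selection rate 0.30
}
ratios = impact_ratios(example_counts)
flagged = flagged_groups(example_counts)
```

In this made-up pool, group_b is selected at half the rate of group_a, so its ratio falls below 0.8 and it is flagged. With small samples the ratio is unstable, which is one reason the slide lists sample size as a separate concern.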
Additional resources
Society for Industrial and Organizational
Psychology (SIOP) Principles for the Validation
and Use of Personnel Selection Procedures
Uniform Guidelines on Employee Selection Procedures
Equal Employment Opportunity Commission
(EEOC)
Caselaw
Wrap Up
Timothy Muckle, Ph.D.
AERA/NCME/APA
BUROS
K12
CCSSO/ATP
US DOE
Licensure/Cert
ICE 1100; 17024
NCCA
ANSI
Employment
EEOC
Uniform Guidelines
Non-cognitive External criterion
Future success
Recertification Protection of Public
“System of Assessments” Legal compliance
Group Results AYP / Funding
Visual Summary of Testing Standards
Wrap Up
http://buros.org/standards-codes-guidelines
Questions?