Coupling Peer Review Feedback With the CCSSO
Criteria to Develop High-Quality Assessment
Systems in Literacy and Mathematics
Laura Hansen, Senior Literacy Assessment Specialist
Shelbi Cole, Senior Mathematics Specialist
The US DOE’s peer review of state assessment systems is
designed to provide feedback to states to support the
development and administration of high-quality assessments.
States submitted evidence as part of the peer review process and
received Status Letters back from the DOE. In this session:
• Elliott Asp from Achieve will present an overview of some of
the trends that emerged from the initial round of feedback.
• Laura Hansen and Shelbi Cole from Student Achievement
Partners will explain how the CCSSO Criteria for Procuring and
Evaluating High-Quality Assessments can be used to support
states in providing evidence for how their assessment
program is tailored to their academic content standards.
• Melissa Fincher from Georgia will describe how Georgia is
using assessment reviews to evaluate different aspects of its
assessment system and plan for changes.
Trends in Peer Review Feedback
Elliott Asp
Senior Fellow, Policy and Practice
Achieve
NCSA, Austin 2017
Peer Review for State Assessment Systems
“The Every Student Succeeds Act (ESSA) maintains the essential requirements from NCLB that each State annually administer high-quality assessments in at least reading/language arts, mathematics, and science that meet nationally recognized professional and technical standards. Therefore, as you know, the Department reinstituted peer review of State assessment systems so that each State receives feedback from external experts on the assessments it is currently administering.”
Decision Letters
• Peer review letters are posted on the USDE website: Decision Letters on each State's Final Assessment System
• “Current” Decision Letters for 35 States and the District of Columbia were posted as of June 9, 2017.
Critical Elements for Peer Review
1. Standards and Assessments
2. Assessment System Operation
3. Technical Quality – Validity
4. Technical Quality – Other
5. Inclusion of All Students
6. Achievement – Standards and Reporting
Peer Review Ratings
• Meets – Meets all of the requirements of statute and regulations.
• Substantially Meets – Meets most of the requirements, but some additional information is needed.
• Partially Meets – Does not meet a number of requirements; substantial additional information is needed.
• Does Not Meet – Does not meet most of the requirements and must be revised.
Substantially meets requirements (31% of states for one or more elements)
“…these components meet most of the requirements of the statute and regulations but some additional information is required. The Department expects that (the state) should be able to provide this additional information within one year.”
Partially meets requirements (69% of states for one or more elements)
“…the component does not meet a number of the requirements of the statute and regulations and the state will need to provide substantial additional information to demonstrate it meets the requirements. The Department expects that (the state) may not be able to submit all of the required information within one year.”
[Figure: Percent of States that Substantially or Partially Met (i.e., failed to meet) Peer Review Requirements by Component. Components shown: Stds and A's, A-System, Validity, Tech Q, Inclusion, Ach Stds-Report; axis from 0 to 70 percent.]
[Figure: Percent of States that Substantially or Partially Met Peer Review Requirements by Element. Component groups shown: Stds and A's, A-system, Validity, Tech Q, Inclusion, Ach Stds/Report; axis from 0 to 100 percent.]
Elements Where the Smallest % of States Failed to Meet Peer Review Requirements
1.1 Content standards for all students = .05%
1.2 Coherent and rigorous content standards = 1%
1.3 Required assessments = 19%
6.1 Achievement standards for all students = 30%
2.6 Protecting data integrity and privacy = 36%
2.5 Test security = 38%
6.3 Aligned achievement standards = 38%
Elements where the largest % of States Substantially or Partially Met (did not fully meet) Peer Review Requirements
2.1 Test design and development = 100%
3.1 Overall validity (including content) = 100%
6.4 Reporting = 83%
2.2 Item development = 77%
4.6 Multiple versions of an assessment = 77%
3.2 Validity based on internal structure = 75%
4.1 Reliability = 75%
5.3 Accommodations = 75%
Percent of States That Met the Requirements for 2.1 and 3.1
• 25% (9) partially met requirements
• 33% (12) substantially met requirements
• 42% (15) had mixed ratings
• 3% (1) did not meet requirements
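The percentages above can be reproduced from the counts with simple arithmetic. The sketch below assumes the denominator is the 36 decision letters (35 States plus the District of Columbia) mentioned earlier; the slide itself does not state the denominator.

```python
# Counts per rating as reported on the slide.
# ASSUMPTION: the denominator is the 36 posted decision letters.
LETTERS = 36
counts = {
    "partially met": 9,
    "substantially met": 12,
    "mixed ratings": 15,
    "did not meet": 1,
}

def percent(count, total=LETTERS):
    """Return count as a whole-number percent of total."""
    return round(100 * count / total)

percents = {rating: percent(n) for rating, n in counts.items()}
for rating, pct in percents.items():
    print(f"{rating}: {pct}%")
```

Under that assumption, one state out of 36 is roughly 3 percent, not a fraction of a percent.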
Example of Additional Evidence Required for 2.1
For the reading/language arts (R/LA) and mathematics general assessments in high school (ACT), (the state) must provide:
• Evidence that the test design measures the full range of the State’s grade level academic content standards (e.g., evidence of alignment of the test design blueprint to academic content standards). This evidence should include information about the State’s plan to assess the full breadth of the State’s R/LA standards during the period of the waiver.
Example of Additional Evidence Required for 3.1
• Evidence of an independent alignment study evaluating the test items to the State content standards for all assessments (State Test, Alt, and ACT).
Expectations of Peer Review
“Assessments aligned to the full range of a State’s academic content standards.
A State’s assessment system under ESEA Title I must assess the depth and
breadth of the State’s grade-level academic content standards — i.e., be aligned
to the full range of those standards. Assessing the full range of a State’s
academic content standards means that each State assessment covers the
domains or major components within a content area. For example, if a State’s
academic content standards for reading/language arts identify the domains of
reading, language arts, writing, and speaking and listening, assessing the full
range of reading/language arts standards means that the assessment is aligned
to all four of these domains. Assessing the full range of a State’s standards also
means that specific content in a State’s academic content standards is not
systematically excluded from a State’s assessment system. Assessing the full
range of standards, however, does not mean that each State assessment must
annually cover all discrete knowledge and skills represented within a State’s
academic content standards; rather, assessing the full range of a State’s
academic standards means that a State’s assessment system covers all of the
knowledge and skills over a period of time. Both Critical Element 2.1 – Test
Design and Development and Critical Element 3.1 – Overall Validity, including
Validity Based on Content examine whether a State’s assessment system is
aligned to the full range of the State’s academic content standards.” (pg. 14)
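The two conditions in this guidance (every domain covered by each assessment, and no content systematically excluded over time) can be illustrated with a small check. This is a minimal sketch: the standard codes, years, and function names are hypothetical and are not taken from any state's blueprint.

```python
from typing import Dict, List, Set

# Hypothetical standards, keyed by the four domains named in the guidance.
STANDARDS: Dict[str, Set[str]] = {
    "reading": {"R.1", "R.2"},
    "language arts": {"L.1"},
    "writing": {"W.1", "W.2"},
    "speaking and listening": {"SL.1"},
}

# Hypothetical record of which standards each year's assessment measured.
ASSESSED_BY_YEAR: Dict[int, Set[str]] = {
    2016: {"R.1", "L.1", "W.1", "SL.1"},
    2017: {"R.2", "L.1", "W.1", "SL.1"},
}

def domains_missing_by_year(standards, assessed_by_year) -> Dict[int, List[str]]:
    """For each year, list domains with no assessed standard."""
    return {
        year: sorted(d for d, codes in standards.items() if not codes & assessed)
        for year, assessed in assessed_by_year.items()
    }

def standards_never_assessed(standards, assessed_by_year) -> List[str]:
    """List standards that no year's assessment measured."""
    all_codes = set().union(*standards.values())
    covered = set().union(*assessed_by_year.values())
    return sorted(all_codes - covered)

missing_domains = domains_missing_by_year(STANDARDS, ASSESSED_BY_YEAR)
never_assessed = standards_never_assessed(STANDARDS, ASSESSED_BY_YEAR)
print(missing_domains)
print(never_assessed)
```

In this made-up data every domain is covered each year, but one writing standard is never assessed, which is the kind of systematic exclusion the guidance describes.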
Critical Element 2.1
• The State’s test design and test development process is well-
suited for the content, is technically sound, aligns the
assessments to the full range of the State’s academic content
standards, and includes:
– Statement(s) of the purposes of the assessments and the intended
interpretations and uses of results;
– Test blueprints that describe the structure of each assessment in
sufficient detail to support the development of assessments that are
technically sound, measure the full range of the State’s grade-level
academic content standards, and support the intended
interpretations and uses of the results;
– Processes to ensure that each assessment is tailored to the
knowledge and skills included in the State’s academic content
standards, reflects appropriate inclusion of challenging content, and
requires complex demonstrations or applications of knowledge and
skills (i.e., higher-order thinking skills);
– If the State administers computer-adaptive assessments, the item
pool and item selection procedures adequately support the test
design.
Critical Element 3.1
• The State has documented adequate overall validity evidence
for its assessments, and the State’s validity evidence includes
evidence that the State’s assessments measure the knowledge
and skills specified in the State’s academic content standards,
including:
– Documentation of adequate alignment between the State’s
assessments and the academic content standards the assessments
are designed to measure in terms of content (i.e., knowledge and
process), the full range of the State’s academic content standards,
balance of content, and cognitive complexity;
– If the State administers alternate assessments based on alternate
academic achievement standards, the assessments show adequate
linkage to the State’s academic content standards in terms of content
match (i.e., no unrelated content) and the breadth of content and
cognitive complexity determined in test design to be appropriate for
students with the most significant cognitive disabilities.
Sections of the CCSSO Document
A. Meet Overall Assessment Goals and Ensure
Technical Quality
B. Align to Standards – English Language
Arts/Literacy
C. Align to Standards – Mathematics
D. Yield Valuable Reports on Student Progress and
Performance
E. Adhere to Best Practices in Test Administration
F. State Specific Criteria (as desired)
Format of the CCSSO Quality Criteria
Document
B.5 Assessing writing: Assessments emphasize writing tasks that require students to engage in close reading and analysis of texts so that students can demonstrate college- and career-ready abilities.
Test blueprints and other specifications as well as exemplar test
items for each grade level are provided, demonstrating the
expectations below are met.
Writing tasks reflect the types of writing that will prepare students
for the work required in college and the workplace, balancing
expository, persuasive/argument, and narrative writing, as state
standards require. At higher grade levels, the balance shifts towards
more exposition and argument.
For example, for common core aligned assessments, goals include:
o Taking all forms of the test together, writing tasks are
approximately one-third each exposition, argument, and
narrative (some tasks may represent blended structures), with
the balance shifting towards more exposition and argument
at the higher grade levels.
Tasks (including narrative tasks) require students to confront text or
other stimuli directly, to draw on textual evidence, and to support
valid inferences from text or stimuli.
CCSSO Section B
Criterion B.1: Assessing student reading and writing
achievement in both ELA and literacy
Criterion B.2: Focusing on complexity of texts
Criterion B.3: Requiring students to read closely and use
evidence from texts
Criterion B.4: Requiring a range of cognitive demand
Criterion B.5: Assessing writing
Criterion B.6: Emphasizing vocabulary and language skills
Criterion B.7: Assessing research and inquiry
Criterion B.8: Assessing speaking and listening
Criterion B.9: Ensuring high-quality items and a variety of
item types
Assessing Reading
Criterion Evidence
Requiring students to read closely and use evidence from texts: Reading assessments consist of test questions or tasks, as appropriate, that demand that students read carefully and deeply and use specific evidence from increasingly complex texts to obtain and defend correct responses.
• Test blueprints and other specifications as well as exemplar test items are provided for each grade level, demonstrating the expectations below are met.
• All reading questions are text-dependent and
– Arise from and require close reading and analysis of text;
– Focus on the central ideas and important particulars of the text, rather than on superficial or peripheral concepts; and
– Assess the depth and specific requirements delineated in the standards at each grade level (i.e., the concepts, topics, and texts specifically named in the grade-level standards).
• Many reading questions require students to directly provide textual evidence in support of their responses.
• For example, for common core aligned assessments, goals include
– A majority of reading score points is devoted to questions that ask students to directly provide textual evidence in support of their responses (e.g., constructed-response and/or two-part evidence-based selected-response item formats).
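The goal that a majority of reading score points go to evidence-based formats is a simple proportion. The sketch below shows one way to compute it; the item list and point values are hypothetical, and only the format names come from the criterion.

```python
# Hypothetical reading items on one form: (item format, score points).
items = [
    ("constructed-response", 4),
    ("two-part evidence-based selected-response", 2),
    ("two-part evidence-based selected-response", 2),
    ("single-part selected-response", 1),
    ("single-part selected-response", 1),
]

# Formats the criterion names as asking students to provide textual evidence.
EVIDENCE_FORMATS = {
    "constructed-response",
    "two-part evidence-based selected-response",
}

def evidence_point_share(items, evidence_formats=EVIDENCE_FORMATS):
    """Fraction of total score points carried by evidence-based formats."""
    total = sum(points for _, points in items)
    evidence = sum(points for fmt, points in items if fmt in evidence_formats)
    return evidence / total

share = evidence_point_share(items)
meets_goal = share > 0.5  # "a majority" read as more than half
print(f"{share:.0%} of reading score points require textual evidence")
```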
Assessing Reading
Grade 3 Reading Item
What are two details from Because of Winn-Dixie that show that Miss Franny is becoming friends with Winn-Dixie?
A. “he was okay, as long as he could see me”
B. “She thought he was a bear.”
C. “he came in and lay down with a ‘hummmppff.’”
D. “‘He’ll be good,’ I told her.”
E. “That dog is smiling at me.”
F. “‘Certain ones,’ said Miss Franny.”
This item requires students to read closely, as they must look for evidence in the text to support a claim about a character. The excerpt is about the relationship between Miss Franny and Winn-Dixie, so the item focuses on a central idea of the text. Reading for Literature Standard 3.3 requires students to “describe characters in a story,” and the item aligns to the standard because students must describe how the character of Miss Franny develops her relationship with Winn-Dixie. Finally, because the answer options are direct quotations from the text, the item requires direct textual evidence.
Assessing Writing
Criterion Evidence
Assessing writing: Assessments emphasize writing tasks that require students to engage in close reading and analysis of texts so that students can demonstrate college- and career-ready abilities.
• Test blueprints and other specifications as well as exemplar test items for each grade level are provided, demonstrating the expectations below are met.
• Writing tasks reflect the types of writing that will prepare students for the work required in college and the workplace, balancing expository, persuasive/argument, and narrative writing, as state standards require. At higher grade levels, the balance shifts toward more exposition and argument.
• For example, for common core aligned assessments, goals include
– Taking all forms of the test together, writing tasks are approximately one-third each exposition, argument, and narrative (some tasks may represent blended structures), with the balance shifting toward more exposition and argument at the higher grade levels.
• Tasks (including narrative tasks) require students to confront text or other stimuli directly, to draw on textual evidence, and to support valid inferences from text or stimuli.
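The "approximately one-third each" goal can be checked by pooling writing tasks across all forms and comparing each type's share to one-third. This is a minimal sketch: the task list is hypothetical, and the 10-point tolerance is an assumption, since the criteria say only "approximately."

```python
from collections import Counter

# Hypothetical writing tasks pooled across all forms of one grade's test.
tasks = ["exposition", "argument", "narrative",
         "argument", "exposition", "narrative"]

def writing_balance(task_types):
    """Share of writing tasks of each type, as fractions of all tasks."""
    counts = Counter(task_types)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}

def is_roughly_one_third_each(shares, tolerance=0.10):
    """True if each of the three types is within `tolerance` of one-third.

    ASSUMPTION: the tolerance value is illustrative, not from the criteria.
    """
    kinds = ("exposition", "argument", "narrative")
    return all(abs(shares.get(k, 0.0) - 1 / 3) <= tolerance for k in kinds)

shares = writing_balance(tasks)
balanced = is_roughly_one_third_each(shares)
print(shares, balanced)
```

For higher grades, where the balance shifts toward exposition and argument, a state would substitute its own target shares for the fixed one-third.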
Assessing Writing
Grade 5
Write an essay in which you provide an opinion that either Marco Polo told the truth in his book or that Marco Polo made up his stories. Be sure to use information from the texts to support your opinion. Write your essay on the lines below.
The item meets the expectations of Criterion B.5 because students must carefully read the texts to make their argument. They must provide evidence to support their claims, and the evidence must come from the text.
Grade 11
Based on the speech, explain which freedom Thomas Jefferson likely considers most important for the success of the new nation. Then explain the reasons he would place that particular freedom above others mentioned. Be sure to use details and evidence from the speech as you craft your response.
The item meets the expectations of Criterion B.5 because students must carefully read the text to make their argument. They must provide evidence to support their claims, and the evidence must come from the text.
Assessing Speaking and Listening
Criterion Evidence
Assessing speaking and listening: Over time, and as assessment advances allow, the assessments measure the speaking and listening communication skills students need for college and career readiness.
• Over time, and as assessment advances allow, the speaking and listening skills required for college and career readiness are assessed.
For example, for common core aligned assessments, test items assessing speaking
• Assess students’ ability to express well-supported ideas clearly and to probe others’ ideas; and
• Include items that measure students’ ability to marshal evidence from research and orally present findings in a performance task.
For example, for common core aligned assessments, test items assessing listening
• Are based on texts and other stimuli that meet the criteria for complexity, range, and quality outlined in criteria B.1 and B.2 above; and
• Permit the evaluation of active listening skills (e.g., taking notes on main ideas, elaborating on remarks of others).
Assessing Speaking and Listening
Grade 8 Item Aligned to Speaking and Listening Standards
Part A: What does the speaker of the video “This is how Cookie Monster makes your kid smarter” believe to be true?
A. Well-researched television shows can help children learn.
B. Funny characters are an important part of keeping children interested.
C. Parents must watch television shows with their children.
D. Children can learn something from all television programs.
Part B: Which quotation from the video best supports the correct answer to Part A?
A. “When we’re thinking about a new curriculum topic, the producers and writers really hear from experts about what is critical for children to know and understand, and how they learn through media.”
B. “Okay, so we see that he explains what listening with his whole body is, and we have a payoff here, which is the karate belt.”
C. “It is a way to show children that if they don’t pay attention that they aren’t going to learn the instructions to get actually what they want.”
D. “Having that octopus come in allows Cookie Monster to be still.”
The item meets the requirements of Criterion B.8 because the stimulus, in this case a video, is a high-quality source that is appropriate for the grade level. The question requires students to carefully listen to the text and attend to the specific words speakers use to make and support their claims.
CCSSO Section C
Criterion C.1: Focusing strongly on the content most
needed for success in later mathematics
Criterion C.2: Assessing a balance of concepts,
procedures, and applications
Criterion C.3: Connecting practice to content
Criterion C.4: Requiring a range of cognitive demand
Criterion C.5: Ensuring high-quality items and a variety of
item types
Measuring Breadth & Attending to Focus:
These Cannot be Competing Goals
• DO NOT turn the standards into a checklist to meet
the requirement for “breadth.”
• DO have a validity framework that clearly
demonstrates how decisions about which content
to measure in which proportion are grounded in
research supporting the test’s purpose(s).
Focus Strongly Where the Standards Focus
The shape of math in A+ countries
[Figure: two charts comparing the mathematics topics intended at each grade by at least two-thirds of A+ countries with the mathematics topics intended at each grade by at least two-thirds of 21 U.S. states. (1)]
1 Schmidt, Houang, & Cogan, “A Coherent Curriculum: The Case of Mathematics” (2002).
The standards’ strong emphasis on arithmetic in the elementary grades follows longstanding domestic recommendations and conclusions drawn from Trends in International Mathematics and Science Study (TIMSS) and other international studies.
Key Areas of Focus in Mathematics
Grade   Focus Areas
K–2     Addition and subtraction - concepts, skills, and problem solving and place value
3–5     Multiplication and division of whole numbers and fractions – concepts, skills, and problem solving
6       Ratios and proportional relationships; early expressions and equations
7       Ratios and proportional relationships; arithmetic of rational numbers
8       Linear algebra and linear functions
Keep Validity at the Forefront: Why not more
statistics?
Statistics has its own tools and ways of thinking, and statisticians are quite insistent that those of us who teach mathematics realize that statistics is not mathematics, nor is it even a branch of mathematics. In fact, statistics is a separate discipline with its own unique ways of thinking and its own tools for approaching problems.
- J. Michael Shaughnessy, “Research on Students’ Understanding of Some Big Concepts in Statistics” (2006)
Keep Validity at the Forefront: Why not more
geometry?
• In a study of postsecondary relevance of the
CCSSM, “all respondents rated the Geometry
category relatively lower. This finding suggests that
the Geometry category may be a candidate for
further review in order to increase its applicability
and importance by eliminating or consolidating
some standards.”
For full report, see https://www.epiconline.org/reaching-the-goal-full-report/
Construct a blueprint that…
• Clearly communicates the balance of content
across different domains/clusters/standards
• Is accompanied by public documentation (e.g.,
content specifications) describing the rationale for
the content balance, grounded in the test’s purpose
and relevant research
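One way to keep the content balance and its rationale together is to store them in the same blueprint record and validate both. The sketch below is illustrative only: the domain weights, rationale text, and the `BlueprintRow` and `blueprint_problems` names are hypothetical, not taken from any state's blueprint.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BlueprintRow:
    domain: str             # domain or cluster name
    percent_of_points: int  # share of total score points
    rationale: str          # why this weight, tied to purpose and research

# Hypothetical grade 3 mathematics blueprint; weights are illustrative only.
blueprint = [
    BlueprintRow("Operations and Algebraic Thinking", 40,
                 "Multiplication and division are a key focus area in grades 3-5."),
    BlueprintRow("Number and Operations: Fractions", 30,
                 "Fractions are a key focus area in grades 3-5."),
    BlueprintRow("Measurement and Data", 20,
                 "Placeholder rationale for supporting content."),
    BlueprintRow("Geometry", 10, ""),  # rationale left blank on purpose
]

def blueprint_problems(rows: List[BlueprintRow]) -> List[str]:
    """Return a list of problems: weights not summing to 100, missing rationales."""
    problems = []
    total = sum(r.percent_of_points for r in rows)
    if total != 100:
        problems.append(f"weights sum to {total}%, not 100%")
    for r in rows:
        if not r.rationale.strip():
            problems.append(f"{r.domain}: missing rationale")
    return problems

problems = blueprint_problems(blueprint)
print(problems)
```

A check like this only confirms that a rationale exists; whether the rationale is grounded in the test's purpose and in research still requires expert review.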
Items are designed to address the aspect(s) of rigor (conceptual
understanding, procedural skill, and application) evident in the
language of the content standards.
3.NF.A.2 Understand a fraction as a number on the number line;
represent fractions on a number line diagram.
Items are designed to address the aspect(s) of rigor
(conceptual understanding, procedural skill, and
application) evident in the language of the content
standards.
3.OA.A.4 Determine the unknown whole number in a
multiplication or division equation relating three whole numbers.
3.OA.A.1 Interpret products of whole numbers, e.g., interpret 5 × 7 as the
total number of objects in 5 groups of 7 objects each.
MP2. Reason abstractly and quantitatively.
The demands of items measuring the Standards for
Mathematical Practice are appropriate to the targeted
grade level.
Create item specifications that…
• Provide clear examples of items to developers and
other stakeholders that clearly communicate how
item type(s) are selected to ensure that items:
– Measure the aspect(s) of rigor called for by CCR standards
– Integrate practice or process standards with content
standards
– Elicit evidence of cognitive processes consistent with the
expectations in the standards
In Summary . . . How Can States Use the
Criteria?
• As guidance for providing additional evidence around
meeting the expectations for 2.1 and 3.1
• As the basis for an independent alignment study
• As a tool to better understand the strengths and
weaknesses of the program
Additional Resources
• U.S. Department of Education Peer Review of State
Assessment Systems Non-Regulatory Guidance for
States
• CCSSO Criteria for Procuring and Evaluating High-
Quality Assessments
• SAP’s Item Alignment Modules for ELA/Literacy and
Mathematics
Richard Woods, Georgia’s School Superintendent
“Educating Georgia’s Future”
gadoe.org
Alignment Considerations for an SEA
Melissa Fincher, Ph.D.
Deputy Superintendent for Assessment & Accountability
Peer Review
• A necessary endeavor…helps to ensure states attend to technical quality throughout the entire assessment process
‒ from conceptualization and design to scores and usage
• Implicitly requires external evaluation of alignment by an independent third party
• Implicitly requires that states demonstrate, through documentation, their recursive efforts to ensure the alignment of their assessment systems
Peer Review
• Requirements of Peer Review are related to the Standards for Educational and Psychological Testing, promulgated by APA, AERA, and NCME.
• For the first time, alignment is now an explicit consideration within the Standards.
• The Standards define alignment as: the degree to which the content and cognitive demands of test questions match targeted content and cognitive demands described in the test specifications (p. 216).
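The definition above treats alignment as a matter of degree on two dimensions, content and cognitive demand. A minimal sketch of how match rates might be tallied follows; the item identifiers, reviewer judgments, and numeric cognitive demand levels are hypothetical, and real alignment methodologies use richer evidence than a simple match rate.

```python
# Hypothetical test specification: item id -> (targeted standard, cognitive demand level).
spec = {
    "item1": ("3.OA.A.1", 2),
    "item2": ("3.NF.A.2", 2),
    "item3": ("3.OA.A.4", 1),
    "item4": ("3.OA.A.1", 2),
}

# Hypothetical reviewer judgments of what each item actually measures.
judged = {
    "item1": ("3.OA.A.1", 2),  # matches on both dimensions
    "item2": ("3.NF.A.2", 1),  # content matches, demand is lower than targeted
    "item3": ("3.OA.A.4", 1),  # matches on both dimensions
    "item4": ("3.OA.A.4", 2),  # demand matches, content does not
}

def alignment_rates(spec, judged):
    """Fraction of items matching the specification on content, demand, and both."""
    n = len(spec)
    content = sum(judged[i][0] == spec[i][0] for i in spec) / n
    demand = sum(judged[i][1] == spec[i][1] for i in spec) / n
    both = sum(judged[i] == spec[i] for i in spec) / n
    return {"content": content, "cognitive_demand": demand, "both": both}

rates = alignment_rates(spec, judged)
print(rates)
```

Reporting the dimensions separately reflects the point that alignment is a degree, not a yes-or-no property.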
Why is alignment important?
One word: VALIDITY
• According to the Standards, “test design and development procedures must support the validity of the interpretations of test scores for their intended uses”
• For criterion-referenced tests, states seek to make claims about student mastery of state content standards…
‒ and such claims about mastery (via achievement standards) serve as the foundation for scale meaning (i.e., interpretation)
Why is alignment important?
• “…issues bearing on validity, reliability, and fairness are interwoven within the stages of test development” (pg 75).
• “Test design and development procedures must support the validity of the interpretations of test scores for their intended uses….current educational assessments often are used to indicate students’ proficiency with regard to standards for the knowledge and skill a student should exhibit; thus, the relationship between the test content and the established content standards is key” (pg 75).
Alignment is at the heart of these Standards
• Standard 4.12: Test developers should document the extent to which the content domain of a test represents the domain defined in the test specifications.
• Standard 12.4: When a test is used as an indicator of achievement in an instructional domain or with respect to specified content standards, evidence of the extent the test samples the range of knowledge and elicits the processes reflected in the target domain should be provided. Both the tested and the target domains should be described in sufficient detail for their relationship to be evaluated. The analyses should make explicit those aspects of the target domain that the test represents, as well as those aspects that the test fails to represent.
Alignment
• As states build their assessment systems, consideration of alignment must be front and center
‒ understanding the different approaches within the different alignment methodologies is important – just as no two tests are the same, no two methodologies are the same;
‒ alignment is complicated, multifaceted, and not dichotomous;
‒ alignment, like beauty, is often in the eye of the beholder;
‒ the methodology employed to evaluate alignment should match the design and purpose of the assessment.
Continuous Improvement
• Alignment considerations must move away from post-hoc summative evaluations of single test forms
‒ it is important to examine if the system is designed and implemented to result in aligned forms – both process and outcomes (Forte, 2016)
• Alignment studies can provide solid formative feedback that can be used to improve assessment systems
‒ impartial perspective that is invaluable
‒ can be used to address deficits with contractor
‒ different methodologies can provide different (and meaningful) perspectives