The Challenge of Assessing Reading for Understanding to Inform Instruction
Barbara Foorman, Ph.D.
Florida Center for Reading Research
Florida State University
Problem
• What is reading comprehension and how is it measured?
• Dissatisfaction with traditional “mean proficiency” approaches that judge students’ achievement relative to a benchmark has led to an interest in basing accountability on individual academic growth.
• Yet the challenge of using assessment to inform instruction remains.
Simple View of Reading
Decoding (recognizing words in text and sounding them out phonemically) multiplied by Language Comprehension (the ability to understand language) equals Reading Comprehension (the ability to read and obtain meaning from what was read).
Gough and Tunmer (1986); Scarborough (2002)
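In equation form, the Simple View expresses reading comprehension as the product of the two components, so a near-zero value on either one yields near-zero comprehension:

\[ \text{Reading Comprehension} = \text{Decoding} \times \text{Language Comprehension} \]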
Constructs/Measures by Grade
Construct: Measure (Grade range)
• Phonological Awareness: TOPEL; CTOPP (preK-G1)
• Letter Names/Letter Sounds (LN/LS): TOPEL; FAIR (preK-K)
• Word Reading: TOWRE; FAIR; SARA (G1-G9)
• Vocabulary (+ morphology): TOPEL; PPVT; FAIR; SARA (preK-G9)
• Syntax: CELF (preK-G9)
• Listening Comprehension: CELF; FAIR (preK-G6)
• Reading Comprehension, Efficiency: TOSREC; FAIR; SARA (G1-G9)
• Reading Comprehension, Text: GMRT; FAIR; SARA (G1-G9)
A heuristic for thinking about reading comprehension (Sweet & Snow, 2003): comprehension arises from the interaction of the READER, the TEXT, and the ACTIVITY within a broader context.
• Reader: word recognition, vocabulary, background knowledge, strategy use, inference-making abilities, motivation
• Text: text structure, vocabulary, genre/discourse, motivating features, print style and font
• Activity: purpose, social relations, school/classroom/peers/families
• Context: environment, cultural norms
Components of Reading Comprehension (Perfetti, 1999)
[Framework diagram: visual input is processed through orthographic units, phonological units, and orthography-to-phonology mapping to support word identification; word representations draw on the lexicon (meaning, morphology, syntax); the linguistic system (phonology, syntax, morphology), the parser, meaning and form selection, and general knowledge feed the comprehension processes (inferences, text representation, situation model).]
Goal of RFU Assessment Grant
ETS and FSU/FCRR are designing a new assessment system consisting of:
1. Purpose-driven, scenario-based, summative assessment (textbase + situation model; motivation; prior knowledge; text complexity).
2. Component skill measures to predict achievement trajectories and provide additional information about non-proficient readers.
2009 NAEP Framework
Literary Text
● Fiction
● Literary Nonfiction
● Poetry
Informational Text
● Exposition
● Argumentation and Persuasive Text
● Procedural Text and Documents
Cognitive Targets Distinguished by Text Type
Locate/Recall Integrate/Interpret Critique/Evaluate
Achievement Levels for Grade 8 NAEP Reading

Advanced
• Literary: G8 students at the Advanced level should be able to: make complex inferences; critique point of view; evaluate character motivation; describe thematic connections across literary texts; evaluate how an author uses literary devices to convey meaning.
• Informational: G8 students at the Advanced level should be able to: make complex inferences; evaluate the author's purpose; evaluate strength and quality of supporting evidence; compare and contrast ideas across texts; critique causal relations.

Proficient
• Literary: G8 students at the Proficient level should be able to: make inferences that describe problem and solution, cause and effect; analyze character motivation; interpret mood or tone; explain theme; identify similarities across texts; analyze how an author uses literary devices to convey meaning; interpret figurative language.
• Informational: G8 students at the Proficient level should be able to: summarize major ideas; draw conclusions; provide evidence in support of an argument; describe author's purpose; analyze and interpret implicit causal relations.

Basic
• Literary: G8 students at the Basic level should be able to: interpret textually explicit information; make simple inferences; identify supporting details; describe character's motivation; describe the problem; identify mood.
• Informational: G8 students at the Basic level should be able to: locate the main idea; distinguish between fact and opinion; make inferences; identify author's explicitly stated purpose; recognize explicit causal relations.
Current Approaches to Measuring Growth
• TN uses EVAAS (Sanders, 2000): deviation from mean level of growth.
• Ohio uses achievement plus growth from previous and current year: above, met, and below expected growth.
• Colorado uses the Student Growth Percentile Model based on conditional percentile ranks and quantile regression (Betebenner, 2008).
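The Colorado model can be made concrete with a small sketch: a student growth percentile (SGP) is the conditional percentile rank of a student's current score given the prior score, located by fitting quantile regressions at many percentiles. Below is a minimal Python sketch using simulated scores and the statsmodels library; the data, score scale, and single-prior specification are hypothetical simplifications, not Betebenner's operational model.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# simulated prior- and current-year scale scores for 2,000 hypothetical students
rng = np.random.default_rng(0)
df = pd.DataFrame({"prior": rng.normal(500, 50, 2000)})
df["current"] = 0.8 * df["prior"] + rng.normal(100, 30, 2000)

# fit a quantile regression of current score on prior score at percentiles 1-99
taus = np.arange(0.01, 1.00, 0.01)
fits = {t: smf.quantreg("current ~ prior", df).fit(q=t) for t in taus}

def growth_percentile(prior, current):
    # the SGP is the highest percentile whose conditional prediction,
    # given the prior score, is still at or below the observed current score
    new = pd.DataFrame({"prior": [prior]})
    below = [t for t in taus if fits[t].predict(new)[0] <= current]
    return int(round(100 * max(below))) if below else 1

print(growth_percentile(prior=500, current=520))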
Interim Assessments
• Why not measure growth within a year?
• Interim assessments are mid-way between formative and summative and can be aggregated up above the classroom level.
• Interim assessments are ideal for informing learning, instruction, and placement.
Florida Assessments for Instruction in Reading
• A K-2 assessment system administered to individual students 3 times a year, with electronic scoring, Adobe AIR version, and PMRN reports linked to instructional resources.
• A 3-12 computer-based system where students take the assessments 3 times a year. Several tasks are adaptive. PMRN reports are available, linked to instructional resources. Printed toolkit available.
The K-2 “Big Picture” Map

Broad Screen/Progress Monitoring Tool (BS/PMT): “All” students
• Letter Naming & Sounds
• Phonemic Awareness
• Word Reading

Broad Diagnostic Inventory (BDI): “All” students; “Some” students for vocabulary
• Listening Comprehension
• Reading Comprehension
• Vocabulary
• Spelling (2nd grade only)

Targeted Diagnostic Inventory (TDI): “Some” students; some tasks
• K = 9 tasks
• 1st = 8 tasks
• 2nd = 6 tasks

Ongoing Progress Monitoring (OPM): “Some” students
• K-2 = TDI tasks
• 1-2 = ORF
K-2 Targeted Diagnostic Inventory (TDI)
Kindergarten
• Print Awareness
• Letter Name and Sound Knowledge
• Phoneme Blending
• Phoneme Deletion Word Parts/Initial
• Letter Sound Connection Initial
• Letter Sound Connection Final
• Word Building – Initial Consonants
• Word Building – Final Consonants
• Word Building – Medial Vowels

First Grade
• Letter Sound Knowledge
• Phoneme Blending
• Phoneme Deletion Initial
• Phoneme Deletion Final
• Word Building – Consonants
• Word Building – Vowels
• Word Building – CVC/CVCe
• Word Building – Blends

Second Grade
• Phoneme Deletion Initial
• Phoneme Deletion Final
• Word Building – Consonants
• Word Building – CVC/CVCe
• Word Building – Blends & Vowels
• Multisyllabic Word Reading
The K-2 “Score” Map
• BS/PMT: PRS = Probability of Reading Success
• BDI:
  LC = Listening Comprehension: total questions correct (implicit/explicit)
  RC = Reading Comprehension: total questions correct (implicit/explicit), fluency, percent accuracy on target passage
  VOC = Vocabulary: percentile rank
  SPL = Spelling: percentile rank
• TDI: ME = Meets Expectations; BE = Below Expectations
• OPM: ORF = adjusted fluency; TDI tasks = ME or BE and raw score
Grades 3-12 Assessments Model
• Broad Screen/Progress Monitoring Tool: Reading Comprehension task (3 times a year)
• If necessary: Targeted Diagnostic Inventory: Maze & Word Analysis tasks
• Diagnostic Toolkit (as needed)
• Ongoing Progress Monitoring (as needed)
Purpose of Each 3-12 Assessment
• RC Screen: helps identify students who may not be able to meet the grade-level literacy standards at the end of the year (as assessed by the FCAT) without additional targeted literacy instruction.
• Mazes: helps determine whether a student has more fundamental problems with text reading efficiency and low-level reading comprehension; relevant for students below a 6th grade reading level.
• Word Analysis: helps us learn more about a student's fundamental literacy skills, particularly those required to decode unfamiliar words and read and write accurately.
Task Placement Rules
How is the student placed into the first passage/item?

Reading Comprehension (adaptive)
• For AP 1, the first passage the student receives is determined by grade level and prior-year FCAT (if available).
• If no FCAT score is available, students are placed into a specific grade-level passage.
• All 3rd grade students are placed into the same initial passage.
• For AP 2 and 3, the first passage is based on the student's final ability score from the prior Assessment Period (AP).

Maze (not adaptive)
• Two predetermined passages based on grade level and assessment period (AP).

Word Analysis (adaptive)
• AP 1-3 start with a predetermined set of 5 words based on grade level; performance on this first set determines the next words the student receives.
• 5-30 words are given at each assessment period, based on ability.
How is the student placed into subsequent passages?
• Based on the difficulty of the questions the student answers correctly on the first passage, the student is then given a harder or easier passage. Item difficulty is determined using Item Response Theory (IRT), and ability estimates are based on theta and its standard error (SE).
• Because of this, a raw score of 7/9 for Student A and 7/9 for Student B on the same passage does not mean they will have the same converted scores (see the sketch below).
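A minimal sketch of why the same 7/9 raw score can convert to different ability scores under IRT. It uses a hypothetical two-parameter logistic (2PL) item bank and a grid-search maximum-likelihood estimate of theta; the item parameters and response patterns are invented for illustration and are not FAIR's operational calibration.

import numpy as np

def p_correct(theta, a, b):
    # 2PL item response function: probability of a correct response
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def theta_mle(responses, a, b, grid=np.linspace(-4, 4, 801)):
    # maximum-likelihood ability estimate found by a simple grid search
    loglik = [np.sum(responses * np.log(p_correct(t, a, b)) +
                     (1 - responses) * np.log(1.0 - p_correct(t, a, b)))
              for t in grid]
    return grid[int(np.argmax(loglik))]

# hypothetical 9-item passage: harder items (larger b) are also more discriminating (larger a)
a = np.array([0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5])
b = np.array([-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0])

# both students answer 7 of 9 items correctly, but miss different items
student_a = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0])   # misses the two hardest items
student_b = np.array([0, 0, 1, 1, 1, 1, 1, 1, 1])   # misses the two easiest items

# identical raw scores, different theta estimates, because the model weights
# which items were answered correctly, not just how many
print(theta_mle(student_a, a, b))   # lower ability estimate
print(theta_mle(student_b, a, b))   # higher ability estimate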
Florida Assessments for Instruction in Reading (FAIR): 3-12 Measures
• Broad Screen/Progress Monitoring Tool (BS/PMT), appropriate for “All” students: Reading Comprehension (RC)
• Targeted Diagnostic Inventory (TDI), “Some” students: Maze; Word Analysis (WA)
• Ongoing Progress Monitoring (OPM), “Some” students: Maze; ORF; RC
• Informal Diagnostic Toolkit (Toolkit), “Some” students: Phonics Inventory; Academic Word Inventory; Lexiled Passages; Scaffolded Discussion Templates
FAIR 3-12 Score Types

Reading Comprehension (BS/PMT):
• FCAT Success Probability (FSP), color-coded
• Percentile
• Standard Score
• Lexile® Ability Score and Ability Range
• FCAT Cluster Area Scores

Maze (TDI):
• Percentile
• Standard Score
• Adjusted Maze Score

Word Analysis (TDI):
• Percentile
• Standard Score
• Ability Score (WAAS)

OPM:
• RC: Ability Score, Ability Range, Cluster Scores
• Maze: Adjusted Maze Score
• ORF (3rd-5th): Adjusted Fluency Score
Research Questions
• How do FAIR RC scores correlate with FCAT?
• What is the value added of FAIR RC beyond prior FCAT?
• Does FAIR RC significantly reduce identification errors above and beyond prior FCAT?
• What is the value added of growth scores vs. simple difference scores?
• What is the value added of growth combined with prior FCAT?
FL Comprehensive Assessment Test (FCAT): Grades 3-10
• FCAT Reading is a group-administered, criterion-referenced test consisting of 6-8 informational & literary passages, with 6-11 multiple choice items per passage.
• 4 content clusters in 2009-10: 1) words & phrases in context; 2) main idea; 3) comparison/cause & effect; 4) reference & research.
• Reliability up to .9; content & concurrent validity (FLDOE, 2001; Schatschneider et al., 2004)
Participants: 951,893 students in FL Public Schools in the PMRN
Grade / Number of Students
Grade 3 156,265
Grade 4 130,119
Grade 5 129,856
Grade 6 107,737
Grade 7 108,394
Grade 8 105,071
Grade 9 112,686
Grade 10 101,765
Analyses
• Correlations of FCAT SSS & FAIR’s FSP & SS
• Multiple regression of current FCAT by: prior FCAT; FCAT plus FAIR RCA; FAIR RCA alone.
• Comparisons of negative predictive power (NPP) in predicting current FCAT from: 1) prior FCAT, or 2) prior FCAT + FAIR’s RCA.
• HLM comparisons of current FCAT by: Bayesian growth in RCA vs. difference scores.
• HLM comparisons of current FCAT: Bayesian growth with & without prior FCAT
Table 1: Correlations between the FCAT and both the RC Screen Standard Score (SS) and the FSP

Grade   Fall SS   Fall FSP   Winter SS   Winter FSP   Spring SS   Spring FSP
3       0.72      0.71       0.74        0.72         0.75        0.73
4       0.69      0.74       0.73        0.74         0.74        0.74
5       0.70      0.75       0.73        0.76         0.73        0.76
6       0.73      0.74       0.72        0.74         0.72        0.74
7       0.71      0.72       0.69        0.72         0.69        0.72
8       0.71      0.76       0.71        0.76         0.71        0.76
9       0.69      0.73       0.67        0.73         0.67        0.73
10      0.69      0.75       0.67        0.74         0.66        0.74
Table 2: Estimates of Variance Explained by Prior FCAT and FCAT + RCA

Variables           G4      G5      G6      G7      G8      G9      G10
Prior FCAT          49.5%   58.6%   53.0%   60.0%   64.7%   57.5%   59.0%
Prior FCAT + RCA    53.3%   62.5%   60.3%   63.0%   68.4%   59.2%   61.3%
Unique Variance      3.8%    3.9%    7.3%    3.0%    3.7%    1.7%    2.3%
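The Unique Variance row is the change in R² when FAIR's RCA is added to prior FCAT in a regression predicting current FCAT. A minimal sketch of that nested-model comparison with simulated data (the variable names and values are hypothetical, not the study's data):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# simulated scores for 1,000 hypothetical students
rng = np.random.default_rng(2)
df = pd.DataFrame({"prior_fcat": rng.normal(300, 50, 1000)})
df["fair_rca"] = 0.6 * df["prior_fcat"] + rng.normal(0, 40, 1000)
df["current_fcat"] = 0.7 * df["prior_fcat"] + 0.2 * df["fair_rca"] + rng.normal(0, 30, 1000)

# nested regressions: prior FCAT alone vs. prior FCAT plus FAIR RCA
base = smf.ols("current_fcat ~ prior_fcat", df).fit()
full = smf.ols("current_fcat ~ prior_fcat + fair_rca", df).fit()

print(f"Prior FCAT R2:       {base.rsquared:.3f}")
print(f"Prior FCAT + RCA R2: {full.rsquared:.3f}")
print(f"Unique variance:     {full.rsquared - base.rsquared:.3f}")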
Table 3: Comparing Negative Predictive Power in Predicting Current FCAT with either Prior FCAT or Prior FCAT + FAIR's RCA

Variables           G4    G5    G6    G7    G8    G9    G10
Prior FCAT          86%   86%   86%   84%   78%   70%   61%
Prior FCAT + RCA    98%   94%   93%   85%   92%   90%   54%
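Negative predictive power here is the proportion of students the screen classifies as not at risk who in fact go on to score satisfactorily on the FCAT. A minimal sketch with invented counts (not the study's data):

# hypothetical outcomes for students the screen classified as NOT at risk
true_negatives = 850    # screened not at risk and scored satisfactorily on FCAT
false_negatives = 150   # screened not at risk but did not score satisfactorily

# negative predictive power = TN / (TN + FN)
npp = true_negatives / (true_negatives + false_negatives)
print(f"Negative predictive power: {npp:.0%}")   # 85%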
Table 4. HLM Estimates of FCAT Comparing R² in Growth vs. Difference Score Models

Grade   Base    Bayesian Slope   Simple Difference
3       0.55    0.55             0.62
4       0.53    0.53             0.59
5       0.53    0.53             0.60
6       0.51    0.51             0.61
7       0.47    0.47             0.57
8       0.50    0.50             0.59
9       0.45    0.45             0.55
10      0.45    0.45             0.55
Table 5. HLM Estimates of FCAT Comparing R² in Autoregressive, Growth, and Difference Score Models

Grade   FCAT    FCAT + RCA   Bayesian Slope   Simple Difference
4       0.58    0.65         0.65             0.68
5       0.60    0.66         0.66             0.68
6       0.62    0.67         0.68             0.69
7       0.56    0.61         0.62             0.63
8       0.51    0.57         0.57             0.59
9       0.50    0.53         0.54             0.55
10      0.52    0.57         0.57             0.59
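Tables 4 and 5 compare two within-year growth indices: a simple difference score (final minus first assessment period) and an empirical Bayes ("Bayesian") slope from a two-level growth model. A minimal sketch of how each could be computed from three assessment-period scores, using simulated data and statsmodels; the variable names and values are hypothetical, and the model is a simplified stand-in for the HLM analyses reported here.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# simulate three within-year assessment periods (AP 1-3) for 200 hypothetical students
rng = np.random.default_rng(1)
n = 200
student = np.repeat(np.arange(n), 3)
ap = np.tile([0.0, 1.0, 2.0], n)                 # AP1 = 0, AP2 = 1, AP3 = 2
intercepts = 100 + rng.normal(0, 10, n)          # each student's starting level
slopes = rng.normal(5, 2, n)                     # each student's true growth per AP
score = intercepts[student] + slopes[student] * ap + rng.normal(0, 5, n * 3)
data = pd.DataFrame({"student": student, "ap": ap, "score": score})

# simple difference score: AP3 minus AP1
wide = data.pivot(index="student", columns="ap", values="score")
simple_diff = wide[2.0] - wide[0.0]

# empirical Bayes ("Bayesian") slopes from a random-intercept, random-slope growth model
model = smf.mixedlm("score ~ ap", data, groups=data["student"], re_formula="~ap")
result = model.fit()
eb_slope = pd.Series({g: result.fe_params["ap"] + re["ap"]
                      for g, re in result.random_effects.items()})

print(simple_diff.head())
print(eb_slope.head())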
Improvements on “Mean Proficiency” approach
• FAIR + Prior FCAT accounts for up to 7% unique variance;
• Simple Difference approach to measuring FAIR growth accounts for up to 2-3% unique variance beyond prior FCAT + FAIR;
• Improvements in prediction lead to:
  – Reduction in mis-identification of risk (from 14%-30% with prior FCAT to 2%-15% with prior FCAT + FAIR)
  – Better placement for reading intervention
Challenges to Implementing FAIR
• How is progress monitored? (score types, RTI decision-making)
• What is the value of an adaptive test?
• Benchmark mania
• Scaling professional development, making instructional resources available, and building local capacity
FAIR 3-12 Scores by Task (AP = Assessment Period; PM = progress monitoring; SS = standard score)
• Reading Comprehension: AP scores = FSP, percentile & SS, student Lexile score; PM scores = RC ability score (RCAS), student Lexile score
• Mazes: AP scores = percentile rank, SS; PM score = adjusted Maze score
• Word Analysis: AP scores = percentile rank, SS, WA ability score (WAAS)
Value of Computer-Adaptive Tests
• Provides a more reliable and quicker assessment of student ability than a traditional test, because it creates a unique test tailored to the individual student's ability.
• Provides more reliable assessments, particularly for students at the extremes of ability (extremely low or extremely high).
• Grade-level percentiles are currently provided; grade-equivalent scores will be provided next year.
Benchmark Conundrum
• Benchmark tests rarely have enough items to be reliable at the benchmark level. Moreover, teaching to benchmarks (e.g., “the student will use context clues to determine meanings of unfamiliar words”) results in fragmented skills. Teach to the standard(s) instead (e.g., “the student uses multiple strategies to develop grade-appropriate vocabulary”). Assess at aggregate levels (e.g., Reporting Categories), if confirmatory factor analyses (CFA) show the categories are valid.
FCAT 2.0 Reporting Categories
• Reporting Category 1: Vocabulary
• Reporting Category 2: Reading Application
• Reporting Category 3: Literary Analysis- Fiction/Nonfiction
• Reporting Category 4: Informational Text/ Research Process
FCAT 2.0: Benchmarks x Grade

Grade   Category 1   Category 2   Category 3   Category 4   Total
3       4            6            3            1            14
4       4            6            3            1            14
5       4            6            3            2            15
6       4            5            3            2            14
7       4            5            3            2            14
8       4            5            3            2            14
9/10    4            5            3            2            14
Possible Benchmark Solutions
• Stop-gap: start each student with a grade-level passage. Provide percent correct on Reporting Categories. Then continue to the current adaptive system to obtain a reliable, valid FSP and RCAS.
• For the future:
  – Align FAIR to the Common Core. Develop a grade-level CAT that is item adaptive and incorporates vocabulary depth/breadth.
  – Challenges: dimensionality; multidimensional IRT; testlet effects.
Kinds of Vocabulary Knowledge
Word Meanings
Text
• Definitional
• Usage/contextual
• Definitional/contextual
• Relational
• Morphological
Thank You. Comments or Questions?