Page 1: State and District Evaluation Tools

SWPBS Forum October 2008

Claudia Vincent and Scott Spaulding, [email protected], [email protected], University of Oregon

Page 2: State and District Evaluation Tools

Provide information about desirable features of SWPBS evaluation tools

Provide an overview of the extent to which SWPBS evaluation tools meet these desirable features

Page 3: State and District Evaluation Tools

[Diagram: PBS Self-Assessment → implement systems to support practices → implement practices → improved student outcomes. Evaluation data (fidelity measures and student outcome measures) are interpreted and used for decision-making through an action plan.]

1. Drive implementation decisions

2. Provide evidence for SWPBS impact on student outcomes

Page 4: State and District Evaluation Tools

A measure that drives implementation decisions should be:
◦ socially valid
◦ contextually appropriate
◦ sufficiently reliable (reliable enough to make defensible decisions)
◦ easy to use

A measure that builds the evidence base for SWPBS should:
◦ have known reliability
◦ have known validity
◦ clearly link implementation status to student outcomes

Page 5: State and District Evaluation Tools

Measurement scores have two components:

◦ True score, e.g. a school’s true performance on “teaching behavioral expectations” (relevant to the construct)

◦ Error, e.g. features of the measurement process itself (noise)

Our goal is to use tools that
1. maximize true score and minimize measurement error, and therefore
2. yield precise and interpretable data, and therefore
3. lead to sound implementation decisions and defensible evidence.
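One standard way to formalize this relationship is classical test theory (an assumed framework; the slide does not name it): the observed score is the sum of the true score and error, and reliability is the share of observed-score variance that comes from true scores.

    % Classical test theory sketch (assumed framework; not named on the slide):
    % an observed score X is the sum of the true score T and measurement error E
    X = T + E
    % Reliability is the proportion of observed-score variance that is true-score variance
    \rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}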

Page 6: State and District Evaluation Tools

True score is maximized and error minimized if the evaluation tool is technically adequate, i.e.

◦ can be applied consistently (has good reliability)

◦ measures the construct of interest (has good validity)

Sound implementation decisions are made if the evaluation tool is practical, i.e. data

◦ are cost efficient to collect (low impact)

◦ are easy to aggregate across units of analysis (e.g. students, classrooms, schools, districts, states)

◦ are consistently used to make meaningful decisions (have high utility)

Page 7: State and District Evaluation Tools

Consistency across

◦ Items/subscales/total scales (“internal consistency”)

◦ Data collectors (“inter-rater reliability” or “inter-observer agreement”)

◦ Time (“test-retest reliability”)

Page 8: State and District Evaluation Tools

Definition:
◦ Extent to which the items on an instrument adequately and randomly sample a cohesive construct, e.g. “SWPBS implementation”

Assessment:
◦ If the instrument adequately and randomly samples one construct, and if it were divided into two equal parts, both parts should correlate strongly

Metric:
◦ coefficient alpha (the average split-half correlation based on all possible divisions of an instrument into two parts)

Interpretation:
◦ α ≥ .70 (adequate for measures under development)
◦ α ≥ .80 (adequate for basic research)
◦ α ≥ .90 (adequate for measures on which consequential decisions are based)
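As a minimal illustration, the Python sketch below computes coefficient alpha using the standard variance form of the formula, which is equivalent to the average split-half description above. The item scores are hypothetical, not from any SWPBS data set.

    import numpy as np

    def cronbach_alpha(item_scores):
        """Coefficient (Cronbach's) alpha for a (respondents x items) score matrix."""
        items = np.asarray(item_scores, dtype=float)
        k = items.shape[1]                          # number of items
        item_vars = items.var(axis=0, ddof=1)       # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)   # variance of the total score
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical ratings: 5 schools scored on 4 items (0 = not in place, 2 = in place)
    scores = [[2, 2, 1, 2],
              [1, 1, 1, 1],
              [2, 1, 2, 2],
              [0, 1, 0, 1],
              [2, 2, 2, 2]]
    print(round(cronbach_alpha(scores), 2))  # compare against the .70/.80/.90 guidelines above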

Page 9: State and District Evaluation Tools

Definition:
◦ Extent to which the instrument measures the same construct regardless of who collects the data

Assessment:
◦ If the same construct were observed by two data collectors, their ratings should be almost identical

Metric:
◦ Expressed as percentage of agreement between two data collectors

Interpretation:
◦ ≥ 90% good
◦ ≥ 80% acceptable
◦ < 80% problematic
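A minimal sketch of the agreement metric, assuming two data collectors rate the same items as in place (1) or not in place (0); the ratings are hypothetical.

    def percent_agreement(rater_a, rater_b):
        """Percentage of items on which two data collectors gave the same rating."""
        matches = sum(a == b for a, b in zip(rater_a, rater_b))
        return 100.0 * matches / len(rater_a)

    # Hypothetical item-level ratings from two observers in the same school
    observer_1 = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1]
    observer_2 = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
    print(percent_agreement(observer_1, observer_2))  # 90.0 -> "good" by the guideline above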

Page 10: State and District Evaluation Tools

Definition:
◦ Extent to which the instrument yields consistent results at two points in time

Assessment:
◦ The measure is administered at two points in time. The time interval is set so that no improvement is expected to occur between the first and second administration.

Metric:
◦ Expressed as correlation between pairs of scores from the same schools obtained at the two measurement administrations

Interpretation:
◦ r ≥ .6 acceptable
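A minimal sketch of the test-retest metric, assuming the same schools were measured twice over an interval in which no change is expected; the scores are hypothetical.

    import numpy as np

    # Hypothetical total scores (%) from the same six schools, two weeks apart
    time_1 = [85, 72, 90, 65, 78, 88]
    time_2 = [83, 75, 92, 60, 80, 86]

    r = np.corrcoef(time_1, time_2)[0, 1]  # Pearson correlation between the two administrations
    print(round(r, 2))                     # r >= .6 would be acceptable by the guideline above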

Page 11: State and District Evaluation Tools

How can we interpret this graph?

Page 12: State and District Evaluation Tools

Interpretability of data!

Did these schools truly differ in the extent to which they taught behavioral expectations?

Or…did these schools obtain different scores because
◦ the tool’s items captured only some schools’ approach to teaching expectations? (tool lacked internal consistency)
◦ they had different data collectors? (tool lacked inter-rater agreement)
◦ some collected data in week 1 and some in week 2 of the same month? (tool lacked test-retest reliability)

Page 13: State and District Evaluation Tools

Content validity

Criterion-related validity
◦ Concurrent validity
◦ Predictive validity

Construct validity

Page 14: State and District Evaluation Tools

Definition:
◦ Extent to which the items on an instrument relate to the construct of interest, e.g. “student behavior”

Assessment:
◦ Expert judgment of whether items measure content theoretically or empirically linked to the construct

Metric:
◦ Expressed as percentage of expert agreement

Interpretation:
◦ ≥ 80% agreement desirable

Page 15: State and District Evaluation Tools

Definition:
◦ Extent to which the instrument correlates with another instrument measuring a similar aspect of the construct of interest and administered concurrently or subsequently

Assessment:
◦ Concurrent validity: compare data from concurrently administered measures for agreement
◦ Predictive validity: compare data from subsequently administered measures for predictive accuracy

Metric:
◦ Expressed as a correlation between two measures

Interpretation:
◦ Moderate to high correlations are desirable
◦ Concurrent validity: very high correlations might indicate redundancy of measures

Page 16: State and District Evaluation Tools

Definition:
◦ Extent to which the instrument measures what it is supposed to measure (e.g. the theorized construct “student behavior”)

Assessment:
◦ Factor analyses yielding information about the instrument’s dimensions (e.g. aspects of “student behavior”)
◦ Correlations between constructs hypothesized to impact each other (e.g. “student behavior” and “student reading achievement”)

Metric:
◦ Statistical model fit indices (e.g. Chi-Square)

Interpretation:
◦ Statistical significance
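As a minimal sketch of the second assessment approach listed above (correlating constructs hypothesized to impact each other), the Python example below uses scipy’s pearsonr on hypothetical school-level values; it is an illustration, not part of the validation work reported here.

    from scipy.stats import pearsonr

    # Hypothetical school-level data: a "student behavior" index and mean reading achievement
    behavior_index = [62, 70, 55, 81, 75, 68, 90, 59]
    reading_score  = [48, 55, 41, 66, 60, 52, 71, 45]

    r, p = pearsonr(behavior_index, reading_score)
    print(round(r, 2), round(p, 3))  # a statistically significant r supports the hypothesized link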

Page 17: State and District Evaluation Tools

How can we interpret this graph?

Page 18: State and District Evaluation Tools

Interpretability of data!

Can we truly conclude that student behavior is better in school F than school J?
◦ Does the tool truly measure well-defined behaviors? (content validity)
◦ Do student behaviors measured with this tool have any relevance for the school’s overall climate? For the student’s long-term success? (concurrent, predictive validity)
◦ Does the tool actually measure “student behavior”, or does it measure “teacher behavior”, “administrator behavior”, “parent behavior”? (construct validity)

Page 19: State and District Evaluation Tools

Consider sample size
◦ Psychometric data derived from large samples are better than psychometric data derived from small samples.

Consider sample characteristics
◦ Psychometric data derived from specific samples (e.g. elementary schools) do not automatically generalize to all contexts (e.g. middle schools, high schools).

Page 20: State and District Evaluation Tools

Making implementation decisions based on evaluation data
◦ When has a school reached “full” implementation?

“Criterion” scores on implementation measures should be calibrated based on student outcomes

[Graph: implementation score (10-100) plotted against student outcome goals for academic achievement and social achievement, with the implementation “criterion” marked where outcome goals are met.]
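One way to picture this calibration is the minimal Python sketch below; the implementation scores, outcome values, and the 80% outcome goal are all hypothetical, and a real calibration would rest on many more schools and a more formal analysis.

    import numpy as np

    # Hypothetical paired data: implementation score (%) and % of students
    # meeting a social-behavior goal in the same schools.
    implementation   = np.array([40, 55, 60, 70, 80, 85, 90, 95])
    pct_meeting_goal = np.array([48, 55, 61, 72, 81, 84, 88, 90])

    outcome_goal = 80  # assumed outcome target, for illustration only

    # Candidate "full implementation" criterion: the lowest implementation score
    # at which schools reach the outcome target.
    eligible = implementation[pct_meeting_goal >= outcome_goal]
    criterion = int(eligible.min()) if eligible.size else None
    print(criterion)  # e.g. 80 -> a criterion score calibrated against student outcomes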

Page 21: State and District Evaluation Tools

Evaluation data lead to consequential decisions, e.g.
◦ Additional trainings when data indicate insufficient implementation
◦ Emphasis on specific supports where data indicate greatest student needs

To make sure we arrive at defensible decisions, we need to collect evaluation data with tools that
◦ have documented reliability and validity
◦ clearly link implementation to student outcomes

Page 22: State and District Evaluation Tools

1. Collect evaluation data regularly

2. Collect evaluation data with tools that have good reliability and validity

3. Guide implementation decisions with evaluation data clearly linked to student outcomes

Page 23: State and District Evaluation Tools

Provide information about desirable features of SWPBS evaluation tools

Provide an overview of the extent to which SWPBS evaluation tools meet these desirable features

Page 24: State and District Evaluation Tools

How is my school doing?
My school is “80/80”. Now what?
My school is just beginning SWPBS. Where do I start?
How do we handle the kids still on support plans?
I’ve heard about school climate. What is that?
What about the classroom problems we still have?

Page 25: State and District Evaluation Tools

Measurement within SWPBS
Research or evaluation?
What tools do we have?
What evidence exists for use of these tools?
Guidelines for using the measures

Page 26: State and District Evaluation Tools

Focus on the whole school
School-wide PBS began with a focus on multiple systems
Evaluation of a process
Evaluation of an outcome
Growth beyond initial implementation

[Diagram: four interrelated SWPBS systems (Sugai & Horner, 2002): school-wide, non-classroom, classroom, and individual student.]

Page 27: State and District Evaluation Tools

Continuum of School-wide Positive Behavior Support:

Primary Prevention (~80% of students): school-/classroom-wide systems for all students, staff, & settings
Secondary Prevention (~15%): specialized group systems for students with at-risk behavior
Tertiary Prevention (~5%): specialized individualized systems for students with high-risk behavior

Page 28: State and District Evaluation Tools

[Figure: grid crossing the unit of measurement and analysis (student, classroom, non-classroom, school) with the dimension of measurement (academic achievement, social behavior) and the prevention tier (primary, secondary, tertiary), for both process and outcome measures; question marks indicate cells without established measures.]

Page 29: State and District Evaluation Tools

Measurement within SWPBS
Research or evaluation?
What tools do we have?
What evidence exists for use of these tools?
Guidelines for using the measures

Page 30: State and District Evaluation Tools

1. Drive implementation decisions
2. Provide evidence for SWPBS impact on student outcomes

Measures have been developed to support research-quality assessment of SWPBS

Measures have been developed to assist teams in monitoring their progress

Page 31: State and District Evaluation Tools

Measurement within SWPBS
Research or evaluation?
What tools do we have?
What evidence exists for use of these tools?
Guidelines for using the measures

Page 32: State and District Evaluation Tools

Some commonly used measures:
Effective Behavior Supports Survey
Team Implementation Checklist
Benchmarks of Quality
School-wide Evaluation Tool
Implementation Phases Inventory

Page 33: State and District Evaluation Tools

Newer measures:
Individual Student Schoolwide Evaluation Tool
Checklist for Individual Student Systems
Self-assessment and Program Review

Page 34: State and District Evaluation Tools

Measures by prevention tier and setting:

             Whole-School          Non-classroom   Classroom
Tertiary     ISSET, CISS
Secondary    ISSET, CISS
Universal    EBS, TIC, SET, BoQ    EBS             EBS

Page 35: State and District Evaluation Tools

Measurement within SWPBS
Research or evaluation?
What tools do we have?
What evidence exists for use of these tools?
Guidelines for using the measures

Page 36: State and District Evaluation Tools

Is it important, acceptable, and meaningful? Can we use it in our school? Is it consistent? Is it easy to use? Is it “expensive”? Does it measure what it’s supposed to? Does it link implementation to outcome?

Page 37: State and District Evaluation Tools

Effective Behavior Supports Survey (EBS)
School-wide Evaluation Tool (SET)
Benchmarks of Quality (BoQ)

Page 38: State and District Evaluation Tools

Effective Behavior Supports Survey
◦ Sugai, Horner, & Todd (2003)
◦ Hagan-Burke et al. (2005)
◦ Safran (2006)

[Table: psychometric evidence summarized across internal consistency, test-retest, inter-rater, content, criterion, and construct validity.]

Page 39: State and District Evaluation Tools

46-item, support team self-assessment
Facilitates initial and annual action planning
Current status and priority for improvement across four systems:
◦ School-wide
◦ Specific Setting
◦ Classroom
◦ Individual Student
Summary by domain, action planning activities
20-30 minutes, conducted at initial assessment and at quarterly and annual intervals

Page 40: State and District Evaluation Tools

Internal consistency
◦ Sample of 3 schools
◦ current status: α = .85
◦ improvement priority: α = .94
◦ Subscale α from .60 to .75 for “current status” and .81 to .92 for “improvement priority”

Internal consistency for School-wide
◦ Sample of 37 schools
◦ α = .88 for “current status”
◦ α = .94 for “improvement priority”

Page 41: State and District Evaluation Tools

School-wide Evaluation Tool
◦ Sugai, Horner & Todd (2000)
◦ Horner et al. (2004)

[Table: psychometric evidence summarized across internal consistency, test-retest, inter-rater, content, criterion, and construct validity; evidence is marked for five of the six categories.]

Page 42: State and District Evaluation Tools

28-item, research evaluation of universal implementation

Total implementation score and 7 subscale scores:
1. school-wide behavioral expectations defined
2. school-wide behavioral expectations taught
3. acknowledgement system
4. consequences for problem behavior
5. system for monitoring problem behavior
6. administrative support
7. district support

2-3 hours, external evaluation, annual

Page 43: State and District Evaluation Tools

Internal consistency
◦ Sample of 45 middle and elementary schools
◦ α = .96 for total score
◦ subscale α from .71 (district-level support) to .91 (administrative support)

Test-retest analysis
◦ Sample of 17 schools
◦ Total score, IOA = 97.3%
◦ Individual subscales, IOA from 89.8% (acknowledgement of appropriate behaviors) to 100% (district-level support)

Page 44: State and District Evaluation Tools

Content validity
◦ Collaboration with teachers, staff, and administrators at 150 middle and elementary schools over a 3-year period

Page 45: State and District Evaluation Tools

Construct validity
◦ Sample of 31 schools
◦ SET correlated with EBS Survey
◦ Pearson r = .75, p < .01

Sensitivity to differences in implementation across schools
◦ Sample of 13 schools
◦ Comparison of average scores before and after implementation
◦ t = 7.63, df = 12, p < .001

Page 46: State and District Evaluation Tools

Schoolwide Benchmarks of Quality
◦ Kincaid, Childs, & George (2005)
◦ Cohen, Kincaid, & Childs (2007)

[Table: psychometric evidence summarized across internal consistency, test-retest, inter-rater, content, criterion, and construct validity; evidence is marked for five of the six categories.]

Page 47: State and District Evaluation Tools

Used to identify areas of success / improvement
Self-assessment completed by all team members
53 items rating level of implementation
Team coaches create a summary form, noting discrepancies in ratings
Areas of strength, areas needing development, and areas of discrepancy noted for discussion and planning
1-1.5 hours (1 team member plus coach)
Completed annually in spring

Page 48: State and District Evaluation Tools

Items grouped into 10 subscales:
1. PBS team
2. faculty commitment
3. effective discipline procedures
4. data entry
5. expectations and rules
6. reward system
7. lesson plans for teaching behavioral expectations
8. implementation plans
9. crisis plans
10. evaluation

Page 49: State and District Evaluation Tools

Internal consistency
◦ Sample of 105 schools in Florida and Maryland (44 ES, 35 MS, 10 HS, 16 center schools)
◦ overall α of .96
◦ subscale α from .43 (“PBS team”) to .87 (“lesson plans for teaching expectations”)

Page 50: State and District Evaluation Tools

Test-retest reliability
◦ Sample of 28 schools
◦ Coaches’ scores only
◦ Total score: r = .94, p < .01
◦ Subscale r from .63 (“implementation plan”) to .93 (“evaluation”): acceptable test-retest reliability

Inter-observer agreement (IOA)
◦ Sample of 32 schools
◦ IOA = 89%

Page 51: State and District Evaluation Tools

Content validity
◦ Florida PBS training manual & core SWPBS elements
◦ Feedback from 20 SWPBS research and evaluation professionals
◦ Interviewing to identify response error in the items
◦ Pilot efforts with 10 support teams

Concurrent validity
◦ Sample of 42 schools
◦ Correlation between BoQ and SET
◦ Pearson r = .51, p < .05

Page 52: State and District Evaluation Tools

Measurement within SWPBS
Research or evaluation?
What tools do we have?
What evidence exists for use of these tools?
Guidelines for using the measures

Page 53: State and District Evaluation Tools

What measures do I use?
How do I translate a score into “practice”?
What school variables affect measurement choices?
◦ SWPBS implementation status
Evaluation template

Page 54: State and District Evaluation Tools

Fidelity Tool                   Year 1     Year 2     Year 3
EBS Survey                      X
Universal: TIC                  X X X      X X X      X X X
Universal: SET / BoQ            X          X          X
Secondary/tertiary: CISS        X X X      X X X      X X X
Secondary/tertiary: ISSET       X          X          X
Classroom: internal             X X X      X X X      X X X
Classroom: external             X          X          X

(Each year is divided into quarters 1-4; X = scheduled administration.)

Page 55: State and District Evaluation Tools

Evaluation of School-wide PBS occurs for implementation and outcomes

Evidence of a “good” measure depends on its intended use

The quality of implementation decisions depends on the quality of evaluation tools

Evaluation occurs throughout the implementation process, with different tools for different purposes at different stages