IES Summer Research Training Institute: Single-Case Intervention Design and Analysis
August 18-22, 2014
The Lowell Center
Madison, Wisconsin
Introductions and General Overview
• Institute Faculty, Participants, and the Institute of Education Sciences
• Assumptions/Goals and Objectives
• Schedule and Logistics
• Small Group Research Planning Sessions and Individual Consultation Sessions
• Resources on Single-Case Design and Analysis
• Follow-Up with Institute Faculty
Single-Case Design Institute Faculty
• Thomas R. Kratochwill, PhD (University of Wisconsin-Madison)
• Joel R. Levin, PhD (University of Arizona)
• John Ferron, PhD (University of South Florida)
• Erin Barton, PhD (Vanderbilt University)
• Wendy Machalicek, PhD (University of Oregon)
• William Shadish, PhD (University of California, Merced)
• Carnac the Magnificent Statistician, PhD, PsyD, JD, MD, DDS (but…ABD) (Psychic University)
Welcome Participants!
Schedule and Logistics
Agenda for the Institute (The Lowell Center)
• Breaks, Lunch, and Dinner
• Break-out Sessions
• Individual Project Consultations (see agenda schedule)
Small Group Research Planning Sessions and Consultation
Format for Small Group Activities
• Small Group Break-Out Rooms (Rooms B1B, 115, 116, and 117; reserved for the week)
• Graduate Student Assistants:
• Elizabeth Hagermoser, MS
• Megan Willes, MS
Schedule for Individual Project Consultations
Goals
Personal Design Consultation: Each participant leaves the Institute with a single-case research study (or program of study) that fits their grant/research agenda.
Group Activity: Each participant leaves the Institute with the experience of working in a collaborative group to build a single-case study and apply visual analysis, statistical analysis, and effect-size measures.
Personal Consultation
• 20-minute meetings with faculty (additional time available per schedule)
• Specify design or analysis questions
Product: Personal Research
• Research Question(s)
• Conceptual Model
• Single-Case Design and anticipated data
• Analysis plan
NOTE: Personal research plans are not required to be shared or disseminated. We want to respect the intellectual contributions of each scholar, yet provide a context in which each participant leaves the Institute with at least one new study that will meet WWC criteria.
Group Activity
Goal: Each participant leaves the Institute with the experience of working in a collaborative group to build a single-case study and apply visual analysis, statistical analysis, and effect-size measures.
Resources on Single-Case Design and Analysis
• Required Readings
• Recommended Readings
• Additional Resources
• Books and Monographs
• Faculty as a Resource
Follow up to Institute
Thomas R. Kratochwill, PhD
Wisconsin Center for Education Research
1025 West Johnson Street
University of Wisconsin-Madison
Madison, Wisconsin 53706
E-Mail: [email protected]
Single-Case Intervention Research Opportunities at the Institute of Education Sciences
Kristen Rhoads, PhD, Project Officer, National Center for Special Education Research
DAY 1
Logic and Foundations of Single-Case Intervention Research
Joel R. Levin and Thomas R. Kratochwill
University of Wisconsin-Madison
Joel Levin
Brief Review of the Principles of Scientifically Credible Intervention Research and Transition to Single-Case Intervention Research
Tom Kratochwill
Logic and Foundations of Single-Case Intervention Research
• Purposes and Fundamental Assumptions of Single-Case Intervention Research Methods
• Defining features of SCDs
• Core design types
• Internal validity and the role of replication
• Characteristics of Scientifically Credible Single-Case Intervention Studies
• "True" Single-Case Applications and the WWC Standards (design and evidence credibility)
• Classroom-Based Applications (design and evidence credibility)
Features of Single-Case Research Methods
Single-Case Research Will Have Four Features:
• Independent variable
• Dependent variable
• Focus on a functional relation (causal effect)
• Dimension(s) of predicted change (e.g., level, trend, variability, score overlap)
Additional Considerations
• Operational definition of the dependent variable (DV): The measure of the DV is valid, reliable, and addresses the dimension(s) of concern.
• Operational definition of the independent variable (IV): The core features of the IV are defined and, if necessary, measured to document fidelity (see Sanetti & Kratochwill, 2014).
• Unit of IV implementation: Group versus individual unit.
Background on Single-Case Designs: Defining Features
Design: Repeated measurement of an outcome before, during, and/or after active manipulation of the independent variable
Often Used in Applied and Clinical Fields
• Allows study of low-prevalence disorders that would otherwise need a large sample for statistical power (Odom et al., 2005).
• Sometimes more palatable to service providers because SCDs do not include a no-treatment comparison group.
• Hammond and Gast (2010): Descriptive analysis of "single subject" research designs, 1983-2007.
• Shadish and Sullivan (2011): Characteristics of single-case designs used to assess intervention effects in 2008.
Descriptive Analysis
Hammond and Gast (2010) reviewed 196 randomly identified journal issues (from 1983-2007) containing 1,936 articles; a total of 556 single-case designs were coded. Multiple baseline designs were reported more often than withdrawal designs, and these were most often reported across individuals and groups.
Research Currently Meeting WWC Design Standards
Sullivan and Shadish (2011) assessed the WWC pilot Standards related to implementation of the intervention, acceptable levels of observer agreement/reliability, opportunities to demonstrate a treatment effect, and acceptable numbers of data points in a phase. In studies published in 21 journals in 2008, they found that nearly 45% of the research met the strictest WWC design standards and 30% met them with reservations. Thus, in this sample of major journals that publish single-case intervention research, around 75% of the studies published during the sampling year would meet (or meet with reservations) the WWC design standards.
Types of Research Questions That Can Be Answered with Single-Case Designs
Evaluate Intervention Effects Relative to Baseline
• Does Multi-systemic Therapy reduce the level of problem behavior for students with emotional behavior disorders?
Compare Relative Effectiveness of Interventions
• Is "function-based behavior support" more effective than "non-function-based support" at reducing the level and variability of problem behavior for this participant?
Compare Single- and Multi-Component Interventions
• Does adding Performance Feedback to Basic Teacher Training improve the fidelity with which instructional skills are used by new teachers in the classroom?
Some Examples of SCD Research Questions that Might be Addressed
Is a certain teaching procedure functionally related to an increase in the level of social initiations by young children with autism?
Is time delay prompting or least-to-most prompting more effective in increasing the level of self-help skills performed by young children with severe intellectual disabilities?
Is the pacing of reading instruction functionally related to increased level and slope of reading performance (as measured by ORF) for third graders?
Is Adderall (at a clinically prescribed dosage) functionally related to an increased level of attention performance on the Attention Network Test for elementary-age students with Attention Deficit Disorder?
Single-Case Designs are Experimental Designs
• Like RCTs, the purpose is to document causal relationships
• Control for major threats to internal validity
• Document effects for specific individuals/settings
• Replication (across studies) is required to enhance external validity
• Can be distinguished from case studies
Single-Case Design Standards were Developed to Address Threats to Internal Validity (when the unit of analysis is the individual)
• Ambiguous Temporal Precedence
• Selection
• History
• Maturation
• Testing
• Instrumentation
• Additive and Interactive Effects of Threats
Additional Threats to Internal Validity May Occur When Clusters Are the Unit of Analysis
Examples:
• Cluster Selection
• Cluster Composition
• Within-Cluster Variability
• Attrition of Within-Cluster Participants and of Clusters
• Within-Cluster Extraneous Variables
• Across-Cluster Contagion Effects
Distinctions Between Experimental Single-Case Design and Clinical Case Study Research
Some Characteristics of Traditional Case Study Research
• Often characterized by narrative description of case, treatment, and outcome variables
• Typically lack a formal design with replication, but can involve a basic design format (e.g., A/B)
• Methods have been suggested to improve drawing valid inferences from case study research [e.g., Kazdin, A. E. (2011). Single-case research designs: Methods for clinical and applied settings (2nd ed.). New York: Oxford University Press]
Some Examples of Ways to Improve Case Study Research
• Type of data
• Assessment occasions
• Planned vs. ex post facto
• Projections of performance
• Treatment effect size
• Treatment effect impact
• Number of participants/replications
• Standardization of treatment
• Integrity of treatment
Wendy Machalicek
Constructing Single-Case Designs Based on Research Questions and Logic Models
Tom Kratochwill
Characteristics of Scientifically Credible Single-Case Intervention Studies Based on the WWC Pilot Standards
Current Needs in the Field
• Publication criteria for peer-reviewed journals: design, analysis, interpretation
• Grant review criteria (e.g., IES, NSF, NIMH/NIH): RFP stipulations, grant reviewer criteria
• Documentation of "evidence-based practices": professional agreement
• Training expectations for new scholars: visual analysis; statistical analysis
• Meta-analysis procedures that will allow single-case research content to reach broader audiences
Single-case researchers have a number of conceptual and methodological standards to guide their synthesis work. These standards, alternatively referred to as “guidelines,” have been developed by a number of professional organizations and authors interested primarily in providing guidance for reviewing the literature in a particular content domain (Smith, 2012; Wendt & Miller, 2012). The development of these standards has also provided researchers who are designing their own intervention studies with a protocol that is capable of meeting or exceeding the proposed standards.
Reviews of Appraisal Guidelines
Wendt and Miller (2012) identified seven “quality appraisal tools” and compared these standards to the single-case research criteria advanced by Horner et al. (2005). Smith (2012) reviewed research design and various methodological characteristics of single-case designs in peer-reviewed journals, primarily from the psychological literature (over the years 2000-2010). Based on his review, six standards for appraisal of the literature were identified (some of which overlap with the Wendt and Miller review).
Examples of Professional Groups with SCD Standards or Guidelines:
• National Reading Panel
• American Psychological Association (APA) Divisions 12/53
• American Psychological Association (APA) Division 16
• What Works Clearinghouse (WWC)
• Consolidated Standards of Reporting Trials (CONSORT) Guidelines for N-of-1 Trials (the CONSORT Extension for N-of-1 Trials [CENT])
Context
• Single-case methods developed and traditionally used within Applied Behavior Analysis
• Shavelson & Towne (2002)
• Claims that visual analysis is unreliable
• Emergence of "evidence-based" practices
• IES commitment to rigorous education research
Context: WWC White Paper
Single-Case Intervention Research Design Standards Panel
• Thomas R. Kratochwill, Chair (University of Wisconsin-Madison)
• John H. Hitchcock (Ohio University)
• Robert H. Horner (University of Oregon)
• Joel R. Levin (University of Arizona)
• Samuel M. Odom (University of North Carolina at Chapel Hill)
• David M. Rindskopf (City University of New York)
• William R. Shadish (University of California, Merced)
Available at: http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf
"True" Single-Case Applications and the WWC Standards
What Works Clearinghouse Standards
• Design Standards
• Evidence Criteria
• Social Validity
Joel Levin
Classroom-Based Applications of Single-Case Design

Tom Kratochwill
Issues to Consider in Selecting a Single-Case Intervention Design
WWC Design Standards
Evaluating the Quality of Single-Case Designs
• Evaluate the Design: Meets Design Standards / Meets with Reservations / Does Not Meet Design Standards
• Evaluate the Evidence: Strong Evidence / Moderate Evidence / No Evidence
• Effect-Size Estimation
• Social Validity Assessment
WWC Single-Case Pilot Design Standards
Four Standards for Design Evaluation:
1. Systematic manipulation of the independent variable
2. Inter-assessor agreement
3. Three attempts to demonstrate an effect at three different points in time
4. Minimum number of phases and data points per phase, for phases used to demonstrate an effect
Standard 3 Differs by Design Type:
• Reversal/Withdrawal Designs (ABAB and variations)
• Alternating Treatments Designs
• Multiple Baseline Designs
Standard 1: Systematic Manipulation of the Independent Variable
Researcher Must Determine When and How the Independent Variable Conditions Change.
If Standard Is Not Met, Study Does Not Meet Design Standards.
Examples of Manipulation that is Not Systematic
Teacher/Consultee Begins to Implement an Intervention Prematurely Because of Parent Pressure.
Researcher Looks Retrospectively at Data Collected during an Intervention Program.
Standard 2: Inter-Assessor Agreement
Each outcome variable for each case must be measured systematically by more than one assessor.
Researcher needs to collect inter-assessor agreement:
• In each phase
• On at least 20% of the data points in each condition (i.e., baseline, intervention)
Rate of agreement must meet minimum thresholds (e.g., 80% agreement or a Cohen's kappa of 0.60).
If no outcomes meet these criteria, the study Does Not Meet Design Standards.
In Current WWC Reviews: Author Queries Occur When a Study Provides Insufficient IOA Information
• Determine whether the standard is met based on the response. If the result of the query indicates that the study does not meet standards, treat it as such.
• If no response, assume the standard is met if:
• The minimum level of agreement is reached.
• The study assesses IOA at least once in each phase.
• The study assesses IOA on at least 20% of all sessions.
• A footnote is added to the WWC product indicating that IOA was not fully determined.
Standard 3: Three Attempts to Demonstrate an Intervention Effect at Three Different Points in Time
"Attempts" are about phase transitions.
Designs that could meet this standard include:
• ABAB design
• Multiple baseline design with three baseline phases and staggered introduction of the intervention
• Alternating treatment design (other designs to be discussed during the Institute)
Designs not meeting this standard include:
• AB design
• ABA design
• Multiple baseline design with three baseline phases and the intervention introduced at the same time for each case
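Because "attempts" correspond to phase transitions, a design's phase sequence can be screened against Standard 3 by counting its transitions. A minimal sketch in Python (the function name and design encoding are illustrative, not part of the WWC materials):

```python
def count_effect_attempts(phases):
    """Count phase transitions, i.e., attempts to demonstrate an effect.

    `phases` is a sequence of condition labels in temporal order,
    e.g., ["A", "B", "A", "B"] for an ABAB design.
    """
    return sum(1 for prev, cur in zip(phases, phases[1:]) if prev != cur)

# ABAB offers three attempts at three different points in time,
# whereas AB and ABA offer only one and two, respectively.
print(count_effect_attempts(["A", "B", "A", "B"]))  # 3
print(count_effect_attempts(["A", "B"]))            # 1
print(count_effect_attempts(["A", "B", "A"]))       # 2
```

This check screens only the number of attempts; whether each attempt actually demonstrates a basic effect is determined by visual analysis of the data.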
Standard 4: Minimum Number of Phases and Data Points per Phase (for Phases Used in Standard 3)

To Meet Standards:
• Reversal design: at least 4 phases, with at least 5 data points per phase
• Multiple baseline design: at least 6 phases, with at least 5 data points per phase
• Alternating treatment design: at most 2 data points per phase, with at least 5 data points per condition

To Meet Standards with Reservations:
• Reversal design: at least 4 phases, with at least 3 data points per phase
• Multiple baseline design: at least 6 phases, with at least 3 data points per phase
• Alternating treatment design: at most 2 data points per phase, with at least 4 data points per condition
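The minimums above amount to a mechanical decision rule. The sketch below encodes Standard 4 only, assuming phase lengths (and, for ATDs, per-condition totals) have already been tallied; the function name and design labels are illustrative:

```python
def rate_standard_4(design, points_per_phase, points_per_condition=None):
    """Rate a design against the WWC Standard 4 minimums.

    design: "reversal", "mb" (multiple baseline), or "atd"
    points_per_phase: list of data-point counts, one per phase
    points_per_condition: for ATDs, list of counts per condition
    Returns "Meets", "Meets with Reservations", or "Does Not Meet".
    """
    if design in ("reversal", "mb"):
        min_phases = 4 if design == "reversal" else 6
        if len(points_per_phase) < min_phases:
            return "Does Not Meet"
        fewest = min(points_per_phase)
        if fewest >= 5:
            return "Meets"
        if fewest >= 3:
            return "Meets with Reservations"
        return "Does Not Meet"
    if design == "atd":
        if max(points_per_phase) > 2:   # at most 2 data points per phase
            return "Does Not Meet"
        fewest = min(points_per_condition)
        if fewest >= 5:
            return "Meets"
        if fewest >= 4:
            return "Meets with Reservations"
        return "Does Not Meet"
    raise ValueError("unknown design type")

# An ABAB design with 5, 5, 5, and 6 data points meets the standard;
# with only 3 data points in one phase it meets with reservations.
print(rate_standard_4("reversal", [5, 5, 5, 6]))  # Meets
print(rate_standard_4("reversal", [5, 3, 5, 6]))  # Meets with Reservations
```

An overall design rating would combine this with Standards 1-3 (systematic manipulation, IOA, and three attempts at three different points in time).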
Some Examples that "Meet," "Meet with Reservations," and "Do Not Meet" Design Standards
Meets Design Standards with Reservations (MDSWR)
• 3 attempts at 3 different points in time
• 4 phases with at least 3 data points per phase
Adapted from Horner and Spaulding, 2010
[Figure: Proportion of 10-second intervals academically engaged (y-axis, 0-90) across days (x-axis) for Adam; phases: Baseline 1, Intervention 1 (Peer Tutoring), Baseline 2, Intervention 2 (Peer Tutoring), with the second and third demonstrations of effect annotated.]
Meets Design Standards (MDS)
• 3 attempts at 3 different points in time
• 6 phases with at least 5 data points per phase
Source: Kern et al., 1994
Ratings Differ by Research Question with ATDs; For Example:
• MDSWR: Intervention 1 vs. Intervention 2
• Does Not Meet Design Standards: Intervention 1 vs. Intervention 3
• Does Not Meet Design Standards: Intervention 2 vs. Intervention 3
Source: Horner and Spaulding, 2010
[Figure: Rate of problem behavior (y-axis) across sessions (x-axis) for Interventions 1, 2, and 3 in an alternating treatment design.]
Extensions of Core Designs (To Be Discussed on Day 2 of the Institute)
Changing Criterion Designs
• Researcher pre-schedules changes in the intervention criterion or intensity of the intervention
• Can meet evidence standards with at least 3 criterion shifts (for Standard 3)
Non-concurrent Multiple Baseline
• Completely non-concurrent MBDs have baselines that do not overlap when examined vertically
• Designs with no vertical overlap at baseline do not meet design standards because of the history threat
Multiple Probe
• Multiple Probe (Days)
• Multiple Probe (Conditions)
Design Evaluation
Meets Design Standards:
• IV manipulated directly
• IOA documented (e.g., 80% agreement; Cohen's kappa of .60) on 20% of data points in each phase
• Design allows the opportunity to assess the basic effect at three different points in time
• Five data points per phase (or design equivalent); ATD: four-comparison option
Meets Design Standards with Reservations:
• All of the above, except at least three data points per phase
Does Not Meet Design Standards
Wendy Machalicek and Tom Kratochwill
Overview of Basic and Advanced Single-Case Intervention Designs
Establishing “Design Standards” as Applied to Basic Single-Case Designs
ABAB Designs
Multiple Baseline Designs
Alternating Treatment Designs
ABAB Design Description
Simple phase change designs [e.g., ABAB; BCBC design]. (In the literature, ABAB designs are sometimes referred to as withdrawal designs, intrasubject replication designs, within-series designs, or reversal designs)
ABAB Reversal/Withdrawal Designs
In these designs, estimates of level, trend, and variability within a data series are assessed under similar conditions; the manipulated variable is introduced and concomitant changes in the outcome measure(s) are assessed in the level, trend, and variability between phases of the series, with special attention to the degree of overlap, immediacy of effect, and similarity of data patterns across similar phases (e.g., all baseline phases).
[Figure: ABAB design; dependent variable (y-axis) across days/weeks/months/sessions (x-axis); phases: Baseline (A1), Intervention (B1), Baseline (A2), Intervention (B2).]
ABAB Reversal/Withdrawal Designs
Some Design Limitations:
• Behavior must be reversible in the ABAB… series (e.g., return to baseline).
• There may be ethical issues involved in reversing behavior back to baseline (A2).
• The study may become complex when multiple conditions need to be compared.
• There may be order effects in the design.
Multiple Baseline Design Description
Multiple baseline design. The design can be applied across units (participants), across behaviors, or across situations.
Multiple Baseline Designs
In these designs, multiple AB data series are compared and introduction of the intervention is staggered across time. Comparisons are made both between and within a data series. Repetitions of a single simple phase change are scheduled, each with a new series and in which both the length and timing of the phase change differ across replications.
[Figure: Multiple baseline design across Participants 1-4; dependent variable (y-axis) across days/weeks/months/sessions (x-axis); the intervention is introduced in staggered fashion following baseline in each series.]
Multiple Baseline Design
Some Design Limitations:
• The design is generally limited to demonstrating the effect of one independent variable on some outcome.
• The design depends on the "independence" of the multiple baselines (across units, settings, and behaviors).
• There can be practical as well as ethical issues in keeping individuals on baseline for long periods of time (as in the last series).
Alternating Treatment Designs
Alternating treatments (in the behavior analysis literature, alternating treatment designs are sometimes referred to as part of a class of multi-element designs)
Alternating Treatment Design Description
In these designs, estimates of level, trend, and variability in a data series are assessed on measures within specific conditions and across time. Changes/differences in the outcome measure(s) are assessed by comparing the series associated with different conditions.
[Figure: Alternating treatment design; dependent variable (y-axis) across days/sessions (x-axis); a baseline phase followed by rapidly alternating Interventions 1 and 2, plotted as separate series.]
Alternating Treatment Design
Some Design Limitations:
• Behavior must reverse during alternation of the interventions.
• There is the possibility of interaction/carryover effects as conditions are alternated.
• Comparing more than three treatments may be very challenging.
Application of the WWC Design Standards to Single-Case Designs
When Assessing Design Standards
Does the design allow for the opportunity to assess experimental control?
• Baseline
• At least five data points per phase (3 with reservations)
• Opportunity to document at least 3 basic effects, each at a different point in time
Basic Effect versus Experimental Control
Basic Effect: Change in the pattern of responding after manipulation of the independent variable (level, trend, variability).
Experimental Control: At least three demonstrations of basic effect, each at a different point in time.
[Figure annotations: first, second, and third demonstrations of the basic effect, each following an introduction of Intervention X.]
1. Baseline.
2. Each phase has at least 5 data points (3 with reservations).
3. Design allows for assessment of a "basic effect" at three different points in time.
Design Standards Exercise 1: Rate the Quality of the Following Single-Case Designs
For each of the following graphs, note:
1) The design type
2) The highest possible rating (Meets Standards, Meets Standards with Reservations, Does Not Meet Standards)
Example 1
Source: Dunlap et al., 1994

Example 2
Source: Cunningham et al., 1998

Example 3
WCPM = words read correctly per minute
Source: Begeny, J. C., Daly III, E. J., & Valleley, R. J. (2006)

Example 4
Source: Ingram et al., 2005

Example 5
Source: Todd et al., 1999
Study Design Exercise 2: Inter-Observer Agreement
For each example, note the following about IOA collection:
1) Collected for each case on each outcome variable?
2) Collected at least once in all phases?
3) Collected on at least 20% of the baseline sessions and on at least 20% of the intervention sessions?
4) Does IOA meet minimum acceptable values for each case on each outcome?
March and Horner (2002)
Note: Problem behavior and academic engagement meet protocol screening requirements. Participants are Andy, Bill, and Cathy.
Interobserver agreement data were collected for problem behavior and academic engagement on at least 22% of observation periods for each phase for each participant. Two independent observers, using synchronized earphones to match observation intervals, monitored the behavior of a student. Interobserver agreement was calculated on an interval-by-interval basis by dividing the number of intervals with perfect agreement by the total number of intervals observed and multiplying by 100%. In addition, kappa was computed to assess reliability when chance agreement was controlled. Interobserver agreements for problem behavior for Andy, Bill, and Cathy were 87%, 80%, and 83%, respectively; corresponding kappa scores were .60, .48, and .49. Interobserver agreements for academic engagement for Andy, Bill, and Cathy were 82%, 87%, and 88%, respectively, with kappa scores of .51, .59, and .58.
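The interval-by-interval agreement and kappa calculations described in this passage can be reproduced in a few lines. A sketch with hypothetical observer records (the study's actual data are not available here; function names are illustrative):

```python
def interval_agreement(obs1, obs2):
    """Interval-by-interval percent agreement between two observers.

    obs1 and obs2 are equal-length sequences of interval codes
    (e.g., 1 = behavior occurred, 0 = did not occur).
    """
    agreements = sum(a == b for a, b in zip(obs1, obs2))
    return 100.0 * agreements / len(obs1)

def cohens_kappa(obs1, obs2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(obs1)
    p_obs = sum(a == b for a, b in zip(obs1, obs2)) / n
    p1 = sum(obs1) / n   # proportion of intervals observer 1 scored 1
    p2 = sum(obs2) / n   # proportion of intervals observer 2 scored 1
    p_chance = p1 * p2 + (1 - p1) * (1 - p2)
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical 10-interval record from two observers.
o1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
o2 = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(interval_agreement(o1, o2))      # 80.0
print(round(cohens_kappa(o1, o2), 2))  # 0.6
```

Note how 80% agreement can correspond to a kappa at, or below, the 0.60 threshold once chance agreement is removed, which is why the WWC thresholds name both indices.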
Kamps, Wendland, and Culpepper (2006)
Note: Student disruptive behaviors and the percentage of time on task meet protocol screening requirements.
Reliability data for student and teacher behaviors were collected across 14 sessions for Patricia (19%), and across 2 sessions for Michael (8%). For disruptive behaviors, the mean percentage agreement across conditions was 92% (59–100), for academic compliance 94% (69–100), for praise 84% (0–100), and for reprimands 93% (0–100). Low rates reflected low incidence during sessions. On-task reliability was collected for six sessions (8%) for Patricia (M = 93%) and two sessions (8%) for Michael (M = 96%).
Randomized Single-Case Intervention Designs
Joel Levin
Visual Analysis of Single-Case Intervention Data
Tom Kratochwill
WWC Standards
Evaluating Single-Case Design Outcomes with Visual Analysis: Evidence Criteria
• Evaluate the Design: Meets Design Standards / Meets with Reservations / Does Not Meet Design Standards
• Evaluate the Evidence: Strong Evidence / Moderate Evidence / No Evidence
• Effect-Size Estimation
• Social Validity Assessment
Visual Analysis of Single-Case Evidence
Traditional Method of Data Evaluation for SCDs
• Determine whether evidence of a causal relation exists
• Characterize the strength or magnitude of that relation
• Singular approach used by the WWC for rating SCD evidence
Methods for Effect-Size Estimation
• Several parametric and non-parametric methods have been proposed
• Some SCD WWC panel members are among those developing these methods, but the methods are still being tested and most are not comparable with group-comparison studies
• WWC standards for effect size are being developed as the field reaches greater consensus on appropriate statistical approaches
Goal, Rationale, Advantages, and Limitations of Visual Analysis
Goal Is to Identify Intervention Effects
• A basic effect is a change in the dependent variable in response to researcher manipulation of the independent variable.
• "Subjective" determination of evidence, but practice and a common framework for applying visual analysis can help to improve agreement rates.
• Evidence criteria are met by examining effects that are replicated at different points.
Encourages Focus on Interventions with Strong Effects
• Strong effects are generally desired by applied researchers and clinicians.
• Weak results are filtered out because effects should be clear from looking at the data (viewed as an advantage).
• Statistical evaluation can, however, be more sensitive than visual analysis in detecting intervention effects.
Goal, Rationale, Advantages, Limitations (cont'd)
Statistical Evaluation and Visual Analysis Have Some Conceptual Similarities (Kazdin, 2011): Both attempt to avoid Type I and Type II errors.
• Type I: Concluding the intervention produced an effect when it did not
• Type II: Concluding the intervention did not produce an effect when it did
Possible Limitations of Visual Analysis
• Lack of concrete decision-making rules (e.g., in contrast to the p < .05 criterion used in statistical analysis)
• Multiple influences need to be analyzed simultaneously
Multiple Influences Need to Be Considered in Applying Visual Analysis
• Level: Mean of the data series within a phase
• Trend: Slope of the best-fit line within a phase
• Variability: Deviation of data around the best-fit line
• Percentage of Overlap: Percentage of data from an intervention phase that enters the range of data from the previous phase
• Immediacy: Magnitude of change between the last 3 data points in one phase and the first 3 in the next
• Consistency: Extent to which data patterns are similar in similar phases
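For a pair of adjacent phases, the first five of these features can be quantified directly from the plotted data. A minimal sketch using hypothetical data and an ordinary least-squares trend line (function names are illustrative; visual analysts weigh these features jointly rather than computing them mechanically):

```python
def phase_features(phase):
    """Level, trend, and variability for one phase of data."""
    n = len(phase)
    xs = range(n)
    level = sum(phase) / n                 # mean of the series
    x_mean = (n - 1) / 2
    # Slope of the ordinary least-squares best-fit line.
    trend = (sum((x - x_mean) * (y - level) for x, y in zip(xs, phase))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = level - trend * x_mean
    # Mean absolute deviation of the data around the best-fit line.
    variability = sum(abs(y - (intercept + trend * x))
                      for x, y in zip(xs, phase)) / n
    return level, trend, variability

def overlap_percentage(prev_phase, next_phase):
    """Percentage of next-phase points falling in the previous phase's range."""
    lo, hi = min(prev_phase), max(prev_phase)
    inside = sum(lo <= y <= hi for y in next_phase)
    return 100.0 * inside / len(next_phase)

def immediacy(prev_phase, next_phase):
    """Change between the mean of the last 3 points of one phase
    and the mean of the first 3 points of the next."""
    return sum(next_phase[:3]) / 3 - sum(prev_phase[-3:]) / 3

baseline = [12, 14, 13, 15, 14]   # hypothetical problem-behavior rates
intervention = [8, 6, 5, 4, 3]
print(phase_features(baseline))
print(overlap_percentage(baseline, intervention))  # 0.0 (no overlap)
print(immediacy(baseline, intervention))           # negative: immediate drop
```

Consistency across similar phases (the sixth feature) requires comparing these summaries across all phases of the same condition, not just one adjacent pair.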
Research on Visual Analysis
Applied Outcome Criteria and Visual Analysis
Decision Criteria in Visual Analysis
Standards for Visual Analysis
Research on Visual Analysis
Research on visual analysis contains a number of methodological limitations. These limitations were recognized by Brossart et al. (2006, p. 536), who offered the following recommendations for improving visual-analysis research:
• Graphs should be fully contextualized, describing a particular client, target behavior(s), time frame, and data-collection instrument.
• Judges should not be asked to predict the degree of statistical significance (i.e., a significance probability p-value) of a particular statistic, but rather should be asked to judge graphs according to their own criteria of practical importance, effect, or impact.
Research on Visual Analysis (cont'd)
• Judges should not be asked to make dichotomous yes/no decisions, but rather to judge the extent or amount of intervention effectiveness.
• No single statistical test should be selected as “the valid criterion”; rather, several optional statistical tests should be tentatively compared to the visual analyst’s judgments.
• Only graphs of complete SCD studies should be examined (e.g., ABAB, Alternating Treatment, and Multiple-Baseline Designs).
Some Recent Research Findings
• Lieberman, R. G., Yoder, P. J., Reichow, B., & Wolery, M. (2010). Visual analysis of multiple baseline across participants graphs when change is delayed. School Psychology Quarterly, 25, 28-44.
• Kahng, S. W., Chung, K-M., Gutshall, K., Pitts, S. C., Kao, J., & Girolami, K. (2010). Consistent visual analysis of intrasubject data. Journal of Applied Behavior Analysis, 43, 35-45.
Lieberman, Yoder, Reichow, and Wolery (2010) tested various characteristics of multiple-baseline designs to determine whether the data features affected the judgments of visual-analysis experts (N = 36 editorial board members of journals that publish SCDs) regarding the presence of a functional relation and agreement on the outcomes. Graphs with steep slopes (versus shallow slopes) when the intervention was introduced were more often judged as showing a functional relation. Nevertheless, there was still some disagreement on whether the functional relation had been established. Lieberman et al. (2010) noted that training visual judges to address conditions in which change occurs long after the intervention, and in which there is inconsistent latency of change across units, may help improve reviewers' concurrence about a functional relation.
Kahng, Chung, Gutshall, Pitts, Kao, and Girolami (2010) replicated and extended earlier research on visual analysis by including editorial board members of the Journal of Applied Behavior Analysis as participants in the study. Board members were asked to judge 36 ABAB design graphs on a 100-point scale while rating the degree of experimental control. These authors reported high levels of agreement among judges, noting that the reliability of visual analysis has improved over the years, due in part to better training in visual-analysis methods.
Erin Barton
Training Protocols in Visual Analysis
Overview of Visual Analysis of Single-Case Data
Parameters Associated with Visual Analysis
• Four steps in visual analysis
• Six features considered in visual analysis
• Additional considerations for MBL and ATD
Concerns about Visual Analysis
• Evaluate the Design: Meets Design Standards / Meets with Reservations / Does Not Meet Design Standards
• Evaluate the Evidence: Strong Evidence / Moderate Evidence / No Evidence
• Effect-Size Estimation
• Social Validity Assessment
Evidence Criteria
Strong
• Baseline: Documentation of the research-question "problem" and of a predictable pattern (at least 5 data points)
• Each phase of the analysis: Documentation of a predictable pattern (at least 5 data points)
• Basic effects: Documentation of the predicted change in the DV when the IV is manipulated
• Experimental control: Three demonstrations of the basic effect, each at a different point in time
• No demonstrations of intervention failure
Evidence Criteria
Moderate
• All of the "Strong" criteria, with these exceptions:
• Only 3-4 data points per phase
• Three demonstrations of effect, but with additional demonstrations of failure to document an effect
No Evidence (the label is a misnomer)
• Evidence does not meet the Moderate level.
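Taken together, the Strong/Moderate/No Evidence criteria form a decision rule that is applied after visual analysis has counted demonstrations of the basic effect. A sketch of that rule as summarized above (the inputs and function name are illustrative, and assume the design already meets design standards):

```python
def rate_evidence(n_demonstrations, n_failures, min_points_per_phase):
    """Map visual-analysis results onto the WWC evidence categories.

    n_demonstrations: demonstrations of the basic effect, each at a
        different point in time
    n_failures: demonstrations of intervention failure
    min_points_per_phase: smallest number of data points in any phase
    """
    if n_demonstrations < 3 or min_points_per_phase < 3:
        return "No Evidence"
    if n_failures == 0 and min_points_per_phase >= 5:
        return "Strong Evidence"
    # Either 3-4 points in some phase, or failures alongside 3 demonstrations.
    return "Moderate Evidence"

print(rate_evidence(3, 0, 5))  # Strong Evidence
print(rate_evidence(3, 1, 5))  # Moderate Evidence
print(rate_evidence(2, 0, 5))  # No Evidence
```

As the slides note, "No Evidence" is a rating category rather than a claim that the study contains no information at all.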
Visual Analysis
Baseline
• Document the "problem" requiring intervention
• Typically 5 or more data points
• Documentation of a pattern of responding that allows prediction into the future
Each Phase
• Documents a clear pattern of responding
• Typically 5 or more data points
Adjacent Phases
• Do the data document a "basic effect"?
Whole Study
• Do the phases document experimental control (e.g., at least three demonstrations of a basic effect, each at a different point in time)?
Documenting Experimental Control
• Three demonstrations of a “basic effect,” each at a different point in time.
• A “basic effect” is a predicted change in the dependent variable when the independent variable is actively manipulated.
• To assess a “basic effect,” visual analysis includes simultaneous assessment of level, trend, variability, immediacy of effect, overlap across adjacent phases, and consistency of data patterns in similar phases.
(Parsonson & Baer, 1978; Kratochwill & Levin, 1992)
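Each of these features can be quantified in simple ways. The sketch below uses hypothetical AB-phase data and deliberately simple operationalizations (mean for level, OLS slope for trend, range for variability); it illustrates the ideas rather than any standardized procedure, and all data values are invented.

```python
# Hypothetical AB data for a behavior targeted for reduction.
baseline = [8, 7, 9, 8, 7]       # phase A observations
intervention = [4, 3, 3, 2, 2]   # phase B observations

def level(phase):
    """Level: central tendency (here, the mean) of a phase."""
    return sum(phase) / len(phase)

def trend(phase):
    """Trend: slope of an ordinary least-squares line through the phase."""
    n = len(phase)
    x_bar, y_bar = (n - 1) / 2, sum(phase) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(phase))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den

def variability(phase):
    """Variability: range of scores within the phase."""
    return max(phase) - min(phase)

def overlap(a, b):
    """Overlap: proportion of B points falling within the range of A."""
    lo, hi = min(a), max(a)
    return sum(lo <= y <= hi for y in b) / len(b)

def immediacy(a, b, k=3):
    """Immediacy of effect: mean of the first k B points minus the mean of
    the last k A points (negative = immediate drop)."""
    return sum(b[:k]) / k - sum(a[-k:]) / k

print(level(intervention) - level(baseline))   # change in level
print(overlap(baseline, intervention))         # 0.0 -- no overlap
```

Consistency of data patterns across similar phases would require a design with repeated A and B phases (e.g., ABAB), so it is omitted here.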
Visual Analysis within Single-Case Design
Assessing the within-phase “pattern” and the between-phase “basic effect”:
• Within phase: level, trend, variability
• Between phases: overlap, immediacy of effect, consistency across similar phases
• Other: vertical analysis; intercept gap
Special Topics in Single-Case Intervention Research
Tom Kratochwill and Erin Barton
Overview of Special Topics
• Negative Results
• Effect Size Applications
• Applications of the WWC Standards in Literature Reviews
Negative Results in Single-Case Intervention Research
Negative Results in Single-Case Intervention Research
• The Legacy of Negative Results and its Relationship to Publication Bias
• The Importance of Negative Results in Developing Evidence-Based Practices (Kratochwill, Stoiber, & Gutkin, 2000)
• Negative Results in Single-Case Intervention Research
• Examples using the WWC Standards
Negative Results Definition
The term negative results traditionally has meant that there are either: (a) no statistically significant differences between groups that receive different intervention conditions in randomized controlled trials; or (b) no documented differences (visually and/or statistically) between baseline and intervention conditions in experimental single-case designs.
Negative Results in Single-Case Design
In the domain of SCD research, negative results reflect findings of (a) no difference between baseline (A) and intervention (B) phases (A = B), (b) a difference between baseline and intervention phases but in the opposite direction to what was predicted (A > B, where B was predicted to be superior to A), (c) no difference between two alternative interventions, B and C (B = C), or (d) a difference between two alternative interventions, but in the direction opposite to what was predicted (B > C, where C was predicted to be superior to B).
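These four patterns amount to a two-factor decision rule: was a difference documented, and was it in the predicted direction? A minimal sketch, with invented labels and argument names (real studies would document the difference visually and/or statistically, not with a bare flag):

```python
def classify_outcome(differ, superior=None, predicted=None):
    """Classify a two-condition comparison against the prediction.

    differ    -- was any difference documented (visually/statistically)?
    superior  -- label of the condition that looked better, if differ
    predicted -- label of the condition predicted to be superior
    """
    if not differ:
        return "negative: no documented difference"       # patterns (a), (c)
    if superior == predicted:
        return "positive: difference in predicted direction"
    return "negative: difference opposite to prediction"  # patterns (b), (d)

# Pattern (b): A > B although B was predicted to be superior to A
print(classify_outcome(differ=True, superior="A", predicted="B"))
# negative: difference opposite to prediction
```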
Negative Effects
Negative results/findings in SCD intervention research should be distinguished from negative effects in intervention research (i.e., iatrogenic effects). Some interventions may actually produce negative effects on participants (i.e., participants get worse or show negative side effects from an intervention); see, for example, Barlow (2010).
Selective Results
Selective results refer to the withholding of any findings in a single study or in a replication series (i.e., a series of single-case studies in which the treatment is replicated several times in independent experiments), and can be considered part of the domain of negative results.
Erroneous Results
Erroneous results have been considered in traditional “group” research in situations where various statistical tests are incorrectly conducted or interpreted to yield findings that are reported as statistically significant but are found not to be when the correct test or interpretation is applied (e.g., Levin, 1985). Also included in the erroneous results category are “spurious” findings that are produced in various research contexts.
Erin Barton
Example Negative Results Research
Applications of the WWC Standards in Literature Reviews
Toward a Professional Consensus on Using Single-Case Research to Identify Evidence-Based Practices
• Five studies documenting experimental control (i.e., MDS or MDSWR)
• Conducted by at least three research teams with no overlapping authorship, at three different institutions
• The combined number of cases totals at least 20
• Each study demonstrates an effect size of ___ ??
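The three quantitative thresholds above (five qualifying studies, three independent teams, twenty combined cases) lend themselves to a mechanical check. A sketch with an invented data shape; the effect-size criterion is omitted because the slide leaves it open:

```python
def meets_consensus_criteria(studies):
    """Check a body of single-case studies against the proposed thresholds:
    >= 5 studies rated MDS or MDSWR, >= 3 non-overlapping research teams,
    and >= 20 combined cases. The dict keys are invented for this example."""
    qualifying = [s for s in studies if s["rating"] in ("MDS", "MDSWR")]
    teams = {s["team"] for s in qualifying}
    cases = sum(s["cases"] for s in qualifying)
    return len(qualifying) >= 5 and len(teams) >= 3 and cases >= 20

# Hypothetical review: five qualifying studies, three teams, 20 total cases.
studies = [
    {"rating": "MDS",   "team": "Team 1", "cases": 4},
    {"rating": "MDS",   "team": "Team 2", "cases": 4},
    {"rating": "MDSWR", "team": "Team 3", "cases": 4},
    {"rating": "MDS",   "team": "Team 1", "cases": 4},
    {"rating": "MDSWR", "team": "Team 2", "cases": 4},
]
print(meets_consensus_criteria(studies))  # True
```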
Examples of using Single-Case Research to Document Evidence-Based Practice
A systematic evaluation of token economies as a classroom management tool for students with challenging behavior (Maggin, Chafouleas, Goddard, & Johnson, 2011)
• Studies documenting experimental control: n = 7/3 (MDS, student/classroom); 4/0 (MDSWR, student/classroom)
• At least three settings/scholars: yes
• At least 20 participants: no
Evidence criteria:
• Strong evidence: n = 1 at the student level; n = 3 at the classroom level
• Moderate evidence: n = 8 at the student level; n = 0 at the classroom level
• No evidence: n = 2 at the student level; n = 0 at the classroom level
Examples of using Single-Case Research to Document Evidence-Based Practice
An application of the What Works Clearinghouse Standards for evaluating single-subject research: Synthesis of the self-management literature base (Maggin, Briesch, & Chafouleas, 2013).
• Studies documenting experimental control: n = 37 (MDS); n = 31 (MDSWR)
• At least three settings/scholars: yes
• At least 20 participants: yes
Evidence criteria:
• Strong evidence: n = 25
• Moderate evidence: n = 30
• No evidence: n = 13
Implications for Single-Case Research
Role for single-case research in the development of PROGRAMS of intervention research:
• Iterative development of interventions
• Documentation of effective practices
• Documentation of modifications for weak and non-responders
Implications for Single-Case Research
Increase precision of research questions:
• Define the conceptual logic for the research question
• Define the research question with greater precision
  – Is the IV related to change in level, trend, or variability?
  – Example: “Is there a functional relation between self-management interventions and reduction in the level and variability of problem behavior?”
Measures:
• Define assumptions (e.g., distribution of counts)
Implications for Single-Case Research
Baseline:
• At least 5 data points
• Document the “problem” under study
• Document predictable patterns
Data points per phase:
• At least 5 points per phase (maybe more for some effect-size measures)
• More data points when the data indicate elevated trend and/or variability
Combination of visual and statistical analysis:
• Visual analysis confirmed with statistical analysis
Need for effect-size measures in single-case designs:
• For individual studies
• For meta-analyses
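One widely used nonoverlap effect size that serves both individual studies and meta-analyses is Nonoverlap of All Pairs (NAP; Parker & Vannest, 2009). A minimal sketch with invented data, assuming higher scores indicate improvement (reverse the comparison for behavior-reduction targets):

```python
def nap(baseline, intervention):
    """Nonoverlap of All Pairs: the proportion of all (A, B) point pairs
    in which the intervention point exceeds the baseline point, counting
    ties as half. Ranges from 0 to 1; 0.5 indicates chance-level overlap."""
    pairs = [(a, b) for a in baseline for b in intervention]
    wins = sum(1.0 if b > a else 0.5 if b == a else 0.0 for a, b in pairs)
    return wins / len(pairs)

print(nap([2, 3, 3, 4], [5, 6, 6, 7]))  # 1.0 -- complete nonoverlap
```

Because NAP compares every baseline point with every intervention point, it uses all the data in both phases, which is one reason the slides recommend at least five points per phase.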
In Summary
• Single-case methods are an effective and efficient approach for documenting experimental effects.
• A need exists for more precise standards for training in and use of visual analysis, and for combining visual analysis with statistical analysis.
• There are encouraging (but still emerging) approaches to statistical analysis that will improve meta-analysis options.
• More precision in review stipulations establishes expectations for reviewers.
References
Barlow, D. H. (2010). Negative effects from psychological treatments: A perspective. American Psychologist, 65, 13-20.
Brossart, D. F., Parker, R. I., Olson, E. A., & Mahadevan, L. (2006). The relationship between visual analysis and five statistical analyses in a simple AB single-case research design. Behavior Modification, 30, 531-563.
Hammond, D., & Gast, D. L. (2010). Descriptive analysis of single-subject research designs: 1983-2007. Education and Training in Autism and Developmental Disabilities, 45, 187-202.
Hartmann, D. P., Barrios, B. A., & Wood, D. D. (2004). Principles of behavioral observation. In S. N. Haynes & E. M. Heiby (Eds.), Comprehensive handbook of psychological assessment: Vol. 3. Behavioral assessment (pp. 108-127). New York: John Wiley & Sons.
Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single-subject research to identify evidence-based practice in special education. Exceptional Children, 71, 165-179.
Horner, R., & Spaulding, S. (2010). Single-case research designs. In N. J. Salkind (Ed.), Encyclopedia of research design (pp. 1386-1394). Thousand Oaks, CA: Sage Publications.
Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R. (2010). Single-case designs technical documentation. In What Works Clearinghouse: Procedures and standards handbook (version 2.0). Retrieved from What Works Clearinghouse website: http://ies.ed.gov/ncee/wwc/pdf/wwc_procedures_v2_standards_handbook.pdf
Maggin, D. M., Chafouleas, S. M., Goddard, K. M., & Johnson, A. H. (2011). A systematic evaluation of token economies as a classroom management tool for students with challenging behavior. Journal of School Psychology, 49, 529-554.
Maggin, D. M., Briesch, A. M., & Chafouleas, S. M. (2013). An application of the What Works Clearinghouse Standards for evaluating single-subject research: Synthesis of the self-management literature base. Remedial and Special Education, 34, 44-58.
Kratochwill, T. R., & Levin, J. R. (Eds.). (1992). Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ: Erlbaum.
Kratochwill, T. R., Stoiber, K. C., & Gutkin, T. B. (2000). Empirically supported interventions in school psychology: The role of negative results in outcome research. Psychology in the Schools, 37, 399-413.
Levin, J. R. (1985). Some methodological and statistical “bugs” in research on children’s learning. In M. Pressley & C. J. Brainerd (Eds.), Cognitive learning and memory in children (pp. 204–233), New York, NY: Springer-Verlag.
Odom, S. L., Brantlinger, E., Gersten, R., Horner, R. H., Thompson, B., & Harris, K. (2005). Research in special education: Scientific methods and evidence-based practices. Exceptional Children, 71, 137-148.
Parsonson, B., & Baer, D. (1978). The analysis and presentation of graphic data. In T. R. Kratochwill (Ed.), Single subject research (pp. 101-166). New York: Academic Press.
Reichow, B., Barton, E. E., Sewell, J. N., Good, L., & Wolery, M. (2010). Effects of weighted vests on the engagement of children with developmental delays and autism. Focus on Autism and Other Developmental Disabilities, 25, 3-11.
Shadish, W. R., & Sullivan, K. J. (2011). Characteristics of single-case designs used to assess treatment effects in 2008. Behavior Research Methods, 43, 971-980. doi:10.3758/s13428-011-0111-y
Smith, J. D. (2012). Single-case experimental designs: A systematic review of published research and current standards. Psychological Methods, 17, 510-550.
Sullivan, K. J., & Shadish, W. R. (2011). An assessment of single-case designs by the What Works Clearinghouse.
Wendt, O., & Miller, B. (2012). Quality appraisal of single-subject experimental designs: An overview and comparison of different appraisal tools. Education and Treatment of Children, 35, 235-268.