IES Summer Research Training Institute: Single-Case Intervention Design and Analysis
August 18-22, 2014
The Lowell Center
Madison, Wisconsin
Introductions and General Overview
• Institute Faculty, Participants, and the Institute of Education Sciences
• Assumptions/Goals and Objectives
• Schedule and Logistics
• Small Group Research Planning Sessions and Individual Consultation Sessions
• Resources on Single-Case Design and Analysis
• Follow-Up with Institute Faculty
Single-Case Design Institute Faculty
• Thomas R. Kratochwill, PhD (University of Wisconsin-Madison)
• Joel R. Levin, PhD (University of Arizona)
• John Ferron, PhD (University of South Florida)
• Erin Barton, PhD (Vanderbilt University)
• Wendy Machalicek, PhD (University of Oregon)
• William Shadish, PhD (University of California, Merced)
• Carnac the Magnificent Statistician, PhD, PsyD, JD, MD, DDS (but…ABD) (Psychic University)
Welcome Participants!
Schedule and Logistics
Agenda for the Institute (The Lowell Center)
• Breaks, Lunch, and Dinner
• Break-out Sessions
• Individual Project Consultations (see agenda schedule)
Small Group Research Planning Sessions and Consultation
Format for Small Group Activities
• Small Group Break-Out Rooms (Rooms B1B, 115, 116, and 117; reserved for the week)
• Graduate Student Assistants:
• Elizabeth Hagermoser, MS
• Megan Willes, MS
Schedule for Individual Project Consultations
Goals
Personal Design Consultation: Each participant leaves the Institute with a single-case research study (or program of study) that fits their grant/research agenda.
Group Activity: Each participant leaves the Institute with the experience of working in a collaborative group to build a single-case study and apply visual analysis, statistical analysis, and effect-size measures.
Personal Consultation
• 20-minute meetings with faculty (additional time available per schedule)
• Specify design or analysis questions
Product: Personal Research
• Research Question(s)
• Conceptual Model
• Single-Case Design and anticipated data
• Analysis plan
NOTE: Personal research plans are not required to be shared or disseminated. We want to respect the intellectual contributions of each scholar, yet provide a context in which each participant leaves the Institute with at least one new study that will meet WWC criteria.
Group Activity
Goal: Each participant leaves the Institute with the experience of working in a collaborative group to build a single-case study and apply visual analysis, statistical analysis, and effect-size measures.
Resources on Single-Case Design and Analysis
• Required Readings
• Recommended Readings
• Additional Resources
• Books and Monographs
• Faculty as a Resource
Follow up to Institute
Thomas R. Kratochwill, PhD
Wisconsin Center for Education Research
1025 West Johnson Street
University of Wisconsin-Madison
Madison, Wisconsin 53706
E-Mail: [email protected]
Single-Case Intervention Research Opportunities at the Institute of Education Sciences
Kristen Rhoads, PhD, Project Officer, National Center for Special Education Research
DAY 1
Logic and Foundations of Single-Case Intervention Research
Joel R. Levin and Thomas R. Kratochwill
University of Wisconsin-Madison
Joel Levin
Brief Review of the Principles of Scientifically Credible Intervention Research and Transition to Single-Case Intervention Research
Tom Kratochwill
Logic and Foundations of Single-Case Intervention Research
• Purposes and Fundamental Assumptions of Single-Case Intervention Research Methods
• Defining features of SCDs
• Core design types
• Internal validity and the role of replication
• Characteristics of Scientifically Credible Single-Case Intervention Studies
• "True" Single-Case Applications and the WWC Standards (design and evidence credibility)
• Classroom-Based Applications (design and evidence credibility)
Features of Single-Case Research Methods
Single-Case Research Will Have Four Features:
• Independent variable
• Dependent variable
• Focus on a functional relation (causal effect)
• Dimension(s) of predicted change (e.g., level, trend, variability, score overlap)
Additional Considerations
• Operational definition of the dependent variable (DV): The measure of the DV is valid, reliable, and addresses the dimension(s) of concern.
• Operational definition of the independent variable (IV): The core features of the IV are defined and, if necessary, measured to document fidelity (see Sanetti & Kratochwill, 2014).
• Unit of IV implementation: Group versus individual unit.
Background on Single-Case Designs: Defining Features
Design: Repeated measurement of an outcome before, during, and/or after active manipulation of the independent variable
Often Used in Applied and Clinical Fields
• Allows study of low-prevalence disorders that would otherwise need a large sample for statistical power (Odom et al., 2005).
• Sometimes more palatable to service providers because SCDs do not include a no-treatment comparison group.
• Hammond and Gast (2010): Descriptive analysis of "single subject" research designs, 1983-2007.
• Shadish and Sullivan (2011): Characteristics of single-case designs used to assess intervention effects in 2008.
Descriptive Analysis
Hammond and Gast (2010) reviewed 196 randomly identified journal issues (from 1983-2007) containing 1,936 articles; a total of 556 single-case designs were coded. Multiple baseline designs were reported more often than withdrawal designs, and these were most often reported across individuals and groups.
Research Currently Meeting WWC Design Standards
Sullivan and Shadish (2011) assessed the WWC pilot Standards related to implementation of the intervention, acceptable levels of observer agreement/reliability, opportunities to demonstrate a treatment effect, and acceptable numbers of data points in a phase. In studies published in 21 journals in 2008, they found that nearly 45% of the research met the strictest WWC design standards and 30% met them with reservations. Thus, in this sample of major journals that publish single-case intervention research, around 75% of the studies published during the sampling year would meet (or meet with reservations) the WWC design standards.
Types of Research Questions That Can Be Answered with Single-Case Designs
Evaluate Intervention Effects Relative to Baseline
• Does Multi-systemic Therapy reduce the level of problem behavior for students with emotional behavior disorders?
Compare Relative Effectiveness of Interventions
• Is "function-based behavior support" more effective than "non-function-based support" at reducing the level and variability of problem behavior for this participant?
Compare Single- and Multi-Component Interventions
• Does adding Performance Feedback to Basic Teacher Training improve the fidelity with which instructional skills are used by new teachers in the classroom?
Some Examples of SCD Research Questions that Might be Addressed
Is a certain teaching procedure functionally related to an increase in the level of social initiations by young children with autism?
Is time delay prompting or least-to-most prompting more effective in increasing the level of self-help skills performed by young children with severe intellectual disabilities?
Is the pacing of reading instruction functionally related to increased level and slope of reading performance (as measured by ORF) for third graders?
Is Adderall (at a clinically prescribed dosage) functionally related to an increased level of attention performance on the Attention Network Test for elementary-age students with Attention Deficit Disorder?
Single-Case Designs are Experimental Designs
• Like RCTs, the purpose is to document causal relationships
• Control for major threats to internal validity
• Document effects for specific individuals/settings
• Replication (across studies) is required to enhance external validity
• Can be distinguished from case studies
Single-Case Design Standards were Developed to Address Threats to Internal Validity (when the unit of analysis is the individual)
• Ambiguous Temporal Precedence
• Selection
• History
• Maturation
• Testing
• Instrumentation
• Additive and Interactive Effects of Threats
Additional Threats to Internal Validity May Occur When Clusters Are the Unit of Analysis
Examples:
• Cluster Selection
• Cluster Composition
• Within-Cluster Variability
• Attrition of Within-Cluster Participants and of Clusters
• Within-Cluster Extraneous Variables
• Across-Cluster Contagion Effects
Distinctions Between Experimental Single-Case Design and Clinical Case Study Research
Some Characteristics of Traditional Case Study Research
• Often characterized by narrative description of case, treatment, and outcome variables
• Typically lack a formal design with replication, but can involve a basic design format (e.g., A/B)
• Methods have been suggested to improve drawing valid inferences from case study research [e.g., Kazdin, A. E. (2011). Single-case research designs: Methods for clinical and applied settings (2nd ed.). New York: Oxford University Press]
Some Examples of Ways to Improve Case Study Research
• Type of data
• Assessment occasions
• Planned vs. ex post facto
• Projections of performance
• Treatment effect size
• Treatment effect impact
• Number of participants/replications
• Standardization of treatment
• Integrity of treatment
Wendy Machalicek
Constructing Single-Case Designs Based on Research Questions and Logic Models
Tom Kratochwill
Characteristics of Scientifically Credible Single-Case Intervention Studies Based on the WWC Pilot Standards
Current Needs in the Field
• Publication criteria for peer-reviewed journals: design, analysis, interpretation
• Grant review criteria (e.g., IES, NSF, NIMH/NIH): RFP stipulations, grant reviewer criteria
• Documentation of "evidence-based practices": professional agreement
• Training expectations for new scholars: visual analysis; statistical analysis
• Meta-analysis procedures that will allow single-case research content to reach broader audiences
Single-case researchers have a number of conceptual and methodological standards to guide their synthesis work. These standards, alternatively referred to as “guidelines,” have been developed by a number of professional organizations and authors interested primarily in providing guidance for reviewing the literature in a particular content domain (Smith, 2012; Wendt & Miller, 2012). The development of these standards has also provided researchers who are designing their own intervention studies with a protocol that is capable of meeting or exceeding the proposed standards.
Reviews of Appraisal Guidelines
Wendt and Miller (2012) identified seven “quality appraisal tools” and compared these standards to the single-case research criteria advanced by Horner et al. (2005). Smith (2012) reviewed research design and various methodological characteristics of single-case designs in peer-reviewed journals, primarily from the psychological literature (over the years 2000-2010). Based on his review, six standards for appraisal of the literature were identified (some of which overlap with the Wendt and Miller review).
Examples of Professional Groups with SCD Standards or Guidelines:
• National Reading Panel
• American Psychological Association (APA) Divisions 12/53
• American Psychological Association (APA) Division 16
• What Works Clearinghouse (WWC)
• Consolidated Standards of Reporting Trials (CONSORT) Guidelines for N-of-1 Trials (the CONSORT Extension for N-of-1 Trials [CENT])
Context
• Single-case methods developed and traditionally used within Applied Behavior Analysis
• Shavelson & Towne (2002)
• Claims that visual analysis is unreliable
• Emergence of "evidence-based" practices
• IES commitment to rigorous education research
Context: WWC White Paper
Single-Case Intervention Research Design Standards Panel
• Thomas R. Kratochwill, Chair (University of Wisconsin-Madison)
• John H. Hitchcock (Ohio University)
• Robert H. Horner (University of Oregon)
• Joel R. Levin (University of Arizona)
• Samuel M. Odom (University of North Carolina at Chapel Hill)
• David M. Rindskopf (City University of New York)
• William R. Shadish (University of California, Merced)
Available at: http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf
"True" Single-Case Applications and the WWC Standards
What Works Clearinghouse Standards
• Design Standards
• Evidence Criteria
• Social Validity
Joel Levin
Classroom-Based Applications of Single-Case Design

Tom Kratochwill
Issues to Consider in Selecting a Single-Case Intervention Design
WWC Design Standards
Evaluating the Quality of Single-Case Designs
• Evaluate the Design: Meets Design Standards / Meets with Reservations / Does Not Meet Design Standards
• Evaluate the Evidence: Strong Evidence / Moderate Evidence / No Evidence
• Effect-Size Estimation
• Social Validity Assessment
WWC Single-Case Pilot Design Standards
Four Standards for Design Evaluation:
1. Systematic manipulation of the independent variable
2. Inter-assessor agreement
3. Three attempts to demonstrate an effect at three different points in time
4. Minimum number of phases and data points per phase, for phases used to demonstrate an effect
Standard 3 Differs by Design Type:
• Reversal/Withdrawal Designs (ABAB and variations)
• Alternating Treatments Designs
• Multiple Baseline Designs
Standard 1: Systematic Manipulation of the Independent Variable
Researcher Must Determine When and How the Independent Variable Conditions Change.
If Standard Is Not Met, Study Does Not Meet Design Standards.
Examples of Manipulation that is Not Systematic
Teacher/Consultee Begins to Implement an Intervention Prematurely Because of Parent Pressure.
Researcher Looks Retrospectively at Data Collected during an Intervention Program.
Standard 2: Inter-Assessor Agreement
Each outcome variable for each case must be measured systematically by more than one assessor.
Researcher needs to collect inter-assessor agreement:
• In each phase
• On at least 20% of the data points in each condition (i.e., baseline, intervention)
Rate of agreement must meet minimum thresholds (e.g., 80% agreement or a Cohen's kappa of 0.60).
If no outcomes meet these criteria, the study Does Not Meet Design Standards.
In Current WWC Reviews: Author Queries Occur When a Study Provides Insufficient IOA Information
• Determine whether the standard is met based on the response. If the result of the query indicates that the study does not meet standards, treat it as such.
• If no response, assume the standard is met if:
• The minimum level of agreement is reached.
• The study assesses IOA at least once in each phase.
• The study assesses IOA on at least 20% of all sessions.
• A footnote is added to the WWC product indicating that IOA was not fully determined.
Standard 3: Three Attempts to Demonstrate an Intervention Effect at Three Different Points in Time
"Attempts" are about phase transitions.
Designs that could meet this standard include:
• ABAB design
• Multiple baseline design with three baseline phases and staggered introduction of the intervention
• Alternating treatment design (other designs to be discussed during the Institute)
Designs not meeting this standard include:
• AB design
• ABA design
• Multiple baseline design with three baseline phases and the intervention introduced at the same time for each case
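Because "attempts" correspond to phase transitions, a design's phase sequence can be screened against Standard 3 by counting its transitions. A minimal sketch in Python (the function name and design encoding are illustrative, not part of the WWC materials):

```python
def count_effect_attempts(phases):
    """Count phase transitions, i.e., attempts to demonstrate an effect.

    `phases` is a sequence of condition labels in temporal order,
    e.g., ["A", "B", "A", "B"] for an ABAB design.
    """
    return sum(1 for prev, cur in zip(phases, phases[1:]) if prev != cur)

# ABAB offers three attempts at three different points in time,
# whereas AB and ABA offer only one and two, respectively.
print(count_effect_attempts(["A", "B", "A", "B"]))  # 3
print(count_effect_attempts(["A", "B"]))            # 1
print(count_effect_attempts(["A", "B", "A"]))       # 2
```

This check screens only the number of attempts; whether each attempt actually demonstrates a basic effect is determined by visual analysis of the data.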
Standard 4: Minimum Number of Phases and Data Points per Phase (for Phases Used in Standard 3)

To Meet Standards:
• Reversal design: at least 4 phases, with at least 5 data points per phase
• Multiple baseline design: at least 6 phases, with at least 5 data points per phase
• Alternating treatment design: at most 2 data points per phase, with at least 5 data points per condition

To Meet Standards with Reservations:
• Reversal design: at least 4 phases, with at least 3 data points per phase
• Multiple baseline design: at least 6 phases, with at least 3 data points per phase
• Alternating treatment design: at most 2 data points per phase, with at least 4 data points per condition
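The minimums above amount to a mechanical decision rule. The sketch below encodes Standard 4 only, assuming phase lengths (and, for ATDs, per-condition totals) have already been tallied; the function name and design labels are illustrative:

```python
def rate_standard_4(design, points_per_phase, points_per_condition=None):
    """Rate a design against the WWC Standard 4 minimums.

    design: "reversal", "mb" (multiple baseline), or "atd"
    points_per_phase: list of data-point counts, one per phase
    points_per_condition: for ATDs, list of counts per condition
    Returns "Meets", "Meets with Reservations", or "Does Not Meet".
    """
    if design in ("reversal", "mb"):
        min_phases = 4 if design == "reversal" else 6
        if len(points_per_phase) < min_phases:
            return "Does Not Meet"
        fewest = min(points_per_phase)
        if fewest >= 5:
            return "Meets"
        if fewest >= 3:
            return "Meets with Reservations"
        return "Does Not Meet"
    if design == "atd":
        if max(points_per_phase) > 2:   # at most 2 data points per phase
            return "Does Not Meet"
        fewest = min(points_per_condition)
        if fewest >= 5:
            return "Meets"
        if fewest >= 4:
            return "Meets with Reservations"
        return "Does Not Meet"
    raise ValueError("unknown design type")

# An ABAB design with 5, 5, 5, and 6 data points meets the standard;
# with only 3 data points in one phase it meets with reservations.
print(rate_standard_4("reversal", [5, 5, 5, 6]))  # Meets
print(rate_standard_4("reversal", [5, 3, 5, 6]))  # Meets with Reservations
```

An overall design rating would combine this with Standards 1-3 (systematic manipulation, IOA, and three attempts at three different points in time).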
Some Examples that "Meet," "Meet with Reservations," and "Do Not Meet" Design Standards
Meets Design Standards with Reservations (MDSWR)
• 3 attempts at 3 different points in time
• 4 phases with at least 3 data points per phase
Adapted from Horner and Spaulding, 2010
[Figure: Proportion of 10-second intervals academically engaged (y-axis, 0-90) across days (x-axis) for Adam; phases: Baseline 1, Intervention 1 (Peer Tutoring), Baseline 2, Intervention 2 (Peer Tutoring), with the second and third demonstrations of effect annotated.]
Meets Design Standards (MDS)
• 3 attempts at 3 different points in time
• 6 phases with at least 5 data points per phase
Source: Kern et al., 1994
Ratings Differ by Research Question with ATDs; For Example:
• MDSWR: Intervention 1 vs. Intervention 2
• Does Not Meet Design Standards: Intervention 1 vs. Intervention 3
• Does Not Meet Design Standards: Intervention 2 vs. Intervention 3
Source: Horner and Spaulding, 2010
[Figure: Rate of problem behavior (y-axis) across sessions (x-axis) for Interventions 1, 2, and 3 in an alternating treatment design.]
Extensions of Core Designs (To Be Discussed on Day 2 of the Institute)
Changing Criterion Designs
• Researcher pre-schedules changes in the intervention criterion or intensity of the intervention
• Can meet evidence standards with at least 3 criterion shifts (for Standard 3)
Non-concurrent Multiple Baseline
• Completely non-concurrent MBDs have baselines that do not overlap when examined vertically
• Designs with no vertical overlap at baseline do not meet design standards because of the history threat
Multiple Probe
• Multiple Probe (Days)
• Multiple Probe (Conditions)
Design Evaluation
Meets Design Standards:
• IV manipulated directly
• IOA documented (e.g., 80% agreement; Cohen's kappa of .60) on 20% of data points in each phase
• Design allows the opportunity to assess the basic effect at three different points in time
• Five data points per phase (or design equivalent); ATD: four-comparison option
Meets Design Standards with Reservations:
• All of the above, except at least three data points per phase
Does Not Meet Design Standards
Wendy Machalicek and Tom Kratochwill
Overview of Basic and Advanced Single-Case Intervention Designs
Establishing “Design Standards” as Applied to Basic Single-Case Designs
ABAB Designs
Multiple Baseline Designs
Alternating Treatment Designs
ABAB Design Description
Simple phase change designs [e.g., ABAB; BCBC design]. (In the literature, ABAB designs are sometimes referred to as withdrawal designs, intrasubject replication designs, within-series designs, or reversal designs)
ABAB Reversal/Withdrawal Designs
In these designs, estimates of level, trend, and variability within a data series are assessed under similar conditions; the manipulated variable is introduced and concomitant changes in the outcome measure(s) are assessed in the level, trend, and variability between phases of the series, with special attention to the degree of overlap, immediacy of effect, and similarity of data patterns across similar phases (e.g., all baseline phases).
[Figure: ABAB design; dependent variable (y-axis) across days/weeks/months/sessions (x-axis); phases: Baseline (A1), Intervention (B1), Baseline (A2), Intervention (B2).]
ABAB Reversal/Withdrawal Designs
Some Design Limitations:
• Behavior must be reversible in the ABAB… series (e.g., return to baseline).
• There may be ethical issues involved in reversing behavior back to baseline (A2).
• The study may become complex when multiple conditions need to be compared.
• There may be order effects in the design.
Multiple Baseline Design Description
Multiple baseline design. The design can be applied across units (participants), across behaviors, or across situations.
Multiple Baseline Designs
In these designs, multiple AB data series are compared and introduction of the intervention is staggered across time. Comparisons are made both between and within a data series. Repetitions of a single simple phase change are scheduled, each with a new series and in which both the length and timing of the phase change differ across replications.
[Figure: Multiple baseline design across Participants 1-4; dependent variable (y-axis) across days/weeks/months/sessions (x-axis); the intervention is introduced in staggered fashion following baseline in each series.]
Multiple Baseline Design
Some Design Limitations:
• The design is generally limited to demonstrating the effect of one independent variable on some outcome.
• The design depends on the "independence" of the multiple baselines (across units, settings, and behaviors).
• There can be practical as well as ethical issues in keeping individuals on baseline for long periods of time (as in the last series).
Alternating Treatment Designs
Alternating treatments (in the behavior analysis literature, alternating treatment designs are sometimes referred to as part of a class of multi-element designs)
Alternating Treatment Design Description
In these designs, estimates of level, trend, and variability in a data series are assessed on measures within specific conditions and across time. Changes/differences in the outcome measure(s) are assessed by comparing the series associated with different conditions.
[Figure: Alternating treatment design; dependent variable (y-axis) across days/sessions (x-axis); a baseline phase followed by rapidly alternating Interventions 1 and 2, plotted as separate series.]
Alternating Treatment Design
Some Design Limitations:
• Behavior must reverse during alternation of the interventions.
• There is the possibility of interaction/carryover effects as conditions are alternated.
• Comparing more than three treatments may be very challenging.
Application of the WWC Design Standards to Single-Case Designs
When Assessing Design Standards
Does the design allow for the opportunity to assess experimental control?
• Baseline
• At least five data points per phase (3 with reservations)
• Opportunity to document at least 3 basic effects, each at a different point in time
Basic Effect versus Experimental Control
Basic Effect: Change in the pattern of responding after manipulation of the independent variable (level, trend, variability).
Experimental Control: At least three demonstrations of basic effect, each at a different point in time.
[Figure annotations: first, second, and third demonstrations of the basic effect, each following an introduction of Intervention X.]
1. Baseline.
2. Each phase has at least 5 data points (3 with reservations).
3. Design allows for assessment of a "basic effect" at three different points in time.
Design Standards Exercise 1: Rate the Quality of the Following Single-Case Designs
For each of the following graphs, note:
1) The design type
2) The highest possible rating (Meets Standards, Meets Standards with Reservations, Does Not Meet Standards)
Example 1
Source: Dunlap et al., 1994

Example 2
Source: Cunningham et al., 1998

Example 3
WCPM = words read correctly per minute
Source: Begeny, J. C., Daly III, E. J., & Valleley, R. J. (2006)

Example 4
Source: Ingram et al., 2005

Example 5
Source: Todd et al., 1999
Study Design Exercise 2: Inter-Observer Agreement
For each example, note the following about IOA collection:
1) Collected for each case on each outcome variable?
2) Collected at least once in all phases?
3) Collected on at least 20% of the baseline sessions and on at least 20% of the intervention sessions?
4) Does IOA meet minimum acceptable values for each case on each outcome?
March and Horner (2002)
Note: Problem behavior and academic engagement meet protocol screening requirements. Participants are Andy, Bill, and Cathy.
Interobserver agreement data were collected for problem behavior and academic engagement on at least 22% of observation periods for each phase for each participant. Two independent observers, using synchronized earphones to match observation intervals, monitored the behavior of a student. Interobserver agreement was calculated on an interval-by-interval basis by dividing the number of intervals with perfect agreement by the total number of intervals observed and multiplying by 100%. In addition, kappa was computed to assess reliability when chance agreement was controlled. Interobserver agreements for problem behavior for Andy, Bill, and Cathy were 87%, 80%, and 83%, respectively; corresponding kappa scores were .60, .48, and .49. Interobserver agreements for academic engagement for Andy, Bill, and Cathy were 82%, 87%, and 88%, respectively, with kappa scores of .51, .59, and .58.
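The interval-by-interval agreement and kappa calculations described in this passage can be reproduced in a few lines. A sketch with hypothetical observer records (the study's actual data are not available here; function names are illustrative):

```python
def interval_agreement(obs1, obs2):
    """Interval-by-interval percent agreement between two observers.

    obs1 and obs2 are equal-length sequences of interval codes
    (e.g., 1 = behavior occurred, 0 = did not occur).
    """
    agreements = sum(a == b for a, b in zip(obs1, obs2))
    return 100.0 * agreements / len(obs1)

def cohens_kappa(obs1, obs2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(obs1)
    p_obs = sum(a == b for a, b in zip(obs1, obs2)) / n
    p1 = sum(obs1) / n   # proportion of intervals observer 1 scored 1
    p2 = sum(obs2) / n   # proportion of intervals observer 2 scored 1
    p_chance = p1 * p2 + (1 - p1) * (1 - p2)
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical 10-interval record from two observers.
o1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
o2 = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(interval_agreement(o1, o2))      # 80.0
print(round(cohens_kappa(o1, o2), 2))  # 0.6
```

Note how 80% agreement can correspond to a kappa at, or below, the 0.60 threshold once chance agreement is removed, which is why the WWC thresholds name both indices.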
Kamps, Wendland, and Culpepper (2006)
Note: Student disruptive behaviors and the percentage of time on task meet protocol screening requirements.
Reliability data for student and teacher behaviors were collected across 14 sessions for Patricia (19%), and across 2 sessions for Michael (8%). For disruptive behaviors, the mean percentage agreement across conditions was 92% (59–100), for academic compliance 94% (69–100), for praise 84% (0–100), and for reprimands 93% (0–100). Low rates reflected low incidence during sessions. On-task reliability was collected for six sessions (8%) for Patricia (M = 93%) and two sessions (8%) for Michael (M = 96%).
Randomized Single-Case Intervention Designs
Joel Levin
Visual Analysis of Single-Case Intervention Data
Tom Kratochwill
WWC Standards
Evaluating Single-Case Design Outcomes with Visual Analysis: Evidence Criteria
• Evaluate the Design: Meets Design Standards / Meets with Reservations / Does Not Meet Design Standards
• Evaluate the Evidence: Strong Evidence / Moderate Evidence / No Evidence
• Effect-Size Estimation
• Social Validity Assessment
Visual Analysis of Single-Case Evidence
Traditional Method of Data Evaluation for SCDs
• Determine whether evidence of a causal relation exists
• Characterize the strength or magnitude of that relation
• Singular approach used by the WWC for rating SCD evidence
Methods for Effect-Size Estimation
• Several parametric and non-parametric methods have been proposed
• Some SCD WWC panel members are among those developing these methods, but the methods are still being tested and most are not comparable with group-comparison studies
• WWC standards for effect size are being developed as the field reaches greater consensus on appropriate statistical approaches
Goal, Rationale, Advantages, and Limitations of Visual Analysis
Goal Is to Identify Intervention Effects
• A basic effect is a change in the dependent variable in response to researcher manipulation of the independent variable.
• "Subjective" determination of evidence, but practice and a common framework for applying visual analysis can help to improve agreement rates.
• Evidence criteria are met by examining effects that are replicated at different points.
Encourages Focus on Interventions with Strong Effects
• Strong effects are generally desired by applied researchers and clinicians.
• Weak results are filtered out because effects should be clear from looking at the data (viewed as an advantage).
• Statistical evaluation can, however, be more sensitive than visual analysis in detecting intervention effects.
Goal, Rationale, Advantages, Limitations (cont'd)
Statistical Evaluation and Visual Analysis Have Some Conceptual Similarities (Kazdin, 2011): Both attempt to avoid Type I and Type II errors.
• Type I: Concluding the intervention produced an effect when it did not
• Type II: Concluding the intervention did not produce an effect when it did
Possible Limitations of Visual Analysis
• Lack of concrete decision-making rules (e.g., in contrast to the p < .05 criterion used in statistical analysis)
• Multiple influences need to be analyzed simultaneously
Multiple Influences Need to Be Considered in Applying Visual Analysis
• Level: Mean of the data series within a phase
• Trend: Slope of the best-fit line within a phase
• Variability: Deviation of data around the best-fit line
• Percentage of Overlap: Percentage of data from an intervention phase that enters the range of data from the previous phase
• Immediacy: Magnitude of change between the last 3 data points in one phase and the first 3 in the next
• Consistency: Extent to which data patterns are similar in similar phases
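For a pair of adjacent phases, the first five of these features can be quantified directly from the plotted data. A minimal sketch using hypothetical data and an ordinary least-squares trend line (function names are illustrative; visual analysts weigh these features jointly rather than computing them mechanically):

```python
def phase_features(phase):
    """Level, trend, and variability for one phase of data."""
    n = len(phase)
    xs = range(n)
    level = sum(phase) / n                 # mean of the series
    x_mean = (n - 1) / 2
    # Slope of the ordinary least-squares best-fit line.
    trend = (sum((x - x_mean) * (y - level) for x, y in zip(xs, phase))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = level - trend * x_mean
    # Mean absolute deviation of the data around the best-fit line.
    variability = sum(abs(y - (intercept + trend * x))
                      for x, y in zip(xs, phase)) / n
    return level, trend, variability

def overlap_percentage(prev_phase, next_phase):
    """Percentage of next-phase points falling in the previous phase's range."""
    lo, hi = min(prev_phase), max(prev_phase)
    inside = sum(lo <= y <= hi for y in next_phase)
    return 100.0 * inside / len(next_phase)

def immediacy(prev_phase, next_phase):
    """Change between the mean of the last 3 points of one phase
    and the mean of the first 3 points of the next."""
    return sum(next_phase[:3]) / 3 - sum(prev_phase[-3:]) / 3

baseline = [12, 14, 13, 15, 14]   # hypothetical problem-behavior rates
intervention = [8, 6, 5, 4, 3]
print(phase_features(baseline))
print(overlap_percentage(baseline, intervention))  # 0.0 (no overlap)
print(immediacy(baseline, intervention))           # negative: immediate drop
```

Consistency across similar phases (the sixth feature) requires comparing these summaries across all phases of the same condition, not just one adjacent pair.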
Research on Visual Analysis
Applied Outcome Criteria and Visual Analysis
Decision Criteria in Visual Analysis
Standards for Visual Analysis
Research on Visual Analysis
Research on visual analysis contains a number of methodological limitations. These limitations were recognized by Brossart et al. (2006, p. 536), who offered the following recommendations for improving visual-analysis research:
• Graphs should be fully contextualized, describing a particular client, target behavior(s), time frame, and data-collection instrument.
• Judges should not be asked to predict the degree of statistical significance (i.e., a significance probability p-value) of a particular statistic, but rather should be asked to judge graphs according to their own criteria of practical importance, effect, or impact.
Research on Visual Analysis (cont'd)
• Judges should not be asked to make dichotomous yes/no decisions, but rather to judge the extent or amount of intervention effectiveness.
• No single statistical test should be selected as “the valid criterion”; rather, several optional statistical tests should be tentatively compared to the visual analyst’s judgments.
• Only graphs of complete SCD studies should be examined (e.g., ABAB, Alternating Treatment, and Multiple-Baseline Designs).
Some Recent Research Findings
• Lieberman, R. G., Yoder, P. J., Reichow, B., & Wolery, M. (2010). Visual analysis of multiple baseline across participants graphs when change is delayed. School Psychology Quarterly, 25, 28-44.
• Kahng, S. W., Chung, K-M., Gutshall, K., Pitts, S. C., Kao, J., & Girolami, K. (2010). Consistent visual analysis of intrasubject data. Journal of Applied Behavior Analysis, 43, 35-45.
Lieberman, Yoder, Reichow, and Wolery (2010) tested various characteristics of multiple-baseline designs to determine whether the data features affected the judgments of visual-analysis experts (N = 36 editorial board members of journals that publish SCDs) regarding the presence of a functional relation and agreement on the outcomes. Graphs with steep slopes (versus shallow slopes) when the intervention was introduced were more often judged as showing a functional relation. Nevertheless, there was still some disagreement on whether the functional relation had been established. Lieberman et al. (2010) noted that training visual judges to address conditions in which change occurs long after the intervention, and in which there is inconsistent latency of change across units, may help improve reviewers' concurrence about a functional relation.
Kahng, Chung, Gutshall, Pitts, Kao, and Girolami (2010) replicated and extended earlier research on visual analysis by including editorial board members of the Journal of Applied Behavior Analysis as participants in the study. Board members were asked to judge 36 ABAB design graphs on a 100-point scale while rating the degree of experimental control. These authors reported high levels of agreement among judges, noting that the reliability of visual analysis has improved over the years, due in part to better training in visual-analysis methods.
Erin Barton
Training Protocols in Visual Analysis
Overview of Visual Analysis of Single-Case Data
Parameters Associated with Visual Analysis
• Four steps in visual analysis
• Six features considered in visual analysis
• Additional considerations for MBL and ATD
Concerns about Visual Analysis
• Evaluate the Design: Meets Design Standards / Meets with Reservations / Does Not Meet Design Standards
• Evaluate the Evidence: Strong Evidence / Moderate Evidence / No Evidence
• Effect-Size Estimation
• Social Validity Assessment
Evidence Criteria
Strong
• Baseline: Documentation of the research-question "problem" and of a predictable pattern (at least 5 data points)
• Each phase of the analysis: Documentation of a predictable pattern (at least 5 data points)
• Basic effects: Documentation of the predicted change in the DV when the IV is manipulated
• Experimental control: Three demonstrations of the basic effect, each at a different point in time
• No demonstrations of intervention failure
Evidence Criteria
Moderate
• All of the "Strong" criteria, with these exceptions:
• Only 3-4 data points per phase
• Three demonstrations of effect, but with additional demonstrations of failure to document an effect
No Evidence (the label is a misnomer)
• Evidence does not meet the Moderate level.
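Taken together, the Strong/Moderate/No Evidence criteria form a decision rule that is applied after visual analysis has counted demonstrations of the basic effect. A sketch of that rule as summarized above (the inputs and function name are illustrative, and assume the design already meets design standards):

```python
def rate_evidence(n_demonstrations, n_failures, min_points_per_phase):
    """Map visual-analysis results onto the WWC evidence categories.

    n_demonstrations: demonstrations of the basic effect, each at a
        different point in time
    n_failures: demonstrations of intervention failure
    min_points_per_phase: smallest number of data points in any phase
    """
    if n_demonstrations < 3 or min_points_per_phase < 3:
        return "No Evidence"
    if n_failures == 0 and min_points_per_phase >= 5:
        return "Strong Evidence"
    # Either 3-4 points in some phase, or failures alongside 3 demonstrations.
    return "Moderate Evidence"

print(rate_evidence(3, 0, 5))  # Strong Evidence
print(rate_evidence(3, 1, 5))  # Moderate Evidence
print(rate_evidence(2, 0, 5))  # No Evidence
```

As the slides note, "No Evidence" is a rating category rather than a claim that the study contains no information at all.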
Visual Analysis
Baseline
• Document the "problem" requiring intervention
• Typically 5 or more data points
• Documentation of a pattern of responding that allows prediction into the future
Each Phase
• Documents a clear pattern of responding
• Typically 5 or more data points
Adjacent Phases
• Do the data document a "basic effect"?
Whole Study
• Do the phases document experimental control (e.g., at least three demonstrations of a basic effect, each at a different point in time)?
Documenting Experimental Control
• Three demonstrations of a “basic effect,” each at a different point in time.
• A “basic effect” is a predicted change in the dependent variable when the independent variable is actively manipulated.
• To assess a “basic effect,” visual analysis includes simultaneous assessment of level, trend, variability, immediacy of effect, overlap across adjacent phases, and consistency of data patterns in similar phases.
(Parsonson & Baer, 1978; Kratochwill & Levin, 1992)
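Each of these features can be quantified in simple ways. The sketch below uses hypothetical AB-phase data and deliberately simple operationalizations (mean for level, OLS slope for trend, range for variability); it illustrates the ideas rather than any standardized procedure, and all data values are invented.

```python
# Hypothetical AB data for a behavior targeted for reduction.
baseline = [8, 7, 9, 8, 7]       # phase A observations
intervention = [4, 3, 3, 2, 2]   # phase B observations

def level(phase):
    """Level: central tendency (here, the mean) of a phase."""
    return sum(phase) / len(phase)

def trend(phase):
    """Trend: slope of an ordinary least-squares line through the phase."""
    n = len(phase)
    x_bar, y_bar = (n - 1) / 2, sum(phase) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(phase))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den

def variability(phase):
    """Variability: range of scores within the phase."""
    return max(phase) - min(phase)

def overlap(a, b):
    """Overlap: proportion of B points falling within the range of A."""
    lo, hi = min(a), max(a)
    return sum(lo <= y <= hi for y in b) / len(b)

def immediacy(a, b, k=3):
    """Immediacy of effect: mean of the first k B points minus the mean of
    the last k A points (negative = immediate drop)."""
    return sum(b[:k]) / k - sum(a[-k:]) / k

print(level(intervention) - level(baseline))   # change in level
print(overlap(baseline, intervention))         # 0.0 -- no overlap
```

Consistency of data patterns across similar phases would require a design with repeated A and B phases (e.g., ABAB), so it is omitted here.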
Visual Analysis within Single-Case Design
Assessing the within-phase “pattern” and the between-phase “basic effect”:
• Within phase: level, trend, variability
• Between phases: overlap, immediacy of effect, consistency across similar phases
• Other: vertical analysis; intercept gap
Special Topics in Single-Case Intervention Research
Tom Kratochwill and Erin Barton
Overview of Special Topics
• Negative Results
• Effect Size Applications
• Applications of the WWC Standards in Literature Reviews
Negative Results in Single-Case Intervention Research
Negative Results in Single-Case Intervention Research
• The Legacy of Negative Results and its Relationship to Publication Bias
• The Importance of Negative Results in Developing Evidence-Based Practices (Kratochwill, Stoiber, & Gutkin, 2000)
• Negative Results in Single-Case Intervention Research
• Examples using the WWC Standards
Negative Results Definition
The term negative results traditionally has meant that there are either: (a) no statistically significant differences between groups that receive different intervention conditions in randomized controlled trials; or (b) no documented differences (visually and/or statistically) between baseline and intervention conditions in experimental single-case designs.
Negative Results in Single-Case Design
In the domain of SCD research, negative results reflect findings of (a) no difference between baseline (A) and intervention (B) phases (A = B), (b) a difference between baseline and intervention phases but in the opposite direction to what was predicted (A > B, where B was predicted to be superior to A), (c) no difference between two alternative interventions, B and C (B = C), or (d) a difference between two alternative interventions, but in the direction opposite to what was predicted (B > C, where C was predicted to be superior to B).
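These four patterns amount to a two-factor decision rule: was a difference documented, and was it in the predicted direction? A minimal sketch, with invented labels and argument names (real studies would document the difference visually and/or statistically, not with a bare flag):

```python
def classify_outcome(differ, superior=None, predicted=None):
    """Classify a two-condition comparison against the prediction.

    differ    -- was any difference documented (visually/statistically)?
    superior  -- label of the condition that looked better, if differ
    predicted -- label of the condition predicted to be superior
    """
    if not differ:
        return "negative: no documented difference"       # patterns (a), (c)
    if superior == predicted:
        return "positive: difference in predicted direction"
    return "negative: difference opposite to prediction"  # patterns (b), (d)

# Pattern (b): A > B although B was predicted to be superior to A
print(classify_outcome(differ=True, superior="A", predicted="B"))
# negative: difference opposite to prediction
```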
Negative Effects
Negative results/findings in SCD intervention research should be distinguished from negative effects in intervention research (i.e., iatrogenic effects). Some interventions may actually produce negative effects on participants (i.e., participants get worse or show negative side effects from an intervention); see, for example, Barlow (2010).
Selective Results
Selective results refer to the withholding of any findings in a single study or in a replication series (i.e., a series of single-case studies in which the treatment is replicated several times in independent experiments), and can be considered part of the domain of negative results.
Erroneous Results
Erroneous results have been considered in traditional “group” research in situations where various statistical tests are incorrectly conducted or interpreted to yield findings that are reported as statistically significant but are found not to be when the correct test or interpretation is applied (e.g., Levin, 1985). Also included in the erroneous results category are “spurious” findings that are produced in various research contexts.
Erin Barton
Example Negative Results Research
Applications of the WWC Standards in Literature Reviews
Toward a Professional Consensus on Using Single-Case Research to Identify Evidence-Based Practices
• Five studies documenting experimental control (i.e., MDS or MDSWR)
• Conducted by at least three research teams with no overlapping authorship, at three different institutions
• The combined number of cases totals at least 20
• Each study demonstrates an effect size of ___ ??
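The three quantitative thresholds above (five qualifying studies, three independent teams, twenty combined cases) lend themselves to a mechanical check. A sketch with an invented data shape; the effect-size criterion is omitted because the slide leaves it open:

```python
def meets_consensus_criteria(studies):
    """Check a body of single-case studies against the proposed thresholds:
    >= 5 studies rated MDS or MDSWR, >= 3 non-overlapping research teams,
    and >= 20 combined cases. The dict keys are invented for this example."""
    qualifying = [s for s in studies if s["rating"] in ("MDS", "MDSWR")]
    teams = {s["team"] for s in qualifying}
    cases = sum(s["cases"] for s in qualifying)
    return len(qualifying) >= 5 and len(teams) >= 3 and cases >= 20

# Hypothetical review: five qualifying studies, three teams, 20 total cases.
studies = [
    {"rating": "MDS",   "team": "Team 1", "cases": 4},
    {"rating": "MDS",   "team": "Team 2", "cases": 4},
    {"rating": "MDSWR", "team": "Team 3", "cases": 4},
    {"rating": "MDS",   "team": "Team 1", "cases": 4},
    {"rating": "MDSWR", "team": "Team 2", "cases": 4},
]
print(meets_consensus_criteria(studies))  # True
```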
Examples of using Single-Case Research to Document Evidence-Based Practice
A systematic evaluation of token economies as a classroom management tool for students with challenging behavior (Maggin, Chafouleas, Goddard, & Johnson, 2011)
• Studies documenting experimental control: n = 7/3 (MDS, student/classroom); 4/0 (MDSWR, student/classroom)
• At least three settings/scholars: yes
• At least 20 participants: no
Evidence criteria:
• Strong evidence: n = 1 at the student level; n = 3 at the classroom level
• Moderate evidence: n = 8 at the student level; n = 0 at the classroom level
• No evidence: n = 2 at the student level; n = 0 at the classroom level
Examples of using Single-Case Research to Document Evidence-Based Practice
An application of the What Works Clearinghouse Standards for evaluating single-subject research: Synthesis of the self-management literature base (Maggin, Briesch, & Chafouleas, 2013).
• Studies documenting experimental control: n = 37 (MDS); n = 31 (MDSWR)
• At least three settings/scholars: yes
• At least 20 participants: yes
Evidence criteria:
• Strong evidence: n = 25
• Moderate evidence: n = 30
• No evidence: n = 13
Implications for Single-Case Research
Role for single-case research in the development of PROGRAMS of intervention research:
• Iterative development of interventions
• Documentation of effective practices
• Documentation of modifications for weak and non-responders
Implications for Single-Case Research
Increase precision of research questions:
• Define the conceptual logic for the research question
• Define the research question with greater precision
  – Is the IV related to change in level, trend, or variability?
  – Example: “Is there a functional relation between self-management interventions and reduction in the level and variability of problem behavior?”
Measures:
• Define assumptions (e.g., distribution of counts)
Implications for Single-Case Research
Baseline:
• At least 5 data points
• Document the “problem” under study
• Document predictable patterns
Data points per phase:
• At least 5 points per phase (maybe more for some effect-size measures)
• More data points when the data indicate elevated trend and/or variability
Combination of visual and statistical analysis:
• Visual analysis confirmed with statistical analysis
Need for effect-size measures in single-case designs:
• For individual studies
• For meta-analyses
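One widely used nonoverlap effect size that serves both individual studies and meta-analyses is Nonoverlap of All Pairs (NAP; Parker & Vannest, 2009). A minimal sketch with invented data, assuming higher scores indicate improvement (reverse the comparison for behavior-reduction targets):

```python
def nap(baseline, intervention):
    """Nonoverlap of All Pairs: the proportion of all (A, B) point pairs
    in which the intervention point exceeds the baseline point, counting
    ties as half. Ranges from 0 to 1; 0.5 indicates chance-level overlap."""
    pairs = [(a, b) for a in baseline for b in intervention]
    wins = sum(1.0 if b > a else 0.5 if b == a else 0.0 for a, b in pairs)
    return wins / len(pairs)

print(nap([2, 3, 3, 4], [5, 6, 6, 7]))  # 1.0 -- complete nonoverlap
```

Because NAP compares every baseline point with every intervention point, it uses all the data in both phases, which is one reason the slides recommend at least five points per phase.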
In Summary
• Single-case methods are an effective and efficient approach for documenting experimental effects.
• A need exists for more precise standards for training in and use of visual analysis, and for combining visual analysis with statistical analysis.
• There are encouraging (but still emerging) approaches to statistical analysis that will improve meta-analysis options.
• More precision in review stipulations establishes expectations for reviewers.
References
Barlow, D. H. (2010). Negative effects from psychological treatments: A perspective. American Psychologist, 65, 13-20.
Brossart, D. F., Parker, R. I., Olson, E. A., & Mahadevan, L. (2006). The relationship between visual analysis and five statistical analyses in a simple AB single-case research design. Behavior Modification, 30, 531-563.
Hammond, D., & Gast, D. L. (2010). Descriptive analysis of single-subject research designs: 1983-2007. Education and Training in Autism and Developmental Disabilities, 45, 187-202.
Hartmann, D. P., Barrios, B. A., & Wood, D. D. (2004). Principles of behavioral observation. In S. N. Haynes & E. M. Heiby (Eds.), Comprehensive handbook of psychological assessment: Vol. 3. Behavioral assessment (pp. 108-127). New York: John Wiley & Sons.
Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single-subject research to identify evidence-based practice in special education. Exceptional Children, 71, 165-179.
Horner, R., & Spaulding, S. (2010). Single-case research designs. In N. J. Salkind (Ed.), Encyclopedia of research design (pp. 1386-1394). Thousand Oaks, CA: Sage Publications.
Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R. (2010). Single-case designs technical documentation. In What Works Clearinghouse: Procedures and standards handbook (version 2.0). Retrieved from What Works Clearinghouse website: http://ies.ed.gov/ncee/wwc/pdf/wwc_procedures_v2_standards_handbook.pdf
Maggin, D. M., Chafouleas, S. M., Goddard, K. M., & Johnson, A. H. (2011). A systematic evaluation of token economies as a classroom management tool for students with challenging behavior. Journal of School Psychology, 49, 529-554.
Maggin, D. M., Briesch, A. M., & Chafouleas, S. M. (2013). An application of the What Works Clearinghouse Standards for evaluating single-subject research: Synthesis of the self-management literature base. Remedial and Special Education, 34, 44-58.
Kratochwill, T. R., & Levin, J. R. (Eds.). (1992). Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ: Erlbaum.
Kratochwill, T. R., Stoiber, K. C., & Gutkin, T. B. (2000). Empirically supported interventions in school psychology: The role of negative results in outcome research. Psychology in the Schools, 37, 399-413.
Levin, J. R. (1985). Some methodological and statistical “bugs” in research on children’s learning. In M. Pressley & C. J. Brainerd (Eds.), Cognitive learning and memory in children (pp. 204–233), New York, NY: Springer-Verlag.
Odom, S. L., Brantlinger, E., Gersten, R., Horner, R. H., Thompson, B., & Harris, K. (2005). Research in special education: Scientific methods and evidence-based practices. Exceptional Children, 71, 137-148.
Parsonson, B., & Baer, D. (1978). The analysis and presentation of graphic data. In T. R. Kratochwill (Ed.), Single subject research (pp. 101-166). New York: Academic Press.
Reichow, B., Barton, E. E., Sewell, J. N., Good, L., & Wolery, M. (2010). Effects of weighted vests on the engagement of children with developmental delays and autism. Focus on Autism and Other Developmental Disabilities, 25, 3-11.
Shadish, W. R., & Sullivan, K. J. (2011). Characteristics of single-case designs used to assess treatment effects in 2008. Behavior Research Methods, 43, 971-980. doi:10.3758/s13428-011-0111-y
Smith, J. D. (2012). Single-case experimental designs: A systematic review of published research and current standards. Psychological Methods, 17, 510-550.
Sullivan, K. J., & Shadish, W. R. (2011). An assessment of single-case designs by the What Works Clearinghouse.
Wendt, O., & Miller, B. (2012). Quality appraisal of single-subject experimental designs: An overview and comparison of different appraisal tools. Education and Treatment of Children, 35, 235-268.