Understanding CELF-5 Reliability & Validity to Improve Diagnostic ...

Copyright © 2014. Pearson, Inc., or its affiliates. All rights reserved.

CELF-5 Reliability and Validity

Adam Scheller, Ph.D.

Pearson Clinical Assessment

Understanding CELF-5 Reliability & Validity to Improve Diagnostic Decisions

Adam Scheller, Ph.D., Senior Educational Consultant

Pearson

Copyright 2013. Pearson Education and its Affiliates. All rights reserved.

Disclosures

• Dr. Scheller is an employee of Pearson, publisher of the CELF-5. No other language assessments will be discussed in this presentation.






Agenda

1. Introduction
   1. Disclosure
   2. Agenda
   3. Learning Objectives

2. Research Overview
   1. Standardization
   2. Reliability
   3. Validity

3. Summary/Questions


Learning Outcomes

1. Name at least one study conducted to evaluate the reliability of the CELF-5.

2. Describe two procedures used to determine whether differences between test scores, or between index scores, are significant.

3. Describe the average difference between CELF-4 and CELF-5 scores in a study conducted with typically developing students.

4. Identify, on a chart, the sensitivity and specificity associated with the cut score used at the clinician’s place of employment.

5. Name the optimal CELF-5 cut score, the one that provides the best balance between sensitivity and specificity.





Research Overview



• Multiple research phases

• Over 4,000 students tested in standardization and related reliability and validity studies

• Students were tested from March through December 2012

• Over 450 SLPs across the U.S. participated in standardization testing






Multiple Bias Studies

• Multiple phases of objective and subjective reviews of administration directions, cues, test items, and test formats

– Assessment/bias experts examined test items for potential bias related to:
  • Socioeconomic status
  • Race/Ethnicity
  • Gender
  • Culture
  • Region

– Clinicians in the field provided feedback about students’ responses and engagement in test tasks

– Statistical analysis of bias verified or refuted subjective bias concerns


A Diverse STDZ Sample: Race/Ethnicity

Race/Ethnicity N %

Asian 87 3.7%

Black 328 13.8%

Hispanic 476 20.0%

Other 137 5.8%

White 1352 56.8%

Total Sample 2380 100%






A Diverse STDZ Sample: Parent Education Level

Parent Education Level N %

Less than 11 years 292 12.3%

H.S. Diploma or GED 544 22.9%

1-3 Years College or Technical School 817 34.3%

4 or more Years of College 727 30.6%



A Diverse STDZ Sample: Region

Region of the US N %

Midwest 567 23.8%

Northeast 363 15.3%

South 873 36.7%

West 577 24.3%







Other Demographic Variables

Gender N %

Female 1190 50%

Male 1190 50%



Raw Score Means and SDs






Raw Score Means and SDs (cont.)

Reliability






Reliability 101

• How confident are you in the accuracy of a test score?

• Reliability = the accuracy, consistency, and stability of test scores across situations

• Observed Score = True Score + Error

– Error can be systematic or random


Internal Consistency & Test-retest

• Internal consistency: estimates how consistently the items of a test measure a single construct (homogeneity)

– Split-half method: the correlation between total scores on two halves of the test, corrected to full test length with the Spearman-Brown formula

• Test-retest stability: the correlation between scores from two administrations of the test

– The interval between test and retest is kept short, so that the skill being measured is unlikely to have changed






Test Reliability Coefficients

CELF-5 Test | Average Reliability Coefficient (across target ages)

Sentence Comprehension .87 Good

Linguistic Concepts .91 Excellent

Word Structure .89 Good

Word Classes .90 Excellent

Following Directions .91 Excellent

Formulated Sentences .86 Good

Recalling Sentences .94 Excellent

Understanding Spoken Paragraphs .85 Good

Word Definitions .89 Good

Sentence Assembly .93 Excellent

Semantic Relationships .89 Good

Pragmatics Profile .98 Excellent

Reading Comprehension .87 Good

Structured Writing .75 Acceptable
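The verbal labels in these tables appear to follow fixed bands: every coefficient of .90 or higher is labeled Excellent, those from .80 to .89 Good, and .75 Acceptable. The sketch below encodes those inferred bands and checks them against the test table; the .70 lower bound for Acceptable is an assumption, since the lowest coefficient shown is .75.

```python
def describe_reliability(r: float) -> str:
    """Map a reliability coefficient to a verbal label (bands inferred from these slides)."""
    if r >= 0.90:
        return "Excellent"
    if r >= 0.80:
        return "Good"
    if r >= 0.70:  # assumed lower bound; no coefficient below .75 appears in the table
        return "Acceptable"
    return "Below .70 (no label shown in these slides)"

# Average reliability coefficients and labels as listed in the CELF-5 test table.
CELF5_TEST_RELIABILITY = {
    "Sentence Comprehension": (0.87, "Good"),
    "Linguistic Concepts": (0.91, "Excellent"),
    "Word Structure": (0.89, "Good"),
    "Word Classes": (0.90, "Excellent"),
    "Following Directions": (0.91, "Excellent"),
    "Formulated Sentences": (0.86, "Good"),
    "Recalling Sentences": (0.94, "Excellent"),
    "Understanding Spoken Paragraphs": (0.85, "Good"),
    "Word Definitions": (0.89, "Good"),
    "Sentence Assembly": (0.93, "Excellent"),
    "Semantic Relationships": (0.89, "Good"),
    "Pragmatics Profile": (0.98, "Excellent"),
    "Reading Comprehension": (0.87, "Good"),
    "Structured Writing": (0.75, "Acceptable"),
}

mismatches = [name for name, (r, label) in CELF5_TEST_RELIABILITY.items()
              if describe_reliability(r) != label]
print("mismatches:", mismatches)  # mismatches: []
```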


Index Score Reliability Coefficients

CELF-5 Index Score | Average Reliability Coefficient (across target ages)

Core Language Score .96 Excellent

Receptive Language Index .95 Excellent

Expressive Language Index .95 Excellent

Language Content Index .95 Excellent

Language Structure Index .96 Excellent

Language Memory Index .95 Excellent






Reliabilities for Clinical Groups

Test | Language Disorder (n=166) | Learning Disability, Reading & Writing (n=69) | Autism Spectrum Disorder (n=66) | Average rxx

Sentence Comprehension | .94 Excellent | -- | .96 Excellent | .95 Excellent

Linguistic Concepts | .96 Excellent | -- | .98 Excellent | .97 Excellent

Word Structure | .93 Excellent | -- | .94 Excellent | .94 Excellent

Word Classes | .96 Excellent | .92 Excellent | .97 Excellent | .95 Excellent

Following Directions | .96 Excellent | .90 Excellent | .98 Excellent | .96 Excellent

Formulated Sentences | .97 Excellent | .89 Good | .96 Excellent | .95 Excellent

Recalling Sentences | .98 Excellent | .92 Excellent | .97 Excellent | .96 Excellent

(-- = no coefficient reported for this group)


Reliabilities for Clinical Groups (cont.)

Test | Language Disorder (n=166) | Learning Disability, Reading & Writing (n=69) | Autism Spectrum Disorder (n=66) | Average rxx

Understanding Spoken Paragraphs | .81 Good | .75 Acceptable | .91 Excellent | .84 Good

Word Definitions | .87 Good | .91 Excellent | .95 Excellent | .92 Excellent

Sentence Assembly | .92 Excellent | .94 Excellent | .97 Excellent | .95 Excellent

Semantic Relationships | .88 Good | .89 Good | .96 Excellent | .92 Excellent

Pragmatics Profile | .99 Excellent | .99 Excellent | .99 Excellent | .99 Excellent

Reading Comprehension | .93 Excellent | .86 Good | .93 Excellent | .91 Excellent
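The slides do not say how the Average rxx column was computed. The values are consistent with averaging on Fisher's z scale (transform each r with atanh, take the mean, back-transform with tanh), a common practice when averaging reliability coefficients; a plain arithmetic mean would give .95 rather than .96 for Following Directions and .91 rather than .92 for Semantic Relationships. A sketch under that assumption, using an unweighted mean, since whether the published averages were weighted by group size is not stated:

```python
import math
import statistics

def fisher_z_average(coefficients: list[float]) -> float:
    """Average correlations on Fisher's z scale: atanh each r, take the mean, back-transform with tanh."""
    z_values = [math.atanh(r) for r in coefficients]
    return math.tanh(statistics.fmean(z_values))

# Following Directions row: Language Disorder, Learning Disability, Autism Spectrum Disorder.
following_directions = [0.96, 0.90, 0.98]
print(round(fisher_z_average(following_directions), 2))   # 0.96, matches the Average rxx column
print(round(statistics.fmean(following_directions), 2))   # 0.95, a plain arithmetic mean
```

The z transformation matters because correlations near 1 are compressed: a drop from .98 to .90 represents a larger change in measurement error than the raw difference suggests, so averaging raw coefficients understates the typical reliability.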






Standard Error of Measurement (SEM) & Confidence Interval


Standard Error of Measurement (SEM) & Confidence Interval (cont.)
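The standard classical-test-theory relationships behind these slides are SEM = SD * sqrt(1 - rxx) and a confidence interval of obtained score ± z * SEM. The sketch below assumes index scores with SD 15, uses the Core Language Score's average reliability of .96 from the index table, and centres the band on the obtained score; the published CELF-5 tables may use age-specific reliabilities or centre intervals differently, so treat the numbers as illustrative. The obtained score of 85 is hypothetical.

```python
import math

# Two-sided z multipliers for common confidence levels.
Z_FOR_LEVEL = {90: 1.645, 95: 1.96}

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(score: float, sd: float, reliability: float,
                        level: int = 90) -> tuple[float, float]:
    """Band of +/- z * SEM around the obtained score (assumption: not centred on an estimated true score)."""
    half_width = Z_FOR_LEVEL[level] * standard_error_of_measurement(sd, reliability)
    return score - half_width, score + half_width

# Core Language Score: average r_xx = .96; index scores assumed to have SD = 15.
sem = standard_error_of_measurement(15, 0.96)
low, high = confidence_interval(85, 15, 0.96, level=90)
print(f"SEM = {sem:.1f}; 90% CI for an obtained score of 85: {low:.0f} to {high:.0f}")
```

This prints an SEM of 3.0 and a 90% interval of about 80 to 90. A less reliable score widens the band: with rxx = .84 the SEM doubles to 6.0.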






Test-retest Stability

Test-retest sample: n = 137

Test-retest interval: 7-46 days

Mean interval: 19 days


Inter-scorer Agreement

Test | Inter-scorer Agreement Reliability

Word Structure .99

Formulated Sentences .95

Word Definitions .91

Structured Writing .96






Score Differences

• Interpretation of performance:

1. Examine whether the difference is statistically significant
   • Based on the standard error of measurement of each score

2. Examine whether the difference is clinically significant (i.e., a difference of that size is rare)
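For step 1, one common approach (assumed here; the deck does not show the CELF-5 procedure) treats a difference as statistically significant when it exceeds z times the standard error of the difference, sqrt(SEM_a² + SEM_b²). The sketch uses the average reliabilities of .95 for the Receptive and Expressive Language Indexes and assumes SD 15; the scores 92, 80, and 85 are hypothetical. Step 2, judging how rare a difference is, requires base-rate tables from the standardization sample and is not sketched here.

```python
import math

def critical_difference(rel_a: float, rel_b: float, sd: float = 15.0, z: float = 1.96) -> float:
    """Smallest score difference that is statistically significant (two-tailed; z = 1.96 -> p < .05)."""
    sem_a = sd * math.sqrt(1.0 - rel_a)
    sem_b = sd * math.sqrt(1.0 - rel_b)
    # Standard error of the difference between two scores with independent measurement error.
    return z * math.sqrt(sem_a**2 + sem_b**2)

def is_statistically_significant(score_a: float, score_b: float, rel_a: float, rel_b: float,
                                 sd: float = 15.0, z: float = 1.96) -> bool:
    return abs(score_a - score_b) >= critical_difference(rel_a, rel_b, sd, z)

# Receptive vs. Expressive Language Index, both with average r_xx = .95.
print(round(critical_difference(0.95, 0.95), 1))          # 9.3
print(is_statistically_significant(92, 80, 0.95, 0.95))   # True  (12-point difference)
print(is_statistically_significant(92, 85, 0.95, 0.95))   # False (7-point difference)
```

A statistically significant difference is not automatically clinically meaningful: if many students in the standardization sample show a gap that size, it is common rather than rare.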

Validity






Validity 101

• How well can your test results predict the presence of a disorder (or predict a student’s skill level)?

• Validity is demonstrated through evidence supporting a test’s interpretations and uses.

– Validity evidence: the degree to which specific data, research, or theory supports that:

1. The test measures the concepts it is supposed to measure

2. The test is applicable to its intended population


Evidence Based on Test Content

• Validity evidence related to test content

– Content Relevance: the content areas being measured are accepted as relating to the proposed construct

– Content Coverage: the content areas measured by the test are accepted as an adequate and developmentally appropriate sample of those areas

• Validity evidence includes literature review, user feedback, and expert review

• CELF-5 is designed to measure…






Intercorrelation Studies


Relationship with Other Variables (CELF-4)






CELF-5 – CELF-4 (cont.)


Language Disorder Study






Language Disorder Study (cont.)


Sensitivity and Specificity 101

Sensitivity: The probability that someone who has the “condition” will test positive for it.

Specificity: The probability that someone who does not have the “condition” will test negative.

Errors:

• False Positive: A student who is falsely identified as having a condition or disorder.

• False Negative: A student with a condition or disorder who is not correctly identified by a test (the most serious error).






CELF-5 Diagnostic Accuracy
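Learning outcome 5 asks for the cut score that best balances sensitivity and specificity. The slides do not state the criterion used; one common choice, assumed here, is Youden's J = sensitivity + specificity - 1, maximised over candidate cut scores. The scores below are hypothetical toy values, not CELF-5 data, so the cut score this sketch selects says nothing about the CELF-5.

```python
def sens_spec_at_cutoff(disorder_scores: list[int], typical_scores: list[int],
                        cutoff: int) -> tuple[float, float]:
    """A score at or below `cutoff` counts as a positive (disorder-indicating) result."""
    sensitivity = sum(s <= cutoff for s in disorder_scores) / len(disorder_scores)
    specificity = sum(s > cutoff for s in typical_scores) / len(typical_scores)
    return sensitivity, specificity

def best_cutoff_by_youden(disorder_scores: list[int], typical_scores: list[int],
                          candidates: tuple[int, ...]) -> int:
    """Pick the candidate cutoff that maximises Youden's J = sensitivity + specificity - 1."""
    def youden(cutoff: int) -> float:
        sens, spec = sens_spec_at_cutoff(disorder_scores, typical_scores, cutoff)
        return sens + spec - 1.0
    return max(candidates, key=youden)

# Hypothetical toy standard scores; NOT CELF-5 data.
disorder = [68, 71, 74, 76, 79, 83]
typical = [78, 86, 89, 93, 97, 102]
candidates = (75, 80, 85, 90)

for cut in candidates:
    sens, spec = sens_spec_at_cutoff(disorder, typical, cut)
    print(f"cutoff {cut}: sensitivity {sens:.2f}, specificity {spec:.2f}")
print("best cutoff (Youden's J):", best_cutoff_by_youden(disorder, typical, candidates))
```

Raising the cutoff catches more students with a disorder (higher sensitivity) at the cost of flagging more typically developing students (lower specificity). Youden's J weights the two errors equally; a setting that treats false negatives as the more serious error might deliberately choose a higher cutoff.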


Hypothetical: cutoff 1.5 SD below the mean (standard score 77) with CELF-5; 10,000 students, 10% prevalence

Test Result | Language Disorder (sensitivity = .85) | No Language Disorder (specificity = .99) | Total

Positive | 850 | 90 | 940

Negative | 150 | 8,910 | 9,060

Total | 1,000 | 9,000 | 10,000
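The cells in this table follow directly from three numbers: the prevalence (10%), the sensitivity (.85), and the specificity (.99). The sketch reproduces the counts and adds two quantities not shown on the slide, positive and negative predictive value, which express how often a positive or negative result is correct at this prevalence. The function name is illustrative.

```python
def expected_table(n: int, prevalence: float, sensitivity: float,
                   specificity: float) -> dict[str, float]:
    """Expected 2 x 2 cell counts for a population screened with the given accuracy."""
    with_disorder = n * prevalence
    without_disorder = n - with_disorder
    true_pos = sensitivity * with_disorder      # disorder present, test positive
    false_neg = with_disorder - true_pos        # disorder present, test negative
    true_neg = specificity * without_disorder   # no disorder, test negative
    false_pos = without_disorder - true_neg     # no disorder, test positive
    return {"TP": true_pos, "FP": false_pos, "FN": false_neg, "TN": true_neg}

cells = expected_table(10_000, 0.10, 0.85, 0.99)
print({k: round(v) for k, v in cells.items()})  # {'TP': 850, 'FP': 90, 'FN': 150, 'TN': 8910}

# Predictive values: how often a positive (or negative) result is correct in this population.
ppv = cells["TP"] / (cells["TP"] + cells["FP"])  # 850 / 940
npv = cells["TN"] / (cells["TN"] + cells["FN"])  # 8,910 / 9,060
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")       # PPV = 0.904, NPV = 0.983
```

Even with a specificity of .99, about 1 in 10 positive results (90 of 940) is a false positive at 10% prevalence; at a lower prevalence that share would grow.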






Another Example: October is Breast Cancer Awareness Month!! (These numbers are approximate and based on a 12% prevalence rate.)

http://www.memorialbreastcenter.org/how-accurate-are-mammograms/

Mammogram Result | Breast Cancer | No Breast Cancer | Total

Positive | 100 | 62 (≈7%) | 162

Negative | 20 (≈17%) | 818 | 838

Total | 120 (≈12%) | 880 | 1,000
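The percentages printed next to the mammogram counts are error rates: 62 of the 880 women without cancer (about 7%) receive false positives, and 20 of the 120 with cancer (about 17%) receive false negatives. A small sketch that derives these rates from the four counts (the class and field names are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConfusionCounts:
    true_pos: int   # condition present, test positive
    false_pos: int  # condition absent, test positive
    false_neg: int  # condition present, test negative
    true_neg: int   # condition absent, test negative

    @property
    def sensitivity(self) -> float:
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        return self.true_neg / (self.true_neg + self.false_pos)

    @property
    def false_positive_rate(self) -> float:
        return 1.0 - self.specificity

    @property
    def false_negative_rate(self) -> float:
        return 1.0 - self.sensitivity

mammogram = ConfusionCounts(true_pos=100, false_pos=62, false_neg=20, true_neg=818)
print(f"sensitivity {mammogram.sensitivity:.2f}, specificity {mammogram.specificity:.2f}")
print(f"false positives {mammogram.false_positive_rate:.0%}, "
      f"false negatives {mammogram.false_negative_rate:.0%}")
```

This prints a sensitivity of 0.83 and a specificity of 0.93, with error rates of 7% and 17%, matching the annotations in the table.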


Learning Disorder (Reading and/or Writing)






Autism Spectrum Disorder


Summary/Questions?

• Reliability is a reflection of measurement error

• Validity evidence determines how we can interpret test results and what they can predict

• The choice of cut score determines who is included in, or excluded from, a diagnostic group





CELF5Family.PearsonClinical.com

PEARSON Customer Service

1-800-627-7271 (USA) PearsonClinical.com

1-866-335-8418 (Canada)

PearsonAssess.ca

Facebook.com/SpeechandLanguage

Twitter.com/SpeechnLanguage

pinterest.com/speechandlang/the-new-celf-5/
