Reliability and Validity

Reliability and Validitycoursework.mansfield.edu/psy3315/3315 - Reliability and Validity... · A poor measurement instrument A poor user of the measurement instrument An unstable

Mar 27, 2018


buidung
Reliability and Validity

Reliability: Notation/Symbols

r_xx = reliability of the predictor

r_yy = reliability of the criterion

Reliability: Definitions

Informal: The consistency and stability of measurement.

Formal: “The extent of unsystematic variation in the quantitative description of an individual when that individual is measured a number of times.” (Ghiselli, Campbell, & Zedeck, 1981, p. 482)

Reliability: “Classic Measurement Theory”

Observed Score = (True Score + Systematic Error) + Random Error
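As a rough illustration of the equation above, the sketch below simulates one person measured many times. The true score (50), the systematic error (+3), and the random-error spread (SD of 2) are made-up values, not taken from the slides. Averaging the repeated measurements cancels most of the random error but leaves the systematic error in place.

```python
import random
import statistics

# Hypothetical values chosen only for illustration.
TRUE_SCORE = 50.0
SYSTEMATIC_ERROR = 3.0   # constant bias, e.g. a miscalibrated instrument
RANDOM_ERROR_SD = 2.0    # spread of the unsystematic (random) error

rng = random.Random(0)   # fixed seed so the run is repeatable


def observe():
    """One measurement: (true score + systematic error) + random error."""
    return (TRUE_SCORE + SYSTEMATIC_ERROR) + rng.gauss(0.0, RANDOM_ERROR_SD)


observations = [observe() for _ in range(10_000)]
mean_observed = statistics.mean(observations)
sd_observed = statistics.stdev(observations)

# The mean settles near true score + systematic error, not the true score;
# the spread across repeated measurements reflects the random error only.
print(round(mean_observed, 2), round(sd_observed, 2))
```

The spread across repeated measurements is the "unsystematic variation" that the formal definition of reliability refers to.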

Reliability: Some Sources Affecting Reliability

A poor measurement instrument

A poor user of the measurement instrument

An unstable trait, characteristic, or attribute

Reliability: Forms of Reliability

Test-Retest

Use the same test with the same sample (people) on 2 different occasions; correlate the scores from the 2 administrations of the test.

(Correlation sometimes referred to as the “Coefficient of Stability”)
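The test-retest procedure above can be sketched in a few lines. The scores are made up for illustration, and `pearson_r` is a helper written for this sketch rather than anything named in the slides.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Made-up scores for the same six people on two occasions.
time_1 = [10, 12, 15, 18, 20, 25]
time_2 = [11, 13, 14, 19, 21, 24]

# Correlating the two administrations gives the coefficient of stability.
coefficient_of_stability = pearson_r(time_1, time_2)
print(round(coefficient_of_stability, 2))
```

The same computation applies to parallel forms: replace the two occasions with scores on the two forms, and the result is the coefficient of equivalence.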

Reliability: Forms of Reliability

Test-Retest: Some Sources of Error

Change in testing conditions

Change in test taker

Practice effects

Reliability: Forms of Reliability

Parallel Forms (a.k.a. equivalent or alternate forms)

Develop 2 or more similar tests designed to measure the same thing; correlate scores from the “parallel” forms, using the same people to complete both forms.

Correlation may be referred to as the “Coefficient of Equivalence”

2 approaches:

1. Immediate

2. Delayed

Reliability: Forms of Reliability

Parallel Forms: Some Sources of Error

Immediate – fatigue, content of tests, practice

Delayed – content of tests, time

Problems with constructing two versions of the measure

Reliability: Forms of Reliability

Internal Consistency

Extent to which responses to items designed to measure the same thing are consistent within a single test.

Approaches:

1. Split half

Give the measure once

Somehow divide the measure into 2 parts with an equal number of items (e.g., odd/even split)

Correlate scores on the 2 halves

Correct this value! (It’s an underestimate of overall internal consistency)

Reliability: Forms of Reliability

Observed Score = (True Score + Systematic Error) + Random Error

Remember the “random error” component!

Other things being equal, the longer a test is (i.e., more items), the more “reliable” the test will be in terms of estimating the “true score.” This is why you must correct the “split half” estimate of reliability!
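A small simulation can make the test-length point concrete. It is only a sketch under simplifying assumptions that are not in the slides: every item equals the person's true score plus independent random error of the same size, and there is no systematic error. The item counts (5 and 20) and sample size are arbitrary.

```python
import random
from math import sqrt

rng = random.Random(1)  # fixed seed so the run is repeatable


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def simulate(n_items, n_people=1000):
    """Correlation between total test score and true score for a test of n_items."""
    true_scores = [rng.gauss(0.0, 1.0) for _ in range(n_people)]
    totals = [
        # Assumed item model: each item response = true score + random error.
        sum(t + rng.gauss(0.0, 1.0) for _ in range(n_items))
        for t in true_scores
    ]
    return pearson_r(totals, true_scores)


r_short = simulate(n_items=5)
r_long = simulate(n_items=20)
print(round(r_short, 2), round(r_long, 2))
```

With more items, the random errors have more chances to cancel, so the longer test's total tracks the true score more closely. Each half of a split test is only half as long as the full test, which is why the split-half correlation understates the full test's reliability.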

Reliability: Forms of Reliability

Spearman-Brown Prophecy Formula for Split Half r_xx:

$$ r_{\text{full test}} = \frac{2\, r_{xx}}{1 + r_{xx}} $$

where r_xx is the reliability estimate from the split half.
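The split-half procedure and the correction can be sketched together as follows. The response table is made up (six people, six items scored 1 to 5), and the function names are hypothetical helpers for this sketch.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def spearman_brown(r_half):
    """Correct a split-half correlation to a full-length reliability estimate."""
    return 2 * r_half / (1 + r_half)


# Made-up data: one row per person, one column per item.
responses = [
    [5, 4, 5, 4, 5, 5],
    [4, 4, 3, 4, 4, 3],
    [2, 3, 2, 2, 3, 2],
    [1, 2, 1, 1, 2, 1],
    [3, 3, 4, 3, 3, 4],
    [5, 5, 4, 5, 4, 5],
]

# Odd/even split: items 1, 3, 5 versus items 2, 4, 6.
odd_half = [sum(row[0::2]) for row in responses]
even_half = [sum(row[1::2]) for row in responses]

r_half = pearson_r(odd_half, even_half)   # uncorrected split-half correlation
r_full = spearman_brown(r_half)           # corrected estimate for the full test
print(round(r_half, 3), round(r_full, 3))
```

With these made-up responses the two halves correlate 0.95, and the corrected full-test estimate is about 0.97. The corrected value is always higher than the split-half value when the latter is between 0 and 1.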

Reliability: Forms of Reliability

Approach #2: Coefficient alpha (a.k.a. “Cronbach’s alpha”)

Give the measure once

Consistency of responses to all of the items in a “test” of the same thing

Mathematical estimate of the average of all possible split halves

Do NOT have to correct this value!

Symbolized as α
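The slide does not give a formula for coefficient alpha. The sketch below uses the commonly cited variance form, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The response table is made up, and `cronbach_alpha` is a hypothetical helper name.

```python
import statistics


def cronbach_alpha(responses):
    """Coefficient alpha for a people-by-items table of scores."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # one tuple of scores per item
    item_variances = [statistics.variance(col) for col in items]
    total_variance = statistics.variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up data: one row per person, one column per item (scored 1 to 5).
responses = [
    [5, 4, 5, 4, 5, 5],
    [4, 4, 3, 4, 4, 3],
    [2, 3, 2, 2, 3, 2],
    [1, 2, 1, 1, 2, 1],
    [3, 3, 4, 3, 3, 4],
    [5, 5, 4, 5, 4, 5],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Because alpha is computed from all items at once, no Spearman-Brown step is applied afterwards.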

Reliability: Forms of Reliability

Inter-rater Reliability (a.k.a. inter-rater agreement)

Extent to which 2 or more observers of the same behavior, using a similar measurement instrument, rate or score the behavior in a similar or consistent manner.

Examples – judges in Olympic events, assessors in an assessment center, profs grading term papers

Problems – inter-rater differences in temperament, motivation, observation skills, etc.

Improve with training, clear definitions, better measures
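The slides do not say how inter-rater reliability is computed. As one possible sketch, the code below reports two simple indices for a pair of raters: the proportion of exact agreements and the correlation between their scores. The ratings are made up, and both helper functions are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def exact_agreement(a, b):
    """Proportion of cases where both raters gave exactly the same score."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


# Made-up scores from two judges rating the same six performances.
judge_a = [9, 8, 7, 9, 6, 8]
judge_b = [9, 7, 7, 9, 6, 9]

agreement = exact_agreement(judge_a, judge_b)    # share of identical scores
consistency = pearson_r(judge_a, judge_b)        # do the judges rank alike?
print(round(agreement, 2), round(consistency, 2))
```

The two indices answer different questions: raters can rank performances consistently while rarely giving identical scores, for example when one judge is uniformly harsher.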

Validity

Validity: Definitions

Simple: Extent to which a test measures what it is supposed to measure.

Formal: “… refers to the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores. Test validation is the process of accumulating evidence to support such inferences.” (Standards for Educational and Psychological Testing, 1985, p. 9)

Validity

Definition - Principles for the Validation and Use of Personnel Selection Procedures (SIOP, 2003)

“… the degree to which accumulated evidence and theory support specific interpretations of test scores entailed by proposed uses of a test” (AERA et al., 1999, p. 184).

“Validity is the most important consideration in developing and evaluating selection procedures. Because validation involves the accumulation of evidence to provide a sound scientific basis for the proposed score interpretations, it is the interpretations of these scores required by the proposed uses that are evaluated, not the selection procedure itself.”

Validity: Types

1. Criterion-related validity (Symbol: r_xy)

   Predictive design (“Predictive validity”)

   Concurrent design (“Concurrent validity”)

2. Content validity

3. Construct validity

Validity: Types

Criterion-Related: Predictive Design

Time 1: “Applicants” take predictor/test

Time 2: Collect job performance (criterion) data after “applicants” have been on the job for some time

Correlate predictor scores with job performance data

Validity: Types

Criterion-Related: Concurrent Design

Time 1: Job incumbents take predictor/test

Time 1: Collect job performance (criterion) data for job incumbents

Correlate predictor scores with job performance data
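In both the predictive and the concurrent design, the final step is the same correlation between predictor scores and criterion scores; the designs differ only in who is tested and when the criterion data are collected. A minimal sketch with made-up numbers (test scores and supervisor ratings for six people):

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Made-up data: predictor (test) scores and job performance ratings.
predictor_scores = [55, 60, 65, 70, 80, 90]
job_performance = [2.5, 3.0, 2.8, 3.6, 4.0, 4.5]

# The criterion-related validity coefficient, r_xy.
r_xy = pearson_r(predictor_scores, job_performance)
print(round(r_xy, 2))
```

The value here is far higher than would be typical in practice; the data were chosen only to keep the arithmetic easy to follow.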

Validity: Types

Content Validity

Definition: “…refers to the degree to which the test items are a representative sample of behaviors exhibited in some performance domain.” (Schneider & Schmitt, 1986, p. 237)

Almost entirely based on judgment of the process used to identify test content

Tenopyr (1977) called it “content-oriented test construction.”

Example: Robinson’s (1981) “Construction Error Recognition Test”

Validity: Types

Construct Validity

Definition: …extent to which a measure (indirectly) assesses an underlying concept or construct

… the interpretation of what the scores on a measure represent (Spector, 1996)

2 Issues:

1. Define the construct (what it is and what it is not)

2. Make judgments, usually over a series of studies, of how well a pattern of results from a measure matches up with the patterns expected for that construct

Convergent validity & discriminant validity

Validity: Types

Construct Validity: Example