Can you do it again? Reliability and Other Desired Characteristics, Linn and Gronlund Chap. 5

Dec 18, 2015


Can you do it again?

Reliability and Other Desired Characteristics

Linn and Gronlund, Chap. 5

Let’s take a quiz…

In a nutshell…

Reliability is the consistency of a measurement. Today and next class we will look at:

- 6 ways to estimate reliability

- The standard error of measurement

- What influences reliability

- Cut scores

- One slide on usability, or practical considerations

Reliability via a mathematical model (classical test theory):

X = T + E

X = Obtained or observed score (fallible)

T = True score (reflects stable characteristics)

E = Measurement error (reflects random error)
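A small simulation can make the model concrete. This is a sketch under assumed values, not taken from the book: true scores have an SD of 10, each administration adds fresh random error with an SD of 5, and the two administrations are treated as parallel. Under classical test theory, reliability is the share of observed-score variance that is true-score variance, var(T) / var(X) = 100 / (100 + 25) = .80, so the correlation between the two simulated administrations should land close to .80.

```python
import random
import statistics


def simulate_two_administrations(n_students=5000, true_mean=50.0, true_sd=10.0,
                                 error_sd=5.0, seed=1):
    """Simulate X = T + E for the same students tested twice.

    Each student keeps one true score T; each administration adds fresh,
    independent random error E. All parameter values are illustrative.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n_students)]
    first = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    second = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return first, second


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Theoretical reliability = var(T) / var(X) = var(T) / (var(T) + var(E))
theoretical = 10.0 ** 2 / (10.0 ** 2 + 5.0 ** 2)
first, second = simulate_two_administrations()
observed = pearson_r(first, second)
print(f"theoretical reliability: {theoretical:.2f}")
print(f"correlation between administrations: {observed:.2f}")
```

Raising `error_sd` lowers both numbers: more random error means less consistency between administrations.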

Some initial thoughts…

All measurement is susceptible to error. Any factor that introduces error into the measurement process affects reliability. To increase our confidence in scores, we try to detect, understand, and minimize random measurement error.

More initial thoughts…

Just like validity, reliability refers to the scores (the results), not to the test or assessment itself.

Validity is the sum of its parts; reliability is only valid in its parts. (Let me explain…)

One must have reliability to have validity: reliability is necessary, but not sufficient, for validity. (Again, valid results must be consistent.)

Error from Content Sampling…

Many theorists believe the major source of random measurement error is “content sampling.”

Since tests represent performance on only a sample of behavior, the adequacy of that sample is very important.

And from temporal instability…

Random measurement error can also be the result of “temporal instability,” meaning influences that are random and transient:

- Situation-Centered influences (e.g., lighting & noise)

- Person-Centered influences (e.g., fatigue, illness)

Or error comes from…

- Administration errors (e.g., incorrect instructions, inaccurate timing)

- Scoring errors (e.g., subjective scoring, clerical errors)

Back to correlations…

Correlation coefficients range from -1.00 to 1.00. Let’s look at p. 106.

- Correlation coefficients: 2 scores from the same group. Ex: number of kids and level of stress.

- Validity coefficients: based on some other criterion that the scores predict or estimate. Ex: SAT scores correlated with GPA.

- Reliability coefficients: based on results within the same procedure, i.e., measuring the same construct.

Let’s look at 6 options…
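All three kinds of coefficients are Pearson correlations; what changes is which two sets of scores get correlated. A minimal sketch in Python, using made-up quiz scores for six students tested twice (the data are hypothetical):

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two score lists from the same group (-1.00 to 1.00)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical quiz scores for the same six students on two occasions
first = [8, 6, 9, 4, 7, 5]
second = [7, 6, 10, 5, 7, 4]
print(f"r = {pearson_r(first, second):.2f}")
```

Correlating two administrations of the same test (method #1), two equivalent forms (method #2), or equivalent forms given at different times (method #3) all use this same computation.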

Method #1 – Test-Retest

What if I gave you the exact same quiz right now? Again? How many of you asked classes earlier in the day what was on a quiz?

This measures the “stability” of the test. The longer the interval before the retest, the more external factors influence the reliability estimate. Two examples:

- GRE scores (reportable for 5 years)

- IEP re-evaluations every 3 years

Method #2 – Equivalent Forms

Just make sure the forms really are equivalent. Standardized tests commonly use this. This measures the equivalence of the test forms. Your book uses the phrase “in close succession,” meaning the two forms are given at nearly the same time.

Method #3 – Test-retest with Equivalent Forms

One gets both stability and equivalence. How many of you took the SAT or ACT more than once? Did your score improve?

The best method for determining reliability is test-retest with equivalent forms.

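Method #4 – Split-halves

This can be calculated from only 1 administration of the test. Let’s do it to your quiz.

Spearman-Brown formula: reliability of the full test = 2(r of the halves) / (1 + r of the halves). Do an example…

The correlation between the halves estimates the reliability of a half-length test; the Spearman-Brown formula steps it up to estimate the reliability of the full-length test.

This is used to measure internal consistency.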
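Working the formula by hand first: if the halves correlate .60, the full-test estimate is 2(.60) / (1 + .60) = 1.20 / 1.60 = .75. The sketch below does the same with a hypothetical 6-item quiz. Splitting into odd- and even-numbered items is an assumption here (it is one common way to form comparable halves); the function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate full-test reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores holds one row of 0/1 item scores per student.

    Splits the items into odd- and even-numbered halves, correlates the
    half scores, and steps the result up with Spearman-Brown.
    """
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    return r_half, spearman_brown(r_half)


# Hypothetical 6-item quiz, one row per student (1 = correct, 0 = incorrect)
quiz = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
]
r_half, r_full = split_half_reliability(quiz)
print(f"half-test r = {r_half:.2f}, Spearman-Brown full-test estimate = {r_full:.2f}")
```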

Method #5 – Coefficient Alpha

Related to the Kuder-Richardson formulas: for dichotomous (right/wrong) items, coefficient alpha equals KR-20. It is also calculated from only one administration of the test, to measure internal consistency.

Technically, it is the average of all possible split-half coefficients.

It should be used when the test covers a single, specific topic (the items are homogeneous).
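A minimal sketch of coefficient alpha using the standard formula, which compares the sum of the item variances with the variance of students' total scores. Population variances are used for both terms, so with 0/1 items the result matches KR-20. The quiz data and the function name are hypothetical.

```python
import statistics


def coefficient_alpha(item_scores):
    """Coefficient alpha from one row of item scores per student.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    With 0/1 items this gives the same value as KR-20.
    """
    k = len(item_scores[0])                       # number of items
    items = list(zip(*item_scores))               # one tuple per item (column)
    totals = [sum(row) for row in item_scores]    # each student's total score
    item_var_sum = sum(statistics.pvariance(item) for item in items)
    total_var = statistics.pvariance(totals)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 4-item quiz, one row per student (1 = correct, 0 = incorrect)
quiz = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"alpha = {coefficient_alpha(quiz):.2f}")
```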

Method #6 – Interrater Consistency

Think of the Russian judge in Olympic figure skating. How do they handle this now?

See the table on p. 113; the table on p. 114 is more likely in practice. Raters are trained using rubrics, etc.

Back to figure skating…
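One simple index of interrater consistency is the percentage of performances on which two raters agree. This is a sketch with hypothetical rubric ratings and an illustrative function name; the book’s tables may lay the counts out differently, and more elaborate indices (such as ones that correct for chance agreement) also exist.

```python
def percent_agreement(rater_a, rater_b, tolerance=0):
    """Share of performances where two raters' scores differ by at most `tolerance` points.

    tolerance=0 gives exact agreement; tolerance=1 gives agreement within one point.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty lists of ratings")
    hits = sum(1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= tolerance)
    return hits / len(rater_a)


# Hypothetical rubric scores (1-4) given by two trained raters to the same 8 essays
rater_a = [3, 4, 2, 3, 1, 4, 2, 3]
rater_b = [3, 3, 2, 3, 2, 4, 2, 4]
print(f"exact agreement: {percent_agreement(rater_a, rater_b):.1%}")
print(f"within one point: {percent_agreement(rater_a, rater_b, tolerance=1):.1%}")
```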

In summing up these methods…

Test-retest with equivalent forms is the best for determining reliability (see p. 115). Each method suits specific types of assessments; it is the assessment that determines which type of reliability you will report.

Standard Error of Measurement

You scored x and you scored y; so what did you really score? Scores vary, and the SEM estimates how much a person’s obtained scores would vary around their true score over repeated testing. Computers make this somewhat doable. Let me show the bell curve and explain the standard error of measurement (it’s kinda like a standard deviation…). Use IQ for an example…
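The usual formula is SEM = SD × sqrt(1 - r), where SD is the standard deviation of the scores and r is the reliability coefficient. A sketch for the IQ example, assuming an SD of 15 and a reliability of .91 (illustrative numbers, not from the book): SEM = 15 × sqrt(.09) = 15 × .3 = 4.5. Assuming normally distributed errors, a person’s obtained score is expected to fall within ±1 SEM of the true score about 68% of the time.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), in the same units as the scores."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(obtained, sem, n_sems=1):
    """Band of obtained +/- n_sems SEMs around an obtained score."""
    return obtained - n_sems * sem, obtained + n_sems * sem


# IQ-style scale with SD 15; the reliability of .91 is an assumed value
sem = standard_error_of_measurement(sd=15, reliability=0.91)
low, high = score_band(100, sem)
print(f"SEM = {sem:.1f}; an obtained IQ of 100 -> band of {low:.1f} to {high:.1f}")
```

Plugging in a higher reliability shrinks the SEM; a reliability of 1.0 gives an SEM of 0.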

More on SEM

High reliability coefficients give a small SEM, and vice versa. The SEM can be calculated from the reliability coefficient and the standard deviation of the scores; there is a table on p. 122 (really, see p. 117). With SEM = 9.3, John scores a 722 and Mary a 732. Are Mary and John different?

Finally, two great things:

- The SEM is expressed in the same units as the measurement itself, and

- The SEM is normally fairly consistent from group to group on the same assessment.
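One rule of thumb for the John and Mary question (a common interpretation, not the only one) is to put a band of ±1 SEM around each obtained score and treat the two scores as not clearly different when the bands overlap. With SEM = 9.3, John’s band runs from 712.7 to 731.3 and Mary’s from 722.7 to 741.3. They overlap, so the 10-point gap (about one SEM) is not strong evidence that the two differ. The helper names below are illustrative.

```python
def score_band(obtained, sem, n_sems=1.0):
    """Band of obtained +/- n_sems SEMs."""
    return obtained - n_sems * sem, obtained + n_sems * sem


def bands_overlap(score_1, score_2, sem, n_sems=1.0):
    """True when the two scores' SEM bands overlap (rule of thumb: not clearly different)."""
    low_1, high_1 = score_band(score_1, sem, n_sems)
    low_2, high_2 = score_band(score_2, sem, n_sems)
    return low_1 <= high_2 and low_2 <= high_1


SEM = 9.3  # value from the slide
john, mary = 722, 732
for name, score in (("John", john), ("Mary", mary)):
    low, high = score_band(score, SEM)
    print(f"{name}: {score} -> {low:.1f} to {high:.1f}")
print("bands overlap:", bands_overlap(john, mary, SEM))
```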

Factors that influence reliability

Have you ever taken a test in a hot room? Hungry? Tired? The book deals with three factors.

Factor #1 Number of Items or Assessment Tasks

Generally, the more items, the higher the reliability will be.

Adding 10 very hard or very easy questions would not increase reliability at all, because nearly everyone answers such items the same way, so they add nothing to the spread of scores.
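The general Spearman-Brown formula predicts how reliability changes when a test is lengthened: r_new = k × r / (1 + (k - 1) × r), where k is how many times longer the test becomes. A sketch assuming a starting reliability of .60 (an illustrative value; the function name is hypothetical). The formula assumes the added items are comparable in quality to the existing ones, which very hard or very easy items are not.

```python
def lengthened_reliability(reliability, length_factor):
    """Spearman-Brown prophecy: predicted reliability when a test is made
    length_factor times as long with items comparable to the originals."""
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)


# Assumed starting reliability of .60 for a short quiz
for factor in (1, 2, 3, 4):
    predicted = lengthened_reliability(0.60, factor)
    print(f"{factor}x as long -> predicted reliability {predicted:.2f}")
```

Doubling the quiz moves .60 to .75, but each further addition buys less.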

Factor #2 Spread of Scores

The larger the spread, the higher the reliability

90-100 or 0-200? My physics story
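A simulation sketch of why spread matters, reusing the X = T + E model with assumed values: the test is identical for both groups (same error SD of 5 points), but one group’s true scores have an SD of 10 and the other’s an SD of 3. The expected reliabilities are 100 / 125 = .80 and 9 / 34 ≈ .26, so the same test looks far less reliable in the bunched-together group.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def retest_correlation(true_sd, error_sd=5.0, n_students=5000, seed=7):
    """Simulate X = T + E twice for one group and correlate the two administrations.

    error_sd is the same for every group, so only the spread of true scores
    changes between calls. All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_students)]
    first = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    second = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return pearson_r(first, second)


wide = retest_correlation(true_sd=10.0)   # scores spread widely
narrow = retest_correlation(true_sd=3.0)  # scores bunched together
print(f"wide spread: r = {wide:.2f}; narrow spread: r = {narrow:.2f}")
```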

Factor #3 Objectivity

Back to multiple-choice (MC) tests. Back to the Russian judges.

Fixed Standards (cut scores)

Why my cousin-in-law says we test…

Do they meet a standard? GWB

Who cares about the score; can they do it? Back to a rater table…sort of… (p. 126). Cut scores are dangerous: where do you put the line? What about the SEM?
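One way to bring the SEM into the cut-score question: flag every student whose score band (obtained score ± 1 SEM) contains the cut, since a retest could plausibly land them on the other side of the line. A sketch with a hypothetical cut score, SEM, and set of scores; the function name is illustrative.

```python
def near_cut(scores, cut, sem, n_sems=1.0):
    """Return the scores whose +/- n_sems SEM band contains the cut score.

    These are the students whose pass/fail classification is least certain.
    """
    return [s for s in scores if s - n_sems * sem <= cut <= s + n_sems * sem]


# Hypothetical cut score of 70 with an assumed SEM of 3 points
scores = [62, 68, 71, 75, 80]
print("classification uncertain for:", near_cut(scores, cut=70, sem=3.0))
print("using a 2-SEM band:", near_cut(scores, cut=70, sem=3.0, n_sems=2))
```

Widening the band to ±2 SEM flags more students; deciding how wide a band to use is part of where to put the line.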

Now for the practical…

- Are the tests easy to administer?

- Do you have time constraints?

- Can you easily score these?

- Do these scores apply to what you are after?

- Can you get an equivalent measure?

And it’s all about money.