RELIABILITY Prepared by Marina Gvozdeva, Elena Onoprienko, Yulia Polshina, Nadezhda Shablikova

Jan 14, 2016

Outline

1. Defining reliability
2. How to measure reliability
3. Reliability coefficient
4. Observed score and true score
5. SEM
6. Item analyses

Tests as measuring tools

‘A test is something (as a series of questions or exercises) for measuring the skill, knowledge, intelligence, capacities, or aptitudes of an individual or group’

(Merriam Webster Dictionary Online, 2013)

Tests as measuring tools

‘…a language test is a procedure for gathering evidence of general or specific language abilities from performance on tasks designed to provide a basis for predictions about an individual’s use of those abilities in real world contexts.’

(McNamara, 2000:11)

A reliable test

A perfectly reliable test is ‘one which would give precisely the same results for a particular set of candidates regardless of when it happened to be administered.’

(Hughes, 1989:31)

An unreliable test

A completely unreliable test is one ‘which would give sets of results unconnected with each other.’

(Hughes, 1989: 32)

Strategies to estimate reliability

We can use statistics to estimate how reliable a test is:
• test-retest reliability;
• equivalent (parallel) forms reliability;
• internal consistency reliability.

Test-retest reliability

‘calculating a reliability estimate by administering a test on two occasions and calculating the correlation between the two sets of scores’

(Brown, 2002)
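
The correlation described above can be sketched in a few lines of Python. This is only an illustration: the helper name `pearson_r` and the two score lists are hypothetical and do not come from the original slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five candidates on two occasions.
first_sitting = [42, 55, 61, 48, 70]
second_sitting = [44, 53, 63, 47, 71]

# The test-retest estimate is the correlation between the two sittings.
test_retest = pearson_r(first_sitting, second_sitting)
print(round(test_retest, 2))
```

A value close to 1.0 suggests that candidates were ranked in much the same way on both occasions.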

Equivalent (parallel/alternative) forms reliability

‘calculating a reliability estimate by administering two forms of a test and calculating the correlation between the two sets of scores’

(Brown, 2002)

Internal consistency reliability

‘calculating a reliability estimate based on a single form of a test administered on a single occasion using internal consistency equations’

(Brown, 2002)

Internal consistency reliability:

• calculating reliability from single administration of test;

• some commonly reported figures (reliability coefficients) are:
  - split-half;
  - Cronbach’s alpha;

• calculated automatically by many statistical software packages.
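
For readers who want to see what such software computes, here is a minimal sketch of Cronbach's alpha using the usual textbook formula (number of items, sum of item variances, variance of total scores). The function name `cronbach_alpha` and the response matrix are hypothetical and are not taken from the slides.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a matrix with one row per test taker
    and one column per item."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    columns = list(zip(*item_scores))                  # scores grouped by item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical right/wrong (1/0) responses: 5 test takers, 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

The sketch uses sample variances throughout; the same kind of variance must be used for the items and for the totals.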

Split-half reliability:

• the test is split in half (e.g. odd / even) creating “equivalent forms”;

• the two “forms” are correlated with each other;

• the correlation coefficient is adjusted to reflect the entire test length.
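
The three steps above can be sketched as follows. The slides do not name the length adjustment; this sketch assumes the Spearman-Brown formula, which is the usual choice. All function names and the response matrix are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_brown(r_half):
    """Adjust a half-test correlation to the full test length
    (assumed adjustment: Spearman-Brown)."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores):
    """Odd/even split-half reliability; one row per test taker."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_totals, even_totals))

# Hypothetical right/wrong (1/0) responses: 5 test takers, 6 items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

reliability = split_half_reliability(responses)
print(round(reliability, 2))
```

For a positive half-test correlation, the adjusted figure is higher than the raw correlation, because each half is only half as long as the real test.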

Reliability coefficient:

• range: -1.0 (inverse relationship) to 0.0 (totally unreliable test) to 1.0 (perfectly reliable test);

• reliability coefficients estimate the proportion of systematic variance in the test scores;

• lower reliability coefficient = greater measurement error in the test score.
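
In classical test theory the last two points are usually written as a ratio of variances. The symbols $S_T^2$ (true-score variance) and $S_E^2$ (error variance) are introduced here for illustration and do not appear in the slides; $S_x^2$ is the variance of the observed scores.

```latex
S_x^2 = S_T^2 + S_E^2
\qquad\Longrightarrow\qquad
r_{xx'} = \frac{S_T^2}{S_x^2} = 1 - \frac{S_E^2}{S_x^2}
```

Read this way, a coefficient of 0.90 means that about 90% of the observed score variance is systematic and about 10% is measurement error.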

How high should reliability be?

(Pope n.d.)


Standard error of measurement (SEM):

• This allows us to take the score that the test taker obtained on the test (the observed score) and estimate what their true level of ability might be. We can never know the true score exactly, so our estimate must be a range of scores.

• Observed score.
• True score.

Maria’s scores:

50 50 49 52 50 51 49 48 50

True score = observed score +/- error

We would expect the student to score near the centre of the distribution most of the time.

The standard error of measurement (SEM) is the standard deviation of all those scores averaged across persons and test administrations.

(Brown, 2002)
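
As a rough illustration of this idea, the sketch below takes the nine scores listed for Maria and computes their centre and spread. This is the spread for a single person only; the SEM proper averages this kind of spread across persons and administrations. The variable names are hypothetical.

```python
from statistics import mean, pstdev

# Maria's scores as listed in the slides.
maria_scores = [50, 50, 49, 52, 50, 51, 49, 48, 50]

centre = mean(maria_scores)    # where her scores cluster
spread = pstdev(maria_scores)  # standard deviation of her repeated scores

print(round(centre, 2), round(spread, 2))
```

Her scores cluster just under 50, with a standard deviation of roughly one score point.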

$SEM = S_x \sqrt{1 - r_{xx'}}$

where:
$S_x$ – standard deviation of raw scores
$r_{xx'}$ – reliability coefficient
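
The formula is easy to apply in code. The sketch below is illustrative only: the function name and the input values (a standard deviation of 10 and a reliability of 0.91) are hypothetical and do not come from the slides.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = standard deviation of raw scores * sqrt(1 - reliability)."""
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    if reliability > 1.0:
        raise ValueError("reliability coefficient cannot exceed 1.0")
    return sd * sqrt(1.0 - reliability)

# Hypothetical test: raw-score standard deviation 10, reliability 0.91.
sem = standard_error_of_measurement(sd=10.0, reliability=0.91)
print(round(sem, 2))
```

With a perfectly reliable test (coefficient 1.0) the SEM is 0; with a coefficient of 0.0 the SEM equals the standard deviation of the raw scores.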

• 1 SEM = 68% confidence
• 2 SEM = 95% confidence
• 3 SEM = 99.7% confidence

Observed score = 50
SEM = 3

68%: from 47 to 53
95%: from 44 to 56
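
The same bands can be reproduced with a short sketch that adds and subtracts whole multiples of the SEM from the observed score. The function name `score_band` is hypothetical; the observed score and SEM are the values from the example.

```python
def score_band(observed, sem, n_sem):
    """Range of scores within n_sem standard errors of the observed score."""
    return (observed - n_sem * sem, observed + n_sem * sem)

observed_score = 50
sem = 3

# 1, 2 and 3 SEM correspond to roughly 68%, 95% and 99.7% confidence.
for n_sem, confidence in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = score_band(observed_score, sem, n_sem)
    print(f"{confidence}: from {low} to {high}")
```

By the same rule, the 99.7% band for this example runs from 41 to 59.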
