F. Kaftandjieva

F. Kaftandjieva. Terminology F. Kaftandjieva Milestones in Comparability 1904 “The proof and measurement of association between two things“ association.

Dec 17, 2015

Transcript

Page 5

Terminology

Page 6

Milestones in Comparability

1904: “The proof and measurement of association between two things”

association

Page 7

Milestones in Comparability

1904

1951: “Scores on two or more tests may be said to be comparable for a certain population if they show identical distributions for that population.”

comparable

population
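
The 1951 definition can be made concrete with a small check. The sketch below is only an illustration, not a procedure from the slides: it compares the empirical score distributions of two tests in the same population and reports the largest gap between their cumulative distributions. The function names and all scores are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function giving the proportion of `sample` at or below a score."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda score: bisect_right(ordered, score) / n


def max_distribution_gap(scores_x, scores_y):
    """Largest absolute difference between the two empirical CDFs."""
    cdf_x, cdf_y = ecdf(scores_x), ecdf(scores_y)
    points = sorted(set(scores_x) | set(scores_y))
    return max(abs(cdf_x(p) - cdf_y(p)) for p in points)


# Hypothetical scores of one population on three tests.
test_a = [12, 15, 15, 18, 20, 22, 25]
test_b = [12, 15, 15, 18, 20, 22, 25]  # same distribution as test_a
test_c = [s + 3 for s in test_a]       # every score shifted up by 3

gap_same = max_distribution_gap(test_a, test_b)     # identical distributions
gap_shifted = max_distribution_gap(test_a, test_c)  # distributions differ
```

A gap of zero means the two score distributions are identical for this sample, which is the condition the quotation names; any positive gap means the raw scores are not directly comparable in that sense.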

Page 8

Milestones in Comparability

1904

1951

1971: ‘Scales, norms, and equivalent scores’: Equating, Calibration, Comparability

Page 9

Milestones in Comparability

1904

1951

1971

1992, 1993

Page 10

Milestones in Comparability

1904

1951

1971

1992, 1993

1997, 2001

Page 11

Alignment

Alignment refers to the degree of match between test content and the standards

Dimensions of alignment: Content, Depth, Emphasis, Performance, Accessibility

Page 12

Alignment

Alignment is related to content validity

Specification (Manual – Ch. 4)

“Specification … can be seen as a qualitative method. … There are also quantitative methods for content validation but this manual does not require their use.” (p. 2)

24 pages of forms

Outcome: “A chart profiling coverage graphically in terms of levels and categories of CEF.” (p. 7)

Crocker, L. et al. (1989). Quantitative Methods for Assessing the Fit Between Test and Curriculum. In: Applied Measurement in Education, 2 (2), 179-194.

Page 13

Alignment (Porter, 2004)

[Figure: alignment example with the value 0.235; source: www.ncrel.org]
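
The slide gives only the value 0.235 and the citation, not the computation behind it. Assuming the value is Porter's alignment index, one minus half the summed absolute differences between two sets of content proportions, a minimal sketch looks like this. The function name and the proportions are hypothetical and do not reproduce the 0.235 on the slide.

```python
def porter_alignment_index(test_props, standards_props):
    """Alignment index: 1 - sum(|x_i - y_i|) / 2.

    Both arguments are cell proportions over the same content categories,
    each summing to 1. The result is 1.0 for identical profiles and 0.0
    for profiles with no overlap.
    """
    if len(test_props) != len(standards_props):
        raise ValueError("both profiles must cover the same categories")
    total_abs_diff = sum(abs(t - s) for t, s in zip(test_props, standards_props))
    return 1.0 - total_abs_diff / 2.0


# Hypothetical proportions of emphasis across three content categories.
test_profile = [0.5, 0.3, 0.2]
standards_profile = [0.2, 0.3, 0.5]

index = porter_alignment_index(test_profile, standards_profile)
```

With these made-up profiles the test over-weights the first category and under-weights the third, so the index falls below 1.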

Page 14

Milestones in Comparability

1904

1951

1971

1992, 1993

1997, 2001

Page 15

Mislevy & Linn: Linking Assessments

                         Construct   Instrument   Examinees   Moderator
Equating                 =           =            =           no
Calibration              =                        =           no
Projection                                        =           no
Statistical moderation                                        Other test
Social moderation                                             Judges

Equating    Linking

Page 16

The Good & The Bad in Calibration

Page 17

Model – Data Fit

Page 18

Model – Data Fit

Page 19

Model – Data Fit

Page 20

Sample-Free Estimation

[Figure: scatter plot for 301 items, Sub-sample A (x-axis) against Sub-sample B (y-axis), both axes from -2.0 to 2.0; r = .975]
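
The r = .975 reported for the 301 items is a correlation between estimates obtained from two sub-samples. As a rough sketch of that kind of check, assuming a plain Pearson correlation and using a handful of hypothetical difficulty values rather than the slide's data:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical difficulty estimates for the same 7 items,
# calibrated separately in two sub-samples of examinees.
difficulty_a = [-1.6, -0.9, -0.3, 0.1, 0.4, 1.2, 1.8]
difficulty_b = [-1.5, -1.0, -0.2, 0.0, 0.6, 1.1, 1.9]

r = pearson_r(difficulty_a, difficulty_b)
```

A correlation close to 1 means the items keep nearly the same relative difficulty whichever sub-sample they were calibrated on, which is what the slide title refers to.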

Page 21

The ruler (θ scale)

[Figure: scatter plot of b-values, OPLM (x-axis) against FACETS (y-axis), both axes from -2.0 to 2.0; r = +0.9998]

Page 22

The ruler (θ scale)

[Figure: two θ scales, each marked from -3 to 3]

Page 23

The ruler (θ scale)

[Figure: two θ scales, each marked from -3 to 3]

Page 24

The ruler (θ scale)

[Figure: Celsius scale (-300 to 150) set against Fahrenheit scale (-500 to 300); marked points: absolute zero, boiling water]

Page 25

The ruler (θ scale)

F° = 1.8 * C° + 32

C° = (F° – 32) / 1.8
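
The two formulas are one linear transformation and its inverse, which is the point of the ruler analogy: the same quantity can be read on either scale. A minimal sketch follows; the function names are chosen here for illustration.

```python
def celsius_to_fahrenheit(celsius):
    """F = 1.8 * C + 32"""
    return 1.8 * celsius + 32


def fahrenheit_to_celsius(fahrenheit):
    """C = (F - 32) / 1.8"""
    return (fahrenheit - 32) / 1.8


# The two reference points marked on the slide.
boiling_water_f = celsius_to_fahrenheit(100.0)    # boiling water, 100 °C
absolute_zero_f = celsius_to_fahrenheit(-273.15)  # absolute zero, -273.15 °C

# Converting there and back returns the starting value (up to rounding).
round_trip = fahrenheit_to_celsius(celsius_to_fahrenheit(37.0))
```

Because the transformation is linear, the order of temperatures and the ratios of differences between them are the same on both scales; only the origin and the unit change.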

Page 26

Mislevy & Linn: Linking Assessments

                         Construct   Instrument   Examinees   Moderator
Equating                 =           =            =           no
Calibration              =                        =           no
Projection                                        =           no
Statistical moderation                                        Other test
Social moderation                                             Judges

Page 27

Standard Setting

Page 28

The Ugly

Page 29

Fact 1:

Human judgment is the epicenter of every standard-setting method.

Berk, 1995

Page 30

When Ugliness turns to Beauty

Page 31

When Ugliness turns to Beauty

Page 32

Fact 2:

The cut-off points on the latent continuum do not possess any objective reality outside and independently of our minds. They are mental constructs, which can differ within different persons.

Page 33

Consequently:

Whether the levels themselves are set at the proper points is a most contentious issue and depends on the defensibility of the procedures used for determining them.

Messick, 1994

Page 34

Defensibility

Page 35

Defensibility: Claims vs. Evidence

National Standards: Understands manuals for devices used in their everyday life

CEF – A2: Can understand simple instructions on equipment encountered in everyday life – such as a public telephone (p. 70)

Page 36

Defensibility: Claims vs. Evidence

Cambridge ESOL
DIALANG
Finnish Matriculation
CIEP (TCF)
CELI Università per Stranieri di Perugia
Goethe-Institut
TestDaF Institut
WBT (Zertifikat Deutsch)

Page 37

Defensibility: Claims vs. Evidence

Common Practice (Buckendahl et al., 2000)

External evaluation of the alignment of 12 tests by 2 publishers

Publisher reports:
  No description of the exact procedure followed
  Reports include only the match between items and standards

Evaluation study:
  At least 10 judges per test

Comparison results:
  % of agreement: 26% - 55%
  Overestimation of the match by test publishers
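
A percentage of agreement like the 26% - 55% range reported for this study can be computed item by item. The sketch below assumes a simple definition, the share of items for which the publisher's claimed standard equals the standard chosen by the judges; the study's exact procedure is not described on the slide, and the item and standard labels are hypothetical.

```python
def percent_agreement(publisher_matches, judge_matches):
    """Percentage of items where the publisher and the judges name the same standard.

    Both arguments map an item ID to the standard it is claimed to measure.
    Items missing from `judge_matches` count as disagreements.
    """
    agreed = sum(
        1
        for item, standard in publisher_matches.items()
        if judge_matches.get(item) == standard
    )
    return 100.0 * agreed / len(publisher_matches)


# Hypothetical item-to-standard matches for a 5-item test.
publisher = {"item1": "S1", "item2": "S1", "item3": "S2", "item4": "S3", "item5": "S3"}
judges = {"item1": "S1", "item2": "S2", "item3": "S2", "item4": "S1", "item5": "S2"}

agreement = percent_agreement(publisher, judges)
```

Here the two sources agree on 2 of 5 items. Summarising several judges into one match per item (for example by majority vote) is a further design choice that this sketch leaves out.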

Page 38

Standard 1.7: When a validation rests in part on the opinions or decisions of expert judges, observers or raters, procedures for selecting such experts and for eliciting judgments or ratings should be fully described. The description of procedures should include any training and instruction provided, should indicate whether participants reached their decisions independently, and should report the level of agreement reached. If participants interacted with one another or exchanged information, the procedures through which they may have influenced one another should be set forth.

Standards for educational and psychological testing, 1999

Page 39

Evaluation Criteria

Hambleton, R. (2001). Setting Performance Standards on Educational Assessments and Criteria for Evaluating the Process. In: Setting Performance Standards: Concepts, Methods and Perspectives. Ed. by Cizek, G., Lawrence Erlbaum Ass., 89-116.

A list of 20 questions as evaluation criteria:
  Planning & Documentation: 4 (20%)
  Judgments: 11 (55%)
  Standard Setting Method: 5 (25%)

Page 40

Judges

Because standard-setting inevitably involves human judgment, a central issue is who is to make these judgments, that is, whose values are to be embodied in the standards.

Messick, 1994

Page 41

Selection of Judges

The judges should have the right qualifications, but some other criteria such as occupation, working experience, age, sex may be taken into account, because ‘… although ensuring expertise is critical, sampling from relevant different constituencies may be an important consideration if the testing procedures and passing scores are to be politically acceptable’ (Maurer & Alexander, 1992).

Page 42

Number of Judges

Livingston & Zieky (1982) suggest that the number of judges should be no fewer than 5.

Based on court cases in the USA, Biddle (1993) recommends using 7 to 10 Subject Matter Experts in the Judgement Session.

As a general rule, Hurtz & Hertz (1999) recommend sampling 10 to 15 raters.

According to the Manual (p. 94), 10 judges is the minimum number.

Page 43

Training Session

The weakest point

How much? Until it hurts (Berk, 1995)

Main focus: Intra-judge consistency

Evaluation forms (Hambleton, 2001)

Feedback

Page 44

Training Session: Feedback Form

[Figure: Inter-judge Consistency; Consistency (y-axis, 0.80 to 1.00) by Experts' ID (x-axis)]

Page 45

Training Session: Feedback Form

[Figure: Intra-Judge Consistency: Expert 13; Level (y-axis, 1 to 6) against Item Difficulty (x-axis, -3 to 2)]
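
The slides do not say which index is used for intra-judge consistency. One plausible choice, assumed here purely for illustration, is the rank correlation between the level a judge assigns to each item and the item's empirical difficulty: a consistent judge places harder items at higher levels. All values below are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Hypothetical data for one judge: empirical item difficulties and the
# level (1 to 6) the judge assigned to each of the same items.
item_difficulty = [-2.1, -1.4, -0.8, -0.2, 0.3, 0.9, 1.5]
judged_level = [1, 2, 2, 3, 4, 4, 6]

consistency = spearman_rho(item_difficulty, judged_level)
```

A value near 1 would indicate that the judge's level assignments follow the empirical ordering of the items; feeding such a value back to each judge during training is the kind of use the feedback form suggests.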

Page 46

Standard Setting Method

Good Practice:
  The most appropriate
  Due diligence
  Field tested
  Reality check
  Validity evidence
  More than one

Page 47

Standard Setting Method

Probably the only point of agreement among standard-setting gurus is that there is hardly any agreement between results of any two standard-setting methods, even when applied to the same test under seemingly identical conditions.

Berk, 1995

Page 48

Standard Setting Methods

[Figure: Language Proficiency (y-axis, -1.0 to 1.0); x-axis labels: CG, BG, A3, A2, A1, A0; legend: Test-centered methods, Examinee-centered methods]

He that increaseth knowledge increaseth sorrow. (Ecclesiastes 1:18)

Page 49

[Figure: Pass Rate (y-axis, 0% to 100%) by year (x-axis, 1999 to 2004)]

He that increaseth knowledge increaseth sorrow. (Ecclesiastes 1:18)

Page 50

Instead of Conclusion

In sum, it may seem that providing valid grounds for valid inferences in standards-based educational assessment is a costly and complicated enterprise. But when the consequences of the assessment affect accountability decisions and educational policy, this needs to be weighed against the costs of uninformed or invalid inferences.

Messick, 1994

Page 51

Instead of Conclusion

The chief determiner of performance standards is not truth; it is consequences.

Popham, 1997

Page 52

Instead of Conclusion

Perhaps by the year 2000, the collaborative efforts of measurement researchers and practitioners will have raised the standard on standard-setting practices for this emerging testing technology.

Berk, 1996
