Principles of Language Assessment: Test Usefulness. Course: Testing. Bachman & Palmer, Ch. 2. Presenter: Sara
Page 1: Test Usefulness

Principles of Language Assessment:

Test Usefulness

Course: Testing

Bachman & Palmer, Ch. 2

Presenter: Sara

Page 2: Test Usefulness

The most important quality of a test is its usefulness.

But:

- What makes a test useful?

- How do we know a test will be useful before we use it?

- Or that it has been useful after we have used it?

Page 3: Test Usefulness

Simply using a test does not make it useful!

A model of test usefulness has been proposed that includes six test qualities.

Page 4: Test Usefulness

Model of test usefulness

Reliability

Construct validity

Authenticity

Interactiveness

Impact

Practicality

Page 5: Test Usefulness

Usefulness = Reliability + Construct validity + Authenticity + Interactiveness + Impact + Practicality

Page 6: Test Usefulness

This model, along with the three principles, provides a basis for answering this question:

“How useful is this particular test for its intended purpose(s)?”

Page 7: Test Usefulness

It is the overall usefulness of the test that is to be maximized, rather than the individual qualities that affect usefulness.

Page 8: Test Usefulness

The individual test qualities cannot be evaluated independently, but must be evaluated in terms of their combined effect on the overall usefulness of the test.

Page 9: Test Usefulness

Test usefulness & the appropriate balance among the different qualities cannot be prescribed in general, but must be determined for each specific testing situation.

Page 10: Test Usefulness

Therefore, in order to be useful, any given lg. test must be developed with a specific purpose, a particular group of test takers and a specific lg. use domain in mind.

This domain is the “target lg. use” (TLU) domain; the tasks in the TLU domain are “TLU tasks”.

Page 11: Test Usefulness

1 RELIABILITY

- Reliability is often defined as consistency of measurement.

[Figure: Reliability as the correspondence between scores on test tasks with characteristics A and scores on test tasks with characteristics A′]

- It is not possible to eliminate inconsistencies entirely. What we can do is try to minimize the potential sources of inconsistency.
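One common way to put a number on "consistency of measurement" is to correlate the scores the same test takers obtain on two comparable sets of tasks (A and A′). The sketch below is only an illustration under that assumption: the slides do not prescribe a particular statistic, and the function name `pearson` and the score lists are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for the correlation to be defined")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for the same five test takers on task sets A and A'
scores_a = [12, 15, 9, 18, 14]
scores_a_prime = [11, 16, 10, 17, 13]

r = pearson(scores_a, scores_a_prime)
print(f"Estimated consistency between A and A': {r:.2f}")
```

A value close to 1 suggests that the two task sets rank test takers in much the same way. A lower value points to inconsistency whose sources would need to be investigated.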

Page 12: Test Usefulness

2 CONSTRUCT VALIDITY

- Construct validity pertains to the meaningfulness & appropriateness of the interpretations that we make on the basis of test scores.

- The term construct validity is used to refer to the extent to which we can interpret a given test score as an indicator of the ability(ies), or construct(s), we want to measure with respect to a specific domain of generalization.

Page 13: Test Usefulness

[Figure: Score interpretation (inferences about lg. ability (construct definition); domain of generalization), TEST SCORE, lg. ability, and characteristics of the test task, with links labeled construct validity, authenticity and interactiveness]

Page 14: Test Usefulness

3 AUTHENTICITY

[Figure: Authenticity as the correspondence between the characteristics of the TLU task and the characteristics of the test task]

- We define authenticity as the degree of correspondence of the characteristics of a given lg. test task to the features of a TLU task.

Authenticity is important, because:

1- It provides a link between test performance & the TLU tasks & domain to which we want to generalize.

2- The way test takers perceive the relative authenticity of test tasks can facilitate their test performance.

Page 15: Test Usefulness

4 INTERACTIVENESS

- We define interactiveness as the extent & the type of involvement of the test taker’s individual characteristics in accomplishing a test task.

- Unlike authenticity, interactiveness resides in the interaction between the individual (test taker or lg. user) & the task (test or TLU).

Page 16: Test Usefulness

[Figure: Interactiveness. Language ability (lg. knowledge, metacognitive strategies), topical knowledge and affective schemata, interacting with the characteristics of the lg. test task]

Page 17: Test Usefulness

Example 1

Typists who perform certain typing tasks in English very well might simply be copying the letters & words, without processing the document as a piece of discourse. Therefore:

Authenticity: High

Interactiveness: Low

Page 18: Test Usefulness

Example 2

Typists are tested on carrying on “small talk” about food, clothing, etc.

Authenticity: Low (Lack of relevance of the test task to the TLU task.)

Interactiveness: High (Test takers have a reasonable amount of control in selecting topics & influencing the structure of the interaction.)

Page 19: Test Usefulness

Example 3

International students entering an American university were given a test of English vocabulary in which they matched the words in one column to the meanings in another.

Authenticity: Low (Few domains involve this kind of task.)

Interactiveness: Low (Highly restricted involvement of lg. knowledge.)

Page 20: Test Usefulness

Example 4

A face-to-face role play between a salesperson and a customer.

Authenticity: High (Correspondence between the characteristics of the TLU domain and those of the test task.)

Interactiveness: High (High level of involvement of all the areas of lg. knowledge & the test taker’s topical knowledge.)

Page 21: Test Usefulness

POINTS TO REMEMBER

1- Both authenticity & interactiveness are relative.

2- Three types of characteristics must be considered: those of test takers, TLU task & test task.

3- Certain test tasks may be relatively useful, even though they are low in authenticity or interactiveness.

4- In designing or analyzing tests, our estimates of authenticity & interactiveness are only guesses.

5- The minimum acceptable levels that we specify for authenticity & interactiveness will depend on the specific testing situation.

Page 22: Test Usefulness

5 IMPACT

- Another quality of tests is their impact on society & educational systems. The impact of test use operates at two levels:

a micro level: individuals who are affected by the particular test use.

a macro level: the educational system or society.

Page 23: Test Usefulness

WASHBACK

“the effect of testing on teaching & learning.” (Hughes, 1989)

“how assessment instruments affect educational practices & beliefs.” (Cohen, 1994)

Page 24: Test Usefulness

Washback

Impact on individuals

A) test takers

B) teachers

Impact on society & educational system

Page 25: Test Usefulness

A) IMPACT ON TEST TAKERS

Test takers can be affected by three aspects of the testing procedure:

- the experience of taking &, in some cases, of preparing for the test (the test taker’s perception of the TLU domain, his areas of lg. knowledge & his use of strategies),

- the feedback they receive about their performance on the test,

- the decisions that may be made about them on the basis of their test scores.

Page 26: Test Usefulness

B) IMPACT ON TEACHERS

If teachers find that they have to use a specified test, they may find “teaching to the test” almost unavoidable.

This term implies doing something in teaching that may not be compatible with teachers’ own values & goals, or with the values & goals of the instructional program.

One way to minimize the potential for negative impact on instruction is to change the way we test.

Page 27: Test Usefulness

6 PRACTICALITY

While the other five qualities pertain to the uses that are made of test scores, practicality pertains primarily to the ways in which the test will be implemented, &, to a large degree, whether it will be developed & used at all. Thus, a practical test is one whose design, development & use do not require more resources than are available.

Page 28: Test Usefulness

Thus, determining the practicality of a given test involves the consideration of:

- the resources that will be required to develop an operational test that has the balance of qualities we want; &

- the allocation & management of the resources that are available.

Practicality = Available resources / Required resources

If practicality ≥ 1, the test development & use is practical.

If practicality < 1, the test development & use is not practical.
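The practicality ratio can be worked through in a few lines. This is a minimal sketch, reading the threshold as practicality ≥ 1 (available resources cover required resources). The function names and the rater-hour figures are hypothetical, and real resources (human, material, time) would need to be expressed in a common unit before such a ratio means anything.

```python
def practicality(available_resources: float, required_resources: float) -> float:
    """Return available resources divided by required resources."""
    if required_resources <= 0:
        raise ValueError("required_resources must be positive")
    return available_resources / required_resources


def is_practical(available_resources: float, required_resources: float) -> bool:
    """Treat test development & use as practical when the ratio is at least 1."""
    return practicality(available_resources, required_resources) >= 1


# Hypothetical figures: rater-hours available vs. rater-hours the test needs
print(is_practical(available_resources=120, required_resources=100))
print(is_practical(available_resources=80, required_resources=100))
```

With 120 hours available against 100 required, the ratio is above 1 and the test counts as practical. With only 80 hours available the ratio drops below 1, so either more resources must be found or the test redesigned.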

Page 29: Test Usefulness

Types of Resources

1- Human resources (e.g. test writers, scorers or raters, test administrators & technical support)

2- Material resources

a) Space (e.g. rooms for test development)

b) Equipment (e.g. typewriters, computers)

c) Materials (e.g. paper, pictures)

3- Time

a) Time for specific tasks (designing, writing, analyzing)

b) Development time

Page 30: Test Usefulness