Developing and Evaluating Performance-Based Assessments: Best Practices and Lessons Learned from an Online Chinese Course
Katharine B. Nielson and Megan C. Masters
Language Flagship: Results 2012, Friday, October 26, 2012, New York City, NY

Oct 05, 2020



Outline

• Task-Based Language Assessment

• Research questions/methodology

• Results of empirical study

• Implications for classroom teachers


Task-Based Language Teaching

Framework for structuring, teaching, and assessing courses (Ellis, 2003; Long, 1985; Long & Crookes, 1993; Long & Norris, 2000; Norris, 2009; Skehan, 1998)

– Conduct a Needs Analysis (Long, 2005)

– Sequence course in terms of tasks (Robinson, 2001; Skehan, 1998)

– Promote learning by doing (Doughty & Long, 2003)

– Focus on form (Long, 1991; Long & Robinson, 1998)

– Use task as unit of analysis in assessments (Norris, 2002; Norris, 2009)

Task-Based Language Assessment (TBLA)

Performance-based, construct-based assessment, or combination?

• Performance-referenced assessment can be appropriate (Mislevy et al., 2002; Norris, 2002; Robinson & Ross, 1996)

• Performance-based assessment cannot stand alone, and TBLT courses should include construct assessment (Bachman, 2002)

TBLA

Identify target tasks

Identify subtasks essential for task accomplishment

Specify criterial levels for each subtask, defining minimal evidence for task completion

Develop rubrics
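The four TBLA steps on this slide can be pictured as a small data model. The following is a minimal sketch only, not the authors' actual rubric format: the class names, the `build_rubric` helper, and the subtask wording are all hypothetical, and the 1-5 rating range is taken from the Likert scale mentioned later in the deck.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Subtask:
    """One subtask essential for accomplishing the target task."""
    description: str
    minimal_evidence: str  # criterial level: what counts as completion


@dataclass
class TargetTask:
    """A target task, broken into its essential subtasks."""
    name: str
    subtasks: List[Subtask] = field(default_factory=list)


def build_rubric(task: TargetTask, ratings: List[int]) -> List[Dict[str, object]]:
    """Produce one rubric row per subtask, all sharing the same rating scale."""
    return [
        {
            "task": task.name,
            "subtask": s.description,
            "criterion": s.minimal_evidence,
            "allowed_ratings": list(ratings),
        }
        for s in task.subtasks
    ]


# Hypothetical content: the subtask wording is invented for illustration.
trip = TargetTask(
    name="Arranging a Trip",
    subtasks=[
        Subtask("Clarify travel dates", "Resolves the dates without interlocutor repair"),
        Subtask("Confirm the booking", "Restates the agreed arrangement accurately"),
    ],
)
rubric = build_rubric(trip, ratings=[1, 2, 3, 4, 5])
for row in rubric:
    print(row["subtask"], "|", row["criterion"])
```

Keeping the criterion text next to each subtask makes it harder to rate a subtask without a stated standard of minimal evidence.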


Empirical Study

• What is the relationship between language performance and task accomplishment?

• How well do the rubrics (subtasks and success criteria) measure learner performance?

• How well does the rating scale work?

• Do rater differences affect scoring?

Research Setting

Yearlong, online, task-based Chinese course

35 post-STARTALK high school students

College-level intermediate course (CHIN 201)

3 college credits over 2 semesters


Methodology


Needs Analysis

Test Development

Test Administration

Rasch Analysis

Test Evaluation


Sample Rubric


Correlation Analysis


Multi-Faceted Rasch Analysis

• Person ability, item difficulty and rater severity converted to logit (log-odds) metrics

• Allows for direct comparisons of outcomes

• Consistency of person, item and rater calibration

• Visual examination of task item difficulty relative to person ability estimates

• Use of Likert scale
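Putting person ability, item difficulty and rater severity on one logit metric means that, in a many-facet rating scale model, the log-odds of receiving category k rather than k-1 is ability minus difficulty minus severity minus the step threshold F_k. The sketch below illustrates that relationship only; it is not the estimation routine used by the Facets program, and every numeric value in it is hypothetical.

```python
import math
from typing import List


def category_probabilities(ability: float, difficulty: float,
                           severity: float, thresholds: List[float]) -> List[float]:
    """Probability of each rating category, with all inputs in logits.

    thresholds[k-1] is F_k, the step from category k-1 up to category k,
    so len(thresholds) + 1 probabilities are returned.
    """
    theta = ability - difficulty - severity
    # Cumulative log-numerators; the lowest category is the reference (0.0).
    log_num = [0.0]
    for f in thresholds:
        log_num.append(log_num[-1] + (theta - f))
    peak = max(log_num)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rating(probs: List[float], lowest: int = 1) -> float:
    """Expected rating when the lowest category is labelled `lowest`."""
    return sum((lowest + k) * p for k, p in enumerate(probs))


# Hypothetical values: the same learner and item, scored by two raters.
thresholds = [-1.5, 0.0, 1.5]
lenient = category_probabilities(1.0, 0.0, -0.5, thresholds)
severe = category_probabilities(1.0, 0.0, 0.5, thresholds)
print(round(expected_rating(lenient), 2), round(expected_rating(severe), 2))
```

Because severity enters the model the same way item difficulty does, a more severe rater lowers the expected rating for the same learner and item, which is why rater severity can be compared directly with the other two facets.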


“Arranging a Trip” Multi-faceted Output (n=19)

Raters do not exhibit substantial differences in severity

Majority of learners have ability estimates higher than most difficult tasks

Rubric does not adequately measure learners with ability estimates > 2.5 logits

Item redundancy
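Two of these observations, learners sitting above the hardest item and item redundancy, can be checked from the calibrated logit values alone. The sketch below assumes the estimates have already been exported from a Rasch program; the function names, the 0.1-logit tolerance, and all values are hypothetical, and treating near-identical difficulties as redundancy is one common reading rather than the authors' stated criterion.

```python
from typing import Dict, List, Tuple


def targeting_summary(abilities: List[float],
                      difficulties: Dict[str, float]) -> Dict[str, float]:
    """Compare the spread of person abilities with item difficulties."""
    hardest = max(difficulties.values())
    above = [a for a in abilities if a > hardest]
    mean_ability = sum(abilities) / len(abilities)
    mean_difficulty = sum(difficulties.values()) / len(difficulties)
    return {
        "hardest_item": hardest,
        "share_above_hardest": len(above) / len(abilities),
        "mean_gap": mean_ability - mean_difficulty,
    }


def redundant_pairs(difficulties: Dict[str, float],
                    tol: float = 0.1) -> List[Tuple[str, str]]:
    """Neighbouring items whose difficulties differ by at most `tol` logits."""
    items = sorted(difficulties.items(), key=lambda kv: kv[1])
    return [(a, b) for (a, da), (b, db) in zip(items, items[1:]) if db - da <= tol]


# Hypothetical estimates in logits, invented for illustration.
abilities = [0.8, 1.9, 2.6, 3.1, 3.4]
difficulties = {"subtask_a": -1.2, "subtask_b": -0.3,
                "subtask_c": -0.25, "subtask_d": 1.5}

summary = targeting_summary(abilities, difficulties)
pairs = redundant_pairs(difficulties)
print(summary["share_above_hardest"], pairs)
```

A large share of learners above the hardest item, together with a positive mean gap, suggests that the rubric needs harder subtasks before it can separate the strongest learners.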

Probability Curves: Example

F-thresholds (F1, F2, F3)

Uniform probability curves indicate equal-interval scale

Distinct portion of underlying construct of interest

Important for parametric analyses
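One way to read probability curves without the plot is to check the F-thresholds directly: evenly spaced, ordered thresholds give each category its own stretch of the ability scale, while a negative gap means some category is never the most likely rating. This is a hedged sketch under a rating scale model; the threshold values, the ability range, and the function names are assumptions for illustration, and categories are indexed from 0 rather than by their rating labels.

```python
import math
from typing import List, Set


def category_probs(theta: float, thresholds: List[float]) -> List[float]:
    """Category probabilities at location `theta` (logits) for given F-thresholds."""
    log_num = [0.0]
    for f in thresholds:
        log_num.append(log_num[-1] + (theta - f))
    peak = max(log_num)
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def threshold_gaps(thresholds: List[float]) -> List[float]:
    """Distances between successive thresholds; equal gaps suggest equal intervals."""
    return [b - a for a, b in zip(thresholds, thresholds[1:])]


def never_modal(thresholds: List[float], lo: float = -6.0,
                hi: float = 6.0, steps: int = 241) -> Set[int]:
    """Categories (0-indexed) that are never the most probable on [lo, hi]."""
    seen = set()
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        probs = category_probs(theta, thresholds)
        seen.add(probs.index(max(probs)))
    return set(range(len(thresholds) + 1)) - seen


# Hypothetical thresholds: one evenly spaced set, one disordered set.
even = [-2.0, 0.0, 2.0]
disordered = [-1.0, 1.0, 0.0]
print(threshold_gaps(even), never_modal(even))
print(threshold_gaps(disordered), never_modal(disordered))
```

A category that is never modal does not mark out a distinct portion of the construct, which is the situation the later probability-curve slides describe for some rating categories.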

“Arranging a Trip” Probability Curves

Learners most likely to be given a rating of 2 or 3

Learners least likely to be given a rating of 4

Absence of rating 1

Not representative of interval scale

F-thresholds shown: F1, F2, F3

“Buying Something” Multi-faceted Output (n=22)

Raters do not exhibit substantial differences in severity

Majority of learners have ability estimates higher than most difficult tasks

Rubric does not adequately measure learners with ability estimates > 3.5 logits

Item redundancy

“Buying Something” Probability Curves

Learners most likely to be given a rating of 2 or 3

Learners least likely to be given a rating of 4

Idiosyncratic use of rating scale

F-thresholds shown: F1, F2, F3, F4

“End-of-Course” Multi-faceted Output (n=20)

Raters do not exhibit substantial differences in severity

Majority of learners have ability estimates higher than most difficult tasks

Rubric does not adequately measure learners with ability estimates > 2.0 logits

Item redundancy

“End-of-Course” Probability Curves

Uniform use of rating scale

Placement of learners proportional to range of learners’ ability estimates

Equal-interval rating scale

F-thresholds shown: F1, F2, F3

Results of Rasch Analysis

• Commonalities among easy and difficult subtasks across modules

– Clarifying information was difficult

– Confirming information was easy

• Overall, assessment items were too easy for learners

• Likert scale could be reduced from 1-5 to 1-3
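Reducing the scale from 1-5 to 1-3 amounts to recoding existing ratings before re-running the analysis. The slides do not say which categories would be merged, so the mapping below is purely an assumption for illustration, as are the function names and the sample ratings.

```python
from typing import Dict, Iterable, List

# Assumed mapping only: the deck does not specify how categories combine.
COLLAPSE_5_TO_3: Dict[int, int] = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}


def collapse_ratings(ratings: List[int],
                     mapping: Dict[int, int] = COLLAPSE_5_TO_3) -> List[int]:
    """Recode ratings onto the shorter scale, rejecting out-of-range values."""
    unknown = sorted(set(ratings) - set(mapping))
    if unknown:
        raise ValueError(f"ratings outside the original scale: {unknown}")
    return [mapping[r] for r in ratings]


def category_counts(ratings: List[int], categories: Iterable[int]) -> Dict[int, int]:
    """How often each category was used, including categories never used."""
    return {c: ratings.count(c) for c in categories}


# Hypothetical ratings for one subtask, invented for illustration.
original = [2, 3, 3, 2, 4, 3, 2]
recoded = collapse_ratings(original)
print(category_counts(original, range(1, 6)))
print(category_counts(recoded, range(1, 4)))
```

Counting category use before and after recoding shows whether the shorter scale removes the empty or rarely used categories, and any recoded scale would still need to be checked by re-running the Rasch analysis.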

Conclusions

• More subtasks are needed for a nuanced picture of learner abilities

• Important to take rater severity into account when using criterion-referenced performance-based assessments (PBAs)

• More and clearer criteria are needed

• Future iterations of Likert rating scale should be accompanied by category definitions

Implications for Classroom

Performance-based assessment can offer information in addition to standardized proficiency measures

Teachers can use Rasch analysis to iteratively develop and validate their own tools

Rasch analysis can reveal issues with testing instruments and with rater severity estimates

Practical Considerations

• These assessments were developed for online instruction

– More than one rater could be present

– Fluent interlocutors were not limited by physical constraints

– Tasks needed to be adapted

– Technological constraints affected assessments


Questions?


Thank you!

• Special thanks to the Associate Directorate of Education and Training (ADET)

• …and to Dr. Der-lin Chao, Dr. Tamara Green and the many talented graduate students who collaborated with us on this project

References

Bachman, L. (2002). Some reflections on task-based language performance assessment. Language Testing, 19, 453-476.

Doughty, C., & Long, M. (2003). Optimal psycholinguistic environments for distance foreign language learning. Language Learning & Technology, 7(3), 50-80.

Fleming, S., & Hiple, D. (2004). Distance education to distributed learning: Multiple formats and technologies in language instruction. CALICO Journal, 22(1), 63-82.

Linacre, J. M. (1996). Facets, version no. 3.0. Chicago: MESA.

Long, M. (1985). A role for instruction in second language acquisition: Task-based language teaching. In K. Hyltenstam & M. Pienemann (Eds.), Modelling and Assessing Second Language Acquisition (pp. 77-99). Clevedon: Multilingual Matters.

Long, M., & Crookes, G. (1993). Units of analysis in syllabus design: The case for task. In G. Crookes & S. Gass (Eds.), Tasks in a pedagogical context: Integrating theory and practice (pp. 9-54). Clevedon: Multilingual Matters.

Long, M. H., & Norris, J. M. (2000). Task-based teaching and assessment. In M. Byram (Ed.), Encyclopedia of language teaching (pp. 597-603). London: Routledge.


Long, M., & Robinson, P. (1998). Focus on form: theory, research, and practice. In C. Doughty & J. Williams (Eds.), Focus on Form in Classroom Second Language Acquisition (pp. 15-41). New York: Cambridge University Press.

Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2002). Design and analysis in task-based language assessment. Language Testing, 19(4), 477-496.

Norris, J. (2002). Interpretations, intended uses and designs in task-based language assessment. Language Testing, 19(4), 337-346.

Norris, J. (2009). Task-based teaching and testing. In M. H. Long & C. J. Doughty (Eds.), Handbook of language teaching (pp. 578-594). Oxford: Blackwell.

Robinson, P., & Ross, S. (1996). The development of task-based assessment in English for academic purposes contexts. Applied Linguistics, 17(3), 455-476.

Skehan, P. (1998). A framework for the implementation of task-based instruction. Applied Linguistics, 17, 38-62.