Examining Rubric Design and Inter-rater Reliability: A Fun Grading Project
Presented at the Third Annual Association for the Assessment of Learning in Higher Education (AALHE) Conference, Lexington, Kentucky, June 3, 2013
Dr. Yan Zhang Cooksey, University of Maryland University College
Outline of Today's Presentation
• Background and purposes of the full-day grading project
• Procedural methods of the project
• Results and decisions informed by the assessment findings
• Lessons learned through the process
Purposes of the Full-day Grading Project
• To simplify the current assessment process
• To validate the newly developed common rubric measuring four core student learning areas (written communication, critical thinking, technology fluency, and information literacy)
UMUC Graduate School Previous Assessment Model: 3-3-3 Model
Previous Assessment Model: 3-3-3 Model (Cont.)
Strengths:
• Tested rubrics
• Reasonable collection points
• Larger samples, more data for analysis

Weaknesses:
• Added faculty workload
• Lack of consistency in assignments
• Variability in applying scoring rubrics
C2 Model: Common activity & Combined rubric
Compare 3-3-3 Model to (new) C2 Model

Current 3-3-3 Model:
• Multiple rubrics: one for each of the 4 SLEs
• Multiple assignments across the graduate school
• One to multiple courses per 4 SLEs
• Multiple raters for the same assignment/course
• Untrained raters

Combined Activity/Rubric (C2) Model:
• Single rubric for all 4 SLEs
• Single assignment across the graduate school
• Single course for all 4 SLEs
• Same raters per assignment/course
• Trained raters
Procedural Methods of the Grading Project
• Data source
• Rubric
• Experimental design for data collection
• Inter-rater reliability
Procedural Methods of the Grading Project (Cont.)
• Data source: student papers (redacted)

Course name   # of Papers
BTMN9040      27
BTMN9041      29
BTMN9080      7
DETC630       9
MSAF670       20
MSAS670       13
TMAN680       16
Total         121
Procedural Methods of the Grading Project (Cont.)
• Common assignment
• Rubric (rubric design and refinement)
• 18 raters (faculty members)
Procedural Methods of the Grading Project (Cont.)
• Experimental design for data collection
  - Randomized trial (Groups A and B)
  - Raters' norming and training
  - Grading instruction
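The slides do not show how the randomized trial was set up. A minimal sketch of one plausible way to split the 121 papers into two groups at random (the seed and variable names are assumptions, not part of the project):

```python
import random

# Hypothetical IDs for the 121 redacted student papers
paper_ids = list(range(1, 122))

# Fixed seed so the split is reproducible across runs
rng = random.Random(2013)
rng.shuffle(paper_ids)

# Assign the first half to Group A (experiment), the rest to Group B (control)
group_a, group_b = paper_ids[:61], paper_ids[61:]
print(len(group_a), len(group_b))  # 61 60
```

Any comparable shuffling scheme works; the key design point on the slide is that assignment to groups was randomized before rater norming and training.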
Procedural Methods of the Grading Project (Cont.)
• Inter-rater reliability (literature): Stemler (2004) notes that in any situation involving judges (raters), the degree of inter-rater reliability is worth investigating, as it has significant implications for the validity of the subsequent study results.
• Intraclass correlation coefficients (ICC) were used in this study.
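The deck does not show the ICC computation itself. A minimal from-scratch sketch of ICC(2,1), the two-way random-effects, absolute-agreement, single-rater coefficient described by McGraw & Wong (1996), applied to hypothetical rubric scores (the function name and sample data are illustrative assumptions):

```python
import numpy as np

def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `ratings` is an (n subjects x k raters) array of scores."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-paper means
    col_means = ratings.mean(axis=0)   # per-rater means
    # Sums of squares for subjects (rows), raters (columns), and error
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_err = np.sum((ratings - grand) ** 2) - ss_rows - ss_cols
    # Mean squares
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical rubric scores: 6 papers, 2 raters, 1-4 scale
scores = [[3, 3], [4, 4], [2, 3], [3, 3], [1, 2], [4, 4]]
print(round(icc2_1(scores), 3))  # 0.833
```

Values near 1 indicate strong absolute agreement between raters; values near 0 suggest the rubric is being applied inconsistently.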
Results and Findings
• Two-sample t-test

Group Statistics (Differ_Rater1and2)
Group                        N     Mean   Std. Deviation   Std. Error Mean
Group A (Experiment Group)   483   .249   1.0860           .0494
Group B (Control Group)      540   .024   1.2463           .0536
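The Independent Samples Test table itself did not survive extraction. As a rough check, a t statistic can be recomputed from the group statistics above using the unequal-variances (Welch) form; whether the slide reported this variant or the pooled one is an assumption:

```python
import math

# Group statistics from the slide: mean, standard error, and N
# of the rater-1 vs. rater-2 score differences in each group
mean_a, se_a, n_a = 0.249, 0.0494, 483   # Group A: experiment
mean_b, se_b, n_b = 0.024, 0.0536, 540   # Group B: control

# Welch's t for two independent samples, from summary statistics
t = (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)
print(round(t, 2))  # 3.09
```

A t statistic of roughly 3.1 with samples this large corresponds to a small p-value, consistent with a real difference in rater agreement between the trained (experiment) and untrained (control) conditions.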
Results and Findings (Cont.)
• Independent Samples Test
• Strategies to improve inter-rater agreement:
  - More training
  - Clear rubric criteria
  - Map assignment instructions to rubric criteria
• Decisions made based on the assessment results:
  - Further refined the rubric and common assessment activity
Resources
• McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30-46 (Correction, 1(1), 390).
• Nunnally, J. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
• Stemler, S. E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research & Evaluation, 9(4). Retrieved from http://pareonline.net/getvn.asp?v=9&n=4
Dr. Yan Zhang Cooksey
Director for Outcomes Assessment
The Graduate School, University of Maryland University College
Email: [email protected]
http://assessment-matters.weebly.com