Impact of Structured Feedback on Examiner Judgements in Objective Structured Clinical Examinations (OSCEs) Using Generalisability Theory

Wong, W. Y. A., Roberts, C., & Thistlethwaite, J. (2020). Impact of Structured Feedback on Examiner Judgements in Objective Structured Clinical Examinations (OSCEs) Using Generalisability Theory. Health Professions Education. https://doi.org/10.1016/j.hpe.2020.02.005

Published in: Health Professions Education

Document Version: Publisher's PDF, also known as Version of record

Queen's University Belfast - Research Portal: Link to publication record in Queen's University Belfast Research Portal

Publisher rights
Copyright 2020 the authors. This is an open access article published under a Creative Commons Attribution-NoDerivs License (https://creativecommons.org/licenses/by-nd/4.0/), which permits reproduction and redistribution in any medium, provided the author and source are cited and any subsequent modifications are not distributed.

General rights
Copyright for the publications made accessible via the Queen's University Belfast Research Portal is retained by the author(s) and/or other copyright owners, and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights.

Take down policy
The Research Portal is Queen's institutional repository that provides access to Queen's research output. Every effort has been made to ensure that content in the Research Portal does not infringe any person's rights, or applicable UK laws. If you discover content in the Research Portal that you believe breaches copyright or violates any law, please contact [email protected].
Health Professions Education xxx (xxxx) xxx
www.elsevier.com/locate/hpe
Impact of Structured Feedback on Examiner Judgements in Objective Structured Clinical Examinations (OSCEs) Using Generalisability Theory
Wai Yee Amy Wong a,*, Chris Roberts b, Jill Thistlethwaite c
a School of Education & Faculty of Medicine, The University of Queensland, QLD 4072, Australia
b Sydney Medical School, Faculty of Medicine and Health, The University of Sydney, NSW 2006, Australia
c Faculty of Health, University of Technology Sydney, NSW 2007, Australia
Received 16 October 2019; revised 18 February 2020; accepted 20 February 2020
1. Introduction
The objective structured clinical examination (OSCE) is a widely used assessment strategy in both undergraduate and postgraduate medical and health professions education.1,2 A dominant reason for the widespread use of the OSCE is that it is perceived as an objective and standardised measure of student clinical competence.3,4,5 In maintaining the quality assurance of assessments, it is essential to ascertain the variance in examiners' scores awarded to students, and find ways of reducing sources of unwanted construct-irrelevant variance6 in future iterations of the OSCE. The aim of this study was to investigate the impact of structured feedback by comparing the examiner stringency and leniency variance in their judgements of the final-year students' clinical competence before feedback was provided for the pre-feedback (P1) OSCE, and shortly after feedback was provided for the post-feedback (P2) OSCE.
The OSCE in this study was a large-scale summative assessment of the final-year students (n > 350) enrolled in a four-year graduate-entry Bachelor of Medicine/Bachelor of Surgery (MBBS) program at one Australian research-intensive university. The focus of the initiative in this study to reduce unwanted construct-irrelevant variance was the examiner stringency and leniency. It is defined as the tendency of examiners to use either the top or bottom end of the rating scale consistently. This definition is adapted from the study of Roberts et al.6 on interviewer stringency and leniency.
The significance of the influence of examiner stringency and leniency on the consistency of examiner judgements in high-stakes clinical examinations such as OSCEs has received considerable attention in the literature.7–11 Harasym et al.9 analysed the extent of the influence of examiner stringency and leniency on the communication skill scores of 190 medical students at their family medicine clerkship end-of-rotation OSCE. Results showed that the examiner stringency and leniency contributed 44.2% to the variance in the students' scores, whereas student ability only amounted to 10.3%.
More recently, Hope and Cameron12 explored the changes in examiner stringency in the scores of 278 third-year undergraduate medical students in a summative OSCE. Two days were required to allow all students to complete the eight face-to-face stations. Results showed that the examiners were most lenient at the start of the two-day OSCE. When comparing the scores of the students who undertook the OSCE in the first and last group, there was approximately a 3.3% difference in the effect of the examiner stringency and leniency on the student scores. Although the difference was relatively small, it would have affected the scores for the borderline students. Examiner training was emphasised as a crucial means to ensure that examiner stringency and leniency did not vary over time in future iterations of the OSCE, given that examiners assessed an increasing number of successful students.12
Results from these two studies9,12 highlighted the importance of acquiring empirical evidence on effective strategies to minimise the influence of unwanted sources of examiner variance, particularly in high-stakes summative assessments judged by a sole examiner.13 This is necessary to guide initiatives aimed at reducing unwanted sources of variance, which may have a significant and direct impact on the robustness of decisions about student progression and certification, and ultimately affect the quality of patient care delivered by future doctors.14

Although recent literature suggested that examiner judgements are inherently subjective and could be based on idiosyncratic reasons,15,16,17 it is important to provide a fair assessment of student clinical competence, taking into account the interactions between students and the specific context, including the examiners and the circumstances.17 Previous empirical studies have attempted to evaluate the impact of examiner training to reduce the unwanted sources of variance in examiner judgements.18–23 However, results have been inconclusive and difficult to compare as researchers applied different methodologies.24

Germane to the aim of providing students with a fair assessment, this study addresses the critical challenge of reducing the known influence of examiner stringency and leniency on the scores awarded to students,8,9,25 through implementing an examiner feedback system in a high-stakes summative OSCE. The idea of providing examiners with feedback was developed based on three distinct but related perspectives of examiner cognition in the literature: examiners are trainable; examiners are fallible; or they are meaningfully idiosyncratic.14 As the provision of feedback could be inferred as an examiner training intervention, this study is closely aligned with the perspective that examiners are trainable.14 The structured feedback created an authentic learning opportunity for the examiners to formally review and reflect on their marking behaviour and, potentially, make subsequent evidence-based decisions to change their marking practice.
While acknowledging that there are other factors impacting on the examiners' scores, such as the station effect, this study focused on exploring the impact of examiner stringency and leniency, underpinned by the two research questions (RQs) below. The pre-feedback (P1) OSCE for the final-year medical students was the first year of this study. The P1 OSCE examiners had never had feedback about their marking behaviour. The post-feedback (P2) OSCE for the final-year medical students was the second year of this study. The P2 OSCE examiners received the structured feedback eight weeks prior to assessing students in the P2 OSCE.
RQ 1. What is the contribution of and change in examiner stringency and leniency variance (Vj) for the examiners who assessed students in the pre-feedback (P1) OSCE, received structured feedback, and assessed students again in the post-feedback (P2) OSCE?

RQ 2. What is the contribution of and change in examiner stringency and leniency variance (Vj) for the examiners who assessed students in both the pre-feedback (P1) and post-feedback (P2) OSCEs and in at least one common station across both OSCEs?
2. An analytical framework using generalisability theory

We applied generalisability theory (G theory)26,27 as the analytical framework, which suggests that for a single OSCE station, the student score is a combination of the true score of a student's performance and multiple sources of error variance,28 such as the examiner stringency and leniency variance (Vj). G theory facilitates the exploration of the impact of structured feedback by computing and comparing the magnitude of Vj contributing to the examiners' scores in the pre-feedback (P1) and post-feedback (P2) OSCEs. We hypothesised that such structured feedback would have a constructive impact on the examiners' marking behaviour when they assessed students in the P2 OSCE, thereby reducing Vj.
3. Context
The final-year OSCE for the four-year graduate-entry Bachelor of Medicine/Bachelor of Surgery (MBBS) students at this Australian research-intensive university is a high-stakes exit assessment, as student results have a direct impact on their ability to graduate and thus commence an internship as a qualified medical doctor in the following year. It is a usual practice of this medical school to allocate a single examiner to assess a single student in a station in the final-year OSCE. This medical school was selected as it has had the largest enrolments in Australia since 2010, with nearly 500 final-year students in 2014.29 Consequently, over 100 volunteer examiners were involved in the annual final-year OSCE to assess students on two consecutive days across different hospital sites. For both P1 and P2 OSCEs, four OSCE sessions (i.e. Saturday morning and afternoon, and Sunday morning and afternoon) were held at one hospital site, whereas only a Saturday morning session was held at the other three sites in the P1 OSCE and two other sites in the P2 OSCE. Examiners were allocated to a specific site based on their availability, whereas students were allocated to the relevant sites based on their geographical locations. The researchers were not involved in the allocation of students and examiners for the OSCEs.
4. Partially-crossed generalisability study design

Based on the G theory analytical framework, we adopted a quasi-experimental pre- and post-design of a generalisability study (G study) as a feasible and effective way of analysing the secondary assessment data collected in the pre-feedback (P1) OSCE and post-feedback (P2) OSCE. This G study was quasi-experimental because allocating examiners to a control group would not be achievable when the provision of structured feedback might have a real-life impact on students' scores in a high-stakes assessment.

The underlying design adopted was a multifaceted G study design,30 in which three facets were under investigation: examiners (j), students (p) and stations (s).
Fig. 1. The number of examiners, students, and stations involved in the P1 and P2 OSCEs for Analysis 1 and 2.

Pre-feedback (P1) OSCE, consenting examiners: Examiners (j) = 141; Students (p) = 376; Unique stations (s) = 42.
Post-feedback (P2) OSCE, consenting examiners: Examiners (j) = 111; Students (p) = 354; Unique stations (s) = 28. No-feedback group: Examiners (j) = 60; Students (p) = 338; Unique stations (s) = 27.
Structured feedback was provided to examiners eight weeks before the P2 OSCE.
Analysis 1 (among the 141 examiners, 51 examined again in the P2 OSCE): P1 OSCE: Examiners1 (j) = 51; Students (p) = 348; Unique stations (s) = 38. P2 OSCE: Examiners1 (j) = 51; Students (p) = 322; Unique stations (s) = 27.
Analysis 2 (among the 51 examiners, 26 examined in at least one station that was used in both OSCEs): P1 OSCE: Examiners2 (j) = 26; Students (p) = 251; Unique stations3 (s) = 13. P2 OSCE: Examiners2 (j) = 26; Students (p) = 291; Unique stations3 (s) = 14.
1 The composition of the 51 examiners was the same in the P1 and P2 OSCEs in Analysis 1. 2 The composition of the 26 examiners was different in the P1 and P2 OSCEs in Analysis 2. 3 A total of 15 P1 OSCE stations were used again in the P2 OSCE; however, only 13 of them were examined by the group of examiners who assessed students in both OSCEs. The additional station in the P2 OSCE was the result of one P1 OSCE station being divided into two stations in the P2 OSCE.
However, to ensure the best estimates of examiner-related variances, this multifaceted G study was modified on account of the partially-crossed and unbalanced dataset.28 The dataset of students and examiners was partially-crossed because only a proportion of students had the same set of examiners and thus the same set of stations. In addition, not all examiners consented to participate in this study. The dataset of examiners and stations was unbalanced as a number of examiners assessed students in multiple stations within and across different OSCE sessions. This partially-crossed and unbalanced design facilitates the calculation of the estimates of the variance components contributing to the examiners' scores, shown in Table 1, with the plain English explanations of these variance components adapted from Crossley et al.31

Table 1
The variance components contributing to the examiners' scores in this partially-crossed and unbalanced G study. Adapted from Crossley et al.31

Variance component | Notation used in Section 8 (Statistical analysis) | Explanation
1. Students (p) | Varstudent (Vp) | The consistent differences between student ability across examiners and OSCE stations
2. Stations (s) | Varstation (Vs) | The consistent differences in OSCE station difficulty across students and examiners
3. Examiners (j) | Varexaminer (Vj) | The consistent differences in examiner stringency/leniency across students and OSCE stations
4. Interaction between examiners and stations (j x s) | Varexaminer*station (Vj*s) | The varying case-specific stringency/leniency of examiners between OSCE stations across students
5. Interaction between students and stations (p x s) | Varstudent*station (Vp*s) | The varying case aptitude of students displayed between stations across examiners
6. Measurement error (e) | Varerror (Verr) | Any residual variation that cannot be explained by other factors
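Read together, the framework in Section 2 and the components in Table 1 imply a score decomposition of the following form (a schematic reconstruction for clarity; the article describes the model verbally rather than as an equation):

\[ X_{pjs} = \mu + \nu_{p} + \nu_{s} + \nu_{j} + \nu_{j \times s} + \nu_{p \times s} + e, \]

so that the total variance of the examiners' scores partitions as

\[ \sigma^{2}_{X} = V_{p} + V_{s} + V_{j} + V_{j \times s} + V_{p \times s} + V_{err}, \]

and the percentage contribution of examiner stringency/leniency reported in Section 9 corresponds to \( 100 \times V_{j} / \sigma^{2}_{X} \).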
5. Participants
The research participants were examiners of the final-year high-stakes summative OSCEs. All the OSCE examiners attended a short briefing (maximum length 30 minutes) prior to the commencement of the OSCE in each session, which was the only 'on-the-spot' examiner training required. Apart from this, mandatory examiner training was not offered or required by this medical school. All examiners across all sites were invited to participate in this study.
In the pre-feedback (P1) OSCE, a total of 159 examiners assessed the final-year medical students across all four sessions; 141 examiners (88.7%) agreed to be research participants and assessed 376 students. Each student was required to complete a full cycle of 12 stations in a single allocated session. There were only 42 unique stations, as six stations were used in more than one session.
In the post-feedback (P2) OSCE, a total of 143 examiners assessed the final-year medical students across all four sessions; 111 examiners (77.6%) agreed to be research participants and assessed 354 students. Each student was required to complete a full cycle of 10 stations in a single allocated session. There were only 28 unique stations, as 12 stations were used in more than one session. As this study focused on the overall OSCE, the total numbers of students, examiners and stations involved in the P1 and P2 OSCEs for Analysis 1 and 2 are presented in Fig. 1.
6. Procedures of examiners scoring student competence
Each OSCE station had a specific marking sheet which followed the same format and had been developed over time by clinicians and medical educators within the medical school. This study focused on the examiners' scores only in Part A of the marking sheet, which listed from three to seven criteria to assess a specific clinical skill or response to the particular clinical scenario in a station. For each marking criterion, there were checklist points to guide the examiners. Examiners rated each marking criterion of each student's performance based on the following marking standards related to their achievement; the corresponding scores recorded are shown in brackets: very well (6); well (4); partially (2); poorly (1); or not at all (0). Part B of the marking sheet was common to all OSCE stations and asked for the examiners' overall impression rating of a student's performance in a station, independently of the checklist items, for standard-setting purposes. This part was outside the scope of this study, as the majority of examiners awarded a pass to students across all stations in both OSCEs, which provided only limited discrimination of the examiners' marking behaviour in their cohort.
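To illustrate how the Part A ratings relate to the percentage scores used later in the feedback reports (Section 7 and Fig. 4), the following minimal Python sketch assumes that a station percentage is the sum of the awarded criterion points divided by the maximum possible total of 6 points per criterion; the article does not specify the exact conversion, so the function name and scaling are illustrative assumptions only.

# Minimal sketch (assumption: station percentage = awarded points / (6 points per criterion)).
RATING_POINTS = {"very well": 6, "well": 4, "partially": 2, "poorly": 1, "not at all": 0}

def station_percentage(ratings):
    # ratings: one marking standard per criterion (stations had three to seven criteria)
    points = [RATING_POINTS[r] for r in ratings]
    return 100 * sum(points) / (6 * len(points))

# Example: a hypothetical five-criterion station
print(round(station_percentage(["very well", "well", "well", "partially", "poorly"]), 1))  # 56.7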
7. Provision of structured feedback as an examiner training strategy
All consenting examiners (n = 141) from the P1 OSCE received a structured feedback report via email approximately eight weeks before the P2 OSCE. This feedback timing was anticipated to provide sufficient time for the examiners to reflect on the feedback prior to assessing students again in the P2 OSCE. The design of the feedback reports aligned with the perspective of examiner cognition that examiners are trainable.14 The purpose of the reports was to provide the examiners with data about the mean and range of scores given for an OSCE station, and comparisons with other examiners' judgements in the same station, as well as in the entire examiner cohort.
The report began by introducing the background of the station in which the examiner was involved, the marking criteria and the total score available for the station. The first part of the report consisted of a graph showing the distribution of an examiner's scores awarded to students in a station (Fig. 2). The y-axis shows the ranking of students in terms of their scores awarded, in descending order. This provided a quick way to show the range of scores given to the number of students within a station.

Fig. 2. Distribution of an examiner's scores awarded to students in a station.
The second part showed the comparison of an examiner's scores to those of the other examiners in the same station (Fig. 3).

Fig. 3. Comparison of an examiner's scores to those of the other examiners in the same station.
Fig. 4. Comparison of an examiner's mean percentage score among all consenting examiners (n = 141) in the pre-feedback (P1) OSCE.
Table 2
Results for Analysis 1 of the OSCE examiners' scores.
a The composition of the 26 examiners in the P2 OSCE was different from the 26 examiners in the P1 OSCE. This is to ensure that at least one station was common across both OSCEs.
Finally, the third part showed the comparison of an examiner's mean percentage score with those of all the examiners in the P1 OSCE using a bar graph. Each examiner was informed of their rank on the continuum from the most stringent (1st) to the most lenient (141st) examiner (Fig. 4). The feedback was intended to prompt examiners to reflect on their marking behaviour by exploring the patterns of their scores and the comparisons with the cohort.
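As a rough illustration of the cohort-level comparison described above, the sketch below computes each examiner's mean percentage score and a rank from most stringent (lowest mean) to most lenient. The data frame, its column names and the ranking convention are hypothetical; the article does not describe how the reports were generated.

import pandas as pd

# Hypothetical long-format scores: one row per student assessed by an examiner in a station.
scores = pd.DataFrame({
    "examiner":  ["E01", "E01", "E02", "E02", "E03", "E03"],
    "station":   ["S1",  "S1",  "S1",  "S1",  "S2",  "S2"],
    "pct_score": [45.0,  52.0,  68.0,  75.0,  60.0,  58.0],
})

# Mean percentage score awarded by each examiner across the students they assessed.
examiner_means = scores.groupby("examiner")["pct_score"].mean()

# Rank from most stringent (lowest mean, rank 1) to most lenient (highest mean).
stringency_rank = examiner_means.rank(method="min").astype(int)

feedback_summary = pd.DataFrame({
    "mean_pct_score": examiner_means.round(1),
    "rank_most_stringent_first": stringency_rank,
})
print(feedback_summary)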
8. Statistical analysis
The quasi-experimental pre- and post-design study facilitated the exploration of the examiner stringency and leniency variance (Vj) impacting on the examiners' scores before and after feedback. We applied G theory and generated the estimates of each variance component in the examiners' scores in the P1 and P2 OSCEs using a Minimum Norm Quadratic Unbiased Estimation (MINQUE) procedure in the IBM Statistical Package for the Social Sciences (SPSS) Version 24.0. MINQUE was selected because of the unbalanced dataset31 used in this study. Analysis 1, which addressed RQ1, explored Vj of those examiners who assessed students in both P1 and P2 OSCEs, and hence controlled for the differences in the examiners. Analysis 2, which addressed RQ2, explored Vj of those examiners who assessed students in at least one common station across both P1 and P2 OSCEs, and hence controlled for the differences in the OSCE stations.
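The variance components themselves were estimated with the MINQUE procedure in SPSS. As a non-equivalent, open-source illustration of estimating crossed variance components from a long-format score table, the sketch below fits a variance-components model by REML with statsmodels on simulated data. The column names and simulated values are assumptions, REML will not reproduce MINQUE estimates exactly, and the examiner-by-station and student-by-station interaction components are omitted for brevity.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated long-format data: one row per examiner judgement of a student in a station.
# Columns and values are illustrative only; the study's data were analysed with MINQUE in SPSS.
students = [f"p{i}" for i in range(40)]
examiners = [f"j{i}" for i in range(8)]
stations = [f"s{i}" for i in range(4)]
rows = []
for k, station in enumerate(stations):
    station_examiners = examiners[2 * k:2 * k + 2]   # two examiners per station: partially crossed
    for student in students:
        rows.append({
            "student": student,
            "examiner": rng.choice(station_examiners),
            "station": station,
            "score": 60 + rng.normal(0, 8),           # toy percentage-scale scores
        })
df = pd.DataFrame(rows)
df["all"] = 1   # a single group so that student, examiner and station enter as crossed components

vc = {"student": "0 + C(student)", "examiner": "0 + C(examiner)", "station": "0 + C(station)"}
model = smf.mixedlm("score ~ 1", df, groups="all", vc_formula=vc, re_formula="0")
fit = model.fit(reml=True)
print(fit.summary())   # variance component estimates for each facet plus the residual variance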
9. Results
9.1. Analysis 1: contribution of and change in examiner stringency and leniency (Vj) of those examiners who assessed students in both pre-feedback (P1) and post-feedback (P2) OSCEs
Results for Analysis 1 of the estimates of each variance component in the examiners' scores are presented in Table 2. The first column lists all the variance components contributing to the examiners' scores. The second and third columns list the corresponding estimates and their percentages contributed to the overall variation of the examiners' scores, respectively, in the P1 OSCE. The fourth and fifth columns list the corresponding estimates and their percentages contributed to the overall variation of the same 51 examiners' scores, respectively, in the P2 OSCE. The last two columns show the percentage changes in each of the estimates and in their contribution to the overall variation of the examiners' scores, respectively, after feedback was provided.
Analysis 1 addressed RQ1 by controlling for the differences within the examiner cohort. Results revealed that the magnitude of Vj contributing to the examiners' scores was reduced from 7.91 to 5.09 (% change in estimate = 35.65%) after feedback. Its contribution to the overall variation of the examiners' scores also reduced from 23.01% to 15.58% (% change to overall variation = 7.43%). Both reductions appeared to be associated with the possible impact of providing structured feedback on decreasing the contribution of the examiner stringency and leniency variance (Vj) to their scores in the subsequent OSCE.

Apart from the impact of Vj, station difficulty and student ability also contributed to the overall variation of the examiners' scores. Results showed that the estimate of station difficulty was 2.27, and its percentage contributing to the overall variation of the examiners' scores was 6.95%, after feedback was provided in the P2 OSCE. This indicated that the consistent differences in OSCE station difficulty contributed less to the examiners' scores compared to Vj (% contributed to overall variation = 15.58%) in the P2 OSCE.

Moreover, the estimate of student ability was 5.18, and its percentage contributing to the overall variation of the examiners' scores was 15.86% in the P2 OSCE. This indicated that the consistent differences between student ability contributed to a similar extent to the examiners' scores compared to Vj (% contributed to overall variation = 15.58%) in the P2 OSCE.
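The percentage figures reported for Analysis 1 follow directly from the estimates; the short check below simply reproduces the reported arithmetic and adds no new data.

# Analysis 1: examiner stringency and leniency variance (Vj) before and after feedback
vj_pre, vj_post = 7.91, 5.09
print(round(100 * (vj_pre - vj_post) / vj_pre, 2))   # 35.65  (% change in estimate)

# Contribution of Vj to the overall variation of the examiners' scores
contrib_pre, contrib_post = 23.01, 15.58
print(round(contrib_pre - contrib_post, 2))          # 7.43  (change to overall variation, in percentage points)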
To further investigate the decrease in the examiner stringency and leniency variance after feedback, we controlled for the variance of station difficulty by focusing on the stations that were common across both OSCEs in Analysis 2.
9.2. Analysis 2: contribution of and change in Vj of those examiners who assessed students in at least one common station across both P1 and P2 OSCEs
Results for Analysis 2 of the estimates of each variance component in the examiners' scores are presented in Table 3, which follows the same format as Table 2 in terms of the information presented in each column.
Analysis 2 addressed RQ2 by controlling for the variance of station difficulty to focus on the stations that were common across both OSCEs. The magnitude of Vj contributing to the examiners' scores was reduced from 9.59 to 5.70 (% change in estimate = 40.56%) after feedback. Its contribution to the overall variation of the examiners' scores also reduced from 24.27% to 16.55% (% change to overall variation = 7.72%). Both reductions appeared to be associated with the possible impact of structured feedback on decreasing the contribution of the examiner stringency and leniency variance (Vj) to their scores in the subsequent OSCE.
Apart from the impact of Vj, station difficulty and student ability also contributed to the overall variation of the examiners' scores. Results showed that the estimate of station difficulty was 1.00, and its percentage contributing to the overall variation of the examiners' scores was 2.90%, after feedback was provided in the P2 OSCE. This indicated that the consistent differences in OSCE station difficulty contributed less to the examiners' scores compared to Vj (% contributed to overall variation = 16.55%) in the P2 OSCE. This was anticipated as the common stations from both years were used in this analysis.
Moreover, the estimate of student ability was 5.50, and its percentage contributing to the overall variation of the examiners' scores was 15.97% in the P2 OSCE. This indicated that the consistent differences between student ability contributed to a similar extent to the examiners' scores as Vj (% contributed to overall variation = 16.55%) in the P2 OSCE.
The estimate of error (Verr) was equal to zero in both Analysis 1 and 2 because all the errors were redistributed to all other variance components in both analyses. This is the result of using the selected design and analysis model in this study, which specified every variance component. There is no instance where an examiner's score could not be fully described in terms of these five specified variance components, that is, student ability, OSCE station difficulty, examiner stringency/leniency, case-specific stringency and case aptitude (Table 1). Therefore, there should be no residual (error) variance.
10. Discussion
Final-year OSCEs are high-stakes assessments, as student results have a direct impact on students' progression to internship. The OSCE examiners play a key role as gatekeepers to ensure that only those students who have demonstrated adequate clinical competence are awarded the opportunity to progress their career as medical doctors. This study, aligned with the examiner cognition perspective that examiners are trainable,14 explored the change in the magnitude of examiner stringency and leniency variance (Vj) following the provision of structured feedback to the examiners as a form of training strategy.
When comparing the pre-feedback and post-feedback OSCEs, Vj reduced (from 7.91 to 5.09) for the 51 examiners who assessed students in both OSCEs. The decrease was more obvious (from 9.59 to 5.70) in the 26 examiners who assessed students in both OSCEs and in at least one station common across both OSCEs. It is also worthwhile to note that the contribution of Vj to the overall variation of the examiners' scores was reduced by about 7 percentage points in both groups of examiners (last column in Tables 2 and 3) after feedback was provided. These findings were consistent with the research hypothesis that structured feedback reduced examiner variance when they assessed students subsequently. This initial evidence supports the value of providing structured feedback to examiners and suggests ways in which the feedback could be better targeted to initiate and maintain change in examiners' assessment behaviours. Given that there are other possible confounding factors impacting on the examiners' scores, and there is no control group in this study, the results are not intended to support causal inferences. More empirical research is required prior to making recommendations for practice.
10.1. Implications for future research
The impact of feedback on Vj highlights the importance of examiners making their judgements of student clinical competence based on students' ability, instead of being influenced by their own stringency and leniency. To further establish which specific aspects of the feedback were the most impactful in changing examiners' assessment behaviour, we suggest that it is also important to include the examiners' perspective and conduct usability testing in designing an effective feedback report that will enable examiners to better understand their marking behaviour. In addition, to ensure a comprehensive dataset is collected for future naturalistic research of OSCEs, it is crucial that researchers work collaboratively with the academics, clinicians, examiners and professional administrative staff to develop a well-designed examination and data collection plan.
10.2. Strengths and limitations
This study is one of the first studies to have explored the impact of providing structured feedback to examiners, as a form of examiner training intervention, on the magnitude of Vj contributing to the examiners' scores. Previous studies mainly focused on the impact of performance dimension, frame-of-reference and behavioural observation training.18,20 The findings of this study advance the knowledge in suggesting an association between providing examiners with structured feedback, as a form of training, and the magnitude of Vj contributing to their scores. Although the feedback mechanism may well have reduced the examiner stringency and leniency variance, other factors might have contributed to it. For example, as the OSCE examiners gain experience in assessing students, it is possible that they introduce less variance into their scores regardless of the provision of structured feedback about their marking behaviour. Also, different cohorts of students may have different levels and ranges of abilities, and this could potentially have influenced the examiners' judgements. However, it was not possible to have the same cohort of students in the P1 and P2 OSCEs in this study, as the final-year OSCE is only conducted annually.
In addition, there are challenges with the quasi-experimental design in this study. We acknowledge that the stability of the estimates of Vj will need to be demonstrated in other institutions. The primary constraint was that this G study was contingent on the assessment data from large-scale OSCEs in which the examiner judging plan was entirely pragmatic, and not modifiable to gain better estimates of the variance components in the examiners' scores. Additionally, not all the examiners provided consent to participate in this study, which was an agreement to have their scores aggregated for quality improvement purposes, including publications. Therefore, we had to adopt a partially-crossed and unbalanced G study design.28
Nevertheless, the large cohorts of examiners and students involved in both OSCEs were a strength of this study, with 141 (88.7%) of the examiners in the pre-feedback (P1) OSCE and 111 (77.6%) of the examiners in the post-feedback (P2) OSCE consenting to participate. These large cohorts facilitated the collection of a reasonable amount of data to compare the examiner stringency and leniency variance (Vj) in sub-groups of examiners in Analysis 1 and 2.
11. Conclusions
This study has offered preliminary support for the possible impact of structured feedback on the examiners' marking behaviour in a typical undergraduate OSCE setting using G theory. The findings enhance the understanding of the possible impact of structured feedback, as a form of training, on the magnitude of examiner stringency and leniency variance (Vj) contributing to the examiners' scores before and after feedback. The statistical analyses from the G study suggest that providing feedback to the examiners might be associated with a decrease in the magnitude of Vj contributing to their scores. The outcomes of this study provide a basis to further explore the features of effective feedback to examiners about their marking behaviour. This is particularly important as examiner stringency and leniency in high-stakes assessments impacts not only on student progression, but ultimately, and more importantly, on the delivery of optimal patient care and safety by future medical doctors.
Contributors
WYAW and CR led the study conception and contributed to the design, data analysis and interpretation. WYAW wrote the first draft of the paper. JT contributed to the design of the overall study and made substantial contributions to the interpretation of these data. All authors contributed to the critical revision of the paper and approved the final manuscript for publication.
Ethical approval
This study was approved by The University of Queensland Behavioural & Social Sciences Ethical Review Committee (approval no: 2013001070).
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Declaration of Competing Interest
None.
Acknowledgements
The authors would like to thank Professor Jim Crossley for his invaluable advice on the application of Generalisability Theory in estimating variance components, Associate Professor Karen Moni and Associate Professor Lata Vadlamudi for reviewing previous drafts and providing helpful comments, and the participating OSCE examiners at The University of Queensland.
References
1. Khan KZ, Ramachandran S, Gaunt K, Pushkar P. The objective structured clinical examination (OSCE): AMEE guide no. 81. Part I: an historical and theoretical perspective. Med Teach. 2013;35(9):e1437–e1446. https://doi.org/10.3109/0142159X.2013.818634.