-
Confidence-based assessment in the 1st year medical end-of-year exam
Tony Gardner-Medwin, Physiology, UCL
a useful study tool - but why in exams?
it reflects the proper meaning of knowledge
conventional marking disadvantages able students
how did the students do in the exam?
conf-asst was a more reliable measure of student ability
it saves on the number of questions required
-
What is Knowledge?
Knowledge depends on degree of belief, or confidence:
knowledge, uncertainty, ignorance, misconception, delusion
-log2(confidence*) for the truth of a true proposition: = 0 (knowledge), = 1 (ignorance), >> 1 (delusion)
Measurement of knowledge requires the eliciting of confidence (or *subjective probability) for the truth of correct statements.
This requires a proper scheme of incentives
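
A minimal sketch of the scale on this slide, assuming the measure is simply -log2 of the confidence assigned to a proposition that is in fact true. The function name `knowledge_deficit` and the example confidence values are hypothetical, not taken from the talk.

```python
import math

def knowledge_deficit(confidence_in_truth: float) -> float:
    """Return -log2(confidence) for a proposition that is actually true.

    Hypothetical helper: 0 means full knowledge, 1 means a coin-flip
    (ignorance on a true/false item), large values mean confident error.
    """
    if not 0.0 < confidence_in_truth <= 1.0:
        raise ValueError("confidence must be in (0, 1]")
    # log2(1/p) equals -log2(p) and avoids printing -0.0 when p == 1
    return math.log2(1.0 / confidence_in_truth)

# Illustrative confidence values (assumed, not from the slide)
for label, p in [("knowledge", 1.0), ("ignorance", 0.5), ("delusion", 0.01)]:
    print(f"{label:10s} confidence={p:<5} -log2={knowledge_deficit(p):.2f}")
```

Full confidence in a true statement gives 0, a 50% guess gives 1, and confidence close to zero gives a value well above 1.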
-
LAPT confidence-based scoring scheme
Confidence Level       1        2        3
Score if Correct       1        2        3
Score if incorrect     0       -2       -6
P(correct)           < 67%    > 67%    > 80%
Odds                 < 2:1    > 2:1    > 4:1
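
The 67% and 80% thresholds follow from the marks in the scheme: they are the points where the expected mark at one level overtakes the level below. A small sketch, assuming a student simply wants to maximise expected marks (the names `LAPT_SCORES`, `expected_score` and `best_level` are illustrative):

```python
# Marks from the LAPT table: confidence level -> (score if correct, score if incorrect)
LAPT_SCORES = {1: (1, 0), 2: (2, -2), 3: (3, -6)}

def expected_score(level: int, p_correct: float) -> float:
    """Expected mark for one answer entered at `level` when it is right with probability p_correct."""
    right, wrong = LAPT_SCORES[level]
    return p_correct * right + (1.0 - p_correct) * wrong

def best_level(p_correct: float) -> int:
    """Confidence level with the highest expected mark (ties go to the lower level)."""
    return max(sorted(LAPT_SCORES), key=lambda level: expected_score(level, p_correct))

for p in (0.5, 0.6, 0.7, 0.75, 0.85, 0.95):
    print(f"P(correct)={p:.2f} -> best level C={best_level(p)}")
```

Setting the expected marks equal gives p = 4p - 2 at p = 2/3 (C=1 vs C=2) and 4p - 2 = 9p - 6 at p = 4/5 (C=2 vs C=3), which matches the P(correct) row.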
-
conventional marking disadvantages able students
Suppose 4 students go for the same answer options in an exam: 75 correct, 25 incorrect
Ai is confident of all his answers
Bo is very hesitant about all her answers
Cy is realistic (expects 75%), but can't distinguish reliable & uncertain answers
Di is confident of 50 answers (90% correct) and uncertain of the others (60% correct)
Clearly: Di > Cy > Bo, Ai
Di has extra insight - about her knowledge, or maybe about subtleties in questions. How can she use this insight?
Conventional scoring: Her only option is to omit uncertain answers:
% correct: Ai = Bo = Cy = 75%, Di = 45%
negative marking score (±1): Ai = Bo = Cy = 50%, Di = 40%
Confidence-based scoring: She can moderate her confidence:
Ai enters all at C=3, Bo at C=1: Ai = Bo = 25%
Cy enters all at C=2: Cy = 33%
Di splits answers C=3, C=1: Di = 48%
[adjust] 33%
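
The confidence-based figures can be checked with a short sketch. It assumes the LAPT marks (1, 2, 3 if correct; 0, -2, -6 if incorrect), scores expressed as a percentage of the maximum (every answer correct at C=3), and that Di's 50 confident answers are 45 right and 5 wrong while her 50 uncertain answers are 30 right and 20 wrong. The helper `percent_score` is hypothetical.

```python
# LAPT marks: confidence level -> (score if correct, score if incorrect)
LAPT_SCORES = {1: (1, 0), 2: (2, -2), 3: (3, -6)}
MAX_PER_QUESTION = 3  # a correct answer entered at C=3

def percent_score(groups):
    """groups: list of (level, n_correct, n_incorrect); returns % of the maximum mark."""
    total = 0
    n_questions = 0
    for level, n_correct, n_incorrect in groups:
        right, wrong = LAPT_SCORES[level]
        total += n_correct * right + n_incorrect * wrong
        n_questions += n_correct + n_incorrect
    return 100.0 * total / (MAX_PER_QUESTION * n_questions)

students = {
    "Ai": [(3, 75, 25)],              # everything at C=3
    "Bo": [(1, 75, 25)],              # everything at C=1
    "Cy": [(2, 75, 25)],              # everything at C=2
    "Di": [(3, 45, 5), (1, 30, 20)],  # confident half at C=3, uncertain half at C=1
}
for name, groups in students.items():
    print(f"{name}: {percent_score(groups):.0f}%")
```

Under these assumptions the sketch reproduces 25%, 25% and 33% for Ai, Bo and Cy, and gives 45% for Di, a little below the 48% quoted on the slide; the source does not say how that figure was obtained.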
-
Summary: aims
reward the ability to distinguish reliable and uncertain answers (whatever the reason for uncertainty)
penalise confident errors more than errors from uncertainty
What people sometimes think is the aim!
to penalise a general over-confidence or under-confidence - probably helped by practice & feedback, but not an exam issue
-
How well did students discriminate?
exam: 500 T/F Qs, in 2 sessions, each 2 hrs
331 students: 190 F, 141 M
-
[Figure A: confidence-based score vs conventional scaled score (simple score), both axes 0% to 100%; the 50% correct level is marked]
-
Breakdown of credit and variance due to uncertainty
Simple scores (scaled conventional scores): 65% of the variance came from answers at C=1, but only 18% of the credit.
Confidence-based scores: these give less weight to uncertain answers; uncertainty variance is then more in proportion to credit, and was reduced by 46% (relative to the variation of student marks)
-
Exam marks are determined by:
1. the student's knowledge and skills in the subject area
2. the level of difficulty of the questions
3. chance factors in the way questions relate to details of the student's knowledge
4. chance factors in the way uncertainties are resolved (luck)
(1) = signal (its measurement is the object of the exam)
(3,4) = noise (random factors obscuring the signal)
Confidence-based marks improve the signal-to-noise ratio
The most convincing test of this is to compare marks on one set of questions with marks for the same student on a different set. A good correlation means we are measuring something about the student, not just noise
-
The correlation, across students, between scores on one set of questions and another is higher for confidence-based scores than for simple scores.
But perhaps they are just measuring ability to handle confidence?
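
A rough sketch of the comparison, assuming each student's mark on a question set is a stable "signal" plus independent noise of equal variance on each set. Under that assumption the correlation r between two sets implies a signal/noise variance ratio of r / (1 - r). The marks below are made-up illustrative numbers, not exam data, and the function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def signal_to_noise(r):
    """Signal/noise variance ratio implied by r, assuming score = signal + independent noise."""
    return r / (1.0 - r)

# Hypothetical marks (%) for five students on two disjoint question sets
set_a = [52.0, 61.0, 70.0, 48.0, 66.0]
set_b = [55.0, 58.0, 73.0, 50.0, 62.0]
r = pearson_r(set_a, set_b)
print(f"r = {r:.2f}, implied signal/noise = {signal_to_noise(r):.1f}")
```

Running the same calculation once with simple scores and once with confidence-based scores for the same two question sets would give the two correlations being compared.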
-
How should one handle students with poor calibration?
Significantly overconfident: 2 students (1%)
e.g. 50% correct @C=1, 59% @C=2, 73% @C=3
Significantly underconfident: 41 students (14%)
e.g. 83% correct @C=1, 89% @C=2, 99% @C=3
Maybe one shouldn't penalise such students
Adjusted confidence-based score: Mark the set of answers at each C level as if they were entered at the C level that gives the highest score.
mean benefit = 1.5% ± 2.1% (median 0.6%)
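
The adjustment rule can be sketched as follows, assuming the LAPT marks (1, 2, 3 if correct; 0, -2, -6 if incorrect). The example student uses the underconfident accuracies quoted on this slide with a hypothetical 100 answers at each level; the function names are illustrative.

```python
# LAPT marks: confidence level -> (score if correct, score if incorrect)
LAPT_SCORES = {1: (1, 0), 2: (2, -2), 3: (3, -6)}

def group_score(level, n_correct, n_incorrect):
    """Total mark for a group of answers if all were marked at `level`."""
    right, wrong = LAPT_SCORES[level]
    return n_correct * right + n_incorrect * wrong

def raw_total(groups):
    """groups: {entered_level: (n_correct, n_incorrect)}; mark each group as entered."""
    return sum(group_score(level, c, w) for level, (c, w) in groups.items())

def adjusted_total(groups):
    """Re-mark each group at whichever level would have given it the highest total."""
    return sum(
        max(group_score(level, c, w) for level in LAPT_SCORES)
        for c, w in groups.values()
    )

# Hypothetical underconfident student: 83%, 89%, 99% correct at C=1, 2, 3 (100 answers each, assumed)
groups = {1: (83, 17), 2: (89, 11), 3: (99, 1)}
print("raw:", raw_total(groups), "adjusted:", adjusted_total(groups))
```

Because each group is re-marked at its own best level, the adjusted total can never be lower than the raw total, so well-calibrated students are unaffected and poorly calibrated ones are not penalised for it.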
-
[Figure A: confidence-based score vs simple scaled score, both axes 0% to 100%; the 50% correct and 100% correct levels are marked]
-
                                  simple    conf    conf (adj)
Signal / noise variance ratio:     2.8      5.3      4.3
Savings in no. of Qs required:      -       48%      35%
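
One way to relate the two rows, offered as an assumption rather than the talk's stated method: if noise variance falls in inverse proportion to the number of questions, then matching the simple-score signal/noise ratio needs only a fraction (simple ratio / other ratio) of the questions. The function name `question_saving` is hypothetical.

```python
def question_saving(snr_simple: float, snr_other: float) -> float:
    """Fraction of questions saved when matching the simple-score signal/noise ratio.

    Assumes noise variance is inversely proportional to the number of questions.
    """
    return 1.0 - snr_simple / snr_other

# Signal/noise variance ratios as quoted in the table (rounded values)
SNR = {"simple": 2.8, "conf": 5.3, "conf (adj)": 4.3}
for name in ("conf", "conf (adj)"):
    saving = question_saving(SNR["simple"], SNR[name])
    print(f"{name}: {100 * saving:.0f}% fewer questions")
```

With the rounded ratios this gives about 47% and 35%; the 48% in the table presumably comes from unrounded ratios.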
-
SUMMARY CONCLUSIONS
Adjusted confidence scores seem the best scores to use (they don't discriminate on the basis of the calibration of a person's confidence judgements, and are also the best predictors of performance on a separate set of questions).
Reliable discrimination of student knowledge can be achieved with one third fewer questions, compared with conventional scoring.
Confidence scoring is not only fundamentally more fair (rewarding students who can correctly identify which answers are uncertain) but it is more efficient at measuring performance.
www.ucl.ac.uk/~cusplap
-
Principles that students seem readily to understand:
both under- and over-confidence are impediments to learning
confident errors are far worse than acknowledged ignorance and are a wake-up call (-6!) to pay attention to explanations
expressing uncertainty when you are uncertain is a good thing
thinking about the basis and reliability of answers can help tie bits of knowledge together (to form understanding)
checking an answer and rereading the question are worthwhile
sound confidence judgement is a valued intellectual skill in every context, and one they can improve
biba talk feb 2002