Welcome Reliability Programme: Leading the way to better testing and assessments 22 March 2011 Event Chair: Dame Sandra Burslem, DBE, Ofqual's Deputy Chair

The Reliability Programme: Leading the way to better tests and assessments

Dec 09, 2014


This is the presentation from "The Reliability Programme: Leading the way to better tests and assessments" event.
Transcript
Page 1: The Reliability Programme: Leading the way to better tests and assessments

Welcome

Reliability Programme: Leading the way to better testing and assessments

22 March 2011

Event Chair: Dame Sandra Burslem, DBE, Ofqual's Deputy Chair

Page 2: The Reliability Programme: Leading the way to better tests and assessments

Welcome and Setting the Scene

Glenys Stacey, Ofqual Chief Executive

Page 3: The Reliability Programme: Leading the way to better tests and assessments

Ofqual’s Reliability Programme

Dennis Opposs

Page 4: The Reliability Programme: Leading the way to better tests and assessments

Reliability: quantifying the luck of the draw

Reliability work in England has generally been
• Isolated
• Partial
• Under-theorised
• Under-reported
• Misunderstood

Ofqual’s Reliability Programme aimed to improve the situation.

Background

Page 5: The Reliability Programme: Leading the way to better tests and assessments

To gather evidence for Ofqual to develop regulatory policy on reliability of results from national tests, examinations and qualifications

Aims

Page 6: The Reliability Programme: Leading the way to better tests and assessments

Strand 1: Generating evidence of reliability
Strand 2: Interpreting and communicating evidence of reliability
Strand 3: Developing reliability policy
– Strand 3a: Exploring public understanding of reliability
– Strand 3b: Developing Ofqual policy on reliability

Programme structure

Page 7: The Reliability Programme: Leading the way to better tests and assessments

Our Technical Advisory Group

Paul Black

Anton Beguin Alastair Pollitt

Gordon Stanley Jo-Anne Baird

Page 8: The Reliability Programme: Leading the way to better tests and assessments

Strand 1 – Generating evidence

Synthesising pre-existing evidence
– Literature reviews
Generating new evidence
– Monitoring existing practices
– Experimental studies

Page 9: The Reliability Programme: Leading the way to better tests and assessments

Strand 2 – Interpreting and communicating evidence

• How do we conceptualise reliability?
• How do we interpret our findings?
• How do we communicate our findings?

Page 10: The Reliability Programme: Leading the way to better tests and assessments

Strand 3 – Developing policy

Exploring public understanding of, and attitudes towards, assessment error

Stimulating national debate on the significance of the reliability evidence generated by the programme

Developing Ofqual’s policy on reliability

Page 11: The Reliability Programme: Leading the way to better tests and assessments

Student misclassification

Controversial area - earlier conclusions include:

“… it is likely that the proportion of students awarded a level higher or lower than they should be because of the unreliability of the tests is at least 30% at key stage 2”
Wiliam, D. (2001). Level best? London: ATL.

“Professors Black, Gardner and Wiliam argued […] that up to 30% of candidates in any public examination in the UK will receive the wrong level or grade”
House of Commons Children, Schools and Families Committee. (2008a). Testing and Assessment. Third Report of Session 2007–08. Volume I. HC 169-I. London: TSO.

Is this accurate?

Page 12: The Reliability Programme: Leading the way to better tests and assessments

Strand 1 – Generating evidence (1)

National Curriculum tests:
• The reliabilities of KS2 science pre-tests and the stability of consistency over time
• The reliabilities of the 2008 KS2 English reading pre-test

General qualifications:
• The reliabilities of GCSE components/units
• The reliability of GCE units

Vocational qualifications

Page 13: The Reliability Programme: Leading the way to better tests and assessments

Strand 1 – Generating evidence (2)

KS2 science pre-tests
• The reliabilities of KS2 Science tests over five years
• Values of internal consistency reliability (alpha) generally over 0.85
• Classification accuracy (pre-tests) 83%-88%
• Classification consistency (between pre-tests and live tests) 72%-79%
• Reliability indices relatively stable over time
• Relatively high reliability compared with similar tests
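For readers unfamiliar with the internal consistency index quoted on this slide, the following is a minimal Python sketch of the standard Cronbach's alpha formula. The function name and the marks are hypothetical illustrations, not programme data, and the slides do not say how the programme's own estimates were computed.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-pupil lists of item marks.

    Every pupil must have a mark for the same set of items.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across pupils.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of the pupils' total marks.
    total_var = pvariance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical marks for five pupils on four one-mark items (not programme data).
marks = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(marks)
print(round(alpha, 3))
```

Alpha rises as the items vary together: the larger the variance of total marks relative to the sum of the item variances, the closer the value gets to 1.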

Page 14: The Reliability Programme: Leading the way to better tests and assessments

Strand 1 – Generating evidence (3)

A KS2 English reading pre-test
• Data collected in 2007 during pre-testing of the 2008 KS2 English reading test
• Containing 34 items and having a total of 50 marks (mean 28.5 and standard deviation 9.1, 1387 pupils)
• Internal consistency reliability 0.88
• Standard error of measurement 3.1
• Classification accuracy (IRT) 83%
• Classification consistency (IRT) 76%
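The standard error of measurement on this slide can be related to the other two figures through the classical test theory formula SEM = SD × √(1 − reliability). The Python sketch below applies it to the rounded values quoted above. The function names, the example mark of 28 and the 1.96 multiplier (a normal-error assumption) are illustrative choices, not taken from the slides.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM from the score SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed, sem, z=1.96):
    """Band of observed mark plus or minus z SEMs (z = 1.96 assumes normal errors)."""
    return (observed - z * sem, observed + z * sem)


sd = 9.1        # standard deviation quoted on the slide
alpha = 0.88    # internal consistency reliability quoted on the slide
sem = standard_error_of_measurement(sd, alpha)
print(round(sem, 2))

# Hypothetical pupil with an observed mark of 28 out of 50.
low, high = score_band(28, sem)
print(round(low, 1), round(high, 1))
```

With these rounded inputs the result is a little above the 3.1 reported on the slide; the gap is of the size that rounding of the standard deviation and reliability would produce, although the slide does not state how its figure was derived.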

Page 15: The Reliability Programme: Leading the way to better tests and assessments

Strand 1 – Generating evidence (4)

Cronbach’s alpha for GCSE components/units

Page 16: The Reliability Programme: Leading the way to better tests and assessments

Strand 1 – Generating evidence (5)

Cronbach’s alpha for GCE units

Page 17: The Reliability Programme: Leading the way to better tests and assessments

Strand 1 – Generating evidence (6)

Assessor agreement rates for a workplace-based vocational qualification

Qualification   Number of decisions   Agreement rate (%)   Cohen’s Kappa
Q1              2144                  96.1                 0.763
Q2              479                   100                  1
Q3              3070                  99.1                 0.971
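The gap between a 96.1% agreement rate and a kappa of 0.763 for Q1 reflects how much agreement would be expected by chance. The Python sketch below shows the standard Cohen's kappa calculation on a hypothetical two-assessor table, then rearranges the formula to back out the chance agreement implied by the Q1 figures. The counts and function names are illustrative assumptions; the underlying decision tables are not given in the slides.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table of counts.

    table[i][j] is the number of decisions where assessor 1 chose
    category i and assessor 2 chose category j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_share = [sum(table[i]) / n for i in range(k)]
    col_share = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    expected = sum(r * c for r, c in zip(row_share, col_share))
    return (observed - expected) / (1 - expected)


def implied_chance_agreement(observed, kappa):
    """Rearranged kappa formula: chance agreement implied by observed agreement and kappa."""
    return (observed - kappa) / (1 - kappa)


# Hypothetical table: 100 decisions, two categories (not programme data).
example = [[90, 2], [2, 6]]
kappa = cohens_kappa(example)
print(round(kappa, 3))

# Q1 figures from the table: 96.1% agreement, kappa 0.763.
chance_q1 = implied_chance_agreement(0.961, 0.763)
print(round(chance_q1, 3))
```

On these figures most of the Q1 agreement would be expected by chance alone, which is the pattern that arises when nearly all decisions fall into one category. The rearranged formula is undefined when kappa equals 1, as for Q2.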

Page 18: The Reliability Programme: Leading the way to better tests and assessments

Strand 1 - Generating evidence (7)

The 2009 and 2010 live tests (populations)

                                        Classification accuracy (%)
Subject             Cronbach’s alpha    Method 1    Method 2
Science 2009        0.928               88          87
Science 2010        0.926               87          86
Mathematics 2009    0.968               90          90
Mathematics 2010    0.964               91          90
English 2009        0.910               87          85
English 2010        0.919               85          85

Page 19: The Reliability Programme: Leading the way to better tests and assessments

Strand 2 – Interpreting and communicating evidence (1)

External research projects
• Estimating and interpreting reliability, based on CTT
• Estimating and interpreting reliability based on CTT and G-theory
• Quantifying and interpreting GCSE and GCE component reliability based on G-theory
• Reporting of results and measurement uncertainties
• Representing and reporting of assessment results and measurement uncertainties in some USA tests
• Reliability of teacher assessment

Internal research projects
• Reliability of composite scores: based on CTT, G-theory and IRT, qualification level

Page 20: The Reliability Programme: Leading the way to better tests and assessments

Strand 2 – Interpreting and communicating evidence (2)

Reporting results and associated errors (students and parents)

Page 21: The Reliability Programme: Leading the way to better tests and assessments

Strand 2 – Interpreting and communicating evidence (3)

Technical seminars
• Factors that affect the reliability of results from assessments
• Definition and meaning of different forms of reliability
• Statistical methods that are used to produce reliability estimates
• Representing and reporting assessment results and reliability estimates / measurement errors
• Improving reliability and implications
• Disseminating reliability statistics
• Tension in managing public confidence whilst exploring and improving reliability
• Operational issues for awarding bodies in producing reliability information
• Challenges posed by the reliability programme in vocational qualifications

Page 22: The Reliability Programme: Leading the way to better tests and assessments

Strand 2 – Interpreting and communicating evidence (4)

International perspective on reliability
• Reliability studies should be built into the assessment quality assurance process
• Information on reliability (primary and derived indices) should be in the public domain
• The introduction of information about reliability (misclassification / measurement error) should be managed carefully
• Educating the public to understand the concept of reliability (measurement error) is seen as playing an important part in alleviating the problem of misinterpretation by the media
• The reporting of results and measurement error can be complex as results are normally used by multiple users
• Primary reliability indices and classification indices should be reported at population level
• Standard error of measurement should be reported at individual test-taker level

Page 23: The Reliability Programme: Leading the way to better tests and assessments

Strand 3a – Public perceptions of reliability (1)

External research projects
• Ipsos MORI survey
• Ipsos MORI workshops
• AQA focus groups

Internal research project
• Online questionnaire survey

Investigating
• Understanding of the assessment process
• Understanding of factors affecting performance on exams
• Understanding of factors introducing uncertainty in exam results
• Distinction between inevitable errors and preventable errors
• Tolerance for errors in results
• Disseminating reliability information

Page 24: The Reliability Programme: Leading the way to better tests and assessments

Views on accuracy of GCSE grades

Strand 3a – Public perceptions of reliability (2)

Page 25: The Reliability Programme: Leading the way to better tests and assessments

Strand 3a – Public perceptions of reliability (3)

[Bar chart: Views on national exams system. Vertical axis: percentage agreement (%), 0 to 100. Groups: teachers, students, employers. Series: “The national exam system is doing a very good job or a good job”; “The exam system is doing a good job but can be improved further”; “The exam system is not doing a good job and should be reformed”.]

Page 26: The Reliability Programme: Leading the way to better tests and assessments

Strand 3b – Developing Ofqual reliability policy (1)

Ofqual reliability policy based on
• Evaluating findings from this programme
• Evaluating findings from other reliability related studies
• Reviewing current practices adopted elsewhere

Page 27: The Reliability Programme: Leading the way to better tests and assessments

Ofqual Board recommendations

Continue work on reliability as a contribution to improving the quality assurance of qualifications, examinations and tests

Encourage awarding organisations to generate and publish reliability data

Continue to improve public and professional understanding of reliability and increase public confidence

Page 28: The Reliability Programme: Leading the way to better tests and assessments

Next steps

• Publishing reliability compendium later this year
• Reliability work becomes “business as usual”
• Creation of a further policy

Page 29: The Reliability Programme: Leading the way to better tests and assessments

Today

Presentations from the Technical Advisory Group and experts in teaching, assessment research and communications

Question and answer session

Tell us your opinions or email them to

[email protected]

Page 30: The Reliability Programme: Leading the way to better tests and assessments

Findings from the Reliability Research

Professor Jo-Anne Baird, Technical Advisory Group Member

Page 31: The Reliability Programme: Leading the way to better tests and assessments

Refreshment Break

Page 32: The Reliability Programme: Leading the way to better tests and assessments

A view from the assessment community

Paul E. Newton
Director, Cambridge Assessment Network Division

Presentation to Ofqual event The reliability programme: leading the way to better testing and assessments.

22 March 2011.

Page 33: The Reliability Programme: Leading the way to better tests and assessments

We need to talk about error

Page 34: The Reliability Programme: Leading the way to better tests and assessments

Talking about error

Page 35: The Reliability Programme: Leading the way to better tests and assessments

The Telegraph (front page)

Page 36: The Reliability Programme: Leading the way to better tests and assessments

The professional justification – what the profession needs to accomplish through talking about error

Page 37: The Reliability Programme: Leading the way to better tests and assessments

The bad old days

Boards seem to have strong objections to revealing their mysteries to ‘outsiders’ […] There have undoubtedly been cases of inquiries […] where publication would have been in the interests of education, and would have helped to prevent the spread of ‘horror-stories’ about such things as lack of equivalence which is an inevitable concomitant of the present cloak of secrecy.

Wiseman, S. (1961). The efficiency of examinations. In S. Wiseman (Ed.). Examinations in education. Manchester: MUP.

Page 38: The Reliability Programme: Leading the way to better tests and assessments

Promulgating the myth

However, any level of error has to be unacceptable – even just one candidate getting the wrong grade is entirely unacceptable for both the individual student and the system.

QCA. (2003). A level of preparation. TES Insert. The TES, 4 April.

Page 39: The Reliability Programme: Leading the way to better tests and assessments

The technical justification – why users and stakeholders need to know about error

Page 40: The Reliability Programme: Leading the way to better tests and assessments

Using knowledge of error

• Students and teachers
– maybe you’re better, or worse, than your grades suggest

• Employers and selectors
– maybe such fine distinctions shouldn’t be drawn
– maybe other information should be taken into account

• Parents
– maybe that difference in value added is insignificant
– maybe inferences like that should not be drawn

• Awarding bodies
– maybe that examination (structure) is insufficiently robust

• Policy makers
– maybe that proposed use of results is illegitimate
– maybe that policy change will compromise accuracy

Page 41: The Reliability Programme: Leading the way to better tests and assessments

Talking about error

• the commitment to greater openness and transparency about error is nothing new

• but there is still a long way to go

Page 42: The Reliability Programme: Leading the way to better tests and assessments

The 20-point scale (1969-72)

The presentation of results on (i) the broadsheet will be by a single number denoting a scale point for each subject taken by each candidate, accompanied by a statement on the range of uncertainty; and (ii) the candidate's certificate as a range of scale points (eg 13-17, corresponding to 15 on the broadsheet and indicating a range of uncertainty of plus or minus 2 scale points.)

Schools Council (1971). General Certificate of Education. Introduction of a new A-level grading scheme. London: Schools Council.

Page 43: The Reliability Programme: Leading the way to better tests and assessments

The 20-point scale (1969-72)

The following rubric is proposed, to be prominently displayed on both broadsheets and certificates:

"Attention is drawn to the uncertainty inherent in any examination. In terms of the scale on which the above results are recorded, users should consider that a candidate's true level of attainment in each subject while possibly represented by a scale point one or two higher or lower, is more likely to be represented by the scale point awarded than by any other scale point [...]."

Report by the Joint Working Party on A-level comparability to the Second Examinations Committee of the Schools Council on grading at A-level in GCE examinations. (1971)

Page 44: The Reliability Programme: Leading the way to better tests and assessments

20-point scale (1983-86)

It was proposed that the new scheme should have the following characteristics: [...] (d) results should be accompanied by a statement of the possible margin of error.

JMB (1983). Problems of the GCE Advanced level grading scheme. Manchester: Joint Matriculation Board.

Page 45: The Reliability Programme: Leading the way to better tests and assessments

Talking about error

• there is disagreement within the profession over the concept of error

• but, at least, we are beginning to make these differences of opinion more explicit

Page 46: The Reliability Programme: Leading the way to better tests and assessments

Measuring attainment

[Table: six numbered cells. Classification descriptors with their question stems:
– Consistency: How likely is it that students would be awarded different levels […]
– 'Correctness': How likely is it that students are awarded 'incorrect' levels […]
– Accuracy: How likely is it that students are awarded incorrect levels […]
Question completion: Marking facet (only); All (perhaps 'most') facets.]

Cell 1:
Cell 2: NFER in Newton (2009) - not adjusted
Cell 3: Royal-Dawson (2005); Baker et al. (2006); Royal-Dawson et al. (2009)
Cell 4: NFER in Newton (2009) - adjusted; Wiliam (2001)
Cell 5: n.a.
Cell 6: What the public want to know!

Page 47: The Reliability Programme: Leading the way to better tests and assessments

Judging performance

I argue that there is a strong case for saying that it is more sensible to accept that exams are just about fair competition – which means your performance must be reliably turned into a score but you accept as the luck of the draw things like the question paper being tough for you or having hay fever on the day, etc. Moreover, I think if you do that you can design things like regulatory work on reliability so that they reflect the priorities of the public. This was behind my first question to you about your presentation yesterday – do you really think Joe Public is interested in Cell 6? That’s an empirical question of course; I think the answer is no, but I’d love to find out for sure.

Mike Cresswell, 20 October 2009, personal communication

Page 48: The Reliability Programme: Leading the way to better tests and assessments

Uses of reliability information

• Evaluation and improvement
– highly technical (detailed & specific & idiosyncratic)
– obscure (typically not published)
– primary users = awarding bodies

• Accountability
– technical (but how detailed & generic & uniform?)
– translatable (published but not necessarily disseminated)
– primary users = regulator & analysts

• Education
– non-technical (uncomplicated & generic & uniform)
– translated (widely disseminated)
– primary users = members of the public

Page 49: The Reliability Programme: Leading the way to better tests and assessments

For education

How can we achieve greater openness and

transparency?

Page 50: The Reliability Programme: Leading the way to better tests and assessments

The Sun

Page 51: The Reliability Programme: Leading the way to better tests and assessments

For education

• use analogy, wherever possible
• use commonsense, not technical, terms
• convey misrepresentation, not variation
• rely on heuristics, not statistics

[…] results on a six or seven point grading scale are accurate to about one grade either side of that awarded.
Schools Council. (1980). Focus on examinations. Pamphlet 5. London: Schools Council.

Page 52: The Reliability Programme: Leading the way to better tests and assessments

The importance of assessment results in today’s education system...

and communicating uncertainty in what they can tell us

Warwick Mansell

Page 53: The Reliability Programme: Leading the way to better tests and assessments

The emphasis being placed on test results

Page 54: The Reliability Programme: Leading the way to better tests and assessments

[Diagram: One pupil’s exam results: national implications. Labels: child takes exams; exams marked and graded; judgement at school level (teacher, department, head teacher, school results, Ofsted); judgement at local level (local authority/federation/academy chain); judgement at national level (education initiatives, civil servants, ministers, national productivity, debate: state ed successful?).]

Page 55: The Reliability Programme: Leading the way to better tests and assessments

Types of “error”

Page 56: The Reliability Programme: Leading the way to better tests and assessments

Error:

“the difference between an approximate result and the true determination”.

Page 57: The Reliability Programme: Leading the way to better tests and assessments

Communication of measurement error:

It can, and is, done

Page 58: The Reliability Programme: Leading the way to better tests and assessments

“The information in these tables only provides part of the picture of each school’s and its pupils’ achievements. Schools change from year to year and their future results may differ from those achieved by current pupils. The tables should be considered alongside other important sources of information such as Ofsted reports and school prospectuses.”

DfE, school performance tables website, 2011

Page 59: The Reliability Programme: Leading the way to better tests and assessments

What can go wrong if measurement certainty is not understood and communicated

Page 60: The Reliability Programme: Leading the way to better tests and assessments
Page 61: The Reliability Programme: Leading the way to better tests and assessments

Is the public ready to accept the concept of measurement error?

Page 62: The Reliability Programme: Leading the way to better tests and assessments

Sats results “wrong for thousands of pupils”

Daily Telegraph, 13/11/09

Page 63: The Reliability Programme: Leading the way to better tests and assessments

“New Sats fiasco as one in three pupils 'will get wrong exam results’”

Daily Mail, 31/1/09

Page 64: The Reliability Programme: Leading the way to better tests and assessments

Talking about reliability at the macro, and at the micro, level

Page 65: The Reliability Programme: Leading the way to better tests and assessments

Was I reliably informed...?

... a former principal ponders

John Guy
Formerly Principal, Farnborough Sixth Form College

Page 66: The Reliability Programme: Leading the way to better tests and assessments

3250 students; mostly A levels
3312 applications for 1750 places in September 2010

61 AS courses
Biggest? AS Mathematics, AS Psychology, AS English, AS Media
Smallest? AS Italian (6)

Page 67: The Reliability Programme: Leading the way to better tests and assessments
Page 68: The Reliability Programme: Leading the way to better tests and assessments

Reliability refers to the consistency of outcomes that would be observed from an assessment process were it to be repeated.

High reliability means that broadly the same outcomes would arise.

A range of factors that exist in the assessment process can introduce unreliability into assessment results.

(un)reliability concerns the impact of the particular details that do happen to vary from one assessment to the next for whatever reason.

So reliability was important to the College.....and we paid over £800,000 a year to get it

Page 69: The Reliability Programme: Leading the way to better tests and assessments

Today’s session:

Ponder aloud on reliability and the causes of unreliability and its impact upon College students

A level History

A level Business Studies

A level Art

O level Athletics

Page 70: The Reliability Programme: Leading the way to better tests and assessments

Hasna Benhassi Tatyana Tomashova

Page 71: The Reliability Programme: Leading the way to better tests and assessments

A level History

150 – 200 students taking A2 annually

Previous achievements and value-added indicators suggest improving cohort

Stable cohort of experienced and inspiring teachers, led by Chair of History Teaching Association

Many experienced A level examiners

Could be employed by Higher Education – and would be awarding degrees...

Page 72: The Reliability Programme: Leading the way to better tests and assessments

[Chart: History A level results. Completers: 145, 140, 166, 179, 195. Label: Awarding Body.]

Page 73: The Reliability Programme: Leading the way to better tests and assessments

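[Diagram: Mapping raw score to UMS scale (map to UMS A, B, C, D, E). Raw score axis from 0 to 60, marked at 27, 30, 34, 38 and 42; grade labels E, D, C, B, A and A*; label “BAR”. UMS scale marked at 0, 30, 40, 50, 60, 70, 80, 90 and 100.]

Marking tolerance +/- 5%

Tolerance amplified +/- 8%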

Page 74: The Reliability Programme: Leading the way to better tests and assessments

[Chart: History A level results. Completers: 145, 140, 166, 179, 195. Label: Awarding Body.]

Reliability refers to the consistency of outcomes that would be observed from an assessment process were it to be repeated.

Page 75: The Reliability Programme: Leading the way to better tests and assessments

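[Diagram: Mapping raw score. Raw score axis from 0 to 60, marked at 27, 30, 34, 38 and 42; grade labels E, D, C, B, A and A*; label “BAR”; percentage labels 45% and 70%.]

A-E range should be 40%

Narrow A-E range produces unreliability – in this case range is 25%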
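The arithmetic behind the "tolerance amplified" figure on the mapping slides can be sketched as follows. This Python example assumes the raw boundaries read from the diagram are E 27, D 30, C 34, B 38 and A 42 out of 60, and that they map linearly onto UMS percentages of 40, 50, 60, 70 and 80. The pairing of marks with grades and the linear interpolation between boundaries are assumptions made for illustration, and the function names are hypothetical.

```python
def amplified_tolerance(raw_tolerance_pct, raw_range_pct, ums_range_pct=40.0):
    """Scale a raw-mark tolerance by the stretch from the raw A-E range to the UMS A-E range."""
    return raw_tolerance_pct * ums_range_pct / raw_range_pct


def raw_to_ums(raw, boundaries):
    """Linear interpolation between (raw mark, UMS %) boundary points.

    Only defined between the lowest and highest boundary supplied, because
    the treatment of marks outside the A-E range is not recoverable here.
    """
    points = sorted(boundaries)
    if raw < points[0][0] or raw > points[-1][0]:
        raise ValueError("raw mark outside the supplied boundaries")
    for (r0, u0), (r1, u1) in zip(points, points[1:]):
        if r0 <= raw <= r1:
            return u0 + (raw - r0) * (u1 - u0) / (r1 - r0)


MAX_RAW = 60
# Assumed pairing of raw boundaries with UMS percentages (E, D, C, B, A).
HISTORY_BOUNDARIES = [(27, 40), (30, 50), (34, 60), (38, 70), (42, 80)]

raw_range_pct = (42 - 27) / MAX_RAW * 100      # 25% of the raw mark scale
amplified = amplified_tolerance(5, raw_range_pct)
print(raw_range_pct, amplified)

# One raw mark just above the assumed D boundary moves the UMS score by 2.5 points.
print(raw_to_ums(31, HISTORY_BOUNDARIES))
```

Under these assumptions a 5% marking tolerance on the raw scale becomes 8% on the UMS scale, which matches the figures quoted on the slide: the narrower the raw A-E range, the larger the stretch.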

Page 76: The Reliability Programme: Leading the way to better tests and assessments

Business Studies 2011 A2 raw marks – from web search

[Diagram: raw score axis from 0 to 60, marked at 25, 27, 30, 33 and 36, with a further mark at 42; grade labels E, D, C, B, A and A*; label “BAR”; percentage labels 42%, 60%, 30% and 50%.]

18% A-E range!!

• These raw marks over 42 worth nothing
• These raw marks between 27 and 42 worth 3% each
• These raw marks 23-27 worth 5% each
• These raw marks 0 - 23 worth 1.5 % each

Candidate 1: Q4 = 4 raw marks, total 27
Candidate 2: Q4 = 0 raw marks, total 23

Is this a reliable or valid assessment instrument?

Page 77: The Reliability Programme: Leading the way to better tests and assessments

The Regulated Assessment (wobbly) Ruler?

Questions 1,2,3 Questions 5, 6, 7, 8

When you measure things... ...it’s a good idea to use a reliable ruler!

Sometimes I think the College ruler is more reliable!

4

0!

Page 78: The Reliability Programme: Leading the way to better tests and assessments

AS level Art 2007 - 495 Candidates

                             A      B      C      D      E
FSFC 2007                    14.1   37.5   72.7   93.1   97.1
Joint Council figure 2007    21     42     66     83     94
FSFC 2006                    23.2   55.4   87.3   96.3   98.3
Joint Council figure 2006    22     44     67     84     94
FSFC 2005                    20.7   48.3   82.2   97.8   99.3
Joint Council figure 2005    21     42     65     82     92
FSFC 2004                    20.4   45.2   78.3   94.4   99
Joint Council figure 2004    22.2   42.5   63.8   81.4   92.4
FSFC 2003                    22.8   46.7   68.7   85.1   95.9
Joint Council figure 2003    22.2   42.2   63.5   80.6   91.5

Reliability refers to the consistency of outcomes that would be observed from an assessment process were it to be repeated.

Page 79: The Reliability Programme: Leading the way to better tests and assessments

2007 – a special year
• New specification – 4 units
• Awarding Body invited teachers to meeting to discuss grading
• New boundaries for criterion judgements were proposed, with the grade A boundary set lower than in previous years.
• Attendance at the Awarding Body meetings was not compulsory.

New boundaries (used by College)
Grade A 62
Grade B 54
Grade C 46
Grade D 38
Grade E 30

Criterion judgments, no disagreements at moderation; work praised (again) for consistent internal assessment

Adjusted boundaries (summer 2007)
Grade A 69
Grade B 60
Grade C 51
Grade D 42
Grade E 33

New boundaries close to historic grade boundaries which the awarding body had sought to change

Page 80: The Reliability Programme: Leading the way to better tests and assessments

ANALYSIS

Value added scores
2005: +0.4    2006: +0.4    2007: -0.3    2008: +0.4

Chi-squared test

                 A        B        C        D        E        U
2003-2006        21.8%    27.1%    30.1%    14.3%    4.75%    2%
2007 Expected    107.9    134.1    149      70.8     23.3     9.9
2007 Actual      70       116      174      101      20       14
Chi-sq           13.32    2.45     4.2      12.9     0.46     1.7      sum 35.02

Tables give 18.47 at 0.1% significance level

Assuming a cohort of similar ability (as agreed with the moderator), the chance of this change occurring randomly is infinitesimal.
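The chi-squared figure on this slide can be reproduced from the expected and actual 2007 counts. The Python sketch below recomputes the goodness-of-fit statistic and compares it with tabulated critical values at the 0.1% level. The function name is hypothetical, and the critical values are standard table entries rather than figures from the slides, apart from 18.47.

```python
def chi_squared_statistic(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over the categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Grades A, B, C, D, E, U for the 2007 cohort, as given on the slide.
actual_2007 = [70, 116, 174, 101, 20, 14]
expected_2007 = [107.9, 134.1, 149.0, 70.8, 23.3, 9.9]

statistic = chi_squared_statistic(actual_2007, expected_2007)
print(round(statistic, 2))

# Upper 0.1% critical values from standard chi-squared tables,
# keyed by degrees of freedom.
CRITICAL_0_1_PERCENT = {4: 18.467, 5: 20.515}
for df, critical in CRITICAL_0_1_PERCENT.items():
    print(df, statistic > critical)
```

The unrounded statistic comes out at roughly 35, marginally below the slide's 35.02, which sums the rounded components. The 18.47 quoted on the slide is the table value for four degrees of freedom; six grade categories with a fixed total would ordinarily give five, for which the table value is about 20.52. The statistic exceeds both, so the slide's conclusion does not depend on that choice.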

Page 81: The Reliability Programme: Leading the way to better tests and assessments

Was this a reliable assessment?

• College immediately contacted Board and was told to appeal
• College appealed, sending copy of letter to Ofqual and the Chief Executive
• Appeal heard by three members who were interested only in process
• Appeal was rejected
• No doubt the process was followed assiduously
• However, the process was flawed

Page 82: The Reliability Programme: Leading the way to better tests and assessments

Conclusions

Large cohorts from open access colleges are representative of the whole population

Large cohorts of students therefore provide an opportunity for an additional check on processes

Statistical analysis of the entire cohort will hide flaws in the assessment process

An error is associated with every measurement but some measurements are error(mistake)-ridden – and unfair.

Is error(mistake) designed into the assessment instrument?

Awarding bodies are not keen to admit it!

Reliability refers to the consistency of outcomes that would be observed from an assessment process were it to be repeated.

Page 83: The Reliability Programme: Leading the way to better tests and assessments

Questions and Answers to the Panel of Speakers

Chair: Glenys Stacey, Ofqual Chief Executive

Page 84: The Reliability Programme: Leading the way to better tests and assessments

Ofqual’s Reliability Programme

Closing remarks

Dennis Opposs

Page 85: The Reliability Programme: Leading the way to better tests and assessments

Ofqual Board recommendations

Recommendation 1: Continue work on reliability as a contribution to improving the quality assessment of qualifications, examinations and tests.

Work in the areas of teacher assessment, workplace-based assessment and construct validity of assessment would be of particular interest and importance.

The scope of the work possible will clearly be limited by the resource available.

Page 86: The Reliability Programme: Leading the way to better tests and assessments

Ofqual Board recommendations

Recommendation 2: Encourage Awarding Organisations to generate and publish reliability data.

We need to use impact assessments to help decide what is appropriate.

The first progress is likely to involve GCSEs and A levels where the work has progressed furthest.

In due course we might make some of this a regulatory requirement for Awarding Organisations.

Page 87: The Reliability Programme: Leading the way to better tests and assessments

Ofqual Board recommendations

Recommendation 3: Continue to improve public and professional understanding of reliability and increase public confidence in the examination system by working with the Awarding Organisations and others.

Page 88: The Reliability Programme: Leading the way to better tests and assessments

Next steps

• Publishing reliability compendium later this year
• Reliability work becomes “business as usual”
• Creation of a further policy

Page 89: The Reliability Programme: Leading the way to better tests and assessments

Today

Tell us your opinions or email them to

[email protected]

Page 90: The Reliability Programme: Leading the way to better tests and assessments

Thank you for attending

Networking Lunch