
Journal of Clinical Epidemiology 65 (2012) 444–453

Good interobserver agreement was attainable on outcome adjudication in patients with stable coronary heart disease

Erik Kjoller a,c,*, Jorgen Hilden b, Per Winkel c, Niels J. Frandsen d, Soren Galatius e, Gorm Jensen f, Jens Kastrup g, Jorgen Fischer Hansen h, Hans J. Kolmos i, Christian M. Jespersen h, Per Hildebrandt j, Christian Gluud c, the CLARICOR Trial Group

a Department of Cardiology, Copenhagen University Hospital, Herlev Hospital, Herlev Ringvej 75, DK-2730 Herlev, Denmark
b Department of Biostatistics, University of Copenhagen
c The Copenhagen Trial Unit, Centre for Clinical and Interventional Research, Copenhagen University Hospital, Rigshospitalet
d Department of Cardiology, Copenhagen University Hospital, Amager Hospital
e Department of Cardiology, Copenhagen University Hospital, Gentofte Hospital
f Department of Cardiology, Copenhagen University Hospital, Hvidovre Hospital
g Department of Cardiology, Copenhagen University Hospital, Rigshospitalet
h Department of Cardiology, Copenhagen University Hospital, Bispebjerg Hospital
i Department of Clinical Microbiology, Odense University Hospital
j Department of Cardiology, Copenhagen University Hospital, Frederiksberg Hospital

Accepted 14 September 2011; Published online 17 January 2012

Abstract

Objective: In clinical trials, agreement on outcomes is of utmost importance for valid estimation of intervention effects. As there is limited knowledge about adjudicator agreement in cardiology, we examined the level of agreement among three cardiology specialists adjudicating all possible events in a randomized controlled clinical trial of patients with stable coronary heart disease.

Study Design and Setting: All information (hospital records, death certificates, etc.) was forwarded to two randomly selected blinded adjudicators. If they disagreed, the third arbiter had to choose the more likely of the two alternatives. Files of 5,475 nonfatal and 362 fatal events were evaluated.

Results: For nonfatal outcomes, pairwise kappa values ranged from 0.75 to 0.80. The three adjudicators had 4.3%, 9.5%, and 6.1% of their nonfatal outcome classifications overruled by their arbiter. If stable angina pectoris, unstable angina pectoris, and acute myocardial infarction were treated as one, agreement increased minimally. For fatal outcomes, the pairwise kappa values ranged from 0.65 to 0.90. The three adjudicators had 12%, 9%, and 10% of their death classifications overruled.

Conclusion: Specialists in cardiology can attain a reasonably high agreement on outcomes in patients with stable coronary heart disease. © 2012 Elsevier Inc. All rights reserved.

Keywords: Outcome assessment; Clinical trial; Myocardial infarction; Prognosis; End point; Randomized controlled trial

Funding: None.

Ethical approval: Not required.

* Corresponding author. Tel.: +45-38681143; fax: +45-38684399.

E-mail address: [email protected] or [email protected] (E. Kjoller).

0895-4356/$ - see front matter © 2012 Elsevier Inc. All rights reserved.

doi: 10.1016/j.jclinepi.2011.09.011

What is new?

Key findings:
Three adjudication committee members who are specialists in cardiology are able to:
- in the case of nonfatal hospital admissions, obtain a high degree of accuracy on cardiovascular outcomes;
- in the case of death, obtain an agreement rate that is high on death being cardiovascular or not, but lower as regards the precise cardiovascular cause.

What this adds to what is known?
- Nearly nothing is known on the certainty of outcome evaluation performed by the adjudication committee members in relation to the final result obtained by the adjudication committee.

What is the implication, what should change now?
- Clinical trials should, as part of the quality control of the outcome evaluations, publish the agreement rate between the adjudication committee members.

The next steps are:
- What is the effect of differences in protocols for resolving disagreements?
- Is the uncertainty of outcome evaluations affected by the number of adjudicators?
- To what extent does uncertainty in the outcome classification affect the conclusions of trials?

1. Introduction

Randomized clinical trials represent the gold standard when the effects of interventions are tested. The validity and trustworthiness of clinical trials are evaluated in terms of patient selection, methodologies for randomization (generation of the allocation sequence and allocation concealment), blinding, compliance with the trial protocol, choice of protocol-defined outcome measures and how outcomes are ascertained, and statistical analyses. The methodological problems concerning patient selection and trial conduct have drawn a lot of attention as opposed to the evaluation of outcome measures. The requirements concerning outcomes in trials are that outcome registration should be complete and correct and based on prespecified criteria.

Any outcome that requires a "measurement" involving human judgment is inherently error prone and bias prone and hence mandates a blinded adjudication committee, a blinded core laboratory, or both [1], as also recommended by the Food and Drug Administration [2] and the European Medicines Agency [3]. According to a recent review by Dechartres et al. [4], covering trials from 2004 to 2005, only a minority of the reports had comments on the reliability of adjudication of outcomes, and data verification or readjudication by independent monitors was rare. Only one study, in the course of comparing blinded and unblinded evaluation of selected cardiovascular outcomes, performed a reassessment [5]. To our knowledge, no previous studies have tested the agreement between adjudicators evaluating all possible clinical outcomes, that is, not only fatal but also nonfatal outcomes, in cardiology.

The purpose of the present study is to evaluate the interobserver agreement among adjudication committee members in a large cardiovascular clinical trial, when records about all events that might signal an outcome were available for adjudication.

2. Methods

The CLARICOR trial (ClinicalTrials.gov NCT00121550) is a randomized, placebo-controlled, multicenter trial including 4,373 patients with stable coronary heart disease, using central randomization and blinding of all parties in all phases. The potential presence of a chronic Chlamydia pneumoniae infection in coronary arteries was the basis for the hypothesis that clarithromycin might prevent coronary outcomes and deaths. The trial design [6] and 2.6-year [7] and 6-year follow-ups [8] have been published. All patients in Copenhagen with a diagnosis of myocardial infarction or angina pectoris (International Statistical Classification of Diseases codes I 209-219) during the years 1993–99 were identified and, if alive, invited via mail in late 1999. The Copenhagen Trial Unit coordinated and conducted patient identification, randomization, and data management. Five academic cardiology departments conducted patient enrollment. Patients were centrally randomized to clarithromycin 500 mg per day for a fortnight vs. placebo.

2.1. Follow-up, information sources, and outcome measures

No follow-up visits were planned. Information about hospital admissions came from the Danish National Hospital Register, a database of all somatic hospital admissions. All hospitalizations are recorded, as no hospital contact can be brought to an end without a diagnosis and registration. Information about death came from the Danish Central Civil Register, which records the vital status of all inhabitants. Registration is close to 100% in these registers. All potential outcome events, that is, hospital admissions and deaths, were evaluated to secure a uniform adjudication of the components that entered the three main outcome measures [7]. The primary outcome measure was a composite of time to mortality regardless of cause, myocardial infarction, or unstable angina pectoris. The secondary outcome measure consisted of cardiovascular mortality, myocardial infarction, or unstable angina pectoris. The tertiary outcome (or better, second secondary) measure consisted of cardiovascular mortality, myocardial infarction, unstable angina pectoris, cerebrovascular attacks, or peripheral vascular disease. The present analysis, however, addresses each outcome component, that is, each diagnostic category individually, not composites, and thus intends to be applicable to cardiovascular trials more broadly.

2.2. Adjudication criteria for outcome components

When evaluating nonfatal events, the adjudicator first had to make a distinction between noncardiovascular and cardiovascular diseases. For the cardiovascular disease category, one of the following specific outcome diagnoses was coded: acute myocardial infarction (AMI), unstable angina pectoris, stable angina pectoris with revascularization, cerebrovascular disease, peripheral vascular event, or other cardiovascular disease. The diagnoses of myocardial infarction required elevation of creatine kinase isoenzyme muscle band or troponin and significant ST changes in the electrocardiograph (ECG) consistent with myocardial ischemia or myocardial infarction [9]. Long-lasting chest pain or chest pain at rest without major changes in enzymes was classified as unstable angina pectoris. Heart failure was diagnosed in accordance with the recommended criteria [10]. Cerebrovascular disease comprises focal cerebral deficits subclassified as strokes or transient ischemic attacks. Peripheral vascular event encompassed peripheral arterial thromboembolism, surgical or transluminal angioplasty, or severe claudication.

When evaluating fatal events, the cause of death was the morbid condition leading to death, in the sense of the international form prescribed for medical certificates of causes of death, and administered according to the Manual of the International Statistical Classification of Diseases, Injuries, and Causes of Death (World Health Organization, 1993). The information forwarded to the adjudicators was the hospital record and/or a death certificate. The adjudicator first had to evaluate the quality of information. When both were missing, the cause of death was classified as unknown. When information existed, a distinction between noncardiovascular and cardiovascular deaths was performed. Death was assumed to be because of cardiovascular causes unless a noncardiovascular cause was identified. For the cardiovascular outcome, one of the following diagnoses was coded: AMI, unstable angina pectoris, stable angina pectoris with revascularization, heart failure, cerebrovascular disease, other specified cardiovascular disease, or cardiovascular disease not specifiable.

2.3. Procedure for evaluation of events

Three administrative types of events had to be handled: a patient's hospital admission followed by his/her discharge alive from the hospital, a hospital admission followed by death at the hospital, and a patient's death outside the hospital. The coordinating center collected copies of complete hospital records, including laboratory tests, ECG, discharge letters for events of types 1 and 2, and death certificates for all events of types 2 and 3 during the 2.6-year follow-up. For each event, all information regarding the event was forwarded together with an event form to a pair of adjudicators, that is, two randomly chosen members from the adjudication committee. The randomization was computer generated and built into the computerized data management system.

The adjudicators (I, II, and III) were fully certified specialists in cardiology (E.K., N.J.F., S.G.), who have had their training in Denmark at various cardiology departments, but otherwise they are not closely professionally connected. All have research experience and acquired a PhD or a DMSc. At the time of adjudication, they had 15–25 years of hospital experience in their specialty and were working as coheads of departments of cardiology in separate public hospitals. They were recruited from outside the CLARICOR Steering Group (E.K. entered the Steering Group after the follow-up was completed) and thus did not otherwise take part in the planning or conduct of the trial.

They used a prespecified coding flowchart to assess all hospital admissions and deaths. They returned the filled-in forms to the coordinating center. If the two evaluations disagreed, both forms were sent together with copies of the relevant records of the event to the third adjudicator, who would serve as an arbiter; he would have to select the proposal he found more likely and overrule the less likely. The adjudicator was required to mark whether he served as one of the first two adjudicators or served as the arbiter, selecting one of the evaluations of the first two adjudicators. After adjudication of the first 50 events and after adjudication of the first 200 events, the three adjudicators met to discuss the previous 50 adjudications.
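The assignment and arbitration rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `assign_pair` and `final_outcome` are hypothetical, and the trial's actual data management system is not described in enough detail to reproduce.

```python
import random

ADJUDICATORS = ("I", "II", "III")


def assign_pair(rng):
    """Pick two adjudicators at random; the remaining one is the potential arbiter."""
    pair = rng.sample(ADJUDICATORS, 2)
    arbiter = next(a for a in ADJUDICATORS if a not in pair)
    return pair, arbiter


def final_outcome(first, second, arbiter_choice=None):
    """Return (final diagnosis, overruled diagnosis or None).

    If the two initial evaluations agree, no arbiter is needed.
    Otherwise the arbiter must adopt one of the two proposals,
    and the other proposal counts as overruled.
    """
    if first == second:
        return first, None
    if arbiter_choice not in (first, second):
        raise ValueError("the arbiter must select one of the two proposals")
    overruled = second if arbiter_choice == first else first
    return arbiter_choice, overruled


# Hypothetical usage: one event, randomly assigned, with a disagreement.
pair, arbiter = assign_pair(random.Random(0))
final, overruled = final_outcome("AMI", "unstable angina pectoris", "AMI")
```

Because the arbiter can only choose between the two initial proposals, every disagreement produces exactly one overruled evaluation, which is what makes the overruled fractions reported in the Results directly countable.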

3. Materials

From randomization until the end of 2.6-year follow-up, September 16, 2002, a total of 921 patients (22%) were discharged alive after one or more hospital admissions. This gave rise to 7,335 department discharge records, reflecting 5,540 separate nonfatal hospital disease episodes (type 1 events), which led to 5,475 evaluable adjudication events (Fig. 1A).

A total of 384 (8.8%) patients died during follow-up, but only 133 deaths were documented by hospital records. For 14 deaths, no information on circumstances of death was available. For a further eight deaths, the individual adjudicators' assessments behind the consensus cause of death were unclear, leaving 362 deaths available for adjudicator agreement analysis (Fig. 1B).

3.1. Statistical analysis of agreement

We applied the same statistical principles in the analysis of data on nonfatal and fatal events. For each of the listed diagnostic categories (and each pair of adjudication members [not reported here]), the overall agreement (OA) proportion was calculated as $\mathrm{OA} = (a + d)/(a + b + c + d)$, where $a$ is the number of times two adjudicators agreed that the event belonged to the diagnostic category, $d$ the number of times they both rejected the category, while $b$ and $c$ count the (two possible types of) disagreements. Thus, the OA is the fraction of admissions in which both judges chose or both rejected the category concerned. This percentage is generally quite high, but it may be somewhat misleading. Instead, we present the positive agreement (PA), also known as "repeat frequency" [11], which estimates the probability that an adjudicator will choose the diagnostic class in question given that his coadjudicator did so; similarly, the negative agreement (NA) shows how often an adjudicator will reject this diagnostic class given that his coadjudicator did so: $\mathrm{PA} = 2a/(2a + b + c)$ and $\mathrm{NA} = 2d/(2d + b + c)$.
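As a worked illustration of these three formulas, the following Python sketch recomputes OA, PA, and NA for unstable angina pectoris among the nonfatal events. The counts are read from Table 2 (66 initial agreements and 331 uses of the label in total, out of 5,475 events); the function name `agreement` is an assumption made for this example, and only the sum of b and c is known, which is all the formulas require.

```python
def agreement(a, b, c, d):
    """Overall, positive, and negative agreement for one diagnostic category.

    a: both adjudicators chose the category
    d: both adjudicators rejected the category
    b, c: the two kinds of disagreement
    """
    n = a + b + c + d
    oa = (a + d) / n
    pa = 2 * a / (2 * a + b + c)
    na = 2 * d / (2 * d + b + c)
    return oa, pa, na


# Unstable angina pectoris, nonfatal events (counts read from Table 2).
a = 66                       # pairs agreeing on the diagnosis
b_plus_c = 331 - 2 * a       # 331 uses of the label in total -> 199 disagreements
d = 5475 - a - b_plus_c      # pairs agreeing that the diagnosis is absent

# Only b + c enters the formulas, so the split between b and c is arbitrary here.
oa, pa, na = agreement(a, b_plus_c, 0, d)
print(round(oa, 2), round(pa, 2), round(na, 2))  # 0.96 0.4 0.98
```

The example shows why OA can mislead: because d dominates the table, OA and NA stay close to 1 even though two adjudicators concur on a positive diagnosis of unstable angina pectoris in only about 40% of its uses.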

[Figure 1: flow diagrams. (A) 5,540 hospital disease episodes (reflected in 7,335 department records); incomplete data 18 (30); no adjudication 47 (56); 5,475 evaluable adjudications (7,249). (B) 384 deaths; incomplete data 8; no data 14; 362 evaluable adjudications.]

Fig. 1. Flow diagrams of nonfatal and fatal events data forwarded to the adjudicators. (A) Nonfatal events: Because of transfer between hospital departments, each adjudication task comprised one or more department patient records, shipped to adjudicators as a unit (henceforward called a "hospital disease episode"). Of 5,540 hospital disease episodes (7,335 department records), 5,475 (99%) provided usable data for evaluation of the adjudication process. "No adjudication" covers hospital contacts without clinical information or data loss. (B) Fatal events: Of 384 deaths, usable records for evaluation of the adjudication process were achieved in 362 (94%). In 14 cases, only the date of death is known.

One would also like to know the probability that an adjudicator's initial choice will end up as the adjudication committee's final answer. To this end, we report positive (negative) confirmation percentages, defined as the probability that the presence (absence) of the category is confirmed either by the coadjudicator or, in case of disagreement, by the arbiter. If not confirmed, an adjudication answer is said to be overruled; with our protocol, this happens when, and only when, the arbiter supports the coadjudicator's answer. Details of the calculations can be found in the Appendix.

We calculated the associated asymptotic (i.e., large-sample) standard errors. For each set of comparisons between two adjudicators, Cohen kappa was calculated and 95% confidence interval stated [12] for the OA of the classification into the several above-mentioned diagnostic categories.
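For readers unfamiliar with the statistic, a minimal Python sketch of Cohen kappa for a square cross-classification of two adjudicators is given below. The 2 × 2 counts are hypothetical (the pairwise tables are not reported in this paper), the function name `cohen_kappa` is an assumption, and the standard error shown is the simple large-sample approximation, which may differ from the formula used in reference [12].

```python
import math


def cohen_kappa(table):
    """Cohen kappa with an approximate 95% confidence interval.

    table[i][j] = number of events that adjudicator 1 put in class i
    and adjudicator 2 put in class j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_share = [sum(table[i]) / n for i in range(k)]
    col_share = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance from the two marginal distributions.
    p_expected = sum(r * c for r, c in zip(row_share, col_share))
    kappa = (p_observed - p_expected) / (1 - p_expected)
    # Simple large-sample standard error (an approximation, see lead-in).
    se = math.sqrt(p_observed * (1 - p_observed) / (n * (1 - p_expected) ** 2))
    return kappa, (kappa - 1.96 * se, kappa + 1.96 * se)


# Hypothetical counts for two adjudicators and two classes.
kappa, (low, high) = cohen_kappa([[20, 5], [10, 15]])
```

With these hypothetical counts the observed agreement is 0.70 and the chance agreement 0.50, giving a kappa of 0.40; kappa thus corrects OA for the agreement that the marginal frequencies alone would produce.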

[Figure 2: assignment diagram linking Adjudicator I, Adjudicator II, and Adjudicator III. 5,475 possible nonfatal outcomes evaluated by the three adjudicators: the three pairs handled 1,845 (254), 1,818 (204), and 1,812 (271) tasks; total 5,475 (729). 362 fatal outcomes evaluated by the three adjudicators: the three pairs handled 102 (7), 120 (27), and 140 (39) tasks; total 362 (73). In brackets, the number of disagreements.]

Fig. 2. Assignment of adjudication tasks to pairs of adjudicators and (in parentheses) the resulting number of disagreements.

4. Results

4.1. Agreement in adjudication of nonfatal events

The nonfatal hospital disease episodes (type 1 events) were distributed to the three pairs of adjudicators as in Fig. 2, which also gives the associated numbers of disagreements. Altogether, of the 5,475 evaluations performed, one of the two diagnoses suggested was overruled by the arbiter in 729 cases of disagreement, reflecting a disagreement proportion of 13.3%. The risk of being overruled was, therefore, exactly one half of this, namely, 729/(2 × 5,475) = 6.7%. Individually, the adjudicators had 4.3%, 9.5%, and 6.1% of their diagnoses overruled. If unstable angina pectoris, revascularization because of stable angina pectoris, and AMI were taken as one outcome, the overall fraction overruled dropped to 6.3% and individually to 4.0%, 9.0%, and 5.9%.
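The arithmetic behind these proportions can be retraced from the counts in Fig. 2. In the sketch below, the labels "pair 1" to "pair 3" are placeholders (an assumption), because the extracted figure does not show which two adjudicators formed each pair; the task and disagreement counts themselves come from the figure.

```python
# (tasks, disagreements) per pair of adjudicators, nonfatal events, from Fig. 2.
# The pair labels are hypothetical placeholders.
pairs = {
    "pair 1": (1845, 254),
    "pair 2": (1818, 204),
    "pair 3": (1812, 271),
}

total_tasks = sum(tasks for tasks, _ in pairs.values())
total_disagreements = sum(dis for _, dis in pairs.values())

# Each task is judged twice, and each disagreement overrules exactly one
# of the two judgments, so the risk of being overruled is half the
# disagreement proportion.
disagreement_rate = total_disagreements / total_tasks
overrule_risk = total_disagreements / (2 * total_tasks)

print(total_tasks, total_disagreements)   # 5475 729
print(round(100 * disagreement_rate, 1))  # 13.3
print(round(100 * overrule_risk, 1))      # 6.7
```

The same calculation applies to the fatal events, where 73 disagreements among 362 deaths give 73/(2 × 362).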

Table 1 shows the agreement statistics, and Table 2 the details. Using unstable angina pectoris as an illustration, we observed that in 96% of the cases the two initial adjudicators agreed. If one member had excluded the diagnosis, then the probability that the other one would also exclude the diagnosis was 98%. However, if one adjudicator made the diagnosis of unstable angina pectoris, the probability that the other adjudicator would characterize the problem as unstable angina pectoris was only 40% (in fact, Table 2 shows that his assessment would typically be "other cardiovascular disease"). The NA was generally high; the PA lower, reflecting the difficulty of reaching agreement on a more specific predicate (in the notation of the statistics section above, the d's are large and the a's mostly small).

In our trial and other typical coronary heart disease trials, cross-confounding of stable angina pectoris, unstable angina pectoris, and myocardial infarction is inconsequential. The final line of Table 1 shows the effect of treating these diagnostic categories as one outcome.

The OA of the classification into one of the seven nonfatal outcomes showed kappa = 0.77 (0.75–0.80), 0.80 (0.77–0.82), and 0.74 (0.71–0.76) for the three pairs of adjudicators, respectively. The kappa values increased by 0.01 when stable angina pectoris, unstable angina pectoris, and myocardial infarction were taken as one outcome. Adjudicators had different diagnostic favorites in unclear situations, causing the marginal distributions to differ throughout the pairwise analyses (statistics omitted).

Table 1. Agreement on outcome for 5,475 nonfatal events

| Outcome | OA between pairs of adjudicators | PA | NA |
| --- | --- | --- | --- |
| Noncardiovascular disease | 0.92 | 0.93 (0.97) | 0.90 (0.93) |
| Stable angina pectoris having revascularization | 0.99 | 0.89 (0.94) | 1.00 (1.00) |
| Unstable angina pectoris | 0.96 | 0.40 (0.57) | 0.98 (0.99) |
| AMI | 0.99 | 0.83 (0.93) | 0.99 (1.00) |
| Cerebrovascular disease | 0.99 | 0.85 (0.91) | 0.99 (1.00) |
| Peripheral vascular disease | 0.99 | 0.81 (0.93) | 1.00 (1.00) |
| Other specified cardiovascular disease | 0.89 | 0.79 (0.88) | 0.93 (0.97) |
| AMI, unstable angina pectoris, and stable angina pectoris having revascularization, when combined | 0.96 | 0.79 (0.86) | 0.98 (0.99) |

Abbreviations: OA, overall agreement; PA, positive agreement; NA, negative agreement; AMI, acute myocardial infarction.
Each of seven outcomes is compared with the other six outcomes combined. PA and NA refer to the chance that the coadjudicator concurs that the outcome is present or absent. In parentheses, the corresponding chance that the adjudication committee's final answer concurs on the presence or absence; see Methods section. Also shown is the effect of treating elective revascularization, unstable angina pectoris, and AMI as one category. For underlying counts, see Table 2. Standard errors are everywhere below 0.04.

In Table 2, each paired adjudication is marked in the appropriate (row and column) field. The column represents, in cases of disagreement, the diagnostic answer that the arbiter adopted and the row the outcome suggestion he thereby overruled. The disagreement pattern is without surprising features, and it appears fairly symmetric except that, when a disagreement involves unstable angina pectoris against a less specific condition, the arbiter appears to prefer the latter. Similarly, he who acts as an arbiter tends to prefer noncardiovascular to "other cardiovascular" conditions (compare entries 234 against 131).

4.2. Agreement in adjudication of fatal events

The fatal disease episodes (type 2 and 3 events) were distributed to the three pairs of adjudicators as in Fig. 2, which also gives the associated numbers of disagreements. The overall fraction overruled was 73/(2 × 362) = 10.1%. The overruled fatal diagnoses for adjudicators I, II, and III were 12%, 9%, and 10%, respectively.

Table 3 (analogous with Table 1) shows the agreement percentages and Table 4 the details. Both tables reflect the distinction between noncardiovascular and seven classes of cardiovascular causes of death, "inadequate information" being a separate alternative.

The three pairs of adjudicators showed kappa values of 0.90 (0.83–0.97), 0.68 (0.58–0.78), and 0.65 (0.56–0.74) when classifying into the nine outcomes.

5. Discussion

The main result of the present study is that adjudicators who are specialists in cardiology are able to obtain a reasonably high degree of agreement on nonfatal outcomes and, by inference, accuracy on cardiovascular outcomes among nonfatal hospital admissions, solely on the basis of the information recorded during daily clinical practice, that is, without seeing the patient. The agreement proportion can be increased further if no distinction needs to be made between different manifestations of ischemic cardiac disease, that is, AMI, unstable angina pectoris, or stable angina pectoris being treated with revascularization. For fatal outcomes, the agreement is high on the death being cardiovascular or not but somewhat lower with regard to the specific manifestation of cardiovascular mortality.

5.1. Strengths and weaknesses of our study

The evaluation of the adjudication process was carried out within a large multicenter trial that was carefully conducted and satisfies the standard requirements of a modern randomized clinical trial. All adjudicators were fully blinded to the intervention given in the trial. It is a further strength of the trial that all patients were followed until the end of the follow-up period, and all candidate events, that is, all hospital admissions and deaths, were independently and blindly evaluated by pairs of members of the adjudication committee, who worked under a formal protocol, including a randomized rotation of the tasks of its members.

Another strength of the present study is the focus on the key outcome diagnostic categories, which form the main building blocks of composite outcome measures usually used in cardiology.

This study has been facilitated by the fact that all Danish citizens are registered by a 10-digit number (the central person registration number), which is used at all contacts with the health care system. Thus, all candidate events were evaluated by adjudicators. The reader may see it as a strength or weakness that the adjudicators had no opportunity to examine the patients but saw only "paper patients." In our study, a copy of the full patient charts was sent to the adjudicators, that is, they had access to the same data as the doctor taking care of the patient. If the pair of randomly selected adjudicators disagreed, the consensus protocol required the third member of the adjudication committee to select the more likely of the two options. In this way, a high stability may be achieved, and we avoided


Table 2. Nonfatal (5,475) hospital admissions evaluated by two adjudicators, that is, 10,950 assessments

Initial outcome options

Final outcome

Noncardiovasulardisease

Stable anginapectoris havingrevascularization

Unstable anginapectoris AMI

Cerebrovasculardisease

Peripheralvasculardisease

Othercardiovascular

diseaseOverruled,total (%)

Confirmed,total

Overall use(overruled or

confirmed), total

Noncardiovasculardisease

*3,016 4 6 15 7 131 163 (3) 6,330 6,493

Stable angina pectorishavingrevascularization

1 *153 5 3 11 20 (6) 325 345

Unstable anginapectoris

18 3 *66 19 103 143 (43) 188 331

AMI 4 11 *176 15 30 (7) 394 424Cerebrovascular disease 32 *158 2 34 (9) 339 373Peripheral vascular

disease9 *85 6 15 (7) 195 210

Other cardiovasculardisease

234 16 36 14 8 16 *1,092 324 (12) 2450 2,774

Sums 3,314 172 122 218 181 110 1,358 729 (6.7) 10,221 2n5 10,950

Abbreviation: AMI, acute myocardial infarction.

On the left, each paired adjudication is marked in the appropriate (row and column) field, the column representing the final diagnosis. Cases of agreement between the initial adjudicators fall along the diagonal (as row and column match; entries marked with *). Cases of disagreement fall outside the diagonal; here, the column reflects the arbiter’s adopting one of the two diagnoses initially suggested, whereas the row represents the adjudicator answer thereby overruled by the arbiter. The sums on the right show how often the row label was overruled, confirmed, and used overall (overruled or confirmed).

Interpretation example: Consider “peripheral vascular disease.” There were 85 cases of initial agreement, involving 170 (= 2 × 85) claims, which were immediately confirmed by a fellow adjudicator without the need for an arbiter. Outside the diagonal, the row holds 15 (= 9 + 6) initial diagnoses overruled by the arbiter, and the column holds 25 (= 7 + 2 + 16) initial diagnoses confirmed by the arbiter (although one of the initial adjudicators had made another diagnosis). Thus, a diagnosis of peripheral vascular disease was overruled 15 and confirmed 195 times out of 210 (the three totals on the right).
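The arithmetic of the interpretation example can be retraced with a short Python sketch. The function name `tally` and its argument names are illustrative assumptions and not part of the trial's procedures; the counts are those quoted above for peripheral vascular disease.

```python
def tally(agreed, row_off_diagonal, column_off_diagonal):
    """Return (overruled, confirmed, overall use) for one diagnostic label.

    agreed: cases of initial agreement on the label (diagonal cell).
    row_off_diagonal: counts where the label was suggested but the
        arbiter chose the other adjudicator's diagnosis.
    column_off_diagonal: counts where the arbiter adopted the label
        although the other adjudicator had suggested something else.
    """
    overruled = sum(row_off_diagonal)
    # Each agreed case confirms two claims, one per adjudicator.
    confirmed = 2 * agreed + sum(column_off_diagonal)
    return overruled, confirmed, overruled + confirmed


overruled, confirmed, overall = tally(85, [9, 6], [7, 2, 16])
print(overruled, confirmed, overall)  # 15 195 210
print(round(100 * overruled / overall))  # 7 (percent overruled)
```

The same call with the diagonal, row, and column entries of any other label should reproduce that label's three totals on the right of the table.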


Table 3. Agreement on outcome for 362 fatal events

| Classification | OA between pairs of adjudicators | PA | NA |
|---|---|---|---|
| Stable angina pectoris having revascularization | 0.99 | 0.40 (0.40) | 1.00 (1.00) |
| Unstable angina pectoris | 1.00 | 0.89 (0.89) | 1.00 (1.00) |
| AMI | 0.92 | 0.82 (0.91) | 0.95 (0.98) |
| Heart failure | 0.94 | 0.69 (0.80) | 0.97 (0.99) |
| Cerebrovascular disease | 0.99 | 0.91 (0.96) | 0.99 (1.00) |
| Other specified cardiovascular disease | 0.96 | 0.65 (0.84) | 0.98 (0.99) |
| Cardiovascular disease, nonspecifiable | 0.91 | 0.32 (0.62) | 0.95 (0.98) |
| Noncardiovascular disease | 0.92 | 0.91 (0.97) | 0.93 (0.95) |
| Insufficient information | 0.97 | 0.69 (0.89) | 0.98 (0.99) |
| AMI, unstable angina pectoris, and elective revascularization, when combined | 0.92 | 0.82 (0.90) | 0.95 (0.98) |

Abbreviations: OA, overall agreement; PA, positive agreement; NA, negative agreement; AMI, acute myocardial infarction.

Each of nine classes is compared with the other classes combined. PA and NA refer to the chance that the coadjudicator concurs that the cause is present or absent. In parentheses, the corresponding chance that the adjudication committee’s final answer concurs on the presence or absence; see Methods section. Also shown is the effect of treating elective revascularization, unstable angina pectoris, and AMI as one category. Some frequencies are based on very small numbers, see Table 4; standard errors are everywhere below 0.09.


the need for an additional consensus procedure for cases of three-way disagreement.

As mentioned previously, a small number of adjudication tasks were nonevaluable because of incomplete data or no data. In retrospect, it can also be seen as a weakness that we did not gather the committee for any training sessions before the inception of their work. However, the adjudicators met twice during the assessment of the first 200 outcomes, which has likely contributed to the high agreement.

The statistics attained in an adjudicator comparison depend on the number and types of diagnostic categories and on the richness of the material available to the adjudicators. In the present study, the materials available for judging a cause of death certainly did differ widely (compare Materials section), but hardly more than this may be expected in similar investigations and in any case identical to that of daily clinical practice.

5.2. Other studies of the operation of adjudication committees

5.2.1. Guidelines

The importance of event adjudication, especially in cardiovascular contexts, has recently been discussed [13,14], but no general recommendation exists apart from a guideline proposal by Dechartres et al. [4].

5.2.2. Data available to adjudicators

According to the overview by Dechartres et al. [4], the method of selecting events for adjudication was described in 88% of randomized trials, and of these, a search for possible fatal events in national registers was only performed in 9%. Only half the randomized trial reports [4] inform their readers what documents were provided to the adjudication committee, and in only 25%, it included a copy of complete medical patient record notes, as in our present study.

5.2.3. Size of committee and ensuring coherence

Three adjudicators formed the adjudication committee in our study, as in most clinical trials [4]. A larger committee may be more representative of the specialist population and the workload for each member lower, but it also implies a greater risk of loss of uniformity and adherence to protocol. No data are available on the optimal number of adjudicators, but having three adjudicators has been suggested as a good compromise [15]. With regard to the training sessions, training or education was only described in 6% of trial reports [4]. We used some training, but it is open to discussion whether more training would have been cost-effective and would have increased the agreement significantly.

5.2.4. Adjudication protocol

Of the 34 of 118 randomized trials in which information on the process of adjudication was given, only 21 (62%) described how consensus was obtained [4]. In 14 of 34 trials (41%), the whole adjudication committee reviewed each case, and in 12 of 34 trials (35%), all cases of disagreement were to be resolved by the entire adjudication committee. When the enrolling centers participated in the adjudication process, the event papers were sent to one adjudicator and reviewed by a second adjudicator only if the first disagreed with the enrolling center [14]. To our knowledge, no previous study has distributed the adjudication tasks by randomization. When combined with the purely mechanical consensus protocol, randomization ensures a uniformly representative process with a minimum of group pressure or other kinds of psychological distortion.
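The combination of randomized task distribution and a mechanical consensus rule can be illustrated with a minimal Python sketch. It assumes a committee of three; the adjudicator labels and the function names `assign` and `final_outcome` are hypothetical and do not describe the software actually used in the trial.

```python
import random

ADJUDICATORS = ("A", "B", "C")  # hypothetical labels for the three members


def assign(rng):
    """Randomly pick two initial adjudicators; the third acts as arbiter."""
    first, second = rng.sample(ADJUDICATORS, 2)
    arbiter = next(x for x in ADJUDICATORS if x not in (first, second))
    return first, second, arbiter


def final_outcome(answer_1, answer_2, arbiter_choice=None):
    """Mechanical consensus rule.

    Agreement between the two initial adjudicators stands. On
    disagreement, the arbiter must pick one of the two answers offered
    and cannot introduce a third category.
    """
    if answer_1 == answer_2:
        return answer_1
    if arbiter_choice not in (answer_1, answer_2):
        raise ValueError("arbiter must choose one of the two initial answers")
    return arbiter_choice


rng = random.Random(2012)  # fixed seed only to make the sketch repeatable
print(assign(rng))
print(final_outcome("AMI", "heart failure", arbiter_choice="AMI"))  # AMI
```

Because the arbiter is restricted to the two alternatives offered, a three-way disagreement cannot arise under this rule.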

5.2.5. Role of participating centers in outcome assessment

In most randomized clinical trials, information on possible outcomes was obtained at the participating centers, where selection forces may be operating [16,17]. Overall, the participating centers might record too many [18–20] or too few outcomes [16,17,21,22] (and results may


Table 4. Fatal events (362) evaluated by two adjudicators, that is, 724 assessments

Rows: initial outcome options. Final outcome columns, in order: stable angina pectoris having revascularization; unstable angina pectoris; AMI; heart failure; cerebrovascular disease; other specified cardiovascular disease; cardiovascular disease, not specifiable; noncardiovascular disease; insufficient information. Each row lists its nonempty final-outcome cells in column order.

| Initial outcome option | Final-outcome cells (* = diagonal) | Overruled, total (%) | Confirmed, total | Overall use (overruled or confirmed), total |
|---|---|---|---|---|
| Stable angina pectoris having revascularization | *1 3 | 3 (60) | 2 | 5 |
| Unstable angina pectoris | *4 1 | 1 (11) | 8 | 9 |
| AMI | *63 3 1 1 6 3 | 14 (9) | 140 | 154 |
| Heart failure | 3 *24 1 1 5 4 | 14 (20) | 56 | 70 |
| Cerebrovascular disease | *24 2 | 2 (4) | 51 | 53 |
| Other specified cardiovascular disease | 3 1 *12 1 1 | 6 (16) | 31 | 37 |
| Cardiovascular disease, not specifiable | 3 2 3 *8 7 4 | 19 (38) | 31 | 50 |
| Noncardiovascular disease | 3 2 1 2 *141 2 | 10 (3) | 301 | 311 |
| Insufficient information | 1 1 2 *12 | 4 (11) | 31 | 35 |
| Sums (all nine final-outcome columns) | 1 4 77 32 27 19 23 160 19 | 73 (10) | 651 | 2n = 724 |

Abbreviation: AMI, acute myocardial infarction.

On the left, each paired adjudication is marked in the appropriate (row and column) field, the column representing the final diagnosis. Cases of agreement between the initial adjudicators fall along the diagonal (as row and column match; entries marked with *). Cases of disagreement fall outside the diagonal; here, the column reflects the arbiter’s adopting one of the two diagnoses initially suggested, whereas the row represents the adjudicator answer thereby overruled by the arbiter. The sums on the right show how often the row label was overruled, confirmed, and used overall (overruled or confirmed).

Interpretation example: Consider “cerebrovascular cause of death.” There were 24 instances of initial agreement, involving 48 (= 2 × 24) cause-of-death claims that were immediately confirmed by a fellow adjudicator without the need for an arbiter. Outside the diagonal, the row holds two initial diagnoses overruled by the arbiter, and the column holds 3 (= 1 + 1 + 1) initial diagnoses confirmed by the arbiter (although one of the initial adjudicators had made another diagnosis). Thus, a diagnosis of cerebrovascular cause of death was overruled 2 and confirmed 51 times out of 53 (the three totals on the right).


become significant for the investigator-reported outcome but not for the adjudicated [23] or vice versa [13,19,24]). As with our data, the discrepancies mainly occurred on “softer” outcomes, such as angina pectoris and congestive heart failure [25].

5.2.6. Quality control

Some degree of assessment of the reliability of adjudication was only found in 10% of randomized trials [4]. Evaluation of the agreement between eight adjudication committee members on the cause of death has been performed [26]. The total disagreement was 33%, and half of the disagreement was related to the major classification, 20% to minor classification, and the remainder only concerned the level of certainty. In the present study, the levels of agreement on fatal and nonfatal event evaluations are all higher than that. To our knowledge, only two other studies, one of which concerned a noncardiologic outcome, performed a reinvestigation of the outcomes evaluated by the clinical event committee. Recently, Parmar et al. [5], when investigating the possible effect of partial unblinding as to race and geography in a subset of 116 medical records with 175 cardiovascular outcomes, produced agreement data that are quite comparable to our results. McGarvey et al. [27] found an agreement frequency with regard to chronic obstructive pulmonary disease relatedness of 84% (kappa = 0.73).

6. Ramifications

An unanswered question is this: To what extent does uncertainty in the classification of outcomes affect the conclusions of trials? A related question, which we plan to examine, is “What difference would it make to rely exclusively on already recorded medical data, that is, on discharge diagnoses and death certificates in public registers compared with using an adjudication committee?”

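[Figure 3: 2-by-2 table of the 1st adjudicator’s answer (AMI, Not AMI) against the 2nd adjudicator’s answer (AMI, Not AMI), with cells a, b, c, and d. Each disagreement cell is split in two: b into b* and b − b*, and c into c* and c − c*. Legend: arbiter chooses AMI (#, cells b* and c*); arbiter chooses Not AMI (##, cells b − b* and c − c*); arbiter is not consulted (?, cells a and d).]

Fig. 3. A modified 2-by-2 table displays the counts that enter into the calculation of the fraction confirmed AMI diagnoses and related percentages. AMI, acute myocardial infarction.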

7. Conclusions

This study indicates that an adjudication committee consisting of three fully trained cardiologists can obtain a reasonably high degree of agreement, as reflected by the high NA and a somewhat lower PA. Similar evaluations need to be performed in other cardiovascular randomized trials to ensure satisfactory levels of agreement and to see if differences are rooted in subclassifications of cardiovascular outcomes or in uncertainty associated with noncardiovascular events. The effect of different protocols for resolving disagreement, numbers of adjudicators, and effect of training the adjudication committee members need to be investigated. It is also desirable to investigate the reliance on diagnoses retrieved from national registries.

Appendix

Calculation of fraction confirmed after arbitration

The percentages in parentheses in the middle column of Table 1 or Table 3 show how often an initial diagnostic suggestion ended up being confirmed (as opposed to being overruled). Recall that the arbiter, when consulted because of disagreement, had to choose between the two alternatives offered and was not allowed to bring a third outcome category into play. Take AMI as an example, and consider the b* field in the accompanying, slightly extended, 2-by-2 table: b* is the number of occasions on which the first appointed adjudicator suggests AMI, the second suggests something else, and the third, called in because of disagreement, votes for AMI, thereby confirming the former and overruling the latter diagnosis. The meaning of the other fields in the figure should be obvious (Fig. 3).

AMI is suggested altogether 2a + b + c times in the initial round, out of which 2a instances are immediately confirmed, whereas 2a + (b* + c*) are confirmed either immediately or, after disagreement, by an arbiter. Hence, the estimated probability that a coadjudicator will also choose AMI becomes 2a/[2a + b + c] (= PA), whereas the fraction ultimately confirmed, that is, the estimated chance that the committee’s final answer will concur on the presence of AMI, is [2a + (b* + c*)]/[2a + b + c]. This is the figure given in the parenthesis. Thanks to the arbiter, it is larger than the PA. The fraction that ends up overruled, it may be added, is [(b − b*) + (c − c*)]/[2a + b + c].

The numbers in the final columns of Tables 1 and 3 are calculated in the same manner, except that the predicate “Not AMI” is concerned: 2d instances out of [2d + b + c] are immediately confirmed, so NA = 2d/[2d + b + c]; and the parenthesis gives the estimated chance that the committee’s final answer will not be AMI, which is [2d + (b − b*) + (c − c*)]/[2d + b + c].
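As a check on these formulas, the following Python sketch recomputes the AMI row of Table 3 from the AMI totals of Table 4. The function name `agreement` and its parameter names are illustrative assumptions. Because the formulas involve b and c only through b + c, and b* and c* only through b* + c*, the sketch takes these two sums as inputs; the expression used for OA, (a + d)/n, is an assumed definition of overall agreement.

```python
def agreement(a, d, disagreements, arbiter_for):
    """Agreement statistics for one diagnosis versus all others combined.

    a: cases where both initial adjudicators chose the diagnosis.
    d: cases where neither chose it.
    disagreements: b + c, cases where exactly one chose it.
    arbiter_for: b* + c*, disagreements the arbiter settled in its favour.
    """
    n = a + d + disagreements
    positive_claims = 2 * a + disagreements
    negative_claims = 2 * d + disagreements
    return {
        "OA": (a + d) / n,  # assumed definition of overall agreement
        "PA": 2 * a / positive_claims,
        "PA_final": (2 * a + arbiter_for) / positive_claims,
        "NA": 2 * d / negative_claims,
        "NA_final": (2 * d + disagreements - arbiter_for) / negative_claims,
    }


# AMI among the 362 fatal events (Table 4): a = 63; overruled 14 and
# confirmed 140 give b* + c* = 140 - 2*63 = 14 and b + c = 14 + 14 = 28;
# d = 362 - 63 - 28 = 271.
stats = agreement(a=63, d=271, disagreements=28, arbiter_for=14)
print({name: f"{value:.2f}" for name, value in stats.items()})
# OA 0.92, PA 0.82 (0.91), NA 0.95 (0.98), matching the AMI row of Table 3
```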


References

[1] Lauer MS, Topol EJ. Clinical trials - multiple treatments, multiple end points, and multiple lessons. JAMA 2003;289:2575–7.

[2] U.S. Department of Health and Human Services - Food and Drug Administration. Guidance for Clinical Trial Sponsors: Establishment and Operation of Clinical Trial Data Monitoring Committees. 2006:1–38. Available at http://www.fda.gov/downloads/RegulatoryInformation/Guidances/ucm127073.pdf. Accessed November 13, 2011.

[3] Committee for Medicinal Products for Human Use - European Medicines Agency (EMA). Guideline on Data Monitoring Committees. 2005:1–8. Available at http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500003635.pdf. Accessed November 13, 2011.

[4] Dechartres A, Boutron I, Roy C, Ravaud P. Inadequate planning and reporting of adjudication committees in clinical trials: recommendation proposal. J Clin Epidemiol 2009;62:695–702.

[5] Parmar G, Ghuge P, Halanych JH, Funkhouser E, Safford MM. Cardiovascular outcome ascertainment was similar using blinded and unblinded adjudicators in a national prospective study. J Clin Epidemiol 2010;63:1159–63.

[6] Hansen S, Als-Nielsen B, Damgaard M, Helo OH, Petersen L, Jespersen CM. Intervention with clarithromycin in patients with stable coronary heart disease: the CLARICOR trial design. Heart Drug 2001;1:14–9.

[7] Jespersen CM, Als-Nielsen B, Damgaard M, Hansen JF, Hansen S, Helo OH, et al. Randomised placebo controlled multicentre trial to assess short term clarithromycin for patients with stable coronary heart disease: CLARICOR trial. BMJ 2006;332:22–7.

[8] Gluud C, Als-Nielsen B, Damgaard M, Fischer HJ, Hansen S, Helo OH, et al. Clarithromycin for 2 weeks for stable coronary heart disease: 6-year follow-up of the CLARICOR randomized trial and updated meta-analysis of antibiotics for coronary heart disease. Cardiology 2008;111:280–7.

[9] Myocardial infarction redefined - a consensus document of The Joint European Society of Cardiology/American College of Cardiology Committee for the redefinition of myocardial infarction. Eur Heart J 2000;21:1502–13.

[10] Guidelines for the diagnosis of heart failure. The Task Force on Heart Failure of the European Society of Cardiology. Eur Heart J 1995;16:741–51.

[11] Christensen AH, Gjorup T, Hilden J, Fenger C, Henriksen B, Vyberg M, et al. Observer homogeneity in the histologic diagnosis of Helicobacter pylori. Latent class analysis, kappa coefficient, and repeat frequency. Scand J Gastroenterol 1992;27:933–9.

[12] Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–46.

[13] Granger CB, Vogel V, Cummings SR, Held P, Fiedorek F, Lawrence M, et al. Do we need to adjudicate major clinical events? Clin Trials 2008;5:56–60.

[14] Pogue J, Walter SD, Yusuf S. Evaluating the benefit of event adjudication of cardiovascular outcomes in large simple RCTs. Clin Trials 2009;6:239–51.

[15] Walter SD, Cook DJ, Guyatt GH, King D, Troyan S. Outcome assessment for clinical trials: how many adjudicators do we need? Canadian Lung Oncology Group. Control Clin Trials 1997;18:27–42.

[16] Mahaffey KW, Harrington RA, Akkerhuis M, Kleiman NS, Berdan LG, Crenshaw BS, et al. Systematic adjudication of myocardial infarction end-points in an international clinical trial. Curr Control Trials Cardiovasc Med 2001;2:180–6.

[17] Serebruany VL. The FDA prasugrel review: adjudication of myocardial infarction controversy. Cardiology 2009;114:126–9.

[18] Heagerty A, Deverly A, Palmer C, Kaplinsky E, Salvetti A, Wahlgren NG, et al. The role of the critical event committee in a major cardiovascular outcome study. Blood Press 2002;11:339–44.

[19] Naslund U, Grip L, Fischer-Hansen J, Gundersen T, Lehto S, Wallentin L. The impact of an end-point committee in a large multicentre, randomized, placebo-controlled clinical trial: results with and without the end-point committee’s final decision on end-points. Eur Heart J 1999;20:771–7.

[20] Vejlstrup N, Clemmensen P, Steinmetz E, Hansen KN, Christiansen I, Hansen PR, et al. Blinded end point adjudication in the Danish Multicenter Randomized Study on Fibrinolytic Therapy versus Acute Coronary Angioplasty in Acute Myocardial Infarction (DANAMI-2 trial). Heart 2003;3:127–33.

[21] Kirwan BA, Lubsen J, de Brouwer S, Danchin N, Battler A, Bayes de Luna A, et al. Diagnostic criteria and adjudication process both determine published event-rates: the ACTION trial experience. Contemp Clin Trials 2007;28:720–9.

[22] Mahaffey KW, Roe MT, Dyke CK, Newby LK, Kleiman NS, Connolly P, et al. Misreporting of myocardial infarction end points: results of adjudication by a central clinical events committee in the PARAGON-B trial. Second Platelet IIb/IIIa Antagonist for the Reduction of Acute Coronary Syndrome Events in a Global Organization Network Trial. Am Heart J 2002;143:242–8.

[23] Mahaffey KW, Harrington RA, Akkerhuis M, Kleiman NS, Berdan LG, Crenshaw BS, et al. Disagreements between central clinical events committee and site investigator assessments of myocardial infarction endpoints in an international clinical trial: review of the PURSUIT study. Curr Control Trials Cardiovasc Med 2001;2:187–94.

[24] Use of a monoclonal antibody directed against the platelet glycoprotein IIb/IIIa receptor in high-risk coronary angioplasty. The EPIC Investigation. N Engl J Med 1994;330:956–61.

[25] Curb JD, McTiernan A, Heckbert SR, Kooperberg C, Stanford J, Nevitt M, et al. Outcomes ascertainment and adjudication methods in the Women’s Health Initiative. Ann Epidemiol 2003;13(9 suppl):S122–8.

[26] Petersen JL, Haque G, Hellkamp AS, Flaker GC, Mark ENA 3rd, Marchlinski FE, et al. Comparing classifications of death in the Mode Selection Trial: agreement and disagreement among site investigators and a clinical events committee. Contemp Clin Trials 2006;27:260–8.

[27] McGarvey LP, John M, Anderson JA, Zvarich M, Wise RA. Ascertainment of cause-specific mortality in COPD: operations of the TORCH Clinical Endpoint Committee. Thorax 2007;62:411–5.