HST 190: Introduction to Biostatistics, Lecture 6: Methods for binary data


Oct 03, 2021


dariahiddleston
Transcript
Page 1: HST 190: Introduction to Biostatistics

HST 190: Introduction to Biostatistics

Lecture 6: Methods for binary data

Page 2: HST 190: Introduction to Biostatistics

Binary data

• So far, we have focused on settings where the outcome is continuous.

• Now we consider the setting where the outcome of interest is binary, meaning it takes the value 1 or 0.
§ In particular, we consider the 2x2 contingency table tabulating pairs of binary observations $(X_1, Y_1), \dots, (X_n, Y_n)$.

Page 3: HST 190: Introduction to Biostatistics

• Consider two populations:
§ IV drug users who report sharing needles
§ IV drug users who do not report sharing needles

• Is the rate of positive tuberculin skin tests equal in both populations?
§ To address this question, we sample 40 patients who report sharing and 60 patients who do not, and compare their rates of positive tuberculin tests.
§ The data are cross-classified according to these two binary variables.

2x2 table               Positive   Negative   Total
Report sharing                12         28      40
Don't report sharing          11         49      60
Total                         23         77     100

Page 4: HST 190: Introduction to Biostatistics

Chi-square test for contingency tables

• The chi-square test is a test of association between two categorical variables.

• In general, its null and alternative hypotheses are
§ $H_0$: the relative proportions of individuals in each category of variable #1 are the same across all categories of variable #2; that is, the variables are not associated (i.e., statistically independent).
§ $H_1$: the variables are associated.
o Notice that the alternative is always two-sided.

• In our example, this means
§ $H_0$: reported needle sharing is not associated with the PPD (tuberculin skin test) result.

Page 5: HST 190: Introduction to Biostatistics

• The chi-square test compares the observed counts in the table to the counts expected if there is no association (i.e., under $H_0$).
§ Expected counts are obtained using the marginal totals of the table.

• Recall the independence rule $P(A \cap B) = P(A)P(B)$. So, from 100 people, assuming independence, we expect

$P(\text{share} \cap \text{positive}) = P(\text{share})\,P(\text{positive}) = \frac{40}{100} \cdot \frac{23}{100} = 0.092$

§ Then we'd expect $0.092 \times 100 = 9.2$ positive sharers, instead of the 12 observed.


Page 6: HST 190: Introduction to Biostatistics

• Similarly, there will likely be some discrepancy between the observed and expected counts for the other three cells in the table.
§ The chi-square test assesses: are these differences too large to be the result of sampling variability?

• Steps of the chi-square test
1) Complete the observed-data table
2) Compute the table of expected counts
3) Calculate the $X^2$ statistic
4) Get the p-value from the chi-square table

• This method is valid only if all expected counts are ≥ 5.
§ The test relies on an approximation that does not hold in small samples.

Page 7: HST 190: Introduction to Biostatistics

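1) Complete the observed-data table

Observed counts:
O11   O12   O1.
O21   O22   O2.
O.1   O.2   n

2) Complete the table of expected counts

$E_{ij} = \frac{O_{i\cdot} \times O_{\cdot j}}{n} = \frac{(O_{i1} + O_{i2})(O_{1j} + O_{2j})}{n}$

Expected counts:
E11   E12   E1.
E21   E22   E2.
E.1   E.2   n

3) Calculate the chi-square test statistic

$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(O_{11} - E_{11})^2}{E_{11}} + \frac{(O_{12} - E_{12})^2}{E_{12}} + \frac{(O_{21} - E_{21})^2}{E_{21}} + \frac{(O_{22} - E_{22})^2}{E_{22}}$

§ For the Yates continuity correction, replace $(O_{ij} - E_{ij})$ with $(|O_{ij} - E_{ij}| - 0.5)$.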

Page 8: HST 190: Introduction to Biostatistics

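4) Get the p-value from the chi-square distribution
§ Under the null hypothesis $H_0$ of no association between the two factors, the $X^2$ statistic follows a chi-square distribution with 1 degree of freedom. This is often written as $X^2 \sim \chi^2_1$.
o The chi-square distribution is continuous and positive-valued, and is defined by one parameter, the degrees of freedom (df).
§ The p-value comes from the right tail, but is inherently 'two-sided'.
o matlab: 1-chi2cdf(x,1)

[Figure: chi-square density with 1 df; the area to the right of $\chi^2_{1,0.95} = 3.84$ is 0.05.]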

Page 9: HST 190: Introduction to Biostatistics

• Thus, at the $\alpha$ level, $H_0$ is rejected if $X^2 > \chi^2_{1,1-\alpha}$.

• For a 2x2 contingency table, an alternate formula for the Yates-corrected test statistic is

$X^2 = \frac{n\left(|ad - bc| - \frac{n}{2}\right)^2}{(a+b)(c+d)(a+c)(b+d)}$

In our example,

$X^2 = \frac{100\,(|12(49) - 28(11)| - 50)^2}{(40)(60)(23)(77)} = 1.24 < 3.84 = \chi^2_{1,0.95}$

• ⇒ Fail to reject $H_0$

2x2 table               Positive       Negative       Total
Report sharing          a = 12         b = 28         a + b = 40
Don't report sharing    c = 11         d = 49         c + d = 60
Total                   a + c = 23     b + d = 77     n = 100
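As a check on the arithmetic above, here is a minimal Python sketch using only the standard library. The function names are illustrative and not part of the lecture. For 1 degree of freedom the right-tail area has the closed form erfc(sqrt(x/2)), which plays the role of MATLAB's `1-chi2cdf(x,1)`.

```python
import math

# Helper names are illustrative assumptions, not taken from the lecture.

def expected_counts(a, b, c, d):
    """Expected cell counts under independence, from the marginal totals."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[r * col / n for col in cols] for r in rows]

def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square statistic for a 2x2 table."""
    n = a + b + c + d
    num = n * (abs(a * d - b * c) - n / 2) ** 2
    return num / ((a + b) * (c + d) * (a + c) * (b + d))

def chi2_sf_1df(x):
    """Right-tail area of the chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(x / 2))

expected = expected_counts(12, 28, 11, 49)   # top-left cell is 9.2, as on the slide
x2 = yates_chi_square(12, 28, 11, 49)        # about 1.24
p_value = chi2_sf_1df(x2)
print(expected, round(x2, 2), round(p_value, 3))
```

Since the p-value exceeds 0.05, the sketch agrees with the slide's conclusion of failing to reject $H_0$.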

Page 10: HST 190: Introduction to Biostatistics

Fisher's exact test

What happens if not all expected counts are ≥ 5? Instead of the chi-square test, use Fisher's exact test (see Rosner 10.3).

• Like the chi-square test, Fisher's exact test examines the significance of the association (contingency) between the two kinds of classification: rows and columns.

• Both the row and column totals (a+c, b+d, a+b, c+d) are assumed to be fixed, not random.

• We then consider all possible tables that could give the observed row and column totals, and the corresponding probability of each configuration (it helps to realize that the first count, a, has a hypergeometric distribution under the null).

• Finally, the p-value is computed by adding up the probabilities of the tables that are as extreme as or more extreme than the observed one.
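The enumeration described above can be sketched in a few lines of Python. This is an illustration under one common convention for "as extreme or more extreme" (tables whose probability is no larger than that of the observed table); the lecture does not say which convention it uses, and the function name is hypothetical. The tuberculin table is used only for illustration, since its expected counts are large enough for the chi-square test.

```python
from math import comb

# Function name and the "probability <= observed" convention are assumptions.

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for a 2x2 table with fixed margins."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Add up every table that is no more likely than the observed one
    # (the small tolerance guards against floating-point ties)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

p_value = fisher_exact_two_sided(12, 28, 11, 49)
print(round(p_value, 3))
```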

Page 11: HST 190: Introduction to Biostatistics

Chi-square test for contingency tables, RxC

What if we are interested in a variable that has more than two categories?

Example: Test for association between eye color and the presence or absence of a mutant allele at some genetic locus.

Eye color categories: blue, green, brown, hazel, gray

Genetic categories: 0 copies of the mutant allele, ≥ 1 copy of the mutant allele

Page 12: HST 190: Introduction to Biostatistics

The chi-square test can be used for variables with more than two categories. The data are presented in an RxC table, a generalization of the 2x2 table:

R = # rows, C = # columns (it doesn't matter which variable is which)

                         blue   green   brown   hazel   gray   Total
Mutant allele absent        3       7      21      15     15      61
Mutant allele present       6      10      18      14     17      65
Total                       9      17      39      29     32     126

Page 13: HST 190: Introduction to Biostatistics

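The chi-square test for an RxC table is the same as for a 2x2 table, except:

• This method can only be used if no more than 1/5 of the cells have expected count < 5 AND no cell has expected count < 1.

• Under $H_0$, the $X^2$ test statistic follows a chi-square distribution on (R-1)(C-1) degrees of freedom:

$X^2 = \frac{(O_{11} - E_{11})^2}{E_{11}} + \frac{(O_{12} - E_{12})^2}{E_{12}} + \dots + \frac{(O_{RC} - E_{RC})^2}{E_{RC}}$

$X^2 \sim \chi^2_{(R-1)(C-1)}$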

Page 14: HST 190: Introduction to Biostatistics

Again, we have to obtain the marginal totals to determine the expected count for each cell. For example, the expected counts would be calculated as follows:

$E_{11} = \frac{61 \times 9}{126} = 4.36, \dots, E_{RC} = \frac{65 \times 32}{126} = 16.51$

                         blue    green   brown   hazel   gray    Total
Mutant allele absent     4.36     8.23   18.88   14.04   15.49      61
Mutant allele present    4.64     8.77   20.12   14.96   16.51      65
Total                       9       17      39      29      32     126

Page 15: HST 190: Introduction to Biostatistics

• Under $H_0$, $X^2 \sim \chi^2_4$

$X^2 = \frac{(3 - 4.36)^2}{4.36} + \frac{(7 - 8.23)^2}{8.23} + \dots + \frac{(17 - 16.51)^2}{16.51} = 1.80$

MATLAB: 1-chi2cdf(1.8,4) gives p-value = 0.77

Conclusion: No evidence for an association between eye color and mutant alleles.
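The same RxC calculation can be sketched in standard-library Python. The variable and function names are illustrative. Because the degrees of freedom here are even (df = 4), the right-tail area has a simple closed form, so no statistics package is assumed.

```python
import math

# Names are illustrative assumptions; the counts are the eye-color table above.
observed = [
    [3, 7, 21, 15, 15],    # mutant allele absent
    [6, 10, 18, 14, 17],   # mutant allele present
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: (row total x column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

def chi2_sf_even_df(x, df):
    """Right-tail area of a chi-square distribution; valid for even df only."""
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

p_value = chi2_sf_even_df(x2, df)
print(round(x2, 2), df, round(p_value, 2))
```

Working from unrounded expected counts gives a statistic that differs from the slide's 1.80 only in the second decimal place; the p-value still rounds to 0.77.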

Page 16: HST 190: Introduction to Biostatistics

Two-sample comparison of proportions

What if we are interested in estimating and quantifying uncertainty about the difference in proportions between two groups?

• e.g., we want an estimate and CI of the difference in proportions of positive tuberculosis skin tests between needle sharers and non-sharers

The approach is similar to two-sample estimation for continuous data questions, with subtle differences!

Page 17: HST 190: Introduction to Biostatistics

Two-sample comparison of proportions

• Whereas we have previously considered the difference in means of continuous two-sample data, we now compare two populations' unknown proportions $p_1$ and $p_2$.

• Suppose we want to know whether two communities have the same obesity rate.
§ You draw random samples from both; in the first community, 20 out of 100 are obese, while in the second, 24 out of 150 are obese.

• Goals:
§ estimate the difference in proportions and compute its 95% C.I.
§ conduct a significance test at level $\alpha = 0.05$ for a difference

Page 18: HST 190: Introduction to Biostatistics

• Before, we saw that if a random experiment has two possible outcomes, "success" and "failure", and we do $n$ independent repetitions with identical success probability $p$, then $X \sim \text{Bin}(n, p)$ is the number of successes.
§ Now we observe $X_1 \sim \text{Bin}(n_1, p_1)$ and $X_2 \sim \text{Bin}(n_2, p_2)$ and then make inference about $p_1 - p_2$.

• Estimation is identical to the two-sample continuous case: use the difference of sample proportions, $\hat{p}_1 - \hat{p}_2$.

• If $n_1\hat{p}_1(1 - \hat{p}_1) \ge 5$ and $n_2\hat{p}_2(1 - \hat{p}_2) \ge 5$, the associated $100(1 - \alpha)\%$ CI is given by

$\hat{p}_1 - \hat{p}_2 \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$

Page 19: HST 190: Introduction to Biostatistics

• For example, consider two samples
§ $n_1 = 100$, $X_1 = 20$, $\hat{p}_1 = \frac{20}{100} = 0.20$, $n_1\hat{p}_1(1 - \hat{p}_1) = 16 \ge 5$
§ $n_2 = 150$, $X_2 = 24$, $\hat{p}_2 = \frac{24}{150} = 0.16$, $n_2\hat{p}_2(1 - \hat{p}_2) = 20.16 \ge 5$

• Then the 95% CI for the difference is

$(0.20 - 0.16) \pm 1.96\sqrt{\frac{0.2(0.8)}{100} + \frac{0.16(0.84)}{150}} = 0.04 \pm 1.96(0.050) = 0.04 \pm 0.10 = (-0.06, 0.14)$
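The interval above can be reproduced with a short Python sketch. The function name and the choice to raise an error when the sample-size condition fails are illustrative assumptions; the normal quantile comes from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist
import math

# Function name and error handling are illustrative assumptions.

def diff_proportion_ci(x1, n1, x2, n2, level=0.95):
    """Normal-approximation CI for p1 - p2 from two independent binomial samples."""
    p1, p2 = x1 / n1, x2 / n2
    # The slide's validity condition: n * p * (1 - p) >= 5 in both samples
    if n1 * p1 * (1 - p1) < 5 or n2 * p2 * (1 - p2) < 5:
        raise ValueError("normal approximation not recommended")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

lower, upper = diff_proportion_ci(20, 100, 24, 150)
print(round(lower, 2), round(upper, 2))
```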

Page 20: HST 190: Introduction to Biostatistics

Hypothesis testing for difference of proportions

• Now, consider $H_0: p_1 = p_2$ versus $H_1: p_1 \ne p_2$
§ Under $H_0$, we can pool the two samples to calculate the standard error, letting $\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$

• Then, if $n_1\hat{p}_1(1 - \hat{p}_1) \ge 5$ and $n_2\hat{p}_2(1 - \hat{p}_2) \ge 5$, under $H_0$ we form the Z-test statistic

$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

• It has an approximate N(0,1) distribution when the null is true.

Page 21: HST 190: Introduction to Biostatistics

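• Continuing the same example,
§ $n_1 = 100$, $X_1 = 20$, $\hat{p}_1 = \frac{20}{100} = 0.20$, $n_1\hat{p}_1(1 - \hat{p}_1) = 16 \ge 5$
§ $n_2 = 150$, $X_2 = 24$, $\hat{p}_2 = \frac{24}{150} = 0.16$, $n_2\hat{p}_2(1 - \hat{p}_2) = 20.16 \ge 5$
§ $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{20 + 24}{100 + 150} = 0.176$

• The test statistic is then

$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.20 - 0.16}{\sqrt{0.176(0.824)\left(\frac{1}{100} + \frac{1}{150}\right)}} = 0.81$

• From a table or MATLAB, $P(Z > 0.81) = 0.21$, so the p-value is $2(0.21) = 0.42 > 0.05$ ⇒ do not reject $H_0$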
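A minimal Python sketch of the pooled Z test, using the numbers from this example. The function name is hypothetical; `statistics.NormalDist` supplies the standard normal CDF.

```python
from statistics import NormalDist
import math

# Function name is an illustrative assumption.

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample Z test of H0: p1 = p2; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_proportion_z_test(20, 100, 24, 150)
print(round(z, 2), round(p_value, 2))
```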

Page 22: HST 190: Introduction to Biostatistics

Odds ratio and relative risk

Chi-square tests for contingency tables allow us to test for association between two categorical variables.

"Is there statistical evidence of an association between daily aspirin and peptic ulcer disease?"

How do we estimate the magnitude of the association between two categorical variables?

"How much higher is the rate of peptic ulcer disease among daily aspirin users?"

Page 23: HST 190: Introduction to Biostatistics

• Consider two categorical variables:
§ "disease" vs "no disease"
§ "exposure" vs "no exposure"

• "Exposure" could be a treatment, risk factor, or other factor
§ we make no assumption about whether it increases or decreases disease risk

• Prospective study: Suppose for now that we enroll patients based on exposure status (vs. based on disease status)
§ e.g., 100 smokers and 100 nonsmokers

Page 24: HST 190: Introduction to Biostatistics

Measures of Effect for Categorical Data

After we sample a specified number of exposed and unexposed individuals, we classify them by disease status as shown below.

              Disease
Exposure      +       -       Total
   +          a       b       a+b
   -          c       d       c+d
Total         a+c     b+d     n

Three ways to quantify the magnitude of association:

1. Risk difference (RD) = same as difference of proportions

2. Relative risk (RR) or 'risk ratio'

3. Odds ratio

Page 25: HST 190: Introduction to Biostatistics

Risk Difference

Risk Difference = $p_1 - p_2$, where

$p_1$ = P(disease | exposed)

$p_2$ = P(disease | unexposed)

estimated Risk Difference = $\frac{a}{a+b} - \frac{c}{c+d}$

Page 26: HST 190: Introduction to Biostatistics

Risk Ratio

Relative Risk (Risk Ratio) = $\frac{p_1}{p_2}$

estimated Relative Risk = $\frac{a/(a+b)}{c/(c+d)}$

Page 27: HST 190: Introduction to Biostatistics

Risk Difference vs. Ratio

Suppose that you enroll 100 smokers and 100 nonsmokers in your study:

            Disease
Smoke       +      -      Total
  +         30     70     100
  -         15     85     100
Total       45     155    200

Risk difference = $\frac{30}{100} - \frac{15}{100} = 0.15$

Relative risk = $\frac{30/100}{15/100} = 2$
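The two estimators can be written directly from the cell counts a, b, c, d. The sketch below uses hypothetical function names and the smoking example above.

```python
# Function names are illustrative assumptions.

def risk_difference(a, b, c, d):
    """Estimated P(disease | exposed) - P(disease | unexposed)."""
    return a / (a + b) - c / (c + d)

def relative_risk(a, b, c, d):
    """Estimated P(disease | exposed) / P(disease | unexposed)."""
    return (a / (a + b)) / (c / (c + d))

# Smoking example: rows are smokers / nonsmokers, columns are disease + / -
rd = risk_difference(30, 70, 15, 85)
rr = relative_risk(30, 70, 15, 85)
print(round(rd, 2), round(rr, 2))
```

Both functions assume the rows were sampled by exposure status, as in this prospective example.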

Page 28: HST 190: Introduction to Biostatistics

Complicating factors

Measuring "effect size": why does it get more complicated?

• Time
§ We often measure a rate ratio instead of a risk ratio
§ More on this aspect when we discuss survival analysis

• Effect Modification and Confounding
§ Our estimates typically need to be adjusted for other factors

• Sampling
§ Depending on how you enroll patients in your study, it may not be possible to estimate a risk difference or risk ratio, even in principle

Page 29: HST 190: Introduction to Biostatistics

Risk Difference vs. Ratio

Suppose you conduct a case-control study by enrolling 100 patients with disease and 100 without, and then determine which have smoked:

            Disease
Smoke       +      -      Total
  +         25     10     35
  -         75     90     165
Total       100    100    200

• Can't estimate $p_1$ and $p_2$ if you pre-specify the number of subjects with disease → can't estimate RD or RR.

• Need to know how the data in your table were sampled!

Page 30: HST 190: Introduction to Biostatistics

Retrospective sampling

• A case-control study (or retrospective study) samples patients based on disease status, then classifies them according to exposure.

• Case-control studies are often performed for cost and efficiency, particularly when the disease or outcome is rare: there is no need to follow subjects through their entire lifetime and collect huge samples.

• There is a measure of effect size that can be computed regardless of whether patients are enrolled based on exposure status or disease status…

Page 31: HST 190: Introduction to Biostatistics

Odds

• If $p = P(\text{event})$, then define the odds of the event as $\frac{p}{1-p}$
§ Probability = 0.2 ⇒ Odds = 0.25
§ Probability = 0.5 ⇒ Odds = 1
§ Probability = 0.75 ⇒ Odds = $\frac{0.75}{0.25}$ = 3
§ Probability = 0.99 ⇒ Odds = $\frac{0.99}{0.01}$ = 99

• Odds can range from 0 to infinity
§ When we randomly sample patients based on exposure status, we can estimate $P(\text{disease} \mid \text{exposed})$ and $P(\text{disease} \mid \text{unexposed})$
§ If we instead perform a case-control study, we can't. We can only estimate $P(\text{exposed} \mid \text{disease})$ and $P(\text{exposed} \mid \text{no disease})$

Page 32: HST 190: Introduction to Biostatistics

Odds ratio

Imagine a table showing all individuals in the population (the table you "wish" you could see).

              Disease
Exposure      +       -       Total
   +          a       b       a+b
   -          c       d       c+d
Total         a+c     b+d     n

Let $p_1 = P(\text{disease} \mid \text{exposed})$ and $p_2 = P(\text{disease} \mid \text{unexposed})$. Then the ratio of both exposure groups' odds of disease is:

$OR = \frac{\text{Odds of disease for exposed}}{\text{Odds of disease for unexposed}} = \frac{p_1/(1 - p_1)}{p_2/(1 - p_2)} = \frac{[a/(a+b)]\,/\,[b/(a+b)]}{[c/(c+d)]\,/\,[d/(c+d)]} = \frac{ad}{bc}$

Page 33: HST 190: Introduction to Biostatistics

Odds ratio

Imagine a table showing all individuals in the population (the table you "wish" you could see).

If we instead consider $P(\text{exposed} \mid \text{disease})$ and $P(\text{exposed} \mid \text{no disease})$, then the ratio of both disease groups' odds of exposure is:

$OR = \frac{\text{Odds of exposure for diseased}}{\text{Odds of exposure for nondiseased}} = \frac{[a/(a+c)]\,/\,[c/(a+c)]}{[b/(b+d)]\,/\,[d/(b+d)]} = \frac{ad}{bc}$

Therefore, the OR is a measure of association that is numerically identical in either study design.
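The equality of the two odds ratios is easy to confirm numerically. The sketch below reuses the smoking counts from the earlier prospective example (a = 30, b = 70, c = 15, d = 85) purely as an illustration; the variable names are ours.

```python
# Variable names are illustrative assumptions.

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)

a, b, c, d = 30, 70, 15, 85   # smoking example from the earlier slide

# Odds of disease, comparing exposed with unexposed (row-wise)
or_disease = odds(a / (a + b)) / odds(c / (c + d))
# Odds of exposure, comparing diseased with non-diseased (column-wise)
or_exposure = odds(a / (a + c)) / odds(b / (b + d))
# Cross-product form ad / bc
or_cross = (a * d) / (b * c)

print(round(or_disease, 3), round(or_exposure, 3), round(or_cross, 3))
```

All three expressions agree up to floating-point rounding, which is the algebraic identity shown on the two slides.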

Page 34: HST 190: Introduction to Biostatistics

[Figure: the odds $p/(1 - p)$ plotted against $p$.]

• Therefore, sampling by exposure, estimating $p_1$ and $p_2$, and computing the odds ratio estimates the same quantity as estimating the odds ratio (of "exposure probabilities") in a case-control study.

• So what if the RR is of interest?
§ If the disease is rare, $p_1$ and $p_2$ are small, so $\frac{p}{1 - p} \approx p$ for small $p$ and $\frac{1 - p_1}{1 - p_2} \approx 1$ ⇒

$OR = \frac{p_1/(1 - p_1)}{p_2/(1 - p_2)} \approx \frac{p_1}{p_2} = RR$

OR approximates RR for a rare outcome
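A quick numerical illustration of the rare-disease approximation. The risk pairs below are hypothetical values chosen so that RR = 2 in every case; only the baseline risk changes.

```python
# The risk pairs are hypothetical, chosen for illustration only.

def odds_ratio(p1, p2):
    """Odds ratio comparing a group with risk p1 to a group with risk p2."""
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

# All pairs have RR = 2, ordered from common to rare disease
for p1, p2 in [(0.30, 0.15), (0.02, 0.01), (0.002, 0.001)]:
    print(p1, p2, round(p1 / p2, 3), round(odds_ratio(p1, p2), 3))
```

As the risks shrink, the printed OR moves toward the RR of 2; with common outcomes the OR overstates the RR.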

Page 35: HST 190: Introduction to Biostatistics

Takeaways

• Cannot estimate RR and RD in a case-control study (unless you have additional data).

• Can estimate the odds ratio from either a "prospective" or a case-control study, and we estimate it the same way in either one.

• The odds ratio approximates the RR for a rare disease.

Page 36: HST 190: Introduction to Biostatistics

Interpreting odds ratio

• It is difficult to give an "everyday" interpretation of what the odds ratio's precise value means

• $OR > 1$ → exposure associated with higher disease risk

• $OR < 1$ → exposure associated with lower disease risk

• $OR = 1$ → no association of exposure and disease status

Page 37: HST 190: Introduction to Biostatistics

Inference on odds ratio

• To perform a hypothesis test or generate a CI for the OR, we
1) Compute the logarithm of the estimated OR [ln(OR)]
2) Make inference on ln(OR)
3) Translate conclusions into statements about the OR

• Why the log of the OR?
§ The sampling distribution of ln(OR) approximates a normal distribution more closely than that of the OR itself
o Hence, methods based on the normal approximation work better for ln(OR)
§ To see this, compare the sampling distributions of OR vs. ln(OR): on the next slide we simulate a population with fixed rates of exposure and disease. For three different sample sizes, we randomly draw 1,000 samples and compute OR and ln(OR) for each

Page 38: HST 190: Introduction to Biostatistics

[Figure: histograms of the simulated OR and ln(OR) for each of the three sample sizes.]

Page 39: HST 190: Introduction to Biostatistics

Code to recreate in Matlab

Sample_Size = [50, 200, 1000];   % Define the sample sizes
Prob1 = 0.75; Prob2 = 0.5;       % Set the binomial probabilities for X and U
figure;

for i = 1:length(Sample_Size)
    X = binornd(1, Prob1, Sample_Size(i), 10000);   % Generate 10,000 trials of X
    U = binornd(1, Prob2, Sample_Size(i), 10000);   % Generate 10,000 trials of U

    % Calculate the odds ratio for each trial (one trial per column)
    OR = (sum(X,1) .* sum(1-U,1)) ./ (sum(U,1) .* sum(1-X,1));
    LOR = log(OR);   % Calculate the log of the odds ratio

    % Plot the odds ratio
    subplot(length(Sample_Size), 2, 2*i-1);
    hist(OR, 20); xlim([min(OR) max(OR)]);
    xlabel('Odds Ratio'); ylabel(['Sample Size ' num2str(Sample_Size(i))]);

    % Plot the log odds ratio
    subplot(length(Sample_Size), 2, 2*i);
    hist(LOR, 20); xlim([min(LOR) max(LOR)]);
    xlabel('Log Odds Ratio');
end

% Set the title for the figure (suptitle is not in base MATLAB; sgtitle is the R2018b+ equivalent)
suptitle('Odds Ratio Demonstration');

Page 40: HST 190: Introduction to Biostatistics

Confidence interval for OR

• If the expected count in each cell of the 2x2 table is ≥ 5, then the sample estimate of the true population ln(OR) approximately follows the distribution

$\ln(\widehat{OR}) \sim N\left(\ln(OR),\ \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}\right)$

• Another way of writing this result is

$\text{Var}\left(\ln \widehat{OR}\right) \approx \frac{1}{n_1\hat{p}_1(1 - \hat{p}_1)} + \frac{1}{n_2\hat{p}_2(1 - \hat{p}_2)}$

Page 41: HST 190: Introduction to Biostatistics

• Therefore, to get a $100(1 - \alpha)\%$ CI for the population OR we use a two-step process:

1) CI for $\ln(OR)$: $\ln(\widehat{OR}) \pm z_{1-\alpha/2}\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} = (c_1, c_2)$

2) CI for OR: $(e^{c_1}, e^{c_2})$

• Importantly, the CI is not symmetric around the estimated OR

Page 42: HST 190: Introduction to Biostatistics

• Consider an outbreak of gastroenteritis in a school following lunch. 263 students ate lunch in the cafeteria that day. Sandwiches are suspected.
§ How strong is the association, if any, between consumption of the sandwich and illness? Provide a 95% CI for the odds ratio.

                 Ill?
Ate sandwich?    Yes    No     Total
Yes              109    116    225
No               4      34     38
Total            113    150    263

§ $\widehat{OR} = \frac{ad}{bc} = \frac{109(34)}{4(116)} = 7.99$ ⇒ $\ln(\widehat{OR}) = \ln(7.99) = 2.078$
§ Step 1: $2.078 \pm z_{1-\alpha/2}\sqrt{\frac{1}{109} + \frac{1}{116} + \frac{1}{4} + \frac{1}{34}} = (1.01, 3.146)$
§ Step 2: the 95% CI for the OR is $(e^{1.01}, e^{3.146}) = (2.75, 23.2)$
§ Because the CI does not contain 1, reject the null of no association at the 0.05 level
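The two-step interval for the sandwich example can be sketched as follows. The function name is hypothetical, and the interval is the log-scale (Wald-type) interval described on the preceding slides.

```python
from statistics import NormalDist
import math

# Function name is an illustrative assumption.

def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio with a CI built on the log scale and exponentiated back."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    log_or = math.log(or_hat)
    return or_hat, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)

or_hat, lower, upper = odds_ratio_ci(109, 116, 4, 34)
print(round(or_hat, 2), round(lower, 2), round(upper, 1))
```

Note how far the upper limit sits from the point estimate compared with the lower limit: the interval is symmetric only on the log scale.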

Page 43: HST 190: Introduction to Biostatistics

Multiple 2x2 tables

• What if we have a confounding variable associated with exposure and outcome, such that there are several 2x2 tables, each corresponding to one level of the confounding variable?

• Can we pool the counts in the tables into one table?
§ Not so fast. This can seriously bias our results…

Page 44: HST 190: Introduction to Biostatistics

• For example, Percutaneous Nephrolithotomy (PN) was compared with several other procedures, classified as "open" procedures (OP), for treatment of renal calculi

        Successful   Unsuccessful   Total
PN         289            61         350
OP         273            77         350
Total      562           138         700

289/350 = 0.826 chance of success for PN
273/350 = 0.780 chance of success for OP

• Percutaneous treatment clearly looks superior; the estimated odds ratio for success based on having (vs. not having) percutaneous treatment is

$\widehat{OR} = \frac{289(77)}{61(273)} = 1.34 > 1$

Page 45: HST 190: Introduction to Biostatistics

• However, if results are stratified based on stone size, percutaneous treatment looks worse!
§ Large stones: $\widehat{OR} = \frac{55(71)}{25(192)} = 0.81 < 1$
§ Small stones: $\widehat{OR} = \frac{234(6)}{36(81)} = 0.48 < 1$

Large stones   Suc.   Unsuc.   Total
PN               55       25      80
OP              192       71     263
Total           247       96     343

Small stones   Suc.   Unsuc.   Total
PN              234       36     270
OP               81        6      87
Total           315       42     357
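The reversal can be verified directly from the counts. In this sketch (names are ours), the pooled table is obtained by adding the two strata cell by cell.

```python
# Names are illustrative assumptions; counts are from the renal calculi tables.

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio ad / bc."""
    return (a * d) / (b * c)

# (PN successes, PN failures, OP successes, OP failures)
large = (55, 25, 192, 71)
small = (234, 36, 81, 6)
pooled = tuple(x + y for x, y in zip(large, small))   # (289, 61, 273, 77)

for label, table in [("pooled", pooled), ("large", large), ("small", small)]:
    print(label, round(odds_ratio(*table), 2))
```

The pooled odds ratio is above 1 while both stratum-specific odds ratios are below 1.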

Page 46: HST 190: Introduction to Biostatistics

• Percutaneous treatment is associated with a higher success rate (OR > 1) overall, yet with a lower success rate (OR < 1) for each type of stone separately
§ How is that possible?

• This is the result of confounding by a factor associated with both the treatment and the outcome (what is it?)
§ PN was used mostly for small stones, which had a higher success rate in general (88%). OPs were used mostly for large stones, which had a lower success rate (72%)
§ Pooling the data allowed the stone-size effect to mask the difference in treatment effectiveness

• Confounding may occur whenever there is a factor that is associated with both treatment assignment and outcome
§ Confounding that leads to the opposite conclusion in aggregated data is called Simpson's Paradox (closely related to the Ecological Fallacy).

Page 47: HST 190: Introduction to Biostatistics

• No statistical procedure "automatically" protects you from confounding. Adjustment for confounding requires understanding of the science

• After a study is conducted, certain statistical techniques can be used to adjust for it (discussed over the next two lectures)
§ Stratification
§ Matching
§ (Logistic) Regression adjustment

Page 48: HST 190: Introduction to Biostatistics

Stratification

• If you stratify data into multiple 2x2 tables (strata) based on a confounder, and believe they share a common OR, you can estimate this OR using the Mantel-Haenszel Method (MH)

• This method is valid if the relationship between exposure and disease is the same in each stratum (even though baseline risk may differ)
§ If the relationship is not the same in each stratum, then it does not make sense to combine the data for doing inference

• Follow two steps:
1) Test whether the ORs are the same in each stratum
2) If so, proceed with inference for the common OR, using all the tables

Page 49: HST 190: Introduction to Biostatistics

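Chi-square test for homogeneity

• To see if the ORs are the same in each stratum, we use the chi-square test for homogeneity

• Given $k$ strata (tables), we test the hypotheses
§ $H_0: OR_1 = OR_2 = \dots = OR_k$ (homogeneity)
§ $H_1$: at least one of the ORs is different

• The test statistic is $X^2_{HOM} = \sum_{i=1}^{k} w_i \left(\ln \widehat{OR}_i - \overline{\ln OR}\right)^2$
§ $w_i = \left(\frac{1}{a_i} + \frac{1}{b_i} + \frac{1}{c_i} + \frac{1}{d_i}\right)^{-1}$, $\overline{\ln OR} = \frac{\sum_{i=1}^{k} w_i \ln \widehat{OR}_i}{\sum_{i=1}^{k} w_i}$
§ Under the null, $X^2_{HOM} \sim \chi^2_{k-1}$

• If we reject $H_0$, stop here. Otherwise, estimate the common OR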

Page 50: HST 190: Introduction to Biostatistics

HST190:IntrotoBiostatistics50

• InRenalcalculiexample,testofhomogeneitybystonesize

§ Largestones:ln OR  # = ln TT �#AT #SA

= −0.206

o 𝑤# =#TT+ #

AT+ #

#SA+ #

�#

Y#= 12.91

§ Smallstones:ln OR  A = ln Ast qsq ª#

= −0.731

o 𝑤A =#Ast

+ #sq+ #

ª#+ #

q

Y#= 4.74

§ ln(OR) = #A.S# Y*.A*q at.�t(Y*.�s#)#A.S#at.�t

= −0.347

𝑋¯°±A = 12.91 −0.206 + 0.347 A + 4.74 −0.731 + 0.347 A

= 0.956 < 3.84 = 𝜒#,*.STA

§ Wefailtorejectthenullthattheoddsratiosdiffer,andcontinue
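The homogeneity calculation for the two stone-size strata can be sketched as below, with illustrative variable names. With k = 2 strata the reference distribution has 1 degree of freedom, so the right-tail area can again be written with `math.erfc`.

```python
import math

# Variable names are illustrative assumptions.
strata = [(55, 25, 192, 71), (234, 36, 81, 6)]   # large stones, small stones

log_ors = [math.log(a * d / (b * c)) for a, b, c, d in strata]
weights = [1 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in strata]

# Weighted average of the stratum-specific log odds ratios
mean_log_or = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
x2_hom = sum(w * (l - mean_log_or) ** 2 for w, l in zip(weights, log_ors))

# k = 2 strata, so compare with chi-square on k - 1 = 1 degree of freedom
p_value = math.erfc(math.sqrt(x2_hom / 2))
print(round(x2_hom, 2), round(p_value, 2))
```

Carrying full precision gives a statistic that differs from the slide's 0.956 only in the third decimal place, because the slide rounds the intermediate values.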

Page 51: HST 190: Introduction to Biostatistics

Mantel-Haenszel odds ratio estimator

• If we conclude homogeneity across strata, then the Mantel-Haenszel Estimator of the common Odds Ratio is

$\widehat{OR}_{MH} = \frac{\sum_{i=1}^{k} a_i d_i / n_i}{\sum_{i=1}^{k} b_i c_i / n_i}$

• We can now use hypothesis tests and confidence intervals for the common OR (via the ln(OR)). First, check that
§ $\sum_{i=1}^{k} (a_i + c_i)(a_i + b_i)/n_i \ge 5$
§ $\sum_{i=1}^{k} (a_i + c_i)(c_i + d_i)/n_i \ge 5$
§ $\sum_{i=1}^{k} (b_i + d_i)(a_i + b_i)/n_i \ge 5$
§ $\sum_{i=1}^{k} (b_i + d_i)(c_i + d_i)/n_i \ge 5$

Page 52: HST 190: Introduction to Biostatistics

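• Under these conditions, the $100(1 - \alpha)\%$ CI for ln(OR) is

$\ln \widehat{OR}_{MH} \pm z_{1-\alpha/2}\left(\sum_{i=1}^{k} w_i\right)^{-1/2} = (L, U)$

§ where $w_i = \left(\frac{1}{a_i} + \frac{1}{b_i} + \frac{1}{c_i} + \frac{1}{d_i}\right)^{-1}$

• The CI for the OR is then $(e^{L}, e^{U})$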

Page 53: HST 190: Introduction to Biostatistics

Hypothesis testing for MH

• Finally, we may wish to test the null hypothesis of no association between two variables, controlling for a confounder: $H_0: OR = 1$ versus $H_1: OR \ne 1$

• To do the test, we need to calculate 3 quantities:
§ $O = \sum_{i=1}^{k} O_i = \sum_{i=1}^{k} a_i$
§ $E = \sum_{i=1}^{k} E_i = \sum_{i=1}^{k} \frac{(a_i + b_i)(a_i + c_i)}{n_i}$
§ $V = \sum_{i=1}^{k} V_i = \sum_{i=1}^{k} \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$ (must be ≥ 5)

• $X^2_{MH} = \frac{(|O - E| - 0.5)^2}{V}$, which follows a $\chi^2_1$ distribution if $H_0$ is true
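The three quantities O, E, and V translate directly into a short loop over strata. The sketch below (function name hypothetical) applies the test to the two renal calculi strata; the lecture does not report this statistic, so the printed values are an illustration rather than a check against the slides.

```python
import math

# Function name is an illustrative assumption.

def mantel_haenszel_test(strata):
    """Continuity-corrected MH chi-square test of H0: common OR = 1."""
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        observed += a
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    x2 = (abs(observed - expected) - 0.5) ** 2 / variance
    p_value = math.erfc(math.sqrt(x2 / 2))   # right tail of chi-square, 1 df
    return x2, p_value

x2_mh, p_value = mantel_haenszel_test([(55, 25, 192, 71), (234, 36, 81, 6)])
print(round(x2_mh, 2), round(p_value, 2))
```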

Page 54: HST 190: Introduction to Biostatistics

• Returning to the renal calculi example,

$\widehat{OR}_{MH} = \frac{\frac{55(71)}{343} + \frac{234(6)}{357}}{\frac{25(192)}{343} + \frac{36(81)}{357}} = 0.69$

§ a compromise between the two stratum-specific ORs (0.81 and 0.48)

• To compute the 95% CI, first verify the conditions given previously (they are messy to show, but in this case are met)

$\ln \widehat{OR}_{MH} \pm z_{1-\alpha/2} \cdot \frac{1}{\sqrt{12.91 + 4.74}} = (-0.84, 0.10)$

• Thus, the 95% CI for the OR is $(e^{-0.84}, e^{0.10}) = (0.43, 1.10)$
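The estimate and interval above can be reproduced with the following sketch. The function name is hypothetical, and the standard error is the weight-based one given on the earlier slide, $1/\sqrt{\sum w_i}$.

```python
from statistics import NormalDist
import math

# Function name is an illustrative assumption.

def mantel_haenszel_or(strata, level=0.95):
    """MH common odds ratio with the weight-based CI described above."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    or_mh = num / den
    total_weight = sum(1 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in strata)
    half_width = NormalDist().inv_cdf(1 - (1 - level) / 2) / math.sqrt(total_weight)
    log_or = math.log(or_mh)
    return or_mh, math.exp(log_or - half_width), math.exp(log_or + half_width)

or_mh, lower, upper = mantel_haenszel_or([(55, 25, 192, 71), (234, 36, 81, 6)])
print(round(or_mh, 2), round(lower, 2), round(upper, 2))
```

Because the interval contains 1, the data do not establish a difference between PN and OP once stone size is taken into account.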