HST 190: Introduction to Biostatistics - Harvard University · 2019-09-23

Jun 02, 2020


dariahiddleston

HST 190: Introduction to Biostatistics
Lecture 1: Basic principles of statistical data analysis

Welcome!
• Statistical reasoning is the process of drawing scientific conclusions from data in a rational, consistent way
• Goals for the course:
  § develop an intuition for the key concepts that underpin the statistical analysis of data
  § read the “Methods” section of an article, and understand/critique the approach taken
  § learn to analyze and draw scientific conclusions from your own data

Outline
Lecture  Topic(s)
1        Basic principles of statistical data analysis
2        Principles of probability & Estimation of parameters
3        Two-sample comparisons, hypothesis testing and power/sample size calculations
4        Clinical trials & Simple linear regression
5        Multiple linear regression
6        Methods for binary outcomes
7        Logistic regression
8        Analysis of time-to-event data
9        Project presentations
10       Review before the exam

Course Logistics
• Eight lectures
  § each 2-2.5 hours long
• Reading will be assigned prior to each lecture
  § given the pace of the course, this is strongly encouraged
• Problem sets following each lecture
  § include Matlab exercises
  § due at 9am on the day of the following lecture (unless specified otherwise)

• During breaks in the middle we will:
  § complete group exercises
  § learn Matlab
  § discuss course projects
• You will also work on a group project and present results during one of the class meetings
• In-class exam will take place during last meeting
  § 28th August
  § open-book

Suggestions
• Ask questions during the lecture as well as on Piazza
  § take notes!
• Material presented in different sequence from Rosner
  § consult Rosner for a different approach
• Lots of material in a short time
  § feel free to ask for help!
• There will be many formulae
  § goal is not to memorize them
  § even though we have access to software, hand calculations can help cultivate intuition

How to Prioritize
• The course is pass/fail.
• Exam is open-book, so don’t spend time memorizing formulas. Learn when and why to use each procedure; you can always refer to your notes to see how.
• To get the most out of this course, you should:
  § attend lectures
  § submit solutions to all the problem sets
  § participate in class discussions, group exercises, and Piazza
  § complete a project
  § take the final exam

Resources
• Lecture Notes (Canvas -> Files)
  § Get bonus points for finding typos!
• Introduction to Matlab (Canvas -> Files)
• Rosner textbook, 7th ed. (required; a lifelong reference)
• Piazza
• Pagano & Gauvreau textbook
• See Syllabus for additional references.

Basic steps of data analysis
• To set the stage, let’s consider two motivating questions:
  1) is there an association between time spent in the operating room and post-surgical outcomes for lung cancer resection?
  2) can we develop an enhanced breast cancer risk model?
• The questions have been left deliberately vague! It’s often the case that scientific questions are initially imprecisely posed
• Integral to the process of research is translating science into statistics, and back again
  § as you read papers, it is important to consider how the authors thought through this process

• There are many (possibly infinite!) ways in which one could characterize ‘basic steps’ but a reasonable outline might be:
  I. Understand the context of the analysis
  II. Establish the scientific goals
  III. Translate the scientific goals into statistical language
  IV. Choose statistical methods to employ
  V. Implementation and running the analysis
  VI. Interpretation

• Sometimes, the way forward is clear and, in that sense, the process is prescriptive
  § features/issues that are common to all analyses
• In many instances, however, the way forward isn’t clear
  § aspects of the analysis don’t fit in with what you currently know
  § these may relate to the science, data and/or statistics aspects
• Solutions include:
  § appealing to the published literature (scientific and statistical)
  § adopting or adapting existing methods
  § developing new methods
• Regardless, dealing with these issues will require some creativity, and there is seldom, if ever, one ‘correct’ data analysis
  § different data analyses correspond to different scientific questions
  § which scientific question is ‘right’?

I. Understanding the Context
• From the perspective of a biostatistician, the purpose of data analysis is to learn about some population using information in a sample
• Learn about covariates in terms of association with or prediction of an outcome
  § notationally we often think in terms of 𝑋 and 𝑌
  § possibly within or across certain sub-populations denoted, say, by 𝑍
• Context usually involves three things:
  1) the background science
  2) the nature of the available data
  3) the population of interest, often called the ‘target population’

Lung cancer surgery
Q: Is there an association between time spent in the operating room and post-surgical outcomes?
• Background science:
  § longer operating time -> greater exposure to anesthesia
  § shortening operating time might reduce adverse post-surgical outcomes
    o complications during the hospital stay
    o recurrence of lung cancer
    o mortality
  § may also lead to decreased costs/increased efficiency
    o increased capacity for the operating room
    o shorter post-surgical hospital stay

• Available data:
  § ≈400 surgeries at Brigham and Women’s Hospital
  § performed between 1997-2008
  § demographic, clinical, tumor and follow-up information
• Target population:
  § patients who undergo elective surgery for early stage non-small cell lung cancer
  § need to be aware of different surgery sub-types
    o lobectomy, segmentectomy, wedge resection
    o thoracotomy, video-assisted thoracic surgery
  § what do we think about the (relatively) long timeframe?
  § generalizability beyond BWH?

Breast cancer risk
Q: Can we develop an enhanced breast cancer risk model?
• Background science:
  § the ‘Gail model’ for breast cancer risk was developed in the late 1980s
    o age, race
    o age at menarche, age at birth of first child
    o family history, number of prior biopsy examinations and atypical hyperplasia
  § the model was validated in a number of subsequent studies
  § subsequent research identified a number of additional risk factors for breast cancer
    o breast density, use of hormone replacement therapy and body mass index

• Available data:
  § 2,392,998 screening mammograms from the Breast Cancer Surveillance Consortium
    o NCI-funded nationwide network of mammography registries
  § mammograms performed between 1996-2002
  § outcomes are ascertained via linkages with cancer registries
• Target population:
  § screening mammograms performed on women aged 35-84 years
    o unit of analysis is the mammogram, not the woman
  § who undergoes screening? who doesn’t?
    o how might this impact the interpretation of the study?

Nature of the available data
• What were the data collection procedures?
  § convenience sample or part of a designed study?
  § what was the setting/timeframe?
  § observational study or randomized design?
  § cross-sectional, prospective, or retrospective?
  § stratification and/or matching?
• How were the procedures followed?
  § any systematic deviations from the ‘ideal’ data collection process?
  § may be due to patients?
    o refusal to participate/respond
    o inaccurate responses

  § may be due to researchers?
    o were uniform procedures applied to all (potential) participants?
    o are we actually measuring what we think we are measuring?
• Have there been any interim data cleaning/manipulation efforts?
  § cleaning of ‘strange’ values
    o set to some threshold value or to missing
    o exclusion from the data set
  § construction of derived variables
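The cleaning options listed above lead to different data sets, which is why it matters to know which one was applied. The following is a minimal sketch, not a procedure taken from the course: the function names, the plausible range of 30 to 600 minutes, and the operating times are all made up for illustration.

```python
import math


def clean_value(value, lower, upper, strategy):
    """Handle one measurement that may fall outside the plausible range.

    strategy is one of:
      "threshold": pull the value back to the nearest plausible bound
      "missing":   keep the row but mark the value as missing (NaN)
      "exclude":   signal that the row should be dropped (None)
    """
    if lower <= value <= upper:
        return value
    if strategy == "threshold":
        return min(max(value, lower), upper)
    if strategy == "missing":
        return math.nan
    if strategy == "exclude":
        return None
    raise ValueError(f"unknown strategy: {strategy}")


def clean_column(values, lower, upper, strategy):
    """Apply clean_value to every entry, dropping entries marked for exclusion."""
    cleaned = (clean_value(v, lower, upper, strategy) for v in values)
    return [v for v in cleaned if v is not None]


# Made-up operating times in minutes; 2400 and -5 are the 'strange' values.
raw_minutes = [95, 130, 2400, 110, -5]

thresholded = clean_column(raw_minutes, 30, 600, "threshold")
with_missing = clean_column(raw_minutes, 30, 600, "missing")
excluded = clean_column(raw_minutes, 30, 600, "exclude")

# A derived variable: operating time in hours, built from the cleaned minutes.
hours = [m / 60 for m in excluded]
```

The three results differ in length and content, so any later summary (a mean, a regression slope) may depend on a choice that was made before the analyst ever saw the data.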

Populations
• In practice, the ‘population’ can be
  § an actual, potentially observable population
  § a hypothetical (sometimes infinite) population
• Might refer to the ‘target population’ to emphasize that there is a specific population in mind
• Defining the target population is crucial in that it provides the context for the scientific question of interest
  § who would we like our results to generalize to?
• Narrow vs. broad definitions of the target population
  § heterogeneity vs. homogeneity
  § what are the trade-offs?

• What comes first... the data or the population?
  § depends on when you get involved
• If the data has already been collected:
  § for which population could we consider the sample as being ‘representative’?
  § may need to focus the data set by excluding certain folks
    o implicitly changes the population to which one can generalize
    o sample size vs. mixing of effects
  § is there scope for additional data collection efforts?
• If the data has not been collected:
  § much greater flexibility for choosing/defining the population of interest

Learning from data
• Recall, the goal is to learn about the relationships between a subset of covariates
• Achieved by collecting and analyzing a sample from the population
  § an important aspect of ‘context’ is that this is indeed what we are doing
    o or, at least, hoping to do!
• Suppose we could enumerate the entire population
  § that is, the sample is the population
• In this case observed data characterizes relationships completely

• Note when we have a complete enumeration, there is no sampling variability
  § we don’t have to worry about making statements about the population on the basis of information in the sample
  § the sample is the population
• We don’t have to consider or quantify uncertainty associated with only observing a sub-sample
  § no need for standard errors, confidence intervals or p-values
  § maybe no need for statistical methods!
• Most of the time we can’t enumerate the entire population
  § typically, this isn’t logistically and/or financially feasible
• So…
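The contrast between a complete enumeration and a sample can be illustrated with a small simulation. This is a sketch under stated assumptions: the population of 10,000 operating times is simulated, and its mean (180 minutes), spread (40 minutes), and the sample size of 50 are arbitrary choices, not figures from the lecture.

```python
import random
import statistics

rng = random.Random(190)  # fixed seed so the sketch is reproducible

# Hypothetical finite population of operating times in minutes.
population = [rng.gauss(180, 40) for _ in range(10_000)]
population_mean = statistics.fmean(population)


def sample_mean(n):
    """Mean of a simple random sample of size n drawn without replacement."""
    return statistics.fmean(rng.sample(population, n))


# Repeated samples of 50 give a different estimate each time.
small_sample_means = [sample_mean(50) for _ in range(200)]

# "Sampling" the whole population always returns the population mean.
census_means = [sample_mean(len(population)) for _ in range(5)]

spread_small = statistics.stdev(small_sample_means)
spread_census = statistics.stdev(census_means)
```

Here `spread_small` is an empirical stand-in for the standard error of the mean at n = 50, while `spread_census` is zero because every "sample" is the population. The first number is the uncertainty that standard errors, confidence intervals, and p-values are meant to quantify.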

II. Establish the scientific goals
• Broadly speaking one can classify scientific goals as:
  § description or exploration of a population
  § evaluation of some hypothesis
  § prediction of future outcomes
• A single analysis may have several goals
  § depends on scientific setting and background

Lung cancer surgery
Q: Is there an association between time spent in the operating room and post-surgical outcomes?
• Description/exploration:
  § what is the nature of the association?
  § does the association differ across surgery types?
• Hypothesis testing:
  § a priori hypothesis among the collaborators that shorter times are associated with better post-surgical outcomes

Breast cancer risk
Q: Can we develop an enhanced breast cancer risk model?
• Prediction:
  § use all the available information in the best possible way to predict the risk of breast cancer
  § build prediction models that cater to specific settings with varying amounts/type of information?
    o at home/online
    o in the physician’s office
• Why might description/exploration and hypothesis testing be of less interest?

Description/exploration
• Goal is to characterize the relationships among a set of covariates in the population of interest
• An important issue is whether or not the goal is to establish causation
  § typically requires a greater understanding of the science
• Typically, although not always, viewed as hypothesis generating
  § we have a cool data set, let’s see what we can find...
  § there is a fine, often blurry line between exploration and hypothesis testing
    o what came first... the data or the question?

Hypothesis testing
• Goal is to make some confirmatory statement
• Typically framed in the context of making a ‘decision’ between two competing hypotheses
  $H_0$: null hypothesis
  $H_1$: alternative hypothesis
• Assume the null hypothesis holds and look for evidence to the contrary
• Standard hypothesis testing reduces the potential decisions to:
  1. fail to reject $H_0$
  2. reject $H_0$ (implicitly in favor of $H_1$)
  § decision should be accompanied by some measure of uncertainty
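The two-decision framing above can be sketched with a permutation test, chosen here only because it needs no distributional formulas; the slide does not name a specific test. The length-of-stay values, the 5% significance level, and the function names are assumptions made for illustration.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Under H0 (no difference between groups) the group labels are
    exchangeable, so we reshuffle them and count how often the shuffled
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


def decide(p_value, alpha=0.05):
    """Reduce the result to the two standard decisions."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Made-up post-surgical lengths of stay (days) for shorter and longer operations.
short_ops = [4, 5, 3, 6, 4, 5, 4, 3, 5, 4]
long_ops = [7, 6, 8, 5, 9, 7, 6, 8, 7, 6]

p = permutation_test(short_ops, long_ops)
decision = decide(p)
```

Reporting `p` alongside `decision` is one way of attaching a measure of uncertainty to the decision, as the last bullet asks.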

Prediction
• Goal is to estimate future outcomes or risk
  § typically framed in terms of building the best possible model
• What do we mean by ‘best’?
  § need some means of judging accuracy and penalizing poor predictions
  § ideally based on real world consequences
    o e.g. false-positive vs. false-negative for breast cancer
• Sometimes a single best model is inappropriate
  § a model may work well in one population and not others
  § inputs may not always be available (e.g. genetic information)
• To what extent do we need to care about causation?
  § do we need to understand the ‘true’ underlying mechanisms?
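One way to make ‘best’ concrete is to attach a cost to each kind of error and compare decision rules by their average cost. The sketch below is illustrative only: the predicted risks, outcomes, candidate thresholds, and cost values are invented, and a cost-weighted error count is just one of many possible criteria.

```python
def expected_cost(risks, outcomes, threshold, cost_fp, cost_fn):
    """Average cost of flagging everyone whose predicted risk is >= threshold.

    outcomes uses 1 for cancer and 0 for no cancer.
    """
    total = 0.0
    for risk, outcome in zip(risks, outcomes):
        flagged = risk >= threshold
        if flagged and outcome == 0:
            total += cost_fp  # false positive
        elif not flagged and outcome == 1:
            total += cost_fn  # false negative
    return total / len(risks)


def best_threshold(risks, outcomes, candidates, cost_fp, cost_fn):
    """Candidate threshold with the lowest average cost."""
    return min(
        candidates,
        key=lambda t: expected_cost(risks, outcomes, t, cost_fp, cost_fn),
    )


# Made-up predicted risks and observed outcomes for six women.
risks = [0.02, 0.05, 0.10, 0.20, 0.40, 0.70]
outcomes = [0, 0, 1, 0, 1, 1]
candidates = [0.05, 0.15, 0.30, 0.50]

# Equal costs versus a false negative judged ten times as costly.
equal_cost_choice = best_threshold(risks, outcomes, candidates, cost_fp=1, cost_fn=1)
fn_heavy_choice = best_threshold(risks, outcomes, candidates, cost_fp=1, cost_fn=10)
```

With these numbers the preferred threshold moves lower once missed cancers are weighted more heavily, which shows that ‘best’ depends on the consequences assigned to each error, not on the model alone.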

The real world
• Unfortunately, the scientific goals are not always clear at the outset
• Typically, it is the case that:
  § there are many scientific goals that are of interest, and/or
  § the goal can be interpreted in a number of ways
• Primarily a problem because investigators need precise statements to be able to proceed
  § to translate the scientific goals into statistical ones
• Towards refining study goals, a couple of useful questions are:
  1) who is the intended (primary) audience?
  2) what will be actionable from the results?

• Consider the question: What is Mrs. Jones’ risk of breast cancer?
• How one proceeds depends, at least in part, on how this information will be used:
  Researchers
    o determine eligibility for a randomized study of some novel preventative agent
  Patients
    o decision as to whether or not she should get in touch with her physician
  Physicians
    o planning for future screening schedule
  Policy-makers
    o monitor the public health burden of breast cancer

• Related questions include:
  § is interest in all breast cancers or some specific tumor type?
  § risk over which time frame?
    o 1 year?
    o 5 years?
    o lifetime?
  § how much information will the interested ‘user’ have access to?
    o will detailed family history information be available?
    o will genetic information be available?
• Different answers to all these questions define different scientific goals

III. Translating scientific goals into statistical terms/tasks
• Once the scientific goals are ‘established’ we need to translate them into the language of statistics
• Moving forward requires:
  § precise and clear definitions of all relevant covariates
  § specification of key relationships of interest

Scientific goal            Statistical task
Description/exploration    Estimation
Hypothesis testing         Inference
Prediction                 Estimation

Precisely defining covariates
• Each of the potential goals is trying to say something about the relationships among a set of covariates
• Prior to any analysis we need clear definitions for all relevant covariates:
  § response variables
  § exposure(s) of interest
  § interaction terms and/or effect modifiers
  § predictors of the response
  § predictors of the exposure(s) of interest
• There will be overlap across these various types of variables
  § e.g., a covariate may be a predictor of both the response and of the exposure of interest

• Often not as straightforward as one might think, mainly because there is often choice involved
• Suppose the response of interest is ‘diagnosis of breast cancer’
  § over which time frame?
  § for which sub-types?
• Suppose the exposure of interest is ‘operating time’
  § when does time start?
  § when does time stop?
• Define (and perhaps re-define) until everything is clear!

Lung cancer surgery
Q: Is there an association between time spent in the operating room and post-surgical outcomes?
• Responses:
  § hospital stay of >7 days (binary)
  § number of major complications during hospital stay (count)
    o need a list of ‘major’ complications
  § time to death (continuous, right-censored)
• Exposure of interest:
  § operating time defined as the time from the first incision to the time of the first stitch to close up (continuous)
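Writing the definitions above as code forces each choice into the open. The sketch below is hypothetical throughout: the record layout, the field names, the list of ‘major’ complications, and the decisions to measure hospital stay from admission and survival from the day of surgery are assumptions for illustration, not the definitions used in the actual study.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List, Optional

# Hypothetical list; the slide only notes that such a list is needed.
MAJOR_COMPLICATIONS = {"pneumonia", "respiratory failure", "myocardial infarction"}


@dataclass
class SurgeryRecord:
    first_incision: datetime
    first_closing_stitch: datetime
    admission: date
    discharge: date
    complications: List[str]
    death: Optional[date]  # None if the patient was alive at last follow-up
    last_follow_up: date


def operating_minutes(rec):
    """Exposure: time from first incision to first closing stitch (continuous)."""
    return (rec.first_closing_stitch - rec.first_incision).total_seconds() / 60


def long_stay(rec):
    """Response 1: hospital stay of more than 7 days (binary)."""
    return (rec.discharge - rec.admission).days > 7


def major_complication_count(rec):
    """Response 2: number of major complications during the stay (count)."""
    return sum(1 for c in rec.complications if c in MAJOR_COMPLICATIONS)


def survival(rec):
    """Response 3: (days since surgery, event flag); flag 0 means right-censored."""
    start = rec.first_incision.date()
    if rec.death is not None:
        return (rec.death - start).days, 1
    return (rec.last_follow_up - start).days, 0


# One made-up patient.
example = SurgeryRecord(
    first_incision=datetime(2005, 3, 1, 8, 0),
    first_closing_stitch=datetime(2005, 3, 1, 10, 30),
    admission=date(2005, 3, 1),
    discharge=date(2005, 3, 10),
    complications=["pneumonia", "nausea"],
    death=None,
    last_follow_up=date(2006, 3, 1),
)
```

For this record the survival time is right-censored: the patient is known to have survived at least until the last follow-up, but the time of death is not observed.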

Breast cancer risk
Q: Can we develop an enhanced breast cancer risk model?
• Response:
  § diagnosis of breast cancer within 1 year of the screening mammogram (binary)
• Exposure of interest:
  § age, race, education, breast density, HRT use...
  § a total of 13 potential predictors
  § all categorical
    o at least in the available data set

IV. Choosing statistical methods
• One way of viewing all the statistical methods available is as a collection of tools
  § different statistical tools for different statistical tasks
  § develop understanding of a collection of tools over the course of your career
• A toolbox of statistical tools/methods
  § basic methods, that everyone should be able to use
  § specialized methods
    o sophisticated tools that require ‘training’
    o constantly being developed and published in the literature
  § sometimes new questions require new methods

• For the most part, the tools that researchers employ are determined by the issues we’ve considered so far
  § scientific goals
  § nature of the available data
  § population of interest
• Even given all this information, there are often several choices of statistical tools/methods
• How to choose between all the available approaches?
  § interpretation (to be discussed later)
  § operating characteristics
    o e.g. bias and statistical efficiency
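Operating characteristics such as bias and efficiency can be studied by simulation: generate many data sets from a known truth and see how each method behaves. The sketch below compares the sample mean and the sample median as estimators of the center of a normal distribution; the choice of estimators, the sample size of 25, and the number of replications are assumptions made for illustration.

```python
import random
import statistics


def simulate(estimator, true_center=0.0, n=25, n_reps=2000, seed=1):
    """Estimate the bias and variance of an estimator by repeated sampling."""
    rng = random.Random(seed)
    estimates = [
        estimator([rng.gauss(true_center, 1.0) for _ in range(n)])
        for _ in range(n_reps)
    ]
    bias = statistics.fmean(estimates) - true_center
    variance = statistics.variance(estimates)
    return bias, variance


# The same seed means both estimators see the same simulated data sets.
mean_bias, mean_var = simulate(statistics.fmean)
median_bias, median_var = simulate(statistics.median)

# Values below 1 mean the median is the less efficient of the two here.
relative_efficiency = mean_var / median_var
```

In this setting both estimators are approximately unbiased, but the median has the larger variance, so the mean is the more efficient tool. With heavy-tailed or contaminated data the comparison can reverse, which is one reason the choice of method depends on the nature of the available data.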

V. Implementation and running the analysis
• Seemingly the most ‘prescriptive’ of the steps
  § in a perfect world, turn the handle... and you’re done!
• Unfortunately, actually performing the analysis is not always straightforward
• Many choices for statistical software
  § R, Matlab, SAS, Stata, WinBUGS, ...
  § each has numerous resources, including already-written code available online
  § not all methods have been implemented in all software packages

• Performing the analyses can also highlight all sorts of problems
  § exploratory data analysis (EDA) might highlight data issues
    o missing data
    o unusual values
    o unusual observed relationships
• Issues like this may require a re-think of the scientific goals
  § if you can’t answer this question, which question can you answer?

VI. Interpretation
• It’s important to distinguish interpretation of the model from interpretation of the results
• Specification of the model is something that we have control over
  § it should be straightforward to provide a precise interpretation of its components
    o you cannot be pedantic enough on this point
  § should be able to do this before you even see the data
• Consider the linear regression model: $E[Y \mid X] = \beta_0 + \beta_1 X$
  § How do we interpret $\beta_1$?
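One standard way to approach the question above, sketched here under the assumption that the model holds exactly as written, is to compare the expected response at two values of $X$ that differ by one unit. The symbol $x$ is introduced only for this illustration.

```latex
\begin{align*}
E[Y \mid X = x + 1] - E[Y \mid X = x]
  &= \bigl(\beta_0 + \beta_1 (x + 1)\bigr) - \bigl(\beta_0 + \beta_1 x\bigr) \\
  &= \beta_1 .
\end{align*}
```

Read this way, $\beta_1$ is the difference in the mean of $Y$ between two sub-populations whose values of $X$ differ by one unit, and $\beta_0$ is the mean of $Y$ when $X = 0$. Whether that difference can be given a causal reading is a separate question that the model alone does not settle.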


Interpretation of the results
• Here are some results... what does it all mean?!?
  § translation of statistics back to science
• Interpreting the results requires a detailed understanding of both the scientific and statistical context
  § usually requires discussion with collaborators
• Sometimes the results don’t support the initial hypotheses!
  § e.g., Breitner et al (2008) Neurology
  § Risk of dementia and AD with prior exposure to NSAIDs in an elderly community-based cohort
  § see the next slide

• These can be particularly challenging situations
• Are these results ‘right’?
  § are we misinterpreting our assumptions/models?
  § are there data issues that we aren’t aware of?
  § is the code wrong?
  § are the results sensitive to particular analysis choices?
• It may be that the results are ‘right’
  § perhaps a new understanding of the mechanism of interest
  § perhaps the results pertain to a population that hasn’t been studied before

Learning about populations
• It is seldom possible to specify one, single target population
  § often the case there are many interesting target populations
• Flexibility to consider different populations depends on whether or not the sample has been collected
• If the sample has not been collected, one might consider
  § a range of scientific questions
  § the feasibility of collecting data across different populations
• If the sample has been collected, flexibility depends on the nature and scope of the available data

Breast cancer screening
• Broad goal of screening is to detect cancer as early as possible
  § balance between public health goals and costs
  § cannot screen everyone all of the time
  § there are also ‘harms’ associated with screening
  § mammography is not perfect
  § real consequences associated with false-positives
• Current recommendations are (broadly):
  § all women aged 50 or older get screened every two years
  § also, women in their 40s who are at ‘high risk’
Q: How good is mammography as a screening modality?
  § answer depends, in part, on the population of interest
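One reason the answer depends on the population is that the positive predictive value of any screening test changes with how common the disease is, even when the test itself is unchanged. The sketch below applies Bayes' rule; the sensitivity, specificity, and prevalence values are invented for illustration and are not estimates for mammography.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test), by Bayes' rule."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


# Hypothetical test characteristics, held fixed across populations.
SENSITIVITY = 0.85
SPECIFICITY = 0.90

# Two hypothetical populations that differ only in disease prevalence.
ppv_lower_prevalence = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.002)
ppv_higher_prevalence = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.01)
```

With the test held fixed, the share of positive results that are true cancers is larger in the higher-prevalence population, so the balance between benefit and the harms of false-positives shifts with the choice of target population.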

• Rosenberg et al (2006) Radiology.
  § all women who undergo screening mammography
• Yankaskas et al (2010) JNCI.
• Miglioretti et al (2004) JAMA.
• Goldman et al (2008) Medical Care.

Remarks
• Except in the most trivial of settings, the data analysis process is collaborative and iterative
• How you proceed will depend on many things:
  § the nature of the data
  § your philosophy
  § the philosophy of your collaborators
• Getting the science ‘right’ is often the hardest part
  § goals are seldom precise at the outset
  § going back-and-forth between the science and statistics is typically a very instructive process
  § to do a good job usually requires knowledge of the science

• More often than not, there is scope for prescription as well as for creativity
  § sometimes there is an obvious way forward
  § other times there isn’t
• What came first... the question or the data?
• There is seldom one ‘right’ scientific question or data analysis
  § Box and Draper (1987):
    Essentially, all models are wrong but some are useful.