Chapter 1 Statistics: the science of data -- collecting, classifying, summarizing, organizing, analyzing, presenting, and interpreting. coursera

coursera Chapter 1

Feb 03, 2022


dariahiddleston
Page 1: coursera Chapter 1

Chapter 1

Statistics: the science of data -- collecting, classifying, summarizing, organizing, analyzing, presenting, and interpreting.

coursera

Page 2: coursera Chapter 1

Section 1.1

The Structure of Data

Page 3: coursera Chapter 1

Cases and Variables

We obtain information about cases or units.

A variable is any characteristic that is recorded for each case.

Generally, each case makes up a row in a dataset, and each variable makes up a column.

Page 4: coursera Chapter 1

Countries of the World

Country  LandArea  Population  Density  GDP  Rural  CO2  PumpPrice  Military
Afghanistan  652.86  30.552  46.8  665  74.1  35.3  1.28  8.65
Albania  27.40  2.897  105.7  4460  44.6  12.8  1.81
Algeria  2381.74  39.208  16.5  5361  30.5  24.6  0.29
American Samoa  0.20  0.055  275.0  12.7
Andorra  0.47  0.079  168.1  13.8  9.5  1.67
Angola  1246.70  21.472  17.2  5783  57.5  44.8  0.63  13.81
Antigua and Barbuda  0.4  0.090  204.5  13342  75.4  16.6
Argentina  2736.690  41.446  15.1  14715  8.5  16.9  1.46
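To make the "cases are rows, variables are columns" idea concrete, here is a minimal Python sketch built from the first three numeric variables of the table above. The slide does not state units, so the sketch assumes LandArea is in thousands of square kilometers and Population is in millions; under that assumption, Density (people per square kilometer) can be recomputed from the other two variables. The names `cases` and `density_per_sq_km` are illustrative, not part of the course material.

```python
# Each case (a country) is one row; each variable is one key in the row.
# Values are copied from the table; the units are an ASSUMPTION (see lead-in).
cases = [
    {"Country": "Afghanistan", "LandArea": 652.86, "Population": 30.552, "Density": 46.8},
    {"Country": "Albania", "LandArea": 27.40, "Population": 2.897, "Density": 105.7},
    {"Country": "Algeria", "LandArea": 2381.74, "Population": 39.208, "Density": 16.5},
]


def density_per_sq_km(case):
    """Recompute density, assuming Population in millions and LandArea in 1000 sq km."""
    people = case["Population"] * 1_000_000
    sq_km = case["LandArea"] * 1_000
    return people / sq_km


for case in cases:
    # Print the recomputed value next to the value recorded in the table.
    print(case["Country"], round(density_per_sq_km(case), 1), case["Density"])
```

Country is a categorical variable here (it only labels the case), while LandArea, Population, and Density are quantitative, which is why arithmetic on them is meaningful.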

Page 5: coursera Chapter 1

Intro Statistics Survey Data

Page 6: coursera Chapter 1

Applying the definition

Consider the above datasets:
– What are the cases?
– What are the variables?
– What interesting questions could it help you answer?

Page 7: coursera Chapter 1

Quantitative vs. Qualitative Data

• Quantitative Data: measurements recorded on a naturally occurring numerical scale

• Categorical (or Qualitative) Data: measurements that divide cases into groups (cannot be measured on a numerical scale)

Is there numerical data that is not quantitative?

Page 8: coursera Chapter 1

Data can be used to answer interesting questions!

Using Data to Answer a Question

1. Can eating a yogurt a day cause you to lose weight?

2. Do males find females more attractive if they wear red?

3. Does louder music cause people to drink more beer?

4. Are lions more likely to attack after a full moon?

(The answer to all of these questions is yes!)

Page 9: coursera Chapter 1

Variables

For each of the following situations:
– What are the variables?
– Is each variable categorical or quantitative?

1. Can eating a yogurt a day cause you to lose weight?

2. Do males find females more attractive if they wear red?

3. Does louder music cause people to drink more beer?

4. Are lions more likely to attack after a full moon?

Page 10: coursera Chapter 1

Explanatory and Response

If we are using one variable to help us understand or predict values of another variable, we call the former the explanatory variable and the latter the response variable.

Examples:
• Does meditation help reduce stress?
• Does sugar consumption increase hyperactivity?

Page 11: coursera Chapter 1

Variables

For each of the following situations:
– Which is the explanatory and which is the response variable?

1. Can eating a yogurt a day cause you to lose weight?

2. Do males find females more attractive if they wear red?

3. Does louder music cause people to drink more beer?

4. Are lions more likely to attack after a full moon?

Page 12: coursera Chapter 1

Summary

• Data are everywhere, and pertain to a wide variety of topics

• A dataset is usually comprised of variables measured on cases

• Variables are either categorical or quantitative

• Data can be used to provide information about essentially anything we are interested in and want to collect data on!

Page 13: coursera Chapter 1

Section 1.2

Sampling from a Population

Page 14: coursera Chapter 1

Sample versus Population

A population includes all individuals or objects of interest.

A sample is all the cases that we have collected data on (a subset of the population).

Statistical inference is the process of using data from a sample to gain information about the population.

Page 15: coursera Chapter 1

The Big Picture

[Diagram: sampling leads from the Population to the Sample; statistical inference leads from the Sample back to the Population]

Page 16: coursera Chapter 1

Definitions

• Population: all individuals or objects of interest.
• Variable: a characteristic of an individual unit.
• Sample: all the cases that we have collected data on (a subset of the population).

Example: I want to estimate what proportion of UP students are left-handed.

• How could I do that?
• Identify the population, the variable, and the sample.

Page 17: coursera Chapter 1

Sampling Bias

• Sampling bias occurs when the method of selecting a sample causes the sample to differ from the population in some relevant way.

• If sampling bias exists, we cannot trust generalizations from the sample to the population

• People are TERRIBLE at selecting a good sample, even when explicitly trying to avoid sampling bias!

Page 18: coursera Chapter 1

Dewey Defeats Truman?

Famous example of sampling bias leading to an incorrect prediction

Page 19: coursera Chapter 1

Sampling

[Diagram: a Sample drawn from the Population]

GOAL: Select a sample that is similar to the population, only smaller

Page 20: coursera Chapter 1

Random Sampling

• How can we make sure to avoid sampling bias?

• Imagine putting the names of all the units of the population into a hat, and drawing out names at random to be in the sample

• More often, we use technology

Take a RANDOM sample!

Page 21: coursera Chapter 1

Simple Random Sample

In a simple random sample, each unit of the population has the same chance of being selected, regardless of the other units chosen for the sample.

* More complicated random sampling schemes exist
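The "names in a hat" procedure can be sketched with Python's standard library. This is a minimal illustration, not the course's own tool: the population list `students`, the sample size, and the helper name `simple_random_sample` are all hypothetical. `random.Random.sample` draws without replacement, so no unit can appear twice.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every unit is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical population: a list of 500 student identifiers.
students = [f"student_{i}" for i in range(1, 501)]

# Draw a simple random sample of 25 students.
sample = simple_random_sample(students, 25, seed=1)
print(sample[:5])
```

Note that this requires a complete list of the population, which is exactly the requirement that often makes random sampling hard in practice.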

Page 22: coursera Chapter 1

Realities of Sampling

• While a random sample is ideal, often it isn’t feasible. A list of the entire population may not be available, or it may be impossible or too difficult to contact all members of the population.

• Sometimes, your population of interest has to be altered to something more feasible to sample from. Generalization of results is limited to the population that was actually sampled from.

• In practice, think hard about potential sources of sampling bias, and try your best to avoid them

Page 23: coursera Chapter 1

Non-Random Samples

Suppose you want to estimate the average number of hours that students spend studying each week. Which of the following is the best method of sampling?

a) Go to the library and ask all the students there how much they study
b) Email all students asking how much they study, and use all the data you get
c) Stand on the quad and ask everyone walking by how much they study

Page 24: coursera Chapter 1

Bad Methods of Sampling

a) Go to the library and ask all the students there how much they study

Sampling units based on something obviously related to the variable(s) you are studying

Page 25: coursera Chapter 1

Bad Methods of Sampling

b) Email all students asking how much they study, and use all the data you get

• Letting your sample be comprised of whoever chooses to participate (volunteer bias)
• People who choose to participate or respond are probably not representative of the entire population

Page 26: coursera Chapter 1

Alcohol, Marijuana, and Driving

• The Federal Office of Road Safety in Australia conducted a study on the effects of alcohol and marijuana on performance
• Volunteers who responded to advertisements for the study on rock radio stations were given a random combination of the two drugs, then their performance was observed
– What is the sample? What is the population?
– Is there sampling bias?
– Will the results be informative and/or do you think the study is worth conducting?

Source: Chesher, G., Dauncey, H., Crawford, J. and Horn, K, “The Interaction between Alcohol and Marijuana: A Dose Dependent Study on the Effects of Human Moods and Performance Skills,” Report No. C40, Federal Office of Road Safety, Federal Department of Transport, Australia, 1986.

Page 27: coursera Chapter 1

Data Collection and Bias

[Diagram: Population to Sample (sampling bias?), then Sample to DATA (other forms of bias?)]

Page 28: coursera Chapter 1

Other Forms of Bias

• Even with a random sample, data can still be biased, especially when collected on humans
• Other forms of bias to watch out for in data collection:
– Question wording
– Context
– Inaccurate responses

Many other possibilities – examine the specifics of each study!

Page 29: coursera Chapter 1

Question Wording

• A random sample was asked: “Should there be a tax cut, or should money be used to fund new government programs?”

• A different random sample was asked: “Should there be a tax cut, or should money be spent on programs for education, the environment, health care, crime-fighting, and military defense?”

Tax Cut: 60%   Programs: 40%

Tax Cut: 22%   Programs: 78%

Page 30: coursera Chapter 1

Inaccurate Responses

• In a study on US students, 93% of the sample said they were in the top half of the sample regarding driving skill
Svenson, O. (February 1981). "Are we all less risky and more skillful than our fellow drivers?" Acta Psychologica 47 (2): 143–148.

• From a random sample of all US college students, 22.7% reported using illicit drugs. Do you think this number is accurate?
Substance Abuse and Mental Health Services Administration (2010). “Results from the 2009 National Survey on Drug Use and Health: Volume 1.” Summary of National Findings (Office of Applied Studies, NSDUH Series H-38A, HHS Publication No. SMA 10-4856 Findings). Rockville, MD, https://nsduhweb.rti.org/

Page 31: coursera Chapter 1

Summary

Always think critically about how the data were collected, and recognize that not all forms of data collection lead to valid inferences

This is the easiest way to instantly become a more statistically literate individual!

Page 32: coursera Chapter 1

Section 1.3

Experiments and Observational Studies

Page 33: coursera Chapter 1

Association and Causation

Two variables are associated if values of one variable tend to be related to values of the other variable.

Two variables are causally associated if changing the value of the explanatory variable influences the value of the response variable.

Page 34: coursera Chapter 1

Explanatory, Response, Causation

For each of the following headlines:
– Identify the explanatory and response variables (if appropriate).
– Does the headline imply a causal association?

1. “Daily Exercise Improves Mental Performance”

2. “Want to lose weight? Eat more fiber!”

3. “Cat owners tend to be more educated than dog owners”

Page 35: coursera Chapter 1

TVs and Life Expectancy

[Figure: scatterplot titled "TV and Life Expectancy". Horizontal axis: TVs per 1000 People (0 to 1000). Vertical axis: Life Expectancy (40 to 80). Each point is a labeled country, from Angola to Yemen. r = 0.74]

Should you buy more TVs to live longer?

Association does not imply causation!

Page 36: coursera Chapter 1

Confounding Variable

A third variable that is associated with both the explanatory variable and the response variable is called a confounding variable.

• A confounding variable can offer a plausible explanation for an association between the explanatory and response variables

• Whenever confounding variables are present (or may be present), a causal association cannot be determined

Page 37: coursera Chapter 1

Confounding Variable

[Diagram: the Confounding Variable is linked to both the Explanatory Variable and the Response Variable]

Page 38: coursera Chapter 1

TVs and Life Expectancy

[Diagram: Wealth is linked to both Number of TVs per capita and Life Expectancy]
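A small simulation can show how a confounder alone produces a strong association. The sketch below uses made-up numbers, not the countries data behind the scatterplot: a hypothetical `wealth` score drives both a TV score and a life-expectancy score, and neither of those two influences the other. The variable names, noise levels, and the helper `pearson_r` are all assumptions chosen for illustration.

```python
import random


def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)

# Simulated (not real) data: wealth is the only driver of both variables.
wealth = [rng.uniform(0, 1) for _ in range(1000)]
tvs = [w + rng.gauss(0, 0.1) for w in wealth]   # depends on wealth only
life = [w + rng.gauss(0, 0.1) for w in wealth]  # depends on wealth only

# TVs and life expectancy are strongly associated...
r_raw = pearson_r(tvs, life)

# ...but once wealth is subtracted out, almost nothing is left.
tv_resid = [t - w for t, w in zip(tvs, wealth)]
life_resid = [l - w for l, w in zip(life, wealth)]
r_adjusted = pearson_r(tv_resid, life_resid)

print(round(r_raw, 2), round(r_adjusted, 2))
```

In this simulated world, buying more TVs would do nothing for life expectancy, even though the two variables are highly correlated.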

Page 39: coursera Chapter 1

Confounding Variable

For each of the following relationships, identify a possible confounding variable:

1. More ice cream sales have been linked to more deaths by drowning.

2. The total amount of beef consumed and the total amount of pork consumed worldwide are closely related over the past 100 years.

3. People who own a yacht are more likely to buy a sports car.

4. Air pollution is higher in places with a higher proportion of paved ground relative to grassy ground.

5. People with shorter hair tend to be taller.

Page 40: coursera Chapter 1

Experiment vs. Observational Study

An observational study is a study in which the researcher does not actively control the value of any variable, but simply observes the values as they naturally exist.

An experiment is a study in which the researcher actively controls one or more of the explanatory variables.

Page 41: coursera Chapter 1

Observational Studies

• There are almost always confounding variables in observational studies

• Observational studies can almost never be used to establish causation

Page 42: coursera Chapter 1

Kindergarten and Crime

• Does Kindergarten Lead to Crime?
• Yes, according to research conducted by New Hampshire state legislator Bob Kingsbury
• “Kingsbury (R-Laconia), 86, recently claimed that analyses he’s been carrying out since 1996 show that communities in his state that have kindergarten programs have up to 400% more crime than localities whose classrooms are free of finger-painting 5-year-olds. Pointing to his hometown of Laconia, the largest of 10 communities in Belknap County, the legislator noted that it has the only kindergarten program in the county and the most crime, including most or all of the county’s rapes, robberies, assaults and murders.”

Szalavitz, M. “Does Kindergarten Lead to Crime? Fact-Checking N.H. Legislator’s `Research’,” healthland.time.com, 7/6/12.

Page 43: coursera Chapter 1

Texas GOP Platform

• A few days later, the Texas GOP 2012 Platform announced that it opposed early childhood education

• Causation or just association?

Source: Strauss, V. “Texas GOP rejects ‘critical thinking’ skills. Really.” www.washingtonpost.com, 7/9/12.

Page 44: coursera Chapter 1

http://www.businessweek.com/magazine/correlation-or-causation-12012011-gfx.html

Data from Facebook and Bloomberg

Page 45: coursera Chapter 1

http://www.businessweek.com/magazine/correlation-or-causation-12012011-gfx.html

Data from US Social Security Administration and National Housing Finance Agency

Page 46: coursera Chapter 1

It’s a Common Mistake!

“The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning.”

- Stephen Jay Gould

http://xkcd.com/552/

Page 47: coursera Chapter 1

Randomization

• How can we make sure to avoid confounding variables?

RANDOMLY assign values of the explanatory variable

Page 48: coursera Chapter 1

Randomized Experiment

In a randomized experiment, the explanatory variable for each unit is determined randomly, before the response variable is measured.

Page 49: coursera Chapter 1

Randomized Experiment

• The different levels of the explanatory variable are known as treatments

• Randomly divide the units into groups, and randomly assign a different treatment to each group

• If the treatments are randomly assigned, the treatment groups should all look similar
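Random assignment to treatment groups can be sketched as a shuffle followed by dealing units out in turn. This is one simple way to do it under the assumption that roughly equal group sizes are wanted; the function name `randomly_assign`, the unit identifiers, and the treatment labels are hypothetical.

```python
import random


def randomly_assign(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        # Round-robin dealing keeps the group sizes within one of each other.
        groups[treatments[i % len(treatments)]].append(unit)
    return groups


# Hypothetical example: 20 units split between two treatments.
groups = randomly_assign(range(1, 21), ["exercise", "sedentary"], seed=42)
for treatment, members in groups.items():
    print(treatment, sorted(members))
```

Because chance alone decides who lands in which group, no characteristic of the units (age, fitness, motivation) can systematically line up with the treatment.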

Page 50: coursera Chapter 1

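Randomized Experiments

• Because the explanatory variable is randomly assigned, it is not associated with any other variables. Confounding variables are eliminated!!!

[Diagram: in a RANDOMIZED EXPERIMENT, the link between the Confounding Variable and the Explanatory Variable is removed; the Explanatory Variable still points to the Response Variable]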

Page 51: coursera Chapter 1

Randomized Experiments

• If a randomized experiment yields a significant association between the two variables, we can establish causation from the explanatory to the response variable

Randomized experiments are very powerful! They allow you to infer causality.

Page 52: coursera Chapter 1

Exercise and the Brain

• A study found that elderly people who walked at least a mile a day had significantly higher brain volume (gray matter related to reasoning) and significantly lower rates of Alzheimer’s and dementia compared to those who walked less

• The article states: “Walking about a mile a day can increase the size of your gray matter, and greatly decrease the chances of developing Alzheimer's disease or dementia in older adults, a new study suggests.”

• Is this conclusion valid?

Allen, N. “One way to ward off Alzheimer’s: Take a Hike,” msnbc.com, 10/13/10.

No. Observational study – cannot yield causal conclusions.

Page 53: coursera Chapter 1

Exercise and the Brain

• How would you design an experiment to determine whether exercise actually causes changes in the brain?

Page 54: coursera Chapter 1

Exercise and the Brain

• A sample of mice was divided randomly into two groups. One group was given access to an exercise wheel; the other group was kept sedentary

• “The brains of mice and rats that were allowed to run on wheels pulsed with vigorous, newly born neurons, and those animals then breezed through mazes and other tests of rodent IQ” compared to the sedentary mice

• Is this evidence that exercise causes an increase in brain activity and IQ, at least in mice?

Reynolds, “Phys Ed: Your Brain on Exercise", NY Times, July 7, 2010.

Yes. Randomized experiment – can yield causal conclusions.

Page 55: coursera Chapter 1

Let’s Try It!

• Is just 5 seconds of exercise enough to increase your pulse rate?

• Treatment groups: exercise versus sedentary
• Randomly divide the class into the two groups
• Give the treatment
• Measure the response (pulse rate)
• We’ll learn how to analyze this later…

Page 56: coursera Chapter 1

Knee Surgery for Arthritis

Researchers conducted a study on the effectiveness of a knee surgery to cure arthritis. It was randomly determined whether people got the knee surgery. Everyone who underwent the surgery reported feeling less pain. Is this evidence that the surgery causes a decrease in pain?

No. Need a control or comparison group. What would happen without surgery?

Page 57: coursera Chapter 1

Control Group

• When determining whether a treatment is effective, it is important to have a comparison group, known as the control group
• It isn’t enough to know that everyone in one group improved; we need to know whether they improved more than they would have improved without the surgery
• All randomized experiments need either a control group, or two different treatments to compare

Page 58: coursera Chapter 1

Knee Surgery for Arthritis

• In the knee surgery study, those in the control group received a fake knee surgery. They were put under and cut open, but the doctor did not actually perform the surgery. All of these patients also reported less pain!
• In fact, the improvement was indistinguishable between those receiving the real surgery and those receiving the fake surgery!

Source: “The Placebo Prescription,” NY Times Magazine, 1/9/00.

Page 59: coursera Chapter 1

Placebo Effect

• Often, people will experience the effect they think they should be experiencing, even if they aren’t actually receiving the treatment

• Example: Eurotrip

• This is known as the placebo effect
• One study estimated that 75% of the effectiveness of anti-depressant medication is due to the placebo effect

• For more information on the placebo effect (it’s pretty amazing!) read The Placebo Prescription

Page 60: coursera Chapter 1

Placebo and Blinding

• Control groups should be given a placebo, a fake treatment that resembles the active treatment as much as possible
• Using a placebo is only helpful if participants do not know whether they are getting the placebo or the active treatment
• If possible, randomized experiments should be double-blinded: neither the participants nor the researchers involved should know which treatment the patients are actually getting

Page 61: coursera Chapter 1

Types of Randomized Experiments

• Randomizing cases into different treatment groups is called a randomized comparative experiment

• We can also give each treatment to each case, and just randomize the order in which treatments are received: matched pairs experiment

• Both are valid randomized experiments!

Page 62: coursera Chapter 1

Matched Pairs

Example: To see if people read faster on paper or a Kindle, a study was done in which 16 people read two sets of instructions of similar length, one on a Kindle and one on paper. The order in which they read the instructions was randomized. (Reading was faster on paper.)
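In a matched pairs design, every case receives both treatments, so the only thing left to randomize is the order. A minimal sketch follows, assuming the order is shuffled independently for each participant (the slide does not say exactly how the study did it). The function name `randomize_order` and the participant numbering are hypothetical.

```python
import random


def randomize_order(participants, treatments=("Kindle", "paper"), seed=None):
    """Give every participant all treatments, in an independently shuffled order."""
    rng = random.Random(seed)
    orders = {}
    for participant in participants:
        order = list(treatments)
        rng.shuffle(order)  # which medium comes first is decided by chance
        orders[participant] = order
    return orders


# 16 participants, as in the example; the identifiers 1..16 are made up.
orders = randomize_order(range(1, 17), seed=7)
for participant, order in orders.items():
    print(participant, "reads first on", order[0], "then on", order[1])
```

Randomizing the order guards against effects such as practice or fatigue being mistaken for a difference between the two media.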

Page 63: coursera Chapter 1

Why not always randomize?

• Randomized experiments are ideal, but sometimes not ethical or possible

• Often, you have to do the best you can with data from observational studies

• Example: research for the Supreme Court case as to whether preferences for minorities in university admissions help or hurt the minority students

Page 64: coursera Chapter 1

Randomization in Data Collection

Was the sample randomly selected?
– Yes: Possible to generalize to the population
– No: Should not generalize to the population

Was the explanatory variable randomly assigned?
– Yes: Possible to make conclusions about causality
– No: Cannot make conclusions about causality
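The two questions in this chart are independent of each other, which a tiny function makes explicit. This is only an illustrative encoding of the chart; the function name `allowed_conclusions` and its dictionary keys are hypothetical.

```python
def allowed_conclusions(random_sample: bool, random_assignment: bool) -> dict:
    """Map the two randomization questions to the conclusions they permit."""
    return {
        # Random selection of the sample supports generalizing to the population.
        "generalize_to_population": random_sample,
        # Random assignment of the explanatory variable supports causal claims.
        "causal_conclusions": random_assignment,
    }


# A poll from a random sample, with no treatment assigned:
print(allowed_conclusions(random_sample=True, random_assignment=False))

# A randomized experiment run on volunteers (a non-random sample):
print(allowed_conclusions(random_sample=False, random_assignment=True))
```

Each answer affects only its own conclusion: a random sample says nothing about causality, and random assignment says nothing about whether results generalize.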

Page 65: coursera Chapter 1

Two Fundamental Questions in Data Collection

[Diagram: Population to Sample (random sample???), then Sample to DATA (randomized experiment???)]

Page 66: coursera Chapter 1

Randomization

• Doing a randomized experiment on a random sample is ideal, but rarely achievable

• If the focus of the study is using a sample to estimate a statistic for the entire population, you need a random sample, but do not need a randomized experiment (example: election polling)

• If the focus of the study is establishing causality from one variable to another, you need a randomized experiment and can settle for a non-random sample (example: drug testing)

Page 67: coursera Chapter 1

Summary (1.3)

• Association does not imply causation!
• In observational studies, confounding variables almost always exist, so causation cannot be established

• Randomized experiments involve randomly determining the level of the explanatory variable

• Randomized experiments prevent confounding variables, so causality can be inferred

• A control or comparison group is necessary
• The placebo effect exists, so a placebo and blinding should be used