Hypothesis tests for correlation

Hypothesis tests for correlation - emeyers.scripts.mit.edu

Apr 09, 2022

dariahiddleston

Overview

Review of hypothesis tests for more than 2 means

Continue discussion of Fisher’s significance tests vs. Neyman-Pearson hypothesis tests

Hypothesis tests for correlation

Final project: analyze your own data set

Final project report: a 5-10 page R Markdown document:

> source('/home/shared/intro_stats/cs206_functions.R')
> get_worksheet("final")

A one-paragraph final project proposal is due on Tuesday, November 20th
•  What question you will answer
•  Where you will get the data

Review comparing more than two means

Rotten Tomatoes is a website that provides movie ratings and reviews

Each movie gets a “Tomatometer” score that is based on the percentage of approved critics who gave the movie a positive review.

Question: are the genres of Action, Animation, Comedy, Drama and Horror all rated the same on average?

Data consists of scores from 2007 to 2013. What is the first thing we should do?

Rotten Tomatoes movie scores by genre

What should we do next?

Hypothesis testing

Step 1:
H0: μ_Action = μ_Animation = μ_Comedy = μ_Drama = μ_Horror
HA: μ_i ≠ μ_j for at least one pair of genres

Step 2: Mean absolute difference (MAD):

MAD = (|x̅_Ac - x̅_An| + |x̅_Ac - x̅_Co| + |x̅_Ac - x̅_Dr| + |x̅_Ac - x̅_Ho| + … + |x̅_Dr - x̅_Ho|) / 10
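As a rough sketch of Step 2, the MAD statistic can be computed from a vector of scores and a vector of group labels. The function name `mad_stat` and the toy numbers are illustrative assumptions; they are not the course's code or the Rotten Tomatoes data.

```r
# Hypothetical helper: mean absolute difference (MAD) between all pairs of group means
mad_stat <- function(values, groups) {
  group_means <- as.numeric(tapply(values, groups, mean))  # one mean per group
  # absolute difference for every pair of group means (10 pairs when there are 5 groups)
  pair_diffs <- combn(group_means, 2, function(pair) abs(pair[1] - pair[2]))
  mean(pair_diffs)
}

# Toy data (made up for illustration): three groups with means 2, 6 and 10
toy_scores <- c(1, 3, 5, 7, 9, 11)
toy_groups <- rep(c("A", "B", "C"), each = 2)

toy_mad <- mad_stat(toy_scores, toy_groups)  # (|2 - 6| + |2 - 10| + |6 - 10|) / 3
print(toy_mad)
```

With five genres the same function averages over the 10 pairwise differences, which is where the divisor of 10 in the formula comes from.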

3. Create the null distribution!

Compute statistics from shuffled groups

Reconstructed participant pool data under H0

[Diagram: the original groups (Action 1, Animation 2, ..., Horror) are pooled and reshuffled into Shuffled 'Action', Shuffled 'Animation', ..., Shuffled 'group k']

Shuffle data for random assignment consistent with H0
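The shuffling idea can be sketched in R as follows. This is only an illustration under stated assumptions: the scores are simulated (not the Rotten Tomatoes data), `mad_stat` is a hypothetical helper for the MAD statistic from Step 2, and 2000 shuffles is an arbitrary choice.

```r
set.seed(1)  # for reproducibility of this illustration only

# Hypothetical helper: mean absolute difference between all pairs of group means
mad_stat <- function(values, groups) {
  group_means <- as.numeric(tapply(values, groups, mean))
  mean(combn(group_means, 2, function(pair) abs(pair[1] - pair[2])))
}

# Simulated stand-in data: 5 genres, 20 movies each, all drawn from the same distribution
genres <- rep(c("Action", "Animation", "Comedy", "Drama", "Horror"), each = 20)
scores <- runif(length(genres), min = 0, max = 100)

obs_mad <- mad_stat(scores, genres)

# Shuffle the genre labels many times and recompute the statistic each time
n_shuffles <- 2000
null_dist <- numeric(n_shuffles)
for (i in 1:n_shuffles) {
  null_dist[i] <- mad_stat(scores, sample(genres))
}

# Proportion of shuffled statistics at least as large as the observed one
p_value <- sum(null_dist >= obs_mad) / n_shuffles
print(p_value)
```

Shuffling the labels rather than the scores keeps every score in the data and only breaks the link between a score and its genre, which is what H0 says does not matter.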

P-value

p-value = 0

Two theories of hypothesis testing

Null-hypothesis significance testing (NHST) is a hybrid of two theories:

1. Significance testing of Ronald Fisher

2. Hypothesis testing of Jerzy Neyman and Egon Pearson

Fisher (1890-1962)   Neyman (1894-1981)   Pearson (1895-1980)

Ronald Fisher’s significance testing

Views the p-value as strength of evidence against the null hypothesis
•  P-values are part of an ongoing scientific process: they tell the experimenter “what results to ignore”

Neyman-Pearson null hypothesis testing

Makes a formal decision in statistical tests

Reject H0: if the observed sample statistic is so extreme that it is unlikely when H0 is true
•  i.e., reject H0 if the p-value is less than some predetermined significance level α

Do not reject H0: if the statistic is not too extreme when H0 is true. This means the test is inconclusive.

Frequentist logic

Type I error: incorrectly rejecting the null hypothesis when it is true

If the Neyman-Pearson null hypothesis testing paradigm was followed perfectly, then only ~5% of all published research findings should be wrong (for α = 0.05)
•  i.e., when H0 is true, we would only make type I errors 5% of the time
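A small simulation can illustrate the 5% figure. The sketch below is an assumption-laden illustration: it uses simulated normal data and R's built-in `t.test()` (a parametric test, swapped in here for speed instead of the shuffling approach), with H0 true in every simulated experiment.

```r
set.seed(2)

alpha <- 0.05
n_experiments <- 5000

# Each simulated experiment compares two groups drawn from the SAME distribution,
# so H0 is true and every rejection is a type I error
p_values <- replicate(n_experiments, {
  group1 <- rnorm(20)
  group2 <- rnorm(20)
  t.test(group1, group2)$p.value
})

type1_rate <- mean(p_values < alpha)
print(type1_rate)  # should land close to alpha
```

The rejection rate is a statement about the long run of experiments in which H0 holds; it says nothing about whether any single rejection is correct.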

Problems with the NP hypothesis tests

Problem 1: we are interested in the results of a specific experiment, not whether we are right most of the time
•  E.g., 95% of these statements are true:
•  Calcium is good for your heart, Paul is psychic, Buzz and Doris can communicate, …

Problem 2: Arbitrary thresholds for alpha levels
•  P-value = 0.051, we don’t reject H0?

Problem 3: running many tests can give rise to a high number of type I errors

Genes and leukemia example

Scientists collected 7129 gene expression levels from 38 patients to find genetic differences between two types of leukemia (L1 and L2)

Suppose there were no genetic differences between the types of leukemia
•  H0: μ_L1 = μ_L2 is true for all genes

Q: If each gene was tested separately using a significance level of α = 0.05, approximately how many type I errors would be expected?
•  A: 7129 x 0.05 ≈ 356
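The arithmetic can be checked with a short simulation. This sketch does not use the leukemia data; it assumes H0 is true for every gene and relies on the standard fact that, for a continuous test statistic, p-values are uniformly distributed on [0, 1] when H0 is true.

```r
set.seed(3)

n_genes <- 7129
alpha <- 0.05

expected_errors <- n_genes * alpha  # 356.45

# One simulated p-value per gene, with H0 true for every gene (assumption)
sim_p_values <- runif(n_genes)
sim_errors <- sum(sim_p_values < alpha)

print(expected_errors)
print(sim_errors)  # varies from run to run, but stays near the expected count
```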

Genes and leukemia example

There are methods that try to correct for running multiple hypothesis tests

The Bonferroni correction is one way to control the probability of any hypothesis test giving a type I error
•  i.e., it controls the familywise error rate (no type I errors for any of the tests run)

It works by dividing the initial α level by the number of tests run
•  E.g., α = 0.05/7129 ≈ 0.000007
•  A p-value needs to be below this level to be considered statistically significant
•  This can lead to many type II errors (Type II error: failure to reject H0 when it is false)
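A minimal sketch of the Bonferroni correction in R, using simulated p-values rather than the leukemia data (H0 is assumed true for every gene). It shows both the divided α level and the equivalent adjustment with R's built-in `p.adjust()`.

```r
set.seed(4)

n_tests <- 7129
alpha <- 0.05

bonferroni_alpha <- alpha / n_tests  # roughly 0.000007

# Simulated p-values, one per gene, with H0 true everywhere (assumption)
p_values <- runif(n_tests)

n_sig_uncorrected <- sum(p_values < alpha)
n_sig_bonferroni <- sum(p_values < bonferroni_alpha)

# Equivalent view: multiply each p-value by the number of tests (capped at 1)
# and compare it with the original alpha
p_adjusted <- p.adjust(p_values, method = "bonferroni")
n_sig_adjusted <- sum(p_adjusted < alpha)

print(c(uncorrected = n_sig_uncorrected, bonferroni = n_sig_bonferroni))
```

Comparing raw p-values with α divided by the number of tests and comparing adjusted p-values with α are two ways of writing the same rule.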

http://xkcd.com/882/

Multiple hypothesis tests

The problem of multiple testing

For α = 0.05, ~5% of all published research findings should be wrong

Publication bias (file drawer effect): Generally positive results are more likely to be published, so if you read the literature, the proportion of incorrect results (type I errors) will be greater than 5%.

American Statistical Association’s Statement on p-values

Some thoughts…

Better to have hypothesis tests than none at all. Just need to think carefully and use your judgment.

Report effect size in most cases, i.e., confidence intervals

Report the p-values rather than accept/reject H0
•  i.e., report p = 0.23 not p < 0.05

Replicate findings (perhaps in different contexts) to make sure you get the same results

Be a good/honest scientist and try to get at the truth!

Hypothesis tests for correlation

Is there a positive correlation between the amount of carbohydrates in a cereal and the number of calories?

What are the population parameter and the statistic of interest?

Significance tests for correlation

Suppose we had some data from 30 randomly selected cereals

What would be a good first step in analyzing the data?

Cereal                  Calories  Carbohydrates
Apple Jacks             117       27
Boo Berry               118       27
Cap’n Crunch            144       31
Cinnamon Toast Crunch   169       32
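One reasonable first step (an assumption here, since the slide leaves the question open) is to plot the two variables against each other. The sketch below uses only the four rows shown in the table, not the full set of 30 cereals; the data frame name `cereal_rows` is made up for illustration.

```r
# The four rows shown above (the full data set has 30 cereals)
cereal_rows <- data.frame(
  Cereal   = c("Apple Jacks", "Boo Berry", "Cap'n Crunch", "Cinnamon Toast Crunch"),
  Calories = c(117, 118, 144, 169),
  Carbs    = c(27, 27, 31, 32)
)

# Scatter plot of the two quantitative variables
plot(cereal_rows$Carbs, cereal_rows$Calories,
     xlab = "Carbohydrates", ylab = "Calories")

# Correlation for these four rows only (not the observed statistic for all 30 cereals)
r_four_rows <- cor(cereal_rows$Carbs, cereal_rows$Calories)
print(r_four_rows)
```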

Correlation between Carbohydrates and Calories

Guesses on the observed correlation r?

Hypothesis testing for correlation

Talk to your neighbor and complete these steps:

1. Write down the null and alternative in symbols and words

2. Load the data and compute the observed statistic:
> load("/home/shared/intro_stats/cs206_data/cereal.Rda")

3. How could we assess whether this correlation is due to chance?
•  i.e., how can we create one point in the null distribution

Hypothesis testing for correlation

1.  H0: ρ = 0   There is no correlation between calories and carbs
    HA: ρ > 0   There is a positive correlation between calories and carbs

2.  The observed statistic is:

obs_cor <- cor(cereal$Calories, cereal$Carbs)

Hypothesis testing for correlation

3. If the null hypothesis was true, then there is no correlation between x and y

So shuffling the order of the values in x (or y) and calculating the correlation would be just as valid as calculating the correlation on the original x and y

Cereal                  Calories  Carbohydrates
Apple Jacks             117       27
Boo Berry               118       27
Cap’n Crunch            144       31
Cinnamon Toast Crunch   169       32

Hypothesis testing for correlation

We can shuffle data using the sample() function
> sample(cereal$Carbs)

How can we create one point in our null distribution?
> cor(cereal$Calories, sample(cereal$Carbs))

How can we create a full null distribution?
•  Can you write down the R code yourself?

Hypothesis testing for correlation

R code to create a null distribution for correlation:

null_dist <- NULL
for (i in 1:10000) {
  null_dist[i] <- cor(cereal$Calories, sample(cereal$Carbs))
}

How can we get a p-value in R?

p_value <- sum(null_dist >= obs_cor) / 10000

Where is the observed statistic on this distribution? What is the p-value?
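To see where an observed statistic falls on a null distribution, the whole procedure can be run end to end and plotted. The sketch below is self-contained and uses simulated data as a stand-in for the cereal data set (the real file is not reproduced here), so the numbers it produces are not the course's results.

```r
set.seed(6)

# Simulated stand-in for the cereal data (NOT the course data set)
n_cereals <- 30
carbs <- rnorm(n_cereals, mean = 28, sd = 4)
calories <- 4 * carbs + rnorm(n_cereals, mean = 0, sd = 15)

obs_cor <- cor(calories, carbs)

# Null distribution: correlations after shuffling one variable
n_shuffles <- 10000
null_dist <- numeric(n_shuffles)
for (i in 1:n_shuffles) {
  null_dist[i] <- cor(calories, sample(carbs))
}

# Histogram of the null distribution with the observed statistic marked in red
hist(null_dist, breaks = 50, xlim = c(-1, 1),
     main = "Null distribution of r", xlab = "Correlation from shuffled data")
abline(v = obs_cor, col = "red")

# One-sided p-value for HA: rho > 0
p_value <- sum(null_dist >= obs_cor) / n_shuffles
print(p_value)
```

Preallocating with `numeric(n_shuffles)` instead of growing a `NULL` vector is only a speed convenience; the result is the same.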

NHST for correlation

What conclusion would you draw?

1969 Vietnam Draft

date  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1     305   86  108   32  330  249   93  111  225  359   19  129
2     159  144   29  271  298  228  350   45  161  125   34  328
3     251  297  267   83   40  301  115  261   49  244  348  157
4     215  210  275   81  276   20  279  145  232  202  266  165
5     101  214  293  269  364   28  188   54   82   24  310   56
6     224  347  139  253  155  110  327  114    6   87   76   10
7     306   91  122  147   35   85   50  168    8  234   51   12
8     199  181  213  312  321  366   13   48  184  283   97  105
9     194  338  317  219  197  335  277  106  263  342   80   43
10    325  216  323  218   65  206  284   21   71  220  282   41
11    329  150  136   14   37  134  248  324  158  237   46   39
12    221   68  300  346  133  272   15  142  242   72   66  314
13    318  152  259  124  295   69   42  307  175  138  126  163
14    238    4  354  231  178  356  331  198    1  294  127   26
15     17   89  169  273  130  180  322  102  113  171  131  320
16    121  212  166  148   55  274  120   44  207  254  107   96
17    235  189   33  260  112   73   98  154  255  288  143  304
18    140  292  332   90  278  341  190  141  246    5  146  128
19     58   25  200  336   75  104  227  311  177  241  203  240
20    280  302  239  345  183  360  187  344   63  192  185  135
21    186  363  334   62  250   60   27  291  204  243  156   70
22    337  290  265  316  326  247  153  339  160  117    9   53
23    118   57  256  252  319  109  172  116  119  201  182  162
24     59  236  258    2   31  358   23   36  195  196  230   95
25     52  179  343  351  361  137   67  286  149  176  132   84
26     92  365  170  340  357   22  303  245   18    7  309  173
27    355  205  268   74  296   64  289  352  233  264   47   78
28     77  299  223  262  308  222   88  167  257   94  281  123
29    349  285  362  191  226  353  270   61  151  229   99   16
30    164    -  217  208  103  209  287  333  315   38  174    3
31    211    -   30    -  313    -  193   11    -   79    -  100

The first date picked was Sept 14 (sequential number 258)

The second date picked was April 24th (sequential number 115)

What is your draft number?

1969 Vietnam Draft sorted by sequential date

Date    Sequential date   Draft number
Jan 1   1                 305
Jan 2   2                 159
Jan 3   3                 251
Jan 4   4                 215
Jan 5   5                 101
Jan 6   6                 224
Jan 7   7                 306
Jan 8   8                 199
Jan 9   9                 194

1970 Vietnam Draft

In a perfectly fair, random lottery, what should be the value of the correlation coefficient between draft number and sequential date of birthday?
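One way to build intuition for this question is to simulate lotteries that are fair by construction and look at the correlations they produce. The sketch below uses simulated draws only, not the actual draft numbers; the choice of 5000 repetitions is arbitrary.

```r
set.seed(7)

sequential_date <- 1:366  # the lottery table has 366 numbers (Feb 29 included)

# One simulated fair lottery: every ordering of draft numbers is equally likely
fair_draft_number <- sample(1:366)
cor_one_lottery <- cor(sequential_date, fair_draft_number)

# Repeat many times to see how far from 0 the correlation wanders by chance alone
fair_cors <- replicate(5000, cor(sequential_date, sample(1:366)))
mean_cor <- mean(fair_cors)
sd_cor <- sd(fair_cors)

print(c(one_lottery = cor_one_lottery, mean = mean_cor, sd = sd_cor))
```

In a fair lottery the correlation is centered on 0, but any single lottery will give a value slightly off 0, so the spread of `fair_cors` indicates how large a correlation would have to be before it looks suspicious.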

Worksheet 10

Use hypothesis testing to assess whether there is a correlation between sequential date and draft number
•  i.e., was the draft really random?

> source('/home/shared/intro_stats/cs206_functions.R')
> get_worksheet(10)