Henry Horng-Shing Lu
Institute of Statistics
National Chiao Tung University
[email protected]

http://tigpbp.iis.sinica.edu.tw/courses.htm

Nonparametric Methods III

PART 4: Bootstrap and Permutation Tests

Introduction
References
Bootstrap Tests
Permutation Tests
Cross-validation
Bootstrap Regression
ANOVA

References

Efron, B.; Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC.

http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-bootstrapping.pdf

http://cran.r-project.org/bin/macosx/2.1/check/bootstrap-check.ex

http://bcs.whfreeman.com/ips5e/content/cat_080/pdf/moore14.pdf


Hypothesis Testing (1)

A statistical hypothesis test is a method of making statistical decisions from and about experimental data.

Null-hypothesis testing just answers the question of “how well the findings fit the possibility that chance factors alone might be responsible.”

This is done by asking and answering a hypothetical question.

http://en.wikipedia.org/wiki/Statistical_hypothesis_testing

Hypothesis Testing (2)

Hypothesis testing is largely the product of Ronald Fisher, Jerzy Neyman, Karl Pearson and (son) Egon Pearson. Fisher was an agricultural statistician who emphasized rigorous experimental design and methods to extract a result from few samples assuming Gaussian distributions.

Hypothesis Testing (3)

Neyman (who teamed with the younger Pearson) emphasized mathematical rigor and methods to obtain more results from many samples and a wider range of distributions. Modern hypothesis testing is an (extended) hybrid of the Fisher and Neyman/Pearson formulations, methods and terminology developed in the early 20th century.

Hypothesis Testing (4)


Hypothesis Testing (7)

Parametric Tests:

Nonparametric Tests:
Bootstrap Tests
Permutation Tests

Confidence Intervals vs. Hypothesis Testing (1)

Interval estimation ("Confidence Intervals") and point estimation ("Hypothesis Testing") are two different ways of expressing the same information.

http://www.une.edu.au/WebStat/unit_materials/c5_inferential_statistics/confidence_interv_hypo.html

Confidence Intervals vs. Hypothesis Testing (2)

If the exact p-value is reported, then the relationship between confidence intervals and hypothesis testing is very close. However, the objectives of the two methods are different:

Hypothesis testing relates to a single conclusion of statistical significance vs. no statistical significance.

Confidence intervals provide a range of plausible values for the population parameter.

Confidence Intervals vs. Hypothesis Testing (3)

Which one?

Use hypothesis testing when you want to do a strict comparison with a pre-specified hypothesis and significance level.

Use confidence intervals to describe the magnitude of an effect (e.g., mean difference, odds ratio, etc.) or when you want to describe a single sample.

http://www.nedarc.org/nedarc/analyzingData/advancedStatistics/convidenceVsHypothesis.html

P-value

http://bcs.whfreeman.com/ips5e/content/cat_080/pdf/moore14.pdf

Achieved Significance Level (ASL)

Definition: A hypothesis test is a way of deciding whether or not the data decisively reject the hypothesis $H_0$.

The achieved significance level of the test (ASL) is defined as

$$\mathrm{ASL} = P(\hat{\theta}^* \ge \hat{\theta} \mid H_0).$$

The smaller the ASL, the stronger the evidence that $H_0$ is false.

The ASL is an estimate of the p-value by permutation and bootstrap methods.

https://www.cs.tcd.ie/Rozenn.Dahyot/453Bootstrap/05_Permutation.pdf

Bootstrap Tests

Methodology
Flowchart
R code

Bootstrap Tests

Beran (1988) showed that bootstrap inference is refined when the quantity bootstrapped is asymptotically pivotal.

The bootstrap is often used as a robust alternative to inference based on parametric assumptions.

http://socserv.mcmaster.ca/jfox/Books/Companion/appendix-bootstrapping.pdf

Hypothesis Testing by a Pivot (1)

Pivot or pivotal quantity: a function of observations whose distribution does not depend on unknown parameters.

http://en.wikipedia.org/wiki/Pivotal_quantity

Examples:

A pivot:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$

when $X_i \overset{iid}{\sim} N(\mu, \sigma^2)$ and $\sigma$ is known, where $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$.

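Hypothesis Testing by a Pivot (2)

An asymptotic pivot:

$$T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \xrightarrow{D} N(0, 1) \quad \text{as } n \to \infty$$

when $X_i \overset{iid}{\sim} N(\mu, \sigma^2)$, where $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$, $\sigma$ is unknown, and

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$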
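As a small numerical illustration of the two pivot formulas, the following Python sketch computes Z (known sigma) and T (sigma estimated by S with the n - 1 divisor). The function names `z_pivot` and `t_pivot` and the data are made up for illustration; they are not part of the original slides, whose own examples are in R.

```python
import math
import statistics


def z_pivot(xs, mu, sigma):
    """Z = (mean - mu) / (sigma / sqrt(n)); sigma is assumed known."""
    n = len(xs)
    return (statistics.fmean(xs) - mu) / (sigma / math.sqrt(n))


def t_pivot(xs, mu):
    """T = (mean - mu) / (S / sqrt(n)); S uses the n - 1 divisor."""
    n = len(xs)
    s = statistics.stdev(xs)  # sample standard deviation
    return (statistics.fmean(xs) - mu) / (s / math.sqrt(n))


# Hypothetical data, for illustration only.
xs = [4.0, 6.0, 5.0, 7.0, 3.0]
z = z_pivot(xs, mu=5.0, sigma=2.0)
t = t_pivot(xs, mu=4.0)
print(z, t)
```

Neither function needs the unknown parameter to evaluate the reference distribution: Z is compared with N(0, 1), and T is compared with N(0, 1) for large n.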

One Sample Bootstrap Tests

T statistics can be regarded as a pivot or an asymptotic pivot when the data are normally distributed.

Bootstrap T tests can be applied when the data are not normally distributed.

Bootstrap T Tests
Flowchart
R code

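Flowchart of Bootstrap T Tests

Data: $x = (x_1, x_2, \ldots, x_n)$, $\hat{\theta} = s(x)$, and $t_0 = \dfrac{\hat{\theta} - \theta_0}{\hat{\sigma}(\hat{\theta})}$.

Bootstrap B times: $x^{*1}, x^{*2}, \ldots, x^{*B}$.

For each bootstrap sample compute $t^*_b = \dfrac{\hat{\theta}^*_b - \theta_0}{\hat{\sigma}(\hat{\theta}^*_b)}$, giving $t^*_1, t^*_2, \ldots, t^*_B$.

$\mathrm{ASL}_{Boot} = \#\{t^*_b \ge t_0\} / B$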
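The flowchart can be sketched in Python as follows, taking the statistic to be the sample mean. One assumption is made loudly here: before resampling, the data are re-centred so that their mean equals $\theta_0$ (the translation used by Efron and Tibshirani (1993) for the one-sample problem), so that the bootstrap distribution of $t^*_b$ reflects $H_0$. The function names, the seed and the data are hypothetical; the original slides use R code that is not reproduced here.

```python
import math
import random
import statistics


def t_stat(xs, theta0):
    """(mean - theta0) / (s / sqrt(n)), with s using the n - 1 divisor."""
    n = len(xs)
    num = statistics.fmean(xs) - theta0
    s = statistics.stdev(xs)
    if s == 0.0:  # degenerate resample: all values equal
        return 0.0 if num == 0.0 else math.copysign(math.inf, num)
    return num / (s / math.sqrt(n))


def bootstrap_t_asl(xs, theta0, B=2000, seed=0):
    """One-sided ASL_Boot = #{t*_b >= t_0} / B for H0: mean = theta0."""
    rng = random.Random(seed)
    n = len(xs)
    t0 = t_stat(xs, theta0)
    xbar = statistics.fmean(xs)
    # Assumption: shift the data so that the null hypothesis holds exactly.
    shifted = [x - xbar + theta0 for x in xs]
    hits = 0
    for _ in range(B):
        star = rng.choices(shifted, k=n)  # resample with replacement
        if t_stat(star, theta0) >= t0:
            hits += 1
    return hits / B


# Hypothetical data, for illustration only.
xs = [10.2, 9.8, 11.5, 10.9, 9.4, 10.7, 11.1, 10.3]
asl = bootstrap_t_asl(xs, theta0=8.0)
print(asl)
```

A small ASL is evidence against $H_0$ in the direction mean > $\theta_0$; a two-sided version would compare absolute values of the t statistics instead.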

Bootstrap T Tests by R

Output


Bootstrap Tests by the BCa

The BCa percentile method is an efficient method to generate bootstrap confidence intervals.

There is a correspondence between confidence intervals and hypothesis testing: a two-sided test at level $\alpha$ rejects $H_0: \theta = \theta_0$ exactly when $\theta_0$ lies outside the $1 - \alpha$ confidence interval.

So, we can use the BCa percentile method to test whether $H_0$ is true.

Example: use BCa to calculate p-value
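The correspondence between a BCa interval and a test can be sketched as follows. This is a minimal Python illustration of the standard BCa construction (bias correction from the share of replicates below the estimate, acceleration from the jackknife), not the code of the R functions named on these slides. The names `bca_interval` and `reject_h0`, the seed and the data are assumptions made for the example. It returns a reject/do-not-reject decision at level alpha rather than a p-value.

```python
import random
import statistics
from statistics import NormalDist


def bca_interval(xs, stat, alpha=0.05, B=2000, seed=0):
    """Two-sided (1 - alpha) BCa bootstrap interval for stat(xs)."""
    rng = random.Random(seed)
    xs = list(xs)
    n = len(xs)
    theta_hat = stat(xs)
    boots = sorted(stat(rng.choices(xs, k=n)) for _ in range(B))
    nd = NormalDist()
    # Bias correction z0: share of replicates below the estimate.
    p = sum(t < theta_hat for t in boots) / B
    p = min(max(p, 1 / (B + 1)), B / (B + 1))  # keep inv_cdf finite
    z0 = nd.inv_cdf(p)
    # Acceleration a from the jackknife (leave-one-out) values.
    jack = [stat(xs[:i] + xs[i + 1:]) for i in range(n)]
    jbar = statistics.fmean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6.0 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def endpoint(q):
        z = nd.inv_cdf(q)
        adj = nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
        k = min(max(int(adj * B), 0), B - 1)  # index of adjusted percentile
        return boots[k]

    return endpoint(alpha / 2), endpoint(1 - alpha / 2)


def reject_h0(xs, stat, theta0, alpha=0.05):
    """Reject H0: theta = theta0 when theta0 falls outside the interval."""
    lo, hi = bca_interval(xs, stat, alpha)
    return not (lo <= theta0 <= hi)


# Hypothetical data, for illustration only.
data = [10.2, 9.8, 11.5, 10.9, 9.4, 10.7, 11.1, 10.3]
print(bca_interval(data, statistics.fmean))
print(reject_h0(data, statistics.fmean, theta0=8.0))
```

A p-value could be approximated by searching for the smallest alpha at which $\theta_0$ leaves the interval; that step is omitted to keep the sketch short.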

BCa Confidence Intervals:

Use the R function boot.ci in the package boot.
Use the R function bcanon in the package bootstrap.

http://qualopt.eivd.ch/stats/?page=bootstrap
http://www.stata.com/capabilities/boot.html

R package "boot.ci(boot)"

http://finzi.psych.upenn.edu/R/library/boot/DESCRIPTION

An Example of "boot.ci" in R

Output


R package "bcanon(bootstrap)"

http://finzi.psych.upenn.edu/R/library/bootstrap/DESCRIPTION

An example of "bcanon" in R

Output


BCa http://qualopt.eivd.ch/stats/?page=bootstrap


Two Sample Bootstrap Tests

Flowchart
R code

Flowchart of Two-Sample Bootstrap Tests

Sample 1: $y = (y_1, y_2, \ldots, y_n)$

Sample 2: $x = (x_1, x_2, \ldots, x_m)$

Combine ($m + n = N$): combined data $d = (d_1, d_2, \ldots, d_n, d_{n+1}, \ldots, d_{n+m}) = (y, x)$, with $\hat{\theta} = s(y) - s(x)$.

Bootstrap B times: $d^*_b = (y^*_b, x^*_b)$ and $\hat{\theta}^*_b = s(y^*_b) - s(x^*_b)$ for $b = 1, 2, \ldots, B$.

$\widehat{\mathrm{ASL}}_{Boot} = \#(\hat{\theta}^*_b \ge \hat{\theta}) / B$
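A minimal Python sketch of this flowchart follows, assuming the statistic s is the sample mean by default and that each bootstrap sample of size N is drawn with replacement from the combined data, with the first n values playing the role of $y^*_b$ and the remaining m values the role of $x^*_b$. The function name, the seed and the data are hypothetical and are not taken from the R code of the original slides.

```python
import random
import statistics


def two_sample_boot_asl(y, x, s=statistics.fmean, B=2000, seed=0):
    """ASL_Boot = #(theta*_b >= theta_hat) / B with theta = s(y) - s(x)."""
    rng = random.Random(seed)
    n, m = len(y), len(x)
    d = list(y) + list(x)            # combined data, N = n + m
    theta_hat = s(y) - s(x)
    hits = 0
    for _ in range(B):
        star = rng.choices(d, k=n + m)   # resample N values with replacement
        y_star, x_star = star[:n], star[n:]
        if s(y_star) - s(x_star) >= theta_hat:
            hits += 1
    return hits / B


# Hypothetical data, for illustration only.
y = [12.1, 13.4, 11.8, 14.0, 12.9, 13.7]
x = [10.2, 9.8, 10.9, 9.5, 10.4, 10.1, 9.9]
print(two_sample_boot_asl(y, x))
```

Resampling from the combined data imposes the null hypothesis that both samples come from one common distribution.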

Two-Sample Bootstrap Tests by R

Output

Permutation Tests

Methodology
Flowchart
R code

Permutation

In several fields of mathematics, the term permutation is used with different but closely related meanings. They all relate to the notion of (re-)arranging elements from a given finite set into a sequence.

http://en.wikipedia.org/wiki/Permutation

Permutation Tests (1)

A permutation test is also called a randomization test, re-randomization test, or an exact test.

If the labels are exchangeable under the null hypothesis, then the resulting tests yield exact significance levels.

Permutation Tests (2)

Confidence intervals can then be derived from the tests.

The theory has evolved from the works of R.A. Fisher and E.J.G. Pitman in the 1930s.

http://en.wikipedia.org/wiki/Pitman_permutation_test

Applications of Permutation Tests (1)

We can use a permutation test only when we can see how to resample in a way that is consistent with the study design and with the null hypothesis.

http://bcs.whfreeman.com/ips5e/content/cat_080/pdf/moore14.pdf

Applications of Permutation Tests (2)

Two-sample problems when the null hypothesis says that the two populations are identical. We may wish to compare population means, proportions, standard deviations, or other statistics.

Matched pairs designs when the null hypothesis says that there are only random differences within pairs. A variety of comparisons is again possible.

Relationships between two quantitative variables when the null hypothesis says that the variables are not related. The correlation is the most common measure of association, but not the only one.


Inference by Permutation Tests (1)

A traditional way is to consider some hypotheses: $F_a \sim N(\mu_a, \sigma^2)$ and $F_b \sim N(\mu_b, \sigma^2)$, and the null hypothesis becomes $H_0: \mu_a = \mu_b$.

Under $H_0$, the statistic $\hat{\theta} = \bar{X}_a - \bar{X}_b$ can be modeled as a normal distribution with mean 0 and variance

$$\sigma^2_{\hat{\theta}} = \sigma^2 \left( \frac{1}{m} + \frac{1}{n} \right).$$

https://www.cs.tcd.ie/Rozenn.Dahyot/453Bootstrap/05_Permutation.pdf

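Inference by Permutation Tests (2)

The ASL is then computed by

$$\widehat{\mathrm{ASL}} = \int_{\hat{\theta}}^{\infty} \frac{1}{\sqrt{2\pi}\, \sigma_{\hat{\theta}}} \, e^{-\frac{(\hat{\theta}^*)^2}{2 \sigma^2_{\hat{\theta}}}} \, d\hat{\theta}^*$$

where $\sigma$ is unknown and has to be estimated from the data by

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (X_{ai} - \bar{X}_a)^2 + \sum_{i=1}^{m} (X_{bi} - \bar{X}_b)^2}{m + n - 2}.$$

We will reject $H_0$ if $\mathrm{ASL} < \alpha$.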
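As a worked illustration of this normal-theory ASL, the integral is the upper tail of a N(0, $\sigma^2_{\hat{\theta}}$) distribution beyond $\hat{\theta}$, which equals $\frac{1}{2}\,\mathrm{erfc}\big(\hat{\theta} / (\sigma_{\hat{\theta}} \sqrt{2})\big)$. The Python sketch below plugs in the pooled variance estimate; the function name and the data are made up for the example and do not come from the original slides.

```python
import math
import statistics


def normal_theory_asl(a, b):
    """Upper-tail ASL for theta_hat = mean(a) - mean(b) under normality."""
    n, m = len(a), len(b)
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    theta_hat = mean_a - mean_b
    # Pooled variance estimate with divisor m + n - 2.
    pooled = (sum((v - mean_a) ** 2 for v in a)
              + sum((v - mean_b) ** 2 for v in b)) / (m + n - 2)
    se = math.sqrt(pooled * (1.0 / m + 1.0 / n))
    # P(N(0, se^2) >= theta_hat) = 0.5 * erfc(theta_hat / (se * sqrt(2)))
    return 0.5 * math.erfc(theta_hat / (se * math.sqrt(2)))


# Hypothetical data, for illustration only.
a = [12.1, 13.4, 11.8, 14.0]
b = [10.2, 9.8, 10.9, 9.5]
print(normal_theory_asl(a, b))
```

With $\alpha$ = 0.05, $H_0$ would be rejected whenever the returned value is below 0.05.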

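Flowchart of The Permutation Test for Mean Shift in One Sample

Sample: $x = (x_1, x_2, \ldots, x_n, x_{n+1}, x_{n+2}, \ldots, x_{n+m})$, with $n + m = N$.

Original groups $O_{11}$ and $O_{12}$: $\mathbf{x}_1$ (treatment group) and $\mathbf{x}_2$ (control group), with $\hat{\theta} = s(\mathbf{x}_1) - s(\mathbf{x}_2)$.

Partition into 2 subsets B times: for $b = 1, 2, \ldots, B$, the groups $G_{b1}$ and $G_{b2}$ hold $\mathbf{x}^*_{1b}$ (treatment group) and $\mathbf{x}^*_{2b}$ (control group), with $\hat{\theta}^*_b = s(\mathbf{x}^*_{1b}) - s(\mathbf{x}^*_{2b})$.

$\widehat{\mathrm{ASL}}_{Perm} = \#(\hat{\theta}^*_b \ge \hat{\theta}) / B$, and $B = C^N_n$.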
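The flowchart can be sketched in Python by enumerating all $C^N_n$ partitions, which is feasible only for small N. The statistic is assumed to be the sample mean by default, and the function name and data are hypothetical; the R code of the original slides is not reproduced here.

```python
import itertools
import statistics


def permutation_asl(group1, group2, s=statistics.fmean):
    """Exact ASL_Perm = #(theta*_b >= theta_hat) / B over all B = C(N, n) splits."""
    pooled = list(group1) + list(group2)
    n, N = len(group1), len(pooled)
    theta_hat = s(group1) - s(group2)
    hits = total = 0
    for idx in itertools.combinations(range(N), n):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]                       # treatment group
        g2 = [pooled[i] for i in range(N) if i not in chosen]  # control group
        total += 1
        if s(g1) - s(g2) >= theta_hat:
            hits += 1
    return hits / total


# Hypothetical data, for illustration only: 3 treated and 3 control units.
treatment = [5, 6, 7]
control = [1, 2, 3]
print(permutation_asl(treatment, control))  # 1 of the 20 splits is as extreme: 0.05
```

For larger N, the same loop can be run over B randomly chosen partitions instead of all of them, in which case the result is a Monte Carlo approximation of the exact value.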

An Example for One Sample Permutation Test by R (1)


An Example for One Sample Permutation Test by R (2)

http://mason.gmu.edu/~csutton/EandTCh15a.txt


An Example for One Sample Permutation Test by R (3) Output


Flowchart of The Permutation Test for Mean Shift in Two Samples

Sample 1: $y = (y_1, y_2, \ldots, y_n)$

Sample 2: $x = (x_1, x_2, \ldots, x_m)$

Combine ($m + n = N$): combined data $d = (d_1, d_2, \ldots, d_n, d_{n+1}, \ldots, d_{n+m}) = (y, x)$, with $\hat{\theta} = s(y) - s(x)$.

Partition into subsets B times: for $b = 1, 2, \ldots, B$, the combined data are split into two subgroups $G_{b1}$ and $G_{b2}$ (a treatment subgroup and a control subgroup), which play the roles of $y^*_b$ (size $n$) and $x^*_b$ (size $m$), with $\hat{\theta}^*_b = s(y^*_b) - s(x^*_b)$.

$\widehat{\mathrm{ASL}}_{Perm} = \#(\hat{\theta}^*_b \ge \hat{\theta}) / B$, and $B = C^N_n$.

Bootstrap Tests vs. Permutation Tests

The permutation test and the bootstrap test give very similar results.

$\mathrm{ASL}_{Perm}$ is the exact probability when $B = C^N_n$.

$\mathrm{ASL}_{Boot}$ is not an exact probability but is guaranteed to be accurate as an estimate of the ASL, as the number of bootstrap replications B goes to infinity.

https://www.cs.tcd.ie/Rozenn.Dahyot/453Bootstrap/05_Permutation.pdf

Cross-validation

Methodology
R code

Cross-validation

Cross-validation, sometimes called rotation estimation, is the statistical practice of partitioning a sample of data into subsets such that the analysis is initially performed on a single subset, while the other subset(s) are retained for subsequent use in confirming and validating the initial analysis.

The initial subset of data is called the training set. The other subset(s) are called validation or testing sets.

http://en.wikipedia.org/wiki/Cross-validation
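One common form of this idea is k-fold cross-validation, sketched below in Python for a straight-line fit. This is an illustration under stated assumptions and not the crossval function of the R package bootstrap: the model (least-squares line), the error measure (mean squared prediction error), the function names, the seed and the data are all chosen for the example.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def kfold_mse(xs, ys, k=5, seed=0):
    """Mean squared prediction error estimated by k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # k disjoint validation sets
    sq_err = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]   # training set
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err += sum((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in fold)
    return sq_err / len(xs)


# Hypothetical data, for illustration only.
xs = [float(i) for i in range(10)]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.2, 12.8, 15.1, 17.0, 18.9]
print(kfold_mse(xs, ys))
```

Each observation is predicted by a model that was fitted without it, so the estimate is not flattered by overfitting to the training data.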

Overfitting Problems (1)

In statistics, overfitting is fitting a statistical model that has too many parameters.

When the degrees of freedom in parameter selection exceed the information content of the data, this leads to arbitrariness in the final (fitted) model parameters which reduces or destroys the ability of the model to generalize beyond the fitting data.

Overfitting Problems (2)

The concept of overfitting is important also in machine learning.

In both statistics and machine learning, in order to avoid overfitting, it is necessary to use additional techniques (e.g. cross-validation, early stopping, Bayesian priors on parameters or model comparison) that can indicate when further training is not resulting in better generalization.

http://en.wikipedia.org/wiki/Overfitting

R package “crossval(bootstrap)”

An Example of Cross-validation by R

Output

Bootstrap Regression

Bootstrapping pairs: Resample from the sample pairs $\{(x_i, y_i)\}$.

Bootstrapping residuals:
1. Fit $\hat{y}_i = x_i \hat{\beta}$ by the original sample and obtain the residuals.
2. Resample from residuals.
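Both schemes can be sketched in Python for a straight-line fit with an intercept, collecting the bootstrap distribution of the slope. The function names, the seed and the data are hypothetical, and the model with an intercept is an assumption made for the example; the R code of the original slides is not reproduced here.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def pairs_bootstrap_slopes(xs, ys, B=1000, seed=0):
    """Bootstrapping pairs: resample (x_i, y_i) together, refit each time."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < B:
        idx = rng.choices(range(n), k=n)
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:      # slope undefined if all x are equal
            continue
        slopes.append(fit_line(bx, [ys[i] for i in idx])[1])
    return slopes


def residual_bootstrap_slopes(xs, ys, B=1000, seed=0):
    """Bootstrapping residuals: keep x fixed, add resampled residuals to the fit."""
    rng = random.Random(seed)
    b0, b1 = fit_line(xs, ys)
    fitted = [b0 + b1 * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    slopes = []
    for _ in range(B):
        e_star = rng.choices(resid, k=len(xs))
        y_star = [f + e for f, e in zip(fitted, e_star)]
        slopes.append(fit_line(xs, y_star)[1])
    return slopes


# Hypothetical data, for illustration only.
xs = [float(i) for i in range(1, 11)]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
print(statistics.stdev(pairs_bootstrap_slopes(xs, ys)))
print(statistics.stdev(residual_bootstrap_slopes(xs, ys)))
```

The standard deviation of each list is a bootstrap estimate of the standard error of the slope. Bootstrapping residuals relies on the fitted model being correct with exchangeable errors, whereas bootstrapping pairs does not.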

Bootstrapping Pairs by R (1)

http://www.stat.uiuc.edu/~babailey/stat328/lab7.html


Bootstrapping Pairs by R (2) Output


Bootstrapping Residuals by R

Output


ANOVA

When random errors follow a normal distribution:

When random errors do not follow a normal distribution:
Bootstrap tests:
Permutation tests:
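For the non-normal case, a permutation test for one-way ANOVA can be sketched in Python as follows. The sketch assumes the usual F statistic as the test statistic and approximates its permutation distribution by B random reshufflings of the group labels. The function names, the seed and the data are made up for the example and are not the data or R code of the original slides.

```python
import random


def f_stat(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_vals = [v for g in groups for v in g]
    N, k = len(all_vals), len(groups)
    grand = sum(all_vals) / N
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ssw == 0.0:                 # no within-group variation at all
        return float("inf")
    return (ssb / (k - 1)) / (ssw / (N - k))


def permutation_f_asl(groups, B=2000, seed=0):
    """ASL_Perm = #(F*_b >= F_observed) / B under random relabelling."""
    rng = random.Random(seed)
    f0 = f_stat(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(B):
        rng.shuffle(pooled)        # random reassignment of values to groups
        start, perm = 0, []
        for size in sizes:
            perm.append(pooled[start:start + size])
            start += size
        if f_stat(perm) >= f0:
            hits += 1
    return hits / B


# Hypothetical data for three groups, for illustration only.
groups = [[1.0, 1.2, 0.9, 1.1], [2.0, 2.1, 1.9, 2.2], [3.0, 3.1, 2.9, 3.2]]
print(permutation_f_asl(groups))
```

A bootstrap version would draw the reshuffled values with replacement from the pooled data instead of permuting them.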

An Example of ANOVA by R (1)

Example: Twenty lambs are randomly assigned to three different diets. The weight gain (in two weeks) is recorded. Is there a difference among the diets?

http://mcs.une.edu.au/~stat261/Bootstrap/bootstrap.R

An Example of ANOVA by R (2)


An Example of ANOVA by R (5) Output


An Example of ANOVA by R (1)

Data source: http://finzi.psych.upenn.edu/R/library/rpart/html/kyphosis.html

Reference: http://www.stat.umn.edu/geyer/5601/examp/parm.html

An Example of ANOVA by R (2)

Kyphosis is a misalignment of the spine. The data are on 83 laminectomy (a surgical procedure involving the spine) patients. The predictor variables are age and age^2 (that is, a quadratic function of age), number (the number of vertebrae involved in the surgery), and start (the vertebra number of the first vertebra involved). The response is presence or absence of kyphosis after the surgery (and perhaps caused by it).



An Example of ANOVA by R (4) Output


Exercises

Write your own programs similar to the examples presented in this talk.

Write programs for the examples mentioned at the reference web pages.

Write programs for other examples that you know.

Practice Makes Perfect!
