Extending the Bootstrap Using a Univariate Sampling Model to Multivariate Settings
Joseph Lee Rodgers and William Beasley, University of Oklahoma

Transcript

Page 1

Extending the Bootstrap Using a Univariate Sampling Model to Multivariate Settings

Joseph Lee Rodgers and William Beasley, University of Oklahoma

Page 2

Focusing on the Alternative (Researcher’s) Hypothesis: AHST

Joseph Lee Rodgers and William Beasley, University of Oklahoma

Page 3

Goals

• To review history of NHST
• To review recent attention to AHST
• To show how the bootstrap is facile in AHST settings using univariate correlations
• To extend this conceptualization into multivariate correlation settings
  – Multiple regression
  – Factor analysis, etc.

Page 4

History of NHST

• Not necessary, though entertaining
• NHST emerged in the 1920s and 1930s from Fisher's hypothesis testing paradigm, and from the Neyman-Pearson decision-making paradigm
• In its modern version, it combines Fisher's attention to the null hypothesis and p-value with Neyman-Pearson's development of alpha, the alternative hypothesis, and statistical power
• The result: "an incoherent mishmash of some of Fisher's ideas on the one hand, and some of the ideas of Neyman and E. S. Pearson on the other hand" (Gigerenzer, 1993)

Page 5

Criticism of NHST

• NHST “never makes a positive contribution” (Schmidt & Hunter, 1997)

• NHST “has not only failed to support and advance psychology as a science but also has seriously impeded it” (Cohen, 1994)

• NHST is "surely the most boneheadedly misguided procedure ever institutionalized in the rote training of science students" (Rozeboom, 1997)

Page 6

• In a recent American Psychologist paper I suggested that the field of quantitative methodology has transitioned away from NHST toward a modeling framework that emphasizes the researcher’s hypothesis – an AHST framework – with relatively little discussion

Page 7

Praise for AHST

• AHST = "Alternative Hypothesis Significance Testing"

• “Most tests of null hypotheses are rather feckless and potentially misleading. However, an additional brand of sensible significance tests arises in assessing the goodness-of-fit of substantive models to data.” (Abelson, 1997)

Page 8

• “After the introduction of … structural models …, it soon became apparent that the structural modeler has, in some sense, the opposite intention to the experimentalist. The latter hopes to “reject” a restrictive hypothesis of the absence of certain causal effects in favor of their presence—rejection permits publication. . . . The former wishes to “accept” a restrictive model of the absence of certain causal effects—acceptance permits publication.” (McDonald, 1997)

Page 9

• Our “approach allows for testing null hypotheses of not-good fit, reversing the role of the null hypothesis in conventional tests of model fit, so that a significant result provides strong support for good fit.” (MacCallum, Browne, & Sugawara, 1996)

Page 10

More background

• Both Gosset and Fisher wanted to test hypotheses using resampling methods:

• “Before I had succeeded in solving my problem analytically, I had endeavored to do so empirically.” (Gosset, 1908)

• “[The] conclusions have no justification beyond the fact they could have been arrived at by this very elementary [re-randomization] method.” (Fisher, 1936)

Page 11

• Both actually did resampling (re-randomization)
  – Gosset (1908) used N = 3000 data points collected from prisoners, written on slips of paper and drawn from a hat
  – Fisher (1920s) used some of Darwin's data, comparing cross-fertilized and self-fertilized corn

• So one reasonable view is that these statistical pioneers developed parametric statistical procedures because they lacked the computational resources to use resampling methods

Page 12

Modern Resampling

• Randomization or permutation tests – Gosset and Fisher

• The Jackknife – Quenouille, 1949; Tukey, 1953

• The Bootstrap – Efron, 1979

Page 13

Bootstrapping Correlations

• Early work
  – Diaconis and Efron (Scientific American, 1983) provided conceptual motivation
  – But straightforward percentile-based bootstraps didn't work very well
  – So they were bias-corrected and accelerated, and for a while the BCa bootstrap was the state of the art

Page 14

• Lee and Rodgers (1998) showed how to regain conceptual simplicity using univariate sampling, rather than bivariate sampling
  – Especially effective in small samples and highly skewed settings
  – Was as good as, or even better than, parametric methods in normal distribution settings

• Example using Diaconis & Efron (1983) data

Page 15

Page 16

• Beasley et al. (2007) applied the same univariate sampling logic to test nonzero null hypotheses about correlations

• The methodology used there is what we are currently extending to multivariate settings, and it will be described in detail

Page 17

• Steps in the Beasley et al. approach (a minimal sketch follows):
  – Given observed bivariate data, and sample r
  – Define a univariate sampling frame (rectangular) that respects the two marginal distributions
  – Diagonalize the sampling frame to have a given correlation using a matrix square root procedure (e.g., Kaiser & Dickman, 1962)
  – Use the new sampling frame to generate bootstrap samples and construct an empirical sampling distribution
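To make these steps concrete, here is a minimal Python sketch under our own assumptions: the function name, the use of a Cholesky factor as the matrix square root, and the construction of the rectangular frame from all N × N pairings are our illustration, not the authors' code.

```python
import numpy as np

def univariate_frame_bootstrap(x, y, rho, n_boot=10_000, seed=0):
    """Bootstrap distribution of r from a univariate (rectangular) sampling
    frame that has been 'diagonalized' to carry the correlation rho."""
    rng = np.random.default_rng(seed)
    n = len(x)

    # Steps 1-2: rectangular frame of all (x_i, y_j) pairs, which respects
    # both marginals and has zero correlation between columns by construction.
    xx, yy = np.meshgrid(np.asarray(x, float), np.asarray(y, float), indexing="ij")
    frame = np.column_stack([xx.ravel(), yy.ravel()])
    frame = (frame - frame.mean(0)) / frame.std(0, ddof=1)

    # Step 3: impose the target correlation (in the spirit of Kaiser & Dickman,
    # 1962); a Cholesky factor serves as the matrix square root here.
    L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    frame = frame @ L.T

    # Step 4: resample rows from the transformed frame to build the
    # empirical sampling distribution of the correlation.
    r_boot = np.empty(n_boot)
    for b in range(n_boot):
        rows = frame[rng.integers(0, len(frame), size=n)]
        r_boot[b] = np.corrcoef(rows[:, 0], rows[:, 1])[0, 1]
    return r_boot
```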

Page 18

• Two methods (see the usage sketch below):
  – HI, hypothesis-imposed – generate an empirical bootstrap distribution around the hypothesized non-zero null correlation
    • E.g., to test ρ = .5, diagonalize the sampling frame to have a correlation of .5, bootstrap, then evaluate whether the observed r is contained in the 95% percentile interval
  – OI, observed-imposed – generate an empirical bootstrap distribution around the observed r
    • E.g., to test ρ = .5, diagonalize the sampling frame to have a correlation equal to the observed r, bootstrap, then evaluate whether the hypothesized ρ = .5 is contained in the 95% percentile interval
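A usage sketch of the two tests, reusing the hypothetical univariate_frame_bootstrap helper sketched on the previous page; the toy data and cutoffs are for illustration only.

```python
import numpy as np

# Toy bivariate data (illustration only).
rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 0.5 * x + rng.normal(size=30)
r_obs = np.corrcoef(x, y)[0, 1]

rho_null = 0.5  # hypothesized non-zero correlation

# HI: build the bootstrap distribution around the hypothesized rho,
# then ask whether the observed r falls inside the 95% percentile interval.
hi_dist = univariate_frame_bootstrap(x, y, rho=rho_null)
hi_lo, hi_hi = np.quantile(hi_dist, [0.025, 0.975])
reject_hi = not (hi_lo <= r_obs <= hi_hi)

# OI: build the bootstrap distribution around the observed r,
# then ask whether the hypothesized rho falls inside the interval.
oi_dist = univariate_frame_bootstrap(x, y, rho=r_obs)
oi_lo, oi_hi = np.quantile(oi_dist, [0.025, 0.975])
reject_oi = not (oi_lo <= rho_null <= oi_hi)
```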

Page 19

Page 20

• Both OI and HI work effectively; OI seems to work slightly better

• Beasley's (2010) dissertation developed a Bayesian approach that is highly computationally intensive – both the bootstrap and the Bayesian method require lots of computational resources – but it works quite well

Page 21

Review the AHST logic

• Define a hypothesis, model, or theory that makes a prediction (e.g., of a certain correlation, or a correlation structure)
• Define a sampling distribution in relation to that hypothesis
• Directly test the hypothesis, using an AHST logic
• Why didn't Gosset/Fisher or Neyman-Pearson do this? They didn't have a method to generate a sampling distribution in relation to the alternative hypothesis – they only had the computational/mathematical ability to generate a sampling distribution around the null, so that's what they did, and the rest is history (we hope)

Page 22

• But using resampling theory, and the bootstrap in particular, we can generate a sampling distribution (empirically) in settings with high skewness, unknown distribution, small N, using either HI or OI logic

• Note – be prepared for some computationally intensive methods

• The programmers and computers will have lots of work to do

• But to applied researchers, the consumers of these methods, this computational intensity can be transparent

Page 23

Previous applications

• Bollen & Stine (1993) used a square root transformation to adjust the bootstrap for SEM fit indices, using similar logic to that defined above

• Parametric bootstraps are now popular – these use a distributional model of the data, rather than the observed data, to draw the bootstrap samples (a minimal sketch follows this list)

• Zhang & Browne (2010) used this method in a dynamic factor analysis of time series data (in which the model was partially imposed by using a moving block bootstrap across the time series)
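A minimal sketch of the parametric-bootstrap idea mentioned above: a fitted distributional model (a bivariate normal here, chosen only for illustration) stands in for the observed data when bootstrap samples are drawn. This is not the Bollen-Stine or Zhang-Browne procedure, just the general idea.

```python
import numpy as np

def parametric_bootstrap_r(x, y, n_boot=10_000, seed=0):
    """Parametric bootstrap of the correlation: fit a simple distributional
    model (a bivariate normal, for illustration) and draw the bootstrap
    samples from the fitted model rather than from the data themselves."""
    rng = np.random.default_rng(seed)
    data = np.column_stack([x, y]).astype(float)
    mu_hat = data.mean(axis=0)
    cov_hat = np.cov(data, rowvar=False)        # fitted model parameters
    r_boot = np.empty(n_boot)
    for b in range(n_boot):
        sim = rng.multivariate_normal(mu_hat, cov_hat, size=len(data))
        r_boot[b] = np.corrcoef(sim[:, 0], sim[:, 1])[0, 1]
    return r_boot
```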

Page 24

MV “diagonalization”

• The major requirement for extending this type of AHST is a set of methods for imposing the hypothesized MV model on a univariate sampling frame (a naive sketch appears below)

• There have been advances in this regard recently
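A naive sketch of what such "MV diagonalization" could look like if the bivariate matrix-square-root trick were applied directly to a p-variable univariate sampling frame. The function and its details are our own illustration; note that this linear transformation distorts nonnormal marginals, which is part of why the specialized methods on the next pages are of interest.

```python
import numpy as np

def naive_mv_diagonalize(columns, R_target, n_points, seed=0):
    """Resample each variable independently (univariate sampling), then impose
    the hypothesized correlation matrix R_target with a Cholesky factor.
    A direct generalization of the bivariate case; it does NOT preserve
    nonnormal marginal shapes exactly."""
    rng = np.random.default_rng(seed)
    # Independent (univariate) draws from each observed marginal.
    Z = np.column_stack([rng.choice(np.asarray(c, float), size=n_points, replace=True)
                         for c in columns])
    Z = (Z - Z.mean(0)) / Z.std(0, ddof=1)
    L = np.linalg.cholesky(np.asarray(R_target, float))
    return Z @ L.T   # frame with (approximately) the hypothesized structure
```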

Page 25

Cudeck & Browne (1992), Psychometrika: "Constructing a Covariance Matrix that Yields a Specified Minimizer and a Specified Minimum Discrepancy Function Value"

• Cudeck and Browne (1992) showed how to construct a covariance matrix according to a model with a prescribed lack of fit, designed specifically for Monte Carlo research

• In fact, such Monte Carlo methods – designed to produce matrices to study – themselves become hypothesis testing methods in the current paradigm

• We won't use this method here, because in our application we need to produce raw data with a specified correlation structure, rather than the covariance/correlation matrix

• But this method can help when extensions to covariance structure analysis are considered

Page 26

Headrick (2002), Computational Statistics and Data Analysis: "Fast fifth-order polynomial transforms for generating univariate and multivariate nonnormal distributions"

• Power method (using a high-order polynomial transformation)

• Draw MV normal data and transform, using up to a fifth-order polynomial, which reproduces up to six moments of the specified nonnormal distribution (a toy sketch of the transform step follows)
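A toy sketch of the transform step only. The coefficients below are placeholders: in Headrick's method they are solved from moment equations for the target distribution, and the underlying normal variates are drawn with "intermediate" correlations so that the transformed data reach the target correlations; neither solving step is shown here.

```python
import numpy as np

def power_method_transform(Z, coefs):
    """Apply a fifth-order polynomial transform c0 + c1*z + ... + c5*z**5 to
    each column of standard-normal data Z. The coefficient rows in `coefs`
    are placeholders; in practice they are solved to match the first six
    moments of the desired nonnormal distribution (Headrick, 2002)."""
    Z = np.asarray(Z, float)
    out = np.zeros_like(Z)
    for j, c in enumerate(coefs):          # one coefficient vector per variable
        out[:, j] = sum(ck * Z[:, j] ** k for k, ck in enumerate(c))
    return out

# Illustration: identity-like coefficients (c1 = 1, all else 0) leave the
# normal column unchanged; real coefficient sets induce skew and kurtosis.
rng = np.random.default_rng(0)
Z = rng.standard_normal((1000, 2))
X = power_method_transform(Z, coefs=[[0, 1, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0]])
```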

Page 27

Ruscio & Kaczetow (2008), MBR: "Simulating Multivariate Nonnormal Data Using an Iterative Algorithm"

• SI method (Sample and Iterate) – a simplified sketch follows below
• "… implements the common factor model with user-specified non-normal distributions to reproduce a target correlation matrix"

• Construct relatively small datasets with specified correlation structure
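A simplified, one-pass sketch of the core idea as we read it: generate data with the target correlation structure, then substitute the user-specified marginals rank for rank. The actual SI algorithm iterates on an intermediate correlation matrix to repair the distortion this substitution introduces; that loop, and all names below, are our own illustration.

```python
import numpy as np

def rank_substitute(R_target, marginal_samplers, n, seed=0):
    """Draw multivariate normal data with correlation R_target, then replace
    each column, rank for rank, with sorted draws from the corresponding
    user-specified marginal. One pass only; the SI algorithm iterates on an
    intermediate correlation matrix to correct the resulting distortion."""
    rng = np.random.default_rng(seed)
    p = len(marginal_samplers)
    Z = rng.multivariate_normal(np.zeros(p), np.asarray(R_target, float), size=n)
    X = np.empty_like(Z)
    for j, sampler in enumerate(marginal_samplers):
        draws = np.sort(sampler(rng, n))        # target marginal values
        ranks = Z[:, j].argsort().argsort()     # rank (0..n-1) of each row
        X[:, j] = draws[ranks]
    return X

# Illustration with a normal Y and three chi-square(3) predictors:
R = np.array([[1.0, .4, .4, .4],
              [.4, 1.0, .4, .4],
              [.4, .4, 1.0, .4],
              [.4, .4, .4, 1.0]])
samplers = [lambda rng, n: rng.standard_normal(n)] + \
           [lambda rng, n: rng.chisquare(3, n)] * 3
data = rank_substitute(R, samplers, n=200)
```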

Page 28

Work-in-progress – Using AHST with Multiple Regression, bootstrapping R²

• Design
  – Four types of hypothesis tests
    • MV parametric procedure
    • MV sampling, regular bootstrap
    • Ruscio sampling for MV diagonalization
    • Ruscio sampling for MV diagonalization, sampling N² points
    • Note: all bootstrap procedures were bias-corrected
  – Three 4 × 4 correlation matrices
    • One completely uncorrelated, two correlated patterns
  – Seven distributional patterns
  – 10,000 bootstrap cycles
(A rough sketch of a single R² bootstrap cell follows.)
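To connect the design to the earlier machinery, here is a rough sketch of one R² bootstrap "cell" of the kind just described, reusing the hypothetical naive_mv_diagonalize helper from Page 24 as a stand-in for the Ruscio-based diagonalization; the real study uses the Ruscio method and bias correction, which are not shown.

```python
import numpy as np

def r_squared(data):
    """R² for the first column regressed on the rest (with intercept)."""
    y, X = data[:, 0], np.column_stack([np.ones(len(data)), data[:, 1:]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

def boot_r2(data, R_null, n_boot=10_000, seed=0):
    """HI-style bootstrap of R² under a hypothesized correlation matrix,
    using the naive univariate-frame diagonalization sketched earlier."""
    rng = np.random.default_rng(seed)
    n = len(data)
    frame = naive_mv_diagonalize(data.T, R_null, n_points=n * n, seed=seed)
    r2 = np.empty(n_boot)
    for b in range(n_boot):
        rows = frame[rng.integers(0, len(frame), size=n)]
        r2[b] = r_squared(rows)
    return r2

# Hypothetical usage:
# r2_dist = boot_r2(data, R_null)
# lo, hi = np.quantile(r2_dist, [0.025, 0.975])
# reject = not (lo <= r_squared(data) <= hi)
```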

Page 29

Correlation matrices

(Population matrices below; MV data with this correlation structure were generated using Headrick's method)

Matrix 1 (R² = .27):
      Y    X1   X2   X3
Y     1    .4   .4   .4
X1    .4   1    .4   .4
X2    .4   .4   1    .4
X3    .4   .4   .4   1

Matrix 2 (R² = .27):
      Y    X1   X2   X3
Y     1    .4   .4   0
X1    .4   1    .2   0
X2    .4   .2   1    0
X3    0    0    0    1

Page 30

Distributional patterns

• These combined normal, 1 df chi-square, and 3 df chi-square variables:
  – Normal Y, X1, X2, and X3
  – Normal Y; 1 df chi-square X1, X2, and X3
  – Normal Y; 3 df chi-square X1, X2, and X3
  – 1 df chi-square Y; normal X1, X2, and X3
  – 3 df chi-square Y; normal X1, X2, and X3
  – 1 df chi-square Y, X1, X2, and X3
  – 3 df chi-square Y, X1, X2, and X3

Page 31

How these raw data look

Page 32

[Scatterplot matrix of Y, X1, X2, and X3 for the "Normal Y, ChiSq3 X" condition]

Page 33

[Scatterplot matrix of Y, X1, X2, and X3 for the "ChiSq3 Y, ChiSq3 X" condition]

Page 34

• To evaluate this method, we put it within a Monte Carlo design, replicating this process 1000 times per cell

• Results:

Page 35

[Figure: rejection rates (y-axis, roughly .05 to .20) plotted against sample size (30, 60, 100, 200) for the two correlated population matrices ("All .4" and "Two .4, One .2, Three 0") under three distributional conditions (Normal Y with Normal X; ChiDf3 Y with ChiDf3 X; Normal Y with ChiDf3 X), comparing four methods: Analytic, MV Sampling, OI Ruscio, and OI Ruscio Sq]

Page 36

Comments

• Based on these patterns, so far we’re not convinced that the implementation of the Ruscio method works effectively for this problem

• There are theoretical reasons to prefer Headrick, because that method respects not only the marginals, but also the original moments – it recreates our specific population distributions better

Page 37

• What's next for multiple regression/GLM?
  – Evaluate Headrick's procedure
  – Move on to model comparisons – bootstrap the F statistic to compare two nested linear models
  – Expand the number of correlation structures

Page 38

• What's next more broadly?
  – This approach appears to be generalizable – using a MV data generation routine to produce observations with a structure consistent with a model, then bootstrapping some appropriate statistic off of that alternative hypothesis to test the model
  – To CFA, for example
  – To HLM, with hypothesized structure

Page 39

Conclusion

• Rodgers (2010): “The … focal point is no longer the null hypothesis; it is the current model. This is exactly where the researcher—the scientist—should be focusing his or her concern.”