Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMM

Apr 14, 2017


Ivan Rodriguez

Motivation: researchers often obtain copious and/or incomplete data that can be superfluous or collinear in terms of explaining particular outcomes; even sophisticated analysis software and technology struggle under these conditions.

Purpose: compare Principal Component Analysis (PCA), Partial Least Squares (PLS), and Johnson-Lindenstrauss-inspired Random Matrices (RMs) in terms of reducing dataset dimensionality while retaining practical generality by reducing bias and mean-squared error (MSE).

Survival Analysis: overcomes limitations of standard regression approaches; accommodates strictly positive response values; can handle censoring.

Accelerated Failure Time (AFT) Model: provides intuitive interpretation of predictor and response variables via survivor curves; directly models survival times.
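As a rough illustration of how an AFT model yields a survivor curve, the Python sketch below assumes a log-normal AFT form, log T = x·β + σε with ε standard normal. The poster does not state which error distribution was used, so that choice, the function name `aft_survivor`, and its parameters are assumptions made only for this example.

```python
import math


def aft_survivor(t, x, beta, sigma=1.0):
    """Survivor function S(t | x) = P(T > t) under an ASSUMED log-normal AFT model.

    Model (illustrative assumption): log T = x . beta + sigma * eps, eps ~ N(0, 1).
    """
    if t <= 0:
        # Survival times are positive, so S(t) = 1 for t <= 0.
        return 1.0
    # Linear predictor: the mean of log T for this covariate vector.
    mu = sum(xi * bi for xi, bi in zip(x, beta))
    z = (math.log(t) - mu) / sigma
    # 1 - Phi(z), written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

Under this form, covariates shift log survival time additively, which is what "directly models survival times" amounts to: increasing x·β moves the whole curve to the right rather than rescaling a hazard.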

PCA: obtains components/eigenvalues from the data’s variance-covariance matrix; maximizes the variance of linear combinations of the predictor variables; produces new uncorrelated variables by constructing orthogonal transformations of the covariates.
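A minimal Python sketch of the idea, assuming the leading component is taken as the top eigenvector of the sample variance-covariance matrix and found by power iteration. The helper names `covariance_matrix` and `first_principal_component` are hypothetical, and this is not the poster's R implementation.

```python
import math


def covariance_matrix(rows):
    """Sample variance-covariance matrix of a list of observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_principal_component(rows, iters=200):
    """Leading eigenvector and eigenvalue of the covariance matrix (power iteration).

    Assumption: the starting vector is not orthogonal to the leading eigenvector.
    """
    cov = covariance_matrix(rows)
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: zero covariance matrix
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the variance explained along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue
```

Later components would be obtained the same way after removing the part of the data explained by earlier ones; a full eigendecomposition routine is the usual choice in practice.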

PLS: similar to PCA; however, PLS maximizes the covariance between linear combinations of the predictor variables and the response; projects predictor and response variables into a new space to model the covariance structure.
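To make the contrast with PCA concrete, the Python sketch below computes only the first PLS direction for a single response: the unit vector w proportional to Xᵀy (both centered), which maximizes the sample covariance between Xw and y among unit vectors. The function name `first_pls_direction` is hypothetical, and the poster's analysis presumably extracted more than one component.

```python
import math


def first_pls_direction(rows, y):
    """First PLS weight vector and scores for a single response.

    w is proportional to Xc' yc, where Xc and yc are the centered predictors
    and response. Illustrative sketch only.
    """
    n, p = len(rows), len(rows[0])
    x_means = [sum(r[j] for r in rows) / n for j in range(p)]
    y_mean = sum(y) / n
    # Cross-product of each centered predictor column with the centered response.
    w = [sum((rows[i][j] - x_means[j]) * (y[i] - y_mean) for i in range(n))
         for j in range(p)]
    norm = math.sqrt(sum(wj * wj for wj in w))
    if norm == 0.0:
        raise ValueError("predictors have zero sample covariance with the response")
    w = [wj / norm for wj in w]
    # Scores: projection of each centered observation onto w.
    scores = [sum((rows[i][j] - x_means[j]) * w[j] for j in range(p))
              for i in range(n)]
    return w, scores
```

Unlike the PCA direction, w changes when y changes, which is the sense in which PLS uses the response while building its components.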

According to the results, PCA outperforms PLS, all three RM variants are comparable, and all RMs are superior to PCA and PLS.

1. Integrate censored data into the investigation.
2. Apply findings to real datasets, e.g., microarray gene cancer data.
3. Utilize more powerful software and higher-performance technology.
4. Observe effects of altering the regression model, e.g., implement the Cox Proportional Hazards Model instead of the AFT Model.

Achlioptas, D. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences 66(4): 671-687, 2003.

Dasgupta, S. and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms 22(1): 60-65, 2003.

Johnson, W.B. and J. Lindenstrauss. Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics 26: 189-206, 1984.

Nguyen, Tuan S. and Javier Rojo. Dimension Reduction of Microarray Data in the Presence of a Censored Survival Response: A Simulation Study. Statistical Applications in Genetics and Molecular Biology 8(1), 2009.

Nguyen, Tuan S. and Javier Rojo. Dimension Reduction of Microarray Gene Expression Data: The Accelerated Failure Time Model. Journal of Bioinformatics and Computational Biology 7(6): 939-954, 2009.

This project benefited from the generous guidance and support of Javier Rojo, along with Kyle Bradford, Nathan C. Wiseman, Raul Cruz-Cano, and Rashidul Hasan. This research was supported by the National Security Agency through Grant H98230-15-1-0048 to the University of Nevada at Reno, Javier Rojo PI.

Analysis and Discussion

RMs unexpectedly outperformed PCA and PLS; this could be connected to R’s accuracy limits when generating datasets and/or to not incorporating censored data. PLS performs in-depth analysis of predictor and response variables, yet it was bested by PCA; this could be due to dataset generation.

Further Inquiry

Acknowledgements and Literature Cited

Results

Assessment

[Figure: Sample Curve]

Introduction

Methods

Contribution: this research builds upon the work of Nguyen and Rojo (2009) with respect to PCA and PLS; furthermore, via computer simulations, this investigation extends the results of Achlioptas (2003) and Dasgupta and Gupta (2003) in their analysis of Johnson and Lindenstrauss’s extensions of Lipschitz mappings into Hilbert spaces (1984).
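For orientation, the Python sketch below generates three standard Johnson-Lindenstrauss-style random matrices: Gaussian entries, and the two "binary coin" constructions from Achlioptas (2003), namely ±1 with equal probability and √3·{+1, 0, −1} with probabilities 1/6, 2/3, 1/6. The poster does not say which three RM variants it compared, so treating these as the variants is an assumption, as are the names `random_matrix` and `project`.

```python
import math
import random


def random_matrix(p, k, kind="gaussian", rng=None):
    """p x k random projection matrix with unit-variance entries scaled by 1/sqrt(k).

    kind: "gaussian" (N(0, 1) entries), "sign" (+/-1 with probability 1/2 each),
    or "sparse" (sqrt(3) * {+1, 0, -1} with probabilities 1/6, 2/3, 1/6).
    Which variants the poster used is NOT stated; these are illustrative.
    """
    rng = rng if rng is not None else random.Random()
    scale = 1.0 / math.sqrt(k)
    root3 = math.sqrt(3.0)

    def entry():
        if kind == "gaussian":
            return rng.gauss(0.0, 1.0)
        if kind == "sign":
            return rng.choice((-1.0, 1.0))
        if kind == "sparse":
            u = rng.random()
            if u < 1.0 / 6.0:
                return root3
            if u < 1.0 / 3.0:
                return -root3
            return 0.0
        raise ValueError("unknown kind: %r" % (kind,))

    return [[scale * entry() for _ in range(k)] for _ in range(p)]


def project(rows, matrix):
    """Apply the random matrix directly to the predictor matrix: n x p -> n x k."""
    p, k = len(matrix), len(matrix[0])
    return [[sum(r[i] * matrix[i][j] for i in range(p)) for j in range(k)]
            for r in rows]
```

With this scaling the expected squared length of each projected row equals that of the original row. Unlike PCA and PLS, the matrix is drawn without looking at the predictors or the response.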

Methods Continued

RM: matrix with predetermined random qualities; the generated RM is then applied directly to the predictor matrix.

1. Generate fixed regression coefficients and theoretical mean in R.
2. Obtain the true survivor curve through the AFT Model.
3. Implement all five of the dimension reduction techniques on the data.
4. Obtain estimates of the true survivor curve from each procedure.
5. Calculate bias and MSE at uniform partitions of the vertical axis.
6. Repeat steps 1-5 for the desired number of iterations.
7. Produce total error plots to analyze each technique’s performance.
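Steps 5 and 6 above can be sketched in Python as follows, assuming the true curve and each iteration's estimate have already been evaluated at the same grid points. The function name `bias_and_mse` and the list-of-lists layout are hypothetical, and how the poster defined its grid on the vertical axis is not specified.

```python
def bias_and_mse(true_values, estimates):
    """Pointwise bias and mean-squared error across simulation iterations.

    true_values: true survivor-curve values at m grid points.
    estimates:   one list of m estimated values per iteration.
    Returns (bias, mse), each a list of length m.
    """
    n_iter = len(estimates)
    bias, mse = [], []
    for j, truth in enumerate(true_values):
        # Error of each iteration's estimate at grid point j.
        errors = [est[j] - truth for est in estimates]
        bias.append(sum(errors) / n_iter)
        mse.append(sum(e * e for e in errors) / n_iter)
    return bias, mse
```

Summing or averaging these pointwise values over the grid gives one total-error figure per technique, which is one plausible reading of the "total error plots" in step 7.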

Survival Analysis Dimension Reduction Techniques: A Comparison of Select Methods

Iván Rodríguez† and Claressa L. Ullmayer*

† The University of Arizona; * The University of Alaska, Fairbanks