Outline: Data/Model · SVD/PCA · Reg/Thresholding · Data analysis · Summary · Details
Thresholded Generalized Principal Component Regression: Forecasting with Many Predictors
Mohsen Pourahmadi, Ranye Sun
Texas A&M University
Recent Advances and Trends in Time Series Analysis: Nonlinear Time Series, High-Dimensional Inference and Beyond
Banff, Canada: April 27-May 2, 2014
Modeling Two-Way Dependent Data
Problem I: How to model 2-way dependency based on only one realization of a data matrix?
Time Series: Assume stationarity; the ACF or spectral density matrix will do the job.
The Data Matrix
In traditional multivariate analysis the rows are independent.
In multivariate time series both rows and columns are correlated.
Now, it is common to have data matrices where both rows and columns are correlated: spatial data, spatio-temporal fMRI, microarray (Efron, 2010), e-commerce (Netflix), finance, ...
Names: Transposable Data (Allen and Tibshirani, 2010); Two-way Structured Data (Huang, Shen and Buja, 2009).
How to Model Transposable Data?
Problem I: How to model 2-way dependency using only one realization of a data matrix?
Time Series: Assume stationarity; the ACF or spectral density matrix will do the job.
Nowadays: Assume a matrix normal distribution:

Y ∼ MN_{n,q}(B, Ω^{-1}, Σ^{-1}),

or vec(Y) ∼ N_{nq}(vec(B), Σ^{-1} ⊗ Ω^{-1}),

with separable covariances.
Drawback: an unrealistic/limited dependence structure.
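The separable structure rests on the identity vec(RZC′) = (C ⊗ R) vec(Z): if Z has i.i.d. N(0,1) entries and Y = B + RZC′ with Ω^{-1} = RR′ and Σ^{-1} = CC′, then vec(Y) has covariance Σ^{-1} ⊗ Ω^{-1}. A quick numerical check of the identity (a sketch in numpy with arbitrary small matrices; vec is column-stacking):

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 4, 3
R = rng.standard_normal((n, n))   # row factor, e.g. Omega^{-1} = R R'
C = rng.standard_normal((q, q))   # column factor, e.g. Sigma^{-1} = C C'
Z = rng.standard_normal((n, q))

def vec(A):
    # column-stacking vec operator
    return A.flatten(order="F")

# vec(R Z C') = (C ⊗ R) vec(Z): the identity behind the separable covariance
lhs = vec(R @ Z @ C.T)
rhs = np.kron(C, R) @ vec(Z)
print(np.allclose(lhs, rhs))  # True
```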
Multivariate Linear Regression/Prediction
Model
Y = XB + E ,
where Y ∈ R^{n×q}, X ∈ R^{n×p}, B ∈ R^{p×q}, and E has a matrix normal distribution.
OLS estimator: B̂_OLS = (X′X)^{-1} X′Y.
Problem II: How to improve B̂_OLS in high dimensions for better prediction?
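The OLS estimator can be computed with a linear solve rather than an explicit inverse; a minimal numpy sketch on simulated, noiseless data (the sizes are arbitrary, and without noise OLS recovers B exactly):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 100, 5, 3
X = rng.standard_normal((n, p))
B = rng.standard_normal((p, q))
Y = X @ B                      # noiseless, so OLS should recover B exactly

# B_OLS = (X'X)^{-1} X'Y, via a linear solve rather than an explicit inverse
B_ols = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.allclose(B_ols, B))   # True (no noise)
```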
Reduced Rank Regression
Finds the LS estimator of B subject to a rank constraint rank(B) = r (Anderson, 1951).
Reduces the pq parameters in B to r(p + q), which is linear in p and q.
Solution involves SVD/PCA of B.
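With identity error covariance, the reduced-rank solution can be obtained by projecting B̂_OLS onto the span of the top-r right singular vectors of the fitted matrix X B̂_OLS. A sketch on simulated noiseless data (so the rank-r estimator recovers B exactly):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q, r = 100, 6, 5, 2
X = rng.standard_normal((n, p))
# a rank-r coefficient matrix
B = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
Y = X @ B                                   # noiseless for illustration

B_ols = np.linalg.solve(X.T @ X, X.T @ Y)
# right singular vectors of the fitted values X @ B_ols
_, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
Vr = Vt[:r].T                               # top-r right singular vectors
B_rr = B_ols @ Vr @ Vr.T                    # rank-r reduced-rank estimator
print(np.linalg.matrix_rank(B_rr))          # 2
```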
A Simpler Model
Reduce the regression model

Y = XB + E

to the "signal plus noise" model:

(X′X)^{-1} X′Y = B + (X′X)^{-1} X′E, i.e. B̂_OLS = B + Ẽ.

Low-rank/sparse estimation of B has been studied when the entries of the error matrix are i.i.d.: Shen and Huang (2008); Yang, Buja and Ma (2013); Allen, Grosenick and Taylor (2013).
The Singular Value Decomposition (SVD)
Let Y be an n × q matrix of rank m. Then (a) there exist matrices U, V and D such that

Y = UDV′ = ∑_{i=1}^{m} d_i u_i v_i′,

where the columns of U = (u_1, ..., u_m) and V = (v_1, ..., v_m) are orthonormal, and the diagonal entries of D = diag(d_1, ..., d_m) are ordered: d_1 ≥ d_2 ≥ ... ≥ d_m > 0.

The columns of U and V are called the left- and right-singular vectors of Y, and the diagonal entries of D are the corresponding singular values.
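In numpy this is np.linalg.svd; a quick check of the stated properties on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
Y = rng.standard_normal((6, 4))

# thin SVD: U is 6x4, d holds the 4 singular values, Vt is 4x4
U, d, Vt = np.linalg.svd(Y, full_matrices=False)

print(np.allclose(Y, U @ np.diag(d) @ Vt))   # True: Y = U D V'
print(np.all(np.diff(d) <= 0))               # True: d1 >= d2 >= ...
print(np.allclose(U.T @ U, np.eye(4)))       # True: orthonormal columns
```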
Rank-r approximation
(b) (Eckart-Young Theorem, 1936): For any r ≤ m, the best rank-r approximation to Y in the Frobenius norm is

Y^(r) = ∑_{i=1}^{r} d_i u_i v_i′.

More precisely,

Y^(r) = argmin_{rank(B)=r} ||Y − B||²_F = argmin_{rank(B)=r} tr[(Y − B)′(Y − B)].
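A quick numerical check of the theorem in numpy: the truncated SVD attains a Frobenius error equal to the norm of the discarded singular values (random data, arbitrary sizes):

```python
import numpy as np

rng = np.random.default_rng(4)
Y = rng.standard_normal((8, 5))
r = 2

U, d, Vt = np.linalg.svd(Y, full_matrices=False)
Yr = U[:, :r] @ np.diag(d[:r]) @ Vt[:r]      # best rank-r approximation

# the optimal Frobenius error is the norm of the discarded singular values
err = np.linalg.norm(Y - Yr, "fro")
print(np.isclose(err, np.sqrt(np.sum(d[r:] ** 2))))  # True
```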
Rank-r approximation: PCA
The SVD represents Y as the sum of m orthogonal layers of decreasing importance.
Use the first few SVD layers corresponding to the larger d_i values; ignore the rest or treat them as noise.
SVD and PCA deal with decompositions of Y and Y′Y, respectively.
The right singular vectors in V are the eigenvectors of the sample covariance matrix, i.e. its PC loading matrix. The PCs are the columns of YV.
Remark: Principal Component Regression (PCR) uses the first few PCs as the predictors.
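A minimal PCR sketch (simulated data, no centering for brevity, PCs taken as the columns of XV). As a sanity check, keeping all p components reproduces the OLS fit:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, k = 50, 6, 2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# PC scores: columns of XV, where V holds the right singular vectors of X
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt.T                     # all p principal components

def pcr_fit(k):
    # regress y on the first k PCs and return the fitted values
    Zk = Z[:, :k]
    g = np.linalg.solve(Zk.T @ Zk, Zk.T @ y)
    return Zk @ g

y_hat = pcr_fit(k)               # PCR fit with k components

# with all p components, PCR reproduces the OLS fit
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(pcr_fit(p), X @ b_ols))   # True
```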
Computing the SVD: Power method
Starting with v^(0), iterate
1. u^(k) = Y v^(k−1) / ||Y v^(k−1)||,
2. v^(k) = Y′ u^(k) / ||Y′ u^(k)||,
sequentially until convergence to u and v. Compute d = u′Yv.
Then, apply steps 1-2 to the residual matrix Y − d u v′.
Next: ALL the singular vectors are computed simultaneously.
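The two steps transcribe directly into numpy (a sketch on a random matrix; the fixed iteration count stands in for a proper convergence test):

```python
import numpy as np

rng = np.random.default_rng(6)
Y = rng.standard_normal((20, 10))

# power method for the leading singular triplet (u, d, v)
v = rng.standard_normal(10)
v /= np.linalg.norm(v)
for _ in range(500):
    u = Y @ v
    u /= np.linalg.norm(u)       # step 1
    v = Y.T @ u
    v /= np.linalg.norm(v)       # step 2
d = u @ Y @ v                    # d = u' Y v

s = np.linalg.svd(Y, compute_uv=False)
print(np.isclose(d, s[0]))       # True: matches the top singular value
```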
The Orthogonal Subspace Iteration
Starting with V^(0), iterate:
1. Multiplication: Y_L^(k) = Y V^(k−1),
2. QR decomposition: U^(k) R_u^(k) = Y_L^(k),
3. Multiplication: Y_R^(k) = Y′ U^(k),
4. QR decomposition: V^(k) R_v^(k) = Y_R^(k).
(Golub and Van Loan, 1996)
Compare with the one-vector power method:
1. u^(k) = Y v^(k−1) / ||Y v^(k−1)||,
2. v^(k) = Y′ u^(k) / ||Y′ u^(k)||.
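The four steps transcribe into two QR calls per iteration; a sketch on simulated data with a rank-3 signal plus small noise (sizes and iteration count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
n, q, r = 30, 20, 3
# a matrix with a clear spectral gap after the 3rd singular value
A = rng.standard_normal((n, r)) @ np.diag([10.0, 8.0, 6.0]) @ rng.standard_normal((r, q))
Y = A + 0.01 * rng.standard_normal((n, q))

V = np.linalg.qr(rng.standard_normal((q, r)))[0]   # random orthonormal start
for _ in range(100):
    U, Ru = np.linalg.qr(Y @ V)      # steps 1-2
    V, Rv = np.linalg.qr(Y.T @ U)    # steps 3-4

# compare the subspace with the top-r left singular vectors from numpy
Us = np.linalg.svd(Y)[0][:, :r]
print(np.allclose(U @ U.T, Us @ Us.T, atol=1e-6))  # True: same subspace
```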
Inconsistency of U,V in high dim.
Silverman (1996); Paul (2007); Johnstone and Lu (2009).
Penalize the singular values to control the rank (Yuan et al., 2007; Bunea et al., 2011).
Penalize the singular vectors to induce sparsity (Huang et al., 2009; Witten et al., 2009).
Regularization of the singular vectors
Minimize the objective function:

||Y − d u v′||²_F + P_λ(u, v), where P_λ(u, v) = λ_u ||u||_1 + λ_v ||v||_1.

Sequentially solve for (d_i, u_i, v_i), i = 1, ..., m; e.g., Y_2 = Y − d_1 u_1 v_1′.
Drawbacks: orthogonality of the singular vectors is not guaranteed; computational cost.
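The alternating scheme behind this penalized criterion, in the spirit of Shen and Huang (2008), can be sketched with soft thresholding; here the sparse rank-1 signal and the penalty levels (0.2, chosen above the noise scale) are made up for illustration:

```python
import numpy as np

def soft(x, lam):
    # soft-thresholding operator, the proximal map of the l1 penalty
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(8)
n, q = 20, 15
u0 = np.zeros(n); u0[:3] = 1 / np.sqrt(3)     # sparse left vector
v0 = np.zeros(q); v0[:4] = 1 / 2              # sparse right vector
Y = 10 * np.outer(u0, v0) + 0.01 * rng.standard_normal((n, q))

# alternate: multiply, soft-threshold, renormalize
v = np.linalg.svd(Y)[2][0]                    # initialize at the plain SVD
for _ in range(20):
    u = soft(Y @ v, 0.2);   u /= np.linalg.norm(u)
    v = soft(Y.T @ u, 0.2); v /= np.linalg.norm(v)

print(np.nonzero(u)[0])   # [0 1 2]: the true support is recovered
```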
Thresholding: Optimization-Free
Yang et al. (2013): A sparse SVD method for high-dimensional data.
Simultaneously computes the subspaces spanned by the leading singular vectors in U, V using the orthogonal subspace iterations.
Thresholding is used to replace the smaller entries of U and V by zero.
The Fast Iterative Thresholding Sparse SVD (FIT-SSVD).
The FIT-SSVD Algorithm
1. Multiplication and thresholding: U^(k,thr) = η(Y V^(k−1), γ_u),
2. QR decomposition: U^(k) R_u^(k) = U^(k,thr),
3. Multiplication and thresholding: V^(k,thr) = η(Y′ U^(k), γ_v),
4. QR decomposition: V^(k) R_v^(k) = V^(k,thr).
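The algorithm is the subspace iteration with a thresholding step η inserted before each QR. A sketch with η taken as hard thresholding on simulated data with sparse singular vectors (the levels γ_u = γ_v = 0.3 are fixed by hand here; FIT-SSVD derives them from the data):

```python
import numpy as np

def eta(A, gamma):
    # hard thresholding: zero out entries smaller than gamma in magnitude
    return np.where(np.abs(A) > gamma, A, 0.0)

rng = np.random.default_rng(9)
n, q, r = 20, 15, 2
U0 = np.zeros((n, r)); U0[0:3, 0] = 1/np.sqrt(3); U0[3:6, 1] = 1/np.sqrt(3)
V0 = np.zeros((q, r)); V0[0:4, 0] = 1/2;          V0[4:8, 1] = 1/2
Y = U0 @ np.diag([10.0, 8.0]) @ V0.T + 0.01 * rng.standard_normal((n, q))

V = np.linalg.svd(Y)[2][:r].T               # initialize at the plain SVD
for _ in range(20):
    U, _ = np.linalg.qr(eta(Y @ V, 0.3))    # steps 1-2
    V, _ = np.linalg.qr(eta(Y.T @ U, 0.3))  # steps 3-4

print(np.allclose(U[6:], 0))   # True: rows outside the true support are zero
```

QR preserves the zero rows created by thresholding, so the output is both sparse and orthonormal.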
[Figure: Percentiles of ratios of RMSE of TGPCA relative to PCR-5 for the original data.]
Problem III: Transform the Data?
Compared to Stock and Watson (2012), the TGPCA approach obviates the need to transform the data to stationarity, which can be a major advantage over PCR in high-dimensional data situations.
Deciding what transformations to use is a difficult task even for univariate time series data.
Simulating nonstationary data
Case I: Random walk.

X_{j,t} = X_{j,t−1} + ε_{j,t}.

Case II: AR(2) with unit root plus drift.

X_{j,t} = 1.03 X_{j,t−1} − 0.03 X_{j,t−2} + c_j + ε_{j,t}.

Case III: AR(3) with unit root plus seasonality.

X_{j,t} = 1.2 X_{j,t−1} − 0.21 X_{j,t−2} + 0.01 X_{j,t−3} + c_j + 5 sin(π t / 16) + ε_{j,t}.
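The three designs can be simulated directly; a sketch in numpy (the dimensions, drift terms c_j, and zero initial conditions are arbitrary choices, not from the slides). Note that both AR polynomials satisfy φ(1) = 0, i.e. a unit root:

```python
import numpy as np

rng = np.random.default_rng(10)
T, p = 200, 4
eps = rng.standard_normal((p, T))
c = rng.standard_normal(p)          # drift terms
X1 = np.zeros((p, T))               # Case I: random walk
X2 = np.zeros((p, T))               # Case II: AR(2), unit root + drift
X3 = np.zeros((p, T))               # Case III: AR(3), unit root + seasonality
for t in range(T):
    # negative indices at small t wrap to still-zero columns,
    # i.e. zero initial conditions
    X1[:, t] = X1[:, t-1] + eps[:, t]
    X2[:, t] = 1.03*X2[:, t-1] - 0.03*X2[:, t-2] + c + eps[:, t]
    X3[:, t] = (1.2*X3[:, t-1] - 0.21*X3[:, t-2] + 0.01*X3[:, t-3]
                + c + 5*np.sin(np.pi*t/16) + eps[:, t])

# both AR polynomials have a unit root: phi(1) = 0
print(np.isclose(1 - 1.03 + 0.03, 0.0), np.isclose(1 - 1.2 + 0.21 - 0.01, 0.0))
```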
Simulation model
Y = XB + E
B = ∑_{i=1}^{q} d_i u_i v_i′, with the five largest singular values (177, 32, 30, 26, 22), while the others are less than 5.
This indicates that a model with r = 5 is appropriate.
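One way to generate such a B (the dimensions and the tail singular values below 5 are made up for illustration; only the five leading values come from the slide):

```python
import numpy as np

rng = np.random.default_rng(11)
p, q = 50, 20
d = np.array([177., 32., 30., 26., 22.] + list(rng.uniform(0.5, 4.5, q - 5)))
d[5:] = np.sort(d[5:])[::-1]                 # keep the tail ordered, all < 5

# random orthonormal factors via QR
U = np.linalg.qr(rng.standard_normal((p, q)))[0]
V = np.linalg.qr(rng.standard_normal((q, q)))[0]
B = U @ np.diag(d) @ V.T                     # B = sum_i d_i u_i v_i'

print(np.allclose(np.linalg.svd(B, compute_uv=False), d))  # True
```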