Covariance Matrix Applications

Dimensionality Reduction

Outline

• What is the covariance matrix?

• Example

• Properties of the covariance matrix

• Spectral Decomposition – Principal Component Analysis

Covariance Matrix

• The covariance matrix captures the variance and linear correlation in multivariate/multidimensional data.

• If the data is an N x D matrix, the covariance matrix is a D x D square matrix.

• Think of N as the number of data instances (rows) and D as the number of attributes (columns).

Covariance Formula

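• Let Data = N x D matrix, with column (attribute) $X_i$ having mean $\mu_i$.

• The (i, j) entry of Cov(Data) is

$\mathrm{Cov}(\mathrm{Data})_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)]$

which is estimated from the N rows as

$\frac{1}{N-1}\left[(x_{1i} - \mu_i)(x_{1j} - \mu_j) + \dots + (x_{Ni} - \mu_i)(x_{Nj} - \mu_j)\right]$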
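The definition can be checked numerically. The following is a minimal sketch, assuming the sample estimate that divides by N - 1 (the same convention NumPy's `np.cov` uses by default); the function name `covariance_matrix` and the 4 x 3 data set are hypothetical and chosen only for illustration.

```python
import numpy as np


def covariance_matrix(data: np.ndarray) -> np.ndarray:
    """Sample covariance of an N x D data matrix (rows = instances)."""
    n = data.shape[0]
    centered = data - data.mean(axis=0)      # subtract each column's mean
    return centered.T @ centered / (n - 1)   # D x D matrix


# Hypothetical 4 x 3 data set, used only for illustration.
data = np.array([[2.0, 1.0, 4.0],
                 [3.0, 5.0, 1.0],
                 [5.0, 2.0, 2.0],
                 [6.0, 4.0, 3.0]])

cov = covariance_matrix(data)
print(cov.shape)                                     # (3, 3)
print(np.allclose(cov, np.cov(data, rowvar=False)))  # True
```

Note that `rowvar=False` tells `np.cov` that columns, not rows, are the variables, which matches the N x D layout used here.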

Example

COV(R) =

1.67   1      0.5
1      2      0.33
0.5    0.33   0.9167

Covariance matrices of three further examples:

0.07   0.0
0.0    0.68

0.07   0.06
0.06   0.078

0.07   0.008
0.008  0.087

Moral: Covariance can only capture linear relationships
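A small numerical illustration of this moral, as a sketch with made-up data: `y_quadratic` is completely determined by `x`, yet its covariance with `x` is essentially zero because the relationship is not linear.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)   # values symmetric around zero
y_linear = 2.0 * x                # linear dependence on x
y_quadratic = x ** 2              # determined by x, but not linearly

# np.cov returns the 2 x 2 covariance matrix; [0, 1] is the cross term.
cov_linear = np.cov(x, y_linear)[0, 1]
cov_quadratic = np.cov(x, y_quadratic)[0, 1]

print(cov_linear > 0.5)            # True
print(abs(cov_quadratic) < 1e-9)   # True
```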

Dimensionality Reduction

• If you work in “data analytics” it is common these days to be handed a data set which has lots of variables (dimensions).

• The information in these variables is often redundant – there are only a few sources of genuine information.

• Question: How can we identify these sources automatically?

Hidden Sources of Variance

[Diagram: the observed variables X1, X2, X3, X4 of the data]

Model: Hidden Sources are Linear Combinations of Original Variables

Hidden Sources

• If the known variables each provided different (non-redundant) information, the covariance matrix between the variables would be a diagonal matrix, i.e., the non-zero entries would appear only on the diagonal.

• In particular, if $H_i$ and $H_j$ are independent then $E[(H_i - \mu_i)(H_j - \mu_j)] = 0$.

Hidden Sources

• So the question is: what should the hidden sources be?

• It turns out that the “best” hidden sources are the eigenvectors of the covariance matrix.

• If A is a d x d matrix, then $\langle \lambda, x \rangle$ is an eigenvalue-eigenvector pair if

• $Ax = \lambda x$
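The eigenpair condition can be verified directly. This is a minimal sketch using a hypothetical symmetric 2 x 2 matrix `A`; `np.linalg.eigh` is NumPy's routine for symmetric matrices and returns the eigenvectors as columns.

```python
import numpy as np

# Hypothetical symmetric matrix, used only for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is intended for symmetric matrices; eigenvalues come back in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Each column of `eigenvectors` pairs with the eigenvalue at the same index.
checks = [np.allclose(A @ x, lam * x)
          for lam, x in zip(eigenvalues, eigenvectors.T)]
print(checks)  # [True, True]
```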

Explanation

We have two axes, X1 and X2. We want to project the data along the direction of maximum variance.
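As a sketch of this idea, the snippet below builds hypothetical correlated 2-D data, takes the eigenvector of the covariance matrix with the largest eigenvalue as the projection direction, and checks that the variance of the projected data equals that eigenvalue. The data and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical correlated 2-D data (columns play the roles of X1 and X2).
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + 0.3 * rng.normal(size=500)
data = np.column_stack([x1, x2])

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

eigenvalues, eigenvectors = np.linalg.eigh(cov)   # ascending eigenvalues
direction = eigenvectors[:, -1]                   # eigenvector of the largest eigenvalue
projected = centered @ direction                  # 1-D coordinates along that direction

# The variance along this direction equals the largest eigenvalue ...
print(np.isclose(projected.var(ddof=1), eigenvalues[-1]))   # True
# ... and is at least as large as the variance along either original axis.
print(eigenvalues[-1] >= max(cov[0, 0], cov[1, 1]))         # True
```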

Covariance Matrix Properties

• The Covariance matrix is symmetric.

• Non-negative eigenvalues: $0 \le \lambda_1 \le \lambda_2 \le \dots \le \lambda_d$

• Corresponding eigenvectors: $u_1, u_2, \dots, u_d$
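These properties are easy to confirm on a computed covariance matrix. The sketch below uses hypothetical random data; the small negative tolerance in the eigenvalue check is an assumption to allow for floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(50, 4))          # hypothetical N = 50, d = 4 data
cov = np.cov(data, rowvar=False)

eigenvalues, U = np.linalg.eigh(cov)     # ascending eigenvalues, eigenvectors as columns

is_symmetric = np.allclose(cov, cov.T)
is_nonnegative = bool(np.all(eigenvalues >= -1e-12))
is_orthonormal = np.allclose(U.T @ U, np.eye(4))
# Spectral decomposition: cov = U diag(eigenvalues) U^t
is_decomposed = np.allclose(cov, U @ np.diag(eigenvalues) @ U.T)

print(is_symmetric, is_nonnegative, is_orthonormal, is_decomposed)  # True True True True
```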

Principal Component Analysis

• Also known as, or closely related to:
  – Singular Value Decomposition
  – Latent Semantic Indexing

• Technique for data reduction: essentially, reduce the number of columns while losing minimal information.

• Also think in terms of lossy compression.
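The column-reduction view can be sketched as follows. This is an illustrative implementation via the eigenvectors of the covariance matrix, not a reference one; `pca_reduce` is a hypothetical helper, and the data is synthetic: three columns driven by a single hidden source plus a little noise.

```python
import numpy as np


def pca_reduce(data: np.ndarray, k: int):
    """Project N x D data onto its k directions of largest variance."""
    mean = data.mean(axis=0)
    centered = data - mean
    cov = centered.T @ centered / (data.shape[0] - 1)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)   # ascending order
    top = eigenvectors[:, ::-1][:, :k]                # k eigenvectors with largest eigenvalues
    return centered @ top, top, mean


rng = np.random.default_rng(2)
source = rng.normal(size=(200, 1))                    # one hidden source
data = source @ np.array([[1.0, 2.0, -1.0]]) + 0.01 * rng.normal(size=(200, 3))

reduced, components, mean = pca_reduce(data, k=1)     # 3 columns -> 1 column
restored = reduced @ components.T + mean              # lossy reconstruction

rel_error = np.linalg.norm(data - restored) / np.linalg.norm(data - mean)
print(reduced.shape)       # (200, 1)
print(rel_error < 0.05)    # True
```

Because the three columns are almost entirely explained by one source, a single component reproduces the data with a small relative error, which is the lossy-compression reading of PCA.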

Motivation

• Bulk of data has a time component

• For example, retail transactions, stock prices

• Data set can be organized as N x M table

• For example, N customers and the price of the calls each made on each of M = 365 days

• M << N

Objective

• Compress the data matrix X into Xc, such that
  – the compression ratio is high, and
  – the average error between the original and the compressed matrix is low

• N could be in the order of millions and M in the order of hundreds
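The two quantities in the objective can be made concrete. The sketch below rests on stated assumptions: the "average error" is taken to be the root-mean-square difference, and the compressed form is assumed to store k column vectors of length N, k row vectors of length M, and k weights. The function names and the sizes N, M, k are illustrative only.

```python
import numpy as np


def rms_error(X: np.ndarray, Xc: np.ndarray) -> float:
    """Root-mean-square difference between a matrix and its approximation."""
    return float(np.sqrt(np.mean((X - Xc) ** 2)))


def compression_ratio(n_rows: int, n_cols: int, stored_numbers: int) -> float:
    """Numbers in the original table divided by numbers actually stored."""
    return n_rows * n_cols / stored_numbers


# Illustrative sizes: a million customers, one column per day of the year.
N, M, k = 1_000_000, 365, 10
stored = k * (N + M + 1)   # assumption: k length-N vectors, k length-M vectors, k weights
ratio = compression_ratio(N, M, stored)
print(round(ratio, 1))     # 36.5
```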

Example database

customer     sales per day (one column per day)
ABC          1  1  1  0  0
DEF          2  2  2  0  0
GHI          1  1  1  0  0
KLM          5  5  5  0  0
(unnamed)    0  0  0  2  2
john         0  0  0  3  3
tom          0  0  0  1  1

Decision Support Queries

• What was the amount of sales to GHI on July 11?

• What were the total sales to business customers for the week ending July 12th?

Intuition behind SVD

Customers are 2-D points

SVD Definition

• An N x M matrix X can be expressed as

$X = U \Lambda V^t$

where U is an N x r matrix, V is an M x r matrix, and $\Lambda$ (Lambda) is a diagonal r x r matrix.
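The factorization can be reproduced with NumPy's `np.linalg.svd`. This is a sketch on a hypothetical random 6 x 4 matrix, which has full rank, so r = 4 here; NumPy returns the diagonal of Lambda as a 1-D array and returns $V^t$ rather than V.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(6, 4))                    # hypothetical N = 6, M = 4 matrix

# full_matrices=False gives the compact form: U is N x r, Vt is r x M.
U, singular_values, Vt = np.linalg.svd(X, full_matrices=False)
Lambda = np.diag(singular_values)              # diagonal r x r matrix

print(U.shape, Lambda.shape, Vt.shape)         # (6, 4) (4, 4) (4, 4)
print(np.allclose(X, U @ Lambda @ Vt))         # True
print(bool(np.all(np.diff(singular_values) <= 0)))   # True: decreasing order
```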

SVD Definition

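• More importantly, X can be written as

$X = \lambda_1 u_1 v_1^t + \lambda_2 u_2 v_2^t + \dots + \lambda_r u_r v_r^t$

where the eigenvalues $\lambda_i$ are in decreasing order. Keeping only the first k terms gives the compressed matrix

$X_c = \lambda_1 u_1 v_1^t + \lambda_2 u_2 v_2^t + \dots + \lambda_k u_k v_k^t$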

Example


Compression

$X = \sum_{i=1}^{r} \lambda_i u_i v_i^t$

$X_c = \sum_{i=1}^{k} \lambda_i u_i v_i^t$

where $k \le r \le M$.
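As a sketch of this truncation, the snippet below applies it to the 7 x 5 matrix from the example database. The helper `truncate` is hypothetical; the rank is estimated by counting singular values above a small tolerance, which is an assumption of this example.

```python
import numpy as np

# The 7 x 5 sales matrix from the example database (rows = customers).
X = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)


def truncate(k: int) -> np.ndarray:
    """Sum of the first k terms lambda_i * u_i * v_i^t."""
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


rank = int(np.sum(s > 1e-10))          # number of non-negligible singular values
X1 = truncate(1)                       # keeps only the dominant pattern
X2 = truncate(2)                       # k = r: reproduces X

print(rank)                            # 2
print(np.round(s[:2], 2))              # [9.64 5.29]
print(np.allclose(X2, X))              # True
```

With k = 1 only the first four rows are reproduced and the last three collapse to zero, so the error of `X1` comes entirely from the dropped second term.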
