SVD, PCA, Multilinear Models

Georgy Gimel’farb ([email protected])
Outline

1 SVD
2 Linear Systems
3 PCA
4 Multilinear models

Recommended reading:
• G. Strang, Computational Science and Engineering. Wellesley-Cambridge Press, 2007: Sections 1.5–1.7
• C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006: Sections 12.1–12.3
• W. H. Press et al., Numerical Recipes: The Art of Scientific Computing. Cambridge Univ. Press, 2007: Sec. 2.6; Chapter 11
Solving Linear Equation Systems: Recall What You Know

The most common problem in numerical modelling: a system of m linear equations in n unknowns,

\[
\begin{array}{c}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n = b_1 \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n = b_2 \\
\dots \\
a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n = b_m
\end{array}
\;\Leftrightarrow\;
\underbrace{\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix}}_{\mathbf{A}}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}}_{\mathbf{x}}
=
\underbrace{\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}}_{\mathbf{b}}
\;\Leftrightarrow\; \mathbf{A}\mathbf{x} = \mathbf{b}
\]

Given the known m×n matrix A of coefficients and the known vector b, find the unknown n-component vector x.
Solving Linear Equation Systems

• If A is a non-singular square matrix (m = n; det A ≠ 0): x = A^{-1}b
• In practice, mostly m ≠ n, and the systems are of very large order (say, 10,000 … 1,000,000)
• Even if m = n, testing for singularity and inverting a very large matrix present huge computational problems
• Easily solvable systems:
  • Diagonal n×n matrices with nonzero diagonal components (simplifying notation: A = diag{a_1, …, a_n}):

    \[ A = \mathrm{diag}\{a_1, \dots, a_n\} \;\Rightarrow\; A^{-1} = \mathrm{diag}\left\{\tfrac{1}{a_1}, \dots, \tfrac{1}{a_n}\right\} \;\Rightarrow\; x_i = \tfrac{b_i}{a_i} \text{ for all } i = 1, \dots, n \]

  • Orthonormal (or orthogonal) and triangular n×n matrices
Orthonormal Matrices

• A = [a_1 a_2 … a_n], where the columns a_i = [a_{i1} a_{i2} … a_{in}]^T are mutually orthogonal unit vectors:

  \[ \mathbf{a}_i^{\mathsf T}\mathbf{a}_j \equiv \sum_{k=1}^{n} a_{ik}a_{jk} = \delta_{ij} \equiv \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \]

• Inversion by transposition: A^{-1} = A^T, so x = A^T b
• A^T A ≡ [a_1^T; …; a_n^T][a_1 a_2 … a_n] = I_n ≡ diag{1, 1, …, 1}
• AA^T A ≡ A(A^T A) = A and AA^T A ≡ (AA^T)A
• Therefore, AA^T = I_n, too
Orthonormal Matrices: An Example

\[
A = \begin{bmatrix} \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} \\ \tfrac{1}{\sqrt{3}} & 0 & \tfrac{2}{\sqrt{6}} \\ \tfrac{1}{\sqrt{3}} & -\tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} \end{bmatrix};
\quad
A^{\mathsf T} = \begin{bmatrix} \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} \\ \tfrac{1}{\sqrt{2}} & 0 & -\tfrac{1}{\sqrt{2}} \\ -\tfrac{1}{\sqrt{6}} & \tfrac{2}{\sqrt{6}} & -\tfrac{1}{\sqrt{6}} \end{bmatrix}
\]

\[ \Rightarrow\; A^{\mathsf T}A = \mathrm{diag}(1,1,1) \quad\text{and}\quad AA^{\mathsf T} = \mathrm{diag}(1,1,1) \]

Ax = b (i.e. \( b_i = \sum_{j=1}^{3} a_{ij}x_j \)) ⇒ \( x_j = \sum_{i=1}^{3} (A^{\mathsf T})_{ji} b_i = \sum_{i=1}^{3} a_{ij} b_i \)
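A minimal numpy check of this example (an illustration added here, not part of the slides): the columns of A are orthonormal, so inversion reduces to transposition and Ax = b is solved by x = A^T b without any matrix inversion.

```python
import numpy as np

# The 3x3 orthonormal matrix from the example above
s3, s2, s6 = np.sqrt(3), np.sqrt(2), np.sqrt(6)
A = np.array([[1/s3,  1/s2, -1/s6],
              [1/s3,  0.0,   2/s6],
              [1/s3, -1/s2, -1/s6]])

# Columns are mutually orthogonal unit vectors: A^T A = I = A A^T
assert np.allclose(A.T @ A, np.eye(3))
assert np.allclose(A @ A.T, np.eye(3))

# Solve Ax = b by transposition: x = A^{-1} b = A^T b
b = np.array([1.0, 2.0, 3.0])
x = A.T @ b
assert np.allclose(A @ x, b)
```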
Triangular Matrices

• Lower and upper triangular matrices:

  \[
  A = \begin{bmatrix} a_{11} & & & \\ a_{21} & a_{22} & & \\ \vdots & \vdots & \ddots & \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix}
  \quad\text{or}\quad
  \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ & a_{22} & \dots & a_{2n} \\ & & \ddots & \vdots \\ & & & a_{nn} \end{bmatrix}
  \]

• Simple sequential solution of the system Ax = b without explicit inversion of A, e.g. for the lower triangular matrix:

  \[ x_i = \frac{1}{a_{ii}}\left( b_i - \sum_{j=1}^{i-1} a_{ij}x_j \right): \]

  \[ x_1 = \frac{b_1}{a_{11}};\quad x_2 = \frac{b_2 - a_{21}x_1}{a_{22}};\quad \dots;\quad x_n = \frac{b_n - a_{n1}x_1 - \dots - a_{n,n-1}x_{n-1}}{a_{nn}} \]
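The sequential solution above is forward substitution; a short numpy sketch (the matrix and right-hand side are made-up illustration values):

```python
import numpy as np

def forward_substitution(A, b):
    """Solve Ax = b for lower triangular A: x_i = (b_i - sum_{j<i} a_ij x_j) / a_ii."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n):
        x[i] = (b[i] - A[i, :i] @ x[:i]) / A[i, i]
    return x

A = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, 1.0, 5.0]])
b = np.array([2.0, 5.0, 14.0])
x = forward_substitution(A, b)
assert np.allclose(A @ x, b)   # solves the system without inverting A
```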
Three Essential Factorisations

• Elimination (LU decomposition): A = LU
  • Lower triangular matrix × Upper triangular matrix
• Orthogonalisation (QR decomposition): A = QR
  • Orthogonal matrix (columns) × Upper triangular matrix
• Singular Value Decomposition (SVD): A = UDV^T
  • Orthonormal (columns) × diag(singular values) × Orthogonal (rows)
  • Orthonormal columns in U and V: the left and right singular vectors, respectively
  • Left singular vector: an eigenvector of the square m×m matrix AA^T
  • Right singular vector: an eigenvector of the square n×n matrix A^T A
Eigenvectors and Eigenvalues

• Definition of an eigenvector e with an eigenvalue λ: Ae = λe, i.e. (A − λI)e = 0
• λ is an eigenvalue of A if the determinant |A − λI| = 0
• This determinant is a polynomial in λ of degree n, so it has n roots λ_1, λ_2, …, λ_n
• Every symmetric matrix A has a full set (basis) of n orthogonal unit eigenvectors e_1, e_2, …, e_n

Example of deriving eigenvalues and eigenvectors:

\[
A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}
\;\Rightarrow\;
|A - \lambda I| \equiv \begin{vmatrix} 2-\lambda & -1 \\ -1 & 2-\lambda \end{vmatrix} = \lambda^2 - 4\lambda + 3 = 0
\]

\[
\Rightarrow\; \lambda_1 = 1;\; \lambda_2 = 3
\;\Rightarrow\;
\mathbf{e}_1 = \tfrac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix};\quad
\mathbf{e}_2 = \tfrac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}
\]
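The 2×2 example above can be reproduced numerically (an added illustration; `numpy.linalg.eigh` is used because A is symmetric):

```python
import numpy as np

A = np.array([[2.0, -1.0], [-1.0, 2.0]])
lam, E = np.linalg.eigh(A)        # eigenvalues in ascending order
assert np.allclose(lam, [1.0, 3.0])

# Each column of E satisfies A e = lambda e (eigenvectors unique up to sign)
for k in range(2):
    assert np.allclose(A @ E[:, k], lam[k] * E[:, k])
```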
Eigenvectors and Eigenvalues: A Few Properties

• No algebraic formula for the polynomial roots for n > 4 (Galois’ theorem)
• Thus, the eigenvalue problem needs its own special algorithms
• Solving the eigenvalue problem is harder than solving Ax = b
• Determinant |A| = λ_1 λ_2 ··· λ_n (product of eigenvalues)
• trace(A) = a_{11} + a_{22} + … + a_{nn} = λ_1 + λ_2 + … + λ_n (sum of eigenvalues)
• A^k = A···A (k times) has the same eigenvectors as A: e.g. for A^2, Ae = λe ⇒ AAe = λAe = λ^2 e
• Eigenvalues of A^k are λ_1^k, …, λ_n^k
• Eigenvalues of A^{-1} are 1/λ_1, …, 1/λ_n
Eigenvectors and Eigenvalues

• Diagonalisation of an n×n matrix A with n linearly independent eigenvectors e_i
• Eigenvector matrix S = [e_1 e_2 … e_n] ⇒ diagonalisation S^{-1}AS = Λ ≡ diag(λ_1, λ_2, …, λ_n)
• With no repeated eigenvalues, A always has n linearly independent eigenvectors
• Real symmetric matrices: real eigenvalues and orthonormal eigenvectors, so S^{-1} = S^T
• Symmetric diagonalisation: A = SΛS^{-1} = SΛS^T
• Yet another useful representation: A = λ_1 e_1 e_1^T + λ_2 e_2 e_2^T + … + λ_n e_n e_n^T
Singular Value Decomposition

• For singular, or very nearly singular, sets of linear equations, the results of conventional solutions (e.g. by LU decomposition) are unsatisfactory
• SVD diagnoses, and in some cases solves, the problem
• It gives a useful numerical answer (though not necessarily the expected one!)
• SVD is the method of choice for solving most linear least-squares problems
Singular Value Decomposition

SVD represents an ordinary m×n matrix A as A = UDV^T, where

• U: an m×n column-orthogonal matrix; its n columns are the eigenvectors u of AA^T
• V: an n×n orthogonal matrix; its n columns are the eigenvectors v of A^T A
• D: an n×n diagonal matrix of non-negative (≥ 0) singular values σ_1, …, σ_n such that Av_j = σ_j u_j; j = 1, …, r
  • σ_j = √λ_j, where λ_j is the eigenvalue for v_j

Useful representation (r is the rank of A, i.e. the number of its nonzero singular values):

\[ A = \sigma_1 \mathbf{u}_1\mathbf{v}_1^{\mathsf T} + \sigma_2 \mathbf{u}_2\mathbf{v}_2^{\mathsf T} + \dots + \sigma_r \mathbf{u}_r\mathbf{v}_r^{\mathsf T}; \qquad \sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0 \]
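A numpy sketch of these definitions (an added illustration on a random matrix): the "economy" SVD returns descending singular values, σ_j² are the eigenvalues of A^T A, and the outer-product sum reproduces A.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))            # m = 5, n = 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # economy SVD: U is 5x3
assert np.allclose(A, U @ np.diag(s) @ Vt)         # A = U D V^T
assert np.all(s[:-1] >= s[1:])                     # sigma_1 >= sigma_2 >= ...

# sigma_j^2 are the eigenvalues of A^T A
assert np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A.T @ A)))

# Outer-product representation A = sum_j sigma_j u_j v_j^T
A_sum = sum(s[j] * np.outer(U[:, j], Vt[j]) for j in range(len(s)))
assert np.allclose(A, A_sum)
```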
SVD: an Example

\[
A = \begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 1 & 0 \end{bmatrix}
\;\Rightarrow\;
AA^{\mathsf T} = \begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 1 \end{bmatrix}
\]

\[
\Rightarrow\; \lambda_1 = 3;\; \mathbf{u}_1 = \tfrac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix};\quad
\lambda_2 = 1;\; \mathbf{u}_2 = \tfrac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}
\]
(the top n = 2 eigenvalues and eigenvectors of AA^T)

\[
A^{\mathsf T}A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}
\]

\[
\Rightarrow\; \lambda_1 = 3;\; \mathbf{v}_1 = \tfrac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix};\quad
\lambda_2 = 1;\; \mathbf{v}_2 = \tfrac{1}{\sqrt{2}}\begin{bmatrix} -1 \\ 1 \end{bmatrix}
\]
(eigenvalues and eigenvectors of A^T A)
SVD: an Example (cont.)

Singular values: Av_j = σ_j u_j; j = 1, 2:

\[
\tfrac{1}{\sqrt{2}} \begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}
= \sigma_1 \tfrac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
\;\Rightarrow\; \sigma_1 = \sqrt{3}
\]

\[
\tfrac{1}{\sqrt{2}} \begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \end{bmatrix}
= \sigma_2 \tfrac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}
\;\Rightarrow\; \sigma_2 = 1
\]

\[
A \equiv \begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 1 & 0 \end{bmatrix}
= \underbrace{[\mathbf{u}_1\ \mathbf{u}_2]}_{U}\,
\underbrace{\mathrm{diag}(\sigma_1, \sigma_2)}_{D}\,
\underbrace{\begin{bmatrix} \mathbf{v}_1^{\mathsf T} \\ \mathbf{v}_2^{\mathsf T} \end{bmatrix}}_{V^{\mathsf T}}
\equiv
\underbrace{\begin{bmatrix} 1/\sqrt{6} & 1/\sqrt{2} \\ 2/\sqrt{6} & 0 \\ 1/\sqrt{6} & -1/\sqrt{2} \end{bmatrix}}_{U}
\underbrace{\begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \end{bmatrix}}_{D}
\underbrace{\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}}_{V^{\mathsf T}}
\]
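The hand-derived factorisation above can be checked with numpy (an added illustration; numpy may flip signs of singular-vector pairs, but the singular values and the relation Av_j = σ_j u_j are exact):

```python
import numpy as np

A = np.array([[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

assert np.allclose(s, [np.sqrt(3.0), 1.0])       # sigma_1 = sqrt(3), sigma_2 = 1
assert np.allclose(U @ np.diag(s) @ Vt, A)       # A = U D V^T

# A v_j = sigma_j u_j for each singular pair
for j in range(2):
    assert np.allclose(A @ Vt[j], s[j] * U[:, j])
```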
Structure of SVD

• Overdetermined case, m > n (more equations than unknowns): A = U D V^T with a tall m×n matrix U
• Underdetermined case, m < n (fewer equations than unknowns): A = U D V^T
• Matrix V is orthogonal (orthonormal): VV^T = V^T V = I_n (the unit n×n matrix)
• Matrix U is column-orthogonal (orthonormal):

  \[ U^{\mathsf T}U = \left[\, \sum_{i=1}^{m} u_{i\alpha} u_{i\beta} = \delta_{\alpha\beta} \,\right]; \quad 1 \leq \alpha, \beta \leq n \]
Structure of SVD (cont.)

• If m ≥ n: δ_{αβ} = 1 if α = β and δ_{αβ} = 0 otherwise, i.e. U^T U = I_n
• If m < n: the singular values σ_j = 0 for j = m+1, …, n, and the corresponding columns of U are also zero
  • So the above relationships for δ_{αβ} in U^T U hold only for 1 ≤ α, β ≤ m
Basic Properties of SVD

• SVD can always be done, no matter how singular the matrix is!
• SVD is almost “unique”, up to:
  1 The same permutation of the columns of U, the elements of D, and the columns of V (or rows of V^T); or
  2 An orthogonal rotation on any set of columns of U and V whose corresponding elements of D are exactly equal
Basic Properties of SVD

Due to the permutation freedom:
• If m < n, the numerical SVD algorithm need not return the zero σ_j’s in their canonical positions j = m+1, …, n
• Actually, the n − m zero singular values can be scattered among all n positions j = 1, …, n
• Thus, all the singular values should be sorted into a canonical order
• It is convenient to sort them into descending order
Solutions of Systems of Linear Equations

The system Ax = b is a linear mapping of an n-dimensional vector space X to (generally) an m-dimensional vector space B:
• The mapping might reach only a lower-dimensional subspace of the full m-dimensional one
• That subspace is called the range of A
• The rank r of A, 1 ≤ r ≤ min(m, n), is the dimension of the range of A
• The rank r is equal to the number of linearly independent columns (also the number of linearly independent rows) of A
Simple Examples

1 Overdetermined system – a 2D-to-3D mapping; the matrix of rank 2:

  \[ \overbrace{\begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 1 & 0 \end{bmatrix}}^{A} \overbrace{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}}^{\mathbf{x}} = \overbrace{\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}}^{\mathbf{b}}; \qquad A\mathbf{x} = 0 \;\Rightarrow\; \text{0D point } \mathbf{x} = 0 \]

2 Underdetermined system – a 3D-to-2D mapping; the matrix of rank 2:

  \[ \begin{bmatrix} 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}; \qquad A\mathbf{x} = 0 \;\Rightarrow\; \text{1D line } x_1 = -x_2 = x_3 \]

3 Underdetermined system – a 3D-to-2D mapping; the matrix of rank 1:

  \[ \begin{bmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}; \qquad A\mathbf{x} = 0 \;\Rightarrow\; \text{2D plane } x_1 + x_2 + x_3 = 0 \]
Solutions of Systems of Linear Equations

Nullspace of A – the space of the vectors x such that Ax = 0
• Nullity – the dimension of the nullspace
• Rank–nullity theorem: rank A + nullity A = n (the number of columns)
  • Example 1 above: rank 2; nullity 0 (point in 2D); n = 2
  • Example 2 above: rank 2; nullity 1 (line in 3D); n = 3
  • Example 3 above: rank 1; nullity 2 (plane in 3D); n = 3
• If m = n and r = n: A is square, nonsingular and invertible
  • Ax = b has a unique solution for any b, and only the zero vector is mapped to zero
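The rank–nullity theorem can be read off directly from the singular values; a numpy sketch on Example 3 (the tolerance choice here is an assumption, a common heuristic rather than part of the slides):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])   # rank 1, n = 3
U, s, Vt = np.linalg.svd(A)                        # full SVD: Vt is 3x3

# Count singular values above a small tolerance to get the numerical rank
tol = max(A.shape) * np.finfo(float).eps * s[0]
rank = int(np.sum(s > tol))
nullity = A.shape[1] - rank
assert (rank, nullity) == (1, 2)                   # rank + nullity = n = 3

# Rows of Vt with (numerically) zero singular value span the nullspace
N = Vt[rank:]
assert np.allclose(A @ N.T, 0.0)                   # the plane x1 + x2 + x3 = 0
```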
Ax = b: Why SVD?

Most favourable case: the matrix A is square and invertible:
• Only in this case is the LU decomposition of A into the product LU of lower and upper triangular matrices the preferred solution method for x

In a host of practical cases, A has rank r < n (i.e. nullity > 0):
• Most right-hand side vectors b yield no solution: e.g. there is no x such that

  \[ \begin{bmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

• But some b have multiple solutions (actually, a whole subspace of them): e.g. for all x such that x_1 + x_2 + x_3 = 1,

  \[ \begin{bmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \]
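Both situations can be handled by the SVD-based pseudoinverse; a numpy sketch (an added illustration: for an unsolvable b it returns the least-squares answer, and for a b with a whole plane of solutions it returns the minimum-norm one):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])   # rank 1

# b = (1, 1): no exact solution; pinv (built on SVD) gives least squares
b_bad = np.array([1.0, 1.0])
x_ls = np.linalg.pinv(A) @ b_bad
assert not np.allclose(A @ x_ls, b_bad)            # no x solves this system

# b = (1, 2): a whole plane x1 + x2 + x3 = 1 of solutions;
# pinv picks the minimum-norm one, x = (1/3, 1/3, 1/3)
b_ok = np.array([1.0, 2.0])
x_min = np.linalg.pinv(A) @ b_ok
assert np.allclose(A @ x_min, b_ok)
assert np.allclose(x_min, [1/3, 1/3, 1/3])
```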
Ax = b: Why SVD?

Practical example: a linear 1D predictor. For a time series of m = 1000 measurements (r_i : i = 1, …, 1000), fit an unknown predictor r_i = a_1 r_{i−1} + a_2 r_{i−2} + a_3 r_{i−3}; n = 3 ≪ m:

\[
\begin{bmatrix} r_3 & r_2 & r_1 \\ r_4 & r_3 & r_2 \\ \vdots & \vdots & \vdots \\ r_{999} & r_{998} & r_{997} \end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}
=
\begin{bmatrix} r_4 \\ r_5 \\ \vdots \\ r_{1000} \end{bmatrix}
\]

The SVD A = UDV^T explicitly constructs orthonormal bases for both the nullspace and the range of a matrix!
• Columns of U whose same-numbered singular values σ_j ≠ 0 in D: an orthonormal set of basis vectors that span the range
• Columns of V whose same-numbered singular values σ_j = 0 in D: an orthonormal basis for the nullspace
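A runnable sketch of this predictor fitted by SVD-based least squares; the time series here is synthetic (an AR(3) process with made-up coefficients and noise level, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
true_a = np.array([0.5, 0.3, -0.2])    # made-up "unknown" predictor coefficients

# Generate 1000 samples of r_i = a1 r_{i-1} + a2 r_{i-2} + a3 r_{i-3} + noise
r = np.zeros(1000)
r[:3] = rng.standard_normal(3)
for i in range(3, 1000):
    r[i] = (true_a[0] * r[i-1] + true_a[1] * r[i-2] + true_a[2] * r[i-3]
            + 0.01 * rng.standard_normal())

# 997x3 matrix of lagged values (r_{i-1}, r_{i-2}, r_{i-3}); targets r_i
A = np.column_stack([r[2:999], r[1:998], r[0:997]])
b = r[3:1000]
a_hat, *_ = np.linalg.lstsq(A, b, rcond=None)   # solved via SVD internally
assert np.allclose(a_hat, true_a, atol=0.05)    # recovers the coefficients
```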
Principal Component Analysis (PCA)

Goal of PCA: to identify the most important properties revealed by an m×n matrix A of measurement data:

\[
A = [\mathbf{a}_1\ \mathbf{a}_2\ \dots\ \mathbf{a}_n] \equiv
\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix}
\]

PCA is also called the Karhunen–Loève transform
Practical Example 1

[Figure: SAR image 1, SAR image 2, PCA component 1, PCA component 2]

M. Shimada, M. Minamisawa, O. Isoguchi: A Study on Estimating the Forest Fire Scars in East Kalimantan Using JERS-1 SAR Data. http://www.eorc.jaxa.jp/INSAR-WS/meeting/paper/g2/g2.html
Practical Example 2: http://www.cacr.caltech.edu/SDA/images/

[Figure: SAR channels and the 3 top-rank principal components; colour-coded 3 original SAR channels vs. colour-coded 3 top-rank PC]
Principal Component Analysis (PCA)

Measurements a (the columns of the data matrix A above) are considered as linear combinations of the original properties (the principal components):

\[ \mathbf{a} = \alpha_1 \mathbf{u}_1 + \dots + \alpha_k \mathbf{u}_k \]

• The weights α in the combination are called loadings
• All loadings are nonzero

General assumption: the data have zero mean value, so the variance is the critical indicator of importance, large or small
Data Projection by PCA

The m×m covariance matrix Σ_n for n samples of m-dimensional data:

\[ \Sigma_n = \frac{1}{n-1}\left( \mathbf{a}_1\mathbf{a}_1^{\mathsf T} + \dots + \mathbf{a}_n\mathbf{a}_n^{\mathsf T} \right) = \frac{1}{n-1} AA^{\mathsf T} \]

• Best basis in the property space R^m: the eigenvectors u_1, u_2, … of AA^T
• Best basis in the sample space R^n: the eigenvectors v_1, v_2, … of A^T A
Data Projection by PCA

PCA: the orthogonal projection of data onto a lower-dimensional (k < m) linear subspace (the principal subspace)
• The variance of the projected data in the principal subspace is maximal (with respect to any other subspace of the same dimension k)
• Principal subspace: spanned by the k eigenvectors u_1, …, u_k of AA^T (i.e. of Σ_n, too) with the largest eigenvalues:

\[ \mathbf{a}_{[k]} = \alpha_1 \mathbf{u}_1 + \dots + \alpha_k \mathbf{u}_k; \qquad \alpha_i = \mathbf{u}_i^{\mathsf T}\mathbf{a} \equiv u_{i,1}a_1 + \dots + u_{i,m}a_m \]
Data Projection by PCA

• Column-orthogonal matrix of the k principal components: U_k = [u_1 … u_k]
• Projection matrix (projection to the principal subspace):

\[ P_k = U_k U_k^{\mathsf T} \equiv [\mathbf{u}_1\ \mathbf{u}_2\ \dots\ \mathbf{u}_k] \begin{bmatrix} \mathbf{u}_1^{\mathsf T} \\ \mathbf{u}_2^{\mathsf T} \\ \vdots \\ \mathbf{u}_k^{\mathsf T} \end{bmatrix} \]

• Original measurement vector a → projected vector P_k a

Another interpretation of PCA:
• The projection P_k to the principal subspace minimises the projection error \( \sum_{j=1}^{n} \| \mathbf{a}_j - P_k\mathbf{a}_j \|^2 \)
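A numpy sketch of the projection above (an added illustration on random zero-mean data): build the covariance matrix, take the top-k eigenvectors, and form the projection matrix P_k = U_k U_k^T.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 5, 200, 2
A = rng.standard_normal((m, n))
A -= A.mean(axis=1, keepdims=True)     # enforce the zero-mean assumption

Sigma = A @ A.T / (n - 1)              # m x m covariance matrix
lam, U = np.linalg.eigh(Sigma)         # eigenvalues in ascending order
Uk = U[:, ::-1][:, :k]                 # k eigenvectors with largest eigenvalues
Pk = Uk @ Uk.T                         # projection onto the principal subspace

# P_k is a projection: idempotent, and it keeps the principal axes fixed
assert np.allclose(Pk @ Pk, Pk)
assert np.allclose(Pk @ Uk, Uk)
```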
PCA: An Example

[Figure: the 10 data points and their principal axes]

\[
A = [\mathbf{a}_1 \dots \mathbf{a}_{10}] =
\begin{bmatrix} -3 & -2 & -2 & -1 & -1 & 1 & 1 & 2 & 2 & 3 \\ -2 & -3 & -2 & -1 & 1 & -1 & 1 & 2 & 3 & 2 \end{bmatrix}
\]

Variances (σ_{11}; σ_{22}) and covariances (σ_{12} ≡ σ_{21}):

\[ \sigma_{11} = \tfrac{1}{9}\left( a_{1,1}^2 + \dots + a_{1,10}^2 \right) = \tfrac{1}{9}\left( (-3)^2 + (-2)^2 + (-2)^2 + \dots + 2^2 + 3^2 \right) = \tfrac{38}{9} \]

\[ \sigma_{22} = \tfrac{1}{9}\left( a_{2,1}^2 + \dots + a_{2,10}^2 \right) = \tfrac{1}{9}\left( (-2)^2 + (-3)^2 + (-2)^2 + \dots + 3^2 + 2^2 \right) = \tfrac{38}{9} \]

\[ \sigma_{12} = \tfrac{1}{9}\left( a_{1,1}a_{2,1} + \dots + a_{1,10}a_{2,10} \right) = \tfrac{1}{9}\left( (-3)(-2) + (-2)(-3) + (-2)(-2) + \dots + 2 \cdot 3 + 3 \cdot 2 \right) = \tfrac{32}{9} \]
PCA: An Example (cont.)

[Figure: the data points with principal axes u_1, u_2, and their projection onto u_1]

\[
\text{Covariance matrix } \Sigma_{10} = \tfrac{1}{9}\begin{bmatrix} 38 & 32 \\ 32 & 38 \end{bmatrix}
\;\Rightarrow\;
\begin{vmatrix} 38-\lambda & 32 \\ 32 & 38-\lambda \end{vmatrix} = 0
\]

\[ \Rightarrow\; (38-\lambda)^2 - 32^2 \equiv (70-\lambda)(6-\lambda) = 0 \;\Rightarrow\; \text{eigenvalues and eigenvectors:} \]

\[
\lambda_1 = 70;\; \mathbf{u}_1 = \tfrac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix};\qquad
\lambda_2 = 6;\; \mathbf{u}_2 = \tfrac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}
\]

\[
\Rightarrow\; \text{Projection matrix } P_1 = \tfrac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \end{bmatrix} = \tfrac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
\;\Rightarrow\;
\]

\[
P_1 A = \begin{bmatrix} -2.5 & -2.5 & -2 & -1 & 0 & 0 & 1 & 2 & 2.5 & 2.5 \\ -2.5 & -2.5 & -2 & -1 & 0 & 0 & 1 & 2 & 2.5 & 2.5 \end{bmatrix}
\]
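The whole worked example can be reproduced numerically (an added illustration; as on the slides, the eigenproblem is solved for the scaled matrix 9 Σ_{10}):

```python
import numpy as np

A = np.array([[-3., -2, -2, -1, -1, 1, 1, 2, 2, 3],
              [-2., -3, -2, -1,  1, -1, 1, 2, 3, 2]])
Sigma = A @ A.T / 9
assert np.allclose(9 * Sigma, [[38, 32], [32, 38]])

lam, U = np.linalg.eigh(9 * Sigma)     # ascending order: [6, 70]
assert np.allclose(lam, [6.0, 70.0])

u1 = U[:, 1]                           # eigenvector of the largest eigenvalue
P1 = np.outer(u1, u1)                  # rank-1 projection (sign of u1 cancels)
assert np.allclose(P1, 0.5 * np.ones((2, 2)))
assert np.allclose((P1 @ A)[0],
                   [-2.5, -2.5, -2, -1, 0, 0, 1, 2, 2.5, 2.5])
```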
Limitations of SVD and PCA

• Matrices U and V in SVD are not at all sparse
  • Thus, there is a significant cost of computing and using them
  • Sparse matrices yield lower computational costs!
  • The sparseness of the very large matrices expected in computational science reflects the typical “local” behaviour of practical problems: a large world with small neighbourhoods
  • But orthogonal eigenvectors and singular vectors are not local…
• SVD and PCA work only with “2-dimensional” data (matrices)
  • Eigenvectors and the SVD have no perfect extension to data with a larger number of indices
  • E.g. tensors with three or four indices are used frequently in physics and engineering
Tucker Model

Multiway array: data indexed with 3 or more indices:

\[ \mathcal{Y}_{I \times J \times K} = \{ y_{ijk} : i = 1, \dots, I;\; j = 1, \dots, J;\; k = 1, \dots, K \} \]

• 1 index – a vector; 2 indices – a matrix; ≥3 indices – a tensor
• 3 ways (or modes) in a 3rd-order tensor: rows, columns, and tubes

Model of a 3rd-order tensor Y:

\[ y_{ijk} \approx \sum_{l=1}^{L} \sum_{m=1}^{M} \sum_{n=1}^{N} a_{il}\, b_{jm}\, c_{kn}\, g_{lmn} \]

i.e. the I×J×K tensor Y is approximated by an L×M×N core tensor G multiplied along its modes by the factor matrices A (I×L), B (J×M), and C (K×N)
Forming a Tucker Model

Decomposition of a tensor Y:
• 3 orthogonal mode matrices A, B, and C, weighted by the core 3-way array G of size L×M×N
• The core array G is analogous to the diagonal singular value matrix D in SVD

Unfolding of a tensor – its rearrangement into a matrix:
• One possible unfolding of a 3-way array Y_{I×J×K}: into an I×JK matrix Y_{(1)}, whose JK columns are the tensor’s columns stacked side by side for k = 1, then k = 2, …, then k = K
Forming a Tucker Model

3-mode SVD for decomposing a 3-way array Y_{I×J×K}:

1 Find the I×L matrix A: SVD of the unfolded I×JK matrix Y_{(1)} = UDV^T, taking A as the first L columns of the left matrix U; L ≤ JK
2 Find the J×M matrix B: SVD of the unfolded J×IK matrix Y_{(2)}, taking B from the left matrix of the SVD; M ≤ IK
3 Find the K×N matrix C: SVD of the unfolded K×IJ matrix Y_{(3)}, taking C from the left matrix of the SVD; N ≤ IJ
4 Find the core tensor G:

\[ g_{lmn} = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} a_{il}\, b_{jm}\, c_{kn}\, y_{ijk} \]
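The four steps above can be sketched in numpy (an added illustration of this higher-order SVD procedure on random data; full mode ranks L = I, M = J, N = K are assumed, so the model reconstructs Y exactly):

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, K = 4, 5, 3
Y = rng.standard_normal((I, J, K))

def unfold(T, mode):
    # Rearrange the 3-way array into a matrix with `mode` as the rows
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Steps 1-3: left singular vectors of each unfolding
A = np.linalg.svd(unfold(Y, 0), full_matrices=False)[0]   # I x I
B = np.linalg.svd(unfold(Y, 1), full_matrices=False)[0]   # J x J
C = np.linalg.svd(unfold(Y, 2), full_matrices=False)[0]   # K x K

# Step 4: core tensor g_lmn = sum_ijk a_il b_jm c_kn y_ijk
G = np.einsum('il,jm,kn,ijk->lmn', A, B, C, Y)

# Reconstruction y_ijk = sum_lmn a_il b_jm c_kn g_lmn
Y_hat = np.einsum('il,jm,kn,lmn->ijk', A, B, C, G)
assert np.allclose(Y, Y_hat)
```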