Page 1: SVD, SVD applications to LSA, non-negative matrix ...


SVD, SVD applications to LSA, non-negative matrix factorizations

Presented By: Sumedha Singla

Singular Value Decomposition (SVD)

Page 2: SVD, SVD applications to LSA, non-negative matrix ...


Singular Value Decomposition (SVD)

SVD of a matrix X:

X_{n×d} = U_{n×n} Σ_{n×d} V^T_{d×d}   or   X_{n×d} = U_{n×k} Σ_{k×k} V^T_{k×d}

• X: a set of n points in ℝ^d with rank k
• U: left singular vectors of X
• V: right singular vectors of X
• Σ: rectangular diagonal matrix with positive real entries

X = [u_1 … u_k … u_n] Σ [v_1 … v_k … v_d]^T, where Σ carries σ_1, …, σ_k on its diagonal.

X = U Σ V^T = u_1 σ_1 v_1^T + … + u_k σ_k v_k^T = ∑_{i=1}^{k} u_i σ_i v_i^T

Singular Value Decomposition (SVD)

SVD of a matrix X: X v_i = σ_i u_i

• SVD finds an orthogonal basis for the row space that gets transformed into an orthogonal basis for the column space.
• The columns of V and U are bases for the row and column spaces, respectively.
• U and V are orthonormal square matrices, i.e. V V^T = V^T V = I and U U^T = U^T U = I.
• Usually, U ≠ V.
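A quick numerical check of these identities (a minimal NumPy sketch; the 5 × 3 matrix below is an arbitrary example, not one from the slides):

```python
import numpy as np

# An arbitrary 5 x 3 example matrix (not taken from the slides).
X = np.array([[2., 0., 1.],
              [1., 3., 0.],
              [0., 1., 4.],
              [2., 2., 2.],
              [1., 0., 3.]])

# Thin SVD: U is 5 x 3, s holds the singular values, Vt is 3 x 3.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# X v_i = sigma_i u_i for every singular triplet.
for i in range(len(s)):
    assert np.allclose(X @ V[:, i], s[i] * U[:, i])

# Columns of U and V are orthonormal: U^T U = V^T V = I.
assert np.allclose(U.T @ U, np.eye(3))
assert np.allclose(V.T @ V, np.eye(3))

# X is the sum of rank-1 terms u_i sigma_i v_i^T.
X_rebuilt = sum(s[i] * np.outer(U[:, i], V[:, i]) for i in range(len(s)))
assert np.allclose(X, X_rebuilt)
print("all identities verified")
```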

Page 3: SVD, SVD applications to LSA, non-negative matrix ...


Motivation

• Goal: find the best k-dimensional subspace w.r.t. X (project X to ℝ^k, where k < d).
  • Minimize the sum of the squares of the perpendicular distances of the points to the subspace.

• Consider a set of 2-d points X_{n×2}, x_i ∈ ℝ², 1 ≤ i ≤ n.
  • Goal: find the best-fitting line through the origin w.r.t. X. Here, k = 1.
  • Best least-squares fit:
    • minimize ∑_i α_i² (perpendicular distances), or
    • maximize ∑_i β_i², i.e. the projections of x_i on the subspace.

• v: a unit vector in the direction of the best-fitting line through the origin w.r.t. X.
• β_i = |x_i · v|
• Best least-squares fit: maximize ∑_i β_i² = ‖X v‖².
• First singular vector: v_1 = argmax_{‖v‖=1} ‖X v‖
• First singular value: σ_1 = ‖X v_1‖
• Greedy approach for subsequent singular vectors: take the best-fit line perpendicular to v_1:
  v_2 = argmax_{v ⊥ v_1, ‖v‖=1} ‖X v‖

Singular Vectors: X v_i = σ_i u_i
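The argmax above can be computed numerically; a minimal sketch using power iteration on X^T X (one standard approach, not prescribed by the slide; the data matrix is an arbitrary random example):

```python
import numpy as np

def first_singular_direction(X, iters=500, seed=0):
    """Approximate v1 = argmax_{|v|=1} |X v| by power iteration on X^T X."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=X.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = X.T @ (X @ v)              # one power-iteration step on X^T X
        v = w / np.linalg.norm(w)
    return v, np.linalg.norm(X @ v)    # (v1, sigma1 = |X v1|)

X = np.random.default_rng(1).normal(size=(200, 5))
v1, sigma1 = first_singular_direction(X)
print(sigma1, np.linalg.svd(X, compute_uv=False)[0])   # the two values should agree
```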

Page 4: SVD, SVD applications to LSA, non-negative matrix ...


Intuitive Interpretation

A composition of three geometrical transformations: a rotation or reflection, a scaling, and another rotation or reflection.

X = U Σ V^T

• Consider a unit circle: x′ · x′ = 1.
• It can be turned into an ellipse of any size and orientation by stretching and rotating it.
• Consider 2-d points and fit an ellipse with major axis a and minor axis b to them.
• Consider

  S = [a 0; 0 b],   R = [cos θ  sin θ; −sin θ  cos θ]

• Any point can be transformed as x′ = x R S⁻¹.
• The equation of the unit circle becomes (S⁻¹ R^T x^T) · (x R S⁻¹) = 1.

Intuitive Interpretation   X = U Σ V^T

Page 5: SVD, SVD applications to LSA, non-negative matrix ...


• Resulting matrix equation:

  S⁻¹ R^T X^T X R S⁻¹ = 1

• If we regard X as a collection of points, then:
  • the singular values give the lengths of the axes of a least-squares fitted ellipsoid,
  • V is the orientation of the ellipsoid,
  • the matrix U gives the projection of each of the points in X onto the axes.

Intuitive Interpretation   X = U Σ V^T
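A small numerical illustration of this reading, on synthetic mean-centered 2-d points (an assumed example, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.pi / 6
R = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

# Synthetic 2-d cloud: stretched along one axis, then rotated and centered.
pts = rng.normal(size=(1000, 2)) @ np.diag([3.0, 1.0]) @ R
pts -= pts.mean(axis=0)

U, s, Vt = np.linalg.svd(pts, full_matrices=False)
print(Vt)                          # rows of V^T: orientation of the fitted ellipse axes
print(s / np.sqrt(len(pts) - 1))   # ~ [3, 1]: relative lengths of the two axes
```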

• Natural Language Processing
• Documents with 2 concepts:
  • Computer Science (CS)
  • Medical Documents (MD)

SVD Example   X_{n×d} = U_{n×k} Σ_{k×k} V^T_{k×d}

• Term-Document Matrix X: one row per document, one column per term.
• Document-Concept Similarity Matrix U: one row per document, one column per concept.
• Concept Strength Matrix Σ: one row per concept.
• Term-Concept Matrix V^T: one row per concept, one column per term.

Page 6: SVD, SVD applications to LSA, non-negative matrix ...


Eigenvectors

• An eigenvector of a square matrix X is a nonzero vector v such that multiplication by X alters only the scale of v:

  X v = λ v

• λ: eigenvalue
• v: unit eigenvector

Eigenvalue Decomposition

X = V diag(λ) V⁻¹, where
• eigenvector matrix V = [v_1, …, v_n]
• diagonal matrix diag(λ) with eigenvalues λ_1, …, λ_n

More general form: X = Q Λ Q^T

Eigenvalue decomposition   X = U Σ V^T

• Eigenvalue decomposition: X = Q Λ Q^T
• For the SVD to coincide with it, X needs:
  • orthonormal eigenvectors, to allow U = V = Q,
  • eigenvalues λ ≥ 0, so that Λ = Σ.
• Hence, X must be a positive semi-definite (or definite) symmetric matrix.
• For such matrices, the eigenvalue decomposition is a special case of the SVD.

When are the singular values the same as the eigenvalues?

X = U Σ V^T

Page 7: SVD, SVD applications to LSA, non-negative matrix ...


Rather than solving for U, V, and Σ simultaneously, we multiply both sides by X^T = V Σ^T U^T:

X^T X = (U Σ V^T)^T (U Σ V^T)
      = V Σ^T U^T U Σ V^T
      = V Σ^T Σ V^T
      = V Σ² V^T

This has the form of an eigenvalue decomposition X = Q Λ Q^T:

• V: the eigenvectors of X^T X.
• Σ^T Σ: the eigenvalue matrix of X^T X, so σ_i = √λ_i.
• U: the eigenvectors of X X^T.

Calculating the SVD using the eigenvalue decomposition   X = U Σ V^T

We know that

u_i^T u_j = (X v_i / σ_i)^T (X v_j / σ_j)

u_i^T u_j = v_i^T X^T X v_j / (σ_i σ_j) = (σ_j² / (σ_i σ_j)) v_i^T v_j = 0   for i ≠ j,

since X^T X v_j = σ_j² v_j and v_i^T v_j = 0 for i ≠ j.

• U: the orthonormal eigenvectors of X X^T.

We can thus write

X X^T U = U Σ²

SVD and eigenvalue decomposition   X v_i = σ_i u_i
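A minimal sketch of this recipe, assuming X has full column rank so that every σ_i > 0 (the test matrix is an arbitrary random example):

```python
import numpy as np

def svd_via_eig(X):
    """Recover a thin SVD of X from the eigendecomposition of X^T X
    (assumes X has full column rank so every sigma_i > 0)."""
    evals, V = np.linalg.eigh(X.T @ X)     # X^T X = V diag(lambda) V^T
    order = np.argsort(evals)[::-1]        # sort eigenvalues in descending order
    evals, V = evals[order], V[:, order]
    sigma = np.sqrt(evals)                 # sigma_i = sqrt(lambda_i)
    U = (X @ V) / sigma                    # u_i = X v_i / sigma_i
    return U, sigma, V.T

X = np.random.default_rng(0).normal(size=(6, 3))
U, s, Vt = svd_via_eig(X)
print(np.allclose(U @ np.diag(s) @ Vt, X))                  # reconstructs X
print(np.allclose(s, np.linalg.svd(X, compute_uv=False)))   # matches np.linalg.svd
```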

Page 8: SVD, SVD applications to LSA, non-negative matrix ...


• Consider X = [4 4; −3 3].

• Compute X^T X = [4 −3; 4 3] [4 4; −3 3] = [25 7; 7 25].

• Orthonormal eigenvectors of X^T X:
  v_1 = [1/√2, 1/√2]^T and v_2 = [1/√2, −1/√2]^T

• Eigenvalues of X^T X: σ_1² = 32 and σ_2² = 18.

• We have X = U Σ V^T with

  Σ = [4√2 0; 0 3√2],   V^T = [1/√2 1/√2; 1/√2 −1/√2],

  and U still to be determined (next slide).

Example SVD

• Consider X = [4 4; −3 3].

• Compute X X^T = [4 4; −3 3] [4 −3; 4 3] = [32 0; 0 18].

• Orthonormal eigenvectors of X X^T:
  u_1 = [1, 0]^T and u_2 = [0, −1]^T

• We have

  [4 4; −3 3] = [1 0; 0 −1] [4√2 0; 0 3√2] [1/√2 1/√2; 1/√2 −1/√2]

Example SVD
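The same example checked numerically (note that NumPy may flip the signs of individual singular vectors):

```python
import numpy as np

# Numerically check the worked example.
X = np.array([[4., 4.], [-3., 3.]])
U, s, Vt = np.linalg.svd(X)

print(np.round(s, 4))                        # [5.6569, 4.2426] = [4*sqrt(2), 3*sqrt(2)]
print(np.allclose(s**2, [32., 18.]))         # eigenvalues of X^T X (and of X X^T)
print(np.allclose(U @ np.diag(s) @ Vt, X))   # X = U Sigma V^T
```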

Page 9: SVD, SVD applications to LSA, non-negative matrix ...


• An eigendecomposition is valid only for square matrices; any matrix (even a rectangular one) has an SVD.

• In the eigendecomposition X = Q Λ Q⁻¹, the eigenbasis Q is not always orthogonal; the basis of singular vectors is always orthogonal.

• In the SVD we have two singular spaces (right and left).

• Computing the SVD of a matrix is more numerically stable.

SVD vs Eigendecomposition

• The covariance matrix of X (with mean-centered columns) is given by

  Cov = X^T X / (n − 1)

• The eigenvalue decomposition of the Cov matrix:

  Cov = Q Λ Q^T

  where
  • Q is the matrix of eigenvectors of Cov, i.e. the principal axes of X,
  • Λ is a diagonal matrix with the eigenvalues λ_i in decreasing order on the diagonal.

SVD and PCA   X = U Σ V^T

Page 10: SVD, SVD applications to LSA, non-negative matrix ...


• We can rewrite the covariance matrix of X as

  Cov = X^T X / (n − 1) = V Σ U^T U Σ V^T / (n − 1) = V (Σ² / (n − 1)) V^T

• The right singular vectors V are the principal axes.
• λ_i = σ_i² / (n − 1)
• X V = U Σ V^T V = U Σ
• The columns of U Σ are the principal components.

SVD and PCA   X = U Σ V^T
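A small numerical check of these relations, assuming the columns of X have been mean-centered as PCA requires (the data are an arbitrary random example):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))
X -= X.mean(axis=0)                    # PCA assumes mean-centered columns

U, s, Vt = np.linalg.svd(X, full_matrices=False)
n = X.shape[0]

cov = X.T @ X / (n - 1)
evals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues of Cov in decreasing order

print(np.allclose(evals, s**2 / (n - 1)))   # lambda_i = sigma_i^2 / (n - 1)
print(np.allclose(X @ Vt.T, U * s))         # principal components: X V = U Sigma
```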

• Input data: X_{n×d}.
• Goal: reduce the dimensionality to k, where k < d.
• Select the first k columns of U, and the k × k upper-left part of Σ.
• Construct P = U_k Σ_{k×k}.
• P is the required n × k matrix containing the first k PCs.

SVD for dimensionality reduction   X = U Σ V^T
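A minimal sketch of this recipe as a reusable helper (it assumes X is already mean-centered; the name svd_reduce is illustrative only):

```python
import numpy as np

def svd_reduce(X, k):
    """Return P = U_k Sigma_k, the n x k matrix of the first k principal components
    (assumes X has already been mean-centered)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

X = np.random.default_rng(0).normal(size=(100, 10))
X -= X.mean(axis=0)
P = svd_reduce(X, k=3)
print(P.shape)   # (100, 3)
```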

Page 11: SVD, SVD applications to LSA, non-negative matrix ...


The best approximation of X by a rank-deficient matrix is obtained from the top singular values and vectors of X:

X_k = ∑_{i=1}^{k} u_i σ_i v_i^T

Then

min_{B ∈ ℝ^{n×d}, rank(B) ≤ k} ‖X − B‖_2 = ‖X − X_k‖_2 = σ_{k+1}

• σ_{k+1} is the largest singular value of X − X_k.
• X_k is the best rank-k 2-norm approximation of X.

Rank-k approximation in the spectral norm

X = ∑_{i=1}^{d} u_i σ_i v_i^T
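A quick numerical check of this bound (an arbitrary random matrix, k = 2):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 6))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # X_k = sum_{i=1..k} u_i sigma_i v_i^T

# The spectral-norm error of the best rank-k approximation is sigma_{k+1}.
print(np.isclose(np.linalg.norm(X - Xk, ord=2), s[k]))   # True
```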

• Determining range, null space and rank (also numerical rank).
• Matrix approximation.
• Inverse and pseudo-inverse:
  • If X = U Σ V^T and Σ is full rank, then X⁻¹ = V Σ⁻¹ U^T.
  • If Σ is singular, then the pseudo-inverse is given by X† = V Σ† U^T, where Σ† is formed by replacing every nonzero entry by its reciprocal.
• Least squares:
  • If we need to solve A x = b in the least-squares sense, then x_LS = V Σ† U^T b.
• Denoising: small singular values typically correspond to noise.

Applications of SVD
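A minimal sketch of the least-squares recipe via the pseudo-inverse, checked against NumPy's lstsq (A and b are arbitrary examples):

```python
import numpy as np

# Least squares via the SVD pseudo-inverse: x_LS = V Sigma^+ U^T b.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))
b = rng.normal(size=6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_pinv = np.where(s > 1e-12, 1.0 / s, 0.0)    # Sigma^+: reciprocal of nonzero entries
x_ls = Vt.T @ (s_pinv * (U.T @ b))

print(np.allclose(x_ls, np.linalg.lstsq(A, b, rcond=None)[0]))   # matches lstsq
```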

Page 12: SVD, SVD applications to LSA, non-negative matrix ...


• Input matrix: term-document matrix.
  • Rows: represent words.
  • Columns: represent documents.
  • Value: the count of the word in the document.
• Example:

Latent Semantic Analysis using SVD

X =
[ 1 1 1 0 0 0
  0 0 0 0 1 0
  0 1 0 0 0 0
  1 1 1 1 1 1
  0 1 0 0 1 0 ]

• Consider X, the term-document matrix.
• Then:
  • U is the SVD term matrix,
  • V is the SVD document matrix.
• SVD provides a low-rank approximation of X.
• Constrained optimization problem:
  • Goal: represent X as X_k with a low Frobenius norm for the error X − X_k.

Latent Semantic Indexing (LSI)   X_{n×d} = U_{n×k} Σ_{k×k} V^T_{k×d}

Page 13: SVD, SVD applications to LSA, non-negative matrix ...


Latent Semantic Indexing (LSI)   X_{n×d} = U_{n×k} Σ_{k×k} V^T_{k×d}

k = 2

• We can get rid of the zero-valued columns and rows and keep a 2 × 2 concept-strength matrix.
• We can get rid of the zero-valued columns and keep a 5 × 2 term-to-concept similarity matrix.
• We can get rid of the zero-valued columns and keep a 2 × 6 concept-to-document similarity matrix.
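A minimal sketch of this k = 2 truncation on the example term-document matrix from the previous page (rows as terms, matching the Page 12 layout):

```python
import numpy as np

# The example term-document matrix from the previous page (5 terms x 6 documents).
X = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1, 0],
              [0, 1, 0, 0, 0, 0],
              [1, 1, 1, 1, 1, 1],
              [0, 1, 0, 0, 1, 0]], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
U_k  = U[:, :k]          # 5 x 2 term-to-concept similarity matrix
S_k  = np.diag(s[:k])    # 2 x 2 concept-strength matrix
Vt_k = Vt[:k, :]         # 2 x 6 concept-to-document similarity matrix

X_k = U_k @ S_k @ Vt_k   # rank-2 (k = 2) approximation of X
print(np.round(X_k, 2))
```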

Page 14: SVD, SVD applications to LSA, non-negative matrix ...


Latent Semantic Indexing (LSI)

[Figure: a query Q over the terms cosmonaut, astronaut, moon, car, truck is compared with document d2. In the original term space cos(Q, d2) = 0; in the reduced latent semantic space cos(Q, d2) ≈ 0.88.]

We see that the query is not related to document 2 in the original space, but in the latent semantic space they become highly related.

• SVD allows words and documents to be mapped into the same "latent semantic space".
• LSI projects queries and documents into a space with latent semantic dimensions.
  • Co-occurring words are projected onto the same dimensions.
  • Non-co-occurring words are projected onto different dimensions.
• LSI captures similarities between words.
  • For example, we want to project "car" and "automobile" onto the same dimension.
• Dimensions of the reduced semantic space correspond to the axes of greatest variation in the original space.

Latent Semantic Indexing (LSI)
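To compare a new query with documents in the reduced space, the standard LSI recipe folds the query in as q_hat = Σ_k⁻¹ U_k^T q; a minimal sketch (the folding step is not spelled out on the slides, and the one-term query below is hypothetical):

```python
import numpy as np

# Same 5 x 6 term-document matrix as above (rows = terms, columns = documents).
X = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1, 0],
              [0, 1, 0, 0, 0, 0],
              [1, 1, 1, 1, 1, 1],
              [0, 1, 0, 0, 1, 0]], dtype=float)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2

def fold_in(q):
    """q_hat = Sigma_k^{-1} U_k^T q: map a term-count query into the latent space."""
    return (U[:, :k].T @ q) / s[:k]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

q = np.array([0., 0., 1., 0., 0.])      # a hypothetical single-term query
doc_latent = Vt[:k, :]                  # column j: document j in the latent space
sims = [cosine(fold_in(q), doc_latent[:, j]) for j in range(X.shape[1])]
print(np.round(sims, 2))                # query-document similarities after reduction
```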

Page 15: SVD, SVD applications to LSA, non-negative matrix ...


• Extract information from the link structure of a hyperlinked environment to rank pages relevant to a topic.
• Essentials:
  • Authorities
  • Hubs
• Goal: identify good authorities and hubs for a topic.
• Each page receives two scores:
  • Authority score A(p): estimates the value of the content on the page.
  • Hub score H(p): estimates the value of the links on the page.

Kleinberg's Algorithm: Hyperlink-Induced Topic Search (HITS), aka "hubs and authorities"

• For a topic, authorities are relevant nodes which are referred to by many hubs (high in-degree).
• For a topic, hubs are nodes which connect to many related authorities for that topic (high out-degree).

Authorities and Hubs

Page 16: SVD, SVD applications to LSA, non-negative matrix ...


• Three steps:
  1. Create a focused base set of the Web.
     • Start with a root set.
     • Add any page pointed to by a page in the root set.
     • Add any page that points to a page in the root set (at most d).
     • The extended root set becomes our base set.
  2. Iteratively compute hub and authority scores.
     • A(p): sum of H(q) for all q pointing to p.
     • H(q): sum of A(p) for all p that q points to.
     • Start with all scores equal to 1, and iterate until convergence.
  3. Filter out the top hubs and authorities.

HITS (cont.)

• G (the base set) is a directed graph with web pages as nodes and their links as edges.
• G can be represented as a connectivity matrix A:
  • A(i, j) = 1 only if the i-th page points to the j-th page.
• Authority weights can be represented as a unit vector a:
  • a_i: the authority weight of the i-th page.
• Hub weights can be represented as a unit vector h:
  • h_i: the hub weight of the i-th page.

Matrix Notation

Page 17: SVD, SVD applications to LSA, non-negative matrix ...


• Updating authority weights: a = A^T h
• Updating hub weights: h = A a
• After one iteration:
  a_1 = A^T h_0, h_1 = A a_1
  → h_1 = A A^T h_0; after k iterations, h_k = (A A^T)^k h_0
• Convergence:
  • a_k converges to the principal eigenvector of A^T A.
  • h_k converges to the principal eigenvector of A A^T.

Algorithm
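A minimal sketch of the iteration in this matrix form (the 4-page link graph below is a hypothetical example):

```python
import numpy as np

def hits(A, iters=100):
    """HITS power iteration on connectivity matrix A (A[i, j] = 1 if page i links to page j)."""
    n = A.shape[0]
    h = np.ones(n)
    a = np.ones(n)
    for _ in range(iters):
        a = A.T @ h                    # authority update: a = A^T h
        a /= np.linalg.norm(a)
        h = A @ a                      # hub update: h = A a
        h /= np.linalg.norm(h)
    return a, h

# A hypothetical 4-page graph: pages 0 and 1 both link to pages 2 and 3.
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 0],
              [0, 0, 0, 0]], dtype=float)
a, h = hits(A)
print(np.round(a, 3))   # authority weight concentrates on pages 2 and 3
print(np.round(h, 3))   # hub weight concentrates on pages 0 and 1
```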

Given A ∈ ℝ₊^{n×d} and a desired rank k ≪ min(n, d), find W ∈ ℝ₊^{n×k} and H ∈ ℝ₊^{k×d} such that A ≈ W H.

• min_{W≥0, H≥0} ‖A − W H‖_F
• Nonconvex.
• W and H are not unique (e.g. W′ = W D ≥ 0, H′ = D⁻¹ H ≥ 0 for a suitable invertible D).

Notation: ℝ₊ denotes the nonnegative real numbers.

Nonnegative Matrix Factorization (NMF)
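A minimal sketch of one way to compute such a factorization, Lee-Seung multiplicative updates (the slides do not prescribe an algorithm; the data matrix is an arbitrary nonnegative example):

```python
import numpy as np

def nmf(A, k, iters=500, eps=1e-10, seed=0):
    """Approximate A (nonnegative) as W @ H with W, H >= 0 by minimizing ||A - WH||_F,
    using Lee-Seung multiplicative updates (one standard NMF algorithm)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    W = rng.random((n, k))
    H = rng.random((k, d))
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # update H; stays nonnegative
        W *= (A @ H.T) / (W @ H @ H.T + eps)   # update W; stays nonnegative
    return W, H

A = np.abs(np.random.default_rng(1).normal(size=(20, 10)))   # nonnegative data matrix
W, H = nmf(A, k=3)
print(np.linalg.norm(A - W @ H))   # Frobenius-norm reconstruction error
```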

Page 18: SVD, SVD applications to LSA, non-negative matrix ...


• SVD gives A = U Σ V^T.
• Then, for the rank-k truncation, ‖A − U_k Σ_k V_k^T‖_F ≤ min_{W≥0, H≥0} ‖A − W H‖_F.
• Then WHY NMF?
• NMF can work better because of its non-negativity constraints. Example:
  • Text mining (A is represented as counts, so it is nonnegative).

Nonnegative Matrix Factorization (NMF)
