
Page 1: SVD – Applications (laber/SVD-Applications.pdf, 18/09/2017)

Singular Value Decomposition

SVD – Example: Users-to-Movies

• A = U Σ V^T – example: Users to Movies

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

Ratings matrix A (m × n; rows = users, columns = movies; the first four users are SciFi fans, the last three Romance fans):

          Matrix  Alien  Serenity  Casablanca  Amelie
            1       1       1          0          0
            3       3       3          0          0
            4       4       4          0          0
            5       5       5          0          0
            0       2       0          4          4
            0       0       0          5          5
            0       1       0          2          2

A factors as U Σ V^T, where the columns of U and the rows of V^T are the “concepts” (AKA latent dimensions, AKA latent factors):

    U                       Σ                    V^T
    0.13  0.02  -0.01       12.4   0     0       0.56   0.59  0.56   0.09   0.09
    0.41  0.07  -0.03        0     9.5   0       0.12  -0.02  0.12  -0.69  -0.69
    0.55  0.09  -0.04        0     0     1.3     0.40  -0.80  0.40   0.09   0.09
    0.68  0.11  -0.05
    0.15 -0.59   0.65
    0.07 -0.73  -0.67
    0.07 -0.29   0.32

The first concept is the SciFi-concept, the second the Romance-concept.

Page 2

SVD – Example: Users-to-Movies

• A = U Σ V^T (same factorization as above)
• U is the “user-to-concept” similarity matrix: its first column scores each user on the SciFi-concept, its second column on the Romance-concept

SVD – Example: Users-to-Movies

• The diagonal entries of Σ give the “strength” of each concept: 12.4 is the strength of the SciFi-concept, 9.5 the strength of the Romance-concept

SVD – Example: Users-to-Movies

• V is the “movie-to-concept” similarity matrix: the first row of V^T scores each movie on the SciFi-concept

SVD – Interpretation #1

‘movies’, ‘users’ and ‘concepts’:
• U: user-to-concept similarity matrix
• V: movie-to-concept similarity matrix
• Σ: its diagonal elements give the ‘strength’ of each concept
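The decomposition on these slides can be reproduced with a few lines of NumPy (a sketch, not part of the original slides; signs of the singular vectors may differ, which is fine since they are determined only up to sign):

```python
import numpy as np

# Users-to-movies ratings from the slides; columns are
# [Matrix, Alien, Serenity, Casablanca, Amelie].
A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
], dtype=float)

# Thin SVD: A = U @ diag(sigma) @ Vt.
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# A has rank 3, so only three singular values are essentially nonzero;
# they are approximately the slides' 12.4, 9.5 and 1.3.
print(sigma)
```

Since A has two pairs of identical columns, its rank is 3 and the last two singular values vanish, which is why the slides show only three concepts.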

Page 3

Case study: How to query?

• Q: Find users that like ‘Matrix’
• A: Map the query into the ‘concept space’ – how?

Case study: How to query?

• Q: Find users that like ‘Matrix’
• A: Map the query into the ‘concept space’ – how?

q = [5 0 0 0 0]   (Matrix, Alien, Serenity, Casablanca, Amelie)

Project into concept space: take the inner product of q with each ‘concept’ vector v_i (q·v1 gives the SciFi coordinate, q·v2 the Romance coordinate)

Case study: How to query?

Compactly, we have: q_concept = q V

E.g., with the movie-to-concept similarities V:

q = [5 0 0 0 0]  ×   0.56   0.12
                     0.59  -0.02
                     0.56   0.12    =  [2.8  0.6]
                     0.09  -0.69       (SciFi-concept, Romance-concept)
                     0.09  -0.69

Page 4

Case study: How to query?

• How would the user d who rated (‘Alien’, ‘Serenity’) be handled? d_concept = d V

E.g., with the movie-to-concept similarities V:

d = [0 4 5 0 0]  ×   0.56   0.12
                     0.59  -0.02
                     0.56   0.12    =  [5.2  0.4]
                     0.09  -0.69       (SciFi-concept, Romance-concept)
                     0.09  -0.69

Case study: How to query?

• Observation: user d, who rated (‘Alien’, ‘Serenity’), will be similar to user q, who rated (‘Matrix’), although d and q have zero ratings in common!

q = [5 0 0 0 0]  →  q_concept = [2.8  0.6]
d = [0 4 5 0 0]  →  d_concept = [5.2  0.4]

Zero ratings in common, but similarity ≠ 0 in concept space.
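The projection can be sketched as follows (an assumed illustration, not in the slides; with the rounded V entries printed above, d_concept comes out as roughly (5.2, 0.5), close to the slides' (5.2, 0.4)):

```python
import numpy as np

# First two columns of V from the slides (rows: Matrix, Alien,
# Serenity, Casablanca, Amelie; columns: SciFi, Romance concepts).
V = np.array([
    [0.56,  0.12],
    [0.59, -0.02],
    [0.56,  0.12],
    [0.09, -0.69],
    [0.09, -0.69],
])

q = np.array([5.0, 0, 0, 0, 0])    # rated only 'Matrix'
d = np.array([0, 4.0, 5.0, 0, 0])  # rated only 'Alien' and 'Serenity'

q_concept = q @ V   # [2.8, 0.6]
d_concept = d @ V   # roughly [5.2, 0.5] with these rounded values

# q and d share no ratings, yet are nearly parallel in concept space.
cos = q_concept @ d_concept / (
    np.linalg.norm(q_concept) * np.linalg.norm(d_concept))
```

The cosine similarity in concept space is close to 1 even though the raw inner product q·d is exactly 0.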

Centering the Data

• In some applications it is useful to center the data before applying the SVD
  – Centering consists of subtracting from each row of the matrix A the mean of the rows (the centroid of the points represented by the rows of A)

Centering the Data

• Applying the SVD without centering the data yields the subspace (a hyperplane through the origin) that minimizes the sum of squared perpendicular distances to the data points of A
• Applying the SVD after centering the data yields the affine space that minimizes the sum of squared perpendicular distances to the data points of A
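A minimal sketch of the centering step (the data matrix here is made up for illustration):

```python
import numpy as np

# Toy data matrix: rows are points in R^2 (hypothetical values).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 9.0]])

centroid = A.mean(axis=0)   # mean of the rows of A
A_centered = A - centroid   # subtract the centroid from each row

# SVD of the centered matrix: v1 spans the best-fit line through the
# centroid; without centering, the fitted line would be forced
# through the origin instead.
U, s, Vt = np.linalg.svd(A_centered, full_matrices=False)
```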

Page 5

Centering the Data

Data Compression

Scenario
• 130 images of handwritten ‘3’s
  – each represented as a 16×16 matrix
  – each representation needs 16×16 floats ⇒ the images live in R^256

Data Compression
• A model with two principal directions
• In total there are 16×16 = 256 possible directions
• The 12 leading principal directions account for 63% of the variance, and 50 of them for 90%

Page 6

Data Compression

(Points circled in the left-hand plot)

Data Compression
• The previous figure shows the collection of ‘3’ digits projected onto the space of the first two principal directions

Any interpretation for the components?

Data Compression
• The previous figure shows the collection of ‘3’ digits projected onto the space of the first two principal directions
  – First component: horizontal movement of the lower part of the ‘3’
  – Second component: stroke thickness

Data Compression

Page 7

Approximate Similarity

Input. A matrix A representing a collection of n documents over a vocabulary of d terms.

Output. Given a set of queries (new documents) q(1), …, q(p), efficiently determine the similarity (inner product) between each document and each query.

Approach 1
• The vector of similarities between A and a query q is given by Aq
• It can be computed in O(nd)

Approach 2
• Use A_k instead of A
  – We have A_k = σ1 u1 v1^T + … + σk uk vk^T, where each ui is a vector in R^n and each vi is a vector in R^d
  – Hence each term σi ui vi^T q can be computed in O(n + d):
    • compute vi^T q in O(d), then σi ui (vi^T q) in O(n)
  – Total complexity: O(k(n + d))
  – Advantageous if k << min{n, d}
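Approach 2 can be sketched as follows: evaluating the rank-k product factor by factor never forms A_k explicitly (matrix sizes here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 100, 5
A = rng.standard_normal((n, d))
q = rng.standard_normal(d)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]  # top-k factors: O(k(n+d)) storage

# O(k(n+d)) evaluation: k inner products v_i^T q (each O(d)),
# then a combination of k scaled u_i vectors (each O(n)).
approx = Uk @ (sk * (Vtk @ q))

# Same result as explicitly forming A_k and multiplying, which is O(nd).
Ak = Uk @ np.diag(sk) @ Vtk
```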

Approximate Similarity

Approach 2
• What is the error of using A_k?
  – It is the maximum over q of |Aq − A_k q| = |(A − A_k) q|
  – The error can be unbounded if q is unconstrained

Page 8

Approximate Similarity

Def. The spectral norm ||A||₂ of A is equal to
      max{ |Ax| : |x| ≤ 1 }

• ||A − A_k||₂ measures the largest possible error over vectors of norm at most 1
• It can be shown (Thm 3.9) that
      ||A − A_k||₂ ≤ ||A − B||₂
  for every matrix B of rank at most k
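This optimality can be checked numerically: the spectral-norm error of the rank-k truncation equals the (k+1)-th singular value, which no rank-k matrix can beat (a sketch with a random matrix, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
# Best rank-k approximation by truncated SVD.
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# ||A - A_k||_2 equals sigma_{k+1} (the first discarded singular value).
err = np.linalg.norm(A - Ak, ord=2)
```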

Document Ranking

• A collection of n documents over a vocabulary of d terms
• We can represent the collection by a matrix A with n rows and d columns

How do we rank the documents of the collection according to their intrinsic importance?

Ranking
1. Compute the principal direction v1 of the collection (the best one-dimensional representation of the collection under the Frobenius norm)
2. Project each document onto v1 and sort from largest to smallest (the coordinates of u1)
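The two ranking steps can be sketched as follows (the term-count matrix is made up for illustration):

```python
import numpy as np

# Hypothetical collection: rows = documents, columns = term counts.
A = np.array([
    [3, 0, 1, 0],
    [2, 1, 0, 0],
    [0, 0, 2, 5],
    [1, 1, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
v1 = Vt[0]       # step 1: principal direction of the collection

scores = A @ v1  # step 2: projection of each document onto v1
# Sort by magnitude (v1 is determined only up to sign).
order = np.argsort(-np.abs(scores))
```

Because the rows of V^T are orthonormal, A v1 = σ1 u1, so these projections are exactly σ1 times the coordinates of u1, as the slide states.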

HITS: Hubs and Authorities

Page 9

Hubs and Authorities

• HITS (Hypertext-Induced Topic Selection)
  – A measure of importance of pages or documents, similar to PageRank
  – Proposed at around the same time as PageRank (’98)
• Goal: Say we want to find good newspapers
  – Don’t just find newspapers. Find “experts” – people who link in a coordinated way to good newspapers
• Idea: Links as votes
  – A page is more important if it has more links
    • In-coming links? Out-going links?

Finding newspapers

• Hubs and Authorities: each page has 2 scores:
  – Quality as an expert (hub):
    • Total sum of votes of the authorities it points to
  – Quality as content (authority):
    • Total sum of votes coming from experts
• Principle of repeated improvement

(Example scores from the figure: NYT: 10, WSJ: 9, CNN: 8, Ebay: 3, Yahoo: 3)

Hubs and Authorities

Interesting pages fall into two classes:
1. Authorities are pages containing useful information
   – Newspaper home pages
   – Course home pages
   – Home pages of auto manufacturers
2. Hubs are pages that link to authorities
   – Lists of newspapers
   – Course bulletins
   – Lists of US auto manufacturers

Counting in-links: Authority

Each page starts with hub score 1. Authorities collect their votes.

(Note: this is an idealized example. In reality the graph is not bipartite and each page has both a hub and an authority score.)

Page 10

Counting in-links: Authority

Each page starts with hub score 1. Authorities collect their votes: the authority score of NYT is the sum of the hub scores of the nodes pointing to it.

(Note: this is an idealized example. In reality the graph is not bipartite and each page has both a hub and an authority score.)

Expert Quality: Hub

Hubs collect authority scores: the hub score of a node is the sum of the authority scores of the nodes it points to.

(Note: this is an idealized example. In reality the graph is not bipartite and each page has both a hub and an authority score.)

Reweighting

Authorities again collect the hub scores.

(Note: this is an idealized example. In reality the graph is not bipartite and each page has both a hub and an authority score.)

Mutually Recursive Definition

• A good hub links to many good authorities
• A good authority is linked from many good hubs
• Model using two scores for each node:
  – Hub score and Authority score
  – Represented as vectors h and a

Page 11

Hubs and Authorities                                      [Kleinberg ’98]

• Each page i has 2 scores:
  – Authority score: a_i
  – Hub score: h_i

HITS algorithm:
• Initialize: a_j^(0) = 1/√N, h_j^(0) = 1/√N
• Then keep iterating until convergence:
  – ∀i: Authority: a_i^(t+1) = Σ_{j→i} h_j^(t)
  – ∀i: Hub: h_i^(t+1) = Σ_{i→j} a_j^(t)
  – ∀i: Normalize: Σ_i (a_i^(t+1))² = 1, Σ_j (h_j^(t+1))² = 1

Hubs and Authorities                                      [Kleinberg ’98]

• HITS converges to a single stable point
• Notation:
  – Vectors a = (a_1, …, a_n), h = (h_1, …, h_n)
  – Adjacency matrix A (N×N): A_ij = 1 if i→j, 0 otherwise
• Then h_i = Σ_{i→j} a_j can be rewritten as h_i = Σ_j A_ij · a_j
  So: h = A · a
• Similarly, a_i = Σ_{j→i} h_j can be rewritten as a_i = Σ_j A_ji · h_j, so a = A^T · h

Hubs and Authorities

• HITS algorithm in vector notation:
  – Set: a_i = h_i = 1/√n
  – Repeat until convergence:
    • h = A · a      (new h)
    • a = A^T · h    (new a)
    • Normalize a and h
• Then: a = A^T · (A · a)
  – a is updated (in 2 steps): a = A^T (A a) = (A^T A) a
  – h is updated (in 2 steps): h = A (A^T h) = (A A^T) h
  – Repeated matrix powering

Convergence criterion:
  Σ_i (h_i^(t) − h_i^(t−1))² < ε   and   Σ_i (a_i^(t) − a_i^(t−1))² < ε
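The vector iteration can be sketched in NumPy; the adjacency matrix here is the 3-page example worked out on the next page (pages ordered Yahoo, Amazon, M’soft):

```python
import numpy as np

# Adjacency of the example graph: A[i, j] = 1 if page i links to page j.
A = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
], dtype=float)

n = A.shape[0]
a = np.ones(n) / np.sqrt(n)  # initialize a_i = h_i = 1/sqrt(n)
h = np.ones(n) / np.sqrt(n)

for _ in range(100):         # repeat until convergence
    h = A @ a                # new h
    h /= np.linalg.norm(h)   # normalize
    a = A.T @ h              # new a
    a /= np.linalg.norm(a)   # normalize

# Converges to h ~ (.788, .577, .211) and a ~ (.628, .459, .628),
# matching the worked example on the next page.
```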

Properties

• Under reasonable assumptions about A, HITS converges to vectors h* and a*:
  – a* is the principal eigenvector of the matrix A^T A (equivalently, the first right singular vector of A)
  – h* is the principal eigenvector of the matrix A A^T (the first left singular vector of A)

Page 12

Example of HITS

Graph: Yahoo, Amazon, M’soft (from A below: Yahoo links to all three pages, Amazon links to Yahoo and M’soft, M’soft links to Amazon).

A  =  1 1 1        A^T  =  1 1 0
      1 0 1                1 0 1
      0 1 0                1 1 0

h(yahoo)   = .58   .80   .79   …   .788
h(amazon)  = .58   .53   .57   …   .577
h(m’soft)  = .58   .27   .23   …   .211

a(yahoo)   = .58   .58   .62   …   .628
a(amazon)  = .58   .58   .49   …   .459
a(m’soft)  = .58   .58   .62   …   .628

????? Hubs that point to Amazon are not good!!!

SVD and Eigenvectors

Definition. An eigenvector of a square matrix B is a vector v that satisfies
      Bv = λv
for some scalar λ, called an eigenvalue.

• Since a matrix B can be viewed as a linear transformation, an eigenvector spans a 1-dimensional subspace that is invariant under B
• Eigenvectors appear in a number of important applications (e.g. PageRank, HITS)

SVD and Eigenvectors

Observation. If B = A^T A then
(i) the right singular vectors of A, v1, …, vk, are eigenvectors of B with eigenvalues σ1², …, σk²
(ii) the left singular vectors of A, u1, …, uk, are eigenvectors of A A^T with eigenvalues σ1², …, σk²
(iii) the matrix B is positive semidefinite, i.e., yᵀBy ≥ 0 for every y in R^n
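Observations (i)–(iii) can be checked numerically on a random matrix (a sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = A.T @ A

# (i) each right singular vector v_i is an eigenvector of B = A^T A
#     with eigenvalue sigma_i^2.
for i in range(len(s)):
    assert np.allclose(B @ Vt[i], s[i] ** 2 * Vt[i])

# (ii) each left singular vector u_i is an eigenvector of A A^T
#      with the same eigenvalue sigma_i^2.
for i in range(len(s)):
    assert np.allclose(A @ A.T @ U[:, i], s[i] ** 2 * U[:, i])

# (iii) B is positive semidefinite: y^T B y >= 0 for every y.
y = rng.standard_normal(4)
assert y @ B @ y >= 0
```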

SVD: Limitations

+ Optimal low-rank approximation in terms of the Frobenius norm
− Interpretability
  – Each singular direction is a linear combination of all the rows of the matrix
− Lack of sparsity
  – Singular vectors can be dense even when the matrix is sparse!

Page 13

CUR Decomposition

• Goal: Express A as a product of matrices C, U, R

  Make ǁA − C·U·RǁF small

• “Constraints” on C and R: C consists of actual columns of A and R of actual rows of A

Frobenius norm: ǁXǁF = √(Σij Xij²)

CUR: How it Works

• Sampling columns (similarly for rows)
• Note this is a randomized algorithm: the same column can be sampled more than once
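The column-sampling step can be sketched as follows, using probabilities proportional to squared column norms as in the DKM-style setup described on the next page (function and variable names are mine, not from the slides):

```python
import numpy as np

def sample_columns(A, c, rng):
    """Pick c columns of A, with replacement, with probability
    proportional to their squared Euclidean norms; rescale each
    picked column by 1/sqrt(c * P[j])."""
    P = (A ** 2).sum(axis=0) / (A ** 2).sum()  # sampling distribution
    cols = rng.choice(A.shape[1], size=c, p=P)
    C = A[:, cols] / np.sqrt(c * P[cols])      # rescaled sampled columns
    return C, cols

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 20))
C, cols = sample_columns(A, c=10, rng=rng)
# The same column index can appear in `cols` more than once.
```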

Page 14

Computing U (DKM 2006)

Notation
• r(i): i-th sampled row of A; c(j): j-th sampled column of A
• Q[j]: probability of choosing the j-th row of A
• P[j]: probability of choosing the j-th column of A
• r: number of sampled rows; c: number of sampled columns

Define W as the r × c matrix with

      W_{i,j} = A_{r(i), c(j)} / √(rc · Q[r(i)] · P[c(j)])

Computing U (DKM 2006)
1. Let C^T C = σ1² (v1 v1^T) + … + σk² (vk vk^T) be the spectral decomposition of C^T C
2. Define W′ = Σ_{i=1}^{k} (1/σi²) vi vi^T
3. Define U = W′ W^T

CUR: Provably good approximation to SVD

• For example:
  – Select c = O(k log k / ε²) columns of A using the ColumnSelect algorithm
  – Select r = O(k log k / ε²) rows of A using the ColumnSelect algorithm
  – Set U = W⁺
• Then with probability 98%:

      ǁA − C·U·RǁF ≤ (2 + ε) ǁA − A_kǁF
      (CUR error vs. SVD error)

• In practice: pick 4k columns/rows for a “rank-k” approximation

CUR: Pros & Cons

+ Easy interpretation
  • Since the basis vectors are actual columns and rows
+ Sparse basis
  • Since the basis vectors are actual columns and rows (a singular vector is typically dense; an actual column is sparse)
− Duplicate columns and rows
  • Columns of large norm will be sampled many times

Page 15

Solution

• If we want to get rid of the duplicates:
  – Throw them away
  – Scale (multiply) the remaining columns/rows by the square root of the number of duplicates
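A sketch of this deduplication-with-rescaling step (toy matrix and indices, names mine):

```python
import numpy as np

A = np.arange(20.0).reshape(4, 5)  # toy matrix
cols = np.array([3, 1, 3, 3, 0])   # sampled column indices, with duplicates

uniq, counts = np.unique(cols, return_counts=True)
# Keep one copy of each sampled column, scaled by the square root
# of its multiplicity.
Cs = A[:, uniq] * np.sqrt(counts)
```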

(Figure: A ≈ Cd·U·Rd with duplicates becomes A ≈ Cs·U·Rs after deduplication; construct a correspondingly small U)

SVD vs. CUR

SVD: A = U Σ V^T
  – A: huge but sparse; U, V: big and dense; Σ: sparse and small

CUR: A = C U R
  – A: huge but sparse; C, R: big but sparse; U: dense but small

SVD vs. CUR: Simple Experiment

• DBLP bibliographic data
  – Author-to-conference big sparse matrix
  – Aij: number of papers published by author i at conference j
  – 428K authors (rows), 3659 conferences (columns) – very sparse
• Want to reduce dimensionality
  – How much time does it take?
  – What is the reconstruction error?
  – How much space do we need?

Results: DBLP – big sparse matrix

• Accuracy: 1 − relative sum of squared errors
• Space ratio: #output matrix entries / #input matrix entries
• CPU time

(Plots compare SVD, CUR, and CUR without duplicates on these three metrics.)

Sun, Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM ’07.

Page 16

Bibliography

• Ch. 3, Foundations of Data Science, Avrim Blum, John Hopcroft and Ravindran Kannan, http://www.cs.cornell.edu/jeh/book.pdfs
• Ch. 10, Mining of Massive Datasets
• Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition (DKM, SICOMP 2006)
• Ch. 14, The Elements of Statistical Learning
• http://www.cs.cmu.edu/~jimeng/papers/SunSDM07.pdf