
Fast matrix primitives for ranking, link-prediction and more

Jan 15, 2015


David Gleich

I gave this talk at Netflix about some of the recent work I've been doing on fast matrix primitives for link prediction and also some non-standard uses of the nuclear norm for ranking.
Transcript
Page 1: Fast matrix primitives for ranking, link-prediction and more

Fast matrix primitives for ranking, communities and more.

David F. Gleich, Computer Science, Purdue University

Netflix David Gleich · Purdue 1

Page 2: Fast matrix primitives for ranking, link-prediction and more

David Gleich · Purdue 2 Netflix

Page 3: Fast matrix primitives for ranking, link-prediction and more

Models and algorithms for high performance matrix and network computations

Netflix David Gleich · Purdue 3

[Excerpt, p. 18 of P. G. Constantine, D. F. Gleich, Y. Hou, and J. Templeton. Fig. 4.5: Error in the reduced order model compared to the prediction standard deviation for one realization of the bubble locations at the final time for two values of the bubble radius, s = 0.39 and s = 1.95 cm. Panels: (a) Error, s = 0.39 cm; (b) Std, s = 0.39 cm; (c) Error, s = 1.95 cm; (d) Std, s = 1.95 cm. (Colors are visible in the electronic version.)]

the varying conductivity fields took approximately twenty minutes to construct using Cubit after substantial optimizations.

Working with the simulation data involved a few pre- and post-processing steps: interpret 4TB of Exodus II files from Aria, globally transpose the data, compute the TSSVD, and compute predictions and errors. The preprocessing steps took approximately 8-15 hours. We collected precise timing information, but we do not report it as these times are from a multi-tenant, unoptimized Hadoop cluster where other jobs with sizes ranging between 100GB and 2TB of data sometimes ran concurrently. Also, during our computations, we observed failures in hard disk drives and issues causing entire nodes to fail. Given that the cluster has 40 cores, there was at most 2400 cpu-hours consumed via these calculations—compared to the 131,072 hours it took to compute 4096 heat transfer simulations on Red Sky. Thus, evaluating the ROM was about 50-times faster than computing a full simulation.

We used 20,000 reducers to convert the Exodus II simulation data. This choice determined how many map tasks each subsequent step utilized—around 33,000. We also found it advantageous to store matrices in blocks of about 16MB per record. The reduction in the data enabled us to use a laptop to compute the coefficients of the ROM and apply them to the far face for the UQ study in Section 4.4.

Here are a few pertinent challenges we encountered while performing this study. Generating 8192 meshes with different material properties and running independent

Tensor eigenvalues and a power method

28

Tensor methods for network alignment

Network alignment is the problem of computing an approximate isomorphism between two networks. In collaboration with Mohsen Bayati, Amin Saberi, Ying Wang, and Margot Gerritsen, the PI has developed a state of the art belief propagation method (Bayati et al., 2009).

FIGURE 6 – Previous work from the PI tackled network alignment with matrix methods for edge overlap: [diagram: an edge (i, i′) in the overlap between graphs A and B through L]. This proposal is for matching triangles using tensor methods: [diagram: a triangle (i, j, k) matched to a triangle (i′, j′, k′) between A and B through L]. If x_i, x_j, and x_k are indicators associated with the edges (i, i′), (j, j′), and (k, k′), then we want to include the product x_i x_j x_k in the objective, yielding a tensor problem.

We propose to study tensor methods to perform network alignment with triangle and other higher-order graph moment matching. Similar ideas were proposed by Svab (2007); Chertok and Keller (2010) also proposed using triangles to aid in network alignment problems. In Bayati et al. (2011), we found that triangles were a key missing component in a network alignment problem with a known solution. Given that preserving a triangle requires three edges between two graphs, this yields a tensor problem:

maximize   ∑_{i∈L} w_i x_i + ∑_{i∈L} ∑_{j∈L} x_i x_j S_{i,j} + ∑_{i∈L} ∑_{j∈L} ∑_{k∈L} x_i x_j x_k T_{i,j,k}   (the triple sum is the triangle overlap term)

subject to x is a matching.

Here, T_{i,j,k} = 1 when the edges corresponding to i, j, and k in L result in a triangle in the induced matching. Maximizing this objective is an intractable problem. We plan to investigate a heuristic based on a rank-1 approximation of the tensor T and using a maximum-weight matching based rounding. Similar heuristics have been useful in other matrix-based network alignment algorithms (Singh et al., 2007; Bayati et al., 2009). The work involves enhancing the Symmetric-Shifted-Higher-Order Power Method due to Kolda and Mayo (2011) to incredibly large and sparse tensors. On this aspect, we plan to collaborate with Tamara G. Kolda. In an initial evaluation of this triangle matching on synthetic problems, using the tensor rank-1 approximation alone produced results that identified the correct solution whereas all matrix approaches could not.

vision for the future

All of these projects fit into the PI’s vision for modernizing the matrix-computation paradigm to match the rapidly evolving space of network computations. This vision extends beyond the scope of the current proposal. For example, the web is a huge network with over one trillion unique URLs (Alpert and Hajaj, 2008), and search engines have indexed over 180 billion of them (Cuil, 2009). Yet, why do we need to compute with the entire network? By way of analogy, note that we do not often solve partial differential equations or model macro-scale physics by explicitly simulating the motion or interaction of elementary particles. We need something equivalent for the web and other large networks. Such investigations may take many forms: network models, network geometry, or network model reduction. It is the vision of the PI that the language, algebra, and methodology of matrix computations will


maximize ∑_{ijk} T_{ijk} x_i x_j x_k   subject to ‖x‖_2 = 1

Human protein interaction networks: 48,228 triangles. Yeast protein interaction networks: 257,978 triangles. The tensor T has ~100,000,000,000 nonzeros. We work with it implicitly.

[x^(next)]_i = ρ · ( ∑_{jk} T_{ijk} x_j x_k + γ x_i )

where ρ ensures the 2-norm. SSHOPM method due to Kolda and Mayo.
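The iteration above is only a few lines of code. Here is a minimal dense-tensor sketch of the SSHOPM-style update (the real computation applies T implicitly; the shift value, test tensor, and function name are illustrative assumptions, not the talk's actual code):

```python
import numpy as np

def sshopm(T, gamma=1.0, iters=200, seed=0):
    """Sketch of the shifted symmetric higher-order power method:
    repeatedly set x <- normalize(T:xx + gamma*x), where
    (T:xx)_i = sum_jk T[i,j,k] x[j] x[k]."""
    n = T.shape[0]
    x = np.random.default_rng(seed).standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = np.einsum('ijk,j,k->i', T, x, x) + gamma * x
        x = y / np.linalg.norm(y)  # normalization plays the role of rho
    lam = np.einsum('ijk,i,j,k->', T, x, x, x)
    return lam, x

# Tiny symmetric test tensor T_ijk = v_i v_j v_k, whose dominant
# eigenpair is (||v||^3, v/||v||) -- here (125, [0.6, 0.8]).
v = np.array([3.0, 4.0])
lam, x = sshopm(np.einsum('i,j,k->ijk', v, v, v))
```

For tensors built as T_ijk = v_i v_j v_k the iteration recovers v/‖v‖ and the eigenvalue ‖v‖³, which makes a handy correctness check before moving to an implicit, sparse T.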

Big data methods: SIMAX ’09, SISC ’11, MapReduce ’11, ICASSP ’12

Network alignment: ICDM ’09, SC ’11, TKDE ’13

Fast & scalable network centrality: SC ’05, WAW ’07, SISC ’10, WWW ’10, …

Data clustering: WSDM ’12, KDD ’12, CIKM ’13, …

Ax = b,   min ‖Ax − b‖,   Ax = λx

Massive matrix computations

on multi-threaded and distributed architectures

Page 4: Fast matrix primitives for ranking, link-prediction and more

4 Image from rockysprings, deviantart, CC share-alike

Everything in the world can be explained by a matrix, and we see how deep the rabbit hole goes. The talk ends, you believe -- whatever you want to.

Page 5: Fast matrix primitives for ranking, link-prediction and more

Matrix computations in a red-pill

Solve a problem better by exploiting its structure!

Netflix David Gleich · Purdue 5

Page 6: Fast matrix primitives for ranking, link-prediction and more

WHY NO PREPROCESSING? (David F. Gleich (Purdue), Emory Math/CS Seminar, 19 of 47)

Top-k predicted “links” are movies to watch! Pairwise scores give user similarity.

Problem 1 – (Faster) Recommendation as link prediction

Netflix David Gleich · Purdue 6

Page 7: Fast matrix primitives for ranking, link-prediction and more

Problem 2 – (Better) Best movies

Netflix David Gleich · Purdue 7

Page 8: Fast matrix primitives for ranking, link-prediction and more

Matrix computations in a red-pill

Solve a problem better by exploiting its structure!

Netflix David Gleich · Purdue 8

Page 9: Fast matrix primitives for ranking, link-prediction and more

Matrix structure

Movies “liked” (>3 stars?)  [sample rating entries: 5 1 1 4; 5 5]

Netflix matrix → Netflix graph

Problem 1: Adjacency matrix, Normalized Laplacian matrix, Random walk matrix

Problem 2: Pairwise comparison matrix

Netflix David Gleich · Purdue 9

Page 10: Fast matrix primitives for ranking, link-prediction and more

WHY NO PREPROCESSING? (David F. Gleich (Purdue), Emory Math/CS Seminar, 19 of 47)

Top-k predicted “links” are movies to watch! Pairwise scores give user similarity.

Problem 1 – (Faster) Recommendation as link prediction

Netflix David Gleich · Purdue 10

Page 11: Fast matrix primitives for ranking, link-prediction and more

Matrix based link predictors

A MODERN TAKE: the Katz score (node-based); the Katz score (edge-based). (David F. Gleich (Purdue), Emory Math/CS Seminar, 9 of 47)

pred. on movie = ∑_{ℓ=1}^{∞} α^ℓ · (num. paths of length ℓ from user to movie)

k = ∑_{ℓ=1}^{∞} (α^ℓ A^ℓ) e_i,  where e_i is the user indicator vector and k is the movie prediction vector.

Netflix David Gleich · Purdue 11

Page 12: Fast matrix primitives for ranking, link-prediction and more

Matrix based link predictors

k = ∑_{ℓ=1}^{∞} (α^ℓ A^ℓ) e_i  (e_i = user indicator vector)

(I − αA) k = e_i

Neumann series (Carl Neumann): ∑_{k=0}^{∞} (tA)^k

Netflix David Gleich · Purdue 12
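The equivalence between the damped path sum and a single linear solve is easy to check numerically. Here is a small sketch on a hypothetical 4-node graph (the adjacency matrix and α are invented for illustration); note the solve also returns the ℓ = 0 identity term of the series, so we subtract e_i:

```python
import numpy as np

# Hypothetical toy adjacency matrix; alpha must satisfy
# alpha * ||A||_2 < 1 for the Neumann series to converge.
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 0.],
              [0., 1., 0., 0.]])
alpha = 0.1
e_i = np.array([1., 0., 0., 0.])  # indicator vector for "user" node 0

# Truncated Neumann series: k = sum_{l=1}^{50} alpha^l A^l e_i.
k_series = np.zeros(4)
term = e_i.copy()
for _ in range(50):
    term = alpha * (A @ term)
    k_series += term

# The same scores from one linear solve with (I - alpha A);
# subtracting e_i removes the l = 0 term of the series.
k_solve = np.linalg.solve(np.eye(4) - alpha * A, e_i) - e_i
```

The two vectors agree to machine precision, which is the whole point of the slide: the path-counting picture and the linear-system picture are the same computation.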

Page 13: Fast matrix primitives for ranking, link-prediction and more

Matrix based link predictors

(I − αA) x = e_i   (Katz)
(I − αP) x = e_i   (PageRank)
(I − αL) x = e_i   (Semi-supervised learning)
exp{αP} x = e_i    (Heat kernel)

A MODERN TAKE: the Katz score (node-based); the Katz score (edge-based). (David F. Gleich (Purdue), Emory Math/CS Seminar, 9 of 47)

They all look at sums of damped paths, but change the details, slightly.

Netflix David Gleich · Purdue 13

Page 14: Fast matrix primitives for ranking, link-prediction and more

Matrix based link predictors are localized!

Netflix David Gleich · Purdue 14

PageRank scores for one node. Crawl of flickr from 2006: ~800k nodes, 6M edges, alpha = 1/2.

[Plots: plot(x), and the error ‖x_true − x_nnz‖_1 against the number of nonzeros retained.]

Page 15: Fast matrix primitives for ranking, link-prediction and more

Matrix based link predictors are localized! KATZ SCORES ARE LOCALIZED (David F. Gleich (Purdue), Emory Math/CS Seminar, 32 of 47)

Up to 50 neighbors is 99.65% of the total mass.

Netflix David Gleich · Purdue 15

Page 16: Fast matrix primitives for ranking, link-prediction and more

Matrix computations in a red-pill

Solve a problem better by exploiting its structure!

Netflix David Gleich · Purdue 16

Page 17: Fast matrix primitives for ranking, link-prediction and more

How do we compute them fast?

Netflix David Gleich · Purdue 17

PageRankPull (w/ access to in-links & degs.) and PageRankPush (w/ access to out-links).

PageRankPull, j = blue node with in-neighbors a, b, c (of degrees 6, 2, 3):

x_j^(k+1) − α ∑_{i→j} x_i^(k)/deg_i = f_j
x_j^(k+1) − α x_a^(k)/6 − α x_b^(k)/2 − α x_c^(k)/3 = f_j

Solve for x_j^(k+1).

PageRankPush, j = blue node with out-degree 3: let x_j^(k+1) = x_j^(k) + r_j, then update r^(k+1):

r_a^(k+1) = r_a^(k) + α r_j^(k)/3
r_b^(k+1) = r_b^(k) + α r_j^(k)/3
r_c^(k+1) = r_c^(k) + α r_j^(k)/3
r_j^(k+1) = 0

x_j = α ∑_{i neigh. of j} x_i / deg(i), plus 1 if j is the target user (PageRank).
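A minimal sketch of the push scheme above, assuming a residual queue and a fixed tolerance (both my own choices; this is not the exact code behind the talk's experiments):

```python
from collections import deque

def pagerank_push(out_neighbors, target, alpha=0.5, tol=1e-8):
    """Solve x_j = alpha * sum_{i -> j} x_i / deg(i)  (+1 at the target)
    by repeatedly 'pushing' a node's residual to its out-neighbors."""
    x, r = {}, {target: 1.0}
    queue = deque([target])
    while queue:
        j = queue.popleft()
        rj = r.get(j, 0.0)
        if rj <= tol:
            continue            # residual already pushed or negligible
        x[j] = x.get(j, 0.0) + rj
        r[j] = 0.0              # r_j^(k+1) = 0
        share = alpha * rj / len(out_neighbors[j])
        for u in out_neighbors[j]:
            old = r.get(u, 0.0)
            r[u] = old + share  # r_u^(k+1) = r_u^(k) + alpha r_j^(k)/deg(j)
            if old <= tol < r[u]:
                queue.append(u)
    return x

# Hypothetical 3-cycle 0 -> 1 -> 2 -> 0; exact answer for alpha = 1/2:
# x_0 = 1/(1 - alpha^3) = 8/7, x_1 = alpha x_0, x_2 = alpha^2 x_0.
g = {0: [1], 1: [2], 2: [0]}
x = pagerank_push(g, target=0)
```

The loop only ever touches nodes with non-negligible residual, which is why localization (the next slides) makes this so much cheaper than a whole-graph solve.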

Page 18: Fast matrix primitives for ranking, link-prediction and more

We have good theory for this algorithm … and even better empirical performance.

Netflix David Gleich · Purdue 18

Page 19: Fast matrix primitives for ranking, link-prediction and more

Theory

Andersen, Chung, Lang (2006): for PageRank, “fast runtimes” and “localization”.
Bonchi, Esfandiar, Gleich, et al. (2010/2013): for Katz, “fast runtimes”.
Kloster, Gleich (2013): for Katz and the heat kernel, “fast runtimes” and “localization” (assuming power-law degrees).

Netflix David Gleich · Purdue 19

Page 20: Fast matrix primitives for ranking, link-prediction and more

Accuracy vs. work (heat kernel)

Netflix David Gleich · Purdue 20

For the dblp collaboration graph, we study the precision in finding the 100 largest nodes as we vary the work. This set of 100 does not include the node’s immediate neighbors. (One column, but representative.)

[Plot for dblp-cc: precision vs. effective matrix-vector products, at tolerances 10^−4 and 10^−5, reporting precision @10, @25, @100, and @1000; dblp collaboration graph, 225k vertices.]

Page 21: Fast matrix primitives for ranking, link-prediction and more

Empirical runtime (Katz). TIMING (David F. Gleich (Purdue), Emory Math/CS Seminar, 46 of 47)

Netflix David Gleich · Purdue 21

Page 22: Fast matrix primitives for ranking, link-prediction and more

Never got to try it …

Data sources. We had two data sources: Microsoft’s toolbar logs and HelloMovies.com analytics logs.

Microsoft toolbar logs:
Machine:6AF023 URL:slashdot.org/ Action:Click
Machine:6AF023 URL:google.com/search?&q=yhoo Action:Entry
Machine:6AF023 URL:google.com/q?yhoo Action:Entry

HelloMovies.com analytics:
18:27:59 P C 1 1 1 0 /find_friends/1
18:28:17 P C 2 1 2 0 /register_success
18:28:28 P C 3 1 3 0 /
18:28:29 A 4 1 3 1 /find/get_form
18:28:30 AC 5 1 4 1 /find/get_movies
18:28:36 P C 6 1 5 1 /movie/the-lord-of-the-[...]
18:28:52 P C 7 1 6 1 /movie/lion-witch-wardrobe-[...]
18:28:15 P 8 1 6 1 /actor/cate-blanchett

Note: I collaborate with the company behind HelloMovies.com.

(David F. Gleich (Sandia), Measuring alpha, WWW2010, 11/26)

Ran out of money once we had the algorithms … promising initial results though!

Netflix David Gleich · Purdue 22

Need to test on the Netflix matrix now.

Page 23: Fast matrix primitives for ranking, link-prediction and more

Problem 2 – (Better) Best movies

Netflix David Gleich · Purdue 23

Page 24: Fast matrix primitives for ranking, link-prediction and more

Which is a better list of good DVDs?

Nuclear Norm based rank aggregation (not matrix completion on the netflix rating matrix):
Lord of the Rings 3: The Return of …
Lord of the Rings 1: The Fellowship
Lord of the Rings 2: The Two Towers
Star Wars V: Empire Strikes Back
Raiders of the Lost Ark
Star Wars IV: A New Hope
Shawshank Redemption
Star Wars VI: Return of the Jedi
Lord of the Rings 3: Bonus DVD
The Godfather

Standard rank aggregation (the mean rating):
Lord of the Rings 3: The Return of …
Lord of the Rings 1: The Fellowship
Lord of the Rings 2: The Two Towers
Lost: Season 1
Battlestar Galactica: Season 1
Fullmetal Alchemist
Trailer Park Boys: Season 4
Trailer Park Boys: Season 3
Tenchi Muyo!
Shawshank Redemption

Netflix David Gleich · Purdue 24/40

Page 25: Fast matrix primitives for ranking, link-prediction and more

Rank Aggregation. Given partial orders on subsets of items, rank aggregation is the problem of finding an overall ordering. Voting: find the winning candidate. Program committees: find the best papers given reviews. Dining: find the best restaurant in Chicago.

Netflix David Gleich · Purdue 25/40

Page 26: Fast matrix primitives for ranking, link-prediction and more

Ranking is really hard

All rank aggregations involve some measure of compromise

A good ranking is the “average” ranking under a permutation distance

Ken Arrow. John Kemeny. Dwork, Kumar, Naor, Sivakumar.

It is NP-hard to compute Kemeny’s ranking.

Netflix David Gleich · Purdue 26/40

Page 27: Fast matrix primitives for ranking, link-prediction and more

Suppose we had scores

Netflix David Gleich · Purdue

Suppose we had scores. Let s_i be the score of the ith movie/song/paper/team to rank. Suppose we can compare the ith to the jth:

Y_{ij} = s_i − s_j

Then Y is skew-symmetric, rank 2. Also works for ratios with an extra log.

Kemeny and Snell, Mathematical Models in the Social Sciences (1978). Numerical ranking is intimately intertwined with skew-symmetric matrices.

(David F. Gleich (Purdue), KDD 2011, 6/20)

27/40
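Both properties are quick to verify numerically; a sketch with hypothetical scores (illustrative values only):

```python
import numpy as np

# Hypothetical scores for five items to rank.
s = np.array([0.9, 0.1, 0.5, 0.3, 0.7])
e = np.ones_like(s)

# Pairwise comparisons Y_ij = s_i - s_j, i.e. Y = s e^T - e s^T.
Y = np.outer(s, e) - np.outer(e, s)

skew = np.allclose(Y, -Y.T)       # skew-symmetric: Y^T = -Y
rank = np.linalg.matrix_rank(Y)   # rank 2: the outer products span {s, e}
```

Because Y is the difference of two rank-1 outer products, its rank is at most 2, and the skew-symmetry is immediate from the definition.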

Page 28: Fast matrix primitives for ranking, link-prediction and more

Using ratings as comparisons

Arithmetic Mean

Log-odds

Netflix David Gleich · Purdue

Ratings induce various skew-symmetric matrices. From David, 1988 – The Method of Paired Comparisons.

28/40

Page 29: Fast matrix primitives for ranking, link-prediction and more

Extracting the scores

Netflix David Gleich · Purdue

Given Y with all entries, then s = (1/n) Y e is the Borda count, the least-squares solution to min_s ‖s e^T − e s^T − Y‖.

How many Y_{ij} do we have? Most. Do we trust all Y_{ij}? Not really. Netflix data: 17k movies, 500k users, 100M ratings – 99.17% filled.

[Histogram: number of comparisons per movie pair.]

(David F. Gleich (Purdue), KDD 2011, 8/20)

29/40
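The Borda-count identity is a two-line check: with a fully observed Y = s e^T − e s^T, the row mean (1/n) Y e returns the scores up to centering (the score values here are invented):

```python
import numpy as np

s = np.array([0.9, 0.1, 0.5, 0.3, 0.7])   # hypothetical true scores
n = s.size
e = np.ones(n)
Y = np.outer(s, e) - np.outer(e, s)        # full comparison matrix

borda = (Y @ e) / n                        # the Borda count
# Y e = n*s - (sum s)*e, so borda equals s - mean(s):
# the same scores shifted to zero mean, hence the same ordering.
```

The centering is harmless for ranking, since only the ordering of the scores matters.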

Page 30: Fast matrix primitives for ranking, link-prediction and more

Only partial info? COMPLETE IT!

Netflix David Gleich · Purdue

Only partial info? Complete it! Let Y_{ij} be known for (i, j) ∈ Ω. We trust these scores.

Goal: find the simplest skew-symmetric matrix that matches the data: minimize the rank of X subject to X_{ij} = Y_{ij} on Ω (noiseless), or subject to a bound on the residual at the known entries (noisy). Both of these are NP-hard too.

(David F. Gleich (Purdue), KDD 2011, 9/20)

30/40

Page 31: Fast matrix primitives for ranking, link-prediction and more

Solution GO NUCLEAR!

Netflix David Gleich · Purdue From a French nuclear test in 1970, image from http://picdit.wordpress.com/2008/07/21/8-insane-nuclear-explosions/

31/40

Page 32: Fast matrix primitives for ranking, link-prediction and more

The ranking algorithm

Netflix David Gleich · Purdue 32/40

The Ranking Algorithm
0. INPUT: R (ratings data) and c (for trust on comparisons)
1. Compute Y from R
2. Discard entries with fewer than c comparisons
3. Set Ω, b to be the indices and values of what’s left
4. X = SVP(Ω, b)
5. OUTPUT: the scores s

(David F. Gleich (Purdue), KDD 2011, 17/20)
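Step 4 can be sketched with a minimal SVP loop in the style of Jain et al.: a gradient step on the observed entries, then projection back to rank 2 via a truncated SVD. This is an illustrative toy (dense SVD, fixed unit step, invented data), not the paper's solver:

```python
import numpy as np

def svp_rank2(idx, vals, n, step=1.0, iters=300):
    """Toy SVP: X <- P_rank2(X + step * P_Omega(Y - X))."""
    I = np.array([i for i, _ in idx])
    J = np.array([j for _, j in idx])
    b = np.asarray(vals, dtype=float)
    X = np.zeros((n, n))
    for _ in range(iters):
        G = np.zeros((n, n))
        G[I, J] = b - X[I, J]               # residual on observed entries
        U, S, Vt = np.linalg.svd(X + step * G)
        X = (U[:, :2] * S[:2]) @ Vt[:2, :]  # best rank-2 approximation
    return X

# Hypothetical scores and a comparison matrix with one pair missing.
s = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
n = s.size
Y = np.outer(s, np.ones(n)) - np.outer(np.ones(n), s)
idx = [(i, j) for i in range(n) for j in range(n)
       if i != j and {i, j} != {0, 1}]      # drop the (0, 1) pair
X = svp_rank2(idx, [Y[i, j] for i, j in idx], n)
scores = X @ np.ones(n) / n                 # Borda count of the completion
```

Row means of the completed matrix act as the Borda count of the completion and reproduce the ordering of the hidden scores; keeping the projection rank fixed at 2 is what makes the approach scale.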

Page 33: Fast matrix primitives for ranking, link-prediction and more

Exact recovery

Netflix David Gleich · Purdue 33/40

Exact recovery results. David Gross showed how to recover Hermitian matrices, i.e., the conditions under which we get the exact Y. Note that ıY is Hermitian. Thus our new result! (Gross, arXiv 2010. David F. Gleich (Purdue), KDD 2011, 15/20.)

indices. Instead we view the following theorem as providing intuition for the noisy problem.

Consider the operator basis for Hermitian matrices: H = S ∪ K ∪ D where

S = { (1/√2)(e_i e_j^T + e_j e_i^T) : 1 ≤ i < j ≤ n };
K = { (ı/√2)(e_i e_j^T − e_j e_i^T) : 1 ≤ i < j ≤ n };
D = { e_i e_i^T : 1 ≤ i ≤ n }.

Theorem 5. Let s be centered, i.e., s^T e = 0. Let Y = s e^T − e s^T where θ = max_i s_i² / (s^T s) and ρ = ((max_i s_i) − (min_i s_i)) / ‖s‖. Also, let Ω ⊂ H be a random set of elements with size |Ω| ≥ O(2nν(1 + β)(log n)²) where ν = max((nθ + 1)/4, nρ²). Then the solution of

minimize ‖X‖_*  subject to trace(X^* W_i) = trace((ıY)^* W_i),  W_i ∈ Ω

is equal to ıY with probability at least 1 − n^{−β}.

The proof of this theorem follows directly by Theorem 4 if ıY has coherence ν with respect to the basis H. We now show this result.

Definition 6 (Coherence, Gross [2010]). Let A be n × n, rank-r, and Hermitian. Let UU^* be an orthogonal projector onto range(A). Then A has coherence ν with respect to an operator basis {W_i}_{i=1}^{n²} if both

max_i trace(W_i UU^* W_i) ≤ 2νr/n,  and
max_i trace(sign(A) W_i)² ≤ νr/n².

For A = ıY with s^T e = 0:

UU^* = s s^T / (s^T s) + (1/n) e e^T  and  sign(A) = (1/(‖s‖ √n)) A.

Let S_p ∈ S, K_p ∈ K, and D_p ∈ D. Note that because sign(A) is Hermitian with no real-valued entries, both quantities trace(sign(A) D_i)² and trace(sign(A) S_i)² are 0. Also, because UU^* is symmetric, trace(K_i UU^* K_p) = 0. The remaining basis elements satisfy:

trace(S_p UU^* S_p) = 1/n + (s_i² + s_j²)/(2 s^T s) ≤ (1/n) + θ
trace(D_p UU^* D_p) = 1/n + s_i²/(s^T s) ≤ (1/n) + θ
trace(sign(A) K_p)² = 2(s_i − s_j)²/(n s^T s) ≤ (2/n) ρ².

Thus, A has coherence ν with ν from Theorem 5 and with respect to H. And we have our recovery result. Although, this theorem provides little practical benefit unless both θ and ρ are O(1/n), which occurs when s is nearly uniform.

6. RESULTS
We implemented and tested this procedure in two synthetic scenarios, along with Netflix, movielens, and Jester joke-set ratings data. In the interest of space, we only present a subset of these results for Netflix.

[Plots: fraction of trials recovered vs. number of samples, with reference lines at 5n, 2n log(n), and 6n log(n); the noisy panel shows noise levels 0.01 to 0.05.]

Figure 2: An experimental study of the recoverability of a ranking vector. These show that we need about 6n log n entries of Y to get good recovery in both the noiseless (left) and noisy (right) case. See §6.1 for more information.

6.1 Recovery
The first experiment is an empirical study of the recoverability of the score vector in the noiseless and noisy case. In the noiseless case, Figure 2 (left), we generate a score vector with uniformly distributed random scores between 0 and 1. These are used to construct a pairwise comparison matrix Y = s e^T − e s^T. We then sample elements of this matrix uniformly at random and compute the difference between the true score vector s and the output of steps 4 and 5 of Algorithm 2. If the relative 2-norm difference between these vectors is less than 10^−3, we declare the trial recovered. For n = 100, the figure shows that, once the number of samples is about 6n log n, the correct s is recovered in nearly all the 50 trials.

Next, for the noisy case, we generate a uniformly spaced score vector between 0 and 1. Then Y = s e^T − e s^T + εE, where E is a matrix of random normals. Again, we sample elements of this matrix randomly, and declare a trial successful if the order of the recovered score vector is identical to the true order. In Figure 2 (right), we indicate the fraction of successful trials as a gray value between black (all failure) and white (all successful). Again, the algorithm is successful for a moderate noise level, i.e., the value of ε, when the number of samples is larger than 6n log n.

6.2 Synthetic
Inspired by Ho and Quinn [2008], we investigate recovering item scores in an item-response scenario. Let a_i be the center of user i’s rating scale, and b_i be the rating sensitivity of user i. Let t_j be the intrinsic score of item j. Then we generate ratings from users on items as:

R_{i,j} = L[a_i + b_i t_j + E_{i,j}]

where L[α] is the discrete levels function:

L[α] = max(min(round(α), 5), 1)

and E_{i,j} is a noise parameter. In our experiment, we draw a_i ∼ N(3, 1), b_i ∼ N(0.5, 0.5), t_j ∼ N(0.1, 1), and E_{i,j} ∼ εN(0, 1). Here, N(µ, σ) is a normal distribution, and ε is a noise parameter. As input to our algorithm, we sample ratings uniformly at random by specifying a desired number of average ratings per user. We then look at the Kendall τ correlation coefficient between the true scores t_j and the output of our algorithm using the arithmetic mean pairwise aggregation. A τ value of 1 indicates a perfect ordering correlation between the two sets of scores.
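The generative model of §6.2 is only a few lines of numpy; a sketch with the parameter values from the text (the rng seed and the particular ε value are my own choices):

```python
import numpy as np

def levels(a):
    # Discrete levels function L[a] = max(min(round(a), 5), 1).
    return np.clip(np.rint(a), 1, 5)

rng = np.random.default_rng(0)
n_users, n_items, eps = 1000, 100, 0.2

a = rng.normal(3.0, 1.0, n_users)    # user rating-scale centers a_i
b = rng.normal(0.5, 0.5, n_users)    # user sensitivities b_i
t = rng.normal(0.1, 1.0, n_items)    # intrinsic item scores t_j
E = rng.normal(size=(n_users, n_items))

# R_ij = L[a_i + b_i t_j + eps * E_ij]: a full matrix of 1-5 star ratings.
R = levels(a[:, None] + b[:, None] * t[None, :] + eps * E)
```

Subsampling entries of R then mimics the "average ratings per user" knob in the experiment.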

Page 34: Fast matrix primitives for ranking, link-prediction and more

Recovery Discussion and Experiments. Confession: if    , then just look at differences from a connected set. Constants? Not very good.    Intuition for the truth.

(David F. Gleich (Purdue), KDD 2011, 16/20)

Netflix David Gleich · Purdue 34

Page 35: Fast matrix primitives for ranking, link-prediction and more

Recovery Experiments

Netflix David Gleich · Purdue

Recovery Discussion and Experiments. Confession: if    , then just look at differences from a connected set. Constants? Not very good.    Intuition for the truth.

(David F. Gleich (Purdue), KDD 2011, 16/20)

35/40

Page 36: Fast matrix primitives for ranking, link-prediction and more

Evaluation

Netflix David Gleich · Purdue

[Four panels: median Kendall’s τ (0.5 to 1) vs. error (0 to 1), for several average numbers of ratings per user.]

Figure 3: The performance of our algorithm (left) and the mean rating (right) to recover the ordering given by item scores in an item-response theory model with 100 items and 1000 users. The various thick lines correspond to the average number of ratings each user performed (see the in-place legend). See §6.2 for more information.

Figure 3 shows the results for 1000 users and 100 items with 1.1, 1.5, 2, 5, and 10 ratings per user on average. We also vary the parameter ε between 0 and 1. Each thick line with markers plots the median value of τ in 50 trials. The thin adjacency lines show the 25th and 75th percentiles of the 50 trials. At all error levels, our algorithm outperforms the mean rating. Also, when there are few ratings per user and moderate noise, our approach is considerably more correlated with the true score. This evidence supports the anecdotal results from Netflix in Table 2.

6.3 Netflix
See Table 2 for the top movies produced by our technique in a few circumstances using all users. The arithmetic mean results in that table use only elements of Y with at least 30 pairwise comparisons (it is the am all 30 model in the code below). And see Figure 4 for an analysis of the residuals generated by the fit for different constructions of the matrix Y. Each residual evaluation of Netflix is described by a code. For example, sb all 0 is a strict-binary pairwise matrix Y from all Netflix users and c = 0 in Algorithm 2 (i.e., accept all pairwise comparisons). Alternatively, am 6 30 denotes an arithmetic-mean pairwise matrix Y from Netflix users with at least 6 ratings, where each entry in Y had 30 users supporting it. The other abbreviations are gm: geometric mean; bc: binary comparison; and lo: log-odds ratio.

These residuals show that we get better rating fits by only using frequently compared movies, but that there are only minor changes in the fits when excluding users that rate few movies. The difference between the score-based residuals ‖Ω(s e^T − e s^T) − b‖ (red points) and the SVP residuals ‖Ω(U S V^T) − b‖ (blue points) shows that excluding comparisons leads to “overfitting” in the SVP residual. This suggests that increasing the parameter c should be done with care and good checks on the residual norms.

To check that a rank-2 approximation is reasonable, we increased the target rank in the SVP solver to 4 to investigate. For the arithmetic mean (6, 30) model, the relative residual at rank-2 is 0.2838 and at rank-4 is 0.2514. Meanwhile, the nuclear norm increases from around 14000 to around 17000. These results show that the change in the fit is minimal and our rank-2 approximation and its scores should represent a reasonable ranking.

[Plot: relative residuals, roughly 0.2 to 0.7, for each model code: am, gm, sb, bc, and lo pairwise matrices with user thresholds (all, 6) and comparison thresholds (0, 30, 100).]

Figure 4: The labels on each residual show how we generated the pairwise scores and truncated the Netflix data. Red points are the residuals from the scores, and blue points are the final residuals from the SVP algorithm. Please see the discussion in §6.3.

7. CONCLUSION
Existing principled techniques such as computing a Kemeny optimal ranking or finding a minimum feedback arc set are NP-hard. These approaches are inappropriate in large scale rank aggregation settings. Our proposal is (i) measure pairwise scores Y and (ii) solve a matrix completion problem to determine the quality of items. This idea is both principled and functional with significant missing data. The results of our rank aggregation on the Netflix problem (Table 2) reveal popular and high quality movies. These are interesting results and could easily have a home on a “best movies in Netflix” web page. Such a page exists, but is regarded as having strange results. Computing a rank aggregation with this technique is not NP-hard. It only requires solving a convex optimization problem with a unique global minimum. Although we did not record computation times, the most time consuming piece of work is computing the pairwise comparison matrix Y. In a practical setting, this could easily be done with a MapReduce computation.

To compute these solutions, we adapted the SVP solver for matrix completion [Jain et al., 2010]. This process involved (i) studying the singular value decomposition of a skew-symmetric matrix (Lemmas 1 and 2) and (ii) showing that the SVP solver preserves a skew-symmetric approximation through its computation (Theorem 3). Because the SVP solver computes with an explicitly chosen rank, these techniques work well for large scale rank aggregation problems.

We believe the combination of pairwise aggregation and matrix completion is a fruitful direction for future research. We plan to explore optimizing the SVP algorithm to exploit the skew-symmetric constraint, extending our recovery result to the noisy case, and investigating additional data.

Acknowledgements. The authors would like to thank Amy Langville, Carl Meyer, and Yuan Yao for helpful discussions.

Nuclear norm ranking · Mean rating

36/40

Page 37: Fast matrix primitives for ranking, link-prediction and more

Tie in with PageRank

Another way to compute the scores is through a close relative of PageRank and the link-prediction methods.

Massey or Colley methods

Netflix David Gleich · Purdue 37/40

(2I + D − A) s = “differences”
(L + 2D^{−1}) x = “scaled differences”

Page 38: Fast matrix primitives for ranking, link-prediction and more

Ongoing Work

Finding communities in large networks: we have the best community finder (as of CIKM 2013). Whang, Gleich, Dhillon (CIKM).

Fast clique detection: we have the fastest solver for max-clique problems, useful for computing temporal strong components (Rossi, Gleich, et al., arXiv).

Scalable network alignment; low-rank clustering with features + links; scalable, distributed implementations of fast graph kernels; evolving network analysis.

Netflix David Gleich · Purdue 38

[Diagram: overlap between graphs A and B through L, with nodes r, s, t, u, v, w.]

Page 39: Fast matrix primitives for ranking, link-prediction and more

References

Papers: Gleich & Lim, KDD 2011 – Nuclear Norm Ranking. Esfandiar, Gleich, Bonchi et al. – WAW 2010, J. Internet Math. 2013. Kloster & Gleich, WAW 2013, arXiv:1310.3423.

Code: www.cs.purdue.edu/homes/dgleich/codes, bit.ly/dgleich-code

Netflix David Gleich · Purdue 39

Supported by NSF CAREER 1149756-CCF www.cs.purdue.edu/homes/dgleich