
Poster session - Stanford University
snap.stanford.edu/na09/19-pagerank-annot.pdf
Poster session: Thursday December 10 3‐6 pm Gates Atrium We will provide poster boards 30% of project

Jul 05, 2020

Page 1

Poster session: Thursday, December 10, 3-6 pm, Gates Atrium. We will provide poster boards. 30% of project grade.

Project writeup: due Friday, December 11. PDF by email to the course staff list. Max 6, min 4 pages in ACM format. More info on the website. 70% of project grade.

12/1/2009 Jure Leskovec, Stanford CS322: Network Analysis 1

Page 2

Received 15 entries. Top score: RPL: 351,944; GNP: 1,150,563 (5 got the OPT)

Top 5:

Name                       Score
Shayan_Oveis_Gharan        1,502,507
Farnaz_Ronaghi_Khameneh    -2
Ying_Wang                  -11
Abhijeet_Mohapatra         -92
Nipun_Dave                 -162

Page 3

Idea: combine

Min-cut on positive edges: 2nd smallest eigenvector x of the Laplacian

Max-cut on negative edges: largest eigenvector y of the normalized Laplacian

So each node gets 2 scores (positions): min-cut score, max-cut score

Now simply partition the nodes

GNP (6 edges from best solution): 1,150,557

RPL: 342,021 (and after local updates 351,939)
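A minimal sketch of the two per-node scores from this slide. It assumes the signed graph is given as two symmetric 0/1 NumPy adjacency matrices, `pos_adj` for positive edges and `neg_adj` for negative edges. It also assumes that "the Laplacian" means $D - A$ of the positive-edge graph, and that the normalized Laplacian $I - D^{-1/2} A D^{-1/2}$ is taken over the negative-edge graph. The function name is hypothetical. The slide does not say how the two scores are turned into a partition, so the sketch stops after computing them.

```python
import numpy as np

def spectral_scores(pos_adj, neg_adj):
    """Return (x, y): hypothetical per-node min-cut and max-cut scores."""
    # Min-cut score: eigenvector of the 2nd smallest eigenvalue of L = D - A.
    d_pos = pos_adj.sum(axis=1)
    lap = np.diag(d_pos) - pos_adj
    _, vecs = np.linalg.eigh(lap)            # eigenvalues come back in ascending order
    x = vecs[:, 1]

    # Max-cut score: eigenvector of the largest eigenvalue of I - D^-1/2 A D^-1/2.
    d_neg = neg_adj.sum(axis=1)
    inv_sqrt = np.zeros_like(d_neg, dtype=float)
    nonzero = d_neg > 0                      # isolated nodes keep a 0 factor
    inv_sqrt[nonzero] = 1.0 / np.sqrt(d_neg[nonzero])
    norm_lap = np.eye(len(d_neg)) - inv_sqrt[:, None] * neg_adj * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(norm_lap)
    y = vecs[:, -1]
    return x, y
```

The signs of `x` tend to split densely connected positive groups. For a connected negative-edge graph, the top eigenvector of the normalized Laplacian has opposite signs on the two sides when that graph is bipartite.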

Page 4

CS 322: (Social and Information) Network Analysis
Jure Leskovec, Stanford University

Page 5

Many, many documents

How to organize/navigate it?

First try: Web directories: Yahoo, DMOZ, LookSmart

Page 6

Second try: Search

Started in the 1960s: find relevant items in a repository that is often a small and trusted set: newspaper articles, patents, etc.

Two traditional problems:

Synonymy: buy and purchase, sick and ill

Polysemy: Jaguar

Page 7

Does a bigger index mean better results?

Page 8

What is the “best” answer to the query “Stanford”? Anchor text: “I go to Stanford where I study”

What about the query “newspaper”? There is no single right answer

Scarcity (IR) vs. abundance (Web). Many sources of info: whom to “trust”?

Trick: pages that actually know about newspapers might all be pointing to many newspapers

Ranking!

Page 9

Page 10

Goal (back to the newspaper example): don’t just find newspapers, but also find “experts”: people who link in a coordinated way to many good newspapers

Idea: link voting

Quality as an expert (hub): total sum of votes of pages pointed to

Quality as content (authority): total sum of votes of experts

[Figure: example vote counts: NYT: 10, Ebay: 3, Yahoo: 3, CNN: 8, WSJ: 9]

Principle of repeated improvement

Page 11

Page 12

Page 13

Page 14

Each page $i$ has 2 kinds of scores: hub score $h_i$ and authority score $a_i$

Algorithm: initialize $a_i = h_i = 1$, then keep iterating:

Authority: $a_i = \sum_{j \to i} h_j$

Hub: $h_i = \sum_{i \to j} a_j$

Normalize: $\sum_i a_i = 1$, $\sum_i h_i = 1$
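The iteration above can be sketched in plain Python. The sketch assumes pages are numbered 0..n-1 and links are given as (i, j) pairs meaning i → j. The function name `hits` and the fixed iteration count are illustrative choices, not from the slides. The authority update uses the previous hub scores, and the hub update uses the freshly computed authority scores. The graph is assumed to have at least one link, so the normalizing sums are nonzero.

```python
def hits(edges, n, iters=50):
    """Hub/authority iteration; edges is a list of (i, j) pairs meaning i -> j."""
    a = [1.0] * n  # authority scores, initialized to 1
    h = [1.0] * n  # hub scores, initialized to 1
    for _ in range(iters):
        # Authority: a_i = sum of h_j over pages j that link to i.
        new_a = [0.0] * n
        for j, i in edges:
            new_a[i] += h[j]
        # Hub: h_i = sum of a_j over pages j that i links to.
        new_h = [0.0] * n
        for i, j in edges:
            new_h[i] += new_a[j]
        # Normalize so that each score vector sums to 1.
        total_a, total_h = sum(new_a), sum(new_h)
        a = [x / total_a for x in new_a]
        h = [x / total_h for x in new_h]
    return a, h


# Pages 0 and 1 act as hubs; page 2 is linked to by both of them.
authority, hub = hits([(0, 2), (0, 3), (1, 2)], n=4)
```

In this small example, page 2 ends up with the highest authority score and page 0, which links to both content pages, ends up with the highest hub score.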

Page 15

This will converge to a single stable point

Slightly change the notation: vectors $a = (a_1, \dots, a_n)$, $h = (h_1, \dots, h_n)$; adjacency matrix $M$ ($n \times n$): $M_{ij} = 1$ if $i \to j$

Then:

$$h_i = \sum_{i \to j} a_j \iff h_i = \sum_j M_{ij} a_j$$

So: $h = M a$

And likewise: $a = M^T h$

Page 16

Algorithm in new notation:

Set: $a = h = \mathbf{1}_n$

Repeat: $h = M a$, $a = M^T h$, normalize

Then: $a = M^T (M a)$, where $M a$ is the new $h$ and the result is the new $a$

$a$ is being updated (in 2 steps): $M^T (M a) = (M^T M) a$

$h$ is updated (in 2 steps): $M (M^T h) = (M M^T) h$

Thus, in $2k$ steps: $a = (M^T M)^k a$, $h = (M M^T)^k h$

Repeated matrix powering
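A minimal NumPy sketch of the matrix form. It assumes $M$ is a dense 0/1 array with `M[i, j] = 1` when $i \to j$, and it normalizes each vector to sum to 1 as in the element-wise version. The helper name is hypothetical. The last lines compute the top eigenvectors of $M^T M$ and $M M^T$ with `numpy.linalg.eigh`, which is what the repeated matrix powering predicts the iteration converges to.

```python
import numpy as np

def hits_matrix(M, iters=100):
    """HITS in matrix form: h = M a, then a = M^T h, then normalize both."""
    n = M.shape[0]
    a = np.ones(n)       # h is computed from a first, so only a needs a start value
    for _ in range(iters):
        h = M @ a        # hub update
        a = M.T @ h      # authority update; together: a <- (M^T M) a
        h = h / h.sum()
        a = a / a.sum()
    return a, h


M = np.array([[0, 0, 1, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]], dtype=float)
a, h = hits_matrix(M)

# Reference: top eigenvectors (eigh sorts eigenvalues ascending), scaled to sum to 1.
_, vecs = np.linalg.eigh(M.T @ M)
w = vecs[:, -1] / vecs[:, -1].sum()
_, vecs = np.linalg.eigh(M @ M.T)
u = vecs[:, -1] / vecs[:, -1].sum()
```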

Page 17

Definition: Let $A x = \lambda x$ for some scalar $\lambda$, vector $x$ and matrix $A$; then $x$ is an eigenvector, and $\lambda$ is its eigenvalue

Fact: If $A$ is symmetric ($A_{ij} = A_{ji}$; note that in our case $M^T M$ and $M M^T$ are symmetric), then $A$ has $n$ orthogonal unit eigenvectors $w_1, \dots, w_n$ that form a basis (coordinate system), with eigenvalues $\lambda_1, \dots, \lambda_n$ ($|\lambda_i| \ge |\lambda_{i+1}|$)

Page 18

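Write $x$ in the coordinate system $w_1, \dots, w_n$: $x = \sum_i \alpha_i w_i$

$x$ has coordinates $(\alpha_1, \dots, \alpha_n)$

Suppose: $|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_n|$

$$A^k x = (\lambda_1^k \alpha_1, \lambda_2^k \alpha_2, \dots, \lambda_n^k \alpha_n) = \sum_i \lambda_i^k \alpha_i w_i$$

As $k \to \infty$, normalizing by $\lambda_1^k$: $A^k x / \lambda_1^k \to \alpha_1 w_1$ (all other coordinates $\to 0$, provided $|\lambda_1| > |\lambda_2|$ and $\alpha_1 \ne 0$)

So the authority vector $a$ is the eigenvector of $M^T M$ associated with the largest eigenvalue $\lambda_1$ (need $|\lambda_1| > |\lambda_2|$)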
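The argument above is the power method. Here is a minimal sketch of it, assuming a real symmetric matrix whose largest-magnitude eigenvalue is strictly larger than the others. The random starting vector makes $\alpha_1 \ne 0$ with probability 1. The function name, seed, and fixed iteration count are illustrative choices.

```python
import numpy as np

def power_method(A, iters=200, seed=0):
    """Repeatedly multiply by A and renormalize; returns (eigenvalue, unit eigenvector)."""
    rng = np.random.default_rng(seed)
    x = rng.random(A.shape[0])      # random start, so alpha_1 != 0 almost surely
    for _ in range(iters):
        x = A @ x
        x = x / np.linalg.norm(x)   # keep the length at 1 to avoid overflow
    lam = x @ A @ x                 # Rayleigh quotient of a unit vector
    return lam, x


A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, x = power_method(A)
vals, vecs = np.linalg.eigh(A)      # reference: the last column is the top eigenvector
```

The error shrinks roughly like $|\lambda_2 / \lambda_1|^k$, so a larger gap between the top two eigenvalues means fewer iterations are needed.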

Page 19

A vote from an important page is worth more

A page is important if it is pointed to by other important pages

Define a “rank” $r_j$ for node $j$; $r_j$ should be proportional to:

$$\sum_{i \to j} \frac{r_i}{\text{out-degree of } i}$$

Page 20

$r_j$: probability that I’m currently at $j$ in a random walk: $r_j = \sum_i \Pr[\text{at } i] \cdot \Pr[i \to j]$

But $r_j = \sum_{i \to j} r_i / (\text{out-degree of } i)$ is exactly the probability of being at $j$ after one step of a random walk

Define: $N_{ij} = M_{ij} / d_i$, i.e., $N_{ij} = 1/d_i$ if node $i$ links to $j$ ($M_{ij} = 1$), where $d_i$ is the out-degree of $i$

$N_{ij}$ is the probability that we will be at $j$ next if we are currently at $i$

Then in the limit: $r = N^T r$, i.e., $r$ is the principal eigenvector of $N^T$
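A minimal NumPy sketch of these definitions, assuming every node has at least one out-link (so each $d_i > 0$). The helper names are hypothetical. The sketch builds $N$ from the adjacency matrix $M$, then takes the eigenvector of $N^T$ with the largest eigenvalue using `numpy.linalg.eig`. That vector is scaled to sum to 1 so it reads as a probability distribution. The graph is the Y!/A/MS graph from the power-iteration example: Y! links to itself and A, A links to Y! and MS, and MS links to A.

```python
import numpy as np

def transition_matrix(M):
    """N_ij = M_ij / d_i: probability of stepping from i to j (assumes all d_i > 0)."""
    d = M.sum(axis=1)              # out-degrees
    return M / d[:, None]

def principal_eigvec(N):
    """Eigenvector of N^T for its largest eigenvalue, scaled to sum to 1."""
    vals, vecs = np.linalg.eig(N.T)
    k = np.argmax(vals.real)
    r = vecs[:, k].real
    return r / r.sum()


# Node order: Y!, A, MS.
M = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
N = transition_matrix(M)
r = principal_eigvec(N)
```

Scaled to sum to 3 instead of 1, this gives the limit (6/5, 6/5, 3/5) that the iteration in the example approaches.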

Page 21

Power iteration: set $r_i = 1$, then iterate $r_j = \sum_{i \to j} r_i / d_i$

Example (columns: page the link leaves; rows: page it points to):

       Y!   A    MS
Y!     ½    ½    0
A      ½    0    1
MS     0    ½    0

y  =  1   1     5/4   9/8    …   6/5
a  =  1   3/2   1     11/8   …   6/5
m  =  1   ½     ¾     ½      …   3/5
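The iteration in this example can be reproduced exactly with Python's `fractions.Fraction`. The sketch assumes the link structure read off the matrix: Y! links to Y! and A, A links to Y! and MS, and MS links to A. The function name is hypothetical. It returns every iterate, so each column of the table above can be checked.

```python
from fractions import Fraction

def power_iterate(links, steps):
    """links[i] lists the pages i links to; returns [r^(0), r^(1), ..., r^(steps)]."""
    r = {page: Fraction(1) for page in links}   # start with r_i = 1
    history = [r]
    for _ in range(steps):
        new_r = {page: Fraction(0) for page in links}
        for i, outs in links.items():
            for j in outs:
                new_r[j] += r[i] / len(outs)     # page i splits r_i over its d_i out-links
        r = new_r
        history.append(r)
    return history


links = {"Y!": ["Y!", "A"], "A": ["Y!", "MS"], "MS": ["A"]}
history = power_iterate(links, 3)
```

Here `history[3]` holds the fourth column of the table: Y! = 9/8, A = 11/8, MS = 1/2.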

Page 22

Some pages are “dead ends” (have no out-links): such pages cause importance to leak out

Spider traps (all out-links are within the group): eventually spider traps absorb all importance

Page 23

Power iteration: set $r_i = 1$, then iterate $r_j = \sum_{i \to j} r_i / d_i$

Example (columns: page the link leaves; rows: page it points to); MS has no out-links:

       Y!   A    MS
Y!     ½    ½    0
A      ½    0    0
MS     0    ½    0

y  =  1   1     ¾     5/8   …   0
a  =  1   ½     ½     3/8   …   0
m  =  1   ½     ¼     ¼     …   0
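A short NumPy check of the dead-end effect. It uses the column layout of the matrix above: each column is the page a link leaves, and each row is the page it points to. Because the MS column is all zeros, every multiplication loses the importance MS held, so the total shrinks toward 0. The variable names are illustrative.

```python
import numpy as np

# Rows/columns in the order Y!, A, MS; MS has no out-links (dead end).
A_dead = np.array([[0.5, 0.5, 0.0],
                   [0.5, 0.0, 0.0],
                   [0.0, 0.5, 0.0]])

r = np.ones(3)
totals = [r.sum()]
for _ in range(50):
    r = A_dead @ r
    totals.append(r.sum())
# totals[k] is the total importance left after k steps; it keeps decreasing.
```

The first few totals are 3, 2, 1.5 and 1.25, matching the sums of the columns in the table.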

Page 24

Power iteration: set $r_i = 1$, then iterate $r_j = \sum_{i \to j} r_i / d_i$

Example (columns: page the link leaves; rows: page it points to); MS links only to itself:

       Y!   A    MS
Y!     ½    ½    0
A      ½    0    0
MS     0    ½    1

y  =  1   1     ¾     5/8   …   0
a  =  1   ½     ½     3/8   …   0
m  =  1   3/2   7/4   2     …   3
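The same check for the spider trap. The matrix is again read column-wise, but MS now links only to itself, so its column is (0, 0, 1). The total importance stays at 3, yet after many iterations essentially all of it sits on MS. The variable names are illustrative.

```python
import numpy as np

# Rows/columns in the order Y!, A, MS; MS links only to itself (spider trap).
A_trap = np.array([[0.5, 0.5, 0.0],
                   [0.5, 0.0, 0.0],
                   [0.0, 0.5, 1.0]])

r = np.ones(3)
iterates = [r]
for _ in range(100):
    r = A_trap @ r
    iterates.append(r)
# Every column sums to 1, so r.sum() stays 3 while MS absorbs everything.
```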

Page 25

“Tax” each page a fraction $\beta$ of its score at each iteration

Add a fixed constant to all pages

This models a random walk with a fixed probability of jumping to a random page

We really want:

$$r_j = (1 - \beta) \sum_{i \to j} \frac{r_i}{d_i} + \beta$$

where $d_i$ is the out-degree of node $i$

Random walk that follows a link with prob. $1 - \beta$ and randomly jumps with prob. $\beta$
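A minimal simulation of the random walk described above, assuming every page has at least one out-link and using Python's `random` module. At each step the walker jumps to a uniformly random page with probability $\beta$; otherwise it follows a uniformly random out-link. The fraction of time spent on each page approximates the scores $r_j$ rescaled to sum to 1. The function name, seed, and step count are illustrative. The example uses the spider-trap graph from the previous example, where MS no longer absorbs all of the importance.

```python
import random

def random_surfer(links, beta=0.15, steps=400_000, seed=1):
    """Visit frequencies of a walk that teleports with probability beta."""
    rng = random.Random(seed)
    pages = list(links)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        if rng.random() < beta:
            current = rng.choice(pages)           # random jump to any page
        else:
            current = rng.choice(links[current])  # follow a random out-link
        visits[current] += 1
    return {p: visits[p] / steps for p in pages}


# Spider trap: A links to Y! and MS, MS links only to itself.
freq = random_surfer({"Y!": ["Y!", "A"], "A": ["Y!", "MS"], "MS": ["MS"]})
```

With $\beta = 0.15$, MS still collects most of the visits (roughly two thirds in this graph), but Y! and A keep a nonzero share instead of dropping to 0.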

Page 26

PageRank as a principal eigenvector: $r = N^T r$, i.e., $r_j = \sum_{i \to j} r_i / d_i$, where $d_i$ is the out-degree of node $i$

But we really want: $r_j = (1 - \beta) \sum_{i \to j} r_i / d_i + \frac{\beta}{n} \sum_i r_i$ (if the scores sum to $n$, the last term is just $\beta$)

Define: $N'_{ij} = (1 - \beta) N_{ij} + \beta \frac{1}{n}$

Then: $r = N'^T r$

What is $\beta$? In practice $\beta = 0.15$ (roughly 5 links, then a jump)
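A minimal NumPy sketch of the taxed iteration. It assumes every page has at least one out-link, since dead ends would need separate handling that these slides do not cover, and it starts from $r_i = 1$ so the scores sum to $n$. The function name and iteration count are illustrative. The last lines build $N'$ explicitly on the spider-trap graph, where plain power iteration gave everything to MS, so the result can be checked as a fixed point of $N'^T$.

```python
import numpy as np

def pagerank(M, beta=0.15, iters=200):
    """Iterate r_j = (1 - beta) * sum_{i->j} r_i / d_i + (beta / n) * sum_i r_i."""
    n = M.shape[0]
    N = M / M.sum(axis=1)[:, None]       # N_ij = M_ij / d_i (all d_i > 0 assumed)
    r = np.ones(n)                       # start from r_i = 1, so sum(r) = n
    for _ in range(iters):
        r = (1 - beta) * (N.T @ r) + (beta / n) * r.sum()
    return r


# Spider trap, node order Y!, A, MS: M[i, j] = 1 if i links to j.
M = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]], dtype=float)
r = pagerank(M)

# The same scores as a fixed point of N' = (1 - beta) N + beta / n.
N = M / M.sum(axis=1)[:, None]
N_prime = 0.85 * N + 0.15 / 3
```

MS still ranks highest, but Y! and A keep roughly 0.54 and 0.38 of the total of 3 instead of decaying to 0.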

Page 27

Page 28

Topic-specific PageRank

Goal: evaluate pages not just by popularity but by how close they are to the topic

The walker has a small teleporting probability. Teleporting can go to:

Any page with equal probability (we used this so far)

A topic-specific set of “relevant” pages

Topic-specific (personalized) PageRank: $N'_{ij} = (1 - \beta) N_{ij} + \beta c_j$ (where $c$ is a vector)
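A minimal sketch of topic-specific (personalized) PageRank. It assumes $c$ is a probability vector (nonnegative, summing to 1) that is uniform over a chosen set of relevant pages and zero elsewhere. With scores summing to 1, $r = N'^T r$ becomes $r = (1 - \beta) N^T r + \beta c$. The function name and the 4-page cycle are illustrative.

```python
import numpy as np

def topic_pagerank(M, relevant, beta=0.15, iters=200):
    """Personalized PageRank: teleports land only on pages in `relevant`."""
    n = M.shape[0]
    N = M / M.sum(axis=1)[:, None]            # N_ij = M_ij / d_i (all d_i > 0 assumed)
    c = np.zeros(n)
    c[list(relevant)] = 1.0 / len(relevant)   # teleport vector, sums to 1
    r = c.copy()
    for _ in range(iters):
        r = (1 - beta) * (N.T @ r) + beta * c  # equals N'^T r when sum(r) = 1
    return r


# Directed cycle 0 -> 1 -> 2 -> 3 -> 0; the "topic" is page 0.
M = np.roll(np.eye(4), 1, axis=1)
r = topic_pagerank(M, relevant={0})
```

With uniform teleporting, every page on the cycle would score 1/4. Teleporting only to page 0 raises its score and makes the scores fall off with distance along the cycle.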

Page 29

Link farms: networks of millions of pages designed to focus PageRank on a few undeserving webpages

To minimize their influence, use a teleport set of trusted webpages

E.g., homepages of universities
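A small illustration of the trusted-teleport idea on a hypothetical 9-page graph. Pages 0 and 1 are trusted (think university homepages), and page 2 is an ordinary page they link to. Page 3 is the spam target, and pages 4 to 8 form a link farm that points at page 3; page 3 links back to them so that no page is a dead end. Instead of iterating, the sketch solves the fixed-point equation $r = (1 - \beta) N^T r + \beta c$ directly with `numpy.linalg.solve`. The graph and the helper name are made up for illustration.

```python
import numpy as np

def pagerank_solve(M, c, beta=0.15):
    """Solve r = (1 - beta) N^T r + beta c for a teleport vector c summing to 1."""
    n = M.shape[0]
    N = M / M.sum(axis=1)[:, None]             # N_ij = M_ij / d_i
    return np.linalg.solve(np.eye(n) - (1 - beta) * N.T, beta * c)


n = 9
M = np.zeros((n, n))
for i, j in [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0)]:   # trusted pages and a normal page
    M[i, j] = 1
for f in range(4, 9):                                    # link farm around target page 3
    M[f, 3] = 1
    M[3, f] = 1

uniform = np.full(n, 1.0 / n)                  # teleport to any page
trusted = np.zeros(n)
trusted[[0, 1]] = 0.5                          # teleport only to trusted pages
r_uniform = pagerank_solve(M, uniform)
r_trusted = pagerank_solve(M, trusted)
```

With uniform teleporting, the farm makes page 3 the top-ranked page. With the trusted teleport set, no trusted page links into the farm, so pages 3 to 8 receive no score at all.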

Page 30

Rich get richer

Page 31

Page 32