4/14/2015
COMP 465: Data Mining — More on PageRank
Slides Adapted From: www.mmds.org (Mining Massive Datasets)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

Power Iteration:
  Set r_j = 1/N
  1: r'_j = Σ_{i→j} r_i / d_i
  2: r = r'
  Goto 1

Example (pages y, a, m):

        y    a    m
   y   1/2  1/2   0
   a   1/2   0    1
   m    0   1/2   0

  Flow equations:
    r_y = r_y/2 + r_a/2
    r_a = r_y/2 + r_m
    r_m = r_a/2

  Iteration 0, 1, 2, ...:
    r_y = 1/3   1/3   5/12   9/24  ...  6/15
    r_a = 1/3   3/6   1/3   11/24  ...  6/15
    r_m = 1/3   1/6   3/12   1/6   ...  3/15

Random Surfer Interpretation:
  Imagine a random web surfer:
    At any time t, the surfer is on some page i
    At time t+1, the surfer follows an out-link from i uniformly at random
    Ends up on some page j linked from i
    Process repeats indefinitely
  Let p(t) be the vector whose i-th coordinate is the probability that the
  surfer is at page i at time t
  So p(t) is a probability distribution over pages
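The random-surfer process above can be simulated directly. The sketch below runs a Monte-Carlo walk on the y/a/m example graph; the page labels and step count are illustrative choices, not part of the slides.

```python
import random

# Random-surfer simulation on the y/a/m example graph.
# y links to {y, a}, a links to {y, m}, m links to {a}.
links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}

random.seed(0)
counts = {page: 0 for page in links}
page = "y"
for _ in range(200_000):
    page = random.choice(links[page])  # follow an out-link uniformly at random
    counts[page] += 1

# The fraction of time spent on each page approaches the PageRank scores,
# here roughly (0.4, 0.4, 0.2) for (y, a, m).
```

The empirical visit frequencies approximate the stationary distribution that power iteration computes exactly.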
Source: cs.rhodes.edu/welshc/COMP465_S15/Lecture16.pdf
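The power-iteration example above converges to r = (6/15, 6/15, 3/15). A quick numerical check, using the column-stochastic matrix M from the slide (column i holds 1/d_i for each page that i links to):

```python
import numpy as np

# Column-stochastic matrix for the y/a/m example.
M = np.array([[0.5, 0.5, 0.0],   # r_y = r_y/2 + r_a/2
              [0.5, 0.0, 1.0],   # r_a = r_y/2 + r_m
              [0.0, 0.5, 0.0]])  # r_m = r_a/2

r = np.full(3, 1.0 / 3.0)        # start uniform: r_j = 1/N
for _ in range(100):
    r = M @ r                    # r' = M . r ; then r = r'

# r approaches (6/15, 6/15, 3/15) = (0.4, 0.4, 0.2)
```

One hundred iterations is far more than needed here; in practice the loop would stop once successive vectors differ by less than a tolerance.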
We just rearranged the PageRank equation:

  r = β·M·r + [(1-β)/N]_N

where [(1-β)/N]_N is a vector with all N entries (1-β)/N, and [x]_N denotes
a vector of length N with all entries x.
Note: here we assumed M has no dead-ends.

Sparse Matrix Encoding:
  M is a sparse matrix! (with no dead-ends)
    ~10 links per node, so approx. 10N entries
  So in each iteration, we need to:
    Compute r^new = β·M·r^old
    Add a constant value (1-β)/N to each entry in r^new
  Note: if M contains dead-ends, then Σ_j r^new_j < 1 and we also have to
  renormalize r^new so that it sums to 1
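The rearranged update can be sketched on the y/a/m example, which has no dead ends, so each iteration is just the matrix product plus the constant add. β = 0.8 is an example choice, not fixed by the slides.

```python
import numpy as np

# Iterate r = beta*M.r + [(1-beta)/N]_N on the y/a/m example (no dead ends).
beta = 0.8
M = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 1.0],
              [0.0, 0.5, 0.0]])
N = M.shape[0]

r = np.full(N, 1.0 / N)
for _ in range(100):
    r = beta * (M @ r) + (1.0 - beta) / N  # sparse product + constant add

# Because M has no dead ends, r remains a probability distribution.
```

Since no mass leaks, the sum of r stays exactly 1 and no renormalization step is needed.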
The Complete Algorithm:
  Input: Graph G and parameter β
    Directed graph G (can have spider traps and dead ends)
    Parameter β
  Output: PageRank vector r^new

  Set: r^old_j = 1/N

  Repeat until convergence: Σ_j |r^new_j − r^old_j| > ε
    ∀j: r'^new_j = Σ_{i→j} β · r^old_i / d_i
        r'^new_j = 0 if in-degree of j is 0
    Now re-insert the leaked PageRank:
      ∀j: r^new_j = r'^new_j + (1 − S)/N,  where S = Σ_j r'^new_j
    r^old = r^new
If the graph has no dead-ends then the amount of leaked PageRank is 1-ฮฒ. But since we have dead-ends
the amount of leaked PageRank may be larger. We have to explicitly account for it by computing S.
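The complete algorithm above can be sketched as follows. The graph representation, function name, and β = 0.8 are illustrative choices; the leaked mass S is computed explicitly each iteration, so dead ends are handled correctly.

```python
import numpy as np

def pagerank(out_links, N, beta=0.8, eps=1e-10):
    """out_links: dict node -> list of successor nodes; nodes are 0..N-1.
    Re-inserts the leaked PageRank (1 - S)/N, so dead ends are allowed."""
    r_old = np.full(N, 1.0 / N)
    while True:
        r_prime = np.zeros(N)                 # r'_j = 0 covers in-degree-0 nodes
        for i, dests in out_links.items():
            for j in dests:                   # dead ends contribute nothing here
                r_prime[j] += beta * r_old[i] / len(dests)
        S = r_prime.sum()                     # S = sum_j r'_j
        r_new = r_prime + (1.0 - S) / N       # re-insert the leaked PageRank
        if np.abs(r_new - r_old).sum() <= eps:
            return r_new
        r_old = r_new

# Tiny example with a dead end: node 0 -> node 1, node 1 has no out-links.
ranks = pagerank({0: [1], 1: []}, N=2)
```

With a dead end present, S falls below β in every iteration, so the re-inserted mass exceeds 1-β, exactly as the slide notes.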