CENTRALITY MEASURES Shaikh Arifuzzaman and Md Hasanuzzaman Bhuiyan CS 6604: Data Mining Large Networks and Time-series Fall 2013

CS 6604: Data Mining Large Networks and Time -series …people.cs.vt.edu/badityap/classes/cs6604-Fall13/student-lectures/... · CENTRALITY MEASURES . Shaikh Arifuzzaman and Md Hasanuzzaman

Feb 06, 2018


CENTRALITY MEASURES Shaikh Arifuzzaman and Md Hasanuzzaman Bhuiyan

CS 6604: Data Mining Large Networks and Time-series Fall 2013


Outline

Part 1: Basic Centrality Concepts
• Degree Centrality
• Betweenness Centrality
• Closeness Centrality
• Eigenvector Centrality
• Centralization

Part 2
Part 2A
• Hub and Authorities (HITS Algorithm)
• PageRank
Part 2B
• Spectral Analysis of Hub and Authorities
• Spectral Analysis of PageRank

PART 1 Basic centrality concepts


Centrality

• Relative importance of a node in the graph
• Which nodes are in the “center” of a graph?
• What do you mean by “center”?
• Definition of “center” varies by context/purpose

“There is certainly no unanimity on exactly what centrality is or on its conceptual foundations, and there is little agreement on the proper procedure for its measurement.” (Freeman, 1979)

Centrality

• Real-valued function on the nodes of a graph
• Structural index
• Applications:
  • How influential is a person in a social network?
  • How well used is a road in a transportation network?
  • How important is a web page?
  • How important is a room in a building?

Centrality Measures

Different measures of centrality:
• Degree centrality
• Betweenness centrality
• Closeness centrality
• Eigenvector centrality

Example [Borgatti, 2005]

Degree Centrality


Degree Centrality

• Most intuitive notion of centrality
• The node with the highest degree is the most important
• Index of exposure to what is flowing through the network
  • Gossip network: a central actor is more likely to hear a piece of gossip
• Normalized degree centrality: divide by the maximum possible degree (n-1)
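A minimal Python sketch of raw and normalized degree centrality. The adjacency-dictionary input format, the function name `degree_centrality`, and the four-node star graph are assumptions made for illustration; they do not come from the slides.

```python
def degree_centrality(adj, normalized=True):
    """Degree centrality for an undirected graph given as {node: set_of_neighbors}."""
    n = len(adj)
    # Normalization divides by the maximum possible degree, n - 1.
    scale = (n - 1) if normalized and n > 1 else 1
    return {v: len(nbrs) / scale for v, nbrs in adj.items()}

# Hypothetical 4-node star: node "a" is linked to every other node.
star = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
scores = degree_centrality(star)
raw = degree_centrality(star, normalized=False)
```

In this star, the hub reaches the maximum normalized score of 1, while each leaf scores 1/3.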


Degree Centrality

Example:


Degree Centrality

When to use?
• Whom to ask for a favor?
• People you can talk to

Degree Centrality

Can be deceiving. Why?
• It is a local measure: it counts only a node's direct neighbors and ignores the rest of the network

Betweenness Centrality


Betweenness Centrality

• The betweenness centrality (BC) of a node $u$ is the sum, over all pairs of other nodes, of the fraction of shortest paths between the pair that pass through $u$
• Quantifies the control of a node on the communication between other nodes
• First introduced by Freeman

$$C_B(u) = \sum_{s \neq u \neq t} \frac{\delta_{st}(u)}{\delta_{st}}$$

• $s$ = source, $t$ = destination
• $\delta_{st}$ = number of shortest paths between $(s, t)$
• $\delta_{st}(u)$ = number of shortest paths between $(s, t)$ that pass through $u$

Betweenness Centrality

Example:

• A lies between no two other vertices
• B lies between A and 3 other vertices: C, D, and E
• C lies between 4 pairs of vertices: (A,D), (A,E), (B,D), (B,E)

[Figure: path graph on vertices A, B, C, D, E]

Betweenness Centrality

More example:
• Why do C and D each have betweenness 1? They are both on shortest paths for the pairs (A,E) and (B,E), and so must share credit: ½ + ½ = 1
• Can you figure out why B has betweenness 3.5 while E has betweenness 0.5?

[Figure: graph on vertices A, B, C, D, E]

Betweenness Centrality

• Famous algorithm by Brandes (for a graph with n nodes and m edges)
  • $O(nm)$ for unweighted graphs
  • $O(n^2 \log n + nm)$ for weighted graphs
• Edge betweenness centrality: count the shortest paths that pass through that edge
• Normalize
  • Divide by $\binom{n-1}{2}$ for undirected graphs: the number of pairs of nodes excluding the node itself
  • Divide by $(n-1)(n-2)$ for directed graphs
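A compact sketch of Brandes-style betweenness for unweighted, undirected graphs: one BFS per source counts shortest paths, then credit is accumulated back from the farthest nodes. The edge list below is an assumption: the figure for the five-node example is not preserved, and this graph was chosen because it reproduces the betweenness values quoted for that example (B = 3.5, C = D = 1, E = 0.5).

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph {node: neighbors}."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        dep = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                dep[v] += sigma[v] / sigma[w] * (1 + dep[w])
            if w != s:
                cb[w] += dep[w]
    # In an undirected graph every pair {s, t} was counted twice.
    return {v: c / 2 for v, c in cb.items()}

# Assumed edge list (the original figure is not available).
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")]
adj = {v: set() for e in edges for v in e}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
bc = betweenness(adj)
```

To normalize for an undirected graph, each value would be divided by (n-1)(n-2)/2, which is 6 for n = 5.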


Betweenness Centrality

Normalized example:


Betweenness Centrality

Normalized example:
• Red circled node has low centrality value. Why?
• Green circled node has high value. Why?

Closeness Centrality


Closeness Centrality

• A node is considered important if it is relatively close to all other nodes.
• Farness of a node is the sum of its distances to all other nodes.
• Closeness is the inverse of the farness:

$$C_C(u) = \frac{1}{\sum_{v \neq u} d(u,v)}$$

• Normalized: divide the farness by $(n-1)$ before inverting, i.e. $C'_C(u) = \frac{n-1}{\sum_{v \neq u} d(u,v)}$

Closeness Centrality

Closeness is a measure of how long it will take to spread information from node 𝑢 to all other nodes

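Normalized example:

$$C'_C(A) = \left[\frac{\sum_{j=1}^{N} d(A,j)}{N-1}\right]^{-1} = \left[\frac{1+2+3+4}{4}\right]^{-1} = \left[\frac{10}{4}\right]^{-1} = 0.4$$

[Figure: path graph on vertices A, B, C, D, E]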
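The normalized example can be checked with a short BFS-based sketch. The function name `closeness` and the adjacency-list format are assumptions, as is the vertex order A-B-C-D-E along the path; the value for A only requires that A sits at one end.

```python
from collections import deque

def closeness(adj, u, normalized=True):
    """Closeness of u in a connected, unweighted graph {node: neighbors}."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    farness = sum(dist.values())  # sum of distances to all other nodes
    n = len(adj)
    return (n - 1) / farness if normalized else 1 / farness

# Five-node path; the order A-B-C-D-E is assumed.
path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
c_a = closeness(path, "A")
c_c = closeness(path, "C")
```

The middle node C has farness 1 + 1 + 2 + 2 = 6, so its normalized closeness is 4/6, higher than the 0.4 of the end node A.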


Closeness Centrality

More example:


Comparison

• Comparing across 3 centrality values
• Generally, the 3 types will be positively correlated
• When they are not, it tells you something interesting!

| | Low Degree | Low Closeness | Low Betweenness |
|---|---|---|---|
| High Degree | | Embedded in cluster that is far from the rest of the network | Ego's connections are redundant - communication bypasses him/her |
| High Closeness | Key player tied to important/active alters | | Probably multiple paths in the network, ego is near many people, but so are many others |
| High Betweenness | Ego's few ties are crucial for network flow | Very rare cell. Would mean that ego monopolizes the ties from a small number of people to many others. | |

Eigenvector Centrality

• Measure of the influence of a node in a network
• Connections to high-scoring nodes contribute more
• “An important node is connected to important neighbors”
• Google’s PageRank is a variant of eigenvector centrality
• Eigenvector centrality of $v$: the entry for $v$ in the principal eigenvector of the adjacency matrix
• Power iteration is one of the eigenvalue algorithms that can compute it
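A minimal power-iteration sketch for eigenvector centrality on an undirected graph. Two choices here are assumptions, not taken from the slides: the update uses x + Ax (that is, the matrix A + I) so that the iteration also settles on bipartite graphs such as stars and paths, and scores are rescaled to unit Euclidean length each round. The star graph is a hypothetical example.

```python
import math

def eigenvector_centrality(adj, iterations=200):
    """Power iteration on A + I for an undirected graph {node: neighbors}."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Each node keeps its own score and adds the scores of its neighbors.
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        x = {v: val / norm for v, val in nxt.items()}
    return x

# Hypothetical star: "c" is connected to three leaves.
star = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
scores = eigenvector_centrality(star)
```

Adding the identity shifts every eigenvalue by 1 but leaves the eigenvectors unchanged, so the resulting ranking is that of the adjacency matrix itself. For this star the center scores √3 times as much as a leaf.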


Centralization of Network

• Measure of how central its most central node is in relation to how central all the other nodes are
• How much variation is there in the centrality scores?
• Every centrality measure can have its own centralization measure
• Freeman’s formula for centralization of degree:

$$C_D = \frac{\sum_{i=1}^{N} \left[C_D(n^*) - C_D(i)\right]}{(N-1)(N-2)}$$

where $C_D(n^*)$ is the maximum value in the network, and the denominator is the theoretically largest such sum of differences in any network of the same size.

Centralization of Network

Degree Centralization Example:

CD = 0.167

CD = 0.167 CD = 1.0
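Freeman's degree centralization can be sketched in a few lines. The function name and the two test graphs are assumptions for illustration: a five-node star, which attains the maximum of 1.0, and a five-node path, which evaluates to 2/12 ≈ 0.167. Whether these are the graphs drawn in the original figures is not known.

```python
def degree_centralization(adj):
    """Freeman's degree centralization for an undirected graph {node: neighbors}."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    d_max = max(degrees)
    # Sum of differences from the maximum degree, divided by the largest
    # possible such sum, (n - 1)(n - 2), which a star attains.
    return sum(d_max - d for d in degrees) / ((n - 1) * (n - 2))

# Hypothetical examples.
star5 = {"a": ["b", "c", "d", "e"], "b": ["a"], "c": ["a"], "d": ["a"], "e": ["a"]}
path5 = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
cd_star = degree_centralization(star5)
cd_path = degree_centralization(path5)
```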


Centralization of Network

Degree Centralization Example: financial trading networks

high centralization: one node trading with many others

low centralization: trades are more evenly distributed


PART 2A Hub-Authority and PageRank: Conceptual Introduction


Searching the Web

How does Google know the “best” answers?

How hard is the problem?
• Synonymy
• Polysemy
• Dynamicity

Understanding the network structure of web pages is crucial

Link Analysis

In this hyperlinked network of webpages, which pages are most popular/important?
• More in-links?
• More out-links?
• Combinations?

Voting by in-links

How to rank pages from in-links?

Intuition:
• Implicit endorsement
• Single vs aggregate endorsement
• Page referred by most is preferred

[Figure: example graph with pages 1, 2, 3, 4]

How about out-links

Any implication of out-links?


An example [Kleinberg]

In-links to pages for the query newspaper

Pages receiving more in-links from other relevant pages are important

An example [Kleinberg] contd.

Good lists: some pages compile lists of relevant resources

Pages listing a higher number of relevant resources should score higher as lists

An example [Kleinberg] contd.

Updated score of a page: sum of the scores of all lists that point to it

Where does it head to? - Principle of repeated improvement

Hub-Authority (HITS Algorithm)

• Authority: highly endorsed answers to queries
• Hub: high-value lists for the query
• Use the quality of the hubs to refine the estimate of the quality of the authorities
  • Authority update rule
  • Hub update rule
  • Recursive dependency:

$$a(v) \leftarrow \sum_{w \in \mathrm{parent}[v]} h(w) \qquad h(v) \leftarrow \sum_{w \in \mathrm{children}[v]} a(w)$$

Hub-Authority (HITS Algorithm)

• Authority: highly endorsed answers to queries
• Hub: high-value lists for the query

Example: pages 2, 3, and 4 link to page 1, and page 1 links to pages 5, 6, and 7:
a(1) = h(2) + h(3) + h(4)
h(1) = a(5) + a(6) + a(7)

Hub-Authority (HITS Algorithm)

• Start with all hub and authority scores equal to 1
• Choose a number of steps K
• Perform a sequence of K updates, applying the Authority update and then the Hub update in each step

Hub-Authority (HITS Algorithm)

Problems
• Scores grow to very large numbers: solved by normalization
• Does it actually converge? Equilibrium; effect of initial values
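The K-step procedure with normalization can be sketched as follows. The function name `hits`, the dictionary input format, and the choice to rescale each score vector so it sums to 1 are assumptions; the slides only say that normalization is needed. The example graph follows the update example above, where pages 2, 3, and 4 link to page 1, and page 1 links to pages 5, 6, and 7.

```python
def hits(out_links, k=50):
    """Run k rounds of Authority-then-Hub updates on {page: [pages it links to]}."""
    nodes = set(out_links) | {w for ws in out_links.values() for w in ws}
    in_links = {v: [] for v in nodes}
    for v, ws in out_links.items():
        for w in ws:
            in_links[w].append(v)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(k):
        # Authority update: sum of hub scores of pages pointing to v.
        auth = {v: sum(hub[w] for w in in_links[v]) for v in nodes}
        # Hub update: sum of authority scores of pages v points to.
        hub = {v: sum(auth[w] for w in out_links.get(v, ())) for v in nodes}
        # Normalize so the scores stay bounded (assumed: divide by the sum).
        a_total = sum(auth.values()) or 1.0
        h_total = sum(hub.values()) or 1.0
        auth = {v: a / a_total for v, a in auth.items()}
        hub = {v: h / h_total for v, h in hub.items()}
    return hub, auth

links = {2: [1], 3: [1], 4: [1], 1: [5, 6, 7]}
hub, auth = hits(links)
```

Only the relative sizes of the scores matter, so any fixed rescaling rule (sum to 1, unit length) serves the same purpose.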


PageRank

Keys:
• Mode of endorsement forms the basis of PageRank
• Starts with simple voting on in-links
• Passes endorsement across out-links
• Repeated improvement

“PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.” (Facts about Google and Competition)

PageRank (contd.)

Computation procedure:
• Each node starts with initial PageRank 1/n
• Choose a number of steps K
• Perform K updates of PageRank values
  • Each node/page divides its current PageRank value equally across its out-links
  • Each page updates its new PageRank value to be the sum of what it receives

Think of PageRank as a kind of “fluid” that circulates through the network

PageRank (contd.)

What is the PageRank of node A at step 1?


PageRank (contd.)

Computation procedure: each node starts with initial PageRank 1/8

Step 1: PR(A) = ½·PR(D) + ½·PR(E) + PR(H) + PR(F) + PR(G) = 1/16 + 1/16 + 1/8 + 1/8 + 1/8 = 1/2
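The basic update can be sketched with exact fractions so the step-1 value is reproduced without rounding. The eight-page link structure below is an assumption: the figure is not preserved, and this graph was chosen to agree with the step-1 equation (D and E each split their rank between two out-links, one of them to A; F, G, and H link only to A).

```python
from fractions import Fraction

def pagerank_basic(out_links, steps):
    """Basic PageRank update; every page must have at least one out-link."""
    n = len(out_links)
    rank = {v: Fraction(1, n) for v in out_links}
    for _ in range(steps):
        nxt = {v: Fraction(0) for v in out_links}
        for v, targets in out_links.items():
            share = rank[v] / len(targets)  # divide equally across out-links
            for w in targets:
                nxt[w] += share             # each page sums what it receives
        rank = nxt
    return rank

# Assumed link structure (hypothetical reconstruction of the 8-page example).
web = {
    "A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
    "D": ["A", "H"], "E": ["A", "H"],
    "F": ["A"], "G": ["A"], "H": ["A"],
}
step1 = pagerank_basic(web, 1)
```

Because every page passes on all of its rank, the total stays equal to 1 after every step.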


PageRank (contd.)

Convergence/equilibrium? Is there any? How to check?

PageRank (contd.)

Do you see any problem with the definition?


PageRank (contd.)

It’s leaking!


PageRank (contd.)

What would happen here? [Broder et al. 2001]


PageRank (contd.)

Solution: scaled PageRank update rule
• Choose a scaling factor s
• Scale down all PageRank values by a factor of s
• Divide the residual 1 − s equally over all nodes, giving (1 − s)/n to each
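A sketch of the scaled update rule. The value s = 0.85, the handling of pages without out-links (they keep their own rank), and the four-page graph in which C and D link only to each other are all assumptions chosen for illustration.

```python
def pagerank_scaled(out_links, s=0.85, steps=100):
    """Scaled PageRank update on {page: [pages it links to]}."""
    n = len(out_links)
    rank = {v: 1.0 / n for v in out_links}
    for _ in range(steps):
        nxt = {v: 0.0 for v in out_links}
        for v, targets in out_links.items():
            if targets:
                share = rank[v] / len(targets)
                for w in targets:
                    nxt[w] += share
            else:
                nxt[v] += rank[v]  # assumption: a page with no out-links keeps its rank
        # Scale everything down by s, then hand (1 - s)/n to every page.
        rank = {v: s * r + (1 - s) / n for v, r in nxt.items()}
    return rank

# Hypothetical graph: C and D form a pair that would absorb all basic PageRank.
trap = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["C"]}
ranks = pagerank_scaled(trap)
```

Under the basic rule all rank would end up on C and D; with scaling, A still keeps the floor value (1 − s)/n.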


PageRank (contd.)

Limit of scaled PageRank
• Still converges? YES
• Depends on scaling factor? YES
• Sensitivity to addition/deletion of pages? [Ng et al. 2001]

PageRank: alternate definition

Random walk

• Choose a page at random, picking each page with equal probability
• Follow links for a sequence of k steps: at each step, pick a random out-link and follow it to where it leads
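The random-walk view can be illustrated with a small Monte Carlo sketch: the fraction of walks that end on a page after k steps estimates that page's PageRank after k basic updates. The function name, the number of walks, the fixed seed, and the eight-page graph are assumptions made for illustration.

```python
import random
from collections import Counter

def random_walk_frequencies(out_links, k, walks=20000, seed=0):
    """Estimate where a k-step random walk ends; every page needs an out-link."""
    rng = random.Random(seed)
    pages = sorted(out_links)
    ends = Counter()
    for _ in range(walks):
        page = rng.choice(pages)                # uniform random starting page
        for _step in range(k):
            page = rng.choice(out_links[page])  # uniform random out-link
        ends[page] += 1
    return {p: ends[p] / walks for p in pages}

# Hypothetical 8-page graph in which D and E split between A and H,
# and F, G, H link only to A.
web = {
    "A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
    "D": ["A", "H"], "E": ["A", "H"],
    "F": ["A"], "G": ["A"], "H": ["A"],
}
freq = random_walk_frequencies(web, k=1)
```

For this graph the exact probability of being at A after one step is 1/2, so the estimate for A should land close to 0.5.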


PageRank: alternate definition

Scaled version of Random walk?


PART 2B Spectral analysis of Hub-Authority and PageRank


Spectral Analysis of Hub-Authorities

• Adjacency matrix representation of the link structure, $M_{ij}$
• Hub and authority values of nodes are two distinct vectors
• Goal: show that the hub-authority computation converges to limiting values

Spectral Analysis of Hub-Authorities

Example: Updating hub values

Spectral Analysis of Hub-Authorities

Example: Updating authority values

Spectral Analysis of Hub-Authorities

Example: Updating hub and authority values

After k steps: multiplying an initial vector by larger and larger powers of $M^T M$ and $M M^T$, respectively
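Written out, the updates take the following matrix form. This is a sketch under the assumption that $M_{ij} = 1$ when page $i$ links to page $j$ (and 0 otherwise), with $h^{\langle k \rangle}$ and $a^{\langle k \rangle}$ denoting the hub and authority vectors after $k$ steps; the superscript notation is assumed here.

```latex
% One round: authority update, then hub update
a^{\langle 1 \rangle} = M^T h^{\langle 0 \rangle},
\qquad
h^{\langle 1 \rangle} = M a^{\langle 1 \rangle} = M M^T h^{\langle 0 \rangle}

% After k rounds
a^{\langle k \rangle} = (M^T M)^{k-1} M^T h^{\langle 0 \rangle},
\qquad
h^{\langle k \rangle} = (M M^T)^{k} h^{\langle 0 \rangle}
```

The authority vector is therefore driven by powers of $M^T M$ and the hub vector by powers of $M M^T$.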


Convergence of Hubs

• Normalization is required for convergence to a limit as k goes to infinity
• Eigenvector $h^{\langle * \rangle}$, eigenvalue $c$
• The proof reduces to showing that the sequence of vectors $h^{\langle k \rangle}/c^k$ indeed converges to an eigenvector of $MM^T$

Convergence of Hubs

Theorem [Ref 268, Kleinberg Book]

• Orthogonal eigenvectors: $z_1, z_2, \ldots, z_n$
• Corresponding eigenvalues: $c_1, c_2, \ldots, c_n$
• Assumption: $|c_1| > |c_2|$

Convergence of Hubs

In a similar fashion,

Proof:

As k goes to infinity,


Convergence of Hubs

Proof (contd.): need to show that
1. The coefficient q1 is not zero.
2. The limit exists regardless of the initial hub values: any positive initial vector x works; it only gives a different linear combination.

Proving 1:
• The only requirement is that x is not orthogonal to z1
• It can be proved that no positive vector is orthogonal to z1
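The limit argument can be summarized as follows. This is a sketch of the standard reasoning, assuming that $z_1, \ldots, z_n$ are mutually orthogonal eigenvectors of the symmetric matrix $MM^T$ with eigenvalues ordered so that $|c_1| > |c_2| \geq \cdots \geq |c_n|$, and that $q_1, \ldots, q_n$ are the coefficients of the initial hub vector in that basis.

```latex
h^{\langle 0 \rangle} = q_1 z_1 + q_2 z_2 + \cdots + q_n z_n

h^{\langle k \rangle} = (M M^T)^k h^{\langle 0 \rangle}
  = c_1^k q_1 z_1 + c_2^k q_2 z_2 + \cdots + c_n^k q_n z_n

\frac{h^{\langle k \rangle}}{c_1^k}
  = q_1 z_1 + \left(\frac{c_2}{c_1}\right)^{k} q_2 z_2 + \cdots
    + \left(\frac{c_n}{c_1}\right)^{k} q_n z_n
  \;\longrightarrow\; q_1 z_1 \quad (k \to \infty)
```

Every ratio $c_i / c_1$ with $i \geq 2$ has absolute value below 1, so those terms vanish, and the limit is a nonzero multiple of $z_1$ exactly when $q_1 \neq 0$.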


Spectral analysis of PageRank

• Matrix representation of the link structure: $N_{ij}$ is the portion of i’s PageRank that should be passed to j in one update step
• PageRank vector r
• Goal: show that the PageRank computation converges to limiting values

Spectral analysis of PageRank

Goal: PageRank computation converges to limiting values

If node i has $l_i$ outgoing edges:

If no outgoing edge:

Scaled version,
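The three cases are commonly written as below. The formulas themselves are not preserved in the transcript, so this is a reconstruction of the usual textbook convention: the self-loop for pages without out-links and the symbol $\tilde{N}$ for the scaled matrix are assumptions.

```latex
% Node i has l_i outgoing edges
N_{ij} = \begin{cases} 1/l_i & \text{if } i \text{ links to } j \\ 0 & \text{otherwise} \end{cases}

% Node i has no outgoing edge: it passes its PageRank to itself
N_{ii} = 1

% Basic update
r \leftarrow N^T r

% Scaled version, with scaling factor s and n nodes
\tilde{N}_{ij} = s\, N_{ij} + \frac{1 - s}{n},
\qquad
r \leftarrow \tilde{N}^T r
```

Since the entries of $r$ sum to 1, applying $\tilde{N}^T$ is the same as scaling the basic update by $s$ and adding $(1-s)/n$ to every node.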


Convergence of PageRank

We will apply Perron's Theorem here [Ref 268, Kleinberg Book]

Proof:

PageRank as a probability of random walk

Scaled version,


Questions?
