1 Connectivity Structure of Bipartite Graphs via the KNC-Plot Erik Vee joint work with Ravi Kumar, Andrew Tomkins

Dec 16, 2015


Roy Nash
Page 1

Connectivity Structure of Bipartite Graphs via the KNC-Plot

Erik Vee

joint work with

Ravi Kumar, Andrew Tomkins

Page 2

The fundamental question…

• Given a graph with millions or billions of nodes, how do we understand it?

Page 3

Macroscopic Success Stories

• Given a graph with millions or billions of nodes, how do we understand it?

• Spectral graph analysis

– Eigenvalues give intuition about mixing time and connectivity

• Conductance of a graph

• Degree distribution

Page 4

Macroscopic models of graphs: understanding connectivity

Bow-tie model [Broder et al.]: Web graph

Jellyfish model [Faloutsos et al.]: Internet AS graph

No equivalent model for bipartite graphs

Page 5

Our Goals

• Develop macroscopic tools to analyze social networks

– Massive networks

– What are simple, easy-to-understand properties?

– Today: KNC-plot for bipartite graphs

• Given an implicit graph representation, do something smarter than explicitly building the graph

– The bipartite representation gives an implicit graph

– Our algorithms never build the actual graph

– Same spirit as the work of [Feder, Motwani 95]

Page 6

Outline

• Definition of the KNC-plot

– k-neighborhood graph

• Analysis of real social networks using the KNC-plot

• Description of the algorithm

Page 7

The k-neighborhood graph, G_k

• Given a bipartite graph B, with users on the left and interests on the right

• Connect two users if they share at least k interests

Page 8

The k-neighborhood graph, G_k

• Given a bipartite graph B, with users on the left and interests on the right

• Connect two users if they share at least k interests

[Figure: G_1]

Page 9

• Given a bipartite graph B, with users on the left and interests on the right

• Connect two users if they share at least k interests

The k-neighborhood graph, G_k

[Figure: G_2]

Page 10

• Given a bipartite graph B, with users on the left and interests on the right

• Connect two users if they share at least k interests

The k-neighborhood graph, G_k

[Figure: G_3]

Page 11

Illustration k=1

Page 12

Illustration k=2

Page 13

Illustration k=3

Page 14

Illustration k=4

Page 15

Illustration k=5

Page 16

The KNC-plot

• The k-neighbor connectivity plot

– How many connected components does G_k have?

– What is the size of the largest component?

• Answers the question: how many shared interests are meaningful?

– Communities, cuts
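To make the two KNC-plot quantities concrete, here is a minimal Python sketch that computes them naively for each k. It assumes the bipartite graph is given as a dict mapping each user to a set of interests, and it counts isolated users as singleton components. The function names `knc_point` and `knc_plot` and the toy graph `B` are hypothetical illustrations, not the authors' code. It checks every pair of users directly, so it is only suitable for small examples.

```python
from itertools import combinations


def find(parent, x):
    # Union-find "find" with path halving over user ids.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def knc_point(interests, k):
    """Return (number of components, size of largest component) of G_k.

    interests: dict mapping each user to a set of interests (the bipartite graph B).
    Naive: tests every pair of users; isolated users count as singleton components.
    """
    users = list(interests)
    parent = {u: u for u in users}
    for u, v in combinations(users, 2):
        if len(interests[u] & interests[v]) >= k:  # at least k shared interests
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:
                parent[ru] = rv
    sizes = {}
    for u in users:
        r = find(parent, u)
        sizes[r] = sizes.get(r, 0) + 1
    return len(sizes), max(sizes.values())


def knc_plot(interests, k_max):
    # One (components, largest component) point for each k = 1..k_max.
    return {k: knc_point(interests, k) for k in range(1, k_max + 1)}


# Hypothetical toy graph: users 1..4, interests A..E.
B = {1: {"A", "B", "C"}, 2: {"A", "C", "D"}, 3: {"D", "E"}, 4: {"A", "B", "C", "D"}}
print(knc_plot(B, 4))  # {1: (1, 4), 2: (2, 3), 3: (2, 3), 4: (4, 1)}
```

In this toy graph, everyone is connected at k = 1. User 3 splits off at k = 2. At k = 4, no two users share four interests, so all four users are singletons.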

Page 17

Analysis

• Four graphs:

– LiveJournal

• Blogging site where users can specify interests

– Y! query logs (interests = queries)

• Queries issued for Yahoo! Search (Try it at www.yahoo.com)

– Content match (users = web pages, interests = ads)

• Ads shown on web pages

– Flickr photo tags (users = photos, interests = tags)

• All data anonymized, sanitized, and downsampled

– Graphs have hundreds of thousands to a million users

Page 18

Examples (plots: largest component, number of components)

At k=5, all connected. At k=6, interesting!

At k=6, nobody connected

Page 19

Examples (plots: largest component, number of components)

At k=5, all connected. At k=6, interesting!

At k=6, nobody connected

Content match: web pages = “users”, ads = “interests”

Flickr: photos = “users”, tags = “interests”

Page 20

Examples (plots: largest component, number of components)

Connectivity varies smoothly (“heavy-tailed”)

At k=14, 10% connected. At k=36, 1% connected.

Page 21

Examples (plots: largest component, number of components)

Connectivity varies smoothly (“heavy-tailed”)

At k=14, 10% connected. At k=36, 1% connected.

Y! queries: users = users, queries = “interests”

LiveJournal: users = users, interests = interests

Page 22

Algorithms

• Naïve implementation takes O(mn) time

– Impractical for large graphs

[Plot: running time for k = 2, naïve vs. ours]

Page 23

Algorithms

• Naïve implementation takes O(mn) time

– Impractical for large graphs

• Our implementation takes $O(m^{2-1/k})$ time

– Social networks are generally sparse

– Faster for power-law degree distributions (no change to the algorithm)

– Very fast for k=2; can trim the graph for k=3, etc.

Space O(km)

[Plot: running time for k = 2, naïve vs. ours]

Page 24

Alg-Intersect

• Roughly speaking, for every pair of users, determine whether they have at least k interests in common

• For each node u, record its neighborhood

– For each node v:

• check whether the neighborhoods of u and v intersect in at least k nodes

– If so, connect them; otherwise, don’t

• Takes O(nm) time (n = # nodes, m = # edges)

Space = O(m)

Page 25

Alg-Intersect

• Roughly speaking, for every pair of users, determine whether they have at least k interests in common

• For each node u ∈ S, record its neighborhood

– For each node v:

• check whether the neighborhoods of u and v intersect in at least k nodes

– If so, connect them; otherwise, don’t

• Takes O(nm) time (n = # nodes, m = # edges)

• BUT: we may explore only the nodes u in a set S

– Takes O(|S|m) time

Space = O(m)
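The restricted form of Alg-Intersect can be sketched in Python as follows. The sketch assumes the same dict-of-sets input as above. For each u in S, it marks u's neighborhood and then scans every other user's interest list once, so each u costs O(m) and the whole pass costs O(|S|m). The function name `alg_intersect` and the choice to return edges as frozensets are illustrative assumptions, not the authors' implementation.

```python
def alg_intersect(interests, S, k):
    """Return every edge of G_k with at least one endpoint in S.

    interests: dict mapping each user to a set of interests.
    Edges are returned as frozensets {u, v}, so each undirected edge appears once.
    """
    edges = set()
    for u in S:
        marked = set(interests[u])  # record u's neighborhood: O(deg(u))
        for v, nv in interests.items():
            if v == u:
                continue
            # Scan v's interest list and count hits: O(deg(v)), so O(m) per u.
            common = sum(1 for x in nv if x in marked)
            if common >= k:
                edges.add(frozenset((u, v)))
    return edges


# Hypothetical toy graph: users 1..4, interests A..E.
B = {1: {"A", "B", "C"}, 2: {"A", "C", "D"}, 3: {"D", "E"}, 4: {"A", "B", "C", "D"}}
print(sorted(sorted(e) for e in alg_intersect(B, [1], 2)))  # [[1, 2], [1, 4]]
```

Only the marked set and the output are stored beyond the input itself, which matches the O(m) space noted on the slide (plus the size of the output).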

Page 26

Alg-Tuples

• Consider k=2.

• Suppose user 1 has interests {A, B, C} and user 2 has interests {A, C, D}

• Create “virtual nodes”, one for each set of k interests

• Connect user 1 to {AB}, {AC}, {BC}

• Connect user 2 to {AC}, {AD}, {CD}

• There is an edge between user 1 and user 2 in G_k iff there is a virtual node that both are connected to.
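The slide's k = 2 example can be checked in a few lines of Python. Here each virtual node is represented as a sorted tuple of k interests, which is an illustrative choice. The helper name `virtual_nodes` is hypothetical. Two users share at least k interests exactly when their sets of virtual nodes intersect.

```python
from itertools import combinations


def virtual_nodes(interests, k):
    # One virtual node per k-subset of the user's interests, as a sorted tuple.
    return set(combinations(sorted(interests), k))


user1 = {"A", "B", "C"}
user2 = {"A", "C", "D"}
print(sorted(virtual_nodes(user1, 2)))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(sorted(virtual_nodes(user2, 2)))  # [('A', 'C'), ('A', 'D'), ('C', 'D')]
print(virtual_nodes(user1, 2) & virtual_nodes(user2, 2))  # {('A', 'C')}
```

The shared virtual node {AC} is what places the edge between user 1 and user 2 in G_2.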

Page 27

Alg-Tuples

• For each node u,

– Create virtual nodes for u (if not already created)

– Connect u to those virtual nodes

• // (note: there are $O(\deg(u)^k)$ of them)

• Figure out the connectivity of G_k using the virtual graph

• Runtime $O(\sum_u \deg(u)^k)$

– Uses a union-find (disjoint-set) structure

– Edges are never explicitly computed

Space $O(\sum_u \deg(u)^k)$
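One way to realize Alg-Tuples with a union-find structure is sketched below. This is an assumption about the details, not the authors' code. The sketch stores only the first user attached to each virtual node and unions every later user with that user. As a result, the edges of G_k are never listed, and the work is proportional to the number of k-subsets, $\sum_u \binom{\deg(u)}{k} = O(\sum_u \deg(u)^k)$. The names `UnionFind` and `alg_tuples` are hypothetical. Interests are assumed to be mutually comparable (for example, strings) so that they can be sorted into canonical tuples.

```python
from itertools import combinations


class UnionFind:
    """Disjoint-set forest with union by size and path halving."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger
        self.size[ra] += self.size[rb]


def alg_tuples(interests, k):
    """Union all users that share some k-subset of interests; return the forest."""
    uf = UnionFind(interests)
    owner = {}  # virtual node (sorted k-tuple) -> first user connected to it
    for u, its in interests.items():
        for t in combinations(sorted(its), k):
            if t in owner:
                uf.union(owner[t], u)  # u and owner[t] share these k interests
            else:
                owner[t] = u
    return uf


# Hypothetical toy graph: users 1..4, interests A..E.
B = {1: {"A", "B", "C"}, 2: {"A", "C", "D"}, 3: {"D", "E"}, 4: {"A", "B", "C", "D"}}
uf = alg_tuples(B, 2)
print(uf.find(1) == uf.find(2) == uf.find(4), uf.find(3) == uf.find(1))  # True False
```

The `owner` dict plays the role of the virtual nodes. Its size is bounded by the number of k-subsets, which is why the space matches the runtime bound.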

Page 28

Combining them

• Run Alg-Intersect for some subset S of nodes

– We know all edges in G_k that go from u ∈ S to any node v

– Runtime O(|S|m)

[Figure: S (high-degree nodes) vs. other nodes]

Page 29

Combining them

• Run Alg-Intersect for some subset S of nodes

– We know all edges in G_k that go from u ∈ S to any node v

– Runtime O(|S|m)

• Run Alg-Tuples on the rest of the nodes

– We “know” all edges in G_k that go from u ∉ S to v ∉ S

– Runtime $O(\sum_{u \notin S} \deg(u)^k)$

[Figure: S vs. other nodes]

Page 30

Finding S

• Order $u_1, u_2, \ldots$ by decreasing $\deg(u_i)$

• Initialize b = 1. Increase b until $\sum_{i \ge b} \deg(u_i)^k \le bm$

• Let $S = \{u_1, u_2, \ldots, u_b\}$

• Run Alg-Intersect on nodes in S

• Run Alg-Tuples on nodes not in S

– Combine the two (union their components)

• Runtime is $O(bm) + O(\sum_{i \ge b} \deg(u_i)^k) \le O(2bm) = O(bm)$

[Figure: high-degree nodes form S]
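The combined procedure can be sketched end to end in Python. The sketch assumes dict-of-sets input, a 1-indexed reading of the stopping rule on the slide, and isolated users counted as singleton components. It repeats compact versions of both subroutines so that it runs on its own. `find_split` and `knc_components` are hypothetical names, and this is an illustration of the idea rather than the authors' implementation.

```python
from itertools import combinations


def find_split(interests, k):
    """Return S = the b highest-degree users, where b is the first value with
    sum_{i >= b} deg(u_i)^k <= b * m (users u_1, u_2, ... in decreasing degree)."""
    order = sorted(interests, key=lambda u: len(interests[u]), reverse=True)
    m = sum(len(interests[u]) for u in order)  # number of edges in B
    # suffix[j] = sum of deg^k over order[j:], i.e. over u_{j+1}, u_{j+2}, ...
    suffix = [0] * (len(order) + 1)
    for j in range(len(order) - 1, -1, -1):
        suffix[j] = suffix[j + 1] + len(interests[order[j]]) ** k
    b = 1
    while b < len(order) and suffix[b - 1] > b * m:
        b += 1
    return order[:b]


def knc_components(interests, k):
    """(number of components, largest component) of G_k via S / not-S split."""
    S = find_split(interests, k)
    in_s = set(S)
    parent = {u: u for u in interests}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Alg-Intersect on S: every G_k edge with an endpoint in S, O(|S| m).
    for u in S:
        marked = interests[u]
        for v, nv in interests.items():
            if v != u and sum(1 for x in nv if x in marked) >= k:
                union(u, v)

    # Alg-Tuples on users outside S: edges with both endpoints outside S.
    owner = {}
    for u, its in interests.items():
        if u in in_s:
            continue
        for t in combinations(sorted(its), k):
            if t in owner:
                union(owner[t], u)
            else:
                owner[t] = u

    sizes = {}
    for u in interests:
        r = find(u)
        sizes[r] = sizes.get(r, 0) + 1
    return len(sizes), max(sizes.values())


# Hypothetical toy graph: users 1..4, interests A..E.
B = {1: {"A", "B", "C"}, 2: {"A", "C", "D"}, 3: {"D", "E"}, 4: {"A", "B", "C", "D"}}
print(find_split(B, 2), knc_components(B, 2))  # [4, 1] (2, 3)
```

Because both passes feed the same union-find forest, no explicit edge list of G_k is ever built. On the toy graph, the result agrees with the naive pairwise check.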

Page 31

Combining them

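• Runtime is $O(bm) + O(\sum_{i \ge b} \deg(u_i)^k)$

• But, for any graph, $\deg(u_i) \le m/i$ (by Markov: the i highest-degree users have total degree at most m)

– Do not need power-law

• Hence, $bm \approx \sum_{i \ge b} \deg(u_i)^k \le \sum_{i \ge b} m^k/i^k = O(m^k/b^{k-1})$ (for k ≥ 2)

• So $b = O(m^{1-1/k})$, and the runtime is $O(m^{2-1/k})$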
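The bound can be spelled out step by step. The derivation below assumes k ≥ 2 and b ≥ 2, and uses the fact that b is the first value satisfying the stopping condition, so the condition still fails at b − 1. The tail sum is bounded by comparison with an integral.

```latex
\begin{aligned}
i\,\deg(u_i) &\le \sum_{j \le i} \deg(u_j) \le m
  \quad\Longrightarrow\quad \deg(u_i) \le \frac{m}{i},\\
\sum_{i \ge c} \deg(u_i)^k &\le m^k \sum_{i \ge c} i^{-k}
  \le m^k\Bigl(c^{-k} + \int_{c}^{\infty} x^{-k}\,dx\Bigr)
  = m^k\Bigl(c^{-k} + \frac{c^{1-k}}{k-1}\Bigr)
  = O\Bigl(\frac{m^k}{c^{k-1}}\Bigr),\\
(b-1)\,m &< \sum_{i \ge b-1} \deg(u_i)^k = O\Bigl(\frac{m^k}{(b-1)^{k-1}}\Bigr)
  \quad\Longrightarrow\quad (b-1)^k = O(m^{k-1})
  \quad\Longrightarrow\quad b = O(m^{1-1/k}),\\
O(bm) &= O(m^{2-1/k}).
\end{aligned}
```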

Page 32

Extensions

• Power-law degree distributions are provably faster

– $O(m^{1+(1-1/k)/\alpha})$ for a power law with exponent $\alpha$

– Algorithm works exactly the same

– No need to know ahead of time whether the distribution is power-law

• When the set of interests is logarithmic, quasi-linear-time algorithms are possible

– Different algorithm

– In the paper

Page 33

Conclusion

• The KNC-plot is a useful tool

– Exposes how meaningful shared interests are

• The k-neighborhood graph is defined implicitly

– Efficient algorithm for the implicit graph

– Other algorithms for G_k, given the bipartite representation

• Find additional social graph properties that are meaningful and computable

– Describe the macroscopic structure of social networks