Weighted Graphs and Disconnected Components Patterns and a Generator Mary McGlohon, Leman Akoglu, Christos Faloutsos Carnegie Mellon University School of Computer Science

Jan 06, 2016

Page 1: Weighted Graphs and Disconnected Components Patterns and a Generator

Weighted Graphs and Disconnected Components

Patterns and a Generator

Mary McGlohon, Leman Akoglu, Christos Faloutsos

Carnegie Mellon University

School of Computer Science

Page 2: Weighted Graphs and Disconnected Components Patterns and a Generator

McGlohon, Akoglu, Faloutsos KDD08

Page 3: Weighted Graphs and Disconnected Components Patterns and a Generator

“Disconnected” components

● In graphs a largest connected component emerges.

● What about the smaller-size components?

● How do they emerge, and join with the large one?

Page 4: Weighted Graphs and Disconnected Components Patterns and a Generator

Weighted edges

● Graphs have heavy-tailed degree distribution.

● What can we also say about these edges?

● How are they repeated, or otherwise weighted?

Page 5: Weighted Graphs and Disconnected Components Patterns and a Generator

Our goals

● Observe “next-largest connected components” (NLCC’s)

Q1: How does the giant connected component (GCC) emerge?

Q2: How do NLCC’s emerge and join with the GCC?

● Find properties that govern edge weights

Q3: How does the total weight of the graph relate to the number of edges?

Q4: How do the weights of nodes relate to degree?

Q5: Does this relation change with the graph?

● Q6: Can we produce an emergent, generative model?

Page 6: Weighted Graphs and Disconnected Components Patterns and a Generator

Outline

● Motivation

● Related work

● Preliminaries

● Data

● Observations

● Model

● Summary

Page 7: Weighted Graphs and Disconnected Components Patterns and a Generator

Properties of networks

● Small diameter (“small world” phenomenon)

– [Milgram 67] [Leskovec, Horvitz 07]

● Heavy-tailed degree distribution

– [Barabasi, Albert 99] [Faloutsos, Faloutsos, Faloutsos 99]

● Densification

– [Leskovec, Kleinberg, Faloutsos 05]

● “Middle region” components as well as GCC and singletons

– [Kumar, Novak, Tomkins 06]

Page 8: Weighted Graphs and Disconnected Components Patterns and a Generator

Generative Models

● Erdos-Renyi model [Erdos, Renyi 60]

● Preferential Attachment [Barabasi, Albert 99]

● Forest Fire model [Leskovec, Kleinberg, Faloutsos 05]

● Kronecker multiplication [Leskovec, Chakrabarti, Kleinberg, Faloutsos 07]

● Edge Copying model [Kumar, Raghavan, Rajagopalan, Sivakumar, Tomkins, Upfal 00]

● “Winners don’t take all” [Pennock, Flake, Lawrence, Glover, Giles 02]

Page 12: Weighted Graphs and Disconnected Components Patterns and a Generator

Diameter

● Diameter of a graph is the “longest shortest path”: the largest hop count over all pairs of connected nodes.

● Effective diameter is the distance within which 90% of connected node pairs can reach each other.

[Figure: example graph on nodes n1 to n7, diameter = 3]
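
The two definitions above can be sketched in Python as follows. This is only an illustration under stated assumptions: the 7-node edge list is made up (it is not the graph drawn on the slide), the helper names are hypothetical, and the effective diameter is taken as the smallest hop count that covers at least 90% of connected node pairs, without the interpolation some papers apply.

```python
import math
from collections import deque

def build_adjacency(edges):
    """Turn an undirected edge list into {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def hop_counts(adj, source):
    """Breadth-first search: hop distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameters(adj, fraction=0.9):
    """Return (diameter, effective diameter) over connected node pairs.

    Assumes the graph has at least one edge and comparable node labels.
    """
    lengths = []
    for s in adj:
        for t, d in hop_counts(adj, s).items():
            if s < t:                      # count each unordered pair once
                lengths.append(d)
    lengths.sort()
    diameter = lengths[-1]                 # the "longest shortest path"
    # Smallest hop count d such that at least `fraction` of pairs are within d.
    k = math.ceil(fraction * len(lengths)) - 1
    return diameter, lengths[k]

# Hypothetical example graph (a triangle attached to a small star).
example_edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (4, 7)]
example_adj = build_adjacency(example_edges)
```

Running one BFS per node costs O(N·E) overall, which is fine for a toy graph; graphs of the sizes listed in the Data section would typically need sampling or an approximation instead.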

Page 16: Weighted Graphs and Disconnected Components Patterns and a Generator

Unipartite Networks

● Postnet: Posts in blogs, hyperlinks between them

● Blognet: Aggregated Postnet, repeated edges

● Patent: Patent citations

● NIPS: Academic citations

● Arxiv: Academic citations

● NetTraffic: Packets, repeated edges

● Autonomous Systems (AS): Packets, repeated edges

[Figure: example graph on nodes n1 to n7 with weighted edges (weights 10, 1.2, 8.3, 2, 6, 1)]

Page 17: Weighted Graphs and Disconnected Components Patterns and a Generator

Unipartite Networks

● (Nodes, Edges, Timestamps)

● Postnet: 250K, 218K, 80 days

● Blognet: 60K, 125K, 80 days

● Patent: 4M, 8M, 17 yrs

● NIPS: 2K, 3K, 13 yrs

● Arxiv: 30K, 60K, 13 yrs

● NetTraffic: 21K, 3M, 52 mo

● AS: 12K, 38K, 6 mo

Page 20: Weighted Graphs and Disconnected Components Patterns and a Generator

Bipartite Networks

● IMDB: Actor-movie network

● Netflix: User-movie ratings

● DBLP: repeated edges

– Author-Keyword

– Keyword-Conference

– Author-Conference

● US Election Donations: $ weights, repeated edges

– Orgs-Candidates

– Individuals-Orgs

[Figure: example bipartite graph on nodes n1 to n4 and m1 to m3 with weighted edges (weights 10, 1.2, 2, 1, 5, 6)]

Page 21: Weighted Graphs and Disconnected Components Patterns and a Generator

Bipartite Networks

● IMDB: 757K, 2M, 114 yr

● Netflix: 125K, 14M, 72 mo

● DBLP: 25 yr

– Author-Keyword: 27K, 189K

– Keyword-Conference: 10K, 23K

– Author-Conference: 17K, 22K

● US Election Donations: 22 yr

– Orgs-Candidates: 23K, 877K

– Individuals-Orgs: 6M, 10M

Page 23: Weighted Graphs and Disconnected Components Patterns and a Generator

Observation 1: Gelling Point

Q1: How does the GCC emerge?

Page 24: Weighted Graphs and Disconnected Components Patterns and a Generator

Observation 1: Gelling Point

● Most real graphs display a gelling point, or burning off period.

● After the gelling point, they exhibit typical behavior. The gelling point itself is marked by a spike in diameter.

[Figure: IMDB, diameter vs. time, gelling point at t=1914]

Page 25: Weighted Graphs and Disconnected Components Patterns and a Generator

Observation 2: NLCC behavior

Q2: How do NLCC’s emerge and join with the GCC?

Do they continue to grow in size? Do they shrink? Stabilize?

Page 26: Weighted Graphs and Disconnected Components Patterns and a Generator

Observation 2: NLCC behavior

● After the gelling point, the GCC takes off, but NLCC’s remain constant or oscillate.

[Figure: IMDB, CC size vs. time]
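
One way to reproduce a "CC size vs. time" curve like this is to replay the timestamped edges through a union-find structure and record the largest component sizes after each edge. The sketch below is an assumption about how such a measurement could be made, not the authors' code; the class, the function, and the tiny edge stream are all hypothetical.

```python
import heapq

class DisjointSet:
    """Union-find with union by size; `size` holds one entry per component root."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:           # first time we see this node
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:         # path halving
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra               # attach the smaller tree to the larger
        self.size[ra] += self.size.pop(rb)

def component_history(edge_stream, top=3):
    """For each (time, u, v) edge, record (time, GCC size, 2nd CC, 3rd CC, ...)."""
    ds = DisjointSet()
    history = []
    for time, u, v in edge_stream:
        ds.union(u, v)
        sizes = heapq.nlargest(top, ds.size.values())
        sizes += [0] * (top - len(sizes))  # pad when fewer than `top` components
        history.append((time, *sizes))
    return history

# Hypothetical edge stream, sorted by time.
example_stream = [(1, "a", "b"), (2, "c", "d"), (3, "e", "f"),
                  (4, "b", "c"), (5, "g", "h"), (6, "d", "e")]
example_history = component_history(example_stream)
```

In this toy stream the first column after the timestamp grows as components merge, while the second stays at size 2, which is the qualitative pattern the slide describes. Scanning all component sizes after every edge is wasteful on large graphs; recording once per timestamp would be a natural refinement.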

Page 28: Weighted Graphs and Disconnected Components Patterns and a Generator

Observation 3

Q3: How does the total weight of the graph relate to the number of edges?

Page 29: Weighted Graphs and Disconnected Components Patterns and a Generator

Observation 3: Fortification Effect

● $ = # checks?

[Figure: Orgs-Candidates, |$| vs. |Checks|, snapshots from 1980 to 2004]

Page 30: Weighted Graphs and Disconnected Components Patterns and a Generator

Observation 3: Fortification Effect

● Weight additions follow a power law with respect to the number of edges: W(t) ∝ E(t)^w

– W(t): total weight of graph at t

– E(t): total edges of graph at t

– w is PL exponent

– 1.01 < w < 1.5 = super-linear!

– (more checks, even more $)

[Figure: Orgs-Candidates, |$| vs. |Checks|, snapshots from 1980 to 2004]
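
A power-law exponent such as w is commonly estimated as the slope of a least-squares line in log-log scale. The sketch below assumes that approach; the slides do not say how the fit was done, and the (E, W) snapshots are synthetic values generated with exponent 1.2, not data from any of the datasets.

```python
import math

def power_law_exponent(xs, ys):
    """Least-squares slope of log(y) against log(x); all values must be positive."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    s_xx = sum((a - mean_x) ** 2 for a in log_x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    return s_xy / s_xx

# Synthetic snapshots: E(t) edge counts and W(t) = E(t)^1.2 (made-up data).
edge_counts = [10, 100, 1000, 10000]
total_weights = [e ** 1.2 for e in edge_counts]
w_estimate = power_law_exponent(edge_counts, total_weights)
```

An estimate above 1 corresponds to the super-linear regime on the slide: doubling the number of edges more than doubles the total weight.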

Page 31: Weighted Graphs and Disconnected Components Patterns and a Generator

Observation 4 and 5

Q4: How do the weights of nodes relate to degree?

Q5: Does this relation change over time?

Page 32: Weighted Graphs and Disconnected Components Patterns and a Generator

Observation 4: Snapshot Power Law

● At any time, the total incoming weight of a node follows a power law in its in-degree, with PL exponent iw. 1.01 < iw < 1.26, super-linear.

● More donors, even more $

[Figure: Orgs-Candidates, in-weights ($) vs. edges (# donors); e.g. John Kerry, $10M received from 1K donors]
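
The points in a plot like this are one (in-degree, in-weight) pair per node. A minimal sketch of that aggregation is shown below, assuming a weighted edge list of (source, target, weight) records and counting in-degree as the number of distinct sources, so repeated edges add weight but not degree. The record layout, the function name, and the donation records are hypothetical.

```python
from collections import defaultdict

def in_degree_and_weight(edges):
    """Map each target node to (number of distinct sources, total incoming weight)."""
    sources = defaultdict(set)
    weight = defaultdict(float)
    for src, dst, w in edges:
        sources[dst].add(src)              # repeated edges do not raise the degree
        weight[dst] += w                   # but every edge adds to the weight
    return {node: (len(sources[node]), weight[node]) for node in sources}

# Hypothetical donation records: (donor, candidate, dollars).
example_donations = [
    ("donor_1", "cand_A", 100.0),
    ("donor_2", "cand_A", 50.0),
    ("donor_1", "cand_A", 25.0),           # a second check from the same donor
    ("donor_3", "cand_B", 10.0),
]
snapshot = in_degree_and_weight(example_donations)
```

Taking logarithms of both coordinates and fitting a line to the resulting points would then give an estimate of iw for that snapshot.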

Page 33: Weighted Graphs and Disconnected Components Patterns and a Generator

Observation 5: Snapshot Power Law

● For a given graph, this exponent is constant over time.

[Figure: Orgs-Candidates, exponent vs. time]

Page 34: Weighted Graphs and Disconnected Components Patterns and a Generator

Outline

● Motivation

● Related work

● Preliminaries

● Data

● Observations

● Q6: Is there a generative, “emergent” model?

● Summary

Page 36: Weighted Graphs and Disconnected Components Patterns and a Generator

Goals of model

● a) Emergent, intuitive behavior

● b) Shrinking diameter

● c) Constant NLCC’s

● d) Densification power law

● e) Power-law degree distribution

= “Butterfly” Model

Page 37: Weighted Graphs and Disconnected Components Patterns and a Generator

Butterfly model in action

● A node joins a network, with its own parameter pstep (“curiosity”).

● With (global) probability phost (“cross-disciplinarity”), it chooses a random host.

– With (global) probability plink (“friendliness”), it creates a link to the host.

– With probability pstep, it travels to a random neighbor. Repeat.

[Figure: animation on example nodes n1 to n8, labeling each move with phost, plink or pstep]

Page 43: Weighted Graphs and Disconnected Components Patterns and a Generator

Butterfly model in action

● Once there are no more “steps”, repeat the “host” procedure:

– With phost, choose a new host, possibly link, etc.

– Until no more steps, and no more hosts.

[Figure: animation on example nodes n1 to n8, labeling each move with phost, plink or pstep]
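
The procedure walked through above can be sketched as a short simulation. This is one reading of the slides, not the authors' implementation: the function name, the single-node starting graph, the rule that a link is attempted at most once per visited node, and the `max_steps` safety cap are all assumptions. The default values phost = 0.3, plink = 0.5 and pstep ~ U(0,1) are the ones listed on the validation slide.

```python
import random

def butterfly_graph(n_nodes, p_host=0.3, p_link=0.5, max_steps=1000, seed=0):
    """Grow an undirected graph node by node; returns {node: set(neighbors)}."""
    rng = random.Random(seed)
    adj = {0: set()}                       # assumption: start from one isolated node
    for new in range(1, n_nodes):
        p_step = rng.random()              # per-node parameter, pstep ~ U(0, 1)
        adj[new] = set()
        tried = set()                      # nodes where a link was already attempted
        # "Host" procedure: keep choosing hosts while the phost coin succeeds.
        while rng.random() < p_host:
            current = rng.randrange(new)   # uniformly random existing node
            for _ in range(max_steps):     # safety cap, not part of the slides
                if current not in tried:
                    tried.add(current)
                    if rng.random() < p_link:
                        adj[new].add(current)
                        adj[current].add(new)
                if rng.random() >= p_step:
                    break                  # no more steps from this host
                neighbors = sorted(v for v in adj[current] if v != new)
                if not neighbors:
                    break                  # nowhere to travel
                current = rng.choice(neighbors)
    return adj

example_graph = butterfly_graph(300, seed=1)
```

A node that never gets a host, or visits hosts without linking, stays isolated and starts a new component; a node that links from several hosts can merge components. Feeding the edges of a larger run into a component tracker would be one way to look for the gelling behavior described in the validation slides.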

Page 47: Weighted Graphs and Disconnected Components Patterns and a Generator

a) Emergent, intuitive behavior

Novelties of model:

● Nodes link with probability

– May choose host, but not link (start new component)

● Incoming nodes are “social butterflies”

– May have several hosts (merges components)

● Some nodes are friendlier than others

– pstep different for each node

– This creates power-law degree distribution (theorem)

Page 48: Weighted Graphs and Disconnected Components Patterns and a Generator

Validation of Butterfly

● Chose following parameters:

– phost = 0.3

– plink = 0.5

– pstep ~ U(0,1)

● Ran 10 simulations

● 100,000 nodes per simulation

Page 49: Weighted Graphs and Disconnected Components Patterns and a Generator

b) Shrinking diameter

● Shrinking diameter

– In model, gelling usually occurred around N=20,000

[Figure: diameter vs. nodes, gelling point marked at N=20,000]

Page 50: Weighted Graphs and Disconnected Components Patterns and a Generator

c) Oscillating NLCC’s

● Constant / oscillating NLCC’s

[Figure: NLCC size vs. nodes, N=20,000 marked]

Page 51: Weighted Graphs and Disconnected Components Patterns and a Generator

d) Densification power law

● Densification:

– Our datasets had a = (1.03, 1.7)

– In [Leskovec+05-KDD], a = (1.1, 1.7)

– Simulation produced a = (1.1, 1.2)

[Figure: edges vs. nodes, N=20,000 marked]

Page 52: Weighted Graphs and Disconnected Components Patterns and a Generator

e) Power-law degree distribution

● Power-law degree distribution

– Exponents approx -2

[Figure: count vs. degree]

Page 53: Weighted Graphs and Disconnected Components Patterns and a Generator

Summary

● Studied several diverse public graphs

– Measured at many timestamps

– Unipartite and bipartite

– Blogs, citations, real-world, network traffic

– Largest was 6 million nodes, 10 million edges

Page 54: Weighted Graphs and Disconnected Components Patterns and a Generator

Summary

● Observations on unweighted graphs:

A1: The GCC emerges at the “gelling point”

A2: NLCC’s are of constant / oscillating size

● Observations on weighted graphs:

A3: Total weight increases super-linearly with edges

A4: Node’s weights increase super-linearly with degree, power law exponent iw

A5: iw remains constant over time

● A6: Intuitive, emergent generative “butterfly” model, that matches properties

Page 55: Weighted Graphs and Disconnected Components Patterns and a Generator

References

[Barabasi+99] Barabasi, A. L. & Albert, R. (1999), 'Emergence of scaling in random networks', Science 286(5439), 509-512.

[Erdos+60] Erdos, P. & Renyi, A. (1960), 'On the evolution of random graphs', Publ. Math. Inst. Hungar. Acad. Sci. 5, 17-61.

[Faloutsos+99] Faloutsos, M.; Faloutsos, P. & Faloutsos, C. (1999), 'On Power-law Relationships of the Internet Topology', SIGCOMM, 251-262.

[Kumar+00] Kumar, R.; Raghavan, P.; Rajagopalan, S.; Sivakumar, D.; Tomkins, A. & Upfal, E. (2000), 'Stochastic models for the Web graph', Proceedings of the 41st FOCS, pp. 57-65.

[Kumar+06] Kumar, R.; Novak, J. & Tomkins, A. (2006), 'Structure and evolution of online social networks', in 'KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining', pp. 611-617.

[Leskovec+05KDD] Leskovec, J.; Kleinberg, J. & Faloutsos, C. (2005), 'Graphs over time: densification laws, shrinking diameters and possible explanations', in 'KDD '05'.

[Leskovec+07] Leskovec, J. & Faloutsos, C. (2007), 'Scalable modeling of real graphs using Kronecker Multiplication', ICML.

[Milgram+67] Milgram, S. (1967), 'The small-world problem', Psychology Today 2, 60-67.

[Pennock+02] Pennock, Flake, Lawrence, Glover & Giles (2002), 'Winners don’t take all: Characterizing the competition for links on the web', PNAS.

[Wang+2002] Wang, M.; Madhyastha, T.; Chang, N. H.; Papadimitriou, S. & Faloutsos, C. (2002), 'Data Mining Meets Performance Evaluation: Fast Algorithms for Modeling Bursty Traffic', ICDE.

Page 56: Weighted Graphs and Disconnected Components Patterns and a Generator

Contact us

Leman Akoglu

www.andrew.cmu.edu/~lakoglu

[email protected]

Christos Faloutsos

www.cs.cmu.edu/~christos

[email protected]

Mary McGlohon

www.cs.cmu.edu/~mmcgloho

[email protected]

Page 57: Weighted Graphs and Disconnected Components Patterns and a Generator

Entropy plots [Wang+2002]

● From time series data, begin with resolution r = T/2.

● Record entropy H_R.

● Recursively take finer resolutions.

[Figure: weights vs. time (input series) and entropy vs. resolution (resulting plot)]
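
The recursion above can be sketched as follows. The sketch assumes the series length T is a power of two, that level k splits the time axis into 2^k equal intervals (so level 1 corresponds to r = T/2), and that entropy is measured in bits over each interval's share of the total weight. The function names are hypothetical and the details may differ from [Wang+2002].

```python
import math

def entropy_plot(weights):
    """Return [(level, entropy)] with 2**level equal-width time bins per level."""
    n = len(weights)
    if n < 2 or n & (n - 1):
        raise ValueError("series length must be a power of two")
    total = float(sum(weights))
    points = []
    for level in range(1, n.bit_length()):     # level 1 has bins of width T/2
        bins = 2 ** level
        width = n // bins
        h = 0.0
        for b in range(bins):
            mass = sum(weights[b * width:(b + 1) * width])
            if mass > 0:
                p = mass / total               # share of total weight in this bin
                h -= p * math.log2(p)
        points.append((level, h))
    return points

def plot_slope(points):
    """Least-squares slope of entropy against level (needs at least two levels)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return s_xy / s_xx

uniform_series = [1.0] * 16                    # weight spread evenly over time
point_mass_series = [0.0] * 16
point_mass_series[5] = 3.0                     # all weight in a single tick
```

For the uniform series every level adds exactly one bit of entropy, and for the point mass the entropy stays at zero, so the fitted slopes sit at the two extremes between which a bursty series would fall.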

Page 61: Weighted Graphs and Disconnected Components Patterns and a Generator

Entropy Plots

● Self-similarity gives a linear plot

● Uniform: slope of plot s=1. Point mass: s=0

● Bursty: 0.2 < s < 0.9

[Figure: entropy vs. resolution, linear with slope s = 0.59; small weight-vs-time sketches illustrate the uniform and point-mass cases]