Transcript

Graphs, Edges & Nodes

Untangling the social web.

Wednesday, March 9, 2011

What’s a graph?

Graph

[Figure: an example graph with numeric labels, built up over several slides]

Simple

At most one edge between any pair of nodes.

Multigraph

Multiple edges between vertices allowed.

Pseudograph

Self-loops are permitted.
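
The three definitions above can be checked mechanically from an edge list. The sketch below is only an illustration: the function name `classify` and the choice to report the most restrictive class that admits the graph are assumptions, not something the slides prescribe.

```python
from collections import Counter

def classify(edges):
    """Classify an undirected edge list as 'simple', 'multigraph' or 'pseudograph'.

    Assumption: we report the most restrictive class that still admits the graph.
    """
    has_loop = any(u == v for u, v in edges)
    # Undirected: (u, v) and (v, u) are the same edge, so normalise each pair.
    counts = Counter(frozenset((u, v)) for u, v in edges if u != v)
    has_parallel = any(n > 1 for n in counts.values())
    if has_loop:
        return "pseudograph"   # self-loops present
    if has_parallel:
        return "multigraph"    # repeated edge between the same two nodes
    return "simple"            # at most one edge per pair, no loops
```

For example, `classify([(1, 2), (2, 1)])` sees the same undirected edge twice and reports a multigraph.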

G = (V, E)

What’s a node?

vertex, point, junction, 0-simplex

What’s an edge?

arc, branch, line, link, 1-simplex

Directed

Undirected

Data Structures

[Figure: graph with four vertices labeled 1, 2, 3 and 4]

(Finite simple graph)

Adjacency Matrix (2D array)

      1  2  3  4
  1   0  1  1  1
  2   1  0  0  0
  3   1  0  0  1
  4   1  0  1  0

(Rows and columns are both indexed by vertices.)

[1, 2, 3, 4]

  1 -> 2 -> 3 -> 4
  2 -> 1
  3 -> 1 -> 4
  4 -> 1 -> 3

Array entries (vertices) point to singly linked lists.
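
The two representations can be compared directly on the four-vertex graph from the slides. This is a minimal sketch: the helper names are hypothetical, and plain Python lists stand in for the singly linked lists.

```python
# The 4-vertex graph from the slides; row/column i corresponds to vertex i + 1.
MATRIX = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
]

def to_adjacency_list(matrix):
    """Return {vertex: [neighbours]} using 1-based vertex labels."""
    return {
        i + 1: [j + 1 for j, cell in enumerate(row) if cell]
        for i, row in enumerate(matrix)
    }

def has_edge_matrix(matrix, u, v):
    # Constant-time lookup, at the cost of V * V cells of storage.
    return matrix[u - 1][v - 1] == 1

def has_edge_list(adj, u, v):
    # Lookup scans the neighbours of u; storage grows with V + E.
    return v in adj[u]

ADJ = to_adjacency_list(MATRIX)
```

The matrix suits dense graphs and fast edge tests; the list suits sparse graphs such as most social networks, where each person is connected to a tiny fraction of all users.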

Visualizations

You are here.

(Graph does not include Justin Bieber)

Social Graphs

User-based item recommendations

Recommend items to me that are popular amongst my friends.

[Figure: diagram built up over several slides, with People (me, friends) on one side and Items on the other]

2-step path on a homogeneous bipartite graph.
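
A rough sketch of that two-step walk (me to friends, friends to items) follows. The function name, the sample people and items, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter

def recommend(me, friends, likes, top_n=3):
    """Rank items liked by my friends that I do not already have.

    friends: {person: set of people}; likes: {person: set of items}.
    """
    mine = likes.get(me, set())
    counts = Counter()
    for friend in friends.get(me, set()):      # step 1: me -> friend
        for item in likes.get(friend, set()):  # step 2: friend -> item
            if item not in mine:
                counts[item] += 1
    # Most popular first; ties broken alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical sample data.
friends = {"me": {"alice", "bob", "carol"}}
likes = {
    "me": {"book"},
    "alice": {"book", "film", "album"},
    "bob": {"film", "game"},
    "carol": {"film", "album"},
}
```

Here `recommend("me", friends, likes)` ranks "film" first, since all three friends like it, and leaves out "book", which I already have.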

Strong Connection Problem (SCP)

There are many of these ‘fundamental’ graph units:

- tripartite graphs (user/asset/tag)
- folksonomies
- multicolor-multiparity graph
- etc.

Graph Storage Engines

Neo4j

“An embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables.”

http://neo4j.org

HyperGraphDB

“A general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. It is a graph database designed specifically for artificial intelligence and semantic web projects.”

http://kobrix.org/hgdb.jsp

Special Purpose Storage Engines

FlockDB

“FlockDB is a database that stores graph data, but it isn't a database optimized for graph-traversal operations. Instead, it's optimized for very large adjacency lists, fast reads and writes, and page-able set arithmetic queries.”

http://engineering.twitter.com/2010/05/introducing-flockdb.html

Redis

“Redis is an advanced key-value store. [...] the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists, sets, and ordered sets. All this data types can be manipulated with atomic operations to push/pop elements, add/remove elements, perform server side union, intersection, difference between sets, etc.”

http://code.google.com/p/redis

A Redis Friends/Followers Example

Redis makes you think in terms of data structures, and operations on those structures.

Set: Finite (for our cases) collection of objects in which order has no significance and multiplicity is generally ignored.

S = { Alice, Bob, Carol }

List: Finite (for our cases) collection of objects in which order *is* significant and multiplicity is allowed.

L = [ X, Y, X, Z, Q ]

SET uid:1000:username jperras

Store a user's name under a key (SET writes a plain string value, not a member of a Redis set)

Command Key Value

Use sets for denoting my followers/people I follow.

SADD uid:1000:following 1001
SADD uid:1001:followers 1000

Adding a new follower

Command Key Value
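
One directed "follows" edge is therefore stored twice, once from each endpoint. The sketch below mimics that with an in-memory dictionary of sets so it runs without a Redis server; `store`, `sadd` and `follow` are hypothetical stand-ins, not a Redis client API.

```python
from collections import defaultdict

# In-memory stand-in for Redis: key -> set of members (illustration only).
store = defaultdict(set)

def sadd(key, member):
    """Mimic SADD: return 1 if the member was new, 0 if already present."""
    if member in store[key]:
        return 0
    store[key].add(member)
    return 1

def follow(follower_id, followee_id):
    """Record one directed edge as two set insertions, one per endpoint."""
    sadd(f"uid:{follower_id}:following", followee_id)
    sadd(f"uid:{followee_id}:followers", follower_id)

follow(1000, 1001)  # user 1000 starts following user 1001
```

Keeping both sets means "who do I follow?" and "who follows me?" are each a single key lookup, at the cost of writing every edge twice.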

Posting Updates

$r = Redis();
$postid = $r->incr("global:nextPostId");
$post = $User['id'] . "|" . time() . "|" . $status;
$r->set("post:$postid", $post);

$followers = $r->smembers("uid:" . $User['id'] . ":followers");
if ($followers === false) $followers = Array();
$followers[] = $User['id']; /* Add the post to our own posts too */

foreach ($followers as $fid) {
    $r->push("uid:$fid:posts", $postid, false);
}

# Push the post on the timeline, and trim the timeline to the
# newest 1000 elements (LTRIM bounds are inclusive, hence 0..999).
$r->push("global:timeline", $postid, false);
$r->ltrim("global:timeline", 0, 999);
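
The same fan-out-on-write idea can be sketched without a Redis server. Everything named below is a hypothetical in-memory stand-in, and inserting at the head of each list is an assumption about what the `false` flag in the PHP client's `push` call does.

```python
from collections import defaultdict
from itertools import count

followers = defaultdict(set)   # user id -> ids of that user's followers
posts = {}                     # post id -> (author id, text)
timelines = defaultdict(list)  # user id -> post ids, newest first
global_timeline = []           # all post ids, newest first
_next_post_id = count(1)       # plays the role of INCR global:nextPostId

def post_update(user_id, text, keep=1000):
    """Store a post, then copy its id to every follower's timeline."""
    post_id = next(_next_post_id)
    posts[post_id] = (user_id, text)
    # Deliver to each follower and to the author's own timeline.
    for uid in followers[user_id] | {user_id}:
        timelines[uid].insert(0, post_id)
    # Keep only the newest `keep` ids on the global timeline.
    global_timeline.insert(0, post_id)
    del global_timeline[keep:]
    return post_id

followers[1000] = {1001, 1002}
first = post_update(1000, "hello graphs")
```

Writing costs one list insertion per follower, but reading a timeline becomes a single list read with no joins.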

Common followers? - Set intersections!

SINTER uid:1000:followers uid:1001:followers

Command Key 1 Key 2
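
In plain Python the same question is a set intersection. The data and the `sinter` helper below are hypothetical; the helper only imitates how SINTER treats a missing key as an empty set.

```python
# Hypothetical follower sets for two different users.
followers = {
    "uid:1000:followers": {1001, 1002, 1003},
    "uid:1001:followers": {1000, 1002, 1003, 1004},
}

def sinter(store, *keys):
    """Mimic SINTER: intersect the sets stored at the given keys.

    A missing key counts as an empty set, so the result is then empty.
    """
    sets = [store.get(key, set()) for key in keys]
    return set.intersection(*sets) if sets else set()

common = sinter(followers, "uid:1000:followers", "uid:1001:followers")
```

The intersection is only meaningful when the two keys belong to different users; intersecting a set with itself just returns that set.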

A MySQL Example

(simplified)

# Mutual Friends
select f.friend_id
from friends f
join friends m on m.user_id = f.friend_id and m.friend_id = f.user_id
where f.user_id = 1234;

# Following (for directed graphs)
select f.friend_id
from friends f
left join friends m on m.user_id = f.friend_id and m.friend_id = f.user_id
where f.user_id = 1234 and m.user_id is null;

# Followers (for directed graphs)
# Select from f: every column of m is null for the unmatched rows kept here.
select f.user_id
from friends f
left join friends m on m.user_id = f.friend_id and m.friend_id = f.user_id
where f.friend_id = 1234 and m.user_id is null;
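
To try the self-join idea on a few rows, the sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL. The two-column `friends` table and the sample rows are assumptions, and the followers query selects `f.user_id` because every column of `m` is NULL for the unmatched rows it keeps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (user_id INTEGER, friend_id INTEGER)")
conn.executemany(
    "INSERT INTO friends VALUES (?, ?)",
    [(1234, 1), (1, 1234),   # 1234 and 1 follow each other
     (1234, 2),              # 1234 follows 2, not reciprocated
     (3, 1234)],             # 3 follows 1234, not reciprocated
)

MUTUAL = """
    SELECT f.friend_id FROM friends f
    JOIN friends m ON m.user_id = f.friend_id AND m.friend_id = f.user_id
    WHERE f.user_id = ?
"""
FOLLOWING_ONLY = """
    SELECT f.friend_id FROM friends f
    LEFT JOIN friends m ON m.user_id = f.friend_id AND m.friend_id = f.user_id
    WHERE f.user_id = ? AND m.user_id IS NULL
"""
FOLLOWERS_ONLY = """
    SELECT f.user_id FROM friends f
    LEFT JOIN friends m ON m.user_id = f.friend_id AND m.friend_id = f.user_id
    WHERE f.friend_id = ? AND m.user_id IS NULL
"""

def ids(query, user_id):
    """Run one of the queries above and return the matching ids, sorted."""
    return sorted(row[0] for row in conn.execute(query, (user_id,)))
```

Each relationship is one hop. Anything deeper, such as friends of friends, needs another self-join per hop, which is where this approach starts to strain.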

Not too bad.

Relational databases can work for the simplest of cases, but are not always the best solution for many graph operations/algorithms.

Graphs and graph databases are only going to be more and more useful.

However, graph algorithms are hard.

So don’t write your own.

And make sure you use a persistent storage engine that is best suited for the type of queries you will be performing.

Resources

The Algorithm Design Manual, Steven S. Skiena

Programming Collective Intelligence, Toby Segaran

Introduction to Algorithms, Cormen, Leiserson, Rivest

@jperras

Photo Credits

Graph of the internet, circa 2003: http://www.duniacyber.com/freebies/education/what-is-internet-lookslike/ (built from partial troll of public servers using traceroute)

My real friends, for letting me use their Facebook profile images.

References

Large Scale Graph Algorithms (class lectures), Yuri Lifshits, Steklov Institute of Mathematics at St. Petersburg

http://mathworld.wolfram.com/Set.html

Programming Collective Intelligence, Toby Segaran

The Algorithm Design Manual, Steven S. Skiena
