QSX: Querying Social Graphs
Graph algorithms in MapReduce
MapReduce: an introduction
BFS for distance queries
PageRank
Keyword search
Subgraph isomorphism
Motivation for studying parallel algorithms
Graph queries are costly
• BFS for reachability: linear time O(|V| + |E|)
• kNN join: quadratic time
• Graph pattern matching by graph simulation: quadratic time
Worse still
• regular path queries (finding simple paths): NP-complete
• Graph pattern matching via subgraph isomorphism: NP-complete
Can we efficiently evaluate graph queries?
Furthermore, real-life graphs are typically big
Facebook: 1.38 billion nodes and 140 billion links
The impact of the sheer volume of big data
Using an SSD at 6 GB/s, a linear scan of a data set D would take
• 1.9 days when D is of 1 PB (10^15 B)
• 5.28 years when D is of 1 EB (10^18 B)
Is it feasible to query real-life big graphs?
A departure from classical computational complexity theory
Traditional computational complexity theory of almost 50 years:
• The good: polynomial time computable (PTIME)
• The bad: NP-hard (intractable)
• The ugly: PSPACE-hard, EXPTIME-hard, undecidable…
Parallel query answering
We can do better given more resources
10,000 processors
How to capitalize on the resources and reduce response time?
Using 10,000 SSDs at 6 GB/s each, a linear scan of D might take:
• 1.9 days/10,000 = 16 seconds when D is of 1 PB (10^15 B)
• 5.28 years/10,000 = 4.63 hours when D is of 1 EB (10^18 B)
Only ideally!
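The scan times can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch: the 6 GB/s rate is the figure from the slide, while the function name `scan_seconds` and the assumption of a perfectly even split with no coordination cost are illustrative only.

```python
# Back-of-the-envelope scan times. 6 GB/s per SSD is the rate quoted on the slide.
SCAN_RATE = 6e9          # bytes per second for one SSD
PB, EB = 1e15, 1e18      # bytes
DAY = 86_400             # seconds
YEAR = 365.25 * DAY      # seconds

def scan_seconds(size_bytes, n_disks=1):
    """Linear scan time, assuming the data is split evenly over n_disks
    that all scan in parallel with no coordination cost (the ideal case)."""
    return size_bytes / (SCAN_RATE * n_disks)

print(f"1 PB, 1 SSD:       {scan_seconds(PB) / DAY:.2f} days")
print(f"1 EB, 1 SSD:       {scan_seconds(EB) / YEAR:.2f} years")
print(f"1 PB, 10,000 SSDs: {scan_seconds(PB, 10_000):.1f} seconds")
print(f"1 EB, 10,000 SSDs: {scan_seconds(EB, 10_000) / 3600:.2f} hours")
```

The ideal speedup is linear in the number of disks; real systems lose part of it to communication, skew and stragglers.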
[Figure: three processors P, each with its own memory M and database DB, connected by an interconnection network]
Pipelined parallelism
Pipelining: a sequence of operations on each data item, each conducted by one processor
What are the limitations?
[Figure: the same architecture; data flows through the processors, which apply op1, op2, …, opn in sequence]
Parallel query answering
Given a big graph G, and n processors S1, …, Sn
• G is partitioned into fragments (G1, …, Gn)
• G is distributed to n processors: Gi is stored at Si
Dividing a big G into small fragments of manageable size
Each processor Si processes operations for a query on its local fragment Gi, in parallel
[Figure: evaluating Q(G) on the whole graph is replaced by evaluating Q(G1), Q(G2), …, Q(Gn) on the fragments in parallel]
MapReduce
MapReduce
A programming model with two primitive functions:
How does it work?
Input: a list of key-value pairs <k1, v1>
Map: applied to each pair, computes key-value pairs <k2, v2>
• The intermediate key-value pairs are hash-partitioned based on k2. Each partition <k2, list(v2)> is sent to a reducer
Reduce: takes a partition as input, and computes key-value pairs <k3, v3>
Map: <k1, v1> → list(<k2, v2>)
Reduce: <k2, list(v2)> → list(<k3, v3>)
The process may reiterate – multiple map/reduce steps
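To make the two signatures concrete, here is a minimal single-machine sketch of one map/shuffle/reduce step. It is not Hadoop: `map_reduce` and the in-degree example are hypothetical names chosen for illustration, and the "shuffle" is just a dictionary that groups values by k2.

```python
from collections import defaultdict

def map_reduce(pairs, mapper, reducer):
    """Run one map/shuffle/reduce step over an in-memory list of (k1, v1) pairs."""
    groups = defaultdict(list)
    for k1, v1 in pairs:                  # map: <k1, v1> -> list(<k2, v2>)
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)         # shuffle: group the values by k2
    out = []
    for k2 in sorted(groups):             # reduce: <k2, list(v2)> -> list(<k3, v3>)
        out.extend(reducer(k2, groups[k2]))
    return out

# Example: in-degree of each node, from adjacency lists.
def in_degree_map(n, adj):
    for m in adj:
        yield m, 1                        # one unit per incoming edge of m

def in_degree_reduce(m, ones):
    yield m, sum(ones)

graph = [("a", ["b", "c"]), ("b", ["c"]), ("c", ["a"])]
in_degrees = map_reduce(graph, in_degree_map, in_degree_reduce)
print(in_degrees)
```

In a real deployment the groups would be hash-partitioned across reducers; sorting the keys here only makes the output deterministic.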
Architecture (Hadoop)
No need to worry about how the data is stored and sent
[Figure: data flow. Input <k1, v1> pairs → mappers → intermediate <k2, v2> pairs → reducers → output <k3, v3> pairs]
• Input <k1, v1>: stored in DFS, partitioned in blocks (64 MB); one block for each mapper (a map task)
• Intermediate <k2, v2>: in local store of mappers; hash partition (k2)
• Reducers: aggregate results
• Multiple steps
Data partitioned parallelism
What parallelism?
[Figure: the same data flow; parallel computation among the mappers, and parallel computation among the reducers]
Popular in industry
Apache Hadoop, used by Facebook, Yahoo, …
• Hive, Facebook, HiveQL (SQL)
• PIG, Yahoo, Pig Latin (SQL like)
• SCOPE, Microsoft, SQL
• Cassandra, Facebook, CQL (no join)
• HBase, Google, distributed BigTable
• MongoDB, document-oriented (NoSQL)
Scalability
– Yahoo!: 10,000 cores, for Web search queries (2008)
– Facebook: 100 PB, about half a PB per day
– Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3); the New York Times used 100 EC2 instances to process 4 TB of image data, for $240
Study Spark: https://spark.apache.org/
Advantages of MapReduce
Simple: one only needs to define two functions
no need to worry about how the data is stored, distributed and how the operations are scheduled
Fault tolerance: why?
scalability: a large number of low end machines
• scale out (scale horizontally): adding a new computer to a distributed software application; low-cost “commodity”
• scale up (scale vertically): upgrade, add (costly) resources to a single node
flexibility: independent of data models or schema
independence: it can work with various storage layers
Fault tolerance
Able to handle an average of 1.2 failures per analysis job
[Figure: the MapReduce data flow (input → mappers → reducers → output), with the stored data triplicated]
Detecting failures and reassigning the tasks of failed nodes to healthy nodes
Redundancy checking to achieve load balancing
MapReduce algorithms
Input: query Q and graph G
Output: answers Q(G) to Q in G
map(key: node, value: (adjacency-list, others))
{ computation;
  emit (mkey, mvalue)
}
reduce(key: __ , value: list[value])
{ …
  emit (rkey, rvalue)
}
Compatibility:
• the input of reduce must match the mkey, mvalue emitted by map
• rkey, rvalue must match the input of map when multiple iterations of MapReduce are needed
Control flow
Preprocessing: copy files from the input directory to staging dir 1
Iterations of MapReduce:
while (termination condition is not satisfied) do {
 a) map from staging dir 1;
 b) reduce into staging dir 2;
 c) move files from staging dir 2 to staging dir 1
}
Postprocessing: move files from staging dir 2 to output dir
Termination: non-MapReduce driver program
Functional programming: no global data structures accessible and mutable by all
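The driver loop can be sketched as an ordinary function that owns the termination test. This is an illustration under simplifying assumptions: the state is held in memory rather than in staging directories, and `run_driver` and `min_label_round` are hypothetical names, with the latter standing in for one full map/reduce round.

```python
def run_driver(state, one_round, max_rounds=100):
    """Repeat one_round until its output equals its input (a fixpoint).
    The termination test lives here, outside any map or reduce function."""
    for rounds in range(1, max_rounds + 1):
        new_state = one_round(state)   # map from "staging 1", reduce into "staging 2"
        if new_state == state:         # nothing changed in this round: stop
            return new_state, rounds
        state = new_state              # "move staging dir 2 to staging dir 1"
    raise RuntimeError("no convergence within max_rounds")

def min_label_round(labels):
    """Stand-in for one round: on a path graph, every node takes the
    minimum label among itself and its direct neighbours."""
    return tuple(min(labels[max(i - 1, 0):i + 2]) for i in range(len(labels)))

final, rounds = run_driver((0, 1, 2, 3), min_label_round)
print(final, rounds)
```

The last round only confirms that nothing changed, which is why a chain of four nodes needs four rounds.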
BFS for distance queries
Dijkstra’s algorithm for distance queries
Distance: single-source shortest-path problem
• Input: A directed weighted graph G, and a node s in G
• Output: The lengths of shortest paths from s to all nodes in G
Dijkstra (G, s, w):
1. for all nodes v in V do
   a. d[v] ← ∞;
2. d[s] ← 0; Que ← V;
3. while Que is nonempty do
   a. u ← ExtractMin(Que);   (extract one with the minimum d[u])
   b. for all nodes v in adj(u) do
      a) if d[v] > d[u] + w(u, v) then d[v] ← d[u] + w(u, v);
Use a priority queue Que; w(u, v): weight of edge (u, v); d[u]: the distance from s to u
O(|V| log|V| + |E|).
MapReduce?
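For reference, a sequential sketch of Dijkstra's algorithm in Python. It uses the standard-library binary heap with lazy deletion, which gives O((|V| + |E|) log |V|) rather than the O(|V| log |V| + |E|) bound stated above (that bound needs a Fibonacci heap). The graph `GRAPH` is an assumption: its edge directions are inferred from the distances emitted in the iteration example of these slides.

```python
import heapq

def dijkstra(adj, s):
    """adj: {node: [(successor, weight), ...]}; returns {node: distance from s}."""
    d = {v: float("inf") for v in adj}
    d[s] = 0
    heap = [(0, s)]
    done = set()
    while heap:
        du, u = heapq.heappop(heap)    # ExtractMin
        if u in done:                  # stale heap entry, already settled
            continue
        done.add(u)
        for v, w in adj[u]:
            if d[v] > du + w:          # relax edge (u, v)
                d[v] = du + w
                heapq.heappush(heap, (d[v], v))
    return d

# Example graph (edge directions are an assumption, see above).
GRAPH = {
    "s": [("a", 10), ("c", 5)],
    "a": [("b", 1), ("c", 2)],
    "b": [("d", 4)],
    "c": [("a", 3), ("b", 9), ("d", 2)],
    "d": [("b", 6), ("s", 7)],
}
print(dijkstra(GRAPH, "s"))
```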
A MapReduce algorithm
Input: graph G, represented by adjacency lists
Node N:
• Node id: nid n
• N.distance: the distance from start node s to N
• N.AdjList: [(m, w(n, m))], node id and weight of edge (n, m)
Key-value pairs:
Key: node id n
Value of node N (one of two different structures):
• Distance: the distance from start node s to n found so far
• Node N (id, AdjList, etc.)
Mapper
Data-partitioned parallelism
Parallel processing
all nodes are processed in parallel, each by a mapper
for each node m adjacent to n, emit a revised distance via n
Map (nid n, node N)
• d ← N.distance;
• emit(nid n, N);
• for each (m, w) in N.AdjList do
  • emit(nid m, d + w(n, m));   (revise distance of m via n)
Why emit(nid n, N)? To preserve the graph structure for iterative processing
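The mapper translates almost line by line into a Python generator. In this sketch a node is a plain dict with the hypothetical fields `"distance"` and `"adj"`; that representation is an assumption, not something the slides prescribe.

```python
def sssp_map(nid, node):
    """node: {"distance": number, "adj": [(m, w), ...]} (an assumed layout)."""
    yield nid, node                    # pass the graph structure along
    d = node["distance"]
    for m, w in node["adj"]:
        yield m, d + w                 # candidate distance to m via this node

start = {"distance": 0, "adj": [("a", 10), ("c", 5)]}
emitted = list(sssp_map("s", start))
print(emitted)
```

A node whose distance is still infinite emits infinite candidates, which the reducer can safely ignore.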
Reducer
Current M.distance: minimum from all predecessors
Group by node id
list for m: distances from all predecessors so far
• Each d in list is either a distance to m from a predecessor or node m
• Node M: must exist (from Mapper); it is always there. Why?
Reduce (nid m, list)
• dmin ← ∞;
• for all d in list do
  • if IsNode(d)
  • then M ← d;
  • else if d < dmin
  • then dmin ← d;   (minimum distance so far)
• M.distance ← min(M.distance, dmin);   (update M.distance for this round; the start node keeps distance 0)
• emit (nid m, node M);
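A matching reducer sketch, using the same assumed dict layout for nodes (`"distance"`, `"adj"`). One deliberate choice is labeled here as an assumption: the node's own current distance takes part in the minimum, so that the start node keeps distance 0 even when every incoming candidate is still infinite.

```python
def sssp_reduce(nid, values):
    """values mixes one node record (a dict) with candidate distances (numbers)."""
    node, dmin = None, float("inf")
    for v in values:
        if isinstance(v, dict):        # IsNode(d): recover the graph structure
            node = v
        elif v < dmin:                 # minimum candidate distance so far
            dmin = v
    # Keep the smaller of the current distance and the best candidate.
    return nid, dict(node, distance=min(node["distance"], dmin))

print(sssp_reduce("a", [{"distance": 10, "adj": [("b", 1)]}, 8, 12]))
```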
Iterations and termination
Termination control
Each MapReduce iteration advances the “known frontier” by one hop
Subsequent iterations include more and more reachable nodes as frontier expands
Multiple iterations are needed to explore entire graph
Termination: when the intermediate result no longer changes
• N.distance has changed for no node n in the last round
• controlled by a non-MapReduce driver
• Use a flag, inspected by the non-MapReduce driver
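Putting the pieces together, the following single-machine sketch runs rounds until the "changed" flag stays false. It is an illustration only: the state is an in-memory dict instead of files, `one_round` and `sssp` are hypothetical names, and the example graph's edge directions are an assumption inferred from the distances in the iteration example.

```python
from collections import defaultdict

INF = float("inf")

def one_round(nodes):
    """nodes: {nid: (distance, adj)}. Returns the new state and a 'changed' flag."""
    groups = defaultdict(list)
    for n, (d, adj) in nodes.items():              # map phase
        groups[n].append(("node", d, adj))
        for m, w in adj:
            groups[m].append(("dist", d + w))
    new_nodes, changed = {}, False
    for m, values in groups.items():               # reduce phase
        d_old, adj = next((v[1], v[2]) for v in values if v[0] == "node")
        d_new = min([d_old] + [v[1] for v in values if v[0] == "dist"])
        changed |= d_new != d_old                  # flag inspected by the driver
        new_nodes[m] = (d_new, adj)
    return new_nodes, changed

def sssp(graph, s):
    """Driver: iterate until no distance changes; returns distances and round count."""
    nodes = {n: (0 if n == s else INF, adj) for n, adj in graph.items()}
    rounds = 0
    while True:
        nodes, changed = one_round(nodes)
        rounds += 1
        if not changed:
            return {n: d for n, (d, _) in nodes.items()}, rounds

GRAPH = {
    "s": [("a", 10), ("c", 5)],
    "a": [("b", 1), ("c", 2)],
    "b": [("d", 4)],
    "c": [("a", 3), ("b", 9), ("d", 2)],
    "d": [("b", 6), ("s", 7)],
}
dist, rounds = sssp(GRAPH, "s")
print(dist, rounds)
```

On this graph the loop needs four rounds; the fourth one changes nothing and only triggers termination.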
Iteration 0: Base case
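mapper: (a,<s,10>) (c,<s,5>) edges
reducer: (a,<10, ...>) (c,<5, ...>)
[Figure: example graph with nodes s, a, b, c, d and edge weights 10, 5, 2, 3, 1, 9, 7, 2, 4, 6; s has distance 0 and all other nodes have distance ∞; the "wave" starts at s]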
Iteration 1
mapper: (a,<s,10>) (c,<s,5>) (a,<c,8>) (c,<a,12>) (b,<a,11>) (b,<c,14>) (d,<c,7>) edges
reducer: (a,<8, ...>) (c,<5, ...>) (b,<11, ...>) (d,<7, ...>)
group (a,<s,10>) and (a,<c,8>)
[Figure: the same graph at the start of iteration 1: s = 0, a = 10, c = 5, b = ∞, d = ∞; the "wave" moves on from a and c]
Iteration 2
mapper: (a,<s,10>) (c,<s,5>) (a,<c,8>) (c,<a,10>) (b,<a,9>) (b,<c,14>) (d,<c,7>) (b,<d,13>) (d,<b,15>) edges
reducer: (a,<8>) (c,<5>) (b,<9>) (d,<7>)
[Figure: the same graph after iteration 2: s = 0, a = 8, c = 5, b = 9, d = 7]
Iteration 3
mapper: (a,<s,10>) (c,<s,5>) (a,<c,8>) (c,<a,10>) (b,<a,9>) (b,<c,14>) (d,<c,7>) (b,<d,13>) (d,<b,13>) edges
reducer: (a,<8>) (c,<5>) (b,<9>) (d,<7>)
No change! Convergence!
[Figure: the same graph with the final distances: s = 0, a = 8, c = 5, b = 9, d = 7]
Efficiency?
Any other sources of inefficiency?
MapReduce explores all paths in parallel
Each MapReduce iteration advances the “known frontier” by one hop
• Redundant work, since useful work is only done at the “frontier”
Dijkstra’s algorithm is more efficient
• At any step it only pursues edges from the minimum-cost path inside the frontier
Another source of inefficiency: skew
A closer look
Data partitioned parallelism
• Local computation at each node in mapper, in parallel: attributes of the node, adjacent edges and local link structures
• Propagating computations: traversing the graph; this may involve iterative MapReduce
Tips:
• Adjacency lists
• Local computation in mapper
• Pass along partial results via outlinks, keyed by destination node
• Perform aggregation in reducer on inlinks to a node
• Iterate until convergence: controlled by external “driver”; need a way to test for convergence
• Pass graph structures between iterations
PageRank
PageRank
The likelihood that page v is visited by a random walk:
$P(v) = \epsilon \cdot \frac{1}{|V|} + (1 - \epsilon) \sum_{u \in L(v)} \frac{P(u)}{C(u)}$
• first term: random jump
• second term: following a link from other pages
• L(v): the pages that link to v; C(u): the number of outgoing links of u
Recursive computation: for each page v in G,
• compute P(v) by using P(u) for all $u \in L(v)$
until
• converge: no changes to any P(v), or
• after a fixed number of iterations
How to speed it up?
A MapReduce algorithm
Assume that $\epsilon = 0$ (no random jump)
Simplified version: $P(v) = \sum_{u \in L(v)} \frac{P(u)}{C(u)}$
Input: graph G, represented by adjacency lists
Node N:
• Node id: nid n
• N.rank: the current rank
• N.AdjList: [m], node id
Key: node id n
Value of node N:
• rank: a rank of a node
• Node N (id, AdjList, etc.)
Mapper
Local computation in mapper
Parallel processing
all nodes are processed in parallel, each by a mapper
Pass PageRank at n to successors of n
Map (nid n, node N)
• p ← N.rank/|N.AdjList|;   (P(u)/C(u))
• emit(nid n, N);
• for each m in N.AdjList do
  • emit(nid m, p);   (pass rank to neighbors)
emit(nid n, N): preserve graph structure for iterative processing
Reducer
Aggregation in reducer
list for m: P(u)/C(u) from all predecessors of m
m.rank at the end: $\sum_{u \in L(m)} \frac{P(u)}{C(u)}$
Reduce (nid m, list)
• s ← 0;
• for all p in list do
  • if IsNode(p)
  • then M ← p;   (recover graph structure)
  • else s ← s + p;   (sum up)
• M.rank ← s;
• emit (nid m, node M);   (with updated M.rank for this round)
PageRank in MapReduce
[Figure: one PageRank iteration on the graph n1 [n2, n4], n2 [n3, n5], n3 [n4], n4 [n5], n5 [n1, n2, n3]. Map: each node passes its rank share to the nodes in its adjacency list. Reduce: the shares are grouped by destination node, and the adjacency lists are recovered]
Acknowledgments: some animation slides are borrowed from
www.cs.kent.edu/~jin/Cloud12Spring/GraphAlgorithms.pptx
Termination control: external driver
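One round of the simplified PageRank (no random jump) can be sketched on a single machine as follows, using the five-node graph n1, …, n5 from the figure. The function name `pagerank_round`, the in-memory state and the uniform initial rank 0.2 are assumptions made for illustration.

```python
from collections import defaultdict

def pagerank_round(nodes):
    """nodes: {nid: (rank, adj_list)}. One map/reduce round, no random jump."""
    groups = defaultdict(list)
    for n, (rank, adj) in nodes.items():       # map
        groups[n].append(adj)                  # a list stands for the node structure
        for m in adj:
            groups[m].append(rank / len(adj))  # a float is a rank share P(u)/C(u)
    result = {}
    for m, values in groups.items():           # reduce
        adj = next(v for v in values if isinstance(v, list))
        total = sum(v for v in values if not isinstance(v, list))
        result[m] = (total, adj)
    return result

GRAPH = {"n1": ["n2", "n4"], "n2": ["n3", "n5"], "n3": ["n4"],
         "n4": ["n5"], "n5": ["n1", "n2", "n3"]}
ranks = pagerank_round({n: (0.2, adj) for n, adj in GRAPH.items()})
for n in sorted(ranks):
    print(n, round(ranks[n][0], 4))
```

Every node in this graph has at least one outgoing link, so the total rank stays 1 after the round; nodes without outgoing links would lose their rank in this simplified version.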
Keyword search
Distinct-root trees
A simplified version
Input: A list Q = (k1, …, km) of keywords, a directed graph G, and a positive integer k
Output: distinct trees that match Q bounded by k
Match: a subtree T = (r, (k1, p1, d1(r, p1)), …, (km, pm, dm(r, pm))) of G such that
• each keyword ki in Q is contained in a leaf pi of T
• pi is closest to r among all nodes that contain ki
• the distance from the root r of T to each leaf does not exceed k
dj(r, pj) ≤ k: k iterations (termination condition)
A MapReduce algorithm
Input: graph G, represented by adjacency lists
Node N:
• Node id: nid n
• N.((K1, P1, D1), …, (Km, Pm, Dm)): representing (n, (k1, p1, d1(n, p1)), …, (km, pm, dm(n, pm)))
• N.AdjList: [m], node id
Key: node id n
Preprocessing: N.((K1, P1, D1), …, (Km, Pm, Dm)), for each i from 1 to m:
• Pi = ⊥ and Di = ∞ if N does not contain ki
• Pi = n and Di = 0 otherwise
Preprocessing can be done in MapReduce itself
Mapper
Local computation: pass information from successors (contrast this to, e.g., PageRank)
Map (nid n, node N)
• emit(nid n, N);
• for each m in N.AdjList do
  • emit(nid n, (M.(K1, P1, D1 + 1), …, M.(Km, Pm, Dm + 1)));   (one hop forward; shortcut one node)
m is the node id of node M
Reducer
Shortest distances within j hops
Invariant: in iteration j, N.((K1, P1, D1), …, (Km, Pm, Dm)) represents (n, (k1, p1, d1(n, p1)), …, (km, pm, dm(n, pm)))
Reduce (nid n, list)
• for i from 1 to m do
  • pi ← N.Pi; di ← N.Di;   (N: the node represented by n; must be in list)
• for i from 1 to m do
  • Si ← the set of all M.(Ki, Pi, Di) in list;   (group by keyword ki)
  • if the smallest M.Di in Si is less than di then di ← that M.Di; pi ← the corresponding M.Pi;   (pick the one with the shortest distance to n)
• for i from 1 to m do
  • N.Pi ← pi; N.Di ← di;
• emit (nid n, node N);
Termination and post-processing
A different way of passing information during traversal
Termination: after k iterations, for a given positive integer k
Post-processing: upon termination, for each node n with N.((K1, P1, D1), …, (Km, Pm, Dm)):
• if no Pi = ⊥ for i from 1 to m, then N.((K1, P1, D1), …, (Km, Pm, Dm)) represents a valid match (n, (k1, p1, d1(n, p1)), …, (km, pm, dm(n, pm)))
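The whole pipeline (preprocessing, k rounds, post-processing) fits in a short single-machine sketch. Several things here are assumptions for illustration: `None` stands for a missing node and infinity for a missing distance, `labels` maps each node to the keywords it contains, and, as in the mapper pseudocode, the code reads the entries of successor M directly by id.

```python
INF = float("inf")

def keyword_search(adj, labels, query, k):
    """adj: {n: [successors]}; labels: {n: set of keywords}; query: list of keywords.
    Returns {root: [(P_i, D_i), ...]} for every root that reaches all keywords
    within k hops."""
    # Preprocessing: (n, 0) if n contains the keyword, else (None, INF).
    state = {n: [(n, 0) if kw in labels[n] else (None, INF) for kw in query]
             for n in adj}
    for _ in range(k):                             # k rounds, then stop
        new_state = {}
        for n in adj:
            # Map: n's own entries plus each successor's entries, one hop further.
            candidates = [state[n]] + [[(p, d + 1) for p, d in state[m]]
                                       for m in adj[n]]
            # Reduce: for each keyword, keep the entry with the shortest distance.
            new_state[n] = [min((c[i] for c in candidates), key=lambda e: e[1])
                            for i in range(len(query))]
        state = new_state
    # Post-processing: a valid match has a node for every keyword.
    return {n: entries for n, entries in state.items()
            if all(p is not None for p, _ in entries)}

adj = {"r": ["x", "y"], "x": ["z"], "y": [], "z": []}
labels = {"r": set(), "x": {"db"}, "y": {"graph"}, "z": {"graph"}}
print(keyword_search(adj, labels, ["db", "graph"], k=1))
```

With k = 1 the roots r and x qualify; y and z never reach a node containing "db".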
Graph pattern matching by subgraph isomorphism
Input: a query Q and a data graph G,
Output: all the matches of Q in G, i.e., all subgraphs of G that are isomorphic to Q
• a bijective function f on nodes: (u, u’) ∈ Q iff (f(u), f(u’)) ∈ G
Graph pattern matching by subgraph isomorphism
NP-complete
MapReduce?
A MapReduce algorithm
Two MapReduce steps: preprocessing, and computation
Input: graph G, represented by adjacency lists
Node N:
• Node id: nid n
• N.Gd: the subgraph of G rooted at n, consisting of nodes within d hops of n, where d is the radius of Q
• N.AdjList: [m], node id
Key: node id n
Preprocessing: for each node n, compute N.Gd
• A MapReduce algorithm of d iterations
• Adjacency lists are only used in the preprocessing step
Algorithm
Just a conceptual level evaluation
Map (nid n, node N)
• compute all matches S of Q in N.Gd;   (invoke any algorithm for subgraph isomorphism: VF2, Ullmann)
• emit(1, S);
Reduce (1, list)
• M ← the union of all sets in list;   (not necessary; just to eliminate duplicates)
• emit(M, 1);
Show the correctness? All and only isomorphic mappings? Yes, data locality
Parallel scalability? The more processors, the faster? Yes, as long as the number of processors does not exceed the number of nodes of G
Lots of redundant computations
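A conceptual sketch of the two steps on one machine. It swaps in a brute-force matcher for VF2 or Ullmann, so it is only usable on toy inputs. The names `ball`, `matches_in` and `subgraph_iso` are hypothetical, hops are counted ignoring edge direction (an assumption), and self-loops are not considered.

```python
from itertools import permutations

def ball(edges, n, d):
    """Nodes within d hops of n, ignoring edge direction (an assumption)."""
    seen = {n}
    frontier = {n}
    for _ in range(d):
        frontier = {v for (a, b) in edges for u, v in ((a, b), (b, a))
                    if u in frontier} - seen
        seen |= frontier
    return seen

def matches_in(q_nodes, q_edges, g_edges, nodes):
    """All matches of Q whose image lies inside `nodes`, found by brute force:
    f is injective and (u, v) is an edge of Q iff (f(u), f(v)) is an edge of G."""
    found = set()
    for image in permutations(sorted(nodes), len(q_nodes)):
        f = dict(zip(q_nodes, image))
        if all(((u, v) in q_edges) == ((f[u], f[v]) in g_edges)
               for u in q_nodes for v in q_nodes if u != v):
            found.add(tuple(sorted(f.items())))
    return found

def subgraph_iso(q_nodes, q_edges, g_nodes, g_edges, d):
    # Map: one task per node n, matching Q inside the d-hop neighbourhood of n.
    partial = [matches_in(q_nodes, q_edges, g_edges, ball(g_edges, n, d))
               for n in g_nodes]
    # Reduce: the union eliminates matches found from several nodes.
    return set().union(*partial)

result = subgraph_iso(["x", "y"], {("x", "y")}, [1, 2, 3], {(1, 2), (2, 3)}, d=1)
print(sorted(result))
```

The redundancy is visible even here: the match on edge (1, 2) is found from the neighbourhoods of both node 1 and node 2.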
Summing up
Summary and review
Why do we need parallel algorithms for querying graphs?
What is the MapReduce framework?
How to develop graph algorithms in MapReduce?
– Graph representation
– Local computation in mapper
– Aggregation in reducer
– Termination
Graph algorithms in MapReduce may not be efficient. Why?
Develop your own graph algorithms in MapReduce. Give correctness proof, complexity analysis and performance guarantees for your algorithms
Project (1)
A development project
Recall strongly connected components (Lecture 2)
• Implement a MapReduce algorithm that, given a graph G, computes all (maximal) strongly connected components of G
• Develop optimization strategies
• Experimentally evaluate your algorithm, especially its scalability with the size of G
• Write a survey on parallel algorithms for computing strongly connected components, as part of the related work.
Project (2)
A development project
Recall kNN joins (Lecture 2)
• Implement a MapReduce algorithm for evaluating kNN join queries
• Develop optimization strategies
• Experimentally evaluate your algorithm
• Write a survey on parallel algorithms for kNN queries and kNN join queries, as part of the related work.
Project (3)
A development project
Recall keyword search with Steiner-tree semantics (Lecture 2)
• Implement a MapReduce algorithm for keyword search with Steiner-tree semantics
• Develop optimization strategies
• Experimentally evaluate your algorithm
• Write a survey on parallel algorithms for keyword search, as part of the related work.
Papers for you to review
• W. Fan, F. Geerts, and F. Neven. Making Queries Tractable on Big Data with Preprocessing. VLDB 2013.
• Y. Tao, W. Lin, X. Xiao. Minimal MapReduce Algorithms (MMC). http://www.cse.cuhk.edu.hk/~taoyf/paper/sigmod13-mr.pdf
• L. Qin, J. Yu, L. Chang, H. Cheng, C. Zhang, X. Lin. Scalable big graph processing in MapReduce. SIGMOD 2014. http://www1.se.cuhk.edu.hk/~hcheng/paper/SIGMOD2014qin.pdf
• W. Lu, Y. Shen, S. Chen, B. Ooi. Efficient Processing of k Nearest Neighbor Joins using MapReduce. PVLDB 2012. http://arxiv.org/pdf/1207.0141.pdf
• V. Rastogi, A. Machanavajjhala, L. Chitnis, A. Sarma. Finding connected components in map-reduce in logarithmic rounds. ICDE 2013. http://arxiv.org/pdf/1203.5387.pdf