Single-Source Shortest Path on MapReduce Shenglei Zhao
Social Media Recommendation
Motivation
• Determine the appropriate person to recommend in social media (mainly undirected graphs here)
• Help calculate the betweenness centrality of a person in a social network
• (Weighted graph) Find the best route on Google Maps
Algorithms – Breadth-first Search
• Determine the root node (starting node)
• Initial settings for all nodes: EdgeList, Distance From Root (initially all Infinite), Status (not visited, being visited, already visited), Parent Node
• Iterate: look for neighbors and change the settings of nodes
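The initial settings above can be sketched in plain Python. This is only an illustration: the names `Status`, `Node` and `init_nodes` are hypothetical and do not come from the slides, and the root's parent is left as `None` here.

```python
from dataclasses import dataclass
from enum import Enum
from math import inf
from typing import Dict, List, Optional


class Status(Enum):
    TOBEVISITED = "TOBEVISITED"    # not visited yet
    BEINGVISITED = "BEINGVISITED"  # on the current frontier
    VISITED = "VISITED"            # already visited


@dataclass
class Node:
    edges: List[int]                     # EdgeList: IDs of neighboring nodes
    distance: float = inf                # distance from root, initially infinite
    status: Status = Status.TOBEVISITED
    parent: Optional[int] = None         # parent node, unknown at the start


def init_nodes(adjacency: Dict[int, List[int]], root: int) -> Dict[int, Node]:
    """Give every node its initial settings, then mark the root as the frontier."""
    nodes = {nid: Node(edges=list(nbrs)) for nid, nbrs in adjacency.items()}
    nodes[root].distance = 0
    nodes[root].status = Status.BEINGVISITED
    return nodes
```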
Iteration
• Traverse all records (node ID and node settings) and find the “BEINGVISITED” nodes (for example: A)
• Find their neighbors (B, C, …) and apply new settings:
  for B, C, …: Parent: null -> A; Status: TOBEVISITED -> BEINGVISITED; Distance: null -> A’s distance + 1
  for A: Status: BEINGVISITED -> VISITED
• The next traversal applies to these neighbors (now BEINGVISITED)
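A single traversal can be sketched on an in-memory dictionary, before any MapReduce is involved. The function name `bfs_iteration` and the dictionary layout (`edges`, `distance`, `status`, `parent` keys) are assumptions made for this sketch.

```python
def bfs_iteration(nodes):
    """Run one traversal; return the IDs of nodes that turned BEINGVISITED."""
    # Fix the frontier first so nodes promoted in this pass wait for the next one.
    frontier = [nid for nid, n in nodes.items() if n["status"] == "BEINGVISITED"]
    promoted = []
    for a in frontier:
        for b in nodes[a]["edges"]:
            neighbor = nodes[b]
            if neighbor["status"] == "TOBEVISITED":
                neighbor["parent"] = a
                neighbor["distance"] = nodes[a]["distance"] + 1
                neighbor["status"] = "BEINGVISITED"
                promoted.append(b)
        nodes[a]["status"] = "VISITED"   # A itself is done after this pass
    return promoted
```

Calling the function repeatedly until it returns an empty list visits every node reachable from the root.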
Map Reduce Structure
Preprocessing Map & Reduce: put data into the format <nodeId, nodeInformation>
Map: Create new rows for neighbors
Reduce: Delete old rows of neighbors
Postprocessing Map & Reduce: Collect all the distances and get the average
Design - <k, v> evolution
1 • Preprocessing Map & Reduce: List<n1, n2> -> <n1, [n2, n3, n4, …]> -> <n1 ID, neighbor IDs + Distance + Status + Parent Node ID>
2 • Recurring Working Map & Reduce (shown in next slide)
3 • Postprocessing Map & Reduce: <ID2, nodeInformation> -> <Status, distance> -> <Status, averageDistance>
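The preprocessing step (1) can be sketched as below. It assumes the input is a list of undirected `(n1, n2)` pairs and that key and value are separated by a tab; the function name `preprocess` and the `Inf`/`null`/`source` literals follow the sample rows in this deck, but the exact input format used by the author is not shown.

```python
from collections import defaultdict


def preprocess(edge_pairs, source):
    """Turn (n1, n2) pairs into 'id<TAB>neighbors|distance|status|parent' rows."""
    neighbors = defaultdict(list)
    for n1, n2 in edge_pairs:
        # "Map": emit the edge in both directions (assumes an undirected graph).
        for a, b in ((n1, n2), (n2, n1)):
            # "Reduce": collect each node's neighbors once.
            if b not in neighbors[a]:
                neighbors[a].append(b)
    rows = []
    for nid in sorted(neighbors):
        edges = ",".join(str(n) for n in neighbors[nid])
        if nid == source:
            rows.append(f"{nid}\t{edges}|0|BEINGVISITED|source")
        else:
            rows.append(f"{nid}\t{edges}|Inf|TOBEVISITED|null")
    return rows
```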
Working Map & Reduce Sample
Input:
1 2,3|0|BEINGVISITED|source (parent node this turn)
2 1,3,4,5|Inf|TOBEVISITED|null
3 1,4,2|Inf|TOBEVISITED|null
4 2,3|Inf|TOBEVISITED|null
5 2|Inf|TOBEVISITED|null
Map output (the input rows, plus new records created without an edge list):
1 2,3|0|BEINGVISITED|source
2 1,3,4,5|Inf|TOBEVISITED|null
3 1,4,2|Inf|TOBEVISITED|null
4 2,3|Inf|TOBEVISITED|null
5 2|Inf|TOBEVISITED|null
1 |0|BEINGVISITED|source
2 |1|BEINGVISITED|1
3 |1|BEINGVISITED|1
Reduce output (merged records, input to the next turn):
1 2,3|0|VISITED|source
2 1,3,4,5|1|BEINGVISITED|1 (will be picked as parent node next turn)
3 1,4,2|1|BEINGVISITED|1 (will be picked as parent node next turn)
4 2,3|Inf|TOBEVISITED|null
5 2|Inf|TOBEVISITED|null
Code Snippet
Mapper for Core part of Algorithm
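The original mapper code is not preserved in this text. A minimal Python sketch of what such a mapper might do, reproducing the map output of the working sample, is given here; `bfs_map` is a hypothetical name and the tab-separated row format is an assumption.

```python
def bfs_map(line):
    """Map one 'id<TAB>edges|distance|status|parent' row to (key, value) pairs."""
    node_id, value = line.split("\t", 1)
    edges, distance, status, parent = value.split("|")
    # Always pass the full row through so the edge list survives the round.
    yield node_id, value
    if status == "BEINGVISITED":
        # Edge-less copy of the node expanded this turn.
        yield node_id, f"|{distance}|BEINGVISITED|{parent}"
        # One new edge-less record per neighbor, one hop further from the root.
        for neighbor in filter(None, edges.split(",")):
            yield neighbor, f"|{int(distance) + 1}|BEINGVISITED|{node_id}"
```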
Reducer for Core part of Algorithm
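The original reducer code is likewise not preserved. The sketch below shows one merge rule that is consistent with the working sample; the rule itself (full row BEINGVISITED becomes VISITED, full row TOBEVISITED adopts the closest edge-less record) is an assumption, as is the name `bfs_reduce`. It also assumes every node has at least one neighbor, so that the full row can be recognised by its non-empty edge list.

```python
def bfs_reduce(node_id, values):
    """Merge all rows sharing node_id into a single updated row."""
    full = None        # the row that carries the edge list
    candidates = []    # (distance, parent) proposals from edge-less rows
    for value in values:
        edges, distance, status, parent = value.split("|")
        if edges:
            full = (edges, distance, status, parent)
        else:
            candidates.append((int(distance), parent))
    if full is None:
        raise ValueError(f"no full row for node {node_id}")
    edges, distance, status, parent = full
    if status == "BEINGVISITED":
        status = "VISITED"                       # it was expanded by the mapper
    elif status == "TOBEVISITED" and candidates:
        best_distance, parent = min(candidates)  # closest proposal wins
        distance, status = str(best_distance), "BEINGVISITED"
    return node_id, f"{edges}|{distance}|{status}|{parent}"
```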
Create an enum to represent the counter
The counter increments by 1 each time a node turns “BEINGVISITED”
Print the counter value on screen to show the number of changes
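The counter idea can be imitated outside Hadoop with an enum key and a `collections.Counter`. This is only an analogy to a job counter, not the author's code; `BfsCounter` and `count_changes` are hypothetical names, and the rows before and after a round are assumed to be listed in the same node order.

```python
from collections import Counter
from enum import Enum


class BfsCounter(Enum):
    NODES_TURNED_BEINGVISITED = "nodes turned BEINGVISITED"


def count_changes(before_rows, after_rows, counters):
    """Increment the counter once per node whose status turned BEINGVISITED."""
    for old, new in zip(before_rows, after_rows):
        old_status = old.split("|")[2]   # row layout: id+edges|distance|status|parent
        new_status = new.split("|")[2]
        if old_status != "BEINGVISITED" and new_status == "BEINGVISITED":
            counters[BfsCounter.NODES_TURNED_BEINGVISITED] += 1
    return counters
```

Printing `counters[BfsCounter.NODES_TURNED_BEINGVISITED]` after each round then shows how many nodes changed.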
Iteration for Job
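The job driver code is not preserved either. One plausible shape for it, assuming the job is re-run until a round turns no node BEINGVISITED, is sketched here. `run_round` is a hypothetical callable standing in for one working Map & Reduce pass; it is expected to return the new rows and the counter value for that pass.

```python
def run_until_stable(rows, run_round, max_rounds=100):
    """Re-run the working job until a round turns no node BEINGVISITED."""
    for round_no in range(1, max_rounds + 1):
        rows, changed = run_round(rows)
        print(f"round {round_no}: {changed} node(s) turned BEINGVISITED")
        if changed == 0:
            # Nothing new on the frontier: all reachable nodes are settled.
            return rows, round_no
    raise RuntimeError("BFS did not finish within max_rounds")
```

The `max_rounds` guard is an addition for safety in this sketch, so that a faulty round function cannot loop forever.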
Result
• A list of rows containing complete information
Result – distance-only
Result – average distance
• The average distance from all nodes to the source node (node 0) is about 2.829
• On average, Person 0 needs about 2.8 hops (edges) to reach a random person in the network
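The postprocessing step that produces such an average can be sketched as follows, mirroring the <Status, distance> -> <Status, averageDistance> evolution. The name `average_distance_by_status` is hypothetical, and whether the source node itself (distance 0) was counted in the reported 2.829 is not stated in the slides; this sketch counts every row with a finite distance.

```python
from collections import defaultdict


def average_distance_by_status(rows):
    """Group finite distances by status and return the average per status."""
    grouped = defaultdict(list)
    for row in rows:
        _, value = row.split("\t", 1)
        _, distance, status, _ = value.split("|")
        if distance != "Inf":        # skip nodes never reached from the source
            grouped[status].append(int(distance))
    return {status: sum(d) / len(d) for status, d in grouped.items()}
```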
Future Work
• Calculate the average distance for every node, using each node in turn as the source
• Determine betweenness centrality based on the results for all nodes
• Visualization of the network (nodes with lower average distance drawn with a bigger radius)