Single-Source Shortest Path on MapReduce Shenglei Zhao
Hadoop_FinalProject_ShengleiZhao

Feb 19, 2017

Shenglei Zhao
Page 1: Hadoop_FinalProject_ShengleiZhao

Single-Source Shortest Path on MapReduce

Shenglei Zhao

Social Media Recommendation

Motivation

Determine the appropriate person to recommend on social media (mainly undirected graphs here)

Help calculate the betweenness centrality of a person in a social network

Find the best route on Google Maps (weighted graph)

Algorithm – Breadth-First Search
• Determine the root node (starting node)
• Initial settings for all nodes: edge list, distance from root (initially infinite), status (not visited, being visited, already visited), parent node
• Iterate: look for neighbors and change the settings of nodes
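The initial settings above can be sketched in Python as one record per node. This is a minimal in-memory illustration under stated assumptions, not the project's Hadoop code: the names `Status`, `NodeInfo` and `initialize` are hypothetical, the status names follow the ones used later in the slides, and the example graph is the five-node graph from the working sample.

```python
from dataclasses import dataclass
from enum import Enum
from math import inf
from typing import Dict, List, Optional


class Status(Enum):
    TOBEVISITED = "TOBEVISITED"    # not visited yet
    BEINGVISITED = "BEINGVISITED"  # on the current frontier
    VISITED = "VISITED"            # already expanded


@dataclass
class NodeInfo:
    edges: List[int]                      # edge list (neighbor IDs)
    distance: float = inf                 # distance from root, initially infinite
    status: Status = Status.TOBEVISITED
    parent: Optional[int] = None          # parent node on the shortest path


def initialize(adjacency: Dict[int, List[int]], root: int) -> Dict[int, NodeInfo]:
    """Create the initial settings for every node; only the root starts on the frontier."""
    nodes = {node: NodeInfo(edges=list(neighbors)) for node, neighbors in adjacency.items()}
    nodes[root].distance = 0
    nodes[root].status = Status.BEINGVISITED
    return nodes


graph = {1: [2, 3], 2: [1, 3, 4, 5], 3: [1, 4, 2], 4: [2, 3], 5: [2]}
for node_id, info in initialize(graph, root=1).items():
    print(node_id, info)
```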

Iteration
• Traverse all records (node ID and node settings) and find the “BEINGVISITED” nodes (for example, A)
• Find their neighbors (B, C, …) and apply new settings:
    for B, C, …: Parent: null → A; Status: TOBEVISITED → BEINGVISITED; Distance: null → A’s distance + 1
    for A: Status: BEINGVISITED → VISITED
• The next traversal applies to these neighbors (now BEINGVISITED)
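One traversal as described above can be sketched in memory as follows. The helpers `make_nodes` and `bfs_iteration` are hypothetical, use plain dicts with the slide's status strings, and assume every neighbor ID also has its own entry; the MapReduce version spreads the same update over a mapper and a reducer.

```python
from math import inf


def make_nodes(adjacency, root):
    """Initial settings: infinite distance, TOBEVISITED, no parent; the root is BEINGVISITED."""
    nodes = {n: {"edges": list(e), "distance": inf, "status": "TOBEVISITED", "parent": None}
             for n, e in adjacency.items()}
    nodes[root].update(distance=0, status="BEINGVISITED")
    return nodes


def bfs_iteration(nodes):
    """Expand every BEINGVISITED node once; return how many nodes turned BEINGVISITED."""
    frontier = [n for n, info in nodes.items() if info["status"] == "BEINGVISITED"]
    newly_reached = 0
    for a in frontier:
        for b in nodes[a]["edges"]:
            if nodes[b]["status"] == "TOBEVISITED":
                # Parent: null -> A, Status: TOBEVISITED -> BEINGVISITED, Distance: A's distance + 1
                nodes[b].update(parent=a, status="BEINGVISITED",
                                distance=nodes[a]["distance"] + 1)
                newly_reached += 1
        nodes[a]["status"] = "VISITED"   # Status of A: BEINGVISITED -> VISITED
    return newly_reached


nodes = make_nodes({1: [2, 3], 2: [1, 3, 4, 5], 3: [1, 4, 2], 4: [2, 3], 5: [2]}, root=1)
while bfs_iteration(nodes):   # the next traversal works on the new BEINGVISITED nodes
    pass
print({n: info["distance"] for n, info in nodes.items()})
```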

Map Reduce Structure
• Preprocessing Map & Reduce: put the data into the format <nodeId, nodeInformation>
• Working Map & Reduce (recurring):
    Map: create new rows for neighbors
    Reduce: delete the old rows of neighbors, merging them with the new rows
• Postprocessing Map & Reduce: collect all the distances and compute the average
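Each stage in the structure above is an ordinary MapReduce job. To experiment with the stages without a Hadoop cluster, a hypothetical in-memory `run_job` helper can stand in for one job (map, group values by key, reduce). It is only a sketch: it ignores partitioning, combiners and input/output formats, and the example job (counting nodes per status) is illustrative rather than one of the project's jobs.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]


def run_job(records: Iterable[Pair],
            mapper: Callable[[str, str], Iterator[Pair]],
            reducer: Callable[[str, List[str]], Iterator[Pair]]) -> List[Pair]:
    """Simulate one MapReduce job: map every record, group values by key, reduce each group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in records:                    # map phase
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)     # shuffle: collect values per key
    output: List[Pair] = []
    for key in sorted(groups):                    # reducers see their keys in sorted order
        output.extend(reducer(key, groups[key]))  # reduce phase
    return output


# Example job: count how many nodes are in each status.
def status_mapper(node_id, info):
    yield info.split("|")[2], "1"


def count_reducer(status, ones):
    yield status, str(len(ones))


records = [("1", "2,3|0|VISITED|source"), ("2", "1,3,4,5|1|BEINGVISITED|1"),
           ("3", "1,4,2|1|BEINGVISITED|1"), ("4", "2,3|Inf|TOBEVISITED|null")]
print(run_job(records, status_mapper, count_reducer))
```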

Design - <k, v> evolution

1. Preprocessing Map & Reduce: List<n1, n2> → <n1, [n2, n3, n4, …]> → <n1 ID, neighbor IDs + Distance + Status + Parent Node ID>

2. Recurring Working Map & Reduce (shown in the next slide)

3. Postprocessing Map & Reduce: <ID, nodeInformation> → <Status, distance> → <Status, averageDistance>
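Step 1 of the <k, v> evolution can be sketched in memory as below. It assumes the raw input is a list of undirected edge pairs with numeric string IDs, and it folds the preprocessing Map & Reduce into one hypothetical `preprocess` function; the record layout `edges|distance|status|parent` matches the working sample. Neighbor order follows the input order, so it may differ from the order shown in the slides.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def preprocess(edge_pairs: Iterable[Tuple[str, str]], source: str) -> List[str]:
    """List<n1, n2>  ->  <n1, [n2, n3, ...]>  ->  "n1 edges|distance|status|parent"."""
    neighbors = defaultdict(list)
    for a, b in edge_pairs:
        if a == b:
            continue                      # ignore self-loops
        for u, v in ((a, b), (b, a)):     # undirected: store the edge in both directions
            if v not in neighbors[u]:
                neighbors[u].append(v)
    records = []
    for node in sorted(neighbors, key=int):   # assumes numeric node IDs
        settings = "0|BEINGVISITED|source" if node == source else "Inf|TOBEVISITED|null"
        records.append(f"{node} {','.join(neighbors[node])}|{settings}")
    return records


edges = [("1", "2"), ("1", "3"), ("2", "3"), ("2", "4"), ("2", "5"), ("3", "4")]
for line in preprocess(edges, source="1"):
    print(line)
```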

Working Map & Reduce Sample
Each record has the form: nodeId edgeList|distance|status|parentNodeId

Input (node 1 is the parent node this turn):
1 2,3|0|BEINGVISITED|source
2 1,3,4,5|Inf|TOBEVISITED|null
3 1,4,2|Inf|TOBEVISITED|null
4 2,3|Inf|TOBEVISITED|null
5 2|Inf|TOBEVISITED|null

Map output (the original records plus new records created without an edge list):
1 2,3|0|BEINGVISITED|source
2 1,3,4,5|Inf|TOBEVISITED|null
3 1,4,2|Inf|TOBEVISITED|null
4 2,3|Inf|TOBEVISITED|null
5 2|Inf|TOBEVISITED|null
1 |0|BEINGVISITED|source
2 |1|BEINGVISITED|1
3 |1|BEINGVISITED|1

Reduce output (next turn; the merged records for nodes 2 and 3 will be picked as parent nodes next turn):
1 2,3|0|VISITED|source
2 1,3,4,5|1|BEINGVISITED|1
3 1,4,2|1|BEINGVISITED|1
4 2,3|Inf|TOBEVISITED|null
5 2|Inf|TOBEVISITED|null
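Merge rules that reproduce this sample can be sketched as a plain-Python mapper and reducer. This is an illustrative reconstruction from the sample, not the project's actual code. It assumes every node has a record with a non-empty edge list (true after preprocessing an undirected edge list), which lets the reducer tell an original row from a new edge-less row. Unlike the slide's map output, the sketch does not emit an edge-less copy of the parent node itself; the reducer marks that node VISITED from its original row instead.

```python
from collections import defaultdict
from typing import Iterator, List, Tuple


def mapper(node_id: str, value: str) -> Iterator[Tuple[str, str]]:
    """Re-emit the record and create an edge-less row for each neighbor of a BEINGVISITED node."""
    edges, distance, status, _ = value.split("|")
    yield node_id, value
    if status == "BEINGVISITED":
        for neighbor in edges.split(","):
            yield neighbor, f"|{int(distance) + 1}|BEINGVISITED|{node_id}"


def reducer(node_id: str, values: List[str]) -> Tuple[str, str]:
    """Merge a node's original row with the new rows created for it."""
    original, offers = None, []
    for value in values:
        edges, distance, status, parent = value.split("|")
        if edges:                                   # only original rows carry an edge list
            original = [edges, distance, status, parent]
        else:
            offers.append((int(distance), parent))
    edges, distance, status, parent = original      # assumes every node has an original row
    if status == "BEINGVISITED":
        status = "VISITED"                          # parent node this turn
    elif status == "TOBEVISITED" and offers:
        best_distance, parent = min(offers)         # keep the shortest offered distance
        distance, status = str(best_distance), "BEINGVISITED"
    return node_id, "|".join([edges, distance, status, parent])


def run_round(lines: List[str]) -> List[str]:
    """One turn: map every record, group by node ID, reduce each group."""
    groups = defaultdict(list)
    for line in lines:
        node_id, _, value = line.partition(" ")
        for key, out in mapper(node_id, value):
            groups[key].append(out)
    return [" ".join(reducer(key, groups[key])) for key in sorted(groups, key=int)]


SAMPLE = ["1 2,3|0|BEINGVISITED|source", "2 1,3,4,5|Inf|TOBEVISITED|null",
          "3 1,4,2|Inf|TOBEVISITED|null", "4 2,3|Inf|TOBEVISITED|null",
          "5 2|Inf|TOBEVISITED|null"]
for line in run_round(SAMPLE):
    print(line)
```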

Code Snippet

Mapper for the Core Part of the Algorithm

Reducer for the Core Part of the Algorithm

Create an enum constant to represent the counter

The counter increments by 1 each time a node turns “BEINGVISITED”

Print the counter value on screen to show the number of changes
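A minimal sketch of this counter, using a Python `Enum` member as the counter name and a `collections.Counter` in place of Hadoop's job counters (in Hadoop a reducer would typically call `context.getCounter(...).increment(1)` with an enum constant, and the driver would read the value after the job finishes). The counter name `TURNED_BEINGVISITED` and the function `reduce_with_counter` are hypothetical, and the merge rules are the same assumptions as in the working sample.

```python
from collections import Counter
from enum import Enum
from typing import List, Tuple


class BfsCounter(Enum):
    TURNED_BEINGVISITED = "TURNED_BEINGVISITED"   # hypothetical counter name


def reduce_with_counter(node_id: str, values: List[str], counters: Counter) -> Tuple[str, str]:
    """Merge a node's rows and bump the counter when the node turns BEINGVISITED."""
    original, offers = None, []
    for value in values:
        edges, distance, status, parent = value.split("|")
        if edges:                       # original row (carries the edge list)
            original = [edges, distance, status, parent]
        else:                           # new row created by the mapper
            offers.append((int(distance), parent))
    edges, distance, status, parent = original
    if status == "BEINGVISITED":
        status = "VISITED"
    elif status == "TOBEVISITED" and offers:
        best_distance, parent = min(offers)
        distance, status = str(best_distance), "BEINGVISITED"
        counters[BfsCounter.TURNED_BEINGVISITED] += 1   # one more change this turn
    return node_id, "|".join([edges, distance, status, parent])


counters = Counter()
groups = {"1": ["2,3|0|BEINGVISITED|source"],
          "2": ["1,3,4,5|Inf|TOBEVISITED|null", "|1|BEINGVISITED|1"],
          "3": ["1,4,2|Inf|TOBEVISITED|null", "|1|BEINGVISITED|1"]}
for node_id, values in groups.items():
    reduce_with_counter(node_id, values, counters)
print("changes this turn:", counters[BfsCounter.TURNED_BEINGVISITED])   # shown on screen
```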

Iterating the Job
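The iteration can be sketched as a driver loop that re-runs the working job until the counter reports no new BEINGVISITED nodes. In this hypothetical sketch `run_round` stands for one working Map & Reduce job and returns the new records together with the counter value; the stopping rule (counter equals 0) and the `max_rounds` guard are assumptions, and the stub round only replays counter values instead of running a real job.

```python
from typing import Callable, List, Tuple

Records = List[str]


def iterate_job(records: Records,
                run_round: Callable[[Records], Tuple[Records, int]],
                max_rounds: int = 100) -> Tuple[Records, int]:
    """Run the working job repeatedly; stop after the first round with zero changes."""
    for round_number in range(1, max_rounds + 1):
        records, changes = run_round(records)     # output of this job is input to the next
        print(f"round {round_number}: {changes} node(s) turned BEINGVISITED")
        if changes == 0:                          # nothing new was reached: BFS is finished
            return records, round_number
    raise RuntimeError(f"no convergence after {max_rounds} rounds")


def stub_round_factory(change_counts: List[int]):
    """Stand-in for the real job: reports the given counter values one round at a time."""
    counts = iter(change_counts)

    def stub_round(records: Records) -> Tuple[Records, int]:
        return records, next(counts)

    return stub_round


# On the five-node sample graph the counter would read 2, 2 and then 0.
final_records, rounds = iterate_job(["1 2,3|0|BEINGVISITED|source"], stub_round_factory([2, 2, 0]))
print("finished after", rounds, "rounds")
```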

Result

• A list of rows containing complete information

Result – distance-only

Result – average distance

• The average distance from all nodes to the source node (node 0) is about 2.829.

• For a random person, the shortest path from Person 0 is about 2.8 hops long, i.e., it passes through roughly 1.8 intermediate persons.
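The postprocessing step (<ID, nodeInformation> → <Status, distance> → <Status, averageDistance>) can be sketched as follows. It assumes records in the working-sample layout, skips unreachable nodes (distance Inf), and, because it averages every record under its status key, includes the source's own distance 0 in the VISITED average; whether the project did the same is not stated. The example uses the five-node sample graph, not the data behind the 2.829 figure.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def post_map(node_id: str, value: str) -> Iterator[Tuple[str, int]]:
    """<ID, nodeInformation>  ->  <Status, distance>, skipping unreachable nodes."""
    _, distance, status, _ = value.split("|")
    if distance != "Inf":
        yield status, int(distance)


def post_reduce(status: str, distances: List[int]) -> Tuple[str, float]:
    """<Status, [distance, ...]>  ->  <Status, averageDistance>."""
    return status, sum(distances) / len(distances)


def average_by_status(lines: List[str]) -> Dict[str, float]:
    groups = defaultdict(list)
    for line in lines:
        node_id, _, value = line.partition(" ")
        for status, distance in post_map(node_id, value):
            groups[status].append(distance)
    return dict(post_reduce(status, ds) for status, ds in groups.items())


final = ["1 2,3|0|VISITED|source", "2 1,3,4,5|1|VISITED|1", "3 1,4,2|1|VISITED|1",
         "4 2,3|2|VISITED|2", "5 2|2|VISITED|2"]
print(average_by_status(final))   # (0 + 1 + 1 + 2 + 2) / 5
```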

Future Work

• Calculate the average distance for every node (repeating the computation with each node as the source)

• Determine betweenness centrality based on the results for all nodes

• Visualize the network (nodes with a lower average distance drawn with a bigger radius)