Fast Failure Recovery in Distributed Graph Processing Systems
Presented by HaeJoon Lee
Authors: Yanyan Shen, Beng Chin Ooi, Bogdan Marius Tudor (National University of Singapore); Wei Lu (Renmin University); Gang Chen (Zhejiang University); H.V. Jagadish (University of Michigan)
Big Data Final Seminar, VLDB 2014
Outline
Background
Motivation
Partition Based Recovery
Implementation
Evaluation
Conclusion
1 Background
Distributed Graph Processing System (DGPS)
- The set of vertices and edges is divided into partitions.
- The partitions are distributed among compute nodes.
Bulk Synchronous Parallel (BSP): the computation model used in DGPS.
Each worker first executes an input phase; the workers then process iteratively in supersteps separated by a global barrier.
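The BSP loop above can be sketched as follows. This is a minimal illustration, not Giraph's actual API; the Worker class and method names are assumptions for the example.

```python
# Minimal sketch of BSP: workers compute in lockstep supersteps,
# separated by a global barrier. In a real system the workers run in
# parallel on different compute nodes.

class Worker:
    def __init__(self, partitions):
        self.partitions = partitions    # vertex partitions assigned to this worker
        self.supersteps_done = 0

    def compute(self, superstep):
        # Stand-in for the per-vertex compute + message exchange of the real system.
        self.supersteps_done += 1

def run_bsp(workers, max_supersteps):
    # Input phase would load partitions here; omitted in this sketch.
    for superstep in range(max_supersteps):
        for w in workers:               # conceptually parallel
            w.compute(superstep)
        # Global barrier: no worker starts superstep s+1 until all finish s.
    return workers

workers = run_bsp([Worker(["A", "B"]), Worker(["C", "D"])], max_supersteps=3)
print([w.supersteps_done for w in workers])  # -> [3, 3]
```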
Scaling up the number of nodes has two effects:
- It increases the number of failed nodes during job execution.
- System progress stops during recovery, so many nodes may become idle.
For these reasons, we need an efficient failure recovery system.
Why do you think?
2 Motivation
Checkpoint Based Recovery (CBR) flow:
- Requires nodes to write their status to storage as a checkpoint.
- Uses healthy nodes to load the status from the last checkpoint.
- Re-executes all the missing workloads.
However, CBR causes high recovery latency: it re-executes the missing workloads over the whole graph, on failed and even healthy nodes.
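The cost structure of CBR can be made concrete with a small sketch. The function below is illustrative (not from the paper): it lists the (node, superstep) pairs that must be re-executed after a failure, showing that healthy nodes redo work too.

```python
# Sketch of checkpoint-based recovery (CBR). On a failure at superstep s,
# ALL nodes reload the last checkpoint c and re-execute supersteps c+1..s,
# even the nodes that never failed.

def cbr_reexecution(nodes, last_checkpoint, failure_superstep):
    work = []
    for node in nodes:                              # healthy nodes included
        for s in range(last_checkpoint + 1, failure_superstep + 1):
            work.append((node, s))                  # (node, superstep) re-executed
    return work

# Nodes N1, N2; checkpoint at superstep 11; N2 fails at superstep 12:
print(cbr_reexecution(["N1", "N2"], 11, 12))  # -> [('N1', 12), ('N2', 12)]
```

Even though only N2 failed, N1 appears in the re-executed work, which is the source of CBR's high latency.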
Problem of cascading failures:
- By definition, a failure can occur during normal execution at any time.
- Frequent checkpointing incurs long execution time.
The paper proposes Fast Failure Recovery (Partition Based Recovery, PBR).
3 Partition Based Recovery
Execution flow:
- Restrict recovery to the subgraph on the failed nodes only, using logged messages.
- Divide the subgraphs on the failed nodes into partitions.
- Distribute these partitions among the compute nodes.
- Reload these partitions from the last checkpoint and rebalance them.
What is local message logging in PBR?
- PBR requires every node to log its outgoing messages at the end of each superstep.
- During recovery, every healthy node forwards the logged messages to vertices in failed partitions.
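The execution flow above can be sketched as a small redistribution routine. This is a simplified illustration under assumed data structures (a dict mapping node to partitions), not the paper's implementation; the round-robin spread stands in for the cost-based reassignment described later.

```python
# Sketch of PBR's recovery flow: only the partitions on the failed node are
# recovered, and they are spread over the healthy nodes so that
# recomputation runs in parallel.

def pbr_recover(assignment, failed_node, healthy_nodes):
    # assignment: dict mapping node -> list of partitions it hosts
    failed_partitions = assignment.pop(failed_node)

    # Redistribute the failed partitions round-robin over healthy nodes.
    # Each healthy node will reload only its newly assigned partitions from
    # the last checkpoint and replay the locally logged messages for them.
    for i, p in enumerate(failed_partitions):
        target = healthy_nodes[i % len(healthy_nodes)]
        assignment[target].append(p)
    return assignment

assignment = {"N1": ["A", "B"], "N2": ["C", "D"], "N3": ["E", "F"]}
print(pbr_recover(assignment, failed_node="N2", healthy_nodes=["N1", "N3"]))
# -> {'N1': ['A', 'B', 'C'], 'N3': ['E', 'F', 'D']}
```

Note that the healthy partitions A, B, E, F are never recomputed; only C and D are.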
3 CBR vs PBR
Checkpoint Based Recovery
< Figure: partitions A, B, C, D, E, F on nodes N1 and N2; each node's storage holds the checkpointed copies A′–F′. If N2 fails, every node reloads from its checkpoint. >
CBR incurs HIGH computation cost and communication cost.
Partition Based Recovery
< Figure: the same partitions A–F on nodes N1 and N2. If N2 fails, only the partitions on N2 are recovered. >
3 Details of PBR
1. Partition reassignment:
- Randomly generate assignments of the failed partitions.
- In each iteration, compute the cost of the generated assignment.
- Track the minimal cost seen so far.
- Pick the optimal assignment, i.e., the generated one with minimal cost.
< Figure: the failed partitions redistributed across N1 and N2 under the optimal assignment. >
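The generate-score-keep-best loop above can be sketched as follows. The cost model here (partitions on the busiest node, as a proxy for recovery time) is a simplifying assumption for illustration, not the paper's exact cost function.

```python
import random

# Sketch of the reassignment step: repeatedly generate a random candidate
# placement of the failed partitions over the healthy nodes, score it, and
# keep the cheapest plan seen.

def reassign(failed_partitions, healthy_nodes, iterations=100, seed=0):
    rng = random.Random(seed)                  # fixed seed for reproducibility
    best_plan, best_cost = None, float("inf")
    for _ in range(iterations):
        # Random candidate: each failed partition goes to a random healthy node.
        plan = {n: [] for n in healthy_nodes}
        for p in failed_partitions:
            plan[rng.choice(healthy_nodes)].append(p)
        # Proxy cost: parallel recovery is bounded by the busiest node.
        cost = max(len(ps) for ps in plan.values())
        if cost < best_cost:                   # track the minimal cost
            best_plan, best_cost = plan, cost
    return best_plan

plan = reassign(["C", "D", "E", "F"], ["N1", "N3"])
print(plan)  # a balanced plan: two failed partitions per healthy node
```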
2. Forwarding logged messages: each healthy partition (e.g., D) forwards its locally logged messages to the vertices in failed partitions.
< Figure: partitions A–F in supersteps 11 and 12; N2 fails in superstep 12, and the latest checkpoint is at the beginning of superstep 11. Failed partitions recompute their vertices from the checkpoint, while healthy partitions only replay locally logged messages. Legend: failed / healthy, locally logged message, compute vertices from checkpoint. >
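The replay in the figure can be sketched as follows, an illustrative simulation (names assumed, not the paper's code): failed partitions recompute every superstep from the checkpoint onward, while healthy partitions only forward the messages they logged.

```python
# Sketch of log-based replay during PBR recovery. Failed partitions redo
# supersteps checkpoint..failure; healthy partitions are NOT recomputed,
# they only replay their locally logged outgoing messages.

def replay(failed_parts, healthy_parts, checkpoint_step, failure_step):
    actions = []
    for s in range(checkpoint_step, failure_step + 1):
        for p in healthy_parts:
            actions.append((s, p, "forward logged messages"))
        for p in failed_parts:
            actions.append((s, p, "recompute from loaded state"))
    return actions

# Partition C (on failed N2) recovers; D stayed healthy.
# Checkpoint at superstep 11, failure in superstep 12:
for a in replay(["C"], ["D"], checkpoint_step=11, failure_step=12):
    print(a)
# -> (11, 'D', 'forward logged messages')
#    (11, 'C', 'recompute from loaded state')
#    (12, 'D', 'forward logged messages')
#    (12, 'C', 'recompute from loaded state')
```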
3. Rebalance the configuration if it differs across nodes.
How does PBR handle cascading failures?
- Unlike CBR's handling, PBR treats a cascading failure as a normal failure by executing the same three steps.
- In practice, failures do not occur very frequently.
4 PBR Architecture on Giraph
Master: creates the partition assignment as the recovery plan and saves it to ZooKeeper.
ZooKeeper: a centralized service for maintaining configuration information and naming, and for providing distributed synchronization.
Slaves: fetch their partitions from ZooKeeper.
If the slaves are in a checkpointing step, they write the checkpoint and then perform computation.
Else, if the slaves are restarting after a failure, they load their partitions and then perform computation.
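The master/slave coordination above can be sketched as follows. The in-memory dict standing in for ZooKeeper and all function names are illustrative assumptions, not Giraph's real API.

```python
# Sketch of PBR coordination on Giraph: the master publishes a recovery
# plan (partition -> node) to a coordination service; each slave fetches
# its share and decides what to do in the current superstep.

zookeeper = {}  # stands in for the real ZooKeeper service

def master_publish_plan(plan):
    zookeeper["recovery_plan"] = plan          # master saves the plan

def slave_step(node, is_checkpoint_step, restarting):
    # Fetch the partitions assigned to this node from the published plan.
    partitions = [p for p, n in zookeeper["recovery_plan"].items() if n == node]
    if restarting:
        action = "load checkpoint + logs, then compute"
    elif is_checkpoint_step:
        action = "checkpoint, then compute"
    else:
        action = "compute"
    return node, partitions, action

master_publish_plan({"C": "N1", "D": "N3"})
print(slave_step("N1", is_checkpoint_step=False, restarting=True))
# -> ('N1', ['C'], 'load checkpoint + logs, then compute')
```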
5 Experimental Setup: CBR vs PBR
Benchmarks: K-means, Semi-clustering, and PageRank.
- All tasks run for 20 supersteps.
- A checkpoint is performed at the beginning of superstep 11.
PBR outperforms CBR by 12.4x to 25.7x; the recovery time of both increases linearly.
For K-means, PBR takes almost the same time as CBR:
- There are no outgoing messages between vertices in different partitions in K-means.
- The checkpointing time is negligible compared to computing the new cluster memberships.
5 Evaluation: K-means, CBR vs PBR
These experiments verify the effectiveness of PBR, which parallelizes computation and eliminates unnecessary recovery cost.
< Checkpoint at the beginning of superstep 11 >
PBR outperforms CBR by 6.8x to 23.9x. In CBR, recovery time does not depend on how many nodes fail, because all nodes have to reload and redo all computation.
PBR can reduce recovery time by a factor of 23.8 to 26.8 compared with CBR.
5 Evaluation: PageRank, CBR vs PBR
Here PBR takes slightly more time than CBR:
- The Friendster graph has power-law links.
- Each superstep involves forwarding a number of logged messages via disk I/O.
6 Conclusion
Partition-based recovery is proposed as a novel recovery mechanism that parallelizes failure recovery processing.
The system distributes the recovery task to multiple compute nodes so that recovery processing can be executed concurrently.
It is implemented on the widely used Giraph system and is observed to outperform the existing checkpoint-based recovery system by up to 30 times.
Thanks
6 Backup: Semi-Clustering
6 Backup: Communication Cost of PR