Page 1:

Fast Failure Recovery in Distributed Graph Processing Systems

Presented by HaeJoon Lee

Yanyan Shen, Beng Chin Ooi, Bogdan Marius Tudor (National University of Singapore)
Wei Lu (Renmin University)
Gang Chen (Zhejiang University)
H.V. Jagadish (University of Michigan)

Big Data Final Seminar
VLDB 2014

Page 2:

Outline

1. Background
2. Motivation
3. Partition Based Recovery
4. Implementation
5. Evaluation
6. Conclusion

Page 3:

1 Background


Distributed Graph Processing System (DGPS)
- The set of vertices and edges is divided into partitions.
- The partitions are distributed among the compute nodes.

Bulk Synchronous Parallel (BSP): the computation model used in DGPS.
- Each worker first executes an input phase; the workers then process the graph iteratively in supersteps separated by a global barrier (see the sketch below).
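
As a rough illustration, here is a minimal, self-contained sketch of the BSP superstep loop. All names (bsp_run, compute, inbox/outbox) are hypothetical and not the API of Giraph or any particular system:

# Minimal BSP sketch: every superstep computes all vertices, then a
# global barrier delivers the messages produced for the next superstep.
def bsp_run(vertices, compute, max_supersteps):
    # vertices: {vid: state}
    # compute(vid, state, msgs) -> (new_state, [(dst_vid, msg), ...])
    inbox = {vid: [] for vid in vertices}
    for _ in range(max_supersteps):
        outbox = {vid: [] for vid in vertices}
        for vid in list(vertices):
            new_state, outgoing = compute(vid, vertices[vid], inbox[vid])
            vertices[vid] = new_state
            for dst, msg in outgoing:
                outbox[dst].append(msg)
        inbox = outbox  # global barrier: messages become visible next superstep
    return vertices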

Page 4:

1 Background


Why do you think we need fast recovery?

Scaling the number of compute nodes has two effects:
- It increases the number of node failures during job execution.
- System progress stops during recovery, so many nodes can become idle.

For these reasons, we need an efficient failure recovery mechanism.

Page 5:

2 Motivation


Checkpoint-Based Recovery (CBR) flow:
- Requires all nodes to write their status to stable storage as a checkpoint.
- On failure, uses the healthy nodes to load the status from the last checkpoint.
- Re-executes all of the missing workload.

However, CBR incurs high recovery latency:
- It re-executes the missing workload over the whole graph, on the failed and even the healthy nodes (a toy sketch follows).
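
A toy model of this flow, assuming every node rolls back to the last checkpoint and the whole cluster replays every missing superstep; all names are illustrative:

# Toy CBR model: ALL nodes, healthy or failed, reload the last
# checkpoint and replay every superstep up to the failure point.
def cbr_recover(checkpoint, failed_step, replay):
    # checkpoint: (step, {node: state}); replay(state, step) -> state
    ckpt_step, states = checkpoint
    states = dict(states)  # every node reloads its state from storage
    for step in range(ckpt_step + 1, failed_step + 1):
        states = {node: replay(s, step) for node, s in states.items()}
    return states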

Page 6:

2 Motivation


The cascading failure problem:
- Definition: failures can occur at any time during execution, including while the system is still recovering from an earlier failure (a cascading failure).
- Coping with this by checkpointing frequently incurs long execution time.

This motivates Fast Failure Recovery, i.e., Partition-Based Recovery (PBR).

Page 7:

Outline

1. Background
2. Motivation
3. Partition Based Recovery
4. Implementation
5. Evaluation
6. Conclusion

Page 8:

3 Partition Based Recovery


Execution flow:
- Restrict recovery to the subgraph held by the failed nodes, using locally logged messages.
- Divide that subgraph into partitions.
- Distribute these partitions among the compute nodes.
- Reload these partitions from the last checkpoint and rebalance them.

What is local message logging in PBR?
- PBR requires every node to log its outgoing messages at the end of each superstep.
- During recovery, every healthy node forwards its logged messages to the vertices in the failed partitions (see the sketch below).
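
To contrast with the CBR toy model above, here is an equally rough sketch of PBR under the same assumptions: only the failed partitions reload the checkpoint, while healthy partitions contribute their logged messages. All names are illustrative, not the paper's implementation:

# Toy PBR model: only FAILED partitions reload the checkpoint and
# recompute; healthy partitions just supply their logged messages.
def pbr_recover(checkpoint, failed_step, failed_parts,
                healthy_states, logged_msgs, replay):
    # checkpoint: (step, {partition: state})
    # logged_msgs: {step: {failed_partition: [msgs logged by healthy nodes]}}
    # replay(state, step, msgs) -> state
    ckpt_step, ckpt_states = checkpoint
    states = {p: ckpt_states[p] for p in failed_parts}
    for step in range(ckpt_step + 1, failed_step + 1):
        incoming = logged_msgs.get(step, {})
        states = {p: replay(s, step, incoming.get(p, []))
                  for p, s in states.items()}
    return {**healthy_states, **states}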

Page 9:

3 CBR vs PBR


Checkpoint-Based Recovery
(figure: partitions A-F spread over nodes N1 and N2, with checkpointed copies A'-F' in each node's storage; scenario: node N2 fails)

Each node's storage holds a checkpoint. If N2 fails, every node must reload from its checkpoint and recompute, so CBR incurs HIGH computation cost and communication cost.

Page 10:

3 CBR vs PBR


Partition-Based Recovery
(figure: partitions A-F on nodes N1 and N2; scenario: node N2 fails)

If N2 fails, only its partitions are recovered, and they are redistributed among the surviving nodes so recovery runs in parallel.

Page 11:

3 Details of PBR


(figure: failed partitions A-F being reassigned across nodes N1 and N2)

1. Partition reassignment
- Start from a randomly generated assignment of the failed partitions.
- In each iteration, compute the cost of the generated assignment.
- Track the minimal cost seen so far.
- After checking the generated assignments, keep the one with minimal cost as the (near-)optimal partition assignment; a search sketch follows below.
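
A minimal sketch of that search, assuming random candidate generation and a caller-supplied cost function; the paper's actual cost model (which weighs recomputation and communication) is not reproduced here:

import random

# Illustrative reassignment search: generate random assignments of the
# failed partitions to nodes and keep the cheapest one found.
def reassign(failed_parts, nodes, cost, iterations=1000):
    # cost(assignment) -> float, where assignment: {partition: node}
    best_assign, best_cost = None, float("inf")
    for _ in range(iterations):
        candidate = {p: random.choice(nodes) for p in failed_parts}
        c = cost(candidate)
        if c < best_cost:
            best_assign, best_cost = candidate, c
    return best_assign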

Page 12:

3 Details of PBR


2. Recomputing the missing workload
(figure: partitions A-F across nodes N1 and N2 at supersteps 11 and 12; legend: locally logged messages, vertices recomputed from the checkpoint, failed vs. healthy partitions)

Scenario: node N2 fails in superstep 12, and the latest checkpoint was taken at superstep 11.
- The failed partitions (A,B) and (C,D) load their state from the checkpoint at superstep 11.
- The healthy partitions forward their locally logged messages to the vertices in the failed partitions.

A worked instance of this scenario follows.
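
For concreteness, here is the scenario run through the pbr_recover toy model from the PBR execution-flow section above; every value is illustrative:

# Checkpoint at superstep 11; N2's partitions (A,B) and (C,D) fail in
# superstep 12; the healthy partition group (E,F) supplies logged messages.
checkpoint = (11, {"A,B": "state@11", "C,D": "state@11"})
logged = {12: {"A,B": ["logged msg from E,F"],
               "C,D": ["logged msg from E,F"]}}

def replay(state, step, msgs):
    return f"{state} -> recomputed@{step} using {len(msgs)} logged msg(s)"

print(pbr_recover(checkpoint, 12, ["A,B", "C,D"],
                  {"E,F": "state@12"}, logged, replay))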

Page 13:

3 Details of PBR


3. Rebalancing
- Rebalance the partition-to-node configuration if it differs across nodes.

How does PBR handle cascading failures?
- Unlike CBR, PBR treats a cascading failure as a normal failure by simply re-executing these three steps (a control-flow sketch follows).
- In practice, failures do not occur very frequently, so this is affordable.
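
A minimal control-flow sketch of that policy, assuming hypothetical hooks for failure detection and the three PBR steps:

# Cascading failures handled as ordinary failures: if a new failure is
# detected while recovering, simply run the three PBR steps again.
# `detect_new_failure` and `run_pbr_steps` are hypothetical hooks.
def recover_until_stable(detect_new_failure, run_pbr_steps):
    while True:
        run_pbr_steps()        # 1. reassign, 2. recompute, 3. rebalance
        if not detect_new_failure():
            return             # recovery finished with no new failure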

Page 14:

4 PBR Architecture on Giraph


Master - computes the partition assignment ('Assign Partitions') as the recovery plan and saves it to ZooKeeper.

ZooKeeper - a centralized service for maintaining configuration information and naming, and for providing distributed synchronization.

Slaves - fetch the partition assignment from ZooKeeper.

If (slaves are at a checkpointing superstep): they write a checkpoint and then perform the computation.
Else if (slaves are restarting after a failure): they load their partitions and then perform the computation (see the sketch below).
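
A self-contained sketch of the slave-side control flow described above. Every helper here is a hypothetical stand-in (stubbed so the example runs), not Giraph's or ZooKeeper's actual API:

# Hypothetical stubs standing in for real Giraph/ZooKeeper calls.
def fetch_partitions_from_zookeeper(): return ["P1", "P2"]             # stub
def load_partitions(parts): return {p: "from checkpoint" for p in parts}  # stub
def write_checkpoint(state): pass                                      # stub
def compute_superstep(state, step): return state                       # stub

def slave_superstep(state, step, restarting, checkpoint_interval=10):
    partitions = fetch_partitions_from_zookeeper()  # the recovery plan
    if restarting:
        state = load_partitions(partitions)   # reload after a failure
    elif step % checkpoint_interval == 0:
        write_checkpoint(state)               # checkpoint, then compute
    return compute_superstep(state, step)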

Page 15:

Outline

1. Background
2. Motivation
3. Partition Based Recovery
4. Implementation
5. Evaluation
6. Conclusion

Page 16:

5 Experimental Setup: CBR vs PBR

Benchmarks - K-means, semi-clustering, and PageRank.
- All tasks run for 20 supersteps.
- A checkpoint is taken at the beginning of superstep 11.

Cluster - 72 compute nodes.
- Each node: Intel X3430 2.4 GHz, 8 GB memory, 2 x 500 GB HDDs.
- Giraph with PBR runs as a MapReduce job on Hadoop.

Dataset - (table not preserved in this transcript; the PageRank experiments use the Friendster graph)


Page 17:

5 Evaluation: K-means, CBR vs PBR


< Checkpoint at the beginning of superstep 11 >

Recovery time: PBR outperforms CBR by a factor of 12.4 to 25.7. The recovery times of both approaches increase linearly.

Overall execution time: PBR takes almost the same time as CBR.
- In K-means, vertices send no outgoing messages to other vertices, so there is little to log.
- Checkpointing time is negligible compared to computing the new cluster memberships.

Page 18:

5 Evaluation: K-means, CBR vs PBR


These experiments verify the effectiveness of PBR, which parallelizes the recovery computation and eliminates unnecessary recovery cost.

< Checkpoint at the beginning of superstep 11 >

PBR outperforms CBR by a factor of 6.8 to 23.9.
- CBR's cost barely changes no matter how many nodes fail, because all nodes must reload the checkpoint and redo the whole computation.

PBR reduces recovery time by a factor of 23.8 to 26.8 compared with CBR.

Page 19:

5 Evaluation: PageRank, CBR vs PBR


< Checkpoint at the beginning of superstep 11 >

Checkpointing / overall execution time: PBR takes slightly more time than CBR.
- Friendster has a power-law link distribution, so message volume is high.
- Each superstep involves logging a large number of outgoing messages via disk I/O.

Page 20:

5 Evaluation: PageRank, CBR vs PBR


These experiments verify the effectiveness of PBR, which parallelizes the recovery computation and eliminates unnecessary recovery cost.

< Checkpoint at the beginning of superstep 11 >

Page 21:

6 Conclusion


Partition-based recovery is proposed as a novel recovery mechanism that parallelizes failure recovery processing.

The system distributes the recovery task across multiple compute nodes so that recovery can be executed concurrently.

It is implemented on the widely used Giraph system and observed to outperform the existing checkpoint-based recovery scheme by up to 30 times.

Page 22:

Thanks

Page 23:

6 Backup: Semi-Clustering


Page 24:

6 PBR Architecture on Giraph


Master - computes the partition assignment ('Assign Partitions') as the recovery plan and saves it to ZooKeeper.

Slaves fetch the partition assignment from ZooKeeper.
- If they are at a checkpointing superstep, they write a checkpoint and then perform the computation.
- If they are restarting after a failure, they load their partitions and then perform the computation.

Page 25:

6 Backup: Communication Cost of PageRank
