HAMA: An Efficient Matrix Computation with the MapReduce Framework Sangwon Seo, Edward J. Woon, Jaehong Kim, Seongwook Jin, Jin-soo Kim, Seungryoul Maeng IEEE 2007 Dec 3, 2014 Kyung-Bin Lim

Jan 19, 2016


HAMA: An Efficient Matrix Computation with the MapReduce Framework

Sangwon Seo, Edward J. Woon, Jaehong Kim, Seongwook Jin, Jin-soo Kim, Seungryoul Maeng
IEEE 2007

Dec 3, 2014
Kyung-Bin Lim


Outline

Introduction Methodology Experiments Conclusion


Apache HAMA

Easy-to-use tool for data-intensive scientific computation

Massive matrix/graph computations are often used as primary functionalities

Fundamental design has changed from MapReduce with matrix computation to BSP with graph processing

Mimics Pregel, running on HDFS
– Uses ZooKeeper as a synchronization barrier
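
The BSP model mentioned above alternates communication and local computation, separated by a barrier. A minimal sketch of that pattern follows; it assumes Python's `threading.Barrier` as a stand-in for the ZooKeeper-based barrier, and the function `run_bsp` and its ring-style message passing are hypothetical illustrations, not HAMA's API.

```python
import threading

def run_bsp(num_workers, supersteps):
    """Each worker adds the value received from its left neighbour, once per superstep."""
    # Assumption: an in-process barrier stands in for a ZooKeeper-based one.
    barrier = threading.Barrier(num_workers)
    values = list(range(1, num_workers + 1))
    inbox = [[] for _ in range(num_workers)]

    def worker(rank):
        for _ in range(supersteps):
            # Communication: send the current value to the right neighbour.
            inbox[(rank + 1) % num_workers].append(values[rank])
            barrier.wait()  # nobody reads until every message has been sent
            # Local computation on the received messages.
            values[rank] += sum(inbox[rank])
            inbox[rank].clear()
            barrier.wait()  # nobody sends again until every inbox is drained

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return values

result = run_bsp(3, 2)
```

The two `barrier.wait()` calls are what make the result independent of thread scheduling: reads and writes to a given inbox never happen in the same phase.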


Our Focus

This paper describes an earlier version of HAMA, 0.1
– Latest version: 0.7.0, released Mar. 2014

Focuses only on matrix computation with MapReduce

Shows simple case studies


The HAMA Architecture

We propose a distributed scientific framework called HAMA (based on HPMR)
– Provides transparent matrix/graph primitives


The HAMA Architecture

HAMA API: Easy-to-use interface

HAMA Core: Provides matrix/graph primitives

HAMA Shell: Interactive user console


Contributions of HAMA

Compatibility
– Takes advantage of all Hadoop features

Scalability
– Scalable due to compatibility

Flexibility
– Multiple compute engines are configurable

Applicability
– HAMA’s primitives can be applied to various applications


Outline

Introduction Methodology Experiments Conclusion


Case Study

Using a case-study approach, we introduce two basic primitives implemented with the MapReduce model on HAMA
– Matrix multiplication and finding a linear solution

We then compare them with MPI versions of these primitives


Case Study

Representing matrices
– By default, HAMA uses HBase (a NoSQL database)

HBase is modeled after Google’s Bigtable

Column-oriented, semi-structured distributed database with high scalability
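
To make the storage idea concrete, the sketch below models a matrix as a row-keyed map of column-keyed values, which is the general shape of a column-oriented table. The slides do not show HAMA's actual table schema, so the class `RowColumnStore` and its methods are hypothetical and use no HBase API.

```python
from collections import defaultdict

class RowColumnStore:
    """In-memory stand-in for a column-oriented table: row key -> {column key -> value}."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def put(self, i, j, value):
        # Zero entries are simply not stored, so sparse rows stay small.
        if value != 0:
            self.rows[i][j] = value

    def get(self, i, j):
        # Missing cells read back as zero.
        return self.rows.get(i, {}).get(j, 0.0)

    def row(self, i):
        # Return a copy of one row's nonzero cells.
        return dict(self.rows.get(i, {}))

store = RowColumnStore()
store.put(0, 0, 1.5)
store.put(0, 3, 2.0)
store.put(2, 1, 0)      # ignored: zero entry
```

Reading a whole row by key, as `row(i)` does here, is the access pattern a row-wise matrix algorithm would rely on.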


Case Study – Multiplication: Iterative Way

Iterative approach (Algorithm)


Case Study – Multiplication: Iterative Way

Simple, naïve strategy

Works well with sparse matrix

Sparse matrix: most entries are 0
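
The algorithm figure from the slides is not reproduced in this transcript. As a rough illustration of why a simple strategy suits sparse inputs, here is a generic map/reduce-style sketch, assuming matrices stored as `{(row, col): value}` dictionaries of nonzeros. The names `map_phase`, `reduce_phase`, and `multiply_sparse` are hypothetical and are not taken from HAMA.

```python
from collections import defaultdict

def map_phase(a, b):
    """Emit ((i, j), a_ik * b_kj) for every pair of nonzeros that share index k."""
    # Index the nonzeros of B by row so each nonzero of A meets only matching rows.
    b_rows = defaultdict(list)
    for (k, j), b_kj in b.items():
        b_rows[k].append((j, b_kj))
    for (i, k), a_ik in a.items():
        for j, b_kj in b_rows.get(k, []):
            yield (i, j), a_ik * b_kj

def reduce_phase(pairs):
    """Sum the partial products that target the same output cell (i, j)."""
    c = defaultdict(float)
    for key, value in pairs:
        c[key] += value
    return dict(c)

def multiply_sparse(a, b):
    return reduce_phase(map_phase(a, b))

A = {(0, 0): 1.0, (0, 2): 2.0, (1, 1): 3.0}
B = {(0, 1): 4.0, (1, 0): 5.0, (2, 1): 6.0}
C = multiply_sparse(A, B)  # {(0, 1): 16.0, (1, 0): 15.0}
```

Only nonzero pairs generate work, so the cost follows the number of nonzeros rather than the full matrix dimensions.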

[Figures: Multiplication: Iterative Way (six slides; images not included in the transcript)]

Case Study – Multiplication: Block Way

Multiplication can be done using sub-matrix

Works well with dense matrix


Case Study – Multiplication: Block Way

Block approach
– Minimizes data movement (network cost)


Case Study – Multiplication: Block Way

Block Approach (Algorithm)
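
The slide's algorithm listing is not part of this transcript. The sketch below shows the general sub-matrix idea only: each output block is the sum of products of input blocks, so a worker needs whole blocks rather than individual rows and columns. It assumes square matrices whose size is divisible by the block size; all function names are hypothetical.

```python
def split_blocks(m, bs):
    """Split a square list-of-lists matrix into {(bi, bj): block} with bs x bs blocks."""
    nb = len(m) // bs
    return {
        (bi, bj): [row[bj * bs:(bj + 1) * bs] for row in m[bi * bs:(bi + 1) * bs]]
        for bi in range(nb)
        for bj in range(nb)
    }

def matmul(x, y):
    """Plain dense product, used both on blocks and as a reference result."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def add(x, y):
    """Element-wise sum of two equally sized blocks."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

def block_multiply(a, b, bs):
    """Compute C[bi, bj] = sum over bk of A[bi, bk] * B[bk, bj], then stitch C together."""
    n = len(a)
    nb = n // bs
    ab, bb = split_blocks(a, bs), split_blocks(b, bs)
    cb = {}
    for bi in range(nb):
        for bj in range(nb):
            acc = [[0] * bs for _ in range(bs)]
            for bk in range(nb):
                acc = add(acc, matmul(ab[bi, bk], bb[bk, bj]))
            cb[bi, bj] = acc
    # Reassemble the full matrix from its blocks.
    return [[cb[i // bs, j // bs][i % bs][j % bs] for j in range(n)] for i in range(n)]

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[2, 0, 1, 0], [0, 1, 0, 3], [4, 0, 1, 0], [0, 2, 0, 1]]
C = block_multiply(A, B, 2)
```

In a distributed setting, each `(bi, bj)` iteration could be assigned to a separate task, which is where the reduction in data movement would come from.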


Case Study – Finding Linear Solution

Ax = b
– x = ?

A: known square, symmetric, positive-definite matrix

b: known vector

Use Conjugate Gradient approach


Case Study – Finding Linear Solution

Conjugate Gradient Method
– Find a direction (conjugate direction)
– Find a step size (line search)
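
For reference, the two steps correspond to the standard textbook recurrences for a symmetric positive-definite A. The notation (residual r, direction p, step size α, coefficient β) is the conventional one and is assumed here, since the slide's own formulas are not in the transcript.

```latex
\begin{aligned}
r_0 &= b - A x_0, \qquad p_0 = r_0 \\
\alpha_k &= \frac{r_k^{\top} r_k}{p_k^{\top} A p_k} && \text{(step size from exact line search)} \\
x_{k+1} &= x_k + \alpha_k p_k \\
r_{k+1} &= r_k - \alpha_k A p_k \\
\beta_k &= \frac{r_{k+1}^{\top} r_{k+1}}{r_k^{\top} r_k} \\
p_{k+1} &= r_{k+1} + \beta_k p_k && \text{(next conjugate direction)}
\end{aligned}
```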


Case Study – Finding Linear Solution

Conjugate Gradient Method (Algorithm)
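
The slide's algorithm listing is missing from this transcript. A minimal single-machine sketch of the standard conjugate gradient iteration is given here instead; it is not HAMA's MapReduce implementation. The helper names `dot`, `matvec`, and `conjugate_gradient`, and the `tol` and `max_iter` defaults, are assumptions for illustration.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(a, v):
    # Assumption: the matrix-vector product is the step a distributed version would parallelize.
    return [dot(row, v) for row in a]

def conjugate_gradient(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for a symmetric positive-definite matrix a."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - a x, with x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break        # residual norm is small enough
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)                        # step size
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]  # next direction
        rs_old = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)   # exact solution is [1/11, 7/11]
```

Each iteration needs one matrix-vector product and a few vector operations, which is why the method maps naturally onto repeated distributed jobs.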


Outline

Introduction Methodology Experiments Conclusion


Evaluations

TUSCI (TU Berlin SCI) Cluster
– 16 nodes, two Intel P4 Xeon processors, 1 GB memory
– Connected with an SCI (Scalable Coherent Interface) network interface in a 2D torus topology
– Running in OpenCCS (an environment similar to HOD)

Test sets


HPMR’s Enhancements

Prefetching
– Increases data locality

Pre-shuffling
– Reduces the amount of intermediate output to shuffle


Evaluations

The comparison of average execution time and scaleup with Matrix Multiplication


Evaluations

The comparison of average execution time and scaleup with CG


Evaluations

The comparison of average execution time with CG, when a single node is overloaded


Outline

Introduction Methodology Experiments Conclusion


Conclusion

HAMA provides an easy-to-use tool for data-intensive computations
– Matrix computation with MapReduce
– Graph computation with BSP