Accelerating MapReduce with Distributed Memory Cache
Author: Shubin Zhang, et al.
Institute of Computing Technology, Beijing, China
Reported by: Tzu-Li Tai
National Cheng Kung University, Taiwan
High Performance Parallel and Distributed Systems Lab
2009 IEEE 15th International Conference on Parallel and Distributed Systems
HPDS Lab, Institute of Computer and Communication Engineering, Electrical Engineering - NCKU
A. Background and Motivation
B. Goals and Design Decisions
C. System Overview
D. System Details
E. Experimental Results and Analysis
F. Conclusion and Future Work
G. Future Studies for Topic
H. Discussion: Our Chances
Background and Motivation
Pre-Notes:
- Published in 2009 (first paper on the topic)
- Outdated hardware/software and data sizes
- Focus on the methodology and reasoning behind using a distributed cache in Hadoop
- Learn possible tackle points and what to avoid
Background and Motivation
• Shuffle time becomes the bottleneck
[Figure: three map tasks (M) each produce intermediate data; the path to the reduce task (R) takes two steps, (1) a local write of the map output and (2) a remote read by the reducer, shown alongside HDFS pipeline replication across three HDFS nodes. The slide's GOAL marker targets shortening this shuffle path.]
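To make the bottleneck concrete, here is a toy model (not the paper's code; all names are illustrative) of Hadoop's original intermediate-data path: each map task performs a local write of its partitioned output, and each reduce task then issues one remote read per map node.

```python
# Toy model of the vanilla MapReduce shuffle path.
NUM_MAPS = 3
NUM_REDUCES = 2

# Step 1: local write -- map m spills partition r to its own node's disk
# (modeled here as a per-node dict).
local_disks = {m: {} for m in range(NUM_MAPS)}
for m in range(NUM_MAPS):
    for r in range(NUM_REDUCES):
        local_disks[m][r] = f"inter-data(map={m},part={r})"

def shuffle(r):
    """Step 2: remote read -- reduce r pulls partition r from every map node,
    so each intermediate record crosses the disk and the network once."""
    return [local_disks[m][r] for m in range(NUM_MAPS)]

fetched = shuffle(1)
# Each reducer performs NUM_MAPS remote fetches; with large intermediate
# data this fetch phase dominates job time, which is the bottleneck above.
```

A distributed memory cache attacks exactly this path by keeping the intermediate data in RAM instead of behind a disk write plus a later remote read.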
HPDS Lab, Institute of Computer and Communication Engineering, Electrical Engineering - NCKU
Goals and Design Decisions
• Target clusters are small-scale
- Bandwidth is not scarce
- Node failures are uncommon
- Commodity machines
- Heterogeneous nodes
- Gigabit Ethernet
Goals and Design Decisions
• Stay close to the original
• Retain fault-tolerance (!)
• Local decision-making
Goals and Design Decisions
• Low-latency, high-throughput access to map outputs: a global storage system
- No central coordinator
- Uniform global namespace
- Low-latency, high-throughput data access
- Concurrent access
- Large capacity
- Scalable
⇒ Distributed Memory Cache
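Two of the listed properties, a uniform global namespace and no central coordinator, can be sketched together: every node hashes a key to its owner with the same pure function, so placement is a local decision and needs no lookup service. This is a hypothetical illustration, not the paper's implementation; node names and the put/get API are assumptions.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]

def owner(key: str) -> str:
    """Deterministic hash placement: any node computes the same owner
    for a key locally, with no central coordinator to consult."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

# Per-node in-memory dicts standing in for each node's RAM.
stores = {n: {} for n in NODES}

def put(key, value):
    stores[owner(key)][key] = value

def get(key):
    return stores[owner(key)].get(key)

# Keys form one flat global namespace regardless of which node stores them.
put("job42/map-7/part-0", b"intermediate bytes")
assert get("job42/map-7/part-0") == b"intermediate bytes"
```

Capacity and throughput then scale with the number of nodes, since both data and request load spread across all members.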