Investigating Distributed Caching Mechanisms for Hadoop Gurmeet Singh Puneet Chandra Rashid Tahir

Mar 31, 2015


Brayan Drury

Investigating Distributed Caching Mechanisms for Hadoop

Gurmeet Singh, Puneet Chandra, Rashid Tahir


GOAL

• Explore the feasibility of a distributed caching mechanism inside Hadoop


Presentation Overview

• Motivation
• Design
• Experimental Results
• Future Work


Motivation

• Disk Access Times are a bottleneck in cluster computing

• Large amount of data is read from disk
• DARE
• RAMClouds
• PACMan – Coordinated Cache Replacement

We want to strike a balance between RAM and Disk Storage


Our Approach

• Integrate Memcached with Hadoop
• Used Quickcached and Spymemcached
• Reserve a portion of the main memory at each node to serve as local cache
• Local caches aggregate to abstract a distributed caching mechanism governed by Memcached
• Greedy caching strategy
• Least Recently Used (LRU) cache eviction policy
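The greedy caching strategy and LRU eviction listed above can be illustrated with a minimal sketch. It is not the presenters' implementation: the class name `LRUBlockCache`, the block identifiers, and the capacity measured in blocks are assumptions made for illustration, and "greedy" is read here as "admit every block that is requested".

```python
from collections import OrderedDict


class LRUBlockCache:
    """Hypothetical per-node block cache with least-recently-used eviction."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self._blocks = OrderedDict()  # block_id -> data, least recent first

    def get(self, block_id):
        """Return cached data, or None on a miss. A hit refreshes recency."""
        if block_id not in self._blocks:
            return None
        self._blocks.move_to_end(block_id)
        return self._blocks[block_id]

    def put(self, block_id, data):
        """Greedy admission: always cache, evicting the LRU block if full."""
        self._blocks[block_id] = data
        self._blocks.move_to_end(block_id)
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # drop the least recently used

    def __contains__(self, block_id):
        return block_id in self._blocks


cache = LRUBlockCache(capacity_blocks=2)
cache.put("blk_1", b"a")
cache.put("blk_2", b"b")
cache.get("blk_1")        # blk_1 becomes the most recently used block
cache.put("blk_3", b"c")  # cache is full, so blk_2 is evicted
```

In the design described on the slide, one such cache per node would sit behind Memcached, which presents the local caches as a single distributed cache.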


Design Overview


Memcached


Design Choice 1

• Simultaneous requests to Namenode and Memcached

Minimizes access latency at the cost of additional network overhead
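A rough sketch of this choice, assuming the client issues both lookups concurrently and prefers the cached copy when there is one. The dictionaries `CACHE` and `NAMENODE` and the function names are hypothetical stand-ins for the real services, not Hadoop or Memcached APIs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the two services.
CACHE = {"blk_1": "node-a"}                    # block -> node caching it in RAM
NAMENODE = {"blk_1": ["node-b", "node-c"],     # block -> nodes holding disk replicas
            "blk_2": ["node-c"]}


def cache_lookup(block_id):
    return CACHE.get(block_id)


def namenode_lookup(block_id):
    return NAMENODE[block_id]


def locate_parallel(block_id):
    """Send both requests at once; use the cache answer if it is a hit."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        cached = pool.submit(cache_lookup, block_id)
        replicas = pool.submit(namenode_lookup, block_id)
        node = cached.result()
        if node is not None:
            return ("cache", node)
        return ("disk", replicas.result()[0])
```

Every read costs two requests regardless of the outcome, which is the extra network overhead; a miss does not add a second round trip, which is the latency benefit.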


Design Choice 2

• Send request to Namenode only in the case of a cache miss

Minimizes network overhead at the cost of increased latency
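A minimal sketch of the sequential variant, under the same kind of assumptions: `CACHE`, `NAMENODE`, and `locate_on_miss` are hypothetical names, and the `stats` counters exist only to make the request count visible.

```python
# Hypothetical stand-ins for the two services.
CACHE = {"blk_1": "node-a"}                    # block -> node caching it in RAM
NAMENODE = {"blk_1": ["node-b", "node-c"],     # block -> nodes holding disk replicas
            "blk_2": ["node-c"]}


def locate_on_miss(block_id, stats):
    """Ask the cache first; contact the Namenode only after a miss."""
    stats["cache"] += 1
    node = CACHE.get(block_id)
    if node is not None:
        return ("cache", node)
    stats["namenode"] += 1                     # second round trip, misses only
    return ("disk", NAMENODE[block_id][0])


stats = {"cache": 0, "namenode": 0}
hit = locate_on_miss("blk_1", stats)
miss = locate_on_miss("blk_2", stats)
```

A hit costs one request instead of two, while a miss pays for two round trips in sequence.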


Design Choice 3

• Datanodes send requests only to Memcached

• Memcached checks for cached blocks

• If a cache miss occurs, it contacts the Namenode and returns the replicas’ addresses to the Datanodes
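One way to picture this choice is a cache layer that fronts the Namenode. The sketch below is an assumption-laden illustration: the classes `Namenode` and `CacheFrontEnd` and their methods are hypothetical, and stock Memcached does not forward misses on its own, so the forwarding logic would have to live in a custom layer.

```python
class Namenode:
    """Hypothetical stand-in that maps blocks to replica addresses."""

    def __init__(self, block_map):
        self.block_map = block_map
        self.requests = 0

    def get_replicas(self, block_id):
        self.requests += 1
        return self.block_map[block_id]


class CacheFrontEnd:
    """Hypothetical cache layer that is the only service Datanodes contact."""

    def __init__(self, namenode):
        self.namenode = namenode
        self.cached = {}  # block_id -> node caching the block

    def locate(self, block_id):
        if block_id in self.cached:
            return [self.cached[block_id]]              # hit: Namenode not involved
        return self.namenode.get_replicas(block_id)     # miss: forward the request


namenode = Namenode({"blk_1": ["node-b"], "blk_2": ["node-c", "node-d"]})
front = CacheFrontEnd(namenode)
front.cached["blk_1"] = "node-a"
on_hit = front.locate("blk_1")
on_miss = front.locate("blk_2")
```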


Global Cache Replacement

• LRU based Global Cache Eviction Scheme
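The slide gives no detail beyond the scheme's name, so the following is only one possible reading: a coordinator records the last access of every cached block across all nodes and evicts the globally oldest block, whichever node holds it. The class `GlobalLRU`, its `touch` method, and the logical clock are assumptions.

```python
import itertools


class GlobalLRU:
    """Hypothetical coordinator that evicts the globally least recent block."""

    def __init__(self, total_capacity_blocks):
        self.capacity = total_capacity_blocks
        self._clock = itertools.count()   # logical clock shared by all nodes
        self._entries = {}                # block_id -> (last_access_tick, node)

    def touch(self, block_id, node):
        """Record an access; return the evicted (block_id, node), or None."""
        self._entries[block_id] = (next(self._clock), node)
        if len(self._entries) <= self.capacity:
            return None
        victim = min(self._entries, key=lambda b: self._entries[b][0])
        _, victim_node = self._entries.pop(victim)
        return (victim, victim_node)


coordinator = GlobalLRU(total_capacity_blocks=2)
coordinator.touch("blk_1", "node-a")
coordinator.touch("blk_2", "node-b")
coordinator.touch("blk_1", "node-a")            # refreshes blk_1
evicted = coordinator.touch("blk_3", "node-a")  # blk_2 on node-b is now the oldest
```

The contrast with per-node LRU is that the victim here lives on `node-b` even though the new block arrived on `node-a`.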


Prefetching


Simulation Results

• Test data ranging from 2GB to 24GB
• Word Count and Grep
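The hit ratio reported in the results is the fraction of block requests served from the cache. A minimal sketch of measuring it by replaying an access trace through an LRU cache follows; the function `hit_ratio` and the toy trace are invented for illustration and are not the data behind the slides.

```python
from collections import OrderedDict


def hit_ratio(trace, capacity_blocks):
    """Replay block accesses through an LRU cache and return hits / accesses."""
    cache = OrderedDict()
    hits = 0
    for block_id in trace:
        if block_id in cache:
            hits += 1
            cache.move_to_end(block_id)        # refresh recency on a hit
        else:
            cache[block_id] = True             # greedy admission on a miss
            if len(cache) > capacity_blocks:
                cache.popitem(last=False)      # evict the least recently used
    return hits / len(trace) if trace else 0.0


trace = ["a", "b", "a", "c", "b", "a"]         # made-up access pattern
small = hit_ratio(trace, capacity_blocks=2)
large = hit_ratio(trace, capacity_blocks=3)
```

On this toy trace the larger cache scores higher because it holds the whole working set of three blocks.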


Word Count

[Figure: Network Overhead vs Cache Size. x-axis: Cache Size (GB), 0 to 40; y-axis: % Overhead, 0 to 100]


Word Count

[Figure: Hit Ratio vs Cache Size. x-axis: Cache Size (GB), 0 to 35; y-axis: Cache Hit Ratio, 0 to 1]


Grep

[Figure: Network Overhead vs Cache Size. x-axis: Cache Size (GB), 0 to 35; y-axis: % Overhead, 0 to 100]


Grep

[Figure: Hit Ratio vs Cache Size. x-axis: Cache Size (GB), 0 to 35; y-axis: Hit Ratio, 0 to 1]


Future Work

• Implement a pre-fetching mechanism
• Customized caching policies based on access patterns
• Compare and contrast caching with locality-aware scheduling


Conclusion

• Caching can improve the performance of cluster-based systems, depending on the access patterns of the workload being executed