Autoplacer: Scalable Self-Tuning Data Placement in Distributed Key-value Stores. ICAC’13. João Paiva, Pedro Ruivo, Paolo Romano, Luís Rodrigues. Instituto Superior Técnico / Inesc-ID, Lisboa, Portugal. June 27, 2013.

Autoplacer: Scalable Self-Tuning Data Placement in Distributed Key-value Stores

ICAC’13

João Paiva, Pedro Ruivo, Paolo Romano, Luís Rodrigues

Instituto Superior Técnico / Inesc-ID, Lisboa, Portugal

June 27, 2013

Outline

Introduction

Our approach

Evaluation

Conclusions

Motivation

Collocating processing with storage can improve performance.

- Using random placement, nodes waste resources on inter-node communication.

- Optimize data placement to improve locality and to reduce remote requests.

Approaches Using Offline Optimization

Algorithm:

1. Gather access trace for all items

2. Run offline optimization algorithms on traces

3. Store solution in directory

4. Locate data items by querying directory

- Fine-grained placement

- Costly to log all accesses

- Complex optimization

- Directory creates additional network usage

Main challenges

Cause: Key-Value stores may handle large amounts of data

Challenges:

1. Collecting Statistics: Obtaining usage statistics in an efficient manner.

2. Optimization: Deriving fine-grained placement for data objects that exploits data locality.

3. Fast lookup: Preserving fast lookup for data items.

Approaches to Data Access Locality

1. Consistent Hashing (CH): The “don’t care” approach

2. Distributed Directories: The “care too much” approach

Consistent Hashing

Don’t care for locality: items are placed deterministically according to hash functions and full membership information.

- Simple to implement

- Solves lookup challenge by using local lookups

- No control on data placement → bad locality

- Does not address optimization challenge
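
The local-lookup property can be illustrated with a minimal consistent-hashing sketch. This is not the scheme used by the store in the talk: the class name `ConsistentHashRing`, the MD5-based ring, and the `virtual_nodes` parameter are assumptions made for illustration. The point it shows is that any node holding the full membership list computes the same owner for a key without sending a message.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (128-bit integer from MD5)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical ring: every node builds the same one from the membership list."""

    def __init__(self, nodes, virtual_nodes=64):
        # Each physical node gets several points on the ring to even out load
        # (virtual nodes are an assumption, not something the slides mention).
        self._ring = sorted(
            (_hash(f"{node}#{v}"), node)
            for node in nodes
            for v in range(virtual_nodes)
        )
        self._points = [point for point, _ in self._ring]

    def owner(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring_a = ConsistentHashRing(["node-0", "node-1", "node-2"])
ring_b = ConsistentHashRing(["node-2", "node-0", "node-1"])  # same members, other order
```

Both rings return the same owner for any key, which is why no directory is needed. The owner depends only on the hash, so access patterns have no influence on placement.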

Distributed Directories

Care too much for locality: nodes report usage statistics to a centralized optimizer, and placement is defined in a distributed directory (may be cached locally).

- Can solve statistics challenge using coarse statistics

- Solves optimization challenge with precise data placement control

Hindered by lookup challenge:

- Additional network hop

- Hard to update

Our approach: beating the challenges

Best of both worlds

- Statistics Challenge: Gather statistics only for hotspot items

- Optimization Challenge: Fine-grained optimization for hotspots

- Lookup Challenge: Consistent Hashing for remaining items

Algorithm overview

Online, round-based approach:

1. Statistics: Monitor data access to collect hotspots

2. Optimization: Decide placement for hotspots

3. Lookup: Encode / broadcast data placement

4. Move data

Statistics: Data access monitoring

Key concept: Top-K stream analysis algorithm

- Lightweight

- Sub-linear space usage

- Inaccurate results, but with bounded error
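
The slides do not name the top-k algorithm. As one possibility, the sketch below uses Space-Saving, a well-known stream summary with the three listed properties: it keeps a fixed number of counters regardless of how many distinct keys are accessed, and each estimate comes with a recorded maximum overestimation. The class name `SpaceSaving` and its methods are illustrative, not the monitor used in Autoplacer.

```python
class SpaceSaving:
    """Approximate top-k counter that tracks at most `capacity` keys."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.counts = {}   # key -> estimated access count
        self.errors = {}   # key -> maximum overestimation of that count

    def record(self, key):
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            self.counts[key] = 1
            self.errors[key] = 0
        else:
            # Evict the key with the smallest count; the newcomer inherits it.
            # (A linear scan keeps the sketch short; real implementations use
            # a structure that finds the minimum in constant time.)
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[key] = floor + 1
            self.errors[key] = floor

    def top(self, k: int):
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:k]


monitor = SpaceSaving(capacity=3)
for key in ["a", "b", "a", "c", "a", "d", "b", "a"]:
    monitor.record(key)
hot = monitor.top(1)   # [("a", 4)]
```

For every tracked key, the true count lies between `counts[key] - errors[key]` and `counts[key]`, which is the sense in which the error is bounded.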

Optimization

Integer Linear Programming problem formulation:

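\[
\min \sum_{j \in N} \sum_{i \in O} \overline{X}_{ij}\,(cr^{r} r_{ij} + cr^{w} w_{ij}) + X_{ij}\,(cl^{r} r_{ij} + cl^{w} w_{ij}) \tag{1}
\]

subject to:

\[
\forall i \in O : \sum_{j \in N} X_{ij} = d \;\wedge\; \forall j \in N : \sum_{i \in O} X_{ij} \le S_j
\]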

Inaccurate input:

- Does not provide optimal placement

- Upper-bound on error
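
To make objective (1) and its constraints concrete, here is a brute-force sketch for toy instances. The slide does not define its symbols, so the reading used here is an assumption: `reads[i][j]` and `writes[i][j]` are the accesses to object i issued by node j, X_ij = 1 means node j stores object i (the barred X is its complement), `d` is the number of copies per object, and `capacity[j]` is S_j. The function names and all numbers are hypothetical.

```python
from itertools import combinations, product


def placement_cost(placement, reads, writes, cr_r, cr_w, cl_r, cl_w):
    """Objective (1): local cost where X_ij = 1, remote cost where X_ij = 0."""
    total = 0.0
    for i, owners in enumerate(placement):
        for j in range(len(reads[i])):
            if j in owners:
                total += cl_r * reads[i][j] + cl_w * writes[i][j]
            else:
                total += cr_r * reads[i][j] + cr_w * writes[i][j]
    return total


def best_placement(reads, writes, capacity, d, cr_r, cr_w, cl_r, cl_w):
    """Exhaustive search; only viable for a handful of objects and nodes."""
    n_objects, n_nodes = len(reads), len(capacity)
    choices = list(combinations(range(n_nodes), d))  # exactly d owners per object
    best, best_cost = None, float("inf")
    for placement in product(choices, repeat=n_objects):
        load = [0] * n_nodes
        for owners in placement:
            for j in owners:
                load[j] += 1
        if any(load[j] > capacity[j] for j in range(n_nodes)):
            continue  # violates sum_i X_ij <= S_j
        cost = placement_cost(placement, reads, writes, cr_r, cr_w, cl_r, cl_w)
        if cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost


# Hypothetical access counts: 3 objects, 2 nodes.
reads = [[10, 0], [0, 8], [5, 1]]
writes = [[2, 0], [0, 3], [1, 0]]
placement, cost = best_placement(
    reads, writes, capacity=[2, 2], d=1, cr_r=5, cr_w=10, cl_r=1, cl_w=2
)
# placement == ((0,), (1,), (0,)) and cost == 40.0
```

Each object ends up on the node that accesses it most. The search space grows exponentially with the number of objects, which is why exhaustive search is only useful as an illustration.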

Accelerating optimization

1. ILP Relaxed to Linear Programming problem

2. Distributed optimization

LP relaxation

- Allow data item ownership to take values in the [0, 1] interval

Distributed Optimization

- Partition by the N nodes

- Each node optimizes hotspots mapped to it by CH

- Strengthen capacity constraint
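
In symbols, the LP relaxation only changes the domain of the decision variables; the objective and both constraints of (1) keep their form. The binary domain of X in the original ILP is assumed here, since the slides do not state it explicitly.

```latex
X_{ij} \in \{0, 1\}
\quad\longrightarrow\quad
0 \le X_{ij} \le 1
\qquad \forall i \in O,\ \forall j \in N
```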

Lookup: Encoding placement

Probabilistic Associative Array (PAA)

- Associative array interface (keys → values)

- Probabilistic and space-efficient

- Trade-off space usage for accuracy

Probabilistic Associative Array: Usage

Building

1. Build PAA from hotspot mappings

2. Broadcast PAA

Looking up objects

- If item not in PAA, use Consistent Hashing

- If item is hotspot, return PAA mapping
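
The two lookup rules can be sketched as a single local function. `DictPAA` is an exact dictionary standing in for the real PAA (same interface, none of its space savings or errors), and `fallback_owner` is a placeholder for the store's consistent-hashing function; both names are hypothetical.

```python
class DictPAA:
    """Exact stand-in for a PAA: same interface, no space savings, no errors."""

    def __init__(self, hotspot_mappings):
        self._mappings = dict(hotspot_mappings)

    def get(self, key):
        # Returns the relocated owner, or None when the key is not a hotspot.
        return self._mappings.get(key)


def locate(key, paa, fallback_owner):
    """Resolve the owner of `key` locally, without contacting a directory."""
    owner = paa.get(key)
    if owner is not None:
        return owner            # hotspot: use the optimized placement
    return fallback_owner(key)  # everything else: consistent hashing


nodes = ["node-0", "node-1", "node-2"]
paa = DictPAA({"stock:42": "node-2"})


def fallback_owner(key):
    # Placeholder only: a real store would call its consistent-hashing function.
    return nodes[sum(key.encode("utf-8")) % len(nodes)]
```

Because every node receives the same broadcast PAA, every node resolves a given key to the same owner, with no extra network hop.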

PAA: Building blocks

- Bloom Filter: Space-efficient membership test (is item in PAA?)

- Decision tree classifier: Space-efficient mapping (where is hotspot mapped to?)

PAA: Properties

Bloom Filter:

- False Positives: match items that it was not supposed to.

- No False Negatives: never return ⊥ for items in PAA.

Decision tree classifier:

- Inaccurate values (bounded error).

- Deterministic response: deterministic (item → node) mapping.
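
The two Bloom filter properties can be checked with a small sketch. The sizing (`n_bits`, `n_hashes`), the salted SHA-256 hashing, and the key names are assumptions chosen for readability, not the parameters used in Autoplacer.

```python
import hashlib


class BloomFilter:
    def __init__(self, n_bits: int, n_hashes: int):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive k positions by salting a single hash function with an index.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


hotspots = [f"stock:{i}" for i in range(100)]
bf = BloomFilter(n_bits=1024, n_hashes=3)
for key in hotspots:
    bf.add(key)

# Non-hotspot keys that the filter reports as present (false positives).
false_positives = [k for k in (f"item:{i}" for i in range(1000)) if k in bf]
```

Every inserted key is always reported as present, while a small fraction of other keys may be reported as present too. Growing `n_bits` lowers that fraction, which is the space-for-accuracy trade-off.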

Algorithm Review

Online, round-based approach:

1. Statistics: Monitor data access to collect hotspots (Top-k stream analysis)

2. Optimization: Decide placement for hotspots (Lightweight distributed optimization)

3. Lookup: Encode / broadcast data placement (Probabilistic Associative Array)

4. Move data

Experimental settings

- Integrated in Distributed Key-Value store (JBoss Infinispan)

- 40 Virtual Machines (10 physical machines)

- Gigabit network

Modified TPC-C benchmark

Induce controllable locality:

- With probability p: a node accesses data associated with a given warehouse.

- With probability 1 - p: a node accesses data associated with a random warehouse.
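
The locality knob can be sketched as a one-line sampling rule. The function name `pick_warehouse`, the uniform choice among warehouses, and the figure of 40 warehouses are assumptions for illustration; the slide only specifies the two probabilities.

```python
import random


def pick_warehouse(home_warehouse, n_warehouses, p, rng):
    """With probability p use the node's given warehouse, otherwise a random one."""
    if rng.random() < p:
        return home_warehouse
    return rng.randrange(n_warehouses)  # uniform choice is an assumption


rng = random.Random(0)
always_local = [pick_warehouse(3, 40, 1.0, rng) for _ in range(1000)]
mixed = [pick_warehouse(3, 40, 0.9, rng) for _ in range(1000)]
```

With p = 1.0 every access targets the node's own warehouse (the 100% locality setting), and lower values of p mix in accesses to arbitrary warehouses.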

Remote operations

[Figure: percentage of remote operations (y-axis, ticks from 0 to 1) over time in minutes (x-axis, 0 to 30). Series: 100% locality, 90% locality, 50% locality, 0% locality, baseline.]

Throughput

[Figure: transactions per second (TX/s; y-axis ticks at 10, 100, 1000) over time in minutes (x-axis, 0 to 30). Series: 100% locality, 90% locality, 50% locality, 0% locality, baseline.]

Directory effects

[Figure: transactions per second (tx/s; y-axis ticks at 10, 100, 1000) at 100%, 90%, and 0% locality. Series: Autoplacer, Directory, Baseline.]

Conclusions

- Gather statistics only for hotspots

- Fine-grained hotspot placement

- Retain local lookups using PAA

- Effective locality improvement

- Good network usage

- Considerable performance improvements

Thank you