Nectar: Efficient Management of Computation and Data in Data Centers Lenin Ravindranath Pradeep Kumar Gunda, Chandu Thekkath, Yuan Yu, Li Zhuang

Feb 24, 2016

Transcript
Page 1: Nectar: Efficient Management of Computation and Data in Data Centers

Nectar: Efficient Management of Computation and Data in Data Centers

Lenin Ravindranath

Pradeep Kumar Gunda, Chandu Thekkath, Yuan Yu, Li Zhuang

Page 2: Nectar: Efficient Management of Computation and Data in Data Centers

Motivation

Resources are poorly managed in a data center

Computation
• Redundant computations
– Wasting resources

Storage
• Manually managed
– Unused files occupying space
– Redundant output files

Page 3: Nectar: Efficient Management of Computation and Data in Data Centers

Goal

Efficiently manage resources in a cluster

Computation Storage

Nectar

Page 4: Nectar: Efficient Management of Computation and Data in Data Centers

Key Insight

Data Center

Computation Storage

Single query interface for computation and data access

DryadLINQ

Query Interface

User

Page 5: Nectar: Efficient Management of Computation and Data in Data Centers

Goal

Efficiently manage resources in a cluster

Computation Storage

Nectar

Page 6: Nectar: Efficient Management of Computation and Data in Data Centers

Computation

PROBLEM: Redundant computation
– Programs share sub-queries
– Programs share partial data sets

SOLUTION: Caching
– Cache results of popular sub-queries
– Automatically rewrite user query to use cache

Shared sub-query: X.Select(…) and X.Select(…).Where(…)

Shared partial data: X.Select(…) and (X+X’).Select(…)

Overlapping data sets: 1 2 3 4 5 6 7 and 2 3 4 5 6 7 8
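The two kinds of sharing on this slide can be illustrated with a toy cache. This is a minimal Python sketch, not DryadLINQ code: `run_select`, `cache`, `work_done`, and the tuple-based cache keys are hypothetical names invented for the illustration, and only the case where the new input extends an old input (X followed by X’) is handled.

```python
# Toy illustration of shared sub-queries and shared partial data.
# All names are hypothetical; this is not the DryadLINQ or Nectar API.

cache = {}       # (operator name, input data as a tuple) -> cached result
work_done = []   # how many items each call actually had to process


def run_select(data, fn, name):
    """Apply fn to every item of data, reusing cached results where possible."""
    key = (name, tuple(data))
    if key in cache:                      # shared sub-query: exact hit
        work_done.append(0)
        return cache[key]

    # Shared partial data: find the longest cached input that is a prefix of data.
    best = max(
        (k for k in cache if k[0] == name and tuple(data[:len(k[1])]) == k[1]),
        key=lambda k: len(k[1]),
        default=None,
    )
    prefix_len = len(best[1]) if best else 0
    prefix_result = cache[best] if best else []

    new_part = [fn(x) for x in data[prefix_len:]]   # only the new items are computed
    work_done.append(len(data) - prefix_len)

    result = prefix_result + new_part
    cache[key] = result
    return result


def double(x):
    return 2 * x


r1 = run_select([1, 2, 3, 4, 5, 6, 7], double, "double")     # full computation
r2 = run_select([1, 2, 3, 4, 5, 6, 7], double, "double")     # exact cache hit
r3 = run_select([1, 2, 3, 4, 5, 6, 7, 8], double, "double")  # only item 8 is new
```

In this sketch the second call does no work and the third call processes a single item, which is the saving the slide attributes to caching.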

Page 7: Nectar: Efficient Management of Computation and Data in Data Centers

Does caching help?

• Analyzed logs from production clusters
• Logs of 3 months (Oct – Dec 2008)
• 33 virtual clusters, 36000 jobs
• Parsed SCOPE programs, extracted sub-queries
• Simulated caching
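One way to picture the simulation step is to replay the job log in submission order and count the jobs that contain a sub-query already produced by an earlier job. The sketch below assumes a hypothetical log format (a list of job ids with their sub-query identifiers) and an unbounded cache; it does not reproduce the SCOPE parsing or the actual study.

```python
def simulate_cache_hits(jobs):
    """Return the fraction of jobs that share a sub-query with an earlier job.

    jobs: list of (job_id, iterable of sub-query identifiers) in submission
    order. This log format is an assumption made for the sketch.
    """
    seen = set()      # sub-queries computed so far (unbounded cache)
    helped = 0
    for _job_id, subqueries in jobs:
        subqueries = set(subqueries)
        if subqueries & seen:     # at least one sub-query could come from cache
            helped += 1
        seen |= subqueries
    return helped / len(jobs) if jobs else 0.0


log = [
    ("j1", {"a", "b"}),
    ("j2", {"b", "c"}),   # shares "b" with j1
    ("j3", {"d"}),        # nothing shared
    ("j4", {"a", "d"}),   # shares "a" and "d"
]
hit_fraction = simulate_cache_hits(log)
```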

Page 8: Nectar: Efficient Management of Computation and Data in Data Centers

Caching helps

[Figure: bar chart, one bar per cluster. x-axis: Cluster; y-axis: Programs helped by caching, 0 to 1.]

• About 50% cache hit on 10 clusters
• More than 30% cache hit on 20 clusters
• 35% on average

Page 9: Nectar: Efficient Management of Computation and Data in Data Centers

Goal

Efficiently manage resources in a cluster

Computation Storage

Nectar

Page 10: Nectar: Efficient Management of Computation and Data in Data Centers

Storage

PROBLEM: Manually managed
– Unused files occupying space

[Figure: CDF of stored data by time since last access. x-axis: Last accessed (days before), 0 to 500; y-axis: CDF, 0 to 1.]

Total Size: 190 TB

50% of the data was never accessed in the last 275 days

Page 11: Nectar: Efficient Management of Computation and Data in Data Centers

Storage

SOLUTION: Automatically manage data
– Track usage and delete infrequently used files
– Store the programs that recompute the data

Page 12: Nectar: Efficient Management of Computation and Data in Data Centers

Query Interface

Data Center

Computation Storage

DryadLINQ

Query Interface

User

Page 13: Nectar: Efficient Management of Computation and Data in Data Centers

Goal

Efficiently manage resources in a cluster

Computation Storage

Nectar

Page 14: Nectar: Efficient Management of Computation and Data in Data Centers

Nectar

Data Center

Computation Storage

DryadLINQ

Query Interface

Nectar

User

Page 15: Nectar: Efficient Management of Computation and Data in Data Centers

Nectar Architecture

Query Rewriter

DryadLINQ

Dryad

DryadLINQ program

Query

Cache entries

Nectar Client

Cache Server

Add T to cache

P

P’

Add R to cache

R

T

Cluster

Page 16: Nectar: Efficient Management of Computation and Data in Data Centers

Nectar Architecture

Query Rewriter

Nectar Client

Cache Server

Page 17: Nectar: Efficient Management of Computation and Data in Data Centers

Query Rewriter

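[Diagram: the cache holds R, the result of Select over X. A Select over X and X’ is rewritten as a Select over X’ followed by a Concat with the cached R, producing (R+R’).]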

Page 18: Nectar: Efficient Management of Computation and Data in Data Centers

Query Rewriter

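[Diagram: the same rewrite when each Select is followed by Order by. The cache holds the ordered R for X. For X and X’, Select and Order by run over X’ only, and a Merge Sort with the cached R produces (R+R’).]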
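The two rewrites on the Query Rewriter slides can be sketched in a few lines of Python. This is an illustration only: `select`, `rewrite_select`, and `rewrite_select_order_by` are hypothetical helper names, plain lists stand in for partitioned data sets, and `heapq.merge` stands in for the Merge Sort step.

```python
import heapq


def select(rows, fn):
    """Stand-in for a Select operator over an in-memory list."""
    return [fn(x) for x in rows]


def rewrite_select(cached_r, x_new, fn):
    """Select over X+X' rewritten as: cached R concatenated with Select over X'."""
    return cached_r + select(x_new, fn)


def rewrite_select_order_by(cached_sorted_r, x_new, fn):
    """Select + Order by over X+X': sort only the new part, then merge sorted runs."""
    r_new = sorted(select(x_new, fn))
    return list(heapq.merge(cached_sorted_r, r_new))


def times_ten(x):
    return 10 * x


X = [5, 1, 3]
X_new = [4, 2]

cached_R = select(X, times_ten)       # what the cache already holds for X
cached_sorted_R = sorted(cached_R)    # cached result of Select + Order by

concat_result = rewrite_select(cached_R, X_new, times_ten)
merged_result = rewrite_select_order_by(cached_sorted_R, X_new, times_ten)
```

Both rewritten results equal what a full recomputation over X+X’ would return, while only X’ is actually processed.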

Page 19: Nectar: Efficient Management of Computation and Data in Data Centers

Query Rewriter

• Generates multiple plans
– Using multiple cache entries

• Selects the best plan
– Based on benefit
  • Execution time
  • Output size
  • Whether pipeline is broken

• Operators supported
– Select, Where, Order by, Group by, Join

X.Select(…)

X.Select(…).Where(…)
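The slide names the criteria for picking a plan but not how they are combined. The sketch below shows one possible scoring, purely as an assumption: the `Plan` fields, the `benefit` formula, and the `read_cost_per_byte` and `pipeline_penalty` constants are all hypothetical and are not taken from Nectar.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    saved_execution_time: float   # seconds of work avoided by using cache entries
    cached_output_size: float     # bytes that must be read back from the cache
    breaks_pipeline: bool         # True if using the entry splits a pipelined stage


def benefit(plan, read_cost_per_byte=1e-8, pipeline_penalty=5.0):
    """Hypothetical score: time saved, minus read cost, minus a pipeline penalty."""
    score = plan.saved_execution_time - read_cost_per_byte * plan.cached_output_size
    if plan.breaks_pipeline:
        score -= pipeline_penalty
    return score


def choose_plan(plans):
    """Pick the candidate plan with the highest benefit."""
    return max(plans, key=benefit)


plans = [
    Plan("no cache", 0.0, 0.0, False),
    Plan("use cached Select result", 120.0, 2e9, False),
    Plan("use cached Where result", 110.0, 1e9, True),
]
best = choose_plan(plans)
```

With these made-up numbers the plan that reuses the cached Select result wins, because the other cached entry saves less time and also breaks the pipeline.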

Page 20: Nectar: Efficient Management of Computation and Data in Data Centers

Nectar Architecture

Query Rewriter

Nectar Client

Cache Server

Page 21: Nectar: Efficient Management of Computation and Data in Data Centers

Cache Server

SQL Server

Garbage Collector

Cache Policy

Cache Server

URI | Query Fingerprint | Query + Data Fingerprint | Execution Time | Output Size

Inquire Stats

Usage Stats

Fingerprints
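The fields listed on this slide suggest a cache entry record along the following lines. This is a sketch under stated assumptions: the slide does not say how fingerprints are computed, so SHA-256 from Python's `hashlib` is used as a stand-in, and the `fingerprint` helper, the `CacheEntry` class, and the in-memory `entries` dictionary (in place of SQL Server) are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(*parts: bytes) -> str:
    """Hash one or more byte strings into a hex fingerprint (SHA-256 stand-in)."""
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(8, "big"))   # length prefix keeps part boundaries distinct
        h.update(p)
    return h.hexdigest()


@dataclass
class CacheEntry:
    uri: str                      # where the cached output lives
    query_fingerprint: str        # identifies the query alone
    query_data_fingerprint: str   # identifies the query together with its input data
    execution_time: float         # seconds spent producing the output
    output_size: int              # bytes of output


query = b"X.Select(f)"
data = b"contents of X"

entry = CacheEntry(
    uri="A31E4.pt",
    query_fingerprint=fingerprint(query),
    query_data_fingerprint=fingerprint(query, data),
    execution_time=42.0,
    output_size=1024,
)

# In-memory stand-in for the cache server's table, keyed by query + data fingerprint.
entries = {entry.query_data_fingerprint: entry}
hit = entries.get(fingerprint(query, data))
miss = entries.get(fingerprint(query, b"different contents"))
```

Keeping both fingerprints lets a lookup distinguish "same query on the same data" (an exact hit) from "same query on other data".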

Page 22: Nectar: Efficient Management of Computation and Data in Data Centers

Cache policy

• Insertion Policy
– Always add program output to cache
– Sub-query outputs are added to cache
  • Popularity exceeds a threshold
  • Savings exceeds a threshold

Savings = Sum( (Execution Time / Output Size) × (1 / Time elapsed) )
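The insertion policy can be sketched as a small predicate. Several things here are assumptions rather than facts from the slide: the savings formula is one possible reading of the slide's equation (execution time divided by output size and by time elapsed, summed over uses), the two sub-query conditions are assumed to both be required, and the threshold values are arbitrary.

```python
def savings(uses):
    """One possible reading of the slide's savings formula (an assumption).

    uses: list of (execution_time, output_size, time_elapsed) tuples.
    """
    return sum(t / (size * elapsed) for t, size, elapsed in uses)


def should_cache(is_program_output, popularity, saving,
                 popularity_threshold=2, savings_threshold=1.0):
    """Insertion policy sketch; threshold defaults are arbitrary placeholders."""
    if is_program_output:
        return True    # final program output is always added to the cache
    # Sub-query output: assumed here to need both conditions.
    return popularity > popularity_threshold and saving > savings_threshold


uses = [(100.0, 10.0, 2.0), (100.0, 10.0, 5.0)]
s = savings(uses)     # 100/(10*2) + 100/(10*5)

decision_output = should_cache(True, popularity=0, saving=0.0)
decision_popular = should_cache(False, popularity=3, saving=s)
decision_rare = should_cache(False, popularity=1, saving=s)
```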

Page 23: Nectar: Efficient Management of Computation and Data in Data Centers

Garbage Collector

• Storage pressure
– Delete infrequently used files

• Deletion policy
– Based on savings
– Cache type

• Mark and sweep algorithm
– Delete cache entry
– Reachability analysis
  • Delete files

[Diagram: Cache Server with entries 1, 2, 3; Distributed FS with files 1, 2.]
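The mark and sweep steps on this slide can be sketched as follows. The data layout is an assumption made for the illustration: cache entries are modeled as a dictionary from entry id to the set of file names the entry references, and `files` contains only cache-managed files. `collect_garbage` is a hypothetical name.

```python
def collect_garbage(cache_entries, files, evict):
    """Delete chosen cache entries, then delete files no surviving entry references.

    cache_entries: dict of entry id -> set of file names the entry references
    files: set of cache-managed file names in the distributed FS
    evict: set of entry ids picked by the deletion policy
    Returns (remaining entries, remaining files).
    """
    # Step 1: delete the chosen cache entries.
    remaining = {eid: refs for eid, refs in cache_entries.items() if eid not in evict}

    # Step 2 (mark): every file referenced by a surviving entry is reachable.
    reachable = set()
    for refs in remaining.values():
        reachable |= refs

    # Step 3 (sweep): keep only reachable files; the rest are deleted.
    kept_files = files & reachable
    return remaining, kept_files


entries = {1: {"f1"}, 2: {"f1", "f2"}, 3: {"f3"}}
files = {"f1", "f2", "f3"}

after_one, files_one = collect_garbage(entries, files, evict={3})
after_two, files_two = collect_garbage(entries, files, evict={2, 3})
```

Deleting entry 2 alone would not remove file f1, because entry 1 still references it; that is the reason for the reachability pass.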

Page 24: Nectar: Efficient Management of Computation and Data in Data Centers

What if I try to access a garbage collected file?

Page 25: Nectar: Efficient Management of Computation and Data in Data Centers

Nectar Architecture

Query Rewriter

Nectar Client

Cache Server

Program store

Page 26: Nectar: Efficient Management of Computation and Data in Data Centers

Program Store

• Store executed programs in the cluster
• Output file is tied to the corresponding program that generates the output
• If a file is deleted, the program is executed to regenerate the output
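The behaviour described above, regenerating a deleted file by rerunning the program that produced it, can be sketched with a small class. `ProgramStore` and its methods are hypothetical names; dictionaries stand in for the distributed file system and the program store, and a zero-argument Python callable stands in for a stored program.

```python
class ProgramStore:
    """Toy model: every output file is tied to the program that generates it."""

    def __init__(self):
        self.files = {}      # file name -> contents (stand-in for the distributed FS)
        self.programs = {}   # file name -> callable that regenerates the file

    def run_and_store(self, name, program):
        """Run a program, keep its output, and remember the program itself."""
        self.programs[name] = program
        self.files[name] = program()
        return self.files[name]

    def garbage_collect(self, name):
        """Delete the output file but keep the program."""
        self.files.pop(name, None)

    def read(self, name):
        """Return the file, rerunning its program first if the file was deleted."""
        if name not in self.files:
            self.files[name] = self.programs[name]()
        return self.files[name]


runs = []


def make_foo():
    runs.append("run")
    return [x * x for x in range(5)]


store = ProgramStore()
first = store.run_and_store("foo.pt", make_foo)
store.garbage_collect("foo.pt")
again = store.read("foo.pt")     # file is missing, so the program runs again
cached = store.read("foo.pt")    # file is back, no rerun needed
```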

Page 27: Nectar: Efficient Management of Computation and Data in Data Centers

Managing Data

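[Diagram: writing data. Components: Nectar Client, Program Store, Cache Server, DryadLINQ, Dryad, and a Distributed FS with usr and Nectar areas. Labels: program P calling ToPartitionedTable (lenin\foo.pt), program P’, files foo.pt and A31E4.pt, FP entries in the Cache Server and the Program Store, stored Program.]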

Page 28: Nectar: Efficient Management of Computation and Data in Data Centers

Managing Data

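[Diagram: reading data. Components: Nectar Client, Program Store, Cache Server, DryadLINQ, Dryad, and a Distributed FS with usr and Nectar areas. Labels: program P calling FromPartitionedTable (lenin\foo.pt), files foo.pt and A31E4.pt, FP entries in the Cache Server and the Program Store, stored Program.]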

Page 29: Nectar: Efficient Management of Computation and Data in Data Centers

Managing Data

[Diagram: reading data, continued. Same components as the previous slide. Labels: program P calling FromPartitionedTable (lenin\foo.pt), files foo.pt, A31E4.pt, and KJ1LM.pt, FP entries in the Cache Server and the Program Store, stored Program.]

Page 30: Nectar: Efficient Management of Computation and Data in Data Centers

Goal

Efficiently manage resources in a cluster

Computation Storage

Nectar

Computation Storage

Unified computation and data

Page 31: Nectar: Efficient Management of Computation and Data in Data Centers

Distributed cache servers

Cache Server

SQL Server

Partitioned by query fingerprint

Nectar Client

Centralized garbage collector

Hash based on query fingerprint

Program store

Program store

Cache Server

SQL Server
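A client that routes requests by hashing the query fingerprint could look like the sketch below. The slide says only that the hash is based on the query fingerprint; the modulo placement, the use of SHA-256, the `pick_server` function, and the server names are assumptions made for illustration.

```python
import hashlib


def pick_server(query_fingerprint: str, servers: list) -> str:
    """Map a query fingerprint to one cache server (simple modulo placement)."""
    digest = hashlib.sha256(query_fingerprint.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["cache-server-0", "cache-server-1"]   # hypothetical names

a = pick_server("fingerprint-of-query-A", servers)
a_again = pick_server("fingerprint-of-query-A", servers)
b = pick_server("fingerprint-of-query-B", servers)
```

Because the mapping depends only on the fingerprint, every client sends lookups and insertions for the same query to the same server, which keeps each server's partition self-contained.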

Page 32: Nectar: Efficient Management of Computation and Data in Data Centers

Summary

• We built Nectar
– Automatically manage data
– Efficiently manage computation

Components

• Query Rewriter
– Automatically rewrite queries to use cache

• Cache server
– Popular sub-queries are cached
– Garbage collected based on usage

• Program store
– Store programs that regenerate the output

Page 33: Nectar: Efficient Management of Computation and Data in Data Centers

Status

• Almost done with development
– Query Rewriter
  • Including other operators
– Fingerprinter
  • Program static analysis
– Cache Server
– Program Store

• In the process of deploying

Page 34: Nectar: Efficient Management of Computation and Data in Data Centers

Can we do better?

Page 35: Nectar: Efficient Management of Computation and Data in Data Centers

Cluster Utilization

[Figure: bar chart, one bar per cluster. x-axis: Clusters; y-axis: Idle Percent, 0 to 1.]

• Most clusters have more than 40% idle time
• Even the busiest clusters have 10-20% idle time

Page 36: Nectar: Efficient Management of Computation and Data in Data Centers

Exploiting idle time

• Do speculative caching
– Cache popular data before query issued
– Run program on new streams when available

• No side effects
– Executed only when cluster is idle
– Low priority jobs
– Output garbage collected with high priority
– More electric bill? Not really!

Page 37: Nectar: Efficient Management of Computation and Data in Data Centers

Questions

Page 38: Nectar: Efficient Management of Computation and Data in Data Centers

Backup

Page 39: Nectar: Efficient Management of Computation and Data in Data Centers

Caching Results

[Figure: bar chart, one bar per cluster. x-axis: clusters; y-axis: Cache Hit, 0 to 0.9.]