Yahoo! Research Johns Hopkins University Chris Olston Anish Das Sarma Xiaodan Wang Randal Burns Shared Scan Batch Scheduling in Cloud Computing

Apr 01, 2015

Page 1

Shared Scan Batch Scheduling in Cloud Computing

Chris Olston, Anish Das Sarma (Yahoo! Research)

Xiaodan Wang, Randal Burns (Johns Hopkins University)

Page 2

Project Goals

Eliminate redundant data processing for concurrent workflows that access the same dataset in the Cloud

Batch MapReduce workflows to enable scan sharing
– Single-pass scan of shared data segments
– Alleviate contention and improve scalability
– Utilize fewer map/reduce slots under load

Data-intensive workloads (tens of minutes to hours)
– Joins across multiple datasets
– User-specified rewards for early completion

Trade-offs between efficient resource utilization and deadlines

Page 3

Data-Driven Batch Scheduling

Throughput scales with contention (Astro. & Turbulence)
Decompose into sub-queries based on data access
Co-schedule sub-queries to amortize I/O
Evaluate data atoms based on utility metric
– Reordering based on contention vs. arrival order (CIDR’09)
– Adaptive starvation resistance
– Job-aware (queries with data dependency) (SC’10)

[Figure: Turbulence DB example. Queries Q1 (R1, R2, R3), Q2 (R2, R3, R4), and Q3 (R1, R2) pass through Decomposition. The "Data Access by Query" panel groups sub-queries per data region (R2: Q1, Q2, Q3; R1: Q1, Q3; R3: Q1, Q2). The "Co-schedule by Sub-query" panel feeds the batch scheduler, which produces the query results.]
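The decomposition and co-scheduling steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scheduler: the query-to-region mapping is taken from the figure labels, the function names `decompose` and `schedule_by_contention` are hypothetical, and plain contention (the number of waiting sub-queries) stands in for the utility metric, which the slide does not define.

```python
from collections import defaultdict

# Illustrative workload: each query lists the data atoms (regions) it reads.
queries = {
    "Q1": ["R1", "R2", "R3"],
    "Q2": ["R2", "R3", "R4"],
    "Q3": ["R1", "R2"],
}


def decompose(queries):
    """Split every query into sub-queries and group them by data atom."""
    by_atom = defaultdict(list)
    for query, atoms in queries.items():
        for atom in atoms:
            by_atom[atom].append(query)
    return dict(by_atom)


def schedule_by_contention(by_atom):
    """Order atoms by how many sub-queries wait on them (assumed utility).

    Ties are broken by atom name so the order is deterministic.
    """
    return sorted(by_atom, key=lambda atom: (-len(by_atom[atom]), atom))


by_atom = decompose(queries)
order = schedule_by_contention(by_atom)

# With sharing, each atom is read once; without it, once per sub-query.
reads_shared = len(order)
reads_unshared = sum(len(atoms) for atoms in queries.values())
```

Each atom in `order` is read once and serves every sub-query waiting on it, which is how co-scheduling amortizes I/O. A purely contention-driven order can starve queries that touch unpopular atoms, which is why the slide lists adaptive starvation resistance separately.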

Page 4

Application in Cloud Computing

Fixed Cloud (fixed resources)
– Single-pass scan of shared data
– Alleviate contention (utilize fewer map/reduce slots, shared loading and shuffling of data)
– Earn rewards for early completion (soft deadlines)
– Local improvement w/ simulated annealing, greedy ordering

Elastic Cloud
– Machine charge = (# of machines) x (# hours)
– Speed-up factors w/ more machines (i.e. more parallelism)
– Add machines to meet soft deadlines
– Aggressive batching to minimize machine charge (efficiency)
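The elastic-cloud trade-off can be made concrete with a small Python sketch. It uses the slide's charge model, (# of machines) x (# hours), and searches for the cheapest machine count that still meets a soft deadline. The square-root speed-up curve and the names `cheapest_allocation` and `sqrt_speedup` are assumptions chosen for illustration; the slides do not specify a speed-up function.

```python
import math


def machine_charge(num_machines, hours):
    """Charge model from the slide: (# of machines) x (# hours)."""
    return num_machines * hours


def cheapest_allocation(base_hours, deadline_hours, max_machines, speedup):
    """Return (machines, charge, hours) for the cheapest feasible allocation.

    base_hours is the runtime on one machine; speedup(n) is the assumed
    speed-up factor with n machines. Returns None if no allocation up to
    max_machines meets the deadline.
    """
    best = None
    for n in range(1, max_machines + 1):
        hours = base_hours / speedup(n)
        if hours > deadline_hours:
            continue  # too slow: add machines
        charge = machine_charge(n, hours)
        if best is None or charge < best[1]:
            best = (n, charge, hours)
    return best


def sqrt_speedup(n):
    """Hypothetical sub-linear speed-up: doubling machines gives < 2x."""
    return math.sqrt(n)


plan = cheapest_allocation(base_hours=12, deadline_hours=4,
                           max_machines=16, speedup=sqrt_speedup)
infeasible = cheapest_allocation(base_hours=12, deadline_hours=1,
                                 max_machines=16, speedup=sqrt_speedup)
```

Under a sub-linear speed-up, every extra machine raises the total charge, so the cheapest plan is the smallest allocation that meets the deadline. That is the tension between adding machines for deadlines and batching aggressively for efficiency.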

Page 5

Sample Pig

-- load the input with three fields
A = load 'input1' as (a, b, c);
-- keep rows where a exceeds 5 and store them
B = filter A by a > 5;
store B into 'output1';
-- group the filtered rows by b and store the groups
C = group B by b;
store C into 'output2';

Nova Workflow Platform

What is Nova?
– Content management and workflow scheduling for the Cloud
– Leverages existing resources
    Cloud Data: HDFS/Zebra storage
    Cloud Computing: Oozie, Pig/MR/Hadoop

Users define complex workflows in Oozie that consume the data

[Figure: software stack. Storage: HDFS; Processing: Hadoop M-R; Dataflow: Pig; Simple workflow: Oozie; Advanced workflow: Nova; applications: App 1, App 2, App 3.]

Oozie: Workflow engine for coordinating Map-Reduce/Pig jobs in Hadoop (i.e., a workflow DAG in which nodes are MR tasks and edges are dataflows)
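To illustrate the DAG view, the following Python sketch models the sample Pig script from this slide as a graph and derives a valid execution order with the standard-library `graphlib` module (Python 3.9+). The nodes here are Pig statements rather than real MR tasks, and none of this is Oozie's API; it only shows the shape of a workflow in which edges are dataflows.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose output it consumes.
dag = {
    "A = load 'input1'": set(),
    "B = filter A": {"A = load 'input1'"},
    "store B": {"B = filter A"},
    "C = group B": {"B = filter A"},
    "store C": {"C = group B"},
}

# static_order() yields nodes so that every producer precedes its consumers.
order = list(TopologicalSorter(dag).static_order())


def respects_dependencies(order, dag):
    """Check that each node appears after all of its predecessors."""
    position = {node: i for i, node in enumerate(order)}
    return all(
        position[pred] < position[node]
        for node, preds in dag.items()
        for pred in preds
    )
```

Both `store` statements depend on the same filtered relation `B`, so the load and the filter run once and feed two outputs. Several valid orders exist, so the sketch checks dependencies rather than one fixed sequence.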

Page 6

Sample Nova Workflow

[Figure: sample Nova workflow for entity extraction.
Nova data (schema):
– crawled pages (url, content)
– candidate entity occurrences (url, entity string)
– entities (entity id, entity string)
– validated entity occurrences (url, entity id)
– entity occurrence counts (entity id, count)
Sources and Nova tasks: crawler, editors, candidate entity extractor, join, group-wise count, output.]

Page 7

Shared Scan via Workflow Merging

[Figure: Nova Workflow 1 and Nova Workflow 2 each include the dataset c2s0. The Workflow Merger combines them into Nova Workflow 1.2, which scans c2s0 once. Datasets shown: c1s0, c2s0, c3s0, c4s0, c5s0. Legend: input data, Pig/MR tasks, output data.]

Sample Use Cases in Nova
– Concurrent research, production, and maintenance workflows over the same data
– Content enrichment workflows (e.g., dedup, clustering) over news content
– Webmap workflows consuming the same URL table
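Workflow merging can be illustrated with a short Python sketch that groups tasks by the dataset they read, so that each dataset is scanned once for all of its consumers. The dataset names come from the slide, but the task names and the exact input/output wiring are assumptions, since the figure's edges are not recoverable from the transcript. The `merge` function is a hypothetical stand-in, not Nova's Workflow Merger.

```python
from collections import Counter

# Each task is (task name, input dataset, output dataset). Hypothetical wiring.
workflow_1 = [("task_a", "c1s0", "c3s0"), ("task_b", "c2s0", "c4s0")]
workflow_2 = [("task_c", "c2s0", "c5s0")]


def scans_unmerged(workflows):
    """Count how often each input is scanned when workflows run separately."""
    return Counter(inp for wf in workflows for _, inp, _ in wf)


def merge(workflows):
    """Group tasks by input so each dataset is scanned once for all readers."""
    merged = {}
    for wf in workflows:
        for task, inp, out in wf:
            merged.setdefault(inp, []).append((task, out))
    return merged


before = scans_unmerged([workflow_1, workflow_2])
merged = merge([workflow_1, workflow_2])

total_scans_before = sum(before.values())
total_scans_after = len(merged)  # one scan per distinct input dataset
```

In this toy case the shared input `c2s0` is scanned twice before merging and once afterwards, while every output is still produced.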

Page 8

[Figure: execution plan of merged jobs. Components shown: Input Data; Map 1, Map 2, …, Map n, each with Split(Tuple) and nested plans; Combine(Tuple); Shuffle; Reduce 1, Reduce 2, …, Reduce m, each with Demux(Tuple) and nested plans; Output Data. Callouts (1), (2), and (3) mark where the performance impacts below apply.]

Performance Impact
(1) Shared loading (network, redundant proc.)
(2) Consolidated computation (shared startup/tear down)
(3) Reducer parallelism (Max/Sum # of reducers)

Page 9

Completion Time by Scheduling Strategy

[Figure: time (ms, axis from 0 to 3,000,000) vs. # of shingling workflows (1 to 6) for three strategies: Sequential-NoMerge, Concurrent-NoMerge, and Merged.]

Performance in Nova for different enrichment workflows (e.g., de-dup) on news content (SIGMOD’11)

Page 10

Utilization of Grid Resources (Slot Time)

[Figure: slot time (ms, axis from 0 to 4,000,000) vs. # of shingling workflows (1 to 7) for four series: Concurrent-NoMerge Map, Concurrent-NoMerge Reduce, Merge Map, and Merge Reduce.]

Page 11

PigMix: Load Cost Savings

Page 12

PigMix: Estimating Makespan

Page 13

Ongoing Work

Starvation resistance
– Account for heterogeneity in workflow sizes
– Provide soft deadline guarantees
– Handling cascading failures
– Prefer jobs with high load cost (less dilation, high slot time savings, map-only jobs)

Predicting workflow runtime and frequency
– Robustness to inaccuracies in cost estimates
– Conserve or expend Cloud resources based on deadline requirements and system load

Jobs that join/scan multiple input sources

Page 14

Questions?

Page 15

Nova Workflow Platform

Nova features
– Abstraction for complex workflows that consume data
    Incrementally arriving data (logs, crawls, feeds, ...)
    Incremental processing of arriving data
    – Stateless: shingle every newly-crawled page
    – Stateful: maintain inlink counts as web grows
    Scheduling processing steps
    – Periodic: run inlink counter once per week
    – Triggered: run inlink counter after link extractor
– Provides provenance, metadata management, incremental processing (i.e. joins), data replication, transactional guarantees

Page 16

PigMix: Reducer Parallelism

Page 17

Optimizing for Shared Scan

Define a job J (i.e. MapReduce or Pig)
– Scans files $f(J) = (F_1, \ldots, F_i)$, with scan time $s(F_i)$ per file
– Fixed processing cost $c(J)$

$d(J)$ defines a soft deadline for each job
– Step: $d$ is defined by $n$ pairs $(t_i, p_i)$, where $0 < t_i < t_{i+1}$ and $p_i > p_{i+1}$ (a job that completes by $t_i$ is awarded $p_i$ points)
– Linear decay: enforce eventual completion w/ negative points

Cost of shared scan for jobs $J_1$ and $J_2$

$c(J_1) + c(J_2) + \sum_{F \in f(J_1) \cup f(J_2)} s(F)$

Maximize points and minimize resources
– Local improvement w/ simulated annealing, greedy ordering
– Aggressive batching when load is high
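The cost model and the step-function deadline on this slide can be worked through in Python. The sketch below computes the shared-scan cost as the sum of the fixed costs plus one scan of each file in the union of the jobs' inputs, and awards points from (t, p) pairs. The job data, file names, and the function names are hypothetical, and the linear penalty after the last step is only one possible reading of the "linear decay" bullet.

```python
def shared_scan_cost(jobs, scan_time):
    """Sum of fixed costs c(J) plus one scan s(F) per file in the union."""
    files = set().union(*(job["files"] for job in jobs))
    return sum(job["cost"] for job in jobs) + sum(scan_time[f] for f in files)


def separate_cost(jobs, scan_time):
    """Cost when every job scans its own inputs independently."""
    return sum(shared_scan_cost([job], scan_time) for job in jobs)


def reward(steps, completion_time, decay_rate=0):
    """Points for a job under a step deadline.

    steps is a list of (t, p) pairs with increasing t and decreasing p.
    After the last step, points fall linearly below zero at decay_rate
    per time unit (an assumed form of the slide's linear decay).
    """
    for t, p in steps:
        if completion_time <= t:
            return p
    last_t = steps[-1][0]
    return -decay_rate * (completion_time - last_t)


# Hypothetical jobs and scan times, in arbitrary time units.
scan_time = {"F1": 10, "F2": 20, "F3": 30}
j1 = {"files": {"F1", "F2"}, "cost": 5}
j2 = {"files": {"F2", "F3"}, "cost": 3}

shared = shared_scan_cost([j1, j2], scan_time)   # F2 is scanned once
unshared = separate_cost([j1, j2], scan_time)    # F2 is scanned twice

steps = [(10, 100), (20, 50)]
```

The saving from batching equals the scan time of the files the jobs have in common, here `F2`. Batching may also delay a job past one of its steps, so a scheduler has to weigh the saved scan time against the points lost.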

Page 18

Performance Evaluation

Experimental Setup
– Nova with Shared Scan Module
– 200-node Hadoop cluster
    128 MB HDFS block size
    1 GB RAM per node
    640 mapper and 320 reducer slots
– Shingling workflow (offline content enrichment)
    De-duplication of news
    Filter and extract features from content
    Cluster content by feature and pick one per cluster
    Execution of multiple de-dup workflows using different clustering algorithms
– Scheduling strategies compared
    Sequential-NoMerge (slower, conserves Grid resources)
    Concurrent-NoMerge (fast, elastic Grid resources)
    Merged (fast, conserves Grid resources)