OS-Assisted Task Preemption for Hadoop. Mario Pastorelli, Matteo Dell’Amico, Pietro Michiardi. EURECOM, France. DCPerf 2014, Madrid, 30 June 2014.

OS-Assisted Task Preemption for Hadoop

Jun 11, 2015


This work introduces a new task preemption primitive for Hadoop that allows tasks to be suspended and resumed by exploiting the memory management mechanisms readily available in modern operating systems. Our technique fills the gap between the two extreme cases of killing tasks (which wastes work) and waiting for their completion (which introduces latency): experimental results indicate superior performance and very small overheads when compared to existing alternatives.
Transcript
Page 1: OS-Assisted Task Preemption for Hadoop

OS-Assisted Task Preemption for Hadoop

Mario Pastorelli, Matteo Dell’Amico, Pietro Michiardi

EURECOM, France

DCPerf 2014

Madrid, 30 June 2014

Page 2: OS-Assisted Task Preemption for Hadoop

Outline

1. Why Task Preemption On Hadoop

2. Our Approach

3. Experiments

Page 3: OS-Assisted Task Preemption for Hadoop

Why Task Preemption On Hadoop


Page 4: OS-Assisted Task Preemption for Hadoop

Hadoop MapReduce

Bring the computation to the data – split in blocks across a cluster.

Map

One task per block

Hadoop filesystem (HDFS): typically, 64–512 MB

Stores locally key-value pairs

e.g., for word count: [(red, 15), (green, 7), ...]

Reduce

# of tasks set by the programmer

Mapper output is partitioned by key and pulled from “mappers”

The Reduce function operates on all values for a single key

e.g., (green, [7, 42, 13, ...])
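
The word count flow described on this slide can be sketched in a few lines. This is a single-process illustration, not Hadoop code: the function names, the hash-based partitioner, and the default of two reducers are all assumptions made for the example.

```python
from collections import defaultdict


def map_block(block):
    """Map: count the words of one input block, emitting (word, count) pairs."""
    counts = defaultdict(int)
    for word in block.split():
        counts[word] += 1
    return list(counts.items())


def partition(pairs, num_reducers):
    """Assign each pair to a reducer by hashing its key (illustrative partitioner)."""
    parts = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        parts[hash(key) % num_reducers].append((key, value))
    return parts


def reduce_partition(pairs):
    """Reduce: group values by key, then sum all values seen for each key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


def word_count(blocks, num_reducers=2):
    mapper_outputs = [map_block(b) for b in blocks]  # one map task per block
    reducer_inputs = [[] for _ in range(num_reducers)]
    for output in mapper_outputs:
        for r, part in enumerate(partition(output, num_reducers)):
            reducer_inputs[r].extend(part)  # each reducer pulls its own partition
    result = {}
    for pairs in reducer_inputs:
        result.update(reduce_partition(pairs))
    return result
```

Because every pair with the same key lands in the same partition, each reducer sees all the values for the keys it owns, which is what lets the Reduce function operate on one key at a time.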

Page 6: OS-Assisted Task Preemption for Hadoop

High-Priority Tasks

MapReduce jobs are made of several tasks

we will focus on the task granularity

Sometimes you have high-priority tasks

humans waiting for the results

high-value computations

Some tasks may take very long

errors in implementation

simply, a lot of computation

Solution: preempt low-priority tasks and give the resources they are using to high-priority ones

Page 7: OS-Assisted Task Preemption for Hadoop

Preemptive Scheduling

Priority can be decided by a scheduler

fairness: guarantee that no user can “cheat the system” [Zaharia et al., EuroSys 2010]

deadline scheduling: ensure jobs are completed by a due date [Kc and Anyanwu, CloudCom 2010]

optimize response time: let small jobs pass in front [Wolf et al., Middleware 2010; Pastorelli et al., BIGDATA 2013]

Page 8: OS-Assisted Task Preemption for Hadoop

In Hadoop, Now

Currently, Hadoop can only preempt tasks by killing them

this wastes work

…or you just wait for them to finish

this introduces latency

We want to do better!

Page 9: OS-Assisted Task Preemption for Hadoop

Our Approach


Page 10: OS-Assisted Task Preemption for Hadoop

Delegating To the OS

Hadoop tasks are standard POSIX processes

they communicate through POSIX signals

We use the same strategy: SIGTSTP, SIGCONT

Our implementation mirrors the one for killing tasks in Hadoop

SIGTSTP takes the place of SIGTERM

The state of the computation is implicitly saved by the OS

will be paged to disk if necessary
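
The signal-based primitive can be illustrated with a minimal Python sketch. It is not the authors' Hadoop patch: the `SIGNALS` table, the `preempt` function, and its `send` parameter are hypothetical names, and only the mapping of actions to SIGTERM, SIGTSTP, and SIGCONT comes from the slide. The sketch assumes a POSIX system.

```python
import os
import signal

# Action-to-signal mapping. "kill" is the path Hadoop already has;
# "suspend" and "resume" are the two actions added by the new primitive.
SIGNALS = {
    "kill": signal.SIGTERM,
    "suspend": signal.SIGTSTP,
    "resume": signal.SIGCONT,
}


def preempt(pid, action, send=os.kill):
    """Deliver the signal for `action` to the task process `pid`.

    `send` defaults to os.kill. It is a parameter only so that the
    function can be exercised without touching a real process.
    """
    sig = SIGNALS[action]
    send(pid, sig)
    return sig
```

A caller would use `preempt(task_pid, "suspend")` where it previously killed the task, and `preempt(task_pid, "resume")` once resources are free again. Nothing is saved explicitly: the stopped process keeps its memory image, and the OS pages it out only if memory runs short.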

Page 11: OS-Assisted Task Preemption for Hadoop

The OS and Paging

Memory is occupied by running processes and file system cache

When it is full, pages are evicted from memory

Least Recently Used-like policy

Prioritizing clean pages (not modified after reading)

they don’t need to be paged out

Page-out operations are clustered to improve throughput

disk seeks are amortized

Thrashing: when the working set (memory used by running programs) is larger than memory
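
The eviction policy described here can be made concrete with a toy model. The sketch below is not how any real kernel is implemented: `ToyPageCache` and its fields are invented for illustration, and it only captures two ideas from the slide, namely least-recently-used ordering and the preference for clean pages, which can be dropped without any disk write.

```python
from collections import OrderedDict


class ToyPageCache:
    """Fixed-size page cache with an LRU-like, clean-pages-first eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page id -> dirty flag, least recent first
        self.page_outs = 0          # number of evictions that needed a disk write

    def access(self, page, write=False):
        # A page stays dirty once written; a new page is dirty only on write.
        dirty = self.pages.pop(page, False) or write
        if len(self.pages) >= self.capacity:
            self._evict()
        self.pages[page] = dirty    # re-insert as most recently used

    def _evict(self):
        # Prefer the least recently used clean page: dropping it costs no I/O.
        for page, dirty in self.pages.items():
            if not dirty:
                del self.pages[page]
                return
        # Every page is dirty: page out the least recently used one.
        self.pages.popitem(last=False)
        self.page_outs += 1
```

With a capacity of two pages, a dirty page survives as long as some clean page can be evicted instead, and `page_outs` only grows once all resident pages are dirty.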

Page 12: OS-Assisted Task Preemption for Hadoop

OS and Paging In Our Case

In Hadoop, a best practice is to configure the OS to prioritize running processes over disk cache

Hadoop reads in streams, so cache is not important

This minimizes paging out

Paging out is done efficiently

close to maximum disk speed

No thrashing!

suspended tasks are not in the working set
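
The no-thrashing argument is simple arithmetic, sketched below under stated assumptions: the `Task` record, the function names, and the memory figures in the usage note are hypothetical, and the model treats a suspended task as contributing nothing to the working set, as the slide states.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    memory_mb: int
    suspended: bool = False


def working_set_mb(tasks):
    """Memory in active use: a suspended task never touches its pages."""
    return sum(t.memory_mb for t in tasks if not t.suspended)


def would_thrash(tasks, ram_mb):
    """Thrashing happens when the working set exceeds physical memory."""
    return working_set_mb(tasks) > ram_mb
```

For example, two tasks of 2048 MB each on a machine with 3072 MB of RAM would thrash if both ran at once. Once the low-priority task is suspended, the working set drops to 2048 MB and its pages can be swapped out once, rather than repeatedly.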

Page 13: OS-Assisted Task Preemption for Hadoop

Experiments


Page 14: OS-Assisted Task Preemption for Hadoop

Experimental Settings

tl, th: tasks with low and high priority

Synthetic tasks parsing randomly generated data

512 MB blocks

We vary the arrival time of th

Page 15: OS-Assisted Task Preemption for Hadoop

Standard Case

[Figure: two plots comparing the wait, kill, and susp strategies. x-axis of both: tl progress at launch of th (%), from 10 to 90. First plot y-axis: sojourn time of th (s), ticks from 80 to 150. Second plot y-axis: makespan (s), ticks from 170 to 240.]
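
To make the two plotted metrics concrete, here is an idealized model of the three strategies. It is a sketch, not a reproduction of the measurements: it assumes a single execution slot, measures makespan from the start of tl, and charges any suspension cost as a flat `overhead` parameter. The function name and all parameters are hypothetical.

```python
def timings(strategy, d_low, d_high, progress, overhead=0.0):
    """Return (sojourn time of th, makespan) in seconds.

    d_low, d_high: run times of tl and th when each runs alone.
    progress: fraction of tl completed when th arrives (0 to 1).
    overhead: extra seconds charged to suspension (for example, paging).
    """
    done = progress * d_low  # time tl has already run when th arrives
    if strategy == "wait":
        sojourn = (d_low - done) + d_high    # th waits for tl to finish
        makespan = d_low + d_high
    elif strategy == "kill":
        sojourn = d_high                     # th starts immediately
        makespan = done + d_high + d_low     # tl restarts from scratch
    elif strategy == "susp":
        sojourn = d_high + overhead          # th starts immediately
        makespan = d_low + d_high + overhead # tl resumes where it stopped
    else:
        raise ValueError(strategy)
    return sojourn, makespan
```

In this model, waiting penalizes the sojourn time of th, killing penalizes the makespan by the work thrown away, and suspension matches the better of the two on each metric whenever the overhead is small.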

Page 16: OS-Assisted Task Preemption for Hadoop

Worst Case

[Figure: two plots comparing the wait, kill, and susp strategies. x-axis of both: tl progress at launch of th (%), from 10 to 90. First plot y-axis: sojourn time of th (s), ticks from 80 to 150. Second plot y-axis: makespan (s), ticks from 170 to 250.]

Each job allocates 2 GB of memory

it’s a lot, requires modifying the Hadoop configuration

Page 17: OS-Assisted Task Preemption for Hadoop

Overheads Due To Memory Usage

[Figure: x-axis: memory allocated by th, from 0 to 2.5 GB. Two y-axes: paged bytes (MB), ticks from 200 to 1600, and overhead (s), ticks from 0 to 25. Series: swap, makespan, th sojourn time.]

Page 18: OS-Assisted Task Preemption for Hadoop

Another Approach: Natjam

Natjam [Cho et al., SoCC 2013] works at the application layer

Requires explicit handling by the application:

currently works for stateless MapReduce programs

proposes hooks for serialization/deserialization to deal with state

Pro

Might compress the amount of data written to disk

Con

Would always, pessimistically, write to disk

Incurs serialization/deserialization overhead

The two approaches can both be available to a scheduler

Page 20: OS-Assisted Task Preemption for Hadoop

Resume Locality

A suspended task can only be resumed on the machine where it was suspended

process migration could be implemented, but it would be expensive

…or you could just restart the task from scratch

Delay scheduling [Zaharia et al., EuroSys 2010]: wait until a threshold before scheduling non-local work

can also be done here
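
One way to combine resume locality with a delay-scheduling threshold is sketched below. This is an assumption about how a scheduler might apply the idea, not something the slides specify: the function name, its parameters, and the choice of the first free node in sorted order are all hypothetical.

```python
def placement(suspended_node, free_nodes, waited_s, threshold_s):
    """Decide what to do with a suspended task.

    Returns ("resume", node), ("restart", node), or ("wait", None).
    """
    if suspended_node in free_nodes:
        # The task's memory image lives here, so resuming loses no work.
        return ("resume", suspended_node)
    if waited_s >= threshold_s and free_nodes:
        # Waited long enough: give up locality and restart from scratch.
        return ("restart", sorted(free_nodes)[0])
    # Keep waiting for a slot on the node that holds the suspended task.
    return ("wait", None)
```

The threshold trades lost work against latency: a short threshold restarts tasks eagerly on other nodes, while a long one favors resuming in place.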

Page 21: OS-Assisted Task Preemption for Hadoop

Implications On Scheduling

To optimize wall time, suspend tasks that are closest to completion

avoid stragglers (late tasks) as much as possible

To avoid redundant work, suspend tasks with smaller memory footprint

avoid swapping overheads
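
The two selection rules on this slide can be written as a small victim-selection helper. The `RunningTask` record, the `pick_victim` function, and the goal names are hypothetical. Only the two criteria themselves come from the slide.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    task_id: str
    progress: float   # fraction completed, 0 to 1
    memory_mb: int    # resident memory footprint


def pick_victim(tasks, goal):
    """Choose which running task to suspend, following the slide's two rules."""
    if goal == "wall_time":
        # Rule from the slide: suspend the task closest to completion.
        return max(tasks, key=lambda t: t.progress)
    if goal == "low_overhead":
        # Rule from the slide: suspend the task with the smallest footprint,
        # so that less memory may have to be swapped out.
        return min(tasks, key=lambda t: t.memory_mb)
    raise ValueError(goal)
```

A real scheduler would likely blend the two criteria. The sketch keeps them separate to mirror the slide.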

Page 22: OS-Assisted Task Preemption for Hadoop

Implementing Suspension-Friendly Tasks

Controlling Memory Footprint

It could be worthwhile to optimize for using less memory

Hint the garbage collector to run on suspension

Use garbage collectors that do deallocate RAM

External State

Some tasks interact with the outside world

Suspension should be handled correctly, but probably needs testing
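
The idea of running the garbage collector on suspension can be sketched as a signal handler. Hadoop tasks run on the JVM, so this Python version is only an analogy: `make_suspend_handler`, `install`, and `_stop_self` are hypothetical names, and whether a collection actually returns memory to the OS depends on the runtime and allocator.

```python
import gc
import os
import signal


def _stop_self():
    # SIGSTOP cannot be caught, so this really stops the process
    # until another process sends SIGCONT.
    os.kill(os.getpid(), signal.SIGSTOP)


def make_suspend_handler(cleanup=gc.collect, stop=_stop_self):
    """Build a SIGTSTP handler that shrinks memory before stopping."""
    def handler(signum, frame):
        cleanup()  # free what we can so less memory may need swapping
        stop()     # then stop, as the default SIGTSTP action would have
    return handler


def install():
    """Register the handler for SIGTSTP in the current (task) process."""
    signal.signal(signal.SIGTSTP, make_suspend_handler())
```

After `install()`, a SIGTSTP sent to the task triggers a collection first and only then stops the process. Execution continues after the `stop()` call when SIGCONT arrives.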

Page 23: OS-Assisted Task Preemption for Hadoop

Take-Home Messages

Task preemption is important for Hadoop scheduling

priorities, fairness, deadlines, size-based schedulers, …

We do not need to reinvent the wheel

OSes have been suspending processes for many years

they do it well, let’s just use them!

Swapping is not bad per se

Hadoop mechanisms keep the working set under control and avoid thrashing