Parrot: Transparent User-Level Middleware for Data-Intensive Computing


Parrot: Transparent User-Level Middleware for Data-Intensive Computing. Douglas Thain, Condor Project, University of Wisconsin. Workshop on Adaptive Grid Middleware, 28 September 2003.

Parrot: Transparent User-Level Middleware for Data-Intensive Computing

Douglas Thain

Condor Project, University of Wisconsin

Workshop on Adaptive Grid Middleware

28 September 2003

The Reality of the Grid

(and then a miracle happens)

P=NP

Look at my new proof!

I think you have a problem here...

run this batch job

[Diagram: process side]
User's App
Process Interface (main, exit, abort, kill, sleep)
Local Operating System
Batch systems: Condor, PBS, NQE, LSF, LoadLeveler

[Diagram: I/O side]
User's App
Parrot
I/O Interface (open, close, read, write, lseek)
Local Operating System
Storage servers (data access): Chirp, FTP, NeST, RFIO, DCAP

Applications of Parrot

• Interactive Browsing
– tcsh, tar, gzip, make, acroread, gv, xv...
• Improved Reliability
– Transparent retry/reassignment/reallocation
– Files, sockets, even repair broken apps.
• Private Namespaces
– Make /home/thain appear the same everywhere.
– Make /usr/data/calibration different everywhere.
• Dynamic/Distributed Program Construction
– Remote link, remote exec, remote eval...
• Profiling and Debugging
– Users may not know low-level I/O patterns.

Challenges

• Technical Methods of Interposition
• Semantic Differences
• Error Management
• CPU – I/O Integration
• Performance
• The butterfly effect:
– Subtle underlying differences can have large effects in performance and usability.

Internal Techniques

[Diagram with three panels: Polymorphic Extension, Static or Dynamic Re-Linking, Binary Rewriting. Labels: Library (M1, M2, NEW); App Code; Standard Library; New Library; New Code; Agent.]

External Techniques

[Diagram with three panels: Remote Filesystem, Kernel Callout, Debugger Trap. Labels: App; Agent; Kernel; filesystems NFS, LFS, FFS, USR.]

Techniques Compared

technique        burden    speed      hole detection
polymorphic      rewrite   fast       easy
static link      relink    fast       hard
dynamic link     dynlink   medium     hard
binary rewrite   dynlink   fast       hard
remote fs        root      varies     easy
callout          root      slow       easy
debugger         none      very slow  easy

Hole Detection Matters

• Dynamic Linking
– Bypass Toolkit, ca. 2000
– Works with some standard tools.
– Many still crash in strange ways.
– Doesn't apply to static executables; always a surprise.
• Debugger Trap
– Parrot: Coding began in May of 2003.
– Works reliably with almost everything in /usr/bin.
– Caveat #1: Twice as much code
– Caveat #2: Higher latency

Debugger Trap

• For the rest of this talk, we select the debugger trap for completeness and reliability. Much of the discussion still applies to the other techniques too.
• Some technical details in the paper:
– Only on Linux.
– Must manage process ancestry.
– Must fudge some broken ptrace behavior.
– Cannot write directly to process, must take roundabout path through temp file.

[Diagram: Parrot's internal structure. A User Process issues SYS_open, SYS_read, and SYS_write, which the debugger trap routes to parrot_open, parrot_read, and parrot_write. Four layers follow:
File Descriptors: 0 1 2 3 4 5 6 7 8 9 ...
File Pointers: pos: 100, pos: 0, pos: 0, pos: 1 MB, pos: 42
File Objects: "outfile", "infile", "config", "data"
Device Drivers: Local Driver, Chirp Driver, FTP Driver, NeST Driver, RFIO Driver, DCAP Driver
Other labels: name resolver, mountlist driver, chirp lookup driver.]
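The layering in the diagram above (file descriptors, file pointers, file objects, device drivers) can be sketched roughly as follows. This is only an illustration of the data structure, not Parrot's actual code: the class names, the in-memory `MemoryDriver`, and the `pread`/`pwrite` signatures are all hypothetical.

```python
from dataclasses import dataclass


class MemoryDriver:
    """Hypothetical stand-in for a device driver (local, chirp, ftp, ...)."""

    def __init__(self, files=None):
        self.files = {n: bytearray(d) for n, d in (files or {}).items()}

    def pread(self, name, length, offset):
        # Positional read: the driver itself keeps no seek pointer.
        return bytes(self.files[name][offset:offset + length])

    def pwrite(self, name, data, offset):
        buf = self.files.setdefault(name, bytearray())
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data
        return len(data)


@dataclass
class FileObject:
    name: str             # e.g. "infile"
    driver: MemoryDriver  # which driver serves this object


@dataclass
class FilePointer:
    obj: FileObject
    pos: int = 0          # current seek position


class DescriptorTable:
    """Maps small integers to file pointers, as a process's fd table does."""

    def __init__(self):
        self.fds = {}

    def _lowest_free(self):
        fd = 0
        while fd in self.fds:
            fd += 1
        return fd

    def open(self, name, driver):
        fd = self._lowest_free()
        self.fds[fd] = FilePointer(FileObject(name, driver))
        return fd

    def dup(self, fd):
        # Two descriptors share one file pointer, so they share a position.
        new_fd = self._lowest_free()
        self.fds[new_fd] = self.fds[fd]
        return new_fd

    def read(self, fd, length):
        ptr = self.fds[fd]
        data = ptr.obj.driver.pread(ptr.obj.name, length, ptr.pos)
        ptr.pos += len(data)
        return data

    def write(self, fd, data):
        ptr = self.fds[fd]
        written = ptr.obj.driver.pwrite(ptr.obj.name, data, ptr.pos)
        ptr.pos += written
        return written
```

Keeping the seek position in a separate file pointer layer is what lets duplicated descriptors share an offset while the drivers stay stateless about position.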

Adaptation

[Diagram with three panels. In each, an App calls open("/mydata/foo") through Parrot, which has Local, FTP, and Chirp drivers.
On same host: /mydata -> /usr/data
On nearby host: /mydata -> /chirp/host1/usr/mydata (chirpd)
On distant host: /mydata -> /ftp/host2/opt/DAT (ftpd, /opt/DAT)]
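The three mappings above can be read as a per-host mountlist followed by driver selection on the first path component. The following is a minimal sketch of that idea; the function names, the longest-prefix rule, and the driver prefix list are assumptions for illustration and are not taken from Parrot's source.

```python
# Mount entries copied from the three scenarios on the slide.
MOUNTLISTS = {
    "same host": [("/mydata", "/usr/data")],
    "nearby host": [("/mydata", "/chirp/host1/usr/mydata")],
    "distant host": [("/mydata", "/ftp/host2/opt/DAT")],
}

# Assumed: a leading path component naming a protocol selects that driver.
DRIVER_PREFIXES = ("chirp", "ftp", "nest", "rfio", "dcap")


def resolve(path, mountlist):
    """Rewrite path using the longest matching mount point.

    Mount points are expected without a trailing slash.
    """
    best = None
    for logical, physical in mountlist:
        if path == logical or path.startswith(logical + "/"):
            if best is None or len(logical) > len(best[0]):
                best = (logical, physical)
    if best is None:
        return path
    return best[1] + path[len(best[0]):]


def select_driver(path):
    """Pick a driver name from the first component of an absolute path."""
    parts = path.split("/")
    if len(parts) > 1 and parts[1] in DRIVER_PREFIXES:
        return parts[1]
    return "local"


if __name__ == "__main__":
    for host, mountlist in MOUNTLISTS.items():
        physical = resolve("/mydata/foo", mountlist)
        print(host, physical, select_driver(physical))
```

The application always opens /mydata/foo; only the mountlist changes from host to host, which is what lets the same unmodified program adapt to where it runs.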

What Protocol?

• File Transfer Protocol:
– Internet standard, many implementations.
– High bandwidth sequential access.
• NeST:
– General purpose storage appliance from UW.
– Virtual users, namespace, and allocation.
• RFIO:
– Remote I/O protocol used with CERN CASTOR.
– UNIX-like, most ops require a new TCP connection.
• DCAP:
– Remote I/O protocol used with Fermi D-Cache.
– UNIX-like, WORM semantics, no directories, caching.
• Chirp:
– Protocol developed at UW for Parrot.
– Corresponds very closely to UNIX, including errnos.

Small Details Matter

• Standard tools need to know subtle details, otherwise they break:
– ls -lR performs getdents("foo")
– on success: descend
– on ENOTDIR: display and continue
– on ENOENT: display error and stop.
• FTP does not provide this detail:
– Failed LIST -> error 550
– Failed GET -> error 550
– Failed CWD -> error 550
• Simple assignment doesn't work:
– Making 550=ENOENT breaks many tools.

Example Solution

[Diagram: decision tree that probes "foo" with LIST "foo", CWD "foo", and SIZE "foo". Each probe branches on the reply (200, 550, or other), and the leaves are: Success, Not a dir., Access denied., No such entry., Transient Error.]
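One plausible reading of the LIST / CWD / SIZE decision tree is sketched below. The exact branch order is a reconstruction and should be treated as an assumption, as should the choice of EAGAIN for "transient error" and the `send` callback. Like the slide, the sketch treats a 2xx reply as success; real servers answer these commands with a variety of success codes.

```python
import errno


def probe_directory(send, name):
    """Return 0 or an errno for a directory listing of `name` over FTP.

    `send` is a hypothetical callable taking a command string and
    returning the server's final numeric reply code.
    """

    def ok(code):
        return 200 <= code < 300

    code = send(f"LIST {name}")
    if ok(code):
        return 0                    # listing worked: success
    if code != 550:
        return errno.EAGAIN         # assumed mapping for "transient error"

    code = send(f"CWD {name}")
    if ok(code):
        return errno.EACCES         # a directory, but it cannot be listed
    if code != 550:
        return errno.EAGAIN

    code = send(f"SIZE {name}")
    if ok(code):
        return errno.ENOTDIR        # exists, but is a plain file
    if code != 550:
        return errno.EAGAIN
    return errno.ENOENT             # nothing by that name


def fake_server(replies):
    """Build a `send` callable from a dict of command verb -> reply code."""
    return lambda command: replies[command.split()[0]]
```

With these distinctions a tool such as ls -lR sees ENOTDIR for a plain file and ENOENT only when the name really is missing, which a flat 550=ENOENT mapping cannot provide.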

CPU-IO Integration

• Errors that cannot be expressed in the client's interface must be passed to a higher level (the batch system).
• Simple options:
– kill -9 application (retry app elsewhere)
– exit(1) application (don't retry app)
• Complex options: (Condor only)
– restart with (Subnet!="128.101.175")
– restart with (CurrentTime>5pm)
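A minimal sketch of the simple options: if an error fits the interface of the call that failed, hand it back to the application; otherwise escalate to the batch system by killing or exiting. The per-call errno sets and the `decide` function are hypothetical and chosen only for illustration.

```python
import errno

# Assumed, abbreviated sets of errors each call can express to its caller.
EXPRESSIBLE = {
    "open": {errno.ENOENT, errno.EACCES, errno.ENOTDIR, errno.EISDIR},
    "read": {errno.EBADF, errno.EINTR, errno.EIO},
    "write": {errno.EBADF, errno.EINTR, errno.EIO, errno.ENOSPC},
}

SIGKILL_NUMBER = 9  # as in "kill -9"


def decide(operation, err, retry_elsewhere):
    """Return an (action, value) pair for an error raised during `operation`."""
    if err in EXPRESSIBLE.get(operation, set()):
        return ("return", err)          # the application can handle it
    if retry_elsewhere:
        return ("kill", SIGKILL_NUMBER)  # batch system reruns the job elsewhere
    return ("exit", 1)                   # job ends; batch system does not retry
```

For example, under these assumed sets, a lost connection to a storage server during read is not something the application expects, so `decide("read", errno.ECONNRESET, retry_elsewhere=True)` yields the kill action rather than an error return.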

Bandwidth by Protocol

[Chart: bandwidth (MB/s, axis 0 to 9) versus block size (4KB, 16KB, 64KB, 256KB, 1MB, 4MB, 16MB, 64MB) for ftp, chirp, rfio, dcap, and nest. Two block sizes are marked: (unix default hint) and (parrot default hint).]

Latency by Protocol (ms)

protocol  stat    open/close  read 1B  read 8KB  write 1B  write 8KB
chirp     0.50    0.84        0.61     2.80      0.38      2.23
ftp       0.87    2.82        -        -         -         -
nest      2.51    2.53        2.96     4.48      5.53      7.41
rfio      13.41   23.11       0.50     3.32      39.8      2.85
dcap      152.53  159.09      40.05    3.01      40.14     3.14

Andrew-Like Benchmark

• Original Andrew benchmark is no longer appropriate, so replace with the Parrot source: 296 files, 955 KB.
• Copy the source to a remote device, then manipulate in five stages:
– copy: cp -rp
– list: ls -lR
– scan: grep searchstring -r *
– make: make
– delete: rm -rf *
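A small timing harness for the five stages might look like the sketch below. The slides do not give the full command arguments, so the `source_dir` parameter and the `cp` operands are placeholders; the function names are likewise hypothetical. The delete stage removes everything in the working directory, so this should only ever be pointed at a scratch directory.

```python
import shlex
import subprocess
import time


def andrew_like_stages(source_dir, search_string="searchstring"):
    """Return the five (name, shell command) stages from the slide."""
    return [
        ("copy", f"cp -rp {shlex.quote(source_dir)}/. ."),
        ("list", "ls -lR"),
        ("scan", f"grep {shlex.quote(search_string)} -r *"),
        ("make", "make"),
        ("delete", "rm -rf *"),
    ]


def run_stages(stages, cwd):
    """Run each stage in `cwd`; return (name, exit status, seconds) tuples."""
    results = []
    for name, command in stages:
        start = time.perf_counter()
        done = subprocess.run(
            command,
            shell=True,  # the stage commands rely on shell globbing
            cwd=cwd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        elapsed = time.perf_counter() - start
        results.append((name, done.returncode, elapsed))
    return results
```

The exit status is recorded rather than raised because some stages can legitimately fail on a given protocol, and a failed stage is itself a result worth reporting.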

Overheads Compared

[Chart: time (s, axis 0 to 160) for each benchmark stage (copy, list, scan, make, delete). Series: parrot only, + chirp, + lan, + cache.]

Overheads Compared

[Chart: the same data with the time axis limited to 0 to 10 s. Series: parrot only, + chirp, + lan, + cache.]

Protocols Compared

[Chart: time (s, axis 0 to 350) for each benchmark stage (copy, list, scan, make, delete). Series: chirp, ftp, nest, rfio (failed), dcap (no dirs).]

Protocols Compared

[Chart: the same data with the time axis limited to 0 to 50 s. Series: chirp, ftp, nest, rfio (failed), dcap (no dirs).]

Moral of the story:

• The butterfly effect: Small underlying differences can have big effects on performance and reliability.
• Examples in interposition:
– Dynamic linking: fast but poor hole detection.
– Debugger trap: slow but good hole detection.
• Examples in protocols:
– Chirp: UNIX semantics restrict bandwidth.
– FTP: Need for multiple ops increases latency.
– NeST: Powerful virtualization increases latency.
– RFIO: Connection per op doesn't scale.

For more info...

• Douglas Thain
– thain@cs.wisc.edu
• Miron Livny
– miron@cs.wisc.edu
• Software, manuals, more info:
– http://www.cs.wisc.edu/condor/parrot
• The Condor Project:
– http://www.cs.wisc.edu/condor
