Parrot: Transparent User-Level Middleware for Data-Intensive Computing. Douglas Thain, Condor Project, University of Wisconsin. Workshop on Adaptive Grid Middleware, 28 September 2003.
Page 1: Parrot: Transparent User-Level Middleware for Data-Intensive Computing

Parrot: Transparent User-Level Middleware for Data-Intensive Computing

Douglas Thain

Condor Project, University of Wisconsin

Workshop on Adaptive Grid Middleware

28 September 2003

Page 2

The Reality of the Grid

(and then a miracle happens)

P=NP

Look at my new proof!

I think you have a problem here...

Page 3

[Diagram: two side-by-side stacks, each with the User's App on top of the Local Operating System.]

• Process side ("run this batch job")
  – Process Interface (main, exit, abort, kill, sleep)
  – Batch systems: Condor, PBS, NQE, LSF, LoadLeveler
• I/O side ("access data")
  – Parrot sits under the User's App
  – I/O Interface (open, close, read, write, lseek)
  – Protocols to the Storage Server: Chirp, FTP, NeST, RFIO, DCAP

Page 4

Applications of Parrot

• Interactive Browsing
  – tcsh, tar, gzip, make, acroread, gv, xv...
• Improved Reliability
  – Transparent retry/reassignment/reallocation
  – Files, sockets, even repair broken apps.
• Private Namespaces
  – Make /home/thain appear the same everywhere.
  – Make /usr/data/calibration different everywhere.
• Dynamic/Distributed Program Construction
  – Remote link, remote exec, remote eval...
• Profiling and Debugging
  – Users may not know low-level I/O patterns.

Page 5

Challenges

• Technical Methods of Interposition
• Semantic Differences
• Error Management
• CPU – I/O Integration
• Performance
• The butterfly effect:
  – Subtle underlying differences can have large effects in performance and usability.

Page 6

Internal Techniques

[Diagram with three panels: Polymorphic Extension, Static or Dynamic Re-Linking, Binary Rewriting. Box labels: Library (methods M1, M2, NEW), App Code, Standard Library, New Library, New Code.]

Page 7

External Techniques

[Diagram with three panels: Remote Filesystem, Kernel Callout, Debugger Trap. Box labels in each panel: App, Agent, Kernel, and the kernel filesystem modules NFS, LFS, FFS, USR.]

Page 8

Techniques Compared

technique       burden    speed      hole detection
polymorphic     rewrite   fast       easy
static link     relink    fast       hard
dynamic link    dynlink   medium     hard
binary rewrite  dynlink   fast       hard
remote fs       root      varies     easy
callout         root      slow       easy
debugger        none      very slow  easy

Page 9

Hole Detection Matters

• Dynamic Linking
  – Bypass Toolkit, ca. 2000
  – Works with some standard tools.
  – Many still crash in strange ways.
  – Doesn’t apply to static exes; always a surprise.
• Debugger Trap
  – Parrot: Coding began in May of 2003.
  – Works reliably with almost everything in /usr/bin.
  – Caveat #1: Twice as much code
  – Caveat #2: Higher latency

Page 10

Debugger Trap

• For the rest of this talk, we select the debugger trap for completeness and reliability. Much of the discussion still applies to the other techniques too.
• Some technical details in the paper:
  – Only on Linux.
  – Must manage process ancestry.
  – Must fudge some broken ptrace behavior.
  – Cannot write directly to process, must take roundabout path through temp file.

Page 11

[Diagram: Parrot's internal layers, from the traced process down to the drivers.]

• User Process: SYS_open, SYS_read, SYS_write (debugger trap)
• Parrot entry points: parrot_open, parrot_read, parrot_write
• File Descriptors: 0 1 2 3 4 5 6 7 8 9 ...
• File Pointers: pos:100, pos:0, pos:0, pos:1 MB, pos:42
• File Objects: “outfile”, “infile”, “config”, “data”
• Device Drivers: Local Driver, Chirp Driver, FTP Driver, NeST Driver, RFIO Driver, DCAP Driver
• Name resolver: mountlist driver, chirp lookup driver
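The layering named on this slide (file descriptors, file pointers with a position, file objects, device drivers) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Parrot's actual code: the classes `MemoryDriver`, `FileObject`, `FilePointer` and `DescriptorTable` are hypothetical, and the in-memory driver merely stands in for a real Local or Chirp driver. Only the layer names and the `parrot_open`, `parrot_read`, `parrot_write` entry points come from the slide.

```python
class MemoryDriver:
    """Hypothetical stand-in for a device driver (Local, Chirp, FTP, ...)."""

    def __init__(self):
        self.files = {}  # file name -> bytearray holding its contents

    def open(self, name):
        # Create the file on first open; real drivers would contact storage.
        self.files.setdefault(name, bytearray())

    def pread(self, name, length, offset):
        return bytes(self.files[name][offset:offset + length])

    def pwrite(self, name, data, offset):
        buf = self.files[name]
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # pad a sparse gap
        buf[offset:offset + len(data)] = data
        return len(data)


class FileObject:
    """A named file bound to the driver that serves it."""

    def __init__(self, driver, name):
        self.driver = driver
        self.name = name


class FilePointer:
    """Holds the current position ("pos") for one open of a file object."""

    def __init__(self, file_object):
        self.file = file_object
        self.pos = 0


class DescriptorTable:
    """Maps small integers (file descriptors) to file pointers."""

    def __init__(self):
        self.fds = {}
        self.next_fd = 3  # assumption: 0-2 are left for stdin/stdout/stderr

    def parrot_open(self, driver, name):
        driver.open(name)
        fd = self.next_fd
        self.next_fd += 1
        self.fds[fd] = FilePointer(FileObject(driver, name))
        return fd

    def parrot_write(self, fd, data):
        fp = self.fds[fd]
        written = fp.file.driver.pwrite(fp.file.name, data, fp.pos)
        fp.pos += written  # the position lives in the pointer, not the driver
        return written

    def parrot_read(self, fd, length):
        fp = self.fds[fd]
        data = fp.file.driver.pread(fp.file.name, length, fp.pos)
        fp.pos += len(data)
        return data
```

Keeping the position in the file pointer rather than in the driver is what lets stateless, offset-based driver calls back an ordinary read/write/lseek interface; two opens of the same file get independent positions.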

Page 12

Adaptation

[Diagram: the same App runs over Parrot in three settings. In each, Parrot offers the Local, FTP and Chirp drivers, and the App always calls open(“/mydata/foo”).]

• On same host:
  – /mydata -> /usr/data
  – Data is in the local directory /usr/data.
• On nearby host:
  – /mydata -> /chirp/host1/usr/mydata
  – Data is served by chirpd.
• On distant host:
  – /mydata -> /ftp/host2/opt/DAT
  – Data is in /opt/DAT, served by ftpd.
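The remapping shown on this slide can be sketched as a longest-prefix rewrite followed by a driver choice based on the first path component. This is an illustrative sketch only: the function names `resolve` and `pick_driver`, the dictionary form of the mountlist, and the restriction to the `chirp` and `ftp` prefixes are assumptions for the example, not Parrot's real interface.

```python
def resolve(path, mountlist):
    """Rewrite `path` using the longest matching prefix in `mountlist`.

    `mountlist` is a hypothetical dict of {logical prefix: physical prefix}.
    Paths that match no entry are returned unchanged.
    """
    best_prefix, best_target = None, None
    for prefix, target in mountlist.items():
        prefix = prefix.rstrip("/")
        # Match whole path components only, so /mydata does not catch /mydatabase.
        if path == prefix or path.startswith(prefix + "/"):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix, best_target = prefix, target.rstrip("/")
    if best_prefix is None:
        return path
    return best_target + path[len(best_prefix):]


def pick_driver(resolved):
    """Split a resolved path into (driver, host, remote path).

    Assumption: remote paths look like /<driver>/<host>/<path>, as in the
    slide's /chirp/host1/... and /ftp/host2/... examples.
    """
    parts = resolved.split("/")
    if len(parts) > 2 and parts[1] in ("chirp", "ftp"):
        return parts[1], parts[2], "/" + "/".join(parts[3:])
    return "local", None, resolved


# The three settings from the slide, with the application call unchanged.
for target in ("/usr/data", "/chirp/host1/usr/mydata", "/ftp/host2/opt/DAT"):
    physical = resolve("/mydata/foo", {"/mydata": target})
    print(physical, pick_driver(physical))
```

The application's `open` argument never changes; only the mountlist entry differs from site to site.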

Page 13

What Protocol?

• File Transfer Protocol:
  – Internet standard, many implementations.
  – High bandwidth sequential access.
• NeST:
  – General purpose storage appliance from UW.
  – Virtual users, namespace, and allocation.
• RFIO:
  – Remote I/O protocol used with CERN CASTOR.
  – UNIX-like, most ops require a new TCP connection.
• DCAP:
  – Remote I/O protocol used with Fermi D-Cache.
  – UNIX-like, WORM semantics, no directories, caching.
• Chirp:
  – Protocol developed at UW for Parrot.
  – Corresponds very closely to UNIX, including errnos.

Page 14

Small Details Matter

• Standard tools need to know subtle details; otherwise, they break:
  – ls -lR performs getdents(“foo”)
  – on success: descend
  – on ENOTDIR: display and continue
  – on ENOENT: display error and stop.
• FTP does not provide this detail:
  – Failed LIST -> error 550
  – Failed GET -> error 550
  – Failed CDIR -> error 550
• Simple assignment doesn’t work:
  – Making 550=ENOENT breaks many tools.

Page 15

Example Solution

[Decision diagram: FTP reply codes mapped to outcomes.]

LIST “foo”
  200 -> Success
  other -> Transient Error
  550 -> CWD “foo”
    200 -> Access denied.
    other -> Transient Error
    550 -> SIZE “foo”
      200 -> Not a dir.
      550 -> No such entry.
      other -> Transient Error
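The LIST, CWD, SIZE sequence on this slide can be written as a short probing loop. This is a sketch of one reading of the diagram, not Parrot's source: `send` is a hypothetical callable that issues one FTP command and returns the numeric reply code, any 2xx reply is treated as the slide's "200", and the choice of `EAGAIN` for a transient error is an assumption.

```python
import errno


def is_success(code):
    # Assumption: any 2xx reply counts as the slide's generic "200".
    return 200 <= code < 300


def list_errno(send, path):
    """Return 0 if `path` can be listed, else a UNIX errno explaining why not.

    Each probe is tried in turn. A success at a later probe tells us what
    the earlier 550 must have meant.
    """
    probes = (
        ("LIST", 0),              # listing worked: success
        ("CWD", errno.EACCES),    # a directory we may enter but not list
        ("SIZE", errno.ENOTDIR),  # it has a size, so it is a plain file
    )
    for command, result_if_ok in probes:
        code = send(command, path)
        if is_success(code):
            return result_if_ok
        if code != 550:
            return errno.EAGAIN   # assumption: report "other" as retryable
    return errno.ENOENT           # every probe answered 550: no such entry


def canned(replies):
    """Build a fake `send` from a {command: reply code} table, for trying it out."""
    return lambda command, path: replies[command]


print(list_errno(canned({"LIST": 550, "CWD": 550, "SIZE": 200}), "foo") == errno.ENOTDIR)
```

The cost of the extra detail is up to three round trips for a single failed listing.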

Page 16

CPU-IO Integration

• Errors that cannot be expressed in the client’s interface must be passed to a higher level (the batch system).
• Simple options:
  – kill -9 application (retry app elsewhere)
  – exit(1) application (don’t retry app)
• Complex options: (Condor only)
  – restart with (Subnet!=“128.101.175”)
  – restart with (CurrentTime>5pm)
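The escalation rule on this slide can be illustrated with a small decision function. Everything here is an assumption made for the example: the `EXPRESSIBLE` table is a deliberately tiny, hypothetical list of errnos each call may legitimately return, and the `retry_elsewhere` flag stands in for whatever policy decides between the two simple options.

```python
import errno

# Hypothetical, incomplete table: which errors a caller of each
# operation can reasonably be expected to understand.
EXPRESSIBLE = {
    "open": {errno.ENOENT, errno.EACCES, errno.ENOTDIR, errno.EISDIR},
    "read": {errno.EBADF, errno.EIO, errno.EINTR},
    "write": {errno.EBADF, errno.EIO, errno.ENOSPC, errno.EINTR},
}


def handle_error(call, err, retry_elsewhere):
    """Decide what to do with a driver error raised during `call`.

    Returns a (action, errno) pair:
      ("return_errno", err)  the application sees an ordinary failure
      ("kill -9", None)      stop the app so the batch system retries it elsewhere
      ("exit(1)", None)      stop the app and signal that it should not be retried
    """
    if err in EXPRESSIBLE.get(call, set()):
        return ("return_errno", err)
    if retry_elsewhere:
        return ("kill -9", None)
    return ("exit(1)", None)


# A timeout is not something read() can report, so it is escalated.
print(handle_error("read", errno.ETIMEDOUT, retry_elsewhere=True))
```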

Page 17

Bandwidth by Protocol

[Chart: bandwidth (MB/s, 0 to 9) versus block size (4KB to 64MB) for ftp, chirp, rfio, dcap and nest. Annotations mark the parrot default hint and the unix default hint.]

Page 18

Latency by Protocol (ms)

protocol  stat    open/close  read 1B  read 8KB  write 1B  write 8KB
chirp     0.50    0.84        0.61     2.80      0.38      2.23
ftp       0.87    2.82        -        -         -         -
nest      2.51    2.53        2.96     4.48      5.53      7.41
rfio      13.41   23.11       0.50     3.32      39.8      2.85
dcap      152.53  159.09      40.05    3.01      40.14     3.14

Page 19

Andrew-Like Benchmark

• Original Andrew benchmark is no longer appropriate, so replace with the Parrot source: 296 files, 955 KB.
• Copy the source to a remote device, then manipulate in five stages:
  – copy: cp -rp
  – list: ls -lR
  – scan: grep searchstring -r *
  – make: make
  – delete: rm -rf *
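A staged benchmark like this one can be driven by a small timing harness. The sketch below is an assumption about how one might do it, not the script used for the talk: `time_stages` is a hypothetical helper, and the caller supplies the full command line for each stage, since the slide does not give the arguments.

```python
import subprocess
import time


def time_stages(stages, cwd):
    """Run each (name, shell command) pair in order inside `cwd`.

    Returns a list of (name, elapsed seconds, exit status). The exit status
    is kept because a stage may fail outright on some storage back ends.
    """
    results = []
    for name, command in stages:
        start = time.perf_counter()
        completed = subprocess.run(
            command,
            shell=True,                 # needed so "*" in a command is expanded
            cwd=cwd,
            stdout=subprocess.DEVNULL,  # listing output would skew the timing
            stderr=subprocess.DEVNULL,
        )
        elapsed = time.perf_counter() - start
        results.append((name, elapsed, completed.returncode))
    return results


# Harmless demonstration with placeholder commands.
for name, seconds, status in time_stages([("noop", "exit 0")], cwd="."):
    print(f"{name}: {seconds:.3f} s (exit status {status})")
```

To reproduce the five stages, pass one entry per stage with `cwd` set to the directory on the remote device.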

Page 20

Overheads Compared

[Chart: time (s, 0 to 160) for each benchmark stage (copy, list, scan, make, delete). Series: parrot only, + chirp, + lan, + cache.]

Page 21

Overheads Compared

[Chart: the same data on a 0 to 10 s scale. Time (s) for each benchmark stage (copy, list, scan, make, delete). Series: parrot only, + chirp, + lan, + cache.]

Page 22

Protocols Compared

[Chart: time (s, 0 to 350) for each benchmark stage (copy, list, scan, make, delete). Series: chirp, ftp, nest, rfio (failed), dcap (no dirs).]

Page 23

Protocols Compared

[Chart: the same data on a 0 to 50 s scale. Time (s) for each benchmark stage (copy, list, scan, make, delete). Series: chirp, ftp, nest, rfio (failed), dcap (no dirs).]

Page 24

Moral of the story:

• The butterfly effect: Small underlying differences can have big effects on performance and reliability.
• Examples in interposition:
  – Dynamic linking: fast but poor hole detection.
  – Debugger trap: slow but good hole detection.
• Examples in protocols:
  – Chirp: UNIX semantics restrict bandwidth.
  – FTP: Need for multiple ops increases latency.
  – NeST: Powerful virtualization increases latency.
  – RFIO: Connection per op doesn’t scale.

Page 25

For more info...

• Douglas Thain
  – [email protected]
• Miron Livny
  – [email protected]
• Software, manuals, more info:
  – http://www.cs.wisc.edu/condor/parrot
• The Condor Project:
  – http://www.cs.wisc.edu/condor