Efficient Access to Many Small Files in a Grid Filesystem Douglas Thain and Christopher Moretti University of Notre Dame.

Dec 22, 2015

Transcript
Page 1:

Efficient Access to Many Small Files in a Grid Filesystem

Douglas Thain and Christopher Moretti

University of Notre Dame

Page 2:

Efficient Access to Many Small (and Big) Files in a Grid Filesystem

Douglas Thain and Christopher Moretti

University of Notre Dame

Page 3:

Abstract

Many grid data tools focus on transferring, storing, and managing large (GB-TB) files.

But many users need to manage, transfer, and process lots (1000s) of small (KB-MB) files.

We describe protocols and interfaces for manipulating many small files over wide area networks. (Doesn’t hurt large files, either.)

Implemented in the Chirp file system.

Performance:
– Best case: order of magnitude improvement.
– Worst case: no slower than before.

Page 4:

The Small File Problem

Page 5:

Who has lots of small files?

Anyone using a batch system.
– One file for submit, input, output, error, log...

Anyone using a large software package.
– Executables, libraries, config files...

Anyone using a filesystem like a database.
– Genomics, astronomy, physics...

Anyone who likes to write shell scripts.
– foreach host in list ssh $host > $host.output

Page 6:

Why is this a problem?

Users do the “sensible” thing:
– foreach file in (list) do transfer done

The “sensible” thing performs miserably:
– New TCP Connection
– SSL Authentication
– Configuration Operations
– Slow Start Again

Result is KB/s on a GB/s link.
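A back-of-the-envelope sketch of why the per-file loop collapses to KB/s. Every number below (10 round trips of setup per file, 50 ms round-trip time, 10 KB files, a 1 Gb/s link) is an illustrative assumption rather than a measurement from the talk, and slow start is ignored, which only flatters the result:

```python
def effective_throughput(file_bytes, setup_rtts, rtt_s, link_bytes_per_s):
    """Bytes per second achieved when every file pays setup_rtts round trips
    before its data is sent at full link speed (slow start ignored)."""
    per_file_time = setup_rtts * rtt_s + file_bytes / link_bytes_per_s
    return file_bytes / per_file_time

# Assumed workload: 10 KB files, 10 round trips of connect/auth/config per file,
# 50 ms round-trip time, 1 Gb/s link (125,000,000 bytes/s).
rate = effective_throughput(10_000, 10, 0.050, 125_000_000)
print(f"{rate / 1000:.1f} KB/s")
```

With these assumed inputs the half second of setup per file dominates completely: the payload time is under a tenth of a millisecond, so the link speed barely matters.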

Page 7:

Why not just use tar?

If you can, you should!

Sometimes you cannot:
– The system semantics demand multiple files.
– Packing and unpacking can be very slow.
– Not enough disk space to unpack.
– Different apps select different data subsets.
– Using an existing script or program.

Users don’t know or care that it’s a distributed system, so why should they change?

Page 8:

The Challenge:

How to design interfaces so that users get the expected performance and behavior?

Page 9:

Chirp and Parrot: A Grid Filesystem

Page 10:

Requirements for a Grid Filesystem

Transparent access to files in the same manner as a local Unix filesystem.

Non-privileged deployment at both client and server. (root not possible on the grid.)

User control over policies for naming, caching, consistency, and fault tolerance.

Flexible access controls for sharing.

Good performance on both small and large files.

Page 11:

Chirp/Parrot – A Grid Filesystem

[Figure: an Ordinary Unix Program issues unix system calls, which Parrot intercepts (ptrace trap) and forwards over a Single TCP Stream to Chirp, which exports an Ordinary Unix Filesystem. Other labels: Automatic Recovery; "No Privs Needed!" at both ends.]

Authentication:
Kerberos / Globus / Hostname / Unix

Authorization:
kerberos:[email protected] RWLDA
globus:/O=ND/CN=Joe RWLDA
hostname:*.nd.edu RL
group:server.nd.edu/team RWL

Protocol:
open / pread / pwrite / close
stat / mkdir / rmdir / unlink
getfile / putfile / movefile

Page 12:

Ordinary Unix Commands

> parrot tcsh
> ls /chirp
alpha.nd.edu
beta.nd.edu
...
> cd /chirp/alpha.nd.edu/mydir
> cp /tmp/bigdata .
> emacs mydata.txt

Page 13:

Parrot Specific Commands

> parrot tcsh
> parrot_whoami
globus:/O=ND/CN=Joe
> parrot_getacl /chirp/alpha.nd.edu/
kerberos:[email protected] RWLDA
globus:/O=ND/CN=Joe RWL
hostname:*.nd.edu RL
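The ACL listing pairs a subject pattern with a string of rights letters. As a rough illustration of how such a list could be evaluated, the sketch below matches subjects with shell-style wildcards and takes the union of all matching entries. Both choices are assumptions made for this example, not a description of the Chirp server's actual matching rules, and the rights letters are treated as opaque flags:

```python
from fnmatch import fnmatchcase

# Two of the entries printed by parrot_getacl above: (subject pattern, rights).
ACL = [
    ("globus:/O=ND/CN=Joe", "RWL"),
    ("hostname:*.nd.edu", "RL"),
]

def rights_for(subject, acl):
    """Union of the rights of every entry whose pattern matches the subject.
    (Assumed semantics: glob matching, all matching entries combined.)"""
    granted = set()
    for pattern, rights in acl:
        if fnmatchcase(subject, pattern):
            granted.update(rights)
    return granted

def allowed(subject, right, acl=ACL):
    """True if the subject holds the given single-letter right."""
    return right in rights_for(subject, acl)

print(allowed("hostname:alpha.nd.edu", "R"))
print(allowed("hostname:alpha.nd.edu", "W"))
```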

Page 14:

Chirp as Remote Filesystem

[Figure: Grid Site A and Grid Site B. Multiple App/Parrot pairs connect to one Chirp Server backed by a Unix Filesystem. Other labels: Grid Middleware; App/Parrot with Cert; Secured by GSI.]

Page 15:

Chirp as Cluster Filesystem

[Figure: Grid Site A and Grid Site B. Multiple App/Parrot pairs access four Chirp Servers, each backed by a Unix Filesystem. Other labels: dir server; aux db.]

Page 16:

http://www.cse.nd.edu/~ccl/viz

Page 17:

Sample Applications

Image Processing for Biometrics
– Moretti et al, PCGRID 2007

Bioinformatics on EGEE
– Blanchet et al, Grid 2006

High Energy Physics on LCG
– Sfiligoi et al, CHEP 2005

Molecular Dynamics Repository
– Wozniak et al, HPDC 2005

Remote DB Access on EDG
– Klous et al, CCPE 2005

Page 18:

Protocols for Small Files

Page 19:

What About FTP?

FTP is a great data transfer system, but it was never designed to be a file system:
– New TCP stream per data transfer.
– New TCP stream for each directory list.
– Lots of connections can overwhelm net devices.
– Coarse errors: 550 for all file system errors.
– Semantic problems: e.g. empty directory.
– Unix access controls. (But, see SecPAL)
– Wildly varying implementations and support.

Page 20:

FTP Protocol Reminder

[Figure: FTP Client and FTP Server. Control Connection: AUTH GSSAPI, MIC, MIC, PORT, RETR. Data Connection: AUTH GSSAPI, MIC, MIC, Data Transfer.]

Minimum of four round trips (plus auth overhead) to fetch a file + loss of TCP window.

Common practice is new control connection for every data transfer!

Page 21:

What About NFS?

NFS was designed for a local area network among (relatively) trusted hosts.
– Fine-grained file access very slow on WAN.
– Kernel support and root assistance needed to start server, mount client, change target.
– Unix UID for ownership, access control.
– Need to bind to privileged port, often filtered.
– Use of “file handles” to refer to files makes it very difficult to build a user-level server.
+ lots of lookup operations over the WAN.

Page 22:

NFS Protocol Reminder

[Figure: NFS Client and NFS Server exchange lookup(00,a), lookup(10,b), lookup(20,c), ... followed by read 4KB, read 4KB, read 4KB, ...]

On a WAN, throughput limited to 4KB/latency.

10ms = 400 KB/s

100ms = 40 KB/s
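The two figures on this slide follow directly from one block per round trip. A minimal sketch of the arithmetic, taking 4 KB as 4,000 bytes to match the slide's round numbers:

```python
def rpc_throughput(block_bytes, rtt_s):
    """One block per round trip: throughput is block size divided by latency."""
    return block_bytes / rtt_s

for rtt_ms in (10, 100):
    kb_per_s = rpc_throughput(4_000, rtt_ms / 1000) / 1000
    print(f"{rtt_ms} ms -> {kb_per_s:.0f} KB/s")
```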

Page 23:

Chirp Hybrid Protocol Overview

[Figure: Chirp Client and Chirp Server on one connection. auth globus (8 RTT), then open, read, write, close, ... as individual requests. getfile(“mydata”) is answered with size and data; putfile(“otherdata”,size) is followed by data.]
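The hybrid idea (request/response calls and whole-file streams sharing one long-lived connection) can be illustrated with a toy exchange. The wire format below is invented for this sketch and is not the real Chirp protocol; `FILES`, `serve`, and `getfile` are hypothetical names, and authentication is left out:

```python
import socket
import threading

FILES = {"mydata": b"x" * 100_000}  # stand-in for the server's filesystem

def serve(conn):
    """Answer 'getfile <name>' requests on one long-lived connection."""
    reader = conn.makefile("rb")
    for line in reader:
        verb, name = line.decode().split()
        data = FILES.get(name) if verb == "getfile" else None
        if data is None:
            conn.sendall(b"-1\n")              # negative size signals an error
        else:
            conn.sendall(b"%d\n" % len(data))  # size first...
            conn.sendall(data)                 # ...then the whole file as one stream

def getfile(conn, reader, name):
    """One request, one streamed response: a single round trip per file."""
    conn.sendall(f"getfile {name}\n".encode())
    size = int(reader.readline())
    if size < 0:
        raise FileNotFoundError(name)
    return reader.read(size)

client, server = socket.socketpair()
threading.Thread(target=serve, args=(server,), daemon=True).start()
reader = client.makefile("rb")
first = getfile(client, reader, "mydata")
second = getfile(client, reader, "mydata")   # same connection, no new handshake
print(len(first), len(second))
```

Because the second request reuses the established connection, it pays neither a new handshake nor a fresh slow start, which is the property the comparison on the next slide relies on.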

Page 24:

Protocol Comparison

FTP - Stream per File
– Latency = 4+ RTT for each file
– Throughput = TCP limit after slow start

NFS – Remote Procedure Call
– Latency = 1 RTT for each file
– Throughput = block size / latency

Chirp - Hybrid
– Latency = 1 RTT for each file
– Throughput = TCP limit in steady state
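The comparison can be turned into a simple cost model: per-file latency plus payload time at each protocol's throughput. The workload and network numbers below are assumptions chosen for illustration, and the model ignores authentication and slow start, so it understates FTP's real cost:

```python
def transfer_time(n_files, file_bytes, rtt_s, latency_rtts, throughput):
    """Total seconds: per-file latency plus payload time at the given throughput."""
    return n_files * (latency_rtts * rtt_s + file_bytes / throughput)

RTT = 0.1                    # assumed 100 ms wide-area round trip
LINK = 125_000_000           # assumed 1 Gb/s link standing in for the "TCP limit"
BLOCK = 4096                 # assumed NFS block size in bytes
N, SIZE = 1000, 64 * 1024    # assumed workload: 1000 files of 64 KB each

ftp = transfer_time(N, SIZE, RTT, 4, LINK)          # 4 RTT per file
nfs = transfer_time(N, SIZE, RTT, 1, BLOCK / RTT)   # block size / latency
chirp = transfer_time(N, SIZE, RTT, 1, LINK)        # 1 RTT, steady-state TCP

for name, seconds in (("FTP", ftp), ("NFS", nfs), ("Chirp", chirp)):
    print(f"{name}: {seconds:.0f} s")
```

Under these assumptions the model gives roughly 400 s for FTP, 1700 s for NFS, and 100 s for Chirp: FTP loses on per-file latency, NFS on per-block latency.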

Page 25:

Local Area Performance

Page 26:

Wide Area Performance

Page 27:

Real WAN Performance

Page 28:

Interfaces for Small Files

Page 29:

Standard Unix Copy

cp /tmp/source /chirp/B/target

[Figure: cp runs under Parrot, which has a Local driver (Local Disk) and a Chirp driver (Chirp Server). cp issues open(source), open(target), then loop: read/write. Parrot sends open(source) and each read to the local disk, and open and each write to the Chirp server.]

Page 30:

Problem:
The system does not know the context of the operation!

Solution:
Introduce a higher-level operation copyfile that exploits the context.

Page 31:

Improved Copy with Copyfile

cp /tmp/source /chirp/B/target

[Figure: newcp runs under Parrot and issues a single copyfile(source,target). Parrot performs open(source) on the Local Disk and putfile(target) to the Chirp Server.]

Page 32:

Is it reasonable to modify cp?

Installation:
– Cannot modify /bin/cp.
– Install new parrot_cp
– Alias cp or link named “cp” in PATH.

Backwards compatibility:
– parrot_cp without Parrot falls back to normal.
– Ordinary cp on Parrot behaves as before.
– parrot_cp on a different filesystem falls back.
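The fallback behavior listed above amounts to "try the whole-file operation, otherwise copy the ordinary way". A minimal sketch of that pattern follows; `fast_copyfile` is a hypothetical stand-in for a copyfile call, and signaling "unsupported" with NotImplementedError is an assumption of this example, not how parrot_cp is known to detect Parrot:

```python
import shutil

def copy_with_fallback(source, target, fast_copyfile=None):
    """Try a whole-file copy operation first; fall back to an ordinary
    open/read/write loop when it is missing or unsupported."""
    if fast_copyfile is not None:
        try:
            fast_copyfile(source, target)
            return "copyfile"
        except NotImplementedError:
            pass  # e.g. not running under Parrot, or a filesystem without copyfile
    with open(source, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # the standard read/write loop
    return "read/write"
```

Either path leaves the same bytes in the target, so callers and scripts see ordinary cp semantics regardless of which one ran.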

Page 33:

Improved Copy with Copyfile

cp /chirp/A/source /chirp/B/target

[Figure: newcp runs under Parrot and issues copyfile(source,target). Parrot's Chirp driver sends thirdput(source,B,target) to Chirp Server A, which sends putfile(target) to Chirp Server B.]

Page 34:

Directory Copy

cp -r /chirp/A/mydir /chirp/B/mydir

[Figure: cp runs under Parrot. mydir on Chirp Server A holds ACL, X, Y, Z. Parrot issues mkdir(mydir) and setacl(mydir) to Chirp Server B, then one request per file to Chirp Server A: thirdput(/mydir/X,B,/mydir/X), thirdput(/mydir/Y,B,/mydir/Y), thirdput(/mydir/Z,B,/mydir/Z).]

Page 35:

Improved Directory Copy

cp -r /chirp/A/mydir /chirp/B/mydir

[Figure: cp runs under Parrot and issues a single thirdput(/mydir,B,/mydir) to Chirp Server A, which performs mkdir, putfile*3, and setacl against Chirp Server B, reproducing mydir (ACL, X, Y, Z).]

Page 36:

Third Party Performance

Page 37:

You get the idea...

ls -la D
– Original: getdir D + N*stat
– Improved: getlongdir D

rm -rf D
– Original: getdir D + N*unlink (recursive)
– Improved: rmall D

md5sum F
– Original: open F + N*read + close
– Improved: md5 F
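The formulas above translate directly into request counts. The sketch below tabulates them for a given N, assuming one round trip per request and, for rm -rf, a flat directory with no subdirectories (the recursive case would add a getdir per subdirectory):

```python
def round_trips(n):
    """Requests issued for a directory of n entries (or a file read in n blocks)."""
    return {
        "ls -la": {"original": 1 + n, "improved": 1},  # getdir + N*stat vs getlongdir
        "rm -rf": {"original": 1 + n, "improved": 1},  # getdir + N*unlink vs rmall
        "md5sum": {"original": n + 2, "improved": 1},  # open + N*read + close vs md5
    }

for command, counts in round_trips(1000).items():
    print(command, counts["original"], "->", counts["improved"])
```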

Page 38:

Final Example

ls -la /chirp/alpha/data
md5sum /chirp/alpha/data/*
cp -r /chirp/alpha/data /chirp/beta/data
md5sum /chirp/beta/data/*
rm -rf /chirp/alpha/data

Page 39:

Original Implementation

[Figure: app runs under parrot, which talks to chirp server A and chirp server B. Operation labels: ls -la, md5, cp, rm, cp, md5.]

Page 40:

Improved Implementation

[Figure: app runs under parrot, which talks to chirp server A and chirp server B. Operation labels: ls -la, md5, cp, md5, rm.]

Page 41:

Performance on Script

[Chart: time (seconds), axis from 0 to 180, for each step of the script (list, checksum, move, checksum, delete). Series: Original, Improved.]

Page 42:

The Challenge:

How to design interfaces so that users get the expected performance and behavior?

Page 43:

Summary

Good small file performance requires attention to low level network protocols.
– getfile, putfile, thirdput, rmall, checksum

Exploiting protocols requires minor changes to the Unix I/O interface.
– copyfile, rmall, checksum, others?

Easy to apply those changes in a user transparent way.
– cp, rm, md5sum all operate as normal

Usable performance in a wide-area FS.

Page 44:

For more information...

Douglas Thain
– [email protected]

Chris Moretti
– [email protected]

Parrot and Chirp
– http://www.cctools.org