An Input/Output LIbrary for cluster of SMP

Adrien Lebre, Yves Denneulin

{Adrien.Lebre,Yves.Denneulin}@imag.fr

ID-IMAG (UMR 5132) Laboratory, Grenoble, France

BULL - HPC, Echirolles, France.

6th May 2005

Adrien Lebre © Bull-ID LIPS 2004

aIOLi - CCGRID05 - May 2005

Plan

1 Introduction: Context, Parallel Input/Output

2 aIOLi system: Preamble, Principles, Technical aspects

3 Results: POSIX vs aIOLi, MPI I/O vs aIOLi

4 Conclusion

Context

Environment

Cluster of SMPs

Linux

High Performance Computing

I/O-intensive applications

CPU-bound applications ⇒ I/O-bound applications

Remote hard drive I/O

Parallel I/O

Handling concurrent accesses to the same resource (a file)

Accesses: different in size and in offset

Example: matrix product

Parallel I/O Example

Matrix product

Specific parts to fetch (according to the data distribution: columns, rows, BLOCK/BLOCK, BLOCK/CYCLIC ...)

Several requests at the same time: disjoint/contiguous

"Lethal" behavior for the I/O subsystem

[Figure: four processes P0 to P3 on one SMP client, each issuing a read on the same file]

read(fd,buf,1024); //file position=0

read(fd,buf,1024); //file position=1024

read(fd,buf,1024); //file position=3072

read(fd,buf,1024); //file position=2048

From a global point of view: 4 independent but contiguous requests
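The observation that four independent reads are contiguous "from a global point of view" can be checked mechanically. The sketch below is only an illustration: the `io_req` type and the `globally_contiguous` function are hypothetical names, not part of aIOLi. It sorts the requests by file position and checks that each one starts where the previous one ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One read request as seen by the I/O subsystem (hypothetical type). */
struct io_req {
    long offset;  /* file position of the request */
    long size;    /* number of bytes requested */
};

/* qsort comparator: orders requests by increasing file position. */
static int cmp_offset(const void *a, const void *b)
{
    const struct io_req *x = a, *y = b;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Sorts reqs by offset and returns 1 if together they cover one
 * contiguous range with no gap and no overlap, 0 otherwise. */
int globally_contiguous(struct io_req *reqs, size_t n)
{
    assert(reqs != NULL);
    qsort(reqs, n, sizeof *reqs, cmp_offset);
    for (size_t i = 1; i < n; i++)
        if (reqs[i - 1].offset + reqs[i - 1].size != reqs[i].offset)
            return 0;
    return 1;
}
```

Applied to the four 1024-byte reads above (positions 0, 1024, 3072, 2048), the check succeeds, although no single process sees a contiguous stream.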

read(fd,buf,1024); //file position=3072

read(fd,buf,1024); //file position=0

read(fd,buf,1024); //file position=5120

read(fd,buf,1024); //file position=7168

read(fd,buf,1024); //file position=1024

...

4 requests have been processed?

read(fd,buf,1024); //file position=4096

read(fd,buf,1024); //file position=6144

read(fd,buf,1024); //file position=2048

What about the new requests? Contiguous / Disjoint?

Parallel I/O

Requirements / constraints

Methods for disjoint data (readv) ⇒ complexity of the API

Collective operations ⇒ synchronization mechanisms

Logical view (the files) ⇒ physical placement (block devices)

Available solutions - related work

Many parallel file systems: more or less efficient, but hardware dependent

"Cluster compliant": PVFS, NFSparallel, GPFS, Lustre

Designed for "Parallel I/O": PIOUS, VESTA ...

Libraries: focus on portability aspects

Many exist; MPI I/O is the reference.

Sophisticated API ⇒ development overhead / language bindings
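To give a feel for the API complexity mentioned above, here is a minimal sketch of the standard POSIX `readv` call. The helper name `read_two_buffers` is made up for this example. Note that `readv` scatters one contiguous file range into several memory buffers; reaching disjoint file regions still takes one seek (or one call) per region.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Reads len_a bytes into a and the following len_b bytes into b with a
 * single readv() call. Returns the total number of bytes read, or -1. */
ssize_t read_two_buffers(int fd, void *a, size_t len_a, void *b, size_t len_b)
{
    struct iovec iov[2];

    assert(fd >= 0);
    iov[0].iov_base = a;      /* first destination buffer */
    iov[0].iov_len = len_a;
    iov[1].iov_base = b;      /* second destination buffer */
    iov[1].iov_len = len_b;
    return readv(fd, iov, 2); /* filled in array order */
}
```

Even this two-buffer case needs an array of descriptors that a plain `read` does not.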

Context summary

[Figure: a cluster of SMP clients, each running processes P1 ... Pn, accessing files on I/O servers 1, 2, ..., n]

aIOLi system

Objectives

Supply Parallel I/O algorithms

scheduling policies

aggregating accesses ⇒ efficiency

overlapping accesses

Only through the use of the ubiquitous POSIX calls

open/creat/lseek/read/write/close ⇒ Simplicity

Minimal overhead

avoid expensive synchronisation mechanisms (barrier, ...)

Evaluation of "the Linux" I/O stack

1 GB file decomposition on an SMP

(kernel 2.4.27, IDPOT cluster, NFS version 3, mpich 1.2.5)

[Figure: completion time (sec, 0 to 300) versus access granularity (KBytes: 4096, 1024, 512, 128, 64, 32, 16, 8, 4, 1); curves for 1, 2, 4 and 8 processes, plus 1 process with randomized accesses]

Observations

1 process ⇒ sequential read (optimal)

More processes ⇒ lower performance

1 process with random accesses ⇒ better performance for large accesses than the parallel approach
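The benchmark code itself is not shown in the deck. The sketch below assumes the file is split cyclically, as in the aggregating example later in the deck (4 processes, granularity 10: P0 reads offsets 0, 40, ...; P1 reads 10, 50, ...). The function names are hypothetical. Each process uses only `lseek` and `read`.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* File offset of the i-th chunk owned by process `rank` when `nprocs`
 * processes share a file cyclically (assumed decomposition). */
off_t chunk_offset(int rank, int nprocs, long i, size_t granularity)
{
    assert(rank >= 0 && rank < nprocs);
    return (off_t)((i * nprocs + rank) * (long)granularity);
}

/* Reads every chunk owned by `rank` into buf (reused for each chunk)
 * with plain lseek/read calls. Returns total bytes read, -1 on error. */
long read_my_chunks(int fd, int rank, int nprocs, size_t granularity,
                    off_t file_size, char *buf)
{
    long total = 0;

    for (long i = 0; ; i++) {
        off_t off = chunk_offset(rank, nprocs, i, granularity);
        if (off >= file_size)
            break;                        /* past the end of the file */
        if (lseek(fd, off, SEEK_SET) == (off_t)-1)
            return -1;
        ssize_t n = read(fd, buf, granularity);
        if (n < 0)
            return -1;
        total += n;                       /* last chunk may be short */
    }
    return total;
}
```

With several processes running this loop at once, the file server receives the interleaved stream of small requests whose cost the plot measures.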

Fundamental concepts

Define a "think time" window

Maximize the use of the I/O server (bandwidth)

At least one request should be in the queue on the server side

Apply parallel I/O algorithms in the queue on the client side

At most one request on the server side!

(1) A first request is sent to the file server.

(2) The server processes it on the attached disk.

(3) The reply is returned to the client.

[Figure: an SMP client and an I/O node, each with a waiting I/O queue; arrows (1) to (3) follow the steps above]
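The two rules above (keep the server busy, but hold everything else on the client) amount to a queue with a single in-flight slot. The sketch below is one possible reading, not the aIOLi implementation: all names are hypothetical, and picking the lowest offset first is an assumed scheduling policy.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 64

struct request { long offset; long size; };

/* Client-side state: pending requests stay here, where they can be
 * reordered or merged; at most one request is at the server. */
struct client_queue {
    struct request pending[QUEUE_CAP];
    size_t count;
    int in_flight;   /* 1 while the server is processing a request */
};

/* Queues a request on the client side. Returns 0, or -1 if full. */
int enqueue(struct client_queue *q, struct request r)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAP)
        return -1;
    q->pending[q->count++] = r;
    return 0;
}

/* Step (1): if the server is idle, removes the pending request with
 * the lowest offset and stores it in *out (assumed policy).
 * Returns 1 if a request was dispatched, 0 otherwise. */
int dispatch(struct client_queue *q, struct request *out)
{
    if (q->in_flight || q->count == 0)
        return 0;
    size_t best = 0;
    for (size_t i = 1; i < q->count; i++)
        if (q->pending[i].offset < q->pending[best].offset)
            best = i;
    *out = q->pending[best];
    q->pending[best] = q->pending[--q->count];  /* fill the hole */
    q->in_flight = 1;
    return 1;
}

/* Step (3): the reply came back, so the server slot is free again. */
void complete(struct client_queue *q)
{
    q->in_flight = 0;
}
```

While a request is in flight, `dispatch` refuses to send another one, so later arrivals accumulate on the client side.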

Fundamental concepts: aggregating example

Basic decomposition including 4 processes (granularity = 10 bytes)

[Figure, built up over successive steps: a timeline (t) of requests on fd, each marked as waiting or processing]

P0, P1, P2 and P3 each issue read(fd,dest,10), at fd offsets 0, 10, 20 and 30; P0 and P1 then ask for offsets 40 and 50.

Contiguous pattern: the requests of P3 (offset 30), P0 (offset 40) and P1 (offset 50) become a single read(fd,dest,30) at fd offset 30 for P3/P0/P1.

Wait during a "think time" period, as we discovered an aggregation.

New requests arrive in the meantime: P2 at offset 60, P3 at offset 70, P0 at offset 80 and P1 at offset 90.

Recurrent contiguous pattern on fd: they become a single read(fd,dest,40) at fd offset 60. The P2/P3/P0/P1 access pattern has been discovered.
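The merging step of this example can be sketched as follows. This is an illustration only, with hypothetical names (`read_req`, `aggregate`): waiting requests are sorted by offset, and every run of requests in which each one starts where the previous one ends is collapsed into a single larger request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct read_req { long offset; long size; };

/* qsort comparator: orders requests by increasing offset. */
static int by_offset(const void *a, const void *b)
{
    const struct read_req *x = a, *y = b;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Sorts reqs by offset, then merges each run of contiguous requests
 * into one larger request, in place. Returns the new request count. */
size_t aggregate(struct read_req *reqs, size_t n)
{
    assert(reqs != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(reqs, n, sizeof *reqs, by_offset);
    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        if (reqs[out].offset + reqs[out].size == reqs[i].offset)
            reqs[out].size += reqs[i].size;  /* extend the current run */
        else
            reqs[++out] = reqs[i];           /* start a new run */
    }
    return out + 1;
}
```

Given the three 10-byte requests at offsets 30, 40 and 50, it yields one 30-byte request at offset 30, matching the read(fd,dest,30) of the example.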

aIOLi prototype

User library

A component overloads POSIX calls (linked to the HPC applications)

aIOLi daemon

"Multi-threaded"

Includes distinct improvements

Processes real I/O calls

IPC mechanisms and shared memory

[Figure: SMP client node. In user space, processes 0 to n each embed an aIOLi module and talk to the aIOLi service, made of a receiver thread, waiting queues and I/O threads 0 to n. In kernel space, the I/O stack, the remote file system clients and the network stack lead to the remote file server. Steps (1) to (3) are marked on the figure.]
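The deck does not show how the user library overloads POSIX calls or how it talks to the daemon. As a rough idea only, the sketch below wraps `read` in a function with the same signature that notes the request (descriptor, current offset, size) before issuing the real call. The names `wrapped_read` and `last_seen` are hypothetical, and the global record merely stands in for a message sent over IPC.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Last request seen by the wrapper. Hypothetical stand-in for handing
 * the request to a daemon over IPC or shared memory. */
struct seen_request { int fd; off_t offset; size_t size; };
struct seen_request last_seen;

/* Same signature as read(): records (fd, offset, size), then performs
 * the real call and returns its result unchanged. */
ssize_t wrapped_read(int fd, void *buf, size_t count)
{
    assert(buf != NULL);
    last_seen.fd = fd;
    last_seen.offset = lseek(fd, 0, SEEK_CUR);  /* -1 if not seekable */
    last_seen.size = count;
    return read(fd, buf, count);
}
```

Because the signature matches `read`, an application needs no source change beyond being linked against such a wrapper.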

Evaluation: POSIX vs aIOLi

1 GB file decomposition on an SMP

(kernel 2.4.27, IDPOT cluster, NFS version 3, mpich 1.2.5)

[Figure, compiled without aIOLi: completion time (sec, 0 to 300) versus access granularity (KBytes: 4096, 1024, 512, 128, 64, 32, 16, 8, 4, 1) for 1, 2, 4 and 8 processes]

[Figure, compiled with aIOLi: same axes and same process counts]

Evaluation: MPI I/O vs aIOLi

1 GB file decomposition including 8 MPI instances on an SMP

(kernel 2.4.27, IDPOT cluster, NFS version 3, mpich 1.2.5, ROMIO)

[Figure: completion time (sec, 0 to 300) versus access granularity (KBytes: 4096, 1024, 512, 128, 64, 32, 16, 8, 4, 1); curves: posix, level0, level1, level2, level3, aioli]

Observations

MPI I/O: explicit access pattern

For all levels, aIOLi provided the best results

Conclusion

Positive results

Efficient

Simplicity of the API ⇒ POSIX

No requirement for inter-process synchronization

Current constraints

Centralized distributed file system (such as NFS) vs parallel FS

Reduce overhead for single access

Kernel scheduler dependent

Current and future work

Add data striping considerations (stabilization phase)

Implement a patch for the VFS (summer 2005)

Evaluation on bigger SMPs and Lustre

Coordination between several SMPs, in progress (Master)

The grid ...

Conclusion - summary

[Figure, built up in four steps: (1) SMP clients, each running processes P1 ... Pn, and the I/O servers; (2) aIOLi added on the nodes; (3) an aIOLi Master added; (4) several clusters (Cluster X, Cluster Y), each with an aIOLi Master and a gateway, connected through the Internet / grid]

Questions?

http://aioli.imag.fr

LIPS Project

BULL - INRIA - ID Laboratory

Thanks

MPI I/O improvements

MPI I/O - Four levels [Thakur/Gropp/Lusk02]

Collective contiguous requests, like the aIOLi concept (level 1)

Independent noncontiguous requests using derived data types (level 2, Data Sieving)

Collective noncontiguous requests using derived data types (level 3, Two Phases)

[Figure: file space accessed by processes 0 to 3, with the levels representing increasing amounts of data per request]
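Data sieving, named above for level 2, replaces many small noncontiguous reads with one large contiguous read followed by in-memory copies. The sketch below shows the idea with plain POSIX calls rather than MPI I/O. The names `extract_pieces` and `sieve_read` are hypothetical, and a regularly strided pattern is assumed for simplicity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Copies `count` pieces of `piece` bytes, spaced `stride` bytes apart
 * in span, into out, back to back. */
void extract_pieces(const char *span, size_t piece, size_t stride,
                    size_t count, char *out)
{
    assert(stride >= piece);
    for (size_t i = 0; i < count; i++)
        memcpy(out + i * piece, span + i * stride, piece);
}

/* Fetches the whole span covering the wanted pieces with one read,
 * then keeps only the pieces. A short read is treated as an error.
 * Returns 0 on success, -1 on failure. */
int sieve_read(int fd, off_t start, size_t piece, size_t stride,
               size_t count, char *out)
{
    assert(count > 0 && stride >= piece);
    size_t span_len = (count - 1) * stride + piece;
    char *span = malloc(span_len);
    if (span == NULL)
        return -1;
    if (lseek(fd, start, SEEK_SET) == (off_t)-1) {
        free(span);
        return -1;
    }
    ssize_t got = read(fd, span, span_len);   /* one large request */
    if (got < 0 || (size_t)got < span_len) {
        free(span);
        return -1;
    }
    extract_pieces(span, piece, stride, count, out);
    free(span);
    return 0;
}
```

The trade-off is extra data transferred and a temporary buffer in exchange for fewer, larger requests.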
