Presented by
Open MPI on the Cray XT
Richard L. Graham
Tech Integration
National Center for Computational Sciences
2 Graham_OpenMPI_SC07
Why does Open MPI exist?
• Maximize all MPI expertise:
research/academia,
industry,
…elsewhere.
• Capitalize on (literally) years of MPI research and implementation experience.
• The sum is greater than the parts.
Current membership
1 individual
8 universities
7 vendors
4 US DOE labs
14 members, 6 contributors
Key design feature: Components
Formalized interfaces
• Specifies “black box” implementation
• Different implementations available at run-time
• Can compose different systems on the fly
[Figure: a caller invoking Interface 1, Interface 2, and Interface 3]
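The component idea above can be sketched in a few lines. This is a minimal illustration, not Open MPI's actual code: the `Transport` interface, the two component classes, their priorities, and `select_component` are all hypothetical names chosen for the example. The caller programs against the interface only, and the concrete implementation is picked at run-time.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Formalized interface: the caller sees only these methods."""

    name = "abstract"
    priority = 0

    @abstractmethod
    def available(self) -> bool:
        """Report whether this component can run on the current system."""

    @abstractmethod
    def send(self, payload: bytes) -> str:
        """Send a payload and return a short description of what happened."""


class SharedMemTransport(Transport):
    # Hypothetical component that is always usable.
    name = "sm"
    priority = 50

    def available(self) -> bool:
        return True

    def send(self, payload: bytes) -> str:
        return f"sm:{len(payload)}"


class FastNetTransport(Transport):
    # Hypothetical component that needs special hardware.
    name = "fastnet"
    priority = 100

    def __init__(self, hardware_present: bool):
        self.hardware_present = hardware_present

    def available(self) -> bool:
        return self.hardware_present

    def send(self, payload: bytes) -> str:
        return f"fastnet:{len(payload)}"


def select_component(components, requested=None):
    """Pick the highest-priority usable component, optionally forced by name."""
    usable = [c for c in components if c.available()]
    if requested is not None:
        usable = [c for c in usable if c.name == requested]
    if not usable:
        raise RuntimeError("no usable component")
    return max(usable, key=lambda c: c.priority)
```

With the hardware absent, `select_component` falls back to the shared-memory component. With it present, the higher-priority component wins unless the caller requests another one by name.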
Point-to-point architecture
[Figure: point-to-point component stack, top to bottom]
MPI
• PML-OB1/DR
  BML-R2
  BTL-GM (with MPool-GM and Rcache)
  BTL-OpenIB (with MPool-OpenIB and Rcache)
• PML-CM
  MTL-MX (Myrinet)
  MTL-Portals
  MTL-PSM (QLogic)
Portals port: OB1 vs. CM
OB1
• Matching in main memory
• Short message: eager, buffer on receive
• Long message: rendezvous
  Rendezvous packet: 0 byte payload
  Get message after match
CM
• Matching may be on NIC
• Short message: eager, buffer on receive
• Long message: eager
  Send all data
  • If match: deliver directly to user buffer
  • No match: discard payload, and get() user data after match
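The difference between the two long-message protocols shows up in how much payload crosses the network. The sketch below is a simplified model of the rules listed above, not the Open MPI implementation. It ignores headers and control packets, and `EAGER_LIMIT` and the function name are assumptions made for illustration.

```python
EAGER_LIMIT = 4096  # hypothetical short/long threshold in bytes, not from the slides


def payload_bytes_on_wire(pml, msg_len, receive_posted, eager_limit=EAGER_LIMIT):
    """Return the payload bytes transferred for one message under a simplified model.

    pml            -- "OB1" or "CM"
    msg_len        -- message length in bytes
    receive_posted -- True if a matching receive exists when the data arrives
    """
    if pml not in ("OB1", "CM"):
        raise ValueError(f"unknown PML: {pml}")

    # Short messages: both send eagerly; an unmatched message is buffered
    # on the receive side, so the payload still crosses the wire only once.
    if msg_len <= eager_limit:
        return msg_len

    if pml == "OB1":
        # Rendezvous: the first packet carries 0 bytes of payload, and the
        # data is fetched once, after the match.
        return 0 + msg_len

    # CM long message: all data is sent eagerly.
    if receive_posted:
        # Match: delivered directly to the user buffer.
        return msg_len
    # No match: the payload is discarded, then fetched with get() after the match.
    return 2 * msg_len
```

Under this model CM moves a long message once when the receive is already posted, and twice when it is not. OB1 always moves it once, but only after the rendezvous handshake.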
Collective communications component structure
[Figure: layered component diagram, top to bottom]
User application
MPI API
MPI Component Architecture (MCA)
• PML: OB1, CM, DR, CRCPW
• Allocator: Basic, Bucket
• BTL: TCP, Shared Mem., InfiniBand
• MTL: Myrinet MX, Portals, PSM
• Topology: Basic, Utility
• Collective: Basic, Tuned, Hierarchical, Intercomm., Shared Mem., Non-blocking
• I/O: Portals
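Open MPI lets the user choose components per framework on the `mpirun` command line with `--mca <framework> <component list>`, for example `--mca pml cm`. The sketch below only illustrates that selection step. The `FRAMEWORKS` table is a partial, assumed mapping from the slide's labels to short component names, and `parse_mca_selection` is a hypothetical helper, not part of Open MPI.

```python
# Assumed short names for some of the components shown in the diagram.
FRAMEWORKS = {
    "pml": {"ob1", "cm", "dr", "crcpw"},
    "btl": {"tcp", "sm", "openib"},
    "mtl": {"mx", "portals", "psm"},
}


def parse_mca_selection(argv):
    """Collect `--mca <framework> <comp[,comp...]>` triples from an argument list."""
    selection = {}
    i = 0
    while i < len(argv):
        if argv[i] != "--mca":
            i += 1
            continue
        if i + 2 >= len(argv):
            raise ValueError("--mca needs a framework and a component list")
        framework = argv[i + 1]
        names = argv[i + 2].split(",")
        known = FRAMEWORKS.get(framework)
        if known is None:
            raise ValueError(f"unknown framework: {framework}")
        unknown = [n for n in names if n not in known]
        if unknown:
            raise ValueError(f"unknown {framework} component(s): {unknown}")
        selection[framework] = names
        i += 3
    return selection
```

For example, `parse_mca_selection(["--mca", "pml", "cm", "--mca", "mtl", "portals", "./app"])` returns the CM PML with the Portals MTL, the combination compared against OB1 on the XT.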
NetPipe bandwidth data (MB/sec)
[Figure: bandwidth (MBytes/sec, axis 0 to 2000) versus data size (KBytes, axis 0.0001 to 10000) for Open MPI (CM), Open MPI (OB1), and Cray MPI]
Zero byte ping-pong latency
Cray MPI: 4.78 µsec
Open MPI (OB1): 6.16 µsec
Open MPI (CM): 4.91 µsec
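To put the three latencies in proportion, the short sketch below computes each Open MPI configuration's overhead relative to Cray MPI. The values are the ones listed above. The dictionary keys and the function name are chosen for illustration, and the ratio does not depend on the time unit.

```python
# Zero-byte ping-pong latencies as listed on the slide.
latency = {
    "Cray MPI": 4.78,
    "Open MPI OB1": 6.16,
    "Open MPI CM": 4.91,
}


def relative_overhead(latencies, baseline):
    """Return (value - baseline) / baseline for every non-baseline entry."""
    base = latencies[baseline]
    return {
        name: (value - base) / base
        for name, value in latencies.items()
        if name != baseline
    }


overhead = relative_overhead(latency, "Cray MPI")
for name, frac in overhead.items():
    print(f"{name}: {frac:.1%} above Cray MPI")
```

On these numbers OB1 is roughly 29% above Cray MPI and CM roughly 3% above it.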
VH1—Total runtime
[Figure: VH-1 wall clock time (sec, axis 200 to 250) versus log2 processor count (axis 3.5 to 8.5) for Open MPI (CM), Open MPI (OB1), and Cray MPI]
GTC—Total runtime
[Figure: GTC wall clock time (sec, axis 800 to 1150) versus log2 processor count (axis 1 to 11) for Open MPI (CM), Open MPI (OB1), and Cray MPI]
POP—Step runtime
[Figure: POP time step wall clock time (sec, axis 128 to 2048) versus log2 processor count (axis 3 to 11) for Open MPI (CM), Open MPI (OB1), and Cray MPI]
Summary and future directions
• Support for XT (Catamount and Compute Node Linux) within standard distribution
• Performance (application and micro-benchmarks) comparable to that of Cray MPI
• Support for recovery from process failure is being added
Contact
Richard L. Graham
Tech Integration
National Center for Computational Sciences
(865) [email protected]
www.open-mpi.org