What Programming Paradigms and Algorithms for Petascale Scientific Computing? A Tentative Hierarchical Programming Methodology
Serge G. Petiton
June 23rd, 2008
Japan-French Informatic Laboratory (JFIL)
Outline
1. Introduction
2. Present Petaflops, on the Road to Future Exaflops
3. Experimentations, toward models and extrapolations
4. Conclusion
Introduction
- The petaflop frontier was crossed (during the night of May 25-26), per the Top500 list
- Sustained petaflop performance will soon be available on a large number of computers
- As scheduled since the 90s, no large technological gaps were really needed to reach petaflop computers
- Languages and tools have not changed much since the first SMPs
- What about languages, tools, and methods for a sustained 10 petaflops?
- Exaflops would probably require new technological advances and new ecosystems
- On the road toward exaflops, we will soon face difficult challenges, and we have to anticipate new problems around the 10-petaflop frontier
Outline
1. Introduction
2. Present Petaflops, on the Road to Future Exaflops
3. Experimentations, toward models and extrapolations
4. Conclusion
Hyper Large Scale Hierarchical Distributed Parallel Architectures
- Many-core processors call for new programming paradigms, such as data parallelism
- Message passing would be efficient for gangs of clusters
- Workflow and grid-like programming may be a solution for the higher-level programming
- Accelerators, vector computing
- Energy consumption optimization
- Optical networks; "inter" and "intra" (chip, cluster, gang, ...) communications
- Distributed/shared-memory computers on a chip
On the road from Petaflop toward Exaflop
- Multiple programming and execution paradigms
- Technological and software challenges: compilers, systems, middleware, schedulers, fault tolerance, ...
- New applications and numerical methods
- Arithmetic and elementary functions (multiple-precision and hybrid)
- Data distributed on networks and grids
- Education challenges: we have to educate scientists
...and the road would be difficult
- Multi-level programming paradigms
- Component technologies
- Mixed data migration and computing, with large-instrument control
- We have to use end-users' expertise
- Non-deterministic distributed computing, component dependence graphs
- Middleware- and platform-independent minimization of "time to solution"; new metrics
- We have to allow end-users to propose scheduler assistance and to give advice to anticipate data migration
Outline
1. Introduction
2. Present Petaflops, on the Road to Future Exaflops
3. Experimentations, toward models and extrapolations
4. Conclusion
YML Language
- Front end: depends only on the application
- Back end: depends on the middleware, e.g. XtremWeb (France), OmniRPC (Japan), and Condor (USA)
http://yml.prism.uvsq.fr/
Components/Tasks Dependency Graph

[Figure: dependency graph with a begin node, an end node, graph nodes, dependences, and generic component nodes producing result A]

The corresponding control code:

    par
       compute tache1(..); signal(e1);
    //
       compute tache2(..); migrate matrix(..); signal(e2);
    //
       wait(e1 and e2);
       par
          compute tache3(..); signal(e3);
       //
          compute tache4(..); signal(e4);
       //
          compute tache5(..); control robot(..); signal(e5); visualize mesh(...);
       end par
    //
       wait(e3 and e4 and e5);
       compute tache6(..);
       compute tache7(..);
    end par
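The par / signal / wait constructs above map naturally onto ordinary threading primitives. Below is a minimal Python sketch of the same graph, with hypothetical placeholder task bodies; it emulates the control flow only, not the actual YML runtime, which schedules components over a middleware.

    import threading
    from concurrent.futures import ThreadPoolExecutor

    # Events corresponding to signal(e1) ... signal(e5) in the graph above.
    e1, e2, e3, e4, e5 = (threading.Event() for _ in range(5))

    def compute(name):
        # Hypothetical stand-in for invoking a YML component.
        print("compute", name)

    def branch_1():
        compute("tache1"); e1.set()          # compute tache1(..); signal(e1)

    def branch_2():
        compute("tache2"); e2.set()          # compute tache2(..); migrate matrix(..); signal(e2)

    def branch_3():
        e1.wait(); e2.wait()                 # wait(e1 and e2)
        with ThreadPoolExecutor() as pool:   # inner par ... end par
            pool.submit(lambda: (compute("tache3"), e3.set()))
            pool.submit(lambda: (compute("tache4"), e4.set()))
            pool.submit(lambda: (compute("tache5"), e5.set()))

    def branch_4():
        e3.wait(); e4.wait(); e5.wait()      # wait(e3 and e4 and e5)
        compute("tache6")
        compute("tache7")

    # Outer par ... end par: each "//" branch needs its own worker,
    # otherwise the waiting branches could deadlock the pool.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for branch in (branch_1, branch_2, branch_3, branch_4):
            pool.submit(branch)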
LAKe Library (Nahid Emad, UVSQ)
YML/LAKe
Block Gauss-Jordan, 101-processor cluster, Grid 5000; YML versus YML/OmniRPC (with Maxime Hugues (TOTAL and LIFL))

Block size = 1500

Block number   Tasks   Overhead (%)
2x2              8        22.41
3x3             27        14.78
4x4             64        28.37
5x5            125        40.82
6x6            216        65.60
7x7            343        97.01
8x8            612       138.24

We optimize the "time to solution"; several middlewares may be chosen.

[Plot: execution time versus number of blocks]
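For reference, the kernel being measured is block Gauss-Jordan matrix inversion. The following serial NumPy sketch is my illustration of the numerical scheme only, not the benchmarked code; the YML version executes the p x p block grid as the distributed task graph whose overhead is tabulated above.

    import numpy as np

    def block_gauss_jordan_inverse(A, b):
        # Invert A by Gauss-Jordan elimination at block granularity on the
        # augmented matrix [A | I]. b is the block size; n must be a multiple
        # of b, and every pivot block must be invertible (no block pivoting
        # in this sketch).
        n = A.shape[0]
        p = n // b                                  # p x p grid of b x b blocks
        M = np.hstack([A.astype(float), np.eye(n)])
        for k in range(p):
            rk = slice(k * b, (k + 1) * b)
            M[rk, :] = np.linalg.inv(M[rk, rk]) @ M[rk, :]   # normalize block row k
            for i in range(p):
                if i != k:
                    ri = slice(i * b, (i + 1) * b)
                    M[ri, :] -= M[ri, rk] @ M[rk, :]         # eliminate block column k
        return M[:, n:]

    # Quick check on a small, diagonally dominant matrix (2 x 2 blocks of size 3).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 6)) + 6.0 * np.eye(6)
    assert np.allclose(A @ block_gauss_jordan_inverse(A, b=3), np.eye(6))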
GRID 5000, BGJ, 10 and 101 nodes, YML versus YML/OmniRPC

Block size = 1500

Block number   Overhead (%), 101 nodes   Overhead (%), 10 nodes
2x2                 22.41                     21.67
3x3                 14.78                     11.57
4x4                 28.37                     12.12
5x5                 40.82                     22.60
6x6                 65.60                     50.00
7x7                 97.01                     63.98
8x8                138.24                    133.69
BGJ, YML/OmniRPC versus YML
Block size = 1500

Block number   Overhead (%), 101 nodes, Grid5000   Overhead (%), cluster of clusters
2x2                 22.41                               17.58
3x3                 14.78                               14.22
4x4                 28.37                               25.17
5x5                 40.82                               24.64
6x6                 65.60                               62.86
7x7                 97.01                               40.12
8x8                138.24                               99.79
Asynchronous Restarted Iterative Methods on multi-node computers
With Guy Bergère, Zifan Li, and Ye Zhang (LIFL)
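The slides do not spell out the algorithm, so the sketch below is only a hedged serial reconstruction of the general idea behind asynchronous restarted hybrids: several restarted GMRES co-methods run with different restart parameters and adopt whichever current iterate has the smallest residual as their common restart vector. The function name and parameters are hypothetical, and SciPy's gmres stands in for each co-method; in the real asynchronous version the co-methods run on distinct nodes and exchange iterates without global synchronization.

    import numpy as np
    from scipy.sparse.linalg import gmres

    def hybrid_restarted_gmres(A, b, restarts=(5, 10, 20), cycles=50, rtol=1e-10):
        # Serial emulation: each round, every co-method performs one restart
        # cycle from the shared iterate; the result with the smallest residual
        # becomes the next shared restart vector.
        x_best = np.zeros(b.shape[0])
        for _ in range(cycles):
            candidates = [gmres(A, b, x0=x_best, restart=m, maxiter=1)[0]
                          for m in restarts]
            x_best = min(candidates, key=lambda x: np.linalg.norm(b - A @ x))
            if np.linalg.norm(b - A @ x_best) <= rtol * np.linalg.norm(b):
                break
        return x_best

    # Example: a small, nonsymmetric, well-conditioned system.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 200)) + 20.0 * np.eye(200)
    b = rng.standard_normal(200)
    x = hybrid_restarted_gmres(A, b)
    print(np.linalg.norm(b - A @ x))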
Convergence on GRID 5000

[Plot: residual norm (1e-14 to 1e+10) versus time in seconds (0 to 30), for pure GMRES and hybrid variants with nG = 2, 5, 8, and 10]
One or two distributed sites, same number of processors, communication overlap

[Plot: residual norm versus execution time in seconds (0 to 25), on one site versus two sites]
Cell/GPU at CEA/DEN: with Christophe Calvin and Jérôme Dubois (CEA/DEN Saclay)
- MINOS/APOLLO3 solver
- Neutronic transport problem
- Power method to compute the dominant eigenvalue
- Slow convergence
- Large number of floating-point operations

Experiments on:
- Dual quad-core Xeon, 2.83 GHz (45 GFlops)
- Cell blade (CINES, Montpellier) (400 GFlops)
- GPU Quadro FX 4600 (240 GFlops)
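As a reference for the kernel named above, here is a minimal NumPy power-iteration sketch; the CPU, Cell, and GPU versions benchmarked next differ in where the matrix-vector product executes and in arithmetic precision, not in the algorithm.

    import numpy as np

    def power_method(A, tol=1e-10, maxiter=10000):
        # Power iteration for the dominant eigenvalue of A: one matrix-vector
        # product per iteration dominates the flop count.
        v = np.random.default_rng(0).standard_normal(A.shape[0])
        v /= np.linalg.norm(v)
        lam = 0.0
        for _ in range(maxiter):
            y = A @ v
            lam_new = v @ y                  # Rayleigh quotient estimate
            v = y / np.linalg.norm(y)
            if abs(lam_new - lam) <= tol * max(1.0, abs(lam_new)):
                return lam_new, v
            lam = lam_new
        return lam, v

    # Dominant eigenvalue of a random nonnegative matrix (Perron root).
    A = np.random.default_rng(1).random((500, 500))
    print(power_method(A)[0])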
Power Method: Performance

[Plot: GFlops (0 to 25) versus matrix size (32 to 8192 rows), for GPU, CPU, and Cell]
Power Method: Arithmetic Accuracy

[Plot: measured difference (0 to 3.5e-5) versus iterations]
Arnoldi Projection: Performance

[Plot: GFlops (0 to 25) versus matrix size (32 to 7680 rows), for CPU, Cell, GPU, and Cell/LS]
Arnoldi Projection: Arithmetic Accuracy

[Plot: deviation from orthogonality of the basis vectors v1 through v8 (0 to 3.5e-3), GPU error versus Cell error (orthogonalization degradation)]
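The orthogonalization degradation measured above is the drift of the computed Krylov basis away from orthonormality in the accelerators' single-precision arithmetic. Here is a minimal Arnoldi sketch using modified Gram-Schmidt (my choice for illustration; the slides do not say which variant the solver uses), with a check of the same drift in float32 versus float64.

    import numpy as np

    def arnoldi(A, v0, m):
        # Build an orthonormal basis V of the Krylov subspace
        # span{v0, A v0, ..., A^m v0} and the Hessenberg matrix H,
        # orthogonalizing with modified Gram-Schmidt (no breakdown check).
        n = v0.shape[0]
        V = np.zeros((n, m + 1), dtype=v0.dtype)
        H = np.zeros((m + 1, m), dtype=v0.dtype)
        V[:, 0] = v0 / np.linalg.norm(v0)
        for j in range(m):
            w = A @ V[:, j]
            for i in range(j + 1):
                H[i, j] = V[:, i] @ w
                w = w - H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
        return V, H

    # Orthogonality drift max|V^T V - I| in single versus double precision.
    rng = np.random.default_rng(1)
    A64 = rng.standard_normal((512, 512))
    for dtype in (np.float32, np.float64):
        V, _ = arnoldi(A64.astype(dtype), np.ones(512, dtype=dtype), m=8)
        print(dtype.__name__, np.abs(V.T @ V - np.eye(9)).max())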
Outline
1. Introduction
2. Present Petaflops, on the Road to Future Exaflops
3. Experimentations, toward models and extrapolations
4. Conclusion
Conclusion
- We plan to extrapolate, from Grid5000 and our multi-core experiments, some behaviors of the future hierarchical large petascale computers, using YML for the higher level.
- We need to propose new high-level languages to program large petaflop computers, to minimize "time to solution" and energy consumption, with system and middleware independence; MPI would probably be very difficult to dethrone.
- Other important codes would still be carefully "hand-optimized".
- Several programming paradigms, with respect to the different levels, have to be mixed; the interfaces have to be well specified.
- End-users have to be able to provide expertise to help middleware management, such as scheduling, and to choose libraries.
- New asynchronous hybrid methods have to be introduced.