6/19/13
Scalasca, FFTW & Alltoall
Pencil code: Performance
JOACHIM HEIN, LUNARC, LUND UNIVERSITY
IN COLLABORATION: ANDERS JOHANSEN,
Outline
• PRACE
• Scalasca
• Scalasca analysis of the Pencil code
• Communications for Fourier transformations
• FFTW
PRACE: AN OVERVIEW
• 25 PRACE members
• Legal entity (AISBL) PRACE created on 23 April 2010, with its seat in Brussels, Belgium
• 67+ million € from EC FP7 for the preparatory and implementation phases
– Grants INFSO-RI-211528, 261557, 283493 and 312763
• Complemented by ~50 million € from PRACE members
• Interest expressed by: Latvia, Belgium
PRACE is building the top of the pyramid...
[Figure: pyramid of tiers, from top to bottom: Tier-0, Tier-1, Tier-2]
• First production system available: 1 Petaflop/s IBM BlueGene/P (JUGENE) at GCS (Gauss Centre for Supercomputing) partner FZJ (Forschungszentrum Jülich)
– Upgrade: 5.87 Petaflop/s IBM Blue Gene/Q (JUQUEEN)
• Second production system available: Bull Bullx CURIE at GENCI partner CEA. Full capacity of 1.8 Petaflop/s reached by late 2011.
• Third production system available by the end of 2011: 1 Petaflop/s Cray (HERMIT) at GCS partner HLRS (High Performance Computing Center Stuttgart).
• Fourth production system available by mid 2012: 3 Petaflop/s IBM (SuperMUC) at GCS partner LRZ (Leibniz-Rechenzentrum).
• Fifth production system available by August 2011: 1 Petaflop/s IBM BG/Q (FERMI) at CINECA.
• Sixth production system available by January 2013: 1 Petaflop/s IBM (MareNostrum) at BSC.
Selected PRACE activities
• Tier-0 program
– Largest machines in Europe
– Can request several tens of millions of CPU hours
– Call expected in the September/October time frame
– Scalability requirements: should be fine for Pencil
– Also: preparatory access program, cut-off 2nd September
• Can ask for help from a PRACE HPC expert
• Tier-1 program
– DECI program for access to Tier-1 architecture
– Selected on a national level
– In Sweden can ask for ≈ 6 M CPU hours (Lindgren); the granted amount is typically cut a bit
• Training events and schools
• Website: www.prace-project.eu
– Info on the above
– White papers
Performance Analysis Tool SCALASCA
Scalasca: Overview
• Parallel profiling tool
– OpenMP
– MPI
– Can be used for serial
• Aims to help with questions like
– Where is my application spending time?
– Why is it spending time on this?
• Fast and convenient to use
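The first question above ("where is my application spending time") can be illustrated with a minimal hand-rolled sketch. This is not Scalasca's API or output format; it only mimics, under stated assumptions, the kind of per-region summary a profiling tool collects automatically. The names `region`, `report`, `run`, and the two region labels are hypothetical, and the sleep is a stand-in for time spent waiting in communication.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds and call counts per named region.
# All names here are hypothetical and chosen for illustration only.
totals = defaultdict(float)
calls = defaultdict(int)


@contextmanager
def region(name):
    """Time one pass through a code region and add it to the totals."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[name] += time.perf_counter() - start
        calls[name] += 1


def report():
    """Return (region, seconds) pairs sorted by total time, largest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def run(steps=3):
    """Toy time loop: local work followed by a simulated wait."""
    for _ in range(steps):
        with region("compute"):
            sum(i * i for i in range(200_000))  # stand-in for local work
        with region("communicate"):
            time.sleep(0.01)  # stand-in for waiting on other tasks


if __name__ == "__main__":
    run()
    for name, seconds in report():
        print(f"{name:12s} {calls[name]:3d} calls {seconds:8.4f} s")
```

A flat table like this only answers "where". The second question ("why") needs information such as which task was waiting for which, and that is what a dedicated parallel analysis tool adds on top of manual timers.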
Overview (cont.)
• Developed by:
– Forschungszentrum Jülich (Germany)
– German Research School for Simulation Sciences
• Free, but copyrighted
• Widely available
– architectures include: