HPCC AWARDS CLASS 1: PERFORMANCE
HPL
This is the widely used implementation of the Linpack Toward Peak Performance benchmark. It measures the sustained floating point rate of execution for solving a linear system of equations.
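As an illustration only, the kind of problem HPL times can be sketched in a few lines: solve a dense system by Gaussian elimination with partial pivoting, check the residual, and convert a run time into a rate. This is not the HPL code. The function names are hypothetical, and the operation count 2/3 n^3 + 3/2 n^2 is the conventional count for an LU-based solve, stated here as an assumption.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= factor * m[col][j]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def max_residual(a, x, b):
    """Largest absolute entry of a x - b, a simple correctness check."""
    return max(abs(sum(aij * xj for aij, xj in zip(row, x)) - bi)
               for row, bi in zip(a, b))


def hpl_gflops(n, seconds):
    """Rate in Gflop/s, assuming the 2/3 n^3 + 3/2 n^2 operation count."""
    return (2.0 / 3.0 * n**3 + 1.5 * n**2) / seconds / 1e9


a = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 5.0]
x = solve(a, b)
residual = max_residual(a, x, b)
```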
STREAM
A simple benchmark test that measures sustainable memory bandwidth (in GB/s) and the corresponding computation rate for four vector kernel codes.
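A minimal sketch of the Triad kernel, one of the four STREAM kernels, may help make the measurement concrete. It is plain Python rather than the benchmark's own code, and the function names are hypothetical. The bandwidth figure assumes the usual accounting of two loads and one store of an 8-byte word per element.

```python
import time


def triad(a, b, c, q):
    """Triad kernel: a[i] = b[i] + q * c[i] for every element."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bandwidth_gb_per_s(n, seconds, bytes_per_word=8):
    """Bandwidth in GB/s, counting three words moved per element."""
    return 3 * n * bytes_per_word / seconds / 1e9


n = 100_000
a = [0.0] * n
b = [1.0] * n
c = [2.0] * n

start = time.perf_counter()
triad(a, b, c, 3.0)
elapsed = time.perf_counter() - start

bandwidth = triad_bandwidth_gb_per_s(n, elapsed)
```

An interpreted loop like this mostly measures interpreter overhead, so the resulting number illustrates the arithmetic of the metric, not the memory system.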
RandomAccess
Measures the rate of integer updates to random locations in a large global memory array.
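The update pattern can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's code: Python's `random` module stands in for the benchmark's own pseudo-random sequence, the table length is assumed to be a power of two, and the function names are hypothetical. The rate is reported in GUPS (giga-updates per second).

```python
import random


def random_access(table, num_updates, seed=0):
    """XOR pseudo-random 64-bit values into pseudo-random table slots."""
    rng = random.Random(seed)
    mask = len(table) - 1  # valid only when len(table) is a power of two
    for _ in range(num_updates):
        value = rng.getrandbits(64)
        table[value & mask] ^= value


def gups(num_updates, seconds):
    """Giga-updates per second."""
    return num_updates / seconds / 1e9


table = list(range(1024))
random_access(table, 4096, seed=1)
changed = table != list(range(1024))

# XOR is its own inverse, so replaying the same update stream
# restores the original table. This gives a cheap correctness check.
random_access(table, 4096, seed=1)
restored = table == list(range(1024))
```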
PTRANS
Implements parallel matrix transpose that exercises a large volume communication pattern whereby pairs of processes communicate with each other simultaneously.
FFT
Calculates a Discrete Fourier Transform (DFT) of very large one-dimensional complex data vectors.
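For readers who want to see the computation itself, here is a small recursive radix-2 FFT checked against the direct DFT definition. It is a teaching sketch, not the benchmark implementation. The input length is assumed to be a power of two, the function names are hypothetical, and the 5 n log2(n) operation count used for the rate is a common convention, stated here as an assumption.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def naive_dft(x):
    """Direct O(n^2) evaluation of the DFT definition, for checking."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def fft_gflops(n, seconds):
    """Rate in Gflop/s, assuming 5 n log2(n) operations."""
    return 5 * n * math.log2(n) / seconds / 1e9


x = [complex(i, -i) for i in range(8)]
max_error = max(abs(u - v) for u, v in zip(fft(x), naive_dft(x)))
```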
b_eff
Effective bandwidth benchmark is a set of MPI tests that measure the latency and bandwidth of a number of simultaneous communication patterns.
DGEMM
Measures the floating point rate of execution of double precision real matrix-matrix multiplication.
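The operation being timed can be written out directly. The sketch below uses the general form C = alpha * A * B + beta * C with a plain triple loop. It is illustrative only, since real runs rely on tuned BLAS libraries. The function names are hypothetical, and the rate assumes the usual count of 2 n^3 floating point operations for square n-by-n matrices.

```python
def dgemm(alpha, a, b, beta, c):
    """In place: c = alpha * a * b + beta * c, for lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for p in range(inner):
                acc += a[i][p] * b[p][j]
            c[i][j] = alpha * acc + beta * c[i][j]


def dgemm_gflops(n, seconds):
    """Rate in Gflop/s, assuming 2 n^3 operations for n-by-n matrices."""
    return 2 * n**3 / seconds / 1e9


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[0.0, 0.0], [0.0, 0.0]]
dgemm(1.0, a, b, 0.0, c)
```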
HPCC BENCHMARKS
[Figure: four log-scale charts of the 1st, 2nd, and 3rd place results for award years 2005 through 2014, one chart per category: EP-STREAM-Triad (TB/s), G-RandomAccess (GUPS), G-FFT (Tflop/s), G-HPL (Tflop/s)]
Systems labeled in the charts:
• Cray XT3, SANDIA, Red Storm
• IBM Blue Gene/L, LIVERMORE
• IBM Blue Gene/P, ARGONNE, Intrepid
• IBM Blue Gene/P, LIVERMORE, Dawn
• IBM Blue Gene/Q, ARGONNE, Mira
• Cray XT5 Quad-core, OAK RIDGE, Jaguar
• Cray XT5 Hex-core, OAK RIDGE, Jaguar
• NEC SX-9, JAMSTEC, Earth Simulator
• Fujitsu SPARC64 VIIIfx, RIKEN AICS, K computer
• DARPA PERCS, IBM Power7, Hub chip codenamed Torrent
FIND OUT MORE AT http://www.hpcchallenge.org
SPONSORED BY
National Science Foundation
PROJECT GOALS
• Provide performance bounds in locality space using real world computational kernels
• Allow scaling of input data size and time to run according to the system capability
• Verify the results using standard error analysis
• Allow vendors and users to provide optimized code for superior performance
• Make the benchmark information continuously available to the public in order to disseminate performance tuning knowledge and record technological progress over time
• Ensure reproducibility of the results by detailed reporting of all aspects of benchmark runs
FEATURE HIGHLIGHTS OF HPCC 1.4.3
• Increased the size of scratch vector for local FFT tests that was missed in the previous version (reported by SGI)
• Added Makefile for Blue Gene/P contributed by Vasil Tsanov
• Released in August 2013
SUMMARY OF HPCC AWARDS
CLASS 1: Best Performance
• Best in G-HPL, EP-STREAM-Triad per system, G-RandomAccess, G-FFT
• There will be 4 winners (one in each category)
CLASS 2: Most Productivity
• One or more winners
• Judged by a panel at SC14 BOF
• Stresses elegance and performance
• Implementations in various (existing and new) languages are encouraged
• Submissions may include up to two kernels not present in HPCC
• Submission consists of: code, its description, performance numbers, and a