S. Benkner, University of Vienna HIPEAC CSW 2013, Paris, May 2, 2013
Performance Portability and Programmability for Heterogeneous Many-core Architectures (PEPPHER)
Siegfried Benkner (on behalf of the PEPPHER Consortium)
Research Group Scientific Computing
Faculty of Computer Science
University of Vienna
Austria
EU Project PEPPHER
Performance Portability & Programmability for Heterogeneous Manycore Architectures
• ICT FP7, Computing Systems; 3 years; finished Feb. 2013
• 9 Partners, Coordinated by University of Vienna
• http://www.peppher.eu
Goal: Enable portable, productive and efficient programming of single-node heterogeneous many-core systems.
Holistic Approach
• Component-Based High-Level Program Development
• Auto-tuned Algorithms & Data Structures
• Compilation Strategies
• Runtime Systems
• Hardware Mechanisms
PEPPHER Components
Component Interface
• Specification of functionality
• Used by mainstream programmers
Implementation Variants
• Different architectures/platforms
• Different algorithms/data structures
• Different input characteristics
• Different performance goals
• Written by expert programmers (or generated, e.g. by auto-tuning, cf. EU Autotune Project)
[Figure: Component Implementation Variants. An «interface» C declares f(param-list) and carries interface meta-data; «variant» C1 … «variant» Cn each define f(param-list){…} and carry variant meta-data.]
Features
• Different programming languages (C/C++, OpenCL, CUDA, OpenMP)
• Task & data parallelism
Constraints
• No side-effects; non-preemptive
• Stateless; composition on CPU only
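The interface/variant split above can be illustrated with a minimal C++ sketch. All names here (Target, Variant, Component, make_square_component) are hypothetical and are not the PEPPHER API; the meta-data is reduced to a name and a target platform, and selection simply takes the first variant whose target is available.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not the PEPPHER API.
enum class Target { CPU, GPU };

// One implementation variant of the interface f(x), with minimal meta-data.
struct Variant {
    std::string name;               // meta-data: identifier of the variant
    Target target;                  // meta-data: platform it was written for
    std::function<int(int)> impl;   // must be stateless and free of side effects
};

// A component: one interface, several interchangeable variants.
struct Component {
    std::vector<Variant> variants;

    // Return the first variant whose target is available, or nullptr if none is.
    const Variant* select(const std::vector<Target>& available) const {
        for (const Variant& v : variants)
            for (Target t : available)
                if (v.target == t) return &v;
        return nullptr;
    }
};

// Build a toy component with a "GPU" and a "CPU" variant of the same function.
inline Component make_square_component() {
    Component c;
    c.variants.push_back({"square_gpu", Target::GPU, [](int x) { return x * x; }});
    c.variants.push_back({"square_cpu", Target::CPU, [](int x) { return x * x; }});
    return c;
}
```

A caller only sees the interface: it asks the component for a variant that fits the available targets and invokes impl, without knowing which variant was chosen.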
Platform Description Language (PDL)
Goal: Make platform specific information explicit for tools and users.
Processing Units (PUs)
• Master (initiates program execution)
• Worker (executes delegated tasks)
• Hybrid (master & worker)
Memory Regions
• Express key characteristics of memory hierarchy
• Can be defined for all processing units
Interconnects
• Describe communication facilities between PUs
Hardware and Software Properties
• e.g., core-count, memory sizes, available libraries
Data movement
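To make the PDL concepts concrete, the following C++ sketch models processing units, memory regions, interconnects and properties as plain data. It is an assumed in-memory model for illustration only; it does not reproduce the PDL syntax, and the ids (cpu0, gpu0, host-mem, gpu-mem) and sizes are invented.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory model of a platform description; not the PDL syntax.
enum class Role { Master, Worker, Hybrid };

struct ProcessingUnit {
    std::string id;
    Role role;
    std::map<std::string, std::string> properties;  // e.g. "core-count" -> "8"
};

struct MemoryRegion {
    std::string id;
    std::string owner;   // id of the PU this region belongs to
    long size_mb;
};

struct Interconnect {
    std::string from, to;  // ids of the two PUs it links
};

struct Platform {
    std::vector<ProcessingUnit> pus;
    std::vector<MemoryRegion> regions;
    std::vector<Interconnect> links;

    // Ids of all PUs that may execute delegated tasks (Worker or Hybrid).
    std::vector<std::string> task_capable() const {
        std::vector<std::string> out;
        for (const ProcessingUnit& pu : pus)
            if (pu.role != Role::Master) out.push_back(pu.id);
        return out;
    }

    // True if an interconnect links a and b in either direction.
    bool connected(const std::string& a, const std::string& b) const {
        for (const Interconnect& l : links)
            if ((l.from == a && l.to == b) || (l.from == b && l.to == a)) return true;
        return false;
    }
};

// Invented example: one hybrid CPU and one GPU worker joined by one interconnect.
inline Platform example_platform() {
    Platform p;
    ProcessingUnit cpu{"cpu0", Role::Hybrid, {}};
    cpu.properties["core-count"] = "8";
    ProcessingUnit gpu{"gpu0", Role::Worker, {}};
    p.pus.push_back(cpu);
    p.pus.push_back(gpu);
    p.regions.push_back({"host-mem", "cpu0", 16384});
    p.regions.push_back({"gpu-mem", "gpu0", 4096});
    p.links.push_back({"cpu0", "gpu0"});
    return p;
}
```

Tools can then answer questions such as "which PUs can run tasks" or "can data move between these two PUs" from the description alone.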
PEPPHER Coordination Language
Component calls
• asynchronous & synchronous
#pragma pph call        // read A, write B -> meta data
cf1(A, N, B, M);
#pragma pph call
cf2(B, M);
#pragma pph call sync
cf(A, N);
Other Features:
• Specification of optimization goals (time vs. power) and execution targets
• Data partitioning; array access patterns; parameter assertions
• Memory consistency control
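The read/write meta-data noted for cf1 is what allows asynchronous calls to be ordered safely. The C++ sketch below shows one way such ordering could be derived: two calls conflict if one writes data the other reads or writes. The Call type and dependencies function are hypothetical, and the access modes assumed for cf2 (reads B) and cf (reads A) are not stated on the slide.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one component call and its data access meta-data.
struct Call {
    std::string name;
    std::set<std::string> reads;
    std::set<std::string> writes;
};

// True if the two sets share at least one element.
inline bool overlaps(const std::set<std::string>& a, const std::set<std::string>& b) {
    for (const std::string& x : a)
        if (b.count(x) > 0) return true;
    return false;
}

// Return pairs (i, j) with i < j where call j must wait for call i because of a
// read-after-write, write-after-read or write-after-write conflict.
inline std::vector<std::pair<int, int>> dependencies(const std::vector<Call>& calls) {
    std::vector<std::pair<int, int>> deps;
    const int n = static_cast<int>(calls.size());
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < j; ++i) {
            const Call& a = calls[i];
            const Call& b = calls[j];
            if (overlaps(a.writes, b.reads) || overlaps(a.reads, b.writes) ||
                overlaps(a.writes, b.writes)) {
                deps.push_back(std::make_pair(i, j));
            }
        }
    }
    return deps;
}
```

Under these assumptions the three calls on the slide yield a single dependency, cf1 before cf2 (through B), while cf only reads A and could run concurrently with both.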
Transformation System
Source-to-Source Transformation
• based on ROSE
• generates C++ with calls to coordination layer and StarPU runtime
Coordination Layer
• Support for parallel patterns (pipelining)
• Submission of tasks to StarPU
Heterogeneous Runtime System
• Based on INRIA’s StarPU
• Selection of implementation variants based on available hardware resources
• Data-aware & performance-aware task scheduling onto heterogeneous PUs
[Figure: PEPPHER Component Framework. Labels: Application with Annotations; Transformation Tool; PEPPHER Component Repository; Platform Descriptor (PDL); Coordination Layer; Task-based Heterogeneous Runtime; Hybrid Hardware (SMP, GPU, MIC).]
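Data-aware and performance-aware scheduling can be illustrated with a small C++ sketch: for each candidate worker, add the time its queue is already busy, the predicted data transfer time and the predicted execution time of the matching variant, then pick the earliest predicted finish. This is a simplified heuristic written for illustration; it is not StarPU's scheduler, and the Candidate type and its numbers are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical cost estimate for running one task on one worker.
struct Candidate {
    std::string worker;
    double busy_until;     // time at which already queued work finishes
    double transfer_time;  // predicted cost of moving input data to the worker
    double exec_time;      // predicted run time of the variant for this worker
};

// Predicted completion time of the task on this candidate.
inline double predicted_finish(const Candidate& c) {
    return c.busy_until + c.transfer_time + c.exec_time;
}

// Index of the candidate with the earliest predicted finish, or -1 if empty.
inline int pick_candidate(const std::vector<Candidate>& cs) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(cs.size()); ++i) {
        if (best < 0 || predicted_finish(cs[i]) < predicted_finish(cs[best])) best = i;
    }
    return best;
}
```

With a CPU variant predicted at 10 time units and a GPU variant at 2 units plus 3 units of transfer, the GPU is chosen; once the GPU queue is busy for 6 more units, the same task goes to the CPU instead.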
Performance Results
OpenCV Face Detection
• 3425 images
• Image resolution: 640x480 (VGA)
• Different implementation variants for middle stages (CPU vs. GPU)
• Comparison to plain OpenCV version and hand-coded Intel TBB (pipeline)
Major Results of PEPPHER Component Framework
• Multi-architectural, resource- & performance-aware components
• PDL adopted by Open Community Runtime (OCR) – US XStack program
Transformation, Composition, Compilation
• Transformation Tool (U. Vienna)
• Composition Tool & SkePU (U. Linköping)
• Offload C++ compiler used by game industry (Codeplay)
Runtime System (U. Bordeaux)
• StarPU part of Linux (Debian) distribution and MAGMA library
Superior parallel algorithms and data structures (KIT, Chalmers)
PePU Experimental Hardware Platform & Simulator (Movidius)
• PeppherSIM used in industry
Backup Slides
Example
FOR k = 0..TILES-1
    POTRF(A[k][k])
    FOR m = k+1..TILES-1
        TRSM(A[k][k], A[m][k])
    FOR n = k+1..TILES-1
        SYRK(A[n][k], A[n][n])
        FOR m = n+1..TILES-1
            GEMM(A[m][k], A[n][k], A[m][n])
Utilize expert-written components: BLAS kernels from MAGMA and PLASMA
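The loop nest above is a tiled Cholesky factorization. The C++ sketch below keeps the same loop structure but assumes every tile is a single scalar, so the four kernels shrink to one-line stand-ins; real tiles would call the MAGMA or PLASMA kernels instead. The function names are illustrative, and the input is assumed to be symmetric positive definite.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Scalar stand-ins for the tile kernels (assumption: each tile is 1x1).
inline void potrf(double& akk) { akk = std::sqrt(akk); }                      // factor diagonal tile
inline void trsm(double akk, double& amk) { amk /= akk; }                     // solve against diagonal tile
inline void syrk(double ank, double& ann) { ann -= ank * ank; }               // update diagonal tile
inline void gemm(double amk, double ank, double& amn) { amn -= amk * ank; }   // update off-diagonal tile

// In-place factorization: on return the lower triangle of a holds L with
// A = L * L^T. The upper triangle is left untouched.
inline void cholesky(Matrix& a) {
    const int tiles = static_cast<int>(a.size());
    for (int k = 0; k < tiles; ++k) {
        potrf(a[k][k]);
        for (int m = k + 1; m < tiles; ++m) trsm(a[k][k], a[m][k]);
        for (int n = k + 1; n < tiles; ++n) {
            syrk(a[n][k], a[n][n]);
            for (int m = n + 1; m < tiles; ++m) gemm(a[m][k], a[n][k], a[m][n]);
        }
    }
}
```

Each kernel invocation corresponds to one component call in the annotated version, which is what lets a runtime choose a CPU or GPU variant per call.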
Basic Coordination Language
Memory Consistency
• flush; for ensuring consistency between host and workers
Component calls
• implicit memory consistency across workers
#pragma pph call
cf1(A, N);
...
#pragma pph flush(A)    // block until A has become available
int first = A[0];       // explicit flush required since A is accessed
#pragma pph call
cf1(A, N);              // A: read / write
...                     // implicit memory consistency on workers only
...                     // no explicit flush is needed here provided A
...                     // is not accessed within the master process
#pragma pph call
cf2(A, N);              // A: read; actual values of A produced by cf1()
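The difference between implicit consistency across workers and an explicit flush towards the host can be mimicked with a single-threaded C++ model. The Buffer type and the call and flush functions are hypothetical stand-ins for the pragmas, and the model runs synchronously, whereas the real calls are asynchronous and flush blocks until the data is available.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical model: a data object with separate host and worker copies.
struct Buffer {
    std::vector<int> host;
    std::vector<int> worker;
    bool worker_valid = false;  // true while the worker copy holds the newest data
};

// Model of "#pragma pph call": the component operates on the worker copy only.
inline void call(Buffer& b, const std::function<void(std::vector<int>&)>& component) {
    if (!b.worker_valid) {      // upload host data the first time it is needed
        b.worker = b.host;
        b.worker_valid = true;
    }
    component(b.worker);        // successive calls see each other's results
}

// Model of "#pragma pph flush(A)": bring the host copy up to date.
inline void flush(Buffer& b) {
    if (b.worker_valid) {
        b.host = b.worker;
        b.worker_valid = false;
    }
}
```

In this model a second call sees the results of the first without any flush, while the host copy stays stale until flush is invoked, which mirrors why the slide requires a flush before reading A[0] in the master process.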