Top Banner
Designing a CCA Framework for Accelerators Khushbu Agarwal Daniel Chavarría Manoj Krishnan Ian Gorton
9

Designing a CCA Framework for Accelerators Khushbu Agarwal Daniel Chavarría Manoj Krishnan Ian Gorton.

Dec 17, 2015

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Designing a CCA Framework for Accelerators Khushbu Agarwal Daniel Chavarría Manoj Krishnan Ian Gorton.

Designing a CCA Framework for Accelerators

Khushbu AgarwalDaniel ChavarríaManoj Krishnan

Ian Gorton

Page 2: Designing a CCA Framework for Accelerators Khushbu Agarwal Daniel Chavarría Manoj Krishnan Ian Gorton.

Motivation

Emerging hybrid applications & architectures:Many codes & kernels being ported to hybrid CPU/GPU clusters“Production”-level applications not yet (?)

Other accelerated systems being deployed/developed:

FPGA-accelerated systemsTraditional

Convey systems

Cray XMT system: x86 front-end nodes + multithreaded compute nodes

2

Page 3: Designing a CCA Framework for Accelerators Khushbu Agarwal Daniel Chavarría Manoj Krishnan Ian Gorton.

Motivation (cont.)

3

Page 4: Designing a CCA Framework for Accelerators Khushbu Agarwal Daniel Chavarría Manoj Krishnan Ian Gorton.

Motivating Applications

4

Molecular dynamics GPU-based kernels being developed at PNNL

Evolving into more sophisticated/complete hybrid MD application

CCSD(T) GPU-based kernels also being developedCray XMT-based power grid contingency analysis application:

Dynamic power flow-based topology analysis on multithreaded processors to determine non-critical elements

Numerical solution of critical contingency cases on x86-based cluster

Page 5: Designing a CCA Framework for Accelerators Khushbu Agarwal Daniel Chavarría Manoj Krishnan Ian Gorton.

Motivating Applications (cont.)

5

“N-x”

CxN

“N-x”

CxNb

non-critical elements

Dynamic Topology Analysis

Western Power Grid “N-5” C5

17000 ~= 1020

Page 6: Designing a CCA Framework for Accelerators Khushbu Agarwal Daniel Chavarría Manoj Krishnan Ian Gorton.

Objective

Provide a device independent interface to access accelerators and execute computations on them. e.g.:

Graphics Processor Units (GPUs)Field-Programmable Gate Arrays (FPGAs)Cray XMT

6

Page 7: Designing a CCA Framework for Accelerators Khushbu Agarwal Daniel Chavarría Manoj Krishnan Ian Gorton.

Proposed SIDL Framework

Initialize()Initializes the connection between accelerator and local host

opaque AllocMem()Allocate Memory on the accelerator and returns a pointer to it

opaque ReadMem( opaque m)Synchronous - Reads data from the accelerator and copies it to local buffer

WriteMem(opaque m, opaque data)Synchronous - Copy data from local buffer to accelerator memory

Exec()Synchronous - Executes the computation kernel on the accelerator

int AsyncExec()Asynchronous – Executes the computation kernel on the accelerator and returns a handle

7

Page 8: Designing a CCA Framework for Accelerators Khushbu Agarwal Daniel Chavarría Manoj Krishnan Ian Gorton.

Proposed SIDL Framework

AsyncWriteMem(opaque m, opaque data)Asynchronous – Issue request to copy data from local buffer to accelerator memory

opaque AsyncReadMem (opaque m)Asynchronous – Issue a request to read data from accelerator and copy it to local buffer

Wait() Wait for all the asynchronous data transfer operations to complete

WaitExec(int handle)Wait for asynchronous execution (identified by the handle) to complete

8

Page 9: Designing a CCA Framework for Accelerators Khushbu Agarwal Daniel Chavarría Manoj Krishnan Ian Gorton.

Implementation Challenges

Needs to differentiate between on board vs “farther away” accelerators.Init – will need to set up a connection between host and acceleratorHow do we start the execution of a kernel (move data) on an accelerator?

Heavily system and platform dependent

Good to hide behind a component interface!

9