Daniel Moth, Parallel Computing Platform, Microsoft
Heterogeneous platform support in Visual Studio - AMDdeveloper.amd.com/wordpress/media/2013/06/2616_final.pdf · Part of Visual C++ Visual Studio integration ... using namespace concurrency;

Feb 04, 2018

Heterogeneous platform support in Visual Studio

Context

Code

Closing thoughts

146X  Interactive visualization of volumetric white matter connectivity
36X   Ionic placement for molecular dynamics simulation on GPU
19X   Transcoding HD video stream to H.264
17X   Simulation in Matlab using .mex file CUDA function
100X  Astrophysics N-body simulation
149X  Financial simulation of LIBOR model with swaptions
47X   GLAME@lab: An M-script API for linear algebra operations on GPU
20X   Ultrasound medical imaging for cancer diagnostics
24X   Highly optimized object oriented molecular dynamics
30X   Cmatch exact string matching to find similar proteins and gene sequences

CPU | GPU
Low memory bandwidth | High memory bandwidth
Higher power consumption | Lower power consumption
Medium level of parallelism | High level of parallelism
Deep execution pipelines | Shallow execution pipelines
Random accesses | Sequential accesses
Supports general code | Supports data-parallel code
Mainstream programming | Niche/exotic programming

CPUs and GPUs coming closer together…

…nothing settled in this space, things still in motion…

We have designed a mainstream solution not only for today, but also for tomorrow

Part of Visual C++

Visual Studio integration

STL-like library for multidimensional data

Builds on DirectX

performance

portability

productivity

Context

Code

Closing thoughts

void AddArrays(int n, int * pA, int * pB, int * pC)
{
    for (int i=0; i<n; i++)
    {
        pC[i] = pA[i] + pB[i];
    }
}

How do we take the serial code above, which runs on the CPU, and convert it to run on the GPU?

#include <amp.h>
using namespace concurrency;

void AddArrays(int n, int * pA, int * pB, int * pC)
{
    array_view<int,1> a(n, pA);
    array_view<int,1> b(n, pB);
    array_view<int,1> sum(n, pC);

    parallel_for_each(
        sum.grid,
        [=](index<1> idx) restrict(direct3d)
        {
            sum[idx] = a[idx] + b[idx];
        }
    );
}

array_view variables are captured and copied to the device (on demand)

restrict(direct3d): tells the compiler to check that this code can execute on DirectX hardware

parallel_for_each: executes the lambda on the accelerator once per thread

grid: the number and shape of threads to execute the lambda

index: the ID of the thread that is running the lambda, used to index into captured arrays

array_view: wraps the data so that it can be operated on by the accelerator

index<N>: represents an N-dimensional point

extent<N>: number of elements in each dimension of an N-dimensional array

grid<N>: origin (index<N>) plus extent<N>

N can be any number, with conveniences for up to 3 dimensions (z,y,x)

index<1> i1(2);
index<2> i2(0,2);
index<3> i3(2,0,1);

extent<3> e3(3,2,2);
extent<2> e2(3,4);
extent<1> e1(6);

grid<3> g(index<3>(47,58,12), extent<3>(3,2,2));

grid<3> g3(e3);
grid<2> g2(e2);
grid<1> g1(e1);

// cubic indices from (-1,-1,-1) through (98,98,98)
grid<3> g(index<3>(-1,-1,-1), extent<3>(100,100,100));
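A grid pairs an origin with an extent, so the valid indices in each dimension run from origin through origin + extent - 1. The sketch below models that arithmetic in standard C++ for the cubic example. `Grid3`, `last_index` and `contains` are hypothetical names used only for illustration; they are not part of the library.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for grid<3>: an origin plus an extent, both in (z,y,x) order.
struct Grid3 {
    std::array<int, 3> origin;
    std::array<int, 3> extent;
};

// Largest valid index in each dimension: origin + extent - 1.
inline std::array<int, 3> last_index(const Grid3& g) {
    std::array<int, 3> last{};
    for (int d = 0; d < 3; ++d) {
        assert(g.extent[d] > 0);
        last[d] = g.origin[d] + g.extent[d] - 1;
    }
    return last;
}

// True when idx lies inside the grid in every dimension.
inline bool contains(const Grid3& g, const std::array<int, 3>& idx) {
    for (int d = 0; d < 3; ++d) {
        if (idx[d] < g.origin[d] || idx[d] >= g.origin[d] + g.extent[d]) {
            return false;
        }
    }
    return true;
}
```

With origin (-1,-1,-1) and extent (100,100,100), `last_index` yields (98,98,98), which matches the comment on the slide.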

array<T,N>

Multi-dimensional array of rank N with element type T

Storage lives on accelerator

vector<int> v(96);
extent<2> e(8,12);            // e.y == 8; e.x == 12;
array<int,2> a(e, v.begin(), v.end());

// in my lambda
index<2> i(3,9);              // i.y == 3; i.x == 9;
int o = a[i];                 // = a(i[0], i[1]); // = a(i.y, i.x)
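To see which element of the source vector `a[i]` refers to, it helps to write out the offset arithmetic. The sketch below assumes a row-major layout in which the last dimension (x) is contiguous; `linear_offset` is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <cstddef>

// Row-major offset of element (y, x) in a 2-D array of rows x cols elements,
// assuming the last dimension (x) is the contiguous one.
inline std::size_t linear_offset(int rows, int cols, int y, int x) {
    assert(y >= 0 && y < rows && x >= 0 && x < cols);
    return static_cast<std::size_t>(y) * static_cast<std::size_t>(cols)
         + static_cast<std::size_t>(x);
}
```

Under that assumption, index (3,9) in an extent of (8,12) maps to offset 3 * 12 + 9 = 45 of the 96-element vector, and the last element (7,11) maps to offset 95.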

View on existing data on the CPU or GPU

Usage considerations

array_view<T,N>
array_view<const T,N>
array_view<writeonly<T>,N>

vector<int> v(10);
extent<2> e(2,5);
array_view<int,2> a(e, v);

// the two lines above can also be written as
// array_view<int,2> a(2,5,v);
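The idea of a view, as opposed to a container, can be sketched in standard C++ as a non-owning wrapper that adds a 2-D shape to existing storage. `View2D` below is a hypothetical illustration only: it ignores copies to and from an accelerator and assumes a row-major layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning 2-D view over a flat vector (not the array_view class itself).
struct View2D {
    int* data;
    int rows;
    int cols;

    View2D(int r, int c, std::vector<int>& v) : data(v.data()), rows(r), cols(c) {
        // The shape must cover the wrapped storage exactly.
        assert(static_cast<std::size_t>(r) * static_cast<std::size_t>(c) == v.size());
    }

    // Element access in (y, x) order; writes go straight to the wrapped vector.
    int& operator()(int y, int x) const {
        assert(y >= 0 && y < rows && x >= 0 && x < cols);
        return data[y * cols + x];
    }
};
```

Because the view owns no storage, writing `a(1, 3) = 42` through a `View2D a(2, 5, v)` changes `v[8]` directly.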

array<T,N> | array_view<T,N>
Rank at compile time | Rank at compile time
Extent at runtime | Extent at runtime
Rectangular | Rectangular
Dense | Dense in one dimension
Origin always at zero | Origin can be non-zero
Container for data | Wrapper for data
Explicit copy | Future proof design
Capture by reference [&] | Capture by value [=]

parallel_for_each(
    grid<N>,
    [ ](index<N>) restrict(direct3d)
    {
        // kernel code
    }
);

Executes the lambda for each point in the grid

As-if synchronous in terms of visible side-effects
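The "once per point in the grid" contract can be emulated serially in standard C++, which is handy for reasoning about a kernel before running it on an accelerator. In the sketch below, `for_each_index` is a hypothetical stand-in for a 1-D parallel_for_each; it runs on the CPU, one index after another, and says nothing about how the real runtime schedules threads.

```cpp
#include <cassert>

// Hypothetical serial stand-in for parallel_for_each over a 1-D grid:
// calls `kernel` once for every index in [0, n).
template <typename Kernel>
void for_each_index(int n, Kernel kernel) {
    assert(n >= 0);
    for (int idx = 0; idx < n; ++idx) {
        kernel(idx);
    }
}

// AddArrays written against the stand-in: the lambda body is the "kernel",
// and it touches only the element selected by its own index.
inline void AddArrays(int n, const int* pA, const int* pB, int* pC) {
    for_each_index(n, [=](int idx) { pC[idx] = pA[idx] + pB[idx]; });
}
```

Because each invocation writes only to its own element, the result does not depend on the order in which the indices are visited, which is what makes the loop body safe to run in parallel.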

Applies to functions (including lambdas)

Why restrict

Target-specific language restrictions

Optimizations or special code-gen behavior

Functions can have multiple restrictions

In 1st release we are implementing “direct3d” and “cpu”

“cpu” – the implicit default

Can only call other restrict(direct3d) functions

All functions must be inlinable

Only direct3d-supported types

int, unsigned int, float, double

structs & arrays of these types

Pointers and References

Lambdas cannot capture by reference, nor capture pointers

References and single-indirection pointers supported only as local variables and function arguments

No

recursion
'volatile'
virtual functions
pointers to functions
pointers to member functions
pointers in structs
pointers to pointers
goto or labeled statements
throw, try, catch
globals or statics
dynamic_cast or typeid
asm declarations
varargs
unsupported types, e.g. bool, char, short, long double
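As an illustration of working within these rules, a recursive helper can usually be rewritten as a loop. The sketch below is plain standard C++ and leaves out the restrict(direct3d) qualifier so that it compiles with any compiler; `factorial_iterative` is a hypothetical example, and the assert is a host-side sanity check only.

```cpp
#include <cassert>

// Recursive form, which the "no recursion" rule would reject:
// int factorial(int n) { return n <= 1 ? 1 : n * factorial(n - 1); }

// Iterative form: no recursion, no globals or statics, no pointers,
// and only int, which is on the list of supported types.
inline int factorial_iterative(int n) {
    assert(n >= 0 && n <= 12);  // 12! is the largest factorial that fits in a 32-bit int
    int result = 1;
    for (int i = 2; i <= n; ++i) {
        result *= i;
    }
    return result;
}
```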

void MatrixMultiply(vector<float>& vC, const vector<float>& vA, const vector<float>& vB, int M, int N, int W)
{
    for (int y = 0; y < M; y++)
    {
        for (int x = 0; x < N; x++)
        {
            float sum = 0;
            for (int i = 0; i < W; i++)
                sum += vA[y * W + i] * vB[i * N + x];
            vC[y * N + x] = sum;
        }
    }
}

void MatrixMultiply(vector<float>& vC, const vector<float>& vA, const vector<float>& vB, int M, int N, int W)
{
    array_view<const float,2> a(M,W,vA), b(W,N,vB);
    array_view<writeonly<float>,2> c(M,N,vC);

    parallel_for_each(c.grid,
        [=](index<2> idx) restrict(direct3d)
        {
            float sum = 0;
            for (int i = 0; i < a.extent.x; i++)
                sum += a(idx.y, i) * b(i, idx.x);
            c[idx] = sum;
        }
    );
}
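When porting a loop nest like this, a serial reference on a small input is a convenient way to check the index arithmetic. The sketch below restates the serial version as self-contained standard C++; the name `MatrixMultiplySerial` and the size checks are additions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial reference: C (M x N) = A (M x W) * B (W x N), all row-major in flat vectors.
inline void MatrixMultiplySerial(std::vector<float>& vC,
                                 const std::vector<float>& vA,
                                 const std::vector<float>& vB,
                                 int M, int N, int W) {
    assert(vA.size() == static_cast<std::size_t>(M) * static_cast<std::size_t>(W));
    assert(vB.size() == static_cast<std::size_t>(W) * static_cast<std::size_t>(N));
    assert(vC.size() == static_cast<std::size_t>(M) * static_cast<std::size_t>(N));
    for (int y = 0; y < M; y++) {
        for (int x = 0; x < N; x++) {
            float sum = 0;
            for (int i = 0; i < W; i++) {
                sum += vA[y * W + i] * vB[i * N + x];  // row y of A times column x of B
            }
            vC[y * N + x] = sum;
        }
    }
}
```

For A = [[1,2,3],[4,5,6]] and B = [[7,8],[9,10],[11,12]] (M = 2, N = 2, W = 3), the result is [[58,64],[139,154]].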

restrict(direct3d, cpu)

parallel_for_each

class array<T,N>

class array_view<T,N>

class index<N>

class extent<N>

class grid<N>

class accelerator

class accelerator_view

Schedule threads in a tiled manner

Avoid thread index remapping

Gain ability to use tile static memory

parallel_for_each overload for tiles accepts

tiled_grid<X> or tiled_grid<Y,X> or tiled_grid<Z,Y,X>

a lambda which accepts

tiled_index<X> or tiled_index<Y,X> or tiled_index<Z,Y,X>

[Figure: two 8 x 6 grids of threads, rows 0-7 and columns 0-5]

Given

array_view<int,2> data(8, 6, pMyData);
parallel_for_each(
    data.grid.tile<2,2>(),
    [=] (tiled_index<2,2> t_idx)… { … });

When the lambda is executed by thread T at row 6, column 3 of the 8 x 6 grid:

t_idx.global = index<2> (6,3)
t_idx.local = index<2> (0,1)
t_idx.tile = index<2> (3,1)
t_idx.tile_origin = index<2> (6,2)
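The four values follow from integer division and remainder of the global index by the tile dimensions. The sketch below reproduces that arithmetic in standard C++, assuming the grid origin is at zero; `Index2`, `TiledIndex2` and `decompose` are hypothetical names, not library types.

```cpp
#include <cassert>

// Hypothetical stand-in for index<2>, in (y,x) order.
struct Index2 {
    int y;
    int x;
};

// Hypothetical stand-in for the fields of a 2-D tiled index.
struct TiledIndex2 {
    Index2 global;
    Index2 local;
    Index2 tile;
    Index2 tile_origin;
};

// Split a global index into tile coordinates for tiles of TY x TX threads.
template <int TY, int TX>
TiledIndex2 decompose(Index2 global) {
    static_assert(TY > 0 && TX > 0, "tile dimensions must be positive");
    assert(global.y >= 0 && global.x >= 0);
    TiledIndex2 t;
    t.global = global;
    t.local = {global.y % TY, global.x % TX};         // position inside the tile
    t.tile = {global.y / TY, global.x / TX};          // which tile
    t.tile_origin = {t.tile.y * TY, t.tile.x * TX};   // global index of the tile's first thread
    return t;
}
```

For global index (6,3) and 2 x 2 tiles this gives local (0,1), tile (3,1) and tile_origin (6,2), the same values as listed for thread T.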

Within the kernel we can use

tile_static storage class

only applicable in restrict(direct3d)

indicates that the local variable is allocated in shared memory, i.e. shared by all threads in a tile of threads

class tile_barrier

synchronize all threads within a tile

e.g. myTiledIndex.barrier.wait();

void MatrixMultiplySimple(float* A, float* B, float* C, int M, int N, int W)
{
    extent<2> eA(M, N), eB(N, W), eC(M, W);
    grid<2> g(eC);
    array<float,2> mA(eA, A), mB(eB, B), mC(eC);

    parallel_for_each(g,
        [=, &mA, &mB, &mC] (index<2> idx) restrict(direct3d)
        {
            float temp = 0;
            for (int k = 0; k < N; k++)
                temp += mA(idx.y, k) * mB(k, idx.x);
            mC(idx) = temp;
        }
    );
    copy(mC, C);
}

void MatrixMultiplyTiled(float* A, float* B, float* C, int M, int N, int W)
{
    static const int TS = 16;
    extent<2> eA(M, N), eB(N, W), eC(M, W);
    grid<2> g(eC);
    array<float,2> mA(eA, A), mB(eB, B), mC(eC);

    parallel_for_each(g.tile< TS, TS >(),
        [=, &mA, &mB, &mC] (tiled_index< TS, TS> t_idx) restrict(direct3d)
        {
            float temp = 0;
            index<2> locIdx = t_idx.local;
            index<2> globIdx = t_idx.global;

            for (int i = 0; i < N; i += TS)
            {
                tile_static float locB[TS][TS], locA[TS][TS];
                locA[locIdx.y][locIdx.x] = mA(globIdx.y, i + locIdx.x);
                locB[locIdx.y][locIdx.x] = mB(i + locIdx.y, globIdx.x);
                t_idx.barrier.wait();

                for (int k = 0; k < TS; k++)
                    temp += locA[locIdx.y][k] * locB[k][locIdx.x];
                t_idx.barrier.wait();
            }
            mC[t_idx] = temp;
        }
    );
    copy(mC, C);
}
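The two barrier waits split each iteration into a load phase and a compute phase. A serial emulation in standard C++ makes that structure explicit: every "thread" of a tile finishes loading before any of them starts accumulating. The sketch below is a CPU-only illustration under the same assumption the tiled kernel makes implicitly (it has no bounds checks), namely that M, N and W are multiples of the tile size. `MatrixMultiplyTiledCpu` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial emulation of the tiled kernel: C (M x W) = A (M x N) * B (N x W), row-major.
template <int TS>
void MatrixMultiplyTiledCpu(const std::vector<float>& A, const std::vector<float>& B,
                            std::vector<float>& C, int M, int N, int W) {
    assert(M % TS == 0 && N % TS == 0 && W % TS == 0);
    assert(A.size() == static_cast<std::size_t>(M) * static_cast<std::size_t>(N));
    assert(B.size() == static_cast<std::size_t>(N) * static_cast<std::size_t>(W));
    assert(C.size() == static_cast<std::size_t>(M) * static_cast<std::size_t>(W));

    for (int ty = 0; ty < M; ty += TS) {          // tile origin, row
        for (int tx = 0; tx < W; tx += TS) {      // tile origin, column
            float temp[TS][TS] = {};              // one accumulator per "thread" in the tile
            for (int i = 0; i < N; i += TS) {
                float locA[TS][TS];               // plays the role of tile_static storage
                float locB[TS][TS];
                // Phase 1: every thread loads one element of each input tile.
                for (int ly = 0; ly < TS; ++ly) {
                    for (int lx = 0; lx < TS; ++lx) {
                        locA[ly][lx] = A[(ty + ly) * N + (i + lx)];
                        locB[ly][lx] = B[(i + ly) * W + (tx + lx)];
                    }
                }
                // Finishing the loop above corresponds to the first barrier wait.
                // Phase 2: every thread accumulates its partial dot product.
                for (int ly = 0; ly < TS; ++ly) {
                    for (int lx = 0; lx < TS; ++lx) {
                        for (int k = 0; k < TS; ++k) {
                            temp[ly][lx] += locA[ly][k] * locB[k][lx];
                        }
                    }
                }
                // Finishing phase 2 corresponds to the second barrier wait:
                // only now may the tiles be overwritten by the next iteration.
            }
            for (int ly = 0; ly < TS; ++ly) {
                for (int lx = 0; lx < TS; ++lx) {
                    C[(ty + ly) * W + (tx + lx)] = temp[ly][lx];
                }
            }
        }
    }
}
```

Each element of A and B is read from the flat vectors once per tile rather than once per output element, which is the data reuse that tile_static memory is meant to provide.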

restrict(direct3d, cpu)

parallel_for_each

class array<T,N>

class array_view<T,N>

class index<N>

class extent<N>

class grid<N>

class accelerator

class accelerator_view

class tiled_grid<Z,Y,X>

class tiled_index<Z,Y,X>

class tile_barrier

tile_static storage class

Context

Code

Closing thoughts

Organize

Edit

Design

Build

Browse

Debug

Profile

We are looking for developers wanting to use C++ AMP to participate in a study on the API and tools

For 45 minutes of your time you get

To peek at what we are thinking

To influence our product direction and our development team

A Microsoft product as a “thank you”

Sign up at the Microsoft Lounge Information Desk

Democratization of parallel hardware programmability

Performance for the mainstream

High-level abstractions in C++ (not C)

State-of-the-art Visual Studio IDE

Hardware abstraction platform

Intent is to make C++ AMP an open specification

[email protected]

www.danielmoth.com/Blog/

AMD FUSION DEVELOPER SUMMIT | June 2011

Disclaimer & Attribution The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes. NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners. The contents of this presentation were provided by individual(s) and/or company listed on the title page. The information and opinions presented in this presentation may not represent AMD’s positions, strategies or opinions. Unless explicitly stated, AMD is not responsible for the content herein and no endorsements are implied.