Page 1

Many-Core Programming with GRAMPS

Jeremy Sugerman
Stanford University

September 12, 2008

Page 2

Background, Outline

Stanford Graphics / Architecture Research
– Collaborators: Kayvon Fatahalian, Solomon Boulos, Kurt Akeley, Pat Hanrahan
– To appear in ACM Transactions on Graphics

CPU, GPU trends… and collision?

Two research areas:
– HW/SW Interface, Programming Model
– Future Graphics API

Page 3

Problem Statement

Drive efficient development and execution in many-/multi-core systems.
Support homogeneous, heterogeneous cores.
Inform future hardware.

Status Quo:
– GPU Pipeline (Good for GL, otherwise hard)
– CPU (No guidance, fast is hard)

Page 4

GRAMPS

Software defined graphs
Producer-consumer, data-parallelism
Initial focus on rendering

[Figure: two example execution graphs. Rasterization Pipeline: Rasterize → Shade → FB Blend, linked by Input Fragment and Output Fragment Queues. Ray Tracing Graph: Camera → Intersect → Shade → FB Blend, linked by Ray, Ray Hit, and Fragment Queues. Legend: Thread Stage, Shader Stage, Fixed-func Stage; Queue, Stage Output.]
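A graph like the ray tracing example above can be thought of as plain data: a set of named stages plus the queues that connect them. The sketch below is purely illustrative; the class and method names (`Graph`, `add_stage`, `connect`) and the stage-kind assignments are hypothetical, not the actual GRAMPS interface.

```python
# Minimal sketch of declaring a GRAMPS-style execution graph as data.
# All names here are hypothetical; stage kinds are assigned for illustration.

class Graph:
    def __init__(self):
        self.stages = {}   # name -> kind ("thread", "shader", "fixed")
        self.queues = []   # (producer, queue name, consumer) edges

    def add_stage(self, name, kind):
        self.stages[name] = kind

    def connect(self, producer, queue_name, consumer):
        self.queues.append((producer, queue_name, consumer))

# The ray tracing graph from the figure, expressed as declarations:
g = Graph()
g.add_stage("Camera", "thread")
g.add_stage("Intersect", "thread")
g.add_stage("Shade", "shader")
g.add_stage("FB Blend", "fixed")
g.connect("Camera", "Ray Queue", "Intersect")
g.connect("Intersect", "Ray Hit Queue", "Shade")
g.connect("Shade", "Fragment Queue", "FB Blend")
```

The point of the data-driven form is that the same application graph can be handed to different schedulers or hardware configurations without rewriting the stages themselves.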

Page 5

As a Graphics Evolution

Not (too) radical for ‘graphics’
Like fixed → programmable shading
– Pipeline undergoing massive shake-up
– Diversity of new parameters and use cases

Bigger picture than ‘graphics’
– Rendering is more than GL/D3D
– Compute is more than rendering
– Some ‘GPUs’ are losing their innate pipeline

Page 6

As a Compute Evolution (1)

Sounds like streaming: execution graphs, kernels, data-parallelism

Streaming: “squeeze out every FLOP”
– Goals: bulk transfer, arithmetic intensity
– Intensive static analysis, custom chips (mostly)
– Bounded space, data access, execution time

Page 7

As a Compute Evolution (2)

GRAMPS: “interesting apps are irregular”
– Goals: dynamic, data-dependent code
– Aggregate work at run-time
– Heterogeneous commodity platforms

Naturally allows streaming when applicable

Page 8

GRAMPS’ Role

A ‘graphics pipeline’ is now an app!
GRAMPS models parallel state machines.

Compared to status quo:
– More flexible than a GPU pipeline
– More guidance than bare metal
– Portability in between
– Not domain specific

Page 9

GRAMPS Interfaces

Host/Setup: Create execution graph
Thread: Stateful, singleton
Shader: Data-parallel, auto-instanced
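One way to read the thread/shader distinction: a thread stage is a single stateful instance that explicitly loops over its input, while a shader stage is a stateless per-element kernel the runtime auto-instances across input elements. A toy sketch, assuming hypothetical function names (this is not GRAMPS's actual API):

```python
from collections import deque

# Toy thread stage: one stateful instance, explicit loop over its input.
def run_thread_stage(in_queue, out_queue):
    count = 0                      # stage-local state persists across elements
    while in_queue:
        item = in_queue.popleft()
        count += 1
        out_queue.append((count, item))

# Toy shader stage: a stateless kernel applied independently per element.
def shader_kernel(item):
    return item * 2

def run_shader_stage(in_queue, out_queue, kernel):
    # Stand-in for auto-instancing: in a real runtime these kernel
    # invocations could run in parallel, since they share no state.
    while in_queue:
        out_queue.append(kernel(in_queue.popleft()))

rays = deque([1, 2, 3])
hits = deque()
run_shader_stage(rays, hits, shader_kernel)  # hits: deque([2, 4, 6])
```

The statelessness of the shader kernel is what makes auto-instancing safe: the runtime can spawn as many instances as there are elements (or cores) without synchronization.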

Page 10

GRAMPS Entities (1)

Accessed via windows

Queues: connect stages, dynamically sized
– Ordered or unordered
– Fixed max capacity or spill to memory

Buffers: random access, pre-allocated
– RO, RW Private, RW Shared (not supported)
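The "fixed max capacity or spill to memory" choice can be illustrated with a small wrapper: a fixed-capacity queue rejects producers when full (forcing them to wait), while a spilling queue overflows into backing storage. A hypothetical sketch, not the real GRAMPS queue implementation:

```python
from collections import deque

class SpillQueue:
    """Toy bounded queue that optionally spills excess items to backing storage."""
    def __init__(self, capacity, spill=True):
        self.capacity = capacity
        self.spill = spill
        self.core = deque()        # fast in-core window
        self.backing = []          # stand-in for spill-to-memory storage

    def push(self, item):
        if len(self.core) < self.capacity:
            self.core.append(item)
            return True
        if self.spill:
            self.backing.append(item)
            return True
        return False               # fixed capacity: producer must wait

    def pop(self):
        item = self.core.popleft()
        if self.backing:           # refill the in-core window from spill
            self.core.append(self.backing.pop(0))
        return item

q = SpillQueue(capacity=2)
for i in range(5):
    q.push(i)
# In-core window holds [0, 1]; [2, 3, 4] have spilled to backing storage.
```

Either policy bounds the fast in-core footprint; they differ in whether back-pressure lands on the producer or on memory.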

Page 11

GRAMPS Entities (2)

Queue Sets: independent sub-queues
– Instanced parallelism plus mutual exclusion
– Hard to fake with just multiple queues
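A queue set behaves like one logical queue split into independent sub-queues: the runtime can run one consumer instance per sub-queue in parallel, and each instance has exclusive access to its sub-queue's elements (e.g. all fragments for one screen tile). A toy sketch, with hashing by key standing in for however the runtime actually routes elements:

```python
from collections import deque

class QueueSet:
    """Toy queue set: one logical queue split into N independent sub-queues.

    Elements with the same key (e.g. screen tile) always land in the same
    sub-queue, so running one consumer instance per sub-queue gives both
    instanced parallelism and mutual exclusion per key.
    """
    def __init__(self, n_subqueues):
        self.subqueues = [deque() for _ in range(n_subqueues)]

    def push(self, key, item):
        self.subqueues[hash(key) % len(self.subqueues)].append(item)

    def drain(self, index):
        # A consumer instance owns sub-queue `index` exclusively.
        out = list(self.subqueues[index])
        self.subqueues[index].clear()
        return out

qs = QueueSet(4)
for tile, frag in [(0, "a"), (1, "b"), (0, "c")]:
    qs.push(tile, frag)
# Fragments for tile 0 ("a" and "c") share a sub-queue and stay ordered.
```

This is what is hard to fake with plain multiple queues: the producer writes to one logical queue, yet consumers still get per-key exclusivity without locks.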

Page 12

What We’ve Built (System)

Page 13

GRAMPS Scheduler

Tiered scheduler:
– ‘Fat’ cores: per-thread, per-core
– ‘Micro’ cores: shared hw scheduler
– Top level: tier N
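A rough sketch of the tiered idea: each fat core runs its own local work queue, micro cores share a single hardware-like scheduler queue, and a top tier holds the backlog and pushes work down. Purely illustrative (names and policies are invented, and the real scheduler's heuristics are not shown):

```python
from collections import deque

class TieredScheduler:
    """Toy two-tier scheduler: per-core queues for 'fat' cores,
    one shared queue for 'micro' cores, a top-tier backlog above both."""
    def __init__(self, n_fat):
        self.fat_queues = [deque() for _ in range(n_fat)]  # per-core tier
        self.micro_queue = deque()                          # shared hw tier
        self.top = deque()                                  # top-level backlog

    def submit(self, task, wants_fat=False):
        self.top.append((task, wants_fat))

    def dispatch(self):
        # Top tier: push the backlog down into the lower tiers.
        while self.top:
            task, wants_fat = self.top.popleft()
            if wants_fat:
                # Route to the least-loaded fat core's local queue.
                min(self.fat_queues, key=len).append(task)
            else:
                self.micro_queue.append(task)

sched = TieredScheduler(n_fat=2)
sched.submit("build-BVH", wants_fat=True)   # stateful thread-stage work
sched.submit("shade-frag-0")                # data-parallel shader work
sched.submit("shade-frag-1")
sched.dispatch()
```

The split mirrors the slide: fat cores make their own per-core decisions, while the cheap micro cores defer to one shared scheduler.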

Page 14

What We’ve Built (Apps)

Direct3D Pipeline (with Ray-tracing Extension)
[Figure: the Direct3D pipeline as a GRAMPS graph: IA 1…IA N → Input Vertex Queue 1…N → VS 1…VS N → Primitive Queue 1…N → RO → Rast → Fragment Queue → PS → Sample Queue Set → OM. The Ray-tracing Extension adds Trace and PS2 stages connected by Ray and Ray Hit Queues.]

Ray-tracing Graph
[Figure: Tiler → Tile Queue → Sampler → Sample Queue → Camera → Ray Queue → Intersect → Ray Hit Queue → Shade → Fragment Queue → FB Blend. Legend: Thread Stage, Shader Stage, Fixed-func; Queue, Stage Output, Push Output.]

Page 15

Initial Results

Queues are small, utilization is good

Page 16

GRAMPS Visualization

Page 17

GRAMPS Visualization

Page 18

GRAMPS Portability

Portability really means performance.

Less portable than GL/D3D
– GRAMPS graph is (more) hardware sensitive

More portable than bare metal
– Enforces modularity
– Best case: just works
– Worst case: saves boilerplate

Page 19

High-level Challenges

Is GRAMPS a suitable GPU evolution?
– Enable pipelines competitive with bare metal?
– Enable innovation: advanced / alternative methods?

Is GRAMPS a good parallel compute model?
– Map well to hardware, hardware trends?
– Support important apps?
– Concepts influence developers?

Page 20

What’s Next: Implementation

Better scheduling
– Less bursty, better slot filling
– Dynamic priorities
– Handle graphs with loops better

More detailed costs
– Bill for scheduling decisions
– Bill for (internal) synchronization

More statistics

Page 21

What’s Next: Programming Model

Yes: Graph modification (state change)
Probably: Data sharing / ref-counting
Maybe: Blocking inter-stage calls (join)
Maybe: Intra/inter-stage synchronization primitives

Page 22

What’s Next: Possible Workloads

REYES, hybrid graphics pipelines
Image / video processing
Game physics
– Collision detection or particles
Physics and scientific simulation
AI, finance, sort, search or database query, …
Heavy dynamic data manipulation
– k-D tree / octree / BVH build
– Lazy/adaptive/procedural tree or geometry