Many-Core Programming with GRAMPS Jeremy Sugerman Kayvon Fatahalian Solomon Boulos Kurt Akeley Pat Hanrahan

Dec 21, 2015

Page 1: Many-Core Programming with GRAMPS

Jeremy Sugerman, Kayvon Fatahalian, Solomon Boulos, Kurt Akeley, Pat Hanrahan

Page 2: Problem Statement

Facilitate efficient development and execution in many-/multi-core commodity systems.
– Homogeneous or heterogeneous cores.

Status Quo:
– GPUs: Easy to write GL/D3D and run it fast, hard to express anything else
– CPUs: Possible (not easy) to write anything, possible (hard) to run it fast

Page 3: GRAMPS Background

Resembles a GPU with software constructed pipeline.
Not (too) radical even in a pure graphics context
– Similar story saw fixed -> programmable shading
– Now the pipeline topology is under analogous pressures: proliferation of stages and options
And graphics is more than a GL/D3D pipeline…
And throughput / many-core is more than graphics…

Page 4: GRAMPS Programming Model

Software constructs the pipeline (actually graph)
Exposes threads, shaders, fixed function stages
– Coprocessors exposed via ISA
Exposes FIFOs / Queues connecting stages
– Also enables software push / re-sorting
Exposes Buffers for memory access
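The model above can be illustrated with a minimal sketch. It is not the GRAMPS API: the names `Queue`, `Stage`, and `run`, the round-robin loop, and the use of a plain list as a stand-in for a buffer are all assumptions made for illustration. It only shows the idea of software wiring stages together through queues.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Queue:
    """FIFO connecting a producer stage to a consumer stage (hypothetical type)."""
    name: str
    items: deque = field(default_factory=deque)


@dataclass
class Stage:
    """One graph node; `kind` is 'thread', 'shader', or 'fixed-function'."""
    name: str
    kind: str
    work: Callable[[object], list]   # one input item -> list of output items
    source: Optional[Queue] = None   # input queue
    sink: Optional[Queue] = None     # output queue; None means write to the buffer


def run(stages, buffer):
    """Naive round-robin: keep running stages until no input queue has work."""
    progress = True
    while progress:
        progress = False
        for stage in stages:
            if stage.source is not None and stage.source.items:
                item = stage.source.items.popleft()
                for out in stage.work(item):
                    if stage.sink is not None:
                        stage.sink.items.append(out)
                    else:
                        buffer.append(out)
                progress = True


# Software constructs the graph: two stages joined by one queue.
fragments_in = Queue("input fragments")
fragments_out = Queue("shaded fragments")
shade = Stage("Shade", "shader", lambda f: [f * 2], fragments_in, fragments_out)
blend = Stage("Blend", "fixed-function", lambda f: [f], fragments_out, None)

fragments_in.items.extend([1, 2, 3])
framebuffer = []                     # assumption: a list stands in for a buffer
run([shade, blend], framebuffer)
print(framebuffer)
```

A real scheduler would choose stages by core type and queue occupancy rather than visiting them in a fixed order.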

Page 5: GRAMPS’ Place

Compared to GPU Pipeline:
– More things possible (and medium easy), still (mostly) runs fast, less hardware independent

Compared to CPU:
– Easier to write things, easier to run them well, some loss of expressivity and flexibility

Still a role for a ‘graphics pipeline’. It’s an app!
GRAMPS is a layer, model for state machines.

Page 6: GRAMPS and Streaming

From some angles, GRAMPS sounds a lot like Stream Processing / Computing
Distinctions are most visible in the target traits.
– Streaming expects predictable data creation, flow, and consumption. Intensive offline / compile-time optimization and pre-scheduling.
– GRAMPS expects dynamic data-dependent execution, (and thus) run-time scheduling
Also, GRAMPS assumes commodity and heterogeneity.
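To make "dynamic data-dependent execution" concrete, here is a small sketch in which the number of outputs a stage produces depends on the value of each input, so the total work and peak queue depth cannot be fixed ahead of time. The `trace` rule (a ray of depth d spawns d rays of depth d - 1) is invented purely for illustration and is not taken from the slides.

```python
from collections import deque


def trace(depth):
    """Hypothetical stage: a ray of depth d spawns d secondary rays of depth d - 1."""
    return [depth - 1] * depth


def run_dynamic(initial_rays):
    """Drain the queue; work and peak depth are only discovered while running."""
    ray_queue = deque(initial_rays)
    processed = 0
    peak = len(ray_queue)
    while ray_queue:                      # run-time decision: is there work left?
        ray = ray_queue.popleft()
        processed += 1
        ray_queue.extend(trace(ray))      # output count depends on the data
        peak = max(peak, len(ray_queue))
    return processed, peak


quiet = run_dynamic([0, 0, 0])   # three rays that spawn nothing
busy = run_dynamic([2, 0, 1])    # same input count, more work
print(quiet, busy)               # (3, 3) (8, 4)
```

Both calls start with three inputs, yet they differ in items processed and in peak queue depth, which is the situation a pre-computed static schedule handles poorly.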

Page 7: GRAMPS Examples

[Figure: Rasterization Pipeline. Stages: Rast, Shade, FB Blend. Queues: Input Fragment Queue, Output Fragment Queue.]

[Figure: Ray Tracing Pipeline. Stages: Camera, Intersect, Shade, FB Blend. Queues: Ray Queue, Sample Queue, Pixel Queue.]
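As a rough illustration, the rasterization example can be written down as data. The wiring below (Rast, then Input Fragment Queue, then Shade, then Output Fragment Queue, then FB Blend) is an assumed reading of the figure, which this transcript does not preserve. The helper `stage_order` is hypothetical, uses the standard-library `graphlib` module (Python 3.9+), and only handles acyclic graphs.

```python
from graphlib import TopologicalSorter

# Assumed wiring: (producer stage, queue, consumer stage).
RASTERIZATION = [
    ("Rast", "Input Fragment Queue", "Shade"),
    ("Shade", "Output Fragment Queue", "FB Blend"),
]


def stage_order(edges):
    """Return stages so that every producer comes before its consumers."""
    deps = {}
    for producer, _queue, consumer in edges:
        deps.setdefault(producer, set())
        deps.setdefault(consumer, set()).add(producer)
    return list(TopologicalSorter(deps).static_order())


order = stage_order(RASTERIZATION)
print(order)
```

A graph with a feedback edge would make `static_order` raise `graphlib.CycleError`, so a run-time scheduler could not rely on a fixed order like this one.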

Page 8: GRAMPS Overview

Concepts:
– Graphs
– Stages: thread, shader, fixed-function
– Queues: ordered, unordered, sets (exclusion)
– Buffers

Components:
– APIs: setup/driver, thread, shader
– Scheduler: fat core, shader core, top-level
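The slide does not define the queue types, so the following is one plausible reading, not the GRAMPS definition: an unordered queue hands items on in completion order, while an ordered queue holds back early arrivals until every earlier item has been delivered. The class names and the sequence-number scheme are assumptions.

```python
import heapq


class UnorderedQueue:
    """Delivers items in whatever order producers finish them."""

    def __init__(self):
        self._items = []

    def push(self, seq, item):
        self._items.append(item)          # sequence number is ignored

    def drain(self):
        out, self._items = self._items, []
        return out


class OrderedQueue:
    """Delivers items in sequence order, holding back early arrivals."""

    def __init__(self):
        self._heap = []
        self._next_seq = 0

    def push(self, seq, item):
        heapq.heappush(self._heap, (seq, item))

    def drain(self):
        out = []
        while self._heap and self._heap[0][0] == self._next_seq:
            out.append(heapq.heappop(self._heap)[1])
            self._next_seq += 1
        return out


unordered, ordered = UnorderedQueue(), OrderedQueue()
for q in (unordered, ordered):
    q.push(2, "c")                        # item 2 finishes first
    q.push(0, "a")
first = (unordered.drain(), ordered.drain())
for q in (unordered, ordered):
    q.push(1, "b")                        # the straggler arrives
second = (unordered.drain(), ordered.drain())
print(first, second)
```

The ordered variant costs extra buffering while it waits for stragglers, which is one reason a model might expose both kinds.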

Page 9: What We’ve Built

Three rendering pipelines:
– Direct3D, Packet Tracer, D3D + Push (Hybrid)

Simulator and Runtime for two machines:
– GPU-like: Many threads per core, hw sched
– CPU-like: Few threads per core, sw sched

Page 10: Rendering Pipelines

[Figure: Direct3D Pipeline (with Ray-tracing Extension). Stages: IA 1 to IA N, VS 1 to VS N, RO, Rast, PS, OM; Trace and PS2 in the Ray-tracing Extension. Queues: Input Vertex Queue 1 to N, Primitive Queue 1 to N, Primitive Queue, Fragment Queue, Sample Queue Set, Ray Queue, Ray Hit Queue.]

[Figure: Ray-tracing Pipeline. Stages: Tiler, Sampler, Camera, Intersect, Shade, FB Blend. Queues: Tile Queue, Sample Queue, Ray Queue, Ray Hit Queue, Fragment Queue.]

[Legend: Thread Stage, Shader Stage, Fixed-func Stage, Queue, Stage Output, Output via Push.]

Page 11: Initial Results

Measured thread occupancy, worst case total queue memory.
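The slide gives no numbers or definitions, so the sketch below only illustrates how two such metrics could be computed from a simulator trace. Both definitions are assumptions: "worst case total queue memory" is taken as the high-water mark of bytes held across all queues, and "thread occupancy" as the fraction of thread slots that were busy. The event trace and slot counts are made-up inputs.

```python
class QueueMemoryMeter:
    """Tracks the high-water mark of bytes held across all queues."""

    def __init__(self):
        self.current = 0
        self.peak = 0

    def on_push(self, nbytes):
        self.current += nbytes
        self.peak = max(self.peak, self.current)

    def on_pop(self, nbytes):
        self.current -= nbytes


def occupancy(busy_per_cycle, slots):
    """Fraction of thread slots that were busy, averaged over the sampled cycles."""
    return sum(busy_per_cycle) / (slots * len(busy_per_cycle))


meter = QueueMemoryMeter()
# Hypothetical trace: +n means n bytes enqueued, -n means n bytes dequeued.
for event in [+64, +64, -64, +128, -64, -128]:
    if event > 0:
        meter.on_push(event)
    else:
        meter.on_pop(-event)

print(meter.peak)                   # 192
print(occupancy([4, 3, 1, 4], 4))   # 0.75
```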

Page 12: GRAMPS Vis

Page 13: High-level Challenges

Is GRAMPS a suitable GPU evolution?
– Enable pipeline competitive with bare metal?
– Enable innovation: advanced / alternative methods?
– Is there a ‘best’ graphics pipeline on top?

Is GRAMPS a good parallel compute model?
– Map well to hardware, hardware trends?
– Support important apps?
– Concepts influence developers?

Page 14: What’s Next?

Low level implementation: scheduling, more accurate simulation.
More apps: REYES, physics, likely more.
Audit and refine model: graph modification / state change, fork-join / blocking calls, locks / barriers / synchronization primitives intra- or inter-stage
Prototype, explore next generation graphics pipelines.