Extending GRAMPS Shaders Jeremy Sugerman June 2, 2009 FLASHG

Page 1

Extending GRAMPS Shaders

Jeremy Sugerman
June 2, 2009

FLASHG

Page 2

Agenda

GRAMPS Reminder (quick!)
Reductions
Reductions and more with GRAMPS Shaders

Page 3

GRAMPS Reminder

[Figure: Ray Tracer graph. Stages: Camera, Intersect, Shade, FB Blend. Queues: Ray Queue, Ray Hit Queue, Fragment Queue. Legend: Thread Stage, Shader Stage, Queue, Stage Output, Push Output.]

[Figure: Map-Reduce graph. Stages: Produce, Map, Combine (Optional), Reduce, Sort. Queues: Initial Tuples, Intermediate Tuples (twice), Final Tuples. Also labeled: Output.]
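
As a rough illustration of the Map-Reduce graph named on this slide, the sketch below chains plain Python functions in the order Produce, Map, Combine, Reduce, Sort. The word-count workload, the function names, and the `chunk` parameter are all assumptions made for illustration. Lists stand in for queues, and nothing here uses the GRAMPS API.

```python
from collections import defaultdict

def produce(lines):
    # Initial tuples: one (line_number, text) pair per input line.
    return list(enumerate(lines))

def map_stage(initial):
    # Intermediate tuples: one (word, 1) pair per word.
    return [(word, 1) for _, text in initial for word in text.split()]

def combine(intermediate):
    # Optional local pre-aggregation; tuple shape is unchanged.
    partial = defaultdict(int)
    for key, value in intermediate:
        partial[key] += value
    return list(partial.items())

def reduce_stage(intermediate):
    # Final tuples: one (word, total) pair per distinct word.
    totals = defaultdict(int)
    for key, value in intermediate:
        totals[key] += value
    return list(totals.items())

def sort_stage(final):
    return sorted(final)

def run(lines, chunk=2):
    initial = produce(lines)
    combined = []
    # Map and Combine run per chunk, as independent instances could.
    for start in range(0, len(initial), chunk):
        combined += combine(map_stage(initial[start:start + chunk]))
    return sort_stage(reduce_stage(combined))
```

Calling `run(["a b a", "b c", "a"])` counts words across all three lines, and the result does not depend on `chunk` because the per-word sum can be combined in pieces.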

Page 4

GRAMPS Shaders

Facilitate data parallelism
Benefits: auto-instancing, queue management, implicit parallelism, mapping to ‘shader cores’
Constraints: 1 input queue, 1 input element and 1 output element per queue (plus push).
Effectively limits kernels to “map”-like usage.

Page 5

Reductions

Central to Map-Reduce (duh), many parallel apps
Strict form: sequential, requires arbitrary buffering
  – E.g., compute median, depth order transparency
Associativity, commutativity enable parallel incremental reductions
  – In practice, many of the reductions actually used (all Brook / GPGPU, most Map-Reduce)
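
The difference between the two forms can be made concrete with a small sketch. Sum is associative and commutative, so partial results over arbitrary chunks can be merged in any order. Median is not: it needs the whole input buffered, and combining per-chunk medians gives a different answer. All helper names below are hypothetical, and the code is plain Python rather than anything from GRAMPS.

```python
import random
import statistics

def chunked(values, size):
    return [values[i:i + size] for i in range(0, len(values), size)]

def incremental_sum(values, size, seed=0):
    # Reduce each chunk independently, then merge partials in shuffled order.
    partials = [sum(chunk) for chunk in chunked(values, size)]
    random.Random(seed).shuffle(partials)
    total = 0
    for partial in partials:
        total += partial
    return total

def strict_median(values):
    buffered = list(values)  # the entire input must be held at once
    return statistics.median(buffered)

def median_of_chunk_medians(values, size):
    # Tempting but wrong: median does not decompose over chunks.
    return statistics.median([statistics.median(c) for c in chunked(values, size)])

data = [1, 2, 9, 3, 4, 9, 9, 9, 9]
```

For `data`, the chunked sum matches `sum(data)`, while the strict median and the median of chunk medians disagree.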

Page 6

Logarithmic Parallel Reduction

1   5   3   2   1   7   3   5
  6       5       8       8
     11              16
             27
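
A minimal sketch of the pairwise tree reduction pictured here, using an eight-value input (1, 5, 3, 2, 1, 7, 3, 5). The function name and the choice to return every level are assumptions for illustration. Each level's pairs are independent, so they could run in parallel; this version simply computes them in a loop.

```python
def tree_reduce_levels(values, op=lambda a, b: a + b):
    """Return every level of a pairwise reduction, from input to final result."""
    current = list(values)
    levels = [current]
    while len(current) > 1:
        # Combine neighbours (0,1), (2,3), ...; each pair is independent work.
        nxt = [op(current[i], current[i + 1])
               for i in range(0, len(current) - 1, 2)]
        if len(current) % 2:
            nxt.append(current[-1])  # odd element carries over unchanged
        levels.append(nxt)
        current = nxt
    return levels

levels = tree_reduce_levels([1, 5, 3, 2, 1, 7, 3, 5])
```

Eight inputs finish in three combining levels, which is the logarithmic depth the slide title refers to.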

Page 7

Simple GRAMPS Reduction

[Figure: graph with stages Generate: 0 .. MAX, Sum(Input) (Reduce), and Validate / Consume Data. Also labeled: Barrier / Sum.]

Strict reduction
All stages are threads, no shaders

Page 8

Strict Reduction Program

sumThreadMain(GrEnv *env) {
    sum = 0;
    /* Block for entire input */
    GrReserve(inputQ, -1);
    for (i = 0 to numPackets) {
        sum += input[i];
    }
    GrCommit(inputQ, numPackets);

    /* Write sum to buffer or outputQ */
}
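
The strict pattern can be mimicked outside GRAMPS with a short sketch: the consumer materializes the entire input before it adds anything, so its buffering grows with the input. Everything here is an assumption for illustration, including the notion of a packet as a small list of integers and the names `generate_packets` and `strict_sum`; no GRAMPS calls are used.

```python
def generate_packets(max_value, packet_size):
    # Stand-in for the Generate stage: emits 0 .. max_value in packets.
    packet = []
    for value in range(max_value + 1):
        packet.append(value)
        if len(packet) == packet_size:
            yield packet
            packet = []
    if packet:
        yield packet

def strict_sum(packet_stream):
    # Stand-in for reserving the whole input: nothing is summed
    # until every packet has arrived and is held in memory.
    reserved = list(packet_stream)
    total = 0
    for packet in reserved:
        total += sum(packet)
    return total, len(reserved)

total, packets_held = strict_sum(generate_packets(100, 8))
```

The second return value is the number of packets held simultaneously, which is the cost the strict form pays.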

Page 9

Incremental/Partial Reduction

sumThreadMain(GrEnv *env) {
    sum = 0;
    /* Consume one packet at a time */
    while (GrReserve(inputQ, 1) != NOMORE) {
        sum += input[0];
        GrCommit(inputQ, 1);
    }
    /* Write sum to buffer or outputQ */
}

Note: Still single threaded!
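
A runnable analogue of the incremental loop, sketched with Python's standard `queue` and `threading` modules rather than GRAMPS. The `NOMORE` sentinel object, the function names, and the `capacity` parameter are assumptions for illustration. Because each element is folded into the running sum as soon as it arrives, a queue with a single slot is enough; the consumer itself is still one thread.

```python
import queue
import threading

NOMORE = object()  # hypothetical end-of-stream marker

def generate(out_q, max_value):
    for value in range(max_value + 1):
        out_q.put(value)  # blocks while the bounded queue is full
    out_q.put(NOMORE)

def incremental_sum(in_q):
    total = 0
    consumed = 0
    while True:
        item = in_q.get()  # take one element at a time
        if item is NOMORE:
            break
        total += item
        consumed += 1
    return total, consumed

def run_incremental(max_value, capacity=1):
    q = queue.Queue(maxsize=capacity)
    producer = threading.Thread(target=generate, args=(q, max_value))
    producer.start()
    result = incremental_sum(q)
    producer.join()
    return result
```

`run_incremental(100)` sums 0 through 100 while never holding more than `capacity` elements in the queue.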

Page 10

Shaders for Partial Reduction?

Appeal:
  – Stream, GPU languages offer support
  – Take advantage of shader cores
  – Remove programmer boilerplate
  – Automatic parallelism and instancing
Obstacles:
  – Location for partial / incremental result
  – Multiple input elements (spanning packets)
  – Detecting termination
  – Proliferation of stage / program types.

Page 11

Shader Enhancements

Stage / kernel takes N inputs per invocation
Must handle < N being available (for N > 1)
Invocation reduces all input to a single output
  – Stored as an output key?
GRAMPS can (will) merge input across packets
No guarantees on shared packet headers!
Not a completely new type of shader
General filtering, not just GPGPU reduce
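
One way to picture an N:1 kernel is the sketch below. A hypothetical driver, `run_n_to_1`, hands each invocation up to N elements (the last group may be short) and collects at most one output per invocation. Letting a kernel return `None` to emit nothing is an assumption made here to illustrate the filtering remark; the slides do not say how filtering would be expressed.

```python
def run_n_to_1(kernel, elements, n):
    """Invoke `kernel` on groups of up to n elements; keep non-None outputs."""
    outputs = []
    for start in range(0, len(elements), n):
        group = elements[start:start + n]  # final group may hold fewer than n
        result = kernel(group)
        if result is not None:
            outputs.append(result)
    return outputs

def sum_kernel(group):
    # Reduction: any number of inputs (1..n) collapses to one output.
    return sum(group)

def keep_even_kernel(group):
    # Filtering with n == 1: pass the element through or emit nothing.
    (value,) = group
    return value if value % 2 == 0 else None

sums = run_n_to_1(sum_kernel, [1, 5, 3, 2, 1, 7, 3], 3)
evens = run_n_to_1(keep_even_kernel, [1, 5, 3, 2, 1, 7, 3, 4], 1)
```

The same driver serves both uses, which matches the point that this is an extension of the existing shader type and not a new one.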

Page 12

GRAMPS Shader Reduction

[Figure: graph with stages Generate: 0 .. MAX, Sum(Input) (Reduce), and Validate / Consume Data. Also labeled: Barrier.]

Combination of N:1 shader and graph cycle (in-place).
Input “Queue” to validate only gets NOMORE
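
The cycle can be sketched as a queue whose consumer writes its output back to the same queue until a single element remains. This is a simulation under assumptions, not GRAMPS code: `cyclic_reduce` is a hypothetical name, a `deque` stands in for the in-place queue, and the returned value plays the role of the result the validate stage would read once it sees NOMORE.

```python
from collections import deque

def cyclic_reduce(values, n, op=sum):
    """Repeatedly take up to n elements and push their reduction back."""
    if n < 2:
        raise ValueError("n must be at least 2 for the queue to shrink")
    q = deque(values)
    invocations = 0
    while len(q) > 1:
        take = min(n, len(q))  # fewer than n may be available
        group = [q.popleft() for _ in range(take)]
        q.append(op(group))    # graph cycle: output re-enters the input
        invocations += 1
    result = q[0] if q else None
    return result, invocations

pairwise = cyclic_reduce([1, 5, 3, 2, 1, 7, 3, 5], 2)
triples = cyclic_reduce([1, 5, 3, 2, 1, 7, 3, 5], 3)
```

Both calls reach the same total; only the number of invocations changes with N.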

Page 13

Scheduling Reduction Shaders

Highly correlated with graph cycles.
  – Given reduction, preempt upstream under footprint.
Free space in input gates possible parallelism
  – 1/Nth free is the most that can be used.
  – One free entry is the minimum required for forward progress.
Logarithmic versus linear reduction is entirely a scheduler / GRAMPS decision.
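
The following simulation is one reading of these bullets, offered as a sketch. It assumes each concurrent invocation needs one free queue entry for its output, so the number of invocations per round is the smaller of the free entries and the number of full groups of N available. The function name `reduce_rounds` and the round-based model are assumptions, not a description of the GRAMPS scheduler.

```python
def reduce_rounds(values, n, free_entries):
    """Return (result, rounds) for an in-place N:1 sum reduction."""
    if n < 2 or free_entries < 1:
        raise ValueError("need n >= 2 and at least one free entry")
    items = list(values)
    rounds = 0
    while len(items) > 1:
        full_groups = max(1, len(items) // n)  # a short group still makes progress
        width = min(free_entries, full_groups)  # invocations run this round
        outputs = []
        for _ in range(width):
            group, items = items[:n], items[n:]
            outputs.append(sum(group))
        items = items + outputs  # outputs re-enter the same queue
        rounds += 1
    return (items[0] if items else 0), rounds

linear = reduce_rounds([1, 5, 3, 2, 1, 7, 3, 5], 2, free_entries=1)
logarithmic = reduce_rounds([1, 5, 3, 2, 1, 7, 3, 5], 2, free_entries=4)
```

With one free entry the eight inputs take seven rounds; with four free entries (half the queue, for N = 2) they take three. The kernel is identical in both cases, so the shape of the reduction is set by the scheduling, not by the shader.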

Page 14

Other Thoughts

(As mentioned) Enables filtering. What else?
How interesting are graphs without loops?
Are there other alternatives?
Would a separate “reduce” / “combine” stage be better?

Questions?