GPU Accelerated Tandem Traversal of Blocked Bounding Volume Hierarchies

Jesper Damkjær and Kenny Erleben
{damkjaer,kenny}@diku.dk

Department of Computer Science
University of Copenhagen

October 2009
Slides: image.diku.dk/kenny/download/damkjaer.erleben.08.slides.pdf


Traditional BVH Traversal

Two BVHs are traversed

Using either a stack or a queue
Using a descent rule that picks which tree to descend
Or descending both trees simultaneously

For each descent, the BVs in the nodes are compared for overlap
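
The traversal described above can be sketched on the CPU as follows. This is a minimal illustration, not the authors' code: the `AABB` and `Node` layouts, the flat-array storage with the root at index 0, and the function names are all assumptions made for the example. It uses an explicit stack and descends both trees simultaneously when both nodes are internal.

```cpp
#include <utility>
#include <vector>

// Axis-aligned bounding box (hypothetical layout, not from the slides).
struct AABB {
    float min[3];
    float max[3];
};

// Binary BVH node stored in a flat array; child indices are -1 on a leaf.
struct Node {
    AABB box;
    int left = -1;
    int right = -1;
    bool isLeaf() const { return left < 0; }
};

// Two boxes overlap iff their intervals overlap on every axis.
inline bool overlap(const AABB& a, const AABB& b) {
    for (int i = 0; i < 3; ++i) {
        if (a.max[i] < b.min[i] || b.max[i] < a.min[i]) return false;
    }
    return true;
}

// Stack-based tandem traversal of two BVHs whose roots sit at index 0.
// Returns the overlapping (leaf of A, leaf of B) index pairs.
inline std::vector<std::pair<int, int>> tandemTraverse(const std::vector<Node>& A,
                                                       const std::vector<Node>& B) {
    std::vector<std::pair<int, int>> hits;
    if (A.empty() || B.empty()) return hits;
    std::vector<std::pair<int, int>> stack;
    stack.emplace_back(0, 0);  // start at both roots
    while (!stack.empty()) {
        const int a = stack.back().first;
        const int b = stack.back().second;
        stack.pop_back();
        if (!overlap(A[a].box, B[b].box)) continue;  // prune this pair
        const bool leafA = A[a].isLeaf();
        const bool leafB = B[b].isLeaf();
        if (leafA && leafB) {
            hits.emplace_back(a, b);
        } else if (leafA) {  // only B can descend
            stack.emplace_back(a, B[b].left);
            stack.emplace_back(a, B[b].right);
        } else if (leafB) {  // only A can descend
            stack.emplace_back(A[a].left, b);
            stack.emplace_back(A[a].right, b);
        } else {  // descend both trees simultaneously
            stack.emplace_back(A[a].left, B[b].left);
            stack.emplace_back(A[a].left, B[b].right);
            stack.emplace_back(A[a].right, B[b].left);
            stack.emplace_back(A[a].right, B[b].right);
        }
    }
    return hits;
}
```

Replacing the stack with a queue changes only the visiting order, not the set of reported pairs.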

Naive BVH on GPU

One pair of BVHs per Thread

Upper space bound for stack:

$$k\,(c - 1)\,\max\bigl(\operatorname{height}(A), \operatorname{height}(B)\bigr),$$

where $c$ is the maximum cardinality and $k$ is the size of two BV node references.

Shared memory too small and global memory too slow
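
To get a feel for the bound, it can be evaluated for some concrete numbers. The helper names and the example values below (8 bytes per pair of references, binary trees, 128 threads per thread block) are assumptions chosen for illustration and do not come from the slides.

```cpp
#include <algorithm>
#include <cstddef>

// Upper bound on the per-thread stack size in bytes, as given on the slide:
// k (c - 1) max(height(A), height(B)).
// k: bytes for one pair of BV node references, c: maximum node cardinality.
constexpr std::size_t stackBoundBytes(std::size_t k, std::size_t c,
                                      std::size_t heightA, std::size_t heightB) {
    return k * (c - 1) * std::max(heightA, heightB);
}

// Total bytes needed if every thread of a thread block keeps its own stack.
constexpr std::size_t perBlockBytes(std::size_t perThread, std::size_t threads) {
    return perThread * threads;
}
```

With k = 8 bytes, c = 2, and tree heights 20 and 16, `stackBoundBytes` gives 160 bytes per thread. For 128 threads that is 20480 bytes, which already exceeds the 16 KB of shared memory per multiprocessor on CUDA compute capability 1.x hardware.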

Use Blocks

1 Block ≡ Each node has 4 children

If two nodes overlap ⇒ 16 new overlap tests (4 × 4 children)

Less data to transfer and more work per thread
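
The 4 × 4 test between two blocks might look like the sketch below. The `Block` layout and the 16-bit result mask are assumptions for illustration; the slides do not say how the results are stored.

```cpp
#include <cstdint>

// Axis-aligned bounding box (hypothetical layout).
struct AABB {
    float min[3];
    float max[3];
};

// A block holds the bounding volumes of 4 sibling nodes.
struct Block {
    AABB box[4];
};

inline bool overlap(const AABB& a, const AABB& b) {
    for (int i = 0; i < 3; ++i) {
        if (a.max[i] < b.min[i] || b.max[i] < a.min[i]) return false;
    }
    return true;
}

// Tests all 4 x 4 node pairs of two blocks.
// Bit (4 * i + j) of the result is set when a.box[i] overlaps b.box[j].
inline std::uint16_t blockOverlapMask(const Block& a, const Block& b) {
    std::uint16_t mask = 0;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            if (overlap(a.box[i], b.box[j])) {
                mask = static_cast<std::uint16_t>(mask | (1u << (4 * i + j)));
            }
        }
    }
    return mask;
}
```

One thread loads two blocks and performs up to 16 tests, which is where the "more work per thread" comes from.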

Use Double Buffered List

Stack/Queue ⇒ Double buffered list

Swap input/output pairs for next pass
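
A sequential sketch of the double-buffered scheme is given below. The `traverseDoubleBuffered` name and the `expand` callback are hypothetical; `expand` stands in for the per-pair overlap test that appends follow-up pairs to the output list. On a GPU each input pair would be handled by its own thread, which requires some scheme for writing to the output list without conflicts; this sketch sidesteps that.

```cpp
#include <utility>
#include <vector>

using Pair = std::pair<int, int>;

// Runs passes until the input list is empty and returns the number of passes.
// expand(pair, output) appends the follow-up pairs of one input pair to output.
template <class Expand>
int traverseDoubleBuffered(std::vector<Pair> input, Expand expand) {
    std::vector<Pair> output;
    int passes = 0;
    while (!input.empty()) {
        output.clear();
        for (const Pair& p : input) {  // on a GPU: one thread per input pair
            expand(p, output);
        }
        std::swap(input, output);  // the output of this pass feeds the next pass
        ++passes;
    }
    return passes;
}
```

No per-thread stack is needed: the only state carried between passes is the list of pairs still to be tested.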

Memory Trick Needed

Need Imaginary Nodes

Fewer than 4 children ⇒ fill with imaginary nodes

Imaginary nodes take up space and add to the calculation time ⇒ use sparingly
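
One possible way to represent an imaginary node is an inverted bounding box that can never pass an overlap test. The slides do not say how imaginary nodes are encoded, so this representation, along with the `PaddedBlock` and `padBlock` names, is an assumption for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

// Axis-aligned bounding box (hypothetical layout).
struct AABB {
    float min[3];
    float max[3];
};

inline bool overlap(const AABB& a, const AABB& b) {
    for (int i = 0; i < 3; ++i) {
        if (a.max[i] < b.min[i] || b.max[i] < a.min[i]) return false;
    }
    return true;
}

// Inverted (empty) box: min > max on every axis, so overlap() is false
// against any finite box and against another imaginary box.
inline AABB imaginaryBox() {
    const float big = std::numeric_limits<float>::max();
    return AABB{{big, big, big}, {-big, -big, -big}};
}

struct PaddedBlock {
    std::array<AABB, 4> box;
    int imaginary;  // number of padding slots in this block
};

// Copies up to 4 real child boxes and fills the remaining slots with
// imaginary boxes. Children beyond the fourth are ignored in this sketch.
inline PaddedBlock padBlock(const std::vector<AABB>& children) {
    PaddedBlock out;
    out.imaginary = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        if (i < children.size()) {
            out.box[i] = children[i];
        } else {
            out.box[i] = imaginaryBox();
            ++out.imaginary;
        }
    }
    return out;
}
```

Every padded slot is still loaded and tested, which is why a BVH with many imaginary nodes costs both memory and time.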

Blocks with Mixed Internal or Leaf Nodes

Not allowed ⇒ Simpler code

Internal Block versus Leaf Block

if collide (a, k) ⇒ push (e, k)
if collide (a, l) ⇒ push (e, k)
if collide (a, m) ⇒ push (e, k)
if collide (a, n) ⇒ push (e, k)

Redundant results ⇒ add extra check to code
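
One reading of this slide is that an internal node a, whose children live in block e, is tested against the four nodes k, l, m, n of a leaf block. Since the leaf side cannot descend, every hit would push the same pair. Under that assumption, the extra check could be as simple as pushing at most once per internal node. The struct layouts and names below are hypothetical.

```cpp
#include <utility>
#include <vector>

// Axis-aligned bounding box (hypothetical layout).
struct AABB {
    float min[3];
    float max[3];
};

inline bool overlap(const AABB& a, const AABB& b) {
    for (int i = 0; i < 3; ++i) {
        if (a.max[i] < b.min[i] || b.max[i] < a.min[i]) return false;
    }
    return true;
}

// child[i] is the index of the block holding the children of node i.
struct InternalBlock {
    AABB box[4];
    int child[4];
};

struct LeafBlock {
    AABB box[4];
};

// Internal block versus leaf block: only the internal side descends, so each
// hit for node i would push the same (child[i], leafBlockId) pair.
// The break is the extra check that keeps the pair from being pushed twice.
inline void pushInternalVsLeaf(const InternalBlock& A, const LeafBlock& B,
                               int leafBlockId,
                               std::vector<std::pair<int, int>>& out) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            if (overlap(A.box[i], B.box[j])) {
                out.emplace_back(A.child[i], leafBlockId);
                break;  // further hits for node i would repeat this pair
            }
        }
    }
}
```

Without the break, a node overlapping all four leaves would add four identical entries to the output list, each costing a full block test in the next pass.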

The Test Setup

Three different configuration types

Structured stack
Unstructured pile
Rock slide

The Test Setup (Cont’d)

For each configuration type

Increasing number of triangles in objects
Increasing number of objects

Test against Rapid

Rapid uses OBBs; we use AABBs

No optimization of imaginary nodes in BVHs (up to 33%)

Results

Rapid on Intel Quad CPU using one core

[Figure: three plots, "Stack: Rapid", "Pile: Rapid", and "Rockslide: Rapid", each showing time in seconds against number of objects and triangles per object.]

CUDA on GeForce 9800 GX2 using one core

[Figure: three plots, "Stack: Cuda only", "Pile: Cuda only", and "Rockslide: Cuda only", each showing time in seconds against number of objects and triangles per object.]

Stack (5-8) Pile (3-7) Slide (2)

Thanks

Questions?
