A Hierarchical Shadow Volume Algorithm
Timo Aila (1,2), Tomas Akenine-Möller (3)
(1) Helsinki University of Technology, (2) Hybrid Graphics, (3) Lund University
Dec 23, 2015
2
Outline
Brief intro to shadow volumes: fillrate problem, existing solutions
Our solution: idea, implementation
Results
Q&A
3
Shadow volumes [Crow77]
Shadow volumes define closed volumes of space that are in shadow
Figure labels: infinitesimal light source; shadow caster = light cap; extruded side quads; dark cap
4
Is point inside shadow volume?
1. Pick a reference point R outside the shadow volume (any such point is OK)
2. Span a line from R to the point to be classified
3. Compute the sum of enter (+1) and exit (-1) events along the line; a non-zero sum means the point is inside
2D illustration: shadow volume with reference point R and sample points P1, P2, P3
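The three steps above can be sketched in 2D, where a shadow volume becomes a closed polygon. This is a minimal illustration, not the hardware method: the function names, the counter-clockwise vertex order, and the square test volume are all assumptions made for the example, and degenerate cases (a line passing exactly through a vertex or along an edge) are not handled robustly.

```python
def cross(u, v):
    """2D cross product (z component of the 3D cross product)."""
    return u[0] * v[1] - u[1] * v[0]


def event_sum(r, p, polygon):
    """Sum enter (+1) and exit (-1) events along the segment r -> p.

    polygon is a closed 2D "shadow volume" given as vertices in
    counter-clockwise order (assumption of this sketch).
    """
    d = (p[0] - r[0], p[1] - r[1])          # direction of the line R -> P
    total = 0
    for i in range(len(polygon)):
        a = polygon[i]
        b = polygon[(i + 1) % len(polygon)]
        e = (b[0] - a[0], b[1] - a[1])      # direction of the polygon edge
        denom = cross(d, e)
        if denom == 0:
            continue                         # parallel: no crossing counted
        ar = (a[0] - r[0], a[1] - r[1])
        t = cross(ar, e) / denom             # position along R -> P
        u = cross(ar, d) / denom             # position along the edge
        if 0 < t < 1 and 0 <= u < 1:
            # For a CCW polygon the interior lies to the left of each edge,
            # so the line enters when cross(d, e) is negative.
            total += 1 if denom < 0 else -1
    return total


square = [(0, 0), (2, 0), (2, 2), (0, 2)]   # hypothetical shadow volume
R = (-5.0, 0.5)                             # reference point outside it
for name, point in [("inside", (1, 1)), ("beyond", (5, 1)), ("in front", (-1, 1))]:
    print(name, event_sum(R, point, square))
```

A point inside the square sees one enter event and no exit, while a point on the far side sees one of each and sums to zero.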
5
Using graphics hardware
R at ∞ behind the pixel (z-fail) [Bilodeau&Songy, Carmack]
infinity is always outside SVs, so the test is robust
must not clip to the far plane of the view frustum
sum hidden events to the stencil buffer, sign from backface culling
2D illustration: camera and view frustum, visible samples (or pixels), R behind them, and a shadow volume whose hidden faces contribute + and - events
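A software sketch of the z-fail count for a single pixel may help here. The slide only states that the sign comes from backface culling; the convention used below (hidden back faces add 1, hidden front faces subtract 1) is the commonly used one and is an assumption of this example, as are the function name and the fragment representation. Stencil clamping or wrapping is ignored.

```python
def zfail_stencil(visible_depth, sv_fragments):
    """Return the stencil count for one pixel using z-fail counting.

    visible_depth: depth of the visible surface at this pixel.
    sv_fragments:  list of (depth, front_facing) pairs, one per shadow
                   volume face covering the pixel (hypothetical format).
    Larger depth means farther from the camera.
    """
    stencil = 0
    for depth, front_facing in sv_fragments:
        if depth > visible_depth:            # depth test fails: face is hidden
            stencil += -1 if front_facing else 1
    return stencil


# One shadow volume whose front face is at depth 3 and back face at depth 8.
volume = [(3.0, True), (8.0, False)]
for z in (2.0, 5.0, 10.0):
    print(z, "in shadow" if zfail_stencil(z, volume) != 0 else "lit")
```

Only the surface at depth 5, which lies between the two faces, ends up with a non-zero count.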
7
Fillrate problem
50+ fps without shadows on ATI Radeon 9800XT at 1280x1024, 1 sample/pixel
1 fps when shadow volumes are rasterized
2.2 billion pixels per frame
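To put the 2.2 billion figure in perspective, a quick back-of-the-envelope computation gives the average number of shadow volume pixels rasterized per screen pixel. It assumes the figure counts rasterized shadow volume pixels at the stated 1280x1024 resolution with 1 sample/pixel; the variable names are illustrative.

```python
width, height = 1280, 1024
sv_pixels_per_frame = 2.2e9                  # figure quoted on the slide

screen_pixels = width * height
average_overdraw = sv_pixels_per_frame / screen_pixels

print(f"{screen_pixels} screen pixels")
print(f"about {average_overdraw:.0f} shadow volume pixels per screen pixel")
```

Under these assumptions each screen pixel is touched by shadow volume geometry well over a thousand times per frame, which is the fillrate problem in one number.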
8
Existing solutions (1/2)
CC shadow volumes [Lloyd et al. 2004]
draw SVs only where receivers exist
good when there is lots of empty space
Hybrid shadow maps and volumes [Chan&Durand 2004]
use SVs only at shadow boundaries
boundary pixels determined using a shadow map
artifacts due to limited shadow map resolution
9
Existing solutions (2/2)
Depth bounds [Nvidia 2003]
application supplies min & max depth values separately for each shadow volume
rasterize the shadow volume only where visible geometry lies within [min, max]
optimal bounds are hard to compute
2D illustration: camera, visible pixels, and a shadow volume bracketed by min and max depth
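The depth bounds idea can be sketched as a per-pixel filter. This is plain Python for illustration, not the graphics API; the function name and the list-of-depths representation are assumptions.

```python
def pixels_to_rasterize(visible_depths, sv_min, sv_max):
    """Return indices of pixels where the shadow volume must be rasterized.

    visible_depths: depth of the visible geometry at each pixel.
    sv_min, sv_max: application-supplied depth bounds of one shadow volume.
    """
    return [i for i, z in enumerate(visible_depths) if sv_min <= z <= sv_max]


depths = [0.1, 0.4, 0.6, 0.9]                # hypothetical visible depths
print(pixels_to_rasterize(depths, 0.3, 0.7))
```

Pixels whose visible geometry lies outside the bounds cannot be affected by this shadow volume, so they are skipped. Loose bounds let more pixels through, which is why the bounds matter.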
15
How to detect shadow boundaries?
Two facts about shadow volumes
1. always closed
2. SV triangles mark potential shadow boundaries
If a 3D volume in the scene is not intersected by shadow volume triangles
it is fully lit or fully in shadow
a single sample classifies the entire volume
17
Detecting boundary tiles
Bound the tile with an axis-aligned bounding box
8x8 pixel region
Zmin, Zmax
Triangle vs. AA box intersection test
1. low-resolution rasterization
2. Zmin and Zmax tests
Figure: tile of 8 x 8 pixels, bounded in depth by Zmin and Zmax
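The two-step test can be sketched in software as follows. This is a simplified, conservative stand-in for what the slides describe: the 2D step uses a bounding-box check plus edge functions instead of a real low-resolution rasterizer, and the depth step uses the triangle's vertex depth range rather than a tighter per-tile range. All names are assumptions made for the example.

```python
def edge(a, b, p):
    """Edge function: positive when p lies to the left of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def overlaps_tile_2d(tri, x0, y0, size):
    """Conservative screen-space overlap of a triangle and a square tile."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    if max(xs) < x0 or min(xs) > x0 + size or max(ys) < y0 or min(ys) > y0 + size:
        return False                         # bounding boxes do not overlap
    area = edge(tri[0], tri[1], tri[2])
    if area == 0:
        return False                         # degenerate triangle
    sign = 1 if area > 0 else -1
    corners = [(x0, y0), (x0 + size, y0), (x0, y0 + size), (x0 + size, y0 + size)]
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        if all(sign * edge(a, b, c) < 0 for c in corners):
            return False                     # tile fully outside one edge
    return True


def is_boundary_tile(tri, x0, y0, zmin, zmax, size=8):
    """True if the SV triangle may intersect the tile's bounding box."""
    if not overlaps_tile_2d(tri, x0, y0, size):
        return False
    zs = [v[2] for v in tri]                 # conservative depth range
    return min(zs) <= zmax and max(zs) >= zmin


tri = [(0, 0, 0.2), (16, 0, 0.2), (0, 16, 0.2)]   # vertices as (x, y, z)
print(is_boundary_tile(tri, 0, 0, 0.1, 0.3))      # overlaps in 2D and in depth
print(is_boundary_tile(tri, 0, 0, 0.5, 0.9))      # triangle in front of the tile's depths
print(is_boundary_tile(tri, 16, 16, 0.1, 0.3))    # no 2D overlap
```

Because both steps are conservative, a tile may occasionally be flagged as a boundary tile when it is not, which costs performance but not correctness.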
18
Fast update of non-boundary tiles
Copy low-res shadows to stencil buffer
writing 64 per-pixel values would be slow
Two-level stencil buffer saves the day
maintain [Smin, Smax] per tile
always test the higher level first
often no need to validate per-pixel values
stencil values of non-boundary tiles are constant
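A minimal software model of a two-level stencil buffer is sketched below. The class and method names are hypothetical, stencil clamping is ignored, and per-pixel storage is created lazily only for tiles that receive a per-pixel update; actual hardware would organize this differently.

```python
class TwoLevelStencil:
    """Per-tile [Smin, Smax] on top of optional per-pixel stencil values."""

    def __init__(self, tiles_x, tiles_y, tile_size=8):
        self.tile_size = tile_size
        self.smin = [[0] * tiles_x for _ in range(tiles_y)]
        self.smax = [[0] * tiles_x for _ in range(tiles_y)]
        self.pixels = {}                     # (tx, ty) -> per-pixel values

    def add_to_tile(self, tx, ty, delta):
        """Fast path for a non-boundary tile: one update for all its pixels."""
        self.smin[ty][tx] += delta
        self.smax[ty][tx] += delta
        if (tx, ty) in self.pixels:          # slow path: per-pixel data exists
            self.pixels[(tx, ty)] = [v + delta for v in self.pixels[(tx, ty)]]

    def add_to_pixel(self, x, y, delta):
        """Per-pixel update, as needed inside boundary tiles."""
        n = self.tile_size
        tx, ty = x // n, y // n
        # A tile without per-pixel data is constant, so Smin is its value.
        values = self.pixels.setdefault((tx, ty), [self.smin[ty][tx]] * (n * n))
        values[(y % n) * n + (x % n)] += delta
        self.smin[ty][tx] = min(values)
        self.smax[ty][tx] = max(values)

    def in_shadow(self, x, y):
        """Test the higher level first; read per-pixel data only if needed."""
        n = self.tile_size
        tx, ty = x // n, y // n
        if self.smin[ty][tx] == self.smax[ty][tx]:
            return self.smin[ty][tx] != 0
        return self.pixels[(tx, ty)][(y % n) * n + (x % n)] != 0


stencil = TwoLevelStencil(tiles_x=2, tiles_y=2)
stencil.add_to_tile(0, 0, 1)                 # non-boundary tile fully in shadow
stencil.add_to_pixel(8, 0, 1)                # one pixel of a boundary tile
print(stencil.in_shadow(3, 3), stencil.in_shadow(8, 0), stencil.in_shadow(9, 0))
```

The tile-level update touches two numbers instead of 64, and the query for a constant tile never reads per-pixel data.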
19
Implementation – Stage 1
Buffers built separately for each shadow volume
Classifications ready when the entire SV has been processed
application marks begin/end of shadow volumes
Diagram: SV triangles -> low-resolution rasterizer -> per-tile operations -> Boundary? and low-res shadows buffers
20
Implementation – Stage 2
Diagram: SV triangles -> low-resolution rasterizer -> boundary tile? (read from the Boundary? buffer)
Yes: per-pixel rasterizer -> stencil ops -> update 2-level stencil
No: copy low-res shadows to 2-level stencil
21
Alternative implementations
Two pass
Pass 1 = Stage 1
Pass 2 = Stage 2
How to keep pixel units busy during Stage 1?
maybe assign per-tile operations to pixel shaders?
Single pass
Separate stages using delay stream [Aila et al. 2003]
Stage 2 of current SV executes simultaneously with next SV’s Stage 1
22
Hardware resources
Two-level stencil buffer
Per-tile operations
Optionally delay stream *
duplicate low-res rasterizer & Zmin/Zmax units *
cache for per shadow volume buffers
multiple buffers for pipelined operation
allocate from external memory
* If not already there for occlusion culling purposes
24
Results – Simple scene (1280x1024)
                    Depth bounds  Hierarchical  Improvement
Ratio in #pixels    1.1           12.7          11.5
Ratio in bandwidth  1.03          17.6          17.2
25
Results – Knights (1280x1024)
                    Depth bounds  Hierarchical  Improvement
Ratio in #pixels    2.6           7.4           2.8
Ratio in bandwidth  2.4           5.6           2.4
26
Results – Powerplant (1280x1024)
                    Depth bounds  Hierarchical  Improvement
Ratio in #pixels    2.4           22.9          9.5
Ratio in bandwidth  2.3           16.0          6.9
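The Improvement column appears to be the hierarchical ratio divided by the depth bounds ratio. The slides do not state this, so the short check below treats it as an assumption and simply compares the quotient with the reported value for all three scenes. Small differences are presumably due to rounding in the reported figures.

```python
# (depth bounds, hierarchical, reported improvement), copied from the slides
results = {
    "Simple scene": {"#pixels": (1.1, 12.7, 11.5), "bandwidth": (1.03, 17.6, 17.2)},
    "Knights":      {"#pixels": (2.6, 7.4, 2.8),   "bandwidth": (2.4, 5.6, 2.4)},
    "Powerplant":   {"#pixels": (2.4, 22.9, 9.5),  "bandwidth": (2.3, 16.0, 6.9)},
}


def improvement(depth_bounds, hierarchical):
    """Assumed definition: how much better hierarchical is than depth bounds."""
    return hierarchical / depth_bounds


for scene, rows in results.items():
    for metric, (db, hier, reported) in rows.items():
        print(f"{scene:12s} {metric:9s} computed {improvement(db, hier):5.2f}  reported {reported}")
```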
27
Summary
Hierarchical rendering method for shadow volumes
significant fillrate savings compared to other hardware methods
also works for soft shadow volumes
Future work
would it make sense to extend programmability to per-tile operations?
how many pipeline bubbles are created?
requires chip-level simulations
28
Thank you!
Questions?
Acknowledgements
Ville Miettinen, Jacob Ström, Eric Haines, Ulf Assarsson, Lauri Savioja, Jonas Svensson, Ulf Borgenstam, Karl Schultz
3DR group at Helsinki University of Technology
The National Technology Agency of Finland, Hybrid Graphics, Bitboys, Nokia and Remedy Entertainment
ATI for granting a fellowship to Timo (2004-2005)