Afrigraph Tutorial B: Interactive Ray-Tracing
Ingo Wald, Philipp Slusallek
Saarland University, Computer Graphics Group
http://graphics.cs.uni-sb.de
Afrigraph 2001, Capetown, ZA Tutorial on Interactive Raytracing

For almost 20 years, researchers have argued that eventually, Ray-Tracing will become faster than rasterization
And nothing happened... Well, almost...

Four Power Plants (50 Mtris)

Tutorial Overview
· Introduction
  – Introduction to Ray-Tracing
  – Discussion: Ray-Tracing versus Rasterization
· Interactive Ray-Tracing on PCs
  – Coherent Ray-Tracing Implementation
  – Comparisons (SW / HW)
  – Distributed RT of Massive Models
· Outlook: Hardware-Architectures for Ray-Tracing
· Future Research and Conclusions

Introduction to Ray-Tracing
· In principle: Very simple algorithm
  – For each pixel
    • Create ray through that pixel
    • Cast ray into scene and find closest intersection
    • “Shade” ray at intersection point
  – Can also shoot new rays during shading:
    • Determine visibility of point lights by “shadow rays”
    • Compute reflected/refracted light by recursively tracing reflection/refraction rays
  – Basically, that's all…

Ray-Tracing Algorithm

Introduction to Ray-Tracing
· Only three main components:
  – Generating rays
  – Finding the closest intersection of a ray
    • Ray traversal
    • Ray-object intersection
  – Shading

Ray-Generation
· Generate initial ray for each pixel
  – Other camera models are trivial…
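
The slide does not fix a camera model, so the following is only a minimal sketch of primary-ray generation under assumed conventions: a pinhole camera at the origin looking down the negative z axis, with a vertical field of view in degrees. The names `Vec3`, `Ray`, and `generatePrimaryRay` are hypothetical and do not come from the tutorial.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, dir; };

// Assumed setup: pinhole camera at the origin, looking down -z, y pointing up.
// (px, py) is the pixel index with (0, 0) in the top-left corner.
Ray generatePrimaryRay(int px, int py, int width, int height, float fovDeg) {
    const float kPi = 3.14159265358979f;
    const float aspect = float(width) / float(height);
    const float scale  = std::tan(0.5f * fovDeg * kPi / 180.0f);

    // Map the pixel center to [-1, 1] screen coordinates.
    const float sx = (2.0f * (px + 0.5f) / width - 1.0f) * aspect * scale;
    const float sy = (1.0f - 2.0f * (py + 0.5f) / height) * scale;

    // Normalize the direction through the image plane at z = -1.
    const float len = std::sqrt(sx * sx + sy * sy + 1.0f);
    return Ray{ {0.0f, 0.0f, 0.0f}, {sx / len, sy / len, -1.0f / len} };
}
```

Other camera models would only change how `sx` and `sy` are mapped to a direction; the rest of the ray tracer is unaffected.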

Ray-Object Intersection
· Need to compute intersections fast
  – Requires many floating point operations
    • But typically dominated by traversal (2:1)
  – Plenty of algorithms
    • Plenty of primitives
    • Even for triangles
· Optimizations
  – Use SIMD CPU extensions (SSE, AltiVec, 3DNow!)
    • Data parallel execution
  – Proper caching of data
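
As one concrete instance of the "plenty of algorithms" for triangles, here is a sketch of the well-known Möller-Trumbore ray-triangle test. It is not the preprocessed intersection routine the authors use in their own system; the helper names (`sub`, `cross`, `dot`, `intersectTriangle`) and the epsilon value are assumptions made for illustration.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Möller-Trumbore test. On a hit, writes the ray parameter to tOut and
// returns true; on a miss, leaves tOut untouched and returns false.
bool intersectTriangle(Vec3 org, Vec3 dir, Vec3 a, Vec3 b, Vec3 c, float& tOut) {
    const float eps = 1e-7f;               // assumed tolerance
    const Vec3 e1 = sub(b, a);
    const Vec3 e2 = sub(c, a);
    const Vec3 p  = cross(dir, e2);
    const float det = dot(e1, p);
    if (std::fabs(det) < eps) return false; // ray parallel to triangle plane
    const float inv = 1.0f / det;
    const Vec3 s = sub(org, a);
    const float u = dot(s, p) * inv;        // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return false;
    const Vec3 q = cross(s, e1);
    const float v = dot(dir, q) * inv;      // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return false;
    const float t = dot(e2, q) * inv;
    if (t <= eps) return false;             // hit lies behind the ray origin
    tOut = t;
    return true;
}
```

The early exits are exactly the kind of conditionals that make a single-ray test awkward for SIMD, which is why the later slides move to packets of rays.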

Shading
· Lots of reflection models possible
  – Phong, Cook-Torrance, Ward, …
  – Direct use of Shading Languages (RenderMan)
· Shading after visibility has been computed
  – No overhead due to overdraw
  – Every ray is shaded exactly once
· Can generate new rays
  – Shadow, reflection, transmission, ...
    • Need to deal with recursion
    • Rendering cost linear in #rays traced

Introduction to Ray-Tracing
· Only three main components:
  – Generating rays
  – Finding the closest intersection of a ray
    • Ray traversal
    • Ray-object intersection
  – Shading
· Problem:
  – “Find closest intersection” is very expensive
  – And: Lots of rays per image …

· Efficient HW implementation
  – Use of object coherence
  – Many new features
· Rendering is driven by the application
  – Application submits geometry
· Visibility determined at end
  – Z-buffer fragment test
Rasterization pipeline: Application → T&L, Vertex Ops → Rasterization → Texturing → Fragment Ops → Fragment Tests → Framebuffer

Rasterization Drawbacks
Drawbacks of this approach
· Use of object coherence
  – Only if triangle is large
· Rendering is driven by the application
  – Application has to know what is visible…
  – Efficient occlusion culling is hard
· Visibility determined at end
  – Overdraw: All but one fragment per pixel are discarded
  – High depth complexity: very inefficient

Ray-Tracing versus Rasterization
· Flexibility
  – Handling unstructured groups of rays
    • Image-based rendering, reflections, shadows …
· Generality
  – Ray-Tracing is the basis for many algorithms
    • Global illumination, visibility, …
  – Used in many disciplines
    • Physics, Biology, Chemistry, Telecom, …

Ray-Tracing versus Rasterization
· Simple and Efficient Shading
  – Shading happens after visibility computation
  – Direct use of Shading Languages
· Correctness & Image Quality
  – Rasterization inherently relies on approximations
    • Environment maps, shadow maps, ...
  – Ray-traced images are “correct” by default
    • 'True' reflections and shadows…
    • Use of approximations is optional

Ray-Tracing versus Rasterization
· Parallel Scalability
  – Ray-Tracing is “embarrassingly parallel” (e.g. each pixel is independent of all others)
  – Scales well with the available hardware
  – Needs fast access to the scene database

Ray-Tracing versus Rasterization
· Scalability with Scene Size: Occlusion Culling & Logarithmic Complexity
  – RT never even looks at invisible geometry
  – RT traversal allows for efficient searching: O(log N)
  – Rasterization shows linear behavior: O(N)
· RT wins for complex scenes
  – But rasterization is improving

Ray-Tracing versus Rasterization
· Coherence
  – Key to efficient rendering
  – Rasterization: Object coherence
    • Allows for efficient HW implementation
    • But only really efficient for large triangles
  – Ray-Tracing: Ray coherence
    • Improved caching & reduced bandwidth
    • Allows for data parallel computation
  – RT has much more coherence than assumed
    • But harder to exploit…

Ray-Tracing versus Rasterization
· Conclusion of that Comparison
  – Ray-Tracing has many advantages
    • These advantages become ever more pronounced
    • Not only quality, also efficiency…
  – But: Ray-Tracing is (still) costly
    • Have to make it faster!

Previous and Related Work
Two ways to achieve ray-tracing-like quality interactively:
· Trace fewer rays per frame: “Approximative ray-tracing”
  – Rasterization hardware
  – Image-based techniques
  – Interpolation of ray-traced results
· Trace more rays/sec: “Accelerated ray-tracing”
  – Better data structures
  – Better algorithms
  – Better implementations
  – Parallel processing

  – Compute visible geometry in HW
    • Lookup of geometry in frame buffer
  – Only works for primary rays and point lights
  – Creates artifacts (e.g. shadow buffer resolution)
· Augmenting hardware with RT effects
  – Selective ray-tracing
  – Integrate ray-tracing with OpenGL rendering
    • Rasterization for diffuse objects
    • Textures or splatting [Stamminger/Haber 00/01] for ray-traced samples

· RenderCache [Walter et al. 99]
  – Store ray samples per pixel (color, depth, ...)
  – Reproject samples for next frame
  – Detect and fill holes by sending a few new rays
    • Heuristic algorithms based on neighborhood
  – Locate and correct errors (shadow, etc.)
    • Pseudo-randomly sample a few other pixels
    • Adaptively sample near error regions
  – But: Reprojection and heuristics are expensive
    • Pays off (only) when pixels are very expensive to compute directly (e.g. global illumination)
  – Scales badly with #CPUs

· Holodeck [Ward 98]
  – Similar to RenderCache, but
    • Long term storage of ray samples on disk
    • Fast access to samples based on grid structure
  – Builds light-field-like data representation

Accelerated Ray-Tracing: Better Data Structures/Algorithms
· 'Best' data structure (Grid vs BSP vs …)?
  – Always scene and implementation dependent
  – In practice, most do about equally well…
  – Well-researched topic → 'new' data structures are unlikely to be found
· But: Potential for better algorithms:
  – Can we better exploit coherence?
  – Can we build data structures faster?
  – Can we build data structures fully automatically?
· Also: Need for dynamic data structures

Accelerated Ray-Tracing: Parallelization on Supercomputers
· RT of large CSG models [Muuss 95]
  – Motivation: Interactively render complex data sets
  – Idea: Use ray-tracing
    • Flexibility: Avoid tessellation of CSG models
    • Take advantage of logarithmic complexity of RT
    • Exploit parallelism
  – Results
    • 1-2 frames per second @ video resolution (in '95!)

Accelerated Ray-Tracing: Parallelization on Supercomputers
· Utah Parallel RT System [Parker 99]
  – Similar approach to Muuss
    • Parallelization on shared memory machine
  – Supports general primitives and volume data sets
  – Results
    • Has shown scalability up to 128 CPUs
    • Importance of caching analysis
    • New goal: interactive visual cues for visualization (same information at less cost)

IRT on PCs: What to keep in mind
· PC hardware has changed dramatically
  – Processors have become much faster
    • But increase in ray-tracing speed is gradual
  – Increasing gap between speed of CPU and memory
    • But ray-tracing algorithm did not change
  – SIMD extensions
    • Flops become increasingly cheap
    • But difficult to take advantage of in ray-tracing
  – Fast (and cheap) networking & networks of PCs
    • But good performance on non-shared-memory is hard
    • Small clusters are around everywhere…

IRT on PCs: What to keep in mind
· PC hardware has changed dramatically
· Have to adapt our algorithms!
  – Special emphasis on
    • Keeping the CPU busy
    • Memory & caching (1 cache miss can cost several triangle intersections)
    • SIMD
  – Not so important any more:
    • Instruction count, avoiding float ops

General Optimizations: Cache
Main memory is too slow for CPU (1:10) (bandwidth and latency)
· Keep relevant data in caches
  – Design algorithms for cache reuse → coherence
  – Align data to cache lines (32 bytes)
  – Separate data according to usage
    • Separate volatile from non-volatile data
    • Store intersection data separate from shading data (e.g. shading normals not needed for intersection)
  – Prefetch data
    • Design algorithms to enable data access prediction

General Optimizations: Cache
Cache Reuse Example: Triangle Data Structure
· Variant 1: struct Triangle { Vec3f *a,*b,*c; };
  – Intersect() routine works on this structure
  – Prefetching hard (2 levels of indirection)
  – Data stored in 4 different memory regions (1 struct + 3 vectors)
  – Worst case: 8 cache misses (if each of the 4 data items straddles a cache-line border)

General Optimizations: Cache
Cache Reuse Example: Triangle Data Structure
· Variant 2: With preprocessed intersection data
  – All necessary data packed into 48 aligned bytes (see paper)
  – Con: Additional data to store (48 bytes/triangle)
  – But several advantages:
    • At most 2 cache misses
    • 1 contiguous memory region → trivial to prefetch
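
The cache-miss counts for the two variants can be checked with a little arithmetic. The sketch below is illustrative only: the field layout of `TriangleAccel` is a placeholder (the real 48-byte layout is the one in the authors' paper), the 12-byte pointer struct assumes 32-bit pointers as on PCs of that era, and `cacheLinesTouched` is a hypothetical helper.

```cpp
#include <cstddef>

struct Vec3f { float x, y, z; };

// Variant 1: three pointers into a separate vertex array
// (reaching the vertices takes two levels of indirection).
struct TriangleIndirect { Vec3f *a, *b, *c; };

// Variant 2, placeholder layout: twelve floats = 48 bytes, 16-byte aligned.
// Which twelve values are stored is an assumption made for this sketch.
struct alignas(16) TriangleAccel {
    float vertex[3];
    float edge1[3];
    float edge2[3];
    float extra[3];
};
static_assert(sizeof(TriangleAccel) == 48, "48 bytes per triangle");

// Number of cache lines touched by `size` bytes starting at byte address `addr`.
constexpr std::size_t cacheLinesTouched(std::size_t addr, std::size_t size,
                                        std::size_t lineSize = 32) {
    return (addr + size - 1) / lineSize - addr / lineSize + 1;
}

// Variant 2: 16-byte alignment means a 48-byte record starts at offset 0 or 16
// within a 32-byte line, so it always spans exactly 2 lines.
static_assert(cacheLinesTouched(0, 48) == 2, "aligned at line start");
static_assert(cacheLinesTouched(16, 48) == 2, "aligned at mid-line");

// Variant 1 worst case: the 12-byte pointer struct and each 12-byte vertex
// straddle a line border, giving 4 regions x 2 lines = 8 misses.
static_assert(4 * cacheLinesTouched(28, 12) == 8, "worst case of variant 1");
```

Without the alignment guarantee, a 48-byte record could start at, say, offset 20 and touch three lines, which is why the slide stresses "aligned".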

General Optimizations: Cache
· This was only one example. Similarly for
  – BSP nodes (even more important)
  – Triangle lists
  – Materials
  – Shading data
  – …

General Optimizations: Simplification
Today's CPUs have very long pipelines
· Simplify the code to avoid pipeline stalls
  – Choose simple algorithms
    • “KISS” wins… (KISS = keep it simple and stupid)
    • E.g. BSP-tree traversal simpler than grids
    • Easier to maintain and optimize (e.g. prefetching)
  – Write tight inner loops
    • E.g. better caching and handling of branches
  – Avoid conditionals/relative jumps in inner loops
    • E.g. support only triangles

Most CPUs provide SIMD extensions
Intel: SSE (Others: 3DNow!, AltiVec, ...)
· Use SIMD: higher speed & lower bandwidth
  – Up to four parallel floating point operations for the cost of 1!
  – Fetch data once to reduce bandwidth to cache
    • Amortize loading cost over 4 operations → factor 4 in bandwidth reduction
  – Overhead due to restricted instruction set
    • E.g. no 'SSE dot product'
  – Con: Programming in assembly language

How to use SIMD Extensions?
· Either: Instruction-parallel
  – Combine 4 computations in 'normal' algorithm
  – E.g. the 4 mults in a dot product
· Or: Data-parallel
  – Run algorithm on 4 different data in parallel
  – E.g. 4 independent dot products
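
The difference between the two styles can be sketched in portable C++. Here a `std::array<float, 4>` stands in for one 4-wide SIMD register and plain loops stand in for SSE instructions, so this only illustrates the data layout, not actual SSE code; `Lanes`, `Vec3x4`, `dotSingle`, and `dotFour` are hypothetical names.

```cpp
#include <array>

using Lanes = std::array<float, 4>;  // stand-in for one 4-wide SIMD register

// Instruction-parallel: ONE dot product of two 4-vectors. The four multiplies
// map to a single SIMD multiply, but the final horizontal sum needs extra
// shuffle/add steps, which is where the overhead comes from.
float dotSingle(const Lanes& a, const Lanes& b) {
    Lanes prod;
    for (int i = 0; i < 4; ++i) prod[i] = a[i] * b[i];  // one SIMD multiply
    return (prod[0] + prod[1]) + (prod[2] + prod[3]);   // horizontal reduction
}

// Data-parallel: FOUR independent 3D dot products in structure-of-arrays form.
// Each lane holds a different vector; no horizontal operations are needed.
struct Vec3x4 { Lanes x, y, z; };

Lanes dotFour(const Vec3x4& a, const Vec3x4& b) {
    Lanes r;
    for (int i = 0; i < 4; ++i)  // three SIMD multiplies and two SIMD adds
        r[i] = a.x[i] * b.x[i] + a.y[i] * b.y[i] + a.z[i] * b.z[i];
    return r;
}
```

In the data-parallel form every arithmetic step uses all four lanes, which is the property the following slides rely on when tracing four rays at once.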

SIMD: Intersection
· SIMD best used in data parallel fashion
  – Little instruction-level parallelism (in RT): just doesn't work…
  – Data parallel: 1 ray against 4 triangles
    • Hard to always have four triangles ready
    • Data parallel traversal for 1 ray?
  – Data parallel: 4 rays against 1 triangle
    • Must traverse rays in parallel → ray packets
    • Standard intersection code
    • Overhead for terminated rays (e.g. 1 ray hits, 3 rays miss)
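
A sketch of the "4 rays against 1 triangle" variant follows. It reuses the Möller-Trumbore test per lane and returns a hit mask; the loop over `i` stands in for one SIMD lane per iteration. This is an illustration under assumed names (`RayPacket`, `Tri`, `intersectPacket`), not the authors' SSE implementation.

```cpp
#include <array>
#include <cmath>

using Lanes = std::array<float, 4>;

struct RayPacket {
    Lanes ox, oy, oz;  // ray origins, one ray per lane
    Lanes dx, dy, dz;  // ray directions
    Lanes tHit;        // closest hit distance found so far, per ray
};

struct Tri { float a[3], e1[3], e2[3]; };  // vertex a and edges b-a, c-a

// Tests all four rays against one triangle. Bit i of the result is set if
// ray i hits closer than its current tHit[i], which is then updated.
unsigned intersectPacket(RayPacket& r, const Tri& t) {
    unsigned mask = 0;
    for (int i = 0; i < 4; ++i) {  // one SIMD lane per iteration
        const float d[3] = {r.dx[i], r.dy[i], r.dz[i]};
        const float s[3] = {r.ox[i] - t.a[0], r.oy[i] - t.a[1], r.oz[i] - t.a[2]};
        const float p[3] = {d[1] * t.e2[2] - d[2] * t.e2[1],
                            d[2] * t.e2[0] - d[0] * t.e2[2],
                            d[0] * t.e2[1] - d[1] * t.e2[0]};
        const float det = t.e1[0] * p[0] + t.e1[1] * p[1] + t.e1[2] * p[2];
        const bool valid = std::fabs(det) > 1e-7f;
        const float inv = valid ? 1.0f / det : 0.0f;
        const float q[3] = {s[1] * t.e1[2] - s[2] * t.e1[1],
                            s[2] * t.e1[0] - s[0] * t.e1[2],
                            s[0] * t.e1[1] - s[1] * t.e1[0]};
        const float u  = (s[0] * p[0] + s[1] * p[1] + s[2] * p[2]) * inv;
        const float v  = (d[0] * q[0] + d[1] * q[1] + d[2] * q[2]) * inv;
        const float tt = (t.e2[0] * q[0] + t.e2[1] * q[1] + t.e2[2] * q[2]) * inv;
        // All lanes do the full computation; the mask decides who keeps the result.
        const bool hit = valid && u >= 0.0f && v >= 0.0f && u + v <= 1.0f &&
                         tt > 1e-7f && tt < r.tHit[i];
        if (hit) { r.tHit[i] = tt; mask |= 1u << i; }
    }
    return mask;
}
```

Note that every lane performs the whole computation even when only one ray can hit; that wasted work is the overhead for terminated rays mentioned above.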

Coherent Algorithm: Tracing Ray Packets
Many rays are very similar, e.g. primary and shadow rays, but others too
· Handle rays together in packets of 4 rays
  – Process them in lock-step (→ SIMD)
  – Reorder computations to be partly breadth-first
  – Load data once and use it for all rays
· Speedup
  – Prerequisite: Expose coherence in ray-tracing algorithm
  – Factor >5: General optimizations
  – Factor >2: SIMD computations
  – Further optimizations are possible
    • Better prefetching, more efficient shading
· Performance
  – 200K to 1.5M primary rays/s (800 MHz, P-III)
  – Almost linear in # of reflection & shadow rays

Comparison: Test Scenes
Soda Hall 8M OOM OOM OOM 0.8

Comparison: Scaling with Scene Size
· Render time of subsampled terrain (seconds per frame)
  – Typical linear scaling of rasterization HW
  – Worst case for RT: No occlusion
  – Only 1 CPU!

Demo / Video

Distributed RT of Massive Models

Reference Model (12.5 Mtris)

Previous Work
· Rendering of Massive Models [Aliaga 99]
  – Framerate: 5 to 15 fps for single power plant
    • Needs shared-memory supercomputer (SGI)
  – Framework of algorithms
    • Textured depth meshes (96% reduction in #tris)
    • View-frustum culling & LOD (50% each)
    • Hierarchical occlusion maps (10%)

Distributed RT of Massive Models
· Ray-Tracing and massive models are a natural match:
  – Logarithmic scaling in #primitives
    • Ideal for big models
  – Preprocessing
    • Simple and fast spatial sorting, fully automatic
  – Distributed computing
    • Parallel scalability to many networked computers
    • No scene replication
· Our Approach: Use coherent ray-tracing
  – Caching of scene data in network
  – Deal with network issues by reordering

Ray-Tracing Issues
· Distributed Scene Management
  – Several GB of scene data
    • File size and virtual address space (32 bit)
  – Cannot use OS caching (demand paging)
    • Cache miss will stall the entire process
  – 1 ms network latency = time to trace several hundred rays
    • Reordering would need non-blocking memory read
· Need to handle cache manually
  – No longer limited by address space
  – Allows reordering of computations
    • Do not wait for missing data
    • Continue with other rays while data is being fetched…
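
The reordering idea can be sketched as a small scheduler: rays whose scene data is already cached are traced immediately, while the others are suspended and their data is requested once, without blocking. Everything here (`PendingRay`, `VoxelScheduler`, integer voxel ids, the fetch queue) is a hypothetical simplification; a real system would issue asynchronous network requests to the model server.

```cpp
#include <deque>
#include <unordered_set>
#include <vector>

struct PendingRay { int rayId; int voxelId; };  // voxel the ray needs next

class VoxelScheduler {
public:
    // Called when requested data has arrived from the (assumed) model server.
    void markResident(int voxel) {
        resident_.insert(voxel);
        requested_.erase(voxel);
    }

    // One pass over the ray queue: trace what can be traced now, suspend the
    // rest. Returns the ids of the rays traced in this pass; `rays` is left
    // holding only the suspended rays.
    std::vector<int> processPass(std::deque<PendingRay>& rays) {
        std::vector<int> traced;
        std::deque<PendingRay> suspended;
        while (!rays.empty()) {
            const PendingRay r = rays.front();
            rays.pop_front();
            if (resident_.count(r.voxelId)) {
                traced.push_back(r.rayId);  // data cached: trace immediately
            } else {
                // Request each missing voxel only once, then move on.
                if (requested_.insert(r.voxelId).second)
                    fetchQueue_.push_back(r.voxelId);
                suspended.push_back(r);
            }
        }
        rays.swap(suspended);
        return traced;
    }

    const std::vector<int>& fetchQueue() const { return fetchQueue_; }

private:
    std::unordered_set<int> resident_;   // voxels currently in the cache
    std::unordered_set<int> requested_;  // voxels with a fetch in flight
    std::vector<int> fetchQueue_;        // outgoing non-blocking requests
};
```

With demand paging the first miss would stall the whole process for the full network latency; here the client keeps tracing the rays it can serve from its cache in the meantime.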

Massive Models: Caching
· 2-Level BSP-Trees
  – Caching based on “voxels”
  – Voxels are completely self-contained

Structure of the BSP-Tree

Distribution Issues
· Preprocessing
  – Simple spatial sorting
  – Need out-of-core algorithm due to model size
  – Simplistic implementation: 2.5 hours
    • Estimated with optimizations: < 30 min
· Model Server
  – Single server provides all model data
    • Potential bottleneck
  – Should be distributed as well
    • At least for more than 10 clients
    • Trivial to implement

Distribution Issues
· Load Balancing
  – Tile based (32x32 pixels)
  – Demand driven
  – Avoid idle times
· Frame-to-Frame Coherence
  – Keep rays on the same client
    • Simple: Keep tiles on the same client
    • Better: Assign tiles based on reprojected pixels
  – Larger effective cache size
    • Increases with number of clients
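
A minimal, single-threaded sketch of demand-driven tile assignment with the "simple" form of frame-to-frame coherence: an idle client first gets a tile it rendered in the previous frame (its cache is likely still warm) and otherwise takes any remaining tile so that it never idles. `Tile`, `makeTiles`, and `TileDispatcher` are hypothetical names; the actual system runs this logic across networked clients.

```cpp
#include <algorithm>
#include <vector>

struct Tile { int x0, y0, x1, y1; };  // pixel bounds, x1/y1 exclusive

// Split the image into tiles of tileSize x tileSize pixels (edge tiles clipped).
std::vector<Tile> makeTiles(int width, int height, int tileSize = 32) {
    std::vector<Tile> tiles;
    for (int y = 0; y < height; y += tileSize)
        for (int x = 0; x < width; x += tileSize)
            tiles.push_back({x, y, std::min(x + tileSize, width),
                                   std::min(y + tileSize, height)});
    return tiles;
}

class TileDispatcher {
public:
    explicit TileDispatcher(int numTiles)
        : lastOwner_(numTiles, -1), done_(numTiles, false) {}

    void beginFrame() { done_.assign(done_.size(), false); }

    // Called by an idle client. Returns a tile index, or -1 when the frame
    // has no tiles left.
    int requestTile(int client) {
        int fallback = -1;
        for (int i = 0; i < static_cast<int>(done_.size()); ++i) {
            if (done_[i]) continue;
            if (lastOwner_[i] == client) return take(i, client);  // warm cache
            if (fallback < 0) fallback = i;
        }
        // No tile of its own left: take any open tile rather than idle.
        return fallback < 0 ? -1 : take(fallback, client);
    }

private:
    int take(int i, int client) {
        done_[i] = true;
        lastOwner_[i] = client;
        return i;
    }
    std::vector<int> lastOwner_;  // client that rendered each tile last frame
    std::vector<bool> done_;      // tiles already handed out this frame
};
```

For a 640x480 image, `makeTiles` yields 20 x 15 = 300 tiles. The "better" variant from the slide would replace the `lastOwner_` lookup with an assignment based on reprojected pixels.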

Results
· Setup
  – Seven dual Pentium-III 800-866 MHz
  – Fast Ethernet (100 Mbit) for normal clients
  – Gigabit Ethernet only for display & model server
· Performance for one Power Plant
  – 4-5 fps without SSE optimization
  – Factor 2 speedup with SSE
  – Almost perfect scaling from 1 to 14 CPUs
    • Never tried any more than that

Animation: Framerate vs. Bandwidth

Speedup

Demo / Video
Afrigraph 2001, Capetown, ZA Tutorial on Interactive Raytracing
Ray-Tracing Hardware
· Summary so far:
– RT has many technical advantages
– Better performance for large scenes (log N vs. N)
– Better image quality, more features
– But: High initial cost on main CPU
Hardware support would help
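The "log N vs. N" point in the summary can be made concrete with a rough count. The sketch below assumes an idealized balanced binary hierarchy over the triangles; the function name and the tree model are illustrative assumptions, not the acceleration structure used in the tutorial.

```python
import math


def traversal_steps(num_triangles):
    """Depth of a balanced binary tree over the triangles.

    Used here as an idealized stand-in for the per-ray traversal cost;
    real acceleration structures are not perfectly balanced.
    """
    return math.ceil(math.log2(num_triangles))


# A rasterizer has to touch every triangle each frame (cost ~ N), while
# each ray only descends the hierarchy (cost ~ log N in this model).
for n in (10_000, 1_000_000, 50_000_000):
    print(f"{n:>11,} triangles -> about {traversal_steps(n)} levels per ray")
```

The per-ray figure still has to be multiplied by the number of rays per frame, which is where the high initial cost on the main CPU comes from: for small scenes the constant dominates, and the logarithmic growth pays off only as N becomes large.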
Ray-Tracing Hardware: Why today?
The setting has changed
– "Real" scenes aren't suited for rasterization any more
• High depth complexity
• Large scenes, small triangles
• Shading becomes more expensive
• Demand for more features (shading, programmability)
Advantages of ray tracing finally come into play
– Also: Flops aren't that expensive any more
• Number of Gigaflops per GeForce?
– Neither is memory…
Ray-Tracing Hardware: Previous Work
· Over the last decade: Several research systems
– Often suffered from lack of resources
· Volume-Ray-Casting systems
– Full volume ray casting on a chip
– Many, some already commercially successful
Ray-Tracing Hardware: The SHARP Architecture
· SHARP architecture: Tim Purcell, Stanford
– Mixed SW/HW approach
· Based on SmartMemories [Mai 00]
– "Multiprocessor on a Chip"
– Roughly 64 R10k, with 8 GB/s (!) memory bandwidth
[Figure: SmartMemories organization. Labels: Chip, Tile, Quad, Processor, Interconnect, 16 x 8Kb SRAM, Quad Network]
Ray-Tracing Hardware: The SHARP Architecture
· Conclusions from SHARP (Also see Siggraph 2001, Course 13)
– Simple caching works very well
• Good ray coherence
– Off-chip bandwidth is minimal
• Simple memory access design
– Reconfigurability allows adapting to demands
• Adapt number of shading/traversal units to scene
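Why simple caching benefits from ray coherence can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the LRU policy, the cache size, and the two synthetic access streams are invented for the example and say nothing about SHARP's actual cache design or measured hit rates.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Fraction of accesses served from a small LRU cache of `capacity` blocks."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used block
    return hits / len(accesses)


# Synthetic streams over 100 scene blocks (hypothetical data):
# coherent rays touch the same block 8 times in a row,
# incoherent rays jump to a different block on every access.
coherent = [i // 8 for i in range(800)]
incoherent = [(i * 37) % 100 for i in range(800)]

print(lru_hit_rate(coherent, capacity=4))
print(lru_hit_rate(incoherent, capacity=4))
```

With the same tiny cache, the coherent stream is served almost entirely from cache while the scattered stream misses on every access, which is the effect that keeps off-chip bandwidth low.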
Ray-Tracing Hardware: Other Architectures
· RAYA (MERL, Siggraph 2001, Course 13)
– Based on "Memory Coherent Ray-Tracing" [Pharr]
· CORA (Saarbrücken)
– Hardware version of Coherent RT Algorithm
– Custom-design chip
– Est. performance: ~30/25 fps at 1024x768
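To put the estimated frame rates in terms of ray throughput, a back-of-the-envelope calculation helps. The sketch assumes one primary ray per pixel and treats the two quoted numbers as two separate frame rates; neither assumption is stated on the slide, and secondary rays (shadows, reflections) would add to the totals.

```python
def primary_rays_per_second(width, height, fps, rays_per_pixel=1):
    """Primary-ray throughput needed to sustain `fps` at the given resolution.

    `rays_per_pixel=1` is an assumption for this estimate.
    """
    return width * height * rays_per_pixel * fps


for fps in (30, 25):
    rate = primary_rays_per_second(1024, 768, fps)
    print(f"{fps} fps at 1024x768 -> {rate / 1e6:.1f} million primary rays/s")
```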
Tutorial Overview
· Introduction
– Introduction to Ray-Tracing
– Discussion: Ray-Tracing versus Rasterization
· Interactive Ray-Tracing on PCs
– Coherent Ray-Tracing Implementation
– Comparisons (SW / HW)
– Distributed RT of Massive Models
· Outlook: Hardware-Architectures for Ray-Tracing
· Future Research and Conclusions
What you should take home with you…
· Interactive Ray Tracing IS feasible
– If attention is paid to the underlying hardware…
· It's not only feasible, it's already there
– Not only a theoretical fantasy any more…
– And even on cheap PCs
· Not only better, it can even be faster
– At least for certain applications
The Future
· IRT enables completely new applications
– Just think what has been done with OpenGL
– Large scale visualization: engineering, …
• Handling of huge models
– Interactive global illumination (?)
• Need to adapt algorithms to new situation
– Flexible rendering
• Gaze tracking and non-uniform sampling density
• Image-Based or Frameless rendering
Question: What can IRT do for you?
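The "gaze tracking and non-uniform sampling density" idea relies on the fact that a ray tracer can decide per pixel whether to shoot a ray. A minimal sketch of one possible density function follows; the function name, the exponential falloff, and all parameter values are assumptions chosen for illustration, not something proposed in the tutorial.

```python
import math


def sampling_density(px, py, gaze_x, gaze_y,
                     full_radius=64.0, falloff=128.0, floor=0.05):
    """Fraction of pixels to trace around (px, py), given the gaze position.

    Full density within `full_radius` pixels of the gaze point, exponential
    decay beyond it, and never less than `floor` (all values hypothetical).
    """
    dist = math.hypot(px - gaze_x, py - gaze_y)
    if dist <= full_radius:
        return 1.0
    return max(floor, math.exp(-(dist - full_radius) / falloff))


# Viewer looks at the centre of a 1024x768 image.
print(sampling_density(512, 384, 512, 384))  # at the gaze point
print(sampling_density(800, 384, 512, 384))  # mid periphery
print(sampling_density(0, 0, 512, 384))      # far corner, clamped to the floor
```

A renderer could trace a pixel only when a random number falls below this density and fill the remaining pixels by interpolation or reuse of earlier samples.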
Open Research Problems
· Can we make it even faster?
· Hardware
– What is the best HW architecture?
· Dynamic Scenes
– Optimized rebuild or transformation of index?
· API
– Better alternative to OpenGL's "push model"?
– OpenGL not suited for Ray-Tracing
· Global Illumination
– Efficient new algorithms
Acknowledgements
· AMD
– Generous support, sponsoring and collaboration
– Soon: 24-node dual-Athlon IV, 1.5 GHz cluster
· Presenters of the Siggraph 2001 Course 13
– Images, material, and information
· Tim Purcell & Pat Hanrahan (Stanford)
– Many discussions and ideas
· The Max-Planck-Institute at Saarbruecken
– Collaboration and use of their Graphics Hardware
· C. Benthin & M. Wagner & others
– Work on the RT implementation and discussions
The Future
· Applications on compute clusters
– Visualization of large models
– Previewing of animations with full shading
· Hardware support for IRT
– At least for specialized applications
· Convergence between RT and TR
– Occlusion culling
– Improved shading capabilities
– Eventually based on the same API?
Open Research Problems: Global Illumination
· New situation
– Ray-tracing bottleneck is gone (Well, almost…)
· New challenges
– Need for coherence
– Efficient computations
– Usage of view-importance
– High degree of parallelism
– Small communication overhead
– Interactivity !!!
– Can we trade quality for speed?