A Hardware Pipeline for Accelerating Ray Traversal Algorithms

Apr 07, 2018

Lija Mohan
  • 8/6/2019 A Hardware Pipeline for Accelerating Ray Traversal Algorithms

    1/59

    A Hardware Pipeline for Accelerating Ray Traversal Algorithms on Streaming Processors

    Introduction

    Ray tracing

    Ray tracing algorithms

    Ray traversal hardware pipeline

    Streaming processors

    GPGPU

    Performance degradation of 1.5X-2.5X

    Roll No: 7, MTech CSIS, FISAT, January 11

    Introduction

    2 stage traversal process

    1. Hardware implementation

    2. User defined algorithm

    Introduction

    Performance Simulator created

    streaming processor architecture

    Kd tree as software traversal algorithm

    Software traversal steps reduced by 32X

    Instructions executed reduced by 2.15X

    Previous Work

    Accelerated Data Structures

    Hierarchical Space Subdivision Schemes

    Bounding Volume Hierarchies

    GPU implementations

    Vector operations

    Graphics Hardware

    Large programmable multi-core architectures

    Graphics computations in parallel

    Multiple threads on each processor

    Software kernels

    Vector operations and vectorized processors

    Pipeline Traversal Algorithm

    Group Uniform Grid (GrUG)

    Axis-aligned subdivision of space

    Two hierarchical layers

    Top Layer

    Lower Layer

    Grid Concepts

    Spatial Subdivisions

    Stepping Between Neighbours

    DDA method is used

    tmax, delta and step
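The three quantities named on this slide are the per-axis state of a 3D-DDA grid walk. The following is a minimal sketch of that walk, assuming a grid of unit-sized cells anchored at the origin. The function name `dda_cells` and its parameters are illustrative and do not come from the slides.

```python
import math

def dda_cells(origin, direction, grid_size, max_steps=64):
    """Yield the integer cells a ray visits in a cubic grid of unit cells."""
    cell = [int(math.floor(c)) for c in origin]
    step, t_max, delta = [], [], []
    for o, d, c in zip(origin, direction, cell):
        if d > 0:
            step.append(1)
            t_max.append((c + 1 - o) / d)  # ray parameter at the next cell boundary
            delta.append(1.0 / d)          # ray parameter needed to cross one cell
        elif d < 0:
            step.append(-1)
            t_max.append((c - o) / d)
            delta.append(-1.0 / d)
        else:
            step.append(0)
            t_max.append(math.inf)         # this axis never reaches a boundary
            delta.append(math.inf)
    for _ in range(max_steps):
        if not all(0 <= c < grid_size for c in cell):
            return
        yield tuple(cell)
        axis = t_max.index(min(t_max))     # axis whose boundary is crossed first
        cell[axis] += step[axis]
        t_max[axis] += delta[axis]
```

Each step only compares three `t_max` values and performs one addition, which is why the method suits stepping between neighbouring cells.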

    Ray projection from original GrUG grouping in A to next GrUG grouping in B. To compute the next point along the ray for the hash function, the ray is projected by the tmin value.
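The projection described in this caption can be sketched as evaluating the ray equation at a given parameter value. The small offset `eps`, used so the projected point lands inside the next grouping rather than on the shared boundary, is an assumption of this sketch and is not stated on the slide; `project_ray` is an illustrative name.

```python
def project_ray(origin, direction, t, eps=1e-4):
    """Return the point at parameter t (plus a small offset) along the ray."""
    return tuple(o + d * (t + eps) for o, d in zip(origin, direction))
```

The resulting point is what would be fed to the hash function to find the next GrUG grouping.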

    KD-Tree

    [Figure: kd-tree with split nodes X, Y, Z over leaf regions A, B, C, D; the ray segment is marked with tmin and tmax]

    KD-Tree Traversal

    [Figure: kd-tree with split nodes X, Y, Z over leaf regions A, B, C, D]

    Observation

    [Figure: kd-tree with split nodes X, Y, Z over leaf regions A, B, C, D]

    Current leaf's tmax = next leaf's tmin

    Overview of GrUG

    2 spatial separation methods

    Uniform Grid

    GrUG groups

    Traversal of GrUG

    Hash Table

    Performs 2 mappings

    Input: ray location

    Output: memory address of GrUG group

    Hash function starting with X, Y, Z coordinates and outputting the memory address of a GrUG grouping that can be passed to a software traversal algorithm.

    Hash function implementation

    3 axes concatenated to form CellID

    Allows parallel processing
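The concatenation of the three axis values into a CellID can be sketched as bit packing. The axis order (X in the highest bits) and the helper names `make_cell_id` and `split_cell_id` are assumptions of this sketch; the default of 9 bits per axis follows the 512 grid mentioned elsewhere in the deck.

```python
def make_cell_id(ix, iy, iz, bits=9):
    """Concatenate three per-axis cell indices into one integer cell ID."""
    mask = (1 << bits) - 1
    if not all(0 <= i <= mask for i in (ix, iy, iz)):
        raise ValueError("cell index does not fit in the given bit width")
    return (ix << (2 * bits)) | (iy << bits) | iz

def split_cell_id(cell_id, bits=9):
    """Recover the three per-axis indices from a concatenated cell ID."""
    mask = (1 << bits) - 1
    return (cell_id >> (2 * bits)) & mask, (cell_id >> bits) & mask, cell_id & mask
```

Because each axis occupies its own bit field, the three indices can be computed independently and joined at the end, which is consistent with the slide's note about parallel processing.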

    Hash Function Implementation

    Architecture of Group Uniform Grid

    Data Structure Creation

    2 memory spaces

    Hash table

    User defined tree data structure

    Starts at GrUG groupings

    Kd tree is used

    Uniform grid structure

    Only leaf nodes need to be present in memory
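One way to picture the hash table memory space is as a flat array with one entry per grid cell, where every cell covered by a GrUG grouping stores a reference to that grouping. This is a minimal sketch under stated assumptions: groupings are given as inclusive axis-aligned cell ranges, and a list index stands in for the memory address of the grouping's tree root. The names `build_hash_table` and `lookup` are illustrative.

```python
def build_hash_table(grid_size, groups):
    """Map every grid cell to the index of the grouping that covers it.

    groups is a list of ((x0, y0, z0), (x1, y1, z1)) inclusive cell ranges.
    """
    table = [-1] * grid_size ** 3  # -1 marks a cell not covered by any grouping
    for index, ((x0, y0, z0), (x1, y1, z1)) in enumerate(groups):
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                for z in range(z0, z1 + 1):
                    table[(x * grid_size + y) * grid_size + z] = index
    return table

def lookup(table, grid_size, cell):
    """Return the grouping index stored for a cell."""
    x, y, z = cell
    return table[(x * grid_size + y) * grid_size + z]
```

For example, a 4 x 4 x 4 grid split into two slabs along X needs only two groupings, yet all 64 cells resolve to one of them with a single table read.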

    Pipeline Architecture

    Standalone processing block inside processor

    Fixed Hardware

    Memory address registers

    Ray Projection

    Ray undergoes GrUG traversal

    Read bounding box of the GrUG groups

    tmax value is computed
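Computing tmax from a grouping's bounding box can be sketched with the standard slab test for a ray against an axis-aligned box. The slides do not say which method the pipeline uses, so the slab test here is an assumption, and `ray_box_interval` is an illustrative name.

```python
import math

def ray_box_interval(origin, direction, box_min, box_max):
    """Return (tmin, tmax) where the ray overlaps the box, or None on a miss."""
    t_min, t_max = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            if o < lo or o > hi:
                return None        # parallel to this slab and outside it
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0        # order the entry and exit parameters
        t_min, t_max = max(t_min, t0), min(t_max, t1)
        if t_min > t_max:
            return None            # slab intervals do not overlap
    return t_min, t_max
```

The returned tmax is the parameter at which the ray leaves the box, which is the value needed to project the ray into the next grouping.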

    Pipeline architecture

    Rays per clock cycle

    Pipeline stages can be vectorized

    Ideal for streaming processors

    Integration of the GrUG pipeline into a multi-core graphics processor and the fixed hardware stages for the GrUG pipeline.

    Hash Function

    Determine grid cell of a ray

    Grid cell id to memory address

    Locate root node for software traversal

    Input: Ray location (x,y,z)

    Output: 9-bit value from each hash function pipeline

    Maximum supported grid size: 512 X 512 X 512

    Floating-point values from -1.0 to 1.0
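The per-axis mapping from a coordinate in the range -1.0 to 1.0 to a 9-bit value can be sketched arithmetically. The hardware presumably derives the value from the floating-point representation directly; this software version, with the illustrative name `axis_index`, is only an assumed equivalent.

```python
def axis_index(value, bits=9):
    """Map a coordinate in [-1.0, 1.0] to an integer cell index of `bits` bits."""
    cells = 1 << bits
    if not -1.0 <= value <= 1.0:
        raise ValueError("coordinate outside the scene range")
    index = int((value + 1.0) * 0.5 * cells)
    return min(index, cells - 1)  # value == 1.0 belongs to the last cell
```

With 9 bits there are 512 cells per axis, which gives the 512 X 512 X 512 maximum grid when the three axes are combined.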

    Architecture of GrUG hash function for one axis using a 512 grid

    Implementation

    Simulator

    GPGPU-Sim simulator

    PTX assembly files generated by the NVIDIA NVCC compiler

    PTX assembly code modification

    Implementation

    Kernel Code

    Ray generation

    Post GrUG traversal operation

    Read selected GrUG grouping bounding box

    Compute ray's tmax value

    Kd tree algorithm

    Radius CUDA

    Ray triangle intersection

    Wald's algorithm
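To illustrate the ray triangle intersection step, the sketch below uses the Möller-Trumbore test. This is a substitute chosen for brevity, not Wald's algorithm named on the slide, which relies on precomputed per-triangle data. The function name `ray_triangle` and the tolerance `eps` are illustrative.

```python
def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the ray parameter t of the hit point, or None if there is no hit."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                # ray is parallel to the triangle plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv            # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv    # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None  # ignore hits behind the ray origin
```

In a kd-tree tracer this test runs only against the triangles stored in the leaf the ray is currently visiting.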

    Kernel Code

    Benchmark Scenes

    8 scenes

    Resolution 512 X 512

    Results

    a) Performance

    [Figure: relative speedup over brute-force intersection for the Box, Bunny, Robots and Kitchen scenes; one data label, 12.9, survives]

    Performance Results

    Reduced the number of tree traversal steps by 32.5X for visible rays.

    Overall speedup: average 1.6X for visible rays

    Performance for a grid size of 128 is improved over the software implementation by 1.9X, compared to 2.15X for a grid size of 512.

    Conference benchmark scene at resolution 128

    Results

    b) Memory

    Memory Requirements

    Overhead of storing hash table in memory

    4 bytes / grid cell -> 4,294,967,296 GrUG groups

    512 MB hash table

    2 bytes / grid cell -> 65,536 GrUG groups

    256 MB hash table

    Smaller grid size -> up to 4 MB hash table

    128 grid size -> 1.5 times memory of kd tree

    512 grid size -> 27.6 times memory of kd tree
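The hash table sizes on this slide can be reproduced with simple arithmetic. The sketch assumes that the 512 MB and 256 MB figures refer to a 512 X 512 X 512 grid and that the 4 MB figure refers to a 128 X 128 X 128 grid with 2-byte entries; the slide does not state this explicitly, but the numbers match. The helper names are illustrative.

```python
MIB = 1024 * 1024

def hash_table_bytes(grid_size, bytes_per_cell):
    """Size of a hash table with one entry per cell of a cubic grid."""
    return grid_size ** 3 * bytes_per_cell

def max_groups(bytes_per_cell):
    """Number of distinct GrUG groups an entry of this width can address."""
    return 2 ** (8 * bytes_per_cell)
```

For instance, `hash_table_bytes(512, 4) // MIB` evaluates to 512 and `max_groups(2)` to 65536, matching the slide.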

    Memory Requirements

    Smaller grid sizes are more efficient

    Balance between performance and memory

    Stores kd tree structure

    bounding dimensions of threshold nodes

    Similar memory requirement to storing a full kd tree.

    Results

    c) Bandwidth

    Bandwidth requirements

    Average memory bandwidth per frame is smaller

    Fewer down-tree traversals -> fewer device memory transactions

    Bandwidth is used for post-GrUG software traversal

    GrUG memory bandwidth + down-tree traversal < down traversals by full software implementation

    Advantages

    Maintains user programmability

    Increases ray tracing performance

    Diverse implementation scope

    Conclusion

    New graphics hardware architecture

    Small fixed hardware pipeline

    Offload part of the acceleration traversal computations

    Diverse implementation scope of processor architecture

    User programmability

    Overall run time performance

    Future Work

    References

    [1] Algorithm for 3D digital differential algorithm. CG351-551 Raytracing Algorithm for 3DDDA.htm

    [2] Introduction to GRIDS. flipcode - Raytracing Topics & Techniques.mht

    [3] KD-Tree Acceleration Structures for a GPU Raytracer. Tim Foley, Jeremy Sugerman, Stanford University.

    [4] Design and Evaluation of a Hardware Accelerated Ray Tracing Data Structure. Michael Steffen and Joseph Zambreno, Department of Electrical and Computer Engineering, Iowa State University, USA.

    [5] Analyzing CUDA Workloads Using a Detailed GPU Simulator. Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, Henry Wong and Tor M. Aamodt, University of British Columbia, Vancouver, BC, Canada, {bakhoda,gyuan,wwlfung,henryw,aamodt}@ece.ubc.ca

    [6] Ray Tracing on a GPU with CUDA: Comparative Study of Three Algorithms. Martin Zlatuka, Czech Technical University in Prague, Faculty of Electrical Engineering, Czech Republic, zlatum1{@}fel.cvut.cz

    [7] Wikipedia, Ray Tracing basics.

    Thank you
