Real-Time Graphics Architecture
Lecture 4: Parallelism and Communication
Kurt Akeley, Pat Hanrahan
4/18/2007
http://graphics.stanford.edu/cs448-07-spring/
Topics
1. Frame buffers
2. Types of parallelism
3. Communication patterns and requirements
4. Sorting classification for parallel rendering (with examples)
CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007
Frame Buffers
Raster vs. calligraphic
Raster (image order)
Dominant choice
Calligraphic (object order)
Earliest choice (Sketchpad)
E&S terminals in the 70s and 80s
Works with light pens
Scene complexity affects frame rate
Monitors are expensive
Still required for FAA simulation
Increases absolute brightness of light points
Frame buffer definitions
What is a frame buffer?
What can we learn by considering different definitions?
Frame buffer definition #1
Storage for commands that are executed to refresh the display
Allows for raster or calligraphic display (e.g. Megatech)
“Frame buffer” for calligraphic display is a “display list”
OpenGL “render list”?
Key point: frame buffer contents are interpreted
Color mapping
Image scaling, warping
Window system (overlay, separate windows, …)
Address Recalculation Pipeline
Frame buffer definition #2
Image memory used to decouple the render frame rate from the display frame rate
Meets common understanding of frame buffer as image
Leads naturally to double buffering
One render buffer, one display buffer, swap
n-buffering also possible, can control latency
Key idea: decoupling enables general-purpose GPU
Visual simulation has high render frame rate
MCAD has low render frame rate
Window manager has no frame rate
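The decoupling described in definition #2 can be sketched in a few lines. This is a minimal illustration, not anything shown in the lecture: the `DoubleBuffer` class, its method names, and the `draw_frame` helper are all hypothetical, and the "image" is just a list of rows.

```python
class DoubleBuffer:
    """Two image buffers: one scanned out to the display, one rendered into."""

    def __init__(self, width, height, clear=0):
        self.front = [[clear] * width for _ in range(height)]  # display buffer
        self.back = [[clear] * width for _ in range(height)]   # render buffer

    def render(self, draw):
        """Run a drawing callback against the render (back) buffer only."""
        draw(self.back)

    def swap(self):
        """Exchange roles so the finished frame becomes the displayed one."""
        self.front, self.back = self.back, self.front

    def scan_out(self):
        """What a display refresh reads; it can run at its own rate."""
        return [row[:] for row in self.front]


def draw_frame(value):
    """Hypothetical renderer: fills every pixel with one value."""
    def draw(buf):
        for row in buf:
            for x in range(len(row)):
                row[x] = value
    return draw


fb = DoubleBuffer(4, 2)
fb.render(draw_frame(7))      # rendering proceeds at the render frame rate
before_swap = fb.scan_out()   # refresh during rendering still sees the old image
fb.swap()
after_swap = fb.scan_out()    # the new frame is visible only after the swap
```

The display never observes a partially rendered frame, so a slow renderer (MCAD) and a fast one (visual simulation) can share the same display path.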
Frame buffer definition #3
All pixel-assigned memory used to assemble and display the images being rendered
Key point: frame buffer is active participant in rendering
Leads to non-color buffers: depth, stencil, window control
OpenGL treats these buffers as part of frame buffer
Some reserve “frame buffer” for color images
Should be n-buffered in some cases (sort last)
RealityEngine frame buffer can be deeper than wide or high
History cycles through this definition
Old idea (visual simulation, window systems)
GigaPixel render tile
Frame buffer stores color images only
Depth, stencil, etc. in small tile
Dominant architecture is consistent
SGI architectures look like ATI architectures, which look like NVIDIA architectures
Details are evolving, but big picture remains the same
Why is this?
Simplicity of design
Simplicity of algorithms
Simplicity of immediate-mode approach
Simplicity of design
Frame buffer operations
Blending: merge fragment and pixel color
Depth Buffering: save nearest fragment
Stencil Buffering: simple pixel state machine
Accumulation Buffering: high-resolution color arithmetic
Antialiasing: (to be covered later)
…
All frame buffer operations:
Combine fragment and pixel data (not just a replace)
But replace operation is optimized, e.g., no parity/ECC
Are local (no intra-pixel dependencies)
Why aren’t fragment operations programmable?
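As a rough illustration of "combine fragment and pixel data", the sketch below merges one fragment into one pixel using a depth test followed by alpha blending. The `Pixel` and `Fragment` types, the "smaller depth is nearer" convention, and the specific blend equation are assumptions made for the example, not details taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Pixel:
    color: tuple = (0.0, 0.0, 0.0)
    depth: float = 1.0  # assumed convention: 1.0 is the far plane


@dataclass
class Fragment:
    color: tuple
    alpha: float
    depth: float


def depth_test_and_blend(pixel, frag):
    """Read-modify-write of a single pixel; no other pixel is consulted."""
    if frag.depth >= pixel.depth:
        return pixel  # fragment is not nearer than the stored depth: discard
    blended = tuple(
        frag.alpha * f + (1.0 - frag.alpha) * p
        for f, p in zip(frag.color, pixel.color)
    )
    return Pixel(color=blended, depth=frag.depth)


p = Pixel()
p = depth_test_and_blend(p, Fragment((1.0, 0.0, 0.0), 0.5, 0.4))  # passes
p = depth_test_and_blend(p, Fragment((0.0, 1.0, 0.0), 1.0, 0.9))  # hidden
```

The function reads the stored pixel, combines it with the incoming fragment, and writes the result back, which is the pattern all of the listed operations share.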
Simplicity of algorithms
Frame buffer employs brute-force simplicity
Hidden surface elimination: Depth-buffer vs. sort/painter
Capping: Stencil-based vs. object calculations
Image-space algorithm is efficient
Just samples, never “object” information, locality
Just-in-time calculation, steady cost function
Accumulation Buffer (high-resolution color arithmetic)
The Accumulation Buffer, Haeberli and Akeley, Proceedings of SIGGRAPH ’90
Volume rendering using 3D textures
Multi-pass rendering
Interactive Multi-pass Programmable Shading, Peercy, Olano, Airey, and Ungar, Proceedings of SIGGRAPH ’00
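To see why "high-resolution color arithmetic" matters for multi-pass averaging, the sketch below compares accumulating scaled passes in an 8-bit integer buffer against accumulating in floating point and quantizing once at the end. The function names and pixel values are made up for illustration; this is not the algorithm from the cited paper, only the precision argument behind it.

```python
def accumulate_low_precision(passes):
    """Scale each pass into an 8-bit integer buffer; fractions are lost per pass."""
    n = len(passes)
    acc = [0] * len(passes[0])
    for image in passes:
        acc = [min(255, a + v // n) for a, v in zip(acc, image)]
    return acc


def accumulate_high_precision(passes):
    """Accumulate in floating point and quantize to 8 bits once at the end."""
    n = len(passes)
    acc = [0.0] * len(passes[0])
    for image in passes:
        acc = [a + v / n for a, v in zip(acc, image)]
    return [min(255, round(a)) for a in acc]


# Four passes over a two-pixel "image" (hypothetical values).
passes = [[10, 200], [11, 201], [11, 202], [11, 202]]
low = accumulate_low_precision(passes)
high = accumulate_high_precision(passes)
```

With these inputs the low-precision result drifts dark because each pass truncates before summing, while the high-precision result matches the true average to the nearest integer.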
Simplicity of immediate-mode
Frame buffer contents are “context”
Matches 2D/window-rendering model
[Figure: Rendering System (“Little graphics state here”) and Frame buffer (“most graphics state here”)]
Decreasing display bandwidth burden
Historically display bandwidth was a limiting factor
Parallelism: using multiple computational units to process work in parallel
Communication: connecting the computational units to allow work to be distributed and aggregated
Issues
Dependencies
Ordering
Sorting
Scalability
Computation
Bandwidth
Load balancing
Parallelism taxonomy
Hardware parallelism (simultaneous execution on multiple processors) vs. virtual parallelism (time sharing a single processor, usually with hardware support)
Data parallelism [aka “parallelism”] (same task on similar data sets) vs. task parallelism (different tasks on similar OR differing data sets)
Data parallelism, hardware: frame-parallelism (batch, SGI N-clops); object-parallelism (geometry); image-parallelism (fragment/pixel)
Data parallelism, virtual: multi-processing (graphics context switching); multi-threading (almost defines a GPU-like processor)
Task parallelism, hardware: multi-processing (on multiple CPUs); pipelining (the graphics pipeline)
Task parallelism, virtual: multi-processing (time sharing a single CPU); multi-threading (Direct3D-10 “common-core”)
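The difference between object-parallelism and image-parallelism comes down to what is used to divide the work. The sketch below is one possible reading, under assumptions not stated in the slides: primitives are dealt round-robin to geometry units, and pixels are owned by rasterization units in an interleaved 2D pattern. Both function names and both partitioning schemes are hypothetical.

```python
def distribute_objects(primitives, n_units):
    """Object parallelism: deal primitives round-robin to geometry units."""
    return [primitives[i::n_units] for i in range(n_units)]


def owner_of_pixel(x, y, units_x, units_y):
    """Image parallelism: interleave pixel ownership over a units_x by units_y grid."""
    return (y % units_y) * units_x + (x % units_x)


# Seven primitives over three geometry units.
per_unit = distribute_objects(list(range(7)), 3)

# Pixel ownership for a 4x2 screen region over a 2x2 grid of units.
owners = [[owner_of_pixel(x, y, 2, 2) for x in range(4)] for y in range(2)]
```

In the first case any unit can end up touching any pixel; in the second, each unit only ever touches the pixels it owns, whichever primitive produced them.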
Graphics is embarrassingly parallel
Ample self-similar data sets …
Frames, vertexes, fragments, texels, pixels
With minimal dependencies
Few intra-set dependencies
Pixels (in the frame buffer) are the significant exception
Inter-set dependencies are purely sequential
“Graphics pipeline” is designed to minimize dependencies
Other graphics architectures have more dependencies
E.g., for global lighting effects
But graphics pipeline has huge redundancies
Hence many opportunities for optimization …
How hard should we work to do things wrong?
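The pixel exception can be made concrete with blending: when two fragments land on the same pixel, the result depends on the order in which they are applied. The sketch below assumes a simple "over"-style blend on a single color channel; the blend equation and the fragment values are illustrative assumptions.

```python
def over(src, src_alpha, dst):
    """Blend a source value onto a destination value (single channel)."""
    return src_alpha * src + (1.0 - src_alpha) * dst


def blend_in_order(fragments, background=0.0):
    """Apply (color, alpha) fragments to one pixel in the order given."""
    value = background
    for color, alpha in fragments:
        value = over(color, alpha, value)
    return value


frags = [(1.0, 0.5), (0.0, 0.5)]      # two half-transparent fragments
forward = blend_in_order(frags)
backward = blend_in_order(reversed(frags))
```

The two orders give different pixel values, so fragments destined for the same pixel cannot be processed in arbitrary order, which is the kind of intra-set dependency the slide singles out.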
Geometry parallelism trend (SGI)
[Figure: Transform Length and Transform Width by model (1000, 2000, G, GTX, VGX, RE, IR); vertical axis 0 to 20]
Image parallelism trend (SGI)
[Figure: Rasterization by model (1000, 2000, G, GTX, VGX, RE, IR); vertical axis 0 to 400]
The clear trend
Shorter and wider
Why?
Communication taxonomy
[Figure: communication taxonomy diagram with the labels Sorting, Object, Image, Distribution (introduced by parallelism), Routing (introduced by parallelism), Texturing, and Fundamental]
Sorting is fundamental
[Figure: communication taxonomy diagram repeated, with the labels Sorting, Object, Image, Distribution, Routing, and Texturing]
I. E. Sutherland, R. F. Sproull, and R. A. Schumacker, A characterization of ten hidden surface algorithms
Classified by order of x, y, z radix sorts
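One way to read "order of x, y, z radix sorts" is that a depth buffer acts like a bucket sort on y and x (the pixel address) followed by a minimum over z within each bucket. The sketch below illustrates that reading only; it is not taken from the cited paper, and the fragment tuple layout and the "smaller z is nearer" convention are assumptions.

```python
from collections import defaultdict


def resolve_visibility(fragments):
    """fragments: iterable of (x, y, z, color); returns {(x, y): visible color}."""
    buckets = defaultdict(list)
    for x, y, z, color in fragments:
        buckets[(y, x)].append((z, color))  # bucket by y, then x
    # Within each bucket, keep the fragment with the smallest z.
    return {
        (x, y): min(items, key=lambda item: item[0])[1]
        for (y, x), items in buckets.items()
    }


fragments = [
    (0, 0, 0.8, "red"),
    (0, 0, 0.3, "blue"),
    (1, 0, 0.5, "green"),
]
visible = resolve_visibility(fragments)
```

Other hidden-surface algorithms reach the same result with the sorts in a different order, for example sorting all primitives by z first and then painting them back to front.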
Pipelining vs. parallelism
Issue                        Task Parallelism (pipelining)   Data Parallelism
Ordering dependencies        Easy                            Challenging
Sorting dependencies         Easy                            Challenging
Computation scalability
Bandwidth scalability
Load balancing scalability
Pipelining vs. parallelism
Issue                        Task Parallelism (pipelining)   Data Parallelism
Ordering dependencies        Easy                            Challenging
Sorting dependencies         Easy                            Challenging
Computation scalability      Poor                            Challenging