8/3/2019 02 Gpu Architecture Overview s07
1/22
GPU Architecture Overview
John Owens
UC Davis
The Right-Hand Turn
[H&P Figure 1.1]
Why? [Architecture Reasons]
ILP increasingly difficult to extract from instruction stream
Control hardware dominates processors
Complex, difficult to build and verify
Takes substantial fraction of die
Scales poorly
Pay for max throughput, sustain average throughput
Quadratic dependency checking
Control hardware doesn't do any math!
Intel Core Duo: 48 GFLOPS, ~10 GB/s
NVIDIA G80: 330 GFLOPS, 80+ GB/s
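One way to read the two figures above is as ratios. The sketch below only does the arithmetic on the quoted numbers, taking "80+ GB/s" as 80 and "~10 GB/s" as 10; the names `Chip` and `flops_per_byte` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    gflops: float        # peak arithmetic rate quoted on the slide
    gbytes_per_s: float  # memory bandwidth quoted on the slide (rounded, see lead-in)

    def flops_per_byte(self) -> float:
        """Arithmetic operations available per byte of memory traffic."""
        return self.gflops / self.gbytes_per_s


core_duo = Chip("Intel Core Duo", 48.0, 10.0)
g80 = Chip("NVIDIA G80", 330.0, 80.0)

# How much more arithmetic and bandwidth the G80 figures represent.
compute_advantage = g80.gflops / core_duo.gflops
bandwidth_advantage = g80.gbytes_per_s / core_duo.gbytes_per_s

print(f"compute advantage:   {compute_advantage:.2f}x")
print(f"bandwidth advantage: {bandwidth_advantage:.2f}x")
```

On these numbers both chips offer a few FLOPs per byte of bandwidth, so on either one a kernel has to do several arithmetic operations per byte fetched to approach peak.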
AMD Deerhound (K8L)
chip-architect.com
Why? [Technology Reasons]
Industry moving from instructions per second to instructions per watt
Power wall now all-important
Traditional proc techniques are not power-efficient
We can continue to put more transistors on a chip
but we can't scale their voltage like we used to
and we can't clock them as fast
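The voltage point can be made concrete with the standard first-order model of switching power, P ~ C * V^2 * f. That model is not on the slide; it is brought in here as an assumption, as are the hypothetical function names and the idealized shrink factor `k`. A minimal sketch:

```python
def relative_dynamic_power(cap: float, voltage: float, freq: float) -> float:
    """First-order switching power, P ~ C * V^2 * f, relative to a baseline of 1.0."""
    return cap * voltage ** 2 * freq


def power_density_after_shrink(k: float, voltage_scales: bool) -> float:
    """Relative power per unit area after shrinking feature size by a factor k.

    Assumed idealized scaling: capacitance falls by 1/k, frequency rises by k,
    and k^2 times as many transistors fit in the same area. Voltage falls by
    1/k only if voltage_scales is True.
    """
    cap = 1.0 / k
    voltage = 1.0 / k if voltage_scales else 1.0
    freq = k
    per_transistor = relative_dynamic_power(cap, voltage, freq)
    transistors_per_area = k ** 2
    return per_transistor * transistors_per_area


with_voltage_scaling = power_density_after_shrink(2.0, voltage_scales=True)
without_voltage_scaling = power_density_after_shrink(2.0, voltage_scales=False)
print(with_voltage_scaling, without_voltage_scaling)
```

Under these assumptions, power density stays flat when voltage scales with the shrink and grows as k^2 when it does not, which is one way to read "more transistors, but not the same voltage or clock scaling".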
Go Parallel
Time of architectural innovation
GPUs let us explore using hundreds of processors now, not 10 years from now
Major CPU vendors supporting multicore
Interest in general-purpose programmability on GPUs
Universities must teach thinking in parallel
What's Different about the GPU?
The future of the desktop is parallel
We just don't know what kind of parallel
GPUs and multicore are different
Multicore: Coarse, heavyweight threads, better performance per thread
GPUs: Fine, lightweight threads, single-thread performance is poor
A case for the GPU
Interaction with the world is visual
GPUs have a well-established programming model
Market for GPUs is 500M+ total/year
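The coarse versus fine distinction above can be sketched as two ways of dividing the same work. This is an illustration only: the function names, the four-thread count, and the 1000-item workload are hypothetical, and real thread scheduling on either kind of hardware is more involved.

```python
from typing import List


def coarse_partition(n_items: int, n_threads: int) -> List[range]:
    """Multicore style: a few heavyweight threads, each owning a contiguous chunk."""
    if n_items == 0:
        return []
    chunk = (n_items + n_threads - 1) // n_threads  # ceiling division
    return [range(start, min(start + chunk, n_items))
            for start in range(0, n_items, chunk)]


def fine_partition(n_items: int) -> List[range]:
    """GPU style: one lightweight thread per item."""
    return [range(i, i + 1) for i in range(n_items)]


coarse = coarse_partition(1000, 4)
fine = fine_partition(1000)
print(len(coarse), "coarse threads,", len(coarse[0]), "items each")
print(len(fine), "fine threads,", len(fine[0]), "item each")
```

Both partitions cover the same items; what differs is how much work each thread carries and how many threads must be kept in flight.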
The Rendering Pipeline
Application: Compute 3D geometry, make calls to graphics API
GPU
Geometry: Transform geometry from 3D to 2D (in parallel)
Rasterization: Generate fragments from 2D geometry (in parallel)
Composite: Combine fragments into image
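The stages above can be sketched as three small functions chained together. This is a toy model under stated assumptions, not how any real GPU or graphics API works: the perspective divide, the pixel-center coverage test, the flat per-triangle depth, and all names are choices made for this illustration.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Fragment = Tuple[int, int, float]  # pixel x, pixel y, depth
Image = Dict[Tuple[int, int], float]


def geometry_stage(tri: List[Vec3], focal: float = 1.0) -> List[Vec3]:
    """3D to 2D: perspective divide, keeping z as depth. Vertices are independent."""
    return [(focal * x / z, focal * y / z, z) for (x, y, z) in tri]


def edge(ax: float, ay: float, bx: float, by: float, px: float, py: float) -> float:
    """Signed-area test: which side of the edge a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterization_stage(tri2d: List[Vec3]) -> List[Fragment]:
    """2D triangle to fragments: test each pixel center inside the bounding box."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = tri2d
    depth = (z0 + z1 + z2) / 3.0  # flat depth per triangle, a simplification
    frags: List[Fragment] = []
    for py in range(math.floor(min(y0, y1, y2)), math.floor(max(y0, y1, y2)) + 1):
        for px in range(math.floor(min(x0, x1, x2)), math.floor(max(x0, x1, x2)) + 1):
            cx, cy = px + 0.5, py + 0.5
            w0 = edge(x1, y1, x2, y2, cx, cy)
            w1 = edge(x2, y2, x0, y0, cx, cy)
            w2 = edge(x0, y0, x1, y1, cx, cy)
            inside = (w0 >= 0 and w1 >= 0 and w2 >= 0) or \
                     (w0 <= 0 and w1 <= 0 and w2 <= 0)
            if inside:
                frags.append((px, py, depth))
    return frags


def composite_stage(frags: List[Fragment]) -> Image:
    """Fragments to image: keep the nearest depth at each pixel."""
    image: Image = {}
    for px, py, depth in frags:
        if (px, py) not in image or depth < image[(px, py)]:
            image[(px, py)] = depth
    return image


def render(triangles: List[List[Vec3]]) -> Image:
    """Application side: hand triangles to the pipeline, one after another."""
    frags: List[Fragment] = []
    for tri in triangles:
        frags.extend(rasterization_stage(geometry_stage(tri)))
    return composite_stage(frags)


# Two triangles that land on the same pixels; the nearer one (z = 1) should win.
scene = [
    [(0.0, 0.0, 2.0), (8.0, 0.0, 2.0), (0.0, 8.0, 2.0)],
    [(0.0, 0.0, 1.0), (4.0, 0.0, 1.0), (0.0, 4.0, 1.0)],
]
image = render(scene)
```

Each vertex in the geometry stage and each pixel in the rasterization stage is handled without reference to its neighbors, which is the "in parallel" noted on the slide.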
The Programmable Pipeline
Application: Compute 3D geometry, make calls to graphics API
GPU
Geometry: Transform geometry from 3D to 2D [vertex programs]
Rasterization: Generate fragments from 2D geometry [fragment programs]
Composite: Combine fragments into image
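The idea of vertex and fragment programs can be sketched as user-supplied functions plugged into an otherwise fixed pipeline. This is a simplified illustration, not a real shading API: it draws points rather than triangles, and `draw_points`, `double_xy`, and `checkerboard` are hypothetical names.

```python
import math
from typing import Callable, Dict, List, Tuple

Vertex = Tuple[float, float, float]
Color = Tuple[float, float, float]
VertexProgram = Callable[[Vertex], Vertex]
FragmentProgram = Callable[[int, int], Color]


def draw_points(vertices: List[Vertex],
                vertex_program: VertexProgram,
                fragment_program: FragmentProgram) -> Dict[Tuple[int, int], Color]:
    """Fixed pipeline skeleton with two programmable slots."""
    image: Dict[Tuple[int, int], Color] = {}
    for v in vertices:
        x, y, _ = vertex_program(v)                  # programmable: per-vertex work
        px, py = math.floor(x), math.floor(y)        # fixed: one fragment per point
        image[(px, py)] = fragment_program(px, py)   # programmable: per-fragment work
    return image                                     # fixed: last write wins


def double_xy(v: Vertex) -> Vertex:
    """Example vertex program: scale x and y by two."""
    x, y, z = v
    return (2.0 * x, 2.0 * y, z)


def checkerboard(px: int, py: int) -> Color:
    """Example fragment program: white on even squares, black on odd ones."""
    return (1.0, 1.0, 1.0) if (px + py) % 2 == 0 else (0.0, 0.0, 0.0)


image = draw_points([(1.0, 1.5, 0.0), (0.5, 0.5, 0.0)], double_xy, checkerboard)
```

Swapping in a different `vertex_program` or `fragment_program` changes the result without touching the pipeline itself.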
DirectX 10 Pipeline
[Figure: DirectX 10 pipeline. Stages: Input Assembler, Vertex Shader, Geometry Shader, Setup/Rasterizer, Pixel Shader, Output Merger. Memory resources: Vertex Buffer, Index Buffer, Texture, Stream Buffer (stream out), Render Target, Depth/Stencil. Each shader stage has Sampler and Constant inputs. Legend: memory, programmable, fixed.]
Courtesy David Blythe, Microsoft
Characteristics of Graphics
Large computational requirements
Massive parallelism
Graphics pipeline designed for independent operations
Long latencies tolerable
Deep, feed-forward pipelines
Hacks are OK: can tolerate lack of accuracy
GPUs are good at parallel, arithmetically intense, streaming-memory problems
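"Independent operations" has a simple consequence that a short sketch can show: when each output depends only on its own input, the elements can be processed in any order, and therefore in parallel. The kernel `shade` and the helper `stream_map` below are hypothetical names used only for this illustration.

```python
import random
from typing import Callable, List, Sequence


def shade(x: float) -> float:
    """Hypothetical per-element kernel: a few arithmetic ops per value read."""
    return (x * x + 1.0) * 0.5


def stream_map(kernel: Callable[[float], float],
               data: Sequence[float],
               order: Sequence[int]) -> List[float]:
    """Apply kernel to every element, visiting indices in the given order."""
    out = [0.0] * len(data)
    for i in order:  # no iteration reads another iteration's result
        out[i] = kernel(data[i])
    return out


data = [float(i) for i in range(16)]
in_order = list(range(len(data)))
shuffled = in_order[:]
random.Random(0).shuffle(shuffled)

result_in_order = stream_map(shade, data, in_order)
result_shuffled = stream_map(shade, data, shuffled)
print(result_in_order == result_shuffled)
```

The two results match because no element reads another element's output; a loop with such a dependency would not have this property.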
Graphics Hardware: Task Parallel
[Figure: the logical pipeline (Application, Command, Vertex, Geometry, Rasterization, Fragment, Display) mapped onto hardware. Application/Command run on the CPU; Command, Vertex, Geometry, Rasterization, Fragment, and Display each appear as a separate unit on the GPU, with attached memory (Mem).]
Rage 128
NVIDIA GeForce 6800 3D Pipeline
[Figure: block diagram. Labels: Triangle Setup, Z-Cull, Shader Instruction Dispatch, L2 Tex, Fragment Crossbar, four Memory Partitions. Regions marked: Vertex, Fragment, Composite.]
Courtesy Nick Triantos, NVIDIA
Programmable Pipeline
[Figure: pipeline diagram. Labels, in the order extracted: Application, Command, Per-Surface, Tessellation, Per-Vertex, Primitive Assembly, Per-Primitive, Rasterization, Per-Fragment, Image Composition?, Per-Pixel, Display. Side labels: Per-Texel, Texture Memory, Pixel Ops, FB. Regions: Object Space, Image Space, Texture Spaces.]
[From Akeley and Hanrahan, Real-Time Graphics Architectures]
Generalizing the Pipeline
Transform A to B
Ex: Rasterization (triangles to fragments)
Historically fixed function
Process A to A
Ex: Fragment program
Recently programmable, and becoming more so
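The two kinds of stage differ in their type signatures, which a short sketch can make explicit. To stay small it rasterizes a horizontal span rather than a triangle; that substitution and the names `rasterize_span` and `halve_intensity` are assumptions made for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int, int]        # x_start, x_end (exclusive), y
Fragment = Tuple[int, int, float]  # x, y, intensity


def rasterize_span(span: Span) -> List[Fragment]:
    """Transform A to B: one span becomes a variable number of fragments."""
    x_start, x_end, y = span
    return [(x, y, 1.0) for x in range(x_start, x_end)]


def halve_intensity(frag: Fragment) -> Fragment:
    """Process A to A: a fragment goes in, a fragment comes out."""
    x, y, intensity = frag
    return (x, y, intensity * 0.5)


fragments = [halve_intensity(f) for f in rasterize_span((2, 5, 7))]
print(fragments)
```

An A-to-A stage is one-in, one-out over a single type, while the A-to-B stage changes both the type and the number of items.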
GeForce 8800 GPU
Built around programmable units
Unified shader
[Figure: block diagram. Host, Input Assembler, Thread Execution Manager, eight groups of Thread Processors each with a Parallel Data Cache, Load/store, Global Memory.]
[courtesy of Ian Buck, NVIDIA]
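Unified Shaders
[Figure: the logical pipeline (Application, Command, Vertex, Geometry, Rasterization, Fragment, Display) mapped onto hardware. Application/Command run on the CPU; the GPU shows Command, Rasterization, and Display units plus a single block labeled Programmable, with attached memory (Mem).]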
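One commonly cited motivation for unified shaders is load balancing, which a toy model can illustrate. The model below is an assumption and not taken from the slides: every work item costs one time step on any unit, stages overlap perfectly, and the unit counts and workloads are made up.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def fixed_split_time(vertex_work: int, fragment_work: int,
                     vertex_units: int, fragment_units: int) -> int:
    """Dedicated units: each pool serves only its own stage; the slower pool sets the pace."""
    return max(ceil_div(vertex_work, vertex_units),
               ceil_div(fragment_work, fragment_units))


def unified_time(vertex_work: int, fragment_work: int, units: int) -> int:
    """Unified units: any unit can take any work item."""
    return ceil_div(vertex_work + fragment_work, units)


# Same 8 units in both designs: 2 vertex + 6 fragment, or 8 unified.
vertex_heavy = (600, 200)
fragment_heavy = (100, 700)

fixed_vertex_heavy = fixed_split_time(*vertex_heavy, 2, 6)
fixed_fragment_heavy = fixed_split_time(*fragment_heavy, 2, 6)
unified_vertex_heavy = unified_time(*vertex_heavy, 8)
unified_fragment_heavy = unified_time(*fragment_heavy, 8)

print("vertex-heavy:  ", fixed_vertex_heavy, "fixed vs", unified_vertex_heavy, "unified")
print("fragment-heavy:", fixed_fragment_heavy, "fixed vs", unified_fragment_heavy, "unified")
```

In this model the fixed split is only as fast as its most overloaded pool, while the unified pool takes the same time for any mix with the same total work.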
Towards Programmable Graphics
Fixed function
Configurable, but not programmable
Programmable shading
Shader-centric
Programmable shaders, but fixed pipeline
Programmable graphics
Customize the pipeline
Neoptica asserts the major obstacle is programming models and tools
http://www.neoptica.com/NeopticaWhitepaper.pdf
http://www.graphicshardware.org/previous/www_2006/presentations/pharr-keynote-gh06.pdf
Yesterday's Vendor Support
High-Level Graphics Language
OpenGL
D3D
Low-Level Device Driver
Today's New Vendor Support
High-Level Graphics Language
OpenGL
D3D
Low-Level Device Driver
Compute
High-Level Compute Lang.
Low-Level API
CUDA
CTM HAL
CTM CAL
Architecture Summary
GPU is a massively parallel architecture
Many problems map well to GPU-style computing
GPUs have large amount of arithmetic capability
Increasing amount of programmability in the pipeline
New features map well to GPGPU
Unified shaders
Direct access to compute units in new APIs
Challenge: How do we make the best use of GPU hardware?
Techniques, programming models, languages, evaluation tools