Review for Modern GPU Hardware
Lan-Da Van (范倫達), Ph. D.
Department of Computer Science
National Chiao Tung University, Hsinchu, Taiwan
Spring, 2020
The following content is extracted from the material in the references on the last page. If any citation is wrong or a reference is missing, please contact
[email protected] and I will correct the error as soon as possible.
For use in this course only; please do NOT redistribute. Thank you.
Outline
GPU Pipeline
History of GPU Hardware
GPU Hardware Consideration
Modern GPU Hardware Architecture
NVIDIA GeForce
AMD (ATI) Radeon
IMG PowerVR
ARM Mali
GPU Applications
Summary
GPU Fundamentals: Graphics Pipeline
• A simplified graphics pipeline
– Note that pipe widths vary
– Many caches, FIFOs, and so on not shown
[Figure: simplified graphics pipeline. CPU: Application → GPU: Transform & Light → Assemble Primitives → Rasterize → Shade → Video Memory (Textures). Data between stages: vertices (3D), transformed, lit vertices (2D), screen-space triangles (2D), fragments (pre-pixels), final pixels (color, depth). Also shown: Graphics State and a Render-to-texture path.]
GPU Fundamentals: Modern Graphics Pipeline
• Programmable vertex processor!
• Programmable pixel processor!
[Figure: the same pipeline with Transform & Light replaced by a programmable Vertex Processor and Shade replaced by a programmable Fragment Processor.]
GPU Fundamentals: Modern Graphics Pipeline (cont.)
• Programmable primitive assembly!
• More flexible memory access!
[Figure: CPU Application → GPU Vertex Processor → Assemble Primitives / Geometry Processor → Rasterize → Fragment Processor → Video Memory (Textures), with Graphics State and a Render-to-texture path.]
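To make the split between programmable and fixed-function stages concrete, here is a minimal software sketch, assuming a toy Python rasterizer: the vertex and fragment stages are caller-supplied functions (the programmable parts), while primitive assembly and rasterization stay fixed. The names `render`, `edge`, `vertex_shader`, and `fragment_shader` are hypothetical and do not correspond to any real graphics API.

```python
from typing import Callable, List, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def edge(a: Vec2, b: Vec2, p: Vec2) -> float:
    """Edge function: twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def render(vertices: List[Vec3],
           vertex_shader: Callable[[Vec3], Vec2],
           fragment_shader: Callable[[int, int], str],
           width: int, height: int) -> List[List[str]]:
    framebuffer = [["." for _ in range(width)] for _ in range(height)]
    # Programmable stage 1: the vertex processor maps 3D vertices to 2D screen space.
    screen = [vertex_shader(v) for v in vertices]
    # Fixed function: primitive assembly, every 3 vertices form one triangle.
    triangles = [screen[i:i + 3] for i in range(0, len(screen) - 2, 3)]
    for a, b, c in triangles:
        area = edge(a, b, c)
        if area == 0:
            continue  # skip degenerate triangles
        # Fixed function: rasterization, test every pixel centre.
        for y in range(height):
            for x in range(width):
                p = (x + 0.5, y + 0.5)
                w = (edge(b, c, p), edge(c, a, p), edge(a, b, p))
                # Covered when every edge value has the same sign as the triangle area.
                if all(wi * area >= 0 for wi in w):
                    # Programmable stage 2: the fragment processor picks the colour.
                    framebuffer[y][x] = fragment_shader(x, y)
    return framebuffer


fb = render([(0, 0, 0), (8, 0, 0), (0, 8, 0)],
            vertex_shader=lambda v: (v[0], v[1]),  # drop z (orthographic projection)
            fragment_shader=lambda x, y: "#",      # flat colour
            width=8, height=8)
print("\n".join("".join(row) for row in fb))
```

Swapping in a different `vertex_shader` or `fragment_shader` changes the image without touching the fixed stages, which is the point of the programmable pipeline.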
History of Graphics Hardware (1/3)
… - mid ’90s
SGI mainframes and workstations
PC: only 2D graphics hardware
mid ’90s
Consumer 3D graphics hardware (PC)
- 3dfx, NVIDIA, Matrox, ATI, …
Triangle rasterization (only)
Cheap: pushed by game industry
1999
PC-card with TnL (Transform and Lighting)
- NVIDIA GeForce: Graphics Processing Unit (GPU)
PC-card more powerful than specialized workstations
3DFX Voodoo graphics 4MB - 1997
History of Graphics Hardware (2/3)
https://www.zhihu.com/question/21980949
History of Graphics Hardware (3/3)
Modern graphics hardware
Graphics pipeline partly programmable
Leaders: AMD(ATI) and NVIDIA
- “AMD Radeon HD 6990” and “NVIDIA GeForce GTX 590”
Game consoles similar to GPUs (Xbox)
Computational Power (1/2)
• GPUs are fast…
– 3.0 GHz Intel Core 2 Duo (Woodcrest Xeon 5160):
• Computation: 48 GFLOPS peak
• Memory bandwidth: 21 GB/s peak
• Price: $874 (chip)
– NVIDIA GeForce 8800 GTX:
• Computation: 330 GFLOPS observed
• Memory bandwidth: 55.2 GB/s observed
• Price: $599 (board)
• GPUs are getting faster, faster
– CPUs: 1.4× annual growth
– GPUs: 1.7× (pixels) to 2.3× (vertices) annual growth
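The slide's figures can be turned into ratios with a few lines of arithmetic. This is a rough sketch only: the CPU number is a peak while the GPU numbers are observed, and the CPU price is for the chip while the GPU price is for a whole board, so the ratios are indicative, not a fair benchmark. The 5-year horizon for the growth rates is an arbitrary choice for illustration.

```python
# Figures quoted on the slide (CPU peak vs. GPU observed; chip vs. board price).
cpu = {"gflops": 48.0, "bandwidth_gbps": 21.0, "price_usd": 874.0}
gpu = {"gflops": 330.0, "bandwidth_gbps": 55.2, "price_usd": 599.0}

compute_ratio = gpu["gflops"] / cpu["gflops"]
bandwidth_ratio = gpu["bandwidth_gbps"] / cpu["bandwidth_gbps"]
gflops_per_dollar_ratio = ((gpu["gflops"] / gpu["price_usd"])
                           / (cpu["gflops"] / cpu["price_usd"]))


def growth(annual_factor: float, years: int) -> float:
    """Total speed-up after `years` of compounding at `annual_factor` per year."""
    return annual_factor ** years


print(f"compute: {compute_ratio:.2f}x, bandwidth: {bandwidth_ratio:.2f}x, "
      f"GFLOPS per dollar: {gflops_per_dollar_ratio:.1f}x")
for name, factor in [("CPU", 1.4), ("GPU (pixels)", 1.7), ("GPU (vertices)", 2.3)]:
    print(f"{name}: {growth(factor, 5):.1f}x after 5 years")
```

Compounding is what makes the gap widen: a 1.4× versus 1.7× annual rate looks modest per year but diverges quickly over several years.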
Computational Power (2/2)
[Figure: GPU vs. CPU computational power over time. Courtesy Naga Govindaraju]
FLOPS Comparison of GPU and CPU
Memory Bandwidth Comparison of CPU and GPU
Motivation
• Why are GPUs getting faster so quickly?
– Arithmetic intensity
• The specialized nature of GPUs makes it easier to use additional transistors for computation
– Economics
• The multi-billion-dollar video game market is a pressure cooker that drives innovation to exploit this property
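Arithmetic intensity is commonly quantified as floating-point operations per byte of memory traffic. The sketch below assumes the roofline model (a standard performance model, not named on the slide) and reuses the GeForce 8800 GTX figures from the Computational Power slide. It suggests why extra compute transistors pay off only for workloads with enough arithmetic per byte; the "dense" kernel is a hypothetical example.

```python
def attainable_gflops(flops_per_byte: float, peak_gflops: float,
                      bandwidth_gbps: float) -> float:
    """Roofline model: a kernel is limited either by compute or by memory traffic."""
    return min(peak_gflops, flops_per_byte * bandwidth_gbps)


PEAK_GFLOPS = 330.0     # GeForce 8800 GTX, observed (slide figure)
BANDWIDTH_GBPS = 55.2   # GeForce 8800 GTX, observed (slide figure)

# SAXPY, y[i] = a * x[i] + y[i], in 32-bit floats:
# 2 flops per element; read x[i] and y[i] (8 bytes), write y[i] (4 bytes).
saxpy_intensity = 2 / 12

# Hypothetical compute-heavy kernel doing 20 flops per byte moved.
dense_intensity = 20.0

# Flops per byte a kernel needs before compute, not memory, becomes the limit.
machine_balance = PEAK_GFLOPS / BANDWIDTH_GBPS

print(f"machine balance: {machine_balance:.1f} flops/byte")
print(f"SAXPY: {attainable_gflops(saxpy_intensity, PEAK_GFLOPS, BANDWIDTH_GBPS):.1f} GFLOPS")
print(f"dense: {attainable_gflops(dense_intensity, PEAK_GFLOPS, BANDWIDTH_GBPS):.1f} GFLOPS")
```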
Flexible and Precise
• Modern GPUs are deeply programmable
– Programmable pixel, vertex, and geometry engines
– Solid high-level language support
• Modern GPUs support “real” precision
– 32-bit/64-bit floating point throughout the pipeline
• High enough for many applications
– DX10-class GPUs add 32-bit integers
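What "32-bit floating point" can and cannot represent exactly is easy to check on the host. This sketch emulates binary32 rounding in Python via the standard `struct` module; it runs on the CPU and only illustrates the number format, not any particular GPU's arithmetic. It also shows why native 32-bit integers (as added in DX10-class GPUs) matter: they hold values a binary32 float cannot.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (IEEE 754 binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# binary32 has a 24-bit significand, so not every integer above 2**24 fits.
print(to_float32(16_777_216.0))  # 2**24: exactly representable
print(to_float32(16_777_217.0))  # 2**24 + 1: rounds to 16777216.0
print(to_float32(0.1))           # nearest binary32 value to 0.1, not 0.1 itself

# A 32-bit integer holds 2**24 + 1 exactly.
as_int32 = struct.unpack("<i", struct.pack("<i", 16_777_217))[0]
print(as_int32)
```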
Graphics Hardware Consideration (1/2)
• GPU = Graphics Processing Unit
– Vector processor
– Operates on 4-tuples
• Position (x, y, z, w)
• Color (red, green, blue, alpha)
• Texture coordinates (s, t, r, q)
– 4-tuple ops, 1 clock cycle
• SIMD [Single Instruction Multiple Data]
– ADD, MUL, SUB, DIV, MADD, …
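A 4-tuple SIMD operation applies one instruction to all four components at once. The sketch below models that in plain Python; the helper names (`simd`, `add`, `mul`, `madd`) are hypothetical, and the "one clock cycle" is only implied by the single call, not measured. The multiply-add here rounds after the multiply and again after the add.

```python
from typing import Callable, Tuple

Vec4 = Tuple[float, float, float, float]


def simd(op: Callable[[float, float], float], a: Vec4, b: Vec4) -> Vec4:
    """Apply one operation to all four lanes, as a single 4-wide instruction would."""
    return (op(a[0], b[0]), op(a[1], b[1]), op(a[2], b[2]), op(a[3], b[3]))


def add(a: Vec4, b: Vec4) -> Vec4:
    return simd(lambda x, y: x + y, a, b)


def mul(a: Vec4, b: Vec4) -> Vec4:
    return simd(lambda x, y: x * y, a, b)


def madd(a: Vec4, b: Vec4, c: Vec4) -> Vec4:
    """Multiply-add a * b + c on all four components."""
    return add(mul(a, b), c)


# Modulate an RGBA colour by a light colour, then add an ambient term.
base = (0.5, 0.25, 1.0, 1.0)
light = (1.0, 2.0, 0.5, 1.0)
ambient = (0.25, 0.25, 0.25, 0.0)
print(madd(base, light, ambient))
```

The same `madd` works unchanged on positions (x, y, z, w) or texture coordinates (s, t, r, q), which is why a 4-wide vector unit suits graphics data.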
Graphics Hardware Consideration (2/2)
• Pipelining
– Number of stages
• Parallelism
– Number of parallel processes
• Parallelism + pipelining
– Number of parallel pipelines
[Figure: a 3-stage pipeline (stages 1 2 3), parallel processing units, and several 3-stage pipelines in parallel.]
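The benefit of each option can be estimated by counting cycles. This sketch assumes every stage takes exactly one cycle and that work splits evenly across units; the function name `cycles` and the 12-item, 3-stage, 4-unit scenario are illustrative choices, not figures from the slide.

```python
import math


def cycles(n_items: int, stages: int, units: int = 1, pipelined: bool = True) -> int:
    """Cycles to finish n_items, assuming every stage takes exactly one cycle."""
    if n_items == 0:
        return 0
    batches = math.ceil(n_items / units)  # items each unit must handle
    if pipelined:
        # Fill the pipeline once (stages cycles), then retire one item per cycle.
        return stages + batches - 1
    # Without pipelining, each item occupies its unit for all stages.
    return stages * batches


for label, units, pipelined in [("sequential", 1, False), ("pipelining", 1, True),
                                ("parallelism", 4, False), ("both", 4, True)]:
    print(f"{label:12s}: {cycles(12, 3, units, pipelined)} cycles")
```

With many items, pipelining approaches one result per cycle per pipeline, and parallel pipelines multiply that rate by the number of pipelines.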
Outline
GPU Pipeline
History of GPU Hardware
GPU Hardware Consideration
Modern GPU Hardware Architecture
NVIDIA GeForce
AMD (ATI) Radeon
IMG PowerVR
ARM Mali
Summary
http://5pit.tw/tech/computer/tid_12880
Growth of NVIDIA GPU
• Performance metrics
– Since 2000, the amount of horsepower applied to processing 3D vertices and fragments has been growing at a remarkable rate.
Growth of NVIDIA GPU
NVIDIA GeForce 7900 GTX
NVIDIA Graphics Card Architecture
• GeForce 8 Series
– 12,288 concurrent threads, hardware managed
– 128 thread processor cores at 1.35 GHz = 518 GFLOPS peak
[Figure: GeForce 8 block diagram. Host CPU → Work Distribution → 8 clusters, each with two SMs (SP, Shared Memory, IU), a TF unit, and a TEX L1 cache → 6 partitions of L2 cache and memory.]
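The 518 GFLOPS peak follows from the core count and clock if each core is credited with 3 floating-point operations per cycle. That accounting (a dual-issued multiply-add plus a multiply) is the usual one for this chip generation, but treat it as an assumption here. The 16 SMs in the block diagram (8 clusters × 2) also divide the thread and core counts evenly.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Peak throughput = cores x clock (GHz) x floating-point ops per core per cycle."""
    return cores * clock_ghz * flops_per_cycle


CORES = 128
CLOCK_GHZ = 1.35
SMS = 16          # 8 clusters x 2 SMs, as in the block diagram
THREADS = 12_288

# Assumed accounting: multiply-add (2 flops) + multiply (1 flop) per core per cycle.
print(f"peak: {peak_gflops(CORES, CLOCK_GHZ, 3):.1f} GFLOPS")
print(f"cores per SM: {CORES // SMS}, concurrent threads per SM: {THREADS // SMS}")
```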
FERMI: Streaming Multiprocessor (SM)
• Each SM contains
– 32 cores
– 16 load/store units
– 32,768 registers
• Newer FP representation
– IEEE 754-2008
• Two units per core
– Floating point
– Integer
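The register count bounds how many threads an SM can keep resident. This is a simplified occupancy sketch: it assumes 32-bit registers, a 1,536-thread-per-SM limit (the compute capability 2.x figure), and 32-thread warps, and it ignores register-allocation granularity, shared memory, and per-block limits, so real occupancy calculations will differ.

```python
REGISTERS_PER_SM = 32_768
BYTES_PER_REGISTER = 4        # 32-bit registers
MAX_THREADS_PER_SM = 1_536    # compute capability 2.x limit (assumption for this sketch)
WARP_SIZE = 32


def resident_threads(regs_per_thread: int) -> int:
    """Threads that fit in one SM's register file, rounded down to whole warps.

    Simplified: ignores allocation granularity, shared memory, and block limits.
    """
    fit = REGISTERS_PER_SM // regs_per_thread
    threads = min(fit, MAX_THREADS_PER_SM)
    return (threads // WARP_SIZE) * WARP_SIZE


print(f"register file: {REGISTERS_PER_SM * BYTES_PER_REGISTER // 1024} KiB per SM")
for regs in (16, 21, 32, 63):
    print(f"{regs:2d} registers/thread -> {resident_threads(regs)} resident threads")
```

The trade-off it shows: a kernel that uses more registers per thread leaves room for fewer resident threads to hide memory latency.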
FERMI: Comparison
Kepler: Core Architecture
http://www.weistang.com/article-941-1.html
Titan vs Tesla Comparison
Maxwell: Core Architecture
http://www.weistang.com/article-941-1.html
http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B
Kepler vs Maxwell Comparison
http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B
[Figure: Kepler (2012) vs. Maxwell (2014)]
Mobile Roadmap
http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-kepler-into-the-tablet-market-discarded-palm-machine-changes-to-core-login-table-drawing-tablet?page=2
Pascal: Core Architecture
https://read01.com/zh-tw/oemmE4.html#.Wi5F30qWYps
Volta: Core Architecture
http://technews.tw/2017/05/11/nvidia-gpu-volta/
Pascal vs Volta Comparison
http://technews.tw/2017/05/11/nvidia-gpu-volta/
[Figure: Pascal (2016) vs. Volta (2017)]
https://zh.wikipedia.org/wiki/CUDA
ATI Radeon X1900 XTX
• Features of ATI Radeon X1900 XTX
– Core speed 650 MHz
– 48 pixel shader processors
– 8 vertex shader processors
– 51 GB/s memory bandwidth
– 512 MB memory
http://product.pcpop.com/000024721/Index.html
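ATI Radeon X1900 XTX
• High memory bandwidth
[Figure: data paths and bandwidths. Graphics card: GPU (650 MHz, parallel processes) with graphics memory (½ GB) over a 51 GB/s high-bandwidth link, plus video output. Processor chip: CPU (3 GHz) with cache (½ MB) over a 77 GB/s high-bandwidth link. Main memory (1 GB) at 3 GB/s, including AGP memory (½ GB); AGP bus to the graphics card at 2 GB/s.]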
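The bandwidth gap in the diagram translates directly into transfer time. This sketch assumes idealised transfers (size divided by bandwidth, no latency or protocol overhead), and the pairing of each bandwidth with a link is read off the diagram, so treat the labels as approximate. Moving ½ GB is an illustrative size matching the graphics memory.

```python
# Link bandwidths read off the diagram, in GB/s (link labels are approximate).
LINKS = {
    "GPU <-> graphics memory": 51.0,
    "CPU <-> cache": 77.0,
    "3 GB/s main-memory link": 3.0,
    "AGP bus": 2.0,
}


def transfer_ms(gigabytes: float, bandwidth_gbps: float) -> float:
    """Idealised transfer time in ms: size / bandwidth, ignoring latency and overhead."""
    return gigabytes / bandwidth_gbps * 1000.0


for link, bw in LINKS.items():
    print(f"{link:24s}: {transfer_ms(0.5, bw):7.1f} ms to move 0.5 GB")
```

The roughly 25× difference between on-card memory and the AGP bus is why data such as textures should stay in graphics memory rather than cross the bus every frame.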
ATI Radeon 9700
• Parallelism + pipelining: ATI Radeon 9700
– 4 vertex pipelines, 8 pixel pipelines
Radeon Comparison
http://www.pcdiy.com.tw/detail/4275
http://wccftech.com/amd-vega-4096-gcn-stream-processors/
http://wccftech.com/amd-vega-4096-gcn-stream-processors/
http://www.anandtech.com/show/9233/amds-2016-gpu-roadmap-finfet-high-bandwidth-memory
http://www.anandtech.com/show/9233/amds-2016-gpu-roadmap-finfet-high-bandwidth-memory
https://www.youtube.com/watch?v=l_f_lIF3A7Q
https://www.cnread.news/content/2536026.html
http://intotech.ir/phone-tablet/proccessor/ -مقایسه-گوشی-گرافیکی-پردازندهی powervr- /ای
http://imgtec.eetrend.com/news/7355
IMG PowerVR Series5XT (SGXMP)
IMG PowerVR Series5XT (SGXMP)
• Shader-driven Tile-Based Deferred Rendering (TBDR) architecture
• Fully programmable GPU using unique USSE architecture
• All SGX cores support OpenGL ES 2.0/1.1, OpenVG 1.1, OpenGL 2.0/3.0 and DirectX 9/10.1
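The core idea of tile-based deferred rendering can be sketched in two passes: bin primitives into screen tiles, then, per tile, resolve visibility first and shade only the visible fragment of each pixel. This is a simplified illustration, not PowerVR's actual implementation. It assumes axis-aligned rectangles at constant depth instead of triangles (to keep coverage tests short), a 4-pixel tile, and hypothetical names (`Rect`, `bin_primitives`, `render_tbdr`).

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Rect(NamedTuple):
    """Hypothetical primitive: axis-aligned rectangle at constant depth.

    Covers pixels x0 <= x < x1, y0 <= y < y1; assumed already clipped to the screen.
    """
    x0: int
    y0: int
    x1: int
    y1: int
    depth: float   # smaller = closer to the viewer
    color: str


def bin_primitives(prims: List[Rect], tile: int) -> Dict[Tuple[int, int], List[int]]:
    """Binning pass: for every tile, list the primitives that touch it."""
    bins: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for i, p in enumerate(prims):
        for ty in range(p.y0 // tile, (p.y1 - 1) // tile + 1):
            for tx in range(p.x0 // tile, (p.x1 - 1) // tile + 1):
                bins[(tx, ty)].append(i)
    return bins


def render_tbdr(prims: List[Rect], width: int, height: int, tile: int):
    """Render tile by tile; shade each pixel only after visibility is resolved."""
    framebuffer = [["." for _ in range(width)] for _ in range(height)]
    shaded = 0
    for (tx, ty), ids in bin_primitives(prims, tile).items():
        for y in range(ty * tile, min((ty + 1) * tile, height)):
            for x in range(tx * tile, min((tx + 1) * tile, width)):
                # Hidden-surface removal: nearest primitive covering (x, y).
                nearest = None
                for i in ids:
                    p = prims[i]
                    if p.x0 <= x < p.x1 and p.y0 <= y < p.y1:
                        if nearest is None or p.depth < prims[nearest].depth:
                            nearest = i
                # Deferred shading: the (trivial) shader runs once per visible pixel.
                if nearest is not None:
                    framebuffer[y][x] = prims[nearest].color
                    shaded += 1
    return framebuffer, shaded


def immediate_mode_shades(prims: List[Rect]) -> int:
    """Fragments shaded if every covered fragment is shaded (no early depth rejection)."""
    return sum((p.x1 - p.x0) * (p.y1 - p.y0) for p in prims)


scene = [Rect(0, 0, 8, 8, depth=1.0, color="b"),   # background
         Rect(2, 2, 6, 6, depth=0.5, color="f")]   # occluding foreground
fb, shaded = render_tbdr(scene, width=8, height=8, tile=4)
print("\n".join("".join(row) for row in fb))
print(f"TBDR shaded {shaded} fragments; immediate mode up to {immediate_mode_shades(scene)}")
```

Because each tile is small, its depth and colour data can live in on-chip memory, which is the bandwidth saving that makes this approach attractive for mobile GPUs.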
IMG PowerVR Series6 (Rogue)
IMG PowerVR Series6 (Rogue)
• Supports OpenGL ES 3.0, OpenGL ES 2.0, OpenGL 3.x/4.x, OpenCL 1.x, and DirectX 10 on certain family members.
http://technews.tw/2014/07/19/powervr-rogue-gpu-list/
IMG PowerVR 7XT Plus
http://imgtec.eetrend.com/article/7130
IMG PowerVR 7XT Plus
http://imgtec.eetrend.com/article/7130
IMG PowerVR 7XT Plus
http://www.21ic.com/news/opto/201703/709965.htm
IMG PowerVR 8XE Plus
http://www.anandtech.com/show/11028/powervr-8xe-plus-announced
IMG PowerVR 8XE Plus
http://www.anandtech.com/show/11028/powervr-8xe-plus-announced
IMG PowerVR 8XE Plus
http://www.anandtech.com/show/11028/powervr-8xe-plus-announced
IMG PowerVR 8XE Plus
http://www.anandtech.com/show/11028/powervr-8xe-plus-announced
http://intotech.ir/phone-tablet/proccessor/ -مقایسه-گوشی-گرافیکی-پردازندهی powervr- /ای
Features of ARM Mali
http://www.arm.com/products/graphics-and-multimedia/mali-gpu
ARM Mali-400MP
ARM Mali-450MP
ARM Mali-T604
• GPGPU (supports OpenCL 1.1)
• Tri-pipe architecture
• The first GPU based on the Midgard architecture
• True IEEE double-precision floating-point math in hardware for the OpenCL Full Profile
• The Job Manager within Mali-T600 Series GPUs offloads task management from the CPU to the GPU
• 5x performance improvement over previous Mali graphics processors
ARM Mali-T678
• 50% performance improvement compared to the Mali-T658
ARM Mali Comparison
https://zh.wikipedia.org/wiki/Mali_(GPU)
ARM Mali Comparison
https://zh.wikipedia.org/wiki/Mali_(GPU)
Applications (1/4)
• Includes lots of applications
– Ray-tracer
– Image segmentation
– FFT/Linear Algebra
http://graphics.stanford.edu/data/3Dscanrep/stanford-bunny-cebal-ssh.jpg
http://f.fwallpapers.com/images/3d-bunny.jpg
09/02/11
Applications (2/4)
http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-kepler-into-the-tablet-market-discarded-palm-machine-changes-to-core-login-table-drawing-tablet?page=2
Applications (3/4)
http://wechatinchina.com/thread-461154-1-1.html
Applications (4/4)
http://wechatinchina.com/thread-461154-1-1.html
AR and VR Applications
Summary
Understand the GPU pipeline in depth
Understand the motivation for GPU hardware
Understand modern GPU hardware architecture and specifications
Understand GPU/GPGPU applications
References
GPU Architecture & CG, Mark Colbert, 2006
Introduction to Graphics Hardware and GPUs, Yannick Francken, Tom Mertens
GPU Tutorial, Yiyunjin, 2007
Evolution of GPU and Graphics Pipelining, Weijun Xiao
Commercial product websites (NVIDIA, ATI, IMG, ARM)
SIGGRAPH 2005 Course Notes, David Luebke
Adapted from: David Luebke (University of Virginia) and NVIDIA
Introduction to Supercomputing, MCS 572 Lecture 27, Jan Verschelde, 17 March 2014
Acknowledgement:
Thanks to the TA for helping prepare the material.