8/3/2019 Dokken Darmstadt March 2006_print
ICT Applied Mathematics
The Graphics Processing Unit (GPU) as a high performance
computational resource for simulation and geometry processing
Tor Dokken
Department of Applied Mathematics
SINTEF Information and Communication Technology
www.sintef.no/gpgpu/
[email protected]
March 10th 2006
Graphics cards as a high-end
computational resource
Project funded by the Research Council of Norway
Main partner: SINTEF ICT, Department of Applied Mathematics
Period: 2004-2007
Cooperation with a number of academic partners in Norway:
Centre of Mathematics for Applications at the University of Oslo, Department of Informatics, University of Oslo
  3 Ph.D. fellows (1 at the University of Oslo and 2 at SINTEF)
  6 master students (January 2006-June 2007), based at SINTEF
Mathematics Department, University of Bergen
  1 postdoc
Narvik University College, Narvik
  3 master students in 2006, 7 in 2005, 1 in 2004
Steps in the graphics pipeline
Vertices after transformation and coloring, per-vertex operations
(Vertex processor)
Rasterization. Texturing and coloring, fragment processing.
(Fragment processor)
Vertex Processor
Vertex processor capabilities:
Vertex transformation. Normal transformation.
Texture coordinate generation and transformation.
Lighting calculations.
Fully programmable.
Processes 4-component vectors (xyzw)
Can change the position of current vertex.
Cannot read info from other vertices.
Fragment Processor
Its main capability is to calculate the final color and/or depth
of a fragment. Fully programmable.
Processes 4-component vectors (rgba).
Can read info from other fragments.
Cannot write to more than one pixel in the same buffer.
Typically more useful than the vertex processor for GPGPU purposes.
Limited number of registers for temporary variables in a fragment shader (currently 32 registers on NVIDIA hardware).
Programming the Fragment Processor
Specifics
The programs running on the fragment processor are called fragment shaders.
  A fragment shader can write to up to 4 render targets (textures).
  A fragment shader can read from many textures, but cannot read from a render target to which it is writing (the order in which the fragments are processed is not known).
  A render target can be converted to a texture to be used as input in later fragment shaders.
When programming the fragment processor for non-graphical purposes, a default primitive covering the entire viewport should be defined so that the fragment shader is executed for every pixel of the viewport.
High-level programming languages are available through DirectX or OpenGL.
Shading languages
High Level Shading Language (HLSL, Microsoft)
  Part of the DirectX API and only compiles into DirectX code.
  Hardware independent.
  Windows only.
  Game industry.
C for graphics (Cg, NVIDIA)
  Platform independent but hardware dependent.
  Cg and HLSL are very similar languages.
  Cg/HLSL was co-developed by NVIDIA and Microsoft.
OpenGL Shading Language (GLSL, ARB)
  Platform and hardware independent.
  CAD, scientific visualization, movie industry, academic world...
Using shaders
1. Provide shader source code to OpenGL.
2. Compile shader.
3. Link compiled shaders together.
4. Use program.
The GPU and a double loop
The fragment shader acts as the body of the for loop.
Setting up the geometry (a quad):
    glBegin(GL_QUADS);
    glTexCoord2f(0.0f, 0.0f); glVertex3f(0.0f, 0.0f, 0.0f);
    glTexCoord2f(1.0f, 0.0f); glVertex3f(1.0f, 0.0f, 0.0f);
    glTexCoord2f(1.0f, 1.0f); glVertex3f(1.0f, 1.0f, 0.0f);
    glTexCoord2f(0.0f, 1.0f); glVertex3f(0.0f, 1.0f, 0.0f);
    glEnd();
Setting up the viewport:
    glViewport(0, 0, n, m);
The clipping volume is defined such that the quad lies just inside it.
The fragment shader is then executed for n x m fragments, with texture coordinates in [0,1] x [0,1].
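As a rough CPU analogue of this setup, the sketch below writes out the double loop that the GPU runs implicitly: one kernel call per fragment of an n x m viewport. The names `run_over_viewport`, `kernel_fn` and `sum_coords` are hypothetical, and sampling the texture coordinates at (i + 0.5)/n assumes the usual OpenGL convention that fragments are evaluated at pixel centres.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel type: plays the role of the fragment shader. */
typedef float (*kernel_fn)(float s, float t);

/* Calls the kernel once per "fragment" of an n x m viewport and stores
 * the result in out[j * n + i]. (s, t) are texture coordinates in
 * [0,1] x [0,1], sampled at pixel centres (assumed convention). */
void run_over_viewport(kernel_fn kernel, float *out, size_t n, size_t m)
{
    assert(kernel != NULL && out != NULL);
    for (size_t j = 0; j < m; ++j) {        /* rows of the viewport    */
        for (size_t i = 0; i < n; ++i) {    /* columns of the viewport */
            float s = ((float)i + 0.5f) / (float)n;
            float t = ((float)j + 0.5f) / (float)m;
            out[j * n + i] = kernel(s, t);  /* body of the for loop    */
        }
    }
}

/* Example kernel, for illustration only. */
float sum_coords(float s, float t)
{
    return s + t;
}
```

On the GPU the two loops disappear: the rasterizer generates the n x m fragments and the order in which they are processed is not specified, so the kernel must not depend on it.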
Debugging shaders
No normal debuggers for GPUs
Debugging has to be based on reading values back from textures and analyzing them.
Visualizing the values of the computational grid as an image can be helpful for understanding the behavior of the shader.
Parallel CPU implementations are often very useful for verifying that the shader works properly (and for measuring performance).
The reason for an error can be:
  You may have an error in your algorithm.
  You may have misunderstood the functionality of the shader language.
  The driver for the GPU is not working properly.
Example: Heat Equation
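\[ u_t = u_{xx} + u_{yy} \]
Discretization by finite differences over a regular grid:
\[ U_{i,j}^{n+1} = U_{i,j}^{n} + \frac{k}{h^2}\left( U_{i+1,j}^{n} + U_{i-1,j}^{n} + U_{i,j+1}^{n} + U_{i,j-1}^{n} - 4U_{i,j}^{n} \right). \]
Here k is the time step and h the grid spacing.
Each fragment is updated as a weighted sum of itself and its four nearest neighbours (five values in total).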
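A CPU reference for one time step of this five-point scheme might look like the sketch below. The function name `heat_step`, the row-major layout and the treatment of boundary cells (copied unchanged) are assumptions made for illustration; the slides do not state the boundary conditions.

```c
#include <assert.h>
#include <stddef.h>

/* One explicit time step of the five-point scheme on an nx x ny grid
 * stored row-major (index j * nx + i), with r = k / h^2.
 * Boundary cells are copied unchanged (assumed boundary treatment). */
void heat_step(const float *u, float *u_new, size_t nx, size_t ny, float r)
{
    assert(nx >= 3 && ny >= 3);
    for (size_t j = 0; j < ny; ++j) {
        for (size_t i = 0; i < nx; ++i) {
            size_t c = j * nx + i;
            if (i == 0 || j == 0 || i == nx - 1 || j == ny - 1) {
                u_new[c] = u[c];
                continue;
            }
            u_new[c] = u[c] + r * (u[c - 1] + u[c + 1]
                                 + u[c - nx] + u[c + nx]
                                 - 4.0f * u[c]);
        }
    }
}
```

The explicit scheme is only stable for r <= 1/4 in two dimensions, so the time step k has to be chosen accordingly. A routine like this is also the kind of CPU implementation that helps when checking a shader's output.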
Heat equation shaders
[Heat Equation Fragment shader]
    varying vec4 texXcoord;   // (y, x, x - dx, x + dx), set by the vertex shader
    varying vec4 texYcoord;   // (x, y, y - dy, y + dy), set by the vertex shader
    uniform sampler2D heatTex;
    uniform float r;          // the factor k/h^2 in the update formula
    void main(void)
    {
        vec4 col;
        vec4 tex  = texture2D(heatTex, texXcoord.yx);  // centre (x, y)
        vec4 tex0 = texture2D(heatTex, texXcoord.zx);  // (x - dx, y)
        vec4 tex1 = texture2D(heatTex, texXcoord.wx);  // (x + dx, y)
        vec4 tex2 = texture2D(heatTex, texYcoord.xz);  // (x, y - dy)
        vec4 tex3 = texture2D(heatTex, texYcoord.xw);  // (x, y + dy)
        col = tex + r*(tex0+tex1-4.0*tex+tex2+tex3);
        gl_FragColor = vec4(col);
    }
[Heat Equation Vertex shader]
    varying vec4 texXcoord;
    varying vec4 texYcoord;
    uniform vec2 dXY;         // texel spacing in x and y
    void main(void)
    {
        texXcoord = gl_MultiTexCoord0.yxxx + vec4(0.0,0.0,-1.0,1.0)*dXY.x;
        texYcoord = gl_MultiTexCoord0.xyyy + vec4(0.0,0.0,-1.0,1.0)*dXY.y;
        gl_Position = gl_ModelViewProjectionMatrix*gl_Vertex;
    }
\[ U_{i,j}^{n+1} = U_{i,j}^{n} + \frac{k}{h^2}\left( U_{i+1,j}^{n} + U_{i-1,j}^{n} + U_{i,j+1}^{n} + U_{i,j-1}^{n} - 4U_{i,j}^{n} \right). \]
The neighbour coordinates are computed in the vertex shader instead of using the standard texture coordinates, to make the code more efficient.
Heat Equation contd.
Wave Equation contd.
Example: Linear Wave Equation
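\[ u_{tt} = u_{xx} + u_{yy} \]
Discretization by finite differences over a regular grid:
\[ U_{i,j}^{n+1} = 2U_{i,j}^{n} - U_{i,j}^{n-1} + \frac{k^2}{h^2}\left( U_{i+1,j}^{n} + U_{i-1,j}^{n} + U_{i,j+1}^{n} + U_{i,j-1}^{n} - 4U_{i,j}^{n} \right). \]
Almost the same as the heat equation, but an extra texture is needed to store the value at time level n-1.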
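The role of the extra time level can be seen in a CPU sketch of the update. The function name `wave_step`, the row-major layout and the fixed boundary cells are assumptions for illustration, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* One time step of the wave scheme on an nx x ny grid stored row-major,
 * with r2 = k^2 / h^2. u_prev holds time level n-1, u holds level n,
 * and u_next receives level n+1. Boundary cells keep the value from
 * level n (assumed boundary treatment). */
void wave_step(const float *u_prev, const float *u, float *u_next,
               size_t nx, size_t ny, float r2)
{
    assert(nx >= 3 && ny >= 3);
    for (size_t j = 0; j < ny; ++j) {
        for (size_t i = 0; i < nx; ++i) {
            size_t c = j * nx + i;
            if (i == 0 || j == 0 || i == nx - 1 || j == ny - 1) {
                u_next[c] = u[c];
                continue;
            }
            u_next[c] = 2.0f * u[c] - u_prev[c]
                      + r2 * (u[c - 1] + u[c + 1]
                            + u[c - nx] + u[c + nx]
                            - 4.0f * u[c]);
        }
    }
}
```

Between time steps the three buffers can simply be rotated (u_prev takes u, u takes u_next), which on the GPU corresponds to switching which textures are read and which render target is written.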
2nd order high-resolution scheme
Systems of Conservation Laws
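Fundamental laws of physics: conservation of quantities like mass, momentum and energy.
In arbitrary space dimension this reads:
\[ Q_t + \nabla\cdot f(Q) = 0, \qquad Q(x,0) = Q_0(x). \]
Example: Shallow water equations
\[
\begin{bmatrix} h \\ hu \\ hv \end{bmatrix}_t
+ \begin{bmatrix} hu \\ hu^2 + \tfrac{1}{2}gh^2 \\ huv \end{bmatrix}_x
+ \begin{bmatrix} hv \\ huv \\ hv^2 + \tfrac{1}{2}gh^2 \end{bmatrix}_y
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]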
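The two flux vectors of the shallow water equations can be evaluated directly from the conserved variables (h, hu, hv). The sketch below is only an illustration: the type `SweState` and the functions `swe_flux_x` and `swe_flux_y` are hypothetical names, and the gravitational constant g is passed in as a parameter.

```c
#include <assert.h>

/* Conserved variables of the shallow water equations (hypothetical type). */
typedef struct {
    double h;   /* water depth          */
    double hu;  /* momentum in x        */
    double hv;  /* momentum in y        */
} SweState;

/* Flux in the x-direction: (hu, hu^2 + g h^2 / 2, huv). */
SweState swe_flux_x(SweState q, double g)
{
    assert(q.h > 0.0);              /* dry states are not handled here */
    double u = q.hu / q.h;
    SweState f = { q.hu, q.hu * u + 0.5 * g * q.h * q.h, q.hv * u };
    return f;
}

/* Flux in the y-direction: (hv, huv, hv^2 + g h^2 / 2). */
SweState swe_flux_y(SweState q, double g)
{
    assert(q.h > 0.0);              /* dry states are not handled here */
    double v = q.hv / q.h;
    SweState f = { q.hv, q.hu * v, q.hv * v + 0.5 * g * q.h * q.h };
    return f;
}
```

In a fragment shader the three components fit naturally into one 4-component vector, so a single texture fetch returns the whole state of a cell.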
Lax-Friedrichs
Speedup - Lax-Friedrichs
Runtime per time step and speedup factor for the CPU versus GPU implementation of Lax-Friedrichs

N            CPU* (ms)   GPU** (ms)   Speedup
128x128         2.22        0.23        9.53
256x256         9.09        0.46       19.8
512x512        37.10        1.47       25.2
1024x1024     148.00        5.54       26.7

* 2.8 GHz Intel Xeon (EM64T)
** GeForce 7800 GTX (450 MHz)
Semi-Discrete High-Resolution Schemes
Evolution of cell averages described by ODEs
Steps in the algorithms:
Reconstruction of piecewise polynomials from cell averages
Evaluation of reconstruction at integration points
Numerical computation of edge fluxes
Evolution by Runge-Kutta scheme
\[ \frac{d}{dt} U_{ij}(t) = -\frac{F_{i+1/2,j}(t) - F_{i-1/2,j}(t)}{\Delta x} - \frac{G_{i,j+1/2}(t) - G_{i,j-1/2}(t)}{\Delta y} \]
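Once the edge fluxes are known, the right-hand side of the cell-average ODE is a simple difference per cell. The sketch below assumes a hypothetical array layout (stated in the comments) and a hypothetical function name `semi_discrete_rhs`; reconstruction and flux evaluation are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Right-hand side of the semi-discrete scheme for an nx x ny grid.
 * Assumed layout:
 *   F has (nx + 1) * ny entries, F[j * (nx + 1) + i] = F_{i-1/2, j}
 *   G has nx * (ny + 1) entries, G[j * nx + i]       = G_{i, j-1/2}
 *   dudt[j * nx + i] receives d/dt of the cell average in cell (i, j). */
void semi_discrete_rhs(const double *F, const double *G, double *dudt,
                       size_t nx, size_t ny, double dx, double dy)
{
    assert(dx > 0.0 && dy > 0.0);
    for (size_t j = 0; j < ny; ++j) {
        for (size_t i = 0; i < nx; ++i) {
            double dF = F[j * (nx + 1) + i + 1] - F[j * (nx + 1) + i];
            double dG = G[(j + 1) * nx + i] - G[j * nx + i];
            dudt[j * nx + i] = -dF / dx - dG / dy;
        }
    }
}
```

A Runge-Kutta method then advances the cell averages by calling this routine once per stage, recomputing the reconstruction and the edge fluxes from the intermediate state each time.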
2nd order high-resolution - Bottom Topography
[Figure panels: initial wave map; initial bottom map]
2nd order high-resolution - Bottom Topography
Bilinear Interpolation - Dry States
[Figure panels: initial wave map; initial bottom map]
Bilinear Interpolation - Dry States
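Euler equations
The two-dimensional Euler equations model the dynamics of compressible gases:
\[
\begin{bmatrix} \rho \\ \rho u \\ \rho v \\ E \end{bmatrix}_t
+ \begin{bmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ u(E + p) \end{bmatrix}_x
+ \begin{bmatrix} \rho v \\ \rho u v \\ \rho v^2 + p \\ v(E + p) \end{bmatrix}_y
= 0
\]
Here $\rho$ denotes density, u and v velocity in the x- and y-directions, p pressure and E the total energy.
The three dimensional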
Example: Interaction of a low-density bubble
with a shock.
Speedup of 2D shock-bubble on NxN cells

Bilinear reconstruction
N      Intel     6800      speedup   AMD       7800      speedup
128    4.37e-2   3.70e-3   11.8      1.88e-2   1.38e-3   13.6
256    1.74e-1   8.69e-3   20.0      1.08e-1   4.37e-3   24.7
512    6.90e-1   3.32e-2   20.8      2.95e-1   1.72e-2   17.1
1024   2.95e+0   1.48e-1   19.9      1.26e+0   7.62e-2   16.5

CWENO reconstruction
N      Intel     6800      speedup   AMD       7800      speedup
128    1.05e-1   1.22e-2    8.6      7.90e-2   4.60e-3   17.2
256    4.20e-1   4.99e-2    8.4      3.45e-1   1.74e-2   19.8
512    1.67e+0   1.78e-1    9.4      1.03e+0   6.86e-2   15.0
1024   6.67e+0   7.14e-1    9.3      4.32e+0   2.99e-1   14.4
3D Rayleigh-Taylor Instability.
A layer of heavier fluid is placed on top of a lighter fluid and the heavier fluid is accelerated downwards by gravity.

N x N x N grid. Average time per time step.
N     AMD       7800      speedup
49    5.23e-1   4.16e-2   12.6
64    1.14e+0   8.20e-2   13.9
81    1.98e+0   1.72e-1   11.5
Flow chart for a GPU implementation of a semi-discrete, high-resolution scheme.
Saturation equation
The saturation equation models transport of two fluid phases in a porous medium.
Example: water injection into a fluvial reservoir filled with oil.
( ) ,0))()(( 0 =++ gsVsfs t
Water injection in a fluvial reservoir
Linear Advection
Models various transport phenomena of (passive) quantities. A simple equation, but it is difficult to compute solutions correctly.
Classical schemes: excessive smearing or spurious oscillations.
Need for more sophisticated schemes.
[Figure panels: 1st order scheme: Lax-Friedrichs; 2nd order scheme: Lax-Wendroff]
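The two classical schemes can be compared on the one-dimensional model problem u_t + a u_x = 0. The sketch below is a simplified 1D illustration with periodic boundaries, not the 2D GPU implementation behind the slides; the function names are hypothetical and c = a k / h is the Courant number.

```c
#include <assert.h>
#include <stddef.h>

/* One Lax-Friedrichs step (1st order) for u_t + a u_x = 0 with
 * periodic boundaries; c = a * k / h is the Courant number. */
void lax_friedrichs_step(const double *u, double *u_new, size_t n, double c)
{
    assert(n >= 3);
    for (size_t i = 0; i < n; ++i) {
        double left  = u[(i + n - 1) % n];
        double right = u[(i + 1) % n];
        u_new[i] = 0.5 * (right + left) - 0.5 * c * (right - left);
    }
}

/* One Lax-Wendroff step (2nd order) for the same problem. */
void lax_wendroff_step(const double *u, double *u_new, size_t n, double c)
{
    assert(n >= 3);
    for (size_t i = 0; i < n; ++i) {
        double left  = u[(i + n - 1) % n];
        double right = u[(i + 1) % n];
        u_new[i] = u[i] - 0.5 * c * (right - left)
                 + 0.5 * c * c * (right - 2.0 * u[i] + left);
    }
}
```

For |c| <= 1 the Lax-Friedrichs update is a convex combination of the two neighbours, so it never leaves the range of the data but smears a step over more and more cells. Lax-Wendroff keeps the step sharper but overshoots next to it, which is the spurious oscillation mentioned above.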
Linear Advection contd.
Modern high-resolution schemes:
Higher-order approximation of smooth parts and no spurious oscillations at discontinuities.
[Figure panels: dissipative; compressive]
Semi-Discrete High-Resolution Schemes
Evolution of cell averages described by ODEs
Steps in the algorithms:
Reconstruction of piecewise polynomials from cell averages
Evaluation of reconstruction at integration points
Numerical computation of edge fluxes
Evolution by Runge-Kutta scheme
\[ \frac{d}{dt} Q_{i,j}(t) = -\frac{F_{i+1/2,j}(t) - F_{i-1/2,j}(t)}{\Delta x} - \frac{G_{i,j+1/2}(t) - G_{i,j-1/2}(t)}{\Delta y} \]
CAD Intersections on the GPU
Use the GPU to find out whether difficult intersection configurations are present and to create conjectures on the intersection configuration. (SINTEF has applied for a patent.)
Surface intersection
  check for loops
  check for singular intersections
Surface self-intersection
  Is a self-intersection possible?
  Global self-intersection? Two different parts of the surface intersect.
  Local self-intersection loop?
  Ridges with vanishing or near-vanishing surface normal, possibly combined with self-intersections.
The GPU can help to determine whether the self-intersection needs advanced intersection algorithms or whether simpler approaches can be used.
Global Self-intersection
Two different parts of the surface intersect. Proper subdivision changes the problem into a transversal intersection between two surfaces.
Trace of intersection in parameter domain
Self-intersection with ridges (vanishing normal)
The ridges do not in general follow constant parameter lines.
Typical for offset surfaces, duct-type surfaces and draft-angle surfaces.
Cannot be converted to a near-transversal intersection.
[Figure labels: vanishing surface normal; self-intersection curve. Example made by think3.]
Partial trace of intersection in parameter domain
Ridges in self-intersection
Pipe surface
Computationally expensive to solve by recursive subdivision.
[Figure labels: points on ridge curve; parts of self-intersection. Example made by think3.]
Partial trace of intersection in parameter domain
A surface (cubic) & normal surface (quintic)
GPU calculated and visualized points on
normal surface close to the origin
The regions with near vanishing normal
visualized as texture on original surface
Observations
Most surfaces do not have self-intersections.
  Important to identify most surfaces without self-intersections as early as possible.
  Moderate simplistic subdivision makes the surface and normal-cone boxes smaller (O(h^2) convergence), and will classify many possible self-intersections as not self-intersecting.
  For a surface with no vanishing surface normal, all self-intersection curves intersect a boundary.
Vanishing surface normals identify regions with more complex self-intersection topology.
  Moderate simplistic subdivision reduces the size of such regions and makes it possible to focus a more complex self-intersection algorithm on surface sub-regions.
Comparison of CPU and GPU implementation of simplistic subdivision
Initial process for surface self-intersection:
  Subdivision of a bicubic Bezier patch and the (quintic) patch representing the surface normals into 2^n x 2^n subpatches
  Test for degenerate normals for subpatches
  Computation of the approximate normal cones for subpatches
  Computation of the approximate bounding boxes for subpatches
  Computation of bounding box pair intersections for subpatches
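The last two steps can be sketched on the CPU as follows. This is only one simple way to do it and not necessarily the method used in the project: the box is taken as the axis-aligned bounding box of a subpatch's control points, which encloses the subpatch because a Bezier patch lies in the convex hull of its control points. The type `Box3` and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Axis-aligned bounding box in 3D (hypothetical type). */
typedef struct {
    double min[3];
    double max[3];
} Box3;

/* Bounding box of `count` control points stored as x0,y0,z0,x1,y1,z1,... */
Box3 box_from_points(const double *pts, size_t count)
{
    assert(pts != NULL && count > 0);
    Box3 b;
    for (int d = 0; d < 3; ++d) {
        b.min[d] = pts[d];
        b.max[d] = pts[d];
    }
    for (size_t i = 1; i < count; ++i) {
        for (int d = 0; d < 3; ++d) {
            double v = pts[3 * i + d];
            if (v < b.min[d]) b.min[d] = v;
            if (v > b.max[d]) b.max[d] = v;
        }
    }
    return b;
}

/* Returns 1 if the two boxes overlap (touching counts), 0 otherwise. */
int boxes_overlap(const Box3 *a, const Box3 *b)
{
    for (int d = 0; d < 3; ++d) {
        if (a->max[d] < b->min[d] || b->max[d] < a->min[d]) {
            return 0;   /* separated along axis d */
        }
    }
    return 1;
}
```

Subpatch pairs whose boxes do not overlap cannot intersect and can be discarded; only the remaining pairs need a closer look.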
CPU/GPU approach
For n
Where are we now, and where do we go?
During the first year (2004) of the project we gained experience with the use of the GPU and with the sort of problems best suited for the GPU.
We try to make the potential of the GPU understandable to personnel outside computer graphics.
  All concepts related to the GPU have a strong computer graphics flavor.
We will continue investigating
  Partial differential equations
  Geometry problems - intersection
  Image processing
  Linear algebra