
Introduction to GPU Programming with GLSL

Ricardo Marroquim
Istituto di Scienza e Tecnologie dell'Informazione
CNR, Pisa, Italy
[email protected]

André Maximo
Laboratório de Computação Gráfica
COPPE – UFRJ, Rio de Janeiro, Brazil
[email protected]

Figure 1. Teapot textured with the SIBGRAPI 2009 logo using the fixed functionality (left), and with a combination of shaders (right).

Abstract—One of the challenging advents in Computer Science in recent years was the fast evolution of parallel processors, especially the GPU – graphics processing unit. GPUs today play a major role in many computational environments, most notably those regarding real-time graphics applications, such as games.

The digital game industry is one of the main driving forces behind GPUs; it persistently elevates the state of the art in Computer Graphics, pushing outstandingly realistic scenes to interactive rates. The evolution of photo-realistic scenes consequently demands better graphics cards from the hardware industry. Over the last decade, the hardware has not only become a hundred times more powerful, but has also become increasingly customizable, allowing programmers to alter some of its previously fixed functionalities.

This tutorial is an introduction to GPU programming using the OpenGL Shading Language – GLSL. It comprises an overview of graphics concepts and a walk-through of the graphics card rendering pipeline. A thorough understanding of the graphics pipeline is extremely important when designing a program for the GPU, known as a shader. Throughout this tutorial, the exposition of the GLSL language and GPU programming details is followed closely by examples ranging from very simple to more practical applications. It is aimed at an audience with little or no knowledge of the subject.

Keywords-GPU Programming; Graphics Hardware; GLSL.

I. INTRODUCTION

The realism of interactive graphics has been leveraged by the growth of the GPU's computational power throughout the years, turning a simple graphics card into a highly parallel machine inside a regular PC. The main motivation for this growth comes from the game community, which increasingly demands realism and, consequently, graphics processing power. To add even more realism, graphics units have also shifted from being simple rendering black boxes to powerful programmable units, allowing a variety of special effects within the graphics pipeline, such as those shown in Figure 1.

One of the challenges of GPU programming is to learn how streaming architectures work. The true power of graphics processing comes from the fact that its primitives, e.g. vertices or pixels, are computationally independent, that is, the graphics data can be processed in parallel. Hence, a large number of primitives can be streamed through the graphics pipeline to achieve high performance.

Stream programming is heavily based on the single instruction multiple data (SIMD) paradigm, where the same instruction is used to process different data. Within the GPU context, the data flows through the graphics pipeline while being processed by different programmable stages, called shaders. Differently from traditional sequential programming, the GPU streaming model forces a fixed data flow through the pipeline stages, i.e. a shader is restricted to dealing with a specific input and output data type at a specific stage of the pipeline. Of the many different rendering stages, only three are currently programmable in hardware: the vertex shader, responsible for transforming vertex primitives; the geometry shader, responsible for building the geometry from vertices; and the fragment shader, responsible for coloring fragments generated by geometric primitives. The graphics pipeline is further explained in Section II.

Despite the limitations, GPU programming is essential for implementing special graphics effects not supported by the fixed pipeline. The power to replace a pipeline stage with a shader represents a major shift of implementation control and freedom to the computer graphics programmer. For real-time applications, such as games, the topic has become so important that it is no longer an extra feature but a necessity. Not surprisingly, the gaming industry represents the main driving force that constantly elevates the graphics hardware's power and flexibility.

GPU programming goes beyond graphics: it is also exploited for completely different purposes, from genome sequence alignment to astrophysical simulations. This usage of the graphics card is known as General Purpose GPU (GPGPU) [1] and is playing such an important role that specific computational languages are being designed, such as nVidia's [2] Compute Unified Device Architecture (CUDA) [3], which allows the non-graphics community to fully exploit the outstanding computational power of GPUs. Although this survey will not cover this area, some aspects of GPGPU are provided in Section VII.

The goal of this survey is to describe the creation of computer generated images from the graphics card side, that is, the graphics pipeline, the GPU streaming programming model, and the three different shaders. The standard graphics library OpenGL [4] and its shading language GLSL [5] are used in order to reach a wider audience, since they are cross-platform and supported by different operating systems.

The remainder of this survey is organized as follows. In Section II, the graphics pipeline is summarized and its basic stages are explained. After that, Section III outlines major aspects of the GPU's history, pointing out important features of its architecture. Section IV briefly describes how the GLSL language integrates with OpenGL and GLUT. The main part of this tutorial is in Sections V and VI, where GLSL is explained by analyzing six examples step by step. The GPGPU topic raised in the introduction is further analysed in Section VII. Finally, conclusions and insights on the future of shader programming wrap up this survey in Section VIII.

II. GRAPHICS PIPELINE

Before venturing into the topic of GPU programming itself, it is important to be familiar with the graphics pipeline. Its main function is rendering, that is, generating the next frame to be displayed. To accomplish this, geometric primitives are sent to the graphics pipeline, where they are processed and filled with pixels (a process called rasterization) to compose the final image. Each stage of the pipeline is responsible for a part of this process, and programmable stages are nothing more than customizable parts that can perform different methods than those offered by the graphics API. Understanding the different parallel paradigms of the rendering pipeline is also crucial for the development of efficient GPU applications.

The rendering pipeline is responsible for transforming geometric primitives into a raster image. The primitives are usually polygons describing a model, such as a triangle mesh, while the raster image is a matrix of pixels. Figure 2 illustrates the rendering pipeline. First, the model's primitives, usually described as a set of triangles with connected points, are sent to be rendered. The vertices enter the pipeline and undergo a transformation which maps their coordinates from the world's reference frame to the camera's. When there are light sources in the scene, the color of each vertex is affected in this first stage by a simple illumination model. To accelerate the rest of the rendering process, clipping is performed to discard primitives that are outside the viewing frustum, i.e. primitives not visible from the current viewpoint. The vertex processing stage is responsible for these transformation, clipping, and lighting operations.

The transformed vertices are then assembled according to their connectivity. This basically means putting the primitives back together. Each primitive is then rasterized in order to compute which pixels of the screen it covers, and for each one a fragment is generated. Each fragment has attributes interpolated from its primitive's vertices, such as color and texture coordinates. A fragment does not yet define the final image pixel color; only in the last stage of the pipeline is it computed, by composing the fragments which fall at its location. This composition may consist of picking the frontmost fragment and discarding the others by means of a depth buffer, or of blending the fragments using some criterion.

Note that the graphics pipeline, as any pipeline, is independent across stages, that is, vertices can be processed at the same time pixels are being processed. In this way, rendering performance (measured in frames per second) can be significantly improved, since while the final pixels of the current frame are being handled, the vertex processors can already start processing the input vertices for the next frame. Besides being independent across stages, the graphics pipeline is also independent inside stages: different vertices can be transformed at the same time by different processors, as they are computationally independent among themselves.


Figure 2. The basic stages of the rendering pipeline.

Figure 3. The data flow inside the graphics pipeline.

Figure 3 illustrates the three shader stages of the graphics pipeline, where they are represented by the programmable stream (arrows in green), in contrast to the fixed built-in stream (arrows in gray). The three programmable stages (green boxes) have read access to the video memory, while the fixed stages (gray boxes) access it for input and output.

The vertex shader is further detailed in Figure 4. This first stage treats only vertex primitives in a very strict input/output scheme: exactly one vertex enters and exactly one vertex exits. The processing is done in a completely independent way; no vertex has information about the others flowing through the pipeline, which allows many of them to be processed in parallel.

In the next stage, depicted in Figure 5, the geometry shader processes primitives formed by the connected vertices, such as lines or triangles. At this stage, all the vertex information belonging to the primitive may be made available. The geometry shader may receive as input a different primitive type than the one it outputs; however, information such as the input and output primitive types and the maximum number of emitted vertices must be pre-defined by the user. Each input primitive may generate from zero up to the defined maximum number of output vertices.

Figure 4. Pipeline Vertex Shader.

Figure 5. Pipeline Geometry Shader.

Finally, the fragment shader, illustrated in Figure 6, runs once for each fragment generated by each primitive. It has no information about which primitive it belongs to, nor about other fragments. However, it receives all the attributes interpolated from the primitive's vertices. The shader's main task is to output a color for each fragment, or to discard it.


Figure 6. Pipeline Fragment Shader.

III. THE EVOLUTION OF GPUS

One of the biggest evolutionary steps in GPU history occurred in 1999, when nVidia designed graphics hardware capable of performing vertex transformation and lighting operations, advertising their GeForce 256 as "the world's first GPU". At first, many were sceptical about its usefulness, but it soon became the new paradigm in graphics hardware. Heavy graphics processing operations were previously done in software, except for some expensive high-end graphics cards that could perform them in a dedicated chip; by contrast, with the GeForce everything was integrated in a single-chip processor, which allowed for a major price reduction. Followed by the GeForce 2 and ATI's Radeon 7500, these cards framed the so-called second generation of GPUs, allowing more powerful effects such as multi-texturing.

Nonetheless, hardware had only been designed to improve graphics performance, and the rendering process was still based on built-in fixed functionalities, i.e. hard-coded functions on graphics chips. The major drawback was that moving from software to hardware traded implementation freedom for speed, restricting the achievable effects. In 2001 this drawback started to be overcome with the first programmable graphics card: the GeForce 3 introduced a programmable processor for any per-vertex computation. Small assembly codes could be uploaded to the graphics card, replacing the fixed functionality. These were known as vertex shaders and framed the third GPU generation.

In the next two years, a programmable processor for any per-fragment computation was introduced. The new shader replaces fragment operations, allowing better illumination effects and texture techniques. Note that even though per-fragment operations influence the pixel colors, the final pixel composition is, up to now, still a fixed functionality of GPUs. The fragment shader is also named pixel shader, but this survey will use the former nomenclature.

The fourth generation of GPUs not only introduced the fragment shader: the maximum number of instructions per shader was also raised, and conditional branches were now allowed. The fast pace of GPU evolution is evidenced even more by the introduction of the GeForce 6 and Shader Model 3.0 in 2004. With this shader model also came different high-level shading languages, such as OpenGL's GLSL [5], nVidia's Cg [6], and Microsoft's HLSL [7]. Shader Model 3.0 allows for longer shaders, better flow control, and texture access in the vertex shader, among other improvements.

In 2006, Shader Model 4.0, also called the Unified Shader Model, introduced a powerful GPU programming and architecture concept. First, a new programmable stage in the graphics pipeline was introduced: the geometry shader. This is, up until now, the last programmable shader available in hardware; it allows manipulation and creation of new graphics primitives, such as points, lines and triangles, directly inside the graphics card. Second, the three shader types were bound to a single instruction set, enabling the new concept of a Unified Shader Architecture. The new GPU architecture, starting with the GeForce 8 and ATI's Radeon R600, allows a more flexible usage of the graphics pipeline, where more processors can be dedicated to demanding stages of the pipeline. While the old architectures had a fixed number of processors and resources throughout the graphics pipeline, with the unified model the system can allocate resources to the computationally demanding shaders.

The graphics card is now considered a massively parallel co-processor of CPUs, not only for graphics programs but for any general purpose application. The non-graphics community has been using GPUs since 2002 under the concept of GPGPU [1]; however, the new architecture allows the design of non-graphical languages, such as CUDA [3] and OpenCL [8], to fully exploit the highly parallel and flexible computation power of present GPUs. This concept will be further discussed in Section VII.

The history of GPUs points to a future where graphics cards will be fully programmable, without restrictions on shader programmability and resource usage. Last year, two new types of shader were announced with Shader Model 5.0, to be implemented in graphics hardware this year (2009): the hull and domain shaders, which give better control over the tessellation procedure. This survey will not stray to these new shaders, focusing only on the three shaders currently available in hardware. Beyond the evolution of GPUs, it seems that the future holds the end of sequential single-processor computation, in favor of massively parallel programming models. Another indication of this path is the Cell Broadband Engine Architecture (CBEA) released by IBM, Sony and Toshiba, and the recent announcements of Intel's Larrabee architecture, where a major convergence of CPU and GPU pipelines and design strategies is promised.


IV. OPENGL BASICS

This survey uses GLUT – the OpenGL Utility Toolkit [9] – a simple platform-independent toolkit for OpenGL [4]. GLUT is used to create the OpenGL context, that is, it opens a graphics window for rendering and handles the user's inputs. We have chosen GLUT for its popularity and simplicity, but other options include Qt [10] and wxWidgets [11].

Listing 1 shows a simple example of GLUT usage: after several basic initializations, the OpenGL window is created and the following callbacks are registered: the reshape and display functions, called when the window is resized or its display contents need to be updated; and the keyboard and mouse functions, called when such user input occurs. At the end, the program enters the main event loop, where it waits for events such as window draw calls and keyboard hits.

#include <GL/glut.h>

// C main function
int main(int argc, char** argv) {
  // GLUT initialization
  glutInit(&argc, argv);
  glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA);
  glutInitWindowSize(512, 512);
  // Create OpenGL window
  glutCreateWindow("Simple Window");
  init(); // non-GLUT initializations
  // Register callbacks
  glutReshapeFunc(reshape);
  glutDisplayFunc(display);
  glutKeyboardFunc(keyboard);
  glutMouseFunc(mouse);
  // Event loop
  glutMainLoop();
  return 0;
}
/// The result is a window with 512x512 pixels

Listing 1. Simple GLUT Example.

GLUT is used to create a layer between the graphics functions and the window management system, namely the OpenGL context. Once the context is created, OpenGL functions can be used for rendering. Moreover, OpenGL has a number of functions to manage shader codes in its native library. Despite the existence of newer versions, this survey uses OpenGL version 2.1, as well as GLSL 1.2 and GLUT 3.7, which are sufficient to implement the introductory concepts.

The OpenGL functions to manage shader codes are illustrated in Listing 2. The initShader function in this listing is called among the OpenGL initialization calls, e.g. inside the init function shown in Listing 1, where the goal is to initialize the shaders. In this example, only a vertex shader is set in order to illustrate the shader management. Nevertheless, the functions shown here can be used for a geometry or fragment shader as well. The source code of a shader to be uploaded is simply a string and can be declared together with the application, as done in this example, or saved in a file to be read at run-time. The vertex shader code inside the vsSource string is the corresponding Hello World shader, and it will be explained in more detail in Section V.

// OpenGL initialization calls for shaders
void initShader() {
  // Vertex shader source code
  const GLchar* vsSource =
    "#version 120\n"
    "void main(void) {\n"
    "  gl_FrontColor = gl_Color;\n"
    "  gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;\n"
    "}";
  // Create program and vertex shader objects
  programObject = glCreateProgram();
  vtxShader = glCreateShader(GL_VERTEX_SHADER);
  // Assign the vertex shader source code
  glShaderSource(vtxShader, 1, &vsSource, NULL);
  // Compile the vertex shader
  glCompileShader(vtxShader);
  // Attach vertex shader to the GPU program
  glAttachShader(programObject, vtxShader);
  // Create an executable to run on the GPU
  glLinkProgram(programObject);
  // Install vertex shader as part of the pipeline
  glUseProgram(programObject);
}
/// The result is a vertex shader acting as a
/// simplified version of the fixed functionality

Listing 2. OpenGL setup example for GLSL.

The glCreateProgram function creates a GPU program object to hold the shader objects. The glCreateShader function creates a shader object to maintain the source code string, while the glShaderSource function assigns a given source code to the created shader. The glCompileShader function compiles the shader object, while the glAttachShader function attaches it to a target program object. Note that, in the example shown in Listing 2, only a vertex shader is attached to the program object; consequently, the rest of the pipeline remains running the fixed functionality.

Finally, the glLinkProgram function links the program, creating the final executable to be processed on the GPU. To switch among program objects and the fixed functionality, the glUseProgram function is used. This function must be called before rendering, in order to change the pipeline execution accordingly.
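As a minimal sketch of this switching mechanism (drawScene is a hypothetical placeholder for the application's actual draw calls):

glUseProgram(programObject); // custom shaders active
drawScene();                 // hypothetical application draw calls
glUseProgram(0);             // passing 0 restores the fixed functionality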

Compiling and linking shaders does not raise errors as normally happens in applications; the error/success states must be queried using OpenGL functions such as glGetProgramInfoLog.
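A minimal sketch of such a query, using the standard OpenGL 2.x status and log functions on the objects created in Listing 2:

GLint status;
// Query the compile status of the vertex shader
glGetShaderiv(vtxShader, GL_COMPILE_STATUS, &status);
if (status == GL_FALSE) {
  GLchar log[1024];
  glGetShaderInfoLog(vtxShader, sizeof(log), NULL, log);
  fprintf(stderr, "compile error: %s\n", log);
}
// Query the link status of the program object
glGetProgramiv(programObject, GL_LINK_STATUS, &status);
if (status == GL_FALSE) {
  GLchar log[1024];
  glGetProgramInfoLog(programObject, sizeof(log), NULL, log);
  fprintf(stderr, "link error: %s\n", log);
}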

V. GPU PROGRAMMING WITH GLSL

In this section, GPU programming and the OpenGL Shading Language – GLSL – are introduced. GLSL is a high-level language based on C/C++ that can be used to write shader codes. The three types of shaders discussed in this survey are the vertex, geometry and fragment shaders. However, this section covers only the overall aspects of the language; for more detailed information, refer to Bailey and Cunningham's newly released book Graphics Shaders [12], as well as to the OpenGL Orange Book [13] for GLSL references.

The strategy chosen here is to explain the language by showing GLSL code examples and, for each piece of code, detailing types and functions among other aspects. The examples grow in difficulty, ranging from very simple "hello world" examples to more sophisticated ones, such as Phong shading and environment mapping.

A. Hello World

The first example is a GPU Hello World that illustrates the three available shaders, showing a simple loop and presenting different GLSL types. Listing 3 shows the vertex shader. The full set of operations performed by the fixed functionality includes a set of transformations, clipping and lighting; however, the goal here is to accomplish a simplified result where only the position transformation is performed. More specifically, each vertex position is transformed from the world's coordinate system to the camera's system. This transformation uses the model-view and projection matrices (refer to the OpenGL Red Book [14] for more information about transformation matrices).

#version 120
// Vertex shader main
void main(void) {
  // Pass vertex color to next stage
  gl_FrontColor = gl_Color;
  // Transform vertex position before passing it
  gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}

Listing 3. Hello World Vertex Shader.

The first line indicates the minimum GLSL version required for this shader, in this case version 1.2. From now on, this version is always used and the directive is omitted. Ignoring comments, the next line is a function declaration. In any shader, the main function is obligatory and serves as the entry point. In the next two lines, the vertex's color is passed to the next stage and the position is transformed to screen space. Note that the right side of the last line produces exactly the same result as calling the built-in function ftransform(), which shall be used throughout the rest of the text. Names starting with gl_ represent GLSL standard names and have different meanings, explained later.

Next in the pipeline is the geometry shader, where each geometric primitive is processed and one or more primitives are output, which may be of a different type than the one that entered. For example, a geometry shader to decompress primitives could be defined over points and produce triangles, where for each point several triangles can be generated. As stated previously, the input and output geometry types, as well as the maximum number of vertices on the output, have to be defined by the OpenGL application before linking the shader program and after compiling it, as sketched below.
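On the application side, this setup might look as follows under the GL_EXT_geometry_shader4 extension (a sketch only: geomShader is assumed to be a compiled geometry shader object created as in Listing 2, and the primitive types and vertex count are illustrative):

// Attach the geometry shader, then declare its input/output
// primitive types and maximum emitted vertices before linking
glAttachShader(programObject, geomShader);
glProgramParameteriEXT(programObject, GL_GEOMETRY_INPUT_TYPE_EXT, GL_TRIANGLES);
glProgramParameteriEXT(programObject, GL_GEOMETRY_OUTPUT_TYPE_EXT, GL_TRIANGLE_STRIP);
glProgramParameteriEXT(programObject, GL_GEOMETRY_VERTICES_OUT_EXT, 3);
glLinkProgram(programObject);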

Listing 4 shows the geometry shader for our Hello World example. The goal here is to simply pass the relevant information forward, that is, read the input values from the vertices and write them to the output primitives.

#extension GL_EXT_geometry_shader4 : enable
// Geometry shader main
void main(void) {
  // Iterate over all vertices in the input primitive
  for (int i = 0; i < gl_VerticesIn; ++i) {
    // Pass color and position to next stage
    gl_FrontColor = gl_FrontColorIn[i];
    gl_Position = gl_PositionIn[i];
    // Done with this vertex
    EmitVertex();
  }
  // Done with the input primitive
  EndPrimitive();
}

Listing 4. Hello World Geometry Shader.

The first line enables the geometry shader extension as defined by Shader Model 4.0. This directive is always required when using a geometry shader, and it is omitted in the next examples. The main function body contains a loop iterating over the number of vertices in the input primitive. This number is a constant pre-defined by GLSL and has the same value for all the geometric primitives, depending on the input geometry type. Each loop iteration receives the color and position of the corresponding input vertex and assigns them to the color and position of the output vertex. The EmitVertex function finishes the output vertex, while the EndPrimitive function finishes the primitive. In the geometry shader, only one primitive type can be received as input, with a constant number of vertices.

Listing 5 shows the fragment shader, the last programmable step. The only action here is the assignment of a color to the fragment. The input color of each fragment is computed by the rasterization stage, where the primitives produced by the geometry shader are filled with fragments. For each fragment, all the vertices' attributes are interpolated, such as the color.

// Fragment shader main
void main(void) {
  // Pass fragment color
  gl_FragColor = gl_Color;
}

Listing 5. Hello World Fragment Shader.

In this last shader, the gl_Color name relates to a different built-in type than the one shown in the vertex shader (see Listing 3). A built-in type is a graphics functionality, made available by OpenGL or by the graphics hardware, to provide access to pre-defined values inside or outside the rendering pipeline. In the vertex shader case, gl_Color was assigned outside the pipeline as a vertex attribute, while in the fragment shader case it was defined inside the pipeline as an interpolated value. The GLSL language has five different built-in types:

• built-in constants – variables storing hardware-specific constant values, such as the maximum number of lights. These values can be accessed by any shader and may change depending on the graphics card.

• built-in uniforms – values passed from the OpenGL application to one or more shaders. These values are OpenGL states, such as the projection matrix, and must be set before rendering. Inside a draw call the uniforms remain fixed.

• built-in attributes – values representing an attribute of a vertex, such as color. Like uniforms, the attributes are passed from the OpenGL application to (and only to) the vertex shader. However, unlike uniforms, the attributes may change for each vertex.

• built-in varying variables – variables used to convey information throughout the rendering pipeline. For instance, the vertex color can be passed from the vertex shader to the geometry shader and then to the fragment shader. Note that, even though the vertex color enters the pipeline as a vertex attribute, it flows through the pipeline as output and input varying variables.

• built-in functions – basic and advanced functions for different types of operations. The basic functions, such as sin and cos, can be used in any shader, while the advanced functions, such as the geometry shader's function to emit a primitive, may only be used in a specific shader.

Listing 6 illustrates one example of each built-in type described. The data types are similar to those of the C language, e.g. int and float, with some additions explained later.

const int gl_MaxLights;
uniform mat4 gl_ProjectionMatrix;
attribute vec4 gl_Color;
varying vec4 gl_FrontColor;
genType sin(genType);

Listing 6. Built-in types examples.

The genType above may represent one of the following types: float, vec2, vec3 or vec4. The vec data type is a special GLSL feature that allows easy manipulation of vectors. It also allows the processor to optimize vector operations by performing them in a parallel element-wise fashion.
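A small isolated illustration of this element-wise behavior (plain GLSL, not tied to any pipeline stage):

vec3 a = vec3(1.0, 2.0, 3.0);
vec3 b = vec3(0.5);      // shorthand for vec3(0.5, 0.5, 0.5)
vec3 c = a * b + sin(a); // multiplication and sin apply element-wise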

All the built-in types are represented in the previous three shaders of this first example. For instance, the gl_Color name from the vertex shader is a built-in attribute, while the gl_Color from the fragment shader is a built-in input varying variable. Other examples include a built-in constant in the geometry shader, gl_VerticesIn, a built-in uniform in the vertex shader, gl_ModelViewProjectionMatrix, and two built-in functions within the geometry shader, EmitVertex and EndPrimitive.

Figure 7 shows the difference between the fixed functionality and the Hello World shader program. The normal pipeline does lighting computation, which enhances the details of the teapot model and yields a 3D appearance. On the other hand, the three presented shaders of this first example aim to simplify the fixed functionality and simply carry on the blue color of the vertices, losing, for instance, the lighting effect and the sensation of depth.

Figure 7. Difference between the fixed functionality and the Hello World shaders. The 3D appearance is lost due to the lack of lighting computation in the vertex shader.

B. Cartoon Effect

The second example is a GPU Cartoon Effect that illustrates the use of normals and lighting, and mimics a cartoonist's drawing style. This effect is achieved by rendering the model with an extremely reduced color palette (4 shades of blue in this example). The color is chosen depending on the intensity of direct lighting arriving at the surface position. Listing 7 shows the vertex shader, the first step towards achieving this Cartoon Effect.

// Output vertex normal to fragment shader
varying out vec3 normal;
void main(void) {
  // Compute normal per-vertex
  normal = normalize(gl_NormalMatrix * gl_Normal);
  gl_FrontColor = gl_Color;
  // Transform position using built-in function
  gl_Position = ftransform();
}

Listing 7. Cartoon Effect Vertex Shader.

In the vertex shader, two new lines are added with respect to the previous example. The first is a user-defined output varying variable called normal, responsible for carrying the per-vertex normal vector to the fragment shader. The other new line assigns a value to this varying variable by multiplying the vertex normal by its corresponding matrix. The normal matrix transforms normal vectors just as the model-view matrix transforms vertex positions. The last line computes the vertex position using the built-in function ftransform(), which does the same computation as the last line of the Hello World vertex shader (see Listing 3). More information about the normal matrix and built-in functions can be found in the Red Book [14].
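For reference, the normal matrix is the inverse transpose of the upper-left $3\times3$ submatrix of the model-view matrix $M$:

$$ N = \left( M_{3\times3} \right)^{-T} $$

It reduces to $M_{3\times3}$ itself when the model-view transformation contains only rotations and translations.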

The next and last step of the Cartoon Effect is the more involved fragment shader, shown in Listing 8. It uses the normal from the vertex shader and a built-in uniform defining the light position.

// Input vertex normal from vertex shader
varying in vec3 normal;
void main(void) {
  // Compute light direction
  vec3 ld = normalize(vec3(gl_LightSource[0].position));
  // Compute light intensity at the surface
  float ity = dot(ld, normal);
  // Weight the final color in four cases,
  // depending on the light intensity
  vec4 fc;
  if (ity > 0.95)      fc = 1.00 * gl_Color;
  else if (ity > 0.50) fc = 0.50 * gl_Color;
  else if (ity > 0.25) fc = 0.25 * gl_Color;
  else                 fc = 0.10 * gl_Color;
  // Output the final color
  gl_FragColor = fc;
}

Listing 8. Cartoon Effect Fragment Shader.

The first line indicates that the normal is received as a user-defined input varying variable, corresponding to the output varying variable of the vertex shader. Remember that varying variables arriving in fragment shaders have been previously interpolated from the vertices. In contrast, the light source position is directly fetched from the built-in uniform gl_LightSource[0].position, and used to compute the light direction. An intensity value is defined by the dot product between the light direction and the normal vector, i.e. by the angle between these two vectors. The intensity value falls into one of four ranges, and for each one a different shade of the original color is used as the fragment's final color. Figure 8 illustrates the final result.

Figure 8. Teapot rendered using the Cartoon Effect.

C. Simple Texture Mapping

All three shaders provide access to texture memory, even though earlier versions of programmable graphics cards did not allow texture fetching in the vertex shader, and did not possess geometry shaders at all. The texture coordinates can be passed per vertex and interpolated to be accessed per fragment; however, any shader is free to access any position in any of the available textures. A texture is made available by setting it as a uniform variable and then calling one of the straightforward sampler2D functions to access it (see the GLSL reference [5] for further details on these functions).

Nevertheless, there are a few considerations that should be taken into account when designing a shader. First, the GPU is optimized for accessing textures in a more or less sequential manner; accessing cached memory is thus extremely fast compared to random texture fetches, but the GPU cache memory is usually quite small. For this reason, linear algebra operations, such as adding two large vectors, run extremely fast.

Another important point is that, since the GPU is optimized for arithmetically intense applications, it is often more worthwhile to recompute a value than to store it in a texture for later access. The texture fetch latency is to some extent taken care of by the GPU: it switches between fragment operations when a fetch command is issued, i.e. it may start working on the next fragment while fetching the information for the current fragment. Therefore, as long as there are enough arithmetic operations to hide the fetch latency, the application will not be limited by the memory latency.

When texture coordinates are passed per vertex during the API draw call, they can be transformed within the vertex shader and retrieved after interpolation by the fragment shader. Texture coordinates are automatically made available without the need to define new varying variables. The following shaders exemplify this operation.

void main(void) {
  // Pass texture coordinate to next stage
  gl_TexCoord[0] = gl_TextureMatrix[0] * gl_MultiTexCoord0;
  // Pass color and transformed position
  gl_FrontColor = gl_Color;
  gl_Position = ftransform();
}

Listing 9. Simple Texture Vertex Shader.

The only difference from the vertex shader of the previous example is the replacement of the normal by a texture coordinate using the built-in varying variable gl_TexCoord. The computation is similar: the input texture coordinates are transformed by the texture matrix and stored in a varying variable. The limit on the number of input texture coordinates gl_MultiTexCoordN per vertex is imposed by the graphics hardware.
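If needed, this hardware limit can be queried at run time from the application; a one-line sketch using the standard OpenGL 2.0 query constant:

GLint maxTexCoords;
// Maximum number of texture coordinate sets per vertex
glGetIntegerv(GL_MAX_TEXTURE_COORDS, &maxTexCoords);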

// User-defined uniform to access texture
uniform sampler2D texture;
void main(void) {
  // Read a texture element from a texture
  vec4 texel = texture2D(texture, gl_TexCoord[0].st);
  // Output the texture element as color
  gl_FragColor = texel;
}

Listing 10. Simple Texture Fragment Shader.

Within the fragment shader, the texture defined as a uniform variable within the OpenGL API is accessed, and the fetched value is used as the final color. The built-in function texture2D accesses the texture, while gl_TexCoord[0].st is a 2D vector containing the input texture coordinates. The .st field returns the first two components of the 4D vector gl_TexCoord[0], whereas .stuv returns the full vector. Any vector can be accessed using the following component sets: .xyzw, .rgba or .stuv. These components depend only on the semantics intended for the vector; for example, a 3D point in space can use .xyz coordinates, while the same vec3 can be used as a three-channel color .rgb. Additionally, these components can be swizzled by accessing them in a different order, such as the blue-green-red channels of a color, .bgr.

Figure 9. Applying texture to a model within the fragment shader.

The result of applying a texture inside the fragment shader is illustrated in Figure 9.
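For completeness, a sketch of the application-side calls that bind a texture to the sampler2D uniform of Listing 10 (texId is assumed to hold a texture object already filled with glTexImage2D):

// Bind the texture to texture unit 0
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, texId);
// Point the "texture" uniform of Listing 10 at unit 0
GLint loc = glGetUniformLocation(programObject, "texture");
glUniform1i(loc, 0);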

D. Phong Shading

One nice example that helps to put together the basics of GPU programming is Phong Shading. OpenGL performs built-in Gouraud shading when the GL_SMOOTH flag is set, where illumination is calculated per vertex, and the vertex colors are interpolated across the fragments. In this model, normal variations inside the triangle are not really well accounted for, and sometimes the borders between triangles are very evident, giving an unpleasant rendering result. A better approach is to interpolate the normals inside the triangle, and then compute the illumination per fragment. Surely it is more computationally involved, but the trade of speed for quality is usually well rewarded.

The previous examples showed that it is possible to customize other attributes to be passed from the vertex to the fragment shader as varying variables. To get our Phong Shading working, we need to pass the vertex normals as varying variables to the fragment shader, as well as transfer the illumination computation from the vertex to the fragment shader.

The vertex shader below only transforms the normal and vertex to camera coordinates. Note that the vert varying variable is only transformed by the model-view matrix, as we will need it to compute the direction from the light source:

// Output vertex normal and position
varying out vec3 normal, vert;
void main(void) {
  // Store per-vertex normal for the fragment shader
  normal = normalize(gl_NormalMatrix * gl_Normal);
  // Compute vertex position in model-view space
  // to be used in the fragment shader
  vert = vec3(gl_ModelViewMatrix * gl_Vertex);
  // Pass color
  gl_FrontColor = gl_Color;
  // Pass transformed position
  gl_Position = ftransform();
}

Listing 11. Phong Vertex Shader

Since the normals were passed as varying variables, they are also interpolated and accessible in the fragment shader. The code below (Listing 12) performs the illumination within the fragment shader:

// Additional input from vertex shader:
// vertex normal and position
varying in vec3 normal, vert;
void main(void) {
  // Compute light and eye direction
  vec3 lp = gl_LightSource[0].position.xyz;
  vec3 ld = normalize(lp - vert);
  vec3 ed = normalize(-vert);
  // Compute reflection vector based on
  // light direction and normal
  vec3 r = normalize(-reflect(ld, normal));
  // Compute light parameters per fragment
  vec4 la = gl_FrontLightProduct[0].ambient;
  vec4 lf = gl_FrontLightProduct[0].diffuse
            * max(dot(normal, ld), 0.0);
  vec4 ls = gl_FrontLightProduct[0].specular
            * pow(max(dot(r, ed), 0.0),
                  gl_FrontMaterial.shininess);
  // Use light parameters to compute final color
  gl_FragColor = gl_FrontLightModelProduct.sceneColor
                 + la + lf + ls;
}

Listing 12. Phong Fragment Shader

This follows closely how OpenGL computes illumination per vertex (for more details refer to the Red Book [14]), but here we are performing per-pixel illumination. An example of the quality improvement can be observed in Figure 10. Figure 11 illustrates how the data flow for this example is integrated within the rendering pipeline.
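In equation form, the color computed by Listing 12 is

$$ C = C_{\mathrm{scene}} + C_{a} + C_{d}\,\max(\mathbf{n}\cdot\mathbf{l},\,0) + C_{s}\,\max(\mathbf{r}\cdot\mathbf{e},\,0)^{\,s}, $$

where $\mathbf{n}$ is the interpolated normal, $\mathbf{l}$ the light direction, $\mathbf{e}$ the eye direction, $\mathbf{r}$ the reflection of $\mathbf{l}$ about $\mathbf{n}$, $s$ the material shininess, and the $C$ terms are the pre-multiplied light/material products fetched from the built-in uniforms.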

Figure 10. Gouraud shading rendered with GL_SMOOTH (left) and Phong shading rendered with the vertex and fragment shaders (right).


Figure 11. The data flow of the Phong shaders. Note how the lighting operation was postponed from the vertex shader to the fragment shader.

E. Environment Map

A useful texture application that is extremely simple to implement with shaders is environment mapping. The goal is to simulate the reflection of an environment onto the object's surface. The idea behind this technique is to apply a texture in a reverse manner, that is, instead of directly placing a texture on a surface, the reflection of the light is used to map it. One way to do this is to map a common image onto a sphere, giving the illusion it was taken with a fish-eye lens. Fortunately, most image editors have simple filters to perform this operation.

In this example, for each vertex we will compute the texture coordinates for the environment map, much like OpenGL would do using sphere textures. Even though there are better ways to achieve this mapping, such as cube maps, we will follow this model for the sake of simplicity. The original and sphere-mapped images used are shown in Figure 12.

The environment map vertex and fragment shaders are shown in Listings 13 and 14. The vertex shader is responsible for computing the reflection vector used by the fragment shader to access the environment texture.

// Output reflection vector per-vertex
varying out vec3 r;
void main(void) {
  // Pass texture coordinate
  gl_TexCoord[0] = gl_MultiTexCoord0;
  // Compute view direction (normalized vertex
  // position in model-view space)
  vec3 v = normalize(vec3(gl_ModelViewMatrix * gl_Vertex));
  // Compute vertex normal
  vec3 n = normalize(gl_NormalMatrix * gl_Normal);
  // Compute reflection vector
  r = reflect(v, n);
  // Pass transformed position
  gl_Position = ftransform();
}

Figure 12. The original image (left) and the image mapped to a sphere (right) to be used as the environment mapping texture; a point light was also added to the final image. Note that even though the environment texture has completely black regions, they will never be accessed by the shaders.

Listing 13. Environment Map Vertex Shader

In the vertex shader, a reflected vector of the view direction over the normal is computed and passed to the fragment shader. This vector gives the direction of the simulated incoming light from the environment, i.e., it is the point in space we are seeing through the reflection on the specular surface of the object.

// Input reflection vector from vertex shader
varying in vec3 r;
// Texture id to access environment map
uniform sampler2D envMapTex;
void main(void) {
  // Compute texture coordinate using the
  // interpolated reflection vector
  float m = 2.0 * sqrt(r.x*r.x + r.y*r.y
                       + (r.z + 1.0)*(r.z + 1.0));
  vec2 coord = vec2(r.x/m + 0.5, r.y/m + 0.5);
  // Read corresponding texture element
  vec4 texel = texture2D(envMapTex, coord.st);
  // Output texture element as fragment color
  gl_FragColor = texel;
}

Listing 14. Environment Map Fragment Shader

The reflected vector interpolated per fragment is used to fetch the texture. This is done by parametrizing the vector over a circle that matches our fish-eye texture. The result is shown in Figure 13, applying the environment map of Figure 12 to different models. Figure 14 illustrates a different texture for the environment map.
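In equation form, the parametrization computed in Listing 14 maps the interpolated reflection vector $\mathbf{r} = (r_x, r_y, r_z)$ to texture coordinates

$$ m = 2\sqrt{r_x^2 + r_y^2 + (r_z + 1)^2}, \qquad (s,\,t) = \left( \frac{r_x}{m} + \frac{1}{2},\ \frac{r_y}{m} + \frac{1}{2} \right), $$

which is the standard OpenGL sphere-map formula.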

F. Spike Effect

The last example is a GPU Spike Effect that illustrates a special effect using the geometry shader. Listing 15 shows the vertex shader. It is the simplest possible vertex shader, containing only one line to receive a vertex and output it without modification. The goal is to keep the vertex untransformed in order to send it in the world's coordinate system to the geometry shader.

Figure 13. Torus and teapot models rendered with the environment map shaders. Note that the blue comes from the sky in the original image, and not from the teapot color of the other examples.

Figure 14. The teapot rendered with a constant color (left) and a different environment map (right).

void main(void) {
  gl_Position = gl_Vertex; // Pass-thru vertex
}

Listing 15. Spike Vertex Shader

The next step is where the special effect takes place. The geometry shader, shown in Listing 16, receives triangle primitives with untransformed vertices, and creates new primitives, rebuilding the surface to create a spike effect. Each input triangle is broken into three triangles using the centroid, where each new triangle has one-third of the original size. The centroid is displaced by a small offset along the normal direction to create the spike effect.

varying out vec3 normal, vert; // Output to FS
void main() {
  // Store original triangle's vertices
  vec4 v[3];
  for (int i = 0; i < 3; ++i)
    v[i] = gl_PositionIn[i];
  // Compute triangle's centroid
  vec3 c = (v[0] + v[1] + v[2]).xyz / 3.0;
  // Compute original triangle's normal
  vec3 v01 = (v[1] - v[0]).xyz;
  vec3 v02 = (v[2] - v[0]).xyz;
  vec3 tn = -cross(v01, v02);
  // Compute middle vertex position
  vec3 mp = c + 0.5 * tn;
  // Generate 3 triangles using middle vertex
  for (int i = 0; i < gl_VerticesIn; ++i) {
    // Compute triangle's normal
    v01 = (v[(i+1)%3] - v[i]).xyz;
    v02 = mp - v[i].xyz;
    tn = -cross(v01, v02);
    // Compute and send first vertex
    gl_Position = gl_ModelViewProjectionMatrix * v[i];
    normal = normalize(tn);
    vert = vec3(gl_ModelViewMatrix * v[i]);
    EmitVertex();
    // Compute and send second vertex
    gl_Position = gl_ModelViewProjectionMatrix * v[(i+1)%3];
    normal = normalize(tn);
    vert = vec3(gl_ModelViewMatrix * v[(i+1)%3]);
    EmitVertex();
    // Compute and send third vertex
    gl_Position = gl_ModelViewProjectionMatrix * vec4(mp, 1.0);
    normal = normalize(tn);
    vert = vec3(gl_ModelViewMatrix * vec4(mp, 1.0));
    EmitVertex();
    // Finish this triangle
    EndPrimitive();
  }
}

Listing 16. Spike Effect Geometry Shader

In the geometry shader code, the triangle's centroid and normal vector are computed using the original vertex positions. The centroid is then displaced by moving it halfway along the normal direction. The displaced centroid is then used to build three new triangles. The model-view and projection matrices are applied to the original vertices and the displaced centroid to convert them to the camera's coordinate system. Each of the three output triangles is built using a combination of two original vertices plus the centroid.

The Spike Effect example does not have a specific fragment shader. In order to illustrate different shader combinations, one of the two previously defined fragment shaders is used without further modification: Phong Shading or Environment Mapping. This is a powerful feature of shader programming: the developer is free to combine different effects and shaders, obtaining interesting new results.

It is also interesting to note that both shaders discard the vertex color attribute, since it is not used to evaluate the final color of the fragments: Phong Shading uses material properties, while Environment Mapping fetches the color from a texture. Figure 15 shows the final result of applying the Spike Effect with Phong Shading and Environment Mapping.

Figure 15. Spike Effect shader applied in combination with Phong Shading (left) and Environment Mapping (right).


VI. SHADERS SUMMARY

In the last sections we have introduced the GLSL language through a few examples. Since it has a very similar structure to other programming languages, such as C, the real challenge resides in learning how to design the shaders within the rendering pipeline. More specifically, the data flow is one of the most important points to keep in mind, that is, knowing what flows in and out of each stage.

Figure 16. Input/Output summary of the vertex shader.

Figures 16, 17, and 18 illustrate the input/output variables of each shader. Note how attributes coming into the vertex shader may be passed forward as special or varying variables.

Figure 17. Input/Output summary of the geometry shader.

As an example, let us analyze how the color value flows through the pipeline, as summarized in the snippet below. It first enters the vertex shader as gl_Color and leaves as gl_FrontColor; the geometry shader receives each vertex color as gl_FrontColorIn[vertexId] and outputs it again as gl_FrontColor for each emitted vertex; finally, the fragment shader receives the interpolated color as the gl_Color variable and writes the result to gl_FragColor. Figure 19 illustrates this particular flow of the color values.
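Condensed into one line per stage, this color path through the Hello World shaders of Section V-A is:

// Vertex shader: the color enters as an attribute
gl_FrontColor = gl_Color;
// Geometry shader: per-vertex colors arrive in an array
gl_FrontColor = gl_FrontColorIn[i];
// Fragment shader: the interpolated color arrives as gl_Color
gl_FragColor = gl_Color;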

Figure 18. Input/Output summary of the fragment shader.

A. Upcoming Features

Recently, OpenGL 3.0 was released, where many fixed functions are marked as deprecated and, in fact, have already begun to be removed from the API with the newer 3.1 and 3.2 versions. GLSL is also being reformulated, and versions 1.3, 1.4 and 1.5 have been released in a very narrow time frame. The newer versions point to a future where shader programming will not be an option but a requirement for working with the graphics API, since most of the pipeline will have to be written by the programmer himself. At first, this might steepen the learning curve for newcomers to graphics programming, but it will force them to gain a better grasp of how graphics computation is handled, and consequently to design better and more innovative graphics applications.

Another clear evidence of this tendency is the introduction of new programmable stages: the hull and domain shaders, plus the tessellator, which is a configurable stage. However, up until this point, they have not yet been integrated into the hardware and are implemented only in software.

VII. GENERAL PURPOSE GPU

Since the GPU comprises such a powerful parallel architecture, many programmers have been using it for purposes other than graphics, a trend known as GPGPU, or General Purpose GPU. The essence of stream programming is the data, which is a set of elements of the same type. A set of kernels can process the data by operating on the whole stream.

We recall that the graphics hardware cannot handle all types of parallel paradigms, since it is not good at solving flow control issues; on the other hand, it performs extremely efficiently within the streaming paradigm. Multicore processors, for example, often perform different tasks in parallel, which is different from parallelizing a single task. The lack of complex control structures also partially explains why GPU performance has increased at a rate much higher than that of CPUs: it is easier to assemble more processors together if they can act more independently and the global memory access requires little control.

While the GPU has few levels of memory hierarchy, the CPU has different cache levels, swap, and main memory, among other resources. This requires a high level of control, and many transistors must be dedicated to the task; the advantage, however, is that it allows the CPU to significantly reduce latency issues, i.e. the time to fetch information.

Figure 19. The flow of the color variables through the shaders inside the rendering pipeline.

The GPU has an immense parallel computational power; it operates on many primitives at the same time and computes arithmetic operations extremely fast. Nevertheless, it still needs data to compute. Modern graphics hardware has considerably decreased the difference between CPU and GPU memory capacity, with some cards reaching the mark of 1 GB of memory. Even so, a common bottleneck is transferring the data to the graphics card. A large memory capacity partially solves the problem, because in most cases the data can be uploaded once to the GPU and used thereafter, avoiding the slow CPU-GPU transfer. In fact, many games rely heavily on compacting data to be able to fit everything in memory and avoid this problem. On the other hand, general purpose applications usually have to handle this deficiency with other strategies, since even the compressed data may not fit in memory, or the data might be dynamic and change every frame, as with simulation applications.

The CPU-GPU interaction works through a command buffer, and transferring data is one such command. There are two threads in play: one for the CPU, which writes commands, and another for the GPU, which reads them. How fast the commands are written or read determines whether our application is CPU or GPU bound. If the GPU is consuming commands faster than the CPU is writing them, at some point the buffer will be empty, meaning that the GPU will be out of work, and we are CPU bound. On the other hand, if we fill up the buffer, the GPU is not able to handle all the commands fast enough, and we are GPU bound.

Fortunately, the CPU-GPU interaction is handled in a smart way by the API, and we do not have to worry about most of the problems that one might have with bottlenecks from sequentially adding commands. The GPU can process most commands in a non-sequential manner and is able to substantially reduce the latency; for example, it does a very good job of continuing with other tasks when a current job is waiting for some information to arrive.

VIII. CONCLUSION

In this survey, we have presented an introductory walk-through of shader programming using the GLSL language, while at the same time pointing out some important aspects of GPU programming. It is by no means a complete reference, as that cannot be achieved in a few pages. Many surveys with more specific and detailed information are available on the internet [1], [2], [15], [16], [17], [18], [19]. Another source of specialized information is the GPU Gems book series, offering a variety of advanced examples of applications and effects that can only be achieved using the graphics card and programmable shaders [20], [21], [22].

GPU programming is no longer a small specialized niche of computer graphics; it allows the developer to achieve a wider and more efficient variety of visual effects. It is also embedded in a big turn that software development is making as a whole: the "Think Parallel" slogan becomes more prominent every day, be it within GPUs, clusters, Cell or multicore processors.

It is no wonder that over the last decade the GPU's growth curve, in terms of GFLOPS, has been astonishingly steeper than that of the CPUs. In fact, a single CPU processor has undergone little evolution during these last years; what we observe instead is the growth in the number of processors per unit.

The general purpose GPU programming languages, such as CUDA and OpenCL, are unlikely to take over all kinds of graphics hardware programming. Programming shaders will still be a major skill for those working closely with its real objective, which is to modify the graphics pipeline to achieve different, better, and faster visual effects. This is especially true for the game industry, where shaders are heavily employed. As an illustrative example, a modern game may contain a few hundred shaders or more.


To summarize, even though GPU programming was only a few years ago an exotic and highly technical niche of computer graphics, it has evolved into a valuable skill and is by now a requirement for graphics programmers. Exploiting the graphics card's capabilities at their best is much more than learning a new language such as GLSL; it requires a deep understanding of how shaders fit into the graphics pipeline and of how great efficiency can be reached by profiting from the GPU's parallel power.

ACKNOWLEDGEMENT

This work was carried out during the tenure of an ERCIM "Alain Bensoussan" Fellowship Programme by the first author. We also acknowledge the grant of the second author provided by the Brazilian agency CNPq (National Counsel of Technological and Scientific Development), and thank Robert Patro for his fruitful insights on the shader examples.

REFERENCES

[1] “General purpose gpu.” [Online]. Available: http://gpgpu.org/

[2] “nvidia corporation.” [Online]. Available: http://www.nvidia.com/

[3] “nvidia’s cuda.” [Online]. Available: http://www.nvidia.com/object/cudahome.html

[4] “Opengl.” [Online]. Available: http://www.opengl.org/

[5] “Opengl shading language.” [Online]. Available: http://www.opengl.org/documentation/glsl/

[6] “nvidia’s cg.” [Online]. Available: http://developer.nvidia.com/page/cgmain.html

[7] “Microsoft’s hlsl.” [Online]. Available: http://msdn.microsoft.com/en-us/library/bb509561\%28VS.85\%29.aspx

[8] “Opencl.” [Online]. Available: http://www.khronos.org/opencl/

[9] “Glut.” [Online]. Available: http://www.opengl.org/resources/libraries/glut/

[10] “Qt.” [Online]. Available: http://qt.nokia.com/

[11] “wxWidgets.” [Online]. Available: http://www.wxwidgets.org/

[12] M. Bailey and S. Cunningham,Graphics Shaders Theory andPractice. A K Peters, 2009.

[13] R. J. Rost,OpenGL(R) Shading Language (2nd Edition).Addison-Wesley Professional, January 2006.

[14] Opengl, D. Shreiner, M. Woo, J. Neider, and T. Davis,OpenGL(R) Programming Guide : The Official Guide toLearning OpenGL(R), Version 2 (5th Edition). Addison-Wesley Professional, August 2005.

[15] “Lighthouse glsl tutorial.” [Online]. Available: http://www.lighthouse3d.com/opengl/glsl/

[16] “Gpu shading and rendering course.” [Online]. Avail-able: http://old.siggraph.org/publications/2006cn/course03/index.html

[17] “Textures in glsl.” [Online]. Available: http://www.ozone3d.net/tutorials/glsltexturing.php

[18] “Glsl introduction.” [Online]. Available: http://nehe.gamedev.net/data/articles/article.asp?article=21

[19] “Clockwork coders glsl tutorials.” [Online]. Available:http://www.clockworkcoders.com/oglsl/tutorials.html

[20] R. Fernando,GPU Gems: Programming Techniques, Tips andTricks for Real-Time Graphics. Pearson Higher Education,2004.

[21] M. Pharr and R. Fernando,Gpu gems 2: programming tech-niques for high-performance graphics and general-purposecomputation. Addison-Wesley Professional, 2005.

[22] H. Nguyen,GPU Gems 3. Addison-Wesley Professional,2007.