Sample Tweaker
Ocean Fog
Overview
This paper discusses how we optimized an existing graphics demo, Ocean Fog, for our latest processors with Intel®
Integrated Graphics. We achieved a 4x boost in performance (40 FPS to 160 FPS) with little to no loss of fidelity by
applying techniques such as reducing texture sizes and lowering precision. These optimization techniques are not
revolutionary, but knowing when to apply them is more involved. To identify where we could optimize, we used Intel’s
graphics profiler, Intel® Graphics Performance Analyzers (Intel® GPA for short).
We use screenshots of Intel GPA to show how we identified each graphics bottleneck, and then detail how we tried to
optimize or fix those problem areas. Understanding the architecture you are optimizing for helps in deciding how to fix
problem areas, but Intel GPA also lets you run different tests against a problem area to identify the problem and
possible fixes without intimate knowledge of the architecture. In this paper, our tests are labeled "2x2 textures" or
"simple pixel shader". Those tests are built into Intel GPA and do not require any modification of the existing
application.
The purpose of the original Ocean Fog project was to investigate how to effectively render a realistic ocean scene on
differing graphics solutions while trying to provide a good, current, working class set of data to the graphics community.
The ocean was rendered using a projected grid that is displayed orthogonally to the viewer. The vertices of the grid
are displaced using a height field, and Perlin noise was used to generate wave motion. In the original paper, the author
notes that computing Perlin noise was less CPU-intensive than other methods. However, other methods such as
Navier-Stokes work better on the GPU side, and the author mentions that this is worth further investigation. Snell’s law
was used for reflection and refraction. For more information on how the water was rendered, please see Claes Johanson’s
Master’s thesis, Real-time water rendering - Introducing the projected grid concept.
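For reference, Snell's law relates the angle of incidence to the angle of refraction through the refractive indices of
the two media: n1 sin(theta1) = n2 sin(theta2). The sketch below is not taken from the Ocean Fog source. The function
name and the refractive indices (1.0 for air, about 1.33 for water) are assumptions chosen for illustration.

```python
import math


def refraction_angle_deg(incidence_deg, n1=1.0, n2=1.33):
    """Return the refraction angle in degrees, or None on total internal reflection.

    n1 and n2 are the refractive indices of the incident and transmitting media.
    The defaults (air into water) are illustrative assumptions.
    """
    s = n1 / n2 * math.sin(math.radians(incidence_deg))
    if abs(s) > 1.0:
        return None  # no refracted ray exists: total internal reflection
    return math.degrees(math.asin(s))


# Air into water: the ray bends toward the surface normal.
print(refraction_angle_deg(45.0))
# Water into air beyond the critical angle: no refracted ray.
print(refraction_angle_deg(60.0, n1=1.33, n2=1.0))
```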
The fog was also generated using Perlin noise, and its processing was likewise done on the CPU side, by sampling points
in the 3D texture space.
There are two lights in the scene: one infinite (directional) light, and one spotlight casting from the lighthouse.
For further discussion please see: Ocean Fog using Direct3D 10.
Optimization Summary
The original application ran at 40 FPS on our test hardware [1]; after all optimizations, it ran 4x faster, at
160 FPS. CPU utilization went from 8% to 84%, GPU active time from 32% to 85%, and GPU stall time from 53% to 9%.
[1] We used three systems. First, an Intel® microarchitecture codename Sandy Bridge processor-based platform with a 2.4 GHz
processor running 64-bit Microsoft Windows* 7, 4 GB of memory, and an 80 GB solid-state disk. Second, an Intel® Core™ i5 640
processor-based platform with a 3.2 GHz processor running 32-bit Microsoft Windows Vista*, 2 GB of memory, and a Seagate*
7200 RPM 500 GB disk. Third, an Intel® Core™ 2 Duo T7700 processor-based system with a 2.4 GHz processor running 64-bit
Microsoft Windows* 7, 4 GB of memory, an 80 GB solid-state disk, and an NVIDIA Quadro* FX-570 graphics card.
Both map reduction and 16-bit depth provided a 1.5x improvement
The depth change from 32-bit to 16-bit produced a slightly grainier normal map. Reducing the reflection and refraction
maps to 256x256 produced visible pixelation only when the wave amplitude was zero, and thus there was no water
distortion. With any wave amplitude or water distortion, the pixelation could not be seen, and once fog was added, the
difference in image fidelity was no longer visible.
Figure 5 - Reflection/Refraction Map - 32-bit vs. 16-bit
Figure 6 - Reflection/Refraction Map - 256x256 vs. 512x512
Next, the skybox (1024x1024x6) was replaced with a smaller version (256x256x6). Because of the gradient and unfocused
nature of the texture, there was no visible change in fidelity. With all other objects in the scene turned off, this
gave about a 3% FPS increase.
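To put these size and precision reductions in perspective, the following sketch estimates the uncompressed footprint
of the top MIP level of a texture. It is an illustration rather than code from the demo: the function name is
hypothetical, and the four-channel layout and the reading of "32-bit" and "16-bit" as bits per channel are assumptions,
since the exact texture formats are not stated here. The reduction factors do not depend on the channel count.

```python
def texture_bytes(width, height, bits_per_channel, channels=4, faces=1):
    """Uncompressed size in bytes of one MIP level, summed over all faces.

    channels=4 is an assumption; faces=6 models a cube map.
    """
    return width * height * faces * channels * bits_per_channel // 8


# Halving the precision of a 512x512 map (32-bit to 16-bit per channel).
precision_factor = texture_bytes(512, 512, 32) // texture_bytes(512, 512, 16)

# Halving each dimension of a map (512x512 to 256x256) at fixed precision.
size_factor = texture_bytes(512, 512, 16) // texture_bytes(256, 256, 16)

# Skybox cube map, 1024x1024x6 to 256x256x6 (8 bits per channel assumed).
skybox_factor = (texture_bytes(1024, 1024, 8, faces=6)
                 // texture_bytes(256, 256, 8, faces=6))

print(precision_factor, size_factor, skybox_factor)
```

Under these assumptions, halving precision halves the data a texture fetch has to move, while halving both dimensions
cuts it to a quarter, which is consistent with an application that stalls mainly on textures.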
Figure 7 - 1024x1024 vs 256x256 Cubemap
Miscellaneous Optimizations
Removing unnecessary clear calls
Clearing of the reflection and refraction render targets was disabled whenever those effects were unchecked in the GUI.
This raised the frame rate from 52 FPS (water render only) to about 73 FPS. When everything else in the scene was
rendered, the frame rate dropped when reflection was disabled; however, this behavior was also observed with the
original build. While the effects are enabled, clearing must still be done every frame, because the reflection and
refraction maps change as the camera moves.
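The idea can be sketched as a frame loop that skips both the clear and the render pass for any off-screen map whose
GUI checkbox is off. This is a minimal illustration, not the demo's code: RecordingDevice, render_frame, and the
target names are hypothetical stand-ins for the Direct3D 10 device and render targets used by the real application.

```python
class RecordingDevice:
    """Hypothetical stand-in for a graphics device that only records calls."""

    def __init__(self):
        self.calls = []

    def clear(self, target):
        self.calls.append(("clear", target))

    def draw(self, target):
        self.calls.append(("draw", target))


def render_frame(device, enabled_maps):
    """Render one frame, touching only the off-screen maps that are enabled.

    enabled_maps maps a render-target name to the state of its GUI checkbox.
    """
    for target, enabled in enabled_maps.items():
        if not enabled:
            continue  # map is unused this frame, so skip its clear and its pass
        device.clear(target)  # still required every frame while the map is in use
        device.draw(target)
    device.clear("backbuffer")
    device.draw("backbuffer")


device = RecordingDevice()
render_frame(device, {"reflection": False, "refraction": True})
print(device.calls)
```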
MIP Generation
Generating the additional MIP levels for the normal map did not significantly change the frame rate. We thought it
might help, so we ran the experiment anyway.
MIP generation (msec/frame)

                     Normal Map Size
MIPs                 1024x2048, 32 bit   1024x2048, 16 bit   256x512, 16 bit
One                  37.736              18.519              7.168
All eight levels     37.736              18.018              7.220
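For context on what generating the extra levels involves, the sketch below lists the dimensions of a MIP chain and the
total texel count relative to the base level. The function name is hypothetical, and the level count of eight simply
follows the "All eight levels" row of the table. Each level halves both dimensions, so a chain adds at most about one
third to the texel count of the base level.

```python
def mip_chain(width, height, levels):
    """Return (width, height) for each MIP level, halving per level, minimum 1."""
    chain = []
    for _ in range(levels):
        chain.append((width, height))
        width = max(1, width // 2)
        height = max(1, height // 2)
    return chain


chain = mip_chain(256, 512, 8)
base_texels = chain[0][0] * chain[0][1]
total_texels = sum(w * h for w, h in chain)

print(chain)                       # dimensions of every level in the chain
print(total_texels / base_texels)  # texel count of the chain relative to level 0
```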
Offloading GPU work
Because the CPU is not being fully utilized, one possible optimization would be to move more work from the GPU to the
CPU. Disabling the two largest shaders, which compute the height and normal maps, showed about a 14% increase in frame
rate. One possible implementation would be to pass normal and height map information, generated on the CPU, along with
the rest of the vertex data. This could greatly reduce frame time; we thought it would be an interesting experiment,
but we did not have time to try it.
Disabling Shader work (msec/frame)

                Normal Map Size
                256x512    1024x2048
GPU Work        11.050     21.978
No GPU Work     9.709      10.582
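The frame times in the table can be converted to frame rates to check the quoted gain. This is plain arithmetic on the
table values, and the helper names are ours.

```python
def fps(msec_per_frame):
    """Convert a frame time in milliseconds to frames per second."""
    return 1000.0 / msec_per_frame


def fps_gain_percent(before_msec, after_msec):
    """Percentage increase in frame rate when frame time drops from before to after."""
    return (fps(after_msec) / fps(before_msec) - 1.0) * 100.0


# (GPU work, no GPU work) frame times in msec/frame, copied from the table.
measurements = {"256x512": (11.050, 9.709), "1024x2048": (21.978, 10.582)}

for size, (with_gpu, without_gpu) in measurements.items():
    print(size,
          round(fps(with_gpu), 1),
          round(fps(without_gpu), 1),
          round(fps_gain_percent(with_gpu, without_gpu), 1))
```

For the 256x512 normal map this works out to roughly 90.5 FPS versus 103 FPS, a gain of about 14% that matches the
figure quoted in the text. For the 1024x2048 normal map, the frame rate roughly doubles.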
Summary
We achieved a 4x performance improvement in this application by using Intel GPA to identify the GPU bottlenecks and
possible solutions. The application was stalled mainly on textures, so reducing their size and precision gave us a
substantial performance gain. In doing so, we lost some minor visual fidelity, and in some cases we mitigated that loss
by varying other simulation parameters. These optimizations should be considered on a case-by-case basis, because
sometimes you need the extra precision or fidelity to convey to the user what is visually important in your application
or game. The techniques and tools we used can be applied to any graphics application to troubleshoot performance
problems, so consider them on your next optimization adventure.
About the Author
Jeff Laflam is a software engineer in the Intel Software and Services Group, where he supports Intel graphics solutions
in the Visual Computing Software Division.
Optimization Notice
Refer to our Optimization Notice for more information regarding performance and optimization choices in Intel software