How we optimized our Game – Jake & Tess’ Finding Monsters Adventure
Phil Lira, Sr. Staff Engineer (Graphics), @phi_lira
RELEASE TRAILER
https://www.youtube.com/watch?v=STzdj04n7dc
TECHNICAL CHALLENGES
Technical challenges
Many custom shaders and effects
Multiple characters with complex skinning
Our budget is the limit
• Push as much content as possible with smooth gameplay and no overheating
– Can we get the same quality with a similar approach?
– Are we doing something we don’t need to?
What if we hit our budget?
• What happens when we fail?
– Either gameplay or visual quality will be impacted
• When it comes to removing effects, trust is important
OPTIMIZATION PROCESS
Optimization Process
Profile → Optimize → Test
Profile
• Do not make any assumptions.
• A profiler will tell you where the bottleneck is.
Optimize
• Rewrite code to use resources more efficiently
• Often we can fake or simplify effects
• Experience comes into play here.
Test
• Guarantee your tests run under the same conditions
• Did your work reduce overall GPU ms?
How to find our bottleneck?
• Unity comes with a built-in profiler that does most of the work
• We wanted to have more detailed GPU info
– Adreno Profiler: Snapdragon GPUs
– Mali Graphics Debugger (MGD) and DS-5 Streamline: Mali GPUs
Adreno GPU Profiler
How to find our bottleneck?
Disable GL
Frame rate increased?
– No: CPU bound
– Yes: GPU bound (vertex, fragment, or memory)
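The decision flow above can be sketched in a few lines. This is only an illustration: the function name, the 5% tolerance, and the frame rates are assumptions, not values from the talk.

```python
def classify_bottleneck(fps_normal, fps_gl_disabled, tolerance=0.05):
    """Classify a frame as CPU or GPU bound from two frame-rate measurements.

    fps_normal:      frame rate with rendering enabled
    fps_gl_disabled: frame rate with GL calls disabled in the profiler
    tolerance:       relative gain treated as noise (hypothetical threshold)
    """
    if fps_normal <= 0:
        raise ValueError("fps_normal must be positive")
    gain = (fps_gl_disabled - fps_normal) / fps_normal
    # If skipping GPU work makes the frame faster, the GPU was the limit.
    return "GPU bound" if gain > tolerance else "CPU bound"


print(classify_bottleneck(30.0, 58.0))  # GPU bound
print(classify_bottleneck(30.0, 30.5))  # CPU bound
```

Once a frame is classified as GPU bound, the vertex, fragment, and memory counters narrow it down further.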
How to find our bottleneck?
• Vertex
– Number of triangles
– Vertex shader
– Per-vertex lighting
• Fragment
– Fragment shader (instructions / samples)
– Blend ops
– Per-pixel lighting (forward rendering)
• Bandwidth
– Large textures
– Dependent texture reads
– Block resolve (ReadPixels)
CASE STUDY – ROYALE MOON
Case Study – Royale Moon
• Triangles: 106k
• Draw calls: 87
• Overdraw: 2.51x
• Shader stats:
– Up to 160 ALU/Frag
– Up to 7 texture samples
• Adreno %Time Shading Fragment: max
– Fragment bound
Overdraw Debug
Case Study – Royale Moon
• Early Z-test discards occluded fragments
• Render order matters
• Optimized render order
– Opaques: front to back
– Skybox
– Transparent: back to front
– Overlay (UI / HUD)
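A minimal sketch of the ordering above, assuming each object carries a kind and a distance from the camera. The Drawable type, the scene contents, and the distances are hypothetical; a real engine sorts per render queue internally.

```python
from dataclasses import dataclass


@dataclass
class Drawable:
    name: str
    kind: str        # "opaque", "skybox", "transparent", or "overlay"
    distance: float  # distance from the camera (hypothetical units)


# Phases are drawn in this order.
PHASE = {"opaque": 0, "skybox": 1, "transparent": 2, "overlay": 3}


def render_order(drawables):
    def key(d):
        # Opaques: nearest first, so early Z-test rejects hidden fragments.
        # Transparents: farthest first, so blending composes correctly.
        depth = -d.distance if d.kind == "transparent" else d.distance
        return (PHASE[d.kind], depth)
    return sorted(drawables, key=key)


scene = [
    Drawable("hud", "overlay", 0.0),
    Drawable("far_island", "opaque", 80.0),
    Drawable("smoke", "transparent", 12.0),
    Drawable("sky", "skybox", 1000.0),
    Drawable("hero", "opaque", 5.0),
    Drawable("water", "transparent", 40.0),
]
print([d.name for d in render_order(scene)])
# ['hero', 'far_island', 'sky', 'water', 'smoke', 'hud']
```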
We need to improve this
How to assign objects to sorting layers?
• Per shader
– Have to duplicate shader files. Hard to maintain because changes must be made to each duplicate individually.
• Per mesh
– Not scalable, requires a lot of work.
– Risky! May break batches by mistake.
• Per material
– YES!
– In that case, do not use the same material in different scenes
• Fixing the sort for one scene might break it for the other.
Custom Material Inspector
• Created an editor script BRSMaterialEditor to set Material.renderQueue
• Add CustomEditor “BRSMaterialEditor” to the end of the shader file.
Character and Props
Camera Island Top
Outer Islands
Skydome
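One way to picture per-material sorting, assuming the four labels above are the sorting layers listed front to back. The queue offsets and material names are hypothetical; the only fixed value is Unity's built-in Geometry queue (2000), and lower queue values draw first.

```python
GEOMETRY_QUEUE = 2000  # Unity's built-in Geometry render queue value

# Assumed front-to-back order, taken from the layer labels above.
LAYERS = ["Character and Props", "Camera Island Top", "Outer Islands", "Skydome"]


def queue_for_layer(layer):
    """Nearer layers get smaller queue values so they are drawn first."""
    return GEOMETRY_QUEUE + LAYERS.index(layer)


# Hypothetical materials and the layer each one belongs to.
materials = {
    "hero_mat": "Character and Props",
    "sky_mat": "Skydome",
    "island_mat": "Outer Islands",
}
queues = {name: queue_for_layer(layer) for name, layer in materials.items()}
print(queues)  # {'hero_mat': 2000, 'sky_mat': 2003, 'island_mat': 2002}
```

Because the value lives on the material, a material shared between scenes carries the same queue everywhere, which is why the slides advise against sharing.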
Before and After Improving Sort
Overdraw reduced from 2.51x to 1.91x
Z-Reject
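For scale, the overdraw figures above work out to roughly a 24% drop. The sketch assumes the usual definition of overdraw as fragments shaded per screen pixel; the helper names are illustrative.

```python
def overdraw(fragments_shaded, screen_pixels):
    """Average number of times each screen pixel is shaded."""
    return fragments_shaded / screen_pixels


def percent_reduction(before, after):
    return (before - after) / before * 100.0


print(f"{percent_reduction(2.51, 1.91):.1f}%")  # 23.9%
```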
FRAGMENT SHADER
Shader hotzone (% time shading)
Shader hotzone (ALU per frag)
• Improving shader instructions
– Model: ops that can be done once per draw call
  • Use scripts to compute and pass values to the shader
  • Input vector normalization (e.g. rim light)
  • Scroll offset
– Vertex: ops that can be done per vertex
  • Uniform texture tile & offset
– Fragment: ops that need to be done per pixel
  • Equation simplification
  • Half & fixed precision for better thermals
  • saturate vs max(0.0, dot)
[Diagram: Fragment, Vertex, Model; axis: Complexity]
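A rough sketch of why moving work down from fragment to vertex to model pays off. The 87 draw calls come from the case study; the vertex and fragment counts are round illustrative assumptions. The second half checks that saturate and max(0.0, x) agree for dot products of unit vectors, which is what makes that swap safe.

```python
# How often an operation runs per frame, by the stage it lives in.
# 87 draw calls is from the case study; the other counts are assumed.
RUNS_PER_FRAME = {"model": 87, "vertex": 100_000, "fragment": 1_000_000}


def saved_runs(from_stage, to_stage):
    """Evaluations saved per frame by hoisting an op to a cheaper stage."""
    return RUNS_PER_FRAME[from_stage] - RUNS_PER_FRAME[to_stage]


print(saved_runs("fragment", "vertex"))  # 900000
print(saved_runs("fragment", "model"))   # 999913


def saturate(x):
    """Clamp to [0, 1], like the shader intrinsic of the same name."""
    return min(max(x, 0.0), 1.0)


# A dot product of unit vectors lies in [-1, 1], so both forms match there.
samples = [-1.0, -0.5, 0.0, 0.25, 1.0]
print(all(saturate(d) == max(0.0, d) for d in samples))  # True
```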
How to optimize fragment shader
Optimizing Shaders
• Many custom shaders done in ShaderForge
– ShaderForge does heavy work in the fragment shader
• Many variants, and not exactly the same code structure
• How to optimize them all?
– 1st pass: optimizing in ShaderForge
– 2nd pass: optimizing in code
1st Pass: ShaderForge
• Identify core changes to lighting model
– BlinnPhongWrapped
– BlinnPhongRamp
• Created custom code node
– Artists helped with the process of replacing nodes with this code
– This made shader code common and more organized
Custom Lightmap in ShaderForge
• One major complaint from the art team was the lack of lightmap support in custom lighting
• Created a Lightmap node for them
• Problem 1: Need to enable lightmap in the shader's config header.
• Problem 2: ShaderForge does not expose interpolated data.
2nd Pass: Shader Code
• Created a cginc file with macros for optimized code
• ShaderForge follows a naming convention for input data
The results - Ground Shader
After optimization:
Before optimization:
• Avg ALU/Frag: ~21% reduction
• Fragments Shaded: ~45% reduction
• Fragment Instructions: ~64% reduction
Overall improvement: ~7 ms
Further Improvements
• Fallback shader
– We came across some problems with shaders not being supported on some configurations
– Vertex animation with a noise texture (tex2Dlod) is not supported on OpenGL ES 2.0 profiles
– A fallback shader that stands out in those cases
– Makes it easy to differentiate from other errors
ASTC TEXTURE COMPRESSION
ASTC
• Optimal performance with high quality
• Improves bandwidth usage and power consumption
• Galaxy Note 4, Galaxy S6 and above support it
• Supported with OpenGL 3 Unity profile
ASTC
ASTC 4x4 ASTC 6x6 ETC 2
ASTC
Recommended Settings:
Format                 RGB        RGBA       Normal Map
Codec                  ASTC 6x6   ASTC 4x4   ASTC 4x4
BPP                    3.56       8          8
Size vs Uncompressed   14.8%      50%        50%
Size vs ETC2           89%        100%       100%
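The BPP figures in the table follow from the ASTC block layout: every block is 128 bits, whatever its footprint. The sketch below reproduces the RGB column; the reference sizes used for comparison (24 bpp for uncompressed RGB, 4 bpp for ETC2 RGB) are assumptions about what the slide compared against.

```python
ASTC_BLOCK_BITS = 128  # an ASTC block is always 128 bits


def astc_bpp(block_w, block_h):
    """Bits per pixel for an ASTC block footprint of block_w x block_h texels."""
    return ASTC_BLOCK_BITS / (block_w * block_h)


def size_ratio(bpp, reference_bpp):
    """Compressed size as a percentage of a reference format."""
    return bpp / reference_bpp * 100.0


print(round(astc_bpp(6, 6), 2))                  # 3.56
print(astc_bpp(4, 4))                            # 8.0
print(round(size_ratio(astc_bpp(6, 6), 24), 1))  # 14.8 (vs 24-bit RGB)
print(round(size_ratio(astc_bpp(6, 6), 4)))      # 89 (vs ETC2 RGB at 4 bpp)
```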
Review
• Do not make assumptions, use a profiler.
• GPU profilers will give you in-depth data per draw call
• One can assign objects to sorting layers at the material level for the best workflow
• Reduce the amount of work needed to optimize shaders by creating means to reuse optimized code.
• ASTC texture compression is the best option available for quality but is only supported on a few devices.
Phil [email protected] @phi_lira
Q&A
CONTACTS
www.blackriverstudios.net  @BlackRvrStudios  /blackrivergames
THANKS!