Jan 15, 2015
Making a game with Molehill: Zombie Tycoon
Jean-Philippe Auclair, Lead R&D Software Architect
Luc Beaulieu, CTO – Frima Studio
Session Overview
• State of Flash
• Molehill’s API presentation
• Digging deeper into Molehill
State of Flash
• Is Flash Dead?
• Facebook: Top 10 = 250M MAU
• Desktops: Flash 10 installed on 99%+
• Smartphones: Flash/AIR 200+M, 100 devices
• Streaming: 120 petabytes per month
• Advances in Flash for 3D games
• AS3
• 10.1, 10.2 …
• Molehill
Molehill’s API Presentation
• Pros:
  – GPU Accelerated API
  – Relies on DirectX 9 and OpenGL ES 2.0
  – Native Software fallback
• Cons:
  – No point sprite support, branching, MRT, depth buffer
  – No CPU threading support
  – Native Software fallback
Educational slide
This Page Intentionally Left Green
Digging deeper into Molehill
• Assuming a basic knowledge of 3D development terminology
• Display Layers
• Model/Animation File Format
• Character Animation: Matrix vs Quaternion
• Texturing
• Optimizing the Particle System
• Fast Lights & Shadows
• CPU Post-Processing effects
• Profiling & Debugging tools
• Bonus!
  – 3D GameDev Lexicon
  – The math explaining all the numbers I’m going to talk about
  – Cheat sheets
Display Layers
Frima 3D File Format
• Many 3D engines for Flash try to support multiple input formats
• …or support only a generic format such as Collada XML
• We use a format optimized for 3D games made in Flash
  – Small file size
  – Small memory footprint
  – No processing required
[Chart: Model & Animation file processing on a low-end computer, time to process (ms). Collada XML: 5250; Frima Binary Format: 15]
Frima 3D File Format
• Export pipeline: 3DS Max Scene → Max Script Exporter → Collada XML → Build Tool
Frima 3D File Format
• Export pipeline: Model / Animation → Build Tool → GameObject → Serialize (AMF), Compress → GameFile
Frima 3D File Format
• In-Game usage: GameFile → Uncompress, Unserialize → GameObject → Add To Scene
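The build and load steps above amount to a serialize/compress round trip. A minimal sketch in Python, assuming JSON as a stand-in for AMF and zlib for the compression step (the slides name neither a compression scheme nor the object layout, so the `zombie` fields and both function names are hypothetical):

```python
import json
import zlib


def build_game_file(game_object: dict) -> bytes:
    """Build step: serialize the object, then compress the bytes."""
    serialized = json.dumps(game_object).encode("utf-8")  # stand-in for AMF
    return zlib.compress(serialized)


def load_game_file(data: bytes) -> dict:
    """In-game step: uncompress, then unserialize back into an object."""
    return json.loads(zlib.decompress(data).decode("utf-8"))


# Hypothetical game object; the real Frima layout is not described in the slides.
zombie = {"name": "zombie", "bone_count": 24, "vertices": [0.0, 1.0, 0.0] * 100}

game_file = build_game_file(zombie)
restored = load_game_file(game_file)
```

Because the file already holds a ready-to-use object, loading is only the inverse of the build step, which fits the "no processing required" goal stated earlier.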
Zombie Re-Animation
• Techniques
  – Matrix linear blending
  – DualQuaternion linear blending
• Molehill constraint
  – Vertex Shader constants limit: 128 Float4
Zombie: 24 bones
Animation techniques
• Matrix linear blending can cause loss of volume when joints are twisted or extremely bent
• When using matrices, each bone takes 3 constants
  – Maximum number of bones is 40
• When using DualQuats, each bone takes only 2 constants
  – Maximum number of bones is 60
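The bone limits follow from the constant budget. A small sketch of the arithmetic, using the figures given in the talk (128 Float4 constants, 8 of them taken by the world and view-projection matrices, 3 or 2 constants per bone); the function and variable names are illustrative only:

```python
TOTAL_CONSTANTS = 128    # Float4 vertex shader constants available
RESERVED_CONSTANTS = 8   # world matrix (4) + view-projection matrix (4)

CONSTANTS_PER_BONE = {"matrix": 3, "dual_quaternion": 2}


def max_bones(technique: str, bone_sets: int = 1) -> int:
    """Largest skeleton that fits once the reserved constants are set aside."""
    available = TOTAL_CONSTANTS - RESERVED_CONSTANTS
    return available // (CONSTANTS_PER_BONE[technique] * bone_sets)


def constants_needed(technique: str, bones: int, bone_sets: int = 1) -> int:
    """Constants consumed by `bone_sets` poses of a skeleton with `bones` bones."""
    return CONSTANTS_PER_BONE[technique] * bones * bone_sets


for technique in CONSTANTS_PER_BONE:
    print(technique, max_bones(technique), constants_needed(technique, bones=24))
```

Passing `bone_sets=2` models the case where two poses must be resident at once, which halves the limits to 20 and 30 bones.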
Matrix (left) / Dual Quaternion (Right)
Animation techniques
Transitions & interpolation
[Chart: VertexShader constants required for animating a character (24 bones), on a scale of 0 to 128. Matrix: 72; DualQuaternion: 48]
[Chart: constants for anim 1 plus constants for anim 2, on a scale of 0 to 128. Matrix: Anim1 (72) + Anim2 (72), marked “Too Much”; DualQuaternion: Anim1 (48) + Anim2 (48)]
• Animation transitions require two sets of bones
  – Idle blending to walk
• Same thing for frame interpolation (e.g. bullet-time animation)
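To show why both sets of bones must be available at once, here is a rough sketch of blending one bone between two poses with dual quaternion linear blending, in the spirit of the Kavan et al. paper listed in the references. It is a CPU-side illustration in Python, not the talk's shader code; the `(w, x, y, z)` tuple layout and all function names are assumptions.

```python
import math


def quat_mul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def make_dual_quat(rotation, translation):
    """Build (real, dual) from a unit rotation quaternion and a translation."""
    tx, ty, tz = translation
    dual = quat_mul((0.0, tx, ty, tz), rotation)
    return tuple(rotation), tuple(0.5 * c for c in dual)


def blend(dq_a, dq_b, t):
    """Linearly blend two unit dual quaternions, then renormalize."""
    real_a, dual_a = dq_a
    real_b, dual_b = dq_b
    # Flip the second pose if needed so both rotations share a hemisphere.
    sign = -1.0 if sum(x * y for x, y in zip(real_a, real_b)) < 0.0 else 1.0
    real = [(1.0 - t) * x + t * sign * y for x, y in zip(real_a, real_b)]
    dual = [(1.0 - t) * x + t * sign * y for x, y in zip(dual_a, dual_b)]
    norm = math.sqrt(sum(c * c for c in real))
    real = [c / norm for c in real]
    dual = [c / norm for c in dual]
    # Remove the part of `dual` along `real` so the result is a unit dual quaternion.
    along = sum(r * d for r, d in zip(real, dual))
    dual = [d - along * r for r, d in zip(real, dual)]
    return tuple(real), tuple(dual)


def translation_of(dq):
    """Recover the translation encoded in a unit dual quaternion."""
    real, dual = dq
    w, x, y, z = real
    product = quat_mul(dual, (w, -x, -y, -z))
    return tuple(2.0 * c for c in product[1:])


# One bone in two poses (e.g. idle and walk), halfway through a transition.
idle = make_dual_quat((1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
walk = make_dual_quat((1.0, 0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
halfway = blend(idle, walk, 0.5)
```

Each bone needs its value from both poses as input, which is where the doubled constant count comes from.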
File size? Performance?
[Chart: Animation file size (k). matrix / DualQuaternion: 50 / 60]
[Chart: Vertex Shader processing time. matrix / DualQuaternion: 100% / 130%]
[Chart: VertexShader assembler instructions for animation processing, on a scale of 0 to 256. matrix / DualQuaternion: 54 / 136]
Texturing in Molehill
• The first version of the engine used only PNGs
• Adobe Texture Format (ATF)
  – Textures are kept compressed in video memory
  – Native support for multi-device publishing
  – One file containing 3 encodings: DXT1, ETC1 and PVRTC
  – 1.3x bigger than the original PNG
  – Contains the texture’s mipmaps
  – Does not support transparency
Texturing in Molehill
• Transparency
  – Use PNGs with indexed color
  – Sample an “alpha mask texture” in the pixel shader
ATF: Avatar = opaque
PNG: Fence = transparent
Texturing in Molehill
• Many effects can use ATF when using the right blend modes
  – No need for transparency
Splatter = Multiply; Fire = Additive
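One way to read this slide: with a multiply blend, white texels leave the background untouched, and with an additive blend, black texels do, so an opaque texture can still have "invisible" regions. A small sketch with colors as RGB triples in the 0..1 range; the function names are illustrative and the clamp in the additive case is an assumption about how the frame buffer saturates:

```python
def multiply_blend(dst, src):
    """Multiply blend: result = destination * source, per channel."""
    return tuple(d * s for d, s in zip(dst, src))


def additive_blend(dst, src):
    """Additive blend: result = destination + source, clamped to 1.0."""
    return tuple(min(1.0, d + s) for d, s in zip(dst, src))


WHITE = (1.0, 1.0, 1.0)
BLACK = (0.0, 0.0, 0.0)
background = (0.2, 0.6, 0.4)

# White is neutral for multiply, black is neutral for additive.
unchanged_by_splatter = multiply_blend(background, WHITE)
unchanged_by_fire = additive_blend(background, BLACK)
```

A splatter texture would therefore be authored on white and a fire texture on black, with no alpha channel needed in either case.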
Optimizing the Particle System
Particle System
• Using a divided workload (CPU/GPU) for better performance
  – On the CPU: each particle property update is computed at each frame
    • Alpha, Color, Direction, Rotation, frame (if SpriteSheet), etc.
  – On the GPU
    • Applying these properties
    • Expanding billboard vertices to face the screen
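The billboard expansion step can be pictured as pushing the four corners of a quad away from the particle center along the camera's right and up axes. The sketch below does this on the CPU in Python purely for illustration (in the game it runs in the vertex shader); passing the camera axes in as vectors is an assumption, and `expand_billboard` is a hypothetical name.

```python
def expand_billboard(center, size, cam_right, cam_up):
    """Return the 4 corners of a screen-facing quad around `center`."""
    half = size / 2.0
    corners = []
    # Corner signs in counter-clockwise order, starting bottom-left.
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        corners.append(tuple(
            c + sx * half * r + sy * half * u
            for c, r, u in zip(center, cam_right, cam_up)
        ))
    return corners


# Camera looking down the z axis: right is +x, up is +y.
quad = expand_billboard(
    center=(0.0, 0.0, 5.0),
    size=2.0,
    cam_right=(1.0, 0.0, 0.0),
    cam_up=(0.0, 1.0, 0.0),
)
```

Since all four vertices of a particle share the same center and differ only in their corner signs, the CPU only has to update per-particle properties while the expansion itself stays on the GPU.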
Particle System: Optimization
• How many particles?
  – Due to the VertexBuffer and IndexBuffer limits, in Zombie Tycoon we were limited to around 16383 particles per draw call
• Using Fast ByteArray (also known as Alchemy memory or DomainMemory)
  – Using Azoth, property updates were 10 times faster
• Batching draw calls using the same texture
• Using a 100% GPU particle system
  – It’s expensive on the GPU
  – Supports only linear transformations
  – Zero CPU required
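The per-draw-call particle limit can be estimated from the buffer limits quoted later in the talk (65536 vertices, 983040 indices), assuming each particle is a quad with 4 unshared vertices and 6 indices. A quick sketch, with illustrative names:

```python
VERTICES_PER_QUAD = 4
INDICES_PER_QUAD = 6  # two triangles


def max_particles(max_vertices: int, max_indices: int) -> int:
    """Quads that fit in one draw call; the tighter of the two limits wins."""
    by_vertices = max_vertices // VERTICES_PER_QUAD
    by_indices = max_indices // INDICES_PER_QUAD
    return min(by_vertices, by_indices)


# Limits as quoted in the talk; the vertex buffer is the bottleneck.
limit = max_particles(max_vertices=65536, max_indices=983040)
```

With these inputs the vertex buffer caps the count at 16384 quads, in line with the "around 16383" figure from the talk, while the index buffer alone would allow ten times more.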
Lights & shadows
• Techniques
  – ShadowMap & LightMap
  – Dynamic lighting
  – Fake Volumetric lights
  – Fake projected shadows
Lights & shadows
• ShadowMap & LightMap
  – We used two textures, a “multiplied” ShadowMap and an “additive” LightMap
Diffuse * ShadowMap + LightMap = Composite
Lights & shadows
• Dynamic lighting
  – Lighting requires an expensive pixel shader, currently limited to 256 instructions
  – Zombie Tycoon supports up to 7-9 lights (spot or point) per object.
Lights & shadows
• Pixel Shader assembly code
  – Per light, without Normal/Specular mapping.
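The shader listing from this slide did not survive in the text. As a rough idea of the work done per light, here is a Python sketch of a diffuse-only point light with a linear distance falloff. This is not the talk's shader: the Lambert term, the falloff formula, and all names are assumptions chosen for illustration.

```python
import math


def point_light_diffuse(position, normal, light_pos, light_color, radius):
    """Diffuse contribution of one point light at a surface point."""
    to_light = tuple(l - p for l, p in zip(light_pos, position))
    distance = math.sqrt(sum(c * c for c in to_light))
    # Outside the light's radius (or exactly at the light) contributes nothing.
    if distance >= radius or distance == 0.0:
        return (0.0, 0.0, 0.0)
    direction = tuple(c / distance for c in to_light)
    n_dot_l = max(0.0, sum(n * d for n, d in zip(normal, direction)))
    attenuation = 1.0 - distance / radius  # linear falloff, an assumption
    return tuple(c * n_dot_l * attenuation for c in light_color)


def shade(position, normal, lights):
    """Sum the contribution of every light, clamped to 1.0 per channel."""
    total = [0.0, 0.0, 0.0]
    for light_pos, light_color, radius in lights:
        contribution = point_light_diffuse(position, normal, light_pos, light_color, radius)
        total = [t + c for t, c in zip(total, contribution)]
    return tuple(min(1.0, t) for t in total)


# A floor point facing up, lit by one white light 5 units above it.
lit = shade(
    position=(0.0, 0.0, 0.0),
    normal=(0.0, 1.0, 0.0),
    lights=[((0.0, 5.0, 0.0), (1.0, 1.0, 1.0), 10.0)],
)
```

Every light repeats the same block of work, so the number of lights is bounded by how many copies of that block fit in the pixel shader's instruction limit.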
Lights & shadows
• Fake Volumetric Lights
  – Using a few billboard particles, it’s easy to fake nice, lightweight volumetric lighting
  – All objects sample the shadow and light maps, and since the light particles are “additive”, an object behind the lights will look brighter
Lights & shadows
• Fake projected shadows
  – We created a particle of a gradient black spot aligned to the ground
  – Orientation and scale of the particle depend on light position and intensity
Fake shadows
CPU Post-Processing
• Possibility of reading the BackBuffer
  – Strongly recommended not to use Readback
  – Fast pipeline for data from system memory to video memory
  – VERY slow pipeline from video memory to system memory
• Effects: Bloom, Blur, Depth of Field, etc.
Motion Blur
CPU Post-Processing
Bloom post-processing / Normal
Profiling and Debugging tools (CPU)
• FlashDevelop (O.S.S.)
  – Most of the production uses FlashDevelop
  – Now that it has a profiler and a debugger, it’s very easy to work with
Profiling and Debugging tools (CPU)
• Adobe Flash Builder Profiler
  – Profile function calls
  – Profile memory allocation
Profiling and Debugging tools (CPU)
• FlashPreloadProfiler (O.S.S.)
  – Profile function calls
  – Profile memory allocation
  – Profile loader status
  – Can be used in Debug/Release & browser/Projector
Profiling and Debugging tools (GPU)
• PIX for Windows
  – List of API calls
  – Shader assembly code
  – Pixel debugger
  – Texture viewer
Profiling and Debugging tools (GPU)
• Intel® Graphics Performance Analyzers (GPA)
  – Render in wireframe
  – Profile Vertex and Pixel shader performance
  – Visualize overdraw and draw call sequence
  – Save a frame, and run real-time experiments on it
  – Identification of bottlenecks
Sources & References
• Geometric Skinning with Approximate Dual Quaternion Blending – http://isg.cs.tcd.ie/kavanl/papers/sdq-tog08.pdf
• Intel® Graphics Performance Analyzers (GPA) – http://software.intel.com/en-us/articles/intel-gpa/
• PIX for Windows – http://msdn.microsoft.com/en-us/library/ee417072(v=VS.85).aspx
Contact
Luc Beaulieu
Jean-Philippe Auclair [email protected]
@jpauclair jpauclair.net
• TD-Matt blog – http://td-matt.blogspot.com/
• FlashPreloadProfiler – http://jpauclair.net/flashpreloadprofiler/
• Azoth – http://www.buraks.com/azoth/
• Flash in Facebook – AppData.com
• Flash Stats – http://adobe.ly/rwXU, http://adobe.ly/gnlUEH
What does it mean?
• VertexBuffer
• IndexBuffer
• Vertex Constants
• MipMapping
• Quaternion
• Billboard
Bonus Slide: The maths!
• Character animation:
  – Matrix linear blending:
    • 128 Float4 VertexConstants - WorldMatrix (4) - ViewProj matrix (4) = 120 Float4
    • 120 Float4 / 3 Float4 per bone = 40 bones in the constants
    • Bullet time and transitions require two sets of bones: 40/2 = 20 bones per character max
  – DualQuaternion linear blending:
    • 128 Float4 VertexConstants - WorldMatrix (4) - ViewProj matrix (4) = 120 Float4
    • 120 Float4 / 2 Float4 per bone = 60 bones in the constants
    • Bullet time and transitions require two sets of bones: 60/2 = 30 bones per character max
• Max Particle Count
  – The VertexBuffer is limited to 65536 vertices, the IndexBuffer is limited to 983040 indices of type SHORT
  – In theory, you could have up to 327680 triangles in one draw call
  – In practice, with no vertex re-use between particles and using quads (4 vertices each): roughly 65536/4, so around 16383 particles max per draw call
• Lighting
  – With the PixelShader limit of 256 instructions, we were able to fit around 7 to 9 dynamic lights per object (point or spot light)
Achievement: Geek
• Cheat Sheet
Achievement: Super Geek!
Thank You! Questions?