Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Jan 13, 2015

Keynote, Mantle for Developers, by Johan Andersson, Technical Director, DICE/Electronic Arts, at the AMD Developer Summit (APU13), Nov. 11-13, 2013.
Transcript
Page 1: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

MANTLE FOR DEVELOPERS

JOHAN ANDERSSON – TECHNICAL DIRECTOR FROSTBITE

ELECTRONIC ARTS

Page 2: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Mantle?

Simplify advanced development
Improve performance
Enable developers to innovate
Challenge the status quo

Page 3: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts
Page 4: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Developer impact areas

Control
CPU performance
GPU performance
Programmability
Platforms

Page 5: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Control: New model

Traditional Model: Black Box
  ‒ Middle-ground abstraction – compromise between performance & “usability”
  ‒ Hidden resource memory & state
  ‒ Resource CPU access tied to device context
  ‒ Driver analyzes & synchronizes implicitly

Explicit Model: Mantle
  ‒ Thin low-level abstraction to expose how hardware works
  ‒ App explicit memory management
  ‒ Resources are globally accessible
  ‒ App explicit resource state transitions

Page 6: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Control: App responsibility

Tell when render target will be used as a texture
  ‒ And many more resource state transitions
Don’t destroy resources that GPU is using
  ‒ Keep track with fences or frames
Manual dynamic resource renaming
  ‒ No DISCARD for driver resource renaming
Resource memory tiling

Powerful validation layer will help!
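The two responsibilities above (explicit state transitions, and not destroying resources the GPU still uses) can be illustrated with a small app-side sketch. It does not use the Mantle API: `ResourceState`, `transition` and `RetireQueue` are hypothetical names, and the "fence" is modelled as a plain frame counter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <stdexcept>
#include <utility>

// Hypothetical resource states; illustrative names, not Mantle enums.
enum class ResourceState { RenderTarget, ShaderRead, CopySource, CopyDest };

struct Resource {
    int id;
    ResourceState state;
};

// App-side check in the spirit of a validation layer: the caller must name the
// state the resource is currently in before moving it to a new one.
inline void transition(Resource& r, ResourceState from, ResourceState to) {
    if (r.state != from) throw std::logic_error("resource is not in the expected state");
    r.state = to;
}

// Deferred destruction: a resource is released only after the GPU has finished
// the last frame that used it. The frame index stands in for a fence value.
class RetireQueue {
public:
    void retire(int resourceId, std::uint64_t lastUsedFrame) {
        // Assumes resources are retired in non-decreasing frame order.
        assert(pending_.empty() || pending_.back().first <= lastUsedFrame);
        pending_.emplace_back(lastUsedFrame, resourceId);
    }

    // Call with the newest frame the GPU is known to have completed.
    // Returns how many resources were released.
    int collect(std::uint64_t completedFrame) {
        int released = 0;
        while (!pending_.empty() && pending_.front().first <= completedFrame) {
            pending_.pop_front();  // a real engine would free GPU memory here
            ++released;
        }
        return released;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::deque<std::pair<std::uint64_t, int>> pending_;  // (frame, resource id)
};
```

A render target would be moved with `transition(rt, ResourceState::RenderTarget, ResourceState::ShaderRead)` before it is sampled, and `collect` would be called once per frame with the last completed frame index.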

Page 7: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Control: Explicit control enables

App high-level decisions & optimizations
  ‒ Has full scene information
  ‒ Easier to optimize performance & memory
Flexible & efficient memory management
  ‒ Linear frame allocators
  ‒ Memory pools
  ‒ Pinned memory
Reduced development time
  ‒ For advanced game engines & apps
  ‒ Easier to get to target performance & robustness
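As an illustration of the "linear frame allocators" bullet, here is a minimal sketch of the general technique. It hands out offsets into one pre-allocated block and is reset once per frame. The class name and interface are assumptions for illustration, not part of Mantle or Frostbite.

```cpp
#include <cassert>
#include <cstddef>

// Hands out offsets into a single pre-allocated memory block.
// Nothing is freed individually; the whole block is recycled each frame.
class LinearFrameAllocator {
public:
    static constexpr std::size_t kInvalid = static_cast<std::size_t>(-1);

    explicit LinearFrameAllocator(std::size_t capacity) : capacity_(capacity) {}

    // Returns the offset of the allocation, or kInvalid if it does not fit.
    // alignment must be a power of two.
    std::size_t allocate(std::size_t size, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        std::size_t aligned = (head_ + alignment - 1) & ~(alignment - 1);
        if (aligned > capacity_ || size > capacity_ - aligned) return kInvalid;
        head_ = aligned + size;
        return aligned;
    }

    // Call at the start of a frame, once the GPU no longer reads the old data.
    void reset() { head_ = 0; }

    std::size_t used() const { return head_; }

private:
    std::size_t capacity_;
    std::size_t head_ = 0;
};
```

Allocation is a single add and mask, which is why this pattern suits per-frame data such as constants and dynamic vertex data.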

Page 8: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Control: Explicit control enables

Light-weight driver
  ‒ Easier to develop & maintain
  ‒ Reduced CPU draw call overhead
Transient resources
  ‒ Alias render targets within frame
  ‒ Major memory savings
  ‒ No need to pre-allocate everything
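One way to picture render target aliasing is as a placement problem: targets whose lifetimes within the frame do not overlap may share the same memory. The sketch below uses a simple greedy first-fit placement. The struct, the pass-index lifetimes and the algorithm are assumptions chosen for illustration, not a description of Frostbite's allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct TransientTarget {
    std::size_t size;    // bytes
    int firstPass;       // first pass in the frame that touches the target
    int lastPass;        // last pass that touches it (inclusive)
    std::size_t offset;  // filled in by assignOffsets
};

inline bool lifetimesOverlap(const TransientTarget& a, const TransientTarget& b) {
    return a.firstPass <= b.lastPass && b.firstPass <= a.lastPass;
}

// Greedy first-fit: each target is placed at the lowest offset where it does
// not collide with an already placed target that is alive at the same time.
// Returns the total heap size needed.
inline std::size_t assignOffsets(std::vector<TransientTarget>& targets) {
    std::size_t heapSize = 0;
    for (std::size_t i = 0; i < targets.size(); ++i) {
        std::size_t offset = 0;
        bool moved = true;
        while (moved) {
            moved = false;
            for (std::size_t j = 0; j < i; ++j) {
                const TransientTarget& p = targets[j];
                bool memoryOverlap =
                    offset < p.offset + p.size && p.offset < offset + targets[i].size;
                if (memoryOverlap && lifetimesOverlap(targets[i], p)) {
                    offset = p.offset + p.size;  // bump past the conflicting target
                    moved = true;
                }
            }
        }
        targets[i].offset = offset;
        if (offset + targets[i].size > heapSize) heapSize = offset + targets[i].size;
    }
    return heapSize;
}
```

With three targets of 100, 50 and 100 bytes used in passes 0-1, 1-2 and 2-3, the third reuses the memory of the first, so the heap needs 150 bytes instead of 250.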

Page 9: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

CPU performance Control

Page 10: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

CPU perf: Core concepts

Descriptor sets
Monolithic pipelines
Command buffers

Page 11: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

CPU perf: Descriptor sets

Table with resource references to bind to graphics or compute pipeline

Replaces traditional resource stage binding
  ‒ Major performance & flexibility advantage
  ‒ Closer to how the hardware works
App managed - lots of strategies possible!
  ‒ Tiny vs huge sets
  ‒ Single vs multiple
  ‒ Static vs semi-static vs dynamic
Example 1: Single simple dynamic descriptor set
  ‒ Bind everything you need for a single draw call
  ‒ Close to DX/GL model but share between stages

[Diagram: Dynamic descriptor set with entries VertexBuffer (VS), Texture0 (VS+PS), Constants (VS), Texture1 (PS), Texture2 (PS), Sampler0 (VS+PS). Legend: Link, Sampler, Image, Memory]

Page 12: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

CPU perf: Descriptor sets

Table with resource references to bind to graphics or compute pipeline

Replaces traditional resource stage binding
  ‒ Major performance & flexibility advantage
  ‒ Closer to how the hardware works
App managed - lots of strategies possible!
  ‒ Tiny vs huge sets
  ‒ Single vs multiple
  ‒ Static vs semi-static vs dynamic
Example 2: Reuse static set with nesting
  ‒ Reduce update time & memory usage

[Diagram: Dynamic descriptor set (Constants (VS), Link) nested with a Static descriptor set. Other entries shown: VertexBuffer (VS), Texture0 (VS+PS), Texture1 (PS), Texture2 (PS), Texture3 (PS), Texture4 (PS), Sampler0 (VS+PS), Sampler1 (PS). Legend: Link, Sampler, Image, Memory]
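The nesting idea in Example 2 can be sketched as a small data structure: a descriptor set is a table of slots, and a slot holds either a resource reference or a link to another set. The types and function names below are hypothetical and only model the concept; they are not Mantle API types.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct DescriptorSet;

// One slot holds either a resource reference or a link to another set.
struct Slot {
    std::string resource;       // empty when the slot is a link
    const DescriptorSet* link;  // non-null when the slot nests another set
};

struct DescriptorSet {
    std::vector<Slot> slots;
};

inline Slot resourceSlot(const std::string& name) { return Slot{name, nullptr}; }
inline Slot linkSlot(const DescriptorSet& set) { return Slot{std::string(), &set}; }

// Walks a set and every set it links to, collecting resources in slot order.
inline void collect(const DescriptorSet& set, std::vector<std::string>& out, int depth = 0) {
    assert(depth < 8 && "descriptor set nesting too deep or cyclic");
    for (const Slot& s : set.slots) {
        if (s.link != nullptr) {
            collect(*s.link, out, depth + 1);
        } else {
            out.push_back(s.resource);
        }
    }
}
```

A static set holding textures and samplers can be built once, while a small dynamic set holding per-draw constants plus one link to the static set is rewritten per draw. Only the small set is updated, which is the saving the slide points at.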

Page 13: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

CPU perf: Monolithic pipelines

Shader stages & select graphics state combined into single object
  ‒ No runtime compilation or patching needed!
  ‒ Significantly less runtime overhead to use
Supports parallel building & caching
  ‒ Fast loading times
Usage & management up to the app
  ‒ Static vs dynamic creation
  ‒ Amount of pipelines
  ‒ State usage

[Diagram: Pipeline state covering IA, VS, HS, Tessellator, DS, GS, RS, PS, DB, CB]
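Because a pipeline combines shader stages and selected state into one object, an app can cache pipelines by their full description and build each combination only once. The sketch below shows that caching pattern. `PipelineDesc`, its four fields and `PipelineCache` are assumptions for illustration and much smaller than a real pipeline description.

```cpp
#include <cassert>
#include <map>
#include <tuple>

// Hypothetical, heavily reduced pipeline description.
struct PipelineDesc {
    int vertexShader;
    int pixelShader;
    bool depthTest;
    bool alphaBlend;
};

struct Pipeline {
    int id;  // stand-in for a compiled pipeline object
};

class PipelineCache {
    using Key = std::tuple<int, int, bool, bool>;

public:
    // Returns the cached pipeline for this description, building it on first use.
    const Pipeline& get(const PipelineDesc& d) {
        Key key = std::make_tuple(d.vertexShader, d.pixelShader, d.depthTest, d.alphaBlend);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++builds_;  // the expensive compile step would happen here
            it = cache_.emplace(key, Pipeline{builds_}).first;
        }
        return it->second;
    }

    int builds() const { return builds_; }

private:
    std::map<Key, Pipeline> cache_;
    int builds_ = 0;
};
```

Whether pipelines are created up front at load time or on demand as above is one of the "static vs dynamic creation" choices the slide leaves to the app.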

Page 14: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

CPU perf: Command buffers

Issue pipelined graphics & compute commands into a command buffer
  ‒ Bind graphics state, descriptor sets, pipeline
  ‒ Draw calls
  ‒ Render targets
  ‒ Clears
  ‒ Memory transfers
  ‒ NOT: resource mapping
Fully independent objects
  ‒ Create multiple every frame
  ‒ Or pre-build up front and reuse
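A toy model of the record-then-submit pattern: commands are appended to an independent buffer object, and a queue executes them later. The command set, `CommandBuffer` and `Queue` here are hypothetical simplifications, not the Mantle interface.

```cpp
#include <cassert>
#include <vector>

enum class CommandType { BindPipeline, BindDescriptorSet, Draw, Clear };

struct Command {
    CommandType type;
    int arg;  // pipeline id, set id, vertex count or target id
};

// Recording only appends to a private list; nothing executes yet.
class CommandBuffer {
public:
    void bindPipeline(int pipelineId) { cmds_.push_back({CommandType::BindPipeline, pipelineId}); }
    void bindDescriptorSet(int setId) { cmds_.push_back({CommandType::BindDescriptorSet, setId}); }
    void draw(int vertexCount) { cmds_.push_back({CommandType::Draw, vertexCount}); }
    void clear(int targetId) { cmds_.push_back({CommandType::Clear, targetId}); }
    const std::vector<Command>& commands() const { return cmds_; }

private:
    std::vector<Command> cmds_;
};

// Stand-in for a GPU queue: "executes" a buffer by counting its draws.
struct Queue {
    int drawsExecuted = 0;
    int verticesDrawn = 0;

    void submit(const CommandBuffer& cb) {
        for (const Command& c : cb.commands()) {
            if (c.type == CommandType::Draw) {
                ++drawsExecuted;
                verticesDrawn += c.arg;
            }
        }
    }
};
```

Because the buffer is an independent object, the same recorded buffer can be submitted in several frames, which corresponds to the "pre-build up front and reuse" option.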

Page 15: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

CPU perf: DX/GL parallelism

Automatically extracts parallelism out of most apps
Doesn’t scale beyond 2-3 cores
Additional latency
Driver thread often bottleneck – can collide with app threads

[Diagram: timeline over CPU 0, CPU 1 and CPU 2 showing Game, Render and Driver work]

Page 16: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

CPU perf: Parallel dispatch with Mantle

App can go fully wide with its rendering – minimal latency
Close to linear scaling with CPU cores
No driver threads – no overhead – no contention
Frostbite’s approach on all consoles – and on PC with Mantle!

[Diagram: timeline over CPU 0 to CPU 4 showing Game work alongside many parallel Render jobs]
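The "fully wide" pattern can be sketched with standard threads: each worker records its own command buffer with no shared state, and the buffers are then submitted in a fixed order. This is a generic illustration. The integer "commands" and all function names are assumptions, and a real engine would use its job system rather than spawning raw threads.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

using CommandBuffer = std::vector<int>;  // stand-in: one int per recorded draw

// Each worker records draws [begin, end) into its own buffer.
inline void recordRange(CommandBuffer& out, int begin, int end) {
    for (int draw = begin; draw < end; ++draw) out.push_back(draw);
}

// Splits the draws into contiguous ranges and records them in parallel.
inline std::vector<CommandBuffer> recordParallel(int drawCount, int workerCount) {
    assert(workerCount > 0 && drawCount >= 0);
    std::vector<CommandBuffer> buffers(workerCount);  // sized up front, never resized
    std::vector<std::thread> workers;
    int perWorker = (drawCount + workerCount - 1) / workerCount;
    for (int w = 0; w < workerCount; ++w) {
        int begin = std::min(drawCount, w * perWorker);
        int end = std::min(drawCount, begin + perWorker);
        workers.emplace_back(recordRange, std::ref(buffers[w]), begin, end);
    }
    for (std::thread& t : workers) t.join();
    return buffers;
}

// Submission order is fixed by buffer index, so the result is deterministic
// regardless of which worker finished first.
inline std::vector<int> submitInOrder(const std::vector<CommandBuffer>& buffers) {
    std::vector<int> stream;
    for (const CommandBuffer& cb : buffers) stream.insert(stream.end(), cb.begin(), cb.end());
    return stream;
}
```

No locks are needed during recording because every worker writes only to its own buffer. Ordering is decided once, at submission.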

Page 17: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

GPU performance CPU performance

Page 18: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

GPU perf: GPU optimizations

Thanks to improved CPU performance – CPU will rarely be a bottleneck for the GPU
  ‒ CPU could help GPU more:
      ‒ Less brute force rendering
      ‒ Improve culling
Shader pipeline object – driver optimizations
  ‒ Can optimize with pipeline state knowledge
  ‒ Can optimize across all shader stages
Resource states
  ‒ Gives driver a lot more knowledge & flexibility
  ‒ Apps can avoid expensive/redundant transitions, such as surface decompression
Expose existing GPU functionality
  ‒ Quad & Rect-lists
  ‒ HW-specific MSAA & depth data access
  ‒ Programmable sample patterns
  ‒ And more..

Page 19: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

GPU perf: Queues

Modern GPUs are heterogeneous machines with multiple engines
  ‒ Graphics pipeline
  ‒ Compute pipeline(s)
  ‒ DMA transfer
  ‒ Video encode/decode
  ‒ More…
Mantle exposes queues for the engines + synchronization primitives

[Diagram: Queues (Graphics, Compute, DMA, ...) feeding the GPU]

Page 20: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

GPU perf: Queues

[Diagram: Queues (Graphics, Compute, DMA, ...) feeding the GPU]

Page 21: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

GPU perf: Queue use cases

Async DMA transfers
  ‒ Copy resources in parallel with graphics or compute

[Diagram: Graphics queue running Render, Other render, Use copy; DMA queue running Copy]
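A back-of-the-envelope model shows why moving the copy to the DMA queue can help. It assumes the copy starts together with the first render job and that "Use copy" waits for both the preceding graphics work and the copy. The struct and the timings are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-job costs in microseconds.
struct FrameCosts {
    int renderUs;
    int otherRenderUs;
    int copyUs;
    int useCopyUs;
};

// Everything on one queue: the copy sits in line with the rendering.
inline int serialTimeUs(const FrameCosts& c) {
    return c.renderUs + c.copyUs + c.otherRenderUs + c.useCopyUs;
}

// Copy on a separate DMA queue: it overlaps the two render jobs, and the job
// that consumes the copy starts when both sides are done.
inline int asyncTimeUs(const FrameCosts& c) {
    return std::max(c.renderUs + c.otherRenderUs, c.copyUs) + c.useCopyUs;
}
```

With costs of 400, 300, 500 and 100 microseconds the serial schedule takes 1300 and the overlapped one 800. If the copy is longer than the rendering it overlaps, the copy becomes the critical path instead.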

Page 22: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

GPU perf: Queue use cases

Async DMA transfers
  ‒ Copy resources in parallel with graphics or compute
Async compute together with graphics
  ‒ ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units

[Diagram: Graphics queue running GBuffer, Shadowmap 0, Shadowmap 1, Final lighting; Compute queue running Non-shadowed lighting]

Page 23: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

GPU perf: Queue use cases

Async DMA transfers
  ‒ Copy resources in parallel with graphics or compute
Async compute together with graphics
  ‒ ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units
Multiple compute kernels collaborating
  ‒ Can be faster than über-kernel
  ‒ Example: Compute geometry backend & compute rasterizer

[Diagram: Compute 0 queue running Compute Geometry; Compute 1 queue running Compute Rasterizer; Graphics queue running Ordinary Rendering]

Page 24: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

GPU perf: Queue use cases

Async DMA transfers
  ‒ Copy resources in parallel with graphics or compute
Async compute together with graphics
  ‒ ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units
Multiple compute kernels collaborating
  ‒ Can be faster than über-kernel
  ‒ Example: Compute geometry backend & compute rasterizer
Compute as frontend for graphics pipeline
  ‒ Compute runs asynchronously ahead and prepares & optimizes geometry for graphics pipeline

Game engines will build large GPU job graphs
  ‒ Move away from single sequential submission
  ‒ Just as we already have done on CPU

[Diagram: Compute queue running Process0 and Process1 ahead of the Graphics queue running Draw0, Draw1 and Draw2]
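A GPU job graph can be modelled as jobs tagged with a target queue plus dependency edges, with a topological sort producing a valid submission order. The sketch below uses Kahn's algorithm. The `GpuJob` layout and the example dependencies are assumptions, since the slide does not spell out the edges.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

enum class QueueType { Graphics, Compute, Dma };

struct GpuJob {
    std::string name;
    QueueType queue;
    std::vector<int> dependsOn;  // indices of jobs that must finish first
};

// Kahn's algorithm: returns a submission order that respects all dependencies,
// or an empty vector if the graph contains a cycle.
inline std::vector<int> scheduleJobs(const std::vector<GpuJob>& jobs) {
    std::vector<int> remaining(jobs.size(), 0);
    std::vector<std::vector<int>> dependents(jobs.size());
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        remaining[i] = static_cast<int>(jobs[i].dependsOn.size());
        for (int d : jobs[i].dependsOn) {
            assert(d >= 0 && static_cast<std::size_t>(d) < jobs.size());
            dependents[d].push_back(static_cast<int>(i));
        }
    }
    std::queue<int> ready;
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        if (remaining[i] == 0) ready.push(static_cast<int>(i));
    }
    std::vector<int> order;
    while (!ready.empty()) {
        int job = ready.front();
        ready.pop();
        order.push_back(job);
        for (int next : dependents[job]) {
            if (--remaining[next] == 0) ready.push(next);
        }
    }
    if (order.size() != jobs.size()) order.clear();  // cycle detected
    return order;
}
```

In a real engine each edge that crosses queues would become a synchronization primitive between those queues, while jobs on the same queue simply keep their submission order.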

Page 25: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

GPU performance

Programmability

Page 26: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Programmability: Explicit Multi-GPU

Explicit control of GPU queues and synchronization, finally!
  ‒ Implement your own Alternate-Frame-Rendering
  ‒ Or something more exotic..
Use case: Workstation rendering with 4-8 GPUs
  ‒ Super high-quality rendering & simulation
  ‒ Load balance graphics & compute job graphs across GPUs
  ‒ 20-40 TFlops in a single machine!
Use case: Low-latency rendering
  ‒ Important for VR and competitive games
  ‒ Latency optimized GPU job graph scheduling
  ‒ VR: Simultaneously drive 2 GPUs (1 per eye)
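Two of the scheduling policies mentioned above are simple enough to sketch: alternate-frame rendering picks a GPU from the frame index, and a basic load balancer gives each job to the GPU with the least work so far. Both functions are illustrative assumptions and ignore data transfer between GPUs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Alternate-frame rendering: frames rotate across the available GPUs.
inline int afrGpuForFrame(unsigned long long frameIndex, int gpuCount) {
    assert(gpuCount > 0);
    return static_cast<int>(frameIndex % static_cast<unsigned long long>(gpuCount));
}

// Greedy load balancing: each job goes to the GPU with the smallest
// accumulated cost so far (ties go to the lowest GPU index).
// Returns the GPU index chosen for every job.
inline std::vector<int> balanceJobs(const std::vector<int>& jobCostUs, int gpuCount) {
    assert(gpuCount > 0);
    std::vector<long long> load(gpuCount, 0);
    std::vector<int> assignment;
    assignment.reserve(jobCostUs.size());
    for (int cost : jobCostUs) {
        int best = 0;
        for (int g = 1; g < gpuCount; ++g) {
            if (load[g] < load[best]) best = g;
        }
        load[best] += cost;
        assignment.push_back(best);
    }
    return assignment;
}
```

For job costs of 4, 3, 2 and 1 on two GPUs, the greedy policy ends with both GPUs carrying a load of 5.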

Page 27: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Programmability: New mechanisms

Command buffer predication & flow control
  ‒ GPU affecting/skipping submitted commands
  ‒ Go beyond DrawIndirect / DispatchIndirect
  ‒ Advanced variable workloads
  ‒ Advanced culling optimizations
Write occlusion query results into GPU buffer
  ‒ No CPU roundtrip needed
  ‒ Can drive predicated rendering
  ‒ Or use results directly in shaders (lens flares)
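Predicated rendering can be pictured as follows: occlusion results live in a GPU buffer, and each draw names the buffer entry that decides whether it runs. The CPU-side simulation below only models that control flow. `GpuBuffer`, `PredicatedDraw` and `executePredicated` are hypothetical names, not Mantle API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a GPU buffer that occlusion queries write their results into.
struct GpuBuffer {
    std::vector<std::uint32_t> words;  // visible sample count per query
};

struct PredicatedDraw {
    int drawId;
    std::size_t predicateIndex;  // which buffer entry gates this draw
};

// Models how the GPU would consume the command stream: a draw is skipped when
// its predicate value is zero. The CPU never reads the results back.
inline std::vector<int> executePredicated(const std::vector<PredicatedDraw>& draws,
                                          const GpuBuffer& results) {
    std::vector<int> executed;
    for (const PredicatedDraw& d : draws) {
        assert(d.predicateIndex < results.words.size());
        if (results.words[d.predicateIndex] != 0) executed.push_back(d.drawId);
    }
    return executed;
}
```

Several draws may share one predicate, for example all meshes of an object that was tested with a single bounding-box query.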

Page 28: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Programmability: Bindless resources

Mantle supports bindless resources
  ‒ Shaders can select resources to use instead of static binding from CPU
  ‒ Extension of the descriptor set support
Key component that will open up a lot of opportunities!
Examples
  ‒ Performance optimizations – less data to update
  ‒ Logic & data structures that live fully on the GPU
      ‒ Scene culling & rendering
  ‒ Material representations
      ‒ Deferred shading
  ‒ Raytracing
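The bindless idea can be mimicked on the CPU: all resources sit in one table, and per-material data carries indices that the shader uses to pick resources itself. The sketch below is an analogy only. `Texture`, `Material` and `shadeLookup` are invented for illustration, and the function stands in for shader code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Texture {
    std::string name;
};

// Material data as it might be stored in a GPU buffer: indices, not bindings.
struct Material {
    std::uint32_t albedoIndex;
    std::uint32_t normalIndex;
};

// Stand-in for shader code: selects textures by index from one global table
// instead of relying on slots the CPU bound before the draw.
inline std::string shadeLookup(const std::vector<Texture>& table, const Material& m) {
    assert(m.albedoIndex < table.size() && m.normalIndex < table.size());
    return table[m.albedoIndex].name + "+" + table[m.normalIndex].name;
}
```

Switching material then means changing two integers in a buffer rather than rebinding resources, which matches the "less data to update" example on the slide.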

Page 29: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Programmability Platforms

Page 30: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Platforms: Today

Mantle gives us strong benefits on Windows today
  ‒ Console-like performance & programmability on both Windows 7 and Windows 8
  ‒ For us, well worth the dev time!
DX & GL are the industry standards
  ‒ Needed for platforms that do not support Mantle
  ‒ Needed by devs who do not want/need more control
  ‒ Have to have fallback paths for GL/DX, but not limit oneself to it
Mantle and PlayStation 4 will drive our future Frostbite designs & optimizations
  ‒ PS4 graphics API has great programmability & performance as well
  ‒ Share concepts, methods & optimization strategies

Page 31: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Platforms: Linux & Mac

Want to see Mantle on Linux and Mac!
  ‒ Would enable support for our full engine & rendering
  ‒ Significantly easier to do efficient renderer with Mantle than with OpenGL
Use cases:
  ‒ Workstations
  ‒ R&D
      ‒ Not limited by WDDM
  ‒ Games
      ‒ Mantle + SteamOS = powerful combination!

Page 32: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Platforms: Mobile

Mobile architectures are getting closer in capabilities to desktop GPUs
Want graphics API that allows apps to fully utilize the hardware
  ‒ Power efficient
  ‒ High performance
  ‒ Programmable
Major opportunity with Mantle – leap frog GL4, DX11
  ‒ For mobile SoC vendors
  ‒ For Google and Apple

Page 33: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Platforms: Multi-vendor?

Mantle is designed to be a thin hardware abstraction
  ‒ Not tied to AMD’s GCN architecture
  ‒ Forward compatible
  ‒ Extensions for architecture- and platform-specific functionality
Mantle would be a much more efficient graphics API for other vendors as well
  ‒ Most Mantle functionality can be supported on today’s modern GPUs
Want to see future version of Mantle supported on all platforms and on all modern GPUs!
  ‒ Become an active industry standard with IHVs and ISVs collaborating
  ‒ Enable us developers to innovate with great performance & programmability everywhere

Page 34: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Platforms

Page 35: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Frostbite: Battlefield 4

Mantle support is in development
  ‒ Core renderer (closer to PS4 than DX11)
  ‒ Implement all rendering techniques used in BF4 (many!)
  ‒ CPU optimizations (parallel dispatch, descriptor sets)
  ‒ GPU optimizations (minimize transitions, MSAA)
  ‒ R&D for advanced GPU optimizations
  ‒ Memory management
  ‒ Multi-GPU support
  ‒ ~2 months of work
Update targeting late December

Page 36: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Frostbite: Plants vs Zombies: Garden Warfare

Very different rendering compared to BF4
Frostbite Mantle renderer will work out of the box
Focus on APU performance

Page 37: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Frostbite: Future

All Frostbite games designed with Mantle
  ‒ 15 games in development across all of EA
Advanced Mantle rendering & use cases
  ‒ Lots of exciting R&D opportunities!
Want multi-vendor & multi-platform support!

Page 38: Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

THE END

Email: [email protected] Web: http://frostbite.com Twitter: @repi