April 4-7, 2016 | Silicon Valley
on-demand.gputechconf.com/gtc/2016/presentation/s6615-sebastien...

Apr 02, 2018

dangdiep

TEGRA PLATFORMS

GAMING

AUTOMOTIVE

DRONES

ROBOTICS

IVA

C/C++

CodeWorks

JetPack

Installers

NVTX

NVIDIA Tools eXtension

Compile Debug Profile

Trace

Hardware Support

IDE Integration Standalone and CLI

Getting Started…

SOFTWARE DEVELOPMENT WORKFLOW

Software Development: Toolchain Setup, Cross-compilation, Porting

Debugging: CPU/GPU, Remote Debugging

Profiling: System/CPU/GPU/IO/…, Remote Profiling

Running: Ship it!

Tools: CodeWorks, JetPack Install., Nsight EE, Nsight Tegra VSE, Tegra Graphics Debugger, CUDA Visual Profiler, Tegra System Profiler, cuda-gdb, PerfWorks, nvprof, CUPTI, cuda-memcheck, Desktop Tools

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

COMMON CORE TOOLS OFFERING

Tegra Common

Tools

Gaming

Automotive

Embedded

Specialization Where Needed

CUDA Debugging and Profiling

Graphics Debugging and Profiling

GPU Performance Counters Libraries

CPU Profiling and System Trace

NVIDIA Tools Extension (NVTX)

GETTING STARTED…

Jump-starts development for the SHIELD/Android platform

Installs the Android NDK & SDK tool chain

Installs developer tools, GameWorks, libraries, …

Reference documentation and samples

Cross-compiles code samples and pushes them to the devkit

And runs one sample…

CodeWorks 1r4

ANDROID TOOL CHAIN: Android SDK/NDK, Eclipse, Java

TEGRA ANDROID TOOLKIT: Samples, Docs, Tools

ANDROID OS IMAGES: KitKat, Lollipop, Marshmallow

GETTING STARTED…

Jump-starts development for embedded platforms

Installs the Linux ARM cross-compilation tool chain

Installs developer tools, CUDA, libraries, …

Flashes the Drive PX/CX or Jetson TK1/TX1 OS image

Reference documentation and samples

Compiles code samples and pushes them to the devkit

And runs one sample…

JetPack Installer For DriveCX/PX and Jetson

JETPACK AND CODEWORKS

New SDK Component Manager

Parallel downloads

Component update notification

Component dependency resolution

Integrated terminal window

Visual Profiler

Trace CUDA activities

Profile CUDA kernels

Correlate performance instrumentation with source code

Expert-guided performance analysis

NVPROF

Collect performance events and metrics

GPU Library Advisor

Detect CUDA library optimization opportunities

NVDISASM, CUOBJDUMP

CUDA-MEMCHECK

Detect out-of-bounds memory accesses

Detect race conditions in memory accesses

Detect uninitialized variable accesses

Detect incorrect GPU thread synchronization

CUDA-GDB

Debug CUDA kernels with CLI

Debug CPU and GPU code

CPU and GPU core dump support

CUDA STANDALONE TOOLS

NVIDIA® NSIGHT™ ECLIPSE EDITION

Homogeneous application development for CPU+GPU compute platforms

CUDA-Aware Editor

CUDA Debugger

CPU+GPU

CUDA Profiler

CUDA 8.0

For Tegra Platforms

Tegra Parker Support

CUDA Debugging with Compute Preemption

CUDA Profiling with advanced PC Sampling metrics

CUDA Visual Profiler with Critical Path Analysis

DEPENDENCY ANALYSIS

Provide insight into application-level performance limiters

Expose dependencies between activities according to the programming model

Identify waiting time due to inter-stream dependencies

Highlight activities on the critical application runtime path

Supports CUDA (Linux/Mac/Windows) and POSIX threads (Linux/Mac)

DEPENDENCY ANALYSIS EXAMPLE

Dependencies between events derived from programming model constraints

Allows computing wait states and the critical path

cudaLaunch

Kernel

cudaStreamSynchronize

Stalls CPU (waiting time)
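A minimal Python sketch of the idea on this slide, under simplifying assumptions. The three activities (cudaLaunch, Kernel, cudaStreamSynchronize) come from the slide; the timestamps are made up, and the field names waits_on and launched_by are hypothetical stand-ins for the dependencies the programming model implies. This is not how nvprof or the Visual Profiler implement the analysis. It only illustrates how walking dependencies backward from the last activity can yield waiting time and a critical path.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Activity:
    name: str
    start: float  # made-up timestamps, arbitrary time units
    end: float
    waits_on: Optional["Activity"] = None     # activity this one blocks on (hypothetical field)
    launched_by: Optional["Activity"] = None  # activity that triggered this one (hypothetical field)


def analyze(last: Activity) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Walk backward from the last activity, following dependencies.

    Time spent blocked on another activity counts as waiting time;
    the remaining time counts toward the critical path.
    """
    critical: Dict[str, float] = {}
    waiting: Dict[str, float] = {}
    node: Optional[Activity] = last
    while node is not None:
        duration = node.end - node.start
        if node.waits_on is not None:
            # Blocked from our own start until the dependency finishes.
            blocked = min(duration, max(0.0, node.waits_on.end - node.start))
            waiting[node.name] = blocked
            critical[node.name] = duration - blocked
            node = node.waits_on
        else:
            waiting[node.name] = 0.0
            critical[node.name] = duration
            node = node.launched_by
    return critical, waiting


# CPU thread: cudaLaunch, then cudaStreamSynchronize. GPU stream: Kernel.
launch = Activity("cudaLaunch", 0.0, 1.0)
kernel = Activity("Kernel", 1.0, 5.0, launched_by=launch)
sync = Activity("cudaStreamSynchronize", 1.0, 5.0, waits_on=kernel)

critical, waiting = analyze(sync)
for name in ("cudaLaunch", "Kernel", "cudaStreamSynchronize"):
    print(f"{name:24s} critical={critical[name]:.1f} waiting={waiting[name]:.1f}")
```

In this toy model the synchronize call contributes nothing to the critical path because its whole duration is spent stalled on the kernel, which is the situation the slide labels "Stalls CPU (waiting time)".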

DEPENDENCY ANALYSIS IN NVPROF

New option to run post-mortem dependency analysis

New option to trace POSIX threads for multi-threaded applications

==22557== Dependency Analysis:
Critical path(%)  Critical path  Waiting time  Name
          94.61%      3.942181s           0ns  clock_block(long*, long)
           5.20%   216.857718ms           0ns  cudaMalloc
           0.16%     6.617667ms           0ns  <Other>
           0.01%   293.028000us           0ns  cuDeviceGetAttribute
           0.01%   235.154000us           0ns  cudaGetDeviceProperties
           0.01%   221.116000us           0ns  cudaFree
           0.00%   158.703000us           0ns  cudaStreamCreate
           0.00%    35.252000us           0ns  cudaConfigureCall
           0.00%    35.248000us           0ns  cuDeviceGetName
           0.00%    33.139000us           0ns  cuDeviceTotalMem_v2
           0.00%    20.298000us           0ns  cudaSetupArgument
           0.00%    19.433000us           0ns  cudaGetDevice
           0.00%            0ns     3.942147s  pthread_join
           0.00%            0ns     3.942136s  cudaStreamSynchronize
           0.00%            0ns     1.001459s  pthread_mutex_lock
           0.00%            0ns  980.464357ms  pthread_cond_wait
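Output in this shape can be post-processed with a short script. The sketch below is an assumption-laden illustration, not an NVIDIA tool. It assumes each row has the form "percent, critical-path time, waiting time, name" with ns/us/ms/s unit suffixes, as in the listing above. The helper name parse_rows and the dictionary keys are hypothetical, and the sample rows are copied from the slide.

```python
import re

# A few rows copied from the slide's listing.
SAMPLE = """\
94.61% 3.942181s 0ns clock_block(long*, long)
5.20% 216.857718ms 0ns cudaMalloc
0.00% 0ns 3.942147s pthread_join
0.00% 0ns 3.942136s cudaStreamSynchronize
"""

_UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}
_ROW = re.compile(
    r"^\s*([\d.]+)%\s+([\d.]+)(ns|us|ms|s)\s+([\d.]+)(ns|us|ms|s)\s+(.+?)\s*$"
)


def parse_rows(text):
    """Return one dict per data row, with all times converted to seconds."""
    rows = []
    for line in text.splitlines():
        match = _ROW.match(line)
        if match is None:
            continue  # skip headers, blank lines, and anything unrecognized
        pct, cp, cp_unit, wt, wt_unit, name = match.groups()
        rows.append({
            "name": name,
            "critical_pct": float(pct),
            "critical_s": float(cp) * _UNITS[cp_unit],
            "waiting_s": float(wt) * _UNITS[wt_unit],
        })
    return rows


rows = parse_rows(SAMPLE)
top_critical = max(rows, key=lambda r: r["critical_s"])
top_waiting = max(rows, key=lambda r: r["waiting_s"])
print("largest critical-path contributor:", top_critical["name"])
print("longest wait:", top_waiting["name"])
```

Sorting by waiting time rather than critical-path time is one way to spot calls such as pthread_join or cudaStreamSynchronize that spend their time blocked instead of contributing to the critical path.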

DEMO - DEPENDENCY ANALYSIS IN VISUAL PROFILER

TEGRA GRAPHICS DEBUGGER

Next-gen graphics development tools

Supports OpenGL ES 2.0/3.0/3.1 + Android Extension Pack

Monitor key software and hardware performance metrics

Debug draw calls, related states and resources

Live capture of a single rendering frame

Edit and recompile shaders live

Automatic GPU bottleneck analysis

Advanced timings for draw calls and kernel dispatches

NEW WITH TEGRA GRAPHICS DEBUGGER 2.X

Graphics Range Profiler

Advanced Interposer for non-rooted devices

Highlight redundant API state changes

NVTX support for perf markers and ranges

Highlight drawcalls based on shader selection

TEGRA SYSTEM PROFILER

Multi-core CPU profiler for all Tegra platforms

Easily prepare a device and deploy application for profiling

Maximize multi-core CPU utilization

Quickly identify CPU “hot spots”, “hot paths” and L1/L2 cache issues

Visualize multi-core CPU activities with a new timeline view

Time range filtering

NEW WITH TEGRA SYSTEM PROFILER 2.5/2.6

NVIDIA Tools eXtension Support (NVTX)

Visualize thread state: running/ready/blocked

Trace CUDA kernel workload execution (Jetson/DriveCX/PX)

Trace OpenGL-ES API calls

Visualize CPU and EMC frequencies

Tegra Parker support and Expanded system trace

STEREOLABS

2560 x 720 @ 60Hz

USB 3.0

Real-time HD SLAM with GPU processing @ 25Hz

Jetson TX1

ZED

DEMO STEREOLABS

PERFWORKS

Next-gen GPU Performance Counter Data Collection

C API for collecting GPU performance counters and data from NVIDIA GPUs.

• CUDA, OpenGL, OpenGL ES, D3D11, and D3D12

• Cross-Platform, with support for Kepler, Maxwell, Pascal GPUs

• Target Audience: tools developers, engine developers

Schedule: Beta in 3Q 2016

PERFWORKS SDK

Successor to the NVIDIA Perfkit SDK (NVPMAPI)

• Collect GPU metrics for Performance Monitoring (e.g. HUD).

• Automated serialized drawcall bottleneck analysis

Adds range-based profiling

• Collect metrics per user defined range, draw calls, or dispatches (Perf Markers, RTs, ...)

Supports multi-threaded GPU work submission

• Collect data on modern multi-threaded APIs

• Collect consistent metrics across multiple generations of NVIDIA GPUs
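Range-based collection can be pictured with a small Python model. This is a conceptual sketch only and does not use or imitate the PerfWorks C API. RangeCollector, FakeGpu, and the pixels_shaded counter are all hypothetical names invented for illustration. The point is the pattern: read a counter when a user-defined range opens, read it again when the range closes, and attribute the difference to that range.

```python
from collections import defaultdict
from contextlib import contextmanager


class RangeCollector:
    """Attributes counter deltas to named, possibly nested, ranges."""

    def __init__(self, read_counter):
        self._read = read_counter       # callable returning the current counter value
        self.totals = defaultdict(int)  # range name -> accumulated delta

    @contextmanager
    def range(self, name):
        before = self._read()
        try:
            yield
        finally:
            self.totals[name] += self._read() - before


class FakeGpu:
    """Stand-in for real hardware: one monotonically increasing counter."""

    def __init__(self):
        self.pixels_shaded = 0

    def draw(self, pixels):
        self.pixels_shaded += pixels


gpu = FakeGpu()
collector = RangeCollector(lambda: gpu.pixels_shaded)

with collector.range("frame"):
    with collector.range("shadow pass"):
        gpu.draw(100)
    with collector.range("main pass"):
        gpu.draw(250)
        gpu.draw(50)

for name, value in collector.totals.items():
    print(f"{name}: {value}")
```

Because the ranges nest, the outer "frame" range accumulates the work of both inner passes, which is the kind of per-range breakdown the slide describes for user-defined ranges, draw calls, or dispatches.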

[Figure: timeline view. Row labels: System; CPU (Core 0, Core 1, Freq.); GPU (Graphics, Compute); Win10; Process 0; Process 1; Thread 0, Thread 1, Thread 2, with DX12, CUDA, GPU, Memory, and Annotation rows. Range labels: "build 3D world"; HairWorks; ODE; "Animate Character"; "Animate Hair"; PhysX; "World physics simulation".]

NVTX 1R2

Events and Domains

Differentiate annotations from libraries and application

Middleware libraries have their own domain

User-defined and named synchronization primitives
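The value of per-library domains can be shown with a toy Python model. It is not the NVTX API. The Domain class and the "MyGame" application domain are hypothetical, and the assignment of each message to a domain is only illustrative. The labels themselves are borrowed from the timeline figure.

```python
class Domain:
    """Toy annotation domain: every mark is tagged with the domain's name."""

    def __init__(self, name, events):
        self.name = name
        self._events = events  # shared list acting as the trace

    def mark(self, message):
        self._events.append((self.name, message))


events = []
app = Domain("MyGame", events)      # hypothetical application domain
physics = Domain("PhysX", events)   # middleware library with its own domain
hair = Domain("HairWorks", events)  # another middleware domain

app.mark("build 3D world")
physics.mark("World physics simulation")
hair.mark("Animate Hair")
app.mark("Animate Character")


def only(domain_name):
    """Return the messages recorded under one domain."""
    return [message for domain, message in events if domain == domain_name]


app_only = only("MyGame")
middleware = [message for domain, message in events if domain != "MyGame"]
print("application:", app_only)
print("middleware:", middleware)
```

Since each annotation carries its domain, a viewer can show or hide library annotations without touching the application's own markers.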

GAME DEVELOPERS

Enable game developers to port easily to SHIELD/Android

• CodeWorks 1r4

• NDK profiling

• Same Graphics Debugging and Profiling Tools as PC

Support consumer devices

• No rooted OS requirement

• Basic feature set support on non-Tegra

Visual Studio support

NVIDIA® NSIGHT™ TEGRA VISUAL STUDIO EDITION

Android NDK/JDK application development

Android Debugging

Logcat Filtering

Project Management

NEW WITH NSIGHT TEGRA 3.3

Android Marshmallow and ARMv8 AArch64

Support for non-Tegra devices

Tegra Graphics Debugger Attach

SIGSEGV signal handler

NDK r10e win64 (Link massive games + new options)

GDB 7.9 win64

CMake 3.1 support

Visual Studio 2015

CMake

Android GDB debugging in Visual Studio

Set breakpoints in both Java and Native (C/C++)

Use the familiar Visual Studio Locals, Watches, Memory and Breakpoints windows.

Build Native Android projects in Visual Studio using vs-android, ndk-build or makefiles.

AUTOMOTIVE

Nsight Eclipse Edition – NextGen

Build, Debug and Profile CUDA applications

Requires Eclipse version 4.4 or later

Built on the Eclipse CDT/DSF framework.

Uses Eclipse Remote System Explorer plugins to connect to remote devices.

Nsight plugins are delivered as an archive file (zip) and installed using standard Eclipse

Can co-exist with other Eclipse plugins in the user environment.

SCREENSHOTS

Debug session views

Launch shortcuts for cuda-gdb

FUTURE

Multi-process support

PerfWorks

Tegra System Profiler

Multi-Node

DrivePX2 with 2 SoCs

LTTng GPU event provider

GPU process trace

Q&A

https://developer.nvidia.com

Monday 4/4 - S6111, NVIDIA CUDA® Optimization with NVIDIA Nsight™ Eclipse Edition: A Case Study 9:00 - Room 211A

Tuesday 4/5 - S6659, Perfworks: A Library for GPU Performance Analysis 15:00 - Room 211B

Wednesday 4/6 - L6135A/B, Jetson Developer Tools Lab 13:30/15:30 – Room 210C

Thursday 4/7 - S6810, Optimizing App. Performance w/ CUDA Profiling Tools 10:00 - Room 211B
