High Performance Video Pipelining: A Flexible Architecture for GPU Processing of Broadcast Video

Peter Walsh, Chief Emerging Technology Engineer, ESPN

High Performance Video Pipelining: A Flexible Architecture for …on-demand.gputechconf.com/gtc/2014/presentations/S4481... · 2014. 4. 7. · NASCAR production truck . Studio (BCS

Aug 19, 2020


Overview

• Real-time GPU processing of broadcast video

– Maximize GPU utilization

– Maintain flexibility

• High Performance Video Pipeline

– CPU and GPU buffers

– Data transfer


Monday Night Football production truck


NASCAR production truck


Studio (BCS championship “Film Room”)


GPU Processing

• Segmentation (generating a chroma key)

• Inserting graphics (linear keying and chroma keying)

• Field (camera) tracking

• Object (player) tracking

[Diagram: system architecture spanning CPU and GPU, with blocks for Input Video, Segmentation, Field Tracking, Object Tracking, GFX insertion, Interop, Rendering, and Output Video]


Background

• “Best Practices in GPU-Based Video Processing,” Tom True, NVIDIA, GTC 2013

• “Topics in GPU-Based Video Processing,” Tom True, NVIDIA, GTC 2014


Naïve Sequential Implementation

• Acquire

• Upload

• Process

• Download

• Output

[Timeline figure: the five steps executed one after another, plotted against 1 frame time]
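As a rough budget check, the naïve design only works if all five steps together finish within one frame period. The sketch below is a minimal illustration, assuming made-up per-step times; `StepTimes`, `framePeriodMs`, and `sequentialFits` are hypothetical names, not code from the talk.

```cpp
#include <array>
#include <cassert>
#include <numeric>

// Per-step durations in milliseconds for one frame, in slide order:
// Acquire, Upload, Process, Download, Output. (Hypothetical type.)
using StepTimes = std::array<double, 5>;

// Frame period in milliseconds, e.g. 59.94 fps gives about 16.68 ms.
double framePeriodMs(double fps) { return 1000.0 / fps; }

// Naive sequential design: the steps run back to back, so their sum
// must fit inside a single frame period.
bool sequentialFits(const StepTimes& t, double fps) {
    double total = std::accumulate(t.begin(), t.end(), 0.0);
    return total <= framePeriodMs(fps);
}
```

With illustrative step times of 3, 4, 6, 4, and 3 ms (20 ms total), the sequential design misses a 59.94 fps deadline (about 16.68 ms) but would fit a 30 fps one (about 33.3 ms).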


Simultaneous Operations

• Acquire

• Upload

• Process

• Download

• Output

[Timeline figure: the five steps overlapping in time, plotted against 1 frame time]
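When the steps run simultaneously on different frames, each step only has to finish its own work within one frame period, at the cost of added latency. A minimal sketch, assuming each step is given exactly one frame slot; the names and the one-slot-per-step model are illustrative assumptions, not taken from the slides.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Per-step durations in milliseconds: Acquire, Upload, Process, Download, Output.
using StepTimes = std::array<double, 5>;
constexpr int kSteps = 5;

// With all steps running at once on different frames, each step only has to
// finish its own work within one frame period; the slowest step sets the limit.
bool pipelinedFits(const StepTimes& t, double framePeriodMs) {
    return *std::max_element(t.begin(), t.end()) <= framePeriodMs;
}

// Assuming one frame slot per step, a frame acquired in slot n is output in
// slot n + 4, so end-to-end latency is five frame periods.
double pipelinedLatencyMs(double framePeriodMs) {
    return kSteps * framePeriodMs;
}
```

The trade-off is visible in the two functions: throughput now depends only on the slowest step, while latency grows with the number of steps.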


Techniques

• Avoid CPU memory copies

• Use pinned system memory

• DMA Video I/O using pinned memory

• DMA between CPU and GPU

• Asynchronous – using multiple CUDA streams

• Double buffers for simultaneous R/W

Frame Buffers

[Diagram: frame buffers in pinned system memory, system memory, and GPU memory]


Buffer Allocation

• Device

• System

• Pinned System

• 1D

• 2D (pitch specified)

• 2D (pitch determined by CUDA allocation)


Pitch
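Pitch is the number of bytes from the start of one row to the start of the next; it can exceed the row's width in bytes because rows may be padded for alignment. The sketch below assumes a caller-chosen alignment parameter for illustration; with cudaMallocPitch() the allocator picks the pitch itself and returns it, and that returned value is the one to use. `alignedPitch` and `pixelAt` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round a row's width in bytes up to a multiple of `alignment`.
// The alignment is an assumed parameter; cudaMallocPitch() chooses its own
// pitch and returns it, and that returned value is the one to use.
std::size_t alignedPitch(std::size_t widthPixels, std::size_t bytesPerPixel,
                         std::size_t alignment) {
    std::size_t rowBytes = widthPixels * bytesPerPixel;
    return (rowBytes + alignment - 1) / alignment * alignment;
}

// Pointer to pixel (x, y): rows start `pitch` bytes apart, which can be more
// than widthPixels * bytesPerPixel because of padding at the end of each row.
template <typename Pixel>
Pixel* pixelAt(void* base, std::size_t pitch, std::size_t x, std::size_t y) {
    auto* row = static_cast<std::uint8_t*>(base) + y * pitch;
    return reinterpret_cast<Pixel*>(row) + x;
}
```

For example, a 1920-pixel row of 2-byte pixels is 3840 bytes; with an assumed 512-byte alignment its pitch would be 4096 bytes, and pixel (3, 2) would sit 2 × 4096 + 6 bytes from the start of the buffer.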


CUDA API

Allocation: cudaMalloc(), cudaHostAlloc(), cudaMallocPitch()

Memory Copies: cudaMemcpy(), cudaMemcpy2D(), cudaMemcpyAsync(), cudaMemcpy2DAsync()
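cudaMemcpy2D() and cudaMemcpy2DAsync() take the destination pitch and the source pitch as separate arguments, which is what lets a tightly packed buffer be copied into a padded one. The host-only function below (`copy2D` is a hypothetical helper, not part of CUDA) models that row-by-row behavior with std::memcpy so the pitch handling can be checked without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Host-only model of cudaMemcpy2D(dst, dpitch, src, spitch, width, height, kind):
// copy `height` rows of `widthBytes` bytes each, where source and destination
// rows may be spaced by different pitches. Padding bytes at the end of each
// destination row are left untouched.
void copy2D(void* dst, std::size_t dpitch, const void* src, std::size_t spitch,
            std::size_t widthBytes, std::size_t height) {
    assert(widthBytes <= dpitch && widthBytes <= spitch);
    auto* d = static_cast<std::uint8_t*>(dst);
    const auto* s = static_cast<const std::uint8_t*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(d + row * dpitch, s + row * spitch, widthBytes);
    }
}
```

Copying a 4-byte-wide, 3-row image with source pitch 4 into a buffer with pitch 8 fills bytes 0 to 3, 8 to 11, and 16 to 19 of the destination and leaves the padding alone.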


Buffer Transfers

B.Copy(A, pStream)

• Source and destination buffers

– System, pinned system, device

– Different pitches

• Supports synchronous and asynchronous transfers
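A Copy() method like this has to map each source/destination pair onto a copy direction, and an asynchronous copy only reliably overlaps other work when the host side is pinned. The sketch below is a hypothetical model of that decision (`MemKind`, `copyKindName`, and `canOverlap` are illustrative names); it returns the names of the real cudaMemcpyKind enumerators as strings, and the rule that a missing stream means a synchronous copy is an assumption about the wrapper, not documented behavior of the talk's code.

```cpp
#include <cassert>
#include <string>

// Hypothetical buffer location tags mirroring the three kinds on the slides.
enum class MemKind { System, PinnedSystem, Device };

// Copy direction a B.Copy(A, pStream)-style call would need, named after
// the CUDA runtime's cudaMemcpyKind enumerators.
std::string copyKindName(MemKind src, MemKind dst) {
    bool srcDev = src == MemKind::Device;
    bool dstDev = dst == MemKind::Device;
    if (srcDev && dstDev) return "cudaMemcpyDeviceToDevice";
    if (srcDev) return "cudaMemcpyDeviceToHost";
    if (dstDev) return "cudaMemcpyHostToDevice";
    return "cudaMemcpyHostToHost";
}

// Conservative rule of thumb: a copy can overlap other work only if a stream
// is given (assumed: no stream means synchronous), it involves the device,
// and no side is pageable system memory.
bool canOverlap(MemKind src, MemKind dst, bool hasStream) {
    if (!hasStream) return false;
    if (src == MemKind::System || dst == MemKind::System) return false;
    return src == MemKind::Device || dst == MemKind::Device;
}
```

This is one reason the techniques list pairs DMA transfers with pinned system memory: pageable buffers are staged by the driver and do not give full overlap.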


CUDA Kernels

LaunchKernel(A, B, pStream, …)

• Buffers A and B are in device memory

• Sync/Async behavior controlled by pStream
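Inside such a kernel, each CUDA thread typically derives its pixel coordinates from blockIdx, blockDim, and threadIdx; the grid is sized by ceiling division, and threads that land past the image edge do nothing. The sketch below emulates that on the CPU for a hypothetical invert kernel on pitched 8-bit buffers; `Dim2`, `blocksFor`, `invertPixel`, and `launchInvert` are illustrative names, not functions from the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Dim2 { unsigned x, y; };

// Blocks needed to cover `n` items with `block` threads each (ceiling
// division), as when choosing the grid size for a kernel launch.
unsigned blocksFor(unsigned n, unsigned block) { return (n + block - 1) / block; }

// Per-pixel "kernel body": invert an 8-bit single-channel pitched image.
// On the GPU, (x, y) would be blockIdx * blockDim + threadIdx.
void invertPixel(const std::uint8_t* src, std::size_t srcPitch,
                 std::uint8_t* dst, std::size_t dstPitch,
                 unsigned width, unsigned height, unsigned x, unsigned y) {
    if (x >= width || y >= height) return;  // threads past the edge do nothing
    dst[y * dstPitch + x] = static_cast<std::uint8_t>(255 - src[y * srcPitch + x]);
}

// Host emulation of a launch: visit every (block, thread) pair in the grid.
void launchInvert(const std::uint8_t* src, std::size_t srcPitch,
                  std::uint8_t* dst, std::size_t dstPitch,
                  unsigned width, unsigned height, Dim2 block) {
    Dim2 grid{blocksFor(width, block.x), blocksFor(height, block.y)};
    for (unsigned by = 0; by < grid.y; ++by)
        for (unsigned bx = 0; bx < grid.x; ++bx)
            for (unsigned ty = 0; ty < block.y; ++ty)
                for (unsigned tx = 0; tx < block.x; ++tx)
                    invertPixel(src, srcPitch, dst, dstPitch, width, height,
                                bx * block.x + tx, by * block.y + ty);
}
```

For a 1920 × 1080 frame with 16 × 16 blocks, the grid would be 120 × 68 blocks; the last row of blocks is only partly used, which is why the bounds check matters.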

Processing

[Diagram: buffers A and D in CPU memory, buffers B and C in GPU memory]

• Acquire(A)

• B.Copy(A, pUploadStream)

• Process(B, C, pProcessingStream, params)

• D.Copy(C, pDownLoadStream)

• Output(D)

Double Buffering

[Diagram: two buffers that swap roles each frame; the one that is Src during frame “i” becomes Dst during frame “i + 1”, and vice versa]
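The role swap can be expressed with frame parity: during frame i one buffer is written while the other, filled during frame i - 1, is read. A minimal sketch; `DoubleBuffer` is a hypothetical helper, not a class from the talk.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Ping-pong pair: during frame i the writer fills buffers_[i % 2] while the
// reader consumes buffers_[(i + 1) % 2], which was filled during frame i - 1.
template <typename Buffer>
class DoubleBuffer {
public:
    Buffer& writeBuffer(std::size_t frame) { return buffers_[frame % 2]; }
    Buffer& readBuffer(std::size_t frame) { return buffers_[(frame + 1) % 2]; }

private:
    std::array<Buffer, 2> buffers_{};
};
```

Because the writer and reader never touch the same half during a frame, the two sides can run concurrently without locking, as long as both finish before the next swap.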

Double Buffering

[Diagram: a Src/Dst buffer pair at each stage of the pipeline, across CPU and GPU, around Processing]


Intel IPP: ippiFilter_8u_C1R(pSrcImgOffset, srcPitch, pDstImgOffset, dstPitch, roi, filterKernel, kernelSize, anchor, divisor);

NVIDIA NPP: nppiFilter_8u_C1R(pSrcImgOffset, srcPitch, pDstImgOffset, dstPitch, roi, filterKernel, kernelSize, anchor, divisor);

HPVP: Filter_8u_C1R(pSrc, pDest, roi, pFilterKernel);
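The shorter HPVP-style call is possible because a buffer object can carry its own pointer and pitch, and a filter object can bundle the kernel, kernel size, anchor, and divisor. The CPU sketch below shows that packaging using hypothetical `Image8u`, `Roi`, and `FilterKernel` types and reuses the slide's Filter_8u_C1R name for illustration only. It computes a plain weighted sum with integer division and clamping; it does not reproduce IPP's or NPP's coefficient-ordering, rounding, or border conventions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image descriptor: pointer and pitch travel together.
struct Image8u {
    std::uint8_t* data;
    std::size_t pitch;  // bytes between the starts of consecutive rows
};

struct Roi { int x, y, width, height; };

// Hypothetical filter descriptor bundling the separate IPP/NPP arguments.
struct FilterKernel {
    std::vector<int> taps;  // row-major, width * height coefficients
    int width, height;
    int anchorX, anchorY;
    int divisor;
};

// CPU reference for a single-channel 8-bit filter over `roi`. The caller must
// keep the ROI inset so every source pixel the kernel touches exists.
void Filter_8u_C1R(const Image8u& src, Image8u& dst, Roi roi, const FilterKernel& k) {
    for (int y = roi.y; y < roi.y + roi.height; ++y) {
        for (int x = roi.x; x < roi.x + roi.width; ++x) {
            int sum = 0;
            for (int ky = 0; ky < k.height; ++ky) {
                for (int kx = 0; kx < k.width; ++kx) {
                    int sx = x + kx - k.anchorX;
                    int sy = y + ky - k.anchorY;
                    sum += k.taps[ky * k.width + kx] * src.data[sy * src.pitch + sx];
                }
            }
            int v = sum / k.divisor;
            dst.data[y * dst.pitch + x] =
                static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
        }
    }
}
```

A 3 × 3 box filter with divisor 9 applied to a constant image leaves the ROI unchanged, which is a quick sanity check for the indexing.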


Live Filtering

• Acquire(A)

• B.Copy(A, pUploadStream)

• Filter_8u_C3R(B, C, roi, pFilterKernel) *

• D.Copy(C, pDownLoadStream)

• Output(D)

* CUDA stream for processing already defined
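If the five calls above run as a pipeline with double-buffered A, B, C, and D, each call works on a different frame during the same frame tick. The sketch below lists what would be issued at tick t under that assumption; `Step` and `stepsForTick` are hypothetical names, and `slot` shows which half of each double buffer that frame uses.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One call issued during a frame tick, the frame it works on, and the half of
// the double buffers that frame uses (frame % 2).
struct Step {
    std::string call;
    long frame;
    int slot;
};

// Calls issued during tick `t`: Acquire handles frame t, the upload handles
// t - 1, filtering t - 2, the download t - 3, and Output t - 4.
std::vector<Step> stepsForTick(long t) {
    static const char* calls[] = {"Acquire(A)", "B.Copy(A, pUploadStream)",
                                  "Filter_8u_C3R(B, C, roi, pFilterKernel)",
                                  "D.Copy(C, pDownLoadStream)", "Output(D)"};
    std::vector<Step> steps;
    for (int s = 0; s < 5; ++s) {
        long frame = t - s;
        if (frame < 0) continue;  // pipeline still filling
        steps.push_back({calls[s], frame, static_cast<int>(frame % 2)});
    }
    return steps;
}
```

Adjacent calls always use opposite slots (for example, Acquire writes A's slot for frame t while the upload reads A's slot for frame t - 1), so no buffer half is read and written in the same tick.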


References/Links

• “Best Practices in GPU-Based Video Processing,” Tom True, NVIDIA, GTC 2013

• “Topics in GPU-Based Video Processing,” Tom True, NVIDIA, GTC 2014

• http://www.youtube.com/watch?v=QpEV-XVIxNw

• http://frontrow.espn.go.com/2014/01/espns-advanced-replay-tool-art-graphically-enhances-sports-telecasts/


Questions

Peter Walsh, ESPN, (860) 766-2908