Presented By
Name: Fabio Murra and Obioma Okehie
Title: PERSEUS Plus, the first single-FPGA real-time 4Kp60 encoder
Date: 10th December 2018
PERSEUS Plus HEVC: the first single-FPGA real-time 4Kp60 encoder

PERSEUS Plus HEVC - Xilinx · Bandwidth savings Codec Agnostic •Up to 50% more efficient •Live UHDp60 @8Mbps, 1080p60 @3Mbps •Increase reach, improve quality of experience,

May 17, 2020

Transcript
Page 1

Presented By

Name: Fabio Murra and Obioma Okehie

Title: PERSEUS Plus, the first single-FPGA real-time 4Kp60 encoder

Date: 10th December 2018

PERSEUS Plus HEVC: the first single-FPGA real-time 4Kp60 encoder

Page 2

Unique compression technologies to dramatically improve density & video quality to all screens over any network

Page 3

Traditional approaches to video compression

Traditional encoding approaches

Traditionally, a new standard has been developed within MPEG every 7-10 years. Each standard freezes the codec algorithm so that it can be deployed as a hardware block within the dedicated encoding / decoding devices and SoCs needed to deal with the high complexity of the algorithm in real time.

h.264, HEVC, VP9, AV1

Page 4

PERSEUS Plus: a new approach

• Unique hierarchical image representation is far more efficient than the traditional block-based codecs

• Combining PERSEUS Plus with an existing base codec improves the overall quality and bandwidth requirements

• The approach better utilises the hardware resources available in modern chipsets and FPGAs

Lower res base increases density 4x

[Figure labels: Shaders and Scalers graphics computations; Control; Transforms]
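The "4x" density figure is consistent with simple pixel arithmetic. The sketch below assumes, which the slide does not state explicitly, that the base codec runs at half the output resolution in each dimension (for example a 1080p base under a UHD output), so the base encoder handles a quarter of the pixels.

```python
def pixels(width, height):
    """Number of luma samples in one frame."""
    return width * height

# Assumed resolutions: UHD output with a half-width, half-height base layer.
output_pixels = pixels(3840, 2160)
base_pixels = pixels(1920, 1080)

# Ratio of pixels the base codec would process at full resolution
# versus at the lower base resolution.
ratio = output_pixels / base_pixels
print(ratio)  # 4.0
```

Under that assumption, one base-codec instance sized for 1080p can sit beneath a UHD stream, which matches the slide's claim of fitting four times as much work into the same base-codec resources.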

Page 5

PERSEUS Plus: a new approach

V-Nova becoming a standard:

PERSEUS Plus in process for “Low Complexity Codec Enhancements”

PERSEUS Pro undergoing standardization as VC-6/ST-2117

Page 6

PERSEUS on Xilinx FPGA: unique benefits

Bandwidth savings

• Up to 50% more efficient

• Live UHDp60 @8Mbps, 1080p60 @3Mbps

• Increase reach, improve quality of experience, reduce cost

Unbeatable Density

• 4x increase in density on FPGA

• 50x denser than the equivalent software-only implementation

• UHDp60 in single FPGA

Codec Agnostic

• PERSEUS Plus is codec agnostic

• Works with h.264, HEVC, VP9 and even AV1 when available

• Maximum compatibility with existing workflow
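To put the quoted operating points (UHDp60 @8Mbps, 1080p60 @3Mbps) in perspective, the following sketch converts them to bits per pixel and to a compression ratio against the raw source. It assumes an 8-bit 4:2:0 source (12 bits per pixel), which is an assumption made for illustration and is not stated on the slide.

```python
RAW_BITS_PER_PIXEL = 12  # assumption: 8-bit 4:2:0 (8 luma + 4 chroma bits per pixel)

def bits_per_pixel(bitrate_bps, width, height, fps):
    """Average compressed bits spent per pixel."""
    return bitrate_bps / (width * height * fps)

def compression_ratio(bitrate_bps, width, height, fps):
    """Raw bitrate divided by compressed bitrate."""
    return RAW_BITS_PER_PIXEL / bits_per_pixel(bitrate_bps, width, height, fps)

operating_points = [
    ("UHDp60 @ 8 Mbps", 8_000_000, 3840, 2160, 60),
    ("1080p60 @ 3 Mbps", 3_000_000, 1920, 1080, 60),
]

for label, bitrate, width, height, fps in operating_points:
    bpp = bits_per_pixel(bitrate, width, height, fps)
    ratio = compression_ratio(bitrate, width, height, fps)
    print(f"{label}: {bpp:.4f} bits/pixel, about {ratio:.0f}:1 versus raw")
```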

Page 7

The ONLY 4Kp60 real-time encoder on single FPGA

Configuration | 4Kp60 on single VU9P | 4Kp60 on 4 x VU9P | 4Kp60 on 80 x x86 cores
Encoder | V-Nova PERSEUS+ NGC HEVC | NGC HEVC only | x265 Software (very slow preset)
Performance | Best | Medium | Lowest
Cost | Lowest | Medium | Highest
Power | Lowest | Medium | Highest

Page 8

PERSEUS: from Algorithm to Board

Page 9

Compression algorithm to Board

[Flow diagram labels: Algorithm; Fix-Pointing; RTL Coding; RTL Verification; Board Installation; Synthesis / P&R; Board Programming; System Verification; Design spec; Host Machine setup]

Page 10

Going through the implementation options

˃ Two options for achieving design implementation

Full hardware flow

‒ We have full control of every aspect of the implementation and deployment platforms

‒ Tools used for pre-synthesis stages can be flexible

‒ Take full responsibility for host / kernel drivers

‒ Typically longer design times (much work involved)

SDAccel design flow

‒ Static region is abstracted out, so we only need to care about the core IP

‒ Reduces design time

‒ Integration with partner IPs is much easier and faster

˃ Option we chose

SDAccel RTL-kernel design flow

‒ OpenCL runtime

‒ XMA runtime

Page 11

SDAccel

[Flow diagram labels: RTL Coding; Full system integration (Core-IP, DDR, PCIe); Existing project; Split Core-IP into separate kernels; Package custom IP; Block design based integration; Kernel Verification; Synthesize; Generate RTL kernel (.xo); SDAccel build; Host Code; *.exe; *.xclbin; SDAccel generate kernels; MicroBlaze executable build; External IP (.xo file)]

˃ Started from a known good design

˃ Split design into separate kernels (parallelism)

˃ Generated RTL kernels (.xo) and MicroBlaze executables (.elf) via Vivado

˃ Integrated host code and external IP within SDAccel

Page 12

FPGA occupancy: 4Kp60 in a single VU9P FPGA

Item | Core-IP % | DSA static % | Fitting Percentage (%)
Current LUTs | 62.94 | 9.36 | 72.3
Memory LUTs | 20.11 | 1.35 | 21.5
Current DSPs | 45.77 | 0.04 | 45.8
Current FFs | 32.91 | 5.56 | 38.5
36kbit BRAMs (RAMB36) | 45.09 | 10.42 | 55.5
(OR) 18kbit BRAMs (RAMB18) | 19.12 | 0.19 | 19.3
BRAM (Kbits) | 64.22 | 10.60 | 74.8
288Kbit UltraRams (RAMB288) | 41.67 | 0.00 | 41.7
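The fitting percentages can be cross-checked by adding the Core-IP and DSA static figures for each resource. The sketch below does that and also prints the remaining headroom. The mapping of values to resource names follows the order of the column headers in the transcript, which is an assumption because the header row was garbled in extraction.

```python
# Values copied from the occupancy table; resource order is assumed to
# follow the header order in the transcript.
resources = [
    "Current LUTs", "Memory LUTs", "Current DSPs", "Current FFs",
    "RAMB36", "RAMB18", "BRAM (Kbits)", "URAM288",
]
core_ip = [62.94, 20.11, 45.77, 32.91, 45.09, 19.12, 64.22, 41.67]
dsa_static = [9.36, 1.35, 0.04, 5.56, 10.42, 0.19, 10.60, 0.00]
reported_fit = [72.3, 21.5, 45.8, 38.5, 55.5, 19.3, 74.8, 41.7]

for name, core, dsa, reported in zip(resources, core_ip, dsa_static, reported_fit):
    used = core + dsa      # total occupancy of this resource
    free = 100.0 - used    # headroom left on the device
    print(f"{name:13s} used {used:6.2f}%  free {free:6.2f}%  (slide: {reported}%)")
```

Each computed total agrees with the reported fitting percentage to within rounding, and the most heavily used resources (LUTs and BRAM) still leave roughly a quarter of the device free.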

[Block diagram labels: PERSEUS Kernels; NGC Kernel; Memory Subsystem (Xilinx); DSA (Xilinx); DDR; Host CPU; Dynamic Region; Static Region]

[Chart: % LUTs split into Core-IP, DSA and Free]

Page 13

SDAccel

˃ Difficulties we faced and how we solved them (2017.4.op)

˃ Close communication with Xilinx SMEs & FAEs will help resolve most issues in a timely manner.

Kernel verification (HW-Emu option does not enable backpressure on the memory interfaces)

‒ Requires a mixed signal simulator license to use faster external simulators for complex systems

Accessing local memory contents (e.g. ROM values)

‒ Create an extra HW process to dump contents onto DDR

Debugging the MicroBlaze code

‒ Connect the MicroBlaze processor to the AXI-MM interface to dump text printouts onto DDR

Lack of control over allocation of local memory (DDR on FPGA)

‒ Address re-routing logic required (for cases where the allocated address is outside the module address range)
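One way to picture the address re-routing mentioned above is as a base-address translation: the module keeps issuing addresses inside its own fixed window, and a small stage in front of the memory interface adds the base of whatever buffer the runtime actually allocated. The sketch below is only a behavioural model in Python, not the presenters' RTL. The window size, the example base address and the function name are all hypothetical.

```python
MODULE_WINDOW_BYTES = 1 << 28  # hypothetical: the module can address 256 MiB

def reroute(module_addr, allocated_base):
    """Translate a module-local address into the DDR address of the
    buffer that the runtime allocated (hypothetical helper)."""
    if not 0 <= module_addr < MODULE_WINDOW_BYTES:
        raise ValueError("address outside the module window")
    return allocated_base + module_addr

# Hypothetical buffer base chosen by the runtime, beyond the module window.
allocated_base = 0x4_0000_0000
ddr_addr = reroute(0x1000, allocated_base)
print(hex(ddr_addr))  # 0x400001000
```

The point of the model is that the module's own address range never changes; only the translation stage needs to know where the runtime placed the buffer.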

Page 14

Integrating PERSEUS Plus into FFmpeg framework

$ ffmpeg \
  -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -an -i /home/ffmpeg/VU9P/TestSequences/Kimono1_1920x1080_24.yuv \
  -frames 240 -b:v 4000k -g 30 -c:v pplusenc_fpga -y ./hw_outdir/out1_br4000k.h264

$ ffmpeg \
  -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -an -i /home/ffmpeg/VU9P/TestSequences/Kimono1_1920x1080_24.yuv \
  -frames 240 -c:v libx264 -preset medium -profile:v high -crf 23 -bf 4 -refs 3 -g 30 -b:v 4000k -maxrate 4000k -bufsize 8000k -f h264 -r 30 -y ./sw_outdir/x264_medium_out0_br4000k.h264

https://trac.ffmpeg.org/wiki/EncodingForStreamingSites

Change < 20 characters to get hyper acceleration
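The idea that switching encoders is a small change to an existing FFmpeg invocation can be sketched with a helper that builds the argument list and varies only the `-c:v` value, any encoder-specific options, and the output path. This is an illustrative sketch: `build_encode_command` is a hypothetical helper, the software variant is simplified rather than a copy of the full libx264 option list on the slide, and `pplusenc_fpga` is the encoder name shown on the slide, not an encoder shipped with stock FFmpeg.

```python
import shlex

def build_encode_command(encoder, output_path, extra_args=()):
    """Return an ffmpeg argv list for the raw 1080p test sequence.

    Input and rate-control arguments are copied from the slide; only the
    encoder name, its extra options and the output path vary.
    """
    return [
        "ffmpeg",
        "-f", "rawvideo", "-pix_fmt", "yuv420p", "-s:v", "1920x1080",
        "-r", "30", "-an",
        "-i", "/home/ffmpeg/VU9P/TestSequences/Kimono1_1920x1080_24.yuv",
        "-frames", "240", "-b:v", "4000k", "-g", "30",
        "-c:v", encoder, *extra_args,
        "-y", output_path,
    ]

fpga_cmd = build_encode_command("pplusenc_fpga", "./hw_outdir/out1_br4000k.h264")
sw_cmd = build_encode_command(
    "libx264",
    "./sw_outdir/x264_medium_out0_br4000k.h264",
    extra_args=("-preset", "medium"),
)

# Print shell-quoted command lines; nothing is executed here.
print(shlex.join(fpga_cmd))
print(shlex.join(sw_cmd))
```

Passing the resulting list to `subprocess.run` would launch the encode on a machine where the corresponding FFmpeg build and encoder are installed.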

Page 15

Visual Quality Improvement

PERSEUS Plus

Page 16

UHD VQ improvements

[Chart: Quality vs. Bitrate, comparing HEVC (NGCodec) with PERSEUS Plus HEVC (NGCodec)]

4x lift in density coupled with:

- Video quality improvement

- Bandwidth savings

Page 17

UHD VQ improvements

[Chart: Quality vs. frames, comparing HEVC (NGCodec) with PERSEUS Plus HEVC (NGCodec)]

Page 18

Improving quality and density of existing deployments

[Chart: 1080p at 1.0 Mbps - Full HD over 4G; 'Watchability' (MS-SSIM) for QSV h.264 and PERSEUS]

[Chart: 480p at 500 kbps; MS-SSIM for X.264 and Perseus]

[Chart labels: PERSEUS Plus h.264; Native h.264; Quality (higher is better); Frames]

Page 19

PERSEUS Plus Products:

1. Acceleration for any 3rd party IP

2. Acceleration for Xilinx Video IP

Page 20

PERSEUS XSA – Available for deployment

Benefits

Enhance existing server performance:

• Add real-time 4Kp60

• Increase ABR density

• Reduce power

• Reduce $ per channel

PERSEUS + any codec

QSV (available now)

x264 (available now)

x265 (available now)

VP9 (roadmap)

AVS2 (pending business case)

AV1 (pending business case)

VVC (pending business case)

PERSEUS Plus

Xilinx VU9P implementation of PERSEUS Plus works with any codec

Page 21

PERSEUS XDE – Available soon

Configuration | 4Kp60 on single VU9P | 4Kp60 on 4 x VU9P | 4Kp60 on 80 x x86 cores
Encoder | V-Nova PERSEUS+ NGC HEVC | NGC HEVC only | x265 Software (very slow preset)
Performance | Best | Medium | Lowest
Cost | Lowest | Medium | Highest
Power | Lowest | Medium | Highest

Page 22

PERSEUS Plus Xilinx IP Offering

Codec | Partner | Description | PERSEUS Plus benefits | Availability
H.264 HDE | Alma | High density encoder | Improve video quality | Feasibility
H.264 HQE | IDT | High quality encoder | |
HEVC-HDE | NGCodec | High density encoder | |
HEVC-HQE | NGCodec | High quality encoder | Improve density 4x; Improve UHD VQ | SOON
VP9-HQE | NGCodec | High quality encoder | Improve density 4x; Improve UHD VQ | Roadmap
Zynq-H.264 | Xilinx | Hardened H.264 core | Improved video quality; Improved density | Feasibility
Zynq-H.265 | Xilinx | Hardened H.265 core | Improved video quality; Improved density | Feasibility

Codec | Partner | Description | PERSEUS Plus benefits | Availability
x264 | Open source | Software encoder | Improve video quality (from –medium to –very-slow); Improve density | NOW
x265 | Open source | Software encoder | Improve video quality (from –fastest to –medium); Improve density | NOW
QSV | Intel | Hardened core | Improve density 3x; Improve video quality | NOW

Under review
