Page 1
Copyright©2015 NTT corp. All Rights Reserved.
Professional H.265/HEVC Encoder
LSI Toward High-Quality 4K/8K
Broadcast Infrastructure
Hiroe Iwasaki, Takayuki Onishi, Ken Nakamura, Koyo Nitta,
Takashi Sano, Yukikuni Nishida, Kazuya Yokohari, Jia Su,
Naoki Ono, Ritsu Kusaba, Atsushi Sagata, Mitsuo Ikeda, and
Atsushi Shimizu
NTT Media Intelligence Laboratories
NTT Corporation
Code Name: NARA (Next-generation encoder Architecture for Real-time HEVC Application)
Page 2
Outline
• Introduction and Background
  – History of NTT’s Video CODEC LSIs
  – Roadmaps toward 4K/8K UHDTV
  – Latest video coding standard (HEVC)
  – Requirements for 4K/8K broadcasting
• NARA Architecture
• NARA Key Features and Functions
  – Single-chip configuration
  – Multi-chip configuration
• NARA Chip Implementation
• Target Applications
• Conclusion
Page 3
History of NTT’s video CODEC LSIs
[Figure: timeline of NTT video CODEC LSIs, moving from SDTV through HDTV to UHDTV, from analog to digital, and from MPEG-2 through H.264 to H.265]
LSIs as labeled in the figure:
• ENC-C/-M (’95) (HotChips7)
• SuperENC (’98) (HotChips10)
• SuperENCII (’00)
• VASA (’02) (HotChips14)
• ISIL (’02) (CICC2003)
• ISIL-II (’07) (CoolChips X)
• SARA (’07) (HotChips19)
• SARA/D (’08) (CoolChips XI)
• NARA (’15)
Products and functions shown: Encoder PC Card (ICCE2000), Encoder PCI Board, Portable HDTV Encoder (ICCE2001), NHK/NTT-COM Board, HDTV MPEG-2 Encoder, HDTV MPEG-2 Decoder, HDTV H.264 Encoder, HDTV H.264 Decoder, MPEG-2/H.264 Transcoder
Page 4
Roadmap toward 4K/8K UHDTV
• 4K test broadcast over satellite in 2014, 8K in 2016
• 4K/8K commercial broadcast TV programs in 2020
[Figure: 4K/8K Roadmap in Japan. Timeline 2011 to 2020: analog-to-digital terrestrial transition, CATV/IPTV, 4K test broadcast (2014, Rio de Janeiro soccer games), 8K test broadcast (2016, Rio de Janeiro sporting event), and 2020 (Tokyo sporting event)]
[Figure: picture sizes of 1920 x 1080, 3840 x 2160 (4K), and 7680 x 4320 (8K) pixels]
Reference: Interim Report of 4K/8K Roadmap Follow-up Meeting (MIC, Japan)
Page 5
HEVC – High Efficiency Video Coding
• The latest video coding standard (Jan. 2013, Range extensions Apr. 2014)
• Achieves half the bit rate of H.264 and one quarter that of MPEG-2; a key technology for 4K/8K
[Figure: video coding standards history. Bit rate (log scale) versus year: H.261 (ISDN videophone, 1990), MPEG-2 (Digital TV, DVD, 1996), MPEG-4 (Cellular videophone, 1999), H.264 (Blu-ray, camcorder, 2003), H.265/HEVC (4K/8K Broadcast & Distribution, 2013)]
Page 6
What is HEVC?
• Follows the existing encoding flow, but with an “adaptive and exhaustive” combination of prediction tools
[Figure: encoder block diagram. Video frames minus predicted pixels -> Transform -> Quantize -> Entropy coding -> Encoded stream (10011010…); Inv. quantize -> Inv. transform -> (+) -> Loop filtering -> Frame memory (locally decoded frames) -> Inter prediction (motion vector search) and Intra prediction]
Intra (within-a-frame) prediction
• H.264: 8 directions, 4x4..16x16 blocks
• HEVC: 33 pred. directions, 4x4..32x32 blocks
Inter (inter-frame) prediction
• H.264: 16x16, 16x8, 8x16, 8x8 blocks (*4x4 sub-MB available, but rarely used)
• HEVC: Coding Tree Unit (CTU, max. 64x64) divided into Coding Units (CUs), with Prediction Units (PUs) carrying motion vectors (MVs) or intra prediction, and Transform Units (TUs)
[Figure: example result of partitioning a 64x64 CTU into CUs, PUs, and TUs]
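The CTU-to-CU partitioning described above can be sketched as a recursive quadtree. This is a minimal illustration, not NARA's method: `split_ctu` and `toy_rule` are hypothetical names, the split rule is made up, and the 8x8 minimum CU size is an assumption taken from the HEVC block sizes listed on this slide.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) in luma samples


def split_ctu(x: int, y: int, size: int,
              should_split: Callable[[int, int, int], bool],
              min_cu: int = 8) -> List[Block]:
    """Return the leaf CUs of one CTU in Z-scan order."""
    if size > min_cu and should_split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        # Z-scan: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_ctu(x + dx, y + dy, half, should_split, min_cu)
        return leaves
    return [(x, y, size)]


def toy_rule(x: int, y: int, size: int) -> bool:
    # Hypothetical rule: keep splitting only the block at the CTU origin.
    return x == 0 and y == 0


cus = split_ctu(0, 0, 64, toy_rule)
for cu in cus:
    print(cu)
```

A real encoder would replace `toy_rule` with a cost-based decision; the sketch only shows how one 64x64 CTU can end up holding CUs of several sizes.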
Page 7
HEVC encoding complexity
• HEVC encoding takes about 30x the processing time of MPEG-2 and about 5x that of H.264
[Figure: 4K video coding time comparison. Processing time ratio (MPEG-2 = 1) for MPEG-2, H.264, and HEVC on a 0 to 35 scale, broken down into motion search, freq. transform, Q+IQ, CABAC, intra prediction, deblocking, SAO, and others]
Reference software: MPEG-2_MSSG, H.264_JM18.5, HEVC_HM12.1
Page 8
Requirements for 4K/8K broadcasting
• Practical 4K/8K broadcast infrastructure in 2020
• Latest video coding standard (H.265/HEVC) for high compression
• Color signal robustness against tandem encoding
• High bitrate of up to 600 Mbps
NARA: Professional H.265/HEVC encoder LSI toward high-quality 4K/8K broadcast infrastructure
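To put the 600 Mbps figure in context, the arithmetic below compares it with the uncompressed rate of a 4K source. It is a rough sketch under stated assumptions: 3840x2160 at 60 frames/sec, 4:2:2 sampling, and 10 bits per sample (suggested by the Main 4:2:2 10 profile in this deck). `raw_bitrate_bps` is a hypothetical helper.

```python
def raw_bitrate_bps(width, height, fps, bit_depth, chroma_format):
    """Uncompressed video bit rate in bits per second."""
    # Total samples (luma + both chroma planes) per 4 luma samples.
    samples_per_4_luma = {"4:2:0": 6, "4:2:2": 8, "4:4:4": 12}[chroma_format]
    samples_per_frame = width * height * samples_per_4_luma // 4
    return samples_per_frame * fps * bit_depth


raw = raw_bitrate_bps(3840, 2160, 60, 10, "4:2:2")
ratio = raw / 600e6  # compression ratio at the 600 Mbps ceiling
print(f"raw: {raw / 1e9:.2f} Gbps, ratio at 600 Mbps: {ratio:.1f}:1")
```

Under these assumptions, even the highest contribution bitrate still implies compression of roughly an order of magnitude.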
Page 9
Main concepts for NARA architecture
• Application-specific hardware blocks for processes with high computational complexity, such as precise motion estimation
• Hierarchical pipeline scheme for decisions on optimal hierarchical coding/prediction/transform unit size with high compression
• Single-chip 4K configuration and multi-chip 8K configuration for practical encoding systems
Page 10
NARA block diagram
[Figure: NARA block diagram]
Blocks shown: VIF (video data in), Prediction Core (with PRISC), Motion Estimation (ME) engines (WME, MME, IME, FME), IFE, MED, IPD, IIM, Dual Coding Core (with CRISC), MC, TQ / ITIQ, DF / SAO, CABAC, BSO, MUX, Audio, TRISC, MRISC, 8K Configurable Reference Picture Image Cache, Reference Picture Image Bus, MBUS, Bus Interconnect, DDR I/F (DDR x 3), PCIe I/F, Multi-chip Stream In, Host CPU (Host BUS), Output
Figure annotations: 210 Mbit, 5120 bit, 768 GOPS, 600 Mbps
Abbreviations:
• VIF: Video Interface
• IFE: Image Feature Extraction
• MED: Multi-block-size Edge Detector
• IPD: Intra Prediction
• WME: Wide-range Motion Estimation
• MME: Multi-Block-Size Motion Estimation
• IME: Integer pixel Motion Estimation
• FME: Fractional pixel Motion Estimation
• MC: Motion Compensation
• IIM: Intra-Inter Mode Decision
• TQ: Transform and Quantization
• ITIQ: Inverse Transform and Quantization
• DF: Deblocking Filter
• SAO: Sample Adaptive Offset filtering
• BSO: Bit Stream Out
• MUX: Multiplexer
• MBUS: Memory BUS
• PRISC: Prediction Core RISC
• CRISC: Coding Core RISC
• MRISC: Middle-level RISC
• TRISC: Top-level RISC
Page 11
NARA pipeline policies
• Parallel processing for precise motion estimation achieving better coding efficiency
• Strictly sequential calculation (conforming to HEVC standard) desirable for mode decision to precisely evaluate coding bit costs
• Short pipeline stages for efficient rate control
Page 12
NARA pipeline scheme
[Figure: CTU-based pipeline (unit labeled 1CTU): WME -> MME -> IME -> Hpel-FME -> Qpel-FME -> Bipred-FME -> IIM -> MC -> TQ/ITIQ -> DF -> SAO -> CABAC. Annotations: pixel data transfer and filtering; squeeze from 4 to 3 sizes by parallel pre-decision; block-size-parallel motion search; full-tournament mode decision]
• NARA adopts a CTU-based hierarchical pipeline scheme
• Mode decisions are filtered stage by stage for coding efficiency while keeping image quality:
  – Wide-range ME (WME) -> Multi-block-size ME (MME) -> Integer ME (IME) -> Fractional ME (FME) -> Inter/Intra Mode Decision (IIM)
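The overlap between stages in a CTU-based pipeline can be sketched with an idealized schedule. The stage names come from the figure; everything else is an assumption for illustration only: each stage is taken to need exactly one time slot per CTU, and `schedule` is a hypothetical helper, not a model of NARA's actual timing.

```python
STAGES = ["WME", "MME", "IME", "Hpel-FME", "Qpel-FME", "Bipred-FME",
          "IIM", "MC", "TQ/ITIQ", "DF", "SAO", "CABAC"]


def schedule(num_ctus):
    """Map each time slot to the (stage, ctu) pairs active in that slot."""
    slots = {}
    for ctu in range(num_ctus):
        for depth, stage in enumerate(STAGES):
            # CTU n enters stage d at slot n + d (idealized, no stalls).
            slots.setdefault(ctu + depth, []).append((stage, ctu))
    return slots


slots = schedule(4)
total_slots = len(slots)
print("slots needed:", total_slots)
print("active in slot 3:", slots[3])
```

In this idealized model, N CTUs finish in N + 11 slots rather than 12 x N, which is the throughput benefit a CTU-granular pipeline is after.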
Page 13
Parallel pre-decision in MME
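[Figure: parallel pre-decision. 8x8 SADs feed 16x16, 32x32, and 64x64 evaluation, followed by a judgment step. Pipeline context: WME, MME, IME, Hpel-FME, Qpel-FME, Bipred-FME, IIM, MC, TQ/ITIQ, DF, SAO, CABAC]
• Motion estimation covers all four block sizes (8x8, 16x16, 32x32, 64x64) through parallel pre-decision
• Parallel pre-decision then narrows the candidates to three block sizes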
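One way to picture the pre-decision is to build larger-block SADs by summing 8x8 SADs and then drop one of the four sizes. The sketch below is illustrative only. The judgment rule (choose between 8x8 and 64x64 by how unevenly the error is spread) and the threshold are assumptions, not the criterion NARA uses, and all function names are hypothetical.

```python
def sad_8x8_grid(cur, ref):
    """SAD of every 8x8 block inside a 64x64 CTU, as an 8x8 grid."""
    grid = [[0] * 8 for _ in range(8)]
    for y in range(64):
        for x in range(64):
            grid[y // 8][x // 8] += abs(cur[y][x] - ref[y][x])
    return grid


def merge_2x2(grid):
    """Sum each 2x2 group of SADs to get the next larger block size."""
    n = len(grid) // 2
    return [[grid[2 * j][2 * i] + grid[2 * j][2 * i + 1]
             + grid[2 * j + 1][2 * i] + grid[2 * j + 1][2 * i + 1]
             for i in range(n)] for j in range(n)]


def sad_pyramid(cur, ref):
    pyramid = {8: sad_8x8_grid(cur, ref)}
    for size in (16, 32, 64):
        pyramid[size] = merge_2x2(pyramid[size // 2])
    return pyramid


def pre_decide(pyramid, spread_threshold=4.0):
    """Hypothetical judgment: keep 16x16 and 32x32, then pick 8x8 or 64x64."""
    flat = [v for row in pyramid[8] for v in row]
    mean = sum(flat) / len(flat)
    uneven = mean > 0 and max(flat) > spread_threshold * mean
    return [16, 32, 8 if uneven else 64]


cur = [[0] * 64 for _ in range(64)]
ref = [[0] * 64 for _ in range(64)]
for y in range(8):
    for x in range(8):
        cur[y][x] = 10  # mismatch confined to the top-left 8x8 block

pyramid = sad_pyramid(cur, ref)
kept = pre_decide(pyramid)
print("64x64 SAD:", pyramid[64][0][0], "sizes kept:", kept)
```

Because every larger SAD is a sum of 8x8 SADs, the four sizes can be evaluated from one pass over the pixels, which is what makes a parallel judgment cheap.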
Page 14
Block-size parallel motion search in FME
[Figure: block-size parallel motion search with three parallel searches: 16x16, 32x32, and 8x8/64x64. Pipeline context: WME, MME, IME, Hpel-FME, Qpel-FME, Bipred-FME, IIM, MC, TQ/ITIQ, DF, SAO, CABAC]
• Motion estimation for 3 block sizes in block-size parallel motion search
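The Hpel-FME and Qpel-FME stages named in the pipeline suggest a coarse-to-fine refinement around the integer motion vector. The sketch below shows that idea for a single block size; in hardware the same search would presumably run for the three block sizes side by side. The cost function is a made-up stand-in for a real matching cost, and `refine`, `fractional_search`, and `toy_cost` are hypothetical names.

```python
def refine(best_mv, cost, step):
    """Test the centre and its 8 neighbours at +/-step; keep the cheapest."""
    candidates = [(best_mv[0] + dx * step, best_mv[1] + dy * step)
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return min(candidates, key=cost)


def fractional_search(integer_mv, cost):
    # Motion vectors are held in quarter-pel units: 4 units = one pixel.
    start = (integer_mv[0] * 4, integer_mv[1] * 4)
    half = refine(start, cost, step=2)  # half-pel stage
    return refine(half, cost, step=1)   # quarter-pel stage


def toy_cost(mv):
    # Hypothetical cost with its minimum at (2.75, -1.25) pixels.
    return abs(mv[0] - 11) + abs(mv[1] + 5)


best = fractional_search((3, -1), toy_cost)
print("best MV in pixels:", (best[0] / 4, best[1] / 4))
```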
Page 15
Full-tournament mode decision in IIM
[Figure: full-tournament mode decision in Z-scan order across 4x4 / 8x8, 16x16, 32x32, and 64x64 blocks. Pipeline context: WME, MME, IME, Hpel-FME, Qpel-FME, Bipred-FME, IIM, MC, TQ/ITIQ, DF, SAO, CABAC]
• Strictly sequential calculation by full-tournament mode decision
• Strictly sequential calculation (conforming to HEVC standard) is desirable for mode decision to precisely evaluate coding bit costs
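A tournament over block sizes can be read as a bottom-up comparison: at each level, the cost of coding a block whole competes with the summed cost of its four children, visited in Z-scan order. The sketch below assumes that reading. The cost model is invented for illustration, and `best_partition` and `toy_cost` are hypothetical names rather than NARA functions.

```python
def best_partition(x, y, size, cost, min_size=8):
    """Return (total_cost, leaves) for the cheaper of: whole block or 4-way split."""
    whole = cost(x, y, size)
    if size == min_size:
        return whole, [(x, y, size)]
    half = size // 2
    split_cost, split_leaves = 0, []
    for dy in (0, half):      # Z-scan order: top row first,
        for dx in (0, half):  # left before right
            c, leaves = best_partition(x + dx, y + dy, half, cost, min_size)
            split_cost += c
            split_leaves += leaves
    if split_cost < whole:
        return split_cost, split_leaves
    return whole, [(x, y, size)]


def toy_cost(x, y, size):
    # Hypothetical bit cost: fixed overhead per block, plus a penalty when a
    # large block covers the "detailed" top-left 16x16 corner.
    covers_detail = x < 16 and y < 16
    penalty = size * 4 if covers_detail and size > 8 else 0
    return 10 + penalty


total, leaves = best_partition(0, 0, 64, toy_cost)
print("total cost:", total, "blocks:", len(leaves))
```

Evaluating the children before the parent is what forces the sequential order: a parent's decision cannot be made until all four child costs are final.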
Page 16
NARA configurations
• Capability
  – Single-chip processing up to 4K 60fps 4:2:2
    [Figure: 3840 x 2160 picture, 60 frames/sec, HEVC, 4:2:0 and 4:2:2 chroma sampling]
  – Multi-chip scalability up to 8K 60fps
    [Figure: 7680 x 4320 picture, HEVC]
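The two configurations can be related with simple sample counts. The sketch assumes the picture sizes from the figure (3840x2160 and 7680x4320); `samples_per_frame` is a hypothetical helper.

```python
def samples_per_frame(width, height, chroma_format):
    """Luma plus chroma samples in one frame."""
    # Cb + Cr samples per 4 luma samples: 2 for 4:2:0, 4 for 4:2:2.
    chroma_per_4_luma = {"4:2:0": 2, "4:2:2": 4}[chroma_format]
    luma = width * height
    return luma + luma * chroma_per_4_luma // 4


uhd4k_422 = samples_per_frame(3840, 2160, "4:2:2")
uhd4k_420 = samples_per_frame(3840, 2160, "4:2:0")
uhd8k_422 = samples_per_frame(7680, 4320, "4:2:2")

chips_for_8k = uhd8k_422 // uhd4k_422  # 4K-class chips needed for one 8K frame
extra_for_422 = uhd4k_422 / uhd4k_420  # workload increase from 4:2:0 to 4:2:2
print(chips_for_8k, round(extra_for_422, 3))
```

An 8K frame holds four times the samples of a 4K frame, and 4:2:2 carries one third more samples than 4:2:0 at the same picture size.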
Page 17
Multi-chip configuration
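[Figure: multi-chip configuration. Chips #0 to #3 and the Host CPU connect through a PCIe switch. The 8K picture is split horizontally (slice or tile) into four regions #0 to #3, one per 4K chip]
• Chips exchange the reference image and the DF & SAO image near slice/tile boundaries
  – Reference picture transfer region: labeled 128 on each side of a boundary
  – DF & SAO transfer region: labeled 4 on each side of a boundary
• Streams are accumulated from chip to chip: stream of slice/tile #0, then #0+#1, then #0+#1+#2, then the concatenated stream out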
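The partitioning and boundary exchange can be sketched as follows. Several things here are assumptions rather than facts from the slide: the split is modeled as four horizontal strips aligned to 64-row CTUs, and the figure's 128 and 4 labels are read as rows on each side of a boundary. `split_rows` and `exchange_rows` are hypothetical helpers.

```python
CTU = 64  # assumed CTU height in luma rows


def split_rows(height, chips, ctu=CTU):
    """Split a picture into CTU-aligned horizontal strips, one per chip."""
    ctu_rows = -(-height // ctu)  # ceiling division
    base, extra = divmod(ctu_rows, chips)
    strips, top = [], 0
    for chip in range(chips):
        rows = (base + (1 if chip < extra else 0)) * ctu
        bottom = min(top + rows, height)
        strips.append((top, bottom))
        top = bottom
    return strips


def exchange_rows(strips, margin):
    """Row ranges on both sides of each boundary that neighbours must share."""
    regions = []
    for upper, _lower in zip(strips, strips[1:]):
        boundary = upper[1]
        regions.append({"boundary": boundary,
                        "from_upper": (boundary - margin, boundary),
                        "from_lower": (boundary, boundary + margin)})
    return regions


strips = split_rows(4320, 4)
ref_regions = exchange_rows(strips, 128)  # reference picture transfer
df_sao_regions = exchange_rows(strips, 4)  # DF & SAO transfer
print(strips)
print(ref_regions[0])
```

Sharing these margins is what lets motion estimation and loop filtering cross a split boundary even though each chip encodes only its own region.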
Page 18
Key NARA features and functions
• Complicated HEVC processing is mapped to a hierarchical pipeline scheme based on coding tree units (CTUs)
• The hierarchical pipeline achieves wide-range motion estimation with a ±3847.75 x ±1926.75 search range and optimized HEVC high-precision prediction mode decision
• 4K/60p 4:2:2 real-time encoding with ultra-low delay for field pickup units (FPUs), high bitrate of up to 600 Mbps for contribution, multi-channel encoding for cloud systems, and multi-standard encoding for smooth migration
• Ultra-high-definition TV beyond 4K is encoded with motion estimation and loop filtering across split boundaries when each chip encodes a partitioned frame
• Suitable for HEVC-based tandem encoding with 4:2:2 to preserve color information, and for two-pass encoding for higher compression in final distribution
Slide group labels: Single-chip configuration; Multi-chip configuration
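The fractional search range maps onto whole quarter-pel units, as the arithmetic below shows. It assumes motion vectors are stored in quarter-pel units with a two's-complement sign bit; `quarter_pel_units` is a hypothetical helper.

```python
def quarter_pel_units(pixels):
    """Convert a displacement in pixels to integer quarter-pel units."""
    units = pixels * 4
    if units != int(units):
        raise ValueError("not a multiple of 1/4 pixel")
    return int(units)


h_max = quarter_pel_units(3847.75)
v_max = quarter_pel_units(1926.75)

h_positions = 2 * h_max + 1  # candidates from -h_max to +h_max
v_positions = 2 * v_max + 1
bits_h = h_max.bit_length() + 1  # magnitude bits plus sign bit
bits_v = v_max.bit_length() + 1
print(h_max, v_max, bits_h, bits_v)
```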
Page 19
NARA chip implementation
[Figure: NARA chip layout. Labeled regions: Reference Picture Image Cache, Dual Coding Core, PCIe, DDR ch0, DDR ch1, DDR ch2, BSO, MUX, MME, IFE, IME, FME, WME, Bipred FME, ETHER, Audio, VIF, IIM, MED, IPD, MC, TRISC, MRISC, RISC, PRISC, H264]
Page 20
Physical features
Technology: 28nm CMOS
Number of transistors: 83M gates
Clock frequency: Max 600 MHz
Supply voltage: Core 0.9 V; IO 1.8/3.3 V; DDR3 1.5 V; PCIe and 3G-SDI 0.9/1.8 V
Power consumption: Approximately 15.0 W
Package: 1152-pin FCBGA (35 x 35 mm)
External memories: DDR3
Page 21
Functional features
Video profile:
  – H.265/HEVC: Main, Main 10, Main 4:2:2 10
  – H.264/AVC: Baseline, Main, High, High422
Motion search range: -3847.75/+3847.75 (H), -1926.75/+1926.75 (V)
Resolution and video rate:
  – Single-chip: 4096x2160 at up to 60 frames per second
  – Multi-chip: 7680x4320 at up to 60 frames per second
Others:
  – Audio: Serial I/F x 2 Port
  – Stream Out: Parallel x 1 / Serial x 4
  – PCIe: Gen.2 x 8 Lane
  – Ethernet: 1000/100/10 Mbps with MAC
  – Others: User PES input, STC input/output
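These figures imply a rough per-CTU cycle budget for the single-chip case. The estimate assumes 64x64 CTUs, the 600 MHz maximum clock from the physical features, and that each pipeline stage must accept one CTU per budget interval; it is a back-of-the-envelope sketch, not a statement about NARA's actual scheduling.

```python
def ctus_per_frame(width, height, ctu=64):
    """Number of CTUs covering a frame (partial CTUs count as whole)."""
    return -(-width // ctu) * -(-height // ctu)  # ceiling divisions


ctus = ctus_per_frame(4096, 2160)
ctus_per_second = ctus * 60  # 60 frames per second
cycles_per_ctu = 600_000_000 // ctus_per_second  # at the 600 MHz maximum
print(ctus, ctus_per_second, cycles_per_ctu)
```

A budget of a few thousand cycles per 64x64 CTU gives a sense of why dedicated hardware blocks and a deep pipeline are needed for real-time 4K encoding.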
Page 22
Target applications (1)
[Figure: Digital TV Broadcasting Network Service. Labels: Embedded CODEC, Portable Microwave link, TV Station, TV System, HDTV CODEC (x2), Satellite, Edge, Local TV Station, Contribution Transmission, Distribution Transmission]
[Figure: original picture compared with 4:2:0 and 4:2:2 results after 3 times encoding and decoding]
Page 23
Target applications (2)
[Figure: application fields for 4K/8K UHDTV technology]
• Mobile/TV Broadcast: mobile phones, consumer TVs, broadcast devices (cameras, editors, encoders)
• Advertising: digital signage, outdoor display LEDs
• Medical: medical monitors, endoscopic systems
• Industrial Design: CAD, CAM, CG (mechanical, automotive, industrial designs)
• Security/Surveillance: security (surveillance) cameras, industrial camera sensors
• Conferencing/Presentation: conferencing systems/services, projectors
• Cinema: digital cinema (projectors, screens), box office sales
• Education/Academic: museums, galleries
Market sizes labeled in the figure (*1USD = 120JPY): $68B, $192B, $21B, $17B, $7B, $7B, $20B, $0.6B
Reference: Interim Report of 4K/8K Roadmap Follow-up Meeting (MIC, Japan)
Page 24
Conclusion
• Developed a single-chip 4K 60fps 4:2:2 HEVC video encoder LSI, scalable to 8K 60fps
• 8K scalability is achieved through inter-chip connectivity and parallel processing functions
• The NARA architecture uses a hierarchical pipeline scheme based on CTUs
NARA is a key professional H.265/HEVC encoder LSI toward high-quality 4K/8K broadcast infrastructure