Overview of the H.264/AVC Video Coding Standard
T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, July 2003.
CMPT 820: Multimedia Systems
Outline
  Overview
  Network Abstraction Layer (NAL)
  Video Coding Layer (VCL)
  Profiles and Applications
  Feature Highlights
  Conclusions
Evolution of Video Compression Standards
  ITU-T: H.261 (video telephony), H.263 (video conferencing)
  MPEG: MPEG-1 (Video-CD), MPEG-4 Visual (object-based coding)
  Joint ITU-T/MPEG: H.262/MPEG-2 (digital TV/DVD), H.264/MPEG-4 AVC
H.264/AVC Coding Standard
Various applications
  Broadcast: cable, satellite, terrestrial, and DSL
  Storage: DVDs (HD DVD and Blu-ray)
  Video conferencing: over different networks
  Multimedia streaming: live and on-demand
  Multimedia Messaging Services (MMS)
Challenge: how to handle all these applications and networks
  Flexibility and customizability
Structure of H.264/AVC Codec
  Layered design
  Network Abstraction Layer (NAL): formats video and metadata for a variety of networks
  Video Coding Layer (VCL): represents video in an efficient way
Scope of H.264 standard
Network Abstraction Layer
Provides network friendliness for sending video data over various network transports, such as:
  RTP/IP for Internet applications
  MPEG-2 streams for broadcast services
  ISO file formats for storage applications
We present a few NAL concepts
NAL Units
Packets consist of video data plus a short packet header (one byte)
Support two types of transports
  Stream-oriented: no free unit boundaries, so each unit is preceded by a 3-byte start code prefix
  Packet-oriented: unit boundaries come from the transport, so a start code prefix would be a waste
Can be classified into:
  VCL units: data for video pictures
  Non-VCL units: metadata and additional info
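As a rough illustration of the one-byte header and the 3-byte start code prefix, the sketch below splits a byte stream on `00 00 01` and decodes the header fields. The field layout (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type) and the rule that types 1 to 5 are VCL units are taken from the H.264 specification, not from these slides. The function names are hypothetical, and emulation-prevention bytes and 4-byte start codes are deliberately ignored.

```python
START_CODE = b"\x00\x00\x01"  # 3-byte start code prefix used in byte streams


def split_nal_units(stream: bytes) -> list[bytes]:
    """Split a byte stream into NAL units (simplified: no emulation-prevention handling)."""
    parts = stream.split(START_CODE)
    # Anything before the first start code is not a NAL unit; drop empty pieces too.
    return [part for part in parts[1:] if part]


def parse_nal_header(first_byte: int) -> dict:
    """Decode the one-byte NAL unit header."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": first_byte & 0x1F,
    }


def is_vcl(nal_unit_type: int) -> bool:
    """Types 1 to 5 carry coded slice data; the others are non-VCL units."""
    return 1 <= nal_unit_type <= 5


# Hypothetical stream: SPS (type 7), PPS (type 8), IDR slice (type 5), one payload byte each.
stream = b"\x00\x00\x01\x67\xaa\x00\x00\x01\x68\xbb\x00\x00\x01\x65\xcc"
units = split_nal_units(stream)
types = [parse_nal_header(unit[0])["nal_unit_type"] for unit in units]
```

In a packet-oriented transport the same `parse_nal_header` step would apply directly to each packet payload, with no splitting needed.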
Non-VCL NAL Units
Two types of non-VCL NAL units
  Parameter sets: headers shared by a large number of VCL NAL units
    A VCL NAL unit has a pointer to its picture parameter set
    A picture parameter set points to its sequence parameter set
  Supplemental enhancement information (SEI): optional info for higher-quality reconstruction and/or better application usability
Sent over in-band or out-of-band channels
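The pointer chain from a VCL NAL unit to its picture parameter set and on to the sequence parameter set can be sketched with two lookup tables. All dictionary keys and values here are hypothetical and heavily simplified; real parameter sets carry many more fields.

```python
# Parameter sets received earlier (in-band or out-of-band), keyed by their IDs.
sequence_parameter_sets = {
    0: {"width_in_mbs": 120, "height_in_mbs": 68},
}
picture_parameter_sets = {
    0: {"sps_id": 0, "entropy_coding": "CAVLC"},
    1: {"sps_id": 0, "entropy_coding": "CABAC"},
}


def resolve_parameters(slice_header: dict) -> tuple[dict, dict]:
    """Follow slice -> picture parameter set -> sequence parameter set."""
    pps = picture_parameter_sets[slice_header["pps_id"]]
    sps = sequence_parameter_sets[pps["sps_id"]]
    return pps, sps


# A slice only carries a small ID; the bulky, rarely changing headers are shared.
pps, sps = resolve_parameters({"pps_id": 1})
```

Because each slice only stores an ID, many VCL NAL units can share one set of headers, and those headers can travel over a more reliable channel than the video data.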
Access Units
A set of NAL units
Decoding an access unit results in one picture
Structure:
  Delimiter: for seeking in a stream
  SEI: timing and other info
  Primary coded picture: VCL
  Redundant coded picture: for error recovery
Video Sequences and IDR Frames
Sequence: an independently decodable NAL unit stream
  Doesn't need NAL units from other sequences
  Has one sequence parameter set
  Starts with an instantaneous decoding refresh (IDR) access unit
IDR frames: random access points
  Intra-coded frames
  No picture after an IDR frame requires reference to pictures prior to the IDR frame
  Decoders mark buffered reference pictures as unusable once they see an IDR frame
Video Coding Layer (VCL)
(Like other standards) hybrid video coding: H.264/AVC represents pictures in macroblocks
  Motion compensation: exploits temporal redundancy
  Transform: exploits spatial redundancy
Small improvements add up to a huge gain
  Combining many coding tools together
Pictures/Frames/Fields
  Fields: the top/bottom field contains the even/odd rows
  Interlaced: the two fields were captured at different times
  Progressive: otherwise
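The relation between a frame and its two fields can be shown in a few lines. This is a minimal sketch that assumes a frame is stored as a list of rows; `split_fields` is a hypothetical helper name.

```python
def split_fields(frame: list[list[int]]) -> tuple[list[list[int]], list[list[int]]]:
    """Return (top_field, bottom_field): even rows and odd rows of the frame."""
    top_field = frame[0::2]      # rows 0, 2, 4, ...
    bottom_field = frame[1::2]   # rows 1, 3, 5, ...
    return top_field, bottom_field


# Toy 6-row frame in which every sample holds its own row number.
frame = [[row] * 4 for row in range(6)]
top, bottom = split_fields(frame)
```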
Macroblocks and Slices
Fixed-size MBs: 16x16 for luma, and 8x8 for chroma
Slice: a set of MBs that can be decoded without use of other slices
  I slice: intra-prediction (I-MBs)
  P slice: at most one inter-prediction signal per block (I- and P-MBs)
  B slice: up to two inter-prediction signals per block (I- and B-MBs)
  SP slice: efficient switching among streams
  SI slice: used in conjunction with SP slices
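To get a feel for the fixed 16x16 macroblock size, the following sketch counts the macroblocks needed to cover a picture. The helper name is hypothetical, and rounding up a dimension that is not a multiple of 16 is a simplifying assumption here.

```python
import math

MB_SIZE = 16  # luma samples per macroblock side


def macroblock_grid(width: int, height: int) -> tuple[int, int, int]:
    """Return (MBs per row, MB rows, total MBs) needed to cover the picture."""
    mbs_wide = math.ceil(width / MB_SIZE)
    mbs_high = math.ceil(height / MB_SIZE)
    return mbs_wide, mbs_high, mbs_wide * mbs_high


cif = macroblock_grid(352, 288)       # CIF picture
full_hd = macroblock_grid(1920, 1080)  # height is not a multiple of 16
```

For 1920x1080 the grid covers 1088 rows, so the coded picture is slightly taller than the displayed one.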
Flexible Macroblock Ordering (FMO)
  MBs in a slice: in raster order
  Slice group: more flexible
  Each slice group contains one or several slices
  Possible usages:
    Region-of-interest (ROI)
    Checkerboard for video conferencing
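The checkerboard usage can be sketched as a map from macroblock position to slice group. This is only an illustration with two groups and a hypothetical function name; the standard defines its own set of map types and syntax.

```python
def checkerboard_slice_groups(mbs_wide: int, mbs_high: int) -> list[list[int]]:
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(mbs_wide)] for y in range(mbs_high)]


group_map = checkerboard_slice_groups(4, 2)
for row in group_map:
    print(row)
```

With this layout, every macroblock's left, right, upper, and lower neighbors belong to the other group, so if one group is lost in transit the missing macroblocks can be concealed from surrounding data.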
Adaptive Field Coding
Two fields of a frame can be coded as:
  A single frame (frame mode)
  Two separate fields (field mode)
  A single frame with adaptive mode (mixed mode)
Picture-adaptive frame/field (PAFF)
  Frame/field decision is made at the frame level
  16% - 20% bit rate reduction over frame-only coding
Macroblock-adaptive frame/field (MBAFF)
  Frame/field decision is made at the MB level
  14% - 16% bit rate reduction over PAFF
[Ref] http://scien.stanford.edu/2005projects/ee398/projects/presentations/Guerrero Chan Tsang - Project Presentation - Fast Macroblock Adaptive Coding in H264.ppt
Suitable for interlaced and high-motion content
Intra-frame Prediction
In the spatial domain, samples to the left of and/or above a block are used to predict the samples in an MB
Types of intra-frame prediction:
  Intra_4x4: detailed luma blocks
  Intra_16x16: smooth luma blocks
  Chroma_8x8: similar to Intra_16x16, as chroma components are smooth
  I_PCM: bypass prediction/transform and send the samples directly; useful for anomalous pictures, lossless coding, and a predictable bit rate
Intra_4x4 Prediction
  Samples in a 4x4 block are predicted using 13 neighboring samples
  9 prediction modes: 1 DC and 8 directional
  Sample D is used if E-H are not available
Sample Intra_4x4 Prediction
  Interpolation is used in some modes
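Three of the Intra_4x4 modes are simple enough to sketch directly: vertical, horizontal, and DC. Here `above` stands for the four samples A-D over the block and `left` for the four samples I-L beside it; the function names are hypothetical, and the DC rounding `(sum + 4) >> 3` assumes all eight neighbors are available.

```python
def predict_vertical(above: list[int]) -> list[list[int]]:
    """Each column repeats the sample directly above the block."""
    return [list(above) for _ in range(4)]


def predict_horizontal(left: list[int]) -> list[list[int]]:
    """Each row repeats the sample directly to the left of the block."""
    return [[sample] * 4 for sample in left]


def predict_dc(above: list[int], left: list[int]) -> list[list[int]]:
    """Every sample is the rounded mean of the eight neighboring samples."""
    dc = (sum(above) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


above = [10, 20, 30, 40]   # samples A, B, C, D
left = [50, 60, 70, 80]    # samples I, J, K, L
vertical = predict_vertical(above)
horizontal = predict_horizontal(left)
dc_block = predict_dc(above, left)
```

An encoder would typically try each mode and keep the one whose prediction leaves the cheapest residual to code.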
Inter-Prediction Accuracy
  1/4-pixel for luma, 1/8-pixel for chroma
  Half-pixel samples: 6-tap FIR filter
    $b_1 = E - 5F + 20G + 20H - 5I + J$
    $b = \mathrm{round}(b_1 / 32)$
  Quarter-pixel samples: average of neighbors
    $a = \mathrm{round}((G + b) / 2)$
  Chroma predictions: bilinear interpolation
  [Figure: luma sample grid; uppercase letters (A-U) mark integer sample positions, lowercase letters (a-s and aa-hh) mark fractional sample positions]
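The half-sample and quarter-sample rules can be sketched with integer arithmetic. The version below assumes 8-bit samples and implements the rounding as `(b1 + 16) >> 5` and `(G + b + 1) >> 1`, which is how the Wiegand et al. paper expresses it; the function names are hypothetical.

```python
def clip_pixel(value: int) -> int:
    """Clamp to the 8-bit sample range (an assumption of this sketch)."""
    return max(0, min(255, value))


def half_sample(E: int, F: int, G: int, H: int, I: int, J: int) -> int:
    """Half-sample position b between G and H, from the 6-tap filter (1, -5, 20, 20, -5, 1)."""
    b1 = E - 5 * F + 20 * G + 20 * H - 5 * I + J
    return clip_pixel((b1 + 16) >> 5)  # divide by 32 with rounding


def quarter_sample(G: int, b: int) -> int:
    """Quarter-sample position a: rounded average of the two nearest samples."""
    return (G + b + 1) >> 1


flat = half_sample(100, 100, 100, 100, 100, 100)   # flat area stays flat
edge = half_sample(0, 0, 0, 255, 255, 255)         # step edge between G and H
quarter = quarter_sample(0, edge)
```

The filter taps sum to 32, which is why a flat region passes through unchanged.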
Multiframe Inter-Prediction in P Slices
  More than one prior reference picture can be used (by different MBs)
  Encoders/decoders buffer the same reference pictures for inter-prediction
  A reference index is used when coding MVs
  MVs for regions smaller than 8x8 use the same index for all MVs in the 8x8 region
  P_skip mode:
    Don't send residual signals, MVs, or a reference index
    Use buffered frame 0 as the reference picture
    Use neighbors' MVs
    Large areas with no change or constant motion, like slow panning, can be represented with very few bits
Multiframe Inter-Prediction in B Slices
  Weighted average of 2 predictions
  B slices can be used as reference
  Two reference picture lists are used
  One out of four prediction methods for each partition:
    list 0
    list 1
    bi-predictive
    direct prediction: inferred from prior MBs
  The MB can be coded in B_skip mode (similar to P_skip)
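The weighted average of two predictions can be sketched as follows. `bi_predict` uses equal weights with integer rounding; `weighted_bi_predict` uses floating-point weights, which is a simplification for illustration and not the fixed-point weighting syntax of the standard. Both names are hypothetical, and 8-bit samples are assumed.

```python
Block = list[list[int]]


def bi_predict(pred0: Block, pred1: Block) -> Block:
    """Equal-weight average of a list 0 and a list 1 prediction, with rounding."""
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(pred0, pred1)]


def weighted_bi_predict(pred0: Block, pred1: Block, w0: float, w1: float) -> Block:
    """Weighted average with float weights (simplified; assumes w0 + w1 == 1)."""
    return [[max(0, min(255, int(w0 * a + w1 * b + 0.5)))
             for a, b in zip(row0, row1)]
            for row0, row1 in zip(pred0, pred1)]


averaged = bi_predict([[100, 50]], [[101, 60]])
faded = weighted_bi_predict([[100]], [[200]], 0.75, 0.25)
```

Unequal weights are useful for fades, where the picture brightness drifts between the two references.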
4x4 Integer Transform
Why a smaller transform:
  Uses only adds and shifts; an exact inverse transform is possible, so there is no decoding mismatch
  Not much residual is left to code
  Less noise around edges (ringing or mosquito noise)
  Fewer computations and a shorter data type (16-bit)
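An approximation to the 4x4 DCT:
  $Y = H X H^{T}$, where $H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$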
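A minimal sketch of the forward 4x4 integer transform, using the integer matrix given in the Wiegand et al. paper. The helper names and the symbols are choices made for this example; the scaling that makes this a true DCT approximation is folded into quantization and is omitted here.

```python
# Integer core transform matrix from the H.264/AVC overview paper.
H = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m: list[list[int]]) -> list[list[int]]:
    return [list(col) for col in zip(*m)]


def forward_transform(block: list[list[int]]) -> list[list[int]]:
    """Compute H * block * H^T using integer arithmetic only."""
    return matmul(matmul(H, block), transpose(H))


flat_block = [[1] * 4 for _ in range(4)]
coefficients = forward_transform(flat_block)
```

A flat residual block puts all of its energy into the single DC coefficient, which is the behavior expected of a DCT-like transform.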
2nd Transform and Quantization Parameter
  2nd transform: Intra_16x16 and chroma modes are for smooth areas
    DC coefficients are transformed again to cover the whole MB
  The quantization step (QS) is adjusted by an exponential function of the quantization parameter (QP) to cover a wider range of QS
    QP increases by 6 => QS doubles
    QP increases by 1 => QS increases by about 12% => bit rate decreases by about 12%
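The 12% figure follows from the doubling rule: if six QP steps double the quantization step, one step multiplies it by the sixth root of two. A short check, with a hypothetical function name:

```python
def qstep_ratio(delta_qp: float) -> float:
    """Factor by which the quantization step grows when QP increases by delta_qp."""
    return 2.0 ** (delta_qp / 6.0)


one_step = qstep_ratio(1)    # about 1.12, i.e. roughly a 12% increase
six_steps = qstep_ratio(6)   # doubling
print(f"QP+1: x{one_step:.4f}, QP+6: x{six_steps:.1f}")
```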
Entropy Coding
  Syntax elements other than transform coefficients: a single infinite-extent codeword table (Exp-Golomb codes)
  Transform coefficients:
    Context-Adaptive Variable Length Coding (CAVLC)
      Several VLC tables are switched depending on previously transmitted data
      Better than a single VLC table
    Context-Adaptive Binary Arithmetic Coding (CABAC)
      More flexible symbol probability modeling than CAVLC
      5 – 15% rate reduction
      Efficient: multiplication-free
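The infinite-extent codeword table is an Exp-Golomb code. The sketch below handles the unsigned variant and works on bit strings for readability; the function names are hypothetical, and a real codec would operate on packed bits.

```python
def exp_golomb_encode(value: int) -> str:
    """Unsigned Exp-Golomb: binary of (value + 1), prefixed by (length - 1) zeros."""
    bits = bin(value + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def exp_golomb_decode(codeword: str) -> int:
    """Inverse of exp_golomb_encode for a single, complete codeword."""
    leading_zeros = len(codeword) - len(codeword.lstrip("0"))
    info = codeword[leading_zeros:2 * leading_zeros + 1]
    return int(info, 2) - 1


codewords = [exp_golomb_encode(v) for v in range(4)]
round_trip = [exp_golomb_decode(exp_golomb_encode(v)) for v in range(20)]
```

Small values get short codewords and the table never runs out, so one rule covers every syntax element without a stored table.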
In-loop Deblocking Filter
  Operates within the coding loop
    Uses filtered frames as reference frames
    Improves coding efficiency
  Adaptive deblocking; need to determine
    Blocking effects or object edges
    Strong or weak deblocking
  Intuitions
    Large difference near a block edge -> likely a blocking artifact
    If the difference is too large to be explained by the coarseness of the quantization (QP) -> likely a real edge
  E.g., filter p0 and q0 if $|p_0 - q_0| < \alpha(QP)$, $|p_1 - p_0| < \beta(QP)$, and $|q_1 - q_0| < \beta(QP)$
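The edge-versus-artifact decision can be sketched as a threshold test on the samples p1, p0 | q0, q1 straddling a block edge. The three conditions follow the Wiegand et al. paper; in the standard the thresholds alpha and beta are looked up from QP-dependent tables, whereas here they are passed in as plain arguments with hypothetical values.

```python
def should_filter_edge(p1: int, p0: int, q0: int, q1: int,
                       alpha: int, beta: int) -> bool:
    """True if the step across the edge looks like a blocking artifact.

    alpha bounds the step across the edge; beta bounds the activity on each side.
    """
    return (abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta
            and abs(q1 - q0) < beta)


ALPHA, BETA = 12, 4  # hypothetical thresholds for one QP value

artifact = should_filter_edge(60, 62, 70, 71, ALPHA, BETA)     # small step, smooth sides
real_edge = should_filter_edge(20, 22, 200, 201, ALPHA, BETA)  # step too large for this QP
```

Larger QP values map to larger thresholds, so coarser quantization permits filtering of bigger steps.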
Hypothetical Reference Decoder (HRD)
  Standard receiver buffer models
    Encoders must produce bit streams that are decodable by the HRD
  Two buffers
    Coded picture buffer (CPB): models the bit arrival and removal times
    Decoded picture buffer (DPB): models the frame decoding and output times in reference frame lists
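The idea behind the coded picture buffer can be sketched as a leaky-bucket simulation. This is a deliberately simplified model, not the HRD as specified: bits are assumed to arrive at a constant rate, each picture is removed instantaneously at a fixed frame interval, and the function and parameter names are hypothetical.

```python
def simulate_cpb(picture_sizes: list[int], bit_rate: float, frame_rate: float,
                 cpb_size: float, initial_delay: float) -> list[float]:
    """Return the buffer fullness (in bits) right after each picture is removed.

    Raises ValueError if the buffer would overflow or underflow.
    """
    fullness = bit_rate * initial_delay  # bits that arrive before the first removal
    history = []
    for size in picture_sizes:
        if fullness > cpb_size:
            raise ValueError("CPB overflow: bits arrived faster than they were removed")
        if size > fullness:
            raise ValueError("CPB underflow: picture not fully received at decode time")
        fullness -= size                   # decoder removes the whole picture at once
        history.append(fullness)
        fullness += bit_rate / frame_rate  # bits arriving until the next removal
    return history


# 1000 bit/s channel, 10 pictures/s, 1000-bit buffer, 0.5 s start-up delay.
history = simulate_cpb([300, 100, 50], bit_rate=1000, frame_rate=10,
                       cpb_size=1000, initial_delay=0.5)
```

An encoder's rate control can run a model like this alongside encoding and adjust the QP whenever the fullness drifts toward either limit.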
Profiles and Applications
  A profile defines a set of coding tools and algorithms
  Conformance points for interoperability
  3 profiles for different applications
    Baseline: video conferencing
    Main: broadcast, media storage, digital cinema
    Extended: streaming over IP (wired/wireless)
[Ref] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, "Video coding with H.264/AVC: tools, performance and complexity," IEEE Circuits and Systems Magazine, 4(1), pp. 7-28, May 2004.
Feature Highlights -- Prediction
  Variable block-size MC
  Quarter-sample-accurate MC
  MVs over picture boundaries
  Multiple reference pictures
  Weighted bi-directional prediction
  Decoupling of referencing order from display order
  Decoupling of prediction mode from reference capability (uses B frames as reference)
  Improved Skip/Direct modes
  Intra prediction in the spatial domain
  In-loop deblocking filter
Feature Highlights -- Transform
  Small block-size transform
  2-level block transform (repeated DC