Overview of the H.264/AVC Video Coding Standard. Thomas Wiegand, Gary J. Sullivan, Gisle Bjøntegaard, and Ajay Luthra. IEEE Transactions on Circuits and Systems for Video Technology.

Overview of the H.264/AVC Video Coding Standard

Thomas Wiegand, Gary J. Sullivan, Gisle Bjøntegaard, and Ajay Luthra

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 13, NO. 7, JULY 2003

Outline

Overview of the technical features of H.264/AVC

Profiles and Levels

Goals of the H.264/AVC

Video Coding Experts Group (VCEG), ITU-T SG16 Q.6: H.26L project (early 1998)

Target: double the coding efficiency in comparison to any other existing video coding standard for a broad variety of applications.

H.261, H.262 (MPEG-2), H.263 (H.263+, H.263++)

Scope of video coding standardization

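[Figure: processing chain Source -> Pre-Processing -> Encoding -> Decoding -> Post-Processing & Error Recovery -> Destination, with the part of the chain covered by the standard marked "Scope of Standard".]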

Applications of the H.264/AVC standard

Broadcast over cable, satellite, cable modem, DSL, terrestrial, etc.

Interactive or serial storage on optical and magnetic devices, DVD, etc.

Conversational services over ISDN, Ethernet, LAN, DSL, wireless and mobile networks, modems, etc. or mixtures of these.

Video-on-demand or multimedia streaming services over ISDN, cable modem, DSL, LAN, wireless networks, etc.

Multimedia messaging services (MMS) over ISDN, DSL, Ethernet, LAN, wireless and mobile networks, etc.

Structure of H.264/AVC video encoder

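[Figure: layered encoder structure. The Video Coding Layer produces coded macroblocks; Data Partitioning produces coded slices/partitions; the Network Abstraction Layer maps them to H.320, MP4FF, H.323/IP, MPEG-2, etc. Control data accompanies the layers.]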

Design feature highlights (1) — improved on prediction methods

Variable block-size motion compensation with small block sizes: a minimum luma motion compensation block size as small as 4×4.

Quarter-sample-accurate motion compensation: first found in an advanced profile of the MPEG-4 Visual (Part 2) standard, but H.264/AVC further reduces the complexity of the interpolation processing compared to the prior design.

Design feature highlights (2) 　— improved on prediction methods

Motion vectors over picture boundaries: first found as an optional feature in H.263, this technique is included in H.264/AVC.

Multiple reference picture motion compensation

Decoupling of referencing order from display order: (X)IBBPBBPBBP… => IPBBPBBPBB… Bounded by a total memory capacity imposed to ensure decoding ability. Enables removing the extra delay previously associated with bi-predictive coding.


Decoupling of picture representation methods from picture referencing capability: previously, B-frames could not be used as references for prediction. Removing this restriction allows referencing the closest pictures.

Weighted prediction: a new innovation in H.264/AVC that allows the motion-compensated prediction signal to be weighted and offset by amounts specified by the encoder. Useful for scene fading.
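As a rough illustration of weighting and offsetting a prediction signal, the following Python sketch scales reference samples and adds an offset before clipping to the 8-bit range. The function name `weighted_prediction` and the floating-point weight are assumptions made for readability; the standard itself specifies integer arithmetic that is not reproduced here.

```python
def weighted_prediction(ref_samples, weight, offset):
    """Scale each motion-compensated reference sample and add an offset.

    Simplified, non-normative sketch: a real codec uses integer weights
    with a shift instead of a floating-point multiply.
    """
    predicted = []
    for sample in ref_samples:
        value = round(weight * sample + offset)
        predicted.append(max(0, min(255, value)))  # clip to 8-bit range
    return predicted


# A fade to black: the current picture is about half as bright as the reference.
print(weighted_prediction([100, 200, 40], weight=0.5, offset=0))
```

With an ordinary (unweighted) prediction, a fade would leave a large residual in every block; a single weight and offset chosen by the encoder can absorb most of it.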

Design feature highlights (4)　 — improved on prediction methods

Improved “skipped” and “direct” motion inference: inferring motion in “skipped” areas (useful for global motion), and an enhanced motion inference method for “direct” areas.


Directional spatial prediction for intra coding: allows prediction from neighboring areas that were not coded using intra coding, something not enabled when using the transform-domain prediction method found in H.263+ and MPEG-4 Visual.


In-the-loop deblocking filtering: building further on a concept from an optional feature of H.263+, the deblocking filter in the H.264/AVC design is brought within the motion-compensated prediction loop.

Design feature highlights (7) 　 — other parts

Small block-size transform: the new H.264/AVC design is based primarily on a 4×4 transform. This allows the encoder to represent signals in a more locally-adaptive fashion, which reduces artifacts known colloquially as “ringing”.


Hierarchical block transform: a hierarchical transform extends the effective block size used for low-frequency chroma information to an 8×8 array.

The encoder can also select a special coding type for intra coding, which extends the length of the luma transform for low-frequency information to a 16×16 block size.


Short word-length transform: while previous designs have generally required 32-bit processing, the H.264/AVC design requires only 16-bit arithmetic.

Exact-match inverse transform: building on a path laid out as an optional feature in the H.263++ effort, H.264/AVC is the first standard to achieve exact equality of decoded video content from all decoders. This is made possible by the integer transform.


Arithmetic entropy coding: while arithmetic coding was previously found as an optional feature of H.263, a more effective use of this technique is found in H.264/AVC to create a very powerful entropy coding method known as CABAC (context-adaptive binary arithmetic coding).


Context-adaptive entropy coding: CAVLC (context-adaptive variable-length coding) and CABAC (context-adaptive binary arithmetic coding).

Design feature highlights (12) — Robustness to data errors/losses and flexibility for operation over a variety of network environments

Parameter set structure: the parameter set design provides for robust and efficient conveyance of header information.

NAL unit syntax structure: each syntax structure in H.264/AVC is placed into a logical data packet called a NAL unit.


Flexible slice size: unlike the rigid slice structure found in MPEG-2 (which reduces coding efficiency by increasing the quantity of header data and decreasing the effectiveness of prediction), slice sizes in H.264/AVC are highly flexible, as was the case earlier in MPEG-1.


Flexible macroblock ordering (FMO): significantly enhances robustness to data losses by managing the spatial relationship between the regions that are coded in each slice.

Arbitrary slice ordering (ASO): sending and receiving the slices of the picture in any order relative to each other. First found in an optional part of H.263+, this can improve end-to-end delay in real-time applications, particularly when used on networks having out-of-order delivery behavior.


Redundant pictures: a new ability that allows an encoder to send redundant representations of regions of pictures, enhancing robustness to data loss.


Data partitioning: allows the syntax of each slice to be separated into up to three different partitions for transmission, depending on a categorization of syntax elements.

This part of the design builds further on a path taken in MPEG-4 Visual and in an optional part of H.263++.

The design is simplified by having a single syntax, with partitioning of that same syntax controlled by a specified categorization of syntax elements.


SP/SI synchronization/switching pictures: a new feature consisting of picture types that allow exact synchronization of the decoding process of some decoders with an ongoing video stream produced by other decoders, without penalizing all decoders with the loss of efficiency resulting from sending an I picture.

These picture types enable switching a decoder between different data rates, recovery from data losses or errors, and trick modes such as fast-forward, fast-reverse, etc.

NAL (Network Abstraction Layer)

[Figure: the layered encoder structure again (Video Coding Layer, Data Partitioning, Network Abstraction Layer, control data; outputs to H.320, MP4FF, H.323/IP, MPEG-2, etc.), here introducing the Network Abstraction Layer.]

NAL (Network Abstraction Layer)

Designed in order to provide “network friendliness”.

Facilitates the ability to map H.264/AVC VCL data to transport layers such as:

RTP/IP for any kind of real-time wire-line and wireless Internet services (conversational and streaming);

File formats, e.g., ISO MP4 for storage and MMS;

H.32X for wireline and wireless conversational services;

MPEG-2 systems for broadcasting services, etc.

Key concepts of NAL

NAL units

Byte-stream and packet format uses of NAL units

Parameter sets

Access units

NAL units

[Figure: a NAL unit consists of a 1-byte header followed by a payload.]

Each NAL unit contains an integer number of bytes.

The payload is interleaved as necessary with emulation prevention bytes, which are bytes inserted with a specific value to prevent a particular pattern of data called a start code prefix from being accidentally generated inside the payload.

The NAL unit structure definition specifies a generic format for use in both packet-oriented and bitstream-oriented transport systems, and a series of NAL units generated by an encoder is referred to as a NAL unit stream.
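To make the emulation prevention idea concrete, here is a minimal Python sketch. It assumes the convention used by the H.264 specification, which is not spelled out in these slides: the inserted byte has the value 0x03 and is placed after any two consecutive zero bytes that would otherwise be followed by a byte of value 0x03 or less. The function names are hypothetical.

```python
def add_emulation_prevention(payload):
    """Insert 0x03 so that 00 00 00/01/02/03 never appears in the output."""
    out = bytearray()
    zero_run = 0  # consecutive zero bytes already written
    for byte in payload:
        if zero_run >= 2 and byte <= 0x03:
            out.append(0x03)  # emulation prevention byte
            zero_run = 0
        out.append(byte)
        zero_run = zero_run + 1 if byte == 0 else 0
    return bytes(out)


def remove_emulation_prevention(data):
    """Decoder side: drop each 0x03 that follows two zero bytes."""
    out = bytearray()
    zero_run = 0
    for byte in data:
        if zero_run >= 2 and byte == 0x03:
            zero_run = 0
            continue  # skip the inserted byte
        out.append(byte)
        zero_run = zero_run + 1 if byte == 0 else 0
    return bytes(out)


raw = bytes([0x25, 0x00, 0x00, 0x01, 0x7F])
escaped = add_emulation_prevention(raw)
print(escaped.hex())  # the start code prefix pattern no longer occurs
print(remove_emulation_prevention(escaped) == raw)
```

The round trip is lossless, so the decoder recovers exactly the payload the encoder produced.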

NAL units in byte-stream format use

E.g., H.320 and MPEG-2/H.222.0 systems require delivery of the entire or partial NAL unit stream as an ordered stream of bytes or bits.

Each NAL unit is prefixed by a specific pattern of three bytes called a start code prefix.

[Figure: byte stream in which each payload is preceded by a start code prefix.]
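A byte-stream receiver can locate NAL unit boundaries by searching for the start code prefix. The sketch below assumes the three-byte prefix value 0x000001 from the H.264 specification (the slides only say "three bytes") and assumes that any zero bytes directly before a prefix are padding rather than NAL unit data. The function name `split_byte_stream` is hypothetical.

```python
START_CODE_PREFIX = b"\x00\x00\x01"  # assumed value, per the H.264 byte-stream format


def split_byte_stream(stream):
    """Return the NAL units found between start code prefixes."""
    pieces = stream.split(START_CODE_PREFIX)
    units = []
    for piece in pieces[1:]:  # anything before the first prefix is ignored
        unit = piece.rstrip(b"\x00")  # assumed padding before the next prefix
        if unit:
            units.append(unit)
    return units


stream = b"\x00\x00\x01\x67\xAA" + b"\x00\x00\x00\x01\x68\xBB"
for unit in split_byte_stream(stream):
    print(unit.hex())
```

This only works because emulation prevention guarantees that the prefix pattern never occurs inside a payload.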

NAL units in packet-transport system use

E.g., Internet protocol/RTP systems.

The inclusion of start code prefixes in the data would be a waste of data carrying capacity, so instead the NAL units can be carried in data packets without start code prefixes.

[Figure: packets carrying payloads without start code prefixes.]

VCL and non-VCL NAL units

VCL NAL units: the data that represents the values of the samples in the video pictures.

Non-VCL NAL units: any associated additional information such as parameter sets (important header data that can apply to a large number of VCL NAL units) and supplemental enhancement information (timing information and other supplemental data that may enhance usability of the decoded video signal but are not necessary for decoding the values of the samples in the video pictures).

Parameter Sets (1)

A parameter set is supposed to contain information that is expected to rarely change and that applies to the decoding of a large number of VCL NAL units.

Parameter Sets (2)

Two types of parameter sets:

Sequence parameter sets: apply to a series of consecutive coded video pictures called a coded video sequence.

Picture parameter sets: apply to the decoding of one or more individual pictures within a coded video sequence.

Parameter Sets (3) The Structure

[Figure: a VCL NAL unit carries an identifier to a picture parameter set; the picture parameter set carries an identifier to a sequence parameter set. Both parameter sets are non-VCL NAL units.]
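The two-level referencing between VCL NAL units, picture parameter sets, and sequence parameter sets can be sketched as a pair of table lookups. Everything named below (the dictionaries, the field names `pps_id` and `sps_id`, and the example values) is hypothetical and chosen only to illustrate the indirection.

```python
# Parameter sets received earlier, in-band or out-of-band (hypothetical contents).
sequence_parameter_sets = {0: {"video_format": "PAL"}}
picture_parameter_sets = {3: {"sps_id": 0, "entropy_coding": "CABAC"}}


def resolve_parameters(slice_header):
    """Follow slice -> picture parameter set -> sequence parameter set."""
    pps = picture_parameter_sets[slice_header["pps_id"]]
    sps = sequence_parameter_sets[pps["sps_id"]]
    settings = dict(sps)  # sequence-level settings first
    settings.update(pps)  # then picture-level settings
    return settings


# A slice only carries a small identifier, not the header data itself.
print(resolve_parameters({"pps_id": 3}))
```

Because each slice carries just an identifier, the rarely changing header data can be sent once, possibly over a more reliable channel, and reused by many VCL NAL units.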

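Parameter Sets (4) Transmission

[Figure: two transmission options. In-band: non-VCL NAL units (parameter sets) travel in the same channel as the VCL NAL units. Out of band: the non-VCL NAL units travel separately from the VCL NAL units.]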

Parameter set use with reliable “out-of-band” parameter set exchange

[Figure: an H.264/AVC encoder and an H.264/AVC decoder each hold parameter sets 1, 2, and 3, kept in sync through a reliable parameter set exchange. A NAL unit with VCL data is encoded with PS#3, which is addressed in the slice header. Parameter Set #3 contains, e.g., video format PAL, entropy code CABAC, …]

Access Units

A set of NAL units in a specified form is referred to as an access unit.

[Figure: structure of an access unit from start to end: access unit delimiter, SEI (supplemental enhancement information), primary coded picture, redundant coded picture, end of sequence, end of stream. The coded pictures consist of VCL NAL units, i.e., slices or slice data partitions.]

Coded Video Sequences

A coded video sequence consists of a series of access units that are sequential in the NAL unit stream and use only one sequence parameter set.

Each coded video sequence can be decoded independently.

It starts with an instantaneous decoding refresh (IDR) picture, which is an intra picture.

A NAL unit stream may contain one or more coded video sequences.

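VCL (Video Coding Layer)

[Figure: encoder block diagram. Input video is split into 16×16 macroblocks; blocks shown are Motion Estimation, Frame Memory, Motion Compensation, Intra-Prediction, intra/inter selection, DCT, Q, IQ, IDCT, Clipping, De-blocking Filter, and VLC producing the output bitstream. The embedded decoder path produces the output video.]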

YCbCr Color Space and 4:2:0 Sampling

Pictures, Frames, and Fields

[Figure: a progressive frame compared with an interlaced frame (top field first), which consists of a top field and a bottom field separated in time by ∆t.]

Slices and Slice Groups (1)

[Figure: subdivision of a picture into slices (Slice #0, Slice #1, Slice #2) when not using FMO (Flexible Macroblock Ordering).]

Slices and Slice Groups (2)

[Figure: subdivision of a QCIF frame into slices utilizing FMO. Two examples are shown: one with Slice Groups #0, #1, and #2, and one with Slice Groups #0 and #1.]

Slice coding types

I slice

P slice

B slice

SP slice: switching P slice, with which efficient switching between different pre-coded pictures becomes possible.

SI slice: switching I slice, allowing an exact match of a macroblock in an SP slice for random access and error recovery purposes.

Adaptive Frame/Field Coding Operation

Three modes can be chosen adaptively for each frame in a sequence:

Frame mode

Field mode

Frame mode / field coded: for a frame that consists of mixed moving regions, the frame/field encoding decision can be made for each vertical pair of macroblocks (a 16×32 luma region) in the frame.

Macroblock-adaptive frame/field (MBAFF)

Picture-adaptive frame/field (PAFF): 16% to 20% bit-rate savings over frame-only coding for ITU-R 601 sequences such as “Canoa”, “Rugby”, etc.

Macroblock-adaptive frame/field (MBAFF)

[Figure: a pair of macroblocks in frame mode compared with top/bottom macroblocks in field mode.]

PAFF vs MBAFF

The main idea of MBAFF is to preserve as much spatial consistency as possible.

In MBAFF, one field cannot use the macroblocks in the other field of the same frame as a reference for motion prediction.

PAFF coding can be more efficient than MBAFF coding in the case of rapid global motion, scene change, or intra picture refresh.

MBAFF was reported to reduce bit rates 14 ~ 16% over PAFF for ITU-R 601 (Mobile and Calendar, MPEG-4 World News)

Intra-Frame Prediction (1)

Intra_4×4: well suited for coding parts of a picture with significant detail.

Intra_16×16 together with chroma prediction: more suited for coding very smooth areas of a picture. 4 prediction modes.

I_PCM: bypass prediction and transform coding and send the values of the encoded samples directly.


In H.263+ and MPEG-4 Visual: intra prediction is conducted in the transform domain.

In H.264/AVC: intra prediction is always conducted in the spatial domain.

Intra prediction across slice boundaries is not allowed.

Inter-Frame Prediction in P slices (1): segmentations of the macroblock

[Figure: macroblock types (16×16, 16×8, 8×16, 8×8) and 8×8 sub-macroblock types (8×8, 8×4, 4×8, 4×4). Source: www.vcodex.com, H.264 / MPEG-4 Part 10: Inter Prediction.]

*P_Skip

Inter-Frame Prediction in P slices (2): the accuracy of motion compensation

[Figure: sample grid for fractional-sample interpolation. Upper-case letters (A to U) mark integer-sample positions; lower-case letters (a to s, aa to hh) mark half- and quarter-sample positions around G.]

b1 = E - 5F + 20G + 20H - 5I + J

h1 = A - 5C + 20G + 20M - 5R + T

b = (b1 + 16) >> 5, clipped to 0~255

h = (h1 + 16) >> 5, clipped to 0~255

j1 = cc - 5dd + 20h1 + 20m1 - 5ee + ff

j = (j1 + 512) >> 10, clipped to 0~255

a = (G + b + 1) >> 1

e = (b + h + 1) >> 1
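The interpolation formulas for this slide (b, h, j, a, e) translate directly into code. The following Python sketch applies the 6-tap filter (1, -5, 20, 20, -5, 1) for half-sample positions and rounding averages for quarter-sample positions; the function names are hypothetical, and the sample values are passed in explicitly rather than read from a picture.

```python
def clip255(value):
    """Clip to the 8-bit sample range 0..255."""
    return max(0, min(255, value))


def six_tap(s0, s1, s2, s3, s4, s5):
    """Unscaled 6-tap filter output, e.g. b1 = E - 5F + 20G + 20H - 5I + J."""
    return s0 - 5 * s1 + 20 * s2 + 20 * s3 - 5 * s4 + s5


def half_sample(s0, s1, s2, s3, s4, s5):
    """Half-sample value such as b or h: (b1 + 16) >> 5, then clipped."""
    return clip255((six_tap(s0, s1, s2, s3, s4, s5) + 16) >> 5)


def center_half_sample(intermediates):
    """Position j: filter six unscaled intermediates, (j1 + 512) >> 10."""
    return clip255((six_tap(*intermediates) + 512) >> 10)


def quarter_sample(first, second):
    """Quarter-sample value such as a = (G + b + 1) >> 1."""
    return (first + second + 1) >> 1


# Integer samples E..J along one row, with an edge between G and H.
E, F, G, H, I, J = 10, 10, 10, 200, 200, 200
b = half_sample(E, F, G, H, I, J)
a = quarter_sample(G, b)
print(b, a)
```

Position j is computed from the unscaled intermediate values (cc, dd, h1, m1, ee, ff), which is why its rounding uses 512 and a shift by 10 instead of 16 and a shift by 5.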

Inter-Frame Prediction in P slices (3): multiframe motion-compensated prediction

[Figure: the current picture is predicted from 4 prior decoded pictures used as references; the reference pictures shown are labeled ∆=1, ∆=2, and ∆=4.]

Inter-Frame Prediction in B slices

Other pictures can reference pictures containing B slices

Weighted average of two distinct motion-compensated predictions

Utilizing two distinct lists of reference pictures (list0, list1)

4 prediction types: list0, list1, bi-predictive, and direct prediction; in addition, a B_Skip mode

For each partition, the prediction type can be chosen separately.

Transform, Scaling, and Quantization (1)

4×4 DCT: integer transform matrix

H =

[ 1  1  1  1 ]

[ 2  1 -1 -2 ]

[ 1 -1 -1  1 ]

[ 1 -2  2 -1 ]
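As a small worked example of the integer transform matrix H = [1 1 1 1; 2 1 -1 -2; 1 -1 -1 1; 1 -2 2 -1], the sketch below applies it to a 4×4 block as Y = H · X · Hᵀ using integer arithmetic only. The helper names are hypothetical, and the scaling that a real codec combines with quantization is deliberately left out.

```python
H = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_transform(block):
    """Core 4x4 integer transform Y = H * X * H^T (no scaling applied)."""
    return matmul(matmul(H, block), transpose(H))


# A perfectly flat block puts all of its energy into the DC coefficient.
flat = [[10] * 4 for _ in range(4)]
for row in forward_transform(flat):
    print(row)
```

Since every entry of H is a small integer, the result is exact, which is what allows every decoder to reproduce identical output.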


Repeated transforms

Intra_16×16 and the chroma intra modes are intended for coding smooth areas.

The DC coefficients undergo a second transform, with the result that we have transform coefficients covering the whole macroblock.

Repeat transform for chroma blocks

[Figure: the four 4×4 chroma blocks numbered 0, 1, 2, 3 and their DC coefficients indexed 00, 01, 10, 11. The indices correspond to the indices of the 2×2 inverse Hadamard transform.]


The quantization parameter can take 52 values.

An increase of 1 in the quantization parameter means an increase of the quantization step size by approximately 12% (an increase of 6 means an increase of the quantization step size by exactly a factor of 2).

A change of step size by approximately 12% also means, roughly, a reduction of bit rate by approximately 12%.
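The 12% figure follows from the doubling rule: if six steps of the quantization parameter double the step size, one step multiplies it by the sixth root of 2. The short Python sketch below checks that arithmetic; `step_size_ratio` is a hypothetical helper that models only the ratio between step sizes, not the actual step size table.

```python
def step_size_ratio(qp_increase):
    """Ratio between quantization step sizes when QP rises by qp_increase."""
    return 2 ** (qp_increase / 6)


print(round(step_size_ratio(1), 4))  # about 1.12, i.e. roughly 12% larger
print(step_size_ratio(6))            # a factor of 2
print(step_size_ratio(51))           # span of the whole range of 52 values
```

Over the full range of 52 values (QP 0 to 51), the step size therefore spans a factor of about 2^8.5, or roughly 360.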


Scanning order: zig-zag scan.

For the 2×2 DC coefficients of the chroma component: raster-scan order.

All inverse transform operations in H.264/AVC can be implemented using only additions and bit-shifting operations of 16-bit integer values.

Only 16-bit memory accesses are needed for a good implementation of the forward transform and quantization process in the encoder.
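For readers unfamiliar with the zig-zag scan, the sketch below generates the usual zig-zag order for a 4×4 block by walking the anti-diagonals in alternating directions, and then uses it to flatten a block of quantized coefficients. The function names are hypothetical, and only the frame-coded 4×4 zig-zag pattern is shown.

```python
def zigzag_order(size=4):
    """Return (row, col) positions of a size x size block in zig-zag order."""
    positions = [(r, c) for r in range(size) for c in range(size)]

    def key(pos):
        r, c = pos
        diagonal = r + c
        # Odd diagonals run top-right to bottom-left, even ones the other way.
        return (diagonal, r if diagonal % 2 else -r)

    return sorted(positions, key=key)


def scan(block):
    """Flatten a 4x4 block of coefficients into zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Raster indices 0..15 make the visiting order easy to read.
raster = [[4 * r + c for c in range(4)] for r in range(4)]
print(scan(raster))
```

The scan visits low-frequency coefficients first, so the nonzero values tend to cluster at the start of the list and the trailing zeros can be coded compactly.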

Entropy Coding

Two methods of entropy coding are supported.

An exp-Golomb code: a single infinite-extent codeword table for all syntax elements.

For transmitting the quantized transform coefficients: Context-Adaptive Variable Length Coding (CAVLC).
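The "single infinite-extent codeword table" of an exp-Golomb code does not need to be stored, because every codeword can be computed. The sketch below implements the unsigned order-0 exp-Golomb code on strings of '0' and '1' characters for readability; the function names are hypothetical, and a real codec would operate on packed bits.

```python
def exp_golomb_encode(code_num):
    """Codeword = N leading zeros followed by the (N+1)-bit binary of code_num + 1."""
    binary = bin(code_num + 1)[2:]
    return "0" * (len(binary) - 1) + binary


def exp_golomb_decode(bits):
    """Decode one codeword from the front of bits; return (code_num, remaining bits)."""
    leading_zeros = len(bits) - len(bits.lstrip("0"))
    end = 2 * leading_zeros + 1
    code_num = int(bits[leading_zeros:end], 2) - 1
    return code_num, bits[end:]


for n in range(6):
    print(n, exp_golomb_encode(n))
```

Small values get short codewords ("1" for 0, "010" for 1, "011" for 2), and the code extends to arbitrarily large values without any table.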

CAVLC (1)

The number of nonzero quantized coefficients (N) and the actual size and position of the coefficients are coded separately

7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0 ,0 ,0.

1) Number of Nonzero Coefficients (N) and “Trailing 1s”

T1s = 2, N=5,

These two values are coded as a combined event. One out of 4 VLC tables is used based on the number of coefficients in neighboring blocks.

CAVLC (2)

7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0 ,0 ,0.

2) Encoding the Value of Coefficients

For T1s, only the sign needs to be coded. Coefficient values are coded in reverse order: -2, 6, …

A starting VLC is used for -2, and a new VLC may be used based on the just-coded coefficient. In this way, adaptation is obtained in the use of VLC tables. Six exp-Golomb code tables are available for this adaptation.

CAVLC (3)

7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0 ,0 ,0.

3) Sign Information

For T1s, this is sent as single bits. For the other coefficients, the sign bit is included in the exp-Golomb codes

CAVLC (4)

4) TotalZeroes: the number of zeros between the last nonzero coefficient of the scan and its start. TotalZeroes = 3.

With N = 5, the number must be in the range 0-11. 15 tables are available for N in the range 1-15. (If N = 16 there is no zero coefficient.)

7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0.

5) RunBefore: in this example it must be specified how the 3 zeros are distributed. The number of 0s before the last coefficient is coded first: 2, with range 0-3, for which a suitable VLC is used. Then, for the next coefficient: 1, with range 0-1.
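The quantities in the CAVLC walkthrough (N, T1s, TotalZeroes, and the RunBefore values with their ranges) can be reproduced with a few lines of Python. The sketch only derives the symbols to be coded, not the VLC codewords; the function name `cavlc_symbols` is hypothetical, and the cap of three trailing ones comes from the H.264 specification rather than from the slides.

```python
def cavlc_symbols(coeffs):
    """Derive N, T1s, TotalZeroes and (run_before, zeros_left) pairs."""
    nonzero_positions = [i for i, c in enumerate(coeffs) if c != 0]
    n = len(nonzero_positions)

    # Trailing ones: +/-1 values at the end of the scan (at most 3, assumed).
    t1s = 0
    for pos in reversed(nonzero_positions):
        if abs(coeffs[pos]) == 1 and t1s < 3:
            t1s += 1
        else:
            break

    # Zeros located before the last nonzero coefficient.
    total_zeros = nonzero_positions[-1] + 1 - n if n else 0

    # Walk backwards, recording each run of zeros and the zeros still unplaced.
    runs = []
    zeros_left = total_zeros
    for k in range(n - 1, 0, -1):
        if zeros_left == 0:
            break  # nothing left to signal
        run = nonzero_positions[k] - nonzero_positions[k - 1] - 1
        runs.append((run, zeros_left))
        zeros_left -= run
    return n, t1s, total_zeros, runs


example = [7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(cavlc_symbols(example))
```

Each pair in the last element gives a run of zeros and the number of zeros still unplaced when it is coded, which matches the ranges 0-3 and 0-1 quoted for this example.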

CAVLC vs CABAC

The efficiency of entropy coding can be improved further if Context-Adaptive Binary Arithmetic Coding (CABAC) is used.

Compared to CAVLC, CABAC typically provides a reduction in bit rate between 5%~15%.

The highest gains are typically obtained when coding interlaced TV signals.

In-Loop Deblocking filter

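[Figure: samples p2, p1, p0 on one side and q0, q1, q2 on the other side of a 4×4 block edge.]

When to apply the deblocking filter

For p0 and q0, all of the following must hold:

1. |p0 - q0| < α(QP)

2. |p1 - p0| < β(QP)

3. |q1 - q0| < β(QP)

For p1 and q1: |p2 - p0| < β(QP) or |q2 - q0| < β(QP)

*The filter typically reduces the bit rate by 5%~10%.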
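The filtering conditions can be expressed as a small decision function. In the sketch below, `alpha` and `beta` stand for the thresholds α(QP) and β(QP), whose tables are not given in these slides, so they are passed in as plain numbers. The function name is hypothetical, and the second condition is read per side (p1 depends on p2, q1 depends on q2), which is an interpretation of the slide rather than a quotation of the standard.

```python
def deblocking_decisions(p, q, alpha, beta):
    """Decide which samples across a block edge should be filtered.

    p = (p0, p1, p2) and q = (q0, q1, q2), ordered outward from the edge.
    Returns (filter_p0_q0, filter_p1, filter_q1).
    """
    p0, p1, p2 = p
    q0, q1, q2 = q
    filter_edge = (abs(p0 - q0) < alpha
                   and abs(p1 - p0) < beta
                   and abs(q1 - q0) < beta)
    filter_p1 = filter_edge and abs(p2 - p0) < beta
    filter_q1 = filter_edge and abs(q2 - q0) < beta
    return filter_edge, filter_p1, filter_q1


# A small step across the edge looks like a blocking artifact: filter it.
print(deblocking_decisions((100, 101, 102), (110, 111, 112), alpha=15, beta=3))
# A large step is more likely a real edge in the picture: leave it alone.
print(deblocking_decisions((50, 50, 50), (200, 200, 200), alpha=15, beta=3))
```

The thresholds are what make the filter adaptive: small discontinuities are smoothed, while large ones are preserved as picture content.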

Hypothetical Reference Decoder

In H.264/AVC, the HRD specifies the operation of two buffers:

The coded picture buffer (CPB): models the arrival and removal time of the coded bits.

The decoded picture buffer (DPB).

Similar in spirit to what MPEG-2 had, but more flexible in supporting a variety of bit rates without excessive delay.

Profiles and Levels

Three profiles: Baseline, Main, and Extended.

Baseline supports all features in H.264/AVC except:

Set 1: B slices, weighted prediction, CABAC, field coding, and picture or macroblock adaptive switching between frame and field coding.

Set 2: SP/SI slices, and slice data partitioning.

H.264/AVC Profiles

[Figure: diagram relating the Baseline, Main, and Extended profiles to the feature groups Set 1, Set 2, CAVLC, and FMO, ASO, redundant pictures.]

Conclusions

Some important differences relative to prior standards:

Enhanced motion-prediction capability

Use of a small block-size exact-match transform

Adaptive in-loop deblocking filter

Enhanced entropy coding methods

When used well together, these features provide approximately 50% bit rate savings for equivalent perceptual quality relative to the performance of prior standards.
