Dec 20, 2015
Electrical Engineering
National Central University
Video-Audio Processing Laboratory
Overview of H.264/AVC
2003.9.x
M.K.Tsai
Outline
- Abstract
- Applications
- Network Abstraction Layer (NAL)
- Conclusion (I)
- Design feature highlight
- Conclusion (II)
- Video Coding Layer (VCL)
- Profile and potential application
- Conclusion (III)
Abstract
H.264/AVC is the newest video coding standard. Its main goals have been enhanced compression performance and provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "nonconversational" (storage, broadcast, or streaming) applications.
H.264/AVC has achieved a significant improvement in rate-distortion efficiency.
The scope of standardization is illustrated below.
Applications
- Broadcast over cable, cable modem, …
- Interactive or serial storage on optical and DVD, …
- Conversational services over LAN, modem, …
- Video-on-demand or streaming services over ISDN, wireless networks, …
- Multimedia messaging services (MMS) over DSL, mobile networks, …
How can such a variety of applications and networks be handled?
To address this need for flexibility and customizability, the H.264/AVC design consists of a video coding layer (VCL) and a network abstraction layer (NAL). The structure of the H.264/AVC encoder is shown below.
- VCL (video coding layer): designed to efficiently represent the video content.
- NAL (network abstraction layer): formats the VCL representation of the video and provides header information in a manner appropriate for conveyance by a variety of transport layers or storage media.
Network Abstraction Layer
The NAL is designed to provide "network friendliness", enabling simple and effective customization of the use of the VCL. It facilitates mapping H.264/AVC data to transport layers such as:
- RTP/IP for all kinds of real-time Internet services
- File formats, e.g., ISO MP4 for storage
- H.32X for conversational services
- MPEG-2 systems for broadcasting services
The design of the NAL anticipates a variety of such mappings.
Some key concepts of the NAL are NAL units, byte-stream and packet-format uses of NAL units, parameter sets, and access units.
NAL units
A NAL unit is a packet that contains an integer number of bytes.
- The first byte is a header byte containing an indication of the type of data in the NAL unit.
- The remaining bytes contain payload data of the type indicated by the header.
- The payload data is interleaved as necessary with emulation prevention bytes, which prevent a start code prefix from being accidentally generated inside the payload.
The NAL unit structure specifies a generic format for use in both packet-oriented and bitstream-oriented transport systems.
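The header byte and the emulation prevention rule can be sketched in a few lines of Python. The field names (forbidden_zero_bit, nal_ref_idc, nal_unit_type) follow the standard's syntax, but the function names are hypothetical, and this sketch ignores the special case of a payload whose last byte is zero.

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the one-byte NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,  # must be 0 in a valid stream
        "nal_ref_idc": (first_byte >> 5) & 0x03,         # 0 = not used for reference
        "nal_unit_type": first_byte & 0x1F,              # type of payload data
    }


def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 wherever two zero bytes would be followed by 0x00..0x03."""
    out = bytearray()
    zeros = 0  # number of consecutive 0x00 bytes just written
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


def remove_emulation_prevention(payload: bytes) -> bytes:
    """Inverse step performed by a decoder before parsing the payload."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # drop the inserted byte and restart the zero count
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)
```

For example, the header byte 0x67 decodes to nal_ref_idc 3 and nal_unit_type 7, and the payload bytes 00 00 01 are transmitted as 00 00 03 01, so the start code pattern never appears inside a NAL unit.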
NAL units in byte-stream format
- Each NAL unit is prefixed by a unique start code prefix (0x000001) to identify the boundary between NAL units.
- Some systems (e.g., H.320 and MPEG-2/H.222.0 systems) require delivery of the NAL unit stream as an ordered stream of bytes.
NAL units in packet-transport systems
- The coded data is carried in packets framed by the system transport protocol.
- NAL units can therefore be carried by data packets without start code prefixes.
- In such systems, including start code prefixes in the data would waste data-carrying capacity.
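The two uses of NAL units can be contrasted with a small Python sketch that converts between a byte stream and a list of packet-style NAL units. The function names are hypothetical. The sketch relies on emulation prevention guaranteeing that 0x000001 never occurs inside a NAL unit, and on the rule that a NAL unit never ends in a 0x00 byte.

```python
from typing import List

START_CODE = b"\x00\x00\x01"


def split_byte_stream(stream: bytes) -> List[bytes]:
    """Byte stream -> NAL units, cutting at each 0x000001 start code prefix."""
    starts = []
    pos = stream.find(START_CODE)
    while pos != -1:
        starts.append(pos + len(START_CODE))  # first byte after the prefix
        pos = stream.find(START_CODE, pos + len(START_CODE))
    units = []
    for k, start in enumerate(starts):
        end = starts[k + 1] - len(START_CODE) if k + 1 < len(starts) else len(stream)
        # Trailing zeros belong to the framing (e.g. the extra zero byte of a
        # four-byte start code), never to the NAL unit itself.
        units.append(stream[start:end].rstrip(b"\x00"))
    return units


def to_byte_stream(units: List[bytes]) -> bytes:
    """NAL units -> byte stream, prefixing each unit with a four-byte start code."""
    return b"".join(b"\x00" + START_CODE + unit for unit in units)
```

In a packet-oriented system such as RTP/IP, each element of the returned list would be carried in its own packet payload with no start code at all.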
VCL and non-VCL NAL units
- VCL NAL units contain the data that represents the values of the samples in the video pictures.
- Non-VCL NAL units contain associated additional information such as parameter sets and supplemental enhancement information (SEI).
  - Parameter sets: important header data that can apply to a large number of VCL NAL units.
  - SEI: timing information and other supplemental data that may enhance the usability of the decoded video signal but are not necessary for decoding the values of the samples in the pictures.
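The VCL/non-VCL distinction is carried entirely by the nal_unit_type field, so a receiver can classify NAL units without parsing their payloads. The table below lists the type values of the original edition of the standard (later amendments add more); the helper names are hypothetical.

```python
NAL_UNIT_TYPE_NAMES = {
    1: "coded slice of a non-IDR picture",
    2: "coded slice data partition A",
    3: "coded slice data partition B",
    4: "coded slice data partition C",
    5: "coded slice of an IDR picture",
    6: "supplemental enhancement information (SEI)",
    7: "sequence parameter set",
    8: "picture parameter set",
    9: "access unit delimiter",
    10: "end of sequence",
    11: "end of stream",
    12: "filler data",
}


def is_vcl(nal_unit_type: int) -> bool:
    """Types 1-5 carry coded slice data (VCL); the others are non-VCL."""
    return 1 <= nal_unit_type <= 5


def describe(first_byte: int) -> str:
    nal_unit_type = first_byte & 0x1F  # low five bits of the header byte
    kind = "VCL" if is_vcl(nal_unit_type) else "non-VCL"
    name = NAL_UNIT_TYPE_NAMES.get(nal_unit_type, "reserved/unspecified")
    return f"{kind}: {name}"
```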
Parameter sets
- Contain information that is expected to rarely change and that applies to the decoding of a large number of VCL NAL units.
- Divided into two types:
  - Sequence parameter sets apply to a series of consecutive coded video pictures (a coded video sequence).
  - Picture parameter sets apply to the decoding of one or more individual pictures within a coded video sequence.
- These two mechanisms decouple the transmission of infrequently changing information from the transmission of the coded picture data.
- Parameter sets can be sent well ahead of the VCL NAL units and repeated to provide robustness against data loss.
- A small amount of data (an identifier) can be used to refer to a larger amount of information (the parameter set): each slice refers to a picture parameter set, which in turn refers to a sequence parameter set.
- In some applications, parameter sets may be sent within the channel that carries the VCL NAL units (termed "in-band" transmission).
- In other applications, it can be advantageous to convey parameter sets "out of band" using a more reliable transport mechanism.
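The identifier indirection can be illustrated with a decoder-side store that is filled either in-band or out of band and then consulted for each slice. This is a minimal sketch: seq_parameter_set_id, pic_parameter_set_id and entropy_coding_mode_flag are real syntax element names, while width_in_mbs, height_in_mbs and all class and method names are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SequenceParameterSet:
    seq_parameter_set_id: int
    width_in_mbs: int   # hypothetical simplified field
    height_in_mbs: int  # hypothetical simplified field


@dataclass(frozen=True)
class PictureParameterSet:
    pic_parameter_set_id: int
    seq_parameter_set_id: int      # link to the SPS this PPS depends on
    entropy_coding_mode_flag: int  # 0 = CAVLC, 1 = CABAC


class ParameterSetStore:
    """Table of received parameter sets, keyed by their identifiers."""

    def __init__(self) -> None:
        self.sps: Dict[int, SequenceParameterSet] = {}
        self.pps: Dict[int, PictureParameterSet] = {}

    def store_sps(self, sps: SequenceParameterSet) -> None:
        # A repeated transmission simply overwrites the entry with the same id.
        self.sps[sps.seq_parameter_set_id] = sps

    def store_pps(self, pps: PictureParameterSet) -> None:
        self.pps[pps.pic_parameter_set_id] = pps

    def activate(self, pic_parameter_set_id: int) -> Tuple[PictureParameterSet, SequenceParameterSet]:
        """Resolve the small id carried in a slice header to the full parameter sets."""
        pps = self.pps[pic_parameter_set_id]
        return pps, self.sps[pps.seq_parameter_set_id]
```

Because repeated copies only overwrite identical entries, an encoder can resend parameter sets freely for robustness without confusing the decoder.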
Access units
The format of an access unit is shown below.
- An access unit contains a set of VCL NAL units that together compose a primary coded picture.
- It may be prefixed with an access unit delimiter to aid in locating the start of the access unit.
- SEI NAL units may contain data such as picture timing information.
- The primary coded picture consists of VCL NAL units containing slices that represent the samples of the video picture.
- Redundant coded pictures may follow; they are available for use by a decoder in recovering from loss of data.
- If the coded picture is the last picture of a coded video sequence, an end of sequence NAL unit may be present to indicate the end of the sequence.
- If it is the last coded picture in the entire NAL unit stream, an end of stream NAL unit may be present to indicate that the stream is ending.
- Decoders are not required to decode redundant coded pictures if they are present.
- Decoding each access unit results in one decoded picture.
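The access unit layout can be illustrated by grouping a list of nal_unit_type values. This sketch assumes a hypothetical stream in which every access unit starts with an access unit delimiter, so no slice headers need to be parsed; real boundary detection without delimiters requires slice header fields, and redundant coded pictures cannot be told apart by type alone (the slice header's redundant_pic_cnt does that).

```python
from typing import List

AUD = 9  # nal_unit_type of an access unit delimiter


def split_access_units(nal_unit_types: List[int]) -> List[List[int]]:
    """Group nal_unit_type values into access units (one AUD per access unit assumed)."""
    access_units: List[List[int]] = []
    for t in nal_unit_types:
        if t == AUD or not access_units:
            access_units.append([])  # a delimiter opens a new access unit
        access_units[-1].append(t)
    return access_units


def slice_nal_count(access_unit: List[int]) -> int:
    """Number of VCL NAL units (types 1-5) in one access unit."""
    return sum(1 for t in access_unit if 1 <= t <= 5)


def ends_sequence(access_unit: List[int]) -> bool:
    """True if the unit carries an end of sequence (10) or end of stream (11) NAL unit."""
    return 10 in access_unit or 11 in access_unit
```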
Coded video sequences
- A coded video sequence consists of a series of access units that are sequential in the NAL unit stream and use only one sequence parameter set.
- Each coded video sequence can be decoded independently of any other coded video sequence, given the necessary parameter set information.
- At the beginning of a coded video sequence is an instantaneous decoding refresh (IDR) access unit, which contains an intra picture.
- The presence of an IDR access unit indicates that no subsequent picture in the stream will reference pictures prior to the intra picture it contains.
Conclusion (I)
H.264/AVC represents a number of advances in standard video coding technology in terms of flexibility for effective use over a broad variety of network types and application domains.
Design feature highlight
Variable block-size motion compensation with small block sizes
- The minimum luma motion compensation block size is as small as 4x4.
- The matching chroma block is half the length and width (4:2:0 sampling).
Quarter-sample-accurate motion compensation
- Half-sample positions are generated using a 6-tap FIR filter.
- Quarter-sample accuracy was first found in an advanced profile of MPEG-4 Visual; H.264/AVC further reduces the complexity of the interpolation process.
Multiple reference picture motion compensation
- Extends the enhanced reference picture selection technique found in H.263++.
- The encoder can select, for prediction, among a larger number of pictures that have been decoded and stored in the decoder.
- The same applies to bi-prediction, which in MPEG-2 is restricted to two specific reference pictures.
Decoupling of referencing order from display order
- In prior standards, there was a strict dependency between the ordering of pictures for referencing and their ordering for display.
- H.264/AVC allows the encoder to choose the ordering of pictures for referencing and display purposes with a high degree of flexibility, constrained only by a total memory capacity bound.
- Removal of this restriction also enables removing the extra delay previously associated with bi-predictive coding.
Motion vectors over picture boundaries
- Motion vectors are allowed to point outside pictures.
- This is especially useful for small pictures and for camera movement.
Decoupling of picture representation methods from picture referencing capability
- In prior standards, bi-predictively-encoded pictures could not be used as references for the prediction of other pictures.
- Removing this restriction gives the encoder more flexibility to use, for referencing, a picture that is closer to the picture being coded.
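Motion vectors that point outside the picture work because reference sample coordinates are clamped to the picture area, so the edge samples are effectively repeated outward. A minimal Python sketch, restricted to integer-sample motion vectors (fractional interpolation omitted) and with hypothetical function names:

```python
from typing import List

Plane = List[List[int]]


def ref_sample(ref: Plane, x: int, y: int) -> int:
    """Read a reference sample, clamping the coordinates to the picture area."""
    height, width = len(ref), len(ref[0])
    xc = min(max(x, 0), width - 1)
    yc = min(max(y, 0), height - 1)
    return ref[yc][xc]


def predict_block(ref: Plane, x0: int, y0: int, mvx: int, mvy: int, w: int, h: int) -> Plane:
    """Integer-sample motion-compensated prediction of a w x h block at (x0, y0)."""
    return [[ref_sample(ref, x0 + mvx + i, y0 + mvy + j) for i in range(w)]
            for j in range(h)]
```

With ref = [[1, 2], [3, 4]], a vector of (-1, 0) for the 2x2 block at the origin yields [[1, 1], [3, 3]]: the left column of the picture is replicated into the area outside it.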
Weighted prediction
- Allows the motion-compensated prediction signal to be weighted and offset by amounts specified by the encoder.
- Improves coding efficiency for scenes containing fades.
(Figure: one grid cell corresponds to one pixel.)
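For a single reference list, explicit weighted prediction scales each prediction sample by a weight with log_wd fractional bits, rounds, adds an offset, and clips. The sketch below follows that single-list formula for 8-bit video as I understand it; parameter names are illustrative.

```python
def clip1(value: int, bit_depth: int = 8) -> int:
    """Clip to the valid sample range [0, 2^bit_depth - 1]."""
    return min(max(value, 0), (1 << bit_depth) - 1)


def weighted_sample(pred: int, weight: int, offset: int, log_wd: int) -> int:
    """Explicit weighted prediction of one sample from a single reference list."""
    if log_wd >= 1:
        # Rounded division by 2^log_wd.
        scaled = (pred * weight + (1 << (log_wd - 1))) >> log_wd
    else:
        scaled = pred * weight
    return clip1(scaled + offset)
```

With log_wd = 6, a weight of 64 means a factor of 1.0, so a weight of 32 models a fade to half brightness: weighted_sample(100, 32, 0, 6) returns 50.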
Improved "skipped" and "direct" motion inference
- In prior standards, a "skipped" area of a predictively-coded picture could not represent motion in the scene content, which is detrimental when coding video containing global motion.
- H.264/AVC instead infers motion in "skipped" areas.
- For bi-predictively coded areas, it further improves on prior direct prediction designs such as those of H.263+ and MPEG-4.
Directional spatial prediction for intra coding
- Extrapolation of the edges of previously-decoded parts of the current picture is applied in intra-coded regions of pictures.
- Improves the quality of the prediction signal.
- Allows prediction from neighboring areas that were not intra-coded.
In-the-loop deblocking filtering
- Block-based video coding produces artifacts known as blocking artifacts, which can originate from both the prediction and the residual difference coding stages of the decoding process.
- Because the filter operates inside the prediction loop, the quality improvement can be used in inter-picture prediction to improve the ability to predict other pictures.
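The core idea of the deblocking filter is to smooth small steps across a block edge while leaving large steps (likely real image edges) untouched. The sketch below shows the sample-level decision and the adjustment of the two samples nearest the edge for the normal-strength filter. alpha, beta and tc are passed in directly; in the standard they come from tables indexed by QP and boundary strength, which are not reproduced here, and the filtering of p1/q1 and the strong filter are omitted.

```python
def clip3(lo: int, hi: int, v: int) -> int:
    return min(max(v, lo), hi)


def should_filter(p1: int, p0: int, q0: int, q1: int, alpha: int, beta: int) -> bool:
    """p1, p0 | q0, q1 straddle the edge; filter only small, flat steps."""
    return abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta


def filter_edge_samples(p1: int, p0: int, q0: int, q1: int, tc: int):
    """Normal-strength adjustment of the two samples nearest the edge (8-bit)."""
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    return clip3(0, 255, p0 + delta), clip3(0, 255, q0 - delta)
```

A step from 60 to 70 across the edge is filtered (to 62 and 68 with tc = 2), while a step from 20 to 120 is treated as image content and left alone.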
In addition to improved prediction methods, other parts of the design were also enhanced for coding efficiency, including the following:
Small block-size transform
- All major prior video coding standards used a transform block size of 8x8, while the new design is based primarily on a 4x4 transform.
- Allows the encoder to represent signals in a more locally-adaptive fashion, which reduces artifacts.
Short word-length transform
- Prior designs effectively required 32-bit arithmetic processing; H.264/AVC requires only 16-bit arithmetic.
Hierarchical block transform
- Extends the effective block size used for low-frequency chroma information to an 8x8 array and for luma to a 16x16 array.
Exact-match inverse transform
- Previously, the inverse transform was specified only within an error tolerance bound, due to the impracticality of obtaining an exact match to the ideal inverse transform.
- As a result, each decoder design would produce slightly different decoded video, causing "drift" between encoder and decoder.
- H.264/AVC specifies an exact integer inverse transform, so every decoder produces identical output.
Arithmetic entropy coding
- Previously found as an optional feature of H.263.
- H.264/AVC uses a more powerful "context-adaptive binary arithmetic coding" (CABAC).
Context-adaptive entropy coding
- Both CAVLC (context-adaptive variable-length coding) and CABAC use context-based adaptivity to improve performance.
Robustness to data errors/losses and flexibility for operation over a variety of network environments are enabled by features including the following:
Parameter set structure
- Key information is separated for handling in a more flexible and specialized manner.
- Provides robust and efficient conveyance of header information.
Flexible slice size
- In MPEG-2, a rigid slice structure reduces coding efficiency by increasing the quantity of header data and decreasing the effectiveness of prediction.
- H.264/AVC allows much more flexible slice sizes.
NAL unit syntax structure
- Each syntax structure in H.264/AVC is placed into a logical data packet called a NAL unit.
- Allows greater customization of the method of carrying the video content in a manner appropriate for each specific network.
Redundant pictures
- Enhance robustness to data loss.
- Enable an encoder to send redundant representations of regions of pictures, which are used when the primary representation has been lost.
Flexible macroblock ordering (FMO)
- Partitions the picture into regions called slice groups, with each slice becoming an independently-decodable subset of a slice group.
- Can significantly enhance robustness by managing the spatial relationship between the regions that are coded in each slice.
Arbitrary slice ordering (ASO)
- Enables sending and receiving the slices of a picture in any order relative to each other, as found in H.263+.
- Can improve end-to-end delay in real-time applications, particularly on networks with out-of-order delivery behavior.
Data partitioning
- Allows the syntax of each slice to be separated into up to three different partitions for transmission, depending on a categorization of syntax elements: header data (such as macroblock types, quantization parameters, and motion vectors), intra residual data, and inter residual data.
SP/SI synchronization/switching pictures
- Allow exact synchronization of the decoding process of a decoder with an ongoing video stream.
- Enable a decoder to switch between video streams that use different data rates, and to recover from data loss or errors.
- Enable switching between different kinds of video streams.
(Figure: SP/SI synchronization/switching pictures.)
Conclusion (II)
H.264/AVC represents a number of advances in standard video coding technology in terms of both coding efficiency enhancement and flexibility for effective use over a broad variety of network types and application domains.
Video Coding Layer
Pictures, frames, and fields
- A picture can represent either an entire frame or a single field.
- If the two fields of a frame were captured at different time instants, the frame is referred to as an interlaced frame; otherwise it is referred to as a progressive frame.
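A frame consists of two interleaved fields: the top field holds the even rows and the bottom field the odd rows. The following Python sketch (hypothetical helper names, even row count assumed) splits a frame into its fields and weaves them back together.

```python
from typing import List, Tuple

Plane = List[List[int]]


def split_fields(frame: Plane) -> Tuple[Plane, Plane]:
    """Top field = even rows, bottom field = odd rows of the frame."""
    return frame[0::2], frame[1::2]


def weave_fields(top: Plane, bottom: Plane) -> Plane:
    """Interleave two fields back into a frame (top row first)."""
    frame: Plane = []
    for t_row, b_row in zip(top, bottom):
        frame.extend([t_row, b_row])
    return frame
```

In an interlaced frame the two fields were captured at different instants, so rows that are adjacent in the frame can differ strongly when there is motion, whereas rows within one field remain coherent.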
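YCbCr color space and 4:2:0 sampling
- Y represents brightness; Cb and Cr represent the extent to which the color deviates from gray toward blue and red, respectively.
- In 4:2:0 sampling, each chroma component has half the luma resolution both horizontally and vertically.
Division of the picture into macroblocks
- A picture is partitioned into fixed-size macroblocks, each covering a 16x16 luma area and the corresponding 8x8 area of each chroma component.
Slices and slice groups
- Slices are sequences of macroblocks processed in raster-scan order when FMO is not used.
- Some information from other slices may be needed to apply the deblocking filter across slice boundaries.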
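The consequences of 4:2:0 sampling and the macroblock grid can be worked out with simple arithmetic. This sketch (hypothetical function name) assumes even picture dimensions; a height such as 1080 that is not a multiple of 16 is rounded up to whole macroblock rows.

```python
def frame_layout_420(width: int, height: int) -> dict:
    """Plane sizes and macroblock grid for a 4:2:0 picture."""
    mbs_wide = (width + 15) // 16   # round up to whole 16x16 macroblocks
    mbs_high = (height + 15) // 16
    return {
        "luma": (width, height),
        "chroma": (width // 2, height // 2),  # each of Cb and Cr
        "samples": width * height * 3 // 2,   # Y + Cb + Cr
        "macroblocks": (mbs_wide, mbs_high),
        "total_macroblocks": mbs_wide * mbs_high,
    }
```

For QCIF (176x144), each chroma plane is 88x72, the picture holds 38016 samples (1.5 per pixel), and it is covered by 11 x 9 = 99 macroblocks.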
A picture may be split into one or more slices without FMO, as shown below.
FMO modifies the way pictures are partitioned into slices and macroblocks by using slice groups. A slice group is a set of macroblocks defined by a macroblock-to-slice-group map, which is specified by the picture parameter set and some information from the slice headers.
A slice group can be partitioned into one or more slices, such that a slice is a sequence of macroblocks within the same slice group that is processed in raster-scan order.
By using FMO, a picture can be split into many macroblock scanning patterns, such as those shown below.
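One of the FMO map types, the "dispersed" pattern, can be generated by a closed-form formula over macroblock addresses; with two slice groups it produces a checkerboard, so every lost macroblock has received neighbors available for concealment. This is a sketch of that formula as I recall it from the standard, with hypothetical function names; other map types (interleaved, box-out, explicit, and so on) are not shown.

```python
from typing import Dict, List


def dispersed_slice_group_map(width_in_mbs: int, height_in_mbs: int, num_groups: int) -> List[int]:
    """Slice group id of each macroblock address (raster order), dispersed pattern."""
    return [
        ((i % width_in_mbs) + ((i // width_in_mbs) * num_groups) // 2) % num_groups
        for i in range(width_in_mbs * height_in_mbs)
    ]


def macroblocks_per_group(mb_to_group: List[int]) -> Dict[int, List[int]]:
    """Within each slice group, macroblocks are processed in raster-scan order."""
    groups: Dict[int, List[int]] = {}
    for addr, group in enumerate(mb_to_group):
        groups.setdefault(group, []).append(addr)
    return groups
```

For a 4x2-macroblock picture with two groups, the map is [0, 1, 0, 1, 1, 0, 1, 0], so slice group 0 contains macroblocks 0, 2, 5, and 7.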
Each slice can be coded using one of the following slice types:
- I slice: a slice in which all macroblocks are coded using intra prediction.
- P slice: in addition to intra prediction, macroblocks can be coded using inter prediction with at most one motion-compensated prediction signal per prediction block.
- B slice: in addition to the coding types of a P slice, macroblocks can be coded using inter prediction with two motion-compensated prediction signals per prediction block.
- SP (switching P) slice: enables efficient switching between different pre-coded pictures.
- SI (switching I) slice: allows an exact match of a macroblock in an SP slice for random access and error recovery.
If all slices in stream B are P slices, a decoder switching from stream A to stream B will not have the correct reference frame. One solution is to code the frame at the switching point as an I slice, as shown below.
However, I slices result in a peak in the coded bit rate at each switching point.
SP slices are designed to support switching without the increased bit-rate penalty of I slices.
Unlike in a "normal" P slice, the subtraction of the prediction occurs in the transform domain.
A simplified diagram of the encoding and decoding processes for SP slices A2, B2, and AB2 is shown below (A' denotes a reconstructed frame).
If streams A and B are versions of the same original sequence coded at different bit rates, the SP slice AB2 should be efficient to code, because its reference from stream A is similar to the corresponding frame of stream B.
SP slices can also provide random access and "VCR-like" functionality (e.g., a decoder can fast-forward from A0 directly to frame A10 by first decoding A0 and then decoding the SP slice A0-10).
A second type of switching slice, the SI slice, may be used to switch from one sequence to a completely different sequence.
Encoding and decoding process for macroblocks
- All luma and chroma samples of a macroblock are either spatially or temporally predicted.
- Each color component of the prediction residual is subdivided into 4x4 blocks; each block is transformed using an integer transform, and the coefficients are quantized and encoded using entropy coding methods.
- The input video signal is split into macroblocks, and the association of macroblocks to slice groups and slices is selected.
- Efficient parallel processing of macroblocks is possible when there are several slices in the picture.
The block diagram of the VCL for a macroblock is shown below.
Adaptive frame/field coding operation
- In regions of moving objects or camera motion, two adjacent rows of an interlaced frame tend to show a reduced degree of statistical dependency compared with progressive frames.
- To provide high coding efficiency, H.264/AVC allows the following decisions when coding a frame:
  1. Combine the two fields and code them as one single frame (frame mode).
  2. Do not combine the two fields, and code them as separate coded fields (field mode).
  3. Combine the two fields and compress them as a single frame, but before coding, split each pair of vertically adjacent macroblocks into either a pair of field macroblocks or a pair of frame macroblocks.
The three options can be chosen adaptively, and the choice between the first two is referred to as picture-adaptive frame/field (PAFF) coding.
When a frame is coded as two fields, each field is coded in a manner similar to a frame, with the following exceptions:
- Motion compensation uses reference fields rather than reference frames.
- The zig-zag scan is different.
- Strong deblocking is not used for filtering horizontal edges of macroblocks in fields.
If a frame consists of mixed regions, it is efficient to code the non-moving regions in frame mode and the moving regions in field mode.
A frame/field encoding decision can also be made independently for each vertical pair of macroblocks. This coding option is referred to as macroblock-adaptive frame/field (MBAFF) coding. The MBAFF macroblock pair concept is illustrated below.
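The standard defines the frame and field macroblock pair syntax but not how an encoder should choose between them; practical encoders typically compare rate-distortion costs. As a hypothetical, much cheaper proxy, the sketch below compares the vertical activity of a macroblock pair read as frame rows with the activity within each field.

```python
from typing import List

Block = List[List[int]]  # rows of luma samples, e.g. a 16x32 macroblock pair


def vertical_activity(rows: Block) -> int:
    """Sum of absolute differences between vertically adjacent rows."""
    return sum(abs(a - b) for upper, lower in zip(rows, rows[1:]) for a, b in zip(upper, lower))


def choose_mbaff_mode(pair: Block) -> str:
    """Hypothetical heuristic: use field mode if rows of the same field match better."""
    frame_cost = vertical_activity(pair)
    field_cost = vertical_activity(pair[0::2]) + vertical_activity(pair[1::2])
    return "field" if field_cost < frame_cost else "frame"
```

Rows that alternate between two very different values (typical of motion between the two field instants) select field mode, while a smooth vertical gradient selects frame mode.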
An important distinction between PAFF and MBAFF is that in MBAFF, one field cannot use macroblocks in the other field of the same frame as a reference for motion prediction.
Sometimes PAFF coding can be more efficient than MBAFF coding, particularly in the case of rapid global motion, scene changes, or intra picture refresh.
Intra frame prediction
- In all slice coding types, the following intra coding types are supported: Intra_4x4 or Intra_16x16 together with chroma prediction, and the I_PCM prediction mode.
- The Intra_4x4 mode is based on predicting each 4x4 luma block separately and is well suited for coding parts of a picture with significant detail.
- In this mode, each 4x4 block is predicted from spatially neighboring samples, as shown below.
4x4 block prediction modes
- Each mode is suited to predict textures with structures in the specified direction, except the "DC" mode (mode 2), which predicts all samples of the block from the mean of the neighboring samples.
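The three simplest Intra_4x4 modes can be written directly from their definitions: vertical copies the four samples above the block (A to D) downward, horizontal copies the four samples to the left (I to L) across, and DC fills the block with the rounded mean of all eight. This Python sketch assumes both neighbors are available and omits the six diagonal modes; the function name is hypothetical.

```python
from typing import List

Block = List[List[int]]


def intra4x4_predict(mode: int, top: List[int], left: List[int]) -> Block:
    """Intra_4x4 modes 0 (vertical), 1 (horizontal) and 2 (DC).

    top  = the four reconstructed samples above the block (A..D)
    left = the four reconstructed samples to its left (I..L)
    """
    if mode == 0:  # vertical: copy each sample above down its column
        return [list(top) for _ in range(4)]
    if mode == 1:  # horizontal: copy each left sample across its row
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:  # DC: rounded mean of the eight neighbours
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("directional modes 3-8 are not covered by this sketch")
```

With top = [10, 20, 30, 40] and left = [1, 2, 3, 4], the DC mode predicts (100 + 10 + 4) >> 3 = 14 for every sample.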
In earlier drafts, the four samples below L were also used for some prediction modes. They were dropped to reduce memory access.
Intra modes for neighboring 4x4 blocks are highly correlated. For example, if previously-encoded 4x4 blocks A and B were both predicted using mode 2, it is likely that the best mode for block C is also mode 2.
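The standard exploits this correlation by predicting the mode of block C as the smaller of the modes of A (left) and B (above). The encoder then sends either a one-bit flag ("use the predicted mode") or a 3-bit index into the remaining eight modes. In this sketch, prev_intra4x4_pred_mode_flag and rem_intra4x4_pred_mode are the standard's syntax element names; the availability handling is simplified, and the function names are hypothetical.

```python
from typing import Optional, Tuple

DC_MODE = 2


def most_probable_mode(mode_a: Optional[int], mode_b: Optional[int],
                       a_available: bool = True, b_available: bool = True) -> int:
    """Predicted Intra_4x4 mode of block C from neighbours A (left) and B (above).

    None means the neighbour exists but is not Intra_4x4 coded; it then counts as DC.
    """
    if not (a_available and b_available):
        return DC_MODE
    a = DC_MODE if mode_a is None else mode_a
    b = DC_MODE if mode_b is None else mode_b
    return min(a, b)


def encode_mode(mode: int, mpm: int) -> Tuple[int, Optional[int]]:
    """Return (prev_intra4x4_pred_mode_flag, rem_intra4x4_pred_mode)."""
    if mode == mpm:
        return 1, None  # a single flag signals "same as predicted"
    return 0, (mode if mode < mpm else mode - 1)  # 3 bits index the 8 other modes


def decode_mode(flag: int, rem: Optional[int], mpm: int) -> int:
    if flag:
        return mpm
    assert rem is not None
    return rem if rem < mpm else rem + 1
```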
The Intra_16x16 mode is suited for smooth areas of a picture. It supports four prediction modes: vertical, horizontal, DC, and plane prediction. Plane prediction works well in areas of smoothly-varying luminance.
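Plane prediction fits a linear surface to the 33 neighboring samples: horizontal and vertical gradients are estimated from weighted differences along the top row and left column, and each predicted sample is evaluated on that plane. The sketch below follows the standard's formulation for 8-bit luma as I understand it; argument and function names are hypothetical.

```python
from typing import List


def clip1(v: int) -> int:
    return min(max(v, 0), 255)  # 8-bit samples


def intra16x16_plane(top: List[int], left: List[int], corner: int) -> List[List[int]]:
    """Intra_16x16 plane prediction.

    top[x]  = sample above the macroblock in column x (0..15)
    left[y] = sample left of the macroblock in row y (0..15)
    corner  = sample above and to the left of the macroblock
    """
    def above(x: int) -> int:   # x == -1 addresses the corner sample
        return corner if x < 0 else top[x]

    def beside(y: int) -> int:  # y == -1 addresses the corner sample
        return corner if y < 0 else left[y]

    # Gradients estimated from weighted differences around the centre.
    h = sum((i + 1) * (above(8 + i) - above(6 - i)) for i in range(8))
    v = sum((i + 1) * (beside(8 + i) - beside(6 - i)) for i in range(8))
    a = 16 * (left[15] + top[15])
    b = (5 * h + 32) >> 6
    c = (5 * v + 32) >> 6
    return [[clip1((a + b * (x - 7) + c * (y - 7) + 16) >> 5) for x in range(16)]
            for y in range(16)]
```

On a neighborhood that rises by one per sample in both directions, the prediction is again a ramp (b = c = 32, i.e., a slope of one sample value per position), while a flat neighborhood gives a flat block.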
The chroma samples of a macroblock are predicted using a technique similar to Intra_16x16 (the same four modes).
The I_PCM mode allows the encoder to bypass the prediction and transform coding processes and instead directly send the values of the encoded samples. The I_PCM mode serves the following purposes:
- It allows the encoder to precisely represent the values of the samples.
- It provides a way to accurately represent the values of anomalous picture content.
- It enables placing a hard limit on the number of bits a decoder must handle for a macroblock without harm to coding efficiency.
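As a worked bound, assuming 8-bit samples and 4:2:0 sampling, an I_PCM macroblock carries the raw values of 256 luma samples and 2 x 64 chroma samples, so its sample data is fixed at:

$$(16 \times 16 + 2 \times 8 \times 8) \times 8 = 384 \times 8 = 3072 \text{ bits}$$

The macroblock type and alignment bits come on top of this. Because an encoder can always fall back to I_PCM, a per-macroblock bit limit of roughly this size costs essentially nothing in coding efficiency.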
The constrained intra coding mode allows intra prediction only from intra-coded neighboring macroblocks.
Intra prediction across slice boundaries is not used.
Reason: referring to neighboring samples of previously-coded inter blocks may cause error propagation in environments with transmission errors.
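Inter frame prediction
- In P slices, each P macroblock type corresponds to a partition of the macroblock into the block shapes used for motion-compensated prediction, as shown below: 16x16, 16x8, 8x16, or 8x8 luma samples. Each 8x8 sub-macroblock can be further partitioned into 8x8, 8x4, 4x8, or 4x4 luma samples.
- This method of partitioning macroblocks is known as tree-structured motion compensation.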
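The tree structure determines how many motion vectors a macroblock needs, from 1 (a single 16x16 partition) up to 16 (all four 8x8 sub-macroblocks split into 4x4 blocks). This small sketch (hypothetical names) counts them per reference list.

```python
from itertools import product
from typing import Sequence

# Motion vectors (per reference list) for each partition choice.
MB_PARTITIONS = {"16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4}
SUB_MB_PARTITIONS = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}


def motion_vector_count(mb_partition: str, sub_partitions: Sequence[str] = ()) -> int:
    """Motion vectors needed for one P macroblock under the tree-structured split."""
    if mb_partition != "8x8":
        return MB_PARTITIONS[mb_partition]
    if len(sub_partitions) != 4:
        raise ValueError("an 8x8 split needs one choice per 8x8 sub-macroblock")
    return sum(SUB_MB_PARTITIONS[s] for s in sub_partitions)


# Every way of splitting the four 8x8 sub-macroblocks: 4 ** 4 = 256 combinations.
ALL_SUB_SPLITS = list(product(SUB_MB_PARTITIONS, repeat=4))
```

Together with the three larger partitions, the 256 sub-macroblock combinations give the encoder 259 ways to partition each macroblock. This range is the source of the trade-off between motion vector bits and residual energy that is discussed next.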
Choosing a large partition size means:
- A small number of bits is required to signal the choice of motion vectors and the type of partition.
- However, the motion-compensated residual may contain a significant amount of energy in frame areas with high detail.
Choosing a small partition size means:
- A lower-energy residual after motion compensation.
- However, a larger number of bits is required to signal the motion vectors and the type of partition.
The accuracy of motion compensation is in units of one quarter of the distance between luma samples.
Half-sample values are obtained by applying a one-dimensional 6-tap FIR filter horizontally and vertically.
The 6-tap interpolation filter is relatively complex but produces a more accurate fit to the integer-sample data and hence better motion compensation performance.
Quarter-sample values are generated by averaging samples at integer- and half-sample positions.
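The figure above illustrates half-sample interpolation. E, F, G, H, I, J are full-sample positions in a row, and A, C, G, M, R, T are full-sample positions in a column. cc, dd, ee, ff (and m_1) are the unrounded vertical intermediates of the neighboring columns. Clip1 clips to the sample range [0, 255].

$$b_1 = E - 5F + 20G + 20H - 5I + J, \qquad b = \mathrm{Clip1}\big((b_1 + 16) \gg 5\big)$$

$$h_1 = A - 5C + 20G + 20M - 5R + T, \qquad h = \mathrm{Clip1}\big((h_1 + 16) \gg 5\big)$$

$$j_1 = cc - 5\,dd + 20h_1 + 20m_1 - 5\,ee + ff, \qquad j = \mathrm{Clip1}\big((j_1 + 512) \gg 10\big)$$

$$a = (G + b + 1) \gg 1, \qquad e = (b + h + 1) \gg 1$$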
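The interpolation formulas can be checked numerically with a one-dimensional Python sketch: the 6-tap kernel (1, -5, 20, 20, -5, 1) sums to 32, which is why a shift by 5 normalizes a single pass and a shift by 10 normalizes the two-pass centre value j. The function names are hypothetical, and half_sample assumes at least two samples of margin on each side of i.

```python
from typing import Sequence

TAPS = (1, -5, 20, 20, -5, 1)


def clip1(v: int) -> int:
    return min(max(v, 0), 255)  # 8-bit samples


def six_tap(samples: Sequence[int]) -> int:
    """Unnormalised 6-tap FIR output (like b1 or h1 before rounding)."""
    return sum(t * s for t, s in zip(TAPS, samples))


def half_sample(row: Sequence[int], i: int) -> int:
    """Half-sample value b between full samples row[i] (G) and row[i + 1] (H)."""
    b1 = six_tap(row[i - 2:i + 4])  # E, F, G, H, I, J
    return clip1((b1 + 16) >> 5)


def centre_half_sample(intermediates: Sequence[int]) -> int:
    """Value j from six unrounded intermediates (cc, dd, h1, m1, ee, ff)."""
    j1 = six_tap(intermediates)
    return clip1((j1 + 512) >> 10)


def quarter_sample(p: int, q: int) -> int:
    """Rounded average of two neighbours, e.g. a = (G + b + 1) >> 1."""
    return (p + q + 1) >> 1
```

On the linear ramp 10, 20, 30, 40, 50, 60, the half-sample value between 30 and 40 comes out as exactly 35, and the quarter-sample value between 30 and 35 is 33.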
The following illustrates the luma quarter-pel positions
a = round ((G+b)/2)
d = round ((G+h)/2)
e = round ((h+b)/2)
The prediction for the chroma components is obtained by bilinear interpolation. Because the chroma sampling grid has half the luma resolution (4:2:0), the displacements used for chroma have one-eighth-sample position accuracy:

$$a = \mathrm{round}\left(\frac{(8-d_x)(8-d_y)A + d_x(8-d_y)B + (8-d_x)d_y C + d_x d_y D}{64}\right)$$
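The bilinear formula translates directly to integer arithmetic, with round(x / 64) implemented as (x + 32) >> 6. In this sketch, A and B are the upper-left and upper-right and C and D the lower-left and lower-right chroma samples around the predicted position, and dx, dy are the eighth-sample fractional offsets (0 to 7). The function name is hypothetical.

```python
def chroma_sample(a: int, b: int, c: int, d: int, dx: int, dy: int) -> int:
    """Bilinear chroma interpolation at eighth-sample offset (dx, dy), 0 <= dx, dy <= 7."""
    total = ((8 - dx) * (8 - dy) * a + dx * (8 - dy) * b
             + (8 - dx) * dy * c + dx * dy * d)
    return (total + 32) >> 6  # round(total / 64)
```

The four weights always sum to 64, so a flat area is reproduced exactly, and an offset of dx = 4 (half a chroma sample) lands midway between A and B.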
Motion prediction using full-, half-, and quarter-sample accuracy improves on previous standards for two reasons:
- More accurate motion representation.
- More flexibility in prediction filtering.
Motion vectors pointing over picture boundaries are allowed.
No motion vector prediction takes place across slice boundaries.
Motion compensation for regions smaller than 8x8 uses the same reference index for the prediction of all blocks within the 8x8 region.
The choice of neighboring partitions of the same and different sizes is shown below.
- For transmitted partitions, excluding the 16x8 and 8x16 partition sizes: MVp is the median of the motion vectors for partitions A, B, and C.
- For 16x8 partitions: MVp for the upper 16x8 partition is predicted from B; MVp for the lower 16x8 partition is predicted from A.
- For 8x16 partitions: MVp for the left 8x16 partition is predicted from A; MVp for the right 8x16 partition is predicted from C.
- For skipped macroblocks: a 16x16 vector MVp is generated as in the first (median) case above.
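The prediction rules can be sketched in Python: only the difference MVD = MV - MVp is transmitted, so a good predictor saves bits. This is a simplification. In the standard, the directional rule for 16x8 and 8x16 partitions applies only when the chosen neighbor uses the same reference picture (otherwise the median is used), and neighbor availability rules are omitted here. Function names are hypothetical.

```python
from typing import Tuple

MV = Tuple[int, int]


def median3(a: int, b: int, c: int) -> int:
    return max(min(a, b), min(max(a, b), c))


def median_mv(a: MV, b: MV, c: MV) -> MV:
    """Component-wise median of three motion vectors."""
    return median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1])


def predict_mv(shape: str, index: int, mv_a: MV, mv_b: MV, mv_c: MV) -> MV:
    """MVp from neighbours A (left), B (above) and C (above right).

    index 0 = upper/left partition, 1 = lower/right partition.
    """
    if shape == "16x8":
        return mv_b if index == 0 else mv_a
    if shape == "8x16":
        return mv_a if index == 0 else mv_c
    return median_mv(mv_a, mv_b, mv_c)
```

For neighbors (1, 2), (3, -1), and (2, 5), the median predictor is (2, 2). A partition whose true vector is (3, 2) then only needs to send the difference (1, 0).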
A P macroblock can also be coded in the P_Skip type, for which neither a residual nor a motion vector is transmitted. This is useful for large areas with no change or with constant motion, such as slow panning, which can then be represented with very few bits.
Multi-picture motion compensation is supported, as shown below.
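For P_Skip, the decoder infers the 16x16 motion vector instead of receiving it. Roughly, the inferred vector is zero at the top or left edge, or when the left or upper neighbor is a static block using the first reference picture; otherwise it is the median prediction. The sketch below assumes neighbor C is available and omits the standard's rule for the case where only one neighbor uses reference index 0. Names are hypothetical.

```python
from typing import Optional, Tuple

MV = Tuple[int, int]
Neighbour = Optional[Tuple[int, MV]]  # (reference index, motion vector), None if unavailable


def median3(a: int, b: int, c: int) -> int:
    return max(min(a, b), min(max(a, b), c))


def p_skip_mv(a: Neighbour, b: Neighbour, c: Neighbour) -> MV:
    """Inferred motion vector of a P_Skip macroblock (reference index 0)."""
    if a is None or b is None:
        return (0, 0)  # at the picture or slice edge, skip means "no motion"
    for ref_idx, mv in (a, b):
        if ref_idx == 0 and mv == (0, 0):
            return (0, 0)  # a static neighbour suggests a static area
    if c is None:
        raise ValueError("this sketch assumes neighbour C is available")
    return (median3(a[1][0], b[1][0], c[1][0]),
            median3(a[1][1], b[1][1], c[1][1]))
```

In a slowly panning scene where all neighbors move by about (4, 2), skipped macroblocks inherit that motion at zero bit cost for the vector.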
In B slices
- Intra coding is also supported.
- Four other prediction types are supported: list 0, list 1, bi-predictive, and direct prediction.
- For the bi-predictive mode, the prediction signal is formed by a weighted average of the motion-compensated list 0 and list 1 prediction signals.
- The direct mode can be list 0, list 1, or bi-predictive.
- Multi-frame motion compensation is supported.
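With default (equal) weights, the bi-predictive signal is simply the rounded average of the two motion-compensated predictions. Explicit weights and offsets, as in weighted prediction, generalize this. A minimal sketch with a hypothetical function name:

```python
from typing import List

Block = List[List[int]]


def bipred_average(pred_l0: Block, pred_l1: Block) -> Block:
    """Default bi-prediction: rounded average of the list 0 and list 1 signals."""
    return [[(p + q + 1) >> 1 for p, q in zip(row0, row1)]
            for row0, row1 in zip(pred_l0, pred_l1)]
```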
Transform, scaling, and quantization
- The transform is applied to 4x4 blocks.
- Instead of the DCT, a separable integer transform with properties similar to the DCT is used.
- Because the transform is specified exactly in integer arithmetic, inverse transform mismatches are avoided.
- At the encoder, the process consists of the transform, scanning, and scaling with rounding as the quantization, followed by entropy coding.
- At the decoder, the inverse of the encoding process is performed, except for the rounding.
- The inverse transform can be implemented using only additions and bit-shifting operations with 16-bit arithmetic.
There are several reasons for using a smaller transform size:
- It removes statistical correlation efficiently.
- It has visual benefits, resulting in less noise around edges.
- It requires fewer computations and a smaller processing word length.
The quantization parameter (QP) can take 52 values (0 to 51).
Qstep doubles in size for every increment of 6 in QP.
An increase of 1 in QP increases Qstep by approximately 12% (a factor of 2^(1/6), about 1.12).
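The QP-to-Qstep relation can be made concrete with the commonly tabulated base step sizes for QP 0 to 5, doubled for every further 6 steps. The standard itself specifies integer scaling factors rather than Qstep directly, so this table is a descriptive model. The function name is hypothetical.

```python
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)  # Qstep for QP = 0..5


def qstep(qp: int) -> float:
    """Quantizer step size: base value for qp % 6, doubled every 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))
```

Qstep ranges from 0.625 at QP 0 to 224 at QP 51. Because the base values are rounded, the ratio between consecutive steps varies from about 1.08 to 1.18 around the nominal 2^(1/6), and exactly doubles over every 6 steps.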
Video Coding Layer
The wide range of quantizer step sizes makes it possible for the encoder to control the trade-off between bit rate and quality accurately and flexibly
The values of QP may differ between luma and chroma
QPchroma is derived from QPY using a user-defined offset
4x4 luma DC coefficient transform and quantization (16x16 Intra mode only)
- The DC coefficients of the 4x4 blocks are transformed again using a 4x4 Hadamard transform
- In an intra-coded MB, much of the energy is concentrated in the DC coefficients, and this extra transform helps to de-correlate the 4x4 luma DC coefficients
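The second-stage transform of the 16 luma DC coefficients can be sketched as an unscaled Hadamard product H * D * H. This is a minimal sketch: the scaling that accompanies this step in the codec (commonly described as a division by 2) and the DC quantization are deliberately left out, and the names are hypothetical.

```python
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul(a, b):
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def hadamard_luma_dc(dc):
    """Unscaled 4x4 Hadamard transform of the 4x4 array of luma DC coefficients.

    H4 is symmetric (H4^T == H4), so H4 * D * H4^T is written as H4 * D * H4.
    Scaling and quantization are omitted in this sketch.
    """
    return matmul(matmul(H4, dc), H4)


# If every 4x4 block has the same DC value, all energy ends up in one coefficient.
print(hadamard_luma_dc([[160] * 4 for _ in range(4)]))
```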
Video Coding Layer
2x2 chroma DC coefficient transform and quantization: as with the Intra luma DC coefficients, the extra transform helps to de-correlate the 2x2 chroma DC coefficients and improves compression performance
The complete process:
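Encoding:
- Input: 4x4 residual samples $X$
- Forward "core" transform: $W = C_f X C_f^T$ (followed by a forward transform for the Chroma DC or Intra-16 Luma DC coefficients)
- Post-scaling and quantization: $Z = \mathrm{round}\left(W \otimes \frac{PF}{Q_{step}}\right)$, where $\otimes$ denotes element-wise multiplication; in integer arithmetic, $PF/Q_{step}$ is scaled by $2^{qbits}$ and the product is shifted right by $qbits$ (modified for the Chroma DC or Intra-16 Luma DC coefficients)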
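The integer form of post-scaling and quantization can be sketched as $|Z| = (|W| \cdot MF + f) \gg qbits$ with the sign restored. This sketch assumes the widely published multiplication factors MF, $qbits = 15 + \lfloor QP/6 \rfloor$, and a rounding offset $f = 2^{qbits}/3$ for intra and $2^{qbits}/6$ for inter blocks; the offset is an encoder choice, and all names here are hypothetical.

```python
# Assumed MF values for QP % 6, as (a, b, c):
#   a: positions (0,0), (0,2), (2,0), (2,2)
#   b: positions (1,1), (1,3), (3,1), (3,3)
#   c: all other positions
MF = [(13107, 5243, 8066), (11916, 4660, 7490), (10082, 4194, 6554),
      (9362, 3647, 5825), (8192, 3355, 5243), (7282, 2893, 4559)]


def mf_for(qp, i, j):
    a, b, c = MF[qp % 6]
    if i % 2 == 0 and j % 2 == 0:
        return a
    if i % 2 == 1 and j % 2 == 1:
        return b
    return c


def quantize(w, qp, intra=True):
    """Quantize a 4x4 block of core-transform coefficients W (integer sketch)."""
    qbits = 15 + qp // 6
    f = (1 << qbits) // 3 if intra else (1 << qbits) // 6  # assumed rounding offset
    z = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            level = (abs(w[i][j]) * mf_for(qp, i, j) + f) >> qbits
            z[i][j] = -level if w[i][j] < 0 else level
    return z


# At QP 4 (Qstep = 1.0, PF = 1/4 at the DC position) a DC value of 160 maps to 40.
w = [[160, 0, 0, 0], [0] * 4, [0] * 4, [0] * 4]
print(quantize(w, 4)[0][0])
```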
Video Coding Layer
Decoding:
- (Inverse transform for the Chroma DC or Intra-16 Luma DC coefficients)
- Re-scaling (incorporating the inverse transform pre-scaling): $W' = Z \otimes PF \cdot Q_{step} \cdot 64$, with $\otimes$ the element-wise product (modified for the Chroma DC or Intra-16 Luma DC coefficients)
- Inverse "core" transform: $X' = C_i^T W' C_i$
- Post-scaling: $X'' = \mathrm{round}(X'/64)$
- Output: 4x4 residual samples $X''$
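The decoder side can be sketched in integer form as $W' = Z \cdot V \cdot 2^{\lfloor QP/6 \rfloor}$, followed by the inverse core transform (the factors of 1/2 become `>> 1`) and $X'' = (X' + 32) \gg 6$. The rescaling factors V are the widely published values and are an assumption here; the function names are hypothetical.

```python
# Assumed rescaling factors V for QP % 6, as (a, b, c); same position classes as MF.
V = [(10, 16, 13), (11, 18, 14), (13, 20, 16), (14, 23, 18), (16, 25, 20), (18, 29, 23)]


def v_for(qp, i, j):
    a, b, c = V[qp % 6]
    if i % 2 == 0 and j % 2 == 0:
        return a
    if i % 2 == 1 and j % 2 == 1:
        return b
    return c


def inverse_butterfly(w0, w1, w2, w3):
    """One 1-D pass of Ci^T using only additions and shifts."""
    e, f = w0 + w2, w0 - w2
    g, h = (w1 >> 1) - w3, w1 + (w3 >> 1)
    return [e + h, f + g, f - g, e - h]


def decode_block(z, qp):
    """Rescale, inverse-transform, and post-scale one 4x4 block of levels Z."""
    shift = qp // 6
    w = [[(z[i][j] * v_for(qp, i, j)) << shift for j in range(4)] for i in range(4)]
    rows = [inverse_butterfly(*r) for r in w]          # W' * Ci
    cols = [inverse_butterfly(*c) for c in zip(*rows)]  # Ci^T * (W' * Ci)
    x = [list(r) for r in zip(*cols)]
    return [[(v + 32) >> 6 for v in r] for r in x]      # round(X' / 64)


# A single DC level of 40 at QP 4 reconstructs a flat residual of 10.
z = [[40, 0, 0, 0], [0] * 4, [0] * 4, [0] * 4]
print(decode_block(z, 4))
```

The constant 64 absorbed into V is what lets the final rounding be a single add and shift.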
Video Coding Layer
Flow chart (figure)
An additional 2x2 transform is also applied to the DC coefficients of the four 4x4 blocks of each chroma component
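The 2x2 chroma DC transform can be sketched as the unscaled Hadamard product H2 * D * H2 with H2 = [[1, 1], [1, -1]], written out term by term. Scaling and quantization are omitted, and the function name is hypothetical.

```python
def chroma_dc_2x2(dc):
    """Unscaled 2x2 Hadamard transform of one chroma component's four DC values."""
    (d00, d01), (d10, d11) = dc
    return [[d00 + d01 + d10 + d11, d00 - d01 + d10 - d11],
            [d00 + d01 - d10 - d11, d00 - d01 - d10 + d11]]


print(chroma_dc_2x2([[1, 2], [3, 4]]))
```

Since H2 * H2 = 2I, applying the transform twice returns the input scaled by 4, which is how the inverse is obtained up to scaling.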
Video Coding Layer
Entropy coding
The simpler method uses a single infinite-extent codeword table for all syntax elements except the quantized transform coefficients (residual data)
Instead of designing a different VLC table for each syntax element, only the mapping to this single codeword table is customized according to the data statistics
The codeword table chosen is an Exp-Golomb code with simple and regular decoding properties
In CAVLC, VLC tables for various syntax elements are switched depending on already transmitted syntax elements
In CAVLC, the number of non-zero quantized coefficients and the actual size and position of the coefficients are coded separately
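The regular structure of the Exp-Golomb code can be sketched directly: code number k is sent as M zeros followed by the (M + 1)-bit binary form of k + 1. The signed mapping shown (1, -1, 2, -2, ... to code numbers 1, 2, 3, 4, ...) is the usual se(v) convention; the helper names are hypothetical.

```python
def ue(k):
    """Unsigned Exp-Golomb codeword for code number k >= 0."""
    if k < 0:
        raise ValueError("code number must be non-negative")
    info = bin(k + 1)[2:]              # binary of k + 1, leading '1' included
    return "0" * (len(info) - 1) + info


def se_to_code_num(v):
    """Map a signed value to a code number: 1 -> 1, -1 -> 2, 2 -> 3, -2 -> 4, ..."""
    return 2 * v - 1 if v > 0 else -2 * v


def decode_ue(bits):
    """Decode one codeword from the start of a bit string; returns (k, bits used)."""
    m = bits.index("1")                # number of leading zeros
    k = int(bits[m:2 * m + 1], 2) - 1
    return k, 2 * m + 1


for k in range(5):
    print(k, ue(k))
```

The decoder only needs to count leading zeros and read that many further bits, which is the "simple and regular decoding property" mentioned above.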
Video Coding Layer
Entropy coding
VLC tables are designed to match the corresponding conditioned statistics
CAVLC encoding of a block of transform coefficients proceeds as follows:
Encode the number of non-zero coefficients and "trailing 1s"
- The total number of non-zero coefficients (TotalCoeffs) and the number of trailing +/-1 values (T1) are coded jointly as coeff_token
- TotalCoeffs: 0 to 16, T1: 0 to 3
- There are 4 look-up tables for coeff_token (3 VLC and 1 FLC)
Encode the sign of each T1
- Coded in reverse order, starting with the highest-frequency T1
Video Coding Layer
Entropy coding
Encode the levels of the remaining non-zero coefficients
- Coded in reverse order
- There are 7 VLC tables to choose from
- The choice of table adapts depending on the magnitude of previously coded levels
Encode the total number of zeros before the last coefficient
- TotalZeros is the number of zeros preceding the highest-frequency non-zero coefficient in the reordered array
- Coded with a VLC
Encode each run of zeros
- Encoded in reverse order
- Each run is coded as run_before, with the VLC table chosen depending on ZerosLeft
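The steps above can be sketched as a function that derives the CAVLC symbols (not the actual bit strings, whose VLC tables are not reproduced here) from one zigzag-reordered block. The function name and the dictionary layout are hypothetical; the run for the lowest-frequency coefficient is left out because it follows from the remaining ZerosLeft.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbols for one zigzag-reordered block of 16 coefficients."""
    nz = [i for i, c in enumerate(coeffs) if c != 0]   # positions of non-zero values
    total_coeffs = len(nz)

    # Trailing ones: up to 3 consecutive +/-1 values counted from the highest frequency.
    trailing_ones = 0
    for i in reversed(nz):
        if abs(coeffs[i]) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    rev = [coeffs[i] for i in reversed(nz)]           # highest frequency first
    t1_signs = ["-" if v < 0 else "+" for v in rev[:trailing_ones]]
    levels = rev[trailing_ones:]                      # remaining levels, reverse order
    total_zeros = nz[-1] + 1 - total_coeffs if nz else 0

    runs = []                                         # (ZerosLeft, run_before) pairs
    zeros_left = total_zeros
    rev_pos = list(reversed(nz))
    for k, pos in enumerate(rev_pos[:-1]):
        if zeros_left == 0:
            break
        run_before = pos - rev_pos[k + 1] - 1
        runs.append((zeros_left, run_before))
        zeros_left -= run_before

    return {"TotalCoeffs": total_coeffs, "T1": trailing_ones, "T1 signs": t1_signs,
            "levels": levels, "TotalZeros": total_zeros, "runs": runs}


print(cavlc_symbols([0, 3, 0, 1, -1, -1, 0, 1] + [0] * 8))
```

For this block the sketch reports TotalCoeffs = 5, T1 = 3, and TotalZeros = 3, with the remaining levels 1 and 3 listed from high to low frequency.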
Video Coding Layer
Entropy coding
Example (figure)
Video Coding Layer
CABAC allows the assignment of a non-integer number of bits to each symbol of an alphabet
The use of adaptive codes permits adaptation to non-stationary symbol statistics
Statistics of already coded syntax elements are used to estimate the conditional probabilities used for switching between several estimated probability models
The arithmetic coding core engine and its associated probability estimation are specified as multiplication-free, low-complexity methods using only shifts and table look-ups
Video Coding Layer
Coding a data symbol involves the following stages (taking MVDx as an example):
Binarization
- For |MVDx| < 9, binarization is carried out using the following table; larger values are binarized using an Exp-Golomb codeword
- The first bit is bin 1, the second bit is bin 2, and so on
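The MVD binarization can be sketched as a truncated unary prefix with cut-off 9, a 3rd-order Exp-Golomb suffix for magnitudes of 9 and above, and a sign bin for non-zero values. This follows our reading of the standard's UEG3 scheme and should be treated as a sketch; the function name is hypothetical.

```python
def ueg3_mvd(mvd, u_coff=9, k=3):
    """Binarize one MVD component as a bin string (prefix, optional suffix, sign)."""
    a = abs(mvd)
    bins = "1" * min(a, u_coff)             # truncated unary prefix
    if a < u_coff:
        bins += "0"                         # terminating zero of the unary part
    else:
        suf = a - u_coff                    # remaining magnitude -> k-th order Exp-Golomb
        while suf >= (1 << k):
            bins += "1"
            suf -= 1 << k
            k += 1
        bins += "0"
        bins += format(suf, "b").zfill(k)   # k suffix bits, most significant first
    if a != 0:
        bins += "1" if mvd < 0 else "0"     # sign bin
    return bins


for v in (0, 2, -2, 9):
    print(v, ueg3_mvd(v))
```

Small magnitudes, which are the most frequent, get short unary strings whose individual bins can each be given their own context model.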
Video Coding Layer
Coding a data symbol involves the following stages (taking MVDx as an example):
Context model selection
- The context model is selected according to the following table
Arithmetic encoding
- The selected context model supplies two probability estimates (for 1 and 0), which determine the sub-range the arithmetic coder uses
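Context selection for the MVD prefix bins can be sketched as follows. Bin 1 uses e_k = |mvd_A| + |mvd_B| of the left (A) and upper (B) neighbours, with thresholds 3 and 32, while later bins use fixed context offsets. These values follow commonly published descriptions and are an assumption here, as is the hypothetical function name; suffix and sign bins are assumed to bypass context modelling.

```python
def mvd_ctx_inc(bin_index, e_k):
    """Context index increment for MVD prefix bins (bin_index 1 is the first bin).

    e_k is |mvd_A| + |mvd_B| for the left and upper neighbouring blocks.
    """
    if bin_index == 1:
        if e_k < 3:
            return 0          # neighbours have small motion differences
        if e_k <= 32:
            return 1
        return 2              # neighbours have large motion differences
    return min(bin_index + 1, 6)   # bins 2, 3, 4 -> 3, 4, 5; bins 5 and above -> 6


print([mvd_ctx_inc(1, e) for e in (0, 3, 32, 33)])
```

Conditioning bin 1 on the neighbours exploits the fact that large motion-vector differences tend to cluster spatially.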
Video Coding Layer
Coding a data symbol involves the following stages (taking MVDx as an example):
Probability update
- If the value of bin 1 is "0", the frequency count of "0" is incremented
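The interplay of arithmetic coding and the count-based probability update can be sketched with exact fractions. This is a toy model, not the CABAC engine (which uses an integer range, state tables, and no multiplications); the initial counts of 1 and the function name are assumptions. It shows how a run of likely bins costs a non-integer, and much smaller, number of bits.

```python
import math
from fractions import Fraction


def adaptive_code_interval(bins):
    """Narrow [0, 1) for a bin string using a frequency-count probability model."""
    low, width = Fraction(0), Fraction(1)
    counts = [1, 1]                        # assumed initial counts for '0' and '1'
    for b in bins:
        p0 = Fraction(counts[0], sum(counts))
        if b == 0:
            width *= p0                    # '0' takes the lower sub-range
        else:
            low += width * p0              # '1' takes the upper sub-range
            width *= 1 - p0
        counts[b] += 1                     # probability update for the coded bin
    return low, width


low, width = adaptive_code_interval([0] * 8)
print(width, round(-math.log2(float(width)), 2))   # interval width, ideal bits
# prints 1/9 3.17
```

Eight bins cost about 3.17 bits here, less than half a bit per bin, which a code that assigns whole bits per symbol could not achieve.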
Video Coding Layer
In-loop deblocking filter
The filter is applied after the inverse transform, before the MB is reconstructed and stored for future predictions
A particular characteristic of block-based coding is the accidental production of visible block structures
Block edges are reconstructed with less accuracy than interior pixels, and "blocking" is one of the most visible artifacts
The filter has two benefits:
- Block edges are smoothed, improving the appearance of decoded pictures
- The filtered MB is used for motion-compensated prediction, resulting in smaller residuals after prediction
In the adaptive filter, the strength of filtering is controlled by the values of several syntax elements
Video Coding Layer
In-loop deblocking filter
The basic idea is that if a relatively large absolute difference between samples near a block edge is measured, it is quite likely a blocking artifact and should be reduced
If the magnitude of the difference is so large that it cannot be explained by coarse quantization, it more likely reflects the actual content of the picture and should not be smoothed
Filtering is applied to the edges of 4x4 blocks
Video Coding Layer
In-loop deblocking filter
Filtering is applied to the edges of 4x4 blocks
The choice of filtering outcome depends on the boundary strength and on the gradient of the image samples across the boundary
Video Coding Layer
In-loop deblocking filter
The boundary strength Bs is chosen according to the following table
Filter implementation:
- Bs ∈ {1, 2, 3}: a 4-tap linear filter is applied
- Bs = 4: a 3-, 4-, or 5-tap linear filter may be used
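The Bs decision can be sketched from the rules commonly tabulated for it: 4 for an intra block at a macroblock edge, 3 for an intra block at an inner edge, 2 when either block has coded coefficients, 1 when the blocks use different reference pictures or their motion vectors differ by at least one luma sample, and 0 otherwise. The `BlockInfo` record and its simplification to one reference picture and one motion vector per block are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockInfo:
    """Hypothetical per-4x4-block side information for the Bs decision."""
    intra: bool
    has_coeffs: bool      # block contains non-zero transform coefficients
    ref_pic: int          # reference picture identifier (simplified to one)
    mv: tuple             # motion vector (x, y) in quarter-sample units


def boundary_strength(p, q, mb_edge):
    """Bs for the edge between blocks p and q."""
    if p.intra or q.intra:
        return 4 if mb_edge else 3
    if p.has_coeffs or q.has_coeffs:
        return 2
    if (p.ref_pic != q.ref_pic
            or abs(p.mv[0] - q.mv[0]) >= 4      # 4 quarter samples = 1 luma sample
            or abs(p.mv[1] - q.mv[1]) >= 4):
        return 1
    return 0


still = BlockInfo(intra=False, has_coeffs=False, ref_pic=0, mv=(0, 0))
print(boundary_strength(still, still, mb_edge=True))   # identical motion -> 0
```

The ordering checks the conditions most likely to produce visible blocking first, so the strongest applicable Bs wins.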
Video Coding Layer
The principle is illustrated below using a one-dimensional edge
Whether samples p0 and q0, as well as p1 and q1, are filtered is determined using quantization parameter (QP) dependent thresholds α(QP) and β(QP), where β(QP) is smaller than α(QP)
Video Coding Layer
Filtering of p0 and q0 takes place only if all of the following are satisfied:
1. |p0 – q0| < α(QP)
2. |p1 – p0| < β(QP)
3. |q1 – q0| < β(QP)
In addition, p1 is filtered only if |p2 – p0| < β(QP), and q1 is filtered only if |q2 – q0| < β(QP)
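The sample-level decisions can be sketched for one line of samples across an edge. Alpha and beta are passed in as plain numbers standing for α(QP) and β(QP), since their QP tables are not reproduced here; the additional conditions of the strong Bs = 4 filter are not modelled, and the function name is hypothetical.

```python
def deblock_decisions(p, q, alpha, beta, bs):
    """Return (filter p0/q0, filter p1, filter q1) for one line across an edge.

    p = (p0, p1, p2) and q = (q0, q1, q2), indexed outward from the edge.
    """
    p0, p1, p2 = p
    q0, q1, q2 = q
    if bs == 0:
        return False, False, False          # edge is not filtered at all
    edge = abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta
    filter_p1 = edge and abs(p2 - p0) < beta
    filter_q1 = edge and abs(q2 - q0) < beta
    return edge, filter_p1, filter_q1


print(deblock_decisions((80, 81, 82), (84, 85, 86), alpha=10, beta=4, bs=2))
print(deblock_decisions((20, 20, 20), (200, 200, 200), alpha=10, beta=4, bs=2))
```

The first call is a small step likely caused by quantization and is filtered; the second is a large jump that the thresholds treat as a real edge and leave untouched.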
Video Coding Layer
Figure: Foreman.qcif (10 Hz) and Foreman.cif (30 Hz)
Video Coding Layer
Hypothetical reference decoder (HRD)
For a standard, it is not sufficient to provide a coding algorithm
In real-time systems it is important to specify how bits are fed to a decoder and how decoded pictures are removed from the decoder
This is done by specifying input and output buffer models and developing an implementation-independent model of the receiver, called the HRD
The HRD specifies the operation of two buffers:
- Coded picture buffer (CPB)
- Decoded picture buffer (DPB)
Video Coding Layer
The CPB models the arrival and removal times of the coded bits
The HRD design is more flexible than earlier buffer models (such as that of MPEG-2) in supporting the sending of video at a variety of bit rates without excessive delay
The HRD specifies the DPB management model to ensure that excessive memory capacity is not needed in the decoder
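The role of the CPB can be sketched with a toy constant-bit-rate buffer: bits arrive at a fixed rate, and each coded picture is removed instantly at its decoding time. This is a simplified illustration, not the HRD equations of the standard; all parameter and function names are hypothetical.

```python
def cpb_fullness_trace(picture_bits, bit_rate, frame_rate, initial_delay, cpb_size):
    """Toy constant-bit-rate CPB: return the fullness just after each removal.

    Bits arrive at bit_rate until the stream ends; picture i is removed
    instantly at t_i = initial_delay + i / frame_rate.
    Raises ValueError on buffer overflow or underflow.
    """
    total = sum(picture_bits)
    removed = 0
    trace = []
    for i, size in enumerate(picture_bits):
        t = initial_delay + i / frame_rate
        arrived = min(bit_rate * t, total)        # bits delivered by time t
        fullness = arrived - removed
        if fullness > cpb_size:
            raise ValueError(f"CPB overflow before picture {i}")
        if fullness < size:
            raise ValueError(f"CPB underflow: picture {i} has not fully arrived")
        removed += size
        trace.append(fullness - size)
    return trace


print(cpb_fullness_trace([1000] * 5, bit_rate=30000, frame_rate=30,
                         initial_delay=0.1, cpb_size=5000))
```

A larger initial delay protects against underflow at the cost of start-up latency, while the buffer size bounds how much delay the stream may build up.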
Profile and potential application
Profiles
Three profiles are defined: Baseline, Main, and Extended
Baseline supports all features except the following two sets:
- Set 1: B slices, weighted prediction, CABAC, field coding, and picture- or MB-adaptive switching between frame and field coding
- Set 2: SP/SI slices and slice data partitioning
Main profile supports the first set above, but not FMO, ASO, and redundant pictures
Extended profile supports all features of Baseline and both sets above, except for CABAC
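The profile rules above can be captured as a small lookup of excluded features. This sketch covers only the features named on the slide (anything else is reported as supported), and the feature labels and function name are hypothetical.

```python
SET_1 = {"B slices", "weighted prediction", "CABAC", "field coding",
         "frame/field adaptive switching"}
SET_2 = {"SP/SI slices", "slice data partitioning"}
BASELINE_TOOLS_NOT_IN_MAIN = {"FMO", "ASO", "redundant pictures"}

# Features each profile does NOT support, per the three-profile summary.
PROFILE_EXCLUDES = {
    "Baseline": SET_1 | SET_2,
    "Main": SET_2 | BASELINE_TOOLS_NOT_IN_MAIN,
    "Extended": {"CABAC"},
}


def supports(profile, feature):
    """True if the profile supports the feature (only slide-listed features modelled)."""
    return feature not in PROFILE_EXCLUDES[profile]


for profile in ("Baseline", "Main", "Extended"):
    print(profile, supports(profile, "CABAC"), supports(profile, "FMO"))
```

Expressing each profile as an exclusion set mirrors how the slide defines them: relative to Baseline and the two feature sets.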
Profile and potential application
Areas in which the profiles of the new standard may be used
A list of possible application areas is given below:
Conversational services
- H.320 conversational video services that utilize circuit-switched ISDN-based video conferencing
- H.323 conversational services over the Internet with best-effort IP/RTP protocols
Entertainment video applications
- Broadcast via satellite, cable, or DSL
- DVD for standard-definition video
- VOD (video on demand) via various channels
Profile and potential application
Streaming services
- 3GPP streaming using IP/RTP for transport and RTSP for session setup
- Streaming over the wired Internet using IP/RTP protocols and RTSP for session setup
Other services
- 3GPP multimedia messaging services
- Video mail
Conclusion—(III)
Its VCL design is based on conventional block-based hybrid video coding concepts, but with some differences relative to prior standards, summarized below:
- Enhanced motion-prediction capability
- Use of a small block-size exact-match transform
- Adaptive in-loop deblocking filter
- Enhanced entropy coding methods