
EE 5359

Project Report

Spring 2008

Study and Comparison of MPEG 2 and H.264 main profiles and available transcoding methods

Priyanka Ankolekar

1000 51 4497


List of Acronyms

AVC: Advanced video coding

CABAC: Context-based adaptive binary arithmetic coding

CAVLC: Context-based adaptive variable length coding

DCT: Discrete Cosine Transform

GOP: Group of pictures

HDTV: High Definition Television

IDCT: Inverse DCT

IQ: Inverse quantization

ISO: International Organization for Standardization

ITU: International Telecommunication Union

JVT: Joint Video Team

M.E: Motion estimation

M.C: Motion compensation

MB: Macroblock

MV: Motion vector

NAL: Network abstraction layer

QP: Quantization parameter

VLC: Variable length coding

VLD: Variable length decoding

VCL: Video coding layer

VCEG: Video Coding Experts Group


Abstract

There is a high demand for multimedia applications like digital video recording and teleconferencing. This has led to the development of various video coding standards like MPEG-2 and H.264. The video coding layer of H.264 is superficially similar to that of MPEG-2; however, there are several differences in the details. In this project the MPEG-2 and H.264 video coding standards are compared with a concentration on the main profiles. H.264 gives a better compression performance than MPEG-2. However, MPEG-2 has already been widely used in the field of digital broadcasting, HDTV and DVD applications. This incompatibility problem between H.264 video source and the existing MPEG-2 decoders can be solved using transcoders. This project also discusses the criteria for efficient transcoding and a few transcoding architectures.


1 Introduction

Development of international video coding standards such as MPEG-2 [7][11][16][17] boosted a diverse range of multimedia applications, including digital video recording and teleconferencing. As a result of the growing demand for better compression performance, advanced standards such as H.264 [1][2][6][9][18] were developed by the ITU-T and ISO/IEC Joint Video Team (JVT) in 2003. The overall scheme of the video coding layer (VCL) of H.264 is superficially similar to the encoding scheme of MPEG-2. However, there are significant differences in the details. In this project the MPEG-2 and H.264 video coding standards are compared, i.e. their similarities and differences are studied, with a concentration on the main profiles.

H.264 can support various applications such as video broadcasting, video streaming and video conferencing over fixed and wireless networks and over different transport protocols. However, MPEG-2 has already been widely used in the field of digital broadcasting, HDTV and DVD applications. The incompatibility problem between H.264 video source and the existing MPEG-2 decoders can be solved by using transcoders. In this project, the criteria for transcoding and a few transcoding architectures are discussed.

The report has been structured in the following manner: Chapter 1 is an introduction to the topic and explains the scope of the project. Chapter 2 explains the various aspects of the MPEG-2 video coding standard while Chapter 3 covers the same for H.264 video coding standard. Chapter 4 shows a comparison between the two standards. In Chapter 5, the topic of MPEG-2 to H.264 transcoding is covered in greater detail.


2 MPEG-2

MPEG-2 is widely used as the format of digital television signals that are broadcast by terrestrial (over-the-air), cable, and direct broadcast satellite TV systems. It also specifies the format of movies and other programs that are distributed on DVD and similar discs. As such, TV stations, TV receivers, DVD players, and other equipment are often designed to this standard. MPEG-2 was the second of several standards developed by the Moving Picture Experts Group (MPEG) and is an international standard (ISO/IEC 13818). [16]

The video section, part 2 of MPEG-2, is similar to the previous MPEG-1 standard but also provides support for interlaced video, the format used by analog broadcast TV systems. MPEG-2 video is not optimized for low bit rates, especially below 1 Mbit/s at standard-definition resolutions. However, it outperforms MPEG-1 at 3 Mbit/s and above. MPEG-2 is directed at broadcast formats at higher data rates of 4 Mbit/s (DVD) and 19 Mbit/s (HDTV). All standards-compliant MPEG-2 video decoders are fully capable of playing back MPEG-1 video streams. MPEG-2 video is formally known as ISO/IEC 13818-2 and as ITU-T Rec. H.262 [21].

2.1 MPEG-2 Profiles and Levels

MPEG-2 video supports a wide range of applications, from mobile to high-quality HD editing. For many applications, it is unrealistic and too expensive to support the entire standard. To allow such applications to support only subsets of it, the standard defines profiles and levels. [21]

2.1.1 Description

MPEG-2 video is a family of systems, each having an arranged degree of commonality and compatibility. It allows four source formats, or ‘Levels’, to be coded, ranging from Limited Definition (about today’s VCR quality) to full HDTV, each with a range of bit rates [22]. The level defines the subset of quantitative capabilities such as maximum bit rate, maximum frame size, etc. [16]

In addition to this flexibility in source formats, MPEG-2 allows different ‘Profiles’. Each profile offers a collection of compression tools that together make up the coding system. A different profile means that a different set of compression tools is available. [22]


2.1.2 MPEG-2 Profiles

2.1.2.1 Simple Profile

This profile has the fewest tools. The Simple Profile offers the basic toolkit for MPEG-2 encoding: intra and predicted frame encoding and decoding with YUV 4:2:0 color subsampling.

2.1.2.2 Main Profile

This profile has all the tools of the Simple Profile plus one more, bi-directional prediction. It gives better quality than the Simple Profile at the same bit rate. A Main Profile decoder decodes both Main and Simple Profile-encoded pictures. This backward compatibility pattern applies to the succession of profiles. A refinement of the Main Profile, sometimes unofficially known as Main Profile Professional Level or MPEG 422, allows line-sequential color-difference signals (4:2:2) to be used, but not the scalable tools of the higher Profiles.

2.1.2.3 SNR Scalable Profile and Spatially Scalable Profile

The two Profiles after the Main Profile are, successively, the SNR Scalable Profile and the Spatially Scalable Profile. These add tools which allow the coded video data to be partitioned into a base layer and one or more ‘top-up’ signals. The top-up signals can improve either the noise (SNR scalability) or the resolution (spatial scalability). These scalable systems may have interesting uses. The lowest layer can be coded in a more robust way, and thus provide a means to broadcast to a wider area or to provide a service for more difficult reception conditions. Nevertheless, there is a premium to be paid for their use in receiver complexity. Owing to the added complexity, none of the Scalable Profiles is supported by digital video broadcasting (DVB). The inputs to the system are YUV component video. However, the first four profiles code the color-difference signals line-sequentially.

2.1.2.4 High Profile

The High Profile includes all the previous tools plus the ability to code line-simultaneous color-difference signals. In effect, the High Profile is a ‘super system’, designed for the most sophisticated applications, where there is no constraint on bit rate.

Table 1 is a tabulated form of the properties of the various MPEG-2 profiles.


Table 1. MPEG-2 Profiles [16]

Abbr.   | Name                       | Picture coding types | Chroma format  | Aspect ratios               | Scalable modes
SP      | Simple profile             | I, P                 | 4:2:0          | square pixels, 4:3, or 16:9 | none
MP      | Main profile               | I, P, B              | 4:2:0          | square pixels, 4:3, or 16:9 | none
SNR     | SNR Scalable profile       | I, P, B              | 4:2:0          | square pixels, 4:3, or 16:9 | SNR (signal-to-noise ratio) scalable
Spatial | Spatially Scalable profile | I, P, B              | 4:2:0          | square pixels, 4:3, or 16:9 | SNR- or spatial-scalable
HP      | High profile               | I, P, B              | 4:2:2 or 4:2:0 | square pixels, 4:3, or 16:9 | SNR- or spatial-scalable


2.1.3 MPEG-2 Levels

2.1.3.1 Description of Levels

A level defines the physical parameters, such as bit rates, picture sizes and resolutions, that an MPEG-2 bitstream may use. There are four levels specified by MPEG-2: High level, High 1440, Main level, and Low level. MPEG-2 Video Main Profile at Main Level has sampling limits at ITU-R 601 parameters (PAL and NTSC). Profiles limit syntax (i.e. algorithms) whereas Levels limit encoding parameters (sample rates, frame dimensions, coded bit rates, buffer size, etc.). Together, Video Main Profile and Main Level (abbreviated as MP@ML) keep complexity within current technical limits, yet still meet the needs of the majority of applications. MP@ML is the most widely accepted combination for most cable and satellite systems; however, different combinations are possible to suit other applications. [4]

Table 2 shows a comparison between the four MPEG-2 levels on the basis of the frame size (PAL/NTSC) and the maximum bit rate for each.

Table 2. MPEG-2 Levels [22]


2.2 MPEG-2 Encoder

Figure 1. MPEG 2 encoder [10]

The various blocks of the MPEG-2 encoder are explained below:

2.2.1 DCT

The MPEG-2 encoder uses an 8x8 2-D DCT. For intra frames, it is applied to 8x8 blocks of pels; for inter frames, it is applied to 8x8 blocks of the residual (motion-compensated prediction errors). Since the DCT is more efficient in compressing correlated sources, it compacts intra blocks more efficiently than inter (residual) blocks.

2.2.2 Quantizer

The DCT coefficients obtained above are then quantized by using a default or modified matrix. User defined matrices may be downloaded and can occur in the sequence header or in the quant matrix extension header. The quantizer step sizes for DC coefficients of the luminance and chrominance components are 8, 4, 2 and 1 according to the intra DC precision of 8, 9, 10 and 11 bits respectively.
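The relation between intra DC precision and step size stated above can be written down directly. The following is a minimal sketch, not the normative MPEG-2 procedure: the function names are hypothetical, and the round-to-nearest rule used for quantization is an assumption made for illustration.

```python
# Intra DC step sizes as listed in the text: precision (bits) -> step size.
INTRA_DC_STEP = {8: 8, 9: 4, 10: 2, 11: 1}


def quantize_intra_dc(dc_coefficient: int, precision_bits: int) -> int:
    """Quantize an intra DC coefficient (rounding rule is an assumption)."""
    step = INTRA_DC_STEP[precision_bits]
    magnitude = (abs(dc_coefficient) + step // 2) // step
    return magnitude if dc_coefficient >= 0 else -magnitude


def dequantize_intra_dc(level: int, precision_bits: int) -> int:
    """Rescale a quantized intra DC level back to coefficient range."""
    return level * INTRA_DC_STEP[precision_bits]


if __name__ == "__main__":
    for bits in (8, 9, 10, 11):
        level = quantize_intra_dc(1000, bits)
        print(bits, level, dequantize_intra_dc(level, bits))
```

Each extra bit of precision halves the step size, so the listed values follow step = 2^(11 - precision).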


2.2.3 Motion estimation and compensation

In the motion estimation process, motion vectors for predicted and interpolated pictures are coded differentially between macroblocks. The two motion vector components are coded independently, the horizontal component first and then the vertical component. The motion compensation process forms predictions from previously decoded pictures using motion vectors of integer and half-pel resolution.
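Differential coding of motion vectors between macroblocks can be illustrated with a short sketch. It assumes vectors are stored as integers in half-pel units and that the predictor starts at (0, 0); the predictor reset rules and the variable length codes that MPEG-2 actually applies are left out, and all function names are hypothetical.

```python
def encode_mv_differences(motion_vectors):
    """Turn a list of (horizontal, vertical) vectors into differences.

    Each component is differenced independently against the vector of the
    previous macroblock; the initial predictor (0, 0) is an assumption.
    """
    predictor = (0, 0)
    differences = []
    for horizontal, vertical in motion_vectors:
        differences.append((horizontal - predictor[0], vertical - predictor[1]))
        predictor = (horizontal, vertical)
    return differences


def decode_mv_differences(differences):
    """Invert encode_mv_differences by accumulating the differences."""
    predictor = (0, 0)
    motion_vectors = []
    for d_horizontal, d_vertical in differences:
        predictor = (predictor[0] + d_horizontal, predictor[1] + d_vertical)
        motion_vectors.append(predictor)
    return motion_vectors


def split_half_pel(component: int):
    """Split a half-pel-unit component into (integer pels, has half-pel part)."""
    return component >> 1, bool(component & 1)


if __name__ == "__main__":
    vectors = [(3, -2), (5, -2), (4, 1)]
    diffs = encode_mv_differences(vectors)
    print(diffs, decode_mv_differences(diffs))
```

Neighboring macroblocks tend to move together, so the differences are usually small and cheap to entropy code.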

2.2.4 Coding decisions

There are four different coding modes in MPEG-2. The mode depends on whether the encoder codes a frame picture as a frame or as two fields; in the case of interlaced pictures, it can choose to code the picture as two fields or to use 16x8 motion compensation.

2.2.5 Scanning and VLC

The quantized transform coefficients are scanned and converted to a one-dimensional array. Two scanning methods are available:

a. Zigzag scan (Figure 2(a)): for progressive (non-interlaced) mode processing
b. Alternate scan (Figure 2(b)): for interlaced format video

Figure 2. (a) Zigzag scan pattern (4x4) [4] (b) Alternate scan pattern (4x4)


Figure 3. Scan matrices in MPEG-2 [20] (8x8) (a) Zigzag scan (b) Alternate scan

The list of values produced by scanning is then entropy coded using a variable length code (VLC).
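The zigzag scan described above can be generated programmatically rather than stored as a table. The sketch below builds the zigzag order for any square block size and uses it to flatten a block; the function names are hypothetical, and the alternate scan is not covered because its pattern is table-defined.

```python
def zigzag_order(n: int = 8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run from bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def zigzag_scan(block):
    """Flatten a square 2-D block into a 1-D list in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


if __name__ == "__main__":
    # Fill a 4x4 block with its raster index to make the scan order visible.
    block = [[4 * r + c for c in range(4)] for r in range(4)]
    print(zigzag_scan(block))
```

Because quantization tends to zero the high-frequency coefficients, this ordering groups the non-zero values at the start of the list and leaves long runs of zeros at the end, which suits run-length based VLC.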


2.3 MPEG-2 Decoder

Figure 4. MPEG 2 Decoder [7]

At the decoder side, the quantized DCT coefficients are reconstructed and inverse transformed to produce the prediction error. This prediction error is then added to the motion-compensated prediction generated from a previously decoded picture to produce the reconstructed output.

The various parts of the MPEG-2 decoder are:

2.3.1 Variable length decoding

This process uses one table for decoding intra DC coefficients and three further tables, one each for non-intra DC coefficients, intra AC coefficients and non-intra AC coefficients. Each decoded value indicates one of three courses of action: end of block, normal coefficient or escape coding.

2.3.2 Inverse scan

The output of the variable length decoding stage is one-dimensional and of length 64. The inverse scan process converts this one-dimensional data into a two-dimensional array of coefficients according to a predefined scan matrix.

2.3.3 Inverse quantization

At this stage the two-dimensional DCT coefficients are inverse quantized to produce the reconstructed DCT coefficients. This process rescales the coefficients, essentially by multiplying them by the quantizer step size. The quantizer step size can be modified by using either a weighting matrix or a scale factor. After inverse quantization, saturation and mismatch control operations are performed.

2.3.4 Inverse DCT

Once the reconstructed DCT coefficients are obtained, a 2-D 8x8 inverse DCT is applied to obtain the inverse transformed values. These values are then saturated to keep them in the range [-256, +255].

2.3.5 Motion Compensation

During this stage, predictions from previously decoded pictures are combined with the inverse DCT transformed coefficient data to get the final decoded output.
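The last two decoder stages can be sketched together: the inverse DCT output is saturated to [-256, +255] and then added to the motion-compensated prediction. The final clipping of the sum to [0, 255] assumes 8-bit samples, which is an assumption of this sketch, and the function names are hypothetical.

```python
def saturate(value: int, low: int, high: int) -> int:
    """Clamp value to the closed range [low, high]."""
    return max(low, min(high, value))


def reconstruct_block(prediction, idct_output):
    """Add saturated IDCT output to the prediction, sample by sample."""
    reconstructed = []
    for prediction_row, residual_row in zip(prediction, idct_output):
        row = []
        for predicted, residual in zip(prediction_row, residual_row):
            residual = saturate(residual, -256, 255)  # range stated in 2.3.4
            row.append(saturate(predicted + residual, 0, 255))  # 8-bit assumption
        reconstructed.append(row)
    return reconstructed


if __name__ == "__main__":
    print(reconstruct_block([[250, 10, 128]], [[20, -30, 300]]))
```

For intra-coded blocks there is no prediction, which corresponds to passing an all-zero prediction block in this sketch.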


3 H.264

H.264/AVC [1][9] was developed by the JVT (Joint Video Team) to achieve MPEG-2 [7] quality compression at almost half the bit rate. H.264/AVC provides significant coding efficiency, simple syntax specifications, and seamless integration of video coding into all current protocols and multiplex architectures. H.264 supports various applications such as video broadcasting, video streaming, and video conferencing over fixed and wireless networks and over different transport protocols. [4]

The H.264 video coding standard has the same basic functional elements as previous standards (MPEG-1, MPEG-2, MPEG-4 part 2, H.261, H.263) [23], i.e., a transform for reduction of spatial correlation, quantization for bit rate control, motion-compensated prediction for reduction of temporal correlation, and entropy encoding for reduction of statistical correlation. However, to achieve better coding performance, H.264 makes important changes in the details of each functional element by including intra-picture prediction, a new 4x4 integer transform, multiple reference pictures, variable block sizes and quarter-pel precision for motion compensation, a deblocking filter, and improved entropy coding. [1]

3.1 H.264 Profiles

Each Profile specifies a subset of the entire bitstream syntax and limits that shall be supported by all decoders conforming to that Profile. There are three Profiles in the first version: Baseline, Main, and Extended. The Baseline Profile is intended for real-time conversational services such as video conferencing and videophone. The Main Profile is designed for digital storage media and television broadcasting. The Extended Profile is aimed at multimedia services over the Internet. There are also four High Profiles defined in the fidelity range extensions [19] for applications such as content contribution, content distribution, and studio editing and post-processing: High, High 10, High 4:2:2, and High 4:4:4. The High Profile supports 8-bit video with 4:2:0 sampling for applications using high resolution. The High 10 Profile supports 4:2:0 sampling with up to 10 bits of representation accuracy per sample. The High 4:2:2 Profile supports up to 4:2:2 chroma sampling and up to 10 bits per sample. The High 4:4:4 Profile supports up to 4:4:4 chroma sampling, up to 12 bits per sample, and an integer residual color transform for coding RGB signals. The Profiles have both common coding parts and specific coding parts, as shown in Figure 5. [1]


3.1.1 Common Parts of All Profiles

3.1.1.1 I slice (Intra-coded slice)

This slice is coded by using prediction only from decoded samples within the same slice.

3.1.1.2 P slice (Predictive-coded slice)

This slice (Figure 6) is coded by using inter prediction from previously-decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block.

3.1.1.3 CAVLC (Context-based Adaptive Variable Length Coding)

This is used for entropy coding. After transform and quantization, the probability that the level of a coefficient is zero or +/-1 is very high. CAVLC therefore handles the zero and +/-1 coefficients differently from the other coefficient levels. The total numbers of zero and +/-1 coefficients are coded. For the other coefficients, their levels are coded.

3.1.2 Baseline Profile

3.1.2.1 Flexible macroblock order

Macroblocks need not be in raster scan order. A map assigns each macroblock to a slice group.

3.1.2.2 Arbitrary slice order

The macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture.

3.1.2.3 Redundant slice

A redundant slice carries redundant coded data for a slice, obtained at the same or a different coding rate compared with the previously coded data of that slice.


Figure 5. The specific coding parts of the Profiles in H.264 [1].

3.1.3 Main Profile

3.1.3.1 B slice (Bi-directionally predictive-coded slice)

This slice (Figure 6) is coded by using inter prediction from previously-decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block.

3.1.3.2 Weighted prediction

This is a scaling operation performed by applying a weighting factor to the samples of motion-compensated prediction data in a P or B slice. A prediction signal p for a B slice is obtained from two reference signals, r1 and r2, using different weights.

Equation 1 [1]: p = w1 r1 + w2 r2

where w1 and w2 are weights.
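Equation 1 can be tried out on small sample arrays. The sketch below applies it sample by sample with floating-point weights; the rounding and the clipping to [0, 255] are assumptions for 8-bit video, the function name is hypothetical, and the integer weight and offset arithmetic that an actual H.264 codec uses is not reproduced here.

```python
import math


def weighted_prediction(r1, r2, w1=0.5, w2=0.5, max_value=255):
    """Compute p = w1*r1 + w2*r2 for two equally sized 2-D sample blocks."""
    prediction = []
    for row1, row2 in zip(r1, r2):
        row = []
        for a, b in zip(row1, row2):
            value = math.floor(w1 * a + w2 * b + 0.5)  # round to nearest
            row.append(min(max_value, max(0, value)))  # clip to sample range
        prediction.append(row)
    return prediction


if __name__ == "__main__":
    r1 = [[100, 200]]
    r2 = [[200, 100]]
    print(weighted_prediction(r1, r2))              # equal weights (plain average)
    print(weighted_prediction(r1, r2, 0.75, 0.25))  # unequal weights, e.g. during a fade
```

With w1 = w2 = 0.5 the function reduces to the plain averaging used for bi-directional prediction in earlier standards.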


3.1.3.3 CABAC (Context-based Adaptive Binary Arithmetic Coding)

This is used for entropy coding. It utilizes arithmetic coding in order to achieve good compression.

Figure 6. Illustration of temporal prediction (B and P slices)

3.1.4 Extended Profile

This profile includes all parts of Baseline Profile: flexible macroblock order, arbitrary slice order, and redundant slice. The other features of this profile are:

3.1.4.1 SP slice

An SP slice is a specially coded slice for efficient switching between video streams; its coding is similar to that of a P slice.

3.1.4.2 SI slice

An SI slice is a switching slice whose coding is similar to that of an I slice.

3.1.4.3 Data partition

The coded data is placed in separate data partitions, and each partition can be placed in a different layer unit.


3.1.4.4 B slice

H.264 generalizes the concept of bidirectional prediction and supports not only forward/backward prediction pairs but also forward/forward and backward/backward pairs.

3.1.4.5 Weighted prediction

Earlier standards use equal weights for the reference pictures, i.e. a prediction signal is obtained by averaging the reference signals with equal weights. Gradual transitions from scene to scene, however, need different weights. H.264 uses a weighted prediction method for a macroblock of a P slice or B slice.

3.1.5 High Profiles

The High Profiles include all parts of the Main Profile: B slice, weighted prediction, and CABAC. The salient features of these profiles are:

3.1.5.1 Adaptive transform block size

H.264 uses an adaptive transform block size, 4 x 4 and 8 x 8 (High Profiles only), whereas previous video coding standards used the 8 x 8 DCT. The smaller block size leads to a significant reduction in ringing artifacts. Also, the 4 x 4 transform has the additional benefit of removing the need for multiplications. [1]

3.1.5.2 Quantization scaling matrices

The quantization process can scale each transform coefficient differently according to its frequency in order to optimize the subjective quality. The High Profiles support perceptual-based quantization scaling matrices similar to those used in MPEG-2. The encoder can specify a matrix of scaling factors, one per transform coefficient frequency, for use in inverse quantization scaling by the decoder. This allows optimization of the subjective quality according to the sensitivity of the human visual system, which is less sensitive to coding errors in high-frequency transform coefficients.

Table 3 shows a comparison between the baseline, extended, main and high profiles of H.264.


Table 3. Comparison chart for the various profiles of H.264 [18]

Feature                            | Baseline | Extended | Main | High
I and P Slices                     | Yes      | Yes      | Yes  | Yes
B Slices                           | No       | Yes      | Yes  | Yes
SI and SP Slices                   | No       | Yes      | No   | No
Multiple Reference Frames          | Yes      | Yes      | Yes  | Yes
In-Loop Deblocking Filter          | Yes      | Yes      | Yes  | Yes
CAVLC Entropy Coding               | Yes      | Yes      | Yes  | Yes
CABAC Entropy Coding               | No       | No       | Yes  | Yes
Flexible Macroblock Ordering (FMO) | Yes      | Yes      | No   | No
Arbitrary Slice Ordering (ASO)     | Yes      | Yes      | No   | No
Redundant Slices (RS)              | Yes      | Yes      | No   | No


3.2 H.264 Encoder

Figure 7. H.264 encoder [9]

The encoder blocks are explained below:

3.2.1.1 4x4 Integer transform

H.264 employs a 4x4 integer transform, an integer approximation of the DCT, in place of the 8x8 DCT adopted by the previous standards. The smaller block size leads to a significant reduction in ringing artifacts. The 4x4 transform also has the additional benefit of removing the need for multiplications.
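The claim that the transform needs no multiplications follows from its coefficients. The sketch below uses the forward core transform matrix commonly given in the H.264 literature (it is not reproduced in this report), whose entries are only +/-1 and +/-2, so every product can be realized with additions and shifts. The scaling that H.264 folds into the quantizer is left out, and the helper names are hypothetical.

```python
# Forward core transform matrix as commonly cited for H.264 (assumption:
# taken from the general literature, not from this report).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Return the transpose of a 4x4 matrix."""
    return [list(column) for column in zip(*m)]


def forward_core_transform(block):
    """Compute Y = CF * X * CF^T for a 4x4 residual block X."""
    return matmul(matmul(CF, block), transpose(CF))


if __name__ == "__main__":
    flat = [[1] * 4 for _ in range(4)]
    for row in forward_core_transform(flat):
        print(row)
```

A constant block produces a single non-zero coefficient in the top-left (DC) position, which illustrates the energy compaction that the transform shares with the DCT.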

3.2.1.2 Quantization and scan

The H.264 standard specifies the mathematical formulae of the quantization process. The scale factor for each element in each sub-block varies as a function of the quantization parameter associated with the macroblock and as a function of the position of the element within the sub-block. The rate control algorithm controls the value of the quantization parameter. Two types of scan pattern are used for 4x4 blocks: one for frame-coded macroblocks and one for field-coded macroblocks.
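The dependence of the step size on the quantization parameter can be sketched numerically. The code below assumes the commonly cited H.264 relationship, in which QP ranges from 0 to 51 and the step size doubles for every increase of 6 in QP; the base values are those usually tabulated in the literature, not taken from this report, and the position-dependent part of the scale factor is ignored.

```python
# Step sizes for QP = 0..5 as commonly tabulated (assumption, see lead-in).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Return the quantizer step size for a quantization parameter 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))


if __name__ == "__main__":
    for qp in (0, 4, 10, 16, 28, 51):
        print(qp, qstep(qp))
```

Doubling every six steps corresponds to an average increase of roughly 12% per QP step, which is in line with the rate quoted for H.264 in Table 4.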

3.2.1.3 Context-based adaptive variable length coding (CAVLC) and Context-based adaptive binary arithmetic coding (CABAC) entropy coding

H.264 uses different entropy coding methods in order to match a symbol to a code based on the context characteristics: context-based adaptive variable length coding (CAVLC) and context-based adaptive binary arithmetic coding (CABAC). All syntax elements except the residual data are encoded by Exp-Golomb codes. To read the residual data (quantized transform coefficients), a zig-zag scan (frame-coded macroblocks) or an alternate scan (field-coded macroblocks) is used. For coding the residual data, a more sophisticated method, CAVLC, is employed. CABAC is also available in the Main and High profiles; it has higher coding efficiency but also higher complexity than CAVLC.
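The Exp-Golomb codes mentioned above have a simple regular structure: the value plus one is written in binary and prefixed with as many zeros as there are bits after the leading one. The sketch below works on bit strings for readability; the function names are hypothetical, and which syntax elements use the unsigned or signed form is not covered here.

```python
def ue_encode(code_num: int) -> str:
    """Encode a non-negative integer as an unsigned Exp-Golomb bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    binary = bin(code_num + 1)[2:]            # binary form of code_num + 1
    return "0" * (len(binary) - 1) + binary   # zero prefix, then the value


def ue_decode(bits: str) -> int:
    """Decode one unsigned Exp-Golomb codeword from the start of a bit string."""
    leading_zeros = len(bits) - len(bits.lstrip("0"))
    return int(bits[leading_zeros:2 * leading_zeros + 1], 2) - 1


def se_to_code_num(value: int) -> int:
    """Map a signed value to a code number (positive values come first)."""
    return 2 * value - 1 if value > 0 else -2 * value


if __name__ == "__main__":
    for n in range(5):
        print(n, ue_encode(n))
```

Small values get short codewords, so the code suits syntax elements whose small values are the most probable.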

3.2.1.4 Deblocking filter

H.264 employs a deblocking filter to reduce blocking artifacts at block boundaries and to stop the propagation of accumulated coding noise. The filter is applied in the encoder after the inverse transform (before reconstructing and storing the macroblock for future predictions) and in the decoder (before reconstructing and displaying the macroblocks). The deblocking filter is applied across the edges of the macroblocks and the sub-blocks. The filtered image is used in motion-compensated prediction of future frames and helps achieve more compression.

Figure 8. Diagram depicting how the loop filter works on the edges of the blocks and sub-blocks [4]


3.2.1.5 Intra prediction

During intra prediction, the encoder derives a predicted block from previously decoded neighboring samples. The predicted block is then subtracted from the current block, and the difference is encoded. There are a total of nine prediction modes (Figure 9) for each 4x4 luma block, four prediction modes for each 16x16 luma block and four modes for each chroma block.
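Three of the nine 4x4 luma modes (vertical, horizontal and DC) are simple enough to sketch in a few lines. In the code below the predicted block is formed from the row of samples above and the column of samples to the left of the current block, and the mode is chosen by the sum of absolute differences (SAD). The SAD criterion and all function names are assumptions for illustration, since mode selection is an encoder choice; the six directional modes are omitted.

```python
def predict_4x4(mode: str, above, left):
    """Form a 4x4 predicted block from 4 samples above and 4 to the left."""
    if mode == "vertical":        # copy the row above downwards
        return [list(above) for _ in range(4)]
    if mode == "horizontal":      # copy the left column across
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":              # rounded mean of the 8 neighbors
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode: " + mode)


def residual(block, predicted):
    """Subtract the predicted block from the current block."""
    return [[b - p for b, p in zip(brow, prow)]
            for brow, prow in zip(block, predicted)]


def best_mode(block, above, left, modes=("vertical", "horizontal", "dc")):
    """Pick the mode with the smallest SAD (selection rule is an assumption)."""
    def sad(mode):
        diff = residual(block, predict_4x4(mode, above, left))
        return sum(abs(v) for row in diff for v in row)
    return min(modes, key=sad)


if __name__ == "__main__":
    above, left = [10, 20, 30, 40], [10, 10, 10, 10]
    block = [list(above) for _ in range(4)]   # vertical stripes
    print(best_mode(block, above, left))
```

A good prediction leaves a residual that is mostly zero, which the transform and entropy coder then represent with very few bits.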

Figure 9. Intra prediction 4x4 [31]

3.2.1.6 Inter prediction

Inter prediction is performed on the basis of temporal correlation and consists of motion estimation and motion compensation. As compared to the previous standards, H.264 supports a large number of block sizes from 16x16 to 4x4. Moreover H.264 supports motion vector accuracy of one-quarter of the luma sample.

3.2.1.7 Reference pictures

Unlike the previous standards that just use the immediate previous I or P picture for inter prediction, H.264 has the ability to use more than one previous reference picture for inter prediction thus enabling the encoder to search for the best match for the current picture from a wider set of reference pictures than just the previously encoded one.


3.3 H.264 Decoder

Figure 10 shows the block diagram of a general H.264/MPEG-4 AVC decoder.

Figure 10. H.264 decoder [7]

The bitstream supplied to the decoder includes all the control information, such as picture or slice type, macroblock types and subtypes, reference frame indices, motion vectors, loop filter control and quantizer step size, as well as coded data comprising quantized transform coefficients. The decoder of Figure 10 works similarly to the local decoder at the encoder; a simplified description is as follows. After entropy (CABAC or CAVLC) decoding, the transform coefficients are inverse scanned and inverse quantized prior to being inverse transformed. To the resulting 4x4 blocks of residual signal, an appropriate prediction signal (intra or motion-compensated inter) is added, depending on the macroblock type mbtyp (and submbtype), the reference frame, the motion vector(s) and the decoded picture store. The reconstructed video frames undergo deblocking filtering prior to being stored for future use in prediction. The frames at the output of the deblocking filter may need to undergo reordering prior to display. [2]


4 Comparison between MPEG-2 and H.264

4.1 Key features of MPEG-2 video

The MPEG-2 coding standard has been designed to efficiently support both interlaced and progressive video coding and to produce high-quality standard-definition video at about 4 Mbps. The MPEG-2 video standard uses a block-based hybrid transform coding algorithm that employs transform coding of the motion-compensated prediction error. While motion compensation exploits temporal redundancies, the DCT exploits the spatial redundancies. The asymmetric encoder-decoder complexity allows for a simpler decoder while maintaining high quality and efficiency through a more complex encoder. [3]

4.2 Key features of H.264 video

The H.264 video coding standard has been developed recently through the joint work of the ITU’s video coding experts group (VCEG) and ISO moving pictures experts group (MPEG). The H.264 video coding standard is flexible and offers a number of tools to support a range of applications with very low as well as very high bitrate requirements.

4.3 Comparison: Similarities and Differences between MPEG-2 video and H.264 video

In this section, the MPEG-2 and H.264 video coding standards are compared with respect to aspects such as bit rate, block size, macroblock size, intra prediction, motion estimation blocks, quantization and motion vector prediction, among others. Table 4 tabulates these comparisons systematically. They are further elaborated in the subsections below.

4.3.1 Increased efficiency

Compared with MPEG-2 video, the H.264 video format gives perceptually equivalent video at 1/3 to 1/2 of the MPEG-2 bit rates. Some extensions, known as the "Fidelity Range Extensions", facilitate higher-fidelity video coding by supporting higher bit depths, including 10-bit and 12-bit encoding, and higher color resolution using the sampling structures YUV 4:2:2 and YUV 4:4:4. The bit rate saving naturally makes H.264 attractive to video distributors, because it permits them to maximize the number of services that may be contained in a given amount of bandwidth [30].


The bit rate gains are not a result of any single feature but of a combination of a number of encoding tools. These gains come with a significant increase in encoding and decoding complexity [27]. In spite of the increased complexity, the dramatic bandwidth savings encourage TV broadcasters to adopt the new technology, as they can use the savings to provide new channels or new data and interactive services. With the coding gains of H.264, full-length HDTV-resolution movies can be stored on DVDs. Furthermore, the same video coding format can be used for broadcast TV as well as for internet streaming.

4.3.2 Coding flexibility

ISO14496-10/H.264, like previous MPEG standards, does not define a specific encoder and decoder. Instead, it defines the syntax of an encoded bitstream and describes the method of decoding that bitstream. The implementation is left to the developer. [31]

H.264 uses the same hybrid coding approach as the other MPEG video standards, motion-compensated transform coding, but differs significantly from MPEG-2 in the actual coding tools used. The main differences are: the use of an integer transform with energy compaction properties similar to those of the DCT instead of the DCT itself, an in-loop deblocking filter (DF) to reduce block artifacts, and intra frame prediction (IFP). The coder control operation is responsible for functions such as reference frame management, coding mode selection, and managing the encoding parameter set. In addition, the H.264 standard introduces several other new coding tools that improve coding efficiency.

Multiple reference picture motion compensation uses previously encoded pictures more flexibly than does MPEG-2. In MPEG-2, a P-frame can use only a single previously coded frame to predict the motion compensation values for an incoming picture, while a B-frame can use only the immediately previous P- or I-frame and the immediately subsequent P- or I-frame.

H.264 permits the use of up to 32 previously coded pictures, and it supports more flexibility in the selection of motion compensation block sizes and shapes, down to the use of a luma compensation block as small as 4-by-4 pixels. H.264 also supports quarter-sample motion compensation vector accuracy, as opposed to MPEG-2's half-sample accuracy.


These refinements permit more precise segmentation of moving areas within the image, and more precise description of movement. Further, in H.264, the motion-compensated prediction signal may be weighted and offset by the encoder, facilitating significantly improved performance in fades (fades can be problematic for MPEG-2).

4.3.3 Deblocking filter

Block-based coding can generate blocking artifacts in the decoded pictures. In H.264, a de-blocking filter is brought within the motion-compensated prediction loop, so that this filtering may be used to predict an expanded number of pictures (Figure 8).

Switching slices, which permit a decoder to jump between bitstreams in order to smoothly change bit-rates or do stunt modes without requiring all streams to send an I-frame at the switch point (making the decoder's job easier at switch points), have been incorporated.


Table 4. Comparison between MPEG-2 and H.264

| Algorithm characteristic | MPEG-2 | H.264 |
|---|---|---|
| General | Motion compensated predictive, residual transformed, entropy coded | Same basic structure as MPEG |
| Block size | 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4 |
| Macroblock size | 16x16 (frame mode), 16x8 (field mode) | 16x16 |
| Intra prediction | None | Multi-direction, multi-pattern |
| Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size increasing at the rate of 12.5% |
| Entropy coding | VLC | CAVLC, CABAC |
| Weighted prediction | No | Yes |
| Reference picture | One picture | Multiple pictures |
| Motion estimation blocks | 16x16 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4 |
| Motion vector prediction | Simple | Uses median and segmented |
| Entropy coding | Multiple VLC tables | Arithmetic coding and adaptive VLC tables |
| Frame distance for prediction | +/- 1 | Unlimited forward/backward |
| Fractional motion estimation | 1/2 pixel | 1/4 pixel |
| Deblocking filter | None | Dynamic edge filters |
| Scalable coding support [2] | Yes, layered picture spatial, SNR, temporal scalability | With some support on temporal and SNR scalability |
| Bit rates with same quality, HD video with resolution 1920 x 1080 | 12 – 20 Mbps | 7 – 8 Mbps |
| Transmission rate | 2 – 15 Mbps | 64 kbps – 150 Mbps |
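The quantization row of Table 4 can be made concrete with a short sketch. It assumes the commonly tabulated H.264 step sizes for QP 0 to 5 and the rule that the step size doubles for every increase of 6 in QP; the names `BASE_QSTEP` and `qstep` are illustrative. Under that rule the average growth per QP step is 2^(1/6) - 1, roughly 12%, which is the figure the table rounds to 12.5%.

```python
# Step sizes for QP 0..5 as commonly tabulated for H.264 (assumed here).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantizer step size for a QP in 0..51; doubles every 6 QP values."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


# Average relative growth of the step size per unit increase in QP.
avg_growth = 2 ** (1 / 6) - 1

steps = [qstep(qp) for qp in (4, 10, 16, 51)]
```

By contrast, the MPEG-2 linear quantizer scale grows by a constant increment, so its relative step change shrinks as the scale gets larger.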

4.3.4 Performance comparison between MPEG-2 and H.264 using standard test streams – Simulation results

Test streams (Foreman, News and Carphone [26]) were encoded using the open-source MPEG-2 codec [25] and the H.264 codec [24]. The results were compared for the same GOP structure in terms of the peak signal to noise ratio (PSNR) and the compression ratio. CIF files were used for the “Foreman” and the “News” clips, whereas QCIF was used for the “Carphone” clip. The bit rate for H.264 encoding was the default one used by the codec. The bit rate for MPEG-2 encoding was then adjusted to match the bit rate of the H.264 encoding process, so that the two standards could be compared on a common basis. While the aim of this project is to compare the Main profiles of MPEG-2 and H.264, simulations were run for the Simple/Baseline profiles too. This was done in order to compare quantitatively the compression ratio and video quality obtained with the Main profile against those obtained with the Simple profile, for both MPEG-2 and H.264. Tables 5, 6 and 7 tabulate the results obtained after running the simulations. Figures 11, 12 and 13 show screen shots of the encoded videos (only for the Main profiles). Section 4.3.5 explains the conclusions drawn on the basis of the results obtained from the simulations.
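For reference, the PSNR values reported in the tables are of the kind computed by the following minimal sketch. It assumes 8-bit samples (peak value 255) and takes one colour plane (Y, U or V) as a flat list; the function name `psnr` and the sample values are illustrative and are not taken from the codecs used in this project.

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equally long sample sequences of one plane."""
    if len(original) != len(decoded) or not original:
        raise ValueError("planes must be non-empty and of equal length")
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical planes
    return 10 * math.log10(peak ** 2 / mse)


# Four luma samples before and after a lossy encode/decode round trip.
orig_y = [52, 55, 61, 66]
dec_y = [54, 55, 60, 65]
value = psnr(orig_y, dec_y)
```

A per-sequence figure is usually obtained by averaging the per-frame values over all encoded frames, which is assumed to be how the tabulated numbers should be read.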

4.3.5 Conclusion

From the tables below, the following is concluded:

For the same bit rate and video resolution, the PSNR (dB) values are greater for H.264 encoded videos than for the MPEG-2 encoded videos, indicating better video quality. This can be verified from the screen shots.

The compression ratio for H.264 encoded video is also better than that for MPEG-2 encoded video for the Main profile, in spite of the better video quality, and is equal or slightly better for the Simple/Baseline profiles.

Compression ratio = original file size / compressed file size

The video quality for H.264 video is better than for MPEG-2 video for the Simple/Baseline profiles as well. Therefore, it can be concluded that the H.264 video coding standard gives better compression and better video quality as compared to MPEG-2.
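As a rough cross-check of the compression ratio formula, the sketch below estimates the ratio from bit rates rather than file sizes. It assumes 8-bit 4:2:0 source video (1.5 samples per pixel), which is an assumption about the test streams, and the function names are illustrative.

```python
def raw_bit_rate(width, height, fps, bits_per_sample=8, samples_per_pixel=1.5):
    """Uncompressed bit rate in bits/second; 1.5 samples per pixel models 4:2:0."""
    return width * height * samples_per_pixel * bits_per_sample * fps


def compression_ratio(original_size, compressed_size):
    """Compression ratio = original size / compressed size (same units)."""
    return original_size / compressed_size


cif_raw = raw_bit_rate(352, 288, 30)          # uncompressed CIF at 30 fps
ratio = compression_ratio(cif_raw, 481_000)   # against a 481 kbit/s stream
```

Under these assumptions the estimate for CIF at 481 kbit/s is close to 76:1, the same order as the 74:1 and 78:1 reported for Foreman in Table 5. The tabulated ratios are based on file sizes, so an exact match is not expected.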

Table 5. Performance comparison between MPEG-2 and H.264 main/simple profiles - Foreman

| Parameter | Main profile, MPEG-2 | Main profile, H.264 | Simple profile, MPEG-2 | Simple profile, H.264 |
|---|---|---|---|---|
| Input video resolution | 352 x 288 (CIF) | 352 x 288 (CIF) | 352 x 288 (CIF) | 352 x 288 (CIF) |
| fps | 30 | 30 | 30 | 30 |
| # frames encoded | 90 | 90 | 90 | 90 |
| GOP | I-P-B-B-P-B-B | I-B-B-P-B-B-P | I-P-P-P | I-P-P-P |
| PSNR (Y) (dB) | 30.42 | 37.03 | 32.1 | 37.4 |
| PSNR (U) (dB) | 39.1 | 41.08 | 39.01 | 41 |
| PSNR (V) (dB) | 39.6 | 43.81 | 40.2 | 43.6 |
| Bit rate (kbits/second) | 481.00 | 481.06 | 561.0 | 561.01 |
| Compression ratio | 74:1 | 78:1 | 65:1 | 65:1 |

Figure 11. (a) Foreman – MPEG-2 (main profile) encoding (b) Foreman – H.264 (main profile) encoding

Table 6. Performance comparison between MPEG-2 and H.264 main/simple profiles – News

| Parameter | Main profile, MPEG-2 | Main profile, H.264 | Simple profile, MPEG-2 | Simple profile, H.264 |
|---|---|---|---|---|
| Input video resolution | 352 x 288 (CIF) | 352 x 288 (CIF) | 352 x 288 (CIF) | 352 x 288 (CIF) |
| fps | 30 | 30 | 30 | 30 |
| # frames encoded | 90 | 90 | 90 | 90 |
| GOP | I-P-B-B-P-B-B | I-B-B-P-B-B-P | I-P-P-P | I-P-P-P |
| PSNR (Y) (dB) | 37.02 | 39.1 | 34.01 | 39 |
| PSNR (U) (dB) | 37.02 | 41.0 | 39.1 | 41 |
| PSNR (V) (dB) | 39.02 | 42.0 | 39.7 | 42 |
| Bit rate (kbits/second) | 376.00 | 376.00 | 380.8 | 380.8 |
| Compression ratio | 94:1 | 99.7:1 | 95.5:1 | 95.5:1 |

Figure 12. (a) News – MPEG-2 (main profile) encoding (b) News – H.264 (main profile) encoding

Table 7. Performance comparison between MPEG-2 and H.264 main/simple profiles – Carphone

| Parameter | Main profile, MPEG-2 | Main profile, H.264 | Simple profile, MPEG-2 | Simple profile, H.264 |
|---|---|---|---|---|
| Input video resolution | 176 x 144 (QCIF) | 176 x 144 (QCIF) | 176 x 144 (QCIF) | 176 x 144 (QCIF) |
| fps | 30 | 30 | 30 | 30 |
| # frames encoded | 90 | 90 | 90 | 90 |
| GOP | I-P-B-B-P-B-B | I-B-B-P-B-B-P | I-P-P-P | I-P-P-P |
| PSNR (Y) (dB) | 30.46 | 37.6 | 31.6 | 38 |
| PSNR (U) (dB) | 36.36 | 40.9 | 39 | 40.8 |
| PSNR (V) (dB) | 36.5 | 41.5 | 39 | 41.3 |
| Bit rate (kbits/second) | 128 | 127.6 | 147.3 | 147.3 |
| Compression ratio | 69.6:1 | 72.6:1 | 60.8:1 | 61.9:1 |

Figure 13. (a) Carphone – MPEG-2 (main profile) encoding (b) Carphone – H.264 (main profile) encoding

5 Transcoding methods

5.1 Introduction to transcoding

In the fast growing world of multimedia and telecommunications there is a great demand for efficient usage of the available bandwidth. With the growth of technology, the number of networks, types of devices and content representation formats keeps increasing, so interoperability between different systems and networks is gaining in importance. Transcoding of video content is one effort in this direction. In addition, a transcoder can be used to insert new information, for example company logos and watermarks, as well as error resilience features, into a compressed video stream. Transcoding techniques are also useful in supporting VCR trick modes such as fast-forward and reverse play for on-demand applications [4].

Technically, transcoding is the coding and recoding of digital content from one compressed format to another to enable transmission over different media and playback over various devices [29].

This raises the question of why H.264/AVC to MPEG-2 transcoding is needed [14] [15]. In order to provide better compression of video as compared to previous standards, H.264/AVC was recently developed by the JVT (Joint Video Team). This new standard offers significantly higher coding efficiency, simple syntax specifications and seamless integration of video coding into all current protocols and multiplex architectures. The H.264 specification represents a significant advancement in the field of video coding technology by providing video quality comparable to MPEG-2 at, on average, half the required bandwidth. Although widespread use of H.264 is anticipated, many legacy systems, including digital TVs and home receivers, still use MPEG-2. This leads to the need for an efficient architecture that exploits the lower bandwidth cost of H.264 video without requiring a significant investment in additional video coding hardware.

Figure 14. H.264 to MPEG-2 transcoder applications [12]

5.2 How is transcoding done – the basic process

The simplest approach to transcoding is to completely decode the MPEG-2 bit stream and then re-encode it with an H.264 encoder. The decode operation can be performed either externally or as a part of the H.264 encoder. System issues, such as handling SCTE-35 digital program insertion (DPI) messages, will require that the decode and the encode operations be tightly coupled. The quality of transcoding with this simple approach will not be high.

Figure 15 shows a comparison between direct encoding and transcoding. The figure shows the PSNR (a measure of mean square error between the input and decoded output) values computed at different bit rates. The PSNR numbers are obtained by averaging the results over 18 different sequences of varying content type and complexities. The top plot shows the performance of direct encoding using an H.264 encoder. The bottom plot shows the performance of transcoding where the video is originally coded with MPEG-2 at 4Mb/s, decoded and then re-encoded with the same encoder used for direct encoding. Transcoding can result in up to 20 percent loss in compression efficiency.

A better approach also decodes the incoming MPEG-2 stream and re-encodes it using an H.264 encoder. Here, however, the relevant information available from the MPEG-2 bit stream, such as the motion vectors, is reused by the encoder instead of being recomputed.

Figure 15. Performance comparison between direct encoding and transcoding [32]

5.3 Criteria for transcoding

Transcoding can be of various types [14]. Some of them are bit rate transcoding to facilitate more efficient transport of video, spatial and temporal resolution reduction transcoding for use in mobile devices with limited display and processing power and error-resilience transcoding in order to achieve higher resilience of the original bit stream to transmission errors.

To achieve optimum results by transcoding, the following criteria have to be fulfilled:

(i) The quality of the transcoded bitstream should be comparable to the one obtained by direct decoding and re-encoding of the input stream.

(ii) The information contained in the input stream should be used as much as possible to avoid multigenerational deterioration.

(iii) The process should be cost efficient, low in complexity and achieve the highest quality possible.

5.4 Transcoding of H.264 to MPEG-2

As noted in Section 5.1, the H.264/AVC video coding standard was developed by the JVT (Joint Video Team), consisting of experts from VCEG (Video Coding Experts Group) and MPEG, to provide better compression of video than previous standards. It offers significantly higher coding efficiency, simple syntax specifications, and seamless integration of video coding into all current protocols and multiplex architectures. H.264 can therefore support various applications such as video broadcasting, video streaming and video conferencing over fixed and wireless networks and over different transport protocols. However, MPEG-2 has already been widely used in the field of digital broadcasting, HDTV and DVD applications. Hence transcoding is a feasible method to solve the incompatibility problem between an H.264 video source and the existing MPEG-2 decoders.

An H.264/AVC to MPEG-2 transcoder is designed to transcode the H.264 video stream to MPEG-2 format so as to be used by the MPEG-2 end equipment. It is better to transmit H.264 bitstreams on public networks to save on the much needed bandwidth and then transcode them into MPEG-2 bitstreams for local MPEG-2 equipment like a set-top box.

5.5 Transcoding architectures

This section describes the various transcoding architectures [15]:

5.5.1 Open loop transcoding:

Open loop transcoders use two techniques: selective transmission, in which the high-frequency DCT coefficients are discarded, and requantization, in which the coefficients are quantized again with a coarser step size. They are computationally efficient, since they operate directly on the DCT coefficients. However, they suffer from the drift problem. Drift error occurs due to rounding, quantization loss and clipping functions.
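The two open loop operations can be sketched on a single block of quantized DCT levels. This is a minimal illustration that assumes a uniform quantizer without a dead zone and a simple "index sum" rule for deciding which coefficients count as high frequency; the function names and the 4x4 block are illustrative and do not come from [15].

```python
import math


def round_half_away(x):
    """Round to the nearest integer, with ties going away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def requantize(levels, q_in, q_out):
    """Rescale quantized levels from step size q_in to a coarser step q_out."""
    return [[round_half_away(l * q_in / q_out) for l in row] for row in levels]


def drop_high_frequencies(levels, keep):
    """Zero every coefficient whose row index plus column index is >= keep."""
    return [
        [l if u + v < keep else 0 for v, l in enumerate(row)]
        for u, row in enumerate(levels)
    ]


block = [
    [12, 6, 2, 1],
    [5, 3, 1, 0],
    [2, 1, 0, 0],
    [1, 0, 0, 0],
]
coarser = requantize(block, q_in=4, q_out=8)
low_pass = drop_high_frequencies(coarser, keep=2)
```

Neither function touches the prediction loop, which is why the approach is cheap. It is also why the error introduced in a reference picture is not compensated in the pictures predicted from it, and this accumulating mismatch is the drift described above.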

Figure 16. Open loop transcoding architecture [15]

5.5.2 Cascaded pixel domain transcoding architecture:

This is a drift free architecture. It is a concatenation of a simplified decoder and encoder as shown in Figure 17. In this architecture, instead of performing the full motion estimation, the encoder reuses the motion vectors along with other information extracted from the input video bitstream thus reducing the complexity.
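For the H.264 to MPEG-2 direction, reusing motion vectors means mapping quarter-sample vectors on small partitions to half-sample vectors on 16x16 macroblocks (see Table 4). The sketch below shows one possible mapping, assumed here purely for illustration: average the partition vectors and round to half-sample units, with ties rounded away from zero. The function names and the averaging rule are hypothetical and are not the method of [15]; a practical transcoder would typically refine the result with a small search.

```python
def quarter_to_half_pel(v):
    """Convert one vector component from quarter-sample to half-sample units."""
    sign = -1 if v < 0 else 1
    return sign * ((abs(v) + 1) // 2)


def merge_partition_mvs(mvs):
    """Merge partition vectors (quarter-sample units) into one 16x16 vector
    in half-sample units by averaging each component."""
    n = len(mvs)
    merged = []
    for axis in (0, 1):
        total = sum(mv[axis] for mv in mvs)
        sign = -1 if total < 0 else 1
        # total / (2 * n), rounded to nearest with ties away from zero
        merged.append(sign * ((abs(total) + n) // (2 * n)))
    return tuple(merged)


# Four 8x8 partition vectors of one macroblock, in quarter-sample units.
partition_mvs = [(5, -3), (7, -1), (6, -2), (6, -2)]
mb_mv = merge_partition_mvs(partition_mvs)
```

Because the encoder side still reconstructs pictures in the pixel domain, an imperfect vector from this mapping costs coding efficiency but does not cause drift.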

Figure 17. Cascaded pixel domain transcoding architecture [15].

5.5.3 Simplified DCT domain transcoding (SDDT):

This architecture is based on the assumption that DCT, IDCT and motion compensation are all linear operations. Because the motion compensation is performed in the DCT domain, where a motion-compensated block generally does not align with the coded block grid, it is a computationally intensive operation. For instance, as shown in Figure 19, the goal is to compute the target block B from the four overlapping blocks B1, B2, B3 and B4.

Figure 18. Simplified DCT domain transcoding architecture [15].

Figure 19. DCT- Motion compensation [15].
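One common way to write the DCT-domain motion compensation step is sketched below. It is offered as an illustration under stated assumptions rather than as the exact formulation of [15]: H_i and W_i are names introduced here for the vertical and horizontal shift matrices that select the part of each block B_i overlapping the target block B, and C is the orthonormal DCT matrix, so that DCT(X) = C X C^T and C^T C = I.

```latex
B = \sum_{i=1}^{4} H_i \, B_i \, W_i
\quad\Longrightarrow\quad
\mathrm{DCT}(B) = \sum_{i=1}^{4} \mathrm{DCT}(H_i)\,\mathrm{DCT}(B_i)\,\mathrm{DCT}(W_i)
```

The second form follows by inserting C^T C between the factors of each term. The matrices DCT(H_i) and DCT(W_i) depend only on the motion vector and can be precomputed, but each target block still requires several matrix products, which is the source of the computational cost.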

SDDT eliminates the DCT/IDCT operations and halves the number of frame memories, as a result of which it requires less computation and memory than cascaded pixel domain transcoding (CPDT). However, the linearity assumptions are not strictly true, since clipping functions are performed in the video encoder/decoder and rounding operations are performed in the interpolation for fractional pixel MC. These failed assumptions may cause drift in the transcoded video.

5.5.4 Cascaded DCT domain transcoding (CDDT)

The cascaded DCT-domain transcoder can be used for spatial and temporal resolution downscaling and other coding parameter changes. Compared to SDDT, greater flexibility is achieved using additional DCT-motion compensation and frame memory resulting in higher cost and complexity. This architecture is adopted for downscaling operations where the encoder side DCT-MC and memory will not cost much.

Figure 20. Cascaded DCT domain transcoding architecture [15]

5.6 Conclusions

The selection of appropriate transcoding architecture depends upon the application for which it is intended. There is generally a tradeoff between the accuracy and the complexity and cost of the architecture. For example, the simplest open loop architecture is the easiest to implement but it suffers from the problem of drift whereas the cascaded DCT domain transcoding architecture overcomes this problem but it is a very complex and expensive architecture to implement.

6 References

[1] Soon-kak Kwon, A. Tamhankar and K.R. Rao, “Overview of H.264 / MPEG-4 Part 10 (pp. 186-216)”, Special issue on “Emerging H.264/AVC video coding standard”, J. Visual Communication and Image Representation, vol. 17, pp. 183-552, Apr. 2006.
[2] A. Puri, H. Chen and A. Luthra, “Video Coding using the H.264/MPEG-4 AVC compression standard”, Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.
[3] H. Kalva, “Issues in H.264/MPEG-2 Video Transcoding”, Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL.
[4] S. Sharma, “Transcoding of H.264 bitstream to MPEG 2 bitstream”, Master’s Thesis, May 2006, EE Department, University of Texas at Arlington.
[5] S. Sharma and K. R. Rao, “Transcoding of H.264 bitstream to MPEG-2 bitstream”, Proceedings of Asia-Pacific Conference on Communications 2007.
[6] “Emerging H.264/AVC Video Coding Standard”, J. Visual Communication and Image Representation, vol. 17, pp. 183-552, Apr. 2006.
[7] P.N. Tudor, “Tutorial on MPEG-2 Video Compression”, IEE J Langham Thomson Prize, Electronics and Communication Engineering Journal, Dec. 1995.
[8] “The MPEG-2 International Standard”, ISO/IEC, Reference number ISO/IEC 13818-2, 1996.
[9] T. Wiegand et al., “Overview of the H.264/AVC Video Coding Standard”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, Issue 7, pp. 560-576, July 2003.
[10] J. McVeigh et al., “A software based real time MPEG-2 video encoder”, IEEE Trans. CSVT, Vol. 10, pp. 1178-1184, Oct. 2000.
[11] O.J. Morris, “MPEG-2: Where did it come from and what is it?”, IEE Colloquium, pp. 1/1-1/5, 24 Jan. 1995.
[12] P. Kunzelmann and H. Kalva, “Reduced Complexity H.264 to MPEG-2 Transcoder”, ICCE 2007, pp. 1-2, Jan. 2007.
[13] N. Kamaci and Y. Altunbasak, “Performance Comparison of the Emerging H.264 Video Coding Standard with the existing standards”, ICME, Vol. 1, pp. 345-348, July 2003.
[14] J. Xin, C. Lin and M. Sun, “Digital Video Transcoding”, Proceedings of the IEEE, Vol. 93, Issue 1, pp. 84-97, Jan. 2005.
[15] A. Vetro, C. Christopoulos and H. Sun, “Video transcoding architectures and techniques: an overview”, IEEE Signal Processing Magazine, Vol. 20, Issue 2, pp. 18-29, Mar. 2003.
[16] “MPEG-2”, Wikipedia, Feb. 14, 2008. Available at <http://en.wikipedia.org/wiki/Mpeg_2>
[17] “Introduction to MPEG 2 Video Compression”. Available at <http://www.bretl.com/mpeghtml/codecdia1.HTM>
[18] “H.264/MPEG-4 AVC”, Wikipedia, Feb. 18, 2008. Available at <http://en.wikipedia.org/wiki/H.264>
[19] “H.264 A new Technology for Video Compression”. Available at <http://www.nuntius.com/technology3.html>
[20] R. Periera, “Efficient transcoding of MPEG-2 to H.264”, Master’s thesis, Dec. 2005, EE Department, University of Texas at Arlington.
[21] “H.262 : Information technology - Generic coding of moving pictures and associated audio information: Video”, International Telecommunication Union, 2000-02. Available at <http://www.itu.int/rec/T-REC-H.262>
[22] “MPEG-2 White paper”, Pinnacle Technical Documentation, Version 0.5, Pinnacle Systems, Feb. 29, 2000.
[23] M. Ghanbari, “Standard Codecs : Image Compression to Advanced Video Coding”, Hertz, UK: IEE, 2003.
[24] H.264 software (version 13.2) obtained from: <http://iphome.hhi.de/suehring/tml/>
[25] MPEG-2 software (version 12) obtained from: <http://www.mpeg.org/MPEG/video/mssg-free-mpeg-software.html>
[26] Test streams (Foreman, News, Carphone) obtained from: <http://www-ee.uta.edu/dip/Courses/EE5356/ee_5356.htm>
[27] Implementation Studies Group, “Main Results of the AVC Complexity analysis”, MPEG document N4964, ISO/IEC JTC1/SC29/WG11, July 2002.
[28] A. Joch et al., “Performance comparison of video coding standards using Lagrangian coder control”, IEEE Int. Conf. of Image Processing, Vol. 2, pp. II-501 to II-504, Sept. 2002.
[29] I. Sylvester, “Transcoding: The future of the video market depends on it”, IDC Executive Brief, Nov. 2006. Available at <http://www.ed-china.com/ARTICLES/2006NOV/2/2006NOV10_HA_AVC_HN_12.PDF>
[30] R. Hoffner, “MPEG-4 Advanced Video Coding emerges”. Available at <http://www.tvtechnology.com/features/Tech-Corner/F_Hoffner-03.09.05.shtml>
[31] S. Wagston and A. Susin, “IP core for an H.264 Decoder SoC”, 2007. Available at <www.us.design-reuse.com/news/?id=15746&print=yes>
[32] S. Krishnamachari and K. Yang, “MPEG-2 to H.264 Transcoding: Why and How?”, Dec. 1, 2006. Available at <http://broadcastengineering.com/infrastructure/broadcasting_mpeg_transcoding_why/index1.html>