proposal

EE 5359 PROPOSAL

H.264 to VC-1 TRANSCODING

Vidhya Vijayakumar
Student I.D.: 1000-622152
Date: September 24, 2009


H.264 to VC-1 TRANSCODER

OBJECTIVE:

The objective of the thesis is to implement an H.264 to VC-1 transcoder for progressive compression.

MOTIVATION:

Adoption of high definition video has grown rapidly over the last five years. The high definition optical disc format Blu-ray has mandated MPEG-2 [3], H.264 [2] and VC-1 [1] as video compression formats. The coexistence of these different video coding standards creates a need for transcoding. As more and more end products use these standards, transcoding from one format to another adds value to a product's capabilities. While there has been recent work on MPEG-2 to H.264 transcoding [3] and VC-1 to H.264 transcoding [4], published work on H.264 to VC-1 transcoding is nearly non-existent. This motivates the development of a transcoder that can efficiently convert an H.264 bitstream into a VC-1 bitstream.

DETAILS:

Video transcoding is the operation of converting video from one format to another [5]. A format is defined by characteristics such as bit-rate and spatial resolution. One of the earliest applications of transcoding is to adapt the bit-rate of a compressed stream to the channel bandwidth for universal multimedia access over all kinds of channels, such as wireless networks, the Internet and dial-up networks. Changes in characteristics of an encoded stream such as bit-rate, spatial resolution and quality can also be achieved by scalable video coding [5]. However, when the available network bandwidth is insufficient or fluctuates over time, it may be difficult to set the base layer bit-rate. In addition, scalable video coding adds complexity at both the encoder and the decoder.

The most straightforward way to convert an H.264 bitstream into a VC-1 elementary stream is to fully decode the H.264 stream and then re-encode it as a VC-1 stream. However, this involves significant computational complexity [6]. Hence there is a need for low-complexity transcoding.

Transcoding can in general be implemented in the spatial domain or in the transform domain or in a combination of the two domains. The common transcoding architectures [5] are:

Open loop transform domain transcoding


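Fig. 1 Open loop transform domain transcoder architecture [5]

Open loop transcoders (Fig. 1) are computationally efficient because they operate directly in the DCT domain. However, they are subject to drift error, which arises from rounding, quantization loss and clipping. Because the requantization error introduced by the transcoder is not fed back into a prediction loop, the reference frames reconstructed by the decoder gradually diverge from those assumed when the incoming residuals were computed, and the mismatch accumulates from frame to frame until the next intra frame.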

Cascaded Pixel Domain Architecture (CPDT)

Fig. 2 Cascaded pixel domain transcoder architecture [5]

This is the most basic transcoding architecture (Fig. 2). The motion vectors from the incoming bitstream are extracted and reused, which eliminates the motion estimation block, accounting for 60% of the encoder computation. Unlike the open loop architecture, CPDT is drift free, because the re-encoder runs its own prediction loop on fully decoded pixels. Hence, even though it is somewhat more complex, it is well suited to heterogeneous transcoding between different standards, where basic parameters such as mode decisions and motion vectors have to be re-derived.
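The drift behaviour of the two architectures can be illustrated with a toy one-dimensional model. This is a minimal sketch under simplifying assumptions that belong to neither standard: every "frame" is a single sample, prediction uses the previous reconstructed sample, and quantization is a uniform rounding quantizer. All function names are hypothetical. The open loop path requantizes the incoming residuals without updating its reference, whereas the cascaded pixel-domain path decodes the samples and re-encodes them with its own prediction loop.

```python
import math


def quantize(value, step):
    """Uniform quantizer: reconstruct to the nearest multiple of `step` (ties round up)."""
    return step * math.floor(value / step + 0.5)


def encode(samples, step):
    """Closed-loop DPCM encoder: returns quantized residuals and the reconstruction."""
    reference, residuals, recon = 0, [], []
    for x in samples:
        r = quantize(x - reference, step)
        reference += r            # the encoder tracks exactly what the decoder will see
        residuals.append(r)
        recon.append(reference)
    return residuals, recon


def decode(residuals):
    """DPCM decoder: accumulate residuals onto the previous reconstruction."""
    out, reference = [], 0
    for r in residuals:
        reference += r
        out.append(reference)
    return out


def open_loop_transcode(residuals, step):
    """Requantize each incoming residual; the requantization error is never fed back."""
    return [quantize(r, step) for r in residuals]


def cascaded_transcode(residuals, step):
    """Fully decode, then re-encode with a new closed prediction loop (CPDT-like)."""
    return encode(decode(residuals), step)[0]


if __name__ == "__main__":
    source = [3 * t for t in range(10)]          # steady linear "motion"
    incoming, decoded = encode(source, step=1)   # fine first-generation encoding
    for name, transcode in (("open loop", open_loop_transcode),
                            ("cascaded", cascaded_transcode)):
        output = decode(transcode(incoming, step=4))
        errors = [abs(a - b) for a, b in zip(output, decoded)]
        print(name, errors)
```

Running the script prints `open loop [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]` and `cascaded [0, 1, 2, 1, 0, 1, 2, 1, 0, 1]`: the open loop error grows by one unit per frame, while the cascaded error never exceeds half of the new quantizer step.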

Simplified DCT Domain transcoders (SDDT)

Fig. 3 Simplified transform domain transcoder architecture [5]


This transcoder relies on the assumption that the DCT, the IDCT and motion compensation are linear operations (Fig. 3). It requires motion compensation to be performed in the DCT domain, which is a computationally intensive operation [3]. For instance, as shown in Fig. 4, the DCT coefficients of the target block B must be computed from the four overlapping blocks B1, B2, B3 and B4 of the reference frame.

Fig. 4 Transform domain motion compensation illustration [5]
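The operation of Fig. 4 can be written as B = H1 B1 W1 + H2 B2 W2 + H3 B3 W3 + H4 B4 W4, where the H_i and W_i are 0/1 window-and-shift matrices. Because the orthonormal DCT matrix C satisfies C^T C = I, the same relation holds between DCT coefficients: DCT(B) = sum of DCT(H_i) DCT(B_i) DCT(W_i). The pure-Python sketch below (helper names hypothetical, floating-point DCT rather than either standard's integer transform) checks that this equals the DCT of the directly extracted block. It also shows why SDDT is costly: every target block needs several 8x8 matrix products.

```python
import math

N = 8  # block size used by the illustration


def dct_matrix(n=N):
    """Orthonormal DCT-II basis C, so that DCT(X) = C X C^T and C^T C = I."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


C = dct_matrix()
C_T = transpose(C)


def dct2(x):
    """Separable 2-D DCT: C X C^T."""
    return matmul(matmul(C, x), C_T)


def selector(pred):
    """N x N 0/1 matrix with a one wherever pred(row, col) is true."""
    return [[1.0 if pred(r, c) else 0.0 for c in range(N)] for r in range(N)]


def window_matrices(dy, dx):
    """(H_i, W_i) pairs with B = sum_i H_i B_i W_i for B1..B4 = TL, TR, BL, BR; 0 <= dy, dx < 8."""
    h_top = selector(lambda r, c: c == r + dy)           # rows dy..7 of an upper block move up
    h_bottom = selector(lambda r, c: c == r - (N - dy))  # rows 0..dy-1 of a lower block move down
    w_left = selector(lambda r, c: r == c + dx)          # cols dx..7 of a left block move left
    w_right = selector(lambda r, c: r == c - (N - dx))   # cols 0..dx-1 of a right block move right
    return [(h_top, w_left), (h_top, w_right), (h_bottom, w_left), (h_bottom, w_right)]


def dct_domain_mc(dct_blocks, dy, dx):
    """DCT of the block displaced by (dy, dx), computed only from DCT(B1)..DCT(B4)."""
    result = [[0.0] * N for _ in range(N)]
    for (h, w), coeffs in zip(window_matrices(dy, dx), dct_blocks):
        term = matmul(matmul(dct2(h), coeffs), dct2(w))  # DCT(H_i) DCT(B_i) DCT(W_i)
        result = [[result[r][c] + term[r][c] for c in range(N)] for r in range(N)]
    return result


if __name__ == "__main__":
    ref = [[(7 * r + 13 * c) % 256 for c in range(16)] for r in range(16)]
    blocks = [[row[x:x + N] for row in ref[y:y + N]] for y in (0, N) for x in (0, N)]
    dy, dx = 3, 5
    target = [row[dx:dx + N] for row in ref[dy:dy + N]]
    fast = dct_domain_mc([dct2(b) for b in blocks], dy, dx)
    direct = dct2(target)
    print(max(abs(fast[r][c] - direct[r][c]) for r in range(N) for c in range(N)))
```

The printed maximum difference is at the level of floating-point round-off. In a practical SDDT the matrices DCT(H_i) and DCT(W_i) would be precomputed for each of the 64 integer displacements, but the per-block matrix products remain.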

Also, clipping functions and rounding operations performed for interpolation in fractional pixel motion compensation lead to a drift in the transcoded video.

Cascaded DCT Domain transcoders (CDDT)

Fig. 5 Cascaded transform domain transcoder architecture [5]

This architecture is used for spatial/temporal resolution downscaling and other coding parameter changes (Fig. 5). Compared with SDDT, it achieves greater flexibility by introducing a second transform domain motion compensation block; however, it is far more computationally intensive and requires more memory [3]. It is often applied to downscaling, where the extra frame memory at the encoder end is affordable because of the reduced output resolution.


Choice of basic transcoder architecture:

The main drawback of DCT domain transcoders is that motion compensation in the transform domain is very computationally intensive. DCT domain transcoders are also less flexible than pixel domain transcoders; for instance, the SDDT architecture can only be used for bit rate reduction transcoding. It assumes that the spatial and temporal resolutions stay the same and that the output video uses the same frame types, mode decisions and motion vectors as the input video.

H.264 to VC-1 transcoding requires several changes to accommodate the mismatches between the two standards. For instance, for motion estimation and compensation, H.264 supports 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4 macroblock partitions (Fig. 6), whereas VC-1 supports only 16x16 and 8x8 (Fig. 7). The transform sizes and types also differ (8x8 and 4x4 in H.264; 8x8, 4x8, 8x4 and 4x4 in VC-1), which makes transform domain transcoding prohibitively complex. Hence, DCT domain transcoders are not well suited to this task.

Fig. 6 Segmentations of the macroblock for motion compensation in H.264. Top: segmentation of macroblocks; bottom: segmentation of 8x8 partitions [2]

Fig.7 Segmentations of the macroblock for motion compensation in VC-1 [2]

From Fig. 8, it can be inferred that the cascaded pixel domain architecture outperforms the DCT domain transcoders. Also, for larger GOP sizes, the drift in DCT domain transcoders becomes more significant.


Fig.8 PSNR vs Bit-rate graph for the Foreman sequence transcoded with a GOP size 15, using different transcoding architectures as described in Figs. 1, 2, 3 and 5. [5]

Hence, heterogeneous transcoding in the pixel domain is preferred for standards transcoding.

Standards transcoding:

When transcoding between two different standards, the main factor involved is compatibility between the profile and level of the input stream and those of the output stream for a specific purpose. The goal here is to transcode an H.264 bitstream of Baseline profile to a VC-1 bitstream of Simple profile.

Table 1 compares and contrasts the main characteristics of the two standards.

Feature | H.264 High profile | VC-1 Main profile
Chroma format | 4:2:0 | 4:2:0
Picture coding types | I, P, B | I, P, B
Transform sizes | 4x4, 8x8 | 8x8, 4x8, 8x4, 4x4
Intra prediction | Directional predictors | None
Block sizes for motion compensation | 16x16, 16x8, 8x16, 8x8, 4x8, 8x4, 4x4 | 16x16, 8x8

Table 1 Main characteristics of H.264 High profile and VC-1 Main profile

Overview of H.264:

H.264 [2] is a standard for video compression and is equivalent to MPEG-4 Part 10, or MPEG-4 AVC (Advanced Video Coding) (Fig. 9). As of 2008, it is the latest block-oriented, motion-compensation-based video standard developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG), and it was the product of a partnership effort known as the Joint Video Team (JVT). The ITU-T H.264 standard and the ISO/IEC MPEG-4 Part 10 standard (formally, ISO/IEC 14496-10) are jointly maintained so that they have identical technical content.

Fig 9 H.264 Encoder [32]

Fig 10. H.264 Decoder [32]

The standardization of the first version of H.264/AVC was completed in May 2003. The JVT then developed extensions to the original standard that are known as the Fidelity Range Extensions (FRExt) [29]. These extensions enable higher quality video coding by supporting increased sample bit depth precision and higher-resolution color information, including sampling structures known as YUV 4:2:2 and YUV 4:4:4. Several other features are also included in the Fidelity Range Extensions project, such as adaptive switching between 4×4 and 8×8 integer transforms, encoder-specified perceptual-based quantization weighting matrices, efficient inter-picture lossless coding, and support of additional color spaces. The design work on the Fidelity Range Extensions was completed in July 2004, and the drafting work on them was completed in September 2004.

Scalable video coding (SVC) [30], as specified in Annex G of H.264/AVC, allows the construction of bitstreams that contain sub-bitstreams that also conform to H.264/AVC. For temporal bitstream scalability, i.e., the presence of a sub-bitstream with a smaller temporal sampling rate than the bitstream, complete access units are removed from the bitstream when deriving the sub-bitstream. In this case, high-level syntax and inter prediction reference pictures in the bitstream are constructed accordingly. For spatial and quality bitstream scalabilities, i.e., the presence of a sub-bitstream with lower spatial resolution or quality than the bitstream, network abstraction layer (NAL) units are removed from the bitstream when deriving the sub-bitstream. In this case, inter-layer prediction, i.e., the prediction of the higher spatial resolution or quality signal from data of the lower spatial resolution or quality signal, is typically used for efficient coding. The Scalable Video Coding extension was completed in November 2007.

Some of the features adopted in H.264 for enhancement of prediction, improved coding efficiency and robustness to data errors/losses are listed as follows.

Features for enhancement of prediction

Directional spatial prediction for intra coding

Variable block-size motion compensation with small block size

Figure 11 – Various block sizes in H.264

Quarter-sample-accurate motion compensation

Motion vectors over picture boundaries

Multiple reference picture motion compensation

Decoupling of referencing order from display order

Decoupling of picture representation methods from picture referencing capability

Weighted prediction

Improved “skipped” and “direct” motion inference

In-the-loop deblocking filtering

Features for improved coding efficiency

Small block-size transform

Exact-match inverse transform

Figure – Forward 4x4 and 8x8 integer transform

Short word-length transform

Hierarchical block transform

Arithmetic entropy coding

Context-adaptive entropy coding
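The small block-size, exact-match and short word-length transform features can be seen in the H.264 4x4 forward core transform. Its matrix contains only the integers ±1 and ±2, so it can be computed with additions, subtractions and shifts, and integer arithmetic avoids any floating-point mismatch between encoder and decoder. The sketch below is illustrative only: the post-scaling that H.264 folds into quantization is omitted, and the function name is hypothetical.

```python
# Forward core transform matrix of H.264/AVC for 4x4 residual blocks.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def forward_core_4x4(block):
    """Y = CF . X . CF^T with integer arithmetic only (normalization left to quantization)."""
    # First pass: CF . X (transform the columns).
    tmp = [[sum(CF[i][k] * block[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
    # Second pass: (CF . X) . CF^T (transform the rows).
    return [[sum(tmp[i][k] * CF[j][k] for k in range(4)) for j in range(4)] for i in range(4)]
```

A flat block of ones yields a single DC coefficient of 16. With residuals bounded by 255 in magnitude, no output can exceed 6 × 6 × 255 = 9180, which fits comfortably in 16 bits.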

Features for robustness to data errors/losses

Parameter set structure

NAL unit syntax structure

Flexible slice size

Flexible macroblock ordering (FMO)

Arbitrary slice ordering (ASO)

Redundant pictures

Data partitioning

SP/SI synchronization/switching pictures

Profiles in H.264

The H.264 standard defines numerous profiles:

Constrained baseline profile

Baseline profile

Main profile

Extended profile

High profile

High 10 profile

High 4:2:2 profile

High 4:4:4 predictive profile

High stereo profile

High 10 intra profile

High 4:2:2 intra profile

High 4:4:4 intra profile

CAVLC 4:4:4 intra profile

Scalable baseline profile

Scalable high profile

Scalable high intra profile

Table Features in baseline, main and extended profile

Table Features in high profile


Figure 12 Comparison of H.264 baseline, main, extended and high profile

Overview of VC-1

VC-1 [1] is the informal name of the SMPTE 421M video codec standard, initially developed by Microsoft. It was released by SMPTE on April 3, 2006. It is one of the video coding formats supported by Blu-ray Disc, and it is also implemented in Windows Media Video 9.

VC-1 is an evolution of the conventional DCT-based video codec design also found in H.261 [31], H.263 [27], MPEG-1 [41] and MPEG-2 [3]. It is widely characterized as an alternative to H.264/MPEG-4 AVC, the latest ITU-T and MPEG video coding standard. VC-1 contains coding tools for interlaced video sequences as well as for progressive encoding. A main goal of VC-1 development and standardization was to support the compression of interlaced content without first converting it to progressive, making it more attractive to broadcast and video industry professionals.

The VC-1 codec is designed to achieve state-of-the-art compressed video quality at bit rates that may range from very low to very high. The codec can easily handle 1920 pixel × 1080 pixel resolution at 6 to 30 megabits per second (Mbps) for high-definition video. VC-1 is capable of higher resolutions such as 2048 pixels × 1536 pixels for digital cinema, and of a maximum bit rate of 135 Mbps. An example of very low bit rate video would be 160 pixel × 120 pixel resolution at 10 kilobits per second (Kbps) for modem applications.


The basic functionality of VC-1 involves a block-based motion compensation and spatial transform scheme similar to that used in other video compression standards such as MPEG-1 and H.261 [31]. However, VC-1 includes a number of innovations and optimizations that make it distinct from the basic compression scheme, resulting in excellent quality and efficiency. VC-1 Advanced Profile is also transport independent. This provides even greater flexibility for device manufacturers and content services.

Fig. 11 VC-1 codec [32]

Profiles in VC-1

VC-1 defines three profiles:

1. Simple
2. Main
3. Advanced

Feature | Simple | Main | Advanced
Baseline intra frame compression | Yes | Yes | Yes
Variable-sized transform | Yes | Yes | Yes
16-bit transform | Yes | Yes | Yes
Overlapped transform | Yes | Yes | Yes
4 motion vectors per macroblock | Yes | Yes | Yes
¼ pixel luminance motion compensation | Yes | Yes | Yes
¼ pixel chrominance motion compensation | No | Yes | Yes
Start codes | No | Yes | Yes
Extended motion vectors | No | Yes | Yes
Loop filter | No | Yes | Yes
Dynamic resolution change | No | Yes | Yes
Adaptive macroblock quantization | No | Yes | Yes
B frames | No | Yes | Yes
Intensity compensation | No | Yes | Yes
Range adjustment | No | Yes | Yes
Field and frame coding modes | No | No | Yes
GOP layer | No | No | Yes
Display metadata | No | No | Yes

Table – Features in VC-1 profiles [49]

Innovations


VC-1 includes a number of innovations that enable it to produce high quality content. This section provides brief descriptions of some of these features.

Adaptive Block Size Transform

Traditionally, 8 × 8 transforms have been used for image and video coding. However, there is evidence to suggest that 4 × 4 transforms can reduce ringing artifacts at edges and discontinuities. VC-1 is capable of coding an 8 × 8 block using either an 8 × 8 transform, two 8 × 4 transforms, two 4 × 8 transforms, or four 4 × 4 transforms. This feature enables coding that takes advantage of the different transform sizes as needed for optimal image quality.

Figure – VC-1 transform sizes [4]

16-Bit Transforms

In order to minimize the computational complexity of the decoder, VC-1 uses 16-bit transforms. This also has the advantage of easy implementation on the wide range of digital signal processing (DSP) hardware built with 16-bit processors. Among the constraints on the transforms specified in VC-1 is the requirement that 16-bit input values produce results that also fit in 16 bits. These constraints ensure that decoding is as efficient as possible on a wide range of devices.
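The 16-bit design can be checked numerically on the VC-1 integer basis matrices. The coefficient values below are the 4-point and 8-point matrices as commonly published for VC-1 (the 8-point matrix is the one shown in the inverse-transform figure cited from [32]); the normative scaling, shifts and rounding of the standard are not reproduced, and the helper names are hypothetical. The check shows that the rows are mutually orthogonal with nearly equal norms, so each matrix approximates a scaled DCT, and that a single 1-D pass over 8-bit residuals stays within the signed 16-bit range.

```python
# Integer basis matrices of the VC-1 4-point and 8-point transforms (rows are basis functions).
T4 = [
    [17, 17, 17, 17],
    [22, 10, -10, -22],
    [17, -17, -17, 17],
    [10, -22, 22, -10],
]
T8 = [
    [12, 12, 12, 12, 12, 12, 12, 12],
    [16, 15, 9, 4, -4, -9, -15, -16],
    [16, 6, -6, -16, -16, -6, 6, 16],
    [15, -4, -16, -9, 9, 16, 4, -15],
    [12, -12, -12, 12, 12, -12, -12, 12],
    [9, -16, 4, 15, -15, -4, 16, -9],
    [6, -16, 16, -6, -6, 16, -16, 6],
    [4, -9, 15, -16, 16, -15, 9, -4],
]


def gram(t):
    """Inner products of every pair of rows; the diagonal holds squared row norms."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in t] for r1 in t]


def is_orthogonal(t):
    """True if every pair of distinct rows has a zero inner product."""
    g = gram(t)
    return all(g[i][j] == 0 for i in range(len(t)) for j in range(len(t)) if i != j)


def worst_case_1d(t, max_input=255):
    """Largest |output| of one 1-D pass when every input is bounded by max_input."""
    return max(sum(abs(c) for c in row) for row in t) * max_input
```

For the 8-point matrix the squared row norms are 1152, 1156 or 1168, and the worst-case magnitude after one pass is 96 × 255 = 24480, below 2^15 = 32768.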

Motion Compensation

Motion compensation is the process of generating a prediction of a video frame by displacing regions of a reference frame. Typically, the prediction is formed for a block (an 8 × 8 pixel tile) or a macroblock (a 16 × 16 pixel tile) of data. The displacement due to motion is described by a motion vector, which captures the shift along both the x- and y-axes.

Figure VC-1 motion compensation sizes [4]


The efficiency of the codec is affected by the size of the predicted block, the granularity of sub-pixel data that can be captured, and the type of filter used for generating sub-pixel predictors. VC-1 uses 16 × 16 blocks for prediction, with the ability to generate mixed frames of 16 × 16 and 8 × 8 blocks. The finest granularity of sub-pixel information supported by VC-1 is 1/4 pixel. Two sets of filters are used by VC-1 for motion compensation. The first is an approximate bicubic filter with four taps. The second is a bilinear filter with two taps. The four-tap bicubic filters used in VC-1 for ¼ and ½ pixel shifts are: [-4 53 18 -3]/64 and [-1 9 9 -1]/16.

Figure – Integer, half and quarter pel positions [2](A-Q Integer, aa-hh half, a-s quarter pel positions)
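The ¼- and ½-pel bicubic kernels quoted above can be applied along one dimension as follows. This is a sketch under stated assumptions: the ¾-pel kernel is taken as the mirror image of the ¼-pel kernel, rounding is plain round-half-up, and VC-1's rounding control and the two-dimensional (vertical then horizontal) filtering order are not modelled. Function names are hypothetical.

```python
# Four-tap "approximate bicubic" kernels, indexed by quarter-pel offset: (taps, right shift).
KERNELS = {
    1: ((-4, 53, 18, -3), 6),   # 1/4 pel: taps sum to 64
    2: ((-1, 9, 9, -1), 4),     # 1/2 pel: taps sum to 16
    3: ((-3, 18, 53, -4), 6),   # 3/4 pel: mirror of the 1/4-pel kernel (assumed)
}


def clip_pixel(value):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, value))


def interpolate_1d(row, i, quarter_offset):
    """Sample at position i + quarter_offset / 4, using row[i-1] .. row[i+2].

    Rounding is plain round-half-up; VC-1's rounding-control parameter is ignored.
    """
    if quarter_offset == 0:
        return row[i]
    taps, shift = KERNELS[quarter_offset]
    acc = sum(t * row[i - 1 + k] for k, t in enumerate(taps))
    return clip_pixel((acc + (1 << (shift - 1))) >> shift)
```

On the ramp 0, 10, 20, 30 the half-pel sample between 10 and 20 is 15 and the quarter-pel sample is 13 (the exact linear value is 12.5). The final clip matters because the negative outer taps can overshoot near sharp edges.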

VC-1 combines the motion vector settings defined by the block size, sub-pixel resolution, and filter type into modes. The result is four motion compensation modes that suit a range of different situations. This classification of settings into modes also helps compact decoder implementations.

Loop Filtering

VC-1 uses an in-loop deblocking filter that attempts to remove block-boundary discontinuities introduced by quantization errors in reconstructed frames. These discontinuities can cause visible artifacts in the decompressed video frames and can reduce the quality of the frame as a predictor for subsequent predicted frames.


Figure – Loop filtering in VC-1 [4] (Only pixel p4 and p5 are filtered)

The loop filter takes into account the adaptive block size transforms. The filter is also optimized to reduce the number of operations required.

Interlaced Coding

Interlaced video content is widely used in television broadcasting. When encoding interlaced content, the VC-1 codec can take advantage of the characteristics of interlaced frames to improve compression. This is achieved by using data from both fields when forming motion-compensated predictions for inter-coded frames.

Advanced B Frame Coding

A bi-directional or B frame is a frame that is predicted from data in both previous and subsequent frames. B frames are distinct from I frames (also called key frames), which are encoded without reference to other frames. B frames are also distinct from P frames, which are predicted from previous frames only. VC-1 includes several optimizations that make B frames more efficient. VC-1 does not have a fixed group of pictures (GOP) structure, and the number of pictures in a GOP can vary.

Fading Compensation

Due to the nature of compression that uses motion compensation, encoding of video frames that contain fades to or from black is very inefficient. With a uniform fade, every macroblock needs adjustments to luminance. VC-1 includes fading compensation, which detects fades and uses alternate methods to adjust luminance. This feature improves compression efficiency for sequences with fading and other global illumination changes.
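The following sketch illustrates why a global luminance model helps with fades. It is a generic illustration, not the VC-1 encoder algorithm: the standard signals a luminance scale and shift for the reference picture, but the least-squares estimator, the 8-bit clipping and the function names below are assumptions made for this example. Frames are flattened lists of luma samples.

```python
def fit_fade(reference, current):
    """Least-squares scale and offset so that current ~= scale * reference + offset."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_c = sum(current) / n
    var_r = sum((r - mean_r) ** 2 for r in reference)
    cov = sum((r - mean_r) * (c - mean_c) for r, c in zip(reference, current))
    scale = cov / var_r if var_r else 1.0   # flat reference: fall back to an offset-only model
    return scale, mean_c - scale * mean_r


def compensate(reference, scale, offset):
    """Remap reference luma with the fitted model before motion compensation (8-bit clip)."""
    return [max(0, min(255, round(scale * r + offset))) for r in reference]


def sad(a, b):
    """Sum of absolute differences between two sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

For a reference whose brightness has been halved and offset by 10, the fitted model recovers a scale of 0.5 and an offset of 10, and the sum of absolute differences against the compensated reference drops to 0, so motion compensation no longer has to code the brightness change in every macroblock.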

Differential Quantization

Differential quantization, or dquant, is an encoding method in which multiple quantization steps are used within a single frame. Rather than quantizing the entire frame with a single quantization level, the encoder identifies macroblocks within the frame that might benefit from lower quantization levels and a greater number of preserved AC coefficients. Such macroblocks are then encoded at lower quantization levels than the one used for the remaining macroblocks in the frame. The simplest and typically most efficient form of differential quantization involves only two quantizer levels (bi-level dquant), but VC-1 also supports multiple levels.
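A bi-level dquant decision can be sketched as follows. The selection rule is a hypothetical encoder heuristic and not part of VC-1, which only defines how the per-macroblock quantizer choice is signalled: macroblocks with low luma variance, i.e. smooth areas where blocking artifacts tend to be most visible, receive the finer alternative quantizer. Function names and the threshold are assumptions.

```python
def mb_activity(mb_pixels):
    """Variance of a macroblock's luma samples, used here as a simple activity measure."""
    n = len(mb_pixels)
    mean = sum(mb_pixels) / n
    return sum((p - mean) ** 2 for p in mb_pixels) / n


def bilevel_dquant(activities, frame_qp, alt_qp, threshold):
    """Assign one of two quantizers per macroblock (hypothetical activity-based rule).

    Macroblocks whose activity is below `threshold` get the finer `alt_qp`;
    all others keep the frame-level `frame_qp`.
    """
    if alt_qp >= frame_qp:
        raise ValueError("alt_qp should be finer (smaller) than frame_qp")
    return [alt_qp if a < threshold else frame_qp for a in activities]
```

In this sketch a frame with macroblock activities 5, 400 and 12 and a threshold of 50 would code the first and third macroblocks with the alternative quantizer.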

MAPPING DIFFERENCES BETWEEN THE TWO STANDARDS:

The transcoding algorithm considered in this research assumes full H.264 decoding down to the pixel level, followed by a reduced-complexity VC-1 encoding. The data gathered during the H.264 decoding stage is used to accelerate the VC-1 encoding stage. It is assumed that the H.264 bitstream is generated with an R-D optimized encoder. The picture coding types used are similar in both standards. The transform sizes and types are different, which makes transform domain transcoding prohibitively complex. The semantics of intra MBs are similar, except for the intra directional prediction allowed in H.264 and the mixed MBs in VC-1. Inter prediction has significant differences, including the MC block sizes, the transform block sizes and the reference frames used. The similarities between the codecs can be exploited to reduce the transcoding complexity.

Intra MB Mode Mapping:

An intra MB in the incoming H.264 bitstream is coded as a VC-1 intra MB. An H.264 intra MB can be coded as Intra 4x4 (9 directional modes) or Intra 16x16 (4 modes). A VC-1 intra MB, in contrast, consists of four 8x8 luma blocks and has no spatial prediction modes. Since a VC-1 intra MB always uses the 8x8 transform, irrespective of the prediction block size (16x16 or 4x4) used in H.264, the H.264 intra prediction type need not be carried over. Table 2 shows the proposed intra MB mapping.

H.264 intra MB | VC-1 intra MB
Intra 16x16 (any mode) | Intra MB 8x8
Intra 4x4 (any mode) | Intra MB 8x8

Table 2 H.264 and VC-1 Intra MB mapping

Figure – Matrix for one-dimensional 8-point inverse transform [32]

Inter MB Mode Mapping:


An inter coded MB in the incoming H.264 bitstream is coded as an inter MB in VC-1. An H.264 inter MB has 7 different motion compensation block sizes: 16x16, 16x8, 8x16, 8x8, 4x8, 8x4 and 4x4. A VC-1 inter MB has 2 different motion compensation block sizes: 16x16 and 8x8. Another significant difference is that H.264 uses 4x4 transforms (and 8x8 in the Fidelity Range Extensions), whereas VC-1 uses 4 different transform sizes: 8x8, 4x8, 8x4 and 4x4.

The 16x16, 16x8 and 8x16 motion compensation sizes are usually selected in H.264 for relatively uniform areas. Such MBs are mapped to inter 16x16 MBs in VC-1, and the selected H.264 MC block size is used as a measure of the homogeneity of the block to choose the transform size to be applied in VC-1.

The 8x8, 8x4, 4x8 and 4x4 modes are usually selected in H.264 for areas with non-uniform motion. For such MBs the VC-1 16x16 mode is not considered; the MB is mapped to 8x8 blocks in VC-1, and the H.264 block size determines the transform size to be used in VC-1.

Table 3 describes the decision making for mapping the inter MBs and the type of transform to be used in VC-1.

H.264 inter MB | VC-1 inter MB | VC-1 transform size
Inter 16x16 | Inter 16x16 | 8x8
Inter 16x8 | Inter 16x16 | 8x4
Inter 8x16 | Inter 16x16 | 4x8
Inter 8x8 | Inter 8x8 | 8x8
Inter 4x8 | Inter 8x8 | 4x8
Inter 8x4 | Inter 8x8 | 8x4
Inter 4x4 | Inter 8x8 | 4x4

Table 3 H.264 and VC-1 Inter MB mapping and VC-1 transform type
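Tables 2 and 3 amount to a lookup from the decoded H.264 macroblock type to a VC-1 macroblock type and transform size. A sketch follows; the string labels are hypothetical identifiers chosen for readability rather than syntax element values of either standard (apart from P_Skip, which follows H.264 naming), and the skip case follows the Skipped Macroblock discussion.

```python
# Table 2: any H.264 intra MB becomes a VC-1 intra MB using 8x8 transforms.
INTRA_MAP = {"I16x16": ("Intra", "8x8"), "I4x4": ("Intra", "8x8")}

# Table 3: H.264 inter partition -> (VC-1 inter MB type, VC-1 transform size).
INTER_MAP = {
    "16x16": ("Inter 16x16", "8x8"),
    "16x8": ("Inter 16x16", "8x4"),
    "8x16": ("Inter 16x16", "4x8"),
    "8x8": ("Inter 8x8", "8x8"),
    "4x8": ("Inter 8x8", "4x8"),
    "8x4": ("Inter 8x8", "8x4"),
    "4x4": ("Inter 8x8", "4x4"),
}


def map_macroblock(h264_type, partition=None):
    """Return (VC-1 MB type, VC-1 transform size) for one decoded H.264 MB.

    h264_type: "I16x16", "I4x4", "P_Skip" or "Inter" (hypothetical labels).
    partition: for "Inter", the H.264 partition or sub-partition size, e.g. "16x8".
    """
    if h264_type in INTRA_MAP:
        return INTRA_MAP[h264_type]
    if h264_type == "P_Skip":
        return ("Skip", None)          # skipped MBs are converted directly
    if h264_type == "Inter":
        return INTER_MAP[partition]
    raise ValueError(f"unsupported H.264 MB type: {h264_type}")
```

In H.264 the sub-partition of an Inter 8x8 macroblock can differ between its four 8x8 quadrants, so in that case the lookup would be applied per quadrant to choose the transform size of each VC-1 8x8 block.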

Motion vector mapping:

Re-use of motion vectors selected in H.264 can significantly reduce the complexity of VC-1 encoding. Table 4 describes the re-use of motion vectors.

H.264 inter MB | VC-1 inter MB | Motion vector re-use
Inter 16x16 | Inter 16x16 | Same motion vectors
Inter 16x8 | Inter 16x16 | Average of motion vectors
Inter 8x16 | Inter 16x16 | Average of motion vectors
Inter 8x8 | Inter 8x8 | Same motion vectors
Inter 4x8 | Inter 8x8 | Average of motion vectors
Inter 8x4 | Inter 8x8 | Average of motion vectors
Inter 4x4 | Inter 8x8 | Average of motion vectors

Table 4 H.264 and VC-1 Inter MB motion vector mapping
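The averaging in Table 4 can be sketched as below. Motion vectors are assumed to be in quarter-pel units, the partitions being averaged are assumed to have equal size and the same reference picture, and ties are rounded away from zero; these are assumptions for illustration, since the table does not specify a rounding rule. Function names are hypothetical.

```python
import math


def round_half_away(value):
    """Round to the nearest integer, ties away from zero (an assumed rounding rule)."""
    return int(math.copysign(math.floor(abs(value) + 0.5), value))


def average_mv(mvs):
    """Average the quarter-pel motion vectors (dx, dy) of equal-size H.264 partitions.

    mvs: vectors of the partitions covering one VC-1 block, e.g. the two 16x8
    halves of a 16x16 MB or the four 4x4 sub-blocks of an 8x8 block.
    """
    n = len(mvs)
    return (round_half_away(sum(mv[0] for mv in mvs) / n),
            round_half_away(sum(mv[1] for mv in mvs) / n))
```

For example, the two halves of an Inter 16x8 macroblock with vectors (4, -2) and (6, -2) give the single VC-1 vector (5, -2). If the partitions had unequal sizes, an area-weighted average would be the natural generalization.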


Reference Pictures:

The H.264/AVC standard allows up to sixteen reference pictures for motion estimation, while VC-1 uses only one or two, for P and B pictures respectively. Reusing the motion vectors implies using the same reference pictures to preserve their meaning. The motion vector conversion assumes that the motion vector length is proportional to the distance to the reference picture [39]. The source motion vectors are therefore scaled, as shown in Fig. 12, so that they point to valid VC-1 reference pictures. This conversion assumes constant motion between the H.264/AVC and VC-1 reference pictures: each motion vector is scaled by the ratio of the temporal distances from the current picture to the two reference pictures.

Fig 12 Motion vector scaling [38]
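The temporal scaling can be sketched as mv_VC1 = mv_H264 × d_VC1 / d_H264, where d_H264 is the distance (in pictures, display order) from the current picture to the H.264 reference and d_VC1 is the distance to the VC-1 reference. Quarter-pel units and rounding ties away from zero are assumptions; function names are hypothetical.

```python
import math


def round_half_away(value):
    """Round to the nearest integer, ties away from zero (an assumed rounding rule)."""
    return int(math.copysign(math.floor(abs(value) + 0.5), value))


def scale_mv(mv, src_distance, dst_distance):
    """Rescale a quarter-pel MV that points src_distance pictures back (H.264 reference)
    so that it points dst_distance pictures back (VC-1 reference), assuming constant motion."""
    if src_distance == 0:
        raise ValueError("reference distance must be non-zero")
    factor = dst_distance / src_distance
    return tuple(round_half_away(c * factor) for c in mv)
```

For example, a vector (12, -6) that refers to a picture three frames back becomes (4, -2) when the VC-1 P picture must refer to the immediately preceding picture.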

Skipped Macroblock:

When a skipped macroblock is signaled in the bitstream, no further data is sent for that macroblock. The mode conversion of H.264 skipped macroblocks to VC-1 skipped macroblocks is straightforward: since the skipped macroblock definitions of both standards are fully compatible, a direct conversion is possible.

OPEN LOOP TRANSCODER:

The open loop transcoder is designed by cascading an H.264 encoder [44], an H.264 decoder [44], a VC-1 encoder [45] and a VC-1 decoder [45].

Fig 13 Open loop transcoder

Performance of open loop transcoder

The mean square error (MSE), peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) are calculated for the Foreman QCIF sequence (3 frames) using the open loop transcoder.
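The three quality measures can be computed as follows for 8-bit luma frames stored as flat lists. MSE and PSNR follow their usual definitions with a peak value of 255. The SSIM shown here is a simplified single-window (global) version of the index, whereas standard SSIM averages the same expression over local sliding windows, so its values will differ from a reference SSIM implementation. Function names are hypothetical.

```python
import math


def mse(ref, test):
    """Mean squared error between two equal-length sequences of 8-bit luma samples."""
    return sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)


def psnr(ref, test, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    err = mse(ref, test)
    return math.inf if err == 0 else 10 * math.log10(peak * peak / err)


def _cov(x, mx, y, my):
    """Population covariance of x and y given their means."""
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def global_ssim(ref, test, peak=255):
    """SSIM evaluated over the whole frame as a single window (simplified)."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = sum(ref) / len(ref), sum(test) / len(test)
    vx, vy = _cov(ref, mx, ref, mx), _cov(test, my, test, my)
    cxy = _cov(ref, mx, test, my)
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))
```

A QCIF frame (176 × 144 luma samples) shifted up by one grey level has an MSE of 1 and a PSNR of about 48.13 dB. For a sequence, the per-frame values are typically reported frame by frame or averaged.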


Fig 14 MSE of open loop transcoder – Foreman sequence

Fig 15 PSNR of open loop transcoder – Foreman sequence


Fig 16 SSIM of open loop transcoder – Foreman sequence

CONCLUSIONS:

As mentioned earlier, it is proposed to transcode an H.264 bitstream to a VC-1 bitstream in the pixel domain (CPDT) and to compare the results (MSE, PSNR, SSIM, complexity and bit rates) against the open loop transcoder. Since the motion vectors are not re-estimated, the complexity on the encoder side is expected to be reduced by about 40-50%. The next step is to extract reusable information from the H.264 bitstream for use in VC-1 encoding.

REFERENCES:

[1] VC-1 Compressed Video Bitstream Format and Decoding Process (SMPTE 421M-2006), SMPTE Standard, 2006.

[2] T. Wiegand et al., “Overview of the H.264/AVC video coding standard,” IEEE Trans. CSVT, vol. 13, pp. 560-576, July 2003.

[3] C. Chen, P-H. Wu and H. Chen, “MPEG-2 to H.264 transcoding,” Picture Coding Symposium, Dec. 15-17, 2004.

[4] Jae-Beom Lee and H. Kalva, “An efficient algorithm for VC-1 to H.264 video transcoding in progressive compression,” IEEE International Conference on Multimedia and Expo, pp. 53-56, July 2006.

[5] J. Xin, C.W. Lin and M.T. Sun, “Digital video transcoding,” Proceedings of the IEEE, vol. 93, pp. 84-97, Jan. 2005.

[6] A. Vetro, C. Christopoulos and H. Sun, “Video transcoding architectures and techniques: An overview,” IEEE Signal Processing Magazine, vol. 20, pp. 18-29, March 2003.

[7] Advanced Video Coding for Generic Audiovisual Services, ITU-T Rec. H.264 / ISO/IEC 14496-10, Mar. 2005.

[8] S. Srinivasan and S. L. Regunathan, “An overview of VC-1,” Proc. SPIE, vol. 5960, pp. 720-728, 2005.

[9] P. List et al., “Adaptive deblocking filter,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp. 614-619, Jun. 2003.

[10] T. D. Tran, J. Liang and C. Tu, “Lapped transform via time-domain pre- and post-filtering,” IEEE Trans. Signal Proc., vol. 51, pp. 1557-1571, Jun. 2003.

[11] C. C. Cheng, T. S. Chang and K. B. Lee, “An in-place architecture for the deblocking filter in H.264/AVC,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, pp. 530-534, Jul. 2006.

[12] T. C. Chen et al., “Analysis and architecture design of an HDTV720p 30 frames/s H.264/AVC encoder,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, pp. 673-688, Jun. 2006.

[13] Y.-W. Huang et al., “Architecture design for deblocking filter in H.264/JVT/AVC,” in IEEE Proc. Int. Conf. Multimedia and Expo, pp. 693-696, July 2003.

[14] S.-C. Chang et al., “A platform based bus-interleaved architecture for de-blocking filter in H.264/MPEG-4 AVC,” IEEE Trans. Consumer Electron., vol. 51, pp. 249-255, Feb. 2005.

[15] M. Sima, Y. Zhou and W. Zhang, “An efficient architecture for adaptive deblocking filter of H.264/AVC video coding,” IEEE Trans. Consumer Electronics, vol. 50, pp. 292-296, Feb. 2004.

[16] S.-Y. Shih, C.-R. Chang and Y.-L. Lin, “A near optimal deblocking filter for H.264 advanced video coding,” in Proc. Asia and South Pacific Design Automation Conf., pp. 170-175, Jan. 2006.

[17] T.-M. Liu et al., “A memory-efficient deblocking filter for H.264/AVC video coding,” in Proc. IEEE Int. Symp. Circuits Syst., pp. 2140-2143, May 2005.

[18] T.-M. Liu et al., “A 125 µW fully scalable MPEG-2 and H.264/AVC video decoder for mobile applications,” IEEE J. Solid-State Circuits, vol. 42, pp. 161-169, Jan. 2007.

[19] L. Li, S. Goto and T. Ikenaga, “An efficient deblocking filter architecture with 2-dimensional parallel memory for H.264/AVC,” in Proc. Asia and South Pacific Design Automation Conf., pp. 623-626, 2005.

[20] H.-Y. Lin et al., “Efficient deblocking filter architecture for H.264 video coders,” in IEEE ISCAS, pp. 4, May 2006.

[21] T.-M. Liu, W.-P. Lee and C.-Y. Lee, “An in/post-loop deblocking filter with hybrid filtering schedule,” IEEE Trans. Circuits Syst. for Video Technol., vol. 17, pp. 937-943, Jul. 2007.

[22] I. Ahmad et al., “Video transcoding: An overview of various techniques and research issues,” IEEE Trans. on Multimedia, vol. 7, pp. 793-8, Oct. 2005.

[23] Y.L. Lee and T.Q. Nguyen, “Analysis and efficient architecture design for VC-1 overlap smoothing and in-loop deblocking filter,” IEEE Trans. Circuits and Syst. for Video Technol., vol. 18, pp. 1786-1796, Dec. 2008.

[24] G. Fernandez-Escribano et al., “Speeding-up the macroblock partition mode decision for MPEG-2 to H.264 transcoding,” Proceedings of IEEE ICIP 2006, Atlanta, pp. 869-872, Sept. 2006.

[25] Z. Zhou et al., “Motion information and coding mode reuse for MPEG-2 to H.264 transcoding,” Proceedings of the IEEE ISCAS 2005, pp. 1230-1233, May 2005.

[26] B. Petljanski and H. Kalva, “DCT domain intra MB mode decision for MPEG-2 to H.264 transcoding,” Proceedings of the IEEE ICCE 2006, pp. 419-420, Jan. 2006.

[27] J. Bialkowski, A. Kaup and K. Illgner, “Fast transcoding of intra frames between H.263 and H.264,” IEEE ICIP, vol. 4, pp. 2785-2788, Oct. 2004.

[28] Y.-K. Lee, S.-S. Lee and Y.-L. Lee, “MPEG-4 to H.264 transcoding using macroblock statistics,” Proceedings of the IEEE ICME 2006, pp. 57-60, Toronto, Canada, July 2006.

[29] G. Sullivan, P. Topiwala and A. Luthra, “The H.264/AVC video coding standard: overview and introduction to the fidelity range extensions,” SPIE Conference on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.

[30] T. Wiegand et al., “Introduction to the Special Issue on Scalable Video Coding - Standardization and Beyond,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, pp. 1034, Sept. 2007.

[31] Von Roden and T. Praktische, “H.261 and MPEG1 - A comparison,” Conference Proceedings of the 1996 IEEE Fifteenth Annual International Phoenix Conference on Computers and Communications, pp. 65-71, Mar. 1996.

[32] S. Srinivasan et al., “Windows Media Video 9: overview and applications,” Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.

[33] S. K. Kwon, A. Tamhankar and K.R. Rao, “An overview of H.264/MPEG-4 Part 10,” Special issue of Journal of Visual Communication and Image Representation, vol. 17, pp. 186-216, April 2006.

[34] G.A. Davidson et al., “ATSC video and audio coding,” Proc. IEEE, vol. 94, pp. 60-76, Jan. 2006.

[35] J. Bialkowski, M. Barkowsky and A. Kaup, “Overview of low complexity video transcoding from H.263 to H.264,” IEEE ICME, pp. 49-52, 2006.

[36] T. D. Nguyen et al., “Efficient MPEG-4 to H.264/AVC transcoding with spatial downscaling,” ETRI Journal, vol. 29, no. 6, pp. 826-828, Dec. 2007.

[37] H. Kalva, G.F. Escribano and K. Kunzelmann, “Reduced resolution MPEG-2 to H.264 transcoder,” Proc. SPIE, vol. 7257, 72571V, Jan. 2009.

[38] S. Moiron et al., “H.264/AVC to MPEG-2 video transcoding architecture,” Proc. Conf. on Telecommunications - ConfTele, Peniche, Portugal, vol. 1, pp. 449-452, May 2007.

[39] S. Moiron et al., “Video transcoding from H.264/AVC to MPEG-2 with reduced computational complexity,” Signal Processing: Image Communication, vol. 24, pp. 637-650, September 2009.

[40] Mei-Juan Chen, Ming-Chung Chu and Chih-Wei Pan, “Efficient motion-estimation algorithm for reduced frame-rate video transcoder,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 12, pp. 269-275, Apr. 2002.

[41] ISO/IEC 11172-2:1993, Information technology -- Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s -- Part 2: Video.

[42] H. Kalva and J.B. Lee, “The VC-1 video coding standard,” IEEE Multimedia, vol. 14, pp. 88-91, Oct.-Dec. 2007.

[43] P. Bordes and A. Orhand, “Improved algorithm for fast transcoding H.264,” EUSIPCO 2007.

REFERENCE BOOKS:

[44] K. Sayood, “Introduction to Data Compression,” 3rd edition, Morgan Kaufmann Publishers, 2006.

[45] I.E.G. Richardson, “H.264 and MPEG-4 Video Compression: Video Coding for Next-Generation Multimedia,” Wiley, 2003.

[46] K. R. Rao and P. C. Yip, “The Transform and Data Compression Handbook,” Boca Raton, FL: CRC Press, 2001.

[47] K.R. Rao and J.J. Hwang, “Techniques and Standards for Image, Video, and Audio Coding,” Prentice Hall, 1996.

[48] J.B. Lee and H. Kalva, “The VC-1 and H.264 Video Compression Standards for Broadband Video Services,” Springer, 2008.

REFERENCE WEBSITES:

[49] JM software: http://iphome.hhi.de/suehring/tml/

[50] VC-1 software: http://www.smpte.org/home

[51] Microsoft website, VC-1 Technical Overview: http://www.microsoft.com/windows/windowsmedia/howto/articles/vc1techoverview.aspx#VC1ComparedtoOtherCodecs

[52] VC-1 Wikipedia site: http://en.wikipedia.org/wiki/VC-1

ACRONYMS:

ASO: Arbitrary slice ordering
AVC: Advanced Video Coding
B MB: Bi-predicted MB
CDDT: Cascaded DCT Domain Transcoder
CPDT: Cascaded Pixel Domain Transcoder
DCT: Discrete Cosine Transform
DSP: Digital Signal Processing
DVD: Digital Versatile Disc
FMO: Flexible macroblock ordering
FRExt: Fidelity Range Extensions
GOP: Group Of Pictures
I MB: Intra Predicted MB
IDCT: Inverse Discrete Cosine Transform
IEC: International Electrotechnical Commission
IQ: Inverse Quantizer
ISO: International Organization for Standardization
ITU-T: International Telecommunication Union, Telecommunication Standardization Sector
JVT: Joint Video Team
MB: Macroblock
MC: Motion Compensation
ME: Motion Estimation
MPEG: Moving Picture Experts Group
MSE: Mean Square Error
MV: Motion Vector
P MB: Inter Predicted MB
PSNR: Peak Signal-to-Noise Ratio
Q: Quantizer
R-D: Rate-Distortion
SDDT: Simplified DCT Domain Transcoder
SMPTE: Society of Motion Picture and Television Engineers
SP/SI: Switched P / Switched I
SSIM: Structural Similarity Index Measure
SVC: Scalable Video Coding
VCEG: Video Coding Experts Group
VLC: Variable Length Coding
VLD: Variable Length Decoder
YUV: Y - luminance and UV - chrominance
