-1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington

DVR H.264 Slides

Nov 07, 2014


imran_212

video format H.264
Transcript

-1- 2004. 10. 20.
Overview of H.264 / MPEG-4 Part10
Soon-kak Kwon, A. Tamhankar, K. R. Rao
Dongeui University, T-Mobile, University of Texas at Arlington

-2- Contents
1. Introduction
2. Layered Structure
3. Video Coding Algorithm
4. Error Resilience
5. Comparison of Coding Efficiency
6. Conclusions

-3- Introduction
Scope of Image and Video Coding Standards
Only the syntax and decoder are standardized:
  Optimization beyond the obvious
  Complexity reduction for implementation
  Provides no guarantees of quality
[Figure: processing chain Input (image / video), Pre-Processing, Encoding, Decoding, Post-Processing & Error Recovery, Output (image / video), with the scope of the standard marked]

-4- Introduction
Video Coding Standards
Standard | Main applications | Year
JPEG, JPEG2000 | Image | 1992-1999, 2000
JBIG | Fax | 1995-2000
H.261 | Video conferencing | 1990
H.262, H.262+ | DTV, SDTV | 1995, 2000
H.263, H.263++ | Videophone | 1998, 2000
MPEG-1 | Video CD | 1992
MPEG-2 | DTV, SDTV, HDTV, DVD | 1995
MPEG-4 | Interactive video | 2000
MPEG-7 | Multimedia content description interface | 2001
MPEG-21 | Multimedia framework | 2002
H.264/MPEG-4 part 10 | Advanced Video Coding | 2003
H.264/MPEG-4 part 10 | Fidelity Range Extensions (High profile), studio editing, post processing, digital cinema | 2004 August

-5- Introduction
MPEG-1
Formally ISO/IEC 11172-2 ('93), developed by ISO/IEC JTC1 SC29 WG11 (MPEG); use is fairly widespread, but mostly overtaken by MPEG-2
Superior quality compared to H.261 when operated at higher bit rates (> 1 Mbps for CIF 352x288 resolution)
Provides approximately VHS quality between 1-2 Mbps using SIF 352x240/288 resolution
Additional technical features:
  Bi-directional motion prediction (B-pictures)
  Half-pel motion vector resolution
  Slice-structured coding
  DC-only D pictures

-6- Introduction
Predictive Coding with B Pictures
[Figure: picture sequence I B P B P]

-7- Introduction
MPEG-2 / H.262
Formally ISO/IEC 13818-2 & ITU-T H.262, developed (1994) jointly by ITU-T and ISO/IEC SC29 WG11 (MPEG)
Now in wide use for DVD and standard & high-definition DTV (the most commonly used video coding standard)
Primary new technical features:
  Support for interlaced-scan pictures
Also:
  Various forms of scalability (SNR, spatial, temporal and hybrid)
  I-picture concealment motion vectors
Essentially the same as MPEG-1 for progressive-scan pictures, and MPEG-1 forward compatibility is required
Not especially useful below 2-3 Mbps (range ~2-5 Mbps SDTV broadcast, 6-8 Mbps DVD, 18 Mbps HDTV); picture skipping is not easy

-8- Introduction
H.263 : The Next Generation
ITU-T Rec. H.263 (v1: 1995): the next generation of video coding performance, developed by ITU-T; the current premier ITU-T video standard (has overtaken H.261 as the dominant videoconferencing codec)
Superior quality to prior standards at all bit rates (except perhaps for interlaced video)
Wins by a factor of two at very low rates
Version 2 (late 1997 / early 1998) & version 3 (2000), i.e. H.263+ & H.263++ (extensions to H.263), were developed later with a large number of new features
Profiles defined early 2001

-9- Introduction
MPEG-4 Visual : Baseline H.263 and Many Creative Extras
MPEG-4 Visual (formally 14496-2, v1: early 1999): contains the H.263 baseline design and adds essentially all prior features and many creative new extras:
  Segmented coding of shapes
  Scalable wavelet coding of still textures
  Mesh coding
  Face animation coding
  Coding of synthetic and semi-synthetic content
  10 & 12-bit sampling
  More added later in v2 (early 2000) & v3 (early 2001)

-10- Introduction
Relationship to Other Standards
Same design to be approved in both ITU-T / VCEG and ISO/IEC / MPEG
In ITU-T / VCEG this is a new & separate standard:
  ITU-T Recommendation H.264
  ITU-T Systems (H.32x) is modified to support it
In ISO/IEC / MPEG this is a new part in the MPEG-4 suite:
  Separate codec design from prior MPEG-4 Visual (Part 2)
  New part 10 called Advanced Video Coding (AVC, similar to AAC in MPEG-2 as a separate audio codec)
Not backward or forward compatible with prior standards
MPEG-4 Systems / File Format is being modified to support it
H.222.0 | MPEG-2 Systems is also being modified to support it
IETF is working on RTP payload packetization

-11- Introduction
History of H.264 / MPEG-4 part 10
ITU-T Q.6/SG16 started work on H.26L (L: Long Range)
July 2001: H.26L demonstrated at the MPEG (Moving Picture Experts Group) call for technology
December 2001: ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG started a joint project, the Joint Video Team (JVT)
May 2003: Final approval from ISO/IEC and ITU-T
The standard is named H.264 by ITU-T and MPEG-4 part 10 by ISO/IEC
Fidelity Range Extensions (August 2004): Amendment 1
Transport of MPEG-4 AVC on MPEG-2 TS: Amendment 3

-12- Introduction
Purpose of H.264 / MPEG-4 part 10
Higher coding efficiency than previous standards (MPEG-1, 2, 4 part 2, H.261, H.263)
Simple syntax specifications
Seamless integration of video coding into all current protocols
More error robustness
Various applications like video broadcasting, video streaming, video conferencing, D-Cinema, HDTV
Network friendliness
Balance between coding efficiency, implementation complexity and cost, based on the state of the art in VLSI design technology

-13- Introduction
H.264 / MPEG-4 part 10 Architecture

-14- Introduction
Applications of H.264 / MPEG-4 part 10 : a broad range of applications for video content, including but not limited to the following:
  Video streaming over the internet
  CATV : Cable TV on optical networks, copper, etc.
  DBS : Direct broadcast satellite video services
  DSL : Digital subscriber line video services
  DTTB : Digital terrestrial television broadcasting, cable modem, DSL
  ISM : Interactive storage media (optical disks, etc.)
  MMM : Multimedia mailing
  MSPN : Multimedia services over packet networks
  RTC : Real-time conversational services (videoconferencing, videophone, etc.)
  RVS : Remote video surveillance
  SSM : Serial storage media (digital VTR, etc.)
  D-Cinema : Content contribution, content distribution, studio editing, post processing

-15- Introduction
Profiles and Levels for particular applications
Profile : a subset of the entire bitstream syntax; decoder designs differ by Profile
Four profiles : Baseline, Main, Extended and High
Profile | Applications
Baseline | Video conferencing, videophone
Main | Digital storage media, television broadcasting
Extended | Streaming video
High | Content contribution, content distribution, studio editing, post processing

-16- Introduction
Specific coding parts for the Profiles

-17- Introduction
Common coding parts for the Profiles
I slice (Intra-coded slice) : a slice coded using prediction only from decoded samples within the same slice
P slice (Predictive-coded slice) : a slice coded using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block
CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18- Introduction
Coding parts for Baseline Profile
Common parts : I slice, P slice, CAVLC
FMO (Flexible macroblock order) : macroblocks may not necessarily be in the raster scan order.
The map assigns macroblocks to a slice group
ASO (Arbitrary slice order) : the macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture
RS (Redundant slice) : a slice belonging to redundant coded data, obtained at the same or a different coding rate compared with the previously coded data of the same slice

-19- Introduction
Coding parts for Main Profile
Common parts : I slice, P slice, CAVLC
B slice (Bi-directionally predictive-coded slice) : a slice coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block
Weighted prediction : a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20- Introduction
Coding parts for Extended Profile
Common parts : I slice, P slice, CAVLC
SP slice : a specially coded slice for efficient switching between video streams, coded similarly to a P slice
SI slice : a switching slice, coded similarly to an I slice
Data partition : the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
Flexible macroblock order (FMO)
Arbitrary slice order (ASO)
Redundant slice (RS)
B slice
Weighted prediction

-21- Introduction
Profile specifications
Feature | Baseline | Main | Extended | High
I & P slices | X | X | X | X
Deblocking filter | X | X | X | X
1/4-pel motion compensation | X | X | X | X
Variable block size (16x16 to 4x4) | X | X | X | X
CAVLC/UVLC | X | X | X | X
Error resilience tools (flexible MB order, ASO, redundant slices) | X | - | X | -
SP/SI slices | - | - | X | -
B slice | - | X | X | X
Interlaced coding | - | X | X | X
CABAC | - | X | - | X
Data partitioning | - | - | X | -

-22- Introduction
Application requirements

Application | Requirements | H.264 Profiles | MPEG-4 Profiles
Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)
Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)
Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP
Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)
Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP
Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile

-23- Introduction
Level : corresponds to the processing power and memory capability of a codec
Level number | Picture type & frame rate
1 | QCIF @ 15fps
1.1 | QCIF @ 30fps
1.2 | CIF @ 15fps
1.3 | CIF @ 30fps
2 | CIF @ 30fps
2.1 | HHR @ 15 or 30fps
2.2 | SDTV @ 15fps
3 | SDTV: 720x480x30i, 720x576x25i, 10Mbps (max)
3.1 | 1280x720x30p
3.2 | 1280x720x60p
4 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 20Mbps (max)
4.1 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 50Mbps (max)
4.2 | HDTV: 1920x1080x60i, 2Kx1Kx60p
5 | SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1 | SHDTV/D-Cinema: 4Kx2Kx30p

-24- Introduction
Parameter set limits for each Level
Columns: Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs
1 | 1485 | 99 | 148.5 | 64 | 175 | [-64,+63.75] | 2 | -
1.1 | 3000 | 396 | 337.5 | 192 | 500 | [-128,+127.75] | 2 | -
1.2 | 6000 | 396 | 891.0 | 384 | 1000 | [-128,+127.75] | 2 | -
1.3 | 11880 | 396 | 891.0 | 768 | 2000 | [-128,+127.75] | 2 | -
2 | 11880 | 396 | 891.0 | 2000 | 2000 | [-128,+127.75] | 2 | -
2.1 | 19800 | 792 | 1782.0 | 4000 | 4000 | [-256,+255.75] | 2 | -
2.2 | 20250 | 1620 | 3037.5 | 4000 | 4000 | [-256,+255.75] | 2 | -
3 | 40500 | 1620 | 3037.5 | 10000 | 10000 | [-256,+255.75] | 2 | 32
3.1 | 108000 | 3600 | 6750.0 | 14000 | 14000 | [-512,+511.75] | 4 | 16
3.2 | 216000 | 5120 | 7680.0 | 20000 | 20000 | [-512,+511.75] | 4 | 16
4 | 245760 | 8192 | 12288.0 | 20000 | 25000 | [-512,+511.75] | 4 | 16
4.1 | 245760 | 8192 | 12288.0 | 50000 | 62500 | [-512,+511.75] | 2 | 16
4.2 | 491520 | 8192 | 12288.0 | 50000 | 62500 | [-512,+511.75] | 2 | 16
5 | 589824 | 22080 | 41310.0 | 135000 | 135000 | [-512,+511.75] | 2 | 16
5.1 | 983040 | 36864 | 69120.0 | 240000 | 240000 | [-512,+511.75] | 2 | 16

-25- Layered Structure
Two layers : Network Abstraction Layer (NAL), Video Coding Layer (VCL)
NAL
  Abstracts the VCL data, hence the name Network Abstraction Layer
  Header information about the VCL format
  Appropriate for conveyance by the transport layers or storage media
  NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems
VCL
  Core coding layer
  Concentrates on attaining maximum coding efficiency

-26- Layered Structure
Elements of VCL

-27- Layered Structure
Supporting picture format : 4:2:0 chroma sampling
[Figure: CIF format (Y: 352 pels x 288 lines; Cb and Cr: 176 pels x 144 lines each) and QCIF format (Y: 176 pels x 144 lines; Cb and Cr: 88 pels x 72 lines each)]

-28- Video Coding Algorithm
Block diagram for H.264 encoder
[Figure: Video Input, Intra/Inter Mode Decision, Intra Prediction, Motion Estimation, Motion Compensation, Transform & Quantization, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Entropy Coding, Bitstream Output]

-29- Video Coding Algorithm
Block diagram for H.264 decoder
[Figure: Bitstream Input, Entropy Decoding, Inverse Quantization & Inverse Transform, Intra/Inter Mode Selection, Intra Prediction, Motion Compensation, Deblocking Filter, Picture Buffering, Video Output]

-30- VC Algorithm : Intra Prediction
Exploits spatial redundancy between adjacent macroblocks in a frame
4 x 4 luma block
9 prediction modes : 8 directional predictions and 1 DC prediction (vertical : 0, horizontal : 1, DC : 2, diagonal down left : 3, diagonal down right : 4, vertical right : 5, horizontal down : 6, vertical left : 7, horizontal up : 8)
[Figure: 4 x 4 block with samples a to p (rows abcd, efgh, ijkl, mnop); A to H above, I to L to the left, M above-left; arrows for prediction modes 0, 1, 3, 4, 5, 6, 7, 8]
Samples a, b, ..., p : the predicted ones for the current block
Above and left samples A, B, ..., M : previously reconstructed ones

-31- VC Algorithm : Intra Prediction
Example of 4 x 4 luma block
Samples a, d : predicted by round(I/4 + M/2 + A/4), round(B/4 + C/2 + D/4) for mode 4
Samples a, d : predicted by round(I/2 + J/2), round(J/4 + K/2 + L/4) for mode 8
[Figure: the 4 x 4 block layout with prediction directions for mode 4 and mode 8]

-32- VC Algorithm : Intra Prediction
16 x 16 luma
4 prediction modes (vertical : 0, horizontal : 1, DC : 2, plane : 3)
Plane : works well in smoothly varying luminance. A linear plane function is fitted to the upper (H) and left side (V) samples
(8x8) luma (FRExt only) : similar to 4x4 luma, with low pass filtering of the predictor to improve prediction performance
[Figure: plane prediction]

-33- VC Algorithm : Intra Prediction
Chroma always operates using full MB prediction
(8x8) for 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4 (similar to the 16x16 luma block but with a different mode order)
4 prediction modes (DC : 0, horizontal : 1, vertical : 2, plane : 3)

-34- VC Algorithm : Inter Prediction
Exploits temporal redundancy
Prediction of variable block sizes
Sub-pel motion compensation
Deblocking filter
Management of multiple reference pictures

-35- VC Algorithm : Inter Prediction
Prediction of variable block size
A MB can be partitioned into smaller block sizes
4 cases for 16 x 16 MB, 4 cases for 8 x 8 sub-MB
Large partition size : homogeneous areas; small : detailed areas
Cannot mix the two partitions, i.e., cannot have 16x8 and 4x8 partitions
When the sub-MB partition (8x8) is selected, the (8x8) block can be further partitioned

-36- VC Algorithm : Inter Prediction
Sub-pel motion compensation
Better compression performance than integer-pel MC
Comes at the expense of increased complexity
Outperforms at high bit rates and high resolutions
[Figure: H.264 encoder block diagram with motion estimation and motion compensation highlighted]
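The partitioning rules on slide 35 can be checked with a small counting exercise. The sketch below is only an illustration, not part of the standard: the names `MB_PARTITIONS`, `SUB_MB_PARTITIONS`, `blocks_in` and `motion_vectors` are hypothetical, and it assumes one motion vector per partition (single-reference prediction as in a P slice).

```python
# Partition shapes (width, height) listed on the slides.
MB_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_MB_PARTITIONS = [(8, 8), (8, 4), (4, 8), (4, 4)]


def blocks_in(side, shape):
    """Number of blocks of the given shape that tile a square of the given side."""
    width, height = shape
    return (side // width) * (side // height)


def motion_vectors(mb_shape, sub_shapes=None):
    """Count motion vectors for one macroblock, assuming one MV per partition.

    sub_shapes gives the partition chosen for each of the four 8x8
    sub-macroblocks; it is only used when mb_shape is (8, 8).
    """
    if mb_shape not in MB_PARTITIONS:
        raise ValueError("unknown macroblock partition")
    if mb_shape != (8, 8):
        return blocks_in(16, mb_shape)
    if sub_shapes is None:
        sub_shapes = [(8, 8)] * 4
    if len(sub_shapes) != 4 or any(s not in SUB_MB_PARTITIONS for s in sub_shapes):
        raise ValueError("expected four valid sub-macroblock partitions")
    # Each 8x8 sub-macroblock may choose its own sub-partition independently.
    return sum(blocks_in(8, shape) for shape in sub_shapes)


if __name__ == "__main__":
    for shape in MB_PARTITIONS:
        print(shape, motion_vectors(shape))
    print("all 4x4:", motion_vectors((8, 8), [(4, 4)] * 4))
```

Choosing 4x4 for all four sub-macroblocks gives 16 motion vectors, which agrees with the "up to 16 MVs per MB" entry in the comparison table of the Conclusions.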

[Figure callout: motion vector accuracy 1/4 (6-tap filter)]
[Figure: MB partitions 16x16, 16x8, 8x16, 8x8 and sub-MB partitions 8x8, 8x4, 4x8, 4x4, with partition indices]

-37- VC Algorithm : Inter Prediction
Sub-pel accuracy
A distinct MV can be sent for each sub-MB partition.
ME can be based on multiple pictures that lie in the past or in the future in display order.
The reference picture for ME is selected at the MB partition level.
Sub-MB partitions within the same MB partition must use the same reference picture.
[Figure legend: integer position pixels, 1/8 pixels, 1/2 and 1/4 pixels]

-38- VC Algorithm : Inter Prediction
Half-pel : interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
Quarter-pel : produced using bilinear interpolation between neighboring half- or integer-pel samples
[Figure: integer-pel samples (upper-case letters A to U) with the half-pel and quarter-pel positions between them (lower-case letters)]
b = round((E - 5F + 20G + 20H - 5I + J)/32)
a = round((G + b)/2)

-39- VC Algorithm : Inter Prediction
Deblocking filter
Adaptive filter that reduces blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
Filtering is applied to horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block edge, sample)
[Figure: vertical and horizontal edges filtered for luma and chroma in a 16x16 macroblock]

-40- VC Algorithm : Inter Prediction
Management of multiple reference pictures
Takes care of marking some stored pictures as unused and deciding which pictures to delete from the buffer
[Figure: H.264 encoder block diagram, highlighting management of multiple reference pictures (short term, long term)]

-41- VC Algorithm : Transform & Quantization
Transform
Integer transform, multiplier free : additions and shifts in 16-bit arithmetic
Hierarchical structure : 4 x 4 Integer DCT + Hadamard transform
[Figure: coding order of the 4x4 luma blocks in a macroblock]
   0  1  4  5
   2  3  6  7
   8  9 12 13
  10 11 14 15
[Figure: indices of the DC coefficients]
  00 01 02 03
  10 11 12 13
  20 21 22 23
  30 31 32 33
Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT; (0,0), (0,1), (0,2), ..., (3,3) are the DC coefficients of each 4x4 block
The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT.
Similarly for the chroma: the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42- VC Algorithm : Transform
4 x 4 integer DCT
X : input pixels, Y : output coefficients
$$
Y = \left( C_f X C_f^T \right) \otimes E_f =
\left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}
\right) \otimes
\begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}
$$
$$ a = \frac{1}{2}, \quad b = \sqrt{\frac{2}{5}}, \quad d = \frac{1}{2} $$
$\otimes$ implies element-by-element multiplication

-43- 4x4 Inverse IntDCT
$$
Y' = Y \otimes E_i = Y \otimes
\begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}
$$
Here $X = C_i^T \left( Y \otimes E_i \right) C_i$
In both the forward and inverse transforms, the quantization step (QP) is embedded in the matrices $E_f$ and $E_i$

-44- VC Algorithm : Transform
Luma DC coefficients for Intra 16x16 MB
The 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh Hadamard transform
$$
Y_D = \left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right) \mathbin{/\!/} 2
$$
where // = rounding to the nearest integer

-45- VC Algorithm : Transform
Chroma DC coefficients
Intra prediction mode, (4x4) IntDCT
Walsh Hadamard transform : 2 x 2 DC coefficients
$$
Y_D =
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
$$
[Figure: chroma blocks for 4:2:0 : 2x2 DC blocks 16 (U) and 17 (V), AC blocks 18 to 25]
For the 4:2:2 and 4:4:4 chroma formats the Hadamard block size is increased.

-46- VC Algorithm : Transform
Block diagram emphasizing transform
[Figure: H.264 encoder block diagram with the transform highlighted]
4 x 4 integer DCT transform with
$$ H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} $$
Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47- VC Algorithm : Quantization
The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
Encoder : post-scaling and quantization
$$ Y_{ij} = \mathrm{round}\left( X_{ij} \cdot \frac{SF_{ij}}{Qstep} \right) $$
Decoder : inverse quantization and pre-scaling
$$ X'_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij} $$
X : quantizer input
Y : quantizer output
Qstep : quantization step size, selected by the quantization parameter QP; QP takes a total of 52 values for 8 bits per decoded sample, and Qstep doubles in size for every increment of 6 in QP. FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample
SF : scaling term

-48- VC Algorithm : Transform, Quantization
Rescale and inverse transform
[Figure: encoder part : input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only, i.e. Intra (16x16) prediction mode), post-scaling and quantization, encoder output / decoder input; decoder part : 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling, inverse transform, output block]

-49- VC Algorithm : Entropy Coding
All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
Scan order to read the residual data (quantized transform coefficients) : zig-zag, alternate
Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles
Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile
(a) Zig-zag scan
   0  1  5  6
   2  4  7 12
   3  8 11 13
   9 10 14 15
(b) Alternate scan
   0  2  8 12
   1  5  9 13
   3  6 10 14
   4  7 11 15

-50- Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed, and are also called Universal Variable Length Codes (UVLC))

-51- These are variable length codes with a regular construction
[M zeroes] [1] [INFO]
INFO is an M-bit field carrying information.
The first codeword has no leading zero or trailing INFO.
Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.
The length of each Exp-Golomb codeword is (2M+1) bits.
M = floor(log2(code_num + 1))
INFO = code_num + 1 - 2^M

-52- Decoding
1. Read in M leading zeroes followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1 (for codeword 0, INFO and M are zero)
CAVLC : codes transform coefficients
CABAC : codes transform coefficients and MVs
All other syntax elements are coded with the Exp-Golomb codes

-53- VC Algorithm : Entropy Coding
CAVLC : handles zeros and +/-1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and +/-1 are coded. For the other coefficients, their levels are coded.
Encoding steps
step 1 : encode the total number of nonzero coefficients and +/-1 (trailing ones) values
step 2 : encode the sign of each trailing one in reverse order
step 3 : encode the levels of the remaining non-zero coefficients in reverse order
step 4 : encode the total number of zeros before the last coefficient
step 5 : encode each run of zeros
H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)
These are adapted to the current stream or context (thus CAVLC)

-54- VC Algorithm : Entropy Coding
Example of CAVLC
order  : 0  1  2  3  4  5  6  7  8  9  ...  16
coeff. : c0 c1 c2 0  1  1  0  -1 0  0  ...  0
Step 1 : encode the no. of nonzero total coefficients and the no. of +1 or -1 (trailing ones) from a look-up table
  no. of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7)
  no. of trailing ones = 3 (order 4, 5, 7)
Step 2 : encode the sign of each trailing one in reverse order
  - (order 7), + (order 5), + (order 4)
Step 3 : encode the level of the remaining non-zero coefficients in reverse order
  c2 (order 2), c1, c0
Step 4 : encode the total no. of zeros before the last coefficient
  2 (order 3, 6)
Step 5 : encode each run of zeros in reverse order
  1 (order 6-5), 0 (order 4), 1 (order 3-2)

-55- VC Algorithm : Entropy Coding
CABAC : utilizes arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated.
Both MVs and residual transform coefficients are coded by CABAC.
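The Exp-Golomb construction and decoding steps from slides 51 and 52 can be sketched as follows. This is a minimal illustration rather than decoder code: bits are kept in a Python string of "0"/"1" characters for readability, the function names are hypothetical, and the input to the decoder is assumed to be well formed.

```python
def exp_golomb_encode(code_num):
    """Return the codeword [M zeroes][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at pos; return (code_num, next position)."""
    m = 0
    while bits[pos + m] == "0":              # step 1: count M leading zeroes
        m += 1
    start = pos + m + 1                      # skip the separating "1"
    info = int(bits[start:start + m], 2) if m > 0 else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m    # step 3: code_num = 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(7):
        print(n, exp_golomb_encode(n))
```

Each codeword produced this way is 2M+1 bits long, so codeword 0 is the single bit "1", codewords 1 and 2 take three bits, and codewords 3 to 6 take five bits.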
Encoding steps (CABAC)
step 1 : context modeling : choose a suitable model
step 2 : binarization : if a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
step 3 : binary arithmetic coding using the probability estimates provided by context modeling

-56- CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57- VC Algorithm : B Slice
Generalized bidirectional prediction
  Supports not only the forward/backward prediction pair, but also forward/forward and backward/backward pairs
Direct mode
  Derives reference picture, block size, and motion vector data from the subsequent inter picture.
Weighted prediction
  Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice.
Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58- VC Algorithm : B Slice
Generalized bidirectional prediction
Multiple reference pictures mode
Two forward references : proper for a region just before a scene change
Two backward references : proper for a region just after a scene change
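As a rough illustration of generalized bidirectional prediction combined with weighted prediction, the sketch below forms a prediction block from two motion-compensated reference blocks. It is a simplification under stated assumptions: the function name `bipredict` is hypothetical, weights are plain floating-point factors, and 8-bit samples are assumed; the standard itself specifies integer arithmetic with additional parameters that are not modeled here. The two references may be a forward/backward, forward/forward or backward/backward pair; the arithmetic does not depend on which.

```python
import math


def bipredict(ref_a, ref_b, w_a=0.5, w_b=0.5, max_value=255):
    """Combine two motion-compensated reference blocks sample by sample.

    ref_a and ref_b are flat lists of samples of equal length. With the
    default weights the result is the rounded average of the two blocks.
    """
    if len(ref_a) != len(ref_b):
        raise ValueError("reference blocks must have the same size")
    prediction = []
    for a, b in zip(ref_a, ref_b):
        value = math.floor(w_a * a + w_b * b + 0.5)    # round to nearest
        prediction.append(min(max(value, 0), max_value))  # clip to sample range
    return prediction


if __name__ == "__main__":
    bright = [100, 200, 120, 80]
    dark = [50, 100, 60, 40]
    print(bipredict(bright, dark))              # equal weights
    print(bipredict(bright, dark, 0.25, 0.75))  # leaning toward the darker block
```

Unequal weights let the prediction follow a gradual brightness change between the two references, which is the situation weighted prediction is intended for.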

[Figure: previous pictures, current picture and next pictures, showing 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59- VC Algorithm : B Slice
Direct mode
Forward / backward pair of bi-directional prediction
The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures.
[Figure: List 0 reference, current picture and List 1 reference, with the direct-mode partition, the co-located partition, the temporal distances tb and td, and the vectors mvCol, mvL0, mvL1]
mvL0 = tb x mvCol / td
mvL1 = (td - tb) x mvCol / td
where mvCol is the MV used in the co-located MB of the subsequent picture

-60- VC Algorithm : B Slice
Weighted prediction
Different weights of reference signals for gradual transitions from scene to scene, e.g., fade to black (the luma samples of the scene gradually approach zero) or fade from black
Different weighted prediction methods for a macroblock of a P slice or a B slice
A prediction signal p for a B slice is obtained by different weights from two reference signals, r1 and r2:
p = w1 x r1 + w2 x r2, where w1 and w2 are weighting factors
Implicit type : the factors are calculated based on the temporal distance between the pictures
Explicit type : the factors are transmitted in the slice header

-61- VC Algorithm : SP and SI Slices (Extended profile only)
Switched slice
SP slice : a specially coded slice for efficient switching between video streams, coded similarly to a P slice
SI slice : a switching slice, coded similarly to an I slice
[Figure: Bitstream A with pictures P(1,1) to P(1,5), Bitstream B with pictures P(2,1) to P(2,5), and switching picture S(3) between them]
Allows bitstream switching and additional functionalities such as random access, fast forward, reverse and stream splicing.

-62- Error Resilience
Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slice SP/SI (only in Extended Profile)
Data partitioning (only in Extended Profile)
Arbitrary Slice Order (ASO)

-63- Data partitioning slices (Extended profile only)
1. Coded data of a slice is placed in three separate data partitions A, B & C.
2. A has the slice header and header data for each MB in the slice
3. B has coded residual data for intra and SI slice MBs
4. C has coded residual data for inter coded MBs
5. Place each partition A, B & C in a separate NAL unit and transport separately

-64- Error Resilience : Parameter setting
The sequence parameter set contains all information related to a sequence of pictures.
A picture parameter set contains all information related to all the slices belonging to a single picture.
The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice.
[Figure: H.264 encoder and H.264 decoder, each holding parameter sets 1, 2, 3; Parameter Set #3 lists video format NTSC, motion resolution, Enc: CABAC, frame width; reliable parameter set exchange; VCL data transfer with PS #3]

-65- Error Resilience : FMO
Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.
Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and the macroblocks in each slice group are dispersed through the picture.
If the packet containing the information of slice group 1 is lost during transmission, then the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.
ASO is similar to FMO. It randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data.

-66- Error Resilience : Redundant Slice
Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream.
For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence, in a much coarser quality, but also utilizing fewer bits).
A decoder reacts to redundant slices by reconstructing only the primary slice, if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.

-67- Comparison of Coding Efficiency
Subjective verification test
Comparison of the H.264 Baseline Profile (BP) and MPEG-4 part 2 Simple Profile (SP) for the multimedia definition (MD).
The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter T indicates that H.264 achieved transparency.
H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Bitrate [kbps] | QCIF 24 | QCIF 48 | QCIF 96 | QCIF 192 | CIF 96 | CIF 192 | CIF 384 | CIF 768
Foreman | > 1x | 2x | 2x | T | 2x | > 2x | T | T
Paris | > 1x | 2x | 2x | - | 2x | 2x | T, 2x | T
Head | - | > 2x | 2x | - | 2x | - | T | T
Zoom | > 1x | 1x | 2x | - | 2x | - | - | -

-68- Comparison of Coding Efficiency
Subjective verification test
Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.
H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Bitrate [kbps] | QCIF 24 | QCIF 48 | QCIF 96 | QCIF 192 | CIF 96 | CIF 192 | CIF 384 | CIF 768
Football | 2x / 1x | 2x | 2x | - | > 1x | > 1x | 1x | > 1x
Mobile | 2x / 1x | 2x | 2x | - | > 2x | 4x | > 2x | T
Husky | 2x | 2x | > 1x | - | 2x | 2x | 2x | -
Tempete | 2x | 2x | > 2x | T | 2x | 2x | T, 2x | T

-69- Comparison of Coding Efficiency
Subjective verification test
Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)
When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.
When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Bitrate [Mbps] | HiQ 1.5 | HiQ 2.25 | HiQ 3 | HiQ 4 | HiQ 6 | TM5 1.5 | TM5 2.25 | TM5 3 | TM5 4 | TM5 6
Football | > 1.5x | > 1.3x | 1.3x | 1.5x | - | 2x | 1.8x | 1.3x | 1.5x | -
Mobile | 4x | 2.7x | 2x | T | T | > 4x | > 2.7x | > 2x | T | T
Husky | > 1.5x | 1.3x | 1x / 1.3x | 1.5x | - | 2.7x / 2x | 1.8x | 2x | > 1.5x | -
Tempete | T, 2x | T | T | T | T | T, 4x | T | T | T | T

-70- Comparison of Coding Efficiency
Subjective verification test
Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)
When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.
When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Format | Sequence | HiQ 6 Mbps | HiQ 10 Mbps | HiQ 20 Mbps | TM5 6 Mbps | TM5 10 Mbps | TM5 20 Mbps
720 (60p) | Crew | 1.7x | 2x | T | 1.7x | 2x | T
720 (60p) | Harbour | T, 3.3x | T | T | T, 1.7x | T | T
1080 (30i) | Stockholm Pan | - | 1x | - | - | 2x | -
1080 (30i) | New Mobile & Calendar | - | T, 2x | T | - | T, 2x | T
1080 (25p) | River Bed | > 1.7x | > 1x | T | > 1.7x | > 1x | T
1080 (25p) | Vintage Car | 1.7x | T, 2x | T | 1.7x | T, 2x | T

-71- Comparison of Coding Efficiency
Objective test
PSNR (between original and reconstructed pictures) and bitrate saving results of the Tempete CIF 15 Hz sequence for the video streaming application.
Legend: HLP = High Latency Profile; ASP = Advanced Simple Profile; H.26L = H.264 Main Profile

-72- Comparison of Coding Efficiency
Objective test
PSNR and bitrate saving results of the Paris CIF 15 Hz sequence for the video conferencing application.
Legend: CHC = Conversational High Compression; SP = Simple Profile; ASP = Advanced Simple Profile; H.26L = H.264 Baseline Profile

-73- Conclusions
H.264 outperforms the previous standards.
Comparison of standards

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74- Conclusions
Comparison of standards (continued)

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75- Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products.
Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76- Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77- Conclusions

Related group
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org
Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences/
- http://trace.eas.asu.edu/yuv/yuv.html

-78- Conclusions
H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms, decoder-encoder royalties for product manufacturers, and participation fees for video streaming services, regardless of Profile(s).
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com
FRExt
- Extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding / lossless coding for digital cinema applications
- High-fidelity coding for the next generation of optical discs
Extension for various applications
- H. Schwarz, D. Marpe and T. Wiegand, SNR scalable extension of H.264/AVC, ICIP 2004, vol., pp., Singapore, Oct. 2004.
FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79- Contacts for Further Information
JVT documents and software on open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring
JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts
JVT management team:
- Chair: Gary Sullivan
- Co-chair: Ajay Luthra
- Co-chair: Thomas Wiegand
Authors:
- Dr. K. R. Rao, UTA
- Dr. S. K. Kwon, Dongeui University
- Ms. A. Tamhankar, T-Mobile

-91- Fidelity Range Extensions
Slices in a picture are compressed as follows:
"Intra" spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction
- 4:2:2, 4:4:4 formats
- > 8 bit depths
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates
- [use RGB color format] YCgCo
- High resolution

-92- "Inter" temporal prediction: block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93- Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  MacroBlock Adaptive Frame Field (MBAFF)
o Field scan
Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)
In MBAFF, the choice of compression (frame or field) is made for each vertical pair of macroblocks.
-94- 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)
Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)
Scalar quantization
Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)
Logarithmic control of quantization step size as a function of the quantization control parameter

-95- Deblocking filter (within the motion compensation loop)
Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)
Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96- Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices
SP and SI synchronization pictures for streaming and other uses

-97- Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)
4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats
Auxiliary pictures for alpha blending (FRExt-only)
Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98- Slice
- I (Intra)
- P (Predicted)
- B (Bidirectionally predicted) (reference for temporal prediction or non-reference)
- SP (Switching P)
- SI (Switching I)

-99- I Slice (MB in I slice and intra MB in P and B slices)
Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks.
Apply (4x4) or (8x8) IntDCT to intra prediction errors. Note: (8x8) IntDCT is FRExt-only.
After (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only).

-100- Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC).
PicAFF: field processing is similar to frame mode.
MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction.
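The frame (zig-zag) scan mentioned above can be illustrated with a short sketch. It assumes a 4x4 block of quantized coefficients stored row by row and walks the anti-diagonals in alternating directions; the function names zigzag_order and zigzag_scan are hypothetical and are not taken from the slides or from the reference software. The field (alternate) scan uses a different fixed order and is not shown.

```python
def zigzag_order(n=4):
    """Return the (row, col) visiting order of an n x n zig-zag (frame) scan."""
    order = []
    for s in range(2 * n - 1):                     # s = row + col indexes one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        cells = [(r, s - r) for r in rows]         # listed from the top-right end downward
        if s % 2 == 0:
            cells.reverse()                        # even diagonals run bottom-left to top-right
        order.extend(cells)
    return order


def zigzag_scan(block):
    """Flatten a square block of quantized coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


if __name__ == "__main__":
    # Illustrative block: energy concentrated in the low-frequency (top-left) corner.
    block = [
        [9, 4, 1, 0],
        [3, 2, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(zigzag_scan(block))
```

Because the low-frequency coefficients come first, the nonzero values cluster at the start of the scanned list and the trailing run of zeros can be coded compactly by the entropy coder.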
-101- I Slice (Spatial Prediction)
(16x16) luma and corresponding chroma block size for full MB prediction
(8x8) luma prediction (FRExt-only)
(4x4) luma prediction

-102- For (16x16) luma, full MB prediction has four modes:
- Vertical: pels in the MB are predicted from the pels just above the MB
- Horizontal: pels in the MB are predicted from the pels just left of the MB
- DC: pels in the MB are predicted as the average value of the neighboring pels
- Planar prediction: assume the MB covers diagonally increasing luma values. The predictor is formed based upon the planar equation.

-103- Chroma spatial prediction (operates on entire MB), similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar
- 4:2:0 (8x8)
- 4:2:2 (8x16)
- 4:4:4 (16x16)

-104- (FRExt only) For (8x8) luma intra prediction: nine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

-105- (FRExt only) Integer 8x8 transform (luma only)

-106- (FRExt only) HVS Weighting Matrices
- Matrix can be transmitted in SPS and PPS
- Separate matrix for 4x4 and 8x8 transforms
- Separate matrix for Inter and Intra
- Encoder can design and use customized scaling matrices. These are to be sent to the decoder at the sequence or picture level.
- Default matrices

-107- HVS Weighting Matrices
The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication).
Weighting matrices can be customized separately for:
- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108- (FRExt only) Two scans, similar to those of the 4x4 transform, switched for frame/field coding: frame zig-zag and field.
Coefficient scanning follows decreasing variances so as to maximize the number of zero-valued coefficients along the scan.

-109- Examples of parameters to be encoded
Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type mb_type | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identify reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110- Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed, and are also called Universal Variable Length Codes (UVLC))

-111- These are variable length codes with a regular construction: [M zeros] [1] [INFO]
INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing
INFO. Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on. The length of each Exp-Golomb codeword is (2M + 1) bits.
$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$
$\text{INFO} = \text{code\_num} + 1 - 2^M$

-112- Decoding
1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$
CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs
All other syntax elements are coded with the Exp-Golomb codes.

-113- ADOPTIONS
DVD Forum: High Profile is mandatory for HD DVD players.
The BD-ROM Video specification of the Blu-ray Disc Association: FRExt extensions are mandatory.
The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional. For HD, High is mandatory.
ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the U.S. and in various designs for satellite and cable television.

-114- For applications such as content contribution, content distribution, and studio editing and post-processing:
- Use more than 8 bits per sample of source video accuracy
- Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)
- Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-
- Use very high bit rates
- Use very high resolution
- Achieve very high fidelity, even representing some parts of the video losslessly
- Avoid color-space transformation rounding error
- Use RGB color representation

-116- High Profiles
- High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy
- High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample
- High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-
- High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error
All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118- (FRExt only) MB structure in 4:2:2 and 4:4:4 formats
4:2:2 MB: 16x16 Y, with 8x16 Cb and Cr
4:4:4 MB: 16x16 Y, with 16x16 Cb and Cr

-119- RGB to YCbCr
$Y = K_R R + (1 - K_R - K_B) G + K_B B$
$C_b = \dfrac{B - Y}{2(1 - K_B)}, \quad C_r = \dfrac{R - Y}{2(1 - K_R)}$
$K_R = 0.2126$; $K_B = 0.0722$; $K_R + K_B + K_G = 1$
$Y = 0.2126 R + 0.7152 G + 0.0722 B$
$C_b = 0.5389 (B - Y)$; $C_r = 0.6350 (R - Y)$
(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)

-120- Rounding error in RGB to YCbCr
(FRExt only) YCgCo: Cg = green chroma; Co = orange chroma
$Y = \dfrac{1}{2}\left[G + \dfrac{R + B}{2}\right], \quad C_g = \dfrac{1}{2}\left[G - \dfrac{R + B}{2}\right], \quad C_o = \dfrac{R - B}{2}$
To further avoid any rounding error, add only one bit of precision to the chroma samples.

-121- In 4:4:4 video, FRExt has a residual color transform. Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only. This eliminates color-space conversion error without significantly increasing the overall complexity of the system.

-122- Forward color space conversion
$C_o = R - B$
$t = B + (C_o \gg 1)$
$C_g = G - t$
$Y = t + (C_g \gg 1)$
where t is an intermediate temporary variable and >> denotes an arithmetic right shift operation.
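The shift-based forward conversion above can be undone exactly in integer arithmetic by reversing its steps, which is what makes it usable on residual data without rounding loss. The following is a minimal sketch of that round trip, assuming integer R, G, B inputs; the function names rgb_to_ycgco and ycgco_to_rgb are illustrative and do not come from the slides or the reference software.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion using the shift-based steps from the slide."""
    co = r - b
    t = b + (co >> 1)      # >> on Python ints is an arithmetic (floor) shift
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undo the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    # Exhaustive check over a coarse grid of 8-bit RGB triples.
    samples = range(0, 256, 17)
    ok = all(
        ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
        for r in samples for g in samples for b in samples
    )
    print("lossless round trip:", ok)
```

Note that Cg and Co can be negative and need one more bit than the input samples, which matches the remark that only one bit of precision is added to the chroma samples.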
Inverse color space conversion
$t = Y - (C_g \gg 1)$
$G = t + C_g$
$B = t - (C_o \gg 1)$
$R = B + C_o$

-123- SEI: Supplemental Enhancement Information
Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI).
Film grain characteristics SEI, which allow a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process.

-124- Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures.
Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye.

-125- New Profiles in the H.264/AVC FRExt Amendment
- A higher profile supports all capabilities of the lower ones
- It is also capable of decoding all bit streams encoded for the lower nested profiles
- All High profiles support all features of the Main profile

-126- Levels in H.264/AVC
Level 1b added in FRExt, for some 3G wireless environments.

-127- Levels in H.264/AVC
1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec.
2. Horizontal and vertical maximum sizes cannot be more than $\sqrt{8 \times (\text{total number of pixels per frame})}$.
3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.
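Returning to the Exp-Golomb construction described on the entropy coding slides ([M zeros] [1] [INFO], with M = floor(log2(code_num + 1))), the encoding and decoding steps can be sketched as follows. This is only an illustration for unsigned code numbers, using bit strings rather than a packed bitstream; the function names exp_golomb_encode and exp_golomb_decode are hypothetical.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for a non-negative code_num as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits           # [M zeros][1][INFO], 2M + 1 bits in total


def exp_golomb_decode(bits):
    """Decode one codeword from the front of a bit string.

    Returns (code_num, number_of_bits_consumed).
    """
    m = 0
    while bits[m] == "0":                      # step 1: count the M leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: read the M-bit INFO field
    return (1 << m) + info - 1, 2 * m + 1      # step 3: code_num = 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(7):
        print(n, exp_golomb_encode(n))
```

Short codewords go to small code numbers, so the mapping from syntax element values to code_num is chosen so that frequent values receive small numbers.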
-128- Compressed Bit Rate Multipliers for FRExt Profiles
To meet more demanding high-fidelity applications.
Multipliers for fourth column of table in page 125

-129- 24 frames/sec film, 1920x1080 progressive
The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps).
The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps.
[9] T. Wedi, Y. Kashiwagi, Subjective quality evaluation of H.264/AVC FRExt for HD movie content, JVT document JVT-L033, July 2004.

-130- Courtesy: Advanced Technology Group of Motorola BCS

-131- Courtesy: Advanced Technology Group of Motorola BCS

-132- Fig. 7: (a) - (e) Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3).
Courtesy: Advanced Technology Group of Motorola BCS
MP4 ASP yields a 1.5x coding gain over MPEG-2. MPEG-4 AVC yields a 2.0x coding gain over MPEG-2.

-133- High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps.
Nominally transparent video quality on 1080p24 at 16 Mbps.

-134- (FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt, thus the remark in the figure about potential future performance.

-135- High Profile Details: Deblocking Filter, CABAC, Signaling
Deblocking filter:
- Only control of the filter is adjusted: do not filter 4x4 blocks
- No change to the filter operation itself
CABAC:
- 61 new contexts and corresponding initialization values
- No change to the CABAC engine
Signaling:
- 8x8 transform on/off flag at PPS level
- 8x8 transform on/off flag per macroblock allows adaptive use

-136- High vs. Main Profile Summary
High Profile contains:
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP
Coding efficiency impact (measured as average bit-rate reduction):
- HD film: 12%
- HD video (progressive): 12%
- HD video (interlace): 4% (only 2 test clips)
- SD video (interlace): 6%
Complexity impact:
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137- Licensing of H.264/AVC Technology
Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com
These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138- AUDIO coding & systems
H.264 is limited to video.
Audio coder: bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB etc.).
DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE = High Efficiency).
ATSC has selected AC-3 plus from Dolby.
ATSC, SCTE, ARIB, MPEG etc. will continue to use MPEG-1 Audio, MPEG-2, AAC and AC-3.