Low Complexity H.264 Encoder using Machine Learning

THEJASWINI PURUSHOTHAM
ELECTRICAL ENGINEERING GRADUATE STUDENT
THE UNIVERSITY OF TEXAS AT ARLINGTON

ADVISOR: Dr. K. R. RAO, EE DEPT, UTA

8 September 2010

Agenda

Introduction
H.264/AVC
Machine learning
C4.5
Weka
Thesis approach
Results
Conclusions

Video compression and standardization

Importance of video
Need for compression
  High bandwidth requirements
  Remove inherent redundancy
Need for standardization
  Ensures interoperability

[Figure: timeline of video coding standards from 1992 to 2010 (MPEG1, MPEG2, H.263, MPEG4, H.264, VC-1, SVC, H.265/HEVC/NGVC), shown against coding efficiency, network awareness and complexity, with applications: video conferencing, HDTV, mobile phone, hand PC, mobile TV.]

MOTIVATION FOR THE RESEARCH


Motivation for a low complexity H.264 encoder

H.264 can achieve considerably higher coding efficiency than previous standards.

Motion estimation, the in-loop deblocking filter, sub-pel interpolation and mode decision are the main sources of complexity.

The high-computational complexity of H.264 and real-time requirements of video systems are the main challenges.


OVERVIEW OF H.264/AVC


Design Features Highlights

Features for enhancement of prediction
  Directional spatial prediction for intra coding: 9 intra 4x4 modes + 4 intra 16x16 modes + 9 intra 8x8 modes
  Variable block-size motion compensation with small block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
  Quarter-sample-accurate motion compensation
  Multiple reference picture motion compensation
  In-the-loop deblocking filtering to remove blocky artifacts

Features for improved coding efficiency
  Small block-size transform: 4x4 and 8x8 integer DCT
  Exact-match inverse transform
  Short word-length transform
  Hierarchical block transform
  Arithmetic entropy coding
  Context-adaptive entropy coding

H.264 - Encoder


H.264 Decoder


H.264 Decoder


OVERVIEW OF MACHINE LEARNING


Machine learning is a subfield of artificial intelligence.

It is the subject concerned with the design and development of algorithms and techniques that allow computers to learn.

The machine learning method used in this thesis extracts rules and patterns from large data sets.

The major focus of machine learning research is to extract information from data automatically, by computational and statistical methods.


C4.5 CLASSIFIER


C4.5 was developed by Ross Quinlan.

C4.5 (known as J48 in Weka) is a system that constructs classifiers.

Classifiers are one of the commonly used tools in data mining.

Such systems take as input a collection of cases, each belonging to one of a small number of classes and described by its values for a fixed set of attributes.

From these cases, the system builds a classifier that predicts the class to which a new case belongs.

C4.5 builds its decision tree by splitting the data, at each node, on the attribute with the highest normalized information gain (gain ratio).
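C4.5 ranks candidate attributes by information gain, normalized into a gain ratio. The following is a minimal sketch of that computation for discrete attributes only; the function names, the toy attribute values and the class labels are hypothetical and are not taken from the thesis or from Weka.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(rows, labels, attr_index):
    """Information gain of one discrete attribute divided by its split information."""
    total = len(labels)
    # Group the class labels by the value the attribute takes in each row.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(label)
    weights = [len(g) / total for g in groups.values()]
    remainder = sum(w * entropy(g) for w, g in zip(weights, groups.values()))
    info_gain = entropy(labels) - remainder
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): attribute 0 separates the classes, attribute 1 does not.
rows = [("low", "flat"), ("low", "textured"), ("high", "flat"), ("high", "textured")]
labels = ["16x16", "16x16", "8x8", "8x8"]
best_attribute = max(range(2), key=lambda i: gain_ratio(rows, labels, i))
```

In this toy data set the first attribute would be chosen for the root split. Real C4.5 additionally handles continuous attributes by searching for thresholds and prunes the resulting tree, which this sketch omits.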


Illustration of C4.5 classification


Decision tree


WEKA

Weka is a collection of machine learning algorithms for data mining tasks.

The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.

It is also well-suited for developing new machine learning schemes [25].


COMPLEXITY IN THE H.264 ENCODER


Figure 1: Multi-frame Motion Estimation.


The most computationally expensive process in H.264 encoding is motion estimation.

For example, assuming full search (FS), P block types, Q reference frames and a search range of M x N, M x N x P x Q computations are needed.
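To give a feel for the size of the M x N x P x Q product, here is a small sketch. The function name and the example numbers (a 32 x 32 search range, 7 block types, 5 reference frames) are assumptions chosen for illustration; they are not the configuration used in the thesis.

```python
def full_search_cost(search_width, search_height, block_types, reference_frames):
    """Number of block-matching computations per macroblock: M x N x P x Q."""
    return search_width * search_height * block_types * reference_frames


# Hypothetical settings: M = N = 32, P = 7 block types, Q = 5 reference frames.
cost = full_search_cost(32, 32, 7, 5)
print(cost)  # 35840
```

Halving the number of block types that have to be searched halves this count, which is why pruning the mode decision is an effective way to reduce encoder complexity.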


APPROACH IN THIS THESIS


Approach

J48 (C4.5) analysis is used to reduce the complexity of determining mode decisions.

The statistics for each 16x16 macroblock of the first four frames of the video sequence are calculated.

The statistics are the mean, the variance and the variance of means for all the sub-macroblock sizes in the macroblock, together with the mean, the variance and the variance of means for all the sub-macroblock sizes in the adjacent macroblocks.
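The per-macroblock statistics can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: a macroblock is taken to be a 16x16 list of luma samples, population variance is used, and the function and key names are hypothetical.

```python
from statistics import mean, pvariance


def block_means(mb, size):
    """Mean of every size x size sub-block of a 16x16 macroblock."""
    means = []
    for top in range(0, 16, size):
        for left in range(0, 16, size):
            pixels = [mb[r][c]
                      for r in range(top, top + size)
                      for c in range(left, left + size)]
            means.append(mean(pixels))
    return means


def macroblock_stats(mb):
    """Mean, variance and variance of sub-block means for one macroblock."""
    pixels = [p for row in mb for p in row]
    stats = {"mean": mean(pixels), "variance": pvariance(pixels)}
    for size in (8, 4):
        stats[f"var_of_means_{size}x{size}"] = pvariance(block_means(mb, size))
    return stats


# Hypothetical macroblock with a vertical edge: left half 0, right half 100.
edge_mb = [[0] * 8 + [100] * 8 for _ in range(16)]
edge_stats = macroblock_stats(edge_mb)
```

A flat macroblock gives zero for every variance, while the edge macroblock above gives a large variance of means, which is the kind of signal a decision tree can use to prefer smaller partitions.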


Figure 2: Flow chart of the process followed to achieve the low complexity encoder.

The modes for the same first four frames of the video sequences are determined by the H.264 encoder in the JM 16.2 software.

These modes and the computed statistics are given together as attributes for training in the WEKA tool.

This is an offline process.

The WEKA tool uses the C4.5 (J48) classifier algorithm to determine the mode decision tree.

A universal tree that can give relatively accurate mode decisions for any video sequence is developed.
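Weka reads training data in its ARFF text format, so the offline step amounts to writing one row per macroblock with its statistics and the mode chosen by JM. The sketch below shows one way such a file could be produced; the relation name, attribute names, class labels and sample values are hypothetical.

```python
def to_arff(relation, attribute_names, class_values, rows):
    """Build ARFF text: numeric attributes followed by one nominal class attribute."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in attribute_names]
    lines.append("@attribute mode {" + ",".join(class_values) + "}")
    lines += ["", "@data"]
    for *features, mode in rows:
        lines.append(",".join(str(v) for v in features) + "," + mode)
    return "\n".join(lines) + "\n"


# Hypothetical training rows: (mean, variance, mode selected by the encoder).
arff_text = to_arff(
    "mb_modes",
    ["mean", "variance"],
    ["mode_16x16", "mode_8x8"],
    [(50.0, 2500.0, "mode_8x8"), (12.0, 3.5, "mode_16x16")],
)
```

The resulting text can be saved with a `.arff` extension and opened in Weka, where the J48 classifier can be trained on it.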


Different combinations of video sequences are used for training and later testing the mode decision trees.

Table 1 summarizes the results.

The attributes most commonly used for mode decision across all the entries in the table are used to build the universal mode decision tree.

This tree is implemented in the form of if-else statements in the motion estimation block of JM 16.2.

Hence, the mode decision process is reduced to a series of if-else statements.
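A learned decision tree translates directly into nested if-else statements. The sketch below shows the shape of such code. The thesis places the real tree in the C source of JM 16.2; Python is used here only for illustration, and the attribute names, thresholds and leaf assignments are invented rather than taken from the trained tree.

```python
def sub_macroblock_mode(variance, var_of_means_4x4, residual_abs_sum):
    """Pick a sub-macroblock mode from block statistics (hypothetical thresholds)."""
    if variance <= 25.0:
        # Smooth block: keep the largest sub-macroblock partition.
        return "8x8"
    else:
        if var_of_means_4x4 <= 100.0:
            return "8x4"
        else:
            if residual_abs_sum <= 500.0:
                return "4x8"
            else:
                return "4x4"


mode = sub_macroblock_mode(variance=300.0, var_of_means_4x4=40.0, residual_abs_sum=0.0)
```

Each call costs a handful of comparisons, in contrast to evaluating the motion search for every candidate partition.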


Attributes in the thesis

The metrics used in the decision trees are the mean, variance, variance of means, residual absolute sum, residual mean, residual variance, residual variance of means and means of variance.

These metrics were calculated for the main MB shapes 16x16, 8x8 and 4x4.


Decision Tree for mode decision


Table 1: Classification rule accuracy

| Training seq 1 | % Accuracy* for training seq 1 | Training seq 2 | % Accuracy* for training seq 2 | Test sequence | % Accuracy* |
| Bus_cif | 70.6861 | Foreman_cif | 80.7645 | Mobile_cif | 77.188 |
| Stefan_cif | 81.8182 | Tempete_cif | 82.8897 | Container_cif | 85.207 |
| Container_cif | 98.9268 | --- | --- | Waterfall_cif | 93.358 |
| Waterfall_cif | 90.5636 | --- | --- | Stefan_cif | 85.9583 |
| Bus_cif | 70.6861 | --- | --- | Container_cif | 86.529 |
| Bus_cif | 75.4665 | Foreman_cif | 94.9495 | Mobile_cif | 82.0896 |
| Stefan_cif | 88.3838 | Tempete_cif | 85.0444 | Container_cif | 90.1812 |
| Container_cif | 98.131 | --- | --- | Waterfall_cif | 95.00442 |
| Waterfall_cif | 92.1086 | --- | --- | Stefan_cif | 88.952 |
| Bus_cif | 70.6861 | --- | --- | Bus_cif | 74.8865 |
| Waterfall_cif | 90.5636 | --- | --- | Bus_cif | 83.0469 |

Table 1 summarizes the WEKA tool results: the accuracy with which the classification rules determine the modes.

| Sequence | Encoding time (seconds) for JM 16.2 without machine learning | Encoding time (seconds) using machine learning | ME time (seconds) for JM 16.2 without machine learning | ME time (seconds) using machine learning |
| Foreman_qcif | 346.720 | 270.037 | 247.147 | 151.595 |
| Coast_qcif | 361.714 | 279.803 | 242.531 | 144.371 |
| Car phone_qcif | 347.85 | 269.674 | 249.081 | 152.576 |
| Silent_qcif | 368.155 | 253.006 | 254.297 | 139.053 |
| Suzie_qcif | 343.983 | 342.583 | 263.777 | 260.981 |
| Miss-america_qcif | 368.694 | 198.909 | 310.542 | 141.584 |
| Bus_cif | 1608.934 | 1346.542 | 1010.012 | 617.088 |
| Container_cif | 1542.106 | 1241.772 | 1109.672 | 686.165 |
| Foreman_cif | 1689.383 | 889.833 | 1316.543 | 537.128 |
| Mobile_cif | 2031.07 | 1695.243 | 1066.867 | 627.440 |
| Tempete_cif | 1808.560 | 1361.954 | 1078.435 | 590.689 |
| Stefan_cif | 1750.255 | 1267.813 | 1136.800 | 617.822 |
| Waterfall_cif | 1497.525 | 994.996 | 1017.974 | 529.557 |
| Mother-daughter_qcif | 422.332 | 360.371 | 322.011 | 276.212 |

Table 2: Results obtained using JM 16.2 and JM using machine learning for 4 frames.

Table 3: Speed up in encoding time and motion estimation time for 4 frames using machine learning compared to JM 16.2 encoder.

| Sequence | Speed up in encoding time | Speed up in ME time |
| Foreman_qcif | 22.11% | 38.66% |
| Coast_qcif | 22.64% | 40.47% |
| Car phone_qcif | 22.47% | 38.74% |
| Miss-america_qcif | 15.772% | 28.86% |
| Bus_cif | 16.30% | 38.90% |
| Container_cif | 19.47% | 38.16% |
| Foreman_cif | 47.32% | 59.20% |
| Mobile_cif | 47.47% | 62.99% |
| Tempete_cif | 40.370% | 56.629% |
| Stefan_cif | 35.04% | 51.268% |
| Waterfall_cif | 32.022% | 46.778% |
| Silent_qcif | 30.9266% | 45.039% |
| Suzie_qcif | 23.36779% | 23.819% |
| Mother-daughter_qcif | 23.75% | 23.353% |
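The slides do not state how the speed up is computed. A plausible reading, assumed here, is the relative time saving with respect to the unmodified JM 16.2 encoder. The sketch below applies that assumed formula to the Foreman_qcif times listed in Table 2; the function name is hypothetical.

```python
def speed_up_percent(reference_seconds, accelerated_seconds):
    """Relative time saving in percent: (t_ref - t_fast) / t_ref * 100 (assumed definition)."""
    return (reference_seconds - accelerated_seconds) / reference_seconds * 100.0


# Foreman_qcif timings from Table 2 (seconds).
encoding_speed_up = speed_up_percent(346.720, 270.037)
me_speed_up = speed_up_percent(247.147, 151.595)
print(f"{encoding_speed_up:.2f}% {me_speed_up:.2f}%")
```

For this sequence the assumed formula gives values that agree with the Foreman_qcif row of Table 3 to within rounding.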


Motion estimation time for 4 frames for sequences in Table 3.

[Figure: ME time in seconds (0 to 1400) versus sequence number (1 to 14); series: ME (sec) and ME machine learning (sec).]

Table 4: Comparison of compressed file sizes for four frames for sequences in Table 2.

| Sequence | Compressed file size (KB) in JM 16.2 encoder | Compressed file size (KB) using machine learning | % Increase in encoded file size using machine learning |
| Foreman_qcif | 4.34 | 4.34 | 0 |
| Coast_qcif | 5.68 | 5.67 | +0.0017 |
| Silent_qcif | 4.0 | 4.0 | 0 |
| Suzie_qcif | 3.0 | 3.0 | 0 |
| Car phone_qcif | 4.52 | 4.54 | 0.0044 |
| Bus_cif | 31.9 | 32.2 | 0.0093 |
| Container_cif | 12.0 | 12.0 | 0 |
| Foreman_cif | 12.4 | 12.7 | 0.1903 |
| Mobile_cif | 50.4 | 51.0 | 0.0119 |
| Stefan_cif | 32.5 | 34.0 | 0.0462 |
| Waterfall_cif | 18.7 | 19.0 | 0.0160 |
| Tempete_cif | 36.7 | 37.0 | 0.0082 |
| Miss-america_qcif | 2.0 | 2.0 | 0 |
| Mother-daughter_qcif | 2.279 | 2.279 | 0.0 |

Compressed file sizes using machine learning for four frames for sequences in Table 4.

[Figure: compressed file size in KB (0 to 60) versus sequence number (1 to 14); series: compressed file size using JM 16.2 and compressed file size using machine learning.]

| Sequence | PSNR (dB) using JM 16.2 encoder | PSNR (dB) using machine learning | MSE using JM 16.2 encoder | MSE using machine learning |
| Foreman_qcif | 37.389 | 37.324 | 11.881 | 12.068 |
| Coast_qcif | 35.24 | 35.21 | 19.539 | 19.681 |
| Car phone_qcif | 37.937 | 37.879 | 10.472 | 10.619 |
| Miss-america_qcif | 40.949 | 40.881 | 5.22970 | 5.31475 |
| Bus_cif | 35.961 | 35.932 | 16.518 | 16.633 |
| Container_cif | 37.162 | 37.153 | 12.517 | 12.544 |
| Foreman_cif | 37.833 | 37.85 | 10.371 | 10.684 |
| Mobile_cif | 35.541 | 35.512 | 18.2873 | 18.419 |
| Tempete_cif | 35.962 | 35.93 | 16.594 | 16.705 |
| Stefan_cif | 37.011 | 36.985 | 13.00572 | 13.08644 |
| Waterfall_cif | 35.912 | 35.906 | 16.6923 | 16.716 |
| Mother-daughter_qcif | 38.363 | 38.363 | 9.481 | 9.481 |
| Silent_qcif | 36.784 | 36.775 | 13.63795 | 13.6775 |
| Suzie_qcif | 37.749 | 37.741 | 10.938 | 10.381 |

Table 5: Comparison of PSNR and MSE for four frames.
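PSNR and MSE are two views of the same error measurement. For 8-bit video, PSNR = 10 log10(255^2 / MSE). The sketch below applies this standard relation to one MSE value from Table 5. The 8-bit peak value of 255 is an assumption about the test sequences, and the function name is hypothetical.

```python
import math


def psnr_from_mse(mse, peak=255.0):
    """PSNR in dB for a given mean squared error, assuming the stated peak sample value."""
    if mse <= 0:
        raise ValueError("PSNR is undefined for a zero or negative MSE")
    return 10.0 * math.log10(peak * peak / mse)


# Foreman_qcif, JM 16.2 encoder: MSE of 11.881 from Table 5.
foreman_psnr = psnr_from_mse(11.881)
print(f"{foreman_psnr:.2f} dB")
```

The result lands within about 0.01 dB of the PSNR tabulated for that sequence; small differences are expected if the tabulated figures are averages taken over frames.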


Comparison of PSNR and MSE for four frames in Table 5.

[Figure: PSNR (32 to 42) versus sequence number (1 to 14); series: PSNR using JM 16.2 and PSNR using machine learning.]

Table 6: SSIM comparison for four frames.

| Sequence | SSIM for JM 16.2 | SSIM using machine learning | % decrease ** |
| Foreman_qcif | 0.95944 | 0.95910 | 0.035 |
| Coast_qcif | 0.91793 | 0.91763 | 0.032 |
| Car phone_qcif | 0.96670 | 0.96641 | 0.029 |
| Suzie_qcif | 0.9555 | 0.9557 | 0.0002 |
| Bus_cif | 0.94973 | 0.94941 | 0.033 |
| Container_cif | 0.92827 | 0.92823 | 0.0043 |
| Foreman_cif | 0.94302 | 0.94306 | +0.0042 |
| Mobile_cif | 0.9758 | 0.9755 | 0.00003 |
| Tempete_cif | 0.9711 | 0.9709 | 0.02 |
| Stefan_cif | 0.9807 | 0.9806 | 0.0001 |
| Waterfall_cif | 0.9420 | 0.9420 | 0.00 |
| Silent_qcif | 0.9600 | 0.9600 | 0.00 |
| Miss-america_qcif | 0.9707 | 0.9706 | 0.001 |
| Mother-daughter_qcif | 0.9663 | 0.9663 | 0.00 |

Comparison of SSIM for four frames in Table 6.

[Figure: values (0 to 25) versus sequence number (1 to 14); the extracted legend and axis label read "MSE using JM 16.2", "MSE using machine learning" and "MSE".]

CONCLUSIONS

It was observed that a single universal mode decision tree failed in terms of video fidelity when all the ME/MC modes were used in the machine learning algorithm.

Therefore, this thesis uses only the sub-macroblock modes, i.e., 8x8, 8x4, 4x8 and 4x4, for the machine learning. The function 'submacroblock_mode_decision' in JM 16.2 was replaced by the if-else statements.

The results are tabulated in Tables 7 through 11. From Table 8, the average speed up in the encoding time is 28.5% and the average speed up in the motion estimation time is 42.846%.

From Table 9, the average percentage decrease in compressed file size is 0.36%. From Table 11, the average decrease in SSIM is less than 0.0107%.

When 100 frames are encoded, the average speed up in the encoding time is 8.5%, the average speed up in the motion estimation time is 18.346%, and the average decrease in SSIM is less than 0.0109%.


REFERENCES

[1] http://iphome.hhi.de/suehring/tml/ for JM software.

[2] Soon-kak Kwon, A. Tamhankar and K. R. Rao, "Overview of H.264 / MPEG-4 Part 10", J. Visual Communication and Image Representation, vol. 17, pp. 186-216, April 2006.

[3] http://www.vcodex.com/files/h264_overview_orig.pdf reference for H.264.

[4] http://iphome.hhi.de/suehring/tml/JM%20Reference%20Software%20Manual%20(JVT-AE010).pdf for JM reference software documentation manual.

[5] G. A. Davidson, et al, "ATSC video and audio coding", Proceedings of IEEE, vol. 94, pp. 60-76, Jan. 2006.

[6] http://www.birds-eye.net/definition/c/cif-common_intermediate_format.shtml for information about CIF and QCIF formats.

[7] M. Fieldler, "Implementation of basic H.264/AVC Decoder", seminar paper at Chemnitz University of Technology, June 2004.

[8] A. Puri, X. Chen and A. Luthra, "Video coding using H.264/MPEG-4 AVC compression standard", Science Direct. Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.

[9] T. Wiegand, et al, "Overview of the H.264/AVC video coding standard", IEEE Trans. CSVT, vol. 13, pp. 560-576, July 2003.

[10] T. Wiegand and G. J. Sullivan, "The H.264 video coding standard", IEEE Signal Processing Magazine, vol. 24, pp. 148-153, March 2007.

[11] D. Marpe, T. Wiegand and G. J. Sullivan, "The H.264/MPEG-4 AVC standard and its applications", IEEE Communications Magazine, vol. 44, pp. 134-143, Aug. 2006.

[12] R. Schäfer, T. Wiegand and H. Schwarz, "The emerging H.264/AVC standard", EBU Technical Review, Jan. 2003.

[13] Video test sequences (YUV 4:2:0): http://trace.eas.asu.edu/yuv/index.html

[14] Z. Wang et al, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. on Image Processing, vol. 13, pp. 600-612, Apr. 2004.

[15] Z. Wang, L. Lu, and A. C. Bovik, "Video quality assessment based on structural distortion measurement," Signal Processing: Image Communication, Special Issue on Objective Video Quality Metrics, vol. 19, pp. 122-124, Jan. 2004.

[16] Z. Wang, H. R. Sheikh, and A. C. Bovik, "Objective video quality assessment," in The Handbook of Video Databases: Design and Applications (B. Furht and O. Marques, eds.), pp. 1041-1078, CRC Press, Sept. 2003.

[17] T. K. Tan, G. Sullivan and T. Wedi, "Recommended simulation conditions for coding efficiency experiments", ITU-T SC16/Q6, 34th VCEG Meeting, Antalya, Turkey, Jan. 2008, Doc. VCEG-AH10r3.

[18] P. Carrillo, H. Kalva, and T. Pin, "Low complexity H.264 video encoding", Applications of Digital Image Processing, Proc. of SPIE, vol. 7443, 74430A, Sept. 2009.

[19] G. Sullivan and T. Wiegand, "Video compression - From concepts to the H.264/AVC Standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.

[20] http://www.apple.com/quicktime/technologies/h264/ for H.264 codec reference.

[21] D. Kumar, P. Shastry and A. Basu, "Overview of the H.264 / AVC", 8th Texas Instruments Developer Conference India, 30 Nov. - 1 Dec. 2005, Bangalore.

[22] http://wiki.multimedia.cx/index.php?title=Motion_Prediction for motion prediction.

[23] Zhi-Yi Mai, et al, "A new-rate distortion optimization using structural information in H.264 I-frame encoder", ACIVS 2005, LNCS 3708, pp. 435-441, 2005.

[24] Z. Wang and A. C. Bovik, Modern Image Quality Assessment. Synthesis Lectures on Image, Video and Multimedia Processing. Morgan and Claypool, 2006.

[25] http://www.cs.waikato.ac.nz/ml/weka/ for WEKA tool download.

[26] I. Richardson, "The H. 264 Advanced Video Compression Standard", Wiley, 2006.

[27] I. E. Richardson, "The H. 264 Advanced Video Compression Standard", Wiley, II edition, 2010.

[28] http://iphome.hhi.de/suehring/tml/download/, JM reference software.

[29] http://trace.eas.asu.edu/yuv/index.html, Video sequences.

[30] E. Peixoto, R. L. de Queiroz, and D. Mukherjee, "Mobile video communications using a Wyner-Ziv transcoder," Proc. SPIE 6822, VCIP, 68220R, Jan. 2008.

[31] A. Aaron, D. Varodayan, and B. Girod, "Wyner-Ziv residual coding of video," Proc. International Picture Coding Symposium, Beijing, P. R. China, April 2006.

THANK YOU


H.264 - Profiles


Design Features Highlights

Features for enhancement of prediction
  Directional spatial prediction for intra coding
  Variable block-size motion compensation with small block size
  Quarter-sample-accurate motion compensation
  Motion vectors over picture boundaries
  Multiple reference picture motion compensation
  Decoupling of referencing order from display order
  Decoupling of picture representation methods from picture referencing capability
  Weighted prediction
  Improved "skipped" and "direct" motion inference
  In-the-loop deblocking filtering

Features for improved coding efficiency
  Small block-size transform
  Exact-match inverse transform
  Short word-length transform
  Hierarchical block transform
  Arithmetic entropy coding
  Context-adaptive entropy coding

Features for robustness to data errors/losses
  Parameter set structure
  NAL unit syntax structure
  Flexible slice size
  Flexible macroblock ordering (FMO)
  Arbitrary slice ordering (ASO)
  Redundant pictures
  Data partitioning
  SP/SI synchronization/switching pictures

Directional spatial prediction for intra coding

Intra prediction predicts the texture in the current block using the pixel samples from neighboring blocks.

Intra prediction for 4x4 blocks (9 modes) and 16x16 blocks (4 modes) is supported in all H.264 profiles.

Intra prediction for 8x8 blocks (9 modes) is supported in the High profiles.

Luma prediction modes in H.264


Variable block-size motion compensation

Partitioned in 2 stages
In the 1st stage, determine the first 4 modes: 16x16, 16x8, 8x16, 8x8
If mode 4 (8x8) is chosen, further partition every 8x8 block into smaller blocks: 8x4, 4x8, 4x4
At most 16 motion vectors may be transmitted for a 16x16 macroblock
Sub-pixel accuracy
Large computational complexity to determine the modes, but efficient encoding
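The limit of 16 motion vectors follows from the two-stage partitioning: four 8x8 blocks, each split into four 4x4 blocks. A small sketch of that count, assuming one motion vector per partition as on this slide; the table and function names are hypothetical.

```python
# Number of partitions produced by each first-stage and second-stage mode.
MACROBLOCK_PARTITIONS = {"16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4}
SUB_PARTITIONS = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}


def motion_vector_count(mb_mode, sub_modes=()):
    """Motion vectors for one macroblock, one vector per partition."""
    if mb_mode != "8x8":
        return MACROBLOCK_PARTITIONS[mb_mode]
    if len(sub_modes) != 4:
        raise ValueError("mode 8x8 needs one sub-mode for each of its four 8x8 blocks")
    return sum(SUB_PARTITIONS[mode] for mode in sub_modes)


worst_case = motion_vector_count("8x8", ["4x4"] * 4)
print(worst_case)  # 16
```

Mixed choices fall in between, for example one 8x8, one 8x4, one 4x8 and one 4x4 sub-block give 1 + 2 + 2 + 4 = 9 vectors.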


Variable block-size motion compensation


Multiple reference picture motion compensation

P slice
  More than one prior coded picture can be used as reference for MC prediction
  Reference index parameter is transmitted for each MC 16x16, 16x8, 8x16 or 8x8 block
  For smaller blocks within the 8x8, use 1 reference index
  P-Skip type is supported

B slice
  Utilizes two distinct lists of reference pictures
  Four different types of inter-picture prediction: list 0, list 1, bi-predictive, and direct
  Bi-predictive: weighted average of MC list 0 and list 1
  Direct prediction: inferred from previously transmitted syntax; either list 0 or list 1 prediction or bi-predictive
  Similar macroblock partitioning as P slices is utilized
  B-Skip mode is supported

[Figure labels: P frame, B frame]

Hierarchical block transform

4x4 and 8x8 (High profile only) multiplier-free integer DCT transform
Transform coefficients perfectly invertible
Hierarchical transform (integer DCT and Hadamard)
  For macroblocks coded in 16x16 Intra mode and for chrominance blocks, DC coefficients are further grouped and transformed
  Hadamard transform is used for chrominance block

[Figure labels: Integer DCT 4x4, Integer DCT 8x8, Hadamard 4x4, Hadamard 2x2]
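The 4x4 integer transform needs only additions, subtractions and doublings. A minimal sketch of the forward core transform Y = Cf X Cf^T with the H.264 4x4 core matrix follows; the scaling that the standard folds into quantization is left out, and the helper names are hypothetical.

```python
# H.264 4x4 forward core transform matrix (scaling is applied later, during quantization).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Product of two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_core_transform(block):
    """Unscaled forward transform of a 4x4 residual block: Cf * X * Cf^T."""
    return matmul(matmul(CF, block), transpose(CF))


# A flat residual block puts all of its energy into the DC coefficient.
flat_block = [[10] * 4 for _ in range(4)]
coefficients = forward_core_transform(flat_block)
```

Because every entry of the matrix is an integer, the encoder and decoder can compute exactly matching results, which is the exact-match inverse property listed among the design features.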


In loop deblocking filter

Block-based operations are responsible for blocking artifacts.

The in-loop deblocking filter smooths blocky edges and increases rate-distortion performance.

Applied to all 4x4 blocks except at picture boundaries.

Filtering is adaptive at the slice level, block level and pixel level.

Vertical edges are filtered first (left to right), followed by horizontal edges (top to bottom).

Entropy encoding

CAVLC (Context-based Adaptive Variable Length Coding).

CABAC (Context-based Adaptive Binary Arithmetic Coding).

CAVLC makes use of run-length encoding.

CABAC utilizes arithmetic coding; codes both MV and residual transform coefficients.

Typically CABAC provides 10-15% reduction in bit rate compared to CAVLC, for the same PSNR.

All other syntax elements are encoded by Exp-Golomb codes (Universal Variable Length Codes (UVLC)).
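An unsigned Exp-Golomb code word is the binary form of the value plus one, preceded by as many zero bits as there are bits after the leading one. The sketch below returns the code word as a string of bits for readability; the function name is hypothetical and a real encoder would write the bits into the bitstream instead.

```python
def exp_golomb_ue(code_num):
    """Unsigned Exp-Golomb code word for a non-negative integer, as a bit string."""
    if code_num < 0:
        raise ValueError("unsigned Exp-Golomb codes are defined for non-negative integers")
    binary = bin(code_num + 1)[2:]  # binary digits of code_num + 1, without the '0b' prefix
    return "0" * (len(binary) - 1) + binary


code_words = [exp_golomb_ue(n) for n in range(5)]
print(code_words)  # ['1', '010', '011', '00100', '00101']
```

Small values get short code words, which suits syntax elements whose small values are the most frequent.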

[Figure labels: CAVLC, CABAC]

Computational Overhead

Entropy encoding
Multiple block size
Smaller block size
Integer transform
In-loop deblocking

H.264 Extensions Scalable video coding

Application scenario


H.264 Extensions Scalable video coding


Types of Scalability


H.264 Extensions Multi view coding

Applications
  3-D video
  Stereoscopic TV

H.264 Extensions Multi view coding


Snapshots of video sequences considered in the thesis.
