Unitary Transforms and Transform Coding


Yao Wang Polytechnic School of Engineering, New York University

© Yao Wang, 2016. EL6123: Image and Video Processing

Outline

•  Overview of video coding systems
•  Linear and unitary 1D transform
•  2D transform, separable 2D transform
•  Transform coding
   –  Optimal bit allocation
•  JPEG Image Coding Standard


Components in a Coding System

Focus of this lecture



Encoder Block Diagram of a Typical Block-Based Video Coder

(Assuming No Intra Prediction)

Previous lecture: Motion estimation
Last lecture: Variable Length Coding
Last lecture: Scalar and Vector Quantization
This lecture: transform and predictive coding


A Review of Vector Quantization

•  Motivation: quantize a group of samples (a vector) together, to exploit the correlation between samples

•  Each sample vector is replaced by one of the representative vectors (or patterns) that often occur in the signal

•  Typically a block of 4x4 pixels
•  Design is limited by the ability to obtain training samples that are similar to the samples to be quantized
•  Implementation is limited by the large number of nearest-neighbor comparisons, which is exponential in the block size


Transform Coding

•  Motivation:
   –  Represent a vector (e.g. a block of image samples) as the superposition of some typical vectors (block patterns)
   –  Quantize and code the coefficients
   –  Can be thought of as a constrained vector quantizer

[Figure: an image block written as the weighted sum, with weights t1, t2, t3, t4, of four block patterns]


Block Diagram



One Dimensional Linear Transform

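•  Let $\mathbb{C}^N$ represent the N-dimensional complex space.
•  Let $\mathbf{h}_0, \mathbf{h}_1, \dots, \mathbf{h}_{N-1}$ represent N linearly independent vectors in $\mathbb{C}^N$.
•  Any vector $\mathbf{f} \in \mathbb{C}^N$ can be represented as a linear combination of $\mathbf{h}_0, \mathbf{h}_1, \dots, \mathbf{h}_{N-1}$:

$$\mathbf{f} = \sum_{k=0}^{N-1} t(k)\,\mathbf{h}_k = \mathbf{B}\mathbf{t}, \quad \text{where } \mathbf{B} = [\mathbf{h}_0, \mathbf{h}_1, \dots, \mathbf{h}_{N-1}], \quad \mathbf{t} = \begin{bmatrix} t(0) \\ t(1) \\ \vdots \\ t(N-1) \end{bmatrix}$$

$$\mathbf{t} = \mathbf{B}^{-1}\mathbf{f} = \mathbf{A}\mathbf{f}$$

f and t form a transform pair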


Inner Product

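•  Definition of inner product

$$\langle \mathbf{f}_1, \mathbf{f}_2 \rangle = \mathbf{f}_1^H \mathbf{f}_2 = \sum_{n=0}^{N-1} f_1^*(n)\, f_2(n)$$

•  Orthogonal

$$\langle \mathbf{f}_1, \mathbf{f}_2 \rangle = 0$$

•  Norm of a vector

$$\|\mathbf{f}\|^2 = \langle \mathbf{f}, \mathbf{f} \rangle = \mathbf{f}^H \mathbf{f} = \sum_{n=0}^{N-1} |f(n)|^2$$

•  Normalized vector: unit norm

$$\|\mathbf{f}\|^2 = 1$$

•  Orthonormal = orthogonal + normalized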


Orthonormal Basis Vectors (OBV)

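•  $\{\mathbf{h}_k,\ k = 0, \dots, N-1\}$ are OBV if

$$\langle \mathbf{h}_k, \mathbf{h}_l \rangle = \delta_{k,l} = \begin{cases} 1, & k = l \\ 0, & k \neq l \end{cases}$$

•  With OBV

$$\langle \mathbf{h}_l, \mathbf{f} \rangle = \Big\langle \mathbf{h}_l, \sum_{k=0}^{N-1} t(k)\,\mathbf{h}_k \Big\rangle = \sum_{k=0}^{N-1} t(k)\, \langle \mathbf{h}_l, \mathbf{h}_k \rangle = t(l), \quad \text{so } t(l) = \mathbf{h}_l^H \mathbf{f}$$

$$\mathbf{t} = \begin{bmatrix} \mathbf{h}_0^H \\ \mathbf{h}_1^H \\ \vdots \\ \mathbf{h}_{N-1}^H \end{bmatrix} \mathbf{f} = \mathbf{B}^H \mathbf{f} = \mathbf{A}\mathbf{f}$$

$$\mathbf{B}^{-1} = \mathbf{B}^H, \quad \mathbf{B}\mathbf{B}^H = \mathbf{B}^H\mathbf{B} = \mathbf{I}, \quad \text{or } \mathbf{B} \text{ is unitary}$$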


Definition of Unitary Transform

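•  Basis vectors are orthonormal
•  Forward transform

$$t(k) = \langle \mathbf{h}_k, \mathbf{f} \rangle = \sum_{n=0}^{N-1} h_k^*(n)\, f(n), \qquad \mathbf{t} = \begin{bmatrix} \mathbf{h}_0^H \\ \mathbf{h}_1^H \\ \vdots \\ \mathbf{h}_{N-1}^H \end{bmatrix} \mathbf{f} = \mathbf{B}^H \mathbf{f} = \mathbf{A}\mathbf{f}$$

•  Inverse transform

$$f(n) = \sum_{k=0}^{N-1} t(k)\, h_k(n), \qquad \mathbf{f} = \sum_{k=0}^{N-1} t(k)\,\mathbf{h}_k = [\mathbf{h}_0\ \mathbf{h}_1\ \cdots\ \mathbf{h}_{N-1}]\,\mathbf{t} = \mathbf{B}\mathbf{t} = \mathbf{A}^H \mathbf{t}$$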


Example: 4-pt Hadamard Transform

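$$\mathbf{h}_0 = \begin{bmatrix} 1/2 \\ 1/2 \\ 1/2 \\ 1/2 \end{bmatrix},\quad \mathbf{h}_1 = \begin{bmatrix} 1/2 \\ 1/2 \\ -1/2 \\ -1/2 \end{bmatrix},\quad \mathbf{h}_2 = \begin{bmatrix} 1/2 \\ -1/2 \\ -1/2 \\ 1/2 \end{bmatrix},\quad \mathbf{h}_3 = \begin{bmatrix} 1/2 \\ -1/2 \\ 1/2 \\ -1/2 \end{bmatrix},\quad \mathbf{f} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} \;\Rightarrow\; \begin{cases} t_0 = 5 \\ t_1 = -2 \\ t_2 = 0 \\ t_3 = -1 \end{cases}$$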
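The Hadamard example can be checked numerically. The sketch below is only an illustration: the helper names `inner`, `forward`, and `inverse` are not from the lecture, and the basis vectors are taken in the order that reproduces the coefficients 5, -2, 0, -1 for f = [1, 2, 3, 4].

```python
# Basis vectors of the 4-point Hadamard example, one list per h_k.
BASIS = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]


def inner(a, b):
    """Inner product <a, b> = sum_n conj(a(n)) * b(n)."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def forward(basis, f):
    """Forward unitary transform: t(k) = <h_k, f>."""
    return [inner(h, f) for h in basis]


def inverse(basis, t):
    """Inverse transform: f(n) = sum_k t(k) * h_k(n)."""
    return [sum(t_k * h[n] for t_k, h in zip(t, basis))
            for n in range(len(basis[0]))]


f = [1, 2, 3, 4]
t = forward(BASIS, f)
f_rec = inverse(BASIS, t)

# Orthonormality: <h_k, h_l> should be 1 for k == l and 0 otherwise.
gram = [[inner(hk, hl) for hl in BASIS] for hk in BASIS]

print("t =", t)
print("reconstructed f =", f_rec)
```

Because the basis is orthonormal, the inverse transform needs no matrix inversion: it reuses the same basis vectors as the forward transform.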


1D DFT as a Unitary Transform

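$$h_k(n) = \frac{1}{\sqrt{N}}\, e^{j 2\pi k n / N}, \quad \text{or} \quad \mathbf{h}_k = \frac{1}{\sqrt{N}} \begin{bmatrix} 1 \\ e^{j 2\pi k / N} \\ \vdots \\ e^{j 2\pi (N-1) k / N} \end{bmatrix}, \quad k = 0, 1, \dots, N-1.$$

$$F(k) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} f(n)\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \dots, N-1;$$

$$f(n) = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} F(k)\, e^{j 2\pi k n / N}, \quad n = 0, 1, \dots, N-1.$$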


Example: 1D DFT, N=2

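N=2 case: there are only two basis vectors:

$$\mathbf{h}_k = \frac{1}{\sqrt{2}} \begin{bmatrix} \exp\!\left(j \frac{2\pi}{2} k \cdot 0\right) \\ \exp\!\left(j \frac{2\pi}{2} k \cdot 1\right) \end{bmatrix}: \quad \mathbf{h}_0 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \mathbf{h}_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

If $\mathbf{f} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, determine $t_0, t_1$.

Using $t_k = \langle \mathbf{h}_k, \mathbf{f} \rangle$, we obtain

$$t_0 = \left\langle \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\rangle = \frac{1}{\sqrt{2}} (1 \cdot 1 + 1 \cdot 2) = \frac{3}{\sqrt{2}}, \qquad t_1 = \left\langle \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\rangle = \frac{1}{\sqrt{2}} (1 \cdot 1 - 1 \cdot 2) = -\frac{1}{\sqrt{2}}$$

Verify:

$$t_0 \mathbf{h}_0 + t_1 \mathbf{h}_1 = \frac{3}{2} \begin{bmatrix} 1 \\ 1 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \mathbf{f}$$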


Another Example: 1D DFT, N=4

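N=4 case: using

$$\mathbf{h}_k = \frac{1}{2} \begin{bmatrix} \exp\!\left(j \frac{2\pi}{4} k \cdot 0\right) \\ \exp\!\left(j \frac{2\pi}{4} k \cdot 1\right) \\ \exp\!\left(j \frac{2\pi}{4} k \cdot 2\right) \\ \exp\!\left(j \frac{2\pi}{4} k \cdot 3\right) \end{bmatrix}$$

yields:

$$\mathbf{h}_0 = \frac{1}{2} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix};\quad \mathbf{h}_1 = \frac{1}{2} \begin{bmatrix} 1 \\ j \\ -1 \\ -j \end{bmatrix};\quad \mathbf{h}_2 = \frac{1}{2} \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix};\quad \mathbf{h}_3 = \frac{1}{2} \begin{bmatrix} 1 \\ -j \\ -1 \\ j \end{bmatrix}$$

$$\mathbf{f} = \begin{bmatrix} 2 \\ 4 \\ 5 \\ 3 \end{bmatrix} \;\Rightarrow\; t_0 = \tfrac{1}{2}(2 + 4 + 5 + 3) = 7;\quad t_1 = \tfrac{1}{2}(2 - 4j - 5 + 3j) = -\tfrac{1}{2}(3 + j);$$

$$t_2 = \tfrac{1}{2}(2 - 4 + 5 - 3) = 0;\quad t_3 = \tfrac{1}{2}(2 + 4j - 5 - 3j) = -\tfrac{1}{2}(3 - j).$$

Verify:

$$t_0 \mathbf{h}_0 + t_1 \mathbf{h}_1 + t_2 \mathbf{h}_2 + t_3 \mathbf{h}_3 = \frac{1}{4} \begin{bmatrix} 14 - (3 + j) - (3 - j) \\ 14 - (3 + j)j + (3 - j)j \\ 14 + (3 + j) + (3 - j) \\ 14 + (3 + j)j - (3 - j)j \end{bmatrix} = \frac{1}{4} \begin{bmatrix} 8 \\ 16 \\ 20 \\ 12 \end{bmatrix} = \mathbf{f}$$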
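The N=4 DFT example can be reproduced with a few lines of code. This is a minimal sketch using only the standard library; the function names are illustrative, and the basis follows the unitary convention $h_k(n) = e^{j2\pi kn/N}/\sqrt{N}$ used in these slides.

```python
import cmath


def dft_basis(n_points):
    """Unitary DFT basis: h_k(n) = exp(j*2*pi*k*n/N) / sqrt(N)."""
    scale = 1 / n_points ** 0.5
    return [[scale * cmath.exp(2j * cmath.pi * k * n / n_points)
             for n in range(n_points)]
            for k in range(n_points)]


def forward(basis, f):
    """t(k) = <h_k, f> = sum_n conj(h_k(n)) * f(n)."""
    return [sum(h_n.conjugate() * f_n for h_n, f_n in zip(h, f))
            for h in basis]


def inverse(basis, t):
    """f(n) = sum_k t(k) * h_k(n)."""
    return [sum(t_k * h[n] for t_k, h in zip(t, basis))
            for n in range(len(basis[0]))]


f = [2, 4, 5, 3]
basis = dft_basis(4)
t = forward(basis, f)        # close to 7, -(3+j)/2, 0, -(3-j)/2
f_rec = inverse(basis, t)    # close to the original samples

for k, t_k in enumerate(t):
    print(f"t{k} = {t_k.real:.4f} {t_k.imag:+.4f}j")
```

The coefficients are complex even though the input is real, which is one reason the DCT is usually preferred for image coding.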


1D Discrete Cosine Transform (DCT)

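Basis vectors:

$$h_k(n) = \alpha(k) \cos\!\left[\frac{(2n+1) k \pi}{2N}\right], \quad \text{where } \alpha(k) = \begin{cases} \sqrt{1/N}, & k = 0 \\ \sqrt{2/N}, & k = 1, \dots, N-1 \end{cases}$$

Forward transform:

$$T(k) = \sum_{n=0}^{N-1} f(n)\, h_k(n)$$

Inverse transform:

$$f(n) = \sum_{k=0}^{N-1} T(k)\, h_k(n)$$

Vector representation, N=4 case:

$$\mathbf{h}_k = \alpha(k) \begin{bmatrix} \cos\frac{k\pi}{8} \\ \cos\frac{3k\pi}{8} \\ \cos\frac{5k\pi}{8} \\ \cos\frac{7k\pi}{8} \end{bmatrix}$$

yields:

$$\mathbf{h}_0 = \frac{1}{2} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \\ 0.5 \\ 0.5 \end{bmatrix};\quad \mathbf{h}_1 = \sqrt{\tfrac{1}{2}} \begin{bmatrix} \cos\frac{\pi}{8} \\ \cos\frac{3\pi}{8} \\ \cos\frac{5\pi}{8} \\ \cos\frac{7\pi}{8} \end{bmatrix} = \begin{bmatrix} 0.6533 \\ 0.2706 \\ -0.2706 \\ -0.6533 \end{bmatrix};$$

$$\mathbf{h}_2 = \sqrt{\tfrac{1}{2}} \begin{bmatrix} \cos\frac{2\pi}{8} \\ \cos\frac{6\pi}{8} \\ \cos\frac{10\pi}{8} \\ \cos\frac{14\pi}{8} \end{bmatrix} = \begin{bmatrix} 0.5 \\ -0.5 \\ -0.5 \\ 0.5 \end{bmatrix};\quad \mathbf{h}_3 = \sqrt{\tfrac{1}{2}} \begin{bmatrix} \cos\frac{3\pi}{8} \\ \cos\frac{9\pi}{8} \\ \cos\frac{15\pi}{8} \\ \cos\frac{21\pi}{8} \end{bmatrix} = \begin{bmatrix} 0.2706 \\ -0.6533 \\ 0.6533 \\ -0.2706 \end{bmatrix}$$

$$\mathbf{f} = \begin{bmatrix} 2 \\ 4 \\ 5 \\ 3 \end{bmatrix} \;\Rightarrow\; t_0 = \tfrac{1}{2}(2 + 4 + 5 + 3) = 7;\quad t_1 = (2 - 3) \cdot 0.6533 + (4 - 5) \cdot 0.2706 = -0.9239;$$

$$t_2 = \tfrac{1}{2}(2 - 4 - 5 + 3) = -2;\quad t_3 = (2 - 3) \cdot 0.2706 + (5 - 4) \cdot 0.6533 = 0.3827.$$

Verify:

$$t_0 \mathbf{h}_0 + t_1 \mathbf{h}_1 + t_2 \mathbf{h}_2 + t_3 \mathbf{h}_3 = \mathbf{f}$$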
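The 4-point DCT example can be checked the same way. The sketch below builds the basis from $h_k(n) = \alpha(k)\cos[(2n+1)k\pi/(2N)]$ and applies it to f = [2, 4, 5, 3]; the function names are illustrative only.

```python
import math


def dct_basis(n_points):
    """Orthonormal DCT basis vectors h_k, k = 0..N-1, as lists of length N."""
    basis = []
    for k in range(n_points):
        alpha = math.sqrt((1 if k == 0 else 2) / n_points)
        basis.append([alpha * math.cos((2 * n + 1) * k * math.pi / (2 * n_points))
                      for n in range(n_points)])
    return basis


def forward(basis, f):
    """T(k) = sum_n f(n) * h_k(n) (the basis is real, so no conjugate is needed)."""
    return [sum(h_n * f_n for h_n, f_n in zip(h, f)) for h in basis]


def inverse(basis, t):
    """f(n) = sum_k T(k) * h_k(n)."""
    return [sum(t_k * h[n] for t_k, h in zip(t, basis))
            for n in range(len(basis[0]))]


f = [2, 4, 5, 3]
basis = dft = dct_basis(4)
t = forward(basis, f)       # close to 7, -0.9239, -2, 0.3827
f_rec = inverse(basis, t)   # close to the original samples

print([round(x, 4) for x in t])
print([round(x, 4) for x in f_rec])
```

Unlike the DFT of the same signal, all four coefficients are real.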


Two Dimensional Transform

•  Decompose an MxN 2D matrix $\mathbf{F} = [F(m,n)]$ into a linear combination of some basic images, $\mathbf{H}_{k,l} = [H_{k,l}(m,n)]$, so that:

$$\mathbf{F} = \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} T(k,l)\, \mathbf{H}_{k,l},$$

$$F(m,n) = \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} T(k,l)\, H_{k,l}(m,n)$$


Graphical Interpretation

[Figure: an image block written as the weighted sum, with weights t1, t2, t3, t4, of four block patterns]

Inverse transform: Represent a vector (e.g. a block of image samples) as the superposition of some basis vectors (block patterns)

Forward transform: Determine the coefficients associated with each basis vector


Two Dimensional Inner Product

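•  Inner Product

$$\langle \mathbf{F}_1, \mathbf{F}_2 \rangle = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} F_1^*(m,n)\, F_2(m,n)$$

•  Norm of a Matrix

$$\|\mathbf{F}\|^2 = \langle \mathbf{F}, \mathbf{F} \rangle = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} |F(m,n)|^2$$

•  A set of basis images $\{\mathbf{H}_{k,l},\ k = 0, 1, \dots, M-1,\ l = 0, 1, \dots, N-1\}$ is orthonormal if

$$\langle \mathbf{H}_{k,l}, \mathbf{H}_{i,j} \rangle = \delta_{k,i}\, \delta_{l,j} = \begin{cases} 1, & \text{if } k = i,\ l = j \\ 0, & \text{otherwise} \end{cases}$$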


Two Dimensional Unitary Transform

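•  $\{\mathbf{H}_{k,l}\}$ is an orthonormal set of basis images
•  Forward transform

$$T(k,l) = \langle \mathbf{H}_{k,l}, \mathbf{F} \rangle = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} H_{k,l}^*(m,n)\, F(m,n)$$

•  Inverse transform

$$F(m,n) = \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} T(k,l)\, H_{k,l}(m,n), \quad \text{or} \quad \mathbf{F} = \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} T(k,l)\, \mathbf{H}_{k,l}$$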


Example of 2D Unitary Transform

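$$\mathbf{H}_{00} = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix},\quad \mathbf{H}_{01} = \begin{bmatrix} 1/2 & -1/2 \\ 1/2 & -1/2 \end{bmatrix},\quad \mathbf{H}_{10} = \begin{bmatrix} 1/2 & 1/2 \\ -1/2 & -1/2 \end{bmatrix},\quad \mathbf{H}_{11} = \begin{bmatrix} 1/2 & -1/2 \\ -1/2 & 1/2 \end{bmatrix},$$

$$\mathbf{F} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \;\Rightarrow\; \begin{cases} T(0,0) = 5 \\ T(0,1) = -1 \\ T(1,0) = -2 \\ T(1,1) = 0 \end{cases}$$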


Separable Unitary Transform

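•  Let $\mathbf{h}_k$, k=0, 1, …, M-1 represent orthonormal basis vectors in $\mathbb{C}^M$,

•  Let $\mathbf{g}_l$, l=0, 1, …, N-1 represent orthonormal basis vectors in $\mathbb{C}^N$,

•  $\mathbf{H}_{k,l} = \mathbf{h}_k \mathbf{g}_l^T$, or $H_{k,l}(m,n) = h_k(m)\, g_l(n)$.

•  Then $\mathbf{H}_{k,l}$ will form an orthonormal basis set in $\mathbb{C}^{M \times N}$.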


Example of Separable Unitary Transform

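•  Example 1

$$\mathbf{h}_0 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}, \quad \mathbf{h}_1 = \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}.$$

$$\mathbf{H}_{00} = \mathbf{h}_0 \mathbf{h}_0^T = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}, \quad \mathbf{H}_{01} = \mathbf{h}_0 \mathbf{h}_1^T = \begin{bmatrix} 1/2 & -1/2 \\ 1/2 & -1/2 \end{bmatrix},$$

$$\mathbf{H}_{10} = \mathbf{h}_1 \mathbf{h}_0^T = \begin{bmatrix} 1/2 & 1/2 \\ -1/2 & -1/2 \end{bmatrix}, \quad \mathbf{H}_{11} = \mathbf{h}_1 \mathbf{h}_1^T = \begin{bmatrix} 1/2 & -1/2 \\ -1/2 & 1/2 \end{bmatrix}$$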


Example: 4x4 DFT

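Recall the 1D DFT basis vectors are:

$$\mathbf{h}_0 = \frac{1}{2} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix};\quad \mathbf{h}_1 = \frac{1}{2} \begin{bmatrix} 1 \\ j \\ -1 \\ -j \end{bmatrix};\quad \mathbf{h}_2 = \frac{1}{2} \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix};\quad \mathbf{h}_3 = \frac{1}{2} \begin{bmatrix} 1 \\ -j \\ -1 \\ j \end{bmatrix}$$

Using $\mathbf{H}_{k,l} = \mathbf{h}_k \mathbf{h}_l^T$ yields:

$$\mathbf{H}_{0,0} = \frac{1}{4} \begin{bmatrix} 1&1&1&1 \\ 1&1&1&1 \\ 1&1&1&1 \\ 1&1&1&1 \end{bmatrix},\ \mathbf{H}_{0,1} = \frac{1}{4} \begin{bmatrix} 1&j&-1&-j \\ 1&j&-1&-j \\ 1&j&-1&-j \\ 1&j&-1&-j \end{bmatrix},\ \mathbf{H}_{0,2} = \frac{1}{4} \begin{bmatrix} 1&-1&1&-1 \\ 1&-1&1&-1 \\ 1&-1&1&-1 \\ 1&-1&1&-1 \end{bmatrix},\ \mathbf{H}_{0,3} = \frac{1}{4} \begin{bmatrix} 1&-j&-1&j \\ 1&-j&-1&j \\ 1&-j&-1&j \\ 1&-j&-1&j \end{bmatrix}$$

$$\mathbf{H}_{1,0} = \frac{1}{4} \begin{bmatrix} 1&1&1&1 \\ j&j&j&j \\ -1&-1&-1&-1 \\ -j&-j&-j&-j \end{bmatrix},\ \mathbf{H}_{1,1} = \frac{1}{4} \begin{bmatrix} 1&j&-1&-j \\ j&-1&-j&1 \\ -1&-j&1&j \\ -j&1&j&-1 \end{bmatrix},\ \mathbf{H}_{1,2} = \frac{1}{4} \begin{bmatrix} 1&-1&1&-1 \\ j&-j&j&-j \\ -1&1&-1&1 \\ -j&j&-j&j \end{bmatrix},\ \mathbf{H}_{1,3} = \frac{1}{4} \begin{bmatrix} 1&-j&-1&j \\ j&1&-j&-1 \\ -1&j&1&-j \\ -j&-1&j&1 \end{bmatrix}$$

$$\mathbf{H}_{2,0} = \frac{1}{4} \begin{bmatrix} 1&1&1&1 \\ -1&-1&-1&-1 \\ 1&1&1&1 \\ -1&-1&-1&-1 \end{bmatrix},\ \mathbf{H}_{2,1} = \frac{1}{4} \begin{bmatrix} 1&j&-1&-j \\ -1&-j&1&j \\ 1&j&-1&-j \\ -1&-j&1&j \end{bmatrix},\ \mathbf{H}_{2,2} = \frac{1}{4} \begin{bmatrix} 1&-1&1&-1 \\ -1&1&-1&1 \\ 1&-1&1&-1 \\ -1&1&-1&1 \end{bmatrix},\ \mathbf{H}_{2,3} = \frac{1}{4} \begin{bmatrix} 1&-j&-1&j \\ -1&j&1&-j \\ 1&-j&-1&j \\ -1&j&1&-j \end{bmatrix}$$

$$\mathbf{H}_{3,0} = \frac{1}{4} \begin{bmatrix} 1&1&1&1 \\ -j&-j&-j&-j \\ -1&-1&-1&-1 \\ j&j&j&j \end{bmatrix},\ \mathbf{H}_{3,1} = \frac{1}{4} \begin{bmatrix} 1&j&-1&-j \\ -j&1&j&-1 \\ -1&-j&1&j \\ j&-1&-j&1 \end{bmatrix},\ \mathbf{H}_{3,2} = \frac{1}{4} \begin{bmatrix} 1&-1&1&-1 \\ -j&j&-j&j \\ -1&1&-1&1 \\ j&-j&j&-j \end{bmatrix},\ \mathbf{H}_{3,3} = \frac{1}{4} \begin{bmatrix} 1&-j&-1&j \\ -j&-1&j&1 \\ -1&j&1&-j \\ j&1&-j&-1 \end{bmatrix}$$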


Example: 4x4 DFT

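For

$$\mathbf{F} = \begin{bmatrix} 1&2&2&0 \\ 0&1&3&1 \\ 0&1&2&1 \\ 1&2&2&-1 \end{bmatrix},$$

compute $T_{k,l}$.

Using $T_{k,l} = \langle \mathbf{H}_{k,l}, \mathbf{F} \rangle$ yields, e.g.,

$$T_{0,0} = \langle \mathbf{H}_{0,0}, \mathbf{F} \rangle = \left\langle \frac{1}{4} \begin{bmatrix} 1&1&1&1 \\ 1&1&1&1 \\ 1&1&1&1 \\ 1&1&1&1 \end{bmatrix}, \begin{bmatrix} 1&2&2&0 \\ 0&1&3&1 \\ 0&1&2&1 \\ 1&2&2&-1 \end{bmatrix} \right\rangle = \frac{1}{4}(1+2+2+0+0+1+3+1+0+1+2+1+1+2+2-1) = \frac{18}{4}$$

$$T_{2,3} = \langle \mathbf{H}_{2,3}, \mathbf{F} \rangle = \left\langle \frac{1}{4} \begin{bmatrix} 1&-j&-1&j \\ -1&j&1&-j \\ 1&-j&-1&j \\ -1&j&1&-j \end{bmatrix}, \begin{bmatrix} 1&2&2&0 \\ 0&1&3&1 \\ 0&1&2&1 \\ 1&2&2&-1 \end{bmatrix} \right\rangle = \frac{1}{4}(1 + 2j - 2 - j + 3 + j + j - 2 - j - 1 - 2j + 2 - j) = \frac{1}{4}(1 - j)$$

Recall that the inner product conjugates the entries of $\mathbf{H}_{2,3}$ before multiplying.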


Example: 8x8 DCT

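[Figure: the 64 basis images of the 8x8 DCT, with the four corners labeled Low-Low, Low-High, High-Low, High-High]

$$H_{k,l}(m,n) = \alpha(k)\,\alpha(l) \cos\!\left[\frac{(2m+1) k \pi}{2N}\right] \cos\!\left[\frac{(2n+1) l \pi}{2N}\right], \quad \text{where } \alpha(k) = \begin{cases} \sqrt{1/N}, & k = 0 \\ \sqrt{2/N}, & k = 1, \dots, N-1 \end{cases}$$

Example: D=dctmtx(8); Basis43=D(4,:)'*D(3,:);

(The rows of dctmtx(8) are the 1D DCT basis vectors, so a basis image is the outer product of one row, transposed into a column, with another row.)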

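Hadamard Transform: Basis images

Example: D=hadamard(8); reindex=[1,8,4,5,2,7,3,6]; D(reindex,:)=D; Basis43=D(4,:)'*D(3,:);

(After the reindexing, the rows of D are in order of increasing number of sign changes, so the basis image is built from rows of D. Divide D by sqrt(8) for unit-norm basis vectors.)

From Amy Reibman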


Property of Separable Transform

•  When the transform is separable, we can perform the 2D transform separately.
   –  First, do 1D transform for each row using basis vectors $\mathbf{g}_l$,
   –  Second, do 1D transform for each column of the intermediate image using basis vectors $\mathbf{h}_k$.
   –  Proof:

$$T(k,l) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} H_{k,l}^*(m,n)\, F(m,n) = \sum_{m=0}^{M-1} h_k^*(m) \sum_{n=0}^{N-1} g_l^*(n)\, F(m,n) = \sum_{m=0}^{M-1} h_k^*(m)\, U(m,l), \quad \text{where } U(m,l) = \sum_{n=0}^{N-1} g_l^*(n)\, F(m,n)$$
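The row-column property can be illustrated with the 4x4 DFT. The sketch below computes the coefficients twice, once directly from the inner products with $\mathbf{H}_{k,l} = \mathbf{h}_k \mathbf{g}_l^T$ and once by a row pass followed by a column pass, using the 4x4 block F from the earlier 4x4 DFT example. The function names are illustrative.

```python
import cmath


def dft_basis(n_points):
    """Unitary DFT basis vectors h_k(n) = exp(j*2*pi*k*n/N) / sqrt(N)."""
    scale = 1 / n_points ** 0.5
    return [[scale * cmath.exp(2j * cmath.pi * k * n / n_points)
             for n in range(n_points)]
            for k in range(n_points)]


def transform_direct(h, g, image):
    """T(k,l) = sum_m sum_n conj(h_k(m) * g_l(n)) * F(m,n)."""
    rows, cols = len(image), len(image[0])
    return [[sum((h[k][m] * g[l][n]).conjugate() * image[m][n]
                 for m in range(rows) for n in range(cols))
             for l in range(cols)]
            for k in range(rows)]


def transform_separable(h, g, image):
    """Same coefficients via two passes of 1D transforms."""
    rows, cols = len(image), len(image[0])
    # Pass 1: transform every row with g_l, giving U(m, l).
    u = [[sum(g[l][n].conjugate() * image[m][n] for n in range(cols))
          for l in range(cols)]
         for m in range(rows)]
    # Pass 2: transform every column of U with h_k, giving T(k, l).
    return [[sum(h[k][m].conjugate() * u[m][l] for m in range(rows))
             for l in range(cols)]
            for k in range(rows)]


F = [[1, 2, 2, 0],
     [0, 1, 3, 1],
     [0, 1, 2, 1],
     [1, 2, 2, -1]]
h = dft_basis(4)
T_direct = transform_direct(h, h, F)
T_sep = transform_separable(h, h, F)

print(T_sep[0][0])   # close to 18/4
print(T_sep[2][3])   # close to (1 - j)/4
```

For an NxN block the direct form costs on the order of N^4 multiplications, while the two-pass form costs on the order of 2N^3, which is why separable transforms are preferred in practice.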


DCT on a Real Image Block

>>imblock = lena256(128:135,128:135)-128

imblock=

54 68 71 73 75 73 71 45

47 52 48 14 20 24 20 -8

20 -10 -5 -13 -14 -21 -20 -21

-13 -18 -18 -16 -23 -19 -27 -28

-24 -22 -22 -26 -24 -33 -30 -23

-29 -13 3 -24 -10 -42 -41 5

-16 26 26 -21 12 -31 -40 23

17 30 50 -5 4 12 10 5

>>dctblock =dct2(imblock)

dctblock=

31.0000 51.7034 1.1673 -24.5837 -12.0000 -25.7508 11.9640 23.2873

113.5766 6.9743 -13.9045 43.2054 -6.0959 35.5931 -13.3692 -13.0005

195.5804 10.1395 -8.6657 -2.9380 -28.9833 -7.9396 0.8750 9.5585

35.8733 -24.3038 -15.5776 -20.7924 11.6485 -19.1072 -8.5366 0.5125

40.7500 -20.5573 -13.6629 17.0615 -14.2500 22.3828 -4.8940 -11.3606

7.1918 -13.5722 -7.5971 -11.9452 18.2597 -16.2618 -1.4197 -3.5087

-1.4562 -13.3225 -0.8750 1.3248 10.3817 16.0762 4.4157 1.1041

-6.7720 -2.8384 4.1187 1.1118 10.5527 -2.7348 -3.2327 1.5799

In JPEG, the level shift "imblock-128" is applied before the DCT to center the sample values around zero

Note that the low-low coefficients are much larger than the high-high coefficients
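The MATLAB call `dct2(imblock)` can be mimicked without any toolbox. The sketch below is an assumption-laden stand-in, not MATLAB's implementation: it applies the separable orthonormal DCT defined earlier to the 8x8 block printed on the slide, and the names `dct_matrix`, `dct2`, and `idct2` are chosen here for illustration.

```python
import math


def dct_matrix(n):
    """Rows are the 1D DCT basis vectors h_k(i) = alpha(k) cos((2i+1) k pi / (2n))."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)]
            for k in range(n)]


def dct2(block):
    """2D DCT of a square block: rows first, then columns."""
    n = len(block)
    d = dct_matrix(n)
    u = [[sum(block[m][i] * d[l][i] for i in range(n)) for l in range(n)]
         for m in range(n)]
    return [[sum(d[k][m] * u[m][l] for m in range(n)) for l in range(n)]
            for k in range(n)]


def idct2(coef):
    """Inverse 2D DCT: F(m,i) = sum_k sum_l T(k,l) h_k(m) h_l(i)."""
    n = len(coef)
    d = dct_matrix(n)
    return [[sum(coef[k][l] * d[k][m] * d[l][i]
                 for k in range(n) for l in range(n))
             for i in range(n)]
            for m in range(n)]


imblock = [
    [54, 68, 71, 73, 75, 73, 71, 45],
    [47, 52, 48, 14, 20, 24, 20, -8],
    [20, -10, -5, -13, -14, -21, -20, -21],
    [-13, -18, -18, -16, -23, -19, -27, -28],
    [-24, -22, -22, -26, -24, -33, -30, -23],
    [-29, -13, 3, -24, -10, -42, -41, 5],
    [-16, 26, 26, -21, 12, -31, -40, 23],
    [17, 30, 50, -5, 4, 12, 10, 5],
]

dctblock = dct2(imblock)
recovered = idct2(dctblock)

# Energy is preserved by a unitary transform (Parseval).
energy_pixels = sum(v * v for row in imblock for v in row)
energy_coefs = sum(v * v for row in dctblock for v in row)

print(round(dctblock[0][0], 4), round(dctblock[0][1], 4))
```

The first two printed coefficients should agree with the first row of the `dctblock` listing on the slide to the displayed precision.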


Energy Distribution of DCT Coefficients in Typical Images

The variance of each coefficient is estimated as the average of the squared value of this coefficient over all blocks of an image

Zig-zag ordering


Images Approximated by Different Number of DCT Coefficients

Original

With 8/64 Coefficients





Demos

•  Use the MATLAB demo (dctdemo.m) to demonstrate approximation using different numbers of DCT coefficients

Transform design

•  What are desirable properties of a transform for image and video?
   –  Nearly decorrelating: improves efficiency of scalar quantizer
   –  High energy compaction: only a few large coefficients to send
   –  Easy to compute (few operations)
   –  Separable: compute 1-D transform first on rows, then on columns

•  What size transform should we use?
   –  Entire image? Small?
   –  2-D (on an image) or 3-D (incorporating time also)?

From Amy Reibman

Karhunen Loève Transform (KLT)

•  Optimal transform
•  Requires statistics of the input source
   –  Known covariance function
•  Coefficients are completely uncorrelated
•  The best energy compaction
   –  Sort coefficients from largest to smallest expected squared magnitude; then the sum of the energies of the first M coefficients is as large as possible
•  No computationally efficient algorithm
•  We’ll derive it later

From Amy Reibman

Other Transform Bases

•  Suboptimal transforms: many available!
   –  Discrete Fourier Transform (DFT): complex values; discontinuities
   –  Discrete Cosine Transform (DCT): nearly as good as KLT for common image signals
   –  Hadamard and Haar: basis functions contain only +1, 0, -1


Distortion in Transform Coding

•  Distortion in sample (image) domain

•  Distortion in coefficient (transform) domain

•  With a unitary transform, the two distortions are equal

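The equations for this slide are not reproduced in the text. A minimal sketch of the usual argument, assuming per-sample mean squared error as the distortion measure (the symbols $D_s$, $D_t$, $\hat{\mathbf{f}}$, and $\hat{\mathbf{t}}$ are notation introduced here, with $\hat{\mathbf{f}} = \mathbf{B}\hat{\mathbf{t}}$ the block reconstructed from the quantized coefficients):

```latex
\begin{align*}
D_s &= \frac{1}{N}\, E\{\|\mathbf{f} - \hat{\mathbf{f}}\|^2\}, \qquad
D_t = \frac{1}{N}\, E\{\|\mathbf{t} - \hat{\mathbf{t}}\|^2\}, \\
\|\mathbf{f} - \hat{\mathbf{f}}\|^2
  &= \|\mathbf{B}(\mathbf{t} - \hat{\mathbf{t}})\|^2
   = (\mathbf{t} - \hat{\mathbf{t}})^H \mathbf{B}^H \mathbf{B}\, (\mathbf{t} - \hat{\mathbf{t}})
   = \|\mathbf{t} - \hat{\mathbf{t}}\|^2
  \quad \text{since } \mathbf{B}^H \mathbf{B} = \mathbf{I}, \\
\Rightarrow\; D_s &= D_t .
\end{align*}
```

This is why the quantizer can be designed in the coefficient domain: minimizing the coefficient error also minimizes the sample error.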


Modeling of Distortion Due to Coefficient Quantization

•  High Resolution Approximation of Scalar Quantization
   –  With the MMSE quantizer, when each coefficient is scalar quantized at a sufficiently high rate, the pdf in each quantization bin is approximately flat

[Equations not reproduced: the distortion for one coefficient, which includes a factor that depends on the pdf of the k-th coefficient, and the average distortion over all coefficients]


Optimal Bit Allocation Among Coefficients

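•  How Many Bits to Use For Each Coefficient?
   –  Can be formulated as a constrained optimization problem:
      Minimize: the average distortion over all coefficients
      Subject to: a given average bit rate
   –  The constrained problem can be converted to an unconstrained one using the Lagrange multiplier method
      Minimize: the average distortion plus the multiplier times the rate constraint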


Derivation and Result

Multiply to obtain:

Substitute into first equation:

Result: all distortions are equal!
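The equations of the derivation are not reproduced in the text. The following is a hedged sketch of the standard argument under the high-resolution model, in which coefficient $k$ with variance $\sigma_k^2$ coded at $R_k$ bits has distortion $D_k = \epsilon_k^2 \sigma_k^2 2^{-2R_k}$. The symbol $\epsilon_k^2$ for the pdf-dependent factor is an assumption of this sketch, as is the exact form of the Lagrangian.

```latex
\begin{align*}
&\text{Minimize } D = \frac{1}{N} \sum_{k=0}^{N-1} \epsilon_k^2 \sigma_k^2\, 2^{-2R_k}
 \quad \text{subject to } \frac{1}{N} \sum_{k=0}^{N-1} R_k = R, \\
&J = \frac{1}{N} \sum_{k} \epsilon_k^2 \sigma_k^2\, 2^{-2R_k}
   + \lambda \Big( \frac{1}{N} \sum_{k} R_k - R \Big), \\
&\frac{\partial J}{\partial R_k} = 0
 \;\Rightarrow\; D_k = \epsilon_k^2 \sigma_k^2\, 2^{-2R_k} = \frac{\lambda}{2 \ln 2}
 \quad \text{for all } k, \\
&\prod_{k} D_k = \Big( \prod_{k} \epsilon_k^2 \sigma_k^2 \Big) 2^{-2NR}
 \;\Rightarrow\;
 D = D_k = \Big( \prod_{k} \epsilon_k^2 \sigma_k^2 \Big)^{1/N} 2^{-2R}, \\
&R_k = R + \frac{1}{2} \log_2
 \frac{\epsilon_k^2 \sigma_k^2}{\big( \prod_{l} \epsilon_l^2 \sigma_l^2 \big)^{1/N}} .
\end{align*}
```

The third line is the "all distortions are equal" result, and multiplying it over all $k$ eliminates the multiplier.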


Implication of Optimal Bit Allocation

•  Bit rate for a coefficient increases with the logarithm of its variance (energy): coefficients with larger variance receive more bits

•  Distortion is equalized among all coefficients and depends on the geometric mean of the coefficient variances

•  Geometric mean is never larger than the arithmetic mean (they are equal only when all numbers are equal)!
   –  Ex (1, 9): arithmetic mean=5, geometric mean=3
   –  The more disparate the numbers are, the smaller their geometric mean is relative to the arithmetic mean
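As a numerical illustration of these implications, the sketch below applies the high-resolution allocation rule $R_k = R + \tfrac{1}{2}\log_2(\sigma_k^2 / \text{geometric mean})$ to the toy variances (1, 9). It assumes the pdf-dependent factor is the same for every coefficient (as for a Gaussian source) and drops it from the distortion, so the printed distortions are only correct up to that common factor. The function name is illustrative.

```python
import math


def optimal_bit_allocation(variances, avg_rate):
    """Bits per coefficient under the high-resolution model (may be non-integer)."""
    log_gm = sum(math.log2(v) for v in variances) / len(variances)
    return [avg_rate + 0.5 * (math.log2(v) - log_gm) for v in variances]


variances = [1.0, 9.0]
rates = optimal_bit_allocation(variances, avg_rate=2.0)

# Distortion of each coefficient, up to the common pdf-dependent factor.
distortions = [v * 2 ** (-2 * r) for v, r in zip(variances, rates)]

print([round(r, 4) for r in rates])
print([round(d, 4) for d in distortions])
```

Both distortions come out equal to the geometric mean times $2^{-2R}$, here 3/16. The rule can return negative or fractional rates for very small variances, which a practical coder has to handle separately.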


Transform Coding Gain Over PCM

•  PCM: quantize each sample in the image domain directly
•  Distortion for PCM if each sample is quantized to R bits:

•  Gain over PCM:

•  For Gaussian source
   –  each sample is Gaussian, so that the coefficients are also Gaussian, and the pdf-dependent factors in the distortion model are all the same
   –  the gain is then the ratio of the arithmetic mean to the geometric mean of the coefficient variances
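For a Gaussian source the gain reduces to the arithmetic mean of the coefficient variances divided by their geometric mean. The short sketch below evaluates that ratio for the toy variances (1, 9) used on the previous slide; the function name is illustrative.

```python
import math


def transform_coding_gain(variances):
    """Arithmetic mean over geometric mean of the coefficient variances."""
    n = len(variances)
    arithmetic_mean = sum(variances) / n
    geometric_mean = math.exp(sum(math.log(v) for v in variances) / n)
    return arithmetic_mean / geometric_mean


gain = transform_coding_gain([1.0, 9.0])
gain_db = 10 * math.log10(gain)

print(round(gain, 4), "which is", round(gain_db, 2), "dB")
```

A transform that leaves all variances equal gives a gain of 1 (0 dB), so the gain measures how unevenly the transform distributes the energy.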


Example

•  Determine the optimal bit allocation and the corresponding TC gain for coding a 2x2 image block using the 2x2 DCT. Assume the image is a Gaussian process with the inter-sample correlation shown below.

[Figure: inter-sample correlation of the 2x2 block]


Example Continued (Convert 2x2 into 4x1)

•  Correlation matrix

•  DCT basis images

•  Equivalent 1D transform matrix



Example Continued

(for R=2)

For practical applications, we may need to restrict the allocation to nonnegative integer numbers of bits (an integer programming problem).


Optimal Transform (KLT)

•  Optimal transform
   –  Should minimize the distortion for a given average bit rate
   –  Equivalent to minimizing the geometric mean of the coefficient variances

•  When the source is Gaussian, the optimal transform is the Karhunen-Loève transform, which depends on the covariance matrix between samples
   –  Basis vectors are the eigenvectors of the covariance matrix; the coefficient variances are the eigenvalues


Example

•  Determine the KLT for the 2x2 image block in the previous example

Determine the eigenvalues by solving:

Determine the eigenvectors by solving

(same as the coefficient variances with DCT)

Resulting transform is the DCT


Properties of KLT

•  Optimal transform for Gaussian sources
•  Nearly optimal transform for non-Gaussian sources
•  Minimal approximation error for K<N coefficients among all unitary transforms
•  KLT has highest energy compaction
•  Coefficients are uncorrelated
•  Requires a stationary source with known covariance matrix, but most sources vary spatially and temporally
•  No fast algorithms, and the transform is signal dependent
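To make the eigenvector construction concrete, here is a deliberately small sketch for a two-sample source, where the eigen-decomposition of the 2x2 covariance matrix has a closed form. The correlation value 0.9 is a hypothetical number chosen for illustration, not a value from the lecture, and `klt_2x2` is a name introduced here.

```python
import math


def klt_2x2(a, b, c):
    """Eigenvalues and unit eigenvectors of the covariance matrix [[a, b], [b, c]].

    Returns (eigenvalues, basis) sorted by decreasing eigenvalue; each basis
    vector is one KLT basis vector, and each eigenvalue is the variance of
    the corresponding transform coefficient.
    """
    if b == 0:
        # Already diagonal: the samples are uncorrelated.
        if a >= c:
            return [a, c], [[1.0, 0.0], [0.0, 1.0]]
        return [c, a], [[0.0, 1.0], [1.0, 0.0]]
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    eigenvalues = [mean + radius, mean - radius]
    basis = []
    for lam in eigenvalues:
        v = [b, lam - a]                     # solves (a - lam) v0 + b v1 = 0
        norm = math.hypot(v[0], v[1])
        basis.append([x / norm for x in v])
    return eigenvalues, basis


rho = 0.9  # hypothetical correlation between two unit-variance samples
eigenvalues, basis = klt_2x2(1.0, rho, 1.0)

print([round(x, 4) for x in eigenvalues])
print([[round(x, 4) for x in vec] for vec in basis])
```

For two equal-variance samples the basis vectors are the normalized sum and difference, which coincide with the 2-point DCT and DFT basis, and the coefficient variances are 1 + rho and 1 - rho.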



What is JPEG

•  The Joint Photographic Experts Group (JPEG) is a committee under both the International Organization for Standardization (ISO) and the International Telecommunication Union, Telecommunication Standardization Sector (ITU-T)
   –  www.jpeg.org

•  Has published several standards
   –  JPEG: lossy coding of continuous-tone still images
      •  Based on DCT
   –  JPEG-LS: lossless and near-lossless coding of continuous-tone still images
      •  Based on predictive coding and entropy coding
   –  JPEG2000: scalable coding of continuous-tone still images (from lossy to lossless)
      •  Based on wavelet transform


The 1992 JPEG Standard

•  Contains several modes:
   –  Baseline system (what is commonly known as JPEG!): lossy
      •  Can handle gray scale or color images (8 bit)
   –  Extended system: lossy
      •  Can handle higher precision (12 bit) images, providing progressive streams, etc.
   –  Lossless version
•  Baseline system
   –  Each color component is divided into 8x8 blocks
   –  For each 8x8 block, three steps are involved:
      •  Block DCT
      •  Perceptual-based quantization
      •  Variable length coding: Runlength and Huffman coding


Quantization of DCT Coefficients

•  DC coefficient is predicted from the DC of the previous block, and the prediction error is quantized.

•  Use uniform quantizer on the DC prediction error and each AC coefficient

•  Different coefficients are quantized with different step sizes (Q):
   –  Human eye is more sensitive to low frequency components
   –  Low frequency coefficients with a smaller Q
   –  High frequency coefficients with a larger Q
   –  Specified in a normalization matrix
   –  Normalization matrix can then be scaled by a scale factor

•  JPEG bit allocation does not intend to minimize MSE!
Default Normalization Matrix in JPEG

Actual step size for C(i,j): Q(i,j) = QP*M(i,j)

Note that the stepsize for the DC coefficient is for the prediction error for DC, not the original DC value



Example: DCT on a Real Image Block

>>imblock = lena256(128:135,128:135)-128

imblock=

54 68 71 73 75 73 71 45

47 52 48 14 20 24 20 -8

20 -10 -5 -13 -14 -21 -20 -21

-13 -18 -18 -16 -23 -19 -27 -28

-24 -22 -22 -26 -24 -33 -30 -23

-29 -13 3 -24 -10 -42 -41 5

-16 26 26 -21 12 -31 -40 23

17 30 50 -5 4 12 10 5


dctblock=

31.0000 51.7034 1.1673 -24.5837 -12.0000 -25.7508 11.9640 23.2873

113.5766 6.9743 -13.9045 43.2054 -6.0959 35.5931 -13.3692 -13.0005

195.5804 10.1395 -8.6657 -2.9380 -28.9833 -7.9396 0.8750 9.5585

35.8733 -24.3038 -15.5776 -20.7924 11.6485 -19.1072 -8.5366 0.5125

40.7500 -20.5573 -13.6629 17.0615 -14.2500 22.3828 -4.8940 -11.3606

7.1918 -13.5722 -7.5971 -11.9452 18.2597 -16.2618 -1.4197 -3.5087

-1.4562 -13.3225 -0.8750 1.3248 10.3817 16.0762 4.4157 1.1041

-6.7720 -2.8384 4.1187 1.1118 10.5527 -2.7348 -3.2327 1.5799

In JPEG, the level shift "imblock-128" is applied before the DCT to center the sample values around zero


Example: Quantized Indices


dctblock=

31.0000 51.7034 1.1673 -24.5837 -12.0000 -25.7508 11.9640 23.2873

113.5766 6.9743 -13.9045 43.2054 -6.0959 35.5931 -13.3692 -13.0005

195.5804 10.1395 -8.6657 -2.9380 -28.9833 -7.9396 0.8750 9.5585

35.8733 -24.3038 -15.5776 -20.7924 11.6485 -19.1072 -8.5366 0.5125

40.7500 -20.5573 -13.6629 17.0615 -14.2500 22.3828 -4.8940 -11.3606

7.1918 -13.5722 -7.5971 -11.9452 18.2597 -16.2618 -1.4197 -3.5087

-1.4562 -13.3225 -0.8750 1.3248 10.3817 16.0762 4.4157 1.1041

-6.7720 -2.8384 4.1187 1.1118 10.5527 -2.7348 -3.2327 1.5799

>>QP=1;

>>QM=Qmatrix*QP;

>>qdct=floor((dctblock+QM/2)./(QM))

qdct =

2 5 0 -2 0 -1 0 0

9 1 -1 2 0 1 0 0

14 1 -1 0 -1 0 0 0

3 -1 -1 -1 0 0 0 0

2 -1 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

Only 19 of the 64 coefficients are nonzero after quantization
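The quantization step can be reproduced outside MATLAB. The slides do not print `Qmatrix`, so the sketch below assumes it is the example luminance quantization table from Annex K of the JPEG standard. That assumption is consistent with the dequantized values shown on the next slide, but it remains an assumption. The function names are illustrative.

```python
import math

# Assumption: Qmatrix is the example luminance table from Annex K of the JPEG standard.
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

# DCT coefficients of the example block, copied from the slide.
DCT_BLOCK = [
    [31.0000, 51.7034, 1.1673, -24.5837, -12.0000, -25.7508, 11.9640, 23.2873],
    [113.5766, 6.9743, -13.9045, 43.2054, -6.0959, 35.5931, -13.3692, -13.0005],
    [195.5804, 10.1395, -8.6657, -2.9380, -28.9833, -7.9396, 0.8750, 9.5585],
    [35.8733, -24.3038, -15.5776, -20.7924, 11.6485, -19.1072, -8.5366, 0.5125],
    [40.7500, -20.5573, -13.6629, 17.0615, -14.2500, 22.3828, -4.8940, -11.3606],
    [7.1918, -13.5722, -7.5971, -11.9452, 18.2597, -16.2618, -1.4197, -3.5087],
    [-1.4562, -13.3225, -0.8750, 1.3248, 10.3817, 16.0762, 4.4157, 1.1041],
    [-6.7720, -2.8384, 4.1187, 1.1118, 10.5527, -2.7348, -3.2327, 1.5799],
]


def quantize(block, table, qp=1):
    """Uniform quantizer: index = floor((x + Q/2) / Q) with step Q = qp * table entry."""
    return [[math.floor((x + qp * q / 2) / (qp * q)) for x, q in zip(brow, qrow)]
            for brow, qrow in zip(block, table)]


def dequantize(indices, table, qp=1):
    """Reconstruction value = index * step size."""
    return [[i * qp * q for i, q in zip(irow, qrow)]
            for irow, qrow in zip(indices, table)]


qdct = quantize(DCT_BLOCK, Q_LUMA)
iqdct = dequantize(qdct, Q_LUMA)
nonzero = sum(1 for row in qdct for v in row if v != 0)

for row in qdct:
    print(row)
print("nonzero coefficients:", nonzero)
```

Increasing `qp` enlarges every step size, which zeroes more coefficients and lowers both the bit rate and the reconstruction quality.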


Example: Quantized Coefficients

%dequantized DCT block

>> iqdct=qdct.*QM

iqdct=

32 55 0 -32 0 -40 0 0

108 12 -14 38 0 58 0 0

196 13 -16 0 -40 0 0 0

42 -17 -22 -29 0 0 0 0

36 -22 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

Original DCT block

dctblock=

31.0000 51.7034 1.1673 -24.5837 -12.0000 -25.7508 11.9640 23.2873

113.5766 6.9743 -13.9045 43.2054 -6.0959 35.5931 -13.3692 -13.0005

195.5804 10.1395 -8.6657 -2.9380 -28.9833 -7.9396 0.8750 9.5585

35.8733 -24.3038 -15.5776 -20.7924 11.6485 -19.1072 -8.5366 0.5125

40.7500 -20.5573 -13.6629 17.0615 -14.2500 22.3828 -4.8940 -11.3606

7.1918 -13.5722 -7.5971 -11.9452 18.2597 -16.2618 -1.4197 -3.5087

-1.4562 -13.3225 -0.8750 1.3248 10.3817 16.0762 4.4157 1.1041

-6.7720 -2.8384 4.1187 1.1118 10.5527 -2.7348 -3.2327 1.5799



Example: Reconstructed Image

%reconstructed image block

>> qimblock=round(idct2(iqdct))

qimblock=

58 68 85 79 61 68 67 38

45 38 39 33 22 24 19 -2

21 2 -11 -12 -13 -19 -24 -27

-8 -19 -31 -26 -20 -35 -37 -15

-31 -17 -21 -20 -16 -39 -41 0

-33 3 -1 -14 -11 -37 -44 1

-16 32 18 -10 1 -16 -30 8

3 54 30 -6 16 11 -7 23

Original image block

imblock=

54 68 71 73 75 73 71 45

47 52 48 14 20 24 20 -8

20 -10 -5 -13 -14 -21 -20 -21

-13 -18 -18 -16 -23 -19 -27 -28

-24 -22 -22 -26 -24 -33 -30 -23

-29 -13 3 -24 -10 -42 -41 5

-16 26 26 -21 12 -31 -40 23

17 30 50 -5 4 12 10 5



Coding of Quantized DCT Coefficients

•  DC coefficient: Predictive coding
   –  The DC value of the current block is predicted from that of the previous block, and the error is coded using Huffman coding
•  AC Coefficients: Runlength coding
   –  Many high frequency AC coefficients are zero after the first few low-frequency coefficients
   –  Runlength Representation:
      •  Order the coefficients in the zig-zag order
      •  Specify how many zeros precede each non-zero value
      •  Each symbol=(length-of-zero, non-zero-value)
   –  Code all possible symbols using Huffman coding
      •  More frequently appearing symbols are given shorter codewords
      •  One can use the default Huffman tables or specify one's own tables.
•  Instead of Huffman coding, arithmetic coding can be used to achieve higher coding efficiency at the cost of added complexity.


Example: Run-length Coding

qdct =

2 5 0 -2 0 -1 0 0

9 1 -1 2 0 1 0 0

14 1 -1 0 -1 0 0 0

3 -1 -1 -1 0 0 0 0

2 -1 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

Run-length symbol representation:
{2,(0,5),(0,9),(0,14),(0,1),(1,-2),(0,-1),(0,1),(0,3),(0,2),(0,-1),(0,-1),(0,2),(1,-1),(2,-1),(0,-1),(4,-1),(0,-1),(0,1),EOB}

EOB: End of block, one of the symbols, which is assigned a short Huffman codeword

[Figure: zig-zag ordering of DCT coefficients]
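The symbol list above can be generated mechanically. The sketch below scans the quantized block in zig-zag order and emits (run-of-zeros, value) pairs. It is a simplified illustration: it always appends EOB and does not split zero runs longer than 15 as an actual JPEG encoder would. The function names are chosen here for illustration.

```python
def zigzag_order(n):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col indexes the anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)              # even diagonals run from bottom-left upward
        order.extend((r, s - r) for r in rows)
    return order


def run_length_symbols(block):
    """DC index followed by (zero-run, nonzero value) pairs and a final 'EOB'."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    symbols = [scan[0]]
    run = 0
    for value in scan[1:]:
        if value == 0:
            run += 1
        else:
            symbols.append((run, value))
            run = 0
    symbols.append("EOB")                      # trailing zeros are absorbed by EOB
    return symbols


qdct = [
    [2, 5, 0, -2, 0, -1, 0, 0],
    [9, 1, -1, 2, 0, 1, 0, 0],
    [14, 1, -1, 0, -1, 0, 0, 0],
    [3, -1, -1, -1, 0, 0, 0, 0],
    [2, -1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

symbols = run_length_symbols(qdct)
print(symbols)
```

Because the zig-zag scan visits low frequencies first, the nonzero values cluster at the start and the long tail of zeros collapses into the single EOB symbol.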


Coding of DC Symbols

•  Example:
   –  Current quantized DC index: 2
   –  Previous block DC index: 4
   –  Prediction error: -2
   –  The prediction error is coded in two parts:
      •  Which category it belongs to (Table of JPEG Coefficient Coding Categories), coded using a Huffman code (JPEG Default DC Code)
         –  DC= -2 is in category “2”, with a codeword “100”
      •  Which position it is in that category, using a fixed length code, length=category number
         –  “-2” is the number 1 (starting from 0) in category 2, with a fixed length code of “01”.
   –  The overall codeword is “10001”


JPEG Tables for Coding DC



Coding of AC Coefficients

•  Example:
   –  First symbol (0,5)
      •  The value ‘5’ is represented in two parts:
      •  Which category it belongs to (Table of JPEG Coefficient Coding Categories), coding the “(runlength, category)” pair using a Huffman code (JPEG Default AC Code)
         –  AC=5 is in category “3”,
         –  Symbol (0,3) has codeword “100”
      •  Which position it is in that category, using a fixed length code, length=category number
         –  “5” is the number 5 (starting from 0) in category 3, with a fixed length code of “101”.
      •  The overall codeword for (0,5) is “100101”
   –  Second symbol (0,9)
      •  ‘9’ is in category ‘4’, (0,4) has codeword ‘1011’, and ‘9’ is number 9 in category 4 with codeword ‘1001’ -> the overall codeword for (0,9) is ‘10111001’
   –  Etc.
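The category and position rules used in the DC and AC examples can be written compactly. In the sketch below the category is the number of bits needed for the magnitude, and the position counts from the most negative value in that category. The two small Huffman dictionaries contain only the codewords quoted on these slides, not the full tables, and all names are illustrative.

```python
def category(value):
    """Coefficient category: number of bits needed to represent |value|."""
    return abs(value).bit_length()


def position_bits(value):
    """Fixed-length code for the position of value within its category."""
    cat = category(value)
    if cat == 0:
        return ""                               # category 0 carries no extra bits
    index = value if value > 0 else value + (1 << cat) - 1
    return format(index, "0{}b".format(cat))


# Only the entries quoted on the slides; the full tables are not reproduced here.
DC_HUFFMAN = {2: "100"}
AC_HUFFMAN = {(0, 3): "100", (0, 4): "1011"}


def encode_dc(prediction_error):
    return DC_HUFFMAN[category(prediction_error)] + position_bits(prediction_error)


def encode_ac(run, value):
    return AC_HUFFMAN[(run, category(value))] + position_bits(value)


print(encode_dc(2 - 4))     # DC index 2 predicted from previous index 4
print(encode_ac(0, 5))
print(encode_ac(0, 9))
```

Splitting each value into a Huffman-coded category and uncoded position bits keeps the Huffman tables small, since they only need one entry per category or (run, category) pair.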


JPEG Tables for Coding AC (Run,Category) Symbols



JPEG Performance for B/W images

65536 Bytes 8 bpp

4839 Bytes 0.59 bpp CR=13.6

3037 Bytes 0.37 bpp CR=21.6

1818 Bytes 0.22 bpp CR=36.4



JPEG for Color Images

•  Color images are typically stored in (R,G,B) format
•  JPEG standard can be applied to each component separately
   –  Does not make use of the correlation between color components
   –  Does not make use of the lower sensitivity of the human eye to chrominance samples
•  Alternate approach
   –  Convert the (R,G,B) representation to a YCbCr representation
      •  Y: luminance, Cb, Cr: chrominance
   –  Down-sample the two chrominance components
      •  Because the peak response of the eye to the luminance component occurs at a higher frequency (3-10 cpd) than to the chrominance components (0.1-0.5 cpd). (Note: cpd is cycles/degree)
•  JPEG standard is designed to handle an image consisting of many (up to 255) components


RGB <-> YCbCr Conversion

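$$\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.169 & -0.331 & 0.500 \\ 0.500 & -0.419 & -0.081 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 0 \\ 128 \\ 128 \end{bmatrix}$$

$$\begin{bmatrix} R \\ G \\ B \end{bmatrix} = \begin{bmatrix} 1.000 & -0.001 & 1.402 \\ 1.000 & -0.344 & -0.714 \\ 1.000 & 1.772 & 0.001 \end{bmatrix} \begin{bmatrix} Y \\ C_b - 128 \\ C_r - 128 \end{bmatrix}$$

Note: Cb ~ B-Y, Cr ~ R-Y, are known as color difference signals.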
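A quick round-trip check of the two conversion matrices, as a sketch. The coefficients are the rounded values shown on this slide, so the round trip is only accurate to a fraction of a gray level; the function names are illustrative, and no clipping to the 0-255 range is applied.

```python
FORWARD = [
    [0.299, 0.587, 0.114],
    [-0.169, -0.331, 0.500],
    [0.500, -0.419, -0.081],
]
INVERSE = [
    [1.000, -0.001, 1.402],
    [1.000, -0.344, -0.714],
    [1.000, 1.772, 0.001],
]
OFFSET = [0, 128, 128]   # chrominance is shifted to be centered at 128


def rgb_to_ycbcr(rgb):
    """[Y, Cb, Cr] = FORWARD * [R, G, B] + OFFSET."""
    return [sum(c * x for c, x in zip(row, rgb)) + off
            for row, off in zip(FORWARD, OFFSET)]


def ycbcr_to_rgb(ycc):
    """[R, G, B] = INVERSE * ([Y, Cb, Cr] - OFFSET)."""
    centered = [v - off for v, off in zip(ycc, OFFSET)]
    return [sum(c * x for c, x in zip(row, centered)) for row in INVERSE]


gray = rgb_to_ycbcr([128, 128, 128])     # a gray pixel has no color difference
red = rgb_to_ycbcr([255, 0, 0])
red_back = ycbcr_to_rgb(red)

print([round(v, 3) for v in gray])
print([round(v, 3) for v in red], [round(v, 3) for v in red_back])
```

For any gray pixel both chrominance components equal the offset 128, which is the sense in which Cb and Cr carry only color differences.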


Chrominance Subsampling

[Figure: sampling grids showing Y pixels and Cb and Cr pixels for four formats]

4:4:4  For every 2x2 Y pixels, 4 Cb & 4 Cr pixels (no subsampling)

4:2:2  For every 2x2 Y pixels, 2 Cb & 2 Cr pixels (subsampling by 2:1 horizontally only)

4:1:1  For every 4x1 Y pixels, 1 Cb & 1 Cr pixel (subsampling by 4:1 horizontally only)

4:2:0  For every 2x2 Y pixels, 1 Cb & 1 Cr pixel (subsampling by 2:1 both horizontally and vertically)

4:2:0 is the most common format
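A minimal sketch of chrominance subsampling follows. The slide only specifies how many chrominance samples are kept; averaging the neighboring samples, as done here, is one common choice and is an assumption of this example, as are the function names. Plane dimensions are assumed to be even.

```python
def subsample_420(plane):
    """Keep one sample per 2x2 neighborhood (2:1 horizontally and vertically)."""
    return [[(plane[r][c] + plane[r][c + 1] + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
             for c in range(0, len(plane[0]), 2)]
            for r in range(0, len(plane), 2)]


def subsample_422(plane):
    """Keep one sample per horizontal pair (2:1 horizontally only)."""
    return [[(row[c] + row[c + 1]) / 2 for c in range(0, len(row), 2)]
            for row in plane]


cb = [
    [10, 20, 30, 40],
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [50, 60, 70, 80],
]

cb_420 = subsample_420(cb)
cb_422 = subsample_422(cb)

# Samples per 2x2 luminance block: 4 Y plus the Cb and Cr samples that remain.
samples_444 = 4 + 4 + 4
samples_420 = 4 + 1 + 1

print(cb_420)
print(cb_422)
print("4:2:0 keeps", samples_420, "of", samples_444, "samples")
```

With 4:2:0 the raw data volume is halved before any transform coding takes place, from 12 to 6 samples per 2x2 block of pixels.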


Coding Unit in JPEG

[Figure: four 8x8 Y blocks, one 8x8 Cb block, one 8x8 Cr block]

Each basic coding unit (called a data unit) is an 8x8 block in any color component. In the interleaved mode, 4 Y blocks, 1 Cb block, and 1 Cr block are processed as a group (called a minimum coded unit or MCU) for a 4:2:0 image.


Default Quantization Table

17 18 24 47 99 99 99 99

18 21 26 66 99 99 99 99

24 26 56 99 99 99 99 99

47 66 99 99 99 99 99 99

99 99 99 99 99 99 99 99

99 99 99 99 99 99 99 99

99 99 99 99 99 99 99 99

99 99 99 99 99 99 99 99

The table above is the one for chrominance; the table for luminance is not reproduced here.

The encoder can specify quantization tables different from the default ones as part of the header information


Performance of JPEG

•  For color images at 24 bits/pixel (bpp)
   –  0.25-0.5 bpp: moderate to good
   –  0.5-0.75 bpp: good to very good
   –  0.75-1.5 bpp: excellent, sufficient for most applications
   –  1.5-2 bpp: indistinguishable from original
   –  From: G. K. Wallace, "The JPEG still picture compression standard," Communications of the ACM, April 1991.

•  For grayscale images at 8 bpp
   –  0.5 bpp: excellent quality


JPEG Performance

487x414 pixels, Uncompressed, 600471 Bytes, 24 bpp

85502 Bytes, 3.39 bpp, CR=7

487x414 pixels, 41174 Bytes, 1.63 bpp, CR=14.7


Reading

•  Reading assignment:
   –  [Wang2002] Sec. 9.1

•  Optional reading:
   –  R. Gonzalez, “Digital Image Processing,” Section 8.5 - 8.6
   –  G. K. Wallace, "The JPEG still picture compression standard," Communications of the ACM, April 1991.
   –  [Wang2002] Sec. 9.2 (Predictive coding)


Written Homework (1)

•  [Wang2002] Prob. 9.4, 9.5
•  Additional problems on the following pages



1. For the 2x2 image S given below, compute its 2D DCT and reconstruct it by retaining different numbers of coefficients to evaluate the effect of different basis images.
   a)  Determine the four DCT basis images.
   b)  Determine the 2D-DCT coefficients for S, $T_{k,l}$, k=0,1; l=0,1.
   c)  Show that the image reconstructed from the original DCT coefficients equals the original image.
   d)  Modify the DCT coefficients using the given window masks (W1 to W5) and reconstruct the image using the modified DCT coefficients. (For a given mask, “1” indicates to retain that coefficient, “0” means to set the corresponding coefficient to zero.) What effect do you see with each mask and why?

$$\mathbf{S} = \begin{bmatrix} 9 & 1 \\ 1 & 9 \end{bmatrix},\ \mathbf{W}_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\ \mathbf{W}_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\ \mathbf{W}_3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},\ \mathbf{W}_4 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix},\ \mathbf{W}_5 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

2. For the same image S as given in Prob. 1, quantize its DCT coefficients using the quantization matrix Q. Determine the quantized coefficient indices and quantized values. Also, determine the reconstructed image from the quantized coefficients.

$$\mathbf{Q} = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}$$



3. Describe briefly how JPEG compresses an image. You may want to break down your discussion into three parts:
   a)  How does JPEG compress an 8x8 image block (three steps are involved)
   b)  How does JPEG compress a gray-scale image
   c)  How does JPEG compress an RGB color image

4. Suppose the DCT coefficient matrix for a 4x4 image block is as shown below (dctblock).
   a)  Quantize its DCT coefficients using the quantization matrix Q given below, assuming QP=1. Determine the quantized coefficient indices and quantized values.
   b)  Represent the quantized indices using the run-length representation. That is, generate a series of symbols, the first being the quantized DC index, the following symbols each consisting of a length of zeros and the following non-zero index, and the last symbol being EOB (end of block).
   c)  Encode the DC index and the runlength symbols from (b) using the JPEG coding method, with the coding tables given in the lecture notes. For this problem, assume that the quantized index for the DC coefficient of the previous block is 60.

$$\text{dctblock} = 1.0\text{e}{+}03 \times \begin{bmatrix} 1.3676 & -0.0500 & -0.0466 & 0.0912 \\ 0.0134 & -0.0033 & 0.0877 & 0.0071 \\ -0.0086 & -0.0207 & 0.0036 & 0.0019 \\ -0.0046 & 0.0086 & 0.0044 & 0.0085 \end{bmatrix}, \quad \mathbf{Q} = \begin{bmatrix} 16 & 10 & 24 & 51 \\ 14 & 16 & 40 & 69 \\ 18 & 37 & 68 & 103 \\ 49 & 78 & 103 & 120 \end{bmatrix}$$


Computer Assignment

Write a program to examine the effect of quantizing the DCT coefficients. For each 8 x 8 block in the image, your program should first calculate the DCT of the block, quantize the coefficients, and then take the inverse DCT. For quantization, use a scaled version of the JPEG quantization matrix as the stepsizes. You only need to do this for a gray scale image or the luminance component of a color image. Furthermore, the program should compute the PSNR of the reconstructed image from quantized DCT coefficients and also count the total number of non-zero coefficients after quantization and derive the average number of non-zero coefficients per block. Your program should show the original and reconstructed image. Examine the resulting image quality with the following values for the scaling factor: 0.5, 1, 2, 4, 8. What is the largest value of the scaling factor at which the reconstructed image quality is very close to the original image? Please also plot PSNR vs. the quantization scaling factor, the number of non-zero coefficients/block vs. the quantization scaling factors, and finally the PSNR vs. the number of non-zero coefficients. Interpret these plots, to comment on the effect of the quantization factor on the image quality and bit rate. Note that you may roughly consider the number of non-zero coefficients to be proportional to the required bits to represent the quantized image. Hint: you may want to make use of the ”blockproc” function in MATLAB to speed up your program. You can use the dct2( ) and idct2( ) functions in MATLAB.
