Unitary Transforms and Transform Coding
Yao Wang, Polytechnic School of Engineering, New York University
@Yao Wang, 2016 EL6123: Image and Video Processing
Outline
• Overview of video coding systems
• Linear and unitary 1D transform
• 2D transform, separable 2D transform
• Transform coding
– Optimal bit allocation
• JPEG Image Coding Standard
Components in a Coding System
Focus of this lecture
Encoder Block Diagram of a Typical Block-Based Video Coder
(Assuming No Intra Prediction)
[Figure: encoder block diagram, annotated with the lecture covering each block: motion estimation (previous lecture); variable length coding (last lecture); scalar and vector quantization (last lecture); transform and predictive coding (this lecture)]
A Review of Vector Quantization
• Motivation: quantize a group of samples (a vector) together, to exploit the correlation between samples
• Each sample vector is replaced by one of the representative vectors (or patterns) that often occur in the signal
• Typically a block of 4x4 pixels
• Design is limited by the ability to obtain training samples that are similar to the samples to be quantized
• Implementation is limited by the large number of nearest-neighbor comparisons, which grows exponentially with the block size (for N samples per vector at R bits/sample, the codebook has 2^(NR) vectors)
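Below is a minimal Python sketch of the exhaustive nearest-neighbor search behind the last bullet (the lecture itself uses MATLAB). It assumes squared-error distortion. The function names and the toy codebook are hypothetical. The encoder computes one distance per codeword, and a codebook for N-sample vectors at R bits/sample holds 2^(NR) codewords:

```python
from typing import List, Sequence


def nearest_codeword(x: Sequence[float], codebook: List[Sequence[float]]) -> int:
    """Index of the codeword closest to x in squared Euclidean distance.

    Exhaustive search: one distance computation per codeword.
    """
    best_index, best_dist = 0, float("inf")
    for i, c in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(x, c))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def codebook_size(n_samples: int, bits_per_sample: int) -> int:
    """Number of codewords for n_samples-dimensional vectors at the given rate."""
    return 2 ** (n_samples * bits_per_sample)


# Hypothetical 2-D codebook, for illustration only.
toy_codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0), (1.0, -1.0)]
print(nearest_codeword((0.9, 1.2), toy_codebook))  # 1
print(codebook_size(16, 1))  # 65536 comparisons per 4x4 block at 1 bit/sample
```

Even at 1 bit/sample, a 4x4 block needs a 65536-entry codebook. That growth motivates the constrained, transform-based alternative below.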
Transform Coding
• Motivation:
– Represent a vector (e.g. a block of image samples) as the superposition of some typical vectors (block patterns)
– Quantize and code the coefficients
– Can be thought of as a constrained vector quantizer
[Figure: a block expressed as a weighted sum of block patterns with coefficients t1, t2, t3, t4]
Block Diagram
One Dimensional Linear Transform
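• Let C^N represent the N-dimensional complex space.
• Let h_0, h_1, …, h_{N-1} represent N linearly independent vectors in C^N.
• Any vector f ∈ C^N can be represented as a linear combination of h_0, h_1, …, h_{N-1}:
$$\mathbf{f} = \sum_{k=0}^{N-1} t(k)\,\mathbf{h}_k = \mathbf{B}\mathbf{t}, \quad \text{where } \mathbf{B} = [\mathbf{h}_0, \mathbf{h}_1, \ldots, \mathbf{h}_{N-1}], \quad \mathbf{t} = \begin{bmatrix} t(0) \\ t(1) \\ \vdots \\ t(N-1) \end{bmatrix}$$
$$\mathbf{t} = \mathbf{B}^{-1}\mathbf{f} = \mathbf{A}\mathbf{f}$$
f and t form a transform pair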
Inner Product
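• Definition of inner product:
$$\langle \mathbf{f}_1, \mathbf{f}_2 \rangle = \mathbf{f}_1^H \mathbf{f}_2 = \sum_{n=0}^{N-1} f_1^*(n)\, f_2(n)$$
• Orthogonal: $\langle \mathbf{f}_1, \mathbf{f}_2 \rangle = 0$
• Norm of a vector:
$$\|\mathbf{f}\|^2 = \langle \mathbf{f}, \mathbf{f} \rangle = \mathbf{f}^H \mathbf{f} = \sum_{n=0}^{N-1} |f(n)|^2$$
• Normalized vector: unit norm, $\|\mathbf{f}\| = 1$
• Orthonormal = orthogonal + normalized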
Orthonormal Basis Vectors (OBV)
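• {h_k, k = 0, …, N−1} are OBV if
$$\langle \mathbf{h}_k, \mathbf{h}_l \rangle = \delta_{k,l} = \begin{cases} 1, & k = l \\ 0, & k \neq l \end{cases}$$
• With OBV:
$$\langle \mathbf{h}_l, \mathbf{f} \rangle = \Big\langle \mathbf{h}_l, \sum_{k=0}^{N-1} t(k)\,\mathbf{h}_k \Big\rangle = \sum_{k=0}^{N-1} t(k)\,\langle \mathbf{h}_l, \mathbf{h}_k \rangle = t(l), \quad \text{i.e.,} \quad t(l) = \mathbf{h}_l^H \mathbf{f}$$
$$\mathbf{t} = \begin{bmatrix} \mathbf{h}_0^H \\ \mathbf{h}_1^H \\ \vdots \\ \mathbf{h}_{N-1}^H \end{bmatrix} \mathbf{f} = \mathbf{B}^H \mathbf{f} = \mathbf{A}\mathbf{f}$$
$$\mathbf{B}^{-1} = \mathbf{B}^H, \quad \text{or} \quad \mathbf{B}\mathbf{B}^H = \mathbf{B}^H\mathbf{B} = \mathbf{I}: \ \mathbf{B} \text{ is unitary}$$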
Definition of Unitary Transform
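• Basis vectors are orthonormal
• Forward transform:
$$t(k) = \langle \mathbf{h}_k, \mathbf{f} \rangle = \sum_{n=0}^{N-1} h_k^*(n)\, f(n), \quad \text{or} \quad \mathbf{t} = \begin{bmatrix} \mathbf{h}_0^H \\ \mathbf{h}_1^H \\ \vdots \\ \mathbf{h}_{N-1}^H \end{bmatrix} \mathbf{f} = \mathbf{B}^H\mathbf{f} = \mathbf{A}\mathbf{f}$$
• Inverse transform:
$$f(n) = \sum_{k=0}^{N-1} t(k)\, h_k(n), \quad \text{or} \quad \mathbf{f} = \sum_{k=0}^{N-1} t(k)\,\mathbf{h}_k = [\mathbf{h}_0, \mathbf{h}_1, \ldots, \mathbf{h}_{N-1}]\,\mathbf{t} = \mathbf{B}\mathbf{t} = \mathbf{A}^H\mathbf{t}$$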
Example: 4-pt Hadamard Transform
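$$\mathbf{h}_0 = \frac{1}{2}\begin{bmatrix}1\\1\\1\\1\end{bmatrix}, \quad \mathbf{h}_1 = \frac{1}{2}\begin{bmatrix}1\\1\\-1\\-1\end{bmatrix}, \quad \mathbf{h}_2 = \frac{1}{2}\begin{bmatrix}1\\-1\\-1\\1\end{bmatrix}, \quad \mathbf{h}_3 = \frac{1}{2}\begin{bmatrix}1\\-1\\1\\-1\end{bmatrix}, \quad \mathbf{f} = \begin{bmatrix}1\\2\\3\\4\end{bmatrix}$$
$$\Rightarrow \quad t_0 = \langle \mathbf{h}_0, \mathbf{f} \rangle = 5, \quad t_1 = \langle \mathbf{h}_1, \mathbf{f} \rangle = -2, \quad t_2 = \langle \mathbf{h}_2, \mathbf{f} \rangle = 0, \quad t_3 = \langle \mathbf{h}_3, \mathbf{f} \rangle = -1$$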
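The transform pair can be checked numerically on the Hadamard example with the small plain-Python sketch below. The helper names are illustrative, not from the lecture:
- `forward` computes t(k) = <h_k, f>.
- `inverse` computes f = Σ t(k) h_k.
- `is_orthonormal` checks <h_k, h_l> = δ(k, l).

```python
from typing import List

Vector = List[complex]


def inner(a: Vector, b: Vector) -> complex:
    """<a, b> = a^H b = sum_n conj(a[n]) * b[n]."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def forward(basis: List[Vector], f: Vector) -> Vector:
    """t(k) = <h_k, f> for an orthonormal basis {h_k}."""
    return [inner(h, f) for h in basis]


def inverse(basis: List[Vector], t: Vector) -> Vector:
    """f(n) = sum_k t(k) h_k(n)."""
    n_samples = len(basis[0])
    return [sum(t[k] * basis[k][n] for k in range(len(basis))) for n in range(n_samples)]


def is_orthonormal(basis: List[Vector], tol: float = 1e-12) -> bool:
    """Check <h_k, h_l> = delta(k, l) for every pair of basis vectors."""
    return all(abs(inner(hk, hl) - (1 if k == l else 0)) < tol
               for k, hk in enumerate(basis) for l, hl in enumerate(basis))


# 4-point Hadamard basis vectors h_0 .. h_3 of the example.
hadamard4 = [[0.5 * s for s in signs] for signs in
             ([1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1])]
f = [1, 2, 3, 4]
t = forward(hadamard4, f)
print([round(c.real, 6) for c in t])  # [5.0, -2.0, 0.0, -1.0]
print(inverse(hadamard4, t))          # recovers f
```

The same helpers work for any orthonormal basis, including the complex DFT vectors that follow, because `inner` conjugates its first argument.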
1D DFT as a Unitary Transform
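$$h_k(n) = \frac{1}{\sqrt{N}}\, e^{j2\pi kn/N}, \quad \text{or} \quad \mathbf{h}_k = \frac{1}{\sqrt{N}}\begin{bmatrix} 1 \\ e^{j2\pi k/N} \\ \vdots \\ e^{j2\pi k(N-1)/N} \end{bmatrix}, \quad k = 0, 1, \ldots, N-1.$$
$$F(k) = \frac{1}{\sqrt{N}}\sum_{n=0}^{N-1} f(n)\, e^{-j2\pi kn/N}, \quad k = 0, 1, \ldots, N-1;$$
$$f(n) = \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} F(k)\, e^{j2\pi kn/N}, \quad n = 0, 1, \ldots, N-1.$$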
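Here is a plain-Python sketch of the unitary DFT pair. It uses the 1/√N scaling on both transforms, as defined above; by contrast, MATLAB's fft is unscaled and puts 1/N in ifft. The vector f = [2, 4, 5, 3] is the one used in the N = 4 example, and the function names are illustrative:

```python
import cmath
import math
from typing import List, Sequence


def dft_basis(N: int) -> List[List[complex]]:
    """Basis vectors h_k(n) = exp(j 2 pi k n / N) / sqrt(N), one list per k."""
    return [[cmath.exp(2j * math.pi * k * n / N) / math.sqrt(N) for n in range(N)]
            for k in range(N)]


def unitary_dft(f: Sequence[complex]) -> List[complex]:
    """F(k) = (1/sqrt(N)) * sum_n f(n) exp(-j 2 pi k n / N)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) / math.sqrt(N)
            for k in range(N)]


def unitary_idft(F: Sequence[complex]) -> List[complex]:
    """f(n) = (1/sqrt(N)) * sum_k F(k) exp(j 2 pi k n / N)."""
    N = len(F)
    return [sum(F[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / math.sqrt(N)
            for n in range(N)]


f = [2, 4, 5, 3]
F = unitary_dft(f)
# F(k) equals the inner product <h_k, f> = sum_n conj(h_k(n)) f(n).
H = dft_basis(len(f))
F_inner = [sum(h.conjugate() * x for h, x in zip(hk, f)) for hk in H]
print(F)  # approximately [7, -1.5-0.5j, 0, -1.5+0.5j]
```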
Example: 1D DFT, N=2
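N = 2 case: there are only two basis vectors:
$$\mathbf{h}_k = \frac{1}{\sqrt{2}}\begin{bmatrix} \exp\left(j\frac{2\pi}{2}k\cdot 0\right) \\ \exp\left(j\frac{2\pi}{2}k\cdot 1\right) \end{bmatrix}: \quad \mathbf{h}_0 = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\1\end{bmatrix}, \quad \mathbf{h}_1 = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\-1\end{bmatrix}$$
If $\mathbf{f} = [1, 2]^T$, determine $t_0, t_1$.
Using $t_k = \langle \mathbf{h}_k, \mathbf{f} \rangle$, we obtain
$$t_0 = \frac{1}{\sqrt{2}}(1\cdot 1 + 1\cdot 2) = \frac{3}{\sqrt{2}}, \qquad t_1 = \frac{1}{\sqrt{2}}(1\cdot 1 - 1\cdot 2) = -\frac{1}{\sqrt{2}}$$
Verify:
$$t_0\mathbf{h}_0 + t_1\mathbf{h}_1 = \frac{3}{2}\begin{bmatrix}1\\1\end{bmatrix} - \frac{1}{2}\begin{bmatrix}1\\-1\end{bmatrix} = \begin{bmatrix}1\\2\end{bmatrix} = \mathbf{f}$$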
Another Example: 1D DFT, N=4
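N = 4 case: using $\mathbf{h}_k = \frac{1}{2}\left[\exp\left(j\frac{2\pi}{4}k\cdot 0\right), \exp\left(j\frac{2\pi}{4}k\cdot 1\right), \exp\left(j\frac{2\pi}{4}k\cdot 2\right), \exp\left(j\frac{2\pi}{4}k\cdot 3\right)\right]^T$ yields
$$\mathbf{h}_0 = \frac{1}{2}\begin{bmatrix}1\\1\\1\\1\end{bmatrix}, \quad \mathbf{h}_1 = \frac{1}{2}\begin{bmatrix}1\\j\\-1\\-j\end{bmatrix}, \quad \mathbf{h}_2 = \frac{1}{2}\begin{bmatrix}1\\-1\\1\\-1\end{bmatrix}, \quad \mathbf{h}_3 = \frac{1}{2}\begin{bmatrix}1\\-j\\-1\\j\end{bmatrix}$$
For $\mathbf{f} = [2, 4, 5, 3]^T$:
$$t_0 = \tfrac{1}{2}[1, 1, 1, 1]\,\mathbf{f} = 7; \quad t_1 = \tfrac{1}{2}[1, -j, -1, j]\,\mathbf{f} = \tfrac{1}{2}(-3 - j); \quad t_2 = \tfrac{1}{2}[1, -1, 1, -1]\,\mathbf{f} = 0; \quad t_3 = \tfrac{1}{2}[1, j, -1, -j]\,\mathbf{f} = \tfrac{1}{2}(-3 + j)$$
Verify:
$$t_0\mathbf{h}_0 + t_1\mathbf{h}_1 + t_2\mathbf{h}_2 + t_3\mathbf{h}_3 = \frac{1}{4}\begin{bmatrix} 14 + (-3-j) + (-3+j) \\ 14 + (1-3j) + (1+3j) \\ 14 + (3+j) + (3-j) \\ 14 + (-1+3j) + (-1-3j) \end{bmatrix} = \frac{1}{4}\begin{bmatrix}8\\16\\20\\12\end{bmatrix} = \mathbf{f}$$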
1D Discrete Cosine Transform (DCT)
Basis vectors:
$$h_k(n) = \alpha(k)\cos\left[\frac{(2n+1)k\pi}{2N}\right], \quad \text{where } \alpha(0) = \sqrt{\frac{1}{N}}, \quad \alpha(k) = \sqrt{\frac{2}{N}}, \ k = 1, \ldots, N-1$$
Forward transform:
$$T(k) = \sum_{n=0}^{N-1} f(n)\, h_k(n)$$
Inverse transform:
$$f(n) = \sum_{k=0}^{N-1} T(k)\, h_k(n)$$
Vector representation, N = 4 case:
$$\mathbf{h}_k = \alpha(k)\left[\cos\frac{k\pi}{8}, \cos\frac{3k\pi}{8}, \cos\frac{5k\pi}{8}, \cos\frac{7k\pi}{8}\right]^T$$
yields
$$\mathbf{h}_0 = \tfrac{1}{2}[1, 1, 1, 1]^T = [0.5, 0.5, 0.5, 0.5]^T$$
$$\mathbf{h}_1 = \sqrt{\tfrac{1}{2}}\left[\cos\tfrac{\pi}{8}, \cos\tfrac{3\pi}{8}, \cos\tfrac{5\pi}{8}, \cos\tfrac{7\pi}{8}\right]^T = [0.6533, 0.2706, -0.2706, -0.6533]^T$$
$$\mathbf{h}_2 = \sqrt{\tfrac{1}{2}}\left[\cos\tfrac{2\pi}{8}, \cos\tfrac{6\pi}{8}, \cos\tfrac{10\pi}{8}, \cos\tfrac{14\pi}{8}\right]^T = [0.5, -0.5, -0.5, 0.5]^T$$
$$\mathbf{h}_3 = \sqrt{\tfrac{1}{2}}\left[\cos\tfrac{3\pi}{8}, \cos\tfrac{9\pi}{8}, \cos\tfrac{15\pi}{8}, \cos\tfrac{21\pi}{8}\right]^T = [0.2706, -0.6533, 0.6533, -0.2706]^T$$
For $\mathbf{f} = [2, 4, 5, 3]^T$:
$$t_0 = \tfrac{1}{2}[1, 1, 1, 1]\,\mathbf{f} = 7; \quad t_1 = (2-3)\cdot 0.6533 + (4-5)\cdot 0.2706 = -0.9239;$$
$$t_2 = \tfrac{1}{2}[1, -1, -1, 1]\,\mathbf{f} = -2; \quad t_3 = (2-3)\cdot 0.2706 + (5-4)\cdot 0.6533 = 0.3827$$
Verify: $t_0\mathbf{h}_0 + t_1\mathbf{h}_1 + t_2\mathbf{h}_2 + t_3\mathbf{h}_3 = \mathbf{f}$
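The N-point DCT defined above can be sketched in plain Python as follows (names are illustrative). With this orthonormal scaling it should agree with MATLAB's dct. Applied to f = [2, 4, 5, 3], it reproduces the coefficients of the N = 4 example:

```python
import math
from typing import List, Sequence


def dct_basis(N: int) -> List[List[float]]:
    """h_k(n) = alpha(k) cos((2n+1) k pi / (2N)), alpha(0) = sqrt(1/N), alpha(k>0) = sqrt(2/N)."""
    basis = []
    for k in range(N):
        alpha = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        basis.append([alpha * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)])
    return basis


def dct(f: Sequence[float]) -> List[float]:
    """Forward transform: T(k) = sum_n f(n) h_k(n)."""
    H = dct_basis(len(f))
    return [sum(fn * hn for fn, hn in zip(f, hk)) for hk in H]


def idct(T: Sequence[float]) -> List[float]:
    """Inverse transform: f(n) = sum_k T(k) h_k(n)."""
    H = dct_basis(len(T))
    return [sum(T[k] * H[k][n] for k in range(len(T))) for n in range(len(T))]


T = dct([2, 4, 5, 3])
print([round(x, 4) for x in T])  # [7.0, -0.9239, -2.0, 0.3827]
```

Because the basis is real and orthonormal, the inverse uses the same basis vectors without conjugation.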
Two Dimensional Transform
• Decompose an MxN 2D matrix F = [F(m,n)] into a linear combination of some basis images, H_{k,l} = [H_{k,l}(m,n)], so that
$$\mathbf{F} = \sum_{k=0}^{M-1}\sum_{l=0}^{N-1} T(k,l)\,\mathbf{H}_{k,l}, \quad \text{or} \quad F(m,n) = \sum_{k=0}^{M-1}\sum_{l=0}^{N-1} T(k,l)\,H_{k,l}(m,n)$$
Graphical Interpretation
[Figure: a block expressed as a weighted sum of basis images with coefficients t1, t2, t3, t4]
• Inverse transform: represent a vector (e.g. a block of image samples) as the superposition of some basis vectors (block patterns).
• Forward transform: determine the coefficients associated with each basis vector.
Two Dimensional Inner Product
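• Inner product:
$$\langle \mathbf{F}_1, \mathbf{F}_2 \rangle = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} F_1^*(m,n)\, F_2(m,n)$$
• Norm of a matrix:
$$\|\mathbf{F}\|^2 = \langle \mathbf{F}, \mathbf{F} \rangle = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} |F(m,n)|^2$$
• A set of basis images {H_{k,l}, k = 0, 1, …, M−1, l = 0, 1, …, N−1} is orthonormal if
$$\langle \mathbf{H}_{k,l}, \mathbf{H}_{i,j} \rangle = \delta_{k,i}\,\delta_{l,j} = \begin{cases} 1, & \text{if } k = i \text{ and } l = j \\ 0, & \text{otherwise} \end{cases}$$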
Two Dimensional Unitary Transform
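• {H_{k,l}} is an orthonormal set of basis images
• Forward transform:
$$T(k,l) = \langle \mathbf{H}_{k,l}, \mathbf{F} \rangle = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} H_{k,l}^*(m,n)\, F(m,n)$$
• Inverse transform:
$$F(m,n) = \sum_{k=0}^{M-1}\sum_{l=0}^{N-1} T(k,l)\, H_{k,l}(m,n), \quad \text{or} \quad \mathbf{F} = \sum_{k=0}^{M-1}\sum_{l=0}^{N-1} T(k,l)\,\mathbf{H}_{k,l}$$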
Example of 2D Unitary Transform
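$$\mathbf{H}_{00} = \frac{1}{2}\begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix}, \quad \mathbf{H}_{01} = \frac{1}{2}\begin{bmatrix}1 & 1\\ -1 & -1\end{bmatrix}, \quad \mathbf{H}_{10} = \frac{1}{2}\begin{bmatrix}1 & -1\\ 1 & -1\end{bmatrix}, \quad \mathbf{H}_{11} = \frac{1}{2}\begin{bmatrix}1 & -1\\ -1 & 1\end{bmatrix}, \quad \mathbf{F} = \begin{bmatrix}1 & 2\\ 3 & 4\end{bmatrix}$$
$$\Rightarrow \quad T(0,0) = 5, \quad T(0,1) = -2, \quad T(1,0) = -1, \quad T(1,1) = 0$$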
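Below is a small sketch of the 2-D forward and inverse transforms based on the inner-product definitions, applied to the 2x2 example. The basis images are stored in a dict keyed by (k, l), and the helper names are illustrative:

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Basis = Dict[Tuple[int, int], Matrix]


def inner2d(A: Matrix, B: Matrix) -> complex:
    """<A, B> = sum_m sum_n conj(A(m, n)) * B(m, n)."""
    return sum(a.conjugate() * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def forward2d(basis: Basis, F: Matrix) -> Dict[Tuple[int, int], complex]:
    """T(k, l) = <H_{k,l}, F> for every basis image."""
    return {kl: inner2d(H, F) for kl, H in basis.items()}


def inverse2d(basis: Basis, T: Dict[Tuple[int, int], complex], M: int, N: int) -> Matrix:
    """F(m, n) = sum over (k, l) of T(k, l) * H_{k,l}(m, n)."""
    return [[sum(T[kl] * H[m][n] for kl, H in basis.items()) for n in range(N)]
            for m in range(M)]


# Basis images and block F of the 2x2 example.
basis = {
    (0, 0): [[0.5, 0.5], [0.5, 0.5]],
    (0, 1): [[0.5, 0.5], [-0.5, -0.5]],
    (1, 0): [[0.5, -0.5], [0.5, -0.5]],
    (1, 1): [[0.5, -0.5], [-0.5, 0.5]],
}
F = [[1, 2], [3, 4]]
T = forward2d(basis, F)
print(T)  # {(0, 0): 5.0, (0, 1): -2.0, (1, 0): -1.0, (1, 1): 0.0}
print(inverse2d(basis, T, 2, 2))  # [[1.0, 2.0], [3.0, 4.0]]
```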
Separable Unitary Transform
• Let h_k, k = 0, 1, …, M−1, represent orthonormal basis vectors in C^M,
• Let g_l, l = 0, 1, …, N−1, represent orthonormal basis vectors in C^N,
• Let H_{k,l} = h_k g_l^T, or H_{k,l}(m,n) = h_k(m) g_l(n).
• Then the H_{k,l} form an orthonormal basis set in C^{M×N}.
Example of Separable Unitary Transform
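• Example 1
$$\mathbf{h}_0 = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\1\end{bmatrix}, \quad \mathbf{h}_1 = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\-1\end{bmatrix}$$
$$\mathbf{H}_{00} = \mathbf{h}_0\mathbf{h}_0^T = \frac{1}{2}\begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix}, \quad \mathbf{H}_{01} = \mathbf{h}_0\mathbf{h}_1^T = \frac{1}{2}\begin{bmatrix}1 & -1\\ 1 & -1\end{bmatrix}$$
$$\mathbf{H}_{10} = \mathbf{h}_1\mathbf{h}_0^T = \frac{1}{2}\begin{bmatrix}1 & 1\\ -1 & -1\end{bmatrix}, \quad \mathbf{H}_{11} = \mathbf{h}_1\mathbf{h}_1^T = \frac{1}{2}\begin{bmatrix}1 & -1\\ -1 & 1\end{bmatrix}$$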
Example: 4x4 DFT
Recall the 1D DFT basis vectors for N = 4:
$$\mathbf{h}_0 = \frac{1}{2}\begin{bmatrix}1\\1\\1\\1\end{bmatrix}, \quad \mathbf{h}_1 = \frac{1}{2}\begin{bmatrix}1\\j\\-1\\-j\end{bmatrix}, \quad \mathbf{h}_2 = \frac{1}{2}\begin{bmatrix}1\\-1\\1\\-1\end{bmatrix}, \quad \mathbf{h}_3 = \frac{1}{2}\begin{bmatrix}1\\-j\\-1\\j\end{bmatrix}$$
Using $\mathbf{H}_{k,l} = \mathbf{h}_k\mathbf{h}_l^T$ yields the 16 basis images
$$H_{k,l}(m,n) = h_k(m)\,h_l(n) = \frac{1}{4}\, j^{\,km + ln}, \quad k, l, m, n = 0, 1, 2, 3,$$
e.g.
$$\mathbf{H}_{0,0} = \frac{1}{4}\begin{bmatrix}1&1&1&1\\1&1&1&1\\1&1&1&1\\1&1&1&1\end{bmatrix}, \quad \mathbf{H}_{0,1} = \frac{1}{4}\begin{bmatrix}1&j&-1&-j\\1&j&-1&-j\\1&j&-1&-j\\1&j&-1&-j\end{bmatrix}, \quad \mathbf{H}_{1,1} = \frac{1}{4}\begin{bmatrix}1&j&-1&-j\\j&-1&-j&1\\-1&-j&1&j\\-j&1&j&-1\end{bmatrix}$$
Example: 4x4 DFT
For a given 4x4 image block F, compute the coefficients as inner products with the basis images:
$$T(k,l) = \langle \mathbf{H}_{k,l}, \mathbf{F} \rangle = \sum_{m=0}^{3}\sum_{n=0}^{3} H_{k,l}^*(m,n)\, F(m,n)$$
e.g., using $\mathbf{H}_{0,0}$ (all entries 1/4) yields $T(0,0) = \frac{1}{4}\sum_{m}\sum_{n} F(m,n)$, and using $H_{3,2}^*(m,n) = \frac{1}{4}\, j^{\,m}(-1)^{n}$ yields $T(3,2) = \frac{1}{4}\sum_{m}\sum_{n} j^{\,m}(-1)^{n} F(m,n)$.
Example: 8x8 DCT
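[Figure: 8x8 DCT basis images; regions labeled Low-Low, Low-High, High-Low, High-High]
$$H_{k,l}(m,n) = \alpha(k)\,\alpha(l)\cos\left[\frac{(2m+1)k\pi}{2N}\right]\cos\left[\frac{(2n+1)l\pi}{2N}\right], \quad \text{where } \alpha(0) = \sqrt{\frac{1}{N}}, \quad \alpha(k) = \sqrt{\frac{2}{N}}, \ k = 1, \ldots, N-1$$
Example: D=dctmtx(8); Basis43=D(4,:)'*D(3,:); % rows of D are the 1D DCT basis vectors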
Hadamard Transform: Basis images
Example: D=hadamard(8); reindex=[1,8,4,5,2,7,3,6]; D(reindex,:)=D; Basis43=D(4,:)'*D(3,:); % rows of D, reordered by sequency, are the 1D basis vectors
From Amy Reibman
Property of Separable Transform
• When the transform is separable, we can perform the 2D transform separately:
– First, do a 1D transform of each row using basis vectors g_l.
– Second, do a 1D transform of each column of the intermediate image using basis vectors h_k.
– Proof:
$$T(k,l) = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} H_{k,l}^*(m,n)\,F(m,n) = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} h_k^*(m)\,g_l^*(n)\,F(m,n) = \sum_{m=0}^{M-1} h_k^*(m)\,U(m,l), \quad \text{where } U(m,l) = \sum_{n=0}^{N-1} F(m,n)\,g_l^*(n)$$
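The following sketch checks the row-column property numerically:
- `separable_2d` transforms the rows with g_l, then the columns with h_k.
- `direct_2d` sums over the basis images H_{k,l}(m,n) = h_k(m) g_l(n).

The 4x4 DFT basis is used in both directions, and the test block F is hypothetical:

```python
import cmath
import math
from typing import List

Matrix = List[List[complex]]


def dft_basis(N: int) -> Matrix:
    """h_k(n) = exp(j 2 pi k n / N) / sqrt(N); row k is the k-th basis vector."""
    return [[cmath.exp(2j * math.pi * k * n / N) / math.sqrt(N) for n in range(N)]
            for k in range(N)]


def transform_1d(basis: Matrix, x: List[complex]) -> List[complex]:
    """y(k) = <basis_k, x> = sum_n conj(basis_k(n)) * x(n)."""
    return [sum(b.conjugate() * v for b, v in zip(bk, x)) for bk in basis]


def separable_2d(h: Matrix, g: Matrix, F: Matrix) -> Matrix:
    """Row-column evaluation: 1D transform of rows with g_l, then of columns with h_k."""
    M = len(F)
    U = [transform_1d(g, row) for row in F]  # U(m, l)
    cols = [transform_1d(h, [U[m][l] for m in range(M)]) for l in range(len(g))]
    return [[cols[l][k] for l in range(len(g))] for k in range(len(h))]  # T(k, l)


def direct_2d(h: Matrix, g: Matrix, F: Matrix) -> Matrix:
    """Direct evaluation with basis images H_{k,l}(m, n) = h_k(m) g_l(n)."""
    M, N = len(F), len(F[0])
    return [[sum((h[k][m] * g[l][n]).conjugate() * F[m][n]
                 for m in range(M) for n in range(N))
             for l in range(len(g))] for k in range(len(h))]


# Hypothetical 4x4 block, used only to compare the two evaluation orders.
F = [[1, 2, 0, 3], [4, 1, 1, 0], [2, 2, 5, 1], [0, 3, 1, 2]]
h = g = dft_basis(4)
T_sep = separable_2d(h, g, F)
T_dir = direct_2d(h, g, F)
print(T_sep[0][0])  # block sum / 4
```

For an NxN block the row-column route costs about 2N³ multiplications, versus N⁴ for the direct sum over basis images.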
DCT on a Real Image Block
>>imblock = lena256(128:135,128:135)-128
imblock=
54 68 71 73 75 73 71 45
47 52 48 14 20 24 20 -8
20 -10 -5 -13 -14 -21 -20 -21
-13 -18 -18 -16 -23 -19 -27 -28
-24 -22 -22 -26 -24 -33 -30 -23
-29 -13 3 -24 -10 -42 -41 5
-16 26 26 -21 12 -31 -40 23
17 30 50 -5 4 12 10 5
>>dctblock =dct2(imblock)
dctblock=
31.0000 51.7034 1.1673 -24.5837 -12.0000 -25.7508 11.9640 23.2873
113.5766 6.9743 -13.9045 43.2054 -6.0959 35.5931 -13.3692 -13.0005
195.5804 10.1395 -8.6657 -2.9380 -28.9833 -7.9396 0.8750 9.5585
35.8733 -24.3038 -15.5776 -20.7924 11.6485 -19.1072 -8.5366 0.5125
40.7500 -20.5573 -13.6629 17.0615 -14.2500 22.3828 -4.8940 -11.3606
7.1918 -13.5722 -7.5971 -11.9452 18.2597 -16.2618 -1.4197 -3.5087
-1.4562 -13.3225 -0.8750 1.3248 10.3817 16.0762 4.4157 1.1041
-6.7720 -2.8384 4.1187 1.1118 10.5527 -2.7348 -3.2327 1.5799
In JPEG, 128 is subtracted from each 8-bit sample before the DCT (the “-128” above), which shifts the sample range [0, 255] to be centered around zero
Note that low-low coefficients are much larger than high-high coefficients
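As a check, the dct2 call above can be reproduced in plain Python. The sketch uses the separable form C = D X Dᵀ, where row k of D is the DCT basis vector h_k (the same layout as MATLAB's dctmtx). Function names are illustrative. Its top-left coefficients can be compared with the dctblock printed above:

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_matrix(N: int) -> Matrix:
    """Row k holds the DCT basis vector h_k(n) = alpha(k) cos((2n+1) k pi / (2N))."""
    D = []
    for k in range(N):
        alpha = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        D.append([alpha * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)])
    return D


def dct2(X: Matrix) -> Matrix:
    """Separable 2-D DCT, C = D X D^T (columns first, then rows)."""
    N = len(X)
    D = dct_matrix(N)
    DX = [[sum(D[k][m] * X[m][n] for m in range(N)) for n in range(N)] for k in range(N)]
    return [[sum(DX[k][n] * D[l][n] for n in range(N)) for l in range(N)] for k in range(N)]


def idct2(C: Matrix) -> Matrix:
    """Inverse 2-D DCT, X = D^T C D."""
    N = len(C)
    D = dct_matrix(N)
    DtC = [[sum(D[k][m] * C[k][l] for k in range(N)) for l in range(N)] for m in range(N)]
    return [[sum(DtC[m][l] * D[l][n] for l in range(N)) for n in range(N)] for m in range(N)]


# Level-shifted 8x8 block from the MATLAB session above (lena256(128:135,128:135)-128).
imblock = [
    [54, 68, 71, 73, 75, 73, 71, 45],
    [47, 52, 48, 14, 20, 24, 20, -8],
    [20, -10, -5, -13, -14, -21, -20, -21],
    [-13, -18, -18, -16, -23, -19, -27, -28],
    [-24, -22, -22, -26, -24, -33, -30, -23],
    [-29, -13, 3, -24, -10, -42, -41, 5],
    [-16, 26, 26, -21, 12, -31, -40, 23],
    [17, 30, 50, -5, 4, 12, 10, 5],
]
dctblock = dct2(imblock)
print([round(dctblock[k][0], 4) for k in range(3)])  # first column, low vertical frequencies
```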
Energy Distribution of DCT Coefficients in Typical Images
The variance of each coefficient is estimated by averaging the square of that coefficient over all blocks of an image
Zig-zag ordering
Images Approximated by Different Number of DCT Coefficients
Original
With 8/64 Coefficients
With 16/64 Coefficients
With 4/64 Coefficients
Demos
• Use the MATLAB demo (dctdemo.m) to demonstrate approximation using different numbers of DCT coefficients
Transform design
• What are desirable properties of a transform for image and video?
– Nearly decorrelating: improves the efficiency of the scalar quantizer
– High energy compaction: only a few large coefficients to send
– Easy to compute (few operations)
– Separable: compute the 1-D transform first on rows, then on columns
• What size transform should we use?
– Entire image? Small blocks?
– 2-D (on an image) or 3-D (also incorporating time)?
• From Amy Reibman
Karhunen Loève Transform (KLT)
• Optimal transform
• Requires statistics of the input source
– Known covariance function
• Coefficients are completely uncorrelated
• The best energy compaction
– Sort coefficients from largest to smallest expected squared magnitude; then the sum of the energies of the first M coefficients is as large as possible
• No computationally efficient algorithm
• We’ll derive it later
• From Amy Reibman
Other Transform Bases
• Suboptimal transforms: many available!
– Discrete Fourier Transform (DFT): complex values; its implicit periodic extension creates artificial discontinuities at the block boundaries
– Discrete Cosine Transform (DCT): nearly as good as the KLT for common image signals
– Hadamard and Haar: basis functions contain only the values +1, 0, -1 (up to a scale factor)
Distortion in Transform Coding
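• Distortion in the sample (image) domain: $D_s = \frac{1}{N} E\{\|\mathbf{f} - \hat{\mathbf{f}}\|^2\}$
• Distortion in the coefficient (transform) domain: $D_t = \frac{1}{N} E\{\|\mathbf{t} - \hat{\mathbf{t}}\|^2\}$
• With a unitary transform, the two distortions are equal: $\mathbf{f} - \hat{\mathbf{f}} = \mathbf{B}(\mathbf{t} - \hat{\mathbf{t}})$, and a unitary $\mathbf{B}$ preserves the norm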
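Here is a quick numerical check of the last bullet. The 4-point DCT, the test vector [2, 4, 5, 3] from the earlier DCT example, and the step size 1.5 are illustrative choices. The sketch quantizes the coefficients and measures the squared error in both domains, and the two numbers agree:

```python
import math
from typing import List


def dct_matrix(N: int) -> List[List[float]]:
    """Orthonormal DCT matrix; row k is the basis vector h_k."""
    return [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
             * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
            for k in range(N)]


def forward(D: List[List[float]], f: List[float]) -> List[float]:
    """t = D f."""
    return [sum(d * x for d, x in zip(row, f)) for row in D]


def inverse(D: List[List[float]], t: List[float]) -> List[float]:
    """f = D^T t (D is real and unitary, so its inverse is its transpose)."""
    return [sum(D[k][n] * t[k] for k in range(len(t))) for n in range(len(D[0]))]


f = [2.0, 4.0, 5.0, 3.0]
D = dct_matrix(len(f))
t = forward(D, f)
step = 1.5                                   # hypothetical quantizer step size
t_hat = [step * round(c / step) for c in t]  # uniform quantization of each coefficient
f_hat = inverse(D, t_hat)

err_coef = sum((a - b) ** 2 for a, b in zip(t, t_hat))    # coefficient-domain squared error
err_sample = sum((a - b) ** 2 for a, b in zip(f, f_hat))  # sample-domain squared error
print(err_coef, err_sample)  # equal up to floating-point rounding
```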
Modeling of Distortion Due to Coefficient Quantization
• High-resolution approximation of scalar quantization
– With the MMSE quantizer, when each coefficient is scalar quantized at a sufficiently high rate that the pdf within each quantization bin is approximately flat:
One coefficient: $D_k(R_k) = \varepsilon_k^2\,\sigma_k^2\,2^{-2R_k}$, where $\varepsilon_k^2$ depends on the pdf of the k-th coefficient.
Average over all coefficients: $D = \frac{1}{N}\sum_{k=0}^{N-1} D_k(R_k)$
Optimal Bit Allocation Among Coefficients
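• How many bits to use for each coefficient?
– Can be formulated as a constrained optimization problem:
Minimize: $D = \frac{1}{N}\sum_{k=0}^{N-1} \varepsilon_k^2\,\sigma_k^2\,2^{-2R_k}$
Subject to: $\frac{1}{N}\sum_{k=0}^{N-1} R_k = R$
– The constrained problem can be converted to an unconstrained one using the Lagrange multiplier method:
Minimize: $J = \sum_{k=0}^{N-1} \varepsilon_k^2\,\sigma_k^2\,2^{-2R_k} + \lambda \sum_{k=0}^{N-1} R_k$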
Derivation and Result
Multiply to obtain:
Substitute into first equation: Result: all distortions are equal!
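For reference, here is a sketch of the standard Lagrangian derivation behind these labels. It assumes the high-resolution model D_k = ε_k² σ_k² 2^(−2R_k) and the Lagrangian J = Σ_k ε_k² σ_k² 2^(−2R_k) + λ Σ_k R_k, and it ignores the constraint R_k ≥ 0. The last two lines additionally assume ε_k² = ε² for all k, as in the Gaussian case:

```latex
\begin{aligned}
\frac{\partial J}{\partial R_k} &= -2\ln 2\,\varepsilon_k^2\sigma_k^2\,2^{-2R_k} + \lambda = 0
\;\Rightarrow\; D_k = \varepsilon_k^2\sigma_k^2\,2^{-2R_k} = \frac{\lambda}{2\ln 2}\ \text{for all } k
\;\Rightarrow\; D = \frac{1}{N}\sum_k D_k = D_k \\
D^N &= \prod_{k=0}^{N-1} D_k = \Big(\prod_{k=0}^{N-1}\varepsilon_k^2\sigma_k^2\Big)\,2^{-2\sum_k R_k}
     = \Big(\prod_{k=0}^{N-1}\varepsilon_k^2\sigma_k^2\Big)\,2^{-2NR} \\
D &= \Big(\prod_{k=0}^{N-1}\varepsilon_k^2\sigma_k^2\Big)^{1/N} 2^{-2R}
   \;=\; \varepsilon^2\Big(\prod_{k=0}^{N-1}\sigma_k^2\Big)^{1/N} 2^{-2R} \quad (\varepsilon_k^2 = \varepsilon^2) \\
R_k &= R + \frac{1}{2}\log_2\frac{\sigma_k^2}{\big(\prod_{j=0}^{N-1}\sigma_j^2\big)^{1/N}} \quad (\varepsilon_k^2 = \varepsilon^2)
\end{aligned}
```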
Implication of Optimal Bit Allocation
• The bit rate for a coefficient grows with the logarithm of its variance (energy): each factor of 4 in variance adds one bit
• Distortion is equalized among all coefficients and depends on the geometric mean of the coefficient variances
• The geometric mean is never larger than the arithmetic mean (they are equal only when all values are equal)!
– Ex (1, 9): arithmetic mean = 5, geometric mean = 3
– For a fixed arithmetic mean, the more disparate the numbers, the smaller their geometric mean
Transform Coding Gain Over PCM
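• PCM: quantize each sample in the image domain directly
• Distortion for PCM if each sample is quantized to R bits: $D_{PCM} = \varepsilon^2\,\sigma_f^2\,2^{-2R}$
• Gain over PCM: $G_{TC} = \dfrac{D_{PCM}}{D_{TC}} = \dfrac{\varepsilon^2\sigma_f^2}{\left(\prod_{k}\varepsilon_k^2\sigma_k^2\right)^{1/N}}$
• For a Gaussian source: each sample is Gaussian, so the coefficients are also Gaussian and the $\varepsilon_k^2$ are all the same (equal to $\varepsilon^2$). Since a unitary transform gives $\sigma_f^2 = \frac{1}{N}\sum_k \sigma_k^2$,
$$G_{TC} = \frac{\frac{1}{N}\sum_{k=0}^{N-1}\sigma_k^2}{\left(\prod_{k=0}^{N-1}\sigma_k^2\right)^{1/N}} = \frac{\text{arithmetic mean}}{\text{geometric mean}}$$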
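The sketch below evaluates two high-resolution results for Gaussian coefficients: the optimal rates R_k = R + ½ log2(σ_k² / geometric mean) and the gain G_TC = arithmetic mean / geometric mean. Function names are illustrative. The variances (1, 9) are the example used earlier, not measured data:

```python
import math
from typing import List, Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """(prod v)^(1/N), computed in the log domain for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def optimal_bits(variances: Sequence[float], avg_rate: float) -> List[float]:
    """High-resolution optimum R_k = R + 0.5 * log2(var_k / geometric mean).

    Real-valued and possibly negative; a practical coder would clip negative
    values and round to integers.
    """
    gm = geometric_mean(variances)
    return [avg_rate + 0.5 * math.log2(v / gm) for v in variances]


def tc_gain(variances: Sequence[float]) -> float:
    """Transform coding gain over PCM (Gaussian case): arithmetic / geometric mean."""
    return (sum(variances) / len(variances)) / geometric_mean(variances)


variances = [1.0, 9.0]
print(optimal_bits(variances, 2.0))  # the high-variance coefficient gets log2(3) more bits
print(tc_gain(variances), 10 * math.log10(tc_gain(variances)), "dB")
```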
Example
• Determine the optimal bit allocation and the corresponding TC gain for coding a 2x2 image block using the 2x2 DCT, assuming the image is a Gaussian process with the inter-sample correlation shown below.
Example Continued (Convert 2x2 into 4x1)
• Correlation matrix
• DCT basis images
• Equivalent 1D transform matrix
Example Continued
(for R=2)
For practical applications, we may need to restrict the bit allocations to non-negative integers (an integer programming problem).
Optimal Transform (KLT)
• Optimal transform
– Should minimize the distortion for a given average bit rate
– Equivalent to minimizing the geometric mean of the coefficient variances
• When the source is Gaussian, the optimal transform is the Karhunen-Loève transform (KLT), which depends on the covariance matrix between samples
– The basis vectors are the eigenvectors of the covariance matrix, and the coefficient variances are the eigenvalues
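Here is a minimal KLT sketch for the simplest case: a two-sample vector with a symmetric 2x2 covariance matrix. The covariance values below (σ² = 1, ρ = 0.9) are hypothetical. For [[σ², ρσ²], [ρσ², σ²]], the closed-form eigen-decomposition gives eigenvectors (1, 1)/√2 and (−1, 1)/√2 (up to sign), i.e. the 2-point DCT, with coefficient variances σ²(1 ± ρ):

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def klt_2x2(a: float, b: float, c: float) -> Tuple[Tuple[float, float], Tuple[Vec, Vec]]:
    """Eigen-decomposition of the symmetric covariance [[a, b], [b, c]].

    Returns ((lambda1, lambda2), (v1, v2)) with lambda1 >= lambda2. The unit
    eigenvectors v1, v2 are the KLT basis vectors, and lambda1, lambda2 are the
    variances of the corresponding transform coefficients.
    """
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    theta = 0.5 * math.atan2(2 * b, a - c)  # angle of the principal axis
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return (mean + radius, mean - radius), (v1, v2)


sigma2, rho = 1.0, 0.9
(lam1, lam2), (v1, v2) = klt_2x2(sigma2, rho * sigma2, sigma2)
print(lam1, lam2)  # approximately 1.9 and 0.1
print(v1, v2)      # approximately (0.7071, 0.7071) and (-0.7071, 0.7071)
```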
Example
• Determine the KLT for the 2x2 image block in the previous example
Determine the eigenvalues by solving:
Determine the eigenvectors by solving
(same as the coefficient variances with DCT)
Resulting transform is the DCT
Properties of KLT
• Optimal transform for Gaussian sources
• Nearly optimal transform for non-Gaussian sources
• Minimal approximation error when keeping K < N coefficients, among all unitary transforms
• KLT has the highest energy compaction
• Coefficients are uncorrelated
• Requires a stationary source with a known covariance matrix; most sources vary spatially and temporally
• No fast algorithms, and the transform is signal dependent
What is JPEG
• The Joint Photographic Experts Group (JPEG) is a committee under both the International Organization for Standardization (ISO) and the International Telecommunication Union Telecommunication Standardization Sector (ITU-T)
– www.jpeg.org
• Has published several standards:
– JPEG: lossy coding of continuous-tone still images
• Based on the DCT
– JPEG-LS: lossless and near-lossless coding of continuous-tone still images
• Based on predictive coding and entropy coding
– JPEG2000: scalable coding of continuous-tone still images (from lossy to lossless)
• Based on the wavelet transform
The 1992 JPEG Standard
• Contains several modes:
– Baseline system (what is commonly known as JPEG!): lossy
• Can handle grayscale or color images (8 bits)
– Extended system: lossy
• Can handle higher-precision (12-bit) images, provide progressive streams, etc.
– Lossless version
• Baseline system
– Each color component is divided into 8x8 blocks
– For each 8x8 block, three steps are involved:
• Block DCT
• Perceptual-based quantization
• Variable length coding: run-length and Huffman coding
Quantization of DCT Coefficients
• The DC coefficient is quantized like the other coefficients; the quantized DC index is then predicted from that of the previous block, and the prediction error is entropy coded.
• A uniform quantizer is used for the DC coefficient and for each AC coefficient
• Different coefficients are quantized with different step sizes (Q):
– The human eye is more sensitive to low-frequency components
– Low-frequency coefficients use a smaller Q
– High-frequency coefficients use a larger Q
– The step sizes are specified in a normalization matrix
– The normalization matrix can then be scaled by a scale factor
• JPEG bit allocation is not designed to minimize MSE!
Default Normalization Matrix in JPEG
Actual step size for C(i,j): Q(i,j) = QP*M(i,j)
Note that the DC step size M(0,0) is applied to the DC coefficient itself; the prediction from the previous block's DC is formed on the quantized indices
Example: Quantized Indices
>>dctblock =dct2(imblock)
dctblock=
31.0000 51.7034 1.1673 -24.5837 -12.0000 -25.7508 11.9640 23.2873
113.5766 6.9743 -13.9045 43.2054 -6.0959 35.5931 -13.3692 -13.0005
195.5804 10.1395 -8.6657 -2.9380 -28.9833 -7.9396 0.8750 9.5585
35.8733 -24.3038 -15.5776 -20.7924 11.6485 -19.1072 -8.5366 0.5125
40.7500 -20.5573 -13.6629 17.0615 -14.2500 22.3828 -4.8940 -11.3606
7.1918 -13.5722 -7.5971 -11.9452 18.2597 -16.2618 -1.4197 -3.5087
-1.4562 -13.3225 -0.8750 1.3248 10.3817 16.0762 4.4157 1.1041
-6.7720 -2.8384 4.1187 1.1118 10.5527 -2.7348 -3.2327 1.5799
>>QP=1;
>>QM=Qmatrix*QP;
>>qdct=floor((dctblock+QM/2)./(QM))
qdct =
2 5 0 -2 0 -1 0 0
9 1 -1 2 0 1 0 0
14 1 -1 0 -1 0 0 0
3 -1 -1 -1 0 0 0 0
2 -1 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Only 19 coefficients are retained out of 64
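Below is a plain-Python sketch of the quantization step in the MATLAB session above. It assumes that `Qmatrix` is the default JPEG luminance table from Annex K of the standard; the session uses the name without listing its values. With QP = 1, the rule floor((C + Q/2)/Q) then reproduces the `qdct` indices shown above. Function names are illustrative:

```python
import math
from typing import List

Matrix = List[List[float]]

# Assumption: Qmatrix in the MATLAB session is the default JPEG luminance table.
JPEG_LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantize(coef: Matrix, qm: Matrix) -> List[List[int]]:
    """Index = floor((C + Q/2) / Q): C/Q rounded to the nearest integer (ties rounded up)."""
    return [[math.floor((c + q / 2) / q) for c, q in zip(rc, rq)] for rc, rq in zip(coef, qm)]


def dequantize(idx: List[List[int]], qm: Matrix) -> Matrix:
    """Reconstructed coefficient = index * step size."""
    return [[i * q for i, q in zip(ri, rq)] for ri, rq in zip(idx, qm)]


QP = 1
QM = [[q * QP for q in row] for row in JPEG_LUMA_Q]
dctblock = [  # dct2(imblock) as printed above
    [31.0000, 51.7034, 1.1673, -24.5837, -12.0000, -25.7508, 11.9640, 23.2873],
    [113.5766, 6.9743, -13.9045, 43.2054, -6.0959, 35.5931, -13.3692, -13.0005],
    [195.5804, 10.1395, -8.6657, -2.9380, -28.9833, -7.9396, 0.8750, 9.5585],
    [35.8733, -24.3038, -15.5776, -20.7924, 11.6485, -19.1072, -8.5366, 0.5125],
    [40.7500, -20.5573, -13.6629, 17.0615, -14.2500, 22.3828, -4.8940, -11.3606],
    [7.1918, -13.5722, -7.5971, -11.9452, 18.2597, -16.2618, -1.4197, -3.5087],
    [-1.4562, -13.3225, -0.8750, 1.3248, 10.3817, 16.0762, 4.4157, 1.1041],
    [-6.7720, -2.8384, 4.1187, 1.1118, 10.5527, -2.7348, -3.2327, 1.5799],
]
qdct = quantize(dctblock, QM)
iqdct = dequantize(qdct, QM)
nonzero = sum(1 for row in qdct for v in row if v != 0)
print(nonzero)  # 19
```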
Example: Quantized Coefficients
%dequantized DCT block
>> iqdct=qdct.*QM
iqdct=
32 55 0 -32 0 -40 0 0
108 12 -14 38 0 58 0 0
196 13 -16 0 -40 0 0 0
42 -17 -22 -29 0 0 0 0
36 -22 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Original DCT block
dctblock=
31.0000 51.7034 1.1673 -24.5837 -12.0000 -25.7508 11.9640 23.2873
113.5766 6.9743 -13.9045 43.2054 -6.0959 35.5931 -13.3692 -13.0005
195.5804 10.1395 -8.6657 -2.9380 -28.9833 -7.9396 0.8750 9.5585
35.8733 -24.3038 -15.5776 -20.7924 11.6485 -19.1072 -8.5366 0.5125
40.7500 -20.5573 -13.6629 17.0615 -14.2500 22.3828 -4.8940 -11.3606
7.1918 -13.5722 -7.5971 -11.9452 18.2597 -16.2618 -1.4197 -3.5087
-1.4562 -13.3225 -0.8750 1.3248 10.3817 16.0762 4.4157 1.1041
-6.7720 -2.8384 4.1187 1.1118 10.5527 -2.7348 -3.2327 1.5799
Example: Reconstructed Image
%reconstructed image block
>> qimblock=round(idct2(iqdct))
qimblock=
58 68 85 79 61 68 67 38
45 38 39 33 22 24 19 -2
21 2 -11 -12 -13 -19 -24 -27
-8 -19 -31 -26 -20 -35 -37 -15
-31 -17 -21 -20 -16 -39 -41 0
-33 3 -1 -14 -11 -37 -44 1
-16 32 18 -10 1 -16 -30 8
3 54 30 -6 16 11 -7 23
Original image block
imblock=
54 68 71 73 75 73 71 45
47 52 48 14 20 24 20 -8
20 -10 -5 -13 -14 -21 -20 -21
-13 -18 -18 -16 -23 -19 -27 -28
-24 -22 -22 -26 -24 -33 -30 -23
-29 -13 3 -24 -10 -42 -41 5
-16 26 26 -21 12 -31 -40 23
17 30 50 -5 4 12 10 5
Coding of Quantized DCT Coefficients
• DC coefficient: predictive coding
– The DC value of the current block is predicted from that of the previous block, and the error is coded using Huffman coding
• AC coefficients: run-length coding
– Many high-frequency AC coefficients are zero after the first few low-frequency coefficients
– Run-length representation:
• Order the coefficients in zig-zag order
• Specify how many zeros precede each non-zero value
• Each symbol = (length-of-zero-run, non-zero-value)
– Code all possible symbols using Huffman coding
• More frequently appearing symbols are given shorter codewords
• The encoder can use the default Huffman tables or specify its own tables
• Instead of Huffman coding, arithmetic coding can be used to achieve higher coding efficiency at added complexity.
Example: Run-length Coding
qdct =
2 5 0 -2 0 -1 0 0
9 1 -1 2 0 1 0 0
14 1 -1 0 -1 0 0 0
3 -1 -1 -1 0 0 0 0
2 -1 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Run-length symbol representation: {2,(0,5),(0,9),(0,14),(0,1),(1,-2),(0,-1),(0,1),(0,3),(0,2),(0,-1),(0,-1),(0,2),(1,-1),(2,-1),(0,-1),(4,-1),(0,-1),(0,1),EOB}
EOB: end of block, a special symbol that is assigned a short Huffman codeword
Zig-zag ordering of DCT coefficients:
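The sketch below implements the zig-zag scan and run-length symbol generation, and it reproduces the symbol list above from `qdct`. Function names are illustrative. It is simplified relative to baseline JPEG in two ways:
- It does not emit the ZRL symbol used for runs of more than 15 zeros.
- It always appends EOB, whereas JPEG omits EOB when the last coefficient is non-zero.

```python
from typing import List, Tuple, Union

Symbol = Union[int, Tuple[int, int], str]


def zigzag_order(n: int = 8) -> List[Tuple[int, int]]:
    """(row, col) positions in JPEG zig-zag scan order for an n x n block."""
    order: List[Tuple[int, int]] = []
    for s in range(2 * n - 1):  # s = row + col indexes an anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]  # top-right to bottom-left
        order.extend(reversed(diag) if s % 2 == 0 else diag)     # even s: bottom-left to top-right
    return order


def run_length(block: List[List[int]]) -> List[Symbol]:
    """[DC, (run, level), ..., 'EOB'] for a quantized block (simplified, see above)."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    symbols: List[Symbol] = [scan[0]]
    run = 0
    for v in scan[1:]:
        if v == 0:
            run += 1
        else:
            symbols.append((run, v))
            run = 0
    symbols.append("EOB")
    return symbols


qdct = [
    [2, 5, 0, -2, 0, -1, 0, 0],
    [9, 1, -1, 2, 0, 1, 0, 0],
    [14, 1, -1, 0, -1, 0, 0, 0],
    [3, -1, -1, -1, 0, 0, 0, 0],
    [2, -1, 0, 0, 0, 0, 0, 0],
    [0] * 8,
    [0] * 8,
    [0] * 8,
]
print(run_length(qdct))
```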
Coding of DC Symbols
• Example:
– Current quantized DC index: 2
– Previous block DC index: 4
– Prediction error: -2
– The prediction error is coded in two parts:
• Which category it belongs to (Table of JPEG Coefficient Coding Categories), coded using a Huffman code (JPEG Default DC Code)
– DC = -2 is in category “2”, with the codeword “100”
• Which position it occupies in that category, coded using a fixed-length code whose length equals the category number
– “-2” is number 1 (starting from 0) in category 2, with the fixed-length code “01”
– The overall codeword is “10001”
JPEG Tables for Coding DC
Coding of AC Coefficients
• Example:
– First symbol (0,5)
• The value ‘5’ is represented in two parts:
• Which category it belongs to (Table of JPEG Coefficient Coding Categories); the “(runlength, category)” pair is coded using a Huffman code (JPEG Default AC Code)
– AC = 5 is in category “3”
– Symbol (0,3) has codeword “100”
• Which position it occupies in that category, coded using a fixed-length code whose length equals the category number
– “5” is number 5 (starting from 0) in category 3, with the fixed-length code “101”
– The overall codeword for (0,5) is “100101”
– Second symbol (0,9)
• ‘9’ is in category ‘4’; (0,4) has codeword ‘1011’; ‘9’ is number 9 in category 4, with codeword ‘1001’; the overall codeword for (0,9) is ‘10111001’
– Etc.
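The category and position rules in the DC and AC examples can be written compactly. The category is the number of bits in |v|. The position code is v itself for positive v and v + 2^cat − 1 for negative v. In this sketch the Huffman prefixes are copied from the examples: "100" for DC category 2, "100" for AC symbol (0,3), and "1011" for AC symbol (0,4). The full tables are not reproduced, and the function names are illustrative:

```python
def category(v: int) -> int:
    """JPEG magnitude category: number of bits in |v| (0 for v = 0)."""
    return abs(v).bit_length()


def amplitude_bits(v: int) -> str:
    """Fixed-length code (category(v) bits) giving v's position within its category.

    Positions run from the most negative to the most positive value, so a positive v
    is its own binary code and a negative v maps to v + 2**cat - 1.
    """
    cat = category(v)
    if cat == 0:
        return ""
    index = v if v > 0 else v + (1 << cat) - 1
    return format(index, "0{}b".format(cat))


def codeword(huffman_prefix: str, v: int) -> str:
    """Huffman code of the category (or (run, category)) symbol, then the amplitude bits."""
    return huffman_prefix + amplitude_bits(v)


print(codeword("100", -2))  # DC prediction error -2 -> 10001
print(codeword("100", 5))   # AC symbol (0,5)        -> 100101
print(codeword("1011", 9))  # AC symbol (0,9)        -> 10111001
```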
JPEG Tables for Coding AC (Run,Category) Symbols
JPEG Performance for B/W images
65536 Bytes 8 bpp
4839 Bytes 0.59 bpp CR=13.6
3037 Bytes 0.37 bpp CR=21.6
1818 Bytes 0.22 bpp CR=36.4
JPEG for Color Images
• Color images are typically stored in (R,G,B) format
• The JPEG standard can be applied to each component separately
– Does not make use of the correlation between color components
– Does not make use of the lower sensitivity of the human eye to chrominance samples
• Alternate approach
– Convert the (R,G,B) representation to a YCbCr representation
• Y: luminance; Cb, Cr: chrominance
– Down-sample the two chrominance components
• Because the peak response of the eye to the luminance component occurs at a higher frequency (3-10 cpd) than to the chrominance components (0.1-0.5 cpd). (Note: cpd is cycles/degree)
• The JPEG standard is designed to handle an image consisting of many (up to 100) components
RGB <-> YCbCr Conversion
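Note: Cb ~ B-Y and Cr ~ R-Y are known as color difference signals.
$$\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.169 & -0.331 & 0.500 \\ 0.500 & -0.419 & -0.081 \end{bmatrix}\begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 0 \\ 128 \\ 128 \end{bmatrix}$$
$$\begin{bmatrix} R \\ G \\ B \end{bmatrix} = \begin{bmatrix} 1.000 & -0.001 & 1.402 \\ 1.000 & -0.344 & -0.714 \\ 1.000 & 1.772 & 0.001 \end{bmatrix}\begin{bmatrix} Y \\ C_b - 128 \\ C_r - 128 \end{bmatrix}$$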
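Here is a direct Python transcription of the RGB/YCbCr conversion on this slide, as a sketch (names are illustrative). It works on floats and does not round or clip to [0, 255], which a real codec must do. Since the inverse coefficients are given to three decimals, the round trip is accurate only to a fraction of a gray level:

```python
def rgb_to_ycbcr(r: float, g: float, b: float):
    """Forward conversion with the 128 offset on the two chrominance components."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b + 128
    cr = 0.500 * r - 0.419 * g - 0.081 * b + 128
    return y, cb, cr


def ycbcr_to_rgb(y: float, cb: float, cr: float):
    """Inverse conversion using the three-decimal coefficients from the slide."""
    r = 1.000 * y - 0.001 * (cb - 128) + 1.402 * (cr - 128)
    g = 1.000 * y - 0.344 * (cb - 128) - 0.714 * (cr - 128)
    b = 1.000 * y + 1.772 * (cb - 128) + 0.001 * (cr - 128)
    return r, g, b


print(rgb_to_ycbcr(255, 255, 255))               # white: Y = 255, Cb = Cr = 128 (up to rounding)
print(ycbcr_to_rgb(*rgb_to_ycbcr(255, 0, 0)))    # close to (255, 0, 0)
```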
Chrominance Subsampling
4:2:0: for every 2x2 Y pixels, 1 Cb & 1 Cr pixel (subsampling by 2:1 both horizontally and vertically)
4:2:2: for every 2x2 Y pixels, 2 Cb & 2 Cr pixels (subsampling by 2:1 horizontally only)
4:4:4: for every 2x2 Y pixels, 4 Cb & 4 Cr pixels (no subsampling)
[Figure legend: Y pixel; Cb and Cr pixel]
4:1:1: for every 4x1 Y pixels, 1 Cb & 1 Cr pixel (subsampling by 4:1 horizontally only)
4:2:0 is the most common format
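Below is a sketch of 4:2:0 subsampling and its reverse. The 2x2 averaging filter and the nearest-neighbor replication are assumptions chosen for simplicity: the sampling format fixes how many chroma samples are kept, not which filter is used. With 4:2:0, an image carries 1.5 samples per pixel instead of 3:

```python
from typing import List

Matrix = List[List[float]]


def subsample_420(chroma: Matrix) -> Matrix:
    """Average each 2x2 neighborhood (assumes even height and width)."""
    return [[(chroma[i][j] + chroma[i][j + 1] + chroma[i + 1][j] + chroma[i + 1][j + 1]) / 4
             for j in range(0, len(chroma[0]), 2)]
            for i in range(0, len(chroma), 2)]


def upsample_420(sub: Matrix) -> Matrix:
    """Nearest-neighbor replication back to full resolution."""
    return [[sub[i // 2][j // 2] for j in range(2 * len(sub[0]))] for i in range(2 * len(sub))]


cb = [[1, 3, 5, 7], [1, 3, 5, 7], [2, 2, 8, 8], [2, 2, 8, 8]]  # hypothetical 4x4 Cb plane
small = subsample_420(cb)
print(small)  # [[2.0, 6.0], [2.0, 8.0]]
print(upsample_420(small))
```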
Coding Unit in JPEG
[Figure: 4 8x8 Y blocks, 1 8x8 Cb block, 1 8x8 Cr block]
Each basic coding unit (called a data unit) is an 8x8 block in any color component. In the interleaved mode, for a 4:2:0 image, 4 Y blocks, 1 Cb block and 1 Cr block are processed as a group (called a minimum coded unit, or MCU).
Default Quantization Table
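For luminance:
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
For chrominance:
17 18 24 47 99 99 99 99
18 21 26 66 99 99 99 99
24 26 56 99 99 99 99 99
47 66 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99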
The encoder can specify the quantization tables different from the default ones as part of the header information
Performance of JPEG
• For color images at 24 bits/pixel (bpp):
– 0.25-0.5 bpp: moderate to good
– 0.5-0.75 bpp: good to very good
– 0.75-1.5 bpp: excellent, sufficient for most applications
– 1.5-2 bpp: indistinguishable from the original
– From: G. K. Wallace, “The JPEG Still Picture Compression Standard,” Communications of the ACM, April 1991.
• For grayscale images at 8 bpp:
– 0.5 bpp: excellent quality
JPEG Performance
487x414 pixels, Uncompressed, 600471 Bytes,24 bpp 85502 Bytes, 3.39 bpp, CR=7
487x414 pixels 41174 Bytes, 1.63 bpp, CR=14.7
Reading
• Reading assignment:
– [Wang2002] Sec. 9.1
• Optional reading:
– R. Gonzalez, “Digital Image Processing,” Sections 8.5-8.6
– G. K. Wallace, “The JPEG Still Picture Compression Standard,” Communications of the ACM, April 1991.
– [Wang2002] Sec. 9.2 (Predictive coding)
Written Homework (1)
• [Wang2002] Probs. 9.4, 9.5
• Additional problems on the following page
Written Homework (2)
1. For the 2x2 image S given below, compute its 2D DCT and reconstruct it by retaining different numbers of coefficients, to evaluate the effect of different basis images.
a) Determine the four DCT basis images.
b) Determine the 2D-DCT coefficients for S, T(k,l), k = 0, 1; l = 0, 1.
c) Show that the image reconstructed from the original DCT coefficients equals the original image.
d) Modify the DCT coefficients using the given window masks (W1 to W5) and reconstruct the image using the modified DCT coefficients. (For a given mask, “1” means to retain that coefficient, and “0” means to set the corresponding coefficient to zero.) What effect do you see with each mask, and why?
2. For the same image S as given in Prob. 1, quantize its DCT coefficients using the quantization matrix Q. Determine the quantized coefficient indices and quantized values. Also, determine the image reconstructed from the quantized coefficients.
$$\mathbf{S} = \begin{bmatrix}9 & 1\\ 1 & 9\end{bmatrix}, \quad \mathbf{W}_1 = \begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix}, \quad \mathbf{W}_2 = \begin{bmatrix}0 & 1\\ 0 & 0\end{bmatrix}, \quad \mathbf{W}_3 = \begin{bmatrix}0 & 0\\ 1 & 0\end{bmatrix}, \quad \mathbf{W}_4 = \begin{bmatrix}0 & 0\\ 0 & 1\end{bmatrix}, \quad \mathbf{W}_5 = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}$$
$$\mathbf{Q} = \begin{bmatrix}3 & 3\\ 3 & 5\end{bmatrix}$$
Written Homework (3)
3. Describe briefly how JPEG compresses an image. You may want to break down your discussion into three parts:
a) How does JPEG compress an 8x8 image block (three steps are involved)?
b) How does JPEG compress a grayscale image?
c) How does JPEG compress an RGB color image?
4. Suppose the DCT coefficient matrix for a 4x4 image block is as shown below (dctblock).
a) Quantize its DCT coefficients using the quantization matrix Q given below, assuming QP=1. Determine the quantized coefficient indices and quantized values.
b) Represent the quantized indices using the run-length representation. That is, generate a series of symbols, the first being the quantized DC index, the following symbols each consisting of a length of zeros and the following non-zero index, the last symbol is EOB (end of block).
c) Encode the DC index and the runlength symbols from (b) using the JPEG coding method, with the coding tables given in the lecture note. For this problem, assuming the quantized index for the DC coefficient of the previous block is 60.
$$\text{dctblock} = 10^3 \times \begin{bmatrix} 1.3676 & -0.0500 & -0.0466 & 0.0912 \\ 0.0134 & -0.0033 & 0.0877 & 0.0071 \\ -0.0086 & -0.0207 & 0.0036 & 0.0019 \\ -0.0046 & 0.0086 & 0.0044 & 0.0085 \end{bmatrix}, \qquad \mathbf{Q} = \begin{bmatrix} 16 & 10 & 24 & 51 \\ 14 & 16 & 40 & 69 \\ 18 & 37 & 68 & 103 \\ 49 & 78 & 103 & 120 \end{bmatrix}$$
Computer Assignment
Write a program to examine the effect of quantizing the DCT coefficients. For each 8x8 block in the image, your program should first calculate the DCT of the block, quantize the coefficients, and then take the inverse DCT. For quantization, use a scaled version of the JPEG quantization matrix as the step sizes. You only need to do this for a grayscale image or the luminance component of a color image. Furthermore, the program should compute the PSNR of the image reconstructed from the quantized DCT coefficients, count the total number of non-zero coefficients after quantization, and derive the average number of non-zero coefficients per block. Your program should show the original and reconstructed images. Examine the resulting image quality with the following values for the scaling factor: 0.5, 1, 2, 4, 8. What is the largest value of the scaling factor at which the reconstructed image quality is very close to that of the original image? Please also plot PSNR vs. the quantization scaling factor, the number of non-zero coefficients per block vs. the quantization scaling factor, and finally PSNR vs. the number of non-zero coefficients. Interpret these plots, commenting on the effect of the quantization scaling factor on the image quality and bit rate. Note that you may roughly consider the number of non-zero coefficients to be proportional to the number of bits required to represent the quantized image. Hint: you may want to use the blockproc function in MATLAB to speed up your program. You can use the dct2() and idct2() functions in MATLAB.