Image Compression Fundamentals

CS804B, M3_1, Lecture Notes
Page 1: Image Compression Fundamentals

Resmi N.G.

Reference: Digital Image Processing, 2nd Edition, Rafael C. Gonzalez and Richard E. Woods

Page 2: Image Compression Fundamentals

Overview

Introduction

Fundamentals
  Coding Redundancy
  Interpixel Redundancy
  Psychovisual Redundancy

Fidelity Criteria

Image Compression Models
  Source Encoder and Decoder
  Channel Encoder and Decoder

Elements of Information Theory
  Measuring Information
  The Information Channel

Fundamental Coding Theorems
  Noiseless Coding Theorem
  Noisy Coding Theorem
  Source Coding Theorem

Page 3: Image Compression Fundamentals

Error-Free Compression

Variable-Length Coding

Huffman Coding

Other Near-Optimal Variable-Length Codes

Arithmetic Coding

LZW Coding

Bit-Plane Coding

Bit-Plane Decomposition

Constant Area Coding

One-Dimensional Run-Length Coding

Two-Dimensional Run-Length Coding

Lossless Predictive Coding

Lossy Compression

Lossy Predictive Coding


Page 4: Image Compression Fundamentals

Transform Coding

Transform Selection

Subimage Size Selection

Bit Allocation

Zonal Coding Implementation

Threshold Coding Implementation

Wavelet Coding

Wavelet Selection

Decomposition Level Selection

Quantizer Design

Image Compression Standards

Binary Image Compression Standards

One-Dimensional Compression

Two-Dimensional Compression

Page 5: Image Compression Fundamentals

Continuous Tone Still Image Compression Standards

JPEG

Lossy Baseline Coding System

Extended Coding System

Lossless Independent Coding System

JPEG 2000

Video Compression Standards


Page 6: Image Compression Fundamentals

Introduction

Need for compression

Huge amounts of digital data are difficult to store and transmit.

Solution

Reduce the amount of data required to represent a digital image:

Remove redundant data.

Transform the data prior to storage and transmission.

Categories

Information preserving (lossless)

Lossy compression

Page 7: Image Compression Fundamentals

Fundamentals

Data compression

Difference between data and information

Data Redundancy

If n1 and n2 denote the number of information-carrying units in two datasets that represent the same information, the relative data redundancy R_D of the first dataset is defined as

$$R_D = 1 - \frac{1}{C_R},$$

where $C_R = \frac{n_1}{n_2}$ is called the compression ratio.

Page 8: Image Compression Fundamentals

Case 1: n2 = n1. Then C_R = 1 and R_D = 0: the first dataset contains no redundant data.

Case 2: n2 << n1. Then C_R → ∞ and R_D → 1: highly redundant data, significant compression.

Case 3: n2 >> n1. Then C_R → 0 and R_D → −∞: the second dataset contains more data than the original.
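As a quick sketch (not from the slides), both quantities follow directly from the definitions; the byte counts below are made-up values:

```python
def compression_stats(n1, n2):
    """Return (compression ratio C_R, relative redundancy R_D)."""
    cr = n1 / n2       # C_R = n1 / n2
    rd = 1 - 1 / cr    # R_D = 1 - 1/C_R
    return cr, rd

# Hypothetical example: a 256x256 8-bit image (65536 bytes)
# compressed to 8192 bytes.
cr, rd = compression_stats(256 * 256, 8192)
print(f"C_R = {cr:.0f}:1, R_D = {rd:.3f}")  # C_R = 8:1, R_D = 0.875
```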


Page 10: Image Compression Fundamentals

Coding Redundancy

Let a discrete random variable rk in [0, 1] represent the gray levels of an image, and let pr(rk) denote the probability of occurrence of rk:

$$p_r(r_k) = \frac{n_k}{n}, \qquad k = 0, 1, 2, \ldots, L-1.$$

If the number of bits used to represent each value of rk is l(rk), then the average number of bits required to represent each pixel is

$$L_{avg} = \sum_{k=0}^{L-1} l(r_k)\, p_r(r_k).$$

Page 11: Image Compression Fundamentals

Hence, the total number of bits required to code an M×N image is MN·Lavg. For an image represented with a natural m-bit binary code, Lavg = m.

Page 12: Image Compression Fundamentals

How is data compression achieved? Variable-length coding: assign fewer bits to the more probable gray levels than to the less probable ones.

Exercise: find Lavg, the compression ratio, and the redundancy (a worked sketch follows below).
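A small sketch of that exercise in Python. The probabilities and variable-length code lengths loosely follow the reference textbook's 8-level example; treat them as illustrative values:

```python
# Gray-level probabilities p_r(r_k) and code lengths l(r_k) for an
# 8-level image: a fixed 3-bit code vs. a variable-length code.
probs       = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]
var_lengths = [2, 2, 2, 3, 4, 5, 6, 6]

L_avg = sum(l * p for l, p in zip(var_lengths, probs))  # = 2.7 bits
C_R   = 3 / L_avg        # against the fixed 3-bit representation
R_D   = 1 - 1 / C_R
print(f"L_avg = {L_avg:.2f} bits, C_R = {C_R:.2f}, R_D = {R_D:.3f}")
```

With these numbers Lavg = 2.7 bits, C_R ≈ 1.11, and R_D ≈ 0.099, i.e. about 10% of the fixed-length data is redundant.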



Page 15: Image Compression Fundamentals

Interpixel Redundancy

Related to the interpixel correlation within an image: the value of a pixel can be reasonably predicted from the values of its neighbours.

The gray levels of neighboring pixels are roughly the same, so knowing the gray level of one pixel in a neighborhood conveys a lot of information about the gray levels of the others. The information carried by individual pixels is therefore relatively small.

These dependencies between the values of pixels in an image are called interpixel redundancy.

Page 16: Image Compression Fundamentals

Autocorrelation



Page 19: Image Compression Fundamentals

The autocorrelation coefficients along a single line of the image are computed as

$$\gamma(\Delta n) = \frac{A(\Delta n)}{A(0)},$$

where

$$A(\Delta n) = \frac{1}{N - \Delta n} \sum_{y=0}^{N-1-\Delta n} f(x, y)\, f(x, y + \Delta n).$$

For the entire image, the coefficients can be computed line by line and averaged.
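A minimal sketch of the line autocorrelation in Python (NumPy assumed; the test signal is made up):

```python
import numpy as np

def line_autocorrelation(line, dn):
    """gamma(dn) = A(dn) / A(0) along one image line."""
    N = len(line)
    def A(d):
        y = np.arange(N - d)
        # (1 / (N - d)) * sum over y of f(y) * f(y + d)
        return np.mean(line[y] * line[y + d])
    return A(dn) / A(0)

# Hypothetical smooth line: neighboring pixels are highly correlated.
line = np.sin(np.linspace(0, np.pi, 256)) * 127 + 128
print(line_autocorrelation(line, 1))   # close to 1
print(line_autocorrelation(line, 64))  # noticeably smaller
```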

Page 20: Image Compression Fundamentals

To reduce interpixel redundancy, the image is transformed into a more efficient format. For example, the differences between adjacent pixels can be used to represent the image.

Transformations that remove interpixel redundancies are termed mappings. If the original image can be reconstructed from the transformed dataset, the mappings are called reversible mappings.



Page 23: Image Compression Fundamentals

Psychovisual Redundancy

Based on human perception; associated with real or quantifiable visual information.

Eliminating psychovisual redundancy results in a loss of quantitative information, a process referred to as quantization: the mapping of a broad range of input values to a limited number of output values. It results in lossy data compression.


Page 25: Image Compression Fundamentals

Fidelity Criteria

Objective fidelity criteria

Used when the level of information loss can be expressed as a function of the original (input) image and the compressed and subsequently decompressed (output) image.

Example: the root-mean-square error between the input image f and the output image f̂. The error at position (x, y) is

$$e(x, y) = \hat{f}(x, y) - f(x, y),$$

and the RMS error over the whole image is

$$e_{rms} = \left[ \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ \hat{f}(x, y) - f(x, y) \right]^2 \right]^{1/2}.$$

Page 26: Image Compression Fundamentals

Mean-square signal-to-noise ratio:

$$SNR_{ms} = \frac{\displaystyle\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \hat{f}(x, y)^2}{\displaystyle\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ \hat{f}(x, y) - f(x, y) \right]^2}$$
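Both objective criteria are one-liners in practice. A sketch with a made-up test image and a noisy "reconstruction" (NumPy assumed):

```python
import numpy as np

def fidelity_metrics(f, f_hat):
    """Return (e_rms, SNR_ms) between input f and output f_hat."""
    err = f_hat.astype(float) - f.astype(float)
    e_rms  = np.sqrt(np.mean(err ** 2))
    snr_ms = np.sum(f_hat.astype(float) ** 2) / np.sum(err ** 2)
    return e_rms, snr_ms

rng   = np.random.default_rng(0)
f     = rng.integers(0, 256, size=(64, 64))              # original image
f_hat = np.clip(f + rng.normal(0, 2, f.shape), 0, 255)   # lossy output
print(fidelity_metrics(f, f_hat))
```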

Page 27: Image Compression Fundamentals

Subjective fidelity criteria

Measure image quality by the subjective evaluations of human observers.


Page 29: Image Compression Fundamentals

Image Compression Models


Page 30: Image Compression Fundamentals

Encoder = source encoder + channel encoder

Source encoder: removes coding, interpixel, and psychovisual redundancies in the input image and outputs a set of symbols.

Channel encoder: increases the noise immunity of the source encoder's output.

Decoder = channel decoder + source decoder

Page 31: Image Compression Fundamentals

Source Encoder

Mapper

Transforms the input data into a format designed to reduce interpixel redundancies in the input image.

Generally a reversible process; may or may not directly reduce the amount of data required to represent the image.

Examples: run-length coding (directly results in data compression); transform coding.

Page 32: Image Compression Fundamentals

Quantizer

Reduces the accuracy of the mapper's output in accordance with some pre-established fidelity criterion, reducing the psychovisual redundancies of the input image.

An irreversible process (irreversible information loss); it must be omitted when error-free compression is desired.

Page 33: Image Compression Fundamentals

Symbol encoder

Creates a fixed- or variable-length code to represent the quantizer output and maps the output in accordance with the code. Usually a variable-length code is used, assigning the shortest codewords to the most frequently occurring output values.

Reduces coding redundancy. A reversible process.

Page 34: Image Compression Fundamentals

Source decoder = symbol decoder + inverse mapper

The inverse operations are performed in the reverse order.

Page 35: Image Compression Fundamentals

Channel Encoder and Decoder

Essential when the channel is noisy or error-prone: source-encoded data are highly sensitive to channel noise. The channel encoder reduces the impact of channel noise by inserting a controlled form of redundancy into the source-encoded data.

Example: a Hamming code appends enough bits to the data being encoded to ensure that any two valid codewords differ by a minimum number of bits.

Page 36: Image Compression Fundamentals

7-bit Hamming (7, 4) code

7-bit codewords = 4-bit word + 3 bits of redundancy.

The distance between two valid codewords (the minimum number of bit changes required to change one codeword into another) is 3, so all single-bit errors can be detected and corrected.

The Hamming distance between two codewords is the number of places in which they differ.

The minimum distance of a code is the minimum number of bit changes between any two codewords.

The Hamming weight of a codeword is the number of non-zero elements (1s) in it.

Page 37: Image Compression Fundamentals

Binary data        Hamming codeword
b3 b2 b1 b0        h1 h2 h3 h4 h5 h6 h7

0000               0000000
0001               1101001
0010               0101010
0011               1000011
0100               1001100
0101               0100101
0110               1100110
0111               0001111
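As a sketch (not from the slides), the table can be reproduced programmatically. The even-parity equations below are inferred from the listed codewords, with h1, h2, h4 as check bits and h3, h5, h6, h7 carrying the data:

```python
def hamming74_encode(b3, b2, b1, b0):
    """Encode 4 data bits as the 7-bit codeword h1..h7."""
    h3, h5, h6, h7 = b3, b2, b1, b0   # data bits
    h1 = b3 ^ b2 ^ b0                 # even parity over positions 1,3,5,7
    h2 = b3 ^ b1 ^ b0                 # even parity over positions 2,3,6,7
    h4 = b2 ^ b1 ^ b0                 # even parity over positions 4,5,6,7
    return (h1, h2, h3, h4, h5, h6, h7)

for n in range(8):                    # reproduce the table rows
    bits = [(n >> i) & 1 for i in (3, 2, 1, 0)]
    codeword = hamming74_encode(*bits)
    print("".join(map(str, bits)), "".join(map(str, codeword)))
```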


Page 39: Image Compression Fundamentals

Basics of Probability

Ref: http://en.wikipedia.org/wiki/Probability

Page 42: Image Compression Fundamentals

Elements of Information Theory

Measuring Information

A random event E occurring with probability P(E) is said to contain

$$I(E) = \log \frac{1}{P(E)} = -\log P(E)$$

units of information. I(E) is called the self-information of E: the amount of self-information of an event is inversely related to its probability.

Page 43: Image Compression Fundamentals

If P(E) = 1, then I(E) = 0: there is no uncertainty associated with the event, and no information is conveyed because it is certain that the event will occur.

If a base-m logarithm is used, the measurement is in m-ary units. If the base is 2, the measurement is in binary units, and the unit of information is called a bit.

If P(E) = 1/2, then I(E) = −log2(1/2) = 1 bit. That is, 1 bit of information is conveyed when one of two equally likely outcomes occurs.
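A tiny sketch of the definition (base 2 gives bits):

```python
import math

def self_information(p, base=2):
    """I(E) = -log(P(E)); in bits when base is 2."""
    return -math.log(p, base)

print(self_information(1.0))    # 0.0 -> certain event, no information
print(self_information(0.5))    # 1.0 -> one bit
print(self_information(0.25))   # 2.0 -> rarer events carry more bits
```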


Page 45: Image Compression Fundamentals

The Information Channel

The information channel is the physical medium that connects the information source to the user of the information. Self-information is transferred between an information source and a user of the information through the information channel.

Information source: generates a random sequence of symbols from a finite or countably infinite set of possible symbols. The output of the source is a discrete random variable.

Page 46: Image Compression Fundamentals

The set of source symbols or letters {a1, a2, ..., aJ} is referred to as the source alphabet A. The probability of the event that the source will produce symbol aj is P(aj), with

$$\sum_{j=1}^{J} P(a_j) = 1.$$

The J×1 vector

$$\mathbf{z} = [P(a_1), P(a_2), \ldots, P(a_J)]^T$$

is used to represent the set of all source symbol probabilities. The finite ensemble (A, z) describes the information source completely.

Page 47: Image Compression Fundamentals

The probability that the discrete source will emit symbol aj is P(aj), so the self-information generated by the production of a single source symbol is

$$I(a_j) = -\log P(a_j).$$

If k source symbols are generated, symbol aj will be output about kP(aj) times on average, so the average self-information obtained from k outputs is

$$-kP(a_1)\log P(a_1) - kP(a_2)\log P(a_2) - \cdots - kP(a_J)\log P(a_J) = -k \sum_{j=1}^{J} P(a_j) \log P(a_j).$$

Page 48: Image Compression Fundamentals

The average information per source output, denoted H(z), is

$$H(\mathbf{z}) = E[I(\mathbf{z})] = \sum_{j=1}^{J} P(a_j) I(a_j) = \sum_{j=1}^{J} P(a_j) \log \frac{1}{P(a_j)} = -\sum_{j=1}^{J} P(a_j) \log P(a_j).$$

This is called the uncertainty or entropy of the source: the average amount of information (in m-ary units per symbol) obtained by observing a single source output.

If the source symbols are equally probable, the entropy is maximized and the source provides the maximum possible average information per source symbol.
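A direct sketch of the entropy formula:

```python
import math

def entropy(probs, base=2):
    """H(z) = -sum_j P(a_j) log P(a_j); zero-probability terms contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits: equiprobable maximum
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits
print(entropy([1.0, 0.0, 0.0, 0.0]))       # 0.0 bits: no uncertainty
```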


Page 50: Image Compression Fundamentals

A simple information system

The output of the channel is also a discrete random variable, which takes on values from a finite or countably infinite set of symbols {b1, b2, ..., bK} called the channel alphabet B. The finite ensemble (B, v), where

$$\mathbf{v} = [P(b_1), P(b_2), \ldots, P(b_K)]^T,$$

describes the channel output completely and thus the information received by the user.

Page 51: Image Compression Fundamentals

The probability P(bk) of a given channel output and the probability distribution of the source z are related as

$$P(b_k) = \sum_{j=1}^{J} P(b_k | a_j) P(a_j),$$

where P(bk | aj) is the conditional probability that output symbol bk is received given that source symbol aj was generated.

Page 52: Image Compression Fundamentals

Forward Channel Transition Matrix or Channel Matrix

$$Q = \begin{bmatrix} P(b_1|a_1) & P(b_1|a_2) & \cdots & P(b_1|a_J) \\ P(b_2|a_1) & P(b_2|a_2) & \cdots & P(b_2|a_J) \\ \vdots & \vdots & & \vdots \\ P(b_K|a_1) & P(b_K|a_2) & \cdots & P(b_K|a_J) \end{bmatrix}$$

Matrix element: $q_{kj} = P(b_k | a_j)$. The probability distribution of the output alphabet can be computed from

$$\mathbf{v} = Q\mathbf{z}.$$
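A numeric sketch of v = Qz with a made-up two-symbol channel:

```python
import numpy as np

# q_kj = P(b_k | a_j): rows index outputs b_k, columns index inputs a_j.
# Each column must sum to 1.
Q = np.array([[0.9, 0.2],
              [0.1, 0.8]])
z = np.array([0.6, 0.4])   # source distribution P(a_j)
v = Q @ z                  # P(b_k) = sum_j P(b_k | a_j) P(a_j)
print(v)                   # [0.62 0.38]
```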

Page 53: Image Compression Fundamentals

Conditional entropy function

Entropy (for reference):

$$H(\mathbf{z}) = E[I(\mathbf{z})] = \sum_{j=1}^{J} P(a_j) I(a_j) = -\sum_{j=1}^{J} P(a_j) \log P(a_j)$$

Conditional entropy function:

$$H(\mathbf{z} | b_k) = E[I(\mathbf{z} | b_k)] = \sum_{j=1}^{J} P(a_j | b_k) I(a_j | b_k) = -\sum_{j=1}^{J} P(a_j | b_k) \log P(a_j | b_k),$$

where P(aj | bk) is the probability that symbol aj was transmitted by the source given that the user receives bk.

Page 54: Image Compression Fundamentals

The expected (average) value over all bk is

$$H(\mathbf{z} | \mathbf{v}) = \sum_{k=1}^{K} H(\mathbf{z} | b_k) P(b_k) = -\sum_{k=1}^{K} \sum_{j=1}^{J} P(a_j | b_k) P(b_k) \log P(a_j | b_k).$$

Using the conditional probability $P(a_j | b_k) = \dfrac{P(a_j, b_k)}{P(b_k)}$, this becomes

$$H(\mathbf{z} | \mathbf{v}) = -\sum_{k=1}^{K} \sum_{j=1}^{J} P(a_j, b_k) \log P(a_j | b_k).$$

Page 55: Image Compression Fundamentals

P(aj, bk) is the joint probability of aj and bk, that is, the probability that aj is transmitted and bk is received.

Mutual information

H(z) is the average information per source symbol, assuming no knowledge of the output symbol. H(z|v) is the average information per source symbol, assuming observation of the output symbol. The difference between H(z) and H(z|v) is the average information received upon observing the output symbol, and is called the mutual information of z and v:

$$I(\mathbf{z}, \mathbf{v}) = H(\mathbf{z}) - H(\mathbf{z} | \mathbf{v})$$

Page 56: Image Compression Fundamentals

Substituting the expressions for H(z) and H(z|v),

$$I(\mathbf{z}, \mathbf{v}) = -\sum_{j=1}^{J} P(a_j) \log P(a_j) + \sum_{j=1}^{J}\sum_{k=1}^{K} P(a_j, b_k) \log P(a_j | b_k),$$

where, in the first term, $P(a_j) = P(a_j, b_1) + P(a_j, b_2) + \cdots + P(a_j, b_K) = \sum_{k=1}^{K} P(a_j, b_k).$

Page 57: Image Compression Fundamentals

$$I(\mathbf{z}, \mathbf{v}) = -\sum_{j=1}^{J}\sum_{k=1}^{K} P(a_j, b_k) \log P(a_j) + \sum_{j=1}^{J}\sum_{k=1}^{K} P(a_j, b_k) \log P(a_j | b_k)$$

$$= \sum_{j=1}^{J}\sum_{k=1}^{K} P(a_j, b_k) \log \frac{P(a_j | b_k)}{P(a_j)} = \sum_{j=1}^{J}\sum_{k=1}^{K} P(a_j, b_k) \log \frac{P(a_j, b_k)}{P(a_j) P(b_k)}$$

Page 58: Image Compression Fundamentals

Using $P(a_j, b_k) = P(a_j | b_k) P(b_k) = P(b_k | a_j) P(a_j)$,

$$I(\mathbf{z}, \mathbf{v}) = \sum_{j=1}^{J}\sum_{k=1}^{K} P(b_k | a_j) P(a_j) \log \frac{P(b_k | a_j) P(a_j)}{P(a_j) P(b_k)} = \sum_{j=1}^{J}\sum_{k=1}^{K} q_{kj} P(a_j) \log \frac{q_{kj}}{P(b_k)}.$$

Page 59: Image Compression Fundamentals

Finally, substituting $P(b_k) = \sum_{i=1}^{J} P(b_k | a_i) P(a_i)$,

$$I(\mathbf{z}, \mathbf{v}) = \sum_{j=1}^{J}\sum_{k=1}^{K} q_{kj} P(a_j) \log \frac{q_{kj}}{\sum_{i=1}^{J} q_{ki} P(a_i)}.$$
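The last form depends only on Q and z, so it translates directly into code. A sketch (NumPy assumed, log base 2):

```python
import numpy as np

def mutual_information(Q, z):
    """I(z, v) = sum_jk q_kj P(a_j) log2(q_kj / P(b_k))."""
    Q, z = np.asarray(Q, float), np.asarray(z, float)
    v = Q @ z                                   # P(b_k)
    total = 0.0
    for k in range(Q.shape[0]):
        for j in range(Q.shape[1]):
            if Q[k, j] > 0:                     # 0 log 0 = 0 convention
                total += Q[k, j] * z[j] * np.log2(Q[k, j] / v[k])
    return total

Q = np.array([[0.9, 0.2], [0.1, 0.8]])          # hypothetical channel
print(mutual_information(Q, [0.6, 0.4]))        # bits per symbol
```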

Page 60: Image Compression Fundamentals

The minimum possible value of I(z, v) is zero, which occurs when the input and output symbols are statistically independent, that is, when P(aj, bk) = P(aj)P(bk). In that case,

$$I(\mathbf{z}, \mathbf{v}) = \sum_{j=1}^{J}\sum_{k=1}^{K} P(a_j, b_k) \log \frac{P(a_j, b_k)}{P(a_j) P(b_k)} = \sum_{j=1}^{J}\sum_{k=1}^{K} P(a_j, b_k) \log \frac{P(a_j) P(b_k)}{P(a_j) P(b_k)} = \sum_{j=1}^{J}\sum_{k=1}^{K} P(a_j, b_k) \log 1 = 0.$$

Page 61: Image Compression Fundamentals

Channel Capacity

The maximum value of I(z, v) over all possible choices of source probabilities in the vector z is called the capacity C of the channel described by the channel matrix Q:

$$C = \max_{\mathbf{z}} \left[ I(\mathbf{z}, \mathbf{v}) \right].$$

Channel capacity is the maximum rate at which information can be transmitted reliably through the channel.

Examples: binary information source; binary symmetric channel (BSC).

Page 62: Image Compression Fundamentals

Binary Information Source

Source alphabet: A = {a1, a2} = {0, 1}, with P(a1) = p_bs and P(a2) = 1 − p_bs = p̄_bs, so that

$$\mathbf{z} = [P(a_1), P(a_2)]^T = [p_{bs}, 1 - p_{bs}]^T.$$

Entropy of the source:

$$H(\mathbf{z}) = -p_{bs} \log_2 p_{bs} - \bar{p}_{bs} \log_2 \bar{p}_{bs} = H_{bs}(p_{bs}),$$

where the function

$$H_{bs}(t) = -t \log_2 t - (1 - t) \log_2 (1 - t)$$

is called the binary entropy function, denoted H_bs(·).


Page 64: Image Compression Fundamentals

Binary Symmetric Channel (noisy binary information channel)

Let the probability of error during the transmission of any symbol be p_e, and let p̄_e = 1 − p_e. The channel matrix for the BSC is

$$Q = \begin{bmatrix} P(b_1 | a_1) & P(b_1 | a_2) \\ P(b_2 | a_1) & P(b_2 | a_2) \end{bmatrix} = \begin{bmatrix} P(0|0) & P(0|1) \\ P(1|0) & P(1|1) \end{bmatrix} = \begin{bmatrix} \bar{p}_e & p_e \\ p_e & \bar{p}_e \end{bmatrix}.$$

Page 65: Image Compression Fundamentals

Output alphabet: B = {b1, b2} = {0, 1}. The probabilities of receiving the output symbols b1 and b2 are determined from

$$\mathbf{v} = Q\mathbf{z} = \begin{bmatrix} \bar{p}_e & p_e \\ p_e & \bar{p}_e \end{bmatrix} \begin{bmatrix} p_{bs} \\ \bar{p}_{bs} \end{bmatrix},$$

giving

$$P(b = 0) = \bar{p}_e p_{bs} + p_e \bar{p}_{bs}, \qquad P(b = 1) = p_e p_{bs} + \bar{p}_e \bar{p}_{bs}.$$

Page 66: Image Compression Fundamentals

The mutual information of the BSC can be computed by expanding the double sum

$$I(\mathbf{z}, \mathbf{v}) = \sum_{j=1}^{2}\sum_{k=1}^{2} q_{kj} P(a_j) \log_2 \frac{q_{kj}}{\sum_{i=1}^{2} q_{ki} P(a_i)}$$

term by term:

$$I(\mathbf{z}, \mathbf{v}) = q_{11} P(a_1) \log_2 \frac{q_{11}}{q_{11} P(a_1) + q_{12} P(a_2)} + q_{21} P(a_1) \log_2 \frac{q_{21}}{q_{21} P(a_1) + q_{22} P(a_2)} + q_{12} P(a_2) \log_2 \frac{q_{12}}{q_{11} P(a_1) + q_{12} P(a_2)} + q_{22} P(a_2) \log_2 \frac{q_{22}}{q_{21} P(a_1) + q_{22} P(a_2)}.$$

Page 67: Image Compression Fundamentals

Substituting $q_{11} = q_{22} = \bar{p}_e$, $q_{12} = q_{21} = p_e$, $P(a_1) = p_{bs}$, and $P(a_2) = \bar{p}_{bs}$, then collecting the logarithmic terms, the sum simplifies to

$$I(\mathbf{z}, \mathbf{v}) = H_{bs}(p_{bs} p_e + \bar{p}_{bs} \bar{p}_e) - H_{bs}(p_e),$$

where $H_{bs}(t) = -t \log_2 t - (1 - t) \log_2 (1 - t)$ is the binary entropy function.

Page 68: Image Compression Fundamentals

Capacity of the BSC

The capacity is the maximum of the mutual information over all source distributions. I(z, v) is maximum when p_bs = 1/2, which corresponds to z = [1/2, 1/2]^T:

$$I(\mathbf{z}, \mathbf{v}) = H_{bs}\left(\tfrac{1}{2} p_e + \tfrac{1}{2} \bar{p}_e\right) - H_{bs}(p_e) = H_{bs}\left(\tfrac{1}{2}(p_e + 1 - p_e)\right) - H_{bs}(p_e) = H_{bs}\left(\tfrac{1}{2}\right) - H_{bs}(p_e).$$

Since $H_{bs}(\tfrac{1}{2}) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1$, the capacity of the BSC is

$$C = 1 - H_{bs}(p_e)$$

binary units (bits) per symbol.
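A sketch of the capacity formula; the error probabilities passed in are arbitrary test values:

```python
import math

def H_bs(t):
    """Binary entropy function, in bits."""
    if t in (0.0, 1.0):
        return 0.0                      # 0 log 0 = 0 convention
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def bsc_capacity(p_e):
    """C = 1 - H_bs(p_e) for the binary symmetric channel."""
    return 1 - H_bs(p_e)

print(bsc_capacity(0.0))   # 1.0  -> noiseless: 1 bit per symbol
print(bsc_capacity(0.1))   # ~0.531 bits per symbol
print(bsc_capacity(0.5))   # 0.0  -> output independent of input
```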



Page 71: Image Compression Fundamentals

Fundamental Coding Theorems


Page 72: Image Compression Fundamentals

The Noiseless Coding Theorem (Shannon's First Theorem, or Shannon's source coding theorem for lossless data compression)

Applies when both the information channel and the communication system are error-free. It defines the minimum average codeword length per source symbol that can be achieved. Aim: to represent the source as compactly as possible.

Let the information source (A, z), with statistically independent source symbols, output an n-tuple of symbols from the source alphabet A. The source output then takes on one of the J^n possible values, denoted αi, from

$$A' = \{\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_{J^n}\}.$$

Page 73: Image Compression Fundamentals

The probability P(αi) of a given αi is related to the single-symbol probabilities as

$$P(\alpha_i) = P(a_{j1}) P(a_{j2}) \cdots P(a_{jn}),$$

and $\mathbf{z}' = \{P(\alpha_1), P(\alpha_2), \ldots, P(\alpha_{J^n})\}$. The entropy of this source is given by

$$H(\mathbf{z}') = -\sum_{i=1}^{J^n} P(\alpha_i) \log P(\alpha_i) = -\sum_{i=1}^{J^n} P(a_{j1}) P(a_{j2}) \cdots P(a_{jn}) \log \left[ P(a_{j1}) P(a_{j2}) \cdots P(a_{jn}) \right],$$

which reduces to

$$H(\mathbf{z}') = n H(\mathbf{z}).$$

Page 74: Image Compression Fundamentals

Hence, the entropy of the zero-memory source is n times the entropy of the corresponding single-symbol source. Such a source is called the nth extension of the single-symbol source.

The self-information of αi is $\log \frac{1}{P(\alpha_i)}$. Since αi is represented by a codeword whose length l(αi) is the smallest integer exceeding its self-information,

$$\log \frac{1}{P(\alpha_i)} \leq l(\alpha_i) < \log \frac{1}{P(\alpha_i)} + 1.$$

Page 75: Image Compression Fundamentals

Multiplying by P(αi) and summing over all i,

$$\sum_{i=1}^{J^n} P(\alpha_i) \log \frac{1}{P(\alpha_i)} \leq \sum_{i=1}^{J^n} P(\alpha_i)\, l(\alpha_i) < \sum_{i=1}^{J^n} P(\alpha_i) \left[ \log \frac{1}{P(\alpha_i)} + 1 \right]$$

$$H(\mathbf{z}') \leq L'_{avg} < H(\mathbf{z}') + 1, \qquad \text{where } L'_{avg} = \sum_{i=1}^{J^n} P(\alpha_i)\, l(\alpha_i).$$

Dividing by n and using H(z') = nH(z),

$$H(\mathbf{z}) \leq \frac{L'_{avg}}{n} < H(\mathbf{z}) + \frac{1}{n}, \qquad \lim_{n \to \infty} \left[ \frac{L'_{avg}}{n} \right] = H(\mathbf{z}).$$

Page 76: Image Compression Fundamentals

Shannon's source coding theorem for lossless data compression thus states that, for any code used to represent the symbols from a source, the minimum number of bits required to represent the source symbols on average must be at least equal to the entropy of the source:

$$H(\mathbf{z}) \leq \frac{L'_{avg}}{n} < H(\mathbf{z}) + \frac{1}{n}.$$

The efficiency of any encoding strategy can be defined as

$$\eta = \frac{n H(\mathbf{z})}{L'_{avg}} = \frac{H(\mathbf{z}')}{L'_{avg}}.$$
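A numeric sketch of the n = 1 case of the bound, H(z) ≤ L_avg < H(z) + 1, using a Huffman code (covered later in the module) as the variable-length code; the probabilities are made up:

```python
import heapq, math

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code (sketch)."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1            # every merge adds one bit
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

probs = [0.4, 0.3, 0.1, 0.1, 0.06, 0.04]
H = -sum(p * math.log2(p) for p in probs)
L_avg = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))
print(f"H(z) = {H:.3f} <= L_avg = {L_avg:.3f} < H(z) + 1")
```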

Page 77: Image Compression Fundamentals

The Noisy Coding Theorem (Shannon's Second Theorem)

Applies when the channel is noisy or prone to error. Aim: to encode information so that communication is made reliable and the error is minimized. One approach is a repetitive coding scheme.

Encode the nth extension of the source using K-ary code sequences of length r, with K^r ≤ J^n, and select only φ of the K^r possible code sequences as valid codewords.

Page 78: Image Compression Fundamentals

A zero-memory information source generates information at a rate equal to its entropy. The nth extension of the source provides information at a rate of $\frac{H(\mathbf{z}')}{n}$ information units per symbol. If the information is coded, the maximum rate of coded information is $\frac{\log \varphi}{r}$, and occurs when the φ valid codewords used to code the source are equally probable. Hence, a code of size φ and block length r is said to have a rate of

$$R = \frac{\log \varphi}{r}$$

information units per symbol.

Page 79: Image Compression Fundamentals

The noisy coding theorem thus states that for any R < C, where C is the capacity of the zero-memory channel with matrix Q, there exists an integer r and a code of block length r and rate R such that the probability of a block decoding error is less than or equal to ε, for any ε > 0.

That is, the probability of error can be made arbitrarily small so long as the coded message rate is less than the capacity of the channel.

Page 80: Image Compression Fundamentals

The Source Coding Theorem for Lossy Data Compression

Applies when the channel is error-free but the communication process is lossy. Aim: information compression. The goals are to determine the smallest rate at which information about the source can be conveyed to the user, and to encode the source so that the average distortion is less than a maximum allowable level D.

Let the information source and decoder output be defined by (A, z) and (B, v), respectively. A nonnegative cost function ρ(aj, bk), called the distortion measure, defines the penalty associated with reproducing source output aj as decoder output bk.

Page 81: Image Compression Fundamentals

The average value of the distortion is given by

$$d(Q) = \sum_{j=1}^{J}\sum_{k=1}^{K} \rho(a_j, b_k) P(a_j, b_k) = \sum_{j=1}^{J}\sum_{k=1}^{K} \rho(a_j, b_k) P(a_j)\, q_{kj},$$

where Q is the channel matrix.

The rate-distortion function R(D) is defined as

$$R(D) = \min_{Q \in Q_D} I(\mathbf{z}, \mathbf{v}),$$

where $Q_D = \{ q_{kj} \mid d(Q) \leq D \}$ is the set of all D-admissible encoding-decoding procedures.
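A small numeric sketch of the average distortion d(Q); the distortion measure and channel are made-up two-symbol examples:

```python
import numpy as np

rho = np.array([[0.0, 1.0],   # rho(a_j, b_k): 0 if reproduced correctly,
                [1.0, 0.0]])  # 1 if a_j is reproduced as the wrong symbol
Q = np.array([[0.95, 0.10],   # q_kj = P(b_k | a_j); columns sum to 1
              [0.05, 0.90]])
z = np.array([0.5, 0.5])      # source probabilities P(a_j)

# d(Q) = sum_j sum_k rho(a_j, b_k) * P(a_j) * q_kj
d = sum(rho[j, k] * z[j] * Q[k, j] for j in range(2) for k in range(2))
print(d)                      # 0.075: the expected per-symbol distortion
```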

Page 82: Image Compression Fundamentals

If D = 0, R(D) is less than or equal to the entropy of the source: R(0) ≤ H(z).

$$R(D) = \min_{Q \in Q_D} I(\mathbf{z}, \mathbf{v})$$

defines the minimum rate at which information can be conveyed to the user, subject to the constraint that the average distortion be less than or equal to D. I(z, v) is minimized subject to the constraints

$$q_{kj} \geq 0, \qquad \sum_{k=1}^{K} q_{kj} = 1, \qquad d(Q) = D.$$

The last constraint, d(Q) = D, indicates that the minimum information rate occurs when the maximum possible distortion is allowed.

Page 83: Image Compression Fundamentals

Shannon's source coding theorem for lossy data compression states that, for a given source (with all its statistical properties known) and a given distortion measure, there is a function R(D), called the rate-distortion function, such that if D is the tolerable amount of distortion, then R(D) is the best possible compression rate.

The theory of lossy data compression is also known as rate-distortion theory. The lossless and lossy data compression theories are collectively known as source coding theory.

Page 84: Image Compression Fundamentals

Thank You
