Huffman Coding
Dec 22, 2015
• Main properties:
– Uses a variable-length code to encode each source symbol.
– Shorter codes are assigned to the most frequently occurring symbols, and longer codes to symbols that appear less frequently.
– The code is uniquely decodable and instantaneous (no codeword is a prefix of another).
– It has been shown that Huffman coding cannot be improved upon by any other coding scheme that uses an integral number of bits per symbol.
Example
The source is a 100-pixel image with four gray levels:

Gray level    Number of pixels    Probability
g0            20                  0.2
g1            30                  0.3
g2            10                  0.1
g3            40                  0.4

a. Step 1: Histogram. Estimate each symbol's probability from the pixel counts.
b. Step 2: Order. Sort the probabilities in decreasing order: 0.4, 0.3, 0.2, 0.1.
c. Step 3: Add. Combine the two smallest probabilities (0.2 + 0.1 = 0.3), giving 0.4, 0.3, 0.3.
d. Step 4: Reorder and add until only two values remain: 0.3 + 0.3 = 0.6, leaving 0.6 and 0.4.
a. Assign 0 and 1 to the rightmost probabilities: the final pair 0.6 and 0.4 receive 0 and 1, respectively.
b. Bring the 0 and 1 back along the tree to the branches that were merged.
c. Append 0 and 1 to the codes of the previously added branches: 0.6 splits into 0.3 (code 00) and 0.3 (code 01); the second 0.3 splits into 0.2 (code 010) and 0.1 (code 011).
d. Repeat the process until the original branches are labeled: g3 → 1, g1 → 00, g0 → 010, g2 → 011.
Original Gray Level (Natural Code)    Probability    Huffman Code
00 (g0)                               0.2            010
01 (g1)                               0.3            00
10 (g2)                               0.1            011
11 (g3)                               0.4            1
• Average code length:
  L_ave = Σ_{i=0}^{3} l_i p_i = 3(0.2) + 2(0.3) + 3(0.1) + 1(0.4) = 1.9 bits/pixel
• Entropy:
  H = −Σ_{i=0}^{3} p_i log2(p_i)
    = −[0.2 log2(0.2) + 0.3 log2(0.3) + 0.1 log2(0.1) + 0.4 log2(0.4)]
    ≈ 1.846 bits/pixel
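The merge-and-label procedure above can be sketched in Python with the standard-library `heapq` module; the symbol names g0–g3 and the probabilities come from the example, while the helper name `huffman` is ours, not from the slides.

```python
import heapq
import math

def huffman(probs):
    """Build a Huffman code. probs: {symbol: probability} -> {symbol: bitstring}."""
    # Heap entries: (probability, tie-breaker, {symbol: partial code so far}).
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # smallest probability
        p1, _, c1 = heapq.heappop(heap)   # second smallest
        # Merge: prepend a bit to every symbol under each branch.
        merged = {s: '0' + code for s, code in c0.items()}
        merged.update({s: '1' + code for s, code in c1.items()})
        tie += 1
        heapq.heappush(heap, (p0 + p1, tie, merged))   # the "add" step
    return heap[0][2]

probs = {'g0': 0.2, 'g1': 0.3, 'g2': 0.1, 'g3': 0.4}
codes = huffman(probs)
avg = sum(len(codes[s]) * p for s, p in probs.items())   # average code length
H = -sum(p * math.log2(p) for p in probs.values())       # source entropy
```

The code lengths (1, 2, 3, 3 bits) and the 1.9 bits/pixel average match the slide's result; the exact 0/1 patterns depend on tie-breaking, so for instance g3 may receive '0' here rather than '1'.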
Arithmetic Coding
• Huffman coding has been proven optimal among coding methods that assign each symbol a whole number of bits.
• Yet, since Huffman codes must be an integral number of bits long, while the entropy of a symbol is almost always a fractional number of bits, the theoretically possible compression cannot quite be achieved.
• For example, if a statistical model assigns a 90% probability to a given character, the optimal code size would be about 0.15 bits.
• A Huffman coder would assign a 1-bit code to that symbol, roughly six times longer than necessary.
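The 0.15-bit figure is the information content −log2(p) of the symbol; a quick check:

```python
import math

# Information content (optimal code size) of a symbol with probability 0.9.
optimal_bits = -math.log2(0.9)
print(round(optimal_bits, 3))  # 0.152
```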
• Arithmetic coding bypasses the idea of replacing an input symbol with a specific code. It replaces a stream of input symbols with a single floating point output number.
Suppose that we want to encode the message "BILL GATES".

Character    Probability    Range
^ (space)    1/10           [0.0, 0.1)
A            1/10           [0.1, 0.2)
B            1/10           [0.2, 0.3)
E            1/10           [0.3, 0.4)
G            1/10           [0.4, 0.5)
I            1/10           [0.5, 0.6)
L            2/10           [0.6, 0.8)
S            1/10           [0.8, 0.9)
T            1/10           [0.9, 1.0)
• Encoding algorithm for arithmetic coding:

  low = 0.0 ; high = 1.0 ;
  while not EOF do
      range = high − low ;
      read(c) ;
      high = low + range × high_range(c) ;
      low = low + range × low_range(c) ;
  end do
  output(low) ;
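The loop above translates almost line for line into Python. This sketch uses `Fraction` so the decimal interval endpoints stay exact (plain floats drift after repeated multiplications); the `RANGES` table is the "BILL GATES" model from the slides, and the function name is ours.

```python
from fractions import Fraction as F

# low_range(c) and high_range(c) for the "BILL GATES" model.
RANGES = {
    ' ': (F('0.0'), F('0.1')), 'A': (F('0.1'), F('0.2')),
    'B': (F('0.2'), F('0.3')), 'E': (F('0.3'), F('0.4')),
    'G': (F('0.4'), F('0.5')), 'I': (F('0.5'), F('0.6')),
    'L': (F('0.6'), F('0.8')), 'S': (F('0.8'), F('0.9')),
    'T': (F('0.9'), F('1.0')),
}

def arith_encode(message):
    low, high = F(0), F(1)
    for c in message:
        rng = high - low
        high = low + rng * RANGES[c][1]   # high = low + range * high_range(c)
        low = low + rng * RANGES[c][0]    # low  = low + range * low_range(c)
    return low

# The final low end of the interval encodes the whole message.
assert arith_encode("BILL GATES") == F("0.2572167752")
```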
• To encode the first character, B, properly, the final coded message has to be a number greater than or equal to 0.20 and less than 0.30:
– range = 1.0 − 0.0 = 1.0
– high = 0.0 + 1.0 × 0.3 = 0.3
– low = 0.0 + 1.0 × 0.2 = 0.2
• After the first character is encoded, the low end of the range changes from 0.00 to 0.20 and the high end from 1.00 to 0.30.
• The next character to be encoded, the letter I, owns the range 0.50 to 0.60 within the new subrange of 0.20 to 0.30.
• So the new encoded number will fall somewhere in the 50th to 60th percentile of the currently established range.
• Thus, the number is further restricted to the interval 0.25 to 0.26.
• Note that any number between 0.25 and 0.26 is a legal encoding of 'BI'. Thus, a number best suited for binary representation is selected.
• (Condition: the length of the encoded message is known at the decoder, or an EOF symbol is used.)
[Figure: the interval is successively narrowed as each character of "BILL GATES" is encoded:
(0.0, 1.0) → (0.2, 0.3) → (0.25, 0.26) → (0.256, 0.258) → (0.2572, 0.2576)
→ (0.2572, 0.25724) → (0.257216, 0.257220) → (0.2572164, 0.2572168)
→ (0.25721676, 0.2572168) → (0.257216772, 0.257216776)
→ (0.2572167752, 0.2572167756)]
Character    Low             High
B            0.2             0.3
I            0.25            0.26
L            0.256           0.258
L            0.2572          0.2576
^ (space)    0.25720         0.25724
G            0.257216        0.257220
A            0.2572164       0.2572168
T            0.25721676      0.2572168
E            0.257216772     0.257216776
S            0.2572167752    0.2572167756
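This per-character table can be reproduced by logging low and high after each step of the encoding loop; `Fraction` keeps the decimal endpoints exact. The range table is the "BILL GATES" model from the slides, and the function name is illustrative.

```python
from fractions import Fraction as F

RANGES = {
    ' ': (F('0.0'), F('0.1')), 'A': (F('0.1'), F('0.2')),
    'B': (F('0.2'), F('0.3')), 'E': (F('0.3'), F('0.4')),
    'G': (F('0.4'), F('0.5')), 'I': (F('0.5'), F('0.6')),
    'L': (F('0.6'), F('0.8')), 'S': (F('0.8'), F('0.9')),
    'T': (F('0.9'), F('1.0')),
}

def encode_trace(message):
    """Return (character, low, high) after each encoding step."""
    low, high, rows = F(0), F(1), []
    for c in message:
        rng = high - low
        low, high = low + rng * RANGES[c][0], low + rng * RANGES[c][1]
        rows.append((c, low, high))
    return rows

rows = encode_trace("BILL GATES")
# rows[0]  is ('B', F('0.2'), F('0.3'))
# rows[-1] is ('S', F('0.2572167752'), F('0.2572167756'))
```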
• So, the final value 0.2572167752 (or, any value between 0.2572167752 and 0.2572167756, if the length of the encoded message is known at the decode end), will uniquely encode the message ‘BILL GATES’.
• Decoding is the inverse process.
• Since 0.2572167752 falls between 0.2 and 0.3, the first character must be 'B'.
• Remove the effect of 'B' from 0.2572167752 by first subtracting the low value of B, 0.2, giving 0.0572167752.
• Then divide by the width of B's range, 0.1. This gives a value of 0.572167752.
• Then calculate where that value lands, which is in the range of the next letter, 'I'.
• The process repeats until the value reaches 0 or the known length of the message is exhausted.
• Decoding algorithm:

  r = input_number ;
  repeat
      find c such that r falls in its range ;
      output(c) ;
      r = r − low_range(c) ;
      r = r ÷ (high_range(c) − low_range(c)) ;
  until EOF or the length of the message is reached
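A Python sketch of this decoding loop, reusing the "BILL GATES" range table; `Fraction` avoids the floating-point drift that the repeated divisions would otherwise amplify.

```python
from fractions import Fraction as F

RANGES = {
    ' ': (F('0.0'), F('0.1')), 'A': (F('0.1'), F('0.2')),
    'B': (F('0.2'), F('0.3')), 'E': (F('0.3'), F('0.4')),
    'G': (F('0.4'), F('0.5')), 'I': (F('0.5'), F('0.6')),
    'L': (F('0.6'), F('0.8')), 'S': (F('0.8'), F('0.9')),
    'T': (F('0.9'), F('1.0')),
}

def arith_decode(r, length):
    out = []
    for _ in range(length):
        # Find the character whose range contains r.
        for c, (lo, hi) in RANGES.items():
            if lo <= r < hi:
                out.append(c)
                r = (r - lo) / (hi - lo)   # remove the character's effect
                break
    return ''.join(out)

assert arith_decode(F("0.2572167752"), 10) == "BILL GATES"
```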
r               c            Low    High    Range
0.2572167752    B            0.2    0.3     0.1
0.572167752     I            0.5    0.6     0.1
0.72167752      L            0.6    0.8     0.2
0.6083876       L            0.6    0.8     0.2
0.041938        ^ (space)    0.0    0.1     0.1
0.41938         G            0.4    0.5     0.1
0.1938          A            0.1    0.2     0.1
0.938           T            0.9    1.0     0.1
0.38            E            0.3    0.4     0.1
0.8             S            0.8    0.9     0.1
0.0             (end)
• In summary, the encoding process is simply one of narrowing the range of possible numbers with every new symbol.
• The new range is proportional to the predefined probability attached to that symbol.
• Decoding is the inverse procedure, in which the range is expanded in proportion to the probability of each symbol as it is extracted.
• The coding rate theoretically approaches the entropy of the source.
• It is less popular than Huffman coding because multiplications and divisions are required.