Huffman Coding

Dec 22, 2015

Transcript

Page 1

Huffman Coding

Page 2

Huffman Coding

• Main properties:
– Uses a variable-length code to encode each source symbol.
– Shorter codes are assigned to the most frequently used symbols, and longer codes to the symbols that appear less frequently.
– The code is uniquely decodable and instantaneous (no codeword is a prefix of another).
– It has been shown that Huffman coding cannot be improved upon by any other coding scheme that uses an integral number of bits per symbol.

Page 3

Example

a. Step 1: Histogram (number of pixels per gray level, 100 pixels total)

Gray level   Number of pixels   Probability
g0           20                 0.2
g1           30                 0.3
g2           10                 0.1
g3           40                 0.4

Page 4

b. Step 2: Order the probabilities from largest to smallest:
   0.4 (g3), 0.3 (g1), 0.2 (g0), 0.1 (g2)

c. Step 3: Add the two smallest probabilities (0.2 + 0.1 = 0.3):
   0.4, 0.3, 0.3

d. Step 4: Reorder and add until only two values remain:
   0.3 + 0.3 = 0.6, giving 0.6, 0.4

Page 5

a. Assign 0 and 1 to the rightmost (final two) probabilities: 0.6 receives 0, and 0.4 receives 1.

b. Bring the 0 and 1 back along the tree: the 0.6 branch splits into 0.3 and 0.3, which receive 00 and 01.

c. Append 0 and 1 to the previously added branches: the merged 0.3 splits back into 0.2 and 0.1, which receive 010 and 011.

d. Repeat the process until the original branches are labeled:
   0.4 → 1, 0.3 → 00, 0.2 → 010, 0.1 → 011

Page 6

Original gray level (natural code)   Probability   Huffman code
g0 = 00                              0.2           010
g1 = 01                              0.3           00
g2 = 10                              0.1           011
g3 = 11                              0.4           1
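The construction in steps 1-4 can be sketched in Python with a min-heap; this is an illustrative sketch, not code from the slides, and since Huffman codes are not unique the 0/1 labels may differ from the table above, although the code lengths (1, 2, 3, 3 bits) match.

```python
import heapq
import itertools

def huffman_codes(probs):
    # Min-heap of (probability, tiebreaker, tree); the tiebreaker keeps
    # tuple comparison from ever reaching the unorderable tree element.
    tie = itertools.count()
    heap = [(p, next(tie), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two smallest probabilities (steps 3 and 4 of the example)
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tie), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):   # internal node: branch on 0/1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                         # leaf: record the accumulated code
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman_codes({"g0": 0.2, "g1": 0.3, "g2": 0.1, "g3": 0.4})
```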

Page 7

L_ave = Σ (i = 0 to 3) l_i p_i = 3(0.2) + 2(0.3) + 3(0.1) + 1(0.4) = 1.9 bits/pixel

Entropy = −Σ (i = 0 to 3) p_i log2(p_i)
        = −[0.2 log2(0.2) + 0.3 log2(0.3) + 0.1 log2(0.1) + 0.4 log2(0.4)]
        ≈ 1.846 bits/pixel
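These two numbers are easy to verify; a quick check of the average-length and entropy computations, with the probabilities and code lengths taken from the example:

```python
import math

probs = [0.2, 0.3, 0.1, 0.4]   # p(g0), p(g1), p(g2), p(g3)
lengths = [3, 2, 3, 1]         # bits in the codes 010, 00, 011, 1

l_ave = sum(l * p for l, p in zip(lengths, probs))   # average code length
entropy = -sum(p * math.log2(p) for p in probs)      # source entropy
```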

Page 8

Arithmetic Coding

Page 9

Arithmetic Coding

• Huffman coding has been proven the best coding method when each source symbol must be encoded with an integral number of bits.

• Yet, since Huffman codes have to be an integral number of bits long, while the entropy value of a symbol is almost always a fractional number, the theoretically possible compressed message size cannot be achieved.

Page 10

Arithmetic Coding

• For example, if a statistical model assigns a 90% probability to a given character, the optimal code size would be −log2(0.9) ≈ 0.15 bits.

• The Huffman coding system, however, would assign at least a 1-bit code to the symbol, which is more than six times longer than necessary.
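The 0.15-bit figure is the information content -log2(p) of a symbol with probability 0.9; a quick illustrative check:

```python
import math

p = 0.9
optimal_bits = -math.log2(p)   # information content, about 0.152 bits
ratio = 1.0 / optimal_bits     # a 1-bit Huffman code is this many times longer
```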

Page 11

Arithmetic Coding

• Arithmetic coding bypasses the idea of replacing each input symbol with a specific code. Instead, it replaces a whole stream of input symbols with a single floating-point output number.

Page 12

Character   Probability   Range
^ (space)   1/10          [0.0, 0.1)
A           1/10          [0.1, 0.2)
B           1/10          [0.2, 0.3)
E           1/10          [0.3, 0.4)
G           1/10          [0.4, 0.5)
I           1/10          [0.5, 0.6)
L           2/10          [0.6, 0.8)
S           1/10          [0.8, 0.9)
T           1/10          [0.9, 1.0)

Suppose that we want to encode the message "BILL GATES".

Page 13

Arithmetic Coding

• Encoding algorithm for arithmetic coding:

low = 0.0 ; high = 1.0 ;
while not EOF do
    range = high - low ;
    read(c) ;
    high = low + range × high_range(c) ;
    low = low + range × low_range(c) ;
end do
output(low) ;
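The pseudocode above translates almost line for line into Python. A runnable sketch follows; the dictionary restates the probability table for "BILL GATES" from page 12 and is not itself part of the slides.

```python
# Cumulative probability ranges for each character of "BILL GATES"
RANGES = {
    " ": (0.0, 0.1), "A": (0.1, 0.2), "B": (0.2, 0.3),
    "E": (0.3, 0.4), "G": (0.4, 0.5), "I": (0.5, 0.6),
    "L": (0.6, 0.8), "S": (0.8, 0.9), "T": (0.9, 1.0),
}

def arithmetic_encode(message):
    # Narrow [low, high) by each symbol's subrange, as in the pseudocode
    low, high = 0.0, 1.0
    for c in message:
        rng = high - low
        lo_c, hi_c = RANGES[c]
        high = low + rng * hi_c
        low = low + rng * lo_c
    return low

code = arithmetic_encode("BILL GATES")   # about 0.2572167752
```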

Page 14

Arithmetic Coding

• To encode the first character B properly, the final coded message has to be a number greater than or equal to 0.20 and less than 0.30:
– range = 1.0 - 0.0 = 1.0
– high = 0.0 + 1.0 × 0.3 = 0.3
– low = 0.0 + 1.0 × 0.2 = 0.2

• After the first character is encoded, the low end of the range changes from 0.00 to 0.20 and the high end from 1.00 to 0.30.

Page 15

Arithmetic Coding

• The next character to be encoded, the letter I, owns the range 0.50 to 0.60 within the new subrange of 0.20 to 0.30.

• So, the new encoded number will fall somewhere in the 50th to 60th percentile of the currently established range.

• Thus, this number is further restricted to the interval 0.25 to 0.26.

Page 16

Arithmetic Coding

• Note that any number between 0.25 and 0.26 is a legal encoding of 'BI'. Thus, a number that is best suited for binary representation is selected.

• (Condition: the length of the encoded message is known, or an EOF symbol is used.)

Page 17

[Figure: the interval [0, 1) is divided among the characters ^ (space), A, B, E, G, I, L, S, T, and is successively narrowed as each character of "BILL GATES" is encoded:
[0.0, 1.0) -B-> [0.2, 0.3) -I-> [0.25, 0.26) -L-> [0.256, 0.258) -L-> [0.2572, 0.2576) -^-> [0.2572, 0.25724) -G-> [0.257216, 0.25722) -A-> [0.2572164, 0.2572168) -T-> [0.25721676, 0.2572168) -E-> [0.257216772, 0.257216776) -S-> [0.2572167752, 0.2572167756)]

Page 18

Arithmetic Coding

Character    Low            High
B            0.2            0.3
I            0.25           0.26
L            0.256          0.258
L            0.2572         0.2576
^ (space)    0.25720        0.25724
G            0.257216       0.257220
A            0.2572164      0.2572168
T            0.25721676     0.2572168
E            0.257216772    0.257216776
S            0.2572167752   0.2572167756

Page 19

Arithmetic Coding

• So, the final value 0.2572167752 (or any value between 0.2572167752 and 0.2572167756, if the length of the encoded message is known at the decoding end) will uniquely encode the message 'BILL GATES'.

Page 20

Arithmetic Coding

• Decoding is the inverse process.

• Since 0.2572167752 falls between 0.2 and 0.3, the first character must be 'B'.

• Remove the effect of 'B' from 0.2572167752 by first subtracting the low value of B, 0.2, giving 0.0572167752.

• Then divide by the width of the range of 'B', 0.1. This gives a value of 0.572167752.

Page 21

Arithmetic Coding

• Then calculate where that lands, which is in the range of the next letter, ‘I’.

• The process repeats until 0 or the known length of the message is reached.

Page 22

Arithmetic Coding

• Decoding algorithm:

r = input_number ;
repeat
    search c such that r falls in its range ;
    output(c) ;
    r = r - low_range(c) ;
    r = r ÷ (high_range(c) - low_range(c)) ;
until EOF or the length of the message is reached
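The decoding loop can be sketched as follows. This illustrative sketch uses exact rationals (fractions.Fraction) rather than floats, because each division by a range width multiplies any rounding error by up to 10, and after ten symbols that could flip a boundary case such as the final value 0.8.

```python
from fractions import Fraction as F

# Cumulative ranges from the probability table, as exact rationals
RANGES = {
    " ": (F(0), F(1, 10)), "A": (F(1, 10), F(2, 10)), "B": (F(2, 10), F(3, 10)),
    "E": (F(3, 10), F(4, 10)), "G": (F(4, 10), F(5, 10)), "I": (F(5, 10), F(6, 10)),
    "L": (F(6, 10), F(8, 10)), "S": (F(8, 10), F(9, 10)), "T": (F(9, 10), F(1)),
}

def arithmetic_decode(r, length):
    out = []
    for _ in range(length):
        for c, (lo, hi) in RANGES.items():
            if lo <= r < hi:                 # r falls in c's range
                out.append(c)
                r = (r - lo) / (hi - lo)     # remove the effect of c
                break
    return "".join(out)

message = arithmetic_decode(F("0.2572167752"), 10)
```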

Page 23

r              c           Low   High   range
0.2572167752   B           0.2   0.3    0.1
0.572167752    I           0.5   0.6    0.1
0.72167752     L           0.6   0.8    0.2
0.6083876      L           0.6   0.8    0.2
0.041938       ^ (space)   0.0   0.1    0.1
0.41938        G           0.4   0.5    0.1
0.1938         A           0.1   0.2    0.1
0.938          T           0.9   1.0    0.1
0.38           E           0.3   0.4    0.1
0.8            S           0.8   0.9    0.1
0.0

Page 24

Arithmetic Coding

• In summary, the encoding process is simply one of narrowing the range of possible numbers with every new symbol.

• The new range is proportional to the predefined probability attached to that symbol.

• Decoding is the inverse procedure, in which the range is expanded in proportion to the probability of each symbol as it is extracted.

Page 25

Arithmetic Coding

• Theoretically, the coding rate approaches the high-order entropy of the source.

• Not as popular as Huffman coding, because multiplications and divisions (×, ÷) are needed.