Huffman Coding
Dec 22, 2015
• Main properties:
– Uses a variable-length code to encode each source symbol.
– Shorter codes are assigned to the most frequently occurring symbols, and longer codes to symbols that appear less frequently.
– The code is uniquely decodable and instantaneous (no codeword is a prefix of another).
– It has been shown that Huffman coding cannot be improved upon by any other coding scheme that uses an integral number of bits per symbol.
Example
The source is a 100-pixel image with four gray levels:

Gray level    Number of pixels    Probability
g0            20                  0.2
g1            30                  0.3
g2            10                  0.1
g3            40                  0.4

a. Step 1: Histogram. Estimate each symbol's probability from the pixel counts.
b. Step 2: Order. Sort the probabilities in decreasing order: 0.4, 0.3, 0.2, 0.1.
c. Step 3: Add. Combine the two smallest probabilities (0.2 + 0.1 = 0.3), giving 0.4, 0.3, 0.3.
d. Step 4: Reorder and add until only two values remain: 0.3 + 0.3 = 0.6, leaving 0.6 and 0.4.
a. Assign 0 and 1 to the rightmost probabilities: the final pair 0.6 and 0.4 receive 0 and 1, respectively.
b. Bring the 0 and 1 back along the tree to the branches that were merged.
c. Append 0 and 1 to the codes of the previously added branches: 0.6 splits into 0.3 (code 00) and 0.3 (code 01); the second 0.3 splits into 0.2 (code 010) and 0.1 (code 011).
d. Repeat the process until the original branches are labeled: g3 → 1, g1 → 00, g0 → 010, g2 → 011.
Original Gray Level (Natural Code)    Probability    Huffman Code
00 (g0)                               0.2            010
01 (g1)                               0.3            00
10 (g2)                               0.1            011
11 (g3)                               0.4            1
• Average code length:
  L_ave = Σ_{i=0}^{3} l_i p_i = 3(0.2) + 2(0.3) + 3(0.1) + 1(0.4) = 1.9 bits/pixel
• Entropy:
  H = −Σ_{i=0}^{3} p_i log2(p_i)
    = −[0.2 log2(0.2) + 0.3 log2(0.3) + 0.1 log2(0.1) + 0.4 log2(0.4)]
    ≈ 1.846 bits/pixel
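The merge-and-label procedure above can be sketched in Python with the standard-library `heapq` module; the symbol names g0–g3 and the probabilities come from the example, while the helper name `huffman` is ours, not from the slides.

```python
import heapq
import math

def huffman(probs):
    """Build a Huffman code. probs: {symbol: probability} -> {symbol: bitstring}."""
    # Heap entries: (probability, tie-breaker, {symbol: partial code so far}).
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # smallest probability
        p1, _, c1 = heapq.heappop(heap)   # second smallest
        # Merge: prepend a bit to every symbol under each branch.
        merged = {s: '0' + code for s, code in c0.items()}
        merged.update({s: '1' + code for s, code in c1.items()})
        tie += 1
        heapq.heappush(heap, (p0 + p1, tie, merged))   # the "add" step
    return heap[0][2]

probs = {'g0': 0.2, 'g1': 0.3, 'g2': 0.1, 'g3': 0.4}
codes = huffman(probs)
avg = sum(len(codes[s]) * p for s, p in probs.items())   # average code length
H = -sum(p * math.log2(p) for p in probs.values())       # source entropy
```

The code lengths (1, 2, 3, 3 bits) and the 1.9 bits/pixel average match the slide's result; the exact 0/1 patterns depend on tie-breaking, so for instance g3 may receive '0' here rather than '1'.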
Arithmetic Coding
• Huffman coding has been proven optimal among coding methods that assign each symbol a whole number of bits.
• Yet, since Huffman codes must be an integral number of bits long, while the entropy of a symbol is almost always a fractional number of bits, the theoretically possible compression cannot quite be achieved.
• For example, if a statistical model assigns a 90% probability to a given character, the optimal code size would be about 0.15 bits.
• A Huffman coder would assign a 1-bit code to that symbol, roughly six times longer than necessary.
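The 0.15-bit figure is the information content −log2(p) of the symbol; a quick check:

```python
import math

# Information content (optimal code size) of a symbol with probability 0.9.
optimal_bits = -math.log2(0.9)
print(round(optimal_bits, 3))  # 0.152
```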
• Arithmetic coding bypasses the idea of replacing an input symbol with a specific code. It replaces a stream of input symbols with a single floating point output number.
Suppose that we want to encode the message "BILL GATES".

Character    Probability    Range
^ (space)    1/10           [0.0, 0.1)
A            1/10           [0.1, 0.2)
B            1/10           [0.2, 0.3)
E            1/10           [0.3, 0.4)
G            1/10           [0.4, 0.5)
I            1/10           [0.5, 0.6)
L            2/10           [0.6, 0.8)
S            1/10           [0.8, 0.9)
T            1/10           [0.9, 1.0)
• Encoding algorithm for arithmetic coding:

  low = 0.0 ; high = 1.0 ;
  while not EOF do
      range = high − low ;
      read(c) ;
      high = low + range × high_range(c) ;
      low = low + range × low_range(c) ;
  end do
  output(low) ;
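The loop above translates almost line for line into Python. This sketch uses `Fraction` so the decimal interval endpoints stay exact (plain floats drift after repeated multiplications); the `RANGES` table is the "BILL GATES" model from the slides, and the function name is ours.

```python
from fractions import Fraction as F

# low_range(c) and high_range(c) for the "BILL GATES" model.
RANGES = {
    ' ': (F('0.0'), F('0.1')), 'A': (F('0.1'), F('0.2')),
    'B': (F('0.2'), F('0.3')), 'E': (F('0.3'), F('0.4')),
    'G': (F('0.4'), F('0.5')), 'I': (F('0.5'), F('0.6')),
    'L': (F('0.6'), F('0.8')), 'S': (F('0.8'), F('0.9')),
    'T': (F('0.9'), F('1.0')),
}

def arith_encode(message):
    low, high = F(0), F(1)
    for c in message:
        rng = high - low
        high = low + rng * RANGES[c][1]   # high = low + range * high_range(c)
        low = low + rng * RANGES[c][0]    # low  = low + range * low_range(c)
    return low

# The final low end of the interval encodes the whole message.
assert arith_encode("BILL GATES") == F("0.2572167752")
```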
• To encode the first character, B, properly, the final coded message has to be a number greater than or equal to 0.20 and less than 0.30:
– range = 1.0 − 0.0 = 1.0
– high = 0.0 + 1.0 × 0.3 = 0.3
– low = 0.0 + 1.0 × 0.2 = 0.2
• After the first character is encoded, the low end of the range changes from 0.00 to 0.20 and the high end from 1.00 to 0.30.
• The next character to be encoded, the letter I, owns the range 0.50 to 0.60 within the new subrange of 0.20 to 0.30.
• So the new encoded number will fall somewhere in the 50th to 60th percentile of the currently established range.
• Thus, the number is further restricted to the interval 0.25 to 0.26.
• Note that any number between 0.25 and 0.26 is a legal encoding of 'BI'. Thus, a number best suited for binary representation is selected.
• (Condition: the length of the encoded message is known at the decoder, or an EOF symbol is used.)
[Figure: the interval is successively narrowed as each character of "BILL GATES" is encoded:
(0.0, 1.0) → (0.2, 0.3) → (0.25, 0.26) → (0.256, 0.258) → (0.2572, 0.2576)
→ (0.2572, 0.25724) → (0.257216, 0.257220) → (0.2572164, 0.2572168)
→ (0.25721676, 0.2572168) → (0.257216772, 0.257216776)
→ (0.2572167752, 0.2572167756)]
Character    Low             High
B            0.2             0.3
I            0.25            0.26
L            0.256           0.258
L            0.2572          0.2576
^ (space)    0.25720         0.25724
G            0.257216        0.257220
A            0.2572164       0.2572168
T            0.25721676      0.2572168
E            0.257216772     0.257216776
S            0.2572167752    0.2572167756
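This per-character table can be reproduced by logging low and high after each step of the encoding loop; `Fraction` keeps the decimal endpoints exact. The range table is the "BILL GATES" model from the slides, and the function name is illustrative.

```python
from fractions import Fraction as F

RANGES = {
    ' ': (F('0.0'), F('0.1')), 'A': (F('0.1'), F('0.2')),
    'B': (F('0.2'), F('0.3')), 'E': (F('0.3'), F('0.4')),
    'G': (F('0.4'), F('0.5')), 'I': (F('0.5'), F('0.6')),
    'L': (F('0.6'), F('0.8')), 'S': (F('0.8'), F('0.9')),
    'T': (F('0.9'), F('1.0')),
}

def encode_trace(message):
    """Return (character, low, high) after each encoding step."""
    low, high, rows = F(0), F(1), []
    for c in message:
        rng = high - low
        low, high = low + rng * RANGES[c][0], low + rng * RANGES[c][1]
        rows.append((c, low, high))
    return rows

rows = encode_trace("BILL GATES")
# rows[0]  is ('B', F('0.2'), F('0.3'))
# rows[-1] is ('S', F('0.2572167752'), F('0.2572167756'))
```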
• So, the final value 0.2572167752 (or, any value between 0.2572167752 and 0.2572167756, if the length of the encoded message is known at the decode end), will uniquely encode the message ‘BILL GATES’.
• Decoding is the inverse process.
• Since 0.2572167752 falls between 0.2 and 0.3, the first character must be 'B'.
• Remove the effect of 'B' from 0.2572167752 by first subtracting the low value of B, 0.2, giving 0.0572167752.
• Then divide by the width of B's range, 0.1. This gives a value of 0.572167752.
• Then calculate where that value lands, which is in the range of the next letter, 'I'.
• The process repeats until the value reaches 0 or the known length of the message is exhausted.
• Decoding algorithm:

  r = input_number ;
  repeat
      find c such that r falls in its range ;
      output(c) ;
      r = r − low_range(c) ;
      r = r ÷ (high_range(c) − low_range(c)) ;
  until EOF or the length of the message is reached
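A Python sketch of this decoding loop, reusing the "BILL GATES" range table; `Fraction` avoids the floating-point drift that the repeated divisions would otherwise amplify.

```python
from fractions import Fraction as F

RANGES = {
    ' ': (F('0.0'), F('0.1')), 'A': (F('0.1'), F('0.2')),
    'B': (F('0.2'), F('0.3')), 'E': (F('0.3'), F('0.4')),
    'G': (F('0.4'), F('0.5')), 'I': (F('0.5'), F('0.6')),
    'L': (F('0.6'), F('0.8')), 'S': (F('0.8'), F('0.9')),
    'T': (F('0.9'), F('1.0')),
}

def arith_decode(r, length):
    out = []
    for _ in range(length):
        # Find the character whose range contains r.
        for c, (lo, hi) in RANGES.items():
            if lo <= r < hi:
                out.append(c)
                r = (r - lo) / (hi - lo)   # remove the character's effect
                break
    return ''.join(out)

assert arith_decode(F("0.2572167752"), 10) == "BILL GATES"
```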
r               c            Low    High    Range
0.2572167752    B            0.2    0.3     0.1
0.572167752     I            0.5    0.6     0.1
0.72167752      L            0.6    0.8     0.2
0.6083876       L            0.6    0.8     0.2
0.041938        ^ (space)    0.0    0.1     0.1
0.41938         G            0.4    0.5     0.1
0.1938          A            0.1    0.2     0.1
0.938           T            0.9    1.0     0.1
0.38            E            0.3    0.4     0.1
0.8             S            0.8    0.9     0.1
0.0             (end)
• In summary, the encoding process is simply one of narrowing the range of possible numbers with every new symbol.
• The new range is proportional to the predefined probability attached to that symbol.
• Decoding is the inverse procedure, in which the range is expanded in proportion to the probability of each symbol as it is extracted.
• The coding rate theoretically approaches the entropy of the source.
• It is less popular than Huffman coding because multiplications and divisions are required.