Lecture 6: Huffman Code
Thinh Nguyen, Oregon State University

Lecture 6: Huffman Code - College of Engineering - …web.engr.oregonstate.edu/.../spring06/huffman_code.pdf

Page 1:

Lecture 6: Huffman Code

Thinh Nguyen, Oregon State University

Page 2:

Review

Coding: Assigning binary codewords to (blocks of) source symbols.

Variable-length codes (VLC)

Tree codes (prefix code) are instantaneous.

Page 3:

Example of VLC

Page 4:

Creating a Code: The Data Compression Problem

Assume a source with an alphabet A and known symbol probabilities {pi}.

Goal: Choose the codeword lengths so as to minimize the bit rate, i.e., the average number of bits per symbol, ∑ li pi.

Trivial solution: li = 0 for all i.

Restriction: We want a uniquely decodable code, so the Kraft inequality ∑ 2^(-li) <= 1 must hold.

Solution (at least in theory): li = -log2 pi
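As a quick illustration (this sketch is not part of the original slides), a few lines of Python can compute the "theoretical" lengths li = ceil(-log2 pi), check the Kraft inequality, and evaluate the resulting bit rate; the dyadic probabilities are the ones used in the Huffman example later in the lecture:

```python
import math

# Ideal lengths l_i = ceil(-log2 p_i); rounding up keeps Kraft satisfied.
# Dyadic probabilities, as in the lecture's Huffman example.
probs = [0.5, 0.25, 0.125, 0.125]

lengths = [math.ceil(-math.log2(p)) for p in probs]
kraft_sum = sum(2 ** -l for l in lengths)            # must be <= 1
avg_bits = sum(l * p for l, p in zip(lengths, probs))

print(lengths)    # [1, 2, 3, 3]
print(kraft_sum)  # 1.0 -> a prefix code with these lengths exists
print(avg_bits)   # 1.75 bits/symbol (equals the entropy for dyadic probabilities)
```

For dyadic probabilities -log2 pi is already an integer, so the bit rate hits the entropy exactly; for other distributions the rounding costs less than one bit per symbol.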

Page 5:

In practice… use some nice algorithm to find the codes:

Huffman coding
Tunstall coding
Golomb coding

Page 6:

Huffman Average Code Length

Input: Probabilities p1, p2, ..., pm for symbols a1, a2, ..., am, respectively.

Output: A tree that minimizes the average number of bits (bit rate) to code a symbol. That is, minimizes

l̄ = ∑_{i=1}^{m} pi li

where li is the length of the codeword for ai.

Page 7:

Huffman Coding

Two-step algorithm:

1. Iterate:
   - Merge the least probable symbols.
   - Sort.
2. Assign bits.

Example (Merge, Sort, Assign): symbols a, d, b, c with probabilities 0.5, 0.25, 0.125, 0.125. Repeatedly merging the two least probable symbols, re-sorting, and assigning 0/1 at each split gives the code:

a: 0
d: 10
b: 110
c: 111
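The merge-sort-assign procedure above can be sketched in Python (an illustrative implementation, not from the lecture); a min-heap replaces the explicit sort step, and the bit assignment is a walk over the final tree:

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability}: repeatedly merge the
    two least probable nodes (a heap stands in for the Sort step)."""
    # Heap entries: (probability, tiebreak counter, symbol or subtree).
    heap = [(p, i, sym) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # two least probable nodes
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, counter, (left, right)))
        counter += 1
    _, _, tree = heap[0]

    codes = {}
    def assign(node, prefix):                # Assign step: 0 = left, 1 = right
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"      # lone-symbol alphabet gets "0"
    assign(tree, "")
    return codes

probs = {"a": 0.5, "d": 0.25, "b": 0.125, "c": 0.125}
codes = huffman_code(probs)
avg = sum(len(codes[s]) * p for s, p in probs.items())
print(codes)  # {'a': '0', 'd': '10', 'b': '110', 'c': '111'}
print(avg)    # 1.75 bits/symbol
```

The exact 0/1 labels depend on tie-breaking, but the codeword lengths, and hence the average length, are the same for any valid Huffman tree.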

Page 8:

More Examples of Huffman Code

Page 9:

More Examples of Huffman Code

Page 10:

More Examples of Huffman Code

Page 11:

More Examples of Huffman Code

Page 12:

Average Huffman Code Length

Page 13:

Optimality of a Prefix Code

Necessary conditions for an optimal variable-length binary code:

1. Given any two letters aj and ak, if P(aj) >= P(ak), then lj <= lk, where lj is the length of the codeword for aj.

2. The two least probable letters have codewords with the same maximum length lm.

3. In the tree corresponding to the optimum code, there must be two branches stemming from each intermediate node.

4. Suppose we change an intermediate node into a leaf node by combining all the leaves descending from it into a composite word of a reduced alphabet. Then if the original tree was optimal for the original alphabet, the reduced tree is optimal for the reduced alphabet.

Page 14:

Condition 1: If P(aj) >= P(ak), then lj <= lk, where lj is the length of the codeword for aj.

Easy to see why? Proof by contradiction:

Suppose a code X is optimal with P(aj) >= P(ak), but lj > lk.

By simply exchanging the codewords of aj and ak, we obtain a new code Y whose average length ∑ li pi is smaller than that of code X. This contradicts the optimality of X, so the condition must hold.

Page 15:

Condition 2: The two least probable letters have codewords with the same maximum length lm.

Easy to see why?

Proof by contradiction: Suppose we have an optimal code X in which the two codewords with the lowest probabilities are ci and cj, and ci is longer than cj by k bits.

Because this is a prefix code, cj cannot be a prefix of ci. So we can drop the last k bits of ci.

Dropping the last k bits of ci still leaves a decodable code: ci and cj have the longest lengths (they belong to the least probable symbols), so the shortened ci cannot become a prefix of any other codeword. The resulting code Y has a shorter average length than X, and the contradiction is reached.

Page 16:

Condition 3: In the tree corresponding to the optimum code, there must be two branches stemming from each intermediate node.

Easy to see why?

If there were any intermediate node with only one branch coming from that node, we could remove it without affecting the decodability of the code while reducing its average length.

Example: the code a: 000, b: 001, c: 1 corresponds to a tree with an intermediate node that has only one branch; removing that node gives the shorter code a: 00, b: 01, c: 1, which is still decodable.

Page 17:

Condition 4: Suppose we change an intermediate node into a leaf node by combining all the leaves descending from it into a composite word of a reduced alphabet. Then if the original tree was optimal for the original alphabet, the reduced tree is optimal for the reduced alphabet.

Example: for the code a: 000, b: 001, c: 01, d: 1, combining the leaves a and b into a composite symbol e gives the reduced code e: 00, c: 01, d: 1.

Page 18:

Huffman code satisfies all four conditions

Less probable symbols sit at greater depth in the tree (condition 1).

The two least probable symbols have codewords of equal length (condition 2).

Every intermediate node of the tree has two branches (condition 3).

The code for the reduced alphabet needs to be optimal for the code of the original alphabet to be optimal, by construction (condition 4).

Page 19:

Optimal Code Length (Huffman Code Length)

H(S) <= l̄ < H(S) + 1

where
l̄ : average length of an optimal code
H(S) = -∑_{i=1}^{m} P(ai) log2 P(ai) : entropy of the source

Proof:
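As a numerical sanity check of the bound H(S) <= l̄ < H(S) + 1 (this sketch and its probabilities are illustrative, not from the slides), here is a non-dyadic source where the Huffman average length strictly exceeds the entropy but stays within one bit of it:

```python
import math

# Illustrative non-dyadic distribution (assumed, not from the lecture).
probs = [0.4, 0.3, 0.2, 0.1]

entropy = -sum(p * math.log2(p) for p in probs)

# Huffman lengths found by merging the two least probable entries by hand:
# 0.1 + 0.2 -> 0.3;  0.3 + 0.3 -> 0.6;  0.4 + 0.6 -> 1.0
lengths = [1, 2, 3, 3]
avg = sum(l * p for l, p in zip(lengths, probs))

print(round(entropy, 3))  # 1.846
print(round(avg, 3))      # 1.9
assert entropy <= avg < entropy + 1
```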

Page 20:

Extended Huffman Code

H(S) <= l̄ < H(S) + 1/n

where
l̄ : average length of the Huffman code (per source symbol)
H(S) : entropy of the source

Proof: page 53 of the book.

The source alphabet A = {a1, a2, ..., am} is extended to A^n = {a1a1...a1, a1a1...a2, ..., amam...am}, whose symbols are blocks of n symbols from A; there are m^n symbols in the alphabet A^n.
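A small sketch (not from the slides) shows the per-symbol average length of the extended code approaching the entropy as the block length n grows; the i.i.d. binary source and its probabilities are assumptions for illustration:

```python
import heapq
import itertools
import math

def huffman_lengths(probs):
    """Huffman codeword lengths for a list of probabilities: each merge of
    the two least probable nodes pushes every leaf below it one bit deeper."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, ids1 = heapq.heappop(heap)
        p2, i2, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, i2, ids1 + ids2))
    return lengths

# Assumed i.i.d. binary source (probabilities are illustrative).
p = {"a": 0.9, "b": 0.1}
entropy = -sum(q * math.log2(q) for q in p.values())  # ~0.469 bits/symbol

results = []
for n in (1, 2, 3):
    # A^n: all blocks of n symbols; an i.i.d. block's probability is a product.
    block_probs = [math.prod(p[s] for s in block)
                   for block in itertools.product(p, repeat=n)]
    lens = huffman_lengths(block_probs)
    per_symbol = sum(l * q for l, q in zip(lens, block_probs)) / n
    results.append(per_symbol)
    print(n, round(per_symbol, 3))  # 1 1.0, 2 0.645, 3 0.533
```

The per-symbol rate drops toward the entropy, but the alphabet (and code tree) grows as m^n, which is exactly the drawback noted on the next slide.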

Page 21:

Huffman Coding: Pros and Cons

+ Fast implementations.
+ Error resilient: resynchronizes in about l̄^2 steps.
- The code tree grows exponentially when the source is extended.
- The symbol probabilities are built into the code. This makes Huffman coding hard to use for extended sources / large alphabets, or when the symbol probabilities vary over time.

Page 22:

Huffman Coding of 16-bit CD-quality audio

Filename          | Original file size (bytes) | Entropy (bits) | Compressed file size (bytes) | Compression ratio
Mozart symphony   | 939,862                    | 12.8           | 725,420                      | 1.30
Folk rock (Cohn)  | 402,442                    | 13.8           | 349,300                      | 1.15

Huffman coding of the differences:

Filename          | Original file size (bytes) | Entropy (bits) | Compressed file size (bytes) | Compression ratio
Mozart symphony   | 939,862                    | 9.7            | 569,792                      | 1.65
Folk rock (Cohn)  | 402,442                    | 10.4           | 261,590                      | 1.54

Page 23:

Complexity of Huffman Code: O(n log(n))

There are about n merge steps, and each search for the two lowest probabilities costs log(n), which is also the depth of the tree.

Page 24:

Notes on Huffman Code

1. Frequencies computed for each input
   Must transmit the Huffman code or frequencies as well as the compressed input. Requires two passes.

2. Fixed Huffman tree designed from training data
   Do not have to transmit the Huffman tree because it is known to the decoder. Used in the H.263 video coder.

3. Adaptive Huffman code
   One pass; the Huffman tree changes as the frequencies change.
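Scheme 1 above (two passes, frequencies transmitted with the bitstream) can be sketched in Python; this is an illustrative toy, not the lecture's code, and the "header" here is simply the frequency table the decoder uses to rebuild the same tree:

```python
import heapq
from collections import Counter

def build_code(freqs):
    """Turn a frequency table into Huffman codewords by repeatedly merging
    the two least frequent nodes, prepending a bit at each merge."""
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                        # degenerate one-symbol input
        return {next(iter(freqs)): "0"}
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, i2, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, i2, merged))
    return heap[0][2]

text = "abracadabra"
freqs = Counter(text)                         # pass 1: count frequencies
code = build_code(freqs)
bits = "".join(code[s] for s in text)         # pass 2: encode
header = dict(freqs)                          # transmitted alongside `bits`

# Decoder side: rebuild the identical code from the header, walk the bits.
decode = {v: k for k, v in build_code(header).items()}
out, buf = [], ""
for b in bits:
    buf += b
    if buf in decode:                         # prefix property: match is final
        out.append(decode[buf])
        buf = ""
print("".join(out) == text)  # True
```

The deterministic tie-breaking (the insertion-order counter) matters: encoder and decoder must derive the same tree from the same frequency table.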