Data Compressor---Huffman Encoding and Decoding

Data Compressor---Huffman Encoding and Decoding. Huffman Encoding Compression Typically, in files and messages, Each character requires 1 byte or 8 bits.

Dec 17, 2015


Jared Summers


Huffman Encoding

• Compression
  • Typically, in files and messages, each character requires 1 byte or 8 bits
  • Already wasting 1 bit for most purposes!

• Question
  • What’s the smallest number of bits that can be used to store an arbitrary piece of text?

• Idea
  • Find the frequency of occurrence of each character
  • Encode frequent characters with short bit strings
  • Encode rarer characters with longer bit strings
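The first step of the idea, counting how often each character occurs, can be sketched in a few lines of Python. The function name `char_frequencies` and the sample sentence are illustrative assumptions, not part of the slides.

```python
from collections import Counter

def char_frequencies(text):
    """Count how often each character occurs in text."""
    return Counter(text)

# Hypothetical sample text, chosen only for illustration.
freqs = char_frequencies("a sense of tense ease")

# most_common() lists characters from most to least frequent; the
# frequent ones are the candidates for short bit strings.
for ch, n in freqs.most_common(3):
    print(repr(ch), n)
```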


Huffman's Algorithm

• 1952
• Repeatedly merges trees - maintains a forest
• Tree weight - the sum of its leaves' frequencies
• For C characters to code, start with C single-node trees
• Select two trees, T1 and T2, of smallest weights and merge them
• C - 1 merge operations
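A minimal sketch of the merging loop, assuming a binary heap (Python's `heapq`) holds the forest. The slides do not prescribe a data structure, so the heap, the name `build_huffman_tree`, and the tuple representation of trees are assumptions made for illustration.

```python
import heapq
from itertools import count

def build_huffman_tree(freqs):
    """Merge the two lightest trees until a single tree remains.

    freqs maps character -> frequency. A leaf is the character itself;
    an internal node is a (left, right) tuple. Returns (weight, tree).
    """
    if not freqs:
        raise ValueError("need at least one character")
    tiebreak = count()  # unique index so equal weights never compare trees
    forest = [(w, next(tiebreak), ch) for ch, w in freqs.items()]
    heapq.heapify(forest)           # C single-node trees
    merges = 0
    while len(forest) > 1:
        w1, _, t1 = heapq.heappop(forest)   # T1: smallest weight
        w2, _, t2 = heapq.heappop(forest)   # T2: next smallest
        heapq.heappush(forest, (w1 + w2, next(tiebreak), (t1, t2)))
        merges += 1
    assert merges == len(freqs) - 1         # C - 1 merge operations
    weight, _, tree = forest[0]
    return weight, tree
```

For example, `build_huffman_tree({"a": 5, "b": 2, "c": 1})` first merges `c` and `b` into a tree of weight 3, then merges that tree with `a`.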


Huffman Encoding

• Encoding
  • Use a tree
  • Encode by following tree to leaf
  • e.g.
    • E is 00
    • S is 011
  • Frequent characters E, T: 2-bit encodings
  • Others A, S, N, O: 3-bit encodings
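Following the tree to a leaf can be sketched as a recursive walk that records 0 for a left branch and 1 for a right branch. The tree below is an assumption: it is one shape consistent with the codes stated on the slide (E is 00, S is 011, 2 bits for E and T, 3 bits for A, S, N, O); the exact positions of A, N and O are chosen for illustration.

```python
def codes_from_tree(tree, prefix=""):
    """Return {character: bit string} by walking every root-to-leaf path."""
    if isinstance(tree, str):            # leaf: a single character
        return {tree: prefix}
    left, right = tree                   # internal node: (left, right)
    codes = codes_from_tree(left, prefix + "0")
    codes.update(codes_from_tree(right, prefix + "1"))
    return codes

# Hypothetical tree matching the slide's stated code lengths.
TREE = (("E", ("A", "S")), ("T", ("N", "O")))
CODES = codes_from_tree(TREE)
print(CODES["E"], CODES["S"])
```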


Huffman Encoding

• Encoding
  • Use a tree
    • Inefficient in practice
  • Use a direct-addressed lookup table

• ? Finding the optimal encoding
  • Smallest number of bits to represent arbitrary text

Character   Code
A           010
B
:           :
E           00
:           :
N           110
:           :
S           011
T           10
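A direct-addressed lookup table can be sketched as an array with one slot per byte value, indexed by the character's code point, so encoding a character is a single array access rather than a tree walk. The codes for A, E and T are taken from the table, S uses 011 as stated on the earlier encoding slide, and the codes for N and O as well as the names `TABLE` and `encode` are assumptions for illustration.

```python
# One slot per possible byte value; unused slots stay None.
TABLE = [None] * 256
for ch, code in {"A": "010", "E": "00", "S": "011",
                 "T": "10", "N": "110", "O": "111"}.items():
    TABLE[ord(ch)] = code

def encode(text):
    """Concatenate the code of each character, looked up by ord(ch)."""
    bits = []
    for ch in text:
        code = TABLE[ord(ch)] if ord(ch) < 256 else None
        if code is None:
            raise ValueError(f"no code for character {ch!r}")
        bits.append(code)
    return "".join(bits)

print(encode("SEAT"))
```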


• A divide-and-conquer approach might have us asking which characters should appear in the left and right subtrees and trying to build the tree from the top down.

• A greedy approach places our n characters in n sub-trees and starts by combining the two lowest-weight nodes into a tree whose root is assigned the sum of the two nodes' weights.


Huffman Encoding

• Divide and conquer
  • Decide on a root - n choices
  • Decide on roots for sub-trees - n choices
  • Repeat n times
  • O(n!)

• Greedy Approach
  • Sort characters by frequency
  • Form two lowest weight nodes into a sub-tree
    • Sub-tree weight = sum of weights of nodes
  • Move new tree to correct place
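The greedy steps above can be sketched with a sorted list, where "move new tree to correct place" becomes a binary-search insertion. The use of `bisect.insort`, the function name `huffman_sorted_list`, and the tuple representation of sub-trees are assumptions made for illustration.

```python
import bisect
from itertools import count

def huffman_sorted_list(freqs):
    """Build a Huffman tree by keeping the forest sorted by weight.

    freqs maps character -> frequency; assumes at least one character.
    A leaf is the character; an internal node is a (left, right) tuple.
    """
    tiebreak = count()  # unique index so equal weights never compare trees
    # Sort characters by frequency.
    forest = sorted((w, next(tiebreak), ch) for ch, w in freqs.items())
    while len(forest) > 1:
        # Form the two lowest-weight nodes into a sub-tree.
        (w1, _, t1), (w2, _, t2) = forest[0], forest[1]
        del forest[:2]
        # Sub-tree weight = sum of weights; insort places it in sorted order.
        bisect.insort(forest, (w1 + w2, next(tiebreak), (t1, t2)))
    return forest[0][2]
```

Binary search finds the slot in O(log n) comparisons; a Python list then shifts the later elements to make room, which is one reason a heap is a common alternative.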


Standard Coding Scheme


Binary Tree Representation

• For a character set of C characters, the standard fixed-length coding needs ⌈log₂ C⌉ bits

• Fixed-length code can be represented by a binary tree where characters are stored only in leaf nodes - binary trie

• Each character path - start at the root, follow the branches, record 0 for the left branch and 1 for the right branch

• Optimal code is always a full tree - all nodes are either leaves or have two children
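Two of the points above lend themselves to a quick check: the number of bits a fixed-length code needs, and whether a tree is full. The sketch below assumes trees are nested tuples with single-character strings as leaves; that representation and both function names are illustrative choices.

```python
def fixed_length_bits(num_chars):
    """Bits per character for a fixed-length code: ceil(log2(C)).

    (C - 1).bit_length() equals ceil(log2(C)) for C >= 2 and avoids
    floating point; at least 1 bit is used even for a single character.
    """
    return max(1, (num_chars - 1).bit_length())

def is_full(tree):
    """True if every node is a leaf or has exactly two children."""
    if isinstance(tree, str):            # leaf
        return True
    return len(tree) == 2 and all(is_full(child) for child in tree)

print(fixed_length_bits(128), fixed_length_bits(6))
```

A node with a single child, written here as a one-element tuple such as `("A",)`, makes `is_full` return `False`; such a node could be removed to shorten every code beneath it by one bit.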


Representation by a Binary Trie


Improved Binary Trie


Prefix Code

• A fixed-length character code that has characters placed only at the leaves guarantees that any bit sequence can be decoded unambiguously

• Prefix code - character codes may have varying lengths as long as no character code is a prefix of another code

• That means that characters can be only in leaves
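The prefix property can be checked mechanically. The sketch below assumes codes are given as a dictionary from character to bit string; the function name `is_prefix_code` is hypothetical.

```python
def is_prefix_code(codes):
    """True if no code word is a prefix of (or equal to) another."""
    words = sorted(codes.values())
    # After sorting, a word that is a prefix of any other word is also
    # a prefix of its immediate successor, so checking neighbours suffices.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_code({"E": "00", "A": "010", "S": "011", "T": "10"}))
print(is_prefix_code({"E": "00", "S": "001"}))
```

In the second call, `00` is a prefix of `001`, so a decoder reading `00` could not tell whether a character has ended.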


Optimal Prefix Code Tree


Optimal Prefix Code Cost


Huffman’s Algorithm Example - I


Huffman’s Algorithm Example - II


Huffman’s Algorithm Example - III


Huffman’s Algorithm Example - IV


Huffman’s Algorithm Example - V


Huffman’s Algorithm Example - VI


Huffman’s Algorithm Example - VII


Huffman Encoding - Operation

Initial sequence

Sorted by frequency

Combine lowest two into sub-tree

Move it to correct place


Huffman Encoding - Operation

After shifting sub-tree to its correct place ...

Combine next lowest pair

Move sub-tree to correct place


Huffman Encoding - Operation

Move the new tree to the correct place ...

Now the lowest two are the “14” sub-tree and D

Combine and move to correct place


Huffman Encoding - Operation

Move the new tree to the correct place ...

Now the lowest two are the “25” and “30” trees

Combine and move to correct place


Huffman Encoding - Operation

Combine last two trees


• "How do we decode a Huffman-encoded bit string? With these variable-length strings, it's not possible to break up an encoded string of bits into characters!"

• The decoding procedure is deceptively simple. Starting with the first bit in the stream, one then uses successive bits from the stream to determine whether to go left or right in the decoding tree. When we reach a leaf of the tree, we've decoded a character, so we place that character onto the (uncompressed) output stream. The next bit in the input stream is the first bit of the next character.
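The decoding procedure described above can be sketched as a loop that walks the tree one bit at a time and restarts at the root after each leaf. The tree is a hypothetical one in which E is 00 and S is 011, with 0 meaning left and 1 meaning right; the nested-tuple representation and the name `decode` are assumptions.

```python
# Hypothetical decoding tree: leaves are characters, internal nodes
# are (left, right) tuples.
TREE = (("E", ("A", "S")), ("T", ("N", "O")))

def decode(bits, tree):
    """Decode a string of '0'/'1' characters using the given tree.

    Assumes the tree has at least two leaves.
    """
    out = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]   # go left or right
        if isinstance(node, str):                   # reached a leaf
            out.append(node)                        # emit the character
            node = tree                             # next bit starts a new character
    if node is not tree:
        raise ValueError("bit stream ended in the middle of a code word")
    return "".join(out)

print(decode("0110001010", TREE))
```

No separators are needed between characters: because characters sit only at leaves, reaching a leaf is itself the signal that a code word has ended.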


Huffman Encoding - Decoding


Huffman Encoding - Time Complexity

• Sort keys: O(n log n)

• Repeat n times
  • Form new sub-tree: O(1)
  • Move sub-tree: O(log n) (binary search)
  • Total: O(n log n)

• Overall: O(n log n)
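The totals above can be written out as a single sum. This is a sketch that takes the loop count as n - 1, following the "C - 1 merge operations" stated for Huffman's algorithm, and uses the per-step costs as listed:

```latex
T(n) = \underbrace{O(n \log n)}_{\text{sort keys}}
     + (n - 1)\,\bigl(\underbrace{O(1)}_{\text{form sub-tree}}
     + \underbrace{O(\log n)}_{\text{move sub-tree}}\bigr)
     = O(n \log n)
```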