CPSC 335 Compression and Huffman Coding
Dr. Marina Gavrilova, Computer Science, University of Calgary, Canada

Transcript
Page 1

CPSC 335

Compression and Huffman Coding

Dr. Marina Gavrilova

Computer Science

University of Calgary

Canada

Page 2

Lecture Overview

Huffman Coding

Non-determinism of the algorithm

Implementations:
  Singly-linked list
  Doubly-linked list
  Recursive top-down
  Using heap

Adaptive Huffman coding

Page 3

Huffman Coding

The algorithm assigns a codeword to each character in the text according to the characters' frequencies. A codeword is usually represented as a bit string.

The algorithm starts with a set of individual trees, each consisting of a single node, sorted in order of increasing character probabilities.

Then the two trees with the smallest probabilities are selected and merged: they become the left and right subtrees of a new parent node, which combines their probabilities.

In the end, 0 is assigned to all left branches of the tree and 1 to all right branches, and the codeword for each leaf (character) of the tree is read off the path from the root.
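A minimal Python sketch of this procedure (the function names and the dictionary-based tree representation are mine, not from the slides):

```python
# Single-node trees are kept sorted by probability; the two smallest are
# repeatedly merged, and codewords are read off the finished tree
# (0 = left branch, 1 = right branch).

def build_huffman_tree(probabilities):
    """probabilities: dict mapping each character to its probability or count."""
    # Start with one single-node tree per character, sorted by increasing probability.
    trees = sorted(
        ({"prob": p, "char": c} for c, p in probabilities.items()),
        key=lambda t: t["prob"],
    )
    while len(trees) > 1:
        # Select the two trees with the smallest probabilities ...
        a, b = trees.pop(0), trees.pop(0)
        # ... and make them the left and right subtrees of a parent node
        # that combines their probabilities.
        parent = {"prob": a["prob"] + b["prob"], "left": a, "right": b}
        # Insert the new tree back in its proper (sorted) position.
        i = 0
        while i < len(trees) and trees[i]["prob"] <= parent["prob"]:
            i += 1
        trees.insert(i, parent)
    return trees[0]

def assign_codewords(tree, prefix="", codes=None):
    """Assign 0 to left branches and 1 to right branches; leaves get codewords."""
    if codes is None:
        codes = {}
    if "char" in tree:                      # a leaf: record its codeword
        codes[tree["char"]] = prefix or "0"
    else:
        assign_codewords(tree["left"], prefix + "0", codes)
        assign_codewords(tree["right"], prefix + "1", codes)
    return codes

print(assign_codewords(build_huffman_tree({"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1})))
# {'a': '0', 'b': '10', 'd': '110', 'c': '111'} -- one of several valid codes
```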

Page 4

Non-determinism of Huffman Coding

When two or more trees have the same smallest probability, either may be chosen for merging, so different runs of the algorithm can produce different, equally optimal codes.

Page 5

Non-determinism of Huffman Coding

Page 6

Huffman Algorithm Implementation – Linked List

The implementation depends on how the priority queue is represented, since the algorithm requires removing the two smallest probabilities and inserting the new probability in its proper position.

The first way to implement the priority queue is a singly linked list of references to trees, which resembles the algorithm presented in the previous slides, as sketched below.

The tree with the smallest probability is replaced by the newly created tree.

Among trees with the same probability, the first trees encountered are chosen.
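A sketch of this singly linked list priority queue (class and method names are mine; it reuses the dictionary tree representation from the earlier sketch, and re-inserts the merged tree in sorted position, which is one reasonable reading of the slide):

```python
# Cells stay sorted by increasing probability, so the two smallest trees
# are always at the front; among equal probabilities, trees inserted
# earlier stay in front, i.e. the first trees encountered are chosen.

class Cell:
    def __init__(self, tree, next=None):
        self.tree = tree            # reference to a Huffman (sub)tree
        self.next = next

class SinglyLinkedPQ:
    def __init__(self, trees):
        self.head = None            # build the ascending list by prepending
        for t in sorted(trees, key=lambda t: t["prob"], reverse=True):
            self.head = Cell(t, self.head)

    def pop_two_smallest(self):
        a, b = self.head.tree, self.head.next.tree
        self.head = self.head.next.next
        return a, b

    def insert_sorted(self, tree):
        prev, cur = None, self.head
        while cur is not None and cur.tree["prob"] <= tree["prob"]:
            prev, cur = cur, cur.next
        cell = Cell(tree, cur)
        if prev is None:
            self.head = cell
        else:
            prev.next = cell

    def build(self):                # the Huffman loop over this queue
        while self.head.next is not None:
            a, b = self.pop_two_smallest()
            self.insert_sorted({"prob": a["prob"] + b["prob"], "left": a, "right": b})
        return self.head.tree
```

SinglyLinkedPQ([...]).build() returns the same kind of tree that build_huffman_tree produces, so assign_codewords works on it unchanged.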

Page 7

Doubly Linked List

All probability nodes are first ordered, and the first two trees are always removed.

The new tree is inserted at the end of the list, in sorted order.

A doubly-linked list of references to trees, with immediate access to both the beginning and the end of the list, is used (a sketch follows).
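A sketch of the doubly linked variant (again with names of my own): because each newly combined probability is at least as large as any probability combined before it, scanning backwards from the tail finds the insertion point quickly.

```python
class DCell:
    def __init__(self, tree):
        self.tree, self.prev, self.next = tree, None, None

class DoublyLinkedPQ:
    def __init__(self, trees):
        self.head = self.tail = None
        for t in sorted(trees, key=lambda t: t["prob"]):
            self._append(DCell(t))          # nodes are ordered up front

    def _append(self, cell):
        cell.prev = self.tail
        if self.tail is None:
            self.head = cell
        else:
            self.tail.next = cell
        self.tail = cell

    def pop_front(self):                    # the first two trees are always removed
        cell = self.head
        self.head = cell.next
        if self.head is None:
            self.tail = None
        else:
            self.head.prev = None
        return cell.tree

    def insert_from_tail(self, tree):
        cur = self.tail                     # scan backwards from the end
        while cur is not None and cur.tree["prob"] > tree["prob"]:
            cur = cur.prev
        cell = DCell(tree)
        if cur is self.tail:                # largest so far (or list empty)
            self._append(cell)
        elif cur is None:                   # smallest so far: new head
            cell.next, self.head.prev = self.head, cell
            self.head = cell
        else:                               # splice in after cur
            cell.prev, cell.next = cur, cur.next
            cur.next.prev = cell
            cur.next = cell

    def build(self):
        while self.head is not self.tail:   # more than one tree left
            a, b = self.pop_front(), self.pop_front()
            self.insert_from_tail({"prob": a["prob"] + b["prob"], "left": a, "right": b})
        return self.head.tree
```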

Page 8

Doubly Linked-List implementation

Page 9

Recursive Implementation

A top-down approach builds the tree starting from the highest probability. The root probability is known once the lower probabilities in the root's children have been determined; those, in turn, are known once still lower probabilities have been computed, and so on.

Thus, a recursive algorithm can be used.

Page 10

Implementation using Heap

A min-heap of the probabilities is built first.

When the smallest probability is removed, the element from the last leaf (which may hold a high probability) is put in the root, and the heap property is then restored.

The root probability, now the second smallest, is set to the sum of the two smallest probabilities, and the heap property is restored again.

Processing is complete when only one node is left in the heap.
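With Python's heapq module as the min-heap, the same algorithm reads as below; note this is the standard pop-twice, push-once formulation rather than the in-place root replacement just described (names are mine):

```python
import heapq
from itertools import count

def huffman_with_heap(probabilities):
    order = count()                 # tie-breaker so equal probabilities never
                                    # force a comparison between tree dicts
    heap = [(p, next(order), {"prob": p, "char": c})
            for c, p in probabilities.items()]
    heapq.heapify(heap)             # build the min-heap of probabilities
    while len(heap) > 1:            # done when one node is left in the heap
        pa, _, a = heapq.heappop(heap)      # remove the two smallest ...
        pb, _, b = heapq.heappop(heap)
        heapq.heappush(heap,                # ... and push back their sum
                       (pa + pb, next(order),
                        {"prob": pa + pb, "left": a, "right": b}))
    return heap[0][2]
```

assign_codewords from the earlier sketch works unchanged on the returned tree.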

Page 11

Huffman implementation with a heap

Page 12

Huffman Coding for pairs of characters
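The example on this slide is a figure not captured in the transcript. The idea is to treat adjacent pairs of characters (digrams) as the alphabet, so frequent pairs receive short codewords; a sketch reusing the functions defined earlier:

```python
from collections import Counter

def pair_codes(text):
    if len(text) % 2:               # pad odd-length text to a whole number of pairs
        text += " "
    pairs = [text[i:i + 2] for i in range(0, len(text), 2)]
    freqs = Counter(pairs)          # frequency counts stand in for probabilities
    return assign_codewords(build_huffman_tree(freqs))

print(pair_codes("this is a test text"))
```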

Page 13

Adaptive Huffman Coding

Devised by Robert Gallager and improved by Donald Knuth.

The algorithm is based on the sibling property: if each node has a sibling, and a breadth-first right-to-left tree traversal generates a list of nodes with non-increasing frequency counters, then the tree is a Huffman tree.

In adaptive Huffman coding, the tree includes a counter for each symbol, updated every time the corresponding symbol is coded.

Checking whether the sibling property holds ensures that the tree under construction is a Huffman tree. If the sibling property is violated, the tree is restructured so that the property again holds.
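A checker for the sibling property as stated above, over the dictionary tree representation used earlier (with frequency counts stored in the "prob" field); the restructuring step of adaptive Huffman coding is more involved and not shown:

```python
from collections import deque

def satisfies_sibling_property(root):
    """Breadth-first, right-to-left traversal must yield non-increasing counters."""
    counters, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        counters.append(node["prob"])       # the node's frequency counter
        if "left" in node:                  # internal node: visit right child first
            queue.append(node["right"])
            queue.append(node["left"])
    # in a binary Huffman tree every node except the root has a sibling,
    # so only the ordering condition needs checking here
    return all(x >= y for x, y in zip(counters, counters[1:]))
```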


Page 14

Adaptive Huffman Coding

Page 15

Adaptive Huffman Coding

Page 16

Sources

Web links:

MP3 Converter: http://www.mp3-onverter.com/mp3codec/huffman_coding.htm

Practical Huffman Coding: http://www.compressconsult.com/huffman/

Drozdek Textbook - Chapter 11

Page 17

Shannon-Fano

In the field of data compression, Shannon–Fano coding, named after Claude Shannon and Robert Fano, is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured).

It is suboptimal in the sense that it does not always achieve the lowest possible expected codeword length, as Huffman coding does; however, unlike Huffman coding, it guarantees that all codeword lengths are within one bit of their theoretical ideal, −log₂ P(x).

Page 18

Shannon-Fano Coding

1. For a given list of symbols, develop a corresponding list of probabilities or frequency counts so that each symbol's relative frequency of occurrence is known.

2. Sort the list of symbols according to frequency, with the most frequently occurring symbols at the left and the least common at the right.

3. Divide the list into two parts, with the total frequency count of the left part being as close to the total of the right as possible.

4. The left part of the list is assigned the binary digit 0, and the right part is assigned the digit 1. This means that the codes for the symbols in the first part will all start with 0, and the codes in the second part will all start with 1.

5. Recursively apply steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has become a corresponding code leaf on the tree (a code sketch follows below).
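A sketch of these five steps (the function name is mine):

```python
def shannon_fano(freqs):
    """freqs: dict mapping each symbol to its frequency count."""
    symbols = sorted(freqs, key=freqs.get, reverse=True)    # steps 1 and 2
    codes = {}

    def split(syms, prefix):
        if len(syms) == 1:                  # a single symbol: its code is complete
            codes[syms[0]] = prefix or "0"
            return
        # Step 3: find the cut that balances the two halves' totals.
        total, running, cut, best = sum(freqs[s] for s in syms), 0, 1, float("inf")
        for i in range(1, len(syms)):
            running += freqs[syms[i - 1]]
            diff = abs(total - 2 * running)  # |right total - left total|
            if diff < best:
                best, cut = diff, i
        # Step 4: left part gets 0, right part gets 1; step 5: recurse.
        split(syms[:cut], prefix + "0")
        split(syms[cut:], prefix + "1")

    split(symbols, "")
    return codes

print(shannon_fano({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))
# {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
```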

Page 19

Shannon-Fano example
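The slide's worked example is a figure not captured in this transcript; the classic five-symbol case (illustrative counts, the same ones used in the sketch above) shows the splits:

Symbols sorted by count: A:15, B:7, C:6, D:6, E:5 (total 39).
First split: {A, B} (22) against {C, D, E} (17), so A and B start with 0 and the rest with 1.
Second level: {A} against {B} gives A = 00, B = 01; {C} against {D, E} gives C = 10.
Third level: {D} against {E} gives D = 110, E = 111.
Total cost: 2·(15+7+6) + 3·(6+5) = 89 bits, versus 3·39 = 117 bits for a fixed three-bit code.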

Page 20

Shannon-Fano

References

Shannon, C.E. (July 1948). "A Mathematical Theory of Communication". Bell System Technical Journal 27: 379–423. http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf

Fano, R.M. (1949). "The transmission of information". Technical Report No. 65. Cambridge, Mass., USA: Research Laboratory of Electronics at MIT.