Compression and Huffman Coding
Compression and Huffman Coding - University of Calgary, pages.cpsc.ucalgary.ca/~marina/319/CPSC319_Compression.pdf

Jun 29, 2018

trinhphuc
Compression and Huffman Coding

Huffman Coding

Non-determinism of the algorithm

Implementations:
◦ Singly-linked list
◦ Doubly-linked list
◦ Recursive top-down
◦ Using heap

Adaptive Huffman coding

The algorithm assigns a codeword to each character in the text according to its frequency. The codeword is usually represented as a bit string.

The algorithm starts with a set of individual trees, each consisting of a single node, sorted in order of increasing character probability.

Then the two trees with the smallest probabilities are selected and made the left and right subtrees of a new parent node, whose probability is the sum of theirs.

In the end, 0 is assigned to every left branch of the tree and 1 to every right branch, and the codeword for each leaf (character) is read off the path from the root to that leaf.
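The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's implementation: the names `Node`, `build_tree`, and `assign_codes` are made up here, integer counts stand in for probabilities to avoid floating-point ties, and a plain sorted list plays the role of the set of trees.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    weight: int                      # character count (stands in for a probability)
    symbol: Optional[str] = None     # set only for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(weights: dict[str, int]) -> Node:
    # Start with one single-node tree per character, sorted by increasing weight.
    forest = sorted((Node(w, s) for s, w in weights.items()), key=lambda n: n.weight)
    while len(forest) > 1:
        # The two smallest trees become the left and right subtrees of a new parent.
        a, b = forest.pop(0), forest.pop(0)
        forest.append(Node(a.weight + b.weight, left=a, right=b))
        # Python's sort is stable, so on ties the older trees stay in front.
        forest.sort(key=lambda n: n.weight)
    return forest[0]


def assign_codes(node: Node, prefix: str = "",
                 table: Optional[dict[str, str]] = None) -> dict[str, str]:
    # 0 labels every left branch, 1 every right branch;
    # a leaf's codeword is the sequence of labels on its path from the root.
    if table is None:
        table = {}
    if node.symbol is not None:
        table[node.symbol] = prefix or "0"   # a one-symbol alphabet still gets one bit
    else:
        assign_codes(node.left, prefix + "0", table)
        assign_codes(node.right, prefix + "1", table)
    return table


codes = assign_codes(build_tree({"a": 4, "b": 3, "c": 2, "d": 1}))
print(codes)  # {'a': '0', 'b': '10', 'd': '110', 'c': '111'}
```

The most frequent character gets the shortest codeword, and no codeword is a prefix of another, because characters sit only in leaves.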

Non-determinism of Huffman Coding

The implementation depends on how the priority queue is represented; the queue must support removing the two smallest probabilities and inserting the new probability in the proper position.

The first way to implement the priority queue is a singly linked list of references to trees, which closely follows the algorithm presented in the previous slides.

The tree with the smallest probability is replaced by the newly created tree.

Among trees with the same probability, the first ones encountered are chosen.
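A possible sketch of this singly linked list variant follows. The names `Tree`, `Cell`, `two_smallest`, `unlink`, and `huffman_list` are assumptions made for illustration, and integer counts are used in place of probabilities. The list is not kept sorted; each round scans it once for the two smallest trees.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tree:
    weight: int
    symbol: Optional[str] = None
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


@dataclass
class Cell:
    tree: Tree                       # reference to a tree
    next: Optional["Cell"] = None    # single forward link


def two_smallest(head: Cell) -> tuple[Cell, Cell]:
    # One pass over the list; strict "<" keeps the first tree met among equals.
    first: Optional[Cell] = None
    second: Optional[Cell] = None
    cell: Optional[Cell] = head
    while cell is not None:
        if first is None or cell.tree.weight < first.tree.weight:
            first, second = cell, first
        elif second is None or cell.tree.weight < second.tree.weight:
            second = cell
        cell = cell.next
    return first, second


def unlink(head: Cell, victim: Cell) -> Cell:
    # Remove `victim` from the list and return the (possibly new) head.
    if head is victim:
        return head.next
    cell = head
    while cell.next is not victim:
        cell = cell.next
    cell.next = victim.next
    return head


def huffman_list(weights: dict[str, int]) -> Tree:
    head: Optional[Cell] = None
    for symbol, weight in reversed(list(weights.items())):
        head = Cell(Tree(weight, symbol), head)   # keeps the input order in the list
    while head.next is not None:
        first, second = two_smallest(head)
        # The cell of the smallest tree now refers to the newly created tree ...
        first.tree = Tree(first.tree.weight + second.tree.weight,
                          left=first.tree, right=second.tree)
        # ... and the cell of the second-smallest tree is dropped.
        head = unlink(head, second)
    return head.tree
```

Because ties are resolved by list position, feeding the same characters in a different order can produce a differently shaped tree; that is one source of the non-determinism mentioned earlier.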

All probability nodes are first ordered, and the first two trees are always removed.

The new tree is inserted near the end of the list, at the position that keeps the list sorted.

For this, a doubly linked list of references to trees is used, with immediate access to both the beginning and the end of the list.
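The sorted-list variant might look like the sketch below. As an assumption, Python's `collections.deque` stands in for the doubly linked list, since it gives constant-time access to both ends; `Tree` and `huffman_sorted` are illustrative names, and integer counts replace probabilities.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tree:
    weight: int
    symbol: Optional[str] = None
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def huffman_sorted(weights: dict[str, int]) -> Tree:
    # All single-node trees are ordered once, smallest first.
    trees = deque(sorted((Tree(w, s) for s, w in weights.items()),
                         key=lambda t: t.weight))
    while len(trees) > 1:
        # The two smallest trees are always the first two in the list.
        a = trees.popleft()
        b = trees.popleft()
        merged = Tree(a.weight + b.weight, left=a, right=b)
        # Walk backwards from the end to the slot that keeps the list sorted.
        # The merged tree is usually heavy, so the walk tends to be short.
        i = len(trees)
        while i > 0 and trees[i - 1].weight > merged.weight:
            i -= 1
        trees.insert(i, merged)
    return trees[0]
```

Compared with the unsorted list, no scan for the minima is needed; the cost moves to finding the insertion point, which is searched from the end where the new, larger weight most likely belongs.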

Doubly-linked list implementation

The top-down approach builds the tree starting from the highest probability. The root probability is known only once the lower probabilities, in the root’s children, have been determined; these in turn are known only once the probabilities below them have been computed, and so on.

Thus, a recursive algorithm can be used.
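One way to read this recursion is sketched below, under the assumption that the two smallest trees are replaced by a placeholder leaf before recursing and are attached to it on the way back. The names `Tree` and `create_huffman_tree` are illustrative, and integer counts replace probabilities.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tree:
    weight: int
    symbol: Optional[str] = None
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def create_huffman_tree(trees: list[Tree]) -> Tree:
    # `trees` must be sorted by increasing weight and hold at least two entries.
    if len(trees) == 2:
        # Base case: the root is created first, from the last two weights.
        a, b = trees
        return Tree(a.weight + b.weight, left=a, right=b)
    a, b = trees[0], trees[1]
    # A placeholder leaf carries the combined weight during the recursion.
    placeholder = Tree(a.weight + b.weight)
    rest = sorted(trees[2:] + [placeholder], key=lambda t: t.weight)
    root = create_huffman_tree(rest)
    # Returning from the recursion, the placeholder becomes the parent
    # of the two trees that were removed before the call.
    placeholder.left, placeholder.right = a, b
    return root


leaves = sorted((Tree(w, s) for s, w in {"a": 4, "b": 3, "c": 2, "d": 1}.items()),
                key=lambda t: t.weight)
root = create_huffman_tree(leaves)
```

The root is built in the deepest call, and each return fills in one more level below it, which is why the approach is described as top-down.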

Recursive top-down algorithm

First, a min-heap of probabilities is built, so the smallest probability is in the root.

The smallest probability is removed, the last element of the heap is moved into the root, and the heap property is restored.

Next, the new root, which now holds the second-smallest probability, is set to the sum of the two smallest probabilities, and the heap property is restored again.

The processing is complete when there is only one node left in the heap.
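The heap variant can be sketched with Python's standard `heapq` module. `heappop` removes the smallest entry and restores the heap property, and `heapreplace` overwrites the root and restores it again. The `Tree` class, the function name `huffman_heap`, and the tie-breaking counter are assumptions added for the sketch; integer counts replace probabilities.

```python
import heapq
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Tree:
    weight: int
    symbol: Optional[str] = None
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def huffman_heap(weights: dict[str, int]) -> Tree:
    tie = count()  # insertion counter, so equal weights never compare Tree objects
    heap = [(w, next(tie), Tree(w, s)) for s, w in weights.items()]
    heapq.heapify(heap)                      # build the min-heap
    while len(heap) > 1:
        # Remove the smallest weight; heappop restores the heap property.
        w1, _, a = heapq.heappop(heap)
        # The second-smallest weight is now in the root.
        w2, _, b = heap[0]
        # Set the root to the sum of the two smallest weights and restore the heap.
        merged = Tree(w1 + w2, left=a, right=b)
        heapq.heapreplace(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]                        # one node left: the Huffman tree
```

Each round costs O(log n), so the whole construction takes O(n log n), compared with the linear scan or linear insertion per round in the list-based versions.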

Huffman implementation with a heap

Devised by Robert Gallager and improved by Donald Knuth.

The algorithm is based on the sibling property: if each node (except the root) has a sibling, and the breadth-first right-to-left tree traversal generates a list of nodes with non-increasing frequency counters, then the tree is a Huffman tree.

In adaptive Huffman coding, the tree includes a counter for each symbol, which is updated every time the corresponding symbol is coded.

Checking whether the sibling property holds ensures that the tree under construction is a Huffman tree. If the sibling property is violated, the tree is restructured to restore it.
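The sibling-property check itself is easy to sketch. The code below covers only the check, not the restructuring step that swaps nodes after a violation; `Node` and `has_sibling_property` are illustrative names, and the small tree is a made-up example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    count: int                       # frequency counter
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def has_sibling_property(root: Node) -> bool:
    previous = root.count
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.count > previous:
            return False             # counters must never increase along the traversal
        previous = node.count
        if (node.left is None) != (node.right is None):
            return False             # a child without a sibling
        if node.right is not None:
            queue.append(node.right)  # right before left: right-to-left order
            queue.append(node.left)
    return True


# Hypothetical tree: breadth-first right-to-left order is 10, 6, 4, 3, 1.
rare = Node(1)
inner = Node(4, left=rare, right=Node(3))
root = Node(10, left=inner, right=Node(6))
ok_before = has_sibling_property(root)

# Coding the rare symbol five more times updates its counter and its ancestors'.
for node in (rare, inner, root):
    node.count += 5
ok_after = has_sibling_property(root)   # order is now 15, 6, 9, ...: violated
```

After the update the traversal reads 15, 6, 9, so the counter rises from 6 to 9; this is the point at which an adaptive coder would have to restructure the tree.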

Adaptive Huffman Coding

What is Huffman coding?

Is the algorithm non-deterministic, and why?

List all possible implementations (i.e., list, recursive, heap).

What is the main idea of adaptive Huffman coding?

Web links:
◦ MP3 Converter: http://www.mp3-converter.com/mp3codec/huffman_coding.htm
◦ Practical Huffman Coding: http://www.compressconsult.com/huffman/

Drozdek Textbook - Chapter 11