Data Compression - Huffman Coding · University of Babylon · 2016-12-16
Ministry of Higher Education and Scientific Research
College of Science for Women - University of Babylon
Department of Computer Science
Data Compression
- Huffman Coding
Supervised by
Ali Kadhim Mohammed
***************************
Prepared by fourth-stage students - A:
- Khitam Hussein Habib
- Rusul Samir Abdulaali
Introduction to Huffman Codes
In computer science and information theory, a Huffman code is a particular type
of optimal prefix code that is commonly used for lossless data compression. The
process of finding and/or using such a code proceeds by means of Huffman
coding, an algorithm developed by David A. Huffman while he was a Sc.D. student
at MIT, and published in the 1952 paper "A Method for the Construction of
Minimum-Redundancy Codes."
The output from Huffman's algorithm can be viewed as a variable-length code
table for encoding a source symbol (such as a character in a file). The algorithm
derives this table from the estimated probability or frequency of occurrence
(weight) for each possible value of the source symbol. As in other entropy
encoding methods, more common symbols are generally represented using fewer
bits than less common symbols. Huffman's method can be implemented
efficiently, finding a code in time linear in the number of input weights if these
weights are sorted [2]. However, although optimal among methods encoding
symbols separately, Huffman coding is not always optimal among all compression
methods.
The Huffman encoding scheme takes advantage of the disparity between
frequencies and uses less storage for the frequently occurring characters at the
expense of having to use more storage for each of the more rare characters.
Huffman coding is an example of variable-length encoding: some characters may
require only 2 or 3 bits, while other characters may require 7, 10, or 12 bits. The
savings from not having to use a full 8 bits for the most common characters make
up for having to use more than 8 bits for the rare characters, and the overall effect
is that the file almost always requires less space.
The Huffman Coding Algorithm:
This technique was developed by David Huffman as part of a class assignment; the
class was the first ever in the area of information theory and was taught by
Robert Fano at MIT [22].
The codes generated using this technique or procedure are called Huffman codes.
These codes are prefix codes and are optimum for a given model (set of
probabilities).
The Huffman procedure is based on two observations regarding optimum prefix
codes.
1. In an optimum code, symbols that occur more frequently (have a higher
probability of occurrence) will have shorter codewords than symbols that occur
less frequently.
2. In an optimum code, the two symbols that occur least frequently will have
codewords of the same length.
It is easy to see that the first observation is correct. If symbols that occur more
often had codewords that were longer than the codewords for symbols that
occurred less often, the average number of bits per symbol would be larger than
if the conditions were reversed.
Therefore, a code that assigns longer codewords to symbols that occur more
frequently cannot be optimum.
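The merging procedure built on these two observations can be sketched in Python. This is a minimal illustration under the description above, not code from the original notes; the function name huffman_code and the tie-breaking counter are my own choices:

```python
import heapq

def huffman_code(weights):
    """Build a Huffman code for a {symbol: weight} map.

    Repeatedly merges the two lowest-weight entries until one tree
    remains, then reads codewords off the tree (0 for one branch,
    1 for the other).
    """
    # Heap entries are (weight, tie_breaker, tree); a tree is either a
    # bare symbol or a (left, right) pair. The counter keeps the heap
    # from ever comparing two trees directly.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, counter, (t1, t2)))
        counter += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):      # internal node: recurse
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                            # leaf: record the codeword
            codes[tree] = prefix or "0"  # single-symbol edge case
    walk(heap[0][2], "")
    return codes
```

For the alphabet of Example 1 below, huffman_code({"a": 18, "b": 25, "c": 12, "d": 20, "e": 10, "f": 15}) yields codeword lengths of 3 bits for e, c, f, a and 2 bits for d, b, matching the example's table (the individual bit patterns may differ, since swapping 0/1 branches gives an equally valid code).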
Example 1:-
Given the alphabet {a,b,c,d,e,f} with p(a)=18%, p(b)=25%, p(c)=12%, p(d)=20%,
p(e)=10%, p(f)=15%, construct the Huffman code.
Solution:-
ai   P(ai)   code   Li
e    10%     101    3
c    12%     100    3
f    15%     001    3
a    18%     000    3
d    20%     11     2
b    25%     01     2
1- We merge e and c into ec:
ec = p(e) + p(c) = 22%
ai P(ai)
f 15%
a 18%
d 20%
ec 22%
b 25%
2- We merge f and a into fa:
fa = p(f) + p(a) = 33%
ai P(ai)
d 20%
ec 22%
b 25%
fa 33%
3- We merge d and ec into dec:
dec = p(d) + p(ec) = 42%
4- We merge b and fa into bfa:
bfa = p(b) + p(fa) = 58%
5- We merge dec and bfa into decbfa:
decbfa = p(dec) + p(bfa) = 100%
ai P(ai)
b 25%
fa 33%
dec 42%
ai P(ai)
dec 42%
bfa 58%
ai P(ai)
decbfa 100%
* decbfa represents the root.
decbfa (100%)
  0: bfa (58%)
    0: fa (33%)
      0: a (18%)
      1: f (15%)
    1: b (25%)
  1: dec (42%)
    0: ec (22%)
      0: c (12%)
      1: e (10%)
    1: d (20%)
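The code table read off this tree is prefix-free: no codeword is a prefix of any other, so an encoded bit stream decodes unambiguously in a single left-to-right scan. A small Python sketch of this (the encode/decode helpers are my own illustration, not part of the original notes):

```python
# Codeword table for Example 1: a=000, b=01, c=100, d=11, e=101, f=001.
code = {"a": "000", "b": "01", "c": "100", "d": "11", "e": "101", "f": "001"}
reverse = {v: k for k, v in code.items()}

def encode(text):
    return "".join(code[ch] for ch in text)

def decode(bits):
    # No codeword is a prefix of another, so we can emit a symbol the
    # moment the bit buffer matches a codeword.
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in reverse:
            out.append(reverse[buf])
            buf = ""
    return "".join(out)

print(encode("bad"))          # prints: 0100011  (7 bits vs 24 at 8 bits/char)
print(decode(encode("bad")))  # prints: bad
```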
* The average length for this code is
L average = (3*10 + 3*12 + 3*15 + 3*18 + 2*20 + 2*25)/100 = 255/100
= 2.55 bits/symbol
* The entropy for this source is given by
H = -[0.1*log2 0.1 + 0.12*log2 0.12 + 0.15*log2 0.15 + 0.18*log2 0.18 +
0.2*log2 0.2 + 0.25*log2 0.25]
= 2.52 bits/symbol
* The efficiency of the Huffman code is
Efficiency = H / L average * 100%
= 2.52/2.55 * 100%
= 98.8%
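The arithmetic for this example can be verified in a few lines of Python (a quick check, not part of the original notes):

```python
import math

# Probabilities and codeword lengths from the table in Example 1.
p = {"e": 0.10, "c": 0.12, "f": 0.15, "a": 0.18, "d": 0.20, "b": 0.25}
length = {"e": 3, "c": 3, "f": 3, "a": 3, "d": 2, "b": 2}

L_avg = sum(p[s] * length[s] for s in p)           # average bits/symbol
H = -sum(pi * math.log2(pi) for pi in p.values())  # source entropy
print(round(L_avg, 2), round(H, 2), round(H / L_avg * 100, 1))
# prints: 2.55 2.52 98.8
```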
Example 2:-
Given the alphabet {a1,a2,a3,a4,a5} with p(a1)=p(a3)=0.2, p(a2)=0.4,
p(a4)=p(a5)=0.1. The entropy for this source is 2.122 bits/symbol. Design the
Huffman code.
Solution:-
ai P(ai) code Li
a5 0.1 0000 4
a4 0.1 0001 4
a3 0.2 001 3
a1 0.2 01 2
a2 0.4 1 1
1- We merge a5 and a4 into a5a4:
a5a4 = p(a5) + p(a4) = 0.2
ai P(ai)
a5a4 0.2
a3 0.2
a1 0.2
a2 0.4
2- We merge a5a4 and a3 into a5a4a3:
a5a4a3 = p(a5a4) + p(a3) = 0.4
ai P(ai)
a1 0.2
a5a4a3 0.4
a2 0.4
3- We merge a1 and a5a4a3 into a1a5a4a3:
a1a5a4a3 = p(a1) + p(a5a4a3) = 0.6
4- We merge a2 and a1a5a4a3 into a2a1a5a4a3:
a2a1a5a4a3 = p(a2) + p(a1a5a4a3) = 1
* a2a1a5a4a3 represents the root.
ai P(ai)
a2 0.4
a1a5a4a3 0.6
ai P(ai)
a2a1a5a4a3 1
a2a1a5a4a3 (1.0)
  0: a1a5a4a3 (0.6)
    0: a5a4a3 (0.4)
      0: a5a4 (0.2)
        0: a5 (0.1)
        1: a4 (0.1)
      1: a3 (0.2)
    1: a1 (0.2)
  1: a2 (0.4)
* The average length for this code is
L average=0.4*1+0.2*2+0.2*3+0.1*4+0.1*4
=2.2 bits/symbol
* The efficiency of the Huffman code is
Efficiency = H / L average * 100%
= 2.122/2.2 * 100%
= 96.5%
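The merging steps of this example can also be replayed programmatically. The compact sketch below (my own illustration, not from the notes) keeps each partial tree as a list of symbols and prepends one bit to every symbol at each merge. Because several weights tie at 0.2, the individual codeword lengths it produces may differ from the table above, but every Huffman code for this source attains the same optimal average length of 2.2 bits/symbol:

```python
import heapq

# Probabilities from Example 2.
p = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}

# Merge the two lowest totals at each step; the first group popped
# takes the 0 branch, the second takes the 1 branch.
code = {s: "" for s in p}
heap = [(w, [s]) for s, w in p.items()]
heapq.heapify(heap)
while len(heap) > 1:
    w1, g1 = heapq.heappop(heap)
    w2, g2 = heapq.heappop(heap)
    for s in g1:
        code[s] = "0" + code[s]
    for s in g2:
        code[s] = "1" + code[s]
    heapq.heappush(heap, (w1 + w2, g1 + g2))

avg = sum(p[s] * len(code[s]) for s in p)
print(round(avg, 1))  # prints: 2.2
```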