Information Retrieval 902333

Paper Topic: Huffman Coding

Team Members:
Hana'a Shdefat 0950902031
Bushra Hasaien 0900902008

Dr. Saif Rababah
Data Compression

• Data compression (for a picture, sound, video, text, graphics, etc.) means reducing the size of the data. The main aim of data compression is to reduce the file size.
Types of Data Compression

Data compression is generally divided into two types:

o The first type is lossless compression: the file output by the compression process is an exact copy of the original file, without losing anything.
o The second type is lossy compression: the file output by the compression process is only similar to the original file to a certain extent; some of the data is lost. Whether the lost data is essential depends on the user.
History of Huffman Codes

• In 1951, Professor Robert M. Fano (a professor at the Massachusetts Institute of Technology) was teaching his students Shannon–Fano coding. The professor offered the students a choice: either take the final exam, or find a better and more efficient coding method than Shannon–Fano. David Huffman, one of the students, tried to find a better method than Shannon–Fano by trial and error. He was about to give up and start studying for the exam when he found a way to build the tree from the bottom up, unlike Shannon–Fano coding, and the resulting encoding is better than Shannon–Fano coding.
David A. Huffman

• BS in Electrical Engineering at Ohio State University
• Worked as a radar maintenance officer for the US Navy
• PhD student in Electrical Engineering at MIT, 1952
• Was given the choice of writing a term paper or taking a final exam
• Paper topic: Huffman coding
Huffman Coding

• Uses the minimum number of bits
• Variable-length coding – good for data transfer
  – Different symbols have different lengths
• Symbols with the highest frequency get the shortest codewords
• Symbols with lower frequency get longer codewords
• "Z" will have a longer code representation than "E", looking at the frequency of character occurrences in the alphabet
• No codeword is a prefix of another codeword!
Decoding

• To determine the original message, read the string of bits from left to right and use the table to determine the individual symbols

ABCABCABCA

Decode the following: 01101101101101101101

Symbol  Code
A       01
B       10
C       11
Decoding

Original String: 01101101101101101101

Symbol  Code
A       01
B       10
C       11

01 10 11 01 10 11 01 10 11 01
A  B  C  A  B  C  A  B  C  A
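The table-driven decoding above can be sketched in a few lines of Python (a minimal illustration, not part of the original slides; the table A=01, B=10, C=11 is taken from the slide):

```python
# Prefix-code decoding with the table from the slide: A=01, B=10, C=11.
CODE = {"01": "A", "10": "B", "11": "C"}

def decode(bits: str) -> str:
    """Read bits left to right, emitting a symbol as soon as the
    buffered bits match a codeword. Because no codeword is a prefix
    of another, the first match is always the right one."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in CODE:
            out.append(CODE[buf])
            buf = ""
    return "".join(out)

print(decode("01101101101101101101"))  # ABCABCABCA
```

The prefix property is what makes this single left-to-right pass unambiguous: no backtracking is ever needed.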
• This text contains 10 characters: 8 × 10 = 80 bits
• Representing the text with 2 bits per character instead of 8 bits: 2 × 10 = 20 bits
• In this case nothing is lost from the text; on the contrary, we reduced the space
• Compression ratio = compressed text / uncompressed text = 20 / 80 = 25%
• A is repeated more than the other characters, so we give it the shortest code: just (0)
• Compression ratio = compressed text / uncompressed text = 16 / 80 = 20%

Symbol  Code
A       0
B       10
C       11
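As a quick check of the 16-bit figure, the variable-length table can be applied to the example text in Python (a sketch, not part of the slides):

```python
# Encode ABCABCABCA with the variable-length table A=0, B=10, C=11.
CODE = {"A": "0", "B": "10", "C": "11"}

text = "ABCABCABCA"
bits = "".join(CODE[ch] for ch in text)

print(len(bits))                     # 16 bits in total
print(len(bits) / (8 * len(text)))   # 0.2, i.e. a 20% compression ratio
```

A occurs 4 times at 1 bit each, while B and C occur 3 times each at 2 bits: 4 + 6 + 6 = 16 bits.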
Representing a Huffman Table as a Binary Tree

• Codewords are represented by a binary tree
• Each leaf stores a character
• Each internal node has two children
  – Left = 0
  – Right = 1
• The codeword of a character is the path from the root to the leaf storing that character
• The code represented by the leaves of the tree is a prefix code
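Reading codewords off such a tree can be sketched as follows (the tree shape is a hypothetical one matching the earlier table A=0, B=10, C=11; it is not drawn in the slides):

```python
# Inner nodes are (left, right) tuples; leaves are single characters.
tree = ("A", ("B", "C"))

def codewords(node, prefix=""):
    """The codeword of each character is its root-to-leaf path,
    with 0 appended for every left edge and 1 for every right edge."""
    if isinstance(node, str):  # a leaf: it stores one character
        return {node: prefix}
    left, right = node
    return {**codewords(left, prefix + "0"),
            **codewords(right, prefix + "1")}

print(codewords(tree))  # {'A': '0', 'B': '10', 'C': '11'}
```

Because characters sit only at leaves, no root-to-leaf path can be a prefix of another, which is exactly the prefix property stated above.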
Algorithm

• Make a leaf node for each symbol
  – Attach each symbol's occurrence probability to its leaf node
• Take the two nodes with the smallest probability (pi) and connect them into a new node (which becomes the parent of those nodes)
  – Add 1 for the right edge
  – Add 0 for the left edge
  – The probability of the new node is the sum of the probabilities of the two connected nodes
• If there is only one node left, the code construction is complete; otherwise, repeat the previous step.
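The steps above can be sketched with a priority queue in Python (a minimal illustration; the probabilities 0.4/0.3/0.3 for A/B/C are example values, not taken from the slides):

```python
import heapq
import itertools

def huffman(probs):
    """Merge the two smallest-probability nodes under a new parent,
    repeating until a single node (the root) remains."""
    counter = itertools.count()  # tie-breaker so heap tuples compare
    # One leaf node per symbol, carrying its probability.
    heap = [(p, next(counter), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)   # smallest probability
        p2, _, right = heapq.heappop(heap)  # second smallest
        # The parent's probability is the sum of its children's.
        heapq.heappush(heap, (p1 + p2, next(counter), (left, right)))
    return heap[0][2]  # the remaining node is the root of the tree

def codes(node, prefix=""):
    """Left edge = 0, right edge = 1; leaves carry the symbols."""
    if isinstance(node, str):
        return {node: prefix}
    left, right = node
    return {**codes(left, prefix + "0"), **codes(right, prefix + "1")}

tree = huffman({"A": 0.4, "B": 0.3, "C": 0.3})
print(codes(tree))  # {'A': '0', 'B': '10', 'C': '11'}
```

Building the tree bottom-up from the rarest symbols is exactly what distinguishes Huffman's construction from the top-down Shannon–Fano method mentioned earlier.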