Lossless compression methods
Mohammed Chassab Mahdi
Assist. Lecturer - Institute of Technical / Kufa
Abstract
The aim of this research is to find which compression methods are most appropriate for
compressing files with different extensions, and to determine the effect of file size on
the compression ratio.
In this research, eight of the most common file extensions were selected, and for each
extension ten files of different sizes were taken as samples and compressed using three
lossless compression methods (RLE, Huffman, LZW). The response of each extension to the
three methods is also discussed. All programs were written in Visual Basic.
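Abstract (translated from the Arabic)
The research aims to find the compression methods best suited to compressing files with
different extensions, and to study the effect of a change in file size on the compression
ratio. Three lossless compression methods (RLE, Huffman, LZW) were used to compress files
of the eight most common extensions; for each extension ten files of different sizes were
compressed, and the response of each extension to the three methods was discussed. All
programs were written in Visual Basic.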
1-Introduction
Data compression is a general term used to describe the process of recoding data
so that it requires fewer bytes of storage space. Two very important terms used
throughout the field are lossless compression and lossy compression [6].
Lossless compression is the ability to shrink a file and then reconstitute it in its
original form. Lossy compression, however, is the ability to eliminate some data during
the process (rather than only shrink it) and then reconstitute an approximation of the
original. These terms are vital in understanding the differences between the different
types.
There are many different ways to perform data compression. Each one is used
differently and has its intended purpose.
The first kind of compression is dictionary-based compression. This type of
compression replaces a string of characters with a single codeword. The codeword is then
looked up in a dictionary that recovers the original string. It shortens words,
sentences, or paragraphs so that each one does not have to be transported in its
entirety.
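As an illustration of the dictionary-based idea, the following is a minimal Python sketch of LZW, one of the three methods studied in this paper. It is not the Visual Basic program used in the research. The function names are hypothetical, and the sketch assumes that every input character has a code point below 256 and that codes are kept as plain integers rather than packed into bits.

```python
def lzw_compress(text: str) -> list[int]:
    """Encode text as a list of dictionary codes (LZW sketch)."""
    # Assumption: the dictionary starts with one entry per 8-bit character.
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            # Keep extending the match while it is still a known string.
            current = candidate
        else:
            # Emit the code of the longest known string, then learn a new one.
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int]) -> str:
    """Rebuild the text; the dictionary is reconstructed while decoding."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the code refers to the string being defined now.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(out)


message = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(message)
restored = lzw_decompress(codes)
```

In this sketch the 24-character message is represented by 16 codes, and the dictionary never has to be stored with the output because the decoder rebuilds it from the codes themselves.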
Another type of compression is statistical compression. This process uses the
frequencies of characters to assign codes: letters or individual characters that are
repeated many times throughout the file are recoded with shorter bit patterns, while
rare characters receive longer ones. This type of compression makes it possible for a
file to be very small during transportation and then restored to its full text when it
is decoded at the end [1,8].
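The statistical approach can be sketched with Huffman coding, the statistical method used later in this paper. The code below is an illustrative Python sketch rather than the program written for the research. The function names are hypothetical, and the encoded result is kept as a string of "0" and "1" characters for readability instead of being packed into bytes.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix code in which frequent symbols get shorter codes."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_encode(text: str, codes: dict[str, str]) -> str:
    return "".join(codes[ch] for ch in text)


def huffman_decode(bits: str, codes: dict[str, str]) -> str:
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # Prefix-free codes make this unambiguous.
            out.append(reverse[current])
            current = ""
    return "".join(out)


text = "abracadabra"
codes = huffman_codes(text)
bits = huffman_encode(text, codes)
decoded = huffman_decode(bits, codes)
```

For "abracadabra" the most frequent character, "a", receives a one-bit code, and the whole string needs 23 bits instead of the 88 bits of an 8-bit-per-character encoding (not counting the code table, which a real file format would also have to store).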
Additionally, there is spatial compression. This type takes advantage of the
redundant data contained in a file. Data that is repeated within a file is replaced
with a shorter message that accurately describes its contents. This is often used in
the compression of image files. For example, run-length encoding is commonly used to
compress redundant color pixels.
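Run-length encoding (RLE) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names are hypothetical, pixels are single bytes, and runs are stored as (count, value) pairs with no limit on the count. It is not the program used in the experiments.

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse each run of identical bytes into a (count, value) pair."""
    runs: list[tuple[int, int]] = []
    for value in data:
        if runs and runs[-1][1] == value:
            # Same value as the previous byte: lengthen the current run.
            runs[-1] = (runs[-1][0] + 1, value)
        else:
            runs.append((1, value))
    return runs


def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for count, value in runs)


# A row of pixels: five white, three black, two white.
row = bytes([255] * 5 + [0] * 3 + [255] * 2)
runs = rle_encode(row)
restored = rle_decode(runs)
```

Here the ten pixels become three pairs. Data without runs behaves differently: a sequence such as b"abc" yields one pair per byte, so RLE can enlarge a file that has little consecutive repetition.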
Finally, and no less important, there is temporal compression. It is most
commonly used in the compression of video or audio files. This process excludes
redundant information in video and audio samples: it stores only the necessary
information and eliminates all other “useless” information. During cuts, wipes,
dissolves, and transitions, the only thing transferred is the “key frame.” The key
frame is the information common to all the original data. That way, unnecessary
information does not have to be sent with every transfer [5].
1-1-The Need For Compression
In the past, documents were stored on paper and kept in filing cabinets, which
was very inefficient in terms of storage space and of the time taken to locate and
retrieve information when required. This traditional method of storing documents is
now being replaced by storing and accessing documents electronically through
computers.
This has enabled us to manage things more efficiently and effectively, so that
items can be located and information extracted without undue expense or inconvenience.
In terms of storage, the capacity of a storage device can be effectively increased with a
method that compresses a body of data on its way to a storage device and decompresses
it when it is retrieved [4].
In terms of communications, the bandwidth of a digital communication link can be
effectively increased by compressing data at the sending end and decompressing data at
the receiving end [7].
At any given time, the ability of the Internet to transfer data is fixed. Thus, if data
can be compressed effectively wherever possible, significant improvements in data
throughput can be achieved. Many files can also be combined into one compressed
document, which makes sending easier. In computer graphics, we are interested in
reducing the size of a block of graphics data so that more information fits in a given
physical storage space.
1-2-Data compression classification
Data compression can be divided into two main types [2,10]:
A-Lossless data compression:-
Data that has been compressed by lossless compression methods can be returned
exactly to its original form. Lossless compression methods are used for essential data
such as texts, some kinds of pictures, signals from earthquakes and volcanoes, and
medical tomography images.
B-Lossy data compression:-
Lossy compression methods cause the loss of some (non-essential) information,
which normally cannot be retrieved or reconstructed, but they can achieve a higher
compression ratio than lossless compression methods [9]. In many applications, the
shortfall in the reconstruction is unimportant; for example, when storing speech, the
exact value of each sample is not necessary [3].
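The difference between the two classes can be illustrated with a small Python sketch. The assumptions are stated plainly: the lossless step uses zlib (DEFLATE) from the Python standard library as a stand-in, which is not one of the three methods studied here; the lossy step is a simple quantization that keeps only the top bits of each sample; and the sample values and the name quantize are invented for the example.

```python
import zlib

# Hypothetical 8-bit audio-like samples, repeated to give the data some length.
samples = bytes([100, 101, 103, 102, 100, 99, 101, 100] * 32)

# Lossless: the decompressed data is identical to the original, byte for byte.
packed = zlib.compress(samples)
restored = zlib.decompress(packed)


def quantize(data: bytes, keep_bits: int = 4) -> bytes:
    """Lossy step: zero the low bits of every sample (information is discarded)."""
    shift = 8 - keep_bits
    return bytes((b >> shift) << shift for b in data)


# Lossy: the result is only an approximation; the dropped bits cannot be recovered.
approx = quantize(samples)
max_error = max(abs(a - b) for a, b in zip(samples, approx))
```

In this sketch `restored` equals `samples` exactly, while `approx` differs from `samples` by at most 7 per sample. The quantized data contains fewer distinct values, which is what allows a lossy method to reach a higher compression ratio.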
2-System design
Three algorithms (RLE, Huffman, LZW) were used to write three programs in
Visual Basic, and a system that includes the three programs was then developed. This
system can load different files, compress them with any one of the three methods
(Huffman, LZW, RLE), and save the result.
It can also load compressed files, decompress them, and save the result. The system
also allows the file content to be displayed before and after compression or
decompression.
2-1-Selected file extensions
In this research, eight commonly used file extensions were selected.
The selected file extensions are listed below:
1- .txt (Text file)
2- .doc (Text file)
3- .bmp (Image file)
4- .jpg (Image file)
5- .mp3 (Audio file)
6- .mpeg (Video file)
7- .ppt (Data file)
8- .html (Web file)
2-2-Experimental results
For each file extension, ten files of different sizes were selected and compressed
using the three methods (Huffman, LZW, RLE), and the compression ratio was
calculated. The compression ratio is the ratio of the size of the original data to the
size of the compressed data:
Compression Ratio (CR) = size of original file / size of compressed file
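As a worked example of the definition above, the following Python sketch computes CR from two file sizes. The function name and the byte counts are hypothetical and are not measurements from this research.

```python
def compression_ratio(original_size: int, compressed_size: int) -> float:
    """CR = size of original file / size of compressed file (sizes in bytes)."""
    if compressed_size <= 0:
        raise ValueError("compressed size must be positive")
    return original_size / compressed_size


# Hypothetical sizes: a 1000-byte file compressed to 400 bytes.
cr_good = compression_ratio(1000, 400)   # 2.5
# A file that grows from 500 to 1000 bytes when "compressed".
cr_bad = compression_ratio(500, 1000)    # 0.5
```

Under this definition a ratio above 1 means the file became smaller, and a ratio below 1 means the method enlarged the file.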
The results of compressing the ten files of each extension with the three methods
are given in a table, which also contains the calculated compression ratios.
The scheme was designed to clarify the relationship between file size and the