Page 1: Pbl1

Data Compression

Page 2: Pbl1

Data compression (bit-rate reduction) involves encoding information using fewer bits than the original representation.

The process of reducing the size of a data file is popularly referred to as data compression, although its formal name is source coding (coding done at the source of the data before it is stored or transmitted).

Compression can be either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression.

Lossy compression reduces bits by identifying unnecessary information and removing it.

Data Compression

Page 3: Pbl1

The lossless data compression method essentially has two steps: Analyze the files and then eliminate the redundant data found within them.

For example, if a file compressor analyzed and eliminated all the repeated words in a document file, the result would be a document with about 60 percent fewer words. Such is the case with compressed files. Your application analyzes the file and removes all the equivalent superfluous data bits, and shrinks the overall size of the file.

However, if you attempted to read the article with the omitted words, it wouldn't make any sense. Therefore, the file compression applications insert placeholders where those eliminated words were.

When you extract the file, the application automatically restores the repeated words to their places, making the file readable. Because no data is lost, this method is called lossless compression.
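The round trip described above can be tried with Python's standard zlib module. This is only a sketch: zlib is a real lossless compressor, but the sample text is made up for illustration.

```python
import zlib

# Made-up sample text with many repeated words.
text = ("the quick brown fox jumps over the lazy dog " * 20).encode("ascii")

packed = zlib.compress(text)        # analyze the data and remove redundancy
restored = zlib.decompress(packed)  # extraction puts everything back

print(len(text), "bytes before,", len(packed), "bytes after")
print("identical after extraction:", restored == text)
```

Because the restored bytes equal the original bytes, nothing was lost.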

Lossless

Page 4: Pbl1

Lossy data compression is the converse of lossless data compression. In these schemes, some loss of information is acceptable. Dropping nonessential detail from the data source can save storage space.

Lossy data compression schemes are informed by research on how people perceive the data in question.

For example, the human eye is more sensitive to subtle variations in luminance than it is to variations in color.

JPEG image compression works in part by rounding off nonessential bits of information. There is a corresponding trade-off between information lost and the size reduction.
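The "rounding off" idea can be sketched as quantization: values are rounded to the nearest multiple of a step size, and a larger step keeps less detail. This is a simplified illustration only. JPEG applies the rounding to frequency coefficients rather than to raw pixel values, and the sample values and step sizes below are assumptions.

```python
def quantize(values, step):
    """Round each value to the nearest multiple of `step` (the lossy part)."""
    return [round(v / step) for v in values]

def dequantize(levels, step):
    """Rebuild approximate values; the rounded-off detail is gone for good."""
    return [level * step for level in levels]

samples = [52, 55, 61, 66, 70, 61, 64, 73]   # made-up sample values
for step in (1, 8, 32):
    levels = quantize(samples, step)
    restored = dequantize(levels, step)
    worst = max(abs(a - b) for a, b in zip(samples, restored))
    print(f"step={step:2d} distinct levels={len(set(levels))} worst error={worst}")
```

Fewer distinct levels are cheaper to store, but the worst-case error grows with the step: that is the trade-off between information lost and size reduction.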

A number of popular compression formats exploit these perceptual differences, including those used in music files, images, and video.

Lossy

Page 5: Pbl1

Data compression works by finding patterns in data that occur frequently, and changing their representation to something short, so that the total amount of data is reduced without sacrificing any useful information.

For example, suppose you have a stream of data that consists of only ones and zeros, like this: 10100001010101001. And suppose that you know that this data stream usually contains a lot more zeros than ones; that is, a stream is more likely to be 100000100000101000000 than 111110111011011111.

In this case, you can develop a way of abbreviating the zeros so that they take up less space. You can define A as representing a one, B as representing a single zero, and C as representing four consecutive zeros. Now suppose you have a data stream like this: 100000100100000000000001010010000

How It Works (Example)

Page 6: Pbl1

After you encode it, it will look like ACBABBACCCBABABBAC. Notice that this is shorter than the original, because your encoding method helped abbreviate long strings of consecutive zeros. This is data compression.
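The A/B/C scheme can be written out as a small sketch. The function names are illustrative, and the encoder assumes a greedy rule: use C whenever four zeros are available, otherwise B.

```python
def encode(bits: str) -> str:
    """A = one, B = a single zero, C = four consecutive zeros (greedy)."""
    out = []
    i = 0
    while i < len(bits):
        if bits[i] == "1":
            out.append("A")
            i += 1
        elif bits.startswith("0000", i):
            out.append("C")
            i += 4
        else:
            out.append("B")
            i += 1
    return "".join(out)

DECODE = {"A": "1", "B": "0", "C": "0000"}

def decode(symbols: str) -> str:
    return "".join(DECODE[s] for s in symbols)

stream = "100000100100000000000001010010000"
encoded = encode(stream)
print(encoded)                                  # ACBABBACCCBABABBAC
print(len(stream), "->", len(encoded), "symbols")
```

The stream shrinks from 33 symbols to 18. Note that an alphabet of three letters needs more than one bit per symbol, so the real saving depends on how A, B, and C are themselves stored.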

In order for data compression to work, the data stream must not be random. There has to be some sort of pattern in it, or you can't compress it. For example, if the stream contains ones and zeros, but there's no pattern, and neither ones nor zeroes are more common, then you can't compress the data stream, because there's nothing predictable about it.
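A quick experiment illustrates the point, using zlib as a stand-in for any general-purpose compressor. The patterned input is made up; the random input comes from the operating system's random source.

```python
import os
import zlib

patterned = b"0000000100000010" * 64   # 1024 bytes with an obvious pattern
random_data = os.urandom(1024)          # 1024 bytes with no pattern

print("patterned:", len(patterned), "->", len(zlib.compress(patterned)))
print("random:   ", len(random_data), "->", len(zlib.compress(random_data)))
```

The patterned input shrinks dramatically, while the random input typically comes out slightly larger than it went in.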

If you want a more formal definition, data compression consists of a way of encoding a set of input messages into a set of output messages such that the most common input messages encode to the shortest output messages, and the least common input messages encode to the longest output messages. As long as the input messages are not randomly distributed, this will result in an output stream that is shorter than the input stream. It's all information theory.
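Huffman coding is one well-known way to give the most common messages the shortest codes. The slides do not name a specific algorithm, so the sketch below is an illustrative choice, and the sample message is made up.

```python
import heapq
from collections import Counter

def huffman_code(message: str) -> dict:
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    counts = Counter(message)
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                 # a single distinct symbol still needs one bit
        return {sym: "0" for sym in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two least frequent groups
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

message = "aaaaaaaabbbbccd"            # 'a' is common, 'd' is rare
code = huffman_code(message)
encoded = "".join(code[s] for s in message)
for sym in sorted(code, key=lambda s: len(code[s])):
    print(sym, code[sym])
print(len(encoded), "bits, versus", 2 * len(message), "bits at a fixed 2 bits per symbol")
```

The common symbol 'a' receives a one-bit code and the rare symbols receive three-bit codes, so the whole message takes 25 bits instead of 30.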

Page 7: Pbl1

The objective of image compression is to reduce irrelevance and redundancy of the image data in order to be able to store or transmit data in an efficient form.

Image compression may be lossy or lossless.

IMAGE COMPRESSION

Page 8: Pbl1

Lossless compression is preferred for archival purposes and often for medical imaging, technical drawings, clip art, or comics.

Lossless compression is possible because most real-world data has statistical redundancy.

For example, an image may have areas of colour that do not change over several pixels; instead of coding "red pixel, red pixel, ..." the data may be encoded as "279 red pixels". This is a basic example of run-length encoding; there are many schemes to reduce file size by eliminating redundancy.
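Run-length encoding as described above fits in a few lines. This is a minimal sketch; the pixel row is made up to match the "279 red pixels" example.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse each run of identical values into a (count, value) pair."""
    return [(len(list(run)), value) for value, run in groupby(pixels)]

def rle_decode(runs):
    """Expand (count, value) pairs back into the original sequence."""
    return [value for count, value in runs for _ in range(count)]

row = ["red"] * 279 + ["blue"] * 3 + ["red"] * 2
runs = rle_encode(row)
print(runs)                      # [(279, 'red'), (3, 'blue'), (2, 'red')]
print(rle_decode(runs) == row)   # True
```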

Lossless compression

Page 10: Pbl1

Lossy compression methods, especially when used at low bit rates, introduce compression artifacts.

Lossy methods are especially suitable for natural images such as photographs in applications where minor (sometimes imperceptible) loss of fidelity is acceptable to achieve a substantial reduction in bit rate. The lossy compression that produces imperceptible differences may be called visually lossless.

Lossy image compression can be used in digital cameras, to increase storage capacities with minimal degradation of picture quality. Similarly, DVDs use the lossy MPEG-2 Video codec for video compression.

Lossy compression

Page 11: Pbl1

Methods for lossy compression: Reducing the color space to the most common colors in the image. The selected colors are specified in the color palette in the header of the compressed image. Each pixel just references the index of a color in the color palette. This method can be combined with dithering to avoid posterization.
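The palette method can be sketched as follows. The function names and the tiny pixel list are illustrative assumptions, colours are matched by squared RGB distance, and dithering is left out.

```python
from collections import Counter

def build_palette(pixels, size):
    """Keep the `size` most common colours in the image."""
    return [colour for colour, _ in Counter(pixels).most_common(size)]

def nearest_index(colour, palette):
    """Index of the palette entry closest to `colour` (squared RGB distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(colour, palette[i])))

def to_indexed(pixels, size):
    """Return the palette plus one palette index per pixel."""
    palette = build_palette(pixels, size)
    return palette, [nearest_index(p, palette) for p in pixels]

# Made-up image: mostly pure red and pure blue, plus two near misses.
pixels = [(255, 0, 0)] * 5 + [(0, 0, 255)] * 4 + [(250, 5, 5), (10, 0, 240)]
palette, indices = to_indexed(pixels, 2)
print(palette)   # [(255, 0, 0), (0, 0, 255)]
print(indices)   # [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
```

The two near-miss colours are mapped onto the closest palette entry, which is where the information is lost.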

In contrast to lossless compression, which retains the integrity of the original file, the so-called lossy data compression method scans the file being compressed to determine what information the file can do without. It then eliminates those bits completely, with no method to retrieve that data.

This method is akin to taking a picture with your camera phone, opening a photo-editing app, cropping off the edges of the picture and then sending it to a friend. The recipient of that message cannot restore the pixels you cropped off before you sent the image. Such is the case with lossy compression. While this method is more effective at reducing the size of the file, you won't be able to restore the file to its original state when you extract the file on the back end of the process.

Page 12: Pbl1

What is image compression coding?

To store the image in a bit-stream that is as compact as possible, and to display the decoded image on the monitor as exactly as possible.

Image Compression Coding

Page 13: Pbl1

The image file is converted into a series of binary data, which is called the bit-stream.

The decoder receives the encoded bit-stream and decodes it to reconstruct the image.

The total data quantity of the bit-stream is less than the total data quantity of the original image.

Flow of compression

Page 14: Pbl1
Page 15: Pbl1
Page 16: Pbl1

GIF – Graphics Interchange Format
- Compressed without losing any of the original data (lossless)
- Limited to 256 colors
- Once covered by patents on its LZW compression; those patents have since expired

PNG – Portable Network Graphics
- Up to 48 bits worth of color
- Newer graphic format than GIF

Graphic File Formats

Page 17: Pbl1

JPEG: Joint Photographic Experts Group – an international standard since 1992.

Compresses the data but can lose some of the original content (lossy).

- Supports millions of colours.
- Works with colour and greyscale images.
- Up to 24-bit colour images (unlike GIF).
- Targets photographic-quality images (unlike GIF).
- Suitable for many applications, e.g., satellite, medical, and general photography.

JPEG Image Compression

Page 18: Pbl1

Example of JPEG

Page 19: Pbl1

AUDIO COMPRESSION

Page 20: Pbl1

“Audio compression is a way to reduce the size of the audio file.”

A form of data compression designed to reduce the size of audio files

Audio compression can be lossless or lossy. Audio compression algorithms are typically referred to as audio codecs.

Audio Compression

Page 21: Pbl1

2 types of audio compression:

Lossless: allows one to preserve an exact copy of one's audio files. Usage: archival purposes, editing, audio quality.

Lossy: makes irreversible changes and achieves far greater compression; uses psychoacoustics to recognize that not all data in an audio stream can be perceived by the human auditory system. Usage: distribution of streaming audio, or interactive applications.

Audio Compression (Cont.)

Page 22: Pbl1

Codecs:

Lossless:
- Free Lossless Audio Codec (FLAC)
- Apple Lossless
- MPEG-4 ALS
- Monkey's Audio
- Lossless Predictive Audio Compression (LPAC)
- Lossless Transform Audio Compression (LTAC)

Lossy:
- MP2: MPEG-1 Layer 2 audio codec
- MP3: MPEG-1 Layer 3 audio codec
- MPC: Musepack
- Vorbis: Ogg Vorbis
- AAC: Advanced Audio Coding (MPEG-2 and MPEG-4)
- WMA: Windows Media Audio
- AC3: AC-3 or Dolby Digital A/52

Page 23: Pbl1

Moving Picture Experts Group
- An ISO standard for high-fidelity audio compression.
- An ISO/IEC working group, established in 1988 to develop standards for digital audio and video formats.

MPEG

Page 24: Pbl1

MPEG-1
- Designed for up to 1.5 Mbit/sec.
- Used to compress video; designed especially for Video CD (VCD).

MPEG-2
- Designed for between 1.5 and 15 Mbit/sec.
- Similar to MPEG-1, but it can be used for more applications.
- Transmission rates are more than double those of MPEG-1.
- Works with HDTV and DVD.

MPEG Layers

Page 25: Pbl1

MPEG-4
- Designed especially for the Internet.
- Provides greater audio and video interactivity than previous MPEG versions.
- Allows developers to control objects independently in a scene.
- Includes the capability of representing natural and synthesized sound, and also supports natural textures, images, photographs, natural video and animated video.

MPEG Layers (Cont.)

Page 26: Pbl1

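MPEG Encoder Architecture (block diagram): Audio Input -> Time-to-Frequency Mapping Filter Bank -> Bit Allocation, Quantization and Coding -> Bit Stream Formatting -> Encoded Bit Stream. The Psychoacoustic Model also receives the audio input and guides the bit allocation stage.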

Page 27: Pbl1

MP3
- The file extension and the common name for MPEG-1 Layer 3 audio files.
- A popular audio file format that can be opened in Windows Media Player and many other players.

WAV
- A format for sound files developed by Microsoft, with a .wav file extension.

Audio Compression Formats

Page 28: Pbl1

Ogg
- An audio compression format, comparable to other formats used to store and play digital music, but different in that it is free, open and unpatented.
- It uses Vorbis, a specific audio compression scheme that's designed to be contained in Ogg.

Audio Compression Formats

Page 29: Pbl1

WMA
- Short for Windows Media Audio.
- WMA is a Microsoft file format for encoding digital audio files, similar to MP3, though it can compress files at a higher rate than MP3.
- WMA files, which use the ".wma" file extension, can be of any size and can be compressed to match many different connection speeds, or bandwidths.

Audio Compression Formats

Page 30: Pbl1

Video Compression

Page 31: Pbl1

Once a video signal is digital, it requires a large amount of storage space and transmission bandwidth.

To reduce the amount of data, several strategies are employed that compress the information without negatively affecting the quality of the image.

Storing and transmitting uncompressed raw video is not an efficient technique because it needs large amounts of storage and bandwidth.

Digital Versatile Disc (DVD), DSS, and internet video all carry digital video data, which takes a lot of space to store and a large bandwidth to transmit.

Video Compression

Page 32: Pbl1

Video compression techniques are used to compress the data for these applications because compressed video needs less storage space and less bandwidth to transmit.

With efficient compression techniques, a significant reduction in file size can be achieved with little or no adverse effect on the visual quality. The video quality can be affected if the file size is further lowered by raising the compression level for a given compression technique.

Videos are sequences of images displayed at a high rate. Each of these images is called a frame.

The human eye cannot notice small changes between frames, such as a slight difference in color.

Typically 30 frames are displayed on the screen every second. 
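A little arithmetic shows why raw video needs so much storage and bandwidth. The frame size and colour depth below are assumptions chosen for illustration; only the 30 frames per second comes from the text.

```python
width, height = 720, 480      # assumed frame size (not from the slides)
bits_per_pixel = 24           # assumed: 8 bits each for red, green, and blue
frames_per_second = 30        # figure quoted above

bits_per_second = width * height * bits_per_pixel * frames_per_second
print(f"{bits_per_second / 1e6:.1f} Mbit/s uncompressed")        # 248.8 Mbit/s
print(f"{bits_per_second / 8 * 60 / 1e9:.2f} GB per minute")     # 1.87 GB
```

Under these assumptions the raw stream runs at roughly 249 Mbit/s, well above the 1.5 to 15 Mbit/sec range quoted earlier for MPEG-2.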

Video Compression

Page 33: Pbl1

Video compression standards do not require the encoding of all the details, and some of the less important video details are lost, because lossy compression is used for its ability to achieve very high compression ratios.

Compression is less efficient during sequences of fast movement because fewer macro blocks (MBs) stay in the same position from frame to frame. In fact, users may notice video artifacts during these sequences if the file is over-compressed.

Video Compression

Page 34: Pbl1

To accomplish this, an application known as a “codec” analyzes the video frame by frame, and breaks each frame down into square blocks known as “macro blocks.”

One macro block (MB) commonly covers 16 x 16 pixels, made up of four 8 x 8 pixel blocks. Typically, the codec then analyzes each frame, checking for changes in the MBs.

Areas where the MBs do not change for several frames in a row are noted and further analyzed.

If the video compression codec determines that these areas can be removed from some of the frames, it does so, thus reducing overall file size.

Video Compression

Page 35: Pbl1

Macro block (MB) and Block

Page 36: Pbl1

Intra frame (I)
- Typically about 12 frames between one I frame and the next.
- Every MB of the frame is coded using spatial redundancy.

Predictive frame (P)
- Encoded from the previous I or P reference frame.
- Most of the MBs of the frame are coded exploiting temporal redundancy in the past.

Bi-directional frame (B)
- Encoded from previous and future I or P frames.
- Most of the MBs of the frame are coded exploiting temporal redundancy in the past and in the future.

Three Types of Frame

Page 37: Pbl1

Type of Frame

Page 38: Pbl1

Lossy

- Lossy compression reduces file size by a considerably greater amount than lossless compression, but loses both information and quality.
- The compressed file has less data in it than the original file.
- It can lose a relatively large amount of data before you start to notice a difference.
- Lossy compression makes up for the loss in quality by producing comparatively small files.
- For example, DVDs are compressed using the MPEG-2 format, which can make files 15 to 30 times smaller, but we still tend to perceive DVDs as having a high-quality picture.

Video Compression

Page 39: Pbl1

Lossless

- Lossless compression is exactly what it sounds like: compression where none of the information is lost.
- It produces a less compressed file, but maintains the original quality.
- It reduces the file size by encoding image information more efficiently.
- If file size is not an issue, using lossless compression will result in a perfect-quality picture.
- For example, a video editor transferring files from one computer to another using a hard drive might choose to use lossless compression to preserve quality while he or she is working.

Video Compression

Page 40: Pbl1

Start by encoding the first frame using a still image compression method.

The encoder should then encode each successive frame by identifying the differences between the frame and its predecessor, and encoding these differences. If the frame is very different from its predecessor, it should be coded independently of any other frame.
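The process can be sketched with frames represented as flat lists of pixel values. Everything here is an illustrative assumption: the function names, the 50 percent threshold for "very different", and the tiny frames. The still-image compression of independently coded frames is left out.

```python
def encode_video(frames, threshold=0.5):
    """Encode frames as ('key', full_frame) or ('delta', {index: new_value})."""
    encoded, previous = [], None
    for frame in frames:
        if previous is None:
            encoded.append(("key", list(frame)))          # first frame: code it whole
        else:
            changes = {i: v for i, (u, v) in enumerate(zip(previous, frame)) if u != v}
            if len(changes) > threshold * len(frame):
                encoded.append(("key", list(frame)))      # too different: code independently
            else:
                encoded.append(("delta", changes))        # store only the differences
        previous = frame
    return encoded

def decode_video(encoded):
    frames, current = [], None
    for kind, data in encoded:
        if kind == "key":
            current = list(data)
        else:
            current = list(current)
            for i, v in data.items():
                current[i] = v
        frames.append(current)
    return frames

frames = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0],   # one pixel changed
    [0, 0, 9, 9, 0, 0],   # one more pixel changed
    [5, 5, 5, 5, 5, 5],   # scene cut: everything changed
]
encoded = encode_video(frames)
print([kind for kind, _ in encoded])   # ['key', 'delta', 'delta', 'key']
```

The two middle frames are stored as a single changed pixel each, while the scene cut is stored as a complete frame.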

Video compression process

Page 41: Pbl1

Intraframe

Intra frame video compression considers frames one at a time, seeing them only as still images. It can analyze brightness and color and search for areas that can be optimized, but it does not compare macro blocks between frames.

It occurs within individual frames and is designed to minimize the duplication of data in each picture (spatial redundancy).

Video compression process

Page 42: Pbl1

Interframe

Inter frame compression is a brute-force method that often requires significantly more CPU time than intra frame compression, but it can achieve a better balance between file size and quality loss.

It is compression between frames, designed to minimize data redundancy in successive pictures (temporal redundancy).

Video compression process

Page 43: Pbl1

- Flow control and buffering
- Temporal compression: adjacent frames are highly correlated
- Spatial compression: nearby pixels are often correlated (as in still images)
- Discrete Cosine Transform (DCT)
- Vector Quantization (VQ)
- Fractal Compression
- Discrete Wavelet Transform (DWT)

Video Compression Techniques

Page 44: Pbl1

Examples:

AVI: Audio Video Interleave
- Used to store audio and video data in a file
- File extension .AVI

JPEG2000: Compression standard for still images
- Lower latency
- Supports lossless compression (as well as lossy)

MPEG2 & MPEG4: Video compression standards
- Widely used for DVD discs and digital television broadcasting
- Used as the encoder before transmission

Video Compression Techniques

Page 45: Pbl1

ISO and IEC, the International Organization for Standardization and the International Electrotechnical Commission, have a joint group called the Moving Picture Experts Group, or MPEG. MPEG is responsible, for example, for the familiar compression formats MPEG-1, MPEG-2 and MPEG-4.

The ITU-T standardizes formats for the International Telecommunication Union, a United Nations organization. Some popular ITU-T compression formats include the H.261 and H.264 formats.

There are other compression formats, such as Intel Indeo and RealVideo (based on the ITU-T H.263 codec). These are just as useful as the ones standardized by the international groups, although some video sharing websites won't accept them.

There are also a few different formats to consider when exporting for the web: MPEG4 (which includes .M4V files), MPEG2, H.264, DivX, QuickTime, Windows Media Video (WMV), etc.

It's important not to get video compression formats mixed up with media container formats. A media container is a file format that holds data that has been compressed using a video compression format. So the media container is the end product of video encoding.

Video Compression Formats

Page 46: Pbl1

Step 1: Add Video File

Click the +FILE button in the upper left of the program interface. Choose the video you want to convert in the Add File dialog box and press Open.

How to Compress Video

Page 47: Pbl1

Step 2: Choose the Format or Device Preset

Choose the desired video format or target mobile device from the list of presets. You can also use the Search function to quickly find the format or device you need. Next, choose the output folder for the compressed videos by clicking Browse and selecting the desired destination. By default, the output video will be saved in C:\Users\%your username%\Videos\Movavi Library.

How to Compress Video

Page 48: Pbl1

Step 3: Define Quality and Size Values

Return to the source file list and click on the value displayed in the Quality/Size column. A dialog box will open. Move the slider bar to adjust the output file size and bitrate to meet your needs. Note that the output video size value is only an estimate; the actual size of the converted video file may differ slightly.

How to Compress Video

Page 49: Pbl1

How to Compress Video

Step 4: Start the Video Compression

Press the Convert button to start the compression process. After the operation is complete, the output folder with the converted video will open automatically.

Page 50: Pbl1

Data Representation

Alternative approaches to the data representation problem:

1. ASN.1
2. XDR
3. MIME

Page 51: Pbl1

ASN.1

Abstract Syntax Notation One (ASN.1) is a standard and notation that describes rules and structures for representing, encoding, transmitting, and decoding data in telecommunications and computer networking.

Page 52: Pbl1

The notation provides a certain number of pre-defined basic types such as:

integers (INTEGER), booleans (BOOLEAN), character strings (IA5String, UniversalString...), bit strings (BIT STRING), etc.,

and makes it possible to define constructed types such as: structures (SEQUENCE), lists (SEQUENCE OF), choice between types (CHOICE), etc.

Page 53: Pbl1

ASN.1 sends information in any form anywhere it needs to be communicated digitally. ASN.1 only covers the structural aspects of information; there are no operators to handle the values once these are defined or to make calculations with them. Therefore it is not a programming language.

One of the main reasons for the success of ASN.1 is that this notation is associated with several standardized encoding rules such as the BER (Basic Encoding Rules), or more recently the PER (Packed Encoding Rules), which prove useful for applications that undergo restrictions in terms of bandwidth.

Encoding rules describe how the values defined in ASN.1 should be encoded for transmission regardless of machine, programming language, or how it is represented in an application program.

ASN.1's encodings are more streamlined than many competing notations, enabling rapid and reliable transmission of extensible messages; this is an advantage for wireless broadband.

Because ASN.1 has been an international standard since 1984, its encoding rules are mature and have a long track record of reliability and interoperability.

ASN.1 is widely used in industry sectors where efficient (low-bandwidth, low-transaction-cost) computer communications are needed.

What is ASN.1

Page 54: Pbl1

The standard ASN.1 encoding rules include:
- Basic Encoding Rules (BER)
- Canonical Encoding Rules (CER)
- Distinguished Encoding Rules (DER)
- XML Encoding Rules (XER)
- Canonical XML Encoding Rules (CXER)
- Extended XML Encoding Rules (E-XER)
- Packed Encoding Rules (PER, unaligned: UPER, canonical: CPER)
- Generic String Encoding Rules (GSER)

Page 55: Pbl1

ASN.1's abstract syntax is similar in form to that of any high level programming language.

For example, consider the following C structure:

struct Student {
    char name[50];  /* "Foo Bar" */
    int grad;       /* Grad student? (yes/no) */
    float gpa;      /* 1.1 */
    int id;         /* 1234567890 */
    char bday[8];   /* mm/dd/yy */
};

Its ASN.1 counterpart is:

Student ::= SEQUENCE {
    name OCTET STRING, -- 50 characters
    grad BOOLEAN,      -- comments are preceded
    gpa  REAL,         -- by two hyphens
    id   INTEGER,
    bday OCTET STRING  -- birthday
}
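To make the link between the abstract syntax and the bytes on the wire concrete, here is a hand-rolled sketch of the tag-length-value layout used by the Distinguished Encoding Rules (DER). It is not a complete encoder: only short lengths are handled, the helper names are illustrative, and the gpa (REAL) and bday fields are left out to keep the example small.

```python
def der_tlv(tag: int, content: bytes) -> bytes:
    """Tag, length, value. Only the short length form (< 128 bytes) is handled."""
    if len(content) >= 128:
        raise ValueError("long-form lengths are not covered by this sketch")
    return bytes([tag, len(content)]) + content

def der_boolean(value: bool) -> bytes:
    return der_tlv(0x01, b"\xff" if value else b"\x00")

def der_integer(value: int) -> bytes:
    # Minimal two's-complement representation, most significant byte first.
    bits = value.bit_length() if value >= 0 else (value + 1).bit_length()
    return der_tlv(0x02, value.to_bytes(bits // 8 + 1, "big", signed=True))

def der_octet_string(value: bytes) -> bytes:
    return der_tlv(0x04, value)

def der_sequence(*members: bytes) -> bytes:
    return der_tlv(0x30, b"".join(members))

student = der_sequence(
    der_octet_string(b"Foo Bar"),   # name
    der_boolean(True),              # grad
    der_integer(1234567890),        # id
)
print(student.hex(" "))
# 30 12 04 07 46 6f 6f 20 42 61 72 01 01 ff 02 04 49 96 02 d2
```

Every value carries its own type tag (0x04, 0x01, 0x02) and length, so a receiver can parse the message without being told the layout in advance.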

ASN.1’s syntax

Page 56: Pbl1

ASN.1 has been adopted in the communications protocol specifications of:
- Telecommunications, including 3GPP mobile phones
- Intelligent Transport Systems (ITS)
- Internet voice communications technology (VoIP)
- Multimedia standards
- Security-related systems, including smart cards and certificates, the basis for e-commerce
- Embedded systems communications
- Air traffic control

Why ASN.1?

Page 57: Pbl1

The eXternal Data Representation (XDR) is a standard for the description and encoding of data. XDR uses a language to describe data formats, but the language is used only for describing data and is not a programming language. Protocols such as Remote Procedure Call (RPC) and the Network File System (NFS) use XDR to describe their data formats.

XDR is an alternative to ASN.1. XDR is much simpler than ASN.1, but less powerful. For instance:

◦ XDR uses implicit typing. Communicating peers must know the type of any exchanged data. In contrast, ASN.1 uses explicit typing; it includes type information as part of the transfer syntax.

◦ In XDR, all data is transferred in units of 4 bytes. Numbers are transferred in network order, most significant byte first.

◦ Strings consist of a 4-byte length, followed by the data (padded with zero bytes to a multiple of 4 bytes where necessary). Contrast this with ASN.1.

◦ Defined types include: integer, enumeration, boolean, floating point, fixed-length array, structures, plus others.

One advantage that XDR has over ASN.1 is that current implementations of ASN.1 execute significantly slower than XDR.

Sun's XDR

Page 58: Pbl1

Suppose there is a user named "john" who wants to store his lisp program "sillyprog" that contains just the data "(quit)". His file would be encoded as follows:

OFFSET  HEX BYTES    ASCII  COMMENTS
------  -----------  -----  --------
 0      00 00 00 09  ....   -- length of filename = 9
 4      73 69 6c 6c  sill   -- filename characters
 8      79 70 72 6f  ypro   -- ... and more characters ...
12      67 00 00 00  g...   -- ... and 3 zero-bytes of fill
16      00 00 00 02  ....   -- filekind is EXEC = 2
20      00 00 00 04  ....   -- length of interpretor = 4
24      6c 69 73 70  lisp   -- interpretor characters
28      00 00 00 04  ....   -- length of owner = 4
32      6a 6f 68 6e  john   -- owner characters
36      00 00 00 06  ....   -- length of file data = 6
40      28 71 75 69  (qui   -- file data bytes ...
44      74 29 00 00  t)..   -- ... and 2 zero-bytes of fill
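The byte layout above can be reproduced with Python's struct module. This is a sketch, not an XDR library: the helper names are illustrative, and only the two encodings needed for this example (unsigned integer and variable-length byte string) are covered.

```python
import struct

def xdr_uint(value: int) -> bytes:
    """4 bytes, most significant byte first."""
    return struct.pack(">I", value)

def xdr_opaque(data: bytes) -> bytes:
    """4-byte length, then the data, zero-filled to a multiple of 4 bytes."""
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding

EXEC = 2  # filekind value taken from the example above

encoded = (
    xdr_opaque(b"sillyprog")   # filename
    + xdr_uint(EXEC)           # filekind
    + xdr_opaque(b"lisp")      # interpretor
    + xdr_opaque(b"john")      # owner
    + xdr_opaque(b"(quit)")    # file data
)

for offset in range(0, len(encoded), 4):
    print(f"{offset:2d}  {encoded[offset:offset + 4].hex(' ')}")
```

No type tags appear in the output: the receiver must already know that a string comes first, then an integer, and so on. This is the implicit typing that distinguishes XDR from ASN.1.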

Example of an XDR Data Description

Page 59: Pbl1

MIME (Multipurpose Internet Mail Extensions) is a standard designed to expand upon the limited capabilities of email, and in particular to allow documents (such as images, sound, and text) to be inserted in a message.

MIME

Page 60: Pbl1

MIME adds the following features to email service:
- The ability to send multiple attachments with a single message
- Unlimited message length
- Use of character sets other than ASCII
- Use of rich text (layouts, fonts, colors, etc.)
- Binary attachments (executables, images, audio or video files, etc.), which may be divided if needed

Page 61: Pbl1

MIME uses special header directives to describe the format used in a message body, so that the email client can interpret it correctly:

- MIME-Version: The version of the MIME standard used in the message. Currently only version 1.0 exists.
- Content-Type: Describes the data's type and subtype. It can include a "charset" parameter, separated by a semicolon, defining which character set to use.
- Content-Transfer-Encoding: Defines the encoding used in the message body.
- Content-ID: Represents a unique identification for each message segment.
- Content-Description: Gives additional information about the message content.
- Content-Disposition: Defines the attachment's settings, in particular the name associated with the file, using the attribute filename.
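Python's standard email package generates these headers automatically, which makes it a convenient way to see them in context. The addresses, subject, and attachment below are placeholders invented for this sketch.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"      # placeholder addresses
msg["To"] = "bob@example.com"
msg["Subject"] = "Report"
msg.set_content("Report attached.")    # plain-text body
msg.add_attachment(
    b"\x00\x01\x02\x03",               # arbitrary binary payload
    maintype="application",
    subtype="octet-stream",
    filename="report.bin",
)

print(msg["MIME-Version"])             # 1.0
print(msg.get_content_type())          # multipart/mixed
for part in msg.iter_parts():
    print(part.get_content_type(),
          part["Content-Transfer-Encoding"],
          part.get_filename())
```

Printing `msg.as_string()` shows the complete message, including the Content-Type, Content-Transfer-Encoding, and Content-Disposition lines for each part.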

Page 62: Pbl1

Example of MIME header

Page 63: Pbl1

Encryption

Page 64: Pbl1

Introduction

Encryption is a method used to enhance the security of a file or message by scrambling the contents so that it can be read only by someone who has the right key to unscramble it. For example, the information used for a transaction such as purchasing online (e.g., address, phone number, and credit card number) is usually encrypted to help keep it safe.

Page 65: Pbl1

How it works

Page 66: Pbl1

Encryption/Decryption Keys

Symmetric keys: only one key is used; the same key both encrypts and decrypts the information transmitted.
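The defining property of a symmetric scheme, one key for both directions, can be shown with a toy XOR cipher. This is for illustration only and is not a recommendation: the key here is as long as the message and must never be reused, and real systems use vetted ciphers such as AES. The message is made up.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the key byte at the same position."""
    return bytes(d ^ k for d, k in zip(data, key))

message = b"card number 1234"                 # made-up plaintext
key = secrets.token_bytes(len(message))       # shared secret, as long as the message

ciphertext = xor_bytes(message, key)          # encrypt
recovered = xor_bytes(ciphertext, key)        # decrypt with the same key
print(recovered == message)                   # True
```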

Page 67: Pbl1

Encryption/Decryption Keys (cont.)

Asymmetric keys: the receiver's public key is used to encrypt, and the receiver's private key is used to decrypt.
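RSA is one widely used asymmetric scheme; the slides do not name one, so it is used here only as an illustration. The numbers are textbook-sized and far too small to be secure, and real RSA also adds padding that this sketch omits.

```python
# Textbook-sized RSA key pair (illustration only, not secure).
p, q = 61, 53
n = p * q                            # 3233, shared by both keys
e = 17                               # public exponent: (e, n) is the public key
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent: (d, n) is the private key

message = 65
ciphertext = pow(message, e, n)      # sender encrypts with the receiver's public key
recovered = pow(ciphertext, d, n)    # receiver decrypts with the private key

print(d, ciphertext, recovered)      # 2753 2790 65
```

Anyone may know (e, n) and encrypt, but only the holder of d can decrypt.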

Page 68: Pbl1

Advantages

- Preserves confidentiality of the file or message.
- Saves money on extra protection software, as the machine that uses the encrypted message does not have to be secured.

Page 69: Pbl1

Disadvantages

- If the key to unlock the encrypted file is lost, the data is no longer protected and could also be lost.
- Overall performance of the machine that uses the data will decrease, since encryption takes a lot of energy, processing, and computing power.
- It is difficult to use the encrypted message, as some limitations have been placed on it.