Computer Science 335 Data Compression

Computer Science 335 Data Compression. Compression.. Magic or Science Only works when a MORE EFFICIENT means of encoding can be found Special assumptions.

Dec 20, 2015


Computer Science 335

Data Compression


Compression: Magic or Science?

• Only works when a MORE EFFICIENT means of encoding can be found

• Special assumptions must be made about the data in many cases in order to gain compression benefits

• “Compression” can lead to larger files if the data does not conform to assumptions


Why compress?

• In files on a disk
– save disk space

• In internet access
– reduce wait time

• In a general queueing system
– paybacks can be more than linear if operation is nearing or in saturation


A typical queueing graph

[Figure: delay versus load. Annotated example: a 25% load decrease gives a 66% delay decrease.]
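The more-than-linear payback can be reproduced with a simple model. The sketch below assumes an M/M/1 queue, where delay is proportional to 1/(1 - load). The slides do not name a queueing model, so that choice and the function names `relative_delay` and `delay_decrease` are illustrative assumptions.

```python
def relative_delay(load):
    """Delay in units of the service time, ASSUMING an M/M/1 queue."""
    if not 0 <= load < 1:
        raise ValueError("load must be in [0, 1)")
    return 1.0 / (1.0 - load)


def delay_decrease(load, load_cut):
    """Fractional drop in delay when the load is cut by the fraction load_cut."""
    before = relative_delay(load)
    after = relative_delay(load * (1.0 - load_cut))
    return 1.0 - after / before


# The closer the system is to saturation, the more the same 25% cut pays back.
for load in (0.5, 0.8, 0.9):
    print(f"load {load:.2f}: delay falls by {delay_decrease(load, 0.25):.0%}")
```

Under this assumed model, compression that trims the load by a quarter helps only a little on a lightly loaded link and a great deal on one that is nearly saturated.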


Example

• ASCII characters require 7 bits

• Data may not use ALL characters in the ASCII set
– consider just the digits 0..9

• Only 10 values -> really only requires 4 bits

• There is actually a well-used code for this, which also allows for +/-: BCD (binary-coded decimal)
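As a minimal sketch of the 4-bits-per-digit idea, the code below packs two decimal digits into each byte. The helper names and the use of 0xF as padding for an odd digit count are assumptions made for illustration; the +/- handling mentioned above is left out.

```python
def pack_digits(text):
    """Pack a string of decimal digits into bytes, two 4-bit digits per byte."""
    if not all(c in "0123456789" for c in text):
        raise ValueError("only the digits 0..9 are supported")
    nibbles = [int(c) for c in text]
    if len(nibbles) % 2:
        nibbles.append(0xF)  # padding nibble (assumed convention, not a digit)
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def unpack_digits(data):
    """Recover the digit string, skipping any padding nibble."""
    digits = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            if nibble <= 9:
                digits.append(str(nibble))
    return "".join(digits)


packed = pack_digits("12345")
print(len("12345"), "characters ->", len(packed), "bytes")
print(unpack_digits(packed))
```

The saving only holds because the data is known to contain nothing but digits, which is exactly the kind of special assumption compression depends on.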


Other Approaches


Run length encoding

• Preface each run with an 8-bit length byte

• aaabbabbbccdddaaaa -> 18 bytes

• 3a2b1a3b2c3d4a -> 14 bytes

• Benefit from runs of 3 or more
– aaa versus 3a

• No gain or loss for runs of 2
– aa versus 2a

• Lose on single characters
– a versus 1a
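The scheme can be sketched in a few lines. This is one possible implementation, not the course's own: the function names are hypothetical, and runs longer than 255 are split because a single length byte cannot hold more.

```python
from itertools import groupby


def rle_encode(data: bytes) -> bytes:
    """Encode each run as a length byte followed by the repeated value."""
    out = bytearray()
    for value, run in groupby(data):
        remaining = len(list(run))
        while remaining > 0:
            chunk = min(remaining, 255)  # one length byte holds at most 255
            out += bytes([chunk, value])
            remaining -= chunk
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (length, value) pairs back into the original bytes."""
    out = bytearray()
    for i in range(0, len(encoded), 2):
        out += bytes([encoded[i + 1]]) * encoded[i]
    return bytes(out)


original = b"aaabbabbbccdddaaaa"
encoded = rle_encode(original)
print(len(original), "bytes ->", len(encoded), "bytes")
```

Data with no runs, such as `b"abc"`, doubles in size under this scheme, which is the "compression can lead to larger files" case.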


Facsimile compression (an application of run-length encoding)

• Image is decomposed into black/white pixels

• Lots of long runs of black and white pixels

• Don’t encode each pixel, but runs of pixels


Differential encoding

• Values between 1000 and 1050
– 1050 requires 11 bits
– difference plus +/- requires 7 bits

• 6 bits -> 64 values

• 1 additional bit for direction (+/-)

• Differential encoding can lead to problems, as each value is relative to the last value.
– Like directions: one wrong turn and everything else is irrelevant.
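A small sketch makes both the saving and the "one wrong turn" problem concrete. The sample readings and function names are made up for illustration; the first value is sent in full and every later value as a signed difference.

```python
def diff_encode(values):
    """Keep the first value, then store each value as a change from the last."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def diff_decode(encoded):
    """Rebuild the values by accumulating the differences."""
    out, total = [], 0
    for delta in encoded:
        total += delta
        out.append(total)
    return out


readings = [1000, 1010, 1050, 1020, 1001]  # hypothetical sample data
encoded = diff_encode(readings)
print(encoded)

# Corrupt a single difference: every value from that point on is wrong.
damaged = list(encoded)
damaged[2] += 1
print(diff_decode(damaged))
```

Each difference here fits in 6 bits plus a sign bit, but a single damaged difference shifts every value decoded after it.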


Frequency Based Encoding

• Huffman
– Encoding is not the same length for all values
– Short codes for frequently occurring symbols
– Longer codes for infrequently occurring symbols

• Arithmetic (not responsible for this)
– Interpret a string as a real number
– Infinite number of values between 0 and 1
– Divide the region up based on frequency
– If A -> 12% and B -> 5%, A is 0 to 0.12 and B is 0.12 to 0.17
– Limited by the fact that a computer has limited precision


Huffman (more details)


Huffman encoding

• Must know distribution of symbols

• Codes typically have DIFFERENT lengths for different symbols, unlike most schemes you have seen (ASCII, etc.)

• Characters occurring most have shortest code

• Characters occurring least have longest

• Solution minimal but not unique


Assume following data

A -> 30%
B -> 20%
C -> 10%
D -> 5%
E -> 35%


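Let's peek at the answer

[Figure: Huffman tree. From the root, branch 1 leads to E and branch 0 to a subtree; there, 0 leads to A and 1 to a further subtree; there, 0 leads to B and 1 to a final subtree with C on 0 and D on 1.]

Note that you read the encoding of a character from top to bottom.

For example, C is 0110.

Also note that the choice of 0 or 1 for a branch is arbitrary.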


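Build the solution tree

Choose the smallest two at a time and group

Start: A 30, B 20, C 10, D 5, E 35

1. C (10) + D (5) -> 15
2. 15 + B (20) -> 35
3. 35 + A (30) -> 65 (this could choose E and A instead!)
4. 65 + E (35) -> 100

[Figure: the resulting tree, with E on branch 1 of the root, then A, then B, and finally C and D.]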


And the binary encoding..

[Figure: the same Huffman tree with its branch labels.]

A 00
B 010
C 0110
D 0111
E 1


Compute expected length

Symbol  Code  Frequency
A       00    30%
B       010   20%
C       0110  10%
D       0111  5%
E       1     35%

Expected bits per character
= 0.3*2 + 0.2*3 + 0.1*4 + 0.05*4 + 0.35*1
= 0.6 + 0.6 + 0.4 + 0.2 + 0.35
= 2.15

Each symbol has an average length of 2.15 bits

A fixed-length code for 5 values would need 3 bits
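The build-and-measure steps can be sketched with a priority queue. This is an illustrative implementation, not the course's own: the function name and the tie-breaking counter are assumptions. Ties may be broken differently than on the slides, so the individual codes can differ while the expected length stays the same.

```python
import heapq
from itertools import count


def huffman_codes(weights):
    """Return {symbol: bit string} by repeatedly grouping the two smallest."""
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, group0 = heapq.heappop(heap)
        w1, _, group1 = heapq.heappop(heap)
        # Prefix one group with 0 and the other with 1 (the choice is arbitrary).
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


weights = {"A": 30, "B": 20, "C": 10, "D": 5, "E": 35}  # percentages from the slides
codes = huffman_codes(weights)
expected_bits = sum(weights[s] * len(codes[s]) for s in weights) / 100

for symbol in sorted(codes):
    print(symbol, codes[symbol])
print("expected bits per character:", expected_bits)
```

With this tie-breaking, A and E are grouped at the third step, the alternative noted on the tree-building slide. The code lengths change but the average does not, which is what "minimal but not unique" means.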


Is it hard to interpret a message?

A 00
B 010
C 0110
D 0111
E 1

Message example:

00 1 010 00 0111
A  E B   A  D

NOT REALLY!

What if the last 1 in the message were missing? -> illegal

While a message is never ambiguous, illegal messages are possible
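Decoding can be sketched as reading bits until they match a code. The table is the one from the slides; the function name and the use of `ValueError` for an illegal message are illustrative choices.

```python
CODES = {"A": "00", "B": "010", "C": "0110", "D": "0111", "E": "1"}


def decode(bits, codes=CODES):
    """Decode a bit string; no code is a prefix of another, so matches are unique."""
    by_code = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in by_code:
            out.append(by_code[current])
            current = ""
    if current:
        # Bits left over that do not complete any code: the message is illegal.
        raise ValueError(f"illegal message: leftover bits {current!r}")
    return "".join(out)


print(decode("001010000111"))

try:
    decode("00101000011")  # same message with the last 1 missing
except ValueError as err:
    print(err)
```

The truncated message decodes A, E, B, A and then stalls on `011`, which is a prefix of C and D but not a complete code.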


Observations of Huffman

• Method creates a shorter code

• Assumes knowledge of symbol distribution

• Different symbols get codes of different lengths

• Knowing distribution ahead of time is not always possible!

• Another version of Huffman coding can solve that problem


Revisiting Facsimiles

• Huffman says one can minimize by assigning different length codes to symbols

• Fax transmissions can use this principle to give short codes to long runs of white/black pixels

• Run-length combined with Huffman

• See Table 5.7 in the text


Table 5.7

TERMINATING
Length  White     Black
0       00110101  000110111
1       000111    010
2       0111      11
3       1000      10

MAKEUP
Length  White    Black
64      11011    000001111
128     10011    000011001000
256     0110111  000001011011

Example:

66 white -> 64 + 2 = 11011 0111 = 110110111
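The makeup-plus-terminating lookup can be sketched as follows. The dictionaries hold only the white-run entries from the Table 5.7 excerpt, copied as given and not checked against the fax standard. The function name is hypothetical, and any run the excerpt cannot express raises an error.

```python
# White-run codes copied from the Table 5.7 excerpt (only these entries).
WHITE_TERMINATING = {0: "00110101", 1: "000111", 2: "0111", 3: "1000"}
WHITE_MAKEUP = {64: "11011", 128: "10011", 256: "0110111"}


def encode_white_run(length):
    """Encode one white run as an optional makeup code plus a terminating code."""
    # Largest makeup length that fits; 0 means no makeup code is needed.
    makeup = max((m for m in WHITE_MAKEUP if m <= length), default=0)
    remainder = length - makeup
    if remainder not in WHITE_TERMINATING:
        raise ValueError(f"run of {length} needs codes outside the excerpt")
    return WHITE_MAKEUP.get(makeup, "") + WHITE_TERMINATING[remainder]


print(encode_white_run(66))  # 64 + 2 -> 11011 followed by 0111
```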


Multimedia compression

• Many of these include techniques that result in “lossy” compression: uncompressing does not restore all of the original information.

• This loss is tolerated because whether the inaccuracy is noticeable depends only on human perception
– Video
– Pictures
– Audio

• Other techniques give compression ratios of 2-3:1

• Multimedia needs compression ratios of 10-20:1

• The compression rates achieved by lossy techniques are a tradeoff against the information lost

• Techniques
– JPEG: pictures
– MPEG: motion pictures
– MP3: music


Image compression

• Represented as RGB
– 8 bits typical for each color

• Or as luminance (brightness, 8 bits) and chrominance (color, 16 bits)

• Human perception reacts significantly to brightness in addition to color

• Really two ways to represent the same thing

Y = 0.30R + 0.59G + 0.11B (luminance)
I = 0.60R - 0.28G - 0.32B (color)
Q = 0.21R - 0.52G + 0.31B (color)
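The three formulas translate directly into code. The function name is an illustrative choice; the coefficients are the ones given above.

```python
def rgb_to_yiq(r, g, b):
    """Convert one RGB pixel to luminance (Y) and two chrominance values (I, Q)."""
    y = 0.30 * r + 0.59 * g + 0.11 * b
    i = 0.60 * r - 0.28 * g - 0.32 * b
    q = 0.21 * r - 0.52 * g + 0.31 * b
    return y, i, q


# A gray pixel carries brightness only: both color terms come out near zero.
print(rgb_to_yiq(128, 128, 128))
print(rgb_to_yiq(255, 0, 0))
```

The I and Q coefficients each sum to zero, so any pixel with R = G = B has no chrominance, which is why the two representations describe the same thing.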


JPEG

Image -> DCT Phase -> Quantization Phase -> Encoding Phase -> Compressed Image

So how does this work?


JPEG algorithm

1. Consider 8x8 blocks at a time
2. Create three 8x8 arrays with the color values of each pixel (for RGB)
3. Now go through a complex transformation (theory beyond us)
4. When you finish the transformation, the numbers in the upper left indicate little variation in color in the block; values further away from [0,0] indicate large color variation
- see Fig 5.10: top one with small variation, bottom with large
5. Simplify the numbers in the result (eliminate small values) by dividing by an integer and then truncating
- see Eqn 5-5
- value is different for each term and application dependent
6. Use encoding (run-length) and an odd pattern (Fig 5.11) to compress

I don’t expect you to do this on a test, but it shows how JPEG is lossy.
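Step 5 is where the information is lost, and a tiny sketch shows why. The coefficient and divisor values below are invented for illustration; they are not taken from Eqn 5-5 or from any real JPEG table, and the function names are hypothetical.

```python
def quantize(coefficients, divisors):
    """Divide each term by its own integer and truncate toward zero."""
    return [int(c / d) for c, d in zip(coefficients, divisors)]


def dequantize(quantized, divisors):
    """Best-effort reconstruction: multiply back by the same divisors."""
    return [q * d for q, d in zip(quantized, divisors)]


coefficients = [620, -47, 12, 5, -3]  # hypothetical transformed values
divisors = [10, 12, 16, 20, 24]       # hypothetical per-term divisors

quantized = quantize(coefficients, divisors)
restored = dequantize(quantized, divisors)
print(quantized)
print(restored)
```

The small values collapse to zero and cannot be recovered, and the resulting runs of zeros are what makes the run-length step in item 6 effective.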


MPEG

• Uses differential encoding to compare successive frames of a motion picture.

• Three kinds of frames:
– I -> JPEG complete image
– P -> incremental change to I (where block moves); ½ the size of I
– B -> uses a different interpolation technique; ¼ the size of I

• Typical sequence -> I B B P B B I ….


MP3

• Music/audio compression

• Uses psychoacoustic principles
– Some sounds can’t be heard because they are drowned out by other, louder sounds (frequencies)

• Divide the sound into smaller subbands

• Eliminate sounds you can’t hear anyway because others are too loud

• 3 types with varying compression
– Layer 1: 4:1, 192K
– Layer 2: 8:1, 128K
– Layer 3: 12:1, 64K