Top Banner
1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering
28

1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

Dec 20, 2015

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

1

Data CompressionEngineering Math Physics (EMP)

Steve Lyon

Electrical Engineering

Page 2: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

2

Why Compress?

• Digital information is represented in bits– Text: characters (each encoded as a number)– Audio: sound samples– Image: pixels

• More bits means more resources– Storage (e.g., memory or disk space)– Bandwidth (e.g., time to transmit over a link)

• Compression reduces the number of bits– Use less storage space (or store more items)– Use less bandwidth (or transmit faster)– Cost is increased processing time/CPU hardware

Page 3: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

• Video– TV – 640x480 pixels (ideal US broadcast TV)– 3 colors/pixel (Red, Green, Blue)– 1 byte (values from 0 to 255) for each color ~900,000 bytes per picture (frame)– 30 frames/second ~27MB/sec– DVD holds ~5 GB Can store ~3 minutes of uncompressed video on a DVD

Must compress

3

Do we really need to compress?

Page 4: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

4

Compression Pipeline

• Sender and receiver must agree– Sender/writer compresses the raw data– Receiver/reader un-compresses the compressed data

• Example: digital photography

compress uncompress

compress uncompress

Page 5: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

5

Two Kinds of Compression

• Lossless– Only exploits redundancy in the data– So, the data can be reconstructed exactly– Necessary for most text documents (e.g., legal

documents, computer programs, and books)

• Lossy– Exploits both data redundancy and human perception– So, some of the information is lost forever– Acceptable for digital audio, images, and video

Page 6: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

6

Lossless: Huffman Encoding

• Normal encoding of text– Fixed number of bits for each character

• ASCII with seven bits for each character– Allows representation of 27=128 characters– Use 97 for ‘a’, 98 for ‘b’, …, 122 for ‘z’

• But, some characters occur more often than others– Letter ‘a’ occurs much more often than ‘x’

• Idea: assign fewer bits to more-popular symbols – Encode ‘a’ as “000”– Encode ‘x’ as “11010111”

Page 7: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

7

Lossless: Huffman Encoding

• Challenge: generating an efficient encoding– Smaller codes for popular characters– Longer codes for unpopular characters

English Text: frequency distribution

Morse code

Page 8: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

8

Lossless: Run-Length Encoding

• Sometimes the same symbol repeats– Such as “eeeeeee” or “eeeeetnnnnnn”– That is, a run of “e” symbols or a run of “n” symbols

• Idea: capture the symbol only once– Count the number of times the symbol occurs– Record the symbol and the number of occurrences

• Examples– So, “eeeeeee” becomes “@e7”– So, “eeeeetnnnnnn” becomes “@e5t@n6”

• Useful for fax machines– Lots of white, separate by occasional black

Page 9: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

9

Image Compression

• Benefits of reducing the size– Consume less storage space and network bandwidth– Reduce the time to load, store, and transmit the image

• Redundancy in the image– Neighboring pixels often the

same, or at least similar– E.g., the blue sky

• Human perception factors– Human eye is not sensitive

to high spatial frequencies

Page 10: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

Approximating arbitrary functions (curves)

How can we represent some

arbitrary function by some

simple ones?

Ex. This mountain range

Page 11: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

Approximating with a sum of cosines

n=1½ wavelength2 ½ wavelengths

n=5n=15

7 ½ wavelengthsconstantn=0

Page 12: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

Approximation with 5 terms

Page 13: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

Approximation with 15 terms

Page 14: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

Approximation with 45 terms

Page 15: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

Approximation with 145 terms

Page 16: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

16

Discrete cosine transform

• How do we determine the coefficients of each term? – How much of “3 wavelengths” vs. “47 wavelengths”?– Look at the fit and tweak the coefficients?

– Maybe for a couple– Insane for 145

• Idea: look at

• = 0 if n m• = /2 if n = m (or if n = m = 0)

• So, if

• Then

0

)cos()cos( dmn

)cos()(0

nxfxfn

n

dxxnxffn )cos()(2

0

Page 17: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

Often, most of the information is in the first few fn

low frequencies

Ex. “filter” and keep only the low frequencies

compression

Can manipulate the Fourier coefficients (fn)

17

Page 18: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

Periodic functions

18

Produces a periodic curve:

Cosine transforms particularly good forrepresenting periodic signals

- Like sound (music)

Page 19: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

19

Example: Digital Audio

• Sampling the analog signal– Sample at some fixed rate – Each sample is an arbitrary real number

• Quantizing each sample– Round each sample to one of a finite number of values– Represent each sample in a fixed number of bits

4 bit representation(values 0-15)

Page 20: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

20

Example: Digital Audio

• Speech– Sampling rate: 8000 samples/second– Sample size: 8 bits per sample– Rate: 64 kbps

• Compact Disc (CD)– Sampling rate: 44,100 samples/second– Sample size: 16 bits per sample– Rate: 705.6 kbps for mono,

1.411 Mbps for stereo

Page 21: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

21

Example: Digital Audio

• Audio data requires too much bandwidth – Speech: 64 kbps is too high for a dial-up modem user– Stereo music: 1.411 Mbps exceeds most access rates

• Compression to reduce the size– Remove redundancy– Remove details that humans tend not to perceive

• Example audio formats– Speech: GSM (13 kbps), G.729 (8 kbps), and G.723.3

(6.4 and 5.3 kbps)– Stereo music: MPEG 1 layer 3 (MP3) at 96 kbps, 128

kbps, and 160 kbps

Page 22: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

22

108 KB 34 KB

8 KB

Joint Photographic Experts Group (JPEG)

Lossy compression

Page 23: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

23

Contrast Sensitivity Curve

Page 24: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

• Digital cameras (CCDs) output RGB– Eyes most sensitive to intensity– Less sensitive to color variations

• Convert image to YCbCr– Y = intensity ~ (R+G+B)

Gives black & white B&W TV’s could use that when color TV first came out

– Cb ~ (B – Y)– Cr ~ (R – Y)

• Sometimes leave as RGB – gives poorer quality jpeg 24

How JPEG works 1

Page 25: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

25

How JPEG works 2• Either RGB or YCbCr gives 3 8-bit “planes”

– Process separately

• Process image in 8-pixel x 8-pixel blocks– 2-dimensional discrete Fourier Transform (DCT) N1 = N2 = 8

– Just matrix multiplication– Produces 8x8 matrix (B) of spatial frequencies– “Quantize” divide each element by fixed number

High-frequency coefficients divided by larger number If result is small, set to 0 (the lossy part) Can be “lossier” on Cb and Cr than on Y

• Lossless compression to squeeze out the 0’s

Page 26: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

26

151 -9 -1 0

5 0 0 0

1 0 0 0

0 0 0 0

62 65 72 42

45 60 66 51

58 52 60 42

82 65 76 53

452 -60 -7 2

32 2 -8 3

-20 7 1 0

5 -1 0 5

3 7 11 15

7 11 15 19

11 15 19 23

15 19 23 27

Block of pixels (really 8 by 8) 2D DCT of Block

Quantization Matrix(accentuate the low frequencies)

Quantized Pixel Matrix

2D DiscreteCosine

Transform(DCT)

Division andRounding

Lowfrequency

Highfrequency

Page 27: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

JPEG Artifacts

27

JPEG does not compress text or diagrams well. Here same file size as lossless compression – gif Get “halos” around letters, lines, etc.

Lines and text have sharp edges JPEG smears

Get “blotchy” appearance when heavily compressed

Have 8x8 blocks of all one color – only constant term in DCT remained

Page 28: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.

28

Conclusion

• “Raw” digital information often has many more bits than necessary– Redundancies and patterns we can use– Information that is imperceptible to people

• Lossless compression– Used when must be able to exactly recreate original– Find common patterns (letter frequencies, repeats, etc.)

• Lossy Compression– Can get very large compression ratios – a few to 1000’s– Exploit redundancy and human perception

Remove information we (people) don’t need

– Too much compression degrades the signals