Chapter 3 Data Representation

Chapter 3 Data Representation. 2 Data and Computers Computers are multimedia devices, dealing with many categories of information. Computers store, present,

Mar 28, 2015


Aniya Broadnax

Chapter 3

Data Representation


Data and Computers

Computers are multimedia devices, dealing with many categories of information. Computers store, present, and help modify:

Numbers
Text
Audio
Images and graphics
Video


Analog and Digital Information

Computers are finite. Computer memory and other hardware devices have room to store and manipulate only a limited amount of data. The goal of data representation is to represent enough of the world to satisfy our computational needs and our senses of sight and sound.


Analog and Digital Information

Information can be represented in one of two ways: analog or digital.

Analog data: A continuous representation, analogous to the actual information it represents.

Digital data: A series of discrete representations, breaking the information up into separate elements.


Analog and Digital Information

A mercury thermometer exemplifies analog data as it continually rises and falls in direct proportion to the temperature.

Digital displays only show discrete information.


Analog and Digital Information

Computers cannot work well with analog information, so we digitize it by sampling it at discrete intervals and representing each interval by a numeric value.


Electronic Signals

An analog signal continually fluctuates up and down in voltage. But a digital signal has only a high or low state, corresponding to the two binary digits.

An analog and a digital signal


Electronic Signals

All electronic signals (both analog and digital) degrade as they move down a line. That is, the voltage of the signal fluctuates due to environmental effects.

Degradation of analog and digital signals


Electronic Signals (Cont’d)

Even when it has deteriorated, it is possible to distinguish the 2 states of a digital signal by comparison to the threshold.

Periodically, a digital signal can be reclocked to regain its original shape.

No such process is available for analog signals.
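The threshold comparison described above can be sketched in a few lines. The voltage levels (0 V and 5 V), the 2.5 V threshold, and the sample readings are illustrative assumptions, not values from the slides.

```python
def regenerate(voltages, threshold=2.5):
    """Map each sampled voltage to a bit by comparing it to the threshold."""
    return [1 if v >= threshold else 0 for v in voltages]

# Hypothetical readings of a degraded signal that was sent as 1, 0, 1, 1, 0
# with nominal levels of 5 V (high) and 0 V (low).
degraded = [4.1, 0.9, 3.6, 4.8, 1.7]

bits = regenerate(degraded)
clean = [5.0 if b else 0.0 for b in bits]  # the signal restored to its nominal levels

print(bits)   # [1, 0, 1, 1, 0]
print(clean)  # [5.0, 0.0, 5.0, 5.0, 0.0]
```

As long as the noise never pushes a reading across the threshold, the original bits are recovered exactly.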


Representing Text


Representing Text

To represent a text document in digital form, we need to be able to represent every possible character that may appear.

There is a finite number of characters to represent, so the general approach is to list them all and assign each a binary string.


Representing Text

A character set is a list of characters and the codes used to represent each one.

In 1960, a survey revealed 60 different character sets in use. At IBM alone there were 9 different sets.

By agreeing to use ONE particular character set, computer manufacturers have made the processing of text data easier.


The ASCII Character Set

ASCII stands for American Standard Code for Information Interchange.

The ASCII character set originally used seven bits to represent each character, allowing for 128 unique characters.

Wikipedia has an excellent entry on ASCII.


The ASCII Character Set (7 bit)


The ASCII Character Set

Notice the organisation of the ASCII table. The table divides in half according to the MSB.

Letters are all in the second half, so all codes for alphabetic characters start with 1.

This second half of the table divides in half again according to the next bit:
UPPERCASE letters start 10.
lowercase letters start 11.

The first half of the table also divides in half according to the next bit:
Control characters start 00.
Numerals and punctuation start 01.
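A short sketch of this organisation: shifting a 7-bit code right by five places leaves only its two leading bits, which select one of the four groups. The function name `ascii_group` is illustrative, and the group names follow the slide (each group also holds a few symbols such as @ or ~).

```python
def ascii_group(ch):
    """Classify a 7-bit ASCII character by its two most significant bits."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    top_two = code >> 5  # keep only the two leading bits of the 7-bit code
    return {
        0b00: "control characters",
        0b01: "numerals and punctuation",
        0b10: "uppercase letters",
        0b11: "lowercase letters",
    }[top_two]

for ch in ["\t", "7", "?", "M", "j"]:
    print(repr(ch), format(ord(ch), "07b"), ascii_group(ch))
```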


The ASCII Character Set

Note that control characters (the first 32 in the ASCII character set) do not have simple character representations that you could print to the screen.

Many, however, perform actions with which you are familiar.

Some have their own keys, others need to be constructed.


The ASCII Character Set

Control Characters

Control sequences are created by holding the Ctrl (control) key and pressing a letter.

This has the effect of subtracting 64 from the ASCII value of the letter pressed.

For example: ‘M’ has ASCII value 77 (1001101 in binary), and Ctrl-M has ASCII value 13 (0001101 in binary).

Alternatively, we can see this as “masking bit 6”: clearing the leading bit of the 7-bit code, which is worth 64.
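The masking view can be checked directly. In this sketch the helper name `ctrl` is an assumption; it clears the bit worth 64 with a bitwise AND, which for uppercase letters gives the same result as subtracting 64.

```python
def ctrl(letter):
    """Return the control code produced by Ctrl plus a letter."""
    code = ord(letter.upper())
    return code & 0b0111111  # clear the leading bit (worth 64), keep the low six bits

print(ord("M"), format(ord("M"), "07b"))    # 77 1001101
print(ctrl("M"), format(ctrl("M"), "07b"))  # 13 0001101
```

Code 13 is the carriage return, which is why Ctrl-M behaves like the Enter key in many terminals.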


The ASCII Character Set

Common Control Characters

Hex  Binary   Decimal  Name  Function
00   0000000  0        NUL   Null
07   0000111  7        BEL   Bell
08   0001000  8        BS    Backspace
09   0001001  9        HT    Horizontal Tab
0A   0001010  10       LF    Line Feed
0B   0001011  11       VT    Vertical Tab
0C   0001100  12       FF    Form Feed
0D   0001101  13       CR    Carriage Return
0E   0001110  14       SO    Shift Out
0F   0001111  15       SI    Shift In
1B   0011011  27       ESC   Escape


The ASCII Character Set

Coding letters in ASCII is easy.

Let’s look at ‘j’ as an example:

Since ‘j’ is a letter, its code starts with a 1.

Since it’s lowercase, the next bit is also a 1.

Since it’s the tenth letter of the alphabet the rest of the code is 01010.

The complete ASCII code for ‘j’ is 1101010.
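The same construction works for any letter. The sketch below assembles the code from the three parts just described and compares it with the value Python reports; `letter_code` is an illustrative name.

```python
import string

def letter_code(ch):
    """Assemble the 7-bit ASCII code of a letter from prefix and alphabet position."""
    prefix = "11" if ch.islower() else "10"                  # letter bit, then case bit
    position = string.ascii_lowercase.index(ch.lower()) + 1  # 'a' is letter 1
    return prefix + format(position, "05b")                  # position as five bits

code = letter_code("j")
print(code)                              # 1101010
print(code == format(ord("j"), "07b"))   # True
```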


The ASCII Character Set

ASCII evolved so that eight bits are used.

The 7-bit codes were simply prefixed with another bit, giving another natural doubling.

The original 7-bit codes were padded with 0. So the code for ‘j’ became 01101010.

128 new characters were added. The codes for this alternate character set start with 1.


The Unicode Character Set

Even the extended version of the ASCII character set is not enough for international use.

The Unicode character set was designed around 16 bits per character, which can represent 2^16, or over 65 thousand, characters. (Later versions of the standard extend beyond this 16-bit range.)

Unicode was designed to be a superset of ASCII. That is, the first 256 characters in the Unicode character set correspond exactly to the extended ASCII character set.
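The superset relationship can be checked with a small sketch. It assumes that "extended ASCII" means ISO 8859-1 (Latin-1); other 8-bit extensions of ASCII assign the upper 128 codes differently.

```python
# ord() gives a character's Unicode code point; chr() is the inverse.
print(ord("j"))                    # 106, the same value as its ASCII code
print(format(ord("j"), "016b"))    # 0000000001101010, the code written as 16 bits

# Assumption: "extended ASCII" is taken to mean ISO 8859-1 (Latin-1).
# Under that encoding, byte value i and Unicode code point i name the same character.
same = all(chr(i).encode("latin-1") == bytes([i]) for i in range(256))
print(same)                        # True

print(2 ** 16)                     # 65536 distinct 16-bit codes
```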


Examples of Unicode Characters

Figure 3.6 A few characters in the Unicode character set


Compressing Files


Data Compression

It is important that we find ways to store and transmit data efficiently, which leads computer scientists to find ways to compress it.

Data compression is a reduction in the amount of space needed to store a piece of data.

Compression ratio is the size of the compressed data divided by the size of the original data.


Data Compression

A data compression technique can be
lossless, which means the data can be retrieved without any loss of the original information, or
lossy, which means some information may be lost in the process of compaction.

As examples, consider these 3 techniques:
keyword encoding
run-length encoding
Huffman encoding


Keyword Encoding

Frequently used words are replaced with a single character. For example…

Note that the characters used to encode cannot be part of the original text.


Keyword Encoding

Consider the following paragraph:

The human body is composed of many independent systems, such as the circulatory system, the respiratory system, and the reproductive system. Not only must each system work independently, they must interact and cooperate as well. Overall health is a function of the well-being of separate systems, as well as how those separate systems work in concert.


Keyword Encoding

This version highlights the words that can be replaced.

The human body is composed of many independent systems, such as the circulatory system, the respiratory system, and the reproductive system. Not only must each system work independently, they must interact and cooperate as well. Overall health is a function of the well-being of separate systems, as well as how those separate systems work in concert.


Keyword Encoding

This is the encoded paragraph:

The human body is composed of many independent systems, such ^ ~ circulatory system, ~ respiratory system, + ~ reproductive system. Not only & each system work independently, they & interact + cooperate ^ %. Overall health is a function of ~ %-being of separate systems, ^ % ^ how # separate systems work in concert.


Keyword Encoding

There are a total of 349 characters in the original paragraph including spaces and punctuation.

The encoded paragraph contains 314 characters, resulting in a savings of 35 characters.

The compression ratio for this example is 314/349 or approximately 0.9.
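A minimal sketch of keyword encoding follows. The substitution table is inferred from the encoded paragraph, since the slide's own table is not reproduced in this transcript, so treat it as an assumption. Matching is case-sensitive and limited to whole words, which is why the capitalised "The" is left alone.

```python
import re

# Assumed table, inferred from the encoded paragraph (not taken from the slide).
KEYWORDS = {"as": "^", "the": "~", "and": "+", "must": "&", "well": "%", "those": "#"}
SYMBOLS = {symbol: word for word, symbol in KEYWORDS.items()}

def encode(text):
    """Replace each whole-word keyword with its single-character symbol."""
    pattern = r"\b(" + "|".join(KEYWORDS) + r")\b"
    return re.sub(pattern, lambda m: KEYWORDS[m.group(1)], text)

def decode(text):
    """Expand each symbol back to its keyword (symbols must not occur in the original)."""
    pattern = "[" + re.escape("".join(SYMBOLS)) + "]"
    return re.sub(pattern, lambda m: SYMBOLS[m.group(0)], text)

def compression_ratio(compressed_size, original_size):
    return compressed_size / original_size

sample = "Not only must each system work independently, they must interact and cooperate as well."
packed = encode(sample)
print(packed)
print(decode(packed) == sample)               # True
print(round(compression_ratio(314, 349), 2))  # 0.9, using the counts quoted above
```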


Keyword Encoding

A compression ratio of 0.9 (90%) is NOT very good. The compressed file is 90% the size of the original.

However, there are several ways this can be improved. Can you think of some?


Run-Length Encoding

A single character may be repeated over and over again in a long sequence. This type of repetition doesn’t generally take place in English text, but often occurs in large data streams.

In run-length encoding, a sequence of repeated characters is replaced by:
a flag character,
followed by the repeated character,
followed by a single digit that indicates how many times the character is repeated.


Run-Length Encoding

Some examples:

AAAAAAA

would be encoded as

*A7

*n5*x9ccc*h6 some other text *k8eee

can be decoded into the following original text:

nnnnnxxxxxxxxxccchhhhhh some other text kkkkkkkkeee

Page 34: Chapter 3 Data Representation. 2 Data and Computers Computers are multimedia devices, dealing with many categories of information. Computers store, present,

34

Run-Length Encoding

In the second example, the original text contains 51 characters, and the encoded string contains 35 characters, giving us a compression ratio of 35/51 or approximately 0.69.

Since we are using one character for the repetition count, it seems that we can’t encode repetition lengths greater than nine. However, instead of interpreting the count character as an ASCII digit, we could interpret it as a binary number.
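The scheme can be sketched as a pair of functions. Two choices here are assumptions rather than part of the slides: runs shorter than four characters are copied unchanged (a flagged run costs three characters, so encoding them saves nothing), and the flag character `*` is assumed never to appear in the input.

```python
FLAG = "*"

def rle_encode(text):
    """Replace runs of four or more identical characters with FLAG, character, count."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        run = 1
        # Count the run, capped at 9 so the count fits in a single digit.
        while i + run < len(text) and text[i + run] == ch and run < 9:
            run += 1
        # A flagged run costs three characters, so shorter runs are copied as-is.
        out.append(f"{FLAG}{ch}{run}" if run >= 4 else ch * run)
        i += run
    return "".join(out)

def rle_decode(text):
    """Expand every FLAG, character, digit triple back into a run."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == FLAG:
            out.append(text[i + 1] * int(text[i + 2]))
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

original = "nnnnnxxxxxxxxxccchhhhhh some other text kkkkkkkkeee"
encoded = rle_encode(original)
print(encoded)                         # *n5*x9ccc*h6 some other text *k8eee
print(rle_decode(encoded) == original) # True
print(len(encoded), len(original))     # 35 51
```

Runs longer than nine are split into several flagged groups, which is the limitation the binary-count idea above is meant to relax.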


Huffman Encoding

Why should the blank, which is used very frequently, take up the same number of bits as the character “X”, which is seldom used in text?

Huffman codes use variable-length bit strings to represent each character.

A few characters may be represented by five bits, and another few by six bits, and yet another few by seven bits, and so forth.


Huffman Encoding

If we use only a few bits to represent characters that appear often and reserve longer bit strings for characters that don’t appear often, the overall size of the document being represented will be smaller.


Huffman Encoding

An example of a Huffman alphabet


Huffman Encoding

DOORBELL would be encoded in binary as 1011110110111101001100100.

If we used a fixed-size bit string to represent each character (say, 8 bits), then the binary form of the original string would be 64 bits.

The Huffman encoding for that string is 25 bits long, giving a compression ratio of 25/64, or approximately 0.39.

An important characteristic of any Huffman encoding is that no bit string used to represent a character is the prefix of any other bit string used to represent a character.
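The prefix property is what makes decoding possible without separators, as the sketch below shows. The code table is an assumption: the slide's figure is not reproduced in this transcript, so the table was chosen to be prefix-free and to reproduce the 25-bit string quoted for DOORBELL.

```python
# Assumed code table (the slide's figure is missing from the transcript).
# It is prefix-free and yields the bit string quoted above for DOORBELL.
CODES = {"A": "00", "E": "01", "L": "100", "O": "110", "R": "111", "B": "1010", "D": "1011"}

def encode(text):
    return "".join(CODES[ch] for ch in text)

def decode(bits):
    """Read bits left to right, emitting a character as soon as a code matches."""
    by_code = {code: ch for ch, code in CODES.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in by_code:  # prefix property: the first match is the only match
            out.append(by_code[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)

bits = encode("DOORBELL")
print(bits)                      # 1011110110111101001100100
print(len(bits), 8 * 8)          # 25 64
print(round(len(bits) / 64, 2))  # 0.39
print(decode(bits))              # DOORBELL
```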


Representing Audio Data


Representing Audio Information

We perceive sound when a series of air compressions vibrate a membrane in our ear, which sends signals to our brain.

A stereo sends an electrical signal to a speaker to produce sound. This signal is an analog representation of the sound wave. The voltage in the signal varies in direct proportion to the sound wave.


Representing Audio Information

To digitize the signal we periodically measure the voltage of the signal and record the appropriate numeric value. The process is called sampling.

In general, a sampling rate of around 40,000 times per second is enough to create a reasonable sound reproduction.

The standard sampling rate for CDs is 44.1 kHz. The Pro Audio standard is 48 kHz.
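Sampling can be sketched by evaluating a waveform at evenly spaced instants and rounding each value to an integer. The 440 Hz test tone and the 16-bit sample size are assumptions chosen for illustration; only the 44.1 kHz rate comes from the text above.

```python
import math

def sample_tone(frequency_hz, duration_s, rate_hz=44_100, bits=16):
    """Sample a sine wave and round each sample to a signed integer level."""
    peak = 2 ** (bits - 1) - 1             # 32767 for 16-bit samples
    count = round(duration_s * rate_hz)    # number of samples to take
    return [
        round(peak * math.sin(2 * math.pi * frequency_hz * n / rate_hz))
        for n in range(count)
    ]

samples = sample_tone(440, 0.01)   # 10 ms of a 440 Hz tone
print(len(samples))                # 441
print(samples[:4])                 # the first few recorded values
```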


Representing Audio Information

Figure 3.8 Sampling an audio signal


Representing Audio Information

It should be noted that the potential loss of peak values suggested in the previous slide is a myth. The time lapse between samples is much too short for any such loss.

The human ear hears sounds between 20 Hz and 20,000 Hz. Sampling at more than twice the highest audible frequency (over 40,000 samples per second, as the 44.1 kHz CD rate does) preserves every frequency in that range.

For a complete explanation refer to the Nyquist–Shannon sampling theorem.


Representing Audio Information

A compact disc (CD) stores audio information digitally. On the surface of the CD are microscopic pits that represent binary digits. A low intensity laser is pointed at the disc. The laser light reflects strongly if the surface is smooth and reflects poorly if the surface is pitted.


Representing Audio Information

Figure 3.9 A CD player reading binary information


Audio Formats

Audio formats include WAV, AU, AIFF, VQF, and MP3.

MP3 is dominant. MP3 is short for MPEG-2, audio layer 3 file. MP3 employs both lossy and lossless compression.

First it analyses the frequency spread and compares it to mathematical models of human psychoacoustics (the study of the interrelation between the ear and the brain), and it discards information that can’t be heard by humans.

Then the bit stream is compressed using a form of Huffman encoding to achieve additional compression.


Representing Graphic Images


Representing Images and Graphics

Colour is our perception of the various frequencies of light that reach the retinas of our eyes.

Our retinas have three types of colour photoreceptor cones which respond to different sets of frequencies. These photoreceptor categories correspond to the colours of red, green, and blue.


Representing Images and Graphics

Colour is often expressed in a computer as an RGB (red, green, blue) value, which is actually three numbers that indicate the relative contribution of each of these three primary colours.

For example, an RGB value of (255, 255, 0) maximizes the contribution of red and green, and minimizes the contribution of blue. The resulting colour is a bright yellow.
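An RGB value can be stored as a single number by giving each component its own group of bits. The sketch below assumes 8 bits per component with red in the highest position, a common convention that the slides do not specify.

```python
def pack_rgb(red, green, blue):
    """Pack three 0-255 components into one 24-bit value (red highest, blue lowest)."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each component must be in 0..255")
    return (red << 16) | (green << 8) | blue

yellow = pack_rgb(255, 255, 0)
print(format(yellow, "024b"))   # 111111111111111100000000
print(f"#{yellow:06X}")         # #FFFF00
```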


Representing Images and Graphics

Figure 3.10 Three-dimensional color space


Representing Images and Graphics


Representing Images and Graphics

The amount of data that is used to represent a colour is called the colour depth.

HiColor is a term that indicates a 16-bit colour depth. Five bits are used for each number in an RGB value and the extra bit is sometimes used to represent transparency.

TrueColor indicates a 24-bit colour depth. Therefore, each number in an RGB value gets eight bits.


Representing Images and Graphics

HiColor uses 5 bits for each number. Since 2^5 = 32, there are 32 different levels for each of the 3 primary colours. So there are 32^3 (or 2^15) possible colours. This is a total of 32,768 different colours.

TrueColor uses eight bits for each colour component. 2^8 × 2^8 × 2^8 = 2^24 or 16,777,216 colours.

Some monitors can use as many as 32 bits for colour depth. This is potentially 4,294,967,296 colours!
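These counts follow from one rule: n bits give 2^n levels, and three independent components multiply. A short check, with `colours` as an illustrative helper name:

```python
def colours(bits_per_component, components=3):
    """Number of distinct colours when each component has the given number of bits."""
    levels = 2 ** bits_per_component
    return levels ** components

print(colours(5))   # 32768     (HiColor: 32 levels per component)
print(colours(8))   # 16777216  (TrueColor: 256 levels per component)
print(2 ** 32)      # 4294967296 (every pattern of a 32-bit value)
```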


Representing Images and Graphics

The human eye is able to distinguish about 200 intensity levels in each of the three primaries red, green, and blue. All in all, up to 10 million different colours can be distinguished.

So modern monitors are examples of solutions without a problem. If the human eye can distinguish only 10 million colours, why develop monitors that can display over 4 billion?


Indexed Colour

A particular application such as a browser may support only a certain number of specific colours, creating a palette from which to choose. For example, Netscape Navigator’s colour palette has only 216 colours.


Digitized Images and Graphics

Digitizing a picture is the act of representing it as a collection of individual dots, called pixels.

The number of pixels used to represent an image is called the resolution.

As an example, the resolution of many monitors is 1024 x 768, or 786,432 pixels.

If the colour of each pixel is stored as 24 bits (3 bytes) of data, the screen alone requires 2,359,296 bytes (about 2.25 megabytes) of memory.


Digitized Images and Graphics

Figure 3.12 A digitized picture composed of few individual pixels


Digitized Images and Graphics

Figure 3.12 A digitized picture composed of many individual pixels


Digitized Images and Graphics

The storage of image information on a pixel-by-pixel basis is called a raster-graphics format.

There are several popular raster file formats including:
BMP (bitmap)
GIF (Graphics Interchange Format)
JPEG (Joint Photographic Experts Group)


Vector Graphics

Instead of assigning colours to pixels as we do in raster graphics, a vector-graphics format describes an image in terms of lines and geometric shapes.

A vector graphic is a series of commands that describe a line’s direction, thickness, and colour. The file size for these formats tends to be small because every pixel does not need to be represented.


Vector Graphics

Vector graphics can be resized mathematically, and these changes can be calculated dynamically as needed.

This makes them particularly useful for defining scalable fonts.

However, vector graphics is not a good technique for representing real-world images.


Representing Video Data


Representing Video

A video codec (COmpressor/DECompressor) refers to the methods used to shrink the size of a movie to allow it to be played on a computer or over a network.

Almost all video codecs use lossy compression to minimize the huge amounts of data associated with video.


Representing Video

To simulate motion, movies need to record (and play back) at least 12 frames per second.

However, good sound quality requires 24 frames/s.

24 frames/s = 1,440 frames/minute = 86,400 frames/hour


Representing Video

Recall…

If each frame has a resolution of 1024 x 768*, there are 786,432 pixels in a frame.

If the colour of each pixel is stored as 24 bits (3 bytes) of data, one frame alone requires 2,359,296 bytes (about 2.25 MB) of memory.

An hour of film, then, requires 203,843,174,400 bytes (194,400 MB, or roughly 190 gigabytes) of storage, just for the images.

*This is a very conservative resolution.
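The arithmetic for an hour of uncompressed video can be reproduced step by step. The sketch assumes 24 frames per second and takes a megabyte as 2^20 bytes and a gigabyte as 2^30 bytes.

```python
WIDTH, HEIGHT = 1024, 768
BYTES_PER_PIXEL = 3             # 24-bit colour
FRAMES_PER_SECOND = 24

pixels = WIDTH * HEIGHT                         # pixels in one frame
frame_bytes = pixels * BYTES_PER_PIXEL          # bytes in one frame
frames_per_hour = FRAMES_PER_SECOND * 60 * 60   # frames in one hour
hour_bytes = frame_bytes * frames_per_hour      # bytes in one hour of images

print(pixels)               # 786432
print(frame_bytes)          # 2359296
print(frames_per_hour)      # 86400
print(hour_bytes)           # 203843174400
print(hour_bytes / 2**20)   # 194400.0 megabytes
print(hour_bytes / 2**30)   # 189.84375 gigabytes
```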


Representing Video

The first step in compressing video is to reduce the amount of information stored for a frame.

This problem is essentially the same as that faced when compressing still images.

Spatial compression: A technique based on removing redundant information within a frame.


Representing Video

Each compressed frame will still be quite large.

Moreover, each one is a still picture that looks very much like the one before it. After all, how much can change in 1/24 of a second?

Why should we waste space to duplicate all of the identical information?


Representing Video

We can save even more space by recognizing that between two frames, most of the image hasn’t changed. Storing only the changes (deltas) from one frame to the next is much more efficient.

Temporal compression: A technique based on storing differences between consecutive frames.
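A toy sketch of the idea: each frame is a flat list of pixel values, and a delta records only the positions that changed. Real codecs are far more elaborate; the frame layout and function names here are assumptions for illustration.

```python
def delta(previous, current):
    """List the (index, new_value) pairs for pixels that differ between two frames."""
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(previous, current))
        if old != new
    ]

def apply_delta(previous, changes):
    """Rebuild the next frame from the previous frame and its recorded changes."""
    frame = list(previous)
    for i, value in changes:
        frame[i] = value
    return frame

# Two hypothetical 8-pixel frames in which a bright object moves one pixel right.
frame1 = [10, 10, 10, 200, 200, 10, 10, 10]
frame2 = [10, 10, 10, 10, 200, 200, 10, 10]

changes = delta(frame1, frame2)
print(changes)                                 # [(3, 10), (5, 200)]
print(apply_delta(frame1, changes) == frame2)  # True
```

Only two of the eight pixels need to be stored for the second frame, and the saving grows as frames become larger and more alike.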