Ch. 1: Audio/Image/Video Fundamentals
Multimedia Systems
Prof. Thinh Nguyen (based on Prof. Ben Lee’s slides)
Oregon State University
School of Electrical Engineering and Computer Science
Outline
Computer Representation of Audio
◦ Quantization
◦ Sampling
Digital Image Representation
◦ Color System
◦ Chrominance Subsampling
Digital Video Representation
Hardware Requirements
Computer Representation of Audio
Sound is created by the vibration of matter (e.g., air molecules).
Sound is a continuous wave that travels through air:
◦ Amplitude is the measure of the displacement of the air pressure wave from its mean (quiescent) state, measured in decibels (dB).
◦ Frequency is the number of periods per second, measured in hertz (Hz, cycles/second). The period is the reciprocal of the frequency.
[Figure: a sound wave plotted as amplitude (air pressure) vs. time, with one period marked]
Computer Representation of Audio
A transducer (inside a microphone) converts pressure to voltage levels.
The analog signal is converted into a digital stream by discrete sampling:
◦ Discretization both in time (sampling) and in amplitude (quantization).
The computer measures the amplitude of the waveform at regular time intervals to produce a series of numbers (samples).
Quantization and Sampling
[Figure: a waveform sampled at a fixed sampling rate; each sample height is quantized to one of the levels 0.25, 0.5, 0.75, 1.00]
Sampling Rate
Direct relationship between sampling rate, sound quality (fidelity), and storage space.
How often do you need to sample a signal to avoid losing information?
◦ To choose a sampling rate, you must be aware of the difference between the playback rate and the capturing (sampling) rate.
◦ It depends on how fast the signal is changing. In practice, twice per cycle of the highest frequency suffices (this follows from the Nyquist sampling theorem).
Human hearing frequency range: 20 Hz - 20 kHz; voice is about 500 Hz to 2 kHz.
Nyquist Sampling Theorem
If a signal f(t) is sampled at regular intervals of time and at a rate higher than twice the highest significant signal frequency, then the samples contain all the information of the original signal.
Example
◦ The highest significant frequency for CD-quality audio is 22,050 Hz.
◦ Because of the Nyquist theorem, we need to sample at twice this frequency; therefore the sampling frequency is 44,100 Hz.
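As a rough illustration of what the theorem guards against, the aliasing caused by undersampling can be sketched in Python (the `alias_frequency` helper is hypothetical, not part of the slides):

```python
def alias_frequency(f_signal, f_sample):
    """Apparent frequency of a sampled sinusoid, folded into
    the baseband [0, f_sample / 2] (the Nyquist interval)."""
    f = f_signal % f_sample
    return min(f, f_sample - f)

# At 44,100 Hz, anything up to 22,050 Hz is captured faithfully:
print(alias_frequency(22000, 44100))  # 22000
# An undersampled 30 kHz tone masquerades as a lower frequency:
print(alias_frequency(30000, 44100))  # 14100
```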
Quantization
Sample precision - the resolution of a sample value.
Quantization depends on the number of bits used to measure the height of the waveform.
16-bit CD-quality quantization results in 64K (65,536) values.
Audio formats are described by sample rate and quantization:
◦ Voice quality - 8-bit quantization, 8,000 Hz mono (64 Kbps)
◦ CD quality - 16-bit quantization, 44,100 Hz linear stereo (705.6 Kbps for mono, 1.411 Mbps for stereo)
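These bit rates follow directly from sample rate × precision × channels; a minimal sketch (the function name is my own):

```python
def audio_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate: samples/s x bits/sample x channels."""
    return sample_rate_hz * bits_per_sample * channels

print(audio_bitrate_bps(8_000, 8, 1))    # 64000 bps = 64 Kbps (voice)
print(audio_bitrate_bps(44_100, 16, 1))  # 705600 bps = 705.6 Kbps (CD mono)
print(audio_bitrate_bps(44_100, 16, 2))  # 1411200 bps = 1.411 Mbps (CD stereo)
```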
Signal-to-Noise Ratio
A measure of the quality of the signal. Let Psignal and Pnoise be the signal power and noise power (variances), respectively:
◦ SNR = 10 log10 (Psignal / Pnoise)
Assuming the quantization error is uniform, and the variance of the signal is not too large compared to the maximum signal value Vmax, each bit adds about 6 dB of resolution! (See the accompanying derivations for details.)
[Figure: a signal Vsignal(t) quantized with N bits over the range ±Vmax (levels -2^(N-1) to +2^(N-1)); the maximum quantization noise is Vnoise = 2Vmax/2^N = Vmax/2^(N-1)]
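The 6-dB-per-bit rule can be checked numerically. The formula below is the standard result SNR ≈ 6.02N + 1.76 dB for a full-scale sine input; that input assumption comes from the usual derivation, which the slides leave to the accompanying notes:

```python
def quantization_snr_db(n_bits):
    """SNR of an ideal N-bit uniform quantizer driven by a
    full-scale sine: 6.02*N + 1.76 dB, i.e. ~6 dB per bit."""
    return 6.02 * n_bits + 1.76

print(round(quantization_snr_db(8), 2))   # 49.92 dB (telephone quality)
print(round(quantization_snr_db(16), 2))  # 98.08 dB (CD quality)
```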
Pulse Code Modulation (PCM)
The two-step process of sampling and quantization is known as Pulse Code Modulation.
Based on the Nyquist sampling theorem.
Used in speech and CD encoding.
How Are Audio Samples Represented?
Audio samples are represented in formats characterized by four parameters:
◦ Sample rate: sampling frequency
◦ Precision: number of bits used to store audio samples
◦ Encoding: audio data representation (compression)
◦ Channels: multiple channels of audio may be interleaved at sample boundaries.
PCM-encoded speech (64 Kbps) and music (1.411 Mbps) strain the bandwidth of the Internet, thus some form of compression is needed!
See Chapter 5: Audio Compression
Preview of Chapter 5
Audio samples are encoded (compressed) based on
◦ Non-uniform quantization - humans are more sensitive to changes in “quiet” sounds than in “loud” sounds:
 μ-law encoding
 Difference encoding
◦ Psychoacoustic principles - humans do not hear all frequencies the same way due to auditory masking:
 Simultaneous masking
 Temporal masking
This information is used in MPEG-1 Layer 3, known as MP3.
◦ Reduces the bit rate for CD-quality music down to 128 or 112 Kbps.
Outline
Computer Representation of Audio
◦ Quantization
◦ Sampling
Digital Image Representation
◦ Color System
◦ Chrominance Subsampling
Digital Video Representation
Hardware Requirements
Digital Image Representation
• An image is a collection of an n×m array of picture elements, or pixels.
• Pixel representation can be bi-level, gray-scale, or color.
• Resolution specifies the distance between points - accuracy.
[Figure: pixel representations - bi-level (1 bit, white/black), gray-scale (n bits giving the intensity/brightness level), and color (3 × n bits, one n-bit value each for R, G, B)]
Pixels
• Images are made up of dots called pixels, for picture elements.
• The number of pixels affects the resolution of the monitor.
• The higher the resolution, the better the image quality.
Color Depth (Pixel Depth)
The amount of information per pixel is known as the color depth:
◦ Monochrome (1 bit per pixel)
◦ Gray-scale (8 bits per pixel)
◦ Color (8 or 16 bits per pixel)
8-bit indexes to a color palette
5 bits for each RGB + 1 bit Alpha (16 bits)
◦ True color (24 or 32 bits per pixel)
RGB (24 bits)
RGB + Alpha (32 bits)
Example Color Depth
[Figure: the same image rendered at 1-bit, 4-bit, 8-bit, and 16-bit color depth]
Color Spaces
A method by which we can specify, create, and visualize color.
Why more than one color space? Different color spaces are better for different applications.
◦ Humans => Hue Saturation Lightness or Brightness (HSL or HSB)
◦ CRT monitors => Red Green Blue (RGB)
◦ Printers => Cyan Magenta Yellow Black (CMYK)
◦ Compression => Luminance and Chrominance (YIQ, YUV, YCbCr)
Visible Spectrum
[Figure: the visible spectrum; the human retina is most sensitive to wavelengths around 580 nm, 545 nm, and 440 nm]
Color Perception
[Figure: sensitivity vs. wavelength (400-700 nm) for the blue, green, and red cone responses, together with the overall luminosity curve]
HSB
◦ H (Hue): the dominant wavelength; defines the color itself.
◦ S (Saturation): purity (% white); indicates the degree to which the hue differs from a neutral gray with the same value (brightness).
◦ B (Brightness): luminance, the intensity of light; indicates the level of illumination.
RGB Color System
RGB (Red-Green-Blue) is the most widely used color system.
Represents each pixel as a color triplet in the form (R, G, B); e.g., for 24-bit color, each numerical value is 8 bits (varies from 0 to 255).
◦ (0, 0, 0) = black
◦ (255, 255, 255) = white
◦ (255, 0, 0) = red
◦ (0, 255, 255) = cyan
◦ (65, 65, 65) = a shade of gray
RGB
RGB is an additive model: no beam, no light; all 3 beams => white!
[Figure: overlapping red, green, and blue beams; the pairwise overlaps produce cyan, magenta, and yellow]
CMYK Color System
• For printing, there is no light source. We see light reflected from the surface of the paper.
• Subtractive color model.
[Figure: overlapping cyan, magenta, and yellow inks]
No ink, 100% reflection of light => white!
All 3 colors => black! But, due to imperfect ink, it’s usually a muddy brown. That’s why black (K) ink is added.
YUV Color System
PAL (Phase Alternating Line) standard.
Humans are more sensitive to luminance (brightness) fidelity than color fidelity.
◦ Luminance (Y) - Encodes the brightness or intensity.
◦ Chrominance (U and V) - encodes the color information.
YUV uses 1 byte for the luminance component and, on average, 4 bits for each chrominance component per pixel.
◦ Requires only 2/3 of the space (vs. RGB = 24 bits), so better compression! This coding ratio is called 4:2:2 subsampling.
RGB <=> YUV
◦ Y = 0.3R + 0.59G + 0.11B
◦ U = (B-Y) * 0.493
◦ V = (R-Y) * 0.877
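The RGB-to-YUV equations above translate directly into code; a minimal sketch (the function name is illustrative):

```python
def rgb_to_yuv(r, g, b):
    """RGB -> YUV with the luminance/chrominance weights above
    (components assumed in the 0-255 range)."""
    y = 0.3 * r + 0.59 * g + 0.11 * b
    u = (b - y) * 0.493
    v = (r - y) * 0.877
    return y, u, v

# A pure gray carries no color, so its chrominance is (essentially) zero:
print(rgb_to_yuv(65, 65, 65))  # ~ (65.0, 0.0, 0.0)
```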
YCbCr Color System
Closely related to YUV. It is a scaled and shifted YUV.
Cb (blue) and Cr (red) chrominance.
Used in JPEG and MPEG.
YCbCr <=> RGB
◦ Y = 0.257R + 0.504G + 0.098B + 16
◦ Cb = ((B-Y)/2)+0.5 = - 0.148R - 0.291G + 0.439B + 128
◦ Cr = ((R-Y)/1.6)+0.5 = 0.439R - 0.368G - 0.071B + 128
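The same equations as code, a sketch using the slide’s coefficients (which correspond to the BT.601 studio-range conversion):

```python
def rgb_to_ycbcr(r, g, b):
    """RGB (0-255) -> YCbCr using the equations above:
    luma Y has a floor of 16; chroma Cb/Cr are centered at 128."""
    y  =  0.257 * r + 0.504 * g + 0.098 * b + 16
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    cr =  0.439 * r - 0.368 * g - 0.071 * b + 128
    return y, cb, cr

# Black maps to the luma floor with neutral chroma:
print(rgb_to_ycbcr(0, 0, 0))  # (16.0, 128.0, 128.0)
```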
YIQ Color System
Used in NTSC color TV broadcasting. B/W TV if only Y is used.
YIQ signal - similar to YUV:
◦ Y = 0.299R + 0.587G + 0.114B
◦ I = 0.596R - 0.275G - 0.321B
◦ Q = 0.212R - 0.528G + 0.311B
Composite signal
◦ All information is composed into one signal.
◦ To decode, modulation methods are needed to eliminate interference between the luminance and chrominance components.
Color Decomposition
[Figure: an image decomposed into its RGB (Red, Green, Blue), CMYK (Cyan, Magenta, Yellow), YUV/YCbCr (Y, U, V), and YIQ (Y, I, Q) component planes]
Chrominance Subsampling
What’s another way to cut chrominance bandwidth in half?
◦ Use 4 bits per pixel.
The human eye is less sensitive to variations in color than in brightness.
Compression is achieved with little loss in perceptual quality.
4:4:4 (Y Cb Cr) notation: the first digit is the horizontal sampling reference; the second digit is the horizontal chroma factor relative to the first digit; the third digit is also relative to the first digit, except when it is 0, which means ½ horizontal and ½ vertical chroma sampling.
4:2:2 Subsampling
For every 4 luminance samples, take 2 chrominance samples (subsampling by 2:1 horizontally only).
Chrominance planes are just as tall, but half as wide.
Reduces bandwidth by 1/3.
Used in professional editing (high-end digital video formats).
4:1:1 Subsampling
For every 4 luminance samples, take 1 chrominance sample (subsampling by 4:1 horizontally only).
Used in digital video.
4:2:0 Subsampling
For every 4 luminance samples, take 1 chrominance sample (subsampling by 2:1 both horizontally and vertically).
Chrominance halved in both directions.
Most commonly used.
Three varieties of chroma siting exist (e.g., JPEG/MPEG-1/MJPEG vs. MPEG-2).
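A 2:1-in-both-directions chroma reduction can be sketched by averaging each 2×2 block of a chrominance plane (a hypothetical helper; as noted above, real codecs also differ in where they site the chroma sample):

```python
def subsample_420(chroma):
    """4:2:0 chroma subsampling: average each 2x2 block of a
    chrominance plane into one sample (2:1 both horizontally
    and vertically). `chroma` is a list of rows, even dimensions."""
    out = []
    for y in range(0, len(chroma), 2):
        row = []
        for x in range(0, len(chroma[0]), 2):
            block = (chroma[y][x] + chroma[y][x + 1] +
                     chroma[y + 1][x] + chroma[y + 1][x + 1])
            row.append(block / 4)
        out.append(row)
    return out

plane = [[100, 100, 200, 200],
         [100, 100, 200, 200]]
print(subsample_420(plane))  # [[100.0, 200.0]] - 1/4 as many chroma samples
```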
How Are Images Represented?
A single digitized image of 1024 × 1024 pixels, 24 bits per pixel, requires
◦ ~25 Mbits of storage
◦ ~7 minutes to send over a 64 Kbps modem!
◦ ~8-25 seconds to send over a 1-3 Mbps cable modem!
Some form of compression is needed!
See Chapter 2: Compression Basics and Chapter 3: Image Compression
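The storage and transmission figures above can be reproduced with simple arithmetic (the function name is my own):

```python
def image_bits(width, height, bits_per_pixel):
    """Storage for one uncompressed raster image."""
    return width * height * bits_per_pixel

bits = image_bits(1024, 1024, 24)
print(bits)                     # 25165824 bits, ~25 Mbits
print(round(bits / 64_000))     # 393 s, ~7 minutes over a 64 Kbps modem
print(round(bits / 1_000_000))  # 25 s over a 1 Mbps cable modem
```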
Preview of Chapters 2 and 3
Lossless - no information is lost:
◦ Exploits redundancy
◦ Most probable data encoded with fewer bits
Lossy - approximation of the original image:
◦ Looks for how pixel values change.
◦ The human eye is more sensitive to luminance than chrominance.
◦ The human eye is less sensitive to subtle features of the image.
JPEG uses both techniques.
Outline
Computer Representation of Audio
◦ Quantization
◦ Sampling
Digital Image Representation
◦ Color System
◦ Chrominance Subsampling
Digital Video Representation
Hardware Requirements
Digital Video Representation
Can be thought of as a sequence of moving images (or frames).
Important parameters in video:
◦ Digital image resolution (e.g., n×m pixels)
◦ Quantization (e.g., k bits per pixel)
◦ Frame rate (p frames per second, i.e., fps)
Continuity of motion:
◦ is achieved at a minimum of 15 fps
◦ is good at 30 fps
◦ HDTV recommends 60 fps!
Standard Video Data Formats
National Television System Committee (NTSC)
◦ Set the standard for transmission of analog color pictures back in 1953!
◦ Used in the US and Japan.
◦ 525 lines (480 visible).
◦ Resolution? Not digital, but equivalent to the quality produced by 720×486 pixels.
◦ 30 fps (i.e., delay between frames = 33.3 ms).
◦ Video aspect ratio of 4:3 (e.g., 12 in. wide, 9 in. high).
Other standards:
◦ PAL (Phase Alternating Line): used in parts of Western Europe.
◦ SECAM: French standard.
HDTV
Advanced Television Systems Committee (ATSC)
> 1000 lines
60 fps
Resolutions of 1920×1080 and 1280×720 pixels
Video aspect ratio of 16:9
MPEG-2 for video compression
AC-3 (Audio Coding-3) for audio compression
5.1 channel Dolby surround sound
Bandwidth Requirements
NTSC - 720×486 pixels, 30 fps, true color:
◦ 3 × 720 × 486 × 8 × 30 = 251,942,400 bps, or ~252 Mbps!
With 4:2:2 subsampling:
◦ Luminance part: 720 × 486 × 8 × 30 = 83,980,800 bps
◦ Chrominance part: 2 × (720/2) × 486 × 8 × 30 = 83,980,800 bps
◦ Together: ~168 Mbps!
For uncompressed HDTV-quality video, the bandwidth requirement is
◦ 3 × 1920 × 1080 × 8 × 60 = 2,985,984,000 bps, or ~3 Gbps!
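These bit rates are just pixels × bits per pixel × frames per second; a minimal sketch (the function name is my own):

```python
def video_bandwidth_bps(width, height, bits_per_component, components, fps):
    """Uncompressed video bit rate: pixels x bits/pixel x frames/s."""
    return width * height * bits_per_component * components * fps

print(video_bandwidth_bps(720, 486, 8, 3, 30))    # 251942400, ~252 Mbps (NTSC)
print(video_bandwidth_bps(1920, 1080, 8, 3, 60))  # 2985984000, ~3 Gbps (HDTV)
```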
Video Compression
In addition to the techniques used in JPEG, MPEG exploits
◦ Spatial redundancy - correlation between neighboring pixels.
◦ Spectral redundancy - correlation between different frequency spectra.
◦ Temporal redundancy - correlation between successive frames.
◦ See Chapter 5: Video Compression.
What about delay through the network?
◦ See Chapter 6: Multimedia Networking.
Outline
Computer Representation of Audio
◦ Quantization
◦ Sampling
Digital Image Representation
◦ Color System
◦ Chrominance Subsampling
Digital Video Representation
Hardware Requirements