Ch. 1: Audio/Image/Video Fundamentals
Multimedia Systems
Prof. Thinh Nguyen (based on Prof. Ben Lee’s slides)
Oregon State University
School of Electrical Engineering and Computer Science
Outline
Computer Representation of Audio
◦ Quantization
◦ Sampling
Digital Image Representation
◦ Color System
◦ Chrominance Subsampling
Digital Video Representation
Hardware Requirements
Computer Representation of Audio
Sound is created by the vibration of matter (e.g., air molecules).
Sound is a continuous wave that travels through air:
◦ Amplitude is the measure of the displacement of the air pressure wave from its mean (quiescent) state, measured in decibels (dB).
◦ Frequency is the number of periods per second, measured in hertz (Hz, cycles/second). The period is the reciprocal of the frequency.
[Figure: a sound wave plotted as amplitude (air pressure) vs. time, with one period marked]
Computer Representation of Audio
A transducer (inside a microphone) converts pressure to voltage levels.
The analog signal is converted into a digital stream by discrete sampling:
◦ Discretization both in time (sampling) and in amplitude (quantization).
The computer measures the amplitude of the waveform at regular time intervals to produce a series of numbers (samples).
Quantization and Sampling
[Figure: a waveform sampled at a fixed sampling rate; each sample height is quantized to one of the levels 0.25, 0.5, 0.75, 1.00]
Sampling Rate
Direct relationship between sampling rate, sound quality (fidelity), and storage space.
How often do you need to sample a signal to avoid losing information?
◦ To choose a sampling rate, you must be aware of the difference between the playback rate and the capturing (sampling) rate.
◦ It depends on how fast the signal is changing. In practice, twice per cycle of the highest frequency suffices (this follows from the Nyquist sampling theorem).
Human hearing frequency range: 20 Hz - 20 kHz; voice is about 500 Hz to 2 kHz.
Nyquist Sampling Theorem
If a signal f(t) is sampled at regular intervals of time and at a rate higher than twice the highest significant signal frequency, then the samples contain all the information of the original signal.
Example
◦ The highest significant frequency for CD-quality audio is 22,050 Hz.
◦ Because of the Nyquist theorem, we need to sample at twice this frequency; therefore the sampling frequency is 44,100 Hz.
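As a rough illustration of what the theorem guards against, the aliasing caused by undersampling can be sketched in Python (the `alias_frequency` helper is hypothetical, not part of the slides):

```python
def alias_frequency(f_signal, f_sample):
    """Apparent frequency of a sampled sinusoid, folded into
    the baseband [0, f_sample / 2] (the Nyquist interval)."""
    f = f_signal % f_sample
    return min(f, f_sample - f)

# At 44,100 Hz, anything up to 22,050 Hz is captured faithfully:
print(alias_frequency(22000, 44100))  # 22000
# An undersampled 30 kHz tone masquerades as a lower frequency:
print(alias_frequency(30000, 44100))  # 14100
```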
Quantization
Sample precision - the resolution of a sample value.
Quantization depends on the number of bits used to measure the height of the waveform.
16-bit CD-quality quantization results in 64K (65,536) values.
Audio formats are described by sample rate and quantization:
◦ Voice quality - 8-bit quantization, 8,000 Hz mono (64 Kbps)
◦ CD quality - 16-bit quantization, 44,100 Hz linear stereo (705.6 Kbps for mono, 1.411 Mbps for stereo)
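These bit rates follow directly from sample rate × precision × channels; a minimal sketch (the function name is my own):

```python
def audio_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate: samples/s x bits/sample x channels."""
    return sample_rate_hz * bits_per_sample * channels

print(audio_bitrate_bps(8_000, 8, 1))    # 64000 bps = 64 Kbps (voice)
print(audio_bitrate_bps(44_100, 16, 1))  # 705600 bps = 705.6 Kbps (CD mono)
print(audio_bitrate_bps(44_100, 16, 2))  # 1411200 bps = 1.411 Mbps (CD stereo)
```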
Signal-to-Noise Ratio
A measure of the quality of the signal. Let Psignal and Pnoise be the signal power and noise power (variances), respectively:
◦ SNR = 10 log10 (Psignal / Pnoise)
Assuming the quantization error is uniform, and the variance of the signal is not too large compared to the maximum signal value Vmax, each bit adds about 6 dB of resolution! (See the accompanying derivations for details.)
[Figure: a signal Vsignal(t) quantized with N bits over the range ±Vmax (levels -2^(N-1) to +2^(N-1)); the maximum quantization noise is Vnoise = 2Vmax/2^N = Vmax/2^(N-1)]
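The 6-dB-per-bit rule can be checked numerically. The formula below is the standard result SNR ≈ 6.02N + 1.76 dB for a full-scale sine input; that input assumption comes from the usual derivation, which the slides leave to the accompanying notes:

```python
def quantization_snr_db(n_bits):
    """SNR of an ideal N-bit uniform quantizer driven by a
    full-scale sine: 6.02*N + 1.76 dB, i.e. ~6 dB per bit."""
    return 6.02 * n_bits + 1.76

print(round(quantization_snr_db(8), 2))   # 49.92 dB (telephone quality)
print(round(quantization_snr_db(16), 2))  # 98.08 dB (CD quality)
```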
Pulse Code Modulation (PCM)
The two-step process of sampling and quantization is known as Pulse Code Modulation.
Based on the Nyquist sampling theorem.
Used in speech and CD encoding.
How Are Audio Samples Represented?
Audio samples are represented in formats characterized by four parameters:
◦ Sample rate: sampling frequency
◦ Precision: number of bits used to store audio samples
◦ Encoding: audio data representation (compression)
◦ Channels: multiple channels of audio may be interleaved at sample boundaries.
PCM-encoded speech (64 Kbps) and music (1.411 Mbps) strain the bandwidth of the Internet, thus some form of compression is needed!
See Chapter 5: Audio Compression
Preview of Chapter 5
Audio samples are encoded (compressed) based on
◦ Non-uniform quantization - humans are more sensitive to changes in “quiet” sounds than in “loud” sounds:
 μ-law encoding
 Difference encoding
◦ Psychoacoustic principles - humans do not hear all frequencies the same way due to auditory masking:
 Simultaneous masking
 Temporal masking
This information is used in MPEG-1 Layer 3, known as MP3.
◦ Reduces the bit rate for CD-quality music down to 128 or 112 Kbps.
Outline
Computer Representation of Audio
◦ Quantization
◦ Sampling
Digital Image Representation
◦ Color System
◦ Chrominance Subsampling
Digital Video Representation
Hardware Requirements
Digital Image Representation
• An image is a collection of an n×m array of picture elements, or pixels.
• Pixel representation can be bi-level, gray-scale, or color.
• Resolution specifies the distance between points - accuracy.
[Figure: pixel representations - bi-level (1 bit, white/black), gray-scale (n bits giving the intensity/brightness level), and color (3 × n bits, one n-bit value each for R, G, B)]
Pixels
• Images are made up of dots called pixels, for picture elements.
• The number of pixels affects the resolution of the monitor.
• The higher the resolution, the better the image quality.
Color Depth (Pixel Depth)
The amount of information per pixel is known as the color depth:
◦ Monochrome (1 bit per pixel)
◦ Gray-scale (8 bits per pixel)
◦ Color (8 or 16 bits per pixel)
8-bit indexes to a color palette
5 bits for each RGB + 1 bit Alpha (16 bits)
◦ True color (24 or 32 bits per pixel)
RGB (24 bits)
RGB + Alpha (32 bits)
Example Color Depth
[Figure: the same image rendered at 1-bit, 4-bit, 8-bit, and 16-bit color depth]
Color Spaces
A method by which we can specify, create, and visualize color.
Why more than one color space? Different color spaces are better for different applications.
◦ Humans => Hue Saturation Lightness or Brightness (HSL or HSB)
◦ CRT monitors => Red Green Blue (RGB)
◦ Printers => Cyan Magenta Yellow Black (CMYK)
◦ Compression => Luminance and Chrominance (YIQ, YUV, YCbCr)
Visible Spectrum
[Figure: the visible spectrum; the human retina is most sensitive to wavelengths around 580 nm, 545 nm, and 440 nm]
Color Perception
[Figure: sensitivity vs. wavelength (400-700 nm) for the blue, green, and red cone responses, together with the overall luminosity curve]
HSB
◦ H (Hue): the dominant wavelength; defines the color itself.
◦ S (Saturation): purity (% white); indicates the degree to which the hue differs from a neutral gray with the same value (brightness).
◦ B (Brightness): luminance, the intensity of light; indicates the level of illumination.
RGB Color System
RGB (Red-Green-Blue) is the most widely used color system.
Represents each pixel as a color triplet in the form (R, G, B); e.g., for 24-bit color, each numerical value is 8 bits (varies from 0 to 255).
◦ (0, 0, 0) = black
◦ (255, 255, 255) = white
◦ (255, 0, 0) = red
◦ (0, 255, 255) = cyan
◦ (65, 65, 65) = a shade of gray
RGB
RGB is an additive model: no beam, no light; all 3 beams => white!
[Figure: overlapping red, green, and blue beams; the pairwise overlaps produce cyan, magenta, and yellow]
CMYK Color System
• For printing, there is no light source. We see light reflected from the surface of the paper.
• Subtractive color model.
[Figure: overlapping cyan, magenta, and yellow inks]
No ink, 100% reflection of light => white!
All 3 colors => black! But, due to imperfect ink, it’s usually a muddy brown. That’s why black (K) ink is added.
YUV Color System
PAL (Phase Alternating Line) standard.
Humans are more sensitive to luminance (brightness) fidelity than color fidelity.
◦ Luminance (Y) - Encodes the brightness or intensity.
◦ Chrominance (U and V) - encodes the color information.
YUV uses 1 byte for the luminance component and, on average, 4 bits for each chrominance component per pixel.
◦ Requires only 2/3 of the space (vs. RGB = 24 bits), so better compression! This coding ratio is called 4:2:2 subsampling.
RGB <=> YUV
◦ Y = 0.3R + 0.59G + 0.11B
◦ U = (B-Y) * 0.493
◦ V = (R-Y) * 0.877
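The RGB-to-YUV equations above translate directly into code; a minimal sketch (the function name is illustrative):

```python
def rgb_to_yuv(r, g, b):
    """RGB -> YUV with the luminance/chrominance weights above
    (components assumed in the 0-255 range)."""
    y = 0.3 * r + 0.59 * g + 0.11 * b
    u = (b - y) * 0.493
    v = (r - y) * 0.877
    return y, u, v

# A pure gray carries no color, so its chrominance is (essentially) zero:
print(rgb_to_yuv(65, 65, 65))  # ~ (65.0, 0.0, 0.0)
```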
YCbCr Color System
Closely related to YUV. It is a scaled and shifted YUV.
Cb (blue) and Cr (red) chrominance.
Used in JPEG and MPEG.
YCbCr <=> RGB
◦ Y = 0.257R + 0.504G + 0.098B + 16
◦ Cb = ((B-Y)/2)+0.5 = - 0.148R - 0.291G + 0.439B + 128
◦ Cr = ((R-Y)/1.6)+0.5 = 0.439R - 0.368G - 0.071B + 128
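The same equations as code, a sketch using the slide’s coefficients (which correspond to the BT.601 studio-range conversion):

```python
def rgb_to_ycbcr(r, g, b):
    """RGB (0-255) -> YCbCr using the equations above:
    luma Y has a floor of 16; chroma Cb/Cr are centered at 128."""
    y  =  0.257 * r + 0.504 * g + 0.098 * b + 16
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    cr =  0.439 * r - 0.368 * g - 0.071 * b + 128
    return y, cb, cr

# Black maps to the luma floor with neutral chroma:
print(rgb_to_ycbcr(0, 0, 0))  # (16.0, 128.0, 128.0)
```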
YIQ Color System
Used in NTSC color TV broadcasting. B/W TV if only Y is used.
YIQ signal - similar to YUV:
◦ Y = 0.299R + 0.587G + 0.114B
◦ I = 0.596R - 0.275G - 0.321B
◦ Q = 0.212R - 0.528G + 0.311B
Composite signal
◦ All information is composed into one signal.
◦ To decode, modulation methods are needed to eliminate interference between the luminance and chrominance components.
Color Decomposition
[Figure: an image decomposed into its RGB (Red, Green, Blue), CMYK (Cyan, Magenta, Yellow), YUV/YCbCr (Y, U, V), and YIQ (Y, I, Q) component planes]
Chrominance Subsampling
What’s another way to cut chrominance bandwidth in half?
◦ Use 4 bits per pixel.
The human eye is less sensitive to variations in color than in brightness.
Compression is achieved with little loss in perceptual quality.
4:4:4 (Y Cb Cr) notation: the first digit is the horizontal sampling reference; the second digit is the horizontal chroma factor relative to the first digit; the third digit is also relative to the first digit, except when it is 0, which means ½ horizontal and ½ vertical chroma sampling.
4:2:2 Subsampling
For every 4 luminance samples, take 2 chrominance samples (subsampling by 2:1 horizontally only).
Chrominance planes are just as tall, but half as wide.
Reduces bandwidth by 1/3.
Used in professional editing (high-end digital video formats).
4:1:1 Subsampling
For every 4 luminance samples, take 1 chrominance sample (subsampling by 4:1 horizontally only).
Used in digital video.
4:2:0 Subsampling
For every 4 luminance samples, take 1 chrominance sample (subsampling by 2:1 both horizontally and vertically).
Chrominance halved in both directions.
Most commonly used.
Three varieties of chroma siting exist (e.g., JPEG/MPEG-1/MJPEG vs. MPEG-2).
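A 2:1-in-both-directions chroma reduction can be sketched by averaging each 2×2 block of a chrominance plane (a hypothetical helper; as noted above, real codecs also differ in where they site the chroma sample):

```python
def subsample_420(chroma):
    """4:2:0 chroma subsampling: average each 2x2 block of a
    chrominance plane into one sample (2:1 both horizontally
    and vertically). `chroma` is a list of rows, even dimensions."""
    out = []
    for y in range(0, len(chroma), 2):
        row = []
        for x in range(0, len(chroma[0]), 2):
            block = (chroma[y][x] + chroma[y][x + 1] +
                     chroma[y + 1][x] + chroma[y + 1][x + 1])
            row.append(block / 4)
        out.append(row)
    return out

plane = [[100, 100, 200, 200],
         [100, 100, 200, 200]]
print(subsample_420(plane))  # [[100.0, 200.0]] - 1/4 as many chroma samples
```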
How Are Images Represented?
A single digitized image of 1024 × 1024 pixels, 24 bits per pixel, requires
◦ ~25 Mbits of storage
◦ ~7 minutes to send over a 64 Kbps modem!
◦ ~8-25 seconds to send over a 1-3 Mbps cable modem!
Some form of compression is needed!
See Chapter 2: Compression Basics and Chapter 3: Image Compression
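The storage and transmission figures above can be reproduced with simple arithmetic (the function name is my own):

```python
def image_bits(width, height, bits_per_pixel):
    """Storage for one uncompressed raster image."""
    return width * height * bits_per_pixel

bits = image_bits(1024, 1024, 24)
print(bits)                     # 25165824 bits, ~25 Mbits
print(round(bits / 64_000))     # 393 s, ~7 minutes over a 64 Kbps modem
print(round(bits / 1_000_000))  # 25 s over a 1 Mbps cable modem
```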
Preview of Chapters 2 and 3
Lossless - no information is lost:
◦ Exploits redundancy
◦ Most probable data encoded with fewer bits
Lossy - approximation of the original image:
◦ Looks for how pixel values change.
◦ The human eye is more sensitive to luminance than chrominance.
◦ The human eye is less sensitive to subtle features of the image.
JPEG uses both techniques.
Outline
Computer Representation of Audio
◦ Quantization
◦ Sampling
Digital Image Representation
◦ Color System
◦ Chrominance Subsampling
Digital Video Representation
Hardware Requirements
Digital Video Representation
Can be thought of as a sequence of moving images (or frames).
Important parameters in video:
◦ Digital image resolution (e.g., n×m pixels)
◦ Quantization (e.g., k bits per pixel)
◦ Frame rate (p frames per second, i.e., fps)
Continuity of motion:
◦ is achieved at a minimum of 15 fps
◦ is good at 30 fps
◦ HDTV recommends 60 fps!
Standard Video Data Formats
National Television System Committee (NTSC)
◦ Set the standard for transmission of analog color pictures back in 1953!
◦ Used in the US and Japan.
◦ 525 lines (480 visible).
◦ Resolution? Not digital, but equivalent to the quality produced by 720×486 pixels.
◦ 30 fps (i.e., delay between frames = 33.3 ms).
◦ Video aspect ratio of 4:3 (e.g., 12 in. wide, 9 in. high).
Other standards:
◦ PAL (Phase Alternating Line): used in parts of Western Europe.
◦ SECAM: French standard.
HDTV
Advanced Television Systems Committee (ATSC)
> 1000 lines
60 fps
Resolutions of 1920×1080 and 1280×720 pixels
Video aspect ratio of 16:9
MPEG-2 for video compression
AC-3 (Audio Coding-3) for audio compression
5.1 channel Dolby surround sound
Bandwidth Requirements
NTSC - 720×486 pixels, 30 fps, true color:
◦ 3 × 720 × 486 × 8 × 30 = 251,942,400 bps, or ~252 Mbps!
With 4:2:2 subsampling:
◦ Luminance part: 720 × 486 × 8 × 30 = 83,980,800 bps
◦ Chrominance part: 2 × (720/2) × 486 × 8 × 30 = 83,980,800 bps
◦ Together: ~168 Mbps!
For uncompressed HDTV-quality video, the bandwidth requirement is
◦ 3 × 1920 × 1080 × 8 × 60 = 2,985,984,000 bps, or ~3 Gbps!
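These bit rates are just pixels × bits per pixel × frames per second; a minimal sketch (the function name is my own):

```python
def video_bandwidth_bps(width, height, bits_per_component, components, fps):
    """Uncompressed video bit rate: pixels x bits/pixel x frames/s."""
    return width * height * bits_per_component * components * fps

print(video_bandwidth_bps(720, 486, 8, 3, 30))    # 251942400, ~252 Mbps (NTSC)
print(video_bandwidth_bps(1920, 1080, 8, 3, 60))  # 2985984000, ~3 Gbps (HDTV)
```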
Video Compression
In addition to the techniques used in JPEG, MPEG exploits
◦ Spatial redundancy - correlation between neighboring pixels.
◦ Spectral redundancy - correlation between different frequency spectra.
◦ Temporal redundancy - correlation between successive frames.
◦ See Chapter 5: Video Compression.
What about delay through the network?
◦ See Chapter 6: Multimedia Networking.
Outline
Computer Representation of Audio
◦ Quantization
◦ Sampling
Digital Image Representation
◦ Color System
◦ Chrominance Subsampling
Digital Video Representation
Hardware Requirements