Page 1: CS338

CS338

Additional topics

Page 2: CS338

What Are the Applications?

• Entertainment
  – Video on demand
  – Interactive television
  – Games, virtual worlds, etc.

• Education
  – Distance learning
  – Hypermedia and/or multimodal courseware
  – Digital libraries

• Communications
  – Teleconferencing
  – Web
  – Email

• Business
  – E-commerce
  – E-business
  – E-banking

• Law enforcement
  – Hypermedia/multimodal search and archive
  – Multimedia criminal archives

• Military, intelligence
  – Multimedia databases
  – Virtual simulation of battlefields

• Medicine
  – Multimodal medical databases
  – Virtual diagnosis
  – Telesurgery

• Art and music
  – Digital sound and music
  – Computerized art

• etc.

Page 3: CS338

Media Indexing

• Different media modalities require different indexing schemes, and possibly different data structures for storage in the database
• Text --- 0-dimensional media: ASCII strings
• Audio --- 1-dimensional media: samples along time
• Image --- 2-dimensional media: pixels along x and y
• Video --- 3-dimensional media: pixels along x and y, plus time (frame #)

[Figure: the axes of each modality --- audio: time; image: x, y; video: x, y, time (frame #)]

Page 4: CS338

Text Fundamentals

• One of the most popular modalities
• May be viewed as a linear stream of data
• Typically represented as an ASCII string per document
• Content indexing still requires a certain level of understanding
• No general, satisfactory solutions exist, but the problem is not as acute as in the image domain
• If certain requirements are relaxed, there are “general” solutions available
• Retrieval problem:
  – The user wants to find documents related to a topic T
  – The search program typically tries to find the documents in the “document database” that contain the string T
• Two essential problems:
  – Synonymy: given a word T (i.e., one specifying a topic), the word T does not occur anywhere in a document D, even though the document D is in fact closely related to the topic T in question
  – Polysemy: the same word may mean many different things in different contexts

Page 5: CS338

Basic Terminologies

• Term: an indexed word in an archive (i.e., database)
• Term space: the whole set of words indexed in an archive
• Document: represented by the set of its words that are indexed
• Document space: the whole set of documents in an archive
• Frequency: the reciprocal of the number of occurrences of a word in a particular document
• Frequency table: assuming all the words have been preprocessed with a stop list and stemming, and assuming there are M words and N documents, the M×N matrix with each entry being the frequency of that word in that document
• In all the indexing techniques, an archive database is actually represented by the frequency table
  – In real applications, the database is huge, so the matrix is also huge
  – Indexing methods are required to handle this huge matrix
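A minimal Python sketch of building such a frequency table; the three toy documents, the tokenizer, and the tiny stop list are assumptions for illustration, and raw occurrence counts are used as the entries:

```python
from collections import Counter

# Hypothetical toy archive: N = 3 documents (assumption for illustration)
docs = ["the cat sat on the mat", "the dog sat", "cats and dogs"]
stop_list = {"the", "on", "and"}          # archive-specific stop list

def tokenize(doc):
    """Lowercase, split, and drop stop words (no stemming here)."""
    return [w for w in doc.lower().split() if w not in stop_list]

# Term space: all indexed words across the archive (M terms)
terms = sorted({w for d in docs for w in tokenize(d)})

# M x N frequency table: entry [i][j] = occurrences of term i in document j
freq_table = [[Counter(tokenize(d))[t] for d in docs] for t in terms]

for t, row in zip(terms, freq_table):
    print(f"{t:8s} {row}")
```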

Page 6: CS338

Precision*

• D: a finite set of documents
• A: any algorithm that takes as input a topic string T, and returns as output a set S of documents
• T: a finite set of topics, i.e., a finite set of strings
• Precision of A w.r.t. the predicate relevant and the topic t ∈ T is defined:

  P_t% = 100 · (1 + card({d ∈ D | d ∈ A(t) ∧ relevant(t,d)})) / (1 + card({d ∈ D | d ∈ A(t)}))

• Precision of A w.r.t. the predicate relevant and the document set D as well as the topic set T is defined:

  P% = 100 · (Σ_{t ∈ T} P_t) / card(T)

  – How many of the answers returned are in fact correct

Page 7: CS338

Recall*

• Recall of A w.r.t. the predicate relevant and the topic t ∈ T is defined:

  R_t% = 100 · (1 + card({d ∈ D | d ∈ A(t) ∧ relevant(t,d)})) / (1 + card({d ∈ D | relevant(t,d)}))

• Recall of A w.r.t. the predicate relevant and the document set D as well as the topic set T is defined:

  R% = 100 · (Σ_{t ∈ T} R_t) / card(T)

  – How many of the correct documents are in fact retrieved by the query
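A direct Python transcription of the two per-topic definitions, keeping the +1 terms exactly as on the slides; the document names are hypothetical:

```python
def precision_recall(retrieved, relevant, corpus):
    """Per-topic precision/recall with the slides' +1 smoothing.

    retrieved: set of documents returned by A(t)
    relevant:  set of documents actually relevant to topic t
    corpus:    the full document set D
    """
    retrieved = retrieved & corpus
    relevant = relevant & corpus
    hits = retrieved & relevant
    p = 100 * (1 + len(hits)) / (1 + len(retrieved))
    r = 100 * (1 + len(hits)) / (1 + len(relevant))
    return p, r

# Toy example (hypothetical data): D = {d1..d5}
D = {"d1", "d2", "d3", "d4", "d5"}
p, r = precision_recall({"d1", "d2", "d3"}, {"d2", "d3", "d4"}, D)
print(f"P_t = {p:.1f}%, R_t = {r:.1f}%")   # P_t = 75.0%, R_t = 75.0%
```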

Page 8: CS338

Text Indexing

• Preprocessing
  – Key word detection
    • Proper noun detection
  – Word expansion
    • Stemming
  – Stop list
  – Light parsing
    • Named entity tagging
  – Layout analysis
• Indexing
  – Inverted files
  – Vector space indexing
    • Regular vector space indexing
    • Latent semantic indexing
  – Signature files

Page 9: CS338

Preprocessing

• Key word detection
  – String matching against a prepared list of words
  – Sometimes uses heuristics, e.g., proper noun detection by looking for capitalized words
• Stop list
  – A set of words that do not “discriminate” between the documents in a given archive
  – There are general stop words (e.g., a, the); but in most situations stop words are archive-specific (e.g., in CS literature, computer could be a stop word)
  – Needs an explicit list of stop words, which are then excluded from all the documents by direct string matching
• Word stemming
  – Many words share the same semantic stem meaning but with small syntactic variations (e.g., names, naming, named)
  – They may all be represented by their stem word (e.g., name)
  – Requires stemming rules; typically uses heuristics; needs to pay attention to special cases (e.g., run, running, ran)
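A minimal sketch of the stop-list and stemming steps; the stop list and suffix rules below are toy assumptions, not a real stemmer (e.g., Porter's):

```python
STOP_LIST = {"a", "an", "the", "of", "in"}   # assumption: tiny general stop list

def stem(word):
    """Toy suffix-stripping heuristics; real stemmers handle special
    cases like run/running/ran that these rules miss."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    return [stem(w) for w in text.lower().split() if w not in STOP_LIST]

print(preprocess("The naming of names"))   # ['nam', 'nam'] -- same stem
```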

Page 10: CS338

Preprocessing, Cont’d

• Named entity tagging
  – A method of light parsing --- a step towards natural language understanding
  – A “direct” way to resolve the polysemy issue, mostly applied to proper noun processing
  – Many words may have different meanings in different contexts:
    • I am leaving for Washington, D.C.
    • The one dollar bill shows George Washington.
• Layout analysis
  – A multimedia document is typically presented with data in different modalities
  – A specific presentation specifies a spatial layout indicating the spatial correlation between data in different modalities
  – This correlation is lost in the hypertext source --- layout analysis aims to recover it
  – In general, it is a hard problem; typically relies on heuristics
  – In certain situations, it is relatively easy to find a solution

Page 11: CS338

Inverted Files

• For each term, maintain a list of pointers (a posting list), each pointing to a document in which the term appears
• Typically the inverted files are organized using sophisticated primary-key access methods (e.g., B-trees, hashing)
• Pros:
  – Easy to implement
  – Fast to execute
  – Supports synonym indexing (e.g., using threaded lists) to a certain degree
• Cons:
  – Storage overhead (can reach up to 300% of the original file size if too much information is kept)
  – Cost of updating and reorganizing the index if the database changes significantly
  – Still not a complete solution to the synonymy problem in a broad sense
• Widely used in many applications (the most popular method), and in different languages
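A minimal inverted-file sketch; a Python dict stands in for the B-tree/hash organization mentioned above, and the documents are hypothetical:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a posting list of document ids in which it appears."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {0: "video on demand", 1: "interactive video games", 2: "distance learning"}
index = build_inverted_index(docs)
print(sorted(index["video"]))                    # [0, 1]
print(sorted(index["video"] & index["games"]))   # [1]  -- AND query
```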

Page 12: CS338

Inverted Files May Not Always Work

• It is not a complete solution to the synonymy problem in a broad sense --- it does not address the semantics contained in a document
• Example:
  – Titanic
  – Maritime aviation tragedies
• The semantic correlation between the two documents is obvious, but they don’t share any common words; nor are they synonyms
• Requires semantic understanding and correlation
• Techniques
  – NLP --- understanding --- expensive and not reliable
  – LSI --- directly finds the correlation --- inexpensive and reliable

Page 13: CS338

Zipf’s Law*

• Assume that all the vocabulary is sorted by word occurrence frequency in decreasing order
• The occurrence frequency of a word is inversely proportional to its rank in that order:

  f = 1 / (r · ln(1.78 V))

  where f is the occurrence frequency of the word, r is its rank in the sorted order, and V is the size of the vocabulary

• This means a few vocabulary words appear very often, while the majority of vocabulary words appear once or twice; this also holds for languages other than English
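A quick numeric check of the formula (the vocabulary size V is an arbitrary assumption):

```python
import math

def zipf_frequency(rank, vocab_size):
    """Expected relative frequency of the word at a given rank (Zipf)."""
    return 1.0 / (rank * math.log(1.78 * vocab_size))

V = 50_000                      # assumed vocabulary size, for illustration
for r in (1, 2, 10, 100, 10_000):
    print(f"rank {r:>6}: f = {zipf_frequency(r, V):.6f}")
# rank 1 gets about 8.8% of all occurrences; rank 10_000 is vanishingly rare
```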

Page 14: CS338

Latent Semantic Indexing*

• Tries to find a relatively small subset of K words which discriminate between the M documents in the archive
• In the literature, K may be as small as 200
• Now one only needs to search the k nearest neighbors in a K-dimensional space, as opposed to an N-dimensional space; a significant saving of computation
• The question is how to find such a relatively small subset of words that discriminates between the M documents in the archive
• Algorithm:
  – Given the database represented as the matrix A
  – Singular value decompose the matrix: A = U Σ V^T
  – Keep the first K singular values of Σ to obtain Σ', and accordingly truncate U and V to obtain U' and V'
  – Now each document may be indexed by the corresponding row vector of U', which lives in a space of dimension K << N
  – Given a query, it may be viewed as a document, and thus represented as a vector q
  – Since U' = A V' Σ'^(-1), the corresponding query vector in the new term space is: q' = q^T V' Σ'^(-1)
  – Finally, find the k nearest neighbors of q' in the K-dimensional space among the rows of U'
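A minimal numpy sketch of this algorithm; the tiny matrix A and the query are toy assumptions, and rows are taken as documents to match the slides' use of U':

```python
import numpy as np

# Hypothetical tiny term-document data: rows = documents, cols = terms
A = np.array([[2., 1., 0., 0.],
              [1., 2., 0., 0.],
              [0., 0., 1., 2.],
              [0., 0., 2., 1.]])

K = 2                                    # number of retained singular values
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, Sk, Vk = U[:, :K], np.diag(s[:K]), Vt[:K, :].T   # U', Sigma', V'

docs_k = A @ Vk @ np.linalg.inv(Sk)      # rows of U' = A V' Sigma'^(-1)

q = np.array([1., 1., 0., 0.])           # query, viewed as a pseudo-document
qk = q @ Vk @ np.linalg.inv(Sk)          # fold-in: q' = q^T V' Sigma'^(-1)

# k nearest neighbors of q' among the projected documents (Euclidean)
dists = np.linalg.norm(docs_k - qk, axis=1)
print(np.argsort(dists))                 # documents 0 and 1 come first
```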

Page 15: CS338

Diagrammatical Illustration of LSI*

    A    =    U      Σ      V^T
  (M×N)    (M×N)  (N×N)   (N×N)

    A    ≈    U'     Σ'     V'^T
  (M×N)    (M×K)  (K×K)   (K×N)

Page 16: CS338

Remaining Question*

• Even though K << N saves significantly in computation and in memory storage, K is still too high for indexing with “usual” data structures

• The solution is to use “unusual” data structures --- e.g., TV-trees

Page 17: CS338

SBC Scheme

• Time/Frequency Mapping: a filter bank or FFT decomposes the input audio signal into subbands
• Psychoacoustic Model: looks at the subbands as well as the original signal, and determines masking thresholds dynamically using psychoacoustic information
• Quantizer and Coding: each of the subband samples is quantized and encoded so as to keep the quantization noise below the masking threshold
• Frame Packing: assemble all the quantized samples into frames
• Frame Unpacking: frames are unpacked
• Reconstruction: subband samples are decoded
• Frequency/Time Mapping: turns the decoded subband samples back into a single signal
• Example of SBC: MPEG audio --- 3 different layers, each a self-contained SBC coder with its own time-frequency mapping, psychoacoustic model, and quantizer --- a tradeoff between computational burden and compression performance
  – Layer 1: simplest, fastest, but poorest compression
  – Layer 2: moderate in both computation and compression
  – Layer 3: most complicated, most expensive, but best compression

Page 18: CS338

Image characteristics

Page 19: CS338

Image Fundamentals

• Digital images, like digital audio signals, are obtained from two levels of digitization:
  – Spatial level --- in addition to the Nyquist sampling theorem that governs the resolution, the representational geometry also needs attention for analysis, even though in practice each sampling point is just a blob:
    • Square
    • Triangle
    • Hexagon
  – Intensity level --- the same as the sampling in audio signals, called quantization; depending on the nature of the intensity signal, it may be a single intensity value or an intensity vector (e.g., color)
• Each spatial digitization element is called a pixel; the corresponding quantized signal intensity is called the pixel value
• A digital image may be viewed as a mathematical matrix, with the pixel values as the matrix entries

Page 20: CS338

Basic Image Operations

• Algebraic Operations:
  – Addition: I = I1 + I2, same resolutions, pixelwise addition
  – Subtraction: I = I1 − I2, same resolutions, pixelwise subtraction
  – Scalar Multiplication: I = α·I1, pixelwise multiplication
  – Scalar Division: I = I1/α, pixelwise division
• Logic Operations:
  – AND: I = I1 AND I2, same resolutions, pixelwise, bitwise AND
  – OR: I = I1 OR I2, same resolutions, pixelwise, bitwise OR
  – NOT: I = NOT I1, pixelwise, bitwise NOT
• Correlation:
  – Image I: N×N, Mask G: (2k+1)×(2k+1)
  – Ignoring the “boundary effect”: (I ∘ G)(i,j) = Σ_{x=−k..k} Σ_{y=−k..k} I(i+x, j+y) G(x+k, y+k)
  – How to handle the “boundary effect”?
    • Ignore the boundary pixels of I ∘ G, i.e., count only i, j = k, …, N−k
    • Expand I to (N+2k)×(N+2k) by adding k rows and columns of pixel value 0 at the four boundaries of I before the correlation
• Convolution:
  – Same as the correlation except that (I ∗ G)(i,j) = Σ_{x=−k..k} Σ_{y=−k..k} I(i−x, j−y) G(x+k, y+k)
  – Same ways to handle the “boundary effect”
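A straightforward sketch of the correlation and convolution just defined, using the zero-padding treatment of the boundary effect; the averaging mask is an arbitrary example:

```python
import numpy as np

def correlate(I, G):
    """Correlation of an NxN image with a (2k+1)x(2k+1) mask, handling the
    boundary effect by zero-padding I with k rows/columns on each side."""
    k = G.shape[0] // 2
    Ip = np.pad(I, k, mode="constant", constant_values=0)
    out = np.zeros_like(I, dtype=float)
    for i in range(I.shape[0]):
        for j in range(I.shape[1]):
            out[i, j] = np.sum(Ip[i:i + 2*k + 1, j:j + 2*k + 1] * G)
    return out

def convolve(I, G):
    """Convolution = correlation with the mask flipped in both axes."""
    return correlate(I, G[::-1, ::-1])

I = np.arange(16, dtype=float).reshape(4, 4)
G = np.ones((3, 3)) / 9.0                 # 3x3 averaging mask
print(correlate(I, G))
```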

Page 21: CS338

Edge Detection

• Edges are discontinuities, caused by:
  – Changes in surface orientation
  – Changes in depth
  – Changes in surface reflectance
  – Cast shadows
• Edge detectors cannot distinguish between the various kinds of discontinuities; nor can they detect illusory edges

[Figure: examples of surface, illumination, depth, and reflectance discontinuities]

Page 22: CS338

Edge Detection, Cont’d

• Goal: extraction of “significant” edges from images
• Hope: to distinguish local image edges from 3D edges
• The only information available is the image intensity surface
• Edges: characterized by rapid changes in the image intensity surface or its derivatives
• Observations:
  – Edges are visually apparent where abrupt changes occur in an image feature (e.g., image brightness or color)
  – Local edges have a location, an orientation, and a strength (“edge contrast”)
  – Local edges are only loosely related to lines, curves, object boundaries, occlusion boundaries, etc.

Page 24: CS338

Ideal vs. Real Discontinuities

• Ideal “Step”

• Ideal “Ramp”

• Ideal “Strip”

• Ideal “Roof”

• Real Edges

Page 25: CS338

Edge Detection Techniques*

• Edges are high-frequency components of an image
• High-frequency components may be detected by taking derivatives of an image
• Taking derivatives of an image may be approximated by taking differences between two adjacent pixels
• Typically an edge pixel carries two components:
  – Strength: S = sqrt(Px(i,j)² + Py(i,j)²)
  – Orientation: θ = arctan(Py(i,j)/Px(i,j))
• Edge detectors cannot distinguish between “true” edges and noise
• Examples of edge detection masks:
  – 1×2:
      dx: [ -1  1 ]      dy: [ -1 ]
                             [  1 ]
  – 1×3:
      dx: [ -1  0  1 ]   dy: [ -1 ]
                             [  0 ]
                             [  1 ]
  – 2×2 Roberts:
      [ 1   0 ]   [  0  1 ]
      [ 0  -1 ]   [ -1  0 ]
  – 3×3 Sobel:
      dx: [ -1  0  1 ]   dy: [ -1  -2  -1 ]
          [ -2  0  2 ]       [  0   0   0 ]
          [ -1  0  1 ]       [  1   2   1 ]
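A sketch applying the 3×3 Sobel masks with the zero-padded correlation from page 20 to get strength and orientation; the step-edge image is a toy example, and arctan2 is used so the orientation survives Px = 0:

```python
import numpy as np

def sobel_edges(img):
    """Edge strength and orientation from the 3x3 Sobel masks above."""
    dx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    dy = dx.T
    Ip = np.pad(img.astype(float), 1)         # zero-pad the boundary (k = 1)
    Px = np.zeros_like(img, dtype=float)
    Py = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = Ip[i:i + 3, j:j + 3]
            Px[i, j] = np.sum(win * dx)
            Py[i, j] = np.sum(win * dy)
    strength = np.sqrt(Px**2 + Py**2)
    orientation = np.arctan2(Py, Px)          # arctan(Py/Px), quadrant-safe
    return strength, orientation

# Vertical step edge: strength concentrates along the step
img = np.hstack([np.zeros((5, 3)), np.ones((5, 3))])
S, theta = sobel_edges(img)
print(S.round(1))
```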

Page 26: CS338

Region Detection

• Regions are the opposite of edges; look for continuities or homogeneities
• Regions correspond to the low-frequency components of an image
• Like edges, in real images it is very difficult to distinguish “true” region boundaries from noise
• Goal: partitioning an image into different regions (i.e., connected components), each having uniform properties in certain defined image features:
  – Intensity values
  – Color values
  – Texture
  – Local gradient
• Formal definition*: a region detection of an image I is a partition of the set of pixels of I into a set of regions {Rj}, j = 1, …, k, s.t.
  – I = ∪_{j=1..k} Rj --- every pixel belongs to at least one region
  – Ri ∩ Rj = ∅ if i ≠ j --- no pixel belongs to more than one region
  – p is connected to p' for all p, p' ∈ Rj --- spatial coherence
  – For a given predicate P, if P(Rj) is true for each j, then P(Rj ∪ Ri) is false for adjacent Ri, Rj with i ≠ j

Page 27: CS338

Region Detection Basic Approaches

• Two basic approaches
  – Region Growing (sketched below)
    • Start with many trivial regions (e.g., pixels)
    • Merge regions into larger regions based on some similarity criteria
    • Continue merging till no further merges are possible
  – Region Splitting
    • Start with a single large region (e.g., the entire image)
    • Split into several smaller regions based on some “splitting” criteria
    • Continue until no further splits are possible (i.e., regions are uniform)
• Split and Merge: a hybrid approach
  – Combination: splits followed by merges, or vice versa
  – Split and merge decisions can be either:
    • Local:
      – A pixel and its immediate neighbors
      – A region and its immediate neighbors
    • Global: on the basis of a large number of pixels scattered through the image
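A minimal region-growing sketch under simple assumptions: 4-connectivity, and a similarity criterion that merges a pixel when it is close to the growing region's mean intensity:

```python
from collections import deque
import numpy as np

def region_grow(img, seed, tol=10.0):
    """Grow one region from a seed pixel: BFS over 4-connected neighbors,
    merging a pixel when it is within tol of the region's running mean."""
    H, W = img.shape
    region = {seed}
    total = float(img[seed])
    queue = deque([seed])
    while queue:
        i, j = queue.popleft()
        for ni, nj in ((i-1, j), (i+1, j), (i, j-1), (i, j+1)):
            if 0 <= ni < H and 0 <= nj < W and (ni, nj) not in region:
                if abs(img[ni, nj] - total / len(region)) <= tol:
                    region.add((ni, nj))
                    total += float(img[ni, nj])
                    queue.append((ni, nj))
    return region

img = np.array([[10, 12, 90], [11, 13, 95], [10, 92, 94]], dtype=float)
print(sorted(region_grow(img, (0, 0))))   # the five dark pixels grow together
```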

Page 28: CS338

Image indexing

Page 29: CS338

Image Indexing: Image Features and Similarity Matching

• Image feature based similarity matching is the essential approach to image indexing and retrieval
• Every similarity function depends on a set of well defined image features
• Image features:
  – Color features
    • Color histograms, color correlograms
  – Texture features
    • Gabor wavelet features, fractal features
  – Statistical features
    • Histograms, moments
  – Transform features in other domains
    • Fourier features, wavelet features, fractal features
  – Intensity profile features
    • Gaussian features

Page 30: CS338

Histogram

• A statistical feature of an image: count the number of pixels for each intensity bucket
  – An intensity bucket may be a group of intensity values, or just a single intensity value
  – A vector that can be displayed as a 1D signal
• Example: the 4×4 image below, with intensity buckets over 0–20

    3  5  1  2
    5 15 16  1
    3 12 18  4
    4  5  3  2

[Figure: the original image, its region segmentation with threshold T = 9, and its histogram; in the general case, the histogram's valleys supply the thresholds for region segmentation]
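The same example in numpy (one bucket per intensity value, and the threshold T = 9 from the figure):

```python
import numpy as np

img = np.array([[3, 5, 1, 2],
                [5, 15, 16, 1],
                [3, 12, 18, 4],
                [4, 5, 3, 2]])

# Histogram over intensity buckets 0..20 (one bucket per intensity value)
hist = np.bincount(img.ravel(), minlength=21)
print(hist)

# Region segmentation by thresholding at the histogram valley T = 9
T = 9
segmentation = (img > T).astype(int)
print(segmentation)        # 1s mark the bright region (15, 16, 12, 18)
```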

Page 31: CS338

Histogram Manipulations

Change pixels’ intensity values by manipulating the histogram; the effect is typically global

• Stretch

• Shrink

• Slide

Page 32: CS338

Histogram Based Similarity*

• Given two images I and I', and their normalized histograms H and H', assuming H and H' both have the same number of buckets n
• Function 1: if ||H − H'|| < Threshold, then I and I' are similar
  – ||H − H'|| in L2 distance: Σ_{i=1..n} (H(i) − H'(i))²
  – ||H − H'|| in L1 distance: Σ_{i=1..n} |H(i) − H'(i)|
• Function 2: H · H' ∈ [−1, 1], and threshold the value for similarity matching
• Function 3: normalized intersection of histograms (sketched below):

  S(I, I') = (Σ_{i=1..n} min(H(i), H'(i))) / (Σ_{i=1..n} H(i)) ∈ [0, 1]

  – Pros: insensitive to changes in image resolution, histogram size, occlusion, depth, and viewpoint
  – Cons: expensive
• Improvements:
  – Only use a small number of “peaks” in the histogram
  – Divide the whole image into subimages and conduct histogram matching in each subimage
• For color images, H is just a set of (e.g., 3) vectors; the definitions may be changed accordingly
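A sketch of Function 3; the two normalized histograms are toy assumptions:

```python
import numpy as np

def histogram_similarity(H1, H2):
    """Function 3: normalized intersection of two histograms, in [0, 1]."""
    return np.minimum(H1, H2).sum() / H1.sum()

H1 = np.array([4, 8, 2, 2], dtype=float) / 16   # normalized histograms
H2 = np.array([5, 6, 3, 2], dtype=float) / 16
print(histogram_similarity(H1, H2))              # 0.875
```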

Page 33: CS338

Moments*

• Given an image f(x,y), the (p+q)-th order moment is defined:

  m(p,q) = ∫∫ f(x,y) x^p y^q dx dy,  p, q = 0, 1, 2, …

• Moment representation theorem: the infinite set of moments {m(p,q), p, q = 0, 1, 2, …} uniquely determines f(x,y), and vice versa
• Statistic to characterize an image:
  – According to the theorem, only the whole set of moments can uniquely characterize the image. Can we truncate to the first finite number of moments?
  – In practice, only a finite number of moments can be used for similarity matching, making the match a necessary condition for similarity, just like histograms
• Given the moments m(p,q) of f(x,y) up to order p+q = N, a “similar” function may be reconstructed as g(x,y) = Σ_{i,j=0..N} h(i,j) x^i y^j by solving for the unknowns h(i,j) in the set of equations obtained by equating the moments of g(x,y) to m(p,q)
  – Problem: when more moments become available, we have to re-solve for all the unknowns h(i,j) in the set of equations --- moments defined this way are coupled
• The solution is to change to orthogonal moments
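A discrete version of the moment definition (sums in place of integrals), illustrated on a hypothetical binary rectangle; m(0,0) gives the area, and m(1,0)/m(0,0), m(0,1)/m(0,0) give the centroid used later for central moments:

```python
import numpy as np

def moment(img, p, q):
    """Discrete (p+q)-th order moment: sum of f(x,y) * x^p * y^q."""
    H, W = img.shape
    y, x = np.mgrid[0:H, 0:W].astype(float)
    return np.sum(img * x**p * y**q)

img = np.zeros((8, 8)); img[2:5, 3:7] = 1.0     # a small bright rectangle
m00, m10, m01 = moment(img, 0, 0), moment(img, 1, 0), moment(img, 0, 1)
print(m00, m10 / m00, m01 / m00)   # area 12.0, centroid x = 4.5, y = 3.0
```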

Page 34: CS338

Orthogonal Moments*

• Define the orthogonal Legendre polynomials:
  P_0(x) = 1
  P_n(x) = (1/(n! 2^n)) d^n[(x² − 1)^n]/dx^n,  n = 1, 2, …
  ∫_{−1..1} P_n(x) P_m(x) dx = (2/(2n+1)) δ(m − n), where δ(x) = 1 if x = 0, and 0 otherwise
• Given an image f(x,y), x, y ∈ [−1, 1] W.L.O.G., the orthogonal moments λ(p,q) are defined:
  λ(p,q) = ((2p+1)(2q+1)/4) ∫∫_{−1..1} f(x,y) P_p(x) P_q(y) dx dy
• Similarly, f(x,y) can be reconstructed as
  f(x,y) = Σ_{p,q=0..∞} λ(p,q) P_p(x) P_q(y)
• The relationship between λ(p,q) and m(p,q): writing P_m(x) = Σ_{j=0..m} c(m,j) x^j,
  λ(p,q) = ((2p+1)(2q+1)/4) Σ_{j=0..p} Σ_{k=0..q} c(p,j) c(q,k) m(j,k)
  – c(m,j) is the coefficient of x^j in P_m(x)
• Now an approximation to f(x,y) can be obtained by truncating λ(p,q) at a given finite order p+q = N:
  f(x,y) ≈ g(x,y) = Σ_{p=0..N} Σ_{q=0..N−p} λ(p,q) P_p(x) P_q(y)
  – The λ(p,q) do not need to be updated when more or fewer moments are available

Page 35: CS338

Moment Invariants*

• Certain functions of moments are invariant under geometric transforms such as translation, scaling, and rotation
• Goal: these invariant functions may be used for image similarity matching
• Translation: define the central moments
  μ(p,q) = ∫∫ (x − x̄)^p (y − ȳ)^q f(x,y) dx dy,  x̄ = m(1,0)/m(0,0),  ȳ = m(0,1)/m(0,0)
• Scaling: under a scale change x' = ax, y' = ay, the moments of f(ax, ay) change to μ'(p,q) = μ(p,q)/a^(p+q+2); the normalized moments defined below are invariant to scaling:
  η(p,q) = μ'(p,q) / (μ'(0,0))^((p+q+2)/2)
• Rotation and reflection: under the coordinate change x' = a1·x + a2·y, y' = a3·x + a4·y, the transformed moments η(p,q) are invariant in terms of the following functions for rotation (a1 = a4 = cos θ, a2 = −a3 = sin θ) or reflection (a1 = −a4 = cos θ, a2 = a3 = sin θ):
  – For first order moments: η(0,1) = η(1,0) = 0
  – For second order moments:
    φ1 = η(2,0) + η(0,2)
    φ2 = (η(2,0) − η(0,2))² + 4 η(1,1)²

Page 36: CS338

Gaussian Invariants*

• Observation: intensity functions are not continuous
• The derivative of a possibly discontinuous function can be made well posed if it is convolved with the derivative of a smooth function
• Use the Gaussian as the smooth function, taking the 1D case as an example:
  G(x, σ) = (1/√(2πσ²)) exp(−x²/(2σ²))
  – in 2D, x is replaced by a vector (x, y)
• Given an image I, its complete n-th order derivatives at scale σ at point x:
  I_{i1,i2,…,in,σ}(x) = (I ∗ G_{i1,i2,…,in})(x, σ)
  – where each of i1, …, in is x or y, indicating the order of the derivatives, and ∗ is convolution
• Define J^N[I](x, σ) = {I_{i1,…,in,σ} | n = 0, …, N}, called the N-Jet
  – Example: N = 2
    J²[I](x, σ) = {I_σ(x), I_{x,σ}(x), I_{y,σ}(x), I_{xx,σ}(x), I_{xy,σ}(x), I_{yy,σ}(x)}
• Theorem: for any order N, the local N-Jet at scale σ contains all the information required to reconstruct I at the scale of observation up to order N

Page 37: CS338

Gaussian Invariants, Cont’d*

• In practice, in order to allow scale invariance, compute the N-Jet at each location for multiple scales:
  {J^N[I](x, σ1), J^N[I](x, σ2), …, J^N[I](x, σK)}
  – Example: N = 2, K = 3; the set of invariants at each location:
    d0 = I --- intensity
    d1 = (I_x)² + (I_y)² --- gradient magnitude (squared)
    d2 = I_xx + I_yy --- Laplacian
    d3 = I_xx·I_x·I_x + 2·I_xy·I_x·I_y + I_yy·I_y·I_y
    d4 = (I_xx)² + 2(I_xy)² + (I_yy)²
  – d0 is omitted as it is sensitive to gray-level shifts
  – For each location, sample d1 to d4 at three scales (σ1, σ2, σ3) --- forming a 12-element vector D
  – For each image, uniformly sample the locations, and for each location compute D
  – Similarity matching becomes finding whether an image contains a set of D’s that are similar to the D’s from the query image
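A sketch of the 12-element descriptor using scipy's Gaussian-derivative filters; the random image, the scales, and dense per-pixel sampling are assumptions for illustration:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def invariants(I, sigma):
    """d1..d4 at one scale, from Gaussian-derivative responses."""
    d = lambda ox, oy: gaussian_filter(I, sigma, order=(oy, ox))
    Ix, Iy = d(1, 0), d(0, 1)
    Ixx, Ixy, Iyy = d(2, 0), d(1, 1), d(0, 2)
    d1 = Ix**2 + Iy**2                       # gradient magnitude (squared)
    d2 = Ixx + Iyy                           # Laplacian
    d3 = Ixx*Ix*Ix + 2*Ixy*Ix*Iy + Iyy*Iy*Iy
    d4 = Ixx**2 + 2*Ixy**2 + Iyy**2
    return np.stack([d1, d2, d3, d4])

I = np.random.default_rng(0).random((64, 64))
# 12-element descriptor D at each pixel: d1..d4 at three scales
D = np.concatenate([invariants(I, s) for s in (1.0, 2.0, 4.0)])
print(D.shape)        # (12, 64, 64)
```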

Page 38: CS338

Fourier Transform*

• Motivation: decomposition of a function f(x) into a weighted summation of an infinite number of sine and cosine basis functions
• Formally, given f(x), its Fourier transform F(u) is defined:
  FT:  F(u) = ∫ f(x) exp(−j2πxu) dx
  IFT: f(x) = ∫ F(u) exp(j2πux) du
  – Here j = sqrt(−1); F(u) in general is a complex function
• Similarly, a 2D discrete image f(m,n) and its Fourier transform F(k,l) are related by:
  F(k,l) = [Σ_{m=0..M−1} Σ_{n=0..N−1} f(m,n) exp(−j2πkm/M) exp(−j2πln/N)] R(M,N)
  f(m,n) = (1/(MN)) [Σ_{k=0..M−1} Σ_{l=0..N−1} F(k,l) exp(j2πmk/M) exp(j2πnl/N)] R(M,N)
  – where the image f(m,n) is limited to m = 0, …, M−1; n = 0, …, N−1
  – R(M,N) is defined as 1 for m = 0, …, M−1 and n = 0, …, N−1, and 0 elsewhere

Page 39: CS338

Fourier Features*

• Fourier transform properties: given f(x) ↔ F(u),
  f(ax) ↔ (1/|a|) F(u/a)
  f(x−a) ↔ F(u) exp(−j2πau)
  For 2D, f(x,y) ↔ F(u,v): if f(x,y) rotates, F(u,v) rotates the same way
• Given an image f(x,y), index |F(u,v)|
  – Location invariant (after using central coordinates)
  – Scaling invariant (after normalization)
  – Rotation invariant (after using central coordinates in the frequency domain)
• Caution: indexing only |F(u,v)| may cause false matches
  – Given f(x,y) ↔ F(u,v) = |F(u,v)| exp(jφ(u,v))
  – f1(x,y) ≠ f2(x,y) may still give |F1(u,v)| = |F2(u,v)|
  – In fact, most information is contained in the phase φ(u,v):
    • Given a constant A, IFT{A exp(jφ(u,v))} ≈ f(x,y)
    • But IFT{|F(u,v)| exp(jA)} ≈ nothing recognizable
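A small numpy experiment consistent with this caution: reconstructing from phase alone stays correlated with the image, while magnitude alone does not (the random test image is an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.random((32, 32))                 # stand-in image
F = np.fft.fft2(f)
mag, phase = np.abs(F), np.angle(F)

# Phase only (constant magnitude A = 1): still correlates with the original
phase_only = np.fft.ifft2(1.0 * np.exp(1j * phase)).real
# Magnitude only (constant phase A = 0): essentially unrelated to the original
mag_only = np.fft.ifft2(mag * np.exp(1j * 0.0)).real

corr = lambda a, b: np.corrcoef(a.ravel(), b.ravel())[0, 1]
print(f"phase only:     {corr(f, phase_only):.2f}")   # noticeably positive
print(f"magnitude only: {corr(f, mag_only):.2f}")     # near zero
```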

Page 40: CS338

Color Fundamentals

• Colorimetry --- the psychophysics of color perception
• Basic result: Trichromatic Theory --- most of the colors observed in daily life can be perfectly reproduced by a mixture of three fixed colors, and the proportions of the mixture are uniquely determined
  – Only “most of the colors”, not all colors --- there are exceptions
  – “Mixture” means algebraic addition, i.e., components could be negative
  – The three “basic” colors are called primaries
  – The choice of primaries is broad, not unique, as long as they are independent, i.e., none of them may be obtained from the other two
  – Mixtures obey addition and proportionality laws, i.e., if x, y, z are the primaries, then:
    • U = ax+by+cz, V = mx+ny+pz ⇒ U+V = (a+m)x + (b+n)y + (c+p)z
    • U = ax+by+cz ⇒ sU = sax + sby + scz
  – Metamerism: different spectral energy distributions can yield an identical color

Page 41: CS338

Video

Page 42: CS338

MPEG Standard

• Only specified as a standard --- actual CODECs are implemented by many different algorithms, most of them proprietary
• MPEG algorithms are intended for both classes of applications:
  – Asymmetric: frequent use of the decompression process while the compression process is performed once (e.g., movies on demand, electronic publishing, e-education, distance learning)
  – Symmetric: equal use of the compression and decompression processes (e.g., multimedia mail, video conferencing)
• Decoding is easy:
  – MPEG-1 decodes in software on most platforms
  – Hardware decoders widely available at low prices
  – Windows graphics accelerators with MPEG decoding now entering the market (e.g., Diamond)
• Encoding is expensive:
  – Sequential software encoders run about 20:1 relative to real time
  – Real-time encoders use parallel processing
  – Real-time hardware encoders are expensive
• The MPEG standard consists of 3 parts:
  – Synchronization and multiplexing of video and audio
  – Video
  – Audio

Page 43: CS338

Compression Mode of H.261*

• Selection depends on the answers to several key questions:
  – Should a motion compensation (MC) vector be transmitted?
  – Inter vs. intra compression?
  – Should the quantizer step size be changed?
• Specifically, selection is based on the following values:
  – The variance of the original macroblock
  – The macroblock difference (bd)
  – The displaced macroblock difference (dbd)
• Selection algorithm:
  – If dbd < bd, as determined by a threshold, select mode Inter + MC, and the motion vector (MV) is transmitted as side information
  – Else, the MV is not transmitted; if the original MB has the smaller variance, select Intra mode, where the DCT of each 8x8 block of the original picture elements is computed; else, select Inter mode (with zero MV), and the difference blocks (prediction error) are DCT encoded

Page 44: CS338

H.261 Coding Scheme*

• In each MB, each block is 64-point DCT coded; this applies to the four luminance blocks and the two chroma blocks (U and V)
• A variable thresholding is applied before quantization to increase the number of zero coefficients; the accuracy of the coefficients is 12 bits, with dynamic range in [−2048, 2047]
• Within a MB the same quantizer is used for all coefficients except the Intra DC; the same quantizer is used for both luminance and chrominance coding; the Intra DC coefficient is quantized separately
• After quantization, coefficients are zigzag scanned and coded as a series of pairs (run length of zeros preceding the coefficient, coefficient value) --- sketched below
  – Example: 3 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 … → (0, 3), (7, 2), (23, 1), EOB
• In most implementations the quantizer step size is adjusted based on a measure of buffer fullness to obtain the desired bit rate; the buffer size is chosen so as not to exceed the maximum allowable coding delay (150 ms)
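A sketch of the run-length pairing applied to the example sequence above:

```python
def run_length_code(coeffs):
    """(run of zeros, value) pairs for a zigzag-scanned coefficient block."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")     # trailing zeros collapse into end-of-block
    return pairs

coeffs = [3] + [0]*7 + [2] + [0]*23 + [1] + [0]*7
print(run_length_code(coeffs))   # [(0, 3), (7, 2), (23, 1), 'EOB']
```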

Page 45: CS338

Comparison b/w H.261 and MPEG-1*

  H.261                                     | MPEG-1
  ------------------------------------------+---------------------------------
  Sequential access                         | Random access
  One basic frame rate                      | Flexible frame rate
  CIF and QCIF images only                  | Flexible image size
  I and P frames only                       | I, P, and B frames
  MC over 1 frame                           | MC over 1 or more frames
  1 pixel MV accuracy                       | ½ pixel MV accuracy
  Variable threshold + uniform quantization | Quantization matrix (predefined)
  No GOP structure                          | GOP structure
  GOB structure                             | Slice structure

Page 46: CS338

Differences b/w MPEG-2 and MPEG-1 Video*

• Bandwidth requirement: MPEG-1 --- 1.2 Mbps; MPEG-2 --- 2-20 Mbps
• MB structures: alternative subsampling of the chroma channels gives 3 subsampling formats:

  4:2:0 --- 1 MB = 6 blocks (4Y, 1Cr, 1Cb)
  4:2:2 --- 1 MB = 8 blocks (4Y, 2Cr, 2Cb)
  4:4:4 --- 1 MB = 12 blocks (4Y, 4Cr, 4Cb)

• MPEG-2 accepts both progressive and interlaced inputs:
  – Progressive video: like MPEG-1, all pictures are frame pictures
  – Interlaced video: the encoder input consists of a sequence of fields; two options:
    • Every field is encoded independently (field pictures)
    • Two fields are encoded together as a composite frame (frame pictures)
    • It is allowed to switch between frame pictures and field pictures on a frame-by-frame basis; frame encoding is preferred for relatively still images, while field encoding is preferred for images with significant motion

Page 47: CS338

MPEG-4

• Finalized in October of 1998; available as a standard in early 1999
• Technical features:
  – Represent units of aural, visual, or audiovisual content, called “media objects”. These media objects can be of natural or synthetic origin; this means they could be recorded with a camera or microphone, or generated with a computer
  – Describe the composition of these objects to create compound media objects that form audiovisual scenes
  – Multiplex and synchronize the data associated with media objects, so that they can be transported over network channels providing a QoS appropriate to the nature of the specific media objects
  – Interact with the audiovisual scene generated at the receiver’s end
• Enables:
  – Authors to produce content with greater reusability and flexibility
  – Network service providers to have transparent information with which to maintain QoS
  – End users to interact with content at higher levels, within the limits set by the authors

Page 48: CS338

MPEG History

• MPEG-1 was targeted at video CD-ROM
• MPEG-2 was targeted at digital television
• MPEG-3 was initiated for HDTV, but was later found to be absorbed into MPEG-2, so it was abandoned
• MPEG-4 is targeted at providing the standardized technological elements enabling the integration of the production, distribution, and content access paradigms of the fields of digital television, interactive graphics, and interactive multimedia
• MPEG-7, formally named “Multimedia Content Description Interface”, is targeted at creating a standard for describing multimedia content data that will support some degree of interpretation of the information’s meaning, which can be passed onto, or accessed by, a device or computer code; MPEG-7 is not aimed at any one application in particular; rather, the elements that MPEG-7 standardizes will support as broad a range of applications as possible
• MPEG-21 is now under design and review

Page 49: CS338

Buffer Retrieval Scheduling*

• FCFS: given an interval of time, process data (sectors) in the order of request arrival
  – Seek time = Σ_{i=1..k} |s_i − s_(i−1)| / v
    • s_i --- sector location of the i-th request served
    • v --- head velocity
• SCAN: given an interval of time, first sort all the requests by sector number, then process the requests from the beginning
  – Seek time is much smaller than that of FCFS, but needs sorting
  – Example: requests 25, 5, 35, 5, 10, with the head initially at sector 1
• SCAN-EDF: also consider the deadlines of requests; process requests with earlier deadlines first; within each “group” of requests sharing a deadline, use SCAN
  – Example:

    Job ID  Sector  Deadline
      1       15       10
      2       20        5
      3       10       10
      4       35       10
      5       50        5

    Process group {2, 5} first, then {1, 3, 4}; within each group, use SCAN
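A minimal sketch of the SCAN-EDF ordering; the job list is the example above:

```python
from itertools import groupby

def scan_edf(requests):
    """SCAN-EDF order: earliest deadline first; ties broken by SCAN
    (ascending sector order within each deadline group).
    requests: list of (job_id, sector, deadline) triples."""
    by_deadline = sorted(requests, key=lambda r: r[2])
    order = []
    for _, group in groupby(by_deadline, key=lambda r: r[2]):
        order.extend(sorted(group, key=lambda r: r[1]))   # SCAN within group
    return [job for job, _, _ in order]

jobs = [(1, 15, 10), (2, 20, 5), (3, 10, 10), (4, 35, 10), (5, 50, 5)]
print(scan_edf(jobs))   # [2, 5, 3, 1, 4]
```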

Page 50: CS338

Placement Algorithms*

• How to lay out the data over a CD-ROM to optimize retrieval?
• A real-time file (RTF) f is a triple (lf, bf, pf) such that:
  – lf: number of blocks of the file f
  – bf: number of sectors in each block of the file f
  – pf: period of the file f, i.e., the number of sectors from the start of one block to the start of the next
  – Example: (4, 2, 7)
• Start Assignment Problem (SAP):
  – Given the start position st(f) of an RTF, the sectors of block i are:
    occ_i(f) = {j | st(f) + (i−1)·pf ≤ j ≤ st(f) + (i−1)·pf + bf − 1}
  – Note that the sectors are numbered from 0
  – All the sectors of f are: occ(f) = ∪_{i=1..lf} occ_i(f)
  – Example: (4, 2, 7), st(f) = 3: occ_1(f) = {3, 4}, occ_2(f) = {10, 11}, occ_3(f) = {17, 18}, occ_4(f) = {24, 25}
  – Non-collision Axiom: for all fi, fj ∈ F (the RTFs mapped to sectors {1, …, N}), fi ≠ fj ⇒ occ(fi) ∩ occ(fj) = ∅. If such a function st exists, it is called a placement function
  – SAP tries to find st(fi) for all fi ∈ F such that the non-collision axiom holds --- an NP-hard problem in general
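A small sketch of occ(f) and the non-collision check (SAP itself, i.e., searching for the start positions, is NP-hard and not attempted here); the second file and both start positions are assumptions:

```python
def occ(st, lf, bf, pf):
    """All sectors occupied by an RTF (lf, bf, pf) placed at start sector st."""
    return {st + (i - 1) * pf + s for i in range(1, lf + 1) for s in range(bf)}

def is_placement(files, starts):
    """Check the non-collision axiom for a candidate start assignment."""
    occs = [occ(s, *f) for f, s in zip(files, starts)]
    return all(occs[i].isdisjoint(occs[j])
               for i in range(len(occs)) for j in range(i + 1, len(occs)))

f1 = (4, 2, 7)                     # the slides' example RTF
print(sorted(occ(3, *f1)))         # [3, 4, 10, 11, 17, 18, 24, 25]
f2 = (3, 2, 7)                     # hypothetical second RTF
print(is_placement([f1, f2], [3, 5]))   # True: blocks fill the gap sectors
```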