June 6, 2006. The MPEG-7 Multimedia Content Description Interface. Anastasia Bolovinou, PhD candidate, Institute of Informatics and Telecommunications, NCSR Demokritos

The MPEG-7 Multimedia Content Description Interface

Jan 28, 2016

Transcript
Page 1: The MPEG-7 Multimedia Content Description Interface

June 6, 2006

The MPEG-7

Multimedia Content Description Interface

Anastasia Bolovinou,

PhD candidate, Institute of Informatics and Telecommunications

NCSR Demokritos

Page 2: The MPEG-7 Multimedia Content Description Interface

2

Outline

• MPEG-7 motivation and scope
• Visual Descriptors (color, texture, shape)
• MPEG-7 retrieval evaluation criterion
• Similarity measures and MPEG-7 visual descriptors
• Building MPEG-7 Descriptors and Description Schemes with the Description Definition Language
• MPEG-7 VXM current state
• Towards the MPEG-7 Query Format Framework (queries and visual descriptor tools employed by the queries)
• Summary

Page 3: The MPEG-7 Multimedia Content Description Interface

3

Proliferation of audio-visual content

MPEG-7 motivation and design scenarios (possible queries)

• Music/audio: play a few notes and return music with similar audio content

• Images/graphics: draw a sketch and return images with similar graphics

• Text/keywords: find AV material with subject corresponding to a keyword

• Movement: describe movements and return video clips with the specified temporal and spatial relations

• Scenario: describe actions and return scenarios where similar actions take place

Standardize multimedia metadata descriptions (to facilitate content-based multimedia retrieval) for various types of audiovisual information:

• Consumer content
• News
• Sports
• Scientific content
• Digital art galleries
• Recorded material

Page 4: The MPEG-7 Multimedia Content Description Interface

4

Scope of the Standard

Description production (extraction) -> Standard description (normative part of the MPEG-7 standard) -> Description consumption

* MPEG-7 does not specify (non-normative parts of MPEG-7):

- How to extract descriptions (feature extraction, indexing process, annotation & authoring tools, ...)
- How to use descriptions (search engine, filtering tool, retrieval process, browsing device, ...)
- The similarity between contents

-> The goal is to define the minimum that enables interoperability.

Page 5: The MPEG-7 Multimedia Content Description Interface

5

Information flow

Page 6: The MPEG-7 Multimedia Content Description Interface

6

Visual Descriptors

• Color Descriptors: Dominant Color, Scalable Color, Color Layout, Color Structure, GoF/GoP Color

• Texture Descriptors: Homogeneous Texture, Texture Browsing, Edge Histogram

• Shape Descriptors: Region Shape, Contour Shape, 3D Shape

• Motion Descriptors for Video: Camera Motion, Motion Trajectory, Parametric Motion, Motion Activity

• Localization: Region Locator, Spatio-Temporal Locator

• Other: Face Recognition

(Normative, basic, for localization)

Page 7: The MPEG-7 Multimedia Content Description Interface

7

Color Descriptors

• Dominant Color
• Scalable Color (HSV space)
• Color Structure (HMMD space)
• Color Layout (YCbCr space)
• GroupOfFrames/Pictures

Constrained color spaces:
-> Scalable Color Descriptor uses HSV
-> Color Structure Descriptor uses HMMD

• Color Space:
- R, G, B
- Y, Cr, Cb
- H, S, V
- Monochrome
- Linear transformation of R, G, B
- HMMD

Page 8: The MPEG-7 Multimedia Content Description Interface

8

Scalable Color Descriptor (SCD)

• A color histogram in HSV color space

• Encoded by a Haar transform

• Feature vector: {NoCoef, NoBD, Coeff[..], CoeffSign[..]}
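The two steps above can be sketched as follows. This is only an illustration under stated assumptions: a uniform 16 x 4 x 4 (H x S x V) quantization into 256 bins and an unnormalized sum/difference Haar transform. The function names are hypothetical, and the normative bin-value quantization and coefficient ordering of the standard are not reproduced.

```python
import colorsys


def hsv_histogram(pixels, h_bins=16, s_bins=4, v_bins=4):
    """Count 8-bit RGB pixels in a uniformly quantized HSV histogram (assumed layout)."""
    hist = [0] * (h_bins * s_bins * v_bins)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1
    return hist


def haar_1d(values):
    """Unnormalized 1-D Haar transform; the input length must be a power of two."""
    out = list(values)
    n = len(out)
    while n > 1:
        half = n // 2
        sums = [out[2 * i] + out[2 * i + 1] for i in range(half)]
        diffs = [out[2 * i] - out[2 * i + 1] for i in range(half)]
        out[:n] = sums + diffs  # low-pass part first, high-pass part after it
        n = half
    return out


if __name__ == "__main__":
    pixels = [(255, 0, 0)] * 3 + [(0, 0, 255)]
    coeffs = haar_1d(hsv_histogram(pixels))
    print(coeffs[0])  # first coefficient is the total pixel count
```

Keeping only the first few coefficients (and fewer bits per coefficient) gives a coarser but shorter description, which is the sense in which the descriptor is scalable.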

Page 9: The MPEG-7 Multimedia Content Description Interface

9

SCD extraction

Figure labels: to 4 bits/bin; to 11 bits/bin; N bits/bin (#bins < 256)

Page 10: The MPEG-7 Multimedia Content Description Interface

10

GoF/GoP Color Descriptor

• Extends the Scalable Color Descriptor to a video segment or a group of pictures (the joint color histogram is then processed as in the SCD, with Haar transform encoding)

• Histogram aggregation methods:
– Average: sensitive to outliers (lighting changes, occlusion, text overlays)
– Median: increased computational complexity because of sorting
– Intersection: differs from the other two by taking a “least common” color trait viewpoint

Extraction
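The three aggregation methods listed above can be sketched bin by bin. This is a minimal illustration, assuming each frame histogram is a list of equal length; the function name is hypothetical.

```python
from statistics import median


def aggregate_histograms(histograms, method="average"):
    """Combine per-frame color histograms into one joint histogram, bin by bin."""
    columns = list(zip(*histograms))  # one tuple of frame values per bin
    if method == "average":
        return [sum(col) / len(col) for col in columns]
    if method == "median":
        return [median(col) for col in columns]
    if method == "intersection":
        # Keeps only the color mass that every frame has in common.
        return [min(col) for col in columns]
    raise ValueError("unknown aggregation method: %s" % method)


if __name__ == "__main__":
    frames = [[4, 0, 2], [2, 2, 2], [9, 1, 2]]
    for name in ("average", "median", "intersection"):
        print(name, aggregate_histograms(frames, name))
```

In the example, the outlier value 9 in the first bin pulls the average up but leaves the median and the intersection untouched.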

Page 11: The MPEG-7 Multimedia Content Description Interface

12

Dominant Color Descriptor (DCD)

• Clustering colors into a small number of representative colors (salient colors)

• F = { {ci, pi, vi}, s}

• ci : Representative colors

• pi : Their percentages in the region

• vi : Color variances

• s : Spatial coherency

Page 12: The MPEG-7 Multimedia Content Description Interface

13

DCD Extraction (based on the generalized Lloyd algorithm)

ci : centroid of cluster i;

x(n) : color vector at pixel n;

v(n) : perceptual weight for pixel n. Human visual perception (H.V.P.) is more sensitive to changes in smooth regions.

+ spatial coherency: average number of connecting pixels of a dominant color, computed using a 3x3 masking window
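The clustering equations did not survive in the transcript. As a hedged reconstruction from the symbol definitions above, a weighted generalized Lloyd iteration is usually written as below; the cluster symbol C_i is an assumption introduced here for illustration.

```latex
% Distortion of cluster C_i, weighted by the perceptual weight v(n)
D_i = \sum_{x(n) \in C_i} v(n)\, \lVert x(n) - c_i \rVert^{2}

% Centroid update that minimizes D_i for a fixed assignment of pixels to clusters
c_i = \frac{\sum_{x(n) \in C_i} v(n)\, x(n)}{\sum_{x(n) \in C_i} v(n)}
```

With all weights v(n) equal, this reduces to the ordinary k-means centroid update.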

Page 13: The MPEG-7 Multimedia Content Description Interface

14

• http://debut.cis.nctu.edu.tw/Demo/ContentBasedVideoRetrieval/CBVR/Dominant/index.html

Page 14: The MPEG-7 Multimedia Content Description Interface

15

Color Layout Descriptor (CLD)

• Partitioning the image into 64 (8x8) blocks

• Deriving the average color of each block (or using DCD)

• Applying an (8x8) DCT and encoding

• Efficient for
– Sketch-based image retrieval
– Content filtering using image indexing

Page 15: The MPEG-7 Multimedia Content Description Interface

16

CLD extraction

-> The derived average colors are transformed into a series of coefficients by performing a DCT (data in the spatial domain -> data in the frequency domain).

-> If the spatial-domain data is smooth (little variation), the DCT produces large low-frequency coefficients and small high-frequency coefficients.

-> A few low-frequency coefficients are selected using zigzag scanning and quantized to form the CLD (small quantization step for the DC coefficient, large quantization step for the AC coefficients).

-> The color space adopted for CLD is YCbCr.

F = {CoefPattern, YDCCoef, CbDCCoef, CrDCCoef, YACCoef, CbACCoef, CrACCoef}
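The DCT and zigzag steps above can be sketched for one color channel. This is a sketch under assumptions: the input is the 8x8 grid of block averages, the DCT is the orthonormal DCT-II, and the quantization step is left out. All function names are hypothetical.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def zigzag_indices(n=8):
    """(row, col) pairs in zigzag order, starting at the DC coefficient."""
    order = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run from bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def cld_coefficients(avg_grid, n_coeffs=6):
    """First n_coeffs DCT coefficients of one channel, in zigzag order."""
    coeffs = dct_2d(avg_grid)
    return [coeffs[r][c] for r, c in zigzag_indices(len(avg_grid))[:n_coeffs]]


if __name__ == "__main__":
    flat = [[10.0] * 8 for _ in range(8)]
    print(cld_coefficients(flat, 6))  # a flat channel has only a DC component
```

A perfectly flat channel puts all of its energy in the DC coefficient, which is why a handful of low-frequency coefficients is enough for smooth layouts.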

Page 16: The MPEG-7 Multimedia Content Description Interface

17

Color Structure Descriptor (CSD)

• Scanning the image with an 8x8 structuring element
• Counting the number of structuring-element positions (blocks) that contain each color
• Generating a color histogram (HMMD / 4 CSQ operating points)

Figure: 8 x 8 structuring element over the image; color bins C0 to C7, with C1, C3 and C7 each incremented by 1 for the position shown
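The counting rule above can be sketched as follows, assuming the image has already been quantized to color indices 0 .. n_colors-1. The function name and the `element` parameter are hypothetical; subsampling and bin-value quantization are omitted.

```python
def color_structure_histogram(index_image, n_colors, element=8):
    """For each color, count the window positions in which it occurs at least once."""
    height = len(index_image)
    width = len(index_image[0])
    hist = [0] * n_colors
    for top in range(height - element + 1):
        for left in range(width - element + 1):
            present = set()
            for row in index_image[top:top + element]:
                present.update(row[left:left + element])
            for color in present:
                hist[color] += 1  # +1 per window, however many pixels have this color
    return hist


if __name__ == "__main__":
    image = [
        [0, 0, 1],
        [0, 0, 1],
        [2, 2, 2],
    ]
    # A 2x2 element is used here only to keep the example small.
    print(color_structure_histogram(image, n_colors=3, element=2))
```

Unlike an ordinary histogram, a color scattered across the image scores higher than the same number of pixels packed into one blob, because it appears in more window positions.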

Page 17: The MPEG-7 Multimedia Content Description Interface

18

CSD extraction

If

Then sub sampling factor p is given by:

F = {colQuant, Values[m]}
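The condition and the formula for the subsampling factor are missing from the transcript. The sketch below uses the form commonly quoted in the MPEG-7 color descriptor literature, p = max(0, round(0.5 * log2(W * H) - 8)), with subsampling factor K = 2^p and structuring-element extent E = 8K. Treat it as an assumption rather than a reconstruction of the slide; the function name is hypothetical.

```python
import math


def structuring_element_params(width, height):
    """Return (p, K, E): exponent, subsampling factor, structuring-element extent."""
    # Assumed formula, not recovered from the slide itself.
    p = max(0, round(0.5 * math.log2(width * height) - 8))
    k = 2 ** p
    e = 8 * k
    return p, k, e


if __name__ == "__main__":
    for size in ((320, 240), (640, 480)):
        print(size, structuring_element_params(*size))
```

Under this assumption a 320x240 image is scanned with the plain 8x8 element, while a 640x480 image uses a 16x16 extent subsampled by 2, so 64 samples are examined per position in both cases.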

Page 18: The MPEG-7 Multimedia Content Description Interface

19

CSD scaling

Page 19: The MPEG-7 Multimedia Content Description Interface

20

Texture Descriptors

• Homogeneous Texture Descriptor
• Non-Homogeneous Texture Descriptor (Edge Histogram)
• Texture Browsing

Page 20: The MPEG-7 Multimedia Content Description Interface

21

Homogeneous Texture Descriptor (HTD)

• Partitioning the frequency domain into 30 channels (modeled by a 2D-Gabor function)

• Computing the energy and energy deviation for each channel

• Computing the mean and standard deviation of the frequency coefficients

-> F = {fDC, fSD, e1,…, e30, d1,…, d30}

• An efficient implementation:
– Radon transform followed by Fourier transform

Page 21: The MPEG-7 Multimedia Content Description Interface

22

HTD Extraction – How to get a 2-D frequency layout following the HVS

2-D image f(x,y) -> Radon transform -> 1-D projections P(R, θ) -> 1-D Fourier transform F(P(R, θ)) -> resulting sampling grid in polar coordinates

Page 22: The MPEG-7 Multimedia Content Description Interface

23

HTD Extraction - Data sampling in feature channel

-> A 2D-Gabor function is deployed to define the Gabor filter banks

• It is a Gaussian-weighted sinusoid

• It is used to model individual channels

• Each channel filters a specific type of texture
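To make "Gaussian-weighted sinusoid" concrete, here is a minimal spatial-domain sketch: an isotropic Gaussian envelope multiplied by a cosine carrier. The parameter names are hypothetical, and the descriptor itself models its channels in the polar frequency domain, which this sketch does not reproduce.

```python
import math


def gabor_2d(x, y, frequency, theta, sigma):
    """Real part of a 2-D Gabor function at (x, y)."""
    # Coordinate along the carrier direction theta (radians).
    x_rot = x * math.cos(theta) + y * math.sin(theta)
    envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
    carrier = math.cos(2.0 * math.pi * frequency * x_rot)
    return envelope * carrier


if __name__ == "__main__":
    # One row of a small kernel: frequency 0.1 cycles/pixel, horizontal carrier.
    print([round(gabor_2d(x, 0, 0.1, 0.0, 2.0), 3) for x in range(-4, 5)])
```

Changing `theta` selects the orientation and `frequency` the scale, which is how a bank of such filters ends up with one channel per (scale, orientation) pair.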

Page 23: The MPEG-7 Multimedia Content Description Interface

25

HTD properties

One can perform

• Rotation-invariant matching

• Intensity-invariant matching (fDC removed from the feature vector)

• Scale-invariant matching

F = {fDC, fSD, e1,…, e30, d1,…, d30}
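Rotation-invariant matching can be sketched as a search over circular shifts of the orientation index. The sketch assumes the 30 energies are ordered as 5 scales x 6 orientations (index 6*s + r) and compares them with a plain L1 distance; the deviations d1..d30 and any normalization are left out, and the function names are hypothetical.

```python
def shift_orientations(values, shift, n_scales=5, n_orient=6):
    """Circularly shift the orientation index inside every scale band."""
    out = []
    for s in range(n_scales):
        band = values[s * n_orient:(s + 1) * n_orient]
        out.extend(band[(r - shift) % n_orient] for r in range(n_orient))
    return out


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def rotation_invariant_distance(energies_a, energies_b, n_scales=5, n_orient=6):
    """Smallest L1 distance over all orientation shifts of the second vector."""
    return min(
        l1_distance(energies_a, shift_orientations(energies_b, k, n_scales, n_orient))
        for k in range(n_orient)
    )


if __name__ == "__main__":
    a = [float(i) for i in range(30)]
    b = shift_orientations(a, 2)  # same texture, rotated by two orientation steps
    print(l1_distance(a, b), rotation_invariant_distance(a, b))
```

Scale-invariant matching can be approached the same way by shifting the scale index instead of the orientation index.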

Page 24: The MPEG-7 Multimedia Content Description Interface

26

Texture Browsing Descriptor

-> Same spatial filtering procedure as the HTD: scale- and orientation-selective band-pass filters

• Regularity (periodic to random)
• Coarseness (fine grain to coarse)
• Directionality (/30°)

-> The texture browsing descriptor can be used to find a set of candidates with similar perceptual properties; the HTD is then used to get a precise similarity match list among the candidate images.

e.g. look for textures that are very regular and oriented at 30°

Page 25: The MPEG-7 Multimedia Content Description Interface

27

Edge Histogram Descriptor (EHD)

• Represents the spatial distribution of five types of edges
– vertical, horizontal, 45°, 135°, and non-directional

• Dividing the image into 16 (4x4) blocks

• Generating a 5-bin histogram for each block

• It is scale invariant

• Strong edges are retained by thresholding the Canny edge operator

• F = {BinCounts[k]}, k = 80 (16 blocks x 5 edge types)

Page 26: The MPEG-7 Multimedia Content Description Interface

28

EHD extraction

Edge map image using the “Canny” edge operator

• Basic (80 bins): local histograms
• Extended (150 bins): basic + global + semi-global (+13 clusters for semi-global)
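The bin arithmetic behind the extended histogram (80 local + 5 global + 13 x 5 semi-global = 150) can be sketched as below. The layout of the 13 clusters (4 rows, 4 columns and five 2x2 groups) and the use of a mean as the aggregation are assumptions taken from common descriptions of the extended EHD, not from the slide; the function name is hypothetical.

```python
def extended_edge_histogram(local_bins):
    """Append 5 global and 65 semi-global bins to the 80 local bins."""
    if len(local_bins) != 80:
        raise ValueError("expected 16 blocks x 5 edge types = 80 bins")

    def block(r, c):
        start = (r * 4 + c) * 5
        return local_bins[start:start + 5]

    def mean_of(cells):
        return [sum(block(r, c)[t] for r, c in cells) / len(cells) for t in range(5)]

    all_cells = [(r, c) for r in range(4) for c in range(4)]
    clusters = [[(r, c) for c in range(4)] for r in range(4)]    # 4 rows
    clusters += [[(r, c) for r in range(4)] for c in range(4)]   # 4 columns
    for r0, c0 in [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1)]:      # five 2x2 groups
        clusters.append([(r0 + dr, c0 + dc) for dr in (0, 1) for dc in (0, 1)])

    global_bins = mean_of(all_cells)
    semi_global_bins = [v for cells in clusters for v in mean_of(cells)]
    return list(local_bins) + global_bins + semi_global_bins


if __name__ == "__main__":
    print(len(extended_edge_histogram([1.0] * 80)))
```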

Page 27: The MPEG-7 Multimedia Content Description Interface

29

EHD evaluation

• Cannot be used for object-based image retrieval

• If Th_edge is set to 0, the EHD applies to binary edge images (sketch-based retrieval)

• The extended EHD achieves better results but does not exhibit the rotation invariance property

Page 28: The MPEG-7 Multimedia Content Description Interface

30

Shape Descriptors

• Region-based Descriptor
• Contour-based Shape Descriptor
• 2D/3D Shape Descriptor
• 3D Shape Descriptor

Page 29: The MPEG-7 Multimedia Content Description Interface

31

Region-based Descriptor (RBD)

• Expresses pixel distribution within a 2-D object region

• Employs a complex 2D-Angular Radial Transformation (ART)

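\[ F_{nm} = \langle V_{nm}(\rho,\theta),\, f(\rho,\theta) \rangle = \int_{0}^{2\pi} \int_{0}^{1} V_{nm}^{*}(\rho,\theta)\, f(\rho,\theta)\, \rho \, d\rho \, d\theta \]

\[ V_{nm}(\rho,\theta) = A_m(\theta)\, R_n(\rho), \qquad A_m(\theta) = \frac{1}{2\pi} \exp(jm\theta) \]

\[ R_n(\rho) = \begin{cases} 1 & n = 0 \\ 2\cos(\pi n \rho) & n \neq 0 \end{cases} \]

m = 0, ..., 11 (12 angular functions)

n = 0, ..., 2 (3 radial functions)

• F = {MagnitudeOfART[k]}, k = n x m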
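A minimal sketch of the ART magnitudes for a small square image, assuming 3 radial and 12 angular basis functions, pixel centers mapped onto the unit disk, and a plain sum over pixels in place of the integral. Normalization by the n = 0, m = 0 coefficient and quantization are omitted, and the function name is hypothetical.

```python
import cmath
import math


def art_magnitudes(image, n_radial=3, n_angular=12):
    """Magnitudes |F_nm| of the ART coefficients of a square image (list of rows)."""
    size = len(image)
    center = (size - 1) / 2.0
    radius = size / 2.0
    coeffs = [[0j] * n_angular for _ in range(n_radial)]
    for row, line in enumerate(image):
        for col, value in enumerate(line):
            if not value:
                continue
            dx = (col - center) / radius
            dy = (row - center) / radius
            rho = math.hypot(dx, dy)
            if rho > 1.0:
                continue  # outside the unit disk
            theta = math.atan2(dy, dx)
            for n in range(n_radial):
                radial = 1.0 if n == 0 else 2.0 * math.cos(math.pi * n * rho)
                for m in range(n_angular):
                    # Conjugate basis function: R_n(rho) * exp(-j m theta) / (2 pi)
                    coeffs[n][m] += value * radial * cmath.exp(-1j * m * theta) / (2.0 * math.pi)
    return [[abs(c) for c in row] for row in coeffs]


if __name__ == "__main__":
    img = [[0] * 8 for _ in range(8)]
    for col in range(2, 6):
        img[2][col] = 1
    img[3][2] = img[4][2] = 1
    rotated = [list(r) for r in zip(*img[::-1])]  # 90-degree rotation
    a, b = art_magnitudes(img), art_magnitudes(rotated)
    print(max(abs(a[n][m] - b[n][m]) for n in range(3) for m in range(12)))
```

Rotating the shape only changes the phase of each coefficient, so the magnitudes stay the same up to rounding, which is why the descriptor stores magnitudes.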

Page 30: The MPEG-7 Multimedia Content Description Interface

32

Region-based Descriptor (2)

• Applicable to figures (a) – (e)
• Distinguishes (i) from (g) and (h)
• (j), (k), and (l) are similar

Advantages:
• Describes complex shapes with disconnected regions
• Robust to segmentation noise
• Small size
• Fast extraction and matching

Page 31: The MPEG-7 Multimedia Content Description Interface

33

Contour-Based Descriptor (CBD)

• It is based on Curvature Scale-Space representation

Page 32: The MPEG-7 Multimedia Content Description Interface

34

Curvature Scale-Space

• Finds curvature zero crossing points of the shape’s contour (key points)

• Reduces the number of key points step by step, by applying Gaussian smoothing

• The positions of key points are expressed relative to the length of the contour curve

Page 33: The MPEG-7 Multimedia Content Description Interface

35

CBD Extraction

xCSS: location of the curvature zero-crossing points

yCSS: filtering pass

Repetitive smoothing of the X and Y contour coordinates by the low-pass kernel (0.25, 0.5, 0.25) until the contour becomes convex

• F = {NofPeaks, GlobalCurv[ecc][circ], PrototypeCurv[ecc][circ], HighestPeakY, peakX[k], peakY[k]}
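The smoothing loop above can be sketched on a closed polygonal contour. The sketch counts curvature zero crossings as sign changes of the turning direction between consecutive edges and applies the (0.25, 0.5, 0.25) kernel until none remain. Peak extraction for the descriptor is not shown; the function names and the star-shaped test contour are hypothetical.

```python
import math


def smooth_closed(values):
    """One pass of the (0.25, 0.5, 0.25) kernel over a closed sequence."""
    n = len(values)
    return [0.25 * values[i - 1] + 0.5 * values[i] + 0.25 * values[(i + 1) % n]
            for i in range(n)]


def curvature_zero_crossings(xs, ys):
    """Number of sign changes of the turning direction along a closed contour."""
    n = len(xs)
    signs = []
    for i in range(n):
        ax, ay = xs[i] - xs[i - 1], ys[i] - ys[i - 1]
        bx, by = xs[(i + 1) % n] - xs[i], ys[(i + 1) % n] - ys[i]
        cross = ax * by - ay * bx
        if cross != 0.0:
            signs.append(cross > 0)
    return sum(1 for i in range(len(signs)) if signs[i] != signs[i - 1])


def smooth_until_convex(xs, ys, max_passes=10000):
    """Smooth both coordinates until no zero crossing is left; return the pass count."""
    passes = 0
    while curvature_zero_crossings(xs, ys) > 0 and passes < max_passes:
        xs, ys = smooth_closed(xs), smooth_closed(ys)
        passes += 1
    return xs, ys, passes


def star_contour(n_points=60, lobes=5, depth=0.3):
    """Hypothetical test shape: a circle with sinusoidal lobes."""
    ts = [2 * math.pi * i / n_points for i in range(n_points)]
    xs = [(1 + depth * math.cos(lobes * t)) * math.cos(t) for t in ts]
    ys = [(1 + depth * math.cos(lobes * t)) * math.sin(t) for t in ts]
    return xs, ys


if __name__ == "__main__":
    xs, ys = star_contour()
    print(curvature_zero_crossings(xs, ys))
    _, _, passes = smooth_until_convex(xs, ys)
    print(passes)
```

The pass at which each pair of zero crossings disappears is what the yCSS axis of the CSS image records.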

Page 34: The MPEG-7 Multimedia Content Description Interface

36

CBD Applicability

• Applicable to (a)
• Distinguishes differences in (b)
• Finds similarities in (c) - (e)

Advantages:
• Captures the shape very well
• Robust to noise, scale, and orientation
• It is fast and compact

Page 35: The MPEG-7 Multimedia Content Description Interface

37

Comparison (RB/CB descriptors)

• Blue: Similar shapes by Region-Based
• Yellow: Similar shapes by Contour-Based

Page 36: The MPEG-7 Multimedia Content Description Interface

38

How does MPEG-7 compare descriptors?

ANMRR (average normalized modified retrieval rank):

- a normalized measure that takes into account the different sizes of the ground truth sets and the actual ranks obtained from the retrieval -> retrievals that miss items are assigned a penalty.

Traditional metric
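The ANMRR formulas are not in the transcript. The sketch below follows the definition usually given for the MPEG-7 core experiments, taken here as an assumption: for a query with NG ground-truth items, K = min(4 NG, 2 GTM), where GTM is the largest ground-truth set size over all queries, and any item ranked beyond K (or not retrieved) is assigned the rank 1.25 K. Function names are hypothetical.

```python
def nmrr(ranks, gtm):
    """Normalized modified retrieval rank for one query.

    ranks: 1-based retrieval rank of each ground-truth item, or None if missed.
    gtm:   largest ground-truth set size over all queries.
    """
    ng = len(ranks)
    k = min(4 * ng, 2 * gtm)
    penalized = [r if (r is not None and r <= k) else 1.25 * k for r in ranks]
    avr = sum(penalized) / ng               # average rank
    mrr = avr - 0.5 * (1 + ng)              # 0 when every item is ranked first
    return mrr / (1.25 * k - 0.5 * (1 + ng))


def anmrr(queries):
    """Average NMRR over all queries: 0 is perfect retrieval, 1 means nothing found."""
    gtm = max(len(ranks) for ranks in queries)
    return sum(nmrr(ranks, gtm) for ranks in queries) / len(queries)


if __name__ == "__main__":
    perfect = [1, 2, 3, 4]
    missed = [None, None, None, None]
    print(nmrr(perfect, 4), nmrr(missed, 4), anmrr([perfect, missed]))
```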

Page 37: The MPEG-7 Multimedia Content Description Interface

39

Similarity between features

• Descriptors are typically multidimensional vectors (of low-level features)

• Similarity of two images in the vector feature space:

– the range query: all the points within a hyperrectangle aligned with the coordinate axes
– the nearest-neighbour or within-distance (α-cut) query: a particular metric in the feature space
– dissimilarity between statistical distributions: the same metrics or specific measures
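The first two query types can be sketched over a small in-memory list of feature vectors. Euclidean distance is used here only as an example metric, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query(points, lower, upper):
    """All points inside the axis-aligned hyperrectangle [lower, upper]."""
    return [p for p in points
            if all(lo <= v <= hi for v, lo, hi in zip(p, lower, upper))]


def nearest_neighbours(points, query, k=1, metric=euclidean):
    """The k points closest to the query under the given metric."""
    return sorted(points, key=lambda p: metric(p, query))[:k]


def within_distance(points, query, alpha, metric=euclidean):
    """Alpha-cut query: all points at distance <= alpha from the query."""
    return [p for p in points if metric(p, query) <= alpha]


if __name__ == "__main__":
    points = [(0, 0), (1, 1), (3, 3), (5, 0)]
    print(range_query(points, (0, 0), (2, 2)))
    print(nearest_neighbours(points, (2.9, 3.2), k=1))
    print(within_distance(points, (0, 0), 1.5))
```

The range query needs no metric at all, while the other two depend entirely on the metric chosen for the descriptor.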

Page 38: The MPEG-7 Multimedia Content Description Interface

40

• http://nayana.ece.ucsb.edu/M7TextureDemo/Demo/client/M7TextureDemo.html

An example of CBIR system using HTD performing range query and NN query

Page 39: The MPEG-7 Multimedia Content Description Interface

41

Criticism of MPEG-7 distance measures

• MPEG-7 adopts feature vector space distances based on geometric assumptions about the descriptor space, e.g.

..but these quantitative measures (low-level information) do not fit ideally with human similarity perception

-> researchers from other areas have developed alternative predicate-based models (descriptors are assumed to contain just binary elements, as opposed to continuous data), which express the existence of properties and convey high-level information

See “Pattern difference”: \( bc / K^{2} \)

K: number of predicates in the data vectors Xi, Xj

b: number of predicates whose property exists only in Xi

c: number of predicates whose property exists only in Xj
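A minimal sketch of the pattern difference for two binary predicate vectors, assuming b counts properties present in Xi but not in Xj and c counts the reverse. The function name is hypothetical.

```python
def pattern_difference(xi, xj):
    """Pattern difference b*c / K^2 between two equal-length binary vectors."""
    if len(xi) != len(xj):
        raise ValueError("vectors must have the same number of predicates")
    k = len(xi)
    b = sum(1 for p, q in zip(xi, xj) if p and not q)  # property only in xi
    c = sum(1 for p, q in zip(xi, xj) if q and not p)  # property only in xj
    return (b * c) / (k * k)


if __name__ == "__main__":
    print(pattern_difference([1, 1, 0, 0], [1, 0, 1, 0]))
```

The measure is zero whenever one vector's properties are a subset of the other's, which is quite different from how a geometric distance behaves.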

Page 40: The MPEG-7 Multimedia Content Description Interface

44

How to build and deploy an MPEG-7 Description

A description = a Description Scheme (structure) + a set of Descriptor Values (instantiation of a Descriptor for a given data set), written in the DDL

MPEG-7 Description Tools are a library of standardized Descriptors and Description Schemes

Adopting XML Schema as the basis for the MPEG-7 DDL, together with the resulting XML-compliant instances (Descriptions in MPEG-7 textual format), eases interoperability by using a common, generic, powerful and extensible representation format

Page 41: The MPEG-7 Multimedia Content Description Interface

45

How that works

Description Definition Language:

-> XML Schema (flexibility)
- XML Schema structural language components
- XML Schema datatype language components
- MPEG-7 specific extensions: support for vectors, matrices and typed references

+

-> Binary version (efficiency)

Formats: text format (XML), BiM format, mix

Page 42: The MPEG-7 Multimedia Content Description Interface

47

Descriptions enabled by the MPEG-7 tools

Content description Tools: Perceptual Descriptions

- content’s spatio-temporal structure
- info on low-level features
- semantic info related to the reality captured by the content

Content management Tools: Archival-oriented Descriptions

- content’s creation/production
- info on using the content
- info on storing and representing the content

Organization/Navigation/Access/User Interaction Tools: additional info for organizing, managing and accessing the content

- how objects are related and gathered in collections
- summaries/variations/transcoding to support efficient browsing
- user interaction info

Page 43: The MPEG-7 Multimedia Content Description Interface

48

Type hierarchy for top-level elements

Page 44: The MPEG-7 Multimedia Content Description Interface

49

<Mpeg7>
  <Description xsi:type="ContentEntity">
    <MultimediaContent xsi:type="VideoType">
      <Video id="video_example">
        <MediaInformation>...</MediaInformation>
        <TemporalDecomposition gap="false" overlap="false">
          <VideoSegment id="VS1">
            <MediaTime>
              <MediaTimePoint>T00:00:00</MediaTimePoint>
              <MediaDuration>PT2M</MediaDuration>
            </MediaTime>
            <VisualDescriptor xsi:type="GoFGoPColorType" aggregation="average">
              <ScalableColor numOfCoef="8" numOfBitplanesDiscarded="0">
                <Coeff>1 2 3 4 5 6 7 8</Coeff>
              </ScalableColor>
            </VisualDescriptor>
          </VideoSegment>
          ……
          </VideoSegment>
        </TemporalDecomposition>
      </Video>
    </MultimediaContent>
  </Description>
</Mpeg7>
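A description like the one above can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a trimmed copy of the example. It is an illustration under assumptions: the copy declares only the `xsi` namespace, the elided parts are dropped, and the MPEG-7 schema namespace and schema validation are not handled. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Trimmed copy of the slide's example, with the xsi prefix declared.
DESCRIPTION = """<Mpeg7 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Description xsi:type="ContentEntity">
    <MultimediaContent xsi:type="VideoType">
      <Video id="video_example">
        <TemporalDecomposition gap="false" overlap="false">
          <VideoSegment id="VS1">
            <VisualDescriptor xsi:type="GoFGoPColorType" aggregation="average">
              <ScalableColor numOfCoef="8" numOfBitplanesDiscarded="0">
                <Coeff>1 2 3 4 5 6 7 8</Coeff>
              </ScalableColor>
            </VisualDescriptor>
          </VideoSegment>
        </TemporalDecomposition>
      </Video>
    </MultimediaContent>
  </Description>
</Mpeg7>"""


def scalable_color_coefficients(xml_text):
    """Map each VideoSegment id to its GoF/GoP ScalableColor coefficients."""
    root = ET.fromstring(xml_text)
    result = {}
    for segment in root.iter("VideoSegment"):
        for descriptor in segment.findall("VisualDescriptor"):
            if descriptor.get("{%s}type" % XSI) != "GoFGoPColorType":
                continue
            coeff = descriptor.find("ScalableColor/Coeff")
            if coeff is not None and coeff.text:
                result[segment.get("id")] = [int(t) for t in coeff.text.split()]
    return result


if __name__ == "__main__":
    print(scalable_color_coefficients(DESCRIPTION))
```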

Page 45: The MPEG-7 Multimedia Content Description Interface

50

What DS to choose..?

MPEG-7 provides DSs for describing the structure and semantics of AV content, plus content management

Content management information can be attached to individual Segments

Page 46: The MPEG-7 Multimedia Content Description Interface

51

Viewpoint of the structure: Segments

Page 47: The MPEG-7 Multimedia Content Description Interface

52

Structure description

Figure labels: Video Segment; Moving region; Video Segments; Moving regions; Segment decomposition; Relation Link (above); attribute lists: Time, Color, Motion, Texture, Shape, Annotation / Time, Mosaic, Annotation

Page 48: The MPEG-7 Multimedia Content Description Interface

53

Segment Decomposition

timeconnectivity

Page 49: The MPEG-7 Multimedia Content Description Interface

54

Content structural aspects (Segment DS tree)

• Annotate the whole image with StillRegion
• Spatial segmentation at different levels
• Among different regions we could use SegmentRelationship description tools

Page 50: The MPEG-7 Multimedia Content Description Interface

55

Content structural aspects

Temporal segments

(Segment Relationship DS graph)

Page 51: The MPEG-7 Multimedia Content Description Interface

57

Content Semantic aspects (SemanticGraph)

Page 52: The MPEG-7 Multimedia Content Description Interface

58

Example of Structure-Semantic Link DS

Page 53: The MPEG-7 Multimedia Content Description Interface

59

Content abstraction aspects (CoAbstr)-Hierarchical summary of a video


- > enables rapid browsing, navigation (also sequential summary)

Page 54: The MPEG-7 Multimedia Content Description Interface

60

(CoAbstr)-Partitions and decompositions(ViewDecomposition DS)

Frequency-space graph

Page 55: The MPEG-7 Multimedia Content Description Interface

61

(CoAbstr) Content Variation

• Universal Multimedia Access: Adapt delivery to network and terminal characteristics

Page 56: The MPEG-7 Multimedia Content Description Interface

62

CoAbstr – A collection (CollectionStructure DS)

-> groups segments, events, or objects into collection clusters and specifies properties that are common to the elements:
• The CollectionStructure DS also describes statistics and models of the attribute values of the elements, such as a mean color histogram for a collection of images.
• The CollectionStructure DS also describes relationships among collection clusters.

Page 57: The MPEG-7 Multimedia Content Description Interface

63

Reference Software: the XM

• XM implements
– MPEG-7 Descriptors (Ds)
– MPEG-7 Description Schemes (DSs)
– Coding Schemes
– DDL

• Applications: extraction, search and retrieval, transcoding, description filtering

Page 58: The MPEG-7 Multimedia Content Description Interface

64

Beyond MPEG-7 version 1 (Ds & DSs in VXM)

ColorTemperature: This descriptor specifies the perceptual temperature feeling of the illumination color in an image, for browsing and display preference control purposes (user friendly). Four perceptual temperature browsing categories are provided: hot, warm, moderate, and cool. Each category is used for browsing images based upon its perceptual meaning. – uses the dominant color descriptor

Illumination Invariant Color: wraps the color descriptors. One or more color descriptors processed by the illumination invariant method can be included in this descriptor.

Shape Variation: can describe shape variations in terms of a Shape Variation Map and the statistics of the region shape description of each binary shape image in the collection. The Shape Variation Map consists of StaticShapeVariation and DynamicShapeVariation. The former corresponds to 35 quantized ART coefficients on a 2-dimensional histogram of a group of shape images, and the latter to the inverse of the histogram except the background.

Media-centric description schemes: Three visual description schemes are designed to describe several types of visual content. The StillRegionFeatureType contains several elementary descriptors to describe the characteristics of arbitrarily shaped still regions.

Page 59: The MPEG-7 Multimedia Content Description Interface

65

Visual CE current phase

• The CE (core experiment) explores new technologies for identifying original images and their modified versions (N-1 modified versions), focusing on the accuracy and robustness of identification

-> robustness is measured as the accuracy (HitRatio = k/N), calculated separately for each level of modification

Modifications:
• Brightness
• Size reduction
• Color to monochrome
• JPEG compression with varying quality factors
• Color reduction
• Crop
• Histogram equalization
• Blur
• Geometric transformation

Page 60: The MPEG-7 Multimedia Content Description Interface

66

Towards MPEG-7 Query Format

-> Although an interface for querying an MPEG-7 database is not yet supported, requirements have been drafted

Client Application <-> MPEG-7 Database

Input Query Format, e.g.
- query by textual description
- combinations of query conditions
- specification of the structure of the result set

Output Query Format, e.g. structure of the response containing the resulting set

Query Management Tools, e.g.
- specification of the exceptions
- relevance feedback

Page 61: The MPEG-7 Multimedia Content Description Interface

67

Basic search functionalities may include:

• Query by Description (the client application provides possible query criteria)
