MediaView -- Towards a Semantic Multimedia Database Model Qing Li Dept of Computer Science City University of Hong Kong.

MediaView -- Towards a “Semantic” Multimedia Database Model

Qing Li, Dept of Computer Science

City University of Hong Kong

Outline

Motivation & Introduction
Modeling Constructs
Logical Implementation
Real-World Applications
Conclusion

State-of-the-art

Multimedia Systems and Applications
  an explosive growth in recent years
  demand for managing multimedia using databases

Database techniques for multimedia
  data modeling
  indexing
  query processing
  presentation & synchronization

“Semantic Gap”

Semantics-intensive multimedia systems & applications require the semantic meaning of the data.

Non-semantic multimedia data models model raw data and primitive properties (size, format, etc).

The mismatch between the two is the semantic gap.

Semantic modeling of multimedia -- Why hard? Context-dependency

Semantics is not a static, intrinsic property of an object. The semantics of an object often depends on:

  the application/user who manipulates the object
  the role that the object plays
  other objects in the same “context”

Example (figure): an object shown under two contexts, “Van Gogh’s paintings” and “flower”.

Why hard? (cont.) Modality-independency

Media objects of different modalities may suggest similar or related semantic meanings.

Example (figure): one query whose results span several modalities (image, video, text). The text result reads:

Harry Potter has never been the star of a Quidditch team, scoring points while riding a broom far above the ground. He knows no spells, has never helped to hatch a dragon, and has never worn a cloak of invisibility.

MediaView – A “Semantic Bridge”

An object-oriented view mechanism that bridges the semantic gap between multimedia systems and databases

Core concept – media view (MV)
  a customized context for semantic interpretation of media objects (text docs, images, video, etc)
  media views collectively constitute the conceptual infrastructure of a multimedia system & application

Architecture

Figure: the MediaView mechanism sits between the multimedia systems and the object-oriented database. Labels in the figure: External Schema, Conceptual Schema, Internal Schema; mediaview 1, mediaview 2, ..., mediaview n.

Basic Concepts

A media view MVi can be represented as a triple: MVi = <Mi, Pi, Ri>

where:

Mi - the set of objects that are included in MVi as its members. Each object o ∈ Mi belongs to a certain source class, and different members of MVi may belong to different source classes.

Pi - a set of properties (attributes and methods) applied either to MVi itself (Piv) or to all the members (Pim).

Ri - a set of relationships; each r ∈ Ri has the form <oj, ok, t>, which denotes a relationship of type t between members oj and ok in MVi. Ri itself may form a “graph”.
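The triple above can be sketched as a small data structure. This is only an illustration under assumptions: the class names `MediaObject` and `MediaView`, the field names, and the sample objects and relationship type are hypothetical and do not come from the MediaView system itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MediaObject:
    """A database object together with the source class it belongs to."""
    oid: str
    source_class: str


@dataclass
class MediaView:
    """Hypothetical container for the triple <Mi, Pi, Ri>."""
    name: str
    members: set = field(default_factory=set)         # Mi: heterogeneous members
    view_props: dict = field(default_factory=dict)    # Piv: properties of the view itself
    member_props: dict = field(default_factory=dict)  # Pim: property name -> {oid: value}
    relationships: set = field(default_factory=set)   # Ri: triples (oj, ok, t)

    def relate(self, oj: MediaObject, ok: MediaObject, t: str) -> None:
        """Record a relationship of type t between two members of this view."""
        if oj not in self.members or ok not in self.members:
            raise ValueError("both objects must be members of the view")
        self.relationships.add((oj.oid, ok.oid, t))


# Members may come from different source classes.
sunflower = MediaObject("o1", "JPEGImage")
biography = MediaObject("o2", "TextDocument")
gallery = MediaView("van Gogh's Gallery", members={sunflower, biography})
gallery.member_props["Artist"] = {"o1": "van Gogh"}
gallery.relate(biography, sunflower, "describes")  # relationship type is made up
```

Keeping Ri as a set of (oj, ok, t) triples makes it easy to read Ri as an edge list, which matches the remark that Ri may form a graph.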

Basic Concepts

An example…

Figure with four panels: (a) Base Class, (b) Media View, (c) Base Schema, (d) View Schema.

Base schema labels (linked by "subclass"): MultimediaObject, TextDocument, AudioClip, VideoClip, Image, BitmapImage, JPEGImage, Song, Speech; keyframe, audiotrack.

View schema labels (linked by "subview"): Artworks, ImpressionisticArtworks, RealisticArtworks, Post-modernArtworks, ImpressionisticPaintings, Impressionistic Sculptures.

Property labels: Name, Artist, Type, Style, Wavelet-Texture, Dominant-Shape, Color-Histogram.

Basic Concepts

Semantics-based data reorganization via media views

Figure: text, audio, video and image objects regrouped into a media view.

Basic Concepts

Definition 5: The semantic graph (SG) is an undirected graph G = {V, E}, where V is a finite set of vertices and E is a finite set of edges. Each element Vi ∈ V corresponds to a multimedia object Oi in the database. E is a ternary relation defined on V×V×N. Each e = <Vi, Vj, n> ∈ E represents a semantic link of degree n between objects Oi and Oj, where n is the number of media views to which both objects belong. We define n as the correlation factor between Oi and Oj.

Basic Concepts

Definition 6: The correlation matrix M=[Mij] is an adjacency matrix of the semantic graph. Specifically, each element Mij contains the correlation factor between Oi and Oj, with all the diagonal elements set to be zero.
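Definitions 5 and 6 can be illustrated with a short sketch that counts, for every pair of objects, the number of media views containing both. The function name `correlation_matrix` and the toy views are assumptions made for the example; they are not the data behind the slides.

```python
from itertools import combinations


def correlation_matrix(objects, views):
    """Return M where M[i][j] is the number of views shared by objects i and j.

    `objects` fixes the row/column order; `views` maps a view name to the
    set of member objects. Diagonal entries stay zero, as in Definition 6.
    """
    index = {o: i for i, o in enumerate(objects)}
    n = len(objects)
    m = [[0] * n for _ in range(n)]
    for members in views.values():
        for a, b in combinations(sorted(members), 2):
            i, j = index[a], index[b]
            m[i][j] += 1
            m[j][i] += 1  # the semantic graph is undirected
    return m


objects = ["O1", "O2", "O3"]
views = {"MV1": {"O1", "O2"}, "MV2": {"O1", "O2", "O3"}}
M = correlation_matrix(objects, views)

# Edges of the semantic graph: one <Vi, Vj, n> per pair with n > 0.
edges = [(objects[i], objects[j], M[i][j])
         for i in range(len(objects))
         for j in range(i + 1, len(objects))
         if M[i][j] > 0]
```

In this toy case O1 and O2 share two views, so their correlation factor is 2, while every pair involving O3 has factor 1.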

Basic Concepts

Semantic Graph Model

Figure: (a) semantic graph and (b) correlation matrix over five objects O1 to O5, with media views such as “van Gogh’s Gallery”. Objects shown in the figure:

  “Sunflower” (by van Gogh)
  “Potato Eaters” (by van Gogh)
  Biography of van Gogh
  Other impressionistic artwork
  An audio guide

View Operators

A set of operators that take media views and view instances as operands.

Our intention is not to come up with a complete set of operators, but to focus on those that are indispensable in supporting queries and navigation over multimedia objects.

View Operators: type-level

V-overlap
  syntax: <boolean> := v-overlap (<media view1>, <media view2>)
  semantics: true, if and only if (∃ o ∈ O)(o ∈ extent(<media view1>) and o ∈ extent(<media view2>))

Cross
  syntax: {<object>} := cross (<media view1>, <media view2>)
  semantics: {<object>} := {o ∈ O | o ∈ extent(<media view1>) and o ∈ extent(<media view2>)}

Sum
  syntax: {<object>} := sum (<media view1>, <media view2>)
  semantics: {<object>} := {o ∈ O | o ∈ extent(<media view1>) or o ∈ extent(<media view2>)}

Subtract
  syntax: {<object>} := subtract (<media view1>, <media view2>)
  semantics: {<object>} := {o ∈ O | o ∈ extent(<media view1>) and o ∉ extent(<media view2>)}
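If the extent of a media view is treated as a plain set of object identifiers, the four type-level operators reduce to set operations. The sketch below makes that assumption; the function names and sample extents are illustrative only (`sum_mv` is used to avoid shadowing Python's built-in `sum`).

```python
def v_overlap(extent1: set, extent2: set) -> bool:
    """True iff some object belongs to both extents."""
    return not extent1.isdisjoint(extent2)


def cross(extent1: set, extent2: set) -> set:
    """Objects that belong to both extents."""
    return extent1 & extent2


def sum_mv(extent1: set, extent2: set) -> set:
    """Objects that belong to either extent."""
    return extent1 | extent2


def subtract(extent1: set, extent2: set) -> set:
    """Objects of the first extent that are not in the second."""
    return extent1 - extent2


paintings = {"sunflower", "potato_eaters", "water_lilies"}
van_gogh = {"sunflower", "potato_eaters", "biography"}
shared = cross(paintings, van_gogh)
```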

View Operators: instance-level

Class
  syntax: <base class> := class(<view instance>)
  semantics: <view instance> is an instance of <base class>

Components
  syntax: {<object>} := components (<view instance>)
  semantics: {<object>} := {o ∈ O | o is a component (direct or indirect) of <view instance>}

I-overlap
  syntax: <boolean> := i-overlap (<view instance1>, <view instance2>)
  semantics: true, if and only if (∃ o ∈ O)(o ∈ components(<view instance1>) and o ∈ components(<view instance2>))
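The instance-level operators depend on a part-of structure among objects. As a rough sketch, assume that structure is given as a dictionary `parts` mapping each object to its direct components; that representation, and the object names below, are assumptions for illustration.

```python
def components(instance, parts):
    """All direct and indirect components of `instance`.

    `parts` maps an object to an iterable of its direct components.
    """
    seen = set()
    stack = list(parts.get(instance, ()))
    while stack:
        o = stack.pop()
        if o not in seen:
            seen.add(o)
            stack.extend(parts.get(o, ()))
    return seen


def i_overlap(instance1, instance2, parts):
    """True iff the two instances share at least one component."""
    return not components(instance1, parts).isdisjoint(components(instance2, parts))


parts = {
    "video1": ["keyframe1", "audiotrack1"],
    "slideshow1": ["keyframe1"],
    "audiotrack1": ["speech1"],
}
```

The explicit `seen` set keeps the traversal from looping if the part-of data happens to contain a cycle.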

View Algebra

Functions -- derivation of new MVs from existing MVs

Heuristic Enumeration
1. Blind enumeration
2. Content-based enumeration
3. Semantics-based enumeration

View Algebra

Algebra Operators
  select from src-MV where <predicate>
  project <property-list> from src-MV
  intersect (src-MV1, src-MV2)
  union (src-MV1, src-MV2)
  difference (src-MV1, src-MV2)
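The select and project operators can be sketched over a media view whose members are represented as dictionaries of properties. That representation, the function names and the sample data are assumptions; the slides do not specify how members are stored.

```python
def select(src_mv, predicate):
    """Keep the members of src_mv for which the predicate holds."""
    return [o for o in src_mv if predicate(o)]


def project(src_mv, property_list):
    """Keep only the listed properties of every member of src_mv."""
    return [{p: o[p] for p in property_list if p in o} for o in src_mv]


artworks = [
    {"Name": "Sunflower", "Artist": "van Gogh", "Type": "painting"},
    {"Name": "Potato Eaters", "Artist": "van Gogh", "Type": "painting"},
    {"Name": "Audio guide", "Type": "audio"},
]
van_gogh_works = select(artworks, lambda o: o.get("Artist") == "van Gogh")
names = project(van_gogh_works, ["Name"])
```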

Comparison (vs. class)

Aspect: media view vs. object class
  membership: heterogeneous objects vs. uniform objects
  member acquisition: dynamic inclusion/exclusion of existing objects of other classes vs. creating new objects
  mapping: one object can belong to multiple media views vs. one object has exactly one class
  relationship: inter-member semantic relationship vs. N/A

Comparison (vs. traditional object view)

Aspect: media view vs. object view
  membership: heterogeneous objects vs. uniform objects
  relationship: inter-member semantic relationship vs. N/A
  member properties: instance-level properties (user-defined) vs. inherited or derived properties (for view instances)
  global properties: MV-level properties (user-defined) vs. N/A

Logical Implementation

MediaView Construction
MediaView Customization
MediaView Evolution

MediaViews Construction

Work with CBIR systems to acquire knowledge from queries
  Learn from previously performed queries
  A multi-system approach to support the multi-modality of media objects

Organize the semantics by following WordNet

Why WordNet?
  Queries may vary greatly, because users are free to choose their own query keywords
  We need an approach to organize that knowledge into a logical structure
  A simple “context”: a concept in WordNet
  Common media views: correspond to simple contexts
  We provide all common media views, based on which users can build complex ones.

Navigating the Multimedia Database

Navigating via semantic relationships of WordNet

Semantic Relationship: Examples
  Synonymy (similar): pipe, tube
  Antonymy (opposite): fast, slow
  Hyponymy (subordinate): tree, plant
  Meronymy (part): chimney, house
  Troponymy (manner): march, walk
  Entailment: drive, ride

Navigating the Multimedia Database

Figure: the user browses the multimedia database through media views (MediaView 1 to MediaView 4) that are linked by semantic relationships in WordNet.

MediaViews Construction

Figure: users issue queries to several CBIR systems (video, image, text, ...) over the multimedia database and receive results. The MediaView Engine takes in system feedback and user feedback.

MediaView Customization

Two-level MediaView Framework
  Basic MediaView: simple context
  Customized MediaView: advanced context

MediaView Customization

Dynamically construct complex-context-based media views from simple ones
  An example of a complex context: “the Grand Hall in City University”

Several user-level operators are devised to support more complex/advanced contexts, in addition to the basic operators

User-level Operators
  INHERIT_MV(N: mv-name, NS: set-of-mv-refs, VP: set-of-property-ref, MP: set-of-property-ref): mv-ref
  UNION_MV(N: mv-name, NS: set-of-mv-refs): mv-ref
  INTERSECTION_MV(N: mv-name, NS: set-of-mv-refs): mv-ref
  DIFFERENCE_MV(N1: mv-ref, N2: mv-ref): mv-ref
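A minimal sketch of three of the user-level operators follows, assuming media views live in a name-keyed registry and that an mv-ref is simply the view's name. The registry, the helper `_register`, the naming of the result of `difference_mv`, and the sample views are all assumptions. INHERIT_MV is left out because the slides give its signature but not its exact semantics.

```python
from functools import reduce

# Hypothetical registry: view name -> frozenset of member object ids.
registry = {}


def _register(name, members):
    registry[name] = frozenset(members)
    return name  # the name doubles as the mv-ref in this sketch


def union_mv(name, refs):
    """UNION_MV: new view holding the members of any referenced view."""
    return _register(name, reduce(frozenset.union, (registry[r] for r in refs), frozenset()))


def intersection_mv(name, refs):
    """INTERSECTION_MV: new view holding the members common to all referenced views."""
    sets = [registry[r] for r in refs]
    return _register(name, reduce(frozenset.intersection, sets) if sets else frozenset())


def difference_mv(ref1, ref2):
    """DIFFERENCE_MV: members of ref1 not in ref2 (result name is made up here)."""
    return _register(f"{ref1}-{ref2}", registry[ref1] - registry[ref2])


_register("painting", {"sunflower.jpg", "potato_eaters.jpg", "water_lilies.jpg"})
_register("van_gogh", {"sunflower.jpg", "potato_eaters.jpg", "biography.txt"})
work = intersection_mv("work", ["painting", "van_gogh"])
rest = difference_mv("van_gogh", "painting")
```

Because every operator returns an mv-ref, results can be fed straight into further operators, which is how complex contexts are built up from simple ones.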

Build a MediaView in Run-time

Example: find out info about "Van Gogh"
  Who is "Van Gogh"?
  What is his work?
  Know more about his whole life.
  Know more about his country.
  See his famous painting "sunflower"

Figure (Build MediaView): a multimedia document whose topics (Topic 1, Topic 2, Topic 3) draw on text, sound, image and video objects collected in Media View 1.

Build a MediaView in Run-time

Who is “Van Gogh”?
  Set vg = INHERIT_MV(“V. Gogh”, {<painter>}, name=“Van Gogh”, );

What is his work?
  Set vg_work = INTERSECTION_MV(“work”, {<painting>, vg});

Know more about his whole life.
  INTERSECTION_MV(“life”, {<biography>, vg});

Know more about his country.
  INTERSECTION_MV(“country”, {<country>, vg});

See his famous painting “sunflower”
  Set sunflower = INTERSECTION_MV(“sunflower”, {<sunflower>, <painting>});
  Set vg_sunflower = INTERSECTION_MV(“vg_sunflower”, {vg_work, sunflower});

Authoring Scenario

Create a new media view named after the subject
  All multimedia materials used in the document are put into this MediaView for further reference.

To collect the most relevant materials for authoring, the user performs the MediaView building process.

Import suitable media objects by browsing media views

To reference the manner and style of authoring, find other media views with similar topics.
  Drag & Drop
  “learning-from-references”

Interface of Our Authoring System

System Features

A Dynamic Environment
  Helps a user select materials from the database to incorporate into the document
  Query other similar media views for referencing the manner and/or style of authoring

Real-World Applications

A Multimedia Recipe Database
  Modeling basis
  Personalized (context-aware) manipulation

Cross-media indexing and retrieval system
  Novel way of annotating and retrieving media objects
  Leads to new indexing strategies

A Personalized Recipe Database System

People cannot live without food

Existing recipe websites provide huge numbers of recipes from throughout the world, but they:
  Fail to support analyzing and comparing recipes (what are the important cooking principles & skills; what makes two dishes taste so different, etc.)
  Are unable to help users find similar recipes in a comprehensive manner (only keyword-based search on recipe names)
  Fail to adapt recipes to the real-world situation (e.g. lack of ingredients or user preference)

A Personalized Recipe Database System -- Our Contributions

Propose a recipe model which encompasses static attributes as well as dynamic behaviours (e.g. cooking procedures and constraints)

Present a novel perspective of evaluating the “quality” of a recipe by constructing and analysing its cooking graph (capture both action flows and data/ingredient flows)

Provide a promising way to address the problem of recipe adaptation heuristically (with flexible and feasible solutions)

Recipe on the Web

Bang Bang Chicken

Category
  Region --> Sichuan
  Cooking Method --> Poached
  Ingredient --> Chicken

Ingredients:
  Chicken Thighs 250 g
  Scallions 10 g
  Sesame Paste 2 tsp.
  Sugar 1 tsp.
  Soybean Sauce 2 tsp.
  Sesame Oil ½ tsp.
  … ...

Steps
  1. Use chicken thighs and cut away skin and fat
  2. Poach the chicken. Drain and cool.
  3. Mix the sesame paste, sugar, soybean sauce and sesame oil
  4. Club the chicken lightly till soft and shred. Put on a plate.
  5. Put shredded scallion around the chicken and pour the sauce over the chicken.
  … ...

Other page elements: Step Illustration, Video Clip, Final Look, Users’ Rating and Comments

Sample Recipe -- The Cooking Procedure of “Triple Cheese Pasta Primavera”

Step number: Recipe cooking procedure in steps
  1: Dice bell peppers. Slice squash and mushrooms.
  2: Cook pasta according to package directions in unsalted water.
  3: Meanwhile, in a large skillet melt butter. Add bell peppers; cook and stir occasionally until barely tender, about three minutes.
  4: Add squash and mushrooms; cook and stir occasionally until barely tender, about four minutes.
  5: Drain pasta; toss with vegetables in skillet.
  6: In the saucepot in which the spaghetti was cooked, combine ricotta, mozzarella, milk, Parmesan, Italian seasoning, salt and black pepper. Over a medium-low heat cook and stir cheese mixture just until hot, about 1 minute.
  7: Add reserved pasta and vegetables; toss to coat; remove to a serving platter.

Sample Recipe

Figure: Parsing the Cooking Procedure of “Triple Cheese Pasta Primavera” at three levels.
  Recipe Level: the recipe crawled from the Web page
  Composite Level: p steps in the Web page (1: Dice bell peppers. Slice squash and mushrooms. 2: Cook pasta ... 3: Meanwhile, in a large skillet melt butter ... 4: ...)
  Primitive Level: divided into n actions (action 1: dice, action 2: slice, action 3: cook, ..., action i: stir, ..., action n: remove)

Recipe Model

A recipe R is modeled and represented by a tuple of three elements: R = <M, RP, SP>

where

(a) M = {Mi | i = 1..m} is a set of ingredients. An ingredient Mi is either a basic ingredient or a set of ingredients: Mi = <MID, MP>, where MID is a unique identity and MP is a set of member-level properties (and functions) such as the name, quantity and image. An ingredient Mi belongs to one of three classes: Main, Minor and Seasoning;

(b) RP is a set of recipe-level properties (and functions) applied on R itself, such as the main cooking style, region, nutrition and images of the dish of the recipe;

(c) SP = (V, E, Cons, Ingr) is a labeled directed “Cooking Graph”.
  V = {vi | i = 1..n} is a set of nodes; each vi is a cooking action.
  “Cooking action constraints”: Cons(vi) are the constraint conditions that should be satisfied when the action of vi takes place, e.g. conditions on temperature and duration.
  E is a set of directed edges on V that gives the temporal execution flow of the cooking actions, named “action flows”. An edge <vi, vj> means vj should take place after vi.
  “Cooking transition constraints”: Cons(vi, vj) are the conditions that should be satisfied for the flow to take place.
  Ingr(vi): the ingredients that should be added into vi.
  O(vi): the output ingredients of vi.
  These inputs and outputs for the nodes are called “ingredient flows”.
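The recipe tuple R = <M, RP, SP> can be sketched with a few data classes. The class and field names, the node ids `a1` to `a3`, the ingredient ids and the output label are hypothetical and do not follow the numbering of the cooking graph in the slides; the constraint is stored as plain text only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Ingredient:
    mid: str       # MID: unique identity
    props: dict    # MP: member-level properties (name, quantity, image, ...)
    category: str  # one of "Main", "Minor", "Seasoning"


@dataclass
class CookingGraph:
    actions: dict = field(default_factory=dict)  # V: node id -> cooking action
    edges: set = field(default_factory=set)      # E: action flows (vi, vj)
    cons: dict = field(default_factory=dict)     # Cons: node or edge -> condition
    ingr: dict = field(default_factory=dict)     # Ingr: node id -> input ingredient ids
    out: dict = field(default_factory=dict)      # O: node id -> output ingredient ids


@dataclass
class Recipe:
    ingredients: dict    # M: ingredient id -> Ingredient
    recipe_props: dict   # RP: recipe-level properties
    graph: CookingGraph  # SP


# A fragment of step 3 of the sample recipe: melt butter, add peppers, cook.
g = CookingGraph()
g.actions = {"a1": "melt", "a2": "add", "a3": "cook"}
g.edges = {("a1", "a2"), ("a2", "a3")}
g.ingr = {"a1": ["butter"], "a2": ["bell_peppers"]}
g.out = {"a3": ["cooked_peppers"]}
g.cons = {"a3": "until barely tender, about three minutes"}

recipe = Recipe(
    ingredients={
        "butter": Ingredient("butter", {"name": "butter"}, "Minor"),
        "bell_peppers": Ingredient("bell_peppers", {"name": "bell peppers"}, "Main"),
    },
    recipe_props={"name": "Triple Cheese Pasta Primavera"},
    graph=g,
)
```

Keying `cons` by either a node id or an edge tuple lets one dictionary hold both cooking action constraints and cooking transition constraints.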

Cooking Graph

Figure: The Cooking Graph of “Triple Cheese Pasta Primavera”, SP = (V, E, Cons, Ingr). Action nodes v1 to v17 (dice, slice, melt, add, cook, stir, drain, toss, combine, remove) run from a Start Node to an End Node and are connected by action flows in sequential, fork, join and loop structures. Ingredients M1 to M12 (bell peppers, squash, mushrooms, pasta, butter, ricotta, mozzarella, milk, Parmesan, Italian seasoning, salt, black pepper) enter the nodes as ingredient flows. Constraints shown: Cons(v3), Cons(v4), Cons(v12), Cons(v13), Cons(v7,v6), Cons(v7,v8), Cons(v10,v9), Cons(v10,v12).

Legend: V = action node; E = action flow; Ingr = ingredient flow; M = ingredient; Cons( ) = constraint.

Basic Properties

Definition 1. (Reachability) A cooking graph is defined as “reachable” if each of its nodes is “reachable”; a node is “reachable” if it is on a directed path from a starting node to the end node.

Definition 2. (Consistency) A cooking graph is defined to be “consistent” if the conditions for each node/edge are consistent (i.e. there exists an assignment to the variables that makes the conditions true).
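Definition 1 can be checked with two graph traversals: one forward from the starting nodes and one backward from the end node. A node lies on a directed path from a start to the end exactly when both traversals visit it. The function names and the toy graphs are assumptions made for this sketch; consistency (Definition 2) is not covered because it depends on how constraints are expressed.

```python
def _visit(edges, sources):
    """Set of nodes reachable from `sources` by following directed edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen = set(sources)
    stack = list(sources)
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_reachable_graph(nodes, edges, starts, end):
    """True iff every node is on a directed path from some start node to the end node."""
    forward = _visit(edges, starts)
    backward = _visit({(v, u) for u, v in edges}, [end])
    return all(n in forward and n in backward for n in nodes)


nodes = {"start", "dice", "slice", "end"}
ok_edges = {("start", "dice"), ("start", "slice"), ("dice", "end"), ("slice", "end")}
broken_edges = {("start", "dice"), ("start", "slice"), ("dice", "end")}  # "slice" is a dead end
```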

Constraints and Rules

Definition 3. (Constraint) A constraint is a predicate followed by one or more terms, enclosed in parentheses and separated by commas; a term is either a constant, a variable or a function expression.
  Constraints specify all kinds of conditions or restrictions in the recipe model;
  Three categories: intra-recipe constraints, inter-recipe constraints and outer-recipe constraints.
  Incompatible(Spinach, Tofu) says spinach and tofu are incompatible and should not be cooked together.

Constraints and Rules

Definition 4. (Rule) A rule is a logical implication of the form “If Ф Then Ψ” (or, Ф → Ψ), where Ф and Ψ are sentences.
  Validate the correctness of a recipe through a reasoning and recognition process.
  Handle complex situations, such as making the necessary adjustment or compensation once an improper cooking action occurs.
  Describe cooking skills that have been widely accepted and commonly used.

Over_Put(salt) → Add(vinegar|water) says that if too much salt has been put into a dish, then neutralize the salty taste by adding either vinegar or water.
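The two examples, Incompatible(Spinach, Tofu) and Over_Put(salt) → Add(vinegar|water), can be encoded as data and evaluated with a few lines of code. This is a sketch under assumptions: the tuple encoding of predicates, the names `INCOMPATIBLE`, `RULES`, `violated_constraints` and `suggest` are invented for the example and are not the representation used by the prototype.

```python
# Constraint Incompatible(x, y), stored as an unordered pair.
INCOMPATIBLE = {frozenset({"spinach", "tofu"})}

# Rule "If condition Then action", stored as (condition, action).
RULES = [
    (("Over_Put", "salt"), ("Add", ("vinegar", "water"))),
]


def violated_constraints(ingredients):
    """Incompatible pairs that are both present in the ingredient list."""
    present = set(ingredients)
    return [tuple(sorted(pair)) for pair in INCOMPATIBLE if pair <= present]


def suggest(facts):
    """Actions of all rules whose condition appears among the observed facts."""
    return [action for condition, action in RULES if condition in facts]


violations = violated_constraints(["spinach", "tofu", "salt"])
advice = suggest({("Over_Put", "salt")})
```

Here the alternatives in Add(vinegar|water) are kept as a tuple, leaving the choice between them to the user.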

Recipe Cooking Graph Mining

Pattern: a subgraph that occurs in one or more cooking graphs and has a certain influence on the cooking effects (e.g. taste, appearance).

Find patterns for a set of recipes
  What is usually done and what is usually put in during the cooking procedure (one action, a series of actions, an ingredient, a set of ingredients, actions combined with ingredients)

Cooking graphs of different recipes may share the same pattern

Distinct subgraphs that determine the cooking effect (e.g. taste) should be identified
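As a much simplified illustration of pattern finding, the sketch below counts how many cooking graphs contain each action-to-action edge and keeps the edges that reach a minimum support. This is edge-level counting only, not the subgraph mining described in the talk; the function name, the recipes and the threshold are assumptions.

```python
from collections import Counter


def frequent_edges(graphs, min_support):
    """Edges (action_i, action_j) that occur in at least `min_support` graphs.

    `graphs` maps a recipe name to an iterable of labeled edges.
    Each graph contributes at most once per edge.
    """
    counts = Counter()
    for edges in graphs.values():
        counts.update(set(edges))
    return {edge for edge, c in counts.items() if c >= min_support}


graphs = {
    "recipe_a": [("marinate", "coat"), ("coat", "deep-fry")],
    "recipe_b": [("marinate", "coat"), ("coat", "stir-fry")],
    "recipe_c": [("boil", "simmer")],
}
common = frequent_edges(graphs, min_support=2)
```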

Sample Patterns

Figure: four sample patterns drawn as subgraphs.
  marinate: main ingredient(s), e.g. pork, chicken; seasoning ingredient(s), e.g. salt, sauce, garlic, scallion
  coat: main ingredient(s), e.g. pork, chicken; seasoning ingredient(s), e.g. starch, water, egg
  Passing Oil: actions heat, fry/stir-fry/deep-fry, remove from oil; inputs oil, ingredient(s)
  Blanching: actions boil, simmer briefly, remove; inputs boiling/cold water, ingredient(s)

Sample Cooking Style

Cooking Style: Pattern with Dominating Action
  Soft Deep-frying: Coating + Passing Oil + deep-fry
  Dry Deep-frying: Marinating + Coating + deep-fry
  Cooked-frying: Passing Oil/Blanching/Steaming + stir-fry (+ Thickening)
  Slip-frying: (Marinating + Coating) + Passing Oil + stir-fry + Thickening
  Soft Stirring: Blanching/Steaming + stir + Thickening
  Braising: Passing Oil/Blanching/Steaming + simmer in sauce (+ Thickening)
  Simmering: Blanching + simmer in water/broth

A cooking style generally describes how a recipe is cooked, as a Pattern Combination or as a Graph Abstraction.

User Adaptation

Usually a user wants to make a dish that has the same cooking result (e.g. taste, appearance) as the recipe exhibits.

Unfortunately, the user is very likely to get a slightly or even totally different dish as he/she modifies the cooking procedure.

  Objective reasons, e.g. lack of some ingredients
  Subjective reasons, e.g. wrong cooking actions through carelessness or personal preference

User Adaptation

When the user makes an adaptation, the system checks whether the modified cooking graph is feasible (property check: Reachability, Consistency).

If not, a set of feasible templates is provided.

The remaining subgraph is replaced by the user-selected one.

Figure: Template Selection and Instantiation. What was originally one recipe is split into the adapted subgraph and the remaining original subgraph; the user selects a template, which is instantiated with substantial ingredients & constraints.

Prototype System

Global System vs. User Space

Figure: the Global Area holds conventional recipes in structure; each User Area (Linda, Tom, Mary) holds adopted & adapted recipes in a user-organized structure. Recipes move between the two through Import and Export.

Sample user actions shown in the figure:
  Export a Recipe “Steamed Chicken”
  Search “Spicy Bean Curd”, “West Lake Fish”, “…”
  Comment a Recipe “Carp Soup”
  Add a Favorite Recipe “Stir-Fried Prawns”
  Try a Pop Recipe “Eight Precious Rice”
  Prepare a Party Menu

Prototype System – Recipe Browser

Prototype System – Cooking Pattern Miner

Screenshot labels: Select Recipe; Select Cooking Style; Name of Recipe; Cooking Graph of Selected Recipe; Show All Patterns in Cooking Graph; Revert Recipe; Find Common Patterns for Recipes of Selected Cooking Style; Recipe List Containing Selected Pattern; Common Pattern List; Selected Cooking Pattern

Prototype System – Similarity Calculator

Screenshot labels: Recipe 1; Recipe 2; Name of Recipe 1; Cooking Graph of Recipe 1; Cooking Graph of Recipe 2; Revert Recipe 1; Revert Recipe 2; Find Common Subgraphs for Recipes 1 & 2; Apply Selected Subgraph to Recipe 1 & 2; Graph Similarity; Similarity Ranking List; Common Subgraph List; Selected Common Subgraph

Summary

Proposed a data model to represent a recipe
Advocated cooking graph mining to find frequently used patterns (actions, ingredients)
Attempted to solve the recipe adaptation problem by using patterns as templates
Developed a prototype system: RecipeView
Further work includes:
  Discover patterns of cooking graphs
  Refine and strengthen the algorithm of recipe adaptation

Application Scenario

Figure: a retrieval loop between the users and the system. Labels: Seeds, Candidates, Results; Designate, Discover, Refine, Present, Feedback (adjust).

Application Scenario

Advantages (vs. traditional retrieval techniques)

Easy-to-compose query
  By browsing (to get “seed” objects of arbitrary modalities)
  By subject (simply a keyword) at various abstraction levels

Multi-modal results
  a collection of images, text docs, videos, etc vs. a single type of media

Semantically relevant results
  natural outcome of exploring previously learnt knowledge vs. a set of specifically chosen features

Advantages (cont’d)

“Hill-climbing” Effect – retrieval performance grows as more user interactions are conducted

Figure: a cycle linking user interactions, the retrieval process and materialized knowledge, with arrows labeled exploration, encourage and learning.

Conclusion

MediaView – a semantic multimedia database modeling mechanism to bridge the semantic gap between conventional databases and semantics-intensive multimedia applications

A set of user-level operators to accommodate the specialization/generalization relationships among the media views

Conclusion

MediaView promises more effective access to the content of media databases
  Users could get the right stuff and tailor it to the context of their application easily.
  Providing the most relevant content from pre-learnt semantic links between media and context
  High-performance database browsing and multimedia authoring tools can enable more comprehensive applications for the user

Conclusion

Users could customize specific media views according to their tasks by using user-level operators

The effectiveness of using MediaView in the experimental problem domains:
  Multimedia recipe database
  Cross-media indexing and retrieval

Further Issues

The development and transition of MediaView to a fully-fledged multimedia database system supporting “declarative” queries

Intensive and extensive performance studies

Advanced semantic relations (eg. temporal and spatial ones) can also be incorporated in combining individual media views

Thank you!

Q & A

Email: [email protected]
