A Hierarchical Framework for Content-Based Image Retrieval - Dipti Vaidya
Jan 18, 2018
Outline of Presentation
• Introduction and Motivation
• Related Work
• Shortcomings of Related Work
• Problem Statement
• Solution Approach
• Details
• Future Work
Introduction and Motivation
• Gigabytes of images are generated and stored every day
• Simple text-based matching is not an appropriate way to retrieve these images
• The information must be organized to allow efficient browsing, searching and retrieval
Introduction and Motivation
• This information, or metadata, can be split into 3 main categories:
1. Catalogue Info: type of image, author, date, etc.
2. Syntactic Content: information about primary features such as color, texture, shape and spatial relationships
3. Semantic Content: information or knowledge about what the image represents. Example: a smiling girl represents a happy person.
Introduction and Motivation
CBIR systems fall into 2 main categories:
1. Image retrieval by Syntactic Content:
• Images can be presented to the system either as an actual image or as a sketch
• These images are processed and the closest matches are returned
2. Image retrieval by Semantic Content:
• Queries are posed and images are retrieved by matching the query to the encoded knowledge
Related Work
Syntactic Image Retrieval (RIE):
• Most RIE systems use color or texture features
• General approach: record the distribution of colors or textures in each image. Images with the smallest difference values from the example image are the matches.
• For example, they use a simple color histogram to record the predominant colors in the following example image:
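The histogram approach described above can be sketched in a few lines. This is a minimal illustration, not the method of any particular RIE system: the bin count, the L1 distance, the function names and the toy pixel data are all assumptions made for the example.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Quantize RGB pixels into coarse bins and return normalized bin frequencies."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms (0 = identical, 2 = disjoint)."""
    bins = set(h1) | set(h2)
    return sum(abs(h1.get(b, 0.0) - h2.get(b, 0.0)) for b in bins)


def rank_by_example(example, library):
    """Return library image names ordered from closest to farthest match."""
    ex_hist = color_histogram(example)
    return sorted(
        library,
        key=lambda name: histogram_distance(ex_hist, color_histogram(library[name])),
    )


# Toy "images": flat lists of RGB pixels (hypothetical data, not real photos).
BROWN, GREEN, BLUE = (150, 90, 40), (30, 160, 40), (20, 40, 200)
dog_on_grass = [BROWN] * 30 + [GREEN] * 70
log_on_grass = [BROWN] * 35 + [GREEN] * 65
boat_on_sea = [BROWN] * 20 + [BLUE] * 80

library = {"log_on_grass": log_on_grass, "boat_on_sea": boat_on_sea}
ranking = rank_by_example(dog_on_grass, library)
```

In this toy data the log on grass ranks as the closest match to the dog on grass, because the matcher sees only color proportions and nothing about what the image depicts.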
Related Work
• This causes problems when querying: instead of looking for pictures of a dog, the system looks for images that have a brown blob on a green background.
[Figure: example query and its results]
• Image not found, because the system has no notion of semantics
Related Work
Retrieval by Semantic Content:
• The Knowledge-Based spatial image model defines a 3-layer model for representing knowledge about the domain-specific content of the images (Chu, Hsu, Tiara)
• Ontology-based Photo Annotation: agent, object, action approach using an ontology (University of Amsterdam)
• Structured Knowledge Representation for Image Retrieval using DL, based on image regions (Meghini et al.)
Related Work
Retrieval by semantic content has been shown to be successful, but it has the following drawbacks:
• Requires significant effort by domain experts during development
• Unlikely to be extensible beyond a specific problem domain
Related Work – Bob’s System
Proposes a system that uses the complementary strengths of Semantic and Syntactic Retrieval methods:
• Create a small domain database that processes semantic queries related to the problem domain and generates a set of example images
• These synthesized example images are sent to a larger image database for matching by example
Advantages of Hierarchical Framework
• Provides a semantically-relevant method of querying an image database
• Decouples the knowledge organization from the image matching mechanism
– Requires expert involvement to encode knowledge of only a smaller set of data
– Images and image matching algorithms in the large target database can change and improve with no impact on the encoded knowledge
– Multiple, distributed domain databases can be used against one target database
Hierarchical CBIR Framework Diagram
[Figure: Domain databases #1 to #n each take a User Semantic Query through three stages: Extract Semantic Values, Select/Synthesize Example Images, Images Matching. The example images are passed to the RETRIEVE function of the Target Database, which contains an Example Image Pre-Processor, a Search Engine and a Pre-Processed Image Library, and which returns an Ordered List of Closest Matched Images.]
Related Work – Bob’s System
Here’s what Bob’s system uses:
• Structured annotation (agent, action, object) to specify semantic values of interest
• A domain-specific ontology to represent agents and objects, with an image stored for each concept introduced in the ontology
• Spatial relationships to encode actions
• A query is posed and semantically processed to generate a set of example images
• These images are then sent to GIFT for Retrieval by Example
Example (Bob’s system)
Ontology
• object: two-wheelers (bike, motorbike), hat
• agent: man
Role
• Wears(agent, object) = object above agent
• Rides(agent, two-wheeler) = agent above two-wheeler
Sample database
• man, bike, hat
Example (simple queries)
Query: man rides a bike
Query: man wears hat
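The two role definitions above reduce each action to an "above" relation between two image regions. The sketch below shows one way such a rule could place regions when synthesizing an example image. The `Box` type, the centering rule and the coordinates are assumptions for illustration and are not taken from Bob's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned region; y grows downward, as in image coordinates."""
    name: str
    x: int
    y: int
    width: int
    height: int


def place_above(upper, lower):
    """Return a copy of `upper` centered on `lower` and resting on its top edge."""
    x = lower.x + (lower.width - upper.width) // 2
    y = lower.y - upper.height
    return Box(upper.name, x, y, upper.width, upper.height)


def wears(agent, obj):
    """Wears(agent, object) = object above agent. Returns (top, bottom)."""
    return place_above(obj, agent), agent


def rides(agent, vehicle):
    """Rides(agent, two-wheeler) = agent above two-wheeler. Returns (top, bottom)."""
    return place_above(agent, vehicle), vehicle


ROLES = {"wears": wears, "rides": rides}

# Hypothetical regions for the stored concept images.
man = Box("man", 0, 0, 40, 100)
bike = Box("bike", 10, 200, 80, 50)
hat = Box("hat", 0, 0, 20, 10)

rider, vehicle = ROLES["rides"](man, bike)   # query: man rides a bike
headwear, wearer = ROLES["wears"](man, hat)  # query: man wears hat
```

Keeping the roles in a lookup table means a new action only needs one more placement function, though every action still has to be defined by hand.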
Issues With Bob’s System
• Scalability and maintenance: a new concept can only be introduced into the domain ontology manually
• Cannot store composite images: there is no method to reuse the synthesized images
• The results returned from GIFT (the image matching server) could be improved so that they are more relevant to the domain query
Problem Statement
• We need to build a system that represents image content in a way that is hierarchical, so that we can make semantic queries, and compositional, so that we can build complex terms from simple terms and thus reuse the synthesized images
• We also need to retrieve better results from the image matching server, given example images
Problem Formulation
• Let $D_A$ be a domain database, $S_{D_A}$ be semantic values in $D_A$, and $Q$ be a query that resolves to a specific set of semantic values in $D_A$:
$\{S_{D_A}\} \leftarrow \mathrm{QUERY}(Q, D_A)$ (1)
• Furthermore, let $D_A$ have the property that each semantic value can be mapped to a set of images $\{I\}$
• Then there is a resultant mapping of $Q$ to a set of example images $\{I_{ex}\}$ that can be represented as:
$\{I_{ex}\} \leftarrow \mathrm{MAP}(\mathrm{QUERY}(Q, D_A), D_A)$ (3)
Problem Formulation (Cont)
• Now suppose an RIE database $T$ exists and has a RETRIEVE function which returns all images $\{i\}$ that match any of a set of example images $\{I_{ex}\}$:
$\{i\} \leftarrow \mathrm{RETRIEVE}(\{I_{ex}\}, T)$ (4)
• Combining equations (3) and (4) we get:
$\{i\} \leftarrow \mathrm{RETRIEVE}(\mathrm{MAP}(\mathrm{QUERY}(Q, D_A), D_A), T)$ (5)
• Equation (5) shows we can make a semantic query in one collection of knowledge ($D_A$) and retrieve matching images from another ($T$).
• This approach can be considered a hierarchy of RIE and RSC systems.
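The composition in equation (5) can be illustrated as three chained functions. This is only a sketch under simplifying assumptions: the domain database is a plain dictionary from semantic value to example images, the query is matched word by word, and the target database is a toy table recording which example images each target image matches, standing in for a real image matcher. All file names are hypothetical.

```python
def query(q, domain_db):
    """QUERY(Q, DA): the semantic values of the domain database named in the query."""
    return set(q.lower().split()) & set(domain_db)


def map_to_examples(semantic_values, domain_db):
    """MAP({S}, DA): the example images stored for each semantic value."""
    examples = set()
    for value in semantic_values:
        examples |= set(domain_db[value])
    return examples


def retrieve(examples, target_db):
    """RETRIEVE({Iex}, T): every target image that matches any example image."""
    return {image for image, matched in target_db.items() if examples & matched}


# Domain database DA: semantic value -> example images (hypothetical names).
domain_db = {
    "man": ["man_01.png"],
    "bike": ["bike_01.png", "bike_02.png"],
    "hat": ["hat_01.png"],
}

# Target database T: target image -> example images it is considered to match.
target_db = {
    "t1.jpg": {"man_01.png"},
    "t2.jpg": {"bike_02.png"},
    "t3.jpg": {"hat_01.png"},
    "t4.jpg": set(),
}

semantic_values = query("man rides a bike", domain_db)
result = retrieve(map_to_examples(semantic_values, domain_db), target_db)
```

Because `retrieve` only sees example images, the target database can be replaced or its matcher improved without touching the domain knowledge, which is the decoupling the framework aims for.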
Problem Formulation
To realize the hierarchical framework we need to solve several problems:
• A method to define and encode domain knowledge so that it can be used for semantic queries
• A method to represent the semantic content of the composite images and map it to the domain knowledge base
• A method to define the actions
• A method to reuse the synthesized images
• A way to interface with the RIE so that we get more specific, domain-related results from the target database
The solutions to the above problems will be my contribution to the system.
Solution Approach
Our primary aim is to investigate whether Description Logics in general can be used to represent the contents of the domain-specific database in a way that is hierarchical and compositional.
Alternatively, could Semantic Networks be sufficient, thus reducing the computational cost?
Hierarchical CBIR System Diagram
[Figure: components shown are User Query, Racer DL system (domain knowledge base), Image DB, Query Results, Process Query Results, Image Synthesis, Synthesized Images, Feature selection, RIE - GIFT, Target DB, Returned images, and R.F.]
Description Logic
What is Description Logic?
It is a language that allows reasoning about information, in particular supporting the classification of descriptors.
Description Logic models a domain in terms of 3 things:
• Individuals: represent instances of the objects we are modeling
• Concepts: denote collections of individuals or instances
• Roles: relationships between, or attributes of, concepts or individuals
Example
• Concept Example
– person represents all human beings
– fruit represents all fruits
• Individual Example
– Man, woman are individuals of the concept person
– Banana is an individual of the concept fruit
• Role Example
– Eat(person, fruit) is a relationship describing a person and something they are eating
DL
• Using these small blocks we can build more complex expressions
• Example
– Eat(person, fruits)
– Eat(person, fruits) & Sits(person, chair)
• Example
– cool-student = student & drives(student, Ferrari)
Reasoning with DL
• Subsumption (⊑)
– Basic inferencing tool
– Checks whether one concept is more general than another
– Example:
• mother ⊑ woman
Reasoning with DL
Classification
– A collection of descriptions can be classified using subsumption, providing a hierarchy of descriptions ranging from general to specific.
Example
person driving car and wearing hat ⊑ person driving car ⊑ person
New concept: person wearing hat?
person driving car and wearing hat ⊑ person driving car ⊑ person
person wearing hat ⊑ person
Automatic Classification
• person ⊒ person driving car ⊒ person driving car and wearing hat
New concept or query: person wearing hat?
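To make the classification step concrete, the sketch below places the new concept "person wearing hat" into the hierarchy above. It assumes a deliberately simplified logic in which every concept is a conjunction of atomic properties, so subsumption reduces to a subset test. A real DL reasoner such as RACER handles far richer constructs; the names `subsumes` and `classify` are made up for the example.

```python
def subsumes(general, specific):
    """True if `general` is at least as general as `specific`.

    Under the conjunction-only assumption, a concept with fewer required
    properties is more general, so this is a subset test.
    """
    return general <= specific


def classify(new, taxonomy):
    """Return (direct parents, direct children) of `new` among the named concepts."""
    above = {n for n, c in taxonomy.items() if subsumes(c, new) and c != new}
    below = {n for n, c in taxonomy.items() if subsumes(new, c) and c != new}
    # Direct parents: subsumers with no strictly more specific subsumer.
    parents = {p for p in above if not any(taxonomy[p] < taxonomy[o] for o in above)}
    # Direct children: subsumees with no strictly more general subsumee.
    children = {c for c in below if not any(taxonomy[o] < taxonomy[c] for o in below)}
    return parents, children


taxonomy = {
    "person": frozenset({"person"}),
    "person driving car": frozenset({"person", "drives-car"}),
    "person driving car and wearing hat": frozenset({"person", "drives-car", "wears-hat"}),
}

new_concept = frozenset({"person", "wears-hat"})  # person wearing hat
parents, children = classify(new_concept, taxonomy)
```

Here the new concept lands directly below "person" and directly above "person driving car and wearing hat", so a query for "person wearing hat" can also return images annotated with the more specific description.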
Architecture of DL
• A DL system is described as being split into 2 parts: T-Box and A-Box
– T-Box => subsumption and classification
– A-Box => reasons about relationships between individuals, thus providing classification and retrieval
[Figure: T-Box concept hierarchy (mammal, vehicle, person, dog, bus, bike, person wearing cap, person driving bus, person wearing cap & driving bus) and A-Box individuals (Ted, Mary’s neighbor, nimbus) related by the role driving]
Details
Describing the semantic contents of the image:
We must describe three types of spaces: that of the images themselves, that of the real-world concepts they contain, and that of what each action or role means.
For example, the following image can be described as:
Image instance Image1
Image1 contains (person driving car, wearing hat)
Image1 contains (person driving car)
Image1 contains (person wearing hat)
Image1 contains (person); Image1 contains (car)
Image1 contains (hat)
Tools Used
Describing the Semantic Content:
To describe the world-concept type space and progressively link the image instances with these concepts, we use a DL system called RACER.
Racer DL
• RACER is a semantic web inference engine for developing ontologies
• RACER is a Description Logic reasoning system with support for
– TBoxes with generalized concept inclusions
– ABoxes
Example of Racer Files
T-box:
(signature
  :atomic-concepts (human person female male woman man
                    parent mother father
                    grandmother aunt uncle
                    sister brother
                    only-child pet organization politicalorganisation
                    politician malepolitician image)
  :roles ((has-descendant :transitive t)
          (has-pet :domain person :range pet)
          (covers :transitive t :domain image :range human)
          (has-child :parent has-descendant
                     :domain parent
                     :range person)))
Example of Racer Files
A-Box:
(instance image01 image)
(instance image01 malepolitician)
(related image01 jonmajor covers)
(instance jonmajor malepolitician)
(instance alice mother)
(related alice betty has-child)
(related alice charles has-child)
Queries:
(concept-instances sister)
(concept-ancestors mother)
(concept-descendants man)
(individual-fillers alice has-descendant)
Identifying objects in an image
• Segment sections of images and associate them with concepts
[Figure: example image with segmented regions labeled "2 men standing" and "mountains"]
Identifying Objects in an Image
Implemented an image annotator which:
• Allows the user to identify the objects in the image and store information about each object's region of interest
• Stores this data as an XML file, which can be parsed during the synthesis process
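As an illustration of how such an annotation file might be parsed during synthesis, the sketch below reads region-of-interest data with Python's standard `xml.etree.ElementTree`. The XML layout (element and attribute names) is purely an assumption for the example; the slides do not specify the annotator's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation file content: one <object> per annotated region.
SAMPLE_ANNOTATION = """\
<annotation image="image1.jpg">
  <object concept="man"><region x="40" y="20" width="60" height="150"/></object>
  <object concept="hat"><region x="50" y="5" width="40" height="20"/></object>
</annotation>
"""


def load_regions(xml_text):
    """Return (image name, {concept: (x, y, width, height)}) from annotation XML."""
    root = ET.fromstring(xml_text)
    regions = {}
    for obj in root.findall("object"):
        region = obj.find("region")
        regions[obj.get("concept")] = tuple(
            int(region.get(key)) for key in ("x", "y", "width", "height")
        )
    return root.get("image"), regions


image_name, regions = load_regions(SAMPLE_ANNOTATION)
```

The resulting dictionary gives the synthesis step the bounding region to cut out for each concept.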
Resolving queries
Query: man driving a car and wearing a hat
1. Attempt to find an image described by the whole query (no synthesis needed). If not found, then
2. Break the query into components (synthesis needed): man driving a car, man wears a hat. If not found, then
3. Break the query into atomic components (synthesis needed): man, car, hat, actions. For actions, we can use a geo-spatial model that maps each action; Bob’s definitions can be used here.
In DLs, the query language and the description language are unified.
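The three-step fallback above can be sketched as follows. This is an illustrative outline only: the query is assumed to be already parsed into (agent, action, object) triples, and the image index is a plain dictionary keyed either by a set of triples or by an atomic concept name. In the actual system the lookups would be queries against the DL knowledge base.

```python
def resolve(triples, image_index):
    """Return (images, needs_synthesis) for a query given as a set of triples."""
    whole = frozenset(triples)
    if whole in image_index:                     # step 1: image for the whole query
        return [image_index[whole]], False
    images = []
    for triple in sorted(triples):               # step 2: one component at a time
        key = frozenset([triple])
        if key in image_index:
            images.append(image_index[key])
        else:                                    # step 3: fall back to atomic concepts
            agent, _action, obj = triple         # the action is left to the spatial rules
            images.extend(image_index[c] for c in (agent, obj) if c in image_index)
    return images, True


# Hypothetical index: one stored composite image plus atomic concept images.
image_index = {
    frozenset([("man", "drives", "car")]): "man_driving_car.png",
    "man": "man.png",
    "car": "car.png",
    "hat": "hat.png",
}

query_triples = {("man", "drives", "car"), ("man", "wears", "hat")}
images, needs_synthesis = resolve(query_triples, image_index)
```

Storing a synthesized composite back into the index under its set of triples would let later queries succeed at step 1, which is one way to get the reuse of synthesized images that the problem statement asks for.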
Example 2 (composite queries)
Query: girl rides a bike and wears a hat
Hierarchical CBIR System Diagram
[Figure: components shown are User Query, Racer DL system (domain knowledge base), Image DB, Query Results, Process Query Results, Image Synthesis, Synthesized Images, Feature selection, RIE - GIFT, Target DB, Returned images, and R.F.]
RIE
The query is posed to generate a set of example images
These example images will then be sent to the Image Matching Server for Retrieval by Example from the target database
This server uses various image features such as color, texture and shape to retrieve similar images
Problem 2
To improve the quality of the images retrieved from the image matching server, the returned images should be more domain specific.
Suggestion: use feature selection (color, texture or shape) for handling queries in RIE
Problem 2
In a CBIR system where feature selection is used, the following problems should be solved:
• It must select the subset of features that provides the best input to the matching algorithm of the server (GIFT)
• Since we want feature selection to take place at every query, it must be time efficient
• It must be able to handle example sets as small as 3 or 5 images
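One inexpensive heuristic that fits these three constraints is to keep the feature group on which the example images agree most. The sketch below is an assumption made for illustration, not the method proposed in this presentation: it ranks feature groups by the mean pairwise distance between the examples' feature vectors, which needs only a handful of comparisons for 3 to 5 examples. Feature values are assumed to be normalized to [0, 1].

```python
from itertools import combinations


def mean_pairwise_distance(vectors):
    """Average per-component L1 distance over all pairs of feature vectors."""
    pairs = list(combinations(vectors, 2))
    total = sum(
        sum(abs(a - b) for a, b in zip(u, v)) / len(u)
        for u, v in pairs
    )
    return total / len(pairs)


def select_features(examples, keep=1):
    """Return the `keep` feature groups on which the examples are most consistent.

    `examples` is a list of dicts mapping a feature group name to its vector.
    At least two examples are required.
    """
    names = examples[0].keys()
    spread = {
        name: mean_pairwise_distance([ex[name] for ex in examples])
        for name in names
    }
    return sorted(spread, key=spread.get)[:keep]


# Three hypothetical example images with color, texture and shape features.
examples = [
    {"color": [0.9, 0.1, 0.0], "texture": [0.20, 0.80], "shape": [0.5, 0.5]},
    {"color": [0.1, 0.8, 0.1], "texture": [0.25, 0.75], "shape": [0.1, 0.9]},
    {"color": [0.3, 0.3, 0.4], "texture": [0.20, 0.80], "shape": [0.9, 0.1]},
]

best = select_features(examples)
best_two = select_features(examples, keep=2)
```

In this toy data the examples differ in color and shape but share nearly the same texture, so texture is selected as the feature to send to the matcher.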
Proposed Solution
• Users can manually assign an image feature combination in our user interface for image retrieval.
• The query request is computed and processed by the query server.
• The server runs all procedures, finds the closest images, displays them in the query viewer, and transmits user feedback to index the images.
• A more accurate query result should be obtained in the next round of search.
Future Work
• Need to implement the image synthesis part
• Populate the domain knowledge base with more concepts and images
• Find a method to implement the improved RIE
• Test the retrieval results using different algorithms based on texture, color and shape