Visual Information Systems visual information retrieval.


Computational steps for visual retrieval systems

1. image processing (colour, texture etc)
   1. human perception and computer perception (computer vision)
   2. sensory gap
2. features definition, extraction
   1. low-level and high-level
   2. content, semantics, and concepts
   3. small scale and large scale
   4. knowledge domain, knowledge elicitation, knowledge discovery and management
3. similarity measure, learn from feedback, and dynamic indexing
4. databases and system architecture
5. evaluation, not just system performance, but insights for the future

VIR and Traditional Database?

- A traditional SQL database has as its basic element data items in a relation:

      select name
      from employee, project
      where employee.deptnumber = '25' AND
            project.number = '100'

- databases exploit known structures and relations
- DBMS retrieval is not probabilistic
  - How different from the WWW?
  - And from traditional IR?

VIR and Traditional IR systems?

- IR systems can be considered the precursors to VIR
- The basic unit of an IR system is a document and the focus is on textual retrieval
  - exact matching: Boolean, text pattern searching
  - inexact matching: probabilistic, vector space, clustering
- Visual information has its own characteristics that traditional IR cannot handle

Recap IR: What's IR

- Motivation
  - the larger the holdings of the archive, the more useful it is
  - however, it is harder to find what you want
- IR is all about finding what you want when what you want is buried in a mass of what you don't want

from Lesk, http://community.bellcore.com/lesk/columbia/session2/

Simple IR Model

[Diagram. Components: User; Query; Results; Pre-Processing; Post-Processing; Searching; Storage; Collection & Processing; Stuff. Technique labels: Boolean, Vector; Stemming, Thesaurus, Signature; Ranking, Clustering, Weighting; Boolean, Vector; Feedback; Flat Files, Inverted Files, Signature Files, PAT Trees; Stemming, Stoplist.]

Recap IR: Precision and Recall

- Precision
  - "ratio of the number of relevant documents retrieved over the total number of documents retrieved"
  - how much extra stuff did you get?
- Recall
  - "ratio of relevant documents retrieved for a given query over the number of relevant documents for that query in the database"
  - how much did you miss?
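The two ratios above can be computed directly from sets of document identifiers. The following is a minimal sketch; the function name and the sample identifiers are hypothetical and serve only as illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids of the documents the system returned
    relevant:  ids of all documents in the database that are relevant
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 of them relevant,
# while the database holds 6 relevant documents in total.
p, r = precision_recall(["d1", "d2", "d3", "d9"],
                        ["d1", "d2", "d3", "d4", "d5", "d6"])
```

Here precision is 3/4 (one returned document was "extra stuff") and recall is 3/6 (half of the relevant documents were missed).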

Recap IR: Text Retrieval

- The most popular approach is to extract keywords from each text document in the database to form the indices of the document.
- The keyword extraction process may be divided into three major steps: stopwords removal, stemming and word weighting
  - stopwords removal: "a", "an" and "the".
  - stemming: removes the suffix and prefix of each word.
  - word weighting: estimates the weighting of each word.
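The three keyword extraction steps can be sketched as follows. This is an illustration under stated assumptions, not a production indexer: the stopword list holds only the slide's three examples, the stemmer strips a handful of suffixes only (it does not handle prefixes), and the weight is plain relative term frequency. All names are hypothetical.

```python
from collections import Counter

STOPWORDS = {"a", "an", "the"}       # the slide's examples; real lists are longer
SUFFIXES = ("ing", "ed", "es", "s")  # toy suffix list, an assumption


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_keywords(text):
    """Return {stem: weight} with weights as relative term frequency."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    stems = [stem(w) for w in words if w and w not in STOPWORDS]
    counts = Counter(stems)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


index = extract_keywords("The camera detects the moving objects in a scene.")
```

A real system would typically replace the relative frequency with a weighting that also accounts for how rare a term is across the whole collection.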

Recap IR: Text Retrieval

- Query will go through the same procedure
- Similarity matching: calculated from the pre-computed weighting of the matched keywords.
- All documents with a similarity value higher than a certain threshold will be considered as relevant documents and returned to the user.
- These relevant documents may be ranked according to the similarity values when presenting to the user. (Most web search engines do this.)
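The matching, thresholding and ranking steps can be sketched as below. The slides do not fix a similarity formula, so the sum of weight products over matched keywords is an assumption, as are the function names, the sample weights and the threshold value.

```python
def similarity(query_weights, doc_weights):
    """Sum the products of weights over keywords shared by query and document."""
    shared = query_weights.keys() & doc_weights.keys()
    return sum(query_weights[t] * doc_weights[t] for t in shared)


def retrieve(query_weights, indexed_docs, threshold):
    """Return (doc_id, score) pairs above the threshold, best match first."""
    scored = [(doc_id, similarity(query_weights, weights))
              for doc_id, weights in indexed_docs.items()]
    relevant = [(d, s) for d, s in scored if s > threshold]
    return sorted(relevant, key=lambda pair: pair[1], reverse=True)


# Hypothetical pre-computed keyword weights for three documents.
docs = {
    "doc1": {"image": 0.5, "retrieval": 0.5},
    "doc2": {"video": 0.5, "retrieval": 0.25, "index": 0.25},
    "doc3": {"database": 1.0},
}
query = {"image": 0.5, "retrieval": 0.5}
results = retrieve(query, docs, threshold=0.1)
```

With these sample weights, doc1 and doc2 pass the threshold and are returned in that order, while doc3 shares no keyword with the query and is dropped.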

Visual Information Retrieval: keyword

- It is difficult for text to capture the perceptual saliency of some visual features
  - Pictures cannot speak, but they are stronger than words.
- Text is not well suited for modelling perceptual similarity.
  - Subjective.
- "What is needed in these cases is the use of a more concrete description of visual content, one more closely related to human perception, and a new way of interaction that fully exploits human perception capabilities."

Visual Information Retrieval: content-based approach

- Textual content: free text search
- Image content: image features, shapes, color, textures, spatial relationships
- Video content: motions, image features, scene composition, video semantics, audio, etc.
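As one concrete instance of an image-content feature, the sketch below builds a coarse colour histogram and compares two images by histogram intersection. This is an illustrative choice of feature and measure, not one prescribed by the slides; images are represented as plain lists of (r, g, b) tuples, and all names and sample data are hypothetical.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Normalised joint RGB histogram of an image given as (r, g, b) tuples in 0..255.

    Assumes bins_per_channel divides 256 evenly.
    """
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical colour distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical tiny "images": lists of pixels rather than decoded files.
red_image = [(250, 10, 10)] * 8
reddish_image = [(250, 10, 10)] * 6 + [(10, 10, 250)] * 2
blue_image = [(10, 10, 250)] * 8

h_red = colour_histogram(red_image)
sim_close = histogram_intersection(h_red, colour_histogram(reddish_image))
sim_far = histogram_intersection(h_red, colour_histogram(blue_image))
```

The mostly red image scores higher against the red image than the blue one does. The measure captures colour distribution only and ignores shape, texture and spatial layout.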

Content-Based Image Retrieval

- As happens during the maturation process of many a discipline, after early successes in a few applications, research is now concentrating on deeper problems, challenging the hard problems at the crossroads of the disciplines from which it was born (Arnold 2000):
  - computer vision, databases, and information retrieval.
- Deeper analysis is needed and semantics is more desirable: make use of domain knowledge

Domain and Variability

- A narrow domain has a limited and predictable variability in all relevant aspects of its appearance.
  - Semantics is well-defined, and unique.
- A broad domain has an unlimited and unpredictable variability in its appearance even for the same semantic meaning
  - Semantics is more ambiguous, and partial
  - Need more contextual information

Domain and Variability

- The notions of broad and narrow domains are helpful in characterizing patterns of use, in selecting features, and in designing systems.
- For narrow, specialized image domains, the gap between features and their semantic interpretation is usually smaller, so domain-specific models may help.
- In a broad image domain, the gap between the feature description and the semantic interpretation is generally wide
  - the required number of computational variables would be enormous.
- Research issues raised…

Research issues

- How to handle variability?
- Multiple processors and fusion process?
- Inference engines?

Domain Knowledge

- Laws of syntactic (literal) equality and similarity define the relation between image pixels or image features regardless of their physical or perceptual causes.
- Laws describing the human perception of equality and similarity.
- Physical laws describing equality and difference of images under differences in sensing and object surface properties. The physics of illumination, surface reflection, and image formation have a general effect on images.
- Geometric and topological rules describe equality and differences of patterns in space.
- Category-based rules encode the characteristics common to class z of the space of all notions Z.
- Finally, man-made customs or man-related patterns introduce rules of culture-based equality and difference.

Difficulties in VIS

- The sensory gap and the semantic gap

The Semantic Gap

- A linguistic description is almost always contextual, whereas an image may live by itself.
  - associate higher level semantics to data-driven observables
  - labelling is seldom complete, context sensitive, and, in any case, there is a significant fraction of requests whose semantics can't be captured by labelling alone. Both methods will cover the semantic gap only in isolated cases.
- This works well in a narrow domain such as I-Browse, though it is not the perfect solution

From broad domain to narrow domain

- The challenge for image search engines on a broad domain is to tailor the engine to the narrow domain the user has in mind via specification, examples, and interaction.

Bridging the Gap

- New challenges in content-based retrieval are the huge number of objects to search among, the incomplete query specification, the incomplete image description, and the variability of sensing conditions and object states.
- The aim of content-based retrieval systems must be to provide maximum support in bridging the semantic gap between the simplicity of available visual features and the richness of the user semantics.
- The broader the domain, the more browsing or search by association can be the right solution. The narrower the domain, the more likely an application of domain knowledge will succeed.

Video Retrieval

- There are three major processes to prepare a video for retrieval: video segmentation, index extraction and keyframe extraction.
- From another perspective, video retrieval could be considered simpler than image retrieval, since video reveals its objects more easily: the points corresponding to one object move together.
- In addition, video has a linear timeline, which is as important to the narrative structure of video as it is in text.

Video Retrieval

- Video segmentation divides the video into a number of segments by detecting the camera breaks.
- Index extraction: manual indexing, image analysis and computer vision and object recognition
- Keyframe extraction selects representative image frames from each video segment to represent the segment. These keyframes may be used for browsing and for presentation.
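Segmentation and keyframe extraction can be sketched as follows. The slides do not say how camera breaks are detected or how keyframes are chosen, so two assumptions are made here: a break is declared when the intensity histograms of consecutive frames differ by more than a threshold, and the middle frame of each segment serves as its keyframe. Frames are flat, non-empty lists of grey values, and all names, sample data and the threshold are hypothetical.

```python
def grey_histogram(frame, bins=8):
    """Normalised intensity histogram of a non-empty frame of 0..255 values."""
    step = 256 // bins
    hist = [0.0] * bins
    for value in frame:
        hist[value // step] += 1.0
    return [h / len(frame) for h in hist]


def histogram_distance(h1, h2):
    """L1 distance between two normalised histograms, in [0, 2]."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def segment_video(frames, break_threshold=0.5):
    """Split frame indices into segments at detected camera breaks."""
    if not frames:
        return []
    hists = [grey_histogram(f) for f in frames]
    segments, start = [], 0
    for i in range(1, len(frames)):
        if histogram_distance(hists[i - 1], hists[i]) > break_threshold:
            segments.append((start, i - 1))  # inclusive frame range
            start = i
    segments.append((start, len(frames) - 1))
    return segments


def keyframes(segments):
    """Pick the middle frame of each segment as its keyframe (an assumption)."""
    return [(first + last) // 2 for first, last in segments]


# Hypothetical clip: three dark frames, then a cut to four bright frames.
dark, bright = [20] * 16, [230] * 16
clip = [dark, dark, dark, bright, bright, bright, bright]
shots = segment_video(clip)
keys = keyframes(shots)
```

For this clip the cut between frames 2 and 3 yields two segments, with frames 1 and 4 chosen as keyframes. A fixed threshold of this kind handles hard cuts only; gradual transitions such as fades would need a different detector.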
