Multimedia Information Management: Final Project
Multimodal Searching
Sara Egidi, Fabio Greco, Alessio Villardita
Roadmap
● System Architecture
● Dataset
● Feature extraction
● Feature quantization
● Indexing
● Searching
● Interface Implementation
● Results
Project objectives
Development of a search engine that allows textual, visual and multimodal searches.
Implementation steps:
● Extraction of global deep features
● Indexing of visual features
● Indexing of image metadata (tags)
● Combination of text and visual features at search time
Extensions:
● Visual features from multiple layers
● Show classification results
Dataset
25,000 images from Flickr. Available metadata:
● Raw EXIF: includes information on camera, settings, date, time and possibly location.
● Annotations: very few in number (29), too few to be representative and useful for indexing purposes.
● Tags: preprocessed by Flickr. Because tags are written by users, some entries are meaningless and therefore useless.
Deep feature extraction
AlexNet
Global features:
FC6-Layer 6
FC7-Layer 7
FC8-Class labels
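
As a minimal sketch of how a class label could be read from the FC8 output, assuming FC8 yields one score per class and the label shown to the user is the highest-scoring one (the slides do not state how the label is chosen; the class and method names are hypothetical):

```java
/** Sketch only: picks the best-scoring class from an FC8-style score vector. */
final class ClassLabel {

    /** Returns the index of the highest score, or -1 for an empty vector. */
    static int argMax(float[] scores) {
        int best = -1;
        for (int i = 0; i < scores.length; i++) {
            // Keep the first index holding the largest score seen so far.
            if (best < 0 || scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

The returned index would then be looked up in the network's list of class names to obtain a readable label.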
Overall System Architecture
Feature quantization
From deep features to alphanumeric strings:
● each component of the feature vector is associated with a unique alphanumeric keyword
● to take the feature weight into account, the float value of each component is multiplied by a quantization factor Q and rounded to an integer with Math.round; the keyword is repeated that many times
Example (Q = 30):
[ 0.20 0.005 0.12 0.29 ] → [ 6 0.15 3.6 8.7 ] → [ 6 0 4 9 ] → [ A1 A1 A1 A1 A1 A1 A3 A3 A3 A3 A4 A4 A4 A4 A4 A4 A4 A4 A4 ]
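
The quantization step above can be sketched in Java as follows. This is an illustration under assumptions, not the project's actual code: the class name, the method name and the keyword prefix parameter are hypothetical, and keywords are numbered from 1 to match the example.

```java
import java.util.StringJoiner;

/** Sketch only: turns a feature vector into a space-separated surrogate text. */
final class FeatureQuantizer {

    /**
     * @param features feature vector (for instance FC6 or FC7 activations)
     * @param q        quantization factor Q
     * @param prefix   hypothetical keyword prefix; component i becomes prefix + (i + 1)
     */
    static String toSurrogateText(float[] features, float q, String prefix) {
        StringJoiner text = new StringJoiner(" ");
        for (int i = 0; i < features.length; i++) {
            // Scale by Q and round: the result is the number of repetitions.
            int repetitions = Math.round(features[i] * q);
            String keyword = prefix + (i + 1);
            // Repeating the keyword encodes the weight as a term frequency.
            // Components that round to zero (or below) produce no keyword.
            for (int r = 0; r < repetitions; r++) {
                text.add(keyword);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        float[] features = {0.20f, 0.005f, 0.12f, 0.29f};
        System.out.println(toSurrogateText(features, 30f, "A"));
    }
}
```

With Q = 30 and the vector from the example, the second component rounds to 0 and therefore contributes no keyword, which is why A2 never appears in the resulting string.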
Indexing
Indexed fields:
● Id
● Tags
● Deep feature 6
● Deep feature 7
● Class label (layer 8)
Query types:
● Text query
● Visual query
● Text + visual query
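
To make the indexed fields concrete, the sketch below assembles one image's entry as a plain field-to-text map. It deliberately avoids any search-library API; the field names ("id", "tags", "fc6", "fc7", "classLabel") and the class name are assumptions made for illustration, and the deep feature fields are assumed to hold the keyword strings produced by the quantization step.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: one indexed image represented as named text fields. */
final class IndexedImage {

    static Map<String, String> toFields(String id,
                                        List<String> tags,
                                        String fc6Text,
                                        String fc7Text,
                                        String classLabel) {
        // LinkedHashMap keeps the fields in the order listed on the slide.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", id);
        fields.put("tags", String.join(" ", tags)); // user tags as one text field
        fields.put("fc6", fc6Text);                 // quantized layer-6 features
        fields.put("fc7", fc7Text);                 // quantized layer-7 features
        fields.put("classLabel", classLabel);       // label derived from layer 8
        return fields;
    }
}
```

Keeping tags and each deep feature layer in separate fields is what would let a text query, a visual query, or both together target only the fields they need.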
Searching
5 different combinations:
● Text
● Text + uploaded image
● Text + indexed image
● Uploaded image
● Indexed image
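
The five combinations can be derived from which inputs the user actually supplied. The following is a hedged sketch of that dispatch; the enum, the method signature and the rule that an uploaded image and an indexed image cannot be combined are assumptions, not taken from the project's code.

```java
/** Sketch only: maps the supplied inputs to one of the five search combinations. */
final class SearchModes {

    enum Mode {
        TEXT,
        TEXT_AND_UPLOADED_IMAGE,
        TEXT_AND_INDEXED_IMAGE,
        UPLOADED_IMAGE,
        INDEXED_IMAGE
    }

    /**
     * @param text             query text, may be null or blank
     * @param hasUploadedImage true if the user uploaded a query image
     * @param indexedImageId   id of an already indexed image, may be null or empty
     */
    static Mode resolve(String text, boolean hasUploadedImage, String indexedImageId) {
        boolean hasText = text != null && !text.trim().isEmpty();
        boolean hasIndexed = indexedImageId != null && !indexedImageId.isEmpty();

        if (hasUploadedImage && hasIndexed) {
            // Assumption: a query uses at most one image source.
            throw new IllegalArgumentException("Choose either an uploaded or an indexed image");
        }
        if (hasUploadedImage) {
            return hasText ? Mode.TEXT_AND_UPLOADED_IMAGE : Mode.UPLOADED_IMAGE;
        }
        if (hasIndexed) {
            return hasText ? Mode.TEXT_AND_INDEXED_IMAGE : Mode.INDEXED_IMAGE;
        }
        if (hasText) {
            return Mode.TEXT;
        }
        throw new IllegalArgumentException("Empty query");
    }
}
```

For example, `SearchModes.resolve("dog", false, "im42")` selects `TEXT_AND_INDEXED_IMAGE`, the case where a result image is reused as the visual part of a new multimodal query.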
Interface Implementation
Results
● Without text (visual query)
● With text "dog" (multimodal query)
● Different layers
References
● M. J. Huiskes, M. S. Lew (2008). The MIR Flickr Retrieval Evaluation. ACM International Conference on Multimedia Information Retrieval (MIR'08), Vancouver, Canada.
● C. Gennaro. Large Scale Deep Convolutional Neural Network Features Search with Lucene.
Source code: http://www.github.com/egidisa/MultiModalSearch