Multi-modal Information Systems
Khurshid Ahmad
Chair in Computer Science, Trinity College Dublin, Ireland
Jan 21, 2016
Computation and its neural basis
Information processing in the brain is characterised by the concurrent interaction of two or more brain areas:
Picture naming involves the interaction between the vision and speech areas, and each is apparently stimulated by the presence of the other modality;
Numerosity and its articulation (understanding what a graph of numbers tells us) involves the interaction between the auditory areas and the spatial-attention areas of the brain.
Computation and its neural basis
Information Processing in the brain involves a reliance on the conceptual organisation of the world around us – what there is or ontology.
I have developed a text-based ontology method that has been used successfully for:
Terminology extraction;
Sentiment analysis;
Knowledge management;
Automatic annotation of images and video sequences.
Computation and its neural basis
Much of modern computing relies on the discrete serial processing of uni-modal data
Much of the computing in the brain is on sporadic, multi-modal data streams
Computation and its neural basis
• Adaptive Image Annotation: I am working with the Trinity Institute of Molecular Medicine to annotate images of animal cells in motion, and with the National Gallery of Ireland to annotate fine art images with archival material; there is also the possibility of annotating images of containerised goods at ports of entry.
• Sentiment Analysis: Computation of ‘sentiments’, related to the behaviour of stakeholders, from free text and the time-serial correlation of the sentiment with indexes of prices, volumes, and ‘goodwill’. This work is in conjunction with the School of Business and the Irish Stock Exchange.
Computation and its neural basis
• I am working on a neural simulation of multi-modal information enhancement and suppression within a self-organising framework;
• A study of non-stochastic and unstable time series using wavelets and fuzzy logic.
Image and Collateral Texts
The key problem for the evolving semantic web and the creation of large data repositories is the indexation and efficient retrieval of images – both still and moving- and the identification of key objects and events in the images.
The visual features of an image – colour distribution, shapes, edges, texture – under-constrain the image, so an image cannot be described using visual features alone.
Typically, image retrieval systems use images and their associated keywords, indexing and retrieving images by both visual features and keywords.
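The combination of visual features and keywords can be sketched as a blended retrieval score; the feature vectors, keyword sets, and blending weight below are invented for illustration, not taken from any actual system.

```python
# Sketch: rank images by a weighted blend of visual similarity
# (cosine over feature vectors) and keyword overlap (Jaccard).
# All data and the weight w are hypothetical.

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def score(query_vec, query_words, image_vec, image_words, w=0.5):
    """Blend visual similarity with keyword overlap."""
    visual = cosine(query_vec, image_vec)
    union = set(query_words) | set(image_words)
    textual = len(set(query_words) & set(image_words)) / len(union) if union else 0.0
    return w * visual + (1 - w) * textual

# A toy index of two images: (colour-histogram-like vector, keywords).
index = {
    "img1": ([0.9, 0.1, 0.0], {"cell", "migration"}),
    "img2": ([0.1, 0.8, 0.1], {"tissue", "stain"}),
}
best = max(index, key=lambda k: score([0.8, 0.2, 0.0], {"cell"}, *index[k]))
```

A query that is visually and textually close to `img1` ranks it first; real systems differ mainly in the features used and in how the two scores are weighted.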
Introduction to Image Annotation

Why image annotation? Consider a dataset of 20,000 medical images, from which one would like to see all those containing cells.

Visual query: a low-level feature or an exemplar image similar enough to the desired images.
Text query: a linguistic description of the content of the image.

[Figure: the query "cell" and the images it retrieves]
Image and Collateral Texts

[Figure: an image and its collateral texts]
Closely collateral texts: the figure caption; references to the figure in the main body of the text.
Broadly collateral texts: the title of the text; other texts cited in the paper.
Annotating an Image

Keywords: specialist terms; tags ("folksonomy").
Descriptions: authoritative/non-authoritative.
Systems of concepts: classification systems; ontology.
Flickr tags

Example tags: Hercules; Hydra; Duplicate Content; Duplicate Content Penalization.

Flickr – Photo Sharing: http://www.flickr.com/
Steve Project: http://www.steve.museum – an "experiment in social tagging of art museum collections"

"Syntactic" similarity? Semantic similarity?
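"Syntactic" similarity between tags can be approximated by string edit distance, which catches spelling variants but says nothing about semantics; a minimal sketch, with the normalisation scheme chosen here for illustration:

```python
# Sketch: syntactic tag similarity via Levenshtein edit distance,
# normalised by the longer string's length. This captures surface
# variation ("hydra" vs "hydras") but not meaning.

def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def syntactic_similarity(a, b):
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    m = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / m if m else 1.0
```

Semantic similarity (e.g. "Hercules" vs "Heracles" naming the same figure) needs external knowledge such as an ontology; no string measure alone recovers it.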
Image and Collateral Texts
A. Jaimes & S. Chang. A conceptual framework for indexing visual information at multiple levels. In IS&T/SPIE Internet Imaging, 2000.
Erwin Panofsky. Studies in Iconology. Harper & Row, New York, 1962.
Gustave Moreau, Hercules and the Lernaean Hydra, c. 1876, Art Institute of Chicago.
Trinity Multi-modal Systems

Multi-modal Information Systems: to develop a system that learns to segment images, to annotate images with keywords, and to illustrate keywords with images.
A joint feasibility study between Trinity Computer Science and the Trinity Molecular Medicine Laboratory; the Computer Science team is led by Prof. Khurshid Ahmad and includes Dr Chaoxin Zheng, Dr Jacob Martin and Dr Ann Devitt.
Trinity Multi-modal Systems

A neural computing solution to automatic annotation and illustration
• The Trinity system is based on an earlier system that learnt to associate 9,000 keywords with 1,000 images (9 keywords per image on average).
• Once trained, the system can retrieve images given keywords, using full and partial matches.

[Figure: a query term, its matched text, and the retrieved image]
K. Ahmad, B. Vrusias, and M. Zhu. 'Visualising an Image Collection?' In Ebad Banissi et al. (Eds.), Proceedings of the 9th International Conference on Information Visualisation (London, 6-8 July 2005). Los Alamitos: IEEE Computer Society Press, pp. 268-274. ISBN 0-7695-2397-8.
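The full and partial keyword matching mentioned above can be sketched as scoring each image by the fraction of query terms its keywords cover; the image index below is invented for illustration.

```python
# Sketch: keyword-based retrieval with full and partial matches.
# A score of 1.0 is a full match; anything in (0, 1) is partial.
# The image index is hypothetical.

index = {
    "scene1.jpg": {"hercules", "hydra", "myth"},
    "scene2.jpg": {"cell", "migration", "microscopy"},
}

def retrieve(query_terms, index, threshold=0.0):
    """Rank images by the fraction of query terms their keywords match."""
    query = set(query_terms)
    results = []
    for image, keywords in index.items():
        score = len(query & keywords) / len(query)
        if score > threshold:
            results.append((image, score))
    return sorted(results, key=lambda r: -r[1])

hits = retrieve({"cell", "migration"}, index)
```

Lowering the threshold admits more partial matches; the trained neural system described above learns such associations rather than looking them up directly, but the ranking behaviour it exposes is of this kind.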
Trinity Multi-modal Systems
[Figure: a query image, the matched image, and the retrieved text]
A neural computing solution to automatic annotation and illustration
Indexing and Annotation
Human annotation is tedious and slow, and cannot cope with the huge volume of images generated by advanced image acquisition techniques, such as the high-content screening used in biological and medical research.
There is a need to automate the process of annotating or indexing images in laboratories, at customs check posts, in art galleries or on the internet
How to Annotate Images
People are trained with knowledge in a specified domain and become experts, so that they can annotate images using their expertise; a good deal of other analysis can be done at the same time.

[Figure: training/learning and similarity – an expert is taught "Remember! It is a cell" and later recognises "I know this is a cell"]
Automatic Image Annotation

Training set: this is the basis of most systems. Without a training dataset, it is like asking somebody to do a job without giving them any education or training.
Similarity: the new, unseen situation has something in common with the training set.
Learning: exploring the association between images and their associated descriptions.
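The training-set/similarity/learning idea above can be sketched as its simplest baseline: annotate an unseen image by copying the keywords of its nearest training image in feature space. The feature vectors and labels are made up for illustration.

```python
# Sketch: 1-nearest-neighbour automatic annotation. An unseen image
# inherits the keywords of the most similar training image.
# Features and labels are hypothetical.

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

training = [
    ([0.9, 0.1], {"cell", "nucleus"}),   # (feature vector, keywords)
    ([0.1, 0.9], {"tissue", "stain"}),
]

def annotate(features, training):
    """Return the keywords of the nearest training image."""
    _, keywords = min(training, key=lambda t: euclidean(features, t[0]))
    return keywords

labels = annotate([0.8, 0.2], training)
```

This is deliberately crude: it shows why the training set matters (no neighbours, no labels) and why similarity matters (the match is only as good as the feature space), which is exactly the gap a learned cross-modal model tries to close.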
词图 CITU (C2) System
What is in the system?

A user-friendly and efficient interface to collect training data.
A modern image analysis toolbox to process images and extract features for similarity measures, and a text processing component which can extract linguistic features.
A state-of-the-art cross-modal system, based on neural computing, to learn the associations between image features and textual features.
A database that acts as the communicator between the different modules.
词 = words; 图 = images
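The cited papers use self-organising maps (SOMs) for the neural component; as a rough illustration only, a single SOM training step over joint image-plus-text vectors might look as follows. The grid size, learning rate, and data are invented, not taken from CITU.

```python
import numpy as np

# Sketch: one self-organising-map (SOM) training step. Each map node
# holds a joint [image-feature | text-feature] vector; the best-matching
# unit (BMU) is found for an input, and nodes near the BMU on the grid
# move toward the input. All parameters here are illustrative.

rng = np.random.default_rng(0)
grid = rng.random((5, 5, 4))          # 5x5 map of 4-dim joint vectors

def train_step(grid, x, lr=0.5, sigma=1.0):
    # BMU: node whose weight vector is closest to the input x.
    d = np.linalg.norm(grid - x, axis=2)
    bi, bj = np.unravel_index(np.argmin(d), d.shape)
    # Gaussian neighbourhood around the BMU on the 2-D grid.
    ii, jj = np.indices(d.shape)
    h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * sigma ** 2))
    # Move every node toward x, weighted by neighbourhood strength.
    return grid + lr * h[:, :, None] * (x - grid)

x = np.array([1.0, 0.0, 1.0, 0.0])    # a joint image+text input
for _ in range(50):
    grid = train_step(grid, x)
```

After repeated presentation of an input, the BMU's weight vector converges to it; presenting many inputs instead organises the map so that nearby nodes hold similar joint vectors, which is what lets image features index text features and vice versa.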
词图 CITU (C2) System
What can the system do?

Automatically analyse images: image segmentation and colour, texture, and shape analysis.
Automatically process text documents associated with images, using frequency analysis and collocations, to extract key terms and key features.
Automatically learn the association between image features and textual features: once trained, the system will automatically generate keywords for images or retrieve images for textual queries.
1. C. Zheng, K. Ahmad, A. Long, Y. Volkov, A. Davies, D. Kelleher 2007. Hierarchical SOMs: segmentation of cell migration images. International Symposium on Neural Networks, Nanjing, China, June 3-7.
2. C. Zheng, A. Long, Y. Volkov, A. Davies, D. Kelleher, K. Ahmad 2007. A cross-modal system for cell migration image annotation and retrieval. International Joint Conference on Neural Networks, Orlando, Aug. 11-17.
3. C. Zheng, D. Kelleher, K. Ahmad 2008. A semi-automatic indexing system for cell migration images. 2008 World Congress on Computational Intelligence, Hong Kong, June 1-6.
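The frequency-and-collocation step above can be sketched as counting recurring word pairs in the collateral text; the sample text and the cut-off are invented for illustration.

```python
from collections import Counter

# Sketch: extract candidate collocations (key terms) from collateral
# text by bigram frequency. Text and threshold are hypothetical.

text = ("cell migration was observed and cell migration was recorded "
        "while cell division was rare")
tokens = text.split()
bigrams = Counter(zip(tokens, tokens[1:]))

# Bigrams seen more than once are candidate collocations.
collocations = [b for b, n in bigrams.items() if n > 1]
```

Real term extraction adds stop-word filtering and statistical association measures (e.g. comparing observed against expected co-occurrence), but raw recurrence already surfaces domain terms like "cell migration".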
Architecture of CITU (C2)

[Figure: system architecture. Manual annotation supplies image content and free text. Image analysis (image pre-processing, image segmentation, feature extraction) turns image content into image features; language processing (frequency analysis, collocations, feature extraction) turns free text into linguistic features. The database stores the image and linguistic features, and cross-modal learning builds the associations between them.]
Automatic Image Annotation

[Figure: annotation pipeline. The user supplies images; image analysis (image pre-processing, image segmentation, feature extraction) turns image content into image features; the database and the cross-modal learning module map the image features to linguistic features, which are returned to the user as annotations.]
Image Retrieval

[Figure: retrieval pipeline. The user supplies free text; language processing (frequency analysis, collocations, feature extraction) turns the text into linguistic features; the database and the cross-modal learning module map the linguistic features to image features, and the matching images are returned to the user.]
Image Feature Extraction

Multiscale analysis: a wavelet transform is used to decompose the image into different scales.
Moment extraction: Zernike moments are extracted from each scale as features and passed to the database.

[Figure: wavelet decomposition tree – the image I is split into A, H, V, and D subbands, and the approximation A is split again at the next scale.]
Notation: I, image; A, approximation signal; H, horizontal signal; V, vertical signal; D, diagonal signal.
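One level of the decomposition in the figure can be written directly in NumPy, assuming the Haar wavelet (the slides do not name the wavelet used) and even image dimensions:

```python
import numpy as np

# Sketch: one level of a 2-D Haar wavelet decomposition, splitting an
# image I into approximation (A), horizontal (H), vertical (V) and
# diagonal (D) subbands. Written in plain NumPy so no wavelet library
# is assumed; the Haar basis itself is an assumption here.

def haar2d(img):
    img = np.asarray(img, dtype=float)
    # Take each 2x2 block of pixels and form averaged sums/differences.
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    A = (a + b + c + d) / 4.0   # low-low: coarse approximation
    H = (a + b - c - d) / 4.0   # low-high: horizontal detail
    V = (a - b + c - d) / 4.0   # high-low: vertical detail
    D = (a - b - c + d) / 4.0   # high-high: diagonal detail
    return A, H, V, D

# Repeating haar2d on A yields the next scale in the tree, and each
# scale's subbands can then be summarised by Zernike moments.
```

Each call halves the resolution, so a few levels give the multiscale representation the figure shows; moment extraction then runs per subband.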