Multi-modal Information Systems

Khurshid Ahmad, Chair in Computer Science, Trinity College Dublin, Ireland

Jan 21, 2016
Transcript
Page 1: Multi-modal Information Systems

Multi-modal Information Systems

Khurshid Ahmad, Chair in Computer Science,

Trinity College Dublin, Ireland

Page 2: Multi-modal Information Systems

Computation and its neural basis

Information processing in the brain is characterised by two or more areas of the brain interacting concurrently:

Picture naming involves the interaction between the vision and speech areas, and each is apparently stimulated by the presence of the other modality;

Numerosity and its articulation – understanding what a graph of numbers tells us – involves the interaction between the auditory areas and the spatial attention areas of the brain.

Page 3: Multi-modal Information Systems

Computation and its neural basis

Information processing in the brain relies on the conceptual organisation of the world around us – what there is, or ontology.

I have developed a text-based ontology method that has been used successfully for:

- Terminology extraction;
- Sentiment analysis;
- Knowledge management;
- Automatic annotation of images and video sequences.

Page 4: Multi-modal Information Systems

Computation and its neural basis

Much of modern computing relies on the discrete serial processing of uni-modal data

Much of the computing in the brain is on sporadic, multi-modal data streams


Page 6: Multi-modal Information Systems

Computation and its neural basis

• Adaptive Image Annotation: I am working with Trinity Institute of Molecular Medicine to annotate images of animal cells in motion; with the National Gallery of Ireland for annotating fine art images with archival material; and there is a possibility of annotating images of containerized goods at ports of entry

• Sentiment Analysis: Computation of ‘sentiments’, related to the behaviour of stakeholders, from free text and the time-serial correlation of the sentiment with indexes of prices, volumes, and ‘goodwill’. This work is in conjunction with the School of Business and the Irish Stock Exchange.

Page 7: Multi-modal Information Systems

Computation and its neural basis

• I am working on a neural simulation of multi-modal information enhancement and suppression within a self-organising framework;

• I am also studying non-stochastic and unstable time series using wavelets and fuzzy logic.

Page 8: Multi-modal Information Systems

Image and Collateral Texts

The key problem for the evolving semantic web and the creation of large data repositories is the indexing and efficient retrieval of images – both still and moving – and the identification of key objects and events in the images.

The visual features of an image – colour distribution, shapes, edges, texture – under-constrain the image, so an image cannot be described using visual features alone.

Typically, image retrieval systems use images and associated keywords for indexing, and retrieve images using both the visual features and the keywords.
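The combined use of visual features and keywords above can be sketched as a weighted similarity score. Everything below – the feature vectors, the keyword sets, the cosine/Jaccard choice and the equal weighting – is an illustrative assumption, not the specific retrieval system described in these slides:

```python
import numpy as np

def visual_similarity(f1, f2):
    """Cosine similarity between two visual feature vectors
    (e.g. colour/texture/shape descriptors)."""
    f1, f2 = np.asarray(f1, float), np.asarray(f2, float)
    return float(f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2)))

def keyword_similarity(k1, k2):
    """Jaccard overlap between two keyword sets."""
    k1, k2 = set(k1), set(k2)
    return len(k1 & k2) / len(k1 | k2) if k1 | k2 else 0.0

def combined_score(query, item, alpha=0.5):
    """Weighted combination of visual and keyword similarity
    (alpha=0.5 is an arbitrary illustrative choice)."""
    return (alpha * visual_similarity(query["features"], item["features"])
            + (1 - alpha) * keyword_similarity(query["keywords"], item["keywords"]))

# Invented example data
query = {"features": [0.9, 0.1, 0.3], "keywords": {"cell", "migration"}}
item  = {"features": [0.8, 0.2, 0.3], "keywords": {"cell", "nucleus"}}
print(round(combined_score(query, item), 3))
```

Varying `alpha` shifts the ranking between purely visual and purely textual matching.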

Page 9: Multi-modal Information Systems

Introduction to Image Annotation

Why image annotation? Consider a dataset of 20,000 medical images, from which one would like to view all those containing cells.

Visual query: a low-level feature, or an exemplar image similar enough to the desired images.

Text query: a linguistic description of the content of the image.

[Figure: the query "Cell" and the retrieved results]


Page 11: Multi-modal Information Systems

Image and Collateral Texts

Closely collateral texts: the figure caption, and references to the figure in the main body of the text.

Broadly collateral texts: the title of the text, and other texts cited in the paper.

[Figure: the image and its closely and broadly collateral texts]

Page 12: Multi-modal Information Systems

Annotating an Image

- Keywords: specialist terms; tags ("folksonomy")
- Descriptions: authoritative / non-authoritative
- Systems of concepts: classification systems; ontology

Example Flickr tags: Hercules; Hydra; Duplicate Content; Duplicate Content Penalization.

Flickr – Photo Sharing: http://www.flickr.com/

Steve Project: http://www.steve.museum – an "experiment in social tagging of art museum collections"

Page 13: Multi-modal Information Systems

"Syntactic" similarity? Semantic similarity?

Image and Collateral Texts

A. Jaimes & S. Chang. A conceptual framework for indexing visual information at multiple levels. In IS&T/SPIE Internet Imaging, 2000.

Erwin Panofsky. Studies in Iconology. Harper & Row, New York, 1962.

Gustave Moreau, Hercules and the Lernaean Hydra, c. 1876, Art Institute of Chicago.

Page 14: Multi-modal Information Systems

Trinity Multi-modal Systems

Multi-modal Information Systems: to develop a system that learns to segment images, to annotate images with keywords, and to illustrate keywords with images.

Joint feasibility study between Trinity Computer Science & Trinity Molecular Medicine Laboratory;

The Computer Science team is led by Prof. Khurshid Ahmad, and includes Dr. Chaoxin Zheng, Dr. Jacob Martin and Dr. Ann Devitt.

Page 15: Multi-modal Information Systems

Trinity Multi-modal Systems

A neural computing solution to automatic annotation and illustration

- The Trinity system is based on an earlier system that learnt to associate 9,000 keywords with 1,000 images (9 keywords per image on average).
- Once trained, the system can retrieve images given keywords, using full and partial matches.

[Figure: query term, matched text and retrieved image]

K. Ahmad, B. Vrusias, and M. Zhu. ‘Visualising an Image Collection?’ In (Eds.) Ebad Banisi et al. Proceedings of the 9th International Conference Information Visualisation (London 6-8 July 2005). Los Alamitos: IEEE Computer Society Press. pp 268-274. (ISBN 0-7695-2397-8).
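Retrieval by full and partial keyword matches, as described on this slide, might be sketched as a simple overlap ranking. The index contents and the scoring rule below are invented for illustration and are not the trained neural system the slide refers to:

```python
def retrieve(query_terms, index, min_overlap=1):
    """Rank images by how many query keywords match their annotations.
    A full match scores 1.0; partial matches score the fraction matched."""
    query = set(query_terms)
    scored = []
    for image_id, keywords in index.items():
        hits = query & set(keywords)
        if len(hits) >= min_overlap:
            scored.append((image_id, len(hits) / len(query)))
    # best-matching images first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Invented example index
index = {
    "img_001": ["hercules", "hydra", "painting"],
    "img_002": ["hydra", "sculpture"],
    "img_003": ["landscape"],
}
print(retrieve(["hercules", "hydra"], index))
# → [('img_001', 1.0), ('img_002', 0.5)]
```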

Page 16: Multi-modal Information Systems

Trinity Multi-modal Systems

A neural computing solution to automatic annotation and illustration

[Figure: query image, matched image and retrieved text]

Page 17: Multi-modal Information Systems

Indexing and Annotation

Human annotation is tedious and slow, and cannot cope with the huge volume of images generated by advanced image acquisition techniques, such as the high-content screening used in biological and medical research.

There is a need to automate the process of annotating or indexing images in laboratories, at customs check posts, in art galleries and on the internet.

Page 18: Multi-modal Information Systems

How to Annotate Images

People are trained with knowledge in a specified domain and become experts, so that they can annotate images using their expertise – and a lot of other analysis can be done at the same time.

[Figure: training/learning and similarity – "Remember! It is a cell" → "Ha, I know this is a cell"]

Page 19: Multi-modal Information Systems

Automatic Image Annotation

Training set: this is the basis of most systems. Without a training dataset, it is like asking somebody to do a job without giving them any education or training.

Similarity: the new and unseen situation has something in common with the training set.

Learning: exploring the association between images and their associated descriptions.
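The training-set / similarity / learning recipe above can be illustrated with a minimal nearest-neighbour annotator. The feature vectors, keywords and majority-vote rule are invented for illustration, and this is not the neural system the slides describe:

```python
import numpy as np

def annotate(features, training_set, k=3):
    """Label an unseen image with the keywords proposed by a majority of
    its k most similar training images (Euclidean distance in feature space)."""
    x = np.asarray(features, float)
    nearest = sorted(
        training_set,
        key=lambda ex: float(np.linalg.norm(x - np.asarray(ex["features"], float))))
    votes = {}
    for ex in nearest[:k]:
        for kw in ex["keywords"]:
            votes[kw] = votes.get(kw, 0) + 1
    # keep keywords proposed by more than half of the neighbours
    return {kw for kw, n in votes.items() if n > k / 2}

# Invented training data
training_set = [
    {"features": [0.1, 0.9], "keywords": ["cell"]},
    {"features": [0.2, 0.8], "keywords": ["cell", "nucleus"]},
    {"features": [0.9, 0.1], "keywords": ["background"]},
]
print(annotate([0.15, 0.85], training_set, k=3))
# → {'cell'}
```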

Page 20: Multi-modal Information Systems

词图 CITU (C2) System

What is in the system?

- A user-friendly and efficient interface to collect training data
- A modern image analysis toolbox to process images and extract features for similarity measures, and a text processing component which can extract linguistic features
- A state-of-the-art cross-modal system, based on neural computing, to learn the associations between image features and textual features
- A database that acts as the communicator between the different modules

词 = words; 图 = images

Page 21: Multi-modal Information Systems

词图 CITU (C2) System

What can the system do?

- Automatically analyse images – image segmentation, and colour, texture and shape analysis
- Automatically process text documents associated with images – frequency analysis and collocations – to extract key terms and key features
- Automatically learn the association between image features and textual features – once trained, the system will automatically generate keywords for images or retrieve images for textual queries

1. C. Zheng, K. Ahmad, A. Long, Y. Volkov, A. Davies, D. Kelleher 2007. Hierarchical SOMs: segmentation of cell migration images. International Symposium on Neural Networks, Nanjing, China, June 3-7.

2. C. Zheng, A. Long, Y. Volkov, A. Davies, D. Kelleher, K. Ahmad 2007. A cross-modal system for cell migration image annotation and retrieval. International Joint Conference on Neural Networks, Orlando, Aug. 11-17.

3. C. Zheng, D. Kelleher, K. Ahmad 2008. A semi-automatic indexing system for cell migration images. 2008 World Congress on Computational Intelligence, Hong Kong, June 1-6.
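The CITU system learns its image-to-text association with self-organising maps, as the references above indicate. As a hedged stand-in only, the sketch below learns a linear least-squares map from image feature vectors to keyword indicator vectors; the vocabulary, features and threshold are invented for illustration:

```python
import numpy as np

# Toy stand-in for the cross-modal learner (the actual system uses
# self-organising maps; this only illustrates the association idea).
vocab = ["cell", "nucleus", "background"]

X = np.array([[0.1, 0.9],            # image feature vectors (invented)
              [0.2, 0.8],
              [0.9, 0.1]])
Y = np.array([[1, 0, 0],             # keyword indicators, aligned with vocab
              [1, 1, 0],
              [0, 0, 1]], dtype=float)

# Learn the association: W minimises ||X W - Y||
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def predict_keywords(features, threshold=0.5):
    """Generate keywords for an unseen image from its feature vector."""
    scores = np.asarray(features, float) @ W
    return [kw for kw, s in zip(vocab, scores) if s > threshold]

print(predict_keywords([0.15, 0.85]))
# → ['cell']
```

The same matrix can be applied in reverse (text features to image features) for retrieval, which is the symmetry the slide's "generate keywords or retrieve images" claim relies on.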

Page 22: Multi-modal Information Systems

Architecture of CITU (C2)

[Architecture diagram] The system comprises modules connected through a central database:

- Manual annotation: supplies image content and free text to the database.
- Image analysis: image pre-processing, image segmentation and feature extraction; turns image content into image features.
- Language processing: frequency analysis, collocations and feature extraction; turns free text into linguistic features.
- Cross-modal learning: takes the image and linguistic features and learns the cross-modal associations.
- Database: acts as the communicator, holding image content, free text, image and linguistic features, and the learnt associations.

Page 23: Multi-modal Information Systems

Automatic Image Annotation

[Annotation pipeline] The user submits an image; image analysis (pre-processing, segmentation, feature extraction) turns the image content into image features; the cross-modal learning module maps the image features to linguistic features via the database; and the linguistic features are returned to the user as the image annotation.

Page 24: Multi-modal Information Systems

Image Retrieval

[Retrieval pipeline] The user submits free text; language processing (frequency analysis, collocations, feature extraction) turns the text into linguistic features; the cross-modal learning module maps the linguistic features to image features via the database; and the matching images are returned to the user.

Page 25: Multi-modal Information Systems

Image Feature Extraction

Multiscale analysis: a wavelet transform is used to decompose the image into different scales.

Moment extraction: Zernike moments are extracted from each scale as features and passed to the database.

[Figure: two-level wavelet decomposition of an image I into sub-bands]

Notation: I, image; A, approximation signal; H, horizontal signal; V, vertical signal; D, diagonal signal.
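One level of the wavelet decomposition in the figure can be sketched with a plain Haar transform in NumPy. This uses simple pairwise averages and differences (a non-normalised Haar form), and the labelling of the H/V/D detail bands follows one common convention; the Zernike moment step is not shown:

```python
import numpy as np

def haar_dwt2(image):
    """One level of a 2-D Haar wavelet transform: returns the approximation (A),
    horizontal (H), vertical (V) and diagonal (D) sub-bands, each half the size
    of the input (even dimensions assumed)."""
    x = np.asarray(image, float)
    # pairwise averages/differences along rows...
    lo_r = (x[:, 0::2] + x[:, 1::2]) / 2
    hi_r = (x[:, 0::2] - x[:, 1::2]) / 2
    # ...then along columns
    A = (lo_r[0::2] + lo_r[1::2]) / 2
    V = (lo_r[0::2] - lo_r[1::2]) / 2
    H = (hi_r[0::2] + hi_r[1::2]) / 2
    D = (hi_r[0::2] - hi_r[1::2]) / 2
    return A, H, V, D

image = np.arange(16, dtype=float).reshape(4, 4)
A, H, V, D = haar_dwt2(image)
print(A)  # coarse approximation; recurse on A for further scales
```

The second decomposition level in the figure corresponds to calling `haar_dwt2(A)` again; moments would then be computed on each sub-band.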

Page 26: Multi-modal Information Systems

词图 CITU (C2) System