Multi-modal Information Systems

Khurshid Ahmad, Chair in Computer Science, Trinity College Dublin, Ireland

Jan 21, 2016
Transcript
Page 1: Multi-modal Information Systems

Multi-modal Information Systems

Khurshid Ahmad, Chair in Computer Science,

Trinity College Dublin, Ireland

Page 2: Multi-modal Information Systems

Computation and its neural basis

Information processing in the brain is characterised by two or more areas of the brain interacting concurrently:

Picture naming involves the interaction between the vision and speech areas, and each is apparently stimulated by the presence of the other modality;

Numerosity and its articulation – understanding what a graph of numbers tells us – involves the interaction between the auditory areas and the spatial attention areas of the brain.

Page 3: Multi-modal Information Systems

Computation and its neural basis

Information processing in the brain relies on the conceptual organisation of the world around us – what there is, or ontology.

I have developed a text-based ontology method that has been used successfully for:

- Terminology extraction;
- Sentiment analysis;
- Knowledge management;
- Automatic annotation of images and video sequences.

Page 4: Multi-modal Information Systems

Computation and its neural basis

Much of modern computing relies on the discrete serial processing of uni-modal data

Much of the computing in the brain is on sporadic, multi-modal data streams


Page 6: Multi-modal Information Systems

Computation and its neural basis

• Adaptive Image Annotation: I am working with Trinity Institute of Molecular Medicine to annotate images of animal cells in motion; with the National Gallery of Ireland for annotating fine art images with archival material; and there is a possibility of annotating images of containerized goods at ports of entry

• Sentiment Analysis: Computation of ‘sentiments’, related to the behaviour of stakeholders, from free text and the time-serial correlation of the sentiment with indexes of prices, volumes, and ‘goodwill’. This work is in conjunction with the School of Business and the Irish Stock Exchange.

Page 7: Multi-modal Information Systems

Computation and its neural basis

• I am working on a neural simulation of multi-modal information enhancement and suppression within a self-organising framework;

• I am also studying non-stochastic and unstable time series using wavelets and fuzzy logic.

Page 8: Multi-modal Information Systems

Image and Collateral Texts

The key problem for the evolving semantic web and the creation of large data repositories is the indexing and efficient retrieval of images – both still and moving – and the identification of key objects and events in the images.

The visual features of an image – colour distribution, shapes, edges, texture – under-constrain the image, so an image cannot be described using visual features alone.

Typically, image retrieval systems use images and associated keywords for indexing, and retrieve images using both the visual features and the keywords.
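The combined use of visual features and keywords above can be sketched as a weighted similarity score. Everything below – the feature vectors, the keyword sets, the cosine/Jaccard choice and the equal weighting – is an illustrative assumption, not the specific retrieval system described in these slides:

```python
import numpy as np

def visual_similarity(f1, f2):
    """Cosine similarity between two visual feature vectors
    (e.g. colour/texture/shape descriptors)."""
    f1, f2 = np.asarray(f1, float), np.asarray(f2, float)
    return float(f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2)))

def keyword_similarity(k1, k2):
    """Jaccard overlap between two keyword sets."""
    k1, k2 = set(k1), set(k2)
    return len(k1 & k2) / len(k1 | k2) if k1 | k2 else 0.0

def combined_score(query, item, alpha=0.5):
    """Weighted combination of visual and keyword similarity
    (alpha=0.5 is an arbitrary illustrative choice)."""
    return (alpha * visual_similarity(query["features"], item["features"])
            + (1 - alpha) * keyword_similarity(query["keywords"], item["keywords"]))

# Invented example data
query = {"features": [0.9, 0.1, 0.3], "keywords": {"cell", "migration"}}
item  = {"features": [0.8, 0.2, 0.3], "keywords": {"cell", "nucleus"}}
print(round(combined_score(query, item), 3))
```

Varying `alpha` shifts the ranking between purely visual and purely textual matching.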

Page 9: Multi-modal Information Systems

Introduction to Image Annotation

Why image annotation? Consider a dataset of 20,000 medical images, from which one would like to view all those containing cells.

Visual query: a low-level feature, or an exemplar image similar enough to the desired images.

Text query: a linguistic description of the content of the image.

[Figure: the query "Cell" and the retrieved results]


Page 11: Multi-modal Information Systems

Image and Collateral Texts

Closely collateral texts: the figure caption, and references to the figure in the main body of the text.

Broadly collateral texts: the title of the text, and other texts cited in the paper.

[Figure: the image and its closely and broadly collateral texts]

Page 12: Multi-modal Information Systems

Annotating an Image

- Keywords: specialist terms; tags ("folksonomy")
- Descriptions: authoritative / non-authoritative
- Systems of concepts: classification systems; ontology

Example Flickr tags: Hercules; Hydra; Duplicate Content; Duplicate Content Penalization.

Flickr – Photo Sharing: http://www.flickr.com/

Steve Project: http://www.steve.museum – an "experiment in social tagging of art museum collections"

Page 13: Multi-modal Information Systems

"Syntactic" similarity? Semantic similarity?

Image and Collateral Texts

A. Jaimes & S. Chang. A conceptual framework for indexing visual information at multiple levels. In IS&T/SPIE Internet Imaging, 2000.

Erwin Panofsky. Studies in Iconology. Harper & Row, New York, 1962.

Gustave Moreau, Hercules and the Lernaean Hydra, c. 1876, Art Institute of Chicago.

Page 14: Multi-modal Information Systems

Trinity Multi-modal Systems

Multi-modal Information Systems: to develop a system that learns to segment images, to annotate images with keywords, and to illustrate keywords with images.

Joint feasibility study between Trinity Computer Science & Trinity Molecular Medicine Laboratory;

The Computer Science team is led by Prof. Khurshid Ahmad, and includes Dr. Chaoxin Zheng, Dr. Jacob Martin and Dr. Ann Devitt.

Page 15: Multi-modal Information Systems

Trinity Multi-modal Systems

A neural computing solution to automatic annotation and illustration

- The Trinity system is based on an earlier system that learnt to associate 9,000 keywords with 1,000 images (9 keywords per image on average).
- Once trained, the system can retrieve images given keywords, using full and partial matches.

[Figure: query term, matched text and retrieved image]

K. Ahmad, B. Vrusias, and M. Zhu. ‘Visualising an Image Collection?’ In (Eds.) Ebad Banisi et al. Proceedings of the 9th International Conference Information Visualisation (London 6-8 July 2005). Los Alamitos: IEEE Computer Society Press. pp 268-274. (ISBN 0-7695-2397-8).
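Retrieval by full and partial keyword matches, as described on this slide, might be sketched as a simple overlap ranking. The index contents and the scoring rule below are invented for illustration and are not the trained neural system the slide refers to:

```python
def retrieve(query_terms, index, min_overlap=1):
    """Rank images by how many query keywords match their annotations.
    A full match scores 1.0; partial matches score the fraction matched."""
    query = set(query_terms)
    scored = []
    for image_id, keywords in index.items():
        hits = query & set(keywords)
        if len(hits) >= min_overlap:
            scored.append((image_id, len(hits) / len(query)))
    # best-matching images first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Invented example index
index = {
    "img_001": ["hercules", "hydra", "painting"],
    "img_002": ["hydra", "sculpture"],
    "img_003": ["landscape"],
}
print(retrieve(["hercules", "hydra"], index))
# → [('img_001', 1.0), ('img_002', 0.5)]
```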

Page 16: Multi-modal Information Systems

Trinity Multi-modal Systems

A neural computing solution to automatic annotation and illustration

[Figure: query image, matched image and retrieved text]

Page 17: Multi-modal Information Systems

Indexing and Annotation

Human annotation is tedious and slow, and cannot cope with the huge volume of images generated by advanced image acquisition techniques, such as the high-content screening used in biological and medical research.

There is a need to automate the process of annotating or indexing images in laboratories, at customs check posts, in art galleries and on the internet.

Page 18: Multi-modal Information Systems

How to Annotate Images

People are trained with knowledge in a specified domain and become experts, so that they can annotate images using their expertise – and a lot of other analysis can be done at the same time.

[Figure: training/learning and similarity – "Remember! It is a cell" → "Ha, I know this is a cell"]

Page 19: Multi-modal Information Systems

Automatic Image Annotation

Training set: this is the basis of most systems. Without a training dataset, it is like asking somebody to do a job without giving them any education or training.

Similarity: the new and unseen situation has something in common with the training set.

Learning: exploring the association between images and their associated descriptions.
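The training-set / similarity / learning recipe above can be illustrated with a minimal nearest-neighbour annotator. The feature vectors, keywords and majority-vote rule are invented for illustration, and this is not the neural system the slides describe:

```python
import numpy as np

def annotate(features, training_set, k=3):
    """Label an unseen image with the keywords proposed by a majority of
    its k most similar training images (Euclidean distance in feature space)."""
    x = np.asarray(features, float)
    nearest = sorted(
        training_set,
        key=lambda ex: float(np.linalg.norm(x - np.asarray(ex["features"], float))))
    votes = {}
    for ex in nearest[:k]:
        for kw in ex["keywords"]:
            votes[kw] = votes.get(kw, 0) + 1
    # keep keywords proposed by more than half of the neighbours
    return {kw for kw, n in votes.items() if n > k / 2}

# Invented training data
training_set = [
    {"features": [0.1, 0.9], "keywords": ["cell"]},
    {"features": [0.2, 0.8], "keywords": ["cell", "nucleus"]},
    {"features": [0.9, 0.1], "keywords": ["background"]},
]
print(annotate([0.15, 0.85], training_set, k=3))
# → {'cell'}
```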

Page 20: Multi-modal Information Systems

词图 CITU (C2) System

What is in the system?

- A user-friendly and efficient interface to collect training data
- A modern image analysis toolbox to process images and extract features for similarity measures, and a text processing component which can extract linguistic features
- A state-of-the-art cross-modal system, based on neural computing, to learn the associations between image features and textual features
- A database that acts as the communicator between the different modules

词 = words; 图 = images

Page 21: Multi-modal Information Systems

词图 CITU (C2) System

What can the system do?

- Automatically analyse images – image segmentation, and colour, texture and shape analysis
- Automatically process text documents associated with images – frequency analysis and collocations – to extract key terms and key features
- Automatically learn the association between image features and textual features – once trained, the system will automatically generate keywords for images or retrieve images for textual queries

1. C. Zheng, K. Ahmad, A. Long, Y. Volkov, A. Davies, D. Kelleher 2007. Hierarchical SOMs: segmentation of cell migration images. International Symposium on Neural Networks, Nanjing, China, June 3-7.

2. C. Zheng, A. Long, Y. Volkov, A. Davies, D. Kelleher, K. Ahmad 2007. A cross-modal system for cell migration image annotation and retrieval. International Joint Conference on Neural Networks, Orlando, Aug. 11-17.

3. C. Zheng, D. Kelleher, K. Ahmad 2008. A semi-automatic indexing system for cell migration images. 2008 World Congress on Computational Intelligence, Hong Kong, June 1-6.
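The CITU system learns its image-to-text association with self-organising maps, as the references above indicate. As a hedged stand-in only, the sketch below learns a linear least-squares map from image feature vectors to keyword indicator vectors; the vocabulary, features and threshold are invented for illustration:

```python
import numpy as np

# Toy stand-in for the cross-modal learner (the actual system uses
# self-organising maps; this only illustrates the association idea).
vocab = ["cell", "nucleus", "background"]

X = np.array([[0.1, 0.9],            # image feature vectors (invented)
              [0.2, 0.8],
              [0.9, 0.1]])
Y = np.array([[1, 0, 0],             # keyword indicators, aligned with vocab
              [1, 1, 0],
              [0, 0, 1]], dtype=float)

# Learn the association: W minimises ||X W - Y||
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def predict_keywords(features, threshold=0.5):
    """Generate keywords for an unseen image from its feature vector."""
    scores = np.asarray(features, float) @ W
    return [kw for kw, s in zip(vocab, scores) if s > threshold]

print(predict_keywords([0.15, 0.85]))
# → ['cell']
```

The same matrix can be applied in reverse (text features to image features) for retrieval, which is the symmetry the slide's "generate keywords or retrieve images" claim relies on.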

Page 22: Multi-modal Information Systems

Architecture of CITU (C2)

[Architecture diagram] The system comprises modules connected through a central database:

- Manual annotation: supplies image content and free text to the database.
- Image analysis: image pre-processing, image segmentation and feature extraction; turns image content into image features.
- Language processing: frequency analysis, collocations and feature extraction; turns free text into linguistic features.
- Cross-modal learning: takes the image and linguistic features and learns the cross-modal associations.
- Database: acts as the communicator, holding image content, free text, image and linguistic features, and the learnt associations.

Page 23: Multi-modal Information Systems

Automatic Image Annotation

[Annotation pipeline] The user submits an image; image analysis (pre-processing, segmentation, feature extraction) turns the image content into image features; the cross-modal learning module maps the image features to linguistic features via the database; and the linguistic features are returned to the user as the image annotation.

Page 24: Multi-modal Information Systems

Image Retrieval

[Retrieval pipeline] The user submits free text; language processing (frequency analysis, collocations, feature extraction) turns the text into linguistic features; the cross-modal learning module maps the linguistic features to image features via the database; and the matching images are returned to the user.

Page 25: Multi-modal Information Systems

Image Feature Extraction

Multiscale analysis: a wavelet transform is used to decompose the image into different scales.

Moment extraction: Zernike moments are extracted from each scale as features and passed to the database.

[Figure: two-level wavelet decomposition of an image I into sub-bands]

Notation: I, image; A, approximation signal; H, horizontal signal; V, vertical signal; D, diagonal signal.
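One level of the wavelet decomposition in the figure can be sketched with a plain Haar transform in NumPy. This uses simple pairwise averages and differences (a non-normalised Haar form), and the labelling of the H/V/D detail bands follows one common convention; the Zernike moment step is not shown:

```python
import numpy as np

def haar_dwt2(image):
    """One level of a 2-D Haar wavelet transform: returns the approximation (A),
    horizontal (H), vertical (V) and diagonal (D) sub-bands, each half the size
    of the input (even dimensions assumed)."""
    x = np.asarray(image, float)
    # pairwise averages/differences along rows...
    lo_r = (x[:, 0::2] + x[:, 1::2]) / 2
    hi_r = (x[:, 0::2] - x[:, 1::2]) / 2
    # ...then along columns
    A = (lo_r[0::2] + lo_r[1::2]) / 2
    V = (lo_r[0::2] - lo_r[1::2]) / 2
    H = (hi_r[0::2] + hi_r[1::2]) / 2
    D = (hi_r[0::2] - hi_r[1::2]) / 2
    return A, H, V, D

image = np.arange(16, dtype=float).reshape(4, 4)
A, H, V, D = haar_dwt2(image)
print(A)  # coarse approximation; recurse on A for further scales
```

The second decomposition level in the figure corresponds to calling `haar_dwt2(A)` again; moments would then be computed on each sub-band.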

Page 26: Multi-modal Information Systems

词图 CITU (C2) System