Preparing for the 2008 Beijing Olympics : The LingTour and KNOWLISTICS projects


MAO Yuhang, DING Xiao-Qing, NI Yang, LIN Shiuan-Sung, Laurence LIKFORMAN, Christian BOITET, Gérard CHOLLET, Alain GOYE, Eric LECOLINET, Jacques PRADO

Presented here by Gérard CHOLLET, GET-ENST/CNRS-LTCI

http://www.tsi.enst.fr/~chollet

Outline

• Rationale of the proposal
• Objectives
• The Beijing 2008 Olympics
• Approaches
  – Multimedia, multilingual information server
  – Information kiosk
  – Intelligent Camera
  – Bilingual Voice Communicator
• Needs and relevance
  – A PDA for tourists and travelling businessmen
• Conclusions and Perspectives

Rationale for the IP-KNOWLISTICS

• Logistics for knowledge in a specific domain (OG: the Olympic Games)
• Language-independent knowledge representation and management
• Multimedia (text, speech, image, video)
• Multimodal access (text, speech, pen, visual I/O)
• Distributed multilingual, multimedia server accessible from mobile terminals (phone, PDA, PC,…) and kiosks
• Primarily targeted at tourist applications, initially with the 2008 Beijing Olympics as a field trial

Technical developments

• Language-independent knowledge representation (using conceptual graphs and an intermediate representation language such as the ‘Universal Networking Language’)
• Tools for enconversion (analysis of natural-language text into UNL) and evaluation
• Generation in 12 target languages
• Multilingual speech synthesis and recognition
• VoiceUNL-based interactive dialog agent
• ‘Intelligent camera’ with Chinese character recognition
• Cross-language ‘Multimodal communicator’ on a PDA
• Cross-language lexical access

Chinese character recognition

Intelligent camera from Tsinghua Univ.

[Figure: capture → recognition → translation]

Extracting text from scene images

• Complex color images
• Uncontrolled illumination
• Variations: size, fonts, orientation, texture
• Complex backgrounds, shadows

Text extraction

• Searching for character regions (text has a uniform color)
• Multi-channel decomposition
• Connected-component analysis
• Grouping of components
• Alignment analysis (number of horizontally or vertically aligned components)
• Text identification (language-independent features: size, alignment,…)

Detection rate: 84 %; false alarm rate: 5.6 %
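The connected-component and alignment steps above can be sketched as follows. This is a simplified illustration, not the Tsinghua implementation. It assumes one colour channel has already been binarised into a 0/1 mask. It uses 4-connectivity and only checks horizontal alignment, and the tolerance `tol` and the threshold `min_count` are hypothetical parameters.

```python
from collections import deque

def connected_components(mask):
    """4-connected components of a 0/1 mask; returns boxes (top, left, bottom, right)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # Breadth-first flood fill, tracking the component's bounding box.
                queue = deque([(y, x)])
                seen[y][x] = True
                top, left, bottom, right = y, x, y, x
                while queue:
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

def centre(box):
    """Vertical centre of a bounding box."""
    return (box[0] + box[2]) / 2

def horizontally_aligned_groups(boxes, tol=1.0, min_count=3):
    """Group boxes whose vertical centres are within tol; keep groups of >= min_count boxes."""
    groups = []
    for box in sorted(boxes, key=centre):
        if groups and abs(centre(box) - centre(groups[-1][0])) <= tol:
            groups[-1].append(box)
        else:
            groups.append([box])
    return [g for g in groups if len(g) >= min_count]

# Toy mask: three short strokes on one line (text-like) and one isolated stroke.
MASK = [
    [1, 0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
]
boxes = connected_components(MASK)
print(horizontally_aligned_groups(boxes))  # [[(0, 0, 1, 0), (0, 2, 1, 2), (0, 4, 1, 4)]]
```

Counting aligned components is one of the language-independent cues mentioned above. Size and vertical alignment would be handled the same way.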

Cross-language Multimodal Communicator

Use of a visual display (e.g. on a PDA) to mediate the dialogue between two people speaking different languages.

Recognition of short utterances, display of a word graph, selection of keywords, and visualisation (and synthesis) of the translation of keywords and groups of words.

Specialised lexicon for dialog acts in typical tourist situations (in a restaurant, at the hotel, medical assistance, in the street, in public transport, about the Olympic Games,…)

UMTS access to an information server offering maps, photographs, video sequences, web browsing, …
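The word-graph and keyword steps can be illustrated with a minimal sketch. It takes the highest-scoring path through a small word graph and then looks up keywords in a situation-specific lexicon. The word graph, its scores, and the French→Chinese "street" lexicon are hypothetical. Node ids are assumed to increase with time. In the communicator the user selects the keywords on the display, so lexicon membership only stands in for that choice here.

```python
# Hypothetical word graph for a short French utterance: (start_node, end_node, word, log_score).
# Node ids are assumed to increase along every edge, so sorting by start node is a topological order.
WORD_GRAPH = [
    (0, 1, "où", -0.2), (0, 1, "ou", -0.9),
    (1, 2, "est", -0.1),
    (2, 3, "le", -0.3), (2, 3, "la", -0.6),
    (3, 4, "stade", -0.4), (3, 4, "stage", -1.2),
]

# Hypothetical French -> Chinese keyword lexicon for the "in the street" situation.
STREET_LEXICON = {"où": "哪里", "stade": "体育场", "métro": "地铁"}

def best_path(edges, start, end):
    """Highest total log-score word sequence from start to end (dynamic programming on a DAG)."""
    best = {start: (0.0, [])}
    for s, e, word, score in sorted(edges):
        if s not in best:
            continue
        total = best[s][0] + score
        if e not in best or total > best[e][0]:
            best[e] = (total, best[s][1] + [word])
    return best[end][1]

def translate_keywords(words, lexicon):
    """Keep only the words covered by the situation lexicon, paired with their translation."""
    return [(w, lexicon[w]) for w in words if w in lexicon]

words = best_path(WORD_GRAPH, 0, 4)
print(words)                                      # ['où', 'est', 'le', 'stade']
print(translate_keywords(words, STREET_LEXICON))  # [('où', '哪里'), ('stade', '体育场')]
```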

Automatic Speech Recognition in Multiple Languages

Sharing of acoustic models between languages, to simplify extension to other languages.

Combination of phone models and adaptation from small amounts of data in new languages.

Model adaptation to user and environmental situations.

[Figure: French and Chinese acoustic models, showing a set of shared acoustic models alongside language-specific models]
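Model sharing can be illustrated by the simplest possible scheme. In this sketch, a phone seen in two or more languages gets one shared model trained on pooled data, and the remaining phones keep language-specific models. The inventories below are hypothetical and heavily truncated. Real multilingual systems usually share at a finer level, for example by data-driven state clustering, so this only shows the idea. It also shows why sharing eases extension: in a new language, only the phones without an existing model need data.

```python
from collections import Counter

# Hypothetical, truncated phone inventories (IPA-like symbols), for illustration only.
INVENTORIES = {
    "french": {"a", "i", "u", "p", "t", "k", "m", "n", "s", "y", "ʁ", "ɛ̃"},
    "chinese": {"a", "i", "u", "p", "t", "k", "m", "n", "s", "y", "tɕ", "ʂ"},
}

def partition_models(inventories):
    """Split phones into a shared pool (present in >= 2 languages) and per-language specific sets."""
    counts = Counter(p for inventory in inventories.values() for p in inventory)
    shared = {p for p, c in counts.items() if c >= 2}
    specific = {lang: inventory - shared for lang, inventory in inventories.items()}
    return shared, specific

def coverage(new_inventory, shared, specific):
    """For a new language: phones that can reuse an existing model, and phones that need new data."""
    known = shared.union(*specific.values())
    return new_inventory & known, new_inventory - known

shared, specific = partition_models(INVENTORIES)
print(sorted(shared))        # ['a', 'i', 'k', 'm', 'n', 'p', 's', 't', 'u', 'y']
print(specific["chinese"])   # {'tɕ', 'ʂ'} (set order may vary)

# Hypothetical new language (a German-like fragment): only 'ç' lacks a model.
reused, missing = coverage({"a", "i", "u", "p", "t", "k", "ʁ", "ç", "y"}, shared, specific)
print(missing)               # {'ç'}
```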

Knowledge representation

A formal language for representing the meaning of natural-language sentences.

UNL (Universal Networking Language) was introduced to describe natural-language semantics.

Language-independent content indexing for cross-language information retrieval.

Use of the conceptual hierarchy of UNL to address the inherent ambiguity of natural languages.

A set of semantic relations linking concepts together into structured information patterns.
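One way a conceptual hierarchy can help with ambiguity is sketched below. Each sense of an ambiguous word is attached to a parent concept through an "icl" (is-included-in) link. A sense is kept only if it falls under the concept required by the context. The hierarchy fragment, the two senses of "bank", and the required concept are all hypothetical and are not taken from the UNL dictionaries.

```python
# Hypothetical fragment of a UNL-style conceptual hierarchy: concept -> parent ("icl").
HIERARCHY = {
    "mammal": "animal",
    "animal": "living thing",
    "living thing": "thing",
    "beverage": "food",
    "food": "thing",
    "financial institution": "organization",
    "organization": "thing",
    "slope": "place",
    "place": "thing",
}

def ancestors(concept):
    """All concepts above `concept`, nearest first."""
    chain = []
    while concept in HIERARCHY:
        concept = HIERARCHY[concept]
        chain.append(concept)
    return chain

def is_a(concept, target):
    return concept == target or target in ancestors(concept)

def disambiguate(senses, required):
    """senses maps a UW to its immediate parent concept; keep the UWs compatible with `required`."""
    return [uw for uw, parent in senses.items() if is_a(parent, required)]

bank_senses = {
    "bank(icl>financial institution)": "financial institution",
    "bank(icl>slope)": "slope",
}
print(disambiguate(bank_senses, "organization"))  # ['bank(icl>financial institution)']
```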

UNL representation

The sentence “The cat drank the milk” can be encoded by:

agt(drink(icl>do,agt>thing, obj>liquid).@past.@entry,cat(icl>mammal>animal).@def)

obj(drink(icl>do,agt>thing, obj>liquid).@past.@entry,milk(icl>beverage>food).@def)

where agt (agent) and obj (object) are binary semantic relations.
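The relations above share a regular textual shape, relation(UW1,UW2). Each Universal Word (UW) has the form headword(constraints).@attr1.@attr2…. The parser below is a minimal sketch that reads that shape into a small graph structure. It covers only the forms shown on this slide (no compound UWs, scopes, or node identifiers), and the class and function names are hypothetical, not part of any UNL toolkit.

```python
from dataclasses import dataclass

@dataclass
class UW:
    headword: str      # e.g. "drink"
    constraints: str   # e.g. "icl>do,agt>thing, obj>liquid"
    attributes: list   # e.g. ["@past", "@entry"]

@dataclass
class Relation:
    name: str          # e.g. "agt" (agent) or "obj" (object)
    source: UW
    target: UW

def split_top_level(text, sep=","):
    """Split on `sep` only where it is not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i].strip())
            start = i + 1
    parts.append(text[start:].strip())
    return parts

def parse_uw(text):
    """Parse 'headword(constraints).@a.@b'; constraints and attributes are optional."""
    if "(" in text:
        head, rest = text.split("(", 1)
        close = rest.rindex(")")
        constraints, tail = rest[:close], rest[close + 1:]
    else:
        head, _, tail = text.partition(".")
        constraints = ""
    attributes = [a for a in tail.split(".") if a]
    return UW(head.strip(), constraints.strip(), attributes)

def parse_relation(line):
    """Parse 'rel(uw1,uw2)' into a Relation."""
    line = line.strip()
    name, args = line.split("(", 1)
    if not args.endswith(")"):
        raise ValueError(f"malformed UNL relation: {line!r}")
    source, target = split_top_level(args[:-1])
    return Relation(name.strip(), parse_uw(source), parse_uw(target))

graph = [parse_relation(l) for l in (
    "agt(drink(icl>do,agt>thing, obj>liquid).@past.@entry,cat(icl>mammal>animal).@def)",
    "obj(drink(icl>do,agt>thing, obj>liquid).@past.@entry,milk(icl>beverage>food).@def)",
)]
for rel in graph:
    print(rel.name, rel.source.headword, "->", rel.target.headword, rel.target.attributes)
# agt drink -> cat ['@def']
# obj drink -> milk ['@def']
```

Splitting only at top-level commas matters because the constraint lists inside a UW also contain commas.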

Role of semantic content representation in indexing

[Figure: cross-lingual multimedia platform. Inputs: digital audio/video and textual documents; the user's request goes through UNL encoding; user-specific information is returned through UNL decoding.]
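Indexing on semantic content can be sketched with an inverted index keyed by UWs instead of surface words. A request in one language can then retrieve documents or media annotations written in another. This sketch assumes the UNL encoding step has already been done. The document ids and UW sets are hypothetical stand-ins for enconverter output. Ranking by the number of shared UWs is a deliberately simple choice.

```python
from collections import defaultdict

# Hypothetical UW sets, standing in for the UNL encoding of each indexed item.
DOCS = {
    "video_12": {"stadium(icl>building)", "swimming(icl>sport)"},
    "text_fr_3": {"swimming(icl>sport)", "ticket(icl>document)"},
    "text_zh_7": {"ticket(icl>document)", "subway(icl>railway)"},
}

def build_index(docs):
    """Inverted index: UW -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, uws in docs.items():
        for uw in uws:
            index[uw].add(doc_id)
    return index

def search(index, query_uws):
    """Rank documents by the number of UWs they share with the encoded query."""
    scores = defaultdict(int)
    for uw in query_uws:
        for doc_id in index.get(uw, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))

index = build_index(DOCS)
# A request such as "swimming tickets", in any language, assumed encoded to the same UWs.
print(search(index, {"swimming(icl>sport)", "ticket(icl>document)"}))
# ['text_fr_3', 'text_zh_7', 'video_12']
```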

Application architecture

[Figure: application architecture with a UMTS server, speech synthesis, information access, and translation of a word graph plus a list of keywords]

Digital Olympic Multi-Language Information Network Service System Project

From VoiceXML to VoiceUNL and MultimediaUNL

Presented here by Gérard CHOLLET, ENST/CNRS-LTCI

http://www.tsi.enst.fr/~chollet

With the contribution of Christian BOITET, Mutsuko TOMOKIYO and Catherine PELACHAUD

Outline

• Rationale of the proposal
• Objectives
  – Promotion of a new standard, demonstrations
• Approaches
  – An extra layer on top of VoiceXML
• Need and relevance
  – Multilingual vocal servers
• Integration and structuring effect
• Conclusions and Perspectives

Rationale for VoiceUNL

• Need for language-independent vocal servers
• Need for a language-independent knowledge representation and management formalism
• Principle of the proposed solution:
  – start from UNL graphs augmented with voice-oriented semantic marks (special UWs, i.e. Universal Words, and attributes);
  – generate in the target language;
  – voice-oriented marks become prosodic markers;
  – final conversion to VoiceXML.
• 2008 Beijing Olympics as a field trial
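The last two steps of the principle, where voice-oriented marks become prosodic markers and the result is converted to VoiceXML, can be sketched as follows. The input is assumed to be the output of a hypothetical UNL generator: target-language words, each carrying the UNL attributes of the UW it realises. The only mapping shown is the existing @emphasis attribute to the SSML emphasis element, which VoiceXML 2.0 prompts accept. The choice of level="strong" is an assumption.

```python
import xml.etree.ElementTree as ET

def to_voicexml(tokens):
    """tokens: (word, attributes) pairs from a hypothetical UNL generator; returns a VoiceXML string."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"})
    block = ET.SubElement(ET.SubElement(vxml, "form"), "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = ""
    last = None  # most recent child element; plain words after it go into its .tail
    for word, attrs in tokens:
        if "@emphasis" in attrs:
            # Voice-oriented UNL mark -> SSML prosodic marker.
            last = ET.SubElement(prompt, "emphasis", {"level": "strong"})
            last.text = word
            last.tail = " "
        elif last is None:
            prompt.text += word + " "
        else:
            last.tail += word + " "
    return ET.tostring(vxml, encoding="unicode")

tokens = [("The", set()), ("cat", {"@emphasis"}), ("drank", set()), ("the", set()), ("milk", set())]
print(to_voicexml(tokens))
```

For the example tokens this prints a single-line document whose prompt reads `The <emphasis level="strong">cat</emphasis> drank the milk`. A fuller mapping would also cover @qfocus and any emotion or speaker attributes that get defined.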

What is VoiceXML?

• A recommendation of the W3C (World Wide Web Consortium)
• An XML-based markup language for vocal information servers
• A set of standardised markup tags; current tags concern language identification, voice prompting, speech synthesis, form filling, barge-in, echo cancelling,…
• No provision for accessing a semantically encoded database
• Need for a UNL-type front-end
• Compatibility with MPEG4-SNHC (talking head)

Applications

Prosodic information in UNL

• Attributes that can influence the grammatical and prosodic structure of a sentence already exist: @emphasis, @qfocus
• Representations still need to be defined for:
  – Emotion: @angry, @bored, @relaxed…?
  – Focus: grouping the words to emphasize in a scope?
  – Passivity: @passive?
  – Speaker: @age, @sex, special UWs for voice characteristics…?
  – Expression (for face and gesture animation): special UWs/constructs?


Demonstrations to be prepared within the LingTour, Normalangue and KNOWLISTICS projects

• The first target is the Beijing 2008 Olympics.
• A concept-oriented formalism (such as Sowa's conceptual graphs) may be used to store knowledge before building the "interlingual, prelinguistic, communicative content" in UNL.


• The UNL representation of the meaning of natural-language sentences is directly available for retrieval, indexing and knowledge extraction.
• Combining UNL with multimedia content (text, speech, image, video) and multimodal access (text, speech, visual I/O) enriches the communication service.
• A comprehensive information service on PDAs, with UMTS and wireless LAN access.
