Speech Translation on a PDA
By: Santan Challa
Instructor: Dr. Christel Kemke


Introduction
Based on the article "PDA Translates Speech" by Kimberley Patch [1].
A combined effort of researchers from CMU, Cepstral LLC, Multimodal Technologies Inc., and Mobile Technologies Inc.

What is the Aim?
Two-way translation of medical information from English to Arabic and from Arabic to English.
System used: iPaq handheld computer.

System
iPaq handheld computer with 64 MB of memory.
Requirements: two recognizers, translators, and synthesizers.

Different Phases

Automatic Speech Recognition (ASR)

Speech Translation

Speech Synthesis
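The three phases run one after another for each utterance. The sketch below illustrates only that chaining; `recognize`, `translate`, `synthesize`, and the phrase table are hypothetical stubs, not Speechalator's components.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for the three phases. A real system would call an
# ASR engine, an MT engine and a TTS engine here instead.
def recognize(audio: List[float]) -> str:
    """Stub ASR: pretends every input was recognized as one fixed phrase."""
    return "where does it hurt"

# Toy phrase table (hypothetical); the target side is a placeholder string.
PHRASES: Dict[str, str] = {"where does it hurt": "<ARABIC_TRANSLATION>"}

def translate(text: str) -> str:
    """Stub MT: looks the whole phrase up in the toy table."""
    return PHRASES.get(text, "<untranslated>")

def synthesize(text: str) -> List[str]:
    """Stub TTS: returns the units that would be spoken (here, one per token)."""
    return text.split()

def speech_to_speech(
    audio: List[float],
    asr: Callable[[List[float]], str],
    mt: Callable[[str], str],
    tts: Callable[[str], List[str]],
) -> Tuple[str, str, List[str]]:
    """Run ASR, then MT, then TTS, and return each intermediate result."""
    source_text = asr(audio)
    target_text = mt(source_text)
    return source_text, target_text, tts(target_text)

print(speech_to_speech([0.0] * 160, recognize, translate, synthesize))
# ('where does it hurt', '<ARABIC_TRANSLATION>', ['<ARABIC_TRANSLATION>'])
```

Passing each phase in as a function keeps them independent, so any one engine could be replaced without touching the others.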

Automatic Speech Recognition
ASR: technology that recognizes spoken words, for example to transcribe speech or to execute voice commands.
Steps in ASR: feature extraction, acoustic modeling, language modeling, pattern classification, utterance verification, and decision.

Figure: Speech Recognition Process [2]. Functions of a speech recognizer: feature extraction, pattern classification (using acoustic modeling and language modeling), utterance verification, and decision.

Feature Extraction
Features: attributes of a speaker's voice that enable a speech recognizer to distinguish the phonemes in each word [3].
Energy: a spectrogram gives a visual display of the energy at each frequency over time. The energy levels are decoded to extract the features, which are stored in a feature vector for further processing [3].
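A spectrogram is the energy in each frequency bin, computed frame by frame. The sketch below uses a naive DFT from the standard library and no window function (a rectangular window), both simplifications assumed for clarity; practical front ends use an FFT and a tapered window such as Hamming.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive DFT: magnitude of each frequency bin k = 0..N/2 for one frame."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(signal, frame_len, hop):
    """Energy (squared magnitude) per frequency bin, one row per frame over time."""
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        mags = dft_magnitudes(signal[start:start + frame_len])
        rows.append([m * m for m in mags])
    return rows

fs = 8000                                   # sampling rate in Hz
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(256)]  # 1 kHz test tone
spec = spectrogram(tone, frame_len=64, hop=64)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), peak_bin, peak_bin * fs / 64)  # 4 8 1000.0
```

Each bin spans fs / frame_len = 125 Hz here, so the energy of the 1 kHz tone lands in bin 8.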

Feature Extraction Cont’d
Speech signal -> microphone -> analog signal.
The analog signal is digitized so it can be stored in the computer. Digitization involves sampling (common sampling rates are 8,000 Hz to 16,000 Hz).
Features are extracted from the digitized speech, resulting in feature vectors (numerical measurements of speech attributes [3]).
The speech recognizer uses the feature vectors to decode the digitized speech signal.
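The steps above can be sketched as sampling a signal and turning each short frame into a small feature vector. This is a minimal illustration, assuming a 16 kHz rate, 25 ms frames with a 10 ms hop (a common choice, not from the source), and just two features per frame (log energy and zero-crossing rate); real recognizers typically use richer spectral features.

```python
import math

SAMPLE_RATE = 16000        # Hz; inside the 8,000-16,000 Hz range quoted above
FRAME_MS, HOP_MS = 25, 10  # assumed framing (a common choice, not from the source)

def sample_tone(freq_hz, num_samples, rate=SAMPLE_RATE):
    """Digitize a pure tone: take num_samples samples at rate samples per second."""
    return [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(num_samples)]

def feature_vectors(samples, rate=SAMPLE_RATE):
    """Split samples into overlapping frames; return [log_energy, zero_crossing_rate] per frame."""
    frame_len = rate * FRAME_MS // 1000   # 400 samples at 16 kHz
    hop = rate * HOP_MS // 1000           # 160 samples at 16 kHz
    vectors = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        log_energy = math.log(energy + 1e-10)  # small floor avoids log(0) on silence
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        vectors.append([log_energy, crossings / (frame_len - 1)])
    return vectors

vecs = feature_vectors(sample_tone(440, 1600))  # 1600 samples = 0.1 s of audio
print(len(vecs))  # 8
```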

Acoustic Modeling
A numerical representation of sounds (utterances of words in a language).
The speech features of the digitized signal are compared with the features of existing models.
Determining which sound was spoken is probabilistic by nature.
The Hidden Markov Model (HMM) is a statistical technique that forms the basis for developing acoustic models.

HMMs give the statistical likelihood of a particular sequence of words or phonemes [3].

HMMs are used both in training the acoustic models and in speech recognition.
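The likelihood an HMM assigns to an observation sequence is usually computed with the forward algorithm, which sums over every possible hidden-state path. The sketch below is a minimal version; the two-state left-to-right model, the quantized symbols "lo" and "hi", and all probabilities are invented for illustration.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Forward algorithm: P(observations | HMM), summed over every hidden state path."""
    # alpha[s] = P(first t observations, hidden state at time t is s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * trans_p[prev][s] for prev in states) * emit_p[s][obs]
            for s in states
        }
    return sum(alpha.values())

# Toy left-to-right model with two hidden states and two quantized acoustic
# symbols ("lo", "hi"); all probabilities are made up for illustration.
STATES = ("S1", "S2")
START = {"S1": 1.0, "S2": 0.0}
TRANS = {"S1": {"S1": 0.6, "S2": 0.4}, "S2": {"S1": 0.0, "S2": 1.0}}
EMIT = {"S1": {"lo": 0.8, "hi": 0.2}, "S2": {"lo": 0.3, "hi": 0.7}}

p = forward_likelihood(["lo", "hi"], STATES, START, TRANS, EMIT)
print(round(p, 4))  # 0.32
```

In recognition, each candidate word or phoneme model scores the same observations, and the higher-likelihood model is preferred; in training, the probabilities themselves are re-estimated from data.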

HMMs Cont’d
HMMs depend on the Markov chain: a sequence of random variables whose next value depends on the previous value(s) [3]. In a first-order Markov chain, the next value depends only on the current one.
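Under the first-order Markov assumption, the probability of a whole sequence factors into a start probability times one transition probability per step. A minimal sketch, assuming a made-up three-state chain over the phonemes of "cat":

```python
# Toy first-order Markov chain over three phoneme states for the word "cat";
# the probabilities are invented for illustration.
INITIAL = {"k": 0.5, "ae": 0.2, "t": 0.3}
TRANSITION = {
    "k":  {"k": 0.1, "ae": 0.7, "t": 0.2},
    "ae": {"k": 0.1, "ae": 0.3, "t": 0.6},
    "t":  {"k": 0.4, "ae": 0.4, "t": 0.2},
}

def sequence_probability(seq, initial=INITIAL, transition=TRANSITION):
    """P(x1..xT) = P(x1) * product of P(x_t | x_{t-1}); each step looks only at the previous state."""
    prob = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        prob *= transition[prev][cur]
    return prob

print(round(sequence_probability(["k", "ae", "t"]), 4))  # 0.21
```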

Other Speech Recognition Components
Pattern Classifier: the pattern classification component groups the patterns generated by the acoustic modeling component. Speech patterns with similar speech features are grouped together.
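Grouping patterns with similar features can be illustrated with a nearest-centroid classifier, a deliberately simple stand-in for the HMM-based scoring a real recognizer uses. The group names and the [log energy, zero-crossing rate] values below are hypothetical.

```python
import math

def centroid(vectors):
    """Mean feature vector of one group of patterns."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def classify(vector, groups):
    """Assign a feature vector to the group whose centroid is nearest (Euclidean distance)."""
    centroids = {label: centroid(members) for label, members in groups.items()}
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))

# Hypothetical groups of [log_energy, zero_crossing_rate] vectors:
# vowel-like frames are loud with few zero crossings, fricative-like frames the opposite.
GROUPS = {
    "vowel": [[5.0, 0.05], [5.2, 0.06]],
    "fricative": [[2.0, 0.40], [2.3, 0.45]],
}

print(classify([4.8, 0.07], GROUPS))  # vowel
```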

The correctness of the words generated by the pattern classifier is measured by the utterance verification component.
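One common way to verify an utterance is a likelihood-ratio test: the hypothesized word's score is compared against a competing background (filler) model, and the word is rejected if it does not win by a clear margin. The sketch below assumes per-frame normalization and a threshold of 0.5; both the threshold and the example scores are hypothetical.

```python
def verify_utterance(word_log_lik, background_log_lik, num_frames, threshold=0.5):
    """Likelihood-ratio test: accept the hypothesized word if its per-frame log-likelihood
    exceeds that of a competing background (filler) model by at least threshold."""
    score = (word_log_lik - background_log_lik) / num_frames
    return score >= threshold, score

print(verify_utterance(-1200.0, -1300.0, 100))  # (True, 1.0)
print(verify_utterance(-1290.0, -1300.0, 100))  # (False, 0.1)
```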

What the Speechalator Prototype [4] Uses
The prototype uses an HMM-based recognizer designed and developed by Multimodal Technologies Inc.

The speech recognizer needs 1 MB of memory, and the acoustic models occupy 3 MB.

Speech Translation


What is Machine Translation (MT)?
Translation of speech or text from one language to another with the help of software.
Types of MT: direct translation (word-to-word), transfer-based translation, and interlingua translation.

Why MT Is Difficult
Ambiguity: sentences and words can have more than one meaning (lexical ambiguity, structural ambiguity, semantic ambiguity).
Structural differences between languages.
Idioms cannot be translated literally.

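Approaches in Machine Translation
Figure: The Machine Translation Triangle (Vauquois Triangle). Analysis of the source language leads up to transfer or to an interlingua (IL), and synthesis leads down to the target language; direct translation maps the source language straight to the target language.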

Differences between the three translation architectures:
Direct translation: word-to-word translation.
Transfer-based translation: requires knowledge of both the source and the target language, so it suits bilingual translation. Its intermediate representations are language dependent. It parses the source-language sentence and applies transfer rules that map grammatical segments of the source language onto those of the target language.
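The difference between the first two architectures shows up even on a two-word phrase: English puts the adjective before the noun, while Arabic puts it after. The sketch below uses a hypothetical two-entry lexicon with transliterated Arabic ("bayt kabir", a big house) and a part-of-speech lookup standing in for a real parser.

```python
# Hypothetical toy lexicon: English word -> (transliterated Arabic word, part of speech).
LEXICON = {"big": ("kabir", "ADJ"), "house": ("bayt", "NOUN")}

def direct_translate(words):
    """Direct translation: replace each word and keep the source word order."""
    return [LEXICON[w][0] for w in words]

def transfer_translate(words):
    """Transfer-based: tag the words (stand-in for parsing), apply a reordering rule, substitute."""
    tagged = [(w, LEXICON[w][1]) for w in words]
    reordered, i = [], 0
    while i < len(tagged):
        # Transfer rule: English ADJ NOUN becomes Arabic NOUN ADJ.
        if i + 1 < len(tagged) and tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            reordered.extend([tagged[i + 1][0], tagged[i][0]])
            i += 2
        else:
            reordered.append(tagged[i][0])
            i += 1
    return [LEXICON[w][0] for w in reordered]

print(direct_translate(["big", "house"]))    # ['kabir', 'bayt'] (wrong order)
print(transfer_translate(["big", "house"]))  # ['bayt', 'kabir']
```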

Differences between the three translation architectures cont’d
Interlingua translation: generates a language-independent representation, called an interlingua (IL), of the meaning of sentences or sentence segments in the source language.

A text in the source language can therefore be converted into any target language, so this approach suits multilingual translation.
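In an interlingua design, analysis produces a meaning representation with no word order, and each target language needs only its own generator. A minimal sketch, assuming a hypothetical IL frame with a head concept and modifiers, toy lexicons, and transliterated Arabic:

```python
# Hypothetical per-language lexicons: IL concept -> word (Arabic is transliterated).
LEXICONS = {
    "english": {"HOUSE": "house", "BIG": "big"},
    "arabic": {"HOUSE": "bayt", "BIG": "kabir"},
}
# Per-language word-order rule: does the adjective come before the noun?
ADJ_FIRST = {"english": True, "arabic": False}

def analyze_english(words):
    """Toy analysis: English words -> language-independent IL frame."""
    to_concept = {word: concept for concept, word in LEXICONS["english"].items()}
    concepts = [to_concept[w] for w in words]
    # In this toy domain the last word is the head noun; earlier words modify it.
    return {"head": concepts[-1], "modifiers": concepts[:-1]}

def generate(il, language):
    """Toy synthesis: IL frame -> words in the requested target language."""
    lex = LEXICONS[language]
    mods = [lex[m] for m in il["modifiers"]]
    head = lex[il["head"]]
    return mods + [head] if ADJ_FIRST[language] else [head] + mods

il = analyze_english(["big", "house"])
print(il)                      # {'head': 'HOUSE', 'modifiers': ['BIG']}
print(generate(il, "arabic"))  # ['bayt', 'kabir']
```

Adding another target language here means adding one lexicon and one word-order entry, without touching the analysis step.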

More on Machine Translation
Knowledge-Based MT (KBMT): completely analyzes and understands the meaning of the source text [5], then translates it into target-language text.

Performance relies heavily on the amount of world knowledge available to analyze the source language.

Knowledge is represented in the form of frames, e.g. [Event: Murder; is-a: Crime].
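Frames of this kind are usually linked by is-a relations, so a specific frame inherits the slots of a more general one. The sketch below encodes the Murder/Crime example; the slot names and values ("punishable", "victim") are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Frame:
    """A frame: a named concept, an optional is-a parent, and slot/value pairs."""
    name: str
    is_a: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)

# Hypothetical knowledge base encoding the [Event: Murder; is-a: Crime] example.
FRAMES = {
    "Crime": Frame("Crime", slots={"punishable": "yes"}),
    "Murder": Frame("Murder", is_a="Crime", slots={"victim": "person"}),
}

def lookup(frame_name: str, slot: str) -> Optional[str]:
    """Return a slot value, inheriting it along the is-a chain if the frame lacks it."""
    frame = FRAMES.get(frame_name)
    while frame is not None:
        if slot in frame.slots:
            return frame.slots[slot]
        frame = FRAMES.get(frame.is_a) if frame.is_a else None
    return None

def inherits_from(frame_name: str, ancestor: str) -> bool:
    """True if the frame is the ancestor or reaches it by following is-a links."""
    frame = FRAMES.get(frame_name)
    while frame is not None:
        if frame.name == ancestor:
            return True
        frame = FRAMES.get(frame.is_a) if frame.is_a else None
    return False

print(lookup("Murder", "punishable"))  # yes (inherited from Crime)
```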

Machine Translation Cont’d
Example-Based MT (EBMT): sentences are analyzed on the basis of similar example sentences that were analyzed previously.
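The retrieval step of EBMT can be sketched as finding the stored example most similar to the input. This minimal version assumes word-set (Jaccard) overlap as the similarity measure and a hypothetical example base whose target sides are placeholders; a full EBMT system would also adapt the retrieved translation to the differences between the sentences.

```python
def similarity(a, b):
    """Jaccard overlap between the word sets of two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

# Hypothetical example base: source sentences paired with stored translations
# (target sides are placeholders).
EXAMPLES = [
    ("where does it hurt", "<stored translation 1>"),
    ("do you have a fever", "<stored translation 2>"),
    ("take this medicine twice a day", "<stored translation 3>"),
]

def closest_example(sentence):
    """Return the (source, translation) pair most similar to the input sentence."""
    return max(EXAMPLES, key=lambda ex: similarity(sentence, ex[0]))

print(closest_example("does your head hurt")[0])  # where does it hurt
```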

What Does the Speechalator Prototype Use?
Statistical-Based MT (SBMT) [5]: uses corpora that have been analyzed previously. No linguistic information is required. N-gram modeling is used.
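N-gram modeling lets a statistical system prefer fluent word orders in the target language. The sketch below trains a bigram model with add-one smoothing on a hypothetical three-sentence corpus and scores two candidate outputs; it shows only the language-model part, which a statistical MT system combines with a translation model.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(tokens, tokens[1:])
    )

# Hypothetical toy corpus of target-language sentences.
CORPUS = ["the patient has a fever", "the patient has a headache", "the doctor has a question"]
uni, bi = train_bigram(CORPUS)
fluent = log_prob("the patient has a fever", uni, bi)
scrambled = log_prob("patient the a has fever", uni, bi)
print(fluent > scrambled)  # True
```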

Speech Synthesis


Speech synthesis is the generation of a human-like voice from a given text or phonetic description [6]. Systems that do this from text are called Text-to-Speech (TTS) systems.
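A TTS system first turns text into a phonetic description, which a back end then renders as audio. The sketch below covers only that front end: text normalization (expanding digits) and dictionary lookup of pronunciations. The lexicon is a hypothetical toy using ARPAbet-style symbols, and the letter-by-letter fallback for unknown words is a crude placeholder for a real letter-to-sound model.

```python
# Hypothetical toy pronunciation lexicon (ARPAbet-style symbols).
LEXICON = {
    "take": ["T", "EY", "K"],
    "two": ["T", "UW"],
    "pills": ["P", "IH", "L", "Z"],
}
NUMBERS = {"2": "two"}  # toy number expansion table

def normalize(text):
    """Text normalization: lowercase and expand digits into words."""
    return [NUMBERS.get(tok, tok) for tok in text.lower().split()]

def to_phonemes(text):
    """Lexicon lookup per word; unknown words fall back to letter-by-letter spelling."""
    phones = []
    for word in normalize(text):
        phones.extend(LEXICON.get(word, list(word.upper())))
    return phones

print(to_phonemes("Take 2 pills"))
# ['T', 'EY', 'K', 'T', 'UW', 'P', 'IH', 'L', 'Z']
```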

Figure: Snapshot of Speechalator

Conclusions
Speechalator is a good achievement in both mobile technology and NLP.
It has a simple push-to-talk button interface.
It uses optimized speech recognizers and speech synthesizers.
Its architecture allows components to be placed both on the device and on a server; presently, most of the components have been ported to the device.
Performance:
80% accuracy
Translation takes 2 to 3 seconds
Presently restricted to a domain…

Future Work

Increase the accuracy of the device in noisy environments.
Build more learning algorithms.
Build a multilingual speech recognizer.
Achieve domain independence.

References
1. Kimberley Patch. PDA Translates Speech. Technology Research News (TRN), December 17/24, 2003.
2. Richard V. Cox, Lawrence R. Rabiner, Candace A. Kamm. Speech and Language Processing for Next-Millennium Communication Services. Proceedings of the IEEE, 88(8):1314-1337, Feb 2000.
3. ASR Home Page. http://www.isip.msstate.edu/projects/speech/
4. Speechalator: Two-Way Speech-to-Speech Translation on a Consumer PDA. Eurospeech 2003, Geneva, Switzerland, pp. 1-4.
5. Joseph Seaseley. Machine Translation: A Survey of Approaches. University of Michigan, Ann Arbor.
6. Thierry Dutoit. A Short Introduction to Text-to-Speech Synthesis (TTS). http://tcts.fpms.ac.be/synthesis/introtts.html
