Speech Translation on a PDA By: Santan Challa Instructor Dr. Christel Kemke
Transcript
Page 1:

Speech Translation on a PDA
By: Santan Challa
Instructor: Dr. Christel Kemke

Page 2:

Introduction
Based on the article “PDA Translates Speech” by Kimberley Patch [1].
A combined effort of researchers from CMU, Cepstral LLC, Multimodal Technologies Inc., and Mobile Technologies Inc.

What is the aim?
Two-way translation of medical information: English to Arabic and Arabic to English.
System used: iPaq handheld computer.

Page 3:

System
  iPaq handheld computer
  64 MB memory

Requirements
  Two recognizers
  Translators
  Synthesizers

Page 4:

Different Phases

Automatic Speech Recognition (ASR)

Speech Translation

Speech Synthesis

Page 5:

Automatic Speech Recognition
ASR: technology that recognizes and executes voice commands.

Steps in ASR
  Feature extraction
  Acoustic modeling
  Language modeling
  Pattern classification
  Utterance verification
  Decision

Page 6:

Speech Recognition Process [2]
[Figure: functions of a speech recognizer. Blocks: Feature Extraction, Pattern Classification, Acoustic Modeling, Language Modeling, Utterance Verification, Decision.]

Page 7:

Feature Extraction
Features: attributes of a speaker's voice that enable a speech recognizer to distinguish the phonemes in each word [3].

Energy:

Page 8:

Visual Display of Frequencies
Spectrogram: the energy levels are decoded to extract the features, which are stored in a feature vector for further processing [3].
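
A spectrogram can be read as a sequence of short-time magnitude spectra, one per frame of the signal. The following is a minimal sketch in plain Python, assuming a synthetic 440 Hz tone in place of recorded speech. The function names, the 8000 Hz sampling rate, and the frame and hop sizes are illustrative assumptions, not details from the presentation.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for frequency bins 0..N/2 of one frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        mags.append(abs(total))
    return mags

def spectrogram(samples, frame_len, hop):
    """One magnitude spectrum per frame: rows are time, columns are frequency bins."""
    return [magnitude_spectrum(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]

SAMPLE_RATE = 8000  # Hz (assumed)
FRAME_LEN = 200     # samples per frame (assumed)

# Synthetic stand-in for digitized speech: 0.1 s of a 440 Hz tone.
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(800)]
spec = spectrogram(tone, frame_len=FRAME_LEN, hop=FRAME_LEN)

# Convert the strongest bin of the first frame back to a frequency in Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
print(len(spec), peak_hz)
```

Each row of `spec` is a candidate feature vector for one frame. Real recognizers typically compress such spectra further before modeling.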

Page 9:

Feature Extraction
Speech signal -> microphone -> analog signal.
The analog signal is digitized so that it can be stored in the computer.
Digitization involves sampling (common sampling rates: 8,000 Hz to 16,000 Hz).
Features are extracted from the digitized speech.
The result is a feature vector (numerical measurements of speech attributes [3]).
The speech recognizer uses the feature vectors to decode the digitized speech signal.
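
The path from sampling to a feature vector can be sketched as follows. This is a simplified illustration, assuming a synthetic tone instead of microphone input and a single log-energy value per frame. The function names and the frame and hop sizes are hypothetical choices, and real recognizers use richer features per frame.

```python
import math

def digitize_tone(freq_hz, sample_rate_hz, duration_ms):
    """Sample a pure tone; stands in for digitizing the analog microphone signal."""
    n = sample_rate_hz * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate_hz) for t in range(n)]

def log_energy_features(samples, frame_len, hop):
    """Split the signal into overlapping frames and return the log energy of each."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        features.append(math.log(energy + 1e-10))  # small offset avoids log(0)
    return features

samples = digitize_tone(440, 8000, 100)  # 100 ms sampled at 8000 Hz
features = log_energy_features(samples, frame_len=200, hop=80)
print(len(samples), len(features))
```

Doubling the sampling rate to 16,000 Hz doubles the number of samples for the same duration, which is the storage and compute trade-off behind the choice of rate.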

Page 10:

Acoustic Modeling
Numerical representation of sounds (utterances of words in a language).
Speech features of the digitized speech signal are compared with the features of existing models.
Determining the sound is probabilistic by nature.
The Hidden Markov Model (HMM) is a statistical technique that forms the basis for the development of acoustic models.
HMMs give the statistical likelihood of a particular sequence of words or phonemes [3].
HMMs are used in both speech training and speech recognition.
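
One standard way to obtain such a likelihood from an HMM is the forward algorithm. The sketch below assumes a toy two-state model with discrete observation symbols. The state names, symbols, and probabilities are all hypothetical, and nothing here reflects the actual models used in the Speechalator.

```python
def forward_likelihood(obs, states, start_p, trans_p, emit_p):
    """Return P(obs) under the HMM by summing over all hidden state paths."""
    # Initialization: probability of starting in each state and emitting obs[0].
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    # Induction: extend every partial path by one observation at a time.
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical toy model: two hidden states, two observable symbols.
STATES = ["s1", "s2"]
START_P = {"s1": 0.6, "s2": 0.4}
TRANS_P = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
EMIT_P = {"s1": {"a": 0.9, "b": 0.1}, "s2": {"a": 0.2, "b": 0.8}}

likelihood = forward_likelihood(["a", "a"], STATES, START_P, TRANS_P, EMIT_P)
print(likelihood)
```

In recognition, a likelihood of this kind would be computed for each candidate model and compared. In training, the model probabilities are adjusted so that the training utterances become more likely.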

Page 11:

HMMs Cont’d
HMMs depend on the Markov chain: a sequence of random variables whose next values depend on the previous values [3] (represented as a diagram on the slide).
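
As a small illustration of the Markov chain idea, the sketch below assumes a first-order chain, in which each value depends only on the one immediately before it. The two states and all probabilities are hypothetical.

```python
def sequence_probability(sequence, start_p, trans_p):
    """P(x1, ..., xn) for a first-order Markov chain."""
    prob = start_p[sequence[0]]
    # Multiply in one transition probability per consecutive pair of states.
    for prev, cur in zip(sequence, sequence[1:]):
        prob *= trans_p[prev][cur]
    return prob

# Hypothetical chain over two sound classes.
START_P = {"vowel": 0.4, "consonant": 0.6}
TRANS_P = {
    "vowel": {"vowel": 0.2, "consonant": 0.8},
    "consonant": {"vowel": 0.7, "consonant": 0.3},
}

p = sequence_probability(["consonant", "vowel", "consonant"], START_P, TRANS_P)
print(p)
```

In an HMM the states of such a chain are hidden, and only the feature vectors emitted from them are observed.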

Page 12:

Other Speech Recognition Components
Pattern classifier: the pattern classification component groups the patterns generated by the acoustic modeling component. Speech patterns with similar speech features are grouped together.
Utterance verification: this component measures the correctness of the words generated by the pattern classifier.
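
The interplay of these two components can be sketched with a deliberately simple stand-in technique: nearest-template classification followed by a distance threshold. The templates, the threshold, and the function names are hypothetical, and a real recognizer would use model likelihoods rather than Euclidean distance.

```python
import math

# Hypothetical reference feature vectors, one per vocabulary word.
TEMPLATES = {
    "yes": [1.0, 0.2],
    "no": [0.1, 0.9],
}

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(features, templates):
    """Pattern classification: pick the word whose template is closest."""
    best = min(templates, key=lambda word: euclidean(features, templates[word]))
    return best, euclidean(features, templates[best])

def verify(distance, threshold=0.5):
    """Utterance verification: accept only if the match is close enough."""
    return distance <= threshold

word, dist = classify([0.9, 0.3], TEMPLATES)
accepted = verify(dist)
print(word, accepted)
```

An input far from every template still receives a label from `classify`, which is why a separate verification step is useful for rejecting it.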

What the Speechalator prototype [4] uses
The prototype uses an HMM-based recognizer designed and developed by Multimodal Technologies Inc.
The speech recognizer needs 1 MB of memory, and the acoustic models occupy 3 MB of memory.

Page 13:

Speech Translation

Page 14:

Speech Translation
What is Machine Translation (MT)?
Translation of speech from one language to another with the help of software.

Types of MT:
  Direct translation (word-to-word)
  Transfer-based translation
  Interlingua translation

Page 15:

Why MT is difficult
Ambiguity: sentences and words can have different meanings. Types: lexical ambiguity, structural ambiguity, semantic ambiguity.
Structural differences between languages.
Idioms cannot be translated literally.

Page 16:

Approaches in Machine Translation
[Figure: Machine Translation Triangle, or Vauquois Triangle. Labels: Source Language, Target Language, Analysis, Synthesis, Direct Translation, Transfer, IL.]

Page 17:

Differences between the three translation architectures:
Direct translation: word-to-word translation.
Transfer-based translation:
  Requires knowledge of both the source and the target language.
  Suited to bilingual translation.
  Intermediate representations are language dependent.
  Parses the source language sentence and applies transfer rules that map grammatical segments of the source language to the target language.
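
The difference between the first two architectures can be illustrated with a toy example. The sketch below assumes an English-to-Spanish pair (chosen only for illustration, not the English-Arabic pair of the project), a three-word lexicon, hand-written part-of-speech tags, and a single hypothetical transfer rule that swaps adjective-noun order.

```python
# Toy resources; all entries are assumptions made for this illustration.
LEXICON = {"the": "el", "red": "rojo", "car": "coche"}
POS = {"the": "DET", "red": "ADJ", "car": "NOUN"}

def direct_translate(sentence):
    """Direct translation: replace each word, keep the source word order."""
    return " ".join(LEXICON[word] for word in sentence.split())

def transfer_translate(sentence):
    """Transfer-based translation: analyze, apply a structural rule, then generate."""
    tagged = [(word, POS[word]) for word in sentence.split()]  # analysis
    reordered = []
    i = 0
    while i < len(tagged):
        # Transfer rule: ADJ NOUN in the source becomes NOUN ADJ in the target.
        if i + 1 < len(tagged) and tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            reordered.extend([tagged[i + 1], tagged[i]])
            i += 2
        else:
            reordered.append(tagged[i])
            i += 1
    return " ".join(LEXICON[word] for word, _ in reordered)  # generation

print(direct_translate("the red car"))
print(transfer_translate("the red car"))
```

The direct version keeps the English word order, while the transfer version produces the order expected in the target language. The rule is specific to this language pair, which is why the approach suits bilingual systems.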

Page 18:

Differences between the three translation architectures cont’d
Interlingual translation:
  Generates a language-independent representation, called interlingua (IL), for the meaning of sentences or segments of sentences in the source language.
  A text in the source language can be converted into any target language, so this approach suits multilingual translation.
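
A rough sketch of the interlingua idea: the source sentence is analyzed once into a language-neutral structure, and each target language needs only its own generator. The frame layout, the concept names, and the tiny lookup tables below are hypothetical, and a real system would build the representation through parsing rather than table lookup.

```python
def analyze_english(sentence):
    """Analysis: map a known English sentence to a language-independent frame."""
    known = {
        "i have pain": {"predicate": "HAVE", "experiencer": "SPEAKER", "theme": "PAIN"},
    }
    return known[sentence.lower()]

# One realization table per target language (toy data).
REALIZATIONS = {
    "es": {("HAVE", "SPEAKER", "PAIN"): "tengo dolor"},
    "de": {("HAVE", "SPEAKER", "PAIN"): "ich habe Schmerzen"},
}

def generate(interlingua, language):
    """Synthesis: realize the frame in the requested target language."""
    key = (interlingua["predicate"], interlingua["experiencer"], interlingua["theme"])
    return REALIZATIONS[language][key]

il = analyze_english("I have pain")
print(generate(il, "es"))
print(generate(il, "de"))
```

Adding another target language means adding one more generator. The analysis step is left unchanged.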

Page 19:

More on Machine Translation
Knowledge-Based MT (KBMT):
  Completely analyzes and understands the meaning of the source text [5].
  Translates it into target language text.
  Performance relies heavily on the amount of world knowledge available to analyze the source language.
  Knowledge is represented in the form of frames, e.g. [Event: Murder, is-a: Crime].
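
A frame of this kind can be sketched as a dictionary with an is-a link to a more general frame. Only the Murder/Crime relation comes from the slide. The other frames, the `illegal` slot, and the lookup function are hypothetical additions for illustration.

```python
FRAMES = {
    "Murder": {"is_a": "Crime"},                  # relation given on the slide
    "Crime": {"is_a": "Event", "illegal": True},  # hypothetical parent frame
    "Event": {},                                  # hypothetical root frame
}

def get_slot(frame_name, slot, frames=FRAMES):
    """Look up a slot, following is_a links when the frame itself lacks it."""
    current = frame_name
    while current is not None:
        frame = frames[current]
        if slot in frame:
            return frame[slot]
        current = frame.get("is_a")  # climb to the more general frame
    return None

print(get_slot("Murder", "illegal"))
```

Inheritance along is-a links lets the system infer facts about a specific concept from knowledge stored once on a general one.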

Page 20:

Machine Translation Cont’d
Example-Based MT (EBMT):
  Sentences are analyzed on the basis of similar example sentences analyzed previously.
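
The retrieval step of an example-based system can be sketched as a similarity search over stored sentence pairs. The sketch assumes a two-entry English-to-Spanish example base and word-overlap (Jaccard) similarity. Both are hypothetical choices, and a full EBMT system would go on to adapt the retrieved translation.

```python
# Hypothetical example base of previously translated sentences.
EXAMPLES = [
    ("i have a headache", "tengo dolor de cabeza"),
    ("where is the hospital", "dónde está el hospital"),
]

def similarity(a, b):
    """Jaccard overlap between the word sets of two sentences."""
    words_a, words_b = set(a.split()), set(b.split())
    return len(words_a & words_b) / len(words_a | words_b)

def closest_example(sentence, examples):
    """Return the stored (source, target) pair most similar to the input."""
    return max(examples, key=lambda pair: similarity(sentence, pair[0]))

source, target = closest_example("i have a fever", EXAMPLES)
print(source, "->", target)
```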

What does the Speechalator prototype use? Statistical MT (SBMT) [5]:
  Uses corpora that have been analyzed previously.
  No linguistic information is required.
  N-gram modeling is used.
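
N-gram modeling can be illustrated with a bigram model estimated from a small corpus. The sketch below assumes a three-sentence toy corpus and unsmoothed maximum-likelihood estimates. It shows only the language-model side of statistical MT and is not the Speechalator's actual model.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count context words and bigrams, with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # consecutive token pairs
    return contexts, bigrams

def sentence_probability(sentence, contexts, bigrams):
    """Product of P(word | previous word) over the sentence (no smoothing)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if contexts[prev] == 0:
            return 0.0  # unseen context word
        prob *= bigrams[(prev, cur)] / contexts[prev]
    return prob

CORPUS = ["i have pain", "i have fever", "you have pain"]  # toy corpus
contexts, bigrams = train_bigrams(CORPUS)

print(sentence_probability("i have pain", contexts, bigrams))
print(sentence_probability("you have fever", contexts, bigrams))
```

The second sentence never occurs in the corpus yet still receives a nonzero probability, because every bigram in it was seen. This is how n-gram statistics generalize without linguistic rules.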

Page 21:

Speech Synthesis

Page 22:

Speech Synthesis
Generation of a human voice from a given text or phonetic description [6].
Text-to-speech (TTS) systems.

Page 23:

Snapshot of Speechalator

Page 24:

Conclusions
Speechalator is a good achievement in both mobile technology and NLP.
Simple push-to-talk button interface.
Uses optimized speech recognizers and speech synthesizers.
The architecture allows components to be placed both on-device and on a server.
Presently, most of the components are ported to the device.
Performance:
  80% accuracy
  Takes 2-3 seconds for translation
Presently restricted to a domain…

Page 25:

Future Work
Increase the accuracy of the device in noisy environments.
Build more learning algorithms.
Multilingual speech recognizer.
Achieve domain independence.

Page 26:

References
1. Kimberley Patch. PDA Translates Speech. Technology and Research News (TRN), 17/24 December, 2003.
2. Richard V. Cox, Lawrence R. Rabiner, Candace A. Kamm. Speech and Language Processing for Next-Millennium Communication Services. Proceedings of the IEEE, 88(8):1314-1337, Feb 2000.
3. ASR Home page. http://www.isip.msstate.edu/projects/speech/
4. Speechalator: Two-Way Speech-To-Speech Translation on a Consumer PDA. Eurospeech 2003, Geneva, Switzerland, pages 1-4.
5. Joseph Seaseley. Machine Translation: A Survey of Approaches. University of Michigan, Ann Arbor.
6. Thierry Dutoit. A Short Introduction to Text-to-Speech Synthesis (TTS). http://tcts.fpms.ac.be/synthesis/introtts.html