THE ELECTRICAL ENGINEERING AND APPLIED SIGNAL PROCESSING SERIES
Edited by Alexander Poularikas
The Advanced Signal Processing Handbook: Theory and Implementation for Radar, Sonar, and Medical Imaging Real-Time Systems
Stergios Stergiopoulos
The Transform and Data Compression Handbook
K.R. Rao and P.C. Yip
Handbook of Multisensor Data Fusion
David Hall and James Llinas
Handbook of Neural Network Signal Processing
Yu Hen Hu and Jenq-Neng Hwang
Handbook of Antennas in Wireless Communications
Lal Chand Godara
Noise Reduction in Speech Applications
Gillian M. Davis
Signal Processing Noise
Vyacheslav P. Tuzlukov
Digital Signal Processing with Examples in MATLAB®
Samuel Stearns
Applications in Time-Frequency Signal Processing
Antonia Papandreou-Suppappola
The Digital Color Imaging Handbook
Gaurav Sharma
Pattern Recognition in Speech and Language Processing
Wu Chou and Biing-Hwang Juang
Forthcoming Titles
Propagation Data Handbook for Wireless Communication System Design
Robert Crane
Smart Antennas
Lal Chand Godara
Nonlinear Signal and Image Processing: Theory, Methods, and Applications
Kenneth Barner and Gonzalo R. Arce
Forthcoming Titles (continued)
Soft Computing with MATLAB®
Ali Zilouchian
Signal and Image Processing Navigational Systems
Vyacheslav P. Tuzlukov
Wireless Internet: Technologies and Applications
Apostolis K. Salkintzis and Alexander Poularikas
CRC PRESS
Boca Raton London New York Washington, D.C.
PATTERN RECOGNITION in SPEECH and LANGUAGE PROCESSING

Edited by
WU CHOU
Avaya Labs Research
BIING HWANG JUANG
Georgia Institute of Technology
Preface
Basking Ridge, New Jersey
September, 2002
Contributors
A. Abella
James Allan
T. Alonso
Jerome R. Bellegarda
William Byrne
Wu Chou
Sadaoki Furui
Jean-Luc Gauvain
Vaibhava Goel
Allen L. Gorin
Qiang Huo
Biing-Hwang Juang
Shigeru Katagiri
Lori Lamel
Qi (Peter) Li
John Makhoul
Hermann Ney
F. J. Och
G. Riccardi
Richard M. Schwartz
J. H. Wright
Contents
1 Minimum Classification Error (MCE) Approach in Pattern Recognition
Wu Chou
2 Minimum Bayes-Risk Methods in Automatic Speech Recognition
Vaibhava Goel and William Byrne
3 A Decision Theoretic Formulation for Robust Automatic Speech Recognition
Qiang Huo
4 Speech Pattern Recognition using Neural Networks
Shigeru Katagiri
5 Large Vocabulary Speech Recognition Based on Statistical Methods
Jean-Luc Gauvain and Lori Lamel
6 Toward Spontaneous Speech Recognition and Understanding
Sadaoki Furui
7 Speaker Authentication
Qi Li and Biing-Hwang Juang
8 HMMs for Language Processing Problems
Richard M. Schwartz and John Makhoul
9 Statistical Language Models With Embedded Latent Semantic Knowledge
Jerome R. Bellegarda
10 Semantic Information Processing of Spoken Language – How May I Help You?sm
A. L. Gorin, A. Abella, T. Alonso, G. Riccardi, and J. H. Wright
11 Machine Translation Using Statistical Modeling
Hermann Ney and F. J. Och
12 Modeling Topics for Detection and Tracking
James Allan
1
Minimum Classification Error (MCE) Approach in Pattern Recognition
FIGURE 2.1
An example lattice. The time marks correspond to the node times and the word ending times. The numbers on the edges are logarithms of conditional joint probabilities as described in the text. The partial path log-probability of a partial hypothesis is the log of the probability of its path. The lattice backward log-probability of a partial hypothesis is the log of the sum of probabilities of all lattice paths from the end node of that hypothesis to the lattice end node; for the partial path (‘HELLO’, ‘0.6’) in this lattice, these paths are indicated by dotted lines. The lattice total probability of a partial path is the exponentiated sum of its partial path log-probability and lattice backward log-probability.
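The lattice backward log-probability defined in this caption can be computed with a standard backward recursion over the lattice. The sketch below illustrates this on a hypothetical two-slot lattice; the words, probabilities, and node numbering are invented for illustration and are not those of Figure 2.1:

```python
import math

# A lattice as a list of edges (start_node, end_node, word, log_prob).
# Nodes are numbered in time order. Toy data, not the Figure 2.1 lattice.
edges = [
    (0, 1, "HELLO", math.log(0.6)),
    (0, 1, "YELLOW", math.log(0.4)),
    (1, 2, "WORLD", math.log(0.7)),
    (1, 2, "WORD", math.log(0.3)),
]

def backward_log_probs(edges, end_node):
    """For each node, the log of the sum of probabilities of all lattice
    paths from that node to the lattice end node."""
    beta = {end_node: 0.0}  # log 1 at the end node
    # Visit edges in reverse topological order of their start nodes.
    for start, end, _, logp in sorted(edges, key=lambda e: -e[0]):
        if end in beta:
            contrib = logp + beta[end]
            beta[start] = (contrib if start not in beta else
                           math.log(math.exp(beta[start]) + math.exp(contrib)))
    return beta

beta = backward_log_probs(edges, end_node=2)  # beta[1] = log(0.7 + 0.3)
```

Adding a partial path's log-probability to the backward value at its end node and exponentiating yields the lattice total probability of that partial path.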
2.2.3.2 A* Search Under General Loss Functions
2.2.3.3 Single Stack Search Under Levenshtein Loss Function
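The loss function underlying this search is the Levenshtein (string edit) distance between word sequences. For reference, the standard dynamic program, which is not specific to this chapter:

```python
def levenshtein(ref, hyp):
    """Minimum number of word substitutions, insertions, and deletions
    needed to turn hyp into ref (standard dynamic program)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[m][n]
```

Word error rate is this distance normalized by the reference length.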
2.2.3.4 Prefix Tree Search Under Levenshtein Loss Function
2.2.3.5 Pruning and Multistack Organization of the Prefix Tree Search
2.2.3.6 Loss Functions Other than Levenshtein Distance
2.3 Segmental MBR Procedures
2.3.1 Segmental Voting
2.3.2 ROVER
FIGURE 2.2
An example word transition network.
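ROVER combines the outputs of several recognizers by aligning them into a word transition network such as the one in Figure 2.2 and then voting within each correspondence set. The sketch below shows only the voting step, assuming the hypotheses have already been aligned; real ROVER builds the alignment itself by iterative dynamic programming and can also weight votes by word confidence:

```python
from collections import Counter

def rover_vote(aligned_hypotheses):
    """Majority vote over pre-aligned hypotheses. Each hypothesis is a
    list with one entry per correspondence set; '@' marks a null arc
    (no word from that recognizer in that slot)."""
    output = []
    for slot in zip(*aligned_hypotheses):
        word, _ = Counter(slot).most_common(1)[0]
        if word != "@":
            output.append(word)
    return output

# Three hypothetical recognizer outputs, already aligned slot by slot.
hyps = [
    ["the", "cat", "sat", "@"],
    ["the", "cat", "sat", "down"],
    ["a",   "cat", "sat", "down"],
]
result = rover_vote(hyps)  # ["the", "cat", "sat", "down"]
```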
2.3.3 e-ROVER
FIGURE 2.3
Joining two correspondence sets.
2.4 Experimental Results
2.4.1 Parameter Tuning within the MBR Classification Rule
The MBR classification rule contains several free parameters that must be tuned: the word insertion penalty, the language model scale factor, and the likelihood scale factor.
TABLE 2.1
2.4.1.1 Optimization of Likelihood Parameters
Both supervised optimization and unsupervised optimization of the likelihood parameters were investigated.
2.4.2 Utterance Level MBR Word and Keyword Recognition
abilities, bartenders, calculation, databases
a, and, the, besides, collaboration, distribution
2.4.2.1 Likelihood Scale Factor Tuning
2.4.2.2 N-best List Rescoring and A* Search
TABLE 2.2
2.4.3 ROVER and e-ROVER for Multilingual ASR
FIGURE 2.4
Top panel shows the ratio of the total number of e-ROVER correspondence sets to that of ROVER correspondence sets, as a function of the pinching threshold. Bottom panel shows the WER performance of e-ROVER for these thresholds.
FIGURE 4.1
Architecture of shift-tolerant LVQ classifier [20].
4.3.2.3 LVQ/HMM Hybrid Classifier
FIGURE 4.2
Block diagram of LVQ/HMM hybrid classifier.
4.3.2.4 HMM/LVQ Hybrid Classifier
FIGURE 4.3
Block diagram of HMM/LVQ hybrid classifier.
4.3.3 Squared Error Minimization
4.3.3.1 Training Using the Squared Error Loss
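In outline, training under the squared error loss adjusts the classifier parameters by gradient descent on the summed squared difference between the network outputs and the class targets. A minimal single-layer sigmoid sketch, illustrative only; the chapter's networks and update rules are more elaborate:

```python
import math
import random

def train_squared_error(samples, n_in, n_out, lr=0.5, epochs=2000):
    """Gradient descent on E = 1/2 * sum_k (y_k - t_k)^2 for a
    single-layer sigmoid network (illustrative sketch)."""
    random.seed(0)
    w = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
         for _ in range(n_out)]
    def forward(x):
        x = x + [1.0]  # append bias input
        return [1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(row, x))))
                for row in w]
    for _ in range(epochs):
        for x, t in samples:
            y = forward(x)
            xb = x + [1.0]
            for k in range(n_out):
                # dE/dw = (y - t) * y * (1 - y) * x  (sigmoid derivative)
                delta = (y[k] - t[k]) * y[k] * (1.0 - y[k])
                for i in range(n_in + 1):
                    w[k][i] -= lr * delta * xb[i]
    return forward

# Two 1-D classes with one-hot targets: class 0 near -1, class 1 near +1.
samples = [([-1.0], [1.0, 0.0]), ([1.0], [0.0, 1.0]),
           ([-0.8], [1.0, 0.0]), ([0.9], [0.0, 1.0])]
classify = train_squared_error(samples, n_in=1, n_out=2)
```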
FIGURE 4.4
Architecture of time-delay neural network [27].
4.3.3.2 Time-delay Neural Network
FIGURE 4.5
Schematic description of distance classifier as a single intermediate layer network (2-dimensional input, 3 references/class, 3 classes).
4.3.3.3 Multi-state Time-delay Neural Network
4.3.4 Cross Entropy Minimization
4.3.4.1 Training Using the Cross Entropy Loss
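Training under the cross entropy loss instead measures the mismatch between the target class distribution and the network's output distribution. A minimal sketch of the loss itself:

```python
import math

def cross_entropy(y, t):
    """Cross entropy between target distribution t and output
    distribution y: -sum_k t_k log y_k (terms with t_k = 0 contribute 0)."""
    return -sum(tk * math.log(yk) for yk, tk in zip(y, t) if tk > 0)
```

The loss is minimized (at zero for one-hot targets) exactly when the output distribution matches the target, which is why minimizing it drives the network outputs toward estimates of the class posterior probabilities.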
4.3.4.2 Unidirectional Network Classifier
4.3.4.3 Bidirectional Network Classifier
FIGURE 4.6
Architecture of unidirectional network [23] (ut: input vector, st: state vector, yt: output vector; the state vector feeds back through a time delay).
4.4 Fusion of Multiple Classification Decisions
4.4.1 Principles
FIGURE 4.7
Architecture of bi-directional network [25].
FIGURE 4.8
Typical classifier design schemes of averaging-based decision fusion.
4.4.2 Examples of Embodiment
4.4.2.1 Multi-codebook Classifier Designed with GPD
FIGURE 4.9
Relation between recognition accuracy and the number of prototypes per class and codebook [3].
4.4.2.2 Multi-class Classification Based on Support Vector Machine
4.4.2.3 Decision Fusion Using Different Classifiers
FIGURE 4.10
Typical block diagrams of the MSTDNN-based audio-visual speech recognition [7].
4.4.2.4 Decision Fusion Using Multi-modal Classifiers
FIGURE 4.11
Block diagram of the twofold-HMM-based audio-visual speech recognition [21].
4.5 Concluding Remarks
References
4.6 Appendix: Maximizing Mutual Information
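The quantity maximized in this appendix is the mutual information between two random variables; in standard textbook notation, which may differ from the chapter's:

```latex
I(X;Y) \;=\; \sum_{x,y} P(x,y)\,\log\frac{P(x,y)}{P(x)\,P(y)}
       \;=\; H(Y) \;-\; H(Y \mid X)
```

It is nonnegative and equals zero exactly when X and Y are independent.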
5
Large Vocabulary Speech Recognition Basedon Statistical Methods
Jean-Luc Gauvain and Lori Lamel
LIMSI, France
CONTENTS
5.1 Introduction
5.2 Overview
FIGURE 5.1
LVCSR speech generation model: the word sequence produced by the language model is successively transformed by the pronunciation model and the acoustic model, resulting in the speech signal.
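The decoding problem implied by this generation model is usually written as maximum a posteriori selection of the word sequence; in the conventional formulation (the symbols here are the customary ones, not necessarily the chapter's):

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid X)
        \;=\; \arg\max_{W}\; p(X \mid W)\, P(W)
```

where $p(X \mid W)$ is supplied by the pronunciation and acoustic models and $P(W)$ by the language model.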
5.3 Language Modeling
FIGURE 5.2
System diagram of a generic speech recognizer based on statistical models, including training and decoding processes and the main knowledge sources.
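The language model knowledge source in Figure 5.2 is typically an n-gram estimated from normalized training text. A minimal bigram sketch with add-one smoothing, illustrative only; deployed systems use back-off or interpolated estimates, as discussed in the references:

```python
from collections import Counter

def train_bigram(sentences, vocab_size):
    """Maximum-likelihood bigram probabilities with add-one smoothing
    (illustrative sketch, not a production estimator)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        words = ["<s>"] + words + ["</s>"]
        unigrams.update(words[:-1])                  # history counts
        bigrams.update(zip(words[:-1], words[1:]))   # bigram counts
    def prob(w_prev, w):
        return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)
    return prob

prob = train_bigram([["good", "program"], ["good", "morning"]], vocab_size=5)
```

A sentence probability is then the product of successive conditional word probabilities, including the sentence-boundary symbols.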
5.3.1 Text Preparation
FIGURE 5.3
Some example transformation rules applied during text normalization with associated probabilities.
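Such transformation rules can be implemented as ordered rewrite rules over the raw text. A minimal sketch with a few hypothetical rules; the rules and written forms below are invented, and Figure 5.3's actual rules additionally carry probabilities over competing expansions (e.g., "nineteen ninety one" versus "one thousand nine hundred and ninety one"):

```python
import re

# Hypothetical normalization rules mapping written forms to spoken forms.
RULES = [
    (re.compile(r"\$150\b"), "one hundred fifty dollars"),
    (re.compile(r"\b1991\b"), "nineteen ninety one"),
    (re.compile(r"\bDr\."), "doctor"),
]

def normalize(text):
    """Apply each rewrite rule in order, then case-fold, to produce
    language-model training text."""
    for pattern, spoken in RULES:
        text = pattern.sub(spoken, text)
    return text.lower()
```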
FIGURE 5.5
Some example lexical entries and their pronunciations along with estimated probabilities. For the compound words, the original concatenated pronunciation is given in the 1st line and the reduced forms are given in the 2nd line.
interest, conference, company
don’t know, did you, going to
gonna, dunno
5.5 Acoustic Modeling
5.5.1 Acoustic Front-end
FIGURE 5.6
A simple 3-state left-to-right HMM topology commonly used for allophone modeling in LVCSR. The model generates at least 3 speech frames per allophone, resulting in a minimal phone segment duration of 30 ms for a frame rate of 100 Hz.
FIGURE 5.7
Examples of allophonic transcriptions in terms of intra-word triphones and quinphones. Each contextual unit is defined by the central phone followed by its phone context shown in parentheses (left-context, right-context). * is a wildcard signifying any context.
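Given the 3-state left-to-right topology of Figure 5.6, the likelihood of a frame sequence is obtained with the forward algorithm. A minimal log-domain sketch, with emission log-probabilities taken as given (in practice they come from Gaussian mixture output densities):

```python
import math

def forward_log_likelihood(log_trans, log_emit):
    """Forward algorithm in the log domain. log_trans[i][j] is the log
    transition probability from state i to state j; log_emit[t][j] is the
    log probability of frame t under state j's output density. Returns
    the log-likelihood of starting in state 0 at frame 0 and ending in
    the last state at the final frame."""
    NEG_INF = float("-inf")
    n_states = len(log_trans)
    alpha = [NEG_INF] * n_states
    alpha[0] = log_emit[0][0]  # left-to-right: must start in state 0
    for t in range(1, len(log_emit)):
        new = [NEG_INF] * n_states
        for j in range(n_states):
            terms = [alpha[i] + log_trans[i][j] for i in range(n_states)
                     if alpha[i] > NEG_INF and log_trans[i][j] > NEG_INF]
            if terms:  # log-sum-exp over predecessor states
                m = max(terms)
                new[j] = (m + math.log(sum(math.exp(x - m) for x in terms))
                          + log_emit[t][j])
        alpha = new
    return alpha[-1]
```

For the 3-state topology this returns log-probability minus infinity for fewer than 3 frames, reflecting the minimum duration the caption describes.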
FIGURE 5.8
Example questions used for decision tree clustering, grouped by type: position, general classes, vowel classes, consonant classes, and individual phones.
The resulting clustered states are referred to in the literature as senones, genones, PELs, or tied-states.
5.5.3 HMM Parameter Estimation
FIGURE 5.9
The most frequently used decision tree questions, with their log likelihood gains, for an American English broadcast news transcription system [40]. The [+1] and [-1] indicate that the question has been applied to the right or left context respectively, and [0] to the phone itself.
5.5.4 HMM Adaptation
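A widely used adaptation scheme of this kind is maximum likelihood linear regression, which re-estimates only an affine transform (A, b) shared by many Gaussians; in its standard form the adapted means are

```latex
\hat{\mu} \;=\; A\,\mu \;+\; b
```

with $A$ and $b$ chosen to maximize the likelihood of the adaptation data, so that a small amount of speaker-specific speech suffices to adapt a large model set.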
5.6 Decoding
5.6.1 Speech/Non-speech Detection
5.6.2 Decoding Strategies
FIGURE 5.10
Example word lattice generated by a speech recognizer using a bigram language model for a 2.1 s utterance. Each graph edge corresponds to a word hypothesis and a time interval (as specified by the time information on the nodes). In this example the word transcription with the highest likelihood is “sil IT WAS A GOOD PROGRAM sil” which happens to be what was said. (The acoustic and language model likelihoods are not given on the figure.)
5.6.3 Efficiency
5.6.4 Confidence Measures
5.7 Indicative Performance Levels
Performance is measured by the word error rate, which counts substitutions, insertions, and deletions.
5.7.1 Dictation
5.7.2 Speech Recognition for Dialog Systems
5.7.3 Transcription for Audio Indexation
5.8 Portability and Language Dependencies
References
The THISL Broadcast News Retrieval System,
Experiments in Vocal Tract Normalization,
A Compact Model for Speaker Adaptation Training,
One Pass Cross Word Decoding for Large Vocabularies Based on a Lexical Tree Search Organization, 4
The Forward-Backward Search Strategy for Real-Time Speech Recognition,
Preliminary results on the performance of a system for the automatic recognition of continuous speech,
Acoustic Markov Models used in the Tangora Speech Recognition System, 1
A Maximum Likelihood Approach to Continuous Speech Recognition, PAMI-5
A Fast Match for Continuous Speech Recognition Using Allophonic Models, 1
Large Vocabulary Recognition of Wall Street Journal Sentences at Dragon Systems,
A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains, 41
Vector quantization for efficient computation of continuous density likelihoods, 2
A Baseline for the Transcription of Italian Broadcast News,
Word and acoustic confidence annotation for large vocabulary speech recognition,
Improvements in Language, Lexical and Phonetic Modeling in Sphinx-II,
An empirical study of smoothing techniques for language modeling, 13
Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion,
The Role of Word-Dependent Coarticulatory Effects in a Phoneme-Based Speech Recognition System, 3
Statistical Language Modelling using the CMU-Cambridge Toolkit,
Comparison of Parametric Representations of Monosyllabic Word Recognition in Continuously Spoken Sentences, 28
Maximum Likelihood from Incomplete Data via the EM Algorithm, 39
Human Speech Recognition Performance on the 1995 CSR Hub-3 Corpus,
Genones: Optimizing the Degree of Tying in a Large Vocabulary HMM-based Speech Recognizer, 1
Speaker adaptation using constrained estimation of Gaussian mixtures, 3
Sonograph and Sound Mechanics, 22
Automatic Recognition of Phonetic Patterns in Speech, 30
Human Speech Recognition Performance on the 1994 CSR Spoke 10 Corpus,
Comparison of speaker recognition methods using statistical features and dynamic features, ASSP-29
An improved approach to hidden Markov model decomposition of speech and noise,
Robust Continuous Speech Recognition using Parallel Model Combination, 9
Cluster Adaptive Training for Speech Recognition,
Semi-Tied Covariance Matrices for Hidden Markov Models, 7
Transcribing Broadcast News: The LIMSI Nov96 Hub4 System,
Spoken Language component of the MASK Kiosk,
Speech Recognition for an Information Kiosk,
Partitioning and Transcription of Broadcast News Data, 5
Developments in Continuous Speech Dictation using the ARPA WSJ Task,
Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains, 2
The LIMSI Broadcast News Transcription System, 37
A Rapid Match Algorithm for Continuous Speech Recognition,
A Probabilistic Approach to Confidence Measure Estimation and Evaluation,
Real-time Telephone-based Speech Recognition in the Jupiter Domain, 1
SWITCHBOARD: Telephone Speech Corpus for Research and Development,
The Population Frequencies of Species and the Estimation of Population Parameters, 40
A tree search strategy for large-vocabulary continuous speech recognition, 1
Linear Discriminant Analysis for Improved Large Vocabulary Continuous Speech Recognition, 1
Segment Generation and Clustering in the HTK Broadcast News Transcription System,
News-on-Demand - 'An Application of Informedia Technology',
The ATIS Spoken Language Systems Pilot Corpus,
Perceptual linear predictive (PLP) analysis of speech, 87
Large vocabulary continuous speech recognition using a hybrid connectionist-HMM system,
Signal Representation
Subphonetic Modeling with Markov States - Senone, 1
Predicting Unseen Triphones with Senones, II
Continuous Speech Recognition by Statistical Methods, 64
Statistical Methods for Speech Recognition,
A Dynamic Language Model for Speech Recognition,
Speech Based Video Retrieval,
Maximum-Likelihood Estimation for Mixture Multivariate Stochastic Observations of Markov Chains, 64
Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer, ASSP-35
Unsupervised Training of a Speech Recognizer: Recent Experiments, 6
The 1995 Abbot hybrid connectionist-HMM large-vocabulary recognition system,
Improved Clustering Techniques for Class-Based Statistical Language Modelling,
Improved backing-off for n-gram language modeling, 1
Design of the 1994 CSR Benchmark Tests,
Toward Automatic Recognition of Broadcast News,
Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition, 26
Eigenvoices for Speaker Adaptation,
On Designing Pronunciation Lexicons for Large Vocabulary, Continuous Speech Recognition, 1
Speech Recognition of European Languages,
Continuous Speech Recognition at LIMSI,
A Phone-based Approach to Non-Linguistic Speech Feature Identification, 9
Lightly Supervised and Unsupervised Acoustic Model Training, 16
Development of Spoken Language Corpora for Travel Information, 3
Large-vocabulary speaker-independent continuous speech recognition: The SPHINX system,
Speaker Normalization Using Efficient Frequency Warping Procedures, 1
Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models, 9
Maximum Likelihood Estimation for Multivariate Observations of Markov Sources, IT-28
Speech recognition by machines and humans, 22
Fast Speaker Change Detection for Broadcast News Transcription and Indexing, 3
Multi-site Data Collection for a Spoken Language Corpus,
Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks,
Subspace distribution clustering for continuous observation density hidden Markov models,
Spoken Language Processing and Human-Machine Communication in the European Union Programs,
An overview of EU programs related to conversational/interactive systems,
Algorithms for Bigram and Trigram Clustering,
News on Demand, 43
Named Entity Extraction from Broadcast News,
Full Expansion of Context-Dependent Networks in Large Vocabulary Speech Recognition,
Large-Vocabulary Dictation using SRI's Decipher Speech Recognition System: Progressive Search Techniques, II
The Use of a One-Stage Dynamic Programming Algorithm for Connected Word Recognition, ASSP-32
Improvements in Beam Search for 10000-Word Continuous Speech Recognition, I
Single-Tree Method for Grammar-Directed Search, 2
The Use of Decision Trees with Context Sensitive Phoneme Modelling,
A One Pass Decoder Design for Large Vocabulary Recognition,
Recent Advances in Japanese Broadcast News Transcription, 2
Modeling Inverse Covariance Matrices by Basis Expansion,
Language-model look-ahead for large vocabulary speech recognition,
A Word Graph Algorithm for Large Vocabulary Continuous Speech Recognition, 11
The Role of Phonological Rules in Speech Understanding Research, ASSP-23
Continuous Word Recognition Based on the Stochastic Segment Model,
1993 Benchmark Tests for the ARPA Spoken Language Program,
1994 Benchmark Tests for the ARPA Spoken Language Program,
1995 Hub-3 Multiple Microphone Corpus Benchmark Tests,
1998 Broadcast News Benchmark Test Results: English and Non-English Word Error Rate Performance Measures,
An efficient A* stack decoder algorithm for continuous speech recognition with a stochastic language model,
Improved Discriminative Training Techniques For Large Vocabulary Continuous Speech Recognition,
Evaluation of Spoken Language Systems: The ATIS Domain,
An Introduction to Hidden Markov Models, ASSP-3
Efficient Algorithms for Speech Recognition,
Stochastic pronunciation modelling from hand-labelled phonetic corpora, 29
Improvements in Stochastic Language Modeling,
Adaptive Statistical Language Modeling,
Two Decades of Statistical Language Modeling: Where Do We Go From Here?, 88
Language-independent and language-adaptive acoustic modeling for speech recognition, 35
Memory-efficient LVCSR search using a one-pass stack decoder, 14
New uses for N-Best Sentence Hypotheses within the BYBLOS Speech Recognition System, I
Improved Hidden Markov Modeling of Phonemes for Continuous Speech Recognition, 3
NYU Language Modeling Experiments for the 1995 CSR Evaluation,
A Markov Random Field Approach to Bayesian Speaker Adaptation,
Modeling Those F-Conditions - Or Not,
Scalable backoff language models, 1
Automatic Segmentation, Classification and Clustering of Broadcast News Audio,
Evaluation of word confidence for speech recognition systems, 13
Entropy-based Pruning of Backoff Language Models,
Four-level Tied Structure for Efficient Representation of Acoustic Modeling,
An Investigation into Vocal Tract Length Normalization,
Human Benchmarks for Speaker Independent Large Vocabulary Recognition Performance,
Speech discrimination by dynamic programming, 4
Element-wise recognition of continuous speech composed of words from a specified dictionary, 7
Verbmobil: Translation of Face-to-Face Dialogs, Plenary
Multilinguality in Speech and Spoken Language Systems, 88
Probabilistic Models for Topic Detection and Tracking, 1
FIGURE 6.12
A phrase structure tree based on a dependency structure.
6.6.2 Summarization of Multiple Utterances
6.6.3 Evaluation
6.6.3.1 Word Network of Manual Summarization Results for Evaluation
6.6.3.2 Evaluation Data
6.6.3.3 Training Data for Summarization Models
FIGURE 6.13
Summarizations of each utterance at 70% summarization ratio.
6.6.3.4 Evaluation Results
FIGURE 6.14
Article summarizations at 30% summarization ratio.
6.6.4 Discussion
6.7 Spontaneous Speech Recognition and Understanding ResearchIssues
6.7.1 Language Models and Corpora
6.7.2 Message-driven Speech Recognition and Understanding
FIGURE 6.15
A communication-theoretic view of speech generation and recognition.
6.7.3 Statistical Approaches and Speech Science
6.7.4 Research on the Human Brain
6.7.5 Dynamic Spectral Features
FIGURE 6.16
Speech-generation and speech-perception processes.
6.8 Conclusion
References
7
Speaker Authentication
Qi Li and Biing-Hwang Juang
Bell Labs; Avaya Labs Research
CONTENTS
7.1 Introduction
FIGURE 7.1
Speaker authentication approaches.
Speaker authentication
7.1.1 Speaker Recognition and Verification
Speaker recognition encompasses speaker verification, which is a hypothesis testing problem, and speaker identification, which is a classification problem.
FIGURE 7.2
A speaker verification system.
In the direct method, the speaker's identity claim is verified directly from the speech signal. Depending on the text used, an SV system may be a fixed pass-phrase system, a text-prompted system, or a text-independent SV system; evaluations are further characterized as closed tests or open tests.
7.1.2 Verbal Information Verification
FIGURE 7.3
An example of verbal information verification by asking sequential questions. (Similar sequential tests can also be applied in speaker verification and other biometric or multi-modality verification.)
indirect method
7.2 Pattern Recognition in Speaker Authentication
7.2.1 Bayesian Decision Theory
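This section rests on the standard Bayes decision rule; in textbook form (the notation here is the conventional one and may differ from the chapter's), for classes $\omega_i$ and observation $x$:

```latex
P(\omega_i \mid x) \;=\; \frac{p(x \mid \omega_i)\,P(\omega_i)}{p(x)},
\qquad
\hat{\omega} \;=\; \arg\max_i\; P(\omega_i \mid x)
```

For the two-class accept/reject decision in verification, this reduces to comparing the likelihood ratio $p(x \mid \omega_1) / p(x \mid \omega_2)$ against a threshold determined by the priors and the costs of the two error types.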
7.2.2 Stochastic Models for Stationary Process
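A common choice of stochastic model for a stationary process in speaker authentication is a mixture of Gaussian densities, trained per speaker. A minimal 1-D sketch of likelihood evaluation; real systems use multivariate mixtures over cepstral feature vectors:

```python
import math

def gmm_log_pdf(x, weights, means, variances):
    """Log density of scalar observation x under a 1-D Gaussian mixture:
    log sum_k w_k N(x; mu_k, var_k), computed stably via log-sum-exp."""
    comps = [math.log(w)
             - 0.5 * math.log(2 * math.pi * var)
             - (x - mu) ** 2 / (2 * var)
             for w, mu, var in zip(weights, means, variances)]
    m = max(comps)
    return m + math.log(sum(math.exp(c - m) for c in comps))

def sequence_log_likelihood(frames, weights, means, variances):
    """Frames are modeled as independent draws from the mixture, so the
    sequence log-likelihood is the sum of per-frame log densities."""
    return sum(gmm_log_pdf(x, weights, means, variances) for x in frames)
```

Verification then compares this score (normalized by a background model) against a decision threshold.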
7.2.3 Stochastic Models for Non-Stationary Process
10
Semantic Information Processing of Spoken Language – How May I Help You?℠
A. L. Gorin, A. Abella, T. Alonso, G. Riccardi, and J. H. Wright, AT&T Laboratories
CONTENTS
10.1 Introduction
AT&T’s ‘How May I Help You?’℠
“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; … These semantic aspects of communication are irrelevant to the engineering problem.”
confirm, clarify
“Do you want to make a collect call?”
“Charge this call please”
“How do you want to charge this call, to a credit card or to a third number?”
“What is your home phone number?”
Construct Algebra
dialog motivators
inheritance hierarchy
‘is a’ and ‘has a’
10.2 Call-Classification
‘press one if you want x, press two if you want y’
‘please say collect, calling card’ ‘press or say one if you want x’
‘How may I help you?’
“I want to reverse the charges on this call.”
“Can you tell me what time it is in Tokyo?”
“I was trying to call my sister and dialed a wrong number.”
“I’ve been trying to dial this number all day and can’t get through.”
“How much money do I owe you?”
“I don’t recognize this phone call to Tallahassee on October 4.”
“What’s this charge for one dollar and fifty cents?”
“I have a question about my bill.”
FIGURE 10.1 Call classification and routing in HMIHY.
‘How may I help you?’
‘How may I help you?’
FIGURE 10.2 Inheritance hierarchy of task knowledge in operator services.
perplexity
PP(W) = P(w₁ w₂ … wₙ)^(−1/n)
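Perplexity is the standard intrinsic measure for a language model: the inverse probability of the word sequence, normalized by its length. A minimal sketch (the per-word probabilities passed in are hypothetical, as a model would supply them):

```python
import math

def perplexity(probs):
    """Perplexity of a sequence given the model probability of each word.

    PP = P(w1..wn)^(-1/n), computed in log space for numerical stability.
    """
    n = len(probs)
    log_p = sum(math.log(p) for p in probs)
    return math.exp(-log_p / n)

# A model assigning probability 0.25 to each of 4 words behaves like a
# uniform choice among 4 alternatives: PP = (0.25^4)^(-1/4) = 4.
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # → 4.0
```

Lower perplexity means the model found the utterances less surprising, which usually correlates with lower recognition error.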
Evaluating Call Classification.
FIGURE 10.3 Histogram of utterance lengths.
false rejection
correct classification
true rejection rate
Remark:
“I want to know how to pay my bill”
10.3 Language Modeling for Recognition and Understanding
W = w₁ w₂ … wₙ
P(W) = Πᵢ P(wᵢ | w₁ … wᵢ₋₁)
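The recognizer's language model assigns P(W) by chaining conditional n-gram probabilities. A minimal maximum-likelihood bigram sketch over a hypothetical two-sentence corpus (real systems add smoothing for unseen events):

```python
from collections import Counter

def train_bigram(sentences):
    """MLE bigram model: P(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1})."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(toks[:-1])              # history counts
        bigrams.update(zip(toks[:-1], toks[1:]))  # pair counts
    return lambda prev, w: bigrams[(prev, w)] / unigrams[prev]

corpus = ["I want to make a collect call",
          "I want to make a card call"]
p = train_bigram(corpus)
print(p("make", "a"))     # "make" is always followed by "a" → 1.0
print(p("a", "collect"))  # "a" precedes "collect" in 1 of 2 cases → 0.5
```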
‘I want to make a’ ‘collect call’ ‘card call’
‘wrong’ ‘wrong number’
‘dialed a wrong number’
‘dialed a wrong number’ ‘dialed the wrong number’
FIGURE 10.4 A salient grammar fragment.
salient grammar fragments
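HMIHY bases call classification on salient phrase fragments detected in the recognized utterance. The fragment table below is hypothetical (the system described here learns the fragments and their association strengths from data); this sketch only illustrates the lookup step:

```python
# Hypothetical salient fragments and the call types they indicate.
SALIENT = {
    "collect call": "COLLECT",
    "reverse the charges": "COLLECT",
    "calling card": "CALLING_CARD",
    "wrong number": "BILLING_CREDIT",
    "my bill": "BILLING_QUESTION",
}

def classify(utterance):
    """Return call types whose salient fragments occur in the utterance."""
    text = utterance.lower()
    return sorted({label for frag, label in SALIENT.items() if frag in text})

print(classify("I want to reverse the charges on this call"))  # → ['COLLECT']
print(classify("I dialed a wrong number"))  # → ['BILLING_CREDIT']
```

When no fragment matches (or several conflict), a dialog system can fall back on a clarification question rather than routing blindly.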
User: yeah I’m not AT&T WIRELESS PHONE and when I got and she told me that I would be switched to 7 CENTS A MINUTES FOR ALL my AT&T long distance on that I was on 10 10 cents ONE RATE PLAN
FIGURE 12.1 A sample detection error tradeoff (DET) curve for the TDT tracking task with one training story (Nₜ = 1).
minimum
12.2 Basic Topic Models
12.2.1 Vector Space
sim(d₁, d₂) = (d₁ · d₂) / (‖d₁‖ ‖d₂‖)
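In the vector space model, stories are term-weight vectors compared by cosine similarity. A minimal sketch over sparse dictionary vectors (the weights shown are hypothetical; practical trackers use tf-idf weighting):

```python
import math

def cosine(d1, d2):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * d2.get(t, 0.0) for t, w in d1.items())
    n1 = math.sqrt(sum(w * w for w in d1.values()))
    n2 = math.sqrt(sum(w * w for w in d2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

a = {"oklahoma": 2.0, "bombing": 1.0}
b = {"oklahoma": 1.0, "trial": 1.0}
print(round(cosine(a, b), 3))  # → 0.632
```

A story is then judged on-topic if its cosine to the topic representation exceeds a tuned threshold.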
12.2.2 Language Models
P(w | topic) = λ · P_ML(w | topic) + (1 − λ) · P(w | background)
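In the language-model approach, a story is scored by the log-likelihood ratio of a topic model against a background (general English) model, with the topic model smoothed by the background. A sketch under those assumptions; λ and all the counts are hypothetical:

```python
import math
from collections import Counter

def topic_score(story, topic_counts, bg_counts, lam=0.7):
    """Sum over story words of log[P(w|topic) / P(w|background)], where
    P(w|topic) = lam * P_ml(w|topic) + (1 - lam) * P(w|background)."""
    t_total = sum(topic_counts.values())
    b_total = sum(bg_counts.values())
    score = 0.0
    for w in story:
        p_bg = bg_counts.get(w, 1) / b_total  # crude floor for unseen words
        p_t = lam * topic_counts.get(w, 0) / t_total + (1 - lam) * p_bg
        score += math.log(p_t / p_bg)
    return score

topic = Counter({"oklahoma": 5, "bombing": 3, "mcveigh": 2})
background = Counter({"the": 50, "oklahoma": 1, "bombing": 1, "game": 10})
print(topic_score(["oklahoma", "bombing"], topic, background) > 0)  # True
print(topic_score(["game", "the"], topic, background) > 0)          # False
```

Positive scores (story words better explained by the topic than by general English) suggest the story belongs to the topic.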
12.3 Implementing the Models
12.3.1 Named Entities
President Bush / George Bush
12.3.2 Document Expansion
12.3.3 Clustering
12.3.4 Time Decay
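Time decay exploits the fact that stories about the same event cluster in time: comparisons between stories far apart are down-weighted. One plausible form, an exponential schedule (the shape and the half-life here are illustrative, not the chapter's exact formulation):

```python
def decayed_similarity(sim, days_apart, half_life=30.0):
    """Down-weight a raw similarity score by the time gap between two
    stories; the weight halves every `half_life` days (hypothetical)."""
    return sim * 0.5 ** (days_apart / half_life)

print(round(decayed_similarity(0.8, 0), 3))   # same day: no penalty
print(round(decayed_similarity(0.8, 30), 3))  # one half-life later
```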
12.4 Comparing Models
12.4.1 Nearest Neighbors
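A nearest-neighbor comparison labels a new story by majority vote among its most similar training stories. A self-contained sketch using raw term counts as a stand-in for the tf-idf weights a real tracker would use:

```python
import math
from collections import Counter

def knn_label(story, labeled, k=3):
    """Label a story by majority vote among its k most similar labeled
    stories, using cosine similarity over term counts."""
    def cos(a, b):
        dot = sum(c * b.get(t, 0) for t, c in a.items())
        na = math.sqrt(sum(c * c for c in a.values()))
        nb = math.sqrt(sum(c * c for c in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = Counter(story.split())
    ranked = sorted(labeled, key=lambda dl: cos(q, Counter(dl[0].split())),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train = [("oklahoma city bombing trial", "bombing"),
         ("bombing suspect arrested", "bombing"),
         ("school shooting in oregon", "shooting")]
print(knn_label("trial of bombing suspect", train, k=3))  # → bombing
```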
12.4.2 Decision Trees
12.4.3 Model-to-Model
12.5 Miscellaneous Issues
12.5.1 Deferral
12.5.2 Multi-modal Issues
third
12.5.3 Multi-lingual Issues
FIGURE 12.2 Screen snapshot of the Lighthouse system that was created to portray TDT topic clusters and their relationships.
12.6 Using TDT Interactively
12.6.1 Demonstrations
12.6.2 Timelines
Oklahoma
Oklahoma McVeigh Simpson
FIGURE 12.3 Overview of January–June 1998. The topic labeled monica lewinsky allegation is the highest ranked topic. The pop-up on oregon school shooting shows significant named entities for that event. The other pop-up displays a sub-menu for obtaining more information on the name kip kinkel.
12.7 Modeling Events
12.8 Conclusion
References
Proceedings of Conference on Information Retrieval Research (SIGIR)
Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop
Proceedings of Conference on Information Retrieval Research (SIGIR)
Information Retrieval
Topic Detection and Tracking: Event-based Information Organization
In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-ACL’98)
Proceedings for Empirical Methods in NLP
Proceedings of the Text Retrieval Conference (TREC-3)
Proceedings of the DARPA Broadcast News Workshop
Topic Detection and Tracking: Event-based Information Organization
Topic Detection and Tracking: Event-based Information Organization
Proceedings of the DELOS-NSF Workshop on Personalization and Recommender Systems in Digital Libraries
Topic Detection and Tracking: Event-based Information Organization
Topic Detection and Tracking: Event-based Information Organization
Proceedings of the Text Retrieval Conference (TREC-2)
Topic Detection and Tracking: Event-based Information Organization
Proceedings of the Human Language Technology Conference (HLT)
Proceedings of the Text Retrieval Conference (TREC-8)
Proceedings of ACM SIGIR Conference on Research in Information Retrieval
Topic Detection and Tracking: Event-based Information Organization
Proceedings of the IEEE Symposium on Information Visualization 2000 (InfoVis 2000)
Foundations of Statistical Natural Language Processing
EuroSpeech
Proceedings of the DARPA Broadcast News Workshop
Proceedings of the 2000 Speech Transcription Workshop
Proceedings of the DARPA Broadcast News Workshop
On-line New Event Detection, Clustering, and Tracking
Advances in Information Retrieval: Recent Research from the CIIR
Proceedings of the DARPA Broadcast News Workshop
Proceedings of SIGIR
A Language Modeling Approach to Information Retrieval
Proceedings of the European Conference on Research and Advanced Technology for Digital Libraries (ECDL)
Proceedings of the Text Retrieval Conference (TREC-9)
Introduction to Modern Information Retrieval
Topic Detection and Tracking: Event-based Information Organization
Proceedings of the DARPA Broadcast News Workshop
Proceedings of the Eighth International Conference on Information and Knowledge Management (CIKM99)
Proceedings of SIGIR
Proceedings of KDD 2000 Conference
Information Retrieval
Proceedings of the Text Retrieval Conference (TREC-8)
Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop
ACM Transactions on Information Systems (TOIS)
Topic Detection and Tracking: Event-based Information Organization