
Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Page 1: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Large Vocabulary Continuous Speech Recognition

The recognizer picks the word string $\hat{W}$ with the highest posterior probability given the acoustic observations $Y$:

$$P(\hat{W} \mid Y) = \max_{W} P(W \mid Y), \qquad P(W \mid Y) = \frac{P(Y \mid W)\,P(W)}{P(Y)}.$$

Since $P(Y)$ does not depend on $W$, this is equivalent to

$$\hat{W} = \arg\max_{W} P(Y \mid W)\,P(W).$$
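The argmax above translates directly into a decoding loop. A minimal sketch in Python, assuming a finite candidate list and two hypothetical scoring functions, `acoustic_likelihood` for $P(Y \mid W)$ and `lm_probability` for $P(W)$ (neither is defined in the slides):

```python
import math

def decode(Y, candidates, acoustic_likelihood, lm_probability):
    """Return the candidate W maximizing P(Y|W) * P(W).

    Scores are combined in log space to avoid numerical underflow;
    P(Y) is constant over W, so it can be dropped from the argmax.
    Assumes both scoring functions return strictly positive values.
    """
    best_W, best_score = None, -math.inf
    for W in candidates:
        score = math.log(acoustic_likelihood(Y, W)) + math.log(lm_probability(W))
        if score > best_score:
            best_W, best_score = W, score
    return best_W
```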

Page 2: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Subword Speech Units

Page 3: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 4: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 5: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

HMM-Based Subword Speech Units

Page 6: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 7: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Training of Subword Units

Each training sentence $S$ is first written as its word sequence and then, through the lexicon, as a sequence of subword units:

$$S_W:\ W_1\,W_2\,W_3 \cdots W_I$$

$$S_U:\ U_{1(W_1)}\,U_{2(W_1)} \cdots U_{L_1(W_1)}\ U_{1(W_2)}\,U_{2(W_2)} \cdots U_{L_2(W_2)}\ \cdots\ U_{1(W_I)} \cdots U_{L_I(W_I)},$$

where $L_i$ is the number of subword units in the lexical entry for word $W_i$.
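The $S_W \to S_U$ expansion is just a lexicon lookup per word. A minimal sketch; the lexicon entries below are illustrative only (they reuse the "show all ships" example from a later slide):

```python
# Hypothetical lexicon: each word maps to its subword-unit (PLU) sequence.
LEXICON = {
    "show":  ["sh", "ow"],
    "all":   ["aw", "l"],
    "ships": ["sh", "ih", "p", "s"],
}

def expand_to_units(word_sequence):
    """Rewrite S_W = W_1 ... W_I as the concatenated unit sequence S_U."""
    units = []
    for word in word_sequence:
        units.extend(LEXICON[word])  # the L_i units of word W_i
    return units

print(expand_to_units(["show", "all", "ships"]))
# ['sh', 'ow', 'aw', 'l', 'sh', 'ih', 'p', 's']
```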

Page 8: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Training of Subword Units

Page 9: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 10: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Training Procedure

Page 11: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 12: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 13: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 14: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Errors and performance evaluation in PLU recognition

Substitution errors (s), deletion errors (d), insertion errors (i)

Performance evaluation: if the total number of PLUs is N, we define:

Correctness rate: (N − s − d) / N
Accuracy rate: (N − s − d − i) / N
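These two rates are a one-line computation each. A minimal sketch with made-up counts for illustration:

```python
def plu_scores(N, s, d, i):
    """Correctness (N - s - d)/N and accuracy (N - s - d - i)/N
    for PLU recognition, given substitution/deletion/insertion counts."""
    correctness = (N - s - d) / N
    accuracy = (N - s - d - i) / N
    return correctness, accuracy

# Hypothetical counts: 1000 reference PLUs, 80 substitutions,
# 40 deletions, 30 insertions.
print(plu_scores(1000, 80, 40, 30))  # (0.88, 0.85)
```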

Page 15: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Language Models for LVCSR

By the chain rule, the prior probability of a word string $W = w_1 w_2 \cdots w_Q$ factors as

$$P(W) = P(w_1 w_2 \cdots w_Q) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_Q \mid w_1 w_2 \cdots w_{Q-1}).$$

Word Pair Model: Specify which word pairs are valid:

$$P(w_j \mid w_k) = \begin{cases} 1 & \text{if } (w_k, w_j) \text{ is a valid word pair} \\ 0 & \text{otherwise.} \end{cases}$$
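The word-pair constraint is a set-membership test. A minimal sketch, with a hypothetical two-pair table standing in for the real grammar:

```python
# Hypothetical table of valid (predecessor, successor) pairs.
VALID_PAIRS = {("show", "all"), ("all", "ships")}

def word_pair_prob(w_j, w_k):
    """P(w_j | w_k) under the word-pair model: 1 if valid, else 0."""
    return 1.0 if (w_k, w_j) in VALID_PAIRS else 0.0

print(word_pair_prob("all", "show"))    # 1.0: "show all" is allowed
print(word_pair_prob("ships", "show"))  # 0.0: "show ships" is not
```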

Page 16: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Statistical Language Modeling

An N-gram model truncates the conditioning history to the previous $N-1$ words:

$$P(W) = \prod_{i=1}^{Q} P(w_i \mid w_{i-N+1}, \ldots, w_{i-1}).$$

The conditional probabilities are estimated from relative frequencies, where $F(\cdot)$ denotes the number of occurrences of a word tuple in the training corpus:

$$\hat{P}(w_1) = \frac{F(w_1)}{\sum_{w} F(w)}, \qquad \hat{P}(w_2 \mid w_1) = \frac{F(w_1, w_2)}{F(w_1)}, \qquad \hat{P}(w_3 \mid w_1, w_2) = \frac{F(w_1, w_2, w_3)}{F(w_1, w_2)},$$

and in general

$$\hat{P}(w_i \mid w_{i-N+1}, \ldots, w_{i-1}) = \frac{F(w_{i-N+1}, \ldots, w_i)}{F(w_{i-N+1}, \ldots, w_{i-1})}.$$
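The relative-frequency estimator is a pair of count tables. A minimal sketch (no smoothing, so unseen histories get probability 0, which a downstream perplexity computation would need to guard against):

```python
from collections import Counter

def ngram_estimator(corpus, n):
    """Relative-frequency estimate P(w | history) = F(history, w) / F(history)."""
    numer, denom = Counter(), Counter()
    for sentence in corpus:
        for i in range(n - 1, len(sentence)):
            history = tuple(sentence[i - n + 1:i])
            numer[history + (sentence[i],)] += 1
            denom[history] += 1

    def P(w, history):
        history = tuple(history)
        return numer[history + (w,)] / denom[history] if denom[history] else 0.0

    return P

# Tiny illustrative corpus; n=2 gives the bigram estimate F(w1, w2)/F(w1).
corpus = [["show", "all", "ships"], ["show", "all", "planes"]]
P = ngram_estimator(corpus, 2)
print(P("ships", ["all"]))  # 1/2 = 0.5
```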

Page 17: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Perplexity of the Language Model

Entropy of the source:

$$H = -\lim_{Q \to \infty} \frac{1}{Q} \sum_{w_1, \ldots, w_Q} P(w_1, w_2, \ldots, w_Q)\,\log P(w_1, w_2, \ldots, w_Q)$$

First order entropy of the source: if the words are treated as independent, $P(w_1, w_2, \ldots, w_Q) = P(w_1)\,P(w_2) \cdots P(w_Q)$ and

$$H = -\sum_{w \in V} P(w)\,\log P(w).$$

If the source is ergodic, meaning its statistical properties can be completely characterized in a sufficiently long sequence that the source puts out, then

$$H = -\lim_{Q \to \infty} \frac{1}{Q} \log P(w_1, w_2, \ldots, w_Q).$$

Page 18: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

We often compute $H$ based on a finite but sufficiently large $Q$:

$$H = -\frac{1}{Q} \log P(w_1, w_2, \ldots, w_Q).$$

$H$ is the degree of difficulty that the recognizer encounters, on average, when it is to determine a word from the same source.

If the N-gram language model $P_N(W)$ is used, an estimate of $H$ is:

$$\hat{H} = -\frac{1}{Q} \log \hat{P}(w_1, w_2, \ldots, w_Q).$$

In general:

$$\hat{H} = -\frac{1}{Q} \sum_{i=1}^{Q} \log P(w_i \mid w_{i-N+1}, \ldots, w_{i-1}).$$

Perplexity is defined as:

$$B = 2^{\hat{H}} = \hat{P}(w_1, w_2, \ldots, w_Q)^{-1/Q}.$$
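Perplexity follows directly from the per-word log probabilities. A minimal sketch, assuming a conditional-probability callable like the `ngram_estimator` above, and assuming it returns strictly positive values (i.e., it has been smoothed; the unsmoothed estimator would raise on log of 0):

```python
import math

def perplexity(sentence, cond_prob, n):
    """B = 2^H with H = -(1/Q) * sum_i log2 P(w_i | previous n-1 words)."""
    Q = len(sentence)
    log_prob = 0.0
    for i in range(Q):
        history = sentence[max(0, i - n + 1):i]  # shorter history at sentence start
        log_prob += math.log2(cond_prob(sentence[i], history))
    H = -log_prob / Q
    return 2 ** H
```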

Page 19: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Overall recognition system based on subword units

Page 20: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Naval Resource (Battleship) Management Task: 991-word vocabulary. NG (no grammar): perplexity = 991.

Page 21: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Word pair grammar

We can partition the vocabulary into four nonoverlapping sets of words:

{BE}: words that can either begin or end a sentence, |{BE}| = 117
{BĒ}: words that can begin but cannot end a sentence, |{BĒ}| = 64
{B̄E}: words that cannot begin but can end a sentence, |{B̄E}| = 448
{B̄Ē}: words that cannot begin or end a sentence, |{B̄Ē}| = 322

The overall FSN allows recognition of sentences of the form:

S: (silence) {BE, BĒ} ({words}) {BE, B̄E} (silence)
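Building the four sets is a matter of testing begin/end membership for each word. A minimal sketch; deriving the sets from a list of example sentences is an assumption here (the task actually defines them through its grammar):

```python
def partition_vocabulary(vocab, sentences):
    """Split vocab into four nonoverlapping sets by begin/end behavior."""
    begins = {s[0] for s in sentences}   # words observed starting a sentence
    ends = {s[-1] for s in sentences}    # words observed ending a sentence
    BE      = {w for w in vocab if w in begins and w in ends}
    B_only  = {w for w in vocab if w in begins and w not in ends}
    E_only  = {w for w in vocab if w not in begins and w in ends}
    neither = {w for w in vocab if w not in begins and w not in ends}
    return BE, B_only, E_only, neither
```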

Page 22: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

WP (word pair) grammar: perplexity = 60

FSN based on partitioning scheme: 995 real arcs and 18 null arcs

WB (word bigram) grammar: perplexity = 20

Page 23: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Control of word insertion/word deletion rate

In the discussed structure, there is no control on the sentence length.

We introduce a word insertion penalty into the Viterbi decoding.

For this, a fixed negative quantity is added to the likelihood score at the end of each word arc.
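In code, the penalty is one constant added at word boundaries. A minimal sketch; the value is hypothetical and would be tuned empirically (more negative discourages insertions and shortens hypotheses, less negative has the opposite effect):

```python
# Hypothetical penalty in log-likelihood units.
WORD_INSERTION_PENALTY = -15.0

def extend_past_word_end(path_log_likelihood):
    """Score used when a Viterbi path leaves a word arc:
    the fixed negative penalty is added to the accumulated score."""
    return path_log_likelihood + WORD_INSERTION_PENALTY
```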

Page 24: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 25: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 26: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 27: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 28: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Context-dependent subword units

Subword units can be defined with increasing amounts of context. For the word "above" (phones ax, b, ah, v):

(1) Context Independent Units:
above: ax, b, ah, v

(2) Triphones (Context Dependent):
above: ax($, b), b(ax, ah), ah(b, v), v(ah, $)

(3) Multiple Phone Units

(4) Word Dependent Units:
above: ax(above), b(above), ah(above), v(above)

Notation, with $ marking a word boundary:

p(p_L, $): left context (LC) diphone
p($, p_R): right context (RC) diphone
p(p_L, p_R): left-right context (LRC) triphone

Creation of context-dependent diphones and triphones
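Generating the triphone rendition of a word is a sliding-window pass over its phone string. A minimal sketch reproducing the "above" example:

```python
def to_triphones(phones):
    """Map a phone sequence to triphones p(p_L, p_R); '$' marks a word boundary."""
    units = []
    for i, p in enumerate(phones):
        left = phones[i - 1] if i > 0 else "$"
        right = phones[i + 1] if i < len(phones) - 1 else "$"
        units.append(f"{p}({left},{right})")
    return units

print(to_triphones(["ax", "b", "ah", "v"]))
# ['ax($,b)', 'b(ax,ah)', 'ah(b,v)', 'v(ah,$)']
```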

Page 29: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

If c(·) is the occurrence count for a given unit, we can use a unit reduction rule such as:

If $c(p(p_L, p_R)) \geq T$, use the triphone $p(p_L, p_R)$; otherwise back off to a less specific unit:

$$\hat{p} = \begin{cases} p(p_L, \$) & \text{if } c(p(p_L, \$)) \geq T \\ p(\$, p_R) & \text{if } c(p(\$, p_R)) \geq T \\ p & \text{otherwise.} \end{cases}$$

CD units using only intraword units for "show all ships":

sh($, ow) ow(sh, $) aw($, l) l(aw, $) sh($, ih) ih(sh, p) p(ih, s) s(p, $)

CD units using both intraword and interword units:

sh($, ow) ow(sh, aw) aw(ow, l) l(aw, sh) sh(l, ih) ih(sh, p) p(ih, s) s(p, $)
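The reduction rule is a cascade of count checks. A minimal sketch; the order of preference between the two diphone back-offs is an assumption, and the `counts` table and threshold `T` below are illustrative:

```python
def reduce_unit(p, p_left, p_right, counts, T):
    """Back off from triphone to diphone to CI unit when c(.) < T."""
    lrc = f"{p}({p_left},{p_right})"  # left-right context triphone
    lc = f"{p}({p_left},$)"           # left context diphone
    rc = f"{p}($,{p_right})"          # right context diphone
    if counts.get(lrc, 0) >= T:
        return lrc
    if counts.get(lc, 0) >= T:        # assumed preference: LC before RC
        return lc
    if counts.get(rc, 0) >= T:
        return rc
    return p                          # context-independent unit

counts = {"b(ax,ah)": 3, "b(ax,$)": 40}
print(reduce_unit("b", "ax", "ah", counts, T=25))  # 'b(ax,$)'
```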

Page 30: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 31: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Smoothing and interpolation of CD PLU models

A context-dependent model is smoothed by interpolating it with its less specific back-off models:

$$\hat{L}(p(p_L, p_R)) = \lambda_1 L(p(p_L, p_R)) + \lambda_2 L(p(p_L, \$)) + \lambda_3 L(p(\$, p_R)) + \lambda_4 L(p(\$, \$)),$$

with $\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = 1$.
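Applied to discrete output distributions, the interpolation is a weighted sum per symbol. A minimal sketch, assuming each $L(\cdot)$ is represented as a dict of output probabilities (the slides do not fix this representation, and the $\lambda$'s would come from a procedure such as deleted interpolation):

```python
def interpolate_models(L_lrc, L_lc, L_rc, L_ci, lambdas):
    """Weighted combination of a CD distribution with its back-offs."""
    l1, l2, l3, l4 = lambdas
    assert abs(l1 + l2 + l3 + l4 - 1.0) < 1e-9  # weights must sum to 1
    keys = set(L_lrc) | set(L_lc) | set(L_rc) | set(L_ci)
    return {k: l1 * L_lrc.get(k, 0.0) + l2 * L_lc.get(k, 0.0)
               + l3 * L_rc.get(k, 0.0) + l4 * L_ci.get(k, 0.0)
            for k in keys}
```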

Page 32: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Implementation issues using CD units

Page 33: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 34: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 35: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Word junction effects

To handle known phonological changes, a set of phonological rules is superimposed on both the training and recognition networks. Some typical phonological rules include:

Page 36: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Recognition results using CD units

Page 37: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 38: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Position dependent units

A likelihood-based distance between unit variants $p$ and $q$:

$$D(p) = \min_{q} \| L(Y_p) - L(Y_q) \|$$

Page 39: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Unit splitting and clustering

Page 40: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 41: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 42: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 43: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 44: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.
Page 45: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

A key source of difficulty in continuous speech recognition is the so-called function words, which include words like a, and, for, in, is. The function words have the following properties:

Page 46: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Creation of vocabulary-independent units

Page 47: Large Vocabulary Continuous Speech Recognition. Subword Speech Units.

Semantic Postprocessor for Recognition