
Automated Speech Recognition

By: Amichai Painsky

Automated Speech Recognition - setup


• Input – speech waveform

• Preprocessing

• Modeling

• Output – transcription: "The boy is in the red house"

ASR - basics


• Observations $O = o_1, o_2, \ldots, o_T$ representing a speech signal

• Vocabulary $V$ of different words

• Our goal – find the most likely word sequence: $W^* = \arg\max_W P(W \mid O)$

• Since $P(W \mid O) = \frac{P(O \mid W)\, P(W)}{P(O)}$, we have $W^* = \arg\max_W P(O \mid W)\, P(W)$, where $P(W)$ is the language modeling term and $P(O \mid W)$ is the acoustic modeling term (a toy illustration follows)
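To make the decomposition concrete, here is a minimal Python sketch: the recognizer scores each candidate word sequence by $\log P(O \mid W) + \log P(W)$. All names and numbers below are invented for illustration.

```python
# Hypothetical log-scores for two candidate transcriptions of one utterance.
# log_p_o_given_w stands in for the acoustic model, log_p_w for the language model.
hypotheses = {
    "the boy is in the red house":   {"log_p_o_given_w": -120.3, "log_p_w": -14.2},
    "the buoy is in the read house": {"log_p_o_given_w": -119.8, "log_p_w": -22.7},
}

# W* = argmax_W log P(O|W) + log P(W); P(O) is constant over W and drops out.
best = max(hypotheses,
           key=lambda w: hypotheses[w]["log_p_o_given_w"] + hypotheses[w]["log_p_w"])
print(best)  # "the boy is in the red house": the language model breaks the tie
```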

Observations preprocessing


• A sampled waveform is converted into a sequence of parameter vectors at a certain frame rate

• A frame shift of 10 ms is usually taken, because a speech signal can be assumed stationary over such a short interval

• Many different ways to extract meaningful features have been developed, some based on acoustic concepts, knowledge of the human vocal tract, and psychophysical knowledge of human perception (a minimal framing sketch follows)
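As a rough illustration of the framing step, here is a short sketch. The 10 ms shift follows the slide; the 25 ms window length and the log-energy feature are common choices assumed here for the example.

```python
import numpy as np

def frame_signal(waveform, sample_rate, frame_len_ms=25.0, frame_shift_ms=10.0):
    """Slice a 1-D waveform into overlapping frames (10 ms shift per the slide;
    the 25 ms window is an assumed, common choice)."""
    frame_len = int(sample_rate * frame_len_ms / 1000)
    frame_shift = int(sample_rate * frame_shift_ms / 1000)
    n_frames = 1 + max(0, (len(waveform) - frame_len) // frame_shift)
    frames = np.stack([waveform[i * frame_shift : i * frame_shift + frame_len]
                       for i in range(n_frames)])
    # One crude feature per frame: log energy. Real front-ends compute richer
    # vectors (e.g., MFCCs), per the slide's remark on feature extraction.
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    return frames, log_energy

# 1 second of a synthetic 440 Hz tone at 16 kHz -> 98 frames of 400 samples
wave = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
frames, feats = frame_signal(wave, 16000)
print(frames.shape, feats.shape)  # (98, 400) (98,)
```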

Language modeling


• Most generally, the probability of a sequence of $m$ words is $P(w_1, \ldots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \ldots, w_{i-1})$

• Language is highly structured, and limited histories are capable of capturing quite a bit of this structure. Bigram models: $P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-1})$

• More powerful two-word-history (trigram) models: $P(w_i \mid w_{i-2}, w_{i-1})$

• Longer history -> exponentially increasing number of models -> more data required to train, more parameters, more overfitting

• Prediction by Partial Matching (PPM) modeling (a toy bigram estimator is sketched below)
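A toy maximum-likelihood bigram estimator, to make the limited-history idea concrete; the two-sentence corpus and the lack of smoothing are simplifications for the example.

```python
from collections import Counter

def train_bigram(sentences):
    """Maximum-likelihood bigram estimates P(w_i | w_{i-1}) from a toy corpus.
    A real model would add smoothing for unseen word pairs."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(words[:-1])                 # history counts
        bigrams.update(zip(words[:-1], words[1:]))  # (history, word) counts
    return lambda w, prev: bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

corpus = ["the boy is in the red house", "the red house is big"]
p = train_bigram(corpus)
print(p("red", "the"))  # "the red" occurs 2 times, "the" 3 times -> 2/3
```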

Acoustic modeling


• Determines which sounds are pronounced when a given sentence is uttered

• The number of possibilities is infinite! (it depends on the speaker, the ambiance, microphone placement, etc.)

• Possible solution – a parametric model in the form of a Hidden Markov Model (HMM)

• Notice that other solutions may also apply (for example, neural networks)

Hidden Markov Model


• A simple example of an HMM (three states with binary observations; the slide's diagram is replaced below by a toy parameterization):
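The diagram itself is not reproduced in this transcript, so the following numbers are invented for illustration, shaped to match the examples used later (state paths like $q = 12312$, observation strings like 10110).

```python
import numpy as np

# Toy 3-state HMM over binary observation symbols {0, 1}.
pi = np.array([0.5, 0.3, 0.2])        # initial state probabilities pi_i
A = np.array([[0.6, 0.3, 0.1],        # transition probabilities a_ij
              [0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5]])
B = np.array([[0.7, 0.3],             # emission probabilities b_i(k);
              [0.4, 0.6],             # row = state, column = symbol 0 or 1
              [0.1, 0.9]])
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```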

• Formally, an HMM $\lambda = (A, B, \pi)$ is specified by its transition probabilities $a_{ij}$, its observation (emission) probabilities $b_j(k)$, and its initial state probabilities $\pi_i$

Hidden Markov Model – Forward Algorithm


• Given an observation sequence (for example 10110), what is the probability that it was generated by a given HMM (for example, the HMM from the previous slides)?

• For a path $q = 12312$ and a given HMM $\lambda$: $P(O, q \mid \lambda) = \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)$

• Therefore, summing over all possible paths: $P(O \mid \lambda) = \sum_{q} P(O, q \mid \lambda)$

Hidden Markov Model – Forward Algorithm


• Complexity: for a sequence of $T$ observations, each path necessitates about $2T$ multiplications. The total number of paths is $N^T$, therefore direct evaluation costs on the order of $2T \cdot N^T$ operations

• A more efficient approach – the forward algorithm

Hidden Markov Model – Forward Algorithm


• Forward algorithm: calculates the probabilities for all subsequences at each time step, reusing the results from the previous step (dynamic programming)

• Define $\alpha_t(i) = P(o_1, \ldots, o_t, q_t = i \mid \lambda)$ – the probability of being at state $i$ at time $t$ and having observed the partial sequence $o_1, \ldots, o_t$. Inductively: $\alpha_1(i) = \pi_i\, b_i(o_1)$, then $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o_{t+1})$, and finally $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$

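A direct implementation of this recursion, reusing the invented toy parameters from the earlier HMM sketch:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: alpha[t, i] = P(o_1..o_t, q_t = i | lambda).
    Costs O(N^2 T) instead of the O(2T * N^T) brute-force enumeration."""
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # induction step
    return alpha, alpha[-1].sum()                     # alpha table, P(O | lambda)

# Toy parameters assumed earlier:
pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.1, 0.5]])
B = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])
alpha, p_obs = forward(pi, A, B, obs=[1, 0, 1, 1, 0])  # sequence 10110
print(p_obs)
```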

Hidden Markov Model – Forward Algorithm


• Complexity – at time $t$ each calculation only involves the $N$ previous values $\alpha_{t-1}(i)$. The length of the sequence is $T$. Therefore, for each state we need $O(NT)$ operations, and for the total $N$ states we need $O(N^2 T)$

Hidden Markov Model – Viterbi algorithm


• Previously: given an observation sequence (for example 10110), what is the probability that it was generated by a given HMM?

• We now ask: given an observation sequence, what is the sequence of states that is most likely to have generated it?

Hidden Markov Model – Viterbi algorithm


• Define $\delta_t(i) = \max_{q_1, \ldots, q_{t-1}} P(q_1, \ldots, q_{t-1}, q_t = i, o_1, \ldots, o_t \mid \lambda)$ as the probability of the best path from the start to state $i$ at time $t$

• $\max_i \delta_T(i)$ is our objective

• We solve this with the same recursion as the forward algorithm, but with maximization instead of summation: $\delta_{t+1}(j) = \left[ \max_i \delta_t(i)\, a_{ij} \right] b_j(o_{t+1})$, keeping backpointers so the best state sequence can be traced back


Hidden Markov Model – Viterbi algorithm, example


• Observation sequence – 101 (the slide's worked table is replaced below by a runnable sketch)
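A Viterbi sketch applied to the sequence 101, again with the invented toy parameters; states are 0-indexed here.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Viterbi: delta[t, i] = probability of the best path ending in state i
    at time t. Same recursion as forward, with max instead of sum, plus
    backpointers to recover the state sequence."""
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)           # backpointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A      # scores[i, j] = delta_i * a_ij
        psi[t] = scores.argmax(axis=0)          # best predecessor of each j
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):               # trace the best path backwards
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()

# Toy parameters assumed earlier:
pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.1, 0.5]])
B = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])
print(viterbi(pi, A, B, obs=[1, 0, 1]))         # best state path and its probability
```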

Hidden Markov Model – model fitting


• In practice, the parameters of the HMM are unknown

• We are interested in $\lambda^* = \arg\max_\lambda P(O \mid \lambda)$

• There is no analytical maximum-likelihood solution, so we turn to the Baum-Welch algorithm (also known as the forward-backward algorithm)

• Basic idea – count the expected visits to each state and the expected number of transitions, and use them to derive probability estimators

Hidden Markov Model – model fitting


• Define the backward variable $\beta_t(i) = P(o_{t+1}, \ldots, o_T \mid q_t = i, \lambda)$

This is the conditional probability that $o_{t+1}, \ldots, o_T$ are observed, given that the system is at state $i$ at time $t$ under the model $\lambda$. This can be calculated inductively: $\beta_T(i) = 1$ and $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$

• Recall $\alpha_t(i) = P(o_1, \ldots, o_t, q_t = i \mid \lambda)$ – the probability of being at state $i$ at time $t$ and having observed the partial sequence $o_1, \ldots, o_t$ (a backward-pass sketch follows)
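The backward recursion mirrors the forward one; a minimal sketch:

```python
import numpy as np

def backward(A, B, obs):
    """Backward algorithm: beta[t, i] = P(o_{t+1}..o_T | q_t = i, lambda)."""
    N, T = A.shape[0], len(obs)
    beta = np.ones((T, N))                          # beta_T(i) = 1
    for t in range(T - 2, -1, -1):                  # induction, backwards in time
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

# Toy parameters assumed earlier:
A = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.1, 0.5]])
B = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])
print(backward(A, B, obs=[1, 0, 1, 1, 0]))
```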

Hidden Markov Model – model fitting


• With $\alpha_t(i)$ and $\beta_t(i)$ as defined above, the probability of being in state $i$ at time $t$, given the entire observation sequence and the model, is simply: $\gamma_t(i) = \frac{\alpha_t(i)\, \beta_t(i)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)}$

Hidden Markov Model – model fitting


• Define the probability of being in state $i$ at time $t$ and state $j$ at time $t+1$, given the model and the observation sequence: $\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$

• Graphically: $\xi_t(i, j)$ combines the forward probability into state $i$ at time $t$, the transition $i \to j$ emitting $o_{t+1}$, and the backward probability out of state $j$. Note also that $\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$

Hidden Markov Model – model fitting


• We are now ready to introduce the parameter estimators:

Transition probability estimator: $\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$

The expected number of transitions from state $i$ to state $j$, normalized by the expected number of visits to state $i$

Hidden Markov Model – model fitting


• The second parameter estimator:

Observation probability estimator: $\hat{b}_j(k) = \frac{\sum_{t:\ o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$

The expected number of times the system is in state $j$ while symbol $v_k$ is observed, normalized by the expected number of times the system visits state $j$

Hidden Markov Model – model fitting


• Notice that the parameters we wish to estimate actually appear on both sides of the equations: the estimators $\hat{a}_{ij}$ and $\hat{b}_j(k)$ are built from $\alpha$, $\beta$, $\gamma$ and $\xi$, which themselves depend on the current parameter values

Hidden Markov Model – model fitting


• Therefore, we use an iterative procedure: starting with an initial guess for the parameters, we gradually update them at each iteration and terminate once the change in the parameters falls below a chosen threshold (one re-estimation step is sketched below)
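One Baum-Welch re-estimation step, combining the $\alpha$, $\beta$, $\gamma$ and $\xi$ quantities above. This is a sketch under the toy discrete-emission setup, not a production implementation; in practice one adds scaling to avoid numerical underflow on long sequences.

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One Baum-Welch re-estimation step for a discrete-emission HMM."""
    N, T = len(pi), len(obs)
    obs = np.asarray(obs)
    # Forward and backward passes, as sketched earlier.
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    p_obs = alpha[-1].sum()                          # P(O | lambda)
    gamma = alpha * beta / p_obs                     # gamma_t(i)
    # xi_t(i,j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O | lambda)
    xi = np.array([alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1]) / p_obs
                   for t in range(T - 1)])
    # Estimators: expected transition counts / expected occupancy counts.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```

Repeatedly calling `pi, A, B = baum_welch_step(pi, A, B, obs)` until the parameters stop changing implements the iterative procedure; each iteration is guaranteed not to decrease $P(O \mid \lambda)$.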

Hidden Markov Model – model fitting


• For continuous observations, the discrete emission probabilities $b_j(k)$ are replaced by densities, for example Gaussians $b_j(o) = \mathcal{N}(o;\, \mu_j, \sigma_j^2)$

• We estimate the mean and variance for each state $j$: $\hat{\mu}_j = \frac{\sum_{t=1}^{T} \gamma_t(j)\, o_t}{\sum_{t=1}^{T} \gamma_t(j)}$, $\hat{\sigma}_j^2 = \frac{\sum_{t=1}^{T} \gamma_t(j)\, (o_t - \hat{\mu}_j)^2}{\sum_{t=1}^{T} \gamma_t(j)}$
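A sketch of these weighted-average updates for scalar Gaussian emissions; the `gamma` values below are invented for the example (in practice they come from the forward-backward pass).

```python
import numpy as np

def gaussian_reestimate(gamma, obs):
    """Per-state weighted mean and variance for continuous (Gaussian) emissions:
    mu_j = sum_t gamma_t(j) o_t / sum_t gamma_t(j), and likewise for sigma^2_j."""
    obs = np.asarray(obs, dtype=float)
    weights = gamma / gamma.sum(axis=0)                      # normalize over time
    mu = weights.T @ obs                                     # mu_j
    var = (weights * (obs[:, None] - mu) ** 2).sum(axis=0)   # sigma^2_j
    return mu, var

# Invented occupancy probabilities (T=3 steps, N=2 states) and observations:
gamma = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
print(gaussian_reestimate(gamma, obs=[1.0, 2.0, 3.0]))
```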

Conclusions and final remarks


• We learned how to:

I. Estimate HMM parameters from a sequence of observations
II. Determine the probability of observing a sequence given an HMM
III. Determine the most likely sequence of states, given an HMM and a sequence of observations

• Notice the states may represent words, syllables, phonemes, etc. This is up to the system architect to decide

• For example, words are more informative than syllables, but they result in more states and less accurate probability estimates (curse of dimensionality)

Questions?


Thank you!
