SPEECH RECOGNITION Kunal Shalia and Dima Smirnov
Transcript
  • Slide 1
  • SPEECH RECOGNITION Kunal Shalia and Dima Smirnov
  • Slide 2
  • What is Speech Recognition? Speech recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format. Speech Recognition vs. Voice Recognition
  • Slide 3
  • Speech Recognition Demonstration
  • Slide 4
  • Early Automatic SR Systems. Based on the theory of acoustic phonetics, which describes how phonetic elements are realized in speech. Input speech was compared to reference patterns: trajectories along the first and second formant frequencies for the numbers 1 through 9 and "oh". Used in the first speech recognizer, built by Bell Laboratories in 1952.
  • Slide 5
  • The Development of SR. 1950s: RCA Laboratories recognized 10 syllables spoken by a single speaker; MIT Lincoln Lab demonstrated speaker-independent 10-vowel recognition. 1960s: Kyoto University built a speech segmenter; University College London was first to use a statistical model of allowable phoneme sequences in the English language; RCA Laboratories used a non-uniform time scale instead of speech segmentation. 1970s: Carnegie Mellon introduced graph search based on a beam algorithm.
  • Slide 6
  • The Two Schools of SR. Two schools of thought on the applicability of ASR to commercial applications developed in the 1970s. IBM: speaker-dependent; converted sentences into letters and words; focused on transcription and on the probabilities associated with the structure of the language model (the n-gram model). AT&T: speaker-independent; emphasis on the acoustic model over the language model.
  • Slide 7
  • Markov Models A stochastic model where each state depends only on the previous state in time. The simplest Markov model is the Markov chain, which undergoes transitions from one state to another through a random process. Markov Property
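The Markov property described on this slide can be sketched in a few lines of Python. The weather states and transition probabilities below are invented for illustration; the point is that the next state is drawn from a distribution that depends only on the current state.

```python
import random

# Hypothetical two-state Markov chain: the next state depends
# only on the current one (the Markov property).
transitions = {
    "sunny": [("sunny", 0.8), ("rainy", 0.2)],
    "rainy": [("sunny", 0.4), ("rainy", 0.6)],
}

def step(state):
    """Pick the next state from the current state's distribution."""
    r = random.random()
    cumulative = 0.0
    for nxt, p in transitions[state]:
        cumulative += p
        if r < cumulative:
            return nxt
    return nxt  # guard against floating-point rounding

def walk(state, n):
    """Generate a chain of n transitions starting from `state`."""
    chain = [state]
    for _ in range(n):
        state = step(state)
        chain.append(state)
    return chain

print(walk("sunny", 5))
```

In a Markov chain like this one, the states themselves are the observations; the next slides add the "hidden" layer.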
  • Slide 8
  • Hidden Markov Models A Hidden Markov Model (HMM) is a Markov model using the Markov property with unobserved (hidden) states. In a Markov model the states are directly visible to the observer, while in an HMM the state is not directly visible but the output, which is dependent on the state, is visible.
  • Slide 9
  • Elements of an HMM There are a finite number of N states, and each state possesses some measurable, distinctive properties. At each clock time t, a new state is entered based upon a transition probability distribution that depends on the previous state (Markov property). After each transition, an observation output symbol is produced according to the probability distribution of the state.
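The elements listed above are conventionally written as λ = (A, B, π). A minimal sketch of that parameterization, with all probabilities invented for illustration:

```python
# The three HMM elements in standard notation, lambda = (A, B, pi).
N = 2                      # number of hidden states
M = 3                      # number of distinct observation symbols

A = [[0.7, 0.3],           # A[i][j]: P(next state j | current state i)
     [0.4, 0.6]]

B = [[0.5, 0.4, 0.1],      # B[i][k]: P(symbol k emitted | state i)
     [0.1, 0.3, 0.6]]

pi = [0.6, 0.4]            # pi[i]: P(initial state is i)

# Sanity check: every probability distribution must sum to 1.
for row in A + B + [pi]:
    assert abs(sum(row) - 1.0) < 1e-9
```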
  • Slide 10
  • Urn and Ball Example We assume that there are N glass urns in a room. In each urn there is a large quantity of colored balls of M distinct colors. A genie is in the room and randomly chooses the initial urn. Then a ball is chosen at random, its color recorded, and the ball is replaced in the same urn. A new urn is then selected according to a random procedure associated with the current urn.
  • Slide 11
  • Urn and Ball Example Each state corresponds to a specific urn; a color probability distribution is defined for each state (hidden)
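The urn-and-ball process maps directly onto HMM generation: urns are hidden states, ball colors are observations. A simulation sketch, with urn counts and probabilities invented for illustration:

```python
import random

colors = ["red", "green", "blue"]

initial = [0.5, 0.5]                 # the genie's initial urn choice
transition = [[0.9, 0.1],            # P(next urn | current urn)
              [0.2, 0.8]]
emission = [[0.6, 0.3, 0.1],         # P(ball color | urn)
            [0.1, 0.2, 0.7]]

def draw(weights, items):
    """Sample one item according to the given weights."""
    return random.choices(items, weights=weights, k=1)[0]

def simulate(steps):
    """Return the observed colors; the urn sequence stays hidden."""
    urn = draw(initial, range(len(initial)))
    observed = []
    for _ in range(steps):
        observed.append(draw(emission[urn], colors))
        urn = draw(transition[urn], range(len(initial)))
    return observed

print(simulate(10))
```

An outside observer sees only the color sequence that `simulate` returns, never which urn produced each ball, which is exactly the hidden/observed split of an HMM.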
  • Slide 12
  • Coin Toss Example You are in a room with a barrier and cannot see what is happening on the other side, where another person is performing a coin-tossing (or multiple-coin) experiment. You won't know what is happening, but will receive the results of each coin flip. Thus a sequence of HIDDEN coin tosses is performed, and you can only observe the results.
  • Slide 13
  • One coin toss
  • Slide 14
  • Two coins being tossed
  • Slide 15
  • Three coins being tossed
  • Slide 16
  • HMM Notation
  • Slide 17
  • The Three Problems for HMM 1. Given the observation sequence O = (o1 ... oT) and a model λ = (A, B, π), how do we efficiently compute P(O | λ), the probability of the observation sequence given the model? 2. Given the observation sequence O = (o1 ... oT) and a model λ = (A, B, π), how do we choose a corresponding state sequence q = (q1 ... qT) that is optimal in some sense (i.e., best explains the observations)? 3. How do we adjust the model parameters λ = (A, B, π) to maximize P(O | λ)?
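Problem 1 is classically solved by the forward algorithm, which sums over all hidden state paths in O(N²T) time instead of enumerating them. A minimal sketch, with the model parameters invented for illustration (observations are symbol indices into B):

```python
def forward(obs, A, B, pi):
    """Return P(O | lambda) for a sequence of observation symbol indices."""
    N = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)

A = [[0.7, 0.3], [0.4, 0.6]]   # transition probabilities (invented)
B = [[0.9, 0.1], [0.2, 0.8]]   # emission probabilities (invented)
pi = [0.5, 0.5]                # initial state distribution

print(forward([0, 1, 0], A, B, pi))
```

Problem 2 is solved by the Viterbi algorithm (the same recursion with `max` in place of `sum`, plus backtracking), and Problem 3 by the Baum-Welch algorithm that appears later in the deck.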
  • Slide 18
  • 3 types of HMM Ergodic Model Left to Right Model Parallel Left to Right Model
  • Slide 19
  • Ergodic Model In an ergodic model it is possible to reach any state from any other state.
  • Slide 20
  • Left to Right (Bakis) Model As time increases, the state index increases or stays the same
  • Slide 21
  • Parallel Left to Right Model A left to right model in which there are several parallel paths through the states.
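The three topologies differ only in which entries of the transition matrix are allowed to be nonzero. A sketch with invented values (a zero entry means that transition is forbidden):

```python
# Ergodic: every state reachable from every other state in one step.
ergodic = [[0.4, 0.3, 0.3],
           [0.2, 0.5, 0.3],
           [0.3, 0.3, 0.4]]

# Left-to-right (Bakis): the state index increases or stays the same,
# so the matrix is upper-triangular.
left_to_right = [[0.6, 0.4, 0.0],
                 [0.0, 0.7, 0.3],
                 [0.0, 0.0, 1.0]]

def is_left_to_right(A):
    """True when no transition goes to a lower state index."""
    return all(A[i][j] == 0.0 for i in range(len(A)) for j in range(i))

print(is_left_to_right(left_to_right))   # True
print(is_left_to_right(ergodic))         # False
```

Left-to-right topologies suit speech because a word's sounds occur in a fixed order; the parallel variant simply allows several such state paths side by side.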
  • Slide 22
  • HMM in SR. 1980s: shift to a rigorous statistical framework. HMMs can model the variability in speech; Markov chains represent linguistic structure together with a set of probability distributions; the Baum-Welch algorithm finds the unknown parameters; the Hidden Markov Model was merged with finite-state networks.
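For reference, the Baum-Welch re-estimation step mentioned above can be written in the standard notation (a sketch: γ_t(i) is the probability of being in state i at time t, and ξ_t(i, j) the probability of transitioning from i to j at time t, both computed from the forward and backward variables):

```latex
\bar{\pi}_i = \gamma_1(i)
\qquad
\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}
\qquad
\bar{b}_j(k) = \frac{\sum_{t=1,\ o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
```

Each update is a ratio of expected counts: expected transitions from i to j over expected visits to i, and expected emissions of symbol v_k from j over expected visits to j. Iterating the step never decreases P(O | λ).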
  • Slide 23
  • Speech Recognition Today. Developments in algorithms and data storage models have allowed more efficient methods of storing larger vocabulary bases. Modern applications: military, health care, telephony, computing.