
Sequence Learning with CTC technique

Apr 15, 2017


ChunHao Wang
Transcript
Page 1: Sequence Learning with CTC technique

ChunHao Wang (王俊豪), substitute military serviceman, [email protected]

Sequence Learning from character based acoustic model to End-to-End ASR system

Page 2: Sequence Learning with CTC technique

figure from A. Graves, Supervised sequence labelling, ch2 p9

A stream of input data = a sequence of acoustic features = a list of MFCC vectors (the features we use here).

[Diagram: Extract Feature -> Learning Algorithm -> output "Thee ound oof", with an Error signal fed back]

Sequence Learning
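
As a concrete example, a minimal sketch of the feature-extraction step using the librosa package (the deck does not name a toolkit; the file path, sample rate, and 13 coefficients are assumptions):

```python
# Hypothetical feature extraction: waveform -> sequence of MFCC vectors.
import librosa

signal, sample_rate = librosa.load("speech.wav", sr=16000)  # placeholder path
mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=13)
print(mfcc.shape)  # (13, number_of_frames): one 13-dim vector per frame
```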

Page 3: Sequence Learning with CTC technique

Outline
1. Feedforward Neural Network
2. Recurrent Neural Network
3. Bidirectional Recurrent Neural Network
4. Long Short-Term Memory
5. Connectionist Temporal Classification
6. Build an End-to-End System
7. Experiment on a Low-resource Language

Page 4: Sequence Learning with CTC technique

Feedforward Neural Network

[Diagram: weighted connections followed by an activation function]
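
A minimal sketch of one feedforward layer, i.e. a weighted sum followed by an activation function (tanh is an assumption; the slide does not fix a choice):

```python
import numpy as np

def feedforward_layer(x, W, b):
    """x: input vector, W: weight matrix, b: bias vector."""
    z = W @ x + b        # weighted sum of the inputs
    return np.tanh(z)    # activation function
```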

Page 5: Sequence Learning with CTC technique
Page 6: Sequence Learning with CTC technique

Recurrent Neural Network

[Diagram: weighted connections, including recurrent ones from the previous timestep, followed by an activation function]
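
A minimal sketch of a vanilla RNN step: the hidden state is a weighted sum of the current input and the previous hidden state, squashed by an activation function (shapes and names are illustrative assumptions):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # new hidden state from the current input and the previous state
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

def rnn_forward(xs, h0, W_xh, W_hh, b_h):
    h, hs = h0, []
    for x_t in xs:                      # walk the sequence in time order
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        hs.append(h)
    return hs                           # one hidden state per timestep
```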

Page 7: Sequence Learning with CTC technique
Page 8: Sequence Learning with CTC technique

Bidirectional RNN

[Diagram: forward and backward recurrent layers, each with weighted connections and an activation function, combined at every timestep]
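
Building on the rnn_forward sketch above, a minimal bidirectional pass: one RNN runs forward in time, one backward, and their states are concatenated (concatenation is an assumption) so every output sees both past and future context:

```python
import numpy as np

def birnn_forward(xs, h0, fwd_params, bwd_params):
    h_fwd = rnn_forward(xs, h0, *fwd_params)               # past context
    h_bwd = rnn_forward(xs[::-1], h0, *bwd_params)[::-1]   # future context, re-aligned
    return [np.concatenate([f, b]) for f, b in zip(h_fwd, h_bwd)]
```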

Page 9: Sequence Learning with CTC technique

figure from A. Graves, Supervised sequence labelling, ch3 p24

Page 10: Sequence Learning with CTC technique

figure from A. Graves, Supervised sequence labelling, ch3 p23

Page 11: Sequence Learning with CTC technique

RNNs -> LSTM

[Diagram: replacing an RNN unit's activation function with an LSTM memory cell gated by an Input Gate, Output Gate, and Forget Gate; W = weighted connection, G = gate activation, M = multiplier]

Page 12: Sequence Learning with CTC technique

LSTM Forward Pass, Previous Layer Input

[Diagram: the previous layer's output feeds the cell input and all three gates through weighted connections]

Page 13: Sequence Learning with CTC technique

LSTM Forward Pass, Input Gate

[Diagram: the input gate activation is computed from the previous layer's input and the recurrent connections; it scales how much new input enters the cell]

Page 14: Sequence Learning with CTC technique

LSTM Forward Pass, Forget Gate

[Diagram: the forget gate activation scales how much of the previous cell state is kept]

Page 15: Sequence Learning with CTC technique

LSTM Forward Pass, Cell

[Diagram: the new cell state is the gated new input plus the gated previous cell state]

Page 16: Sequence Learning with CTC technique

LSTM Forward Pass, Output Gate

[Diagram: the output gate activation is computed from the previous layer's input, the recurrent connections, and the updated cell state]

Page 17: Sequence Learning with CTC technique

LSTM Forward Pass, Output

[Diagram: the cell output is the squashed cell state multiplied by the output gate activation]
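
For reference, the forward pass walked through on these slides corresponds to the standard peephole LSTM equations in Graves's notation (written out here, not copied from the slides):

$$
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i) &&\text{(input gate)}\\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f) &&\text{(forget gate)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) &&\text{(cell state)}\\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o) &&\text{(output gate)}\\
h_t &= o_t \odot \tanh(c_t) &&\text{(cell output)}
\end{aligned}
$$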

Page 18: Sequence Learning with CTC technique

LSTM Backward Pass, Output Gate

[Diagram: the error is backpropagated through the output gate and cell to the gates and weighted connections]

Page 19: Sequence Learning with CTC technique

RNN

LSTM

figure from A. Graves, Supervised sequence labelling, p33,35

Page 20: Sequence Learning with CTC technique

Connectionist Temporal Classification

Additional CTC output layer:
(1) predict a label at any timestep
(2) probability of a sentence

Page 21: Sequence Learning with CTC technique

Predict a label at any timestep: framewise acoustic features

Label set: A-Z, {space}, {blank}, and others such as ' " .

Page 22: Sequence Learning with CTC technique

Probability of a Sentence

path:                            Thh..e S..outtd ooof
Step 1, remove repeated labels:  Th.e S.outd of
Step 2, remove all blanks:       The Soutd of

Many paths collapse to the same labelling; the collapsed output (Predict) is compared against the Target.

Now we can do maximum-likelihood training.
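
A minimal sketch of this collapsing rule, using '.' as the blank symbol as on the slide:

```python
from itertools import groupby

def collapse(path, blank='.'):
    merged = [label for label, _ in groupby(path)]    # step 1: remove repeated labels
    return ''.join(l for l in merged if l != blank)   # step 2: remove all blanks

print(collapse("Thh..e S..outtd ooof"))  # -> "The Soutd of"
```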

Page 23: Sequence Learning with CTC technique

Output Decoding

Best path decoding: choose the labelling of the most probable path, on the assumption that the most probable path corresponds to the most probable labelling.

Something can go wrong here; prefix search decoding would be better.
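
For concreteness, a minimal sketch of best path decoding, reusing the collapse() sketch above (the per-frame probability matrix and alphabet are illustrative assumptions):

```python
import numpy as np

def best_path_decode(probs, alphabet, blank='.'):
    """probs: (T, K) per-frame label probabilities; alphabet: the K output symbols."""
    path = ''.join(alphabet[k] for k in probs.argmax(axis=1))  # most probable label per frame
    return collapse(path, blank)                               # then collapse the path
```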

Page 24: Sequence Learning with CTC technique

How do we connect the target labelling to the RNN outputs?

Page 25: Sequence Learning with CTC technique

Sum over path probabilities, forward

Page 26: Sequence Learning with CTC technique

Case l'_s = '.' (blank):
  s   : .h.e.l.
  s-1 : .h.e.l
  s-2 : .h.e.    (excluded: a blank-to-blank transition)

Case l'_s = 'l' (second 'l'):
  s   : .h.e.l.l
  s-1 : .h.e.l.
  s-2 : .h.e.l   (excluded: the same non-blank label repeated)

Transitions are only allowed (1) between a blank and a non-blank label, or (2) between distinct non-blank labels.
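
In equations (the standard CTC forward recursion from Graves's book, not spelled out on the slide), with l' the blank-padded label sequence and y^t_k the network outputs:

$$
\alpha_t(s) = y^t_{l'_s} \sum_{i=f(s)}^{s} \alpha_{t-1}(i),
\qquad
f(s) =
\begin{cases}
s-1 & \text{if } l'_s = \text{blank or } l'_{s-2} = l'_s\\
s-2 & \text{otherwise}
\end{cases}
$$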

Page 27: Sequence Learning with CTC technique

Sum over path probabilities, backward

Page 28: Sequence Learning with CTC technique

Case l'_s = '.' (blank):
  s   : .l.l.o.
  s+1 : l.l.o.
  s+2 : .l.o.    (excluded: a blank-to-blank transition)

Case l'_s = 'l' (first 'l'):
  s   : l.l.o.
  s+1 : .l.o.
  s+2 : l.o.     (excluded: the same non-blank label repeated)

Transitions are only allowed (1) between a blank and a non-blank label, or (2) between distinct non-blank labels.
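
The corresponding backward recursion (again standard, following Graves's book rather than the slide) mirrors the forward one:

$$
\beta_t(s) = \sum_{i=s}^{g(s)} \beta_{t+1}(i)\, y^{t+1}_{l'_i},
\qquad
g(s) =
\begin{cases}
s+1 & \text{if } l'_s = \text{blank or } l'_{s+2} = l'_s\\
s+2 & \text{otherwise}
\end{cases}
$$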

Page 29: Sequence Learning with CTC technique

Sum over path probabilities

Target: dog -> extended label sequence ".d.o.g."

[Trellis diagram: states s = 1...7 correspond to the prefixes ".", ".d", ".d.", ".d.o", ".d.o.", ".d.o.g", ".d.o.g.", unrolled over timesteps t = 0, 1, 2, ..., T-2, T-1, T]

Page 30: Sequence Learning with CTC technique

Sum over path probabilities

Forward-backward: the product α_t(s)·β_t(s) gives the probability of all the paths corresponding to l that go through symbol s at time t.

[Trellis diagram for ".d.o.g." at timesteps t = 0, 1, ..., t', ..., T-1, T]
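
Summing that product over the symbols gives, for any timestep t, the probability of the whole labelling (a standard identity in Graves's derivation, implicit in the diagram):

$$
p(\mathbf{l} \mid \mathbf{x}) = \sum_{s=1}^{|\mathbf{l}'|} \alpha_t(s)\,\beta_t(s) \qquad \text{for any } t
$$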

Page 31: Sequence Learning with CTC technique

Finally, differentiating with respect to the unnormalised outputs gives the error signal.
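
Written out (the standard CTC gradient in the notation of Graves's book; the slide does not show the formula), with a^t_k the unnormalised outputs, y^t_k the softmax outputs, O = -ln p(l|x), and lab(l, k) = {s : l'_s = k}:

$$
\frac{\partial O}{\partial a^t_k} = y^t_k - \frac{1}{p(\mathbf{l} \mid \mathbf{x})} \sum_{s \in lab(\mathbf{l},k)} \alpha_t(s)\,\beta_t(s)
$$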

Page 32: Sequence Learning with CTC technique

A. Graves. Sequence Transduction with Recurrent Neural Networks

Transcription network + Prediction network

[Diagram: the transcription network reads the feature at time t, the prediction network reads the previous label; together they give the label prediction at time t, i.e. the next-label probability]

Page 33: Sequence Learning with CTC technique

A. Graves. Towards End-to-end Speech Recognition with Recurrent Neural Networks ( Google Deepmind )

Additional CTC Output Layer

Additional WER Output Layer

Minimise Objective Function -> Minimise Loss Function

Spectrogram feature

Page 34: Sequence Learning with CTC technique

Andrew Y. Ng. Deep Speech: Scaling up end-to-end speech recognition ( Baidu Research)

Additional CTC Output Layer

Additional WER Output Layer

Spectrogram feature

Language model + transcription

RNN output:    arther n tickets for the game
Decode target: are there any tickets for the game

Q(L) = log P(L | x) + a · log P_LM(L) + b · word_count(L)
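
A minimal sketch of this rescoring objective (a hypothetical helper, not Baidu's implementation; a and b are the tuning weights from the formula):

```python
import math

def q_score(log_p_ctc, p_lm, n_words, a=1.0, b=1.0):
    """Score a candidate transcription L: CTC log-probability plus a
    weighted language-model term and a word-count bonus."""
    return log_p_ctc + a * math.log(p_lm) + b * n_words
```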

Page 35: Sequence Learning with CTC technique

Low-resource language experiment

Target: ? ka didto sigi ra og tindog
Output: h bai to giy ngtndog

test 0.44, test 0.71

Page 36: Sequence Learning with CTC technique

Reference

1. H. Bourlard and N. Morgan (1994). Connectionist Speech Recognition: A Hybrid Approach.
2. A. Graves. Supervised Sequence Labelling with Recurrent Neural Networks.
3. A. Graves. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks.
4. A. Graves. Sequence Transduction with Recurrent Neural Networks.
5. A. Graves. Towards End-to-End Speech Recognition with Recurrent Neural Networks (Google DeepMind).
6. Andrew Y. Ng. Deep Speech: Scaling up end-to-end speech recognition (Baidu Research).
7. Stanford CTC: https://github.com/amaas/stanford-ctc
8. R. Pascanu. On the difficulty of training recurrent neural networks.
9. L. Besacier. Automatic Speech Recognition for Under-Resourced Languages: A Survey.
10. A. Gibiansky. http://andrew.gibiansky.com/blog/machine-learning/speech-recognition-neural-networks/
11. LSTM mathematical formalism (Mandarin): http://blog.csdn.net/u010754290/article/details/47167979
12. Assumption of the Markov model: http://jedlik.phy.bme.hu/~gerjanos/HMM/node5.html
13. Story about the ANN-HMM hybrid (Mandarin): http://www.taodocs.com/p-5260781.html
14. Generative model vs. discriminative model: http://stackoverflow.com/questions/879432/what-is-the-difference-between-a-generative-and-discriminative-algorithm

Page 37: Sequence Learning with CTC technique

Appendix B. Future work

Memory networks: A. Graves, Neural Turing Machines

figure from http://www.slideshare.net/yuzurukato/neural-turing-machines-43179669

Page 38: Sequence Learning with CTC technique

Appendix A. Generative vs. discriminative

A generative model learns the joint probability p(x,y). A discriminative model learns the conditional probability p(y|x).

data input: (1, 0), (1, 0), (2, 0), (2, 1)

Generative: joint probability p(x, y)

      y=0   y=1
x=1   1/2   0
x=2   1/4   1/4

Discriminative: conditional probability p(y | x)

      y=0   y=1
x=1   1     0
x=2   1/2   1/2

According to Andrew Y. Ng, On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes.

The overall gist is that, given enough training data, discriminative models generally achieve lower classification error than generative models.
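
As a quick sanity check of the two tables (a minimal sketch; the counting is only implied on the slide):

```python
from collections import Counter

data = [(1, 0), (1, 0), (2, 0), (2, 1)]

counts = Counter(data)
p_joint = {xy: c / len(data) for xy, c in counts.items()}           # generative: p(x, y)

x_counts = Counter(x for x, _ in data)
p_cond = {(x, y): c / x_counts[x] for (x, y), c in counts.items()}  # discriminative: p(y | x)

print(p_joint)  # {(1, 0): 0.5, (2, 0): 0.25, (2, 1): 0.25} (zero entries omitted)
print(p_cond)   # {(1, 0): 1.0, (2, 0): 0.5, (2, 1): 0.5}
```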