Easy Does It: Robust Spectro-Temporal Many-Stream ASR without Fine Tuning Streams. Ravuri, Morgan, UC Berkeley. Presented by JJ

Gabor presentation

Apr 06, 2018


Jom Kantapon
Transcript
Page 1: Gabor presentation

8/3/2019 Gabor presentation

http://slidepdf.com/reader/full/gabor-presentation 1/28

Easy Does It: Robust Spectro-Temporal Many-Stream ASR without Fine Tuning Streams

Ravuri, Morgan, UC Berkeley

Presented by JJ

Page 2: Gabor presentation


Motivation

• Physiological experiments in different mammal species: a large percentage of neurons in the primary auditory cortex (A1) respond differently to upward- versus downward-moving ripples in the spectrogram of the input (Depireux et al., 2001).

• Spectro-temporal receptive fields (STRFs): individual neurons are sensitive to specific spectro-temporal modulation frequencies in the incoming sound signal.

Page 3: Gabor presentation


Introduction

• Cortically inspired time-frequency (TF) features capture the spectral and temporal modulations used in speech recognition and discrimination.

• Basically, spectro-temporal features are derived by filtering spectrograms with particular filters.

• In this case, the Gabor filter is applied to the auditory spectrogram.
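
The filtering step above can be sketched as a 2D convolution of a spectrogram with a filter kernel. This is a minimal NumPy sketch, not the authors' implementation: the name `filter_spectrogram`, the (frequency channels x time frames) layout, the zero padding at the edges, and the placeholder averaging kernel are all assumptions made for illustration.

```python
import numpy as np

def filter_spectrogram(spec, kernel):
    """Convolve a (channels x frames) spectrogram with a 2D kernel.

    Zero padding keeps the output the same shape as the input
    (an assumption; the slides do not say how edges are handled).
    The result is complex so that complex kernels are supported.
    """
    spec = np.asarray(spec)
    kernel = np.asarray(kernel)
    kh, kw = kernel.shape
    # Size of the full linear convolution along each axis.
    full_h = spec.shape[0] + kh - 1
    full_w = spec.shape[1] + kw - 1
    # Convolution computed as a product in the frequency domain.
    out = np.fft.ifft2(np.fft.fft2(spec, (full_h, full_w)) *
                       np.fft.fft2(kernel, (full_h, full_w)))
    # Crop the centre so the output aligns with the input ("same" mode).
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return out[top:top + spec.shape[0], left:left + spec.shape[1]]

# Stand-in data: 23 channels x 100 frames is an arbitrary size.
spec = np.random.default_rng(0).random((23, 100))
kernel = np.ones((3, 3)) / 9.0   # placeholder kernel; a Gabor filter would go here
filtered = filter_spectrogram(spec, kernel)
```

With a complex filter, the real part or the magnitude of the output would typically be taken before further processing; which one is used here is not stated in the slides.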

Page 4: Gabor presentation


Example

Page 5: Gabor presentation


Example Gabor Filters

Page 6: Gabor presentation


Example Gabor Filters

Gaussian envelope

complex sinusoid s(n, k)

Page 7: Gabor presentation


1D Gabor

Gaussian envelope complex sinusoid s(n, k)

Page 8: Gabor presentation


2D Gabor

Gaussian envelope complex sinusoid s(n, k)
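
The slide's equation did not survive extraction, so the following is only a generic textbook form of a 2D Gabor filter: a Gaussian envelope multiplied by a complex sinusoid s(n, k), with n indexing time frames and k indexing frequency channels. The function name `gabor_2d`, every parameter name, and the example values are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def gabor_2d(omega_n, omega_k, sigma_n, sigma_k, half_n, half_k):
    """Return a complex 2D Gabor filter g(n, k) = envelope(n, k) * s(n, k).

    omega_n, omega_k: temporal / spectral modulation frequencies (radians per step).
    sigma_n, sigma_k: widths of the Gaussian envelope.
    half_n, half_k:   the filter covers (2*half_k + 1) channels x (2*half_n + 1) frames.
    All names are illustrative, not the paper's notation.
    """
    n = np.arange(-half_n, half_n + 1)
    k = np.arange(-half_k, half_k + 1)
    kk, nn = np.meshgrid(k, n, indexing="ij")   # rows = channels, columns = frames
    envelope = np.exp(-0.5 * ((nn / sigma_n) ** 2 + (kk / sigma_k) ** 2))
    sinusoid = np.exp(1j * (omega_n * nn + omega_k * kk))   # s(n, k)
    return envelope * sinusoid

g = gabor_2d(omega_n=0.25, omega_k=0.5, sigma_n=4.0, sigma_k=2.0, half_n=12, half_k=6)
print(g.shape)  # (13, 25)
```

Flipping the sign of one of the two modulation frequencies reverses the direction of the ripple, which is how a filter of this form can distinguish upward- from downward-moving patterns in the spectrogram.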

Page 9: Gabor presentation


Example Gabor Filters

Gaussian envelope

complex sinusoid s(n, k)

Page 10: Gabor presentation


Their Gabor Filters

Page 11: Gabor presentation


Their Gabor Filters

parameters

Dummy indices

Page 12: Gabor presentation


Tons of Combinations!

Page 13: Gabor presentation


System

[Figure: system block diagram. Blocks: Stream, ..., Stream; Merge MLP outputs; PCA; MFCC; Output]

Page 14: Gabor presentation


System

[Figure: system block diagram. Blocks: Stream, ..., Stream; Merge MLP outputs; PCA; MFCC; Output]

Page 15: Gabor presentation


System

[Figure: system block diagram. Blocks: Stream, ..., Stream; Merge MLP outputs; PCA; MFCC; Output]

• MLP (Multilayer Perceptron)

• The structure of the MLP depends on the type of feature and corpus.

Number of            Spectral                          Cepstral
input units          567                               351
frames of context    9                                 9
hidden units         160 (Aurora2), 500 (Numbers95)    160 (Aurora2), 500 (Numbers95)
output units         56                                56

[Diagram dimension labels: 56D, 32D, 56D, 45D]
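
The input sizes in the table are consistent with stacking 9 frames of context: 567 = 9 x 63 and 351 = 9 x 39. The per-frame sizes 63 and 39 are inferred from that division and are not stated on the slide. A minimal sketch of such context stacking follows, assuming a hypothetical helper `stack_context` and edge padding by frame repetition (the slides do not specify edge handling).

```python
import numpy as np

def stack_context(frames, context=9):
    """Stack `context` consecutive frames around each frame into one vector.

    frames: (T, D) array of per-frame features; `context` is assumed odd.
    Returns a (T, D * context) array. Edges are padded by repeating the
    first/last frame (an assumption for this sketch).
    """
    frames = np.asarray(frames)
    half = context // 2
    padded = np.pad(frames, ((half, half), (0, 0)), mode="edge")
    T = frames.shape[0]
    # Column block j holds the frame at offset (j - half) from the current one.
    return np.hstack([padded[j:j + T] for j in range(context)])

cepstral = np.zeros((100, 39))   # 39 features per frame, inferred from 351 / 9
spectral = np.zeros((100, 63))   # 63 features per frame, inferred from 567 / 9
print(stack_context(cepstral).shape)  # (100, 351)
print(stack_context(spectral).shape)  # (100, 567)
```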

Page 16: Gabor presentation


System

[Figure: system block diagram. Blocks: Stream, ..., Stream; Merge MLP outputs; PCA; MFCC; Output]

• The outputs of each MLP stream provide an estimate of the posterior probability distribution over phones.

• These phone probability estimates are then combined across streams by inverse entropy weighting.

[Diagram dimension labels: 56D, 32D, 56D, 71D]
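
The inverse entropy combination can be sketched as follows. This shows only the basic form, in which each stream's weight at a frame is proportional to the inverse of the entropy of its posterior; variants exist, and the slides do not say which one is used. The function name `inverse_entropy_merge`, the array layout, and the `eps` guard are assumptions.

```python
import numpy as np

def inverse_entropy_merge(posteriors, eps=1e-12):
    """Merge per-stream phone posteriors with inverse-entropy weights.

    posteriors: (S, T, P) array for S streams, T frames, P phone classes;
                each posteriors[s, t] sums to 1.
    Returns a (T, P) array of merged posteriors.
    """
    p = np.asarray(posteriors, dtype=float)
    # Entropy of each stream's posterior at each frame: shape (S, T).
    entropy = -np.sum(p * np.log(p + eps), axis=-1)
    entropy = np.maximum(entropy, 0.0)   # guard against tiny negative round-off
    # Confident (low-entropy) streams receive larger weights.
    inv = 1.0 / (entropy + eps)
    weights = inv / inv.sum(axis=0, keepdims=True)
    return np.sum(weights[:, :, None] * p, axis=0)

# One confident stream and one uninformative stream, a single frame, 3 classes.
post = np.array([[[0.98, 0.01, 0.01]],
                 [[1 / 3, 1 / 3, 1 / 3]]])
merged = inverse_entropy_merge(post)
```

Because the weights sum to 1 at every frame, the merged output is still a probability distribution, and it stays close to the confident stream.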

Page 17: Gabor presentation


System

[Figure: system block diagram. Blocks: Stream, ..., Stream; Merge MLP outputs; PCA; MFCC; Output]

• Then apply the KL (Karhunen-Loève) transform to the log probabilities of the merged MLPs

Principal Components Analysis

[Diagram dimension labels: 56D, 32D, 56D, 71D]

Page 18: Gabor presentation


System

[Figure: system block diagram. Blocks: Stream, ..., Stream; Merge MLP outputs; PCA; MFCC; Output]

• Then apply the KL transform to the log probabilities of the merged MLPs

• reduced to 32D

• orthogonalized

• the features are mean and variance normalized by utterance

• finally appended to the MFCC feature

[Diagram dimension labels: 56D, 32D, 56D, 71D]
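
The post-processing steps listed above (log, KL transform, reduction to 32 dimensions, per-utterance normalization, appending to MFCC) can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the name `tandem_features` is hypothetical, and the transform is estimated from the single utterance passed in only to keep the example self-contained, whereas in practice it would be estimated on training data.

```python
import numpy as np

def tandem_features(merged_post, mfcc, n_components=32, eps=1e-10):
    """Log -> KL transform (PCA) -> per-utterance normalization -> append to MFCC.

    merged_post: (T, 56) merged phone posteriors for one utterance.
    mfcc:        (T, 39) MFCC features for the same frames.
    Returns a (T, 39 + n_components) array, i.e. 71 dimensions by default.
    """
    logp = np.log(np.asarray(merged_post) + eps)
    centered = logp - logp.mean(axis=0)
    # Eigenvectors of the covariance matrix define the KL transform.
    # Assumption: estimated from this utterance alone (see the note above).
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]   # largest variance first
    projected = centered @ eigvecs[:, order]           # decorrelated, 32-D
    # Mean and variance normalization over the utterance.
    projected = (projected - projected.mean(axis=0)) / (projected.std(axis=0) + eps)
    return np.hstack([mfcc, projected])

rng = np.random.default_rng(0)
posteriors = rng.dirichlet(np.ones(56), size=200)   # stand-in posteriors
mfcc = rng.standard_normal((200, 39))               # stand-in MFCCs
features = tandem_features(posteriors, mfcc)
print(features.shape)  # (200, 71)
```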

Page 19: Gabor presentation


System

[Figure: system block diagram. Blocks: Stream, ..., Stream; Merge MLP outputs; PCA; MFCC; Output]

• Features → HMM

[Diagram dimension labels: 56D, 32D, 56D, 71D, 39D, 32D]

Page 20: Gabor presentation


Experiments

Database

• Aurora 2 (0–20 dB)

• Numbers95

• consists of various numeric portions extracted from telephone dialogues.

• vocabulary size of 32 words

• training set contains 3590 utterances of clean data, totaling roughly 3 hours

• the 2 test sets contain 1227 utterances.

• The first contains only clean data

• The second contains the same utterances with noise added at five SNRs (20 dB, 15 dB, 10 dB, 5 dB, and 0 dB).

• Additive noise
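
Adding noise at a target SNR, as in the noisy test set above, can be sketched as follows. The sketch uses plain average-power SNR over the whole signal; how the corpus noise was actually scaled is not described in the slides, and the name `add_noise_at_snr` and the random stand-in signals are assumptions.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` and add it to `clean` so the mixture has the given SNR in dB.

    `noise` must be at least as long as `clean`; extra samples are dropped.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(p_clean / (gain**2 * p_noise)), solved for gain.
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)   # stand-in for a clean utterance
noise = rng.standard_normal(20000)   # stand-in for a noise recording
noisy_sets = {snr: add_noise_at_snr(clean, noise, snr) for snr in (20, 15, 10, 5, 0)}
```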

Baseline

• 39 MFCC

• 4-stream system

• 28-stream system

Uni-modulation system

• 150 stream

• spectral only and spectral/cepstral

Metric: Word Error Rate (WER)
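
Word error rate is the word-level edit distance between the reference and the recognized word sequence, divided by the number of reference words. A minimal sketch, assuming whitespace-separated words and a non-empty reference (the function name `word_error_rate` is illustrative):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("two" -> "three") and one deletion ("four"): 2 errors / 4 words.
print(word_error_rate("one two three four", "one three three"))  # 0.5
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.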

Page 21: Gabor presentation


Results

Aurora 2

Numbers 95

Page 22: Gabor presentation


Results

Aurora 2

Numbers 95

Page 23: Gabor presentation


Results

Aurora 2

Numbers 95

Page 24: Gabor presentation


Results

Aurora 2

Numbers 95

Discussion 1

Page 25: Gabor presentation


Results

Aurora 2

Numbers 95

Discussion 2

Page 26: Gabor presentation


Results

Aurora 2

Numbers 95

Discussion 3

Page 27: Gabor presentation


Results

Aurora 2

Numbers 95

Page 28: Gabor presentation


[Figure: system block diagram. Blocks: Stream, ..., Stream; Merge MLP outputs; PCA; MFCC; Output]

Future Work

• Not just additive noise

• Another TF feature might not work

• Log-mel filterbank? Or power like PNCC?

• How to combine MLPs? Inverse entropy?

[Diagram dimension labels: 56D, 32D, 56D, 71D, 39D, 32D]