Speech recognition in MUMIS Judith Kessens, Mirjam Wester & Helmer Strik.


Manual transcriptions

• Transcriptions made by SPEX:

– orthographic transcriptions

– transcriptions on chunk level (2-3 sec.)

• Formats:

– *.Textgrid (Praat)

– XML derivatives:

• *.pri – no time information

• *.skp – time information

Manual transcriptions

Total amount of transcribed matches on ftp-site (including the demo matches):

• Dutch: 6 matches

• German: 21 matches

• English: 3 matches

Extensions:

Dutch (_N), German (_G), English (_E)

Automatic speech recognition

1. Acoustic preprocessing

• Acoustic signal features

2. Speech recognition

• Acoustic models

• Language models

• Lexicon

Automatic transcriptions

• Problem of recorded data:

Commentaries and stadium noise are mixed

Very high noise levels

Recognition of such extremely noisy data is very difficult

Examples of data

Yug-Ned match

• Dutch: “op _t ogenblik wordt in dit stadion de opstelling voorgelezen”

• English: “and they wanna make the change before the corner”

• German: “und die beiden Tore die die Hollaender bekommen hat haben”

Examples of data

Eng-Dld match

• Dutch: “geeft nu een vrije trap in _t voordeel van Ince”

• English: “and phil neville had to really make about three yards to stop <dreisler*u> pulling it down and playing it”

• German: “wurde von allen englischen Zeitungen aus der Mannschaft”

Evaluation of automatic transcriptions

$$\mathrm{WER}(\%) = \frac{\text{insertions} + \text{deletions} + \text{substitutions}}{\text{number of words}} \times 100$$

WER can be larger than 100%!
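The WER definition can be illustrated with a minimal Python sketch. It assumes that "number of words" means the number of words in the reference (manual) transcription and that errors are counted with a standard minimum edit distance over words. The function name `word_error_rate` is hypothetical and is not the scoring tool used in the project.

```python
def word_error_rate(reference, hypothesis):
    """Return the WER in percent, using a word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)  # 0 extra cost if words match
            deletion = prev[j] + 1                 # reference word missing
            insertion = curr[j - 1] + 1            # extra hypothesis word
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
half_wrong = word_error_rate("a b c d", "a x c")
# Three inserted words against a one-word reference: WER exceeds 100%.
many_insertions = word_error_rate("goal", "what a goal today")
```

Because insertions are counted as errors but do not increase the denominator, a hypothesis that is much longer than the reference can push the WER above 100%.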

WERs (all words)

          Dutch   English   German

Yug-Ned   84.5    84.5      77.4

Eng-Dld   83.2    83.3      90.8

WERs (player names)

[Figure: WERs for player names, one panel each for the Yug-Ned and Eng-Dld matches]

WERs versus SNR

[Figure: WERs versus SNR, one panel each for the Yug-Ned and Eng-Dld matches]

Automatic transcriptions

The language model (LM) and lexicon (lex) are adapted to a specific match

• Start with a general LM and lex

• Add player names of the specific match

• Expand the general LM and lex when more data is available
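The lexicon part of this adaptation can be sketched as follows. This is an illustration under stated assumptions, not the project's actual tooling: the lexicon is reduced to a set of lowercase word forms (a real recognition lexicon also needs pronunciations), the language model is left out entirely, and the names `adapt_lexicon` and `oov_rate` and the toy word lists are hypothetical.

```python
def adapt_lexicon(general_lexicon, player_names):
    """Return a match-specific lexicon: general words plus player names."""
    return set(general_lexicon) | {name.lower() for name in player_names}


def oov_rate(lexicon, transcript_words):
    """Percentage of transcript word tokens that are not in the lexicon."""
    words = [w.lower() for w in transcript_words]
    if not words:
        return 0.0
    missing = sum(1 for w in words if w not in lexicon)
    return 100.0 * missing / len(words)


# Toy data (assumed): a tiny general lexicon and one transcribed chunk.
general = {"and", "had", "to", "stop", "the", "corner"}
transcript = "and neville had to stop ince".split()

oov_before = oov_rate(general, transcript)
oov_after = oov_rate(adapt_lexicon(general, ["Neville", "Ince"]), transcript)
```

In this toy case the two player names are the only out-of-vocabulary tokens, so adding them to the lexicon brings the OOV rate for the chunk down to zero.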

WERs for various amounts of data

[Figure: WER as a function of the number of words used to train the language model (0 to 250,000). Curves: Yug-Ned (Dutch), lex: 1CD; Eng-Dld (Dutch), lex: 1CD; Yug-Ned (German), lex: 1CD; Yug-Ned (German), lex: 7CDs; Yug-Ned (German), lex: 19CDs; Eng-Dld (German), lex: 7CDs]

Oracle experiments - ICSLP’02

Because of the limited amount of material, we started off with oracle experiments:

• Language models are trained on target match

• Acoustic models are trained on part of target match or other match

Much lower WERs

Summary of results

Acoustic model training:

• Leaving out non-speech chunks does not hurt recognition performance

• Using more training data is beneficial, but more important:

• The SNRs of the training and test data should be matched

Summary of results

• WERs are SNR-dependent

[Figure: WER (%) as a function of SNR (dB), 0 to 20 dB, for Dutch, English and German]

(tested on Yug-Ned match)
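The slides do not say how the SNR values were measured. As a rough illustration only, the sketch below estimates SNR in dB from two sets of samples, assuming that a noise-only stretch (stadium noise without commentary) is available and that speech and noise are uncorrelated. The function names are hypothetical.

```python
import math


def mean_power(samples):
    """Average power of a sequence of audio samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech_plus_noise, noise_only):
    """Estimate SNR in dB from a noisy-speech stretch and a noise-only stretch.

    Speech power is approximated as the power of the noisy speech
    minus the power of the noise (assumes they are uncorrelated).
    """
    noise = mean_power(noise_only)
    speech = mean_power(speech_plus_noise) - noise
    if noise <= 0 or speech <= 0:
        raise ValueError("cannot estimate SNR from these samples")
    return 10.0 * math.log10(speech / noise)


# Toy samples: noisy-speech power is 11, noise power is 1, so the
# estimated speech power is 10 times the noise power.
example_snr = snr_db([5, -4, 3, -2, 1], [1, -1, 1, -1])
```

With an estimate of this kind per chunk, chunks can be grouped into SNR bins and a WER computed per bin.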

Summary of results

Split words into categories, i.e. function words, content words and football players' names:

WER function words > WER content words > WER names

(tested on Yug-Ned match)

Summary of results

• Noise reduction tool (FTNR): small improvement

[Figure: WERs with and without FTNR for Dutch (NL), English (Eng) and German (Dld)]

Ongoing work

Techniques to lower WERs

• Tuning of the generic language model

– Defining different classes

– Reduction of OOV words in lexicon and in the language model (using more material)

• Speaker Adaptation in HTK

(note: all other experiments are being carried out using Phicos)

Ongoing work

Noise robustness

• Extension of the acoustic models by using double deltas.

• Histogram Normalization and FTNR.

• SNR-dependent acoustic models.
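To make the "double deltas" item concrete, here is a minimal sketch of extending each feature frame with delta and double-delta coefficients. It uses a commonly used regression formula with a window of two frames on each side and repeats the first and last frame at the edges. The window size, the edge handling and the function names are assumptions, since the slides do not specify how the features are computed in Phicos.

```python
def deltas(frames, window=2):
    """Regression-based time derivatives of a list of feature frames.

    delta[t] = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2), k = 1..window.
    Frames beyond the edges are replaced by the first or last frame.
    """
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(len(frames[t])):
            num = 0.0
            for k in range(1, window + 1):
                nxt = frames[min(t + k, n - 1)][d]
                prv = frames[max(t - k, 0)][d]
                num += k * (nxt - prv)
            row.append(num / denom)
        out.append(row)
    return out


def add_deltas_and_double_deltas(frames, window=2):
    """Append deltas and double deltas to every frame."""
    d1 = deltas(frames, window)
    d2 = deltas(d1, window)  # double deltas are deltas of the deltas
    return [f + a + b for f, a, b in zip(frames, d1, d2)]


# Toy input: one coefficient per frame, rising linearly over time.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
extended = add_deltas_and_double_deltas(ramp)
```

Each output frame is three times as long as the input frame: the static coefficients, their first derivatives and their second derivatives.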

Recommendations

Acoustic modeling

• Record commentaries and stadium noise separately

• Speaker adaptation:

- Transcribe characteristics of commentator

- Collect more speech data of commentator

Recommendations

Lexicon and language modeling

• Collect orthographic transcriptions of spoken material, instead of written material

- Subtitles

- Closed captions
