Transcript
Page 1

Discriminative Syntactic Language Modeling for Speech Recognition

Michael Collins, Brian Roark, Murat Saraclar

MIT CSAIL, OGI/OHSU, Bogazici University

Presenter: 郝柏翰, 2013/02/26

43rd Annual Meeting of the Association for Computational Linguistics

Page 2

Outline

• Introduction

• Parse Tree Feature

• Experiments

• Conclusion

Page 3

Introduction

1) Syntactic language modeling (SLM), as I understand it

2) p-value (significance testing)

Page 4

Introduction

• Word n-gram models have been extremely successful as language models (LMs) for speech recognition.

• However, modeling long-span dependencies can help the language model better predict words.

• This paper describes a method for incorporating syntactic features into the language model in a reranking approach, using discriminative parameter estimation techniques.

$$w^{*} = \arg\max_{w} \left( \beta \log P_{l}(w) + \langle \alpha, \Phi(a, w) \rangle + \log P_{a}(a \mid w) \right)$$
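As a minimal sketch (not the authors' code), the rule above can be read as follows: P_l(w) is the baseline language-model probability, P_a(a | w) the acoustic-model probability of the acoustics a given the word sequence w, Φ(a, w) the syntactic feature vector, α the learned feature weights, and β a scaling factor. The `Hypothesis` container and its field names below are hypothetical stand-ins for one n-best entry.

```python
# Hedged sketch of the reranking rule above; not the authors' implementation.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Hypothesis:
    words: List[str]                 # candidate transcription w
    lm_logprob: float                # log P_l(w), baseline language model score
    am_logprob: float                # log P_a(a | w), acoustic model score
    features: Dict[str, float] = field(default_factory=dict)  # Phi(a, w)

def rerank(nbest: List[Hypothesis], alpha: Dict[str, float], beta: float) -> Hypothesis:
    """Pick w* maximizing beta*log P_l(w) + <alpha, Phi(a,w)> + log P_a(a|w)."""
    def score(h: Hypothesis) -> float:
        syntactic = sum(alpha.get(f, 0.0) * v for f, v in h.features.items())
        return beta * h.lm_logprob + syntactic + h.am_logprob
    return max(nbest, key=score)
```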

Page 5

Introduction

• Our approach differs from previous work in a couple of important respects.

1) Through the feature-vector representations, we can incorporate essentially arbitrary sources of information from the string or parse tree into the model. We would argue that our method allows considerably more flexibility in terms of the choice of features.

2) The second contrast between our work and previous work is in the use of discriminative parameter estimation techniques.

Page 6

Parameter Estimation

1) Perceptron

2) GCLMs

• We are reporting results using just the perceptron algorithm. This has allowed us to explore more of the potential feature space than we would have been able to do using the more costly GCLM estimation techniques.
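A hedged sketch of averaged-perceptron training for the reranker follows; it reuses rerank() and Hypothesis from the sketch above. This is one plausible reading of the slide, not the authors' exact recipe, and `oracle` is assumed to be the lowest-WER hypothesis in each n-best list.

```python
# Hedged sketch: averaged perceptron over n-best lists (assumed setup).
from collections import defaultdict

def perceptron_train(data, beta, epochs=3):
    """data: list of (nbest, oracle) pairs of Hypothesis objects."""
    alpha = defaultdict(float)   # current feature weights
    total = defaultdict(float)   # running sums for the averaged perceptron
    steps = 0
    for _ in range(epochs):
        for nbest, oracle in data:
            best = rerank(nbest, alpha, beta)   # current model's 1-best
            if best is not oracle:
                # promote the oracle's features, demote the wrong choice's
                for f, v in oracle.features.items():
                    alpha[f] += v
                for f, v in best.features.items():
                    alpha[f] -= v
            steps += 1
            for f, v in alpha.items():
                total[f] += v
    return {f: v / steps for f, v in total.items()}  # averaged weights
```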

Page 7

Parse Tree Feature

• Figure 2 shows a Penn Treebank-style parse tree of the sort produced by the parser.

Page 8

Parse Tree Feature

• Sequences derived from a parse tree (a feature-extraction sketch follows this list):

1) POS-tag sequence: we/PRP helped/VBD her/PRP paint/VB the/DT house/NN

2) Shallow parse tag sequence: we/NPb helped/VPb her/NPb paint/VPb the/NPb house/NPc

3) Shallow parse tag plus POS tag: we/PRP-NPb helped/VBD-VPb her/PRP-NPb paint/VB-VPb the/DT-NPb house/NN-NPc

4) Shallow category with lexical head sequence: we/NP helped/VP her/NP paint/VP house/NP
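A minimal sketch of how these four sequences could be read off token-level annotations derived from the tree (not the authors' code). The (word, POS, chunk, is_head) tuples are a hypothetical input format; chunk tags follow the slide's NPb/NPc convention (b = begins a chunk, c = continues one), and is_head marks the lexical head of its chunk.

```python
# Hedged sketch: derive the four tag sequences on this slide from per-token
# annotations that are assumed to have been read off the parse tree.
def tag_sequences(tokens):
    pos_seq   = [f"{w}/{pos}"         for w, pos, chunk, _ in tokens]
    chunk_seq = [f"{w}/{chunk}"       for w, pos, chunk, _ in tokens]
    both_seq  = [f"{w}/{pos}-{chunk}" for w, pos, chunk, _ in tokens]
    # shallow category with lexical head: keep chunk heads, drop the b/c marker
    head_seq  = [f"{w}/{chunk.rstrip('bc')}"
                 for w, pos, chunk, is_head in tokens if is_head]
    return pos_seq, chunk_seq, both_seq, head_seq

tokens = [("we", "PRP", "NPb", True), ("helped", "VBD", "VPb", True),
          ("her", "PRP", "NPb", True), ("paint", "VB", "VPb", True),
          ("the", "DT", "NPb", False), ("house", "NN", "NPc", True)]
for seq in tag_sequences(tokens):
    print(" ".join(seq))
```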

Page 9

Parse Tree Feature: Head-to-Head (H2H) Dependencies

• P : Parent Node
• HC : Head Child
• C : non-head Child

• + : right
• - : left
• 1 : adjacent
• 2 : non-adjacent
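A hedged sketch of how an H2H feature string might be composed from one dependency in the tree, using the legend above. The argument layout, separator, and back-off variants below are assumptions for illustration, not the authors' exact templates.

```python
# Hedged sketch: compose head-to-head (H2H) feature strings from a single
# parent / head-child / non-head-child dependency read off the parse tree.
def h2h_features(parent, head_child, child, head_word, child_word,
                 direction, adjacency):
    """direction is '+' (child to the right of the head) or '-' (left);
    adjacency is '1' (adjacent) or '2' (non-adjacent)."""
    core = f"{parent}|{head_child}|{child}|{direction}{adjacency}"
    return [
        core,                                # purely structural
        f"{core}|{head_word}",               # plus the head word
        f"{core}|{head_word}|{child_word}",  # fully lexicalized
    ]

# e.g. a VP headed by "helped" (VBD) with an adjacent NP headed by "her"
# to its right:
print(h2h_features("VP", "VBD", "NP", "helped", "her", "+", "1"))
```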

Page 10

Experiments

• The training set consists of 297,580 transcribed utterances (3,297,579 words). The oracle score (WER) for the 1000-best lists was 16.7%.

• For perceptron training with n-gram features, the training data was partitioned into 28 sets.

Page 11

Experiments

• The first additional features we experimented with were derived from the POS-tag sequence; a sketch of such features follows.

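The feature templates listed on the original slide did not survive extraction, so the sketch below shows one plausible set: counts of POS-tag n-grams (with boundary padding) that could be added to the feature vector Φ(a, w). The template names and choice of n are assumptions.

```python
# Hedged sketch: POS-tag n-gram features over a tag sequence; the exact
# templates from the original slide are not recoverable, so these are assumed.
from collections import Counter

def pos_ngram_features(pos_tags, max_n=3):
    """pos_tags: e.g. ['PRP', 'VBD', 'PRP', 'VB', 'DT', 'NN']."""
    padded = ["<s>"] + list(pos_tags) + ["</s>"]
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(padded) - n + 1):
            feats["POS:" + "_".join(padded[i:i + n])] += 1
    return feats

print(pos_ngram_features(["PRP", "VBD", "PRP", "VB", "DT", "NN"]))
```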

Page 12

Experiments

• This yielded 35.2% WER, a reduction of 0.3% absolute over what was achieved with n-grams alone, significant at p < 0.001, for a total reduction of 1.2% over the baseline recognizer.

Page 13

Conclusion

• The results presented in this paper are a first step in examining the potential utility of syntactic features for discriminative language modeling for speech recognition.

• The best of these features gave a small but statistically significant improvement beyond what was provided by the n-gram features.

• Future work will include a further investigation of parser-derived features. In addition, we plan to explore alternative parameter estimation methods.