Compact WFSA based Language Model and Its Application in Statistical Machine Translation Xiaoyin Fu, Wei Wei, Shixiang Lu, Dengfeng Ke, Bo Xu Interactive Digital Media Technology Research Center, CASIA

Jan 04, 2016


Compact WFSA based Language Model and Its Application in Statistical Machine Translation

Xiaoyin Fu, Wei Wei, Shixiang Lu, Dengfeng Ke, Bo Xu

Interactive Digital Media Technology Research Center, CASIA


Outline

Task

Problems

Solution

Our Approach

Results

Conclusion


Task

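N-gram Language Model
– assigns probabilities to strings of words or tokens
– Let w_1^L denote a string of L tokens over a fixed vocabulary

\[
P(w_1^L) = \prod_{i=1}^{L} P(w_i \mid w_1^{i-1}) \approx \prod_{i=1}^{L} \hat{P}(w_i \mid w_{i-n+1}^{i-1})
\]

Smoothing techniques
– back-off
– Define

\[
\hat{P}(w_i \mid w_{i-k+1}^{i-1}) =
\begin{cases}
\rho(w_{i-k+1}^{i}) & \text{if } w_{i-k+1}^{i} \in LM \\
\lambda(w_{i-k+1}^{i-1}) \, \hat{P}(w_i \mid w_{i-k+2}^{i-1}) & \text{otherwise}
\end{cases}
\]

\[
\lambda(w_{i-k+1}^{i-1}) = 1.0 \quad \text{if } w_{i-k+1}^{i-1} \notin LM
\]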
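The back-off definition on this slide can be read as a short recursion. The following is a minimal sketch, assuming the model is held in two plain dictionaries of log10 values (`prob` for stored n-gram probabilities, `backoff` for back-off weights); these names and the toy numbers are illustrative and do not come from the slides.

```python
def backoff_logprob(prob, backoff, context, word):
    """Return log10 P(word | context) using recursive back-off.

    prob:    dict mapping n-gram tuples to log10 probabilities (hypothetical layout)
    backoff: dict mapping context tuples to log10 back-off weights
    """
    ngram = context + (word,)
    if ngram in prob:
        # The full n-gram is stored in the LM: use its probability directly.
        return prob[ngram]
    if not context:
        # Not even the unigram is known; real toolkits usually map this to <unk>.
        return float("-inf")
    # A context missing from the LM has back-off weight 1.0, i.e. log10 weight 0.0.
    weight = backoff.get(context, 0.0)
    # Drop the oldest word of the context and retry with the shorter history.
    return weight + backoff_logprob(prob, backoff, context[1:], word)


# Toy bigram model (made-up numbers).
toy_prob = {("a",): -1.0, ("b",): -0.5, ("a", "b"): -0.3}
toy_backoff = {("a",): -0.2}

print(backoff_logprob(toy_prob, toy_backoff, ("a",), "b"))  # stored bigram
print(backoff_logprob(toy_prob, toy_backoff, ("a",), "a"))  # backs off through "a"
print(backoff_logprob(toy_prob, toy_backoff, ("b",), "a"))  # context "b" has no weight
```

Each failed lookup costs one extra query on a shorter n-gram, which is the overhead the later slides aim to reduce.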


Problems

Query in trie structure
– Useless queries
– Problems in forward query
– Problems in back-off query


Solution

Another point of view
– a random procedure
– a continuous process

Benefit
– Speed up forward query
– Speed up back-off query

Goal
– Fast
– Compact


Our Approaches

Fast
– WFSA: 5-tuple M = (Q, Σ, I, F, δ)

Definition

Q: a set of states

I: a set of initial states

F: a set of final states

Σ: an alphabet which represents the input and output labels

δ: a transition relation over Q × (Σ ∪ {ε})

\[
L(M) = \{\, w \mid \delta(q_i, w) \in F \,\}, \quad q_i \in I
\]
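To make the 5-tuple concrete, here is a minimal sketch of a weighted acceptor and its acceptance test. The dictionary layout, the use of log weights summed along a path, and the choice to keep the best path are assumptions made for illustration; ε-transitions are left out to keep the example short.

```python
def accepts(transitions, initial, final, word):
    """Return the best path weight if `word` is in L(M), otherwise None.

    transitions: dict mapping (state, symbol) to a list of (next_state, weight)
    initial:     set of initial states I
    final:       set of final states F
    """
    # Every initial state starts with weight 0.0 (weights are added along a path).
    frontier = {q: 0.0 for q in initial}
    for symbol in word:
        reached = {}
        for state, weight in frontier.items():
            for next_state, arc_weight in transitions.get((state, symbol), []):
                total = weight + arc_weight
                # Keep only the best-scoring way of reaching each state.
                if next_state not in reached or total > reached[next_state]:
                    reached[next_state] = total
        frontier = reached
    scores = [weight for state, weight in frontier.items() if state in final]
    return max(scores) if scores else None


# A two-arc automaton: 0 --a--> 1 --b--> 2, with state 2 final.
toy_transitions = {(0, "a"): [(1, -0.5)], (1, "b"): [(2, -0.25)]}

print(accepts(toy_transitions, {0}, {2}, ["a", "b"]))  # accepted
print(accepts(toy_transitions, {0}, {2}, ["a"]))       # stops in a non-final state
```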


Our Approaches

Fast
– WFSA: 5-tuple M = (Q, Σ, I, F, δ)

Example

[Figure: an example WFSA; arc labels a, b, w]


Our Approaches

Compact
– Trie
– Sorted array

[Figure: n-grams over w1 ... w6 stored as a trie, one sorted array per order (order = 1 to 4)]


Our Approaches

Compact
– Trie
– Sorted array
– Link index

[Figure: the same trie over w1 ... w6 (order = 1 to 4), with link indexes between the arrays of adjacent orders]
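One way to picture the trie stored as sorted arrays with link indexes is sketched below. This is an illustrative layout only, not necessarily the one used by the authors: each order gets one sorted array, every node records where its children start in the array of the next order, and a lookup is a binary search inside that child range. The sketch assumes the n-gram set is prefix-closed (every stored n-gram has its prefix stored too).

```python
from bisect import bisect_left


def build_layers(ngrams, max_order):
    """Build per-order sorted arrays plus a link index into the next order."""
    layers = [sorted(g for g in ngrams if len(g) == order)
              for order in range(1, max_order + 1)]
    # words[k][j] is the last word of the j-th n-gram of order k + 1.
    words = [[g[-1] for g in layer] for layer in layers]
    # first_child[k][j] is where the children of node j begin in layer k + 1;
    # one extra entry closes the range of the last node.
    first_child = []
    for k, layer in enumerate(layers):
        next_layer = layers[k + 1] if k + 1 < len(layers) else []
        starts = [bisect_left(next_layer, g) for g in layer]
        starts.append(len(next_layer))
        first_child.append(starts)
    return words, first_child


def find(words, first_child, ngram):
    """Return the index of `ngram` inside its layer, or None if it is absent."""
    if not ngram or len(ngram) > len(words):
        return None
    lo, hi = 0, len(words[0])
    node = None
    for k, word in enumerate(ngram):
        # Binary search restricted to the children of the previous node.
        j = bisect_left(words[k], word, lo, hi)
        if j == hi or words[k][j] != word:
            return None
        node = j
        lo, hi = first_child[k][j], first_child[k][j + 1]
    return node


toy_ngrams = [("w1",), ("w2",), ("w3",),
              ("w1", "w2"), ("w1", "w3"), ("w2", "w3"),
              ("w1", "w2", "w3")]
toy_words, toy_links = build_layers(toy_ngrams, 3)

print(find(toy_words, toy_links, ("w1", "w3")))
print(find(toy_words, toy_links, ("w3", "w1")))
```

Because a node stores only a word, a link index and its scores, the structure stays compact, and each level of the lookup touches a single contiguous slice.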


Our Approaches

WFSA-based LM
– Trie structure

Q: the nodes of the trie

I: the root of the trie

F: every node of the trie except the root

Σ: the alphabet of the input sentences

δ: the forward transition Tf and the roll-back transition Tb

Note:
– Tf is triggered by a forward query
– Tb is triggered spontaneously, without any input
  – when the query reaches a leaf
  – when a back-off query is carried out
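The two transition types can be sketched as a small state machine over the trie. In this hedged example a state is simply the tuple of words on the path from the root, and the model is kept in two dictionaries of log10 values; the names `step`, `prob` and `backoff` and the toy numbers are assumptions for illustration, not the authors' data structures.

```python
def step(prob, backoff, max_order, state, word):
    """Consume `word` from `state`; return (next_state, log10 probability)."""
    score = 0.0
    while True:
        target = state + (word,)
        if target in prob:
            # Forward transition Tf: triggered by the input word.
            score += prob[target]
            state = target
            break
        if not state:
            # Already at the root and the word is unknown.
            return (), float("-inf")
        # Roll-back transition Tb: consumes no input, adds the back-off weight
        # (log10 weight 0.0 when the context has none) and shortens the history.
        score += backoff.get(state, 0.0)
        state = state[1:]
    if len(state) == max_order:
        # A node of the highest order cannot be extended, so Tb fires at once
        # and the automaton is ready for the next word.
        state = state[1:]
    return state, score


def score_sentence(prob, backoff, max_order, words):
    """Run the automaton over a sentence as one continuous process."""
    state, total = (), 0.0
    for word in words:
        state, logp = step(prob, backoff, max_order, state, word)
        total += logp
    return total


toy_prob = {("a",): -1.0, ("b",): -0.5, ("a", "b"): -0.3}
toy_backoff = {("a",): -0.2}

print(score_sentence(toy_prob, toy_backoff, 2, ["a", "b", "a"]))
```

The state returned by one call is the starting point of the next, so consecutive queries do not restart from the root.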


Our Approaches

WFSA-based LM

[Figure: the trie over w1 ... w6 (order = 1 to 4), shown in three steps. Node fields: Probability, Back-off, Index; the WFSA node adds a Roll-back index. Label on the last step: Cross Layer]


Our Approaches

Query Method

[Figure: step-by-step query on the trie over w1 ... w6 (order = 1 to 4)]


Our Approaches

State Transitions


Our Approaches

Query LM


Our Approaches

For HPB SMT

For a source sentence
– A huge number of LM queries
– On the order of ten million
– Most of these are repetitive

Hash cache

[Figure: flowchart with nodes Query, Hash and WFSA-based LM, and branches Yes / No]


Our Approaches

For HPB SMT

Hash cache
– Small & fast
  – Hash size: 24 bits (16M)
– Simple operations
  – Additive operation
  – Bitwise operation

Hash clear
– For each sentence

[Figure: hash table; keys are (state, input) pairs (s1, w1), (s2, w2), ..., (sn, wn); values are states sa, sb, ..., sx, sy, sz]
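A direct-mapped cache of this kind can be sketched in a few lines. The slides only state that the hash is 24 bits wide, uses additive and bitwise operations, and is cleared for each sentence; the particular hash formula, the class name `QueryCache` and the `lookup` callback below are hypothetical choices made for illustration.

```python
class QueryCache:
    """Direct-mapped cache in front of a slower (state, word) lookup."""

    def __init__(self, lookup, bits=24):
        self.lookup = lookup              # fallback, e.g. a query on the WFSA-based LM
        self.mask = (1 << bits) - 1       # 24 bits give 2**24 = 16M slots
        self.hits = 0
        self.clear()

    def clear(self):
        """Empty the cache; the slides do this once per sentence."""
        self.keys = [None] * (self.mask + 1)
        self.values = [None] * (self.mask + 1)

    def _slot(self, state, word):
        # Illustrative hash built only from shifts, additions and a mask.
        return ((state << 5) + state + word) & self.mask

    def query(self, state, word):
        slot = self._slot(state, word)
        key = (state, word)
        if self.keys[slot] == key:
            # Hit: the same query was answered earlier for this sentence.
            self.hits += 1
            return self.values[slot]
        # Miss (or collision): ask the LM and overwrite the slot.
        value = self.lookup(state, word)
        self.keys[slot] = key
        self.values[slot] = value
        return value


calls = []

def slow_lookup(state, word):
    """Stand-in for an LM query; records how often it is really executed."""
    calls.append((state, word))
    return (state * 31 + word, -0.5)

cache = QueryCache(slow_lookup, bits=8)   # small table for the demo
first = cache.query(3, 7)
second = cache.query(3, 7)
print(first == second, len(calls), cache.hits)
```

A collision simply overwrites the slot, so the cache never returns a wrong answer; it only loses an opportunity to skip a query.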


Results

Setup
– LM toolkit: SRILM
– Decoder: hierarchical phrase-based translation system
– Test data: IWSLT-07 (489) & NIST-06 (1664)
– Training data:

Tasks      Model   Parallel sentences   Chinese words   English words
IWSLT-07   TM[1]   0.38M                3.0M            3.1M
           LM[2]   1.3M                 ——              15.2M
NIST-06    TM[3]   3.4M                 64M             70M
           LM[4]   14.3M                ——              377M

[1] The parallel corpus of BTEC (Basic Traveling Expression Corpus) and CJK (China-Japan-Korea corpus)
[2] The English corpus of BTEC+CJK+CWMT2008
[3] LDC2002E18, LDC2002T01, LDC2003E07, LDC2003E14, LDC2003T17, LDC2004T07, LDC2004T08, LDC2005T06, LDC2005T10, LDC2005T34, LDC2006T04, LDC2007T09
[4] LDC2007T07


Results

Storage Space
– The storage size increases by about 35%
– Linearly dependent on the number of trie nodes
– Acceptable

Tasks      n-grams   SRILM (MB)   WFSA (MB)   Δ (%)
IWSLT-07   4         65.7         89.1        35.6
           5         89.8         119.5       33.1
NIST-06    4         860.3        1190.4      38.4
           5         998.5        1339.7      34.2

Comparison of LM size between SRILM and WFSA
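The Δ column follows directly from the two size columns as (WFSA − SRILM) / SRILM. A short check of that arithmetic, using only the numbers from the table (the variable names are ours):

```python
# (task, n-gram order) -> (SRILM size, WFSA size), copied from the table.
lm_sizes = {
    ("IWSLT-07", 4): (65.7, 89.1),
    ("IWSLT-07", 5): (89.8, 119.5),
    ("NIST-06", 4): (860.3, 1190.4),
    ("NIST-06", 5): (998.5, 1339.7),
}


def increase_percent(srilm_size, wfsa_size):
    """Relative growth of the WFSA model over the SRILM model, in percent."""
    return (wfsa_size - srilm_size) / srilm_size * 100.0


increases = {key: increase_percent(*pair) for key, pair in lm_sizes.items()}
for (task, order), delta in increases.items():
    print(f"{task} {order}-gram: {delta:.1f}%")

average = sum(increases.values()) / len(increases)
print(f"average: {average:.1f}%")
```

The four values round to the figures in the Δ column, and their average is what the slide summarizes as "about 35%".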


Results

Query Speed (reduction in query time relative to SRILM)
– WFSA
  – about 60% for 4-grams
  – about 70% for 5-grams
– WFSA+cache
  – about 75%

n-grams   Method       IWSLT-07 (s)   NIST-06 (s)
4         SRILM        163            15433
          WFSA         70             6251
          WFSA+cache   42             3907
5         SRILM        261            25172
          WFSA         85             7944
          WFSA+cache   59             6128
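The percentages quoted on this slide are consistent with reading them as the reduction in total query time relative to SRILM. A small sketch of that calculation over the table values (the dictionary layout and function name are ours):

```python
# (n-gram order, test set) -> seconds for SRILM, WFSA and WFSA+cache, from the table.
query_seconds = {
    (4, "IWSLT-07"): {"SRILM": 163, "WFSA": 70, "WFSA+cache": 42},
    (4, "NIST-06"): {"SRILM": 15433, "WFSA": 6251, "WFSA+cache": 3907},
    (5, "IWSLT-07"): {"SRILM": 261, "WFSA": 85, "WFSA+cache": 59},
    (5, "NIST-06"): {"SRILM": 25172, "WFSA": 7944, "WFSA+cache": 6128},
}


def time_reduction(baseline, new):
    """Share of the baseline time that is saved, in percent."""
    return (baseline - new) / baseline * 100.0


reductions = {}
for key, row in query_seconds.items():
    for method in ("WFSA", "WFSA+cache"):
        reductions[key + (method,)] = time_reduction(row["SRILM"], row[method])

for (order, task, method), saved in reductions.items():
    print(f"{order}-gram {task} {method}: {saved:.1f}% less time")
```

With these numbers, WFSA alone saves roughly 57% to 60% for 4-grams and 67% to 68% for 5-grams, and WFSA+cache saves roughly 74% to 77%.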


Results

Analysis

Repetitive queries and back-off queries in SMT (4-gram)
– Back-off queries are widespread
– Most of the queries are repetitive

The WFSA-based LM can speed up queries effectively

Tasks      Back-off   Repetitive
IWSLT-07   60.5%      95.5%
NIST-06    60.3%      96.4%


Conclusion

A faster WFSA-based LM
– Faster forward query
– Faster back-off query

A compact WFSA-based LM
– Trie structure

A simple caching technique
– For SMT systems

Other fields
– Speech recognition
– Information retrieval


Thanks!