Dec 17, 2015

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: 1 Applying Conditional Random Fields to Japanese Morphological Analysis Taku Kudo 1*, Kaoru Yamamoto 2, Yuji Matsumoto 1 1 Nara Institute of Science and.

1

Applying Conditional Random Fields to Japanese Morphological Analysis

Taku Kudo 1*, Kaoru Yamamoto 2, Yuji Matsumoto 1

1 Nara Institute of Science and Technology

2 CREST, Tokyo Institute of Technology

* Currently, NTT Communication Science Labs.


Background

Conditional Random Fields (CRFs) [Lafferty 01]
  A variant of Markov Random Fields
  Many applications: POS tagging [Lafferty 01], shallow parsing [Sha 03], NE recognition [McCallum 03], IE [Pinto 03, Peng 04]

Japanese Morphological Analysis (JMA)
  Must cope with word segmentation
  Must incorporate many features
  Must minimize the influence of the length bias


Japanese Morphological Analysis

word segmentation (no explicit spaces in Japanese)

POS tagging

lemmatization, stemming

INPUT: 東京都に住む (I live in Metropolis of Tokyo.)

東京 / 都 / に / 住む

東京 (Tokyo)    NOUN-PROPER-LOC-GENERAL
都 (Metro.)     NOUN-SUFFIX-LOC
に (in)         PARTICLE-GENERAL
住む (live)     VERB BASE-FORM


Simple approach for JMA

Character-based begin / inside tagging
  Non-standard method in JMA
  Cannot directly reflect lexicons
    Over 90% accuracy can be achieved using naïve longest-prefix matching with a lexicon
  Decoding is slow

Characters: 東 京 都 に 住 む   (東京 / 都 / に / 住む)
Tags:       B  I  B  B  B  I
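The begin / inside encoding on this slide can be sketched in a few lines. This is only an illustration of the tagging scheme, not the authors' code; the function names `to_bi_tags` and `from_bi_tags` are hypothetical.

```python
def to_bi_tags(words):
    """Return (character, tag) pairs: B starts a word, I continues it."""
    pairs = []
    for word in words:
        for position, char in enumerate(word):
            pairs.append((char, "B" if position == 0 else "I"))
    return pairs


def from_bi_tags(pairs):
    """Rebuild the word list from (character, tag) pairs."""
    words = []
    for char, tag in pairs:
        if tag == "B" or not words:
            words.append(char)      # a B tag opens a new word
        else:
            words[-1] += char       # an I tag extends the current word
    return words


tagged = to_bi_tags(["東京", "都", "に", "住む"])
print(" ".join(f"{char}/{tag}" for char, tag in tagged))
print(from_bi_tags(tagged))
```

Because every character receives its own tag, a lexicon entry such as 東京 is never consulted as a unit, which is the limitation the slide points out.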


Our approach for JMA

Assume that a lexicon is available

Word lattice
  Represents all candidate outputs
  Reduces redundant outputs

Unknown word processing
  Invoked when no matching word can be found in a lexicon
  Uses character types, e.g., Chinese character, hiragana, katakana, number, etc.
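As a rough sketch of how character types could drive unknown word processing, the code below classifies characters by their Unicode names and groups runs of the same type into candidate tokens. The grouping heuristic and the names `char_type` and `split_by_char_type` are assumptions for illustration; the slides do not specify the actual procedure.

```python
import unicodedata
from itertools import groupby


def char_type(char):
    """Classify one character into a coarse character type."""
    if char.isdigit():
        return "number"
    name = unicodedata.name(char, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "chinese_character"
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    return "other"


def split_by_char_type(text):
    """Group consecutive characters of the same type (assumed heuristic)."""
    return [("".join(group), ctype) for ctype, group in groupby(text, key=char_type)]


print(split_by_char_type("東京タワー2004に"))
```

Each run could then be added to the lattice as an unknown-word candidate when the lexicon has no entry for it.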


Problem Setting

Input X: "東京都に住む" (I live in Metropolis of Tokyo)

Lexicon:
  に      particle, verb
  東      noun
  京      noun
  東京    noun
  京都    noun
  …

Lattice (figure): BOS, 東 (east) [noun], 東京 (Tokyo) [noun], 京 (capital) [noun], 京都 (Kyoto) [noun], 都 (Metro.) [suffix], に (in) [particle], に (resemble) [verb], 住む (live) [verb], EOS

Output Y: a path through the lattice

\[ Y = \langle (w_1, t_1), \ldots, (w_{\#Y}, t_{\#Y}) \rangle \]

GOAL: select the optimal path out of all candidates

NOTE: the number of tokens #Y varies
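A minimal sketch of lattice construction, assuming a toy lexicon that contains only the entries visible in the lattice figure. `LEXICON`, `build_lattice`, and `enumerate_paths` are hypothetical names, and exhaustive enumeration is used only because the example is tiny.

```python
# Toy lexicon: surface form -> possible tags (taken from the lattice nodes).
LEXICON = {
    "東": ["noun"],
    "京": ["noun"],
    "東京": ["noun"],
    "京都": ["noun"],
    "都": ["suffix"],
    "に": ["particle", "verb"],
    "住む": ["verb"],
}


def build_lattice(text, lexicon):
    """Return one node (start, end, word, tag) per lexicon match in the text."""
    nodes = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            for tag in lexicon.get(text[start:end], []):
                nodes.append((start, end, text[start:end], tag))
    return nodes


def enumerate_paths(text, nodes):
    """List every node sequence that covers the text from start to end."""
    by_start = {}
    for node in nodes:
        by_start.setdefault(node[0], []).append(node)

    def extend(position):
        if position == len(text):
            return [[]]
        return [[node] + rest
                for node in by_start.get(position, [])
                for rest in extend(node[1])]

    return extend(0)


text = "東京都に住む"
nodes = build_lattice(text, LEXICON)
paths = enumerate_paths(text, nodes)
for path in paths:
    print(" / ".join(f"{word}[{tag}]" for _, _, word, tag in path))
```

The candidate paths have different numbers of tokens, which is the point of the note that #Y varies.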


Long-standing Problems in JMA


Complex tagset

Hierarchical tagset
  HMMs cannot capture them
  How to select the hidden classes?
    Top level → lack of granularity
    Bottom level → data sparseness
    Some functional particles should be lexicalized
  Semi-automatic hidden class selection [Asahara 00]

Example hierarchy for 京都 (Kyoto): Noun → Proper → Loc → General → Kyoto


Complex tagset, cont.

Must capture a variety of features

京都 (Kyoto):  noun / proper / loc / general; lexical form: Kyoto
に (in):       particle / general / φ / φ; lexical form: に
住む (live):   verb / independent / φ / φ; lexical form: live; inflection: base-form

Feature types: POS hierarchy, overlapping features, inflections, character types, prefix/suffix, lexicalization

These features are important to JMA


JMA with MEMMs [Uchimoto 00-03]

Use a discriminative model, e.g., a maximum entropy (ME) model, to capture a variety of features

Sequential application of ME models

\[ P(Y|X) = \prod_{i} p(w_i, t_i \mid w_{i-1}, t_{i-1}) \]

Example (lattice figure):
  BOS → 東 (east) [noun] or 東京 (Tokyo) [noun]:  P(東 | BOS) < P(東京 | BOS)
  都 (capital) [suffix] → に (particle) [particle] or に (resemble) [verb]:  P(に, particle | 都, suffix) > P(に, verb | 都, suffix)


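Problems of MEMMs

Label bias [Lafferty 01]

\[ P(Y|X) = \prod_{i} p(w_i, t_i \mid w_{i-1}, t_{i-1}) \]

Lattice (figure), transition probabilities:
  BOS → A: 0.6    BOS → B: 0.4
  A → C: 0.4      A → D: 0.6
  B → E: 1.0
  C → EOS: 1.0    D → EOS: 1.0    E → EOS: 1.0

P(A, D | x) = 0.6 * 0.6 * 1.0 = 0.36
P(B, E | x) = 0.4 * 1.0 * 1.0 = 0.4

P(A,D|x) < P(B,E|x)

Paths with low entropy are preferred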
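The arithmetic on this slide can be reproduced with a short script. The transition table is read off the figure (A → C = 0.4 is inferred from the remaining probability mass at A); `TRANSITIONS` and `path_probability` are illustrative names.

```python
from math import prod

# Locally normalized transition probabilities from the label-bias figure.
TRANSITIONS = {
    ("BOS", "A"): 0.6, ("BOS", "B"): 0.4,
    ("A", "C"): 0.4, ("A", "D"): 0.6,
    ("B", "E"): 1.0,
    ("C", "EOS"): 1.0, ("D", "EOS"): 1.0, ("E", "EOS"): 1.0,
}


def path_probability(path, transitions):
    """MEMM path probability: the product of the local transition probabilities."""
    return prod(transitions[(prev, cur)] for prev, cur in zip(path, path[1:]))


p_ad = path_probability(["BOS", "A", "D", "EOS"], TRANSITIONS)
p_be = path_probability(["BOS", "B", "E", "EOS"], TRANSITIONS)
print(f"P(A,D|x) = {p_ad:.2f}, P(B,E|x) = {p_be:.2f}")
```

State B has a single outgoing transition, so it passes on all of its probability mass regardless of the observation, and the path through B wins even though A was the more likely first step.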


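Problems of MEMMs in JMA

Length bias

\[ P(Y|X) = \prod_{i} p(w_i, t_i \mid w_{i-1}, t_{i-1}) \]

Lattice (figure), transition probabilities:
  BOS → A: 0.6    BOS → B: 0.4
  A → C: 0.4      A → D: 0.6
  B → EOS: 1.0    C → EOS: 1.0    D → EOS: 1.0
  (B is a single long word covering the same span as A followed by C or D)

P(A, D | x) = 0.6 * 0.6 * 1.0 = 0.36
P(B | x)    = 0.4 * 1.0 = 0.4

P(A,D|x) < P(B|x)

Long words are preferred
Length bias has been ignored in JMA!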
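To see the length bias over the whole candidate set, the sketch below enumerates every path of the small lattice from the figure and ranks the paths by their MEMM probability. `EDGES` and `all_paths` are hypothetical names, and the edge table is read off the figure.

```python
# Transition probabilities from the length-bias figure; B spans the whole input.
EDGES = {
    "BOS": {"A": 0.6, "B": 0.4},
    "A": {"C": 0.4, "D": 0.6},
    "B": {"EOS": 1.0},
    "C": {"EOS": 1.0},
    "D": {"EOS": 1.0},
}


def all_paths(edges, node="BOS", prob=1.0):
    """Yield (path, probability) for every path from node to EOS."""
    if node == "EOS":
        yield ["EOS"], prob
        return
    for nxt, p in edges[node].items():
        for path, total in all_paths(edges, nxt, prob * p):
            yield [node] + path, total


ranked = sorted(all_paths(EDGES), key=lambda item: item[1], reverse=True)
for path, prob in ranked:
    print(" -> ".join(path), round(prob, 2))
```

The path with fewer tokens multiplies fewer probabilities below one, so it ranks first even though the first transition favoured A.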


Long-standing problems

Must incorporate a variety of features
  Overlapping features, POS hierarchy, lexicalization, character types
  HMMs are not sufficient

Must minimize the influence of length bias
  Another bias observed especially in JMA
  MEMMs are not sufficient

Can CRFs solve these problems? Yes!


Use of CRFs for JMA


CRFs for word lattice

Lattice (figure): BOS, 東 (east) [noun], 東京 (Tokyo) [noun], 京 (capital) [noun], 京都 (Kyoto) [noun], 都 (Metro.) [suffix], に (in) [particle], に (resemble) [verb], 住む (live) [verb], EOS

Encodes a variety of uni- or bi-gram features in a path, e.g., BOS - noun, noun - suffix, noun / Tokyo

Global feature  F(Y,X) = (… 1 … … 1 … … 1 … )
Parameter       Λ = (… 3 … … 20 … 20 ... )

\[ P(Y|X) = \frac{\exp(\Lambda \cdot F(Y,X))}{Z_X}, \qquad Z_X = \sum_{Y' \in \mathcal{Y}(X)} \exp(\Lambda \cdot F(Y',X)) \]

\mathcal{Y}(X): the set of all candidate paths
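A small numerical sketch of the globally normalized model: each candidate path gets a sparse global feature vector, and a single normalizer Z_X is shared by all paths. The feature vectors and weights below are invented for illustration (three paths over "東京都"); `dot` and `crf_distribution` are hypothetical names.

```python
from math import exp


def dot(weights, feature_vector):
    """Inner product of the weight vector and a sparse feature vector."""
    return sum(weights.get(name, 0.0) * value for name, value in feature_vector.items())


def crf_distribution(global_features, weights):
    """Map each candidate path Y to exp(weights . F(Y, X)) / Z_X."""
    unnormalized = {path: exp(dot(weights, f)) for path, f in global_features.items()}
    z = sum(unnormalized.values())      # Z_X sums over all candidate paths
    return {path: value / z for path, value in unnormalized.items()}


# Hypothetical global feature vectors F(Y, X) for three candidate paths.
GLOBAL_FEATURES = {
    "東京/都": {"BOS-noun": 1, "noun/東京": 1, "noun-suffix": 1},
    "東/京都": {"BOS-noun": 1, "noun-noun": 1, "noun/京都": 1},
    "東/京/都": {"BOS-noun": 1, "noun-noun": 1, "noun-suffix": 1},
}
# Hypothetical parameter values.
WEIGHTS = {"BOS-noun": 0.3, "noun/東京": 0.5, "noun-suffix": 1.0,
           "noun-noun": -0.5, "noun/京都": 0.5}

distribution = crf_distribution(GLOBAL_FEATURES, WEIGHTS)
for path, probability in distribution.items():
    print(path, round(probability, 3))
```

Paths with two and three tokens compete under the same normalizer, so no path gains probability merely by having fewer transitions.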


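CRFs for word lattice, cont.

Single exponential model for the entire path

\[ P(Y|X) = \frac{\exp(\Lambda \cdot F(Y,X))}{Z_X} = \frac{1}{Z_X} \exp\Big( \sum_{i} \sum_{k} \lambda_k f_k(w_{i-1}, t_{i-1}, w_i, t_i) \Big) \]

Example feature:

\[ f_{1234}(w_{i-1}, t_{i-1}, w_i, t_i) = \begin{cases} 1 & t_{i-1} = \text{noun},\ t_i = \text{particle} \\ 0 & \text{otherwise} \end{cases} \]

Fewer restrictions in the feature design
Can incorporate a variety of features
Can solve the problems of HMMs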
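The local feature functions can be written directly as predicates over adjacent tokens. In the sketch below, `f_noun_particle` mirrors the slide's example feature, while `f_lexicalized_ni` and the weight values are hypothetical additions that show an overlapping, lexicalized feature.

```python
def f_noun_particle(prev_word, prev_tag, word, tag):
    """The slide's example: fires when a noun is followed by a particle."""
    return 1 if prev_tag == "noun" and tag == "particle" else 0


def f_lexicalized_ni(prev_word, prev_tag, word, tag):
    """Hypothetical lexicalized feature: the current token is the particle に."""
    return 1 if word == "に" and tag == "particle" else 0


FEATURES = [f_noun_particle, f_lexicalized_ni]
LAMBDAS = [2.0, 1.5]  # hypothetical weights, one per feature


def path_score(path, features, lambdas):
    """Sum lambda_k * f_k over every adjacent token pair, with BOS/EOS padding."""
    padded = [("", "BOS")] + path + [("", "EOS")]
    total = 0.0
    for (prev_word, prev_tag), (word, tag) in zip(padded, padded[1:]):
        total += sum(lam * f(prev_word, prev_tag, word, tag)
                     for f, lam in zip(features, lambdas))
    return total


with_particle = [("京都", "noun"), ("に", "particle"), ("住む", "verb")]
with_verb = [("京都", "noun"), ("に", "verb"), ("住む", "verb")]
print(path_score(with_particle, FEATURES, LAMBDAS), path_score(with_verb, FEATURES, LAMBDAS))
```

Both features fire on the same token pair, which an HMM cannot express but an exponential model handles without any independence assumption.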


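Encoding

Maximum likelihood estimation

\[ \hat{\Lambda} = \operatorname*{argmax}_{\Lambda \in \mathbb{R}^K} L_\Lambda \]

\[ L_\Lambda = \sum_{j=1}^{N} \log P(Y_j|X_j) = \sum_{j=1}^{N} \Big[ \Lambda \cdot F(Y_j, X_j) - \log \sum_{Y' \in \mathcal{Y}(X_j)} \exp(\Lambda \cdot F(Y', X_j)) \Big] \]

All candidate paths are taken into account in encoding
  The influence of the length bias will be minimized
  Can solve the problems of MEMMs

A variant of Forward-Backward [Lafferty 01] can also be applied to the word lattice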
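For reference, a gradient-based optimizer needs the derivative of this log-likelihood with respect to each parameter. Writing F_k for the k-th component of F, the standard result for exponential models is the difference between observed and expected feature counts; the expectation is the quantity that the Forward-Backward variant computes.

```latex
\[
\frac{\partial L_\Lambda}{\partial \lambda_k}
  = \sum_{j=1}^{N} \Big( F_k(Y_j, X_j) - E_{P(Y|X_j)}\big[F_k(Y, X_j)\big] \Big),
\qquad
E_{P(Y|X_j)}\big[F_k(Y, X_j)\big]
  = \sum_{Y' \in \mathcal{Y}(X_j)} P(Y'|X_j)\, F_k(Y', X_j)
\]
```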


MAP estimation

\[ L_\Lambda = C \sum_{j=1}^{N} \log P(Y_j|X_j) - \frac{1}{2} \begin{cases} \|\Lambda\|_1 & \text{(L1-CRF)} \\ \|\Lambda\|^2 & \text{(L2-CRF)} \end{cases} \]

C is a hyper-parameter

L2-CRF (Gaussian prior)
  Non-sparse solution (all features have non-zero weight)
  Good if most given features are relevant
  Unconstrained optimizers, e.g., L-BFGS, are used

L1-CRF (Laplacian prior)
  Sparse solution (most features have zero weight)
  Good if most given features are irrelevant
  Constrained optimizers, e.g., L-BFGS-B, are used


Decoding

\[ \hat{Y} = \operatorname*{argmax}_{Y \in \mathcal{Y}(X)} P(Y|X) = \operatorname*{argmax}_{Y \in \mathcal{Y}(X)} \Lambda \cdot F(Y,X) \]

Viterbi algorithm
  Essentially the same architecture as HMMs and MEMMs
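A compact sketch of Viterbi decoding over a word lattice. Nodes are `(start, end, word, tag)` tuples, and `edge_score` stands in for the sum of weighted features on one transition; the node list follows the lattice in the slides, while the weights are invented for illustration.

```python
NODES = [
    (0, 1, "東", "noun"), (0, 2, "東京", "noun"), (1, 2, "京", "noun"),
    (1, 3, "京都", "noun"), (2, 3, "都", "suffix"), (3, 4, "に", "particle"),
    (3, 4, "に", "verb"), (4, 6, "住む", "verb"),
]
# Hypothetical weights for tag bi-grams and tag/word uni-grams.
WEIGHTS = {"BOS-noun": 0.3, "noun-noun": -0.5, "noun-suffix": 1.0,
           "suffix-particle": 1.0, "suffix-verb": -1.0, "noun-particle": 1.0,
           "particle-verb": 0.5, "noun/東京": 0.5, "noun/京都": 0.5}


def edge_score(prev, node):
    """Score of moving from prev to node: bi-gram weight plus uni-gram weight."""
    return (WEIGHTS.get(f"{prev[3]}-{node[3]}", 0.0)
            + WEIGHTS.get(f"{node[3]}/{node[2]}", 0.0))


def viterbi(text, nodes, score):
    """Return (best path, best score) among all paths covering the text."""
    bos = (0, 0, "", "BOS")
    eos = (len(text), len(text), "", "EOS")
    ends_at = {0: [bos]}
    for node in nodes:
        ends_at.setdefault(node[1], []).append(node)
    best = {bos: (0.0, None)}               # node -> (score, back-pointer)
    for node in sorted(nodes, key=lambda n: (n[0], n[1])) + [eos]:
        candidates = [(best[prev][0] + score(prev, node), prev)
                      for prev in ends_at.get(node[0], []) if prev in best]
        if candidates:
            best[node] = max(candidates, key=lambda c: c[0])
    if eos not in best:
        raise ValueError("no path covers the input")
    path, node = [], best[eos][1]
    while node != bos:                      # follow back-pointers to BOS
        path.append(node)
        node = best[node][1]
    return list(reversed(path)), best[eos][0]


best_path, best_score = viterbi("東京都に住む", NODES, edge_score)
print(" / ".join(f"{word}[{tag}]" for _, _, word, tag in best_path), best_score)
```

Only the scoring function differs between HMMs, MEMMs, and CRFs here; the dynamic program itself is unchanged.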


Experiments


Data

KC and RWCP, widely used Japanese annotated corpora

                        KC
source                  Mainichi News Article '95
lexicon (size)          JUMAN 3.61 (1,983,173)
POS structure           2-level POS, c-form, c-type, base
# training sentences    7,958
# training tokens       198,514
# test sentences        1,246
# test tokens           31,302
# features              791,798


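Features

\[ f_{1234}(w_{i-1}, t_{i-1}, w_i, t_i) = \begin{cases} 1 & t_{i-1} = \text{noun},\ t_i = \text{particle} \\ 0 & \text{otherwise} \end{cases} \]

京都 (Kyoto):  noun / proper / loc / general; lexical form: Kyoto
に (in):       particle / general / φ / φ; lexical form: に
住む (live):   verb / independent / φ / φ; lexical form: live; inflection: base-form

Feature types: POS hierarchy, overlapping features, inflections, character types, prefix/suffix, lexicalization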


Evaluation

Three criteria of correctness
  seg: word segmentation only
  top: word segmentation + top level of POS
  all: all information

\[ F = \frac{2 \cdot \text{recall} \cdot \text{precision}}{\text{recall} + \text{precision}} \]

\[ \text{recall} = \frac{\#\ \text{correct tokens}}{\#\ \text{tokens in test corpus}}, \qquad \text{precision} = \frac{\#\ \text{correct tokens}}{\#\ \text{tokens in system output}} \]
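One way to count correct tokens is to compare character spans, so that a token is correct only when its boundaries (and, depending on the criterion, its tag) match the reference. The sketch below assumes this span-based matching; `to_spans` and `precision_recall_f` are illustrative names, and the system output is made up.

```python
def to_spans(tokens):
    """Convert [(word, tag), ...] into a set of (start, end, tag) spans."""
    spans, position = set(), 0
    for word, tag in tokens:
        spans.add((position, position + len(word), tag))
        position += len(word)
    return spans


def precision_recall_f(gold, system):
    """Token-level precision, recall, and F for one sentence."""
    correct = len(to_spans(gold) & to_spans(system))
    precision = correct / len(system)
    recall = correct / len(gold)
    f = 2 * recall * precision / (recall + precision) if correct else 0.0
    return precision, recall, f


gold = [("東京", "noun"), ("都", "suffix"), ("に", "particle"), ("住む", "verb")]
system = [("東", "noun"), ("京都", "noun"), ("に", "particle"), ("住む", "verb")]
precision, recall, f_score = precision_recall_f(gold, system)
print(precision, recall, f_score)
```

For the seg criterion the same function can be used after replacing every tag with a constant, so that only the boundaries are compared.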


Results

F-score (%)  seg     top     all
L2-CRFs      98.96   98.31   96.75
L1-CRFs      98.80   98.14   96.55
HMMs         96.22   94.99   91.85
MEMMs        96.44   95.81   94.28

L1/L2-CRFs outperform HMMs and MEMMs
L2-CRFs outperform L1-CRFs

Significance tests: McNemar's paired test on the labeling disagreements


Influence of the length bias

           # long word err.   # short word err.
HMMs       306 (44%)          387 (56%)
L2-CRFs    79 (40%)           120 (60%)
MEMMs      416 (70%)          183 (30%)

HMMs, CRFs: relative ratios are not much different
MEMMs: # of long word errors is large → influenced by the length bias


L1-CRFs vs. L2-CRFs

L2-CRFs > L1-CRFs
  Most given features are relevant (POS hierarchies, suffixes/prefixes, character types)

L1-CRFs produce a compact model
  # of active features: L2: 791,798 vs. L1: 90,163 (11%)

L1-CRFs are worth examining if there exist practical constraints


Conclusions

An application of CRFs to JMA
  Does not use character-based begin / inside tags, but uses a word lattice with a lexicon

CRFs offer an elegant solution to the problems with HMMs and MEMMs
  Can use a wide variety of features (hierarchical POS tags, inflections, character types, etc.)
  Can minimize the influence of the length bias (length bias has been ignored in JMA!)


Future work

Tri-gram features
  Use of all tri-grams is impractical as they make decoding significantly slower
  Need to use a practical feature selection, e.g., [McCallum 03]

Apply to other non-segmented languages, e.g., Chinese or Thai


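CRFs encoding

A variant of Forward-Backward [Lafferty 01] can also be applied to the word lattice

\[ E_{P(Y|X)}[F_k(Y,X)] = \sum_{\langle w',t' \rangle, \langle w,t \rangle} \frac{\alpha_{w',t'} \, f_k(w',t',w,t) \, \exp\big(\sum_{k'} \lambda_{k'} f_{k'}(w',t',w,t)\big) \, \beta_{w,t}}{Z_X} \]

\[ \alpha_{w,t} = \sum_{\langle w',t' \rangle} \alpha_{w',t'} \exp\Big(\sum_{k} \lambda_k f_k(w',t',w,t)\Big) \]

\[ \alpha_{bos} = \beta_{eos} = 1, \qquad Z_X = \alpha_{eos} = \beta_{bos} \]

Figure: α accumulates scores from BOS up to ⟨w',t'⟩ and β from ⟨w,t⟩ to EOS; the α of ⟨w,t⟩ sums over all nodes ⟨w',t'⟩ that end where ⟨w,t⟩ begins.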
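The forward recursion can be sketched on a small lattice as follows. Nodes are `(start, end, word, tag)` tuples over the three-character input "東京都", and `edge_score` stands in for the weighted feature sum on one transition; the weights are invented for illustration.

```python
from math import exp

NODES = [
    (0, 1, "東", "noun"), (0, 2, "東京", "noun"), (1, 2, "京", "noun"),
    (1, 3, "京都", "noun"), (2, 3, "都", "suffix"),
]
WEIGHTS = {"noun-suffix": 1.0, "noun-noun": -0.5}  # hypothetical values


def edge_score(prev, node):
    """Weighted feature sum for the transition prev -> node (tag bi-grams only)."""
    return WEIGHTS.get(f"{prev[3]}-{node[3]}", 0.0)


def forward(text, nodes, score):
    """Return the alpha value of every node and Z_X = alpha[EOS]."""
    bos = (0, 0, "", "BOS")
    eos = (len(text), len(text), "", "EOS")
    ends_at = {0: [bos]}
    for node in nodes:
        ends_at.setdefault(node[1], []).append(node)
    alpha = {bos: 1.0}
    # Processing nodes by start position guarantees predecessors come first.
    for node in sorted(nodes, key=lambda n: (n[0], n[1])) + [eos]:
        alpha[node] = sum(alpha[prev] * exp(score(prev, node))
                          for prev in ends_at.get(node[0], []))
    return alpha, alpha[eos]


alpha, z = forward("東京都", NODES, edge_score)
print(z)
```

The lattice has three paths (東京/都, 東/京都, 東/京/都), and the value of `z` equals the sum of their exponentiated path scores without enumerating them; a symmetric backward pass from EOS would give the β values.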


Influence of the length bias, cont.

Example 1: 海 に かけた ロマン は … (The romance on the sea they bet is …)
  Correct: 海 (sea) / に (particle) / かけた (bet) / ロマン (romance) / は (particle)
  MEMMs select: ロマンは (romanticist)

Example 2: 荒波 に 負け ない 心 … (A heart which beats rough waves is …)
  Correct: 荒波 (rough waves) / に (particle) / 負け (lose) / ない (not) / 心 (heart)
  MEMMs select: ない心 (one's heart)

Caused rather by the influence of the length bias (CRFs can correctly analyze these sentences)


Cause of label and length bias

MEMMs use only the correct path in encoding
  Transition probabilities of unobserved paths will be distributed uniformly

Lattice (figure): BOS, 東 [noun], 東京 [noun], 京 [noun], 京都 [noun], 都 [suffix], に [particle], に [particle]