ParaMor: Finding Paradigms Across Morphology
Christian Monson, Carnegie Mellon


Jan 12, 2016

Page 1

ParaMor: Finding Paradigms Across Morphology

Christian Monson
Carnegie Mellon

Page 2

Turkish Morphology – Beads on a String

take + passive + negative + present progressive + 2nd person singular

You are not being taken

Page 3

Turkish Morphology – Beads on a String

götür + ül + m + üyor + sun
take + passive + negative + present progressive + 2nd person singular

You are not being taken

Page 4

Applications of Computational Morphology

• Machine Translation
  – Turkish-English (Oflazer, 2007)
  – Czech-English (Goldwater and McClosky, 2005)
• Speech Recognition
  – Finnish (Creutz, 2006)
• Information Retrieval

Page 5

Challenges of Computational Morphology

• Time Consuming for a New Language
  – Kemal Oflazer estimates:
    • 3-4 months to build a basic Turkish analyzer
    • Plus lexicon development and maintenance
• Expertise Needed
  – Greenlandic
    • Official language of Greenland
    • Agglutinative Inuit language
    • 50,000 speakers
    • Per Langaard

Page 6

The Solution

Raw Text → Unsupervised Morphology Induction

Page 7

ParaMor – Paradigm Morphology

ParaMor: Identify (Search, Cluster, Filter), Segment, Evaluation, Results

• ParaMor
  – Unsupervised morphology induction system
• Paradigm
  – The natural structure of morphology

Page 8

Paradigms – The Structure of Morphology

Stem: götür (take)
Voice: ül (passive)
Polarity: m (negative)
Tense & Mood: üyor (present progressive)
Person & Number: sun (2nd person singular)

Page 9

Paradigms – The Structure of Morphology

Stem: götür (take)
Voice: ül (passive)
Polarity: m (negative)
Tense & Mood: üyor (present progressive)
Person & Number: sun, um (um: 1st person singular)

Page 10

Paradigms – The Structure of Morphology

Stem: götür (take)
Voice: ül (passive)
Polarity: m (negative)
Tense & Mood: üyor (present progressive)
Person & Number: sun, um, Ø (Ø: 3rd person singular)

Page 11

Paradigms – The Structure of Morphology

Stem: götür (take)
Voice: ül (passive)
Polarity: m (negative)
Tense & Mood: üyor (present progressive)
Person & Number: sun, um, Ø, uz (uz: 1st person plural)

Page 12

Paradigms – The Structure of Morphology

Stem: götür (take)
Voice: ül (passive)
Polarity: m (negative)
Tense & Mood: üyor (present progressive)
Person & Number: sun, um, Ø, uz
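The slot structure above can be written down as data: each slot holds a set of mutually replaceable strings, and a word form is one choice per slot, concatenated left to right. This is a minimal sketch. It writes Ø as the empty string and sticks to the present progressive forms on these slides. Plain concatenation applies no morphophonology, so it would not produce correct Turkish for every combination (the future forms, for example). The names `SLOTS` and `generate` are hypothetical.

```python
from itertools import product

# Slots and fillers from the slides; Ø (3rd person singular) is written as "".
SLOTS = [
    ("Stem", ["götür"]),
    ("Voice", ["ül"]),
    ("Polarity", ["m"]),
    ("Tense & Mood", ["üyor"]),
    ("Person & Number", ["sun", "um", "", "uz"]),
]


def generate(slots):
    """Concatenate one filler per slot, 'beads on a string' style.

    No morphophonology is applied, so the result is only valid for
    combinations whose surface form is a plain concatenation.
    """
    fillers = [options for _, options in slots]
    return ["".join(choice) for choice in product(*fillers)]


for form in generate(SLOTS):
    print(form)
# götürülmüyorsun
# götürülmüyorum
# götürülmüyor
# götürülmüyoruz
```

The four forms correspond to 2nd person singular, 1st person singular, 3rd person singular, and 1st person plural.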

Page 13

Paradigms – The Structure of Morphology

Stem: götür (take)
Voice: ül (passive)
Polarity: m (negative)
Tense & Mood: üyor, yecek (yecek: future)
Person & Number: sun, um, Ø, uz

Page 14

Paradigms – The Structure of Morphology

Stem: götür (take)
Voice: ül (passive)
Polarity: m (negative)
Tense & Mood: üyor, yecek
Person & Number: sun, um, Ø, uz

Page 15

Paradigms – The Structure of Morphology

Stem: (empty)
Voice: ül
Polarity: m
Tense & Mood: üyor, yecek
Person & Number: sun, um, Ø, uz

Page 16

Paradigms – The Structure of Morphology

ül | m | üyor, yecek | sun, um, Ø, uz
Paradigms

Page 17

Paradigms – The Structure of Morphology

ül | m | üyor, yecek | sun, um, Ø, uz
Paradigms

• Paradigm – Set of mutually replaceable strings

Page 18

Paradigms – The Structure of Morphology

ül | m | üyor, yecek | sun, um, Ø, uz
Paradigm

• Paradigm – Set of mutually replaceable strings
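Defining a paradigm as a set of mutually replaceable strings suggests a simple test for whether a stem belongs to a candidate suffix set: every suffix must attach to the stem to form a word that occurs in the corpus. This is a minimal sketch. It assumes the corpus is a plain word list and writes Ø as the empty string. The function name `stems_for` is hypothetical, not ParaMor's code.

```python
def stems_for(words, suffixes):
    """Return the non-empty stems that form an attested word with *every* suffix.

    Ø (the null suffix) is written as "".
    """
    words = set(words)
    candidates = set()
    for w in words:
        for f in suffixes:
            # Slice by length so that the empty suffix keeps the whole word.
            if w.endswith(f) and len(w) > len(f):
                candidates.add(w[: len(w) - len(f)])
    return {t for t in candidates if all(t + f in words for f in suffixes)}


words = ["valla", "vallas", "costa", "costas", "importadora",
         "importadoras", "buscabamos", "autorizaciones"]
print(sorted(stems_for(words, {"", "s"})))
# ['costa', 'importadora', 'valla']
```

Here buscabamos and autorizaciones end in s but drop out, because removing the s does not leave an attested word.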

Page 19

The ParaMor Algorithm

• Identify suffix paradigms in 3 steps

Page 20

The ParaMor Algorithm

• Identify suffix paradigms in 3 steps
  1. Search for candidate paradigms

Page 21

The ParaMor Algorithm

• Identify suffix paradigms in 3 steps
  1. Search for candidate paradigms
  2. Cluster candidates modeling the same paradigm

Page 22

The ParaMor Algorithm

• Identify suffix paradigms in 3 steps
  1. Search for candidate paradigms
  2. Cluster candidates modeling the same paradigm
  3. Filter

Page 23

The ParaMor Algorithm

• Identify suffix paradigms in 3 steps
  1. Search for candidate paradigms
  2. Cluster candidates modeling the same paradigm
  3. Filter
• Segment words
  – Using the discovered paradigms

Page 24

Search for Candidate Paradigms

• All character boundaries are candidate morpheme boundaries

Page 25

Search for Candidate Paradigms (Spanish)

• Begin the search with the most frequent word-final string

s (10662 stems)
Example words: autorizaciones, buscabamos, costas, importadoras, vallas, …

Page 26

Search for Candidate Paradigms (Spanish)

s (10662 stems) → Ø s (5501 stems)
Example words: autorizaciones, buscabamos, costas, importadoras, vallas, …

• Identify the most frequent mutually replaceable string
  – Stems that occur with one suffix in a paradigm will likely occur with other suffixes in that paradigm

Page 27

Search for Candidate Paradigms (Spanish)

• Stop adding suffixes
  – When the most frequent mutually replaceable string severely decreases the stem count

s (10662 stems) → Ø s (5501 stems) → Ø r s (287 stems)
Example words: autorizaciones, buscabamos, costas, importadoras, vallas
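The search on these slides can be sketched as a greedy upward walk. Start from one word-final string. Repeatedly add the suffix that is mutually replaceable with the current set for the most stems. Stop once that best suffix would cut the stem count too sharply. The slides give no number for "severely", so `min_ratio` is a hypothetical parameter. The rest of the sketch is also illustrative rather than ParaMor's implementation: the function name, the toy word list, and the alphabetical tie-breaking are all assumptions.

```python
from collections import Counter


def search(words, start, min_ratio=0.5):
    """Greedy upward search for one candidate paradigm (Ø written as "").

    Returns (suffixes, stems). min_ratio is a hypothetical stopping threshold.
    """
    words = set(words)
    suffixes = {start}
    stems = {w[: len(w) - len(start)] for w in words
             if w.endswith(start) and len(w) > len(start)}
    while stems:
        # For every possible new suffix, count how many current stems take it.
        counts = Counter()
        for t in stems:
            for w in words:
                if w.startswith(t):
                    f = w[len(t):]
                    if f not in suffixes:
                        counts[f] += 1
        if not counts:
            break
        # Most stems first; break ties alphabetically so the walk is deterministic.
        best, n = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        if n < min_ratio * len(stems):
            break  # the best extension loses too many stems
        suffixes.add(best)
        stems = {t for t in stems if t + best in words}
    return suffixes, stems


words = ["casa", "casas", "mesa", "mesas", "gata", "gatas", "gato", "gatos",
         "perro", "perros", "libro", "libros", "ciudad"]
suffixes, stems = search(words, "a")
print(sorted(suffixes), sorted(stems))  # ['a', 'as'] ['cas', 'gat', 'mes']
```

Starting from a, the walk adds as (all 3 stems survive) and then stops, because the next best suffix, o, keeps only 1 of 3 stems. Lowering `min_ratio` lets the walk continue into smaller, larger-suffix candidates, as the deeper chains on the following slides do.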

Page 28

Search for Candidate Paradigms

• Move on to the next most frequent word-final string

s (10662) → Ø s (5501) → Ø r s (287)
a (8981)

Page 29

Search for Candidate Paradigms

s (10662) → Ø s (5501) → Ø r s (287)
a (8981) → a o (2304) → a o os (1410) → a as o os (892)

Page 30

Search for Candidate Paradigms

s (10662) → Ø s (5501) → Ø r s (287)
a (8981) → a o (2304) → a o os (1410) → a as o os (892)
n (6051) → Ø n (1874) → Ø n r (509) → Ø do n r (354) → … → Ø da das do dos n ndo r ron (118)

Page 31

Search for Candidate Paradigms

s (10662) → Ø s (5501) → Ø r s (287)
a (8981) → a o (2304) → a o os (1410) → a as o os (892)
n (6051) → Ø n (1874) → Ø n r (509) → Ø do n r (354) → … → Ø da das do dos n ndo r ron (118)
es (2751) → Ø es (874)

Page 32

Search for Candidate Paradigms

s (10662) → Ø s (5501) → Ø r s (287)
a (8981) → a o (2304) → a o os (1410) → a as o os (892)
n (6051) → Ø n (1874) → Ø n r (509) → Ø do n r (354) → … → Ø da das do dos n ndo r ron (118)
es (2751) → Ø es (874)
an (1786) → a an (1049) → a an ar (413) → a an ar ó (353) → … → a ada adas ado ados an ar aron ó (149)

Page 33

Search for Candidate Paradigms

s (10662) → Ø s (5501) → Ø r s (287)
a (8981) → a o (2304) → a o os (1410) → a as o os (892)
n (6051) → Ø n (1874) → Ø n r (509) → Ø do n r (354) → … → Ø da das do dos n ndo r ron (118)
es (2751) → Ø es (874)
an (1786) → a an (1049) → a an ar (413) → a an ar ó (353) → … → a ada adas ado ados an ar aron ó (149)
rado (167) → rada rado (89) → rada rado rados (67) → rada radas rado rados (53) → … → ra rada radas rado rados ran rar raron ró (23)
… strado (15) → strada strado (12) → strada strado stró (9) → strada strado strar stró (8) → strada stradas strado strar stró (7)
…

Page 34

Cluster Candidates per Paradigm

15 suffixes: a aba ada adas ado ados an ando ar aron arse ará arán aría ó
22 stems: anunci, aplic, apoy, celebr, concentr, …
330 covered types

15 suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán ó
23 stems: anunci, apoy, confirm, consider, declar, …
345 covered types

Page 35

Cluster Candidates per Paradigm

15 suffixes: a aba ada adas ado ados an ando ar aron arse ará arán aría ó
22 stems: anunci, aplic, apoy, celebr, concentr, …
330 covered types

15 suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán ó
23 stems: anunci, apoy, confirm, consider, declar, …
345 covered types

Merged, 16 suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán aría ó
Cosine similarity: 0.664
451 covered types

Page 36

Cluster Candidates per Paradigm

15 suffixes: a aba aban ada adas ado ados an ando ar aron arse ará arán ó
25 stems: anunci, aplic, apoy, celebr, consider, …
375 covered types

15 suffixes: a aba ada adas ado ados an ando ar aron arse ará arán aría ó
22 stems: anunci, aplic, apoy, celebr, concentr, …
330 covered types

15 suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán ó
23 stems: anunci, apoy, confirm, consider, declar, …
345 covered types

Merged, 16 suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán aría ó
Cosine similarity: 0.664
451 covered types

Page 37

Cluster Candidates per Paradigm

15 suffixes: a aba aban ada adas ado ados an ando ar aron arse ará arán ó
25 stems: anunci, aplic, apoy, celebr, consider, …
375 covered types

15 suffixes: a aba ada adas ado ados an ando ar aron arse ará arán aría ó
22 stems: anunci, aplic, apoy, celebr, concentr, …
330 covered types

15 suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán ó
23 stems: anunci, apoy, confirm, consider, declar, …
345 covered types

Merged, 16 suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán aría ó
Cosine similarity: 0.664
451 covered types

Merged, 17 suffixes: a aba aban ada adas ado ados an ando ar ara aron arse ará arán aría ó
Cosine similarity: 0.715
532 covered types
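The cosine scores on these slides can be reproduced under two assumptions: each candidate is represented by the set of word types it covers, and a merged candidate covers the union of the two sets. Then 330 + 345 - 451 = 224 types are shared, and 224 / sqrt(330 × 345) ≈ 0.664. Likewise, 451 + 375 - 532 = 294, and 294 / sqrt(451 × 375) ≈ 0.715. This reading is an inference from the slide numbers, not a statement of how ParaMor is implemented. The sketch below computes that measure and wraps it in a simple greedy agglomerative loop. The function names and the `threshold` parameter are hypothetical.

```python
import math
from itertools import combinations


def cosine(a: set, b: set) -> float:
    """Cosine similarity of two sets viewed as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def cosine_from_counts(n_a: int, n_b: int, n_union: int) -> float:
    """Same measure, recovering |A ∩ B| as |A| + |B| - |A ∪ B|."""
    return (n_a + n_b - n_union) / math.sqrt(n_a * n_b)


def cluster(candidates, threshold):
    """Greedily merge the most similar pair of clusters (by covered word
    types) while their similarity is at least `threshold`.

    Each candidate is a (suffixes, covered_types) pair of sets.
    """
    clusters = [(set(s), set(t)) for s, t in candidates]
    while len(clusters) > 1:
        pairs = combinations(range(len(clusters)), 2)
        (i, j), best = max(
            (((a, b), cosine(clusters[a][1], clusters[b][1])) for a, b in pairs),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


print(round(cosine_from_counts(330, 345, 451), 3))  # 0.664
print(round(cosine_from_counts(451, 375, 532), 3))  # 0.715
```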

Page 38

Filter Candidate Paradigms

• 2 types of filtering
  1. Remove small unclustered candidate paradigms
  2. Remove candidates modeling unlikely morpheme boundaries (Harris, 1955)
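The slide does not give the exact criterion for either filter. The sketch below shows one hypothetical way to make both concrete. The first filter sets a minimum size, in covered word types, for candidates that never merged with another candidate. The second is a boundary check in the spirit of Harris (1955): it measures how varied the characters just before the proposed boundary are. Suppose nearly every stem of a candidate ends in the same character, as stems such as administra- and compra- would under a suffix set like da das do dos. Then the boundary is probably one character too far to the right. The thresholds `min_types` and `min_entropy` are made-up illustration values.

```python
import math
from collections import Counter


def stem_final_entropy(stems):
    """Entropy (in bits) of the last character of each stem."""
    counts = Counter(s[-1] for s in stems if s)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def keep_candidate(stems, covered_types, cluster_size,
                   min_types=50, min_entropy=0.5):
    """Return False if either (hypothetical) filter would discard the candidate."""
    # Filter 1: a small candidate that was never clustered with another one.
    if cluster_size == 1 and covered_types < min_types:
        return False
    # Filter 2: low stem-final variety suggests a misplaced morpheme boundary.
    return stem_final_entropy(stems) >= min_entropy


good = ["administr", "compr", "mir", "llev", "cant"]   # boundary before -ada, -ado, ...
shifted = ["administra", "compra", "mira"]             # boundary before -da, -do, ...
print(keep_candidate(good, 200, 3), keep_candidate(shifted, 200, 3))  # True False
```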

Page 39

Segment Words Using Paradigms

administradas

Page 40

Segment Words Using Paradigms

administradas
Paradigm: a ada adas ado ados an ar aron ó ...

Page 41

Segment Words Using Paradigms

administradas
Paradigm: a ada adas ado ados an ar aron ó ...
administrada

Page 42

Segment Words Using Paradigms

administradas → administr +adas
Paradigm: a ada adas ado ados an ar aron ó ...
administrada

Page 43

Segment Words Using Paradigms

administradas → administr +adas
Paradigm: a as o os
administrada

Page 44

Segment Words Using Paradigms

administradas → administr +adas, administrad +as
Paradigm: a as o os
administrada

Old way: separate alternative analyses

Page 45

Segment Words Using Paradigms

administradas → administr +adas, administrad +as
Paradigm: a as o os
administrada

New way: augment the current segmentation
administr +ad +as

Page 46

Segment Words Using Paradigms

administradas
Paradigm: Ø s
administrada + Ø

Old way: administr +adas, administrad +as, administrada +s
New way: administr +ad +a +s
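The segmentation steps above can be sketched as follows. The sketch assumes that a paradigm licenses a boundary when two conditions hold: the word ends in one of the paradigm's suffixes, and the same stem also forms an attested word with at least one other suffix of that paradigm (administrada supports administr +adas). Every licensed boundary is kept, which yields the "new way" of augmenting a single segmentation instead of listing alternatives. The paradigms and the two-word corpus are toy data taken from the slides. `boundaries` and `segment` are hypothetical names.

```python
def boundaries(word, paradigm, corpus):
    """Split positions in `word` licensed by `paradigm` (Ø written as "")."""
    found = set()
    for suffix in paradigm:
        cut = len(word) - len(suffix)
        # Skip non-matching suffixes, empty stems, and the Ø suffix (no split).
        if not word.endswith(suffix) or cut <= 0 or cut == len(word):
            continue
        stem = word[:cut]
        if any(stem + other in corpus for other in paradigm if other != suffix):
            found.add(cut)
    return found


def segment(word, paradigms, corpus):
    """Merge the boundaries from every paradigm into one segmentation."""
    cuts = sorted(set().union(*(boundaries(word, p, corpus) for p in paradigms)))
    pieces = [word[i:j] for i, j in zip([0] + cuts, cuts + [len(word)])]
    return " +".join(pieces)


PARADIGMS = [
    {"a", "ada", "adas", "ado", "ados", "an", "ar", "aron", "ó"},
    {"a", "as", "o", "os"},
    {"", "s"},
]
CORPUS = {"administradas", "administrada"}
print(segment("administradas", PARADIGMS, CORPUS))  # administr +ad +a +s
```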

Page 47

Morpho Challenge 2007

• Peer-operated competition
  – For unsupervised morphology induction algorithms
• 4 languages
  – English
  – German
  – Finnish
  – Turkish

Page 48

ParaMor in Morpho Challenge 2007

• Developed on Spanish
  – ParaMor's free parameters were frozen

Page 49

2 Methods of Evaluation

1. Linguistic: segmentations compared to a morphologically analyzed lexicon

Word            Analysis               Answer
administradas   administr +ad +a +s    administrar +Adj +Fem +Pl
administrada    administr +ad +a       administrar +Adj +Fem

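Here is a simplified sketch of a pair-based comparison against an analyzed lexicon, in the spirit of the linguistic evaluation above. Two words count as related if they share a morpheme in the proposed analysis, or a label in the answer key. Precision asks how many related pairs in the analysis are also related in the answer key. Recall asks the reverse. F1 is their harmonic mean. This is an illustrative assumption, not the official Morpho Challenge scoring script. The answer-key entry for casas is a made-up toy entry.

```python
from itertools import combinations


def share(a, b):
    """True if two analyses have at least one morpheme or label in common."""
    return bool(set(a) & set(b))


def pair_scores(proposed, gold):
    """Pair-based precision, recall, and F1 over words present in both dicts."""
    pairs = list(combinations(sorted(proposed), 2))
    prop_pairs = [p for p in pairs if share(proposed[p[0]], proposed[p[1]])]
    gold_pairs = [p for p in pairs if share(gold[p[0]], gold[p[1]])]
    precision = (sum(share(gold[a], gold[b]) for a, b in prop_pairs) / len(prop_pairs)
                 if prop_pairs else 0.0)
    recall = (sum(share(proposed[a], proposed[b]) for a, b in gold_pairs) / len(gold_pairs)
              if gold_pairs else 0.0)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


proposed = {"administradas": ["administr", "ad", "a", "s"],
            "administrada": ["administr", "ad", "a"],
            "casas": ["cas", "as"]}
gold = {"administradas": ["administrar", "+Adj", "+Fem", "+Pl"],
        "administrada": ["administrar", "+Adj", "+Fem"],
        "casas": ["casa", "+Pl"]}  # toy answer-key entry
print(pair_scores(proposed, gold))  # precision 1.0, recall 0.5, F1 about 0.667
```

Recall is 0.5 here because the answer key relates administradas and casas through +Pl, while the proposed analysis puts the plural marker in two different pieces (s versus as).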

Page 51

2 Methods of Evaluation

2. Task based: information retrieval
   – Short two-sentence queries
   – About international news topics
   – Binary relevance assessments
   – About 50 queries and 20K relevance judgements for each language
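The IR results are reported as average precision with TF/IDF retrieval. The slides do not say which variant of average precision is used, so this sketch assumes the standard non-interpolated definition over binary relevance judgements. It computes average precision for one query and the mean over queries. Relevant documents that are never retrieved still count against the score, because the sum is divided by the total number of relevant documents.

```python
def average_precision(ranking, relevant):
    """Non-interpolated average precision for one query with binary relevance."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of per-query average precision; runs is a list of (ranking, relevant)."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"})
print(round(ap, 4))  # 0.5556, i.e. (1/1 + 2/3) / 3
```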

Pages 52-60

Linguistic Evaluation

[Figure: bar chart of F1 (%) for English, German, Finnish, and Turkish, comparing Bernhard 2, Morfessor, ParaMor, and ParaMor & Morfessor. English values as the chart builds: Morfessor 47.2, ParaMor 50.6, ParaMor & Morfessor 50.7. Further values shown in later builds: 60.8, 56.3, 52.9, 53.4, 48.2, 48.5, 24.7, 52.0.]

Pages 61-65

IR Evaluation (TF/IDF)

[Figure: bar chart of average precision for English, German, and Finnish, comparing Morfessor, ParaMor, ParaMor & Morfessor, McNamee, and Morfessor Baseline. A reference line marks no morphological analysis at 27.0, 30.7, and 32.0 as the chart adds English, German, and Finnish. Other values shown across the builds: 28.9, 29.3, 26.4, 38.3 (38.8 on the final build), 38.2, 32.1, 41.2, 37.2.]

Page 66

ParaMor: State-of-the-Art Unsupervised Morphology Induction System

• Combined system (ParaMor & Morfessor) among the best in Morpho Challenge 2007
• Consistent across languages
• Better than no morphology
  – Task-based (IR) measure

Page 67

Many Future Directions

• Improve performance
  – F1 of 50-60% is state-of-the-art!
  – Inflection classes
  – Morphophonology
• Beyond beads-on-a-string
