What’s in a Lexicon? Cem Bozşahin, Computer Engineering, Middle East Technical University (METU), Ankara. March 17, 2004

Cem Bozs¸ahin Computer Engineering March 17, 2004users.metu.edu.tr/bozsahin/metu04_slides.pdf · four truck-s semantics ... yeni yepyeni/yesyeni ac¸ık apac¸ık ... yet there seems

Apr 03, 2018


What’s in a Lexicon?

Cem Bozşahin

Computer Engineering

Middle East Technical University (METU), Ankara

March 17, 2004


Overview

• Examples of bracketing mismatches and phrasal scope of inflections
• Architectures for morphology-syntax-semantics interface
• Morphosyntax: with words or morphemes?
• Morphosyntactic types
• Lexical representation of free/bound morphemes
• Sample derivations of the parser (and performance)


Derivational morphology

• Bracketing mismatches were first noted in derivational morphology (Williams, 1981)

Semantic bracketing: [[Gödel number] -ing]    [[computer science] -ist]

Morphological bracketing: [Gödel [number -ing]]    [computer [science -ist]]


Verbal inflection

• The problem arises in inflectional morphology as well
• West Greenlandic (Fortescue, 1984)

Aatsaat tikeraa-nngi-laq
for.first.time visit-NEG-INDIC/3s
‘It is not the first time he has visited.’

• It does not mean ‘This is the first time he failed to visit.’


Coordination

• German (Müller, 1999)

Wenn Ihr Lust und noch nichts anderes vor-habt,
if you pleasure and yet nothing else intend

können wir sie ja vom Flughafen abholen
can we them PARTICLE from.the airport pick.up

‘If you feel like it and have nothing else planned, we can pick them up at the airport.’

• Semantics: Ihr Lust habt UND noch nichts anderes vorhabt


Subordination

• Turkish

Mehmet Ayşe’nin düzenli uyu-ma-ma-sı-na kız-ıyor
M.NOM A.-GEN regularly sleep-NEG-INF-AGR-DAT anger-TENSE
‘Mehmet is angry with Ayşe for not sleeping regularly.’
not ‘Mehmet is constantly angry with Ayşe for not sleeping.’

Mehmet Ayşe’nin kitab-ı oku-ma-sı-nı iste-di
M.NOM A.-GEN book-ACC read-INF-AGR-ACC want-TENSE
‘Mehmet wanted Ayşe to read the book.’

• Semantics: want (read book ayse) mehmet


Relativisation

• Turkish (Bozşahin, 2002)
• Local and non-local morphosyntactic requirements of the relativised noun may be different

Ben Mehmet’in çocuğ-a/*-u ver-diğ-i kitab-ı oku-du-m
I.NOM M-GEN child-DAT/*ACC give-REL.OP book-ACC read-TENSE-PERS1
‘I read the book that Mehmet gave to the child.’

Ben Mehmet’in kitab-ı ver-diğ-i çocuğ-u/*-a gör-dü-m
I.NOM M-GEN book-ACC give-REL.OP child-ACC/*DAT see-TENSE-PERS1
‘I saw the child to whom Mehmet gave the book.’


Lexemic vs. morphemic lexicons

ver-diğ-i :=

  LOCAL
    CAT
      HEAD
        AGR
          PERSON third
          NUMBER sing
        CASE dat
      SUBCAT < NP[gen], NP[acc], NP[dat] >
      MOD | MODSYN | LOCAL | CONT | INDEX
    CONTENT
      RELN give
      GIVER
      GIVEE
      GIFT
  NONLOCAL | TO-BIND | SLASH


-diğ-i :=

  LOCAL
    CAT
      HEAD noun (acc or dat)
      SUBCAT < >
    CONTENT npro
      INDEX
  NONLOCAL | INHER | SLASH


Nominal inflection

• Morphological richness of the language does not seem to be the issue
• English (Carpenter, 1997)

four truck-s
semantics: four (plu truck)

alleged thiev-es
semantics: plu (alleged thief), not alleged (plu thief)


Ki-relativisation

a. araba-da-ki
car-LOC-REL
‘the one in the car’

b. çocuğ-un ev-i-nde-ki-ler-in-ki
child-GEN house-POSS-LOC2-REL-PLU-GEN-REL
lit. ‘The one that belongs to the ones that are in the child’s house’

c. Ben ev-de-ki-ni hiç kullan-ma-dı-m
I.NOM house-LOC-REL-ACC2 never use-NEG-TENSE-PERS.1s
‘I never used the one at home.’

d. küçük ev-de-ki hediye
little house-LOC-REL present
‘the present, the one at the little house’
not ‘the little present at the house.’


Reduplication

• kırmızı, kıpkırmızı; yeşil, yemyeşil
• yeni, yepyeni/yesyeni; açık, apaçık
• Semantics is uniform, yet there seems to be no morphemic representation.
• It is actually phonology at work on a lexical item (similar processes apply in Tagalog)
• Lexical rules are unary rules; they can create new lexical items, and the semantics of the new item can be associated with the lexical rule (specified only once)

• NB: lexical rules refer to substantive categories (adjective, adverb, etc.), whereas combinatory rules use formal categories
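The idea of a unary lexical rule that carries its semantics once can be sketched as follows. This is only an illustration: the `LexItem` record, the `emph` predicate, and the lookup table are assumptions made for the example. The reduplicated forms are looked up rather than computed, because the slides treat them as phonology at work on a lexical item, not as a morpheme.

```python
from dataclasses import dataclass

# Reduplicated forms taken from the slide; they are listed, not derived,
# since the linking consonant varies from item to item.
REDUPLICATED = {
    "kırmızı": "kıpkırmızı",
    "yeşil": "yemyeşil",
    "yeni": "yepyeni",
    "açık": "apaçık",
}


@dataclass(frozen=True)
class LexItem:
    form: str
    category: str      # substantive category, e.g. "adjective"
    semantics: tuple   # nested-tuple logical form (hypothetical encoding)


def emphatic_rule(item):
    """Unary lexical rule: maps an adjective to its emphatic counterpart.

    The semantic contribution ("emph", a made-up label) is stated here once,
    not in every derived lexical item.
    """
    if item.category != "adjective" or item.form not in REDUPLICATED:
        return None
    return LexItem(REDUPLICATED[item.form], item.category,
                   ("emph", item.semantics))
```

Applying `emphatic_rule` to an entry for yeni yields a new lexical item with form yepyeni, while a noun or an unlisted adjective is left without an output.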


Resolving the mismatch

• Semantics may require affixes to have scope larger than the inflected word
• Alternatives for the morphology-syntax-semantics interface
  – Autonomous levels of morphology, syntax, and semantics (e.g. Sadock, 1991)
  – Morphosyntax-driven semantics (Heylen, 1997; Bozşahin, 2002)
• The lexicon can be morphemic in either case, but it is a combinatory morphemic lexicon in a more lexicalist approach


Inflectional morphology & linguistic theory

• GB (Anderson, 1982) and LFG (Bresnan, 1995) consider inflectional morphology to be part of syntax (in GB, it is not part of the combinatory aspects of grammar)
• MP (Chomsky, 1995) assumes words enter syntax fully inflected (numeration)
• HPSG (Pollard & Sag, 1994) keeps it in the lexicon (lexical rules, or lexical inheritance hierarchy)
• CG work in general (Hoffman, 1995; Heylen, 1997; and others) assumes word-based lexicons, although this is not a theoretical commitment


TLG and inflectional morphology

• Heylen’s (1997, 1999) unary modalities. Frau := □case □fem □sg □3p □decl N
• Morphosyntactic type assignment is to inflected forms
• Structural rules regulate the scope of inflections, e.g. □sg □case X can be turned into □case □sg X by a structural rule
• Some iterative morphological processes challenge the lexical rules for word-based type assignment (e.g. -ki in Turkish)
• A more lexical solution is to have morphemic lexicons and a morphosyntactic calculus (i.e. -ki as a lexical item)


Lexical Syntactic types

• Syntactic categories and features
  N, NP, S
  feature decorations on NP and S
• But features as such are not part of combinatorics, unlike e.g. a rule that combines Det, N and Case as categories


Syntactic calculus

Application (>): $X/Y : f \quad Y : a \;\Rightarrow\; X : f\,a$

Composition (>B): $X/Y : f \quad Y/Z : g \;\Rightarrow\; X/Z : \lambda x.\, f(g\,x)$

Type Raising (>T): $X : a \;\Rightarrow\; T/(T\backslash X) : \lambda f.\, f\,a$

Leftward Contraposition (<XP): [schema over S+t]

Rightward Contraposition (>XP): [schema over S-t]
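The forward versions of application, composition and type raising can be sketched in a few lines of Python. This is a minimal sketch, not the parser described in these slides: the `Slash` and `Sign` classes are hypothetical, categories carry no modalities or features, and the contraposition rules are left out.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Slash:
    result: "Cat"
    direction: str   # "/" looks right for its argument, "\\" looks left
    arg: "Cat"


Cat = Union[str, Slash]   # atomic categories are plain strings


@dataclass(frozen=True)
class Sign:
    cat: Cat
    sem: Any             # a constant or a Python function


def forward_apply(left, right):
    """X/Y: f   Y: a   =>   X: f a"""
    c = left.cat
    if isinstance(c, Slash) and c.direction == "/" and c.arg == right.cat:
        return Sign(c.result, left.sem(right.sem))
    return None


def forward_compose(left, right):
    """X/Y: f   Y/Z: g   =>   X/Z: lambda x. f (g x)"""
    f, g = left.cat, right.cat
    if (isinstance(f, Slash) and isinstance(g, Slash)
            and f.direction == "/" and g.direction == "/"
            and f.arg == g.result):
        return Sign(Slash(f.result, "/", g.arg),
                    lambda x: left.sem(right.sem(x)))
    return None


def forward_type_raise(sign, target):
    """X: a   =>   T/(T\\X): lambda f. f a"""
    return Sign(Slash(target, "/", Slash(target, "\\", sign.cat)),
                lambda f: f(sign.sem))
```

For example, raising an NP over S and applying the result to an S\NP verb gives the same semantics as ordinary backward application would, which is the usual motivation for the rule.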


Lexical Morphosyntactic types

• Two kinds of unary modalities on syntactic types, each tied to an inflectional type (diacritic):
  flexible domain on X: “up to a certain inflectional type”
  strict domain on X: “require a certain inflectional type”
• If the inflectional paradigm is Stem-Number-Case,
  the strict case modality on N stands for case-marked nouns
  the flexible case modality on N stands for noun stems, number-marked nouns, and case-marked nouns
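The difference between the two modalities can be made concrete for the Stem-Number-Case paradigm. The sketch below assumes the paradigm is a simple linear order and uses made-up function names; a real grammar would use a lattice rather than a list.

```python
# Linear paradigm from the slide, lowest inflectional type first (assumed order).
PARADIGM = ["stem", "number", "case"]


def flexible_admits(required, actual):
    """Flexible domain: any inflectional type up to and including `required`."""
    return PARADIGM.index(actual) <= PARADIGM.index(required)


def strict_admits(required, actual):
    """Strict domain: exactly the inflectional type `required`."""
    return actual == required


def admitted(check, required):
    """List the inflectional types that a modality with `required` accepts."""
    return [t for t in PARADIGM if check(required, t)]
```

With these definitions, `admitted(strict_admits, "case")` keeps only case-marked nouns, while `admitted(flexible_admits, "case")` also keeps stems and number-marked nouns, as in the two readings above.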

• Lattice of diacritics, with an inclusion ordering on inflectional types
• The set of basic morphosyntactic types: the flexible and the strict modality applied to X, for every diacritic in the lattice and every syntactic type X
• The set of complex morphosyntactic types: if A and B are morphosyntactic types, then so are A/B and A\B


Lattice of diacritics (inflectional types)

• Inclusion of domains is specified in a language-particular lattice
  This comes in handy for specifying morphotactics as well
• More importantly, it allows morphosyntactic types to pick semantic domains independent of surface attachment
• All of this is specified in the lexical entry:
  attachment type, morphosyntactic type, diacritic, semantic type

[Figure: lattice of diacritics (inflectional types). Nominal nodes: n-root, n-base, n-relbase, n-num, n-poss, n-comp, n-case. Verbal nodes: s-base, s-caus, s-reflex, s-recip, s-pass, s-abil, s-neg, s-imp, s-tense, s-modal, s-person. The lattice also contains the node free. Nodes carry single-letter abbreviations, e.g. n-base (b), n-case (c), free (f).]


Morphosyntactic lexicon & grammar

• -PLU := s ⊢ N\N : λx. plu x (with a modality on each N)
• Forward Application (>): X/Y : f   Y : a  ⇒  X : f a, subject to side conditions that compare, in the lattice, the diacritic on the functor’s argument type with the diacritic of the actual argument
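How a lattice condition on application can control the scope of an affix may be sketched as follows. Everything specific here is an assumption made for illustration: the three-link lattice fragment, the `Item` and `Functor` records, and the lexical entries for -s, four and toy, which are guessed from the English examples in these slides. Slash direction is ignored to keep the sketch short.

```python
from dataclasses import dataclass

# Assumed fragment of the nominal lattice: each diacritic points to the next
# larger domain.
PARENT = {"n-base": "n-num", "n-num": "n-case", "n-case": "free"}


def leq(a, b):
    """True if diacritic a is at or below diacritic b in the fragment."""
    while a != b and a in PARENT:
        a = PARENT[a]
    return a == b


@dataclass(frozen=True)
class Item:
    diacritic: str
    sem: object


@dataclass(frozen=True)
class Functor:
    mode: str    # "flexible" (up to `wants`) or "strict" (exactly `wants`)
    wants: str   # diacritic on the argument type
    gives: str   # diacritic on the result type
    head: str    # semantic constant contributed by the functor


def combine(functor, arg):
    """Apply functor to arg only if the lattice condition holds."""
    if functor.mode == "flexible":
        ok = leq(arg.diacritic, functor.wants)
    else:
        ok = arg.diacritic == functor.wants
    return Item(functor.gives, (functor.head, arg.sem)) if ok else None


# Hypothetical entries for the English examples.
PLU = Functor("flexible", "n-base", "n-num", "plu")
FOUR = Functor("strict", "n-num", "n-num", "four")
TOY = Functor("flexible", "n-base", "n-base", "toy")
```

Under these assumptions `combine(FOUR, combine(PLU, Item("n-base", "boy")))` succeeds with `("four", ("plu", "boy"))`, whereas applying `FOUR` directly to the bare stem, or `TOY` to an already pluralised noun, returns `None`.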


ev -de -ki
house -LOC -PROki

[Derivation: each morpheme is assigned a modalised N-based type; the result is an N whose semantics is built from and, at, PRO and house.]

‘The one that is in the house’


küçük ev -de -ki çocuk
little house -LOC -ADJki child

[Derivation: the result is an N whose semantics is built from and, at, little, house and child.]

‘The child, the one at the little house’


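four boy -s

[Derivation 1: boy and -s combine first, giving N : plu boy; four then applies, giving N : four(plu boy).]

four boy -s

[Derivation 2: four and boy would combine first (N : four boy) and -s would then apply (*plu(four boy)); the derivation is blocked (***) because n-base does not meet the n-num requirement.]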


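toy gun -s

[Derivation 1: gun and -s combine first, giving N : plu gun; toy cannot then apply, so *toy(plu gun) is blocked (***) because n-num does not meet the n-base requirement.]

toy gun -s

[Derivation 2: toy and gun combine first, giving N : toy gun; -s then applies, giving N : plu(toy gun).]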


Aatsaat tikeraa -nngi -laq
for.the.first.time visit -NEG -INDIC

[Derivation: the morphemes carry modalised verbal types over S and NP, and the result is again a verbal type over S and NP.]

‘This is not the first time he visited.’


Aatsaat tikeraa -nngi
for.the.first.time visit -NEG

[Derivation: blocked (***) because the lattice condition on the diacritics fails.]


Çocuk kız-a kalem-i ver -me -yi unut-tu
child.NOM girl-DAT pen-ACC give -SUB1i -ACC forgot

[Derivation: uses type raising (T) and composition (B); the result is S with semantics forget (give girl pen (ana child)) child.]

‘The child forgot to give the pen to the girl.’


çocuğ-un kitab-ı ver -diği adam uyu-du
child-GEN book-ACC give -OP.AGR man sleep-TENSE

[Derivation: uses type raising (T) and composition (B) over NP, TV, DV, IV and N types; the result is S with semantics and (sleep man) (give man book child).]

‘The man to whom the child gave the book slept.’


completely destroy -ed

[Derivation: two composition steps (B) yield a two-argument verbal type over S and NP.]

*completely did destroy

[Derivation: blocked (***) after a composition step (B) because the lattice condition on the diacritics fails.]


did destroy completely

[Derivation: two composition steps (B) yield a two-argument verbal type over S and NP.]

destroy -ed completely

[Derivation: after two composition steps (B), the outcome is marked ??? on the slide.]


Experiments with the CKY parser

• A 21-morpheme sentence (12 words) parsed in 2.9 seconds; a 37-morpheme sentence (20 words) in 40 seconds
• Güngördü & Oflazer’s LFG parser takes 10 seconds/sentence with a 24,000-word lexicon
• Separate morphological analyzers deliver 2 to 5 analyses/second (Oflazer, 1996; Komagata, 1997)
• 2.8 morphemes/word on average, including derivations (Turkish); fewer than 2 inflections/word (Oflazer et al., 2001)
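Why parse time grows quickly with the number of morphemes can be seen from the shape of a CKY chart, which has one cell per span and tries every split point of every span. The recogniser below is a bare-bones sketch under strong assumptions: categories are atomic strings or `(result, slash, argument)` tuples, only forward and backward application are used, and the lattice conditions, composition and semantics are omitted. It is not the parser used in the experiments.

```python
def combine(a, b):
    """Return the categories derivable from adjacent a and b by application."""
    out = set()
    if isinstance(a, tuple) and a[1] == "/" and a[2] == b:
        out.add(a[0])          # forward application: X/Y  Y  =>  X
    if isinstance(b, tuple) and b[1] == "\\" and b[2] == a:
        out.add(b[0])          # backward application: Y  X\Y  =>  X
    return out


def cky(cats, combine_fn=combine):
    """Fill a CKY chart over one category per morpheme; return the top cell."""
    n = len(cats)
    chart = {(i, i + 1): {c} for i, c in enumerate(cats)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            k = i + span
            cell = set()
            for j in range(i + 1, k):          # every split point of the span
                for a in chart[(i, j)]:
                    for b in chart[(j, k)]:
                        cell |= combine_fn(a, b)
            chart[(i, k)] = cell
    return chart[(0, n)]
```

Because the chart is indexed by morphemes and not by words, a 20-word sentence with 37 morphemes fills a chart nearly twice as wide as a word-based parser would need.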


Sample text type           Number of items in text    Avg. number of parses     Avg. CPU time per test (milliseconds)
                                                      per grammatical input
                           tests   words   morphs     PAS check   NF parse      Unrestr.   PAS check   NF parse

Word order & case            58     216     384         1.26        3.68           39         39          30
Subordination                14      70     137         3.00        5.09          267        270         180
Relativisation               23     130     232         2.04        2.32          796        783         266
Control verbs                33     147     291         1.42        3.34          166        163         137
Possessives & compounds      26     109     200         1.23        2.47          137        135          98
Adjuncts                     14      57     100         1.12        4.87           89         88          72
-ki relatives                24      66     179         1.07        1.54           36         36          35
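Two derived quantities make the table easier to read: morphemes per word in each test set, and the ratio of unrestricted to normal-form (NF) parse time. The sketch below copies the figures from the table; reading the last three columns as unrestricted, PAS-check and NF-parse times is an assumption about the flattened header, and the function name is made up.

```python
# (tests, words, morphs, unrestricted_ms, nf_parse_ms) per sample text type.
# The assignment of figures to columns is an assumed reading of the table.
ROWS = {
    "Word order & case": (58, 216, 384, 39, 30),
    "Subordination": (14, 70, 137, 267, 180),
    "Relativisation": (23, 130, 232, 796, 266),
    "Control verbs": (33, 147, 291, 166, 137),
    "Possessives & compounds": (26, 109, 200, 137, 98),
    "Adjuncts": (14, 57, 100, 89, 72),
    "-ki relatives": (24, 66, 179, 36, 35),
}


def summarize(rows):
    """Map each test set to (morphemes per word, unrestricted/NF time ratio)."""
    out = {}
    for name, (_tests, words, morphs, unrestricted, nf) in rows.items():
        out[name] = (round(morphs / words, 2), round(unrestricted / nf, 2))
    return out
```

On this reading, relativisation is where normal-form parsing pays off most, while the -ki relatives have the highest morpheme-to-word ratio but almost no difference in parse time.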


Conclusion

• The key to integration of inflectional morphology and syntax is granting representational status to morphemes
• Morphosyntactic mismatches do not necessitate multi-tiered grammars
• Lexical items can be smaller or larger than words, and project their own semantic domains and attachment characteristics
• Loss of efficiency is tolerable up to medium-length sentences
• Modular grammar-lexicon (in fact, just the lexicon!)