Page 1
NooJ 2008, Budapest, 2008-06-08
Verb Valency Enhanced Croatian Lexicon
Kristina Vučković, Nives Mikelić Preradović, Zdravko Dovedan
[email protected], [email protected], [email protected]
Department of Information Sciences
Faculty of Humanities and Social Sciences, University of Zagreb
Ivana Lučića 3, Zagreb, Croatia
Page 2
The Plan
Our agenda?
- Increase the number of unambiguous NPs
By means of?
- Existing chunker
- Verb valency tags
Why?
- To raise the chunker performance to a higher level
- To make preparations for a Croatian parser
Page 3
Overview
Croatian verb valency lexicon
- main characteristics
- selected data
.xml to .dic conversion
- how we did it
previous grammars for
- <VP> | <NP> | <PP> selection
new enhanced grammars
- <VP+DCobl>
- <VP+PCobl>
- <VP+PCtyp>
results comparison
- precision, recall, f-measure
Page 4
Croatian verb valency lexicon - CROVALLEX
Formal description of verb valency frames
- 1739 verbs selected from the Croatian frequency dictionary (1999)
- 5118 valency frames (on average, 3 frames per verb)
Each frame entry contains
- a description of the valency frame
- frame attributes
Frame attributes are either obligatory or optional, i.e. obligatory or typical.
Page 5
Selected data
1. Reflexive particle ‘se’, recorded if the verb is
- a derived reflexive (e.g. vratiti se)
- a reflexivum tantum (e.g. smijati se)
Page 6
Selected data
2. Pure (prepositionless) case. Croatian has 7 morphological cases, coded as:
0 - hidden nominative, 1 - nominative, 2 - genitive, 3 - dative, 4 - accusative, 5 - vocative, 6 - locative, 7 - instrumental.
Page 7
Selected data
3. Prepositional case.
The lemma of the preposition and the number of the required morphological case are specified, e.g. od+2, na+4, o+6.
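The case codes and the preposition+number notation can be decoded mechanically. The following is a minimal sketch, not part of CROVALLEX itself: the names `CASES` and `decode_case` are hypothetical, and the code assumes a specification is either a bare case number (pure case) or `preposition+number` (prepositional case), as in the examples on these slides.

```python
# Case numbering as listed on the "Pure (prepositionless) case" slide.
CASES = {
    0: "hidden nominative",
    1: "nominative",
    2: "genitive",
    3: "dative",
    4: "accusative",
    5: "vocative",
    6: "locative",
    7: "instrumental",
}


def decode_case(spec):
    """Decode '4' (pure case) or 'od+2' (prepositional case).

    Returns (preposition_or_None, case_name). Hypothetical helper,
    assuming the case number always follows the last '+'.
    """
    preposition, sep, number = spec.rpartition("+")
    case_name = CASES[int(number)]
    return (preposition if sep else None, case_name)


for spec in ("od+2", "na+4", "o+6", "4"):
    print(spec, "->", decode_case(spec))
```

A bare number yields `None` for the preposition, so the same helper covers both the pure and the prepositional case.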
Page 8
CROVALLEX 2.0008 - *.xml
pjevati,aspect=inf+DC_obl=0+AL_typ+PC_obl=6+…
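To illustrate how an entry such as the `pjevati` line can be read, here is a minimal sketch that splits it into a lemma and a feature map. It follows only the shape visible on the slide (lemma, comma, then `+`-separated `key=value` pairs or bare flags); `parse_entry` is a hypothetical helper, not a NooJ function, and it assumes that feature values themselves contain no `+`. The trailing `…` is treated as an elision and skipped, not expanded.

```python
def parse_entry(line):
    """Split 'lemma,feat=val+flag+...' into (lemma, features).

    Hypothetical helper based on the entry shown on the slide;
    it does not implement NooJ's full dictionary syntax.
    """
    lemma, _, rest = line.partition(",")
    features = {}
    for item in rest.split("+"):
        if not item or item == "…":
            continue  # skip empty pieces and the elision marker
        key, sep, value = item.partition("=")
        # Bare flags such as AL_typ carry no value; store True for them.
        features[key] = value if sep else True
    return lemma, features


lemma, features = parse_entry("pjevati,aspect=inf+DC_obl=0+AL_typ+PC_obl=6+…")
print(lemma)
print(features)
```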
Page 9
Converting to *.dic
Page 10
Previous grammars
Page 11
Perfect
Page 12
II. Future
Page 13
Page 14
New Grammars
Page 15
Verb + obligatory DC
Page 16
Verb + obligatory PC
Page 17
Verb + typical PC
Page 18
VP+DCobl=
Page 19
VP+DCobl=Genitiv
Page 20
VP+DCobl=Dativ
Page 21
<VP>+<NP+N> agreement
Page 22
Results
                        By hand   Before CROVALLEX   After CROVALLEX
# of NP                 1150      1099               1070
# of T unambiguous NP             601                729
# of ambiguous NP                 437                246+49
# of F unambiguous NP                                26+20
Page 23
P-R-F for unambiguous NPs
            Before CROVALLEX   After CROVALLEX
Precision   33.31              68.13
Recall      52.26              63.39
F-measure   40.69              65.68
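As a worked check, the After CROVALLEX figures can be recomputed from the counts on the Results slide. This is a sketch under an assumed reading of those counts, not the authors' stated procedure: precision is taken as correct unambiguous NPs over all NPs found by the chunker, recall as correct unambiguous NPs over the hand-annotated NPs, and F-measure as their harmonic mean. The function name `prf` is hypothetical. This reading reproduces the After CROVALLEX column and the Before CROVALLEX recall (601 / 1150); the Before CROVALLEX precision does not follow from the counts in the same way and is left as reported.

```python
def prf(correct, retrieved, gold):
    """Return (precision, recall, F-measure) as percentages.

    Assumed reading of the Results slide:
      correct   = # of T unambiguous NPs
      retrieved = # of NPs found by the chunker
      gold      = # of NPs annotated by hand
    """
    precision = 100.0 * correct / retrieved
    recall = 100.0 * correct / gold
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# After CROVALLEX: 729 correct unambiguous NPs, 1070 NPs found, 1150 by hand.
p, r, f = prf(correct=729, retrieved=1070, gold=1150)
print(f"Precision {p:.2f}  Recall {r:.2f}  F-measure {f:.2f}")
```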
Page 24
Future work
- Subordinating conjunction.
- Infinitive construction: can appear with a preposition (e.g. 'nego+inf') or with a morphological case (e.g. 'inf+4').
- Construction with adjectives, e.g. adj-7 ('Osjećam se osvježenim' - 'I feel fresh').
- Construction with adverbs, e.g. adv-hrabro ('Osjećam se hrabro' - 'I feel brave').
- Construction with nominative predicate, e.g. nom_pred ('Historija je postala legendom' - 'History has become a legend').