Russian Module for NooJ: design and implementation
Conception and realization of grammatical & lexical resources for the Russian language for Max Silberztein’s NooJ software
NOOJ Conference, Inalco, Paris, June 16th, 2012
Vincent BÉNET, INALCO, CREE, Recherche assistée par ordinateur
Description of the realization: dictionaries / paradigms / grammars
Work left to be done…
Writing lexical resources for the Russian language
Build dictionaries from texts
Create one « small » dictionary and many grammars for derivational forms
раб + а (slave)
раб + от + а + ть (work)
за + раб + от + к + а (salary)
Complete one « big » existing dictionary and create many grammars
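The "small dictionary plus derivational grammars" option can be pictured with a minimal Python sketch. It only concatenates the morphemes listed on the slide around the root раб; the names `ROOT`, `DERIVATIONS` and `build` are hypothetical and have nothing to do with NooJ's own grammar formalism.

```python
# Hypothetical illustration: none of these names belong to NooJ.
ROOT = "раб"

# For each gloss: (morphemes before the root, morphemes after the root),
# copied from the three examples on the slide.
DERIVATIONS = {
    "slave": ([], ["а"]),
    "work": ([], ["от", "а", "ть"]),
    "salary": (["за"], ["от", "к", "а"]),
}


def build(root, prefixes, suffixes):
    """Concatenate prefixes, root and suffixes into one surface form."""
    return "".join(prefixes) + root + "".join(suffixes)


forms = {
    gloss: build(ROOT, prefixes, suffixes)
    for gloss, (prefixes, suffixes) in DERIVATIONS.items()
}

for gloss, form in forms.items():
    print(gloss, form)
```

A real derivational grammar would also have to constrain which affixes may combine; this sketch assumes the combinations are simply listed by hand.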
Writing lexical resources for the Russian language
ZALIZNIAK’s grammatical dictionary: 96 000 entries
Complete dictionary, in inverted alphabetical order, with all grammatical annotation
To obtain, to reach:
Достигать нсв нп 1a$3 (достигнуть//достичь) имеется страд
Dostigat’ ipf nt 1a$3 (dostignut’/dostich’) has a passive form
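To make the structure of such an entry concrete, here is a minimal Python sketch that splits the example line into its parts. The field layout is inferred from this single example only, the two glosses (нсв = ipf, нп = nt) are the ones given in the transliterated line, and `parse_entry` is a hypothetical helper; real entries of the dictionary are more varied than this handles.

```python
import re

# Glosses taken from the slide's own transliterated line.
GLOSSES = {"нсв": "ipf", "нп": "nt"}

ENTRY = "Достигать нсв нп 1a$3 (достигнуть//достичь) имеется страд"


def parse_entry(line):
    """Split one entry into lemma, glossed tags, inflection index and variants."""
    lemma, _, rest = line.partition(" ")
    # Related forms are assumed to sit in parentheses, separated by "//".
    groups = re.findall(r"\(([^)]*)\)", rest)
    variants = groups[0].split("//") if groups else []
    tokens = re.sub(r"\([^)]*\)", " ", rest).split()
    tags = [GLOSSES[t] for t in tokens if t in GLOSSES]
    # Assumption: the first token starting with a digit is the inflection index.
    index = next((t for t in tokens if t[0].isdigit()), None)
    return {"lemma": lemma, "tags": tags, "index": index, "variants": variants}


parsed = parse_entry(ENTRY)
print(parsed)
```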
Writing lexical resources for the Russian language
Encountered problems
- Classification complete but some tags are absent (V, N…)
- Classification based on accent markers
- Many informal, unclassified added annotations
The problem of accent markers was deferred
Zalizniak’s dictionary was re-sorted; its classification was modified, simplified and completed for computer use
The design of lexical resources for the Russian language has consisted in:
1. creating tags and properties
2. recoding the dictionary with these tags
3. sorting the dictionary (inverted alphabetical order for each word)
4. fixing a list of paradigm models (karta instead of zh1a)
5. writing paradigms
6. problem with ë / e
7. allocating models to the words
8. verifying the results
9. testing with texts
10. correcting and proofreading
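The sorting step and the ë / e issue from the list above can be sketched together in Python. The slide does not say how ë / e was handled; the sketch assumes one common approach, folding ё to е before comparison, and it relies on plain code-point order, which matches Russian alphabetical order only for lowercase а to я. The names `fold_yo` and `inverted_key` and the word list are hypothetical.

```python
def fold_yo(word):
    """Treat ё as е, since running text frequently prints е for both."""
    return word.replace("ё", "е").replace("Ё", "Е")


def inverted_key(word):
    """Sort key for inverted order: compare words from their last letter."""
    return fold_yo(word.lower())[::-1]


# Hypothetical sample; any word list would do.
words = ["карта", "работа", "ёлка", "раб", "парта"]
inverted = sorted(words, key=inverted_key)
print(inverted)
```

Sorting on the reversed string groups words that share an ending, which is what makes it practical to allocate one paradigm model to many words at once.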
Writing lexical resources for Russian
1. Creating tags and properties N, A, V, ADV ….
A_Forme = fc | fl | adv;
A_Genre = m | f | n ;
A_SGenr = an | inan ;
A_Nombre = s | p;
A_Cas = Im | Vi | Ro | Da | Tv | Pr | Zv;
A_Deg = Comp | Sup ;
ADV_Deg = Comp;
V_Pers = 1 | 2 | 3 ;
V_Asp = Ipf | Pf ;
V_Type = Mvt ;
V_Morph = Pvb | Simp | Sufx | PvbSufx ;
V_SsAsp = Det | Indet ;
V_Temps = Pre | Pa | Fu ;
V_Mode = Inf | Ind | Imp | Cond | Ger | Prtp ;
V_Voix = Act | Pss ;
V_Genre = m | f | n ;
V_Nombre = s | p ;
V_Constr = intr | tr | sja ;
V_Cas = Im | Vi | Ro | Da | Tv | Pr ;
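Definitions of this kind lend themselves to a simple consistency check on the recoded entries. The Python sketch below reads a few of the definitions above into a table and reports feature values that are not declared. It only approximates the syntax shown on the slide and is not NooJ's own parser; `parse_definitions` and `invalid_features` are hypothetical helpers.

```python
# A few definitions copied from the slide.
DEFINITIONS = """
V_Pers = 1 | 2 | 3 ;
V_Asp = Ipf | Pf ;
V_Temps = Pre | Pa | Fu ;
V_Nombre = s | p ;
"""


def parse_definitions(text):
    """Map each property name (e.g. 'V_Asp') to its set of allowed values."""
    table = {}
    for statement in text.split(";"):
        if "=" not in statement:
            continue
        name, values = statement.split("=", 1)
        table[name.strip()] = {value.strip() for value in values.split("|")}
    return table


def invalid_features(category, features, table):
    """Return the property names whose value is not declared for the category."""
    bad = []
    for prop, value in features.items():
        allowed = table.get(f"{category}_{prop}")
        if allowed is None or value not in allowed:
            bad.append(prop)
    return bad


TABLE = parse_definitions(DEFINITIONS)
print(invalid_features("V", {"Asp": "Ipf", "Temps": "Pre"}, TABLE))
print(invalid_features("V", {"Asp": "Perf"}, TABLE))
```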
Writing lexical resources for Russian
2. Recoding the dictionary
3. Sorting the dictionary to get inverted alphabetical ordering
- useless words: source of unnecessary ambiguities
  the names of letters: а, б, в, и, к, о, с, у, я
  archaic unused words
- repetitions of the same word in different parts of speech (adjectives / nouns; adjectives / pronouns; interjections / particles / parentheticals)
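The two clean-up checks above can be sketched in Python as follows. The sample entries and category labels are invented for illustration, and the sketch assumes that letter names are coded as nouns (N), which the slide does not state; `drop_letter_names` and `multi_category` are hypothetical helpers.

```python
from collections import defaultdict

# Letter names listed on the slide.
LETTER_NAMES = set("абвикосуя")

# Hypothetical (lemma, category) pairs; CONJ is an invented label.
entries = [
    ("а", "N"),
    ("а", "CONJ"),
    ("больной", "A"),
    ("больной", "N"),
    ("карта", "N"),
]


def drop_letter_names(items):
    """Remove noun entries that are only the name of a letter."""
    return [(lemma, cat) for lemma, cat in items
            if not (cat == "N" and lemma in LETTER_NAMES)]


def multi_category(items):
    """List lemmas that occur under more than one part of speech."""
    categories = defaultdict(set)
    for lemma, cat in items:
        categories[lemma].add(cat)
    return {lemma: sorted(cats) for lemma, cats in categories.items()
            if len(cats) > 1}


kept = drop_letter_names(entries)
print(kept)
print(multi_category(kept))
```

The second function only reports candidates; deciding which of the repeated entries to keep remains a linguistic judgment.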
Increase the number of different models?
To avoid generating unexpected or incongruous forms or failing to recognize existing forms.
Читав? Čitav?
Пиша? Piša?
Счастие? Sčastie?
Suppress word entries and / or forms?