Structural and Applied Linguistics (Структурная и прикладная лингвистика). Issue 9. ISSN 0202-2400

EXPERIMENTS ON AUTOMATIC WORD SENSE DISAMBIGUATION AND CONSTRUCTION IDENTIFICATION

Apr 23, 2023


SAINT PETERSBURG STATE UNIVERSITY

STRUCTURAL AND APPLIED LINGUISTICS

Interuniversity collection

Issue 9

UDC 80+618.31
BBK 81.1
S83

Editorial board: Prof. L. N. Belyaeva, Prof. A. S. Gerd (executive editor), Prof. O. N. Grinbaum, Prof. M. A. Marusenko

Secretary of the editorial board: V. I. Rubiner

Reviewer: Cand. Sc. (Philology), Assoc. Prof. I. P. Pankov

Published by decision of the Editorial and Publishing Council of the Faculty of Philology, St. Petersburg State University

Structural and Applied Linguistics. Issue 9: Interuniversity collection / Ed. by A. S. Gerd. St. Petersburg: St. Petersburg University Press, 2012. 356 pp.

The collection (Issue 8 appeared in 2010) contains articles on a wide range of problems in theoretical and applied linguistics and on the application of mathematical methods in linguistics.

For specialists in the theory of language and in applied and theoretical linguistics.

© St. Petersburg State University, 2012

O. A. Mitrofanova, O. N. Lyashevskaya, M. A. Grachkova, A. S. Shimorina, A. S. Shurygina, S. V. Romanov

EXPERIMENTS ON AUTOMATIC WORD SENSE DISAMBIGUATION AND CONSTRUCTION IDENTIFICATION (based on the Russian National Corpus)*

Abstract. The present study aims at the automatic extraction of linguistic information from contexts in the Russian National Corpus (RNC), with subsequent use of the data in building a comprehensive lexicographic resource, a catalogue of Russian constructions. The proposed approach relies on automatic context classification aimed at automatic word sense disambiguation (WSD) and construction identification (CxI). The context classification procedure takes into account the following types of contextual information represented in the multilevel annotation of the RNC: lexical tags (lemma tags) (lex), morphological tags (gr), lexical-semantic tags (sem), and combinations of different tag types. Series of WSD and CxI experiments were carried out on representative samples of RNC contexts. Each series analyzes (1) the various contextual markers of target word senses and (2) the constructions that comprise contextual markers and target words.

Keywords: word sense disambiguation, constructions, construction identification, Russian National Corpus, context classification

* This work was supported by the Russian Foundation for Basic Research (RFBR, project 10-06-00586-a), by the "Corpus Linguistics" program of fundamental research of the Presidium of the Russian Academy of Sciences (the FrameBank project), and by the research project "A model of an integrated software and linguistic complex for building specialized corpora of Russian".

© O. A. Mitrofanova, O. N. Lyashevskaya, M. A. Grachkova, A. S. Shimorina, A. S. Shurygina, S. V. Romanov, 2012

O. A. Mitrofanova, O. N. Lyashevskaya, M. A. Grachkova, A. S. Shimorina, A. S. Shurygina, S. V. Romanov

EXPERIMENTS ON AUTOMATIC WORD SENSE DISAMBIGUATION AND CONSTRUCTION IDENTIFICATION

(Based on the Russian National Corpus)

Summary. The research project reported in this paper aims at automatic extraction of linguistic information from contexts in the Russian National Corpus (RNC) and its subsequent use in building a comprehensive lexicographic resource, the Index of Russian lexical constructions. The proposed approach relies on automatic context classification intended for word sense disambiguation (WSD) and construction identification (CxI). The automatic context processing procedure takes into account the following types of contextual information represented in the RNC multilevel annotation: lexical (lemma) tags (lex), morphological tags (gr), lexical-semantic (taxonomy) tags (sem), and combinations of the various types of tags. Multiple experiments on WSD and CxI are performed using representative RNC context samples. In each series of experiments we analyze (1) different context markers of the meanings of target words and (2) constructions that include context markers and target words.

Keywords: Word Sense Disambiguation, constructions, construction identification, Russian National Corpus, context classification

1. Introduction

The project within which this study was carried out is a joint effort of the Russian National Corpus (RNC) team and the Department of Mathematical Linguistics of St. Petersburg State University. The goal of the project is the automated construction of an electronic catalogue of Russian constructions on the basis of the RNC. A construction is understood here as a combination of a target word and the contextual markers of its meaning that is both frequent and stable. The contextual markers considered are lemma tags (lex), morphological tags (gr) and lexical-semantic tags (sem), all available in the multilevel annotation of RNC contexts. This understanding of a construction is also consistent with the main ideas of Construction Grammar (Fillmore 1988; Goldberg 1995; 2006; Tomasello 2003; Кузнецова 2007).

Since constructions are identified according to the principle of "bottom-up generalization", contexts such as верные мне люди ('people faithful to me') are matched against a hierarchical list of templates, including: верный + SPRO;r:pers;dat, верный + SPRO;r:pers;dat + человек, верный + (SPRO;r:pers)|(S;t:hum);dat + S;t:hum, верный + dat + S, верный + S. The first four are characteristic of верный in the sense 'reliable, firm, steadfast, devoted' (a qualitative adjective denoting a human quality, with a positive evaluation), whereas the last template also occurs in contexts with other senses of the adjective.

For example, the following are treated as constructions: combinations of the word вид in the sense 'a subdivision in taxonomy that belongs to a higher division, the genus; variety, type' with right-hand collocates such as спорт (r:abstr t:sport) and деятельность (r:abstr der:v); and combinations of the word вид in the sense 'appearance, visible aspect; state' with left-hand collocates such as внешний (r:rel t:place der:adv), делать (d:root), сделать (d:pref | t:impact:creat t:be:appear ca:caus), etc. A construction is thus a linguistic object that concentrates heterogeneous information which makes it possible to recognize and distinguish the senses of a polysemous word. This is why our study combines two tasks of computational semantics: automatic word sense disambiguation (WSD) and construction identification (CxI) (Митрофанова, Паничева, Ляшевская 2008; Митрофанова, Ляшевская, Паничева 2008; Mitrofanova, Panicheva, Lashevskaya 2008; Mitrofanova, Lyashevskaya 2009; Митрофанова, Грачкова, Шиморина, Ляшевская 2010; Shimorina, Grachkova 2011; Automatic Word ... 2011, etc.).

Fairly effective methods of word sense disambiguation in semi-automatic or automatic mode are known (Mihalcea, Pedersen 2005; Word Sense Disambiguation.. 2007; Navigli 2009). Methods of the first type rely on computer thesauri (WordNet, http://wordnet.princeton.edu/; FrameNet, http://framenet.icsi.berkeley.edu/) and formal ontologies as sources of information about word senses. Methods of the second type are based on statistical data about the contextual environment of words, which makes it possible to distinguish their uses in different senses (Schutze 1998; Pedersen 2002). There are also hybrid approaches that combine lexicographic and statistical data (Leacock, Chodorow, Miller 1998; Mihalcea 2002). Separate studies have been devoted to establishing the parameters of word sense disambiguation (Yarowsky, Florian 2002).

Both types of methods have been tried on Russian data. The use of a powerful electronic lexicographic resource (RuThes (Лукашевич, Чуйко 2007), the RNC Semantic Dictionary (Рахилина, Кобрицов, Кустова, Ляшевская, Шеманаева 2006; Kustova, Lashevskaja, Paducheva, Rakhilina 2009)) ensures a high level of word sense disambiguation. If, however, dictionary support has to be dispensed with (for example, when large volumes of text are processed and their vocabulary is not covered by the dictionaries available to the researchers), statistical methods should be preferred. Word sense disambiguation is fairly reliable when based on comparing the distributions of part-of-speech tags in the contextual environment of words (Азарова, Марина 2006; Азарова, Бичинева, Вахитова 2008) and when based on lexical markers of contexts (Кобрицов, Ляшевская, Паничева 2005). The thesaurus-based and statistical approaches can also be combined, drawing on dictionary information about word combinability patterns (Толдова, Кустова, Ляшевская 2008). According to the data obtained in our project, statistical disambiguation that takes into account the distribution of lexical-semantic tags in contexts proves more effective. Experiments of this kind were carried out for the first time within the project discussed here; no such studies had previously been conducted on Russian corpus data.

Methods and algorithms for construction identification are less developed than those for automatic word sense disambiguation and are currently the subject of extensive discussion (Sahlgren, Knutsson 2009; Proceedings of the NAACL... 2010; Wible, Tsao 2010). The greatest progress has been made in the extraction of n-grams (collocations, multiword units; cf. Manning, Schutze 2002; Ягунова, Пивоварова 2011). However, idiomatic constructions and constructions with non-standard syntactic structure, although described in detail in the research literature (Борисова 1995; Иорданская, Мельчук 2007), remain a serious problem for automatic text processing. At the same time, there are a number of projects that pay special attention to formalizing the lexical-syntactic relations between text units, including studies on Russian data: WordSketches for Russian (Захаров, Хохлова 2010), work on the extraction of lexical-syntactic patterns (Большакова, Баева, Бордаченкова, Васильева, Морозов 2007), and work on the automatic compilation of combinability dictionaries (Гельбух, Сидоров, Эрнандес-Рубио, Чубукова 2004).

The linguistic data and software solutions obtained in the course of this project make it possible to create a catalogue of Russian constructions correlated with particular senses of target words. This paper discusses one of the most important aspects of the project, namely the construction of construction templates on the basis of lexical, morphological and lexical-semantic contextual markers of target word senses.

2. Linguistic data

The experiments on word sense disambiguation and construction identification are carried out on RNC data (http://www.ruscorpora.ru/). The contexts of the main RNC subcorpus, from which the samples were drawn, carry three types of annotation: lemma tags (lex: the lexeme to which a word form belongs), morphological tags (gr: grammatical features of word forms, such as part of speech, values of grammatical categories, etc.), and lexical-semantic tags (sem: features indicating that a word belongs to a particular lexical-semantic class, for example person, substance, space, speed, motion, possession, human property; for details see http://www.ruscorpora.ru/corpora-sem.html).

The research group focuses on the Russian nouns дом, вид, орган, лук, глава, etc., the adjectives близкий, верный, etc., and the verbs прописать, справиться, занести, заносить, etc. The lexemes analyzed differ in the number of senses, in the way polysemy or homonymy has developed, and in how closely the senses are related to one another. We adopt the treatment of ambiguity accepted in computational linguistics, which allows homonymous correlates to be conventionally equated with polysemous words (Рахилина, Кобрицов, Кустова, Ляшевская, Шеманаева 2006). Word senses in RNC contexts were annotated on the basis of the RNC Semantic Dictionary.

The experiments were run only for senses represented in the RNC by a sufficient number of contexts (at least 10). For example, the following low-frequency senses were excluded from consideration: for the word дом, 'a place where people united by common interests or living conditions live' and 'dynasty, lineage'; for the word глава, 'dome of a church', which occurred in the sample in only one context. In the training samples lexical-semantic ambiguity was resolved manually; in all other cases this was done automatically.

3. Software for the experiments

The WSD and CxI software tool performs automatic classification of contexts aimed at word sense disambiguation and construction identification. These procedures are carried out with software developed by S. V. Romanov in Python. The WSD and CxI tool builds a vector model of the experimental sample; a supervised classification algorithm was chosen as the base algorithm. The program works in two modes: (1) forming classes of contexts that correspond to individual senses of the target word; (2) generating lists of the most frequent constructions in which a given sense of the target word is realized. During automatic processing of contexts, the different tag types present in the multilevel RNC annotation are taken into account: lex, gr and sem tags and combinations of tags of different types (lex+gr, lex+sem, sem+gr, lex+sem+gr). Experimental parameters such as the width of the context window [-l; +r] and processing with or without weights for contextual elements can be varied. The tool also provides additional statistical data.
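How the tag types and the context window [-l; +r] interact can be illustrated with a short sketch. This is not the tool described in the paper: the token layout (a dictionary with `lex`, `gr` and `sem` lists), the function name `window_features` and the sample tokens are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence

# Hypothetical token layout: each token carries lists of lex, gr and sem tags.
Token = Dict[str, List[str]]


def window_features(tokens: Sequence[Token], target: int, left: int, right: int,
                    tag_types: Sequence[str] = ("lex", "sem")) -> List[str]:
    """Collect tags of the chosen types from the window [-left; +right]."""
    start = max(0, target - left)
    stop = min(len(tokens), target + right + 1)
    features: List[str] = []
    for position in range(start, stop):
        if position == target:
            continue  # the target word itself is not a contextual marker
        for tag_type in tag_types:
            for tag in tokens[position].get(tag_type, []):
                features.append(f"{tag_type}:{tag}")
    return features


# Illustrative context "глава региона заявить"; the gr tags are simplified.
context = [
    {"lex": ["глава"], "gr": ["S"], "sem": ["r:concr", "t:hum"]},
    {"lex": ["регион"], "gr": ["S"], "sem": ["r:concr", "t:space", "pt:part", "pc:space"]},
    {"lex": ["заявить"], "gr": ["V"], "sem": ["t:speech"]},
]
features = window_features(context, target=0, left=1, right=1)
```

Switching `tag_types` to `("lex", "sem", "gr")` would correspond to the lex+sem+gr combination; element weights are left out of this sketch.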

The linguistic information is analyzed as follows. At the preprocessing stage, the number of contexts for each sense of the target word is determined in the experimental sample. For each sense a training sample is formed (randomly selected disambiguated contexts in which the sense in question is realized) and a test sample (contexts for which automatic disambiguation is performed without any a priori linguistic information). At the machine learning stage, statistical images of the senses of the target word are built. A sense image is a vector in a vector space whose coordinates are determined by the frequencies of lex, gr or sem tags in the training sample. The distributions of tags of the different types in the sample are established. At the pattern recognition stage, test contexts are represented as vectors in the same vector space. The distance between each context vector and each of the sense images is measured. The similarity measure chosen is Cos(v1, v2), the cosine of the angle between a context vector and a sense image, see formula (1):

$$\mathrm{Cos}(v_1, v_2) = \frac{v_1 \cdot v_2}{\lVert v_1 \rVert \, \lVert v_2 \rVert} \qquad (1)$$

The image closest to the context vector is selected, and the target word in the context is assigned the sense of that nearest image.
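The nearest-image classification described above can be sketched in a few lines of Python. The sketch assumes that sense images are plain tag-frequency vectors and that formula (1) is the standard cosine; the function names and the toy training data are hypothetical and are not taken from the authors' tool.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def build_sense_image(training_contexts: Iterable[List[str]]) -> Counter:
    """Sum tag frequencies over all training contexts of one sense."""
    image: Counter = Counter()
    for features in training_contexts:
        image.update(features)
    return image


def cosine(v1: Counter, v2: Counter) -> float:
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(v1[key] * v2[key] for key in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(x * x for x in v1.values()))
    norm2 = math.sqrt(sum(x * x for x in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty vector is treated as having no similarity
    return dot / (norm1 * norm2)


def classify(context_features: List[str], images: Dict[str, Counter]) -> str:
    """Assign the sense whose image is closest to the context vector."""
    vector = Counter(context_features)
    return max(images, key=lambda sense: cosine(vector, images[sense]))


# Toy training data (illustrative only).
images = {
    "m4": build_sense_image([["lex:государство", "sem:t:space"],
                             ["lex:регион", "sem:t:space"]]),
    "m2": build_sense_image([["lex:во", "lex:с"],
                             ["lex:во", "lex:председатель"]]),
}
predicted = classify(["lex:город", "sem:t:space"], images)
```

Here the unseen lemma город still lands in sense m4 because it shares the `t:space` tag with the training contexts, which is the kind of generalization that sem tags make possible.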

Next, the quality of word sense disambiguation is checked: the results of automatic and manual processing of contexts are compared, and precision P (the share of contexts in the test sample for which the sense of the target word was recognized correctly) and recall R (the share of contexts in the test sample for which a decision, correct or erroneous, was made) are evaluated.
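The two measures can be computed as below. The sketch follows the wording of the definitions literally, so both P and R are taken as shares of the whole test sample; that choice of denominators, the use of `None` for "no decision" and the sample labels are assumptions.

```python
from typing import Optional, Sequence, Tuple


def precision_recall(gold: Sequence[str],
                     predicted: Sequence[Optional[str]]) -> Tuple[float, float]:
    """Return (P, R): share of correct decisions and share of any decisions."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equally long")
    total = len(gold)
    # A prediction of None means the classifier made no decision.
    decided = sum(1 for label in predicted if label is not None)
    correct = sum(1 for g, p in zip(gold, predicted) if p is not None and p == g)
    return correct / total, decided / total


# Illustrative labels: two correct answers, one error, one undecided context.
p, r = precision_recall(["m2", "m4", "m4", "m1"], ["m2", "m5", "m4", None])
```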

Automatic construction identification is based on statistical data about the combinability of target words with the contextual markers of their senses: lex, gr and sem tags. The combinability information is extracted from the training sample. The output of the program is a list of frequent constructions (combinations of the target word with statistically significant left-hand and right-hand contextual markers), together with the frequency of each construction and lists of the lexemes that realize the values of the contextual markers within the constructions.
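A frequency list of this kind could be assembled roughly as follows. This is a simplified sketch: a plain frequency threshold (`min_count`) stands in for the statistical significance test, only right-hand neighbours are handled, and the function name and pattern notation are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# A neighbour is a (sem tag set, lemma) pair, e.g. ("r:concr t:space", "государство").
Neighbour = Tuple[str, str]


def frequent_constructions(target: str, right_neighbours: List[Neighbour],
                           min_count: int = 2) -> List[Tuple[str, int, List[str]]]:
    """List constructions 'target + [sem tags]' with counts and realizing lemmas."""
    counts: Counter = Counter()
    lemmas: Dict[str, Set[str]] = defaultdict(set)
    for sem, lemma in right_neighbours:
        pattern = f"{target} + [{sem}]"
        counts[pattern] += 1
        lemmas[pattern].add(lemma)
    # Keep only patterns that reach the frequency threshold, most frequent first.
    return [(pattern, count, sorted(lemmas[pattern]))
            for pattern, count in counts.most_common() if count >= min_count]


# Illustrative right-hand neighbours of глава.
constructions = frequent_constructions("глава", [
    ("r:concr t:space", "государство"),
    ("r:concr t:space", "федерация"),
    ("r:concr t:space sc:constr", "город"),
    ("r:concr t:space", "государство"),
])
```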


4. Experimental parameters

To determine the best parameters for automatic word sense disambiguation and construction identification, more than 6,000 experiments were carried out. They were aimed at (1) establishing the correlation between lex, gr and sem tags, assessing the reliability of the different criteria (lex, gr, sem and their combinations lex+gr, lex+sem, sem+gr, lex+sem+gr) and determining the conditions under which they apply, and (2) assessing a number of parameters that can affect the results of the experiments (context window width, training sample size, etc.).

The comparative study of the criteria was meant to establish their reliability, that is, which criterion yields the best precision P and recall R. The most reliable criterion turned out to be the combination of lemma tags and semantic tags (lex+sem) (P ≈ 87...89%, R ≈ 95%), and the least reliable was morphological tags in isolation (gr). Results acceptable in terms of precision and recall were also obtained with the combination of all three tag types (lex+sem+gr) and with lemma tags in isolation (lex).

Experiments were run with varying context window width [-l; +r] (l, r ≤ 5); both symmetric and asymmetric windows were allowed, corresponding to a syntagm or a syntactic group. The best window widths were assessed with the F-measure (2), which takes into account both precision P and recall R:

$$F = \frac{2}{1/P + 1/R} \qquad (2)$$
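Formula (2) is the harmonic mean of P and R. As a small worked example, the sketch below evaluates it for P = 0.88 and R = 0.95, values picked from within the range reported for the lex+sem combination; the function name is an assumption.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, as in formula (2)."""
    if precision <= 0 or recall <= 0:
        return 0.0  # avoid division by zero for degenerate results
    return 2 / (1 / precision + 1 / recall)


# P = 0.88 and R = 0.95 give F of about 0.914.
f = f_measure(0.88, 0.95)
```

Because the harmonic mean is pulled towards the smaller of the two values, a window setting cannot score well on F by gaining recall at a large cost in precision.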

The best window widths were determined in experiments with different tag types. For example, in experiments with lemma tags (lex) the best results were obtained with the context window [-4; +5]. When all three tag types were used (lex+sem+gr), the best window sizes were [-2; +4] and [-3; +4]. Contexts of this size are most characteristic of nouns, because they contain the combinability information that plays an important role in determining the sense of the target word in context. Most often such a context window corresponds to syntactic groups placed before the target word (adjectival groups) or after it (nominal, infinitival and other groups). The analysis of contexts carried out during the experiments suggests that taking the boundaries of syntactic groups into account raises the precision P of automatic word sense disambiguation by 0.05...0.1.

The experiments show that high-quality automatic word sense disambiguation (P ≈ 0.85 on average, P ≈ 0.95...1 in some cases) is achievable provided that appropriate types of contextual markers (tags) and an appropriate context window width are chosen and that the training sample is large enough (100...500 contexts). Tests were run with gradually increasing training samples (10, 15, 55, 75, 100, 200, 500 ... contexts), with the size of the training samples varying in proportion to the total number of contexts for each sense considered (10%, 15%, 20%). According to our observations, a training sample should contain at least 100 contexts, and the best results are obtained with samples of about 500 contexts. In general, the training sample should amount to at least 20% of the total sample of contexts for the target word; otherwise the images built for individual senses may turn out blurred, which lowers the quality of automatic word sense disambiguation.
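The two size recommendations (at least 100 contexts and at least 20% of the sample) can be combined into one lower bound. The helper below is only an illustrative reading of those recommendations, not part of the described tool; its name and parameters are hypothetical.

```python
def min_training_size(total_contexts: int, percent: int = 20, floor: int = 100) -> int:
    """Smallest training sample meeting both the absolute and relative bound."""
    # Integer ceiling of total_contexts * percent / 100, avoiding float rounding.
    relative = (total_contexts * percent + 99) // 100
    return max(floor, relative)


# For a sample of 363 contexts the 100-context floor dominates (20% is 73).
size_small = min_training_size(363)
# For 2000 contexts the 20% share dominates.
size_large = min_training_size(2000)
```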

5. Results of the experiments on automatic word sense disambiguation

Series of experiments on automatic word sense disambiguation of the polysemous words under study were carried out with the WSD and CxI software tool. Examples of the program output are given in Table 1.

In context [1] the sense of the target word is recognized correctly, whereas example [2] is interpreted incorrectly. Erroneous decisions are probably due to a contextual environment that is insufficient for identifying the sense.


Table 1. Examples of automatic processing of contexts of the word глава, using the tag combination lex+sem+gr

Context | Original sense | Recognized sense | Cos | Context window
[1] За столом собралось все взрослое население Николаевки во главе с председателем местного колхоза. | m2 | m2 | 0.555 | [-3; +5]
[2] Вместо администрации одного сельского округа будет 10-15 глав администрации во входящих в него деревнях - насколько же будут раздуты штаты управленческого аппарата? | m4 | m5 | 0.112 | [-1; 0]

The main results obtained in the experiments concern the identification and systematization of the various types of contextual markers of target word senses. Of greatest interest are contextual markers such as lex and sem tags. First, lex tags are ranked by their frequency in the contextual environment of the target word. Second, the values of contextual markers (lex tags) are generalized on the basis of their lexical-semantic annotation, and the correspondence between sem tags and the lemmas that realize them is established. For example, left-hand neighbours of the word лук such as огурец (r:concr t:fruit t:food), орех (r:concr t:fruit t:food pt:part pc:plant) and картошка (r:concr t:fruit t:food pt:aggr sc:fruit) can be assigned to a single group of concrete nouns denoting foodstuffs. As a result, a table of frequent sets of lex and sem tags is compiled for each sense of the target word (see, for example, Table 2).

Table 2. Sample analysis of right-hand neighbours of the word глава in the sense 'leader, chief, senior in rank'

Sense | Lexical-semantic annotation | Example | Number of contexts (out of the total number of contexts)
глава, m4. Leader, chief, senior in rank: right-hand neighbours | r:concr t:space 'concrete noun', 'space and place' | | 51 (of 363)
 | r:concr t:space | из рук главы государства; глава федерации футбола | 41 (of 44)
 | r:concr t:space pt:part pc:space 'part of space' | глава региона | 5
 | r:concr t:space sc:constr 'construction' | главы города | 3
 | r:concr t:space pt:set sc:money 'set of objects (money)' | глава фонда | 2

From the combinability data for the word глава presented in Table 2, the contextual markers of the sense 'leader, chief, senior in rank' can be established: the lexemes encountered were государство (r:concr t:space), федерация (r:concr t:space), регион (r:concr t:space pt:part pc:space), город (r:concr t:space sc:constr), фонд (r:concr t:space pt:set sc:money), etc. These contextual markers can be grouped as concrete nouns of space and place.

The observations made while generalizing contextual markers from lex tags to lexical-semantic classes support the law of semantic agreement (Гак 1972). Semantic agreement is a formal means of organizing an utterance which requires that at least one semantic feature be duplicated across the words linked by contextual relations. For example, sense m4 of the word глава is defined in the lexical-semantic annotation as r:concr t:hum (concrete noun, person), and the tag t:hum (person) is also part of most contextual markers of this sense. In contexts with the word глава the seme 'person' is expressed explicitly in the semantic features of the left-hand neighbours r:propn t:hum, r:concr t:hum and r:concr t:hum d:nag der:v, while in other cases it is implicitly part of the lexical-semantic annotation. Thus, the seme 'person' is implied in the meaning of verbs of speech, t:speech, and in the meanings of words characterized by the tags t:org (organization), t:group (group) and t:action (event).


6. Results of the experiments on construction identification

The experiments on automatic construction identification proceed in several stages. First, a list of contexts of use is compiled for each sense of the target words under consideration; then the most frequent lexical-semantic and morphological information about the contextual markers of the sense within a given window is extracted automatically from the contexts. Constructions are identified within the context window [-1; +1], where there is a high probability of finding contextual elements that form stable word combinations with the words under study. Next, morphological models of the constructions are formed and their lexical-semantic filling is described. The method of linguistic analysis of morphological models of constructions and their lexical-semantic filling used in our study draws on experience in analyzing the combinability restrictions of words of different parts of speech (Митрофанова, Белик, Кадина 2008). Finally, the results are verified: the resulting lists of constructions are compared with lists of collocations produced by S. A. Sharoff's bigram search service (http://corpus.leeds.ac.uk/ruscorpora.html).

In the course of data processing, characteristic morphological models of constructions were identified for each sense of the words analyzed. The following morphological models were obtained for constructions with adjectives: A + S, V + A, A + A, S + A, Adv + A. For the verb прописать in senses m2 ('to prescribe a medicine or treatment to a patient') and m5 ('to make a record somewhere') the main construction type is V + S;acc, while in sense m1 ('to officially register someone's residence somewhere') the types are V + (S;t:hum);acc, V + Adv, APRO;t:place r:rel: + V.

The lexico-semantic filling of these models for individual meanings of the adjective верный and of the verb занести is shown in Tables 3-4 (the tags lex and sem are the left-side and right-side markers of the corresponding meanings of the target words).

It was established that individual variants of the lexico-semantic filling of constructions are highly stable. Thus, according to S. A. Sharov's bigram search service (http://corpus.leeds.ac.uk/ruscorpora.html), among the left-side and right-side


Table 3. Lexico-semantic filling of constructions with the adjective верный in individual meanings

Meaning m1. Надежный, прочный, стойкий, преданный ('reliable, firm, steadfast, devoted').
    Lexico-semantic annotation: ev:posit t:humq r:qual
    Left-side and right-side markers: t:hum:kin r:concr: сын, жена, муж; t:loc: остаться; r:spec: самый; r:abstr: служба; r:ref: себя; t:animal r:concr: пес; t:ment: знать; t:space r:concr: путь, место

Meaning m3. Несомненный, неизбежный ('certain, inevitable').
    Lexico-semantic annotation: t:mod r:qual
    Left-side and right-side markers: t:be:exist: быть; t:be:disapp r:abstr: гибель; r:rel t:dir: откуда; t:asp r:abstr: способ; t:space r:concr t:fam: (lexeme illegible)

Table 4. Lexico-semantic filling of constructions with the verb занести in individual meanings

Meaning m7. Отклонить, резко повернуть в сторону или сильно накренить (при движении) ('to throw off course, turn sharply aside or tilt heavily while moving').
    Lexico-semantic annotation: der:v ca:noncaus t:move d:pref
    Left-side and right-side markers: t:tool:device:machine r:concr t:fam: машина (listed among both the left-side and the right-side markers)

Meaning m8. (1-е и 2-е лицо не употр.). кого-что. Засыпать, замести ('not used in the 1st and 2nd person; someone or something: to cover, to drift over').
    Lexico-semantic annotation: der:v ca:O t:changest d:pref
    Left-side and right-side markers: t:stuff r:concr r:abstr t:weather: снег; r:abstr t:weather: метель, снегопад, непогода; top:stripe t:space r:concr t:fam: дорога; t:space r:concr pt:part pc:space: улица; top:stripe t:space r:concr: тропа, путь

markers of target-word meanings within the constructions there are components of collocations with a high Log-Likelihood (LL) score. Some collocations are recorded in the MAS and BAS dictionaries as stable word combinations (given with a special mark in BAS) and as phraseological units (given with special marks in MAS and in BAS). For example, the construction верный + себе (LL = 47.27) is present in MAS and BAS, and the construction занести + снег (LL = 346.32) is mentioned in BAS.
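For orientation, a log-likelihood score for a bigram can be computed from a 2x2 contingency table of co-occurrence counts. The sketch below uses the common log-likelihood ratio (G²) formulation; the text does not state which variant the bigram search service implements, so this formula, the function names and the threshold of 10.83 (the chi-square critical value for p < 0.001 with one degree of freedom) are assumptions. The two LL values in the lookup table are the ones quoted in the text.

```python
import math


def log_likelihood(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: target with marker, k12: target without marker,
    k21: marker without target, k22: neither of the two.
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = (
        (k11, rows[0], cols[0]),
        (k12, rows[0], cols[1]),
        (k21, rows[1], cols[0]),
        (k22, rows[1], cols[1]),
    )
    g2 = 0.0
    for observed, row_total, col_total in cells:
        if observed > 0:  # empty cells contribute nothing to the sum
            expected = row_total * col_total / total
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def verified_constructions(constructions, ll_scores, threshold=10.83):
    """Keep constructions whose collocation score reaches the threshold."""
    return [c for c in constructions if ll_scores.get(c, 0.0) >= threshold]


# LL values quoted in the text; the third pair is a made-up negative case.
ll_lookup = {("верный", "себя"): 47.27, ("занести", "снег"): 346.32}
candidates = [("верный", "себя"), ("занести", "снег"), ("верный", "стол")]
kept = verified_constructions(candidates, ll_lookup)
```

A table with no association, such as (10, 10, 10, 10), yields a score of zero, whereas a perfectly associated table, such as (10, 0, 0, 10), yields 40·ln 2.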


The experiments showed that almost every meaning of a polysemous word has constructions that are characteristic of precisely that meaning. Information of this kind can later be used to build various dictionaries of constructions.

7. Conclusion

This paper presents the main results obtained in developing procedures for automatic word sense disambiguation of target words and for construction extraction in contexts from the Russian National Corpus (RNC): the software for processing the linguistic material is described, and the parameters of the study and the data obtained in experiments with RNC contexts are analysed. The main outcome of the study is confirmation that the type and level of detail of the linguistic annotation of RNC contexts make it possible to form a set of context markers for a given meaning from samples of contexts; to generalize the data on context markers in terms of their membership in lexico-semantic classes; to describe the classes of constructions associated with a given meaning; and to use the resulting co-occurrence model to build a catalogue of constructions automatically on the basis of the RNC.

Azarova I. V., Bichineva S. V., Vakhitova D. T. Automatic lexical disambiguation of frequent nouns (in terms of RussNet structural units) // Proceedings of the International Conference "Corpus Linguistics-2008". St. Petersburg, 2008. (In Russian.)

Azarova I. V., Marina A. S. Automated classification of contexts in preparing data for the RussNet computer thesaurus // Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference "Dialog-2006". Moscow, 2006. (In Russian.)

Bolshakova E. I., Baeva N. V., Bordachenkova E. A., Vasilyeva N. E., Morozov S. S. Lexico-syntactic patterns in automatic text processing tasks // Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference "Dialog-2007". Moscow, 2007. (In Russian.)


Borisova E. G. Collocations: What They Are and How to Study Them. Moscow, 1995. (In Russian.)

Gak V. G. On the problem of semantic syntagmatics // Problems of Structural Linguistics. Moscow, 1972. (In Russian.)

Gelbukh A. F., Sidorov G. O., Hernández-Rubio E., Chubukova M. V. Dictionaries of word combinability: which compilation method is better? // Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference "Dialog-2004". Moscow, 2004. (In Russian.)

Zakharov V. P., Khokhlova M. V. An analysis of the effectiveness of statistical methods for detecting collocations in Russian texts // Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference "Dialog-2010". Moscow, 2010. (In Russian.)

Iordanskaja L. N., Mel'čuk I. A. Meaning and Combinability in the Dictionary. Moscow, 2007. (In Russian.)

Kobritsov B. P., Lyashevskaya O. N., Shemanaeva O. Yu. Resolving lexico-semantic homonymy in news and newspaper and magazine texts: surface filters and statistical evaluation // Internet-Mathematics 2005: Automatic Processing of Web Data. Moscow, 2005. (In Russian.)

Kuznetsova Yu. L. Construction grammar: a survey // Nauchno-tekhnicheskaya informatsiya. 2007. Ser. 2. No. 4. (In Russian.)

Loukachevitch N. V., Chuiko D. S. Automatic word sense disambiguation based on thesaurus knowledge // Internet-Mathematics 2007: Collected Papers of the Contest Participants. Yekaterinburg, 2007. (In Russian.)

Lyashevskaya O. N., Kuznetsova Yu. L. Russian FrameNet: towards a corpus-based dictionary of constructions // Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference "Dialog-2009". Moscow, 2009. (In Russian.)

Mitrofanova O. A., Belik V. V., Kadina V. V. A corpus study of the combinatory preferences of frequent Russian lexemes // Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference "Dialog-2008". Moscow, 2008. (In Russian.)

Mitrofanova O. A., Grachkova M. A., Shimorina A. S., Lyashevskaya O. N. Lexical, semantic and morphological features of contexts in the disambiguation of Russian nouns // XXXIX International Philological Conference. Section of Mathematical Linguistics. St. Petersburg, 2010. (In Russian.)

Mitrofanova O. A., Lyashevskaya O. N., Panicheva P. V. Experiments on statistical lexico-semantic disambiguation of Russian nouns in a corpus // Proceedings of the International Conference "Corpus Linguistics-2008". St. Petersburg, 2008. (In Russian.)

Mitrofanova O. A., Panicheva P. V., Lyashevskaya O. N. Statistical lexico-semantic disambiguation in contexts for nouns denoting physical objects // Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference "Dialog-2008". Moscow, 2008. (In Russian.)


Rakhilina E. V., Kobritsov B. P., Kustova G. I., Lyashevskaya O. N., Shemanaeva O. Yu. Polysemy as an applied problem: lexico-semantic annotation in the Russian National Corpus // Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference "Dialog-2006". Moscow, 2006. (In Russian.)

Dictionary of the Russian Language: in 4 vols. / ed. by A. P. Evgenyeva. 2nd ed., revised and enlarged. Moscow, 1981-1984 (in the text: MAS). (In Russian.)

Dictionary of the Modern Russian Literary Language: in 17 vols. / ed. by V. I. Chernyshev. Moscow; Leningrad, 1948-1965 (in the text: BAS). (In Russian.)

Toldova S. Yu., Kustova G. I., Lyashevskaya O. N. Semantic filters for word sense disambiguation in the Russian National Corpus: verbs // Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference "Dialog-2008". Moscow, 2008. (In Russian.)

Yagunova E. V., Pivovarova L. M. From collocations to constructions // Russian Language: Constructional and Lexico-Semantic Approaches. St. Petersburg, 2011. (In Russian.)

Fillmore Ch. J. The Mechanisms of Construction Grammar // Proceedings of the Berkeley Linguistic Society. 1988. Vol. 14.

Goldberg A. E. Constructions at Work: the Nature of Generalization in Language. Oxford, 2006.

Goldberg A. E. Constructions: a Construction Grammar Approach to Argument Structure. Chicago (Ill.); London, 1995.

Yarowsky D., Florian R. Evaluating Sense Disambiguation Across Diverse Parameter Spaces // Natural Language Engineering. 2002. Vol. 8(4).

Kustova G. I., Lashevskaja O. N., Paducheva E. V., Rakhilina E. V. Verb Taxonomy: From Theoretical Lexical Semantics to Practice of Corpus Tagging // Cognitive Corpus Linguistics Studies / ed. by B. Lewandowska, K. Dziwirek. Frankfurt, 2009.

Leacock C., Chodorow M., Miller G. Using Corpus Statistics and WordNet Relations for Sense Identification // Computational Linguistics. 1998. Vol. 24(1).

Automatic Word Sense Disambiguation and Construction Identification Based on Corpus Multilevel Annotation / O. Lyashevskaya, O. Mitrofanova, M. Grachkova, S. Romanov, A. Shimorina, A. Shurygina // Text, Speech and Dialogue. Proceedings of the 14th International Conference TSD 2011, Pilsen, Czech Republic, September 1-5, 2011. Pilsen, 2011.

Manning C., Schütze H. Collocations // Foundations of Statistical NLP. 2002.

Mihalcea R. Word Sense Disambiguation Using Pattern Learning and Automatic Feature Selection // Journal of Natural Language and Engineering. 2002. Vol. 1(1).

Mitrofanova O., Lyashevskaya O. Disambiguation of Taxonomy Markers in Context: Russian Nouns // 17th Nordic Conference of Computational Linguistics NODALIDA-2009, Odense, Denmark, May 14-16, 2009.


Mitrofanova O., Panicheva P., Lashevskaya O. Statistical Word Sense Disambiguation in Contexts for Russian Nouns Denoting Physical Objects // Text, Speech and Dialogue. Proceedings of the 11th International Conference TSD 2008, Brno, Czech Republic, September 8-12, 2008. Brno, 2008.

Navigli R. Word Sense Disambiguation: a Survey // ACM Computing Surveys. 2009. Vol. 41(2).

Pedersen T. A Baseline Methodology for Word Sense Disambiguation // CICLing. LNCS. Vol. 2276 / ed. by A. F. Gelbukh. Heidelberg, 2002.

Proceedings of the NAACL HLT Workshop on Extracting and Using Constructions in Computational Linguistics. Los Angeles (CA), 2010.

Sahlgren M., Knutsson O. Workshop on Extracting and Using Constructions in NLP // NODALIDA'09: SICS Technical Report. Odense, 2009.

Schütze H. Automatic Word Sense Discrimination // Computational Linguistics. 1998. Vol. 24(1).

Shimorina A., Grachkova M. Identification of Context Markers for Russian Nouns // 18th Nordic Conference of Computational Linguistics NODALIDA 2011, Riga, Latvia, May 11-13. Riga, 2011.

Tomasello M. Constructing a Language: A Usage-Based Approach to Child Language Acquisition. Cambridge (MA), 2003.

Wible D., Tsao N.-L. StringNet as a Computational Resource for Discovering and Investigating Linguistic Constructions // Proceedings of the NAACL HLT Workshop on Extracting and Using Constructions in Computational Linguistics. Los Angeles (CA), 2010.

Word Sense Disambiguation: Algorithms and Applications. Text, Speech and Language Technology / ed. by E. Agirre, Ph. Edmonds. Vol. 33. Berlin; Heidelberg; New York, 2007.


Scholarly edition

STRUCTURAL AND APPLIED LINGUISTICS

Interuniversity collection

Issue 9

Editor L. A. Karpova

Computer layout by E. M. Voronkova

Signed for printing 13.07.12. Format 60x84 1/16.

Offset printing. Offset paper.

Conventional printed sheets: 20.69. Print run: 250 copies. Order no. 2.f.C

St. Petersburg University Press.

199004, St. Petersburg, V.O., 6-ya liniya, 11/21.

Tel. (812)328-96-17; fax (812)328-44-22

E-mail: [email protected]

www.unipress.ru

Printing house of the St. Petersburg State University Press:

199061, St. Petersburg, Sredny pr., 41.
