Structural and Applied Linguistics 8. ISSN 0202-2400. LA FILOLÓGICA POR LA CAUSA

AUTOMATIC WORD CLASSIFICATION IN PARALLEL TEXTS (CASE STUDY OF RUSSIAN-LANGUAGE TEXTS

Mar 31, 2023

Transcript

SAINT PETERSBURG STATE UNIVERSITY

STRUCTURAL AND APPLIED LINGUISTICS

Interuniversity collection

Published since 1987

ISSUE 8

Edited by A. S. Gerd

Saint Petersburg University Press, 2010

UDC 80+618.31
BBK 81.1
C83

Editorial board: Prof. L. N. Belyaeva, Prof. A. S. Gerd (executive editor), Prof. O. N. Grinbaum, Prof. M. A. Marusenko

Secretary of the editorial board: V. N. Rubikher

Reviewer: Doctor of Philology, Prof. M. A. Marusenko

Published by decision of the Editorial and Publishing Council of Saint Petersburg State University

C83 Structural and Applied Linguistics. Issue 8: Interuniversity collection / Ed. by A. S. Gerd. St. Petersburg: St. Petersburg University Press, 2010. 276 pp.

The collection (Issue 7 appeared in 2007) contains articles on a wide range of problems in theoretical and applied linguistics and on the application of mathematical methods in linguistics.

For specialists in the theory of language and in applied and theoretical linguistics.

BBK 81.1

© St. Petersburg State University Press, 2010

O. A. Mitrofanova, M. A. Grachkova, A. S. Shimorina

AUTOMATIC WORD CLASSIFICATION IN PARALLEL TEXTS
(Case Study of Russian Original Texts by A. S. Grin and Their Slovak Translations)

Keywords. Automatic word classification, cluster analysis, keywords, near-equivalence classes, parallel text corpus, Russian, Slovak

Summary. The paper discusses experimental results of automatic word classification in the Russian original texts of the novels "Jessie and Morgiana" and "The Shining World" by A. S. Grin and in the texts of their Slovak translations. The study involves a comparative analysis of the lexical content and semantic structure of the near-equivalence classes generated for keywords in the original texts and in their translations.

© O. A. Mitrofanova, M. A. Grachkova, A. S. Shimorina, 2010

1. Problem Statement and Aim of the Study

Formalizing text structure and quantifying the semantic links between text elements (words represented by lemmas and word forms) is a topical line of research tied to practical tasks of automatic text understanding (Leontyeva 2006). One of the key procedures for solving these tasks is automatic word classification (AWC)¹. On the one hand, AWC lets linguists use objective data on the hierarchical structure of the lexicon, collected from representative corpora, and build on these data formal ontologies and lexicographic modules that can be applied in automatic text processing and extended from corpora (Smrz, Rychly 2001; Pantel, Lin 2003; Azarova, Marina 2006; Vinogradova, Mitrofanova 2008; Sidorova 2008). On the other hand, AWC is a necessary step in automatic classification / clustering, thematic categorization, and indexing of documents in a corpus (Stein, Meyer zu Eissen 2002; Ageev, Dobrov, Lukashevich 2004; Buscaldi, Rosso, Alexandrov, Ciscar 2006; Vinogradova, Mitrofanova, Panicheva 2007; Bogatyrev, Tyukhtin 2008), in assessing the semantic homogeneity of texts in a corpus (Mitrofanova 2008; Mitrofanova 2009), etc.

AWC tools also open up attractive possibilities for the study of parallel text corpora (Belyaeva 2004; Andreeva 2006; Belyaeva, Larionova 2007). Comparative analysis of quantitative data on word usage and on the degree of semantic similarity between words helps establish how lexical units of different languages are distributed within lexical-semantic and thematic groups. Information on how the elements of clusters obtained from parallel processing of the original and the translation relate to each other is highly valuable for assessing translation adequacy and for contrastive studies. Clearly, AWC modules make search in parallel corpora more effective and make it possible to extract data for extending and correcting multilingual dictionaries and for checking the output quality of machine translation systems.

¹ The authors discuss in detail the aspects of AWC related to classification / clustering algorithms and to the software tools used by various research groups in their other studies.

The main aim of this study is to compare the original and translated texts in terms of whether their keyword sets are identical, and in terms of the composition and quantitative parameters of the keywords' clusters.

The AWC experiments used specialized software (Mitrofanova, Mukhin, Panicheva 2007). The AWC program, written in Python, comprises a block for text preprocessing and for computing distances between the lexemes under study, a block for hierarchical cluster analysis, and a block for forming near-equivalence classes. The program runs in two modes: (A) clustering of keywords in sets and (B) forming a near-equivalence class for each keyword. Mode (B) was used in the experiments reported here. When the program is started in this mode, the following parameters are set:

1) the name of the file containing the text to be analysed (text.txt);
2) the name of the file containing the keyword (word.txt);
3) the width of the context window (± s);
4) whether weights are assigned to near / distant context elements (yes / no);
5) the expected size of the near-equivalence class (C).

In mode (B), the first stage processes the input text. First, all occurrences of the lexemes under study are located in the text; then context boundaries are marked automatically according to the given context window width. After that, weights of context elements can be assigned automatically: the closer a context element is to the lexical unit under study, the higher its weight, and vice versa. Next, for each lexeme l the set of its contexts of use is formed and represented as a distribution vector in an N-dimensional space. The dimensions of the space are given by the context elements k_i (i = 1 ... N) of the lexeme, and the vector coordinates correspond to the co-occurrence coefficient of l and k_i. The distribution vectors of all the lexemes under study are then compared for the text being processed. The distances d are computed with the Cos measure, that is, as the cosine of the angle between distribution vectors. The output of this stage is a matrix of distances between the distribution vectors for every pair of lexemes under study.
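The first stage can be illustrated with a minimal Python sketch. It is not the authors' program: the function names, the 1/distance weighting, and the use of raw co-occurrence counts as vector coordinates are assumptions made for illustration. The paper states only that nearer context elements receive higher weights and that the coordinates correspond to a co-occurrence coefficient.

```python
import math
from collections import Counter, defaultdict


def distribution_vectors(tokens, window=4, weighted=False):
    """Build a sparse context vector for every lemma in `tokens`.

    Each coordinate accumulates how often a context element occurs within
    +/- `window` positions of the lexeme. With weighted=True a hypothetical
    1/distance weight is used, so nearer neighbours contribute more.
    """
    vectors = defaultdict(Counter)
    for i, lexeme in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            weight = 1.0 / abs(i - j) if weighted else 1.0
            vectors[lexeme][tokens[j]] += weight
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if one is empty)."""
    dot = sum(value * v[key] for key, value in u.items() if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_matrix(vectors):
    """Pairwise Cos values for all lexemes, keyed by (lexeme1, lexeme2)."""
    words = sorted(vectors)
    return {(a, b): cosine(vectors[a], vectors[b]) for a in words for b in words}


# Toy lemma sequence (invented); a real run would read a lemmatized text.
tokens = "a b c a b d".split()
vecs = distribution_vectors(tokens, window=1)
matrix = cosine_matrix(vecs)
```

Sparse `Counter` vectors are used here so that the N dimensions never need to be enumerated explicitly; only context elements that actually co-occur with a lexeme are stored.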

The second stage of the program in mode (B) forms near-equivalence classes for the keywords selected from the text. It uses the data of the co-occurrence matrix to determine the set of words whose distribution in the text is close to that of the chosen keyword. The results are output as a cluster of associate words, each with its Cos similarity value relative to the keyword.
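As a rough illustration of the second stage, the sketch below picks the C words closest to a keyword from a table of Cos values. The function name and the toy similarity values are invented for this example. The keyword itself is kept in the output because the paper's tables list it with Cos = 1.0.

```python
def near_equivalence_class(keyword, similarities, size=20):
    """Return the `size` entries most similar to `keyword`, best first.

    `similarities` maps (word1, word2) pairs to a Cos value. The keyword
    itself is kept, so it normally heads the list with Cos = 1.0.
    """
    scored = [
        (other, cos)
        for (word, other), cos in similarities.items()
        if word == keyword
    ]
    # Sort by descending similarity; ties are broken alphabetically.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:size]


# Toy similarity values (invented for illustration only).
toy = {
    ("poison", "poison"): 1.0,
    ("poison", "vial"): 0.14,
    ("poison", "drink"): 0.20,
    ("poison", "toad"): 0.09,
    ("vial", "poison"): 0.14,
}
cluster = near_equivalence_class("poison", toy, size=3)
```

With the default `size=20` the result has the same shape as the clusters reported in the paper: a ranked list of words with their Cos values.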

3. Linguistic Data

The study draws on the Russian-Slovak parallel text corpus PARUS (http://kassiopeia.juls.savba.sk/parus/) (Garabik, Zakharov 2006). Most of the corpus consists of Russian classics of the 19th and 20th centuries and their Slovak translations (for example, A. A. Bestuzhev-Marlinsky, "A Terrible Fortune-Telling"; N. V. Gogol, "Dead Souls"; F. M. Dostoevsky, "Crime and Punishment"; L. N. Tolstoy, "Anna Karenina"; I. Ilf and E. Petrov, "Stories"; N. S. Tikhonov, "Leningrad Symphony", etc.). The corpus also includes works by contemporary writers (for example, P. Krusanov, "The Bite of an Angel", etc.) and popular science books (for example, V. F. Sergeev, "Living Locators of the Ocean", etc.).

The corpus contains 818,097 words and 43,381 sentences in the Slovak part and 819,009 words and 46,832 sentences in the Russian part. This difference in size can be attributed to the specifics of the segmentation algorithm and of translation from Russian into Slovak.
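A quick calculation on the figures above shows where the two parts diverge: the word counts are nearly equal, while the sentence counts differ, so the average sentence length differs. The dictionary layout below is only an illustrative arrangement of the numbers quoted in the text.

```python
# Corpus sizes as quoted in the text.
corpus = {
    "Slovak": {"words": 818_097, "sentences": 43_381},
    "Russian": {"words": 819_009, "sentences": 46_832},
}

# Average number of words per sentence in each part.
avg = {lang: c["words"] / c["sentences"] for lang, c in corpus.items()}

for lang, value in avg.items():
    print(f"{lang}: {value:.2f} words per sentence")
```

This gives roughly 18.86 words per sentence for the Slovak part and 17.49 for the Russian part.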

All texts in the corpus are automatically lemmatized and morphologically tagged. The Slovak texts were tagged with the morphological analyser (Garabik, Gianitsova, Horak, Simkova, electronic version) used to prepare texts for the Slovak National Corpus; the Russian texts were tagged with the DIALING system of the AOT project (http://www.aot.ru/docs/rusmorph.html).

The PARUS corpus carries bibliographic annotation that matches the annotation of the Slovak National Corpus (Garabik 2004).

The parallel texts were aligned automatically by comparing the texts, taking into account matches in relative sentence lengths and the division of the texts into paragraphs. An external dictionary was used during alignment.

Corpus search is built on the Manatee / Bonito system (Rychly, Smrz 2004) and uses a specially developed user interface with a virtual keyboard that offers the letters of the Russian and Slovak alphabets, diacritics, and some other symbols. The system supports searching the corpus for words, phrases, and regular expressions.

This study focuses on the texts of A. S. Grin's novels "Jessie and Morgiana" and "The Shining World" and their Slovak translations.

4. Main Results

4.1. Results of Processing the Russian and Slovak Texts of A. S. Grin's Novel "Jessie and Morgiana"

Before the AWC experiments, ten keywords were selected in the original and the translation of A. S. Grin's novel "Jessie and Morgiana" (Slovak: "Jessie a Morgiana"); these were frequent lexemes that characterize the development of the plot. The keywords included:

1) names of the main characters (Rus. Джесси, Slov. Jessie; Rus. Моргиана, Slov. Morgiana; Rus. Ева, Slov. Eva; Rus. Детрей, Slov. Detrey);
2) lexemes that define the storyline (Rus. сестра 'sister', Slov. sestra; Rus. яд 'poison', Slov. jed; Rus. отравить 'to poison', Slov. otráviť; Rus. ненависть 'hatred', Slov. nenávisť).

For each keyword, AWC procedures were run with different parameters (with / without weights, with different context window widths). The experiments produced near-equivalence classes for the keywords, each containing 20 associate words with their Cos similarity to the keyword.

When establishing translation correspondences between words in the clusters obtained for the Russian and Slovak texts, the following principle was applied: if the Russian and Slovak lexemes presumed to be translation equivalents belong to different parts of speech, the correspondence between them is established by the lexical meaning of their stems. The analysis of translation-equivalent pairs could run into the following difficulties: (a) a Slovak lexeme has no equivalent in the Russian text; (b) the translation is too far from the original, so no corresponding lexeme can be found; (c) a lexical meaning conveyed by a lexeme in the Slovak text is expressed syntactically in the Russian text. These factors partly explain the reduced share of matching associate words in the clusters for keywords in the Russian and Slovak texts.
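Counting matches between a Russian and a Slovak cluster can be sketched as follows. In the study the correspondences were established by the authors' linguistic judgement; the hand-made `lexicon` below merely stands in for that judgement, and the function name is hypothetical. The sample words are a few associates of яд / jed from Table 1.

```python
def matching_pairs(ru_cluster, sk_cluster, lexicon):
    """Return (Russian, Slovak) associate pairs that are translation equivalents.

    `lexicon` maps a Russian lemma to a set of acceptable Slovak lemmas.
    """
    sk_set = set(sk_cluster)
    pairs = []
    for ru_word in ru_cluster:
        for sk_word in sorted(lexicon.get(ru_word, ())):
            if sk_word in sk_set:
                pairs.append((ru_word, sk_word))
    return pairs


# A few associates taken from Table 1; the lexicon is hand-made.
ru_cluster = ["питие", "трупный", "закопать", "жаба", "насекомое"]
sk_cluster = ["mŕtvolný", "zakopať", "liečba", "žaba"]
lexicon = {
    "трупный": {"mŕtvolný"},
    "закопать": {"zakopať"},
    "жаба": {"žaba"},
    "насекомое": {"hmyz"},
}

pairs = matching_pairs(ru_cluster, sk_cluster, lexicon)
share = len(pairs) / len(ru_cluster)  # share of Russian associates with a match
```

Because the lexicon maps lemmas to sets of lemmas, a Russian word and a Slovak word of different parts of speech can still be paired, which mirrors the stem-meaning principle described above.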

Comparing the results of AWC experiments run with and without weights for context elements showed that the composition of keyword clusters and the distribution of associate words within them differ. The unweighted variant appears preferable, since it reflects the semantic links between keywords and associate words more fully.

The context window width was analysed as a separate AWC parameter. Values from [-3; +3] to [-5; +5] were considered. The results for the Russian and Slovak texts appear similar: with a window of at least [-4; +4] the number of matches between clusters is highest (up to seven matching elements), whereas with a window of [-3; +3] the number of similar associate words is considerably lower (no more than three elements). The data obtained with a context window of at least [-4; +4] are therefore of greater interest.

The correspondences established between the clusters of associate words for keywords from the Russian and Slovak texts serve as a basis for assessing translation adequacy. Table 1 shows the structure of the clusters, with associate words, for the keywords яд and jed 'poison', for which seven correspondences were recorded. It is easy to see that the clusters contain lexemes with stable semantic links to the lexeme яд (for example, питие 'drink', трупный 'cadaveric', подмешать 'to mix in', акватофана 'aqua tofana', etc.). The appearance of other lexemes in the clusters is due to the plot of the novel and reflects text-internal links.

Table 1. Clusters formed in the Russian and Slovak texts for the keywords яд and jed

Keyword яд, context window [-5; +5] | Keyword jed, context window [-5; +5]
Cluster element | Cos | Cluster element | Cos
яд (poison) | 1.0 | jed (poison) | 1.0
питие (drink) | 0.1965 | zamariť (to seem, to appear) | 0.2117
торговка (market woman) | 0.1925 | mŕtvolný* (cadaveric) | 0.2112
наклонить (to tilt) | 0.1925 | zakopať* (to bury) | 0.2107
трупный* (cadaveric) | 0.1726 | prerásť* (to develop, to grow into) | 0.1909
подмешать (to mix in) | 0.1709 | stihomam* (persecution mania) | 0.1904
многочисленный (numerous) | 0.1577 | tĺcť (to beat, to knock) | 0.1904
закопать* (to bury) | 0.1573 | aquatofan* (aqua tofana) | 0.1904
акватофана* (aqua tofana) | 0.1573 | rozochvene (trembling) | 0.1714
разновидность (variety) | 0.1509 | chrčivo (hoarsely) | 0.1691
недовольство (displeasure) | 0.1498 | črepinka (shard) | 0.1691
накануне (on the eve) | 0.1492 | farebný (coloured, colourful) | 0.1685
утвердительно (affirmatively) | 0.1486 | dotieravý (importunate) | 0.1675
склянка (vial) | 0.1351 | liečba (treatment) | 0.1633
[illegible] | 0.1287 | vypadnúť (to fall out) | 0.1557
развиться* (to develop) | 0.1277 | vyspytateľne (inscrutably) | 0.1349
мания* (mania) | 0.1129 | stredovek (Middle Ages) | 0.1269
жаба* (toad) | 0.0948 | paliatívny (palliative) | 0.1264
моллюск* (mollusc) | 0.0854 | žaba* (toad) | 0.1045
насекомое (insect) | 0.0643 | mäkkýš* (mollusc) | 0.0837

Note. Associate words that form pairs of translation equivalents in the compared clusters are marked with an asterisk (bold type in the original).

Analogous AWC procedures and comparative analysis of cluster composition in the Russian and Slovak texts were carried out for all keywords. Comparing the original and the translation of A. S. Grin's novel "Jessie and Morgiana" showed that the overlaps between clusters are substantial in terms of content, which testifies to the adequacy of the translation.

4.2. Results of Processing the Russian and Slovak Texts of A. S. Grin's Novel "The Shining World"

In the analysis of the original and the translation of A. S. Grin's novel "The Shining World" (Slovak: "Žiarivý svet"), 20 keywords were selected from among the frequent lexemes that characterize the plot most fully. The keywords fall into the following groups:

1) names of the main characters (Rus. Тави, Slov. Tavi; Rus. Друд, Slov. Droud; Rus. Руна, Slov. Runa; Rus. Стеббс, Slov. Stebbs; Rus. Крукс, Slov. Croux);
2) lexemes related to the circus performance (Rus. друг 'friend', Slov. priateľ; Rus. цирк 'circus', Slov. cirkus; Rus. впечатление 'impression', Slov. dojem; Rus. движение 'movement', Slov. pohyb; Rus. толпа 'crowd', Slov. dav);
3) lexemes expressing abstract notions (Rus. судьба 'fate', Slov. osud; Rus. душа 'soul', Slov. duša; Rus. звук 'sound', Slov. zvuk; Rus. мир 'world', Slov. svet; Rus. сердце 'heart', Slov. srdce);
4) other (Rus. страх 'fear', Slov. strach; Rus. улыбка 'smile', Slov. úsmev; Rus. мысль 'thought', Slov. myšlienka; Rus. свет 'light', Slov. svetlo; Rus. жизнь 'life', Slov. život).

The AWC experiments followed the same procedure as those on the original and the translation of A. S. Grin's novel "Jessie and Morgiana". As in the previous series of experiments, the best results came from clustering without weights and with a context window of at least [-4; +4]. For example, for the keyword звук 'sound', the associate words closest to the keyword are свисток 'whistle' and подниматься 'to rise' with a window of [-3; +3], свисток 'whistle' and издавать 'to emit' with a window of [-4; +4], and подниматься 'to rise' and мартышка 'monkey' with a window of [-5; +5]. Evidently, when the context window is too wide or too narrow, the associate words show no close semantic links to one another or to the keyword, whereas at the intermediate value the associate words are related both to one another and to the keyword.

In many cases, analysing the composition of the associate-word clusters for keywords in the Russian and Slovak texts makes it possible to draw conclusions about the plot of the novel and about its main characters. In particular, the associate words for the keyword Стеббс (the lexemes тетрадь 'notebook', дол 'dale', усердно 'diligently', смотритель 'keeper', динь-динь-дон 'ding-ding-dong', вызванивать 'to chime out', etc.) give a fairly clear picture of this character. In the novel, Stebbs, the keeper of the Liss lighthouse, keeps a notebook in which he diligently writes down verses of his own composition. Stebbs likes to read his poems to his friend Drud when Drud visits him. During one such visit Stebbs shows his friend a musical instrument he has made himself out of glass bottles with their bottoms sawn off. The Slovak text yields translation equivalents of the Russian associate words as well as other associate words that are nonetheless relevant to this character (the words iskrička 'little spark', cing-ling 'drin-drin', cing-ling 'ding-ding-dong', zabaviť 'to have fun', pobrnkávať 'to tinkle, to toy (with a stick)', etc.).

It should be noted, however, that in a number of cases the associate-word clusters for the names of the main characters are not informative enough. This may be because the version of the AWC tool used here does not take syntagmatic links between words into account. Clustering noise can arise because the boundaries of the context window need not coincide with the boundaries of syntagms. For example, for the keyword Друд the cluster formed for the Russian text included the lexemes клюв 'beak', голубь 'pigeon', скрипеть 'to creak', and гуркать 'to coo', which come from the context «По карнизам жались в ряды сонные голуби, гуркая и скрипя клювом. Друд зевнул. Цирк и нападение утомили его» ("Sleepy pigeons huddled in rows along the cornices, cooing and creaking their beaks. Drud yawned. The circus and the attack had tired him"). In the Slovak text the translation equivalents of these lexemes also entered the list of associate words for the keyword Droud: zobák, holub, šťukať, hrkútať; cf. the context «Na rímsach sa tlačili k sebe holuby, hrkútali a šťukali zobákmi. Droud zívol. Cirkus a prepad ho unavili». On the one hand, these associate words should be regarded as occasional; on the other hand, the presence of translation-equivalent pairs in the clusters testifies to lexical correspondence between the original and the translation. Still, it should be allowed that non-standard associate words found for keywords may reflect internal links specific to the text as a whole or to a particular fragment of it.

It is important to note that the associate-word clusters reveal groups of words with similar meanings as well as set phrases. For example, the cluster for the keyword страх 'fear' included such lexemes as беда 'trouble', грозить 'to threaten', утрата 'loss', безотчетный 'unaccountable', заразиться 'to become infected'; for свет 'light', the lexemes дневной 'daytime', восхитительный 'delightful', распространять 'to spread', освещение 'lighting'; for мир 'world', the lexemes покорить 'to conquer', блистающий 'shining', паутина 'cobweb', духовный 'spiritual'; for звук 'sound', the lexemes свисток 'whistle', завывание 'howling', тоскливейший 'most dreary', издавать 'to emit', заглушать 'to muffle'; for цирк 'circus', the lexemes директор 'director', цена 'price', утроить 'to triple', платный 'paid'. The same holds for the clusters of Slovak associate words: the cluster for the keyword pohyb 'movement' included the lexemes plavný 'smooth' and graciózny 'graceful'; for úsmev 'smile', the lexemes šibalský 'sly', ľúbostný 'heartfelt', zaihrať 'to begin to play'; for svetlo 'light', the lexemes luster 'chandelier', zapaľovať 'to flare up', zažiariť 'to flash', žltosť 'yellowness', pozlátiť 'to gild', zažať 'to blaze up', oslepovať 'to dazzle'; for душа (duša) 'soul', the lexemes hnutie 'movement', spása 'salvation', zákutie 'darkness'. As for the stability of links between associate words in clusters, the highest similarity values are found for Russian and Slovak pairs such as звук / zvuk with издавать / vydávať, цирк / cirkus with директор / riaditeľ, and Крукс / Croux with Вениамин / Benjamin. This is because these words are the terms of basic semantic relations in the structure of the lexicon.

Table 2. Clusters formed in the Russian and Slovak texts for the keywords цирк and cirkus

Keyword цирк, context window [-4; +4] | Keyword cirkus, context window [-4; +4]
Cluster element | Cos | Cluster element | Cos
цирк (circus) | 1.0 | cirkus (circus) | 1.0
Солейль (Soleil) | 0.2883 | svetielkujúci (glowing, luminescent) | 0.1721
директор* (director) | 0.1871 | riaditeľ* (director) | 0.1710
утроить (to triple) | 0.1606 | bohvieaký (great, high (of the circus's authority)) | 0.1710
традиционный (traditional) | 0.1597 | Aggaissitz (Agassiz) | 0.1700
пария* (pariah) | 0.1597 | šesťhranný* (hexagonal) | 0.1439
рождественский (Christmas, adj.) | 0.1597 | pária* (pariah) | 0.1439
силиться (to strive) | 0.1510 | porušený (broken, spoiled) | 0.1439
цена (price) | 0.1375 | hrkútať* (to coo) | 0.1429
гуркать* (to coo) | 0.1342 | fajčiarsky (smoking, adj.) | 0.1429
постоять (to stand for a while) | 0.1342 | autorita (authority) | 0.1428
задерживаться (to linger) | 0.1332 | novinár* (newspaper journalist) | 0.1418
клюв* (beak) | 0.1142 | satan[illegible] | 0.1418
потеть (to sweat) | 0.1136 | poskytnúť (to bring, to give) | 0.1416
голубь* (pigeon) | 0.1077 | holub* (pigeon) | 0.1227
платный* (paid) | 0.1068 | zobák* (beak) | 0.1216
скрипеть* (to creak) | 0.1068 | [illegible] | 0.1168
журналист* (journalist) | 0.1068 | šťukať* (to creak) | 0.1148
шестигранно* (hexagonally) | 0.1059 | zovretie (clamp, grip) | 0.1147
подхлестывать (to whip up) | 0.0803 | platený* (paid) | 0.1147

Note. Associate words that form pairs of translation equivalents in the compared clusters are marked with an asterisk (bold type in the original).

When translation correspondences were established between pairs of clusters formed for keywords from the Russian and Slovak texts, the sets of Russian and Slovak associate words turned out to differ quite strongly. Even with optimal AWC parameters, no more than three pairs of similar associate words are regularly recorded for the Russian and Slovak texts. The sets of associate words in the clusters for the keywords Тави and Tavi do not overlap at all. For the keywords цирк and cirkus 'circus', however, nine pairs of equivalent words with similar distribution in the Russian and Slovak texts were found. Table 2 shows the structure of the clusters, with associate words, for the keywords цирк and cirkus.

Thus the results of the AWC procedure applied to the Russian original and the Slovak translation of A. S. Grin's novel "The Shining World" indicate that the translation can be considered adequate; at the same time, the Slovak translation is fairly independent of the Russian original.

5. Conclusions and Prospects for Further Research

The experiments confirmed that original and translated texts can be compared on the basis of AWC results. Further work should include experiments on texts of different sizes and styles, with extended keyword sets, and with varied clustering parameters (similarity measure, context window width, cluster size, etc.). It is also necessary to test a number of hypotheses about how keyword frequency and the density of keyword clusters affect the results of text analysis.

The authors thank R. Garabik (L. Stur Institute of Linguistics, Slovak Academy of Sciences) and V. P. Zakharov (St. Petersburg State University; Institute for Linguistic Studies, Russian Academy of Sciences) for their assistance with the experiments and for help in working with texts from the PARUS corpus; P. V. Panicheva and N. S. Kuznetsova (St. Petersburg State University) for preparing the AWC tool used in this study; and all colleagues who took part in discussing the main ideas and results of the work.



Information about the Authors (translated from Russian)

Gerd Alexander Sergeevich - Doctor of Philological Sciences, Professor, Head of the Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University (SPbSU). E-mail: [email protected]

Grinbaum Oleg Natanovich - Doctor of Philological Sciences, Professor, Department of Mathematical Linguistics, Philological Faculty, SPbSU. E-mail: [email protected]

Dmitriev Alexander Vladislavovich - postgraduate student, Senior Lecturer, Department of Mathematical Linguistics, Philological Faculty, SPbSU. E-mail: [email protected]

Dobrov Alexey Vladimirovich - postgraduate student, Department of Mathematical Linguistics, Philological Faculty, SPbSU. E-mail: [email protected]

Zakharov Victor Pavlovich - Candidate of Philological Sciences, Associate Professor, Department of Mathematical Linguistics, Philological Faculty, SPbSU. E-mail: [email protected]

Zoobkova Tatiana Ivanovna - Candidate of Philological Sciences, Associate Professor, Department of Mathematical Linguistics, Philological Faculty, SPbSU. E-mail: [email protected]

Martynenko Grigorij Yakovlevich - Doctor of Philological Sciences, Professor, Department of Mathematical Linguistics, Philological Faculty, SPbSU. E-mail: [email protected]

Mitrofanova Olga Alexandrovna - Candidate of Philological Sciences, Associate Professor, Department of Mathematical Linguistics, Philological Faculty, SPbSU. E-mail: [email protected]

Nemtseva (Plakhotya) Valentina Vyacheslavovna - Administrator of the business center of OOO "Alliance SPb". E-mail: [email protected]

Nikolaev Ilya Sergeevich - Candidate of Philological Sciences, Associate Professor, Department of Mathematical Linguistics, Philological Faculty, SPbSU. E-mail: [email protected]


Rogozina Elena Andreevna - Senior Lecturer, Department of Mathematical Linguistics, Philological Faculty, SPbSU. E-mail: [email protected]

Rubiner Victoria Igorevna - postgraduate student, Assistant Lecturer, Department of Mathematical Linguistics, Philological Faculty, SPbSU. E-mail: viki [email protected]

Skrebtsova Tatiana Georgievna - Candidate of Philological Sciences, Associate Professor, Department of Mathematical Linguistics, Philological Faculty, SPbSU. E-mail: [email protected]

Sova (Aksenova) Lubov Zinovyevna - Doctor of Philological Sciences, Doctor of Philosophy (Germany). E-mail: [email protected]

Filippov Andrey Konstantinovich - postgraduate student, Department of Mathematical Linguistics, Philological Faculty, SPbSU. E-mail: [email protected]

Khokhlova Maria Vladimirovna - postgraduate student, Department of Mathematical Linguistics, Philological Faculty, SPbSU. E-mail: [email protected]

Chebanov Sergey Viktorovich - Doctor of Philological Sciences, Professor, Department of Mathematical Linguistics, Philological Faculty, SPbSU; Professor, Department of Theoretical and Applied Linguistics, Baltic State Technical University "Voenmekh" named after D. F. Ustinov. E-mail: [email protected]

Yavorskaya Maria Vladimirovna - postgraduate student, Department of Mathematical Linguistics, Philological Faculty, SPbSU. E-mail: [email protected]


Information about Authors

Gerd Alexandre S. - Doctor of Sciences in Linguistics, Professor. Director of the Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University. E-mail: [email protected]

Grinbaum Oleg N. - Doctor of Sciences in Linguistics, Professor. Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University. E-mail: [email protected]

Dmitriev Alexandre V. - Assistant Professor, Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University. E-mail: [email protected]

Dobrov Alexey V. - PhD student, Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University. E-mail: [email protected]

Zakharov Victor P. - Candidate of Sciences in Linguistics, associate professor. Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University. E-mail: [email protected]

Zoobkova Tatiana I. - Candidate of Sciences in Linguistics, associate professor. Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University. E-mail: [email protected]

Martynenko Grigorij Ya. - Doctor of Sciences in Linguistics, Professor. Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University. E-mail: [email protected]

Mitrofanova Olga A. - Candidate of Sciences in Linguistics, associate professor. Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University. E-mail: [email protected]

Nemceva (Plakhotya) Valentina V. - Administrator of the "Alliance" Co Ltd. E-mail: [email protected]

Nikolaev Ilya S. - Candidate of Sciences in Linguistics, associate professor. Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University. E-mail: [email protected]


Rogozina Elena A. - Assistant Professor, Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University. E-mail: [email protected]

Rubiner Victoria I. - PhD student, assistant, Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University. E-mail: viki [email protected]

Skrebtsova Tatiana G. - Candidate of Sciences in Linguistics, associate professor. Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University. E-mail: [email protected]

Sova (Aksenova) Lubov Z. - Doctor of Sciences in Linguistics, Professor. PhD (Germany). E-mail: [email protected]

Filippov Andrey K. - PhD student, Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University. E-mail: [email protected]

Khokhlova Maria V. - PhD student, Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University. E-mail: [email protected]

Chebanov Sergey V. - Doctor of Sciences in Linguistics, Professor. Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University. E-mail: [email protected]

Yavorskaya Maria V. - PhD student, Department of Mathematical Linguistics, Philological Faculty, St. Petersburg State University. E-mail: [email protected]


CONTENTS (translated from Russian)

Martynenko G. Ya. The Mathematics of Harmony in the Humanities and the Arts ... 3
Grinbaum O. N. Structural and Dynamic Interrelations of Rhythm and Meaning in Tatiana Larina's Letter ... 23
Skrebtsova T. G. Construction Grammar as a Linguistic Theory ... 34
Sova L. Z. The Evolution of Grammatical Structure in Languages of Different Types ... 46
Nemtseva (Plakhotya) V. V., Chebanov S. V. Linguostatistical Consequences of the 1918 Spelling Reform ... 60
Martynenko G. Ya., Chebanov S. V. The Semiotics of Football Statistics ... 91
Zoobkova T. I. The Formation of Language Ability: A Cognitive Point of View ... 103
Gerd A. S. A Few Words on Sociolinguistics as a Branch of Applied Linguistics ... 113
Filippov A. K. The Structure of Valency Frames for the Lexico-Semantic Groups of Verbs of Spatial Position and Verbs of Thinking ... 117
Yavorskaya M. V. Polymodality in the Aspect of Perceptual Adjectives and Co-hyperonymy in Its Representation in RussNet ... 138
Mitrofanova O. A., Grachkova M. A., Shimorina A. S. Automatic Word Classification in Parallel Texts (Based on Russian-Language Texts by A. S. Grin and Their Translations into Slovak) ... 161
Zakharov V. P. Logical-Semantic Modeling of Query Vocabulary in Document Information Retrieval Systems ... 174
Rubiner V. I. Genre Classifications of Web Pages ... 188
Khokhlova M. V. Studying the Combinability and Stability of Lexical Units by Automatic Methods ... 206
Dobrov A. V. Intelligent Search Technologies and Ways of Evaluating Their Effectiveness ... 219
Nikolaev I. S. A Research Database on the Morphology of Izhorian Epic Songs: Terminology, Models, and Implementation ... 233
Rogozina E. A. The SKAT Corpus of Hagiographic Texts: Plot Schema of the Lives, XML Markup, and Creation of Tables of Contents ... 243
Dmitriev A. V. Toponymic Studies and Electronic Collections of Place Names of Sweden ... 250
Gerd A. S. Lev Lvovich Bulanin as I Knew and Remember Him (From Materials on the History of the Department of Mathematical Linguistics, St. Petersburg State University) ... 262
Information about the Authors (in Russian) ... 270
Information about Authors ... 272


CONTENTS

Martynenko G. Ya. Mathematics of Harmony in Arts and Humanities ... 3
Grinbaum O. N. Structural and Dynamic Interrelations between Rhythm and Sense in the Letter of Tatiana Larina ... 23
Skrebtsova T. G. Construction Grammar as a Linguistic Theory ... 34
Sova L. Z. Grammatical Structure Evolution in the Languages of Different Types ... 46
Nemtseva (Plakhotya) V. V., Chebanov S. V. Linguostatistical Consequences of Spelling Reform of 1918 ... 60
Martynenko G. Ya., Chebanov S. V. Semiotics of Football Statistics ... 91
Zoobkova T. I. Language Acquisition: A Cognitive Approach ... 103
Gerd A. S. A Few Words about Sociolinguistics as a Branch of Applied Linguistics ... 113
Filippov A. K. Valency Frames Structure of the Verbs of Spatial Location and the Verbs of Mental Activity ... 117
Yavorskaya M. V. Polymodality from the Aspect of Perceptive Adjectives and Cohyperonymy in the Russian Database RussNet ... 138
Mitrofanova O. A., Grachkova M. A., Shimorina A. S. Automatic Word Classification in Parallel Texts (Case Study of Russian Original Texts and Their Slovak Translations) ... 161
Zakharov V. P. Logical and Semantic Modeling of Query Vocabulary in Documentary Search Engines ... 174
Rubiner V. I. Genre Classifications of Web Pages ... 188
Khokhlova M. V. Studying Compatibility and Stability of Lexical Units by Means of Automatic Methods ... 206
Dobrov A. V. Technologies of Intellectual Information Retrieval and Techniques Evaluating Their Effectiveness ... 219
Nikolaev I. S. Izhorian Epic Songs Morphology Research Database: Terminology, Models and Implementation ... 233
Rogozina E. A. "SKAT" Hagiographic Texts Corpus - Content Structure of the Texts and Its XML-Encoding for Tables of Contents ... 243
Dmitriev A. V. Researches on Toponymy and Electronic Collections of Place-Names in Sweden ... 250
Gerd A. S. L. L. Bulanin as I Knew and Remember Him ... 262
Information about Authors ... 272


Scholarly publication

STRUCTURAL AND APPLIED LINGUISTICS

Interuniversity collection

Issue 8

Editor: L. A. Karpova. Cover design: E. A. Solovyeva.

Layout: N. M. [surname illegible in the source]

Signed for printing 19.04.2010. Format 60x84 1/16.

Offset paper. Offset printing. Conventional printed sheets: 16.04. Print run: 200 copies. Order No. [illegible in the source].

St. Petersburg University Press. 199004, St. Petersburg, V. O., 6th Line, 11/21

Tel. (812) 328-96-17; fax (812) 328-44-22. E-mail: [email protected]

www.unipress.ru

For sales inquiries, contact: St. Petersburg, V. O., 6th Line, 11/21, k. 21

Telephones: 328-77-63, 325-31-76. E-mail: [email protected]

Printing House of St. Petersburg University Press. 199061, St. Petersburg, Sredny pr., 41
