A Study on the Distribution of Cooccurrence Weight Patterns of Classical Japanese Poetry

Hilofumi Yamamoto, Tokyo Institute of Technology
Bor Hodošček, Osaka University

Introduction

• P: Hairball effect or spoke effect (Yamamoto 2005)
• P: Difficult to observe all word adjacency features.
• O: Checking the distribution of cooccurrence weight (Yamamoto 2006)
• G: Drawing clear figures and extracting function words.
• S: Cooccurrence weight is like the mean value of two persons' weights.

[Illustration: a bell curve with axes x and y and marks at -σ, 0 and σ, labelled "C. F. Gauss (1777–1855)"; scale labels "← more generic / more specific →" and "light / heavy"; regions labelled "Tall Head", "Long Tail Dinosaur", "Content Tail" and "Function Tail".]

Methods

Material: the Hachidaishū (ca. 905–1205)

Calculation of Cooccurrence Weight (cw):

$$w(t, d) = (1 + \log \mathit{tf}(t, d)) \cdot \mathit{idf}(t)$$
$$cw(t_1, t_2, d) = (1 + \log \mathit{ctf}(t_1, t_2, d)) \cdot \mathit{cidf}(t_1, t_2)$$
$$\mathit{cidf}(t_1, t_2) = \sqrt{\mathit{idf}(t_1) \cdot \mathit{idf}(t_2)}$$
$$\mathit{idf}(t) = \log \frac{N}{\mathit{df}(t)}$$

Bell curve

The distribution of cw becomes a bell curve.
• Over σ ⇒ Content Tail.
• Under -σ ⇒ Function Tail.

Result

[Figure 1: Bell curves. Two histogram panels, Sakura (cherry) and Ume (plum); axes: z-value and frequency.]

Table 1: Upper cutoff patterns of ame (sakura): cw = co-occurrence weight; z = z-value (normalized value of frequency).
Word annotations: ari (be), ba (cond.), ha (topic.), hana (flower), hito (human), keri (past.), ki (past.), koso (emphatic.), miru (see), mo (also), nasi (no exist), nu (neg.), o (obj.), omou (think), ramu (aux.will), su (do), te (p.), to (and), ware (we), zo (emphatic.), zu (neg.)
 #  cw    z      pattern      #  cw    z      pattern      #  cw    z      pattern
 1  0.62  -0.91  mo–keri     11  0.59  -0.96  nasi–ha     21  0.52  -1.05  nu–o
 2  0.62  -0.92  hana–o      12  0.57  -0.98  o–ramu      22  0.52  -1.05  o–zo
 3  0.62  -0.92  o–koso      13  0.57  -0.98  mo–ramu     23  0.52  -1.05  miru–o
 4  0.60  -0.94  zu–keri     14  0.57  -0.98  ha–ki       24  0.48  -1.09  ba–mo
 5  0.60  -0.94  su–ha       15  0.56  -1.00  zu–mo       25  0.48  -1.09  o–keri
 6  0.60  -0.94  to–ba       16  0.56  -1.00  o–te        26  0.43  -1.16  zu–ha
 7  0.59  -0.96  ari–ha      17  0.55  -1.01  hito–mo     27  0.43  -1.16  to–o
 8  0.59  -0.96  ari–mo      18  0.54  -1.02  zu–te       28  0.43  -1.16  te–ha
 9  0.59  -0.96  ware–mo     19  0.52  -1.05  zo–ha       29  0.34  -1.27  o–ha
10  0.59  -0.96  nasi–o      20  0.52  -1.05  omou–o      30  0.34  -1.27  o–mo

well as in classical texts (Fig. 2). Therefore we will attempt to divide t

Table 2: Lower cutoff patterns of ame (sakura) in Kokinshū: 30 out of 164 patterns extracted; cw = co-occurrence weight; z = z-value (normalized value of frequency).
Word annotations: ba (cond.), bakari (only), besi (should be), chiru (fall), fukakusa (deep green), hana (flower), isa (already), kakusu (hide), katu (win), koku (pull), komoru (go deep inside), magiru (mix), makasu (entrust), maku (wind up), manimani (as it is), masi (as), mazu (mix), me (eye), minami (south), miyako (city), mono (thing), nagara (even if), sakura (cherry), si (emphatic.), sumi (black ink), tatu (start, stand), tazumu (being around), tu (past.), uturou (change), watasu (give), yamakaze (mountain wind), yamu (stop), yanagi (willow), yononaka (world)

  #  cw    z     pattern              #  cw    z     pattern
  1  3.86  3.18  yamu–manimani      106  2.38  1.31  si–fukakusa
  2  3.75  3.04  minami–magiru      107  2.38  1.31  sakura–hana
  3  3.67  2.93  minami–maku        108  2.38  1.31  sakura–isa
  4  3.61  2.86  maku–magiru        109  2.38  1.31  sakura–ba
  5  3.42  2.62  yanagi–koku        110  2.38  1.30  sakura–me
  6  3.38  2.57  yamu–makasu          …
  7  3.38  2.56  mazu–koku          155  2.17  1.04  chiru–katu
  8  3.27  2.43  yanagi–mazu        156  2.17  1.04  bakari–sumi
  9  3.26  2.42  sakura–yamu        157  2.16  1.03  maku–besi
 10  3.25  2.40  minami–yamakaze    158  2.16  1.03  tatu–maku
  …                                 159  2.16  1.03  tatu–tazumu
101  2.40  1.33  uturou–komoru      160  2.16  1.03  tazumu–tu
102  2.40  1.33  sakura–watasu      161  2.16  1.03  miyako–sakura
103  2.40  1.33  katu–nagara        162  2.16  1.02  kakusu–si
104  2.39  1.32  sakura–masi        163  2.14  1.00  yononaka–sakura
105  2.39  1.31  sakura–makasu      164  2.14  1.00  mono–sakura

[Figure 2: Construction of the predicate of omou (think) with Function Tail. Network graph; node labels include omou (think), te/de (LK), no/koto (DN), da (ASSER), iru (be-AS), and the final particles yo, wa and sa (FP).]
[Figure 3: Hairball effect. Network graph of cooccurrence patterns for cherry blossom; graph header: "cherry blossom (28/131, 4.28): CT ctf.>0.00; non-dist=off; idf=off; pruned under U:1". Node labels and edge counts omitted.]

[Figure 4: Only with Content Tail. Network graph of cooccurrence patterns for cherry blossom; graph header: "cherry blossom (28/131, 4.28): CT ctf.>0.00; non-dist=off; idf=off; pruned under U:1". Node labels and edge counts omitted.]

Conclusion

1) The distribution of cw in classical texts fits a Gaussian (bell) curve, as it does in modern texts (Hodošček and Yamamoto 2013);
2) the cw value can separate patterns into three layers (low-, mid-, and high-range) using the inflection points (-1σ and 1σ);
3) of the three layers, the high-range could be extracted without a list of stop words;
4) the mid-range lexical layer might include mathematical traits not yet revealed in the present study.

References

• Yamamoto, H. (2005), Visualisation of the construction of poetic vocabulary using the database of the Kokinshū, Jinbun kagaku to dētabēsu (Humanities and Database), the 11th symposium, 81–88, The Council of Humanities and Database.
• Yamamoto, H. (2006), Extraction and Visualisation of the Connotation of Classical Japanese Poetic Vocabulary, Symposium for Computer and Humanities, vol. 2006, 21–28, The Information Processing Society of Japan.
• Hodošček, B. and H. Yamamoto (2013), Analysis and Application of Midrange Terms of Modern Japanese, Computer and Humanities 2013 Symposium Proceedings, No. 4, 21–26.
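
As a rough illustration of the cw formulas in Methods and of the split at -1σ and 1σ described in the Conclusion, the Python sketch below computes cw for a few made-up token lists and assigns each pattern to a layer. Several details are assumptions that the poster does not state: natural logarithms, ctf(t1, t2, d) taken as the smaller of the two term frequencies in d, the mean of cw over the poems containing a pattern as the pattern-level value, and the population standard deviation for z. The names `TOY_POEMS`, `idf_table`, `cooccurrence_weights` and `split_layers` are hypothetical, and the token lists are not actual poems.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from statistics import mean, pstdev

# Toy token lists (NOT real poems): "o" occurs everywhere, four words
# occur in two lists each, and the remaining words occur only once.
TOY_POEMS = [
    ["o", "sakura", "hana", "yanagi", "koku"],
    ["o", "sakura", "chiru", "minami", "magiru"],
    ["o", "hana", "miru", "yamu", "manimani"],
    ["o", "chiru", "miru", "tatu", "tazumu"],
]


def idf_table(poems):
    """idf(t) = log(N / df(t)); N = number of poems, df(t) = poems containing t."""
    n = len(poems)
    df = Counter(t for poem in poems for t in set(poem))
    return {t: math.log(n / count) for t, count in df.items()}


def cooccurrence_weights(poems):
    """Return {(t1, t2): cw}, averaging cw(t1, t2, d) over poems d (assumption)."""
    idf = idf_table(poems)
    per_pair = defaultdict(list)
    for poem in poems:
        tf = Counter(poem)
        for t1, t2 in combinations(sorted(tf), 2):
            ctf = min(tf[t1], tf[t2])  # assumed definition of ctf(t1, t2, d)
            cidf = math.sqrt(idf[t1] * idf[t2])
            per_pair[(t1, t2)].append((1 + math.log(ctf)) * cidf)
    return {pair: mean(values) for pair, values in per_pair.items()}


def split_layers(cw):
    """Assign each pattern to a layer by the z-value of its cw."""
    values = list(cw.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    layers = {"function_tail": [], "mid_range": [], "content_tail": []}
    for pair, value in cw.items():
        z = (value - mu) / sigma if sigma else 0.0
        if z > 1:
            layers["content_tail"].append(pair)
        elif z < -1:
            layers["function_tail"].append(pair)
        else:
            layers["mid_range"].append(pair)
    return layers


if __name__ == "__main__":
    for name, pairs in split_layers(cooccurrence_weights(TOY_POEMS)).items():
        print(name, sorted(pairs))
```

On this toy input the twelve patterns containing o fall below -1σ, while the four patterns that pair two words occurring in a single list each rise above 1σ; no stop-word list is involved. Real data would need tokenised poems and whatever definition of ctf the authors actually used.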