ASIALEX 2005 Lexical association network maps for basic Japanese vocabulary Terry Joyce Institute of Technology Tokyo [email protected]Abstract Extending psycholinguistic research into the lexical representation of two-kanji compound words within the Japanese mental lexicon (Joyce, 2002, 2004), this paper reports on a large-scale word association survey for basic Japanese vocabulary. The database of word association norms, which is being compiled from various survey formats including a web-based version of the survey, supplements existing databases concerning the lexical features of Japanese vocabulary (Amano & Kondo, 1999; Yokoy m a a, Sasahara, Nozaki & Long, 1998), such as familiarity ratings and frequency counts, which are essential for cognitive science research. A particularly promising application of the word association norms data, however, is the creation of lexical association network maps that capture important proprieties of words and their interconnectivity. These maps complement other approaches that attempt to tap into aspects of lexical knowledge, such as WordNet, thesauri, ontologies, and collocation data, while avoiding some of their problems. There are also direct and interesting lexicographical and Japanese language learning applications of the 114
7
Embed
Lexical association network maps for basic Japanese vocabulary.
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
ASIALEX 2005
Lexical association network maps for basic Japanese vocabulary Terry Joyce
Extending psycholinguistic research into the lexical representation of two-kanji compound words within the Japanese mental lexicon (Joyce, 2002, 2004), this paper reports on a large-scale word association survey for basic Japanese vocabulary. The database of word association norms, which is being compiled from various survey formats including a web-based version of the survey, supplements existing databases concerning the lexical features of Japanese vocabulary (Amano & Kondo, 1999; Yokoy ma a, Sasahara, Nozaki & Long, 1998), such as familiarity ratings and frequency counts, which are essential for cognitive science research. A particularly promising application of the word association norms data, however, is the creation of lexical association network maps that capture important proprieties of words and their interconnectivity. These maps complement other approaches that attempt to tap into aspects of lexical knowledge, such as WordNet, thesauri, ontologies, and collocation data, while avoiding some of their problems. There are also direct and interesting lexicographical and Japanese language learning applications of the
114
WORDS IN ASIAN CULTURAL CONTEXTS
database of word association norms and the lexical association maps. In addition to prov in
1968) and, on the other hand, notions of activation spreading through lexical networks (e.g., Collins & Loftus, 1975), much atten on has been devoted to investigating the rich networks of word associations that connect related words together. For instance, Nelson and McEvoy (2003) have recen uence of word associations on cognition, showing that differences in the associative structures of known words—in terms of associate set size, resonance (or backward association), and the connectivity within an associate set—effect performance on memory tasks.
This paper reports on a project to compile a large-scale database of word association norms for basic Japanese vocabulary and to utilize the data to create lexical association network maps that capture important properties of words and their connectivity. After briefly noting some psycholinguistic research into the representation of two-k 004), which highlights the need for comprehensive word association data for cognitive sciencesurvey
ersed order o
he importance of verbal information. In expe
id g more user-friendly means of consulting the lexical entries of electronic dictionaries, the inclusion of the word associative normative data within the lexical entry will also greatly enrich the variety of lexical information. Given the importance of contextual relevance for second language learning, the lexical associative networks also open up extremely effective learning strategies for Japanese language education.
1. Introduction Many areas of cognitive science, such as psychology, artificial intelligence, as
well as natural language processing and computational linguistics, are contributing to our understanding of the mental lexicon and the representation of lexical knowledge. Reflecting, on the one hand, the conviction that associative processes are basic mechanisms underlying cognition (e.g., Cramer,
ti
tly demonstrated the infl
anji compound words within the Japanese mental lexicon (Joyce, 2002, 2
experimentation, the paper outlines the compilation and coding of an initial corpus and data-collection work. Finally, the paper touches on utilizing the
normative data to create lexical association network maps for basic Japanese vocabulary,singling out two promising applications of the maps: namely, as an approach to modeling the semantic representations of connectionist models and in the areas of Japanese lexicography and language learning.
2. The Japanese lemma-unit model (Joyce 2002, 2004) Compounding is an extremely productive word-formation process in Japanese,
making the language particularly important for examining the extent of morphological involvement in the organization of the mental lexicon. In that context, Joyce (2002, 2004) has examined the lexical representation and retrieval of two-kanji compound words by conducting constituent-morpheme priming experiments—comparing the facilitation from component kanji in the lexical decision task—with various kinds of compound words, such as modifier + modified, verb + complement and the rev
f complement + verb, and synonymous pairs. The general results from the experiments are that both constituent prime
conditions facilitate lexical decisions and in the majority of cases at similar levels, indicating that morphology is important in the organization of the mental lexicon. The research has also provided findings pointing to t
riments that manipulated the positional frequency of the verbal constituents in verb + complement and complement + verb compound words, reversed patterns of
115
ASIALEX 2005
priming were observed in high positional frequency conditions where the reaction times for verbal constituent conditions were significantly faster than in the respective complement conditions. To account for the lexical representation and retrieval of two–kanji compound words, Joyce (2002, 2004) has proposed adapting for the Japanese mental lexicon a version of the multilevel interactive-activation framework. A feature of the model is the incorporation of lemma-unit representations that mediate the connections between access representations (both orthographic and phonological form representations) and semantic representations, which is especially appealing for handling the complex nature of the Japanese writing system.
While this visual word recognition research indicates a central role for morphological information in the organization of the Japanese mental lexicon, clearly possible confounding factors, such as association effects, need to be examined further. However, because comprehensive Japanese word association data is currently not available, a central objective of the present research is to construct a large-scale database of word association norms for basic Japanese vocabulary.
3. Word association norms for Japanese
After noting some word association databases for English and Japanese, this section outlines the creation of the initial survey corpus and data-collection preparations.
While Moss and Older (1996) have collected between 40-50 responses for some 2,400 words of British English, Nelson, McEvoy and Schreiber (1998) have compiled the largest database for American English covering some 5,000 words with approximately 150 responses per item. In terms of Japanese language surveys, although the early survey by Umemoto (1969) is well-known and has responses from 1,000 university students, the word corpus is very small with only 210 words. More recently, Ishizaki (2004) has collected word associations for 1,656 nouns for use in building an associative concept dictionary. A drawback with this data, however, is the fact that response category was specified, so it tells us little about free association norms.
In creating an initial corpus of basic Japanese vocabulary for the word association survey, three primary reference sources were used. The first was the survey of basic vocabulary for Japanese language education conducted by the National Language Research Institute (1984), with 6,800 words including a core set of about 2,200 words. The second source was a more recent list of about 4,000 words prepared by Tamamura (2003), while the third reference was a handbook of Japanese orthography (Sanseidō Henshūjo, 1991) listing sanctioned readings for all Jōyō kanji. Based on comparisons of these sources, an initial survey corpus of 5,000 kanji and words has been created, and this will be expanded through the inclusion of associate responses as data collection proceeds.
The discrete free word association task is relatively straightforward—the respondent is simply asked to provide the first meaningfully-related word that comes to mind when presented with a stimulus word. However, a major concern for the present project in seeking to efficiently construct a large-scale database has been to devise an automatic method of generating multiple individual respondent survey lists from the survey corpus, while minimizing as far as practically possible the effects of intra-list association. Accordingly, much of the preparatory efforts have been devoted to coding the survey corpus with information relating to the following criteria: (1) Pronunciation;
116
WORDS IN ASIAN CULTURAL CONTEXTS
necessary due to the high incidence of homophones and to explore the influence of orthographic variants (i.e., kanji versus kana). (2) Orthographic type (i.e., single kanji, multi-kanji, and mixed kanji-kana words); to reduce respondent strategies, orthographic type is mixed in each survey list. (3) Component kanji codes; to ensure that a given
anji only appears once in a survey list. (4) Semantic category codes based on the NTT thesaurus (Ikehara, et al, 1999); to ensure that survey lists do not contain multiple
emantic category. (5) Unique ID code; assigned ID codes will
hich should be shortly). However, data collection is already underway employing traditional
nnaires. For the first block of data, a random sample of 2,000 items was
seco ering the
and
asso al elson
WordNet (Fellbaum Thus, the structure in the network emergesthan being theory-driven or constrained by s ecific relationships, which are concerns for associative concept dictionaries and ontologies where the hierarchical structure is pre-defined. The use of free associations also avoids domain biases that are an issue
k
words from a given sserve as additional measure to eliminate intra-list associations as data collection proceeds.
In order to obtain the large-scale quantities of responses required for the atabase, a web-based format of the survey is being developed (wd
availablepaper questiotaken from the corpus, and approximately 100,000 responses were collected from about 1,000 undergraduates (100 items per questionnaire). While the questionnaire format involves data inputting burdens, it is important for addressing reliability issues. A
nd block of data is currently being collected with questionnaires covremaining items in the survey corpus.
4. Association data applications
The large-scale database of free word association norms for basic Japanese vocabulary will be a valuable resource for cognitive science research, such as memory
visual word recognition experiments. The database of association norms also makes it possible to create lexical association network maps as a means of representing
ciate sets and their connectivity. Figure 1 illustrates the basic concept of lexicassociation network maps, with an example for the English word ‘planet’ based Nand McEvoy (2003). In addition to representing associate set size, the maps can highlight differences in association strengths, both forward and backwards (which are independent), as well as the association density of the associate set.
The lexical association network maps can complement other approaches to uring aspects of lexical knowledge, while avoiding some ocapt f their problems. For
example, unlike the lexicographer’s expertise required to define synonym sets in , 1998), the maps are based on the free responses from respondents.
from tapping into free associations, rather p
VENUS
UNIVERSE
STAR
SPACE
EARTH
SATURN
MOON
MARS
PLUTO
PLANET
association network maps (based on Nelson
Figure 1. Basic concept of lexical
& McEvoy, 2003)
117
ASIALEX 2005
118
44
寒い・さむい 雪
15
6
6
4
2
夏
冬至
白・白い
氷
冬将軍
切ない
越冬 冬眠
北
くま
休息
休み
かまくら
こたつ
春
2
2
2 2
2
2
2
2
2
2
2
冬
15
お金・
10
8
6
人
切手収集
捨てる
ゴミ
集合
集会 趣味
コレクター
集まる
2 2
2
2
2
2
2
2
10
6
コレクション
6
4
4
4
4
収めるおち葉
ガラクタ
カン
コレクト
標本
密集
フィギュア
大人買い
2
2
集める
responses, followed by 夏 ‘summer’ and 冬至 ‘winter solstice’, both at 6 percent, and 白・白い ‘white’ at 4 percent. Thus, 冬 ‘winter’ has a relatively small set of core associates with one particularly strong associate.
Figure 2. Associate sets for 冬 ‘winter’ and 集める ‘gather, collect’
In contrast, the verb 集める ‘gather, collect’ has a larger set of core associates, naturally with weaker association strengths. The primary associate here is
for collocation data. Figur se words 冬
‘winter’ and 集める based on th ssociation responses collected so far. ures on th ions represent the percentage of re hows, ry strong primary associate with the word 寒い・さむい ‘cold’, which accounts for 44 percent of all
The second associate of 雪 ‘snow’ represents only 15 percent of the
お金・sponses. There are also two secondary
ses. Compa むい ‘cold’ and the nou
of associations that exist between words—an important aspect of lexical knowledge—there are promising applications for the lexical
ne such application is in modeling the semantic represe
e 2 resents two examples of associates sets, for the Japane ‘gather, collect’, e free word aThe enclosed fig e arrow connect
sponses. As the figure s 冬 ‘winter’ has a ve
responses.
but金 ‘money’ accounting for 15 percent of the reresponses at 10 percent; namely, 切手 ‘stamps’ and 収集 ‘collection’. Some of the remaining core associates are 人 ‘people’ (8%), 集合 ‘set’ (6%), ゴミ ‘rubbish, trash’ (6%), and コレクター ‘collector’ (6%). Consistent with their respective word classes of noun and verb, these two words exhibit different kinds of syntagmatic respon
red to the very strong association between the adjective 寒い・さn 冬 ‘winter’, more of the core responses for the verb 集める ‘gather, collect’
are nouns that could either occupy the direct object slot (i.e., 切手 ‘stamps’, 人 ‘people’, ゴミ ‘rubbish, trash’) or the subject slot (i.e., コレクター ‘collector’).
In capturing the networks
association network maps. Ontations within connectionist models of the Japanese mental lexicon, like the
Japanese lemma-unit model (Joyce, 2002, 2004). However, there are also direct and interesting lexicographical and Japanese
language learning applications of the word association database and the lexical
WORDS IN ASIAN CULTURAL CONTEXTS
association network maps. For instance, the inclusion of word association norms within the lexical entry would greatly enrich the variety of lexical information presented to the dictionary user. So, in addition to listing information relating to the entry word’s pronunciation, its definitions, inflectional or derivational forms, and idiomatic express ns, the dictionary could also provide word association information, which together with response frequency data, would assist in identifying for a given entry word the relative importance of different kinds of associative relations, such as synonyms, antonyms, and hyponyms, as well as common modifiers and complements, and various other relations.
upplementing electronic dictionaries with the word association normative data and the lexical association network maps could also support more user-friendly search functions (Zock & Bilac, 2004). In daily life, people regularly consult dictionaries to check the spelling of a known word (or the strokes of a kanji character) or to find out the mea
the tongue’ phenomenon (Brown & McNeill, 1966). Typically in th desired word from mental lexicon, they are often able to provide information about some of the word’s form features, such as its initial and/or final letters, or part of its pronunciation. Unfortunately, these are not the kinds of inform t be ed effectively to search ctronic es. Ho , in addition t g me of the target word’s form char ics, the i idual will also be aware of rious kinds of semantic information related to th get word, such as the assoc s the word has with other w In contrast to partial inf ab features, which is too impr any retrieval, known semantic information is potentially much If ssociation normative data and the lexical
network maps were incorporated into electronic dictionaries, then inputting ted word (either single or multiple word entry) could provide access to the
relevan
g is to instill in the learner lexical knowle
system is an electronic study notebook that would build into a personalized dictionary
io
S
ning of a newly encountered word. However, electronic dictionaries, and even thesauri organized primarily according to synonyms, are of little help to the user experiencing the common ‘tip of
is situation, while the individual is unable to retrieve the their
ation hat can usele dictionari wever o knowin so
acterist ndiv vae tar iative relation
ords. ormation out formecise to be ofmore useful.
value for the word a
associationan associa
t lexical association network map, from which the user could follow appropriate association links until the target word is identified.
Given the importance of contextual relevance for second language learning, the lexical associative networks also open up extremely effective learning strategies for Japanese language education. Memory researchers have long demonstrated that the categorization and semantic organization of stimulus materials have dramatic effects on retrieval performance (e.g., Bower, Clark, Winzenz, & Lesgold, 1969). Accordingly, the lexical association network maps for basic Japanese vocabulary could be a very useful resource in the context of teaching Japanese as a foreign language. To the extent that the goal of second language teachin
dge for the target language that approaches that of the native speaker, the lexical association network maps, depicting sets of associatively-related words based on the free word association responses of native Japanese speakers, represent a kind of authentic study material and a value reference source for the second language learner in judging the naturalness of lexical combinations.
In pursuing these Japanese lexicographical and language learning applications of the database of word association norms and the lexical association network maps, this research project is also working to create a comprehensive kanji database and integrated kanji instruction system. The key concept behind the integrated kanji instruction
119
ASIALEX 2005
by drawing on the database reference source through various learning assignments, ranging from basic ‘look-up tasks’ to more advanced ‘expansion tasks’ that would help the lea
This research project is part of the 21st Century COE Program ‘Framework for Systematization and Application of Large-scale Knowledge Resources’ (program leader; Professor Sadaoki Furui) at the Tokyo Institute of Technology, Japan.
References Amano, Shigeaki, and Kondō, Tadahisa (Eds.), (1999), Nihongo no goitokusei [Lexical properties of
Japanese] Vols. 1-6, NTT database series. Tokyo: Sanseidō. Bower, G. H., Clark, M. C., Winzenz, D., and Lesgold, A. (1969), ‘Hierarchical retrieval schemes in recall
of categorized word lists’, Journal of Verbal Learning and Verbal Behavior, 8, 323-343. Brown, R., and McNeill, D., (1966), The ‘tip of the tongue’ phenomenon. Journal of Verbal Learning and
Verbal Behavior, 5, 325-337. Collins, Allan. M., and Loftus, Elizabeth. F. (1975), ‘A spreading-activation theory of semantic
processing’, Psychological Review, 82, 407-428. Cramer, Phebe, (1968), Word association. New York & London: Academic Press. Fellbaum, Christiane (Ed.), (1998), WordNet: An electronic lexical database, Cambridge: MIT Press. Ikehara, Satoru, Miyazaki, Masahiro, Shirai, Satoshi, Yokoo, Akio, Nakaiwa, Hiromi, Ogura, Kentaro,
Ōyama, Yoshifumi, and Hayashi, Yoshihiko, (1999), Nihongo goi taikei [Goi-taikei-A Japanese lexicon] (CD-Rom), Tokyo: Iwanami Shoten.
Ishizaki, Shun, (2004), Rensō gainen jisho Version 1.0 CD. Joyce, Terry, (2002), ‘Constituent-morpheme priming: Implications from the morphology of two-kanji
compound words’, Japanese Psychological Research, Blackwell: Japan, pp. 79-90. Joyce, Terry, (2004), ‘M cal, orthographic and
phonological conside ychological Research, Vol
me no kihon goi chōsa, Tokyo: Shuei n.
las L., and McEvoy, Cathy L., (2003), ‘Implicitly activated memories: The missing links of rem
un CD-R yoru kanj ō [Elec nic newspape edia kanji: Kanji y list d on Asahi aper CD-R okyo: Sanseidō.
Zo l, an aven, , Word lo he basi associations rom an idea to a OL G2004 Work on Enhancin using elec nic dictionarie August, Geneva.
rner develop a deeper understanding of kanji structure and the morphological principles of compound words.
Acknowledgements
odeling the Japanese mental lexicon: Morphologirations’, In Serge P. Shohov (Ed.), Advances in Ps
ume 31, (pp. 27-61). Hauppauge, NY: Nova Science. Moss, Helen, and Older, Lianne, (1996), Birkbeck word association norms, Hove: Psychological Press. National Language Research Institute, (1984), Nihongo kyōiku no ta
ShuppaNelson, Doug
embering’, Fourth Tsukuba International Conference of Memory (Human learning and memory: Advances in theory and application), 11-13 January, Epochal Congress Center, Tsukuba, Japan.
Nelson, Douglas L., McEvoy, Cathy L., and Schreiber, Thomas A., (1998), The University of South Florida word association, rhyme, and word fragment norms, http://www.usf.edu/FreeAssociation.
Sanseidō Henshūjo, (1991), Atarashii kokugo hyōki handobukku (Dai yonhan), Tokyo: Sio, (2003), ‘Chūkyūyō goi: Kihon 4000 go’, Nihongo Kyōiku, Tokyo, pp. 5969), Rensō kijunhyō: Daigakusei 1000 nin no jiyū rensō ni yoru, Tokyo: T
Shupkoyama, Sh
nji: Asai., Sasa
nbhara, Hir yuki, Noz ironari, an
y. Eric, (1998 inbun den
no kafrequ
hi Shis base
OM niNewsp
i hindohOM]. T
tro r menc
ck, Michae d Bilac, Sl (2004) okup on t s of : Froadmap, C IN shop g and tro s,