Allomorphic responses in Serbian pseudo-nouns as a result of analogical learning
Petar Milin a, c, Emmanuel Keuleers b, Dušica Filipović Đurđević a, c
a Department of Psychology, University of Novi Sad, Serbia b Department of Experimental Psychology, Ghent University, Belgium c Laboratory for Experimental Psychology, University of Belgrade, Serbia
Abstract: Allomorphy is a phenomenon that occurs in many languages. Several
psycholinguistic studies have shown that allomorphy, where present, co-determines
cognitive processing. In the present paper we discuss allomorphic variation in the
Serbian instrumental singular of pseudo-nouns as emerging from analogical
learning. We compare the predictions derived from memory-based language
processing models with results from a previous experimental study with adult Serbian
native speakers. The results confirm that the production of suffix allomorphs in the
Serbian instrumental singular of masculine nouns can be accounted for by memory-based
learning and simple analogical inference. The present findings are in line with a
growing body of research showing that memory-based learning models make
relevant predictions about the cognitive processes involved in various linguistic
phenomena.
Keywords: allomorphy, memory-based learning, analogy, Wug-task, Serbian
Introduction
In this paper we will present a probabilistic computational model of allomorphy and
demonstrate that allomorphic variation may arise from analogical learning of the
mapping from stems to inflected forms. We will make use of behavioral experiments
that were previously conducted with adult native speakers of Serbian engaged in a
computerized Wug task (Jovanović, 2008; see Berko, 1958 for the original Wug task
experiment). Looking at the two allomorphic forms of the instrumental singular of
Serbian masculine pseudo-nouns, we will compare the performance of native
speakers with the outcomes of several simulations using computational models of
analogical learning.
Allomorphy is variation in the form of a particular morpheme without
a change in its meaning (cf. Lieber, 1982; Lyons, 1986; Spencer, 2001 etc.). In
English, variations in the -ed morpheme used in the regular past tense, and the -s
morpheme used to mark noun plurals, are well known examples. The regular English
past tense suffix appears in three different forms (or morphs), depending on the final
sound of the verbal stem: walk-ed (/t/), jogg-ed (/d/), trott-ed (/əd/). In modern Arabic,
allomorphy occurs in an etymon – a bi-consonantal morphological unit that carries
semantic information of a given word (Ratcliffe, 1998; Boudelaa & Marslen-Wilson,
2001; Boudelaa & Marslen-Wilson, 2004). In Dutch, the diminutive suffix has two
frequent allomorphic variations (-tje and -je), and three less frequent ones (-etje, -pje
and -kje) (Daelemans, Berck & Gillis, 1997). In Finnish, allomorphy appears both in
the stem (Järvikivi & Niemi, 2002) and in suffixes (Järvikivi, Bertram & Niemi, 2006).
Similarly, in Hungarian, allomorphic variation occurs as stem shortening or
lengthening (Pléh, Lukács & Racsmány, 2002), and as alternating suffix vowels
(Kertész, 2003; Hayes & Cziráky-Londe, 2006). Finally, allomorphy is present in
Slavic languages as well. Affixal allomorphy in Russian is discussed in detail by
Blevins (2004), while Ivić (1990) and Zec (2006) provided linguistic analysis of the
suffix allomorphy in Serbian instrumental singular masculine and neuter nouns.
Allomorphy as a cognitive phenomenon
For cognitive science, and in particular for psycholinguistics, the main question about
any language phenomenon is its cognitive relevance. If a linguistic phenomenon
produces critical differences in behavioral and/or neurological measures, it can be
said to have cognitive relevance. Although often not of central interest, the
cognitive relevance of allomorphy has repeatedly been attested in behavioral
research. Schreuder and Baayen (1995) suggested that words with affixes that have
several allomorphs may be processed more slowly than words containing affixes
with no allomorphic variation. Järvikivi, Bertram & Niemi (2006) made a similar but more
detailed claim, using the concept of affixal salience – "the probability with which an
affix is likely to emerge from the orthographic/phonological string" (p. 395). They
showed that affixal salience decreases as the number of affixal allomorphs
increases. In contrast to this inhibitory effect of allomorphy on a given affix,
allomorphic realizations of bound nominal stems in Finnish significantly primed
the same noun in its base form, the nominative singular (Järvikivi & Niemi,
2002a; Järvikivi & Niemi, 2002b). Similarly, in a priming task in Spanish, Allen &
Badecker (1999) found no difference between conditions in which the prime was a
true stem-homograph of the target (e.g., "placer" (pleasure, to please/inf./) – "placa"
(plate, panel)) and conditions in which the target was preceded by a stem allomorph
of the prime (e.g., "plazca" (to please/subjunctive 3 Pers. Sg./) – "placa" (plate,
panel)). Finally, specific difficulties in processing allomorphic variation in Hungarian
nouns were observed in typically developing children (Pléh, 1989) and in children with
Williams syndrome (Pléh, Lukács, & Racsmány, 2002).
One of the most common instances of allomorphy in Serbian is the suffix allomorphy
(-em vs. -om) in the instrumental singular (the case expressing the means or
instrument) of masculine nouns. For instance, Serbian native speakers may hesitate
between "nos-om" and "nos-em" (using the nose), "malj-om" and "malj-em" (using an odor),
"obruč-om" and "obruč-em" (using a hoop), "pištolj-om" and "pištolj-em" (using a
revolver), and so on. Jovanović et al. (2008) directly addressed this form of
allomorphy using two experimental tasks. First, using a sentence completion task,
the authors confirmed that suffix allomorphy in Serbian instrumental singular
masculine nouns occurred only when a noun stem ended in a particular subset of
consonants: palato-alveolars or back coronals.1 Second, using a visual lexical
decision task, they showed that suffix allomorphy in Serbian masculine nouns with
stems ending in back coronals elicits significant differences in processing latencies:
for words with the -om suffix, an increase in observed form frequency was
associated with an increase in processing latency, while for the -em suffix, an
increase in form frequency was associated with a decrease in reaction time. This
interaction between a particular suffix realization (-om or -em) and its probability in
the production task showed that even though -om is the most frequent suffix in the
Serbian instrumental singular, it is processed more slowly when encountered within the
phonological domain in which -em is preferred. Although such complementarity
might suggest rule-based derivation of the two allomorphic forms, we will argue
that this pattern can emerge from a more parsimonious learning principle.
1 The different subset labels come from two ways of classifying consonants. Front coronals match
alveolars: n (/n/), l (/l/) and r (/r/), and include five additional consonants: t (/t/), d (/d/), s (/s/), z (/z/)
and c (/ts/). Back coronals match palato-alveolars: č (/tʃ/), ć (/tɕ/), dž (/dʒ/), đ (/dʑ/), nj (/ɲ/), lj (/ʎ/), j
(/j/), š (/ʃ/) and ž (/ʒ/).
Modeling allomorphic response as analogical learning
Jovanović and her collaborators (2008) discussed their results with respect to the
findings of Mirković, Seidenberg & Joanisse (2009), who used a connectionist
network to model the production of Serbian case-inflected morphology. This model
used a training set of 3244 Serbian nouns and learned to produce the correct case
endings by developing particular probabilistic constraints at the level of phonology
and semantics. At the end of learning, the error rate for the masculine instrumental
singular, our target case, was still approximately 4%. However, the model excluded
the possibility of both suffixes applying to the same stem with different
probabilities; instead, it implemented a simple rule that attached either -om or -em to a
given stem. For instance, all masculine nouns with a stem ending in an alveolar or
palato-alveolar consonant necessarily took the -em suffix, while all nouns with other
terminating consonants took -om instead (Mirković et al., 2009).
In contrast, a study by Zec (2005) showed that masculine noun stems ending with a
coronal can, and usually do, have allomorphic realizations in the instrumental
singular: both -om and -em can apply. Jovanović and colleagues (2008) and
Jovanović (2008) confirmed the analysis of Zec (2005), both in a lexical decision task
and in a computerized modification of the Wug task (Berko, 1958), administered to
adult native speakers of Serbian. More specifically, masculine nouns ending in back
coronals (or palato-alveolars) were significantly more likely to allow for both suffix
allomorphs (-om and -em).
In principle, connectionist networks should be capable of modeling allomorphy. In
particular, a probabilistic version of the model of Mirković and collaborators (2009)
could account for allomorphic variation in Serbian nouns. However, the great
power and flexibility typical of artificial neural networks come at the cost of
limited insight into how a given network achieves a particular morphological mapping.
As Norris (2005) suggested, the true contribution of connectionist models should not
come from their performance, but from understanding the principles that guide the
performance of the networks (see also Baayen, 2003 for a more elaborate
discussion). Thus, the question is whether more transparent learning
mechanisms could meet the same goal. In particular, we are interested in testing
whether allomorphic variation in the Serbian instrumental singular can be modeled
with a very simple analogical approach. Before we go any further, however, a note
of caution is in order: it is perfectly possible to model the same phenomenon
successfully with different machine learning approaches. What matters is the
contribution that each approach makes to our understanding of the phenomenon.
Following Marr (1982), we can say that analogical learning improves our
understanding mostly at the algorithmic level, revealing the processes and
representations involved in the task. A connectionist network, by contrast, improves our
understanding mostly at the implementational level, showing how neural structures
and neuronal activities might implement a given cognitive task.
Our claim is that allomorphy can arise from analogical inference, in which
sources of analogy (existing stem forms) compete with each other in providing one
or the other suffix allomorph, that is, one of the possible inflected forms of the
instrumental singular of masculine nouns. Acquisition and processing of linguistic
knowledge by means of memory and analogy has a long history in twentieth-century
linguistics (De Saussure, 1916; Bloomfield, 1933; Harris, 1951; 1957 etc.). More recently,
the idea has been further developed by usage-based models of language (Bybee, 2007). In
psychology, the concept of analogy can be seen in exemplar-based accounts of
human categorization behavior (Smith & Medin, 1981; Nosofsky, 1986; Estes, 1994).
According to these accounts, categories are formed by storing exemplars in memory,
and categorization decisions are made by relying on similarities of target stimuli to
exemplars stored in memory. In computational linguistics, these ideas have been
applied in memory-based learning (Daelemans & Van den Bosch, 2005) and
Analogical Modeling of Language (Skousen, 2002).
According to the memory-based learning approach, a categorization decision (e.g.,
the choice of allomorph) is resolved by re-using existing exemplars and by analogical
reasoning. For this process to take place, three conditions need to be
fulfilled. In the case we are studying here, first, we need a store of exemplars
(stems), each with its assigned exponent (the instrumental ending). These exemplars can be
represented as vectors of phonological features at the subsyllabic level (i.e., the
onset, nucleus and coda of each syllable). Second, a distance function is required to
compute the similarity of the target form to the forms stored in memory. Finally, in
order to assign a class to the novel exemplar, a decision function is required. The
decision function is adopted from the field of artificial intelligence and is based on the
k nearest neighbor classifier method (k-NN). This implies that the outcome of the
decision function is determined by the class of the k nearest neighbors (e.g., if k = 1,
a novel exemplar is assigned the class of the exemplar most similar to it). Memory-based
learning has a long history of application within the field of computational
linguistics. Recently, the method has also been successfully applied in
psycholinguistic research, where the aim is to approach the performance of native
speakers, that is, to simulate the functioning of the cognitive system. By now, a
considerable body of empirical data has demonstrated the efficiency of memory-based
learning. Keuleers et al. (2007) and Keuleers and Daelemans (2007) have
demonstrated that outcomes of simulations based on the memory-based learning
paradigm mimic performance of native speakers in the production of Dutch noun
plurals. Similar findings have been reported for Italian verb conjugations (Eddington,
2002a), Spanish gender assignment (Eddington, 2002b), linking elements in German
compounds (Krott, Schreuder, Baayen and Dressler, 2007) and so on.
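The three ingredients listed above (an exemplar store, a distance function and a k-NN decision function) can be put together in a minimal Python sketch. It is an illustration under stated assumptions, not the implementation used in the simulations: the syllabification rule (all consonants before a vowel form its onset, word-final consonants form the coda), the restriction to the last two syllables, the "=" padding symbol, the use of the Overlap distance and the eight-word lexicon are all hypothetical choices made only for this example.

```python
from collections import Counter

VOWELS = set("aeiou")
DIGRAPHS = ("dž", "lj", "nj")  # Serbian Latin digraphs spelling a single consonant


def segments(word):
    """Split a Serbian Latin spelling into phoneme-sized segments."""
    segs, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            segs.append(word[i:i + 2])
            i += 2
        else:
            segs.append(word[i])
            i += 1
    return segs


def syllables(word):
    """Group segments into (onset, nucleus, coda) triples.

    Assumption: consonants before a vowel form that vowel's onset;
    word-final consonants form the coda of the last syllable.
    """
    sylls, pending = [], []
    for seg in segments(word):
        if seg in VOWELS:
            sylls.append(["".join(pending), seg, ""])
            pending = []
        else:
            pending.append(seg)
    if pending:
        if sylls:
            sylls[-1][2] = "".join(pending)
        else:  # no vowel letter (e.g. syllabic r): keep everything as one onset
            sylls.append(["".join(pending), "", ""])
    return [tuple(s) for s in sylls]


def features(word, n_syll=2):
    """Right-aligned onset/nucleus/coda slots for the last n_syll syllables."""
    sylls = syllables(word)[-n_syll:]
    sylls = [("", "", "")] * (n_syll - len(sylls)) + sylls
    return tuple(slot or "=" for syl in sylls for slot in syl)


def overlap(x, y):
    """Overlap (Hamming) distance: number of mismatching slots."""
    return sum(a != b for a, b in zip(x, y))


def knn_suffix(pseudo_stem, lexicon, k=1):
    """Majority suffix among the k stored stems closest to pseudo_stem."""
    target = features(pseudo_stem)
    ranked = sorted(lexicon, key=lambda stem: overlap(target, features(stem)))
    votes = Counter(lexicon[stem] for stem in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical mini-lexicon: stem -> instrumental singular ending.
LEXICON = {"grad": "om", "zid": "om", "nos": "om",
           "nož": "em", "konj": "em", "kralj": "em", "ključ": "em", "mač": "em"}

print(features("cogilj"))  # ('c', 'o', '=', 'g', 'i', 'lj')
print(knn_suffix("cogilj", LEXICON, k=3))
```

With only eight stored stems, the outcome for "cogilj" depends on which few stems happen to be nearest; a realistic memory would hold thousands of stored nouns.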
Problem
In this paper, we will compare the predictions derived from memory-based learning
models to experimental results by looking at production of allomorphic variations
using pseudo-nouns in the domain of the Serbian instrumental singular. Attempts
have been made to describe the orthographic/phonological properties of stems that
lead to the production of each of the two allomorphic variants (Zec, 2006 in
particular). These descriptions were moderately successful in predicting responses
collected from native speakers, and can be seen as rules for choosing an allomorph.
In this study, we will not compare the predictions derived from these rules to the
results obtained by means of exemplar-based modeling. Our aim is to demonstrate
that analogical learning can account for allomorphic variation at least as well as the
rule-based descriptions. Moreover, the analogical models differ from the rule-based
descriptions in that they operate in a completely inductive manner. By this we mean
that the models do not rely on a priori knowledge of which features are important
and which are not.
The predictive power of the memory-based learning models will be tested by
comparing the outcomes of simulations to behavioral responses collected from
native speakers. In particular, for each allomorph, we will look at the
correlation coefficients between the probabilities assigned by the model and the
probabilities observed in the behavior of native speakers (computed by dividing the
number of speakers who preferred the allomorph by the total number of speakers in
the sample). Because the simulations are based on the principles of memory-based
learning, high correlation coefficients between these probabilities would suggest that
these principles have cognitive relevance.
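As a concrete illustration of this comparison, the sketch below turns hypothetical counts of speakers choosing -em (out of 42, the sample size reported in the Method section) into observed probabilities and correlates them with hypothetical model probabilities using the Pearson product-moment coefficient. All numbers and function names are invented for the example.

```python
import math

N_SPEAKERS = 42  # sample size of the Wug-task experiment


def observed_probability(n_choosing, n_total=N_SPEAKERS):
    """Proportion of speakers who preferred a given allomorph."""
    return n_choosing / n_total


def pearson(xs, ys):
    """Product-moment correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for five pseudo-stems.
em_choices = [40, 35, 21, 10, 3]              # speakers choosing -em (of 42)
model_p_em = [0.90, 0.75, 0.55, 0.20, 0.05]   # model estimate of P(-em)

human_p_em = [observed_probability(c) for c in em_choices]
print(round(pearson(human_p_em, model_p_em), 3))
```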
Finally, the memory-based learning models will use only similarity between forms at
the level of orthography/phonology.2 Although including additional kinds of similarity
could be expected to improve predictions, we opt for simplicity and examine the
explanatory potential of a simple measure.
Method
Experimental data
The experimental data are taken from Jovanović (2008). In total, 42 adult
participants (first-year psychology students in Novi Sad, mainly female, with
normal or corrected-to-normal vision) took part in a computerized Wug task.
Jovanović used 125 pseudo-stems that followed Serbian ortho-phonotactic
constraints. Each pseudo-stem was exactly five letters long (with the digraphs lj, nj
and dž counting as single letters) and had a fixed CVCVC structure. The final VC
segment was controlled: each of the 25 Serbian consonants occurred five times as the
final consonant, preceded once by each of the five vowels.
For example, some of the pseudo-stems used in the experiment were "bobaš", "cogilj",
"gofić", "nirib" and "salav". To implement the Wug task, Jovanović
downloaded 125 pictures from the "What is it?" website
(http://puzzlephotos.blogspot.com). Each trial started with the presentation of an
unknown picture with its pseudo-word label in the nominative singular (for 2000 ms).
Then, a grammatical Serbian sentence appeared with the critical pseudo-word shown in
both instrumental singular allomorphs: one allomorph was positioned one row above,
and the other one row below, a blank space in line with the rest of the words
forming the sentence (for example: "Motori se testiraju
cogiljem/cogiljom."; in English: "Engines are tested (by) cogiljem/cogiljom."). The
participants' task was to choose one of the two forms by pressing a spatially
corresponding button. There was no response time-out. It took each participant
approximately 10 minutes to complete the task. The probability of each of the two
allomorphic forms was estimated from the participants' choices.
2 Serbian has a shallow orthography, and the mapping from phonology to orthography is one-to-one.
Hence, for the purpose of the present research, this difference can be disregarded.
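The factorial structure of the stimulus set (25 final consonants × 5 preceding vowels = 125 pseudo-stems) can be checked with a short sketch. The consonant list is the standard set of 25 consonant letters of the Serbian Latin alphabet, the back-coronal grouping follows footnote 1, and the helper final_consonant is a hypothetical name introduced for illustration.

```python
VOWELS = ["a", "e", "i", "o", "u"]
# The 25 consonant letters of the Serbian Latin alphabet (digraphs count as one).
CONSONANTS = ["b", "c", "č", "ć", "d", "dž", "đ", "f", "g", "h", "j", "k", "l",
              "lj", "m", "n", "nj", "p", "r", "s", "š", "t", "v", "z", "ž"]
# Back coronals (palato-alveolars), following footnote 1.
BACK_CORONALS = {"č", "ć", "dž", "đ", "nj", "lj", "j", "š", "ž"}

# Every final VC segment: each consonant preceded once by each vowel.
final_vc = [(v, c) for c in CONSONANTS for v in VOWELS]


def final_consonant(stem):
    """Last consonant of a stem, treating dž, lj and nj as single consonants."""
    return stem[-2:] if stem[-2:] in ("dž", "lj", "nj") else stem[-1]


n_back = sum(c in BACK_CORONALS for _, c in final_vc)
print(len(final_vc), n_back)  # 125 45
```

Under this design, 45 of the 125 pseudo-stems end in a back coronal, the phonological domain in which Jovanović et al. (2008) found suffix allomorphy.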
Simulation procedure
Implementation of the memory-based learning model started with the selection of an
exemplar store that made up the "memory" of the model. For the present
research, we used all 3481 masculine and neuter nouns from the Frequency
Dictionary of Contemporary Serbian Language (Kostić, 1999) that occurred in the
instrumental singular. Neuter nouns were included because their instrumental
singular can also take either the -om or the -em suffix, depending on the final vowel
(-o or -e). This inclusion added noise to the exemplar store, thus making
analogical learning more demanding.
In memory-based learning, the problem of predicting an allomorph is considered a
simple classification problem: each pseudo-word needs to be classified as taking
-om or -em. For this, the memory base was searched for the k nearest neighbors. For
instance, in a model where the neighborhood size (k) equals 7, we would search the
memory for the 7 stems most similar to the pseudo-word.3 We could then
count how often the -em and -om suffixes occurred among these stems. The
estimated probability of each suffix was then simply the number of times it occurred in
the neighborhood divided by the total number of stems in that neighborhood. We tested
models with different neighborhood sizes: we increased k linearly from 1 to 16, after
which we doubled it at each step (exponential growth with base 2: k = 32, 64, ..., 1024, 2048),
until finally k equalled the size of the lexicon (3481 items).
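The neighborhood-based estimate and the schedule of neighborhood sizes described above can be sketched as follows. This is a hypothetical illustration: the function names are ours, the distances in the toy memory are invented, and ties are handled as described in footnote 3, so that k counts distinct nearest distances rather than individual stems.

```python
from collections import Counter


def k_schedule(lexicon_size=3481):
    """Neighbourhood sizes: 1..16, then doubling from 32, then the whole lexicon."""
    ks = list(range(1, 17))
    k = 32
    while k < lexicon_size:
        ks.append(k)
        k *= 2
    ks.append(lexicon_size)
    return ks


def suffix_probabilities(scored_memory, k):
    """Estimate P(-om) and P(-em) from the k nearest distances.

    scored_memory: list of (distance_to_pseudo_word, suffix), one pair per stored stem.
    Every stem tied at one of the k smallest distinct distances is included.
    """
    cutoff = sorted({d for d, _ in scored_memory})[:k][-1]
    neighbourhood = [suffix for d, suffix in scored_memory if d <= cutoff]
    counts = Counter(neighbourhood)
    return {s: counts[s] / len(neighbourhood) for s in ("om", "em")}


print(k_schedule())
memory = [(1, "em"), (1, "om"), (1, "em"), (2, "om"), (3, "om")]
print(suffix_probabilities(memory, k=1))  # three stems tied at distance 1
```

With k = 1 the neighborhood already contains three stems, because all of them lie at the same smallest distance.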
In addition to the parameter k, memory-based learning models have two other
crucial parameters: the distance metric used for computing the similarity between
exemplars stored in memory and the pseudo-word to be classified, and the decay
function, which defines how a neighbor's weight in the classification decreases with
its distance from the target pseudo-word. We employed three well-known distance
metrics: Jeffrey divergence, Levenshtein distance, and Hamming (or Overlap)
distance. The Overlap metric is the coarsest: it simply counts the number of
mismatching features. Levenshtein distance is a generalized version of Overlap
distance: it measures the minimum number of features that must be inserted, deleted,
or replaced to transform the stem into the pseudo-word. Finally, Jeffrey divergence uses
principles from information theory to give a weight to each feature, and operates as a
weighted Overlap metric (for an in-depth presentation of these measures, consult Rubner,
Tomasi & Guibas, 2000; Levenshtein, 1966; Hamming, 1950; for their application in
linguistics, see Daelemans & Van den Bosch, 2005). In addition to the distance
metrics, we compared three decay functions: Zero Decay, where all neighbors have
the same influence on classification regardless of their distance to the pseudo-word;
Inverse Distance Decay, where neighbors are weighted by the inverse of their
distance; and Exponential Decay, where a neighbor's weight decreases
exponentially with its distance. Since the neighborhood size, the definition of
similarity and the decay weighting all affect the composition and weighting of the
neighborhood, these parameters can interactively affect the outcome of a simulation.
3 In practice, the parameter k refers to nearest distances rather than nearest neighbors. When several
exemplars occur at the same distance from the target, these exemplars are considered tied. In other
words, a k-NN model looks at no fewer than k exemplars. See Keuleers and Daelemans (2007) for a more
detailed treatment of this issue.
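Two of the three distance metrics and all three decay functions can be sketched in Python as follows. This is an illustrative sketch rather than the code used for the simulations: Jeffrey divergence is left out because it depends on feature statistics estimated from the whole exemplar store, the small constant eps in the inverse-distance weight (needed when a neighbor lies at distance 0) and the rate alpha = 1 in the exponential weight are assumed values, and the weighted vote in weighted_probabilities (a hypothetical helper) is one plausible way to turn neighbor weights into suffix probabilities.

```python
import math


def overlap(x, y):
    """Hamming/Overlap distance: number of mismatching positions (equal lengths)."""
    if len(x) != len(y):
        raise ValueError("Overlap distance needs equal-length feature vectors")
    return sum(a != b for a, b in zip(x, y))


def levenshtein(x, y):
    """Minimum number of insertions, deletions and substitutions turning x into y."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        curr = [i]
        for j, b in enumerate(y, start=1):
            curr.append(min(prev[j] + 1,               # delete a
                            curr[j - 1] + 1,           # insert b
                            prev[j - 1] + (a != b)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


# Decay functions: weight of a neighbour as a function of its distance d.
DECAYS = {
    "zero": lambda d: 1.0,                                   # all neighbours equal
    "inverse": lambda d, eps=1e-6: 1.0 / (d + eps),          # eps is an assumed constant
    "exponential": lambda d, alpha=1.0: math.exp(-alpha * d),  # alpha is assumed
}


def weighted_probabilities(neighbours, decay):
    """neighbours: list of (distance, suffix). Returns the weighted share of each suffix."""
    weights = {"om": 0.0, "em": 0.0}
    for d, suffix in neighbours:
        weights[suffix] += DECAYS[decay](d)
    total = sum(weights.values())
    return {suffix: w / total for suffix, w in weights.items()}


hood = [(1, "em"), (1, "om"), (3, "om")]
for name in DECAYS:
    print(name, weighted_probabilities(hood, name))
```

Because strings and tuples both support element-wise comparison, levenshtein and overlap work equally on raw spellings and on segment or feature sequences.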
Results
In the first step of the analysis, we estimated the probability of producing the -om and
-em suffix for each pseudo-noun from the participants' responses in the Wug task. These
probabilities were then correlated with the outcomes of the memory-based learning
simulations, in which distance metric, decay weighting and neighborhood size were
systematically varied as factors. These results are presented in Figure 1.
As the plots show, the similarity between human and computer results,
expressed as a product-moment correlation coefficient, reached its maximum
very rapidly. This means that in most cases a very small number of exemplars was
sufficient for memory-based learning to make a correct analogy and to produce
human-like suffix allomorphy in the instrumental singular of Serbian pseudo-nouns.
Beyond about ten nearest neighbors, little more could be gained, as the right-hand
segments of the lines (exponential increase in neighborhood size) show.
Moreover, without decay weighting, any further increase in the number of neighbors
reduced the similarity between human and simulated behavior, while
exponential and inverse distance decay merely alleviated the cost of using large
neighborhoods.
Figure 1. Correlation coefficients between probabilities of producing -om and -em suffix allomorph, in
behavioral experiment (Wug-task) and computer simulations. Line-breaks mark points where increase
in neighborhood size changes from linear to exponential.
Visual row-wise comparison of the graphs already shows that
there were no substantial differences between the three distance metrics. However,
in addition to visual inspection, we performed more detailed statistical
comparisons of the three distance measures. Crossing two allomorphs with
three distance measures and three decay weights yielded a total of
18 correlation coefficients for each neighborhood size. In
order to demonstrate that there were no significant differences among these 18
correlation coefficients within a given neighborhood size, we tested the
significance of the difference between the smallest and the largest correlation
coefficient for each of the first sixteen neighborhood sizes separately. If the
difference between the smallest and the largest correlation coefficient was not
significant, we could infer that none of the other differences were either. In this way,
we could show that all three distance measures, combined with each of the three
decay weightings, performed comparably for both the -em and the -om forms at a given
neighborhood size. None of the tests reached significance, suggesting that, in the
range from one to sixteen nearest neighbors, any of the three measures combined with
any of the three decay weightings achieves approximately the same success in
simulating human production.
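The logic of this extreme-pair test can be sketched as follows. The text does not state which significance test was used, so the sketch assumes Fisher's r-to-z test for two independent correlations, with n = 125 on the assumption that each correlation was computed over the 125 pseudo-stems; because all 18 correlations share the same human data, a test for dependent correlations may be more appropriate in practice. The condition labels, the helper names and the correlation values are hypothetical.

```python
import itertools
import math

ALLOMORPHS = ("om", "em")
METRICS = ("jeffrey", "levenshtein", "overlap")
DECAYS = ("zero", "inverse", "exponential")

# 2 allomorphs x 3 metrics x 3 decay functions = 18 correlations per neighbourhood size.
CONDITIONS = list(itertools.product(ALLOMORPHS, METRICS, DECAYS))


def fisher_z_difference(r1, r2, n1, n2):
    """Fisher r-to-z test of H0: rho1 == rho2 for two independent correlations.

    Returns the z statistic and its two-sided p-value.
    """
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


def extreme_pair_test(correlations, n=125):
    """Test the largest against the smallest correlation for one neighbourhood size."""
    values = list(correlations.values())
    return fisher_z_difference(max(values), min(values), n, n)


# Hypothetical correlations for one neighbourhood size (invented values).
example = {cond: 0.70 + 0.005 * i for i, cond in enumerate(CONDITIONS)}
z, p = extreme_pair_test(example)
print(len(CONDITIONS), round(z, 2), round(p, 3))
```

A non-significant result here only means that no difference was detected; it does not establish that the conditions are equivalent.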
Nevertheless, some variations were interesting and specific to a given measure.
The simplest of the three measures, Hamming (Overlap) distance, gave
somewhat lower correlations, while Jeffrey divergence, although the most
sophisticated measure, did not perform better than Levenshtein distance. However,
with Jeffrey divergence, the difference in fit between the two allomorphic
variants (ending in -om or in -em) was negligible. Levenshtein
distance provided a better match to the human responses for the -em allomorph,
while Hamming distance did exactly the opposite. Finally, larger neighborhoods
penalized Jeffrey divergence the least. This seems to be the single
point where some advantage of a more sophisticated measure was observed. This
finding might be surprising, but it can be explained simply by the fact that Jeffrey
divergence is more fine-grained than the other measures: it expresses distances as
real numbers. Therefore, exemplars are rarely tied at the same distance, whereas the
Overlap and Levenshtein metrics, which express distances as integers,
collapse many exemplars at the same distance. As a result, Jeffrey divergence reaches a
given neighborhood size in absolute terms (the total number of exemplars) at a
much later point than the other similarity metrics, which effectively builds a form of
decay weighting into the measure itself.
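The effect of tied distances on neighborhood size can be illustrated with a small hypothetical example: with integer-valued distances (as produced by Overlap or Levenshtein), the k nearest distances quickly take in many exemplars, whereas with real-valued distances (as produced by Jeffrey divergence) almost every distance is unique. The distance values and the helper name neighbourhood_size are invented for illustration.

```python
def neighbourhood_size(distances, k):
    """Number of exemplars within the k smallest distinct distances (ties included)."""
    cutoff = sorted(set(distances))[:k][-1]
    return sum(d <= cutoff for d in distances)


integer_distances = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4]            # e.g. Overlap/Levenshtein
real_distances = [0.91, 1.02, 1.13, 1.20, 1.87, 2.05, 2.20,   # e.g. Jeffrey divergence
                  3.05, 3.40, 4.10]

for k in (1, 2, 3):
    print(k, neighbourhood_size(integer_distances, k), neighbourhood_size(real_distances, k))
```

For the same k, the integer-valued metric already includes 4, 7 and 9 exemplars, while the real-valued one includes only 1, 2 and 3.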
In order to make comparison of human and simulated behavior even more rigorous
and conservative, we developed a specific statistical procedure which made use of