
Simulating consistency effects and individual differences in nonword naming: A comparison of current models

    Jason D. Zevin a,*, Mark S. Seidenberg b

a Sackler Institute for Developmental Psychobiology, Weill Medical College of Cornell University, 1300 York Ave., Box 140, New York, NY 10021, USA
b University of Wisconsin-Madison, USA

    Received 10 May 2004; revision received 29 August 2005

    Abstract

The mechanisms underlying nonword pronunciation have been a focus of debates over dual-route and connectionist models of reading aloud. The present study examined two aspects of nonword naming: spelling-sound consistency effects and variability in the pronunciations assigned to ambiguous nonwords such as MOUP. Performance of a parallel distributed processing model was assessed over multiple runs, representing multiple subjects with varying reading experience. The model provided a good account of behavioral data concerning these phenomena. In contrast, the Dual Route Cascaded model does not produce consistency effects and does not account for the alternative pronunciations that subjects produce. The results highlight the importance of considering multiple aspects of a phenomenon such as nonword naming in assessing computational models.
© 2005 Elsevier Inc. All rights reserved.

    Keywords: Reading; Statistical learning; Computational models; Nonwords; Individual differences

Word and nonword reading are among the most extensively studied areas in cognitive science and neuroscience (see Posner, Abdullaev, McCandliss, & Sereno, 1999; Rayner, Foorman, Perfetti, Pesetsky, & Seidenberg, 2001 for overviews). Although several word reading models have been proposed, considerable attention has focused on the contrast between dual-route (Coltheart, Curtis, Atkins, & Haller, 1993; Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001) and connectionist (see Harm & Seidenberg, 1999, 2004; Plaut, McClelland, Seidenberg, & Patterson, 1996; Seidenberg & McClelland, 1989 for discussion) approaches, both of which have evolved over many years. In the present article we consider how recent versions of these models fare with respect to the task of reading nonwords aloud, a task that has long been used to assess their adequacy (Besner, Twilley, McCann, & Seergobin, 1990; Seidenberg, Plaut, Petersen, McClelland, & McRae, 1994).

The dual-route model of reading aloud (e.g., Coltheart et al., 2001) holds that pronouncing letter strings (words and nonwords) involves a lexical route consisting of knowledge of individual words, and a nonlexical route consisting of rules for translating spellings to sounds. Words whose pronunciations violate the rules (exceptions such as PINT) can only be pronounced correctly via the lexical route. Nonwords (such as NINT) can only be pronounced using the rules. The central dogma of the dual-route approach (Seidenberg,

0749-596X/$ - see front matter © 2005 Elsevier Inc. All rights reserved.

    doi:10.1016/j.jml.2005.08.002

* Corresponding author. E-mail address: [email protected] (J.D. Zevin).

Journal of Memory and Language 54 (2006) 145–160

    www.elsevier.com/locate/jml


1995) is that the two routes are required to account for the ability to read these differing types of stimuli.1

In the Dual Route Cascaded (DRC) model (Coltheart et al., 2001), the lexical route is construed as an associative network with nodes corresponding to words; it is sensitive to lexical statistics (e.g., the frequencies of words, their orthographic and phonological similarity to one another). The sublexical route operates categorically and deterministically, applying rules that state the valid correspondences between spelling and sound but abstract away from statistical properties such as how often the rules are used across words. The dual-route framework is representative of a general approach to the study of language and other phenomena which holds that distinct mechanisms are involved in the abstraction of categorical, symbolic rules versus the memorization of arbitrary facts (e.g., Pinker, 1999).

In the connectionist (or triangle) framework, both words and nonwords are pronounced using the same network of weighted connections among units in a parallel distributed processing (PDP) architecture (e.g., Seidenberg & McClelland, 1989). Such models do not incorporate the distinction between a lexical level containing memorized forms of words and a set of rules for decoding novel words. This approach is representative of a broader theoretical stance which asserts the primacy of statistical learning in the acquisition and use of language and other types of knowledge (Kirkham, Slemmer, & Johnson, 2002; Saffran, Newport, Aslin, & Tunick, 1997; Seidenberg, 1997). Thus, the differences between the dual-route and connectionist approaches to spelling-to-sound decoding instantiate contrasting views about the characterization of lexical and other types of knowledge.

    Regularity and consistency effects in word reading

Several recent studies have attempted to adjudicate between the competing models by considering the effects of two structural properties of words: regularity and consistency. The concept of regularity is central to the dual-route approach: regular words are ones whose pronunciations are correctly specified by spelling-sound rules. Many words are regular (e.g., MUST and PAVE), as are all nonwords (e.g., NUST and MAVE, which according to this theory can only be pronounced by rule). Typically the rules involve mappings between graphemes and phonemes, but other types of rules are sometimes proposed (see Coltheart et al., 2001, who included multigrapheme and context-sensitive rules). The critical property of regularity is that it is a categorical concept: a word's pronunciation is either correctly specified by the pronunciation rules or not. Words whose pronunciations violate the rules produce regularity effects: longer latencies and/or more errors than for rule-governed words. These arise because the lexical and nonlexical routes produce conflicting pronunciations for exception words (e.g., the lexical route yields the correct pronunciation of PINT and the nonlexical route, the regularization /pInt/).

Consistency, in contrast, is a statistical concept central to the triangle approach. The degree of consistency in the mapping between spelling and sound varies continuously. Consistency defined in terms of rimes, sometimes also called word bodies, has been examined most thoroughly in previous research (e.g., Jared, McRae, & Seidenberg, 1990) because this unit happens to be salient given the structure of English monosyllables (see Seidenberg & McClelland, 1989; Treiman, Mullennix, Bijelac-Babic, & Richmond-Welty, 1995 for discussion). For example, the rime -UST is highly consistent because it is always pronounced /Vst/ in monosyllabic words. A rime such as -AVE is inconsistent because it is usually pronounced as in SAVE, PAVE, and GAVE but differently in HAVE. Other factors being equal, words with less consistent spelling-sound correspondences will be more difficult to read aloud than words with more consistent correspondences. Connectionist models have been highly successful at accounting for the effects of differing degrees of consistency observed in many studies (Cortese & Simpson, 2000; Jared et al., 1990; Jared, 1997).
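The graded notion of consistency described above can be made concrete with a small sketch. The following is illustrative only, not code from the paper: it treats a rime's consistency as the proportion of monosyllables sharing that spelling body that also take its most common pronunciation. The toy lexicon and pronunciation codes are hypothetical.

```python
# Illustrative sketch (not from the paper): rime consistency as the
# proportion of words sharing a spelling body that take its dominant
# pronunciation. Lexicon and pronunciation codes are toy data.
from collections import Counter

# toy lexicon: word -> pronunciation of its body
lexicon = {
    "must": "Vst", "just": "Vst", "dust": "Vst",                 # -UST: consistent
    "save": "ev", "pave": "ev", "gave": "ev", "have": "av",      # -AVE: HAVE is an enemy
}

def rime_consistency(body_pronunciations):
    """Fraction of words taking the most common pronunciation of a body."""
    counts = Counter(body_pronunciations)
    return counts.most_common(1)[0][1] / sum(counts.values())

ust = [p for w, p in lexicon.items() if w.endswith("ust")]
ave = [p for w, p in lexicon.items() if w.endswith("ave")]
print(rime_consistency(ust))  # 1.0  (-UST is always /Vst/)
print(rime_consistency(ave))  # 0.75 (HAVE is the lone exception)
```

On this measure consistency varies continuously between 0 and 1, whereas regularity is a binary property of each word, which is the contrast the text draws.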

It is important to note that although rimes exert the greatest influence on naming latencies (and thus have been the primary focus of interest), there are secondary statistical phenomena involving other units, ranging from graphemes and onset-nucleus units to entire words (as in homographs such as WIND). Rimes happen to be most prominent, and the effects of inconsistencies over rimes are of a magnitude that can be readily observed in simple behavioral studies. Effects of other units are weaker but can be picked up in careful experiments (e.g., Treiman, Kessler, & Bick, 2003). We emphasize the point that inconsistency occurs over multiple orthographic grain sizes because it is often obscured in factorial experimental designs. For example, in studies that directly compare consistency and regularity effects

1 As Harm and Seidenberg (2004) noted, the term dual-route model is ambiguous. Sometimes it refers to visual and phonologically mediated processes in the access of meaning, and other times to lexical and nonlexical processes for generating pronunciations (see also Coltheart, 2000). The claim that reading may involve both visual and phonologically mediated processes is not specific to any one theory; it reflects the fact that in alphabetic writing systems letter strings can be associated with both meanings and pronunciations. However, the claim that two mechanisms are required to pronounce letter strings is specific to the dual-route model proposed by Coltheart and colleagues. We use the term dual-route model of reading aloud in reference to models incorporating this claim.


(e.g., Cortese & Simpson, 2000; Jared, 2002), consistency is typically operationalized at the rime level and regularity at the grapheme level. In fact, inconsistencies exist at multiple levels of orthographic structure, and rules are sometimes defined in terms of units other than graphemes. The important difference between the concepts is that consistency is statistical, whereas regularity is categorical. A statistical learning model such as Seidenberg and McClelland (1989) or Harm and Seidenberg (1999) will pick up on spelling-sound consistencies over many units, to the extent that they occur in the corpus of training examples, constrained by properties of network architecture such as the input (orthographic) and output (phonological) representations, and number of hidden units. In dual-route models, pronunciations are either rule-governed or not, and the procedure by which rules are applied does not take into account statistical properties such as how often they apply across words.

In the triangle framework, regularity effects arise from spelling-sound inconsistencies. Consider, for example, the word ROLL, which is treated as an exception in the DRC model (Coltheart et al., 2001) because the rule governing the grapheme O is associated with the pronunciation that occurs in DOT, TOP, ROCK, and many other words. Hence the DRC model predicts that the exception ROLL should be more difficult than a word such as ROCK that obeys the rules. The triangle model makes a similar prediction but for different reasons. ROLL is more difficult than ROCK because ROLL exhibits spelling-sound inconsistencies at several levels. At the rime level it is consistent with TROLL, POLL, and TOLL but inconsistent with DOLL and MOLL. At the grapheme level, the O in ROLL is consistent with words such as POST and MOLD but inconsistent with every word in which O is pronounced differently (e.g., ROT, ROB, SOT, etc.). The weights governing the orthography-phonology computation encode the statistics of the mapping between spelling and sound. Hence ROLL will be more difficult than ROCK because it exhibits greater spelling-sound inconsistency.

Thus, both theories can in principle account for regularity effects. However, they differ with respect to words that are rule-governed (according to DRC) but inconsistent (according to the triangle model). Such words were designated "regular but inconsistent" by Glushko (1979). PAVE, for example, is rule-governed according to DRC, but inconsistent according to the triangle model because of the irregularly pronounced neighbor HAVE. DRC predicts that PAVE should be as easy to pronounce as PANE, which is rule-governed but also highly consistent because it has no close irregular neighbors. In contrast, the triangle model predicts that the two types of rule-governed words should differ: inconsistent words such as PAVE should be more difficult than consistent words such as PANE, a finding that has been observed in numerous studies (see Jared et al., 1990, who summarized the results of more than a dozen experiments).

Consistency effects have been taken as strong evidence against the dual-route theory. However, according to Coltheart et al. (2001), the consistency effects observed in previous studies were due to confounding factors: the presence of words that DRC treats as exceptions among the inconsistent words, and left-to-right misanalyses of words ("whammies") that they asserted occur more frequently in inconsistent words. They applied this analysis to a single study in the literature (Jared, 1997). However, Jared's data yield a consistency effect even with these factors taken into account (i.e., if the exception and whammy items are removed from the data set). Moreover, the two factors do not account for consistency effects in other studies. For example, both Cortese and Simpson (2000) and Jared (2002) conducted naming studies that explicitly compared the regularity and consistency factors, and examined the performance of the Plaut et al. (1996) and Coltheart et al. (2001) models on their stimuli. Whereas human subjects and the Plaut et al. (1996) model produced large consistency and small (for the Cortese & Simpson stimuli) or null (for the Jared stimuli) regularity effects, the Coltheart et al. (2001) model did the opposite, producing very large regularity effects and null (or reversed) consistency effects.

In summary, consistency effects are critically important because the triangle model predicts they should occur whereas the DRC model predicts they should not, in the absence of confounding factors. Existing data indicate that such effects occur for words, presenting a substantial problem for the dual-route approach, including DRC 2001, which does not produce them.

    Consistency and variability in nonword naming

Although consistency effects for words strongly favor the triangle approach, it is also important to consider whether similar effects occur for nonwords. In dual-route models, all known words can be pronounced via the lexical mechanism; the grapheme-phoneme correspondence rules that constitute the second route are mainly relevant to generalization (i.e., nonword pronunciation). Because of the serial nature of rule application in the 2001 DRC model and the settings of other parameters, the nonlexical route operates so slowly as to have little effect on the pronunciation of words. In contrast, connectionist models make the strong claim that generalization arises from passing novel items through the same network that encodes knowledge of words. Hence, whether such networks can generate correct nonword pronunciations is important. In fact there has been some


question as to whether they can. Seidenberg and McClelland (1989) initially emphasized the fact that their network produced correct output for both rule-governed words and exceptions. It was later noted that the model's performance on nonwords was less accurate than people's (Besner et al., 1990). The Seidenberg and McClelland model correctly generalized to simple nonwords such as NUST, illustrating generalization without rules; however, it made errors on more difficult nonwords such as JINJE. Plaut et al. (1996) and Seidenberg et al. (1994) subsequently showed that with improved phonological representations such models could also pronounce nonwords at levels comparable to human performance and slightly better than the 1993 version of DRC (Coltheart et al., 1993). However, more recent work discussed below (e.g., Treiman et al., 2003) raised further questions about both types of models' capacities to account for nonword performance.

The present research focused on two aspects of nonword naming. The first is nonword consistency effects. We know that performance on a rule-governed word such as PAVE is affected by an inconsistent neighbor such as HAVE. The same effect has been reported for nonwords: MAVE is also harder to pronounce than NUST (Glushko, 1979). As with words, the DRC model predicts that such effects should not occur; nonwords are pronounced by the rule component with little if any input from the lexical route. Nonword consistency effects also seem more compatible with the PDP approach; the effects arise from the same source as for words, shaping of the weights by exposure to words with conflicting spelling-sound correspondences. Thus weight settings that produce an inconsistency effect for a word such as PAVE also produce one for a nonword such as MAVE. In the dual-route approach, both MAVE and NUST are pronounced using nonlexical pronunciation rules and hence should behave alike. Consistency effects for nonwords provide a particularly strong basis for deciding between the theories, insofar as they challenge the claim that generalization must be achieved by rule, which is the primary motivation for including rules in the DRC model.

The second issue is variability in people's pronunciations of nonwords. In previous work, the performance of both dual-route and connectionist models was assessed with respect to whether they produced plausible nonword pronunciations. Judged by this criterion, both types of models do approximately equally well. Many nonwords, however, are pronounced differently by different subjects (Andrews & Scarratt, 1998; Seidenberg et al., 1994). Variability in pronunciations is a fact about performance that models of word and nonword pronunciation need to explain. One potential explanation is that people have somewhat different representations of spelling-sound knowledge because their reading experience differs (e.g., with respect to type and amount of reading). Individual differences of this sort are easily accommodated by an approach in which probabilistic spelling-sound mappings are acquired via a learning mechanism that is sensitive to statistical properties of the words to which the reader (or model) is exposed. Accounting for this variability provides an additional, more stringent criterion for evaluating models. One can consider not merely whether the model produces a plausible pronunciation (e.g., one of the pronunciations produced by people) but whether it produces the alternative pronunciations that people produce.

In principle, variability in experience could lead individuals to formulate different pronunciation rules in a DRC-type model. This proposal is difficult to evaluate because there is no current proposal about how a set of rules is acquired in the DRC framework, or how different readers could acquire different rules. An early version of the model (Coltheart et al., 1993) utilized a rule-learning algorithm, but this was later discarded because it lacked psychological plausibility (e.g., in the way it searched the space of possible rules) and because the rules it generated did not work sufficiently well (Seidenberg, Petersen, MacDonald, & Plaut, 1996). Norris (1994) also used an algorithmic procedure to learn pronunciation rules, but the rules consisted of weights on connections among units representing letters, graphemes, and larger units, trained using a connectionist learning procedure (the delta rule). This notion of rule deviates considerably from Coltheart et al.'s: In the Norris model, the mappings are applied probabilistically, and they are based on sublexical statistics over multiple levels of representation that are built into the architecture of the model. Thus they are much more similar to the kinds of probabilistic constraints learned by PDP models.

Within the connectionist approach, individual differences have tended to be ignored because they require multiple runs of a model, something that until recently was too computationally time consuming to be feasible. The research described below is the first attempt to model behavioral data concerning variability in nonword pronunciation. Individual differences among readers could be due to several factors that can be simulated within the triangle framework, including constitutional factors related to learning efficiency or capacity; differences in the level of detail with which phonological information is represented; and different amounts or types of reading experience (Harm & Seidenberg, 1999). Although other factors may eventually prove to be involved, we began by determining how well differences in experience alone could account for existing data. Thus we examined whether individual differences in the pronunciations of nonwords arise from minor differences in the sample of words to which individuals are exposed. We trained a single model multiple times, using


the probabilistic, frequency-weighted sampling procedure used in earlier models. This procedure results in each model being exposed to a different set of words. We predicted that this seemingly minor manipulation would give rise to different pronunciations for some nonwords, as in people. Specifically, we expected the sampling differences to have little impact on nonwords containing common, high-frequency spelling-sound correspondences, for which people agree on a single pronunciation; however, they would be expected to affect the pronunciations of inconsistent nonwords such as MOUP, for which people provide different pronunciations.

We assessed performance of two models, the DRC (Coltheart et al., 2001) and multiple runs of a version of the Harm and Seidenberg (1999) model, with respect to the simulation of three studies of nonword reading: Glushko (1979), Andrews and Scarratt (1998), and Treiman et al. (2003). These studies provide different types of evidence about nonword reading, and included stimuli that were suitable for testing the models. Three kinds of data were examined: response latency from the Glushko (1979) and Andrews and Scarratt (1998) studies; effects of spelling-sound consistency on the pronunciations of ambiguous nonwords (Andrews & Scarratt, 1998; Treiman et al., 2003); and individual differences in nonword pronunciation related to spelling-sound consistency (Andrews & Scarratt, 1998).

To anticipate the results, the modeling provides two serious strikes against the dual-route approach. First, DRC does not produce the consistency effects that were observed in behavioral studies conducted with different materials in different labs. The effects are real and they cannot be explained by the alternative factors invoked by Coltheart et al. (2001). Second, DRC does not account for individual differences in the pronunciations assigned to nonwords; the pronunciation rules generate one pronunciation for each letter pattern. Although the connectionist modeling does not capture all aspects of the variability observed across individuals, it clearly establishes the feasibility of the approach.

    Methods

    PDP model architecture

For these simulations we used a slightly modified version of the Harm and Seidenberg (1999) model. The model had 133 orthographic units, 200 phonological units and 100 hidden units. Twenty cleanup units mediated connections from each unit of the phonological layer to itself and every other unit on the layer. The phonological representation consisted of 8 slots and 25 phonological features for each slot (see also Harm, 1998; Harm & Seidenberg, 2004). Each phoneme was coded as a binary vector, with each "on" bit representing the presence of a phonological feature. This representation differs somewhat from the Harm and Seidenberg (1999) model. However, our experience with different output phonological representations is that the precise choice of coding scheme has little effect, as long as the representation effectively encodes the similarity space of the phonetic inventory. This is true even when, as in Keidel, Zevin, Kluender, and Seidenberg (2003), the model directly addresses how such representations are learned from acoustic input.
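The slot-based phonological code described above (8 phoneme slots of 25 binary features each, giving the 200 phonological units) can be sketched as follows. This is a hypothetical reconstruction for illustration: the feature names and phoneme-to-feature assignments are invented, not the inventory actually used in the model.

```python
# Sketch of the slot-based phonological representation: 8 phoneme slots
# x 25 binary features = 200 output units. Feature names and phoneme
# assignments below are hypothetical, not the model's actual inventory.
import numpy as np

N_SLOTS, N_FEATURES = 8, 25
FEATURES = ["consonantal", "voiced", "nasal", "vowel", "high"]  # invented subset

# hypothetical feature sets for a few phonemes; "_" marks an empty slot
PHONEMES = {
    "m": {"consonantal", "voiced", "nasal"},
    "V": {"vowel", "voiced"},       # the vowel of MUST
    "s": {"consonantal"},
    "t": {"consonantal"},
    "_": set(),
}

def encode(phoneme_string):
    """Return an 8 x 25 binary matrix; each row is one slot's feature vector."""
    pattern = np.zeros((N_SLOTS, N_FEATURES), dtype=int)
    for slot, ph in enumerate(phoneme_string):
        for feat in PHONEMES[ph]:
            pattern[slot, FEATURES.index(feat)] = 1
    return pattern

mVst = encode("mVst____")  # /mVst/ padded with empty slots to length 8
print(mVst.shape)          # (8, 25)
```

Because each "on" bit marks a shared feature, phonemes that sound alike get overlapping vectors, which is what the text means by the representation encoding the similarity space of the phonetic inventory.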

The pronunciations on which the model was trained were based on the dialect of American English prevalent in Southern California. The most distinctive aspect of this dialect vis-à-vis other versions of American English is the lack of a contrast between /a/ and /ɔ/ (as in doll and ball, respectively). The behavioral data sets include a range of dialects (including Australian and Midwestern English) which creates minor discrepancies between model and human data. A nonword such as YALD, for example, is inconsistent in the Midwestern dialect spoken by the subjects in the Treiman et al. (2003) study, but not in the model's dialect. On our view, the noise that these items add to the assessment of the model was outweighed by the advantage of simulating data from a range of studies using a single architecture and training set. Clearly, it would be possible to train models in different accents and achieve better fits to particular data sets.

    Model training and testing

A list of 5870 monosyllabic words was used as the training set in all simulations. During training, the probability of using any word on a given trial was proportional to the square root of its frequency (taken from Marcus, Santorini, & Marcinkiewicz, 1993), with raw frequencies capped at 10,000. This transformation and capping ensured that low-frequency words would be selected a reasonable number of times out of the 1,000,000 training trials used for each run. For example, an item with a nominal frequency of one per million occurs 40 times on average during the training phase in the current simulations. This gives such items a chance of being acquired within a reasonable amount of training time. Similar results would obtain without this transformation, but at the cost of multiplying the amount of computational time per simulation run to an impractical degree. This is a major consideration in the present study, which involves multiple runs of the model.
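The sampling procedure just described can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the corpus counts are invented.

```python
# Sketch of frequency-weighted sampling as described above: each training
# trial draws a word with probability proportional to the square root of
# its capped corpus frequency. The counts here are invented toy data.
import math
import random

corpus = {"the": 69971, "pint": 106, "veldt": 1}  # hypothetical raw counts
CAP = 10_000

words = list(corpus)
weights = [math.sqrt(min(f, CAP)) for f in corpus.values()]

def sample_training_word(rng=random):
    """Draw one word for a training trial, frequency-weighted."""
    return rng.choices(words, weights=weights, k=1)[0]

# The cap plus square root compresses the frequency range: "the" is
# ~70,000x more frequent than "veldt" in the raw counts but is sampled
# only 100x as often, so rare words still get trained.
print(weights[0] / weights[2])  # 100.0
```

The compression is the point: without it, matching the raw frequency distribution would require far more trials for a low-frequency item to be sampled often enough to be learned, which is the computational cost the text mentions.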

All runs of the model started with the same set of initial weights set to small, random values. We kept the initial weights constant in order to be able to isolate effects of training corpus variability. On each


training trial, the orthographic units were set to the values for a given target for 10 time ticks. After 12 ticks, the pattern on the phonological layer was compared to the desired output and an error signal was propagated back through the network using a variant of the continuous recurrent backpropagation algorithm. The learning rate was .01.

The model was tested by setting the orthographic values to represent a nonword for 10 ticks and observing the output on the phonological layer two ticks after the orthographic input was removed. A winner-take-all scoring system was used to determine the model's output: for each slot on the output layer, we determined which phoneme was closest to the pattern on the output at the final time tick and reported this as the model's pronunciation. Pronunciations were then scored as "regular" (according to the rules defined by Andrews & Scarratt, 1998) or "critical" (according to the definition in Treiman et al., 2003). All items from the Andrews and Scarratt (1998) study were included, although a small number included spellings or spelling-to-sound mappings not common in American English. In contrast, one condition (Case 2, onset and body) had to be eliminated from simulations of the Treiman et al. (2003) study because it depended on a contrast between the vowels /a/ and /ɔ/ not present in the model's dialect.

Because these simulations were designed to examine variability in the pronunciations generated to ambiguous nonwords, we did not designate a single pronunciation as the correct one a priori. The model's output was only scored as an error if one or more graphemes were assigned pronunciations that did not occur in any word in the lexicon. Response latencies for nonerror responses were simulated in the model using a settling time measure, which is the time (in processing cycles) required for the model to arrive at its final response. Settling times were determined with respect to the pronunciation generated for each item on each run. Settling time is not an ideal proxy for naming latency, which reflects the time to initiate a response, rather than the time required to fully specify a pronunciation. However, settling times do roughly reflect relative differences in pronunciation difficulty across items and are sufficient for many purposes (e.g., simulating data averaged across items of a given type). All data presented below are means from the 10 runs of the model, except for the H statistics (defined below) for which fourteen additional runs were included to make the analyses more compatible with the human data.
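The winner-take-all scoring rule described above (for each output slot, report the phoneme whose feature vector lies closest to the network's output pattern at the final tick) can be sketched as follows. The three-feature vectors here are toy data, not the model's 25-feature code.

```python
# Sketch of winner-take-all scoring: for each output slot, report the
# phoneme whose feature vector is nearest (Euclidean distance) to the
# output pattern at the final tick. Vectors below are invented toy data.
import numpy as np

# hypothetical 3-feature vectors for two vowels
PHONEME_VECTORS = {
    "i": np.array([1.0, 0.0, 1.0]),
    "e": np.array([0.0, 1.0, 1.0]),
}

def closest_phoneme(output_pattern):
    """Return the phoneme whose vector minimizes distance to the output."""
    return min(PHONEME_VECTORS,
               key=lambda p: np.linalg.norm(PHONEME_VECTORS[p] - output_pattern))

print(closest_phoneme(np.array([0.9, 0.1, 1.0])))  # i
print(closest_phoneme(np.array([0.2, 0.8, 0.9])))  # e
```

Note that an output lying near the midpoint between two phoneme vectors is still forced to one winner, which is relevant to the vowel-blend errors discussed in the Results.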

Items from the experiments were also submitted to the DRC model using the standard parameter set. The model (downloaded from http://www.maccs.mq.edu.au/~max/DRC/) was presented with the items in batch naming mode. Response latencies were recorded in cycles, and the model's pronunciations were scored as above.

    Results

    Overall performance of the PDP model

After one million training trials, the PDP model correctly pronounced an average of 95% of the training set (5579/5870 words). Errors tended to be so-called "strange" words (e.g., ACHE, CHUTE, GAUCHE, VELDT). Because these items are highly unusual both in their spelling patterns and spelling-to-sound mappings, they depend on semantic knowledge to be pronounced correctly (see Harm & Seidenberg, 2004; Plaut et al., 1996; Strain, Patterson, & Seidenberg, 1995 for a discussion of the role of semantics in reading strange words). Another difficulty is the slot problem (Plaut et al., 1996): the model's orthographic input consists of vowel-centered slots which can take on a different value for each letter that occurs at that location. This means that what the model knows about, e.g., the letter T in VELDT does not overlap with its representation of the same letter in more common positions (i.e., first or second position after the vowel). This also creates difficulties for some nonwords. However, the fact that the model's performance differs from people's as little as it does suggests that although the slot-based representation needs to be replaced with a more realistic representation, it does not greatly interfere with learning spelling-sound correspondences.

The error rate was 5% for the Glushko nonwords, 5% for the Andrews and Scarratt nonwords and 10% for the items in the Treiman et al. (2003) study. These error rates are somewhat larger than the means reported for the subjects in the behavioral studies. Errors tended to be vowel blends. For example, the grapheme EA is most frequently pronounced /i/ or /e/. Because the phonological features of vowels are encoded continuously in the model, it will occasionally settle on the midpoint between /i/ and /e/, which happens to be /I/, yielding, for example, /bIlm/ for BEALM. These responses were scored as incorrect, which may be a more stringent criterion than employed in scoring the human data, given expectancy effects in speech perception (e.g., Ganong, 1980), and the fact that the researcher scoring the responses is expecting one of a limited number of pronunciations.

Settling time data were right-skewed in a manner similar to RT data. Specifically, a small number of observations occurred during the last two time ticks, when the output targets were typically provided. Data were trimmed by excluding trials with settling times at these last two time ticks. This resulted in discarding less than 1% of the data for both the Glushko and the Andrews and Scarratt data.

150 J.D. Zevin, M.S. Seidenberg / Journal of Memory and Language 54 (2006) 145–160


    Consistency effects on response latency

Fig. 1 presents data from the Glushko (1979) study and the current simulation. The consistency effect was highly reliable in the human data, and also in the simulation, F(1,78) = 18.59, p < .001. The item-wise correlation between settling times and naming latencies was also significant, r = .22, t(84) = 2.09, p < .05. The DRC model did not replicate the consistency effect, and latencies measured in DRC cycles did not correlate significantly with human latencies, r = .15, t(83) = 1.40 (see Footnote 2).

Coltheart et al. (2001) claimed that consistency effects for words arise from two confounding factors. One, the presence of exception words among the inconsistent stimuli, is irrelevant here because the stimuli are nonwords. The other, "whammy" effects due to the serial application of rules, could apply to nonwords. Thus, if the behavioral effect were due to more whammies in the inconsistent nonwords, DRC would produce the effect. However, it does not, indicating that the behavioral effect is not due to this factor.

Andrews and Scarratt (1998) presented a detailed study of factors that affect nonword pronunciation and data about variability across subjects, using stimuli whose neighborhood properties varied. Four types of stimuli were used (Table 1): Regular, consistent body

Fig. 1. Response latency data for Glushko's (1979) consistent and inconsistent nonwords: (A) subjects' response latencies in milliseconds; (B) mean settling time from PDP simulations reported in this paper; and (C) response latencies (in cycles) from the DRC.

2 One of Glushko's items, HOVE, was removed from the analysis because it was in DRC's lexicon and had a response latency more than 3 standard deviations faster than the mean for the remaining items. Including this outlier does not improve the correlation between human latencies and DRC data.

Table 1
Categories of nonwords in the Andrews and Scarratt (1998) study

Stimulus type   Example   Regular pronunciation   Analogy pronunciation
RCB             TUNK      /tʌŋk/                  /tʌŋk/
RIB             SULL      /sʌl/                   /sʊl/
NRAU            VONTH     /vɒnθ/                  /vʌnθ/
NRAM            WALF      /wælf/                  /wɑf/

Note. Abbreviations for item types are described further in the text: RCB, regular, consistent body; RIB, regular, inconsistent body; NRAU, no regular analogy, unique body; NRAM, no regular analogy, many neighbors body. Regular pronunciations were determined according to the rules in Andrews and Scarratt (1998); analogies represent an alternative pronunciation based on the most frequent pronunciation of each body (TRUNK, FULL, MONTH, and HALF, respectively).



(RCB) nonwords contain bodies that are assigned a single, regular pronunciation in all words in which they occur; regular, inconsistent body (RIB) nonwords contain bodies that are assigned an irregular pronunciation in many words and the rule-governed pronunciation in many others; no regular analogy, many neighbors (NRAM) nonwords contain bodies that occur in multiple irregular words; no regular analogy, unique (NRAU) nonwords are those whose bodies occur in only one, irregular word.

Although they were designed to test a distinction between rule and analogy mechanisms, the effect of Andrews and Scarratt's subtyping procedure was to create conditions in which consistency varied in a graded manner. The RCB items are the most consistent, because the statistics at the rime level and the grapheme level are consistent with each other. For example, the most frequent pronunciation in the lexicon for U is /ʌ/, and this is the only pronunciation it is assigned in the context of the body -UNK. The RIB items represent a slightly lower degree of consistency, because there is support in the lexicon for both a regular pronunciation (e.g., HULL and GULL for -ULL) and an irregular pronunciation (e.g., FULL and PULL). In the case of the NRAM and NRAU items, statistics at the body level overwhelmingly favor a different pronunciation from the statistics at the grapheme level: by definition, there are no instances in which the most frequent pronunciation for a critical grapheme is assigned in the context of the body in question. Here we assess the models with respect to naming latencies in this study; later we consider data about the types of pronunciations produced.

The latency data shown in Fig. 2 reflect an orderly, graded influence of consistency on response latency. The most consistent items (RCB) were read most quickly. Items that were regular but inconsistent at the word-body level (RIB) were read more slowly. Items with no regular analogy (NRAU and NRAM) are interesting because they exhibit conflicts between statistics at the grapheme and word-body levels. For these items, the number of neighbors had a large effect: items with many neighbors were named significantly more quickly than items with only one. Average settling times for the PDP models exhibited this same difference between the NRAU and NRAM items, F(1,126) = 14.34, p < .001. In the human data, the advantage for RCB over RIB items was marginally significant; the model data also produce this trend, F(1,78) = 2.01. Finally, the difference between NRAM and NRAU items was also significant, F(1,46) = 5.62, p < .05. The item-wise correlation between the model and human data was significant, r = .28, t(126) = 3.22, p < .005.

Fig. 2. Response latency data for Andrews and Scarratt's (1998) items: (A) subjects' response latencies in milliseconds; (B) settling time from PDP simulations reported in this paper; and (C) response latency data from the DRC in cycles.



The DRC model captures the overall advantage of items with predominantly regular neighbors over items with no regular neighbors, F(1,126) = 11.85, p < .001, as well as the difference between NRAM and NRAU items, F(1,46) = 7.30, p < .01. The item-wise correlation between the DRC and human data was significant, r = .36, t(122) = 4.27, p < .001 (see Footnote 3). The only major discrepancy between the DRC model and the human data is in regard to the difference between the RCB and RIB conditions, which is marginally significant in the human data but nonsignificant and numerically reversed in the DRC model.

To summarize, both models provide a good account of the latency data from the Andrews and Scarratt (1998) study. In both the human and PDP data, there is a trend toward a consistency effect for nonwords with regular word bodies, which DRC does not produce. Taken with the results for the Glushko (1979) study, these findings suggest that the PDP model is more sensitive to consistency effects on nonword latencies.

    Consistency effects on pronunciation regularity

We now consider the effects of consistency on how nonwords are pronounced. An item such as SULL, which has both regular (HULL and DULL) and irregular (PULL and FULL) neighbors, can be pronounced to rhyme with either of these alternatives. Two studies have examined the degree to which skilled readers are sensitive to lexical statistics at this level when assigning pronunciations to nonwords.

    Andrews and Scarratt (1998)

Fig. 3A shows the percentage of regular pronunciations produced by subjects in the Andrews and Scarratt (1998) study. Regularity was defined in terms of Andrews and Scarratt's rules. As in the latency data, the effects are graded. The RCB and RIB conditions produced the highest percentages of regular pronunciations, with NRAU producing a much lower percentage and the NRAM condition the fewest. Statistically, the difference between the RCB and RIB conditions was marginal, whereas the RIB-NRAU and NRAU-NRAM differences were significant. The PDP model (Fig. 3B) produced the same ordering of conditions and the same pattern of significant differences between conditions (t(64) = 7.95, p < .01 for RIB-NRAU and t(23) = 7.74, p < .01 for NRAU-NRAM), except that the RCB-RIB difference was larger than for humans, t(78) = 7.82, p < .01.

The results from the DRC model are different. First, the model is at ceiling (100% regular pronunciations) in the RCB and RIB conditions, which was not observed in human subjects or the PDP model. Second, DRC does not reproduce the significant difference between the NRAU and NRAM conditions that is present in both the human data and the PDP model. Numerically, the difference is in the wrong direction, although it is not statistically reliable. DRC produces a significant main effect of regular body vs. no regular analogy body items (RCB/RIB vs. NRAU/NRAM) rather than the graded effects in the human and PDP data. These results implicate the same problem as the latency simulations: DRC's rule set does not capture people's sensitivity to degrees of spelling-sound consistency.

In Table 2, stimuli are grouped by the proportion of regular responses assigned by subjects in Andrews and Scarratt (1998). The data from the PDP simulation show a pattern similar to the human data, although the model overestimates the proportion of regular pronunciations throughout, particularly in the 20–40 bin. The DRC only produces a large proportion of "irregular" responses in one bin. It is important to note here that the irregular pronunciations produced by the DRC are the result of a discrepancy between how the two rule sets are encoded, and not the result of the involvement of analogical or lexical processing. The rule set adopted by Andrews and Scarratt (1998) represents a minimalist approach, wherein only very small units (graphemes) are coded. In their scheme (and others in a similar vein; see, e.g., Venezky, 1970), each grapheme has a rule associated with it, and pronunciations are considered regular if all of the rules apply. By not including context-sensitive rules and multigrapheme rules, this scheme avoids a number of issues that arise for more complex rule sets. For example, if both single- and multigrapheme rules are allowed, a mechanism to adjudicate between them is required. However, restricting the rule set to rules encoding single graphemes necessarily ignores meaningful regularities at larger grain sizes. This is clear from the small number of regular pronunciations produced by human subjects in the NRAU and NRAM conditions.

The rule set adopted in the DRC approach does not have the same restrictions as the Andrews and Scarratt (1998) rules. It contains both context-sensitive and multigrapheme rules and explicitly defines a mechanism by which rules operating at larger unit sizes (e.g., the word body) can override rules at smaller unit sizes. As shown in Fig. 3 and Table 2, these rule sets make slightly different claims about what should count as a regular pronunciation. In particular, the word-body level rules in the DRC cause it to produce irregular pronunciations for 16 of the 24 items that most often elicited irregular pronunciations from human subjects; for all other items, the DRC agrees strongly with the Andrews and Scarratt (1998) rules.

3 Three items, LANG, RATCH, and TOPE, were present in the DRC's vocabulary and were removed from the analyses because their RTs were nearly 4 standard deviations faster than the remaining items.



    Treiman et al. (2003)

Treiman et al. (2003) examined the pronunciation of specific vowel graphemes in specific onset or coda contexts. Their items were developed on the basis of statistical analyses of a large corpus of English monosyllables (Kessler & Treiman, 2001). Examples of the items are shown in Table 3. They obtained naming data for these items in order to examine the effects of different contexts on the pronunciations of vowels. For each vowel, a critical pronunciation was chosen. Critical pronunciations were defined as involving relatively infrequent grapheme-to-phoneme correspondences that are highly conditioned by context. One dependent measure in their study was the difference in the proportion of critical pronunciations for experimental items (which contain onsets or codas that occur in words with the critical pronunciation) and control items (which contain neutral onsets or codas). For example, the probability of pronouncing the A in nonwords with the body -ANGE as /eɪ/ was compared to the probability of producing the same vowel in items with the body -ANCE. Treiman et al. (2003) found that connectionist models (Plaut et al., 1996; Harm & Seidenberg, 2004) captured the broad pattern of the data in showing a strong influence of the critical context on vowel pronunciation; however, the fit at the smaller grain size of predicting particular pronunciations for particular items "was not impressive" (p. 67).
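Treiman et al.'s dependent measure is simply a difference between two proportions. A minimal sketch with invented data (the `critical_effect` helper and the response lists are illustrative, not their stimuli or results):

```python
def critical_effect(experimental, control, critical):
    """Difference in the proportion of critical pronunciations between
    experimental items (supportive context) and control items (neutral)."""
    prop = lambda responses: sum(r == critical for r in responses) / len(responses)
    return prop(experimental) - prop(control)

# Invented vowel responses for -ANGE (experimental) and -ANCE (control) items.
ange = ["eɪ", "eɪ", "eɪ", "æ"]   # A often given the critical /eɪ/ reading
ance = ["æ", "æ", "æ", "eɪ"]     # mostly the default /æ/ reading

print(critical_effect(ange, ance, "eɪ"))  # 0.5
```

A positive difference indicates that the conditioning context pulled the vowel toward its critical pronunciation, which is the signature of context-sensitive statistical knowledge.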

Data from 10 runs of the current model are given in Table 3 along with the human data and data from two other models that Treiman et al. (2003) used for comparison. Multiple runs of the present model provide a closer approximation of the human data than the results

Fig. 3. Percentage of regular pronunciations for Andrews and Scarratt's (1998) items: (A) human data; (B) data from PDP simulations reported in this paper; and (C) data from the DRC.

Table 2
Percentage of regular pronunciations generated by humans and the two models

Bin     0–20    20–40    40–60    60–80    80–100
N       24      15       8        8        73

Percent regular
AS98    6.74    31.89    46.28    69.93    96.53
PDP     24.13   55.38    50.31    86.25    90.38
DRC     33.33   100.00   100.00   87.50    95.89

Note. Bins represent ranges of percent regular pronunciations and are organized according to human data from the original Andrews and Scarratt (1998) study; N, number of items in each bin; AS98, Andrews and Scarratt's (1998) subjects; PDP, current parallel distributed processing model; DRC, Dual Route Cascaded model; percent regular, mean percentage of regular pronunciations for nonwords in each bin.



reported by Treiman et al. (2003) in their Experiment 1, which were based on single runs of earlier models. For example, the single run of the Harm and Seidenberg (2004) model tested by Treiman et al. (2003) generated 100% critical responses for the Case 4 and Case 5 items. Among runs of the model tested in the current work, different runs produced different pronunciations for some of these items, leading to lower (and thus more human-like) proportions of critical pronunciations. The only cell in which a large deviation from the human data was observed is the VC4 condition. Interestingly, the word body in this condition was -IND, and the only word in the lexicon in which this body is assigned the regular pronunciation is one sense of the homograph WIND. Homographs were not included in the training set (their pronunciations are normally disambiguated by context, which the current model lacks). Thus the lexical item that would contribute most to producing the critical pronunciation was not included, and so the model "regularized" more than people. Otherwise the simulation and behavioral data are very similar.

    Individual differences

Unlike classical "box-and-arrow" models, which are most often treated as deterministically generating predictions (Coltheart, 1999), PDP models implement a set of principles regarding the acquisition and use of linguistic knowledge that are fundamentally probabilistic (Seidenberg, 1997). In a quasiregular system such as English spelling-to-sound, this means that different runs of the same model can arrive at slightly different encodings of the same mapping, making it quite natural to explain individual variability in the human data.

For example, data from the Andrews and Scarratt (1998) items for 10 runs of the model are depicted in Fig. 4. While in all instances the ordinal pattern was the same, the relative proportion of regular pronunciations differed considerably across runs of the model, particularly for the items with no regular analogy (NRAU/NRAM). This pattern could be interpreted in terms of differences in "reading style" among runs of the model. A number of studies have attempted to identify subgroups of skilled (Baron & Strawson, 1976; Brown, Lupker, & Colombo, 1994) and developing (Goswami & East, 2000) readers who vary in the degree to which they appear to apply grapheme-to-phoneme rules or lexical analogies in their reading of nonwords. The current results suggest that at least some of this variability may be the result of fairly subtle differences in experience, and may be explicable in terms of the encoding of spelling-to-sound statistics rather than different reading strategies.

Table 3
Difference in proportion of critical vowel pronunciations in experimental and control nonwords from the Treiman et al. (2003) study

Context                       CV1      VC1      VC3      VC4      VC5      VC6
Critical pronunciation        /ɑ/      /eɪ/     /ɛ/      /aɪ/     /oʊ/     /ʊ/
Example                       Squant   Crange   Chead    Crind    Prold    Blook

Human data                    0.58     0.55     0.12     0.33     0.83     0.70
Current model                 0.39     0.59     0.32     0.83     0.77     0.68
Harm and Seidenberg (2004)    0.56     0.90     0.40     1.00     1.00     0.80
DRC                           0.00     0.00     0.00     0.00     0.00     0.00

Note. C, consonant; V, vowel; DRC, Dual Route Cascaded model.

Fig. 4. Percentage of regular pronunciations for Andrews and Scarratt's (1998) items from all 10 runs of the PDP model reported in this paper. RCB (white), regular, consistent body; RIB (black), regular, inconsistent body; NRAM (light gray), no regular analogy, many neighbors; NRAU (dark gray), no regular analogy, unique body.

In addition to quantifying the proportion of regular responses given for each type of item, Andrews and Scarratt (1998) also quantified variability among subjects in the pronunciation assigned to particular nonwords. Variability was quantified using the information-theoretic H statistic, a measure of entropy which, in this instance, quantifies the heterogeneity of responses to a given nonword. It is computed using the formula

H = −Σᵢ pᵢ log₂ pᵢ,    (1)

where pᵢ is the probability of a given pronunciation. A value near 0 represents a nonword for which a single pronunciation is highly dominant, whereas higher values represent a nonword for which many pronunciations are equiprobable. Because nonwords generally have only two or three possible pronunciations, the maximal value for H is rarely approached. Furthermore, because values for H are related to the number of observations involved, we ran an additional 14 models in order to

match the number of subjects (24) in the Andrews and Scarratt study. As shown in Fig. 5, the greatest consensus (and thus the lowest H values) was observed in the RCB condition, with incrementally less in the RIB condition. In the human data, the increase in H values between the RIB and NRAM conditions was much larger than that observed in the model.
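The H statistic is straightforward to compute from response counts. A minimal sketch with invented response sets (the 24 simulated "subjects" and their rhymes are illustrative only):

```python
import math
from collections import Counter

def pronunciation_entropy(responses):
    """H = -sum_i p_i * log2(p_i) over the distinct pronunciations
    given for one nonword; 0 when every subject agrees."""
    n = len(responses)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(responses).values())

# Invented responses from 24 simulated subjects for two nonwords.
tunk = ["ʌŋk"] * 24                   # full consensus, RCB-like
sull = ["ʌl"] * 12 + ["ʊl"] * 12      # two equiprobable rhymes

print(pronunciation_entropy(tunk))  # 0.0
print(pronunciation_entropy(sull))  # 1.0
```

Because H depends on the number of distinct responses observed, comparing it across groups requires equating the number of observations, which is why the number of model runs was matched to the number of subjects.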

Similar to the human data, variability in the model was greater for the NRA items than for the items with regular analogies, F(1,126) = 19.82, p < .001. Also like the human data, the difference between RIB and RCB items was not significant, F(1,78) = 1.34. Unlike the human data, however, the difference between NRAU and NRAM items was significant, F(1,46) = 6.73, p < .05. Overall, the patterns of results from the human subjects and from multiple runs of the model are fairly similar, although the model somewhat underestimates the variability in the human data for all categories, particularly for the NRAM condition.

The lower level of overall variability in the model may be the result of the small range in which frequency was actually manipulated, or of the fact that all runs of the model started with the same set of weights. A more serious problem for the model is the large difference in variability between the NRAU and NRAM items, also clearly depicted in Fig. 4. The proportion of regular pronunciations for the NRAM items is tightly clustered around the (very low) mean. This suggests that when there is abundant evidence for a particular body-level mapping between spelling and sound, it tends to override grapheme-level mappings more regularly in the models than in human subjects. There is little evidence that the models generally prefer body-level statistics to grapheme-level statistics: the model is as likely to overestimate the proportion of grapheme-level pronunciations as to underestimate it in a given data set (Table 3, cases CV1 and VC1; Fig. 3, NRAU items). However, the NRAM items present a special case in which the body-level statistics are rich enough to support good generalization (because the bodies all occur in multiple words) and consistent enough to be impervious to small differences in the frequencies of particular words (because they are all 100% consistent at the body level).

Fig. 5. H values reflecting variability in responses to the Andrews and Scarratt (1998) stimuli for humans (A) and the current model (B). No data are presented for the DRC because it does not produce variable pronunciations.



    Discussion

The behavioral studies and modeling discussed here provide evidence bearing on theories of how the systematicity in spelling-sound correspondences is encoded by skilled readers. Three aspects of the results are more consistent with the view that this knowledge consists of probabilistic constraints rather than categorical rules. People and PDP models both show a graded sensitivity to consistency in chronometric measures of nonword reading. Consistency also influences the pronunciations generated by both people and models, suggesting that statistical properties of the lexicon can be used productively. Finally, individual variability among readers and among runs of a PDP model in the pronunciation of particular nonwords reflects consistency as well: highly consistent nonwords generate a high degree of agreement among readers, whereas less consistent items generate a greater variety of responses. These phenomena are less successfully simulated by an implemented model (DRC) that relies on categorical rules to translate from spelling to sound, suggesting a role for lexical statistics in the translation of spelling to sound.

    Consistency effects on response latency

That consistency influences both word and nonword reading latency is clear from experiments spanning many years. Studies of word reading that independently manipulated consistency (defined at the word-body level) and regularity (defined mainly at the grapheme-to-phoneme level) have yielded large effects of consistency and small or nonexistent effects of regularity (Cortese & Simpson, 2000; Jared, 2002). Nonwords yield similar effects. These results follow naturally from a view in which spelling-sound correspondences are statistical rather than categorical. It is thus not surprising that computational models that incorporate statistical learning can capture these kinds of effects.

The fact that DRC does not account for performance on the Glushko (1979) nonwords is particularly important because Coltheart et al. (2001) cite this result as a motivation for adopting cascaded as opposed to thresholded processing: in a cascaded model, there is an opportunity for partially completed output from the lexical route to influence nonword pronunciation, providing a potential basis for the advantage of consistent nonwords over inconsistent ones in naming latency. Although this result was used to motivate the cascaded property of the DRC model, the model does not produce the Glushko effect. In principle, the effect could arise in the DRC model from the activation of exception words in the lexical network, producing a conflict between the two routes. For a nonword to activate words to a sufficiently high level, the parameters governing inhibition must be set to low values. Doing so creates a problem, however: the model generates lexicalization errors, particularly for simple wordlike nonwords such as STARN. Thus it is difficult to tune the model's parameter set so that it correctly simulates this aspect of nonword naming.

    What is the correct pronunciation of CHEAD or MOUP?

Early discussions of nonword reading in computational models focused on pronunciation accuracy: a model's output was scored as correct if it generated a plausible pronunciation (e.g., one that rhymed with a similarly spelled word). Besner et al. (1990) noted that the Seidenberg and McClelland (1989) model frequently produced pronunciations that differed from people's, suggesting that pronunciation rules might be required. Plaut et al. (1996) traced this behavior to limitations in the way phonological information was represented in such models, rather than to the need for rules, and reported human-level accuracy, as did Harm and Seidenberg (1999, 2004).

Later discussion focused on nonword pronunciation at a finer level of detail. Seidenberg et al. (1996) obtained behavioral data concerning the pronunciations of several hundred nonwords. They found that many nonwords are pronounced in more than one way. The two most common pronunciations accounted for more than 90% of subjects' responses. Rather than assessing how often a computational model produced a single, intuitively plausible nonword pronunciation, Seidenberg et al. examined whether the computed pronunciations matched either of the most common ones produced by subjects. In that large-scale study, a connectionist model with an improved phonological representation provided a slightly better fit to the data than the Coltheart et al. (1993) version of the dual-route model. This was important both because of claims that rules were necessary for nonword reading and because the rules in the Coltheart et al. model were specifically created to account for nonword pronunciation.

The current simulations extend these findings to two additional data sets. These studies show that the pronunciation of novel forms is influenced by statistical properties of the spelling-sound mapping that arise from similarity relations among words. A central claim of the DRC framework is that nonwords are pronounced by applying rules. The rules state how spellings deterministically map to pronunciations. The defining characteristic of the rules (and of the mechanism by which they are applied) is that they are not influenced by statistical properties such as how often particular mappings occur across words. Thus, evidence that such statistical properties affect word and nonword performance counts against the dual-route framework. Statistics at the level of individual graphemes, at larger units such as the word body, and even at ad hoc units such as the "oncleus" (onset + nucleus) all play a role in determining the pronunciations of novel items.



    Modeling individual variability

Extending modeling to account for variability across individuals with respect to nonword pronunciations is a natural direction for research in this area to follow. For the DRC model there are two challenges. One is to develop an account of how grapheme-phoneme conversion rules are learned; at present the model lacks this learning component. The second challenge would then be to account for how different sets of GPCs could be learned, as demanded by the behavioral data.

Previous assessments of connectionist models' performance also ignored individual variability in nonword naming: they were conducted by comparing mean latencies or modal pronunciations from experiments in which many subjects were run to the data from a single run of a model. This introduces a mismatch insofar as the model data are actually more comparable to the results for an individual subject than to group results (see Seidenberg & Plaut, 1998, for discussion). The present research is a step toward more serious evaluations of individual differences in both human and model performance. The data from multiple runs provide a better fit to the human data, especially when the dependent measure is a qualitative one such as the proportion of "regular" or "critical" responses generated. The model also captures some of the more specific data concerning the variability of responses to different types of nonwords observed in the Andrews and Scarratt study. This suggests a number of possible future directions for research.

In the current study, we introduced variability by using slightly different randomizations of the same word frequency list for each run of the model. This hardly captures the much greater variability in the kind and amount of reading people do, even within the relatively homogeneous population of university students who participate in psychology experiments. In addition, current computational models (both connectionist and DRC) are limited to monosyllabic words, which may introduce error in their performance. Some of the statistics that are relevant to the pronunciation of monosyllabic words and nonwords arise from exposure to a much larger vocabulary that includes multisyllabic words. This creates obvious discrepancies between the model's experience and the human reader's. Furthermore, variability in the training regime is not the only possible source of variability in connectionist models. There are also effects of differences in the random assignment of initial weights, which need to be explored further.
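The idea of giving each simulated reader a slightly different randomization of the same frequency distribution can be sketched as frequency-weighted sampling under different random seeds. The word list, frequency counts, and `training_order` helper below are invented for illustration and are not the training procedure used in the simulations reported here.

```python
import random

def training_order(words, freqs, n_trials, seed):
    """One simulated reader's training sequence: items sampled in
    proportion to assumed corpus frequency, under a fixed random seed."""
    rng = random.Random(seed)
    return rng.choices(words, weights=freqs, k=n_trials)

words = ["THE", "HAVE", "MINT", "PINT"]
freqs = [6000, 1200, 40, 5]      # invented frequency counts

# Same lexical statistics, but a different item-by-item history per "subject".
run1 = training_order(words, freqs, 1000, seed=1)
run2 = training_order(words, freqs, 1000, seed=2)
print(len(run1), run1[:3])
```

Under this scheme every run sees the same aggregate statistics, so differences between runs reflect only the particular sequence of experience, a minimal analogue of individual differences in reading history.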

Finally, the method of examining multiple runs of the same model can be used to explore whether the patterns of impairment seen in cases of acquired dyslexia are related to premorbid individual differences in reading. Such patients sometimes exhibit extreme patterns of dissociation (e.g., in reading words vs. nonwords); such patterns are often taken as a basis for identifying isolable processing systems (e.g., routes). However, little information is available about the patients' premorbid reading abilities, how often they read, and what types of materials. A given type of brain injury may have different behavioral effects as a function of premorbid individual differences. This possibility is the complement of the situation studied by Plaut (1996), who demonstrated that random damage to a single model can produce highly variable patterns of impairment. Thus, the behavioral impairments observed in acquired dyslexia depend on individual differences with respect to both premorbid capacities and the effects of neuropathology. These factors are likely to produce a wide range of behavioral profiles.

    Conclusions

    It is a positive reflection of the degree of sophistication of contemporary models of word and nonword reading that strong tests of their basic assumptions can be derived. Several researchers have been able to identify stimuli that contrast the effects of regularity vs. consistency in word and nonword reading (e.g., Andrews & Scarratt, 1998; Cortese & Simpson, 2000; Jared, 1997; Treiman et al., 2003). Simulations of these behavioral studies provide further data bearing on the adequacy of the competing theories. The speed with which nonwords are pronounced, the pronunciations assigned to them, and the degree of agreement among individuals regarding their pronunciation all depend on the statistical properties of their lexical neighborhoods. Such phenomena are more easily captured by connectionist models, which acquire this knowledge via statistical learning mechanisms. Both behavioral and modeling results support the view that generalization results from using a network that encodes this statistical knowledge, rather than from the application of rules.

    Acknowledgments

    This work was supported by NIMH Grants P50 MH 64445, K02 MH 01188, and NICHD grant R01 MH 29891 (M.S.S.) and NIH fellowship F32 DC 06352 (J.D.Z.). The model used in this paper is based on work by Mike Harm, whom we also thank for the simulation software. We also thank Max Coltheart for making an executable version of the DRC available online.

    References

    Andrews, S., & Scarratt, D. R. (1998). Rule and analogy mechanisms in reading nonwords: Hough dou peapel rede gnew wirds. Journal of Experimental Psychology: Human Perception and Performance, 24, 1052-1088.

    158 J.D. Zevin, M.S. Seidenberg / Journal of Memory and Language 54 (2006) 145-160

    Baron, J., & Strawson, C. (1976). Use of orthographic and word-specific knowledge in reading words aloud. Journal of Experimental Psychology: Human Perception and Performance, 2, 386-393.

    Besner, D., Twilley, L., McCann, R., & Seergobin, K. (1990). On the connection between connectionism and data: Are a few words necessary? Psychological Review, 97, 432-446.

    Brown, P., Lupker, S. J., & Colombo, L. (1994). Interacting sources of information in word naming: A study of individual differences. Journal of Experimental Psychology: Human Perception and Performance, 20, 537-554.

    Coltheart, M. (1999). Modularity and cognition. Trends in Cognitive Sciences, 3, 115-120.

    Coltheart, M. (2000). Dual routes from print to speech and dual routes from print to meaning: Some theoretical issues. In A. Kennedy, R. Radach, J. Pynte, & D. Heller (Eds.), Reading as a perceptual process (pp. 475-492). Oxford: Elsevier.

    Coltheart, M., Curtis, B., Atkins, P., & Haller, M. (1993). Models of reading aloud: Dual-route and parallel-distributed-processing approaches. Psychological Review, 100, 589-608.

    Coltheart, M., Rastle, K., Perry, C., Langdon, R., & Ziegler, J. (2001). DRC: A Dual Route Cascaded model of visual word recognition and reading aloud. Psychological Review, 108, 204-256.

    Cortese, M. J., & Simpson, G. B. (2000). Consistency effects: What are they? Memory & Cognition, 28, 1269-1276.

    Ganong, W. F. (1980). Phonetic categorization in auditory word perception. Journal of Experimental Psychology: Human Perception and Performance, 6, 110-125.

    Glushko, R. J. (1979). The organization and activation of orthographic knowledge in reading aloud. Journal of Experimental Psychology: Human Perception and Performance, 5, 674-691.

    Goswami, U., & East, M. (2000). Rhyme and analogy in beginning reading: Conceptual and methodological issues. Applied Psycholinguistics, 21, 63-93.

    Harm, M. W. (1998). Division of labor in a computational model of visual word recognition. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

    Harm, M. W., & Seidenberg, M. S. (1999). Phonology, reading acquisition, and dyslexia: Insights from connectionist models. Psychological Review, 106, 491-528.

    Harm, M. W., & Seidenberg, M. S. (2004). Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes. Psychological Review, 111, 662-720.

    Jared, D. (1997). Spelling-sound consistency affects the naming of high-frequency words. Journal of Memory and Language, 36, 505-529.

    Jared, D. (2002). Spelling-sound consistency and regularity effects in word naming. Journal of Memory and Language, 46, 723-750.

    Jared, D., McRae, K., & Seidenberg, M. S. (1990). The basis of consistency effects in word naming. Journal of Memory and Language, 29, 687-715.

    Keidel, J. L., Zevin, J. D., Kluender, K. R., & Seidenberg, M. S. (2003). Modeling cross-linguistic speech perception. In Proceedings of the 15th International Congress of Phonetic Sciences. Barcelona, Spain.

    Kessler, B., & Treiman, R. (2001). Relationships between sounds and letters in English monosyllables. Journal of Memory and Language, 44, 592-617.

    Kirkham, N. Z., Slemmer, J. A., & Johnson, S. P. (2002). Visual statistical learning in infancy: Evidence for a domain general learning mechanism. Cognition, 83, B35-B42.

    Marcus, M., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19, 313-330.

    Norris, D. (1994). A quantitative multiple-levels model of reading aloud. Journal of Experimental Psychology: Human Perception and Performance, 20, 1212-1232.

    Pinker, S. (1999). Words and rules: The ingredients of language. New York: Basic Books.

    Plaut, D. (1996). Relearning after damage in connectionist networks: Toward a theory of rehabilitation. Brain and Language, 52, 25-82.

    Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56-115.

    Posner, M. I., Abdullaev, Y. G., McCandliss, B. D., & Sereno, S. (1999). Neuroanatomy, circuitry and plasticity of word reading. Neuroreport, 10, R12-R23.

    Rayner, K., Foorman, B., Perfetti, C., Pesetsky, D., & Seidenberg, M. S. (2001). How psychological science informs the teaching of reading. Psychological Science in the Public Interest, 2, 31-74.

    Saffran, J., Newport, E., Aslin, R., & Tunick, R. (1997). Incidental language learning: Listening (and learning) out of the corner of your ear. Psychological Science, 8, 101-105.

    Seidenberg, M. S. (1995). Visual word recognition: An overview. In P. Eimas & J. L. Miller (Eds.), Handbook of perception and cognition: Language. New York: Academic Press.

    Seidenberg, M. S. (1997). Language acquisition and use: Learning and applying probabilistic constraints. Science, 275, 1599-1603.

    Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568.

    Seidenberg, M. S., & Plaut, D. C. (1998). Evaluating word reading models at the item level: Matching the grain of theory and data. Psychological Science, 9, 234-237.

    Seidenberg, M. S., Petersen, A., MacDonald, M. C., & Plaut, D. C. (1996). Pseudohomophone effects and models of word recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 48-62.

    Seidenberg, M. S., Plaut, D. C., Petersen, A. S., McClelland, J. L., & McRae, K. (1994). Nonword pronunciation and models of word recognition. Journal of Experimental Psychology: Human Perception and Performance, 20, 1177-1196.

    Strain, E., Patterson, K., & Seidenberg, M. S. (1995). Semantic effects in single-word naming. Journal of Experimental Psychology: Learning, Memory and Cognition, 21, 1140-1154.

    Treiman, R., Kessler, B., & Bick, S. (2003). Influence of consonantal context on the pronunciation of vowels: A comparison of human readers and computational models. Cognition, 88, 49-78.

    Treiman, R., Mullennix, J., Bijelac-Babic, R., & Richmond-Welty, E. D. (1995). The special role of rimes in the description, use, and acquisition of English orthography. Journal of Experimental Psychology: General, 124, 107-136.

    Venezky, R. L. (1970). The structure of English orthography. The Hague: Mouton.
