Recent Work in Computational Linguistic Phylogeny. Author(s): Joseph F. Eska and Don Ringe. Source: Language, Vol. 80, No. 3 (Sep., 2004), pp. 569-582. Published by: Linguistic Society of America. Stable URL: http://www.jstor.org/stable/4489723

DISCUSSION NOTES

Recent work in computational linguistic phylogeny*

JOSEPH F. ESKA, Virginia Polytechnic Institute & State University

DON RINGE, University of Pennsylvania

1. INTRODUCTION. A number of recent attempts by nonlinguists to reconstruct linguistic evolutionary trees have made news. Reconstructions of the phylogeny of the Indo-European (IE) family of languages are especially well represented; well-known examples include Rexová et al. 2003, Gray & Atkinson 2003 (which is discussed by Searls (2003) and briefly reported in U.S. news & world report (10 December 2003)), and Forster & Toth 2003, which also generated considerable attention in the popular media. Scientific linguists have not been impressed for a variety of reasons. Though no two of the publications in question exhibit exactly the same weaknesses, all can be impugned on one or more of the following grounds: the linguistic data employed have not been adequately analyzed, or, in some cases, even competently analyzed; the model of language change employed has not been shown to fit the known facts of language change; attempts to fix the dates of prehistoric languages have ignored the fatal shortcomings of glottochronology discovered by Bergsland and Vogt (1962; see further §4); the researchers assume that vocabulary replacement is governed by a LEXICAL CLOCK (similar to the controversial MOLECULAR CLOCK posited by some biological cladists);1 and/or the data set used is too small to yield statistically reliable conclusions.

A thoroughgoing critique of all recently published work in this vein would be unwieldy and would require far more space than a discussion note permits. Instead, we focus on the article that best exemplifies the shortcomings listed above, namely the work of Forster and Toth.

1.1. In an article published recently in the Proceedings of the National Academy of Sciences, Peter Forster and Alfred Toth propose a new computational method for recovering the dates of prehistoric events of linguistic speciation (2003, cited hereafter as F & T). The article appeared with an online appendix,2 as well as references to a web tutorial describing how F & T handle their linguistic data.3

If F & T's proposal were scientifically adequate, it could represent an important methodological advance in historical linguistics. But we demonstrate in this discussion note that they control neither the data nor the necessary linguistic methodology, that their treatment of linguistic data amounts to an explicit rejection of scientific historical linguistics to the point that it could be called antiscientific, and that their application of computational methods for recovering linguistic evolutionary trees is inadequate to prove the dating of prehistoric linguistic events that they claim.

* We are grateful to Hope Dawson, Brian Joseph, Bill Poser, Tandy Warnow, and the late Larry Trask, to whose memory we dedicate this discussion note, for helpful discussion concerning various issues raised in this note. All opinions expressed are, of course, our own.

1 The hypothesis of a molecular clock postulates that DNA mutates at a constant rate.

2 Available at http://www.pnas.org/cgi/content/full/1331158100/DC1/1.

3 Available under the link to Phylogenetic methods at www.mcdonald.cam.ac.uk/genetics/research.html.


We wish to emphasize two points at the start. First, we critique F & T's online appendix and web tutorial together with their published article, since all those materials together constitute a coherent published project. Second, the reader should bear in mind that the main thrust of F & T's work is the dating of prehistoric linguistic events. Every other part of their project (including their selection and processing of the data, discussed mostly on the associated web sites) serves as input to the process of inferring dates; thus a serious error in ANY SINGLE part invalidates the inference of dates. We show that there are serious errors in EVERY part of their work.

2. ERRORS IN THE SELECTION AND TREATMENT OF THE DATA. Any attempt to assess phylogenetic relationships among human languages rests crucially upon the accurate selection and coding of the data. Just as small programming errors can lead to severe problems in the operation of computer software, errors and/or ambiguities in the selection and coding of linguistic data can lead to significant errors in the construction of phylogenetic trees. Work of this kind, then, requires either an intimate knowledge of the languages concerned or access to sources that one can trust to be reliable. F & T elect to base their project on the Celtic languages, but it is very clear to us that they not only are not familiar with the linguistic data that they use, but also are not familiar enough with the current state of Celtic historical linguistics to make critical use of the secondary literature. Sometimes they simply ignore what their sources have to say.

2.1. We first note that F & T treat the ancient Celtic languages of continental Europe, known collectively as Continental Celtic,4 as a linguistic monolith. They do allow for the possibility that Transalpine Gaulish and Cisalpine Gaulish were discrete languages, but do not explore the matter, because they say it is 'not crucial' to their methodology. It has long been known, however, that there were a number of discrete Continental Celtic languages. Until recently, these were enumerated as Hispano-Celtic (also known as Celtiberian) from the northern central plateau of the Iberian Peninsula, Lepontic from the northern Italian Alpine lake district and adjacent Switzerland, and 'Gaulish' from modern-day France and northern lowland Italy. To these could be added the sparsely known Galatian from Asia Minor and Noric from Austria and the Balkans. But recent work has now established that Lepontic and Cisalpine Gaulish are, in fact, the same language (Eska 1998) and that Lepontic (and, hence, Cisalpine Gaulish) and Transalpine Gaulish are discrete languages (Uhlich 1999). Thus, we speak henceforth of Transalpine Celtic (Transalp. Celt.) and Cisalpine Celtic (Cisalp. Celt.). In view of the diversity among the Continental Celtic languages themselves, much of which F & T ignore, it appears that they are attempting to find the phylogenetic position of a hybrid construct (Transalpine Celtic plus a portion of Cisalpine Celtic) that never actually existed.

4 In contrast to the Insular Celtic languages (Irish, Scottish Gaelic, Manx, Welsh, Cornish, Breton), which are principally attested from the medieval period onwards.

2.2. F & T suggest that their data set of thirty-five CHARACTERS should be considered to be very rigorously selected because they rely exclusively on bilingual texts.5 They suggest that this is advisable because the Continental Celtic languages are only fragmentarily known, and texts accompanied by Latin translations should be those in which the Celtic linguistic forms can be most reliably identified and interpreted. They make the fatal error, however, of assuming that bilingual texts always provide one-to-one TRANSLATIONS. As is well known, such texts may provide only rough or categorial equivalents across their two linguistic components, or not even be equivalent at all (see many of the articles in Adams et al. 2002). Even when actual translation is involved, the languages in such texts can differ significantly in their grammars and formulaic and idiomatic constructions, necessarily rendering the 'translation' rather free. We survey only some of the errors F & T commit in their selection and assessment of the linguistic data, which can be amplified many times over.

5 A character is a linguistic feature that is comparable across all of the languages of the data set. Each language instantiates the character in a particular way, such that it is possible to determine whether the ways that two languages instantiate the character are the same or different. Languages are thus said to share the same STATE of the character or to exhibit different states.

2.3. In their analysis of bilingual Latin-Celtic inscriptions, F & T rely on Lambert 1994 as their only source. They have not done sufficient research to know what the consensus is in the field, or, indeed, whether consensus even exists with regard to individual details of analysis. Such a practice, of course, can lead nonspecialists astray. In the bilingual Latin-Cisalpine Celtic inscription from Vercelli, for example, Lat. deis et hominibus 'to gods and men' is translated by Cisalp. Celt. TeuoXTonion;6 F & T follow Lambert (1994:78) in attributing to Michel Lejeune, the editor of the volume of the standard corpus in which the inscription is described, the view that the Celtic form is a compound adjective, noting that there is an alternative view that the form is a dvandva compound noun (translated 'x and y'). But Lejeune describes the form as 'un dvandva parfait' (1988:36). It continues the etymon *deiuo- 'god' and the nominalized derivative *dʰǵʰom- + -io- 'that which pertains to the earth', i.e. 'earthling', i.e. 'human'.7 This is indicative of F & T's inattention to getting the data correct.8 They also introduce a further error by coding each stem of the compound gen. pl. TeuoXTonion (cf. Gk. -ων, Lat. -um) as dative plural,9 either on the basis of the Latin text, or perhaps because they misunderstand Lambert's translation of the genitive plural case by 'appartenant aux' (1994:76). The grammar of the two languages is different, but F & T do not seem to realize this.

6 We cite forms engraved in indigenous scripts in boldface. Note that the variety of Etruscoid script used to engrave Cisalpine Celtic does not distinguish the voicing of plosives: hence, the use, for example, of the archigrapheme ⟨T⟩ to represent both /t/ and /d/.

7 F & T do follow Lejeune (1988:36) in claiming that Lat. comunem deis et hominibus 'to gods and men in common' is translated by Cisalp. Celt. TeuoXTonion, but this is not correct. Lat. comunem 'in common' simply is not translated in the Celtic text.

8 We note that F & T likewise follow Lambert's erroneous representation and translation of the Latin-Cisalpine Celtic bilingual inscription of Todi (Lambert 1994:74-76). An examination of Lejeune 1988:46-49 reveals that Lambert ignores the fragmentary characters of the accusative argument of the Latin text on one face of the inscription and the space for the accusative argument (now lost) on the second. This error, of course, leads Lambert to an incorrect translation.

9 As noted in §3.1, one must code the characters accurately by knowing what etyma, morphemes, or phonemes are cognate; failure to do so inevitably results in a phylogenetic tree in which little confidence can be placed.

2.4. F & T obtain other translations not from bilingual inscriptions, but from independent Latin and Transalpine Celtic texts which they assume translate each other. This information comes from a corpus of graffiti from La Graufesenque, a ceramic factory in operation during the early centuries of the common era. The artisans fired their wares in common and kept records on pieces of broken ceramics to track each individual's production. Some of these accounts were kept in Latin, some in Transalpine Celtic, and some in a combination of both.10

Many of these accounts are headed by the term furnus 'oven' + ordinal numeral (sometimes + oneratus 'loaded') in Latin, to which F & T compare accounts headed by tuθ(θ)os + ordinal numeral (sometimes + luxtos or luxtodos) in Transalpine Celtic, making the assumption that tuθ(θ)os and luxtodos translate Lat. furnus and oneratus, respectively. But though they cite Robert Marichal's standard corpus of the graffiti, they ignore him when he explicitly states that 'Furnus n'est pas une traduction [of tuθ(θ)os], mais une «équivalence»' (1988:96), that is, the two terms fill a common categorial slot in these accounting records. In fact, the Transalpine Celtic lexeme probably continues *t(o)-us-to-, a verbal adjective built to the Proto-Indo-European root *h1eus- 'burn' (as proposed by Lambert 1989:261), and refers not to the kiln, but to the group of ceramic wares that have been fired.11 The Latin and the Transalpine Celtic documents are of a kind, but they do not translate each other.12

2.5. F & T obtain other translations from 'internal information', that is, 'translations which can be deduced from the internal system of a Gaulish record in the absence of accompanying translation or depictions'. In this category, they correctly identify some onomastic phrases in the Transalpine Celtic inscription of Larzac which take the form of 'x mother of y' or 'x daughter of y'. This gets them the lexemes matir 'mother' and duxtir 'daughter' (though we wonder how they can tell which is which without recourse to etymology, given that a translation is not provided)13 and ā-stem inflectional exponents nom. sg. -a and, according to them, gen. sg. -as and -ias. But gen. sg. -as does not occur in this text.14 It does occur in Hispano-Celtic, Cisalpine Celtic, and early Transalpine Celtic, but not in any of the texts from which F & T extract their data. The genitive singular in -ias of the inscription of Larzac reflects the later Transalpine Celtic merger of the ā- and ī-stem inflectional paradigms.15

10 Indeed, this language mixing is what the graffiti of La Graufesenque are particularly known for (see Adams's recent study (2003:687-724); Adams remarks that the 'code-switching [is] on such an extensive scale that the base language is sometimes difficult to determine' (688-89)).

11 Marichal himself speculates that tuθ(θ)os luxtos (approx. 'group of fired ceramic wares of the load') translates Lat. furnus (1988:99-100), but this does not seem at all obvious to us.

12 F & T also use the Transalpine Celtic ordinal numerals from 'first' through 'tenth' as characters, but, since the graffiti do not present bilingual texts in the sense of translations, and we find only fragmentarily preserved pri[mus] 'first', secun[dus] 'second', and, perhaps, de[cimus] 'tenth' in the Latin texts, we wonder how they identified the Celtic forms without resorting to cognation, which they expressly eschew (see §3.1).

13 One may note that Albanian motër 'sister' continues a form of Proto-IE *meh2ter- 'mother', thus demonstrating that semantic shifts in female kin terms are possible and that the meanings cannot be taken for granted (we thank Brian Joseph for drawing this fact to our attention).

14 F & T wrongly print as adiegas the genitive singular form which Lambert (1994:161) correctly prints as adiegias. This is another example of their inattentiveness to getting the data right, and one that directly affects their results.

15 We take this opportunity to note that, while it is true that there are many items of the Continental Celtic lexicon for which the precise meaning is not known, the inflectional morphology is well understood on the basis of cognation, especially in the nominal system. F & T could have made much greater use of this type of data, which we believe provides better evidence for phylogenetic research than lexical items; cf. Ringe et al. 2002:63-65 and see further in §2.7.

2.6. Another category of translations that F & T make use of are Continental Celtic lexemes reported by classical writers for which translations are provided. In the interest of rigor, they eventually elect to make use of only three lexemes recorded in Pliny's (23/24-79 AD) Naturalis historia (NH). Because Pliny was born in Como in northern Italy and traveled widely in the Roman army, F & T conclude that he would have been acquainted personally with spoken Celtic. This is quite a leap of faith, however, for the Cisalpine Celtic epigraphic corpus terminates in the first century BC and the Hispano-Celtic corpus no later than the first decades of the first century AD. These languages may have been spoken for some time afterwards, especially in rural areas, but this does not guarantee that Pliny had firsthand knowledge of the information that he reports or that it was accurately described as Celtic.

We do know that two of the forms selected by F & T are genuine,16 for they were borrowed into Latin and surface in the Romance languages (sometimes with additional suffixation): thus alaudae (dat. sg.), glossed by galerita 'lark' (NH 11.121), is continued by Fr. alouette, Ital. allodola, and Sp. alondra (Delamarre 2003:36), and uiriolae (nom. pl.), glossed by dardania 'Dardanian (objects)' (NH 33.39-40),17 is continued by Fr. virole and Friulian viruele (Delamarre 2003:321). The third lexeme they select, however, is anything but reliable: eglecopalam (acc. sg.) is glossed by columbinam (acc. sg.) 'dove-colored marl' (NH 17.46), but its etymological analysis is mysterious and it is not continued elsewhere in Celtic or as a borrowed term in Latin or Romance. As F & T state, it is 'not widely discussed in the literature', but they are simply wrong to remark that this fact 'presumably indicates a significant reporting bias: words seem to have been underrepresented in the literature if they bear no resemblance to Indo-European and particularly to Celtic languages'. Eglecopalam has not often been discussed in the literature because it has defied analysis. For the record, it is listed, without comment, by Billy (1993:70), who notes the manuscript variant glecopalam; Walde and Hofmann (1938-54:1.395) state that its etymology is unexplained; Whatmough (1970:565) lists it with a question mark to query its genuineness; and Oštir (1930:39, 55, 66, 109) believes that the form is non-Indo-European.18 We simply are not in a position to confirm Pliny's statement. F & T would have done much better to make use of the numerous Continental Celtic lexical items mentioned by classical writers that are validated by the appearance of cognates in Insular Celtic or loanwords in Greek, Latin, or Romance.

16 Though they are not certainly Celtic, since cognates are not attested in the Insular Celtic languages.

17 So-called, Pliny tells us, because the fashion for sporting such jewelry came from the Dardani, an Illyrian tribe of the Balkans which also had connections to Asia Minor. He also attributes the variant viriae (nom. pl.) to Hispano-Celtic. Later, anonymous, scholiasts gloss viriola as κλανίον 'bracelet', ψέλιον 'armlet, bracelet', and περιχείρια (nom. pl.) 'armlet, bracelet' (Goetz & Gundermann 1888:209).

18 Walde and Hofmann (1938-54:1.395), however, label his discussion of the form as 'phantastisch'.

2.7. F & T include characters in their data set, as we note in §§2.4-2.5, that have been identified on the basis of cognation rather than 'translation'. There are hundreds of similar characters that they could likewise have used. To cite just a few examples from the Transalpine Celtic lexicon, we find ratin (acc. sg.) as a cognate to Old Irish (OIr.) ráith 'fort'; suiorebe (dat.-instr. pl.) as a cognate to Sanskrit (Skt.) svásar-, Lat. soror, OIr. siur 'sister'; anmanbe (dat.-instr. pl.) as a cognate to Skt. nā́ma, Gk. ὄνομα, Lat. nomen, OIr. ainm 'name'; and gabi (2. sg. impv.) as a cognate to OIr. gaib 'take!'. But lexical items can be easily borrowed, so when attempting to establish phylogenetic relationships, it is prudent, as has long been known (Meillet 1925:22-33), to make use of cognate phonological and, especially, morphological characters. One could, for example, compare the Proto-Celtic development of Proto-IE */gʷ/ > */b/ with related languages, or the raising of Proto-IE */o:/ > */u:/ in final syllables and lowering to /a:/ in other environments in Proto-Celtic, or the treatment of the Proto-Indo-European syllabic liquids as */aC/ word-initially and finally and before */m, n, l, r, s/ in Proto-Celtic, but as */Ci/ in other environments. Just among the singular o-stem nominal inflectional morphology of Transalpine Celtic, we can straightforwardly identify nom. -os with Skt. -as, Gk. -ος, Lat. -us; acc. -on with Skt. -am, Gk. -ον, Lat. -um; dat. -ουι (= /u:j/) with Gk. -ωι, Oscan -úí; and voc. -e with Skt. -a, Gk. -ε, Lat. -e.

More attention to phonological and morphological characters would have revealed to F & T the internal differentiation of the Continental Celtic languages. For example, Proto-IE */ej/ is largely preserved intact in Hispano-Celtic, while in early Cisalpine Celtic it is preserved word-finally, but has monophthongized to /e:/ word-internally, and in Transalpine Celtic the monophthongization is complete in all environments. Likewise, Cisalpine Celtic continues Proto-IE o-stem gen. sg. *-osio as -oiso,19 while later Cisalpine Celtic and Transalpine Celtic have replaced it with -ī, and Hispano-Celtic has replaced it with -o.

F & T argue that relying on cognation would bias the data set against AUTAPOMORPHIES (unique innovations) in 'Gaulish', but we know enough about the Continental Celtic languages to be able to parse texts for the most part, thus revealing innovatory states in morphology, at least, even if we cannot always translate them completely. By examining the totality of the available evidence, then, we not only recognize characters that have cognates in related languages, but also identify many innovations, as well. F & T's decision to use a minimal data set to assess the phylogenetic relationship of fragmentarily attested languages was, therefore, exactly the wrong thing to do.

3. ERRORS OF LINGUISTIC METHOD. Though the errors in F & T's data are, in themselves, sufficient to undermine their conclusions, the errors in their linguistic analysis of the data are much worse. At least three are serious enough to merit discussion here.20

3.1. THE FATAL ERROR. In coding comparative lexical data, it is usual to assign the same STATE to all cognates. For instance, for the character 'son' in a data set of modern Indo-European languages, we would assign a single state (say, 1) to Eng. son, Germ. Sohn, and Russ. syn, since they are all cognate with each other, but a single different state (say, 2) to Fr. fils, Sp. hijo, and Ital. figlio, since they are cognate with one another but not with the first set of words; the former set continues *suhxnu-, while the latter are all descended from Lat. fīlius, which represents an innovation. (How the states are represented does not matter; what matters is sameness and difference of states.) The advantages of such a method are decisive. Most importantly, it is possible to PROVE that forms are cognates by showing that they exhibit multiple regular sound correspondences that recur significantly often throughout the comparative word list. The provability of cognation is one of the cornerstones of scientific historical linguistics. Moreover, because cognates are jointly inherited from a common ancestor, they provide linguistic continuity against which evolutionary changes in various lines of descent can be evaluated. There are NO significant disadvantages to the use of cognate classes.
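The coding convention just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout and the names `SON_ETYMA`, `code_states`, and `same_partition` are hypothetical and do not come from the article or from any phylogenetic software; the etymon labels are the ones given above for 'son', written in plain ASCII.

```python
from itertools import combinations
from typing import Dict

# Etymon labels for the character 'son', following the example in the text.
# The data layout and all names here are illustrative assumptions.
SON_ETYMA: Dict[str, str] = {
    "English son": "*suhxnu-",
    "German Sohn": "*suhxnu-",
    "Russian syn": "*suhxnu-",
    "French fils": "Lat. filius",
    "Spanish hijo": "Lat. filius",
    "Italian figlio": "Lat. filius",
}


def code_states(etyma: Dict[str, str]) -> Dict[str, int]:
    """Give every cognate class one integer state, numbered by first appearance."""
    state_of: Dict[str, int] = {}
    coded: Dict[str, int] = {}
    for form, etymon in etyma.items():
        if etymon not in state_of:
            state_of[etymon] = len(state_of) + 1
        coded[form] = state_of[etymon]
    return coded


def same_partition(a: Dict[str, object], b: Dict[str, object]) -> bool:
    """True if two codings agree on sameness and difference for every pair of forms."""
    if a.keys() != b.keys():
        return False
    return all((a[x] == a[y]) == (b[x] == b[y]) for x, y in combinations(a, 2))


coded = code_states(SON_ETYMA)
# Relabel the states 1 and 2 as "A" and "B": the grouping is unchanged.
relabeled = {form: "AB"[state - 1] for form, state in coded.items()}
print(coded)
print(same_partition(coded, relabeled))
```

The `same_partition` check mirrors the parenthetical remark above: relabeling the states leaves the coding equivalent, because only sameness and difference of states carry information.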

19 Altered via analogy with pronominal gen. pl. -oisam (Eska 1995:42).

20 Many of the observations made in this section were also made by the late Larry Trask in a posting (ref. no. 14.1825) to the Linguist List on 7 July 2003.

But F & T decline to use cognation in coding their data. They claim in the online appendix (see n. 2 above) that 'grouping of item translations according to etymology [i.e. cognate classes], ... would be feasible if the underlying language tree were known. However, etymology assumes a particular language tree ...'. This claim is patently false. Cognation is completely independent of the shape of the evolutionary tree (or network), because cognates are forms jointly inherited from ANY common ancestor. Two simple examples illustrate the point: Eng. brother and Germ. Bruder are cognates because both forms are inherited from Proto-West-Germanic; but both forms are also cognate with Fr. frère and Russ. brat, because all are inherited from Proto-IE *bʰreh2ter-. Moreover, Germ. Hals and Fr. cou are cognate (they reflect a preform *kolso- which apparently reflects a root *kelhx- 'raise'), yet neither is cognate with Eng. neck (which continues the root *knok- 'projection') nor Russ. šeja (whose etymology is unclear), in spite of the fact that English is much more closely related to German than French is. The fact that English and German are much more closely related than the other languages is simply irrelevant to the cognate status of these forms.
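The independence of cognation from tree shape can be made concrete with a small sketch. It assumes a hypothetical table `ETYMA` that records, for each meaning and language, the etymon given in the text (in ASCII, with `None` for the unclear Russian form); the function `are_cognate` is likewise our own illustrative name. No tree or subgrouping appears anywhere in the input.

```python
from typing import Dict, Optional

# Etymon assignments follow the examples in the text; None marks an unclear etymology.
# The table layout and names are illustrative assumptions.
ETYMA: Dict[str, Dict[str, Optional[str]]] = {
    "brother": {
        "English": "*bhreh2ter-",
        "German": "*bhreh2ter-",
        "French": "*bhreh2ter-",
        "Russian": "*bhreh2ter-",
    },
    "neck": {
        "English": "*knok-",
        "German": "*kolso-",
        "French": "*kolso-",
        "Russian": None,
    },
}


def are_cognate(meaning: str, lang_a: str, lang_b: str) -> bool:
    """Decide cognacy from shared etyma alone; no tree is consulted."""
    a = ETYMA[meaning][lang_a]
    b = ETYMA[meaning][lang_b]
    return a is not None and a == b


print(are_cognate("brother", "English", "Russian"))  # shared Proto-IE etymon
print(are_cognate("neck", "German", "French"))       # both continue *kolso-
print(are_cognate("neck", "English", "German"))      # close relatives, different etyma
```

The last call is the instructive one: English and German are closely related, yet the judgment for 'neck' is negative, because the decision depends only on the etyma of the forms compared.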

Instead of using cognate classes, F & T code their data according to personal judgments of superficial similarity between the forms in question (see their online tutorial). We have been unable to discover any principles on which their judgments might be based. For instance, OIr. and Mod. Ir. sorn 'oven' is assigned a different state from Breton (Bret.) and Occitan (Occ.) forn, even though all of the forms begin with voiceless anterior fricatives and they are otherwise identical; but Sp. horno, which does not have an initial consonant (the ⟨h⟩ being purely graphic), is assigned the same state as the /f/-initial forms. In fact, all of the Insular Celtic forms that F & T cite as comparanda in their data set for 'oven' (Welsh (Wel.) ffwrn in addition to the forms cited above), regardless of their initial consonant (aside from Scottish Gaelic (Sc. Gael.) abhan, which is borrowed from Eng. oven), were borrowed from Lat. furnus (see McManus 1983:27-28 on Irish and Jackson 1953:274 on Welsh and Breton), whereas all of the Romance forms (Fr. four and Ital. forno in addition to the forms cited above) are genuine cognates; thus F & T's coding of this character groups together all cognates AND all borrowings in which the initial consonant was not replaced, against borrowings with replacement of the initial consonant. Even stranger is the fact that Sp. hija 'daughter', which differs from its Romance cognates (Fr. fille, Occ. filha, Ital. figlia) in the same way, is assigned a unique state. It is difficult to see how these inconsistencies can be defended.
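The problem with the 'oven' character can be checked mechanically. The sketch below is hedged accordingly: the state labels "A" and "B" are ours, the grouping is reconstructed from the description in the text rather than copied from F & T's data file, and Scottish Gaelic abhan is omitted because the text does not say which state it received. The helper `mixed_states` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Set

# Grouping of the 'oven' forms as the text describes it (labels "A"/"B" are ours).
SIMILARITY_STATE: Dict[str, str] = {
    "Old Irish sorn": "B",
    "Modern Irish sorn": "B",
    "Welsh ffwrn": "A",
    "Breton forn": "A",
    "Occitan forn": "A",
    "Spanish horno": "A",
    "French four": "A",
    "Italian forno": "A",
}

# How each form relates to Lat. furnus according to the text.
ORIGIN: Dict[str, str] = {
    "Old Irish sorn": "borrowed",
    "Modern Irish sorn": "borrowed",
    "Welsh ffwrn": "borrowed",
    "Breton forn": "borrowed",
    "Occitan forn": "inherited",
    "Spanish horno": "inherited",
    "French four": "inherited",
    "Italian forno": "inherited",
}


def mixed_states(states: Dict[str, str], origin: Dict[str, str]) -> Set[str]:
    """Return the states that lump inherited forms together with borrowings."""
    kinds = defaultdict(set)
    for form, state in states.items():
        kinds[state].add(origin[form])
    return {state for state, found in kinds.items() if len(found) > 1}


print(mixed_states(SIMILARITY_STATE, ORIGIN))
```

Under these assumptions the check flags state "A", which combines the inherited Romance forms with the Welsh and Breton loans, while the Irish loans sit apart in "B" only because their initial consonant was replaced.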

The character for 'loaded' is no better. So far as we can tell, the forms are judged to be similar if they begin with /l/. As a result, F & T group together basic adjectives meaning 'full' (OIr. and Mod. Ir. lán, Wel. llawn), an etymologically related participle meaning 'filled' (Sc. Gael. lìonta), and Transalp. Celt. luxtodos and Eng. loaded, which are etymologically unrelated to the other Celtic forms and to each other (the Insular Celtic forms, save for Bret. karget, continue the root *pleh1- 'fill', English continues *leit- 'go forth', and Transalpine Celtic probably continues *leug- 'break (off)'). Once again, a Breton loanword from Fr. chargé is assigned the same state as the genuinely cognate Romance forms (also Occ. cargat, Ital. carico, Sp. cargado).

The coding of 'and to men' is, likewise, confused. Cisalp. Celt. -XTonion, OIr. ocus do dainib, Mod. Ir. agus do dhaoine, and Sc. Gael. agus do dhaoinean, which are both similar and cognate, are assigned state (a); so is Bret. ha d'an dud, which is an unrelated collective sharing, at best, no more than an initial consonant with the other Celtic forms. Lat. et hominibus, Fr. et aux hommes, Occ. e als òmes, Ital. ed agli uomini, and Sp. y a los hombres are assigned a different state, yet their root syllable, at least, is actually cognate with that of the Celtic forms (they all continue *dʰǵʰom- 'earth'), save for Breton.


At least equally strange is the coding of the character for 'and'. Ital. e is assigned the same state as Basque (Basq.) eta, but Sp. y is assigned a different state. The first decision about 'and' could conceivably be defended with the observation that there is an Italian prevocalic variant ed which exhibits a consonant similar to that of eta. But why does it follow that e and y, two extremely similar forms (each consisting entirely of a nonlow front vowel), must be assigned different states?

In some cases, F & T miss differences that are actually significant. For instance, all of the forms for 'bull', save for Eng. bull and Basq. zezen, are assigned the same state, because all are roughly similar (and happen to be cognate in this case). But whereas Gk. ταῦρος, Lat. taurus, Fr. taureau, Occ. taur(e), and Ital. and Sp. toro reflect a preform *tauro-, all and only the Celtic forms (Transalp. Celt. tarvos, OIr. tarb, Mod. Ir. and Sc. Gael. tarbh, Wel. tarw, and Bret. tarv) reflect *taruo-, which shows a metathesis in the medial consonant cluster. The sound change in this form is diagnostic of Celtic, and it is a convincing SYNAPOMORPHY (significant shared innovation) precisely BECAUSE it is unexpected. Practically all of the forms for 'crane' are likewise assigned the same state, though they reflect several preforms: Transalp. Celt. garanus, Wel. and Bret. garan,21 Gk. γέρανος, Lat. grus, Fr. grue, Occ. grua, Ital. gru, and Sp. grulla all continue nominalizations of the root *gerh₂- 'cry hoarsely' (either with a suffix *-no-, as in Celtic and Greek, or a somewhat unclear suffix *-u-, as in Latin and its descendants), while Old and Modern Irish continue a northern Wanderwort of the shape *kork-.22 The examples we adduce are representative; arbitrary coding is the norm in F & T's data set.

It is important to understand what this means in terms of methodology. In coding their data, F & T have rejected the use of cognation classes, which can be proved, in favor of subjective judgments of similarity, which cannot even be replicated. This is much worse than merely unscientific; it amounts to an explicit rejection of scientific linguistics, and is therefore unacceptable. Of course, it invalidates not only their results, but also their methodology.
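The difference between the two coding strategies can be made concrete with the 'loaded' character discussed above. The following is a minimal sketch, not F & T's actual procedure: the function names are hypothetical, the etymon labels are shorthand rather than full reconstructions, and the 'similarity' rule (same initial letter) is only our inference, as stated above, about what their coding amounts to.

```python
# Forms for 'loaded' and the etymon each continues, as discussed in the text.
# The etymon strings are shorthand labels used only to group cognates.
FORMS = {
    "Old Irish": ("lán", "*pleh1-"),
    "Welsh": ("llawn", "*pleh1-"),
    "Scottish Gaelic": ("lionta", "*pleh1-"),
    "Transalpine Celtic": ("luxtodos", "*leug-"),
    "English": ("loaded", "*leit-"),
    "Breton": ("karget", "loan < Fr. chargé"),
}


def _assign_states(keys):
    """Give every distinct key its own integer state, in order of appearance."""
    states = {}
    return {lang: states.setdefault(key, len(states)) for lang, key in keys.items()}


def code_by_initial(forms):
    """Assumed similarity coding: same state if the forms share an initial letter."""
    return _assign_states({lang: form[0] for lang, (form, _) in forms.items()})


def code_by_cognacy(forms):
    """Cognation coding: same state only if the forms continue the same etymon."""
    return _assign_states({lang: etymon for lang, (_, etymon) in forms.items()})


by_initial = code_by_initial(FORMS)
by_cognacy = code_by_cognacy(FORMS)
print("initial-letter coding:", by_initial)
print("cognation coding:     ", by_cognacy)
```

Under the initial-letter rule, English and Transalpine Celtic fall together with the Insular Celtic forms; under cognation coding they do not, and each coding decision can be checked against a stated etymology.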

3.2. OTHER ERRORS OF LINGUISTIC METHOD. Other methodological errors include, but are not restricted to, poor choice of comparative characters and inadequate descriptive analysis of the characters chosen. For this short discussion, a couple of examples must suffice.

One of the most puzzling features of F & T's comparative data set is the inclusion of the consonant cluster /ps/ as a character. They imply (correctly, though they characterize it as 'loss') that inherited */ps/ merged with */ks/ in Proto-Celtic (realized phonetically as [xs]). They also say that the cluster is 'frequent' in English, Latin, and Greek, but rare in all the other languages examined. No evidence of a careful quantitative study is offered to support these assertions;23 for the sake of argument, we leave that aside and concentrate on the status of /ps/ in English. In most English examples of the cluster, the /p/ belongs to one morpheme and the /s/ to another; typical are tips (plural of tip) and hopes (3. sg. present of hope). Virtually all other instances of /ps/ occur in forms borrowed from other languages in recent centuries. The only example we can find that is inherited from Old English is cops, the name of a U-shaped piece of ironmongery (< OE cops, cosp; see the Oxford English dictionary, s.v.). How, then, can the incidence of /ps/ in English shed any light on the position of 'Gaulish' in the evolutionary network of Indo-European languages? Any such inference would be antihistorical. Nor is the fact that bimorphemic /p + s/ sequences have arisen in English of any relevance, since the cluster has also arisen in Transalpine Celtic (!) in the well-known form exsops 'blind' from the inscription of Chamalières, which was altered from *exsoxs (with root *h₃ekʷ- 'see' + nom. sg. -s > Proto-Celt. *[oxs]) by analogy with, for example, gen. sg. *exsopos and dat. sg. *exsope, thus leveling the paradigm.24 Furthermore, F & T's suggestion that Celtic substrate influence is responsible for the elimination of inherited /ps/ in the Romance languages is unnecessary; it appears to reflect the commonsense, but false, idea that language change is somehow unnatural and must be explained by external forces. In fact, sound changes that simplify consonant clusters are a universal tendency of human language and can, therefore, recur independently.25

21 Wel. crychydd also means 'crane'.

22 As does Scottish Gaelic, which F & T code as lacking a form for this character.

23 Which we believe would show that the cluster is rare in all of the languages, though rarer in some than in others.

Almost equally strange is the inclusion of a very general word-order parameter as a character. The number of potential orders for the main constituents of a sentence is very limited, and languages often share them by chance. Conversely, closely related languages can exhibit different basic word orders. English, for instance, is a relatively uncomplicated SIVO language (where I marks the position of the tensed verb and V the position of the lexical verb), while its close relative German is SOVI in subordinate clauses and exhibits the famous verb-second (really I-second) constraint in main clauses. But if such a character were to be included, it should at least be analyzed cogently and coded correctly. F & T have done neither. They examine only the order of subject and (tensed?) verb; they are apparently working only from surface word order, and for Breton, which is a verb-second language, they list the order as VS.26

In short, F & T's analysis of the data does not meet even the minimum standards of the field.

4. GLOTTOCHRONOLOGICAL ERRORS. We here discuss errors in F & T's attempt to assign real dates to prehistoric languages; shortcomings in their computational approach are addressed in §5.

It is clear that F & T posit a uniform lexical clock for the replacement of vocabulary as languages evolve, analogous to the molecular clock which has provoked so much debate in evolutionary biology (see the references in Sanderson 1997:1218). But Bergsland and Vogt (1962 with references) demonstrated long ago, using actual historical data, that the rate of basic vocabulary replacement varies significantly from lineage to lineage, not only from period to period within a lineage and from lexeme to lexeme.

24 We note that Proto-IE */ks/, */gs/, */ps/, and */kʷs/ merged as */ks/ (realized phonetically as [xs]) in Proto-Celtic; the labialization of */kʷ/ > /p/ in only some Celtic languages was a much later development.

25 And even common sense is not very well served by their suggestion: are we to suppose that central and southern Italian dialects reflect a Celtic substratum?

26 This is far from being F & T's only descriptive error. Some other examples: the Old and Modern Irish and Scottish Gaelic o-stem nominative singular is not marked, as stated by F & T, by the absence of a grammatical exponent, but by a final consonant of neutral quality, that is, nonpalatalized and nonvelarized. The genitive singular is not marked by internal vowel change, but by a final palatalized consonant; for example, in OIr. nom. sg. dliged 'law', the ⟨e⟩ of the final syllable (realized as [ə]) indicates that the final consonant is neutral, while the ⟨i⟩ (realized as [ə]) in gen. sg. dligid indicates that it is palatalized.


F & T suggest that the differences can simply be averaged out, that is, that the overall rate of lexical replacement does not change over time. But, in fact, variation across lineages is the one type of variation for which such an approach CANNOT correct.27
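Why averaging cannot correct for variation across lineages can be seen in a small worked example. The sketch below uses the classical glottochronological relation t = ln(c) / (2 ln r), where c is the proportion of list items that two sister languages share and r is the assumed retention rate per millennium. The function names, the assumed rate of 0.86, the alternative lineage rates, and the true depth of three millennia are illustrative assumptions only, not figures taken from F & T or from Bergsland and Vogt.

```python
import math


def expected_shared(t, r1, r2):
    """Expected proportion of list items retained by both sister lineages after
    t millennia, assuming independent retention at rates r1 and r2 per millennium."""
    return (r1 * r2) ** t


def clock_date(shared, r):
    """Classical glottochronological estimate: t = ln(c) / (2 ln r)."""
    return math.log(shared) / (2 * math.log(r))


ASSUMED_RATE = 0.86  # illustrative uniform clock rate (an assumption)
TRUE_DEPTH = 3.0     # millennia since the split, chosen for the example

# Both lineages really do change at the assumed rate: the date is recovered.
same_rate = clock_date(expected_shared(TRUE_DEPTH, 0.86, 0.86), ASSUMED_RATE)
# Both lineages are more conservative than the clock assumes.
slow_pair = clock_date(expected_shared(TRUE_DEPTH, 0.95, 0.95), ASSUMED_RATE)
# One conservative and one innovative lineage.
mixed_pair = clock_date(expected_shared(TRUE_DEPTH, 0.95, 0.80), ASSUMED_RATE)

for label, value in [("same", same_rate), ("slow", slow_pair), ("mixed", mixed_pair)]:
    print(f"{label:>5}: estimated depth {value:.2f} millennia (true depth {TRUE_DEPTH})")
```

With two conservative lineages the uniform clock returns roughly one millennium for a split that is actually three millennia old. Because the discrepancy affects every item in the list for that pair of languages, adding more items or averaging over lexemes leaves it untouched.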

Nor is that the only serious shortcoming in this aspect of F & T's work. Because rates of lexical retention vary along all possible parameters, it is strongly advisable to use as large a data set as possible so as to minimize the distorting effects of chance factors. A striking illustration of the point is provided by Tischler (1973:97-100). In attempting a glottochronological study of the diversification of Proto-Indo-European, Tischler obtains widely varying results using the Swadesh 100-word list but much more consistent results with the 200-word list. Most interestingly, many of the dates obtained with the shorter list strike many researchers as unrealistically early, while those obtained with the longer list are much easier to square with the archeological data (on which, see especially Mallory 1989). It appears that the use of insufficient data can yield unrealistically large time depths for linguistic diversifications. It might not be an accident, then, that F & T's use of a minimal data set leads them to posit time depths that are, at best, very difficult to believe. Indeed, their suggestion of 8100 BC for the diversification of Indo-European in Europe (2003:9083) is outside the range of mainstream opinion by at least three millennia (see e.g. Mallory 1989:151, 159). It is true that Gray and Atkinson (2003) arrive at an even earlier date for Proto-Indo-European, 8700 BC, using a larger data set. But since they use a very different model of evolution whose relation to the known process of linguistic evolution is obscure at best, it can hardly be claimed that their work and F & T's support each other. All we really have are multiple demonstrations that the use of questionable methods tends to give early dates. Indeed, all of these dates are even earlier than the range of dates proposed in Renfrew 1987, dates that are themselves so early that they are 'wholly incongruent with our reconstruction of the Indo-European vocabulary' (Mallory 1989:179). F & T's suggestion of 3200 BC for the diversification of Celtic is likewise inconsistent with the findings of mainstream linguistics and archeology.
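The sensitivity of such dates to the size of the data set can be given a rough quantitative form. The following is a sketch under simplifying assumptions, not a reanalysis of Tischler's or F & T's data: it treats each list item as an independent binomial trial, applies the delta method to t = ln(c) / (2 ln r), and uses the same illustrative retention rate for both list sizes. The function name and all numerical inputs are hypothetical.

```python
import math


def date_standard_error(shared, n_items, r):
    """Delta-method standard error (in millennia) of t = ln(c) / (2 ln r),
    treating the n_items list entries as independent binomial trials."""
    se_shared = math.sqrt(shared * (1 - shared) / n_items)
    # |dt/dc| = 1 / (2 * c * |ln r|)
    return se_shared / (2 * shared * abs(math.log(r)))


SHARED = 0.30  # illustrative proportion of shared items
RATE = 0.86    # illustrative retention rate per millennium

se_100 = date_standard_error(SHARED, 100, RATE)
se_200 = date_standard_error(SHARED, 200, RATE)
print(f"100 items: +/- {se_100:.2f} millennia")
print(f"200 items: +/- {se_200:.2f} millennia")
```

Doubling the list shrinks the sampling error of the date by a factor of √2. The sketch addresses only chance variation; it does not by itself explain why short lists should err in the direction of dates that are too early.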

Finally, it is an open question as to whether glottochronological studies based on poorly recorded languages of the past (such as the Continental Celtic languages) can ever yield trustworthy results. Embleton (1986:68-148) demonstrates that many of the problems that beset glottochronology can be overcome, but many of her advances were made by adducing as much evidence as possible about the actual histories of the languages under investigation. Our fragmentary knowledge of many ancient languages simply does not provide the information needed to apply Embleton's approach intelligently.

For all of these reasons, well-informed linguists should regard the dates that F & T propose as speculative at best. Indeed, these dates may be worse, since they give a specious appearance of resting upon serious glottochronological work. They do not.

5. SHORTCOMINGS OF COMPUTATIONAL METHOD. F & T's understanding of computational techniques for recovering evolutionary history is certainly much better than their grasp of linguistics, though the following discussion shows that it is still not good enough.

As noted in §4, F & T assume a lexical clock, but they have not demonstrated that their data are consistent with the assumption of such a clock. The assumption needs to be tested if their claims are to be credible, but they cannot perform the relevant tests until they have addressed a more fundamental problem, which can be described as follows.

27 See §5 for an effective way of dealing with the problem.

There are two prerequisites for testing whether a given evolutionary data set deviates from a molecular or lexical clock (and if so, by how much). First, one must have determined the structure of the tree that underlies the network. F & T appear to have met that requirement; the tree they give in figure 3 in their article (2003:9083) is unproblematic and should not surprise any specialist. The second prerequisite is much harder to meet, however: one must have a good estimate of how many evolutionary events within the data set occurred along each EDGE of the tree (i.e. between each pair of connected nodes). If one is working with characters that are necessarily binary, and if the data exhibit no HOMOPLASY (parallel development), and if BACKMUTATION is absent,28 then the problem is trivial, since between every two different states of a given character there can be only one evolutionary event. If a character can have more than two states, however, there is more than one sequence of events that can lead to each such difference of states. A change of state in one of two diverging lineages gives rise to such a difference, but so does a change of state in each of the two lineages. In fact, an indefinitely large number of state replacements in one or both of the lineages gives rise to the same simple difference of states. In other words, if one works with HAMMING DISTANCES,29 computed by a simple count of the characters for which two nodes do NOT share states, one inevitably underestimates the true evolutionary distances between the nodes.
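The underestimate can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a reconstruction of F & T's computation: the helper names `hamming` and `evolve`, the 100-character data set, and the figure of 60 events per lineage are all illustrative choices. Every replacement introduces a previously unused state, so homoplasy and backmutation are excluded by construction, yet the Hamming distance still falls short of the number of events.

```python
import itertools
import random


def hamming(a, b):
    """Number of characters for which the two taxa do NOT share a state."""
    return sum(x != y for x, y in zip(a, b))


def evolve(states, n_events, rng, fresh):
    """Apply n_events state replacements at randomly chosen characters. Each
    replacement draws a brand-new state from `fresh`, so neither homoplasy nor
    backmutation can occur."""
    states = list(states)
    for _ in range(n_events):
        states[rng.randrange(len(states))] = next(fresh)
    return states


rng = random.Random(0)        # fixed seed so the run is repeatable
fresh = itertools.count(1)    # supply of never-before-used states
ancestor = [0] * 100          # 100 multistate characters, all in state 0

EVENTS_PER_LINEAGE = 60
a = evolve(ancestor, EVENTS_PER_LINEAGE, rng, fresh)
b = evolve(ancestor, EVENTS_PER_LINEAGE, rng, fresh)

true_events = 2 * EVENTS_PER_LINEAGE
print(f"Hamming distance: {hamming(a, b)}; actual events: {true_events}")
```

A character that has been replaced several times, or replaced in both lineages, still contributes only one difference to the count, so the Hamming distance can never exceed the number of characters, however many events have actually occurred.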

Unfortunately, it seems clear that F & T ARE working with Hamming distances, and since they are also working with lexical characters, which can have an indefinitely large number of states, they cannot recover true evolutionary distances, and so cannot test for deviation from a lexical clock (even if they have eliminated homoplasic characters, and even though backmutation is normally absent from linguistic data). This is disastrous, because it undermines their method regardless of whether their glottochronological calculations are correct (see §4).

28 Backmutation is the replacement of a linguistic state x by another state y (and potentially then by z, etc.), then by x again; that is, the earlier state recurs in the same line of development. For reasons briefly discussed in Ringe et al. 2002:70, backmutation is rare in linguistic evolution and is easily excluded from linguistic data sets.

29 Hamming distances are pairwise distances between TAXA (e.g. languages) computed by simply adding up how many characters exhibit DIFFERENT states in the two taxa. Traditional lexicostatistics is based on Hamming distances.

30 An outgroup is a taxon known to be more distantly related to the taxa in the data set than any of them are to each other; using outgroups is an obvious and well-established way of finding the ROOT, or parent, node of a clade.

It appears that F & T were too quick to conclude that lexical characters are the best data for this kind of work (2003:9080 and the online appendix). But even if they were able to find the true evolutionary distances on their tree (for instance, by working with binary phonological characters based on idiosyncratic sets of phonemic mergers), they would still need to establish whether their data are compatible with the assumption of clocklike evolution. Several relevant tests have been devised and published (see e.g. Muse & Weir 1992 with references to earlier work). Note that it would not be necessary to know where in the tree the ROOT (i.e. the node representing the protolanguage) falls, nor to adduce an OUTGROUP,30 in order to run such a test. Even an unrooted tree with no outgroup can be tested by rooting it on each likely edge and at each likely node and running the test for each such rooting. If it turned out that the data were not consistent with a single regular rate of biological evolution, the next step would be to try to quantify the deviation from an evolutionary 'clock' (see e.g. Kishino et al. 2001 with references to earlier work). Given such an estimate, plus at least one internal node of the tree to which a date could be assigned (a CALIBRATION POINT), dates could be assigned to the other nodes, relying on the evolutionary distances. The method outlined in Sanderson 1997 is perhaps the most appropriate in this particular case; Classical Latin would provide the calibration point.
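The idea of trying every possible rooting can be sketched in a few lines. The code below is a deliberately crude stand-in for the published tests, not an implementation of Muse & Weir 1992 or of Sanderson 1997: it slides a trial root along each edge of an unrooted tree and reports the smallest range of root-to-tip distances, which is zero only if some rooting makes all tips equidistant from the root. The function names, the grid search, the node labels, and the edge lengths are all illustrative assumptions, and the edge lengths are taken to be true evolutionary distances, which, as noted above, Hamming distances do not supply.

```python
def tip_distances(tree, start, blocked):
    """Path lengths from `start` to every leaf reachable without crossing
    the edge that leads to `blocked`."""
    dists, stack = {}, [(start, blocked, 0.0)]
    while stack:
        node, parent, d = stack.pop()
        nbrs = [(n, w) for n, w in tree[node].items() if n != parent]
        if not nbrs:          # no onward edges: this node is a tip
            dists[node] = d
        for n, w in nbrs:
            stack.append((n, node, d + w))
    return dists


def _spread(du, dv, w, x):
    """Range of root-to-tip distances with the root x units from u on edge u-v."""
    d = [t + x for t in du.values()] + [t + w - x for t in dv.values()]
    return max(d) - min(d)


def clock_spread(tree, steps=100):
    """For each edge, try steps + 1 evenly spaced root positions and keep the
    smallest range of root-to-tip distances found (0 means clocklike)."""
    results = {}
    for u in tree:
        for v, w in tree[u].items():
            if u < v:  # visit each undirected edge once
                du, dv = tip_distances(tree, u, v), tip_distances(tree, v, u)
                results[(u, v)] = min(
                    _spread(du, dv, w, w * i / steps) for i in range(steps + 1)
                )
    return results


# Hypothetical four-taxon trees; A-D are tips, n1 and n2 are internal nodes.
clocklike = {
    "A": {"n1": 1.0}, "B": {"n1": 1.0}, "C": {"n2": 1.0}, "D": {"n2": 1.0},
    "n1": {"A": 1.0, "B": 1.0, "n2": 2.0},
    "n2": {"C": 1.0, "D": 1.0, "n1": 2.0},
}
skewed = {
    "A": {"n1": 3.0}, "B": {"n1": 1.0}, "C": {"n2": 1.0}, "D": {"n2": 1.0},
    "n1": {"A": 3.0, "B": 1.0, "n2": 2.0},
    "n2": {"C": 1.0, "D": 1.0, "n1": 2.0},
}
print("clocklike:", min(clock_spread(clocklike).values()))
print("skewed:   ", min(clock_spread(skewed).values()))
```

In the second tree the lineage leading to A has changed faster than the others, and no placement of the root can equalize the tip distances. A formal test would additionally ask whether a nonzero spread is larger than chance variation alone would produce.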

The issues discussed in this section are well known in the community that applies computational methods to the study of biological evolution (see the references in the preceding paragraph). It is, therefore, surprising that F & T do not address them.

6. DISREGARD AND MISUSE OF PRIOR WORK. Yet another serious shortcoming of F & T's article is their almost complete disregard of earlier work that is directly relevant to their own. We noted in §§2-2.1 that F & T are almost completely unfamiliar with the specialist literature on Continental Celtic languages (and Celtic in general). They likewise exhibit little evidence of familiarity with the long and complex debate on glottochronology; they do not even cite Embleton 1986.

Most surprisingly of all, they do not refer to earlier work, including computational work, on the subgrouping and glottochronology of Indo-European. Tischler 1973 is not mentioned, nor is Dyen et al. 1992. They refer in passing to Warnow 1997, but not to Ringe et al. 2002, which is both more up-to-date and much more directly relevant to their work. None of the articles we referred to in §5 is mentioned by F & T, but they cite Bergsland & Vogt 1962 in support of their contention that lexical replacement is, in fact, clocklike, which is precisely the opposite of what that article demonstrates.

7. CONCLUSIONS. Methodological advances in any area of linguistics are to be welcomed, but only after they have been held up to close scrutiny, especially when the claims are as potentially significant as those of F & T. We have shown that their selection and analysis of data are full of errors, that their confusion about what kinds of evidence are valuable for research in linguistic phylogeny has compromised their project, and that their rejection of the principles of the comparative method is not only counterproductive, but also completely antithetical to historical linguistics as a science. Most importantly, they have not addressed the crucial computational problems involved in phylogenetic reconstruction from comparative data.

The only more general conclusion that can reasonably be drawn is that one can pursue neither historical linguistics nor computational phylogeny successfully without the relevant specialist training. The assumption that there is really nothing to be known about human language, so that untrained outsiders can rush into the field and tell the linguists how to do it properly, is arrogant, as well as obviously false. F & T's article is effectively a paradigmatic example of how wrong one can go in such instances.

REFERENCES

ADAMS, J. N. 2003. Bilingualism and the Latin language. Cambridge: Cambridge University Press.

ADAMS, J. N.; MARK JANSE; and SIMON SWAIN (eds.) 2002. Bilingualism in ancient society: Language contact and the written word. Oxford: Oxford University Press.

BERGSLAND, KNUD, and HANS VOGT. 1962. On the validity of glottochronology. Current Anthropology 3.115-53.


BILLY, PIERRE-HENRY. 1993. Thesaurus linguae Gallicae. Hildesheim: Olms-Weidmann.

DELAMARRE, XAVIER. 2003. Dictionnaire de la langue gauloise: Une approche linguistique du vieux-celtique continental. 2nd edn. Paris: Errance.

DYEN, ISIDORE; JOSEPH KRUSKAL; and PAUL BLACK. 1992. An Indoeuropean classification: A lexicostatistical experiment. Philadelphia: American Philosophical Society.

EMBLETON, SHEILA M. 1986. Statistics in historical linguistics. Bochum: Brockmeyer.

ESKA, JOSEPH F. 1995. Observations on the thematic genitive singular in Lepontic and Hispano-Celtic. Hispano-Gallo-Brittonica: Essays in honour of Professor D. Ellis Evans on the occasion of his sixty-fifth birthday, ed. by Joseph F. Eska, R. Geraint Gruffydd, and Nicolas Jacobs, 33-46. Cardiff: University of Wales Press.

ESKA, JOSEPH F. 1998. The linguistic position of Lepontic. Berkeley Linguistics Society 24S.2-11.

FORSTER, PETER, and ALFRED TOTH. 2003. Toward a phylogenetic chronology of ancient Gaulish, Celtic, and Indo-European. Proceedings of the National Academy of Sciences 100.9079-84.

GOETZ, GEORG, and GOTTHOLD GUNDERMANN (eds.) 1888. Corpus glossariorum Latinorum 2, Glossae Latinograecae et Graecolatinae. Leipzig: B. G. Teubner.

GRAY, RUSSELL D., and QUENTIN D. ATKINSON. 2003. Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature 426.435-40.

JACKSON, KENNETH. 1953. Language and history in early Britain. Edinburgh: Edinburgh University Press.

KISHINO, HIROHISA; JEFFREY THORNE; and WILLIAM BRUNO. 2001. Performance of a divergence time estimation method under a probabilistic model of rate evolution. Molecular Biology and Evolution 18.352-61.

LAMBERT, PIERRE-YVES. 1989. Review of Marichal 1988. Études celtiques 26.259-61.

LAMBERT, PIERRE-YVES. 1994. La langue gauloise: Description linguistique, commentaire d'inscriptions choisies. Paris: Errance. [2nd edn., 2002.]

LEJEUNE, MICHEL. 1988. Recueil des inscriptions gauloises 2.1: Textes gallo-étrusques, Textes gallo-latins sur pierre. Paris: Centre National de la Recherche Scientifique.

MALLORY, J. P. 1989. In search of the Indo-Europeans: Language, archeology and myth. London: Thames and Hudson.

MARICHAL, ROBERT. 1988. Les graffites de La Graufesenque. Paris: Centre National de la Recherche Scientifique.

MCMANUS, DAMIAN. 1983. A chronology of the Latin loan-words in early Irish. Ériu 34.21-71.

MEILLET, A. 1925. La méthode comparative en linguistique historique. Oslo: H. Aschehoug & Co.

MUSE, S. V., and B. S. WEIR. 1992. Testing for equality of evolutionary rates. Genetics 132.269-76.

OŠTIR, K. 1930. Drei vorslavisch-etruskischen Vogelnamen. Ljubljana: Znanstveno Društvo.

RENFREW, COLIN. 1987. Archaeology and language: The puzzle of Indo-European origins. Cambridge: Cambridge University Press.

REXOVÁ, KATEŘINA; DANIEL FRYNTA; and JAN ZRZAVÝ. 2003. Cladistic analysis of languages: Indo-European classification based on lexicostatistical data. Cladistics 19.120-27.

RINGE, DON; TANDY WARNOW; and ANN TAYLOR. 2002. Indo-European and computational cladistics. Transactions of the Philological Society 100.59-129.

SANDERSON, MICHAEL. 1997. A nonparametric approach to estimating divergence times in the absence of rate constancy. Molecular Biology and Evolution 14.1218-31.

SEARLS, DAVID B. 2003. Trees of life and language. Nature 426.391-92.

TISCHLER, JOHANN. 1973. Glottochronologie und Lexikostatistik. Innsbruck: Institut für Sprachwissenschaft der Universität.

UHLICH, JÜRGEN. 1999. Zur sprachlichen Einordnung des Lepontischen. Akten des zweiten deutschen Keltologen-Symposiums, ed. by Stefan Zimmer, Rolf Ködderitzsch, and Arndt Wigger, 277-304. Tübingen: Max Niemeyer.

WALDE, A., and J. B. HOFMANN. 1938-54. Lateinisches etymologisches Wörterbuch. Heidelberg: Carl Winter.

WARNOW, TANDY. 1997. Mathematical approaches to comparative linguistics. Proceedings of the National Academy of Sciences 94.6585-90.


WHATMOUGH, JOSHUA. 1970. The dialects of ancient Gaul: Prolegomena and records of the dialects. Cambridge, MA: Harvard University Press.

Eska
Department of English
Virginia Polytechnic Institute & State University
Blacksburg, VA 24061
[[email protected]]

Ringe
Department of Linguistics
University of Pennsylvania
Philadelphia, PA 19104
[[email protected]]

[Received 9 November 2003; accepted 17 January 2004]
