Data Citation: The State of the Art in Linguistics
Lauren Gawne(1), Andrea Berez-Kroeker(2), Barbara F. Kelly(3) & Tyler Heston(2)
1st Workshop on Data Citation and Attribution in Linguistics, Boulder CO, September 18 2015
(1) SOAS, University of London, (2) The University of Hawaii, (3) The University of Melbourne
View these slides! bit.ly/DataCitationSOTA
Gawne, Berez, Kelly, Heston | Delaman | Sept. 18 2015
Developing Standards for Data Citation and Attribution
Workshop aim: to develop and promote standards for data citation and attribution for linguistic data sets.
A data-driven linguistic science has the potential to provide substantiation of scientific claims by promoting attention to the care and structuring of language data.
What is the state of the art?
How do researchers link their writing back to the underlying data?
● Where does our data come from?
● What kind of data are we using?
● Where is the data now?
● Are we citing our examples? If so, how?
Our study
We examined:
● 100 books (50 published grammars, 50 dissertations)
  ○ Variety of publishers, languages, institutions, supervisors
● 271 journal articles from 9 journals
  ○ Range of areal foci, linguistic subfields, theoretical persuasions
● Published 2003-2012
  ○ 5 years after Himmelmann 1998:164
“[Language] documentation […] will ensure that the collection and presentation of primary data receive the theoretical and practical attention they deserve.”
Journal | No. articles included | Abbreviation
International Journal of American Linguistics | 33 | IJAL
Journal of African Languages and Linguistics | 29 | JALL
Journal of Sociolinguistics | 33 | JS
Language | 33 | LANG
Linguistics of the Tibeto-Burman Area | 18 | LTBA
Natural Language and Linguistic Theory | 32 | NLLT
Oceanic Linguistics | 33 | OL
Studies in Language | 30 | SL
Studies in Second Language Acquisition | 30 | S2LA
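As a quick consistency check, the per-journal counts in the table can be summed and compared with the 271 journal articles reported for the study. The following is a minimal Python sketch; the dictionary and function names are illustrative assumptions, and only the counts themselves come from the slide.

```python
# Article counts per journal, copied from the table on this slide.
# The names ARTICLES_PER_JOURNAL and total_articles are illustrative only.
ARTICLES_PER_JOURNAL = {
    "IJAL": 33,
    "JALL": 29,
    "JS": 33,
    "LANG": 33,
    "LTBA": 18,
    "NLLT": 32,
    "OL": 33,
    "SL": 30,
    "S2LA": 30,
}


def total_articles(counts):
    """Return the total number of articles across all journals."""
    return sum(counts.values())


print(total_articles(ARTICLES_PER_JOURNAL))  # 271
```

The total agrees with the "271 journal articles from 9 journals" stated in the study description.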
Data distribution by year of publication
[Figure: number of publications a year, shown in two panels, Books (red: published grammars; blue: dissertations) and Journals]
Data Coding
● Methodological variables:
  ○ participants, data collection equipment, data analysis tools/software, time spent collecting data (see bit.ly/GoodMethods)
● Data-related variables:
  ○ 1. Source of data
  ○ 2. Data genre analyzed (linguistic genre)
  ○ 3. Where data is now
  ○ 4. Citation conventions used to reference data, if any
Results from Journals
1. Source of data
● OWN: data collected by author
● PUBD: published data
● UNPUBD: unpublished data collected by someone other than the author (excluding fieldnotes)
● INTRO: introspection
● OFN: other person’s fieldnotes
● UNST: source of data unstated
● NA: not applicable
1. Source of data: all journals
● Most data come from authors’ own research (~50%)
● Followed by published data
● Followed by... unstated
1. Source of data: individual journals
(see handout)
2. Data genre (linguistic genre)
● CARRIER: data in a carrier sentence
● CONVO: conversational data (natural)
● CONVOTASK: conversational task (eg acquisition studies)
● ELICIT: elicitation
● EXPR: experimental
● GJ: grammaticality judgments
● HIST: historical data (eg correspondence sets)
● INTV: interviews
● LEX: lexical items/words
● NAMES: names
● NOTES: own fieldnotes
● NP: noun phrases
● PHR: other phrases
● QUEST: questionnaires
● SENT: sentence data (broadly defined)
● SONG: songs
● SPECT: spectrograms
● TEXT: texts (broadly defined)
● TRANS: translation tasks (eg acquisition studies)
● TEST: tests in a school environment
● WR: written data (eg newspapers)
● OTHER: other
2. Data genre: all journals
● Sentences
● Lexical items/words
● Texts
2. Data genre: individual journals
(see handout)
3. Where the data is now
● ARCH: archived in institutional repository
● PUBD: published
● HERE: article contains the primary data
● HERESUMMARY: data summarized in the article (stats, graphs, tables)
● ONL: online (website or other non-archive)
● UNST: location of data not stated
3. Where the data is now: all journals
● Mostly we don’t know!
● “Published” a distant 2nd
3. Where the data is now: individual journals
(see handout)
4. Citation conventions used in examples
AUTHPG: author + page no.
ma han-ac-en [ah-ic-0 cab-e]
NEG eat-POT-B1 dawn-MCMP-B3 ground-TER
‘I have not eaten since it dawned.’ (Coronel:91)
(example from IJAL (Yasugi 2005:27))
BIBLE: example from Bible, usually book + chapter + verse
Tuiy-ul ganu giLiji ibi-l-a a abric-il-o;
exit-PA insideinpeople DEM-C-PROX and let-PL-PL
‘Keep away from these men and leave them alone.’ (Act 5:38)
(example from JALL (Schadeberg & Kossmann 2010:88))
CODEEC: citation is a code that is explained by author
So the buggies [bugíz] came out. [BN T3P12]
(endnote explains “[t]he code [BN T3P12] means speaker BN, tape 3, transcription page 12.”)
(example from JS (Brown 2003:21, note 9))
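An explained code of this kind can be resolved mechanically, which is what makes it more useful than an unexplained one. The sketch below assumes the layout described in the endnote (speaker initials, then `T` + tape number, then `P` + transcription page); the class name, function name, and regular expression are hypothetical and are not taken from Brown (2003).

```python
import re
from typing import NamedTuple


class ExampleCode(NamedTuple):
    """Parts of an explained example code such as [BN T3P12]."""
    speaker: str
    tape: int
    page: int


# Assumed layout: optional square brackets, speaker initials, whitespace,
# "T" + tape number, "P" + transcription page number.
_CODE_RE = re.compile(
    r"^\[?(?P<speaker>[A-Z]+)\s+T(?P<tape>\d+)P(?P<page>\d+)\]?$"
)


def parse_example_code(code: str) -> ExampleCode:
    """Split a code like '[BN T3P12]' into speaker, tape, and page."""
    match = _CODE_RE.match(code.strip())
    if match is None:
        raise ValueError(f"not a recognized example code: {code!r}")
    return ExampleCode(
        speaker=match["speaker"],
        tape=int(match["tape"]),
        page=int(match["page"]),
    )


print(parse_example_code("[BN T3P12]"))
```

A code in the CODEUNEX category cannot be handled this way, because no such layout is documented for the reader.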
CODEUNEX: citation is a code that is not explained
Dijokoti.
PT:take
‘(I) took (it).’ (107:936)
(example from SL (Ewing 2005:100))
INIT: citation appears as speaker’s initials only
Mapuche mie-kawell-la-y-ngün.
Mapuche have-horse-NEG-IND-3pS
‘The Mapuche do not own horses.’ (JA)
(example from LANG (Baker et al. 2005:145))
LANG: citation appears as language name only
Words for ‘six’ in Eastern Miwok Languages
Northern Sierra Miwok: tem:ok:a
Central Sierra Miwok: tem:ok:a
Southern Sierra Miwok: tem:ok:a
(example from IJAL (Blevins 2005:90))
LIST: article contains a list or table of sources used
Example: Bender et al. (2003:9, note 2), an OL article on Proto-Micronesian:
● A footnote lists all the published dictionaries from which cognate forms are taken.
● Sources are listed by author’s name and year.
● Full citations are found in the bibliography of that paper.
PC: citation appears as personal communication
kwa lút waʔ s-náw -lx-s
and NEG SPC NOM-3SG.run-AUT-3POSS
‘But it didn’t run’ (Kinkade, p.c., 2011)
(example from IJAL (Davis 2005:5))
SPKRAGEDIAL: citation appears as speaker’s name + other demographic info
[T]here are times when I get stuck, and probably all my grammar is wrong, but I can – yeah, I can manage. (Rita, f27)
(example from JS (Chand 2011:17))
SPKRANON: citation appears as anonymized speaker ID
Example: Maddieson et al. 2009 (IJAL):
● M1, M2, M3, F1, F2, F3...
SPKRPAGE: citation appears as speaker’s initials + numerical code
Code most likely a portion of a corpus
May or may not be explained
tā ba nánháir jiào-zhù
3 MOM boy call-stop
~“he call-stopped the boy” (LY:3)
(example from SL (Post 2007:129))
SPKRTITLE: citation appears as speaker’s name + title of a narrative
kwaʔ ʔíca lut l cəl’án taʔ-ntitiyáx
conj dem neg loc Chelan exist-Chinook.salmon
‘And in Chelan there are no salmon.’ (Friedlander: Coyote)
(example from IJAL (Mattina 2006:107))
STATEMENT: textual statement in body of article explaining sources for numbered examples
Example: Zanuttini 2008:186 (NLLT)
● “[...] Example (2a) is from Hamblin (1987), the others from Potsdam (1998).”
STD: citation appears in “standard” format for publications
Wari’, Chapacura-Wanam
mo ta pa’ ta’ hwam ca,
COND realis.FUT kill 1SG:realis.fut fish 3SG.M
mo ta pa’ ta’ carawa ca
COND realis.FUT kill 1sg:realis.FUT animal 3SG.M
‘Either he will kill fish or he will hunt.’ (Everett and Kern 1997:162)
(example from SL (Mauri 2008:23))
TITLE: citation appears as the title of the story or conversation it was taken from
[…]
83 kyoo desho?
today COP
‘The day when they cook sukiyaki is tomorrow, and the day when they bring something [to us] is today, right?’ (Broccoli)
(example from SL (Takara 2012:95))
TITLELINE: citation appears as the title of the story + numerical code
moso maezo aut’ucu to mo con-ci fo’kunge.
AUX.AV AV.also raise OBL AUX.AV one-REL frog
‘They also kept a frog.’ (Frog 1:3)
(example from OL (Huang & Tanangkingsing 2011:95))
URL: citation appears as internet URL
Eight Deadly Sins of Web 2.0 Start-Ups […] Happinessless: Your start up has no future if you are not happy. (http://www.slideshare.net/imootee/eight-deadly-sins-of-web-20-startups/)
(example from LANG (Plag & Baayen 2009:115))
MS: citation appears as standard reference to unpublished manuscript.
NONE: author did not include any form of citation
NA: article did not contain numbered examples
OTHER: other practice not easily classifiable here
4. Citation conventions used in examples: all journals
● Again, mostly nothing.
● “Standard” is a distant 2nd
4. Citation conventions used in examples: individual journals
(see handout)
Results from Books
1. Source of data - Books
Methodologically consistent genre compared to Journals
Source | Dissertations | Published Grammars
Own fieldwork | 50 | 40
Other published sources | 6 | 11
Other unpublished sources | 2 | 5
3. Where is the data now - Books
We don’t know where most descriptive data ends up
Location | Dissertations | Published Grammars
Unknown | 35 | 33
Archived | 12 | 10
“Will archive” | 2 | 3
With community | 6 | 2
Online | 0 | 4
Sizable text corpus with grammar | 1 | 5
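To put the "Unknown" row in proportion, the counts can be expressed as shares of the books examined. This is a rough sketch that assumes each column is out of the 50 dissertations and 50 published grammars named in the study description. Each column sums to more than 50 (56 and 57), so a book presumably can fall into more than one category, and the shares should not be expected to add up to 100%. All names in the code are illustrative.

```python
# Number of books of each type, from the study description
# (50 dissertations, 50 published grammars). Using it as the
# denominator is an assumption made for this sketch.
BOOKS_PER_TYPE = 50

# Counts copied from the table: (dissertations, published grammars).
DATA_LOCATION = {
    "Unknown": (35, 33),
    "Archived": (12, 10),
    "Will archive": (2, 3),
    "With community": (6, 2),
    "Online": (0, 4),
    "Sizable text corpus with grammar": (1, 5),
}


def share(count, total=BOOKS_PER_TYPE):
    """Return count as a percentage of total."""
    return 100 * count / total


for location, (dissertations, grammars) in DATA_LOCATION.items():
    print(f"{location}: {share(dissertations):.0f}% of dissertations, "
          f"{share(grammars):.0f}% of published grammars")
```

Under these assumptions, the data location is unknown for 70% of the dissertations and 66% of the published grammars.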
4. Data citation - books
Low rates of good practice in referencing back to original data
[Figure: number of publications by data citation score (1-5), 1 = none, 5 = fully resolvable; red: published grammars, blue: dissertations]
Score categories as labeled on the chart:
● none
● minimal reference e.g. to speaker or story: ‘Rajesh’ or ‘Hunting’
● minimal with reference to corpus: Rajesh ‘JC story’
● resolvable to underlying corpus, not archived: RL LG1-101027-01
● fully resolvable to underlying corpus, with time codes & archived: RL LG1-101027-01 01:09
Citation conventions - books
The reference number (here ref 07032.031) indicates the name of the file the sentence is extracted from (07032) and the line inside the text where the sentence appears (031).
ref 07032.031
Na-tov Vira rar Myriam.
1SG-call Vira 3PL.DL Myriam
‘I called Vira and Myriam.’
(example from Guérin 2008:8)
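The reference scheme described on this slide (file name, a dot, then the line within the text) is simple enough to resolve programmatically. The sketch below is one possible reading of it, not the tooling used in the dissertation; the class and function names are hypothetical. The file name is kept as a string so that its leading zero is preserved.

```python
from typing import NamedTuple


class CorpusRef(NamedTuple):
    """A corpus reference split into file name and line number."""
    file_name: str
    line: int


def parse_corpus_ref(ref: str) -> CorpusRef:
    """Parse a reference such as 'ref 07032.031' or '07032.031'."""
    text = ref.strip()
    # Drop the optional leading "ref" label shown on the slide.
    if text.lower().startswith("ref"):
        text = text[3:].strip()
    file_name, separator, line = text.partition(".")
    if not separator or not file_name or not line.isdigit():
        raise ValueError(f"not a recognized corpus reference: {ref!r}")
    return CorpusRef(file_name=file_name, line=int(line))


print(parse_corpus_ref("ref 07032.031"))
```

A reader holding the corpus could then open file 07032 and go to line 31 to find the cited sentence.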
We have good models for fieldwork
Gippert, Himmelmann & Mosel (2006), Crowley (2007), Bowern (2008), Chelliah & de Reuse (2011), Thieberger (2012), Nakayama & Rice (2014), LD&C, LD&D and many more.
But few of these explicitly discuss management, citation and attribution of linguistic data. [A notable exception is Bird & Simons 2003]
Attitudes are changing slowly
“...one participant wondered why she felt the need to point out the importance of citing the source of each example when presenting it in an academic publication. This seemed obvious to those present…”
Ruth Singer, Melbourne LIP “Grammar writing: where are we now?” http://www.paradisec.org.au/blog/2015/02/grammar-writing-where-are-we-now/
References
Berez, Andrea. 2014. “Reproducible research in descriptive linguistics: Integrating archiving and citation into the postgraduate curriculum at the University of Hawai‘i at Manoa.” In Amanda Harris, Nick Thieberger & Linda Barwick (eds.), Research, records, and responsibility: Ten years of the Pacific and Regional Archive for Digital Sources in Endangered Cultures. Sydney: University of Sydney Press.
Berez, Andrea, Lauren Gawne, Barbara Kelly and Tyler Heston. In prep. Citation and transparency in descriptive linguistics.
Bird, Steven & Gary Simons. 2003. Seven dimensions of portability for language documentation and description. Language 79(3): 557-582.
Bowern, Claire. 2008. Linguistic fieldwork: a practical guide. Basingstoke [England] ; New York: Palgrave Macmillan.
Chelliah, Shobhana L., and Willem J. De Reuse. 2011. Handbook of descriptive linguistic fieldwork. London: Springer.
Crowley, Terry. 2007. Field linguistics: a beginner's guide. Edited by Nicholas Thieberger, Oxford linguistics. Oxford ; New York: Oxford University Press.
Gippert, Jost, Nikolaus P. Himmelmann, & Ulrike Mosel. 2006. Essentials of language documentation. Berlin: Mouton de Gruyter.
Himmelmann, Nikolaus P. 1998. "Documentary and descriptive linguistics." Linguistics no. 36:161–195.
Himmelmann, Nikolaus P. 2006. “Language documentation: What is it good for?” In Jost Gippert, Nikolaus P. Himmelmann, & Ulrike Mosel (eds.). Essentials of language documentation, 1-30. Berlin: Mouton de Gruyter.
Nakayama, Toshihide, and Keren Rice (eds). 2014. The Art and Practice of Grammar Writing. Vol. 8, Language Documentation & Conservation Special Publication. Honolulu: University of Hawaii Press.
Pawley, Andrew. 2014. "Grammar writing from a dissertation advisor’s perspective." In The Art and Practice of Grammar Writing, edited by Toshihide Nakayama and Keren Rice, 7-23. Honolulu: University of Hawaii Press.
Ring, Hiram. 2015. A Grammar of Pnar. PhD Thesis, Nanyang Technological University.
Thieberger, Nicholas. 2009. “Steps toward a grammar embedded in data” In Patricia Epps and Alexandre Arkhipov (eds.) New Challenges in Typology: Transcending the Borders and Refining the Distinctions, 389-408. Berlin; New York, NY : Mouton de Gruyter.
Thieberger, Nicholas. 2012. The Oxford handbook of linguistic fieldwork. Oxford: Oxford University Press.
Woodbury, Anthony C. 2011. Language documentation. In Austin & Sallabank 2011, 159-186.
Works from study included here
Baker, Mark C., Roberto Aranovich & Lucia A. Golluscio. 2005. Two types of syntactic noun incorporation: Noun incorporation in Mapudungun and its typological implications. Language 81(1): 138-176.
Blevins, Juliette. 2005. Origins of northern Costanoan ʃak:en 'six': A reconsideration of senary counting in Utian. IJAL 71(1): 87-101.
Brown, Becky. 2003. Code convergent borrowing in Louisiana French. J Socio 7(1): 3-23.
Chand, Vineeta. 2011. Elite positionings toward Hindi: Language policies, political stances and language competence in India. J Socio 15(1): 6-35.
Davis, Henry. 2005. On the syntax and semantics of negation in Salish. IJAL 71(1): 1-55.
Ewing, Michael C. 2005. Hierarchical constituency in conversational language: The case of Cirebon Javanese. SiL 29(1): 89-112.
Guérin, V. M. (2008). Discovering Mavea: Grammar, texts, and lexicon. PhD dissertation. Honolulu: University of Hawaii.
Huang, S., & Tanangkingsing, M. 2011. A Discourse Explanation of the Transitivity Phenomena in Kavalan, Squliq, and Tsou. Oceanic Linguistics, 50(1), 93–119.
Mattina, Nancy. 2006. Determiner phrases in Moses-Columbia Salish. IJAL 72(1): 97-134.
Mauri, Caterina. 2008. The irreality of alternatives: Toward a typology of disjunction. SiL 32(1): 22-55.
Maddieson, Ian, Heriberto Avelino, & Loretta O'Connor. 2009. The phonetic structures of Oaxaca Chontal. IJAL 75(1): 69-101.
Plag, Ingo & Harald Baayen. 2009. Suffix ordering and morphological processing. Language 85(1): 109-152.
Post, Mark W. 2007. Grammaticalization and compounding in Thai and Chinese: A text frequency approach. SiL 31(1): 117-175.
Schadeberg, T. C., & Kossmann, M. 2010. Participant reference in the Ebang verbal complex (Heiban, Kordofanian). Journal of African Languages and Linguistics, 31(1).
Takara, Nobutaka. 2012. The weight of head nouns in noun-modifying constructions in conversational Japanese. SiL 36(1): 33-72.
Yasugi, Yoshiho. 2005. Fronting of nondirect arguments and adverbial focus marking on the verb in Classical Yucatec. IJAL 71(1): 56-86.
Zanuttini, Raffaella. 2008. Encoding the addressee in the syntax: Evidence from English imperative subjects. NLLT 26(1): 185-218.
Thank you.
These slides are available at bit.ly/DataCitationSOTA
lg21@soas.ac.uk
andrea.berez@hawaii.edu
Special thanks to the National Science Foundation, The University of Melbourne library staff, The University of Melbourne and NTU, Singapore where Lauren worked on earlier stages of this project