Paul Bennett, Martin Durrell, Silke Scheible, Jason Whitt: The GerManC Project. A Representative Corpus of Early Modern German (1650-1800)

Apr 06, 2016



Paul Bennett, Martin Durrell, Silke Scheible, Jason Whitt

The GerManC Project

A Representative Corpus of Early Modern German

(1650-1800)


Representativeness

1. Not complete texts, but extracts of approximately 2000 words (cf. Brown corpora and ARCHER)

2. Nine genres
a. Dramas
b. Newspapers
c. Letters
d. Sermons
e. Narrative prose
f. Journals
g. Scholarly texts (humanities)
h. Scholarly texts (science & medicine)
i. Legal texts


Representativeness

3. Periods (cf. Bonn corpus of ENHG): 1650-1700, 1700-1750, 1750-1800

4. Regions
a. North German
b. West Central German
c. East Central German
d. West Upper German (incl. Swiss)
e. East Upper German (incl. Austrian)

5. Three extracts of ≥2000 words per genre/period/region = approx. 900,000 words
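The size estimate follows from the design parameters listed on the slides. The short Python sketch below works through the arithmetic; the variable names are illustrative, and the "implied average" is only a back-of-the-envelope reading of the stated target of roughly 900,000 words, not a figure given by the project.

```python
# Design parameters as listed on the slides.
GENRES = 9
PERIODS = 3
REGIONS = 5
EXTRACTS_PER_CELL = 3
MIN_WORDS_PER_EXTRACT = 2000

# One cell = one genre/period/region combination.
cells = GENRES * PERIODS * REGIONS
extracts = cells * EXTRACTS_PER_CELL

# Lower bound on corpus size if every extract had exactly 2000 words.
min_words = extracts * MIN_WORDS_PER_EXTRACT

# Average extract length implied by a target of about 900,000 words
# (an inference from the slide, not a project figure).
TARGET_WORDS = 900_000
implied_average = TARGET_WORDS / extracts

print(cells, extracts, min_words, round(implied_average))
# 135 405 810000 2222
```

So the design yields 405 extracts and at least 810,000 words; the approximate figure of 900,000 is consistent with extracts that run somewhat over the 2000-word minimum.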


Pilot Project: GerManC

One-year grant from ESRC (March 2006 - March 2007)

Team: Paul Bennett, Martin Durrell, Astrid Ensslin

Aim: testing corpus design and aims with a single genre, and evaluating and developing a set of analytical tools

Newspapers were selected as the genre for the pilot


Extended GerManC

Pilot project completed March 2007. Newspaper corpus lodged with Oxford Text Archive (and available on project website)

Application for funding the extended corpus approved early 2008, with equal funding from ESRC and AHRC

Original design maintained, eight further genres to be added

Team: Paul Bennett, Martin Durrell, Silke Scheible, Jason Whitt

Work started in September 2008

Digitization

• Scanning black letter (Fraktur) texts with OCR proved impractical and prone to error

• All texts keyed in twice and the results compared electronically (“double-keying”) to eliminate mistakes

• Texts keyed into <oXygen/> XML Editor and marked up according to TEI 5 Lite guidelines

• Only texts with 2000 words of (more or less) continuous German prose were selected
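The electronic comparison behind double-keying can be illustrated with a small Python sketch. This is not the project's actual tooling (the slides do not say how the comparison was done); it uses the standard-library `difflib.SequenceMatcher`, and the function name `keying_discrepancies` and the two sample strings are invented for illustration.

```python
from difflib import SequenceMatcher


def keying_discrepancies(first: str, second: str):
    """Return (tokens_from_first, tokens_from_second) pairs where two keyings differ."""
    a, b = first.split(), second.split()
    # Compare the two transcriptions token by token.
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical sample: the second keyer modernized one spelling by mistake.
keying_1 = "Der in seiner Freyheit vergnügte Alcibiades"
keying_2 = "Der in seiner Freiheit vergnügte Alcibiades"
print(keying_discrepancies(keying_1, keying_2))
# [(['Freyheit'], ['Freiheit'])]
```

Each reported pair marks a place where a human would check the source image and decide which keying is right; identical keyings produce an empty list.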


Der in seiner Freyheit vergnügte ALCIBIADES (Drama: North German, 1700)


Development of tools

A program for tokenization

A program to recognize orthographic variants

A lemmatization program with the ultimate aim of lemmatizing the whole corpus

The development of a POS-tagger (on the basis of the Stuttgart-Tübingen Tagset, and based on the TreeTagger) with a view to tagging the complete corpus

Developing a program to enable more detailed morphosyntactic tagging of the whole corpus

If possible within the time constraints, developing a parser (possibly on the basis of the parser used in York for Old English) and parsing the complete corpus on this basis.
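As a rough illustration of what recognizing orthographic variants can involve, the Python sketch below groups spellings that collapse to the same normalized key. It is a toy under stated assumptions: the two rewrite rules, the function names `normalize` and `group_variants`, and the sample tokens are all hypothetical, and nothing here reflects the rules or methods the project actually used.

```python
import re
from collections import defaultdict

# Illustrative rewrite rules only; NOT the project's actual rule set.
RULES = [
    (re.compile(r"ey"), "ei"),  # zwey -> zwei, Freyheit -> Freiheit
    (re.compile(r"th"), "t"),   # Theil -> Teil (will over-apply to some words)
]


def normalize(token: str) -> str:
    """Map a historical spelling to a lowercase normalized key."""
    key = token.lower()
    for pattern, replacement in RULES:
        key = pattern.sub(replacement, key)
    return key


def group_variants(tokens):
    """Group surface forms that share the same normalized key."""
    groups = defaultdict(set)
    for token in tokens:
        groups[normalize(token)].add(token)
    return {key: sorted(forms) for key, forms in groups.items()}


print(group_variants(["zwey", "zwei", "Theil", "Teil", "Freyheit"]))
# {'zwei': ['zwei', 'zwey'], 'teil': ['Teil', 'Theil'], 'freiheit': ['Freyheit']}
```

Blanket rules like these inevitably over-generate, which is one reason a real variant recognizer would need a lexicon or manually checked mappings alongside any rewrite rules.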


Case Study I: Changing norms in weak adjective inflection

“Within the indet. inflection of the nom./acc. pl. of all genders that is to be expected after a grammatical determiner (die klugen Frauen), forms vacillate between -(e) and -(e)n throughout all periods of Early New High German”

Gramm. d. Fnhd. VI, 174


Findings: weak adjective inflection 1 (newspapers)

Process of standardization of weak adjective inflection (Durrell et al. 2008) in nom./acc. pl., e.g. die gute[n] Kinder (die Gute[n])

              1650-1700           1700-1750           1750-1800
              -e        -en       -e        -en       -e        -en
North German  20 (6)    6 (5)     6         33 (16)   1         32 (14)
West Central  45 (18)   4 (3)     18 (4)    10 (5)    3         28 (6)
East Central  7 (2)     18 (11)   7         18 (3)    2         31 (5)
West Upper    25 (7)    6 (3)     16 (3)    6 (2)     16 (3)    16 (8)
East Upper    38 (22)   14 (11)   24 (3)    11 (8)    3         34 (5)
Total         135 (55)  48 (33)   71 (10)   78 (34)   25 (3)    141 (38)
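The trend in the Total row of the newspaper figures is easier to see as proportions. The Python sketch below computes the share of -en forms per period from those totals. The parenthesized figures are left out because the transcript does not define them, and the helper name `en_share` is illustrative.

```python
# Totals (all regions) from the newspaper table: counts of -e and -en endings.
totals = {
    "1650-1700": {"-e": 135, "-en": 48},
    "1700-1750": {"-e": 71, "-en": 78},
    "1750-1800": {"-e": 25, "-en": 141},
}


def en_share(counts):
    """Proportion of -en forms among all -e/-en tokens in a period."""
    return counts["-en"] / (counts["-e"] + counts["-en"])


shares = {period: round(100 * en_share(c), 1) for period, c in totals.items()}
for period, share in shares.items():
    print(f"{period}: {share}% -en")
# 1650-1700: 26.2% -en
# 1700-1750: 52.3% -en
# 1750-1800: 84.9% -en
```

On these totals, -en rises from about a quarter of the forms in the first period to roughly 85% in the last.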


Genre-dependent variant selection

“The development from the late 16th century to the middle of the 18th century shows the establishment [of -en] to be the generalization of a primarily East Central German usage. The [...] distribution across stylistic levels confirms Hemmer's assessment [...] that -n was adopted via literary linguistic models.”

(Gramm. d. Fnhd. 176)


Findings: weak adjective inflection 2 (literary genres)

Preliminary examples from ‘drama’ and ‘narrative prose’ in new extended corpus

              1650-1700           1700-1750           1750-1800
              -e        -en       -e        -en       -e        -en
North German  2         19 (2)    7 (2)     23 (7)    0         25 (7)
West Central  2         19 (7)    6 (1)     14 (5)    6         8 (1)
East Central  2         24 (3)    1 (1)     23 (4)    0         17 (3)
West Upper    5         0         0         3 (1)     3 (1)     21 (5)
East Upper    2         11 (2)    5 (1)     3 (1)     3 (1)     14 (2)
Total         13        73 (14)   19 (5)    78 (34)   12 (2)    85 (18)


Case Study II: Morphological simplification

zween/zwo/zwei-zwey

“Among the grammarians, the gender distinction is maintained into the second half of the 18th century” (Schottel, Bödiker, Gottsched). “Only Adelung (1782) gives exclusively the form zwey for all genders” (Gramm. d. Fnhd. VII, 539).



“The neuter became fixed as the uniform form earliest in Low German [1303]. In East Central German (Upper Saxon and Silesian) it has prevailed since the middle of the 17th century, and from there it also made its way into the literary language” (Schirmunski, Deutsche Mundartkunde, 474).


‘zwei’ in newspaper corpus

1650-1700 1700-1750 1750-1800

zween zwo zwei zween zwo zwei zween zwo zwei

North German: 1 2 8 8

West Central German: 1 3 3 2 8 1 13

East Central German: 2 2 2 1 8

West Upper German: 1 2 3

East Upper German: 1 2 6 13


‘zwei’ in extended corpus to date

1650-1700 1700-1750 1750-1800

zween zwo zwei zween zwo zwei zween zwo zwei

North German: 2 2 2 1 7 5 2 8

West Central German: 3 1 13 1 1 6 2 15

East Central German: 2 3 1 20 1 15

West Upper German: 1 4 2 11 1 1 3

East Upper German: 3 13 5 1 7


Historical/cultural findings

Media history

Ensslin (2009), ‘“Im Unterhause groß Getöse”: representations of 18th century British parliamentary democracy in Early Modern German newspaper discourse’

The representation of a parliamentary monarchy in 17th & 18th century Germany, with predominantly absolute rulers, but responding to increased interest in Britain ruled by the Hanoverians

Initially straightforward factual presentations, concise and apparently objective, though with (intentional?) emphasis on the leading role of the king

Later clear tendency towards stigmatization of the raucous ‘debates’ in the House of Commons, with a much more subjective style of presentation and often sensationalist tone


Other investigations

• The development of the würde + Infinitive construction (Smirnova 2006; Durrell 2007)

• The double perfect (Doppelperfekt) (Topalović 2007)

• Evidentiality and text type (Whitt 2008)

• The general notion of text type/genre/register as it relates to historical corpora


Thank you

Contacts: [email protected]@[email protected]@manchester.ac.uk

Web page: http://www.llc.manchester.ac.uk/research/projects/germanc/


Project publications

Martin Durrell, Astrid Ensslin and Paul Bennett, "The GerManC project". In: Sprache und Datenverarbeitung 31 (2007), pp. 71-80.

Martin Durrell, Astrid Ensslin and Paul Bennett, "Zur Standardisierung der Adjektivflexion im Deutschen im 18. Jahrhundert". In: W. Czachur and M. Czyzewska (eds.), Vom Wort zum Text. Studien zur deutschen Sprache und Kultur. Festschrift für Professor Józef Wiktorowicz zum 65. Geburtstag. Warszawa: Instytut Germanistyki Uniwersytetu Warszawskiego, 2008, pp. 259-267.

Martin Durrell, Astrid Ensslin and Paul Bennett, "Zeitungen und Sprachausgleich im 17. und 18. Jahrhundert". In: Zeitschrift für deutsche Philologie 127 (2008), Sonderheft, pp. 263-279.

Astrid Ensslin, "'Im Unterhause abscheulich groß Getöse'. Representations of 18th century British parliamentary democracy in early modern German newspaper discourse and their treatment of borrowings from English". In: F. Pfalzgraf and F. Rash (eds.), Anglo-German Literary Relations. Bern, etc.: Lang, 2008, pp. 73-96.


Other references

Durrell, Martin. 2007. "'Deutsch ist eine würde-lose Sprache'. On the history of a failed prescription". In: Stephan Elspaß, Nils Langer, Joachim Scharloth & Wim Vandenbussche (eds.), Germanic Language Histories 'from Below' (1700-2000). (Studia Linguistica Germanica 86). Berlin & New York: de Gruyter, pp. 243-258.

Smirnova, Elena. 2006. Die Entwicklung der Konstruktion würde + Infinitiv im Deutschen: Eine funktional-semantische Analyse unter besonderer Berücksichtigung sprachhistorischer Aspekte. Berlin & New York: de Gruyter.

Topalović, Elvira. To appear. "Perfekt II und Plusquamperfekt II. Zur historischen Kontinuität doppelter Perfektbildungen im Deutschen". In: Claudine Moulin, Fausto Ravida & Nikolaus Ruge (eds.), Sprache in der Stadt. Akten der 25. Tagung des Internationalen Arbeitskreises Historische Stadtsprachenforschung. Luxemburg, 11.-13. Oktober 2007. Heidelberg: Winter.

Whitt, Richard J. 2008. Evidentiality and Perception Verbs in English and German: A Corpus-Based Analysis from the Early Modern Period to the Present. Ph.D. Dissertation: The University of California, Berkeley.
