Paul Bennett, Martin Durrell, Silke Scheible, Jason Whitt
The GerManC Project
A Representative Corpus of Early Modern German
(1650-1800)
Representativeness
1. Not complete texts, but extracts of approximately 2000 words (cf. Brown corpora and ARCHER)
2. Nine genres
   a. Dramas
   b. Newspapers
   c. Letters
   d. Sermons
   e. Narrative prose
   f. Journals
   g. Scholarly texts (humanities)
   h. Scholarly texts (science & medicine)
   i. Legal texts
Representativeness
3. Periods (cf. Bonn corpus of ENHG): 1650-1700, 1700-1750, 1750-1800
4. Regions
   a. North German
   b. West Central German
   c. East Central German
   d. West Upper German (incl. Swiss)
   e. East Upper German (incl. Austrian)
5. Three extracts of ≥2000 words per genre/period/region = approx. 900,000 words
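The size estimate follows from the design grid. A minimal arithmetic sketch, assuming every genre/period/region cell is filled with exactly three extracts (the variable names are illustrative and not taken from the project):

```python
# Design grid as stated on the slides.
genres = 9
periods = 3
regions = 5
extracts_per_cell = 3
min_words_per_extract = 2000

cells = genres * periods * regions                 # 135 genre/period/region cells
extracts = cells * extracts_per_cell               # 405 extracts
minimum_words = extracts * min_words_per_extract   # lower bound: 810,000 words

# The slides give approx. 900,000 words; since extracts may exceed 2000 words,
# this corresponds to an average extract length somewhat above the minimum.
target_words = 900_000
implied_average = target_words / extracts

print(cells, extracts, minimum_words, round(implied_average))
```

On these assumptions the 2000-word minimum gives a floor of 810,000 words, so the stated total of about 900,000 implies extracts averaging a little over 2200 words.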
Pilot Project: GerManC
One year grant from ESRC [March 2006 - March 2007]
Team: Paul Bennett, Martin Durrell, Astrid Ensslin
Aim: testing corpus design and aims with a single genre, and evaluating and developing a set of analytical tools
Newspapers were selected as genre for the pilot
[Images: Breslau 1683; Wien 1780]
Digitization
1. Scanning black letter (Fraktur) texts with OCR proved impractical and prone to error
2. All texts keyed in twice and the results compared electronically (“double-keying”) to eliminate mistakes
3. Only texts with 2000 words of (more or less) continuous German prose were selected
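The electronic comparison in double-keying can be pictured as a token-level diff of the two keyed versions. The sketch below is only an illustration using Python's standard `difflib`; the function name and the sample typo are hypothetical, and the slides do not say which comparison tool the project actually used.

```python
import difflib


def keying_discrepancies(version_a: str, version_b: str):
    """Return (token index in A, tokens in A, tokens in B) for each disagreement."""
    a = version_a.split()
    b = version_b.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # replace, insert or delete
            diffs.append((i1, a[i1:i2], b[j1:j2]))
    return diffs


# Two keyings of the same line; the second contains an invented typing slip.
first = "Die Veränderungen des neuen Ministeriums machen im Unterhause abscheulich groß Getöse."
second = "Die Veränderungen des neuen Ministerinms machen im Unterhause abscheulich groß Getöse."

for index, in_a, in_b in keying_discrepancies(first, second):
    print(f"token {index}: {in_a} vs {in_b}")
```

Each reported position would then be checked against the original print, since a disagreement shows that at least one keying is wrong but not which one.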
Extended GerManC
Pilot project completed March 2007. Newspaper corpus lodged with Oxford Text Archive (and available on project website)
Application for funding the extended corpus approved early 2008, with equal funding from ESRC and AHRC
Original design maintained, eight further genres to be added
Team: Paul Bennett, Martin Durrell, Silke Scheible, Jason Whitt
Work started in September 2008
Development of tools
A capable program for tokenization
A program to recognize orthographic variants
A lemmatization program with the ultimate aim of lemmatizing the whole corpus
The development of an appropriate POS-tagger (on the basis of the Stuttgart-Tübingen Tagset) with a view to tagging the complete corpus
Developing a program to enable automatic morphosyntactic tagging of the whole corpus
If possible within the time constraints, developing a parser (possibly on the basis of the parser used in York for Old English) and parsing the complete corpus on this basis.
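To make the first two items concrete, here is a minimal sketch of tokenization followed by grouping of orthographic variants. It is not the project's software: the regular expression, the two rewrite rules (`ey` to `ei`, `th` to `t`) and all function names are assumptions chosen for illustration, and real Early Modern German spelling variation needs far more than this.

```python
import re
from collections import defaultdict

# Words (Unicode letters/digits) or single punctuation marks such as the virgule "/".
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

# Illustrative rewrite rules only; a real variant detector would need many more,
# and context-sensitive ones.
REWRITE_RULES = [("ey", "ei"), ("th", "t")]


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text)


def normalise(token: str) -> str:
    """Map a spelling to a crude normalised key."""
    form = token.lower()
    for old, new in REWRITE_RULES:
        form = form.replace(old, new)
    return form


def group_variants(tokens: list[str]) -> dict[str, set[str]]:
    """Collect the attested spellings under each normalised key."""
    groups: dict[str, set[str]] = defaultdict(set)
    for token in tokens:
        if token.isalpha():  # skip punctuation and numbers
            groups[normalise(token)].add(token)
    return dict(groups)


tokens = tokenize("Zwey Theile und zwei Teile.")
print(tokens)
print(group_variants(tokens))
```

Grouping spellings under a shared key is one simple way to let later steps such as lemmatization and POS tagging treat `zwey` and `zwei` as the same word; blanket rules like `th` to `t` will over-apply, which is why they are marked as illustrative.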
Changing norms
“Innerhalb der nach grammatischem Bestimmungswort zu erwartenden indet. Flexion des Nom./Akk. Pl. aller Genera (die klugen Frauen) kommt es zu allen Zeiten des Fnhd. zu einer zwischen -(e) und -(e)n schwankenden Formbildung” (Gramm. d. Fnhd. VI, 174)
[In the indeterminate inflection expected after a grammatical determiner in the nom./acc. pl. of all genders (die klugen Frauen), forms fluctuate between -(e) and -(e)n throughout the Early New High German period.]
Findings: weak adjective inflection 1 (newspapers)
Process of standardization of weak adjective inflection (Durrell et al. 2008) in nom./acc. pl., e.g.: die gute[n] Kinder (die Gute[n])
                 1650-1700           1701-1750           1751-1800
                 -e        -en       -e        -en       -e        -en
North German     20 (6)    6 (5)     6         33 (16)   1         32 (14)
West Central     45 (18)   4 (3)     18 (4)    10 (5)    3         28 (6)
East Central     7 (2)     18 (11)   7         18 (3)    2         31 (5)
West Upper       25 (7)    6 (3)     16 (3)    6 (2)     16 (3)    16 (8)
East Upper       38 (22)   14 (11)   24 (3)    11 (8)    3         34 (5)
Total            135 (55)  48 (33)   71 (10)   78 (34)   25 (3)    141 (38)
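The standardization trend in the newspaper figures is easiest to see as the share of -en among all attested forms per period. The sketch below uses only the main figures from the Total row and ignores the parenthesised figures, whose meaning is not stated on the slide; the function name is illustrative.

```python
# Totals for the newspaper corpus: (count of -e, count of -en) per period.
totals = {
    "1650-1700": (135, 48),
    "1701-1750": (71, 78),
    "1751-1800": (25, 141),
}


def en_share(e_count: int, en_count: int) -> float:
    """Percentage of -en among all -e/-en forms."""
    return 100 * en_count / (e_count + en_count)


shares = {period: round(en_share(*counts), 1) for period, counts in totals.items()}

for period, share in shares.items():
    print(f"{period}: {share}% -en")
```

On these totals the share of -en rises from roughly a quarter of the forms in 1650-1700 to about half in 1701-1750 and well over four fifths in 1751-1800.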
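Changing norms
“Die Entwicklung vom späten 16. Jh. bis zur Mitte des 18. Jhs. erweist die Durchsetzung [von -en] als die Verallgemeinerung eines in erster Linie omd. Usus. Die [...] stilschichtliche Distribution bestätigt die Einschätzung bei Hemmer [...], daß -n über literarische Sprachvorbilder übernommen worden ist.” (Gramm. d. Fnhd. 176)
[The development from the late 16th century to the middle of the 18th century shows the establishment [of -en] to be the generalization of a primarily East Central German usage. The [...] stylistic distribution confirms Hemmer's assessment [...] that -n was adopted via literary models.]
Findings: weak adjective inflection 2 (literary genres)
Preliminary examples from ‘drama’ and ‘narrative prose’ in new extended corpus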
                 1650-1700           1701-1750           1751-1800
                 -e        -en       -e        -en       -e        -en
North German     2         19 (2)    7 (2)     23 (7)    0         25 (7)
West Central     2         19 (7)    6 (1)     14 (5)    6         8 (1)
East Central     2         24 (3)    1 (1)     23 (4)    0         17 (3)
West Upper       5         0         0         3 (1)     3 (1)     21 (5)
East Upper       2         11 (2)    5 (1)     3 (1)     3 (1)     14 (2)
Total            13        73 (14)   19 (5)    78 (34)   12 (2)    85 (18)
Note: the five regional figures for -en in 1701-1750 sum to 66 (18), not to the total of 78 (34) given on the slide.
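Because these figures are preliminary, it is worth checking the Total row against the regional rows. The sketch below sums each column and reports any that disagree with the printed total. Cells without a parenthesised figure are treated as (n, 0), which is an assumption, and all names are illustrative.

```python
columns = [
    "1650-1700 -e", "1650-1700 -en",
    "1701-1750 -e", "1701-1750 -en",
    "1751-1800 -e", "1751-1800 -en",
]

# (main figure, parenthesised figure) per cell; a missing parenthesis is taken as 0.
rows = {
    "North German": [(2, 0), (19, 2), (7, 2), (23, 7), (0, 0), (25, 7)],
    "West Central": [(2, 0), (19, 7), (6, 1), (14, 5), (6, 0), (8, 1)],
    "East Central": [(2, 0), (24, 3), (1, 1), (23, 4), (0, 0), (17, 3)],
    "West Upper":   [(5, 0), (0, 0), (0, 0), (3, 1), (3, 1), (21, 5)],
    "East Upper":   [(2, 0), (11, 2), (5, 1), (3, 1), (3, 1), (14, 2)],
}

printed_totals = [(13, 0), (73, 14), (19, 5), (78, 34), (12, 2), (85, 18)]

# Sum main and parenthesised figures separately, column by column.
column_sums = [
    (sum(cell[0] for cell in column), sum(cell[1] for cell in column))
    for column in zip(*rows.values())
]

mismatches = {
    name: (computed, printed)
    for name, computed, printed in zip(columns, column_sums, printed_totals)
    if computed != printed
}

for name, (computed, printed) in mismatches.items():
    print(f"{name}: rows sum to {computed}, printed total is {printed}")
```

Only the 1701-1750 -en column fails the check. Its printed total is identical to the corresponding total in the newspaper table, so it may have been carried over from there, but the slides alone cannot confirm this.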
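Morphological simplification: zwei
“Bei den Grammatikern ist bis in die 2. Hälfte des 18. Jh. hinein die Genusdifferenzierung aufrechterhalten” (Schottel, Bödiker, Gottsched). “Erst Adelung (a. 1782) gibt ausschließlich die Form zwey für alle Genera”. (Gramm. d. Fnhd. VII, 539)
[Among the grammarians, gender differentiation is maintained into the second half of the 18th century. Only Adelung (1782) gives exclusively the form zwey for all genders.]
“Am frühesten ist das Neutrum als Einheitsform festgeworden im Niederdeutschen [1303]. Im Ostmitteldeutschen (Obersächsischen und Schlesischen) herrscht es seit der Mitte des 17. Jhs. und drang von dort auch in die Literatursprache” (Schirmunski, Deutsche Mundartkunde, 474).
[The neuter became fixed as the uniform form earliest in Low German [1303]. In East Central German (Upper Saxon and Silesian) it has prevailed since the middle of the 17th century, and from there it also entered the literary language.]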
[Chart: ‘zwei’ in newspaper corpus. Counts (axis 0-12) for each period (1650-1700, 1701-50, 1751-1800) within each region (NG, ECG, EUG, WCG, WUG); series: <zwey*>, <zwei*>, <zwo*>, <zwe(e)n/t*>]
Morphological simplification: zwei
In newspapers, occasional gender forms in those areas where they occur in the dialects, especially WCG and, notably, Erfurt 1769.
zween in newspapers only before 1700, thereafter sporadic.
Only one text (Frankfurt 1671) consistently maintains gender distinction.
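Counts of this kind can be obtained by matching tokens against wildcard patterns for the four groups of forms, <zwey*>, <zwei*>, <zwo*> and <zwe(e)n/t*>. The sketch below is an assumed reading of those patterns as regular expressions, not the project's query. In particular, `zwee?[nt]` is only a guess at what <zwe(e)n/t*> covers, and the sample tokens are invented.

```python
import re
from collections import Counter

# Wildcard patterns rendered as regular expressions (an interpretation).
PATTERNS = [
    ("zwey*", re.compile(r"zwey\w*")),
    ("zwei*", re.compile(r"zwei\w*")),
    ("zwo*", re.compile(r"zwo\w*")),
    ("zwe(e)n/t*", re.compile(r"zwee?[nt]\w*")),
]


def classify(token: str):
    """Return the pattern label a token falls under, or None."""
    form = token.lower()
    for label, pattern in PATTERNS:
        if pattern.fullmatch(form):
            return label
    return None


def count_forms(tokens):
    """Count tokens per pattern label, skipping unmatched tokens."""
    return Counter(label for label in map(classify, tokens) if label)


sample = ["Zween", "Männer", "zwo", "Frauen", "zwey", "Kinder", "zwei", "zweyen"]
print(count_forms(sample))
```

Open-ended wildcards also catch derived words such as ordinals (zweyte, zweite), so hits from a search like this would still need manual filtering before being counted as forms of the numeral.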
‘zwei’ in extended corpus to date
Columns: zween, zwo, zwei for each of the periods 1650-1700, 1700-1750, 1750-1800 (blank cells are not shown, so each row lists only its non-empty counts in column order)
North German: 2 2 2 1 7 5 2 8
West Central German: 3 1 13 1 1 6 2 15
East Central German: 2 3 1 20 1 15
West Upper German: 1 4 2 11 1 1 3
East Upper German: 3 13 5 1 7
Morphological simplification: zwei
In other texts zwei/zwey are dominant throughout, but other forms occur sporadically, even in the North. However, Herder's Abhandlung über den Ursprung der Sprache (North German, 1772) uses the gender forms, though not consistently correctly (e.g. zwey Parteien).
Historical/cultural findings
Media history
Ensslin (2009), ‘“Im Unterhause groß Getöse”: representations of 18th century British parliamentary democracy in Early Modern German newspaper discourse’
The representation of a parliamentary monarchy in 17th and 18th century Germany, with its predominantly absolute rulers, in response to increased interest in Britain under the Hanoverians
Initially straightforward factual presentations, concise and apparently objective, though with (intentional?) emphasis on the leading role of the king
Later a clear tendency towards stigmatization of the raucous ‘debates’ in the House of Commons, with a much more subjective style of presentation and an often sensationalist tone
[Image: Erfurt 1744]
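(Freyburgerzeitung, 28 January 1784):
Die Veränderungen des neuen Ministeriums machen im Unterhause abscheulich groß Getöse. Dieß Ministerium hat wirklich schon die herrlichsten Namen aufgeheftet gekriegt: einige schelten selbes die kleine Pastetengebäckadministrazion, andere, die Bildsäule Nabukadnezars. Pitt, der nun an der Stelle des Fox ist, ist ein Gegenstand des öffentlichen Spottes scheelsüchtiger Satyriker.
[The changes in the new ministry are causing an abominably great uproar in the Lower House. This ministry has really already had the most splendid names pinned on it: some call it the little pastry administration, others the statue of Nebuchadnezzar. Pitt, who is now in Fox's place, is an object of public mockery by envious satirists.]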
Thank you
Contacts: [email protected] (manchester.ac.uk)
Web page: http://www.llc.manchester.ac.uk/research/projects/germanc/
Project publications
Martin Durrell, Astrid Ensslin and Paul Bennett, "The GerManC project". In: Sprache und Datenverarbeitung 31 (2007), pp. 71-80.
Martin Durrell, Astrid Ensslin and Paul Bennett, "Zur Standardisierung der Adjektivflexion im Deutschen im 18. Jahrhundert". In: W. Czachur and M. Czyzewska (eds.), Vom Wort zum Text. Studien zur deutschen Sprache und Kultur. Festschrift für Professor Józef Wiktorowicz zum 65. Geburtstag. Warszawa: Instytut Germanistyki Uniwersytetu Warszawskiego, 2008, pp. 259-267.
Martin Durrell, Astrid Ensslin and Paul Bennett, "Zeitungen und Sprachausgleich im 17. und 18. Jahrhundert". In: Zeitschrift für deutsche Philologie 127 (2008), Sonderheft, pp. 263-279.
Astrid Ensslin, "'Im Unterhause abscheulich groß Getöse'. Representations of 18th century British parliamentary democracy in early modern German newspaper discourse and their treatment of borrowings from English". In: F. Pfalzgraf and F. Rash (eds.), Anglo-German Literary Relations. Bern, etc.: Lang, 2008, pp. 73-96.