Development and annotation of a newspaper corpus as part of a doctoral thesis on text structure and cohesion in news items from the 17th and 18th centuries
Katrin Goldschmidt, Universität Bonn
10 March 2017
DGfS 2017
Content
1. Historical Newspapers
   Characteristics of historical newspapers
   Objectives of doctoral thesis
2. A Historical Newspaper Corpus
   Corpus Development
   ANNIS
Segmentation: Typographical Segmentation, Functional Segmentation
Conclusions
Typographical means alone are not adequate for news item segmentation (they help in perhaps 60% of the correspondences).
At least 7% of the phrasal entities at the beginning of news items refer to entities in the preceding news item(s), which implies no thematic change.
But: considering typographical AND functional means together could lead to successive segmentation methods.
With the help of the Historical Newspaper Corpus we can get a better understanding of the textual and publicistic structure of these interesting old newspapers.
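The idea of combining typographical and functional means can be illustrated with a minimal sketch. It is not the method used in the thesis: the `Candidate` data model, the two boolean cues, the veto rule, and the toy sentences are all assumptions made up for illustration. A typographical break proposes a boundary, and a back-reference to the preceding news item cancels it, since such a reference implies no thematic change.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A possible news item start (hypothetical data model)."""
    text: str                # opening words of the candidate unit
    typographic_break: bool  # assumed typographical cue, e.g. a new paragraph
    refers_back: bool        # assumed annotation: opening phrase refers to an
                             # entity in the preceding news item


def is_boundary(c: Candidate) -> bool:
    # Assumed rule: the typographical cue proposes a boundary,
    # the functional cue (back-reference) vetoes it.
    return c.typographic_break and not c.refers_back


def segment(candidates):
    """Group candidate units into news items using both cues."""
    items = []
    for c in candidates:
        if not items or is_boundary(c):
            items.append([c.text])      # start a new news item
        else:
            items[-1].append(c.text)    # continue the current news item
    return items


# Invented toy input, not taken from the corpus.
toy = [
    Candidate("A fleet has left the harbour", True, False),
    Candidate("The same fleet carries grain", True, True),
    Candidate("Peace talks have begun", True, False),
    Candidate("and will continue next week", False, False),
]

items = segment(toy)
for number, item in enumerate(items, start=1):
    print(number, item)
```

In this toy input the second unit is set off typographically but refers back to the fleet, so it stays in the first news item; a purely typographical segmentation would have split it off.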
Nordischer Mercurius 1664
“Everything [serves] to the [news] lovers [as] a message and a pleasure / but to the editor [it serves as] his honest entertainment, which he therefore will seek with God / until his END.”
References
Fritz, G. & E. Straßner (eds.) (1996): Die Sprache der ersten deutschen Wochenzeitungen im 17. Jahrhundert. Tübingen.
Demske, U. (2007): Das Mercurius-Projekt: eine Baumbank für das Frühneuhochdeutsche. In: G. Zifonun & W. Kallmeyer (eds.): Sprachkorpora - Datenmengen und Erkenntnisfortschritt (= Jahrbuch des Instituts für Deutsche Sprache 2006). Berlin, 91-104.
Krause, T. & A. Zeldes (2014): ANNIS3: A New Architecture for Generic Corpus Query and Visualization. Literary and Linguistic Computing.
Lefèvre, M. (2013): Textgestaltung, Äußerungsstruktur und Syntax in deutschen Zeitungen des 17. Jahrhunderts. Zwischen barocker Polyphonie und solistischem Journalismus. Berlin.
Schröder, T. (1995): Die ersten Zeitungen. Textgestaltung und Nachrichtenauswahl. Tübingen.
Special thanks to all who supported me with the newspaper records (Österreichische Nationalbibliothek, Staatsbibliothek zu Berlin, Staats- und Universitätsbibliothek Bremen, Institut für Deutsche Sprache Mannheim) and to Prof. Ulrike Demske for providing the Mercurius Corpus.
Additional corpus: NM 1667-01 synt
all January issues of Nordischer Mercurius 1667
integrated syntactical annotation from the Mercurius corpus