
Jan 02, 2016

2 Modern Approaches to Corpus Linguistics

Dominique LONGRÉE, LASLA – Université de Liège et FUSL (Bruxelles)

• automatic taggers as heuristic tools

• multilevel approaches: the “motives”

• what do they have in common?

1. Automatic taggers as heuristic tools

A LASLA research project: testing various automatic recognition programs, known as taggers.

Biber (1993), Illouz (1999), etc.: the quality of the output can vary significantly
- from one type of text to another
- from one tagger to another

Questions:
- are the results better when a tagger trained on one author or on a given text is applied to another text by the same author, or within the same discourse?
- what can we deduce from those results regarding
  - the tagger, or
  - the homogeneity of corpora?

The test texts:
- book 3 of The Gallic Wars by Caesar – BGall3 (3673 tokens)
- The Conspiracy of Catilina by Sallust – SalCat. (10688 tokens)
- book 3 of The History of Alexander the Great by Quintus Curtius – QC3 (7261 tokens)
- The First Oration Against Catilina by Cicero – CicCat1 (3333 tokens)
- poem 66 of Catullus – Catu66 (586 tokens)

Varying the nature of the training and evaluation corpora, in order to identify and measure the factors of variation:
- style of the work
- style of the author
- diachrony
- literary genre
- type of discourse
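
The train-on-one-text, evaluate-on-another design can be sketched as follows. This is only an illustration under stated assumptions: the tagger is a deliberately naive most-frequent-tag baseline (not one of the taggers tested at LASLA), and the names `train_unigram_tagger`, `accuracy`, `cross_evaluate`, `text_a` and `text_b`, as well as the toy tokens, are hypothetical.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_tokens):
    """Learn the most frequent tag of each word form, plus a fallback tag."""
    counts = defaultdict(Counter)
    for form, tag in tagged_tokens:
        counts[form][tag] += 1
    lexicon = {form: c.most_common(1)[0][0] for form, c in counts.items()}
    # Unknown forms receive the most frequent tag of the training text.
    fallback = Counter(tag for _, tag in tagged_tokens).most_common(1)[0][0]
    return lexicon, fallback


def accuracy(model, tagged_tokens):
    """Share of tokens whose predicted tag matches the reference tag."""
    lexicon, fallback = model
    hits = sum(lexicon.get(form, fallback) == tag for form, tag in tagged_tokens)
    return hits / len(tagged_tokens)


def cross_evaluate(corpora):
    """Train on each text, evaluate on every text: {(train, test): accuracy}."""
    models = {name: train_unigram_tagger(toks) for name, toks in corpora.items()}
    return {(tr, te): accuracy(models[tr], corpora[te])
            for tr in corpora for te in corpora}


# Hand-made toy data (hypothetical, not taken from the LASLA files).
corpora = {
    "text_a": [("arma", "NOUN"), ("cano", "VERB"), ("et", "CONJ"), ("arma", "NOUN")],
    "text_b": [("et", "CONJ"), ("cano", "VERB"), ("uirum", "NOUN"), ("atque", "CONJ")],
}
results = cross_evaluate(corpora)
for (train, test), acc in sorted(results.items()):
    print(f"trained on {train}, evaluated on {test}: {acc:.2f}")
```

Each cell of the resulting table pairs one training text with one evaluation text, so choosing pairs that differ in only one respect (author, period, genre, type of discourse) is what would let a drop in accuracy be attributed to that factor.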

In theoretical terms: taggers appear to have some value as heuristic instruments.

For instance, they highlight:
- the homogeneity of the historical style, over and above diachronic development
- the gap between narration and discourse (speeches)
- the gap between the styles of Caesar and Cicero
- a smaller gap between Catullus and Cicero, or between Catullus and Quintus Curtius/Tacitus, than between Catullus and Caesar
- etc.
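
One possible way to read such "gaps" off cross-evaluation scores is sketched below. The gap measure used here (one minus the mean of the two cross-accuracies) is an assumption made for illustration, not the measure used in the project, and the accuracy values and the labels `text_a`, `text_b`, `text_c` are invented placeholders, not results of the study.

```python
from itertools import combinations


def symmetric_gaps(cross_accuracy):
    """Turn {(train, test): accuracy} into {(a, b): gap}.

    The gap is 1 minus the mean of the accuracies obtained in both directions.
    """
    names = sorted({name for pair in cross_accuracy for name in pair})
    gaps = {}
    for a, b in combinations(names, 2):
        mean_acc = (cross_accuracy[(a, b)] + cross_accuracy[(b, a)]) / 2
        gaps[(a, b)] = 1 - mean_acc
    return gaps


# Placeholder accuracies, invented for illustration only.
cross_accuracy = {
    ("text_a", "text_b"): 0.90, ("text_b", "text_a"): 0.80,
    ("text_a", "text_c"): 0.70, ("text_c", "text_a"): 0.60,
    ("text_b", "text_c"): 0.75, ("text_c", "text_b"): 0.75,
}
gaps = symmetric_gaps(cross_accuracy)
closest_first = sorted(gaps, key=gaps.get)  # pairs ordered from smallest to largest gap
print(closest_first)
```

Ordering the pairs by gap is what allows statements of the form "text X is closer to Y than to Z".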

2. Multilevel approaches: the “motives”

Some indicators intuitively catalogued in Latin narrative prose:
- sequences of verb tenses
- lexical elements: repente, subito ‘suddenly’, ‘abruptly’
- syntactical structures / ‘linking clichés’:
  Quibus rebus cognitis ‘Those things being known’
  Quod ubi animaduertit ‘When he had noticed that’

Limits:
- no real analysis of them as indicators of text structure
- no study of their interaction
- little use made of them for characterising text genre and style

The Discourse Modes and Bases approach
- Kroon, 2007, 2009; Adema, 2007, 2008, 2009
- a priori definition of typical features for each discourse mode
- in order to evaluate text homogeneity

LASLA and BCL approach
- to develop endogenous exploratory methods
- to take text linearity into account
- to specify functional convergences between several indicators

Methods
- calling upon mathematical models (neighborhoods, bursts)
- combining
  - a small-scale qualitative approach
  - large-scale quantitative analysis
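
To make the notions of bursts and neighborhoods concrete, here is a minimal sketch. The definitions are simplifying assumptions, not the LASLA/BCL models: a burst is taken to be a run of occurrences separated by at most `max_gap` tokens, and a neighborhood is a fixed window of tokens around an occurrence. The function names and the token offsets are hypothetical.

```python
def find_bursts(positions, max_gap, min_size=3):
    """Group token positions into bursts, returned as (first, last) offsets.

    A burst is taken here to be a run of at least `min_size` occurrences
    in which consecutive occurrences are at most `max_gap` tokens apart.
    """
    bursts, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            # The run is broken: keep it only if it is large enough.
            if len(current) >= min_size:
                bursts.append((current[0], current[-1]))
            current = []
        current.append(pos)
    if len(current) >= min_size:
        bursts.append((current[0], current[-1]))
    return bursts


def neighbors(positions_a, positions_b, window):
    """Pairs (a, b) where indicator B occurs within `window` tokens of indicator A."""
    return [(a, b) for a in positions_a for b in positions_b if abs(a - b) <= window]


# Hypothetical token offsets of two indicators in one text.
tense_shifts = [12, 15, 19, 140, 300, 304, 309, 311]
linking_cliches = [10, 150, 298]

print(find_bursts(tense_shifts, max_gap=10))
print(neighbors(linking_cliches, tense_shifts, window=5))
```

Working on token offsets keeps the linear order of the text in the analysis, and the neighborhood pairs are one simple way to look for convergences between two indicators.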

3. What do these approaches have in common?

They take texts and discourses into account in both their dimensions:
- the multilevel nature of texts and of languages, from phonetics to pragmatics
- the fact that texts and discourses
  - are organized linearly
  - can be considered as topological entities