LIN 3098 – Corpus Linguistics Albert Gatt

Dec 17, 2015


LIN 3098 – Corpus Linguistics

Albert Gatt


In this lecture

Corpora for the study of genre/register variation
  revisit the concept of representativeness and balance
  external vs. internal criteria: Biber (1993)

introduce the multi-dimensional approach to register/genre variation (Biber 1988)


Part 1

The concept of register/genre


A preliminary example

Compare the following:
  It is hard to resolve this problem.
  I find it hard to resolve this problem.

Is one intuitively more “formal”? Why?


A preliminary example: extraposed to-clause

It is hard to resolve this problem.
  It (expletive)
  Verb be
  An adjective (hard) or participle (boring)
  Clause starting with to + infinitive verb

Tends to be associated with a formal, “anonymous” style.

Tends to be “static”: the adjective or participle denotes a state, not a dynamic event.



If our intuitions are correct, we would expect the distribution of this clause to vary across genres and registers.
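
One rough way to check such an expectation is to count the pattern in samples from different registers and normalise by text length. The sketch below is only an illustration: the regular expression is a crude approximation (it does not verify that the word after "is/was" is really an adjective or participle), and the two sample texts and the name `rate_per_1000` are invented for this example.

```python
import re

# Crude approximation of "it + is/was + one word + to + verb".
# Assumption: no part-of-speech check is made on the word after "is/was".
EXTRAPOSED = re.compile(r"\bit\s+(?:is|was)\s+(\w+)\s+to\s+(\w+)", re.IGNORECASE)


def rate_per_1000(text: str) -> float:
    """Return the number of pattern matches per 1,000 word tokens."""
    n_tokens = len(re.findall(r"\w+", text))
    n_hits = len(EXTRAPOSED.findall(text))
    return 1000 * n_hits / n_tokens if n_tokens else 0.0


# Invented toy samples, far too small for any real conclusion.
samples = {
    "academic": "It is hard to resolve this problem. "
                "It is difficult to see why the results differ.",
    "conversation": "I find it hard to resolve this problem. "
                    "Do you want to try again?",
}

rates = {name: rate_per_1000(text) for name, text in samples.items()}
print(rates)  # {'academic': 125.0, 'conversation': 0.0}
```

On real data the counts would come from a tagged corpus, and the samples would need to be large enough for the per-1,000-word rates to be stable.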


What is a register?

Would you consider the following to be registers?
1. recipe English
2. legal Maltese
3. specialised language used by ship-builders

What are the crucial characteristics of register?


Defining register

Possible definitions (see overview in Paolillo 2000):

register = “a field of discourse” or “topic”

register = “a combination of all the parameters of the communicative situation”

register = “an occupationally determined variety of language”


Defining genre

In discourse analysis and related fields, genre is given a “sociologically oriented” definition:

“A socially ratified way of using language in connection with a particular type of social activity”
  suggests “typical” settings in which language is used
  e.g. interview, lecture, story…


Why is this relevant?

Reminder (see lecture 2):
  general-purpose corpora aim for balance and representativeness
  how genre/register are defined affects the structure and the uses of the corpus

corpus-based studies of variation across/within registers need a well-defined notion of register


Balance and representativeness

Balance:
  refers to the range of types of text in the corpus
  e.g. the BNC’s construction was based on an a priori classification of texts by domain, time and medium

Representativeness:
  refers to the extent to which the corpus contains the full range of variation in the language.

Representativeness depends on balance as a prerequisite


Biber (1993) on achieving balance

Biber distinguishes:

external criteria:
  social and communicative contexts in which a particular sample of text/speech is produced
  external criteria define registers or genres

internal criteria:
  linguistic (e.g. lexico-grammatical) features that distinguish texts
  internal criteria define text types


External vs. internal

Example: academic writing vs. spoken conversation

Some external criteria of differentiation:
  primary channel (spoken/written/…)
  type of addressee
  factuality

Some internal criteria of differentiation:
  more uses of personal pronouns in spoken discourse
  more use of passives in academic writing
  …


Which should come first?

Biber’s argument:

“in defining the population for a corpus, register/genre distinctions [i.e. external criteria] take precedence over text-type distinctions. […] identification of the salient text-type distinctions in a language requires a representative corpus of texts…”


Biber’s external criteria

1. Primary channel: written/spoken/scripted

2. Format: published/unpublished
   includes various publication formats

3. Setting: institutional/other/private-personal


Biber’s external criteria

4. Addressee/receiver
   a. Plurality: unenumerated/plural/individual/self
   b. Presence: present/absent
   c. Interactiveness: none/little/extensive
   d. Shared knowledge: general/specialised/personal


Biber’s external criteria

5. Addressor:
   a. Demographic variation: age, sex etc.
   b. Acknowledgement: acknowledged individual/institution

6. Factuality: factual-informational / intermediate / imaginative

7. Purposes: persuade, entertain, edify, inform, instruct…

8. Topics: [cf. the “Domain” definition in BNC texts]


The logic behind genre/register comparison

A priori distinction between different genres/registers
  adequately sampled to be representative

Given these externally-based distinctions, the question is:
  what linguistic features are characteristic of (give rise to) different genres?


Part 2

The multifeature/multidimensional framework (Biber 1988, Biber 1995)


Biber (1988, 1995)

Compared twenty-one genres in spoken and written British English

Used a precompiled list of 67 linguistic features, comparing:
  the extent to which these features “cluster together” across genres
    high relative frequency of personal pronouns => high relative frequency of questions
  the extent to which these clusters are more clearly present in different genres


Primary goals

1. identify the main dimensions (clusters of features) of variation underlying all registers

2. find similarities and differences between different registers


Dimensions

Dimension:
  group of features that are empirically determined to co-occur in text

Functional interpretation:
  given a set of features forming a dimension
    e.g. pers. pronouns + questions
  the crucial question is: how do we interpret it functionally?
    e.g. the cluster containing pers. pronouns and questions shows a high level of interpersonal focus in the text


Factor analysis

The MF/MD approach uses factor analysis
  statistical technique to group together related features based on their co-occurrence

resulting clusters of features (“factors”) are then interpreted and given a label

this is the process of identification and functional interpretation of dimensions
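
The grouping step can be illustrated with a small sketch. Note the substitution: this is not factor analysis proper. It computes the correlation matrix of feature rates across texts and extracts its leading eigenvector by power iteration, which is closer to a first principal component. It is used here only because it needs nothing beyond the standard library and shows the same basic idea, namely that features which rise and fall together end up with weights of the same sign. The feature names, the rates and the function names are all invented for illustration.

```python
from math import sqrt

FEATURES = ["pronouns", "questions", "nouns", "prepositions"]

# Invented rates per 1,000 words; one row per text.
RATES = [
    [95.0, 12.0, 160.0, 80.0],   # conversation-like texts
    [88.0, 10.0, 170.0, 85.0],
    [90.0, 14.0, 165.0, 78.0],
    [20.0, 1.0, 290.0, 140.0],   # academic-like texts
    [25.0, 2.0, 300.0, 150.0],
    [18.0, 0.5, 310.0, 145.0],
]


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def leading_eigenvector(matrix, iterations=200):
    """Power iteration: a stand-in for extracting the first factor."""
    v = [1.0] + [0.0] * (len(matrix) - 1)
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


columns = list(zip(*RATES))  # one tuple of rates per feature
corr = [[pearson(ci, cj) for cj in columns] for ci in columns]
loadings = dict(zip(FEATURES, leading_eigenvector(corr)))

for feature, weight in loadings.items():
    print(f"{feature:13s} {weight:+.2f}")
```

In this toy data, pronouns and questions receive weights of one sign and nouns and prepositions the other, which is the kind of pattern that would then be given a functional label. A real analysis would use a proper factor analysis routine with rotation, applied to many more texts and features.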


Biber’s methodology

1. Identify the grammatical features
   based on review of existing literature

2. tag all relevant features in the corpus texts

3. post-edit the texts to ensure accuracy

4. count frequency of each feature in each text

5. apply factor analysis to compute co-occurrence patterns among features

6. interpret the resulting dimensions functionally

7. compare different registers to see how much each dimension is represented in them
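
Steps 4 and 7 can be sketched as follows. The idea, in line with Biber's practice of summing standardised frequencies, is to give each text a score on a dimension by adding the z-scores of the features at one pole and subtracting those at the other, then averaging per register. This is a minimal sketch under stated assumptions: the feature groupings `POSITIVE` and `NEGATIVE`, the per-1,000-word rates and the register labels are invented, and the population standard deviation is an arbitrary choice.

```python
from statistics import mean, pstdev

# Assumed groupings for one dimension (illustrative only).
POSITIVE = ["pronouns", "questions"]
NEGATIVE = ["nouns", "prepositions"]

# (register, feature rates per 1,000 words); the numbers are invented.
TEXTS = [
    ("conversation", {"pronouns": 95.0, "questions": 12.0, "nouns": 160.0, "prepositions": 80.0}),
    ("conversation", {"pronouns": 88.0, "questions": 10.0, "nouns": 170.0, "prepositions": 85.0}),
    ("academic", {"pronouns": 20.0, "questions": 1.0, "nouns": 290.0, "prepositions": 140.0}),
    ("academic", {"pronouns": 25.0, "questions": 2.0, "nouns": 300.0, "prepositions": 150.0}),
]


def zscores(values):
    """Standardise values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Standardise each feature across all texts.
standardised = {
    feature: zscores([feature_rates[feature] for _, feature_rates in TEXTS])
    for feature in POSITIVE + NEGATIVE
}

# Dimension score per text: positive-pole z-scores minus negative-pole z-scores.
dimension_scores = [
    sum(standardised[f][i] for f in POSITIVE)
    - sum(standardised[f][i] for f in NEGATIVE)
    for i in range(len(TEXTS))
]

# Mean dimension score per register.
by_register = {}
for (register, _), score in zip(TEXTS, dimension_scores):
    by_register.setdefault(register, []).append(score)
register_means = {register: mean(scores) for register, scores in by_register.items()}

print(register_means)
```

Standardising first keeps very frequent features such as nouns from swamping rarer ones such as questions.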


Types of features

Lexical features
  type-token ratio (indicates the average no. of different types given the number of tokens)
  word length

lexical semantic features
  e.g. word classes like hedges (probably, possibly…); speech act verbs (declare), etc.
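
The lexical features above are simple to compute once a text is tokenised. The sketch below assumes a very naive tokeniser (lowercased runs of letters and apostrophes) and a tiny hedge list; `HEDGES` is illustrative and is not Biber's actual word list.

```python
import re

# Illustrative hedge list (an assumption, not Biber's inventory).
HEDGES = {"probably", "possibly", "perhaps", "maybe"}


def tokenize(text: str) -> list[str]:
    """Naive tokeniser: lowercased runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Number of distinct word forms divided by number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_word_length(tokens: list[str]) -> float:
    """Average token length in characters."""
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0


def hedge_rate(tokens: list[str]) -> float:
    """Hedges per 1,000 tokens."""
    hits = sum(1 for t in tokens if t in HEDGES)
    return 1000 * hits / len(tokens) if tokens else 0.0


tokens = tokenize("The cat sat on the mat and the dog probably sat too")
print(type_token_ratio(tokens))   # 9 types / 12 tokens = 0.75
print(mean_word_length(tokens))
print(hedge_rate(tokens))
```

Type-token ratio drops as texts get longer, so it is only comparable across samples of the same length.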


Types of features

Grammatical feature classes
  nouns, prepositional phrases, attributive and predicative adjectives, etc.

Syntactic features:
  relative clauses, that-complements, pied-piping constructions (Which car does he like?), conditional subordination (should you ever…)


The dimensions identified

1. Involved vs. informational production
2. Narrative vs. non-narrative production
3. Elaborated vs. situation-dependent reference
4. Overt expression of persuasion
5. Abstract vs. non-abstract style

NB. Many of these dimensions define “poles of opposition”


Dimension 1: involved vs. informational

Features (involved pole):
  1st & 2nd person pronouns, questions, reductions, stance verbs, hedges, emphatics, adverbial subordination
  Typical of conversations, letters (high personal involvement)

Features (informational pole):
  nouns, adjectives, prepositional phrases, long words
  Typical of informational exposition, e.g. in official documents and academic writing


Dimension 2: Narrative vs. non-narrative

Features (narrative pole):
  past tense, perfect aspect, 3rd person pronouns, speech act verbs
  Typical of fiction

Features (non-narrative pole):
  present tense, attributive adjectives
  Typical of broadcasts, telephone conversations, professional letters


Dimension 3: elaborated vs. situation-dependent reference

Features (elaborated pole):
  wh-relative clauses, pied-piping, phrasal coordination
  Typical of “elaborated” text: official documents, professional letters, written exposition
  Typical of “situation-independent language”

Features (situation-dependent pole):
  time adverbials, place adverbials
  Typical of “situation-dependent language”, e.g. broadcasts, fiction, personal letters


Dimension 4: Overt expression of persuasion

Features:
  modals, conditional subordination
  Defines an “overt expression of persuasion type”, e.g. editorials, professional letters

Lack of any of the above:
  Language which does not overtly seek to persuade


Dimension 5: Abstract vs. non-abstract style

Features:
  agentless passives, by-passives, …
  An “abstract style”: technical prose, academic prose, official documents

Lack of any of the above:
  Language which is typically not abstract: conversation, public speeches, broadcasts…


Biber’s main argument

No one dimension is enough to characterise the properties of a particular register
  dimensions are coherent, correlated groupings of features
  every register could be defined in terms of the relative prominence of all 5 dimensions
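
One way to picture this is to treat each register as a point with five coordinates, one per dimension. The sketch below is purely illustrative: the register names are placeholders and the scores are invented, not Biber's published values. It shows two registers that look alike on dimension 1 but differ on dimension 2, which is why a single dimension is not enough.

```python
from math import dist

DIMENSIONS = ["involved", "narrative", "elaborated", "persuasion", "abstract"]

# Hypothetical mean dimension scores per register (invented numbers).
PROFILES = {
    "register_A": [30.0, -1.0, -4.0, 0.0, -3.0],
    "register_B": [28.0, 6.0, -3.0, 1.0, -2.5],
    "register_C": [-15.0, -2.5, 4.0, -0.5, 5.5],
}


def gap(reg1: str, reg2: str, dimension: str) -> float:
    """Absolute difference between two registers on one dimension."""
    i = DIMENSIONS.index(dimension)
    return abs(PROFILES[reg1][i] - PROFILES[reg2][i])


def closest(name: str) -> str:
    """Register whose full five-dimensional profile is nearest to `name`."""
    others = {k: dist(PROFILES[name], v) for k, v in PROFILES.items() if k != name}
    return min(others, key=others.get)


print(gap("register_A", "register_B", "involved"))   # 2.0
print(gap("register_A", "register_B", "narrative"))  # 7.0
print(closest("register_A"))                          # register_B
```

Euclidean distance is only one possible way to compare profiles and is chosen here for simplicity.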


Biber’s main argument

Biber finds no evidence of an absolute difference between spoken and written language
  e.g. conversations often display similar characteristics to other non-spoken genres

Better to
  identify different types of speech (broadcast, scripted, spontaneous)
  view similarities and differences to different types of writing


Summary

Biber’s MF/MD approach has proved highly influential in the study of register and genre

Crucially, relies on a priori definition of:
  features (“what to look for”)
  registers (“situationally-defined uses of language”)


References

Paolillo, J. C. (2000). Formalising formality. Journal of Linguistics, 36: 215-259.

Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic Computing, 8 (4): 243-258.

Biber, D. (1995). On the role of computational, statistical and interpretive techniques in multi-dimensional analysis of register variation. Text, 15 (3): 314-370.