Corpus approaches to discourse analysis 2000891 Text and corpus analysis in English Studies.


Course format

• Lectures and seminars, group work (40 hours)

• Reading days, tutorials and individual work on MOOC (50 hours)

• Exam: individual corpus projects, either using a corpus you have compiled or assisting in the compilation of a corpus (60 hours)

• Corpus analysis using software

Timetable

• Course starts: 05 October 2015

• Monday 16:00 – 18:00, Room 447

• Tuesday 14:00 – 16:00, Room 349a

• Wednesday 09:00 – 11:00, Room 447

• Course ends: 17 November 2015

Aims

• The aims of this course are to give students the awareness, knowledge, experience and skills of analysis of naturally occurring language texts through a corpus assisted discourse studies approach.

Aims cont’d

• Awareness of the processes of text production and reception, and of the effects of register, domain and text type differences.

• Awareness of the use and manipulation of language in society.

• Knowledge of how social issues, and language in society, can be investigated through the construction and analysis of electronic corpora.

Aims cont’d

• Experience of a corpus-based approach to discourse studies is gained through reading assignments and seminars, and awareness through participation in group discussions in a blended learning format.

• Skills are developed through seminar work and by undertaking a corpus analysis project using software tools.

Text and Corpus:

• Text: “the record of some speaker’s or writers’ discourse, uttered or written in some context and for some purpose.”

• Corpus: ‘a collection of pieces of text in electronic form, selected according to external criteria to represent as far as possible a language or language variety as a source of data for linguistic research’ (John Sinclair)

By text, I mean

• the record of some speaker’s or speakers’ discourse, uttered or written in some context and for some purpose.

• A corpus consists, then, of the records of authentic discourses, of actual uses of a language in their social contexts.

Corpus topics in the course

• definitions of corpora, purposes and applications

• background to corpus linguistics

• corpus assisted discourse studies

• reading concordances

• applications of corpora in research

• textlinguistics and sociolinguistics

Texts and language system

• The system of language is instantiated in the form of text

• Like the relationship between weather and climate – the same phenomenon seen from different standpoints: climate is weather seen from a greater depth of time.

• Weather can be said to resemble texts, while climate is the equivalent of the system

Texts in linguistic investigation

• The use of texts, as records of discourse, is already central to many types of linguistic investigation, from discourse analysis to conversation analysis, sociolinguistics, ethnomethodology, forensic linguistics, lexicography, etc.

Language in use

• To communicate means to use language with a purpose. Language is not just an abstract entity that we can study detached from its users and the contexts in which it is used.

Language and function

• This course takes a functional view of language whereby language is seen as having a social, interactive function (establishing human relationships and negotiating communicative goals) as well as an informational/communicative function.

Hymes's notion of communicative competence

• Communicative competence includes both linguistic competence (implicit and explicit knowledge of the rules of grammar) and contextual or sociolinguistic knowledge of the rules of language use in context.

Four aspects of communicative competence

• Hymes viewed communicative competence as having the following four aspects:

• what is formally possible,

• what is feasible,

• what is the social meaning or value of a given utterance,

• and what actually occurs.

Communicative competence

• Grammatical

• Lexical

• Textual

• Organisational

• Sociolinguistic

• Pragmatic

• Strategic

Communicative competence

• Throughout their lives, speakers engage in a multitude of different discourses, both as performers and addressees. They do so, typically, one discourse at a time. Over the course of a lifetime, an individual human may participate in thousands of casual conversations, write and read hundreds of postcards, listen to numerous speeches of various kinds, read scores of recipes, instructional leaflets, tax forms etc.

Generic competence

• As our communicative competence develops, we not only develop varying degrees of what we might term ‘generic competence’, we also come to have expectations associated with the use of language in a vast range of contexts, expectations which allow us to make judgements about the social meaning or value of a given utterance, and to know what is likely or probable based on our experience of what actually occurs by being exposed to texts.

Generic competence

• Particular text types have typical patterns of form and convey particular sets of meanings.

• Part of a native speaker’s competence lies in having been exposed to many examples of language in context and being thus primed for the meanings and effects of particular patterns.

The usual and the unusual

• The ability to recognise a particular register, tone or text type is vital for literary appreciation and for translation purposes.

• Some patterns may be deviant or particularly creative when compared to other text types and an awareness of naturalness and unusuality is important in linguistic research.

Sources of information about language

• There are two main sources of information about words: introspection and observation.

• introspection means ‘looking inside’ your own brain and trying to remember everything you know about a word

• observation means examining real examples of language in use (in newspapers, novels, blogs, tweets, and so on), so that we can observe how people use words when they are communicating with one another

Corpora in English studies

• The premise behind corpus analysis in English Studies is that language patterns can be retrieved (with the aid of software programmes) and can provide insights into language and literature which intuitive approaches to the same objects of study may fail to reveal.

Corpus analysis

• Corpus analytic techniques can help provide multi-purpose strategies (in the investigation of literature, linguistics and language teaching) which help corroborate or disconfirm our intuitions about patterns and meanings in texts.

• Corpus analysis and corpus literacy are key skills and provide an essential component in English studies

Corpus-based studies look at patterns

• words and word groups

• grammatical units

• meanings

• attitudes

• frequencies

• co-occurrence

in context

CL can…

• Provide techniques for building topic-specific corpora (e.g. Gabrielatos, 2007)

• Reveal salient contextual elements (“trigger events” – Gabrielatos et al., 2012)

• Reveal differences as well as similarities (e.g. Taylor, 2013) intertextuality / interdiscursivity

• Pinpoint absence (e.g. Partington, 2014)

Text: form of data used for linguistic analysis

• When we study texts we see patterns that some texts share and describe these in terms of text type

• texts vary very systematically according to contextual values.

• Recording technology and computers have made it possible to capture spontaneous speech and store and access data in increasing quantities

The methodology of modern linguistics

• Corpus data is authentic

• It can include spoken language

• A corpus makes it possible to study language in quantitative terms

The methodology of modern linguistics

• it examines the relationship between instance and system, between the typical and the exceptional, between signal and noise

• Partington, Patterns and Meanings (1998)

• Qualitative and quantitative research

Language and society

• a close attention to language data which still has significance for the wider world of social, cultural and political studies

• looking at discourse and rhetorical strategies and seeing how they can be analysed with the aid of corpora and semi-automatic computational tools

Example: KWIC

• language often looks very different when you see a lot of it together (Sinclair)

• The concordancer – a collector and collator of examples

Concordance lines

• A concordance line is a line of text taken from a corpus, i.e. a collection of language texts which are organised and stored on a computer. The concordance line may come from the beginning, the middle or the end of one of the texts. It may be made up of one sentence, part of a sentence or part of two sentences. Each concordance line in a set includes the target word, i.e. the word being studied. The target word is always in the middle of the concordance line. This means that when we study a word in a set of concordance lines we can see its context, in other words, the words which are used before it and after it.

• A node word is shown with a set of linguistic environments, each a span of words running from left to right

• From the environments in which the word finds itself, we can observe common features in the context

• A concordance makes it possible to observe repeated events

• The co-occurrences are observable on the syntagmatic horizontal axis

• Repeated paradigmatic choices are observable on the vertical axis

• Repetitions are made visible by the layout

• You can know a word by the company it keeps (Firth)

• We learn meanings through the accumulated effects of our encounters in contexts, our experience of language through texts, spoken and written
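The concordance layout described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used on the course: the function names (tokenize, kwic, format_kwic), the crude tokenisation rule, the span of four words and the three sample sentences are all assumptions made up for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z]+", text.lower())

def kwic(tokens, node, span=4):
    """Return (left context, node, right context) for every occurrence of the node word."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append((left, token, right))
    return lines

def format_kwic(lines, width=32):
    """Right-align the left context so the node word falls in one vertical column."""
    rows = []
    for left, node, right in lines:
        rows.append(left[-width:].rjust(width) + "  " + node + "  " + right[:width])
    return rows

# Invented sample text, standing in for a real corpus.
sample = ("The room was fraught with tension. Negotiations were fraught with "
          "difficulty, and the journey home was fraught with danger.")

lines = kwic(tokenize(sample), "fraught")
for row in format_kwic(lines):
    print(row)
```

Reading down the column to the right of the node word shows the repeated choice (here, "with"), while reading each row across shows the co-occurrences on the horizontal axis.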

Meanings through encounters

• we learn the meaning of a word through our encounters with it

• its grammatical category,

• its collocations,

• its colligations,

• its syntactic preferences,

• its textual preferences,

• its pragmatic associations
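As a rough illustration of how collocations can be observed, the sketch below counts the words that occur within a few tokens of a node word. The function name collocates, the window of three words and the sample text are assumptions for the example; it reports raw co-occurrence counts only, whereas dedicated corpus software usually also offers statistical association measures.

```python
from collections import Counter
import re

def collocates(tokens, node, span=3):
    """Count the words found within `span` tokens to the left or right of the node word."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Invented sample text, standing in for a real corpus.
text = ("Her eyes were brimming with tears. The hall was brimming with people. "
        "A cup brimming with tea.")
tokens = re.findall(r"[a-z]+", text.lower())

counts = collocates(tokens, "brimming")
print(counts.most_common(5))
```

Even in this tiny sample, "with" stands out as the most frequent neighbour of "brimming", which is the kind of pattern a larger corpus would let us confirm or reject.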

Try these

• What do you know about these words?

• neck

• scarf

• trouser

• suggestive

• typical

• brimming

• fraught

Corpus data

• corpora provide detailed linguistic evidence to enhance the study of discourse features of a particular genre of language.

• You can make an investigation of the communicative strategies used by speakers and writers to achieve their aims

Meaning is spread out in all parts of a text.

• It is conveyed in the choices an author makes:

• at lexical level, the words and phrases selected;

• at grammatical level, whether to express a process as a verb or a noun, a description as an adjective or an adverb;

• what roles to give participants, grammatical subject or thematic subject;

• when an utterance contains two ideas, whether to present them as coordinate or to subordinate one to the other;

• which order to present them in (i.e. which to thematise);

• how much and what kind of modality to employ.

Non-obvious meanings

• Even authors are sometimes not fully aware of the meanings their texts convey. Much of what carries meaning in texts is not open to direct observation.

• We can discover how meaningful choices are by comparing them with those which are normal or usual within the genre: we compare instance with system.

Corpus work is comparative

• If texts are not studied in comparison to other bodies or corpora of text it is not possible to know what is normal and what is marked.

• We are not justified in interpreting the significance of a single linguistic event unless we can compare it with other similar events

comparisons

• in choosing our corpora we need to control the variables in a reasoned way.

• To compare like with like

• To pinpoint differences and similarities

• Salience and significance

• Change over time

Types of corpora

• Corpora can be either heterogeneric or monogeneric, that is, they may contain texts of many different types, generally as many different types as the compilers can practically and legally obtain, or they may contain texts of a single type.

• heterogeneric corpora are thus intended to be in some way representative of the language in question as a whole.

Hetero- and mono-generic

• Monogeneric corpora are compiled as a means of studying one particular text-type, for example, the language of law, of economics, of Parliamentary debates, and so on.

Heterogeneric

• Heterogeneric corpora tend to be very large, nowadays typically from 100 million to a billion words in size.

• Their compilation is complex and expensive and tends to be carried out by special organizations attached to Universities or large institutions, such as publishing houses.

monogeneric

• Monogeneric corpora, on the other hand, can be relatively easy to compile and are often created by individual researchers with a special interest in a particular text-type. The favourite source for accessing texts today is the Internet.

heterogeneric

• Heterogeneric corpora, by enabling researchers to take into account vast quantities of language data and therefore obtain an overview of the authentic behaviour of language users not otherwise readily available to the ‘naked ear’, have helped provide a mass of new information about the grammar and lexis of languages, and have led to the compilation of a new generation of dictionaries, of grammatical descriptions, as well as language-teaching materials.

Corpus = archive

• A corpus by itself is simply an inert archive. However, it can be ‘interrogated’ using dedicated software. The most important interrogation tools include, first of all, the concordancer, then calculators of frequency, keywords, clusters and dispersion.
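To give a feel for what the frequency, cluster and dispersion tools compute, here is a minimal sketch. The function names, the tokenisation rule and the three-sentence sample corpus are assumptions for the example, and dispersion is reduced to its simplest form (the number of texts in which a word occurs); dedicated software offers more refined measures.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z]+", text.lower())

def frequency_list(texts):
    """Word frequencies over the whole corpus, given as a list of text strings."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

def clusters(texts, n=2):
    """Counts of contiguous n-word sequences, counted within each text."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

def text_range(texts, word):
    """Simple dispersion: the number of texts in which the word occurs at least once."""
    return sum(1 for text in texts if word in tokenize(text))

# Invented mini-corpus of three "texts".
corpus = [
    "The minister said the bill would pass.",
    "The bill was debated and the bill was amended.",
    "Members asked the minister to reply.",
]

freq = frequency_list(corpus)
print(freq.most_common(3))
print(clusters(corpus).most_common(3))
print(text_range(corpus, "bill"))
```

A word can be frequent overall yet concentrated in a single text, which is why frequency and dispersion are worth reading together.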

Monogeneric corpora and the study of discourse

• Research of this type generally entails the comparison of two or more corpora of a particular text-type and very often also the comparison of the contents of a monogeneric corpus with that of a heterogeneric one. In fact, discourse study is necessarily comparative in two separate but related ways.

Meaningful choices

• Firstly, within an individual discourse type, only by comparing the choices being made by speakers or writers at any point in a discourse with those which are normal, that is, usual within the genre, can we discover how meaningful those choices are.

Discourse type and specific features

• Secondly, if we are also interested in the characteristics and content of the discourse type itself, it is vital to be able to compare its particular features and patterns with those of other discourse types.

• In this way we discover how it is special, and can go on to consider why.

Comparison is key

• All genre or discourse-type analysis is thus properly comparative.

• In the wider field of discourse studies, this requirement has unfortunately not always been observed in practice.

• Corpora provide the means and methodology to enable rigorous and principled comparative study to be performed.
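One way such a comparison is often made concrete is a keyness score, which asks whether a word is unusually frequent in a study corpus relative to a reference corpus. The sketch below uses the log-likelihood statistic, which is one commonly used choice and is not named in these slides; the function names and the example figures are assumptions for illustration only.

```python
import math

def per_million(freq, size):
    """Normalise a raw frequency so that corpora of different sizes can be compared."""
    return freq * 1_000_000 / size

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness score for one word in a study corpus vs a reference corpus."""
    total = freq_study + freq_ref
    # Expected frequencies if the word were used at the same rate in both corpora.
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    score = 0.0
    if freq_study > 0:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score

# Hypothetical figures: a word occurring 50 times in a 1,000-word study corpus
# and 100 times in a 10,000-word reference corpus.
print(per_million(50, 1_000), per_million(100, 10_000))
print(log_likelihood(50, 1_000, 100, 10_000))
```

A score near zero means the word is used at about the same rate in both corpora; the higher the score, the stronger the evidence of a difference. The score alone does not say which corpus has the higher rate, so it should be read alongside the normalised frequencies.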

CADS

• Throughout the course we will examine different aspects of Corpus Assisted Discourse Studies

• See http://en.wikipedia.org/wiki/Corpus-assisted_discourse_studies

• And http://www3.lingue.unibo.it/blog/clb/

• Follow the Corpus Linguistics group on Facebook




For non-attenders

• You should follow the MOOC on Corpus Linguistics:

• Corpus Linguistics, Method, Analysis, Interpretation

• https://www.futurelearn.com/courses/corpus-linguistics.

• You can enrol from September 28.



