Dec 19, 2015

Page 1: Guy Aston SSLMIT, University of Bologna guy@sslmit.unibo.it The learner as corpus designer.

Guy Aston

SSLMIT, University of Bologna

guy@sslmit.unibo.it

The learner as corpus designer

Page 2:

… or the art of fruit salads

Page 3:

Learner uses of corpora

• Form-focussed (data-driven learning)
• Meaning-focussed (learning the culture)
• Skill-focussed (reading practice)
• Browsing environment (serendipity)
• Reference tool for other tasks (reading/writing aid)

Page 4:

Why make your own corpus?

• You can devise your own recipe
• You know what’s in it
• You learn how to do it
• Can be fun
• Can provide practice in language use

Page 5:

The raw ingredients

Page 6:

Devising your own recipe

• Only the text-type(s) you want
• Only the texts you want
• The quantity you want

… small and specialised is beautiful

Page 7:

You know what’s in it

• Top-down knowledge of corpus
• Top-down knowledge of texts

Page 8:

You learn how to do it

• Can be a useful skill for many language workers
– technical writers
– translators
– teachers
• Can make you a more critical corpus user

Page 9:

It can be fun

• Provides a challenge
• Gives sense of achievement/satisfaction

Practice in language use

• Design/construction/evaluation of corpora can be communicative activities

Page 10:

Why use standard corpora?

• Less effort
• More reliable
• Better packaging
• You don’t want to learn to make your own

Page 11:

Less effort

Page 12:

More reliable

• if it’s well designed

• if it fits your needs

Page 13:

Better packaging

• Metatextual information

• Annotation

• Corpus-specific software

Page 14:

You don’t want to learn to make your own?

Page 15:

A compromise strategy: make your own subcorpus

• assemble using the pre-prepared ingredients of a larger corpus

or in other words… go to a (fruit) salad bar

Page 16:

(Pick ’n’ mix with the BNC)

Page 17:

You have a choice of

• text-types
• individual texts

• selection by pre-determined criteria
• selection by hand

… or both

Page 18:

You know what went in
• so top-down processing is easier

Little effort
• in comparison with making your own

Page 19:

Good packaging

• Metatextual information
• Linguistic annotation
• Can use software designed for full corpus
• Indexed

Page 20:

You get to learn

• what are(n’t) useful subcorpora
• what are(n’t) useful design criteria
• how to do it

Page 21:

It can be fun

• challenge / achievement / satisfaction

You can talk about its

• design / construction / evaluation

Page 22:

Talking about fruit salad

BNC Sampler: KC2

Page 23:

Talking about fruit salad

BNC Sampler: KC2

Page 24:

And now to details …

the Sampler awaits!

Page 25:

You can create subcorpora of

• specific corpus texts
• texts containing solutions to a query
• encoded categories of texts
• your own categories of texts

and compare them with

• other subcorpora
• the full corpus
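The selection routes listed above can be sketched in a few lines of Python. This is only an illustration, not the way the BNC software works: the in-memory corpus layout (text id mapped to a category label and the raw text) and the function names `by_ids`, `by_query` and `by_category` are assumptions made for the example.

```python
import re

# Hypothetical corpus layout (an assumption for this sketch):
# text id -> (category label, raw text)

def by_ids(corpus, ids):
    """Subcorpus of specific, hand-picked texts; unknown ids are ignored."""
    return {tid: corpus[tid] for tid in ids if tid in corpus}

def by_query(corpus, pattern):
    """Subcorpus of texts containing at least one solution to a regex query."""
    rx = re.compile(pattern, re.IGNORECASE)
    return {tid: entry for tid, entry in corpus.items() if rx.search(entry[1])}

def by_category(corpus, category):
    """Subcorpus of texts carrying a given category label.

    The label can be one encoded in the corpus or one you assigned yourself.
    """
    return {tid: entry for tid, entry in corpus.items() if entry[0] == category}
```

Each function returns a dictionary with the same layout as the full corpus, so a subcorpus can be queried, narrowed further, or compared with another subcorpus or the full corpus using the same tools.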

Page 26:

Text analysis: selecting specific texts

Page 27:

Viewing the index

Page 28:

Party policies (will/shall be + VVN)
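A query of this kind can be sketched over part-of-speech-tagged text. The sketch assumes the text is already available as a list of (word, tag) pairs. VVN is the tag named on the slide (past participle of a lexical verb); the function name and the sample data are hypothetical. Only strictly adjacent sequences are matched, so an intervening adverb ("will soon be cut") would be missed.

```python
def find_future_passives(tagged):
    """Return (start index, matched words) for each 'will/shall be + VVN' run.

    `tagged` is a list of (word, tag) pairs. Only the third tag is checked;
    'will'/'shall' and 'be' are matched on the word form itself.
    """
    hits = []
    for i in range(len(tagged) - 2):
        (w1, _), (w2, _), (w3, t3) = tagged[i:i + 3]
        if w1.lower() in {"will", "shall"} and w2.lower() == "be" and t3 == "VVN":
            hits.append((i, f"{w1} {w2} {w3}"))
    return hits
```

Run over each text of a subcorpus in turn, the hits can be counted per text or displayed as concordance lines.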

Page 29:

Or, to return to our fruit salad text …

Page 30:

A bad language subcorpus: texts containing solutions to a query

Page 31:

Choosing the bad language texts

Page 32:

collocates of f.*k.*

collocates of f_ words

Page 33:

collocates of oh
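A collocate listing for a word or regular-expression query can be approximated as follows. This is a rough sketch that assumes a pre-tokenised text and counts raw co-occurrences within a fixed window. The function name and the default window of three words are choices made for the example, and corpus software will typically also offer an association score rather than raw counts alone.

```python
import re
from collections import Counter

def collocates(tokens, pattern, window=3):
    """Count the words occurring within `window` tokens of each query match.

    `pattern` is a regular expression that must match a whole token
    (case-insensitively), so a plain word such as "oh" works as well.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if rx.fullmatch(tok):
            lo = max(0, i - window)
            # Left and right context, excluding the node word itself.
            neighbours = tokens[lo:i] + tokens[i + 1:i + 1 + window]
            counts.update(w.lower() for w in neighbours)
    return counts
```

Calling `collocates(tokens, "oh").most_common(10)` then gives the ten most frequent neighbours of the node word in that text or subcorpus.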

Page 34:

Making subcorpora using encoded categories

• ‘context-governed’ spoken texts
– monologue: 17 texts
– dialogue: 29 texts

Page 35:

Monologue vs Dialogue

More frequent in M*
– could
– had
– he
– know
– their
– were
– when
– who
– your

More frequent in D*
– 'll
– 'm
– any
– no
– pounds
– right
– yeah
– yes

*ranked 20+ positions higher in first 100 words
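The rank comparison behind the two lists can be sketched as below. The slide gives only the criterion (20 or more positions higher within the first 100 words), so the details are assumptions: tokens are lower-cased, and a word missing from the other subcorpus's top list is treated as ranked just below it. The function names are hypothetical.

```python
from collections import Counter

def rank_list(tokens, top_n=100):
    """Map each of the top_n most frequent words to its rank (1 = most frequent)."""
    counts = Counter(t.lower() for t in tokens)
    return {w: r for r, (w, _) in enumerate(counts.most_common(top_n), start=1)}

def ranked_higher(tokens_a, tokens_b, top_n=100, min_shift=20):
    """Words ranked at least min_shift positions higher in A than in B."""
    ranks_a = rank_list(tokens_a, top_n)
    ranks_b = rank_list(tokens_b, top_n)
    result = []
    for word, rank_a in ranks_a.items():
        # Assumption: a word absent from B's list counts as rank top_n + 1.
        rank_b = ranks_b.get(word, top_n + 1)
        if rank_b - rank_a >= min_shift:
            result.append(word)
    return sorted(result)
```

Calling the function once with the monologue tokens first and once with the dialogue tokens first yields the two lists. Comparing ranks rather than raw frequencies sidesteps the difference in size between the two subcorpora.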

Page 36:

Investigating the differences

• no occurrences of all right in monologue

• when you’re / you’ll / you’d / you’ve is more common in monologue than when we’re / we’ll / we’d / we’ve; vice-versa in dialogue

Page 37:

you and we

            you    we
Monologue   4253   2014
Dialogue    6635   4949

Page 38:

Subcorpora using your own categories

David Lee’s book genres
• academic non-fiction (13 texts)
• non-academic non-fiction (15 texts)
• prose fiction (13 texts)

Page 39:

Distinctive -ly adverbs of:

• academic non-fiction
– accordingly, essentially, eventually, largely, namely, notably, respectively, surprisingly
• non-academic non-fiction
– effectively, merely, normally, obviously, possibly, specially
• prose fiction
– carefully, quietly, slightly, slowly, softly, surely, truly

Page 40:

largely (academic non-fiction)

Page 41:

it (academic non-fiction)

Page 42:

To conclude …

Page 43:

Working with subcorpora can allow

• study/comparison of forms/meanings in particular texts/text-types
• better-focussed reading practice
• more appropriate reference tools for particular tasks
• more focussed browsing

Page 44:

Subcorpora

• may not be representative (but nor is most language learning data)

• are good for forming hypotheses to be tested more widely

• will allow more interesting uses when extracted from a larger corpus

Page 45:

Making your own provides

• better preparation and motivation for corpus use
• more critical awareness
• lots to talk about

Page 46:

Enjoy!