The Cambridge Learner Corpus, English Profile, the Sketch Engine and the Kelly Project. Adam Kilgarriff, Lexical Computing Ltd. http://www.sketchengine.co.uk
Mar 31, 2015

Page 1

The Cambridge Learner Corpus, English Profile, the Sketch Engine and the Kelly Project

Adam Kilgarriff
Lexical Computing Ltd

http://www.sketchengine.co.uk

Page 2

The Cambridge Learner Corpus, English Profile, the Sketch Engine, “freely available”, HOO, DANTE and the Kelly Project

Adam Kilgarriff
Lexical Computing Ltd

http://www.sketchengine.co.uk

Page 3

Cambridge Learner Corpus (CLC)

• Since 1993
  – Nearly as old as CECL
• Leading resource (like ICLE)
• CUP and Cambridge ESOL
  – For better dictionaries, ELT courses, tests
  – Material: all from exams (levels A1-C2)
• 45m words; 22m error-tagged
• 200,000 scripts, 138 L1s, 203 nationalities

Page 4

English Profile

• From 2006
• Cambridge Univ, Univ Press, ESOL (+ others)
• Goal
  – For each CEFR level, find characteristic lexis and grammar
  – Main resource: CLC
  – Talk on Thursday
    • Theodora Alexopoulou, Helen Yannakoudakis

Page 5

Flyers

Page 6

Sketch Engine

• Leading corpus tool
• Word sketches
  – One-page summaries of a word’s grammatical and collocational behaviour
• In use at OUP, CUP, Collins, Macmillan, INL …
• 42 languages
  – Over 150 corpora
  – Since May including CHILDES: demo
  – Since last year including CLC

Page 7

Error-coded corpus

• Challenge
  – Intuitive to search for x
    • anywhere
    • only where it is part of an error
    • only where it is part of a correction

  where x can be a word, phrase, grammar pattern …

Requirement for CLC in Sketch Engine
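
The three search scopes can be illustrated with a minimal sketch. It assumes a hypothetical token representation in which each token carries a layer label ("text", "error" or "correction"); this is not the CLC's actual error-coding scheme or the Sketch Engine's query syntax, and the annotated fragment is invented for illustration. It also assumes that "anywhere" covers all three layers.

```python
from dataclasses import dataclass

# Hypothetical scope names mirroring the three cases on the slide.
SCOPES = {"anywhere", "error", "correction"}


@dataclass(frozen=True)
class Token:
    word: str
    layer: str  # assumed labels: "text", "error" or "correction"


def search(tokens, x, scope="anywhere"):
    """Return the positions of word x, restricted to the given scope."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    target = x.lower()
    return [
        i
        for i, tok in enumerate(tokens)
        if tok.word.lower() == target
        and (scope == "anywhere" or tok.layer == scope)
    ]


# Invented toy annotation: one learner form and one suggested correction.
tokens = [
    Token("those", "text"),
    Token("informations", "error"),
    Token("information", "correction"),
    Token("to", "text"),
]

in_error = search(tokens, "informations", scope="error")
in_correction = search(tokens, "information", scope="correction")
anywhere = search(tokens, "those")
```

Here x is a single word; phrases and grammar patterns would need a richer matcher than this equality test.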

Page 8

Sample text

• We will only use those informations to take part of our guest survey

Page 9

Error-coded corpora in SkE

• demo

Page 10

freely available

Page 11

freely available

Free (MED online)
Sense 1: not costing anything
Sense 4: not limited by rules … anyone can get hold of it??

Page 12

freely available

Free (MED online)
Sense 1: not costing anything
Sense 4: not limited by rules … anyone can get hold of it??

Available
To download onto your com
To use

Page 13

Case studies

              ICLE      CLC
Money         225 EUR   No
To everyone   Yes       Cambridge author/collab
To download   ?         No
To use        Yes       Yes

Page 14

Non-geeks

• Access is important, not download
• Web is beautiful

Page 15

HOO / HOO+

• Helping Our Own
• HOO: English-NNS NLP researchers
  – Developer = user: motivation
  – Shared task/competitive evaluation
    • Organisers define task and prepare ‘gold standard’
    • Teams participate by running their software over test data
• Six teams (incl Tübingen), workshop end Sept

Page 16

HOO+ (2012)

• Probably
  – English: learner data from CLC
  – Other languages?
  – Tasks
    • Essay scoring
    • Determiner, preposition errors
    • ?
• http://www.clt.mq.edu.au/research/projects/hoo/

Page 17

DANTE

Highlights of English lexicography

Page 18

DANTE

Page 19

DANTE

Page 20

DANTE

Page 21

DANTE

http://webdante.com

Flyers

Page 22

The KELLY Project

• EU Lifelong Learning Project
• Word cards
  – 9 languages
    • Arabic Chinese English Greek Italian Norwegian Polish Russian Swedish
  – All 36 pairs
  – Words the learner should know (at A1 … C2)
• Partners
  • Stockholm Univ, Gothenburg Univ, Adam Mickiewicz Univ, ILSP Athens, CNR Pisa, Oslo Univ, Leeds Univ, Keewords A/S, Lexical Computing Ltd

Page 23

Interesting question

• How close to purely corpus-based can a pedagogic list be?

Page 24

Method

• Take a general corpus
• Count
• Review, add, delete using other lists and corpora
• Translate (72 directed language pairs)
• Words not in source list which occur in translations:
  – Review source list
• http://kelly.sketchengine.co.uk
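
A minimal sketch of the "Count" step, the pair arithmetic and the source-list check follows. The corpus and word lists are invented toy data, and the helper names `top_words` and `missing_from_source` are hypothetical, not part of any Kelly tooling. With 9 languages there are 9 × 8 / 2 = 36 unordered pairs and 9 × 8 = 72 directed pairs, which the code confirms.

```python
from collections import Counter
from itertools import combinations, permutations

# The nine project languages, as listed on the Kelly slide.
LANGUAGES = [
    "Arabic", "Chinese", "English", "Greek", "Italian",
    "Norwegian", "Polish", "Russian", "Swedish",
]


def top_words(tokens, n):
    """'Count' step: the n most frequent word forms in a tokenised corpus."""
    counts = Counter(tok.lower() for tok in tokens)
    return [word for word, _ in counts.most_common(n)]


def missing_from_source(source_list, incoming_translations):
    """Words that occur as translations into a language but are absent
    from that language's own list: candidates for review."""
    return sorted(set(incoming_translations) - set(source_list))


# Pair arithmetic from the slides.
unordered_pairs = list(combinations(LANGUAGES, 2))  # 36 language pairs
directed_pairs = list(permutations(LANGUAGES, 2))   # 72 translation directions

# Invented toy corpus and lists, for illustration only.
toy_corpus = "the sun rises and the music plays and the sun sets".split()
most_frequent = top_words(toy_corpus, 1)
to_review = missing_from_source(
    ["sun", "music"],
    ["sun", "library", "music", "library"],
)
```

A real frequency list would be built over lemmas in a large general corpus rather than raw word forms in a single sentence; the review steps on the slide remain manual.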

Page 25

• Symmetrical pairs: <x,y> and <y,x>
• Cliques:
  – For x, y, z, … all pairs are symmetrical
  – 9-language cliques (English members)
    • hospital library music sun theory
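
The two definitions can be sketched as follows. The data structure is an assumption: a dictionary mapping each (language, word) entry to the set of entries it was translated to. The translation table is invented toy data covering three languages, not Kelly output.

```python
from itertools import combinations


def is_symmetric(translations, x, y):
    """<x,y> and <y,x>: x translates to y and y translates back to x."""
    return y in translations.get(x, set()) and x in translations.get(y, set())


def is_clique(translations, words):
    """A clique: every pair drawn from the word set is symmetrical."""
    return all(
        is_symmetric(translations, x, y) for x, y in combinations(words, 2)
    )


# Invented toy translation table: (language code, word) -> translations.
translations = {
    ("en", "sun"): {("sv", "sol"), ("it", "sole")},
    ("sv", "sol"): {("en", "sun"), ("it", "sole")},
    ("it", "sole"): {("en", "sun"), ("sv", "sol")},
    ("en", "light"): {("sv", "ljus")},
    ("sv", "ljus"): {("en", "candle")},
}

sun_words = [("en", "sun"), ("sv", "sol"), ("it", "sole")]
sun_is_clique = is_clique(translations, sun_words)
light_is_symmetric = is_symmetric(
    translations, ("en", "light"), ("sv", "ljus")
)
```

A 9-language clique in this representation would be a set of nine entries, one per language, for which `is_clique` holds.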

Page 26

Homage