Making useful wordlists for ELT
Topical vocabulary from the WWW
Simon Smith & Scott Sommers
Ming Chuan University, Taipei
Adam Kilgarriff, Lexical Computing Ltd, UK
Generous support from National Science Council, Taiwan
Dec 19, 2015
Outline
• Importance of learning natural English
• Wordlists in English learning
• Making relevant wordlists
• Using two corpus analysis tools
– WebBootCat
– Sketch Engine
• Conclusions and future plans
The problem
• Learning non-authentic English
– It’s raining cats and dogs!
– Long time no see!
• In Taiwan, all students learn these
• They may believe they are authentic
• But English speakers hardly use them!
Word and phrase lists
• Students must learn vocabulary
• It is best to learn vocabulary through practice:
– Reading
– Speaking to American people
– Interacting in the language
• That is difficult for Asian students
• In Taiwan, students must learn vocabulary from lists
From the MOE
• 6000 word high school list
– Probably useful for policy makers
– May be useful for teachers
– Not useful for learners
• Better to organize wordlists by topic?
So, should we teach vocabulary by topic?
Khmer learning Game © North Illinois University
Unit 1
Getting started at University
Nouns: attendance, course, facilities, helmet, initiative, major, vendor
Verbs: accomplish, consider, improve, tease
Adjectives: challenging, fortunate, impatient, occasional, protective
From the ELC textbook
• It is not easy to make up a good vocabulary list for an abstract topic
• Try these topics:
– Unit 1: Getting started at University
– Unit 2: Family and Hometown
– Unit 3: English and You
• Please
– Choose a topic
– Write down some good keywords
• Better to use a computer to help us!
Getting wordlists from the web
WebBootCat: making corpora from the web
• User chooses some seed words
– For example freshman and university
• WebBootCat
– searches Yahoo for seed words
– throws away lists of numbers, HTML, price lists…
– puts all running text into a corpus
– tags the corpus (noun, verb etc) if required
[Diagram: WebBootCat workflow]
• User enters seed words
• WebBootCat passes query to Yahoo!
• WebBootCat throws away non-data web pages (e.g. 12345 56789 $$$$$ £££££ *&%^)
• WebBootCat puts text pages in corpus
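The filtering step (throwing away lists of numbers, prices and symbols, and keeping running text) could be approximated roughly as follows. This is only a minimal sketch, not how WebBootCat itself is implemented: the function names `is_running_text` and `build_corpus`, the thresholds, and the sample pages are all assumptions made for illustration, and the web search step is left out entirely.

```python
import re

# Assumption: a "word" is a run of ASCII letters, optionally with an apostrophe.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def is_running_text(line, min_words=5, min_alpha_ratio=0.7):
    """Heuristic: keep lines that look like sentences, not numbers or symbols."""
    stripped = line.strip()
    if not stripped:
        return False
    # Too few real words: probably a heading, a price list or a row of numbers.
    if len(WORD_RE.findall(stripped)) < min_words:
        return False
    # Require that most non-space characters are letters.
    non_space = [ch for ch in stripped if not ch.isspace()]
    alpha = sum(ch.isalpha() for ch in non_space)
    return alpha / len(non_space) >= min_alpha_ratio


def build_corpus(pages):
    """Collect the running-text lines from a list of page texts."""
    corpus = []
    for page in pages:
        for line in page.splitlines():
            if is_running_text(line):
                corpus.append(line.strip())
    return corpus


# Hypothetical page texts, standing in for downloaded search results.
pages = [
    "12345 56789 $$$$$ £££££ *&%^",
    "Every freshman at the university must choose a major in the first year.",
    "Price list: 100 200 300 400",
]
corpus = build_corpus(pages)
```

With these sample pages, only the sentence about the freshman passes the filter. The thresholds would need tuning on real web data.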
Advantages of automatic wordlist creation
• contain relevant, topical vocabulary
• created easily and conveniently
• of course, we can select the words manually, from the automatic list!
Disadvantages of manual wordlist creation
• It is difficult to get inspiration to make good wordlists manually.
• Manual wordlists may include rare or unnecessary vocabulary.
Future work: Automatic cloze exercise generation
Q: It’s a ___ day today!
Choose:
(a) toasty
(b) tepid
(c) lukewarm
(d) sunny
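A cloze item like this one could be assembled along the following lines. This is a hedged sketch only: the function name `make_cloze` is hypothetical, the distractors are supplied by hand, and choosing good distractors automatically (the hard part of the future work) is not attempted here.

```python
import random


def make_cloze(sentence, answer, distractors, seed=0):
    """Blank out `answer` in `sentence`; return (question, options, key)."""
    words = sentence.split()
    if answer not in words:
        raise ValueError("answer must appear as a word in the sentence")
    # Replace the target word with a gap.
    question = " ".join("___" if w == answer else w for w in words)
    # Mix the correct answer in with the distractors (seeded for repeatability).
    choices = distractors + [answer]
    random.Random(seed).shuffle(choices)
    labels = "abcdefghijklmnopqrstuvwxyz"
    options = [f"({labels[i]}) {c}" for i, c in enumerate(choices)]
    key = labels[choices.index(answer)]
    return question, options, key


question, options, key = make_cloze(
    "It's a sunny day today!", "sunny", ["toasty", "tepid", "lukewarm"]
)
```

Here `question` is the gapped sentence, `options` holds the four labelled choices in shuffled order, and `key` is the letter of the correct answer.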
Summary: making wordlists
• choose a topic
• get a topic corpus from the web
• extract topic wordlist from it
• use recursive bootstrapping to extend the wordlist
• include multi-word terms in the wordlist
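One plausible way to extract a topic wordlist is to compare word frequencies in the topic corpus against a general reference corpus and keep the words that are unusually frequent in the topic corpus. The slides do not say which scoring method is used, so the smoothed frequency ratio below is an assumption, as are the function names and the tiny sample texts.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def topic_wordlist(topic_text, reference_text, top_n=5, smoothing=100.0):
    """Rank topic-corpus words by a smoothed frequency ratio (assumed scoring)."""
    topic_counts = Counter(tokenize(topic_text))
    ref_counts = Counter(tokenize(reference_text))
    topic_total = sum(topic_counts.values())
    ref_total = sum(ref_counts.values())
    if topic_total == 0 or ref_total == 0:
        return []
    scores = {}
    for word, count in topic_counts.items():
        # Frequencies per million tokens, so corpora of different sizes compare.
        topic_fpm = count * 1_000_000 / topic_total
        ref_fpm = ref_counts.get(word, 0) * 1_000_000 / ref_total
        scores[word] = (topic_fpm + smoothing) / (ref_fpm + smoothing)
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]


# Hypothetical miniature corpora, for illustration only.
topic = ("The freshman chose a major. The freshman met the professor "
         "on campus. The campus is large.")
reference = "The cat sat on the mat. The dog is large. A man chose the red hat."
keywords = topic_wordlist(topic, reference, top_n=5)
```

Common words such as "the" score close to 1 and drop out, while words frequent only in the topic text rise to the top. The top-ranked words could then serve as new seed words for another round of corpus building.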