Resources for Using Corpus Linguistics in ELT Kenji Kitao Doshisha University Kyoto, Japan S. Kathleen Kitao Doshisha Women’s College Kyoto, Japan


Dec 22, 2015


Resources for Using Corpus Linguistics in ELT

Kenji Kitao, Doshisha University

Kyoto, Japan

S. Kathleen Kitao, Doshisha Women’s College

Kyoto, Japan


I. Presentation
A. Corpus linguistics and corpus-related resources
B. Online resources for corpus linguistics
1. Types of resources
2. Examples of resources
C. Using corpus-related resources for language teaching


II. Application
A. Assigned tasks
B. Free exploration


Presentation

Definitions

Corpus (Latin for “body”): a text or collection of texts; now generally used to refer to machine-readable texts


Corpus linguistics: the use of empirical data from a corpus to study language and to find patterns of usage by analyzing how language is actually used


Requirements

A corpus

Can be a single text or a large collection of texts

Larger corpora provide more reliable results if the purpose is to make generalizations about language use


Balanced corpora: contain a variety of genres, including academic writing, newspapers, fiction, and spoken language


Specialized corpora

Examples:
Academic writing
Texts by learners of English, sometimes with a specific native language

Teachers can develop their own corpora, e.g., of:
Newspaper articles
Learners’ texts


Corpus analysis tool(s)

Types:
Tools that come with specific corpora
Tools that can be used with any text or collection of texts
General: Word, Excel, etc.
Specialized: count words, find examples of specific words or parts of speech, analyze word frequencies, evaluate readability
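The first and third specialized functions, counting words and analyzing word frequencies, can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool works: it assumes a word is a run of letters, optionally joined by an apostrophe or hyphen, so real tools may report slightly different counts. "Tokens" are all running words; "types" are the distinct words.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens.

    Assumption: a word is a run of letters, optionally joined by internal
    apostrophes or hyphens ("don't", "self-reliant"); numbers are ignored.
    """
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def frequency_list(text: str) -> list[tuple[str, int]]:
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common()


sample = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(sample)
print("tokens (running words):", len(tokens))
print("types (different words):", len(set(tokens)))
for word, count in frequency_list(sample)[:3]:
    print(word, count)
```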


Online corpora may be:
Free to all users
Available for a fee or for purchase
Available only to restricted users

In this presentation, we introduce only resources that are free.


Using Corpus Linguistics for Language Teaching
Technology has become widespread and accessible.
Larger, more powerful computers that can analyze large amounts of data quickly are available.
Many corpus-related resources have become available.
Language teachers and learners can use corpora.


Corpus-related Internet resources
1. General resources on corpus linguistics
2. Vocabulary frequency lists and frequency level checkers
3. Online corpora, concordancers, and other text-analysis software
4. E-texts
5. Information about using corpus linguistics for language teaching


Resources for Corpus Linguistics: http://www.cis.doshisha.ac.jp/kkitao/library/resource/corpus/corpus.htm


1. General resources on corpus linguistics

Web sites that help orient users to corpora and to what is available online for teachers to use in the classroom or in preparing materials


The Compleat Lexical Tutor http://www.lextutor.ca/

Resources for data-driven learning, including concordancers for various corpora and concordancers into which users can enter their own texts

Tutorials, resources for teachers, and resources for research


Bookmarks for Corpus Linguists http://devoted.to/corpora/

An extensive annotated list of links related to corpus linguistics, including software tools, frequency lists, papers and articles, and English and non-English corpora


2. Vocabulary frequency lists, frequency level checkers, and n-gram extractors

Frequency lists

The words used most frequently in English, and thus the words that are most useful for students to know

Often divided into sublists


Specialized word lists

Academic Word List

http://www.nottingham.ac.uk/~alzsh3/acvocab/index.htm

List includes 570 headwords with their word families

Site includes an explanation of the word lists, the words in each sublist, suggestions for using the list, and a gapmaker that can be used to produce gap-filling exercises


5000 Vocabulary List for Visiting Scholars in the USA

http://www.paulnoll.com/Books/5000-Words/index.html

This is a list of 5000 words chosen by the Chinese Academy of Sciences for scholars who need to go to the USA for research or advanced studies. The words are listed in alphabetical order, with sample sentences and examples. An additional 3,000 words are also included.


Frequency-level checkers
Produce a list of the words in a text at each level of difficulty
Help a teacher understand how difficult the vocabulary in a reading passage is and which words students at different levels of proficiency might need to learn

N-gram finders
Find groups of n consecutive words (e.g., pairs or triples of words)


JACET 8000 Word List
http://www01.tcp-ip.or.jp/~shin/j8web/j8web.cgi

On this web page, you can enter a text and get a list of the words that appear in the text at each of the eight levels of the JACET list. You also get statistics about what percentage of the words (both types and tokens) occur at each of the eight levels.
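The type and token percentages described above can be sketched in Python. LEVELS below is a hypothetical eight-word stand-in for a real leveled list such as JACET 8000, and words not on the list are grouped as "off-list"; the actual JACET tool may treat inflected forms and unlisted words differently.

```python
import re
from collections import Counter

# Hypothetical, tiny stand-in for a leveled word list such as JACET 8000
# (the real list assigns 8,000 words to levels 1-8).
LEVELS = {
    "the": 1, "a": 1, "on": 1, "cat": 1,
    "mat": 2, "sat": 2,
    "feline": 5,
}


def level_profile(text, levels):
    """Percentage of tokens and of types at each level.

    Words missing from `levels` are grouped under "off-list".
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    word_types = set(tokens)
    token_counts = Counter(levels.get(t, "off-list") for t in tokens)
    type_counts = Counter(levels.get(t, "off-list") for t in word_types)
    return {
        level: {
            "tokens_pct": 100 * token_counts[level] / len(tokens),
            "types_pct": 100 * type_counts[level] / len(word_types),
        }
        for level in token_counts
    }


profile = level_profile("The cat sat on the mat. The feline purred.", LEVELS)
for level, pct in profile.items():
    print(level, f"tokens {pct['tokens_pct']:.1f}%", f"types {pct['types_pct']:.1f}%")
```

In this sample, "the" appears three times, so level 1 accounts for a larger share of the tokens than of the types, which is exactly the difference the two statistics are meant to show.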


N-gram finders

Online text analysis tool

http://www.online-utility.org/text/analyzer.jsp

Finds the most frequent groups of 2 and 3 words, and also produces a list of all the words with their number of occurrences and their percentages
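Finding the most frequent groups of 2 and 3 words amounts to sliding a window over the text and counting each sequence. This Python sketch assumes simple lowercase word tokens and discards punctuation, so, unlike some tools, it lets n-grams cross sentence boundaries.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return consecutive n-word sequences as tuples."""
    return zip(*(tokens[i:] for i in range(n)))


def top_ngrams(text, n, k=5):
    """Return the k most frequent n-grams as (phrase, count, percent) triples.

    Simplification: punctuation is discarded, so n-grams may span
    sentence boundaries.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(ngrams(tokens, n))
    total = sum(counts.values())
    return [(" ".join(g), c, 100 * c / total) for g, c in counts.most_common(k)]


text = "On the other hand, on the other side of the road, the other car stopped."
for n in (2, 3):
    for phrase, count, pct in top_ngrams(text, n, k=3):
        print(n, phrase, count, f"{pct:.1f}%")
```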


Advanced Search – Explore N-grams from the BNC

http://pie.usna.edu/explore.html
Produces lists of n-grams based on the number of words and the number of occurrences you specify

N-gram phrase extractor
http://www.er.uqam.ca/nobel/r21270/cgi-bin/tuples/u_extract.html
Produces a KWIC list of n-grams


3. Online corpora, concordancers, and other text-analysis software

Concordancers
A type of software for searching corpora
Produce a list of key words in context (KWIC), that is, search terms together with the words that come before and after them
May be able to search for parts of speech, e.g., “take” followed by a preposition
May be able to search for two words that are not next to each other
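A KWIC list like the one described above can be produced with a few lines of Python. This minimal sketch assumes a whole-word, case-insensitive match and a fixed number of characters of context on each side; it right-aligns the left context so that the search terms line up in one column, which is what makes concordance lines easy to scan.

```python
import re


def kwic(text, keyword, width=30):
    """Return key-word-in-context lines for every whole-word match of `keyword`.

    Matching is case-insensitive; `width` is the number of characters of
    context kept on each side.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        # Collapse line breaks and extra spaces in the context windows.
        left = " ".join(text[max(0, m.start() - width):m.start()].split())
        right = " ".join(text[m.end():m.end() + width].split())
        # Right-align the left context so that the key words form a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


sample = ("Take a seat. We usually take notes in class. "
          "They took a break, then decided to take part.")
for line in kwic(sample, "take"):
    print(line)
```

Because the match is on whole words, "took" is not listed; finding all forms of a verb would need a lemmatized or tagged corpus.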


Corpora (or parts of corpora) may contain spoken language, written language, American English, British English, academic English, and so on.

Specialized corpora include:
Parallel corpora, which contain the same texts in different languages (to compare the same passages across languages)
Learner corpora, which contain students’ writing or speech (to help identify learners’ problems or to study the characteristics of their writing)


Examples of concordancers

Turbo Lingo

http://www.staff.amu.edu.pl/~sipkadan/lingo.htm

Users can enter a text or a URL and get a KWIC list, the average sentence length, a word frequency list, and other analyses


VIEW (Variation in English Words and Phrases)
http://view.byu.edu/
A concordancing tool for the British National Corpus, the Corpus of Contemporary American English, and a Time magazine corpus, plus non-English corpora


A powerful concordancing tool with a useful tutorial

Click on what you want to do to see sample searches.

For example, if you want to learn to use wildcards, click on that word, and you will see several examples. When you choose the type of search you want to do, the search is filled in automatically, and you can then revise it to suit your needs.


Types of searches
Search by exact word, exact phrase, wildcard, or part of speech
For example, mysterious
Use ? or * as a wildcard
For example, * point *
Search for an exact word plus a part of speech
For example, white [n*] (white followed by any noun)
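The wildcard idea can be illustrated with Python's standard fnmatch module. The conventions in this sketch are assumptions modeled on the examples above, not the exact syntax of VIEW: a lone * stands for any one word, * inside a word stands for any characters, and ? stands for a single character. Part-of-speech searches such as white [n*] require a tagged corpus and are not attempted here.

```python
import re
from fnmatch import fnmatchcase


def wildcard_search(text, query):
    """Return word sequences in `text` that match a space-separated wildcard query.

    Assumed conventions for this sketch (not the exact syntax of any site):
      *   alone matches any one word; inside a word it matches any characters
      ?   matches exactly one character
    Part-of-speech queries such as [n*] need a tagged corpus and are not handled.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pattern = query.lower().split()
    if not pattern:
        return []
    n = len(pattern)
    hits = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(fnmatchcase(tok, pat) for tok, pat in zip(window, pattern)):
            hits.append(" ".join(window))
    return hits


sample = "At this point we stopped. The main point is clear. Point taken."
print(wildcard_search(sample, "* point *"))
print(wildcard_search("A mysterious, mysteriously quiet man.", "myster?ous*"))
```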


Compare usage of semantically related words
For example, {sheer/total} [n*]
Search for surrounding words
For example, nouns that follow the verb “wrap”
Limit the search to one register
For example, adjectives in tabloid newspapers


Compare usage between registers, e.g., news and speaking
For example, we [verb] that: ACAD vs. SPOKEN
Find words with similar, more general, or more specific meanings
For example, words similar to “small”, more general than “shriek”, or more specific than “woman”


BNCweb
To log in, go to: http://bncweb.lancs.ac.uk/bncwebSignup/
For information, go to: http://bncweb.info


On BNCweb, you can:
Do simple searches.
Restrict your search to written or spoken texts, or to particular types of text.
Form your own subcorpora.


Make frequency lists based on criteria you specify

For example, make a frequency list of all adverbs that end in -ly in spoken texts.

Look at your query history and save queries to use again.
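Restricting a search to spoken texts and making a frequency list of -ly words can be sketched as below. The three-text CORPUS and its "mode" labels are hypothetical stand-ins for the kind of metadata BNCweb filters on. Because this sketch has no part-of-speech tags, it matches the spelling -ly only and so also counts non-adverbs such as "friendly"; a tagged corpus such as the BNC allows the search to be restricted to words tagged as adverbs.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: each text carries a "mode" label, the kind of
# metadata (spoken vs. written) that lets a tool build subcorpora.
CORPUS = [
    {"mode": "spoken", "text": "I really think it went quickly, really."},
    {"mode": "written", "text": "The results were statistically significant."},
    {"mode": "spoken", "text": "She is friendly and only arrived recently."},
]


def subcorpus(corpus, **criteria):
    """Keep the texts whose metadata match every key=value criterion."""
    return [doc for doc in corpus
            if all(doc.get(key) == value for key, value in criteria.items())]


def ly_frequency_list(docs):
    """Count words ending in -ly, most frequent first.

    Spelling-based approximation only: without POS tags it also catches
    adjectives such as "friendly".
    """
    counts = Counter()
    for doc in docs:
        words = re.findall(r"[a-z]+", doc["text"].lower())
        counts.update(w for w in words if w.endswith("ly"))
    return counts.most_common()


print(ly_frequency_list(subcorpus(CORPUS, mode="spoken")))
```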


See your results in a sentence view or a KWIC view.

Get a list of collocates, with statistics about their frequency.

Get information about what type of texts the search term was found in.


Online concordancer
http://www.lextutor.ca/concordancers/concord_e.html
Can search a variety of corpora, including the Brown Corpus, the British National Corpus (written and spoken), a learner corpus, etc.
Produces a KWIC list for a given word and a list of collocates with their frequencies
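A list of collocates with their frequencies can be approximated by counting the words that occur within a few words of the search term. This Python sketch uses raw co-occurrence counts within an assumed span of two words on each side; tools such as BNCweb also offer association statistics (for example, mutual information or log-likelihood), which are not computed here.

```python
import re
from collections import Counter


def collocates(text, node, span=2):
    """Count words occurring within `span` words to the left or right of `node`.

    Raw co-occurrence frequencies only; no association statistics.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


sample = "Heavy rain fell. The heavy rain stopped. Light rain is forecast."
print(collocates(sample, "rain").most_common(3))
```

Applied to a large corpus and limited to adjectives, the same idea answers the classroom question of which adjectives collocate with "rain".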


WebCorp
http://www.webcorp.org.uk/
Uses the Internet as a corpus and produces KWIC lists as well as other information


Comparing two texts

Text Lex Compare
http://www.lextutor.ca/text_lex_compare/
Allows users to enter two texts and get lists of:
Words unique to the first text
Words shared by the two texts
Words unique to the second text
Useful for helping a teacher find the new words in a new text
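The three lists that Text Lex Compare produces correspond to set operations on the vocabularies of the two texts. The sketch below compares word forms only; whether the actual tool groups words into families or lemmas is not assumed here.

```python
import re


def compare_texts(first, second):
    """Return (unique to first, shared, unique to second) as sorted word lists."""
    a = set(re.findall(r"[a-z']+", first.lower()))
    b = set(re.findall(r"[a-z']+", second.lower()))
    return sorted(a - b), sorted(a & b), sorted(b - a)


old_text = "The students read a short story about a cat."
new_text = "The students read a long article about climate."
only_old, shared, only_new = compare_texts(old_text, new_text)
print("New words in the new text:", only_new)
```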


Specialized corpora (a few examples)

Spoken English

Corpus swb (American English telephone conversations)

http://www.ldc.upenn.edu/cgi-bin/lol/swb/speechcorpus?&corpus=swb

Technical English
e-Xplore Technical English

https://learn.sz.htwk-leipzig.de/wc/main.php


Parallel corpora
CRATER Multilingual Aligned Annotated Corpus
http://www.comp.lancs.ac.uk/linguistics/crater/corpus.html

Academic English
Michigan Corpus of Academic Spoken English (MICASE)
http://quod.lib.umich.edu/m/micase/
Some large corpora also have subcorpora of academic English.


Online software to assess readability
Tests of document readability and suggestions for how to improve readability
http://www.online-utility.org/english/readability_test_and_improve.jsp
Can analyze texts of any length (some online text analysis programs have length limits)


Can enter the text directly or enter a URL, e.g.,
http://www.cis.doshisha.ac.jp/kkitao/Japan/shimoda/s1.htm
Provides statistics:
Number of characters
Number of words
Number of sentences
Number of syllables per word
Number of words per sentence


Calculates readability indexes, including:
Gunning Fog Index
Coleman-Liau Index
Flesch-Kincaid Grade Level
Flesch Reading Ease

Lists sentences that might be rewritten to improve readability
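The statistics and indexes listed above are computed from a few basic counts. This Python sketch uses the standard published formulas for the four indexes, but its syllable counter is a rough vowel-group heuristic and its sentence splitter only looks for ".", "!" and "?", so its scores will not exactly match those of the online tool. The Gunning Fog calculation treats every word of three or more syllables as complex, a simplification of the original definition.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, ignoring a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(1, n)


def readability(text):
    """Compute four readability indexes from basic text statistics."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = [count_syllables(w) for w in words]
    letters = sum(ch.isalpha() for w in words for ch in w)

    wps = len(words) / len(sentences)                 # words per sentence
    spw = sum(syllables) / len(words)                 # syllables per word
    complex_ratio = sum(s >= 3 for s in syllables) / len(words)
    L = 100 * letters / len(words)                    # letters per 100 words
    S = 100 * len(sentences) / len(words)             # sentences per 100 words

    return {
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "gunning_fog": 0.4 * (wps + 100 * complex_ratio),
        "coleman_liau": 0.0588 * L - 0.296 * S - 15.8,
    }


for name, score in readability("The cat sat on the mat.").items():
    print(f"{name}: {score:.2f}")
```

Higher Flesch Reading Ease scores mean easier text, while the other three indexes are expressed as approximate school grade levels, so a very simple sentence like the sample can even produce a negative grade.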


4. E-texts
In some cases, teachers or students may want to develop their own corpora. Large numbers of e-texts are available.

Project Gutenberg
http://www.gutenberg.org/wiki/Main_Page
A large collection of downloadable fiction and non-fiction
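Before e-texts can be analyzed, they have to be collected into a corpus. This sketch assumes the texts have been saved as UTF-8 .txt files in one folder; it reads them into a dictionary and, for Project Gutenberg files, keeps only the text between the "*** START OF" and "*** END OF" marker lines that such files usually contain. If the markers are missing, the whole file is kept.

```python
from pathlib import Path


def strip_gutenberg_boilerplate(text):
    """Keep only the lines between the '*** START OF' and '*** END OF' markers.

    If either marker is missing, the text is returned unchanged.
    """
    lines = text.splitlines()
    start = next((i for i, line in enumerate(lines) if line.startswith("*** START OF")), None)
    end = next((i for i, line in enumerate(lines) if line.startswith("*** END OF")), None)
    if start is None or end is None or end <= start:
        return text
    return "\n".join(lines[start + 1:end]).strip()


def load_corpus(folder):
    """Read every .txt file in `folder` into a {file name: text} dictionary."""
    corpus = {}
    for path in sorted(Path(folder).glob("*.txt")):
        raw = path.read_text(encoding="utf-8", errors="replace")
        corpus[path.name] = strip_gutenberg_boilerplate(raw)
    return corpus
```

For example, `load_corpus("my_texts")` would load every .txt file in a folder named my_texts (a hypothetical name); the resulting texts can then be passed to any of the analyses above.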


Internet Public Library: Online Texts
http://www.ipl.org/div/subject/browse/hum60.60.00/
A large number of online texts on a wide variety of subjects

Drew’s Script-o-Rama
http://www.script-o-rama.com/oldindex.shtml
A website with a large number of scripts of movies and TV programs

American Rhetoric Online Speech Bank
http://www.americanrhetoric.com/speechbank.htm
A website with a large collection of speeches


5. Information about using corpus linguistics for language teaching
Corpus-related websites specifically for language teachers

Learner corpora and SLA Research
http://leo.meikai.ac.jp/%7Etono/
Links to learner corpora made up of language produced by speakers of various native languages, links to useful tools, a bibliography, and so on


Corpus linguistics: What it is and how it can be applied to teaching

http://iteslj.org/Articles/Krieger-Corpus.html

An article about corpus linguistics and how it can be used in the language classroom


Classroom Application
Two types of uses of corpus-related resources:
“Low contact” uses: the teacher uses the resources to help in teaching, e.g., to find the difficult words in a reading passage; students do not actually see the corpus
“High contact” uses: students use the corpora themselves to learn about language, e.g., to find out which adjectives collocate with “rain”


“Data-driven learning” is a high contact use of corpus-related resources.

Using corpora to deduce rules of grammar or usage, e.g., to determine if a word’s connotation is positive or negative

Advantages of data-driven learning:
Focus on authentic language
Encourages students to work out rules for themselves
Real, exploratory activities rather than drills
A learner-centered activity


Web sites with suggestions for data-driven learning activities

How to use concordances in teaching English: Some suggestions
http://www.nsknet.or.jp/%7Epeterr-s/concordancing/usingconcs.html


Data-Driven Learning (DDL): the idea

http://www.ecml.at/projects/voll/rationale_and_help/booklets/resources/menu_booklet_ddl.htm

An explanation of DDL, with examples


Activities

Use a corpus to check grammar

http://www.lextutor.ca/grammar_tester/

Use the concordancer in the bottom frame to check the grammar of the sample sentences in the top half


Use a concordancer to make a gap-filler or a quiz
http://www.lextutor.ca/multi_conc/
http://www.nottingham.ac.uk/~alzsh3/acvocab/awlgapmaker.htm


Find examples of a word and group them according to meaning
Examples (http://www.lextutor.ca/concordancers/concord_e.html): party, run


Use the results of a KWIC search to determine how synonyms are used differently
Examples (http://www.lextutor.ca/concordancers/concord_e.html):
travel, journey, trip, voyage, tour
confident, fearless, pushy, upbeat, self-reliant


Use the Academic Word List web page to enter a text and make a gap-filling activity
http://www.nottingham.ac.uk/~alzsh3/acvocab/awlgapmaker.htm
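A gap-filling exercise of this kind can be sketched by blanking out every word on a target list and keeping an answer key. TARGETS here is a hypothetical four-word list, and matching is on exact word forms, so "Researchers" is not blanked even though "research" is on the list; a tool based on the Academic Word List would work with whole word families.

```python
import re

# Hypothetical target list; a real gapmaker would use Academic Word List families.
TARGETS = {"analyze", "data", "research", "significant"}


def make_gap_fill(text, targets):
    """Replace target words with numbered blanks; return (exercise, answer key)."""
    answers = []

    def blank(match):
        word = match.group(0)
        if word.lower() in targets:
            answers.append(word)
            return f"({len(answers)}) ________"
        return word

    exercise = re.sub(r"[A-Za-z]+", blank, text)
    return exercise, answers


exercise, key = make_gap_fill("Researchers analyze data. The data were significant.", TARGETS)
print(exercise)
for number, answer in enumerate(key, start=1):
    print(number, answer)
```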


Resources for Corpus Linguistics: http://www.cis.doshisha.ac.jp/kkitao/library/resource/corpus/corpus.htm

Thank you