Resources for Using Corpus Linguistics in ELT. Kenji Kitao, Doshisha University, Kyoto, Japan; S. Kathleen Kitao, Doshisha Women’s College, Kyoto, Japan.

Post on 22-Dec-2015


Resources for Using Corpus Linguistics in ELT

Kenji Kitao
Doshisha University
Kyoto, Japan

S. Kathleen Kitao
Doshisha Women’s College
Kyoto, Japan

I. Presentation
   A. Corpus linguistics and corpus-related resources
   B. Online resources for corpus linguistics
      1. Types of resources
      2. Examples of resources
   C. Using corpus-related resources for language teaching

II. Application
   A. Assigned tasks
   B. Free exploration

Presentation

Definitions

Corpus (Latin for “body”)
   A text or collection of texts
   Now generally used to refer to machine-readable texts

Corpus linguistics
   The use of empirical data from a corpus to study language usage and to find patterns of usage by analyzing actual language use

Requirements

A corpus
   Can be a single text or a large collection of texts
   Larger corpora provide more reliable results if the purpose is to make generalizations about language use
   Balanced corpora
      A variety of genres, including academic writing, newspapers, fiction, and spoken language
   Specialized corpora
      Examples
         Academic writing
         Texts by learners of English, sometimes with a specific native language
      Teachers can develop their own corpora
         Newspaper articles
         Learners’ texts

Corpus analysis tool(s)
   Types
      Tools with specific corpora
      Tools that can be used with any text or collection of texts
         General
            Word, Excel, etc.
         Specialized
            Count words
            Find examples of specific words or parts of speech
            Analyze word frequencies
            Evaluate readability

Online Corpora
   Free to all users
   Available for a fee or for purchase
   Available only to restricted users

In this presentation, we will introduce only resources that are free.

Using Corpus Linguistics for Language Teaching
   Technology has become widespread and accessible
   Larger, more powerful computers that can analyze large amounts of data quickly are available
   Many corpus-related resources have become available
   Language teachers and learners can use corpora

Corpus-related Internet resources
   1. General resources on corpus linguistics
   2. Vocabulary frequency lists and frequency level checkers
   3. Online corpora, concordancers, and other text-analysis software
   4. E-texts
   5. Information about using corpus linguistics for language teaching

Resources for Corpus Linguistics
http://www.cis.doshisha.ac.jp/kkitao/library/resource/corpus/corpus.htm

1. General resources on corpus linguistics
   Web sites that help orient users to corpora and to what is available online for teachers to use in the classroom or in preparing materials

The Compleat Lexical Tutor
http://www.lextutor.ca/
   Resources for data-driven learning, including concordancers for various corpora and concordancers into which one can enter texts
   Tutorials, resources for teachers, resources for research

Bookmarks for Corpus Linguists
http://devoted.to/corpora/
   Extensive annotated list of links related to corpus linguistics, including
      software tools
      frequency lists
      papers and articles
      English and non-English corpora

2. Vocabulary frequency lists, frequency level checkers, and n-gram extractors

Frequency lists
   Words used most frequently in English and thus words that are most useful for students to know
   Often divided into sublists

Specialized word lists

Academic Word List
http://www.nottingham.ac.uk/~alzsh3/acvocab/index.htm
   List includes 570 headwords with their word families
   Site includes an explanation of the word lists, the words in each sublist, suggestions for using the list, and a gapmaker that can be used to produce gap-filling exercises

5000 Vocabulary List for Visiting Scholars in the USA
http://www.paulnoll.com/Books/5000-Words/index.html
   This is a list of the 5,000 words determined by the Chinese Academy of Sciences for scholars who need to go abroad for research or advanced studies in the USA. The words are listed in alphabetical order and have sample sentences and examples. There are an additional three thousand words.

Frequency-level checkers
   Produce a list of the words in a text at each level of difficulty
   Help a teacher understand how difficult the vocabulary in a reading passage is and which words students at different levels of proficiency might need to learn

N-gram finders
   Find groups of n words

JACET 8000 Word List
http://www01.tcp-ip.or.jp/~shin/j8web/j8web.cgi
   On this web page, you can enter a text and get a list of the words that appear in the text at each of the eight levels of the JACET list. You also get statistics about what percentage of the words (both types and tokens) occur at each of the eight levels.
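The kind of profile a frequency-level checker reports can be sketched in a few lines of Python. This is only an illustration, not the JACET tool's actual method: the tiny `LEVELS` dictionary is a made-up stand-in for a real graded word list, the function names are hypothetical, and words are matched by exact form rather than by word family.

```python
import re
from collections import Counter

# Hypothetical miniature graded list: word -> level (1 = most frequent).
# A real checker would load a full list such as an eight-level word list.
LEVELS = {"the": 1, "a": 1, "cat": 1, "sat": 1, "on": 1, "mat": 2}

def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def level_profile(text, levels=LEVELS):
    """Return {level: (percent_of_types, percent_of_tokens)}.

    Level 0 collects words that are not in the list at all.
    """
    tokens = tokenize(text)
    if not tokens:
        return {}
    types = set(tokens)
    token_counts = Counter(levels.get(t, 0) for t in tokens)
    type_counts = Counter(levels.get(t, 0) for t in types)
    return {
        lvl: (100 * type_counts[lvl] / len(types),
              100 * token_counts[lvl] / len(tokens))
        for lvl in sorted(token_counts)
    }

profile = level_profile("The cat sat on the mat. The cat purred.")
for lvl, (type_pct, token_pct) in profile.items():
    print(f"level {lvl}: {type_pct:.1f}% of types, {token_pct:.1f}% of tokens")
```

Words that land in level 0 (here, "purred") are the ones a teacher might want to pre-teach or gloss.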

N-gram finders

Online text analysis tool
http://www.online-utility.org/text/analyzer.jsp
   Finds the most frequent groups of 2 and 3 words, and produces a list of all the words, their occurrences, and their percentages

Advanced Search – Explore N-grams from the BNC
http://pie.usna.edu/explore.html
   Produces lists of n-grams, based on the number of words and occurrences you specify

N-gram phrase extractor
http://www.er.uqam.ca/nobel/r21270/cgi-bin/tuples/u_extract.html
   Produces a KWIC list of n-grams
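For readers who want to see what an n-gram finder does, here is a minimal Python sketch. It is not the method used by any of the sites above; the function names and the `min_count` parameter are illustrative assumptions, and the tokenizer simply ignores punctuation, so n-grams can span sentence or clause boundaries.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(text, n):
    """Count every run of n consecutive words in the text."""
    tokens = tokenize(text)
    # zip over n shifted copies of the token list to form the n-grams
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)

def frequent_ngrams(text, n, min_count=2):
    """Return (ngram, count) pairs seen at least min_count times, most frequent first."""
    return [(g, c) for g, c in ngram_counts(text, n).most_common() if c >= min_count]

sample = "on the other hand, on the one hand, on the whole"
bigrams = frequent_ngrams(sample, 2)
print(bigrams)
```

Raising `min_count` mirrors the option some tools offer of specifying a minimum number of occurrences.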

3. Online corpora, concordancers, and other text-analysis software

Concordancers
   A type of software for searching corpora
   Produce a list of key words in context (KWIC), that is, search terms with the words that come before and after them
   May be able to search for parts of speech, e.g., “take” followed by a preposition
   May be able to search for two words that are not next to each other
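A KWIC listing is simple enough to sketch in Python. The following is a rough illustration under stated assumptions, not how any particular concordancer works: the `kwic` function and its `width` parameter are hypothetical, context is measured in words rather than characters, and only exact word forms are matched.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, key, right) tuples for each occurrence of keyword.

    left and right hold up to `width` words of context on each side.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

text = "Heavy rain fell all night. The rain stopped at dawn, but light rain returned."
for left, key, right in kwic(text, "rain"):
    # Right-align the left context so the key words line up in a column
    print(f"{left:>20}  {key}  {right}")
```

Lining the key word up in a column is what lets learners scan the words immediately before and after it, for example to notice which adjectives precede "rain".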

Corpora (or parts of corpora) may have spoken language, written language, American English, British English, academic English, and so on.

Specialized corpora include:
   parallel corpora, which have the same texts in different languages (to compare the same passages in different languages)
   learner corpora, which have students’ writing or speaking (to help identify learners’ problems or to study characteristics of their writing)

Examples of concordancers

Turbo Lingo
http://www.staff.amu.edu.pl/~sipkadan/lingo.htm
   Can enter a text or URL and get a KWIC list, average sentence length, a word frequency list, and other analyses

VIEW (Variation in English Words and Phrases)
http://view.byu.edu/
   Concordancing tool for the British National Corpus, the Corpus of Contemporary American English, and a Time magazine corpus, plus non-English corpora
   A powerful concordancing tool
   Has a useful tutorial
      Click on what you want to do to see samples of searches
      For example, if you want to learn to use wildcards, click on that word, and you will see several examples. You choose the type of search you want to do, and the search is automatically filled in. You can revise it based on what you want to do.

Types of searches
   Search by exact word, exact phrase, wildcard, or part of speech
      For example, mysterious
   Use ? or * as a wildcard
      For example, * point *
   Search for an exact word plus a part of speech
      For example, white [n*]
   Compare usage of semantically related words
      {sheer/total} [n*]
   Search for surrounding words
      Nouns that follow the verb “wrap”
   Limit the search to one register
      Adjectives in tabloid newspapers
   Compare usage between registers, e.g., news and speaking
      we [verb] that: ACAD vs SPOKEN
   Find words with similar, more general, or more specific meanings
      Similar words to “small”
      More general than “shriek”
      More specific than “woman”

BNCweb
   To log in, go to: http://bncweb.lancs.ac.uk/bncwebSignup/
   For information, go to: http://bncweb.info

On BNCweb, you can do simple searches, or restrict your search to written or spoken texts or to a particular type of text.

Form your own subcorpora.

Make frequency lists based on criteria you specify

For example, make a frequency list of all adverbs that end in –ly in spoken texts.

Look at your query history and save queries to use again.

See your results in a sentence view or a KWIC view.

Get a list of collocates, with statistics about their frequency.

Get information about what type of texts the search term was found in.

Online concordancer
http://www.lextutor.ca/concordancers/concord_e.html
   Can search a variety of corpora, including the Brown Corpus, the British National Corpus (written and spoken), a learner corpus, etc.
   Produces a KWIC list for a given word and a list of collocates and their frequency

WebCorp
http://www.webcorp.org.uk/
   Uses the Internet as a corpus and produces KWIC lists as well as providing other information

Comparing two texts

Text Lex Compare
http://www.lextutor.ca/text_lex_compare/
   Allows users to enter two texts and get lists of:
      Words unique to the first text
      Words shared by the two texts
      Words unique to the second text
   Useful for helping a teacher find the new words in a new text
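The three lists that a two-text comparison produces map directly onto set operations. The Python sketch below is an illustration only and makes no claim about how Text Lex Compare itself is implemented; `compare_texts` and `word_types` are hypothetical names, and words are compared by exact lowercase form, so "run" and "runs" count as different words.

```python
import re

def word_types(text):
    """Return the set of distinct lowercase word forms in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))

def compare_texts(first, second):
    """Split the vocabulary of two texts into unique and shared words."""
    a, b = word_types(first), word_types(second)
    return {
        "only_first": sorted(a - b),   # words unique to the first text
        "shared": sorted(a & b),       # words that occur in both texts
        "only_second": sorted(b - a),  # words unique to the second text
    }

result = compare_texts("The cat sat on the mat", "The dog sat on the rug")
print(result)
```

If the first text is one the class has already read and the second is the next reading, `only_second` is the list of words the students have not yet met.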

Specialized corpora (a few examples)

Spoken English
   Corpus swb (American English telephone conversations)
   http://www.ldc.upenn.edu/cgi-bin/lol/swb/speechcorpus?&corpus=swb

Technical English
   e-Xplore Technical English
   https://learn.sz.htwk-leipzig.de/wc/main.php

Parallel corpora
   CRATER Multilingual Aligned Annotated Corpus
   http://www.comp.lancs.ac.uk/linguistics/crater/corpus.html

Academic English
   Michigan Corpus of Academic Spoken English
   http://quod.lib.umich.edu/m/micase/
   Some large corpora also have sub-corpora of academic English

Online software to assess readability

Tests of document readability and suggestions on how to improve readability
http://www.online-utility.org/english/readability_test_and_improve.jsp
   Can analyze texts of any length (some online text analysis programs have limits)
   Can enter the text directly or enter a URL, e.g., http://www.cis.doshisha.ac.jp/kkitao/Japan/shimoda/s1.htm
   Provides statistics:
      Number of characters
      Number of words
      Number of sentences
      Number of syllables/word
      Number of words/sentence
   Calculates readability indexes, including
      Gunning Fog Index
      Coleman-Liau Index
      Flesch-Kincaid Grade Level
      Flesch Reading Ease
   Lists sentences that might be rewritten to improve readability.
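The four indexes named above are all computed from a handful of counts. The Python sketch below uses the commonly published formulas and takes the counts as inputs; it is not the online tool's code, and the tool may count syllables, sentences, or "complex" words (usually words of three or more syllables) differently, so its scores can differ. The function and parameter names are illustrative.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Higher scores mean easier text (roughly a 0-100 scale)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    """Approximate U.S. school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def gunning_fog(words, sentences, complex_words):
    """Grade level based on sentence length and share of complex words."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

def coleman_liau(letters, words, sentences):
    """Grade level based on letters and sentences per 100 words."""
    letters_per_100 = 100 * letters / words
    sentences_per_100 = 100 * sentences / words
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

# Made-up counts for a 100-word passage, for illustration only
counts = dict(words=100, sentences=5, syllables=150)
print(round(flesch_reading_ease(**counts), 2))
print(round(flesch_kincaid_grade(**counts), 2))
print(round(gunning_fog(words=100, sentences=5, complex_words=10), 2))
print(round(coleman_liau(letters=450, words=100, sentences=5), 2))
```

Because every formula rewards shorter sentences and shorter words, the sentences a tool flags for rewriting are typically the longest ones.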

4. E-texts
   In some cases, teachers or students may want to develop their own corpora. There are large numbers of e-texts available.

Project Gutenberg
http://www.gutenberg.org/wiki/Main_Page
   Large collection of downloadable fiction and non-fiction

Internet Public Library: Online Texts
http://www.ipl.org/div/subject/browse/hum60.60.00/
   A large number of online texts on a wide variety of subjects

Drew’s Script-o-Rama
http://www.script-o-rama.com/oldindex.shtml
   A website with a large number of scripts of movies and TV programs

American Rhetoric Online Speech Bank
http://www.americanrhetoric.com/speechbank.htm
   A website with a large collection of speeches

5. Information about using corpus linguistics for language teaching
   Corpus-related websites specifically for language teachers

Learner corpora and SLA Research
http://leo.meikai.ac.jp/%7Etono/
   Links to learner corpora made up of language produced by speakers of various languages, links to useful tools, a bibliography, and so on

Corpus linguistics: What it is and how it can be applied to teaching
http://iteslj.org/Articles/Krieger-Corpus.html
   An article about corpus linguistics and how it can be used in the language classroom

Classroom Application

Two types of uses of corpus-related resources
   “Low contact” uses – teacher uses resources to help in teaching, e.g., to find the difficult words in a reading passage; students do not actually see the corpus
   “High contact” uses – students use the corpora themselves to learn about language, e.g., to find out which adjectives collocate with “rain”

“Data-driven learning” is a high contact use of corpus-related resources.
   Using corpora to deduce rules of grammar or usage, e.g., to determine if a word’s connotation is positive or negative

Advantages of data-driven learning
   Focus on authentic language
   Encouragement of students to deduce
   Real, exploratory activities rather than drills
   A learner-centered activity

Web sites with suggestions for data-driven learning activities

How to use concordances in teaching English: Some suggestions
http://www.nsknet.or.jp/%7Epeterr-s/concordancing/usingconcs.html

Data-Driven Learning (DDL): the idea
http://www.ecml.at/projects/voll/rationale_and_help/booklets/resources/menu_booklet_ddl.htm
   An explanation of DDL, with examples

Activities

Use a corpus to check grammar
http://www.lextutor.ca/grammar_tester/
   Use the concordancer in the bottom frame to check the grammar of the sample sentences in the top half

Use a concordancer to make a gap-filler or a quiz
http://www.lextutor.ca/multi_conc/
http://www.nottingham.ac.uk/~alzsh3/acvocab/awlgapmaker.htm

Find examples of a word and group them according to meaning
http://www.lextutor.ca/concordancers/concord_e.html
   Examples
      party
      run

Use the results of a KWIC search to determine how synonyms are used differently
http://www.lextutor.ca/concordancers/concord_e.html
   Examples
      travel, journey, trip, voyage, tour
      confident, fearless, pushy, upbeat, self-reliant

Use the Academic Word List web page to enter a text and make a gap-filling activity
http://www.nottingham.ac.uk/~alzsh3/acvocab/awlgapmaker.htm
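Teachers who want to build a gap-filling exercise from their own word list could start from a sketch like the one below. It is a simplified illustration, not the AWL gapmaker's method: `make_gap_fill` is a hypothetical function, the target words are chosen by hand, and only exact word forms are blanked out, whereas a word-list tool would normally match whole word families.

```python
import re

def make_gap_fill(text, targets, blank="______"):
    """Replace each target word with a numbered blank.

    Returns the gapped text and the answer key in order of appearance.
    """
    targets = {t.lower() for t in targets}
    answers = []

    def replace(match):
        word = match.group(0)
        if word.lower() in targets:
            answers.append(word)
            return f"({len(answers)}) {blank}"
        return word

    gapped = re.sub(r"[A-Za-z']+", replace, text)
    return gapped, answers

passage = "Researchers analyze data to identify patterns. The data indicate a trend."
gapped, answers = make_gap_fill(passage, ["analyze", "identify", "indicate"])
print(gapped)
print("Answers:", ", ".join(answers))
```

Printing the answers in shuffled order under the passage would turn the key into a word bank for students.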

Resources for Corpus Linguistics
http://www.cis.doshisha.ac.jp/kkitao/library/resource/corpus/corpus.htm

Thank you
