By Erkan Karabacak

A Brief Intro to Corpus Techniques in ELT Research

Jan 14, 2016

Page 1: A Brief Intro to  Corpus Techniques  in ELT Research

By Erkan Karabacak

Page 2: A Brief Intro to  Corpus Techniques  in ELT Research

Several Important Buzzwords

• corpus
• corpora
• concordancer
• keyword in context (KWIC)

Page 3: A Brief Intro to  Corpus Techniques  in ELT Research

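corpus: a collection of texts

corpora: the plural of corpus

concordancer: a search engine for texts; it finds every occurrence of a word or phrase in a corpus

keyword in context (KWIC): a list of every occurrence of the search word, each shown with the words around it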
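
To make the last two definitions concrete, here is a minimal sketch of a KWIC search in Python. It is only an illustration, not how any particular concordancer works; the function name `kwic`, the `width` parameter, and the sample sentence are all made up for this example.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`."""
    lines = []
    # \b limits matches to whole words; IGNORECASE treats "Corpus" and "corpus" alike.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in one column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Hypothetical sample text, invented for this sketch.
sample = ("A corpus is a collection of texts. A concordancer searches "
          "the corpus and shows each hit in context.")
for line in kwic(sample, "corpus", width=25):
    print(line)
```

Each printed line shows one hit with up to 25 characters of context on either side, which is roughly what a concordance window displays.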

Page 4: A Brief Intro to  Corpus Techniques  in ELT Research

• The Oxford Text Archive: www.ota.ox.ac.uk
• Warwick Centre for Applied Linguistics: http://www2.warwick.ac.uk/fac/soc/al/
• Open American National Corpus: http://americannationalcorpus.org/OANC/OANC-1.0.1-UTF8.zip

Page 5: A Brief Intro to  Corpus Techniques  in ELT Research

• Monoconc
• Wordsmith Tools
• Concordance
• Simple Concordance Program
• WConcord
• TextStat
• AntConc

Page 6: A Brief Intro to  Corpus Techniques  in ELT Research
Page 7: A Brief Intro to  Corpus Techniques  in ELT Research

• AntConc is our concordancer.
• We will use BASE and texts from our students as our corpora.
• We will do some simple analyses to answer some language-related questions.

Page 8: A Brief Intro to  Corpus Techniques  in ELT Research

Open the read-me file online and read it.

http://www.antlab.sci.waseda.ac.jp/software/README_antconc3.2.1.txt

Page 9: A Brief Intro to  Corpus Techniques  in ELT Research

By Laurence Anthony, Waseda University, Tokyo

http://www.antlab.sci.waseda.ac.jp/software/antconc3.2.1w.exe

Open Google and search for “download AntConc”

Page 10: A Brief Intro to  Corpus Techniques  in ELT Research

We need a collection of texts (corpus) of an adequate size.

Page 11: A Brief Intro to  Corpus Techniques  in ELT Research

• BASE was developed at the Universities of Warwick and Reading.

• It is a collection of transcripts of lectures and seminars recorded at these two UK universities during the period 1998-2005.

• The recordings were made in a variety of university departments, which fall into four broad disciplinary groups.

• Each group is represented by 40 lectures and 10 seminars.

Page 12: A Brief Intro to  Corpus Techniques  in ELT Research

• Arts and Humanities
• Life and Medical Sciences
• Physical Sciences
• Social Studies and Sciences

Page 13: A Brief Intro to  Corpus Techniques  in ELT Research

• Arts and Humanities, 40 text files, untagged

• Life and Medical Sciences, 40 text files, untagged

• Physical Sciences

• Social Studies and Sciences

Page 14: A Brief Intro to  Corpus Techniques  in ELT Research

…now what are you reading now he asked as i put down the book and reached for my jacket i was labouring over Troilus and Criseyde reading an essay on Criseyde's character you love this rubbish eh he laughed you'll end up an old professor wanking by the fireside putting aside your pipe and warming up your hand first i should say this is not a autobiographical work [laughter] in any s-, in any way right [laughter] er

sm0003: you've said that before [laughter]

nm0001: warming up your hand first [laughter] i looked at him sternly only a joke man he said with mocking reassurance only a joke i sat on the bus deep in thought trying to work out why she should have betrayed him so easily why after all those pure shy exchanges the secret glances

Page 15: A Brief Intro to  Corpus Techniques  in ELT Research

Open AntConc → File → Open Files → Select the files you would like to analyze (hold Ctrl or Shift while clicking with the left mouse button) → Open

You will see the selected files in the left window (titled “corpus files”).

Page 16: A Brief Intro to  Corpus Techniques  in ELT Research

What is the size of the corpus? (How many words (tokens) are there?)

How many different words (types) are there?

Click “Word List” → Make your selections → Start
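
For comparison, the same two questions can be answered with a few lines of Python. This is a rough sketch under stated assumptions: the tokenizer below (lowercased runs of letters, with internal apostrophes allowed) is a simplification chosen for this example, so its counts may differ from what AntConc reports, and the helper names `tokenize`, `word_list`, and `load_texts` are made up here.

```python
import re
from collections import Counter
from pathlib import Path

def tokenize(text):
    # Assumed token definition: lowercase letter runs, apostrophes allowed inside.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def word_list(texts):
    """Count how often each word form occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

def load_texts(folder):
    # Hypothetical helper: read every .txt file in a folder of corpus files.
    return [p.read_text(encoding="utf-8", errors="replace")
            for p in sorted(Path(folder).glob("*.txt"))]

# Two short strings stand in for corpus files in this sketch.
texts = ["you've said that before", "only a joke man he said only a joke"]
counts = word_list(texts)
tokens = sum(counts.values())   # corpus size: every running word
types = len(counts)             # number of different words
print(tokens, types)
```

With real data, `word_list(load_texts("my_corpus_folder"))` would take the place of the two sample strings; the folder name is a placeholder.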

Page 17: A Brief Intro to  Corpus Techniques  in ELT Research

Let’s search for a single word.

Page 18: A Brief Intro to  Corpus Techniques  in ELT Research

• Which lectures are the most fun?
• Which lectures did not have a lesson plan?
• What part of speech mostly follows a pause?
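
One possible way to approach the first question, sketched in Python: count the `[laughter]` markers that appear in the transcripts and normalize by lecture length. Treating laughter as a measure of "fun" is an assumption made for this sketch, not something the transcripts themselves define, and the function name, lecture names, and sample strings are hypothetical.

```python
import re

def laughter_rate(text):
    """Return [laughter] markers per 1,000 word tokens (0.0 for an empty text)."""
    laughs = text.count("[laughter]")
    # Remove bracketed markers first so they are not counted as words.
    words = re.findall(r"[A-Za-z']+", re.sub(r"\[[^\]]*\]", " ", text))
    return 1000 * laughs / len(words) if words else 0.0

# Hypothetical mini-lectures; real input would be whole transcript files.
lectures = {
    "lecture_a": "this is not a autobiographical work [laughter] in any way right [laughter] er",
    "lecture_b": "we need a collection of texts of an adequate size",
}
ranking = sorted(lectures, key=lambda name: laughter_rate(lectures[name]), reverse=True)
print(ranking)
```

Normalizing per 1,000 words keeps a long lecture from ranking first simply because it is long.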

Page 19: A Brief Intro to  Corpus Techniques  in ELT Research

<struct type="tok" from="29" to="34">
  <feat name="base" value="right" />
  <feat name="msd" value="NN" />
</struct>

<struct type="tok" from="34" to="35">
  <feat name="base" value="," />
  <feat name="msd" value="," />
</struct>

<struct type="tok" from="36" to="40">
  <feat name="msd" value="DT" />
  <feat name="base" value="this" />
  <feat name="affix" value="" />
</struct>

<struct type="tok" from="41" to="43">
  <feat name="msd" value="VBZ" />
  <feat name="base" value="be" />
  <feat name="affix" value="s" />
</struct>
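
Tagged tokens like these can be read with Python's standard XML parser. The sketch below is a minimal illustration: the `<root>` wrapper is added here only so that the fragment forms one well-formed document, and the function name `read_tokens` is made up. Real annotation files may use XML namespaces or extra elements that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# The four tokens from the slide, wrapped in a hypothetical <root> element.
fragment = """<root>
<struct type="tok" from="29" to="34">
  <feat name="base" value="right" />
  <feat name="msd" value="NN" />
</struct>
<struct type="tok" from="34" to="35">
  <feat name="base" value="," />
  <feat name="msd" value="," />
</struct>
<struct type="tok" from="36" to="40">
  <feat name="msd" value="DT" />
  <feat name="base" value="this" />
  <feat name="affix" value="" />
</struct>
<struct type="tok" from="41" to="43">
  <feat name="msd" value="VBZ" />
  <feat name="base" value="be" />
  <feat name="affix" value="s" />
</struct>
</root>"""

def read_tokens(xml_text):
    """Return (from, to, base form, part-of-speech tag) for each token."""
    tokens = []
    for struct in ET.fromstring(xml_text).iter("struct"):
        if struct.get("type") != "tok":
            continue
        # Collect the name/value pairs, whatever order the <feat> elements come in.
        feats = {f.get("name"): f.get("value") for f in struct.findall("feat")}
        tokens.append((int(struct.get("from")), int(struct.get("to")),
                       feats.get("base"), feats.get("msd")))
    return tokens

for token in read_tokens(fragment):
    print(token)
```

The `from` and `to` attributes are character offsets into the text, `base` is the base form, and `msd` is the part-of-speech tag, so a question such as "what part of speech follows X?" becomes a matter of looking at the tag of the next token.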

Page 20: A Brief Intro to  Corpus Techniques  in ELT Research

Let’s say we want to create a dictionary of medical terms.

Our analysis corpus is BAWE Life and Medical Sciences.
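
One common way to find candidate terms is to compare word frequencies in the analysis corpus with those in a reference corpus and keep the words that are unusually frequent in the analysis corpus. The Python sketch below uses the log-likelihood statistic for that comparison. It is an illustration under assumptions, not a description of AntConc's internals: the function names and the two tiny "corpora" are invented, and the comparison method itself is an assumed choice.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """a, b: frequency of a word in the analysis / reference corpus.
    c, d: total size (tokens) of the analysis / reference corpus."""
    e1 = c * (a + b) / (c + d)   # expected frequency in the analysis corpus
    e2 = d * (a + b) / (c + d)   # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study, reference, top=10):
    """Rank words that are relatively more frequent in `study` than in `reference`."""
    c, d = sum(study.values()), sum(reference.values())
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c > b / d:   # skip words that are more typical of the reference corpus
            scored.append((log_likelihood(a, b, c, d), word))
    return [word for _, word in sorted(scored, reverse=True)[:top]]

# Invented miniature corpora, for illustration only.
study = Counter("the patient has a tumour and the tumour is in the liver".split())
reference = Counter("the cat sat on the mat and the dog sat on the rug".split())
print(keywords(study, reference, top=3))
```

Very common words such as "the" drop out because they are just as frequent in the reference corpus, which is the point of using one.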

Page 21: A Brief Intro to  Corpus Techniques  in ELT Research

What are the 10 lexical bundles most frequently used by American students?

What are the 10 lexical bundles most frequently used by Chinese students?
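
Lexical bundles are recurrent word sequences, so a first approximation is to count n-grams. The Python sketch below counts four-word sequences; the bundle length of four, the function name, and the two sample sentences are assumptions made for this example. Published bundle studies often add a minimum frequency and require that a bundle occur in several different texts, which this sketch leaves out.

```python
import re
from collections import Counter

def lexical_bundles(texts, n=4, top=10):
    """Return the `top` most frequent n-word sequences with their counts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
        # The window slides inside one text only, so no bundle spans two papers.
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts.most_common(top)

# Invented sentences standing in for student papers.
papers = [
    "On the other hand, the results were clear.",
    "The data, on the other hand, were not clear at all.",
]
print(lexical_bundles(papers, n=4, top=1))
```

Running the function once on the American students' papers and once on the Chinese students' papers gives two lists that can be compared side by side.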

Page 22: A Brief Intro to  Corpus Techniques  in ELT Research

Of course, AntConc is not enough for every type of analysis.

An applied linguist who wishes to analyze large amounts of language data should not only know several application programs but also learn a programming language, such as Perl.

Page 23: A Brief Intro to  Corpus Techniques  in ELT Research

We can create a diachronic corpus from our students' papers and observe their development.

We can tag texts for their part of speech or for other information.

We can automatically compile corpora from online sources.
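
As a small illustration of the last point, the Python sketch below turns a web page into plain text that could be saved as a corpus file. It uses only the standard library. Downloading the pages is left out, and the class name, function name, and sample page are invented for this example.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the visible text of a page, skipping scripts and style sheets."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside <script> and <style>.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

# Invented sample page.
page = ("<html><head><style>p {color: red}</style></head><body>"
        "<h1>Corpus news</h1><p>We need a corpus of an <b>adequate</b> size.</p>"
        "</body></html>")
print(html_to_text(page))
```

Real pages also contain menus, advertisements, and other text that usually needs to be filtered out before the result is a usable corpus file.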

Page 24: A Brief Intro to  Corpus Techniques  in ELT Research

We can do all of the above for other languages (Turkish, Chinese, Russian, and so on).

We can do EVERYTHING a linguist might need to do with texts.