Corpora and Teaching Language for Special Purposes
Jan 17, 2016
Terms
• język specjalistyczny
• sublanguage, scientific (technical) language, special subject language, special language, language for special purposes (LSP)
• Fachsprache
• langue de spécialité
Definition
A language for special purposes (język specjalistyczny) is "a particular form of the national language, adapted to describing a given branch of knowledge or technology as precisely as possible. It differs from the supradialectal language primarily in its specialist vocabulary, which often contains many internationalisms, and in its syntax, as well as in the frequency with which particular grammatical forms are used (e.g. the passive in technical German)" (Szulc 1984:104, in Roszkowski 2005:174)
Characteristic features
• A defined subject range, restricted to a given domain
• Lexical, syntactic and semantic restrictions
• Non-standard rules of grammar
• High frequency of certain constructions
• Specific features of text structure
• Use of special symbols
(Lehrberger 1986, 1982: 102, in Roszkowski 2005:176)
• Different levels of specialisation
– English for academic purposes vs. English for chemistry
– English for medical purposes vs. English for urology
Teaching LSP
• Lack of textbooks or other teaching materials
• Teacher’s lack of expertise
How can corpora help?
• Syllabus design
• Production of teaching materials: authentic examples
– Text types
– Vocabulary and grammar in context
Types of corpora helpful in teaching LSP
• subcorpora (sections of general corpora) e.g. section J of Brown or LOB
• special corpora (e.g. Air Traffic Control Corpus)
• monolingual corpora
• bilingual corpora
– comparable
– parallel
http://devoted.to/corpora
Example 1
Academic language
Verb frequencies
                                          LOB_J                    LOB
                                        raw  per 100,000        raw  per 100,000        LL
corpus size                         164,016                      1M
VB* (any form of lexical verb)       13,625        8,310     71,607        7,160   244.05*
VBG (present participle, gerund)      1,493          910     12,979        1,300   185.10*
VBN (past participle)                 5,286        3,220     27,031        2,700   131.43*
MD (modal auxiliary)                  2,136        1,300     14,861        1,490    33.64*
BE* (any form of the verb be)         8,797        5,360     42,979        4,300   341.03*
Academic Word List
• developed at the School of Linguistics and Applied Language Studies at Victoria University of Wellington, New Zealand
• purpose: to be used by teachers and students working alone as part of a preparation for tertiary level study
• Coxhead, Averil (2000) A New Academic Word List. TESOL Quarterly, 34(2): 213-238
• http://www.victoria.ac.nz/lals/resources/academicwordlist/
Academic corpus
• a written corpus of academic English
• approximately 3,500,000 running words
• divided into four faculty sections: Arts, Commerce, Law and Science
• Each of these faculty sections contained approximately 875,000 running words.
• Each faculty section was divided into seven subject areas of approximately 125,000 running words.
Arts          Commerce              Law                           Science
Education     Accounting            Constitutional Law            Biology
History       Economics             Criminal Law                  Chemistry
Linguistics   Finance               Family Law and Medico-Legal   Computer Science
Philosophy    Industrial Relations  International Law             Geography
Politics      Management            Pure Commercial Law           Geology
Psychology    Marketing             Quasi-Commercial Law          Mathematics
Sociology     Public Policy         Rights and Remedies           Physics
Selection criteria
• The AWL contains 570 word families
• Range. The AWL families had to occur in the Arts, Commerce, Law and Science faculty sections of the Academic Corpus (described above). The word families also had to occur in over half of the 28 subject areas of the Academic Corpus. Just over 94% of the words in the AWL occur in 20 or more subject areas. This principle ensures that the words in the AWL are useful for all learners, no matter what their area of study or what combination of subjects they take at tertiary level.
• Frequency. The AWL families had to occur over 100 times in the 3,500,000 word Academic Corpus in order to be considered for inclusion in the list. This principle ensures that the words will be met a reasonable number of times in academic texts.
• Uniformity of frequency. The AWL families had to occur a minimum of 10 times in each faculty of the Academic Corpus to be considered for inclusion in the list. This principle ensures that the vocabulary is useful for all learners.
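The range, frequency and uniformity criteria can be expressed as a simple filter over per-faculty counts. This is only a sketch of the thresholds stated above, not the procedure actually used to build the AWL; the function name, the data layout and the example counts are all hypothetical.

```python
FACULTIES = ("Arts", "Commerce", "Law", "Science")
TOTAL_SUBJECT_AREAS = 28


def meets_awl_criteria(faculty_counts, subject_areas_with_family):
    """Check one word family against the three stated criteria.

    faculty_counts: mapping of faculty name to occurrences of the family
    subject_areas_with_family: number of the 28 subject areas it occurs in
    """
    counts = [faculty_counts.get(faculty, 0) for faculty in FACULTIES]
    # Frequency: over 100 occurrences in the whole Academic Corpus
    frequency_ok = sum(counts) > 100
    # Uniformity: at least 10 occurrences in each faculty section
    # (this also guarantees presence in all four faculties)
    uniformity_ok = all(count >= 10 for count in counts)
    # Range: over half of the 28 subject areas, i.e. 15 or more
    range_ok = subject_areas_with_family > TOTAL_SUBJECT_AREAS / 2
    return frequency_ok and uniformity_ok and range_ok


# Hypothetical counts, for illustration only
general_academic = {"Arts": 40, "Commerce": 35, "Law": 22, "Science": 30}
technical_term = {"Arts": 0, "Commerce": 2, "Law": 1, "Science": 300}

print(meets_awl_criteria(general_academic, 21))  # passes all three checks
print(meets_awl_criteria(technical_term, 6))     # frequent, but narrow range
```

The second example shows why frequency alone is not enough: a term can be very frequent overall and still fail on uniformity and range.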
Words excluded from the AWL
• Words occurring in the first 2,000 words of English. The AWL assumes knowledge of West's General Service List (GSL) (1953) as the basic vocabulary any learner should have before starting to learn academic vocabulary.
• Narrow range words. Words which occurred in fewer than 4 faculty sections of the Academic Corpus or which occurred in fewer than 15 of the 28 subject areas of the Academic Corpus were excluded because they had narrow range. Technical or specialist words often have narrow range and were excluded on this basis.
• Proper nouns. The names of places, people, countries, for example, New Zealand, Jim Bolger and Wellington were excluded from the list.
• Latin forms. Some of the most common Latin forms in the Academic Corpus were et al, etc, ie, and ibid.
Example 2
The language of contracts
(Roszkowski 2005:184)
Size              30,000 words
Number of texts   62
Medium            written texts
Topic             obligations
Text type         standard contract
Author            texts drawn up by experts
Language          Polish
Publication date  1997-2003
Patterns of word frequencies
[Figure: word frequency chart; axis label: Frequency]
Example 3
English for Biology
Corpus of English for Biology
• source: Flowerdew 2001
• purpose: course design
– the selection and grading of items for the syllabus
– the authentic contextualization of these items in teaching materials
• teaching situation:
– science students at the English-medium Sultan Qaboos University, Oman; foundation course
– science course and English course
• corpus: transcripts of 25 hours of lectures in biology + supporting reading material
• size 104,483 tokens
Example (Flowerdew 2001:79)
Important connectors:
so (1183), then (266), first (103), next (72)
Less important connectors:
however (13), therefore (11), thus (8), finally (8), as a result (4)
Connectors not appearing at all:
what is more, whatsomore, furthermore, nonetheless, nevertheless, hence, consequently, in conclusion, in contrast, after that
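Counts of this kind can be obtained by searching a corpus for a fixed list of connectors. The sketch below is a rough approximation, not the method used in the source: it counts every occurrence of the word form, so "so" as an intensifier ("so large") is counted together with "so" as a connector, and a real study would need manual checking. The sample text and the function name are made up for illustration.

```python
import re
from collections import Counter


def count_connectors(text, connectors):
    """Count whole-word, case-insensitive matches of each connector."""
    lowered = text.lower()
    counts = Counter()
    for connector in connectors:
        # \b keeps "so" from matching inside words such as "also"
        pattern = r"\b" + re.escape(connector.lower()) + r"\b"
        counts[connector] = len(re.findall(pattern, lowered))
    return counts


# Made-up lecture-style sample, for illustration only
sample = (
    "So the cell divides. Then it grows, so the tissue expands. "
    "However, growth stops; as a result the organ is stable."
)
connectors = ["so", "then", "however", "as a result", "furthermore"]
counts = count_connectors(sample, connectors)

for connector in connectors:
    print(connector, counts[connector])
```

Connectors with a count of zero are kept in the result, which makes it easy to list the items that never appear in the corpus.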
Example 4
My project
English for Urology
Design criteria
size                35978                            47107
no of texts         20                               20
content             urology                          urology
language            Polish, native                   English, native and non-native
medium              written                          written
text type           long summary of research paper   research paper
publication status  Urologia Polska (prestigious)    British Journal of Urology (prestigious)
author              expert                           expert
audience            experts                          experts
date                2005                             2005
Analysis
• wordlist
• keyword list
• list of clusters
• collocations
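A wordlist and a list of clusters (recurrent word sequences) can be produced with very little code. The sketch below is a minimal illustration and assumes nothing about the software actually used in the project; the tokenizer is deliberately simple, the sample sentence is invented, and clusters are counted across sentence boundaries, which a dedicated concordancer may handle differently.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Matches runs of letters (including non-ASCII letters such as
    Polish diacritics), optionally joined by hyphens.
    """
    return re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())


def wordlist(tokens):
    """Frequency list of individual word forms."""
    return Counter(tokens)


def clusters(tokens, n=2):
    """Frequency list of n-word sequences (clusters)."""
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Invented sample, for illustration only
sample = (
    "Radical prostatectomy was performed. "
    "Radical prostatectomy remains the standard treatment."
)
tokens = tokenize(sample)
words = wordlist(tokens)
bigrams = clusters(tokens, 2)

print(words.most_common(3))
print(bigrams.most_common(3))
```

Recurrent clusters such as multi-word terms surface at the top of the cluster list, which is what makes this step useful for terminology extraction.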
Terminology extraction
• wordlists
• clusters
• keywords
http://www.ling.pl/
Examples of comparable corpora
• Multext
– http://www.lpl.univ-aix.fr/projects/multext/
• Multext-East
– http://nl.ijs.si/ME/