A UBC textbook corpus to identify EAP target vocabulary Mike Murphy | BC TEAL Conference | May 22, 2015
What’s in it for you?
• A better understanding of the Academic Word List
• Food for thought about the nature of academic written vocabulary
• Technical tips, if you’re new to corpus research
• An engineering-specific academic word list to use in your teaching
Agenda
• Introduction
  • The academic vocabulary controversy / Research questions and design
• Background
  • Lexical thresholds / GSL and AWL / Previous engineering lists
• Methodology Notes
  • Identifying target texts / Notes on logistics
• Results
  • FEC coverage comparisons / Overlap between FEWL and AWL / Word technicality assessment
• Conclusions
  • Is there an academic vocabulary? / Classroom applications of FEWL
INTRODUCTION
• Is there an academic vocabulary?
• Research questions
• Research design
Is there an academic vocabulary?
Yes.
• Paul Nation (2013): notwithstanding the distinctive linguistic features of texts from different academic disciplines, all scholars share a common vocabulary because they perform common communicative functions, such as evaluating research, describing methods, and presenting and discussing data
• Word lists for EAP vocabulary teaching and learning should be derived from general academic corpora.
Is there an academic vocabulary?
No.
• Hyland and Tse (2007): “It is by no means certain that there is a single academic literacy which university students need to acquire to participate in academic environments, and we believe that a perspective which seeks to identify and teach such a vocabulary … does not correspond with the ways language is actually used in academic writing”
• Word lists for EAP vocabulary teaching and learning should be based on discipline-specific corpora.
Research questions
To what extent can there be said to exist a general, non-discipline-specific academic English?
1. How well will Coxhead’s AWL, an EAP word list derived from a general academic corpus, cover the lexis in a corpus of first-year engineering textbooks?
2. Compared to AWL, how well will a word list that is derived from the first-year engineering corpus cover the lexis in the corpus?
3. If the items in the engineering-specific word list differ from the items in AWL, what proportion of the non-AWL items will be too technical to use in non-engineering-specific EAP contexts?
Research design
• Build a corpus of all textbooks used in first-year UBC engineering courses for the 2014–15 academic year
• From this “First-year Engineering Corpus” (FEC), derive a “First-year Engineering Word List” (FEWL) that resembles AWL in these ways:
  • Consists of the 570 most frequent word families in the corpus, divided into 10 sublists by frequency
  • Excludes items from the General Service List (West 1953)
Research design (cont’d)
• Compare coverages of FEC provided by the general and discipline-specific word lists AWL and FEWL
• Assess whether FEWL items that do not overlap with AWL are too technical for certain EAP contexts.
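The list-derivation step in the research design can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used for FEWL (the slides report AntConc and Excel for that work). The function name, the `family_of` mapping, and the toy tokens are hypothetical. The sublist size of 60 is taken from the overlap figures reported in the results (cumulative totals of 60, 120, … 540, 570).

```python
from collections import Counter

def derive_word_list(tokens, family_of, gsl_families, size=570, sublist_size=60):
    """Return frequency-ranked sublists of non-GSL word families.

    tokens       -- lowercase word tokens from the corpus
    family_of    -- hypothetical mapping from a word form to its family headword
    gsl_families -- set of headwords to exclude (standing in for the GSL)
    """
    # Count occurrences per word family; unmapped tokens count as their own family.
    counts = Counter(family_of.get(tok, tok) for tok in tokens)
    # Rank by frequency (most frequent first) and drop GSL families.
    ranked = [fam for fam, _ in counts.most_common() if fam not in gsl_families]
    top = ranked[:size]
    # Slice into consecutive sublists; the last one may be shorter (570 = 9 x 60 + 30).
    return [top[i:i + sublist_size] for i in range(0, len(top), sublist_size)]

# Toy data, invented for illustration only.
tokens = ["the", "function", "functions", "equation", "the", "vector", "function"]
sublists = derive_word_list(tokens, {"functions": "function"}, {"the"},
                            size=3, sublist_size=2)
print(sublists)
```

In a real run, `family_of` would have to come from a word-family resource such as the family lists distributed with Range; that mapping is assumed here rather than shown.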
BACKGROUND
• Lexical thresholds for effective reading
• The General Service List and Academic Word List
• Previous engineering-specific word lists
Lexical thresholds for effective reading
• Research suggests readers must know 95% of a text’s words to be able to guess the unknown words and have reasonable comprehension of the text overall.
  • Liu and Nation (1985)
  • Hirsh and Nation (1992)
  • Laufer (1992)
The General Service List
• Developed in the 1940s mainly to support creation of simplified, difficulty-sequenced ESL readers
• 2,000 word families divided into GSL 1 and GSL 2
• Gives an average of 82% coverage of various written texts (Hirsh and Nation, 1992)
• Has been criticized by some for its age and written focus, but Nation and Waring (1997) claim it is still the best available list of high-frequency English
• A New General Service List is available
AWL: Profile of the corpus
• Based on a 3.5M-token corpus of academic texts
• 4 broad areas—Arts, Commerce, Law, Science—each encompassing 7 disciplines, for a total of 28 disciplines
• Text types represented: Journal articles, textbooks and coursebooks, texts from the ‘Learned and Scientific’ subcorpora of Wellington, Brown, LOB
AWL: Profile of the word list
• Three broad selection criteria for list items
  1. Distinctiveness – vocabulary item does not appear on GSL
  2. Frequency – item appears at least 100 times in corpus as a whole
  3. Range – item appears…
     a) At least 10 times in each of the 4 macro-discipline subcorpora (Arts, Commerce, Law, Science)
     b) At least once in 15 or more of the 28 subject areas subsumed by the four macro-disciplines
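The three selection criteria above amount to a filter over per-subject frequency counts. The sketch below shows one way such a filter might look. It is an illustration under stated assumptions, not Coxhead's implementation: the function name, the subject labels (`Arts-0` and so on), and the sample counts are hypothetical.

```python
def meets_awl_criteria(family, freq_by_subject, area_of, gsl_families,
                       min_total=100, min_per_area=10, min_subjects=15):
    """Check one word family against the distinctiveness, frequency and range criteria.

    freq_by_subject -- occurrences of the family in each subject-area subcorpus
    area_of         -- maps each subject area to its macro-discipline
    """
    # 1. Distinctiveness: the family must not be on the GSL.
    if family in gsl_families:
        return False
    # 2. Frequency: at least min_total occurrences in the corpus as a whole.
    if sum(freq_by_subject.values()) < min_total:
        return False
    # 3a. Range: at least min_per_area occurrences in every macro-discipline.
    per_area = {area: 0 for area in set(area_of.values())}
    for subject, n in freq_by_subject.items():
        per_area[area_of[subject]] += n
    if any(n < min_per_area for n in per_area.values()):
        return False
    # 3b. Range: present in at least min_subjects of the subject areas.
    subjects_with_item = sum(1 for n in freq_by_subject.values() if n > 0)
    return subjects_with_item >= min_subjects

# Hypothetical layout: 4 macro-disciplines x 7 subject areas = 28 subcorpora.
AREAS = ["Arts", "Commerce", "Law", "Science"]
area_of = {f"{area}-{i}": area for area in AREAS for i in range(7)}

# Invented counts: 4 occurrences in each of the 28 subject areas (112 in total).
even_spread = {subject: 4 for subject in area_of}
print(meets_awl_criteria("analyse", even_spread, area_of, {"the"}))
```

Checking the cheapest criterion first (GSL membership) keeps the function short, and each early return corresponds to one numbered criterion on the slide.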
Coverage of Coxhead’s academic corpus provided by GSL and AWL (%)
Subcorpus   AWL    GSL 1   GSL 2   Total
Arts         9.3   73.0    4.4     86.7
Commerce    12.0   71.6    5.2     88.8
Law          9.4   75.0    4.1     88.5
Science      9.1   65.7    5.0     79.8
Previous engineering lists: EngList (Ward, 1999)
• Developed at a Thai university for engineering students
• Unlike AWL, overlaps with GSL
• 2,000 word families, derived from a 1M-token corpus
• Corpus made up of one textbook each from five required first-year engineering courses at his university
• EngList covered 95% of Ward’s engineering corpus
• Ward favours word lists derived from discipline-specific corpora
EngList vs. GSL/UWL (% coverage)
Subject Grouping         Subject                  GSL/UWL (2,836 words)   GSL    Ward’s EngList (2,000 words)
Background Science       Biology                  85.9                    76.5   79.5
                         Chemistry                88.7                    76.1   89.2
                         Physics                  92.5                    81.7   94.3
Background Engineering   Engineering Materials    86.5                    76.0   89.4
                         Engineering Mechanics    92.0                    83.0   97.4
                         Fluid Mechanics          92.1                    79.9   97.4
Specialist Engineering   Mechanical Engineering   80.3                    71.4   84.2
                         Chemical Engineering     92.5                    80.1   91.7
                         Electrical Engineering   94.1                    81.7   96.7
Humanities               Economics                92.6                    82.8   86.1
                         Philosophy               95.0                    87.2   87.3
                         Psychology               91.0                    80.9   84.4
Previous engineering lists: The SEEC list (Mudraya, 2006)
• Also developed in a Thai university context
• 2M-token Student Engineering English Corpus (SEEC)
• 13 complete textbooks used in required undergrad engineering courses
• ‘Keyness’ comparison with reference corpora BoE and BNC
• Most unusually frequent SEEC items that also had good range over 13 subcorpora were high frequency or general academic lexis, NOT technical lexis
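A ‘keyness’ comparison asks whether a word is unusually frequent in a study corpus relative to a reference corpus. The slides do not say which statistic Mudraya used, so the sketch below assumes the log-likelihood measure, which is one common choice in corpus tools. The function name and all frequencies and corpus sizes are invented for illustration.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one word (assumed statistic, see lead-in).

    freq_* -- occurrences of the word in the study / reference corpus
    size_* -- total tokens in the study / reference corpus
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected counts if the word were equally frequent in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Invented numbers: same relative frequency in both corpora vs. over-represented in the study corpus.
print(log_likelihood(10, 1_000, 100, 10_000))
print(log_likelihood(50, 1_000, 100, 10_000))
```

A higher score means a stronger departure from the reference corpus. A range check such as "occurs in at least 5 of 13 subcorpora" would be applied separately to the high-scoring items.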
Previous engineering lists: Mudraya’s observations
• Verbs that scored highest on keyness measures and occurred in at least 5 of 13 subcorpora: act, apply, assume, be, become, calculate, consider, correspond to, define, determine, exert, give, illustrate, indicate, locate, obtain, occur, require, show, sketch, solve, substitute, use
• Distinctive lexis in her engineering corpus is not engineering-specific
  • We can identify a general, pan-academic English, and it, not discipline-specific lexis, should be the focus of EAP studies
METHODOLOGY NOTES
• Identifying target texts
• Logistical Notes
Identifying target courses/readings
Subject Area   Course
Comp. Sci.     • Introduction to Computation in Engineering Design
Chemistry      • Chemistry for Engineering
English        • Strategies for University Writing
Math           • Differential Calculus with Applications to Physical Science and Engineering
               • Integral Calculus with Applications to Physical Science and Engineering
               • Linear Systems
Physics        • Introductory Physics for Engineers I
               • Introductory Physics for Engineers II
               • Mechanics
Notes on logistics
• Software used to…
  • Convert .PDFs of textbook pages to text files: PDF2TXT, v. 3.2
  • Clean text files: EditPad Lite 7
  • Make frequency counts: AntConc concordancing software (v. 3.4)
  • Calculate coverages: Microsoft Excel
  • Obtain electronic copies of AWL and GSL: Paul Nation’s Range program
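For readers new to corpus research, the coverage calculation itself is simple: the percentage of running words (tokens) in a text that belong to a given word list. The study did this step in Excel. The Python sketch below shows the same idea under simplifying assumptions: the tokenizer is a rough regular expression, and the `family_of` mapping and sample sentence are hypothetical.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a rough tokenizer for illustration)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def coverage(tokens, word_list, family_of=None):
    """Percentage of tokens whose word family is in word_list."""
    family_of = family_of or {}
    if not tokens:
        return 0.0
    # Map each token to its family headword, then test list membership.
    hits = sum(1 for tok in tokens if family_of.get(tok, tok) in word_list)
    return 100.0 * hits / len(tokens)

# Invented sample text and lists.
tokens = tokenize("The vector is defined. Define the vector!")
pct = coverage(tokens, {"define", "vector"}, {"defined": "define"})
print(f"{pct:.1f}% of {len(tokens)} tokens covered")
```

Running `coverage` once per word list over the same tokens gives per-list figures; adding the GSL figure to a second list's figure gives a cumulative figure, provided the two lists share no families.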
RESULTS
• FEC coverage comparisons
• Overlap between FEWL and AWL
• Word technicality assessment
Coverage of FEC by GSL, AWL, and FEWL (%; figures in parentheses are cumulative with GSL)
                                          English       Comp. Sci.    Chem.         Math          Physics       Overall
General Service List (GSL)                80.8          78.0          77.0          81.1          80.9          79.5
Academic Word List (AWL)                  10.6 (91.4)   13.0 (91.1)   9.9 (86.9)    9.1 (90.2)    7.9 (88.8)    10.1 (89.7)
First-year Engineering Word List (FEWL)   10.2 (91.0)   18.3 (96.3)   16.9 (93.9)   16.0 (97.1)   14.9 (95.8)   15.3 (94.8)
Most frequent FEWL items in each FEC subcorpus
      English     Comp. Sci.    Chem.         Math          Physics
1.    research    function†     react         function†     energy††
2.    scholar     data          atom          equation***   magnet
3.    genre       computer      molecule      vector        potential
4.    community   array         chemical      matrix        magnitude
5.    summary     file          energy††      graph         conduct
6.    abstract    vary          electron†††   linear        constant
7.    define*     define*       equation***   define*       section**
8.    cite        structure     bond          theorem       positive
9.    identify    equation***   equilibrium   integral      electron†††
10.   culture     section**     ion           section**     wavelength
Top 50 FEWL words (non-AWL words are bolded)
1. function 11. atom 21. electron 31. positive 41. integral
2. equation 12. matrix 22. potential 32. sum 42. negative
3. react 13. constant 23. area 33. equilibrium 43. ion
4. define 14. molecule 24. array 34. element 44. loop
5. energy 15. research 25. chemical 35. maximum 45. correspond
6. computer 16. structure 26. identify 36. series 46. magnitude
7. section 17. linear 27. magnet 37. initial 47. estimate
8. data 18. process 28. volume 38. require 48. axis
9. vector 19. chapter 29. file 39. occur 49. bond
10. vary 20. graph 30. assume 40. conduct 50. theorem
Overlap between FEWL and AWL
FEWL Sublist   Non-AWL words in FEWL sublist   FEWL headwords that are non-AWL (cumulative)
Sublist 1      20                              20 / 60 (33%)
Sublist 2      19                              39 / 120 (33%)
Sublist 3      24                              63 / 180 (35%)
Sublist 4      33                              96 / 240 (40%)
Sublist 5      31                              127 / 300 (42%)
Sublist 6      27                              154 / 360 (43%)
Sublist 7      35                              189 / 420 (45%)
Sublist 8      34                              223 / 480 (46%)
Sublist 9      35                              258 / 540 (48%)
Sublist 10     17                              275 / 570 (48%)
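The cumulative column in the overlap table can be rebuilt from the per-sublist counts with a running total. The short sketch below does this using the counts reported in the table; the sublist sizes (nine sublists of 60 and one of 30) are read off the table's denominators, and the variable names are illustrative.

```python
from itertools import accumulate

# Non-AWL headwords per FEWL sublist, as reported in the table.
non_awl_per_sublist = [20, 19, 24, 33, 31, 27, 35, 34, 35, 17]
# Sublist sizes implied by the denominators 60, 120, ..., 540, 570.
sublist_sizes = [60] * 9 + [30]

cum_non_awl = list(accumulate(non_awl_per_sublist))
cum_size = list(accumulate(sublist_sizes))

for n, (hits, size) in enumerate(zip(cum_non_awl, cum_size), start=1):
    share = 100.0 * hits / size
    print(f"Sublists 1-{n}: {hits} / {size} ({share:.1f}%)")
```

The upward trend in the share shows that the lower-frequency FEWL sublists overlap with AWL less than the top sublists do.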
Word technicality scale (Coxhead and Nation, 2001)
Category 1
item appears rarely if ever outside a particular field (e.g. in Law, jactitation, per curiam, cloture)
Category 2
item is used both inside and outside a particular field, but the meaning it conveys outside the field is completely different from the meaning it has in the field (e.g. in Law, a caution is a formal warning recited by a police officer when arresting a suspect; in its non-legal usage it means ‘prudence’ or ‘circumspection’)
Category 3
item is used both inside and outside a given field but most of its uses with a particular meaning occur in the field (e.g. most instances of reconstruction appear in legal discourse, but occurrences with this meaning also appear outside the field of Law). Crucially, specialized, ‘in-field’ meaning “is readily accessible through its meaning outside the field”
Category 4
item is somewhat more common inside than outside of field but specialization of meaning is minimal or non-existent (e.g. judge, mortgage, trespass)
Technicality results
Of the 275 FEWL items that do not overlap with AWL,
• 176 (or 64%) were deemed non-technical
• 99 (or 36%) were deemed technical
Top 10 non-AWL FEWL headwords judged 'technical' and 'non-technical'
Technical Non-Technical
1. vector 1. atom
2. matrix 2. linear
3. molecule 3. graph
4. electron 4. magnet
5. array 5. equilibrium
6. ion 6. magnitude
7. loop 7. dense
8. axis 8. radius
9. theorem 9. scholar
10. velocity 10. wavelength
CONCLUSIONS
Is there an academic vocabulary?
• Yes, to an extent: Coxhead’s claim that AWL covers about 10% of lexis in academic texts holds up in this study, in the case of first-year UBC engineering texts
• However, at 15%, FEWL provides engineering EAP students with a much greater return on their investment in vocabulary study
Applications of FEWL?
• Not suitable for mixed-major, pre-sessional EAP classes
• Could be given to engineering students in the class to be used for self-study
• Suitable for situations in which all students are taking or preparing to take engineering classes and the instructor has some specialized knowledge of engineering
  • E.g. engineering-specific pre-sessional courses; in-sessional adjunct courses
Contact information
• Mike Murphy, English Language Institute
• [email protected]
References
• Anthony, L. (2014) AntConc
• Bauer, L. and Nation, P. (1993) Word Families. International Journal of Lexicography, 6: 253–279
• Bruce, I. (2011) Theory and concepts of English for academic purposes. Houndmills, Basingstoke, Hampshire; New York: Palgrave Macmillan
• Cobb, T. and Horst, M. (2001) “Reading academic English: Carrying learners across the lexical threshold.” In Flowerdew, J. and Peacock, M. (eds.) Research perspectives on English for academic purposes. Cambridge applied linguistics series. Cambridge, England: Cambridge University Press. pp. 315–329
• Cohen, A.D., Glasman, H., Rosenbaum-Cohen, P.R., et al. (1988) “Reading English for specialised purposes: Discourse analysis and the use of student informants.” In Carrell, P., Devine, J. and Eskey, D.E. (eds.) Interactive Approaches to Second Language Reading. Cambridge: Cambridge University Press. pp. 152–167
• Cooper, M. (1984) “Linguistic competence of practised and unpractised non-native speakers of English.” In Alderson, J.C. and Urquhart, A.H. (eds.) Reading in a foreign language. Applied linguistics and language study. London; New York: Longman. pp. 122–138
• Coxhead, A. (2000) A New Academic Word List. TESOL Quarterly, 34 (2): 213–238
• Coxhead, A. and Hirsh, D. (2007) A pilot science-specific word list. Revue française de linguistique appliquée, Vol. XII (2): 65–78
• Coxhead, A. and Nation, P. (2001) “The specialised vocabulary of English for academic purposes.” In Flowerdew, J. and Peacock, M. (eds.) Research perspectives on English for academic purposes. Cambridge applied linguistics series. Cambridge, England: Cambridge University Press. pp. 252–267
• Engels, L.K. (1968) The fallacy of word counts. IRAL, 6: 213–231
• Farrell, P. (1990) Vocabulary in ESP: a lexical analysis of the English of electronics and a study of semi-technical vocabulary [online]. M.Phil., Trinity College Dublin (University of Dublin) (Ireland). Available from: http://search.proquest.com.ezproxy.library.ubc.ca/docview/301436110?pq-origsite=summon [Accessed 3 November 2014]
• Freebody, P. and Anderson, R.C. (1983) Effects of Vocabulary Difficulty, Text Cohesion, and Schema Availability on Reading Comprehension. Reading Research Quarterly, 18 (3): 277–294
• Ghadessy, M. (1979) Frequency counts, word lists, and materials preparation: A new approach. English Teaching Forum, 17 (1): 24–27
References (cont’d)
• Guhr, D.J., Furtado, N.D. and Villareal, N. (2014) 2014 British Columbia International Education Intelligence Report. San Carlos, CA: Illuminate Consulting Group
• Haahr, M. (1998) RANDOM.ORG [online]. Dublin: Randomness and Integrity Services Ltd. Available from: www.random.org
• Heatley, A. and Nation, I.S.P. (1996) Range [online]. Wellington, New Zealand: Victoria University of Wellington. Available from: http://www.vuw.ac.nz/lals
• Hirsh, D. (2004) A Functional Representation of Academic Vocabulary. PhD, Victoria University of Wellington
• Hirsh, D. and Nation, P. (1992) What Vocabulary Size Is Needed to Read Unsimplified Texts for Pleasure? Reading in a Foreign Language, 8 (2): 689–696
• Hornby, A.S., Phillips, P. and Ashby, M. (eds.) (2010) Oxford advanced learner’s dictionary of current English. 8th ed. Oxford: Oxford University Press
• Hyland, K. and Tse, P. (2007) Is There an “Academic Vocabulary”? TESOL Quarterly, 41 (2): 235–253
• Krishnamurthy, R. and Kosem, I. (2007) Issues in creating a corpus for EAP pedagogy and research. Journal of English for Academic Purposes, 6 (4): 356–373
• Laufer, B. (1992) “How much lexis is necessary for reading comprehension?” In Arnaud, P.J.L. and Bejoint, H. (eds.) Vocabulary and applied linguistics. Houndmills, Basingstoke: Macmillan Academic and Professional. pp. 126–132
• Law, J. and Martin, E.A. (2009) A Dictionary of Law. 7th ed. [online]. Oxford University Press. Available from: http://www.oxfordreference.com/view/10.1093/acref/9780199551248.001.0001/acref-9780199551248 [Accessed 31 October 2014]
• Liu, N. and Nation, I.S.P. (1985) Factors affecting guessing vocabulary in context. RELC Journal, 16 (1): 33–42
• Lynn, R.W. (1973) Preparing word lists: A suggested method. RELC Journal, 4 (1): 25–32
• McEnery, T. and Hardie, A. (2012) Corpus linguistics: method, theory and practice. Cambridge textbooks in linguistics. Cambridge; New York: Cambridge University Press
• Miller, D. (2011) ESL reading textbooks vs. university textbooks: Are we giving our students the input they may need. Journal of English for Academic Purposes, 10 (1): 32–46
• Moudraia, O. (2003) “The student engineering corpus: analysing word frequency.” In Archer, D., Rayson, P., Wilson, A., et al. (eds.) Proceedings of the corpus linguistics 2003 conference. Lancaster: UCREL, Lancaster University. pp. 552–561
References (cont’d)
• Mudraya, O. (2006) Engineering English: A lexical frequency instructional model. English for Specific Purposes, 25 (2): 235–256
• Nation, I.S.P. (2013) Learning vocabulary in another language. The Cambridge applied linguistics series. Second Edition. Cambridge: Cambridge University Press
• Nation, P. and Waring, R. (1997) “Vocabulary size, text coverage and word lists.” In Schmitt, N. and McCarthy, M. (eds.) Vocabulary: Description, Acquisition and Pedagogy. Cambridge University Press. pp. 6–19
• Nurweni, A. and Read, J. (1999) The English language knowledge of Indonesian university students. English for Specific Purposes, 18 (2): 161–175
• PDF2TXT (2007). VeryPDF
• Richards, J.C. (1974) Word list: Problems and prospects. RELC, 5 (2): 69–84
• Salager, F. (1984) The English of medical literature research project. English for Specific Purposes, 87 (5 July)
• Sinclair, J. (1991) Corpus, concordance, collocation. Describing English language. Oxford: Oxford University Press
• Sinclair, J. (2005) “Corpus and text--Basic principles.” In Wynne, M. (ed.) Developing linguistic corpora: a guide to good practice. AHDS guides to good practice. Oxford [U.K.]: Oxbow. p. Chap. 1
• Valipouri, L. and Nassaji, H. (2013) A corpus-based study of academic vocabulary in chemistry research articles. Journal of English for Academic Purposes, 12 (4): 248–263
• Ward, J. (1999) How Large a Vocabulary Do EAP Engineering Students Need? Reading in a Foreign Language, 12 (2): 309–323
• West, M. (1953) A General Service List of English Words. Longman
• Williams, J. (2013) LEAP Advanced Reading and Writing Student Book with CW+. Pearson Education, Limited
• Xue, G. and Nation, I.S.P. (1984) A University Word List. Language Learning and Communication, 3 (2): 215–229
• Yang, H. (1986) A new technique for identifying scientific/technical terms and describing science texts. Literary and linguistic computing, 1 (2): 93–103