Computer Assisted Vocabulary
Learning: Design and evaluation
Qing Ma (a)* and Peter Kelly (b)
(a) University of Louvain, Belgium; (b) Three Gorges University, China
This paper focuses on the design and evaluation of the computer-assisted vocabulary learning
(CAVL) software WUFUN. It draws on the current research findings of vocabulary acquisition and
CALL, aiming to help Chinese university students to improve their learning of English vocabulary,
particularly that with which they experience most difficulty. It is argued that vocabulary should be
learned explicitly as well as implicitly; learners need to be trained to become good learners, e.g., by
being instructed in useful learning strategies, to enable them to learn vocabulary more efficiently
and effectively. A design model of CALL efficacy is constructed to ensure the quality of vocabulary
learning in CALL programs; it is employed in the design of the software WUFUN. Finally, the
preliminary results of the software evaluation are reported and discussed.
Introduction
Vocabulary learning has always been a popular subject in CALL programs, especially
in the early stages of CALL (1980s) when technology was relatively simple and it was
thought that vocabulary learning could be easily integrated into CALL programs. The
earlier programs typically included a single type of language learning activity, such as
text reconstruction, gap-filling, speed-reading, simulation, and vocabulary games
(Levy, 1997). The range was narrow, probably because previously computers were
less powerful and language teachers did not have sufficient knowledge of
programming (Goodfellow, 1995). It might also have been due to the limited
number of vocabulary learning theories at a time when vocabulary learning was just
starting to attract people’s attention.
*Corresponding author. Rue Grafe 4, App. 206, Namur, 5000, Belgium.
Email: [email protected]
Computer Assisted Language Learning, Vol. 19, No. 1, February 2006, pp. 15–45
ISSN 0958-8221 (print)/ISSN 1744-3210 (online)/06/010015–31
© 2006 Taylor & Francis
DOI: 10.1080/09588220600803998
Nowadays, vocabulary learning is often viewed as a sub-component of a
multimedia package or a CALL program, particularly in commercialised materials.
Some researchers have tried to create CALL programs devoted to vocabulary learning
(Goodfellow, 1994; Groot, 2000; Boers, Eyckmans, & Stengers, 2004). One common
feature is situating vocabulary learning in context instead of treating it as an isolated
activity, as was the case before. Another important trend is for learners to be given as
much freedom as possible to choose what to learn and how to learn. However, this
could be problematic if learners do not know how to deal with the learning tasks and
use the software effectively. Too much freedom will sometimes adversely affect the
learning result. A way forward is for learners to be given some help to become ‘good
learners’—that is, to acquire sufficient knowledge about language learning and have
the ability to take charge of their own learning effectively and efficiently.1 They can
thus benefit maximally from the freedom of learning.
In this article, we first review the literature on those approaches to vocabulary
learning and CALL programs that take into account vocabulary learning. This is
followed by an introduction to a CALL efficacy model, which aims to help and guide
the learner to the completion of learning tasks as a way to ensure the quality of a
CALL program. This CALL efficacy model is used to design the software WUFUN,
developed for Chinese university students to help them to learn vocabulary perceived
as difficult. A pilot study was carried out to evaluate a prototypical unit of the software
as well as to validate the CALL efficacy model empirically in two settings: individual
use and classroom use. Results are reported and discussed. Finally, possible
improvements regarding both the software design and future research are outlined.
Current Approaches to Vocabulary Learning
Approaches to vocabulary learning can be generally categorized under two broad
paradigms: the implicit and the explicit learning paradigm. In this article, the
meaning of ‘implicit’ and ‘explicit’ is not restricted to what they mean in ‘implicit
learning’ and ‘explicit learning’2 in cognitive psychology; rather, the literal meanings
of the two words are used to refer to the main features associated with the two
paradigms. Implicit learning is associated with natural, effortless and meaning
focused learning; explicit learning implies that learning requires more deliberate
mental effort than simply engaging in meaning focused activities and that a link has to
be established between meaning and form by various means.
The Implicit Learning Paradigm
The basic assumption of the implicit learning paradigm is that words can be acquired
naturally through repeated exposure in various language contexts with reading as the
major source of input, a notion that is strongly supported by findings in respect of L1
vocabulary acquisition. Incidental learning is perhaps the most important feature of
this learning paradigm. It can be defined as the process of acquiring vocabulary and
grammar through meaning focused communicative activities such as reading and
listening (Hulstijn, 2003, p. 349). Several studies support the implicit learning
paradigm. Krashen’s input hypothesis (1989, 1993) postulates that vocabulary can be
acquired by reading as long as the input is comprehensible to the learner. Nagy,
Herman, and Anderson (1985) hold the view that children acquire most L1 words
through reading and that they do so incidentally. In the same vein, Sternberg (1987,
p. 89), relying on studies in L1 acquisition, claims that ‘‘most vocabulary is learned
from context’’ by contextual guessing, although whether this process can take place
successfully or not depends on several ‘‘moderating variables’’ (pp. 92 – 94), such as
the density of unknown words; the learner may be overwhelmed by a large number of
unknown words with the result that no learning takes place.
The difficulty of acquiring vocabulary incidentally in an L2 seems to be
attributable to three sources. First, incidental learning
inevitably involves a great deal of contextual guessing of the unknown words. Context
alone does not always facilitate meaning transfer; in some cases even educated adults
cannot infer the meaning of L1 words in context (Ames, 1966; Beck, McKeown, &
McCaslin, 1983, cited in Duquette, Renee, & Laurier, 1998). Second, as a
consequence, the learning rate is very low (see Hulstijn, 1992). According to Nation
(1990), 5 – 16 exposures are needed to fully acquire a word. This is implicitly
supported by Nagy et al. (1985) who reported a 5% – 15% probability of a word being
learned at first exposure; similarly, Knight (1994) demonstrated a learning rate of
5% – 21% from her studies, also for one exposure. Third, the vocabulary acquired
through incidental learning is mainly for recognition and hardly at all for production
(see Paribakht & Wesche, 1997; Wesche & Paribakht, 2000). This is due to the nature
of incidental learning: the main language activity is reading where the focus is on
meaning and content and only limited attention is paid to the lexical and syntactic
features of the new words. The quality and quantity of lexical processing in incidental
learning is simply insufficient to enable the learner to grasp the precise meanings and
correct usage of words that will lead to correct production.
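To see how the single-exposure rates relate to the number of exposures needed, a rough calculation can help. The sketch below assumes, purely for illustration, that every exposure is an independent trial with the same pick-up probability p. None of the cited studies makes this assumption, and real learning is likely to depart from it (forgetting, spacing, quality of context). The function names are hypothetical.

```python
import math


def prob_learned(p: float, n: int) -> float:
    """Probability that a word is picked up at least once in n exposures,
    assuming independent exposures with a constant per-exposure probability p
    (an illustrative simplification, not a claim from the cited studies)."""
    return 1.0 - (1.0 - p) ** n


def exposures_needed(p: float, target: float) -> int:
    """Smallest n for which prob_learned(p, n) reaches the target probability."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Per-exposure rates at the two ends of the 5% - 15% range cited above.
for p in (0.05, 0.15):
    cumulative = [round(prob_learned(p, n), 2) for n in (1, 5, 16)]
    print(p, cumulative, exposures_needed(p, 0.5))
```

Under this simplification, per-exposure rates of 5% to 15% put the even-odds point at roughly 5 to 14 exposures, which is of the same order as the 5 – 16 exposures cited from Nation (1990).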
The Explicit Learning Paradigm
Authors who adhere to this paradigm argue that vocabulary and vocabulary learning
strategies should be learned or taught explicitly so that learning can be more efficient.
They agree with upholders of incidental learning that context is the main source for
acquiring vocabulary, but they claim that learners need some extra help to build up an
adequate vocabulary and to acquire the strategies necessary to cope with the vast
reading context (see Coady, 1997). There are two main approaches in respect of the
explicit learning paradigm: explicit instruction and strategy instruction.
Authors who favour explicit instruction argue that learners should be taught
vocabulary explicitly by using various means including direct memorization
techniques (Coady, 1993; Nation, 1990, 2001). Here the concern is mainly with
low level learners who do not have enough vocabulary to read extensively. Nation
(2001) suggests that high frequency (2,000 word level) and low frequency vocabulary
should be treated differently. High frequency words have a high coverage (80%) of
text (p. 11) and should be mastered as soon as possible; this can be achieved by direct
teaching (teacher explanation, peer teaching), direct learning (using word cards,
consulting dictionaries), incidental learning (contextual guessing, communicative
activities) and planned encounters with the words (graded reading, vocabulary
exercises) (p. 16). As for the low frequency words, teachers should train learners to
use strategies such as contextual guessing, dictionary use, memory techniques and
vocabulary cards to cope with these words and to enlarge their vocabulary (p. 20).
According to Laufer (1997, p. 23), learners should master a basic vocabulary of 3,000
word families to be able to use the ‘‘high level processing strategies’’ needed to
comprehend a general text. The empirical studies of Paribakht and Wesche (1997)
and Wesche and Paribakht (2000) show that reading plus explicit vocabulary training
enables learners to learn vocabulary both quantitatively and qualitatively better than
by simply relying on context alone. Laufer (2001) demonstrated a superior lexical
gain when decontextualised word-focused activities were used than when learners
were simply engaged in reading comprehension.
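The notion of text coverage used above can be made concrete with a small computation. The following is a minimal sketch, assuming coverage is counted over running word forms rather than word families (a simplification of the approach in Nation, 2001); the word list, the sample sentence, and the function name are made up for illustration.

```python
import re


def coverage(text: str, known_words: set[str]) -> float:
    """Share of running words (tokens) in the text that appear in known_words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


# Toy word list and text; a real study would use a 2,000-word frequency list
# and a much longer text.
high_frequency = {"the", "a", "of", "is", "to", "and", "in", "she", "was"}
sample = "She was in the garden and the gardener is a friend of the family."
print(f"{coverage(sample, high_frequency):.0%}")
```

The uncovered tokens in such a count are the words a learner would have to guess, look up, or skip, which is why the density of unknown words matters for reading.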
The second approach, strategy instruction, emphasizes teaching the learners
specific learning strategies to make learning more efficient (Cohen, 1998; Cohen,
Weaver, & Li, 1996; O’Malley et al., 1985; Oxford & Scarcella, 1994). Researchers of
strategy instruction often hold the view that context can provide the essential means
for learning vocabulary but additional support, such as explicit instruction, is also
needed (Oxford & Scarcella, 1994). The strategies typically recommended for
instruction include word grouping, word association, imagery, mnemonics (for
example, the keyword method and the hookword method), and semantic mapping.
Traditionally, strategy instruction seems to concern advanced learners rather than
low level learners (Coady, 1997). However, strategy instruction to low level learners
can be very useful. For example, strategies such as imagery and mnemonics will be
very helpful since the greatest difficulty in acquiring a word in the initial stages is to
link the form and the meaning in memory (Kelly, 1986; Laufer, Elder, Hill, &
Congdon, 2004). This is particularly true in respect of an unrelated language and was
the initial driving force behind the keyword method (Atkinson & Raugh, 1975).
It would seem that the explicit learning paradigm is best summarized as a ‘‘mixed
approach’’, to use Coady’s words (1993, p. 17). Supporters of this paradigm combine
a whole variety of activities, including explicit vocabulary instruction, vocabulary
exercises, vocabulary learning strategies, and extensive reading. The strength of the
explicit learning paradigm is that implicit learning is not excluded but rather is seen as
one of the two complementary learning approaches that are necessary to vocabulary
acquisition. The two would work best in combination with each other.
Review of CALL Programs for Vocabulary Learning
Multimedia Packages with Vocabulary Learning Activities
This is perhaps the most popular type in terms of the number of products that have
been sold and their wide use in educational settings. Most are commercialised
programs. The criticism is often made that these programs lack a pedagogical basis.
The investment in such projects is usually considerable but it does not necessarily
mean that solid research has preceded them. They are particularly vulnerable when it
comes to the issue of users’ needs being addressed. Commercialised programs are
often remote from the users; background information, such as the age, sex, cultural
background, other foreign language knowledge, computer knowledge and so on, of
those users for whom the programs are intended is not specified and can only be
guessed (Levy, 1999, 2002). Given their general lack of research basis as well as the
comparatively small amount of time and space devoted to vocabulary learning, the
quality of the vocabulary learning resulting from the utilization of these programs is
often disappointing.
Programs Made up of Written Texts with Electronic Glosses
This is probably the most popular type in research-based programs, and is a
reflection of the prevailing interest in incidental learning. These programs are
written texts with hyperlinks and equipped with an electronic dictionary or glossary.
The main emphasis is on reading comprehension and the acquisition of some new
lexical items is a by-product of the reading process. The advantage of providing
electronic glosses is that the lexical information can be accessed easily simply by a
click (or by typing the word) with little interruption of the reading process.
Moreover, glosses are made much more informative and attractive than traditional
lexical entries by utilizing multimedia effects. Chun and Plass (1996a), Laufer and
Hill (2000), and De Ridder (2002), have carried out studies that demonstrated how
vocabulary can be learned in such a setting, though each of these studies focuses on
a different aspect. The main concern of this type of program is with the information
that should be included about a word and with the way the information should be
presented. The learning rates are reportedly higher in the computer-mediated
situations than in paper materials for incidental learning. Chun and Plass (1996a)
reported a learning rate of 24.1% – 26.6%; Laufer and Hill (2000) reported a
learning rate of 33.3% – 62%. However, the learning rate in each of the studies is,
strictly speaking, only tested at recognition level. It is reasonable to anticipate a
lower learning rate at production level due to the nature of the learning task in this
type of program. It is productive vocabulary learning that this type of program
cannot address adequately.
Programs Dedicated to Vocabulary Learning
Another type of CALL program, which is often based on research, usually takes a
different approach. The CALL authors choose a particular theory of language
learning and implement it via computer technology. A good example is provided by
Groot (2000, p. 64), where the three stages of acquiring a new word in the mental
lexicon (‘‘noticing’’, ‘‘storage’’ and ‘‘consolidation’’) are simulated by the CALL
program ‘‘CAVACO’’. The learning process is composed of four stages in sequential
order: ‘‘deduction’’, ‘‘usage’’, ‘‘examples’’ and ‘‘retrieval’’. A careful look at the
program reveals that learners were encouraged to deduce word meanings and word
usage. However, instead of leading to deeper processing, this may risk inducing mere
guessing since learners are prone to take short-cuts and perform activities requiring
less mental effort. This may explain why the learning result of the experimental
groups was not much higher than that of the control bilingual list groups in Groot’s
investigation. Goodfellow’s ‘Lexica’ (1994) is based on Kukulska-Hulme’s model of
‘‘Journey of a vocabulary item’’ (1988, p. 164) in which a vocabulary item to be
learned goes through the following procedure (Figure 1).
‘Lexica’ adopted this model and elaborated the ‘written record’; the user of Lexica
is asked to group words according to ‘form’, ‘meaning’, and ‘context’ and then find
the meanings and usage of the words with the help of lexical tools (for example,
dictionary, concordancer). The weakness of such a design, according to Goodfellow
(1995, p. 220), is the lack of explicit instruction on how each task should be carried out.
Consequently, word grouping was found to be difficult for some learners and on the
whole they tended to adopt a superficial learning approach, such as using L1
translations. The expected learning rate of eight words per hour was achieved by very
few subjects.
A Design Model for CALL Efficacy
The model that we suggest is an attempt to provide an alternative way of addressing
CALL design, bearing in mind that this is only a starting point and that there still
remains plenty of scope for the better integration of computer technology into the
design of CALL programs. What is provided here is a simple preliminary model;
the major concern is with identifying the most important parameters that determine
the efficacy or the quality of a CALL program (see Figure 2).
CALL Efficacy
CALL efficacy can be interpreted as the quality of the CALL program—that is,
how effective and helpful it is when used by the learner. It can be assessed both
quantitatively and qualitatively. Quantitative data include the performance of the
user on the program’s tasks, which can be revealed by the scoring system of the
program; they also include the progress (or the regression) that takes place
through using the program, which can be assessed by pre-tests and post-tests in
an experimental setting. Qualitative data include the recording of the user
Figure 1. Journey of a vocabulary item (adapted from Kukulska-Hulme, 1988)
interaction with the program, which could be provided by a profile recording
system built into the program. The user’s own evaluation of the program is
another important source of qualitative data regarding the efficacy of the program.
This can be obtained by a questionnaire and/or an interview on the completion of
the tasks.
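The kinds of evidence listed above can be pictured as one record per learner. The following is a minimal sketch under stated assumptions: the field names, the 1 to 5 rating scale, and the summary measures are hypothetical and are not taken from WUFUN or from any evaluation instrument described in this article.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class LearnerRecord:
    """One learner's data for a single evaluation session (hypothetical layout)."""
    learner_id: str
    pre_test: float                    # quantitative: score before using the program
    post_test: float                   # quantitative: score after using the program
    task_scores: list[float] = field(default_factory=list)  # from the scoring system
    actions: list[str] = field(default_factory=list)        # qualitative: logged interaction
    rating: Optional[int] = None       # qualitative: questionnaire rating, e.g. 1 to 5

    def log(self, event: str) -> None:
        """Append one interaction event, as a profile recording system might."""
        self.actions.append(event)

    @property
    def gain(self) -> float:
        """Progress (positive) or regression (negative) between the two tests."""
        return self.post_test - self.pre_test


def summarise(records: list[LearnerRecord]) -> dict[str, float]:
    """Average gain, rating, and number of logged actions across learners."""
    ratings = [r.rating for r in records if r.rating is not None]
    return {
        "mean_gain": mean(r.gain for r in records),
        "mean_rating": mean(ratings) if ratings else float("nan"),
        "mean_actions": mean(len(r.actions) for r in records),
    }


# Made-up numbers, for illustration only.
a = LearnerRecord("s01", pre_test=40, post_test=55, rating=4)
a.log("opened gloss")
a.log("played audio")
b = LearnerRecord("s02", pre_test=50, post_test=60, rating=5)
b.log("opened gloss")
print(summarise([a, b]))
```

Keeping the quantitative and qualitative sources in one record makes it straightforward to set a learner's test gain beside what that learner actually did in the program.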
Theory
First, it is commonly agreed that a sound theoretical underpinning is vital to ensure
the quality of a CALL program. It has been demonstrated that the quality of a CALL
program is determined by the methodology behind it rather than the computer
technology itself. Methodology refers to the overall approach to the design of the
program; the underlying theoretical principles constitute a very important component
of the methodology. Here theory mainly means language learning theory, which is used
as a general term to refer to the program designer’s assumptions about the nature of
language, language learning and the process of learning. What specific language
learning theory to choose depends on what language knowledge aspects or skills the
CALL program would like to focus on. In CALL programs for vocabulary learning,
learning theories or research findings specific to vocabulary learning should be
considered first. On the other hand, language is best learned as a whole rather than in
separate components. There are thus some general learning aspects shared by CALL
programs though they have different focuses. The selection of a specific or general
language learning theory will serve as a guide in the selection of the technologies to
be used.
Figure 2. The CALL efficacy model
Computer Technology
Traditionally, computer technology is referred to as the means or the medium used to
deliver learning materials to learners. Clark (1994) distinguished between ‘methods’
and ‘media’. Media are the means of delivering the methods which consist in
‘‘a number of possible representations of a cognitive process or strategy that is
necessary for learning’’ (p. 26). He claimed that a method can be implemented by
many means other than computer technology; thus media (or the computer
technology) might ‘‘influence the cost or speed efficiency of learning but methods
are causal in learning’’ (p. 26). There seems to be a tradition of dividing CALL into
two broad categories: technology-driven and pedagogy-driven projects (Colpaert,
2003; Levy, 1997). Developers in the former category are often accused of producing CALL
materials based on their intuition instead of on research in language learning.
There is a dividing line in conceptualising CALL design: there are those who do so
according to technologies and those who do so according to methodology: each side
focuses on its own aspect and plays down the other. There is therefore on both sides
an inclination to view method (methodology) and media (computer technology) as
two separate components. Technology alone cannot determine the design, but should
it be viewed solely as a means of implementing the materials? A crucial question
arises: Is there a merging point of technology and pedagogical knowledge in
conceptualising CALL design? If so, where is it? We argue that computer technology
could be thoroughly integrated into the design and become an inseparable part of the
methodology; technology can be used to monitor and control user actions so that
users can be guided in performing language learning activities and achieve high
learning potential.
User Actions
Learner performance, or user actions, is an important source of data for the
evaluation of CALL programs. Chapelle (2001) puts learner performance at the third
level of evaluation after that of the CALL program itself and the teacher’s planned
activity. What the learner has actually done and how s/he interacts with the program is
a good indicator of the learning outcome. In line with the current emphasis on
‘learner autonomy’ and ‘learner focus’, the trend is for the user to be given as much
freedom as possible in the use of the program. Closely associated with these concepts
is ‘learner development’, discussed in detail by Wenden (2002, p. 32) who defines it
as ‘‘a learner-centred innovation in FL/SL instruction that responds to the learner by
aiming to improve the language learner’s ability to learn a language’’. It can be said
that learner development is the process and improved ability in language learning is
the objective. This entails the premise that learners initially do not necessarily possess
good learning ability and efficient learning strategies for language learning; they need
to learn how to become good learners.
Obviously, something has to be done to facilitate learner development and it is very
unlikely that learners could simply learn to become good learners by themselves
without any help. They have to be equipped with metacognitive knowledge, learning
strategies, and skills for self-direction to be able to become good learners. ‘Strategies-
based instruction’, reported by Cohen et al. (1996), has two major components:
explicit strategy instruction and strategy instruction integration. In the first case, students
are explicitly taught how, when, and why strategies can be used to facilitate language
learning and language use tasks. In the second case, strategies are integrated into
everyday class materials and may be explicitly or implicitly embedded in the language
tasks. If we are going to train learners to master good learning strategies in language
activities, we have to draw up rules to constrain what they do instead of giving them
complete freedom, which will go against the learning goals of the program. We
therefore propose that user actions be controlled to some degree and that this be done
by integrating computer technology into the overall design. The user is directed to the
completion of the learning tasks as well as the embedded learning strategies
instruction. S/he is guided and not left to wander at will through the program.
Learner Information
Language learning is also a very idiosyncratic process that is subject to a series of
learner characteristics, such as mother tongue, knowledge of other foreign languages,
level of proficiency in the target language, learning difficulty, learning style, learning
strategy, motivation, age, sex, etc. Obviously, different types of learners have different
needs, and these should be taken into account when designing the CALL program. In
the same way, it is suggested that a CALL program should be targeted to a particular
group of learners who have in common a series of characteristics. As much
information as possible should be obtained about the learner before the software is
designed; this information constitutes an important part of ‘analysis’ in Colpaert’s
RBRO design model (2004, p. 135).
From the CALL efficacy model, it can be seen that learner information influences
the other three components by providing background information to inform the
choices made in respect of theory, technology, and user actions. When choosing
language learning theories for the program, we should ask ourselves a series of
questions in respect of the learner. For example:
1. Will the theory chosen help our learners to acquire language knowledge or skills?
And if so, how?
2. Will the theory chosen have the potential to improve our learners’ ability to learn?
3. How can we apply the theory to the learning activities so that learners will enjoy
them?
Other questions can be asked, depending on the specific context of learner
information. As for computer technology, learner information can tell us what type
of specific technology is favoured or rejected by our users. For example, if users
are used to having the right mouse click function to display information for a word,
we need to consider developing this type of technology in the software. If they are
interested in sound or visual effects, we should develop more audio, video, speech, or
animation technologies so that learners’ different sensory learning styles can be
accommodated. Learner information can also provide information on how user
actions should be controlled so that learners can be guided in their language learning.
It should also be borne in mind that, if complete freedom will harm the learning
results, so doubtless will no freedom at all. The degree to which learners’ freedom or
control over the program should be restricted largely depends on the learner
characteristics: learning style, learning strategy, knowledge about language learning,
and perceived useful and harmful effects of the guided instruction, etc. All these have
to be taken into account in deciding what should be allowed and what should be
restricted regarding learner freedom.
Design of WUFUN
Some General Learner Information
The main learning difficulties for vocabulary include fixing the new vocabulary in
memory, mastering the meaning(s) of new items, using vocabulary items correctly,
and incorporating idiomatic expressions into one’s vocabulary. These are the
problems faced by every language learner. Chinese learners of English face other
specific problems in acquiring vocabulary due to the huge linguistic distance between
English and Chinese and the considerable cultural gap. In particular, it is observed
that:
1. The practice of mechanical memorization (rote), which is deep-rooted in
Chinese culture, characterizes their learning.
2. Lack of direct contact with western culture makes it extremely difficult to bridge
the cultural gap and to use language appropriately.
3. The exam orientation of all language teaching and learning has for so
long encouraged rote learning and discouraged a communicative learning
approach.
A CAVL program, named WUFUN, is being developed to help Chinese university
learners of English to overcome the learning difficulty they face by incorporating
learning activities that specifically address their needs. The design of the software is
based on the CALL efficacy model presented earlier.
Integrating the Theory into the Design
Vocabulary learning in WUFUN is addressed in a holistic way; learning is situated in
context with particular attention being paid to the items. Following a series of
systematic studies on Chinese learners (Kelly, Li, Vanparys, & Zimmer, 1996;
Vanparys, Zimmer, Li, & Kelly, 1997; Li et al., 1999), it was decided that the specific
elaboration strategies for item learning that will be potentially useful to Chinese
learners are imagery, verbal association, programmed rehearsal and oral input. The
originality of the approach consists in the integration of a listening approach and
memorization techniques and in sensitising the learner to cultural differences.
Mnemonic techniques have been documented since an interest was first shown in
language strategies (see Atkinson & Raugh, 1975; Cohen, 1987; Paivio &
Desrochers, 1979; Pressley & Levin, 1981) and they have generally been shown
to be much more effective than rote learning. The effectiveness of most mnemonics lies
in the interplay between images and verbal representations, formulated by Paivio
and Desrochers as the ‘dual-coding theory’ (1980). However, these memory
strategies are investigated largely in language laboratory settings (O’Malley &
Chamot, 1990), and their potential is rarely exploited in classroom teaching/
learning with the result that they remain largely unknown to most learners. Not
only have these mnemonic techniques been demonstrated to be as much as three
times more effective than the traditional rote method (Paivio & Desrochers, 1979)
but, as so many researchers have pointed out, they transform the learning of
vocabulary from what is invariably viewed as a tedious, boring task into one that is
enjoyable and even amusing.
Listening is viewed by a number of leading researchers as the basic skill in SLA
(see Asher, 1983; Krashen & Terrell, 1983; Nord, 1978; Winitz, 1978). Through the
progressive build-up of a language store, involving both hemispheres of the brain,
speaking results, in much the same way as with the acquisition of L1. Many
investigations have demonstrated this transfer (see Asher, 1964; Ervin-Tripp, 1974;
Postovsky, 1975; Winitz & Reeds, 1973). In addition, the auditory perception of the
learner progressively develops and this becomes the basis of a good pronunciation
(Gary, 1975; Winitz, 1977). Furthermore, it has been shown that listening aids in the
long-term retention of vocabulary, whether it be for reading or for listening purposes
(Gary & Gary, 1982; Kelly, 1992).
Language always mirrors the background culture of whoever is speaking. This is
particularly true in respect of vocabulary (see De Saussure, 1974; Miller, 1996).
There should, in consequence, be a strong focus on the cultural aspect3 in respect of
vocabulary learning. This is done in WUFUN via the images, the stories which
introduce the vocabulary, the idioms and proverbs and, in particular, via the humour/
true stories. Humour can be a valuable tool for bringing out salient characteristics of a
culture, without indulging in negative stereotypes. Differences between different
western cultures—their customs, practices, attitudes, behaviour, humour, even
political and economic situation—are also brought out.
When the learning theories are decided on, the next step is to create the learning
content underpinned by these theoretical guidelines. Approximately 300 words were
taken from the 4,000 word list that Chinese university students (non-language
specialists) are required to master; their selection was based on student and teacher
judgement of word learning difficulty (for example, pronunciation, word length,
spelling, confusion with similar forms, cultural connotations, and so on), together
with other criteria, such as usefulness and relevance (Kelly & Li, 2005). These
words were then used to create 20 stories as learning texts combined with other
different learning activities, forming 20 units in all. One of the 20 units has
been developed into a computer program as the prototypical unit of WUFUN,
containing 25 words plus three idioms4 to be studied (see Appendix A). Some
of the words may already be known receptively or productively to learners. The
results of the pre-vocabulary tests in the pilot study confirmed that this was the
case. The following provides an overview of the sequence of learning activities in
WUFUN.
First, a preview of the context (overview of the story), serving as an ‘advance
organizer’, a device aiming to activate useful background information (see Chun &
Plass, 1996b, p. 504), is presented to the learner. The user can view a series of
pictures, each of them accompanied by a short spoken sentence (some of the words
to be learned will appear in the sentences for the first time; the word meaning can
be easily guessed from the pictures). This will give the user a general idea of the
story presented later. Then some vocabulary items are presented in the form of a
mini dictionary (Word Focus); the glosses include meanings, collocations, example
sentences and usage. In addition, the learner can listen to the word, view the
picture if available, and ask for a Chinese translation of the word. The user can then
read the text, the complete version of the preview, in which glossed words in Word
Focus will reappear in the context. Then some vocabulary learning strategies are
introduced to the learner in Word Memorisation Aids, the main ones relating to
verbal association, imagery, rhyming or alliteration (see Boers & Lindstromberg,
2005), etc. The user chooses a word s/he wants to know better from a list, and s/he
will be given a useful tip (with the option to display the Chinese translation) on how
to memorise the word. For example, for the word acquaintance, a sentence is given:
The queen is an acquaintance of mine. The user can listen to the sentence and is asked
to form a mental image of the sentence while listening to it. Different tips are given
to facilitate the learning of the word; whether the word contains affixes or roots,
whether it is imageable, whether it can be associated with other known words, etc.
What is central to these tips is the combination of image, sound and verbal
information. Their combination will help word memorization and accommodate
different learning styles.
Next are the exercises where the words will be practised and rehearsed in context.
By doing exercises, the learner becomes familiar with the meaning and usage of the
words. Exercises include supplying synonymous expressions, finding antonyms,
using words in collocations or as they typically occur in contexts, differentiating
words having similar but not identical meanings (for example, ridiculous and funny5),
etc. The whole procedure can be repeated. The vocabulary processing procedure in
WUFUN is described in Figure 3.
After the exercises comes the section on idioms followed by that on humour/true
stories. The idioms are usually found to be very difficult to learn as their meaning is
not apparent and is often heavily culture bound. In accordance with the dual coding
theory of Paivio and Desrochers (1980), which advocates dual modality input to
enhance vocabulary learning, the user clicks on the idiom s/he wants to study and a
picture that illustrates the meaning of the idiom will pop up on the right of the screen;
in the meantime, the user can listen to an explanation of the idiom. The humour/true
stories are to arouse the learner’s awareness of the cultural elements underlying
language learning. Thus each story or joke to a certain degree reflects a facet of
western culture (though not necessarily the culture of English-speaking countries
since the language is spoken by a much larger population). The learner can read and
listen to the stories.
Integrating Computer Technology into the Design
The computer technology has a two-fold function. It is used to create the multimedia
program and, more importantly, to make the user follow the design model of the
program. Users have restricted freedom in using the software. The idea is that they
can always go back to the previous steps while they have to complete some basic
requirements before going on to the next step. If the user does not obey these rules,
the forward button on the navigation bar to go to the next page will be disabled. Here
are a few examples: the user has to have listened to all the short sentences in the
overview of the story before being able to go to the Word Focus (WF); s/he has to
look up at least one word in WF before reading the story; s/he can only access the
correct answers of the exercises in written form after having listened to them first; s/he
normally has to finish one exercise before starting the next one or s/he can go directly
to the next exercise but will get a score of ‘0’ for the exercise skipped. Every decision
regarding user freedom for each step is thought out so that the user can obtain some
benefit from doing the activities without being frustrated to the point that s/he no
longer wishes to continue. Technology is employed in such a way as to ensure that
each step is completed to a minimum requirement.
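The gating rules described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the WUFUN implementation: the class `UnitProgress`, its fields and the function `score_for` are hypothetical names introduced here, and the assumption that each gate also requires the previous one is ours.

```python
from dataclasses import dataclass, field


@dataclass
class UnitProgress:
    """Hypothetical record of what one learner has done in the current unit."""
    overview_sentences_total: int
    overview_sentences_heard: set = field(default_factory=set)
    wf_words_looked_up: set = field(default_factory=set)
    answers_heard: set = field(default_factory=set)  # ids of exercises whose spoken answers were played

    def can_open_word_focus(self) -> bool:
        # Every short sentence in the story overview must have been listened to.
        return len(self.overview_sentences_heard) >= self.overview_sentences_total

    def can_read_story(self) -> bool:
        # At least one word must have been looked up in Word Focus (WF).
        # Assumption: the earlier gate must also have been passed.
        return self.can_open_word_focus() and len(self.wf_words_looked_up) >= 1

    def can_see_written_answers(self, exercise_id: str) -> bool:
        # Written answers become available only after the spoken answers were heard.
        return exercise_id in self.answers_heard


def score_for(exercise_id: str, completed_scores: dict) -> float:
    # Skipping an exercise is allowed, but the skipped exercise scores 0.
    return completed_scores.get(exercise_id, 0)
```

In such a design the forward button would simply be enabled or disabled according to the value returned by the relevant check, while the back button stays available at all times.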
Taking into Account User Actions in the Design
In order to induce the learner to follow the design model of our program, a learning
metaphor is represented in the menu screen, namely, learning is a cyclic process
and learning tasks are to be finished step by step (see Figure 4). A help system will
be at hand to show the learner how the software should be used. Each learning
activity is accompanied by detailed instructions on how to carry out the task. The
interface design in each page of the software is consistent and easy to understand.
Figure 3. The vocabulary processing procedure in WUFUN
To monitor users’ performance, the system records some user actions while
the learner is using the software: the total time spent on the software, the number of words
viewed in WF and in Word Memorisation Aids (WMA),6 the time spent on exercises and
the score obtained. These data will provide important information for evaluation of
the software.
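A per-learner record of these actions might look like the following minimal sketch. The name `SessionLog`, its fields and the decision to count repeated views of the same word only once are assumptions made for illustration; the paper does not describe how the profile recording system stores its data.

```python
from dataclasses import dataclass, field


@dataclass
class SessionLog:
    """Hypothetical log of the user actions listed above for one learner."""
    total_minutes: float = 0.0
    exercise_minutes: float = 0.0
    exercise_score: float = 0.0
    wf_words_viewed: set = field(default_factory=set)
    wma_words_viewed: set = field(default_factory=set)

    def view_word(self, section: str, word: str) -> None:
        # Assumption: viewing the same word twice counts as one word viewed.
        sections = {"WF": self.wf_words_viewed, "WMA": self.wma_words_viewed}
        sections[section].add(word)

    def summary(self) -> dict:
        # The five measures later reported per group in the evaluation.
        return {
            "time": self.total_minutes,
            "wma_words": len(self.wma_words_viewed),
            "wf_words": len(self.wf_words_viewed),
            "time_on_ex": self.exercise_minutes,
            "score_for_ex": self.exercise_score,
        }
```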
A Pilot Study for Software Evaluation
When the prototypical unit was ready, we carried out a pilot study at a Chinese
university to evaluate the software. The study used a pre-test/post-test design
combined with questionnaires and an interview. The evaluation of the software will be
conducted in terms of: learning outcome as measured by vocabulary learning rate and
the vocabulary learning strategies acquired; learner evaluation as revealed by degree of
satisfaction in the use of the software; restricted freedom impact (on learning outcome
and learner evaluation) as measured by the relationship between user actions and
learning outcome/learner evaluation. Through the software evaluation, the CALL
efficacy model discussed earlier can be empirically validated.
Figure 4. The main menu screen of WUFUN
Research Questions
Our research questions are the following:
1. What is the learning outcome of WUFUN? More specifically:
(a) To what extent will WUFUN help Chinese learners to acquire vocabulary
perceived as difficult at the receptive and the productive level in two
different settings: individual use and classroom use?
(b) Are learners likely to develop vocabulary learning strategies that will
facilitate vocabulary learning in the long run in the two different settings?
2. How do users evaluate WUFUN in the two different settings?
3. How are user actions related to learner evaluation and to the learning results in
the two different settings?
The Study
Subjects. Two groups of first year students at Three Gorges University, Yichang,
China, of various study backgrounds (non-language specialists) participated in the
study. They are low intermediate learners who have a vocabulary of 2,000 – 3,000
words. Initially we tried to include more subjects, but due to some unexpected
practical constraints we only had 35 subjects, divided into two groups according to
the experiment setting. Group 1 (G1) contains 17 students who volunteered to
participate in the experiment after a brief introduction to WUFUN. They made an
appointment with the researcher and completed the experiment on an individual
basis. Group 2 (G2) contains 18 students who did the experiment together in a
computer room as a self-learning class. They were required by their teacher to
participate in the study. It should be noted that individual use and classroom use of
language learning software are the two most prototypical settings for CALL. When
learners volunteer or choose to use a piece of language software, as in the case of G1,
it can be assumed that they are displaying an interest in the task. According to the
process model of motivation (Dornyei, 2001), this generates motivation7 at the start
of the learning task. However, no such assumption can be made in respect of subjects
who are coerced into performing the task, which was the case with G2.
Experiment instruments (see examples for each type of instrument in Appendix B): pre- and
post-vocabulary (receptive/productive) tests. A separate receptive and productive test was
administered before software use to test whether the students knew the new
vocabulary items that appeared in WUFUN receptively or productively. Laufer
(1998) distinguished three types of vocabulary knowledge, namely passive vocabulary,
controlled active vocabulary and free active vocabulary. In a more recent article
(2004), she divides knowledge of a word into four degrees of strength: productive
recall, receptive recall, productive recognition, receptive recognition, which are ranked
hierarchically (from the highest to the lowest) in terms of the strength of the word
knowledge. We chose two test formats for the receptive knowledge test: the receptive
recognition test (the lowest strength) and the vocabulary level test (Laufer & Nation,
1995). For the productive knowledge test, we used the controlled active vocabulary test
(Laufer, 1998), which closely resembles the equivalent of the receptive recall test for
the second highest strength of the word knowledge. To avoid the test-wise effect, we
used some distracters in both tests. There were 25 words to be marked in the
receptive test and 21 words in the productive test. The same two tests were
administered again after software use to see whether there were vocabulary gains and
what these might be.
Pre- and post-questionnaires. A pre-questionnaire (Q.1) was administered before
software use to glean information about the students’ vocabulary learning strategies
and their expectations of the software (WUFUN) they were going to use. We mainly
used multiple-choice questions; both the questions and the choice of answers were
carefully designed to ensure the information given would be as complete as possible
and thus give as accurate a picture as possible of the students’ opinions.
A post-questionnaire (Q. 2) was administered after software use. It aimed to find
out to what degree the students were satisfied after using WUFUN and to obtain their
comments and suggestions. It is divided into 13 sections and made up of 44 questions
on a 5-point scale plus a few open questions. Students were asked to give a rating in
terms of their satisfaction regarding the various components (see Figure 3 for a brief
review) of the program. Questions were also asked on the scoring and checking
system (feedback system), interface design, graphic design, sound system, etc. At
the end there was an open section for any comments and suggestions regarding the
software and to find out whether the students had learned or been aware of the
vocabulary learning strategies embedded in the software.
Experiment procedure. The whole experiment follows an eight-step linear sequence:
pre-receptive test, pre-productive test, pre-questionnaire, software use, post-
questionnaire, post-receptive test, post-productive test and an interview. The last
step, the interview, was limited to G1; it was not used with G2 due to the practical
constraints. It took about 2 – 2.5 hours to complete the whole procedure. It should be
noted that learners were told beforehand that they would study a piece of vocabulary
learning software but they did not know about the detailed procedure involved.
Although they were tested before using the software, most students would not have
expected a test afterwards.
Data collection and analysis. For each subject in G1 we collected four scores on
vocabulary tests, two sets of information in the pre- and post-questionnaires, user
actions recorded by the software system, and some follow-up information in the
interview. For G2, we have all the information except the follow-up information.
We obtained each student’s vocabulary gain at both the receptive and the
productive level by subtracting the pre-scores from the post-scores. We performed a
t-test to see whether there was a significant difference between the two groups. We
averaged the ratings across all sections of the post-questionnaire to obtain, for each
student, a mean rating from 1 to 5, which served as the learner evaluation. A profile
recording system built into the software enabled us to examine the user actions
during software use. For both groups we performed a correlation test between the
user actions and the vocabulary gain and another correlation test between the user
actions and the learner evaluation.
Results and Discussion
Pre-questionnaire. From Q1, we get a detailed picture of the students’ profile. In
addition, an in-depth study of the quantified results reveals some characteristics of the
students’ learning habits and of their perception of CALL program learning. As for
vocabulary learning, the most popular memorisation strategy is rote learning accompanied
by periodic review. Other more elaborate techniques, such as mnemonics and word
grouping, are also reported to have been used, but less frequently. The listening
approach is adopted by the fewest students. Students tend to be ready to
perform tasks perceived as interesting or less demanding, such as viewing pictures or
reading stories, and are more likely to avoid demanding tasks such as doing exercises
or learning vocabulary. However, the avoidance could be compensated for by the
usefulness they perceived in performing the task. If they received help to make the
task easier, they would certainly be more willing to do it.
Gain in receptive and productive vocabulary. Table 1 presents the mean score of both
the receptive and the productive tests for both groups. Table 2 presents the means of
receptive gain between the pre-test and the post-test for both groups.
Table 1. Mean and standard deviation (SD) for pre-test and post-test
                             Mean            SD             Minimum       Maximum
                             Pre     Post    Pre    Post    Pre   Post    Pre    Post
Receptive (full = 25)   G1   15.59   21.88   2.62   1.45    11    19      19     24
                        G2   16.06   20.5    3.11   3.84    8     10      20     24
Productive (full = 21)  G1   10.91   16.12   2.31   2.64    7     11.5    15     20.5
                        G2   9.64    14.25   4.05   4.41    2     2       15.5   20
Table 2. Mean for receptive gain
Mean SD Learning rate* Minimum Maximum
G1 6.29 2.69 40% 3 11
G2 4.44 2.38 28% 1 9
Note: *Learning rate is calculated by dividing the difference between the mean of the
post-test score and the mean of the pre-test score by the mean of the pre-test score, e.g., the
receptive learning rate of 40% for G1 is obtained by dividing the difference between the
post-test score and the pre-test score (6.29) by the mean of the pre-test score (15.59).
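The receptive learning rates in Table 2 can be reproduced by dividing the mean gain by the mean pre-test score. The short sketch below uses only the figures given in Tables 1 and 2; the function name `learning_rate` is introduced here for illustration.

```python
def learning_rate(pre_mean: float, gain_mean: float) -> float:
    """Mean gain expressed as a proportion of the mean pre-test score."""
    return gain_mean / pre_mean


# (mean pre-test score, mean receptive gain) from Tables 1 and 2
receptive = {"G1": (15.59, 6.29), "G2": (16.06, 4.44)}

for group, (pre, gain) in receptive.items():
    print(group, f"{learning_rate(pre, gain):.0%}")
# prints:
# G1 40%
# G2 28%
```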
Table 3 presents the means of productive gain between the pre-test and the post-
test for both groups.
The mean scores set out in Table 1 revealed that the pre-test scores for both groups
regarding receptive and productive vocabulary are quite similar (receptive: 15.59 –
16.06 out of 25; productive: 10.91 – 9.64 out of 21); a t-test confirms that
there is no significant difference between the two groups (not reported here for the sake of
space). It would seem that both groups have a similar starting point in terms of pre-
knowledge of the vocabulary items to be studied. However, it is noted that G2 had a
higher SD than G1 for both pre-test and post-test on both vocabulary levels, showing
that there was a bigger difference between the subjects within G2 than within G1.
The gain for both groups was quite satisfactory considering there was a high
baseline for each group (see Table 1). Figures presented in Table 1 imply that G1
had nine words to learn to a receptive level and 10 words to a productive level; G2
had nine words to learn to a receptive level and 11 words to a productive level. Our
first research question was: To what extent will WUFUN help Chinese learners to acquire
vocabulary perceived as difficult at the receptive and the productive level in two different
settings: individual use and classroom use? It seems that both groups achieved a
considerable learning rate at both the receptive and the productive level. Moreover,
both groups have a higher vocabulary learning rate at the productive level than at the
receptive level (47% > 40% for G1; 48% > 28% for G2).
Initially, it appeared that G1 had gained more vocabulary at both the receptive and
productive levels. By performing a t-test to compare the means for both groups we
find, however, that G1 did significantly better than G2 at the receptive level but not at
the productive level. See Tables 4 and 5 for the results.
Table 4 shows that the difference in receptive gain between G1 and G2 is
significant (t Stat 2.16 > t Critical 2.03, df = 33, p < .05); however, the difference in
productive gain between the two groups is not significant, as shown in Table 5 (t Stat 0.78 <
t Critical 2.03, p > .05).
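The t statistics can be checked from the summary values in Tables 4 and 5 alone. The sketch below assumes an independent-samples t-test with pooled variance, which is consistent with the reported df of 33 (17 + 18 − 2) but is not stated explicitly in the paper; the function name `pooled_t` is ours, and the critical value 2.03 is the one reported by the authors.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Independent-samples t statistic with pooled variance, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


T_CRITICAL = 2.03  # two-tailed, alpha = .05, df = 33, as reported in the paper

# Means, variances and group sizes taken from Tables 4 and 5
receptive_t, df = pooled_t(6.29, 7.22, 17, 4.44, 5.67, 18)
productive_t, _ = pooled_t(5.32, 6.69, 17, 4.56, 10.03, 18)

print(df, round(receptive_t, 2), receptive_t > T_CRITICAL)
print(df, round(productive_t, 2), productive_t > T_CRITICAL)
```

Under this assumption the receptive statistic comes out at about 2.16, matching Table 4, and the productive statistic at about 0.77, against the 0.78 reported in Table 5; a gap of that size is consistent with rounding in the published means and variances.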
Table 3. Mean for productive gain
Mean SD Learning rate Minimum Maximum
G1 5.32 2.59 47% 0 9.5
G2 4.56 3.17 48% 0 11.5
Table 4. T-test of receptive gain between two groups
T-test  Mean  Variance  Observations  df  T-stat  T-critical  p
G1      6.29  7.22      17            33  2.16    2.03        .038*
G2      4.44  5.67      18
Note: *p < .05 (two-tailed).
The two findings given above—that the productive learning rates are higher than
the receptive learning rates for both groups and that there is no significant difference
in vocabulary gain between the two groups at the productive level but the difference is
significant at the receptive level—seem to indicate that WUFUN is slightly more able
to help learners to learn vocabulary productively than receptively regardless of
whether for individual or classroom use.
Post-questionnaire (learner evaluation). As mentioned earlier, there are two types of
questions in Q2: rating scale questions and open questions. We will focus only on the
rating scale questions and leave the open questions to a later stage. See Table 6 for the
means of evaluation for both groups.
A t-test shows there is no significant difference between the two groups (t Stat
0.93 < t Critical 2.03, df = 33, p > .05). Both groups gave a good evaluation of the
program; G1 had a mean of 4 out of 5 and G2 had a mean of 3.83 out of 5. In
answering the question whether they would like to use the software when more units
are developed in the future, all the subjects in G1 replied “Yes”. Three out of
18 in G2 replied “No”, which still leaves a positive result since G2 were forced, as it
were, into participating in the experiment. In response to the research question: How
do users evaluate WUFUN in the two different settings?, the software evaluation by the
learners in the individual or classroom setting is satisfactory with most students from
both groups expressing their willingness to continue to use the software in the future.
Of the 13 sections of Q2, the favourite section for G1 is the “scores and checking
system”, which has an average of 4.53. It is the same for G2, who have an average
rating of 4.47. The lowest-rated section (3.65) for G1 is the “program sequence”, in which
students are asked whether they like the sequence of the program and whether they
feel they should follow the guidelines of the program instead of doing what they want.
The CALL efficacy model described earlier is implemented in the program sequence,
as is the restricted user freedom regarding the control of the program. For G2,
this section is rated the second lowest (3.47). Nevertheless, the ratings for both
Table 5. T-test of productive gain between two groups
T-test Mean Variance Observations df T-stat T-critical p
G1 5.32 6.69 17 33 0.78 2.03 .44
G2 4.56 10.03 18
Table 6. Mean of learner evaluation of the software
Mean SD Minimum Maximum*
G1 4 0.42 3.21 4.74
G2 3.83 0.63 2.1 4.53
Note: *Full rating = 5.
groups for this section have exceeded the middle point in the rating scale. This
suggests that subjects in both settings do not particularly like the constraints but that
they find them acceptable.
User actions. Table 7 presents the mean of user actions: time spent on the program,
number of words viewed in WF, number of words viewed in WMA, time spent on the
exercises and score obtained for the exercises for both groups.
A quick look at this table will reveal that the two groups are very different regarding
the way they use the software. At first sight, it appears that G2 spent more time on the
program than G1 but G2 had a much greater SD (33.5) than G1 (18.72). A careful
look at the data shows that three subjects (all females) in G2 spent 141, 150 and 156
minutes on the program. If the three were taken out, the average time for G2 would
be about 73 minutes. For G1, the longest time spent was 112 minutes. Thus, in fact,
subjects in G2 generally spent less time than those in G1, except the three female
subjects. G2 also spent less time on the exercises and scored much lower than G1. To
answer the research question: How are user actions related to learner evaluation and to the
learning results in the two different settings?, we performed multiple correlation tests
between each selected user action (listed in Table 7) and the receptive, productive
gain (learning results) and the learner evaluation (results of Q2) for both groups. See
Table 8 and Table 9 for the results.
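The adjusted G2 average of about 73 minutes mentioned above follows directly from the group mean in Table 7 and the three long sessions. A minimal check, with `mean_without` as a hypothetical helper name:

```python
def mean_without(group_mean: float, n: int, excluded: list) -> float:
    """Mean of a group after removing some known values, using only the group mean and size."""
    total = group_mean * n
    return (total - sum(excluded)) / (n - len(excluded))


# G2: mean time 85.44 minutes over 18 subjects; three subjects spent 141, 150 and 156 minutes
adjusted = mean_without(85.44, 18, [141, 150, 156])
print(round(adjusted, 1))  # 72.7
```

The result is below the G1 mean of 80.77 minutes, which supports the observation that most G2 subjects spent less time on the program than those in G1.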
Note that in both Tables 8 and 9, correlations whose absolute value of r is smaller
than 0.2 are excluded. As revealed in Tables 8 and 9, the situations for the two groups are
quite different. For G1, the total time spent on the software has a
significant negative correlation (r = −.52, p < .05) with the learner evaluation; that
is, the more time spent on the software, the lower the evaluation tends to be. The
opposite holds for G2, in which there is a significant positive correlation (r = .51,
p < .05) with the evaluation. The correlations between total time and the receptive
Table 7. Mean of user actions
      Time (minutes)  WMA   WF     Time on ex. (minutes)  Score for ex. (max. = 100)
G1    80.77           7.94  17.65  22.53                  60.08
G2    85.44           8.56  18.67  17.44                  36.30
Table 8. Correlation between user actions and their learning results and learner evaluation for G1
Pearson r Time WMA WF Time on ex. Score for ex.
Learner evaluation −.52* −.22
Receptive gain .25 .61** .36
Productive gain .32 .44 .49*
Note: *p < .05. **p < .01 (two-tailed).
and productive gain are weak and non-significant for both groups. The number of words
viewed in WMA has a significant positive correlation with the
receptive gain for both groups (r = .61, p < .01 for G1; r = .52, p < .05 for G2). The
number of words viewed in WF has little correlation with receptive and productive
gain for G1; for G2 it has a significant correlation with the productive gain (r = .5, p < .05)
and a weaker, non-significant correlation with the receptive gain. Time spent on the
exercises seems to have little to do with the receptive and productive gain for G1; it
has a stronger positive correlation (r = .47, p < .05) with the receptive gain for G2. The
score for the exercises has a significant positive correlation (r = .49, p < .05)
with the productive gain for G1 and a significant positive correlation (r = .51,
p < .05) with the receptive gain for G2.
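Whether a given Pearson r is significant at these sample sizes can be checked by converting r to a t statistic with n − 2 degrees of freedom. The sketch below assumes that every correlation was computed on the full group (n = 17 for G1, n = 18 for G2), which the paper does not state; the critical values 2.131 (df = 15) and 2.120 (df = 16) are the standard two-tailed values at the .05 level, and the function names are ours.

```python
import math

# Two-tailed critical t values at alpha = .05, from standard tables
T_CRIT_05 = {15: 2.131, 16: 2.120}


def r_to_t(r: float, n: int) -> float:
    """Convert a Pearson correlation to its t statistic with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def significant_at_05(r: float, n: int) -> bool:
    return abs(r_to_t(r, n)) > T_CRIT_05[n - 2]


# G1 (n = 17): exercise score vs. productive gain, and the unstarred .44 in Table 8
print(significant_at_05(0.49, 17), significant_at_05(0.44, 17))
# G2 (n = 18): time on exercises vs. receptive gain, and the unstarred .42 in Table 9
print(significant_at_05(0.47, 18), significant_at_05(0.42, 18))
```

With groups this small, only correlations of roughly .47 to .49 or larger reach significance, which is why values such as .44 and .42 in Tables 8 and 9 carry no asterisk.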
In both tables we find that three factors, total time spent on the program, words viewed
in WMA and score obtained for the exercises, seem to be more closely related to the
learning results and learner evaluation for both groups. The way these factors
correlate with the learning results and evaluation is quite different for both groups.
For example, we are not very clear why the total time spent on the program is
correlated in two opposite directions for G1 and G2. The only common phenomenon
shared by both groups is that WMA has a similar positive correlation with receptive
gain. This suggests that WMA, the main section introducing vocabulary learning
strategies, is more likely to be helpful for receptive vocabulary gain. But why is it less
likely to be helpful for productive vocabulary gain? One assumption might be that a
single exposure to vocabulary learning strategies is not enough to help the students to
learn the vocabulary to a productive level. To learn a word productively, one needs, in
addition to deep mental processing of the lexical information, sufficient familiarity
with the word in different contexts. Therefore, the score the subjects obtained for the
exercises would be more likely to account for the productive gain. This is the case
with G1 for which a significant positive correlation is found between the exercise
score and the productive gain. This is not the case with G2, where a significant
positive correlation is only found between the exercise score and the receptive gain.
Finally, an extra correlation test was performed between the learner evaluation and
the learning outcome. No significant correlation was found for either group or for either
type of vocabulary gain. In contrast to previous findings, learner attitude toward the
learning tasks does not appear to greatly affect the learning results. For example, the subject in
G1 who had the lowest evaluation of the software (3.21) turned out to have achieved a
high vocabulary gain both receptively (10 words) and productively (eight words).
Table 9. Correlation between user actions and their learning results and learner evaluation for G2
Pearson r            Time   WMA    WF    Time on ex.  Score for ex.
Learner evaluation   .51*   .35    .44   .35          .33
Receptive gain       .38    .52*   .21   .47*         .51*
Productive gain      .31    .28    .5*   .42          .21
Note: *p < .05 (two-tailed).
This subject stated frankly in the interview that he did not like the software because
the ‘rigid’ order of the program would not allow him to exercise his individuality and
creativity. He spent 90 minutes on the software, of which 19 minutes were devoted to
the exercises, and obtained a score of 73.9. In addition, he viewed 15 words in WMA
and 17 in WF. Note that he spent more time, viewed more words in WMA and did
better on the exercises than the average (See Table 7). Although he did not like the
restricted freedom regarding the software use, it is this design feature that guided and
controlled his actions which led to his superior learning results over others who gave a
higher evaluation of the software but who spent less time and viewed fewer words.
This, on the one hand, proves that the design of WUFUN based on the CALL
efficacy model has been preliminarily successful; on the other hand, it indicates that
affective factors such as attitudes towards the learning task do not always predict
learning result. What matters is what learners actually do in the learning process.
Subjects in the two settings provided rather different pictures of how user
actions are related to vocabulary gain and learner evaluation. The difference can be
attributed to the quantitatively different user
behaviour, as shown in Table 7. It is very likely that the two groups differ in several
respects; for example, individual users in G1 are doubtless more motivated to use the
software than group users in G2 since the former volunteered to participate in the
study while the latter were coerced into doing so. One subject in G2 spent only 44
minutes on the software, including two minutes on the exercises, viewed two words in
WMA and one word in WF. His vocabulary gain turned out to be the lowest: two
words receptively and zero words productively. The comments he gave were negative:
he considered that the software was boring and that it did not differ much from their
textbooks. His insufficient user actions and poor learning outcome are clearly the
result of a lack of motivation. In addition, the subjects in G1 might have been more at
ease, attentive, and relaxed than those in G2 in the experiment due to the different
settings. It should be remembered that subjects in G1 completed the experiment
individually while all the subjects of G2 were placed together in a computer room.
Other information. This information includes comments and suggestions given by the
subjects in the free open section in Q2 and further information obtained from the
interview (limited to G1). In addition, the subjects were asked to indicate whether
they acquired some useful strategies for learning vocabulary from the software.
Table 10 presents quantitative information regarding answers to the two questions.
The response rates for these two questions are much better for G1 than for G2; in
addition, the quality of the answers for G1 is definitely better in terms of content and
Table 10. Response rate for questions in free section in Q.2
              Comments/suggestions  Percentage  Ideas for voc. learning  Percentage
G1 (n = 17)   16                    94%         13                       76%
G2 (n = 18)   15                    83%         5                        27%
length. We will discuss only the strategies that they acquired from using the software
in order to answer the research question: Are learners likely to develop vocabulary
learning strategies that will facilitate vocabulary learning in the long run in the two different
settings? See Table 11 for the categorization of the strategies both groups claimed to
have acquired from the software.
We noted two facts. First, most learners mentioned just one or two strategies.
Second, the learners tend to adopt the strategies that require less mental effort and
show less interest in those requiring more mental effort, such as imagery and
practising words in different contexts. This could also be due to the perceived
usefulness of each category.
Thus our answer to this research question is that the majority of individual users
acquired one or two strategies perceived to be useful from using the software but
strategies requiring more mental effort are less likely to be appreciated. In contrast,
the embedded vocabulary learning strategies are largely ignored by most group
learners. This does not necessarily mean that those strategies were not perceived to be
useful, but simply that the strategies have not entered into their metacognitive
repertoire. It seems that a single exposure to the software for a short period is not
enough to help students to develop vocabulary learning strategies in a systematic way.
It may also be due to limited mental processing capacity: when learners attend to
both the form and the meaning of the vocabulary items, the cognitive load might be
too heavy to allow them to pay more than limited attention to the embedded learning
strategies.
Conclusion and Suggestions for Future Study
Our main objective has been to introduce the CALL efficacy model to ensure the
quality of CALL programs. The model is constructed by identifying four main
components, theory, computer technology, user actions, and learner information, and
integrating them into a whole. They influence and interact upon each other, thus
strengthening all the fibres or links of the model. It is these that determine the quality
of a CALL program as well as constituting the methodology of a CALL program. It is
Table 11. Learning strategies acquired by G1 and G2
G1 G2
Put words in sentences to memorize them 4 1
Put words similar in form and meaning together to study 3
Separate roots or affixes from the words 2
Make word associations 2
Practise words (in diversified contexts) 1
Listen to the words in sentences or a text 1 1
Compare words and group words 1
Image the meaning of words 1 1
Computer Assisted Vocabulary Learning 37
Page 24
shown how the model can be applied to the design of the CAVL software WUFUN
for Chinese university students to learn difficult vocabulary items. A pilot study is
reported in order to evaluate the software and to validate the model empirically in
both individual use and classroom use. The results of the study suggest that
the CALL efficacy model underpinning WUFUN is, on preliminary evidence,
effective in both settings. Because the experimental procedure was complicated, we
collected a large amount of varied data, and data analysis and the reporting of results
were painstaking. It is arguable whether we have chosen the ideal research
methodology for such a complicated study.
Regarding our first research question, the learning outcome of WUFUN, it is
demonstrated that by using the software, learners can acquire vocabulary perceived as
difficult both receptively and productively in both settings. Moreover, the productive
learning rate is slightly higher for both. Learners have acquired a few vocabulary
learning strategies, but not in a systematic way that would allow them to use the
strategies independently later, which is probably due to their limited mental
processing capacity. For the
second research question, learner evaluation in both settings is fairly satisfactory
despite the constraints incorporated into the software, and the majority of learners
reported that they would like to use the software when more units are developed.
There is not yet a satisfactory answer to the third question. Some user actions, such as
total time, number of words viewed in WMA and the score obtained for exercises,
seem to be closely related to the learning outcome and learner evaluation; however,
subjects in different settings, individual use or classroom use, revealed very different
pictures. Learner attitudes towards the software do not appear to affect the learning
outcome, which is more closely related to what learners actually do in the learning process.
There are, however, a number of suggestions to be made for the next study.
Improvements will be made to the design of the WUFUN software based on the
results of the pilot study and the comments/suggestions made by learners (for
example, more pictures should be added to the software). More importantly, the
following issues will be addressed:
1. The instruction of vocabulary learning strategies will be made more explicit. The
first step is to make learners notice the existence of vocabulary learning
strategies and convince them of their usefulness. In other words, a (short)
strategy training session can be held before the software is used, raising the
learners’ metalinguistic awareness so as to maximize the learning potential of
the software.
2. The user data recording system will be elaborated to allow a more detailed
recording of the user actions, e.g., what words are viewed in WF or WMA. This
can enable us to look at user actions more clearly in relation to other learner
information, such as previous vocabulary knowledge. It might also lead to a more
satisfactory answer to the question of how user actions are related to learning
results.
3. We need to further test the CALL efficacy model. In the present study, user
actions and learning results were investigated under the constraints embedded in
the software. In the next study we will make a different version of WUFUN with
all the constraints removed, where the user is given complete freedom to decide
what to do with the software and in what order. We shall compare the user
actions and the learning outcome under two conditions: one with constraints and the
other constraint-free.
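As a minimal illustration of the second suggestion, the sketch below contrasts the
view count that the present version records (see Note 6) with a log that also stores
which words were viewed. It is written in Python purely for illustration; the class
name ViewLog, its methods, and the sample look-up sequence are hypothetical and do
not describe the actual WUFUN implementation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ViewLog:
    """Hypothetical per-user log of word look-ups in one component (e.g. WF or WMA)."""

    views: Counter = field(default_factory=Counter)

    def record(self, word: str) -> None:
        # Store which word was viewed, not merely that a view took place.
        self.views[word] += 1

    @property
    def total_views(self) -> int:
        # Equivalent to the raw count described in Note 6.
        return sum(self.views.values())

    @property
    def distinct_words(self) -> int:
        # Only available once the viewed words themselves are recorded.
        return len(self.views)


# Illustrative look-up sequence using items from Appendix A.
log = ViewLog()
for word in ["shrink", "dam", "shrink", "utterly", "shrink"]:
    log.record(word)

print(log.total_views)     # 5 look-ups in total
print(log.distinct_words)  # 3 different words
```

With a log of this kind, five look-ups can be distinguished from five different
words, which is the distinction that a bare count cannot make.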
Acknowledgements
We wish to extend our thanks to the following: Sylviane Granger (University of
Louvain) for her support and for her constructive comments on earlier versions of this
research; Nora Condon (University of Louvain) for her insightful remarks and her
participation in our lengthy discussions; Frank Boers (Erasmus College of Brussels
and University of Antwerp) for his careful reading of the text, for his many helpful
suggestions and for his keen interest in our research; the two anonymous reviewers on
whose suggestions we have endeavoured to act.
Notes
1. This independence and know-how are essentially what we mean by ‘good learner’. It was a key
feature of the method of language learning for non-language specialists developed by a number
of Belgian linguists in the 1980s (Kelly, 1989; Ostyn & Godin, 1985). The learner assumes
responsibility for his or her learning, and is given the materials and knowledge needed to
progress on their own. It is beyond the scope of this paper to say what that knowledge is as that
would take us into the wide and well-researched world of learning strategies.
2. Implicit learning in cognitive psychology can be defined as “learning without awareness of what
is learned” (Dekeyser, 2003, p. 314). Thus explicit learning can be defined as learning with
awareness of what is learned.
3. This cultural aspect of vocabulary learning is stressed and discussed at some length in one of the
research papers that preceded the development of the software (Vanparys et al., 1997).
4. Idioms are introduced for two purposes: to add them to the learners’ lexicon and to show how
idioms in different languages reflect the culture of the language.
5. Ridiculous and funny can both be translated into hao xiao de in Chinese. Thus if a Chinese learner
only remembers the translation for the two words, s/he would not be able to know that ridiculous
has a negative connotation while funny is always positive.
6. Each time the user goes to WF or WMA to view a word, a count will be recorded. If the data
show that a user has viewed 15 words in WF, this only means s/he has referred to words in WF
15 times and does not necessarily mean s/he has looked up 15 different words, because a given
word can be viewed several times. Due to some technical constraints, the software programmer
was unable to develop the function to record what words were viewed.
7. The source of motivation may be the value they perceive in using the software, since most
Chinese learners are keen to improve their English, both to meet exam requirements and to
improve their chances of professional advancement.
Notes on contributors
Qing Ma is currently doing a Ph.D. in applied linguistics at the University of Louvain, Faculty of
Arts, Belgium. Her main research interests include second language vocabulary acquisition
and CALL.
Peter Kelly is a professor of linguistics at China University of Three Gorges, formerly senior
professor at the University of Namur, Belgium, where he directed the School of Modern
Languages. His main research interests are in the area of second language acquisition.
References
Ames, W. S. (1966). The development of a classification scheme of contextual aids. Reading
Research Quarterly, 11(1), 57 – 82.
Asher, J. J. (1964). Towards a neo-field theory of behaviour. Journal of Humanistic Psychology, 4,
85 – 94.
Asher, J. J. (1983). Learning another language through actions. Los Gatos, CA: Sky Oaks
Productions, Inc.
Atkinson, R. C., & Raugh, M. R. (1975). An application of the mnemonic keyword method to the
acquisition of a Russian vocabulary. Journal of Experimental Psychology: Human Learning and
Memory, 104(2), 126 – 133.
Beck, I. L., McKeown, M. G., & McCaslin, E. S. (1983). Vocabulary development: all contexts are
not created equal. The Elementary School Journal, 83(3), 177 – 181.
Boers, F., Eyckmans, J., & Stengers, H. (2004). Researching mnemonic techniques through CALL:
the case of multiword expressions. Proceedings of The Eleventh International CALL Conference
(pp. 43 – 48). Antwerp: University of Antwerp.
Boers, F., & Lindstromberg, S. (2005). Finding ways to make phrase-learning feasible: the
mnemonic effect of alliteration. System, 33, 225 – 238.
Burt, M., & Dulay, H. (1975). New directions in second language teaching, learning and bilingual
education. Washington, DC: TESOL.
Chapelle, C. A. (2001). Computer applications in second language acquisition. Cambridge: Cambridge
University Press.
Chun, D. M., & Plass, J. L. (1996a). Effects of multimedia annotations on vocabulary acquisition.
The Modern Language Journal, 80(2), 183 – 198.
Chun, D. M., & Plass, J. L. (1996b). Facilitating reading comprehension with multimedia. System,
24(4), 503 – 518.
Clark, R. E. (1994). Media will never influence learning. Educational Technology Research and
Development, 42(2), 21 – 29.
Coady, J. (1993). Research on ESL/EFL vocabulary acquisition: putting it in context. In T. Huckin,
M. Haynes, & J. Coady (Eds.), Second language reading and vocabulary learning (pp. 3 – 23).
Norwood, NJ: Ablex Publishing.
Coady, J. (1997). L2 vocabulary acquisition: a synthesis of research. In J. Coady & T. Huckin
(Eds.), Second language vocabulary acquisition (pp. 273 – 290). Cambridge: Cambridge
University Press.
Cohen, A. D. (1987). The use of verbal and imagery mnemonics in second-language vocabulary
learning. Studies in Second Language Acquisition, 9(1), 43 – 64.
Cohen, A. D. (1998). Strategies in learning and using a second language. Harlow, Essex:
Longman.
Cohen, A., Weaver, S. J., & Li, T. Y. (1996). The impact of strategies-based instruction on
speaking a foreign language. CARLA Working Paper Series, 4. Retrieved January 10, 2005,
from www.carla.umn.edu/about/profiles/CohenPapers/SBIimpact.pdf
Colpaert, J. (2003). Introduction to CALL. Lecture given at the ELSNET Summer School 2003,
June, Lille, France.
Colpaert, J. (2004). Design of online interactive language courseware: conceptualisation, specification and
prototyping. Research into the impact of linguistic-didactic functionality on software architecture.
Unpublished PhD thesis, University of Antwerp, Belgium. Retrieved June 8, 2005, from
www.didascalia.be/doc-design.pdf
Dekeyser, R. (2003). Implicit and explicit learning. In C. J. Doughty & M. H. Long (Eds.), The
handbook of second language acquisition (pp. 313 – 348). Oxford: Blackwell.
De Ridder, I. (2002). Visible or invisible links: Does the highlighting of hyperlinks affect incidental
vocabulary learning, text comprehension, and the reading process? Language Learning &
Technology, 6(1), 123 – 146.
De Saussure, F. (1974). Course in general linguistics. London: Fontana/Collins.
Duquette, L., Renie, D., & Laurier, M. (1998). The evaluation of vocabulary acquisition when
learning French as a second language in a multimedia environment. Computer Assisted
Language Learning, 11(1), 3 – 34.
Ervin-Tripp, S. (1974). Is second language learning like the first? TESOL Quarterly, 8, 111 – 127.
Gary, J. O. (1975). Delayed oral practice in initial stages of second language learning. In M. Burt &
H. Dulay (Eds.), New directions in second language teaching, learning and bilingual education
(pp. 89 – 95). Washington, DC: TESOL.
Gary, N., & Gary, J. O. (1982). Packaging comprehension materials: towards effective language
instruction in difficult circumstances. System, 10(1), 61 – 69.
Goodfellow, R. (1994). A computer-based strategy for foreign-language vocabulary learning.
Unpublished PhD thesis, Open University, UK.
Goodfellow, R. (1995). A review of the types of CALL programmes for vocabulary instruction.
Computer Assisted Language Learning, 8(2 – 3), 205 – 226.
Groot, P. J. M. (2000). Computer assisted second language vocabulary acquisition. Language
Learning & Technology, 4(1), 60 – 81.
Hulstijn, J. (1992). Retention of inferred and given word meanings: experiments in incidental
vocabulary learning. In P. J. Arnaud & H. Bejoint (Eds.), Vocabulary and applied linguistics
(pp. 113 – 125). London: Macmillan.
Hulstijn, J. (2003). Incidental learning and intentional learning. In C. J. Doughty & M. H. Long
(Eds.), The handbook of second language acquisition (pp. 349 – 381). Oxford: Blackwell
Publishing Ltd.
Kelly, P. (1986). Solving the vocabulary retention problem. ITL, 74, 1 – 16.
Kelly, P. (1989). A particular application of the RALEX method of foreign language learning.
Le Langage et l’Homme, 24(70), 153 – 160.
Kelly, P. (1992). Does the ear assist the eye in the long-term retention of lexis? International Review
of Applied Linguistics, 30(2), 137 – 145.
Kelly, P., & Li, X. (2005). A new approach to learning English vocabulary: more efficient, more effective,
and more enjoyable. Beijing: Foreign Language Teaching and Research Press.
Kelly, P., Li, X., Vanparys, J., & Zimmer, C. (1996). A comparison of the perceptions and practices
of Chinese and French-speaking Belgian university students in the learning of English: the
prelude to an improved programme of lexical expansion. ITL, 113 – 114, 275 – 303.
Knight, S. (1994). Dictionary: The tool of last resort in foreign language reading? A new
perspective. The Modern Language Journal, 78, 285 – 299.
Krashen, S. (1989). We acquire vocabulary and spelling by reading: additional evidence for the
input hypothesis. The Modern Language Journal, 73(4), 440 – 464.
Krashen, S. (1993). The power of reading. Englewood, CO: Libraries Unlimited Inc.
Krashen, S., & Terrell, T. (1983). The natural approach: language acquisition in the classroom. Oxford:
Pergamon Press.
Kukulska-Hulme, A. (1988). A computerized interactive vocabulary development system for
advanced learners. System, 16(2), 163 – 170.
Laufer, B. (1997). The lexical plight in second language reading: words you don’t know, words you
think you know and words you can’t guess. In J. Coady, & T. Huckin (Eds.), Second language
vocabulary acquisition (pp. 20 – 34). Cambridge: Cambridge University Press.
Laufer, B. (1998). The development of passive and active vocabulary in a second language: Same or
different? Applied Linguistics, 19(2), 255 – 271.
Laufer, B. (2001). Reading, word-focused activities and incidental vocabulary acquisition in a
second language. Prospect, 16(3), 44 – 54.
Laufer, B., Elder, C., Hill, K., & Congdon, P. (2004). Size and strength: do we need both to
measure vocabulary knowledge? Language Testing, 21(2), 202 – 226.
Laufer, B., & Hill, M. (2000). What lexical information do L2 learners select in a CALL dictionary
and how does it affect word retention? Language Learning & Technology, 3(2), 58 – 76.
Laufer, B., & Nation, I. S. P. (1995). Vocabulary size and use: lexical richness in L2 written
production. Applied Linguistics, 16, 307 – 322.
Levy, M. (1997). Computer assisted language learning. Oxford: Clarendon Press.
Levy, M. (1999). Design processes in CALL: integrating theory, research and evaluation. In
K. Cameron (Ed.), Computer assisted language learning: media, design and applications
(pp. 84 – 107). Lisse: Swets & Zeitlinger.
Levy, M. (2002). CALL by design: discourse, products and process. ReCALL, 14(1), 55 – 84.
Li, X., Song, X., Zimmer, C., Vanparys, J., & Kelly, P. (1999). WUFUN: a new approach to more
efficient and effective vocabulary learning. ITL, 125 – 126, 181 – 194.
Miller, G. A. (1996). The science of words. New York: Scientific American Library.
Nagy, W. E., Herman, P. A., & Anderson, R. C. (1985). Learning words from context. Reading
Research Quarterly, 20, 233 – 253.
Nation, I. S. P. (1990). Teaching and learning vocabulary. New York: Newbury House Publishers.
Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Nord, J. R. (1978). Developing listening fluency before speaking: an alternative paradigm. Paper
presented at the 5th World Congress of Applied Linguistics, Montreal, Canada.
O’Malley, J. M., & Chamot, A. U. (1990). Learning strategies in second language acquisition.
Cambridge: Cambridge University Press.
O’Malley, J. M., Chamot, A. U., Stewner-Manzanares, G., Russo, R. P., & Kupper, L. (1985).
Learning strategy applications with students of English as a second language. TESOL
Quarterly, 19, 557 – 584.
Ostyn, P., & Godin, P. (1985). RALEX: An alternative approach to language teaching. The Modern
Language Journal, 69(4), 346 – 355.
Oxford, R. L., & Scarcella, R. C. (1994). Second language vocabulary learning among adults: State
of the art in vocabulary instruction. System, 22(2), 231 – 243.
Paribakht, T. S., & Wesche, M. B. (1997). Vocabulary enhancement activities and reading for
meaning in second language vocabulary acquisition. In J. Coady, & T. Huckin (Eds.), Second
language vocabulary acquisition (pp. 174 – 200). Cambridge: Cambridge University Press.
Paivio, A., & Desrochers, A. (1979). Effects of an imagery mnemonic on second language recall and
comprehension. Canadian Journal of Psychology, 33, 17 – 28.
Paivio, A., & Desrochers, A. (1980). A dual-coding approach to bilingual memory. Canadian Journal
of Psychology, 34, 388 – 399.
Postovsky, V. A. (1975). The priority of aural comprehension in the language acquisition process. Paper
presented at the 4th AILA World Congress, Stuttgart, Germany.
Pressley, M., & Levin, J. R. (1981). The keyword method and recall of vocabulary words from
definitions. Journal of Experimental Psychology: Human Learning and Memory, 7(1), 72 – 76.
Sternberg, R. J. (1987). Most vocabulary is learned from context. In M. G. McKeown & M. E. Curtis
(Eds.), The nature of vocabulary acquisition (pp. 89 – 105). London: Lawrence Erlbaum Associates.
Vanparys, J., Zimmer, C., Li, X., & Kelly, P. (1997). Some salient and persistent difficulties
encountered by Chinese and Francophone students in the learning of English vocabulary.
ITL, 115 – 116, 137 – 164.
Wenden, A. L. (2002). Learner development in language learning. Applied Linguistics, 23, 32 – 55.
Wesche, M. B., & Paribakht, T. S. (2000). Reading-based exercises in second language vocabulary
learning: an introspective study. The Modern Language Journal, 84(2), 196 – 213.
Winitz, H. (1977). Nonauditory auditory disorders. Otolaryngologic Clinics of North America, 10, 187 – 192.
Winitz, H. (1978). The learnables. Kansas: International Linguistics Corporation.
Winitz, H., & Reeds, J.A. (1973). Rapid acquisition of a foreign language (German) by the
avoidance of speaking. International Review of Applied Linguistics, 18(3), 245 – 247.
Appendix A. Vocabulary items to be studied in WUFUN
Words
Acquaintance, available, burst, dam, damage, despair, dump, fail, formal, funny,
injury, jump, land, policy, quantity, ridiculous, roof, shallow, shrink, sign, stretch,
suit, utterly, wonder, weight.
Idioms
He is in the depths of despair.
I am fit to burst.
I split my sides laughing.