Designing a Vocabulary Achievement Test For Saudi Sixth Graders

Student ID : 610058427


UNIVERSITY OF EXETER

GRADUATE SCHOOL OF EDUCATION

DESIGNING A VOCABULARY ACHIEVEMENT TEST FOR SAUDI SIXTH GRADERS

Student Name: Ahmed Saad Abdelsalam

Summative Assignment

Summer 2013

Submitted to Dr Susan Riley

Submitted as part of the requirements for the M.Ed in TESOL

Summer Intensive Course



Contents

Introduction

I. Teacher-based Assessment

II. Testing Vocabulary

III. Context

IV. Test-Takers

V. Test Purpose

VI. Content

VII. Structure, timing, medium/channel

VIII. Test tasks

a. Task types

b. Writing Items

IX. Test Administration

a. Test environment

b. Personnel and Procedures

c. Scoring & Feedback

Conclusion

Bibliography

Appendix 1 (Unit 4)

Appendix 2 (Vocabulary Achievement Test)



INTRODUCTION

Brown (2003) defines a test as “a method of measuring a person’s ability, knowledge or performance in a given domain”. As Brown (2003) explains, testing and assessment are not synonyms; rather, tests can be considered a subset of assessment. Brown (2003) also distinguishes between different types of assessment, mainly on the basis of differences in the form and function of tests and in how their scores are interpreted.

This paper describes the development of an achievement test for Saudi young learners that focuses only on vocabulary knowledge. First, teacher-based assessment is introduced, and the assessment of vocabulary, the ability being tested, is discussed. Then the test development steps are presented, based on the accounts of the stages of test development given by Fulcher & Davidson (2007), Douglas (2010) and Hughes (2003). The first stage includes describing the context and the test-takers, identifying the test purpose and setting the content, structure and timing. The second stage includes selecting appropriate task types and writing the items. The last stage involves developing the rules and procedures for test administration, covering the test environment, personnel, procedures, scoring and feedback. Throughout, the concepts of validity, reliability and practicality are taken into account so that the resulting test is valid, reliable and practical.

I. TEACHER-BASED ASSESSMENT

Teacher-based assessment (TBA), as Davison and Leung (2009) state, is widely used in a number of educational systems internationally. TBA can be defined as teacher-mediated, classroom-embedded and context-based assessment. As discussed by Davison and Leung (2009), it has several main characteristics that distinguish it from traditional exams. First, it involves the teacher in the assessment process. It can be adapted and modified, and it is conducted in the ordinary classroom by the students’ own teacher, not a stranger. TBA also gives the teacher the opportunity to provide immediate and constructive feedback. In addition, it has its own means of establishing validity and reliability: because the teacher can assess the students more than once, there is an opportunity for assessor reflection and standardization. Moreover, TBA is considered cheaper and more practical, as it is conducted by the teacher and integrated into the normal curriculum.

II. TESTING VOCABULARY

Vocabulary is one of the controversial topics in language teaching, and many theories and approaches have discussed its nature. Several scholars, such as Schmitt (2008), Nagy (2005) and Herman and Dole (1988), have proposed approaches, techniques and procedures for teaching and learning vocabulary, viewing vocabulary as the main factor in language learning and teaching. As Wilkins (1972, cited in Schmitt 2010) puts it, “Without grammar very little can be conveyed, without vocabulary nothing can be conveyed.”

Read (2000) argues that the assessment of vocabulary knowledge is necessary and straightforward. He describes words as small units that form larger structures such as sentences, paragraphs and whole texts. Read (2000) refers to two perspectives on vocabulary assessment, which he describes as complementary because they serve different purposes. In the first, the meaning and usage of a set of words are assessed as independent semantic units. Classroom teachers can use this kind of assessment to track students’ progress in learning vocabulary and to identify weaknesses. In the second, vocabulary is assessed in context, integrated with other language knowledge; this is typical of large-scale tests, in which vocabulary is given less importance. From these perspectives, three dimensions of vocabulary assessment have been suggested: discrete – embedded, selective – comprehensive, and context-independent – context-dependent. The first dimension, as stated by Read (2000), concerns the construct, that is, the ability to be measured. A discrete test assesses vocabulary as a distinct construct, whereas an embedded test assesses vocabulary as part of a larger language construct. The second dimension relates to the range of vocabulary: whether the test is based on a set of selected words or on vocabulary use across the language material as a whole. The third dimension concerns context. In a context-independent vocabulary test, the words are presented in isolation; in a context-dependent test, they are presented within a context.

III. CONTEXT

The test is administered in a Saudi national school that has an American division. The school is accredited by AdvancED and follows the California standards for public schools. The Scott Foresman Reading Street coursebook is used in the elementary stage. It is a story-based coursebook: the core book is divided into units, each consisting of five stories that contain the target vocabulary and related word knowledge. The teachers are Arab nationals who hold bachelor’s degrees in language education; however, they do not have teacher-training qualifications such as CELTA.

IV. TEST-TAKERS

The characteristics of test-takers do not generally influence test design very much; young learners, however, are an exception. McKay (2006) shows that young learners have special characteristics that require a special approach to their assessment. This approach should take into account that young learners are still growing cognitively, socially, emotionally and physically, and that they are not aware of the consequences of either success or failure in assessment. In addition, as Hughes (2003) describes, young learners lose attention in long tests, and they respond well to pictures and games. All these features should be taken into account when designing the target test, so that it has a positive impact on the learners and achieves the purpose for which it is designed.



In this case, the test-takers match McKay’s (2006) description. They are Saudi sixth graders and young language learners. They pay little attention to test results except when praise and rewards are involved. They get bored in long tests, which is reflected in their answers, and they may leave some questions unanswered. At the same time, their performance in exams affects their attitudes toward the subject.

V. TEST PURPOSE

Many scholars, such as Douglas (2010), Read (2000), Hughes (2003) and Brown (2003), describe defining the test purpose as an important step in designing a test. According to Hughes (2003), the test purpose is the basis of test validity: a test is valid as long as its results help achieve the goals for which it was designed. Thus, the first step in designing the target test should be to define a clear purpose for it. Read (2000) suggests three broad purposes for language tests: research, making decisions about learners, and making decisions about a language program. Hughes (2003) also suggests some key questions that help build a solid basis for designing a test. These questions concern the type of test, its precise purpose, the abilities to be tested, the form and accuracy of the results required, and the importance of backwash. Together with Read’s (2000) broad purposes, these questions provide the framework for defining the purpose of the target test.

This test is a teacher-based assessment, not a large-scale one. It is designed for sixth graders in a Saudi national school. It aims to support decisions about learners, not research or decisions about language programs. It assesses the students’ vocabulary knowledge, mainly the meaning and use of words, with a focus on recognition and recall. According to the categorizations of Brown (2003) and Hughes (2003), it is an achievement test, as it is given at the end of Unit 4 of Scott Foresman Reading Street (Grade 6). It is a formal assessment, as it is taken regularly at the end of each unit. It is a discrete-point test rather than an integrative one, as it assesses only vocabulary. This type of testing is used regularly in the target context for several reasons. First, vocabulary is one of the weak points in the students’ performance, so regular assessment motivates them to focus on it. In addition, the school uses ongoing assessment, which accounts for 60% of the overall grade, with weekly and unit tests alone accounting for 30%. Moreover, regular testing gives the students and their parents the opportunity to identify weaknesses in vocabulary, and the teacher can suggest appropriate ways to overcome the challenges the students face.

VI. CONTENT

Read (2000) and Hughes (2003) emphasize the importance of defining the content of a test. Read (2000) argues that full information about the content helps the designer make better decisions in the subsequent stages of test development. Hughes (2003) also shows that defining the construct to be assessed influences the validity of the test. Content validity is an important component of overall test validity, and it also affects backwash. Hughes (2003) shows that content validity is achieved by including appropriate samples of the skills or knowledge to be assessed, while Alderson, Clapham and Wall (1995) state that content validation is based on gathering experts’ judgements about the content.

Hughes (2003) also argues that the way the content is described varies according to the abilities being tested. For instance, the content of a grammar test may simply be a list of structures, whereas the content of a skills test covers other dimensions such as text type, topics, speed of processing and dialect. In a vocabulary test, a further issue arises: selecting the target words. Read (2000) shows that the way target words are selected varies according to the type of test. In designing a classroom progress test, as Read (2000) describes, the teacher usually selects the words that the learners have recently been studying, which is straightforward when the test is based on a coursebook that includes word lists for each unit. Once the words are selected, the question of how to present them arises. Read (2000) argues that words can be presented either in isolation or in context; the key consideration is whether the test-takers’ responses are expected to draw on contextual information.

In this case, as stated in the test purpose, this is a vocabulary achievement test, so the content outline can be simple and practical: it consists simply of the word lists to be assessed. Selecting the target words is also simple. The test is based on the Scott Foresman Reading Street coursebook, which is divided into units of five stories, each containing a list of high-frequency words. The target word lists are drawn from Unit 4, as shown in Appendix 1. The lists are monolingual, with definitions given in English, and all the words taught in the unit are included. The content is then reviewed by the academic supervisor, which supports the content validation of the test.

VII. STRUCTURE, TIMING, MEDIUM/CHANNEL

Given the syllabus outline and the available time, this test is given at the end of the unit within one class period, so it lasts 30 minutes. As mentioned above, it assesses only vocabulary. It contains 20 items and is a paper-and-pencil test.

VIII. TEST TASKS

a. Task Types

Douglas (2010) shows that different testing techniques can be used depending on the target language skill or knowledge; the techniques used to assess speaking, for example, may differ from those used to assess writing. However, Hughes (2003) refers to some common techniques that can be used with different language skills, including multiple-choice, yes/no and gap-filling items. Read (2000) suggests several techniques for assessing vocabulary: matching, completion and sentence-writing items. Matching is a recognition task that focuses on word meaning. Read (2000) offers several guidelines for writing matching items: add one or two extra options, so that the last answers cannot simply be found by elimination; keep the definitions easy to understand; and make sure the words belong to one word class. Completion, by contrast, is a recall task. Read (2000) suggests giving the first letter of the target word so that the number of possible answers is restricted. Sentence writing is the simplest vocabulary task from the perspective of test preparation (Read 2000), and it assesses meaning, grammatical function, collocation and the ability to use the word in writing.
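Read’s (2000) guidelines for matching and completion items can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not part of the original test design: the hypothetical helper `make_matching_section` mixes the correct definitions with extra distractor definitions, shuffles and letters them, and keeps an answer key, while the hypothetical helper `make_completion_item` blanks out a target word but leaves its first letter as a cue. The word–definition pairs and the two distractors are those of the first matching section in Appendix 2; the completion sentence is invented for illustration and is not taken from the coursebook.

```python
import random
import string


def make_matching_section(key, distractors, seed=None):
    """Build one matching section from word -> definition pairs.

    The correct definitions are mixed with extra distractor definitions,
    shuffled, and labelled a, b, c, ...  Returns the lettered options and an
    answer key mapping each word to the letter of its definition.
    Assumes every definition is unique.
    """
    rng = random.Random(seed)
    options = list(key.values()) + list(distractors)
    rng.shuffle(options)
    lettered = dict(zip(string.ascii_lowercase, options))
    letter_of = {definition: letter for letter, definition in lettered.items()}
    answers = {word: letter_of[definition] for word, definition in key.items()}
    return lettered, answers


def make_completion_item(sentence, target):
    """Blank out the first occurrence of target, keeping its first letter as a cue."""
    return sentence.replace(target, target[0] + "_" * 10, 1)


# Word-definition pairs and distractors from the first matching section of Appendix 2.
key = {
    "Existence": "Condition of being",
    "Ordeal": "A severe experience",
    "Encounter": "Meet in a battle",
    "Earthen": "Made of soil",
    "Proclaimed": "Declared publicly",
}
distractors = ["Friendly", "The state of being alone"]

lettered, answers = make_matching_section(key, distractors, seed=1)
for letter, definition in lettered.items():
    print(f"{letter}. {definition}")
print(answers)

# Hypothetical sentence, not taken from the coursebook.
print(make_completion_item("The travellers survived a long ordeal in the desert.", "ordeal"))
```

With five words and seven options, the two distractors mean that a student who knows four of the words still has to choose among three definitions for the fifth.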

Douglas (2010) groups testing techniques into three categories: selected response, short response and extended response. In selected-response tasks, test-takers choose answers from the options provided; this category includes multiple-choice, matching and ordering tasks. These tasks are easy to score and can assess language knowledge, but they require a great deal of care in development, including revision and trialling. In short-answer items, test-takers produce words or phrases to complete sentences. The main difficulty in developing such items is anticipating all the acceptable answers, and they too require great care in development. In extended-response tasks, test-takers produce written or spoken discourse longer than a sentence; a common example is the essay or composition.

In this case, matching items, a selected-response task, and completion items, whose possible answers are restricted by a given first letter, will be used to assess vocabulary. Several reasons lie behind this choice. First, both techniques are easy to score, which makes them appropriate for the target test: because the test takes place during the teaching term, there is not enough time to score open-ended short-answer or extended-response tasks. Second, together they cover the vocabulary abilities being tested: the matching technique assesses recognition, and the completion technique assesses recall. The sentence-writing technique is not appropriate in this case, as it is time-consuming and involves grammar and writing, which are not objectives of this test.

b. Writing Items

Hughes (2003) argues that no single version of a test can cover every item in the content. To ensure validity, the test should therefore sample widely and unpredictably from the content area. For vocabulary, Hughes suggests that items can be sampled at random. He also emphasizes the importance of keeping the test specifications in mind when writing the items. In addition, Hughes (2003) states that the test should be examined by at least two colleagues, as he considers moderation the best way to identify items that need improvement.

In this case, four vocabulary items are taken at random from each story, so that all the stories are covered by the test. Two of these words are used in the matching part and the other two in the completion part. The matching part assesses recognition. It is divided into sections so that it is not too long for the students, with five words matched against seven definitions in each section. The completion part consists of ten sentences based on the remaining ten selected words. Each target word is removed from its sentence, leaving only its first letter. The sentences may be taken from the coursebook or from other text materials. Once written, the items are reviewed by the academic supervisor for improvement, and sometimes also by another teacher of the same grade.
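The sampling plan above (four words per story, two for matching and two for completion, with the matching words split into sections of five) can be sketched in Python. The word lists are hypothetical placeholders standing in for the Unit 4 lists, and seven words per story is an assumption; `build_blueprint` is an illustrative helper, not part of the original design.

```python
import random

# Hypothetical placeholder word lists: five stories per unit, as in the
# coursebook, with an assumed seven words per story. Real lists would come
# from the unit's vocabulary pages (Appendix 1).
WORD_LISTS = {
    f"story{s}": [f"s{s}_word{w}" for w in range(1, 8)]
    for s in range(1, 6)
}


def build_blueprint(word_lists, per_story=4, section_size=5, seed=None):
    """Sample words at random from every story and split them between parts.

    Half of each story's sample goes to the matching (recognition) part and
    half to the completion (recall) part; the matching words are then cut
    into short sections.
    """
    rng = random.Random(seed)
    matching, completion = [], []
    for words in word_lists.values():
        picked = rng.sample(words, per_story)  # random, unpredictable sample
        half = per_story // 2
        matching.extend(picked[:half])
        completion.extend(picked[half:])
    sections = [matching[i:i + section_size]
                for i in range(0, len(matching), section_size)]
    return sections, completion


sections, completion = build_blueprint(WORD_LISTS, seed=2013)
print(len(sections), [len(s) for s in sections], len(completion))  # 2 [5, 5] 10
```

Five stories with two matching words each give two sections of five, and the remaining ten words form the completion part, which matches the 20 items of the test. Passing a fixed `seed` makes a particular draw reproducible, so the supervisor or a colleague reviewing the test can regenerate the same version.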



IX. TEST ADMINISTRATION

a. Test Environment

Douglas (2010) stresses the importance of giving test-takers the best opportunity to show their language ability, so the test should be taken in a setting that is as pleasant as possible. Hughes (2003) states that rooms should suit the number of test-takers and that there should be enough space between desks to prevent copying and cheating. He also states that the testing materials should be ready in advance and that all the required equipment should be available in plenty of time to allow for repair or replacement. Douglas (2010) suggests short breaks for long tests lasting two or three hours.

In this case, because the test is a teacher-based assessment, the students take it in their ordinary classroom. The desks are checked two days beforehand for any repair or replacement. They are arranged in rows with enough space between them to prevent cheating. The test sheets are photocopied two or three days in advance. As the test lasts only 30 minutes, there are no breaks.

b. Personnel and Procedures

Hughes (2003) and Douglas (2010) identify several factors related to test personnel and procedures that have to be considered. First, examiners should be given full instructions and practice in the different aspects of the test. There should be enough proctors, especially when there are many test-takers. The test-takers themselves should receive full instructions about the test, and rules should be set for arrival time, acceptable behavior and identity checks.

In this case, the teacher acts as both examiner and proctor. The students will be present on time, as the test takes place during a regular school period. They are notified on the day of the test. The teacher gives the students full instructions about what to do in the test, the testing rules and the time limit.

c. Scoring & Feedback



After designing the test, scoring and grading must be considered (Brown 2003). Methods of scoring vary according to the type of subtest. For objective subtests, Alderson, Clapham and Wall (1995) explain that each item is assigned one mark if correct and 0 if wrong, and these marks are then added together to arrive at a total for the subtest or the whole test. Such tests can be scored mechanically (Douglas 2010). For subjective tasks, by contrast, holistic or analytic rating scales are used to assess performance on the whole test or on individual tasks, and Douglas (2010) notes that these require trained raters.

Alderson, Clapham and Wall (1995) note that some items may be more important than others and could therefore be given extra weight. However, they state that the simplest method of weighting is to give each item the same mark.
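The dichotomous scoring and weighting described above can be sketched in Python. This is a minimal illustration under stated assumptions: item IDs such as `M1` (matching) and `C1` (completion) are hypothetical, each item’s key is a set of acceptable answers (so that a completion item with more than one acceptable word can be handled), and responses are compared after trimming spaces and ignoring letter case. When no weights are supplied, every item counts one mark, which is the simplest weighting method.

```python
def normalise(answer):
    """Ignore surrounding spaces and letter case when comparing answers."""
    return answer.strip().lower()


def score_item(response, acceptable):
    """Dichotomous scoring: 1 if the response matches an acceptable answer, else 0."""
    return 1 if normalise(response) in {normalise(a) for a in acceptable} else 0


def score_test(responses, answer_key, weights=None):
    """Add up the item marks; an item with no response scores 0.

    weights maps item ID -> multiplier; None means equal weighting.
    """
    if weights is None:
        weights = {item: 1 for item in answer_key}
    return sum(
        weights[item] * score_item(responses.get(item, ""), acceptable)
        for item, acceptable in answer_key.items()
    )


# Hypothetical three-item example: two matching items and one completion item.
answer_key = {"M1": {"c"}, "M2": {"g"}, "C1": {"ordeal"}}
responses = {"M1": "c", "M2": "a", "C1": " Ordeal "}

print(score_test(responses, answer_key))                               # 2
print(score_test(responses, answer_key, {"M1": 1, "M2": 1, "C1": 2}))  # 3
```

Under equal weighting the student earns one mark each for `M1` and `C1`; doubling the weight of `C1` (an arbitrary choice for illustration) raises the total to 3.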

One important step in setting the scoring plan is determining the pass mark. Alderson, Clapham and Wall (1995) suggest several ways of setting it, depending mainly on whether the test is norm-referenced or criterion-referenced. The main difference between these approaches is whether scores are interpreted by ranking the test-takers against one another or against a fixed standard or criterion. Alderson, Clapham and Wall (1995) also refer to some historical traditions for pass marks, such as 50% or 75%.
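The difference between the two ways of interpreting scores can be shown with a small Python sketch. It assumes hypothetical scores out of 20 for three test-takers; the illustrative helper `passes_criterion` compares a score with a fixed percentage cut, such as the 50% or 75% traditions mentioned above, while `rank_test_takers` gives the norm-referenced view, ordering test-takers relative to one another. Integer arithmetic is used so that a score lying exactly on the cut is not affected by floating-point rounding.

```python
def passes_criterion(score, max_score, cut_percent=50):
    """Criterion-referenced: pass if the score reaches a fixed percentage of the maximum."""
    return score * 100 >= cut_percent * max_score


def rank_test_takers(scores):
    """Norm-referenced: order test-takers from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


scores = {"S1": 14, "S2": 9, "S3": 18}  # hypothetical scores out of 20

for name, score in scores.items():
    print(name, passes_criterion(score, 20), passes_criterion(score, 20, cut_percent=75))
print(rank_test_takers(scores))  # ['S3', 'S1', 'S2']
```

With a 50% cut on a 20-item test, a student needs 10 marks to pass, whatever the other students score; the ranking, by contrast, only says how each student performed relative to the group.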

In this case, the test contains 20 objective items: matching items and completion items. Each item is assigned one mark if correct and 0 if wrong. All the items carry the same mark, as they are all vocabulary items of equal importance and recall and recognition receive equal focus. As the school itself uses a 50% pass mark, the test adopts the same pass mark, that is, 10 out of 20. The test is marked by the students’ own teacher, which may help to establish its reliability: the teacher already knows the students and also has the opportunity to carry out more than one assessment to check the reliability of the results.



Brown (2003) shows that the scoring process is not complete until feedback is given. In this case, because the test is a teacher-based assessment, the students receive immediate and constructive feedback.

CONCLUSION

To conclude, an achievement test has been designed (see Appendix 2). It is a teacher-based assessment to be taken in a Saudi national school by sixth graders in the school’s American division. The test focuses only on vocabulary knowledge, specifically recognition and recall. It is part of the ongoing assessment that takes place weekly and at the end of each unit, and it is intended to promote beneficial washback. The content of the test is based on the high-frequency word lists of Unit 4 in Scott Foresman Reading Street, with an English definition for each word. The test lasts 30 minutes, consists of 20 items and is a paper-and-pencil test. The test tasks will be matching items, a selected-response format, and completion items cued by the first letter of the target word. Four vocabulary items will be selected at random from each story: two for the matching part and two for the completion part. The test will be reviewed by the academic supervisor and possibly by another teacher. It will be conducted in the ordinary classroom and administered and marked by the students’ own teacher, not a stranger. Each item will be assigned one mark if correct and 0 if wrong, and the students will receive immediate and constructive feedback once the test is finished.



BIBLIOGRAPHY

Alderson, J. C., Clapham, C. & Wall, D. (1995). Language Test Construction and Evaluation. Cambridge: Cambridge University Press.

Brown, H. D. (2003). Language Assessment: Principles and Classroom Practices. Pearson Longman.

Davison, C. & Leung, C. (2009). Current issues in teacher-based assessment. TESOL Quarterly, 43(3), 393–415.

Douglas, D. (2010). Understanding Language Testing. Abingdon: Hodder.

Fulcher, G. & Davidson, F. (2007). Language Testing and Assessment: An Advanced Resource Book. Routledge.

Herman, P. A. & Dole, J. (1988). Theory and practice in vocabulary learning and instruction. The Elementary School Journal, 89(1), 42–54.

Hughes, A. (2003). Testing for Language Teachers. Cambridge: Cambridge University Press.

McKay, P. (2006). Assessing Young Language Learners. Cambridge: Cambridge University Press.

Nagy, W. (2005). Why vocabulary instruction needs to be long-term and comprehensive. In Hiebert, E. H. & Kamil, M. L. (Eds.), Teaching and Learning Vocabulary (pp. 26–40). New Jersey: Lawrence Erlbaum Associates.

Read, J. (2000). Assessing Vocabulary. Cambridge: Cambridge University Press.

Schmitt, N. (2008). Teaching Vocabulary. Pearson Education.

Schmitt, N. (2010). Researching Vocabulary: A Vocabulary Research Manual. Palgrave Macmillan.



Appendix 1



Appendix 2

Name : ………………………………………………………………………………………………………………………………..

Grade | Subject | Unit | Selections | Date
6 | L.A. | 4 | 1–5 | ...../……/2014

Vocabulary Test

Directions: Match the word with its meaning. (1 mark each)

(A)

1. Existence ( )

2. Ordeal ( )

3. Encounter ( )

4. Earthen ( )

5. Proclaimed ( )

(B)

a. Made of soil

b. Meet in a battle

c. Condition of being

d. Declared publicly

e. Friendly

f. The state of being alone

g. A severe experience

(A)

1. Generated ( )

2. Destiny ( )

3. Isolation ( )

4. Aliens ( )

5. Hospitable ( )

(B)

a. One’s fate or fortune

b. Condition of being

c. Imaginary creatures

d. The state of being alone

e. Produced

f. Friendly


Complete the following sentences with the appropriate word. (1 mark each)

Note: The first letter of each word is already written.

1. N……………………. is a person in charge of finding the position and course of a ship or aircraft.

2. E……………………………. is a journey taken for a special purpose.

3. Azzam really enjoyed the c………………………. among his friends.

4. The science class went to visit several wildlife s……………….. .

5. Enslaved people experienced the b……………………. .

6. S……………………. is what a town or city of today once was.

7. They mine iron o……………… here.

8. Her v…………………. of the story was hilarious.

9. “It c…………………… food into electrical pulses,” the inventor declared.

10. If the machine worked, it could result in more e…………………….. by saving time.

(Sentences are taken from the Scott Foresman Reading Street Practice Book.)
