AN ITEM ANALYSIS ON DISCRIMINATING POWER
OF ENGLISH SUMMATIVE TEST
(A Case Study of Second Year of “SMPN 87” Pondok Pinang)
A “Skripsi”
Presented to the Faculty of Tarbiya and Teachers' Training
in Partial Fulfillment of the Requirements
for the Degree of S.Pd. (Bachelor of Arts) in English Language Education
By:
HIKMAH LESTARI
105014000379
DEPARTMENT OF ENGLISH EDUCATION
FACULTY OF TARBIYAH AND TEACHERS’ TRAINING
SYARIF HIDAYATULLAH STATE ISLAMIC UNIVERSITY
JAKARTA
2011
AN ITEM ANALYSIS ON DISCRIMINATING POWER
OF ENGLISH SUMMATIVE TEST
(A Case Study of Second Year of “SMPN 87” Pondok Pinang)
A “Skripsi”
Presented to the Faculty of Tarbiya and Teachers' Training
in Partial Fulfillment of the Requirements
for the Degree of S.Pd. (Bachelor of Arts) in English Language Education
Approved by the advisor:
Dr. Atiq Susilo, MA
NIP: 19491122 197803 1 001
DEPARTMENT OF ENGLISH EDUCATION
FACULTY OF TARBIYAH AND TEACHERS' TRAINING
SYARIF HIDAYATULLAH STATE ISLAMIC UNIVERSITY
JAKARTA
2011
ENDORSEMENT BY THE EXAMINATION COMMITTEE
The examination committee of the Faculty of Tarbiya and Teachers' Training certifies that
the "Skripsi" entitled "AN ITEM ANALYSIS ON DISCRIMINATING POWER OF
ENGLISH SUMMATIVE TEST", written by Hikmah Lestari, student's registration
number 105014000379, was examined at the examination session of the Faculty of Tarbiyah
and Teachers' Training, Syarif Hidayatullah State Islamic University Jakarta, on March
31st, 2011. The "Skripsi" has been accepted and declared to have fulfilled one of the
requirements for the degree of S.Pd. (Bachelor of Arts) in English Education in the
Department of English Education.
Jakarta, March 31st, 2011
Examination Committee:
CHAIRMAN : Drs. Syauki, M.Pd. (…………………….)
NIP. 19641212 199103 1 002
SECRETARY : Neneng Sunengsih, S.Pd. (…………………….)
NIP. 19730625 199903 2 001
EXAMINER 1 : Dr. Fahriany, M.Pd. (…………………….)
NIP. 19700611 199101 2 001
EXAMINER 2 : Drs. Bahrul Hasibuan, M.Ed. (…………………….)
Acknowledged by:
Dean of Tarbiya and Teachers Training Faculty
Prof. Dr. Dede Rosyada, M.A.
NIP. 19571005 198703 1 003
ABSTRACT
Lestari, Hikmah. 2011, An Item Analysis on Discriminating Power of English
Summative Test at the Second Grade Students of SMPN 87 Pondok
Pinang, “Skripsi”, Department of English Education, The Faculty of
Tarbiya and Teachers’ Training, Syarif Hidayatullah State Islamic
University Jakarta.
Advisor: Dr. Atiq Susilo, MA
Key words: Item Analysis, Discriminating Power, SMPN 87
The purpose of this study is to analyze the discriminating power of the English
summative test at the second grade of "SMPN" 87 Pondok Pinang. Through this
study, it is hoped that the teacher can get a clear description of the quality of the
discriminating power of the English summative test, so that the teacher is able to
help the low-achieving students.
This study is categorized as a descriptive analysis, because it is intended to
describe the objective condition of the discriminating power of the students'
English summative test at the odd semester of the second grade of "SMPN 87"
Pondok Pinang by analyzing the quality of the English summative test items in
discriminating students' achievement. This study is considered quantitative
research, because the researcher used numerical data that were analyzed statistically.
The finding of this study is that the English summative test administered
at the second grade of "SMPN 87" Pondok Pinang has good discriminating power,
because 35 items, with indices ranging from 0.25 to 0.75 (70% of the test items),
have fulfilled the criterion of positive discriminating power.
ABSTRAK
Lestari, Hikmah. 2011, An Item Analysis on Discriminating Power of English
Summative Test at the Second Grade of SMPN 87 Pondok Pinang,
Skripsi, Department of English Education, Faculty of Tarbiyah and
Teachers' Training, Syarif Hidayatullah State Islamic University Jakarta.
Advisor: Dr. Atiq Susilo, MA
Key words: Item Analysis, Discriminating Power, SMPN 87
The purpose of this study is to analyze the discriminating power of the
English summative test for the second grade of SMPN 87 Pondok Pinang.
Through this study, it is hoped that teachers obtain a clear picture of the quality
of the discriminating power of the English summative test, so that they can help
low-scoring students.
This study is categorized as a descriptive analysis, because it describes the
objective condition of the discriminating power of the odd-semester English
summative test of the second-grade students of SMPN 87 Pondok Pinang by
analyzing the ability of the English summative test items to distinguish among
the students' abilities. This study is quantitative research, because the researcher
used numerical data analyzed statistically.
The finding of this study is that the English summative test administered
to the second grade of SMPN 87 Pondok Pinang has good discriminating power,
because 35 items with indices from 0.25 to 0.75, or 70% of the test items, have
fulfilled the criterion of positive discrimination.
Mrs. Dra. Farida Hamid, M.Pd., Mr. Prof. Dr. Mulyanto Sumardi, MA., The late
Mr. Drs. Munir Sonhaji, M.Ed., Mr. Drs. Nasifuddin Djalil, M.Ag.,
Mr. Drs Arifin Toy, M.Sc., Mr. Drs. M. Zaenuri, M.Pd., and Mr. Drs Nasrun
Mahmud, M.Ed.
The writer also would like to express her gratitude to Mr. Dr. Atiq Susilo,
MA, as the writer's advisor, who kindly spent his time giving his valuable
advice, guidance, correction, and suggestions in finishing this "skripsi".
Her gratitude also goes to Mr. Drs. Syauki, M.Pd as the head of English
Education Department, Mrs. Neneng Sunengsih S.Pd as the secretary of English
Education Department, and Prof. Dr. Dede Rosyada, MA as the Dean of Faculty
of Tarbiya and Teachers’ Training.
The writer also dedicates many thanks to Mr. Drs. Ishak Idrus, the
headmaster of "SMPN" 87 Pondok Pinang, who gave the writer permission
to do the research at "SMPN" 87 Pondok Pinang.
The writer also would like to express her gratitude and love to all beloved
classmates (2005) of English Education Department, either class A, B, or C, for
sharing their knowledge, support, and time in accomplishing this ”skripsi” and for
the wonderful friendship while studying together.
May Allah the Almighty bless them all, amen.
Finally, the writer realizes that this "skripsi" is still far from perfect.
Constructive criticism and suggestions to make it better are welcome.
Jakarta, March 31st, 2011
The writer
TABLE OF CONTENTS
ABSTRACT .................................................................................................... i
ABSTRAK ...................................................................................................... ii
ACKNOWLEDGEMENT ............................................................................. iii
TABLE OF CONTENTS ............................................................................... v
LIST OF TABLES ......................................................................................... vii
LIST OF APPENDICES ............................................................................... viii
CHAPTER I: INTRODUCTION
A. The Background of the Study .......................................................... 1
B. The Limitation of the Problem ........................................................ 3
C. The Formulation of the Problem ..................................................... 4
D. The Objective of Study .................................................................... 4
E. The Method of Study ....................................................................... 4
F. The Organization of the Writing .................................................... 5
CHAPTER II: THEORETICAL FRAMEWORK
A. Test
1. The Definition of Test ............................................................... 6
2. Types of Test ............................................................................. 7
3. Characteristic of a Good Test .................................................... 11
B. Item Analysis
1. The Definition of Item Analysis ................................................ 13
2. Discriminating Power ................................................................ 14
3. Types of Test Item ..................................................................... 16
4. The Importance of Item Analysis .............................................. 18
CHAPTER III: RESEARCH METHODOLOGY
A. Place and Time of Research ............................................................ 20
B. Technique of Sample Taking ........................................................... 20
C. Technique of Data Collecting .......................................................... 21
D. Research Instrument ........................................................................ 21
E. Technique of Data Analysis ............................................................ 21
CHAPTER IV: RESEARCH FINDINGS
A. Description of Data .......................................................................... 23
B. Analysis of Data .............................................................................. 36
C. Interpretation of Data....................................................................... 37
CHAPTER V: CONCLUSION AND SUGGESTION
A. Conclusion ....................................................................................... 38
B. Suggestion ....................................................................................... 39
BIBLIOGRAPHY .......................................................................................... 40
APPENDICES ................................................................................................ 42
LIST OF TABLES
Table 4.1 The Students’ Score and Group Position of English Summative
Test at the Odd Semester............................................................... 25
Table 4.2 The Students’ Answer Sheet of English Summative Test Items
from the Upper Group ................................................................... 27
Table 4.3 The Students’ Answer Sheet of English Summative Test Items
from the Lower Group .................................................................. 30
Table 4.4 The Discriminating Power Index of the Upper and Lower Group 33
Table 4.5 The Percentage of Discriminating Power ..................................... 35
LIST OF APPENDICES
Appendix 1 Form of Test .......................................................................... 42
Appendix 2 Answer Key ........................................................................... 48
Appendix 3 Students’ Answer Sheet ........................................................ 54
Appendix 4 Surat Pengajuan Judul Skripsi ............................................... 59
Appendix 5 Surat Izin Penelitian .............................................................. 61
Appendix 6 Surat Keterangan Telah Melakukan Penelitian ..................... 62
CHAPTER I
INTRODUCTION
A. The Background of Study
Making an evaluation is an integral part of life; we evaluate all aspects of
our life and work constantly. In the field of education, evaluation plays an
important role because it reflects the results of educational development. Evaluation
may be defined as the systematic process of collecting, analyzing, and interpreting
information to determine the extent to which pupils are achieving instructional objectives.
Evaluation gives information about how successful the efforts of education
have been. It helps teachers obtain information about the progress of students'
achievement of the material they have learned in order to make decisions.
Evaluation cannot be separated from the teaching-learning process. According to
Bachman in his book Fundamental Considerations in Language Testing,
evaluation is defined as "the systematic gathering of information for the purpose of
making decisions".1 In addition, the purpose of evaluation is to provide relevant
information.
There are many techniques for collecting information for evaluation
purposes. One of them is by using a test.2 A test in plain words is "a method of
1 Lyle F. Bachman, Fundamental Considerations in Language Testing, (Toronto: Oxford University Press, 1990), p. 20.
2 Fred Genesee & John A. Upshur, Classroom-Based Evaluation in Second Language Education, (New York: Cambridge University Press, 1996), p. 140.
measuring person’ ability or knowledge in a given domain”.3 Tests are used for
pedagogical purposes, either as a mean of motivating students to study or as a
mean of reviewing the material taught.
Students usually tend to study harder when they are going to have an
examination than when they are not, and they will emphasize studying the
material they expect to be tested on. Thus, if the teacher announces an
examination, most students are motivated to study or to review the material
assigned.
A test is supposed to be well constructed so that it can be used effectively.
To be considered good, a test has to fulfill the characteristics of a good test:
validity, reliability, and practicality. A test is valid if it measures what it is
supposed to measure. It is reliable if it yields the same results even when
administered to the same group several times. A test is practical if it is easy to
administer and score.
In applying the three characteristics above, the teacher should prepare the
test as well as possible. After the teacher administers and scores the test, it is
desirable to evaluate its effectiveness, especially the test items, because teachers
need to use their own judgment to know how well each item works. This is done
by studying the students' responses to each item. When formalized, the procedure
is called item analysis.
Item analysis provides a quick, simple technique for appraising the
effectiveness of individual test items.4 Item analysis procedures provide
information for evaluating the functional effectiveness of each item and for
detecting weaknesses that should be corrected. This information is useful when
reviewing the test with students, and it is indispensable when building a file of
high-quality items.
There are three characteristics which are usually determined for a test
item: first, item difficulty; it indicates how difficult each item was for the group.
3 H. Douglas Brown, Teaching by Principles: An Interactive Approach to Language Pedagogy, (New York: Addison Wesley Longman, 2001), p. 384.
4 Norman E. Gronlund, Measurement and Evaluation in Teaching, (New York: Macmillan Publishing Co., 1981), p. 262.
Second, discriminating power; it tells how well the item performs in separating
the better students from the poorer students. Third, item distracters; for multiple-choice
items, this indicates how effective each alternative was. So it can be
concluded that item analysis provides us with data on whether a test item is too
difficult or too easy, whether it can discriminate among the students, and whether
all the alternatives functioned as intended.
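As a rough illustration of the second characteristic, the discrimination index is commonly computed by comparing an upper-scoring and a lower-scoring group on each item. The sketch below assumes the simple index D = (U - L) / n, where U and L are the numbers of correct answers in equal-sized upper and lower groups of size n; the function name and the example numbers are illustrative, not figures from the test analyzed in this study.

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """Discrimination index D = (U - L) / n for a single item.

    upper_correct: students in the upper group who answered correctly
    lower_correct: students in the lower group who answered correctly
    group_size: number of students in each group (groups are equal-sized)
    """
    return (upper_correct - lower_correct) / group_size

# An item answered correctly by 18 of 20 upper-group students
# but only 8 of 20 lower-group students:
d = discrimination_index(18, 8, 20)
print(d)  # 0.5, a positively discriminating item
```

An index near +1 means the item strongly favors the better students, 0 means it does not separate the groups at all, and a negative value signals a flawed item that the weaker students answered correctly more often.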
Incidentally, the writer is a practice teacher at "SMPN 87" Pondok
Pinang. She often heard some of the students say that the English test items in
their school are hard, while others said the contrary. Therefore, from this
personal experience, she is interested in figuring out the quality of the
discriminating power of the English summative test items at "SMPN 87" Pondok
Pinang. The writer chooses discriminating power because she thinks it deals
more with the students than the other two characteristics, the level of difficulty
and the effectiveness of the distracters.
Based on the explanation above, the writer tries to limit the problem of
item analysis that she will discuss, so she focuses only on the discriminating power
of the test items. The test that will be analyzed by the writer is the final test of the
odd semester administered to the second grade of "SMPN 87" Pondok Pinang.
The main aim of this study is to know how well the test items can discriminate
between the students who have achieved well and those who have achieved
poorly. So, the writer tries to analyze and interpret it under the title "AN ITEM
ANALYSIS ON THE DISCRIMINATING POWER OF ENGLISH
SUMMATIVE TEST". This study will be done at the second grade of "SMPN
87" Pondok Pinang.
B. Limitation and Scope of the Problem
In order to make this study easier to comprehend and not too broad, the
writer is going to limit the area of study. First, the writer intends to see the
quality of the test items only by doing an item analysis focused on
discriminating power. By analyzing the discriminating power of the test items, the
writer can conclude how well they discriminate between the students who
performed well and those who did poorly on the test as a whole.
C. Formulation of the Problem
Based on the limitation of the problem, the writer formulates the problem
in this research as follows: Do the test items of the English summative test
administered at the second grade of "SMPN 87" Pondok Pinang have good
discriminating power?
D. Objective of the Study
The objective of this study is to measure the quality of the English
summative test and to know whether its items have good discriminating power
or not. High-quality test items are essential for diagnosing the strengths and
weaknesses of students. Thus, the findings of this study are expected to provide
useful information about the quality of the test items.
It is hoped that the analysis of the discriminating power of the English
summative test carried out by the writer can make a significant contribution to
improving the quality of future English summative test items. It is also expected
to reveal the students who have achieved well on the test and those who have
not, so that the teacher is able to help the low-achieving students.
E. Method of the Study
This study is categorized as a descriptive analysis, because it is intended to
describe the objective condition of the discriminating power of the students'
summative test at the odd semester of the second grade of "SMPN 87" Pondok
Pinang. Besides, this study is called an analysis, because it analyzes how well the
items of the English summative test can discriminate between the students who
have achieved well and those who have achieved poorly. This study is considered
quantitative research, because the researcher used numerical data that were
analyzed statistically.
F. Organization of the Writing
The writing is systematically divided into five chapters:
Chapter one deals with the introduction. It consists of the background of
the study, limitation and formulation of the problem, objective of the study,
method of the study, significance of the study, and organization of the writing.
Chapter two discusses the theoretical framework. It is divided into two
sections. The first section discusses tests: the definition of a test, functions of
tests, types of tests, types of test items, and the characteristics of a good test. The
second section discusses item analysis: the definition of item analysis,
discriminating power, and the importance of item analysis.
Chapter three deals with the research methodology. It discusses the
objective of the study, place and time of the study, technique of data collecting,
and technique of data analysis.
In chapter four, the writer focuses on the research findings. The data
description, data analysis, and data interpretation are included in this section.
The last chapter, chapter five, presents the conclusion and suggestions.
CHAPTER II
THEORETICAL FRAMEWORK
A. Test
1. The Definition of Test
A test is one of the instruments for collecting data. Tests can be used in an
instructional program to assess entry behavior, monitor learning progress,
diagnose learning difficulties, and measure performance at the end of
instruction. Tests are given for many different reasons. In order to achieve
such diverse purposes, they need to be carefully planned. In classroom
settings, this planning usually entails instructional objectives.
A test is a procedure designed to elicit certain behavior from which
one can make inferences about certain characteristics of an individual.1 A test
is an instrument, device, or procedure that proposes a sequence of tasks to which
a student is to respond, the results of which are used as measures of a specific
trait. A test may be defined as a task or series of tasks used to obtain systematic
observations presumed to be representative of educational or psychological
traits or attributes.2
Cronbach defines a test as a "systematic procedure for observing a
person's behavior and describing it with the aid of a numerical scale or a
1 Lyle F. Bachman, Statistical Analyses for Language Assessment, (Cambridge: Cambridge University Press, 2004), p. 9.
2 Gilbert Sax, Principles of Educational and Psychological Measurement and Evaluation, (Belmont: Wadsworth Publishing Company, 1980), p. 13.
category system". The phrase "systematic procedure" indicates that a test is
constructed, administered, scored, and described according to prescribed rules.
The term "behavior" implies that a test measures the responses a person makes
to the test items. Tests do not measure a person directly; rather, they infer
his characteristics from his responses to test items. We do not observe all
behavior but only a sample of behavior. A test contains only a sample of all
possible items. The test results are described with the aid of measurement
scales.3
Based on the definitions above, the writer concludes that a test is
an instrument administered to measure students' responses to the test
items.
2. Types of Test
Tests can be categorized according to the types of information they
provide. This categorization will prove useful both in deciding whether an
existing test is suitable for a particular purpose and in writing appropriate new
tests where these are necessary.4 Tests can be classified based on their purpose
and based on their maker.
a. Based on its purpose
1) Aptitude Test
An aptitude test is primarily designed to predict success in some
future learning activity. It is generally given before the student begins
language study, and may be used to select students for a language
course or to place students in sections appropriate to their ability.
Aptitude tests provide information useful in determining learning readiness,
individualizing instruction, organizing classroom groups, identifying
underachievers, diagnosing
3 H.J.X. Fernandes, Testing and Measurement, (Jakarta: National Educational Planning, Evaluation and Curriculum Development, 1984), p. 1.
4 Arthur Hughes, Testing for Language Teachers, (New York: Cambridge University Press, 2003), p. 5.
learning problems, and helping students with their educational and
vocational plans.
Aptitude tests are often used in selecting individuals for jobs, for
admission to training programs, for scholarships, and for many other
purposes. Sometimes aptitude tests are used for classifying individuals,
as when students are assigned to different ability-grouped sections of
the same course.5
2) Achievement Test
Achievement tests measure what a person has learned during a
course of instruction. They are given at the end of the course. The content of
achievement tests is generally based on the course syllabus or the
course textbook.
An achievement test is designed to indicate the degree of success in some
past learning activity. This purpose is obviously different from that of an
aptitude test, which is designed to predict success in some future learning
activity. A distinction between these two tests is made in terms of the use of the
results rather than the qualities of the tests themselves.6
Assessment and evaluation are terms often used in connection with
achievement testing. The purpose of the testing is to assess present
attainment. With achievement tests, we are trying to measure students'
present attainment.7 A common distinction is that achievement tests
measure what a student has learned, and aptitude tests measure the
ability to learn new tasks.8
5 Howard B. Lyman, Test Scores and What They Mean, (Singapore: Allyn & Bacon, 1998), p. 22.
6 Drs. Wilmar Tinambunan, Evaluation of Student Achievement, (Jakarta: Depdikbud, 1998), p. 7.
7 Howard B. Lyman, Test Scores..., p. 22.
8 Robert L. Linn, Norman E. Gronlund, Measurement and Assessment in Teaching, (New Jersey: Prentice-Hall, Inc., 1995), p. 391-392.
There are two kinds of achievement tests: final achievement tests and
progress achievement tests.9 Final achievement tests are those administered
at the end of a course of study. They may be written and administered
by ministries of education, official examining boards, or by members of
teaching institutions. On the other hand, progress achievement tests are
intended to measure the progress that students are making. They
contribute to formative assessment. Since progress is toward the
achievement of course objectives, these tests should relate to those
objectives.
3) Proficiency Test
Proficiency tests are designed to measure people's ability in a
language regardless of any training they may have had in that language.
The content of a proficiency test, therefore, is not based on the content
or objectives of language courses that people taking the test may have
followed. Rather, it is based on a specification of what candidates have
to be able to do in the language in order to be considered proficient.10 It
means that the function of the test is to show whether the candidates
have reached certain specific abilities or not.
Some proficiency tests are intended to show whether students have
reached a given level of general language ability. Others are designed to
show whether students have sufficient ability to use a language in some
specific area, such as medicine, tourism, or academic study.
4) Diagnostic Test
Diagnostic tests seek to identify those areas in which a student
needs further help. These tests can be fairly general and show, for
example, whether a student needs particular help with one of the four
9 Desmond Allison, Language Testing and Evaluation, (Singapore: Singapore University Press, 1999), p. 80.
10 Arthur Hughes, Testing for Language..., p. 11.
main language skills; or they can be more specific, seeking perhaps to
identify weaknesses in a student's use of grammar. These more specific
diagnostic tests are not easy to design, since it is difficult to diagnose
precisely strengths and weaknesses in the complexities of language
ability. For this reason, there are very few purely diagnostic tests.
b. Based on the Test Maker
1) Standardized Test
Standardized tests are constructed by test specialists working with
curriculum experts and teachers. They are standardized in that they
have been administered and scored under standard and uniform testing
conditions, so that results from different classes and different schools
may be compared. The quality of the test items is high because the
items are written by specialists. The items are also pretested and
selected on the basis of their effectiveness.
2) Teacher-Made Test
Teacher-made tests are constructed by teachers for use within their
own classrooms. Their effectiveness depends on the skill of the teacher
and his or her knowledge of test construction. The quality of the items
is unknown unless a test item file is used, but it is typically lower than
that of standardized tests because of teachers' limited time and skill.
Because such a test is conducted within the teacher's own classroom, it
only compares the scores among the students in that classroom.
In general, teacher-made examinations have flexibility for use
within a given classroom, but provide little data for comparing students
in different classes. Standardized tests, in contrast, are used to compare
students' performance in different classes or schools.11
11 Gilbert Sax, Principles of Educational and Psychological Measurement and Evaluation, (Belmont: Wadsworth, Inc., 1980), p. 16-18.
3. Characteristic of a Good Test
A test can be said to be a good test if it has certain qualifications or
certain characteristics. The most essential characteristics of a good test
can be classified into three main aspects: validity, reliability, and
practicality.12
a. Validity
The most simplistic definition of validity is that it is the degree to which a
test measures what it is supposed to measure. J.B. Heaton said, "The
validity of the test is the extent to which it measures what it is supposed to
measure and nothing else".13 The validity of a test must be considered in
measurement; in this case, it must be seen whether the test used really
measures what it is supposed to measure.
b. Reliability
Reliability means dependability or trustworthiness. Basically,
reliability is the degree to which a test consistently measures whatever it
measures. The more reliable a test is, the more confidence we can have that the
scores obtained from one administration of the test are essentially the same
scores that would be obtained if the test were re-administered to the same
group. An unreliable test is essentially useless. If a test were unreliable,
then scores from a given group would be expected to be quite different
every time the test was administered. If an intelligence test were
unreliable, a student scoring an IQ of 120 today might score 140
tomorrow, and 95 the day after tomorrow. If the test were highly reliable
and the student's IQ were 110, then we would not expect that score to
fluctuate too greatly from testing to testing.
12 Norman E. Gronlund, Measurement and..., p. 51.
13 J.B. Heaton, Writing Language Tests, (Longman, 1998), p. 153.
A valid test is always reliable, but a reliable test is not necessarily
valid. If a test measures what it is supposed to measure, it will be
reliable and do so every time. But a reliable test can consistently measure
the wrong thing and be invalid.14
c. Practicality
The third characteristic of a good test is practicality, or usability. In the
preparation of a new test, the teacher must keep in mind a number of very
practical considerations, which involve economy, ease of administration, and
ease of interpretation of the results.
Economy means that the test is not costly. The teacher must take into
account the cost per copy, how many scorers will be needed, and how long
the administering and scoring will take.
Ease of administration means that the test administrator can perform his
task quickly and efficiently. We must also consider the ease with which
the test can be administered.
Ease of interpretation and application: J.B. Heaton states, "The final
point concerns the presentation of the test paper itself. Where possible, it
should be printed or typewritten and appear neat, tidy, and aesthetically
pleasing. Nothing is worse and more disconcerting to the testee than an
untidy test paper, full of misspellings, omissions, and corrections." If the
paper is neat, it will be easy for the student or testee to interpret the test
items.15
14 L.R. Gay, Educational Evaluation and Measurement, (New York: Macmillan, Inc., 1985), p. 167.
15 J. Charles Alderson, Caroline Clapham & Dianne Wall, Language Test Construction and Evaluation, (Melbourne: Cambridge University Press, 1995), p. 187.
B. Item Analysis
Selection of appropriate language items is not enough by itself to ensure a
good test. Each question needs to function properly; otherwise, it can weaken the
exam. Fortunately, there are some rather simple statistical ways of checking
individual items. This is done by studying the students' responses to each item.
When formalized, this procedure is called "item analysis".16 An item analysis tells
us basically three things: how difficult each item is, whether or not the questions
discriminate between high and low students, and which
distracters are working as they should. An analysis like this is used with any
important exam, for example, review tests and tests given at the end of a school
term or course.
1. The Definition of Item Analysis
Item analysis is usually done for the purpose of selecting which items will remain on future revised and improved versions of a test. There are several descriptions of item analysis. According to Nitko, "Item analysis refers to the process of collecting, summarizing, and using information about individual test items, especially information about pupils' responses to items."17 Furthermore, Lado defines item analysis as the study of the validity, reliability, and difficulty of test items taken individually, as if they were separate tests.18
Item analysis usually provides two kinds of information on items: item facility, which helps us decide if test items are at the right level for the target group, and item discrimination, which allows us to see if individual items are providing information on candidates' abilities consistent with that provided by the other items on the test.19
16 Harold S. Madsen, Techniques in Testing, (New York: Oxford University Press, 1983), p. 180.
17 Anthony J. Nitko, Educational Test and Measurement, an Introduction, (New York: Harcourt Brace Jovanovich, Inc., 1983), p. 284.
18 Robert Lado, Language Testing, (London: Longman Group Limited, 1983), p. 342.
19 Tim McNamara, Language Testing, (Oxford: Oxford University Press, 2000), p. 60.
From those opinions, it can be concluded that item analysis is the process of collecting information about pupils' responses to the items in order to see the quality of the test items. More specifically, item analysis information can tell us if an item was too easy or too hard, how well it discriminated between high and low scorers on the test, and whether all of the alternatives functioned as intended. Item analysis data also aid in detecting specific technical flaws and thus provide further information for improving the test items.
2. Discriminating Power
The discriminating power of a test item is its ability to separate good students from poor students. These student groups are defined by their scores on the test as a whole. The difference between the percentage of the top-scoring 27% and of the bottom-scoring 27% of students who get the item right is its discrimination index.20
As well as knowing how difficult an item is, it is important to know how it discriminates, that is, how well it distinguishes between students at different levels of ability. If the item is working well, we should expect more of the top-scoring students to know the answer than the low-scoring ones. If the strongest students get the item wrong while the weaker students get it right, there is clearly a problem with the item, and it needs investigating.
Each item on the test should contribute to the total score and to the
meaning of that total score. Many times the purpose of a test is to discriminate
between groups of students, such as those who have mastered the domain of
content that the test represents and those students who have not achieved
mastery.
The discrimination index can range from -1 to +1. Items with positive values of the discrimination index are desired, because those are the items that are contributing to the usefulness of the total score. When the discrimination index is near zero, it indicates that the item is contributing nothing to the discriminating power of the overall test.21 When a larger proportion of students in the lower group than in the upper group got the item right, the item discriminates negatively; when more students in the upper group than in the lower group got the item right, it discriminates positively.22

20 H.J.X. Fernandes, Testing and Measurement, (Jakarta: National Educational Planning, Evaluation and Curriculum Development, 1984), p. 27.
Item discriminating power can be obtained by subtracting the number of students in the lower group who got the item right (L) from the number of students in the upper group who got the item right (U), and dividing by the number of students in one group included in the item analysis (N).
It is summarized in formula form as below:
DI = (U - L) / N
Where:
DI = the index of discriminating power
U = the number of pupils in the upper group who answered the item
correctly
L = the number of pupils in the lower group who answered the item
correctly
N = number of pupils in each of the groups23
The classifications of the index of discriminating power (D) are:
DI = 0.70 – 1.00 = Excellent
0.40 – 0.70 = Good
0.20 – 0.40 = Satisfactory
≤ 0.20 = Poor
Negative value on D= Very poor 24
21 William Wiersma, Educational Measurement and…, p. 245.
22 Gilbert Sax, Principles of Educational and…, p. 191.
23 Charles D. Hopkins & Richard L. Antes, Classroom Measurement and Evaluation, (Illinois: F.E. Peacock Publishers, Inc., 1990), p. 279.
24 Anas Sudijono, Pengantar Evaluasi Pendidikan, (Jakarta: PT. Raja Grafindo Persada, 2006), p. 389.
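The formula and classification above can be sketched in code. This is only an illustrative sketch (the function names are the writer's own, not from the cited sources); where the published ranges share an endpoint, each boundary is assigned to the higher category, which matches the remarks used in the analysis later in this "skripsi".

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """DI = (U - L) / N, where N is the size of one criterion group."""
    return (upper_correct - lower_correct) / group_size

def classify(di):
    """Classify a DI value following the ranges quoted from Sudijono."""
    if di < 0:
        return "Very poor"
    if di < 0.20:
        return "Poor"
    if di < 0.40:
        return "Satisfactory"
    if di < 0.70:
        return "Good"
    return "Excellent"

# Example: an item answered correctly by 14 of 16 upper-group students
# and 1 of 16 lower-group students.
di = discrimination_index(14, 1, 16)   # 0.8125, reported as 0.81
print(classify(di))                    # Excellent
```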
3. Types of Test Item
In constructing the test items, the test maker may choose from a variety of item types. There are two types of test items: subjective tests and objective tests.
a. Subjective test
A subjective test is a test in which the examinee answers, in his own words and at appropriate length, all or some of a relatively small number of questions. Typical key words in the questions set in examinations of this kind are 'discuss', 'compare', 'contrast', and 'describe', and the answers they elicit may range from a single sentence to a dozen or more paragraphs. These answers are commonly called 'essays'. Here are some subjectively marked tests:
1) Short-Answer Items
The short answer (or completion) item is the only objective item
type that requires the examinee to supply, rather than select, the
answer. Its make-up is similar to a well-stated multiple choice item
without the alternatives. Thus, it consists of a question or incomplete
statement, to which the examinee responds by providing the
appropriate words, numbers, or symbols.
2) Essay
Essay tests are inefficient for measuring knowledge outcomes but
they provide a freedom of response that is needed in measuring certain
complex outcomes. These outcomes include the ability to create, to
organize, to integrate, to express and similar behaviors that call for the
production and synthesis of ideas.
The most noticeable characteristic of the essay test is the freedom
of response it provides. The student is asked a question that requires
him to produce his own answer. He is relatively free to decide how to
approach the problem, what factual information to use, how to
organize his reply, and what degree of emphasis to give each aspect of
his answer. Thus the essay question places a premium on the ability to
produce, integrate, and express ideas.
b. Objective test
An objective test is one that may be scored by comparing examinee responses with an established set of acceptable responses, or scoring key. Objective tests can be scored objectively; that is, equally competent scorers can score them independently and obtain the same results. Objective tests include a variety of item types:
1) Multiple choice items
Multiple-choice refers to test items that require the students to select one or more responses from a set of two or more options. These items consist of a stem, which presents a problem situation, and several alternatives, which provide possible solutions to the problem. The stem may be a question or an incomplete statement. The alternatives include the correct answer and several plausible wrong answers, called distracters. The function of the latter is to distract those students who are uncertain of the answer.25
Multiple choice items can measure a variety of learning outcomes,
ranging from simple to complex, and it is easy to see why this item
type is regarded so highly and used so widely.
2) Matching Items
Another selected-response item, sometimes called an objective
item, is the matching item. The format is not used as extensively as
true-false or multiple-choice items. But the matching item can be used
25 Norman E. Gronlund, Constructing Achievement…, p. 38.
effectively to measure learning and, when used, it provides variety in
the test format for both student and teacher.
The matching item is exactly what the name implies; it requires the
student to use some association criterion in order to match the words or
phrases that represent ideas, concepts, principles, or things. Matching
items are usually presented in two-column format: one column consists
of premises and the other consists of responses.26
3) True-False Items
The true-false item is simply a declarative statement that the
student must judge as true or false. There are modifications of this
basic form in which the student must respond “yes” or “no,” “agree”
or “disagree,” “right” or “wrong,” “fact” or “opinion,” and the like.
Such variations are usually given the more general name of
alternative-response items. In any event this item type is characterized
by the fact that only two responses are possible.27
True-false items can be effective when a few guidelines are
followed in the construction: Statements must be clearly true or false,
statements should not be lifted directly from the text, specific
determiners should be avoided, trick questions should not be used,
some statements should be written at higher cognitive levels, and true-
false items should be of the same frequency and length.28
4. The Importance of Item Analysis
Item analysis is an important and necessary step in the preparation of good multiple-choice tests. Because of this fact, it is suggested that every classroom teacher who uses multiple choice test data should know something of item analysis, how it is done, and what it means.
26 William Wiersma, Educational Measurement and Testing, (Boston: Allyn & Bacon, 1990), p. 48.
27 Norman E. Gronlund, Constructing Achievement…, p. 54.
28 William Wiersma, Educational Measurement and…, p. 47.
The benefits of item analysis are not limited to the improvement of individual test items; there are also a number of fringe benefits of special value to classroom teachers. The most important of these are the following:
a. Item analysis data provide a basis for efficient class discussion of the test results.
b. Item analysis data provide a basis for remedial work.
c. Item analysis data provide a basis for the general improvement of
classroom instruction.
d. Item analysis procedures provide a basis for increased skill in test
construction.29
Meanwhile, Nitko states in his book that the purposes of item analysis are:
a. Determining whether an item functions as the teacher intends,
b. Feedback to students about their performance and a basis for class discussion,
c. Feedback to the teacher about pupils' difficulties,
d. Areas for curriculum improvement,
e. Revising the items,
f. Improving item-writing skills.30
29 Robert L. Linn and Norman E. Gronlund, Measurement and…, p. 316.
30 Anthony J. Nitko, Educational Test and…, p. 284.
CHAPTER III
RESEARCH METHODOLOGY
A. Place and Time of Research
The research was conducted at "SMPN 87" Pondok Pinang, which is located at Jl. Ciputat Raya Pondok Pinang, Kebayoran Lama, South Jakarta. The writer did the research in December 2010. The writer took the English summative test papers and the students' answer sheets of the second grade, period 2010-2011, to be analyzed.
B. Technique of Sample Taking
The writer took the sample from the second year students of "SMPN 87" Pondok Pinang. The total number of second year students is 238; they are divided into 6 classes. The writer took 25% of the total number of the second year students as a sample, that is, 25% x 238 ≈ 60 students. The writer used ordinal sampling to get the students' answer sheets. The writer divided the students into three groups: upper, middle, and lower. Then the writer took only the upper and lower groups to be analyzed.
C. Technique of Data Collecting
To collect the data connected with the topic of discussion, the writer came to the school to get permission from the headmaster to take the students' answer sheets and the question paper of the English summative test of the second year students of "SMPN 87" Pondok Pinang to be analyzed.
D. Research Instrument
a. Students' answer sheets
The students' answer sheets are the papers on which the students give their answers to the English summative test. The English summative test that the writer used is the final odd-semester test for the second year students of "SMPN 87" Pondok Pinang, academic year 2010-2011, prepared by MGMP.
b. The English summative test of the second year students of "SMPN 87" Pondok Pinang.
E. Technique of Data Analysis
In this research, the writer used a quantitative method to analyze the discriminating power of the English summative test items of the second year of "SMPN 87", using a statistical formula, namely the Discriminating Power Index:
DI = (U - L) / N
Where:
DI = the index of discriminating power
U = the number of pupils in the upper group who answered the item correctly
L = the number of pupils in the lower group who answered the item correctly
N = number of pupils in each of the groups1
1 Charles D. Hopkins & Richard L. Antes, Classroom Measurement and Evaluation, (Itasca: F.E. Peacock Publishers, Inc., 1990), p. 279.
The classifications of the index of discriminating power (D) are:
DI = 0.70 – 1.00 = Excellent
0.40 – 0.70 = Good
0.20 – 0.40 = Satisfactory
≤ 0.20 = Poor
Negative value on D = Very poor 2
2 Anas Sudijono, Pengantar Evaluasi…, p. 389.
CHAPTER IV
RESEARCH FINDINGS
A. Description of Data
The data used by the writer is the English summative test in the odd semester of the second grade of "SMPN 87" Pondok Pinang. This English summative test was held on Wednesday, December 8th, 2010, and had to be finished in 120 minutes. The test consists of 50 items, all of which are multiple choice items.
The total number of students taking part in this analysis is 60. Kelley, as cited in the book Classroom Measurement and Evaluation, demonstrated that the selection of criterion groups based upon the upper 27 percent and lower 27 percent of the papers provides the greatest confidence that the upper group is superior, in the trait measured by the test, to the lower group. The middle 46 percent of the papers is not used when 27 percent in the upper and 27 percent in the lower groups are employed in item analysis.1 Based on that statement, the writer classified the students into three groups: upper, middle, and lower. The writer took only the 27% in the lower group and the 27% in the upper group for this analysis, and the remaining students, who belong to the middle group, did not take part in this analysis. The next table shows the students' scores and group positions in the English summative test.

1 Charles D. Hopkins & Richard L. Antes, Classroom Measurement and Evaluation, (Itasca: F.E. Peacock Publishers, Inc., 1990), p. 275.
Table 4.1
The Students' Scores and Group Position of English Summative Test in the Odd Semester

Upper group (No. 1-16): 82, 80, 78, 76, 76, 74, 74, 74, 74, 72, 72, 72, 72, 64, 64, 64
Middle group (No. 17-44): 64, 62, 62, 60, 60, 60, 60, 60, 58, 58, 58, 58, 58, 58, 58, 56, 56, 56, 56, 56, 54, 54, 54, 54, 54, 54, 52, 52
Lower group (No. 45-60): 52, 50, 50, 50, 48, 46, 46, 44, 44, 42, 40, 38, 34, 34, 34, 32
Table 4.1 shows that the students who took the test are classified into 3 groups: upper, middle, and lower. The writer took 27%, or 16 students, from each of the upper and lower groups to be analyzed. The highest score in the upper group, 82, is gained by one student, while the lowest score in the upper group, 64, is shared by three students. Meanwhile, the highest score in the lower group, 52, is gained by one student, and the lowest score in the lower group, 32, is gained by one student.
Table 4.2
The Students' Answer Sheet of English Summative Test Items from the Upper Group
No
Students' Number of items
score 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Answer key * B D C D B B D B B B A C C B C D C C B C D A B D
1 82 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 0 1
2 80 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1
3 78 0 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 0 0 1 0 1
4 76 0 1 1 1 1 1 1 0 0 1 1 0 1 1 1 0 1 1 1 0 1 0 1 0 1
5 76 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 0 1 1 1
6 74 0 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1
7 74 0 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1
8 74 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 0 1 0 0 1 1 1
9 74 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1
10 72 0 1 1 1 0 0 1 1 1 1 1 0 1 0 1 1 1 0 0 1 0 0 1 1 1
11 72 0 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0
12 72 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 0 0 0 1 0 0
13 72 0 1 1 0 1 0 0 1 1 1 1 0 1 0 1 0 1 1 0 1 0 1 1 1 1
14 64 0 1 1 1 1 1 1 1 1 1 0 0 1 0 1 1 0 0 0 0 0 0 1 1 1
15 64 0 1 1 0 1 1 0 0 1 1 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1
16 64 0 1 1 0 1 1 0 0 1 1 1 1 0 0 1 1 1 1 0 0 0 0 1 1 1
Correct answer 0 16 16 12 15 13 11 12 12 15 13 1 14 11 15 11 14 10 10 10 5 3 16 7 14
No
Students' Number of Items
score 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Answer key A D A B B D B D A B C D C C D B C B B D B B D A D
1 82 1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1
2 80 0 1 1 0 1 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0
3 78 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1 0 1 1 0 1 1 1
4 76 0 1 1 0 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1
5 76 0 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1
6 74 0 0 0 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0
7 74 0 0 1 0 0 1 0 1 1 1 1 0 1 0 1 1 1 1 0 1 1 0 1 1 0
8 74 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1
9 74 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 0 1 1 1 0 0 1 1 1 1
10 72 1 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 0
11 72 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1
12 72 1 1 1 1 1 0 0 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 0
13 72 1 0 1 1 1 0 0 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1
14 64 0 1 0 1 0 0 0 0 1 1 1 0 1 1 1 1 1 1 0 1 0 1 1 1 1
15 64 0 0 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 1
16 64 0 0 1 1 1 1 0 1 0 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 1
Correct answer 7 9 14 12 13 8 1 12 13 14 16 11 11 10 16 12 16 15 7 15 13 12 15 14 11
Table 4.3
The Students' Answer Sheet of English Summative Test Items from the Lower Group
No
Students' Number of Items
score 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Answer key * B D C D B B D B B B A C C B C D C C B C D A B D
1 52 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 0 0 0 1 1 1
2 50 0 0 1 1 0 0 1 1 1 0 0 0 1 1 0 1 0 1 0 1 0 0 0 1 0
3 50 0 1 1 1 0 1 1 0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 1 1 1
4 50 0 1 1 1 1 1 0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 1 0
5 48 0 1 1 1 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 1
6 46 0 1 1 0 0 0 0 1 0 1 0 1 1 1 0 0 0 1 0 1 0 0 1 1 0
7 46 0 0 1 1 0 1 0 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 1 1 0
8 44 0 0 1 1 0 1 0 0 1 1 1 0 0 1 0 0 0 0 0 1 0 0 1 1 1
9 44 0 0 1 1 0 1 0 0 1 1 1 0 0 1 0 0 0 1 0 1 0 0 1 0 0
10 42 0 0 1 1 1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 1 0 0 1 0 0
11 40 0 1 1 1 0 0 0 1 1 1 1 0 0 0 0 0 0 0 1 0 0 0 1 1 1
12 38 0 0 1 0 0 1 0 0 1 1 0 0 0 1 0 0 0 1 0 1 0 0 1 1 1
13 34 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0
14 34 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 0 0 1 0 1 0 0 1 1 0
15 34 0 1 1 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1
16 32 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 1 1 1 0 0 1 0 0
Correct answer 0 7 16 11 5 6 2 6 11 9 7 1 10 10 4 2 1 9 6 9 0 0 13 12 7
No
Students Number of Items
score 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Answer key A D A B B D B D A B C D C C D B C B B D B B D A D
1 52 1 0 1 1 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 1 0 0 1 1 1
2 50 1 1 1 1 1 0 0 0 1 0 1 1 0 1 1 0 1 0 0 0 0 1 1 0 0
3 50 1 0 1 0 0 0 0 1 0 1 0 1 1 0 1 1 1 0 0 1 0 0 0 1 0
4 50 0 0 1 0 0 1 0 0 0 0 1 1 1 0 0 0 1 1 0 1 0 1 1 1 1
5 48 1 1 1 1 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 1 0 1 1 1 1
6 46 1 1 1 1 0 0 0 1 1 0 0 0 0 1 0 0 1 0 0 1 1 0 0 1 0
7 46 1 1 0 0 0 0 0 1 1 0 1 0 1 0 1 0 1 1 0 1 0 0 0 0 1
8 44 1 0 1 1 0 1 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0 0 1 0 0
9 44 0 1 0 1 1 0 1 1 1 0 1 0 1 0 0 0 1 0 0 1 0 0 0 1 0
10 42 1 0 0 0 0 0 0 1 1 0 1 0 0 0 1 0 1 0 0 1 1 0 1 0 0
11 40 1 1 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 1 0 1 0 0 0 0 0
12 38 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 0
13 34 0 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1 1 1 1 1 1
14 34 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 1 0 1 1 0 0 1
15 34 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 1 0 1 1
16 32 0 0 1 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0
Correct answer 10 7 8 7 4 4 2 9 6 2 8 6 7 2 8 2 16 7 2 12 4 7 7 9 7
From table 4.3 above, it can be concluded that the responses of the lower group students to each item of their English test are:
1. There are 16 students who answered items number 3 and 42 correctly.
2. There are 13 students who answered item number 23 correctly.
3. There are 12 students who answered items number 24 and 45 correctly.
4. There are 11 students who answered items number 4 and 9 correctly.
5. There are 10 students who answered items number 13, 14 and 26 correctly.
6. There are 9 students who answered items number 10, 18, 20, 33, and 49 correctly.
7. There are 8 students who answered items number 28, 36, and 40 correctly.
8. There are 7 students who answered items number 2, 11, 25, 27, 29, 38, 43, 47, 48, and 50 correctly.
9. There are 6 students who answered items number 6, 8, 19, 34, and 37 correctly.
10. There are 5 students who answered item number 5 correctly.
11. There are 4 students who answered items number 15, 30, 31 and 46 correctly.
12. There are 2 students who answered items number 7, 16, 32, 35, 39, 41 and 44 correctly.
13. There is 1 student who answered items number 12 and 17 correctly.
14. There is no student who answered items number 1, 21 and 22 correctly.
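The "Correct answer" row at the bottom of tables 4.2 and 4.3 is simply a column-wise sum of the 0/1 response matrix. A minimal sketch, using a made-up three-student, four-item matrix rather than the real data:

```python
# Each row is one student's responses: 1 = correct, 0 = incorrect.
responses = [
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
]

# Summing each column gives the number of correct answers per item,
# i.e. the U or L value later used in the discrimination index.
correct_per_item = [sum(column) for column in zip(*responses)]
print(correct_per_item)  # [0, 2, 3, 1]
```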
Before analyzing the data, the writer calculated the data statistically. The writer used the Discrimination Index formula to find the discriminating power criteria of the English summative test. The table is as follows:
Table 4.4
The Discriminating Power Index of the Upper and Lower Group
Item number | Correct answers, Upper group | Correct answers, Lower group | U - L | DI = (U - L) / N | Remark*
1 0 0 0 0 Poor
2 16 7 9 0.56 Good
3 16 16 0 0 Poor
4 12 11 1 0.06 Poor
5 15 5 10 0.62 Good
6 13 6 7 0.43 Good
7 11 2 9 0.56 Good
8 12 6 6 0.37 Satisfactory
9 12 11 1 0.06 Poor
10 15 9 6 0.37 Satisfactory
11 13 7 6 0.37 Satisfactory
12 1 1 0 0 Poor
13 14 10 4 0.25 Satisfactory
14 11 10 1 0.06 Poor
15 15 4 11 0.68 Good
16 11 2 9 0.56 Good
17 14 1 13 0.81 Excellent
18 10 9 1 0.06 Poor
19 10 6 4 0.25 Satisfactory
20 10 9 1 0.06 Poor
21 5 0 5 0.31 Satisfactory
22 3 0 3 0.18 Poor
23 16 13 3 0.18 Poor
24 7 12 -5 -0.31 Very poor
25 14 7 7 0.43 Good
26 7 10 -3 -0.18 Very poor
27 9 7 2 0.13 Poor
28 14 8 6 0.37 Satisfactory
29 12 7 5 0.31 Satisfactory
30 13 4 9 0.56 Good
31 8 4 4 0.25 Satisfactory
32 1 2 -1 -0.06 Very Poor
33 12 9 3 0.18 Poor
34 13 6 7 0.43 Good
35 14 2 12 0.75 Excellent
36 16 8 8 0.50 Good
37 11 6 5 0.31 Satisfactory
38 11 7 4 0.25 Satisfactory
39 10 2 8 0.50 Good
40 16 8 8 0.50 Good
41 12 2 10 0.62 Good
42 16 16 0 0 Poor
43 15 7 8 0.50 Good
44 7 2 5 0.31 Satisfactory
45 15 12 3 0.18 Poor
46 13 4 9 0.56 Good
47 12 7 5 0.31 Satisfactory
48 15 7 8 0.50 Good
49 14 9 5 0.31 Satisfactory
50 11 7 4 0.25 Satisfactory
*note: the classification of remarks is adopted from Anas Sudijono, Pengantar Evaluasi Pendidikan, (Jakarta: PT. Raja Grafindo Persada, 2006), p. 389.
Based on the data above, the percentage of discriminating power of
English summative test is:
Table 4.5
The percentage of Discriminating Power
No | Discriminating power | Total items | Percentage | Item numbers
1 | Excellent | 2 | 4% | 17, 35
2 | Good | 16 | 32% | 2, 5, 6, 7, 15, 16, 25, 30, 34, 36, 39, 40, 41, 43, 46, 48
3 | Satisfactory | 15 | 30% | 8, 10, 11, 13, 19, 21, 28, 29, 31, 37, 38, 44, 47, 49, 50
4 | Poor | 14 | 28% | 1, 3, 4, 9, 12, 14, 18, 20, 22, 23, 27, 33, 42, 45
5 | Very Poor | 3 | 6% | 24, 26, 32
The table above shows that there are 2 test items (4%) categorized as excellent test items, namely test items number 17 and 35. They are categorized as excellent because their discrimination indices lie in the range 0.70 – 1.00. There are 16 test items (32%) categorized as good items, ranging from 0.40 – 0.69: test items number 2, 5, 6, 7, 15, 16, 25, 30, 34, 36, 39, 40, 41, 43, 46, and 48. There are 15 test items (30%) categorized as satisfactory test items, for their discrimination indices lie in the range 0.20 – 0.39: test items number 8, 10, 11, 13, 19, 21, 28, 29, 31, 37, 38, 44, 47, 49, and 50.
Meanwhile, 14 test items (28%) are categorized as poor test items, because their discrimination indices range from 0.00 – 0.19: test items number 1, 3, 4, 9, 12, 14, 18, 20, 22, 23, 27, 33, 42, and 45. At last, there are 3 test items (6%) categorized as very poor items, as their discrimination indices are negative.
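The tallies above can be reproduced by counting the remarks assigned to the 50 items. A small sketch, rebuilding the remark list from the category totals reported above rather than item by item:

```python
from collections import Counter

# Remarks of the 50 items, reconstructed from the reported totals.
remarks = (["Excellent"] * 2 + ["Good"] * 16 + ["Satisfactory"] * 15
           + ["Poor"] * 14 + ["Very poor"] * 3)

counts = Counter(remarks)
percentages = {cat: 100 * n / len(remarks) for cat, n in counts.items()}
print(percentages["Good"])  # 32.0, i.e. 16 of the 50 items
```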
B. Data Analysis
In analyzing the discriminating power of the data, the writer first listed the students' responses to each item of the test. The list can be seen in tables 4.2 and 4.3 of this "skripsi".
The next step is to make a format of item analysis. This format and its result are labeled table 4.4. The last step is to count the discriminating power of all items using this formula:
DI = (U - L) / N
Where:
DI = the index of discriminating power
U = the number of pupils in the upper group who answered the item correctly
L = the number of pupils in the lower group who answered the item correctly
N = number of pupils in each of the groups
The result of this last step can also be seen in table 4.4. In this table, the result for each item is a decimal value; the writer then categorized each item according to the following classification:
The classifications of the index of discriminating power (D) are:
DI= 0.70 – 1.00 = Excellent
0.40 – 0.70 = Good
0.20 – 0.40 = Satisfactory
≤ 0.20 = Poor
Negative value on D = Very poor
Based on the data of the item analysis of discriminating power above, the writer can conclude that, of the 50 items:
1. There are 33 test items (66%) categorized as good test items, ranging from 0.25 – 0.81.
2. There are 14 test items (28%) categorized as poor test items, because their discrimination indices range from 0.00 – 0.18.
3. There are 3 test items (6%) categorized as very poor items, as their discrimination indices are negative, ranging from -0.06 to -0.31.
C. Data Interpretation
For the whole set of items, the writer can interpret that the discriminating power of the English summative test prepared by "MGMP" and tested on the second grade of "SMPN 87" Pondok Pinang is good, because 33 test items, or 66% of the 50 test items, range from 0.25 – 0.81.
Based on table 4.2 on the previous pages, the writer concluded the achievement of the upper group students in their English test. Of the 50 multiple choice items, none of the students got a perfect score. The following description tells about the responses to each item:
1. There are 16 students who answered items number 2, 3, 23, 36, 40, and 42 correctly.
2. There are 15 students who answered items number 5, 10, 15, 43, 45, and 48 correctly.
3. There are 14 students who answered items number 13, 17, 25, 28, 35, and 49 correctly.
4. There are 13 students who answered items number 6, 11, 30, 34, and 46 correctly.
5. There are 12 students who answered items number 4, 8, 9, 29, 33, 41, and 47 correctly.
6. There are 11 students who answered items number 7, 14, 16, 37, 38, and 50 correctly.
7. There are 10 students who answered items number 18, 19, 20, and 39 correctly.
8. There are 9 students who answered item number 27 correctly.
9. There are 8 students who answered item number 31 correctly.
10. There are 7 students who answered items number 24, 26, and 44 correctly.
11. There are 5 students who answered item number 21 correctly.
12. There are 3 students who answered item number 22 correctly.
13. There is 1 student who answered items number 12 and 32 correctly.
14. There is no student who answered item number 1 correctly.
CHAPTER V
CONCLUSION AND SUGGESTION
A. Conclusion
Based on the analysis and the interpretation in the previous chapter, the writer would like to conclude that the items of the English summative test tested at the second grade of "SMPN 87" Pondok Pinang can be categorized into 5 different ranges of the discriminating power index. First, there are 2 test items (4%) that are categorized as excellent test items. Then, there are 16 test items (32%) that are categorized as good test items. Besides that, 15 test items (30%) are categorized as satisfactory test items. Fourth, there are 14 test items (28%) categorized as poor test items. Lastly, 3 test items (6%) are categorized as very poor test items.
So, 33 test items (66%) of the English summative test are regarded as having good discriminating power, ranging from 0.25 – 0.81, and they can be used for the next test. Meanwhile, 14 test items (28%) need to be revised for their poor value in differentiating the ability of the upper group from that of the lower group, ranging from 0.00 – 0.18. And 3 test items (6%) have to be eliminated, because those items have negative discrimination indices, ranging from -0.06 to -0.31.
From the explanation above, the writer concludes that the English summative test tested at the second grade of "SMPN 87" Pondok Pinang has good discriminating power, because 33 items (66%) of the test items have fulfilled the criteria of positive discriminating power, ranging from 0.25 – 0.81.
B. Suggestion
After doing the research, there are some suggestions that can be given
in relation to the writer’s conclusion. The suggestions are as follows:
1. Teachers have to teach good techniques for answering the items. For instance, encourage the students to do the easier items first and not to get stuck on the difficult items. This technique should be commonly used by the students so that they will not waste their time.
2. Teachers should save test items which have satisfactory, good, and excellent criteria so that they can be used by the teachers for future evaluations.
3. Teachers should revise the test items which have poor criteria and discard those which have very poor criteria, so that the revised items can be used for the next evaluation.
BIBLIOGRAPHY
Alderson, J. Charles, et al. Language Test Construction and Evaluation, Melbourne: Cambridge University Press, 1995.
Bachman, Lyle F. Statistical Analyses for Language Assessment, Cambridge: Cambridge University Press, 2004.
Brown, H. Douglas. Teaching by Principles, An Interactive Approach to
Language Pedagogy, White Plains: Addison Wesley, Longman, 2001.
Fernandes, H.J.X. Testing and Measurement, Jakarta: National Educational
Planning, Evaluation and Curriculum Development, 1984.
Gay, L.R. Educational Evaluation and Measurement, New York: Macmillan, Inc.,
1985.
Gronlund, Norman E. Measurement and Evaluation in Teaching, New York:
Macmillan Publishing Co., Inc, 1981.
Genesee, Fred and John A. Upshur. Classroom-Based Evaluation in Second Language Education, New York: Cambridge University Press, 1996.
Heaton, J.B. Writing Language Test, Longman: 1998.
Hopkins, Charles D. and Antes, Richard L. Classroom Measurement and
Evaluation, Itasca: F.E. Peacock Publishers, Inc., 1990.
Hughes, Arthur. Testing for Language Teachers, Cambridge: Cambridge
University Press, 2003.
Lado, Robert. Language Testing, London: Longman Group Limited, 1983.
Madsen, Harold S. Techniques in Testing, New York: Oxford University Press,
1983.
McNamara, Tim. Language Testing, Oxford: Oxford University Press, 2000.
Nitko, Anthony J. Educational Test and Measurement, an Introduction, New York: Harcourt Brace Jovanovich, Inc., 1983.
Sax, Gilbert. Principles of Educational and Psychological Measurement and
Evaluation, Belmont: University of Washington, 1980.
Sudijono, Anas. Pengantar Evaluasi Pendidikan, Jakarta: PT. Raja Grafindo
Persada, 2006.
Wiersma, William. Educational Measurement and Testing, Boston: Allyn &
Bacon, 1990.