8/2/2019 Testing New 2007 Second Session
Chapter Six, Characteristics of a Good Test
1. What are the characteristics of a good test?
Reliability,
validity, and
practicality
2. What is the main concern of reliability?
The consistency of one's score with respect to one's
average score over repeated administrations is the
central concern of the concept of reliability.
3. What is classical test theory's assumption about the concept of reliability?
The fact that repeated measurements of some
attribute of the same individual almost never duplicate
one another is called unreliability.
On the other hand, the tendency toward consistency
from one set of measurements to the next is called reliability.
4. What is meant by systematic variation, or predictable change, of one's score in
different administrations of a test?
These are changes in scores that are due
to some sort of learning.
5. What is meant by unsystematic variation, or unpredictable change, of one's
score in different administrations of a test?
These are changes in students' scores that
are due to factors other than learning and
may not be predictable.
6. What is the difference between the observed score and the true score?
The observed score includes the measurement error, or
error score, but the true score is the errorless score.
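In classical test theory, the relationship described in this answer is usually written as an additive model. Here $X$ is the observed score, $T$ the true score and $E$ the error score:

$$X = T + E$$

When $E = 0$ the observed score equals the true score. A positive error makes $X > T$, and a negative error makes $X < T$.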
7. When is the observed score equal to the true score?
When there is absolutely no error of measurement, the
observed score is equal to the true score.
8. What are the possibilities for the relationship between the observed and the
true score?
The observed score can be greater than, equal to, or
smaller than the true score.
$X = T$, $X > T$, $X < T$
9. How is the variance of observed score calculated?
The magnitude of the observed variance equals the magnitude of the true variance plus the magnitude of the error variance.
$V_x = V_t + V_e$
10. What is the technical definition for the term
reliability?
It is the consistency of scores produced by a given test. It is the ratio of true score variance to observed score variance.
$r = \frac{V_t}{V_x}$
11. What is the formula for reliability?
Reliability is the total variance in test scores minus the error variance, expressed as a proportion of the total variance.
$r = \frac{V_x - V_e}{V_x}$
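The Python sketch below shows the reliability ratio computed from variance components. It assumes the observed variance $V_x$ and the error variance $V_e$ are already known. In practice they are not (see question 16), so the figures are purely illustrative. The function name `reliability_from_variances` is hypothetical.

```python
def reliability_from_variances(observed_var: float, error_var: float) -> float:
    """Classical-test-theory reliability as the share of observed-score
    variance (Vx) that is not error variance (Ve): (Vx - Ve) / Vx = Vt / Vx."""
    if observed_var <= 0:
        raise ValueError("observed variance must be positive")
    if not 0 <= error_var <= observed_var:
        raise ValueError("error variance must lie between 0 and the observed variance")
    true_var = observed_var - error_var  # Vt = Vx - Ve
    return true_var / observed_var


# Hypothetical figures: Vx = 25 and Ve = 5, so Vt = 20 and r = 20 / 25
print(reliability_from_variances(25.0, 5.0))   # 0.8
# All observed variance is error: reliability falls to zero
print(reliability_from_variances(25.0, 25.0))  # 0.0
```

The second call illustrates question 12. When every bit of observed variation is error, the ratio is zero.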
12. What does a reliability of zero mean?
It means that all observed variation is due to error.
When there is the greatest amount of error in
measurement, the reliability will equal zero.
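13. What is the standard error of measurement (SEM)?
It is the standard deviation of all error scores obtained
from a given measure in different situations.
$SEM = S_x\sqrt{1 - r}$, where $S_x$ is the standard deviation of the observed scores and $r$ is the reliability.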
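The usual classical-test-theory formula for the SEM is $S_x\sqrt{1 - r}$. Here $S_x$ is the standard deviation of the observed scores and $r$ is the reliability estimate. A minimal Python sketch of it follows. The function name and the example standard deviation of 10 are hypothetical.

```python
import math


def standard_error_of_measurement(sd_observed: float, reliability: float) -> float:
    """SEM = Sx * sqrt(1 - r), with Sx the observed-score standard deviation."""
    if sd_observed < 0:
        raise ValueError("standard deviation cannot be negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd_observed * math.sqrt(1.0 - reliability)


# Hypothetical test with Sx = 10: as reliability rises, the SEM shrinks
for r in (0.50, 0.75, 0.91, 1.00):
    print(r, round(standard_error_of_measurement(10.0, r), 2))
```

The loop makes the negative relationship in question 14 concrete. Higher reliability gives a smaller SEM, and a perfectly reliable test has an SEM of zero.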
14. What is the relationship between reliability
and SEM?
There is a negative relationship between reliability and
SEM. The higher the reliability, the smaller the
standard error of measurement. By the same token, the
lower the reliability, the greater the SEM.
15. How is the true score estimated from the SEM and the
observed score?
The true score is interpreted within the range of plus or
minus one SEM from the observed score.
EX: X = 14, SEM = 3, true score = 14 ± 3
14 + 3 = 17, 14 - 3 = 11, so the true score is expected to lie between 11 and 17.
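The band in this example can be computed as in the Python sketch below. The `n_sem` parameter is a hypothetical extension for wider bands. If errors are assumed to be normally distributed, ±1 SEM covers roughly 68% of cases.

```python
from typing import Tuple


def true_score_band(observed: float, sem: float, n_sem: float = 1.0) -> Tuple[float, float]:
    """Return (low, high) = observed -/+ n_sem * SEM."""
    if sem < 0 or n_sem < 0:
        raise ValueError("SEM and n_sem must be non-negative")
    half_width = n_sem * sem
    return observed - half_width, observed + half_width


low, high = true_score_band(14, 3)  # the example from the text: X = 14, SEM = 3
print(f"true score between {low} and {high}")  # true score between 11.0 and 17.0
```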
16. Why is calculating the real magnitude of reliability impossible? What is
the implication of this issue?
Calculating it is impossible because measuring the true
score is impossible. The implication is that the
mathematical value of reliability is always estimated
rather than calculated.
17. What are different practical methods of estimating
reliability?
Correlation,
test-retest method,
parallel-forms method,
split half method,
KR-21 method
18. Which factors influence reliability?
The effect of testees,
the effect of test factors,
the structure of the test,
the effect of administration factors,
the influence of scoring factors
19. What is reliability?
Reliability is the consistency of scores produced by a
given test. Alternatively, it is the total variance in test scores
minus the error variance, taken as a proportion of the total variance.
20. What does the difference between the scores of the
two administrations contribute to?
The difference contributes to the unreliability of the
scores, that is, the degree of error in measurement.
21. How is the reliability calculated from the correlation
coefficient?
The correlation coefficient between two sets of scores
obtained from two administrations of the same test to
the same group should be calculated.
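The text does not say which correlation coefficient to use. The Python sketch below assumes the Pearson product-moment coefficient, which is the common choice for this kind of estimate. The five testees' scores are invented for illustration, and the helper name `pearson_r` is hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least two testees")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary in both administrations")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical scores of five testees on two administrations of the same test
first = [12, 15, 9, 18, 14]
second = [13, 14, 10, 17, 15]
print(round(pearson_r(first, second), 2))
```

The same function serves the parallel-forms method if the two lists are scores on the two forms. Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.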
22. How is reliability estimated in the test-retest method?
The reliability is obtained by administering a
given test to a particular group twice and calculating
the correlation between the two sets of scores.
23. What are the disadvantages of the test-retest
method?
a. It requires two administrations (it is difficult to
arrange two testing sessions for the same group).
b. Human beings are intelligent and dynamic creatures, so
their performance may change between the two administrations.
c. There is the test effect, especially when the interval is
short.
24. What is the main procedure for estimating reliability in the test-retest,
parallel-forms, split-half, and KR-21 methods?
For the first three methods, a correlational procedure is used;
for KR-21, a formula is used.
25. Why is the test-retest method called reliability of
scores over time?
Since there should be a reasonable amount of time
between the two administrations in the test-retest method,
it is referred to as reliability of scores over time.
26. Which factors affect the time interval in the test-retest method?
Memory and change. The longer the interval, the more
change will occur in the testees' behavior, but the weaker
the memory effect will be; the shorter the interval, the
less change will occur in the testees' behavior, but the
stronger the memory effect will be.
27. What is the main disadvantage of the test-retest
method?
The major disadvantage is the difficulty involved in
administering a single test to the same group twice.
28. How is reliability estimated in the parallel-forms method?
Two parallel forms of the same test are administered to
a group of examinees just once. Then the correlation
coefficient between the scores of the two forms will be
an estimate of reliability.
29. What is the disadvantage of the parallel-forms method?
Constructing two parallel forms of a test is not an easy
task.
30. What are the main issues in constructing two
parallel forms of a test?
a. The table of specifications for the two forms of the
test must be the same.
b. The components of the two tests, i.e., the subtests, should
also be the same.
31. What is the main idea behind the split-half method?
The main idea is that the items comprising a test are
homogeneous.
32. What is the main assumption behind the split-half
method?
The assumption is similar to that of the parallel-forms
method, except that here the two halves of a single test
must be parallel.
33. Why is the split-half method also called internal
consistency of the test scores?
The method assumes that there is internal
homogeneity among the items.
34. How is the reliability obtained in the split-half method?
The correlation between the two halves is an estimate
of the reliability of half of the test; it is then adjusted
to estimate the reliability of the total test.
35. What is the appropriate procedure for dividing a test
into two halves?
Test developers should select odd items for one half and
even items for the other. Through this procedure, easy
and difficult items will be equally distributed between the two
halves.
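The odd-even split described above can be sketched in Python as follows. The sketch assumes items are scored 1 (correct) or 0 (wrong) and numbered from 1, so item 1 goes to the odd half. The function name and response data are hypothetical. The two resulting lists of half scores are what would then be correlated.

```python
from typing import List, Sequence, Tuple


def odd_even_half_scores(item_responses: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """For each testee's 0/1 item vector, return (odd-half totals, even-half totals)."""
    odd_totals: List[int] = []
    even_totals: List[int] = []
    for responses in item_responses:
        odd_totals.append(sum(responses[0::2]))   # items 1, 3, 5, ...
        even_totals.append(sum(responses[1::2]))  # items 2, 4, 6, ...
    return odd_totals, even_totals


# Hypothetical 6-item test taken by three testees (1 = correct, 0 = wrong)
responses = [
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
print(odd_even_half_scores(responses))  # ([3, 2, 3], [1, 1, 3])
```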
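36. What is the formula for estimating the total test reliability in the split-half method?
$r_{total} = \frac{2 r_{half}}{1 + r_{half}}$, where $r_{half}$ is the correlation between the two halves.
*The total test reliability will always be higher than the reliability of half of the test (whenever the half-test correlation lies between 0 and 1).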
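The step-up from half-test to full-test reliability, $2r/(1 + r)$, is commonly known as the Spearman-Brown formula. A minimal Python sketch follows. The function name is hypothetical, and the half-test correlation of 0.6 is an invented example.

```python
def spearman_brown_full_test(r_half: float) -> float:
    """Step a half-test correlation up to full-test reliability: 2r / (1 + r)."""
    if not -1.0 < r_half <= 1.0:
        raise ValueError("r_half must lie in (-1, 1]")
    return 2 * r_half / (1 + r_half)


# Hypothetical correlation of 0.6 between the odd and even halves
print(round(spearman_brown_full_test(0.6), 3))
```

With a half-test correlation of 0.6, the estimated full-test reliability is about 0.75. This is higher than 0.6, as the note in question 36 states.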
37. What are the main advantages of the split-half method?
a. It is more practical than the others.
b. There is no need to administer the same test twice.
c. It is not necessary to develop two parallel forms of the
same test.
38. What is the disadvantage of the split-half method?
Developing a test with homogeneous items is difficult.
39. What is the assumption behind the KR-21 method?
The assumption is that all items in a test are designed
to measure a single trait.
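40. What is the formula in the KR-21 method?
(KR-21) $r = \frac{K}{K - 1}\left(1 - \frac{\bar{X}(K - \bar{X})}{K V_x}\right)$
where $K$ is the number of items, $\bar{X}$ is the mean of the test scores, and $V_x$ is the variance of the test scores.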
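A minimal Python sketch of the KR-21 computation follows. It needs only the number of items, the mean and the variance of the total scores. The source does not say whether the variance should be computed with n or n - 1 in the denominator, so the caller is assumed to supply whichever convention their course uses. The 40-item figures and the function name are hypothetical.

```python
def kr21(num_items: int, mean: float, variance: float) -> float:
    """KR-21 reliability: K/(K-1) * (1 - mean*(K - mean) / (K*variance))."""
    if num_items < 2:
        raise ValueError("KR-21 needs at least two items")
    if variance <= 0:
        raise ValueError("score variance must be positive")
    k = num_items
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))


# Hypothetical 40-item test with a mean score of 25 and a score variance of 36
print(round(kr21(40, 25.0, 36.0), 3))
```

No correlation and no second administration are involved, which reflects the advantages listed in question 41.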
41. What are the advantages of the KR-21 method?
a. It does not require two administrations.
b. It does not require the use of a correlational
procedure.
42. What are the factors which influence reliability?
a. the effects of testees.
b. the effect of test factors.
43. How do the testees affect test reliability?
a. The psychological conditions of the testees and the
dynamic nature of human attributes affect reliability.
b. The homogeneity of the testees' ability on the
measured attribute influences reliability.
44. How do the test factors affect the reliability of a test?
The test factors which influence reliability are:
a. the structure of the content of the test,
b. the administration procedure of the test, and
c. the scoring process of the test.
45. Which parameters of the structure of the test affect reliability?
a. homogeneity of the items.
b. the speed with which a test is performed.
c. the length of the test.
46. What is the relationship between the homogeneity of the
items and reliability?
The more homogeneous the test items, the more
consistent the scores the test will produce.
47. What is the relationship between the length of the
test and reliability?
The longer the test, the more reliable it will be.
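One way to quantify this relationship is the general Spearman-Brown formula, $n r / (1 + (n - 1) r)$, where $n$ is the factor by which the test is lengthened. The formula assumes the added items are comparable to the existing ones, and this deck does not state it, so treat it as an outside illustration. The function name and the starting reliability of 0.6 are hypothetical.

```python
def reliability_after_lengthening(r_current: float, factor: float) -> float:
    """General Spearman-Brown: n*r / (1 + (n - 1)*r), n = length multiplier.
    Assumes the added items are parallel to the existing ones."""
    if factor <= 0:
        raise ValueError("length factor must be positive")
    if not 0.0 <= r_current <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return factor * r_current / (1 + (factor - 1) * r_current)


# Hypothetical test with reliability 0.6, kept as is, doubled, and tripled
for n in (1, 2, 3):
    print(n, round(reliability_after_lengthening(0.6, n), 3))
```

Each gain is smaller than the last. Doubling the test helps more than going from double to triple length.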
48. What are the main reasons for the fluctuations of
scores in test administration?
a. the ambiguity of instructions,
b. the time of administration and the extent of
interaction between the tester and the testees,
c. some irregularities in the administration
process (e.g., regular time announcements, giving extra explanations, and answering questions),
d. the testing environment.
49. What is meant by validity?
It refers to the extent to which a test measures what it is
supposed to measure. It is concerned with whether or not the
test achieves what it is intended to achieve.
50. What are different types of validity?
a. content validity,
b. criterion- related validity,
c. construct validity
51. What is content validity?
Content validity refers to the degree of correspondence
between the test content and the content of the
materials to be tested.
52. Why is content validity called the appropriateness
of the test?
Because the focus of content validity is on the
appropriacy of the elements included in the test and the
appropriacy of the learning level, it is called the
appropriateness of the test.
53. What are the two major criteria for a test to be
content valid?
a. The content of the test should be selected
appropriately to correspond to the content of the
materials to be tested.
b. The test should be aimed at measuring the
appropriate level of the students' learning.
54. What is the main drawback of content validity?
There is no numerical expression for content validity
and it provides just subjective information about the
appropriateness of the test.
55. What are the ways to avoid too much subjectivity
in judgments about the content of a test?
a. One way is to have the test reviewed by more than
one expert.
b. Another is to define the content to be tested in
detail on a table of specifications.
56. What is face validity?
It is simply whether the test looks valid on the face of it
or not.
57. What is criterion-related validity?
It investigates the correspondence between the scores
obtained from the newly developed test and the scores
obtained from some independent outside criteria.
58. What are the different types of criterion-related
validity?
a. Concurrent validity: the newly developed test is
administered concurrently with another well-known,
reputable test whose validity is already
established.
b. Predictive validity: the newly developed test and the reputable test are not administered concurrently
but with some time interval between them.
59. What is construct validity? How can test
developers determine it?
It examines whether the test measures the construct it is
purported to measure.
Construct validity can be determined through a
sophisticated statistical procedure called factor
analysis.
60. What are the factors which influence validity?
a. directions (directions should be quite clear and
simple)
b. difficulty level of the test (items that are too easy or too difficult
will reduce test validity)
c. structure of the items (poorly constructed and ambiguous items will contribute to the invalidity of the test)
d. arrangement of items and correct responses (the items should be arranged from easy to difficult)
61. What does practicality of a test depend on?
a. ease of administration,
b. ease of scoring,
c. cost of testing,
d. ease of interpretation of scores,
e. ease of application of scores,
f. availability of comparable forms
62. Why is validity more important than reliability?
A reliable test may or may not be valid, but a valid test
is, to some extent, reliable.
63. What is the difference between reliability and
validity?
Reliability is an independent statistical concept and its
computation depends on a set of scores. Validity, on the
other hand, has a direct correspondence to the content
of the test.