Outline
Norm-Referenced Reliability Procedures
Criterion-Referenced Reliability Procedures
Norm-Referenced Validity Procedures
Norm-Referenced Item Analysis Procedures
Criterion-Referenced Validity Assessment
Criterion-Referenced Item Analysis Procedures
Norm-Referenced Reliability
Estimated by:
• Test-retest
• Parallel form
• Internal consistency

Criterion-Referenced Reliability
Estimated by:
• Test-retest
• Parallel form
• Inter-rater and intra-rater
Test-retest
• For affective measures.
1. Administer the instrument under standardized conditions.
2. Re-administer the instrument under the same conditions.
3. Determine the correlation between the two sets of scores.
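The correlation in step 3 is typically Pearson's product-moment r. A minimal sketch with invented scores:

```python
# Test-retest reliability: Pearson correlation between two administrations
# of the same instrument. All scores below are invented for illustration.

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [12, 15, 9, 20, 17, 11]   # first administration (hypothetical)
time2 = [13, 14, 10, 19, 18, 12]  # re-administration, same conditions
print(round(pearson_r(time1, time2), 3))  # → 0.975
```

A value this close to 1 would indicate high stability of scores over time.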
Parallel form
• Administer the two forms of the instrument and determine the correlation between them.
• The two forms should have equal means and standard deviations, equal correlations with a third variable, and be constructed with the same objectives and procedures.
• Can assess both equivalence and stability.
Internal consistency
• For cognitive measures.
• Consistency of a single measure on one occasion.
• Cronbach’s alpha is the preferred index of internal consistency reliability. Why?
• KR-20 and KR-21 are special cases of alpha, used when data are dichotomous.
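A minimal sketch of Cronbach's alpha in its usual k/(k−1) form, using invented item scores (rows are respondents, columns are items):

```python
# Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances) / total variance).
# Data are invented for illustration.

def variance(values):
    """Population variance of a list of scores."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n

def cronbach_alpha(rows):
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # transpose: one tuple per item
    totals = [sum(r) for r in rows]       # total score per respondent
    item_var = sum(variance(list(it)) for it in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 3],
    [4, 5, 4, 5],
    [1, 2, 2, 1],
]
print(round(cronbach_alpha(scores), 3))  # → 0.958
```

With dichotomous (0/1) data, the same computation reduces to KR-20.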
α Coefficient
• Equal to the mean of all possible split-half coefficients associated with a set of data.
• An indicator of the consistency of items within the same instrument.
• Affected by test length, total test variance, the shape of the resulting distribution of test scores, and response rate.
Reliability of Subjectively Scored Measures
Estimating the Change in Reliability with a Change in Testing Length
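Assuming this refers to the Spearman-Brown prophecy formula, a sketch of how projected reliability is computed when a test is lengthened or shortened by a factor k:

```python
# Spearman-Brown prophecy formula (assumed to be the method this slide covers):
# projected reliability when test length is multiplied by k.

def spearman_brown(r, k):
    """r = current reliability; k = factor by which length changes."""
    return k * r / (1 + (k - 1) * r)

# Doubling a test whose current reliability is .60 (invented numbers):
print(round(spearman_brown(0.60, 2), 3))  # → 0.75
```

Note that lengthening a test raises projected reliability only if the added items are parallel to the existing ones.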
Criterion-Referenced Reliability
•The ability of a measure to consistently classify objects or persons into the same categories on two separate occasions.
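This classification consistency is commonly summarized by the agreement proportion P0 and Cohen's kappa (K); a sketch with invented master/non-master classifications from two occasions:

```python
# Criterion-referenced test-retest reliability: proportion of persons
# classified into the same category on both occasions (P0), corrected for
# chance agreement via Cohen's kappa. Data are invented.

def classification_consistency(first, second, categories):
    n = len(first)
    p0 = sum(a == b for a, b in zip(first, second)) / n
    # chance agreement Pc from the marginal proportions of each category
    pc = sum((first.count(c) / n) * (second.count(c) / n) for c in categories)
    kappa = (p0 - pc) / (1 - pc)
    return p0, kappa

occasion1 = ["master", "master", "non", "non", "master", "non", "master", "non"]
occasion2 = ["master", "master", "non", "master", "master", "non", "master", "non"]
p0, k = classification_consistency(occasion1, occasion2, ["master", "non"])
print(round(p0, 3), round(k, 3))  # → 0.875 0.75
```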
Procedure for Test-retest for Criterion-Referenced Reliability
• The two parallel forms assess the same content domain and have relatively homogeneous items.
Inter-rater and Intra-rater / Criterion-Referenced Reliability
Expected rating errors:
• Error of standards
• Halo error
• Logic error
• Similarity error
• Central tendency error
Contrasted Groups Approach
Confirmatory Factor Analysis (CFA) Procedure
Multitrait-Multimethod Approach

Example: two constructs (Construct 1: bonding; Construct 2: perinatal care), each measured by two methods (Method 1: rating scale; Method 2: checklist). The correlations among all construct-method combinations are arranged in a matrix organized by same vs. different construct and same vs. different method.

1. Reliability should be high as a prerequisite for validity.
2. Convergent validity should be high (correlation between different methods measuring the same construct).
3. Construct validity is evidenced when heterotrait-monomethod correlations are lower than the correlations mentioned in point 2 (correlation is a function of trait, not method).
4. Discriminant validity is evidenced when heterotrait-heteromethod correlations are the lowest among all previously mentioned correlations.

This example can be applied to more than two constructs.
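A toy check of the four conditions above, with all correlations invented for illustration:

```python
# Multitrait-multimethod (MTMM) sketch: 2 constructs x 2 methods, indexed as
# 0: bonding/rating, 1: perinatal/rating, 2: bonding/checklist, 3: perinatal/checklist.
# Every correlation below is invented.

corr = {
    # reliability diagonal (same construct, same method)
    (0, 0): 0.89, (1, 1): 0.88, (2, 2): 0.90, (3, 3): 0.86,
    # convergent: same construct, different methods
    (0, 2): 0.70, (1, 3): 0.68,
    # heterotrait-monomethod: different constructs, same method
    (0, 1): 0.40, (2, 3): 0.38,
    # heterotrait-heteromethod: different constructs, different methods
    (0, 3): 0.20, (1, 2): 0.22,
}

reliability = min(corr[(i, i)] for i in range(4))        # point 1
convergent  = min(corr[(0, 2)], corr[(1, 3)])            # point 2
het_mono    = max(corr[(0, 1)], corr[(2, 3)])            # point 3
het_hetero  = max(corr[(0, 3)], corr[(1, 2)])            # point 4

# The expected ordering from the four points:
assert reliability > convergent > het_mono > het_hetero
print("MTMM pattern holds")
```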
Source: http://www.socialresearchmethods.net/kb/mtmmmat.php
Criterion-Related Validity/Norm-Referenced
Norm-Referenced Item Analysis Procedures
2-Discrimination Index
Item No. | % Correct, Upper 1/4 | % Correct, Lower 1/4 | Discrimination Index D (range -1.00 to +1.00)
1        | 90%                  | 20%                  | 0.7
2        | 80%                  | 70%                  | 0.1
3        | 100%                 | 0%                   | 1.0
4        | 100%                 | 100%                 | 0.0
5        | 50%                  | 50%                  | 0.0
6        | 20%                  | 60%                  | -0.4

Adapted from: www.distance.fsu.edu/docs/
A negative D value usually indicates that an item is faulty and needs improvement because the item is not discriminating in the same way as the total test.
A positive D value is desirable
•D values greater than +0.20 are desirable for a norm-referenced measure.
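The D values in the table follow from D = (proportion correct in the upper group) − (proportion correct in the lower group); a sketch reproducing the table:

```python
# Item discrimination index D = p_upper - p_lower, applied to the six
# items from the table above.

def discrimination_index(p_upper, p_lower):
    """D for one item, from upper- and lower-group proportions correct."""
    return round(p_upper - p_lower, 2)

items = [(0.90, 0.20), (0.80, 0.70), (1.00, 0.00),
         (1.00, 1.00), (0.50, 0.50), (0.20, 0.60)]
for i, (up, low) in enumerate(items, start=1):
    print(i, discrimination_index(up, low))
```

Item 6 comes out negative (low scorers outperform high scorers), flagging it as faulty; items 4 and 5 discriminate not at all.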
Criterion-Referenced Validity Assessment
We need evidence for:
• Content validity
• Construct validity
• Decision validity
• Criterion-related validity (predictive validity and concurrent validity).
1- Content specialists
• Two or more content specialists examine the format and content of each item.
• Item-objective congruence focuses on content validity at the item level.
• If more than one objective is used for a measure, the items measuring each objective are usually treated separately.

2- Determination of Interrater Agreement
• Interrater agreement can be evaluated by:
1- The content validity index (CVI).
2- P0 and K as measures of inter-rater agreement, with acceptable levels P0 ≥ 0.80 and K ≥ 0.25.
• Too low P0 and K are indicators of ???

3- Average Congruency Percentage
• The percentage of items rated congruent by each judge is calculated.
• The mean percentage across all judges is the “average congruency percentage”.
• An average congruency percentage of 90% or higher is acceptable.
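A sketch of these two indices under one common operationalization (an assumption; CVI formulas vary across authors): each judge rates each item relevant (1) or not (0), the CVI is taken as the proportion of items all judges rate relevant, and the average congruency percentage is the mean of each judge's percentage of congruent ratings. Data are invented.

```python
# Content validity indices for a 10-item instrument rated by two judges.
# 1 = item judged relevant/congruent, 0 = not. Ratings are invented.

judge1 = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
judge2 = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1]

n = len(judge1)
# CVI (assumed definition): proportion of items rated relevant by both judges
cvi = sum(a == 1 and b == 1 for a, b in zip(judge1, judge2)) / n
# Average congruency percentage: mean of each judge's percentage congruent
avg_congruency = 100 * (sum(judge1) / n + sum(judge2) / n) / 2
print(cvi, avg_congruency)  # → 0.8 90.0
```

Here the 90% average congruency just meets the acceptability threshold from the slide.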
1- Item-Objective or Item-Subscale Congruence
• Based on the ratings of two or more content specialists, who assign a value of +1 (definitely a measure), 0 (undecided), or -1 (not a measure) to each item according to the item’s congruence with the measure’s objective.
• The index is computed with formula (6.1) and ranges from -1 to +1.

2- Item Difficulty
• The item p level is calculated for each item.
• The item p level should be higher for the group known to possess more of a specified trait or attribute than for the group known to possess less.

3- Item Discrimination
• Focuses on measurement of performance changes (e.g., pretest/posttest) or differences (e.g., experienced/inexperienced) between the groups.
• Referred to as D′, which is directly related to the property of decision validity.
• A useful adjunct item-discrimination index is provided through the use of P0 or K.
• A negative discrimination index is usually due to a faulty item.
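A sketch of D′ computed as the difference in an item's p level between an instructed (posttest) and uninstructed (pretest) group, with invented responses:

```python
# Criterion-referenced item discrimination D': difference in item p level
# between groups expected to differ (here pretest vs. posttest).
# Responses (1 = correct, 0 = incorrect) are invented.

def p_level(responses):
    """Item p level: proportion answering the item correctly."""
    return sum(responses) / len(responses)

pre  = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]   # before instruction
post = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]   # after instruction

d_prime = round(p_level(post) - p_level(pre), 2)
print(d_prime)  # → 0.5
```

A positive D′ indicates the item reflects the instruction, consistent with decision validity; a negative value would flag a faulty item.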
Thank you!