Outline
Norm-Referenced Reliability Procedures
Criterion-Referenced Reliability Procedures
Norm-Referenced Validity Procedures
Norm-Referenced Item Analysis Procedures
Criterion-Referenced Validity Assessment
Criterion-Referenced Item Analysis Procedures
Norm-Referenced Reliability
Estimated by:
• Test-retest
• Parallel form
• Internal consistency

Criterion-Referenced Reliability
Estimated by:
• Test-retest
• Parallel form
• Inter-rater and intra-rater
Test-retest
• For affective measures.
1. Administer the instrument under standardized conditions.
2. Re-administer the instrument under the same conditions.
3. Determine the correlation between the two sets of scores.
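The correlation in step 3 is typically Pearson's product-moment r. A minimal sketch with invented scores:

```python
# Test-retest reliability: Pearson correlation between two administrations
# of the same instrument. All scores below are invented for illustration.

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [12, 15, 9, 20, 17, 11]   # first administration (hypothetical)
time2 = [13, 14, 10, 19, 18, 12]  # re-administration, same conditions
print(round(pearson_r(time1, time2), 3))  # → 0.975
```

A value this close to 1 would indicate high stability of scores over time.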
Parallel form
• Administer the two forms of the instrument and determine the correlation between them.
• The two forms should have equal means and standard deviations, equal correlations with a third variable, and be constructed with the same objectives and procedures.
• Can assess both equivalence and stability.
Internal consistency
• For cognitive measures.
• Consistency of a single measure on one occasion.
• Cronbach’s alpha is the preferred index of internal consistency reliability. Why?
• KR-20 and KR-21 are special cases of alpha, used when data are dichotomous.
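A minimal sketch of Cronbach's alpha in its usual k/(k−1) form, using invented item scores (rows are respondents, columns are items):

```python
# Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances) / total variance).
# Data are invented for illustration.

def variance(values):
    """Population variance of a list of scores."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n

def cronbach_alpha(rows):
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # transpose: one tuple per item
    totals = [sum(r) for r in rows]       # total score per respondent
    item_var = sum(variance(list(it)) for it in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 3],
    [4, 5, 4, 5],
    [1, 2, 2, 1],
]
print(round(cronbach_alpha(scores), 3))  # → 0.958
```

With dichotomous (0/1) data, the same computation reduces to KR-20.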
α Coefficient
• Equal to the mean of all possible split-half coefficients associated with a set of data.
• An indicator of the consistency of items within the same instrument.
• Affected by test length, total test variance, the shape of the resulting distribution of test scores, and response rate.
Reliability of Subjectively Scored Measures
Estimating the Change in Reliability with a Change in Testing Length
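Assuming this refers to the Spearman-Brown prophecy formula, a sketch of how projected reliability is computed when a test is lengthened or shortened by a factor k:

```python
# Spearman-Brown prophecy formula (assumed to be the method this slide covers):
# projected reliability when test length is multiplied by k.

def spearman_brown(r, k):
    """r = current reliability; k = factor by which length changes."""
    return k * r / (1 + (k - 1) * r)

# Doubling a test whose current reliability is .60 (invented numbers):
print(round(spearman_brown(0.60, 2), 3))  # → 0.75
```

Note that lengthening a test raises projected reliability only if the added items are parallel to the existing ones.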
Criterion-Referenced Reliability
•The ability of a measure to consistently classify objects or persons into the same categories on two separate occasions.
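This classification consistency is commonly summarized by the agreement proportion P0 and Cohen's kappa (K); a sketch with invented master/non-master classifications from two occasions:

```python
# Criterion-referenced test-retest reliability: proportion of persons
# classified into the same category on both occasions (P0), corrected for
# chance agreement via Cohen's kappa. Data are invented.

def classification_consistency(first, second, categories):
    n = len(first)
    p0 = sum(a == b for a, b in zip(first, second)) / n
    # chance agreement Pc from the marginal proportions of each category
    pc = sum((first.count(c) / n) * (second.count(c) / n) for c in categories)
    kappa = (p0 - pc) / (1 - pc)
    return p0, kappa

occasion1 = ["master", "master", "non", "non", "master", "non", "master", "non"]
occasion2 = ["master", "master", "non", "master", "master", "non", "master", "non"]
p0, k = classification_consistency(occasion1, occasion2, ["master", "non"])
print(round(p0, 3), round(k, 3))  # → 0.875 0.75
```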
Procedure for Test-retest for Criterion-Referenced Reliability
• The two parallel forms assess the same content domain and have relatively homogeneous items.
Inter-rater and Intra-rater / Criterion-Referenced Reliability
Expected rating errors:
• Error of standards
• Halo error
• Logic error
• Similarity error
• Central tendency error
Contrasted Groups Approach
Confirmatory Factor Analysis (CFA) Procedure
Multitrait-Multimethod Approach

Example: two constructs (Construct 1: bonding; Construct 2: perinatal care), each measured by two methods (Method 1: rating scale; Method 2: checklist). The correlations among all construct-method combinations are arranged in a matrix organized by same vs. different construct and same vs. different method.

1. Reliability should be high as a prerequisite for validity.
2. Convergent validity should be high (correlation between different methods measuring the same construct).
3. Construct validity is evidenced when heterotrait-monomethod correlations are lower than the correlations mentioned in point 2 (correlation is a function of trait, not method).
4. Discriminant validity is evidenced when heterotrait-heteromethod correlations are the lowest among all previously mentioned correlations.

This example can be applied to more than two constructs.
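A toy check of the four conditions above, with all correlations invented for illustration:

```python
# Multitrait-multimethod (MTMM) sketch: 2 constructs x 2 methods, indexed as
# 0: bonding/rating, 1: perinatal/rating, 2: bonding/checklist, 3: perinatal/checklist.
# Every correlation below is invented.

corr = {
    # reliability diagonal (same construct, same method)
    (0, 0): 0.89, (1, 1): 0.88, (2, 2): 0.90, (3, 3): 0.86,
    # convergent: same construct, different methods
    (0, 2): 0.70, (1, 3): 0.68,
    # heterotrait-monomethod: different constructs, same method
    (0, 1): 0.40, (2, 3): 0.38,
    # heterotrait-heteromethod: different constructs, different methods
    (0, 3): 0.20, (1, 2): 0.22,
}

reliability = min(corr[(i, i)] for i in range(4))        # point 1
convergent  = min(corr[(0, 2)], corr[(1, 3)])            # point 2
het_mono    = max(corr[(0, 1)], corr[(2, 3)])            # point 3
het_hetero  = max(corr[(0, 3)], corr[(1, 2)])            # point 4

# The expected ordering from the four points:
assert reliability > convergent > het_mono > het_hetero
print("MTMM pattern holds")
```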
Source: http://www.socialresearchmethods.net/kb/mtmmmat.php
Criterion-Related Validity/Norm-Referenced
Norm-Referenced Item Analysis Procedures
2-Discrimination Index
Item No. | % Correct, Upper 1/4 | % Correct, Lower 1/4 | Discrimination Index D (range -1.00 to +1.00)
1        | 90%                  | 20%                  | 0.7
2        | 80%                  | 70%                  | 0.1
3        | 100%                 | 0%                   | 1.0
4        | 100%                 | 100%                 | 0.0
5        | 50%                  | 50%                  | 0.0
6        | 20%                  | 60%                  | -0.4

Adapted from: www.distance.fsu.edu/docs/
A negative D value usually indicates that an item is faulty and needs improvement because the item is not discriminating in the same way as the total test.
A positive D value is desirable
•D values greater than +0.20 are desirable for a norm-referenced measure.
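The D values in the table follow from D = (proportion correct in the upper group) − (proportion correct in the lower group); a sketch reproducing the table:

```python
# Item discrimination index D = p_upper - p_lower, applied to the six
# items from the table above.

def discrimination_index(p_upper, p_lower):
    """D for one item, from upper- and lower-group proportions correct."""
    return round(p_upper - p_lower, 2)

items = [(0.90, 0.20), (0.80, 0.70), (1.00, 0.00),
         (1.00, 1.00), (0.50, 0.50), (0.20, 0.60)]
for i, (up, low) in enumerate(items, start=1):
    print(i, discrimination_index(up, low))
```

Item 6 comes out negative (low scorers outperform high scorers), flagging it as faulty; items 4 and 5 discriminate not at all.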
Criterion-Referenced Validity Assessment
We need evidence for:
• Content validity
• Construct validity
• Decision validity
• Criterion-related validity (predictive validity and concurrent validity).
1- Content specialists
• Two or more content specialists examine the format and content of each item.
• Item-objective congruence focuses on content validity at the item level.
• If more than one objective is used for a measure, the items measuring each objective are usually treated separately.

2- Determination of Interrater Agreement
• Interrater agreement can be evaluated by:
1- The content validity index (CVI).
2- P0 and K as measures of inter-rater agreement, with acceptable levels P0 ≥ 0.80 and K ≥ 0.25.
• Too low P0 and K are indicators of ???

3- Average Congruency Percentage
• The percentage of items rated congruent by each judge is calculated.
• The mean percentage across all judges is the “average congruency percentage”.
• An average congruency percentage of 90% or higher is acceptable.
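A sketch of these two indices under one common operationalization (an assumption; CVI formulas vary across authors): each judge rates each item relevant (1) or not (0), the CVI is taken as the proportion of items all judges rate relevant, and the average congruency percentage is the mean of each judge's percentage of congruent ratings. Data are invented.

```python
# Content validity indices for a 10-item instrument rated by two judges.
# 1 = item judged relevant/congruent, 0 = not. Ratings are invented.

judge1 = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
judge2 = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1]

n = len(judge1)
# CVI (assumed definition): proportion of items rated relevant by both judges
cvi = sum(a == 1 and b == 1 for a, b in zip(judge1, judge2)) / n
# Average congruency percentage: mean of each judge's percentage congruent
avg_congruency = 100 * (sum(judge1) / n + sum(judge2) / n) / 2
print(cvi, avg_congruency)  # → 0.8 90.0
```

Here the 90% average congruency just meets the acceptability threshold from the slide.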
1- Item-Objective or Item-Subscale Congruence
• Based on the ratings of two or more content specialists, who assign a value of +1 (definitely a measure), 0 (undecided), or -1 (not a measure) to each item according to the item’s congruence with the measure’s objective.
• The index is computed with formula (6.1) and ranges from -1 to +1.

2- Item Difficulty
• The item p level is calculated for each item.
• The item p level should be higher for the group known to possess more of a specified trait or attribute than for the group known to possess less.

3- Item Discrimination
• Focuses on measurement of performance changes (e.g., pretest/posttest) or differences (e.g., experienced/inexperienced) between the groups.
• Referred to as D′, which is directly related to the property of decision validity.
• A useful adjunct item-discrimination index is provided through the use of P0 or K.
• A negative discrimination index is usually due to a faulty item.
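A sketch of D′ computed as the difference in an item's p level between an instructed (posttest) and uninstructed (pretest) group, with invented responses:

```python
# Criterion-referenced item discrimination D': difference in item p level
# between groups expected to differ (here pretest vs. posttest).
# Responses (1 = correct, 0 = incorrect) are invented.

def p_level(responses):
    """Item p level: proportion answering the item correctly."""
    return sum(responses) / len(responses)

pre  = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]   # before instruction
post = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]   # after instruction

d_prime = round(p_level(post) - p_level(pre), 2)
print(d_prime)  # → 0.5
```

A positive D′ indicates the item reflects the instruction, consistent with decision validity; a negative value would flag a faulty item.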
Thank you!