
designing a questionnaire

Jan 04, 2016

Page 1: designing a questionnaire

1

designing a questionnaire

Page 2: designing a questionnaire

2

Objectives in designing questionnaires

The main objectives are:

To maximise the proportion of subjects answering our questionnaire - that is, the response rate.

To obtain accurate relevant information for our survey.

In order to obtain accurate relevant information, we have to give some thought to what questions we ask, how we ask them, the order we ask them in, and the general layout of the questionnaire.

To maximise our response rate, we have to consider carefully how we administer the questionnaire, establish rapport, explain the purpose of the survey, and remind those who have not responded. The length of the questionnaire should be appropriate.

Page 3: designing a questionnaire

3

Deciding what to ask

There are three potential types of information: (1) Information we are primarily interested in - that is, dependent variables.

(2) Information which might explain the dependent variables - that is, independent variables.

(3) Other factors related to both dependent and independent factors which may distort the results and have to be adjusted for - that is, confounding variables.

Page 4: designing a questionnaire

4

Qualities of a Good Question

Evokes the truth. Questions must be non-threatening. When a respondent is concerned about the consequences of answering a question in a particular manner, there is a good possibility that the answer will not be truthful.

Anonymous questionnaires that contain no identifying information are more likely to produce honest responses than those identifying the respondent.

If your questionnaire does contain sensitive items, be sure to clearly state your policy on confidentiality.

Page 5: designing a questionnaire

5

Asks for an answer on only one dimension or only one piece of information at a time (NO DOUBLE BARREL QUESTIONS)

The purpose of a survey is to find out information. A question that asks for a response on more than one dimension will not provide the information you are seeking. For example, a researcher investigating a new food snack asks: "Do you like the texture and flavor of the snack?" If a respondent answers "no", then the researcher will not know if the respondent dislikes the texture or the flavor, or both.

Page 6: designing a questionnaire

6

Another question asks, "Were you satisfied with the quality of our food and service?" Again, if the respondent answers "no", there is no way to know whether the quality of the food, service, or both were unsatisfactory. A good question asks for only one "bit" of information.

Another example, "Please rate the lecture in terms of its content and presentation" asks for two pieces of information at the same time. It should be divided into two parts:

"Please rate the lecture in terms of (a) its content, (b) its presentation."

Page 7: designing a questionnaire

7

Can accommodate all possible answers.

Multiple choice items are the most popular type of survey questions because they are generally the easiest for a respondent to answer and the easiest to analyze.

Asking a question that does not accommodate all possible responses can confuse and frustrate the respondent. For example, consider the question:

Page 8: designing a questionnaire

8

What brand of computer do you own? __       A. IBM PC       B. Apple

Clearly, there are many problems with this question. What if the respondent doesn't own a microcomputer? What if he owns a different brand of computer? What if he owns both an IBM PC and an Apple? There are two ways to correct this kind of problem.

Page 9: designing a questionnaire

9

The first way is to make each response a separate dichotomous item on the questionnaire. For example:

Do you own an IBM PC? (circle: Yes or No)

Do you own an Apple computer? (circle: Yes or No)

Page 10: designing a questionnaire

10

Another way to correct the problem is to add the necessary response categories and allow multiple responses. This is the preferable method because it provides more information than the previous method.

What brand of computer do you own? (Check all that apply)

__ Do not own a computer
__ IBM PC
__ Apple
__ Other

Page 11: designing a questionnaire

11

Has mutually exclusive options.

A good question leaves no ambiguity in the mind of the respondent. There should be only one correct or appropriate choice for the respondent to make. An obvious example is:

Where did you grow up? __
A. country
B. farm
C. city

A person who grew up on a farm in the country would not know whether to select choice A or B. This question would not provide meaningful information. Worse than that, it could frustrate the respondent and the questionnaire might find its way to the trash.

Page 12: designing a questionnaire

12

Produces variability of responses.

When a question produces no variability in responses, we are left with considerable uncertainty about why we asked the question and what we learned from the information.

If a question does not produce variability in responses, it will not be possible to perform any statistical analyses on the item. For example: What do you think about this report? __

A. It's the worst report I've read
B. It's somewhere between the worst and best
C. It's the best report I've read

Page 13: designing a questionnaire

13

Since almost all responses would be choice B, very little information is learned [It's somewhere between the worst and best].

Design your questions so they are sensitive to differences between respondents. As another example: Are you against drug abuse? (circle: Yes or No)

Again, there would be very little variability in responses and we'd be left wondering why we asked the question in the first place.

Page 14: designing a questionnaire

14

Follows comfortably from the previous question.

Writing a questionnaire is similar to writing anything else. Transitions between questions should be smooth. Grouping questions that are similar will make the questionnaire easier to complete, and the respondent will feel more comfortable. Questionnaires that jump from one unrelated topic to another feel disjointed and are not likely to produce high response rates.

Page 15: designing a questionnaire

15

Does not presuppose a certain state of affairs.

Among the most subtle mistakes in questionnaire design are questions that make an unwarranted assumption. An example of this type of mistake is:

Are you satisfied with your current auto insurance? (Yes or No)

This question will present a problem for someone who does not currently have auto insurance. Write your questions so they apply to everyone. This often means simply adding an additional response category. Are you satisfied with your current auto insurance?

___ Yes
___ No
___ Don't have auto insurance

Page 16: designing a questionnaire

16

One of the most common mistaken assumptions is that the respondent knows the correct answer to the question. Industry surveys often contain very specific questions that the respondent may not know the answer to. For example:

What percent of your budget do you spend on direct mail advertising? ____

Very few people would know the answer to this question without looking it up, and very few respondents will take the time and effort to look it up. If you ask a question similar to this, it is important to understand that the responses are rough estimates and there is a strong likelihood of error.

Page 17: designing a questionnaire

17

Does not imply a desired answer.

The wording of a question is extremely important. We are striving for objectivity in our surveys and, therefore, must be careful not to lead the respondent into giving the answer we would like to receive.

Leading questions are usually easily spotted because they use negative phraseology. As examples:
Wouldn't you like to receive our free brochure?
Don't you think the government is spending too much money?

Page 18: designing a questionnaire

18

Does not use emotionally loaded or vaguely defined words. This is one of the areas overlooked by both beginners and experienced researchers.

Quantifying adjectives (e.g., most, least, majority) are frequently used in questions.

It is important to understand that these adjectives mean different things to different people.

Page 19: designing a questionnaire

19

Does not use unfamiliar words or abbreviations. Remember who your audience is and write your questionnaire for them.

Do not use uncommon words or compound sentences. Write short sentences. Abbreviations are okay if you are absolutely certain that every single respondent will understand their meanings.

If there is any doubt at all, do not use the abbreviation. The following question might be okay if all the respondents are educated people, but it would not be a good question for the general public.

What was your SES status? ______

Page 20: designing a questionnaire

20

Is not dependent on responses to previous questions. Branching in written questionnaires should be avoided.

While branching can be used as an effective probing technique in telephone and face-to-face interviews, it should not be used in written questionnaires because it sometimes confuses respondents. An example of branching is:

1. Do you currently have a life insurance policy? (Yes or No) If no, go to question 3.
2. How much is your annual life insurance premium? _________

Page 21: designing a questionnaire

21

Does not ask respondent to order or rank a series of more than five items.

Questions asking respondents to rank items by importance should be avoided. This becomes increasingly difficult as the number of items increases, and the answers become less reliable. This becomes especially problematic when asking respondents to assign a percentage to a series of items. In order to successfully complete this task, the respondent must mentally continue to re-adjust his answers until they total one hundred percent. Limiting the number of items to five will make it easier for the respondent to answer.

Page 22: designing a questionnaire

22

The Order of the Questions

Items on a questionnaire should be grouped into logically coherent sections.

Grouping questions that are similar will make the questionnaire easier to complete, and the respondent will feel more comfortable.

Questions that use the same response formats, or those that cover a specific topic, should appear together.

Page 23: designing a questionnaire

23

Each question should follow comfortably from the previous question.

Writing a questionnaire is similar to writing anything else. Transitions between questions should be smooth. Questionnaires that jump from one unrelated topic to another feel disjointed and are not likely to produce high response rates.

Page 24: designing a questionnaire

24

Arranging the questions

The order of the questions is also important. Some general rules are:
Go from general to particular.
Go from easy to difficult.
Go from factual to abstract.
Start with closed format questions.
Start with questions relevant to the main subject.
Do not start with demographic and personal questions.

Page 25: designing a questionnaire

25

It is useful to use a variety of question formats to maintain the respondents' interest.

When a series of semantic differential scales is used, it may be a good idea to mix positive-to-negative scales (for example, interesting to dull) with negative-to-positive scales (for example, useless to useful).

This might make the respondents think more and avoid the tendency to tick the same response for every question.

Page 26: designing a questionnaire

26

Question Wording

The wording of a question is extremely important. Researchers strive for objectivity in surveys and, therefore, must be careful not to lead the respondent into giving a desired answer. Unfortunately, the effects of question wording are one of the least understood areas of questionnaire research.

Page 27: designing a questionnaire

27

Many investigators have confirmed that slight changes in the way questions are worded can have a significant impact on how people respond.

Several authors have reported that minor changes in question wording can produce more than a 25 percent difference in people's opinions.

Page 28: designing a questionnaire

28

Several investigators have looked at the effects of modifying adjectives and adverbs. Words like usually, often, sometimes, occasionally, seldom, and rarely are "commonly" used in questionnaires, although it is clear that they do not mean the same thing to all people.

Some adjectives have high variability and others have low variability. The following adjectives have highly variable meanings and should be avoided in surveys: a clear mandate, most, numerous, a substantial majority, a minority of, a large proportion of, a significant number of, many, a considerable number of, and several. Other adjectives produce less variability and generally have more shared meaning. These are: lots, almost all, virtually all, nearly all, a majority of, a consensus of, a small number of, not very many of, almost none, hardly any, a couple, and a few.

Page 29: designing a questionnaire

29

Use short and simple sentences

Short, simple sentences are generally less confusing and ambiguous than long, complex ones. As a rule of thumb, most sentences should contain one or two clauses. Sentences with more than three clauses should be rephrased.

Page 30: designing a questionnaire

30

Avoid negatives if possible

Negatives should be used only sparingly. For example, instead of asking students whether they agree with the statement, "Small group teaching should not be abolished," the statement should be rephrased as, "Small group teaching should continue." Double negatives should always be avoided.

Page 31: designing a questionnaire

31

Ask precise questions

Questions may be ambiguous because a word or term may have a different meaning to different people. For example, if we ask students to rate their interest in "medicine," this term might mean "general medicine" (as opposed to general surgery) to some, but inclusive of all clinical specialties (as opposed to professions outside medicine) to others.

Page 32: designing a questionnaire

32

Another source of ambiguity is a failure to specify a frame of reference.

For example, in the question, "How often did you borrow books from your library?" the time reference is missing. It might be rephrased as, "How many books have you borrowed from the library within the past six months altogether?"

Page 33: designing a questionnaire

33

Ensure those you ask have the necessary knowledge

For example, in a survey of university lecturers on recent changes in higher education, the question "Do you agree with the recommendations in the Report on Higher Education?" is unsatisfactory for several reasons. Not only does it ask for several pieces of information at the same time, since there are several recommendations in the report, but it also assumes that all lecturers know about the relevant recommendations.

Page 34: designing a questionnaire

34

Level of detail

It is important to ask for the exact level of detail required. On the one hand, you might not be able to fulfil the purposes of the survey if you omit essential details. On the other hand, it is important to avoid unnecessary detail, as people are less inclined to complete long questionnaires. This is particularly important for confidential or sensitive information, such as personal financial matters or marital relationship issues.

Page 35: designing a questionnaire

35

Handling Sensitive Issues

It is often difficult to obtain truthful answers to sensitive questions. Clearly, the question, "Have you ever copied other students' answers in a degree exam?" is likely to produce either no response or negative responses. Less direct approaches have been suggested.

Firstly, the casual approach: "By the way, do you happen to have copied other students' answers in a degree exam?" may be asked as the last part of another, decoy question.

Secondly, the numbered card approach: "Please tick one or more of the following items which correspond to how you have answered degree examination questions in the past." In the list of items, include "copy from other students" as one of many items.

Page 36: designing a questionnaire

36

Thirdly, the everybody approach: "As we all know, most university students have copied other students' answers in degree exams. Do you happen to be one of them?"

Fourthly, the other-people approach. This approach was used in the recent medical student survey, in which students were given the scenario, "Jalil copies answers in a degree exam from Jamal." They were then asked, "Do you feel Jalil is wrong, what penalty should be imposed on Jalil, and have you done or would you consider doing the above?"

Page 37: designing a questionnaire

37

Length of questionnaire

There is no universal agreement about the optimal length of questionnaires.

It probably depends on the type of respondents. However, short simple questionnaires usually attract higher response rates than long complex ones. In a survey of stroke survivors, both the response rate and the proportion of completed forms were higher for a shorter questionnaire (six questions with a visual analogue scale) compared with a longer and more complex questionnaire (with 34 questions).

Page 38: designing a questionnaire

38

Write in everyday terms. Avoid internal jargon. Many corporations have abbreviations or acronyms for products and services which are not familiar to customers.

Follow good business writing practices. Write short, simple questions. Be clear and to the point. Avoid errors in spelling, grammar and usage.

Page 39: designing a questionnaire

39

Use consistent scales.

All rating scales should mimic the first one used. It can confuse respondents if you change from, for example, a five point to a seven point scale. Keep the scales going the same way. In other words, if '5' is high on the first scale, don't make '1' high on the next.

Use similar wording for the anchors. Finally, group like questions under the same scale. If you do need to change scales, wait until you reach a new section of the questionnaire.

Page 40: designing a questionnaire

40

Use consistent wording.

The use of similar phrases for the text of the survey can unify your questionnaire.

For example, questions can be set up with a lead phrase that leads off each question:

How satisfied are you that our staff is:
Responsive to your service requests .......
Knowledgeable about products .............
Knowledgeable about your business ........

Page 41: designing a questionnaire

41

Avoid asking more than one question at a time.

This is known as asking a 'double barreled' question. A typical double barreled question: "Sales reps are polite and responsive." While the sales reps may be polite they may not be responsive, or vice versa. The respondent will be forced to rate one attribute differently from their true feelings. Consequently, data interpretation will be questionable.

Page 42: designing a questionnaire

42

Provide directions.

It is important to let the respondent know what to do on any particular question; however, it is just as important to avoid complicated directions. Make the survey as easy as possible for your respondents by using phrases such as 'Mark all that apply,' and 'Mark only one.' Avoid asking them to calculate anything, such as percentages, and try to avoid the use of skip patterns.

Page 43: designing a questionnaire

43

Analysis of the responses and the interviewers' comments are used to improve the questionnaire.

Ideally, there should be sufficient variation in responses among respondents; each question should measure different qualities - that is, the responses to any two items should not be very strongly correlated - and the non-response rate should be low. In the third phase, the piloted questionnaire is polished to improve the question order, filter questions, and layout.

Page 44: designing a questionnaire

44

Format of responses

Page 45: designing a questionnaire

45

Format of responses

The responses can be in open or closed formats. In an open ended question, the respondents can formulate their own answers. In closed format, respondents are forced to choose between several given options. What are the advantages of each of these formats?

It is possible to use a mixture of the two formats - for example, give a list of options, with the final option of "other" followed by a space for respondents to fill in other alternatives.

There are several forced choice formats. Out of these formats, ranking is probably least frequently used, as the responses are relatively difficult to record and analyse.

Page 46: designing a questionnaire

46

Closed - that is, forced choice - format

Easy and quick to fill in
Minimise discrimination against the less literate (in self administered questionnaires) or the less articulate (in interview questionnaires)
Easy to code, record, and analyse results quantitatively
Easy to report results

Page 47: designing a questionnaire

47

Example "How satisfied are you with your job?" (Circle the number that represents your response)

Very dissatisfied   Dissatisfied   Neutral   Satisfied   Very satisfied
        1                 2           3          4              5

Page 48: designing a questionnaire

48

Example "What is your marital status?" (Check the box that applies)

Single, never married
Married
Divorced
Separated
Widowed
Other: _____

Page 49: designing a questionnaire

49

Questions                                              Very dissatisfied   Dissatisfied   Satisfied   Very satisfied

How satisfied are you with your working conditions?           1                 2             3              4

How satisfied are you with your pay?                          1                 2             3              4

How satisfied are you with your supervisor?                   1                 2             3              4

Please circle your response.

Page 50: designing a questionnaire

50

Open format

Advantages:
Allows exploration of the range of possible themes arising from an issue
Can be used even if a comprehensive range of alternative choices cannot be compiled

Page 51: designing a questionnaire

51

Choosing the Right Scale

Choosing a scale for your survey instrument is an important decision that will shape the information you collect. Each scale has variations, some more reliable than others.

Even vs. Odd
Number of Points
Defining Your Scale

Page 52: designing a questionnaire

52

Even vs. Odd

Even numbered scales can more effectively discriminate between satisfied or unsatisfied customers because there is not a neutral option.

However, this clear division may cause hesitation for respondents who are neutral in regard to a survey item.

Without a midpoint option, respondents often choose a positive response, creating positively skewed data.

Carefully consider whether a clear division between positive and negative responses is necessary, or whether a midpoint will be more appropriate for your information needs.

Page 53: designing a questionnaire

53

Number of Points

In survey research, scales commonly range from 2 to 10 points. The number of points for your scale should be determined by how you intend to use the data. Although seven to ten point scales may seem to gather more discriminating information, there is debate whether respondents actually discriminate carefully enough when filling out a questionnaire to make these scales valuable.

Also, these scales are often collapsed into three or five point scales for reporting purposes.

Four and five point scales are more highly recommended; two and three point scales offer little discriminative value and are rarely recommended.

Page 54: designing a questionnaire

54

Defining Your Scale

Once the number of points on a scale has been decided, it is important to determine the labels for each scale point, or in some cases, whether or not you will use any labels.

For example, an agreement scale could be set up like this:

Strongly Agree                               Strongly Disagree
      5          4          3          2          1

Page 55: designing a questionnaire

55

Labeling only the end points, as in the example above, may be acceptable, but it is also important that each respondent understand the meaning of each scale point.

By labeling each scale point, all respondents attach the same word to a numerical value.

This helps avoid respondent misinterpretation of scale definitions.

Additionally, verbally defining each scale point allows reports to be written in more concrete terms such as: "x percentage were satisfied."

Page 56: designing a questionnaire

56

Four-point Requirements Scale

Receives high marks for discrimination and reliability. A leading sentence might be, "Please indicate how well Company Z met your requirements."

Exceeded    Met    Nearly Met    Missed
    4        3          2           1

The option of "Nearly Met" serves well to capture data from respondents who are somewhat unsatisfied but prefer to choose positive responses.

Page 57: designing a questionnaire

57

Five-point Expectations Scale

Receives high marks for discrimination and reliability. A leading sentence might be, "In terms of your expectations, please rate the performance of Company Z."

Significantly Above    Above    Met    Below    Significantly Below
         5               4       3       2               1

Page 58: designing a questionnaire

58

While these scales have been shown to be effective in collecting accurate data, a good scale cannot compensate for poorly worded items.

Accurate, reliable data depends on a combination of the proper scale and correctly written items, as well as proper survey administration.

Page 59: designing a questionnaire

59

Open-ended Questions: How much information or detail do you need in open-ended questions? In general, more detailed information can be gathered when an interviewer probes and clarifies responses than when respondents are asked to write in their own response.

Visual Aids: Will the respondent need to see graphs or figures? If so, chances are a phone survey will not work.

Skip Patterns: Are there complex skip patterns requiring respondents to skip to other questions based on answers to previous questions? If so, trained interviewers or properly programmed computer methods will ensure that respondents answer the correct questions.

Page 60: designing a questionnaire

60

There are several ways of administering questionnaires. They may be self administered or read out by interviewers. Self administered questionnaires may be sent by post, by email, or electronically online. Interview administered questionnaires may be conducted by telephone or face to face.

Advantages of self administered questionnaires include:
Cheap and easy to administer.
Preserve confidentiality.
Can be completed at respondent's convenience.
Can be administered in a standard manner.

Page 61: designing a questionnaire

61


Nonsense or error data: a response of X on a scale of SD, D, A, SA should not be there because it is not part of the scale; eliminate it. If several respondents give similar nonsense data, your instrument is probably in error.

Other categories - Not Applicable: you may have too many respondents giving NA. When more than 15-20% give NA, the validity of the question is in doubt. When an item has more than 20% NA, either eliminate the item from analysis, or keep it and eliminate the individual. One safeguard is to make sure that all items apply to your respondents so that NA is not needed.

NA means "it does not apply to me"; Neutral means "it applies to me, but I am neutral in my opinion."

Page 62: designing a questionnaire

62

Validity of questionnaires

Page 63: designing a questionnaire

63

What is validity?

According to the American Psychological Association, validity "...refers to the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores." (Standards for Educational and Psychological Testing, 1985, p. 9).

In other words, for your findings to be appropriate, meaningful and useful, they need to be valid.

Validity refers to whether the questionnaire or survey measures what it intends to measure. 

Page 64: designing a questionnaire

64

An instrument that is a valid measure of third graders' math skills probably is not a valid measure of high school students' math skills.

An instrument that is a valid predictor of how well students might do in school may not be a valid measure of how well they will do once they complete school.

So we never say that an instrument is valid or not valid...we say it is valid for a specific purpose with a specific group of people.

Page 65: designing a questionnaire

65

The validity of a questionnaire relies first and foremost on reliability.  If the questionnaire cannot be shown to be reliable, there is no discussion of validity. 

But there is good news.  Demonstrating validity is easy, compared to reliability.  If you have reached this point and have a reliable instrument for measuring the issues or phenomena you are after, demonstrating its validity will not be difficult.

Page 66: designing a questionnaire

66

Types of Validity

Everyone agrees that validity is important, but what type of validity are we talking about?

There are three main types of validity: content, criterion-related, and construct validity.

Page 67: designing a questionnaire

67

Content Validity

Content validity determines if the survey items are representative of the topic being measured.

You need to:
Define [you must clearly state] what you are interested in measuring, for example 'Quality.'
Choose the specific aspects which require feedback, for example, 'Error Rate.'
Judge whether your items relate to the definitions you developed and adequately cover all aspects [whether the items are representative of the topic].

Page 68: designing a questionnaire

68

Example:

Specialists in the content measured by the instrument are asked to judge the appropriateness of the items on the instrument.

Do they cover the breadth of the content area (does the instrument contain a representative sample of the content being assessed)?

Are they in a format that is appropriate for those using the instrument?

A test that is intended to measure the quality of science instruction in fifth grade should cover material covered in the fifth grade science course in a manner appropriate for fifth graders.

A national science test might not be a valid measure of local science instruction, although it might be a valid measure of national science standards.

Page 69: designing a questionnaire

69

Researchers aim to study mathematical learning and create a survey to test for mathematical skill. If these researchers only tested for multiplication and then drew conclusions from that survey, their study would not show content validity because it excludes other mathematical functions.

A researcher needing to measure an attitude like self-esteem must decide what constitutes a relevant domain of content for that attitude.

You must define your content domain.

Page 70: designing a questionnaire

70

Criterion-Related Validity

Criterion-related validation relies on statistical analyses rather than judgments as in content validation.

Criterion-related validation involves calculating a 'validity coefficient' by correlating the survey items with another measure (criterion) already known to be related to other aspects of the attribute.

For example, if satisfaction with the service department relates to the number of friends one refers to the service department, then we could correlate scores on a measure of satisfaction to an index of referrals.
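To make the idea concrete, here is a minimal sketch (in Python, not part of the original slides) of computing such a validity coefficient; the satisfaction scores and referral counts are invented for illustration.

```python
import numpy as np

# Invented data: satisfaction score (1-5) and number of referrals for 8 customers
satisfaction = np.array([5, 4, 2, 3, 5, 1, 4, 2])
referrals    = np.array([3, 2, 0, 1, 4, 0, 2, 1])

# The validity coefficient is the correlation between the survey measure
# and the external criterion
validity_coefficient = np.corrcoef(satisfaction, referrals)[0, 1]
print(round(validity_coefficient, 2))
```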

Page 71: designing a questionnaire

71

Construct Validity

Determine the construct to be measured, for example, 'Quality.'

Determine relationship between the construct and other constructs, for example, 'Satisfaction.'

Examine pattern of relationships

Page 72: designing a questionnaire

72

Page 73: designing a questionnaire

73

Reliability of questionnaires

Page 74: designing a questionnaire

74

Why Does Reliability Matter?

A questionnaire will always produce numerical results, even if they're meaningless. 

You could be making business decisions based on survey results that don't mean anything. 

Only a test of reliability can tell you if you should trust the results.


Page 75: designing a questionnaire

75

What Is Reliability?

The most common definitions include descriptions such as stability, repeatability, and accuracy.

In the context of survey design, reliability is essentially the extent to which a survey will provide the same results with repeated measurement.

An example will make this statement clear.

Page 76: designing a questionnaire

76

Non-technically speaking, a reliable questionnaire is one that would give the same results if you used it repeatedly with the same group.

That may sound funny because most organizations don't administer a questionnaire to the same group twice. 

But if they did, they would learn how reliable their questionnaire is, because a reliable survey will give the same results on Tuesday as it did the previous Monday.

Page 77: designing a questionnaire

77

Reliability is a property of the measuring instrument. 

If you are like many people, you probably get on your bathroom scale in the morning, look at the weight displayed, then step off, and do it again. 

You have learned that what is displayed by a bathroom scale the first time is not always exactly the same as the second, but it is usually very close.

Page 78: designing a questionnaire

78

What if one morning you weighed yourself, then a second time, and the second weight displayed was 5 lbs. heavier than the first? 

You would probably step off, then weigh yourself a third time.  What if it was now 4 lbs. lighter than the first? 

Would you still be concerned about your weight?  Or would you be more concerned about finding out what's "wrong" with the scale? 

What's wrong is that your scale has become unreliable.  You can see unreliability by repeatedly measuring the same thing. 

And when you know the scale is unreliable, you don't even try to measure your weight, you concentrate on fixing the scale first.

Page 79: designing a questionnaire

79

If your questionnaire is unreliable, it's like trying to measure the length of something with a rubber tape measure. 

You could make your marks at precise intervals, but the flexibility of the material would destroy its reliability. 

Most questionnaires that use rating scales to record people's opinions are like rubber tape measures.

Page 80: designing a questionnaire

80

Reliability: The ability of an instrument to measure consistently with relative absence of error. The higher the reliability coefficient, the more confidence you can have in the score.

.90 and up ......... Excellent!
.80-.89 ............ Good
.70-.79 ............ Adequate
Below .70 .......... May have limited applicability

Source: Testing and Assessment: An Employer's Guide to Good Practices, U.S. Dept of Labor, 1999

Page 81: designing a questionnaire

81

To understand reliability coefficients, a brief discussion of the components of a score will be helpful. An observed or obtained score on an instrument can be divided into two parts. Observed Score = True Score + Error

An instrument can be said to be reliable if it accurately reflects true scores. In other words, an instrument can be said to be reliable to the extent that it minimizes the error component. So, the reliability coefficient is the proportion of true variability to the total obtained variability.

Therefore, if you get a reliability coefficient of .85, this means that 85 percent of the variability in obtained scores could be said to represent true individual differences and 15 percent of the variability is due to random error.
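As a concrete illustration of this decomposition (not part of the original slides), the short Python sketch below simulates observed scores as true scores plus random error and checks that the ratio of true variance to observed variance recovers the reliability coefficient; all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000                           # simulated respondents
true = rng.normal(50, 10, n)         # true scores (mean 50, SD 10 -> variance 100)
error = rng.normal(0, 5, n)          # random measurement error (SD 5 -> variance 25)
observed = true + error              # Observed Score = True Score + Error

# Reliability coefficient = true variance / total observed variance
reliability = true.var() / observed.var()
print(round(reliability, 2))         # about 0.80, i.e. 100 / (100 + 25)
```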

Page 82: designing a questionnaire

82

Stability (produces the same results with repeated testing)  

Test-retest
Parallel forms
Alternate forms

Page 83: designing a questionnaire

83

Internal-Consistency Measures of Reliability  

Split-half reliability
Cronbach's alpha

Split-Half Reliability
One test is split into two halves and the correlation between the two halves is calculated. (Both halves of the test must be equal in content and difficulty.) Since the number of items is split in half, the Spearman-Brown formula must be employed to estimate reliability for the entire test.

Kuder-Richardson 20
A test of homogeneity (inter-item consistency), the K-R 20 compares the proportion of correct and incorrect responses to each of the items on the test. The K-R 20 is appropriate for tests in which items are scored either right or wrong.

Kuder-Richardson 21
This simpler formula is based on the assumption that all items are of equal difficulty (rarely the case!).

Page 84: designing a questionnaire

84

Equivalence (the instrument produces the same results when an equivalent instrument is used, or there is consistency among researchers using the same instrument). Two equivalent forms of the test are administered to the same group of people. (It can be very difficult to develop two truly equivalent forms of a test.)

Parallel items on alternate forms
Inter-rater reliability (a sketch of percent agreement follows below)
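Inter-rater reliability for simple coded responses is often reported as percent agreement between the raters. The Python sketch below uses invented codes and assumes both raters coded every response.

```python
# Invented codes assigned to the same 10 open-ended responses by two raters
rater_a = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 1, 1, 1, 1]

# Percent agreement: proportion of responses the two raters coded identically
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"{agreement:.0%}")   # 80%
```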

Page 85: designing a questionnaire

85

If we measure an object using two rulers, one made of steel and one made of a rubber band, we would expect the steel ruler to provide relatively consistent or stable measurements (assuming the object was stable).

The rubber band ruler, on the other hand, would probably provide a variable set of measurements.

Page 86: designing a questionnaire

86

How Do You Measure Reliability?

As is the case with validity, there are a number of different ways to assess the reliability of a survey. The method you choose will depend upon what you are trying to accomplish. Several ways we measure the reliability of an instrument include: test-retest, split-halves, and internal consistency. All of these methods will result in a number between 0.00 and 1.00, with scores increasing as the survey becomes more reliable.

Basically, there are two reliability testing procedures: one administration and two administrations. Two administrations is the less desirable procedure.

Page 87: designing a questionnaire

87

Stability: Test-Retest

As the name implies, test-retest reliability involves administering a survey to a group of individuals at one time and then re-administering the survey to the same individuals at some later time.

The survey responses are then correlated and the resulting correlation is interpreted as the reliability of the instrument. This method clearly illustrates the notion of reliability as measurement consistency.

Unfortunately, there are some downsides to this approach. Look for a correlation of at least .70 (correlations tend to be higher for short-term retests of about 2 weeks and lower for long-term retests of more than 1-2 months).
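As a minimal illustration (Python, invented scores; not part of the original slides), test-retest reliability is just the Pearson correlation between the two administrations:

```python
import numpy as np

# Invented total questionnaire scores for the same 8 respondents, two weeks apart
time1 = np.array([22, 31, 18, 27, 35, 24, 29, 20])
time2 = np.array([24, 30, 17, 28, 33, 25, 27, 21])

# Test-retest reliability = correlation between the two administrations
r = np.corrcoef(time1, time2)[0, 1]
print(round(r, 2))   # compare against the .70 rule of thumb above
```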

Page 88: designing a questionnaire

88

Weaknesses:

First, it is often very difficult to administer the same survey to the same person twice. Second, the act of measuring someone's attitudes (e.g., satisfaction) can affect their attitude.

Specifically, asking people to report their satisfaction at time 1 can sensitize them to the issues and result in a change in scores at time 2 (a phenomenon called reactivity), resulting in a low reliability estimate.

Finally, people remember their first response and respond in a way that maximizes their consistency, not necessarily one that reflects their attitude. This will result in inflated estimates of reliability.

Page 89: designing a questionnaire

89

Parallel items on alternate forms

The same population completes similar forms of the instrument before and after a short time period, or one right after the other. If items are truly parallel, they have identical true scores and identical error variances; responses to parallel items will differ only with respect to random fluctuations.

This approach uses questions (items) that are comparable to each other and parallel. However, it is very difficult to prepare two forms of a test that display the properties of parallel measures. There are, though, two forms of certain tests whose items are intended to measure the same thing and do not differ from each other in any systematic way.

The two sets of scores are correlated to produce a correlation coefficient.

Page 90: designing a questionnaire

90

Homogeneity - Internal Consistency

Split Halves

An alternate approach to reliability requires us to split a survey in half and then correlate the two halves. For example, if we had a twenty item survey assessing customer satisfaction with a sales associate, we could administer the survey to our sample, split the survey in half, and then correlate the two halves of the survey.

A reliable survey would result in a strong correlation between the two halves.

The major problem associated with this approach is deciding how to split the survey. Every split will probably produce a slightly different result, providing some confusion as to the actual reliability of the survey.

Page 91: designing a questionnaire

91

Formula for split half (Spearman-Brown):

r = 2 r_hh / (1 + r_hh)

where r_hh is the Pearson correlation between the two halves.
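A short sketch of the split-half calculation under the formula above, written in Python with an invented 20-item right/wrong data set; the odd/even split is one common but arbitrary choice of halves.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented data: 50 respondents x 20 items scored 1 (correct) / 0 (incorrect),
# driven by a common "ability" factor so the halves correlate
ability = rng.normal(0, 1, (50, 1))
items = (rng.normal(0, 1, (50, 20)) + ability > 0).astype(int)

half1 = items[:, 0::2].sum(axis=1)       # total score on items 1, 3, 5, ...
half2 = items[:, 1::2].sum(axis=1)       # total score on items 2, 4, 6, ...

r_hh = np.corrcoef(half1, half2)[0, 1]   # correlation between the two halves
r_full = 2 * r_hh / (1 + r_hh)           # Spearman-Brown step-up to full length
print(round(r_hh, 2), round(r_full, 2))
```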

Page 92: designing a questionnaire

92

Internal Consistency

A potential solution to the problems with the split-half approach is to use a measure of internal consistency. Internal consistency considers the average correlation between all of the survey items, and the number of survey items, to provide an estimate of reliability.

Common measures of internal consistency are coefficient alpha (Cronbach's alpha) and the KR-20 or KR-21. The downside to alpha is that it is more difficult to calculate than the other methods. Luckily, however, many statistical programs will calculate it for you.

Item-to-total correlation measures the correlation of each of the items with the total scale. Items with a low correlation can be deleted.
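The sketch below (Python, invented ratings) computes Cronbach's alpha directly from its definition and the item-to-total correlations mentioned above; a statistics package would report the same quantities.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for a respondents-by-items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def item_total_correlations(items: np.ndarray) -> np.ndarray:
    """Correlation of each item with the total scale score."""
    total = items.sum(axis=1)
    return np.array([np.corrcoef(items[:, j], total)[0, 1]
                     for j in range(items.shape[1])])

# Invented 5-point ratings: 6 respondents x 4 items
data = np.array([[4, 5, 4, 4],
                 [3, 3, 2, 3],
                 [5, 5, 5, 4],
                 [2, 2, 3, 2],
                 [4, 4, 4, 5],
                 [3, 2, 3, 3]])

print(round(cronbach_alpha(data), 2))          # overall internal consistency
print(item_total_correlations(data).round(2))  # low values flag items to review
```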

Page 93: designing a questionnaire

93

KR-20 is for dichotomous items; KR-21 is for multiple choice items.

Formula for KR-20:

r = [K / (K - 1)] [1 - (Σpq) / s_x²]

Formula for KR-21:

r = [K s_x² - X̄(K - X̄)] / [s_x² (K - 1)]

where r is the reliability estimate, K is the number of items on the test, p is the proportion of the sample who got the item correct, q is the proportion of the sample who got the item wrong (1 - p), s_x² is the variance of scores in the sample, and X̄ is the mean score on the test.
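A sketch of KR-20 and KR-21 following the formulas above, in Python with an invented matrix of right/wrong item scores; sample variances (ddof=1) are assumed here.

```python
import numpy as np

def kr20(items: np.ndarray) -> float:
    """KR-20 for a respondents-by-items matrix of 1 (correct) / 0 (incorrect)."""
    k = items.shape[1]
    p = items.mean(axis=0)                     # proportion correct per item
    q = 1 - p                                  # proportion incorrect per item
    var_total = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / var_total)

def kr21(items: np.ndarray) -> float:
    """Simpler KR-21, which assumes all items are equally difficult."""
    k = items.shape[1]
    totals = items.sum(axis=1)
    mean, var = totals.mean(), totals.var(ddof=1)
    return (k * var - mean * (k - mean)) / (var * (k - 1))

# Invented data: 8 respondents x 5 dichotomous items
data = np.array([[1, 1, 1, 0, 1],
                 [1, 0, 1, 1, 0],
                 [0, 0, 1, 0, 0],
                 [1, 1, 1, 1, 1],
                 [0, 1, 0, 0, 1],
                 [1, 1, 0, 1, 1],
                 [0, 0, 0, 1, 0],
                 [1, 1, 1, 1, 1]])

print(round(kr20(data), 2), round(kr21(data), 2))
```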

Page 94: designing a questionnaire

94

Factors influencing reliability

Test-related factors: length, test content, homogeneity of items, and difficulty of items (items that are too easy or too difficult will reduce reliability).

Test-taker factors: heterogeneity (a more heterogeneous group gives more spread), attitude, and aptitude.

Administration factors: time limit and opportunity to cheat.

Page 95: designing a questionnaire

95

Raising reliability

Lengthen the instrument
Check the test items (clarity, reading level, format)
Aim for medium difficulty if it is an achievement test
Increase the time limit

Page 96: designing a questionnaire

96

Types of instrument and the corresponding reliability procedures:

Dichotomous (T/F) - KR-20
Multiple choice - KR-21/alpha
Check all that apply - Test/retest
Rank the items - Test/retest
Rate the items that will be summed together - Test/retest
Likert scale - Cronbach's alpha/KR-21
Demographic - None, or percent agreement
One instrument with several domains, each measured by multiple items - KR-21/Cronbach's alpha
Long instrument - Split half

Page 97: designing a questionnaire

97