SOCY3700 Selected Overheads Prof. Backman Fall 2007 Update history 10/05/07 Add slides 1-25 10/10/07 Add slides 26-33 10/14/07 Add slides 34-45.

SOCY3700 Selected Overheads

Prof. Backman

Fall 2007

Update history

10/05/07 Add slides 1-25

10/10/07 Add slides 26-33

10/14/07 Add slides 34-45

Measurement Validity

• Measurement validity is the extent to which a measure measures whatever it is intended to measure

• Three types of measurement validity

– Face validity – does the measure seem (“on its face”) like it measures what it’s supposed to (often tested by asking experts and others)

Measurement Validity, cont.

– Content validity - the extent to which the measure covers the full range of the concept

• The richer the concept (say, religiosity or feminism), the more likely that multiple indicators will be needed

– Criterion validity – the extent to which the measure is supported by other accepted measures

• Concurrent validity – how well the measure correlates with other measures of the concept

• Predictive validity – how well the measure correlates with other concepts it should be related to

Levels of Measurement

• Nominal – values identify categories only

– Do not have arithmetic meaning

– Also called categorical variables

– When there are only two categories, called dichotomies or binary variables

– Two technical requirements for categories:

• Exhaustive (every observation fits into some category)

– Leads to lots of “Others”

• Mutually exclusive (every observation fits in exactly one category)

Levels of Measurement, cont.

• Ordinal – same characteristics as nominal PLUS the fact that categories can be ranked from lower to higher

– Mathematical operation of subtraction makes no sense, but > and < do

– Most common: Likert

• Interval – same characteristics as ordinal PLUS the fact that the arithmetic difference between any two values makes sense

– That is, the usual subtraction operation makes the usual arithmetic sense

• Ratio – same characteristics as interval PLUS the fact that there is a sensible zero value

– Thus division and ratios make sense

Abbreviations often used for “Other” categories

• NA – no answer or not answered

• DK – don’t know

• NAP – not applicable. Often this means the question was not even asked

• nec or n.e.c.– not elsewhere classified. Typically in the category title, “Other, nec”

Ecological and Reductionist Fallacies

• Unit of analysis – level (individual or some kind of aggregate) addressed by your theory or hypothesis

• Unit of observation – level (individual or some kind of aggregate) from which data are collected

• Ecological fallacy – drawing conclusions about individuals based on data from aggregates

• Reductionist fallacy – drawing conclusions about aggregates based on data from individuals

Writing About Crosstabulations From a Sample

• Lead with what is important– What’s important?

• The fate of your hypotheses (if you have stated some)

• The overall pattern for the dependent variable, especially if it is striking or surprising. Then look at deviations from the pattern in the categories of your independent variable

• Big differences between categories of your independent variable

• Things of interest to your audience

– Remember, the usual point of a crosstab is to display differences between categories of the independent variable

Writing About Crosstabulations From a Sample, cont.

• Do not use raw counts; use percents

• Use the correct percents

– Do not confuse row, column, and total percents

• Be sure to specify the base for percents

– Usually something like, “… x percent of [the base] …” or “Of all [bases] surveyed, x percent responded…”

• Round percents in your text (but not necessarily in your tables) to integers

• Be ready to convert percents to simple fractions

– For example, 23 percent could be called “nearly a quarter” or “about one in four”
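The difference between row, column, and total percents can be checked with a short Python sketch. The table below is made up purely for illustration (it is not course data), and the helper names are hypothetical. It assumes the independent variable is in the columns, so column percents are the ones that compare categories of the independent variable.

```python
# Hypothetical crosstab of raw counts (made-up numbers, for illustration only).
# Rows: dependent variable. Columns: independent variable.
counts = {
    "Agree":    {"Women": 60, "Men": 30},
    "Disagree": {"Women": 40, "Men": 70},
}

def column_percents(table):
    """Percent of each column total: the base is the column category."""
    col_totals = {}
    for row in table.values():
        for col, n in row.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {r: {c: 100 * n / col_totals[c] for c, n in row.items()}
            for r, row in table.items()}

def row_percents(table):
    """Percent of each row total: the base is the row category."""
    return {r: {c: 100 * n / sum(row.values()) for c, n in row.items()}
            for r, row in table.items()}

def total_percents(table):
    """Percent of the grand total: the base is everyone in the table."""
    grand = sum(n for row in table.values() for n in row.values())
    return {r: {c: 100 * n / grand for c, n in row.items()}
            for r, row in table.items()}

# The same cell (women who agree) gives three different percents,
# one for each choice of base.
for label, func in [("column", column_percents),
                    ("row", row_percents),
                    ("total", total_percents)]:
    print(label, round(func(counts)["Agree"]["Women"]))
```

In text, each of these would need its base stated, for example “Of all women surveyed, 60 percent agreed.”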

Writing About Crosstabulations From a Sample, cont.

• Do not confuse percentage differences and percentage point differences

– Percentage differences cannot be calculated by simple subtraction

• Be ready to collapse categories

– For example, to combine “Strongly agree” and “Agree” responses into one category

• Be ready to calculate cumulative percents
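A minimal Python sketch of the three operations on this slide. The response distribution is made up for illustration, and the function names are hypothetical.

```python
from itertools import accumulate

def percentage_point_difference(p1, p2):
    """Simple subtraction of two percents."""
    return p2 - p1

def percentage_difference(p1, p2):
    """Change from p1 to p2, expressed as a percent of p1."""
    return 100 * (p2 - p1) / p1

# Going from 20 percent to 30 percent is a rise of 10 percentage points,
# but a 50 percent increase.
print(percentage_point_difference(20, 30), percentage_difference(20, 30))

# Hypothetical distribution of responses, in percents.
responses = {
    "Strongly agree": 15,
    "Agree": 35,
    "Neutral": 20,
    "Disagree": 20,
    "Strongly disagree": 10,
}

# Collapsing categories: combine "Strongly agree" and "Agree".
agree_collapsed = responses["Strongly agree"] + responses["Agree"]

# Cumulative percents: running total down the ordered categories.
cumulative = list(accumulate(responses.values()))
print(agree_collapsed, cumulative)
```

Cumulative percents only make sense when the categories are ordered, as they are here.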

Central Limit Theorem

If repeated random samples of size n are drawn from any population with mean μ and standard deviation δ, the sampling distribution of sample means will approach a normal distribution as n gets large, with mean μ and standard deviation δ/√n (also known as the standard error of the mean).

Hence, the standard deviation of the means drawn from many, many samples reflects 1) the standard deviation of the population, and 2) the sample size
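The theorem can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is an exponential distribution with mean 1 and standard deviation 1 (chosen because it is clearly not normal), and the sample size, number of samples, and seed are arbitrary.

```python
import math
import random
import statistics

def simulate_sample_means(n, num_samples, rng):
    """Draw num_samples random samples of size n and return their means."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

rng = random.Random(2007)   # fixed seed so the run is repeatable
n = 100                     # size of each sample
means = simulate_sample_means(n, 2000, rng)

# Standard deviation of the sample means, observed in the simulation.
observed_se = statistics.stdev(means)
# Value predicted by the theorem: population standard deviation / sqrt(n).
predicted_se = 1.0 / math.sqrt(n)

print("mean of sample means:", round(statistics.fmean(means), 3))
print("observed s.e.:", round(observed_se, 3), "predicted s.e.:", predicted_se)
```

The mean of the sample means should land close to 1, and the observed standard error should land close to the predicted 0.1, even though the population itself is strongly skewed.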

Probability Sampling

• Probability sampling is any method of drawing a sample of elements from a population such that the probability that any element or set of elements will be included in the sample is known and is not zero

• The chief advantage of probability sampling is that the accuracy (or lack thereof) of estimates of population parameters from the sample can be estimated

Simple Random Sampling (SRS)

• Frame – complete list of the survey population

• Sample size – calculated based on desired precision of results

• Selection rule – random selection without replacement

• Estimate of population mean is the sample mean

– Unbiased

– s.e. = √fpc * (δ / √(sample size))

Simple Random Sampling: Advantages and Disadvantages

• SRS advantages

– Samples are easy to draw

– Samples are easy to use

– Estimation of errors is “easy”

• SRS disadvantages

– Not always the lowest standard error method

– Requires complete roster

– Can be very expensive

• Completing the frame may be expensive

• Reaching geographically dispersed respondents may be expensive

– May require large sample sizes to deal with rare population elements

• Most elements in the sample will not be rare

Finite Populations and Sampling

• Sampling error estimation depends on the Central Limit Theorem

• The Central Limit Theorem applies to infinite populations

– Infinite populations are easy to do in theory, but rare in practice

• If you sample everyone in a finite population, the sampling error would be 0

– The closer you get to sampling everyone, the smaller your error should be

– But the Central Limit Theorem says error is proportional to δ/√n, which gets smaller as n grows but never reaches 0

Finite Populations and Sampling, cont.

• The finite population correction factor (fpc) takes into account the reduction in error you should get from sampling all or a large fraction of a finite population

• The fraction of the population that is in the sample, n/N, is called the sampling ratio (f)

• fpc = (N-n)/(N-1) ≈ (N-n)/N = (1 – f)

• The standard error of the mean from a finite population (with simple random sampling) is √fpc * (δ/√n)

• In practice, we ignore the fpc when the sampling ratio is less than 10%
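The formulas on this slide can be written out as a short Python sketch. The function names and the example numbers are hypothetical; `delta` stands for the population standard deviation δ.

```python
import math

def fpc(N, n):
    """Finite population correction factor, (N - n) / (N - 1)."""
    return (N - n) / (N - 1)

def standard_error(delta, n, N=None):
    """Standard error of the mean under simple random sampling.

    With N omitted, the population is treated as infinite and the
    result is delta / sqrt(n). With N given, the fpc is applied.
    """
    se = delta / math.sqrt(n)
    if N is not None:
        se *= math.sqrt(fpc(N, n))
    return se

# Example: delta = 10, sample of 100 from a population of 1,000 (f = 0.10).
print(standard_error(10, 100))           # no correction
print(standard_error(10, 100, N=1000))   # with the fpc, slightly smaller
print(standard_error(10, 1000, N=1000))  # sampling everyone: error is 0
```

With a sampling ratio of 10 percent, the fpc is about 0.9, so the corrected standard error is only about 5 percent smaller than the uncorrected one. That is why the correction is usually ignored for smaller sampling ratios.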

Stratified Sampling

• Frame

– Usual SRS frame except broken into exhaustive, mutually exclusive groups

– Requires knowledge ahead of time about how many elements in the population there are in each group

– Each group is a stratum (plural strata)

• Sample size – calculated based on desired precision of results

– Calculations more complex than with SRS because there are more alternatives

Stratified Sampling (2)

• Selection rules

– Cases are drawn from each stratum

– Cases within strata are drawn by SRS

– Two alternatives for number drawn from each stratum

• Proportionate to size – every element in the population has an equal chance of being drawn into the sample, regardless of stratum

• Disproportionate – some strata will have a larger proportion of the sample than they will of the population

Stratified Sampling (3)

• Estimation of the mean

– If proportionate to size selection is used, the sample mean is an unbiased estimate of the population mean

– If disproportionate selection is used, weights must be used to obtain an unbiased estimate of the population mean

– Standard error of the mean will ordinarily be lower than the standard error from a simple random sample of the same size

– The more homogeneous the elements are within strata, the more efficient stratified sampling will be
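Why weights matter under disproportionate selection can be seen in a small Python sketch. The strata, their population sizes, and the sampled values are all made up for illustration. Each stratum's sample mean is weighted by that stratum's share of the population.

```python
import statistics

# Hypothetical strata: population size and the values sampled from each.
# Stratum B is 10 percent of the population but half of the sample,
# so selection is disproportionate.
strata = {
    "A": {"population_size": 900, "sample": [10, 12, 14]},
    "B": {"population_size": 100, "sample": [50, 52, 54]},
}

def unweighted_mean(strata):
    """Plain mean of all sampled values, ignoring stratum sizes."""
    values = [v for s in strata.values() for v in s["sample"]]
    return statistics.mean(values)

def weighted_mean(strata):
    """Stratum sample means weighted by stratum population sizes."""
    total_population = sum(s["population_size"] for s in strata.values())
    weighted_sum = sum(
        s["population_size"] * statistics.mean(s["sample"])
        for s in strata.values()
    )
    return weighted_sum / total_population

print(unweighted_mean(strata))  # pulled toward the oversampled stratum B
print(weighted_mean(strata))    # each stratum counts by its population share
```

Here the unweighted mean is 32 while the weighted mean is 16, because the oversampled stratum B has much higher values than stratum A.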

Stratified Sampling: Advantages and Disadvantages

(compared with Simple Random Sampling)

• Advantages

– Reduced standard errors of estimate over SRS

– Can thus get the same precision as SRS with smaller sample size

– If proportionate selection is used, unweighted sample statistics can be used to estimate population parameters

– Disproportionate selection can be used to get sufficient numbers of members of rare populations

• Disadvantages

– Requires advance knowledge about stratum sizes

– Disproportionate selection requires use of weights in making estimates of parameters

Cluster Sampling

• Most complex method. Often used in conjunction with stratification and SRS; this is called multi-stage sampling

• Frame

– Broken into groups called clusters

– Complete frame is needed only for clusters that are selected

• It is necessary to know the size of clusters that are not selected

• Sample size – usually calculated based on explicit tradeoff between costs and precision of results

– Calculations more complex than with SRS or stratification because there are more alternatives

Cluster Sampling (2)

• Selection rules

– A sample of the clusters is drawn by simple random sampling

– Within each cluster either all the elements or a simple random sample of the elements are drawn

– When possible, sample sizes within clusters are drawn proportionate to size

– NOTE that in cluster sampling only some of the clusters are used, while in stratified sampling, all of the strata are

Cluster Sampling (3)

• Estimation of the mean

– If clusters and elements within clusters were drawn so that all elements in the population had equal probabilities of selection, the sample mean is an unbiased estimate of the population mean. This rarely is possible

– In the likely case of unequal probabilities of selection, weights must be used to obtain an unbiased estimate of the population mean

– Standard error of the mean will ordinarily be higher than the standard error from a simple random sample of the same size

– The more heterogeneous the elements are within clusters, the more efficient cluster sampling will be

• To the extent possible, each cluster should be representative of the entire population
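The selection rules and the use of weights can be sketched in Python as follows. This is an illustration under assumptions, not a production sampling design: clusters are drawn by SRS, a fixed number of elements is then drawn by SRS within each selected cluster, and the estimate is a weighted mean (a ratio-type estimator, one common way to apply weights). The cluster names, values, and function names are hypothetical.

```python
import random

def two_stage_sample(clusters, m, k, rng):
    """Draw m clusters by SRS, then up to k elements per cluster by SRS.

    Returns (value, weight) pairs, where each weight is
    1 / (probability that the element was selected).
    """
    chosen = rng.sample(sorted(clusters), m)
    sample = []
    for name in chosen:
        elements = clusters[name]
        take = min(k, len(elements))
        # P(cluster selected) * P(element selected within its cluster)
        prob = (m / len(clusters)) * (take / len(elements))
        for value in rng.sample(elements, take):
            sample.append((value, 1 / prob))
    return sample

def weighted_mean(sample):
    """Weighted mean of the sampled values."""
    total_weight = sum(w for _, w in sample)
    return sum(v * w for v, w in sample) / total_weight

# Hypothetical population of three clusters of unequal size.
clusters = {
    "a": [1, 2, 3, 4],
    "b": [10, 20],
    "c": [5, 5, 5, 5, 5, 5],
}

rng = random.Random(0)
sample = two_stage_sample(clusters, m=2, k=2, rng=rng)
print(weighted_mean(sample))
```

Because the clusters differ in size, taking the same number of elements from each gives elements in small clusters a higher chance of selection, which is why the weights are needed.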

Cluster Sampling: Advantages and Disadvantages

(compared with Simple Random Sampling)

• Advantages

– Cost control

• In general, the only reason to use clustering is to reduce financial or time costs

– Can be used with stratification of clusters to help control standard errors

– If proportionate selection is used, unweighted sample statistics can be used to estimate population parameters

• Disadvantages

– Sampling consultant probably needed

– Larger standard errors than with SRS

– Parameter and error estimation usually requires use of weights

Sample Pathologies

• Biggest, most common problem: non-response

– Estimation of parameters and errors assumes that data were collected from every element in the sample

• Limitations on generalizability due to mismatch between the population of interest (target population) and the frame (survey population)

– Called coverage error

Source: Patricia Salant and Don A. Dillman. 1994. How to Conduct Your Own Survey. NY: Wiley

Surveys à la Dillman: Eight Steps

1. Decide what information you need

2. Choose a survey method

3. Draw a sample

4. Write questions

5. Design the questionnaire

6. Field the survey

7. Turn answers into usable data

8. Report results

Dillman on the Survey Process

• Dillman analyzes the survey process from an exchange theory perspective

– There is an exchange between the researcher and the respondent

– Compliance with researcher’s request for information is a function of the social rewards the researcher can offer the respondent

• Rewards such as gratitude, opportunity to have a say on something important

Writing Survey Questions

• Question topics

– There is little you can’t ask about

– Useful distinction:

• Questions about subjective states like attitudes, beliefs, and knowledge

• Questions about objective phenomena like behavior or demographic attributes

– Always remembering that in a questionnaire even objective phenomena are filtered through the respondent’s mind

Pp. 177ff in W.L. Neuman. 2007. Basics of Social Research. 2nd ed. Boston: Pearson

Writing Survey Questions (2): Question Form

• Two basic question forms: open-ended and closed-ended

• Open-ended questions are questions to which respondents can give any answer

• Closed-ended questions both ask a question and provide the respondent with preset answers to the question to choose among

Writing Survey Questions (3): Closed-ended Questions

• Questions with ordered categories

– E.g., Likert scale items

– When there is an order, be sure to use it

• Questions with unordered categories

• Partially closed-ended

– One option is something like “Other (please specify) ____”

Pp. 170-3 in W.L. Neuman. 2007. Basics of Social Research. 2nd ed. Boston: Pearson

Writing Survey Questions: Neuman’s Dirty Dozen Don’ts

1. Avoid jargon, slang, and abbreviations

2. Avoid ambiguity, confusion, and vagueness

a. Whatever

3. Avoid emotional language

a. Can evoke frames that effectively hijack the intent of the question

4. Avoid prestige bias



Dirty Dozen Don’ts (2)

5. Avoid double-barreled questions

6. Do not confuse beliefs with reality

7. Avoid leading questions

8. Avoid asking questions that are beyond respondents’ capabilities



Dirty Dozen Don’ts (3)

9. Avoid false premises

10. Avoid asking about intentions in the distant future

11. Avoid double negatives

12. Avoid overlapping or unbalanced response categories

Questionnaire Layout (1)

• Very important

– Reflects your professionalism in the eyes or ears of your respondents and the eyes of your interviewers

– Affects the likelihood of measurement error through respondent or interviewer error

– Affects response rate

• In mail surveys, designed primarily with the respondent in mind

• In telephone and face-to-face surveys, designed with both interviewer and respondent in mind

Questionnaire Layout (2): Mail Surveys

• Overall objectives

– Minimize perceived (and real) respondent burden

– Don’t confuse respondent

– Simplify later data entry

• Make a booklet

– Questions are enclosed inside a booklet made of folded legal sized (8.5 x 14 inch) paper

– No questions on the front or back of the booklet


• Front page of booklet:

– Title of study

– Some graphic stuff

– Sponsor

– Return address

• Back page

– Request for comments

– Thank you

– Return address and telephone contact information


• Overall question sequence

– Start easy

• First question must grab attention, reflect the issues in the cover letter, and not be too difficult or threatening

– Start on topic

– Group like questions together

• Makes writing transitions easier

– Keep threatening questions until later in the questionnaire

– Get your demographics last

• That’s probably least important to you and apparently least relevant to respondent


• Layout of individual pages

– Use white space

• What counts is not how many pages the survey is, but rather how long it seems to be to respondents

– Use fonts consistently to distinguish questions, answers, and instructions

• Dillman likes to use bold for questions, all caps for answers, unbolded for transitions, and unbolded in parentheses for instructions

– Establish a vertical flow

– Precode the answers, usually on the left margin

Source: Salant and Dillman

Fielding Mail Surveys (1)

Overview

1. We’re always trying to increase response rates

2. Respondents are most likely to respond if they think benefits outweigh their costs

3. We need to keep respondents engaged from the opening of the mail through the returning of the completed questionnaire



Bottom lines

1. Mail survey response rates depend very much on the number of contacts

2. Mail surveys require advance planning

- Be sure you have the resources to meet the schedule

3. What really matters is the overall look and feel of the questionnaire

- It’s a lot like buying (or selling!) a car


• First mailout – advance notice letter

– Sent to the entire sample

– Mailed first class

– Handwritten signature

– Explains why there will be a survey

– Explains why participation will be appreciated

• Put yourself on the mailing list for this and all other mailings



• Second mailout – cover letter, questionnaire, and return envelope

– Sent one week after advance notice

– Cover letter

• Personalized

• Explains survey purpose

• Explains ID# on the questionnaire and promises confidentiality

• Reinforces importance of everyone’s participation

• Specifies who should complete the questionnaire

• Thanks respondent for participation

• Hand signed


• Questionnaire – with ID number

• Return envelope is stamped, addressed, and ready for use


• Third mailout – postcard followup

– 4 to 8 days later

– Personalized

– Reminding and thanking

• Fourth mailout – new cover letter, questionnaire, and return envelope

– Three weeks after the second mailout (the first one with a copy of the questionnaire)

– Sent only to addresses that have not yet returned the survey
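Since the schedule has to be planned ahead of time, the four mailing dates can be worked out with a short Python sketch. The function name and the example start date are hypothetical. It assumes that the postcard's “4 to 8 days later” is counted from the second mailout, which the slides do not state outright.

```python
from datetime import date, timedelta

def mailing_schedule(first_mailout, postcard_delay_days=7):
    """Return the dates of the four mailouts.

    Assumption: the postcard delay (4 to 8 days) is counted from the
    second mailout.
    """
    if not 4 <= postcard_delay_days <= 8:
        raise ValueError("postcard should go out 4 to 8 days later")
    second = first_mailout + timedelta(days=7)      # one week after notice
    third = second + timedelta(days=postcard_delay_days)
    fourth = second + timedelta(weeks=3)            # three weeks after second
    return {
        "advance notice letter": first_mailout,
        "questionnaire packet": second,
        "postcard followup": third,
        "replacement packet": fourth,
    }

# Example with an arbitrary start date.
for step, when in mailing_schedule(date(2007, 10, 1)).items():
    print(when.isoformat(), step)
```

On this reading, the whole sequence takes four weeks from the advance notice letter to the final mailout.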


• The four mailings should yield a final response rate of 50 – 60 percent

• To further increase response rate, one can:

– Send another follow up like the fourth mailing

– Send the follow up as certified or express mail

– Telephone

• Often you will discover that people shouldn’t have been in the sample in the first place

Sampling Review

• Rule of thumb sampling error of a proportion at the 95 percent confidence level = 1 / square root (sample size)

– If size = 400, error = 1/20 = 5%

• The Central Limit Theorem is important for social science research because it provides the mathematical basis for using probability samples 1) to make estimates of parameters from large populations using small samples and 2) to estimate the precision of those estimates
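The rule of thumb can be compared with the usual normal-approximation formula for the sampling error of a proportion, z·√(p(1 − p)/n), in a short Python sketch. The function names are hypothetical. The comparison uses p = 0.5, the proportion with the largest sampling error, and z = 1.96 for the 95 percent confidence level; the fpc is ignored.

```python
import math

def rule_of_thumb_error(n):
    """Rule of thumb: 1 / sqrt(sample size)."""
    return 1 / math.sqrt(n)

def margin_of_error(n, p=0.5, z=1.96):
    """Normal-approximation sampling error of a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    print(n, round(rule_of_thumb_error(n), 4), round(margin_of_error(n), 4))
```

The rule works because 1.96 × √(0.5 × 0.5) = 0.98, which is close to 1. It slightly overstates the error, and overstates it more when p is far from 0.5. Quadrupling the sample size only halves the error.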

Sampling Review (2)

• In both stratified and cluster sampling the survey population is divided into exhaustive, mutually exclusive groups. Each group could be either a stratum or a cluster

• If we use all the groups in our final sample, we call each group a stratum

• If we use only some of the groups in our final sample, we call each group a cluster
