Dec 26, 2015
SOCY3700 Selected Overheads
Prof. Backman
Fall 2007
Update history
10/05/07 Add slides 1-25
10/10/07 Add slides 26-33
10/14/07 Add slides 34-45
Measurement Validity
• Measurement validity is the extent to which a measure measures whatever it is intended to measure
• Three types of measurement validity
– Face validity – does the measure seem (“on its face”) to measure what it’s supposed to? (often tested by asking experts and others)
Measurement Validity, cont.
– Content validity - the extent to which the measure covers the full range of the concept
• The richer the concept (say, religiosity or feminism), the more likely that multiple indicators will be needed
– Criterion validity – the extent to which the measure is supported by other accepted measures
• Concurrent validity – how well the measure correlates with other measures of the concept
• Predictive validity – how well the measure correlates with other concepts it should be related to
Levels of Measurement
• Nominal – values identify categories only
– Do not have arithmetic meaning
– Also called categorical variables
– When there are only two categories, called dichotomies or binary variables
– Two technical requirements for categories:
• Exhaustive (every observation fits into some category)
– Leads to lots of “Others”
• Mutually exclusive (every observation fits in exactly one category)
Levels of Measurement, cont.
• Ordinal – same characteristics as nominal PLUS the fact that categories can be ranked from lower to higher
– Mathematical operation of subtraction makes no sense, but > and < do
– Most common: Likert
• Interval – same characteristics as ordinal PLUS the fact that the arithmetic difference between any two values makes sense
– That is, the usual subtraction operation makes the usual arithmetic sense
• Ratio – same characteristics as interval PLUS the fact that there is a sensible zero value
– Thus division and ratios make sense
Abbreviations often used for “Other” categories
• NA – no answer or not answered
• DK – don’t know
• NAP – not applicable. Often this means the question was not even asked
• nec or n.e.c. – not elsewhere classified. Typically in the category title, “Other, nec”
Ecological and Reductionist Fallacies
• Unit of analysis – level (individual or some kind of aggregate) addressed by your theory or hypothesis
• Unit of observation – level (individual or some kind of aggregate) from which data are collected
• Ecological fallacy – drawing conclusions about individuals based on data from aggregates
• Reductionist fallacy – drawing conclusions about aggregates based on data from individuals
Writing About Crosstabulations From a Sample
• Lead with what is important
– What’s important?
• The fate of your hypotheses (if you have stated some)
• The overall pattern for the dependent variable, especially if it is striking or surprising. Then look at deviations from the pattern in the categories of your independent variable
• Big differences between categories of your independent variable
• Things of interest to your audience
– Remember, the usual point of a crosstab is to display differences between categories of the independent variable
Writing About Crosstabulations From a Sample, cont.
• Do not use raw counts; use percents
• Use the correct percents
– Do not confuse row, column, and total percents
• Be sure to specify the base for percents
– Usually something like, “… x percent of [the base] …” or “Of all [bases] surveyed, x percent responded…”
• Round percents in your text (but not necessarily in your tables) to integers
• Be ready to convert percents to simple fractions
– For example, 23 percent could be called “nearly a quarter” or “about one in four”
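As a hedged illustration of the percent rules on this slide, here is a minimal Python sketch that turns a small crosstab of raw counts into column percents. The counts, the labels, and the function name `column_percents` are hypothetical. It assumes the independent variable defines the columns, so each column total is the base.

```python
# Hypothetical counts: rows = response (dependent), columns = group (independent)
counts = {
    "Agree":    {"Women": 120, "Men": 90},
    "Disagree": {"Women": 80,  "Men": 110},
}

def column_percents(table):
    """Return percents that use each column total as the base."""
    columns = list(next(iter(table.values())).keys())
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: 100.0 * row[c] / totals[c] for c in columns}
        for r, row in table.items()
    }

pct = column_percents(counts)
for response, row in pct.items():
    # Round to integers for the text, as the slide recommends
    print(response, {group: round(p) for group, p in row.items()})
```

With these made-up counts one would write something like "Of all women surveyed, 60 percent agreed, compared with 45 percent of men," which names the base for each percent.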
Writing About Crosstabulations From a Sample, cont.
• Do not confuse percentage differences and percentage point differences
– Percentage differences cannot be calculated by simple subtraction
• Be ready to collapse categories
– For example, to combine “Strongly agree” and “Agree” responses into one category
• Be ready to calculate cumulative percents
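The three operations on this slide can be sketched in a few lines of Python. All numbers and variable names below are hypothetical, and the percentage difference is taken relative to the second group, which is one common convention rather than the only one.

```python
# Hypothetical percent distribution for one group on a Likert item
dist = {
    "Strongly agree": 20,
    "Agree": 30,
    "Neutral": 10,
    "Disagree": 25,
    "Strongly disagree": 15,
}

# Collapse "Strongly agree" and "Agree" into one category
agree = dist["Strongly agree"] + dist["Agree"]

# Cumulative percents, in the order the categories are listed
cumulative = []
running = 0
for label, p in dist.items():
    running += p
    cumulative.append((label, running))

# Percentage point difference vs. percentage difference
group_a, group_b = 60, 45                            # hypothetical percents agreeing
point_diff = group_a - group_b                       # simple subtraction: percentage points
percent_diff = 100 * (group_a - group_b) / group_b   # relative to group_b

print(agree, cumulative)
print(point_diff, round(percent_diff))
```

Here the two groups differ by 15 percentage points, but group A's figure is about 33 percent higher than group B's. The two statements are not interchangeable.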
Central Limit Theorem
If repeated random samples of size n are drawn from any population with mean μ and standard deviation δ, the sampling distribution of sample means will approach a normal distribution as n gets large, with mean μ and standard deviation δ/√n (also known as the standard error of the mean).
Hence, the standard deviation of the means drawn from many, many samples reflects 1) the standard deviation of the population, and 2) the sample size
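A small simulation can make the theorem concrete. The sketch below is illustrative only: it assumes a skewed (exponential) population with mean 1 and standard deviation 1, draws many samples of size n, and compares the standard deviation of the sample means with δ/√n. The sample size, the number of samples, and the seed are arbitrary choices.

```python
import random
import statistics

random.seed(42)      # fixed seed so the run is repeatable

n = 50               # size of each sample
num_samples = 2000   # number of repeated samples

# Skewed population: exponential with mean 1 and standard deviation 1
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

observed_mean = statistics.fmean(sample_means)
observed_se = statistics.stdev(sample_means)
expected_se = 1.0 / n ** 0.5   # population sd divided by sqrt(n)

print(f"mean of sample means: {observed_mean:.3f} (population mean 1.0)")
print(f"sd of sample means:   {observed_se:.3f} (expected {expected_se:.3f})")
```

Even though the population is far from normal, the sample means cluster around the population mean with a spread close to the standard error.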
Probability Sampling
• Probability sampling is any method of drawing a sample of elements from a population such that the probability that any element or set of elements will be included in the sample is known and is not zero
• The chief advantage of probability sampling is that the accuracy (or lack thereof) of estimates of population parameters from the sample can be estimated
Simple Random Sampling (SRS)
• Frame – complete list of the survey population
• Sample size – calculated based on desired precision of results
• Selection rule – random selection without replacement
• Estimate of population mean is the sample mean
– Unbiased
– s.e. = √fpc * (δ / √sample size)
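The selection rule can be sketched with Python's standard library, where `random.sample` draws without replacement. The frame below is a made-up list of 1,000 elements with simulated values. In practice the frame would be the real roster of the survey population.

```python
import random
import statistics

random.seed(7)   # fixed seed so the run is repeatable

# Hypothetical frame: (id, value) for every element in the survey population
frame = [(i, random.gauss(50, 10)) for i in range(1, 1001)]

n = 100
sample = random.sample(frame, n)   # random selection without replacement

# The sample mean is the estimate of the population mean
sample_mean = statistics.fmean([value for _, value in sample])
print(round(sample_mean, 1))
```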
Simple Random Sampling: Advantages and Disadvantages
• SRS advantages
– Samples are easy to draw
– Samples are easy to use
– Estimation of errors is “easy”
• SRS disadvantages
– Not always the lowest standard error method
– Requires complete roster
– Can be very expensive
• Completing the frame may be expensive
• Reaching geographically dispersed respondents may be expensive
– May require large sample sizes to deal with rare population elements
• Most elements in the sample will not be rare
Finite Populations and Sampling
• Sampling error estimation depends on the Central Limit Theorem
• The Central Limit Theorem applies to infinite populations
– Infinite populations are easy to do in theory, but rare in practice
• If you sample everyone in a finite population, the sampling error would be 0
– The closer you get to sampling everyone, the smaller your error should be
– Central Limit Theorem says error is proportional to δ/√n
Finite Populations and Sampling, cont.
• The finite population correction factor (fpc) takes into account the reduction in error you should get from sampling all or a large fraction of a finite population
• The fraction of the population that is in the sample, n/N, is called the sampling ratio (f)
• fpc = (N-n)/(N-1) ≈ (N-n)/N = (1 – f)
• The standard error of the mean from a finite population (with simple random sampling) is √fpc * (δ/√n)
• In practice, we ignore the fpc when the sampling ratio is less than 10%
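The formulas on this slide can be checked with a short calculation. This is a sketch under assumed values: the function name `standard_error` and the numbers are hypothetical, and the population standard deviation is treated as known.

```python
import math

def standard_error(sd, n, N=None):
    """Standard error of the mean under SRS; applies the fpc when N is given."""
    se = sd / math.sqrt(n)
    if N is not None:
        fpc = (N - n) / (N - 1)
        se *= math.sqrt(fpc)
    return se

sd = 10.0
print(standard_error(sd, n=100))              # no correction
print(standard_error(sd, n=100, N=500))       # sampling ratio 20%: the fpc matters
print(standard_error(sd, n=100, N=100_000))   # sampling ratio 0.1%: the fpc is negligible
print(standard_error(sd, n=500, N=500))       # everyone sampled: error is 0
```

The last two calls show both ends of the argument: with a tiny sampling ratio the correction barely changes the result, and with a complete census the sampling error vanishes.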
Stratified Sampling
• Frame
– Usual SRS frame except broken into exhaustive, mutually exclusive groups
– Requires knowledge ahead of time about how many elements in the population there are in each group
– Each group is a stratum (plural strata)
• Sample size - calculated based on desired precision of results
– Calculations more complex than with SRS because there are more alternatives
Stratified Sampling (2)
• Selection rules
– Cases are drawn from each stratum
– Cases within strata are drawn by SRS
– Two alternatives for number drawn within each stratum
• Proportionate to size – every element in the population has an equal chance of being drawn into the sample, regardless of stratum
• Disproportionate – some strata will have a larger proportion of the sample than they will of the population
Stratified Sampling (3)
• Estimation of the mean
– If proportionate to size selection is used, the sample mean is an unbiased estimate of the population mean
– If disproportionate selection is used, weights must be used to obtain an unbiased estimate of the population mean
– Standard error of the mean will ordinarily be lower than the standard error from a simple random sample of the same size
– The more homogeneous the elements are within strata, the more efficient stratified sampling will be
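To illustrate the two selection alternatives and the role of weights, here is a minimal sketch with invented strata. The stratum names, sizes, and means are hypothetical, and `proportionate_allocation` is an illustrative helper, not a standard library function.

```python
# Hypothetical population with three strata of known size
stratum_sizes = {"urban": 6000, "suburban": 3000, "rural": 1000}
N = sum(stratum_sizes.values())

def proportionate_allocation(sizes, n):
    """Split a total sample of n across strata in proportion to stratum size.

    Rounding can make the parts sum to slightly more or less than n.
    """
    total = sum(sizes.values())
    return {h: round(n * size / total) for h, size in sizes.items()}

allocation = proportionate_allocation(stratum_sizes, 500)
print(allocation)

# Disproportionate design: suppose equal numbers were drawn from each stratum
# and these (hypothetical) stratum sample means were observed
stratum_means = {"urban": 40.0, "suburban": 50.0, "rural": 70.0}

# Unweighted average of the stratum means over-represents the small stratum
unweighted_mean = sum(stratum_means.values()) / len(stratum_means)

# Weight each stratum mean by its share of the population
weighted_mean = sum(stratum_sizes[h] / N * stratum_means[h] for h in stratum_sizes)

print(round(unweighted_mean, 1), round(weighted_mean, 1))
```

The gap between the two means shows why weights are needed when selection is disproportionate.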
Stratified Sampling: Advantages and Disadvantages
(compared with Simple Random Sampling)
• Advantages
– Reduced standard errors of estimate over SRS
– Can thus get the same precision as SRS with smaller sample size
– If proportionate selection is used, unweighted sample statistics can be used to estimate population parameters
– Disproportionate selection can be used to get sufficient numbers of members of rare populations
• Disadvantages
– Requires advance knowledge about stratum sizes
– Disproportionate selection requires use of weights in making estimates of parameters
Cluster Sampling
• Most complex method. Often used in conjunction with stratification and SRS; this is called multi-stage sampling
• Frame
– Broken into groups called clusters
– Complete frame is needed only for clusters that are selected
• It is necessary to know the size of clusters that are not selected
• Sample size – usually calculated based on explicit tradeoff between costs and precision of results
– Calculations more complex than with SRS or stratification because there are more alternatives
Cluster Sampling (2)
• Selection rules
– A sample of the clusters is drawn by simple random sampling
– Within each cluster either all the elements or a simple random sample of the elements are drawn
– When possible, sample sizes within clusters are drawn proportionate to size
– NOTE that in cluster sampling only some of the clusters are used, while in stratified sampling, all of the strata are
Cluster Sampling (3)
• Estimation of the mean
– If clusters and elements within clusters were drawn so that all elements in the population had equal probabilities of selection, the sample mean is an unbiased estimate of the population mean. This rarely is possible
– In the likely case of unequal probabilities of selection, weights must be used to obtain an unbiased estimate of the population mean
– Standard error of the mean will ordinarily be higher than the standard error from a simple random sample of the same size
– The more heterogeneous the elements are within clusters, the more efficient cluster sampling will be
• To the extent possible, each cluster should be representative of the entire population
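The sketch below shows why unequal probabilities of selection arise so easily in cluster sampling. It assumes a simple two-stage design in which clusters are drawn by SRS and a fixed number of elements is then drawn by SRS within each selected cluster. The design figures and the function name are hypothetical.

```python
def selection_probability(m, M, n_i, N_i):
    """Chance that an element is selected in a two-stage design.

    m of M clusters are drawn, then n_i of the N_i elements in the cluster.
    """
    return (m / M) * (n_i / N_i)

# Hypothetical design: 10 of 100 clusters, then 20 elements per selected cluster
selected_cluster_sizes = {"A": 200, "B": 50}

weights = {}
for name, N_i in selected_cluster_sizes.items():
    p = selection_probability(m=10, M=100, n_i=20, N_i=N_i)
    weights[name] = 1 / p   # weight is the inverse of the selection probability
    print(name, "probability:", round(p, 4), "weight:", round(weights[name], 1))
```

An element in the large cluster A has a smaller chance of selection than an element in the small cluster B, so each sampled element from A must count for more when estimating the population mean.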
Cluster Sampling: Advantages and Disadvantages
(compared with Simple Random Sampling)
• Advantages
– Cost control
• In general, the only reason to use clustering is to reduce financial or time costs
– Can be used with stratification of clusters to help control standard errors
– If proportionate selection is used, unweighted sample statistics can be used to estimate population parameters
• Disadvantages
– Sampling consultant probably needed
– Larger standard errors than with SRS
– Parameter and error estimation usually requires use of weights
Sample Pathologies
• Biggest, most common problem: non-response
– Estimation of parameters and errors assumes that data were collected from every element in the sample
• Limitations on generalizability due to mismatch between the population of interest (target population) and the frame (survey population)
– Called coverage error
Source: Patricia Salant and Don A. Dillman. 1994. How to Conduct Your Own Survey. NY: Wiley
Surveys à la Dillman: Eight Steps
1. Decide what information you need
2. Choose a survey method
3. Draw a sample
4. Write questions
5. Design the questionnaire
6. Field the survey
7. Turn answers into usable data
8. Report results
Dillman on the Survey Process
• Dillman analyzes the survey process from an exchange theory perspective
– There is an exchange between the researcher and the respondent
– Compliance with researcher’s request for information is a function of the social rewards the researcher can offer the respondent
• Rewards such as gratitude, opportunity to have a say on something important
Writing Survey Questions
• Question topics
– There is little you can’t ask about
– Useful distinction:
• Questions about subjective states like attitudes, beliefs, and knowledge
• Questions about objective phenomena like behavior or demographic attributes
– Always remembering that in a questionnaire even objective phenomena are filtered through the respondent’s mind
Pp. 177ff in W.L. Neuman. 2007. Basics of Social Research. 2nd ed. Boston: Pearson
Writing Survey Questions (2):
Question Form
• Two basic question forms: open-ended and closed-ended
• Open-ended questions are questions to which respondents can give any answer
• Closed-ended questions both ask a question and provide the respondent with preset answers to the question to choose among
Writing Survey Questions (3):
Closed-ended Questions
• Questions with ordered categories
– E.g., Likert scale items
– When there is an order, be sure to use it
• Questions with unordered categories
• Partially closed-ended
– One option is something like “Other (please specify) ____”
Pp. 170-3 in W.L. Neuman. 2007. Basics of Social Research. 2nd ed. Boston: Pearson
Writing Survey Questions: Neuman’s
Dirty Dozen Don’ts
1. Avoid jargon, slang, and abbreviations
2. Avoid ambiguity, confusion, and vagueness
a. Whatever
3. Avoid emotional language
a. Can evoke frames that effectively hijack the intent of the question
4. Avoid prestige bias
Pp. 170-3 in W.L. Neuman. 2007. Basics of Social Research. 2nd ed. Boston: Pearson
Writing Survey Questions: Neuman’s
Dirty Dozen Don’ts (2)
5. Avoid double-barreled questions
6. Do not confuse beliefs with reality
7. Avoid leading questions
8. Avoid asking questions that are beyond respondents’ capabilities
Pp. 170-3 in W.L. Neuman. 2007. Basics of Social Research. 2nd ed. Boston: Pearson
Writing Survey Questions: Neuman’s
Dirty Dozen Don’ts (3)
9. Avoid false premises
10. Avoid asking about intentions in the distant future
11. Avoid double negatives
12. Avoid overlapping or unbalanced response categories
Questionnaire Layout (1)
• Very important
– Reflects your professionalism in the eyes or ears of your respondents and the eyes of your interviewers
– Affects the likelihood of measurement error through respondent or interviewer error
– Affects response rate
• In mail surveys, designed primarily with respondent in mind
• In telephone and face-to-face surveys, designed with both interviewer and respondent in mind
Questionnaire Layout (2): Mail Surveys
• Overall objectives
– Minimize perceived (and real) respondent burden
– Don’t confuse respondent
– Simplify later data entry
• Make a booklet
– Questions are enclosed inside a booklet made of folded legal sized (8.5 x 14 inch) paper
– No questions on the front or back of the booklet
Questionnaire Layout (3): Mail Surveys
• Front page of booklet:
– Title of study
– Some graphic stuff
– Sponsor
– Return address
• Back page
– Request for comments
– Thank you
– Return address and telephone contact information
Questionnaire Layout (4): Mail Surveys
• Overall question sequence
– Start easy
• First question must grab attention, reflect the issues in the cover letter, and not be too difficult or threatening
– Start on topic
– Group like questions together
• Makes writing transitions easier
– Keep threatening questions until later in the questionnaire
– Get your demographics last
• That’s probably least important to you and apparently least relevant to respondent
Questionnaire Layout (5): Mail Surveys
• Layout of individual pages
– Use white space
• What counts is not how many pages the survey is, but rather how long it seems to be to respondents
– Use fonts consistently to distinguish questions, answers, and instructions
• Dillman likes to use bold for questions, all caps for answers, unbolded for transitions, and unbolded in parentheses for instructions
– Establish a vertical flow
– Precode the answers, usually on the left margin
Source: Salant and Dillman
Fielding Mail Surveys (1)
Overview
1. We’re always trying to increase response rates
2. Respondents are most likely to respond if they think benefits outweigh their costs
3. We need to keep respondents engaged from the opening of the mail through the returning of the completed questionnaire
Source: Salant and Dillman
Fielding Mail Surveys (2)
Bottom lines
1. Mail survey response rates depend very much on the number of contacts
2. Mail surveys require advance planning
- Be sure you have the resources to meet the schedule
3. What really matters is the overall look and feel of the questionnaire
- It’s a lot like buying (or selling!) a car
Fielding Mail Surveys (3)
• First mailout – advance notice letter
– Sent to the entire sample
– Mailed first class
– Handwritten signature
– Explains why there will be a survey
– Explains why participation will be appreciated
• Put yourself on the mailing list for this and all other mailings
Source: Salant and Dillman
Fielding Mail Surveys (4)
• Second mailout – cover letter, questionnaire, and return envelope
– Sent one week after advance notice
– Cover letter
• Personalized
• Explains survey purpose
• Explains ID# on the questionnaire and promises confidentiality
• Reinforces importance of everyone’s participation
• Specifies who should complete the questionnaire
• Thanks respondent for participation
• Hand signed
Fielding Mail Surveys (5)
• Questionnaire – with ID number
• Return envelope is stamped, addressed, and ready for use
Fielding Mail Surveys (6)
• Third mailout – postcard followup
– 4 to 8 days later
– Personalized
– Reminding and thanking
• Fourth mailout – new cover letter, questionnaire, and return envelope
– Three weeks after the second mailout (the first one with a copy of the questionnaire)
– Sent only to addresses that have not yet returned the survey
Fielding Mail Surveys (7)
• The four mailings should yield a final response rate of 50 – 60 percent
• To further increase response rate, one can:
– Send another follow up like the fourth mailing
– Send the follow up as certified or express mail
– Telephone
• Often you will discover that people shouldn’t have been in the sample in the first place
Sampling Review
• Rule of thumb sampling error of a proportion at the 95 percent confidence level = 1 / square root (sample size)
– If size = 400, error = 1/20 = 5%
• The Central Limit Theorem is important for social science research because it provides the mathematical basis for using probability samples 1) to make estimates of parameters from large populations using small samples and 2) to estimate the precision of those estimates
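The rule of thumb can be compared with the usual normal-approximation margin of error for a proportion, 1.96 × √(p(1 − p)/n), evaluated at p = 0.5 where it is largest. That formula is standard but does not appear on the slides. The function names below are illustrative.

```python
import math

def rule_of_thumb(n):
    """Slide's rule of thumb: 1 / sqrt(sample size)."""
    return 1 / math.sqrt(n)

def margin_95(n, p=0.5):
    """Normal-approximation 95% margin of error for a proportion under SRS."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    print(n, round(rule_of_thumb(n), 3), round(margin_95(n), 3))
```

The two figures stay close at every sample size because 1.96 × √0.25 = 0.98, which the rule of thumb rounds up to 1. The rule is therefore slightly conservative.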
Sampling Review (2)
• In both stratified and cluster sampling the survey population is divided into exhaustive, mutually exclusive groups. Each group could be either a stratum or a cluster
• If we use all the groups in our final sample, we call each group a stratum
• If we use only some of the groups in our final sample, we call each group a cluster