Lecture 4: Survey Design and Data Coding
Jan 06, 2016
Overview
  What is a survey
  Question design and considerations
  Testing a survey instrument
  Data considerations (data coding)
What is a Survey?
  To survey: “act of looking or seeing, observing”
  Research Surveys
    Qualitative interviews, focus groups
    Specific, systematic, quantitative data-collection instruments
Example Surveys
  US Census
  General Social Survey
  Online Surveys
Some General Considerations for a Survey
  Must have a logic for each question
  Must have a logic for the question responses (if provided)
  Must have a logic for the sequencing of questions
  Must have clear wording
  Must have clear formatting
  Must take into account the sample population that will actually take/use the survey instrument
    Culture, language, interpretive ambiguity
Question Logic
  Avoid redundant questions unless you have a reason
    Exception: Sometimes it is a good idea to ask two questions that tap the same concept (first as a scale, then as a categorical decision)
  Example (ranks):
    First, ask respondents to rate from 1 to 10 how much they agree with several statements.
    Next, ask respondents to rank the statements by how much they agree with them (e.g., 5 statements, rank them 1-5).
Question Logic (continued)
  Avoid asking unnecessary questions
    Example: Survey on computer usage
      Socio-demographic questions (age, gender)?
      Risk behavior questions (drug use, etc.)?
Question Response Logic: Scales versus Categories
  As a basic rule, metric scales with more range are better than binary or categorical responses (when appropriate).
    Example: Happiness
      Are you happy or sad today?
      How happy are you on a scale of 0-10 (0 is least happy, 10 is most happy)?
  With general scales and Likert scales, consider having no “middle” category (neutral, no opinion).
Question Response Logic: Categorical Responses
  Responses must be mutually exclusive
    Example (bad): Where do you live?
      Berkeley, San Fran, Bay Area, Other
  Responses must be exhaustive
    Example (bad): What kind of computer do you have?
      PC, Mac
  Use ‘Don’t Know’, ‘Other’, ‘Not Applicable’ when absolutely necessary
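For numeric response brackets, the two rules above can be checked mechanically. The sketch below is only an illustration: the age brackets and the `check_brackets` helper are hypothetical and not part of the lecture. It walks every possible answer and reports values that fall into more than one bracket (not mutually exclusive) or into none (not exhaustive).

```python
def check_brackets(brackets, lowest, highest):
    """Return (overlaps, gaps) for inclusive integer brackets over lowest..highest."""
    overlaps, gaps = [], []
    for value in range(lowest, highest + 1):
        hits = [label for label, (lo, hi) in brackets.items() if lo <= value <= hi]
        if len(hits) > 1:
            overlaps.append(value)   # value fits two answers: not mutually exclusive
        elif not hits:
            gaps.append(value)       # value fits no answer: not exhaustive
    return overlaps, gaps

# Hypothetical answer options for "How old are you?"
bad = {"18-25": (18, 25), "25-35": (25, 35), "40+": (40, 120)}
good = {"18-24": (18, 24), "25-39": (25, 39), "40+": (40, 120)}

bad_overlaps, bad_gaps = check_brackets(bad, 18, 120)
good_overlaps, good_gaps = check_brackets(good, 18, 120)
print(bad_overlaps, bad_gaps)    # [25] [36, 37, 38, 39]
print(good_overlaps, good_gaps)  # [] []
```

Purely verbal categories (such as the "Where do you live?" example) cannot be checked this way and still need a human reading.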
Question Sequence
  Static order for questions needs to have some rationale/logic
    Grouping similar items together
    Scattering similar items throughout survey
  Personal demographic questions work best at end of survey (response rate and completion)
  Randomization for all respondents
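One way to randomize question order per respondent can be sketched as follows. This is an assumption-laden illustration, not a prescribed procedure: the question labels and the `question_order` helper are made up, and combining randomization with demographics-at-the-end is one possible reading of the points above. Seeding the generator with the respondent ID makes each respondent's order reproducible.

```python
import random

QUESTIONS = ["Q1", "Q2", "Q3", "Q4", "Q5"]   # hypothetical substantive items
DEMOGRAPHICS = ["age", "gender"]             # kept at the end of the survey

def question_order(respondent_id, questions=QUESTIONS, tail=DEMOGRAPHICS):
    """Shuffle the substantive items for one respondent; demographics stay last."""
    rng = random.Random(respondent_id)   # seed by ID so the order can be rebuilt later
    shuffled = list(questions)
    rng.shuffle(shuffled)
    return shuffled + list(tail)

order_a = question_order(101)
order_b = question_order(101)   # same ID gives the same order
print(order_a)
```

Recording the order each respondent saw (or being able to regenerate it from the ID) makes it possible to check for order effects afterwards.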
Clear Wording / Leading Questions
  Clear Wording
    Example:
      “What ISP do you use?”
      “If you have Internet service at home, what company or service provider do you use for Internet access?”
  Leading Questions
    Example:
      “Don’t you think we should support our troops in Iraq?”
      “How strongly do you agree or disagree with the following statement: ‘We should support our troops in Iraq’?”
Clear Formatting, Logic
  Not all questions apply to everyone
    Example:
      “How much do you spend on gas heat each month?”
  Branching is a possible solution
    Example:
      “Do you have gas heat? If yes, go to next question. If not, skip to question #3.”
  Condense when possible to avoid unnecessary branching.
    Example:
      “How much do you spend on gas heat each month? (write 0 if you do not have gas heat)”
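The difference between the branched and the condensed version shows up in what gets recorded. The sketch below is illustrative only: the function names are hypothetical, and the -8 code for Not Applicable is borrowed from the missing-data codes suggested later in the lecture.

```python
NOT_APPLICABLE = -8   # assumed code for a skipped, non-applicable question

def branched(has_gas_heat, monthly_cost=None):
    """Two-question version: the cost question is skipped when the answer is 'no'."""
    if has_gas_heat:
        return {"has_gas_heat": 1, "gas_heat_cost": monthly_cost}
    return {"has_gas_heat": 0, "gas_heat_cost": NOT_APPLICABLE}

def condensed(monthly_cost):
    """One-question version: respondents without gas heat write 0."""
    return {"gas_heat_cost": monthly_cost}

with_heat = branched(True, 45)
without_heat = branched(False)
condensed_without = condensed(0)
print(with_heat, without_heat, condensed_without)
```

The condensed form is shorter for the respondent, but a recorded 0 no longer distinguishes "no gas heat" from "gas heat that cost nothing this month", which may or may not matter for the analysis.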
Know your sample population
  Regional language and terminology
  Cultural differences
  How you conduct survey can influence your valid sample
    Door to door?
    Registered telephone directory?
    Internet-based survey?
Replication and Using Existing Survey Instruments
  ALWAYS a good idea to find other surveys that are used in your area of interest.
    Especially with large, funded surveys the questions may have been tested for reliability.
    Allows for comparisons between different samples if the question wording is the same.
  If a question or set of questions is accepted as a good operationalization of the concept you are interested in, you don’t want to reinvent it unless you really intend to argue that your measure is more appropriate.
Pre-Testing and Pilot Studies
Testing a Survey Instrument
  Pre-testing versus Pilots
    Pre-tests: Focus on individual questions or the entire survey instrument/questionnaire.
    Pilots: Usually larger scale than pre-testing, involve testing the entire survey procedure.
Pre-Testing and Pilots
  Pre-tests and Pilots are always necessary, unless the survey in its existing form has already been given before.
  Pre-testing and Pilot studies should have a large enough response rate so that you can actually find problems!
    Example: You want to survey 100 undergrads for a small study. You may need to at least pre-test on a 20% sample with different undergrads than your intended valid sample (i.e., pre-test on 20 undergrads from your intended population, but not students who could end up in your final survey).
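The undergrad example can be sketched as two draws from the same sampling frame, with the pre-test group removed before the final sample is drawn. The frame of 500 student IDs and the seed are hypothetical; only the 20 and 100 come from the example above.

```python
import random

# Hypothetical sampling frame of 500 undergrads
population = [f"student_{i:03d}" for i in range(1, 501)]
rng = random.Random(2016)   # fixed seed so the draw can be reproduced

pretest = rng.sample(population, 20)    # 20% of the planned 100
pretest_set = set(pretest)

# Remove pre-test students so they cannot end up in the final survey
remaining = [s for s in population if s not in pretest_set]
final_sample = rng.sample(remaining, 100)

print(len(pretest), len(final_sample), pretest_set.isdisjoint(final_sample))
```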
Testing your Questions
  Does the respondent’s comprehension of question meaning match that of the researcher?
  Does the researcher put too much of an expectation of recall on the respondent?
Ways to test your questions
  Behavior coding: interview some respondents as you give the survey questions and keep track of requests for clarification.
  Ask pretest respondents to rephrase your questions in their own words.
  Panels of ‘experts’: give your questions to groups of individuals for comments/suggestions.
The pitfalls of skipping the testing stage
  Best case scenario: you get some or most of what you wanted to get, but often face an uphill battle justifying your operationalizations and wording choices.
  Worst case scenario: you get wild differences in responses, respondents don’t understand key questions, the incompletion rate is large, and the money and time spent on conducting the survey are wasted (except for your newfound appreciation for pre-testing and pilots).
After the Survey: Data Coding and Error Checking
Data Coding
  The coding considerations start with the survey itself.
  You develop a codebook that records what the possible numeric responses will be for every question.
  No open-ended questions unless absolutely necessary for other reasons.
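A codebook can be kept in machine-readable form so that entered data can be checked against it. The layout below is just one possible sketch: the variable names, the dictionary structure, and the `invalid_values` helper are assumptions for illustration, and the -9 missing code anticipates the convention suggested later in the lecture.

```python
# Hypothetical codebook: every variable lists every numeric code it may take
CODEBOOK = {
    "gender": {
        "label": "Gender",
        "codes": {0: "Male", 1: "Female", -9: "Missing"},
    },
    "happy": {
        "label": "Happiness today (0 = least happy, 10 = most happy)",
        "codes": {**{i: str(i) for i in range(11)}, -9: "Missing"},
    },
}

def invalid_values(variable, values, codebook=CODEBOOK):
    """Return the entered values that the codebook does not allow."""
    allowed = codebook[variable]["codes"]
    return [v for v in values if v not in allowed]

bad_gender = invalid_values("gender", [0, 1, 1, 2, -9])
bad_happy = invalid_values("happy", [0, 10, 11, 5])
print(bad_gender, bad_happy)   # [2] [11]
```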
Making data numeric
  Use numbers to represent variable values
    Assign a numeric value to all of the values that your variables can take.
      Example: Gender (Male, Female) Male=0, Female=1.
  Develop a systematic way of handling missing data!
    You must enter a value for missing data; otherwise you will not know if missing is due to input error, N/A, skipped question, etc.
      Example: use numeric codes that would not normally make sense for the variable (e.g., -9 for Missing, -8 for Not Applicable, etc.).
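Missing-data codes only help if they are excluded before any statistic is computed. The sketch below uses made-up answers to a monthly gas-heat question with the -9 and -8 codes from the example above, and compares a naive mean with one computed on valid values only.

```python
from statistics import mean

MISSING, NOT_APPLICABLE = -9, -8
MISSING_CODES = {MISSING, NOT_APPLICABLE}

# Hypothetical answers to "How much do you spend on gas heat each month?"
gas_heat_cost = [40, 55, NOT_APPLICABLE, 60, MISSING, 45]

valid = [v for v in gas_heat_cost if v not in MISSING_CODES]
naive_mean = mean(gas_heat_cost)   # wrongly treats -8 and -9 as dollar amounts
valid_mean = mean(valid)           # uses real answers only

# Separate codes keep the reason for each gap visible
n_missing = gas_heat_cost.count(MISSING)
n_not_applicable = gas_heat_cost.count(NOT_APPLICABLE)

print(naive_mean, valid_mean)   # 30.5 50
```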
Other tips for creating your dataset
  Use ID numbers: always, no exceptions! Seriously!
    Datasets get manipulated and resorted constantly.
    Without IDs, errors cannot be corrected and outliers cannot be identified. IDs should allow you to match any case in the dataset with an actual survey taken by that individual.
  Use conventional data structure.
    Rectangular format: each row is a case and each column is a single variable.
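A small sketch of a rectangular dataset with ID numbers, using only the standard library. The three cases and the variable names are invented for illustration. Re-sorting changes row positions but not IDs, so any row can still be traced back to its paper survey.

```python
import csv
import io

# Hypothetical cases: one row per respondent, one column per variable
rows = [
    {"id": 1001, "gender": 0, "happy": 7},
    {"id": 1002, "gender": 1, "happy": -9},
    {"id": 1003, "gender": 1, "happy": 4},
]

# Re-sorting moves rows around, but the ID travels with each case
resorted = sorted(rows, key=lambda r: r["happy"])
by_id = {r["id"]: r for r in resorted}
case_1002 = by_id[1002]

# Write the rectangular layout as CSV (to a string here; a file works the same way)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "gender", "happy"])
writer.writeheader()
writer.writerows(resorted)
csv_text = buffer.getvalue()
print(csv_text)
```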
Error Checking Data
  Why?
    Solves problems that may occur later
    Makes sure your entire analysis is not bogus
    You may accidentally engage in coitus more often as you get older….WHAT?!
Marital Coital Frequency
  Jasso, Guillermina (1985) “Marital Coital Frequency and the Passage of Time: Estimating the Separate Effects of Spouses’ Ages and Marital Duration, Birth and Marriage Cohorts, and Period Influences” (American Sociological Review)
    Major Findings of the Study:
      Controlling for cohort and age effects, negative period effect
      Controlling for period and cohort effects, wife’s age had a positive effect
      Both findings differ significantly from earlier studies of the same topic.
Coitus: Part Deux
  Kahn and Udry (1986) “Marital Coital Frequency: Unnoticed Outliers and Unspecified Interactions Lead to Erroneous Conclusions” (American Sociological Review)
    Major Findings:
      In the Jasso study, 4 cases were coded as 88: MISSING DATA CODES!!!
      4 more cases had very large studentized residuals (each was also very different from the first survey)
      Missed an important interaction between length of marriage and wife’s age
      Dropping the 8 outliers from the sample of more than 2000 cases drastically changed the findings
How to Error Check data
  Know what you are looking to check, use appropriate methods:
    Descriptives
    Frequencies
    Cross-tabulations
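The three methods can be sketched with the standard library alone. The five cases below are invented and deliberately contain two entry errors. The coding (gender 0/1, happiness 0-10, -9 for Missing) follows the examples earlier in the lecture, while the variable names are assumptions.

```python
from collections import Counter
from statistics import mean

MISSING = -9
data = [   # hypothetical coded cases
    {"id": 1, "gender": 0, "happy": 7},
    {"id": 2, "gender": 1, "happy": 88},       # entry error: outside 0-10
    {"id": 3, "gender": 1, "happy": MISSING},
    {"id": 4, "gender": 0, "happy": 3},
    {"id": 5, "gender": 2, "happy": 6},        # entry error: no such gender code
]

# Frequencies: every distinct value and how often it occurs
gender_freq = Counter(row["gender"] for row in data)

# Descriptives: an impossible minimum or maximum signals a problem
happy_valid = [row["happy"] for row in data if row["happy"] != MISSING]
happy_desc = {"min": min(happy_valid), "max": max(happy_valid), "mean": mean(happy_valid)}

# Cross-tabulation: gender by whether happiness is missing
crosstab = Counter((row["gender"], row["happy"] == MISSING) for row in data)

# IDs of cases to look up in the original surveys
suspect_ids = [
    row["id"] for row in data
    if row["gender"] not in (0, 1)
    or not (row["happy"] == MISSING or 0 <= row["happy"] <= 10)
]
print(gender_freq, happy_desc, suspect_ids)
```

Here the frequency table exposes the stray gender code 2, and the maximum of 88 exposes the out-of-range happiness score.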
Error Checking Examples
  Checking Original Variables for Errors
    Frequencies
    Descriptives
  Checking and Setting “Missing” codes
  Recoding and Creating New Variables from Existing Variables
    Frequencies
    Cross-Tabulations
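A recode is easiest to verify by cross-tabulating the old variable against the new one. The sketch below collapses a hypothetical 0-10 happiness scale into a low/high indicator; the cut point at 6, the data, and the `recode_happy` helper are illustrative assumptions, not part of the class data set.

```python
from collections import Counter

MISSING = -9
happy = [0, 3, 5, 6, 10, MISSING, 8]   # hypothetical 0-10 scale scores

def recode_happy(value):
    """Collapse the scale: 0-5 -> 0 (low), 6-10 -> 1 (high); missing stays missing."""
    if value == MISSING:
        return MISSING
    return 1 if value >= 6 else 0

happy_high = [recode_happy(v) for v in happy]

# Frequencies of the new variable
new_freq = Counter(happy_high)

# Cross-tabulation of old by new: each old value should land in exactly one new value
crosstab = Counter(zip(happy, happy_high))
old_values = {old for old, _ in crosstab}
recode_is_consistent = len(old_values) == len(crosstab)

print(new_freq, recode_is_consistent)
```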
Example:
  Class Data Set