Secondary Data Analysis

Linda K. Owens, PhD
Assistant Director for Sampling and Analysis
Survey Research Laboratory, University of Illinois

Jan 12, 2016

What is secondary data?

• Data collected by a person or organization other than the users of the data

Advantages of Secondary Data

• Unobtrusive

• Fast & inexpensive

• Avoid data collection problems

• Provide bases for comparison

Disadvantages of Secondary Data

• Data availability

• Level of observation

• Quality of documentation

• Data quality control

• Outdated data

Data Sources

Inter-university Consortium for Political and Social Research (ICPSR) http://www.icpsr.umich.edu/index-medium.html

National Center for Health Statistics (NCHS) http://www.cdc.gov/nchs/default.htm

Centers for Medicare and Medicaid Services (CMS) http://cms.hhs.gov/researchers/

US Census Bureau http://www.census.gov/main/www/access.html

Data Sources (cont.)

Examples of Directly Downloadable Data from NCHS:

• National Health and Nutrition Examination Survey (NHANES)

• National Ambulatory Medical Care Survey (NAMCS)

• National Hospital Ambulatory Medical Care Survey (NHAMCS)

• National Hospital Discharge Survey (NHDS)

• National Home and Hospice Care Survey (NHHCS)

• National Nursing Home Survey (NNHS)

• National Survey of Ambulatory Surgery (NSAS)

• National Employer Health Insurance Survey (NEHIS)

• National Vital Statistics System (NVSS)

• National Health Interview Survey (NHIS)

Survey Documentation & Analysis

Web-based analysis and documentation

• http://sda.berkeley.edu/

• http://www.icpsr.umich.edu/access/sda.html

• http://www.icpsr.umich.edu/NACJD/das.html

• http://www.icpsr.umich.edu/SAMHDA/

Data Sources (cont.)

Data Available for Use with Survey Documentation and Analysis (SDA):

• Aging Data

▪ Longitudinal Study of Aging, 70 Years and Older, 1984-1990

▪ National Survey of Self-Care and Aging: Follow-Up, 1994

▪ National Health and Nutrition Examination Survey II: Mortality Study, 1992

▪ National Hospital Discharge Survey, 1994-1997

▪ National Health Interview Survey, 1994, Second Supplement on Aging

• Criminal Justice Data

▪ International Crime Data

▪ Homicide Data

▪ National Crime Victimization Survey Data

▪ Corrections Data

Data Sources (cont.)

Data Available for Use with Survey Documentation and Analysis (continued):

• Substance Abuse Data

▪ Drug Abuse Warning Network

▪ Monitoring the Future

▪ National Household Survey on Drug Abuse

▪ National Pregnancy and Health Survey

▪ National Treatment Improvement Evaluation Study

▪ Treatment Episode Data Set

▪ Uniform Facility Data Set

▪ Washington, DC Metropolitan Area Drug Study (DC*MADS)

Evaluation of Data Sources

• Purpose of the study

• Sponsor/collector of the data

• Mode of data collection

• Sampling procedures

• Consistency of data with other sources

Evaluation of Data Sources (cont.)

• Documentation

• Number of observations

• Number of variables

• Coding scheme

• Summary statistics
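One practical way to check documentation, the number of observations and variables, and the coding scheme is to compare a downloaded file against its codebook. The sketch below uses a hypothetical codebook and hypothetical records, so it implies no real dataset's codes. It counts the values that the documentation does not account for.

```python
from collections import Counter

# Hypothetical codebook: documented valid codes for each variable
codebook = {
    "sex": {1, 2},        # e.g., 1 = male, 2 = female
    "race": {1, 2, 3},
}

# Hypothetical rows, as they might be read from a downloaded data file
records = [
    {"sex": 1, "race": 2},
    {"sex": 2, "race": 3},
    {"sex": 9, "race": 1},  # 9 is not a documented code for sex
]

def undocumented_codes(records, codebook):
    """Count, per variable, the values that do not appear in the codebook."""
    bad = {var: Counter() for var in codebook}
    for row in records:
        for var, valid in codebook.items():
            if row[var] not in valid:
                bad[var][row[var]] += 1
    return bad

print(len(records), "observations,", len(codebook), "variables")
problems = undocumented_codes(records, codebook)
print(problems)
```

An undocumented code (such as an unlisted missing-value code) is a signal to go back to the documentation before analyzing the variable.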

Types of Survey Sample Design

• Simple Random Sampling

• Systematic Sampling

• Complex sample designs

▪ stratified designs

▪ cluster designs

▪ mixed mode designs

Types of Survey Sample Design

• Simple Random Sampling

▪ Each member of the population has an equal and known chance of being selected.

▪ Simple Random Sample With Replacement (SRSWR)

▪ Simple Random Sample Without Replacement (SRSWOR)
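The two variants differ only in whether the same unit can be drawn more than once. This is a minimal sketch using Python's standard `random` module. `Random.sample` draws without replacement (SRSWOR) and `Random.choices` draws with replacement (SRSWR). The frame of 100 unit IDs and the seed are hypothetical.

```python
import random

def srswor(frame, n, rng):
    """Simple random sample without replacement: each unit appears at most once."""
    return rng.sample(frame, n)

def srswr(frame, n, rng):
    """Simple random sample with replacement: a unit may be drawn more than once."""
    return rng.choices(frame, k=n)

frame = list(range(1, 101))  # hypothetical sampling frame of N = 100 unit IDs
rng = random.Random(2016)    # fixed seed so the draws are reproducible
wor = srswor(frame, 10, rng)
wr = srswr(frame, 10, rng)
print(sorted(wor))
print(sorted(wr))
```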

Types of Survey Sample Design

• Systematic Random Sampling: the selection of every kth element from a sampling frame, with sampling interval k = N/n and a random starting point between 1 and k.
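The sketch below shows the selection rule in Python. It draws a random start r between 1 and k and then takes positions r, r + k, r + 2k, and so on. For simplicity it assumes that N is an exact multiple of n, so that k = N/n is an integer. The frame and seed are hypothetical.

```python
import random

def systematic_sample(frame, n, rng):
    """Systematic sample: random start r in [1, k], then every kth element.

    Assumes len(frame) is a multiple of n so that k = N/n is an integer.
    """
    N = len(frame)
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                  # sampling interval k = N/n
    start = rng.randint(1, k)   # random start, 1-based position
    # 1-based positions start, start + k, ..., converted to 0-based indices
    return [frame[start - 1 + j * k] for j in range(n)]

frame = [f"unit{i:03d}" for i in range(1, 101)]  # hypothetical frame, N = 100
sample = systematic_sample(frame, 10, random.Random(7))
print(sample)
```

When the frame is sorted by a variable related to the outcome, this ordering gives an implicit stratification. A periodic pattern in the frame that matches k can bias the sample.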

Types of Survey Sample Design

• Stratified sample

▪ The population is first divided into non-overlapping subpopulations (strata), such as gender, race, or SES.

▪ A sample is drawn from each stratum.

▪ Allocation across strata can be proportionate or disproportionate.

▪ Works most effectively when the variance of the dependent variable is smaller within strata than in the sample as a whole.
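Proportionate allocation gives each stratum a sample size n_h = n · N_h / N and then draws an SRSWOR within each stratum. Here is a minimal Python sketch with a hypothetical population stratified by gender. Disproportionate allocation, such as oversampling a small stratum, would set n_h differently. That choice produces unequal selection probabilities, so weights are then needed.

```python
import random

def proportionate_stratified_sample(strata, n, rng):
    """Draw an SRSWOR within each stratum, with n_h proportional to N_h.

    strata maps a stratum label to the list of units in that stratum.
    Allocations are rounded, so the total can differ slightly from n.
    """
    N = sum(len(units) for units in strata.values())
    sample = {}
    for label, units in strata.items():
        n_h = round(n * len(units) / N)  # proportionate allocation
        sample[label] = rng.sample(units, n_h)
    return sample

# Hypothetical population of 1,000 people stratified by gender
strata = {
    "male": [f"m{i}" for i in range(490)],
    "female": [f"f{i}" for i in range(510)],
}
sample = proportionate_stratified_sample(strata, 100, random.Random(1))
print({label: len(units) for label, units in sample.items()})
```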

Types of Survey Sample Design

• Cluster sample: elements are selected in groups, or clusters.

▪ PSU (Primary Sampling Unit): the first unit sampled in the design. For example, school districts in Chicago may be sampled first, and then schools may be sampled within the selected districts.

▪ Homogeneity within clusters is measured by the intracluster correlation (ICC).
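A common way to summarize the cost of within-cluster homogeneity is Kish's approximate design effect, deff = 1 + (m - 1)ρ. Here m is the number of elements sampled per cluster, assumed equal across clusters, and ρ is the ICC. Dividing n by deff gives an effective sample size, meaning the size of a simple random sample with the same variance. The numbers below are hypothetical.

```python
def cluster_design_effect(cluster_size, icc):
    """Kish's approximation for the design effect of a clustered sample."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, deff):
    """Size of a simple random sample that would give the same variance."""
    return n / deff

# Hypothetical example: 40 schools, 25 students sampled per school, ICC = 0.05
n = 40 * 25
deff = cluster_design_effect(25, 0.05)
n_eff = effective_sample_size(n, deff)
print(round(deff, 3), round(n_eff, 1))
```

Even a small ICC matters when clusters are large. In this hypothetical case, 1,000 students carry roughly the information of about 455 independently sampled students.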

Why complex survey design?

• Increased efficiency

• Decreased costs

• Sometimes the only option available

Complex Survey Design

• Complex designs with clustering and unequal selection probabilities generally increase the sampling variance relative to a simple random sample of the same size.

• Not accounting for the complex sample design can understate standard errors and thus inflate the Type I error rate (false positives).
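Unequal weights inflate variance in a similar way. A widely used approximation is Kish's design effect due to weighting, deff_w = n · Σw² / (Σw)². This is a standard technique that the slide does not name. The value is 1 for a self-weighting sample and grows as the weights become more variable. The weight vectors in the sketch are hypothetical.

```python
def weighting_design_effect(weights):
    """Kish's design effect due to unequal weights: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)

equal = [2.0] * 100                # self-weighting sample
unequal = [1.0] * 50 + [3.0] * 50  # hypothetical: half the cases weighted 3x
print(weighting_design_effect(equal), weighting_design_effect(unequal))
```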

Sample Weights

• “pweight” or selection weight: used to adjust for differing probabilities of selection. It is the inverse of the selection probability, which equals N/n under simple random sampling.

• In theory, simple random samples are self-weighting.

• In practice, simple random samples are also likely to require adjustments for nonresponse.
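The following is a minimal sketch of selection weights as inverse selection probabilities, applied to a weighted mean. The design is hypothetical: two strata, each with 50 sampled cases. The first stratum is sampled from N = 500 (p = 0.1) and the second from N = 100 (p = 0.5).

```python
def base_weights(selection_probs):
    """Selection (base) weight = 1 / probability of selection."""
    return [1.0 / p for p in selection_probs]

def weighted_total(values, weights):
    """Weighted (Horvitz-Thompson style) estimate of a population total."""
    return sum(w * y for w, y in zip(weights, values))

def weighted_mean(values, weights):
    return weighted_total(values, weights) / sum(weights)

# Hypothetical design: stratum A, n = 50 of N = 500; stratum B, n = 50 of N = 100
probs = [0.1] * 50 + [0.5] * 50
y = [1] * 50 + [0] * 50  # hypothetical 0/1 outcome
w = base_weights(probs)
print(sum(w), weighted_mean(y, w))
```

The weights sum to the population size (600). The unweighted mean of y is 0.5, but the weighted mean is 500/600. This difference arises because stratum A is underrepresented in the raw sample.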

Types of Sample Weights

Post-stratification weights:

• Typically used to adjust for minor differences in nonresponse by demographic subgroup.

• Bring the sample proportions in demographic subgroups into agreement with the population proportions in those subgroups.

• Require an auxiliary dataset (population figures) to use as a comparison.

• Not a fix for bad sample design.

Post-Stratification Weights Example

          Sample Percent   Population Percent   Weight
Male      42%              49%                  1.167
Female    58%              51%                  0.879

Weight = Population Percent / Sample Percent (e.g., 49/42 ≈ 1.167).
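The post-stratification adjustment is the ratio of population share to sample share. The minimal sketch below reproduces the two weights in the example. In a real file, this factor would multiply each case's existing weight instead of standing alone. The variable names are hypothetical.

```python
def poststrat_factors(sample_pct, pop_pct):
    """Adjustment factor per cell = population share / sample share."""
    return {cell: pop_pct[cell] / sample_pct[cell] for cell in sample_pct}

sample_pct = {"male": 42, "female": 58}
pop_pct = {"male": 49, "female": 51}
weights = poststrat_factors(sample_pct, pop_pct)
print({cell: round(w, 3) for cell, w in weights.items()})

# Check: the weighted sample shares now match the population shares
weighted = {cell: sample_pct[cell] * weights[cell] for cell in sample_pct}
print(weighted)
```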

Types of Sample Weights (cont.)

Non-response weights:

• Designed to inflate the weights of survey respondents to compensate for nonrespondents with similar characteristics.

• Only useful if nonresponse varies by stratum. If response rates are the same everywhere, the adjustment is a single constant factor, which matters only when inflating the sample size to the population size.
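A common way to build these weights is a weighting-class adjustment. Within each class, respondents' weights are multiplied by (sum of weights of all sampled cases) / (sum of weights of respondents), and nonrespondents drop to 0. The minimal sketch below uses two hypothetical classes with different response rates (50% and 75%). It assumes that every class has at least one respondent.

```python
from collections import defaultdict

def nonresponse_adjust(cases):
    """Weighting-class nonresponse adjustment.

    cases: list of dicts with 'cls' (adjustment class), 'w' (base weight)
    and 'resp' (True if the case responded). Assumes each class has at
    least one respondent.
    """
    total = defaultdict(float)   # weight of all sampled cases per class
    resp = defaultdict(float)    # weight of respondents per class
    for c in cases:
        total[c["cls"]] += c["w"]
        if c["resp"]:
            resp[c["cls"]] += c["w"]
    return [
        c["w"] * total[c["cls"]] / resp[c["cls"]] if c["resp"] else 0.0
        for c in cases
    ]

# Hypothetical sample: urban responds at 50%, rural at 75%
cases = (
    [{"cls": "urban", "w": 10.0, "resp": r} for r in (True, True, False, False)]
    + [{"cls": "rural", "w": 10.0, "resp": r} for r in (True, True, True, False)]
)
adjusted = nonresponse_adjust(cases)
print([round(w, 2) for w in adjusted])
```

The total weight is preserved, while the respondents in the class with lower response absorb more of it. If both classes had the same response rate, every respondent would get the same factor. This illustrates the point on the slide.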

Types of Sample Weights (cont.)

“Blow-up” (expansion) weights:

• Weights sum to population total

• Provide estimates for the total population of interest
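Expansion ("blow-up") weights are obtained by rescaling a set of weights so that they sum to the population total. Normalizing them to sum to the sample size instead is the "contract" direction mentioned in the summary of weights. This minimal sketch uses hypothetical relative weights and a hypothetical population of 1,000,000.

```python
def scale_weights(weights, target_total):
    """Rescale weights proportionally so that they sum to target_total."""
    factor = target_total / sum(weights)
    return [w * factor for w in weights]

w = [1.0, 1.0, 2.0, 4.0]                  # hypothetical relative weights
expansion = scale_weights(w, 1_000_000)   # expand: sum to population N
normalized = scale_weights(w, len(w))     # contract: sum to sample size n
y = [3.0, 5.0, 2.0, 4.0]                  # hypothetical outcome values

est_total = sum(wi * yi for wi, yi in zip(expansion, y))
mean_expanded = est_total / sum(expansion)
mean_normalized = sum(wi * yi for wi, yi in zip(normalized, y)) / sum(normalized)
print(est_total, mean_expanded, mean_normalized)
```

Rescaling multiplies every weight by the same factor, so weighted means and proportions do not change. Only estimated totals depend on the scale.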

Types of Sample Weights (cont.)

Replicate weights:

• A series of weight variables that are used instead of PSU and strata identifiers, in an effort to protect the respondents' identity.

• Both the pweight and the replicate weights are required: the pweight gives the point estimate, and the replicate weights give its standard error.
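Public-use files supply the replicate weights, and their documentation gives the variance formula. The sketch below assumes the common form v = c · Σ(θ_r - θ̂)². Here θ̂ is the estimate with the full-sample weight, θ_r is the estimate with replicate weight r, and c depends on the method. For example, c is 1/R for BRR and (R - 1)/R for the delete-one (JK1) jackknife. The four-case dataset and its JK1 replicate weights are hypothetical and built by hand.

```python
import math

def weighted_mean(values, weights):
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)

def replicate_se(full_estimate, replicate_estimates, multiplier):
    """SE from replicates: sqrt(multiplier * sum_r (theta_r - theta_full)^2)."""
    v = multiplier * sum((t - full_estimate) ** 2 for t in replicate_estimates)
    return math.sqrt(v)

y = [2.0, 4.0, 6.0, 8.0]  # hypothetical outcome, each case its own PSU
full_w = [1.0] * 4        # hypothetical full-sample weight (pweight)
R = len(y)
# Hypothetical JK1 replicates: drop case r, inflate the others by R/(R-1)
rep_ws = [[0.0 if i == r else R / (R - 1) for i in range(R)] for r in range(R)]

theta = weighted_mean(y, full_w)                 # point estimate uses pweight
reps = [weighted_mean(y, rw) for rw in rep_ws]   # one estimate per replicate
se = replicate_se(theta, reps, (R - 1) / R)      # JK1 multiplier
print(theta, round(se, 4))
```

In this toy JK1 setup, the replicate standard error of the mean equals the usual simple random sampling formula sqrt(s²/n), because both are sqrt(5/3) here. This makes it a convenient sanity check.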

Summary of Weights

• Weight for probability of selection

• Adjust for non-response

• Post-stratify

• Expand or contract to population/sample totals

Syntax Examples of Design-based Analysis in STATA, SUDAAN & SAS

STATA

* Declare the design once: PSU variable, sampling weight, and strata
svyset psu [pweight=finalwt], strata(strata)
* Design-based linear regression of fat intake
svy: regress fatintk age male black hispanic

SUDAAN

proc regress data="c:\nhanes.sav" filetype=spss design=wr;
nest strata psu;
weight finalwt;
subgroup sex race;
levels 2 3;
model fatintk = age sex race;

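SAS

proc surveyreg data=nhanes;
   strata strata;                  /* stratum identifier */
   cluster psu;                    /* primary sampling unit */
   class sex race;                 /* categorical predictors */
   model fatintk = age sex race;
   weight finalwt;                 /* sampling (p)weight */
run;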