Top Banner
SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL AND STATISTICAL CONSIDERATIONS MSCR PROGRAM AND AXIS RESEARCH DESIGN, EPIDEMIOLOGY AND BIOSTATISTICS Magda Shaheen, MD, PhD; Deyu Pan, MS; Sukrit Mukherjee, MS
60

SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Mar 16, 2020

Download

Documents

dariahiddleston
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

SECONDARY DATA SOURCES FOR RESEARCH

EPIDEMIOLOGICAL AND STATISTICAL CONSIDERATIONS

MSCR PROGRAM AND AXIS RESEARCH DESIGN, EPIDEMIOLOGY AND BIOSTATISTICS

Magda Shaheen, MD, PhD; Deyu Pan, MS; Sukrit Mukherjee, MS

Page 2: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Secondary Data (1)• Primary or Secondary data:

– Depends on a relationship between who collect the data and who analyze it

– Data collected by researchers for specific purpose is primary data

– Data collected by someone else for other purpose is secondary data

• Example:– Behavioral Risk Factor Surveillance System (BRFSS) data collected annually by the Center for Disease Control and Prevention (CDC) and State Health Department 

• A researcher used and analyze the data to answer his own research question (i.e., Analysis of secondary data) 

Page 3: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

• Strengths:– Large samples

– Population estimates

– Can test trends over time

• Limitations:– Non‐experimental

– Constructs measured by fewer items (no scales)

– Oftentimes require special statistical techniques

– Most are cross‐sectional

Secondary Data (2)

Page 4: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Secondary Data (3)

• Usage:– Pilot data for grant (e.g., R01) proposals

– Hypothesis generation/testing

– Publications

• Types:

– National Health Survey Data

– State Health Survey Data

Page 5: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Behavioral Risk Factor Surveillance System (BRFSS)

http://www.cdc.gov/brfss/

PopulationAdults

MethodRandom Digit Dial telephone surveyState Administration

ContentBehaviors associated with chronic diseases, injuries, and infectious diseases

Sexual behaviorSmokingCancer screeningDiet and exercise

Data>150,000 subjects/yearCore questions asked of everyone and state‐specific modulesData can be combined across states to get national estimates

Page 6: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

California Health Interview Survey http://www.chis.ucla.edu

PopulationAdult, adolescent and child questionnairesVery diverse racial/ethnic population

MethodTelephone survey of all California counties

Content Physical activityHealth statusHealth conditionsCancer screeningDietSociodemographic information

Data2001, 2003, 2005, and 2007 data available (2009 underway)~40‐50,000 respondents/survey

NoteMany Latino and Asian groups represented

Page 7: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

National Health Interview Survey (NHIS)         

http://www.cdc.gov/nchs/nhis.htm

PopulationHouseholds, families, adults and children

MethodFace to face interview

ContentHealth conditions and behaviors, access to and use of health servicesCancer Control Module (1987, 1992, 2000, 2003, and 2005)

Energy BalanceCancer Screening Sun Avoidance Tobacco Use and Control Genetic Testing

Datan~40,000 households (~87,000 individuals)Initiated in 1957

Page 8: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

http://hints.cancer.gov

PopulationAdults (18+)

MethodRandom digit dial (RDD)Conducted Biennially

ContentCommunications trends and practices Cancer information access and usage Cancer risk perceptionMental models of cancerHealth behaviors

Data2003 (n= 6,469); 2005 (n= 5,586)HINTS 2007 data available to public February 16, 2009

Health Information National Trend Survey (HINTS)      

Page 9: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

http://www.bls.census.gov/cps/cpsmain.htm

PopulationPart of the Current Population Survey

Method75% telephone 25% in‐home

Content:Cigarette smoking prevalenceCurrent and past cigarette consumption Cigarette smoking quit attempts and intentions to quitCigar, pipe, chewing tobacco, and snuff useDegree of youth access to tobacco in the community Attitudes toward advertising and promotion of tobacco

DataSample of ~240,000 respondents in a given survey periodPart of CPS since 1992

Tobacco Use Supplement – Current Population Survey

Page 10: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

National Longitudinal Survey of Adolescent Health (Add Health)

http://www.cpc.unc.edu/addhealth

PopulationAdolescents (grades 7 thru 12) from 80 High Schools and  52 Middle Schools; started in 1994‐05 and latest follow‐up  in 2008 (ages: 24‐32) 

MethodIn‐school questionnaire and in‐person interview 

ContentHealth conditions and behaviors; access to and use of health services; social, psychological and  physical well‐ being; risk behaviors

Datan~6,504

NoteFollow‐up data available at 1, 2, and 6 year intervalsNo fee for public data; $750 fee for restricted data

Page 11: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Surveillance Epidemiology and End Results (SEER)

http://seer.cancer.gov/

PopulationChildren to adults

MethodData collected from cancer registries that cover ~26% of the US population; follow‐up with individual cases until death

ContentCancer incidence, prevalence, and survival data; limited demographics (age, race/ethnicity, region)

Data100% of cancer cases in registries; Six million cases with 350,000 added each year; 1973 to 2006

NoteNeed specialized software to analyze (SEER*Stat or SEER*Prep)  downloaded from website;Must sign user agreement to obtain; limited to research purposes;

Can be linked to Medicare data

Page 12: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

SEER‐Medicare Database

• The linkage of two large population‐based sources of data (Seer + Medicare) 

• can be used for an array of epidemiological and health services research (the use of cancer tests and procedures and the costs of cancer treatment can be examined).

• Are not public use files. Investigators are required to obtain approval in order to acquire data.

Page 13: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

National Health and Nutrition Examination Survey 

(NHANES)

http://www.cdc.gov/nchs/nhanes.htm

PopulationChildren and Adults 

MethodFace to face interviewPhysical exams

Content Chronic and Infectious DiseaseMental health and cognitive functioningEnergy BalanceReproductive history and sexual behaviorRespiratory disease

DataN~5000/YearInitiated in 1960’s; Annual since 1999On‐line tutorial

Page 14: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

National Health and Nutrition Examination SurveysSurvey Date Ages     

• NHES II  1963–65  6–11 years• NHES III  1966–70  12–17 years• NHANES I  1971–75  1–74 years• NHANES II  1976–80  6 mo.–74 years• HHANES  1982–84  6 mo.–74 years• NHANES III  1988–94  2 mo. +• NHANES  1999–2000  All ages• NHANES  2001–2002  All ages• NHANES  2003–2004  All ages• NHANES  2005–2006  All ages• NHANES  2007–2008  All ages• NHANES  2009–2010  All ages• NHANES  2011–2012  All ages

Page 15: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

NHANES Data Retrieval & Manipulation System Software Overview (1)

• Product functional capabilities: 

– Give an opportunity to :

• visualize, explore, query and analyze data. 

• display, query, summarize, and organize the NHANES data 

• generate a visual representation of the desired data 

• download the required data in a file format specified by the user.

• The following users will be able to use the system:

1. Faculty of the Charles Drew University of Medicine and Science.

2. Researchers of the Charles Drew University of Medicine and Science.

3. Students of the Charles Drew University of Medicine and Science.

Page 16: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Software Overview (2)• General constraints: 

– Currently only NHANES Data are available for analysis and use. 

– Some training may be required for the users for optimum use of the system.

• The Specification assumes that:

1. No Changes of the historical data takes place in future

2. New data are to be added to the system every year, which must conform to the standard as, laid out in the Database Design Documentation.

3. The Data under consideration is complete and intact.

Page 17: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Attributes• Security: 

– The system is a password‐protected application and each user needs to use his/her credentials to log into the system. 

– There is a default session time for each user session and if left inactive in a logged in state the system will automatically logged off after the session time is over. 

– The user passwords are stored in an encrypted format in the database. – The Administrator is in charge in maintaining the user.

• Reliability, Availability, Maintainance: – The system has an optimum reliability, a high uptime and does not require any 

maintenance.

• Configuration and Compatibility: – The system is not required to be configured for individual customization, because it is 

configured automatically. – The system is not also designed for any other computing environments.

• Installation: – The system is already installed on the Web Server by the developer and does not require 

any special installation. 

• Usability: – The system has a high usability with extremely good user‐friendliness like error messages 

that direct the user to a solution, input range checking as soon as entries are made, and order of choices and screens corresponding to user preferences.

Page 18: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Software Interface (1)

Page 19: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Software interface (2)

The system retrieves design related variables (primary sampling unit, stratification, and sampling weight) 

Page 20: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Other SurveysNational Longitudinal Mortality Study

http://www.census.gov/nlms/National Health Care Survey

http://www.cdc.gov/nchs/nhcs.htmNational Ambulatory Medical Care Survey

http://www.cdc.gov/nchs/about/major/ahcd/ahcd1.htmMedical Expenditure Panel Survey

http://www.meps.ahrq.gov/Medicare Current Beneficiary Survey

http://www.cms.hhs.gov/MCBS/Medicare Health Outcomes Survey

http://www.hosonline.org/National Survey on Drug Use and Health

http://www.oas.samhsa.gov/nhsda.htmNational Survey of Family Growth

http://www.cdc.gov/nchs/about/major/nsfg/nsfgbiblio.htmAmerican Time Use Survey http://www.bls.gov/tus/

Page 21: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Summary• Subsample of all publicly available datasets

• Most are cross‐sectional

• All employ a complex sampling design

– Many use multi‐stage sampling• Example: 

– The NHANES uses a stratified multistage design that includes the selection of the following:

» PSUs, which are counties or small groups of contiguous counties» SSUs within PSUs, which comprise one or more blocks of households» households within the SSUs» one or more participants within the households

– Requires special software to analyze (e.g., SUDAAN, SAS, Survey module of STATA, Complex sample module of SPSS, EPIINFO)

– Use of weighting, clustering, and stratification

– Differences in variance estimation methods

– You should see documentation from the sites for analytic guidelines and recommendations 

Page 22: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Complex Survey Designs

• Stratification

• Clustering

• Unequal Probabilities of Selection

Traditional calculations give incorrect point estimates; easily fixed through weighting

Traditional statistical calculations give incorrect variance estimates

Page 23: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Statistical and Epidemiological Issues

1. Stratifications

2. Statistical weight of a sampled person

3. Clustering

Page 24: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

1. Stratification• Population divided before sampling into disjoint, 

exhaustive groups (strata)

– Members termed primary sampling units (PSUs) 

– Independent samples are taken in each strata

• Strata formed by similar geographic areas 

– Example:• NHANES: partition US counties into 49 strata based on region and economic/racial characteristics

• Sample 2 counties (PSUs) from each strata

Page 25: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Simple Random Sample (Random sample of list)

1 324 5 6 7 8 9 10 11

12131415161718192021

2223

24 2526

2728

29 30 3132

3334353637383940

41 42 43 44 45 46 47 48 49 50 51

52

53545556

57585960

2 6

18

26

3740

44 49

5359

Page 26: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Separate population into two or more strata and count number in each

1 324 5 6

78 9 10 11

1213

1415

87654321

1716

1514

1312 11 10

9

2225242321201918

36 35 3433 32 31

3029 28 27 26

45

44434241

4039

3837

Stratified Random Sampling

• Separate population into two or more strata

• Do population listing in each stratum, as was done with SRS

• Conduct SRS or systematic sample in each stratum

Page 27: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Stratified Sampling Requires More Complicated Analysis

‐ involves statistical weights

87654

321

1716

1514

1312 11

36 35 3433 32 31

3029

28 27

10

9

2524232221201918

26

45

44434241

4039

3837

The 5 in stratum two must be weighted to represent 45 people

1 324 5

67

89 10 11

1213

1415

The 5 in stratum one are weighted to 

represent 15 people

Page 28: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

2. Statistical Weight

• The statistical weight of a sampled person is the number of people in the population that the person represents. 

• If sampling rate is 1/1000 

– Each sampled person represents 1000 people – Each sampled person would have a sample weight of 1000

• Weights derived from:

– selection probabilities, – response rates, – post‐stratification adjustment (e.g., gender, education, income, 

region).

Page 29: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

HINTS (2003): Older People Participated at a Higher Rate Female participated at higher rate

Page 30: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Unweighted Analysis would have:

Participation increased with educationMinorities were oversampled

* Too many African Americans and Hispanics* Too many 45+ and too few 18-34 year olds* Too many females and too few males* Too many people with high education

Page 31: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

The Variance/Bias Tradeoff for the Mean(The Role of the Sampling Weight)

Estimate Mean Confidence Interval

Unweighted

Weighted

•The unweighted mean is biased•The weighted mean has a larger variance and confidence interval

( )1.96u uy yσ±

( )1.96w wy yσ±

Page 32: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Example: Glycohemoglobin Level (Ghb)• A blood test that measures the amount of glucose bound to hemoglobin.

• Normally, about 4% to 6%.• The test indicates how well diabetes has been controlled in the 2 to 3 months before the test.

• Suppose that we would like to assess the distribution of glycohemoglobin levels. 

• Sampling weights must be considered before plotting a histogram.

Page 33: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

SAS Code: Account for Weightsproc univariate data=explore.glyco noprint;

var glyco;

freq weight;

histogram / nrows=2 cfill=red midpoints=3 to 15 by 0.5 cgrid=grayDD;

run;• The variable weight indicates the number of population units the sample 

unit represents.

Page 34: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Proportion and 95% Confidence Interval

Page 35: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

3- Clustering (1)-- Simple Cluster Sampling --

Each of the 20 clusters is a household (HH) with varying numbers of occupants

Page 36: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

1 2 3

4

5

67

89

10

1112

13

14

15

1617

18

19

20

Select Two Clusters and Identify People

2

18

2 and 18 were sampled from the random number table

Page 37: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

3. Clustering (2)• Persons residing in a small area may have similar 

characteristics

• Thus, responses of subjects in small area (or within an exchange) may be correlated 

• Dependence between subjects leads to inflated variance

• Correlation must be accounted for in the analysis 

– Survey analysis programs do this through strata/PSU

• Area samples may have more clustering than telephone samples

Page 38: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Variance Estimation for Surveys

• Linearization: Uses a Taylor series expansion to estimate variance of non‐linear estimators 

– Default method for most stats programs

– Requires stratification and PSU information

• Replication methods: Calculates different parameter estimates for each replicate and combines these to estimate variance. 

– Jackknife with replicate weights available for a number of SUDAAN, STATA, SAS and WesVAR procedures

Page 39: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Statistical Software for Analyzing Health Surveys

– SAS version 9 ..– STATA ‐‐ Survey module– SPSS – Complex Samples module– EPIINFO for windows – Complex Sample Analysis

Selection of Softweare to Use for Analysis

For correct variance estimates need program that can incorporate complex sampling design

*Needed when doing statistical testing*Standard errors will tend to be larger*Less likely to make Type I error

Page 40: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

HANDS ON SESSION

Page 41: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

NHANES Data Retrieval & Manipulation System 

Page 42: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

The system retrieves design related variables (primary sampling unit, stratification, and sampling weight) 

Page 43: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,
Page 44: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,
Page 45: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,
Page 46: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,
Page 47: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,
Page 48: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

1‐ SAS (The SURVEYFREQ Procedure)

Selected syntax for PROC SURVEYFREQ:

PROC SURVEYFREQ DATA=SAS‐data‐set <options>;TABLES requests/ CHISQ CHISQ1 LRCHISQ LRCHISQ1DEFF;CLUSTER variables;STRATA variables </ LIST>;WEIGHT variable;RUN;

Page 49: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

• Selected syntax for PROC SURVEYMEANS:

PROC SURVEYMEANS DATA=SAS‐data‐set<options>;CLASS variables;STRATA variables ;VAR variables ;WEIGHT variable ;

RUN;

1‐ SAS (The SURVEYMEANS Procedure)

Page 50: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

1‐ SAS (The SURVEYREG Procedure)Selected statements of the SURVEYREG procedure:

The SURVEYREG Procedure

PROC SURVEYREG <options>;CLASS variables;STRATA variables ;MODEL dependent=<effects> </options> ;WEIGHT variable ;ESTIMATE ‘label’ effect values<effect values …>

RUN;

Page 51: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

1‐ SAS (The SURVEYLOGISTIC Procedure)Selected statements of the SURVEYLOGISTIC procedure:

PROC SURVEYLOGISTIC DATA=SAS‐data‐set <options>;CLASS variable <(options)>;CLUSTER variables;STRATA variables </ LIST>;WEIGHT variable;MODEL variable <(options)> = effects </options>;CONTRAST ‘label’ effect values, …,effect values </options>;

RUN;

Page 52: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

2‐ STATA (EXAMPLE 1)Set memory 4m   

– If you are using Intercooled STATA

use "H:\secondary data workshop\dental2_st_9.dta"

svyset [pweight=wtpfex6], strata(sdpstra6) psu(sdppsu6) 

svy: tabulate needcaresvy: tabulate dentalcariessvy: tabulate hssex dentalcaries, row cisvy: tabulate  hssex needcare, row cisvy: logistic dentalcaries hssexsvy: logistic needcare hssex

Page 53: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

2‐ STATA (EXAMPLE 2)Set memory 4m   

– If you are using Intercooled STATA

use "H:\secondary data workshop\EPI10.dta"

svyset [pweight=popw], strata(location) psu(cluster) 

svy: tabulate prenatalsvy: tabulate vacsvy: tabulate prenatal vac, row cisvy: logistic vac prenatal

Page 54: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

3‐ SPSS (Sample Plan)

• Sampling Plan: – Enables you to specify the sampling frame to create a complex sample design used by companion procedures in the SPSS Complex Samples module

– EXAMPLE– CSPLAN ANALYSIS

– /PLAN FILE='F:\secondary data workshop\EPI10_SPSS.csaplan'

– /PLANVARS ANALYSISWEIGHT=POPW       

– /SRSESTIMATOR TYPE=WOR

– /PRINT PLAN

– /DESIGN STRATA=LOCATION CLUSTER=CLUSTER 

– /ESTIMATOR TYPE=WR.

Page 55: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

3‐ SPSS (Syntax)

* Complex Samples Frequencies.

CSTABULATE

/PLAN FILE='F:\secondary data workshop\EPI10_SPSS.csaplan'

/TABLES VARIABLES=PRENATAL VAC

/CELLS POPSIZE TABLEPCT

/STATISTICS SE CIN(95) COUNT DEFF 

/MISSING SCOPE=TABLE CLASSMISSING=EXCLUDE.

* Complex Samples Crosstabs.

CSTABULATE

/PLAN FILE='F:\secondary data workshop\EPI10_SPSS.csaplan'

/TABLES VARIABLES=PRENATAL BY VAC

/CELLS POPSIZE ROWPCT COLPCT

/STATISTICS SE CIN(95) COUNT DEFF

/TEST ODDSRATIO INDEPENDENCE

/MISSING SCOPE=TABLE CLASSMISSING=EXCLUDE.

Page 56: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

3‐ SPSS (Syntax)

* Complex Samples Logistic Regression.

CSLOGISTIC  VAC(LOW) BY PRENATAL

/PLAN FILE='F:\secondary data workshop\EPI10_SPSS.csaplan' 

/MODEL PRENATAL

/INTERCEPT INCLUDE=YES SHOW=YES

/STATISTICS EXP CINTERVAL DEFF

/TEST TYPE=F PADJUST=LSD

/ODDSRATIOS FACTOR=[PRENATAL(LOW)]

/MISSING CLASSMISSING=EXCLUDE

/CRITERIA MXITER=100 MXSTEP=5 PCONVERGE=[1E‐006 RELATIVE] LCONVERGE=[0] CHKSEP=20 CILEVEL=95

/PRINT VARIABLEINFO SAMPLEINFO.

Page 57: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

4‐ EPIINFO 

• Complex Sample Descriptives

• Complex Sample Tabulate

Page 58: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

Data/Research Resources• Univ. of Michigan Consortium for social research: 

http://www.icpsr.umich.edu/

• UCLA Statistical Computing: http://www.ats.ucla.edu/stat/

• BRFSS Maps    http://apps.nccd.cdc.gov/gisbrfss/default.aspx

• State Cancer Profiles   http://statecancerprofiles.cancer.gov/

• http://www.cdc.gov/nchs/express.htm

• http://www.ahrq.gov/

• http://www.census.gov/

• http://www.oshpd.ca.gov/SITEMAP.htm

• http://seer.cancer.gov/data/

Page 59: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

ReferencesKorn, E.L. and Graubard, B.I. (1999). Analysis of 

Health Surveys. New York: John Wiley.

State Cancer Profiles: http://statecancerprofiles.cancer.gov/

SUDAAN: http://www.rti.org/SUDAAN/

SAS:  http://www.sas.com/

SPSS: http://www.spss.com/

STATA:  http://www.stata.com/

WesVar: http://www.westat.com/wesvar/

Mplus: http://www.statmodel.com/

Page 60: SECONDARY DATA SOURCES FOR RESEARCH EPIDEMIOLOGICAL … · secondary data sources for research. epidemiological and statistical considerations. mscr program and axis research design,

THANK YOU

FEEDBACK AND EVALUATION