Using the 2008 OFHS Public Use File
A Self Guided Tutorial
*SAS Version*
Jan 11, 2016
Introduction
• This tutorial is intended for persons who wish to use the 2008 OFHS Public Use File (PUF).
• The PUFs exclude any information that could, intentionally or unintentionally, identify a respondent. Geographic information below the county level has been removed.
• The dataset is a record of the responses to the survey questions at the respondent level.
• The dataset is in a format that requires the use of SAS, a statistical analysis software package from SAS Institute.
• The dataset is also available for Stata and SPSS. There is a separate tutorial for Stata users.
SAS Users
• Prerequisites
  – User has SAS version 9.1 or later.
  – User has experience writing SAS programs and running them in the SAS Display Manager user interface, or in SAS Enterprise Guide.
  – User has an understanding of basic statistics, including analysis of univariate data using nominal and ordinal level variables.
  – User is comfortable with statistical terms such as proportions, standard error, confidence level, and confidence interval.
OFHS Background
• The 2008 OFHS is the largest State sponsored health survey in the U.S.
• Previous surveys were completed in 1998 and 2004.
• The survey had a sample size of 50,993.
• The sample was stratified so that each county in the state has enough respondents to support some county-level analysis.
Documents that you may download before you get started.
• OFHS Questionnaire
• OFHS Codebook
• OFHS Methods Report
These documents are available on the OFHS web site.
http://grc.osu.edu/ofhs
Look on the Reports page.
What you need to know about the survey.
• Survey Design
• Survey Questions
• Imputation of Missing Values
• Weighting of Responses
• Constructed Variables
Survey Design
• The survey is a stratified random sample of Ohio’s non-institutional population.
  – Conducted through telephone interviews.
    • Land Lines (49,000 respondents)
    • Cell Phone (2,000 respondents)
  – Random Digit Dialing (land lines) within exchange numbers associated with each county.
    • Exchanges are the first 3 digits of a seven digit phone number.
    • The last four digits within each exchange are randomly selected.
Survey Design
  – Cell Phones
    • Exchanges are at state level.
  – Over Samples
    • African Americans - Some exchanges in the 6 largest urban counties have a higher proportion of African Americans in the population. The higher proportion exchanges were sampled at a higher rate.
    • Asians and Hispanics - Supplementation of the survey with lists of persons with Hispanic or Asian surnames.
  – Household clusters
    • Each household/family forms a cluster within the sample.
      – One adult and one child are randomly selected within the family.
      – Each response includes information on the adult, and the child (if there are any children).
      – The adult who is most knowledgeable about the child’s health responds for the child.
Survey Design
• The population of persons within each of the strata (State, County, telephone exchange, household, etc.) is already known or is collected as a part of the survey.
• A weight is established for each child and adult which reflects the inverse of the probability of being selected for the survey.
• Indicators of the strata and the weights are used in the SAS programs. We will come back to this later on.
Survey Questions
• In the survey questionnaire there are different kinds of questions. They include:
  – Qs that help to establish the weights for the survey.
    • How many children are in the family?
    • How many phone numbers are in the home?
Survey Questions
  – Qs that identify the demographic and socioeconomic characteristics of the individuals and the family.
    • Age, gender, race, ethnicity.
    • Family income, employment, industry.
    • Education
Survey Questions
  – Qs that identify the insurance status of the adult and child respondents.
    • Source of Coverage (Job based, Medicare, Medicaid, etc.)
    • If no insurance, the length of time without insurance, reason for being uninsured.
    • If insured, length of time covered by current plan.
    • Types of Coverage (dental, prescriptions, vision, mental health)
Survey Questions
  – Health Status of Adult and Child
    • General health status
    • Chronic health conditions
    • Special Health Care needs
    • Functional disability
    • Height and weight
Survey Questions
• Health Care Access, Utilization, Satisfaction and Unmet needs.
  – Usual source of care
  – Care coordination
  – Specialists
  – Emergency room use
  – Hospitalizations
  – Types of unmet needs.
Survey Questions
• Questions are at multiple levels.
  – Anchor Questions are questions that are asked of everyone.
  – Qualifying Questions are questions that help to narrow down who should be responding to an in-depth question.
  – In-depth questions probe the dimensions of the respondent’s experience with a particular phenomenon.
Example of Question levels
D43. //Have you/Has person in S1// ever been told by a doctor or any other health professional that //you/he// had diabetes or sugar diabetes?
01 YES
02 (Skip to D45) NO
03 [VOLUNTEERED:] BORDERLINE
98 DK
99 REFUSED
D43a //Have you/Has person in S1// ever been told by a doctor or any other health professional that //you/he/she// had TYPE 1 CHILD ONSET DIABETES or TYPE 2 ADULT ONSET, DIABETES?
[INTERVIEWER NOTE: PROBE FOR TYPE, AND IF RESPONDENT SAYS ‘BORDERLINE’ CODE AS ‘03’]
//Display response option 97, only if S15 = 02, 99.//
97 (Skip to D45) [VOLUNTEERED:] YES, “GESTATIONAL” OR “ONLY WHEN PREGNANT” MENTIONED
01 YES - TYPE I (JUVENILE)
02 YES - TYPE II (ADULT ONSET)
03 [VOLUNTEERED:] BORDERLINE DIAGNOSIS ONLY
04 (Skip to D45) NO, NEVER DIAGNOSED WITH DIABETES
98 (Skip to D45) DK
99 (Skip to D45) REFUSED
Anchor Question
Example of Question levels
D43b. //If (s15 = 02) then ask:// //Was your/Was person in S1’s// DIABETES only during a time associated with a pregnancy? [INTERVIEWER: PROBE FOR PROPER CODE]
01 (Skip to D45) YES ONLY WHEN PREGNANT
02 NO
98 (Skip to D45) DK
99 (Skip to D45) REFUSED
D44. //Is your/Is person on S1’s// blood sugar or glucose level, which affects diabetes, USUALLY under control or where a physician wants it, even if medication is required Always, Usually, Sometimes, Rarely, or Never?
01 ALWAYS
02 USUALLY
03 SOMETIMES
04 RARELY
05 NEVER
98 DK
99 REFUSED
Qualifying Question
In Depth Question
Question levels
• Notice in the example that there are instructions to skip to another question if the answer is no.
• The anchor questions and qualifying questions screen out respondents for whom the in-depth questions do not apply.
• As a result, when a question is not asked of a respondent it creates a missing value for the respondent which is MISSING BY DESIGN.
Missing Values
• Some data is missing in the survey because the respondent refused to answer the question, or did not know the answer.
• These kinds of missing values need to be treated differently than those that are ‘missing by design’.
Missing Values
• There are some types of questions which are very important to the survey design or for public policy issues, for which it is not acceptable to have values missing.
• These include questions like:– Number of children in the family (design)– Family Income (public policy)
Imputation of Missing Values
• Where it is important for the survey not to have any missing values, the survey statisticians have replaced each missing value with a value imputed from other respondents who answered the other survey questions in a similar way.
• Survey statisticians use very sophisticated models and processes to do imputation, and the practice is well accepted.
• When using this survey to do analysis, the user is expected to decide whether to use the form of each variable that includes the imputed values.
• Imputed variables have a suffix of “_imp”.
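One way to see which variables carry imputed values is to list every variable whose name ends in “_imp”. The sketch below assumes the setup steps later in this tutorial have already been completed, so that the libref ofhs points at the directory holding OFHS2008.sas7bdat; the title text is only illustrative.

```sas
/* Assumption: libref OFHS is already assigned to the folder with OFHS2008.sas7bdat */
title 'OFHS 2008 variables that include imputed values';

proc sql;
  /* DICTIONARY.COLUMNS stores libref and member names in upper case */
  select name, label
    from dictionary.columns
    where libname = 'OFHS'
      and memname = 'OFHS2008'
      /* the underscore is a LIKE wildcard, so it is escaped with ^ */
      and upcase(name) like '%^_IMP' escape '^';
quit;

title;
```

Each name in the listing (for example h87_imp, which is used later in this tutorial) can then be compared against the codebook.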
Weighting
• Weights for each adult and child response, which reflect the inverse of the probability of being selected for the survey, are constructed and should be used in all analyses.
• When the weights are used, the results accurately reflect the entire population.
Weighting
• If the weights for children in the OFHS were summed up across all responses, the total would be equal to the child population of Ohio. The same is true of the adult weights.
• The variable name for the adult weight is “wt_a”.
• The variable name for the child weight is “wt_c”.
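A quick way to check this property is to sum the two weight variables. This is a minimal sketch that assumes the libref ofhs has already been assigned as described in the setup steps later in this tutorial.

```sas
/* Sum the adult and child weights across all responses.        */
/* Assumption: libref OFHS points at the unzipped OFHS dataset. */
proc means data=ofhs.ofhs2008 n sum maxdec=0;
  var wt_a wt_c;
run;
```

The two sums should be close to the weighted totals reported in the result tables later in this tutorial. PROC MEANS is used here only to add up the weights; estimates and standard errors should still come from the survey procedures.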
Constructed Variables
• There are many variables in the OFHS file that are constructed from the responses to the survey questions to make the OFHS easier to use. These variables include:
  – BMI – Body mass index. BMI is an indicator of adult and child obesity constructed from height and weight. The formula is complicated, especially for children. We make it easier for the user to do analysis of obesity by pre-calculating it.
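For orientation, the standard adult BMI calculation from weight in pounds and height in inches can be sketched as below. The dataset name, variable names and values are made up for illustration and are not OFHS variables, and this is not necessarily the exact computation behind the OFHS constructed variable. For children, BMI is additionally interpreted against age- and sex-specific reference values, which is not shown here.

```sas
/* Illustration only: hypothetical names and values, not OFHS variables */
data bmi_demo;
  weight_lb = 170;   /* weight in pounds */
  height_in = 68;    /* height in inches */
  /* Standard adult formula: 703 * pounds / inches squared */
  bmi = 703 * weight_lb / (height_in ** 2);
  format bmi 5.1;
run;

proc print data=bmi_demo noobs;
run;
```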
Constructed Variables
– Insurance Type – In many instances, respondents to the survey had more than one source of insurance. For example, many seniors have insurance from their private pension plans and Medicare. For the purpose of creating an unduplicated count of the population by their insurance status, we have created a variable which imposes a hierarchy of insurance sources to classify the population.
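The idea of a hierarchy can be sketched with an IF/ELSE chain in which the first matching rule wins, so each person lands in exactly one category. The flag variables and the order of the rules below are assumptions made for illustration; they are not the documented OFHS hierarchy. The category labels are taken from the result tables later in this tutorial.

```sas
/* Illustration only: flag names and rule order are assumptions, */
/* not the documented OFHS insurance hierarchy.                   */
data hierarchy_demo;
  input id has_medicaid has_medicare has_job has_direct;
  length ins_type $ 26;
  /* The first true condition decides the category */
  if has_medicaid and has_medicare then ins_type = '01-MEDICAID AND MEDICARE';
  else if has_medicaid then ins_type = '02-MEDICAID, NO MEDICARE';
  else if has_medicare then ins_type = '03-MEDICARE, NO MEDICAID';
  else if has_job      then ins_type = '04-JOB-BASED COVERAGE';
  else if has_direct   then ins_type = '05-DIRECTLY PURCHASED';
  else ins_type = '08-UNINSURED';
  datalines;
1 1 1 0 0
2 0 1 1 0
3 0 0 1 1
4 0 0 0 0
;
run;

proc print data=hierarchy_demo noobs;
run;
```

In this sketch person 2 has both Medicare and job-based coverage but is counted once, under Medicare, because that rule comes first.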
Using SAS with the OFHS
• Step 1. Make your PC Ready.
• Step 2. Download and Un-zip the SAS dataset.
• Step 3. Assign a SAS Library name and restore SAS formats.
• Step 4. Build and run your first OFHS SAS Program
Make Your PC Ready
• Create a directory for the OFHS Public Use File. It should look like this:
C:\sasdata\ofhs2008
• Make sure that you have software to decompress the SAS dataset. WinZip is a popular product which works well for this.
• Make sure there is enough room on the drive for the OFHS file after it is unzipped. You will need at least 800 megabytes of storage space. You will need additional temporary work space for when the file is processing. You may want to put the file on a separate drive from the drive which houses the temporary work space (typically Drive C).
Download and Unzip the SAS dataset.
• You will find the OFHS Public Use Dataset at:
http://grc.osu.edu/ofhs/datadownloads/index.htm
• Right click on the file name and select ‘save target as’.
• Save the ZIP file to the directory where you will store the data (c:\sasdata\ofhs2008).
• After the file has been saved, run WinZip, saving the unzipped files to the same directory.
Download and Unzip the SAS dataset
• After you download and unzip the data, the directory will contain the following files:
Formats.sas7bdat
Restore_formats.sas
OFHS2008.sas7bdat
Assign a SAS Library name and restore SAS formats
• First, you must start SAS or SAS Enterprise Guide.
• Open the Restore_Formats.sas in the program editor window.
Assign a SAS Library name and restore SAS formats
The Restore_formats file will look like this:
/* This program creates a formats catalog from an existing formats dataset,
formats.sas7bdat, which should be in the current directory.
The resulting formats catalog file will be created in the current directory. */
LIBNAME ofhs 'D:\final_data_delivery_021109';
libname library 'D:\SASFORMATS';
proc format library=library cntlin=library.formats;
run;
You need to change both LIBNAME statements to reflect the drive and directory location of the files that you unzipped. The ofhs library must point to the directory that holds OFHS2008.sas7bdat, and the library named library must point to the directory that holds Formats.sas7bdat, because the program reads the formats dataset from there.
You can now ‘submit’ or ‘run’ the restore formats program.
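As a sketch, if the files were unzipped to the directory suggested earlier in this tutorial (C:\sasdata\ofhs2008), the edited program might look like the following. The PROC CONTENTS step is an added check that is not part of the delivered program.

```sas
/* Assumption: all three files were unzipped to C:\sasdata\ofhs2008 */
libname ofhs    'C:\sasdata\ofhs2008';
libname library 'C:\sasdata\ofhs2008';

/* Build the formats catalog from the formats dataset */
proc format library=library cntlin=library.formats;
run;

/* Added check: confirm the dataset opens and list its variables */
proc contents data=ofhs.ofhs2008 varnum;
run;
```

By default SAS searches a library named LIBRARY for formats, which is why the libref is given that name.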
Build and run your first OFHS SAS Program
• You should only use procedures in SAS that support the use of complex survey designs, including:
  – Proc Surveymeans
  – Proc Surveyfreq
  – Proc Surveylogistic
  – Proc Surveyreg
• Most newcomers will use Proc Surveymeans to start out. If you are familiar with Surveylogistic or Surveyreg, you probably do not need this tutorial.
Proc Surveymeans
Here is a simple program which calculates the percent of children by Insurance Type. It includes a 95% confidence interval around the mean. Note the statements that describe the complex sampling design (Stratum and Weight). The Stratum variable will always stay the same. There are different weights for children (WT_C) and adults (WT_A).
Proc Surveymeans data=ofhs.ofhs2008 ALPHA=.05 nobs mean CLM SUMWGT CLSUM;
Stratum STRATUM;
Weight WT_c;
Var i_type_c;
Class i_type_c;
run;
Proc Surveymeans results (with a little cutting and pasting and formatting of values)
Statistics
Child insurance type          N  Sum of Weights    Mean  Std Error of Mean  95% CL for Mean  95% CL for Sum
01-MEDICAID AND MEDICARE    219         2754928   1.94%              0.17%   1.62%    2.27%     44,488     62,597
02-MEDICAID, NO MEDICARE   3276         2754928  30.92%              0.55%  29.83%   32.00%    817,888    885,551
03-MEDICARE, NO MEDICAID    124         2754928   0.64%              0.08%   0.48%    0.81%     13,196     22,339
04-JOB-BASED COVERAGE      7933         2754928  53.29%              0.57%  52.17%   54.42%  1,436,884  1,499,420
05-DIRECTLY PURCHASED       377         2754928   2.55%              0.18%   2.19%    2.90%     60,404     79,942
06-OTHER                     93         2754928   0.63%              0.09%   0.45%    0.81%     12,312     22,411
07-INSURED TYPE UNKNOWN     717         2754928   5.99%              0.29%   5.42%    6.55%    149,357    180,409
08-UNINSURED                704         2754928   4.04%              0.21%   3.63%    4.46%     99,869    122,790
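To see where the confidence limits for the mean come from, the limits for one row can be approximately reproduced as the estimate plus or minus a critical value times the standard error. The sketch below uses the rounded values from the 02-MEDICAID, NO MEDICARE row and a normal critical value. That is an approximation: Proc Surveymeans uses a t distribution with design-based degrees of freedom, so the last digit can differ slightly.

```sas
/* Approximate check of one confidence interval from the results above */
data _null_;
  mean_pct = 30.92;     /* Mean for 02-MEDICAID, NO MEDICARE        */
  se_pct   = 0.55;      /* Std Error of Mean for the same row       */
  z = probit(0.975);    /* two-sided 95% critical value, about 1.96 */
  lower = mean_pct - z * se_pct;
  upper = mean_pct + z * se_pct;
  put lower= 6.2 upper= 6.2;   /* written to the SAS log */
run;
```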
Proc Surveymeans
Now you might add some domain analysis to this, breaking out insurance status for children by poverty level.
/* Create a copy of the file with a two-level poverty variable, FPL200 */
DATA ofhs.children;
SET OFHS.OFHS2008;
if h87_imp in ('01','02','03','04') then FPL200='0 to 200% FPL';
else fpl200='201+ % FPL';
run;
/* Analyze the new dataset, which contains FPL200 */
Proc Surveymeans data=ofhs.children ALPHA=.05 nobs mean CLM SUMWGT CLSUM;
Stratum STRATUM;
Weight WT_c;
Var i_type_c;
Class i_type_c;
Domain fpl200;
run;
Surveymeans with a Domain Statement
Percent of Federal Poverty level
and Child insurance type      N  Sum of Weights    Mean  Std Error of Mean  95% CL for Mean  95% CL for Sum
0 to 200% FPL
01-MEDICAID AND MEDICARE    183       1,250,227   3.77%              0.35%   3.08%    4.45%     38,404     55,743
02-MEDICAID, NO MEDICARE   2822       1,250,227  58.93%              0.87%  57.23%   60.63%    704,848    768,729
03-MEDICARE, NO MEDICAID     61       1,250,227   0.74%              0.13%   0.48%    0.99%      6,032     12,364
04-JOB-BASED COVERAGE      1420       1,250,227  20.67%              0.69%  19.32%   22.01%    240,535    276,244
05-DIRECTLY PURCHASED        97       1,250,227   1.42%              0.19%   1.05%    1.79%     13,175     22,424
06-OTHER                     41       1,250,227   0.62%              0.14%   0.35%    0.89%      4,327     11,192
07-INSURED TYPE UNKNOWN     395       1,250,227   7.69%              0.49%   6.73%    8.65%     83,876    108,516
08-UNINSURED                459       1,250,227   6.16%              0.40%   5.39%    6.93%     67,208     86,838
201+ % FPL
01-MEDICAID AND MEDICARE     36       1,504,701   0.43%              0.09%   0.25%    0.61%      3,809      9,129
02-MEDICAID, NO MEDICARE    454       1,504,701   7.64%              0.45%   6.75%    8.52%    101,178    128,684
03-MEDICARE, NO MEDICAID     63       1,504,701   0.57%              0.11%   0.35%    0.79%      5,264     11,875
04-JOB-BASED COVERAGE      6513       1,504,701  80.40%              0.63%  79.16%   81.64%  1,179,403  1,240,122
05-DIRECTLY PURCHASED       280       1,504,701   3.48%              0.29%   2.91%    4.05%     43,718     61,030
06-OTHER                     52       1,504,701   0.64%              0.13%   0.39%    0.88%      5,886     13,318
07-INSURED TYPE UNKNOWN     322       1,504,701   4.56%              0.34%   3.90%    5.23%     58,565     78,808
08-UNINSURED                245       1,504,701   2.28%              0.20%   1.88%    2.68%     28,245     40,369
Proc Surveyfreq
proc surveyfreq data=ofhs.ofhs2008;
stratum stratum;
Tables h87_imp*insrd_a / alpha=.05 cl clwt deff;
weight WT_A;
run;
Results of Proc Surveyfreq
Table of H87_IMP by INSRD_A
H87_IMP: Percent of Federal Poverty Level
H87_IMP           INSRD_A           Frequency   Weighted  Std Dev of  95% Confidence Limits    Percent  Std Err of  95% Confidence Limits    Design
                                               Frequency    Wgt Freq  for Wgt Freq                         Percent  for Percent              Effect
01-LESS THAN 63%  01-YES, INSURED        3192     527141       13696      500297      553985    6.0508      0.1552      5.7465       6.355   2.1588
                  02-NO, UNINSURED        905     220301       10051      200601      240000    2.5287      0.1143      2.3047      2.7527   2.7002
                  Total                  4097     747441       16781      714550      780333    8.5795      0.1881      8.2107      8.9482   2.2985
02-63% - 100%     01-YES, INSURED        3615     467450       12246      443447      491454    5.3656      0.1395      5.0922       5.639   1.9527
                  02-NO, UNINSURED        791     174675        8751      157522      191828     2.005      0.0998      1.8095      2.2005   2.5803
                  Total                  4406     642126       14911      612901      671351    7.3706      0.1684      7.0404      7.7008   2.1171
03-101% - 150%    01-YES, INSURED        4764     714655       15586      684106      745203    8.2031      0.1761      7.8579      8.5483   2.0984
                  02-NO, UNINSURED       1105     253800       10698      232833      274768    2.9132      0.1215      2.6751      3.1514   2.6587
                  Total                  5869     968455       18660      931881     1005029   11.1163      0.2082     10.7084     11.5243    2.234
04-151% - 200%    01-YES, INSURED        4044     618783       14539      590287      647279    7.1027      0.1647      6.7798      7.4256   2.0954
                  02-NO, UNINSURED        727     156246        8220      140135      172356    1.7935      0.0938      1.6095      1.9774   2.5472
                  Total                  4771     775029       16553      742585      807473    8.8961      0.1864      8.5308      9.2614   2.1836
05-201% - 250%    01-YES, INSURED        4613     701052       15205      671251      730853     8.047      0.1726      7.7087      8.3852   2.0504
                  02-NO, UNINSURED        623     148023        8291      131772      164273    1.6991      0.0946      1.5136      1.8845   2.7307
                  Total                  5236     849074       17153      815455      882694     9.746      0.1933      9.3673     10.1248   2.1629
06-251% - 300%    01-YES, INSURED        4326     718899       15756      688017      749780    8.2518      0.1786      7.9018      8.6019   2.1462
                  02-NO, UNINSURED        364      84203        6238       71976       96430    0.9665      0.0714      0.8266      1.1064   2.7116
                  Total                  4690     803102       16841      770093      836111    9.2184      0.1901      8.8457       9.591   2.2009
07-301% OR MORE   01-YES, INSURED       21106    3731978       29753     3673662     3790293   42.8373        0.32     42.2101     43.4644   2.1298
                  02-NO, UNINSURED        769     194784        9739      175695      213873    2.2358       0.111      2.0183      2.4533   2.8704
                  Total                 21875    3926762       30529     3866925     3986599   45.0731      0.3227     44.4406     45.7056   2.1429
Total             01-YES, INSURED       45660    7479957       32411     7416432     7543483   85.8582      0.2482     85.3716     86.3448   2.5856
                  02-NO, UNINSURED       5284    1232032       22843     1187259     1276805   14.1418      0.2482     13.6552     14.6284   2.5856
                  Total                 50944    8711989       32575     8648142     8775836       100
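If only the overall insured and uninsured percentages for adults are needed, a one-way table is a simpler starting point. The sketch below reuses the statements and variables from the program above and changes only the Tables request.

```sas
/* One-way weighted table of adult insurance status */
proc surveyfreq data=ofhs.ofhs2008;
  stratum stratum;
  weight wt_a;
  tables insrd_a / alpha=.05 cl clwt deff;
run;
```

The percentages should be close to the Total rows of the two-way table of H87_IMP by INSRD_A, where about 85.9% of adults are insured and about 14.1% are uninsured.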
Domain Analysis in Proc Surveyfreq
There is no Domain statement for Proc Surveyfreq. Instead, add the domain variable(s) to the front of the Tables statement; SAS then produces a separate table for each level of the leading variable.
proc surveyfreq data=ofhs.ofhs2008;
stratum stratum;
Tables h87_imp*s15_imp*insrd_a / alpha=.05 cl clwt deff;
weight WT_A;
run;
Domain Analysis in Proc Surveyfreq
Table of S15_IMP by INSRD_A
Controlling for H87_IMP=01-LESS THAN 63%
S15_IMP    INSRD_A           Frequency   Weighted  Std Dev of  95% Confidence Limits    Percent  Std Err of  95% Confidence Limits    Design
                                        Frequency    Wgt Freq  for Wgt Freq                         Percent  for Percent              Effect
01-MALE    01-YES, INSURED         806     179624        8888      162203      197046   24.0319      1.0248     22.0234     26.0404  29.3024
           02-NO, UNINSURED        370     125105        8121      109188      141022   16.7377      0.9701     14.8363     18.6392  34.4017
           Total                  1176     304729       11984      281240      328218   40.7696      1.1705     38.4754     43.0639  28.9056
02-FEMALE  01-YES, INSURED        2386     347516       10583      326774      368259   46.4941      1.1392     44.2613      48.727  26.5754
           02-NO, UNINSURED        535      95196        5995       83445      106946   12.7362       0.755     11.2564      14.216   26.128
           Total                  2921     442712       12079      419037      466388   59.2304      1.1705     56.9361     61.5246  28.9056
Total      01-YES, INSURED        3192     527141       13696      500297      553985    70.526      1.1032     68.3638     72.6883  29.8247
           02-NO, UNINSURED        905     220301       10051      200601      240000    29.474      1.1032     27.3117     31.6362  29.8247
           Total                  4097     747441       16781      714550      780333       100
Table of S15_IMP by INSRD_A
Controlling for H87_IMP=02-63% - 100%
S15_IMP    INSRD_A           Frequency   Weighted  Std Dev of  95% Confidence Limits    Percent  Std Err of  95% Confidence Limits    Design
                                        Frequency    Wgt Freq  for Wgt Freq                         Percent  for Percent              Effect
01-MALE    01-YES, INSURED         945     167436        8347      151076      183797   26.0753      1.0958     23.9275     28.2231  31.7351
           02-NO, UNINSURED        269      85825        6641       72810       98841   13.3658      0.9426     11.5183     15.2133  39.0883
           Total                  1214     253262       10633      232422      274102   39.4411      1.2116     37.0663      41.816  31.3118
02-FEMALE  01-YES, INSURED        2670     300014        9104      282169      317859    46.722      1.1703     44.4283     49.0158   28.028
           02-NO, UNINSURED        522      88850        5743       77593      100107   13.8368       0.831     12.2081     15.4655  29.5046
           Total                  3192     388864       10702      367888      409840   60.5589      1.2116      58.184     62.9337  31.3118
Total      01-YES, INSURED        3615     467450       12246      443447      491454   72.7973      1.1287     70.5852     75.0095  32.7702
           02-NO, UNINSURED        791     174675        8751      157522      191828   27.2027      1.1287     24.9905     29.4148  32.7702
           Total                  4406     642126       14911      612901      671351       100
The END