Using the 2008 OFHS Public Use File A Self Guided Tutorial SAS Version


Introduction

• This tutorial is intended for persons who wish to use the 2008 OFHS Public Use File (PUF).

• The PUF excludes any information that could identify a respondent, whether intentionally or unintentionally. Geographic information below the county level has been removed.

• The dataset is a record of the responses to the survey questions at the respondent level.

• The dataset is in a format that requires the use of SAS, statistical analysis software from SAS Institute.

• The dataset is also available for STATA and SPSS. There is a separate tutorial for STATA users.

SAS Users

• Prerequisites

– User has SAS version 9.1 or later.

– User has experience writing SAS programs and running them in the SAS Display Manager user interface, or in SAS Enterprise Guide.

– User has an understanding of basic statistics, including analysis of univariate data using nominal and ordinal level variables.

– User is comfortable with statistical terms such as proportions, standard error, confidence level, and confidence interval.

OFHS Background

• The 2008 OFHS is the largest state-sponsored health survey in the U.S.

• Previous surveys were completed in 1998 and 2004.

• The survey had a sample size of 50,993.

• The survey was stratified to have enough respondents to do some analysis for each county in the state.

Documents that you may download before you get started.

• OFHS Questionnaire

• OFHS Codebook

• OFHS Methods Report

These documents are available on the OFHS web site.

http://grc.osu.edu/ofhs

Look on the Reports page.

What you need to know about the survey.

• Survey Design

• Survey Questions

• Imputation of Missing Values

• Weighting of Responses

• Constructed Variables

Survey Design

• The survey is a stratified random sample of Ohio’s non-institutional population.

– Conducted through telephone interviews.

• Land Lines (49,000 respondents)

• Cell Phone (2,000 respondents)

– Random Digit Dialing (land lines) within exchange numbers associated with each county.

• Exchanges are the first 3 digits of a seven digit phone number.

• The last four digits within each exchange are randomly selected.

Survey Design

– Cell Phones

• Exchanges are at state level.

– Over Samples

• African Americans - Some exchanges in the 6 largest urban counties have a higher proportion of African Americans in the population. The higher proportion exchanges were sampled at a higher rate.

• Asians and Hispanics - Supplementation of the survey with lists of persons with Hispanic or Asian surnames.

– Household clusters

• Each household/family forms a cluster within the sample.

– One adult and one child are randomly selected within the family.

– Each response includes information on the adult, and the child (if there are any children).

– The adult who is most knowledgeable about the child’s health responds for the child.

Survey Design

• The population of persons within each of the strata (State, County, telephone exchange, household, etc.) is already known or is collected as a part of the survey.

• A weight is established for each child and adult which reflects the inverse of the probability of being selected for the survey.

• Indicators of the strata and the weights are used in the SAS programs. We will come back to this later on.
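As a rough illustration of what "the inverse of the probability of being selected" means, the sketch below builds a toy dataset. The dataset name weight_demo and the selection probabilities are made up for illustration and are not OFHS values; the weights delivered in the Public Use File are already calculated for you.

```sas
/* Toy illustration only: these selection probabilities are invented. */
data weight_demo;
    input person $ select_prob;
    /* A respondent chosen with probability p stands in for 1/p people. */
    weight = 1 / select_prob;
    datalines;
A 0.001
B 0.002
C 0.0005
;
run;

proc print data=weight_demo noobs;
run;
```

A respondent who had a smaller chance of being selected receives a larger weight, which is why over-sampled groups do not dominate the weighted results.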

Survey Questions

• In the survey questionnaire there are different kinds of questions. They include:

– Qs that help to establish the weights for the survey.

• How many children are in the family?

• How many phone numbers are in the home?

Survey Questions

– Qs that identify the demographic and socioeconomic characteristics of the individuals and the family.

• Age, gender, race, ethnicity.

• Family income, employment, industry.

• Education

Survey Questions

– Qs that identify the insurance status of the adult and child respondents.

• Source of Coverage (Job based, Medicare, Medicaid, etc.)

• If no insurance, the length of time without insurance, reason for being uninsured.

• If insured, length of time covered by current plan.

• Types of Coverage (dental, prescriptions, vision, mental health)

Survey Questions

– Health Status of Adult and Child

• General health status

• Chronic health conditions

• Special Health Care needs

• Functional disability

• Height and weight

Survey Questions

• Health Care Access, Utilization, Satisfaction and Unmet needs.

– Usual source of care

– Care coordination

– Specialists

– Emergency room use

– Hospitalizations

– Types of unmet needs.

Survey Questions

• Questions are at multiple levels.

– Anchor Questions are questions that are asked of everyone.

– Qualifying Questions are questions that help to narrow down who should be responding to an in-depth question.

– In-depth questions probe the dimensions of the respondent’s experience with a particular phenomenon.

Example of Question levels

Anchor Question

D43. //Have you/Has person in S1// ever been told by a doctor or any other health professional that //you/he// had diabetes or sugar diabetes?

01 YES
02 (Skip to D45) NO
03 [VOLUNTEERED:] BORDERLINE
98 DK
99 REFUSED

D43a //Have you/Has person in S1// ever been told by a doctor or any other health professional that //you/he/she// had TYPE 1 CHILD ONSET DIABETES or TYPE 2 ADULT ONSET, DIABETES?

[INTERVIEWER NOTE: PROBE FOR TYPE, AND IF RESPONDENT SAYS ‘BORDERLINE’ CODE AS ‘03’]

//Display response option 97, only if S15 = 02, 99.//

97 (Skip to D45) [VOLUNTEERED:] YES, “GESTATIONAL” OR “ONLY WHEN PREGNANT” MENTIONED
01 YES - TYPE I (JUVENILE)
02 YES - TYPE II (ADULT ONSET)
03 [VOLUNTEERED:] BORDERLINE DIAGNOSIS ONLY
04 (Skip to D45) NO, NEVER DIAGNOSED WITH DIABETES
98 (Skip to D45) DK
99 (Skip to D45) REFUSED

Example of Question levels

Qualifying Question

D43b. //If (s15 = 02) then ask:// //Was your/Was person in S1’s// DIABETES only during a time associated with a pregnancy? [INTERVIEWER: PROBE FOR PROPER CODE]

01 (Skip to D45) YES ONLY WHEN PREGNANT
02 NO
98 (Skip to D45) DK
99 (Skip to D45) REFUSED

In-Depth Question

D44. //Is your/Is person on S1’s// blood sugar or glucose level, which affects diabetes, USUALLY under control or where a physician wants it, even if medication is required Always, Usually, Sometimes, Rarely, or Never?

01 ALWAYS
02 USUALLY
03 SOMETIMES
04 RARELY
05 NEVER
98 DK
99 REFUSED

Question levels

• Notice in the example that there are instructions to skip to another question if the answer is no.

• These are anchor questions and qualifying questions which are eliminating persons from answering the in-depth questions.

• As a result, when a question is not asked of a respondent it creates a missing value for the respondent which is MISSING BY DESIGN.

Missing Values

• Some data is missing in the survey because the respondent refused to answer the question, or did not know the answer.

• These kinds of missing values need to be treated differently than those that are ‘missing by design’.
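The distinction can be sketched with a toy example. The variable names d43 and d44 and the coding of a skipped question as a blank are assumptions made for illustration; check the OFHS Codebook for how each variable in the Public Use File is actually named and coded. The codes 98 (DK) and 99 (REFUSED) follow the questionnaire excerpt above.

```sas
/* Hypothetical variable names and toy records, for illustration only. */
data missing_demo;
    input id d43 $ d44 $;
    length d44_status $ 20;
    /* Blank: the in-depth question was never asked (skip pattern). */
    if d44 = ' ' then d44_status = 'MISSING BY DESIGN';
    /* 98 = DK, 99 = REFUSED: asked, but no usable answer. */
    else if d44 in ('98', '99') then d44_status = 'DK OR REFUSED';
    else d44_status = 'ANSWERED';
    datalines;
1 01 02
2 02 .
3 01 98
4 01 99
;
run;

/* Unweighted count of the toy records by status. */
proc freq data=missing_demo;
    tables d44_status;
run;
```

Keeping the two kinds of missing values apart lets you decide, for example, whether "don't know" responses belong in the denominator of a percentage while skipped respondents do not.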

Missing Values

• There are some types of questions which are very important to the survey design or for public policy issues, for which it is not acceptable to have values missing.

• These include questions like:– Number of children in the family (design)– Family Income (public policy)

Imputation of Missing Values

• Where it is important for the survey not to have any missing values, the survey statisticians have replaced each missing value by imputing it from the responses of other survey respondents who answered the other questions in the survey the way this respondent did.

• Survey statisticians use very sophisticated models and processes to do imputation, and the practice is well accepted.

• When using this survey to do analysis, it is expected that the user will consider whether or not to choose the form of the variable which includes the imputed values.

• Imputed variables have a suffix of “_imp”.
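One possible way to see which imputed variables are present is to query the SAS dictionary tables. This sketch assumes the library has already been assigned with the libref ofhs and that the dataset is named ofhs2008, as in the later steps of this tutorial.

```sas
/* List variables whose names contain _IMP in ofhs.ofhs2008.          */
/* Dictionary tables store libref and member names in upper case.     */
proc sql;
    select name, label
    from dictionary.columns
    where libname = 'OFHS'
      and memname = 'OFHS2008'
      and index(upcase(name), '_IMP') > 0;
quit;
```

The test on the name is deliberately loose (it matches _IMP anywhere in the name), so review the list rather than assuming every match is an imputed variable.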

Weighting

• A weight is constructed for each adult and child response. It reflects the inverse of the probability of being selected for the survey, and it should be used in all analysis.

• When the weights are used, the results accurately reflect the entire population.

Weighting

• If the weights for children in the OFHS were summed up across all responses, the total would be equal to the child population of Ohio. The same is true of the adult weights.

• The variable name for the adult weight is “wt_a”.

• The variable name for the child weight is “wt_c”.
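A quick way to see this for yourself is to total the two weight variables. The sketch assumes the libref ofhs has been assigned as described later in this tutorial. PROC MEANS is used here only to add up the weights, not to produce survey estimates.

```sas
/* Sum the adult and child weights across all records. */
proc means data=ofhs.ofhs2008 sum maxdec=0;
    var wt_a wt_c;
run;
```

The totals should be on the order of the sums of weights reported in the result tables of this tutorial (roughly 8.7 million for adults and 2.75 million for children), although they may not match exactly because individual analyses can drop records with missing values.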

Constructed Variables

• There are many variables in the OFHS file that are constructed from the responses to the survey questions that make it easier to use the OFHS. These variables include:

– BMI – Body mass index. BMI is an indicator of adult and child obesity constructed from height and weight. The formula is complicated, especially for children. We make it easier for the user to do analysis of obesity by pre-calculating it.
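For orientation only, the standard adult BMI calculation from height in inches and weight in pounds is sketched below. This is not necessarily how the OFHS variable was built (the child calculation in particular involves age and sex specific reference values), and the dataset and variable names are hypothetical.

```sas
/* Toy records; standard adult formula: 703 * pounds / inches squared. */
data bmi_demo;
    input height_in weight_lb;
    bmi = 703 * weight_lb / (height_in ** 2);
    datalines;
70 180
64 150
;
run;

proc print data=bmi_demo noobs;
run;
```

In practice you would use the pre-calculated BMI variable documented in the OFHS Codebook rather than recomputing it.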

Constructed Variables

– Insurance Type – In many instances, respondents to the survey had more than one source of insurance. For example, many seniors have insurance from their private pension plans and Medicare. For the purpose of creating an unduplicated count of the population by their insurance status, we have created a variable which imposes a hierarchy of insurance sources to classify the population.
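A hierarchy of this kind can be sketched as an IF / ELSE IF chain in which the first matching source wins. The flag variables (medicaid, medicare, job, direct, other) and the ordering are assumptions for illustration, loosely modeled on the category labels that appear in the results later in this tutorial; the actual rules are documented with the OFHS, and the pre-built variable should be used for analysis.

```sas
/* Hypothetical 0/1 coverage flags; toy records only. */
data hierarchy_demo;
    input id medicaid medicare job direct other;
    length ins_type $ 30;
    /* The first condition that is true decides the category,   */
    /* so each person is counted exactly once.                   */
    if medicaid and medicare then ins_type = '01-MEDICAID AND MEDICARE';
    else if medicaid then ins_type = '02-MEDICAID, NO MEDICARE';
    else if medicare then ins_type = '03-MEDICARE, NO MEDICAID';
    else if job      then ins_type = '04-JOB-BASED COVERAGE';
    else if direct   then ins_type = '05-DIRECTLY PURCHASED';
    else if other    then ins_type = '06-OTHER';
    else ins_type = '08-UNINSURED';
    datalines;
1 0 1 1 0 0
2 1 1 0 0 0
3 0 0 1 1 0
4 0 0 0 0 0
;
run;

proc print data=hierarchy_demo noobs;
run;
```

Under this assumed ordering, the senior in record 1 who has both Medicare and job-based (pension) coverage is counted once, under Medicare.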

Using SAS with the OFHS

• Step 1. Make your PC Ready.

• Step 2. Download and Un-zip the SAS dataset.

• Step 3. Assign a SAS Library name and restore SAS formats.

• Step 4. Build and run your first OFHS SAS Program

Make Your PC Ready

• Create a directory for the OFHS Public Use File. It should look like this:

C:\sasdata\ofhs2008

• Make sure that you have software to decompress the SAS dataset. WinZip is a popular product which works well for this.

• Make sure there is enough room on the drive for the OFHS file after it is unzipped. You will need at least 800 megabytes of storage space. You will need additional temporary work space for when the file is processing. You may want to put the file on a separate drive from the drive which houses the temporary work space (typically Drive C).

Download and Unzip the SAS dataset.

• You will find the OFHS Public Use Dataset at:

http://grc.osu.edu/ofhs/datadownloads/index.htm

• Right click on the file name and select ‘save target as’.

• Save the ZIP file to the directory where you will store the data (c:\sasdata\ofhs2008).

• After the file has been saved, run WinZip, saving the unzipped files to the same directory.

Download and Unzip the SAS dataset

• After you download and unzip the data, the directory will contain the following files:

Formats.sas7bdat

Restore_formats.sas

OFHS2008.sas7bdat

Assign a SAS Library name and restore SAS formats

• First, you must start SAS or SAS Enterprise Guide.

• Open Restore_formats.sas in the program editor window.

Assign a SAS Library name and restore SAS formats

The Restore_formats file will look like this:

/* This program creates a formats catalog from an existing formats dataset,
   formats.sas7bdat, which should be in the current directory.
   The resulting formats catalog file will be created in the current directory. */
LIBNAME ofhs 'D:\final_data_delivery_021109';
libname library 'D:\SASFORMATS';

proc format library=library cntlin=library.formats;
run;

You need to change the LIBNAME ofhs statement to reflect the drive and directory location of the files that you unzipped. Because the program reads the formats dataset as library.formats, the libname library statement must also point to the directory that holds Formats.sas7bdat.

You can now ‘submit’ or ‘run’ the restore formats program.
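If you followed the directory layout suggested earlier, an edited version of the program might look like the sketch below. The path C:\sasdata\ofhs2008 is the example directory from this tutorial; substitute your own location. The final PROC CONTENTS step is an optional addition that simply confirms SAS can open the dataset.

```sas
/* Both librefs point to the directory holding the unzipped files. */
libname ofhs    'C:\sasdata\ofhs2008';
libname library 'C:\sasdata\ofhs2008';

/* Build the formats catalog from the formats dataset. */
proc format library=library cntlin=library.formats;
run;

/* Optional check: list the variable names in the main dataset. */
proc contents data=ofhs.ofhs2008 short;
run;
```

Using the libref LIBRARY is convenient because SAS searches a catalog named library.formats automatically when it looks up user-defined formats.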

Build and run your first OFHS SAS Program

• You should only use procedures in SAS that support the use of complex survey designs, including:

– Proc Surveymeans

– Proc Surveyfreq

– Proc Surveylogistic

– Proc Surveyreg

• Most newcomers will use Proc Surveymeans to start out. If you are familiar with Surveylogistic or Surveyreg, you probably do not need this tutorial.

Proc Surveymeans

Proc Surveymeans data=ofhs.ofhs2008 ALPHA=.05 nobs mean CLM SUMWGT CLSUM;
Stratum STRATUM;
Weight WT_c;
Var i_type_c;
Class i_type_c;
run;

This is a simple program which calculates the percent of children by Insurance Type. It includes a 95% confidence interval around the mean. Note the statements which reflect the command syntax for complex sampling design (Stratum and Weight). The Stratum variable will always stay the same. There are different weights for children (WT_C) and adults (WT_A).

Proc Surveymeans results (with a little cutting and pasting and formatting of values)

Child insurance type           N  Sum of Weights     Mean  Std Error of Mean   95% CL for Mean          95% CL for Sum
01-MEDICAID AND MEDICARE     219         2754928    1.94%              0.17%    1.62%    2.27%      44,488      62,597
02-MEDICAID, NO MEDICARE    3276         2754928   30.92%              0.55%   29.83%   32.00%     817,888     885,551
03-MEDICARE, NO MEDICAID     124         2754928    0.64%              0.08%    0.48%    0.81%      13,196      22,339
04-JOB-BASED COVERAGE       7933         2754928   53.29%              0.57%   52.17%   54.42%   1,436,884   1,499,420
05-DIRECTLY PURCHASED        377         2754928    2.55%              0.18%    2.19%    2.90%      60,404      79,942
06-OTHER                      93         2754928    0.63%              0.09%    0.45%    0.81%      12,312      22,411
07-INSURED TYPE UNKNOWN      717         2754928    5.99%              0.29%    5.42%    6.55%     149,357     180,409
08-UNINSURED                 704         2754928    4.04%              0.21%    3.63%    4.46%      99,869     122,790
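The same pattern can be adapted for adults by switching to the adult weight. The sketch below uses INSRD_A, the adult insured/uninsured variable that appears in the Proc Surveyfreq examples of this tutorial; treating it with a Class statement in Proc Surveymeans is an assumption about how you might want to summarize it, not a program taken from the OFHS documentation.

```sas
/* Percent of adults insured vs. uninsured, with 95% confidence limits. */
proc surveymeans data=ofhs.ofhs2008 alpha=.05 nobs mean clm sumwgt clsum;
    stratum stratum;    /* same stratum variable as the child analysis */
    weight wt_a;        /* adult weight instead of WT_C                */
    var insrd_a;
    class insrd_a;      /* treat the variable as categorical           */
run;
```

Only the weight and the analysis variable change; the Stratum statement is the same for adult and child analyses.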

Proc Surveymeans

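/* Flag each record as at or below 200% of the Federal Poverty Level, or above it */
DATA ofhs.children;
SET OFHS.OFHS2008;
if h87_imp in ('01','02','03','04') then FPL200='0 to 200% FPL';
else fpl200='201+ % FPL';
run;

/* Analyze the new dataset, which contains the fpl200 variable */
Proc Surveymeans data=ofhs.children ALPHA=.05 nobs mean CLM SUMWGT CLSUM;
Stratum STRATUM;
Weight WT_c;
Var i_type_c;
Class i_type_c;
Domain fpl200;
run;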

Now you might add some domain analysis to this, breaking out insurance status for children by poverty level.

Surveymeans with a Domain Statement

Percent of Federal Poverty level
and Child insurance type               N  Sum of Weights     Mean  Std Error of Mean   95% CL for Mean          95% CL for Sum

0 to 200% FPL
  01-MEDICAID AND MEDICARE           183       1,250,227    3.77%              0.35%    3.08%    4.45%      38,404      55,743
  02-MEDICAID, NO MEDICARE          2822       1,250,227   58.93%              0.87%   57.23%   60.63%     704,848     768,729
  03-MEDICARE, NO MEDICAID            61       1,250,227    0.74%              0.13%    0.48%    0.99%       6,032      12,364
  04-JOB-BASED COVERAGE             1420       1,250,227   20.67%              0.69%   19.32%   22.01%     240,535     276,244
  05-DIRECTLY PURCHASED               97       1,250,227    1.42%              0.19%    1.05%    1.79%      13,175      22,424
  06-OTHER                            41       1,250,227    0.62%              0.14%    0.35%    0.89%       4,327      11,192
  07-INSURED TYPE UNKNOWN            395       1,250,227    7.69%              0.49%    6.73%    8.65%      83,876     108,516
  08-UNINSURED                       459       1,250,227    6.16%              0.40%    5.39%    6.93%      67,208      86,838

201+ % FPL
  01-MEDICAID AND MEDICARE            36       1,504,701    0.43%              0.09%    0.25%    0.61%       3,809       9,129
  02-MEDICAID, NO MEDICARE           454       1,504,701    7.64%              0.45%    6.75%    8.52%     101,178     128,684
  03-MEDICARE, NO MEDICAID            63       1,504,701    0.57%              0.11%    0.35%    0.79%       5,264      11,875
  04-JOB-BASED COVERAGE             6513       1,504,701   80.40%              0.63%   79.16%   81.64%   1,179,403   1,240,122
  05-DIRECTLY PURCHASED              280       1,504,701    3.48%              0.29%    2.91%    4.05%      43,718      61,030
  06-OTHER                            52       1,504,701    0.64%              0.13%    0.39%    0.88%       5,886      13,318
  07-INSURED TYPE UNKNOWN            322       1,504,701    4.56%              0.34%    3.90%    5.23%      58,565      78,808
  08-UNINSURED                       245       1,504,701    2.28%              0.20%    1.88%    2.68%      28,245      40,369
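When you build a grouping variable such as FPL200, it can be worth confirming that each original category landed in the group you intended. One possible check is an unweighted cross-listing of the original and recoded variables. It assumes the dataset ofhs.children created by the DATA step in the domain example exists; the counts are for checking the recode only and are not survey estimates.

```sas
/* Quality check of the recode: which H87_IMP values map to which FPL200 group? */
proc freq data=ofhs.children;
    tables h87_imp * fpl200 / list missing;
run;
```

Each H87_IMP category should appear with exactly one FPL200 value; the MISSING option makes any unexpected blank values visible.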

Proc Surveyfreq

proc surveyfreq data=ofhs.ofhs2008;

stratum stratum;

Tables h87_imp*insrd_a / alpha=.05 cl clwt deff;

weight WT_A;

run;

Results of Proc Surveyfreq

Table of H87_IMP by INSRD_A

(H87_IMP = Percent of Federal Poverty Level)

H87_IMP           INSRD_A            Frequency  Weighted Std Dev of 95% Confidence Limits   Percent Std Err of 95% Confidence Limits   Design
                                               Frequency   Wgt Freq          for Wgt Freq              Percent           for Percent   Effect
01-LESS THAN 63%  01-YES, INSURED         3192    527141      13696     500297     553985    6.0508     0.1552     5.7465      6.355   2.1588
                  02-NO, UNINSURED         905    220301      10051     200601     240000    2.5287     0.1143     2.3047     2.7527   2.7002
                  Total                   4097    747441      16781     714550     780333    8.5795     0.1881     8.2107     8.9482   2.2985
02-63% - 100%     01-YES, INSURED         3615    467450      12246     443447     491454    5.3656     0.1395     5.0922      5.639   1.9527
                  02-NO, UNINSURED         791    174675       8751     157522     191828     2.005     0.0998     1.8095     2.2005   2.5803
                  Total                   4406    642126      14911     612901     671351    7.3706     0.1684     7.0404     7.7008   2.1171
03-101% - 150%    01-YES, INSURED         4764    714655      15586     684106     745203    8.2031     0.1761     7.8579     8.5483   2.0984
                  02-NO, UNINSURED        1105    253800      10698     232833     274768    2.9132     0.1215     2.6751     3.1514   2.6587
                  Total                   5869    968455      18660     931881    1005029   11.1163     0.2082    10.7084    11.5243    2.234
04-151% - 200%    01-YES, INSURED         4044    618783      14539     590287     647279    7.1027     0.1647     6.7798     7.4256   2.0954
                  02-NO, UNINSURED         727    156246       8220     140135     172356    1.7935     0.0938     1.6095     1.9774   2.5472
                  Total                   4771    775029      16553     742585     807473    8.8961     0.1864     8.5308     9.2614   2.1836
05-201% - 250%    01-YES, INSURED         4613    701052      15205     671251     730853     8.047     0.1726     7.7087     8.3852   2.0504
                  02-NO, UNINSURED         623    148023       8291     131772     164273    1.6991     0.0946     1.5136     1.8845   2.7307
                  Total                   5236    849074      17153     815455     882694     9.746     0.1933     9.3673    10.1248   2.1629
06-251% - 300%    01-YES, INSURED         4326    718899      15756     688017     749780    8.2518     0.1786     7.9018     8.6019   2.1462
                  02-NO, UNINSURED         364     84203       6238      71976      96430    0.9665     0.0714     0.8266     1.1064   2.7116
                  Total                   4690    803102      16841     770093     836111    9.2184     0.1901     8.8457      9.591   2.2009
07-301% OR MORE   01-YES, INSURED        21106   3731978      29753    3673662    3790293   42.8373       0.32    42.2101    43.4644   2.1298
                  02-NO, UNINSURED         769    194784       9739     175695     213873    2.2358      0.111     2.0183     2.4533   2.8704
                  Total                  21875   3926762      30529    3866925    3986599   45.0731     0.3227    44.4406    45.7056   2.1429
Total             01-YES, INSURED        45660   7479957      32411    7416432    7543483   85.8582     0.2482    85.3716    86.3448   2.5856
                  02-NO, UNINSURED        5284   1232032      22843    1187259    1276805   14.1418     0.2482    13.6552    14.6284   2.5856
                  Total                  50944   8711989      32575    8648142    8775836       100
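The Percent column in this table is each cell's share of all adults, not the uninsured rate within a poverty band. For example, within the lowest band the weighted frequencies give 220,301 / 747,441, or about 29.5% uninsured. One way to have SAS report such within-row percentages directly is the ROW option on the Tables statement, sketched below on the assumption that your SAS release supports it for Proc Surveyfreq.

```sas
/* Row percentages: share insured vs. uninsured within each poverty band. */
proc surveyfreq data=ofhs.ofhs2008;
    stratum stratum;
    tables h87_imp * insrd_a / row cl;
    weight wt_a;
run;
```

Row percentages answer "what share of adults in this income band are uninsured", while the overall percentages answer "what share of all adults are both in this band and uninsured".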

Domain Analysis in Proc Surveyfreq

proc surveyfreq data=ofhs.ofhs2008;
stratum stratum;
Tables h87_imp*s15_imp*insrd_a / alpha=.05 cl clwt deff;
weight WT_A;
run;

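There is no Domain statement for Proc Surveyfreq. Instead, add the domain variable (or variables) to the front of the Tables request. SAS then produces a separate table for each level of that variable, as in the results for H87_IMP below.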

Domain Analysis in Proc Surveyfreq

Table of S15_IMP by INSRD_A
Controlling for H87_IMP=01-LESS THAN 63%

S15_IMP           INSRD_A            Frequency  Weighted Std Dev of 95% Confidence Limits   Percent Std Err of 95% Confidence Limits   Design
                                               Frequency   Wgt Freq          for Wgt Freq              Percent           for Percent   Effect
01-MALE           01-YES, INSURED          806    179624       8888     162203     197046   24.0319     1.0248    22.0234    26.0404  29.3024
                  02-NO, UNINSURED         370    125105       8121     109188     141022   16.7377     0.9701    14.8363    18.6392  34.4017
                  Total                   1176    304729      11984     281240     328218   40.7696     1.1705    38.4754    43.0639  28.9056
02-FEMALE         01-YES, INSURED         2386    347516      10583     326774     368259   46.4941     1.1392    44.2613     48.727  26.5754
                  02-NO, UNINSURED         535     95196       5995      83445     106946   12.7362      0.755    11.2564     14.216   26.128
                  Total                   2921    442712      12079     419037     466388   59.2304     1.1705    56.9361    61.5246  28.9056
Total             01-YES, INSURED     [row missing in source]
                  02-NO, UNINSURED         905    220301      10051     200601     240000    29.474     1.1032    27.3117    31.6362  29.8247
                  Total                   4097    747441      16781     714550     780333       100

Table of S15_IMP by INSRD_A
Controlling for H87_IMP=02-63% - 100%

S15_IMP           INSRD_A            Frequency  Weighted Std Dev of 95% Confidence Limits   Percent Std Err of 95% Confidence Limits   Design
                                               Frequency   Wgt Freq          for Wgt Freq              Percent           for Percent   Effect
01-MALE           01-YES, INSURED          945    167436       8347     151076     183797   26.0753     1.0958    23.9275    28.2231  31.7351
                  02-NO, UNINSURED         269     85825       6641      72810      98841   13.3658     0.9426    11.5183    15.2133  39.0883
                  Total                   1214    253262      10633     232422     274102   39.4411     1.2116    37.0663     41.816  31.3118
02-FEMALE         01-YES, INSURED         2670    300014       9104     282169     317859    46.722     1.1703    44.4283    49.0158   28.028
                  02-NO, UNINSURED         522     88850       5743      77593     100107   13.8368      0.831    12.2081    15.4655  29.5046
                  Total                   3192    388864      10702     367888     409840   60.5589     1.2116     58.184    62.9337  31.3118
Total             01-YES, INSURED     [row missing in source]
                  02-NO, UNINSURED         791    174675       8751     157522     191828   27.2027     1.1287    24.9905    29.4148  32.7702
                  Total                   4406    642126      14911     612901     671351       100


The END
