Working with the data

Jan 03, 2016

deacon-rosario

Working with the data. Where to begin? Have you come across any ACS data issues in your work? Sample Error (90% Confidence), Collapsing, Period Estimates, Reliability, Dollar Values, Trend Analysis, Weighting Change, Light Rail, Reweighting, CTPP Issues, Block Group data.
Transcript
Page 1: Working with the data

Working with the data

Page 2: Working with the data

Where to begin?

Have you come across any ACS data issues in your work?

1. Sample Error (90% Confidence)
2. Collapsing
3. Period Estimates
4. Reliability
5. Dollar Values
6. Trend Analysis
7. Weighting Change
8. Light Rail
9. Reweighting
10. CTPP Issues
11. Block Group data

Page 3: Working with the data

You must do Statistical Significance Tests

To avoid false statements like

“Based upon data from the 2000 Census (CTPP) and the 2005-2007 ACS, the total number of workers who live in Flagstaff increased along with the number who took transit to work. During the same time, the number of people who worked at home increased along with those who drove alone and carpooled.” The World Gazette

Commutes increase for all modes

Sampling Error

Page 4: Working with the data

Some things to keep in mind

Obtaining Standard Errors is the Key

• [SE = MOE / 1.645]
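A minimal sketch of the significance test this conversion feeds into, assuming two independent (non-overlapping) estimates published with 90% MOEs. The function names and the transit counts are hypothetical and do not come from the slides.

```python
import math

Z_90 = 1.645  # critical value for the 90 percent confidence level used by the ACS


def standard_error(moe):
    """Convert a published 90 percent margin of error to a standard error."""
    return moe / Z_90


def is_significant(est1, moe1, est2, moe2):
    """Return (z, significant) for the difference between two independent estimates."""
    se_diff = math.sqrt(standard_error(moe1) ** 2 + standard_error(moe2) ** 2)
    z = (est1 - est2) / se_diff
    return z, abs(z) > Z_90


# Hypothetical transit-commuter estimates for two periods (illustration only)
z, significant = is_significant(620, 250, 480, 210)
print(round(z, 3), significant)
```

With these made-up inputs the difference of 140 is smaller than its sampling error allows for, so the test reports no significant change.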

Formulas vary depending on the comparison

• Sum or Difference of Estimates

• Proportions and Percents

• Means and Other Ratios
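For the proportion and ratio cases, the approximation formulas commonly given in ACS guidance can be sketched as below. The function names and example numbers are hypothetical; treat this as an illustration, not a substitute for the handbook formulas.

```python
import math


def moe_ratio(num, moe_num, den, moe_den):
    """MOE of a ratio num/den where the numerator is NOT a subset of the denominator."""
    r = num / den
    return math.sqrt(moe_num ** 2 + (r ** 2) * (moe_den ** 2)) / den


def moe_proportion(num, moe_num, den, moe_den):
    """MOE of a proportion num/den where the numerator IS a subset of the denominator."""
    p = num / den
    under_root = moe_num ** 2 - (p ** 2) * (moe_den ** 2)
    if under_root < 0:
        # The value under the root can go negative; fall back to the ratio formula
        return moe_ratio(num, moe_num, den, moe_den)
    return math.sqrt(under_root) / den


# Hypothetical: 500 (+/-150) transit commuters out of 10,000 (+/-600) workers
share = 500 / 10000
share_moe = moe_proportion(500, 150, 10000, 600)
print(f"{share:.1%} +/- {share_moe:.1%}")
```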

Working with 2000 data will be a little more involved

There are resources to help

Page 5: Working with the data

The ACS compass handbooks

http://www.census.gov/acs/www/guidance_for_data_users/compass_products/

A Compass for Understanding And Using ACS Data

Set of user-specific handbooks

Train-the-trainer materials

E-learning ACS Tutorial

Annotated Presentations

Especially

Appendix 3

Page 6: Working with the data

NY State Data Center Calculator

http://sdcclearinghouse.wordpress.com/2009/03/03/spreadsheet-to-calculate-acs-margins-of-error-and-statistical-significance-for-sums-proportions-and-ratios/

Page 7: Working with the data

But what if I am using 2000 non-ACS Data?

You will need to Estimate the MOE and know the Survey Design Factor
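A rough sketch of that estimate, assuming the Census 2000 long-form approximation for a total (unadjusted standard error of sqrt(5 * Y * (1 - Y/N)), then multiplied by the design factor). The function name, the design factor of 1.2, and the counts are placeholders; the published design factor tables and the guides referenced in these slides are the place to confirm the exact procedure.

```python
import math


def census2000_moe(estimate, universe, design_factor, z=1.645):
    """Approximate 90 percent MOE for a Census 2000 sample-based total.

    estimate      -- the sample estimate (e.g. transit commuters)
    universe      -- size of the area's universe (e.g. total workers)
    design_factor -- looked up from the published tables (value below is made up)
    """
    unadjusted_se = math.sqrt(5 * estimate * (1 - estimate / universe))
    return z * design_factor * unadjusted_se


# Hypothetical inputs: 1,200 commuters in a universe of 26,000, design factor 1.2
moe_2000 = census2000_moe(1200, 26000, 1.2)
print(round(moe_2000))
```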

Page 8: Working with the data

The CUTR Guide has you covered

There’s a Report

http://www.nctr.usf.edu/pdf/77802.pdf

and a Spreadsheet Calculator

http://www.nctr.usf.edu/spreadsheet/77802.xls

http://www.nctr.usf.edu/abstracts/abs77802.htm

Page 9: Working with the data

Transportation resources

http://onlinepubs.trb.org/onlinepubs/nchrp/nchrp_rpt_588.pdf

Page 10: Working with the data

Understanding the MOE

Part 1, Profile 1 (Resident data)

Using the MOE

We know the number of workers has changed, but what is the range of that change?

A. 5,744?
B. 5,072 to 6,416?
C. 3,888 to 7,600?
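The range of a change can be sketched as the difference plus or minus the MOE of that difference. The inputs below are hypothetical, not the Flagstaff values, and a Census 2000 MOE would have to be estimated first.

```python
import math


def change_interval(est_old, moe_old, est_new, moe_new):
    """Return (change, low, high) using the MOE of a difference of two estimates."""
    change = est_new - est_old
    moe_change = math.sqrt(moe_old ** 2 + moe_new ** 2)
    return change, change - moe_change, change + moe_change


# Hypothetical worker counts: 20,000 (+/-300) then 25,000 (+/-400)
change, low, high = change_interval(20000, 300, 25000, 400)
print(change, low, high)
```

A single number on its own hides this range, which is the point of the question on the slide.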

Page 11: Working with the data

Another Flagstaff point

Part 1, Profile 1 (Resident data)

Part 2, Profile 1 (Workplace data)

Between the reference periods, what has the number of people who took transit to work in Flagstaff done? A. Gone up? B. Gone down? C. No significant change?

Which Table would you use and why?

Page 12: Working with the data

Two types of Collapsing

3- or 5-year ACS Tables

Page 13: Working with the data

C08301. MEANS OF TRANSPORTATION TO WORK - Universe: WORKERS 16 YEARS AND OVER

Data Set: 2007-2009 American Community Survey 3-Year Estimates

Collapsed table

Full table not available

Sometimes neither table exists

And MOEs are greater than the estimate

Population = 26,566

Page 14: Working with the data

“B” and “C” Tables

B08006 C08006 Means of Transportation

Page 15: Working with the data

“B” and “C” Tables

Page 16: Working with the data

Full and collapsed table

What do you notice about the Table?

Page 17: Working with the data

Some things to be aware of

What year is the data? Period Estimate

Page 18: Working with the data

Reliability/Currency

What data is more reliable?

Which is more current?

Page 19: Working with the data

Dollar Values and Income tables

ACS asks: What was your income during the last 12 months?

Single-Year Estimates: 12 different periods

Each adjusted to a single period (Jan to Dec)

Multiyear Estimates

Each year adjusted to the current year
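A minimal sketch of bringing each year's dollar values to the final year of a multiyear period with a price index. The function name and the index values are placeholders, not published figures.

```python
def adjust_to_final_year(values_by_year, index_by_year):
    """Scale each year's dollar value to the price level of the latest year."""
    final_year = max(values_by_year)
    return {
        year: value * index_by_year[final_year] / index_by_year[year]
        for year, value in values_by_year.items()
    }


# Placeholder incomes and price index values (illustration only)
incomes = {2007: 50000, 2008: 50000, 2009: 50000}
price_index = {2007: 100.0, 2008: 102.0, 2009: 104.0}
adjusted = adjust_to_final_year(incomes, price_index)
print(adjusted)
```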

Page 20: Working with the data

About Trend Analysis

Trend analysis (overlapping syndrome)

If you are doing trend analysis with multi-year estimates, you cannot compare successive period estimates because of the overlapping middle years.

Also, you cannot compare a 3-year estimate with a 5-year estimate
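Both rules can be sketched as a quick check before any comparison. The function name and the (start year, end year) tuple representation are assumptions made for illustration.

```python
def comparable(period_a, period_b):
    """True if two multiyear periods have the same length and do not overlap.

    Each period is a (start_year, end_year) tuple, inclusive.
    """
    (a_start, a_end), (b_start, b_end) = period_a, period_b
    same_length = (a_end - a_start) == (b_end - b_start)
    overlap = a_start <= b_end and b_start <= a_end
    return same_length and not overlap


print(comparable((2005, 2007), (2006, 2008)))  # successive 3-year periods overlap
print(comparable((2005, 2007), (2008, 2010)))  # same length, no shared years
print(comparable((2005, 2009), (2010, 2012)))  # 5-year versus 3-year
```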

Page 21: Working with the data

Change in Weighting

In 2009, the weighting changed to use sub-county totals as opposed to just county totals

Page 22: Working with the data

Change in Weighting

Detroit Example

“Detroit is the poster child for odd looking data”

Page 23: Working with the data

Change in Weighting (Analysis)

In 2009, the weighting changed to use sub-county totals as opposed to just county totals

Page 24: Working with the data

Source: 2000 CTPP and 2007ACS3, CTPP Data Profile 1

Light Rail Conundrum

Impact of New “Light Rail” systems might not be showing up

Page 25: Working with the data

One more thing on Pop Estimates

The older estimates get revised every year but the ACS does not get reweighted

Maricopa County Population Estimates

Page 26: Working with the data

The DRB (Disclosure Review Board) Said… “Too many variables” crossed with Means of Transportation (Mode)

…makes for a microdata record…and with a microdata record you could identify an individual

Now let’s focus on the CTPP data

But First a word on Disclosure - 3 year tables

Page 27: Working with the data

We Said…

No, you can’t identify an individual
• Hired a statistical consultant < 0.01%
• Had a hearing with DRB Bosses
• Made every argument possible

Census Said…

Tough luck
• Compress your Modes and improve your chances of passing our rules
• Chop your cross tabs to 5 variables

The Battle Ensued

Page 28: Working with the data

What we ended up with – for 3 year Tables

Five (5) Variables crossed with

Means of Transportation to work (MOT)

…and

Added for 5-year CTPP: Minority status, Presence of children

Page 29: Working with the data

A boat load of collapsing of the Modes

…and

Page 30: Working with the data

Disclosure Rules

7. For Worker Flows

Must have 3 unweighted records for each O-D pair

Does not apply to Total Workers or Workers by Mode to Work (all 18 modes) (means of transportation)

Rule 7 was the killer

For the 5-year CTPP

Page 31: Working with the data

So What Did We Do?

NCHRP Web Report 180 ($550K) Producing Transportation Data Products from the ACS that Comply With Disclosure Rules

5-year CTPP will have two types of tables

Tables that passed Census Rules

Tables with Perturbation done to them

Privacy Protection

http://onlinepubs.trb.org/onlinepubs/nchrp/nchrp_w180.pdf

Page 32: Working with the data

Table Summary using 5-year Table list

Tables Using Perturbed Data Set

• Means of transportation
• Aggregate Vehicles Used
• Aggregate Travel Time
• Mean HH Income
• Aggregate HH Income
• Aggregate Carpools
• Almost all Part 3 Tables

TAZ/BG Tract TAD Place County PUMA State

          Regular   Perturbed
Part 1      111        77
Part 2       50        65
Part 3        2        38

Page 33: Working with the data

Still left with some Disclosure Rules

1. All Tables Rounded: 0 = 0, 1-7 = 4, 8 or > = nearest multiple of 5
2. Any number that ends in 5 or 0 stays as is
3. Aggregate dollar values rounded to nearest 100
4. Aggregate minutes to work and aggregate vehicles use standard rounding
5. Totals Rounded independently of cells
6. Medians or quintiles not subject to rounding
7. Percentages and rates calculated after rounding
8. Medians and aggregates must be based on 3 or more values

For All tables Regular (A) + Perturbed (B)
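Rules 1 and 2 for counts can be sketched as below. The function name is hypothetical, and one reading is assumed: a value of 5 falls under the "1-7 = 4" rule, so "stays as is" only matters for values of 8 or more.

```python
def round_ctpp_count(n):
    """Apply the count rounding described in rules 1 and 2 to a non-negative integer."""
    if n < 0:
        raise ValueError("counts cannot be negative")
    if n == 0:
        return 0
    if n <= 7:
        return 4  # assumption: this also covers 5
    # Nearest multiple of 5; values already ending in 5 or 0 are unchanged
    return 5 * ((n + 2) // 5)


for value in (0, 3, 8, 12, 13, 25, 864, 982):
    print(value, "->", round_ctpp_count(value))
```

Because totals are rounded independently of cells (rule 5), rounded cells will not necessarily add up to the rounded total.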

Page 34: Working with the data

Still left with some Disclosure Rules

1. Cell Suppression: For Tables A101106 (unweighted sample count of the population), A101107 (percent of population in sample), A110101 (total housing units sampled), and A110103 (percent of housing units sampled), there must be either 0 or at least 3 occupied housing units in sample to show the table

2. Table Suppression: Aggregates and Means must have at least 3 unweighted cases to be shown. The policy of the ACS Program Office is that if any one cell in a table is suppressed, the whole table is suppressed

For Regular (A) Tables Only

Page 35: Working with the data

Some issues with the 5-year ACS?

Some Very Large MOEs

Block Group data only in download area (not in FactFinder)

Reliability of tract estimates is much lower than the 2000 long form (LF)

NO Workplace Tables! (Use CTPP Product)

The Census Bureau says: BG data should ONLY be used to build up larger geographic areas because the Margins of Error (MOEs) are too large otherwise (JSM Conference August 2010)

AskAgainLater

Standard Data Products

Ken Hodges, Nielsen (Claritas)

ACS 5-Year Data: A First Look at the First Release (4.5 MB, ppt) http://www.copafs.org/UserFiles/file/HodgesMarch2011.pptx

Page 36: Working with the data

Source: Tract Data-Missouri State Data Center, Block Group Data-AFF

AFF all 21 Modes, MSDC all 21 but also collapsed with Total Commuters Added

MSDC put a value to MOEs.

Let’s talk about Block Group Data for a moment

Page 37: Working with the data

First: Let’s consider MOEs

What do you notice?

Don’t forget: if this were CTPP data, it would be rounded too

Page 38: Working with the data

Now let’s fill in the table

The Census Bureau (CB) does not give you Total Commuters, but you would like that. Can we talk about that for a moment?

Page 39: Working with the data

Now let’s fill in the table

How would we get Total Commuters and more importantly the MOEs?

For the Estimate totals, just add the relevant estimates. But for MOEs you have some decisions to make

Page 40: Working with the data

Now let’s fill in the table

488

Two different MOE approaches available

1. Calculate the 90% margin of error of the sum of more than two estimates

2. Calculate the 90% margin of error of the sum or difference between two estimated values (What two values would you use?)

Approach 1 gives you an MOE of either 245 when including the MOE for ‘Other Means’ or 214 without it

Approach 2 gives you an MOE of 209
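A sketch of why the two approaches give different answers, using a hypothetical block group (these are not the values from the slide's table). For the second approach it assumes the two values are Total Workers and Worked at Home, which is only one possible answer to the question above.

```python
import math


def moe_of_sum(moes):
    """90 percent MOE of a sum or difference: root of the summed squared MOEs."""
    return math.sqrt(sum(m ** 2 for m in moes))


# Hypothetical block group: (estimate, MOE) by commuting mode
modes = {
    "drove alone": (310, 120),
    "carpooled": (90, 70),
    "transit": (40, 45),
    "walked": (50, 60),
    "other means": (10, 15),
}
total_workers = (530, 130)   # hypothetical
worked_at_home = (30, 30)    # hypothetical

# Approach 1: add every commuting mode, combining more than two MOEs
commuters = sum(est for est, _ in modes.values())
moe_approach_1 = moe_of_sum(m for _, m in modes.values())

# Approach 2: total workers minus worked at home, combining only two MOEs
commuters_alt = total_workers[0] - worked_at_home[0]
moe_approach_2 = moe_of_sum([total_workers[1], worked_at_home[1]])

print(commuters, round(moe_approach_1), commuters_alt, round(moe_approach_2))
```

The estimate is the same either way, but every extra term under the root in approach 1 pushes the MOE up, which is why combining fewer published values tends to give a tighter result.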

Page 41: Working with the data
Page 42: Working with the data

What data should I use?

Travel Times for the 6 counties in NE Illinois

1. To compare with 1970, ‘80, ‘90 and 2000 Travel Times?

2. To compare with my town of 52K people?

3. To validate my 2008 vintage travel demand model?

Learn how to do the Coefficient of Variation Test
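A minimal sketch of the test: the coefficient of variation is the standard error expressed as a percentage of the estimate. The function name and the example values are hypothetical, and no cutoff is built in because acceptable CV thresholds vary between agencies.

```python
def coefficient_of_variation(estimate, moe, z=1.645):
    """Return the CV in percent from an estimate and its 90 percent MOE."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    standard_error = moe / z
    return standard_error / estimate * 100


# Hypothetical estimate of 1,000 with an MOE of 329
cv = coefficient_of_variation(1000, 329)
print(f"CV = {cv:.1f}%")
```

A lower CV means a more reliable estimate, so the same calculation can be run on each candidate data set before choosing one.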

Page 43: Working with the data

The Upside - Data Evolution

Once you know all the data issues it is possible to use the data intelligently

It’s ignorance that kills you

Slides available at:

http://edthefed.com/MN_MPO/