Lecy Data-Driven DM Lecture 09 Working with Maps Files and Census Data

Jan 20, 2016

Iris Gaines
Page 1: Lecy ∙ Data-Driven DM Lecture 09 Working with Maps Files and Census Data.

Lecy ∙ Data-Driven DM

Lecture 09 Working with Maps Files and Census Data

Page 2

census data

Page 3

Census Topics by Table

From: http://censusreporter.org/topics/

Each link below brings you to a page describing the concepts and tables covered by the Census and American Community Survey on that particular topic.

Getting Started: The Census is a big subject and there’s a lot to learn, but you don’t have to learn it all at once. Here’s some help knowing the lay of the land.

Table Codes: While Census Reporter hopes to save you from the details, you may be interested to understand some of the rationale behind American Community Survey table identifiers.

Age and Sex: How the Census approaches the topics of age and sex.

Children: Tables concerning Children. Helpful to consider in relation to Families.

Commute: Commute data from the American Community Survey.

Employment: While the ACS is not always the best source for employment data, it provides interesting information for small geographies that other sources don’t cover.

Families: Families are an important topic in the ACS and a key framework for considering many kinds of data.

Page 4

Geography: Geography is fundamental to the Census Bureau’s process of tabulating data. Here are the key concepts you need to understand.

Health Insurance: The ACS has a number of questions that deal with health insurance and many corresponding tables.

Income: How the Census approaches the topic of income.

Migration: How the Census deals with migration data.

Poverty: Poverty data and how it is used within the ACS.

Public Assistance: Public assistance data from the ACS.

Race and Hispanic Origin: Race is a complex issue, and no less so with Census data. A large proportion of Census tables are broken down by race.

Same-Sex Couples: Although Census does not ask about them directly, there are a number of ways to get at data about same-sex couples using ACS data.

Seniors: In addition to basic Census data about age, there are a small number of Census tables which focus directly on data about older Americans, and on grandparents as caregivers.

Veterans and Military: Data collected about past and present members of the U.S. Armed Forces.

Page 5

American Community Survey

The American Community Survey (ACS) is an ongoing statistical survey by the U.S. Census Bureau, sent to approximately 250,000 addresses monthly (or 3 million per year). It regularly gathers information previously contained only in the long form of the decennial census. It is the largest survey other than the decennial census that the Census Bureau administers.

Note: What is the difference between the 1-year, 3-year, and 5-year estimates?

Page 6

http://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml

Page 7

http://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml

STEP 01: GO TO THE “AMERICAN FACTFINDER” HOMEPAGE

Click on “Advanced Search”

Page 8

STEP 02: SEARCH BY TABLE NAME

Enter your topic or table name.

For this lab we will use poverty measures contained in the table: S1701

Page 9

STEP 03: SELECT YOUR UNIT OF ANALYSIS (GEOGRAPHY)

Click on “Geographies” and select “Census Tract”

Page 10

STEP 03: SELECT YOUR UNIT OF ANALYSIS (GEOGRAPHY)

Follow the prompts to select your state, then select “all counties”

Page 11

STEP 04: MATCH DATA YEAR TO SHAPEFILES YEAR

Select the version of the data that matches your shapefiles.

Recall that census tract boundaries may change over time, so matching data to shapefiles ensures accuracy.
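
One way to confirm that the data and the shapefile come from the same vintage is to compare their tract IDs before joining. The sketch below is written in Python for illustration only; the function name and the tract IDs are hypothetical, not taken from the lab files.

```python
def compare_tract_ids(data_ids, shape_ids):
    """Report tract IDs that appear on only one side of the join."""
    data_ids, shape_ids = set(data_ids), set(shape_ids)
    return {
        "matched": sorted(data_ids & shape_ids),
        "only_in_data": sorted(data_ids - shape_ids),
        "only_in_shapefile": sorted(shape_ids - data_ids),
    }

# Hypothetical IDs: the third tract was split in the shapefile's vintage.
data_ids = ["42003010300", "42003020100", "42003020300"]
shape_ids = ["42003010300", "42003020100", "42003020301", "42003020302"]

report = compare_tract_ids(data_ids, shape_ids)
print(report["only_in_data"])
print(report["only_in_shapefile"])
```

If either unmatched list is non-empty, the data year and shapefile year probably differ.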

Page 12

STEP 05: DOWNLOAD IN CSV FORMAT

Page 13

STEP 05: DOWNLOAD IN CSV FORMAT

CONTENTS

Text File: Contains information on the table that you downloaded (ignore for now).

Metadata File: Contains variable names and definitions.

With Ann File: Your actual data, with annotations. The first line contains variable names. This is the largest file.

Readme File: Disclaimers (ignore for now).

Read the With Ann file into R.
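
The lab itself uses R. As a rough illustration of the same step in Python, the sketch below reads a CSV whose first line holds the variable names and keeps the geographic ID as text so that leading zeros are not lost. The column names and values are made up for the example and are not the real download's.

```python
import csv
import io

# Stand-in for the downloaded data file; column names and values are made up.
sample = io.StringIO(
    "tract_id,label,poverty_rate\n"
    "42003010300,Census Tract 103,41.2\n"
    "01001020100,Census Tract 201,10.7\n"
)

# The first line supplies the variable names.
rows = list(csv.DictReader(sample))

for row in rows:
    # csv returns every field as text, so leading zeros in the ID survive.
    # Only the measure is converted to a number.
    row["poverty_rate"] = float(row["poverty_rate"])

print(rows[1]["tract_id"], rows[1]["poverty_rate"])
```

Whatever tool is used, the key design choice is the same: treat the ID column as a string, not a number.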

Page 14

geographic units

Page 15

In the U.S., census tracts are defined as follows: "Designed to be relatively homogeneous units with respect to population characteristics, economic status, and living conditions, census tracts average about 4,000 inhabitants."

Why are the tracts different sizes?

Page 16

Nested Administrative Units

• States

• Counties

• Census Tracts

• Census Blocks

The census tract is usually the most granular level of data provided by the Census.

Figure labels (largest to smallest): County, Census Tract, Block Group, Block

Page 17

Example: Household Income by Census Tract

Page 18

FIPS Code Structure

11-digit Geo ID Code

42003000102 = 42-003-0001.02

State (42) - County (003) - Census Tract (0001.02)
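
The code structure can be sketched as simple string slicing. The Python below is illustrative only; the function names are hypothetical, and the ID is handled as text so leading zeros are preserved.

```python
def split_geoid(geoid):
    """Split an 11-digit tract Geo ID into state, county, and tract parts."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError("expected an 11-digit string")
    # 2 digits for state, 3 for county, 6 for census tract
    return geoid[:2], geoid[2:5], geoid[5:]


def format_geoid(geoid):
    """Render the ID in the dashed form shown on the slide."""
    state, county, tract = split_geoid(geoid)
    # The last two tract digits are the part after the decimal point.
    return f"{state}-{county}-{tract[:4]}.{tract[4:]}"


print(format_geoid("42003000102"))  # 42-003-0001.02
```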

Page 19

Which counties are in my MSA?

http://www.bea.gov/faq/index.cfm?faq_id=470

Pittsburgh, PA (Metropolitan Statistical Area)

• Allegheny, PA [42003]

• Beaver, PA [42007]

• Butler, PA [42019]

• Fayette, PA [42051]

• Washington, PA [42125]

• Westmoreland, PA [42129]

FIPS Code for Allegheny County (42003)

42 = Pennsylvania
003 = Allegheny County
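
Because the first five digits of a tract ID are the state and county codes, tracts can be limited to an MSA by checking that prefix. This is a minimal Python sketch using the county codes listed above; the tract IDs are hypothetical examples.

```python
# County codes as listed on the slide for the Pittsburgh MSA.
PITTSBURGH_MSA = {"42003", "42007", "42019", "42051", "42125", "42129"}


def in_msa(tract_geoid, county_codes=PITTSBURGH_MSA):
    """True when the tract's state+county prefix is one of the MSA counties."""
    return tract_geoid[:5] in county_codes


# Hypothetical tract IDs; the last one sits in a county outside the list.
tracts = ["42003000102", "42007600100", "42101000100"]
msa_tracts = [t for t in tracts if in_msa(t)]
print(msa_tracts)
```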

Page 20

visual narrative

Page 21

Your data tells a story

Page 22

WHAT PATTERN DO YOU SEE?

Page 23

WHAT PATTERN DO YOU SEE NOW?

Map panels: Equal Intervals, Quantiles, Geometric

These maps are all made with the same data using different intervals for the break points.

Page 24

BREAK POINTS FOR NORMAL DISTRIBUTIONS

http://uxblog.idvsolutions.com/2010/03/crazy-world-of-range-breaks.html

Page 25

http://uxblog.idvsolutions.com/2010/03/crazy-world-of-range-breaks.html

Page 26

BREAK POINTS FOR SKEWED DISTRIBUTIONS

Page 27
Page 28

HIGHLY-SKEWED DISTRIBUTIONS:

Think about skewed distributions like you think about classes of wine.

The first level of quality is a bottle for $3, the next level is a bottle at $6, the next level is at $10-12, the next level at $25, the next level at $50, then $100.

To move up one class, you basically double the price.

If you want a scale that translates wine prices into class, it would look something like:

$0 - $4: Cheap wine
$4 - $8: Low quality
$8 - $20: Medium quality
$20 - $80: High quality
$80 +: Excellent quality
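
The scale above can be sketched as a lookup against a list of break points. This Python example is illustrative; it assumes a price that falls exactly on a break point (for example $4) belongs to the higher class, which the slide does not specify.

```python
from bisect import bisect_right

# Break points and labels from the wine scale above.
BREAKS = [4, 8, 20, 80]
LABELS = [
    "Cheap wine",
    "Low quality",
    "Medium quality",
    "High quality",
    "Excellent quality",
]


def wine_class(price):
    """Return the class label for a price (assumed boundary rule: ties go up)."""
    # bisect_right counts how many break points are <= price.
    return LABELS[bisect_right(BREAKS, price)]


for price in (3, 6, 12, 50, 100):
    print(price, wine_class(price))
```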

Page 29

http://www.medpagetoday.com/PublicHealthPolicy/GeneralProfessionalIssues/50497

Page 30

Interval Size Increases

Page 31

CLASSIFYING DATA

Process of placing data into groups (classes or bins) that have a similar characteristic or value

Break points

• Breaks the total attribute range up into these intervals

• Keep the number of intervals as small as possible (5-7)

• Use a mathematical progression or formula instead of picking arbitrary values

Page 32

CLASSIFICATIONS

Quantiles

• Places the same number of data values in each class

• Will never have empty classes or classes with too few or too many values

• Attractive in that this method produces distinct map patterns

• Analysts use quantiles because they provide information about the shape of the distribution.

• Example: 0–25%, 25%–50%, 50%–75%, 75%–100%
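
A small Python sketch of quartile break points, using the standard library's `statistics.quantiles`. The data values are made up, and the rule that a value equal to a cut point stays in the lower class is an assumption of this example.

```python
from bisect import bisect_left
from statistics import quantiles

# Made-up data: the integers 1 through 100.
values = list(range(1, 101))

# n=4 returns the three cut points that separate four classes.
cuts = quantiles(values, n=4)

# Count how many values land in each class.
counts = [0] * 4
for v in values:
    counts[bisect_left(cuts, v)] += 1

print(cuts)
print(counts)
```

Each class ends up with the same number of values, which is the defining property of a quantile scale.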

Page 33

CLASSIFICATIONS

Equal intervals

• Divides a set of attribute values into groups that contain an equal range of values

• Communicates best with a continuous set of data

• Easy to accomplish and read

• Not good for clustered data

• With clustered data, produces a map with many features in one or two classes and some classes with no features
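
The weakness with clustered data can be seen in a short Python sketch. The values are made up, with one large outlier, and the helper names are hypothetical.

```python
from bisect import bisect_right


def equal_interval_breaks(values, k):
    """Return k+1 break points that split the range into k equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]


def count_per_class(values, breaks):
    """Count values per class; the maximum value goes in the last class."""
    counts = [0] * (len(breaks) - 1)
    for v in values:
        idx = min(bisect_right(breaks, v) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


# Made-up clustered data with a single large value.
values = [1, 2, 2, 3, 3, 4, 5, 100]
breaks = equal_interval_breaks(values, 4)
counts = count_per_class(values, breaks)
print(breaks)
print(counts)
```

Seven of the eight values fall in the first class and two classes are empty.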

Page 34

CLASSIFICATIONS

Increasing interval widths

• Long-tailed distributions

• Data distributions deviate from a bell-shaped curve and most often are skewed to the right with the right tail elongated

• Example: Keep doubling the interval width of each category. The classes 0–5, 5–15, 15–35, 35–75 have interval widths of 5, 10, 20, and 40.
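
The doubling rule can be sketched as a short Python function. The function name and parameters are hypothetical; the example reproduces the break points given in the bullet above.

```python
def doubling_breaks(start, first_width, n_classes):
    """Return break points where each class is twice as wide as the last."""
    breaks = [start]
    width = first_width
    for _ in range(n_classes):
        breaks.append(breaks[-1] + width)
        width *= 2
    return breaks


print(doubling_breaks(0, 5, 4))  # [0, 5, 15, 35, 75]
```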

Page 35

CLASSIFICATIONS

Exponential scales

• Popular method of increasing intervals

• Use break values that are powers such as 2^n or 3^n

• Generally start out with zero as an additional class if that value appears in your data

• Example: 0, 1–2, 3–4, 5–8, 9–16, and so forth
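
A minimal Python sketch of an exponential scale for whole-number data, with zero as its own class and upper bounds at successive powers of the base. The function name is hypothetical.

```python
def power_classes(max_value, base=2):
    """Return (low, high) integer classes: zero alone, then powers of base."""
    classes = [(0, 0)]
    low, high = 1, base
    while low <= max_value:
        classes.append((low, high))
        # The next class starts just above the old upper bound.
        low, high = high + 1, high * base
    return classes


print(power_classes(16))
# [(0, 0), (1, 2), (3, 4), (5, 8), (9, 16)]
```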

Page 36

CHOROPLETH MAP EXAMPLE

Percentage of vacant housing units by county

Page 37

RULES OF THUMB:

1. First Rule: use common sense! What do your groups represent, and are they meaningful? Are you misleading your audience with unreasonable breaks?

2. Binning by quantiles is typically a safe way to create breaks to show low, medium, and high values.

3. If a lot of your data is bunched together (for example, half of your values are close to zero), quantiles will not be meaningful because they will imply differences that do not exist.

4. If your distribution is skewed, consider increasing-interval or exponential scales.

For example, define the first group as 0-2, second as 2-6, third as 6-14, next as 14-30 (your interval size doubles each break).
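
The problem described in rule 3 can be illustrated with a naive rank-based quantile assignment in Python. The data are made up (half the values are exactly zero), and real mapping software may handle ties differently; this is only a sketch of why bunched data misleads.

```python
def rank_quantile_classes(values, k):
    """Assign each value to one of k equal-count classes by sorted rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * k // len(values)
    return classes


# Made-up data: ten zeros followed by the values 1 through 10.
values = [0] * 10 + list(range(1, 11))
classes = rank_quantile_classes(values, 4)

# Which classes did the identical zero values end up in?
zero_classes = {c for v, c in zip(values, classes) if v == 0}
print(sorted(zero_classes))
```

Identical zero values are spread across the two lowest classes, so the map would show a difference between areas that have the same value.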

Page 38

U.S. population by state, 2000

Original map (natural breaks)

Page 39

Equal interval scale

Not good because too many values fall into low classes

Page 40

Quantile scale

Shows that an increasing width (geometric) scale is needed

Page 41

Custom geometric scale

• Experiment with exponential scales with powers of 2 or 3.

Page 42

Normalizing data

Page 43

NORMALIZING DATA

Divides one numeric attribute by another in order to minimize differences in values based on the size of areas or number of features in each area.

Examples:

• Dividing the number of vacant housing units by the total number of housing units yields the percentage of vacant units

• Dividing the population by area of the feature yields a population density
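
Both examples reduce to a single division. The Python sketch below uses made-up numbers for two hypothetical areas to show how normalizing can reverse the ranking given by raw counts.

```python
def percent_vacant(vacant_units, total_units):
    """Vacant units as a percentage of all housing units."""
    return 100 * vacant_units / total_units


def density(population, area):
    """Population per unit of area."""
    return population / area


# Made-up counts: area A has more vacant units, area B a higher vacancy rate.
areas = {
    "A": {"vacant": 500, "total": 10000},
    "B": {"vacant": 200, "total": 1000},
}

rates = {name: percent_vacant(a["vacant"], a["total"]) for name, a in areas.items()}
print(rates)  # {'A': 5.0, 'B': 20.0}
```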

Page 44

Number of vacant housing units by state, 2000

NON-NORMALIZED DATA

Page 45

Percentage vacant housing units by state, 2000

NORMALIZED DATA

Page 46

California population by county, 2007

NON-NORMALIZED DATA

Page 47

California population density, 2007

NORMALIZED DATA

Page 48

Selecting colors

Page 49

SEQUENTIAL SCALE

http://colorbrewer2.org/

Page 50

DIVERGENT SCALE

Page 51

RULES OF THUMB:

1. If you are highlighting a class of individuals, like those in poverty, a single color might be sufficient (for example, red for in poverty, gray for not).

2. When you are highlighting performance along a single dimension, use a sequential scale with white or gray at the bottom of your color scale and a dark color at the top.

3. If your comparison is relative to the average, put white or gray in the middle to represent “average.” Consider red for low and blue or green for high (depending on the context, for example when high is good and low is bad).

4. Five to seven categories is typical.
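
A sequential scale as described in rule 2 can be roughed out by blending from a light color to a dark one. This Python sketch uses plain linear RGB interpolation, which is an assumption of the example and not how ColorBrewer builds its palettes; the function name and the end colors are hypothetical.

```python
def sequential_ramp(light, dark, n):
    """Return n hex colors blended from a light RGB tuple to a dark one.

    Requires n >= 2 so that both end colors are included.
    """
    colors = []
    for i in range(n):
        t = i / (n - 1)  # 0.0 at the light end, 1.0 at the dark end
        rgb = [round(a + (b - a) * t) for a, b in zip(light, dark)]
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


# Five classes running from white to a dark blue.
ramp = sequential_ramp((255, 255, 255), (0, 0, 128), 5)
print(ramp)
```

For published maps, a tested palette such as those at colorbrewer2.org is usually a safer choice than a hand-built ramp.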