Unemployment in America

Aug 07, 2015

Joseph Reiter
Page 1: Unemployment in America

UNEMPLOYMENT IN AMERICA

Analysis of the Census Planning Database, 2010 Census and 2008-2012 American Community Survey

Joseph Reiter – Villanova University Regression Methods – MAT8406

Page 2: Unemployment in America

Regression Methods - MAT8406

Overview
• Explanation of Dataset
• Treatment and Selection of Variables
• Model Assumptions
• Model Comparison
• Interpretation of Results

Joseph Reiter - Villanova University

Page 3: Unemployment in America

Dataset
• Tract Level Planning Database with 2010 Census and 2008–2012 American Community Survey Data
• Tracts ≈ 4,000 people, same general geographic area for each census
• 73,068 total observations
• 559 variables
  • Geography
  • Population
  • Households
  • Housing Units
  • Operational
• Response Variable = % Unemployment

Page 4: Unemployment in America

Additional Variables
• Added indicator variables for 9 regions of the country
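The slides do not say how the region indicators were coded. The next slide counts 559 + 8 variables, which is consistent with one region serving as the reference level. A minimal sketch under that assumption follows; the nine labels are the Census Bureau divisions, assumed here to be the "9 regions", and the function name, column prefix, and choice of reference level are hypothetical.

```python
# Sketch only: the nine labels are the Census Bureau divisions, assumed to be
# the "9 regions" on the slide; the slides do not name them.
DIVISIONS = [
    "New England", "Middle Atlantic", "East North Central",
    "West North Central", "South Atlantic", "East South Central",
    "West South Central", "Mountain", "Pacific",
]

def region_indicators(regions, levels=DIVISIONS, reference="Pacific"):
    """Return one 0/1 column per non-reference level (k levels -> k - 1 columns)."""
    kept = [lvl for lvl in levels if lvl != reference]
    return {f"region_{lvl}": [int(r == lvl) for r in regions] for lvl in kept}

tracts = ["Mountain", "Pacific", "New England", "Mountain"]  # made-up tract regions
dummies = region_indicators(tracts)
print(len(dummies))                 # number of indicator columns
print(dummies["region_Mountain"])   # 1 where the tract is in the Mountain division
```

A tract in the reference region has zeros in every indicator column, so its effect is absorbed by the intercept.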

Page 5: Unemployment in America

Variable Reduction
• Start with 559 + 8 (567 variables)
• Only use percent variables (remove ½)
• Drop margin of error variables (about ¼)
• Remove repeats between census data and ACS data
• Drop variables related to specific languages spoken (only keep if English was spoken as primary language or English spoken very well)
• Drop operational variables (specifics about census returns)
• Drop variables which are linear combinations of other variables (e.g. pct_Male is a linear combination of pct_Female)
• 58 independent variables remain
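The reduction steps above can be sketched as a filter on column names. This is an illustration only: apart from pct_Male and pct_Female, which the slide mentions, every column name below is hypothetical, as are the `_MOE` suffix convention and the `reduce_columns` helper.

```python
# Hypothetical column names; only pct_Male and pct_Female appear on the slide.
columns = [
    "Tot_Population", "pct_Male", "pct_Female",
    "pct_Pov", "pct_Pov_MOE", "pct_College", "pct_Mail_Return",
]

def reduce_columns(names, operational=("pct_Mail_Return",)):
    """Apply the name-based reduction rules in order."""
    keep = []
    for name in names:
        if not name.startswith("pct_"):   # only use percent variables
            continue
        if name.endswith("_MOE"):         # drop margin-of-error variables
            continue
        if name in operational:           # drop operational variables
            continue
        keep.append(name)
    return keep

kept = reduce_columns(columns)

# pct_Male + pct_Female = 100, so with an intercept in the model one of the
# two carries no extra information; keep only one of the pair.
redundant = {"pct_Male"}
kept = [name for name in kept if name not in redundant]
print(kept)
```

With a real design matrix, exact linear dependencies could also be found numerically, for example by checking the matrix rank, rather than listed by hand.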

Page 6: Unemployment in America

Variable Selection
• LASSO: selected 33 variables
• Forward Selection: 48 variables
• Backwards Elimination: 47 variables
• Any variable that was not selected by more than one method was eliminated (48 left)
• Remove non-significant variables and address some multicollinearity issues (31 left)
• Using all possible regressions, variables with large individual R² values were chosen (> 0.1)
• RESULT: 5 variable model
  • % Persons Below Poverty Level
  • % Non-Hispanic Black
  • % On Public Assistance
  • % Not Graduate High School
  • % College Degrees
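The "selected by more than one method" rule amounts to a simple vote across the three selection methods. A minimal sketch, using made-up variable names and far fewer variables than the 33, 48 and 47 actually selected:

```python
from collections import Counter

# Made-up selections for illustration; the variable names are hypothetical.
selected = {
    "lasso":    {"pct_Pov", "pct_College", "pct_Pub_Asst"},
    "forward":  {"pct_Pov", "pct_College", "pct_Renter", "pct_Not_HS"},
    "backward": {"pct_Pov", "pct_Not_HS", "pct_Vacant"},
}

# Count how many methods picked each variable.
votes = Counter(var for chosen in selected.values() for var in chosen)

# Keep variables picked by more than one method.
kept = sorted(var for var, n in votes.items() if n > 1)
print(kept)
```

Variables chosen by a single method (here pct_Pub_Asst, pct_Renter and pct_Vacant) are dropped before the significance and multicollinearity checks.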

Page 7: Unemployment in America

Evaluating Variance of Residuals

[Figure: grid of residual plots. Columns: No Transform, Arcsin(sqrt), Y^(1/3). Rows: residuals with zeros, residuals with no zeros, residuals with high-influence points removed.]
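The two response transformations named in the column headings can be sketched as follows. The slides do not show the exact formulas used, so this assumes the percentage is first divided by 100 for the arcsine square-root transform; the function names are hypothetical.

```python
import math

def arcsin_sqrt(pct):
    """Arcsine square-root transform of a percentage in [0, 100]."""
    return math.asin(math.sqrt(pct / 100.0))

def arcsin_sqrt_inverse(t):
    """Map a transformed value in [0, pi/2] back to a percentage."""
    return 100.0 * math.sin(t) ** 2

def cube_root(pct):
    """Cube-root transform, Y^(1/3), for a non-negative percentage."""
    return pct ** (1.0 / 3.0)

def cube_root_inverse(t):
    """Undo the cube-root transform."""
    return t ** 3

# Round-trip a few made-up unemployment percentages through both transforms.
for pct in (0.0, 5.0, 12.5, 40.0):
    a = arcsin_sqrt(pct)
    c = cube_root(pct)
    print(pct, round(arcsin_sqrt_inverse(a), 6), round(cube_root_inverse(c), 6))
```

The inverse functions are what allow predictions from a transformed model to be reported on the original percent scale.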

Page 8: Unemployment in America

Evaluating Normality of Residuals

[Figure: grid of QQ plots. Columns: No Transform, Arcsin(sqrt), Y^(1/3). Rows: QQ plots with zeros, QQ plots with no zeros, QQ plots with high-influence points removed.]
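For readers unfamiliar with QQ plots, the points on such a plot pair each sorted residual with the corresponding standard-normal quantile. A minimal sketch using only the standard library follows; the residual values are made up, and the plotting positions (i − 0.5)/n are one common convention, not necessarily the one behind the original figures.

```python
from statistics import NormalDist

def qq_points(residuals):
    """Pair each sorted residual with the matching standard-normal quantile."""
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n keep the probabilities strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

residuals = [-1.9, -0.7, -0.2, 0.1, 0.4, 0.9, 3.8]  # made-up values
for q, r in qq_points(residuals):
    print(round(q, 3), r)
```

If the residuals are roughly normal, the pairs fall close to a straight line; a long right tail, as in the made-up data, bends the upper end of the plot upward.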

Page 9: Unemployment in America

Model Comparison

Parameter Estimates (transformed to original percents)

                   No Transform {A}   Arcsin(sqrt) {B}   Y^(1/3) {C}
R²                 0.4686             0.4524             0.4330
Intercept          6.99               7.16               7.21
Prs Blw Pov Lev    0.141              0.109              0.0956
NH Blk alone       0.0692             0.0515             0.0446
PUB ASST INC       0.325              0.243              0.212
Not HS Grad        0.0159             0.0167             0.0174
College            -0.0526            -0.0525            -0.0531

With High Influence Points Removed

                        No Transform {D}   Arcsin(sqrt) {E}   Y^(1/3) {F}
R²                      0.4810             0.4591             0.4374
Number Points Removed   281                159                148
Intercept               6.99               7.13               7.20
Prs Blw Pov Lev         0.132              0.105              0.0954
NH Blk alone            0.0670             0.0512             0.0445
PUB ASST INC            0.333              0.252              0.216
Not HS Grad             0.0192             0.0176             0.0175
College                 -0.0513            -0.0519            -0.0529
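As a worked example of reading the coefficients, the untransformed Model A is a plain linear equation in the five percentages. The sketch below uses the Model A estimates as printed; the dictionary keys and function name are hypothetical, and the same arithmetic should not be assumed to reproduce the transformed models B, C, E and F exactly.

```python
# Model A (no transform) coefficients as printed in the comparison table.
MODEL_A = {
    "intercept": 6.99,
    "pov": 0.141,        # % persons below poverty level
    "nh_black": 0.0692,  # % non-Hispanic Black
    "pub_asst": 0.325,   # % on public assistance
    "not_hs": 0.0159,    # % not high school graduates
    "college": -0.0526,  # % with college degrees
}

def predict_unemployment(tract, model=MODEL_A):
    """Linear prediction of % unemployment from the five tract percentages."""
    return model["intercept"] + sum(model[k] * v for k, v in tract.items())

# Tract values in percent (these match the first row of the Test Points slide).
tract = {"pov": 16, "nh_black": 13, "pub_asst": 3, "not_hs": 15, "college": 28}
print(round(predict_unemployment(tract), 2))
```

For this tract the result rounds to 9.89, which agrees with the Model A column for the first test point.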

Page 10: Unemployment in America

Test Points

Prs Blw Pov Lev   NH Blk alone   PUB ASST INC   Not HS Grad   College   Model A   Model B   Model C   Model D   Model E   Model F   Actual
16                13             3              15            28        9.89      9.17      8.82      9.82      9.16      8.83      -
9                 0              0              1             54        5.44      5.42      5.39      5.42      5.40      5.39      -
0                 16             0              13            25        6.99      6.88      6.83      7.03      6.88      6.82      -
11                6              6              21            14        10.50     9.93      9.64      10.52     9.95      9.66      -
29                46             0              18            81        10.30     8.78      8.05      10.08     8.72      8.05      -
35                10             4              34            16        13.62     12.92     12.51     13.43     12.87     12.51     -
5.23              3.33           5.54           3.75          38.97     7.77      7.25      7.02      7.82      7.28      7.03      8.85
28.05             78.19          4.73           17.99         18.21     17.23     16.41     15.83     16.91     16.35     15.85     17.65
5.86              2.57           2.07           7.33          35.76     6.90      6.67      6.57      6.93      6.67      6.56      9.65
31.82             86.12          0.00           33.37         0.61      17.94     17.66     17.45     17.56     17.52     17.42     12.19
13.57             5.00           1.89           14.95         9.16      9.62      9.23      9.03      9.56      9.20      9.03      7.84
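One way to compare the six models on the test points is the root-mean-square error over the five rows that report an actual value. The slides do not compute this; it is offered only as a sketch, and five points are far too few to rank the models reliably.

```python
import math

# Predictions and observed % unemployment for the five test tracts that have
# an "Actual" value, copied from the Test Points table.
actual = [8.85, 17.65, 9.65, 12.19, 7.84]
predictions = {
    "A": [7.77, 17.23, 6.90, 17.94, 9.62],
    "B": [7.25, 16.41, 6.67, 17.66, 9.23],
    "C": [7.02, 15.83, 6.57, 17.45, 9.03],
    "D": [7.82, 16.91, 6.93, 17.56, 9.56],
    "E": [7.28, 16.35, 6.67, 17.52, 9.20],
    "F": [7.03, 15.85, 6.56, 17.42, 9.03],
}

def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

for name, pred in predictions.items():
    print(name, round(rmse(pred, actual), 2))
```

The fourth tract, where every model overpredicts by more than five percentage points, dominates the error for all six models.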

Page 11: Unemployment in America

Random Forest

Page 12: Unemployment in America

Model suggested by random forest

Page 13: Unemployment in America

Maps

Page 14: Unemployment in America

Maps

Page 15: Unemployment in America

Conclusions
• Somewhat different models suggested by selection methods and random forest
• Model suggested is fairly robust to outliers and influential points
• Geographic region appeared to be less critical than expected, but still significant (although not included in the fully reduced models)
• Correlational relationship; may not be a causal relationship, or causation may run in the opposite direction (e.g. percent below poverty level)
• Size of dataset presented issues with selecting variables; the selected model is not necessarily the best model
• More models should be developed and considered before concluding on the best model

Page 16: Unemployment in America

Thank You!
• Questions?
