The Nature of the Bias When Studying Only Linkable Person Records:

Evidence from the American Community Survey

Adela Luque (U.S. Census Bureau)

Brittany Bond (U.S. Department of Commerce)

J. David Brown & Amy O’Hara (U.S. Census Bureau)

FedCASIC March 2014


Disclaimer

Any opinions and conclusions expressed herein are those of the authors and do not necessarily reflect the views of the U.S. Census Bureau

All results have been reviewed to ensure that no confidential information on individual persons is disclosed

Overview

Motivation

Objectives

Data & Methodology

Background on Anonymous Identifier Assignment Process

Expected Effects

Results

Conclusions


Motivation

Record linkage can enrich data, improve its quality & lead to research not otherwise possible - while reducing respondent burden & operational costs

Linking data requires common identifiers unique to each record that protect confidentiality

Census Bureau assigns Protected Identification Keys (PIKs) via a probabilistic matching algorithm: PVS (Person Identification Validation System)

Not possible to reliably assign a PIK to every record, which may introduce bias in data analysis


Objectives

What characteristics are associated with the probability of receiving a PIK? That is, what is the nature of the bias introduced by incomplete PIK assignment?

Help researchers understand nature of bias, interpret results more accurately, adjust/reweight linked analytical dataset

Examine bias using regression analysis - before & after changes in PVS. Do alterations to PVS improve PIK assignment rates as well as reduce bias?

NORC (2011) described some demographic and socio-economic characteristics of those records not getting a PIK


Data & Methodology

2009 & 2010 American Community Survey (ACS) – processed through PVS

Ongoing representative survey of the U.S. population

Socioeconomic, demographic & housing characteristics

50 states & DC - Annual sample approximately 4.5 million person records

Probit model for 2009 and 2010 separately

Dependent variable = 1 if person record received a PIK (0 otherwise)

Covariates:

Demographic characteristics: age, sex, race and Hispanic origin

Socio-economic characteristics: employment status, income, poverty status, marital status, level of education, public program participation, health insurance status, citizenship status, English proficiency, military status, mobility status, and household type

Housing and address-related characteristics: urban vs. rural, type of living quarter, age of living quarter

ACS replicate weights

Report marginal effects

2009 & 2010 results compared – before & after changes to PVS
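
The slides do not show the model equations, so the following is only a minimal sketch under assumed notation. In a probit model the probability that a record receives a PIK is Phi(x'b), where Phi is the standard normal CDF. For a 0/1 covariate, one common definition of the marginal effect is the change in that probability when the indicator is switched from 0 to 1, averaged over records with their survey weights. Whether the authors report average marginal effects or effects evaluated at the mean is not stated. The coefficients, covariate names, and weights below are made up for illustration and are not ACS estimates.

```python
import math

def norm_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pik_probability(beta: dict, x: dict) -> float:
    """Probit probability that a record receives a PIK: Phi(x'beta)."""
    index = beta["const"] + sum(beta[k] * v for k, v in x.items())
    return norm_cdf(index)

def average_marginal_effect(beta, records, weights, dummy):
    """Weighted average change in P(PIK) when `dummy` goes from 0 to 1."""
    total = 0.0
    for x, w in zip(records, weights):
        p1 = pik_probability(beta, {**x, dummy: 1})
        p0 = pik_probability(beta, {**x, dummy: 0})
        total += w * (p1 - p0)
    return total / sum(weights)

# Hypothetical coefficients and records (illustrative only, not ACS estimates).
beta = {"const": 1.4, "renter": -0.20, "mover": -0.50}
records = [
    {"renter": 0, "mover": 0},
    {"renter": 1, "mover": 1},
    {"renter": 1, "mover": 0},
]
weights = [120.0, 80.0, 100.0]

ame_renter = average_marginal_effect(beta, records, weights, "renter")
ame_mover = average_marginal_effect(beta, records, weights, "mover")
print(round(ame_renter, 4), round(ame_mover, 4))
```

A negative value means the group is less likely to be validated than its base category, which is how the bar charts in the results slides are read.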


Background on PVS

Probabilistic match of data from an incoming file (e.g., survey) to reference file containing data from the Social Security Administration enhanced with address data obtained from federal administrative records

If a match is found, person record receives a PIK or is “validated”


Background on PVS

Initial edit to clean & standardize linking fields (name, dob, sex & address)

Incoming data processed through cascading modules (or matching algorithms)

Only records failing a given module move on to the next

Impossible to compare all records in incoming file to all records in reference file → “blocking”

Data split into blocks/groups based on exact matches of certain fields or part of fields – probabilistic matching within block
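
Blocking can be sketched as follows. This is an illustration only: the records are made up, the blocking key (first three ZIP digits) is one assumed choice, and a simple string-similarity score stands in for the probabilistic scoring that PVS actually uses, which the slides do not describe.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def block_key(record: dict) -> str:
    """Blocking key: exact match on the first three ZIP digits (assumed choice)."""
    return record["zip"][:3]

def similarity(a: dict, b: dict) -> float:
    """Stand-in match score: name similarity, zeroed if birth dates differ."""
    if a["dob"] != b["dob"]:
        return 0.0
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()

def match_within_blocks(incoming, reference, threshold=0.85):
    """Compare each incoming record only to reference records in its block."""
    blocks = defaultdict(list)
    for ref in reference:
        blocks[block_key(ref)].append(ref)
    matches = {}
    for rec in incoming:
        candidates = blocks.get(block_key(rec), [])
        best = max(candidates, key=lambda ref: similarity(rec, ref), default=None)
        if best is not None and similarity(rec, best) >= threshold:
            matches[rec["id"]] = best["pik"]
    return matches

# Made-up records for illustration.
reference = [
    {"pik": "P001", "name": "Maria Lopez", "dob": "1980-02-14", "zip": "20746"},
    {"pik": "P002", "name": "John Smith", "dob": "1975-07-01", "zip": "10001"},
]
incoming = [
    {"id": "r1", "name": "Maria Lopes", "dob": "1980-02-14", "zip": "20747"},
    {"id": "r2", "name": "John Smith", "dob": "1975-07-01", "zip": "94110"},
]
matches = match_within_blocks(incoming, reference)
print(matches)  # {'r1': 'P001'}
```

Record `r2` goes unmatched even though an identical name and date of birth exist in the reference data, because the two records fall in different blocks. That is the cost of blocking: far fewer comparisons, but a record whose blocking field differs (for example, after a move) is never compared to its true match under that key.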


Background on PVS

2009 PVS Modules

Verification – Only for incoming files w/ SSNs

Geosearch looks for name/dob/gender matches after blocking on an address or address part (within 3-digit ZIP area)

Namesearch looks for name & dob matches within a block based on parts of name/dob

Each module has several ‘passes’ – different blocking & matching strategies
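
The cascade of modules can be sketched as below, assuming each module is a function that returns a PIK or `None`. The function names mirror the 2009 module names on this slide, but the bodies are hypothetical dictionary lookups, not PVS logic, and the lookup tables and records are made up.

```python
from typing import Callable, Optional

Record = dict
Module = Callable[[Record], Optional[str]]

# Hypothetical lookup tables standing in for the reference file.
SSN_TO_PIK = {"123-45-6789": "P001"}
NAME_DOB_ZIP3_TO_PIK = {("ann lee", "1990-05-05", "207"): "P002"}
NAME_DOB_TO_PIK = {("raj patel", "1988-11-30"): "P003"}

def verification(rec: Record) -> Optional[str]:
    """Only applies when the incoming record carries an SSN."""
    return SSN_TO_PIK.get(rec.get("ssn", ""))

def geosearch(rec: Record) -> Optional[str]:
    """Name/dob lookup restricted to the record's 3-digit ZIP area."""
    return NAME_DOB_ZIP3_TO_PIK.get((rec["name"], rec["dob"], rec["zip"][:3]))

def namesearch(rec: Record) -> Optional[str]:
    """Name/dob lookup without any address information."""
    return NAME_DOB_TO_PIK.get((rec["name"], rec["dob"]))

def run_cascade(records, modules):
    """Send each record through the modules in order; stop at the first PIK."""
    assigned, remaining = {}, list(records)
    for module in modules:
        still_unmatched = []
        for rec in remaining:
            pik = module(rec)
            if pik is None:
                still_unmatched.append(rec)  # failed here, moves on to the next module
            else:
                assigned[rec["id"]] = (pik, module.__name__)
        remaining = still_unmatched
    return assigned, [rec["id"] for rec in remaining]

records = [
    {"id": "a", "ssn": "123-45-6789", "name": "sam fox", "dob": "1970-01-01", "zip": "10001"},
    {"id": "b", "name": "ann lee", "dob": "1990-05-05", "zip": "20746"},
    {"id": "c", "name": "raj patel", "dob": "1988-11-30", "zip": "94110"},
    {"id": "d", "name": "no match", "dob": "2000-01-01", "zip": "60601"},
]
assigned, unvalidated = run_cascade(records, [verification, geosearch, namesearch])
print(assigned, unvalidated)
```

Record `d` fails every module and ends up without a PIK. Adding a module to the end of the list gives such records one more chance without touching records that were already validated.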


Background on PVS

2010 PVS Enhancements

ZIP3 Adjacency Module looks for name/dob/gender/address matches after blocking on address field parts in areas adjacent to 3-digit ZIP area

DOB Search Module looks for name/gender/dob matches after blocking on month & day of birth

Household Composition Search Module looks for name/dob matches for unmatched records that are seen in past at same address with PIKed record

Inclusion of ITINs in reference file


Expected Effects

Less likely to obtain a PIK:

Insufficient or inaccurate person identifying info in incoming record

Issues w/ data collection or withholding due to language barriers, trust in govt., privacy preferences

Identifying info in incoming file & reference file more likely to differ

Address info differs/not updated

Movers, rent vs. own, certain types of housing

Record not in government reference files

Newborns, recent immigrant, very poor/unemployed/no govt. program recipient


Results – Overall Validation Rates

[Figure: bar chart of PVS Validation Rate (weighted), in percent. 2009: 88.1; 2010: 92.6.]

Sources: 2009 & 2010 ACS
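
A weighted validation rate is the share of the weighted population whose records received a PIK. The sketch below shows the arithmetic on made-up records; the field names `has_pik` and `person_weight` are hypothetical, and the weights are not ACS weights.

```python
def weighted_validation_rate(records):
    """Percent of the weighted population whose record received a PIK."""
    total = sum(r["person_weight"] for r in records)
    validated = sum(r["person_weight"] for r in records if r["has_pik"])
    return 100.0 * validated / total

# Made-up records; weights are illustrative, not ACS weights.
sample = [
    {"has_pik": True, "person_weight": 150.0},
    {"has_pik": True, "person_weight": 50.0},
    {"has_pik": False, "person_weight": 50.0},
]
rate = weighted_validation_rate(sample)
print(rate)  # 80.0
```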

Probit Results

[Figure: bar chart, Marginal Effect of Hispanic Origin on PVS Validation. Categories: Hispanic; Non-Hispanic (base). Series: 2009, 2010. Axis range: -0.040 to 0.000.]

Sources: 2009 & 2010 ACS

Probit Results

[Figure: bar chart, Marginal Effect of Race on PVS Validation. Categories: White alone (base); Black alone; AIAN alone; Asian alone; NHPI alone; Some Other Race. Series: 2009, 2010. Axis range: -0.035 to 0.010.]

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.

Probit Results

[Figure: bar chart, Marginal Effect of Citizenship Status on PVS Validation. Categories: Non-U.S. Citizen; Foreign-Born U.S. Citizen; U.S. Citizen (base). Series: 2009, 2010. Axis range: -0.140 to 0.020.]

Probit Results

[Figure: bar chart, Marginal Effect of Home Spoken English Quality on PVS Validation. Categories: Poor Spoken English; Not Poorly Spoken English (base). Series: 2009, 2010. Axis range: -0.050 to 0.000.]

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.

Probit Results

[Figure: bar chart, Marginal Effect of Age on PVS Validation. Categories: 0-2; 3-5; 6-9; 10-14; 15-18; 19-24; 25-34 (base); 35-44; 45-54; 55-64; 65-74; 75 and Older. Series: 2009, 2010. Axis range: -0.050 to 0.050.]

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.

Probit Results

[Figure: bar chart, Marginal Effect of Mobility Status on PVS Validation. Categories: Non-Mover in 12 Months Before IM (base); Mover From Abroad in 12 Months before IM; Domestic Mover in 12 Months before IM; Moving status, missing. Series: 2009, 2010. Axis range: -0.180 to 0.020.]

Probit Results

[Figure: bar chart, Marginal Effect of Rent vs. Own on PVS Validation. Categories: Rented Housing Unit; Own Home (base). Series: 2009, 2010. Axis range: -0.040 to 0.000.]

Probit Results

[Figure: bar chart, Marginal Effect of Type of Living Quarter on PVS Validation. Categories: Mobile Home (base); Lives in Group Quarters; Detached One-Family House; Attached One-Family House; Building with 2 Apartments; Building with 3-4 Apartments; Building with 5-9 Apartments; Building with 10-19 Apartments; Building with 20-49 Apartments; Building with 50+ Apartments; Other (Boat/RV/Van, etc.). Series: 2009, 2010. Axis range: -0.040 to 0.050.]

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.

Probit Results

[Figure: bar chart, Marginal Effect of Family Household Status on PVS Validation. Categories: Non-Family Household (base); Living with Family. Series: 2009, 2010. Axis range: 0.0000 to 0.0300.]

Probit Results

[Figure: bar chart, Marginal Effect of Rural vs Urban Area on PVS Validation. Categories: Rural (base); Urban Area. Series: 2009, 2010. Axis range: -0.004 to 0.010.]

Probit Results

[Figure: bar chart, Marginal Effect of Employment Status on PVS Validation. Categories: Private Employment; Government Employment; Self-Employed; Family Employment; Not Employed - includes missing (base). Series: 2009, 2010. Axis range: -0.005 to 0.035.]

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.

Probit Results

[Figure: bar chart, Marginal Effect of Health Insurance Status on PVS Validation. Categories: Social Security Recipient; Private Health Insurance; Public Health Insurance; Both Private and Public Health Insurance; Uninsured (base). Series: 2009, 2010. Axis range: 0.000 to 0.060.]

Probit Results

[Figure: bar chart, Marginal Effect of Receiving Public Assistance on PVS Validation. Categories: Public Assistance Recipient; Not Public Assistance Recipient (base). Series: 2009, 2010. Axis range: 0.0000 to 0.0140.]

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.

Probit Results

[Figure: bar chart, Marginal Effect of Poverty Status on PVS Validation. Categories: Not In Poverty (base); In Poverty. Series: 2009, 2010. Axis range: -0.020 to 0.000.]

Conclusions

Mobile persons, those with lower income, unemployed, in process of integrating in economy/society, non-participants in government programs are less likely to be validated

Renters, movers, mobile homes

Low income, non-employed, most minorities, non-U.S. citizens, poor English

Non-participants of govt. program, uninsured, non-military

Researchers may wish to reweight observations based on validation propensity
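
One way to do this is inverse-propensity weighting: divide each validated record's survey weight by its estimated probability of validation, so that groups less likely to receive a PIK count for more in the linked dataset. The slides do not prescribe a specific adjustment, so this is only a sketch. It assumes the propensities have already been estimated (for example, from a probit model like the one described earlier), and all field names and numbers are made up.

```python
def reweight_validated(records):
    """Return (id, adjusted weight) for validated records only.

    Adjusted weight = survey weight / estimated validation propensity.
    """
    adjusted = []
    for r in records:
        if r["has_pik"]:
            adjusted.append((r["id"], r["weight"] / r["propensity"]))
    return adjusted

# Made-up records: `propensity` is an assumed, pre-estimated P(PIK = 1 | x).
records = [
    {"id": "owner_1", "has_pik": True, "weight": 100.0, "propensity": 0.95},
    {"id": "renter_1", "has_pik": True, "weight": 100.0, "propensity": 0.80},
    {"id": "renter_2", "has_pik": False, "weight": 100.0, "propensity": 0.80},
]
adjusted = dict(reweight_validated(records))
for rec_id, w in adjusted.items():
    print(rec_id, round(w, 1))
```

In this toy example the validated renter is weighted up more than the validated owner, compensating for the renter without a PIK who drops out of the linked data.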


Conclusions

Changes to PVS system

Increased overall validation rate by 4.5 percentage points

Reduced validation differences across most groups from 2009 to 2010

Record linkage research can lead to higher PIK assignment rates and less bias


Thank you!

adela.luque@census.gov
