The Nature of the Bias When Studying Only Linkable Person Records: Evidence from the American Community Survey Adela Luque (U.S. Census Bureau) Brittany Bond (U.S. Department of Commerce) J. David Brown & Amy O’Hara (U.S. Census Bureau) FedCASIC March 2014 1

Mar 29, 2015



The Nature of the Bias When Studying Only Linkable Person Records:

Evidence from the American Community Survey

Adela Luque (U.S. Census Bureau)

Brittany Bond (U.S. Department of Commerce)

J. David Brown & Amy O’Hara (U.S. Census Bureau)

FedCASIC March 2014


Disclaimer

Any opinions and conclusions expressed herein are those of the authors and do not necessarily reflect the views of the U.S. Census Bureau

All results have been reviewed to ensure that no confidential information on individual persons is disclosed

Overview

Motivation
Objectives
Data & Methodology
Background on Anonymous Identifier Assignment Process
Expected Effects
Results
Conclusions


Motivation

Record linkage can enrich data, improve its quality & lead to research not otherwise possible - while reducing respondent burden & operational costs

Linking data requires common identifiers unique to each record that protect confidentiality

Census Bureau assigns Protected Identification Keys (PIKs) via a probabilistic matching system: PVS (Person Identification Validation System)

Not possible to reliably assign a PIK to every record, which may introduce bias in data analysis


Objectives

What characteristics are associated with the probability of receiving a PIK? That is, what is the nature of the bias introduced by incomplete PIK assignment?

Help researchers understand nature of bias, interpret results more accurately, adjust/reweight linked analytical dataset

Examine bias using regression analysis - before & after changes in PVS. Do alterations to PVS improve PIK assignment rates as well as reduce bias?

NORC (2011) described some demographic and socio-economic characteristics of those records not getting a PIK


Data & Methodology

2009 & 2010 American Community Survey (ACS) – processed through PVS
  Ongoing representative survey of the U.S. population
  Socioeconomic, demographic & housing characteristics
  50 states & DC - Annual sample approximately 4.5 million person records

Probit model for 2009 and 2010 separately
  Dependent variable = 1 if person record received a PIK (0 otherwise)
  Covariates:
    Demographic characteristics: age, sex, race and Hispanic origin
    Socio-economic characteristics: employment status, income, poverty status, marital status, level of education, public program participation, health insurance status, citizenship status, English proficiency, military status, mobility status, and household type
    Housing and address-related characteristics: urban vs. rural, type of living quarter, age of living quarter
  ACS replicate weights
  Report marginal effects

2009 & 2010 results compared – before & after changes to PVS
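To make "report marginal effects" concrete, the following is a minimal Python sketch (standard library only) of the discrete-change marginal effect of a 0/1 covariate in a probit model: the change in the predicted probability of receiving a PIK when the covariate switches from 0 to 1, holding the rest of the index fixed. The function names and the coefficient values are hypothetical illustration choices, not estimates from the ACS analysis, and the slides do not state whether effects were evaluated at the mean or averaged over the sample.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution; cdf() is the probit link


def probit_probability(index: float) -> float:
    """P(PIK = 1) for a given linear index x'beta under a probit model."""
    return _STD_NORMAL.cdf(index)


def dummy_marginal_effect(baseline_index: float, coefficient: float) -> float:
    """Discrete change in P(PIK = 1) when a 0/1 covariate goes from 0 to 1."""
    return probit_probability(baseline_index + coefficient) - probit_probability(baseline_index)


# Hypothetical numbers, for illustration only.
baseline_index = 1.2   # x'beta for a record in the base category
coefficient = -0.3     # probit coefficient on the dummy of interest
effect = dummy_marginal_effect(baseline_index, coefficient)
print(f"marginal effect: {effect:+.3f}")
```

A negative value means records in that category are less likely to be validated than the base category, which is how the bars on the results slides should be read.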


Background on PVS

Probabilistic match of data from an incoming file (e.g., survey) to reference file containing data from the Social Security Administration enhanced with address data obtained from federal administrative records

If a match is found, person record receives a PIK or is “validated”


Background on PVS

Initial edit to clean & standardize linking fields (name, dob, sex & address)

Incoming data processed through cascading modules (or matching algorithms)

Only records failing a given module move on to the next

Impossible to compare all records in incoming file to all records in reference file → “blocking”

Data split into blocks/groups based on exact matches of certain fields or part of fields – probabilistic matching within block
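The blocking idea described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PVS implementation: the blocking key (3-digit ZIP prefix plus first letter of last name), the string-similarity score standing in for a probabilistic match weight, the 0.85 threshold, and all records are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def block_key(record: dict) -> tuple:
    """Hypothetical blocking key: 3-digit ZIP prefix plus first letter of last name."""
    return (record["zip"][:3], record["last"][:1].upper())


def similarity(a: dict, b: dict) -> float:
    """Crude stand-in for a probabilistic match score: mean string similarity."""
    fields = ("first", "last", "dob")
    scores = [SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio() for f in fields]
    return sum(scores) / len(fields)


def match_within_blocks(incoming, reference, threshold=0.85):
    """Compare each incoming record only with reference records in the same block."""
    blocks = defaultdict(list)
    for ref in reference:
        blocks[block_key(ref)].append(ref)

    matches = {}
    for rec in incoming:
        candidates = blocks.get(block_key(rec), [])
        if not candidates:
            continue  # nothing to compare against in this block
        best = max(candidates, key=lambda ref: similarity(rec, ref))
        if similarity(rec, best) >= threshold:
            matches[rec["id"]] = best["pik"]
    return matches


# Made-up records for illustration.
reference = [
    {"pik": "P001", "first": "Maria", "last": "Lopez", "dob": "1980-04-12", "zip": "20746"},
    {"pik": "P002", "first": "John", "last": "Smith", "dob": "1975-09-30", "zip": "10001"},
]
incoming = [
    {"id": 1, "first": "Mariah", "last": "Lopez", "dob": "1980-04-12", "zip": "20747"},
    {"id": 2, "first": "Jon", "last": "Smyth", "dob": "1975-09-30", "zip": "94110"},
]
matches = match_within_blocks(incoming, reference)
```

Record 1 is matched despite the spelling difference because it falls in the same block as its reference record. Record 2 is never compared with its likely counterpart because its ZIP prefix puts it in a different block, which is the cost of blocking and the reason later passes use different blocking fields.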


Background on PVS

2009 PVS Modules

Verification – Only for incoming files w/ SSNs

Geosearch looks for name/dob/gender matches after blocking on an address or address part (within 3-digit ZIP area)

Namesearch looks for name & dob matches within a block based on parts of name/dob

Each module has several ‘passes’ – different blocking & matching strategies
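The cascade of modules, where only records that fail one module move on to the next, can be sketched as follows. This is a simplified illustration, not the PVS code: each module is reduced to an exact dictionary lookup standing in for a probabilistic search, the index contents and records are made up, and verification is applied per record here although the slide describes it as applying to incoming files that carry SSNs.

```python
from typing import Callable, Optional

Module = Callable[[dict], Optional[str]]  # returns a PIK, or None if no match

# Hypothetical reference indexes, one per module.
SSN_INDEX = {"123-45-6789": "P001"}
GEO_INDEX = {("lopez", "1980-04-12", "207"): "P002"}   # name, dob, 3-digit ZIP
NAME_INDEX = {("smith", "1975-09-30"): "P003"}         # name, dob


def verification(rec: dict) -> Optional[str]:
    return SSN_INDEX.get(rec.get("ssn"))


def geosearch(rec: dict) -> Optional[str]:
    return GEO_INDEX.get((rec["last"].lower(), rec["dob"], rec["zip"][:3]))


def namesearch(rec: dict) -> Optional[str]:
    return NAME_INDEX.get((rec["last"].lower(), rec["dob"]))


def run_cascade(records, modules):
    """Run modules in order; a record leaves the cascade at the first module that matches it."""
    assigned = {}
    remaining = list(records)
    for name, module in modules:
        still_unmatched = []
        for rec in remaining:
            pik = module(rec)
            if pik is None:
                still_unmatched.append(rec)      # passed on to the next module
            else:
                assigned[rec["id"]] = (pik, name)
        remaining = still_unmatched
    return assigned, remaining


records = [
    {"id": 1, "ssn": "123-45-6789", "last": "Doe", "dob": "1990-01-01", "zip": "20001"},
    {"id": 2, "last": "Lopez", "dob": "1980-04-12", "zip": "20746"},
    {"id": 3, "last": "Smith", "dob": "1975-09-30", "zip": "94110"},
    {"id": 4, "last": "Nguyen", "dob": "2009-06-15", "zip": "77002"},
]
assigned, unmatched = run_cascade(
    records,
    [("verification", verification), ("geosearch", geosearch), ("namesearch", namesearch)],
)
```

In this toy run, records 1 to 3 are each picked up by a different module and record 4 leaves the cascade without a PIK. Records of that last kind are the ones the probit analysis characterizes.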


Background on PVS

2010 PVS Enhancements

ZIP3 Adjacency Module looks for name/dob/gender/address matches after blocking on address field parts in areas adjacent to 3-digit ZIP area

DOB Search Module looks for name/gender/dob matches after blocking on month & day of birth

Household Composition Search Module looks for name/dob matches for unmatched records that are seen in past at same address with PIKed record

Inclusion of ITINs in reference file


Expected Effects

Less likely to obtain a PIK:

Insufficient or inaccurate person identifying info in incoming record
  Issues w/ data collection or withholding due to language barriers, trust in govt., privacy preferences

Identifying info in incoming file & reference file more likely to differ
  Address info differs/not updated
  Movers, rent vs. own, certain types of housing

Record not in government reference files
  Newborns, recent immigrant, very poor/unemployed/no govt. program recipient


Results – Overall Validation Rates

PVS Validation Rate (weighted, %)

2009: 88.1
2010: 92.6

Sources: 2009 & 2010 ACS
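As a small worked example of the figures above, the sketch below shows how a weighted validation rate is computed and checks the year-over-year change. The function name and the toy records are hypothetical; only the two published rates (88.1 and 92.6) come from the slide.

```python
def weighted_validation_rate(records):
    """Weighted share (in percent) of records with a PIK; each record is (has_pik, weight)."""
    total = sum(weight for _, weight in records)
    validated = sum(weight for has_pik, weight in records if has_pik)
    return 100.0 * validated / total


# Toy records for illustration: two validated, one not.
toy = [(True, 120.0), (True, 80.0), (False, 50.0)]
toy_rate = weighted_validation_rate(toy)

# Published weighted rates from the slide.
rate_2009, rate_2010 = 88.1, 92.6
change_pp = round(rate_2010 - rate_2009, 1)  # change in percentage points
```

The difference between the two published rates is the 4.5 percentage point increase cited in the conclusions.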


Probit Results

[Figure: Marginal Effect of Hispanic Origin on PVS Validation. Bars for 2009 and 2010. Categories: Hispanic; Non-Hispanic (base). Vertical axis: marginal effect, -0.040 to 0.000.]

Sources: 2009 & 2010 ACS


Probit Results

[Figure: Marginal Effect of Race on PVS Validation. Bars for 2009 and 2010. Categories: White alone (base); Black alone; AIAN alone; Asian alone; NHPI alone; Some Other Race. Vertical axis: marginal effect, -0.035 to 0.010.]

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.


Probit Results

[Figure: Marginal Effect of Citizenship Status on PVS Validation. Bars for 2009 and 2010. Categories: Non-U.S. Citizen; Foreign-Born U.S. Citizen; U.S. Citizen (base). Vertical axis: marginal effect, -0.140 to 0.020.]


Probit Results

[Figure: Marginal Effect of Home Spoken English Quality on PVS Validation. Bars for 2009 and 2010. Categories: Poor Spoken English; Not Poorly Spoken English (base). Vertical axis: marginal effect, -0.050 to 0.000.]

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.


Probit Results

[Figure: Marginal Effect of Age on PVS Validation. Bars for 2009 and 2010. Age groups: 0-2; 3-5; 6-9; 10-14; 15-18; 19-24; 25-34 (base); 35-44; 45-54; 55-64; 65-74; 75 and Older. Vertical axis: marginal effect, -0.050 to 0.050.]

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.


Probit Results

[Figure: Marginal Effect of Mobility Status on PVS Validation. Bars for 2009 and 2010. Categories: Non-Mover in 12 Months Before IM (base); Mover From Abroad in 12 Months before IM; Domestic Mover in 12 Months before IM; Moving status, missing. Vertical axis: marginal effect, -0.180 to 0.020.]


Probit Results

[Figure: Marginal Effect of Rent vs. Own on PVS Validation. Bars for 2009 and 2010. Categories: Rented Housing Unit; Own Home (base). Vertical axis: marginal effect, -0.040 to 0.000.]


Probit Results

[Figure: Marginal Effect of Type of Living Quarter on PVS Validation. Bars for 2009 and 2010. Categories: Mobile Home (base); Lives in Group Quarters; Detached One-Family House; Attached One-Family House; Building with 2 Apartments; Building with 3-4 Apartments; Building with 5-9 Apartments; Building with 10-19 Apartments; Building with 20-49 Apartments; Building with 50+ Apartments; Other (Boat/RV/Van, etc.). Vertical axis: marginal effect, -0.040 to 0.050.]

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.


Probit Results

[Figure: Marginal Effect of Family Household Status on PVS Validation. Bars for 2009 and 2010. Categories: Non-Family Household (base); Living with Family. Vertical axis: marginal effect, 0.0000 to 0.0300.]


Probit Results

[Figure: Marginal Effect of Rural vs Urban Area on PVS Validation. Bars for 2009 and 2010. Categories: Rural (base); Urban Area. Vertical axis: marginal effect, -0.004 to 0.010.]


Probit Results

[Figure: Marginal Effect of Employment Status on PVS Validation. Bars for 2009 and 2010. Categories: Private Employment; Government Employment; Self-Employed; Family Employment; Not Employed - includes missing (base). Vertical axis: marginal effect, -0.005 to 0.035.]

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.


Probit Results

[Figure: Marginal Effect of Health Insurance Status on PVS Validation. Bars for 2009 and 2010. Categories: Social Security Recipient; Private Health Insurance; Public Health Insurance; Both Private and Public Health Insurance; Uninsured (base). Vertical axis: marginal effect, 0.000 to 0.060.]


Probit Results

[Figure: Marginal Effect of Receiving Public Assistance on PVS Validation. Bars for 2009 and 2010. Categories: Public Assistance Recipient; Not Public Assistance Recipient (base). Vertical axis: marginal effect, 0.0000 to 0.0140.]

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.


Probit Results

[Figure: Marginal Effect of Poverty Status on PVS Validation. Bars for 2009 and 2010. Categories: Not In Poverty (base); In Poverty. Vertical axis: marginal effect, -0.020 to 0.000.]


Conclusions

Mobile persons, those with lower income, unemployed, in process of integrating in economy/society, non-participants in government programs are less likely to be validated
  Renters, movers, mobile homes
  Low income, non-employed, most minorities, non-U.S. citizens, poor English
  Non-participants of govt. program, uninsured, non-military

Researchers may wish to reweight observations based on validation propensity
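One common way to reweight on validation propensity is inverse-propensity weighting: keep only the linked (PIKed) records and divide each survey weight by that record's estimated probability of being validated. The sketch below assumes this approach, which the slides do not spell out. The field names, the toy records, and the propensities are hypothetical; in practice the propensity would come from a model such as the probit described earlier in the deck.

```python
def reweight_linked_records(records):
    """Keep PIKed records and scale each weight by 1 / validation propensity."""
    reweighted = []
    for rec in records:
        if not rec["has_pik"]:
            continue  # unlinked records cannot enter the linked analysis
        propensity = rec["propensity"]
        if not 0.0 < propensity <= 1.0:
            raise ValueError("propensity must be in (0, 1]")
        reweighted.append({**rec, "adj_weight": rec["weight"] / propensity})
    return reweighted


# Toy data: 4 owners (3 validated) and 4 renters (2 validated), weight 100 each.
# Propensities are set to each group's validation rate for illustration.
toy = (
    [{"group": "owner", "weight": 100.0, "has_pik": i < 3, "propensity": 0.75} for i in range(4)]
    + [{"group": "renter", "weight": 100.0, "has_pik": i < 2, "propensity": 0.50} for i in range(4)]
)
linked = reweight_linked_records(toy)

# Sum of adjusted weights by group among linked records.
totals = {}
for rec in linked:
    totals[rec["group"]] = totals.get(rec["group"], 0.0) + rec["adj_weight"]
```

In this toy case the adjusted weights of the linked records in each group add back up to the group's original weighted total of 400, so renters are no longer under-represented in the linked sample.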


Conclusions

Changes to PVS system
  Increased overall validation rate by 4.5 percentage points
  Reduced validation differences across most groups from 2009 to 2010

Record linkage research can lead to higher PIK assignment rates and less bias
