Local Employment Dynamics Data: Advanced Topics C2ER Training Workshop June 4, 2012 Stephen Tibbets Erika McEntarfer LEHD Program US Census Bureau
Apr 01, 2015
Overview of this section
• Confidentiality protection in QWI and OnTheMap
• Data identities
• Understanding differences between LED data and other data products
• New data items: education, race & ethnicity, firm age and size
• Question and answer session
Confidentiality protection in QWI
• QWI was one of the first public-use data products to use noise infusion to protect the underlying microdata
– Chief advantage: noise infusion allows release of small cells that would otherwise be suppressed
– Noise infusion cannot fully protect cells with very few observations, so some cell suppression remains in QWI
Noise Infusion (“Fuzzing”)
• How noise infusion works
– Every data item is distorted by at least a minimum amount
– For a given workplace, data are always distorted in the same direction and by the same percentage, in every period and every release of the QWI
• When aggregated, the effects of the distortion cancel out for the vast majority of the estimates
– The fewer entities in the cell, the more protection (distortion) needed
– Be aware of noise infusion and suppression when aggregating small cells
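The mechanics on this slide can be illustrated with a minimal sketch. The distortion band (1% to 5%) and the names `draw_fuzz_factor` and `fuzzed_total` are assumptions made for illustration only; the slides do not give the actual QWI parameters or procedure.

```python
import random

MIN_DISTORT = 0.01  # hypothetical minimum distortion (assumption, not the QWI value)
MAX_DISTORT = 0.05  # hypothetical maximum distortion (assumption, not the QWI value)


def draw_fuzz_factor(rng):
    """Draw one permanent multiplicative factor for a workplace.

    The factor is never closer to 1 than MIN_DISTORT, so every data item
    is distorted by at least a minimum amount.
    """
    magnitude = rng.uniform(MIN_DISTORT, MAX_DISTORT)
    direction = rng.choice([-1, 1])
    return 1.0 + direction * magnitude


def fuzzed_total(true_counts, factors):
    """Sum workplace counts after applying each workplace's own factor."""
    return sum(count * factors[wp] for wp, count in true_counts.items())


rng = random.Random(2012)
workplaces = [f"wp{i}" for i in range(2000)]
# Drawn once per workplace and reused in every quarter and release.
factors = {wp: draw_fuzz_factor(rng) for wp in workplaces}

# A large cell: distortions in both directions largely cancel.
large_cell = {wp: 50 for wp in workplaces}
large_cell_error = abs(fuzzed_total(large_cell, factors) / sum(large_cell.values()) - 1.0)

# A one-workplace cell: the full distortion remains visible.
small_cell = {"wp0": 50}
small_cell_error = abs(fuzzed_total(small_cell, factors) / 50 - 1.0)

print(f"large cell relative error: {large_cell_error:.4f}")
print(f"single-workplace cell relative error: {small_cell_error:.4f}")
```

Because each factor is fixed per workplace, the same workplace is distorted in the same direction and by the same percentage in every quarter, while a cell built from many workplaces ends up much closer to its true total than a cell built from one.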
Confidentiality protection in OnTheMap/LODES
• LODES is one of the first partially synthetic data releases
– Workforce characteristics by place of residence are synthesized conditional on the underlying microdata
– Workforce characteristics by place of work are not synthesized (thus the ‘partially’)
QWI Identities: Overview
• A number of identities have been defined to relate QWIs both within and across quarters
– A complete list can be found in the infrastructure document, section A.2.4
• Identities hold at the establishment level (in restricted-use data)
• Identities may not hold exactly in publicly released data due to a number of factors, including:
– weighting
– fuzzing
– changes in geography or industry for individual establishments over time
Intertemporal Identity
• Employment at end of period t equals employment at beginning of period t+1
EmpEnd_t = Emp_{t+1}
– When this may not hold:
• Industry or geography changes on the establishment record between quarters
• Weighting adjustments change every quarter
• Fuzz factor changes (successor-predecessor only)
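A minimal sketch of how a user might check this identity in a downloaded series. The record layout (a list of dicts with `quarter`, `Emp`, and `EmpEnd` keys), the function name `intertemporal_gaps`, and all numbers are invented for illustration; suppressed values are represented here as `None`.

```python
def intertemporal_gaps(series):
    """Return {quarter: EmpEnd_t - Emp_{t+1}} for consecutive quarters.

    `series` must be sorted by quarter. Pairs with a missing (suppressed)
    value are skipped rather than treated as zero.
    """
    gaps = {}
    for current, following in zip(series, series[1:]):
        if current["EmpEnd"] is None or following["Emp"] is None:
            continue
        gaps[current["quarter"]] = current["EmpEnd"] - following["Emp"]
    return gaps


# Invented numbers for one county-by-industry cell.
cell = [
    {"quarter": "2010Q1", "Emp": 1200, "EmpEnd": 1235},
    {"quarter": "2010Q2", "Emp": 1235, "EmpEnd": 1260},
    {"quarter": "2010Q3", "Emp": 1248, "EmpEnd": None},
    {"quarter": "2010Q4", "Emp": 1251, "EmpEnd": 1270},
]

gaps = intertemporal_gaps(cell)
print(gaps)
```

A nonzero gap, such as the one between 2010Q2 and 2010Q3 in this invented series, is the kind of break that the listed causes (industry or geography changes, weighting, fuzz factor changes) can produce.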
Evolution of End-of-period employment
• Beginning and end-of-period employment are tied by accessions and separations
EmpEnd_t = Emp_t + HirA_t - Sep_t
– When this may not hold:
• This holds almost exactly in the public release files
– Some minor differences may arise due to rounding or precision in the calculation
– Selected measures may be suppressed on individual records
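The two caveats on this slide (rounding and suppression) suggest checking the identity with a tolerance and treating missing values separately. The sketch below assumes accessions are stored under the name `HirA`, that suppressed values appear as `None`, and that a tolerance of 1 is reasonable; the function name `check_identity` and the records are invented.

```python
def check_identity(record, tolerance=1.0):
    """Classify one record against EmpEnd = Emp + HirA - Sep.

    Returns 'suppressed' if any needed measure is missing, 'holds' if the
    identity is satisfied within `tolerance`, and 'differs' otherwise.
    """
    needed = ("Emp", "EmpEnd", "HirA", "Sep")
    if any(record.get(name) is None for name in needed):
        return "suppressed"
    implied_end = record["Emp"] + record["HirA"] - record["Sep"]
    if abs(record["EmpEnd"] - implied_end) <= tolerance:
        return "holds"
    return "differs"


# Invented records.
records = [
    {"Emp": 500, "EmpEnd": 520, "HirA": 60, "Sep": 40},   # exact
    {"Emp": 500, "EmpEnd": 521, "HirA": 60, "Sep": 40},   # off by rounding
    {"Emp": 12, "EmpEnd": None, "HirA": 3, "Sep": 2},     # suppressed measure
    {"Emp": 500, "EmpEnd": 540, "HirA": 60, "Sep": 40},   # real discrepancy
]

results = [check_identity(r) for r in records]
print(results)
```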
Job Flow Identity
• Net job flows equal job creation minus job destruction
FrmJbC_t = FrmJbGn_t - FrmJbLs_t
– When this may not hold:
• This holds almost exactly in the public release files
– Some minor differences may arise due to rounding or precision in the calculation
– Selected measures may be suppressed on individual records
Creation-Destruction Identity
• The difference between beginning and end-of-period employment equals the net of creation and destruction
EmpEnd_t = Emp_t + FrmJbGn_t - FrmJbLs_t
– When this may not hold:
• Alternate fuzzing is applied to firm measures, based on average fuzz factors for Emp and EmpEnd
• http://lehd.did.census.gov/led/library/techpapers/tp-2006-02.pdf
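To see why this identity holds before any fuzzing, the sketch below aggregates invented, unfuzzed establishment records. It assumes the usual definitions, which the slide does not spell out: job creation sums employment gains at expanding establishments and job destruction sums losses at contracting ones. The function name `job_flows` and the data are hypothetical.

```python
def job_flows(establishments):
    """Aggregate unfuzzed establishment records into cell-level measures.

    Each record is an (Emp, EmpEnd) pair: beginning- and end-of-quarter
    employment for one establishment.
    """
    return {
        "Emp": sum(begin for begin, _ in establishments),
        "EmpEnd": sum(end for _, end in establishments),
        # Gains at expanding establishments only.
        "FrmJbGn": sum(max(end - begin, 0) for begin, end in establishments),
        # Losses at contracting establishments only.
        "FrmJbLs": sum(max(begin - end, 0) for begin, end in establishments),
        # Net change, computed independently of the two measures above.
        "FrmJbC": sum(end - begin for begin, end in establishments),
    }


# Invented cell: one grower, one shrinker, one unchanged, one new entrant.
cell = [(100, 110), (40, 25), (8, 8), (0, 12)]
flows = job_flows(cell)
print(flows)
```

With exact establishment data both the job flow identity and the creation-destruction identity hold by construction. In the public files the firm measures receive alternate fuzzing, so the published values can deviate from this exact relationship.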
New Hires/Recalls Identity
• Accessions are the sum of new hires and recalls
HirA_t = HirN_t + HirR_t
– When this may not hold:
• This holds almost exactly in the public release files
– Some minor differences may arise due to rounding or precision in the calculation
– Selected measures may be suppressed on individual records
Understanding Differences between QWI, OnTheMap and other data sources
• Users are often confused when different data sources provide different answers
– For QWI, users want to understand differences from QCEW and JOLTS
– For OnTheMap, users want to understand differences between LODES and Journey to Work
Understanding QCEW-QWI Differences
• While state employment totals should be quite close, sub-state estimates will display deviations
• These differences have multiple sources
– Source data, employment concepts, geography edits, and other edits and imputations differ across the agencies
• But they chiefly arise because:
– to provide worker demographics, QWI aggregates from individual UI records rather than firm employment
– Census does not receive a QCEW file that includes final edits
Causes of Differences: Measure Definition
• B (QWI beginning-of-quarter employment) and Mon1 (QCEW first-month employment) do not capture exactly the same universe
– An individual may count towards one of the measures but not the other
• Differences are generally minor, but may be noticeable in some industries with particular seasonal patterns
– e.g., education, agriculture
Causes of Differences: BLS Data Editing
• LEHD data receipts
– Before 2004, LEHD received BLS-edited data
– Since 2004, LEHD has not received BLS-edited data (CIPSEA)
• BLS QCEW file may be edited/different from that which LEHD receives
– Completeness
– Imputed employment
– Industry/geography changes
• Statewide totals are close (<1% off)
• LEHD QA will periodically note BLS QCEW data inconsistent with internal LEHD QCEW micro-data
Causes of Differences: UI Wage Data Reporting
• Firm may fail to report wage records
– QCEW still reported or imputed
• Firm may report wage records and QCEW records on different account numbers
– Successor/predecessor mistiming
– Public sector issues
• PIK (SSN) miscoding prevents linking wage records to the same longitudinal job
Causes of Differences: Industry Assignment
• Most establishments are assigned based on the reported NAICS_AUX
• For earlier years in the data series, the reported SIC code is probabilistically mapped to the current NAICS codes
– Imputation may also be used for transitions between 1997, 2002, and 2007 NAICS
• LDB data are used for NAICS back-coding purposes when the file has been provided by the state
• Variations in algorithms between LEHD and BLS may result in differences
– NAICS sector 55 (management of companies) displays particular issues during the SIC-NAICS transition
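As a rough illustration of what "probabilistically mapped" can mean, the sketch below draws a new code in proportion to crosswalk shares. The crosswalk, the placeholder codes (`SIC_A`, `NAICS_X`, and so on), the shares, and the function name `impute_naics` are all invented; the slides do not describe LEHD's actual imputation model.

```python
import random

# Hypothetical crosswalk: each old code maps to candidate new codes with
# shares that sum to 1. Codes and shares are placeholders, not real values.
CROSSWALK = {
    "SIC_A": [("NAICS_X", 0.7), ("NAICS_Y", 0.3)],
    "SIC_B": [("NAICS_Z", 1.0)],
}


def impute_naics(sic_code, rng):
    """Draw one new code for an old code, weighted by crosswalk share."""
    candidates = CROSSWALK[sic_code]
    codes = [code for code, _ in candidates]
    weights = [share for _, share in candidates]
    return rng.choices(codes, weights=weights, k=1)[0]


rng = random.Random(0)
draws = [impute_naics("SIC_A", rng) for _ in range(10_000)]
share_x = draws.count("NAICS_X") / len(draws)
print(f"share assigned to NAICS_X: {share_x:.3f}")
```

Two agencies applying different shares, or different random draws, to the same establishments would assign some of them to different industries, which is one way algorithm differences can show up in industry-level totals.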
Causes of Differences: Geographic Coding
• LEHD performs its own geo-coding of addresses
– Generates lat-long for distance measures, allows custom geography
• Address data are processed along with address data from other sources
• Results may differ from BLS assignments
– Marginal shift over county line
– Significant relocation
• Effort currently underway to reengineer LEHD geographic assignment to improve results
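One use of the lat-long coordinates mentioned on this slide is a distance measure. The sketch below computes great-circle distance with the haversine formula. The slides do not say which distance formula LEHD uses, so the formula choice, the function name `haversine_km`, and the coordinates are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat-long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Invented coordinates for a residence and a workplace.
home = (38.90, -77.04)
work = (38.85, -76.93)
print(f"{haversine_km(*home, *work):.1f} km")
```

A small shift in a geocoded point changes this distance only slightly, but if the shift crosses a county line the establishment's employment moves to a different county total.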
Differences between OnTheMap and Journey to Work
• OnTheMap uses LEHD data
– Administrative data on employment, wages, residence, and establishment locations
• Journey to Work uses ACS data
– User-reported place of work and place of residence, wages and employment
OnTheMap and Journey to Work: some reasons they may differ
• OnTheMap
– Establishment may not be the same as worksite (construction workers)
– Tax address may differ from residence (students)
• Journey to Work
– High nonresponse on place of work
– Commute distance is capped in JtoW
– Response bias in employment, wages
Overview: Summary
– The QWI are developed by incorporating data from a broad variety of sources
– Differences in data sources, construction, and imputation procedures may cause employment estimates that do not match other sources