Updated Unified Category System for 1960-2000 Census Occupations

Page 1: Updated Unified Category System for 1960-2000 Census Occupations

Updated Unified Category System for 1960-2000 Census Occupations

Peter B. Meyer
US Bureau of Labor Statistics
(but none of this represents official measurement or policy)

SSHA 2006, Minneapolis; Nov 4, 2006

Outline
1. Tentative standard categories
2. Users and bug fixes
3. How Census assigns occupation codes
4. Imputation practice

Page 2: Updated Unified Category System for 1960-2000 Census Occupations

Census Occupational Classifications

• The U.S. Bureau of the Census determines a list of 3-digit occupation codes every ten years
• It then assigns one to each employed respondent to the decennial Census and some other surveys
• Vast data are available in these categories: CPS, ATUS, SIPP, NLS, ACS, decennial Census
• But the categories are not always consistent over long time spans
• Research efforts may require some standard

Page 3: Updated Unified Category System for 1960-2000 Census Occupations

Tradeoffs in Classification Systems

• Precise job distinctions vs. consistency, duration, and sample size
  • High tech occupations vs. other technical occupations
    • blacksmith, database admin (shorter, more precise series)
    • electrical engineer (longer evolving series)
  • “Superstars” jobs like athletes and musicians (need precision)
  • Licensed jobs (need long comparable occupations)
• Conformity with other data
• Avoid “sparseness”: many missing year-occ cells
• Meaning of occupation: function, tasks, skills, background, social class

There is no perfect classification but there are tools & criteria for better ones.

Page 4: Updated Unified Category System for 1960-2000 Census Occupations

Baselines to improve on

• IPUMS defined occ1950 for US workers recorded in ANY Census
• Working paper (Meyer and Osborne, 2005) defined a classification of 389 3-digit occupation codes from 1960 to present
• It was adapted from the 500+ categories in 1990 Census:
  • 379 categories have same name or almost same as 1990
  • 125 were eliminated to help harmonize with other years (Example to follow)
  • 19 categories have expanded (changed name or a not-elsewhere-classified category was given more scope)
  • 3 categories added for 1960 data which doesn’t fit in

Page 5: Updated Unified Category System for 1960-2000 Census Occupations

Some distinctions are lost in standardization

1970 code 284: Sales workers, except clerks, retail trade

1980/1990 code  1980/90 component category title                         Civilian labor force  % of 1970 category
263             Sales workers, motor vehicles and boats                               185,160              37.06%
266             Sales workers, furniture and home furnishings                          98,941              19.80%
267             Sales workers; radio, television, hi fi, and appliances                76,674              15.35%
268             Sales workers, hardware and building supplies                          81,668              16.35%
269             Sales workers, parts                                                   39,120               7.83%
274             Sales workers, other commodities                                       16,008               3.20%
277             Street and door to door sales workers                                   2,082               0.42%

Census reports and IPUMS data show how many respondents would be coded in each of two classification systems.
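
As a rough illustration of how a dual-coded tabulation turns into crosswalk shares, the Python sketch below recomputes each 1980/90 category's share of 1970 code 284 from the labor force counts on this slide. The variable and function names are assumptions made for the example. Shares recomputed from these counts can differ from the slide's percentages in the last digit.

```python
# Civilian labor force counts for 1970 code 284, keyed by 1980/1990 code
# (numbers copied from the slide; names here are illustrative only).
counts_1970_code_284 = {
    263: 185160,  # Sales workers, motor vehicles and boats
    266: 98941,   # Sales workers, furniture and home furnishings
    267: 76674,   # Sales workers; radio, television, hi fi, and appliances
    268: 81668,   # Sales workers, hardware and building supplies
    269: 39120,   # Sales workers, parts
    274: 16008,   # Sales workers, other commodities
    277: 2082,    # Street and door to door sales workers
}


def crosswalk_shares(counts):
    """Return each component category's fraction of the parent category."""
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


shares = crosswalk_shares(counts_1970_code_284)
for code, share in sorted(shares.items()):
    print(f"{code}: {100 * share:.2f}%")
```

Going the other way, from one 1970 code to seven later codes, needs more than these shares, which is why the finer distinctions are lost when the standard keeps only the broader category.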

Page 6: Updated Unified Category System for 1960-2000 Census Occupations

User input and new data since 2005

• Sent these programs to 19 people who expressed interest
  • Open-source code idea (helps find errors; also is public property)
• Corrections from users did come in
  • Philip Cohen, UNC Sociology, identified some problems/mistakes.
  • Sarah Porter, research assistant at U of Iowa working with Jennifer Glass, wrote a program to do some similar mappings. Comparing to that program I found mistakes in mine.
• Dual-coded 1990/2000 data sets highlighted some surprises
• Experimented with imputations (example to follow)
• Visited the Census office where they assign these codes.

Page 7: Updated Unified Category System for 1960-2000 Census Occupations

Census Bureau's National Processing Center in Jeffersonville, IN

Louisville, KY, is just south of it

I interviewed four specialists who assign occupation & industry codes.

Page 8: Updated Unified Category System for 1960-2000 Census Occupations

Information used when coding

• “what kind of work”
• “most important activities or duties”
• employer name
• “what kind of industry”
• city and state ("PSU") of respondent's home
• industry type (manufacturing, service, other)
• years of education, age, sex
• not income, although it was available before the Jan '94 software.

• Tens of thousands of job titles are mapped to a code in a reference book they have, if industry also matches.
• Some cases may be "autocoded" by software and then checked by a coder
• After coding, public use samples have a 3-digit occupation code and a 3-digit industry code
• Quality of assignments from public use samples is limited
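
The title-plus-industry lookup described above can be pictured as a keyed table. The Python sketch below is only a toy under stated assumptions: the job titles, industry strings, and function names are hypothetical, and only the two codes (263 and 267) are taken from the 1980/90 categories shown earlier in this talk. It is not the Census Bureau's reference book or software.

```python
# Hypothetical (job title, industry) -> occupation code entries.
# Only the codes 263 and 267 come from the deck; the titles and
# industries are invented for illustration.
TITLE_INDEX = {
    ("car salesman", "motor vehicle dealers"): 263,
    ("appliance salesman", "appliance stores"): 267,
}


def normalize(text):
    """Lower-case and collapse whitespace so minor formatting differences match."""
    return " ".join(text.lower().split())


def autocode(title, industry):
    """Return a code only if both title and industry match; otherwise None."""
    return TITLE_INDEX.get((normalize(title), normalize(industry)))


print(autocode("Car  Salesman", "Motor Vehicle Dealers"))  # finds an entry
print(autocode("computer work", ""))                       # no entry
```

A `None` result stands in for a case that needs a human decision, since a vague answer such as "computer work" matches no entry.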

Page 9: Updated Unified Category System for 1960-2000 Census Occupations

Imputation: Statisticians and Actuaries

Counts of Actuaries and Statisticians in Census Sample

               1960  1970  1980  1990
Actuaries         .    45   129   182
Statisticians   199   237   352   338

• These were separate categories in and after 1970
• But in 1960 they were all in “statisticians and actuaries”
• When standardizing (2005) they were put in “statisticians”
• Will try to infer which of the 1960 people were actuaries.

Page 10: Updated Unified Category System for 1960-2000 Census Occupations

Statisticians and Actuaries

• Pooled all 1970-1990 statisticians and actuaries
• Good predictors of whether respondent is an actuary:
  • Recorded in a later year
  • Employed in insurance, accounting/auditing, or professional services industries
  • Employed in private sector
  • High salary income
  • High business income, or earning mostly business income
  • Is employed
  • Lives in Connecticut, Minnesota, Nebraska, or Wisconsin
• Ran many logistic regressions predicting the actuaries

Page 11: Updated Unified Category System for 1960-2000 Census Occupations

Statisticians and Actuaries

For 1970 data that logistic regression predicts occupation right 88% of the time

Revised counts of actuaries and statisticians after imputation

               1960  1970  1980  1990
Actuaries        29    45   129   182
Statisticians   170   237   352   338
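
A quick arithmetic check on the revised table, sketched in Python: reading the 1960 entries as 29 actuaries and 170 statisticians, the two sum to the 199 respondents originally recorded in the combined 1960 category, so the imputation only moves people between the two codes. The same sketch computes the actuary share in each year; the dictionary layout and function name are assumptions for the example.

```python
# Counts after imputation, copied from the slide (1960 values are imputed).
revised = {
    1960: {"actuaries": 29, "statisticians": 170},
    1970: {"actuaries": 45, "statisticians": 237},
    1980: {"actuaries": 129, "statisticians": 352},
    1990: {"actuaries": 182, "statisticians": 338},
}

# All 199 respondents in 1960 were originally coded "statisticians".
ORIGINAL_1960_TOTAL = 199


def actuary_share(cell):
    """Fraction of the combined group that is coded as actuaries."""
    return cell["actuaries"] / (cell["actuaries"] + cell["statisticians"])


total_1960 = sum(revised[1960].values())
print("1960 total preserved:", total_1960 == ORIGINAL_1960_TOTAL)
for year in sorted(revised):
    print(year, f"{100 * actuary_share(revised[year]):.1f}% actuaries")
```

The imputed 1960 share comes out below the 1970 share, in line with "recorded in a later year" being a predictor of being an actuary.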

Page 12: Updated Unified Category System for 1960-2000 Census Occupations

Statisticians and Actuaries

[Figure: two panels, “Mean salaries before reassignment” and “Mean salaries after reassignment”, each plotting mean salary by year (1960-1990) for Statisticians and Actuaries]

Why work this arcane problem?

• More accurate standardized “statistician” category
• Longer actuary time series
• Reduces sparseness (empty cells)
• Builds a technique for this data mining
• Benefits scale up through IPUMS

Page 13: Updated Unified Category System for 1960-2000 Census Occupations

Imputing judges

• In 1960 Census, lawyers and judges were one category
• Later, they’re separate, and separate in “standard” system
• Without more info, we categorize all in 1960 as “lawyers”.
• We wish to impute which ones are judges
• Useful fact: private sector ones were all called lawyers
• Predictors for the public sector ones, of who’s a judge:
  • Older
  • Employed in state government
  • High salary income
  • Low business income
  • Educated less than 16 years
  • Employed at time of survey

Page 14: Updated Unified Category System for 1960-2000 Census Occupations

Logit regression predicting judges in 1970-90 Census

Variable                                            Coefficient  Std error  p-value
Year                                                     -0.005      0.011    0.633
Age                                                       0.155      0.033    0.000
Age-squared                                              -0.001      0.000    0.040
Federal government employee                              -1.440      0.137    0.000
State government employee                                 0.499      0.263    0.058
Ln(salary)                                               -1.795      3.094    0.562
Ln(salary) squared                                        0.052      0.333    0.877
Ln(salary) cubed                                          0.003      0.012    0.798
Ln(business income)                                      -0.041      0.036    0.261
Fraction of earned income that is business income        -0.714      1.053    0.498
Education less than 16 years                              2.235      0.320    0.000
Years of formal education                                -0.044      0.046    0.336
Is employed at time of survey                             0.224      0.241    0.352
Constant                                                 13.017     23.428    0.578

Dependent variable: maximum likelihood probability this individual is a judge.

Page 15: Updated Unified Category System for 1960-2000 Census Occupations

Thus we assign judge occupation code

gen logitindex = -.0046652 * year
    +.1549193 * age
    -.0006942 * age * age
    -1.4405086 * indfed
    +.4986729 * indstate
    -1.795481 * lnwage
    +.0517015 * lnwage * lnwage
    +.0030016 * lnwage * lnwage * lnwage
    -.040749 * lnbus
    -.7140285 * busfrac
    +2.234934 * (educyrs<16)
    -.0442429 * educyrs
    +.2239105 * employed
    +13.0172 /* constant */ ;
…

gen logitval=exp(logitindex)/(1.0+exp(logitindex))
replace logitval=.0001 if !govtemployee /* this is a perfect predictor */
replace logitval=.0001 if !indfed & !indstate & !indlocal /* this too */
gen assigned = logitval>.46 /* Now ‘assigned’ has a 1 for imputed judges */
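
For readers who do not use Stata, the same scoring rule can be sketched in Python. The coefficients, variable names, overrides, and the .46 cutoff are copied from the Stata lines above; the dictionary layout, the function names, and the example respondent are assumptions made for illustration, including the choice to set `lnbus` to 0 for someone with no business income.

```python
import math

# Coefficients copied from the Stata expression on this slide.
COEF = {
    "year": -0.0046652, "age": 0.1549193, "age_sq": -0.0006942,
    "indfed": -1.4405086, "indstate": 0.4986729,
    "lnwage": -1.795481, "lnwage_sq": 0.0517015, "lnwage_cu": 0.0030016,
    "lnbus": -0.040749, "busfrac": -0.7140285,
    "educ_lt16": 2.234934, "educyrs": -0.0442429, "employed": 0.2239105,
}
CONSTANT = 13.0172
THRESHOLD = 0.46  # cutoff used on the slide


def judge_probability(r):
    """Predicted probability that respondent r (a dict) is a judge."""
    w = r["lnwage"]
    index = (COEF["year"] * r["year"] + COEF["age"] * r["age"]
             + COEF["age_sq"] * r["age"] ** 2
             + COEF["indfed"] * r["indfed"] + COEF["indstate"] * r["indstate"]
             + COEF["lnwage"] * w + COEF["lnwage_sq"] * w ** 2
             + COEF["lnwage_cu"] * w ** 3
             + COEF["lnbus"] * r["lnbus"] + COEF["busfrac"] * r["busfrac"]
             + COEF["educ_lt16"] * (r["educyrs"] < 16)
             + COEF["educyrs"] * r["educyrs"]
             + COEF["employed"] * r["employed"] + CONSTANT)
    p = math.exp(index) / (1.0 + math.exp(index))
    # Overrides from the slide: non-government respondents are never judges.
    if not r["govtemployee"]:
        p = 0.0001
    if not (r["indfed"] or r["indstate"] or r["indlocal"]):
        p = 0.0001
    return p


def assign_judge(r):
    """True if the respondent is imputed to be a judge."""
    return judge_probability(r) > THRESHOLD


# Hypothetical 1960 respondent: 60-year-old state employee, salary 15,000.
example = {"year": 1960, "age": 60, "indfed": 0, "indstate": 1, "indlocal": 0,
           "lnwage": math.log(15000), "lnbus": 0.0, "busfrac": 0.0,
           "educyrs": 19, "employed": 1, "govtemployee": True}
print(judge_probability(example), assign_judge(example))
```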

Threshold probability is chosen to match the number of judges expected to be there, based on annual trend.

Can get 83% accurate predictions from such a rule on 1970 data.

This mis-assigns a few who should have stayed lawyers.
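
One way to pick a cutoff that yields a target number of judges is to rank the predicted probabilities and place the threshold between the k-th and (k+1)-th highest. The Python sketch below shows that idea under stated assumptions: the function name and the probabilities are hypothetical, and the talk does not say the threshold was computed exactly this way.

```python
def choose_threshold(probs, expected_count):
    """Return a cutoff t such that about expected_count values satisfy p > t.

    Assumes probabilities are strictly between 0 and 1. If several values
    tie at the boundary, fewer than expected_count will exceed the cutoff.
    """
    ranked = sorted(probs, reverse=True)
    if expected_count <= 0:
        return 1.0  # nothing exceeds 1.0, so no one is assigned
    if expected_count >= len(ranked):
        return 0.0  # every positive probability exceeds 0.0
    # Midpoint between the last included and the first excluded value.
    return (ranked[expected_count - 1] + ranked[expected_count]) / 2


# Hypothetical predicted probabilities for six respondents.
predicted = [0.91, 0.12, 0.55, 0.38, 0.07, 0.62]
cutoff = choose_threshold(predicted, expected_count=2)
assigned = [p > cutoff for p in predicted]
print(cutoff, sum(assigned))
```

With a fixed target count, the errors are symmetric in number: each lawyer wrongly assigned as a judge implies one judge left among the lawyers.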

Page 16: Updated Unified Category System for 1960-2000 Census Occupations

Newly Imputed Judges

          1960  1970  1980  1990
Lawyers   1971  2570  5082  7603
Judges      82   123   298   331

Respondents in Census samples after imputation

Page 17: Updated Unified Category System for 1960-2000 Census Occupations

What's next?

• Use dual-coded CPS datasets with 1990 and 2000 codes to make a few more imputations
• Keep listening, seek more help, make it better.
• Publish variable at IPUMS.org
• Keep going? 1970 & 1980 dual coded data sets exist.

Page 18: Updated Unified Category System for 1960-2000 Census Occupations

Industry and occupation coding

• Industry codes and occupation codes are assigned by the same group of people, at the same time for each respondent.
• Industry is almost always decided first.
• The people who do that are “coders”
• Procedures are carefully documented
• I wasn’t a “sworn” Census agent and couldn’t see it done, live

Page 19: Updated Unified Category System for 1960-2000 Census Occupations

Desirable Attributes of a Classification

• For each occupation, well-behaved time-series of:
  • mean wage
  • wage variance
  • fraction of the population
• New criterion: SPARSENESS
  • One prefers a classification not be sparse, meaning it does not have many empty occ-year cells
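
Sparseness can be made concrete as the fraction of occupation-year cells with no respondents. The Python sketch below is one possible measure, not a definition from the talk; it uses the statistician and actuary counts from earlier slides, reading the imputed 1960 values as 29 and 170, and the function and variable names are assumptions.

```python
def sparseness(counts, occupations, years):
    """Fraction of (occupation, year) cells that are missing or zero."""
    empty = sum(1 for occ in occupations for yr in years
                if not counts.get((occ, yr)))
    return empty / (len(occupations) * len(years))


occupations = ["actuaries", "statisticians"]
years = [1960, 1970, 1980, 1990]

# Before imputation: no separate actuary cell exists for 1960.
before = {
    ("statisticians", 1960): 199, ("statisticians", 1970): 237,
    ("statisticians", 1980): 352, ("statisticians", 1990): 338,
    ("actuaries", 1970): 45, ("actuaries", 1980): 129,
    ("actuaries", 1990): 182,
}

# After imputation: 29 of the 199 are reassigned to actuaries.
after = dict(before)
after[("statisticians", 1960)] = 170
after[("actuaries", 1960)] = 29

print(sparseness(before, occupations, years))
print(sparseness(after, occupations, years))
```

In this small example the imputation fills the only empty cell, which is the sense in which it reduces sparseness.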

Page 20: Updated Unified Category System for 1960-2000 Census Occupations

What new information would help referralists?

• Information about a job title
• Information about employer's city and state, not respondent’s
• But asking more questions would extend the CPS interview

Page 21: Updated Unified Category System for 1960-2000 Census Occupations

Problems faced by referralists

• Too little information from respondent
  • “Computer work” for “kind of work”
• Exaggeration (example: dot com businesses)
• Ambiguities:
  • "water company" for industry or employer
  • "surveyor" occupation
  • "boot" vs "boat" in handwriting
• Having to hurry
• Referralists confer with each other routinely, but sometimes make different choices from one another
• Does technological change go along with occupational ambiguity? YES.
  • Problems with computer work, biotech.
  • Still no nanotech in classification.

Page 22: Updated Unified Category System for 1960-2000 Census Occupations

The information coders have

Page 23: Updated Unified Category System for 1960-2000 Census Occupations

Who's Doing the Coding

• There were about 12 coders and 14 referralists in October 2006
• Referralists have been coders before and usually have 9+ years of experience
• I interviewed three referralists, and a supervisor
• The ones I met handled referrals from several surveys: CPS, ATUS, SIPP, NLS, ACS, others on contract
  • All these use decennial Census occupation codes
• They DON’T handle the decennial Censuses

Page 24: Updated Unified Category System for 1960-2000 Census Occupations

Information available to referralist

• Can match employer name to a known employer from their Employer Name List (ENL), same as SSEL or Business Registry.
• Can look on the web for that employer
• Can study the “little red book” (SOC manual) or, less often, the giant Dictionary of Occupational Titles (1991)
• or, I’m told, look up the employer in Dun and Bradstreet data
• They try to make a coherent choice for industry and occupation together.

Page 25: Updated Unified Category System for 1960-2000 Census Occupations

“Coders” and “Referralists”

• Coders follow carefully documented procedures from the Census headquarters in Suitland, MD
• Coders with two years of experience are expected to assign 94 codes an hour, with 95% accuracy (which is checked)
• If there is not enough information to assign industry and occupation codes by procedure, the case is forwarded electronically ("referred") to a “Referralist” (aka statistical assistant)