Canadian Census E&I – Lessons Learned from 2006 with Plans for 2011
Work Session on Statistical Data Editing, Vienna, Austria, April 21-23, 2008
Mike Bankier, Statistics Canada, [email protected]
Jan 25, 2016
Outline of Talk
Changes Made for 2006 Census
Impact of adjusting occupancy status and imputation of total non-response households
Processing of demographic variables with an emphasis on age
Possible enhancements to E&I for 2011
Changes to 2006 Census
73% of dwellings mailed questionnaires
18% of dwellings responded by Internet
85% gave permission to link to tax form
Questionnaires captured using ICR
Non-Response Follow-Up (NRFU) done from centralized offices
Failed Edit Follow-Up (FEFU) done from call centres
2006 Census Changes
These new approaches reduced the field staff required by 46%
Because of widespread labour shortages in some regions, the collection period was extended from mid-July to the end of August (Census day May 15)
National NR rate 2.8% in 2006 vs 1.6% in 2001
Dwelling Classification Survey
Mistakes were made in the field when classifying dwellings as occupied or unoccupied
A sample of dwellings where no response was received was revisited to reassess occupancy status
DCS estimated that 17.4% of 934,564 dwellings classified as unoccupied were occupied, and 29.1% of 366,527 dwellings classified as occupied but with no response were actually unoccupied
Occupancy status for individual dwellings adjusted. Resulted in a 3.6% increase in the number of occupied dwellings and a 5.2% decrease in the number of unoccupied dwellings
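As a back-of-envelope check, the two DCS rates can be turned into approximate dwelling counts. This is only a sketch using the figures quoted on this slide; the variable names are illustrative, and it does not try to reproduce the 3.6% and 5.2% changes, which depend on totals not given here.

```python
# Figures quoted on the slide.
classified_unoccupied = 934_564
classified_occupied_no_response = 366_527
rate_unoccupied_actually_occupied = 0.174
rate_occupied_actually_unoccupied = 0.291

# Approximate number of dwellings misclassified in each direction.
to_occupied = round(classified_unoccupied * rate_unoccupied_actually_occupied)
to_unoccupied = round(
    classified_occupied_no_response * rate_occupied_actually_unoccupied
)

# Net movement into the occupied category implied by these two rates alone.
net_to_occupied = to_occupied - to_unoccupied

print(to_occupied, to_unoccupied, net_to_occupied)
```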
Imputation of Total NR Households
After the DCS adjustment, total non-response dwellings had all responses imputed by borrowing unimputed responses from another household
Using a single donor for total non-response was less likely to produce implausible results
Weighting was used in 2001 to convert unoccupied dwellings to occupied; it could transfer population from one city block to another and be noticed by users
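The single-donor idea can be sketched as follows. This is a minimal illustration, not the production method: the field names (id, seq, persons) are hypothetical, and the donor choice (closest responding household in collection order) is an assumption, since the slide does not say how the donor household was selected.

```python
def impute_total_nonresponse(households):
    """Fill each total non-response household from one responding donor.

    A household with persons=None is a total non-response. Only responding
    (unimputed) households are eligible as donors.
    """
    donors = [h for h in households if h["persons"] is not None]
    if not donors:
        raise ValueError("no responding household available as a donor")

    result = []
    for h in households:
        if h["persons"] is not None:
            result.append(dict(h, donor_id=None))
            continue
        # Assumed donor choice: nearest responding household in sequence.
        donor = min(donors, key=lambda d: abs(d["seq"] - h["seq"]))
        # Copy every person record from the same donor, so the imputed
        # household is internally consistent.
        persons = [dict(p) for p in donor["persons"]]
        result.append(dict(h, persons=persons, donor_id=donor["id"]))
    return result
```

Because every response comes from the same donor, the imputed household inherits a combination of ages and relationships that was actually observed.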
Demographic E&I
Demographic E&I does minimum change imputation for blanks and inconsistencies so later program can form Census families
All demographic variables for all persons in household are imputed simultaneously using CANCEIS
Three types of Census families
Couples without children
Couples with children
Lone Parents with children
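The idea of minimum change donor imputation can be illustrated with a toy sketch. This is not CANCEIS: it handles one person rather than a whole household, the single edit rule and the field names are hypothetical, and the donors are assumed to be supplied already ordered from most to least similar.

```python
from itertools import combinations


def passes_edits(rec):
    """Hypothetical edit: no blanks, and a married person must be 15 or older."""
    if rec["age"] is None or rec["marital"] is None:
        return False
    return not (rec["marital"] == "married" and rec["age"] < 15)


def minimum_change_impute(failed, donors, variables):
    """Replace as few fields as possible, all taken from a single donor.

    Tries changing 1 field, then 2, and so on, and returns the first
    candidate that passes the edits, with the fields that were replaced.
    """
    for k in range(1, len(variables) + 1):
        for donor in donors:  # assumed sorted nearest first
            for subset in combinations(variables, k):
                candidate = dict(failed)
                for v in subset:
                    candidate[v] = donor[v]
                if passes_edits(candidate):
                    return candidate, subset
    return None, ()
```

For a record with age 5 and marital status married, a donor aged 34 and married resolves the failure by changing age only; marital status is left as reported.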
Couple Editing Concepts
For a couple, they should be
both adults (age >= 15) and
both married or both common-law and
have appropriate relationships to Person 1
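The three couple conditions can be written as a small edit check. This is a sketch under assumptions: the field names and relationship codes are hypothetical, and the set of appropriate relationship pairs is passed in by the caller because the slide does not list them.

```python
def couple_edit_failures(a, b, allowed_pairs):
    """Return the couple edits failed by persons a and b (empty list if none).

    allowed_pairs is a set of frozensets of relationship-to-Person-1 codes
    that are considered appropriate for a couple.
    """
    failures = []
    if a["age"] < 15 or b["age"] < 15:
        failures.append("not both adults")
    same_status = a["marital"] == b["marital"]
    if not (same_status and a["marital"] in {"married", "common-law"}):
        failures.append("not both married or both common-law")
    if frozenset((a["rel"], b["rel"])) not in allowed_pairs:
        failures.append("inappropriate relationships to Person 1")
    return failures


# Hypothetical example of one allowed pairing.
ALLOWED = {frozenset({"person1", "spouse_of_person1"})}
```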
Child/Parent Editing Concepts
For a child/parent pair
At least one parent must be 15 or more years older than the child and
A female parent must not be more than 50 years older than a child and
The relationships to Person 1 should be appropriate
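The two age conditions for a child and its parents can be sketched as below. The field names are hypothetical, and the relationship-to-Person-1 condition is left out because the slide does not spell out which relationships are appropriate.

```python
def child_parent_edit_failures(child_age, parents):
    """Return the age edits failed by a child and its parent(s).

    parents is a list of dicts with 'age' and 'sex' ('F' or 'M').
    """
    failures = []
    # At least one parent must be 15 or more years older than the child.
    if not any(p["age"] - child_age >= 15 for p in parents):
        failures.append("no parent at least 15 years older")
    # A female parent must not be more than 50 years older than the child.
    if any(p["sex"] == "F" and p["age"] - child_age > 50 for p in parents):
        failures.append("female parent more than 50 years older")
    return failures
```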
0.85% In Wrong 5 Year Age Range - Data Capture Error
[Figure: counts by 5-year age range, from 0 - 4 to 110 +, comparing Captured Value with Correct Value]
Analysis of Imputation of Age
AGEU and AGE represent respectively the age of the person before and after minimum change donor imputation
99.11% had AGEU = AGE
0.61% had AGEU = Blank/Invalid
0.28% had AGEU ≠ AGE because of an inconsistency between AGEU and another variable
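The three categories above can be tabulated from pairs of (AGEU, AGE) values. This is a sketch, assuming blank or invalid AGEU values have already been set to None upstream; the function and category names are illustrative.

```python
from collections import Counter


def age_outcome(ageu, age):
    """Classify one person by what imputation did to the age."""
    if ageu is None:
        return "blank_or_invalid"
    if ageu == age:
        return "unchanged"
    # A valid AGEU that differs from AGE was changed to resolve an
    # inconsistency with another variable.
    return "changed_for_consistency"


def outcome_shares(pairs):
    """Percentage of persons in each category, from (AGEU, AGE) pairs."""
    counts = Counter(age_outcome(u, a) for u, a in pairs)
    total = sum(counts.values())
    return {k: 100 * v / total for k, v in counts.items()}
```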
AGE Imputation for WIFE
Female Lone Parent vs Child Ages Before Imputation
Female Lone Parent vs Child Ages After Imputation
WIFE vs Child Ages Before Imputation
WIFE vs Child Ages After Imputation
Number of Children by Age Difference With Mother
[Figure: number of children (axis ticks from 1 to 100,000 in powers of ten) by Age Difference (41 to 60), comparing Vital Statistics, Census Before Imputation and Census After Imputation]
2011 Changes – Small Domains
Small domains (e.g. centenarians, same-sex married couples) can have an upward bias because of response or data capture errors for persons outside the small domain
Sometimes there is no alternate source of data to verify the small domain count, and the domain is too large to be manually reviewed 100%
2011 Changes – Small Domains
Manually review 20% sample of persons age 95+ to determine those with incorrect age
For other 80% of persons age 95+, use nearest neighbour imputation to determine those with incorrect age
Then in 2nd step, blank out incorrect ages and impute
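The two steps above might look like the following sketch. Everything specific here is an assumption: the matching variables, the mismatch-count distance, and the field names are hypothetical, and the manual review result is represented as an age_incorrect flag on the reviewed sample.

```python
def mismatch_count(a, b, keys=("marital", "has_parent_in_household")):
    """Hypothetical distance: number of matching variables that differ."""
    return sum(a[k] != b[k] for k in keys)


def flag_by_nearest_neighbour(reviewed, unreviewed, distance=mismatch_count):
    """Step 1: give each unreviewed person the flag of the nearest reviewed one."""
    flagged = []
    for rec in unreviewed:
        nearest = min(reviewed, key=lambda r: distance(r, rec))
        flagged.append(dict(rec, age_incorrect=nearest["age_incorrect"]))
    return flagged


def blank_incorrect_ages(records):
    """Step 2: blank the ages flagged as incorrect so they can be imputed."""
    return [dict(r, age=None) if r["age_incorrect"] else dict(r) for r in records]
```

The blanked ages would then go through the usual donor imputation along with the other blank and inconsistent responses.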
2011 Changes – Use Failed Records as Donors
Sometimes stratum failure rate is so high that number of donors is insufficient
Failed records could be used as donors since frequently failed record is missing just one or two responses and would be suitable for imputing other responses
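One way to picture this is a donor pool built per recipient, based on which responses the recipient needs. This is a sketch under a simplifying assumption: a failed record is treated as usable when the needed variables are present (not None), which covers missing responses but not inconsistencies. The function and field names are hypothetical.

```python
def donors_for(variables_needed, records):
    """Records, passed or failed, with a usable value for every needed variable."""
    return [
        r for r in records
        if all(r.get(v) is not None for v in variables_needed)
    ]
```

A failed record that is missing only marital status would still be eligible to donate age or sex, which enlarges the donor pool in strata with high failure rates.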
2011 Changes – More Minimum Change Donor Imputation
Will do more minimum change donor imputation and less deterministic imputation where possible
Will combine modules so more variables are imputed simultaneously where possible
Concluding Remarks
Sophisticated E&I programs can do a better job detecting and resolving edit failures
With this comes the responsibility to make few assumptions regarding the characteristics of the non-respondents or those giving inconsistent responses
The impact of imputation should be made clear to users
E&I should not be viewed as a panacea such that data quality standards can be lowered