Statistics Canada’s Small Area Estimation Product: BUPF 1.0 (Best Unbiased Prediction via Filtering)
SAE-SPORD Project Team
Statistics Research and Innovation Division
Statistics Canada, Ottawa
(for presentation to FLMM_LMIWG Workshop on Oct 17, 2007, Vancouver, BC)
Project: SAE-SPORD (Small Area Estimation for Statistical Product Oriented R&D)
Team: Avi Singh (Project Leader)
François Verret
Claude Nadeau
Pin Yuan
Acknowledgments:
Meth Res Block Fund, Labour Stat Div, FLMM-LMIWG
Outline
1. SAE: Introduction
2. SAE: Visual Depiction
3. Product BUPF: Description
4. BUPF Application to Labour Force Survey
5. BUPF Demonstration (GUI Sample Screen-shots)
6. Concluding Remarks and Future Work
1. SAE: Introduction
Direct estimates for small areas (or domains) are not reliable; e.g., annual provincial LFS estimates of Managers in Manufacturing and Utilities (three-digit occupation code A39) are not reliable. Here, provinces can be regarded as small areas.
Data requirements: provincial estimates of employment by three-digit occupation code.
Table 1 Monthly Total Employed (A39) (Annual Average for 2003 LFS)
Prov.   Population Size  Sample Size  Direct Estimate     SE  CV (%)
NL              429,298        3,978              670    177    26.4
PEI             109,886        2,769              233     55    23.5
NS              758,549        5,858            1,532    292    19.0
NB              607,565        5,624            1,275    218    17.1
PQ            6,059,655       18,234           25,273  2,204     8.7
ON            9,766,566       30,373           42,447  3,178     7.5
MA              876,396        7,117            3,023    432    14.3
SK              744,431        7,295            1,963    339    17.3
AB            2,467,412       10,317            7,643  1,098    14.4
BC            3,346,181        9,636            8,676  1,228    14.2
Canada       25,165,939      101,201           92,734  4,260     4.6
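As a quick check on Table 1, the CV column is the standard error divided by the direct estimate. The Python sketch below recomputes it from the published figures; the dictionary and function names are illustrative and are not part of BUPF. Because the published estimates and SEs are rounded, the recomputed CVs agree with the table only to within about 0.1 percentage point (e.g., PEI gives 23.6 rather than 23.5).

```python
# Coefficient of variation (CV) of a direct estimate, in percent: 100 * SE / estimate.
# (direct estimate, standard error) pairs copied from Table 1 (A39, 2003 LFS).
TABLE_1 = {
    "NL": (670, 177),
    "PEI": (233, 55),
    "NS": (1_532, 292),
    "NB": (1_275, 218),
    "PQ": (25_273, 2_204),
    "ON": (42_447, 3_178),
    "MA": (3_023, 432),
    "SK": (1_963, 339),
    "AB": (7_643, 1_098),
    "BC": (8_676, 1_228),
    "Canada": (92_734, 4_260),
}


def cv_percent(estimate: float, se: float) -> float:
    """Return the coefficient of variation of an estimate, in percent."""
    return 100.0 * se / estimate


if __name__ == "__main__":
    for prov, (est, se) in TABLE_1.items():
        print(f"{prov:<7}{cv_percent(est, se):5.1f}")
```

The pattern in the output is the point of the slide: CVs above 15% for the smaller provinces versus under 5% nationally.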
1. SAE: Introduction …cont.
More sample is needed to get more reliable estimates.
A cost-effective alternative is to use a model such as the common mean model; e.g., assume that the proportion employed in A39 is the same across provinces.
The quality of the estimates depends on the validity of the model.
1. SAE: Introduction …cont.
Model provides an indirect (or synthetic) estimate at the area level.
For the common mean model, multiply the national total by the provincial population proportion to get the indirect estimate; e.g., for NL:
1.7% times 92,734 ≈ 1,582 (using the unrounded share 429,298 / 25,165,939)
Table 2 Direct and Indirect (under an oversimplified model) Estimates for A39 (Annual Average for 2003 LFS)
Prov.   Pop. Portion  Direct Sample Size  Direct Estimate  Indirect Estimate  Indirect Sample Size
NL              1.7%               3,978              670              1,582               101,201
PEI             0.4%               2,769              233                405               101,201
NS              3.0%               5,858            1,532              2,795               101,201
NB              2.4%               5,624            1,275              2,239               101,201
PQ             24.1%              18,234           25,273             22,329               101,201
ON             38.8%              30,373           42,447             35,989               101,201
MA              3.5%               7,117            3,023              3,229               101,201
SK              3.0%               7,295            1,963              2,743               101,201
AB              9.8%              10,317            7,643              9,092               101,201
BC             13.3%               9,636            8,676             12,330               101,201
Canada        100.0%             101,201           92,734             92,734               101,201
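Under the common mean model, the indirect estimate allocates the national direct total to each province in proportion to its population. The minimal Python sketch below (function and constant names are hypothetical) recomputes the Indirect Estimate column of Table 2 from the population sizes in Table 1. The exact population shares, not the rounded percentages, are needed to reproduce values such as 1,582 for NL.

```python
# Indirect (synthetic) estimate under the common mean model:
#   indirect_d = (N_d / N) * national direct total
# Population sizes N_d are those of Table 1; the national direct total is 92,734.
POPULATION = {
    "NL": 429_298, "PEI": 109_886, "NS": 758_549, "NB": 607_565,
    "PQ": 6_059_655, "ON": 9_766_566, "MA": 876_396, "SK": 744_431,
    "AB": 2_467_412, "BC": 3_346_181,
}
NATIONAL_DIRECT_TOTAL = 92_734


def indirect_estimates(population: dict[str, int], national_total: float) -> dict[str, float]:
    """Allocate the national total to areas in proportion to population."""
    total_pop = sum(population.values())
    return {area: national_total * n / total_pop for area, n in population.items()}


if __name__ == "__main__":
    total_pop = sum(POPULATION.values())
    for prov, est in indirect_estimates(POPULATION, NATIONAL_DIRECT_TOTAL).items():
        share = 100.0 * POPULATION[prov] / total_pop
        print(f"{prov:<4}{share:6.1f}%{est:10,.0f}")
```

By construction the indirect estimates add up to the national total, but they ignore any real provincial differences in the A39 proportion, which is why the model is described as oversimplified.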
1. SAE: Introduction …cont.
A combination of the two estimates (direct and indirect) may provide a reasonable estimate with adequate precision, depending on the level of the small area.
The direct estimate is unbiased but imprecise, while the indirect estimate is generally precise but biased.
1. SAE: Introduction …cont.
SAE combines the direct and indirect estimates in an optimal way:
SAE for area d = (shrinkage factor for d) × (direct estimate for d) + (1 - shrinkage factor for d) × (indirect estimate for d)
If the shrinkage factor is 10%, the SAE gives 10% weight to the direct estimate and 90% to the indirect estimate. If it is 50%, the direct and indirect estimates have an equal say in the composite.
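The compositing rule above is a convex combination of the two estimates. A minimal Python sketch (the function name is hypothetical), applied to the NL direct (670) and indirect (1,582) estimates from Table 2 with the two illustrative shrinkage factors from the text: 10% gives about 1,491, and 50% gives the simple average, 1,126. These shrinkage values are examples only, not BUPF output.

```python
def composite_estimate(direct: float, indirect: float, shrinkage: float) -> float:
    """SAE for area d: shrinkage * direct + (1 - shrinkage) * indirect."""
    if not 0.0 <= shrinkage <= 1.0:
        raise ValueError("shrinkage factor must lie between 0 and 1")
    return shrinkage * direct + (1.0 - shrinkage) * indirect


if __name__ == "__main__":
    direct_nl, indirect_nl = 670, 1_582  # NL, Table 2
    for shrinkage in (0.10, 0.50):       # illustrative values from the text
        sae = composite_estimate(direct_nl, indirect_nl, shrinkage)
        print(f"shrinkage {shrinkage:.0%}: SAE = {sae:,.1f}")
```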
1. SAE: Introduction …cont.
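The size of the shrinkage factor depends on the relative variability of the modeling error (in the indirect estimate) and the sampling error (in the direct estimate): the larger the sampling error relative to the modeling error, the smaller the weight on the direct estimate.
The effective sample size for the SAE is larger than that for the direct estimate.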
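The slides do not give BUPF's formula for the shrinkage factor. As an assumption, the sketch below uses the standard choice from area-level (Fay-Herriot-type) models, shrinkage_d = σ_v² / (σ_v² + ψ_d), where σ_v² is the model-error variance and ψ_d is the sampling variance of the direct estimate. If the regression coefficients are treated as known, the mean squared error of the composite is shrinkage_d × ψ_d, which is smaller than ψ_d. The effective-sample-size function is a rough heuristic that assumes variance falls as 1/n. The σ_v² value is hypothetical; ψ_d and n_d for NL come from Table 1, and all function names are illustrative.

```python
# Shrinkage factor under an ASSUMED Fay-Herriot-type area-level model
# (the slides do not state BUPF's own formula).
#   sigma2_v : variance of the model error (drives error in the indirect estimate)
#   psi_d    : sampling variance of the direct estimate for area d


def shrinkage_factor(sigma2_v: float, psi_d: float) -> float:
    """Weight on the direct estimate: sigma2_v / (sigma2_v + psi_d)."""
    return sigma2_v / (sigma2_v + psi_d)


def composite_mse(sigma2_v: float, psi_d: float) -> float:
    """MSE of the composite when regression coefficients are known: gamma_d * psi_d."""
    return shrinkage_factor(sigma2_v, psi_d) * psi_d


def effective_sample_size(n_d: float, sigma2_v: float, psi_d: float) -> float:
    """Heuristic: if variance scales like 1/n, n_eff = n_d * psi_d / MSE = n_d / gamma_d."""
    return n_d * psi_d / composite_mse(sigma2_v, psi_d)


if __name__ == "__main__":
    psi_nl = 177.0 ** 2   # NL: SE of the direct estimate is 177 (Table 1)
    n_nl = 3_978          # NL sample size (Table 1)
    sigma2_v = 10_000.0   # HYPOTHETICAL model-error variance, for illustration only
    g = shrinkage_factor(sigma2_v, psi_nl)
    print(f"shrinkage factor = {g:.2f}")
    print(f"RMSE of composite = {composite_mse(sigma2_v, psi_nl) ** 0.5:.0f} (direct SE = 177)")
    print(f"effective sample size ~ {effective_sample_size(n_nl, sigma2_v, psi_nl):,.0f}")
```

In practice σ_v² and the regression coefficients are estimated, which adds further terms to the MSE, so this known-parameter version is optimistic.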
1. SAE: Introduction (Modeling Requirements)
Direct estimates from other small areas (termed indirect data) are needed for modeling purposes, i.e., for predicting the estimate for the area of interest.
Enough small areas are needed for adequate modeling. Subdivide provinces into subprovincial areas:
• ER (economic region), or ER by age by gender, instead of province, although it is the province level that is of interest.
1. SAE: Introduction (Modeling Requirements)
It is beneficial to have an auxiliary information source (administrative or census data): true population totals are needed at the area level for all areas.
Using an auxiliary source can improve modeling with the indirect data.
• Estimation Component (combining direct and indirect)
4. BUPF Application to LFS …cont.
Model: Direct Estimate for Area d = True Value + Sampling Error
True Value = Predictor + Model Error
Predictor = x1β1 + x2β2 + …; it gives rise to the indirect (synthetic) estimates.
X-variables considered: number reporting income, number of employment beneficiaries, age-sex counts, etc., all at the small-area level.
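The model above has the structure of a standard area-level (Fay-Herriot-type) model. The Python sketch below is not BUPF's filtering-based procedure, which these slides do not describe. It assumes a single X-variable with no intercept, known sampling variances ψ_d, and a known, hypothetical model-error variance σ_v². It estimates β by weighted least squares with weights 1/(σ_v² + ψ_d), then forms the synthetic estimate x_d β̂ and the composite. Using provincial population as the only X-variable (as in the common mean model of Table 2) is purely illustrative, so the output will not reproduce Table 3. Function names are hypothetical.

```python
# Sketch: area-level model with one X-variable and no intercept.
#   y_d     = theta_d + e_d,     Var(e_d) = psi_d     (sampling error, treated as known)
#   theta_d = x_d * beta + v_d,  Var(v_d) = sigma2_v  (model error, assumed known here)
# Under this model Var(y_d) = sigma2_v + psi_d, so beta is estimated by weighted
# least squares with weights 1 / (sigma2_v + psi_d).


def fit_beta(x, y, psi, sigma2_v):
    """Weighted least-squares slope through the origin."""
    w = [1.0 / (sigma2_v + p) for p in psi]
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den


def small_area_estimates(x, y, psi, sigma2_v):
    """Return beta_hat and, per area, a (synthetic, composite) pair."""
    beta = fit_beta(x, y, psi, sigma2_v)
    results = []
    for xi, yi, pi in zip(x, y, psi):
        synthetic = xi * beta                    # indirect estimate from the predictor
        gamma = sigma2_v / (sigma2_v + pi)       # shrinkage factor for this area
        results.append((synthetic, gamma * yi + (1.0 - gamma) * synthetic))
    return beta, results


if __name__ == "__main__":
    # Illustrative inputs from Table 1: x = population, y = direct estimate, psi = SE**2.
    pop = [429_298, 109_886, 758_549, 607_565, 6_059_655,
           9_766_566, 876_396, 744_431, 2_467_412, 3_346_181]
    direct = [670, 233, 1_532, 1_275, 25_273, 42_447, 3_023, 1_963, 7_643, 8_676]
    se = [177, 55, 292, 218, 2_204, 3_178, 432, 339, 1_098, 1_228]
    psi = [s * s for s in se]
    sigma2_v = 250_000.0  # HYPOTHETICAL model-error variance
    beta, res = small_area_estimates(pop, direct, psi, sigma2_v)
    print(f"beta_hat = {beta:.6f}")
    for d, (syn, comp) in zip(direct, res):
        print(f"direct {d:7,d}  synthetic {syn:9,.0f}  composite {comp:9,.0f}")
```

Areas with large sampling variance (small ψ_d relative weight) lean on the synthetic estimate, and well-sampled areas stay close to their direct estimates.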
Table 3 Direct, Indirect and SAE of Monthly Total Employed (A39) (Annual Average for 2003 LFS)
(Dir. = direct, Ind. = indirect; Dir. CV and Ind. Mod. CV/RRMSE are fractions, SAE Mod. CV/RRMSE is in percent.)
Prov.   Dir. Est.  Dir. CV  SAE Est.  SAE Mod. CV/RRMSE (%)  Ind. Est.  Ind. Mod. CV/RRMSE  (SAE-Dir.)/Dir.
NF            670    0.264       579                   14.4        603               0.229           -0.136
PEI           233    0.235       207                   16.8        187               0.179           -0.111
NS          1,532    0.190     1,417                   10.5      1,450               0.177           -0.075
NB          1,275    0.171     1,112                   10.0      1,083               0.168           -0.128
PQ         25,273    0.087    24,962                    5.6     25,381               0.081           -0.012
ON         42,447    0.075    44,355                    6.3     46,255               0.081            0.045
MA          3,023    0.143     2,348                    8.2      2,251               0.129           -0.223
SK          1,963    0.173     1,766                    9.1      1,753               0.164           -0.100
AB          7,643    0.144     7,276                    7.8      7,292               0.134           -0.048
BC          8,676    0.142     8,712                    9.4      8,792               0.129            0.004
Canada     92,734    0.046    92,734                    4.6     95,047               0.073            0.000
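The figures in Table 3 can be checked directly. The Python sketch below (names are illustrative) recomputes the last column as (SAE - Dir.)/Dir., which matches the published values to within rounding of at most 0.001 (e.g., PEI gives -0.112 against the published -0.111, presumably because the table was computed from unrounded estimates). It also confirms that the provincial SAEs add up exactly to the national direct estimate of 92,734, consistent with the self-benchmarking feature mentioned in the concluding remarks, whereas the provincial indirect estimates add up to 95,047.

```python
# (direct, SAE, indirect) estimates for A39 copied from Table 3 (2003 LFS).
TABLE_3 = {
    "NF": (670, 579, 603),
    "PEI": (233, 207, 187),
    "NS": (1_532, 1_417, 1_450),
    "NB": (1_275, 1_112, 1_083),
    "PQ": (25_273, 24_962, 25_381),
    "ON": (42_447, 44_355, 46_255),
    "MA": (3_023, 2_348, 2_251),
    "SK": (1_963, 1_766, 1_753),
    "AB": (7_643, 7_276, 7_292),
    "BC": (8_676, 8_712, 8_792),
}
CANADA = (92_734, 92_734, 95_047)  # national (direct, SAE, indirect)


def relative_difference(sae: float, direct: float) -> float:
    """Relative change of the SAE from the direct estimate: (SAE - Dir.) / Dir."""
    return (sae - direct) / direct


if __name__ == "__main__":
    for prov, (direct, sae, _) in TABLE_3.items():
        print(f"{prov:<4}{relative_difference(sae, direct):+.3f}")
    # Self-benchmarking check: provincial SAEs add up to the national direct estimate.
    print(sum(v[1] for v in TABLE_3.values()), CANADA[0])
```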
5. STC’s SAE Product Demonstration
BUPF 1.0 Demo
Part I: Data Preparations
Part II: Modeling Preparations
Part III: Model Selection and Diagnostics
Part IV: Small Area Estimation and Evaluation
6. Concluding Remarks and Future Work
The BUPF product for SAE has several unique features, such as self-benchmarking, domain collapsing for nonsampled domains, and extensive diagnostics.
The graphical user interface (GUI) for the product is useful as a systematic checklist or as a virtual analyst for efficient production; it is also useful for training and product demonstration.
6. Concluding Remarks and Future Work …cont.
Complete the beta version of BUPF 1.0; the current version is only an alpha version (a prototype) and is not suitable for production.