Optimal Allocation in the Multi-way Stratification Design for Business Surveys (*)
Paolo Righi, Piero Demetrio Falorsi
[email protected]; [email protected]
Italian National Statistical Institute
(*) Research of National Interest n.2007RHFBB3 (PRIN) "Efficient use of auxiliary information at the design and at the estimation stage of complex surveys: methodological aspects and applications for producing official statistics"
Outline
Statement of the problem
Multi-way Sampling Design
Multi-way optimal allocation algorithm
Monte Carlo simulation
Statement of the problem
Large-scale surveys in Official Statistics usually produce estimates for a set of parameters over a very large number of highly detailed estimation domains.
These domains generally define non-nested partitions of the target population.
When the domain indicator variables are available at the sampling frame level, we may plan a sample covering each domain.
Fixing the domain sample sizes:
helps to control the sampling errors of the main estimates;
when direct estimators are not reliable (small area problem), having sampled units in the domains makes it possible to bound the bias of small area indirect estimators and to use models with specific small area effects.
Statement of the problem
The standard solution for fixing the sample sizes is to stratify the population, with strata given by the cross-classification of the variables defining the different partitions (cross-classified or one-way stratified design).
Main drawback: the stratification is too detailed, which leads to:
risk of sample size explosion;
inefficient sample allocation (constraint of 2 units per stratum);
risk of statistical burden (e.g. repeated business surveys).
Statement of the problem
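Domains of interest, parameter of interest and estimator:
Multivariate ($r = 1, \dots, R$) and multidomain ($d = 1, \dots, D$) context.
The parameter of interest is the domain total
$$t_{(dr)} = \sum_{k \in U} y_{rk}\,\gamma_{dk},$$
estimated by $\hat{t}_{(dr)}$, with $\gamma_{dk} = 1$ if $k \in U_d$ and $\gamma_{dk} = 0$ otherwise.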
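As a minimal sketch of the multivariate, multidomain setting, the following computes domain totals as sums of the variable values over the units whose membership indicator equals 1. The function name `domain_totals`, the toy population and the domain labels are all hypothetical and chosen only for illustration; the two partitions are deliberately non-nested.

```python
def domain_totals(y, domains):
    """Return totals[d][r] = sum over units k in domain d of y[k][r].

    y       : list of per-unit lists, y[k][r] is variable r for unit k
    domains : dict mapping a domain label to the list of unit indices in it
              (equivalent to a 0/1 membership indicator per unit and domain)
    """
    n_vars = len(y[0])
    totals = {}
    for label, members in domains.items():
        member_set = set(members)
        totals[label] = [
            sum(y[k][r] for k in range(len(y)) if k in member_set)
            for r in range(n_vars)
        ]
    return totals


# Toy population: 5 units, R = 2 variables (made-up values).
y = [[10, 1], [20, 2], [30, 3], [40, 4], [50, 5]]
# Two non-nested partitions: by activity ("A*") and by region ("G*").
domains = {"A1": [0, 1], "A2": [2, 3, 4], "G1": [0, 2, 4], "G2": [1, 3]}

totals = domain_totals(y, domains)
for label in sorted(totals):
    print(label, totals[label])
```

Each unit belongs to one domain per partition, so the totals of every partition add up to the same population totals.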
Statement of the problem
The sampling strategy proposed here bases each domain estimate on a planned sample size. We consider a general random sampling design in which the subpopulations $U_h$ ($h = 1, \dots, H$), of size $N_h$, define the minimal planned subpopulations. We assume two cases: $U_d = U_h$, or $U_d = \bigcup_{h \in H_d} U_h$, where $H_d$ is a subset of $\{1, \dots, H\}$.
Statement of the problem
Example: three domain types $T_l$ ($l = 1, \dots, 3$): NACE four-digit; NACE three-digit by size; NACE two-digit by geography. Each domain type defines a partition of the population of cardinality $D_l$, with $D = D_1 + D_2 + D_3$. Different sampling designs allow planning the sample sizes of the domains of interest:
the standard approach defines the $U_h$'s by combining the partitions of the three domain types. Then $H = D_1 \times D_2 \times D_3$ and the $\boldsymbol{\delta}_k$ are defined as $(0, \dots, 1, \dots, 0)$ vectors. We denote this design as the cross-classified or one-way stratified design;
Statement of the problem
Example (continued):
the $U_h$'s are defined by combining all the pairs of domain types. Then $H = (D_1 \times D_2) + (D_1 \times D_3) + (D_2 \times D_3)$;
some $U_h$'s coincide with the domains of one population partition (for instance $T_1$) and the other $U_h$'s are defined by combining pairs of the remaining domain types ($T_2$ and $T_3$). Then $H = D_1 + (D_2 \times D_3)$;
the $U_h$'s coincide with the domains of interest. Then $H = D_1 + D_2 + D_3$.
Sampling designs that define the $U_h$'s as in the last three points are called multi-way (or incomplete) stratification designs.
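To make the difference between the four designs concrete, the sketch below counts the planned subpopulations H in each case. The cardinalities (10, 6 and 4) are purely hypothetical, and the counts assume that every combination of domains is non-empty, so they are upper bounds in practice.

```python
def planned_subpopulations(D1, D2, D3):
    """Number of planned subpopulations H under each design of the example.

    Assumes every cross-classification cell is populated (upper bound).
    """
    return {
        "cross-classified (one-way)": D1 * D2 * D3,
        "all pairs of domain types": D1 * D2 + D1 * D3 + D2 * D3,
        "T1 alone, T2 crossed with T3": D1 + D2 * D3,
        "domains of interest only": D1 + D2 + D3,
    }


# Hypothetical cardinalities of the three partitions.
counts = planned_subpopulations(D1=10, D2=6, D3=4)
for design, H in counts.items():
    # With at least 2 units per planned subpopulation, 2 * H is a lower
    # bound on the sample size implied by the design.
    print(f"{design}: H = {H}, minimum sample size = {2 * H}")
```

Under these assumed cardinalities the one-way design needs twelve times as many planned subpopulations as the design that plans the domains of interest directly, which is the sample size explosion the multi-way design is meant to avoid.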
Main problem of the multi-way design (MWD): defining a procedure for random selection.
We propose to use the Cube method (Deville and Tillé, 2004), which:
selects random samples for a multi-way stratified design;
works for large populations and many domains.
We note that the constraints (a) depend on the unknown variables of interest. In practice, only model-predicted values can be used. In the paper we sketch the main phases of the algorithm in this operational context.
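We consider a general prediction model $M$:
$$y_{rk} = \tilde{y}_{rk} + u_{rk}, \qquad E_M(u_{rk}) = 0 \;\; \forall k, \qquad E_M(u_{rk}^{2}) = \sigma_{rk}^{2}, \qquad E_M(u_{rk}\,u_{rl}) = 0 \;\; \forall k \neq l.$$
We suppose the $\sigma_{rk}^{2}$ values are known or can be predicted.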
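The prediction model only fixes the first two moments of the residuals: zero mean, a known or predicted variance, and no correlation between units. As a rough illustration, the sketch below draws values for a single unit and variable around a predicted value and checks the two moments empirically. The normal distribution, the function name `simulate_y` and all numeric values are assumptions made for the example; the model itself does not require normality.

```python
import random


def simulate_y(y_tilde, sigma2, n_draws, seed=12345):
    """Draw y = y_tilde + u with E(u) = 0 and E(u^2) = sigma2.

    Normal residuals are an assumption of this sketch only.
    """
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    return [y_tilde + rng.gauss(0.0, sigma) for _ in range(n_draws)]


# Hypothetical predicted value and residual variance for one unit.
draws = simulate_y(y_tilde=100.0, sigma2=4.0, n_draws=20000)
mean = sum(draws) / len(draws)
var = sum((v - mean) ** 2 for v in draws) / (len(draws) - 1)
print(round(mean, 2), round(var, 2))
```

The empirical mean and variance should be close to the predicted value and to the assumed residual variance, respectively.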
To take into account the model uncertainty, the sampling variance is replaced by the Anticipated Variances (Isaki and Fuller, 1982). An upward approximation of the anticipated variances for the proposed strategy is
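$$AV(\hat{t}_{(dr)} \mid \boldsymbol{\pi}^{*}) = E_M E_p\big(\hat{t}_{(dr)} - t_{(dr)}\big)^{2} \lesssim f\left[\sum_{k \in U}\left(1/\pi_k^{*} - 1\right)\tilde{\eta}_{(dr)k}^{2} + \sum_{k \in U}\left(1/\pi_k^{*} - 1\right)\sigma_{rk}^{2}\,\gamma_{dk}\right],$$
where $\tilde{\eta}_{(dr)k}^{2}$ is computed by means of the model-predicted value $\tilde{y}_{rk}$ and $\gamma_{dk}$ is the membership indicator of unit $k$ in domain $U_d$. The approximation neglects a residual term that is not shown for the sake of brevity. However, the optimization procedure does not change if the corrected anticipated variance is taken into account.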
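The following sketch evaluates an anticipated variance bound for one domain and one variable. It assumes the upward approximation has the form f times the sum over units of (1/pi_k - 1) times (squared predicted residual + model variance times domain indicator). That form, the function name, the treatment of f as a given constant and the toy inputs are all assumptions made for illustration, not the authors' implementation.

```python
def anticipated_variance_bound(pi, eta_tilde, sigma2, gamma, f=1.0):
    """Assumed form: f * sum_k (1/pi_k - 1) * (eta_tilde_k**2 + sigma2_k * gamma_k).

    pi        : inclusion probabilities, each in (0, 1]
    eta_tilde : residuals computed from model-predicted values
    sigma2    : model variances of the residuals u
    gamma     : 0/1 domain membership indicators
    f         : correction factor, taken as given
    """
    total = 0.0
    for p, e, s2, g in zip(pi, eta_tilde, sigma2, gamma):
        if not 0.0 < p <= 1.0:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        total += (1.0 / p - 1.0) * (e ** 2 + s2 * g)
    return f * total


# Toy inputs for three units (made-up values).
pi = [0.5, 0.25, 1.0]
eta_tilde = [2.0, -1.0, 3.0]
sigma2 = [1.0, 4.0, 2.0]
gamma = [1, 0, 1]

av = anticipated_variance_bound(pi, eta_tilde, sigma2, gamma)
print(av)
```

A unit selected with certainty (inclusion probability 1) contributes nothing to the bound, and the bound decreases as the inclusion probabilities increase, which is what an allocation procedure can trade off against the total sample size.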
The algorithm consists of two nested calculation loops.
Let ${}^{(\alpha)}G$ and ${}^{(\alpha,\tau)}G$ denote, respectively, the generic quantity $G$ as calculated at iteration $\alpha$ ($\alpha = 0, 1, 2, \dots$) of the first loop (outer process) and at iteration $\tau$ ($\tau = 0, 1, 2, \dots$) of the second loop (inner process).
Analysis of the allocation with the predicted values:
The sample allocation procedure uses an approximation of the AV.
The simulation confirms that the input AV is an upward approximation of the real AV.
ITACOSM 2011 - 27-29 June 2011, Pisa, Italy - 16
Average of Expected Anticipated CV (%)
Partition    $y_1$    $y_2$
1            8.1      17.8
2            9.2      19.1

Average of Empirical (10,000 Monte Carlo simulations) Anticipated CV (%)
Partition    $y_1$    $y_2$
1            6.7      14.7
2            7.4      15.5
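As a quick arithmetic check of the two tables, the sketch below computes by how much the expected CV exceeds the empirical one in each cell. The figures are those reported above; the dictionary layout and variable names are only illustrative.

```python
# Average anticipated CV (%) by partition and variable, as reported above.
expected = {("1", "y1"): 8.1, ("1", "y2"): 17.8,
            ("2", "y1"): 9.2, ("2", "y2"): 19.1}
empirical = {("1", "y1"): 6.7, ("1", "y2"): 14.7,
             ("2", "y1"): 7.4, ("2", "y2"): 15.5}

# Relative overstatement (%) of the expected CV with respect to the empirical CV.
overstatement = {
    key: 100.0 * (expected[key] / empirical[key] - 1.0) for key in expected
}

for (partition, variable) in sorted(overstatement):
    value = overstatement[(partition, variable)]
    print(f"partition {partition}, {variable}: +{value:.1f}%")
```

In every cell the expected value is larger than the empirical one, by roughly 21% to 24%, which is consistent with the input AV being an upward approximation.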
Monte Carlo simulation
Comparison with the standard approach:
The implicit model (one-way stratification model) is similar to the model used in our approach;
The allocation differences depend on the constraint of a minimum of 2 units in each stratum;
The sample size is 751 units (+7.4%);
Considering the domains with small population strata (fewer than 10 units per stratum on average), the standard approach produces a sample size that is 14.4% larger.
References
Bethel J. (1989) Sample Allocation in Multivariate Surveys, Survey Methodology, 15, 47-57.
Chromy J. (1987) Design Optimization with Multiple Objectives, Proceedings of the Survey Research Methods Section, American Statistical Association, 194-199.
Deville J.-C., Tillé Y. (2004) Efficient Balanced Sampling: the Cube Method, Biometrika, 91, 893-912.
Deville J.-C., Tillé Y. (2005) Variance Approximation under Balanced Sampling, Journal of Statistical Planning and Inference, 128, 569-591.
Falorsi P. D., Righi P. (2008) A Balanced Sampling Approach for Multi-way Stratification Designs for Small Area Estimation, Survey Methodology, 34, 223-234.
Falorsi P. D., Orsini D., Righi P. (2006) Balanced and Coordinated Sampling Designs for Small Domain Estimation, Statistics in Transition, 7, 1173-1198.
Isaki C. T., Fuller W. A. (1982) Survey Design under a Regression Superpopulation Model, Journal of the American Statistical Association, 77, 89-96.