APPENDIX C
ESTIMATES OF SAMPLING ERRORS
Mahir Ulusoy and Alfredo Aliaga
The estimates from a sample survey are affected by two types of errors: nonsampling and sampling. Nonsampling errors result from mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made to minimise this type of error during the implementation of the TDHS, nonsampling errors are impossible to avoid and difficult to evaluate statistically.
Sampling errors, on the other hand, can be evaluated statistically. The sample of women selected in the TDHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that would differ somewhat from the results of the actual sample selected. The sampling error is a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.
Sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the ratio of the standard deviation to the square root of the sample size. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.
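The definition above can be illustrated with a minimal sketch for the simple random sample case. The function names and the sample values are made up for illustration and are not taken from the TDHS data.

```python
import math

def srs_standard_error(values):
    """Return (mean, standard error) for a simple random sample.

    The standard error is the sample standard deviation divided by
    the square root of the sample size.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance) / math.sqrt(n)

def confidence_limits(estimate, se):
    """Approximate 95 percent limits: estimate plus or minus two standard errors."""
    return estimate - 2 * se, estimate + 2 * se

# Hypothetical numbers of children ever born for eight respondents.
sample = [2, 3, 1, 4, 3, 5, 2, 3]
mean, se = srs_standard_error(sample)
low, high = confidence_limits(mean, se)
print(f"mean={mean:.3f} se={se:.3f} limits=({low:.3f}, {high:.3f})")
```

This applies only to a simple random sample; it does not account for the clustering and stratification of the actual TDHS design.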
If the sample of women had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the TDHS sample is the result of a three-stage stratified design, and, consequently, it was necessary to use more complex formulas. The computer package CLUSTERS, developed by the International Statistical Institute for the World Fertility Survey, was used to compute the sampling errors for 42 variables with the proper statistical methodology.
The CLUSTERS package treats any percentage or average as a ratio estimate, r = y/x, where y represents the total sample value for variable y, and x represents the total number of cases in the group or subgroup under consideration. The variance of r is computed using the formula given below, with the standard error being the square root of the variance,
$$\operatorname{var}(r) = \frac{1-f}{x^{2}} \sum_{h=1}^{H} \left[ \frac{m_h}{m_h - 1} \left( \sum_{i=1}^{m_h} z_{hi}^{2} - \frac{z_h^{2}}{m_h} \right) \right]$$

in which

$$z_{hi} = y_{hi} - r \cdot x_{hi}, \quad \text{and} \quad z_h = y_h - r \cdot x_h$$

where
$h$ represents the stratum, which varies from 1 to $H$,
$m_h$ is the total number of standard segments selected in the $h$-th stratum,
$y_{hi}$ is the sum of the values of variable y in standard segment $i$ in the $h$-th stratum,
$x_{hi}$ is the sum of the number of cases (women) in standard segment $i$ in the $h$-th stratum, and
$f$ is the overall sampling fraction, which is so small that CLUSTERS ignores it.
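The ratio-estimate variance can be sketched in a few lines. This is not the CLUSTERS code. It assumes the usual stratified form in which each stratum contributes m_h/(m_h - 1) times (the sum of squared z_hi minus z_h squared over m_h), scaled by (1 - f)/x². The function name, the data layout and the example numbers are hypothetical.

```python
import math

def ratio_variance(strata, f=0.0):
    """Return (r, var_r) for the ratio estimate r = y / x.

    strata: list of strata; each stratum is a list of (y_hi, x_hi)
            pairs, one pair per standard segment (cluster).
    f:      overall sampling fraction (treated as negligible by default).
    Every stratum needs at least two segments.
    """
    y = sum(y_hi for stratum in strata for y_hi, _ in stratum)
    x = sum(x_hi for stratum in strata for _, x_hi in stratum)
    r = y / x
    total = 0.0
    for stratum in strata:
        m_h = len(stratum)
        z = [y_hi - r * x_hi for y_hi, x_hi in stratum]  # z_hi
        z_h = sum(z)                                     # stratum total of z_hi
        total += m_h / (m_h - 1) * (sum(v * v for v in z) - z_h ** 2 / m_h)
    return r, (1 - f) / x ** 2 * total

# Two made-up strata with two segments each: (women with the attribute, women).
example = [[(3, 10), (5, 10)], [(4, 10), (8, 10)]]
r, var_r = ratio_variance(example)
print(r, var_r, math.sqrt(var_r))
```

The standard error is the square root of the returned variance, as stated in the text.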
In addition to the standard errors, CLUSTERS computes the design effect (DEFT) for each estimate, which is defined as the ratio of the standard error using the given sample design to the standard error that would result if a simple random sample had been used. A DEFT value of 1.0 indicates that the sample design is as efficient as a simple random sample, whereas a value greater than 1.0 indicates the increase in the sampling error due to the use of a more complex and less statistically efficient design. CLUSTERS also computes the relative error and confidence limits for the estimates.
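As a rough illustration of the DEFT definition, the sketch below divides a design-based standard error by the standard error a simple random sample of the same size would give for a proportion. The binomial formula for the simple random sample case and the choice of n are assumptions made for this example; the way CLUSTERS handles weights and means is not reproduced here, and the input numbers are invented.

```python
import math

def deft_for_proportion(se_design, p, n):
    """Design effect (DEFT) for a proportion.

    se_design: standard error obtained under the actual sample design
    p:         estimated proportion
    n:         number of cases (assumed here to be the sample size that
               a simple random sample would have had)
    """
    se_srs = math.sqrt(p * (1 - p) / n)  # simple random sample standard error
    return se_design / se_srs

# Invented example: p = 0.5 from 400 cases, design-based SE of 0.03.
value = deft_for_proportion(0.03, 0.5, 400)
print(round(value, 3))
```

A result above 1.0, as in this example, means the complex design has a larger sampling error than a simple random sample of the same size.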
The results for the 42 variables mentioned, which are those considered to be of primary interest, are presented in this appendix for the country as a whole, for urban and rural areas, for the five regions, and for age groups. The type of statistic (mean or proportion) and the base population for each variable are given in Table C.1. Tables C.2 to C.12 present the value of the statistic (R), its standard error (SE), the number of unweighted (N) and weighted (WN) cases, the design effect (DEFT), the relative standard error (SE/R), and the 95 percent confidence limits (R±2SE), for each variable.
Additionally, sampling errors were calculated for the total fertility rate of the last year prior to the survey date and the infant mortality rate for the 5 years preceding the survey, for the national total, and for urban-rural areas. These calculations were undertaken using the Jackknife methodology rather than the CLUSTERS package because of the nature of these two estimates. The Jackknife methodology is based on having replicate values for the estimates and applying the simple standard error formulae to these replicates.
The TDHS included 478 clusters. Each replication considers all clusters but deletes one cluster at a time for the calculations and then creates pseudoindependent replicates. In total, 478 replications for the infant mortality and total fertility rates create the pseudoindependent values:
$$e_{(i)} = 478 \times \text{estimate(all clusters)} - 477 \times \text{estimate(all clusters minus the } i\text{-th)}$$

$$e = \text{estimate(all clusters)}$$
and the sampling error for the estimate is given by:

$$SE(e) = \sqrt{\frac{1}{478 \times 477} \sum_{i=1}^{478} \left( e_{(i)} - e \right)^{2}}$$
The results of the calculations using the Jackknife methodology to estimate sampling errors for the infant mortality rate and the total fertility rate for the national total, for urban and rural areas, and for the five major regions are shown in Table C.13.
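The delete-one-cluster procedure described above can be sketched as follows. This is a generic jackknife, not the program used for the TDHS. It assumes the pseudo-values are k times the full estimate minus (k - 1) times the reduced estimate, and that the simple standard error formula is applied to them centred on the full-sample estimate. The function names and data are hypothetical.

```python
import math

def jackknife_se(clusters, estimator):
    """Delete-one-cluster jackknife.

    clusters:  list of per-cluster data items
    estimator: function mapping a list of clusters to a single estimate
    Returns (full-sample estimate, jackknife standard error).
    """
    k = len(clusters)
    full = estimator(clusters)
    pseudo = []
    for i in range(k):
        reduced = clusters[:i] + clusters[i + 1:]  # drop the i-th cluster
        pseudo.append(k * full - (k - 1) * estimator(reduced))
    variance = sum((p - full) ** 2 for p in pseudo) / (k * (k - 1))
    return full, math.sqrt(variance)

def mean(values):
    """Toy estimator: each 'cluster' is a single number."""
    return sum(values) / len(values)

# Made-up data; with the mean as estimator the pseudo-values are the data
# themselves, so the result equals the ordinary standard error of the mean.
estimate, se = jackknife_se([2, 3, 1, 4, 3, 5, 2, 3], mean)
print(estimate, se)
```

For rates such as infant mortality or total fertility, the estimator would recompute the rate from the remaining clusters at each replication; with 478 clusters this means 478 recomputations.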
The confidence interval (e.g., as calculated for EVBORN) can be interpreted as follows: the overall average from the national sample is 3.041 and the standard error is 0.044. Therefore, to obtain the 95 percent confidence limits, one adds and subtracts twice the standard error to the sample estimate, i.e., 3.041 ± 0.088. There is a high probability (95 percent) that the true average number of children ever born to all women age 15 to 49 is between 2.954 and 3.128.
Of the 42 variables for which CLUSTERS was used for the estimation of sampling errors, 28 are based on women, and 14 are based on children under age 5. In general, the relative standard error for most estimates for the country as a whole is small, except for estimates of very small proportions. There are some differentials in the relative standard error for the estimates of subpopulations such as urban and rural areas. For example, for the variable SECATT (secondary school attendance), the relative standard errors as a percent of the estimated proportion for urban and rural areas are 4.6 percent and 12.5 percent, respectively. The same is true for SECGRD (proportion of women who completed secondary school), with values of 5 percent and 14.2 percent; for XCUPIL (current use of the pill), with values of 8.1 and 13.6 percent; for XCUIUD (current use of IUD), with values of 3.4 and 8.5 percent; and for XCUPAB (current use of periodic abstinence), with values of 17 percent and 0 percent, for urban and rural areas, respectively.
Of the 42 variables, 24 were found to have SE/R values of less than 0.03, which means that the SE of those variables is at most 3 percent of the estimate. SE/R values are between 0.031 and 0.059 for 13 variables, and greater than 0.06 for only 5 variables, with a maximum value of 0.166 (16.6 percent). The variables with the highest SE/R values are those calculated for relatively rare events.
The DEFT value is less than 1.3 for 24 variables, between 1.31 and 1.5 for 13 variables, and greater than 1.51 for only 5 variables. The maximum DEFT value obtained is 1.668. The average DEFT across the 42 variables is 1.301. For the 41 variables that remain when the URBAN variable is excluded, the average is 1.213 in urban areas and 1.293 in rural areas.
Table C.1 List of selected variables for sampling errors, Turkey 1993

Variable  Description                         Estimate    Base population
URBAN     Urban                               Proportion  Ever-married women
SECATT    Attended secondary or higher        Proportion  Ever-married women
SECGRD    Graduated secondary or higher       Proportion  Ever-married women
CURMAR    Currently married                   Proportion  Ever-married women
AGEMAR    Age at marriage                     Mean        Ever-married women
PREGNT    Currently pregnant                  Proportion  Ever-married women
NUPREG    Number of pregnancies               Mean        Ever-married women
NUMISC    Number of miscarriages              Mean        Ever-married women
EVBORN    Children ever born                  Mean        Ever-married women
XEVB      Children ever born                  Mean        Currently married women
XEVB40    Children ever born                  Mean        Currently married women 40-49
SURVIV    Children surviving                  Mean        Ever-married women
KMETHO    Know any method                     Proportion  Ever-married women
XKMOD     Know modern method                  Proportion  Currently married women
XKSOUR    Know source of method               Proportion  Currently married women
XEVUSE    Ever used any method                Proportion  Currently married women
XCUSE     Currently using any method          Proportion  Currently married women
XCUPIL    Current use pill                    Proportion  Currently married women
XCUIUD    Current use IUD                     Proportion  Currently married women
XCUCON    Current use condom                  Proportion  Currently married women
XCUWIT    Current use withdrawal              Proportion  Currently married women
XCUSTE    Current use female steril.          Proportion  Currently married women
XCUPAB    Current use periodic abst.          Proportion  Currently married women
XCUMOD    Currently using modern method       Proportion  Currently married women
XPSOUR    Using public source                 Proportion  Modern users married women
XNOMOR    Want no more children               Proportion  Currently married women
XDELAY    Delay at least two years            Proportion  Currently married women
IDEAL     Ideal number of children            Mean        Ever-married women
TETANU    Mother received tetanus injection   Proportion  Births last five years
MEDELI    Mother received medical attention   Proportion  Births last five years
DIARR1    Had diarrhoea in last 2 weeks       Proportion  Children under five years
DIARR2    Had diarrhoea in last 24 hours      Proportion  Children under five years
ORSTRE    Children ORS treated diarrhoea      Proportion  Children with diarrhoea last 2 weeks
MEDTRE    Children medical treated diarrhoea  Proportion  Children with diarrhoea last 2 weeks
RESPI2    Had resp. disease last 2 weeks      Proportion  Children under five years
RESPI1    Had resp. disease last 24 hours     Proportion  Children under five years
HCARD     Children having health card         Proportion  Children 12 to 23 months
BCG       Children with BCG                   Proportion  Children 12 to 23 months
DPT3      Children with DPT (3 doses)         Proportion  Children 12 to 23 months
POL3      Children with Polio (3 doses)       Proportion  Children 12 to 23 months
MEASLE    Children with measles               Proportion  Children 12 to 23 months
FULIMM    Children fully immunised            Proportion  Children 12 to 23 months
Table C.10 Sampling errors - Age 15-24, Turkey 1993

                    Standard  Number of cases       Design  Relative  Confidence limits
          Value     error     Unweighted  Weighted  effect  error
Variable  (R)       (SE)      (N)         (WN)      (DEFT)  (SE/R)    R-2SE    R+2SE
URBAN     .623      .016      1361        1372      1.245   .026      …        .655
SECATT    .196      .013      1361        1372      1.206   .066      …        .222
SECGRD    .160      .011      1361        1372      1.131   .070      .138     .183
CURMAR    .987      .003      1361        1372      .935    .003      …        …
AGEMAR    17.659    …         1361        1372      1.223   .005      17.498   17.819
PREGNT    .203      .011      1361        1372      1.050   .056      .180     .226
NUPREG    1.379     .036      1361        1372      1.114   .026      1.307    1.452
NUMISC    .142      .012      1361        1372      1.064   .084      .118     .166
EVBORN    1.140     .032      1361        1372      1.135   .028      1.077    1.203
XEVB      1.145     .032      1342        1355      1.134   .028      1.081    1.209
SURVIV    1.062     .027      1361        1372      …       .025      1.009    1.116
KMETHO    .997      .004      1361        1372      1.295   .004      .979     .995
XKMOD     .984      .004      1342        1355      …       .004      .976     …
XKSOUR    .926      .008      1342        1355      1.130   .009      .910     .942
XEVUSE    .621      .015      1342        1355      1.123   .024      .591     .651
XCUSE     .446      .015      1342        1355      1.123   .034      .415     .476
XCUPIL    .040      .006      1342        1355      1.034   .139      .029     .051
XCUIUD    .139      .011      1342        1355      1.111   .075      .118     .160
XCUCON    .048      .006      1342        1355      1.062   .129      .035     .060
XCUWIT    .204      .013      1342        1355      1.155   .062      .179     .230
XCUSTE    .002      .001      1342        1355      …       …         -.000    .005
XCUPAB    .004      .002      1342        1355      1.056   .458      .000     .008
XCUMOD    .236      .013      1342        1355      1.117   .055      .210     …
XPSOUR    .582      .029      318         320       1.057   .050      .523     .640
XNOMOR    .299      .012      1342        1355      .953    .040      .275     …
XDELAY    .427      .014      …           …         …       …         …        …
IDEAL     2.244     .025      …           …         …       …         …        …
TETANU    .464      .020      …           …         …       …         …        …
MEDELI    .790      .024      …           …         …       …         …        …
DIARR1    .286      .015      …           …         …       …         …        …
DIARR2    .128      .012      …           …         …       …         …        …
ORSTRE    .185      .025      …           …         …       …         …        …
MEDTRE    .272      .025      …           …         …       …         …        …
RESPI2    .171      .014      …           …         …       …         …        …
RESPI1    .421      .016      …           …         …       …         …        …
HCARD     .419      .032      …           …         …       …         …        …
BCG       .890      .022      …           …         …       …         …        …
DPT3      .739      .031      …           …         …       …         …        …
POL3      .733      .032      …           …         …       …         …        …
MEASLE    .783      .027      …           …         …       …         …        …
FULIMM    .616      .033      …           …         …       …         …        …
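Infant mortality rate

        Value    Standard  Unweighted  Weighted  Design  Relative  Confidence limits
        (R)      error     (N)         (WN)      effect  error     R-2SE    R+2SE
                 (SE)                            (DEFT)  (SE/R)
Urban   44.038   4.992     2277        2284      1.027   .113      34.053   54.022
Rural   65.442   7.860     1538        1539      1.276   .120      49.722   81.163
Total   52.574   4.391     3815        3823      1.148   .084      43.793   61.355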
It should be noted that adding the number of cases for urban and rural areas does not provide the total number of cases for the entire country. The calculation of the total fertility rate is based on years of exposure by women and the cases are not additive in separate domains.
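The infant mortality figures above can be cross-checked against the column definitions given earlier in the appendix (relative error = SE/R, confidence limits = R ± 2SE). The short check below is only a consistency sketch: it assumes the columns are read in the order value, standard error, relative error, lower limit, upper limit, and it allows a small tolerance because the published figures are rounded.

```python
# (R, SE, SE/R, R-2SE, R+2SE) as read from the infant mortality rows.
rows = {
    "Urban": (44.038, 4.992, 0.113, 34.053, 54.022),
    "Rural": (65.442, 7.860, 0.120, 49.722, 81.163),
    "Total": (52.574, 4.391, 0.084, 43.793, 61.355),
}

def check_row(r, se, rel, low, high, tol=0.002):
    """True if the relative error and both limits follow from R and SE."""
    return (
        abs(se / r - rel) < tol
        and abs((r - 2 * se) - low) < tol
        and abs((r + 2 * se) - high) < tol
    )

results = {area: check_row(*values) for area, values in rows.items()}
print(results)
```

All three rows pass under this tolerance, which supports reading the columns in that order.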