Improved County Level Estimation of Crop Yield Using Model-Based Methodology With a Spatial Component Michael E. Bellow, USDA/NASS.
Outline
• Background
• Simulation Methodology
• Results of Ten State Study
• Convergence Evaluation
• Summary
County Level Commodity Estimation
• NASS program since 1917
• Estimates used by private sector, academia, government
• Data from various sources used
• NASS County Estimates System developed to facilitate the estimation process
Available Data Sources
• Voluntary response surveys of farm operators
• List frame control data (lists of known farming operations)
• Previous year official estimates
• Census of Agriculture data (NASS conducts Census every five years)
• Earth resources satellite data
County Crop Yield Estimation
• Yield is ratio of crop production to harvested area (acres)
• Accurate estimation challenging due to
- reliable administrative data seldom available
- high year-to-year variability of yields (weather sensitive)
- lack of adequate sample survey data
Desirable Features of a County Yield Estimation Method
• Repeatability
• Accurate variance estimation
• Produce estimates for counties having no survey data
Ratio (R) Estimator
• Traditional crop yield estimator used by NASS
• Computed as ratio between production and harvested area estimates (with minor adjustment)
• Can produce inconsistent yields due to fluctuations in harvested acreage
• No utilization of survey data from counties other than the one being estimated
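The ratio idea can be sketched in a few lines. This is only an illustration: the "minor adjustment" is not specified in the slides and is left out, and the function name and the numbers are hypothetical.

```python
def ratio_yield(production, harvested_acres):
    """Yield as production per harvested acre.

    The unspecified 'minor adjustment' mentioned in the slides is omitted.
    """
    if harvested_acres <= 0:
        raise ValueError("harvested area must be positive")
    return production / harvested_acres


# Hypothetical county: the same production estimate combined with two
# different harvested-area estimates gives noticeably different yields.
print(ratio_yield(1_200_000, 10_000))  # 120.0
print(ratio_yield(1_200_000, 8_000))   # 150.0
```

The two calls show why fluctuations in the harvested-acreage estimate feed directly into the yield.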
Model-Based County Estimation Methods
• Based on linear or non-linear models relating true yields to survey reported values
• Generally fit using an iterative algorithm
• Convergence not always guaranteed
• Estimates can be adjusted for consistency with published state figures
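One simple way to make county estimates agree with a published state figure is a proportional (ratio) adjustment. The slides do not say which adjustment NASS applies, so the sketch below is an assumption: it scales county yields so that their harvested-acreage-weighted mean equals the state yield. The function name and the data are hypothetical.

```python
def benchmark_to_state(county_yields, county_acres, state_yield):
    """Scale county yields so their acreage-weighted mean equals state_yield.

    Assumed proportional adjustment; not necessarily the NASS procedure.
    """
    total_acres = sum(county_acres)
    implied_state = sum(y * a for y, a in zip(county_yields, county_acres)) / total_acres
    factor = state_yield / implied_state
    return [y * factor for y in county_yields]


# Hypothetical example: three counties whose model-based yields imply a
# state yield of 120, benchmarked to a published state yield of 126.
adjusted = benchmark_to_state([100.0, 120.0, 140.0], [1000.0, 2000.0, 1000.0], 126.0)
print(adjusted)
```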
Stasny-Goel (SG) Method
• Developed at Ohio State University under cooperative agreement with NASS
• Assumes mixed effects model with farm size group as fixed effect and county as random effect
• Random effect assumed multivariate normal with covariance matrix reflecting spatial correlation among neighboring counties -
corr_ij = ρ if county i borders county j
        = 0 otherwise
• EM algorithm used to fit model
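The neighbor-based correlation structure can be sketched as a matrix built from a county adjacency list: a common correlation value (called rho here) for bordering counties and zero otherwise. The county labels, the value of rho and the function name are hypothetical, and the diagonal is set to 1 as in any correlation matrix.

```python
def spatial_corr_matrix(neighbors, rho):
    """Build a correlation matrix from a dict mapping county -> bordering counties.

    Entry (i, j) is 1 on the diagonal, rho if the counties share a border,
    and 0 otherwise.
    """
    counties = sorted(neighbors)
    matrix = []
    for i in counties:
        row = []
        for j in counties:
            if i == j:
                row.append(1.0)
            elif j in neighbors[i] or i in neighbors[j]:
                row.append(rho)
            else:
                row.append(0.0)
        matrix.append(row)
    return counties, matrix


# Hypothetical chain of three counties: A borders B, B borders C.
borders = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
names, corr = spatial_corr_matrix(borders, rho=0.4)
for name, row in zip(names, corr):
    print(name, row)
```

Not every value of rho yields a positive definite matrix for a given border pattern, so a real implementation would need to check that before using the matrix as a covariance structure.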
Stasny-Goel Method (cont.)
• Previous year county yields used to derive initial estimates of county and size group effects
• Processing continues until at least one of the following two conditions is satisfied -
- relative group and log-likelihood distances fall below preset limits
- maximum allowable number of iterations reached
• County yield estimates computed as weighted averages of individual farm level estimates (weights derived from Census of Agriculture data)
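The stopping rule and the final weighting step can be sketched as follows. The slides do not define the relative distances or the preset limits, so the tolerances below are placeholders. The 5000-iteration cap is the limit quoted elsewhere in the presentation. Applying the weights at the size-group level is an assumption based on the later statement that Census acreage percentages by size group serve as weights. All names are hypothetical.

```python
def should_stop(iteration, rel_group_dist, rel_loglik_dist,
                group_tol=1e-6, loglik_tol=1e-6, max_iter=5000):
    """Stop when both distances are below their limits or the cap is reached.

    The tolerances are placeholders; the actual preset limits are not given.
    """
    converged = rel_group_dist < group_tol and rel_loglik_dist < loglik_tol
    return converged or iteration >= max_iter


def county_yield(size_group_estimates, census_weights):
    """Weighted average of size-group yield estimates for one county."""
    total = sum(census_weights)
    return sum(e * w for e, w in zip(size_group_estimates, census_weights)) / total


# Hypothetical county with two size groups holding 25% and 75% of Census acres.
print(county_yield([100.0, 120.0], [25.0, 75.0]))  # 115.0
```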
Griffith (G) Method
• Developed by Dr. Dan Griffith at Syracuse University under cooperative agreement with NASS
• Predicts yield values using published number of farms producing crop of interest
• Assumes autoregressive model
• Employs Box-Cox and Box-Tidwell transformations
• Spatial imputation routine can compute estimates for counties with missing survey data
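For readers unfamiliar with the Box-Cox transformation, a minimal sketch of the standard one-parameter form and its inverse is given below. The slides do not say which variables are transformed or how the parameter is chosen, so the value of lam used here is purely illustrative.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transform: (y**lam - 1) / lam, or log(y) when lam is 0."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


# Illustrative round trip for a hypothetical yield of 120 with lam = 0.5.
z = box_cox(120.0, 0.5)
print(z, inverse_box_cox(z, 0.5))
```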
Previous Research on Model-Based Methods
• Stasny, Goel and Rumsey (1991) - early version of SG method tested on Kansas wheat production data
• Stasny et al. (1995) - improved version of SG tested on Ohio corn yield data
• Crouse (2000) - SG evaluated for Michigan corn and barley yield
• Griffith (2000) - Griffith method tested on Michigan corn yield data
• Bellow (2004) - SG and Griffith methods compared for North Dakota oats and barley yield (presented at FCSM Research Conference)
Ten-State Research Study
• Compare performance of Stasny-Goel, Griffith and ratio methods for various crops in ten geographically dispersed states
Post-Stratification Size Groups
• NASS statewide survey data post-stratified by county and farm size based on COA data (two or three size groups defined)
• Percentages of Census farm acres by size group used as weights for SG algorithm
• Equal total land in farms criterion used to form groups
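One possible reading of the "equal total land in farms" criterion is to sort farms by size and cut the sorted list where cumulative land reaches equal shares of the total. The sketch below implements that reading and then computes each group's percentage of acres, which is the kind of weight the SG algorithm is said to use. The exact NASS rule is not given in the slides, so the cut rule, the function names and the acreages are all assumptions.

```python
def assign_size_groups(farm_acres, n_groups=3):
    """Assign farms to size groups holding roughly equal total land.

    Farms are taken from smallest to largest; a farm's group is set by the
    share of total land accumulated before it (an assumed cut rule).
    """
    order = sorted(range(len(farm_acres)), key=lambda k: farm_acres[k])
    total = sum(farm_acres)
    groups = [0] * len(farm_acres)
    running = 0.0
    for k in order:
        groups[k] = min(int(n_groups * running / total), n_groups - 1)
        running += farm_acres[k]
    return groups


def group_weights(farm_acres, groups, n_groups):
    """Percentage of total acres falling in each size group."""
    total = sum(farm_acres)
    land = [0.0] * n_groups
    for acres, g in zip(farm_acres, groups):
        land[g] += acres
    return [100.0 * x / total for x in land]


# Hypothetical farms (acres) split into two size groups.
acres = [50, 100, 150, 300, 600]
groups = assign_size_groups(acres, n_groups=2)
print(groups)                            # [0, 0, 0, 0, 1]
print(group_weights(acres, groups, 2))   # [50.0, 50.0]
```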
Data Sources For Research Study
• 2001-03 County Estimates Survey
• 2001-02 official crop yield estimates (‘previous year’ data)
• 2002 Census of Agriculture (number of farms, land in farms)
Simulation Procedure
• Multiple regression performed on survey reported yield vs. official county yields, weighted average neighbor yields, size group membership variables
• Artificial population of 10,000 simulated survey data sets used to compute ‘true’ population parameter values
• 250 sample data sets selected at random from population
• Moran’s I computed to test whether simulated data sets reflect spatial correlation of real survey data
• SG, G and R methods applied to each of the 250 sampled data sets
• Average simulated parameter values compared with corresponding population values for each estimation method
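Moran's I, used in the procedure above as a check on spatial correlation, can be computed directly from its textbook definition. The sketch below assumes a binary border-sharing weight matrix, which the slides do not specify, and the county values are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for values observed on areas with spatial weight matrix weights.

    I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


# Hypothetical chain of four counties (each borders the next one).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1.0, 2.0, 3.0, 4.0], chain))  # smooth trend: positive
print(morans_i([1.0, 4.0, 1.0, 4.0], chain))  # alternating values: negative
```

Positive values indicate that neighboring counties tend to have similar yields, and negative values indicate that neighbors tend to differ.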
Measures of Estimator Performance
• Absolute Bias - average absolute difference between simulated yield estimates and true (population) yield
• Variance - sample variance of simulated yield estimates
• Mean Square Error - average squared deviation between simulated estimates and true yield (SG program also computes analytic MSE)
• Lower (Upper) Tail Proximity - average absolute difference between 5th (95th) percentile of simulated yield estimates and true yield
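The measures above can be sketched for a single county as follows. Two details are assumptions: the percentile interpolation rule (Python's "inclusive" method) and treating tail proximity as a single absolute difference per county, with any averaging done across counties afterward. The function name and data are hypothetical.

```python
import statistics


def performance_measures(estimates, true_yield):
    """Performance of one method in one county over the simulation runs."""
    n = len(estimates)
    errors = [e - true_yield for e in estimates]
    # Cut points at 5%, 10%, ..., 95%; the first and last are the tail percentiles.
    cuts = statistics.quantiles(estimates, n=20, method="inclusive")
    return {
        "abs_bias": sum(abs(e) for e in errors) / n,
        "variance": statistics.variance(estimates),  # sample variance
        "mse": sum(e * e for e in errors) / n,
        "ltp": abs(cuts[0] - true_yield),
        "utp": abs(cuts[-1] - true_yield),
    }


# Hypothetical simulated estimates for a county whose true yield is 100.
print(performance_measures([98.0, 100.0, 102.0, 101.0, 97.0], 100.0))
```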
Pairwise Estimator Comparison for Absolute Bias (* - better method)
Columns: Crop; Stasny-Goel vs. Ratio; Stasny-Goel vs. Griffith (percent of counties favoring each method)
Pairwise Estimator Comparison for Variance
Columns: Crop; Stasny-Goel vs. Ratio; Stasny-Goel vs. Griffith (percent of counties favoring each method)
Pairwise Estimator Comparison for MSE
Columns: Crop; Stasny-Goel vs. Ratio; Stasny-Goel vs. Griffith (percent of counties favoring each method)
Pairwise Estimator Comparison for LTP
Columns: Crop; Stasny-Goel vs. Ratio; Stasny-Goel vs. Griffith (percent of counties favoring each method)
Pairwise Estimator Comparison for UTP
Columns: Crop; Stasny-Goel vs. Ratio; Stasny-Goel vs. Griffith (percent of counties favoring each method)
• Wilcoxon Rank Sum Test - compare median absolute error (over simulation runs) of SG vs. R, SG vs. G for each county
• Wilcoxon Signed Rank Test - assess whether median error of SG, G, R is negative, positive or zero (two one-sided tests performed for each county)
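The rank sum comparison can be illustrated with a small self-contained sketch that pools the absolute errors of two methods, ranks them (using average ranks for ties) and standardizes the rank sum with the large-sample normal approximation. No tie correction is applied to the variance, and the slides do not say which variant or software was used, so this is an assumption-laden illustration with hypothetical data.

```python
import math


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(x, y):
    """Standardized Wilcoxon rank sum of x within the pooled sample.

    Normal approximation without tie correction; a negative value means
    x tends to be smaller than y.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean_w) / sd_w


# Hypothetical absolute errors for one county over a few simulation runs.
sg_abs_err = [1.0, 2.0, 3.0]
r_abs_err = [4.0, 5.0, 6.0]
print(rank_sum_z(sg_abs_err, r_abs_err))  # negative: SG errors tend to be smaller
```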
Results of Rank Sum Tests on Absolute Bias
Columns: Crop; Stasny-Goel vs. Ratio; Stasny-Goel vs. Griffith (percent of counties favoring each method)
Percent of Counties With Average Underestimate Less Than 10% of True Yield (* - best method)
Crop               Stasny-Goel   Griffith   Ratio
Barley             81*           62         46
Corn               83*           71         42
Cotton (upland)    79*           78         64.5
Dry Beans          95*           74         62.5
Oats               70.5*         54         21
Rye                41            52*        13
Sorghum            52*           41         11
Soybeans           84*           76         62
Sunflower          80*           63.5       50
Tobacco (burley)   93            98*        27
Wheat (spring)     94*           55         54
Wheat (winter)     86*           75         51.5
Convergence Issues
• SG algorithm not guaranteed to converge within fixed limit on number of iterations
• Non-convergence associated with numerical instability conditions
• Yield estimates produced for non-convergent runs may be suspect
• Convergence generally most reliable for highly prevalent crops, least reliable for rare crops
Algorithm Convergence Percentage By Crop (Limit of 5000 Iterations)
Crop               Stasny-Goel   Griffith
Barley             93            68
Corn               87            77
Cotton (upland)    81            89
Dry Beans          89            75
Oats               80            71
Rye                74            83
Sorghum            85            66
Soybeans           93            73
Sunflower          90.5          80
Tobacco (burley)   41            52
Wheat (spring)     63            52.5
Wheat (winter)     88            65
Two Approaches to Dealing With SG Non-Convergence
• SG(1) - use estimate generated at final allowable iteration (N0)
• SG(2) - keep track of which iteration (i*) maximized the log-likelihood
- if i* < N0, rerun algorithm to i* and use that estimate
- if i* = N0, resume processing at iteration (N0+1) and continue until either -
o convergence occurs (use that estimate) OR
o log-likelihood decreases from one iteration to next (use estimate at next-to-last iteration)
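The SG(2) rule can be sketched as a function that decides which iteration's estimate to keep, given the log-likelihood trace of a non-convergent run. The function name and the way the extra iterations are supplied are hypothetical. Ties for the maximum are resolved in favor of the earliest iteration and convergence is checked before a log-likelihood decrease. Both are assumptions the slides do not address.

```python
def sg2_choose_iteration(loglik, extra_steps=()):
    """Return the 1-based iteration whose estimate SG(2) would use.

    loglik: log-likelihood values for iterations 1..N0 of a non-convergent run.
    extra_steps: (log-likelihood, converged) pairs for iterations N0+1, N0+2, ...
                 consulted only when the maximum occurs at iteration N0.
    """
    n0 = len(loglik)
    i_star = max(range(n0), key=lambda k: loglik[k]) + 1
    if i_star < n0:
        return i_star  # rerun to i* and use that estimate
    previous = loglik[-1]
    iteration = n0
    for value, converged in extra_steps:
        iteration += 1
        if converged:
            return iteration      # convergence occurred
        if value < previous:
            return iteration - 1  # log-likelihood decreased: use next-to-last
        previous = value
    return iteration  # extra steps exhausted (a case the slides do not cover)


# SG(1) would simply use iteration N0 (here 3) in both hypothetical cases.
print(sg2_choose_iteration([-10.0, -5.0, -7.0]))                               # 2
print(sg2_choose_iteration([-10.0, -7.0, -5.0], [(-4.0, False), (-4.5, False)]))  # 4
```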
Non-Convergence Study
• Does SG(1) or SG(2) outperform ratio estimator in cases where SG failed to converge?
• Six cases with high non-convergence percentage selected for comparison of SG(1), SG(2) and R
- 2002 CO barley (37 simulation runs)
- 2002 MS soybeans (105)
- 2002 NY winter wheat (39)
- 2002 ND dry beans (38)
- 2002 OH oats (50)
- 2003 OK rye (59)
Combined Pairwise Estimator Comparison for Non-Convergence Test Cases
(percent of counties favoring each method; * - better method)
                 SG(1) vs. Ratio    SG(2) vs. Ratio    SG(1) vs. SG(2)
Measure          SG(1)    R         SG(2)    R         SG(1)    SG(2)
Absolute Bias    78*      22        80*      20        23       77*
Variance         95*      5         99*      1         0        100*
MSE              81*      19        83*      17        15       85*
LTP              74*      26        88*      12        13       87*
UTP              84*      16        90*      10        15       85*
Summary
• SG yield estimation method outperforms R in all efficiency categories and G in most categories (G outperforms R)
• Convergence problems can be alleviated using enhanced SG approach
• SG method recommended for integration into NASS County Estimates System