Atmospheric Downscaling using Genetic Programming

Tanja Zerenner (1), Victor Venema (1), Petra Friederichs (1), Clemens Simmer (1)
(1) Meteorological Institute, University of Bonn, Germany

1. Motivation

The Transregional Collaborative Research Centre 32 (TR 32) has developed an integrated modeling system, TerrSysMP, consisting of the atmospheric model COSMO, the land-surface model CLM, and the hydrological model ParFlow. These component models are usually operated at different resolutions in space and time. Up- and downscaling procedures are therefore required at the interfaces between the atmospheric and the land-surface/subsurface models.

Figure 1: Scale differences in TerrSysMP.

2. Method

We develop a mixed physical/statistical downscaling scheme from a training data set of high-resolution model runs via multiobjective symbolic regression using Genetic Programming (GP). Discretization, among other factors, induces uncertainty in models. Hence we do not try to reproduce the 'exact' high-resolution model output fields, but 'realistic' ones.

Symbolic Regression
Given a sample data set $\{X, Y\}$, the aim is to find a function that maps $X$ to $Y$. In symbolic regression the form of the regression function (linear, polynomial, ...) is not known in advance.

Genetic Programming
GP originates from machine learning. From a set of functions (arithmetic expressions, IF-statements, etc.) and terminals (constants or variables), GP generates potential solutions to a given problem while minimizing a fitness (cost, error) function.

Pareto Optimality
When dealing with multiple objectives, there is often no solution that is optimal in the absolute sense (i.e. in every objective). An n-tuple $x = \{x_1, x_2, \ldots, x_n\}$ is called a Pareto optimum of a set $A$ of n-tuples if there is no n-tuple $y = \{y_1, y_2, \ldots, y_n\}$ in $A$ with $y_i \ge x_i$ for all $i = 1, 2, \ldots, n$ and $y_i > x_i$ for at least one $i$ (for a maximization problem).

Figure 2: A Pareto front for a maximization problem with two objectives.
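The Pareto optimality definition above can be illustrated with a short Python sketch. It is not taken from the authors' Matlab/GPLAB code; the function names `dominates` and `pareto_optima` and the candidate tuples are hypothetical and chosen only for illustration. The sketch follows the maximization convention of the definition; for objectives that are minimized (such as an error measure), the inequalities would be reversed.

```python
from typing import List, Sequence, Tuple


def dominates(y: Sequence[float], x: Sequence[float]) -> bool:
    """Return True if y dominates x in a maximization problem.

    y dominates x if y_i >= x_i for all i and y_i > x_i for at least one i.
    """
    at_least_as_good = all(yi >= xi for yi, xi in zip(y, x))
    strictly_better = any(yi > xi for yi, xi in zip(y, x))
    return at_least_as_good and strictly_better


def pareto_optima(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return all tuples in `points` that no other tuple dominates."""
    return [x for x in points if not any(dominates(y, x) for y in points)]


# Hypothetical two-objective example (both objectives are maximized).
candidates = [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0), (2.0, 2.0), (1.0, 1.0), (3.0, 1.0)]
print(pareto_optima(candidates))
```

In this example the tuples (2, 2), (1, 1) and (3, 1) are each dominated by (3, 3), so only the first three candidates remain in the Pareto set. The pairwise comparison is quadratic in the number of tuples, which is unproblematic for sets of the size used here.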
Implementation
Our code is based on the GPLAB package for Matlab (Silva and Almeida, 2003). For multiobjective fitness assignment we have integrated the Strength Pareto Approach (SPEA) by Zitzler and Thiele (1999).

3. Temperature Downscaling

We illustrate our method using the problem of downscaling near-surface temperature during clear-sky nights. The downscaling scheme currently implemented in TerrSysMP (Schomburg et al., 2010) does not contain a regression step for this weather situation.

3.1 Set Up

Predictors

High-res. surface information     Coarse weather information
topography & 6 derived params.    near-surface temperature
plant cover                       near-surf. vert. temp. gradient
roughness length                  near-surf. turbulent kinetic energy
                                  near-surf. horizontal windspeed
                                  cloud cover at 3 heights

Training Data
COSMO model output at 400 m resolution for 27 timesteps and a domain size of 280 × 280 grid points, i.e. 40 × 40 grid points at the coarse (2.8 km) scale.

Objectives

Root Mean Square Error: RMSE

Mean Error of Standard Deviation: ME(STD)

$$\mathrm{ME(STD)} = \mathrm{MEAN}\left[ \mathrm{STD}_{7\times7}(T_d) - \mathrm{STD}_{7\times7}(T_t) \right]$$

with $T_d$ denoting the downscaled temperature and $T_t$ denoting the 'true' temperature. $\mathrm{STD}_{7\times7}$ denotes the (fine-scale) standard deviation within each coarse pixel, which comprises 7 × 7 fine-scale pixels.

Earth Mover's Distance: EMD
The EMD is computed for histograms of temperature values (bin width = 0.25 K) of full fields at single timesteps. It is a measure of the 'distance' between two histogram distributions. With $A_i$ and $B_i$ denoting the values of bin $i$ of the two histograms:

$$\mathrm{EMD}_0 = 0$$
$$\mathrm{EMD}_{i+1} = (A_i + \mathrm{EMD}_i) - B_i$$
$$\mathrm{EMD} = \sum_i |\mathrm{EMD}_i|$$

As objective we take the mean EMD over all training data fields, i.e. over all timesteps.

Figure 3: Sketch of concept of histogram differences.

GP settings

Parameter               Value
function set            +, -, *, protected /, if
generations             200
population size         100
max. Pareto set size    50
genetic operators       mutation, crossover

3.2 Results

Figure 4: Example for downscaling a near-surface temperature field using one GP solution. Shown is a nightly temperature field of 112 × 112 km within North Rhine-Westphalia in Germany [50.56°-51.03° lat, 6.06°-6.83° lon]. Panels: (a) coarse, (b) interpolated, (c) interpolated + downscaled, (d) high-resolution ('true'). Color scale: temperature [K], 278 to 292.

Figure 5: Scatterplots for GP solution from Fig. 4: GP output vs. reference ('true') values. Shown are 2500 randomly chosen points. Panels: temperature anomaly, true vs. predicted [K]; STD of temperature, true vs. predicted [K].

Figure 6: Cross section for GP solution from Fig. 4.

Figure 7: Values of the objectives for the 50 solutions of the final Pareto set. Panels: RMSE [K] vs. ME(STD) [K]; RMSE [K] vs. EMD; ME(STD) [K] vs. EMD.

Table 1: Values of the objectives for GP solution from Fig. 4.

               interp.    downsc.
RMSE [K]       0.70       0.84
ME(STD) [K]    0.58       0.22
EMD            1.75       0.34

4. Conclusion

Our preliminary results show that realistic fine-scale structures can be retrieved from the coarse-scale input, which constitutes a major advancement compared to the usually applied interpolation methods. For the example in Table 1, downscaling reduces ME(STD) from 0.58 K to 0.22 K and the EMD from 1.75 to 0.34 compared to interpolation alone, while the RMSE increases from 0.70 K to 0.84 K.

5. Outlook

- Expansion of training and validation data sets.
- Find and test further objectives (for better quantification of spatio-temporal characteristics of fine-scale fields).
- Downscale remaining atmospheric variables required.
- Implement the downscaling in TerrSysMP.
- Downscaling ensemble.
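As a supplement to the EMD objective defined in Section 3.1, the recursion can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab/GPLAB implementation: the function names `emd_1d` and `mean_emd` are hypothetical, the two histograms are assumed to be given as equal-length sequences of bin values over the same bins (0.25 K wide in the poster), and whether the bin values are raw counts or normalized frequencies is left open here.

```python
from typing import Sequence


def emd_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Earth Mover's Distance between two histograms over identical bins.

    Implements EMD_0 = 0, EMD_{i+1} = (A_i + EMD_i) - B_i and
    EMD = sum_i |EMD_i|, with the result in units of bin value times bins.
    """
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    carry = 0.0  # EMD_0: nothing has been moved before the first bin
    total = 0.0
    for a_i, b_i in zip(a, b):
        carry = (a_i + carry) - b_i  # surplus that must move on to the next bin
        total += abs(carry)
    return total


def mean_emd(hists_a: Sequence[Sequence[float]],
             hists_b: Sequence[Sequence[float]]) -> float:
    """Mean EMD over paired histograms, e.g. one pair per timestep."""
    if len(hists_a) != len(hists_b) or not hists_a:
        raise ValueError("need the same non-zero number of histograms")
    distances = [emd_1d(a, b) for a, b in zip(hists_a, hists_b)]
    return sum(distances) / len(distances)


# Hypothetical example: two units of mass shifted by exactly one bin.
print(emd_1d([0, 2, 0], [0, 0, 2]))
```

The running value `carry` is the amount of histogram mass that still has to be shifted to later bins, so the sum of its absolute values measures how much mass is moved and over how many bins.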
Acknowledgements

We gratefully acknowledge the financial support from Transregio 32 'Patterns in Soil-Vegetation-Atmosphere Systems' funded by the 'Deutsche Forschungsgemeinschaft' (DFG). Furthermore, we would like to thank Annika Schomburg for providing training data and COSMO model support.

References

Koza, John R. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.
Schomburg, Annika, et al. "A downscaling scheme for atmospheric variables to drive soil-vegetation-atmosphere transfer models." Tellus B 62.4 (2010): 242-258.
Silva, Sara, and Jonas Almeida. "GPLAB - a genetic programming toolbox for MATLAB." Proceedings of the Nordic MATLAB Conference. 2003.
Zitzler, Eckart, and Lothar Thiele. "Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach." IEEE Transactions on Evolutionary Computation 3.4 (1999): 257-271.

contact: [email protected]