

5.2 Algorithmic tuning of spread-skill relationship in ensemble forecasting systems

This work considers EPSs and their so-called ensemble spread parameters, which are responsible for the prediction skill of the system. Manual adjustment of the ensemble spread parameters is challenging, and thus there is a call for algorithmic tuning.

An important part of any optimization problem is a proper choice of the cost function. Here, the cost function is based on the filter likelihood of the ensemble spread parameters Θ. It is defined as twice the negative log-likelihood of the form (3.17):

L(Θ) = \sum_{k=1}^{n} [ (y_k − y_k^p)^T (C_{y_k})^{−1} (y_k − y_k^p) + log |C_{y_k}| ],   (5.3)

where the sum is over n consecutive assimilation windows, y_k^p are the model mean predictions projected onto the observation space, y_k are the actual observations, and C_{y_k} is the uncertainty of these one-step-ahead predictions. Both the mean prediction y_k^p and its covariance C_{y_k} are functions of the spread parameters Θ. Note that the KF likelihood can be used even if no KF data assimilation has been performed.

There are several reasons for using a filter likelihood cost function for this problem. First of all, the filter likelihood represents a compromise between the prediction error y_k − y_k^p and the ensemble spread C_{y_k}. It is also a rather natural choice here, since it can be calculated from the ensemble of simulations. Moreover, if the ensemble size is sufficient in comparison to the number of observations in the assimilation window, the analytical covariances can be replaced by those calculated from the ensemble. Note also that, in practice, the prediction covariance C_{y_k} can be approximated by the diagonal matrix of the corresponding prediction variances, which simplifies (5.3) to

L(Θ) = \sum_{k=1}^{n} \sum_{l=1}^{L} [ (y_{k,l} − y_{k,l}^p)^2 / (σ²_{y_{k,l}} + σ^{p2}_{k,l}) + log(σ²_{y_{k,l}} + σ^{p2}_{k,l}) ],   (5.4)

where y_{k,l}^p and σ^{p2}_{k,l} are the prediction mean and its variance for observation number l within assimilation window k, and y_{k,l} and σ²_{y_{k,l}} are the actual observation and its error variance. Note that the sum runs over both assimilation windows and observations.
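As a concrete illustration, the diagonal cost (5.4) is straightforward to evaluate once the ensemble output has been collected. The sketch below assumes the prediction means and variances are already arranged into arrays; the function and argument names are illustrative, not part of the systems described here.

```python
import numpy as np

def filter_likelihood_cost(y, y_pred, var_obs, var_pred):
    """Diagonal filter-likelihood cost of eq. (5.4).

    y, y_pred : (n, L) arrays of observations y_{k,l} and ensemble-mean
                predictions y^p_{k,l} over n windows with L observations each.
    var_obs   : (n, L) observation error variances sigma^2_{y_{k,l}}.
    var_pred  : (n, L) prediction variances sigma^{p2}_{k,l}, e.g. the
                ensemble variances of the predictions.
    """
    total_var = var_obs + var_pred          # sigma^2_y + sigma^{p2}
    residual2 = (y - y_pred) ** 2
    # Sum over both assimilation windows k and observations l.
    return float(np.sum(residual2 / total_var + np.log(total_var)))
```

When the covariances are estimated from the ensemble, `var_pred` is simply the per-observation variance across the members.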

The optimization target of the estimation process is the assessment of the spread-skill relationship of certain ensemble spread parameters Θ of an EPS with an ensemble of size N. This relationship can, in general, be determined from an adequately long sequence of consecutive ensemble runs S_M(Θ), where the subscript M denotes the length of the sequence. Additionally, we assume that there is a likelihood function L(S_M(Θ)) that can be computed from these sequences, and that this function is sensitive to the ensemble spread parameters Θ of the EPS, so that a sequence with higher predictive skill leads to a higher likelihood. Now, if one were asked to compare the spread-skill properties of

ensemble spread parameters within a set of K different combinations of such parameters (Θ_1, Θ_2, ..., Θ_K), this would require K × M × N forecast evaluations, where K is the number of parameter combinations to test, M is the number of consecutive assimilation windows used to calculate the filter likelihood cost function, and N is the ensemble size of the EPS. Naturally, an ensemble of ensembles is launched to perform this task. This setup is visualized in Figure 5.2 (the picture is taken from Ekblom et al. (2019)), and more details can be found in Ekblom et al. (2019). It is easy to see that this process is computationally demanding, which calls for optimizers that converge as fast as possible. Moreover, the optimizer should be able to deal with stochastic cost functions and fit naturally into the ensemble environment. These requirements motivated the use of DE to solve this problem.
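To make the role of DE concrete, below is a minimal sketch of one DE/rand/1/bin generation over a population of K candidate spread-parameter vectors. In the setting above, each call to `cost_fn` would hide a full sequence of M assimilation windows run with an N-member ensemble; here the control parameters F and CR and the greedy selection rule are standard DE choices, not the specific configuration of Ekblom et al. (2019).

```python
import numpy as np

rng = np.random.default_rng(0)

def de_step(pop, costs, cost_fn, F=0.8, CR=0.9):
    """One DE/rand/1/bin generation.

    pop     : (K, d) population of candidate parameter vectors Theta_1..Theta_K.
    costs   : (K,) current cost values L_1..L_K of the population members.
    cost_fn : maps a parameter vector to its (possibly stochastic) cost.
    """
    K, d = pop.shape
    for i in range(K):
        # Mutation: combine three distinct members other than i.
        a, b, c = rng.choice([j for j in range(K) if j != i], size=3, replace=False)
        mutant = pop[a] + F * (pop[b] - pop[c])
        # Binomial crossover, forcing at least one mutant component through.
        mask = rng.random(d) < CR
        mask[rng.integers(d)] = True
        trial = np.where(mask, mutant, pop[i])
        # Greedy selection on the (stochastic) cost.
        c_trial = cost_fn(trial)
        if c_trial <= costs[i]:
            pop[i] = trial
            costs[i] = c_trial
    return pop, costs
```

Each generation thus costs K evaluations of the sequence likelihood, which is exactly where the K × M × N forecast budget appears.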

[Figure: K sequences of ensemble runs, each of length M with N ensemble members, producing likelihood values L_1, ..., L_K.]

Figure 5.2: The assessment of spread-skill properties of ensemble spread parameters. For each sequence S_M(Θ_k) the likelihood value L_k is evaluated.

In Ekblom et al. (2019) we proceed with experiments where the Lorenz95 system (4.11) and the Wilks stochastic modification of this system (see Wilks (2005)) are used for an idealized ensemble prediction. The former formulation is used to generate synthetic data, while the latter is utilized as a forward model for the ensemble spread parameter estimation. The ensemble spread parameter vector Θ in this case consists of three values coming from the Wilks formulation: two parameters that impact the spread skill of the EPS, and an initial value perturbation scale factor. The parameter estimation process was preceded by a sensitivity analysis with respect to each of the ensemble spread parameters. Then, the estimation process, using DE as a stochastic optimizer and a filter likelihood type cost function, was conducted in various set-ups. The results of the estimation process were justified by a number of validation techniques. This confirmed that algorithmic tuning of the ensemble spread parameters is possible in idealized systems. However, more realistic set-ups may require a more rigorous choice of the estimation process set-up and DE parameters.
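For reference, the deterministic Lorenz95 core used to generate the synthetic data can be sketched as follows. This is the standard formulation with cyclic boundaries and a fourth-order Runge-Kutta step; the Wilks stochastic parameterization terms and their coefficients are deliberately omitted, so this is not the exact forward model of the experiments.

```python
import numpy as np

def lorenz95_rhs(x, F=8.0):
    """Lorenz95 tendencies dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F,
    with cyclic indexing handled by np.roll."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def step_rk4(x, dt=0.05, F=8.0):
    """One fourth-order Runge-Kutta step of the Lorenz95 system."""
    k1 = lorenz95_rhs(x, F)
    k2 = lorenz95_rhs(x + 0.5 * dt * k1, F)
    k3 = lorenz95_rhs(x + 0.5 * dt * k2, F)
    k4 = lorenz95_rhs(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```

An initial ensemble would then be formed by perturbing an analysis state with noise scaled by the initial value perturbation scale factor mentioned above.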

Here, we want to emphasize the differences between the work done in Shemyakin and Haario (2018) and in Ekblom et al. (2019), which involve similar-looking experiments. Both works are aimed at a parameter estimation process using the Lorenz95 system as an idealized test case and DE as an optimizer, but the overall intentions are significantly different. For instance, in Shemyakin and Haario (2018) the main problem was to estimate the closure


parameters of the chaotic model and to show that DE, in general, can be applied in a stochastic environment. In Ekblom et al. (2019), however, the results are used for the estimation of ensemble spread parameters. This is done using a filter likelihood cost function instead of the usual least squares cost function used in Shemyakin and Haario (2018).


6 Summary and conclusion

The motive for the present work was to explore alternative approaches to problems with restrictively high computational demands but where parallel, ensemble simulations are available. In weather prediction, the EPS (Ensemble Prediction System) provides such an example. The EPPES (Ensemble Prediction and Parameter Estimation System) employs EPS simulations by turning them into an algorithmic way to estimate closure parameters of chaotic models. The EPPES, as an extension to the EPS, addresses the problem with the aid of sequential updates of the hyper-parameters of certain statistical distributions, which allows the estimation of both the parameters and their uncertainty. The approach is heuristic but has produced promising results for improving operational NWP (numerical weather prediction) model parameters, a task so far done by manual tuning.

However, the initial applications of the EPPES have revealed certain issues, such as potentially slow convergence, a lack of methods to perform multi-criteria optimization, and a failure to track possible seasonality in closure parameters.

Differential evolution (DE) belongs to the family of evolutionary or genetic algorithms, originally proposed to deal with a broad range of deterministic optimization problems. It was selected here to optimize the kind of stochastic cost functions that emerge from the ensemble simulations used in the EPS. First, the ability of a new algorithm, called DE-EPPES, was tested for single-criterion cost function optimization, especially with respect to convergence speed in the case of poor prior knowledge of the estimated parameters. The development of DE as a tool for stochastic cost function optimization was continued next, with the focus on multi-objective optimization. A total cost function based on importance weights was constructed, either by arithmetic or geometric means of individual cost functions. The calculation of the importance weights requires a scaling.

This poses issues for the convergence of DE, and certain modifications to the algorithm were needed. This led to the introduction of a recalculation step to keep the information on the quality of the parameters up to date. It was again confirmed, now for multivariate criteria, that DE-EPPES was superior to the EPPES in terms of convergence speed from poor prior data. Additionally, the ability of DE to estimate and follow seasonally varying parameters was verified. However, the original EPPES algorithm is more stable when identifying a Gaussian approximation of the posterior distribution of the parameters in cases where accurate prior data is provided. We may summarize here an analogy to usual deterministic likelihood estimation: first optimize the parameter estimates using DE, then apply the EPPES as a stochastic sampler to provide approximate uncertainty quantification.
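A minimal sketch of the weighted combination idea: individual costs are first scaled onto a common range and then merged by a weighted arithmetic or geometric mean. The scaling scheme and the names below are illustrative, not the exact construction used in the DE-EPPES work.

```python
import numpy as np

def total_cost(costs, weights, scales, mode="arithmetic"):
    """Combine m individual cost functions into one total cost.

    costs   : (m,) values of the individual cost functions.
    weights : (m,) importance weights, assumed to sum to one.
    scales  : (m,) scaling factors putting the costs on a common range
              (e.g. running estimates of their typical magnitudes).
    """
    c = np.asarray(costs, dtype=float) / np.asarray(scales, dtype=float)
    w = np.asarray(weights, dtype=float)
    if mode == "arithmetic":
        return float(np.sum(w * c))
    # Geometric mean: exponential of the weighted mean of the logs.
    return float(np.exp(np.sum(w * np.log(c))))
```

Because the scales drift as the population evolves, stored cost values go stale, which is precisely what the recalculation step mentioned above is meant to counteract.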

In the works discussed above, the parameter estimation of chaotic systems was essentially conducted by avoiding the chaoticity: the dynamics of the systems were split into short intervals within which the system behaves predictably. Then, the parameters are estimated by sequentially advancing through each of these intervals. Such an approach requires a sufficient number of observations; in addition, initial values for the state vector must be provided via some data assimilation process for each assimilation interval. However, these requirements might not always be fulfilled. Recently, an approach was developed that allows the identification of parameter posteriors of chaotic systems

directly from sparse time series data, without knowledge of the initial values. The algorithm employs ideas from fractal dimension theory to calculate a Gaussian likelihood for sampling the parameters using adaptive MCMC methods. However, in order to find a MAP (maximum a posteriori) estimate of this likelihood, a suitable optimizer has to be used. Due to chaoticity, the cost function is stochastic, so our modified DE algorithm applies again. Moreover, it was noticed here that DE was able to produce not only a MAP estimate, but also a reliable initial proposal distribution for the adaptive MCMC runs. Furthermore, in certain test cases the actual posterior distribution achieved by MCMC almost coincided with the initial proposal distribution provided by DE.
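The hand-over from DE to MCMC can be sketched as follows: the converged DE population itself supplies a Gaussian proposal, with the population mean acting as the point estimate and the population covariance as the proposal covariance. This is an illustrative reading of the idea, not the exact construction of the cited work.

```python
import numpy as np

def proposal_from_population(pop):
    """Initial Gaussian proposal for an adaptive MCMC run, built from a
    converged DE population: mean from the members, covariance from the
    remaining population spread."""
    mean = pop.mean(axis=0)
    cov = np.cov(pop, rowvar=False)  # sample covariance over members
    return mean, cov
```

An adaptive Metropolis sampler (e.g. in the spirit of Haario et al. (2001)) would then start from `mean` with proposal covariance `cov` and adapt from there.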

The objective of the previously mentioned problems was, in general, to estimate closure parameters of certain models, i.e., parameters which impact the dynamics of the systems. However, we also collaborated in estimating the ensemble spread within the EPS framework itself. A proper tuning of the ensemble spread is still an open problem in meteorology. It was shown that algorithmic tuning of spread parameters can be achieved by utilizing a filter likelihood type cost function and by using DE as a stochastic optimizer. A number of successful experiments were conducted with the Lorenz95 system and its stochastic modification by Wilks. The results of these experiments were successfully verified by various validation techniques.

Besides the aforementioned works, there are a number of ongoing studies where the use of DE for stochastic cost function optimization is being tested. This dissertation was finalized during a period of time when epidemiology is of particular concern for society in general and is becoming a field of unusually intensive modelling research. Even before the crisis, we had identified this area as a potential application field: detailed individual-based Monte Carlo simulation models lead to computationally demanding stochastic cost functions that naturally require the use of effective stochastic optimization methods.


References

Andersson, E., Fisher, M., Munro, R., and McNally, A. (2000). Diagnosis of background errors for radiances and other observable quantities in a variational data assimilation scheme, and the explanation of a case of poor convergence. Quarterly Journal of the Royal Meteorological Society. ISSN 00359009, doi:10.1256/smsqj.56511.

Auvinen, H., Bardsley, J.M., Haario, H., and Kauranne, T. (2009). Large-scale Kalman filtering using the limited memory BFGS method. Electronic Transactions on Numerical Analysis. ISSN 10689613.

Auvinen, H., Bardsley, J.M., Haario, H., and Kauranne, T. (2010). The variational Kalman filter and an efficient implementation using limited memory BFGS. International Journal for Numerical Methods in Fluids. ISSN 02712091, doi:10.1002/fld.2153.

Bannister, R.N. (2001). Elementary 4D-Var. DARC Technical Report No. 2.

Bengtsson, L., Ghil, M., and Källén, E. (1981). Dynamic Meteorology: Data Assimilation Methods, vol. 36.

Bibov, A., Haario, H., and Solonen, A. (2015). Stabilized BFGS approximate Kalman filter. Inverse Problems and Imaging. ISSN 19308345, doi:10.3934/ipi.2015.9.1003.

Bibov, A. (2017). Low-memory filtering for large-scale data assimilation. Ph.D. thesis. Lappeenranta University of Technology. ISBN 978-952-335-076-2.

Bouttier, F. and Courtier, P. (2002). Data assimilation concepts and methods. Meteorological training course lecture series. ECMWF. ISSN 0094-8276, doi:10.1029/2007GL030733.

Buizza, R. (2000). Chaos and weather prediction. ECMWF Training Courses.

Bujok, P. and Tvrdík, J. (2015). Adaptive differential evolution: SHADE with competing crossover strategies. In: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science). ISBN 9783319193236, ISSN 03029743.

Byrd, R.H., Lu, P., Nocedal, J., and Zhu, C. (1995). A Limited Memory Algorithm for Bound Constrained Optimization. SIAM Journal on Scientific Computing. ISSN 1064-8275, doi:10.1137/0916069.

Byrd, R.H., Nocedal, J., and Schnabel, R.B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming. ISSN 00255610, doi:10.1007/BF01582063.

Campagnoli, P., et al. (2009). Dynamic linear models. In: Dynamic Linear Models with R.

Cardinali, C. (2013). Data Assimilation: Observation Impact on the Short Range Forecast. ECMWF Training Courses.

Chakraborty, U.K. (2008). Advances in Differential Evolution, vol. 143. Springer-Verlag. ISBN 978-3-540-68827-3, 338 p.

Das, S. and Suganthan, P.N. (2011). Differential evolution: A survey of the state-of-the-art. IEEE Transactions on Evolutionary Computation. ISSN 1089778X, doi:10.1109/TEVC.2010.2059031.

Del Moral, P. (1998). Measure-valued processes and interacting particle systems. Application to nonlinear filtering problems. Annals of Applied Probability. ISSN 10505164, doi:10.1214/aoap/1028903535.

Del Moral, P. (1997). Nonlinear filtering: Interacting particle resolution. Comptes Rendus de l'Académie des Sciences - Series I - Mathematics. ISSN 07644442, doi:10.1016/S0764-4442(97)84778-7.

Del Moral, P. (2005). Feynman–Kac Formulae: Genealogical and Interacting Particle Systems with Applications. ISBN 0387202684.

Dennis, Jr., J.E. and Moré, J.J. (1977). Quasi-Newton Methods, Motivation and Theory. SIAM Review. ISSN 0036-1445, doi:10.1137/1019005.

Dennis, Jr., J.E. and Schnabel, R.B. (1979). Least Change Secant Updates for Quasi-Newton Methods. SIAM Review. ISSN 0036-1445, doi:10.1137/1021091.

Dowd, M. (2011). Estimating parameters for a stochastic dynamic marine ecological system. Environmetrics. ISSN 11804009, doi:10.1002/env.1083.

Durbin, J. and Koopman, S.J. (2012). Time Series Analysis by State Space Methods: Second Edition. doi:10.1017/CBO9781107415324.004.

Ekblom, M., et al. (2019). Algorithmic tuning of spread–skill relationship in ensemble forecasting systems. Quarterly Journal of the Royal Meteorological Society. ISSN 0035-9009, doi:10.1002/qj.3695.

Evensen, G. (2009). The ensemble Kalman filter for combined state and parameter estimation. IEEE Control Systems Magazine. ISSN 0272-1708, doi:10.1109/MCS.2009.932223.

Evensen, G. (1992). Using the extended Kalman filter with a multilayer quasi-geostrophic ocean model. Journal of Geophysical Research: Oceans. ISSN 01480227, doi:10.1029/92JC01972.

Evensen, G. (2004). Sampling strategies and square root analysis schemes for the EnKF. Ocean Dynamics. ISSN 16167341, doi:10.1007/s10236-004-0099-2.

Feoktistov, V. (2006). Differential Evolution: In Search of Solutions. Springer Science. ISBN 9780387368955.


Fisher, M. and Andersson, E. (2001). Developments in 4D-Var and Kalman filtering. ECMWF Tech. Memo. 347, 38 pp.

Gauthier, P., Courtier, P., and Moll, P. (1993). Assimilation of Simulated Wind Lidar Data with a Kalman Filter. doi:10.1175/1520-0493(1993)121<1803:AOSWLD>2.0.CO;2, ISSN 0027-0644.

Gauthier, P., et al. (2007). Extension of 3DVAR to 4DVAR: Implementation of 4DVAR at the Meteorological Service of Canada. Monthly Weather Review. ISSN 0027-0644, doi:10.1175/MWR3394.1.

Haario, H., Kalachev, L., and Hakkarainen, J. (2015). Generalized correlation integral vectors: A distance concept for chaotic dynamical systems. Chaos. ISSN 10541500, doi:10.1063/1.4921939.

Haario, H., Laine, M., Mira, A., and Saksman, E. (2006). DRAM: Efficient adaptive MCMC. Statistics and Computing. ISSN 09603174, doi:10.1007/s11222-006-9438-0.

Haario, H., Saksman, E., and Tamminen, J. (2001). An adaptive Metropolis algorithm. Bernoulli. ISSN 13507265, doi:10.2307/3318737.

Hakkarainen, J., et al. (2012). On closure parameter estimation in chaotic systems. Nonlinear Processes in Geophysics. ISSN 10235809, doi:10.5194/npg-19-127-2012.

Hamrud, M., Bonavita, M., and Isaksen, L. (2015). EnKF and Hybrid Gain Ensemble Data Assimilation. Part I: EnKF Implementation. Monthly Weather Review. ISSN 0027-0644, doi:10.1175/MWR-D-14-00333.1.

Heimbach, P., Hill, C., and Giering, R. (2005). An efficient exact adjoint of the parallel MIT General Circulation Model, generated via automatic differentiation. In: Future Generation Computer Systems. ISSN 0167739X.

Houtekamer, P. and Mitchell, H. (2001). A Sequential Ensemble Kalman Filter for Atmospheric Data Assimilation. American Meteorological Society. ISSN 0027-0644, doi:10.1175/1520-0493(2001)129<0123:ASEKFF>2.0.CO;2.

Järvinen, H., et al. (2010). Estimation of ECHAM5 climate model closure parameters with adaptive MCMC. Atmospheric Chemistry and Physics. ISSN 16807316, doi:10.5194/acp-10-9993-2010.

Järvinen, H., Laine, M., Solonen, A., and Haario, H. (2012). Ensemble prediction and parameter estimation system: The concept. Quarterly Journal of the Royal Meteorological Society, 138(663), pp. 281–288. ISSN 00359009, doi:10.1002/qj.923.

Julier, S.J. and Uhlmann, J.K. (2004). Unscented filtering and nonlinear estimation. In: Proceedings of the IEEE. ISBN 9780470747049, ISSN 00189219.

Kalman, R.E. (1960). A New Approach to Linear Filtering and Prediction Problems. Journal of Basic Engineering. ISSN 00219223, doi:10.1115/1.3662552.

Kalman, R.E. and Bucy, R.S. (1961). New Results in Linear Filtering and Prediction Theory. Journal of Basic Engineering. ISSN 00219223, doi:10.1115/1.3658902.

Laine, M., Solonen, A., Haario, H., and Järvinen, H. (2012). Ensemble prediction and parameter estimation system: The method. Quarterly Journal of the Royal Meteorological Society, 138(663), pp. 289–297. ISSN 00359009, doi:10.1002/qj.922.

van Leeuwen, P.J. (2009). Particle Filtering in Geophysical Systems. Monthly Weather Review. ISSN 0027-0644, doi:10.1175/2009MWR2835.1.

van Leeuwen, P.J. (2011). Efficient nonlinear data-assimilation in geophysical fluid dynamics. Computers and Fluids. ISSN 00457930, doi:10.1016/j.compfluid.2010.11.011.

van Leeuwen, P.J. and Evensen, G. (1996). Data Assimilation and Inverse Methods in Terms of a Probabilistic Formulation. Monthly Weather Review. ISSN 0027-0644, doi:10.1175/1520-0493(1996)124<2898:DAAIMI>2.0.CO;2.

van Leeuwen, P.J. (2017). Particle Filters for nonlinear data assimilation in high-dimensional systems. Annales de la faculté des sciences de Toulouse.

Lewis, J.M. and Derber, J.C. (1985). The use of adjoint equations to solve a variational adjustment problem with advective constraints. Tellus A. ISSN 16000870, doi:10.1111/j.1600-0870.1985.tb00430.x.

Liu, D.C. and Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming. ISSN 00255610, doi:10.1007/BF01589116.

Lorenc, A.C. (1986). Analysis methods for numerical weather prediction. Quarterly Journal of the Royal Meteorological Society. ISSN 1477870X, doi:10.1002/qj.49711247414.

Lorenc, A.C. and Rawlins, F. (2006). Why does 4D-Var beat 3D-Var? Quarterly Journal of the Royal Meteorological Society. ISSN 00359009, doi:10.1256/qj.05.85.

Lorenz, E.N. (1996). Predictability: a problem partly solved. Predictability of Weather and Climate, pp. 40–58. doi:10.1017/CBO9780511617652.004.

Maddison, J.R. and Farrell, P.E. (2014). Rapid development and adjoining of transient finite element models. Computer Methods in Applied Mechanics and Engineering. ISSN 00457825, doi:10.1016/j.cma.2014.03.010.

McElhoe, B.A. (1966). An assessment of the navigation and course corrections for a manned flyby of Mars or Venus. IEEE Transactions on Aerospace and Electronic Systems. ISSN 00189251, doi:10.1109/TAES.1966.4501892.

Nocedal, J. (1980). Updating quasi-Newton matrices with limited storage. Mathematics of Computation. ISSN 00255718, doi:10.2307/2006193.


Ollinaho, P., et al. (2013a). Parameter variations in prediction skill optimization at ECMWF. Nonlinear Processes in Geophysics, 20(6), pp. 1001–1010. ISSN 10235809, doi:10.5194/npg-20-1001-2013.

Ollinaho, P., et al. (2014). Optimization of NWP model closure parameters using total energy norm of forecast error as a target. Geoscientific Model Development, 7(5), pp. 1889–1900. ISSN 19919603, doi:10.5194/gmd-7-1889-2014.

Ollinaho, P., et al. (2013b). NWP model forecast skill optimization via closure parameter variations. Quarterly Journal of the Royal Meteorological Society, 139(675), pp. 1520–1532. ISSN 00359009, doi:10.1002/qj.2044.

Ott, E., et al. (2004). A local ensemble Kalman filter for atmospheric data assimilation. Tellus, Series A: Dynamic Meteorology and Oceanography. ISSN 02806495, doi:10.1111/j.1600-0870.2004.00076.x.

Phillips, N.A. (1971). Numerical weather prediction. Eos, Transactions American Geophysical Union. ISSN 23249250, doi:10.1029/EO052i006pIU420.

Price, K., Storn, R.M., and Lampinen, J.A. (2005). Differential Evolution: A Practical Approach to Global Optimization (Natural Computing Series). Springer. ISSN 1465-7333.

Qing, A. (2009). Differential Evolution: Fundamentals and Applications in Electrical Engineering. Wiley. ISBN 9780470823927, 418 p.

Schweppe, F. (1965). Evaluation of likelihood functions for Gaussian signals. IEEE Transactions on Information Theory, 11(1), pp. 61–70. doi:10.1109/TIT.1965.1053737.

Service, A.E. (1998). Data Assimilation Using an Ensemble Kalman Filter Technique. American Meteorological Society. ISSN 0027-0644, doi:10.1175/1520-0493(1999)127<1374:CODAUA>2.0.CO;2.

Shemyakin, V. and Haario, H. (2017). Tuning Parameters of Ensemble Prediction System and Optimization with Differential Evolution Approach.

Shemyakin, V. and Haario, H. (2018). Online identification of large-scale chaotic system. Nonlinear Dynamics. ISSN 1573269X, doi:10.1007/s11071-018-4239-5.

Smith, G.L., Schmidt, S.F., and McGee, L.A. (1962). Application of Statistical Filter Theory to the Optimal Estimation of Position and Velocity on Board a Circumlunar Vehicle. Technical report. ISBN 978-84-7666-231-1.

Solonen, A., et al. (2012). Variational ensemble Kalman filtering using limited memory BFGS. Electronic Transactions on Numerical Analysis. ISSN 10689613, doi:10.1002/fld.

Springer, S., et al. (2019). Robust parameter estimation of chaotic systems. Inverse Problems and Imaging. ISSN 1930-8337, doi:10.3934/ipi.2019053.

Storn, R. and Price, K. (1997). Differential Evolution – A Simple and Efficient Heuristic for global Optimization over Continuous Spaces. Journal of Global Optimization, 11(4), pp. 341–359. ISSN 1573-2916, doi:10.1023/A:1008202821328.

Strawderman, R.L. and Higham, N.J. (1999). Accuracy and Stability of Numerical Algorithms. Journal of the American Statistical Association. ISSN 01621459, doi:10.2307/2669725.

Talagrand, O. (1997). Assimilation of observations, an introduction. Journal of the Meteorological Society of Japan. ISSN 00261165, doi:10.1256/qj.02.132.

Tanabe, R. and Fukunaga, A. (2013). Success-history based parameter adaptation for differential evolution. In: 2013 IEEE Congress on Evolutionary Computation.