

process, after the vicinity of the parameters has been established. Thus, the composition of DE and EPPES can be thought of as a stochastic analogue of the typical deterministic parameter estimation process, where the most probable value is determined by a deterministic optimizer and the actual distribution is then estimated by MCMC.

4.5 Simultaneous data assimilation and parameter estimation

Up to this point we have considered either data assimilation problems where the dynamical model has well-tuned parameters, or the problem of parameter estimation of dynamical models assuming that the data for the initial values comes from some assimilation process, i.e. the state estimation is already done. Now we want to carry out both processes simultaneously, with the data assimilation process being constantly improved by a better model parametrization.

Thus, we present a possible direction for simultaneous data assimilation and parameter estimation by combining some of the methods discussed earlier. The results presented here should be considered as a proof of concept, and the selection of the particular methods is justified mostly by the availability of implementations of the corresponding algorithms.

In order to conduct an experiment for simultaneous data assimilation and parameter estimation, synthetic observations have to be prepared. Here, we use the same Lorenz-95 system (4.11) as before. This time, however, in order to simulate real-life applications where the full state of the system is unobservable, only a fraction of the whole state of the system is directly observed. In terms of data assimilation, this is achieved by providing a suitable observation operator. In our case we observe 60% of the whole state of dimension 360: 24 out of 40 slow and 192 out of 320 fast variables. The observation operator is thus represented by a 360 × 360 diagonal matrix with zeros and ones on the diagonal. The synthetic data is generated with the same parameter values as in Section 4.4. The actual parametrization for the estimation process is θ = (θ_1, θ_2), where F_{2n−1} = θ_1 and F_{2n} = θ_2, with the true value θ̂ = (8.5, 8.5). Observation error with a standard deviation of 0.1 is added to the generated data, and observations are initially available every 0.01 time units. Such dense measurements allow us to use the same data set to check how the assimilation window size impacts the whole estimation process.
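The diagonal observation operator described above is straightforward to construct. The sketch below (Python with NumPy) is only an illustration: it assumes the state vector stores the 40 slow variables first and the 320 fast variables after them, and that the observed components are chosen at random, neither of which is stated in the text.

```python
import numpy as np

def observation_operator(n_slow=40, n_fast=320, slow_obs=24, fast_obs=192, seed=0):
    """Diagonal 0/1 observation operator for the 360-dimensional state.
    Slow-then-fast ordering and random selection are assumptions."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(n_slow + n_fast)
    mask[rng.choice(n_slow, slow_obs, replace=False)] = 1.0           # observed slow variables
    mask[n_slow + rng.choice(n_fast, fast_obs, replace=False)] = 1.0  # observed fast variables
    return np.diag(mask)

H = observation_operator()
state = np.random.default_rng(1).standard_normal(360)
# Synthetic observation: mask the state and add noise with sigma = 0.1.
obs = H @ state + 0.1 * np.random.default_rng(2).standard_normal(360)
```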

The assimilation process is done using the VEnKF method defined in subsection 2.2.4.

The benefit of an ensemble method is the ability to match the ensemble members of the assimilation algorithm with those of DE. We augmented the implementation of the VEnKF method to cooperate with DE. In the propagation of the state estimate (step 2 of Algorithm 6), the dynamical model is parametrized with parameter values computed as the mean of the current DE generation. The ensemble propagation (step 3 of Algorithm 6), however, is done with the individual members of the DE population. From the DE point of view, the assimilation process becomes part of the cost function evaluation.
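The coupling can be sketched as follows. This is a minimal illustration, not the actual VEnKF implementation: `model_step` is a hypothetical stand-in for one propagation of the parametrized dynamical model, and the VEnKF analysis update itself is omitted. It only demonstrates how the state estimate uses the DE generation mean while each ensemble member uses its own DE member.

```python
import numpy as np

def model_step(x, theta):
    # Hypothetical stand-in: the real code would integrate the
    # parametrized Lorenz-95 model over one assimilation window.
    return 0.9 * x + theta

def coupled_step(state_est, ensemble, de_population):
    """One propagation of the VEnKF/DE coupling: the state estimate is
    propagated with the mean of the current DE generation (step 2),
    while each ensemble member is propagated with its own DE member
    (step 3). The analysis (update) step is omitted."""
    theta_mean = de_population.mean(axis=0)
    state_prop = model_step(state_est, theta_mean.sum())
    ens_prop = np.array([model_step(m, th.sum())
                         for m, th in zip(ensemble, de_population)])
    return state_prop, ens_prop

rng = np.random.default_rng(0)
pop = rng.uniform(7.0, 10.0, size=(20, 2))   # 20 DE members, 2 parameters
ens = rng.standard_normal((20, 360))         # matching VEnKF ensemble
est, ens = coupled_step(np.zeros(360), ens, pop)
```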

Since each ensemble member in the assimilation process is run with different initial values (due to the VEnKF sampling process) and different parameters (due to DE), we again operate in the context of the EPS. For each DE optimization window, a set of several assimilation runs is used to calculate the cost function. Within a single DE optimization window the parameter values of the ensemble members remain the same and are updated only after the selection step. A crucial point is that the actual size of the DE optimization window can be larger than the previously described deterministic interval, as long as the assimilation steps constituting the window are within this interval. From now on we will refer to simultaneous data assimilation and parameter estimation with DE as DA+DE.

A series of runs was conducted to verify the ability of DA+DE to converge successfully, both from the data assimilation and the parameter estimation perspective, and to study how the lengths of the DA and DE windows impact this behaviour.

Note that here we do not specifically tune any algorithm-specific DE parameters to work well together with VEnKF, but rather reuse settings which have proven their reliability either in the literature or in our previous runs. Moreover, to keep the DA+DE process computationally feasible and promising for further applications in more demanding real cases, we keep the ensemble size of DA+DE as small as 20 members.

Thus, the only parameters of the DA+DE process we want to control are the length of a single assimilation window and the length of the DE window. We conducted a number of experiments with different pairs of window lengths, where both the DA and the DE window lengths are taken from the set [0.02, 0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2]. All the runs were done with a fixed total length of 320 time units.
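The experiment grid can be enumerated directly. The sketch below assumes, based on the constraint discussed above, that only pairs where the DE window is at least as long as the DA window are run; it also computes how many DE windows and DA steps per DE window each pair implies over the 320 time units.

```python
from itertools import product

windows = [0.02, 0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
total_length = 320.0

experiments = []
for da_win, de_win in product(windows, windows):
    if de_win < da_win:   # a DE window must contain whole DA windows
        continue
    n_de_windows = round(total_length / de_win)   # DE generations in a run
    da_steps_per_de = round(de_win / da_win)      # assimilation steps per DE window
    experiments.append((da_win, de_win, n_de_windows, da_steps_per_de))
```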

Figures 4.12 and 4.13 show one of the experiments, in which the DE window length is fixed to 1.6 time units and the different coloured lines correspond to different DA window lengths (see the figures' legends for details). Figure 4.12 shows the data assimilation convergence in terms of the root mean squared error (RMSE) calculated from the state estimates provided by the VEnKF and the respective observations. Figure 4.13 shows the parameter convergence, where the coloured lines correspond to the mean values of the DE generation. Additionally, the black lines represent the true values of the parameters. The results for the other combinations of the window lengths, together with the standard deviations of the estimated parameters, are summarized in Figures 4.14, 4.15 and 4.16. Here the means and standard deviations of the state and parameter estimates are calculated from the estimation processes within the last 100 time units, i.e. where convergence should usually have taken place. The numbers for parameter convergence in these tables correspond to the absolute value of the difference between the true parameter value and the estimated mean. Thus, the best values are those closest to zero.
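The summary statistics described here are simple to compute; the sketch below uses the illustrative helper names `rmse_observed` and `tail_stats`, which are not taken from the actual implementation. The RMSE is restricted to the observed components, and the means and standard deviations are taken over the last 100 time units.

```python
import numpy as np

def rmse_observed(x_est, y_obs, mask):
    """RMSE between a state estimate and the observations,
    restricted to the observed components (mask of 0s and 1s)."""
    d = (x_est - y_obs)[mask > 0]
    return np.sqrt(np.mean(d ** 2))

def tail_stats(series, times, t_last=100.0):
    """Mean and standard deviation over the last t_last time units,
    as reported in the summary tables."""
    sel = times >= times[-1] - t_last
    return np.mean(series[sel]), np.std(series[sel])

# Toy usage: a constant RMSE series over a 320-time-unit run.
times = np.linspace(0.0, 320.0, 3201)
series = np.full_like(times, 0.1)
m, s = tail_stats(series, times)
```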

As we can conclude from these summary tables, there are optimal values (0.1 and 0.2) for the DA window length, where the convergence is best both with respect to the data assimilation and the parameter estimation, accounting both for the mean values and the standard deviations. In the current setup, the length of the DE window does not play a very significant role as long as it is large enough, i.e. at least twice the DA window length. The overall intuition for the selection of the DA window length is that overly small assimilation windows do not leave time for the parameters to show their impact, while overly large assimilation windows make the convergence too difficult, since the cost function becomes highly stochastic when the model evolution is close to a chaotic regime.

These proof-of-concept experiments show that the DA+DE process can be realised algorithmically. However, the choice of the ratio between the lengths of the data assimilation and parameter estimation parts should be made thoughtfully, as a trade-off between the convergence speeds of the two algorithms.

Figure 4.12: RMSE for the data assimilation with a DE window size of 1.6.

Figure 4.13: Parameter convergence with a DE window size of 1.6.

Figure 4.14: Summary of the RMSE for the data assimilation. The top table presents the means and the bottom table the standard deviations over the last 100 time units.

Figure 4.15: Summary of the convergence of the first parameter. The top table presents the means and the bottom table the standard deviations over the last 100 time units.


Figure 4.16: Summary of the convergence of the second parameter. The top table presents the means and the bottom table the standard deviations over the last 100 time units.


5 Further applications

In this chapter we will briefly discuss other applications of the implemented differential evolution optimization framework. Here we consider two published articles: “Robust parameter estimation of chaotic systems” by Springer et al. (2019) and “Algorithmic tuning of spread-skill relationship in ensemble forecasting systems” by Ekblom et al. (2019).

5.1 Robust parameter estimation of chaotic systems

The focus of the paper is to develop an alternative approach to the parameter estimation of chaotic systems. Generally, this consists of three main steps: the formulation of the cost function as a statistical likelihood based on given or assumed information about the measurements; finding the MAP point of this cost function using a certain optimization technique; and uncertainty quantification, where the full posterior distribution of the parameters is constructed. The problem of a chaotic environment is that a likelihood cannot be easily created for the whole time interval, since in the long run the impact of the parameters is confounded by even a small deviation in the problem definition (in the initial values, parameter values, solver tolerance, etc.). As we have already discussed, one way to deal with chaoticity is simply to avoid it by splitting the dynamics of the model into subintervals (assimilation windows), so that deterministic simulation can be conducted. This approach requires that a sufficient number of measurements is available for the time intervals, together with well-known initial values. This requirement can be rather difficult to satisfy in higher-dimensional systems with sparse measurements.

Here, a special construction of a likelihood function that uses measurements from the whole time series is employed. This type of cost function is called a correlation integral likelihood (CIL) and was originally introduced in Haario et al. (2015). The approach is further developed by Springer et al. (2019). The idea of CIL is as follows. The state of the dynamical system is denoted by s = s(t, θ, x), with time t, parameters θ and other inputs x such as, e.g., the initial values. One can denote the measurements of the system at time instance t_i as s_i = s(t_i, θ, x). Now, if we consider two different trajectories s = s(t, θ, x) and ŝ = s(t, θ̂, x̂), obtained for different parameter and input sets (θ, x) and (θ̂, x̂), the modified correlation sum is defined as

C(R, N, θ, x, θ̂, x̂) = (1/N²) Σ_{i,j≤N} #(‖s_i − ŝ_j‖ < R),    (5.1)

where it is assumed that R > 0, and N denotes the number of observations taken from the trajectories at time instances t = {t_1, ..., t_N}. The expression #(‖s_i − ŝ_j‖ < R) gives the value 1 when the state measurements are within a sphere of radius R from each other, and 0 otherwise. For a fixed parameter, i.e. θ̂ = θ, and a fixed set of radii R_k, k = 1, ..., M, we can calculate the M-dimensional feature vector:
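A direct transcription of the correlation sum into code is simple, if quadratic in the number of observations. The following NumPy sketch illustrates it for trajectories stored as (N, dim) arrays of sampled states.

```python
import numpy as np

def correlation_sum(traj_a, traj_b, R):
    """Modified correlation sum C(R, N): the fraction of index pairs
    (i, j) whose states from the two trajectories lie within a sphere
    of radius R of each other."""
    N = len(traj_a)
    # All pairwise distances between the states of the two trajectories.
    dists = np.linalg.norm(traj_a[:, None, :] - traj_b[None, :, :], axis=-1)
    return np.count_nonzero(dists < R) / N ** 2

traj = np.random.default_rng(0).standard_normal((50, 3))
c_all = correlation_sum(traj, traj, 1e9)   # radius so large every pair is counted
```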

y = {C(R_k, N, θ, x, θ̂, x̂)}_{k=1,...,M}.    (5.2)

With randomized values x̂ ≠ x, the vector y is stochastic and normally distributed by a generalization of the central limit theorem. Moreover, it provides an empirical cumulative distribution function (ecdf) for the corresponding distances, with the values of the radii R_k as bins. Thus, a training set consisting of a sufficient number of mutually different trajectories with x̂ ≠ x is employed to calculate the mean ȳ and the covariance Σ of the vector y. The target CIL is then defined as exp(−(1/2) (y − ȳ)ᵀ Σ⁻¹ (y − ȳ)).
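Under the same conventions, the feature vector (5.2) and the resulting Gaussian log-likelihood can be sketched as below. The training phase that produces ȳ and Σ from an ensemble of perturbed trajectories is only hinted at; here they are simply passed in, and the function names are illustrative.

```python
import numpy as np

def feature_vector(traj_a, traj_b, radii):
    """Feature vector y of Eq. (5.2): correlation sums evaluated for a
    set of radii, i.e. an ecdf of the pairwise distances with the
    radii R_k as bin edges."""
    N = len(traj_a)
    dists = np.linalg.norm(traj_a[:, None, :] - traj_b[None, :, :], axis=-1)
    return np.array([np.count_nonzero(dists < R) / N ** 2 for R in radii])

def cil_log_likelihood(y, y_bar, cov):
    """Logarithm of the CIL, exp(-(1/2) (y - y_bar)^T Sigma^{-1} (y - y_bar))."""
    r = y - y_bar
    return -0.5 * r @ np.linalg.solve(cov, r)

rng = np.random.default_rng(1)
a, b = rng.standard_normal((2, 30, 3))     # two short toy trajectories
radii = [0.5, 1.0, 2.0, 4.0, 8.0]
y = feature_vector(a, b, radii)
```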

In Springer et al. (2019) we discuss how this likelihood can be computed in practice for the task of parameter estimation in various 3D examples. The paper also shows that augmenting the state of the model with the velocity component of the dynamics can significantly improve the quality of the likelihood. The DE method is used here to maximize the constructed CIL in order to find the MAP. This optimization is carried out using the variant of DE with a recalculation step, as defined in Shemyakin and Haario (2018). Afterwards, the full posterior distribution is obtained by MCMC sampling.

During the experiments, another useful feature was revealed as a side product of the DE optimization runs. We noticed that DE is able to produce not only the MAP, but also a good initial distribution for adaptive MCMC sampling (Haario et al. (2001, 2006)), when the mean and covariance of this distribution are calculated from the union of the last several generations. Moreover, in many test cases considered in the article, the proposal distributions almost coincide with the actual posterior distributions obtained by MCMC.
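Constructing this proposal amounts to a pooled mean and covariance over the last generations. A minimal sketch follows; the function name and the number of pooled generations are illustrative choices, not taken from the article.

```python
import numpy as np

def proposal_from_de(generations, n_last=5):
    """Gaussian proposal for adaptive MCMC from the union of the last
    few DE generations. `generations` is a list of (pop_size, dim)
    arrays; returns the mean and covariance of the pooled members."""
    pooled = np.vstack(generations[-n_last:])
    return pooled.mean(axis=0), np.cov(pooled, rowvar=False)

# Toy DE history: 12 generations of 20 members around the true value (8.5, 8.5).
rng = np.random.default_rng(0)
gens = [rng.normal([8.5, 8.5], 0.2, size=(20, 2)) for _ in range(12)]
mean, cov = proposal_from_de(gens)
```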

Figure 5.1 (the picture is taken from Springer et al. (2019)) shows an example of such a case with the well-known Lorenz-63 system.

Figure 5.1: Proposal distribution provided by DE and the estimated posterior distribution for the Lorenz-63 system.

The properties of DE, namely the easy parallelization of the population member calculations as well as its robustness in a stochastic environment, played a significant role in the parameter estimation process. Additionally, the side product of the DE optimization runs, a reliable proposal distribution, was valuable for a higher acceptance rate in the MCMC and faster