$$
\frac{\partial \tilde{M}(\tilde{s}_k)}{\partial \tilde{s}_k} =
\begin{bmatrix}
\dfrac{\partial M(\tilde{s}_k)}{\partial x_k} & \dfrac{\partial M(\tilde{s}_k)}{\partial \theta_k} \\
0 & I
\end{bmatrix}. \tag{3.6}
$$

Therefore, in order to apply the EKF to the augmented state model, one should provide not only the partial derivatives of the evolution model with respect to the actual state of the system $x_k$, but also the partial derivatives of this operator with respect to the model parameters $\theta_k$.

In order to successfully apply the EKF, apart from the linearization, the model error covariance matrix $C_{x,\theta}$ should be defined. The state and the parameter errors are most likely correlated, but for simplicity they are usually modelled as mutually independent random variables. Hence, the matrix $C_{x,\theta}$ has the block diagonal form:

$$
C_{x,\theta} =
\begin{bmatrix}
C_x & 0 \\
0 & C_\theta
\end{bmatrix}, \tag{3.7}
$$

where the model state error part, $C_x$, has the same meaning as in the usual EKF formulation, i.e., it represents the statistical properties of the filter step. On the other hand, $C_\theta$ does not have a clear statistical interpretation and can be thought of as a control of how much the parameters $\theta$ are allowed to change between sequential assimilation steps. It should be pointed out that the random walk model used in (3.2) to define the parameter evolution is not the only available option. However, more sophisticated models would still require adjusting the respective model parameters.
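To make the construction concrete, the following minimal sketch (in Python with NumPy; the function and argument names are illustrative, not from the source) assembles the augmented Jacobian (3.6) and the block diagonal covariance (3.7):

```python
import numpy as np

def augmented_jacobian(jac_x, jac_theta):
    """Assemble the augmented evolution Jacobian of eq. (3.6).

    jac_x     : (n, n) partials of the model M w.r.t. the state x_k
    jac_theta : (n, p) partials of the model M w.r.t. the parameters theta_k
    """
    n, p = jac_theta.shape
    top = np.hstack([jac_x, jac_theta])
    # Parameters evolve by a random walk, so their own block is the identity.
    bottom = np.hstack([np.zeros((p, n)), np.eye(p)])
    return np.vstack([top, bottom])

def augmented_covariance(C_x, C_theta):
    """Block diagonal model error covariance of eq. (3.7)."""
    n, p = C_x.shape[0], C_theta.shape[0]
    C = np.zeros((n + p, n + p))
    C[:n, :n] = C_x        # state model error, as in the usual EKF
    C[n:, n:] = C_theta    # tuning: how much theta may move per step
    return C
```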

It has been shown (see Hakkarainen et al. (2012)) that SA, if tuned properly, can be successfully applied to the parameter estimation problem of chaotic models. This is computationally feasible since no extra steps are involved with respect to the usual filtering methods. Nonetheless, it may be challenging, as tuning the parameter error covariance $C_\theta$ is additionally convoluted due to the lack of both a physical interpretation and the ability to adjust it algorithmically. Moreover, the statistical inference of the model parameters is difficult with SA, since $\theta$ is explicitly defined as a dynamical quantity: the parameter values are allowed to change at every assimilation step when new data arrive, so they may not converge to fixed values or to a fixed posterior (see Järvinen et al. (2010)).

3.2 Dynamical linear modelling and filter likelihood for chaotic systems

The statistical analysis of data represented as a time series can be challenging, since usually only a single realization of the underlying process, whose properties might not be completely known, is available. There are a number of specific models for estimating the underlying stochastic processes, such as autoregressive (AR), integrated (I) and moving average (MA) models and their combinations. These models, however, assume some structure of the estimated processes, and the resulting analysis might strongly depend on such assumptions. Dynamic regression done by dynamical linear modelling (DLM) unifies the estimation process in the KF framework. In time series there are several essential components in the structure of the process: a slowly varying background level (trend), seasonality, external forcing and stochastic noise. DLM belongs to the class of state-space models, where specific assumptions on the form of the observation and evolution equations are made: linearity and Gaussianity.
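As a concrete illustration (a sketch under assumed conventions, not taken from the source), the trend and seasonal components can be encoded as blocks of the evolution and observation matrices of the state-space form recalled below:

```python
import numpy as np

def dlm_trend_seasonal(period=12):
    """Build the evolution matrix G and observation vector F (illustrative
    names) for a DLM with a local linear trend plus a dummy seasonal part."""
    # Local linear trend block: state = (level, slope)
    G_trend = np.array([[1.0, 1.0],
                        [0.0, 1.0]])
    F_trend = np.array([1.0, 0.0])
    # Seasonal block: effects sum to zero over one period
    s = period
    G_seas = np.zeros((s - 1, s - 1))
    G_seas[0, :] = -1.0              # new effect = -(sum of the previous ones)
    G_seas[1:, :-1] = np.eye(s - 2)  # shift the remaining effects
    F_seas = np.zeros(s - 1)
    F_seas[0] = 1.0
    # Stack the blocks into one state-space model
    G = np.block([[G_trend, np.zeros((2, s - 1))],
                  [np.zeros((s - 1, 2)), G_seas]])
    F = np.concatenate([F_trend, F_seas])
    return G, F
```

The noise variances attached to each block are then natural candidates for the static parameters $\theta$ estimated below.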

Let us recall the basic setting for KF,

$$x_t = G_t x_{t-1} + w_t, \tag{3.8}$$

$$y_t = F_t x_t + \nu_t. \tag{3.9}$$

We now assume that the model matrices $G_t$ and $F_t$ may depend on static parameters $\theta$ and want to estimate them in addition to the state vector $x_t$ (Kalman (1960)). The error terms $\nu_t$ and $w_t$ are again assumed to be mutually independent realizations of the standard normal distribution. DLM employs a Bayesian approach and represents a 3-level hierarchical statistical model: the uncertainty of the observations (3.10); the uncertainty of the state of the model and its evolution (3.11); and the uncertainty of the model parameters (3.12):

$$y_k \sim p(y_k \mid x_k), \tag{3.10}$$

$$x_k \sim p(x_k \mid x_{k-1}, \theta), \tag{3.11}$$

$$\theta \sim p(\theta). \tag{3.12}$$

Assume that a time series $y_1, \ldots, y_n$ is available; the task of the parameter estimation process in Bayesian terms is to find the posterior distribution $p(\theta \mid y_{1:n})$, where $y_{1:n}$ denotes $y_{1:n} = \{y_1, \ldots, y_n\}$. The problem is to compute the marginal distribution of the state given all the measurements obtained so far. Thus, for a given parameter value $\theta$, the filtering at time step $k$ estimates $p(x_k \mid y_{1:k}, \theta)$. As previously discussed, this is achieved by two consecutive steps within each time interval: a prediction and an update. The prediction step is done by propagating the previous state distribution $p(x_{k-1} \mid y_{1:k-1}, \theta)$ towards the current time step using the evolution model $p(x_k \mid x_{k-1}, \theta)$,

$$p(x_k \mid y_{1:k-1}, \theta) = \int p(x_k \mid x_{k-1}, \theta)\, p(x_{k-1} \mid y_{1:k-1}, \theta)\, \mathrm{d}x_{k-1}. \tag{3.13}$$

When new observations for the current time step become available, they are incorporated with the predictive distribution (3.13) as a prior in Bayes' rule to compute the model state posterior distribution:

$$p(x_k \mid y_{1:k}, \theta) \propto p(y_k \mid x_k, \theta)\, p(x_k \mid y_{1:k-1}, \theta). \tag{3.14}$$

The predictive distribution of the next observation given all previous observations can also be calculated using formula (3.13):

$$p(y_k \mid y_{1:k-1}, \theta) = \int p(y_k \mid x_k, \theta)\, p(x_k \mid y_{1:k-1}, \theta)\, \mathrm{d}x_k. \tag{3.15}$$

Finally, the use of Bayes' rule and the chain rule for joint probabilities recasts the initial estimation problem of the posterior distribution $p(\theta \mid y_{1:n})$ in the following form:

$$p(\theta \mid y_{1:n}) \propto p(y_{1:n} \mid \theta)\, p(\theta) = p(y_n \mid y_{1:n-1}, \theta) \times p(y_{n-1} \mid y_{1:n-2}, \theta) \times \cdots \times p(y_1 \mid \theta)\, p(\theta). \tag{3.16}$$

This formulation suggests that the likelihood of the whole observation set $y_{1:n}$ can be factorized into a product of predictive distributions of individual observations. The theoretical framework described above relates to the filtering and parameter estimation problems. In practice, it can be implemented with the aid of a given filtering method to compute the posterior density for a given parameter value and an algorithm to obtain estimates, such as optimization and Monte Carlo approaches (see Campagnoli et al. (2009); Durbin and Koopman (2012)). Thus, for the case of the EKF, the likelihood $p(y_{1:n} \mid \theta)$ can be expressed explicitly (Schweppe (1965)) as

$$
p(y_{1:n} \mid \theta) = p(y_1 \mid \theta) \prod_{k=2}^{n} p(y_k \mid y_{1:k-1}, \theta)
= \prod_{k=1}^{n} \frac{1}{\sqrt{(2\pi)^d \lvert C^y_k \rvert}} \exp\!\left(-\frac{1}{2} r_k^T (C^y_k)^{-1} r_k\right)
\propto \exp\!\left(-\frac{1}{2} \sum_{k=1}^{n} \left[ r_k^T (C^y_k)^{-1} r_k + \log \lvert C^y_k \rvert \right]\right), \tag{3.17}
$$

where $C^y_k = H_k \hat{C}_k H_k^T + C_{\eta_k}$, $r_k = y_k - H(\hat{x}_k)$ in terms of algorithm (1), and $\lvert C^y_k \rvert$ denotes the matrix determinant.
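As a concrete sketch (Python with NumPy; a linear model is assumed for simplicity, and all names are illustrative rather than from the source), the log of the filter likelihood (3.17) can be accumulated during a single Kalman filter sweep:

```python
import numpy as np

def filter_log_likelihood(y, M, H, C_eps, C_eta, x0, C0):
    """Accumulate the log filter likelihood (3.17) over a linear KF sweep.

    y     : (n, d) observations
    M, H  : evolution and observation matrices (may depend on theta)
    C_eps : model error covariance, C_eta : observation error covariance
    """
    d = y.shape[1]
    x, C = x0, C0
    loglike = 0.0
    for yk in y:
        # Prediction step: propagate state and covariance
        x = M @ x
        C = M @ C @ M.T + C_eps
        # Innovation r_k and its covariance C^y_k, cf. (3.17)
        r = yk - H @ x
        Cy = H @ C @ H.T + C_eta
        Cy_inv = np.linalg.inv(Cy)
        _, logdet = np.linalg.slogdet(Cy)
        loglike += -0.5 * (r @ Cy_inv @ r + logdet + d * np.log(2 * np.pi))
        # Update step with the Kalman gain
        K = C @ H.T @ Cy_inv
        x = x + K @ r
        C = C - K @ H @ C
    return loglike
```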

The filter likelihood (FL) approach includes the static parameters $\theta$ in the definition of the dynamical model:

$$x_k = M(x_{k-1}, \theta) + \epsilon_k, \tag{3.18}$$

$$y_k = H(x_k) + \eta_k, \tag{3.19}$$

and uses the expression (3.17) as the likelihood for estimating the static parameters.

While actually written for stochastic systems, the formulas (3.18)-(3.19) are often also used for deterministic models. In that case the error terms $\epsilon_k$ and $\eta_k$ are interpreted as bias, i.e., modelling and observation uncertainties (see Hakkarainen et al. (2012)). Note that the normalization constant $\lvert C^y_k \rvert$ of the likelihood (3.17) implicitly depends on the parameter $\theta$ through $\hat{C}_k = M_k C_{k-1} M_k^T + C_{\epsilon_k}$, where the model $M_k$ explicitly depends on $\theta$. Thus, the term $\log \lvert C^y_k \rvert$ must not be neglected. This particular term is the only difference from the classical Gaussian likelihood function commonly used for parameter estimation problems. Note that if the model error covariance $C_{\epsilon_k}$ is unknown, it can be estimated from the measurements together with the unknown parameters $\theta$. This ability is one of the superior properties of the filter likelihood approach with respect to the state augmentation method. The FL operates in an "offline" regime, since it involves several repeated filtering iterations over a given time interval, and the number of such iterations depends solely on the parameter estimation method utilized. This property introduces an extra computational cost to the algorithm. Moreover, in high dimensional problems the EKF becomes infeasible. Even though the EKF can be replaced by ensemble filtering algorithms, in practice most of such algorithms require random perturbations of the model state, which might introduce stochasticity into the definition of the filter likelihood formula and complicate the calculations (Dowd (2011)).
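In practice (an illustrative sketch; `build_model` is a hypothetical user-supplied function, and `filter_log_likelihood` is the sweep sketched after (3.17)), the static parameters, possibly including an unknown model error variance, can then be estimated by numerically maximizing the filter likelihood:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_fl(theta, y, build_model):
    # build_model (hypothetical) maps theta to the state-space matrices,
    # including the model error covariance C_eps if it is unknown.
    M, H, C_eps, C_eta, x0, C0 = build_model(theta)
    return -filter_log_likelihood(y, M, H, C_eps, C_eta, x0, C0)

# theta_hat = minimize(neg_log_fl, theta0, args=(y, build_model),
#                      method="Nelder-Mead").x
```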


4 Parameter estimation by ensemble and population-based methods

4.1 Ensemble prediction system

Despite the vast availability of the Kalman filter-based parameter estimation routines we considered earlier, the practical use of most of them is limited in truly high dimensional set-ups, such as NWP models, since no exact filtering is available there due to memory and computational limitations. Although ongoing development of ensemble data assimilation implementations for NWP models may allow their direct use, most of the operational assimilation models are currently still variational.

Iterative sampling techniques, such as MCMC, cannot be directly used for parameter estimation of such models for several reasons. On the one hand, approximating the closure parameter distribution in an off-line manner, i.e., when the observations/analysis are available for the whole integration time interval, using MCMC would require an infeasible number of very CPU-demanding forward model evaluations. On the other hand, the direct use of MCMC for on-line parameter estimation, when the whole time interval is split into sequential assimilation intervals and the estimation is done as soon as new observations become available, would be infeasible since there is no fixed likelihood function, which is normally required in the accept/reject step of MCMC. These limitations call for alternative approaches for the on-line estimation of parameters with as little extra computational effort as possible.

In such circumstances, the exploitation of an ensemble prediction system (EPS), which is widely used in operational weather forecast systems, becomes a promising direction.

The core idea of an EPS is to provide both the forecast and its uncertainty information with the aid of ensemble runs. This is achieved through several steps which are repeated as soon as new data enters the system. Algorithm 7 describes these steps.

Algorithm 7: Ensemble prediction system steps

1: Assimilation: The initial state of the system is estimated by a certain applicable assimilation method.

2: Perturbation: A perturbation of the initial state is calculated.

3: Ensemble run: An ensemble of runs is launched, with each member having slightly different, perturbed initial values computed in the previous step.

4: Forecast and spread: The forecast is calculated from the ensemble run together with its spread, which indicates the uncertainty of the predictions.
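A minimal sketch of one such cycle follows (Python with NumPy; the forward model, the additive Gaussian perturbation scheme, and all names are illustrative placeholders, since operational perturbation methods are considerably more elaborate):

```python
import numpy as np

def eps_cycle(x_analysis, model, n_members=50, pert_scale=0.1, rng=None):
    """One ensemble prediction system cycle, cf. Algorithm 7 (illustrative).

    x_analysis : assimilated initial state (step 1, assumed given)
    model      : forward model mapping an initial state to a forecast
    """
    rng = rng or np.random.default_rng()
    # Step 2: perturb the initial state (here: simple additive Gaussian noise)
    perturbations = pert_scale * rng.standard_normal((n_members, x_analysis.size))
    # Step 3: launch the ensemble of runs from the perturbed initial values
    forecasts = np.array([model(x_analysis + dx) for dx in perturbations])
    # Step 4: forecast and spread from the ensemble
    forecast = forecasts.mean(axis=0)
    spread = forecasts.std(axis=0)   # uncertainty of the predictions
    return forecast, spread
```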

The main forecast is usually run at the most accurate and rigorous resolution of the model available, starting from the assimilated initial state, while a set of auxiliary runs can be launched at a lower resolution from slightly perturbed initial values. How these perturbations should be properly done, however, is a topic of ongoing research. A scheme with the main and auxiliary model evaluations is commonly employed. Thus, for instance, at the European Centre for Medium-Range Weather Forecasts (ECMWF), the main forecast is run together with an ensemble of 50 auxiliary forecasts to accomplish weather prediction tasks together with uncertainty quantification. Overall, this is an appealing environment for the parameter estimation process, since a number of runs are generated by the EPS. In this chapter we will discuss different approaches to how the EPS setting can be exploited to yield on-line parameter estimation as a side product, i.e., with almost no extra computational effort.