
Monte Carlo methods are a class of computational techniques based on repeated random sampling: a simulation is run repeatedly until the desired amount of sampled statistics has been collected. These methods are widely used for problems such as numerical optimization and integration in many areas of science.

In this work, a special class of Monte Carlo methods, the Markov chain Monte Carlo (MCMC) methods, is used.

MCMC methods are a class of algorithms that construct a Markov chain in order to sample from a probability distribution that the chain is designed to approximate. In other words, the goal of MCMC is to produce a sequence of random samples (α1, α2, ..., αN) whose distribution tends to the posterior distribution as N increases. In most cases the construction of the Markov chain itself is not as challenging as determining the steps one should take to achieve convergence to the target distribution with an acceptable error.

The term Markov chain is used because the samples are generated in such a way that each new point in the sequence depends only on the previous point and is independent of all earlier ones. Furthermore, Markov chain theory shows that the resulting sample from an MCMC calculation approaches the target distribution.

One of the most widely used MCMC methods is the random walk Metropolis algorithm. The main idea of the Metropolis method is to select samples using a proposal distribution and an accept-reject technique. The key steps of the Metropolis algorithm can be presented as follows:

• Initialization step: the starting point α1 is chosen.

• Generation step: a new candidate α̂ is produced from the proposal distribution, which can depend on the previous point of the chain αn.

• Selection step: the candidate is accepted with the probability:

β(αn, α̂) = min(1, π(α̂)/π(αn))

Here π denotes the posterior density. If the candidate is rejected, the previous point of the chain is repeated. The procedure is repeated from the generation step until the desired conditions are satisfied.
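The steps above can be sketched in a few lines of code. The following is a minimal illustration, not the implementation used in this work; the function name `metropolis` and its interface are chosen here for clarity, and the target is passed as a log-density to avoid numerical underflow:

```python
import numpy as np

def metropolis(log_pi, alpha1, proposal_step, n_samples, rng=None):
    """Metropolis sampler with a symmetric Gaussian proposal.

    log_pi        -- log of the (unnormalised) target density pi
    alpha1        -- starting point of the chain (1-D array)
    proposal_step -- standard deviation of the Gaussian proposal
    """
    rng = np.random.default_rng(rng)
    alpha = np.atleast_1d(np.asarray(alpha1, dtype=float))
    chain = [alpha.copy()]
    for _ in range(n_samples - 1):
        # Generation step: candidate from a symmetric proposal centred at alpha
        candidate = alpha + proposal_step * rng.standard_normal(alpha.shape)
        # Selection step: accept with probability min(1, pi(candidate)/pi(alpha)),
        # implemented on the log scale
        if np.log(rng.uniform()) < log_pi(candidate) - log_pi(alpha):
            alpha = candidate
        chain.append(alpha.copy())
    return np.array(chain)
```

For example, `metropolis(lambda a: -0.5 * float(a @ a), np.zeros(1), 1.0, 20000)` produces a chain whose sample mean and variance approach those of the standard Gaussian.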

The algorithm itself is therefore not complicated, but some assumptions need to be pointed out. First, the proposal distribution is assumed to be symmetric: the probability of moving from the current point to the proposed one is the same as that of moving in the opposite direction. There are modifications of the Metropolis algorithm that allow a non-symmetric proposal distribution, but in this work the proposal distribution is assumed to be symmetric.

Second, one of the most critical points of the algorithm is the choice of the proposal distribution. At least two requirements should be met: it should be easy to sample from, and it should resemble the target distribution. Moreover, simulations readily show that an inefficient implementation of the algorithm is mainly caused by an inappropriately chosen proposal distribution. If it is too narrow, most candidates will be accepted in the selection step, but they will lie too close to the previous point; the chain will then move slowly, and a larger number of steps will be needed to cover the distribution.

On the other hand, if the proposal distribution is too wide, the rejection probability will be much higher and candidates will be accepted too rarely.
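This trade-off is easy to demonstrate numerically. The following sketch (illustrative only; the target, step sizes, and chain length are chosen here as an example) measures the acceptance rate of a random walk chain targeting a standard 1-D Gaussian for three proposal widths:

```python
import numpy as np

def acceptance_rate(step, n_samples=5000, seed=0):
    """Fraction of accepted moves for a random walk Metropolis chain
    targeting a standard 1-D Gaussian, with proposal width `step`."""
    rng = np.random.default_rng(seed)
    alpha, accepted = 0.0, 0
    for _ in range(n_samples):
        candidate = alpha + step * rng.standard_normal()
        # log pi(a) = -a^2/2 for the standard Gaussian target
        if np.log(rng.uniform()) < 0.5 * (alpha**2 - candidate**2):
            alpha, accepted = candidate, accepted + 1
    return accepted / n_samples

for step in (0.1, 2.4, 20.0):
    print(step, acceptance_rate(step))
```

A very narrow proposal accepts almost everything but barely moves, while a very wide one is rejected most of the time; the intermediate width balances the two.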

In this work the proposal distribution is chosen to be a multivariate Gaussian distribution, since the examples involve multidimensional continuous parameter spaces.

The current point of the sequence serves as the center of the Gaussian proposal distribution. This idea is the basis of a slight modification of the Metropolis method called the random walk Metropolis algorithm (Solonen et al. [12]).

To understand the basics of the random walk Metropolis algorithm, the previously introduced model is considered:

y = f(x, α) + ε

Here, again, the measurement noise ε is assumed to be Gaussian, independent and identically distributed. Using the formula for the posterior density and assuming a uniform prior (p(α) ∼ 1), the posterior density takes the form:

π(α|y) ∼ l(y|α) p(α) ∼ e^(−SS(α)/2),

where SS(α) = Σ_{i=1}^{n} [yi − f(xi, α)]². Having assumed this, the acceptance ratio used in the selection step of the Metropolis algorithm can be rewritten:

β(αn, α̂) = min(1, π(α̂)/π(αn)) = min(1, e^(−[SS(α̂) − SS(αn)]/2))
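The last equality follows because, with the uniform prior, the posterior at each point is an exponential of the negative half sum of squares, so any common constants cancel in the ratio. Written out:

```latex
\beta(\alpha_n, \hat{\alpha})
  = \min\!\left(1, \frac{\pi(\hat{\alpha})}{\pi(\alpha_n)}\right)
  = \min\!\left(1, \frac{e^{-\frac{1}{2}SS(\hat{\alpha})}}{e^{-\frac{1}{2}SS(\alpha_n)}}\right)
  = \min\!\left(1, e^{-\frac{1}{2}\left[SS(\hat{\alpha}) - SS(\alpha_n)\right]}\right)
```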

Taking into account all the assumptions made above, and assuming a multivariate Gaussian proposal distribution with covariance matrix C and an initial point α0, one can proceed with the following algorithm:

• Generation step: a new candidate α̂ is produced from the proposal distribution N(αn, C) and SS(α̂) is computed.

• Selection step: the candidate is accepted with the probability:

β(αn, α̂) = min(1, e^(−[SS(α̂) − SS(αn)]/2))

If the candidate is accepted:

– αn := α̂

– SS(αn) := SS(α̂)

Otherwise, repeat αn in the chain.

• The procedure is repeated from step 1 until the desired conditions are satisfied.

This version of the Metropolis algorithm is used in this work.
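A minimal sketch of this sum-of-squares formulation is given below. It is an illustration rather than the exact implementation used in the work; the function name `rw_metropolis_ss` and its interface are chosen here for clarity:

```python
import numpy as np

def rw_metropolis_ss(ss, alpha0, C, n_samples, rng=None):
    """Random walk Metropolis using the sum-of-squares function SS(alpha).

    ss     -- function returning SS(alpha) = sum_i [y_i - f(x_i, alpha)]^2
    alpha0 -- initial point of the chain
    C      -- covariance matrix of the Gaussian proposal
    """
    rng = np.random.default_rng(rng)
    L = np.linalg.cholesky(C)              # for drawing from N(alpha_n, C)
    alpha = np.asarray(alpha0, dtype=float)
    ss_alpha = ss(alpha)
    chain, accepted = [alpha.copy()], 0
    for _ in range(n_samples - 1):
        # Generation step: candidate from N(alpha_n, C)
        cand = alpha + L @ rng.standard_normal(alpha.shape)
        ss_cand = ss(cand)
        # Selection step: accept with prob. min(1, e^(-[SS(cand)-SS(alpha)]/2))
        if np.log(rng.uniform()) < -0.5 * (ss_cand - ss_alpha):
            alpha, ss_alpha = cand, ss_cand
            accepted += 1
        chain.append(alpha.copy())
    return np.array(chain), accepted / (n_samples - 1)
```

Storing SS(αn) alongside the chain, as in the selection step above, avoids recomputing the sum of squares for the current point at every iteration.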

Even in the random walk Metropolis algorithm the problem of choosing the proposal distribution remains, but now the covariance matrix determines the size and shape of the proposal distribution. Fortunately, research has been done on this issue and several ways of approximating the covariance matrix have been proposed. One approach is based on a linearization of the model, in which the covariance matrix is approximated via the Jacobian matrix J:

C = σ²(JᵀJ)⁻¹,

where [J]ij = ∂f(xi, α)/∂αj and α is, typically, the value obtained via the ML method.

Besides, the scaling is very important. It has been found that for Gaussian targets the scaling factor is sd = 2.4²/d, where d is the dimension of the problem:

C = sd σ²(JᵀJ)⁻¹
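This covariance approximation can be sketched as follows. The forward finite-difference Jacobian is an assumption made here for illustration (the Jacobian could equally come from analytical derivatives), as are the function name and interface:

```python
import numpy as np

def proposal_covariance(f, x, alpha, sigma2, h=1e-6):
    """Linearisation-based proposal covariance C = s_d * sigma^2 (J^T J)^{-1}.

    f      -- model function f(x, alpha), returning one value per x_i
    alpha  -- parameter vector (typically the ML estimate)
    sigma2 -- measurement noise variance
    The Jacobian [J]_ij = d f(x_i, alpha) / d alpha_j is approximated
    here by forward finite differences with step h.
    """
    alpha = np.asarray(alpha, dtype=float)
    d = alpha.size
    f0 = np.asarray(f(x, alpha))
    J = np.empty((f0.size, d))
    for j in range(d):
        step = np.zeros(d)
        step[j] = h
        J[:, j] = (np.asarray(f(x, alpha + step)) - f0) / h
    s_d = 2.4 ** 2 / d                     # scaling factor for Gaussian targets
    return s_d * sigma2 * np.linalg.inv(J.T @ J)
```

For a linear model the finite-difference Jacobian is exact up to rounding, so the result matches the closed-form C = sd σ²(JᵀJ)⁻¹.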

The distribution of parameters for the example was also obtained using the random walk Metropolis algorithm, where the covariance matrix was chosen to be either a scaled identity matrix or the Jacobian-based matrix with the scaling factor mentioned above (see Figures 2 and 3). The figures show that the MCMC chain obtained with the Jacobian-based covariance matrix is more stable and has a higher acceptance rate.


Figure 2: The distribution of the parameters (a) and 2D plot (b) with covariance matrix that was approximated via the Jacobian using MCMC.


Figure 3: The distribution of the parameters (a) and 2D plot (b) with a scaled identity covariance matrix using MCMC.

However, modelling problems are often high-dimensional; for example, the number of unknown parameters can depend on the discretization of the numerical problem. Such issues can make standard fixed MCMC methods impractical, which makes their adaptive versions preferable.