
Bootstrapping is a resampling-based statistical method for estimating nearly any statistic. Typically, the procedure itself is straightforward, but it must be repeated a certain number of times (usually quite large), which often leads to heavy computations.

In the case of regression models, resampling simply means redrawing individual cases from the given data set. The idea is based on the following scheme (a code sketch follows the list):

• Given existing data in the form $x = (x_1, x_2, \ldots, x_k)$, $y = (y_1, y_2, \ldots, y_k)$, the next step is to generate a new data set $\hat{x}, \hat{y}$ by resampling the existing one.

The transformation can be performed by choosing $k$ indices from $1, \ldots, k$ with replacement and taking the data points corresponding to those indices.

• Fit the model to the data generated in the previous step.

• Return to step 1 until the desired number of estimates is obtained.
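A minimal Python sketch of this scheme is given below. The exponential model, the data, and the use of scipy.optimize.curve_fit as the optimization routine are illustrative assumptions, not part of the original example.

```python
import numpy as np
from scipy.optimize import curve_fit

def bootstrap_fit(f, x, y, p0, n_boot=1000, seed=None):
    """Ordinary (pairs) bootstrap for a regression model y = f(x, *params).

    Returns an array of shape (n_boot, len(p0)), one parameter
    estimate per resampled data set.
    """
    rng = np.random.default_rng(seed)
    k = len(x)
    estimates = []
    for _ in range(n_boot):
        # Choose k indices from 1, ..., k with replacement and take
        # the corresponding (x_i, y_i) pairs.
        idx = rng.integers(0, k, size=k)
        p_hat, _ = curve_fit(f, x[idx], y[idx], p0=p0)
        estimates.append(p_hat)
    return np.asarray(estimates)

# Hypothetical example: an exponential decay model with noisy data.
f = lambda x, a, b: a * np.exp(-b * x)
x = np.linspace(0.0, 5.0, 30)
y = f(x, 2.0, 0.8) + 0.1 * np.random.default_rng(0).normal(size=x.size)
params = bootstrap_fit(f, x, y, p0=[1.0, 1.0], n_boot=500)
```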

Even if the procedure is quite simple, the method has some disadvantages. First, as mentioned in the beginning, the method involves repeated calls to the optimization routine, which makes it computationally demanding. Second, it depends heavily on the success of the optimization step. Moreover, the bootstrapping procedure is inefficient when the number of measurements is small, since the resampled data will often contain only a few distinct points, making the parameter estimation problem ill-posed. However, this situation can be improved using the procedure of bootstrapping with residuals. It has the following algorithm (a code sketch follows the list):

• Given existing data in the form $x = (x_1, x_2, \ldots, x_k)$, $y = (y_1, y_2, \ldots, y_k)$, the next step is to find the LSQ estimate $\hat{\alpha}$ of the fit and then compute the residuals:

$$\mathrm{res}_i = y_i - f(x_i, \hat{\alpha})$$

• Then, generate a new residual set $\widehat{\mathrm{res}}$ by resampling the existing one. The transformation can be performed by choosing indices from $1, \ldots, k$ with replacement and taking the residuals corresponding to those indices.

• Create new data according to:

$$y_i = f(x_i, \hat{\alpha}) + \widehat{\mathrm{res}}_i$$

• Use the new data to compute the model fit.

• Return to step 2 until the desired number of estimates is obtained.
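Under the same illustrative assumptions as before (a user-supplied model f and scipy.optimize.curve_fit standing in for the LSQ step), bootstrapping with residuals can be sketched as:

```python
import numpy as np
from scipy.optimize import curve_fit

def residual_bootstrap_fit(f, x, y, p0, n_boot=1000, seed=None):
    """Bootstrap with residuals: resample the residuals of the LSQ fit
    rather than the data points themselves."""
    rng = np.random.default_rng(seed)
    # Step 1: LSQ estimate alpha_hat and residuals res_i = y_i - f(x_i, alpha_hat).
    alpha_hat, _ = curve_fit(f, x, y, p0=p0)
    res = y - f(x, *alpha_hat)
    k = len(x)
    estimates = []
    for _ in range(n_boot):
        # Step 2: resample the residuals with replacement.
        res_new = res[rng.integers(0, k, size=k)]
        # Step 3: create new data y_i = f(x_i, alpha_hat) + res_i.
        y_new = f(x, *alpha_hat) + res_new
        # Step 4: use the new data to compute the model fit.
        p_hat, _ = curve_fit(f, x, y_new, p0=alpha_hat)
        estimates.append(p_hat)
    return np.asarray(estimates)
```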

In the example, the number of measurements is not large enough for the ordinary bootstrap procedure. Taking this into account, bootstrapping with residuals was chosen for the calculations in this work (see Figure 5). Note, however, that only a finite number of distinct data combinations can be created, which affects the results when the number of data points is small.

Figure 5: The comparison between parameter distributions obtained via MCMC and the bootstrap method. (a) The distribution of parameters; (b) 2D plot of parameters.

3 ABC and likelihood-free methods

3.1 Introduction to ABC methods

Approximate Bayesian computation (ABC) comprises a class of computational methods based on Bayesian statistics. These methods are also known as likelihood-free methods, and they can provide quite satisfactory ways of simulating from the problem without using a likelihood. In this chapter, the original ABC concept and some of its extensions will be presented.

It is important to point out that the computational difficulty rises considerably when the likelihood function $l(\alpha|y)$ is unknown. The reason is that the likelihood function can be very demanding to compute, or it may not be available in closed form as a function of the unknown parameter. Furthermore, one might not know the normalizing constant of the likelihood.

This situation is common in epidemiological analysis. Specifically, the likelihood is called intractable when it can only be expressed as an uncomputable multidimensional integral. Some solutions have been proposed to address the problem, but they proved difficult to calibrate (Cucala et al. [2]). There are also settings in which Bayesian inference faces a partially known likelihood of the form $l(\alpha|y) = l_1(\alpha|y)\,l_2(\alpha)$, where $l_2$ is unknown. In such a case, exact simulation from the posterior distribution can be impossible. To handle the problem, variational Bayes solutions and Laplace approximations have been advanced, but all of them require various model simplifications.

The ABC methodology was first mentioned by Rubin [10]. It presents an almost automated resolution of problems with models that are considered intractable but can still be simulated from (Marin et al. [6]). In that paper, Rubin produced the algorithm and described it as a special case of an accept-reject procedure. In this procedure, the parameters $\alpha$ to be estimated are generated from the prior distribution $\pi(\alpha)$, and a parameter is accepted on the condition that the response simulated from it is identical to a given sample, written as $y$. Additionally, it was assumed that $y$ takes values in a finite (countable) set $D$. Thus, the algorithm was presented as follows (a code sketch follows the list):

• Generate α from the prior distribution π(·).

• Generate h from the likelihood f(·|α).

• Return to step 1 until $h = y$.

• Set $\alpha_i = \alpha$.

• Repeat the whole process until the desired chain length is obtained.
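A minimal sketch of this accept-reject scheme in Python; prior_sample and model_sample are hypothetical user-supplied functions for drawing from $\pi(\cdot)$ and $f(\cdot|\alpha)$:

```python
import numpy as np

def rejection_abc_exact(prior_sample, model_sample, y_obs, n_samples):
    """Rubin's accept-reject scheme for a finite (countable) sample space:
    alpha is kept only if the simulated data exactly match the observations."""
    accepted = []
    while len(accepted) < n_samples:
        alpha = prior_sample()        # generate alpha from the prior pi(.)
        h = model_sample(alpha)       # generate h from the likelihood f(.|alpha)
        if np.array_equal(h, y_obs):  # accept only on exact match h = y
            accepted.append(alpha)
    return np.asarray(accepted)
```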

Initially, ABC was introduced by Tavaré et al. [15] and was then discussed by Pritchard et al. [9], who extended the algorithm to continuous sample spaces. The algorithm reads as follows:

• Generate α from the prior distribution π(·).

• Generate h from the likelihood f(·|α).

• Return to step 1 until $\rho[S(h), S(y)] \leq \varepsilon$.

• Set $\alpha_i = \alpha$.

• Repeat the whole process until the desired chain length is obtained.

The new elements that appear in the algorithm can be listed as follows (a code sketch follows the list):

− $S(h) = (S_1(h), S_2(h), \ldots, S_n(h))$, a vector of summary statistics.

− $\varepsilon > 0$, a tolerance value.

− $\rho$, a distance defined on $D$.
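Combining the algorithm with these elements gives the following sketch; summary, distance, prior_sample, and model_sample are hypothetical user-supplied functions for $S$, $\rho$, $\pi(\cdot)$, and $f(\cdot|\alpha)$:

```python
import numpy as np

def rejection_abc(prior_sample, model_sample, summary, distance,
                  y_obs, eps, n_samples):
    """ABC rejection sampler: accept alpha when the distance between the
    summary statistics of simulated and observed data is at most eps."""
    s_obs = summary(y_obs)
    accepted = []
    while len(accepted) < n_samples:
        alpha = prior_sample()   # alpha ~ pi(.)
        h = model_sample(alpha)  # h ~ f(.|alpha)
        if distance(summary(h), s_obs) <= eps:
            accepted.append(alpha)
    return np.asarray(accepted)
```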

In the Euclidean space $\mathbb{R}^n$, it is most common to use the Euclidean distance (2-norm distance) between two points:

$$\rho(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^2 \right)^{1/2}$$

This distance was used to obtain the distribution of the parameters from the example (see Figure 6).

Figure 6: The comparison between parameter distributions obtained via MCMC and ABC methods with different values of ε.

The Minkowski distance of order p (p-norm distance) can also be used:

$$\rho(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$
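Both distances are straightforward to implement; this small helper, written here purely for illustration, could serve as the distance argument in the sketch above:

```python
import numpy as np

def minkowski(x, y, p=2):
    """p-norm distance; p=2 gives the Euclidean distance used above."""
    return float(np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** p) ** (1.0 / p))
```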

The main idea of ABC is to produce a good approximation to the posterior distribution by using a representative summary statistic $S$ with a small tolerance $\varepsilon$. Thus the goal is:

$$\pi_\varepsilon(\alpha | y) \approx \pi(\alpha | y)$$

Unfortunately, this class of methods often suffers from a low acceptance rate due to the rejection procedure. This is one of the main reasons to use the approach in combination with an MCMC algorithm.