2.4 Life-Cycle Theory, Duration Model and Data

2.4.2 Duration Model

Because retirement in Finland tends to be fully absorbing, a duration model was considered an appropriate framework for modelling retirement. This section presents the basics of the duration model. The presentation mainly follows Lancaster (1990), and all equations that are presented are standard.

62 Hakola (1999) tried to find the best value of k by comparing the likelihood values of regressions that were otherwise identical, except for the value of k. She found that the k value of one was more likely than greater values. Lower values of k were not tested.

63 See Hakola (1999) for a random effects probit model.

The Basic Model. Duration in this essay was defined as years of work after the year 1987.64 The year 1987 was neither a boom nor a bust year. Choosing a neutral starting year is good for an empirical analysis because the choice of the starting year has distributional implications.

The hazard function gives the instantaneous rate of leaving the state per unit of time at time t, given that the individual has not left before t. It is defined in equation 14.

$$
\theta(t) = \lim_{dt \to 0} \frac{P(t \le T < t + dt \mid T \ge t)}{dt}, \tag{14}
$$

where duration T is a random realisation.

Applied to retirement, the hazard function gives the probability of retirement, given that the individual has not yet retired.

As there were no prior beliefs about the shape of the baseline hazard, the baseline hazard was kept as flexible as possible.65 Yearly data favoured the use of a piecewise constant hazard function. The piecewise constant model assumes a constant hazard (that is, exponentially distributed durations) within each time period, so that the probability of retirement does not change within a year. Yearly observations would not enable detection of probability changes within the years anyway.66 Between the years, the piecewise constant model does not impose any restrictions on the hazard function.

The piecewise constant model is given in the equation group 15.

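$$
\theta(t) =
\begin{cases}
a_1, & 0 \le t < c_1 \\
a_2, & c_1 \le t < c_2 \\
\;\vdots & \\
a_M, & t \ge c_{M-1}
\end{cases}
\tag{15}
$$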

Applied to the estimated model, 0 refers to the year 1987, c1 to 1988, c2 to 1989, and so forth. The hazard of retiring is constant within a given year, whereas its variation from one year to the next is not restricted.
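The piecewise constant hazard can be illustrated with a minimal sketch. The function name `piecewise_hazard` and the numerical hazard levels are hypothetical and chosen only for illustration; they are not estimates from the essay.

```python
from bisect import bisect_right


def piecewise_hazard(t, cuts, rates):
    """Return the hazard at duration t.

    cuts  -- interval start points [0, c1, c2, ...] in increasing order
    rates -- hazard levels [a1, a2, ...]; rates[j] applies from cuts[j] onwards
    """
    if t < cuts[0]:
        raise ValueError("duration lies before the first interval")
    # bisect_right counts the cut points that are <= t, so the last of
    # them marks the interval that contains t.
    return rates[bisect_right(cuts, t) - 1]


# Hypothetical values: 0 = 1987, c1 = 1 (1988), c2 = 2 (1989).
cuts = [0.0, 1.0, 2.0]
rates = [0.05, 0.10, 0.20]

for t in (0.2, 0.9, 1.0, 2.7):
    print(t, piecewise_hazard(t, cuts, rates))
```

Durations 0.2 and 0.9 fall in the same year and therefore share one hazard level, while the level is free to change at each cut point.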

64 Another possibility is to arrange the data in such a way that duration is defined as duration at work after a certain age. See Hakola 2000a.

65 Because most of the retirements in Finland cluster in two age groups (around the age of 60 and the age of 65), the hazard function for the agewise duration is multimodal (see Hakola 2000a).

66 In practice, the non-parametric Cox model and the piecewise constant/exponential model

The likelihood function for the duration model is given in the equation group 16.

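$$
\begin{aligned}
L &= \prod_{i=1}^{n} f(t, a)^{(1-c)} \times S(t, a)^{c} \\
\ln L &= \sum_{i=1}^{n} \left[ (1-c) \times \ln f(t, a) + c \times \ln S(t, a) \right] \\
&= \sum_{i=1}^{n} \left[ (1-c) \times \ln\big( S(t, a) \times \theta(t, a) \big) + c \times \ln S(t, a) \right] \\
&= \sum_{i=1}^{n} \left[ (1-c) \times \ln \theta(t, a) + \ln S(t, a) \right] \\
&= \sum_{i=1}^{n} \left[ (1-c) \times \ln \theta(t, a) - H(t, a) \right],
\qquad \text{where } H(t, a) = \int_{0}^{t} \theta(u, a)\, du,
\end{aligned}
\tag{16}
$$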

where f is the density function, S is the survival function, c is the censoring indicator, t is the duration, θ is the hazard function, H is the integrated hazard, and a is the parameter vector {a1, a2, ..., aM} of the piecewise exponential hazard function.

The density function for those who retire is multiplied by the survival function for those who get censored (do not retire during the sample period). The equation group also shows how the log likelihood function can be converted into a combination of the hazard and the integrated hazard functions. (See Lancaster 1990.) The last line defines the integrated hazard.67
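The last line of the log likelihood can be sketched in code for a piecewise constant hazard. This is a minimal illustration in continuous time; the function names, hazard levels and spells below are hypothetical and do not come from the essay's data.

```python
import math


def hazard(t, cuts, rates):
    """Piecewise constant hazard: rates[j] applies from cuts[j] onwards."""
    j = max(i for i, c in enumerate(cuts) if c <= t)
    return rates[j]


def integrated_hazard(t, cuts, rates):
    """Integral of the piecewise constant hazard from 0 to t."""
    total = 0.0
    for j, rate in enumerate(rates):
        start = cuts[j]
        end = cuts[j + 1] if j + 1 < len(cuts) else math.inf
        if t > start:
            total += rate * (min(t, end) - start)
    return total


def log_likelihood(durations, censored, cuts, rates):
    """Sum over spells of (1 - c) * ln(theta) - H, with c = 1 if censored."""
    ll = 0.0
    for t, c in zip(durations, censored):
        ll += (1 - c) * math.log(hazard(t, cuts, rates))
        ll -= integrated_hazard(t, cuts, rates)
    return ll


# Hypothetical sample: two completed spells and one censored spell.
cuts = [0.0, 1.0, 2.0]
rates = [0.05, 0.10, 0.20]
durations = [0.5, 1.5, 2.5]
censored = [0, 0, 1]

ll = log_likelihood(durations, censored, cuts, rates)
print(ll)
```

A censored spell contributes only the negative integrated hazard (the log of the survival function), whereas a completed spell also contributes the log of the hazard at the exit time.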

Competing Risks and Right Censoring. Competing risks duration models apply to situations where there are several alternative end states to the transitions. In this paper, there are three types of pensions (disability pension, unemployment pension and old age pension) which are mutually exclusive.

Moreover, one of the ”states” can be right censoring - that is, the exit time of the individual is unknown because, for example, the sample finishes before the individual retires (or the individual ends up in another retirement scheme).

67 True to its name, the integrated hazard is merely an integral of the hazard function.

The multi-state framework is usually thought of as consisting of several latent durations, out of which only the shortest is observed. For each duration spell that has ended, the observation gives both the length of the duration and the state to which the individual exited. Competing risks models generally assume the independence of channels. In other words, the existence of one channel does not affect the use of another. The destination-specific transition intensity (in equation 17) is the probability that the duration ends and that the end state is state k, given that the duration lasted until time t:

$$
\theta_k(t) = \lim_{dt \to 0} \frac{P(t \le T < t + dt,\; D_k = 1 \mid T \ge t)}{dt}, \tag{17}
$$

where Dk is the channel-specific dummy, equal to one if a specific channel was chosen.

Because the assumption of the independence of the channels is not considered plausible, the substitutability of the channels is also assessed in this essay. Given the difficulties in implementing models that relax the independence assumption, the substitutability of the channels is assessed separately.

The probability that an individual leaves for destination k at time t is written in equation 18. The probability of survival until time t multiplied by the destination specific hazard is equal to the right hand side of the equation.

Assuming independence of the channels, the joint probability density function is simply a product of the marginal densities (specific to each channel).

Hence, the joint probability for the multi-state framework is written in equation 19.
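The two relations described in words above can be sketched as follows. The symbols f_k for the destination-specific density, S_j and H_j for the channel-specific survival function and integrated hazard, and K for the number of channels are assumptions of this sketch and may differ from the notation of equations 18 and 19.

$$
f_k(t) = S(t)\,\theta_k(t), \qquad
S(t) = \prod_{j=1}^{K} S_j(t) = \exp\Big\{ -\sum_{j=1}^{K} H_j(t) \Big\}
$$

$$
\text{contribution of a spell ending at } t: \quad \prod_{k=1}^{K} \theta_k(t)^{D_k}\, S_k(t)
$$

Under independence, a spell that exits to channel k contributes the hazard of that channel times the survival functions of all channels. A right-censored spell has D_k = 0 for every k and contributes the overall survival function S(t) only.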

Left Truncation. When the data are sampled from a stock, sampling of all individuals is not symmetric. Assume that we are interested in a scheme which has the minimum eligibility age of fifty-five years. We can therefore consider how 54-, 55- and 56-year-old individuals in 1987 would end up in our sample.

The 54-year-old individual is not eligible for our scheme. Hence, he would not be in our sample in 1987, but he could enter in year 1988. The 55-year-old is eligible immediately in 1987. He would therefore enter the sample immediately.

The 56-year-old was already eligible in 1986. If he had already retired then, he would not be in our sample. By contrast, if he did not retire in 1986, he would

late” is easily handled, because we are still observing the whole spell from the left. The problem of the 56-year-old is called left truncation, and it is harder to deal with.

Truncation from the left (sometimes also called left censoring) has distributional implications for the specification of the likelihood function. It actually leads to length-biased sampling.68 Accordingly, longer spells are more likely to be sampled. (The 56-year-old was in our sample only if he remained in the risk group for a longer time period.) Therefore, the likelihood contribution of a left-truncated spell has to be modified. This modified probability density function is given in equation 20.

$$
f_c(t) = \frac{t \times f(t)}{E(t)} = \frac{t \times f(t)}{\int u f(u)\, du} \tag{20}
$$

An unmodified probability density function is weighted by the spell length. If the spell is long, it is more likely to be sampled. Therefore, the distribution of the spells (f) is multiplied by the spell length (t). This is re-scaled by the expected spell length (E(t)). Hence, truncated spells that are longer than the average get more weight, and the spells that are shorter than the average get less weight in the modified likelihood function.
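The re-weighting can be illustrated with a small discrete example. The spell lengths and probabilities are hypothetical, and the function name `length_biased` is introduced only for this sketch.

```python
def length_biased(lengths, probs):
    """Weight each probability by its spell length and rescale by the mean."""
    mean = sum(t * p for t, p in zip(lengths, probs))
    return [t * p / mean for t, p in zip(lengths, probs)]


# Hypothetical distribution of spell lengths (in years); its mean is 2.
lengths = [1, 2, 3, 4]
probs = [0.4, 0.3, 0.2, 0.1]

weighted = length_biased(lengths, probs)
for t, p, w in zip(lengths, probs, weighted):
    print(t, p, round(w, 3))
```

In this example the spell of length 1, which is shorter than the mean, loses weight (from 0.4 to about 0.2), the spells of lengths 3 and 4 gain weight, and the spell whose length equals the mean keeps its weight.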

Proportional Hazard Model with Time-Varying Covariates. In a proportional hazard model, the effect of the covariates comes through multiplication of the hazard function. For identification, a ”typical individual” is defined to have a baseline hazard function. Hazard functions of the other individuals are then compared with this baseline hazard function. In other words, each hazard function is proportional to the baseline hazard (multiplied by function k). This is given in equation 21, where θ(t, a|x) gives the baseline hazard function.

$$
\theta(t, a, x) = k(x) \times \theta(t, a \mid x) \tag{21}
$$

The most typical choice for the function k(x) is exp(-xβ). This function fulfils the non-negativity constraint, and is log-linear in the parameters. Correspondingly, the exponential of an estimated coefficient produces a hazard ratio with a proportional interpretation: it is the proportional increase or decrease in the exit probability of an individual with the specific characteristic, relative to the exit probability of the individual with the baseline hazard.

68 Accordingly, the probability that a spell is sampled is proportional to its length.

Time-varying covariates (explanatory variables that change over time) enter the proportional hazard model in the same way as time-constant covariates.

The conditioning, however, is done on the entire path of the covariate up to the specific date. As time-varying covariates are also observed yearly in my data, their time-invariance within each year makes their inclusion in the piecewise constant model straightforward. Both the hazard function and the time-varying covariates (as well as the time invariant covariates) are constant within a year.
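A minimal sketch shows how yearly covariates combine with a piecewise constant baseline. It follows the text's choice k(x) = exp(-xβ). The baseline levels, the covariate path and the coefficient are hypothetical, and each interval is assumed to last exactly one year.

```python
import math


def k(x, beta):
    """Proportionality factor, following the text's choice k(x) = exp(-x*beta)."""
    return math.exp(-sum(xj * bj for xj, bj in zip(x, beta)))


def yearly_hazards(baseline, covariate_path, beta):
    """Hazard in each year: baseline level times k(x) for that year's covariates."""
    return [a * k(x, beta) for a, x in zip(baseline, covariate_path)]


def survival(baseline, covariate_path, beta):
    """Probability of surviving all listed years (each interval lasts one year)."""
    return math.exp(-sum(yearly_hazards(baseline, covariate_path, beta)))


# Hypothetical inputs: one covariate that switches from 0 to 1 in the third year.
baseline = [0.05, 0.10, 0.20]
path = [[0.0], [0.0], [1.0]]
beta = [0.5]

hazards = yearly_hazards(baseline, path, beta)
print(hazards)
print(survival(baseline, path, beta))
```

Because both the baseline and the covariates are constant within a year, the integrated hazard reduces to a plain sum over years, and the ratio of the third-year hazard to its baseline level equals exp(-β).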

Unobserved Heterogeneity. Unobserved heterogeneity refers to the unobserved determinants that vary among the individuals (groups of individuals).

The most common reason for unobserved heterogeneity is the omitted variable.69 In duration models, omitted variables bias all estimates (both the estimated duration dependence and the coefficients of the other explanatory variables).

Unobserved heterogeneity can be taken into account in the estimations. It is included proportionally to the hazard function - as was the observable heterogeneity. (See the previous section.) Equation 22 gives the conditional hazard function - conditioned by the unobserved characteristics (v).

$$
\theta(t, a \mid v) = v \times \theta(t, a) \tag{22}
$$

As we can see, the unobservable characteristics simply multiply the hazard function. Using the relationship between the survival function and the integrated hazard, I can write (23).

69 Other reasons being errors in recorded duration or errors in the recorded regression variables.

The proportional hazard (proportional to the unobservables) is inserted into the integrated hazard. The following line is derived from the relation between the hazard, the probability density function and the survival function. To obtain equation 24, I used the equality f(u) = −S'(u) and the rules of integration. The conditional survival function (conditional on v) is therefore equal to the unconditional survival function raised to the power of the unobservable variable. To get the conditional probability density function (in equation 25), I again use the relationship between the probability density function and the negative derivative of the survival function (f(u) = −S'(u)).

$$
\begin{aligned}
f(t, a \mid v) &= -S'(t, a \mid v) \\
&= v f(t, a) \{S(t, a)\}^{v-1}
\end{aligned}
\tag{25}
$$
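The chain from the conditional hazard to the conditional density can be sketched in one place. This is a restatement under the standard relations θ = f/S and f = −S', with H(t, a | v) as an assumed symbol for the conditional integrated hazard; the intermediate steps are not necessarily laid out exactly as in the author's equations 23 and 24.

$$
\begin{aligned}
H(t, a \mid v) &= \int_{0}^{t} v\, \theta(u, a)\, du
 = v \int_{0}^{t} \frac{f(u, a)}{S(u, a)}\, du
 = -v \ln S(t, a), \\
S(t, a \mid v) &= \exp\{-H(t, a \mid v)\} = \{S(t, a)\}^{v}, \\
f(t, a \mid v) &= -\frac{\partial}{\partial t} \{S(t, a)\}^{v}
 = v \{S(t, a)\}^{v-1} \big(-S'(t, a)\big)
 = v f(t, a) \{S(t, a)\}^{v-1}.
\end{aligned}
$$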

Since v is unobservable, it has to be integrated out of the formula. Denoting by g(v) the probability density function for v, the probability density function for t with the unobserved heterogeneity is

$$
\begin{aligned}
f_{uh}(t) &= \int_{0}^{\infty} f(t, a \mid v)\, g(v)\, dv \\
&= \int_{0}^{\infty} v f(t, a) \{S(t, a)\}^{v-1} g(v)\, dv
\end{aligned}
\tag{26}
$$

Because of its mathematical convenience, the gamma distribution is often used for the distribution of the unobserved heterogeneity (also in this essay).

The probability density function for the gamma distribution is

$$
g(v) = \frac{v^{a-1} e^{-v/b}}{\Gamma(a)\, b^{a}}. \tag{27}
$$

Its mean is normalised to 1 and its variance is φ (that is, a = 1/φ and b = φ).

Inserting the gamma distribution into equation 26 and integrating, I can derive the survival function for the model with the unobserved heterogeneity70,

$$
S_{uh}(t, a) = \left[ 1 - \varphi \ln\{S(t, a)\} \right]^{-1/\varphi}. \tag{28}
$$

Hence, the survival function with unobserved heterogeneity is the survival function without heterogeneity, S(t, a), modified by the mean (normalised to one) and the variance (φ) of the unobserved heterogeneity distribution.
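The gamma mixture can be checked numerically. The sketch below assumes a gamma distribution with mean 1 and variance φ (shape 1/φ, scale φ) and compares a numerical integral of S(t)^v g(v) with the standard closed form (1 + φH)^(−1/φ), where H = −ln S(t) is the integrated hazard. The function names and the values of H and φ are hypothetical.

```python
import math


def gamma_pdf(v, shape, scale):
    """Gamma density v^(shape-1) * exp(-v/scale) / (Gamma(shape) * scale^shape)."""
    log_pdf = ((shape - 1) * math.log(v) - v / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def mixed_survival_numeric(H, phi, n=60000, v_max=60.0):
    """Midpoint-rule integral of exp(-v*H) * g(v) over v in (0, v_max)."""
    shape, scale = 1.0 / phi, phi  # mean 1, variance phi
    step = v_max / n
    total = 0.0
    for i in range(n):
        v = (i + 0.5) * step
        total += math.exp(-v * H) * gamma_pdf(v, shape, scale) * step
    return total


def mixed_survival_closed(H, phi):
    """Closed form of the gamma mixture: (1 + phi*H)^(-1/phi)."""
    return (1.0 + phi * H) ** (-1.0 / phi)


H, phi = 0.7, 0.5  # hypothetical integrated hazard and heterogeneity variance
numeric = mixed_survival_numeric(H, phi)
closed = mixed_survival_closed(H, phi)
print(numeric, closed)
```

As φ approaches zero the closed form approaches exp(−H), the survival function without unobserved heterogeneity, which is a quick sanity check on the sign of the exponent.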