Model for previously published data

3. The model

3.2. Observation models

3.2.3. Model for previously published data

(25)

3.2.3. Model for previously published data

The previously published datasets (appendix C.) of observed mean smolt ages (Χmean age) are fed into the model as observations of the weighted average of smolt ages (µ _{mean age}) in every given river.

The confidence intervals reported by L’Abée-Lund et al. (1989) are also included into the analysis as variation in the observation processes. Thus Χ Smolt mean age (formula 26) can be seen as a normally-distributed variable with an expected value µ Smolt age (formula 27)and standard deviation σ Χ Smolt mean age (formula 28): For rives, where the 95 % confidence interval was not reported and the study rivers σ Χ mean age was given a flat prior between 0 and 1 (formula 29):

(29)

This parameterizing forms a hierarchical link from the previous datasets, into the population dynamics model. Since the µ_r parameter in the smoltification model is assumed to correlate with latitude, the model is able to adjust the peak age of smoltification based on the observed mean age in every given river. If mortality was assumed to be zero, then:

40 4. Results

4.1. Model checking with simulated data

In order to validate that the model works as intended it was run with an artificial 16 year dataset, created by running a fully deterministic version of the model. In the model version used for model checking the relationship between smoltification and latitude (see section 3.3) was ignored and the models for Petersen-Lincoln tag-recapture and historical observations were left out.

The priors for catchabilities in the observation models were assigned with wide beta distributions, with expected value of 0.2. The x-parameter controlling the amount of new-born individuals was given a wide log.normally distributed prior with mode values corresponding to the fixed values used for data simulation (40 and 30), and standard deviations corresponding to 24.7 on the original scale. The idea was to give these key-parameters uninformative priors, so that the model would have to learn them almost only based on the information gained from the observations.

The MCMC simulation using the simulated data was first run with two chains for 5.0x10⁵ iterations per chains, while saving every 1 000^th sample. This took approximately 66 minutes on Windows 7 computer, fitted with 3.4 GHz Intel i7-4770 processor and 32 GB of memory. The chains seemed to be somewhat mixed, but heavily autocorrelated (figure 14).

Figure 14. Convergence diagnostics catchability after 5.0x10⁵ iterations.

The MCMC simulation was re-run with two chains, now for 2.0 x10⁷ iterations per chain while saving every 20 000^th sample, resulting in a sample size of 1000 iterations. This took approximately 45 hours on Windows 7 computer, fitted with 3.4 GHz Intel i7-4770 processor and 32 GB of

memory.

First 5.0 x10⁶ iterations were discarded in order to eliminate the burn-in phase of the simulation.

After this the simulation was deemed as converged, even though the running means of the two chains were not completely converged in some parameters. The simulation seemed to struggle especially with the x - parameter controlling the 0+ parr densities in the river and the catchability (w ) parameters in the electrofishing model (figures 15 and 16).

Figure 15. Convergence diagnostics and posterior density of x - parameter, depicting slightly different running means (center right graph) in the two simulation chains.

Figure 16. Convergence diagnostics and posterior density of w - parameter, depicting slightly different running means (center right graph) in the two simulation chains.

The convergence diagnostics for these parameters also suggested that the chains were still

somewhat autocorrelated (top right graph in figures 15 and 16). This was interpreted as a result of inefficient sampling by the algorithms used by JAGS. Further simulation with higher thinning values would have significantly increased the time needed for simulation. Since the posterior distributions were already centered near the true values used for data creation and the differences between the running means were considered small, further simulation was thought to be

unnecessary.

Figures 17-20 show the prior and the posterior distributions of the target variables. Red lines indicate the true known value of the parameter, used to simulate the data. In the graphs the solid black lines are the mean, and the dashed lines the median values of the posterior distribution (invisible lines indicate that the values are equal).

Figure 17. Prior (blue) and posterior distributions (grey) of smolt trapping catchability. Vertical lines indicate posterior mean (black), posterior median (dashed) and the true value (red).

Figure 18. Prior (blue) and posterior distributions (grey) of electrofishing catchability. Vertical lines indicate posterior mean (black), posterior median (dashed) and the true value (red).

Figure 19 Prior (blue) and posterior distributions (grey) of 0+ parr density. Vertical lines indicate posterior mean (black), posterior median (dashed) and the true value (red).

Figure 20 Prior (blue) and posterior distributions (grey) of σ and τ. Vertical lines indicate posterior mean (black), posterior median (dashed) and the true value (red).

Pictures 21 and 22 show box-plot graphs for the posterior distributions of the total amount of smolts and parrs. The red line indicates the true amounts obtained from the deterministic model. Dots are simulated values grated than 3/2 times the upper quartile, interpreted as outliners by R’s boxplot

47 function. These values are not actual outliner.

Figure 21. Posterior distributions of total size of the parr stock at the beginning of each year. Red line indicates the true value obtained from the deterministic model. Dots represent simulated values 3/2 times higher than the upper quartile, interpreted as outliners by the plotting function. The whiskers represent lowest and highest values, excluding outliners.

Figure 22. Posterior distributions of total yearly smolt run. Red line indicates the true value obtained from the deterministic model. Dots represent simulated values 3/2 times higher than the upper quartile, interpreted as outliners by the plotting function. The whiskers represent lowest and highest values, excluding outliners.

4.2. Analysis of the River Pirita and River Ingarskila datasets

The trout stocks of the study rivers was simulated for a 16 year period (1999-2014) with

electrofishing data from 2005 to 2013 for River Pirita and 2006 to 2013 for River Ingarskila, and smolt trapping data from 2009 to 2013 for River Pirita, and 2012 to 2013 for River Ingarskila. The

trout stocks of the 41 rivers providing the previously published observations were simulated for a 16 year period, without electrofishing and smolt trapping observations.

The River Pirita fyke net smolt catch of 2014 was omitted from the analysis. In order to assess the predictive performance of the model the simulated observations for 2014 were compared with the observed true data.

In order to help the reader understand the connections between different variables and better interpret the results presented in the following chapters, a graphical summary of the model is provided in appendix A.

4.2.1. MCMC simulation

The MCMC simulation was run with two chains for 1.0 x10⁶ iterations while saving every 1 000^th sample, resulting in a sample size of 1 000 iterations per chain. This took approximately 96 hours on Windows 7 computer, fitted with 3.4 GHz Intel i7-4770 processor and 32 GB of memory. A burn-in phase was evident in some parameters (bottom graphs of figures 23 and 24), because of which the first 40 % of the iterations were discarded, resulting in a sample of 600 iterations per chains.

Figure 23. Simulation diagnostics of parameter alpha_pii_Pirita.

Figure 24. Simulation diagnostics of alpha_p.

51 Figure 25. Simulation diagnostics of peak_p_Pirita.

4.3.2. Posterior densities

Figures 26 and 27 show the posterior densities for the estimated yearly amounts of brown trout smolts migrating downwards from both study rivers. The orange lines mark the credible intervals of the estimates (Gelman et al. 2014). The smolt run estimates in the beginning of the series (1999-2000) were extremely high in River Pirita and extremely low in River Ingarskila, due to the high uncertainty. The highest simulated value in the time series was 119092for in both rivers.

In River Pirita between the years 2001 and 2011 the estimates for the smolt run are somewhat leveled between 300 and 400 individuals per year. Between 2011 and 2014 the numbers of downward migrating smolts to drop between 200 and 100 individuals.

In River Ingarskila the trend seems to be reversed at first, with initial smolt numbers staying between 200 and just under 100 individuals. Between 2001 and 2003 there is very high uncertainty in the smolt run estimates. Between 2004 and 2010 the estimates stabilize between circa 400 and 300 individuals per year. In 2011 and 2012 there is a sharp drop, with 95 % intervals ranging

between circa 150 and almost 0 individuals. In 2013 and 2014 the number smolts is estimated to be around 100 and 200 individuals.

Figure 26. 200 draws from the posterior density of yearly smolt run in River Pirita.

Figure 27. 200 draws from the posterior density of yearly smolt run in River Ingarskila.

Figure 28 show the posterior distribution for the density 0+ trout parrs per 100 m² in both study rivers. The yearly mean values of 0+ parr density varied between 22.14and 48 individuals per 100 m² in River Pirita and between 22 and 48 in River Ingarskila.

Figure 28. Posterior distribution of yearly 0+ trout parrs in the autumn period in River Pirita and Ingarskila.

Figure 29 shows the prior (red lines) and posterior (blue lines) densities of ρ, the variable describing trout’s affinity to migrate to sea at certain age in both study rivers, in the absence of mortality. In river Ingarskila the highest probability seems to be concentrated around age 3, and around age 4 in River Pirita. There is visibly more uncertainty remaining in the posterior distribution of River Ingarskila.

Figure 29.Prior and posterior distributions for age-specific probability of smoltification (ρ) in the absence of mortality in study rivers.

Figure 30 shows the prior (red lines) and posterior (blue lines) densities of π, the variable used for estimating age-specific yearly survival probabilities. In river Pirita the probabilities are concentrated around 0.2 for newborn parrs, with relatively small uncertainty. Yearly survival probabilities rise as age increases and are increasingly more uncertain after age 3. In river Ingarskila the probability estimates are more uncertain for age 0 parrs and the slope of the logistic regression lines seem to be steeper than in River Pirita. For fish aged 6 and higher the probability estimates seem to approach 1.

Figure 30.Prior and posterior distributions for age-specific yearly survival probability estimate (π) in the absence of smoltification in both study rivers.

Figures 31 and 32 show the resulting combinations of π and ρ parameters, used to calculate the actual estimates for survival and smoltification probabilities in formulas 5 and 7. In river Ingarskila the highest probability of smoltification seems to be concentrated around age 3, and around age 4 in River Pirita. There is visibly more uncertainty remaining in the posterior distribution of River Ingarskila. Both posteriors resemble the posterior distributions ρ.

Figure 31. Posterior distributions for age-specific probability of smoltification (ρπ^1/4) in study rivers.

The age-specific survival probability P(survival) is the joint probability of survival and not-smoltifying. In other words it can be interpreted also as the probability of staying in the river another year. In River Pirita there is very little uncertainty for ages 1 to 3 and a very clear drop at age 3, after which probability increases. In River Ingarskila the posterior distribution of P(survival) is more leveled and contains more uncertainty at all ages. Posterior distributions in both rivers resemble an inverted version of P(smolt).

Figure 32. Posterior distributions for age-specific probability of survival ((1-ρ)π^1/4) in study rivers.

Figures 33 – 35 represent the prior (red lines) and posterior (blue lines) densities of parameter µ, σ and τ in study rivers. Parameter µ, which determines the location of the probability peak in

parameter ρ, seems to have stabilized around age 3 and 3.5 in River Pirita and around age 4 and 4.5 in River Ingarskila. The posterior distribution is much narrower in River Pirita.

Figure 33. Prior (red line) and posterior (blue line) distributions for parameter µ study rivers.

Parameter σ determines the range of ages, where individual trout can become migratory. In River Pirita the posterior distribution σ is quite narrow and centered approximately around 1.5. In River Ingarskila the distribution is significantly wider, indicating much higher uncertainty. The peak of the posterior distribution is located near 3.

Figure 34. Prior (red line) and posterior (blue line) distributions for parameter σ study rivers.

Parameter τ controls maximum height of parameter ρ and on a population scale it can be interpreted as the proportion of population that migrates out to sea, in the absence of mortality. In River Pirita posterior distribution of τ is located close to 1 and centered around 0.95. In River Ingarskila the posterior distribution is much wider and contains more uncertainty. Posterior mean value is close to 0.45.

Figure 35 Prior (red line) and posterior (blue line) distributions for parameter τ study rivers.

Figures 36 and 37 show posterior (blue lines) and prior (dashed red lines) of parameter απ and βπ in study rivers respectively. Parameter απ determines the intercept term for the logistic regression curve used to calculate the values for yearly probability estimate for survival (π). Parameter βπ determines the slope of the curve. The posterior distribution απ is centered approximately around -1 in River Pirita and around 0 in River Ingarskila. Posterior distribution of απ contains more

uncertainty in River Ingarskila, than in River Pirita. Posterior distributions of parameter βπ contain similar amounts uncertainty in both rivers. River Pirita’s posterior is slightly higher in density and its mean value is smaller than the posterior mean in River Ingarskila.

Figure 36. Prior (red line) and posterior (blue line) distributions for parameter απ study rivers.

Figure 37. Prior (red line) and posterior (blue line) distributions for parameter βπ study rivers

Figures 38 and 39 show the posterior densities of the catchability parameters in both observation models. Catchability in smolt trapping experiments (figure 38) seems to vary between 0 and 0.5 in both rivers throughout the time series.

Electrofishing catchability estimates vary between fish ages in both rivers (figure 39). In River Pirita there is a clear notch in catchability in age 6.This is also seen in River Ingarskila at age 2 and 8. Total range of catchability estimates in both rivers is between almost 1 and just under 0.1.

Figure 38. Posterior distributions of yearly catchability estimates in smolt trapping experiments.

Figure 39. Posterior distributions of age-specific catchability estimates in electrofishing experiments. Dashed red line indicates the posterior mean.

4.2.3. Model fit and predictive performance

The probabilities of two electrofishing catchability models was attempted to calculate using Bayesian model averaging (Gelman et al. 2014). The MCMC sample of posterior model

probabilities, produced by JAGS sampling algorithms, consisted of only probability of 1 for model 1 and probability of 0 for model 2.

Model’s ability to fit the data was assessed using Bayesian p-values (Gelman et al. 2014). The step function used to calculate the p-values was specified so that it would return 1, when the observation simulated by the model was larger than the true observation. This results in p-values larger than 0.5,

when the model overestimates the amount of fish caught in given sampling procedure, and vice versa. Mean p-values of exactly 0.5 best possible model fit (Gelman et al. 2014).

Bayesian p-values for both types of data collection in rivers Ingarskila and Pirita are plotted in figures 40-43, using a “caterpillar plot”. The red dots connected by a dashed line indicate the arithmetic mean of yearly p-values and the blue dots indicate median values. Figure 44 shows the model’s prediction for the amount of smolts to be caught in River Pirita spring 2014. The orange lines indicate the 95 % credible intervals of the prediction and the dashed red line indicate the true observed amount of individuals.

Bayesian p-values were mostly higher than 0.8 in all observations in both rivers. The mean and median values for the predicted amount of smolts captured in River Pirita 2014 were 4.53and 3 individuals. Maximum simulated value was 51. True observed number of individuals was 52 (orange line in figure 44).

Figure 40. Bayesian p-values for yearly smolt trapping catch in river Pirita.

Figure 41. Bayesian p-values for yearly smolt trapping catch in river Ingarskila.

Figure 42. Bayesian p-values for yearly electrofishing catch in river Pirita.

Figure 43. Bayesian p-values for yearly electrofishing catch in river Ingarskila.

Figure 44. Boxplot graph depicting the model’s prediction for River Pirita smolt catch of spring 2014 and the true observed value (dashed red line). Dots represent simulated values 3/2 times higher than the upper quartile, interpreted as outliners by the plotting function. The whiskers represent lowest and highest values, excluding outliners.

In order to study the model’s ability to learn the population dynamics of the trout population, the age-distribution of smolts caught in spring 2014 in River Pirita was predicted in the simulation.

Figure 45 shows the true age-distribution of the smolts caught in spring 2014 and the model’s prediction for it, without observing the actual data.

Figure 45. Observed and predicted age-distribution of trout smolts caught in River Pirita smolt trapping spring 2014.

4.2.3. Analysis of previously published data

Information about the mean age of downward migrating trout smolts was searched from literature and used as data for linking the smoltification model to latitudinal variation. The data and

publications used in the analysis are compiled in appendix C.

Figure 46 shows the reported mean age observations (grey dots), the model’s estimates for mean age of smolts, given possible observation error (red dots), and the posterior distribution parameter µ plotted against latitude. Peak age (µ) is the parameter defining the most probable age of

smoltification for trout parrs in different rivers, in the absence of mortality (see section 3.1.2. and formula 15).

The observations and mean age estimates seem to overlap in the Norwegian rivers and the two study rivers, but somewhat differ in the eight Polish and Lithuanian rivers. This is due to the

unknown sample size in these rivers (see formula 29). The values of parameter µare higher than the mean ages of smolts reported in literature (appendix C).

Figure 46. Posterior distributions of µ in relation to latitude and the mean ages of smolts derived from previous publications.

72 5. Discussion

5.1. Model checking

According to the posterior distribution diagnostics of the model checking simulation, the model seemed to function as intended and was able to learn the true values of the model parameters from the simulated dataset. Significant update of information is evident, when comparing the prior and the posterior distributions in figures 17 - 20. Simulation of the model was relatively slow and even after using thinning values as high as 20 000, some autocorrelation was detected in the simulation chains.

It is noteworthy that the model used for model checking was simpler than the model used for analysis, and most importantly the model included only one river, whereas the model used for data-analysis included a total of 43 rivers. This increases the amount of computation needed 43-fold for some parameters in the data-analysis model, meaning that obtaining similar MCMC samples from the posterior distributions, could require 43 times more computation time on a similar computer.

5.2. Posterior densities

5.2.1. Population dynamics parameters

The posterior densities for parameters µ, σ and τ in the study rivers show an update from prior distribution (figures 33 -35). Posteriors for River Pirita have visibly less uncertainty in River Pirita, than in River Ingarskila and are seemingly less affected by the prior distribution. The same is true for the posterior distributions of parameter ρ and the P(smolt) (joint probability ρ and the 4^th root π).

These results correspond to the larger dataset in River Pirita. In river Ingarskila the highest

probability of smoltification seems to be concentrated around age 4, whereas trouts in River Pirita seem to be most prone to migration at ages 3. The amount of uncertainty in the posterior

distribution is however much higher in river Ingarskila. When comparing the posteriors of P(smolt) and ρ it seems that the highest point in the curves is lower in P(smolt) than in ρ (figures 29 and 31).

This is expected as the effect of mortality is included in the values of P(smolt).

These results suggest that the model functioned as intended, and was able to learn these parameters from the dataset.

Parameters απ and βπ controlling the survival probability model seem more uncertain than the parameters of the smoltification model (figures 36 and 37). Especially the posteriors of the slope parameter βπ seem to be highly affected by the prior used. This is probably due to lower uncertainty assigned for the prior distribution that was needed to maintain the positive correlation between fish age and survival (see section 3.1.1.). The intercept parameter απ was assigned with a prior

containing much more uncertainty and a significant update is evident in the posterior distributions of both rivers. The logistic survival probability curve (π) created by the two parameters also show an update from prior distribution, but the two study rivers seem to differ from each other (figure

In document A Bayesian multiriver stock assessment model for describing the juvenile phase of anadromous brown trout (Salmo trutta L.) (sivua 40-0)