
The primary experiment in this thesis is to take a portion of the historical data, train the Prophet model with it, and predict arrivals one day ahead. If implemented in a production setting, this prediction horizon would give ED management time to process the forecasts and act upon the information, both of which are required in order to draw any benefit from the provided predictions.

The dataset is split into a training set (the first ~1200 days) and a test set (the last ~400 days). As their names suggest, the training set is used to train the model and the test set to evaluate it. In machine learning, it is very important to keep the training and test sets separate, because evaluating the model on the same data it was trained on hides overfitting. Overfitting means that the model fits a certain dataset "excessively well", and when a new, previously unseen dataset is introduced to the model, it performs poorly [27].
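As a minimal sketch of this chronological split (the dataframe name and the exact split index are assumptions based on the description above):

```python
import pandas as pd

def split_train_test(daily_df: pd.DataFrame, n_train: int = 1200):
    """Split a daily series chronologically: the first n_train days are used
    for fitting, the remaining days only for evaluation."""
    train_df = daily_df.iloc[:n_train]   # first ~1200 days
    test_df = daily_df.iloc[n_train:]    # last ~400 days
    return train_df, test_df
```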

The other experiments are hyperparameter tuning and finding the variables that the model considers relevant for improving the results. Such variables are, for instance, weather, Tampere University Hospital website visits and holidays. These topics are discussed more extensively in Sections 4.2 and 4.3.

4.1 Data

The dataset contained hourly ED occupancy data from Tampere University Hospital ED Acuta; over 35 000 hours of data were available from May 2015 to September 2019. In addition to the hourly arrivals, the dataset also contained several independent variables, such as national holidays, several weather variables and the availability of hospital beds in several local hospitals.

Initially, the dataset needed some preprocessing because it was in an hourly and fairly raw format. For example, NaNs (Not a Number) needed to be replaced with zeros and Boolean values with zeros and ones, respectively. In terms of required computing power, it is much lighter to train the models when the data is in a daily format rather than an hourly one. The Prophet API itself requires the data to have exactly two columns with specific names, ds and y, which represent the timestamp and the forecast target. The dataset did not have a timestamp column ready, but this was quite easy to create using the Pandas dataframe.index.
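The sketch below illustrates this kind of preprocessing; the file name and column names are hypothetical placeholders, and the actual implementation can be found in the thesis source code [28].

```python
import pandas as pd

# Hypothetical file and column names, used for illustration only.
hourly = pd.read_csv("acuta_hourly.csv", parse_dates=["time"], index_col="time")

hourly = hourly.fillna(0)                              # replace NaNs with zeros
bool_cols = hourly.select_dtypes("bool").columns
hourly[bool_cols] = hourly[bool_cols].astype(int)      # Booleans -> 0/1

# Aggregate the hourly arrivals into daily counts to keep training light.
daily_arrivals = hourly["arrivals"].resample("D").sum()

# Prophet expects exactly two columns: ds (timestamp) and y (forecast target).
prophet_df = pd.DataFrame({"ds": daily_arrivals.index, "y": daily_arrivals.values})
```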

Training the model with the dataset (without any additional variables) and predicting the future with a rolling origin (see Section 4.4) takes around 7 minutes on an Intel Core i5-7200U CPU. However, when additional variables are included, the computing time grows proportionally. For instance, when all of the available variables were used, the computing time rose to ~2 hours. The source code for this Prophet model implementation can be found on GitHub [28].

4.2 Hyperparameter tuning

In machine learning, hyperparameters are higher-level parameters that are set before training the model, and they are tuned based on the features of the dataset [29]. The Prophet model has 16 different hyperparameters, but according to the documentation only 4 of them are recommended for tuning [30]. These four hyperparameters are listed below, followed by a short sketch of how they are set.

1. Changepoint prior scale. This parameter determines the trend flexibility and how the trend changes at the trend changepoints.

2. Seasonality prior scale. Like Changepoint prior scale, this determines the flexibility of the seasonality. Larger values allow the model to fit larger seasonal fluctuations.

3. Holidays prior scale. This specifies the flexibility of the holiday effects and is similar to Seasonality prior scale.

4. Seasonality mode. This parameter defines whether the seasonality is additive or multiplicative. For instance, if the seasonal fluctuation grows along with the trend, the multiplicative mode may be considered.
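As a rough sketch, these four parameters are passed directly to the Prophet constructor; the values below are Prophet's defaults and merely illustrative, not the tuned values of this thesis.

```python
from prophet import Prophet  # in older installations the package is named fbprophet

model = Prophet(
    changepoint_prior_scale=0.05,   # trend flexibility at the changepoints
    seasonality_prior_scale=10.0,   # flexibility of the seasonal components
    holidays_prior_scale=10.0,      # flexibility of the holiday effects
    seasonality_mode="additive",    # or "multiplicative"
)
```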

Table 4.1 shows all of the considered hyperparameters for the tuning.

Table 4.1. Different hyperparameter values.

There are many ways to optimize these hyperparameters, such as grid search, random search and Bayesian optimization [31]. Prophet has a built-in method for hyperparameter tuning called parallel cross validation, and this method is used in this thesis.
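In the Prophet library this corresponds to the cross_validation function in prophet.diagnostics with its parallel argument. The sketch below shows a grid search built on top of it; the parameter grid and the initial/period/horizon window sizes are illustrative assumptions, not the exact settings of this thesis.

```python
import itertools
from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics

# Illustrative grid; the candidate values of this thesis are listed in Table 4.1.
param_grid = {
    "changepoint_prior_scale": [0.01, 0.1, 0.5],
    "seasonality_prior_scale": [0.1, 1.0, 10.0],
    "holidays_prior_scale": [0.1, 1.0, 10.0],
    "seasonality_mode": ["additive", "multiplicative"],
}

def tune(train_df):
    """Grid search over the four hyperparameters using Prophet's parallel
    cross validation; train_df has columns ds and y."""
    results = []
    for values in itertools.product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        m = Prophet(**params).fit(train_df)
        df_cv = cross_validation(m, initial="730 days", period="30 days",
                                 horizon="1 days", parallel="processes")
        mae = performance_metrics(df_cv)["mae"].mean()
        results.append((params, mae))
    return min(results, key=lambda r: r[1])   # parameters with the lowest MAE
```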

4.3 Variable selection

In time series forecasting, finding the variables which the model considers important for the results is an interesting task. The variable groups considered in this thesis are listed in Table 4.2.

Table 4.2. Variables and their definitions.

Variable name | Definition
Hospital beds available | The number of free hospital beds in health centers around Pirkanmaa.
Weekday | E.g. Monday.
Month | E.g. January.
Days around holiday | Describes whether a day is a holiday, the day after a holiday or the day before a holiday.
Holiday name | The name of the holiday.
Is working day | Whether the day is a working day.
Weather | Several weather variables.
Website visits | Number of website visits to tays.fi and tays.fi/acuta.
Ekströms visits | Number of website visits to tays.fi between 6 PM and midnight divided by the rest of the visits.
Google trends Acuta per month normalized | Number of Google searches for "Acuta", normalized for each month.
Number of public events per day | E.g. festivals held in Pirkanmaa.
Daily peak occupancy | The peak ED occupancy of the day.

Overall, there are 12 variable groups, which can be combined in 4096 different ways. The number of combinations is given by

combinations = \sum_{k=0}^{12} \binom{12}{k} = 2^{12} = 4096,

since all combinations are considered (for instance, one combination is using no variables at all) and the order of the variables is irrelevant.

Trying out all of these combinations would be too exhaustive, so the variables are chosen in a simpler way. First, the model is trained with all of the variables; this gives a reference error value against which the later results are compared. After that, the model is fitted 12 times with 11 variables, so that a different variable is dropped each time. This procedure reveals which variables increase or decrease the prediction error; a sketch of it is given below.
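In the sketch, the variable column names are hypothetical placeholders for the 12 variable groups of Table 4.2, additional variables are attached to Prophet with add_regressor, and the error is measured with the same rolling-origin cross validation as in Section 4.2 (window sizes again illustrative).

```python
from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics

def fit_and_score(train_df, regressors):
    """Fit Prophet with the given extra regressor columns and return the
    mean absolute error from a rolling-origin cross validation."""
    m = Prophet()
    for name in regressors:
        m.add_regressor(name)
    m.fit(train_df[["ds", "y"] + list(regressors)])
    df_cv = cross_validation(m, initial="730 days", period="30 days", horizon="1 days")
    return performance_metrics(df_cv)["mae"].mean()

def rank_variables(train_df, all_vars):
    """Train with all variable groups, then drop one group at a time and
    report how the error changes relative to the full model."""
    baseline = fit_and_score(train_df, all_vars)
    for dropped in all_vars:
        remaining = [v for v in all_vars if v != dropped]
        mae = fit_and_score(train_df, remaining)
        print(f"without {dropped}: MAE {mae:.2f} (baseline {baseline:.2f})")

# Hypothetical column names standing in for the variable groups of Table 4.2:
# rank_variables(train_df, ["hospital_beds", "weekday", "month", "website_visits"])
```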

4.4 Model validation

As discussed before, machine learning datasets are usually split into training and test sets. The model is fit to the training set and validated on the test set. However, in time series forecasting, testing the model once on a "fixed origin" can give a misleading picture of its performance, because if the data contains outliers, a poor model may appear to perform better when a fixed origin is used. Hence, a "rolling origin" should be used when fitting time series data. [32]

The idea of rolling origin is very simple, and Figure 4.1 visualizes it well. Initially, the model is fit to the training set, which is 15 days in the figure. Then a prediction is made for the next three days. After that, the model is fit again, but this time the training set is 16 days. This process is repeated until the whole dataset is covered. [32]

Figure 4.1. The basic principle of rolling origin with constant holdout size [32].

While the figure illustrates the procedure well, the rolling origin is used slightly differently in this thesis. The training set is approximately 1200 days and the test set 400 days, and the prediction is made one day ahead instead of three. Hence the model is fit 400 times, and each time the training set grows by one day.
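A sketch of this expanding-window procedure is shown below; the dataframe and parameter names are assumptions, and the actual implementation is in the thesis source code [28].

```python
import pandas as pd
from prophet import Prophet

def rolling_one_day_ahead(daily_df: pd.DataFrame, n_train: int = 1200) -> pd.DataFrame:
    """One-day-ahead forecasts with a rolling (expanding) origin.
    daily_df must contain one row per day with columns ds and y."""
    rows = []
    for end in range(n_train, len(daily_df)):
        train = daily_df.iloc[:end]                    # training window grows by one day
        m = Prophet().fit(train)
        future = m.make_future_dataframe(periods=1, freq="D")
        yhat = m.predict(future)["yhat"].iloc[-1]      # forecast for the next day
        rows.append({"ds": daily_df["ds"].iloc[end],
                     "y": daily_df["y"].iloc[end],
                     "yhat": yhat})
    return pd.DataFrame(rows)
```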
