
3. DATA AND METHODOLOGY

3.1. DATA SIMULATION PROCESS

The primary data of this research is an artificially generated dataset that represents real stock market data as closely as possible. In other words, the data replicate the price movements of stock market securities, with the difference that the price changes can be controlled before generation through the parameter setup. The simulation is done using Matlab. The dataset illustrated in table 1 has a total of 5 000 simulated daily stock return time-series, each imitating ten years of trading with 2520 observations. Artificial data has not been used in many papers in academic finance. Thus, this paper is genuinely experimental in this area.

It is important to emphasize that although the data is artificially simulated, the simulation is designed to replicate regular stock price time-series as closely as possible.

This replication is vital, since otherwise the findings could very likely not be extended to a real-world scenario. Despite the similarity of the time-series, it is not guaranteed that the findings can be used in a real-life situation. Only future research will show to what degree the findings of this study apply in practice.

3.1.1. Simulation parameters

It is essential to compare the descriptive statistics of the simulated time-series with those of real-life stock market returns to ensure similarity with real-world data. The benchmark descriptive statistics are obtained from the study of Strobel et al. (2018). Their paper covers 1774 individual stocks from 18 indices over a timespan from January 1972 to December 2015.

This dataset is an ideal benchmark for this study, since it is comprehensive and covers relatively recent data. Depending on the index, the median of the mean annual returns of stocks varies between 4.03 % and 10.58 %, averaging 7.27 %. Likewise, the daily volatility varies between 1.81 % and 2.57 %, averaging 2.08 % (Strobel et al., 2018).

Since the given numbers are only averages of stocks in a single index, it can be assumed that the values of individual stocks are much more dispersed. Hence, the volatility parameter (𝜎) varies between 1 % and 4 %, so that the average daily volatility of all simulated time-series is close to 2.08 %. Also, the average annual return of all simulated time-series should be close to 7.27 %. The dispersion of returns is illustrated further in the empirical results in tables 5 & 6 in the form of minimum and maximum annual returns.

Regarding serial correlation, the parameter setup gets more complicated. Papers studying the serial correlation of stock returns report daily, weekly, monthly, and even yearly autocorrelation. Serial correlation has also been found at orders other than the first (Schwartz and Whitcomb, 1977; Fama et al., 1988; Levich et al., 1993; Okunev et al., 2003; Anderson et al., 2012). Creating a complex simulation system with several orders of serial correlation would lead to wide dispersion and eventually weak significance of the results. Since the most closely related research by Levich et al. (1993), Okunev et al. (2003), Hong et al. (2015) and Strobel et al. (2018) suggests a first-order autoregressive process, I find it natural to continue from where they left off. Thus, the serial correlation in the simulations is fixed to first order, with a magnitude varying between -0.2 and 0.2, identical to the study of Hong et al. (2015). The findings of this study are limited exclusively to similar autoregressive processes. The parameters are further illustrated in table 1.

Table 1. Autoregressive Process Simulation Parameters

Simulations   Obs.              𝜎             Order of autoregression   AR
5000          2520 (10 years)   0.01 - 0.04   1                         -0.2 - 0.2

The simulated stock price returns are generated as follows. An initial set of 5 000 autoregressive processes is generated in a loop based on the input variables illustrated in table 1. Volatility (𝜎) and the magnitude of serial correlation (AR) are assigned as random floating-point numbers between the given minimum and maximum, separately for each variable and each simulation loop. The large number of simulation loops ensures that the results are spread evenly across the range of the input parameters. The length of each simulation is set to imitate ten years. The commonly accepted average number of business days in a year is 252 (Sullivan et al., 1999; Bajgrowicz et al., 2011; Zhou et al., 2019), which gives 2520 observations per simulation.

As the findings of Strobel et al. (2018) suggest, a 44-year period is long enough for market conditions to change significantly. Levich et al. (1993) also argue that market conditions can change drastically within 15 years, affecting the magnitude of serial correlation during that timespan. The findings of Anderson et al. (2012) suggest that over a ten-year period market conditions regarding serial correlation remain similar to some extent. It would therefore be hard to justify a timespan longer than ten years with a fixed amount of serial correlation. A 10-year timespan is selected since it is long enough to enable reliable results but short enough that holding the serial correlation fixed remains plausible.
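The generation loop described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Matlab code used for the study. It assumes Gaussian innovations, treats 𝜎 as the standard deviation of the innovations, starts each process from zero, and draws the first-order coefficient uniformly between -0.2 and 0.2. The names simulate_ar1, simulate_dataset and lag1_autocorrelation are hypothetical.

```python
import random

N_OBS = 2520                 # 10 years x 252 business days (table 1)
SIGMA_RANGE = (0.01, 0.04)   # bounds of the daily volatility parameter (table 1)
AR_RANGE = (-0.2, 0.2)       # ASSUMPTION: bounds of the first-order coefficient


def simulate_ar1(n_obs, sigma, phi, rng):
    """Simulate r_t = phi * r_(t-1) + e_t with e_t ~ N(0, sigma^2) (assumed Gaussian)."""
    returns = []
    previous = 0.0  # ASSUMPTION: the process starts from zero
    for _ in range(n_obs):
        previous = phi * previous + rng.gauss(0.0, sigma)
        returns.append(previous)
    return returns


def simulate_dataset(n_series, n_obs=N_OBS, seed=0):
    """Draw sigma and phi uniformly for every series, then simulate that series."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_series):
        sigma = rng.uniform(*SIGMA_RANGE)
        phi = rng.uniform(*AR_RANGE)
        dataset.append({"sigma": sigma, "phi": phi,
                        "returns": simulate_ar1(n_obs, sigma, phi, rng)})
    return dataset


def lag1_autocorrelation(series):
    """Sample first-order autocorrelation, used to sanity-check a simulated series."""
    mean = sum(series) / len(series)
    centred = [x - mean for x in series]
    numerator = sum(a * b for a, b in zip(centred, centred[1:]))
    denominator = sum(c * c for c in centred)
    return numerator / denominator


if __name__ == "__main__":
    # A small run for illustration; the study itself uses 5 000 series.
    for series in simulate_dataset(n_series=3):
        print(round(series["sigma"], 4), round(series["phi"], 3),
              round(lag1_autocorrelation(series["returns"]), 3))
```

Comparing the drawn coefficient with the sample autocorrelation of each series gives a quick check that the simulated data carry the intended first-order dependence.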

3.1.2. Descriptive Statistics of Artificial Dataset

After the initial simulation, the autoregressive processes are translated to simulated stock prices with a base value of 100. Each price is calculated by adding or deducting the effect of the daily return from the previous price and adding a trend factor. The trend factor is a fixed component, set so that the average return across all simulations is close to the expected 7.27 %. It is calibrated by running the simulations multiple times with varying values of the trend factor. Based on the roughly linear relationship between the trend factor and the average mean annual return, the trend factor is set to 0.083. Finally, the final stock price return simulations are calculated from the simulated stock prices with the added trend, and the required statistics are calculated from these returns. The descriptive statistics of the simulated stock price returns are presented in table 2.
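The price construction and the calibration of the trend factor can be illustrated with the following Python sketch. The exact price recursion is not spelled out in the text, so the sketch makes several assumptions: the trend factor is added in price units each day, P_t = P_(t-1) * (1 + r_t) + trend; the final returns are simple returns; and the mean daily return is annualised by multiplying by 252. The placeholder input is plain Gaussian noise rather than the autoregressive processes, and all function names are hypothetical.

```python
import random

BASE_PRICE = 100.0   # base value of the simulated prices
TRADING_DAYS = 252   # business days per year


def returns_to_prices(returns, trend, base=BASE_PRICE):
    """ASSUMED recursion: P_t = P_(t-1) * (1 + r_t) + trend."""
    prices = [base]
    for r in returns:
        prices.append(prices[-1] * (1.0 + r) + trend)
    return prices


def prices_to_returns(prices):
    """Simple returns computed from consecutive prices."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def mean_annual_return(daily_returns, trading_days=TRADING_DAYS):
    """ASSUMPTION: annualise by scaling the mean daily return."""
    return sum(daily_returns) / len(daily_returns) * trading_days


def average_annual_return(raw_series, trend):
    """Average annualised return over all series for one trend value."""
    annual = [mean_annual_return(prices_to_returns(returns_to_prices(s, trend)))
              for s in raw_series]
    return sum(annual) / len(annual)


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder input: 20 Gaussian-noise series of 2520 observations each.
    raw = [[rng.gauss(0.0, 0.02) for _ in range(2520)] for _ in range(20)]
    # Scan candidate trend values and inspect the resulting average return.
    for trend in (0.0, 0.04, 0.08, 0.12):
        print(trend, round(average_annual_return(raw, trend), 4))
```

Scanning a grid of trend values in this way shows how the average annual return responds to the trend factor, from which a value matching the target return can be chosen.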

Table 2. Descriptive statistics of simulated stock price returns.

𝑟        𝜎        Skewness   Kurtosis
0.0723   0.0249   0.2936     4.0820

The descriptive statistics indicate that the simulated stock price returns imitate real stock market returns as intended for this study. The mean annual return of the underlying time-series (𝑟) and the volatility (𝜎) show that the simulation came close to the data of Strobel et al. (2018). The two other traditional descriptive statistics, skewness and kurtosis (Pearson, 1905), can be interpreted as follows. The skewness of the population is slightly positive, meaning that the tail on the positive side is longer than on the negative side. This positive skewness is common for stock market returns (Singleton and Wingender, 1986; Peiró, 1998). The kurtosis indicates that the distribution of returns is slightly leptokurtic, since 4.0820 is above three, which is the value for a normal distribution (Pearson, 1905). This is often the case for stock market returns: most returns are concentrated close to zero while extreme returns occur more often than under a normal distribution, which makes the distribution fat-tailed (Officer, 1972; Cont, 2001; Strobel et al., 2018). Running a regular Anderson-Darling test on the mean returns, I can reject the null hypothesis of a normally distributed population with a test statistic of 8.56 (Anderson and Darling, 1952). This rejection of the null hypothesis is consistent with Pearson's (1905) skewness and kurtosis tests as well as with the typical outcome of normality tests for stock market returns (Officer, 1972).
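For reference, the skewness and kurtosis discussed above can be computed from central moments as in the following Python sketch. It uses the population (moment-based) definitions, under which a normal distribution has a skewness of zero and a kurtosis of three. Whether the study applied exactly these estimators or a bias-corrected variant is an assumption, and the function names are hypothetical.

```python
import random


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardised moment; positive values mean a longer right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth standardised moment (not excess); equals 3 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    # A Gaussian sample should give skewness near 0 and kurtosis near 3.
    sample = [rng.gauss(0.0, 0.02) for _ in range(50000)]
    print(round(skewness(sample), 3), round(kurtosis(sample), 3))
```

A kurtosis above three, such as the value reported in table 2, then indicates heavier tails than the normal benchmark.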

To conclude, I am confident that the data simulation process was successful based on the set criteria. The simulated stock price return data imitate real stock market data as intended. From here I will move on to introduce the trading strategies to be analyzed on this simulated dataset.

In document Analyzing market conditions for enhanced performance of variable-length moving average trading rules: an experimental approach with artificial data (pages 34-38)