
4   RESEARCH METHODS

4.2   Data analysis

The data analysis of this study was performed with SPSS Statistics software (Ver. 22, IBM, Armonk, N.Y.), which is the most commonly used statistical tool in the social sciences (Balnaves & Caputi 2001, p. 126). According to Burns and Bush (2010, p. 461), data analysis has four different functions: to summarize the data by using measures and statistical values, to conceptualize the data with words and/or graphics, to communicate the findings by explaining the underlying patterns or dependencies, and to make generalizations by relating sample findings to the population.

As the survey questionnaire included a vast amount of data that was not used in this analysis, the author converted the data matrix into an Excel file and deleted all non-relevant variables. Three different types of statistical analysis were used. Descriptive analysis summarizes the main features and basic findings of the dataset, which allows the author to highlight some of the most general patterns and forms a foundation for the subsequent analysis (Vondruska, 1995). Predictive analysis, in turn, is used to determine more complex patterns of association, such as how different predictor variables are associated with key dependent variables, as well as to predict future conditions. By predicting a single dependent variable with several independent variables, multiple regression models provide a useful framework for predicting future conditions that tend to depend on several different factors (Burns & Bush, 2010). Factor analysis, in turn, can be performed to identify dependent variables that form simple patterns within a more extensive pattern of relationships (Gorsuch, 1983).
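As an illustration of this preparation and descriptive step, a minimal sketch in Python with pandas could look as follows (the file and column names are hypothetical; the actual analysis was carried out in SPSS):

```python
import pandas as pd

# Load the exported data matrix (file name is hypothetical)
data = pd.read_excel("survey_data.xlsx")

# Keep only the variables relevant to this analysis (column names are hypothetical)
relevant = ["age_group", "purchase_intention", "direct_flight_offered"]
data = data[relevant]

# Descriptive analysis: summary measures for the retained variables
print(data.describe(include="all"))
```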

4.2.1 Cross-tabulations

Cross-tabulation analysis is used for assessing relationships between two nominal-scaled variables. Cross-tabulation helps the researcher to discover underlying relationships between the variables (Burns & Bush, 2010).

A chi-square (χ²) goodness-of-fit analysis, in turn, examines whether the observed frequencies in a cross-tabulation differ from the frequencies expected under the null hypothesis. Thus, rejection of the null hypothesis implies a non-monotonic association between the two variables (Burns & Bush, 2010).
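A cross-tabulation and the corresponding chi-square test could, for instance, be computed as in the following sketch (the data and variable names are hypothetical; the thesis analysis itself was run in SPSS):

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Illustrative nominal-scaled data (values are hypothetical)
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "preferred_airline": ["A", "A", "B", "B", "A", "B", "A", "A"],
})

# Cross-tabulation of the two nominal variables
table = pd.crosstab(df["gender"], df["preferred_airline"])

# Chi-square test: a small p-value leads to rejecting the null
# hypothesis of no association between the variables
chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, df = {dof}")
```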

4.2.2 Non-parametric tests for differences in means

Non-parametric tests were used in order to perform summary calculations on ordinal data, which do not meet the assumptions of parametric statistical tests. The Kruskal-Wallis test was used for two purposes. Firstly, it tested how airline passengers' intention to purchase from different airlines changes depending on whether the airlines offer direct flights to the respondents' intended final destination. Secondly, it was used to examine differences between age groups on different variables. The Kruskal-Wallis test is an extension of the Wilcoxon-Mann-Whitney (Mann-Whitney) test and a non-parametric one-way analysis of variance. It is used when more than two groups are compared and there is no need to assume a normal distribution. In addition, the Kruskal-Wallis test is well suited for testing scale variables that measure opinions. The significance level for the Kruskal-Wallis test in this study was set to .05, meaning that if the p-value falls below that, a significant difference has been found between the tested variables. (Sawilowsky & Fahoome, 2005.)
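As a brief illustration, a Kruskal-Wallis test comparing more than two groups could be run as follows (the groups and scores are hypothetical):

```python
from scipy.stats import kruskal

# Purchase-intention scores (1-10) for three hypothetical airline groups
group_a = [8, 7, 9, 6, 8]
group_b = [5, 4, 6, 5, 7]
group_c = [3, 4, 2, 5, 4]

# Kruskal-Wallis H-test: p < .05 indicates that at least one group
# differs from the others in its distribution of scores
h_stat, p_value = kruskal(group_a, group_b, group_c)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```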

The Mann-Whitney test is very similar to the Kruskal-Wallis test, but it can handle only two-level independent variables and is best suited to comparing the means of two populations (Balnaves & Caputi, 2001). The test compares two population means under the null hypothesis that they are equal, assuming the distributions of the values have similar shapes (Balnaves & Caputi 2001, p. 193). In the Mann-Whitney test, the "z" statistic is based on the difference between the two sample means.
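Correspondingly, a two-group comparison with the Mann-Whitney test could be sketched as follows (the data are hypothetical):

```python
from scipy.stats import mannwhitneyu

# Intention-to-purchase scores for two hypothetical respondent groups
direct_flight = [8, 9, 7, 8, 6, 9]
no_direct_flight = [5, 6, 4, 7, 5, 6]

# Two-sided Mann-Whitney U test for two independent groups
u_stat, p_value = mannwhitneyu(direct_flight, no_direct_flight,
                               alternative="two-sided")
print(f"U = {u_stat:.2f}, p = {p_value:.4f}")
```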

4.2.3 Factor analysis

The purpose of factor analysis is to simplify complex multivariate data into a smaller number of uncorrelated factors in order to identify structure (e.g. clusters) in the data. The identified clusters, in turn, may help to understand underlying factors contributing to the phenomena of interest. Typically, factor analyses identify three to five different factors that explain a significant amount of the variation in the data. An ideal factor analysis takes place when each variable loads on only one factor, and different kinds of factor rotation algorithms can be used to maximize the variance explained by the different factors. The author used the Oblimin rotation method to allow correlation between the factors (Karma & Komulainen 2002, p. 51). Bartlett's Test of Sphericity, in turn, was used to find out whether there was a sufficient number of statistically significant correlations among the variables used in the factor analysis (UCLA, 2015). The Kaiser-Meyer-Olkin Measure of Sampling Adequacy compares the sum of partial correlation coefficients to the sum of correlation coefficients. It can take values between 0 and 1; values closer to 1 are better, and values less than 0.6 are not acceptable (UCLA 2015). When examining the generated factors, initial eigenvalues were used to assess the variance explained by the different factors: the more variance a factor accounts for, the more significant it is (UCLA 2015).
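A sketch of this procedure, assuming the third-party factor_analyzer package (not the software used in the thesis), could look as follows; the item data are randomly generated and purely illustrative:

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Illustrative item responses on a 1-10 scale (values are hypothetical)
rng = np.random.default_rng(0)
items = pd.DataFrame(rng.integers(1, 11, size=(100, 6)),
                     columns=[f"item_{i}" for i in range(1, 7)])

# Bartlett's Test of Sphericity and the KMO measure of sampling adequacy
chi2, p_value = calculate_bartlett_sphericity(items)
kmo_per_item, kmo_total = calculate_kmo(items)
print(f"Bartlett chi2 = {chi2:.2f}, p = {p_value:.4f}, KMO = {kmo_total:.2f}")

# Factor analysis with oblique (Oblimin) rotation, allowing correlated factors
fa = FactorAnalyzer(n_factors=3, rotation="oblimin")
fa.fit(items)
print("Initial eigenvalues:", fa.get_eigenvalues()[0])
print("Factor loadings:\n", fa.loadings_)
```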

4.2.4 Linear and logistic regression analysis

Regression analysis intends to explain and predict changes in a dependent variable (response variable) with the help of one or more independent variables (explanatory variables; Balnaves & Caputi, 2001). The prerequisites for linear regression analysis are that the relationship between the dependent and independent variables is linear and that there is no strong correlation between the explanatory variables (i.e. multicollinearity; Karhunen, Rasi, Lepola, Muhli & Kanniainen 2011, p. 89). Residual errors are used to examine the model's predictive accuracy: the greater the difference between the actual data points and the predicted data points, the greater the error in the prediction (Burns & Bush 2010). Logistic regression analysis, in turn, examines dichotomous (binary) outcome variables (UCLA 2015).
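As an illustration, a multiple linear regression and a logistic regression could be fitted with the statsmodels library as in the following sketch (the data are simulated and hypothetical):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data (values are hypothetical)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                      # two predictor variables
y_linear = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=200)
y_binary = (y_linear > 0).astype(int)              # dichotomous outcome

# Multiple linear regression: several predictors, one continuous response;
# the residuals reflect the difference between actual and predicted values
ols_model = sm.OLS(y_linear, sm.add_constant(X)).fit()
print(ols_model.summary())

# Logistic regression for the dichotomous (binary) outcome variable
logit_model = sm.Logit(y_binary, sm.add_constant(X)).fit()
print(logit_model.summary())
```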

4.2.5 P-value

The p-value refers to the probability of observing an association or difference at least as large as the one found in the sample, assuming that the null hypothesis is true. The p-value can take values between zero and one: the smaller the value, the less likely it is that such a large association or difference would arise by chance alone. In this study, the author used the conventional p-value threshold of 0.05 to assess whether observed associations or differences were significant: values equal to or below 0.05 were considered statistically significant.

4.2.6 Scales

In the data analysis, the scale from 1 to 10 has been broken down into seven categories when reviewing the likeliness of doing something or, alternatively, agreement with a statement (Table 3). A short code sketch of this recoding is shown after the table.

TABLE 3 Scales for review of likeliness or agreement with the statement.

Score   Likeliness                      Agreement
1       Not at all likely               Completely disagree
2       Not likely                      Disagree
3-4     Somewhat not likely             Somewhat disagree
5       Neither likely, nor unlikely    Neither agree, nor disagree
6-7     Somewhat likely                 Somewhat agree
8-9     Likely                          Agree
10      Completely likely               Completely agree
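Expressed as code, the recoding of Table 3 could be sketched, for instance, with pandas as follows (the scores are illustrative):

```python
import pandas as pd

# Recode 1-10 scores into the seven categories of Table 3
bins = [0, 1, 2, 4, 5, 7, 9, 10]
labels = ["Not at all likely", "Not likely", "Somewhat not likely",
          "Neither likely, nor unlikely", "Somewhat likely",
          "Likely", "Completely likely"]

scores = pd.Series([1, 3, 5, 6, 8, 10])
categories = pd.cut(scores, bins=bins, labels=labels)
print(categories)
```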

For describing the strength of association indicated by a correlation coefficient, Burns and Bush's (2010, p. 578) "Rules of Thumb about Correlation Coefficient Size" ranges were used (Table 4). A short code sketch of this classification is shown after the table.

TABLE 4 Scales for review of strength of association.

Coefficient Range       Strength of Association*
±0.81 to ±1.00          Strong
±0.61 to ±0.80          Moderate
±0.41 to ±0.60          Weak
±0.21 to ±0.40          Very weak
±0.00 to ±0.20          None

* Assuming the correlation coefficient is statistically significant.
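A small helper illustrating the rule-of-thumb ranges of Table 4 could be written as follows (a sketch; the cut-off values follow Burns and Bush, 2010):

```python
def correlation_strength(r: float) -> str:
    """Classify a correlation coefficient using the ranges of Table 4."""
    r = abs(r)
    if r > 0.80:
        return "Strong"
    if r > 0.60:
        return "Moderate"
    if r > 0.40:
        return "Weak"
    if r > 0.20:
        return "Very weak"
    return "None"

print(correlation_strength(0.67))   # Moderate
print(correlation_strength(-0.15))  # None
```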