6.1 Statistical analysis

6.2.1 Exploring the data

As we see from appendix 1, the 316 individual cases (the car models studied) are fairly evenly distributed over the five years: the largest number of models comes from 2000 (69), the smallest from 1990 and 2010 (59 each). If we look at the descriptives for the year variable, we see that the means of the yearly distributions seem to grow steadily, from 139.920 hours in 1990 to 165.722 hours in 2010 (an increase of about 18%). However, we cannot classify this as a trend before we have tested the statistical significance of these differences.
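The percentage quoted above follows directly from the two reported means. As a minimal sketch of the arithmetic (the function name `percent_increase` is our own and not part of the analysis software):

```python
def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


# Mean values for 1990 and 2010 as reported in the descriptives (hours).
mean_1990 = 139.920
mean_2010 = 165.722

increase = percent_increase(mean_1990, mean_2010)
print(f"{increase:.1f}%")  # 18.4%
```

The result only describes the size of the gap between the two means; it says nothing about whether the gap is statistically significant.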

Moving on to the tests of normality, we see that the results do not look promising if we intend to perform an ANOVA. Since the number of cases (N) is quite small, the Shapiro-Wilk test should give the more reliable result, but both the Kolmogorov-Smirnov and Shapiro-Wilk tests indicate that the yearly distributions cannot be said to be normal, certainly not at a 5% significance level (the null hypothesis of both tests is a normal distribution). The reason is evident: the descriptives show clear signs of both skewness and kurtosis.

1990 Shapiro-Wilk stat.: 0,877 – DF: 59 – Sig.: 0,000
1995 Shapiro-Wilk stat.: 0,776 – DF: 67 – Sig.: 0,000
2000 Shapiro-Wilk stat.: 0,877 – DF: 69 – Sig.: 0,000
2005 Shapiro-Wilk stat.: 0,911 – DF: 62 – Sig.: 0,000
2010 Shapiro-Wilk stat.: 0,924 – DF: 59 – Sig.: 0,001

This is especially clear if we look at the collected histograms of the per-year distributions in appendix 2. Most of the histograms show peaks at the far left of the figure and a long, sparsely populated tail towards the right. In many ways this is only logical. We could already see during the data collection phase that the three low-to-mid price classes contained many cars with similar time statistics. The fourth, high price class clearly contains fewer cars and more cars of unusual build, which results in a skewed distribution. What we are seeing in the histograms is probably evidence of this: normal “bulk” production cars forming a coherent peak at the left, the more unusual higher-priced cars forming their own set of peaks at the right.
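The shape described here (a peak at the left, a long tail to the right) corresponds to positive skewness. The sketch below computes simple moment-based skewness and excess kurtosis. The sample values are hypothetical and are not taken from the data set. Statistical packages typically report bias-adjusted versions of these measures, so figures computed this way may differ slightly from those in appendix 1.

```python
def moments_shape(values):
    """Return (skewness, excess kurtosis) based on central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance (population form)
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skewness, excess_kurtosis


# Hypothetical repair times (hours): a tight peak plus two high values.
sample = [138, 140, 141, 142, 143, 145, 147, 190, 215]
skew, kurt = moments_shape(sample)
print(skew > 0)  # True: the long tail lies to the right
```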

The box and whisker diagram of the year distribution confirms this idea: the outliers that we can see towards the top of the chart are made up exclusively of class 4 cars, the high-priced cars.
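How a box plot flags outliers can be sketched as follows. We assume the common convention of 1.5 times the interquartile range beyond the quartiles; the text does not state which quartile definition the plotting software uses, and the sample values are hypothetical.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical repair times (hours) with one unusually high value.
times = [130, 135, 140, 145, 150, 155, 240]
print(tukey_outliers(times))  # [240]
```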

Returning to appendix 1, we take a look at the tests of homogeneity of variance.

At a 5% level of significance, we clearly cannot reject the null hypothesis that the variances across the groups (years) are equal, regardless of whether we look at the statistic based on the mean, the median or the adjusted median. The third assumption for ANOVA is thus fulfilled, but not the second.

Levene stat. (based on mean): 1,379 – DF1: 4 – DF2: 311 – Sig.: 0,241
Levene stat. (based on median): 1,160 – DF1: 4 – DF2: 311 – Sig.: 0,328
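For reference, Levene's statistic is a one-way analysis of variance carried out on the absolute deviations of each observation from its group centre. The sketch below computes the statistic and its degrees of freedom (k - 1 and N - k, which for 5 year groups and 316 cases gives the 4 and 311 reported above). The function name `levene_w` and the example groups are hypothetical. The significance value would additionally require the F distribution, which is not computed here.

```python
from statistics import mean, median


def levene_w(groups, center=mean):
    """Return Levene's W and its degrees of freedom (k - 1, N - k)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its own group's centre.
    z = []
    for g in groups:
        c = center(g)
        z.append([abs(y - c) for y in g])
    z_group = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zg - z_grand) ** 2 for zi, zg in zip(z, z_group))
    within = sum((v - zg) ** 2 for zi, zg in zip(z, z_group) for v in zi)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)


# Hypothetical groups: the second is far more spread out than the first.
w_mean, df = levene_w([[1, 2, 3], [0, 10, 20]])
w_median, _ = levene_w([[1, 2, 3], [0, 10, 20]], center=median)
print(round(w_mean, 3), df)
```

Passing `center=median` gives the median-based variant listed in the second row of the table.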

Variable ‘Class’

The variable Class is of less interest to us than the variables year and region. It is interesting mainly as a confirmation of the division of models along the lines that seemed natural at the start of the data gathering phase, but it is perhaps not of academic interest: it seems only natural that car models of higher cost involve more assembly work than those of low cost. Nevertheless, this is something that can easily be checked from the data at the same time.

The case processing summary in appendix 1 tells us that the high-priced rear-wheel-drive sedan class is the smallest (N=60) and the low-price front-wheel-drive class is the largest (N=96). From the descriptives we can again find evidence of a growing trend: the low price class has the lowest mean repair time (136,8 hours) and the high price class the highest (181,948 hours), a difference of about 33%.

The tests of normality look better for this variable, but unfortunately not good enough. The low and high price classes are cleared for ANOVA (at a 5% level of significance we cannot reject the null hypothesis of a normal distribution), but the mid-price classes are not.

Class 1: Low-Price – Front Wheel Drive – Inline (straight) engine – Sedans
Shapiro-Wilk stat.: 0,977 – DF: 96 – Sig.: 0,087

Class 2: Mid-Price – Front Wheel Drive – Inline (straight) engine – Sedans
Shapiro-Wilk stat.: 0,960 – DF: 64 – Sig.: 0,038

Class 3: Mid-High-Price – Front Wheel Drive – V-engine – Sedans
Shapiro-Wilk stat.: 0,910 – DF: 96 – Sig.: 0,000

Class 4: High-Price – Rear Wheel Drive – V-engine – Sedans
Shapiro-Wilk stat.: 0,979 – DF: 60 – Sig.: 0,375
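The decision rule applied to these figures can be written out explicitly: normality is rejected for a class when its reported significance falls below 0,05. The sketch below uses the four values from the table; the helper name `parse_decimal_comma` is our own. Note that a reported 0,000 is a rounded value, not an exact zero.

```python
ALPHA = 0.05  # significance level used in the text


def parse_decimal_comma(text: str) -> float:
    """Convert a number written with a decimal comma, e.g. '0,087'."""
    return float(text.replace(",", "."))


# Shapiro-Wilk significance values as reported for each class.
reported_sig = {
    "Class 1": "0,087",
    "Class 2": "0,038",
    "Class 3": "0,000",
    "Class 4": "0,375",
}

normality_rejected = {
    name: parse_decimal_comma(sig) < ALPHA
    for name, sig in reported_sig.items()
}
print(normality_rejected)
# {'Class 1': False, 'Class 2': True, 'Class 3': True, 'Class 4': False}
```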

We can see evidence of why in the histograms in appendix 2: the mid-price inline class has two peaks, and the mid-price V-engine class is strongly skewed to the left.

There is no easy explanation for this; it probably indicates that the class division is not entirely natural. If the two classes could be combined, the resulting distribution might more closely approximate a normal distribution.

From the box and whisker diagram, we can clearly see that the fourth class, the high price class, is distinctly different from the other three with a greater variance and range. The highest outliers also come from this class.

Returning to appendix 1 and the test of homogeneity of variance, we see another obstacle to the ANOVA here: at a 5% level of significance we must reject the null hypothesis of equal variance between the groups. Thus two of the ANOVA assumptions are not fulfilled.

Levene stat. (based on mean): 16,592 – DF1: 3 – DF2: 312 – Sig.: 0,000
Levene stat. (based on median): 15,546 – DF1: 3 – DF2: 312 – Sig.: 0,000

Variable ‘Region’

When looking at the variable region (appendix 1), we see that the Europe group has only 50 valid cases, as opposed to the Asia and USA groups with over 100 each. This reflects the situation in the chosen review area: the American car fleet is made up mainly of cars from American and Asian manufacturers. European low-cost sedans are a rare sight in America; when buying a European car, the choice often falls on a sports car or a luxury sedan.

The descriptives show us that the Asian and American groups are close to each other in terms of average repair time (Asia 147,961 hours; USA 142,219 hours). The European cars have a markedly higher mean (179,038 hours). The same is true of the groups' variances (Asia 280,690; USA 175,842; Europe 927,081). As a result, the test of homogeneity of variance gets a negative result: the groups' variances cannot be said to be equal. The American market's bias towards more expensive European cars might be one of the reasons for this difference; as we saw in the previous paragraphs, the higher cost cars seem to display higher repair time values.

Levene stat. (based on mean): 42,883 – DF1: 2 – DF2: 313 – Sig.: 0,000
Levene stat. (based on median): 40,942 – DF1: 2 – DF2: 313 – Sig.: 0,000
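Levene's test is the formal check, but the size of the gap can also be read directly from the reported variances. As a rough illustration (the dictionary name `variances` is our own), the ratio of the largest to the smallest group variance is:

```python
# Group variances as reported in the descriptives.
variances = {"Asia": 280.690, "USA": 175.842, "Europe": 927.081}

largest = max(variances, key=variances.get)
smallest = min(variances, key=variances.get)
ratio = variances[largest] / variances[smallest]

print(largest, smallest, f"{ratio:.2f}")  # Europe USA 5.27
```

The variance of the Europe group is thus more than five times that of the USA group.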

In terms of normal distribution, only the Europe subgroup can be said to show signs of normality. If we look at the histograms in appendix 2, we can say that the USA subgroup shows a leptokurtic distribution, with far too high a peak to match a bell-shaped curve. Looking at the box and whisker plot, we again see that the Europe subgroup stands out completely in terms of mean and variance.

Asia Shapiro-Wilk stat.: 0,972 – DF: 137 – Sig.: 0,007
USA Shapiro-Wilk stat.: 0,970 – DF: 129 – Sig.: 0,006
Europe Shapiro-Wilk stat.: 0,962 – DF: 50 – Sig.: 0,112