
4.3 After the evaluation

What happens after the evaluation is highly dependent on the purpose of the evaluation and the evaluation design. Thus, this part of the evaluation process is discussed on a very general level.

4.3.1 Analysis and conclusions (6)

Analyzing the data and interpreting the results

If not already collected in electronic form, the subjective data, i.e., user expectations and experiences, as well as background information, need to be transcribed into electronic form to enable the use of computer-based analysis tools. This may seem a minor matter, but the manual labor involved should be taken into account in the evaluation design. In fact, a considerable amount of unnecessary work can be prevented with proper design, i.e., by collecting all possible data directly in electronic form.

However, as demonstrated earlier with the LightGame case (VII) (p. 112), this is not always reasonable or even possible, and a considerable amount of manual data entry may be required.

Choosing the statistical analysis tools, and especially the statistical analyses themselves, would be a topic of its own and is out of the scope of this dissertation. Such decisions are limited and directed by several factors: the type of data, whether the data can be assumed to be normally distributed, the sample size, and so forth. In practice, the statistical expertise of the analyzer or the project team affects the chosen methods as well, although such limitations should ideally be overcome. The majority of the data analyzed in the research done for this dissertation has been of ordinal scale without the assumption of normal distribution. By this, I refer to subjective user expectation and experience data, which have been collected mainly with disagree–agree scales with a varying number of steps. Furthermore, the sample size, i.e., the number of participants, has been small apart from a few exceptions.

Thus, the suitable approaches for analysis have been quite limited. Because the data are ordinal, the median has been used as the measure of central tendency throughout the case studies. Mainly because of the small sample sizes, the results from the case studies presented here are of a descriptive nature. They consist of numerical expectation and experience values combined with other data, i.e., answers to open or interview questions, observation data, and background information, when available. The analysis has comprised calculating the medians of the subjective numerical data and then reflecting the results against the other sources of data.
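As a minimal sketch of this descriptive approach, the medians of the subjective numerical data could be computed as follows in Python; the statements and ratings are hypothetical examples, not data from the case studies:

    import statistics

    # Ratings on a five-step disagree-agree scale (ordinal data);
    # one list per statement, one value per participant.
    # The statements and values are hypothetical.
    ratings = {
        "The system was pleasant to use": [4, 5, 3, 5, 4, 4, 5],
        "The system reacted quickly": [3, 2, 4, 3, 3, 2, 4],
    }

    # Because the data are ordinal, the median is used as the
    # measure of central tendency instead of the arithmetic mean.
    for statement, values in ratings.items():
        print(f"{statement}: median = {statistics.median(values)}")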

Depending on the nature of the evaluation and the collected data, statistical analyses may not be necessary or even possible. However, if applicable, such tests can be used to strengthen the results and interpretations.

Moreover, they may reveal details that are not obvious from the median values alone. If both expectations and experiences have been gathered and the sample size is reasonable, possible differences between the two valuations can be examined with, e.g., the Wilcoxon Signed Ranks Test. It is a nonparametric test suitable for repeated measurements of ordinal data when normal distribution cannot be assumed. User expectations and experiences were compared with the test in the EventExplorer case (IV), and this analysis revealed a trend not perceivable from the medians alone: Both the expectations and the experiences reached a median of 5, but comparing the answers with the test revealed that some experiences were, in fact, statistically significantly worse than the expectations (p. 61). Furthermore, the Wilcoxon Signed Ranks Test was suitable for analyzing the majority of the subjective data of the LightGame case's (VII) Evaluation II as well: User experiences from the schoolchildren and the teachers were gathered twice, after the first and the third usage session.

Regarding those individuals who had provided their answers both times, the experiences within the user groups were compared with the test. However, no statistically significant differences were found in the experiences of either user group.
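As a minimal sketch, a comparison of paired expectation and experience ratings with the Wilcoxon Signed Ranks Test could look as follows, using scipy's wilcoxon function; the ratings are hypothetical, not data from the cases discussed above:

    from scipy.stats import wilcoxon

    # One expectation and one experience rating per participant,
    # both on the same five-step ordinal scale (hypothetical values).
    expectations = [5, 4, 5, 5, 4, 5, 3, 5, 4, 5]
    experiences = [4, 4, 3, 5, 3, 4, 3, 4, 4, 4]

    # The test is nonparametric and suitable for repeated
    # measurements of ordinal data, so normal distribution
    # does not need to be assumed.
    statistic, p_value = wilcoxon(expectations, experiences)
    print(f"W = {statistic}, p = {p_value:.3f}")

The resulting p-value indicates whether the experiences differ systematically from the expectations across the paired answers.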

To utilize the collected background information, one can check whether participants' reported properties correlate with their expectations or experiences, as was done, e.g., in the EventExplorer case (IV) (p. 64) and in the LightGame case's (VII) Evaluation II (Publication VII, pp. 482–483). Such correlations may help to understand certain trends in the answers. However, even if the correlations found are significant, they can serve as the basis for reflective interpretation and discussion only. Without other evidence, correlations cannot be used to draw strong conclusions; they are never grounds for claiming that a property causes a specific experience, for instance.
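The text above does not name a specific correlation method; as a minimal sketch, such a check could be done with, e.g., Spearman's rank correlation, one common choice for ordinal data. The background property and the values below are hypothetical:

    from scipy.stats import spearmanr

    # A reported background property, e.g., how often the participant
    # plays games (1 = never ... 5 = daily), paired with the same
    # participant's overall liking rating (hypothetical values).
    gaming_frequency = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]
    overall_liking = [3, 3, 4, 3, 4, 4, 5, 4, 5, 5]

    rho, p_value = spearmanr(gaming_frequency, overall_liking)
    print(f"rho = {rho:.2f}, p = {p_value:.3f}")

    # Even a significant correlation only supports reflective
    # interpretation; it does not show that the property causes
    # the experience.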

At their simplest, subjective, non-numerical data, i.e., answers to open or interview questions, or recorded participant comments, can be used to understand, explain, and interpret the numerical experience data. For example, when the participants of the LightGame case's (VII) Evaluation II were asked what was worst in the game, 42 percent answered that nothing was unpleasant after the first session, and 26 percent after the third session (p. 94). These large proportions alone make it rather unsurprising that the overall liking of the game reached a median of 5 out of 5 after both sessions.

If the qualitative data allow, they can be further categorized, and the categories can then be reflected against the numerical data, or at least against the results derived from the numerical data. Such qualitative data can include very relevant aspects of the participants' experiences: Even a single comment may quickly reveal reasons for certain experiences and obvious development needs. Supportive, objective data, i.e., observation or log data, or video and audio recordings, can be utilized in a similar manner, but these data may even allow the identification of usage patterns and different types of users, which can then be reflected against the subjective data.
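As a minimal sketch, categorized open answers could be tallied and related to the numerical ratings of the same participants as follows; the categories and ratings are hypothetical, not data from the case studies:

    from collections import Counter
    from statistics import median

    # Each tuple: (category assigned to an open answer,
    # the same participant's overall rating); hypothetical data.
    answers = [
        ("nothing was unpleasant", 5),
        ("technical problems", 3),
        ("nothing was unpleasant", 5),
        ("waiting for one's own turn", 4),
        ("nothing was unpleasant", 4),
        ("technical problems", 4),
    ]

    # Share of each category and the median rating of the
    # participants whose answers fell into it.
    counts = Counter(category for category, _ in answers)
    for category, count in counts.most_common():
        share = 100 * count / len(answers)
        ratings = [r for c, r in answers if c == category]
        print(f"{category}: {share:.0f} %, "
              f"median rating {median(ratings)}")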


Depending on the purpose of the evaluation, the depth of the analysis and the interpretation of the results can differ substantially between cases. The methods and significance of analyzing the data and interpreting the results can differ totally depending on whether the purpose is, e.g., to conduct a large-scale evaluation of an interactive system as part of methodology development, validate a user experience questionnaire, and publish the results in an academic journal, or to rapidly run a first-phase user experience evaluation of a mobile phone prototype and share the results in an internal company meeting. However, this step should provide answers to at least the following questions: What are the results? What do the results mean? What should be done based on the results?

4.3.2 Dissemination (7)

Reporting the results

The last but certainly not least step of an evaluation process is reporting the results. Especially in academia, publications are the end products of research and thus a crucial part of science. However, no matter in what form or to what extent it is presented, the outcome of an evaluation also needs to be disseminated in industry, for instance, to make decisions on possible next steps.

The LightGame case (VII) provides a good example of an iterative process, where the system and the evaluation approaches were developed based on the observations and results received from several evaluations. First, a 10-minute version was designed and implemented (Publication VI, p. 313, Section 5.1) to test the concept in general. This initial version was evaluated with over 60 participants, and their experiences were investigated with statements answered on a scale of happy–neutral–sad smiley faces (Publication VI, p. 314, Section 6.1). Based on the feedback, an extended version for 60-minute physical exercise classes was created (the LightGame case (VII), p. 82–). This complete version was first evaluated with 110 participants, and their experiences were gathered with improved statements, which this time had the answering options "Yes," "No," and "I don't know" (the LightGame case's (VII) Evaluation I, p. 86). Finally, the content and the story of the game were expanded to investigate its suitability for longer-term usage (the LightGame case's (VII) Evaluation II, p. 88–). This evaluation had altogether 173 participants, who played the game three times. In this most recent evaluation, user experiences were gathered twice, after the first and the third session, from both the schoolchildren and the teachers, who controlled the game and thus formed another user group. Based on the challenges detected with the previous questionnaire, the answering scale of the user experience statements was changed to a five-step disagree–agree scale.

The purpose of the evaluation has a great impact on how the results should be reported. For example, in the different phases of the LightGame case (VII), the findings were first communicated internally and informally within the project team to move to the next phase and only later prepared for academic dissemination, which resulted in several publications. Thus, the target audience and the message to be communicated to it are extremely important.

To sum up, when reporting the results of a user experience evaluation, one should try to answer the following questions, regardless of the dissemination forum: What was done in the evaluation? What are the results? What do the results mean? What will be done next, or at least what should be done, based on the results?