
Figure 33. The process of evaluating the user experience of interactive systems in practice.

Next, the process model is introduced by describing the phases and their content one by one. I discuss questions that need to be considered and decisions that need to be made when designing a user experience evaluation. Almost none of the issues can be considered in isolation, and their effect may extend to other phases and other issues. The content of the process model should be utilized in an iterative manner, especially for the phase before the evaluation, which is highly significant for evaluation processes in general. The content of the model is strongly skewed towards the actions occurring before the evaluation, as this is the phase in which the majority of the work is done.

4.1 Before the evaluation

In terms of user experience evaluation, the most crucial phase in the process is the one occurring before the actual evaluation situation. As already mentioned, this is the phase where practically all major decisions are made.

4.1.1 Study background (1)

Defining and understanding the purpose and aims of the study

Considering user experience, the purpose of an evaluation can vary greatly. At its simplest, the aim can be to study people’s general attitude towards an interactive system in a project where the core research questions lie elsewhere. At the other extreme, evoking specific user experience(s) may have been the core of the whole project and driven the design. In this kind of design for experiences, it is obviously crucial to evaluate whether the outcome meets the original aims. Thus, the original user experience aims, targets, goals, or whatever they are called in individual projects have a major impact on the evaluation design as well. Regarding commercial projects, i.e., commercial product development, a rather apparent purpose for an evaluation is to find out whether consumers like the product enough to purchase it, and this aim may steer the evaluation. In addition to the core aims of an evaluation, the situation may be made more complex by brand-related aims, for instance.

The case studies presented in this dissertation have not aimed at evoking certain user experience(s), i.e., the design processes have not been design for experiences per se. Nor have there been explicit user experience targets discussed that would have systematically controlled the design or implementation decisions. However, each case has had more general-level objectives that have influenced the design, but more importantly the evaluations with respect to my research. The clearest example of how general-level project objectives affect the user experience evaluation is demonstrated by the EventExplorer case study (IV): We wanted to provide something experiential to the users, and this experientiality objective ultimately led to the creation of a whole new user experience evaluation method. Moreover, in the LightGame case (VII), the more general-level objective of motivating schoolchildren to exercise through untraditional content in physical exercise classes affected the evaluation design: For example, the children’s willingness to move this way again, the pleasantness of exercising compared to usual PE classes, and the compellingness of different elements in the game were inquired about, and all of these relate to motivation in some way.

Nowadays, many projects within the field of human-technology interaction are multidisciplinary, and partners from outside academia are also involved. Different stakeholders have their own backgrounds, expertise, and agendas for studies. It is crucial to ensure that the people involved are on the same page. The aims and the approaches used to achieve them in different fields must be openly discussed. For a researcher, certain things are self-evident, but for other people, and even for researchers from different fields, they may not be. Obviously, this applies the other way around as well: Practitioners have vital knowledge that people from academia lack. Thus, when working with other stakeholders, one needs to communicate even the smallest matters to ensure mutual understanding.

Furthermore, if the people responsible for designing the user experience evaluation are new to some key aspects of the forthcoming study, they need to familiarize themselves with these characteristics. By these kinds of aspects, I refer to a domain or user group, for instance. The Dictator case (VI) demonstrates an evaluation within the healthcare domain, and it was important to gain an overview of the nurses’ work routines and their working environment to design the user evaluation properly. In the SymbolChat case (III), in turn, it was crucial to understand the varying limitations and abilities within the user group of intellectually disabled people. The work started with surveying possible symbol sets that could be used in the system and ended with designing the subjective data collection so that at least some data could be gathered from the users themselves, i.e., by utilizing the smiley face cards and very simple questions. Usually, some of the project partners are professionals from the field, and a general understanding of the activity’s environment can be achieved through discussions with them. However, again, it should be noted that self-evident things are easily left uncommunicated, even though they would be new and relevant information to people from other fields. This issue needs extra attention to achieve a proper level of common understanding within the project group.

4.1.2 Circumstances (2)

Acknowledging the possibilities, challenges, and limitations in the evaluation

Comprehending the characteristics affecting the evaluation is a necessity to design evaluations properly for individual cases. Next, I introduce the issues that need attention in the evaluation circumstances by focusing on system, context, and user group(s).

System (2.1)

The system under evaluation obviously has a major impact on evaluation design. The fundamental purpose of the system itself makes some aspects of user experience irrelevant and others highly relevant. For example, if the purpose of the system is purely to entertain and make the users have fun, gathering effectiveness-related perceptions from the users is not a top priority. Conversely, the amusement level of a purely work-related system is hardly something that needs to be inquired about from the users. Although work-related systems have to be pleasant enough to use, and this should be investigated, it is unlikely that they evoke real joy or a “wow” from users.

In addition to the purpose of the system under evaluation, its key point(s) affect the evaluation design, i.e., how the system differs from other systems meant for the same purpose. For example, if there are similar kinds of systems already available, but the system under evaluation provides new techniques for interaction, these techniques should be acknowledged in the subjective data collection. To give some examples, in the MediaCenter case (I), our system enabled more efficient browsing of the electronic program guide for visually impaired users through text-to-speech. In the EventExplorer case (IV), our system was controlled by speech and gesture input, a combination not normally seen in public display applications. Finally, in the Dictator case (VI), our system enabled the nurses to enter patient information into the patient information system by speech, without the need for extensive typing. These kinds of special and novel characteristics have to be addressed in the subjective data collection to see whether they are successful and whether the intended speciality of the system has been achieved. The same goes for the system’s main functionalities: If there are some unique or especially important functionalities, user experiences about them should be investigated.


Context (2.2)

Without going any further into defining context, I loosely refer to the evaluation situation and environment with that term. Context can have physical, social, and cultural aspects, e.g. (Dey, 2001), as well as domain-related differences. The physical evaluation environment is probably the most obvious matter when talking about context, and it also has a quite strong effect on the evaluation design. All of the case studies presented in this dissertation have dealt with physical evaluation environments outside of laboratories. By evaluating outside of laboratories, it is possible to avoid the artificiality inevitably present in laboratory evaluations and get closer to real-world events—although an evaluation situation can hardly ever truly correspond to spontaneous real-world happenings. The downside of being outside of laboratories is, however, the fact that there are many things in the evaluation situation and environment that cannot be controlled. This is especially the case for evaluations conducted in public environments.

Furthermore, subjective data collection cannot take too much time or effort to complete in evaluations where the participants are not recruited beforehand, such as in evaluations conducted in public environments. These kinds of evaluations rely on purely voluntary, spontaneous, and even sudden participation. Recruitment beforehand, in contrast, usually tries to attract possible participants over a longer period of time, and I assume that a participation decision made after consideration may result in deeper commitment to the evaluation compared to spontaneous and sudden participation. Opportunistic recruitment at the evaluation scene and the objective of maximizing the amount of collected data require that potential participants and respondents are not scared away, and it is important to obtain as complete data as possible from whoever participates. Hence, questionnaires have to be limited in content to keep them appealing.

In addition, public environments, such as the evaluation locations in the EventExplorer (IV) and EnergySolutions (V) cases, bring up the social aspects of context. The EventExplorer case’s (IV) evaluation was conducted in a library’s main lobby with other people passing by, and the EnergySolutions case’s evaluation at a housing fair likewise took place with other people almost constantly around. In these kinds of environments, people may be hesitant even to participate, or hesitant to really throw themselves into the usage, for fear of embarrassing themselves in public.

Context effects have to be considered not only when designing an evaluation, but also when interpreting the results: One cannot conclude that people liked the gesture control over speech input in the EventExplorer case (IV). Although gesture control was preferred based on the usage amounts, the reported user experiences on the pleasantness of each input technique were similar. It is necessary to conclude only that in this context the participants used the gesture control more; the experiences might have favored speech input, and had the context been a private room, for example, the participants might have used the speech input more. However, as it was implemented, the system in the EventExplorer case (IV) could not have been controlled with speech exclusively. Another extreme of the social aspects of context might be privacy: When conducting evaluations in private contexts like people’s homes, as in the MediaCenter case (I) and with some participants in the SymbolChat case (III), one has to pay special attention to respecting their privacy. This may not affect the user experience data collection content, but it is a practical issue that may influence the amount of time spent at the scene, for instance. In this kind of potentially intrusive evaluation, it is important not to bother the participants with irrelevant tasks, questions, and so forth, which applies also to work-related evaluations.

Context can be seen as a domain-related matter as well. By this, I mean industry, for example, which can be further divided into different fields, such as the healthcare domain or the drilling industry. Although a common concern for both of these may be efficiency in general, these domains have differing relevant aspects that need to be taken into account: Speeding up the overall process of getting critical patient information to the next treatment step in healthcare, e.g., and improving the tangibility of drilling a blast hole for drill masters in the drilling industry. Some evaluation contexts can have principle-level restrictions imposed by norms or laws. The school environment, as in the LightGame case (VII), and the healthcare domain, as in the Dictator case (VI), are examples of these kinds of contexts. In both of these, for example, it is highly important to protect the privacy of individuals.

User group(s) (2.3)

When designing a user experience evaluation, one needs to pay attention to possible restrictions and special characteristics within the user group. There are several properties of users that may affect what can be asked from them, how those things can be asked, and even what actions can be demanded from the users. A rather obvious example of these kinds of properties is age. When children are the participants, such as in the LightGame case (VII), the questions or the answering scales cannot be too complicated, and with very young children, the data collection cannot necessitate the ability to read or write, i.e., abilities that the children have not yet acquired. These restrictions have a concrete effect on the evaluation design and especially the subjective data collection. At the other extreme, when people get older, their operational abilities weaken: Senses, such as vision and hearing, and motor coordination deteriorate. Furthermore, stereotypically, older people’s technical abilities can be assumed to be lower, although this is constantly changing as technically oriented generations become older. Decreased abilities affect the system design, in particular, but they have to be taken into account while designing the questionnaires or evaluation tasks, for instance.

Another situation where the evaluation design requires extra concern is when disabled people are the user group. Disabilities clearly have a major impact on the system design and the purpose of the system to start with, but the effect on the user experience evaluation design and execution can be considerable as well. In the MediaCenter case (I), e.g., all of the participants had some level of visual impairment, and thus, the subjective data collection had to be in a form suitable for screen-reader usage. In the SymbolChat case (III), in turn, the participants (except for one individual) had an intellectual disability besides other possible disabilities. This meant that we had to provide data collection material in a form that did not require reading or writing skills and, furthermore, was comprehensible enough for the participants. We asked the questions verbally, and the participants answered by selecting a smiley face card from a set of physical cards operating as the answering scale. Due to the limitations within the user group’s abilities, we were not able to gather very much data from the participants themselves. Thus, we broadened our understanding of the feasibility of the system by asking the personal assistants for feedback from the participants’ viewpoint.

Furthermore, participants’ expertise in the subject under evaluation has to be considered when designing not only the evaluation as a whole but also the content of the requested user expectation or experience items. For example, if the participants are university students and the system under evaluation deals with haptic feedback in drill rigs, such as in the DrillSimulator case (II), there is no point in asking about the detailed functionalities of a drill rig. On a more general level, the assumed level of technical knowledge among the participants has to be kept in mind when designing the subjective data collection content. It is obviously a totally different scenario if the participant set is known beforehand, as in the DrillSimulator case (II), compared to a situation where the participants enroll for the evaluation spontaneously, such as in the EventExplorer (IV) and EnergySolutions (V) cases. Thus, the way that the participants are recruited also makes a difference. Not only may the background of the participants be unknown, but the number of participating users may also be hard to estimate. If we knew for sure that 50 persons would participate in a public environment evaluation, we could design wider data collection content and still receive almost complete data from 20 persons, for instance.

It is not unusual that a system and an evaluation have more than one user group. In the SymbolChat case (III), the personal assistants of the actual participants were another user group, as they also used the application themselves: The assistants familiarized themselves with, and at least tried, the SymbolChat application at the beginning of the evaluation, but some of them performed the physical usage of the application throughout the evaluation.

The SymbolChat case’s (III) evaluation demanded considerable resources as it was, and thus, the personal assistants’ subjective experiences were not explicitly investigated. However, they all used the application, and it would have been possible and surely useful to collect user experience data from their perspective as well. The LightGame case’s (VII) Evaluation II was different: Because the teachers were even more clearly another user group, their subjective opinions were gathered. When there is more than one identifiable user group for a system, they should all be taken into account in all stages of the system development, i.e., in the evaluation as well. For example, in the LightGame case (VII), it is obviously crucial that the teacher is also happy with the system, as she or he would be the one to control the whole system in real-life usage.

4.1.3 Data collection (3)

Designing the data collection and producing the material for it

Data collection is the fundamental purpose of all user evaluations. Next, I discuss the importance as well as the advantages and disadvantages of different types of data and the content of data to be collected.

Subjective data (3.1)

Because user experiences are subjective, the core of data collection in evaluations has to be based on self-reporting, essentially by filling in questionnaires or answering interview questions when talking about pre-defined, systematic subjective data collection. Regarding user experience evaluation, it is worthwhile to gather at least some quantitative data, i.e., subjective ratings reported on a specific scale. This was done in all of the evaluation cases presented in this dissertation. For example, in the MediaCenter case (I), both user expectations and experiences were gathered considering 36 statements (p. 32), while in the DrillSimulator case (II), subjective expectations and experiences were inquired about with four statements about the haptic feedback (p. 39). Depending on the statements or questions used, these kinds of data allow a rather quick general view of the participants’ opinions through simple statistics, such as the median. The power of quantitative data is in the possibility to easily summarize and compare results.
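To illustrate how quickly such statement-based ratings can be summarized, the following is a minimal sketch in Python. It is not tied to any of the case studies; the statement labels and ratings are hypothetical and stand in for questionnaire data collected on a five-point scale.

```python
from statistics import median

# Hypothetical statement-based ratings on a five-point scale (1 = disagree, 5 = agree).
# Each list holds one rating per participant; real data would come from questionnaires,
# and expectation and experience ratings could be summarized and compared the same way.
ratings = {
    "Pleasant to use": [4, 5, 3, 4, 4],
    "Easy to learn": [3, 4, 4, 2, 3],
    "I would use it again": [5, 4, 4, 5, 3],
}

# A median per statement gives a rough overall view of the participants' opinions.
for statement, scores in ratings.items():
    print(f"{statement}: median {median(scores)} (n={len(scores)})")
```

A comparison between, say, expectation and experience ratings would proceed in the same manner, by placing two such statement-wise summaries side by side.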

However, quantitative user experience data, i.e., statement-based data, lack an important aspect of user experiences. The data cannot reveal the reasons behind the experiences. This means that quantitative data can tell exactly how pleasant a mobile phone is to use, e.g., without revealing anything about the reasons behind the pleasantness level. This information is crucial especially when there is something wrong: The developers have to know whether the problem is with the physical shape of the device, e.g., or the responsiveness of the touch screen to improve the product.


At least some qualitative data should be gathered, then. Interviewing users may provide a rich data set, but conducting thorough interviews with all