
3 Case Studies

3.6 Dictator (VI)



Recording patient information into patient information systems is a notable part of the work for healthcare professionals, e.g., nurses and doctors.

Patient information entries are done either by manual typing or dictation.

In the latter case, the dictations are further manually transcribed by another person or automatically with speech recognition. To our knowledge, speech recognition is still rarely used in Finnish healthcare. The manual transcription of dictations takes time, and documents easily accumulate and create queues. Thus, there is inefficiency in getting patient information to the next treatment step. To address these issues, we designed and implemented a dictation application based on automatic speech recognition in close collaboration with researchers from the nursing sciences. The application was evaluated with nurses in one of Finland’s university-level hospitals. A sample usage situation can be seen in Figure 23.

The original Publication V is based on this case study.

Figure 23. An evaluation situation in the Dictator case (VI) (© Riitta Danielsson-Ojala).

3.6.1 Objective

The purpose of the case was to enable speech-based entry of patient information for nurses as a true alternative to manual typing. Evaluation-wise, the objective was to investigate the potential of the approach in this domain, i.e., what the nurses expect from it and how well they receive this kind of functionality as part of their work.


3.6.2 System

The system under evaluation in this case consisted of a tablet end-user client, a server, a Lingsoft4 speech recognition engine with a medical language model, and an M-Files5 document management system. The most important part visible to the participants was the end-user client, referred to here as "the dictation application". Its graphical user interface can be seen in Figure 24. The application has functionality for recording, browsing, listening to, and editing audio recordings, as well as browsing and editing the recognized texts.

Figure 24. The graphical user interface of the dictation application (Keskinen, Melto, et al., 2013, Figure 1).

Users have their own personal user accounts, and the documents are further organized by patient and day. While recording a dictation, the energy level of the audio signal is visualized, along with a scrolling timeline with bar visualization. These bars can later be used to navigate within the audio. After the recording is finished, it is sent to the server and then to the speech recognition engine. When the recognition is ready, the user receives a transcript of the dictation. If there are multiple possible recognitions for a word, that word is highlighted in red, and the options can be accessed by tapping the word and then the desired option from the list that appears. All other words can be edited by tapping the word and typing the desired replacement. Because of strict policies and restrictions, as well as an enormous workload, our application was not integrated with the official patient information system. Therefore, the final dictation transcriptions were copied from the document management system into the official patient information system using a PC. A more detailed description of the technical solutions can be found in the original publication (Keskinen, Melto, et al., 2013).

4 http://www.lingsoft.fi/?lang=en

5 http://www.m-files.com/en

The medical language model available for the evaluation was based on doctors' dictations. We noticed that it was not able to recognize the language and terminology used by the wound care nurses at a sufficient level: the word error rate in our recognition tests before the evaluation was over 20 percent at its best, and in our opinion the participants would have been required to make too many corrections to the recognized text. Thus, we decided to use the Wizard of Oz approach, where a researcher fixes the most obvious recognition errors in the text before it is made available to the end user, i.e., the participant in the case of an evaluation. For the wizard, we had another version of the application, which enabled him or her to see the recognized text counterparts of the participants' dictations, edit them, and send them back to the server, making them available to the participants as well.
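Word error rate, mentioned above, is conventionally computed as the word-level edit distance between the reference transcript and the recognizer's hypothesis, divided by the number of reference words. The following sketch illustrates the standard computation; the word sequences shown are placeholders, not data from the actual evaluation:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a standard dynamic-programming edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of five reference words -> WER 0.2, i.e., 20 percent
print(word_error_rate("a b c d e", "a b x d e"))  # 0.2
```

A rate "over 20 percent" thus means that, on average, more than one word in five required correction.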

3.6.3 Challenges

The challenges in this case arose from the context in many respects. First, evaluations in a work environment must respect the fact that the participants are actually working while attending the study. This means that what they are asked, and asked to do, should be clearly more beneficial to them than in the evaluation of entertainment applications, which are mainly evaluated during people's leisure time. Second, the healthcare domain is a specialized field that requires special knowledge and includes many policies and restrictions related, for instance, to privacy issues. Although these are not related to user experience per se, practical limitations are an essential part of any evaluation regardless of the research topic.

3.6.4 Evaluation

This user experience evaluation was conducted in a hospital environment with two female nurse participants, who made altogether 97 dictations.

User expectations before the usage period and user experiences after the usage were gathered from them with electronic forms.

Context

This case concerned the healthcare domain and a work environment, as the application is meant for professionals working in the field. The physical environment of the evaluation was the outpatient wound clinic of a university-level hospital, more specifically a reception room where patient visits take place and patient information system entries are made. For the evaluation, the participants were given a tablet computer and a headset with a microphone. The computer already available in the room was also used when copying the final dictation transcriptions into the official patient information system.


Although the tablet dictation application itself allows mobile usage, mobile usage was not evaluated here due to the participants' mainly static work environment. Furthermore, as the integrated microphones of the tablets we tested did not produce sufficient audio quality, we were forced to use a headset, which would also have complicated mobile usage.

Participants

We had two female nurse participants aged 30 (P1) and 36 (P2) years.

Participant 1 had eight years of work experience in nursing, three of them at the wound clinic, where she worked one day every second week at the time of the evaluation. Before the evaluation, she wrote all the nursing entries, which she reported took about 80 to 100 minutes in a work shift.

Participant 2, in turn, had 13 years of work experience in nursing, eight of them at the wound clinic, where she worked two days a week at the time of the evaluation. She usually dictated the nursing entries, which took her about 60 minutes every work shift.

Both participants reported using four different systems into which they dictate or write entries. The entries include field-specific information about wound properties, treatment products and methods, treatment plans, consultations, and so forth. Both participants reported making notes for their dictations or text entries. Neither of the participants had used speech recognition before the evaluation, and only Participant 1 had tried a tablet computer a few times beforehand. Before the usage, both participants thought speech recognition could be useful in their work, and they also said that they could dictate during the care situation while treating a patient.

Procedure

The evaluation took three months in total. During this time, Participant 1 made 30 dictations and Participant 2 made 67 dictations. The difference in the number of dictations is explained by the participants' differing numbers of work shifts at the wound clinic. The procedure of the overall evaluation is presented in Table 12, followed by a description of one dictation cycle.

Due to research permission policies, all communication with the participants was done by the nursing science researchers. Thus, the interview and the introduction of the application before the evaluation were done by them. In addition, if there were any problems in using the application, the participants contacted the nursing science researchers.


Evaluation phase    Content
Before the usage    • Interview of background information and current work practices
                    • Introduction of the application
                    • Expectations questionnaire
Usage               • Using the application as part of normal work
After the usage     • Experiences questionnaires
                      ◦ SUXES + open questions
                      ◦ SUS

Table 12. The evaluation procedure of the Dictator case (VI).

During the actual usage phase of the evaluation, the participants used the application as part of their normal work and dictated everything they would normally enter into the patient information system. Because we were utilizing the Wizard of Oz technique, at this point, a researcher checked the speech recognition results and fixed the most obvious errors. After this, the corrected text was made available for the participant as well. Obviously, the participant was not aware of the wizard, but instead was under the impression that the received text was the result from the speech recognizer.

After receiving the text counterpart, the participant was able to edit it and, e.g., listen to the audios at certain points as necessary. When finished, she copied the final version of the text from the document management system and entered it into the official patient information system.

Subjective data collection

Background information interview. Thorough background information and information about current work practices were gathered in verbal interviews before the actual usage period began. The main observations from these data are listed above in the Participants section; the remaining questions dealt mainly with matters concerning the field of nursing sciences. The background information requested can be found in its entirety in Appendix 1.

User expectations and experiences. User expectations and experiences were gathered according to SUXES (Section 2.2.2). In addition to the nine original properties—speed, pleasantness, clarity, error-free use, error-free function, easiness to learn, naturalness, usefulness, and future use—we constructed five statements comparing dictating with the application to the normally used entry practice. These concerned the properties speed, pleasantness, clarity, easiness, and future use. The basic SUXES statements were phrased, e.g., as "Using the application is pleasant" for pleasantness, and the comparative statements were:

• Dictating with the application is faster than with the entry practice I normally use.


• Dictating with the application is more pleasant than the entry practice I normally use.

• Dictating with the application is clearer than the entry practice I normally use.

• Dictating with the application is easier than the entry practice I normally use.

• I would rather make the entries with the dictation application in the future than with the entry practice I used before.

Following the principles of SUXES, the expectations were reported by giving two values, an acceptable and a desired level, and the experiences by giving one value, the perceived level. All values were given on a seven-step scale ranging from low to high.
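In SUXES terms, a perceived level at or above the acceptable level falls within the participant's expected range, and a perceived level reaching the desired level fully meets the expectations. This interpretation can be sketched as follows; the function name and the example values are illustrative, not taken from the evaluation data:

```python
def interpret_suxes(acceptable: float, desired: float, perceived: float) -> str:
    """Relate a perceived (experience) level to the expectation range
    [acceptable, desired] reported before use, on the seven-step SUXES scale."""
    if not 1 <= acceptable <= desired <= 7:
        raise ValueError("expected 1 <= acceptable <= desired <= 7")
    if perceived >= desired:
        return "meets or exceeds the desired level"
    if perceived >= acceptable:
        return "within the expected range"
    return "below the acceptable level"

# E.g., acceptable 4, desired 6.5, perceived 6:
print(interpret_suxes(4, 6.5, 6))  # within the expected range
```

This is the reading used below when the perceived medians are compared with the acceptable–desired ranges.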

To gather more general feedback and development ideas on the application, the experiences questionnaire included the following questions:

• Did using the headset distract you from dictation? (No / Yes / I don’t know)

• If it did distract, how?

• Could you use the headset daily, if it were a prerequisite for using the dictation application? (No / Yes / I don’t know)

• How did introducing speech recognition and the dictation application change your work practices?

• What speech commands are missing from the dictation application, in your opinion?

• What buttons are missing from the dictation application, in your opinion?

• How could the speech recognition or the dictation application be developed?

• Would you like to comment about anything else?

In addition to the SUXES-based expectations and experiences and the open-ended questions above, the nursing science researchers collected subjective usability-related experiences with the System Usability Scale (SUS) (Brooke, 1996), adapting a Finnish translation presented by Vanhala (2005, p. 26). All of the questionnaires were in electronic form, and the participants filled them in using a PC’s web browser.
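The SUS yields a single score from 0 to 100: each odd-numbered (positively worded) item contributes its response minus 1, each even-numbered item contributes 5 minus its response, and the summed contributions are multiplied by 2.5 (Brooke, 1996). A sketch of this standard scoring, with made-up responses:

```python
def sus_score(responses: list[int]) -> float:
    """Compute the System Usability Scale score (0-100) from the ten
    item responses, each on a 1-5 scale (Brooke, 1996)."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS requires ten responses on a 1-5 scale")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)  # odd items: r-1; even: 5-r
    return total * 2.5

# All-neutral responses (3 on every item) give the midpoint score
print(sus_score([3] * 10))  # 50.0
```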

Supportive, objective data collection

To find possible user patterns and to monitor the system’s functions, system and user events were logged throughout the evaluation. However, due to the low number of dictations per participant combined with the varying “dictation cases,” analyzing these data was not considered relevant for this dissertation.


3.6.5 Outcome and Conclusions

My main responsibility in this evaluation case was to design the collection of subjective data. This was done by gathering user expectations and experiences with mainly statement-based questionnaires. The statement results can be seen in Figure 25.

Figure 25. Median user expectations and experiences in the Dictator case (VI). (A) indicates the statements concerning the mobile dictation application, and (B) indicates the statements comparing the application to the normally used entry practice. Grey boxes represent the expectations (acceptable–desired levels), and black circles represent the actual experiences (Keskinen, Melto, et al., 2013, Figure 2).

The results show that our participants had rather high expectations about the dictation application, although they saw rather modest fulfillment of the qualities as acceptable: Considering the statements concerning only the dictation application, the median desired level is 6 or 7 on each statement, and the median acceptable level ranges between 3 and 4.5. The high desired levels were met on almost all statements. Error-free functioning was experienced clearly below expectations because of severe issues with the hospital’s wireless Internet connection.

Considering the statements comparing the dictation application and the normally used entry method, the participants had even higher expectations: the median desired levels ranged only between 6.5 and 7, while the median acceptable levels ranged between 4 and 4.5. The desired level of the statement “Dictating with the application is clearer than the entry practice I normally use” was perfectly met, and the expectations of the other comparative statements were nearly realized as well. These results are all the more satisfactory given that one of the participants (P1) was not used to dictation at all, so the change in her routines during the evaluation could easily have resulted in more skeptical experiences.


The positive attitude towards the application seen in the statement-based results is further strengthened by the responses to the open questions: the participants were now able to check the text much faster, whereas normally it could take about a week before a dictation was available in writing. The participants were not even bothered by the headset; in fact, they would have been ready to use it daily if necessary. The participants could not come up with missing speech commands or buttons, and the only development area they mentioned was better recognition of compound words. The importance of a working Internet connection was mentioned by one of the participants, but based on the statement-based results, even the connection problems did not ruin the satisfaction with the application.

As fully automatic speech recognition could not be used in this case, the evaluation was conducted using the Wizard of Oz technique. Although the user experiences are thus not based on truly existing automatic speech recognition, the results indicate users’ reactions and opinions on an application with a sufficient speech recognition rate, i.e., a rate that would be acceptable in a work environment. Thus, the user experience results demonstrate the potential of speech-based patient information entry as a true alternative to manual typing.

As anticipated, the challenges in this case arose from the context, i.e., the work environment and the healthcare domain. Because the nurses handled real patient information, all communication with them had to be carried out by our project partners, i.e., the nursing science researchers who had permission to access the data. Without the nursing science researchers being responsible for the practical execution of the evaluation, it might have been extremely challenging, if not impossible, to receive approval for the research from the hospital district. Aside from these limitations and other practical issues, such as technical problems, the evaluation was rather straightforward from the user experience point of view.

The questionnaires had to be designed carefully: the content had to be unambiguous and the items “worthwhile” to avoid wasting the nurses’ time. These are properties of a good questionnaire in every evaluation, naturally, but especially so in this case, as we did not have a representative from the field of human-technology interaction present. To investigate the true potential of adopting this kind of application as part of the nurses’ work, statements comparing the application to the current patient information entry practice were created and inquired about in addition to the more familiar SUXES-like statements. The results were mainly extremely positive, and the participants were enthusiastic about the evaluated entry practice. As a possible downside, though, we did not receive any suggestions for improvements aside from the better recognition of compound words, which is a matter that concerns the language model more than the application itself.
