
The ability to call speech synthesis without user activation has been removed from Google Chrome since version 70 due to API abuse concerns. This causes a slight issue for website design on Google Chrome, as the website cannot automatically speak to the user to let them know they have entered the website; instead, the user must interact with the website, for example by clicking something, before speech synthesis is allowed.

This restriction does not occur on other browsers, but it limits the accessibility of voice-controlled websites on the Chrome browser. (Chrome Platform Status, 2021)

5. Evaluation

IT artifacts can be evaluated by functionality, completeness, consistency, and usability, among other quality attributes. A practical evaluation is performed to determine whether the developed system is more effective than existing solutions and whether the websites created with the system are suitable for public release on the internet. This is a case of a socio-technical evaluation, where the benefits and usefulness to the end users are evaluated instead of focusing on the technical and performance aspects of the system.

(Hevner & Chatterjee, 2010, pp. 109–111)

A total of five participants were interviewed for the evaluation. Three of the participants had no previous experience in creating websites or using CMSs and could be considered the exact target audience of the system. The remaining two participants had some experience in website development or CMS usage. As the sample size was small due to the time limitations of writing the thesis, qualitative methods were selected as the most effective way of evaluating the system.

The system was evaluated through qualitative interviews, where the participants were individually given a demonstration of the Editor Interface and the Visitor View and asked, in a free-form conversation, about their opinions on the usability of the system and the potential of this kind of system. The evaluation consisted of a hands-on demonstration of the system with an explanation of its concept and design. The users were then asked to add a new page to a sample website along with new voice commands, and to test the functionality of a sample website. The usability of both the Editor Interface and the Visitor View was evaluated.

To evaluate the system on a quantitative scale, the participants were asked to fill in a System Usability Scale (SUS) form. SUS consists of ten statements evaluating subjective usability, covering different aspects of system usability such as the need for training or system complexity. The participants agree or disagree with the statements on a Likert scale. The answers are then converted to numbers, from which a SUS score ranging from 0 to 100 is calculated to measure system usability (Brooke, 1995).
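The SUS calculation described above is simple enough to show in full: odd-numbered statements contribute their answer minus one, even-numbered statements contribute five minus their answer, and the sum of the ten contributions is multiplied by 2.5. A minimal sketch:

```typescript
// Compute a SUS score from ten Likert answers (1-5), in statement order.
// Odd statements (1st, 3rd, ...) score (answer - 1); even ones (5 - answer).
function susScore(answers: number[]): number {
  if (answers.length !== 10) throw new Error("SUS requires exactly 10 answers");
  const sum = answers.reduce((acc, answer, i) => {
    const contribution = i % 2 === 0 ? answer - 1 : 5 - answer;
    return acc + contribution;
  }, 0);
  return sum * 2.5; // scales the 0-40 raw sum to 0-100
}
```

For example, a participant who answers 3 ("neutral") to every statement receives a score of exactly 50.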

According to Sauro (2011), the average SUS score is 68; anything over 68 is considered above average in terms of usability, and scores below 68 are considered below average.

Even though the sample size was very small, SUS can be used to obtain reliable results about the usability of a system (Sauro, 2011).

5.1 Evaluation of the Usability of the Editor Interface

The participants evaluated the editor interface as it was seen in figures 3.3 and 3.4.

The average SUS score for the editor was 73, which indicates the system is slightly above average in terms of usability.

The following questions were asked as a part of the qualitative interview:

• Does the editor interface look like it is easy to understand?

• Could you create a website with the system based on looking at the user interface?

• How would you add a page to the website with the system?

• How would you add a voice command to a page?

• Are there some parts in the system which you do not understand?

The editor interface was seen as quite simple to use and self-explanatory for the most part, especially when changing the contents of a page. The Dashboard was seen as very simple and requiring no changes to its functionality, although one participant suggested moving the Add a New Page button below the list of the pages on the website, as opposed to above the list, where it was originally placed.

On the Page Editor side, the concept of editing voice commands was seen as slightly more confusing. While the concepts of voice commands and their use in the finished website were clear to the users, the creation of voice commands could use some clarification.

Figure 3.11 shows the voice command editing toolbox in the Page Editor. While the concept and the different parts of a voice command were seen as clear, the editor form needs further clarification to make the intents clear, especially if the user does not have an existing voice command in the application to use as an example.

The term action could be changed to something clearer, such as voice command action; the term utterance could likewise be changed, and the icon for the response could be changed to better indicate that it is the response spoken by the TTS system.

It was also suggested that the controls for voice commands be moved from below the content text input field to the bottom of the page, so the user would not have to scroll as much vertically when editing multiple commands on a page.

Some issues and suggestions for future research also arose from the interviews. The ability to change the fonts and styling of a web page's contents was requested, as well as the ability to control how the TTS system speaks the response. One problematic case was that if a voice command for a phone number were created, the TTS system might read the number as one large number instead of speaking the digits individually.
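The phone-number problem mentioned above could be mitigated with a simple preprocessing step before a response is handed to the TTS system: long digit runs are split into individual digits so the synthesiser speaks them one by one. This is a sketch of one possible mitigation, not a feature of the evaluated prototype, and the four-digit threshold is an arbitrary assumption.

```typescript
// Split digit runs of four or more characters into spaced single digits,
// so a TTS engine reads "0 4 0 ..." instead of one very large number.
function spellOutLongNumbers(response: string): string {
  return response.replace(/\d{4,}/g, (digits) => digits.split("").join(" "));
}
```

Shorter numbers, such as prices or opening hours, are left untouched so they are still read naturally.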

5.2 Evaluation of a Website Created with the System

The participants evaluated a website created with the system, as shown in figure 3.5, focusing on the functionality of voice commands. The average SUS score for the website was 86, meaning the system is well above average in usability.

The following questions were asked as part of the qualitative interview, also to gather the participants' opinions about voice controls in general:

• How do you feel about using voice commands?

• Do you feel there are any benefits in using voice commands?

• Do you think there are any major downsides to using voice commands?

• Would you personally use voice commands if they were an option?

• Do you feel the website is intuitive to use?

The concept of using voice commands was clear, even though some participants found them a bit strange at first due to having no previous experience with them. After a few tries, voice commands were seen as quite intuitive.

It was noted that the website should give clear cues to the end user on things they can say to the website. While visual cues stating that the website supports voice commands, along with examples of possible sentences, can be added to the website, they should not be the only indication of voice command support. Ideally, the website should start talking first and give the user clear context for using voice commands, possibly asking the user whether they want to use voice commands or not.

Another issue requiring visual cues was the need to notify users to turn their computer's sound on if they have not done so already. As a web browser most likely cannot know the volume level of the computer, this can only be communicated to the user visually.

While the speech detection needs some improvements in accuracy and in restricting the words it can detect, voice-controlled websites were seen as a positive thing, and all the participants said they could see themselves using these kinds of websites in the future. As expectations of the level of voice interaction were high across participants, future research should aim at developing fully conversational interfaces for websites.

6. Conclusions

The thesis investigated the development of a Content Management System for small-scale voice-controlled websites to make the development of accessible websites easier and faster. The motivation was to improve the accessibility of websites by allowing the users to navigate and control websites using their voice via speech recognition and allowing them to get information back from the website by speech synthesis. This would allow users such as the visually impaired or those unable to use more traditional input methods such as mouse and keyboard to access websites more freely.

The first research question investigated how voice-controlled websites are currently developed. Voice-controlled websites were often found to be developed from scratch without common guidelines or design patterns, requiring custom implementations and a lot of technical knowledge every time a new web application is to be developed.

Technologies typically used for creating voice-controlled web applications include HTML, JavaScript, and a speech-to-text framework such as the Web Speech API or IBM Watson.

Voice controls were found to have potential in both improving accessibility of websites as well as improving user task performance when used simultaneously with a Graphical User Interface, but difficulties in development might prevent more widespread adoption.

Thus, the thesis proposed a Content Management System with built-in support for voice controls as a solution to make the development of voice-controlled websites easier. Content Management Systems allow users to create websites without considerable technical expertise, letting them change the contents of a website through a Graphical User Interface while the system takes care of the technical details.

The second research question investigated the minimum required functionalities of such a CMS with support for developing a voice-controlled website. First, the target use cases and audience were investigated. The system developed in this thesis was targeted towards personal and small business websites, which contain informational but not highly interactive content. Personal and small business websites are used to promote the services of individuals and companies and to show directions, prices, contact information and the like. These groups were targeted because previous research showed small businesses to be the largest group often lacking their own websites even though they could benefit from one. As small business owners and individuals typically do not have considerable resources to develop websites, let alone voice-controlled ones, they are the ones who would benefit the most from using the system.

Previous research showed CMSs to be often used by non-technical personnel whose computer skills are limited to basic office software, so the proposed system needed to be simple to use and easy to understand. Due to the scale of the thesis, the system was designed to be used by a single user, without support or safety checks for concurrent editing.

Based on these findings, the minimum required functionalities of a CMS for voice-controlled websites were determined to be:

• the ability to display web pages to the visitor,

• the ability to navigate between pages on the website via a Graphical User Interface,

• the ability to interact with the website using voice commands, both to navigate between pages and to ask questions from the website, and

• the ability to edit the contents of the website via a simple GUI.
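The voice interaction requirement above can be modelled with a very small data structure: each command pairs the utterances it accepts with either a navigation target or a spoken response. The sketch below is illustrative only; the interface and function names are assumptions, not the actual data model of the prototype.

```typescript
// One voice command: phrases the visitor may say, what the command does,
// and either the page to navigate to or the text for the TTS response.
interface VoiceCommand {
  utterances: string[];
  action: "navigate" | "respond";
  target: string;
}

// Return the first command whose utterance matches the recognised speech.
function matchCommand(
  transcript: string,
  commands: VoiceCommand[],
): VoiceCommand | undefined {
  const heard = transcript.trim().toLowerCase();
  return commands.find((cmd) =>
    cmd.utterances.some((u) => u.toLowerCase() === heard),
  );
}
```

A real system would match more loosely than exact string equality, but even this shape captures the two interaction types the requirements call for: navigation and question answering.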

The third research question investigated the limitations of the CMS described above.

The limitations of the system lie in the scale and abilities of its voice interaction, as the base system can only respond to relatively simple voice commands with speech responses. Voice commands can only be used to navigate between pages or to answer questions. In its current state, the system is not able to hold full conversations, unlike virtual assistants such as Alexa or Siri, which users might expect if they are used to such systems.

A technical prototype of the system, implementing the required features mentioned above, was developed using TypeScript, the React framework, and the Web Speech API. The system was developed to evaluate the possibilities of a CMS for voice-controlled websites in action. The system was evaluated through qualitative interviews, which found potential in the application, both in voice-controlled websites and in combining CMS functionality with support for voice controls. The evaluations showed the system is effective at creating simple voice-controlled websites but should be developed further to allow for more complicated conversations and for changing the visual styles of websites before it could be used in a production environment.

6.1 Limitations of the System and Future Research

The developed version of the system has a number of limitations due to the scope and time constraints of this thesis. Mainly, it lacks the ability to edit the visual styling of the website, to add images to the pages, to change the fonts, sizes or styling of the text contents, and to have different layouts for individual pages.

These are all possible options for future development of the system, which might require changing the content storage format of pages from plain text to a markup language, which would then be converted into React website elements on the Visitor View.
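To illustrate what such a conversion step could look like, the sketch below assumes a minimal hypothetical markup: blank-line-separated blocks become paragraphs, and a leading "# " marks a heading. A real implementation would emit React elements; plain HTML strings are used here only to keep the sketch self-contained.

```typescript
// Convert a minimal markup (assumed format, not the prototype's) into HTML:
// blocks separated by blank lines become <p> elements, "# " lines become <h1>.
function renderContent(source: string): string {
  return source
    .split(/\n{2,}/)
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((block) =>
      block.startsWith("# ")
        ? `<h1>${block.slice(2)}</h1>`
        : `<p>${block}</p>`,
    )
    .join("");
}
```

Styling options, such as the fonts requested in the evaluation, could then be attached to these generated elements rather than to raw text.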

The system has only been tested on desktop computers and needs testing on mobile phones, tablets, and other devices to find out how the layouts work, whether the Web Speech API is usable outside of desktop browsers, and whether voice commands offer performance benefits when used in combination with a touch screen.

The Dashboard in the Editor View is quite underutilised and could include more functionality, such as reorganising pages, renaming pages in both the Dashboard and the Page Editor, and website information such as visitor statistics.

The system currently lacks login functionality and user access control, meaning anyone who goes to the editor URL could edit the contents of the page. This functionality was not implemented because it was not seen as a high priority, as this thesis focused on the accessibility evaluation of the system. If the system were to be put into further use, login functionality, access protection and security issues should be the highest priorities on the development list. The system was also designed to be used by a single editor user at a time on a local computer, meaning public access to the editor on the internet would require protection against concurrent editing, better edit validation and preferably a history for keeping track of changes.

The system supports only the English language. Support for other languages would require translating all the user interface elements, adding a translation API layer to the editor code to allow changing the language of the editor, and testing which languages the Web Speech API works with.

Future research should investigate the possible accessibility benefits of the system and focus on how the system integrates with other accessibility software, such as how the system works for a user using a screen reader. Screen readers might cause issues with a website implementing speech synthesis, as the two systems might interfere with each other, speaking at the same time or causing other kinds of problems. Research should also focus on what kinds of voice commands the users of websites would want and find useful, and how the creation of these kinds of commands could be encouraged and communicated through the design of the Editor View.

In the current version of the system, voice commands are tied to pages, but future versions could benefit from additional site-wide commands, to which navigation commands or help requests could be added. This would make the creation of websites simpler and allow faster creation of new pages, as navigation commands would not have to be added individually to every single page.
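The site-wide command idea above amounts to a merge at page-load time: a page's own commands are combined with the commands shared by the whole site, with page commands taking precedence when an utterance conflicts. A minimal sketch, with illustrative shapes that are not the prototype's data model:

```typescript
// One simplified command: what the visitor says and what the site speaks back.
interface Command {
  utterance: string;
  response: string;
}

// Merge site-wide commands into a page's command list; a page command
// overrides a site-wide command that listens for the same utterance.
function activeCommands(siteWide: Command[], page: Command[]): Command[] {
  const overridden = new Set(page.map((c) => c.utterance.toLowerCase()));
  const inherited = siteWide.filter(
    (c) => !overridden.has(c.utterance.toLowerCase()),
  );
  return [...page, ...inherited];
}
```

With this structure, a "go to the front page" or "help" command defined once at the site level would automatically be available on every new page.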

Future research could extend the scope of the system beyond informational and small websites into more interactive websites such as web stores, to find out whether voice controls are beneficial in those types of tasks. The system could be extended with third-party plugins to implement web store functionality, combining the voice commands of the CMS with actions such as adding an item to a shopping cart.

However, implementing more interactive content such as web store functionality or logins raises new privacy concerns for voice controls. Sites that use passwords or require users to enter private information are especially vulnerable when used in public places, as outsiders can overhear people entering their private details if they are given in plain text (or plain speech) form. Thus, research is needed to develop secure ways of entering information by voice in settings where users cannot be sure their private information is not overheard. One solution might be functionality where users enter their private details into a computer beforehand, and the computer then supplies the information to the software or website in use at the given time, but this requires good interoperability of software and standardised information formats.

6.2 Summary

The research questions of this thesis were:

1. How are voice-controlled websites currently developed?

2. What functionalities are required from a Content Management System to enable easier development of voice-controlled websites?

3. What are the limitations of a CMS aimed for development of voice-controlled websites that implements the functions required in research question 2?

The answers to the research questions are:

1. Voice-controlled websites are often developed from scratch, individually for each specific use case. No common tools or templates exist yet, and development is time-consuming.

2. A system enabling easier development of voice-controlled websites needs to support adding web pages and changing their contents through a Graphical User Interface, as well as an interface for adding voice commands to the pages to enable asking questions and navigating between pages by voice.

3. The limitations of such a system are its limited interaction patterns: as the system is designed around general use, it does not offer as deep customisation and integration with interactive website functionalities, such as a web store, as a custom-made solution would. The system also supports only simple conversations between the user and the computer, where the user can make a request by voice and the system can answer the request using speech synthesis, but it will not remember the previous conversation.

References

Adorf, J. (2013). Web Speech API.

Al-Hawari, M., Al-Yamani, H. & Izwawa, B. (2008). Small Businesses' Decision to have a Website Saudi Arabia Case Study. International Journal of Industrial and Systems Engineering, 2(1), 5.

Barra, S., Carcangiu, A., Carta, S., Podda, A. S. & Riboni, D. (2020). A Voice User Interface for football event tagging applications. Proceedings of the International Conference on Advanced Visual Interfaces, 1–3. https://doi.org/10.1145/3399715.3399967

Brill, J. (2021). react-speech-recognition. Retrieved June 12, 2021, from https://www.npmjs.com/package/react-speech-recognition

Brooke, J. (1995). SUS: A quick and dirty usability scale. Usability Eval. Ind., 189.

Christianson, C. & Cochran, J. (2009). ASP.NET 3.5 Content Management System Development [OCLC: 1105777875]. Packt Publishing.

Chrome Platform Status. (2021). Remove SpeechSynthesis.speak without user activation. Retrieved July 14, 2021, from chromestatus.com/feature/5687444770914304

Dasgupta, R. (2018). Voice User Interface Design: Moving from GUI to Mixed Modal Interaction. https://doi.org/10.1007/978-1-4842-4125-7

Freitas, D. & Kouroupetroglou, G. (2008). Speech Technologies for Blind and Low Vision Persons. Technology and Disability, 20, 135–156. https://doi.org/10.3233/TAD-2008-20208

Gautam, R., Akshay, G. J., Dhavan, R., Kumawat, A. & Ajina, A. (2020). Speech oriented virtual restaurant clerk using web speech API and natural language processing. International Journal of Engineering Research & Technology (IJERT), V9(5), IJERTV9IS050684. https://doi.org/10.17577/IJERTV9IS050684

Google. (2019). Build a Conversational Action for Google Assistant. Retrieved May 27, 2021, from https://codelabs.developers.google.com/codelabs/actions-1