
7. TOOLKITS FOR DEVELOPING SPOKEN DIALOGUE SYSTEMS

The development of a spoken dialogue system is a complex process involving the integration of the various component technologies described in Section 4. It would be a formidable task to build and integrate these components from scratch. Fortunately, a number of toolkits and authoring environments have become available that support the construction of spoken dialogue systems, even for those who have no specialist knowledge of the component technologies such as speech recognition and natural language processing. The following are some of the dialogue development environments that are currently available:

- the Generic Dialogue System Platform (CPK, Denmark)
- GULAN - An Integrated System for Teaching Spoken Dialogue Systems Technology (CCT/KTH)
- the CSLU toolkit (Center for Spoken Language Understanding at the Oregon Graduate Institute of Science and Technology)
- the CU Communicator system
- the Nuance Developers' Toolkit (Nuance Communications)
- SpeechWorks
- Natural Language Speech Assistant (NLSA) (Unisys Corporation)
- SpeechMania®: A Dialogue Application Development Toolkit (Philips Speech Processing)
- the REWARD Dialogue platform
- Vocalis SpeechWare®

The first three systems were developed mainly to support academic research and to support the teaching of spoken language technology. The CPK toolkit, developed at the Centre for PersonKommunikation at the University of Aalborg in Denmark, has been incorporated into the REWARD dialogue platform and is accompanied by a Web-based course. This material, which includes details of the development platform to be used for implementation, is currently not publicly available. GULAN, a system for teaching spoken dialogue technology, is under development at KTH (Stockholm) and at Linköping University and Uppsala University [Sjölander et al. 1998]. The system, which is currently in Swedish but due to be ported to English, is presently only runnable locally. The CSLU toolkit, to be described in greater detail below, is available free of charge under a licence agreement for educational, research, personal, or evaluation purposes. The commercial systems are available under a range of licence agreements. Some systems are available as evaluation versions and others can be obtained at a relatively low cost for academic purposes. Websites with further information about these systems, including pricing, are listed in Appendix B.

A comprehensive description and evaluation of all these systems is beyond the scope of the current survey. To give a flavour of what is available, one academically-oriented system, the CSLU toolkit, and one commercial system, the Philips SpeechMania® system, will be examined, followed by a brief outline of desirable features of spoken dialogue toolkits.

Fig. 17. Using RAD to simulate an auto attendant at a furniture store.

7.1 The CSLU toolkit

The CSLU toolkit has been developed at the Center for Spoken Language Understanding (CSLU) at the Oregon Graduate Institute of Science and Technology to support speech-related research and development activities [Sutton et al. 1998]. The toolkit includes core technologies for speech recognition and text-to-speech synthesis, as well as a graphically-based authoring environment (RAD) for designing and implementing spoken dialogue systems. This section will focus only on RAD. Information about other components of the CSLU toolkit can be found at the CSLU website (see Appendix B).

A major advantage of the RAD interface is that users are shielded from many of the complex specification processes involved in the construction of a spoken dialogue system. Building a dialogue system involves selecting and linking graphical dialogue objects into a finite-state dialogue model, which may include branching decisions, loops, jumps, and sub-dialogues, as illustrated in Figure 17. Each object can be used for functions such as generating prompts, recording and recognizing speech, and performing actions.
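
As an illustration of the kind of finite-state dialogue model that RAD objects are linked into, the following is a minimal sketch in Python. It is not the RAD API; the states, prompts, and the simulated recognizer are hypothetical.

```python
# A minimal finite-state dialogue model: each state has a prompt and a set
# of transitions keyed by the recognized reply. Purely illustrative, with
# hypothetical states and prompts (not the RAD API).

DIALOGUE = {
    "greeting": {
        "prompt": "Welcome to the furniture store. Sales or service?",
        "next": {"sales": "sales_dept", "service": "service_dept"},
    },
    "sales_dept": {"prompt": "Connecting you to the sales department.", "next": {}},
    "service_dept": {"prompt": "Connecting you to the service department.", "next": {}},
}

def run_dialogue(recognize, state="greeting"):
    """Traverse the state graph until a state with no outgoing transitions."""
    while True:
        node = DIALOGUE[state]
        print(node["prompt"])                   # stands in for speech output
        if not node["next"]:
            return state
        reply = recognize()                     # stands in for speech recognition
        state = node["next"].get(reply, state)  # unrecognized input: stay and re-prompt

if __name__ == "__main__":
    # Simulate the caller with keyboard input instead of a recognizer.
    run_dialogue(lambda: input("> ").strip().lower())
```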

As far as speech recognition is concerned, the input can be in the form of single words, for which a tree-based recognizer is used, or as phrases or sentences that are specified using a finite-state grammar, which also enables keyword spotting.
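
The effect of keyword spotting over a phrase-level grammar can be approximated as in the sketch below. The keyword set is hypothetical, and RAD compiles such constraints into the recognizer itself rather than filtering recognized text afterwards.

```python
# Sketch of keyword spotting: scan a recognized utterance for the words
# that matter at the current state, ignoring surrounding carrier phrases.
# The keyword set is hypothetical and purely illustrative.

KEYWORDS = {"sales", "service", "delivery"}

def spot_keywords(utterance):
    """Return the state-relevant keywords found in the utterance, in order."""
    return [w for w in utterance.lower().split() if w.strip(".,?!") in KEYWORDS]

print(spot_keywords("um I'd like to talk to someone in sales please"))
# -> ['sales']
```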

There are additional built-in facilities for digit and alpha-digit recognition. The words specified for recognition at a given state are automatically translated by the system into a phonetic representation called Worldbet, using built-in word models stored in dictionaries. Pronunciations can also be customised using the Worldbet symbols. It is also possible to implement dynamic recognition, in which case a list of words to be recognized is obtained from some external source, such as a Web page, and pronunciation models for the words are generated dynamically at run-time.
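
A rough sketch of the idea behind dynamic recognition is given below: the active vocabulary is fetched from an external source at run-time and handed to the recognizer. The URL, the page format, and the set_vocabulary hook are all hypothetical placeholders, not part of the toolkit.

```python
# Sketch of dynamic recognition: build the active vocabulary at run-time
# from an external source (here, one item per line in a plain-text page).
# The URL and the recognizer hook are hypothetical placeholders.
from urllib.request import urlopen

def fetch_vocabulary(url):
    """Download a word list, one entry per line, and normalize it."""
    with urlopen(url) as response:
        text = response.read().decode("utf-8", errors="replace")
    return [line.strip().lower() for line in text.splitlines() if line.strip()]

def set_vocabulary(words):
    """Placeholder for handing the words (and generated pronunciations)
    to the recognizer; in RAD the toolkit does this itself."""
    print("Active vocabulary:", words)

if __name__ == "__main__":
    set_vocabulary(fetch_vocabulary("http://example.com/product-names.txt"))
```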

Prompts can be specified in textual form and are output using the University of Edinburgh's Festival TTS (text-to-speech) system, or they can be pre-recorded and, with some additional effort, spliced together at run-time.
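
Splicing pre-recorded prompt segments at run-time amounts to concatenating audio files in the right order. The sketch below joins WAV segments that are assumed to share the same sample rate and encoding; the file names are hypothetical.

```python
# Sketch of splicing pre-recorded prompt segments by concatenating WAV
# files; assumes all segments share the same sample rate, sample width
# and channel count. File names are hypothetical.
import wave

def splice(segments, out_path):
    with wave.open(out_path, "wb") as out:
        for i, path in enumerate(segments):
            with wave.open(path, "rb") as seg:
                if i == 0:
                    out.setparams(seg.getparams())  # copy format from first segment
                out.writeframes(seg.readframes(seg.getnframes()))

# e.g. "Your balance is" + "forty" + "two" + "pounds"
splice(["balance_is.wav", "forty.wav", "two.wav", "pounds.wav"], "prompt.wav")
```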

The use of the sub-dialogue states permits a more modular dialogue design, as sub-tasks, such as eliciting an account number, can be implemented in a sub-dialogue that is potentially re-usable. Repair dialogues are a special case of sub-dialogue. A default repair sub-dialogue is included that is activated if the recognition score for the user's input falls below a given threshold, but it is also relatively easy to design and implement customized repair sub-dialogues.
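
The logic of such a confidence-triggered repair sub-dialogue can be sketched as follows: if the recognizer's score falls below a threshold, the system drops into a repair loop that re-prompts a limited number of times before giving up. The threshold value and the listen() function are hypothetical stand-ins for the toolkit's own recognizer interface.

```python
# Sketch of a confidence-triggered repair sub-dialogue: when the
# recognition score falls below a threshold, re-prompt a limited number
# of times before falling back (e.g. to an operator). THRESHOLD and
# listen() are hypothetical placeholders.

THRESHOLD = 0.45
MAX_REPAIRS = 2

def get_input(prompt, listen):
    """listen(prompt) -> (hypothesis, confidence); return a trusted hypothesis or None."""
    hypothesis, score = listen(prompt)
    for _ in range(MAX_REPAIRS):
        if score >= THRESHOLD:
            return hypothesis
        # repair sub-dialogue: apologise and re-prompt
        hypothesis, score = listen("Sorry, I didn't catch that. " + prompt)
    return hypothesis if score >= THRESHOLD else None
```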

There is also a special dialogue object for inserting pictures and sound files at appropriate places in the dialogue without the need for complex programming commands. The listbuilder object simplifies the programming of a repetitive series of exchanges, such as questions, answers and hints in an interactive learning programme, by allowing the programmer to specify lists of questions, answers and hints in a simple dialogue box, with the system looping through each of the alternatives either in serial or random order.
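
The behaviour of the listbuilder object can be sketched as a loop over parallel lists of questions, answers and hints, visited in serial or random order. The list contents and the ask() callback are hypothetical; in RAD this is configured through a dialogue box rather than in code.

```python
# Sketch of a listbuilder-style exchange: loop over parallel lists of
# questions, answers and hints, in serial or random order. Items and the
# ask() callback are hypothetical placeholders.
import random

QUESTIONS = ["What colour is the sky?", "How many legs has a spider?"]
ANSWERS   = ["blue", "eight"]
HINTS     = ["Look up on a clear day.", "More than an insect has."]

def run_listbuilder(ask, shuffle=False):
    order = list(range(len(QUESTIONS)))
    if shuffle:
        random.shuffle(order)        # random rather than serial order
    for i in order:
        reply = ask(QUESTIONS[i])
        if reply.strip().lower() != ANSWERS[i]:
            ask("Hint: " + HINTS[i] + " " + QUESTIONS[i])  # one hint, then move on

if __name__ == "__main__":
    run_listbuilder(lambda q: input(q + " "))
```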

A number of online tutorials, accompanied by simple illustrative examples of dialogue systems, provide an introduction to the basic functions of RAD.

Functions are provided in RAD for voice-based Web access. For example, a given URL can be accessed and the HTML document read and parsed, relevant strings can be identified, tags removed, and the required information output using text-to-speech.
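
A minimal sketch of this kind of voice-based Web access is given below: fetch a page, strip the markup, pick out the relevant strings, and pass the result to a TTS engine (here just printed). The URL and the keyword used for selection are hypothetical.

```python
# Sketch of voice-based Web access: fetch an HTML page, strip the tags,
# select the strings of interest, and hand the text to TTS (printed here).
# The URL and the selection keyword are hypothetical.
from html.parser import HTMLParser
from urllib.request import urlopen

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []
    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def read_page(url, keyword):
    html = urlopen(url).read().decode("utf-8", errors="replace")
    extractor = TextExtractor()
    extractor.feed(html)
    relevant = [c for c in extractor.chunks if keyword.lower() in c.lower()]
    return " ".join(relevant)

if __name__ == "__main__":
    print(read_page("http://example.com/weather.html", "forecast"))  # stand-in for TTS output
```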

Although not documented in the current online tutorials, it is also relatively simple to develop an interface to databases and spreadsheets. Recently a natural language processing component has been developed that allows recognised strings to be parsed and relevant concepts to be extracted [Kaiser et al. 1999]. Finally, the toolkit includes an animated conversational agent (BALDI), developed at the University of California at Santa Cruz, which presents visual speech through facial animation synchronised with synthesised or recorded speech.

RAD is currently being used effectively to provide interactive language learning for profoundly deaf children [Cole et al. 1999a] and to provide a practical introduction to spoken dialogue technology for undergraduate students [McTear 1999]. Plans are underway to develop multilingual versions of the toolkit [Cole et al. 1999b].

7.2 SpeechMania®

SpeechMania®, a product of Philips Speech Processing, is an application development environment to support the development of telephone-based spoken dialogue systems. The software allows people to talk with computers over the phone to access information services such as railway and flight timetables, bank statements, and stock exchange quotations, or to engage in transactions such as reserving a hotel room or reserving seats for a movie through a call centre. The basic system architecture is shown in Figure 18.

Fig. 18. Basic SpeechMania® architecture: the caller's speech passes through speech recognition (producing word graphs), speech understanding (producing a concept graph), dialogue control, and speech output, with access to an external database via a C++ interface.
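
The data flow in Figure 18 can be caricatured as a simple pipeline: recognition turns the caller's speech into word graphs, understanding reduces these to a concept graph, dialogue control consults the external database and decides what to say, and speech output renders the reply. The sketch below illustrates that flow only; every function and data structure is a hypothetical stand-in, not the SpeechMania® API.

```python
# Caricature of the Fig. 18 data flow; all functions and data are
# hypothetical stand-ins, not the SpeechMania API.

def recognize(audio):                 # speech recognition -> word graph
    return [["i", "want"], ["two", "to"], ["tickets"]]

def understand(word_graph):           # speech understanding -> concept graph
    return {"intent": "book_tickets", "count": 2}

def dialogue_control(concepts, database):   # consults the external database
    available = database.get("tickets", 0)
    return "Two tickets reserved." if available >= concepts["count"] else "Sorry, sold out."

def speak(text):                      # speech output
    print(text)

if __name__ == "__main__":
    database = {"tickets": 10}        # stands in for the external database
    speak(dialogue_control(understand(recognize(b"...")), database))
```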
