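[Figure 11 (image not recoverable). Node labels: CHECK MEMBER, ORDER, CANCEL, OVERVIEW, a 'do you wish to continue' choice point with 'yes' and 'no' branches, and EXIT.]

Fig. 11. Dialogue graph for an automatic book service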

system progresses through a series of states, with the transitions between states being determined by the user's responses. There are various choice points and loops, as well as subdialogues (for example, for the tasks CHECK MEMBER, ORDER, CANCEL, and OVERVIEW). Furthermore, in this particular architecture, the user can use the keywords "repeat" and "change" to request repetition of the system's output and to change a previously accepted parameter.

5.1.2 Advantages of finite state models. A major advantage of the finite state model is its simplicity. From a developer's perspective state transition networks are particularly suitable for modelling dialogue flow in a well-structured task involving information to be exchanged in a pre-determined sequence, with the system retaining control over the dialogue and deciding which question to ask next. In this way the semantics of the system is clear and intuitive. Moreover, as the user's responses are restricted, fewer technological demands are put on the system components, particularly the speech recogniser. The lack of flexibility and naturalness may be justified as a trade-off against these technological demands. For these reasons most currently available commercial systems use some form of finite-state dialogue modelling.

It is interesting to note that there is some support in empirical studies for the use of state-based dialogue control. Hone and Baber [1995] examined the relationship between dialogue control and transaction times, finding that more constrained dialogues that employed a menu-like interaction style with yes/no confirmation of all user input tended to result in dialogues with longer transaction times, as would be expected. However, this effect depended on the system's level of recognition accuracy, which was manipulated in the experiments. It was found that there was a greater likelihood of errors in the less constrained system as it permitted a larger active recognition vocabulary.

In another study two versions of a simple call assistance application were built [Potjer et al. 1996]. The system-led version used isolated word recognition and word spotting, while the mixed-initiative version used continuous speech recognition and more complex natural language processing. In the system-led version the user was prompted for the required service in two steps, while in the mixed-initiative version the user could request the service in a single utterance. The minimum number of turns per transaction was lower for the mixed-initiative system, although more additional turns were required for the mixed-initiative system on account of the greater number of recognition errors. Thus the system-led interface was not slower than its mixed-initiative counterpart. Moreover, a subjective analysis of user satisfaction indicated that users were satisfied with both versions.

Similar results were found in a study involving train timetable information, which showed that for simple services a system-driven dialogue using isolated word recognition achieved good user acceptance [Billi et al. 1996]. This finding was supported in a study of dialogue strategies comparing explicit and implicit recovery from communication breakdowns [Danieli and Gerbino 1995]. The version incorporating explicit confirmation and repair, which made greater use of isolated word recognition and spelling, was found to be robust and safe, even though it increased the number of turns required to complete the transaction. The conclusion from these studies is that system-led dialogue using state transitions would appear to be suitable for simple tasks with a flat menu structure and a small list of options, bringing also the advantage of less complex spoken language and dialogue modelling technology. The lack of flexibility and naturalness may be justified as a trade-off against these technological demands.

As mentioned earlier, state transition networks are particularly suitable for modelling dialogue flow in well-structured tasks. The automatic book service illustrated in Figure 11 is a good example. Other examples are directory assistance, questionnaires, and travel inquiries, provided the dialogue is constrained to a basic, system-led series of questions to elicit a number of well-defined responses. Consider the part of a directory inquiry dialogue in which the system elicits the name of the person to be called: here the system has to identify a unique individual, which generally requires eliciting a first and last name. This might be accomplished in a single step (Request First and Last Name) or in a series of steps (Request Surname > Request Spelling of Surname > Request First Name > Confirm First and Last Name). A finite state dialogue model could be created for this task with sub-dialogues for sub-tasks such as requesting the surname and first name. Additional states would be required for cases of multiple individuals with the same name, variations on first names, and names that are pronounced similarly (homophones) and thus require spelling to disambiguate. The main characteristic of this task is that there is a finite and clearly defined set of information items to be exchanged, the information can be elicited in a natural order, and the task may be decomposed into a hierarchy of well-ordered sub-tasks [McTear 1998]. A finite-state model could also be used for similarly structured tasks such as obtaining weather forecasts or football scores, ordering items from a catalogue, or making simple bank transactions.
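
The serial version of the name-elicitation sub-dialogue can be sketched as a small state transition table. This is an illustrative sketch only: the state names, the prompts, the `TRANSITIONS` table, and the `run_dialogue` helper are assumptions made for the example and do not come from any system described here, and user replies are supplied as a prepared list rather than as recognised speech.

```python
# Hypothetical finite-state model of the name-elicitation sub-dialogue.
# Each state maps to (system prompt, next state); the system keeps control.
TRANSITIONS = {
    "REQUEST_SURNAME": ("What is the surname?", "REQUEST_SPELLING"),
    "REQUEST_SPELLING": ("Please spell the surname.", "REQUEST_FIRST_NAME"),
    "REQUEST_FIRST_NAME": ("What is the first name?", "CONFIRM_NAME"),
}


def run_dialogue(replies):
    """Walk the states in their fixed order, consuming one reply per state.

    Returns the list of states visited and the answers collected.
    """
    replies = iter(replies)
    state, visited, answers = "REQUEST_SURNAME", [], {}
    while state != "DONE":
        visited.append(state)
        if state == "CONFIRM_NAME":
            # Assumed repair strategy: a 'no' restarts the sub-dialogue.
            state = "DONE" if next(replies) == "yes" else "REQUEST_SURNAME"
        else:
            _prompt, next_state = TRANSITIONS[state]
            answers[state] = next(replies)
            state = next_state
    return visited, answers
```

Every path the dialogue can take is written into the table in advance, which is what makes the model easy to read and also what makes it rigid.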

Dialogues for questionnaires are also highly structured even though a large number of questions may be required to elicit the required information. For questionnaires the user can be constrained through carefully designed prompts to produce an acceptable range of responses [Hansen et al. 1996]. In a large scale project involving the US Census the dialogue was implemented using a finite state network as the information had to be elicited in a fixed order, for example: Name > Gender > Birth date > Marital Status, etc., with sub-dialogues being used for the more complex items [Cole et al. 1997]. Finite-state models can be used for similar tasks such as eliciting a person's personal details for financial transactions or obtaining information for insurance quotes. The key characteristic of this class of dialogues is that they are well-structured. Even though there may be several items of information to be elicited, these can be broken down into well-structured sub-tasks that are independent of one another.

5.1.3 Disadvantages of finite state networks. Finite state dialogue models are not suitable for modelling less well-structured tasks characterised by sub-tasks whose order is difficult to predict, by information modelled at different levels of abstraction, or by complex dependencies between items of information [Kamm 1995]. A good example is the Flight Reservation System of the Danish Dialogue Project [Dybkjær et al. 1998]. Although the reservation task would appear to be well structured, as it consists of a series of ordered sub-tasks, there are complex dependencies in this system between various parameters, for example, between discounted fares and flight availability. As a result a client could opt for a discounted fare and go on to confirm several parameters only to have to backtrack to a different dialogue path because the desired departure time was not available at the discounted price. The keyword "change" can be spoken by the user of this system to correct the latest piece of information given to the system, but to correct earlier information "change" has to be used repeatedly to cause the system to backtrack sequentially until the item to be changed is reached. Thus when there are dependencies between the items of information the use of a finite-state dialogue model becomes unwieldy, leading to a combinatorial explosion of states and transitions.
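
The cost of sequential backtracking with "change" can be made concrete with a short sketch. The function name `backtrack` and the parameter names in the usage note are hypothetical, and the sketch assumes that every item passed over while backtracking is reopened and has to be given again; the source does not say how the Danish system handled the intervening items.

```python
def backtrack(confirmed, target):
    """Undo confirmed items one at a time until `target` is reopened.

    `confirmed` is the ordered list of parameters accepted so far.
    Returns (number of 'change' keywords spoken, items reopened in
    their original order).
    """
    remaining = list(confirmed)
    reopened = []
    while remaining:
        item = remaining.pop()  # each 'change' reopens the latest item
        reopened.append(item)
        if item == target:
            return len(reopened), list(reversed(reopened))
    raise ValueError(f"{target!r} has not been confirmed")
```

For example, with the items `["fare", "date", "time"]` confirmed in that order, correcting `"time"` takes one "change", whereas correcting `"fare"` takes three and reopens all three items.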

Finite state dialogue models are inflexible. This characteristic is not a problem if the interaction with the user is controlled by the system and restricted to a well-ordered sequence of questions. However, because the dialogue paths are specified in advance, there is no way of managing deviations from these paths. Problems arise if the user needs to correct an item or introduce some information that was not foreseen at the time of the design of the dialogue. Adding natural language facilities, while providing the user with greater flexibility in what they can say, can add to these problems. Taking the example of a simple travel inquiry system, a natural order for the system's questions might be: destination > origin > date > time. However, when answering the system's question concerning destination the user might reply with a destination as well as the departure time (or indeed other combinations of the four required parameters). A finite-state based system would simply progress through its set of predetermined questions, ignoring or failing to process the additional information and then asking an irrelevant question concerning the departure time.
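
The behaviour described above can be reproduced in a few lines. This is a minimal sketch under stated assumptions: `QUESTION_ORDER` follows the order given in the text, while the function `finite_state_questions` and the representation of each user turn as a dictionary of already-recognised slot values are inventions for the example.

```python
QUESTION_ORDER = ["destination", "origin", "date", "time"]


def finite_state_questions(parsed_replies):
    """Ask every question in QUESTION_ORDER, keeping only the expected slot.

    `parsed_replies` holds one dict of recognised slot values per turn.
    Returns the filled slots and the questions that were asked although
    the user had already supplied the answer in an earlier turn.
    """
    slots, mentioned, redundant = {}, set(), []
    for slot, reply in zip(QUESTION_ORDER, parsed_replies):
        if slot in mentioned:
            redundant.append(slot)
        slots[slot] = reply.get(slot)  # values for other slots are discarded
        mentioned.update(reply)
    return slots, redundant
```

If the first reply is `{"destination": "London", "time": "09:00"}`, the time is thrown away and the system still asks for it in the fourth turn, so `"time"` ends up in the list of redundant questions.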

The solution to this problem would be to include a dialogue model so that the system 'knows' what it has already elicited as well as what has still to be asked. The system could then loop through the dialogue model until all the required information has been elicited. In this way the problem of irrelevant questions would also be avoided. However, as soon as the number of items grows, the number of transitions needed to cater for each required dialogue path grows to unmanageable proportions. This problem is compounded if adequate repair mechanisms are to be included at each node for confirmation or clarification of the user's input. Thus it was estimated that in the Philips system there were about 1,000 system questions. Allowing for flexible adaptation to the user's input (for example, where a user says more than the system expected or provides an unanticipated response), and given that almost any question could follow almost any other, the network would require tens of thousands of transitions [Aust and Oerder 1995].

Dialogues involving some form of negotiation between system and user cannot be modelled using finite state methods, as the course of the dialogue cannot be determined in advance. For example, planning a journey may require the discussion of constraints that are unknown to either the system or the user at the outset. In these interactions some form of negotiation and discussion of constraints is required. For example, in the TRAINS project, to be discussed below, the user and the system collaborate to construct an agreed executable plan that has to be developed incrementally in order to incorporate new constraints that arise during the course of the dialogue [Allen et al. 1995].

5.2 Frame-based systems

Rather than build a dialogue according to a predetermined sequence of questions to be asked, a frame-based system takes the analogy of a form-filling task in which a pre-determined set of information is to be gathered. This frame (or template) fulfils the role of a dialogue model that keeps account of the items for which the system requires information. Naturally this will also involve questions, but the questions do not have to be asked in a particular sequence. For example, in the Philips train timetable system, the questions that the system might ask are listed together with their preconditions, that is, the conditions under which that question should be asked. Some questions for a travel system might be:

condition: unknown(origin) & unknown(destination)
question: "Which route do you want to travel?"

condition: unknown(origin)
question: "Where do you want to travel from?"

condition: unknown(destination)
question: "Where do you want to travel to?"
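
The three condition/question pairs above can be written down declaratively and evaluated against the current frame. This is a sketch rather than the Philips implementation: the `QUESTIONS` list, the `applicable_questions` function, and the convention that an unknown slot holds `None` are assumptions made for the example.

```python
# Each question is paired with a precondition over the current frame.
# A slot whose value is None is treated as unknown (an assumed convention).
QUESTIONS = [
    (lambda f: f["origin"] is None and f["destination"] is None,
     "Which route do you want to travel?"),
    (lambda f: f["origin"] is None,
     "Where do you want to travel from?"),
    (lambda f: f["destination"] is None,
     "Where do you want to travel to?"),
]


def applicable_questions(frame):
    """Return every question whose precondition holds for `frame`."""
    return [text for condition, text in QUESTIONS if condition(frame)]
```

With both slots unknown all three questions are candidates; once the origin is known only the destination question remains, and with both slots filled the list is empty. No ordering of questions has to be spelled out.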

Given all the questions and their preconditions, which do not need to be stated in chronological order, the dialogue control component can decide the next question to be asked based on those questions whose preconditions are true. If several questions can be asked at a particular stage in the dialogue, other factors can be used to choose a question to be asked. For example, in the Philips SpeechMania system, each dialogue action (including the questions) is coded with a keyword that determines the dialogue action's priority and thus the dialogue flow. Some examples of these keywords in their default order of priority are:

ONCE: for an action that has not yet been executed, for example, the initial greeting

MULTIPLE: if more than one value has been returned for a variable, so that ambiguity resolution is required

VERIFIABLE: to be used if a value has not been confirmed by the user

UNDEFINED: to be used when no value has been defined for a variable and a question is required to elicit the value from the user.

Given this priority mechanism, problems relating to what is ambiguous are resolved before attempts to verify a value, which are in turn resolved before questions for new values. Thus a sequence of questions evolves based on the current context of the system (what has been asked so far, what information is ambiguous, what has to be confirmed), without having to specify predetermined paths through a dialogue network.
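
A priority mechanism of this kind amounts to picking, among the actions whose preconditions currently hold, the one with the highest-ranked keyword. The sketch below uses the four keywords and the order given in the text; the `PRIORITY` list, the `next_action` function, and the action descriptions are hypothetical and are not SpeechMania syntax.

```python
# Keywords in their default order of priority, highest first.
PRIORITY = ["ONCE", "MULTIPLE", "VERIFIABLE", "UNDEFINED"]


def next_action(candidates):
    """Pick the pending action whose keyword has the highest priority.

    `candidates` is a list of (keyword, action) pairs whose preconditions
    currently hold. Returns None when nothing is pending.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda pair: PRIORITY.index(pair[0]))[1]
```

Given one pending action of each of the last three kinds, the ambiguity-resolving (MULTIPLE) action is chosen first, then the verification, then the question for a new value.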

A similar mechanism has been used in the Communicator system developed at the University of Colorado, Boulder [Ward and Pellom 1999]. This system obtains information from the Internet on airline flights, hotels, and rental cars. The dialogue control is described as 'event driven', meaning that the dialogue manager decides what to do next based on the current system context rather than a predetermined script. In this case the context consists of the semantic content of the user's input together with a template of slots to be filled. On assimilating the user's parsed utterance into the dialogue context, the system decides on its next action according to a set of priorities similar to those used in the Philips system:

- Clarify if necessary

- Finish if all done

- Retrieve data and present to user

- Prompt user for required information
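
The four priorities above can be read as a fixed cascade over the dialogue context. The following is only a sketch of that reading: the function `next_system_action`, the layout of the `context` dictionary, and the interpretation of "all done" as "results have already been presented" are assumptions, not details taken from the Colorado system.

```python
def next_system_action(context):
    """Choose the next action from the dialogue context by fixed priority.

    `context` is a dict with hypothetical keys:
      ambiguous - slot names whose values need clarification
      required  - slot names needed to build the database query
      slots     - values gathered so far (missing or None means unfilled)
      presented - True once results have been shown to the user
    """
    if context["ambiguous"]:
        return ("clarify", context["ambiguous"][0])
    if context["presented"]:
        return ("finish", None)
    missing = [s for s in context["required"] if context["slots"].get(s) is None]
    if not missing:
        return ("retrieve_and_present", None)
    return ("prompt", missing[0])
```

Because the decision is recomputed from the context after every user turn, the order in which the user supplies information does not have to be anticipated.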

A variation on frames is the use of a form consisting of a number of slots for the relevant attributes in the domain. Dahlback and Jonsson [1999] describe their use of information specification forms for a bus timetable information system. Forms are also used as the main dialogue items in VoiceXML documents. A form in VoiceXML consists of field and control items. A field gathers information from the user using speech or DTMF input while control items involve sequences of procedural statements for prompting and computation. The Form Interpretation Algorithm determines which items in a form to visit depending on the status of their guard condition. Thus a field is visited only while its field variable has the value undefined. In a directed form the form items are executed once in a sequential order, resulting in a rigid, system-directed dialogue. A mixed-initiative form, combined with a grammar, enables the user to input all the required items in one utterance, giving a more flexible dialogue.
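
The role of the guard condition can be illustrated with a much simplified sketch of the item-selection step. This is not the Form Interpretation Algorithm as specified for VoiceXML, which also covers prompting, event handling, and grammar activation; the functions `select_next_item` and `run_form`, the use of `None` for "undefined", and the representation of a reply as a dictionary of recognised field values are all assumptions made for the example.

```python
def select_next_item(form_items, variables):
    """Return the name of the first form item whose guard condition holds.

    `form_items` is a list of (name, cond) pairs in document order. An item
    is eligible while its variable is undefined (None here) and its optional
    `cond` predicate over the variables is true.
    """
    for name, cond in form_items:
        if variables.get(name) is None and (cond is None or cond(variables)):
            return name
    return None  # nothing left to visit, so the form is exited


def run_form(form_items, replies, mixed_initiative=False):
    """Visit items until none is eligible; return (variables, visit order).

    A directed form keeps only the value of the field being visited; a
    mixed-initiative form accepts every field value found in the reply.
    """
    variables, visited = {}, []
    replies = iter(replies)
    while (name := select_next_item(form_items, variables)) is not None:
        visited.append(name)
        reply = next(replies)
        if mixed_initiative:
            variables.update(reply)
        else:
            variables[name] = reply.get(name)
    return variables, visited
```

With three fields, the directed form visits each field in turn, whereas the mixed-initiative form can be completed in a single visit if one reply carries all three values, because the remaining guard conditions are then false.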

Goddeau et al. [1996] discuss a more complex type of form, the E-form (electronic form), which has been used in a spoken language interface to a database of classified advertisements for used cars. E-forms differ from the types of form and frame described so far, in that the slots may have different priorities for different users: for example, for some users the colour of a car may be more critical than the model or mileage. Furthermore, information in slots can be related: for example, a more recent model usually costs more. The E-form allows users to explore multiple combinations to find the car that best suits their preferences. Thus the selection of an appropriate car is viewed as an optimisation task which involves more than the retrieval of a set of records from a database. However, it is the user who performs this optimisation, whereas in a problem-solving system the optimisation would be performed by the system or, ideally, as a result of a negotiation dialogue between system and user. The E-form is used to determine the system's next response, which is based on the current status of the E-form, the most recent system prompt, and the number of items returned from the database:

- If no records found, ask user to be more general

- If fewer than 5 records found, consider the search complete and generate a response that outputs the retrieved records

- Otherwise cycle through an ordered list of prompts choosing the first prompt whose slot in the E-form is empty

- If too many records have been found and all the prompt fields have been filled, ask the user to be more specific
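
The four rules above translate directly into a short decision function. This is a sketch only: the function `eform_response`, its argument names, and the returned strings are hypothetical, and the role of the most recent system prompt, which the text also mentions, is left out.

```python
def eform_response(num_records, eform, prompt_order):
    """Choose the system's next response from the database result size
    and the current contents of the E-form.

    `eform` maps slot names to values (missing or None means empty);
    `prompt_order` is the ordered list of slots the system may prompt for.
    """
    if num_records == 0:
        return "ask user to be more general"
    if num_records < 5:
        return "present records"
    for slot in prompt_order:
        if eform.get(slot) is None:
            return f"prompt for {slot}"
    # Too many records and every prompt slot is already filled.
    return "ask user to be more specific"
```

For instance, with 40 matching records and only the make filled in, the function returns a prompt for the next empty slot in `prompt_order`; once every slot is filled and the result set is still large, it asks the user to be more specific.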

Other data structures that can be used to control the dialogue are schemas, task structure graphs, and type hierarchies. Schemas are used in the Carnegie Mellon Communicator system to model more complex tasks than the basic information retrieval tasks that use forms [Constantinides et al. 1998; Rudnicky et al. 1999]. A schema is a strategy for completing a goal in a task-based dialogue, such as determining an itinerary. The itinerary is represented as a hierarchical data structure that is constructed interactively over the course of the dialogue. At the same time the nodes in the tree are filled with specific information about the trip. While there is a default sequence of actions to populate the tree that is maintained as a stack-based agenda, the user and the system can both control this ordering and cause the focus of the dialogue to shift (for example: "let's talk about the first leg (of the itinerary) again"). Task structure graphs provide a similar semantic structure to the E-form and are used to determine the behaviour of the dialogue control module as well as the language understanding module [Wright et al. 1998]. The graph depicts relationships between the elements of a customer-services application and is used to provide a contextual interpretation of spoken utterances in a dialogue. Similarly, type hierarchies can be used to model the domain of a dialogue and as a basis for clarification questions [Denecke and Waibel 1997]. Given that information in a type hierarchy can be missing or underspecified, clarification requests are generated to enable the user to achieve their communicative goal.

In summary: there are a number of different types of data structure, such as the frame, E-form, schema, task structure graph, and type hierarchy, that can be used to model the structure of the information required by the user and to determine the actions to be taken by the dialogue system to obtain this information.

5.2.1 Advantages of frame-based systems. The frame-based approach has several advantages over the finite-state based approach, for the user as well as the developer. As far as the user is concerned, there is greater flexibility. For example, there is some evidence that it can be difficult to constrain users to the responses required by the system, even when the system prompts have been carefully designed to do just that [Eckert et al. 1995]. The ability to use natural language and the use of multiple slot filling enable the system to process the user's over-informative answers and corrections. In this way the transaction time for the dialogue can be reduced, resulting in a more efficient and more natural dialogue flow. The frame-based system fulfilled a number of dialogue design requirements identified by the Philips dialogue team, including the following:

- there should not be a rigid question-answer scheme to obtain the required values;

- no more questions than necessary should be asked;

- no more confirmation than necessary should be required;

- information given by the caller, prior to the system asking for it, should be used [Philips Speech Processing 1997].

Similarly, the Communicator system developed at the University of Colorado [Ward and Pellom 1999] enables a mixed initiative dialogue in which the user can take control. This degree of user control is greater than in the Philips system, where the system has control of the dialogue flow but the user can insert corrections to items that the system has misrecognised or misunderstood. In the Communicator system the user can respond with anything to the system's question, i.e. not necessarily the answer to the question. The system will parse the utterance and decide whether and how to respond to it, putting on hold prompts for any additional missing information that is required by the system to build a database query.

From a developer's perspective, implementing this degree of flexibility in a graph-based system becomes cumbersome, if not impossible. A large number of states and transitions are required to deal with the number of different paths that dialogues might take. A frame-based system can be specified declaratively with the system's questions listed as in a rule-based expert system (or production system).

5.2.2 Disadvantages of frame-based systems. Finite state-based and frame-based approaches are appropriate for well-defined tasks in which the system takes the initiative in the dialogue and elicits information from the user to complete a task,
