8. FUTURE DIRECTIONS

There are a number of ways in which spoken dialogue technology may develop over the next decade. As far as research is concerned, there are several initiatives aiming at conversational systems that support more natural mixed-initiative dialogues. The focus of much of this work is on working systems rather than on the development of the individual components. Thus, instead of concentrating, for example, on the development of a sophisticated natural language understanding component in isolation, research is likely to be directed towards the ways in which such a component can be integrated with the other components of a spoken dialogue system, and on how it can be deployed in real-world applications. Measurement of the performance of such components will be in terms of their contribution to the performance of the complete system.

Studies of human-human dialogue have provided useful insights into how more sophisticated dialogue systems might behave and have resulted in theories of co-operative interaction that inform the design and evaluation of interactive speech systems [Bernsen et al. 1998] as well as models of conversational agency [Traum 1996], which integrate AI work on planning with speech act theory. As speech recognition and natural language understanding become more robust, more sophisticated dialogue managers of the kind that have been developed in text-based systems, for example, [Carberry and Lambert 1999], will emerge. Jurafsky and Martin [2000] (chapter 19) provide a recent review of the theoretical background as well as recent developments in dialogue and conversational agency. Another trend is towards the use of statistical techniques in dialogue management. For example, dialogue sequences have been modelled as dialogue-act N-grams in order to help predict upcoming dialogue acts [Nagata and Morimoto 1994]. Probabilistic methods are being used in conjunction with reinforcement learning algorithms to enable the automated learning of optimal dialogue strategies. Dialogue is modelled as a Markov decision process (MDP) and viewed as a trajectory in a state space determined by system actions and user responses. Given multiple action choices at each state, reinforcement learning is used to explore the choices systematically and to compute the best policy for action selection based on rewards associated with each state transition [Litman et al. 2000].
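The MDP view of dialogue can be made concrete with a small sketch. The states, actions, probabilities and rewards below are hypothetical values chosen for illustration, and the solver is plain value iteration over a known model rather than learning from exploratory dialogues, so this should not be read as the setup of the cited work.

```python
# Hypothetical toy dialogue MDP (all numbers are illustrative assumptions).
# state -> action -> list of (probability, next_state, reward)
MDP = {
    "start": {
        "open_prompt": [(0.5, "filled", -1.0), (0.5, "start", -1.0)],
        "directed_prompt": [(0.8, "filled", -1.0), (0.2, "start", -1.0)],
    },
    "filled": {
        "confirm": [(1.0, "end", 9.0)],
        "skip_confirm": [(0.7, "end", 10.0), (0.3, "end", -10.0)],
    },
    "end": {},  # terminal state: no actions available
}
GAMMA = 0.95  # discount factor (assumed)


def q_value(values, outcomes):
    """Expected discounted return of one action, given current state values."""
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in outcomes)


def value_iteration(mdp, tol=1e-9):
    """Sweep over the states until the state values stop changing."""
    values = {state: 0.0 for state in mdp}
    while True:
        delta = 0.0
        for state, actions in mdp.items():
            if not actions:  # terminal state keeps value 0
                continue
            best = max(q_value(values, out) for out in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


def greedy_policy(mdp, values):
    """Pick, in every non-terminal state, the action with the highest value."""
    return {
        state: max(actions, key=lambda a: q_value(values, actions[a]))
        for state, actions in mdp.items()
        if actions
    }


values = value_iteration(MDP)
policy = greedy_policy(MDP, values)
print(policy)
```

With these assumed numbers the computed policy prefers the directed prompt and an explicit confirmation. Changing the recognition probabilities or the cost of a turn shifts that balance, which is the trade-off a learned dialogue strategy is meant to capture.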

While most of this survey has been concerned with dialogue systems that provide a spoken language interface, there has been a recent development towards the integration of spoken language technology with other modalities [Cohen and Oviatt 1995]. In the TRAINS project, for example, the system displays a map of the area under discussion with the route being planned marked and highlighted. Some of the travel information systems, such as the ATIS (Air Travel Information System) in the United States and the EU Esprit MASK project, involve multimodal interaction. For example, the MASK system is planned as a multimodal, multimedia service kiosk to be located in train stations, with the user being able to speak to the system as well as using a touch screen and keypad, and the system displaying information to the user on a screen [Lamel et al. 1995]. The selection and co-ordination of different media in relation to different types of content to be displayed and the varying needs of the user and the task have been the subject of much research (see, for example, the papers in Maybury [1993]). Although this work is still in its infancy and many of the solutions adopted tend to be ad hoc and application-specific, there has been some progress towards a general theory of input and output modalities and of how speech might be integrated within a multimodal context [Bernsen 1994].

Another important application area is the World Wide Web. With the increasing integration of the Internet and domestic television, there is a potential for applications using spoken dialogue technology to perform services such as home shopping, or to control and program appliances around the home such as microwave ovens and VCRs. These needs are being addressed through VoiceXML (Voice eXtensible Markup Language), an XML-based mark-up language for creating distributed voice applications that feature synthesised speech, digitised audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative dialogues [VoiceXML Forum nd]. VoiceXML provides an open environment with standardised dialogue scripting and speech grammar formats. Furthermore, because it is based on XML, a vast selection of editing and parsing tools is available, including both commercial and freely available open-source tools. A VoiceXML document specifies each interaction dialogue to be conducted by a VoiceXML interpreter. A VoiceXML document forms a conversational finite state machine, with some degree of mixed initiative that allows users in a limited way to input more than one value in a particular dialogue state. VoiceXML has been accepted as a standard for Web-based spoken dialogue systems and VoiceXML servers may well replace current proprietary development platforms for spoken dialogue systems. Further work will be required to integrate the more complex functionalities described in this survey into the next versions of the standard.
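The idea of a VoiceXML document as a form-based finite state machine with limited mixed initiative can be sketched as follows. The embedded document is a hypothetical fragment that uses only the form, field and prompt elements, with grammars and the XML namespace omitted for brevity, and the interpreter loop is a heavily simplified stand-in for a real VoiceXML interpreter. User turns are supplied as already-recognised slot values, which is an assumption made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical VoiceXML-style fragment (grammars and namespace omitted).
VXML = """<vxml version="2.0">
  <form id="journey">
    <field name="origin"><prompt>Where are you leaving from?</prompt></field>
    <field name="destination"><prompt>Where are you going to?</prompt></field>
  </form>
</vxml>"""


def load_fields(vxml_text):
    """Return (field name, prompt text) pairs in document order."""
    root = ET.fromstring(vxml_text)
    return [
        (field.get("name"), field.findtext("prompt").strip())
        for field in root.iter("field")
    ]


def run_form(fields, user_turns):
    """Prompt for the first unfilled field; a single turn may fill several."""
    slots = {name: None for name, _ in fields}
    transcript = []
    turns = iter(user_turns)
    while any(value is None for value in slots.values()):
        _, prompt = next((n, p) for n, p in fields if slots[n] is None)
        transcript.append(prompt)
        answer = next(turns, None)  # dict of slot values heard in this turn
        if answer is None:  # no more simulated user input
            break
        for slot, value in answer.items():
            if slot in slots:
                slots[slot] = value
    return slots, transcript


fields = load_fields(VXML)
# System-directed exchange: one value per turn, so two prompts are needed.
print(run_form(fields, [{"origin": "Boston"}, {"destination": "Denver"}]))
# Mixed initiative: the user supplies both values after the first prompt.
print(run_form(fields, [{"origin": "Boston", "destination": "Denver"}]))
```

In the second call the form is completed after a single prompt, which corresponds to the limited mixed initiative described above: the dialogue state stays the same, but the user may fill more than one field in it.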

Bringing these points together, some of the issues that are likely to be important in spoken dialogue research in the next decade are:

- more robust speech recognition, including the ability to perform well in noisy conditions, to deal with out-of-vocabulary words, and to integrate more closely with technologies for natural language processing;
- the use of prosody in spoken dialogue systems, both to provide more naturally sounding output and to assist recognition by identifying phrase boundaries as well as the functions of utterances;
- research concerned with component integration and with investigating the extent to which the language understanding and dialogue management components can compensate for deficiencies in speech recognition;
- investigations of the applicability of different technologies for particular application types, such as the costs and benefits of parsing using theoretically motivated grammars compared with robust and partial parsing and with more pragmatically driven methods such as concept spotting;
- studies of different approaches to dialogue management in relation to the requirements of an application, indicating, for example, where state-based methods are applicable and in which circumstances more complex approaches are required;
- the incorporation of more sophisticated approaches to dialogue management deriving from AI-based research;
- research into the use of stochastic and machine learning techniques;
- the development of multimodal dialogue systems;
- dialogue systems with Web integration.
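To illustrate what a pragmatically driven method such as concept spotting involves, the following minimal sketch extracts task-relevant concepts from an utterance and ignores everything else, with no grammar of the whole sentence. The city list, the concept names and the patterns are hypothetical and cover only a toy travel domain.

```python
import re

# Hypothetical vocabulary for a toy travel domain.
CITIES = "boston|denver|chicago"
DAYS = "monday|tuesday|wednesday|thursday|friday|saturday|sunday"

# One pattern per concept; anything that matches no pattern is ignored.
CONCEPTS = {
    "origin": re.compile(rf"\bfrom ({CITIES})\b"),
    "destination": re.compile(rf"\bto ({CITIES})\b"),
    "day": re.compile(rf"\b({DAYS})\b"),
}


def spot_concepts(utterance):
    """Return the concepts found in the utterance, skipping unmatched words."""
    text = utterance.lower()
    found = {}
    for concept, pattern in CONCEPTS.items():
        match = pattern.search(text)
        if match:
            found[concept] = match.group(1)
    return found


print(spot_concepts("uh I want to um fly from Boston to Denver on Friday please"))
```

The hesitations and the phrase "want to um" are skipped because they match no concept pattern. The same property makes the approach tolerant of recognition errors, at the cost of ignoring structure that a full grammar would capture.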

It is unlikely that all of these issues will be addressed in commercial systems in the short term, although there is considerable interest in the commercial potential of voice commerce, involving the integration of spoken language and Internet technologies. In general, however, the emphasis in dialogue research is on developing more advanced systems and on testing theories of dialogue, while in the commercial environment the aim is to produce systems that will work in the real world. Here the performance of the system is measured not in terms of the evaluation measures applied to a laboratory prototype but in terms of its efficiency, effectiveness, usability and acceptability under real-world conditions. Factors that determine the successful deployment of a system include marketability, profitability, and user acceptance, for which considerable effort has to be directed towards managing user expectations in respect of the constraints of the technology and convincing users of the benefits of the technology.

In conclusion, as can be seen from this survey, there has been a dramatic increase in interest in spoken dialogue systems over the past decade, and there is every indication that this interest will continue, given that there are still many problems to be resolved and given the obvious benefits of the technology.
