HDDL
8. FUTURE DIRECTIONS
There are a number of ways in which spoken dialogue technology may develop over
the next decade. As far as research is concerned, there are several initiatives
aiming at conversational systems that support more natural mixed-initiative
dialogues. The focus of much of this work is on working systems rather than on
the development of the individual components. Thus instead of concentrating, for
example, on the development of a sophisticated natural language understanding
component in isolation, research is likely to be directed towards the ways in
which such a component can be integrated with the other components of a spoken
dialogue system, and on how it can be deployed in real-world applications.
Measurement of the performance of such components will be in terms of their
contribution to the performance of the complete system.
Studies of human-human dialogue have provided useful insights into how more
sophisticated dialogue systems might behave and have resulted in theories of
co-operative interaction that inform the design and evaluation of interactive
speech systems [Bernsen et al. 1998] as well as models of conversational agency
[Traum 1996], which integrate AI work on planning with speech act theory. As
speech recognition and natural language understanding become more robust, more
sophisticated dialogue managers that have been developed in text-based systems,
for example, [Carberry and Lambert 1999], will emerge. Jurafsky and Martin
[2000] (chapter 19) is a recent review of the theoretical background as well as
recent developments in dialogue and conversational agency. Another trend is
towards the use of statistical techniques in dialogue management. For example,
dialogue sequences have been modelled as dialogue-act N-grams in order to help
predict upcoming dialogue acts [Nagata and Morimoto 1994]. Probabilistic methods
are being used in conjunction with reinforcement learning algorithms to enable
the automated learning of optimal dialogue strategies. Dialogue is modeled as a
Markov decision process (MDP) and viewed as a trajectory in a state space
determined by system actions and user responses. Given multiple action choices
at each state, reinforcement learning is used to explore the choices
systematically and to compute the best policy for action selection based on
rewards associated with each state transition [Litman et al. 2000].
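
The MDP view of dialogue can be made concrete with a small sketch. The example
below is not the method of Litman et al.; it is an illustrative toy, written in
Python, in which the states, actions, rewards, the simulated user and the
recognition accuracy P_CORRECT are all assumptions invented for the purpose.
Tabular Q-learning with epsilon-greedy exploration is used as one common
reinforcement learning algorithm, and the step size is set to 1/n (n being the
visit count) so that noisy rewards are averaged out.

```python
import random

# Hypothetical toy MDP: fill one slot, optionally confirm it, then close.
ACTIONS = {
    "empty": ["ask", "close"],        # no value obtained yet
    "filled": ["confirm", "close"],   # value recognised but not verified
    "verified": ["close"],            # value confirmed by the user
}
P_CORRECT = 0.7  # assumed chance that a recognised value is correct


def step(state, action, rng):
    """Simulated user. Returns (next_state, reward); next_state None ends the dialogue."""
    if action == "ask":        # only offered in "empty"; each turn costs 1
        return "filled", -1.0
    if action == "confirm":    # only offered in "filled"; a wrong value is discarded
        return ("verified" if rng.random() < P_CORRECT else "empty"), -1.0
    # action == "close": reward depends on whether the task succeeded
    if state == "verified":
        return None, 10.0
    if state == "filled":
        return None, (10.0 if rng.random() < P_CORRECT else -10.0)
    return None, -10.0         # closed without obtaining any value


def q_learning(episodes=5000, epsilon=0.2, gamma=1.0, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration and 1/n step sizes."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, acts in ACTIONS.items() for a in acts}
    visits = dict.fromkeys(q, 0)
    for _ in range(episodes):
        state = "empty"
        for _ in range(50):    # safety cap on dialogue length
            acts = ACTIONS[state]
            if rng.random() < epsilon:
                action = rng.choice(acts)                        # explore
            else:
                action = max(acts, key=lambda a: q[(state, a)])  # exploit
            nxt, reward = step(state, action, rng)
            future = 0.0 if nxt is None else max(q[(nxt, a)] for a in ACTIONS[nxt])
            visits[(state, action)] += 1
            alpha = 1.0 / visits[(state, action)]
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            if nxt is None:
                break
            state = nxt
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return {s: max(acts, key=lambda a: q[(s, a)]) for s, acts in ACTIONS.items()}
```

Under these assumed rewards, closing on an unverified value has an expected
reward of 4, whereas confirming first costs a turn but leads to a certain
reward of 10, so the learned policy should prefer to ask, then confirm, then
close. Changing P_CORRECT or the turn cost shifts that trade-off, which is the
kind of strategy choice the MDP formulation is meant to expose.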
While most of this survey has been concerned with dialogue systems that provide
a spoken language interface, there has been a recent development towards the
integration of spoken language technology with other modalities [Cohen and
Oviatt 1995]. In the TRAINS project, for example, the system displays a map of
the area under discussion with the route being planned marked and highlighted.
Some of the travel information systems, such as the ATIS (Air Travel Information
System) in the United States and the EU Esprit MASK project, involve multimodal
interaction. For example, the MASK system is planned as a multimodal, multimedia
service kiosk to be located in train stations, with the user being able to speak
to the system as well as using a touch screen and keypad, and the system
displaying information to the user on a screen [Lamel et al. 1995]. The
selection and co-ordination of different media in relation to different types of
content to be displayed and the varying needs of the user and the task have been
the subject of much research (see, for example, the papers in Maybury [1993]).
Although this work is still in its infancy and many of the solutions adopted
tend to be ad-hoc and application-specific, there has been some progress towards
a general theory of input and output modalities and of how speech might be
integrated within a multimodal context [Bernsen 1994].
Another important application area is the World Wide Web. With the increasing
integration of the Internet and domestic television, there is a potential for
applications using spoken dialogue technology to perform services such as home
shopping, or to control and program appliances around the home such as microwave
ovens and VCRs. These needs are being addressed through VoiceXML (Voice
eXtensible Markup Language), an XML-based mark-up language for creating
distributed voice applications that feature synthesised speech, digitised audio,
recognition of spoken and DTMF key input, recording of spoken input, telephony,
and mixed-initiative dialogues [VoiceXML Forum nd]. VoiceXML provides an open
environment with standardised dialogue scripting and speech grammar formats.
Furthermore, because it is based on XML, a vast selection of editing and parsing
tools is available, including both commercial and freely available open-source
tools. A VoiceXML document specifies each interaction dialogue to be conducted
by a VoiceXML interpreter. A VoiceXML document forms a conversational finite
state machine, with some degree of mixed initiative that allows users in a
limited way to input more than one value in a particular dialogue state.
VoiceXML has been accepted as a standard for Web-based spoken dialogue systems
and VoiceXML servers may well replace current proprietary development platforms
for spoken dialogue systems. Further work will be required to integrate the more
complex functionalities described in this survey into the next versions of the
standard.
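
The idea of a conversational finite state machine with limited mixed initiative
can be sketched as follows. This is neither VoiceXML markup nor a VoiceXML
interpreter; it is a small Python illustration of the behaviour described
above, and the slot names, prompts and keyword patterns (standing in for real
speech grammars) are all hypothetical. Each slot acts as one dialogue state,
visited in a fixed order, and a single user turn may fill more than one slot,
in which case the states for the slots already filled are skipped.

```python
import re

# Hypothetical slots for a travel enquiry; each slot is one dialogue state.
SLOTS = ["origin", "destination", "day"]
PROMPTS = {
    "origin": "Where are you travelling from?",
    "destination": "Where are you travelling to?",
    "day": "On which day?",
}
CITIES = "belfast|london|dublin"
DAYS = "monday|tuesday|wednesday|thursday|friday|saturday|sunday"
# Toy keyword patterns standing in for real speech grammars.
PATTERNS = {
    "origin": re.compile(rf"\bfrom ({CITIES})\b"),
    "destination": re.compile(rf"\bto ({CITIES})\b"),
    "day": re.compile(rf"\bon ({DAYS})\b"),
}


def interpret(utterance):
    """Return every slot value the toy patterns can find in one utterance."""
    text = utterance.lower()
    found = {}
    for slot, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[slot] = match.group(1)
    return found


def run_dialogue(user_turns):
    """Visit the states in a fixed order, skipping slots that are already filled.

    Returns the filled slots and the list of prompts the system issued.
    """
    filled = {}
    prompts_issued = []
    turns = iter(user_turns)
    for slot in SLOTS:
        while slot not in filled:          # re-prompt until this state is satisfied
            prompts_issued.append(PROMPTS[slot])
            utterance = next(turns, None)
            if utterance is None:          # the user stopped responding
                return filled, prompts_issued
            filled.update(interpret(utterance))  # may fill several slots at once
    return filled, prompts_issued
```

For instance, run_dialogue(["I want to travel from Belfast to London", "on
Friday"]) fills all three slots after only two prompts, because the first
answer supplies both the origin and the destination. The system still decides
the order of the states, which is why the initiative is mixed only in a limited
way.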
Bringing these points together, some of the issues that are likely to be
important in spoken dialogue research in the next decade are:
- more robust speech recognition, including the ability to perform well in noisy
  conditions, to deal with out-of-vocabulary words, and to integrate more
  closely with technologies for natural language processing;
- the use of prosody in spoken dialogue systems, both to provide more naturally
  sounding output and to assist recognition by identifying phrase boundaries as
  well as the functions of utterances;
- research concerned with component integration and with investigating the
  extent to which the language understanding and dialogue management components
  can compensate for deficiencies in speech recognition;
- investigations of the applicability of different technologies for particular
  application types, such as the costs and benefits of parsing using
  theoretically motivated grammars compared with robust and partial parsing and
  with more pragmatically driven methods such as concept spotting;
- studies of different approaches to dialogue management in relation to the
  requirements of an application, indicating, for example, where state-based
  methods are applicable and in which circumstances more complex approaches are
  required;
- the incorporation of more sophisticated approaches to dialogue management
  deriving from AI-based research;
- research into the use of stochastic and machine learning techniques;
- the development of multi-modal dialogue systems;
- dialogue systems with Web integration.
It is unlikely that all of these issues will be addressed in commercial systems
in the short term, although there is considerable interest in the commercial
potential of voice commerce, involving the integration of spoken language and
Internet technologies. In general, however, the emphasis in dialogue research is
on developing more advanced systems and on testing theories of dialogue, while
in the commercial environment the aim is to produce systems that will work in
the real world. Here the performance of the system is measured not in terms of
the evaluation measures applied to a laboratory prototype but in terms of its
efficiency, effectiveness, usability and acceptability under real-world
conditions. Factors that determine the successful deployment of a system include
marketability, profitability, and user acceptance, for which considerable effort
has to be directed towards managing user expectations in respect of the
constraints of the technology and convincing users of the benefits of the
technology.
In conclusion, as can be seen from this survey, there has been a dramatic
increase in interest in spoken dialogue systems over the past decade, and there
is every indication that this interest will continue, given that there are still
many problems to be resolved and given the obvious benefits of the technology.
APPENDIX