In document Pay a bill (pages 40-48)

Fig. 9. The architecture of the Circuit-Fix-It Shop system (component labels recoverable from the figure: General Reasoning (Theorem Proving); Linguistic Interface Knowledge)

actions and how theorems are used to prove goal completion. Among the other types of knowledge included in this component are general dialogue knowledge about the linguistic realisations of task expectations and knowledge about the user that is acquired during the course of the dialogue.

Problem solving in the Circuit-Fix-It Shop system involves co-operating with the user to solve a specific goal, such as how to repair a particular circuit. Problem solving is achieved through communication between the system and the user to establish what actions have to be carried out and what the current state of the task might be. The components described in this section support the system in its reasoning about the steps required to complete a task, in deciding what information to communicate to the user, and in the integration of information provided by the user into the system's model of the current state of the task.

4.4.3 Communication with a planning system. Problem solving can also be achieved through the use of a planning system that supports reasoning about goals, plans and actions. While the task structures used in the Circuit-Fix-It Shop system involve task decomposition into sub-tasks and subsequently into primitive actions to be carried out, the problem-solving mechanisms are different from those that are used in conventional planning systems. The Circuit-Fix-It Shop system is presented with an explicit goal at the beginning of the dialogue and its task is to collaborate with the user in proving the goal in much the same way as a theorem is proved. Planning systems incorporate further complications in that often the system has to infer the user's goals from statements or actions that may not explicitly represent the goals (plan recognition). Planning systems typically include an explicit representation of beliefs, desires and intentions that are reasoned about during the course of the problem solving. These elements are assumed implicitly in the Circuit-Fix-It Shop system.

The TRAINS project [Allen et al. 1995] is concerned with the integration of natural language dialogue and plan reasoning to support collaborative problem solving. The purpose of the dialogue is to negotiate and develop a plan. The speech acts that comprise the dialogue are motivated by reasoning about the plan and are at the same time interpreted in the light of the current plan. Figure 10 provides a simplified view of the components of the TRAINS system.

Fig. 10. The TRAINS architecture (labels recoverable from the figure: Manager's utterance; Parser; Dialogue Manager; Domain plan reasoner; Execution planning and monitoring; Simulated TRAINS world; Natural language generator; System's utterance)

Plan reasoning in the TRAINS system involves two algorithms: the incorporation algorithm and the elaboration algorithm. The incorporation algorithm is concerned essentially with plan recognition, i.e. with finding causal and motivational connections between potential interpretations of the current utterance and the current plan. The algorithm searches through a space of plan graphs with nodes representing events and states, and links representing relations between events and states such as enablement, effect, generation and justification. The elaboration algorithm supports the system's construction of a plan using means-ends planning. If the user encounters some choice that requires confirmation, for example, an element in the plan that is ambiguous, the system generates an utterance to request confirmation.
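The idea of searching a plan graph for a connection between an interpreted utterance and the current plan can be illustrated with a minimal sketch. This is not the TRAINS incorporation algorithm: the class name `PlanGraph`, the breadth-first strategy, the depth bound and the node names are all assumptions made for illustration. Only the four relation labels come from the description above.

```python
from collections import deque

# The four link types named in the text; everything else is hypothetical.
RELATIONS = {"enablement", "effect", "generation", "justification"}


class PlanGraph:
    """Nodes are events or states; links are labelled with a relation."""

    def __init__(self):
        self.links = {}  # node -> list of (relation, neighbour)

    def add_link(self, source, relation, target):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.links.setdefault(source, []).append((relation, target))
        self.links.setdefault(target, [])

    def find_connection(self, start, plan_nodes, max_depth=4):
        """Breadth-first search (forward links only) from an interpreted
        utterance node to any node already in the plan.

        Returns the chain of (source, relation, target) links, or None
        if no connection is found within max_depth links.
        """
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            node, path = queue.popleft()
            if node in plan_nodes:
                return path
            if len(path) >= max_depth:
                continue
            for relation, neighbour in self.links.get(node, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    step = (node, relation, neighbour)
                    queue.append((neighbour, path + [step]))
        return None
```

A result of `None` corresponds to the failure case in which no connection can be found, at which point a system of this kind would look for an alternative interpretation or ask for clarification.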

4.4.4 Summary. This discussion of the role of the external communication component in a spoken dialogue system has shown how an integrated system architecture, as illustrated in Figure 7, is required in order to support interaction between the dialogue management component and the other system components. In addition to the problem of determining whether sufficient information has been elicited from the user to provide input to the external application, as discussed in section 4.3, obtaining the required information from the external source is not necessarily a straightforward task and complex interactions may be required involving mediations between the dialogue manager and the user. In the case of a database query the requested information may not be available in the form that was requested so that a reformulated query is required. In a plan reasoning application such as TRAINS the plan reasoner may fail to find a connection between an event, goal or fact inferred from the user's utterance and a node in the plan graph, in which case it could be assumed that the user's utterance had been misinterpreted and the language understanding component would be required to search for an alternative interpretation, failing which the system would request clarification or repair. Thus the interpretation and resolution of the user's query may involve complex interaction with the external source before the system can report a result back to the user.

4.5 Response generation

Assuming that the requested information has been retrieved from the external source, the response generation component now has to construct the message that is to be sent to the speech output component to be spoken to the user. Broadly speaking, the construction of the message consists of three decisions involving:

(1) what information should be included;
(2) how the information should be structured;
(3) the form of the message - for example, the choice of words and syntactic structure.

Response generation can be achieved using simple methods, such as the insertion of the retrieved data into pre-defined slots in a template. On the other hand, complex methods using natural language generation techniques may be used, although generally these more complex methods have only been applied in research prototype systems.
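The simple template method can be sketched in a few lines. The template wording, the slot names (`destination`, `time`, `platform`) and the function `generate_response` are hypothetical; the sketch only assumes that the retrieved data arrives as a mapping from slot names to values.

```python
from string import Template

# Hypothetical template with three pre-defined slots.
DEPARTURE_TEMPLATE = Template(
    "The next train to $destination leaves at $time from platform $platform."
)


def generate_response(template, record):
    """Insert retrieved data into the pre-defined slots of a template."""
    try:
        return template.substitute(record)
    except KeyError as missing:
        # A slot with no retrieved value cannot be spoken sensibly.
        raise ValueError(f"no value retrieved for slot {missing}") from None
```

Raising an error for an unfilled slot, rather than speaking a partial message, is a design choice of this sketch; a deployed system might instead fall back to a shorter template.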

Response generation in a dialogue system involves additional tasks beyond those required for other language generation tasks. Given that the information to be generated is in the form of some non-linguistic representation - for example, the results of a database query or a chain of reasoning from an expert system - the dialogue manager has to relate the information to what was previously said (using a discourse history) as well as to the user's goals and knowledge (using a user model).

Use of a discourse history enables the system to provide a response that is consistent and coherent with the preceding dialogue. For example, if some entity that has already been mentioned is to be referred to again, the system should check whether an anaphoric expression can be used unambiguously to refer to the entity on a second mention, as in the following example taken from Reiter and Dale [1997]:

    The next train is the Caledonian Express. It leaves at 10am. Many tourist guidebooks highly recommend this train.

Little research has been done on the use of pronouns in language generation, although there has been some research on generating definite descriptions - for example, the use of 'the train' if the Caledonian Express and no other train has been previously mentioned [Dale and Reiter 1995].
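The definite description case can be illustrated with a deliberately crude sketch. It is not the algorithm of Dale and Reiter [1995]; it only encodes the single condition stated above, namely that a short description such as 'the train' is safe when the entity has been mentioned and no other entity of the same kind is in the discourse history. The `Entity` class, its fields and the second train name used in the usage note are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str  # full form, e.g. "the Caledonian Express"
    kind: str  # head noun usable in a definite description, e.g. "train"


def refer(entity, discourse_history):
    """Choose a referring expression given previously mentioned entities."""
    if entity not in discourse_history:
        return entity.name  # first mention: use the full name
    competitors = {
        other
        for other in discourse_history
        if other.kind == entity.kind and other != entity
    }
    if not competitors:
        return f"the {entity.kind}"  # unambiguous definite description
    return entity.name  # another entity of the same kind: repeat the name
```

With `express = Entity("the Caledonian Express", "train")`, a history containing only `express` yields "the train", whereas a history that also contains a hypothetical `Entity("the Highland Flyer", "train")` falls back to the full name.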

As mentioned earlier, user modelling in the early 1980s was concerned with making natural language dialogue systems more co-operative. In addition to supporting the interpretation of the user's utterances by modelling the user's beliefs, goals and plans, the other main application of user models was to enable a system to adapt its output to the user's perceived needs [Wahlster and Kobsa 1989]. A number of research projects addressed this issue, of which the following are indicative.

The KNOME system [Chin 1989] provided different levels of explanation of Unix commands depending on its categorisation of the user's level of competence and the degree of difficulty of the command in question. The TAILOR system [Paris 1989] adapted its output to the user's level of expertise by selecting the type of description and the particular information that would be appropriate for a given user. Based on an extensive analysis of scientific texts, it was found that texts from adult encyclopaedias and manuals for experts mainly included structural information that could be represented using constituency schemas describing the parts of the objects, while encyclopaedias for young children and manuals for novices contained mainly process-oriented information that described the functional characteristics of the objects. TAILOR was able to generate appropriate descriptions to different types of user and to produce a range of descriptions for users falling between the two extremes of novice or expert. Finally, in the IMP system, Jameson [1989] investigated the use of anticipation feedback to determine the bias of the system's output. Basically what this involves is that the system attempts to anticipate the user's reaction to its output and then takes this anticipated reaction into account in finalising its output. This technique is particularly appropriate for evaluation-oriented dialogues, such as personnel selection interviews and dialogues involving travel agents, hotel managers, and sales people.

A user model was used in the Circuit-Fix-It Shop system to enable the system to determine what needed to be said to the user and what could be omitted because of existing user knowledge (see the example discussed in section 3.3). In this system the dialogue controller invoked inferences to derive additional axioms about the user based on the user's utterances. These inferences, which are similar to those used by [Chin 1989] in the KNOME system, included the following ([Smith and Hipp 1994]: 60):

    If the axiom meaning is that the user has a goal to learn some information, then conclude that the user does not know about the information.

    If the axiom meaning is that an action was completed, then conclude that the user knows how to perform the action.

These inferences, which are based on abstract descriptions of actions and their effects, were used to provide user model axioms that could be used by the theorem prover along with other axioms that were available to prove goal completion. Thus the user model information was employed within the dialogue system to determine the selection of the information to be presented to the user.
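The two quoted inference rules, and the way their conclusions can prune what is said, can be sketched as follows. The tuple encoding of axioms, the predicate names (`goal_to_learn`, `action_completed`, `knows_how`, `not_knows`) and both function names are assumptions made for illustration; the actual system represented these as axioms for its theorem prover.

```python
def infer_user_axioms(axiom):
    """Derive user-model axioms from the meaning of one input axiom.

    An axiom is encoded here as a (predicate, content) pair.
    """
    predicate, content = axiom
    if predicate == "goal_to_learn":
        # The user wants to learn it, so the user does not know it yet.
        return [("not_knows", content)]
    if predicate == "action_completed":
        # The user completed the action, so the user knows how to do it.
        return [("knows_how", content)]
    return []


def needs_instruction(action, user_model):
    """Instructions for an action can be omitted once the user model
    contains the axiom that the user knows how to perform it."""
    return ("knows_how", action) not in user_model
```

For instance, after `user_model.update(infer_user_axioms(("action_completed", "attach_wire")))` on a set `user_model`, the call `needs_instruction("attach_wire", user_model)` is false, so the explanation of that (hypothetical) action could be left out of the next response.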

A considerable amount of research in text generation has been concerned with the organisation of messages, i.e. their discourse structure. One of the most widely known approaches involves the use of rhetorical relations between elements of a text, as described in Rhetorical Structure Theory (RST) [Mann and Thompson 1988]. Examples of rhetorical relations are elaboration, exemplification, and contrast. Alternatively, schemas have been used to provide the structure of the information to be presented [McKeown 1985]. A schema sets out the main components of a text, using elements such as identification, analogy, comparison, and particular-illustration, which have a sequential ordering in a text and can occur recursively. Schema-based systems often use general programming constructs such as local variables and conditional tests.

The form of the output is known as the linguistic realisation. This involves the choices of lexical items and syntactic structures to express the desired meaning. The choice of lexical items might involve deciding between the words leave and depart to express the concept of DEPARTURE, while syntactic decisions might involve the choice of an active or a passive sentence [Reiter and Dale 1997]. Linguistic realisation also involves the generation of grammatically correct structures, for example, selecting the appropriate tense and rules of agreement. From the perspective of the construction of a text, four different categories of content may be involved [Reiter and Dale 1997]:

(1) unchanging text - i.e. parts of the message that are always present in the output text;
(2) directly-available data - i.e. information that has been retrieved from a database or knowledge base;
(3) computable data - i.e. information that is derived from the data as a result of some computation or reasoning (for example, the number of records found in the database for trains between two cities);
(4) unavailable data - i.e. information that is not present in the data but which supplements the information (this is common in texts authored by humans, for example, extra information that a railway line may be blocked by snow).

A dialogue system may make use of at least the first three types, using unchanging text for the constant parts of a message, retrieved data to convey the information that was requested, and computable data to summarise the information or to require a more specific choice from the user.
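How the first three content types might combine in one response can be shown with a small sketch. The record layout (a list of dictionaries with a `departs` key), the threshold `max_listed` and the wording of the prompts are hypothetical.

```python
def describe_trains(origin, destination, records, max_listed=3):
    """Build a response from unchanging text, retrieved data and
    computable data (here, the number of matching records)."""
    count = len(records)  # computable data
    if count == 0:
        return f"There are no trains from {origin} to {destination}."
    if count > max_listed:
        # Too many results to read out: summarise and ask for a
        # more specific choice instead of listing every record.
        return (
            f"There are {count} trains from {origin} to {destination}. "
            "What time would you like to leave?"
        )
    # Directly-available data, embedded in unchanging text.
    times = ", ".join(record["departs"] for record in records)
    return f"Trains from {origin} to {destination} leave at {times}."
```

The literal strings are the unchanging text, the departure times are directly-available data, and the count drives both the summary and the decision to ask the user to narrow the request.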

4.6 Speech output

Speech output involves the translation of the message constructed by the response generation component into spoken form. In the simplest cases pre-recorded canned speech may be used, sometimes with spaces to be filled by retrieved or previously recorded samples, as in:

    You have a call from <Jason Smith>. Do you wish to take the call?

in which most of the message is pre-recorded and the element in angular brackets is either synthesised or played from a recorded sample. This method works well when the messages to be output are constant, but synthetic speech is required when the text is variable and unpredictable, when large amounts of information have to be processed and selections spoken out, and when consistency of voice is required. In these cases text to speech synthesis (TTS) is used.
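The mixed canned-speech approach can be pictured as a playback plan in which each part of the prompt is tagged with its audio source. The function name `plan_prompt`, the three source tags and the idea of a set of names with recorded samples are assumptions of this sketch; no audio is actually produced.

```python
import re


def plan_prompt(prompt, recorded_names):
    """Split a prompt into (source, text) playback segments.

    Text outside angle brackets is treated as pre-recorded; a bracketed
    element is played from a recorded sample if one exists, and is
    otherwise marked for synthesis.
    """
    segments = []
    # The capturing group keeps the bracketed elements in the result.
    for part in re.split(r"(<[^<>]+>)", prompt):
        text = part.strip()
        if not text:
            continue
        if text.startswith("<") and text.endswith(">"):
            value = text[1:-1]
            if value in recorded_names:
                segments.append(("recorded_sample", value))
            else:
                segments.append(("synthesised", value))
        else:
            segments.append(("pre_recorded", text))
    return segments
```

Applied to the example prompt with an empty set of recorded names, the plan consists of a pre-recorded opening, a synthesised name, and a pre-recorded remainder.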

Text to speech synthesis can be seen as a two stage process involving

(1) text analysis;
(2) speech generation [Edgington et al. 1996a; 1996b].

Text analysis involves the analysis of the input text that results in a linguistic representation that can be used by the speech generation stage to produce synthetic speech by synthesising a speech waveform from the linguistic representation. The text analysis stage is sometimes referred to as text-to-phoneme conversion, although this description does not cover the analysis of linguistic structure that is involved. The second stage, which is often referred to as phoneme to speech conversion, involves the generation of a prosodic description (including rhythm and intonation), followed by speech generation which produces the final speech waveform. A considerable amount of research has been carried out in text to speech synthesis which is beyond the scope of the present survey (see, for example, [Edgington et al. 1996a; 1996b; Carlson and Granstrom 1997] for recent overviews). This research has resulted in several commercially available text to speech systems, such as DECTalk and the BT Laureate system [Page and Breen 1996]. The main aspects of text to speech synthesis that are relevant to spoken dialogue systems will be reviewed briefly. The text analysis stage of text to speech synthesis comprises four tasks:

(1) text segmentation and normalisation;
(2) morphological analysis;
(3) syntactic tagging and parsing;
(4) the modelling of continuous speech effects.

Text segmentation is concerned with the separation of the text into units such as paragraphs and sentences. In some cases this structure will already exist in the retrieved text, but there are many instances of ambiguous markers. For example, a full stop may be taken as a marker of a sentence boundary, but it is also used for several other functions such as marking an abbreviation (St.), as a component of a date (12.9.97), or as part of an acronym (M.I.5). Normalisation involves the interpretation of abbreviations and other standard forms such as dates, times and currencies, and their conversion into a form that can be spoken. In many cases ambiguity in the expressions has to be resolved - for example, St. can be 'street' or 'saint'.
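The ambiguity of the full stop and of St. can be illustrated with two crude heuristics. Neither is taken from a real text to speech system: the abbreviation list, the rule that a sentence boundary needs a following capitalised token, and the neighbour-based choice between 'street' and 'saint' are all simplifying assumptions, and each is easy to defeat (for example, by a sentence that ends in an abbreviation).

```python
# Illustrative abbreviation list; a real system would need a far larger one.
ABBREVIATIONS = {"St.", "Dr.", "Mr."}


def split_sentences(text):
    """Split on full stops that are plausibly sentence boundaries.

    A token-final full stop counts as a boundary only if the token is
    not a known abbreviation and the next token is capitalised (or the
    text ends). Full stops inside tokens, as in 12.9.97, are ignored.
    """
    tokens = text.split()
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        if not token.endswith(".") or token in ABBREVIATIONS:
            continue
        is_last = i + 1 == len(tokens)
        if is_last or tokens[i + 1][0].isupper():
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


def expand_st(tokens, i):
    """Guess the spoken form of 'St.' at position i from its neighbours.

    Returns 'street' after a capitalised name, 'saint' before one, and
    None when this heuristic cannot decide.
    """
    previous = tokens[i - 1] if i > 0 else ""
    following = tokens[i + 1] if i + 1 < len(tokens) else ""
    if previous[:1].isupper():
        return "street"
    if following[:1].isupper():
        return "saint"
    return None
```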

Morphological analysis is required on the one hand to deal with the problem of storing pronunciations of large numbers of words that are morphological variants of one another, and on the other to assist with pronunciation. Typically a pronunciation dictionary will store only the root forms of words, such as write. The pronunciations of related forms, such as writes and writing, can be derived using morphological rules. Similarly, words such as staring need to be analysed morphologically to establish their pronunciation. Potential root forms are star + ing and stare + ing. The former is incorrect on the basis of a morphological rule that requires consonant doubling (starring), while the latter is correct because of the rule that requires e-deletion before the -ing form.
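The star/stare example can be checked mechanically by generating the -ing form of each candidate root and comparing it with the word to be analysed. The rules below are a rough approximation of English spelling: the doubling rule ignores stress, so it is only dependable for one-syllable roots, and exceptions are not handled. The function names and the tiny root list in the usage note are hypothetical.

```python
VOWELS = set("aeiou")


def ing_form(root):
    """Spell the -ing form of a root using two simplified rules."""
    if root.endswith("e") and not root.endswith("ee"):
        return root[:-1] + "ing"  # e-deletion: stare -> staring
    ends_cvc = (
        len(root) >= 3
        and root[-1] not in VOWELS
        and root[-1] not in "wxy"
        and root[-2] in VOWELS
        and root[-3] not in VOWELS
    )
    if ends_cvc:
        return root + root[-1] + "ing"  # doubling: star -> starring
    return root + "ing"


def find_roots(word, roots):
    """Return the dictionary roots whose -ing form matches the word."""
    return [root for root in roots if ing_form(root) == word]
```

With the roots `["star", "stare", "write"]`, the word staring is traced back to stare only, and starring to star only, which is the distinction needed to select the right vowel pronunciation.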

Tagging is required to determine the parts of speech of the words in the text and to permit a limited syntactic analysis, usually involving stochastic processing. A small number of words - estimated at between 1 and 2% of words in a typical lexicon [Edgington et al. 1996a] - have alternative pronunciations depending on their part of speech. For example, live as a verb will rhyme with give, but as an adjective rhymes with five. The part of speech also affects stress assignment within a word - for example, record as a noun is pronounced 'record (with the stress on the first syllable), and as a verb as re'cord (with the stress on the second syllable).
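Once a tagger has supplied the part of speech, selecting the pronunciation is a lookup. The sketch below assumes the tag is already available; the table layout, the function name and the mixture of IPA for live and the stress-marked spellings for record are choices made for illustration only.

```python
# Homographs keyed by (word, part of speech). Stress marks for "record"
# follow the notation used in the text.
HOMOGRAPHS = {
    ("live", "verb"): "lɪv",        # rhymes with give
    ("live", "adjective"): "laɪv",  # rhymes with five
    ("record", "noun"): "'record",  # stress on the first syllable
    ("record", "verb"): "re'cord",  # stress on the second syllable
}


def pronounce(word, pos, lexicon):
    """Return the part-of-speech-specific pronunciation if the word is a
    known homograph, otherwise fall back to the general lexicon.

    Returns None when the word is in neither table.
    """
    key = (word.lower(), pos)
    if key in HOMOGRAPHS:
        return HOMOGRAPHS[key]
    return lexicon.get(word.lower())
```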

Modelling continuous speech effects is concerned with achieving natural sounding speech when the words are spoken in a continuous sequence. Two problems are encountered. Firstly, there are weak forms of words, involving mainly function words such as auxiliary verbs, determiners and prepositions. These words are often unstressed and given reduced or amended articulations in continuous speech. Without these adjustments the output sounds stilted and unnatural. The second problem involves co-articulation effects across word boundaries, which have the effect of deleting or changing sounds. For example, if the words good and boy are spoken together quickly, the /d/ in good is assimilated to the /b/ in boy. Modelling these co-articulation effects is important for the production of natural sounding speech.
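The good boy example can be expressed as a rewrite rule applied at word boundaries. The rule table below contains only the one assimilation mentioned above; the ASCII phoneme symbols and the function name are hypothetical, and real systems use much richer, context-sensitive rule sets.

```python
# (final sound of left word, initial sound of right word) -> new final sound.
ASSIMILATION_RULES = {("d", "b"): "b"}


def apply_assimilation(words):
    """Apply cross-word assimilation to a list of phoneme lists.

    Returns new lists; the input is left unmodified.
    """
    result = [list(phonemes) for phonemes in words]
    for left, right in zip(result, result[1:]):
        if not left or not right:
            continue
        replacement = ASSIMILATION_RULES.get((left[-1], right[0]))
        if replacement is not None:
            left[-1] = replacement
    return result
```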

There has been an increasing concern with the generation of prosody in speech synthesis, as poor prosody is often seen as a major problem for speech systems that tend to sound unnatural despite good modelling of the individual units of sound. Prosody includes phrasing, pitch, loudness, tempo, and rhythm, and is used to convey differences in meaning as well as to convey attitude.

The speech generation process involves mapping from an abstract linguistic representation of the text, as provided by the text analysis stage, to a parametric continuous representation. Two main methods have been used to model speech: articulatory synthesis, which models characteristics of the vocal tract and speech articulators, and formant synthesis, which models characteristics of the acoustic signal. Formant synthesis has been the more successful method and has produced commercial systems such as DECTalk that yield a high degree of intelligibility.

An alternative method that is used in recent work, for example, in BT's Laureate system, involves concatenative speech synthesis, in which pre-recorded units of speech are stored in a speech database and selected and joined together in speech generation. The relevant units are usually not phonemes, due to the problems that arise with co-articulation, but diphones, which assist in the modelling of the transitions from one unit of sound to the next. Various algorithms have been developed for joining the units together smoothly.
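Because a diphone spans the transition between two adjacent sounds, the units to be fetched from the speech database can be listed by pairing each phoneme with its successor. The sketch below shows only this unit selection step, not the joining of waveforms; the silence symbol, the hyphenated unit names and the function name are assumptions.

```python
def to_diphones(phonemes, silence="_"):
    """List the diphone units needed for a phoneme sequence.

    The sequence is padded with silence so that the first and last
    sounds also get a transition unit.
    """
    padded = [silence, *phonemes, silence]
    return [f"{first}-{second}" for first, second in zip(padded, padded[1:])]
```

A sequence of n phonemes therefore needs n + 1 diphone units, each of which would be looked up in the database and concatenated.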

Generally relatively little emphasis has been put on the speech output process by developers of spoken dialogue systems. This is partly due to the fact that text to speech systems are commercially available that can be used to produce reasonably
