• Ei tuloksia

9A?A J IJK@O -NFAHJ 5OIJAI

N/A
N/A
Info
Lataa
Protected

Academic year: 2022

Jaa "9A?A J IJK@O -NFAHJ 5OIJAI"

Copied!
43
0
0

Kokoteksti

(1)

Welcome to study Expert Systems!

In this course, you will learn

general knowledge of expert systems and their possibilities

how to implement a system for your own problem

how to compare dierent systems and select the best one

several metaskills like new learning methods, groupworking, problem solving, information retrieval, ...

Course details

5 ects master-level course

Classes: Wed 14-16 D106/B 14.9.-30.11. 2005 Teacher: Wilhelmiina Hämäläinen

(Meeting times: Wed 13.00-14.00, whamalai@cs.joensuu.)

Prerequicites: basic knowledge of logics and probability calculus

1

(2)

Performance

No exam, no demo sessions, but still we will learn!

Prepare for 12 hours work/week as in normal courses!

1. Participating classes (you can drop at most 2/12 classes) 2. Presentations about main expert systems: A student pair

gives a 30 min presentation about some expert system with some lecture material for other students. They should also design some exercise task about the topic and check the so- lutions. Each exercise task should take approximately 2 hours work.

3. Exercises: In the beginning of course exercises about intro- duction lectures and all student presentations. Weekly exer- cises take approximately 6 hours work in the beginning of course.

4. Learning diary: Each student writes a personal learning diary and returns a piece of it weekly. The lecturer will give feedback next week. The learning diaries are evaluated in the end of course.

5. Projectworks The projectworks are made in groups of three students (other than in the presentations). The projects are performed in several phases: specication (topic and data), design (how to implement), implementation and testing

2

(3)

Evaluation

Projectworks 40%

Presentations 20%

Learning diaries 20%

Exercises: 20%

Presentations will be evaluated by students and the teacher (ac- cording to how much you learnt)

For learning diaries, we will x the detailed criteria together!

3

(4)

Learning diary

In the beginning:

What are your expections?

What do you know beforehand?

What is your motivation?

Every week:

What you have learnt and what you have not?

Set yourself learning goals, what you should learn or nd out!

Check also, if you have reached your previous learning goals!

All experiences, feelings, changes in motivation are also im- portant!

Explain the new or dicult things with your own words! Try to relate them to your previous knowledge, and invent new applications

4

(5)

In the end:

Combine all pieces of knowledge into one and draw the nal picture (by words or drawing a concept map)

Evaluate your whole learning process during the course: Did you reach your goals? Are you satised or not?

In addition:

You can write or draw anything you wish the style is totally free

Return your piece of learning diary weekly on paper, so that your teacher can give you individual feedback and answer your questions

Save old learning diaries for nal evaluation

5

(6)

Preliminary schedule

14.9. Introduction to Expert systems, course overview and work allocation

21.9. Basic types of ESs, rule-based, functional and probabilistic systems

28.9. ESs, student presentations 5.10. ESs, student presentations 12.10. ESs, student presentations

19.10. ESs, last student presentations. Comparing and evaluat- ing systems (lecturer). Beginning projectworks.

26.10. Work specications (topic + data), short presentations.

2.11. Work plans (method + how to implement), short presenta- tions.

9.11. Detailed plans with class diagrams and main algorithms.

16.11. Implementations, short presentations?

23.11. About testing, designing test plans. Other evaluation (re- strictions, time and space requirements etc.)

30.11. Supersession (4 h), nal presentations

6

(7)

What are expert systems?

Several denitions:

A computer system which emulates the decision-making ability of a human expert

A computer system capable of giving advice in a particular knowledge domain

A model of the expertise of the best practitioners of the eld

7

(8)

application of Articial Intelligence (simulates intelligent be- haviour)

roots in Decision Analysis (utility of decision)

also related to Machine Learning (learning models from data) and Data Mining (discovering new patterns in data)

performs at the level of an expert (to assist an expert or even instead of an expert)

have been designed for a narrow problem domain

can be implemented as an intelligent agent = a part of soft- ware, which monitors the system and oers help, when it considers it is needed

8

(9)

What they can do?

Several types of tasks

Diagnosis: nding faults in the system or diseases of living or- ganisms, e.g. printer fault detection system, automatic med- ical advicors in the net, ...

Monitoring: continuous interpretetion of signals, e.g. breating- support machines in hospitals, controlling systems in industry

Planning: producing a sequence of actions for achieving a par- ticular goal (e.g. designing an ideal curriculum for a school, interactive guides for repairing a car, navigating an explo- ration robot in Mars)

Instruction (intelligent tutoring systems): especially for prac- tising physical skills like plane simulators, but nowadays in all levels, especially in distance learning

Prediction: forecasting future events (e.g. advicing pricing ac- cording to market uctuations)

9

(10)

Examples

A rule-based system for automobile trouble-shooting The system consist of simple if ... then rules, with uncertainty measures. For example:

If the result of trying the starter is the car cranks normally and a gas smell is not present while trying the starter Then the gas tank is empty with 90% condence

The on-line demo can be found on:

http://www.expertise2go.com/webesie/car/

The system is described on

http://www.expertise2go.com/webesie/

10

(11)

Bayesian automobile trouble shooting

The idea is to diagnosize the most probable reasons for auto- mobile start-up problems and evaluate the probability that the engine works, when we x a certain component. For example if we observe that the gauge is empty (whether or not there is fu- el), we can calculate the probability that the car starts after we have lled the tank.

This example is from Breese, J.S., Heckerman, D. and Rommelse, K.: Troubleshooting Under Uncertainty. Technical report MSR- TR-94-07, 1994. http://www.research.microsoft.com/research/

pubs/view.aspx?msr_tr_id=MSR-TR-94-07

11

(12)

A decision tree for medical diagnosis

An HIV-positive patient has come to the emergency department for care. What is the urgency of the visit? The nurse should classify patients as 0=emergent, 1=urgent, 2=non-urgent.

This example is from Lewis, R.J.: An Introduction to Classica- tion and Regression Tree (CART) Analysis. Annual Meeting of the Society for Academic Emergency Medicine in San Francisco, California, 2000. http://www.saem.org/download/lewis1.pdf.

12

(13)

Why ESs are useful?

help if expertise is scarse, expensive or unavailable training new human experts is time-consuming and expensive

save time, because they are fast and the same system can serve several users parallelly

do not forget or make human errors

are consistent: similar situations are always handled in the same way

can combine knowledge of multiple human experts However, ESs

are totally dependent on their knowledge base

usually manage only one way of reasoning

don't have common sense!

cannot adapt to new situations creatively like people can

don't recognize when no answer exists or when the problem is outside teir area of expertise!

13

(14)

History

medical software tools begun to emerge during 1970's, e.g.

MYCIN (Stanford University 1976) for aiding physicians in diagnosing and treating patients with infectious blood dis- eases

the rst systems were simple rule-based systems dened by human experts

Revolution in the late 80's: Bayesian networks

in 1990's several new applications and methods

nowadays AI-tools general make-it-yourself expert sys- tems

Evolution: rule-based systems Uncertain rules Bayesian networks, neural networks, decision trees, ...

Tendency from manually constructed ad hoc models to ma- chine learning, systems, which can learn themselves

14

(15)

System construction

Terminology

A model M = (S, θ) = a model structure S and assigned pa- rameter values θ. I.e. it is an instance of a given model structure.

The model structure S is a structure which determines the pa- rameters required for constructing a model. Typically the struc- ture consists of variables and their relations (e.g. conditional de- pendencies).

The model parameters θ are assigned numerical values like probabilities in Bayesian network or linear coecients (above α, β, γ) in linear regression.

Modelling paradigm the general modelling principles used.

The modelling paradigm consist of basic denitions, assumptions, and techniques for constructing and using certain kind of models.

15

(16)

Hierarchy of concepts

Bayesian networks Decision

trees Linear regression

A B

C A

B C

δ1 C A

B

A B

C A

C B

Modelling paradigms

Model family

Model class

A

B C

P(B|A) P(B| A)

P(C|A) P(C| A) P(A)

Model

. .

.. ..

2 A

B C

δ δ

B C

A

3

Modelling paradigms like Bayesian networks, decision trees and linear regression describe the general modelling principles used.

A model family is a set of model structures in the given mod- elling paradigm. E.g. in Bayesian networks modelling paradigm a model family consists of dierent graph structures with variable nodes and conditional dependencies (edges) between variables.

A model class is a set of models with a xed structure but dier- ent parameters θi. In the case of Bayesian networks the param- eters are prior probabilities associated to root nodes and condi- tional probabilities associated to non-root nodes.

A model is an item in a model class with a xed structure and xed parameters.

16

(17)

Construction process (from the implementor's point of view)

1. Understanding the domain: interview human experts and read literature. What are the system functionalities? Special util- ities or risks?

2. Data management: what and how to collect? Preprocessing?

3. Dene the model structure (see below). Especially causal re- lations are important!

4. Dene the model parametres, usually from data.

5. Verication and evaluation: Test the system! How sensitive it is for small changes in the data? (=sensitivity analysis)

17

(18)

Four approaches

EXPERT

MODEL STRUCTURE + MODEL PARAMETRES MODEL=

DATA

DATA PATTERNS

STRUCTURE MODEL

STRUCTURE MODEL MODEL STRUCTURE +

MODEL PARAMETRES MODEL=

LEARNING MACHINE

DATA

MODEL PARAMETRES

MODEL PARAMETRES LEARNING

MACHINE defines 1)

3)

EXPERT

4) 2) DATA

DATA

MINING interprets defines

4 APPROACHES FOR ES CONSTRUCTION

18

(19)

1. Traditional approach: the expert denes both the model struc- ture and the parametres

simple and fast method

expert knows the general causalities quite well

numeric measures (parametres) are hard to estimate

2. Machine Learning-approach: the whole model is dened by ML techniques

simple model structures can be learnt fast, but learning a graph-structure is generally NP-hard problem! (intractable for large data sets)

several model structures can be equally plausible recog- nizing causal relationships is very hard!

accurate numeric parametres, if the data set was good (large and representative)

19

(20)

3. Compromise approach: The expert denes the model struc- ture, and the parametres are learnt from data

combines the expert's knowledge on general causalities and computational means for dening parametres

the data and expert's knowledge can be contrary!

4. Iterative approach (See: Descriptive and predictive modelling by WH): 1) Data mining reveals patterns in the data. An ex- pert denes the model structure according to these patterns.

2) The model parametres are learnt from data.

data and model structure are now consistent

can be applied to new and poorly known domains

Task: What other advantages and disadvantages do you invent in each of the approaches?

20

(21)

Remarks

Likelihood 6= importance (utility): e.g. the most probable medical diagnosis is:There is nothing wrong in you, you just have a cold, but the 3rd probable (not reported) diagnosis is lung cancer.

Don't test with the same data set you have used for model construction any model will then work perfectly! distinct training and test sets

In every modelling paradigm there is some bias they suit better for some problems than others. Become conscious of the bias and select the best one!

21

(22)

Summary

The main tasks of expert systems can be reduced to two main types:

1. Classication or regression: Given attributes X1, ..., Xk, what is Y?

(a) Classication: Y is class variable (discrete valued)

(b) Regression: Y is numeric function value Y = f(X1, ..., Xk) 2. Direction of inference:

Prediction: Y is eect of Xi's (in future)

Diagnosis: Y is cause of Xi's (in past)

3. Action selection: given situation and goal(s), what is the op- timal action or sequence of actions?

22

(23)

EXPERT

MODEL STRUCTURE + MODEL PARAMETRES MODEL=

DATA

DATA PATTERNS

STRUCTURE MODEL

STRUCTURE MODEL MODEL STRUCTURE +

MODEL PARAMETRES MODEL=

LEARNING MACHINE

DATA

MODEL PARAMETRES

MODEL PARAMETRES LEARNING

MACHINE defines 1)

3)

EXPERT

4) 2) DATA

DATA

MINING interprets defines

4 APPROACHES FOR ES CONSTRUCTION

Task: Invent examples of expert systems, which should learn a sequence of optimal actions!

23

(24)

Modelling paradigms

Three main techniques:

1. Logical rules

2. Numeric functions 3. Probabilistic reasoning

24

(25)

Rule-based systems

ES=Inference engine + knowledge base Knowledge base

=rules + most of facts

Rules are of form antecedent consequent or if condition then action

condition part is expressed as a logical formula with operators AND, OR, NOT

Usually both factual and heuristic knowledge

Heuristics=rules of good judgement, rules of good guessing, rules of thumb

An example rule from MYCIN:

if the stain of the organism is gram negative AND the morphlogy of the orgasm is rod AND the aerobicity of the organism is anaer- obic THEN there is a strongly suggestive evidence (0.8) that the class of the organism is Enterobacter iaceae

25

(26)

Inference engine

Searches through knowledge base and tries to match facts to the antecedents of rules.

If the rule matches, the consequent can be executed.

Several ifthen rules are chained:

1. Forward chaining: starts from the data give and tries to nd rules, whose premises can be satised. When a rule is satised, the conclusions are added to the knowledge base, and new rules are searched until all new information has been added.

2. Backward chaining (goal-oriented system): tries to prove a goal or rule conclusion by conrming truth of all its premises.

The premise can be a conclusion of another rule, which has to be satied, or data supplied by the user from observations.

26

(27)

Advantages:

easy to construct

experts can easily interpret the answers

can explain the reasoning process (gives condence, easier to debug)

Restrictions:

cannot represent uncertainty

real world phenomena are often complex and incompletely known

the knowledge can be imprecise or probabilistic

or expressed vaguely (most, several, few, often, un- likely,...)

how rank alternative solutions?

how evaluate condence of conclusions?

27

(28)

Solution: uncertain rules, which express uncertainty by

Fuzzy logic

The truth value of A, T(A), is not 0 or 1 but T(A) [0,1]

T(A B) = min(T(A), T(B)) T(A B) = max(T(A), T(B)) T(¬A) = 1 −T(A)

Dempster-Shafer theory

Instead of a single value, we dene an interval [Bel(A), P l(A)], where Bel(A) tells how necessary A is and P l(A) = 1 Bel(¬A) tells, how possible A is (Notice that generally Bel(A) + Bel(¬A) 6= 1)

Beliefs are combined by Dempster's rule

Flexible way to represent ignorance (lack of information), but sometimes the results seem counter-intuitive (if you are used to probability theory)

Probability theory (see below)

other measures

28

(29)

Truth-maintenance systems

mathematically sound methods

not in fashion nowadays, but some uncertain extensions have been introduced

Machine learning algorithms for model construction do not exist? (yet!)

the most natural implementation language is Prolog, which is not in fashion...

29

(30)

Main types:

1. Justication-based TMS

The set of logical rules is represented as a graph.

The nodes correspond to premises (always true), assumptions (which can be cancelled) or contradictions.

The justications are logical rules of form p1∧p2∧...∧pn p

we can calculate a set of nodes which are believed, ask justi- cations for it, or cancel inconsistent assumptions

2. Assumption-based TMS

similar graph structure with assumptions and justications (rules)

now we calculate for all nodes all minimal assumption sets, under which it is true

more eective approach, when assumptions change often be- tween queries

enlargements for uncertain reasoning (with probabilities, Dempster- Shafer beliefs, or possibilistic logic)

30

(31)

Decision trees

the decision rules are expressed in tree-form

the whole model can be easily learnt from data

can deal with both numeric and categorial data

Regression

For predicting the numeric value of dependent variable Y given the independent variables X1, ..., Xn.

The goal is to learn a function f such that Y = f(X1, ..., Xn).

The simplest model is linear regression Y = α+β1X1 +...+ βnXn

More complex functions can be learnt by neural networks

31

(32)

Probabilistic reasoning

The basic laws of probability calculus

0 P(A) 1

If A can get k values a1, .., ak, then Σi=1kP(A = ai) = 1 (and for binary-valued data P(¬A) = 1 P(A))

Probability of A and B, P(A, B) = P(A)P(B), if A is inde- pendent of B

Probability of A or B is P(A∨B) = P(A)+P(B)−P(A, B)

The conditional probability of B given A is P(B|A) = PP(A,B(A))

If we know P(A) and P(B|A), we can update the probability of A given B by Bayes rule: P(A|X) = P(A)PP(B)(B|A)

32

(33)

Example

You don't remember, if your friend has a cat, but you are al- lergic to cats. In addition, you have hay-u in summer time. In autumn, the probability of hay-u is only 30%. You know that a cat alone causes you allergic symptoms with 60% probability and hay-u alone with 70% probability. If there is both a cat and hay-u, the probability is 90%. What is the probability that you get allergic symptoms, when you visit your friend?

Let's denote A=Your friend has a catB=You have hay-u,C=You get allergic symptoms.

P(A) = 0.5 (You don't know it) P(B) = 0.30

P(C|A, B) = 0.90 P(C|A,¬B) = 0.60 P(C|¬A, B) = 0.70

P(C|¬A,¬B) = 0.10 (This is rare, but not impossible other factors can cause allergic symptoms, too!)

33

(34)

Now the probability of allergic

P(C) = P(A)P(B)P(C|A, B) + P(A)P(¬B)P(C|A,¬B) + P(¬A)P(B)P(C|¬A, B) + P(¬A)P(¬B)P(C|¬A,¬B) =

You cannot see a cat what is the probability of cat, if you get allergic symptoms?

P(A|C) = PP(A,C)(C) = P(A,B,C)+PP(C)(A,¬B,C =

P(A)P(B)P(C|A,B)+P(A)P(¬B)P(C|A,¬B)

P(C) =

Task: Calculate the probabilities!

Notice! If we would know the total probability distribution P(A, B, C) (for all value combinations), we could solve everything directly.

However, in practice, we can usually estimate only the condition- al probabilities.

34

(35)

Bayesian networks

A nice visual representation for probabilistic relations

Vertices correspond to variables and edges dependencies be- tween variables (if variables are independent, there is no edge!)

Each root node A has prior probabilities P(A) (for all values ai)

Each non-root node has a table of conditional probabilities P(B|X), where X is the set of antecedent nodes (for all value combinations of X)

We can predict the probability of future events simply by chaining probabilities

The probability of a cause A can be updated when events X have been observed by Bayes rule

35

(36)

Neural networks

Idea is to learn a function, which maps inputs to outputs (general regression).

In principle, can express any function, but it can be hard to learn!

The system consists of nodes, which calculate the weighed sum of inputs (from antecedent nodes), apply some function, and output the result to successor nodes

Main types:

Feed-forward networks: the data ows from inputs to outputs single-layer = perceptron: simple, but restricted

multi-layer: very expressive, but hard to learn and inter- pret

Recurrent networks: for modelling time series data (now we take into account also the value of the node in the previous time step)

36

(37)

Instance-based methods

do not built any explicit model

Assumption: the data point behaves in the similar way than the neighbouring points

How to dene neighbours?

k-nearest neighbours

for numeric data

A simple approach: if d1, .., dk are k nearest neighbours to d according to Xi variables, then Y is weighed average of Y1, .., Yk

Problem: how to dene neighbours? Several possible distance functions (and Euclidean is not always the best!)

Several variations for non-numeric data, e.g.

Case-based systems: All solutions are stored in a database.

Similar cases are searched from the database and the solutions are applied to new situations or new solutions are generated from the old ones.

Social or collaborative ltering (usually recommending sys- tems, not necessarily expert systems in classical sense)

37

(38)

Genetic algorithms

simulates evolution process

we keep a set of solutions (population), which evolves through selection, crossing and mutation operations

can be slow, but achieve good results for complex problems, which are hard to model otherwise

e.g. learning the set of optimal action rules for a robot

38

(39)

Which paradigm to select?

There is no one super system for all problems!

The most appropriate system depends on the problem

The systems come into fashion and go out of fashion Abstraction level?

Does the system work with

high-level data (structured, e.g. relational database)

low-level data (unstructured sensor data, e.g. sounds, images)

or both?

Traditionally only high-level data the expert can dene the model and understands it!

Task: Invent applications of expert systems, which use only low- level data!

39

(40)

RECOGNITION PATTERN

. . . STRUCTURED INFORMATION

TEXT

DOCUMENTS RETRIEVAL DATA

KNOWLEDGE DISCOVERY

MODEL

sensor k sensor 2

PHYSICAL SENSORS sensor 1

unstructured data

EXPERT domain

knowledge

OTHER DATABASES

40

(41)

Basic datatypes

Usually the system is restricted to certain data types!

or numeric quantitative

or categorial qualitative

ordinal nominal

DATA

continuous numeric

discrete numeric

dicrete data

The data can be classied according to type of attribute domain:

numerical (quantitative) data: has meaning as numbers: we can measure order, distance and equality of two numerical variables.

discrete data: can get only countably many values (Notice:

in computer systems, all the data is represented with some precision and is thus discrete) (e.g. age={1, ...,120})

continuous data can in principle get any value in the given interval (e.g. weight 0)

categorial (qualitative) data: has no meaning as numbers; gen- erally we can measure only equality between two categorial variables (e.g. 1=female, 2=male)

nominal data: the values are only names for categories and have no other meaning as categorizing

ordinal data: the values have a natural ordering, but the dierences have no meaning (e.g. 1=poor, 2=fair, 3=good, 4=excellent)

41

(42)

Temporal or static data?

In temporal data we can nd dynamic patterns (how the system evolves in time)

Amount of data:

If we have too little data (relative to number of variables), the parametres cannot be estimated accurately!

Corrupted data:

The data may contain errorneous or missing values

The measurements can cause variation (noise)

Outliers: non-typical data points with extreme values

Solutions:

enough data

robust models (=not sensitive to small variations in data)

use of expert's knowledge

Task: Find out, what means overtting and how it can be pre- vented!

42

(43)

Desirable characteristics

Typically, we wish that the model

can be learnt from data

can integrate expert's knowledge

reasons in a consistent way and detects incosistencies (e.g.

conicting rules, circular rule chains, illegal values)

can be updated in the light of new evidence

can handle noisy data (missing, spurious or errorneous)

meets time-requirements with the given data (works in real time, if needed)

43

Viittaukset

LIITTYVÄT TIEDOSTOT

Kandidaattivaiheessa Lapin yliopiston kyselyyn vastanneissa koulutusohjelmissa yli- voimaisesti yleisintä on, että tutkintoon voi sisällyttää vapaasti valittavaa harjoittelua

awkward to assume that meanings are separable and countable.ra And if we accept the view that semantics does not exist as concrete values or cognitively stored

We define the model structure for both data sets, such that F R is the root node in both networks but other variables A, B, C can be either root or leaf nodes and the edges can

Modelling tree height-diameter allometry of Chinese fir in relation to stand and climate variables through Bayesian model averaging approach.. Lu L., Chhin S., Zhang J.,

The relationship between defoliation and survival-times was captured in a Cox proportional hazard model with a defoliation stress index (DSI), diameter (DBH), crown class (CCL),

The development of a model of plant or cellular tissue requires the use of a modeling paradigm: (i) imperative using a script or a compiled language, (ii)

The various dierent approaches to model and implement intelligent behavior such as neural networks, fuzzy logic, non-monotonic (default) logics and Bayesian networks all address

Following a strict family involvement definition including only formal involvement in the family businesses, these kinds of persons would not be categorised as involved in