Academic year: 2022
1 TAN = tree-augmented naive Bayes

The idea is to define a network structure which is like naive Bayes (i.e. the root node is FR and the leaf nodes are A and B for Prog. 1; for Prog. 2 the leaf nodes could be TP1, D, E), but now we also represent the strongest dependencies between the leaf nodes.

In the Prog. 1 model this simply means that we let B depend on A, in addition to FR1. In Prog. 2 we should define the optimal dependencies, but you could simply try the following: D depends on TP1, and E depends on D.
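As a sketch, the two structures described above can be written down as parent maps; the dictionary representation below is only an illustrative assumption, not a required format.

```python
# TAN structures from the text, written as parent maps (child -> parents).
# Variable names follow the text; the representation itself is an assumption.
tan_prog1 = {
    "A": ["FR1"],
    "B": ["FR1", "A"],   # B depends on A in addition to the root FR1
}
tan_prog2 = {
    "TP1": ["FR"],
    "D": ["FR", "TP1"],  # suggested extra dependency: D on TP1
    "E": ["FR", "D"],    # and E on D
}
```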

The parameters are simply calculated from frequencies. E.g. P(FR=1) is the number of rows where FR=1, divided by the total number of rows. The conditional probability P(B=1 | FR=1, A=1) is the number of rows where B=1, FR=1 and A=1, divided by the number of rows where FR=1 and A=1.
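A minimal sketch of this frequency counting, assuming each data row is a dict of attribute values (the helper names `prob` and `cond_prob` are hypothetical):

```python
def prob(rows, cond):
    """Relative frequency of rows matching all assignments in cond."""
    match = sum(1 for r in rows if all(r[k] == v for k, v in cond.items()))
    return match / len(rows)

def cond_prob(rows, event, given):
    """P(event | given) = count(event and given) / count(given)."""
    both = {**event, **given}
    num = sum(1 for r in rows if all(r[k] == v for k, v in both.items()))
    den = sum(1 for r in rows if all(r[k] == v for k, v in given.items()))
    return num / den

# Toy data: each dict is one row of the data set.
rows = [
    {"FR": 1, "A": 1, "B": 1},
    {"FR": 1, "A": 1, "B": 0},
    {"FR": 1, "A": 0, "B": 0},
    {"FR": 0, "A": 1, "B": 1},
]
print(prob(rows, {"FR": 1}))                         # 3/4 = 0.75
print(cond_prob(rows, {"B": 1}, {"FR": 1, "A": 1}))  # 1/2 = 0.5
```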

If you want better accuracy, you could also try the Dirichlet smoothing method for defining the parameters (and compare the results). (In fact, I could give an extra ECTS credit to whoever implements this – it is not required for your project work!)
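One common form of Dirichlet smoothing adds a pseudo-count α to every cell of the conditional probability table. The sketch below assumes binary attributes and α = 1 (Laplace smoothing), which is only one choice of prior:

```python
def smoothed_cond_prob(rows, event, given, alpha=1.0, n_values=2):
    """P(event | given) with Dirichlet (add-alpha) smoothing.
    n_values is the number of values the child variable can take
    (assumed 2 here for binary attributes)."""
    both = {**event, **given}
    num = sum(1 for r in rows if all(r[k] == v for k, v in both.items()))
    den = sum(1 for r in rows if all(r[k] == v for k, v in given.items()))
    return (num + alpha) / (den + alpha * n_values)
```

With no matching rows this backs off to the uniform 1/n_values instead of dividing by zero, which is the practical benefit over plain frequency estimates.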

2 Bayesian multinets

Now we define two Bayesian classifiers, one for failed students and the other for passed students. The model structures should be different, because otherwise there is no reason to have two networks. On the other hand, if it turns out that the optimal structure is the same for both networks, we can use just one network.

There are two ways to define the model structure. In both approaches you should first divide the data set into two parts, one containing only the failed students and the other only the passed students.

1. You can analyze the dependencies between attribute values in both data sets and try to find the strongest dependencies. Then you define a model which contains an arrow from X to Y if Y is strongly dependent on X. I suggest using simple models, like TANs.

2. You can learn the optimal network structure with some tool. Hugin may be able to do that too, but I have used another tool called camml, which you can install on either Linux or Windows:

http://www.datamining.monash.edu.au/software/camml/.
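For approach 1, one simple dependence score is the empirical mutual information between two attributes; note that this plain (not class-conditional) version is a simplification chosen here for illustration:

```python
from collections import Counter
from math import log

def mutual_information(rows, x, y):
    """Empirical mutual information I(X;Y) in nats between attributes
    x and y; a higher value means a stronger dependency."""
    n = len(rows)
    px = Counter(r[x] for r in rows)
    py = Counter(r[y] for r in rows)
    pxy = Counter((r[x], r[y]) for r in rows)
    mi = 0.0
    for (vx, vy), c in pxy.items():
        # (c/n) * log( p(x,y) / (p(x) * p(y)) )
        mi += (c / n) * log(c * n / (px[vx] * py[vy]))
    return mi
```

You would compute this score for every attribute pair separately in the failed-student and passed-student data sets, and keep the highest-scoring edges in each network.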

