
Bayes classifier and Bayes error

Jyrki Kivinen 19 November 2013

After the lecture today, several students asked for clarification of the concepts of the Bayes classifier and the Bayes error. To put it simply, the Bayes classifier (or Bayes optimal classifier) is a classifier that minimizes a certain probabilistic error measure. The Bayes error is then the error of the Bayes classifier. As far as I know, the word “Bayes” appears simply because the Bayes formula is often very useful in analysing these concepts.

The notions of Bayes optimality and Bayes error generalize directly to regression as well, but for simplicity we consider only classification here.

To keep things even simpler, assume that the objects to be classified come from a finite set X. If the set of classes is Y, a classifier is then a function f: X → Y.

We now fix some cost function C: Y × Y → R. The most common case is the 0-1 loss, where C(y, y) = 0 for all y and C(y, y′) = 1 if y ≠ y′, giving the number of misclassifications (see, for example, pages 88–92 in the lecture notes). If a classifier f is applied to a point x whose real class is y, the cost C(y, f(x)) is incurred.

Suppose further we are given a probability distribution P(X, Y) over X × Y. We can then define the expected cost Cost(f) of a classifier f as

$$\mathrm{Cost}(f) = \sum_{(x,y) \in X \times Y} P(x, y)\, C(y, f(x)).$$
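As a concrete illustration, the expected cost can be computed by direct summation when X and Y are small. The following Python sketch uses an illustrative dictionary representation of P and made-up toy numbers:

```python
# Expected cost of a classifier over a finite joint distribution,
# represented here as a dict {(x, y): P(x, y)} (an illustrative choice).

def zero_one_loss(y_true, y_pred):
    # 0-1 loss: cost 1 for a misclassification, 0 otherwise
    return 0.0 if y_true == y_pred else 1.0

def expected_cost(joint, classifier, cost=zero_one_loss):
    # Cost(f) = sum over (x, y) of P(x, y) * C(y, f(x))
    return sum(p * cost(y, classifier(x)) for (x, y), p in joint.items())

# Toy distribution over X = {0, 1} and Y = {"a", "b"}.
joint = {(0, "a"): 0.4, (0, "b"): 0.1, (1, "a"): 0.2, (1, "b"): 0.3}

print(expected_cost(joint, lambda x: "a"))                     # 0.4
print(expected_cost(joint, lambda x: "a" if x == 0 else "b"))  # 0.3
```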

A classifier f* is called Bayes optimal, or a Bayes classifier, if it minimises Cost(·), that is,

$$f^* = \arg\min_{f} \mathrm{Cost}(f).$$

The minimum expected cost Cost(f*) is called the Bayes error. (In general, the Bayes optimal classifier need not be unique, since there may be several classifiers that achieve the same minimal error.)
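As a tiny worked example, suppose X = {x1}, Y = {0, 1}, P(x1, 0) = 0.7 and P(x1, 1) = 0.3, and we use the 0-1 loss. There are only two classifiers: the one with f(x1) = 0 has expected cost 0.3, and the one with f(x1) = 1 has expected cost 0.7, so the Bayes error is 0.3. (If both probabilities were 0.5, both classifiers would be Bayes optimal, illustrating the non-uniqueness.)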


If we want to find the Bayes classifier, note that we can write

$$\mathrm{Cost}(f) = \sum_{x \in X} G_x(f(x)),$$

where

$$G_x(y) = \sum_{y' \in Y} P(x, y')\, C(y', y).$$

Then we can, separately for each x ∈ X, pick f(x) = arg min_{y∈Y} G_x(y), and it is easy to see that this gives a Bayes optimal f = f*. For example, for the 0-1 loss we have

$$G_x(y) = \sum_{y' \neq y} P(x, y'),$$

so we get simply

$$f(x) = \arg\min_{y \in Y} \sum_{y' \neq y} P(x, y') = \arg\max_{y \in Y} P(x, y).$$
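This construction is easy to carry out by direct enumeration. The following Python sketch (with the same illustrative dictionary representation and toy numbers as above) picks f(x) = arg min_y G_x(y) separately for each x; under the 0-1 loss it returns the same labels as arg max_y P(x, y):

```python
# Pointwise construction of a Bayes optimal classifier for a finite
# joint distribution given as a dict {(x, y): P(x, y)}.

def bayes_classifier(joint, cost):
    # Return f* as a dict {x: label}, choosing arg min_y G_x(y) for each x.
    xs = {x for (x, _) in joint}
    ys = {y for (_, y) in joint}
    f_star = {}
    for x in xs:
        # G_x(y) = sum over y' of P(x, y') * C(y', y)
        def G(y, x=x):
            return sum(joint.get((x, yp), 0.0) * cost(yp, y) for yp in ys)
        f_star[x] = min(ys, key=G)
    return f_star

zero_one = lambda y_true, y_pred: 0.0 if y_true == y_pred else 1.0
joint = {(0, "a"): 0.4, (0, "b"): 0.1, (1, "a"): 0.2, (1, "b"): 0.3}

# Under the 0-1 loss this picks arg max_y P(x, y) for each x.
print(bayes_classifier(joint, zero_one))  # {0: 'a', 1: 'b'} (order may vary)
```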

Often one is given the class-conditional probabilities instead of the joint probabilities P(X, Y), so this becomes

$$f(x) = \arg\max_{y \in Y} P(x \mid y)\, P(y).$$
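When the model is given in this factored form, the same rule can be applied directly. The sketch below (again with illustrative dictionaries, factoring the same toy distribution as above into a prior P(y) and class conditionals P(x | y)) computes arg max_y P(x | y) P(y):

```python
# Bayes optimal prediction from a class prior P(y) and
# class-conditional probabilities P(x | y), both given as dicts.

def bayes_predict(x, prior, conditional):
    # Return arg max_y P(x | y) * P(y); conditional[y] maps x to P(x | y).
    return max(prior, key=lambda y: conditional[y].get(x, 0.0) * prior[y])

# The toy joint distribution from above, factored into P(y) and P(x | y).
prior = {"a": 0.6, "b": 0.4}
conditional = {"a": {0: 0.4 / 0.6, 1: 0.2 / 0.6},
               "b": {0: 0.1 / 0.4, 1: 0.3 / 0.4}}

print(bayes_predict(0, prior, conditional))  # 'a'
print(bayes_predict(1, prior, conditional))  # 'b'
```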

We have not said anything about where the probability distribution P(X, Y) comes from. In theoretical analyses we may assume that it is some unknown “true” distribution from which the data are generated. In practice it might be the output of some machine learning algorithm, and by determining the Bayes optimal prediction we are converting probabilistic predictions into forced-choice predictions so as to minimize the resulting number of mistakes, assuming our initial probabilities were (roughly) correct.

