• Ei tuloksia

Continuesonthereverseside! Courseexamination,Wednesday16DecemberExaminer:JyrkiKivinen 582631IntroductiontoMachineLearning(Autumn2015)

N/A
N/A
Info
Lataa
Protected

Academic year: 2022

Jaa "Continuesonthereverseside! Courseexamination,Wednesday16DecemberExaminer:JyrkiKivinen 582631IntroductiontoMachineLearning(Autumn2015)"

Copied!
2
0
0

Kokoteksti

(1)

582631 Introduction to Machine Learning (Autumn 2015) Course examination, Wednesday 16 December

Examiner: Jyrki Kivinen

Answer all the problems. The maximum score for the exam is 60 points.

You may answer in English, Finnish or Swedish. If you use Finnish or Swedish, it may be helpful to include the English translations to any technical terms you introduce.

To make grading easier, please write your answer to each problem on its own sheet.

1. [12 points] Explain briefly the following terms and concepts. Your explanation should include, when appropriate, both a precise definition and a brief description of how the concept is useful in machine learning. Your answer to each subproblem should fit to roughly half a page of normal handwriting.

(a) Gini indexand entropy

(b) one-versus-one andone-versus-rest (c) single linkageand complete linkage

2. [12 points] We have a binary classification task over an arbitrary instance space X. Suppose we are given a scoring classifier ˆs:X →R. Hence, large s(x)ˆ means that the true label of x is considered more likely to be +1 than −1. We have a test set of 10 examples. (For simplicity, we consider an unrealistically small set.) The table below gives for each of the 10 data points xi the scores(xˆ i) and the true labelyi.

data point 1 2 3 4 5 6 7 8 9 10

score 78 23 11 92 65 44 52 48 36 42

label −1 +1 −1 +1 +1 −1 +1 −1 −1 +1

(a) Draw the ROC curve that visualises the performance of the scoring classifier sˆon our test set.

(b) Define theranking accuracyof a scoring classifier. How can you determine it using the ROC curve? Use the ROC curve from part (a) as an example.

Continues on the reverse side!

(2)

3. [12 points] We consider predicting a binary class Y ∈ { −1,+1} with two binary in- put features X1, X2 ∈ { −1,+1}. Suppose we know that (X, Y) = ((X1, X2), Y) is generated as follows.

(a) First we pickY = +1with probability 2/3 andY =−1 with probability1/3.

(b) We then pick X1 = +1 with probability 1/2 and X1 = −1 also with probability 1/2.

(c) Finally, if Y = +1 we pick X2 = X1 with probability 3/4 and X2 = −X1 with probability 1/4; ifY =−1we pick X2 =X1 with probability 1/4 and X2 =−X1 with probability3/4

Calculate the posterior probabilities P(Y = +1 | X = (x1, x2)) for all (x1, x2) ∈ { −1,+1}2. What is the Bayes-optimal classifier for this distribution? What is the Bayes error?

4. [12 points] What is meant bygeneralisationandoverfittingin the context of supervised learning? In general, how can we evaluate generalisation, and try to avoid overfitting?

You should both explain general methods, and give specific examples of techniques for avoiding overfitting with at least two different types of models (for example, linear models and decision trees).

5. [12 points]

(a) For what kind of tasks can we use the K-means algorithm? Explain carefully what the inputs and outputs of the algorithm are, and give a very brief intuitive explanation of how the results are to be interpreted.

(b) Describe the actual K-means algorithm. The description should brief and on a high level.

(c) Define formally the objective function which the K-means algorithm tries to min- imise. Can you give any guarantees about how well the algorithm actually min- imises the function?

2

Viittaukset

LIITTYVÄT TIEDOSTOT

Explain how the minimum description length (MDL) principle with a two-part code can be applied to the model class selection problem.. Your answer should include

Your solution should consist of a brief explanation of the observations you made, a couple of representative plots to support this, and a printout of your program

• Line numbers are useful, if you refer to certain lines in your code. Begin each code line

The last chapter in your thesis (or section in a paper) is called References. For each

The last chapter in your thesis (or section in a paper) is called References. For each

The business model that focuses solely on maximizing shareholder wealth is be- coming obsolete, and is morphing into one that is concerned with all the stake- holders including

a) How is work-based learning in your example carried out? For instance, description of setting, learning methods. Work-based learning takes place at the company and at the

Write your answer under each problem number. If you do NOT know the answer, leave the space empty. For each WRONG answer, 1/4 of the points is deducted.. Erkki wants to fill the bigger