582631 Introduction to Machine Learning (Autumn 2013)

Course examination, Wednesday 11 December, 9:00–12:00

Examiner: Jyrki Kivinen

Answer all the problems. The maximum score for the exam is 60 points.

You may answer in English, Finnish or Swedish. If you use Finnish or Swedish, it may be helpful to include the English translations of any technical terms you introduce.

1. [15 points] Explain briefly the following terms and concepts. Your explanation should include, when appropriate, both a precise definition and a brief description of how the concept is useful in machine learning. The explanation of a single concept should take at most half a page.

(a) precision and recall

(b) Gini index

(c) one-against-one multiclass classification (also known as all-vs-all)

(d) k-fold cross validation

(e) partitional clustering

2. [15 points] Two of the most common learning algorithms for classification are the perceptron algorithm and Hunt’s algorithm (also known as TDIDT). Describe the main properties of these two algorithms, and compare them, considering for example

• what kind of classifiers the algorithms produce

• whether they can handle different kinds of input variables (categorical vs. numerical, etc.)

• what are the most important variations and possible generalizations of the algorithms

• how efficient they are computationally.

Based on this, can you suggest for which kinds of problems one algorithm or the other would be more appropriate?

(You do not need to give pseudocode or any other detailed explanation of how the algorithms work.)

3. [15 points]

(a) Explain what is meant by linear regression. Give an example of an application in which it might be appropriate, and also a general mathematical formulation.

(b) Derive the least-squares solution for the simple case where we have only one parameter (i.e., one dimension and no bias term). Remember to explain what you do and why.


(c) State also the solution for the general multivariate least-squares problem. You do not need to derive it, but explain the meaning of all symbols and any underlying assumptions. What computations would one need to do in practice to apply the formula? Can you estimate the computational complexity?

(d) Explain how you would apply the linear regression framework to fit a polynomial to a set of data points (x_i, y_i) ∈ ℝ × ℝ. Using polynomials as an example, explain the notions of overfitting and underfitting. What techniques can one use to avoid overfitting and underfitting, both generally and specifically in linear regression?

4. [15 points] In this problem, you are asked to apply hierarchical clustering to a set of seven data points, with dissimilarities between the points given by the matrix

      p₁    p₂    p₃    p₄    p₅    p₆    p₇
p₁    0     2.24  3.92  5.28  4.55  6.52  8.04
p₂    2.24  0     2.86  4.04  2.35  4.53  6.00
p₃    3.92  2.86  0     1.36  2.86  3.33  4.79
p₄    5.28  4.04  1.36  0     3.28  2.66  3.91
p₅    4.55  2.35  2.86  3.28  0     2.42  3.77
p₆    6.52  4.53  3.33  2.66  2.42  0     1.53
p₇    8.04  6.00  4.79  3.91  3.77  1.53  0

These dissimilarities are actually distances between the points p₁ = (0.5, 2.0), p₂ = (2.5, 3.0), p₃ = (4.2, 0.7), p₄ = (5.5, 0.3), p₅ = (4.8, 3.5), p₆ = (7.0, 2.5), p₇ = (8.5, 2.8) in the two-dimensional Euclidean plane, shown in the figure below. This information is not necessary for solving the problem, but probably useful for visualizing the situation.
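As an optional check of the data (not part of the problem itself), the following minimal Python sketch recomputes the Euclidean distances from the coordinates above and compares them with the printed matrix. The names `points`, `given`, `distance_matrix` and `max_deviation` are illustrative choices, and the comparison assumes the printed values are rounded to two decimals.

```python
import math

# Coordinates of p1..p7 as listed in the problem statement.
points = [(0.5, 2.0), (2.5, 3.0), (4.2, 0.7), (5.5, 0.3),
          (4.8, 3.5), (7.0, 2.5), (8.5, 2.8)]

# Dissimilarity matrix as printed in the problem statement.
given = [
    [0.00, 2.24, 3.92, 5.28, 4.55, 6.52, 8.04],
    [2.24, 0.00, 2.86, 4.04, 2.35, 4.53, 6.00],
    [3.92, 2.86, 0.00, 1.36, 2.86, 3.33, 4.79],
    [5.28, 4.04, 1.36, 0.00, 3.28, 2.66, 3.91],
    [4.55, 2.35, 2.86, 3.28, 0.00, 2.42, 3.77],
    [6.52, 4.53, 3.33, 2.66, 2.42, 0.00, 1.53],
    [8.04, 6.00, 4.79, 3.91, 3.77, 1.53, 0.00],
]


def distance_matrix(pts):
    """Return the full matrix of pairwise Euclidean distances."""
    return [[math.hypot(a[0] - b[0], a[1] - b[1]) for b in pts] for a in pts]


def max_deviation(pts, table):
    """Largest absolute difference between computed and printed entries."""
    computed = distance_matrix(pts)
    n = len(pts)
    return max(abs(computed[i][j] - table[i][j])
               for i in range(n) for j in range(n))


# A deviation of at most 0.005 is consistent with two-decimal rounding.
print(f"largest deviation from the printed matrix: {max_deviation(points, given):.4f}")
```

The same `given` matrix can be used directly as input for the clustering in parts (a) and (b), since agglomerative clustering needs only the pairwise dissimilarities, not the coordinates.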

(a) Apply basic agglomerative hierarchical clustering to this data using the single link (min) notion of dissimilarity between clusters. Visualize the result as a dendrogram.

(b) Repeat the clustering, now using complete link (max) dissimilarity. Compare the results.

(c) In addition to single link and complete link, define at least two other notions of dissimilarity between clusters. (Just the definitions are sufficient, no further explanation is needed.)
