582669 Supervised Machine Learning (Spring 2012) Homework 5 (24 February)
Solutions are due on Friday, 24 February, at 9:00 am. Note that some of the problems are based on material that will be presented during the last week of lectures.
1. Consider the problem

   minimise x^2
   subject to (x − 2)^2 ≤ 1.

Write out the Lagrangian L(x, λ). Solve the problem using the KKT conditions. Also write out the dual g(λ) in closed form. Repeat the exercise with the constraint (x − 2)^2 ≤ 8.
The following is not a compulsory part of the homework (so you don't need to provide plots etc. as part of your solution), but you might find it useful for illustrating the notions of Lagrangian and dual: use some mathematical software (such as Maple) to plot L(x, λ) as a function of two variables, so you get a three-dimensional picture. From this plot, find the functions x ↦ max_λ L(x, λ) and λ ↦ min_x L(x, λ). Notice the point where they coincide.
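If you would rather use Python than Maple, the two functions can also be evaluated on a grid, as in the sketch below (assuming numpy is available; this is purely a numerical sanity check, not a replacement for the KKT derivation):

```python
import numpy as np

# Grid-based check for problem 1: minimise x^2 subject to (x - 2)^2 <= 1.
# (An illustrative sketch only -- the exercise asks for the KKT derivation.)
xs = np.linspace(-1.0, 5.0, 2001)
lams = np.linspace(0.0, 10.0, 2001)

def L(x, lam):
    # Lagrangian of the problem, with multiplier lam >= 0
    return x**2 + lam * ((x - 2.0)**2 - 1.0)

# lambda -> min_x L(x, lambda): the dual function g(lambda)
g = np.array([L(xs, lam).min() for lam in lams])
dual_opt = g.max()

# x -> max_lambda L(x, lambda): +infinity outside the feasible set and x^2
# inside it, so minimising it over x gives the primal optimum
primal_opt = min(x**2 for x in xs if (x - 2.0)**2 <= 1.0)

print(dual_opt, primal_opt)   # the two values should (nearly) coincide
```

The point where the two curves coincide is exactly the common optimal value that the plot in the suggestion above makes visible.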
2. Consider a variant of the Perceptron algorithm where the new weight vector w_{t+1} is defined as the w that solves the minimisation problem

   minimise ||w − w_t||^2
   subject to y_t (w · x_t) ≥ 1.

Write w_{t+1} as a function of w_t, x_t and y_t (in other words, solve the optimisation problem). Assuming that the initial weight vector w_1 is zero, show that w_t is a linear combination of x_1, . . . , x_{t−1}. Write out the kernelised form of the algorithm.
Remark: The idea is to always update the weight vector so that the latest data point is classified with a positive margin, while staying close to the old weight vector, since it contains all the information gained from the previous trials. This particular algorithm is probably not very good (at least not with noisy data), but the general idea of deriving online algorithms from minimisation problems like this is very useful.
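If you want to sanity-check your closed-form answer, one option is to solve the same constrained minimisation numerically and compare, e.g. with scipy (assumed available; the random w_t, x_t, y_t below are just one illustrative instance):

```python
import numpy as np
from scipy.optimize import minimize

# Numerical solver for the update step in problem 2:
#   minimise ||w - w_t||^2  subject to  y_t * (w . x_t) >= 1.
# Illustrative sketch only -- the exercise asks for the closed form.
rng = np.random.default_rng(0)
w_t = rng.normal(size=3)   # arbitrary current weight vector
x_t = rng.normal(size=3)   # arbitrary data point
y_t = -1.0                 # arbitrary label

res = minimize(
    lambda w: np.sum((w - w_t) ** 2),
    x0=w_t,
    method="SLSQP",
    constraints=[{"type": "ineq", "fun": lambda w: y_t * w.dot(x_t) - 1.0}],
)
w_next = res.x
print(w_next, y_t * w_next.dot(x_t))   # margin should be >= 1
```

Comparing `w_next` against your derived formula on a few random instances is a quick way to catch sign or scaling mistakes.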
3. The 2-norm soft margin SVM is like the 1-norm soft margin SVM, except that the objective function is

   −µ + C Σ_{i=1}^{m} ξ_i^2

(the slack variables are squared). Show that now α is obtained by maximising

   W(α) = −(1/(4C)) Σ_{i=1}^{m} α_i^2 − ( Σ_{i,j=1}^{m} α_i α_j y_i y_j k(x_i, x_j) )^{1/2}.
4. The following is written assuming you use Matlab and LIBSVM, but as usual feel free to use whatever tools you prefer.
The file hw05data.txt, available from the course home page, contains a sample of 400 points, one per row. Each row contains the values x_1, x_2 and y for a data point (x, y), where x ∈ R^2 and y ∈ {−1, 1}. You can read the file into a 400×3 Matlab array with the command dlmread.
The data consists of two concentric rings, with some noise near their boundary (plot it to see for yourself). Split the data into two parts, half as a training set and half as the test set.
Download the LIBSVM package from http://www.csie.ntu.edu.tw/~cjlin/libsvm/. We will use it to learn a soft margin support vector machine with the Gaussian kernel (i.e., radial basis function) on the training set, and test it on the test set.
The objective of the exercise is to see how the parameters C and γ (the “width” of the kernel) affect learning. Try fixing C to a suitable value and varying γ. At extreme values you should see overfitting (small error on the training set, large error on the test set) and underfitting (large error on the training set), with the optimum somewhere in between. Repeat the same with γ fixed and C varying. You don’t have to find the optimal parameter combination; just try to produce some graphics that show this effect of varying the parameters. Include in your solution a brief description of what you did and what observations you made.
Before you start running the data through Matlab, it may be useful to familiarise yourself with the parameterisation etc. of LIBSVM using the svm-toy application that comes with the download and can also be used on the web site. Use the graphical user interface to create a data set that looks similar to hw05data.txt (but can be much smaller) and try different parameter combinations.
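The parameter study can also be sketched in Python. In the sketch below, scikit-learn's SVC (which wraps LIBSVM and uses the same C and γ parameters) stands in for the Matlab interface, and synthetic ring data stands in for hw05data.txt, which is not reproduced here; the particular C and γ values are illustrative guesses:

```python
import numpy as np
from sklearn.svm import SVC  # scikit-learn wraps LIBSVM; C and gamma match

# Synthetic stand-in for hw05data.txt: two concentric noisy rings.
rng = np.random.default_rng(1)

def rings(n):
    r = np.where(rng.random(n) < 0.5, 1.0, 2.0)   # inner or outer ring
    y = np.where(r < 1.5, -1, 1)                  # label by ring
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    r = r + rng.normal(scale=0.15, size=n)        # radial noise
    return np.c_[r * np.cos(theta), r * np.sin(theta)], y

X, y = rings(400)
Xtr, ytr, Xte, yte = X[:200], y[:200], X[200:], y[200:]

errors = {}
for gamma in [1e-3, 1.0, 1e3]:   # fixed C, varying kernel width
    clf = SVC(C=1.0, kernel="rbf", gamma=gamma).fit(Xtr, ytr)
    errors[gamma] = (1.0 - clf.score(Xtr, ytr), 1.0 - clf.score(Xte, yte))
    print(gamma, errors[gamma])  # (training error, test error)
```

In the LIBSVM command-line tools the corresponding options are -c for C and -g for γ (with -t 2 selecting the RBF kernel), so the same sweep can be scripted around svm-train and svm-predict.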