582669 Supervised Machine Learning (Spring 2012) Homework 1 (27 January)

This homework is due on Friday, 27 January at 9:00 am. Either drop your solutions on paper in the box in the door of office B229a, or send them by e-mail to Joonas Paalasmaa (joonas.paalasmaa@helsinki.fi).

1. Consider soft binary classification with some appropriate loss function L. Suppose that for some reason we believe that the next label y is 1 with probability q, and 0 with probability 1 − q. Then a natural prediction would be the p which minimises the expected loss (1 − q)L(0, p) + qL(1, p). Such a value p is called the Bayes optimal prediction. Determine the Bayes optimal prediction, as a function of q, for absolute loss $L_{\mathrm{abs}}$, logarithmic loss $L_{\mathrm{log}}$ and square loss $L_{\mathrm{sq}}$.
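
If you want a numerical sanity check of the closed forms you derive, one option is to minimise the expected loss over a fine grid of candidate predictions, for instance in Python. The loss definitions below are the usual ones and are stated here as assumptions; check them against the definitions used in the lecture notes.

import numpy as np

# Assumed loss definitions; verify against the course's definitions.
def L_abs(y, p): return np.abs(y - p)
def L_sq(y, p):  return (y - p) ** 2
def L_log(y, p): return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def bayes_optimal(loss, q, grid=np.linspace(1e-6, 1 - 1e-6, 200001)):
    """Numerically minimise (1 - q) * loss(0, p) + q * loss(1, p) over a grid of p."""
    expected = (1 - q) * loss(0, grid) + q * loss(1, grid)
    return grid[np.argmin(expected)]

# Example: approximate Bayes optimal predictions for a few values of q.
for q in (0.2, 0.5, 0.8):
    print(q, [round(bayes_optimal(L, q), 3) for L in (L_abs, L_log, L_sq)])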

2. Consider the Weighted Majority algorithm with some β > 0, and let η and c be as in Theorem 1.5. In this problem you are asked to generalise the proof of Theorem 1.5.

(a) Show that if for some M ≥ 0 there are k different hypotheses $h_i$ that all satisfy $L^{\leq T}_{0,1}(S, h_i) \leq M$, then
\[ L^{\leq T}_{0,1}(S, \mathrm{WM}) \;\leq\; c\eta M + c \ln\frac{n}{k}. \]

(b) Change the initialisation of the algorithm so that $w_{1,i} = p_i$ for all i, where $p_i > 0$ for all i and $\sum_{i=1}^{n} p_i = 1$, but otherwise p is arbitrary. Show that for the modified algorithm WM′ we have
\[ L^{\leq T}_{0,1}(S, \mathrm{WM}') \;\leq\; \min_{1 \leq i \leq n} \left( c\eta\, L^{\leq T}_{0,1}(S, h_i) + c \ln\frac{1}{p_i} \right). \]

3. How would you generalise (a) the Halving algorithm and (b) the Weighted Majority algorithm to multiclass classification? That is, now $Y = \{\ell_1, \ldots, \ell_k\}$ for k classes $\ell_i$, and we still use the 0-1 loss. Prove for your generalised algorithms the analogues of the mistake bounds of Theorem 1.2 and Theorem 1.5.

4. This is a small programming exercise for familiarising you with doing simple online learning in your programming environment of choice. Popular choices include Matlab, Octave, R, and Python. You may use something else if you prefer, but if you plan to use some unusual language, please check first with Joonas Paalasmaa that he understands it enough to grade your work.

Consider the following setting.

• There are T inputs $x_1, \ldots, x_T$, where each $x_t = (x_{t,1}, \ldots, x_{t,n}) \in \{-1, 1\}^n$ is an n-dimensional vector. Each component $x_{t,i}$ is 1 with probability 1/2 and −1 with probability 1/2.

• The concept class is $H = \{h_1, \ldots, h_n\}$, where $h_i$ picks the ith component of the input: $h_i((x_1, \ldots, x_n)) = x_i$.

• The labels $y_t \in \{-1, 1\}$ are determined by
\[ y_t = \begin{cases} x_{t,1} & \text{with probability } 1 - \nu \\ -x_{t,1} & \text{with probability } \nu \end{cases} \]
for some ν > 0. In other words, $h_1$ is the target concept but there is classification noise with rate ν.
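
To make the setting concrete, here is a minimal sketch of how one might generate this data in Python with NumPy (Python being one of the suggested environments); the function name generate_data and the use of NumPy are illustrative choices, not part of the assignment.

import numpy as np

def generate_data(T, n, nu, seed=0):
    """Draw T inputs uniformly from {-1,1}^n and noisy labels y_t = +/- x_{t,1}."""
    rng = np.random.default_rng(seed)
    X = rng.choice([-1, 1], size=(T, n))      # each component is +1 or -1 with probability 1/2
    flip = rng.random(T) < nu                 # the label is flipped with probability nu
    y = np.where(flip, -X[:, 0], X[:, 0])     # h_1 (the first component) is the target concept
    return X, y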

Implement the Weighted Majority algorithm in this setting. Try it with a couple of values for n (say n = 100 and n = 1000) and ν (say ν = 0.2 and ν = 0.4) and see how the choice of learning rate β affects the behaviour of the algorithm. In particular, making the following kinds of plots may be illuminating:

• Plot the normalized weights $v_{t,i} = w_{t,i} / \sum_j w_{t,j}$ as a function of t; compare the “relevant” weight $v_{t,1}$ to the “irrelevant” ones $v_{t,j}$, $j \neq 1$, or plot all the weights into the same figure.

• Plot the cumulative loss $\sum_{j=1}^{t} L_{0-1}(y_j, \hat{y}_j)$ as a function of t.

• Plot the cumulative mistake count with respect to the “unnoisy” labels, $\sum_{j=1}^{t} L_{0-1}(x_{j,1}, \hat{y}_j)$, as a function of t.

In general, you should see that with β close to 0 the algorithm behaves erratically, while with β close to 1 it behaves more smoothly but also takes more time. In most cases T = 500 or fewer iterations should be sufficient to see what is going on.
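
For orientation, the following is a minimal sketch of one possible implementation of the experiment in Python/NumPy: it predicts by a weighted vote and multiplies the weight of every mistaken hypothesis by β after each round. Adapt the prediction and update rules to the exact formulation of Weighted Majority used in the lecture notes, and treat the names (weighted_majority, and generate_data from the earlier sketch) as illustrative.

import numpy as np

def weighted_majority(X, y, beta):
    """Run Weighted Majority over the rounds (X, y); return normalised weights and cumulative loss."""
    T, n = X.shape
    w = np.ones(n)                                 # one weight per hypothesis h_i
    norm_weights, losses = [], []
    for t in range(T):
        preds = X[t]                               # h_i predicts the i-th component x_{t,i}
        y_hat = 1 if w @ preds >= 0 else -1        # weighted majority vote (ties broken as +1)
        losses.append(int(y_hat != y[t]))
        w = np.where(preds != y[t], beta * w, w)   # multiply weights of mistaken hypotheses by beta
        norm_weights.append(w / w.sum())           # the normalised weights v_{t,i} to be plotted
    return np.array(norm_weights), np.cumsum(losses)

# Example usage with the data-generation sketch above (illustrative parameters):
# X, y = generate_data(T=500, n=100, nu=0.2)
# v, cum_loss = weighted_majority(X, y, beta=0.5)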

Your answer to this problem should consist of a brief explanation of the observations you made, supported by some plots, and your program code (which doesn’t need to be polished, but some comments in the code might be useful if you do something non-obvious). Don’t include all the plots you make, just half a dozen or so representative ones. You are not expected to make a systematic study of this setup, which after all is a bit artificial. The goal is to make the Weighted Majority algorithm a bit more concrete and prepare for later exercises with other online algorithms.
