• Ei tuloksia

Continuesonthereverseside! Courseexamination,Thursday18October,9:00–12:00CK112Examiner:JyrkiKivinen 582650Information-TheoreticModeling(Autumn2012)

N/A
N/A
Info
Lataa
Protected

Academic year: 2022

Jaa "Continuesonthereverseside! Courseexamination,Thursday18October,9:00–12:00CK112Examiner:JyrkiKivinen 582650Information-TheoreticModeling(Autumn2012)"

Copied!
2
0
0

Kokoteksti

(1)

582650 Information-Theoretic Modeling (Autumn 2012) Course examination, Thursday 18 October, 9:00–12:00 CK112 Examiner: Jyrki Kivinen

Answer all the problems. The maximum score for the exam is 36 points. You may answer in Finnish, Swedish or English.

1. [12 points]

(a) Let A ∈ {1,2,3,4} and B ∈ {1,2} be two random variables with the following joint distribution:

A= 1 A= 2 A= 3 A= 4 B = 1 1/16 1/8 1/4 1/16 B = 2 1/4 1/8 1/16 1/16 .

Calculate the entropiesH(A)and H(B), the joint entropyH(A, B)and the conditional entropyH(B |A).

(b) AssumeX andY are two random variables about which we only know that H(X) = 3 and H(Y) = 2. Under this assumption, what are the largest and smallest possible values for the mutual information I(X;Y)? Justify your claims, and show by example that your upper and lower bounds can actually be achieved.

(c) Define the Kullback-Leibler divergence (relative entropy) between two dis- cete distributions. Show that it is always non-negative.

In parts (a) and (b) you may use any properties of entropy etc. that you know from the course, but please state explicitly which properties you are using. In the proof for part (c), you may use only generally known mathematical results, including Jensen’s inequality.

Continues on the reverse side!

(2)

2. [12 points]

(a) What is meant by a decodable code, and a prefix code? What does the Kraft(-McMillan) Theorem tell us about such codes? Give both a precise mathematical formulations and a very brief intuitive explanation of how these concepts and results can be useful.

(b) Explain the basic idea of the Huffman coding algorithm. What are its input and output, and what guarantees do we have for the quality of the output?

What result does the algorithm give for a five-symbol alphabet{a,b,c,d,e} when P(a) = 0.05, P(b) = 0.35, P(c) = 0.15, P(d) = 0.25 and P(e) = 0.2.

You don’t need to give a step-by-step simulation, just the end result and perhaps a brief summary of some key points to help understand it.

3. [12 points]Two-part coding and model class selection. Explain how the minimum description length (MDL) principle with a two-part code can be applied to the model class selection problem. Your answer should include the following:

• what is the model class selection problem, including an example

• mathematical definition of the two-part code in the discretely parameterized case, and what is the standard way of dealing with a continuous parameter space

• what does the MDL principle mean in this context, and why using it might make sense

• in practice, what modelling and computational steps would be included in the process.

2

Viittaukset

LIITTYVÄT TIEDOSTOT

(a) Apply basic agglomerative hierarchical clustering to this data using the single link (min) notion of dissimilarity between clusters.. Visualize the result as

Your explanation should include, when appropriate, both a precise definition and a brief description of how the concept is useful in machine learning.. Your answer to each

Outline Administrative issues Overview of Contents Compression..

The exponential function is locally injective in C.. The

As a model class selection criterion we use the length of the code word that describes the number of clusters, the assignments of points to clusters, the types of the clusters and

information theory, statistical learning theory, Bayesianism, minimum description length principle, Bayesian networks, regression, positioning, stemmatology,

Thereafter, four operational satellites -- the basic minimum for satellite navigation in principle the basic minimum for satellite navigation in principle -- will be will be

In this paper, we in- vestigate the enumerative two-part crude MDL code for the Bernoulli and multinomial models, exhibit a strong connection with the NML approach, with surprising