582650 Information-Theoretic Modeling (Autumn 2012) Course examination, Thursday 18 October, 9:00–12:00 CK112 Examiner: Jyrki Kivinen
Answer all the problems. The maximum score for the exam is 36 points. You may answer in Finnish, Swedish or English.
1. [12 points]
(a) Let A ∈ {1,2,3,4} and B ∈ {1,2} be two random variables with the following joint distribution:
A= 1 A= 2 A= 3 A= 4 B = 1 1/16 1/8 1/4 1/16 B = 2 1/4 1/8 1/16 1/16 .
Calculate the entropiesH(A)and H(B), the joint entropyH(A, B)and the conditional entropyH(B |A).
(b) AssumeX andY are two random variables about which we only know that H(X) = 3 and H(Y) = 2. Under this assumption, what are the largest and smallest possible values for the mutual information I(X;Y)? Justify your claims, and show by example that your upper and lower bounds can actually be achieved.
(c) Define the Kullback-Leibler divergence (relative entropy) between two dis- cete distributions. Show that it is always non-negative.
In parts (a) and (b) you may use any properties of entropy etc. that you know from the course, but please state explicitly which properties you are using. In the proof for part (c), you may use only generally known mathematical results, including Jensen’s inequality.
Continues on the reverse side!
2. [12 points]
(a) What is meant by a decodable code, and a prefix code? What does the Kraft(-McMillan) Theorem tell us about such codes? Give both a precise mathematical formulations and a very brief intuitive explanation of how these concepts and results can be useful.
(b) Explain the basic idea of the Huffman coding algorithm. What are its input and output, and what guarantees do we have for the quality of the output?
What result does the algorithm give for a five-symbol alphabet{a,b,c,d,e} when P(a) = 0.05, P(b) = 0.35, P(c) = 0.15, P(d) = 0.25 and P(e) = 0.2.
You don’t need to give a step-by-step simulation, just the end result and perhaps a brief summary of some key points to help understand it.
3. [12 points]Two-part coding and model class selection. Explain how the minimum description length (MDL) principle with a two-part code can be applied to the model class selection problem. Your answer should include the following:
• what is the model class selection problem, including an example
• mathematical definition of the two-part code in the discretely parameterized case, and what is the standard way of dealing with a continuous parameter space
• what does the MDL principle mean in this context, and why using it might make sense
• in practice, what modelling and computational steps would be included in the process.
2