582650 Information-Theoretic Modeling (Autumn 2012) Homework 2/Solutions

1. (a) Provide an alternative proof for Gibbs' Inequality using Jensen's Inequality.

Solution. Consider a random variable $Y$ which, for $x \in \mathcal{X}$, takes the value $q(x)/p(x)$ with probability $p(x)$. Then

\[
\sum_{x\in\mathcal{X}} p(x)\log\frac{q(x)}{p(x)} \;=\; E[\log Y] \;\overset{\text{Jensen}}{\leq}\; \log E[Y] \;=\; \log\!\left(\sum_{x\in\mathcal{X}} p(x)\,\frac{q(x)}{p(x)}\right) \;=\; \log\!\left(\sum_{x\in\mathcal{X}} q(x)\right) \;=\; \log 1 \;=\; 0,
\]

which is the required inequality. Gibbs' inequality trivially achieves equality at $q = p$; elsewhere the strict concavity of the logarithm implies strict inequality.
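As a quick sanity check (not part of the original solution), here is a short Python sketch that evaluates $\sum_x p(x)\log(q(x)/p(x))$ for a few randomly generated distributions $p$ and $q$; by Gibbs' inequality each printed value should be non-positive.

```python
import random
from math import log

# Numerical illustration of Gibbs' inequality: sum_x p(x) log(q(x)/p(x)) <= 0,
# with equality only when q = p. Distributions p and q are drawn at random here.
random.seed(0)
for _ in range(5):
    wp = [random.random() for _ in range(6)]
    wq = [random.random() for _ in range(6)]
    p = [w / sum(wp) for w in wp]
    q = [w / sum(wq) for w in wq]
    print(sum(pi * log(qi / pi) for pi, qi in zip(p, q)))  # always <= 0
```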

(b) Prove that for any $n$ positive numbers $x_1, \dots, x_n$, their geometric mean is at most their arithmetic mean:

\[
\left(\prod_{i=1}^{n} x_i\right)^{1/n} \;\leq\; \frac{1}{n}\sum_{i=1}^{n} x_i .
\]

Solution. Consider a random variable $Y$ which takes the value $x_i$ with probability $1/n$. We have

\[
\log\left(\prod_{i=1}^{n} x_i\right)^{1/n} \;=\; \frac{1}{n}\sum_{i=1}^{n}\log x_i \;=\; E[\log Y] \;\overset{\text{Jensen}}{\leq}\; \log E[Y] \;=\; \log\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right),
\]

and exponentiating both sides (the exponential function is increasing) gives the claim.
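A minimal numerical check of the AM-GM inequality, again only as an illustration:

```python
import random

# Quick check of the inequality for a few random positive numbers.
random.seed(1)
xs = [random.uniform(0.1, 10.0) for _ in range(8)]

geometric = 1.0
for x in xs:
    geometric *= x
geometric **= 1.0 / len(xs)

arithmetic = sum(xs) / len(xs)
print(geometric, arithmetic, geometric <= arithmetic)  # expect True
```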

2. Consider two binary-valued random variables $X$ and $Y$ with the following joint distribution:

            Y = 0    Y = 1
   X = 0     1/3      1/3
   X = 1      0       1/3

In other words, the case $X = 1, Y = 0$ never occurs, but the other combinations are all equally probable.

Calculate the values of

(a) $H(X)$ and $H(Y)$
(b) $H(X \mid Y)$ and $H(Y \mid X)$
(c) $H(X, Y)$
(d) $H(Y) - H(Y \mid X)$
(e) $I(X; Y)$.

Optionally, you may wish to draw a graphical representation to illustrate these quantities (as on page 30 of lecture notes for 12 September, or as drawn on the blackboard during the lecture).


Solution

(a) $H(X) = H(Y) = -\sum_{y\in\{0,1\}} p_Y(y)\log p_Y(y) = \log 3 - \tfrac{2}{3}\log 2$

(b) $H(X \mid Y) = H(Y \mid X) = -\sum_{x,y\in\{0,1\}} p_{X,Y}(x, y)\log p_{Y\mid X}(y \mid x) = \tfrac{2}{3}\log 2$

(c) $H(X, Y) = -\sum_{x,y\in\{0,1\}} p_{X,Y}(x, y)\log p_{X,Y}(x, y) = \log 3$

(d) $H(Y) - H(Y \mid X) = \log 3 - \tfrac{4}{3}\log 2$

(e) $I(X; Y) = H(Y) - H(Y \mid X) = \log 3 - \tfrac{4}{3}\log 2$.
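These values can also be verified numerically. The following small Python sketch (an illustration, not part of the original solution) uses the natural logarithm, so the outputs match the $\log 3$ and $\log 2$ expressions above.

```python
from math import log
from collections import Counter

# Joint distribution from the table: P(X=1, Y=0) = 0, all other cells 1/3.
joint = {(0, 0): 1/3, (0, 1): 1/3, (1, 0): 0.0, (1, 1): 1/3}

def H(dist):
    """Entropy -sum p log p of a distribution given as {outcome: probability}."""
    return -sum(p * log(p) for p in dist.values() if p > 0)

p_x, p_y = Counter(), Counter()
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

H_XY = H(joint)
print(H(p_x), H(p_y))                # H(X) = H(Y) = log 3 - (2/3) log 2 ≈ 0.6365
print(H_XY - H(p_y), H_XY - H(p_x))  # H(X|Y) = H(Y|X) = (2/3) log 2 ≈ 0.4621
print(H_XY)                          # H(X, Y) = log 3 ≈ 1.0986
print(H(p_x) + H(p_y) - H_XY)        # I(X; Y) = log 3 - (4/3) log 2 ≈ 0.1744
```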

3. Give a proof for the chain rule of mutual information:

\[
I(X; Y_1, \dots, Y_N) \;=\; \sum_{i=1}^{N} I(X; Y_i \mid Y_1, \dots, Y_{i-1}).
\]

Show also that in the special case where the random variables $Y_i$ are all independent, this simplifies to

\[
I(X; Y_1, \dots, Y_N) \;=\; \sum_{i=1}^{N} I(X; Y_i).
\]

Solution

\[
\begin{aligned}
I(X; Y_1, \dots, Y_N) &= H(Y_1, \dots, Y_N) - H(Y_1, \dots, Y_N \mid X) \\
&= H(Y_1, \dots, Y_N) + H(X) - H(Y_1, \dots, Y_N, X) \\
&= \sum_{i=1}^{N} H(Y_i \mid Y_1, \dots, Y_{i-1}) - \sum_{i=1}^{N} H(Y_i \mid Y_1, \dots, Y_{i-1}, X) \\
&= \sum_{i=1}^{N} I(X; Y_i \mid Y_1, \dots, Y_{i-1}).
\end{aligned}
\]

In case all $Y_i$ are independent, we can simply drop the conditioning variables $Y_1, \dots, Y_{i-1}$ from the right side of the conditioning bar on line three above, since on the left side of the bar there are only $Y_i$'s of larger index. For the second sum this requires that the $Y_i$ are not only mutually independent but also conditionally independent given $X$; this was a bit unclear in the problem statement.

So what happens if the $Y_i$ are mutually independent but dependent given $X$? Consider a domain of three Boolean variables $X, Y_1, Y_2$, where $Y_1$ and $Y_2$ take on the values true and false each with probability 0.5, and $X = (Y_1 \Leftrightarrow Y_2)$. Then $1 = I(X; Y_1, Y_2) \neq \sum_{i=1}^{2} I(X; Y_i) = 0$ (in bits).
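A short numerical check of this counterexample (a sketch, not part of the original solution): enumerate the joint distribution of $(Y_1, Y_2, X)$ and compute the mutual informations from entropies.

```python
from itertools import product
from math import log2

def H(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Joint distribution of (Y1, Y2, X) with X = (Y1 <=> Y2); each (y1, y2) pair has probability 1/4.
joint = {}
for y1, y2 in product((0, 1), repeat=2):
    joint[(y1, y2, int(y1 == y2))] = 0.25

p_x, p_y1, p_y12, p_xy1 = {}, {}, {}, {}
for (y1, y2, x), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p
    p_y1[y1] = p_y1.get(y1, 0) + p
    p_y12[(y1, y2)] = p_y12.get((y1, y2), 0) + p
    p_xy1[(x, y1)] = p_xy1.get((x, y1), 0) + p

# I(X; Y1, Y2) = H(X) + H(Y1, Y2) - H(X, Y1, Y2) and I(X; Y1) = H(X) + H(Y1) - H(X, Y1)
print(H(p_x) + H(p_y12) - H(joint))  # 1.0 bit
print(H(p_x) + H(p_y1) - H(p_xy1))   # 0.0 bits
```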

4. There are 12 balls that look exactly the same, one of which is either heavier or lighter than the others. Your task is to determine which one of the balls is different, and whether it is heavier or lighter. To do this, you are given a balance. You can put any number of balls on each side of the balance, and the balance will tell you whether the sides are of equal weight or if the left or the right side is heavier. What is the fewest number of weighings you can use to solve the problem? Please describe your strategy and prove that it always works. If you can, also prove that the problem cannot be solved using fewer weighings.

Solution. First observe that the minimum number of weighings must be at least $\lceil \log_3 24 \rceil = 3$, since there are 24 possible outputs (12 balls, each of which can be either light or heavy) and 3 possible outcomes for each weighing (balanced, tips left, or tips right).

In each step we should reduce the number of remaining possible outputs as much as possible.

Here is one strategy.


(1) Weigh four balls against four. If the scale tips, the special ball is among these eight (continue at (2)); otherwise it is among the remaining four (continue at (6)). In either case eight possible outputs remain: if the scale tips, its direction tells us, for each of the eight candidate balls, whether that ball would have to be heavy or light. This is the best we can achieve with a test of three possible outcomes (a small sanity check of this count appears after the list).

(2) Weigh two potentially heavy balls and one potentially light ball against one potentially heavy, one potentially light and one neutral ball we had discarded in (1). If the scale stays balanced, the exceptional ball is among the remaining three (continue at (3)). If it tips to the double-heavy side the solution must be either of these two or the potentially light ball on the other side (continue at (4)). If the scale tips to the side with only one potentially heavy ball, then the solution can be either this, or the light ball on the other side (continue at (5)).

(3) Weigh the two potentially light balls against each other; the scale will either rise on the side of the lighter ball or stay level, in which case the third remaining ball is the heavy one.

(4) Weigh the two potentially heavy balls against each other; the scale will either tip to the side of the heavier ball or stay level, in which case the third remaining ball is the light one.

(5) Weigh the potentially light ball against a neutral ball discarded in (1). If the scale tips, this is the solution; otherwise the second remaining ball is the heavy one.

(6) Weigh three of the remaining four balls against three neutral balls. If the scale stays balanced, the fourth ball is the solution, and with a third weighing against a neutral ball we can determine whether it is heavy or light. Otherwise we have three remaining balls, and the direction of the tip tells us whether one of them is heavy or one of them is light. Weigh two of these against each other to decide which of the three it is.
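The counting behind step (1) can be checked mechanically. The following Python sketch (not the full strategy) enumerates the 24 possibilities and confirms that the first weighing splits them into three groups of eight, which is necessary because the remaining two weighings can distinguish at most $3 \cdot 3 = 9$ outcomes.

```python
from collections import Counter

# First weighing: balls 1-4 on the left pan against balls 5-8 on the right pan.
outcomes = Counter()
for ball in range(1, 13):
    for heavy in (True, False):
        delta = 1 if heavy else -1
        left = delta if ball in range(1, 5) else 0    # weight offset on the left pan
        right = delta if ball in range(5, 9) else 0   # weight offset on the right pan
        if left > right:
            outcomes["tips left"] += 1
        elif left < right:
            outcomes["tips right"] += 1
        else:
            outcomes["balanced"] += 1

print(outcomes)  # each of the three outcomes covers exactly 8 of the 24 possibilities
```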

5. Consider the simple Bernoulli model that generates independent random bits with $\Pr[X_i = 1] = p$ for some fixed $0 \le p \le 1$.

For sequence length $n$ and some $\epsilon > 0$, the typical set $A_\epsilon^{n}$ is defined as the set of sequences $x_1, \dots, x_n$ such that

\[
2^{-n(H(X)+\epsilon)} \;\le\; p(x_1, \dots, x_n) \;\le\; 2^{-n(H(X)-\epsilon)} .
\]

What are the sequences in the typical set $A_{0.1}^{15}$ under the Bernoulli model when $p = 0.1$? How about $p = 0.3$, and $p = 0.5$? What can you say about the sizes of these sets?

Solution. Of course the above means $k \approx E[k] = np$, where $k$ denotes the number of ones in the sequence; that is, we observe roughly as many ones as we would expect. But $\epsilon$ is actually defined on the codelength: the equivalent of the above definition in terms of codelength is (take the negative logarithm)

\[
\left| -\frac{1}{n}\log_2 p(x_1, \dots, x_n) - H(X) \right| \;\le\; \epsilon,
\]

that is, the average codelength per item in the sequence differs from its expectation (the entropy) by at most $\epsilon$.

This solves to

\[
|k - np| \;\le\; \frac{\epsilon n}{\left|\log_2\frac{1-p}{p}\right|}
\]

for $p \neq 0.5$; for $p = 0.5$ this is a null condition: all sequences are typical for all $\epsilon > 0$.

The fixed numbers were $n = 15$ and $\epsilon = 0.1$. For $p = 0.1$ we get $|k - 1.5| \le 0.473$, so there are no typical sequences, as no integer satisfies this.

For $p = 0.3$ we get $|k - 4.5| \le 1.23$, that is $k \in \{4, 5\}$. The cardinality of the typical set then is $|A_{0.1}^{15}| = \binom{15}{4} + \binom{15}{5} = 4368$.

For $p = 0.5$ every sequence is typical, so the typical set has the maximal size $2^{15} = 32768$.
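These counts can be reproduced by enumerating all $2^{15}$ sequences directly from the definition of the typical set; the following Python sketch is one way to do it.

```python
from itertools import product
from math import log2

# Brute-force enumeration of the typical set straight from the definition,
# for n = 15 and epsilon = 0.1 (2^15 = 32768 sequences, so this is fast).
def typical_set_size(n: int, eps: float, p: float) -> int:
    h = -(p * log2(p) + (1 - p) * log2(1 - p))  # entropy of one bit, in bits
    count = 0
    for x in product((0, 1), repeat=n):
        k = sum(x)
        logp = k * log2(p) + (n - k) * log2(1 - p)  # log2 of the sequence probability
        if abs(-logp / n - h) <= eps:
            count += 1
    return count

for p in (0.1, 0.3, 0.5):
    print(p, typical_set_size(15, 0.1, p))  # expect 0, 4368, and 32768
```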


Bonus Problem We consider a binary symmetric channel with error rate $p$. In other words, we transmit single bits, and each bit gets flipped with probability $p$ independently of anything else. For error correction, we use a simple $k$-fold repetition code, where each bit is just repeated $k$ times. Then an error occurs for a given bit, i.e., the recipient is unable to recover its correct value, if at least $k/2$ of the sent bits were flipped.

(a) Give an exact expression in terms of $p$ and $k$ for the probability of error in transmitting a single bit. Can you come up with an approximate formula that's easier to understand and evaluate?

Solution. Define the binomially distributed variable $X$ as the number of bits that have flipped out of the $k$ repetitions. An error will occur whenever at least half of the bits have flipped:

\[
P(\text{error}) \;=\; P\!\left(X \ge \tfrac{k}{2}\right) \;=\; \sum_{i=\lceil k/2 \rceil}^{k} P(X = i) \;=\; \sum_{i=\lceil k/2 \rceil}^{k} \binom{k}{i} p^{i} (1-p)^{k-i}.
\]

We may safely assume $p < 0.5$. For $p = 0.5$ the corrupted string looks random, with no chance of recovery. For $p > 0.5$ we should switch from a majority-vote decision to a minority vote.

The above can be bounded by the Chernoff bound from exercise series 1 to obtain (with expectation $\mu = kp$)

\[
P\!\left(X \ge \tfrac{k}{2}\right) \;\le\; P\!\left(|X - \mu| \ge \tfrac{k}{2} - \mu\right) \;=\; P\!\left(|X - \mu| \ge \left(\tfrac{1}{2p} - 1\right)\mu\right) \;\overset{\text{Chernoff}}{\le}\; 2\exp\!\left(-\,\frac{kp\left(\tfrac{1}{2p} - 1\right)^{2}}{3}\right).
\]
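For concreteness, a small Python sketch (an illustration, not part of the original solution) that evaluates the exact expression and the bound side by side for a few values of $k$:

```python
from math import ceil, comb, exp

def exact_error(k: int, p: float) -> float:
    """Exact P(X >= k/2) for X ~ Bin(k, p): at least half of the k repetitions flipped."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i) for i in range(ceil(k / 2), k + 1))

def chernoff_bound(k: int, p: float) -> float:
    """The bound 2 exp(-k p (1/(2p) - 1)^2 / 3) derived above (p < 0.5 assumed)."""
    return 2 * exp(-k * p * (1 / (2 * p) - 1) ** 2 / 3)

for k in (3, 5, 9, 15):
    print(k, exact_error(k, 0.1), chernoff_bound(k, 0.1))
```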

(b) Supposing p= 0.1 and we want the error probability to be at most 0.00001 %, what value of k would be sufficient? A reasonably tight estimate is a sufficient answer.

Solution. Plug in $p = 0.1$ to get

\[
2\exp\!\left(-\,\frac{kp\left(\tfrac{1}{2p} - 1\right)^{2}}{3}\right) \;=\; 2\exp\!\left(-\,\frac{k \cdot 0.1 \left(\tfrac{1}{2 \cdot 0.1} - 1\right)^{2}}{3}\right) \;=\; 2\exp\!\left(-\frac{8k}{15}\right)
\]

and solve the required inequality for $k$:

\[
2\exp\!\left(-\frac{8k}{15}\right) \;\le\; 10^{-7} \;\Longleftrightarrow\; k \;\ge\; \frac{15}{8}\ln(2 \cdot 10^{7}) \;\approx\; 31.5 .
\]

Therefore, $k = 32$ does the job.

(c) Assume now that we actually want to transmit a byte (8 bits) so that all the bits are correct. What value of $k$ would be sufficient, with other parameters as in the previous part?

Solution. The probability of at least one error in 8 bits is at most 8 times the probability of an error in one bit. Therefore we need to shrink the tolerance by a factor of 8 to get

\[
k \;\ge\; \frac{15}{8}\ln(16 \cdot 10^{7}) \;\approx\; 35.4
\]

and see that $k = 36$ is sufficient.
