Information-Theoretic Modeling
Lecture 4: Noisy Channel Coding
Jyrki Kivinen
Department of Computer Science, University of Helsinki
Autumn 2012
1 Noisy Channels
   Reliable communication
   Error correcting codes
   Repetition codes
2 Channel Coding and Shannon’s 2nd Theorem
   Channel capacity
   Codes and rates
   Channel coding theorem
3 Hamming Codes
   Parity Check Codes
   Hamming (7,4)
Reliable communication

In practice, most media are not perfect; they are noisy channels:
   Modem line
   Satellite link
   Hard disk
Can we recover the original message (without errors) from a noisy code string?
Error correcting codes

source string → encoder → noisy channel → decoder → decoded string

We want to minimize two things:
1 Length of the code string.
2 Probability of error.
Repetition codes

A simple idea: just repeat the original string many times.
Get it? Get it? Get it? Get it? Get it? Get it? Get it? Get it? Get it?
Sent:     T R A N S M I S S I O N
Encoded:  TTTRRRAAANNNSSSMMMIIISSSSSSIIIOOONNN
Received: TTTHRRAAANNBSSSMMMIIISSSSWSPILOOONNG
Decoded:  T R A N S M I S S ? O N
Transmission rate reduced to 1:3.
If errors are independent and symmetric, the probability of error is reduced to 3(1−p)p^2 + p^3 ≈ 3p^2, where p is the error rate of the channel.
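As an illustration of how such a code behaves (a sketch with hypothetical helper names, not part of the lecture), one can encode, decode by majority vote, and evaluate the per-symbol error probability 3(1−p)p^2 + p^3:

```python
from collections import Counter

def encode_r3(s):
    """Repeat every symbol three times (the R3 repetition code)."""
    return "".join(c * 3 for c in s)

def decode_r3(s):
    """Majority vote over each block of three received symbols."""
    out = []
    for i in range(0, len(s), 3):
        symbol, count = Counter(s[i:i + 3]).most_common(1)[0]
        out.append(symbol if count >= 2 else "?")  # no majority: undecidable
    return "".join(out)

def error_prob_r3(p):
    """P(block decoded wrongly) = P(2 flips) + P(3 flips) = 3(1-p)p^2 + p^3."""
    return 3 * (1 - p) * p**2 + p**3

print(decode_r3("TTTHRRAAANNBSSS"))   # -> "TRANS": single flips are corrected
print(error_prob_r3(0.1))             # about 0.028, close to 3p^2 = 0.03
```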
Channel Capacity: basic intuition

We are going to define the channel capacity C purely in terms of the probabilistic properties of the channel.
We consider encoding messages of b bits into code words of b/R bits, for some rate 0 < R < 1.
Error is the event that the original message cannot be correctly decoded from the received code word.
We say a rate R is achievable using a channel if there is an encoding such that the probability of error goes to zero as b increases.
The Channel Coding Theorem, or Shannon’s Second Theorem, says rate R is achievable if R < C, and not achievable if R > C.
Channel Capacity

Binary symmetric channel (BSC), error rate p:
Pr[y = 1 | x = 0] = Pr[y = 0 | x = 1] = p,
where x is the transmitted and y the received bit.
We define the channel capacity as
C(p) = 1 − H(p) = 1 − ( p log2(1/p) + (1 − p) log2(1/(1 − p)) ).
For instance, C(0.1) ≈ 0.53. Ratio about 1:2.
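The capacity formula is easy to evaluate numerically; the sketch below (helper names are ours, not from the slides) reproduces C(0.1) ≈ 0.53:

```python
import math

def binary_entropy(p):
    """H(p) = p log2(1/p) + (1-p) log2(1/(1-p)), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

def bsc_capacity(p):
    """Capacity C(p) = 1 - H(p) of a binary symmetric channel."""
    return 1 - binary_entropy(p)

print(round(bsc_capacity(0.1), 3))  # -> 0.531, i.e. about 0.53
print(bsc_capacity(0.5))            # -> 0.0: a totally random channel carries nothing
```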
Channel Capacity

For channels other than the BSC, the channel capacity is more generally defined as
C = max_{pX} I(X;Y) = max_{pX} ( H(Y) − H(Y | X) ),
where X is the transmitted and Y the received symbol.
I is calculated with respect to pX,Y(x,y) = pX(x) pY|X(y | x); pY|X is defined by the channel characteristics.
Intuition:
   for a large capacity, we want Y to carry a lot of information
   however, knowing X should remove most of the uncertainty about Y
   we can get a favourable pX by choosing a suitable coding.
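For the BSC this maximization can be checked by brute force over input distributions pX(1) = q; the grid search below (an illustrative sketch, not from the slides) finds the maximum at the uniform q = 0.5:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information_bsc(q, p):
    """I(X;Y) = H(Y) - H(Y|X) for a BSC with flip probability p
    and input distribution pX(1) = q."""
    r = q * (1 - p) + (1 - q) * p   # pY(1)
    return h2(r) - h2(p)            # H(Y|X) = H(p) for every input symbol

p = 0.1
qs = [i / 1000 for i in range(1001)]
best_q = max(qs, key=lambda q: mutual_information_bsc(q, p))
print(best_q)                                       # -> 0.5: uniform input is optimal
print(round(mutual_information_bsc(best_q, p), 3))  # -> 0.531 = 1 - H(0.1)
```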
Channel Capacity

Example 1: BSC
Choosing uniform pX gives the maximum I(X;Y) = 1 − H(p) (exercise).

Example 2: Noisy typewriter
   The maximum is obtained for uniform pX (by symmetry).
   With uniform pX, pY is also uniform over the 26 symbols ⇒ H(Y) = log2 26.
   If X is known, there are two equally probable values of Y ⇒ H(Y | X) = log2 2 = 1.
   So I(X;Y) = log2 26 − 1 = log2 13 (capacity log2 13 ≈ 3.7 bits per transmission).
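The noisy typewriter numbers follow directly from the definition; this sketch (an illustration, not from the slides) assumes the usual model in which each input letter is received as itself or its successor (mod 26), each with probability 1/2:

```python
import math

ALPHABET = 26
p_x = [1 / ALPHABET] * ALPHABET          # uniform input distribution

def p_y_given_x(y, x):
    """Input x comes out as x or x+1 (mod 26), each with probability 1/2."""
    return 0.5 if y in (x, (x + 1) % ALPHABET) else 0.0

# Output distribution: each y is reachable from two inputs, so pY is uniform.
p_y = [sum(p_x[x] * p_y_given_x(y, x) for x in range(ALPHABET))
       for y in range(ALPHABET)]
H_Y = -sum(p * math.log2(p) for p in p_y if p > 0)          # log2(26)

# H(Y|X): given X, two equally likely outputs -> exactly 1 bit per input.
H_Y_given_X = -sum(
    p_x[x] * sum(p_y_given_x(y, x) * math.log2(p_y_given_x(y, x))
                 for y in range(ALPHABET) if p_y_given_x(y, x) > 0)
    for x in range(ALPHABET))

I = H_Y - H_Y_given_X
print(round(I, 3))   # -> 3.7, i.e. log2(13) bits per transmission
```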
Codes and rates

For simplicity, we consider the BSC unless we say otherwise.
Messages we want to send are blocks of b bits.
Thus, there are M = 2^b possible messages.
We encode a message into code words of n bits.
So generally we need n ≥ log2 M = b.
Notation:
   W ∈ {1, . . . , M}: (index of) a message
   X^n = f(W) ∈ {0,1}^n: code word for message W
   Y^n ∈ {0,1}^n: received code word (noisy version of X^n(W))
   Ŵ = g(Y^n) ∈ {1, . . . , M}: our guess about what the correct message was.
The rate of the code is R = (log2 M)/n.
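As a small worked instance of this notation (a hypothetical illustration): for the R3 repetition code on b-bit messages we have M = 2^b and n = 3b, so R = (log2 M)/n = 1/3:

```python
import math

def code_rate(M, n):
    """Rate R = (log2 M) / n of a code with M messages and n-bit code words."""
    return math.log2(M) / n

b = 4                   # message length in bits
M = 2 ** b              # number of possible messages
n = 3 * b               # R3 repetition: each bit is sent three times
print(code_rate(M, n))  # about 0.333..., the 1:3 rate of the repetition code
```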
Codes and rates

Let λw, for w ∈ {1, . . . , M}, denote the probability that message w was sent but not correctly received. We can write this as
λw = Σ_{y ∉ g⁻¹(w)} p(y | X = f(w)),
where g⁻¹(w) = { y | g(y) = w }.
Average error: λ̄ = (1/M) Σw λw
Maximum error: λmax = maxw λw
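For a tiny code, λw can be computed exactly by enumerating all received words. The sketch below (an illustration, not from the slides) does this for the one-bit repetition code over a BSC, with M = 2, f(0) = 000, f(1) = 111, and majority-vote decoding g:

```python
from itertools import product

p = 0.1                                   # BSC flip probability
f = {0: (0, 0, 0), 1: (1, 1, 1)}          # encoder
g = lambda y: int(sum(y) >= 2)            # majority-vote decoder

def p_y_given_x(y, x):
    """Product of per-bit BSC transition probabilities."""
    out = 1.0
    for yb, xb in zip(y, x):
        out *= p if yb != xb else 1 - p
    return out

# lambda_w = sum over all y NOT decoded to w of p(y | X = f(w))
lam = {w: sum(p_y_given_x(y, f[w])
              for y in product((0, 1), repeat=3) if g(y) != w)
       for w in (0, 1)}

avg = sum(lam.values()) / len(lam)        # average error
mx = max(lam.values())                    # maximum error
print(lam[0], mx)   # both about 3(1-p)p^2 + p^3 = 0.028
```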
Channel coding theorem

A rate R is achievable if there is a sequence of codes, for increasingly large code word lengths n, such that as n goes to infinity, the maximum error λmax goes to zero.
Channel Coding Theorem
If R < C, where C is the channel capacity, then rate R is achievable.
If R > C, then rate R is not achievable.
In other words, for any given ε > 0 and R < C, for large enough b we can encode messages of b bits into code words of n = b/R bits so that the probability of error is at most ε.
This is also known as Shannon’s Second Theorem (the first one being the Source Coding Theorem).
Channel coding theorem

Channel Coding Theorem: So what?
Assume you want to transmit data with probability of error 10^−15 over a BSC with p = 0.1. Using a repetition code, we would need to make the message 63 times as long as the source string. (Exercise: check the math. Hint: binomial distribution.)
Shannon’s result says twice as long is enough.
If you want probability of error 10^−100, Shannon’s result still says that twice as long is enough!
However, the messages you encode need to be sufficiently long.
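The 63-times figure can be checked with the binomial distribution, as the exercise hint suggests; this sketch (function name is ours) sums the probability that a majority of the n repeated bits are flipped:

```python
from math import comb

def repetition_error(n, p):
    """P(majority vote fails) for an n-fold repetition code (n odd)
    over a BSC with flip probability p: at least (n+1)/2 bits flipped."""
    k_min = (n + 1) // 2
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

p = 0.1
print(repetition_error(61, p) > 1e-15)   # True: 61 repetitions are not quite enough
print(repetition_error(63, p) < 1e-15)   # True: 63 repetitions suffice
```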
The proof of the Channel Coding Theorem (which we won't cover) is based on choosing M code words, each n bits long, completely at random.
To decode y, just pick w for which f(w) is closest to y.
If log2 M < nR, then the expected error rate, over the random choice of code books, is very small. (This is the tricky part.)
If random code books are good on average, then surely the best single code book is at least as good.
However, in practice we need specific codes that have high rates and are easy to compute. Finding such codes is difficult and out of the scope of this course.
We will next give a simple example to illustrate the basic idea.
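The random-coding idea can be simulated. The sketch below (an illustration only, not the proof) draws a random code book, sends a code word through a BSC, and decodes to the nearest code word in Hamming distance; at a rate well below capacity, errors are essentially never observed:

```python
import random

random.seed(0)
p, n, M = 0.1, 100, 16   # BSC(0.1), 100-bit words, M = 2^4 messages
                         # rate = log2(M)/n = 0.04, far below C ≈ 0.53

# random code book: M uniformly random n-bit code words
codebook = [[random.randint(0, 1) for _ in range(n)] for _ in range(M)]

def bsc(word):
    """Flip each bit independently with probability p."""
    return [b ^ (random.random() < p) for b in word]

def decode(y):
    """Nearest-neighbour decoding: index of the closest code word."""
    return min(range(M),
               key=lambda w: sum(a != b for a, b in zip(codebook[w], y)))

trials = 1000
errors = sum(decode(bsc(codebook[w])) != w
             for w in (random.randrange(M) for _ in range(trials)))
print(errors / trials)  # empirical error rate; tiny at this low rate
```

Note what makes this impractical: the code book has no structure, so encoding needs a table of M words and decoding compares against all of them, which is hopeless once M is large.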
Hamming Codes
Richard W. Hamming (11.2.1915–7.1.1998)
Parity Check Codes
One way to detect and correct errors is to add parity checks to the code words:
If we add a parity check bit at the end of each code word, we can detect one error (but not more) per code word.
By clever use of more than one parity bit, we can actually identify where the error occurred, and thus also correct errors.
Designing ways to add as few parity bits as possible to detect and correct errors is a really hard problem.
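A minimal sketch of the single-parity-bit case: one even-parity bit detects any odd number of flipped bits, but an even number of flips goes unnoticed, and the check never tells us where the error is.

```python
def add_parity(bits):
    """Append an even-parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]

def check_parity(word):
    """True if the word passes the parity check (no error detected)."""
    return sum(word) % 2 == 0

word = add_parity([1, 0, 1, 1])   # -> [1, 0, 1, 1, 1]
assert check_parity(word)         # clean word passes

word[2] ^= 1                      # flip one bit: detected
assert not check_parity(word)

word[0] ^= 1                      # flip a second bit: goes unnoticed!
assert check_parity(word)
```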
Hamming (7,4)
4 data bits (d1, d2, d3, d4), 3 parity bits (p1, p2, p3)
source string 1011, parity bits 010
error in data bit d2 (0 ↦ 1) is identified and corrected
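The slide's example can be reproduced in code. The sketch below assumes the classic Hamming (7,4) parity assignment p1 = d1⊕d2⊕d4, p2 = d1⊕d3⊕d4, p3 = d2⊕d3⊕d4 (the layout of the slide's figure is not shown here, but this assignment reproduces its example, data 1011 → parity 010) and a code word layout (d1, d2, d3, d4, p1, p2, p3):

```python
def encode(d):
    """Append the three Hamming (7,4) parity bits to 4 data bits."""
    d1, d2, d3, d4 = d
    return [d1, d2, d3, d4, d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4]

# Each nonzero syndrome (z1, z2, z3) points at exactly one position:
# a data bit fails the two or three checks it appears in, a parity
# bit fails only its own check.
SYNDROME = {(1, 1, 0): 0, (1, 0, 1): 1, (0, 1, 1): 2, (1, 1, 1): 3,
            (1, 0, 0): 4, (0, 1, 0): 5, (0, 0, 1): 6}

def decode(r):
    """Correct at most one flipped bit, then return the 4 data bits."""
    d1, d2, d3, d4, p1, p2, p3 = r
    z = (p1 ^ d1 ^ d2 ^ d4, p2 ^ d1 ^ d3 ^ d4, p3 ^ d2 ^ d3 ^ d4)
    r = list(r)
    if z in SYNDROME:            # (0, 0, 0) means "no error detected"
        r[SYNDROME[z]] ^= 1
    return r[:4]

word = encode([1, 0, 1, 1])
assert word[4:] == [0, 1, 0]            # parity bits 010, as on the slide

word[1] ^= 1                            # flip data bit d2 (0 -> 1)
assert decode(word) == [1, 0, 1, 1]     # identified and corrected
```

So 4 data bits cost only 3 parity bits, yet any single-bit error in the 7-bit word is corrected; compare this rate of 4/7 with the rate 1/3 of the 3-fold repetition code, which also corrects one error per 3 bits sent.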