
Information-Theoretic Modeling

Lecture 5: Source Coding: Algorithms

Jyrki Kivinen

Department of Computer Science, University of Helsinki

Autumn 2012


Outline

1 Codes
   Decodable Codes
   Prefix Codes
   Kraft-McMillan Theorem

2 Optimal Codes
   Entropy Lower Bound
   Shannon-Fano Coding
   Huffman


Extension Code

A (binary) symbol code C : X → {0,1}* is a mapping from the alphabet X to the set {0,1}* of finite binary sequences.

The extension of code C is the mapping C* : X* → {0,1}* obtained by concatenating the codewords C(x_i) for each input symbol x_i:

C*(x_1 x_2 … x_n) = C(x_1) C(x_2) … C(x_n).

[Figure: the extension C* encodes the string INPUT_STRING… one symbol at a time; with C(I) = 1001, C(N) = 0001, C(P) = 111001, C(U) = 10101, C(T) = 1111, C(_) = 01, the output begins 1001 0001 111001 10101 1111 01 …]
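To make the definition concrete, here is a minimal Python sketch of a symbol code and its extension (not part of the original slides; the codeword table is the one suggested by the figure, and the function name `extend` is my own):

```python
# A symbol code as a dictionary, and its extension C*: concatenate the codewords.
# Codeword table as suggested by the figure above (an assumption, for illustration).
code = {"I": "1001", "N": "0001", "P": "111001",
        "U": "10101", "T": "1111", "_": "01"}

def extend(code, symbols):
    """Encode a sequence of symbols with the extension C* of the symbol code."""
    return "".join(code[s] for s in symbols)

print(extend(code, "INPUT_"))  # -> 1001000111100110101111101
```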

(Other types of code)

For reference, some examples of codes that are not symbol codes:

Run-length encoding (RLE): for example, encode aaaaccabbb as (a,4),(c,2),(a,1),(b,3) (see the sketch below).

Adaptive codes, where the code of a symbol may change based on what symbols have appeared previously.

Note that coding blocks of b bits for some constant b is still a symbol code, with alphabet size |X| = 2^b.
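For concreteness, a minimal run-length encoder matching the example above (a sketch of the generic idea, not code from the course):

```python
from itertools import groupby

def rle(s):
    """Run-length encode a string as (symbol, run length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(s)]

print(rle("aaaaccabbb"))  # [('a', 4), ('c', 2), ('a', 1), ('b', 3)]
```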

Decodable Codes

Decodable Code: code C is (uniquely) decodable iff its extension C* is a one-to-one mapping, i.e., iff

(x_1, …, x_n) ≠ (y_1, …, y_n)  ⇒  C*(x_1, …, x_n) ≠ C*(y_1, …, y_n).

Examples:

A code with codewords {0, 1, 10, 11} is not uniquely decodable: what does 10 mean?

A code with codewords {00, 01, 10, 11} is uniquely decodable: each pair of bits can be decoded individually.

A code with codewords {0, 01, 011, 0111} is also uniquely decodable: what does 0011 mean? (A brute-force check of short inputs is sketched below.)
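The examples above can be probed with a small brute-force sanity check (my own sketch, not a method from the slides): it encodes all sequences of up to max_symbols codewords and looks for two different sequences with the same encoding. Finding a collision refutes unique decodability; finding none only says the code looks decodable up to the tested length.

```python
from itertools import product

def find_collision(codewords, max_symbols):
    """Return two different codeword sequences with the same concatenation, or
    None if no collision is found among sequences of up to max_symbols codewords."""
    seen = {}
    for n in range(1, max_symbols + 1):
        for seq in product(codewords, repeat=n):
            encoding = "".join(seq)
            if encoding in seen and seen[encoding] != seq:
                return seen[encoding], seq
            seen[encoding] = seq
    return None

print(find_collision(["0", "1", "10", "11"], 3))      # e.g. (('10',), ('1', '0'))
print(find_collision(["0", "01", "011", "0111"], 4))  # None at these lengths
```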

Prefix Codes

An important subset of decodable codes is the set of prefix(-free) codes.

Prefix Code: a code C : X → {0,1}* is called a prefix code iff no codeword is a prefix of another.

It is easily seen that all prefix codes are uniquely decodable: each symbol can be decoded as soon as its codeword has been read. Therefore, prefix codes are also called instantaneous codes.

Examples:

A code with codewords {0, 01, 011, 0111} is uniquely decodable but not prefix-free: e.g., 0 is a prefix of 01.

A code with codewords {0, 10, 110, 111} is prefix-free. (An instantaneous decoder for it is sketched below.)
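Instantaneous decoding is easy to sketch in Python for the prefix code {0, 10, 110, 111}; the symbol names a, b, c, d are labels chosen here for illustration:

```python
def decode_prefix(code, bits):
    """Decode a bit string with a prefix code, emitting each symbol as soon as
    its complete codeword has been read (instantaneous decoding)."""
    inverse = {codeword: symbol for symbol, codeword in code.items()}
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:      # prefix property: at most one codeword can match
            decoded.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("input ended in the middle of a codeword")
    return decoded

code = {"a": "0", "b": "10", "c": "110", "d": "111"}
print(decode_prefix(code, "010110111"))  # ['a', 'b', 'c', 'd']
```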

Kraft Inequality

The codeword lengths of a prefix code satisfy the following important property.

Kraft Inequality: the codeword lengths ℓ_1, …, ℓ_m of any (binary) prefix code satisfy

∑_{i=1}^{m} 2^{-ℓ_i} ≤ 1.

Conversely, given a set of codeword lengths that satisfy this inequality, there is a prefix code with these codeword lengths.
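The inequality is easy to check numerically; a sketch (the first two length sets correspond to codes from these slides, {0, 10, 110, 111} and {0, 1, 10, 11}, and the third is a generic fixed-length code):

```python
def kraft_sum(lengths):
    """Left-hand side of the Kraft inequality for binary codeword lengths."""
    return sum(2.0 ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0  -> lengths of {0, 10, 110, 111}: achievable
print(kraft_sum([1, 1, 2, 2]))  # 1.5  -> lengths of {0, 1, 10, 11}: no prefix code exists
print(kraft_sum([2, 2, 2, 2]))  # 1.0  -> a fixed-length code uses exactly the whole budget
```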

[Figures: the Kraft "codeword budget" illustrated on the complete binary tree of all strings of length at most 4. The prefix code {0, 10, 110, 111} uses exactly the whole budget; a code whose lengths violate the Kraft inequality cannot be decodable; a fixed-length code is decodable and prefix-free; further example codes satisfy the inequality yet fail to be prefix-free, or fail to be even decodable.]

Kraft Inequality: proof

Proof. Assume first that we have a prefix code with codeword lengths ℓ_1, …, ℓ_m. Let L = max_i ℓ_i be the length of the longest codeword.

Create a binary tree with the codewords as leaves.

Add nodes to make a complete tree of depth L.

Assign each new leaf to the codeword above it. Notice that because of the prefix property, each leaf gets assigned at most once.

[Figure: the construction for the code C(a) = 0, C(b) = 100, C(c) = 11, C(d) = 101 with lengths ℓ_1 = 1, ℓ_2 = 3, ℓ_3 = 2, ℓ_4 = 3: first the code tree with the codewords as leaves, then the tree completed to depth L = 3, then each new leaf assigned to the codeword above it.]

A codeword of length ℓ gets 2^{L-ℓ} leaves assigned to it.

There are 2^L leaves in total.

Since the number of assigned leaves is at most the total number of leaves,

∑_{i=1}^{m} 2^{L-ℓ_i} ≤ 2^L,

that is, ∑_{i=1}^{m} 2^L · 2^{-ℓ_i} ≤ 2^L, and dividing by 2^L gives

∑_{i=1}^{m} 2^{-ℓ_i} ≤ 1.

This concludes the first half of the proof. (A numeric check of the counting step is sketched below.)
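A numeric check of the counting step for the code used in the figure (C(a) = 0, C(b) = 100, C(c) = 11, C(d) = 101); a sketch for illustration only:

```python
lengths = [1, 3, 2, 3]      # codeword lengths of {0, 100, 11, 101}
L = max(lengths)            # depth of the completed tree

assigned = sum(2 ** (L - l) for l in lengths)   # leaves claimed by the codewords
total = 2 ** L                                  # number of leaves at depth L

print(assigned, total)      # 8 8 -> assigned <= total, i.e. the sum of 2**-l is <= 1
```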

Assume now that we are given a set of integers ℓ_1, …, ℓ_m such that

∑_{i=1}^{m} 2^{-ℓ_i} ≤ 1.

Again let L = max_i ℓ_i. We construct a prefix code with codeword lengths ℓ_i.

We can assume that the lengths are sorted so that ℓ_1 ≤ ℓ_2 ≤ … ≤ ℓ_m, and that ℓ_1 > 0.

Now create a complete binary tree of depth L and associate codewords to nodes of the tree as previously. Repeat the following for i = 1, …, m (see the sketch below):

1 Choose the leftmost remaining node at depth ℓ_i (the depth of the root is 0) and assign the corresponding string as codeword i.

2 Remove the chosen node and its descendants from the tree.
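The construction can be written compactly without an explicit tree: keeping an integer counter for the leftmost free node at the current depth is equivalent to pruning the chosen node and its descendants. A sketch under the stated assumptions (lengths sorted, Kraft inequality satisfied):

```python
def prefix_code_from_lengths(lengths):
    """Given codeword lengths in non-decreasing order that satisfy the Kraft
    inequality, return binary codewords forming a prefix code with those lengths."""
    assert all(a <= b for a, b in zip(lengths, lengths[1:])), "sort the lengths first"
    assert sum(2.0 ** -l for l in lengths) <= 1.0, "Kraft inequality violated"
    codewords, node, depth = [], 0, 0   # 'node' = leftmost free node at 'depth'
    for l in lengths:
        node <<= (l - depth)            # descend to depth l, staying as far left as possible
        codewords.append(format(node, "0{}b".format(l)))
        node += 1                       # skip this node, and hence its whole subtree
        depth = l
    return codewords

print(prefix_code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```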

Kraft Inequality: example

[Figures: the construction applied to the sorted lengths ℓ_1 = 1, ℓ_2 = 2, ℓ_3 = 3, ℓ_4 = 3. Step by step the leftmost free node at each depth is chosen, yielding C(a) = 0, C(c) = 10, C(b) = 110, C(d) = 111; after each assignment the chosen node and its descendants are removed from the tree.]

The order of using up the tree is important.

[Figure: processing the same lengths in an unsorted order, e.g., assigning C(b) = 000, C(c) = 01 and C(d) = 100 first, leaves no codeword of length 1 available for a.]

Notice that since ℓ_i ≤ ℓ_{i+1}, we keep moving deeper into the tree.

Therefore the descendants of the node chosen for ℓ_i cannot include any descendants of the nodes chosen for ℓ_1, …, ℓ_{i-1}.

Therefore, when ℓ_i is assigned, exactly 2^{L-ℓ_i} leaves are removed from the tree.

By the same leaf-counting argument as before, the tree does not run out of leaves before we have assigned all the codewords. This concludes the second direction of the proof.

Kraft Inequality

Question: what if the inequality is satisfied strictly, i.e., the sum is less than one:

∑_{i=1}^{m} 2^{-ℓ_i} < 1 ?

Then it is possible to make some codewords shorter and still have a decodable (prefix) code.

[Figures: if not all of the budget is used, some codewords can be made shorter; a code that uses the whole budget (Kraft sum equal to 1) is called "Kraft tight", or complete.]

Kraft–McMillan Theorem

The Kraft inequality restricts the codeword lengths of prefix codes. Could we do much better if we required only decodability? In fact, it can be shown that we lose nothing at all!

Kraft-McMillan Theorem: the codeword lengths ℓ_1, …, ℓ_m of any uniquely decodable (binary) code satisfy

∑_{i=1}^{m} 2^{-ℓ_i} ≤ 1.

Conversely, given a set of codeword lengths that satisfy this inequality, there is a uniquely decodable (prefix) code with these codeword lengths.

[Figure: prefix codes form a subset of uniquely decodable codes, which form a subset of all codes; the Kraft inequality characterizes the codeword lengths achievable by both prefix codes and decodable codes.]

2 Optimal Codes: Entropy Lower Bound, Shannon-Fano Coding, Huffman

Codelengths and Probabilities

Let ℓ_1, …, ℓ_m be the codeword lengths of a uniquely decodable code C : X → {0,1}*. By the Kraft-McMillan theorem we have

c = ∑_{i=1}^{m} 2^{-ℓ_i} ≤ 1.

Define a probability mass function p : X → [0,1] as follows:

p_i = 2^{-ℓ_i} / c   ⇔   ℓ_i = log_2 (1 / (c p_i)),

where c is given above. The function p is indeed a pmf (see the sketch below):

1 Non-negative: p(x) ≥ 0 for all x ∈ X.

2 Sums to one: ∑_{x∈X} p(x) = ∑_{i=1}^{m} (1/c) 2^{-ℓ_i} = c/c = 1.
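A small sketch of the correspondence between codeword lengths and a pmf (the length set is the one from the tree example; for a code that is not Kraft tight, c < 1 and the normalization matters):

```python
import math

def pmf_from_lengths(lengths):
    """The pmf p_i = 2**(-l_i) / c induced by codeword lengths, and the constant c."""
    c = sum(2.0 ** -l for l in lengths)
    return [2.0 ** -l / c for l in lengths], c

p, c = pmf_from_lengths([1, 2, 3, 3])
print(c, p)                                    # 1.0 [0.5, 0.25, 0.125, 0.125]
print([math.log2(1 / (c * pi)) for pi in p])   # recovers the lengths: [1.0, 2.0, 3.0, 3.0]
```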

Assuming that the code is "Kraft tight", i.e., c = 1, then under the pmf p corresponding to the codeword lengths ℓ_1, …, ℓ_m, the expected codeword length is

E[ℓ(X)] = ∑_{i=1}^{m} 2^{-ℓ_i} ℓ_i = ∑_{i=1}^{m} p_i log_2 (1/p_i) = H(X).

This is the best we can hope for: the expected codelength of any uniquely decodable code is at least the entropy,

E[ℓ(X)] ≥ H(X).
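A quick numeric check (a sketch; the second source distribution is a hypothetical example of my own): when the source pmf matches the codeword lengths of a Kraft-tight code, the expected length equals the entropy; for any other source it is strictly larger.

```python
import math

def expected_length(p, lengths):
    return sum(pi * l for pi, l in zip(p, lengths))

def entropy(p):
    return sum(pi * math.log2(1 / pi) for pi in p if pi > 0)

lengths = [1, 2, 3, 3]                  # a Kraft-tight prefix code, e.g. {0, 10, 110, 111}
p_match = [0.5, 0.25, 0.125, 0.125]     # pmf matching the lengths
p_other = [0.4, 0.3, 0.2, 0.1]          # some other source pmf (hypothetical)

print(expected_length(p_match, lengths), entropy(p_match))  # 1.75 1.75
print(expected_length(p_other, lengths), entropy(p_other))  # 1.9  ~1.846: E[l] > H
```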

Entropy Lower Bound

E[ℓ(X)] ≥ H(X).

Proof.

E[ℓ(X)] − H(X) = ∑_{x∈X} p(x) ℓ(x) − ∑_{x∈X} p(x) log_2 (1/p(x))
              = ∑_{x∈X} p(x) log_2 (1/2^{-ℓ(x)}) − ∑_{x∈X} p(x) log_2 (1/p(x))
              = ∑_{x∈X} p(x) log_2 (p(x)/2^{-ℓ(x)})
              = ∑_{x∈X} p(x) ( log_2 (p(x)/q(x)) + log_2 (1/c) )      [ taking q(x) = 2^{-ℓ(x)}/c ]
              = D(p ‖ q) + log_2 (1/c) ≥ 0,

since D(p ‖ q) ≥ 0 by the information inequality and log_2 (1/c) ≥ 0 because c ≤ 1.
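A numeric check of the decomposition derived above (a sketch; the source pmf and the length set are hypothetical examples chosen so that c < 1):

```python
import math

p = [0.4, 0.3, 0.2, 0.1]            # source pmf (hypothetical example)
lengths = [2, 2, 3, 4]              # codeword lengths with c < 1

c = sum(2.0 ** -l for l in lengths)       # here c = 0.6875
q = [2.0 ** -l / c for l in lengths]      # q(x) = 2**(-l(x)) / c

expected_len = sum(pi * l for pi, l in zip(p, lengths))
H = sum(pi * math.log2(1 / pi) for pi in p)
kl = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

print(expected_len - H, kl + math.log2(1 / c))  # both ~0.5536: the decomposition holds
```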
