Page 1: exercise  in the previous class

exercise in the previous class

binary Huffman code? average codeword length?

symbol    A      B      C      D      E      F      G      H
prob.     0.363  0.174  0.143  0.098  0.087  0.069  0.045  0.021
codeword  0      100    110    1010   1011   1110   11110  11111

[code tree figure: the merges create internal nodes with probabilities 0.066, 0.135, 0.185, 0.278, 0.359, 0.637, and 1.000; edges are labelled 0 and 1]

average codeword length: 0.363×1 + 0.174×3 + ... + 0.021×5 = 2.660 bit
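A minimal Python sketch of the construction (assumed helper code, not the lecture's own): repeatedly merge the two least probable trees. The codewords it prints may differ from those above, since Huffman codes are not unique, but the codeword lengths and the average codeword length (2.660) agree.

# Binary Huffman construction for the page-1 exercise (assumed example code).
import heapq

def huffman(probs):
    """Return a list of binary codewords, one per probability."""
    heap = [(p, [i]) for i, p in enumerate(probs)]   # (probability, leaf ids under this tree)
    codes = [""] * len(probs)
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, ids0 = heapq.heappop(heap)               # least probable tree  -> bit 0
        p1, ids1 = heapq.heappop(heap)               # next least probable  -> bit 1
        for i in ids0:
            codes[i] = "0" + codes[i]
        for i in ids1:
            codes[i] = "1" + codes[i]
        heapq.heappush(heap, (p0 + p1, ids0 + ids1))
    return codes

probs = [0.363, 0.174, 0.143, 0.098, 0.087, 0.069, 0.045, 0.021]
codes = huffman(probs)
for sym, c in zip("ABCDEFGH", codes):
    print(sym, c)
acl = sum(p * len(c) for p, c in zip(probs, codes))
print("average codeword length =", round(acl, 3))    # 2.66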

Page 2: exercise  in the previous class

exercise in the previous class

4-ary Huffman code? [basic idea] join four trees

we may have #trees < 4 in the final round.
with one "join", 4 − 1 = 3 trees disappear.
→ add dummy nodes and start with 3k + 1 nodes: here 8 symbols + 2 dummies = 10 = 3×3 + 1.

symbol    A      B      C      D      E      F      G      H      dummy  dummy
prob.     0.363  0.174  0.143  0.098  0.087  0.069  0.045  0.021  0      0
codeword  a      b      c      da     db     dc     dda    ddb    —      —

[code tree figure: G, H and the two dummies join into a node of probability 0.066; that node joins D, E, F into a node of probability 0.320; the final join of A, B, C and the 0.320 node gives 1.000; the four edges of each join are labelled a, b, c, d]
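A minimal sketch of the same idea for the 4-ary case, in Python (assumed implementation, not the lecture's code): pad with zero-probability dummy symbols until the number of leaves is of the form 3k + 1, then repeatedly join the four smallest trees. The particular letters a, b, c, d it assigns may differ from the slide's, but the codeword lengths (1, 1, 1, 2, 2, 2, 3, 3) match.

# 4-ary Huffman coding with dummy symbols (assumed example code).
import heapq

def huffman_4ary(probs, arity=4):
    """Return codewords over the alphabet a, b, c, d for the given probabilities."""
    items = [(p, [i]) for i, p in enumerate(probs)]
    while (len(items) - 1) % (arity - 1) != 0:       # pad so that #leaves = 3k + 1
        items.append((0.0, []))                      # dummy symbol, carries no real index
    codes = [""] * len(probs)
    heapq.heapify(items)
    while len(items) > 1:
        merged_p, merged_ids = 0.0, []
        group = [heapq.heappop(items) for _ in range(arity)]
        for label, (p, ids) in zip("abcd", group):
            for i in ids:
                codes[i] = label + codes[i]          # prepend this join's edge label
            merged_p += p
            merged_ids += ids
        heapq.heappush(items, (merged_p, merged_ids))
    return codes

probs = [0.363, 0.174, 0.143, 0.098, 0.087, 0.069, 0.045, 0.021]
for sym, code in zip("ABCDEFGH", huffman_4ary(probs)):
    print(sym, code)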

Page 3: today's class

today's class

basic properties needed for source coding: uniquely decodable, immediately decodable
Huffman code: construction of Huffman code
extensions of Huffman code: theoretical limit of the "compression", related topics   ← today

Page 4: today's class (detail)

today's class (detail)

Huffman codes are good, but how good are they?

Huffman codes for extended information sources ... possible means (手段) to improve the efficiency
Shannon's source coding theorem ... the theoretical limit of efficiency
some more variations of Huffman codes ... blocks of symbols with variable block length

[slide margin labels: algorithm, math., algorithm, math.]

Page 5: how should we evaluate Huffman codes?

how should we evaluate Huffman codes?

good code:
  immediately decodable ... "use code trees"
  small average codeword length (ACL)

It seems that Huffman's algorithm gives a good solution.

To see that Huffman codes are really good, we discuss a mathematical limit of the ACL
  ... under a certain assumption (up to slide 11)
  ... in the general case (Shannon's theorem)

[small code tree figure with edges labelled 0 and 1]

Page 6: theoretical limit under an assumption

theoretical limit under an assumption

assumption: the encoding is done in a symbol-by-symbol manner
  define one codeword for each symbol of the source S
  S produces M symbols with probabilities p1, ..., pM

Lemma (restricted Shannon's theorem):
  1. for any code, the ACL ≥ H1(S)
  2. a code with ACL ≤ H1(S) + 1 is constructible

H1(S) is the borderline of "possible" and "impossible".

Page 7: Shannon's lemma (bad naming...)

Shannon's lemma (bad naming...)

To prove the restricted Shannon's theorem, a small technical lemma (Shannon's lemma) is needed.

Shannon's lemma (シャノンの補助定理):
For any non-negative numbers q1, ..., qM with q1 + ... + qM ≤ 1,

  ∑_{i=1}^{M} −p_i log2 q_i  ≥  ∑_{i=1}^{M} −p_i log2 p_i  (= H1(S)),

with equality if and only if p_i = q_i for all i.

reminder: p1, ..., pM are the symbol probabilities, and p1 + ... + pM = 1.
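The lemma is easy to sanity-check numerically. A minimal Python sketch with arbitrary example distributions p and q (not taken from the slides):

# Numeric check of Shannon's lemma (not a proof): the "cross" quantity
# sum(-p_i * log2(q_i)) is never smaller than the entropy sum(-p_i * log2(p_i)).
from math import log2

p = [0.5, 0.25, 0.125, 0.125]     # probability distribution, sums to 1
q = [0.4, 0.3, 0.2, 0.05]         # non-negative, sum(q) <= 1 is allowed

cross   = sum(-pi * log2(qi) for pi, qi in zip(p, q))
entropy = sum(-pi * log2(pi) for pi in p)
print(round(cross, 4), ">=", round(entropy, 4), cross >= entropy)   # True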

Page 8: proof (sketch)

proof (sketch)

left-hand side − right-hand side
  = ∑_{i=1}^{M} −p_i log2 q_i + ∑_{i=1}^{M} p_i log2 p_i
  = ∑_{i=1}^{M} −p_i log2 (q_i / p_i)
  = ∑_{i=1}^{M} (p_i / log_e 2) · (−log_e (q_i / p_i))
  ≥ ∑_{i=1}^{M} (p_i / log_e 2) · (1 − q_i / p_i)
  = (1 / log_e 2) ∑_{i=1}^{M} (p_i − q_i)
  = (1 / log_e 2) (∑_{i=1}^{M} p_i − ∑_{i=1}^{M} q_i)
  = (1 / log_e 2) (1 − ∑_{i=1}^{M} q_i)
  ≥ 0

[figure: the curves y = −log_e x and y = 1 − x, touching at x = 1]

The key inequality is −log_e x ≥ 1 − x; equality holds iff q_i / p_i = 1 for all i.

Page 9: proof of the restricted Shannon's theorem: 1

proof of the restricted Shannon's theorem: 1

for any code, the average codeword length ≥ H1(S)

Let l1, ..., lM be the lengths of the codewords, and define q_i = 2^{−l_i} (so that l_i = −log2 q_i).

Kraft's inequality:  ∑_{i=1}^{M} q_i = ∑_{i=1}^{M} 2^{−l_i} ≤ 1

Shannon's lemma:  ∑_{i=1}^{M} −p_i log2 q_i ≥ ∑_{i=1}^{M} −p_i log2 p_i = H1(S)

The ACL is  L = ∑_{i=1}^{M} p_i l_i = ∑_{i=1}^{M} −p_i log2 q_i,  so we have shown that L ≥ H1(S).

Page 10: proof of the restricted Shannon's theorem: 2

proof of the restricted Shannon's theorem: 2

a code with average codeword length ≤ H1(S) + 1 is constructible

Choose integers l1, ..., lM so that −log2 p_i ≤ l_i < −log2 p_i + 1 (that is, l_i = ⌈−log2 p_i⌉).

The choice makes 2^{−l_i} ≤ p_i, and therefore ∑_{i=1}^{M} 2^{−l_i} ≤ ∑_{i=1}^{M} p_i = 1 ... Kraft's inequality.

We can construct a code with codeword lengths l1, ..., lM, whose ACL is

  L = ∑ p_i l_i < ∑ p_i (−log2 p_i + 1) = ∑ −p_i log2 p_i + ∑ p_i = H1(S) + 1.
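A minimal Python sketch of this construction, using the probabilities of the page-1 exercise (assumed code, not the lecture's). The lengths ⌈−log2 p_i⌉ are not the Huffman lengths, but they satisfy Kraft's inequality and give an ACL between H1(S) and H1(S) + 1:

# Lengths l_i = ceil(-log2 p_i) and the resulting bounds (assumed example code).
from math import log2, ceil

p = [0.363, 0.174, 0.143, 0.098, 0.087, 0.069, 0.045, 0.021]

lengths = [ceil(-log2(pi)) for pi in p]
kraft   = sum(2 ** -l for l in lengths)
acl     = sum(pi * l for pi, l in zip(p, lengths))
h1      = sum(-pi * log2(pi) for pi in p)

print("lengths:", lengths)
print("Kraft sum:", round(kraft, 4), "<= 1:", kraft <= 1)
print("H1(S) =", round(h1, 3), " ACL =", round(acl, 3), " H1(S)+1 =", round(h1 + 1, 3))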

Page 11: the lemma and the Huffman code

the lemma and the Huffman code

Lemma (restricted Shannon's theorem):
  1. for any code, the ACL ≥ H1(S)
  2. a code with ACL ≤ H1(S) + 1 is constructible

We can show that, for a Huffman code, L ≤ H1(S) + 1, and
there is no symbol-by-symbol code whose ACL is smaller than L.
  proof ... by induction on the size of code trees

A Huffman code is said to be a compact code.

Page 12: coding for extended information sources

coding for extended information sources

The Huffman code is the best symbol-by-symbol code, but...
  the ACL is ≥ 1 → not good for encoding binary information sources

symbol    A    B
prob.     0.8  0.2
C1        0    1     average codeword length 1.0
C2        1    0     average codeword length 1.0

If we encode several symbols in a block, then...
  the ACL per symbol can be < 1 → good for binary sources also

[figure: the message A B A C C A encoded symbol by symbol (A→0, B→10, C→11, giving 0 10 0 11 11 0) and encoded in blocks (AB→10, ACC→110, A→01)]

Page 13: block Huffman coding

block Huffman coding

The "block" operation can be:
  fixed-length (equal-, constant-length blocks), or
  variable-length (unequal-length blocks): block partition approach / run-length approach

[figure: message ABCBCBBCAA... → "block" operation → blocked message AB CBC BB CAA... → Huffman encoding → codewords 01 10 001 1101...]

Page 14: fixed-length block Huffman coding

fixed-length block Huffman coding

symbol    A    B    C
prob.     0.6  0.3  0.1
codeword  0    10   11

ACL: 0.6×1 + 0.3×2 + 0.1×2 = 1.4 bit for one symbol

blocks with two symbols:

block     AA    AB    AC    BA    BB    BC     CA    CB      CC
prob.     0.36  0.18  0.06  0.18  0.09  0.03   0.06  0.03    0.01
codeword  0     100   1100  101   1110  11110  1101  111110  111111

ACL: 0.36×1 + ... + 0.01×6 = 2.67 bit, but this is for two symbols
  → 2.67 / 2 = 1.335 bit for one symbol ... improved!
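A minimal Python sketch of fixed-length block Huffman coding for this source (assumed code; the codewords it builds may differ from the table above, since Huffman codes are not unique, but the ACL per symbol reproduces 1.4 and 1.335). The block probabilities are products of symbol probabilities because the source is memoryless:

# ACL per symbol for block lengths 1 and 2 over the source A:0.6, B:0.3, C:0.1.
import heapq
from itertools import product
from math import prod

def huffman_lengths(probs):
    """Return the binary Huffman codeword length for each probability."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    lengths = [0] * len(probs)
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, m1 = heapq.heappop(heap)
        p2, m2 = heapq.heappop(heap)
        for i in m1 + m2:                  # every leaf under the merged node gets one bit deeper
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, m1 + m2))
    return lengths

source = {"A": 0.6, "B": 0.3, "C": 0.1}
for n in (1, 2):
    blocks = ["".join(t) for t in product(source, repeat=n)]
    probs  = [prod(source[s] for s in b) for b in blocks]
    acl    = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))
    print(f"block length {n}: ACL = {acl:.3f} bit per block = {acl / n:.3f} bit per symbol")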

Page 15: block coding for binary sources

block coding for binary sources

symbol    A    B
prob.     0.8  0.2
codeword  0    1

ACL: 0.8×1 + 0.2×1 = 1.0 bit for one symbol

blocks with two symbols:

block     AA    AB    BA    BB
prob.     0.64  0.16  0.16  0.04
codeword  0     10    110   111

ACL: 0.64×1 + ... + 0.04×3 = 1.56 bit for two symbols
  → 1.56 / 2 = 0.78 bit for one symbol ... improved!

Page 16: the block length

the block length

blocks with three symbols:

block     AAA    AAB    ABA    ABB    BAA    BAB    BBA    BBB
prob.     0.512  0.128  0.128  0.032  0.128  0.032  0.032  0.008
codeword  0      100    101    11100  110    11101  11110  11111

ACL: 0.512×1 + ... + 0.008×5 = 2.184 bit for three symbols
  → 2.184 / 3 = 0.728 bit for one symbol

block size       1    2     3     ...
ACL per symbol   1.0  0.78  0.728 ...

larger block size → more compact
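A minimal Python sketch that reproduces this table for the binary source P(A) = 0.8, P(B) = 0.2 and continues to larger block sizes (assumed code). It uses the fact that the ACL of a Huffman code equals the sum of the probabilities of all merged (internal) nodes:

# ACL per symbol of block Huffman codes for several block sizes.
import heapq
from itertools import product
from math import prod

def huffman_acl(probs):
    """Average codeword length of a binary Huffman code for the given probabilities."""
    heap = list(probs)
    heapq.heapify(heap)
    acl = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        acl += a + b                       # each merge adds (a + b) to the expected length
        heapq.heappush(heap, a + b)
    return acl

p = {"A": 0.8, "B": 0.2}
for n in range(1, 6):
    block_probs = [prod(p[s] for s in blk) for blk in product(p, repeat=n)]
    print(f"block size {n}: {huffman_acl(block_probs) / n:.3f} bit per symbol")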

Page 17: block code and extension of information source

block code and extension of information source

What happens if we increase the block length further?

Observe that...
  a block code defines a codeword for each block pattern.
  one block = a sequence of n symbols of S = one symbol of S^n, the n-th order extension of S
  → the restricted Shannon's theorem is applicable:

  H1(S^n) ≤ Ln < H1(S^n) + 1,   where Ln = the ACL for n symbols

for one symbol of S,

  H1(S^n)/n ≤ Ln/n < H1(S^n)/n + 1/n

Page 18: Shannon's source coding theorem

Shannon's source coding theorem

H1(S^n) / n ... the n-th order entropy of S (→ Apr. 12)

If n goes to infinity, H1(S^n)/n approaches H(S), the entropy of S, and the 1/n term vanishes:

Shannon's source coding theorem:
  1. for any code, the ACL ≥ H(S)
  2. a code with ACL ≤ H(S) + ε is constructible (for any ε > 0)

Page 19: what the theorem means

what the theorem means

Shannon's source coding theorem:
  1. for any code, the ACL ≥ H(S)
  2. a code with ACL ≤ H(S) + ε is constructible

Use block Huffman codes, and you can approach the limit.
You can never beat the limit, however.

symbol  A    B
prob.   0.8  0.2

block size       1    2     3     ...
ACL per symbol   1.0  0.78  0.728 ...  → 0.723 + ε      H(S) = 0.723

Page 20: remark 1

remark 1

Why do block codes give a smaller ACL?

fact 1: the ACL is minimized by a real-number solution
  if P(A) = 0.8, P(B) = 0.2, then we want l1 and l2 that minimize 0.8·l1 + 0.2·l2
  s.t. 2^{−l1} + 2^{−l2} ≤ 1
  → the real-number optimum is l1 = −log2 0.8 ≈ 0.32, l2 = −log2 0.2 ≈ 2.32

fact 2: the length of a codeword must be an integer
  minimize 0.8·l1 + 0.2·l2  s.t. 2^{−l1} + 2^{−l2} ≤ 1 and l1, l2 integers
  → l1 = 1 (> 0.32 ... loss!), l2 = 1 (< 2.32 ... gain!)

frequent loss, seldom gain... (the loss occurs with probability 0.8, the gain only with probability 0.2)

Page 21: remark 1 (cnt'd)

remark 1 (cnt'd)

the gap between the ideal and the real codeword lengths:
  the real length l_i (for example l_i = ⌈−log2 p_i⌉, as in the construction on page 10) is an integer approximation of the ideal length −log2 p_i

the gap is weighted by the probability: p_i · (l_i − (−log2 p_i))

[figure: the weighted gap plotted against p for 0 ≤ p ≤ 1]

long block → many (block) symbols → small probabilities → small weighted gaps → close to the ideal ACL
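A small Python sketch of the weighted gap, using the ceiling lengths l = ⌈−log2 p⌉ from the construction on page 10 (an assumption made here for illustration). The gap itself is below 1 bit, so the weighted gap is at most p and becomes negligible for low-probability block patterns:

# Weighted gap p * (ceil(-log2 p) - (-log2 p)) for a few probabilities.
from math import ceil, log2

for p in (0.8, 0.5, 0.2, 0.1, 0.01, 0.001):
    ideal    = -log2(p)
    real     = ceil(ideal)
    weighted = p * (real - ideal)
    print(f"p = {p:<6}  ideal = {ideal:5.2f}  integer = {real}  weighted gap = {weighted:.4f}")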

Page 22: today's class (detail)

today's class (detail)

Huffman codes are good, but how good are they?

Huffman codes for extended information sources ... possible means (手段) to improve the efficiency
Shannon's source coding theorem ... the theoretical limit of efficiency
some more variations of Huffman codes ... blocks of symbols with variable block length

[slide margin labels: algorithm, math., algorithm, math.]

Page 23: practical issues (問題) of block coding

practical issues (問題) of block coding

Theoretically speaking, block Huffman codes are the best.

From a practical viewpoint, there are several problems:
  We need to know the probability distribution in advance.
    (this will be discussed in the next class)
  We need a large table for the encoding/decoding.
    if one byte is needed to record one entry of the table...
      256-byte table, if block length = 8
      64-Kbyte table, if block length = 16
      4-Gbyte table, if block length = 32

Page 24: use blocks with variable length

use blocks with variable length

If we define blocks so that they have the same length, then...
  some blocks have small probabilities
  those blocks also need codewords

block     AAA    AAB    ABA    ABB    BAA    BAB    BBA    BBB
prob.     0.512  0.128  0.128  0.032  0.128  0.032  0.032  0.008
codeword  0      100    101    11100  110    11101  11110  11111

If we define blocks so that they have similar probabilities, then...
  the length differs from block to block
  the table has few useless entries

block     AAA    AAB    AB    B
prob.     0.512  0.128  0.16  0.2
codeword  0      100    101   11

Page 25: definition of block patterns

definition of block patterns

Block patterns must be defined so that...
  the patterns can represent (almost) all symbol sequences.

bad example: block patterns = {AAA, AAB, AB}
  AABABAAB → AAB, AB, AAB ... OK
  AABBBAAB → AAB, ? ... the rest cannot be represented

two different approaches are well known:
  block partition approach
  run-length approach

Page 26: define patterns with block partition approach

define patterns with block partition approach

1. prepare all blocks with length one
2. partition the block with the largest probability by appending one more symbol
3. go to 2

Example: P(A) = 0.8, P(B) = 0.2

  {A, B}               probs 0.8, 0.2
  {AA, AB, B}          probs 0.64, 0.16, 0.2
  {AAA, AAB, AB, B}    probs 0.512, 0.128, 0.16, 0.2

codewords (Huffman): AAA → 0, AAB → 100, AB → 101, B → 11
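A minimal Python sketch of this procedure (assumed helper names, not the lecture's code): keep a pattern → probability table and repeatedly split the most probable pattern by appending one more symbol. With P(A) = 0.8, P(B) = 0.2 and two splits it reproduces the patterns above:

# Block partition approach: grow the pattern set by splitting the most probable pattern.
def partition_blocks(symbol_probs, n_splits):
    blocks = dict(symbol_probs)                  # pattern -> probability
    for _ in range(n_splits):
        best = max(blocks, key=blocks.get)       # most probable pattern
        p = blocks.pop(best)
        for s, ps in symbol_probs.items():       # split it by appending one more symbol
            blocks[best + s] = p * ps
    return blocks

print(partition_blocks({"A": 0.8, "B": 0.2}, n_splits=2))
# resulting patterns: B (0.2), AB (0.16), AAA (0.512), AAB (0.128) -- as on the slide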

Page 27: how good is this?

how good is this?

to determine the average codeword length, assume that n blocks are produced from S:

block     AAA    AAB    AB    B
prob.     0.512  0.128  0.16  0.2
codeword  0      100    101   11

number of symbols: 0.512n×3 + 0.128n×3 + ... = 2.44n symbols
number of bits:    0.512n×1 + 0.128n×3 + ... = 1.776n bits

example:  S: AAA AB AAA B AB ...  → encode → 0 101 0 11 101 ...

2.44n symbols are encoded to 1.776n bits
→ the average codeword length is 1.776n / 2.44n = 0.728 bit per symbol
(almost the same as block length = 8, p. 16, but with a small table)

Page 28: define patterns with run-length approach

define patterns with run-length approach

run = a sequence of consecutive (連続の) identical symbols

Example: divide a message into runs of "A":

  A B B A A A A A B A A A B
  → runs of "A" (each terminated by a "B") of length 1, 0, 5, 3

The message is reconstructible if the lengths of the runs are given.
→ define blocks as runs of various lengths

Page 29: upper-bound the run-length

upper-bound the run-length

small problem? ... there can be very long runs
→ put an upper-bound limit: run-length limited (RLL) coding

upper bound = 3:

run length      0  1  2  3    4    5    6      7      ...
representation  0  1  2  3+0  3+1  3+2  3+3+0  3+3+1  ...

ABBAAAAABAAAB is represented as
  one "A" followed by B          (1)
  zero "A"s followed by B        (0)
  three "A"s, run continues      (3+)
  two "A"s followed by B         (2)
  three "A"s, run continues      (3+)
  zero "A"s followed by B        (0)

Page 30: run-length Huffman code

run-length Huffman code

Huffman code defined to encode the lengths of runs
→ effective when there is a strong bias on the symbol probabilities

p(A) = 0.9, p(B) = 0.1:

run length     0    1     2      3 or more
block pattern  B    AB    AAB    AAA
prob.          0.1  0.09  0.081  0.729
codeword       10   110   111    0

ABBAAAAABAAAB:  1, 0, 3+, 2, 3+, 0  ⇒  110 10 0 111 0 10
AAAABAAAAABAAB: 3+, 1, 3+, 2, 2     ⇒  0 110 0 111 111
AAABAAAAAAAAB:  3+, 0, 3+, 3+, 2    ⇒  0 10 0 0 111
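A minimal Python sketch of this encoder (assumed code): split the message into run-length blocks with upper bound 3 and emit the codewords from the table above. It reproduces the three examples:

# Run-length Huffman encoding with the codewords 0->10, 1->110, 2->111, 3+->0.
CODE = {0: "10", 1: "110", 2: "111", "3+": "0"}

def rll_blocks(message, bound=3):
    """Split a message over {A, B} into run-length blocks with an upper bound."""
    blocks, run = [], 0
    for ch in message:
        if ch == "A":
            run += 1
            if run == bound:          # run reaches the bound: emit "3+" and keep counting
                blocks.append("3+")
                run = 0
        else:                         # "B" terminates the current run of A's
            blocks.append(run)
            run = 0
    return blocks                     # note: all example messages end with B

for msg in ("ABBAAAAABAAAB", "AAAABAAAAABAAB", "AAABAAAAAAAAB"):
    blocks = rll_blocks(msg)
    print(msg, "->", blocks, "->", " ".join(CODE[b] for b in blocks))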

Page 31: example of various block coding

example of various block coding

S: memoryless & stationary, P(A) = 0.9, P(B) = 0.1
the entropy of S is H(S) = −0.9 log2 0.9 − 0.1 log2 0.1 = 0.469 bit

code 1: a naive Huffman code ... average codeword length = 1

symbol    A    B
prob.     0.9  0.1
codeword  0    1

code 2: fixed-length blocks of 3 symbols ... average codeword length = 1.661 bit / 3 symbols = 0.55 bit per symbol

block     AAA    AAB    ABA    ABB    BAA    BAB    BBA    BBB
prob.     0.729  0.081  0.081  0.009  0.081  0.009  0.009  0.001
codeword  0      100    110    1010   1110   1011   11110  11111

Page 32: example of various block coding (cnt'd)

example of various block coding (cnt'd)

code 3: run-length Huffman code (upper-bound = 8)

run length  0    1     2      3      4      5      6      7 or more
prob.       0.1  0.09  0.081  0.073  0.066  0.059  0.053  0.478
codeword    110  1000  1001   1010   1011   1110   1111   0

with n blocks...
  0.1n×1 + ... + 0.478n×7 = 5.215n symbols
  0.1n×3 + ... + 0.478n×1 = 2.466n bits

→ the average codeword length per symbol = 2.466 / 5.215 = 0.47
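A small Python check of these numbers (assumed code). It computes the expected number of symbols and of bits per block directly from P(A) = 0.9, P(B) = 0.1; the results are close to the 5.215n, 2.466n and 0.47 above (the slide uses rounded probabilities):

# Expected symbols and bits per block for the run-length Huffman code (code 3).
pA, pB = 0.9, 0.1
codeword_len = {0: 3, 1: 4, 2: 4, 3: 4, 4: 4, 5: 4, 6: 4, "7+": 1}

symbols_per_block = 0.0
bits_per_block    = 0.0
for k in range(7):                        # a run of k A's terminated by one B
    prob = pA ** k * pB
    symbols_per_block += prob * (k + 1)   # k A's plus the terminating B
    bits_per_block    += prob * codeword_len[k]

prob7 = pA ** 7                           # 7 A's, run not yet terminated
symbols_per_block += prob7 * 7
bits_per_block    += prob7 * codeword_len["7+"]

print(round(symbols_per_block, 3), "symbols per block")                  # close to 5.215
print(round(bits_per_block, 3), "bits per block")                        # close to 2.466
print(round(bits_per_block / symbols_per_block, 3), "bit per symbol")    # about 0.47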

Page 33: summary of today's class

summary of today's class

Huffman codes are good, but how good are they?

Huffman codes for extended information sources ... possible means (手段) to improve the efficiency
Shannon's source coding theorem ... the theoretical limit of efficiency
some more variations of Huffman codes ... blocks of symbols with variable block length

Page 34: exercise

exercise

Write a computer program to construct a Huffman code for a given probability distribution.

Modify the above program so that it can handle fixed-length block coding.

Give a probability distribution, change the block length, and observe how the average codeword length changes.