Data Compression Arithmetic coding

Data Compression Arithmetic coding. Arithmetic Coding: Introduction Allows using “fractional” parts of bits!! Used in PPM, JPEG/MPEG (as option), Bzip.

Dec 23, 2015


Gabriel Blair

Data Compression

Arithmetic coding


Arithmetic Coding: Introduction

Allows using “fractional” parts of bits!!

Used in PPM, JPEG/MPEG (as option), Bzip

Slower than Huffman coding, but the integer implementation is not too bad.


Arithmetic Coding (message intervals)

Assign each symbol an interval within the range from 0 (inclusive) to 1 (exclusive).

e.g. p(a) = .2, p(b) = .5, p(c) = .3

f(a) = .0, f(b) = .2, f(c) = .7

$f(i) = \sum_{j=1}^{i-1} p(j)$

The interval for a particular symbol will be called the symbol interval (e.g. for b it is [.2,.7))
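The symbol intervals above can be computed mechanically from the probabilities. A minimal sketch, assuming the symbols are ordered a, b, c as in the example; the function name `symbol_intervals` is made up for illustration, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

def symbol_intervals(p):
    """Map each symbol to its interval [f, f + p), in the order given by p."""
    intervals = {}
    f = Fraction(0)  # cumulative probability of the symbols seen so far
    for sym, prob in p.items():
        intervals[sym] = (f, f + prob)
        f += prob
    return intervals

# Probabilities from the example: a = .2, b = .5, c = .3
p = {"a": Fraction(2, 10), "b": Fraction(5, 10), "c": Fraction(3, 10)}
iv = symbol_intervals(p)
print(iv["b"])  # the symbol interval of b, i.e. [.2, .7)
```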


Arithmetic Coding: Encoding Example

Coding the message sequence: bac

The final sequence interval is [.27,.3)

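Start: the interval is [0,1), subdivided as a = [0,.2), b = [.2,.7), c = [.7,1)

After b: the interval is [.2,.7), subdivided as a = [.2,.3), b = [.3,.55), c = [.55,.7)

After a: the interval is [.2,.3), subdivided as a = [.2,.22), b = [.22,.27), c = [.27,.3)

After c: the interval is [.27,.3)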


Arithmetic Coding

To code a sequence of symbols c_1, ..., c_n with probabilities p[c] use the following:

$l_0 = 0, \quad l_i = l_{i-1} + s_{i-1} \cdot f[c_i]$

$s_0 = 1, \quad s_i = s_{i-1} \cdot p[c_i]$

f[c] is the cumulative prob. up to symbol c (not included)

Final interval size is

$s_n = \prod_{i=1}^{n} p[c_i]$

The interval for a message sequence will be called the sequence interval
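The recurrences for l and s translate directly into a loop. A minimal sketch, assuming exact fractions instead of real numbers; the function name `sequence_interval` is hypothetical, while the tables `p` and `f` are the ones from the a/b/c example.

```python
from fractions import Fraction

def sequence_interval(message, p, f):
    """Return (l, s): the lower end and the size of the sequence interval."""
    l, s = Fraction(0), Fraction(1)   # l_0 = 0, s_0 = 1
    for c in message:
        l = l + s * f[c]              # l_i = l_{i-1} + s_{i-1} * f[c_i]
        s = s * p[c]                  # s_i = s_{i-1} * p[c_i]
    return l, s

p = {"a": Fraction(2, 10), "b": Fraction(5, 10), "c": Fraction(3, 10)}
f = {"a": Fraction(0), "b": Fraction(2, 10), "c": Fraction(7, 10)}

l, s = sequence_interval("bac", p, f)
print(float(l), float(l + s))  # ends of the sequence interval for "bac"
```

For the message bac this gives l = 27/100 and s = 3/100, i.e. the interval [.27,.3).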


Uniquely defining an interval

Important property: The intervals for distinct messages of length n will never overlap

Therefore specifying any number in the final interval uniquely determines the msg.

Decoding is similar to encoding, but on each step we need to determine what the message value is and then reduce the interval


Arithmetic Coding: Decoding Example

Decoding the number .49, knowing the message is of length 3:

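Start: [0,1) is subdivided as a = [0,.2), b = [.2,.7), c = [.7,1); .49 falls in b

Then: [.2,.7) is subdivided as a = [.2,.3), b = [.3,.55), c = [.55,.7); .49 falls in b

Then: [.3,.55) is subdivided as a = [.3,.35), b = [.35,.475), c = [.475,.55); .49 falls in c

The message is bbc.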
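The decoding steps can be sketched as a loop that, at each step, finds the symbol whose sub-interval contains the number and then shrinks the interval to it. This is an illustrative sketch with exact fractions; the function name `decode` and the error handling are assumptions, not part of the slides.

```python
from fractions import Fraction

def decode(x, n, p, f):
    """Decode n symbols from the number x in [0, 1)."""
    l, s = Fraction(0), Fraction(1)
    message = []
    for _ in range(n):
        for c in p:
            lo = l + s * f[c]       # lower end of c's sub-interval
            hi = lo + s * p[c]      # upper end (exclusive)
            if lo <= x < hi:
                message.append(c)
                l, s = lo, s * p[c]  # reduce the interval to c's sub-interval
                break
        else:
            raise ValueError("x lies outside the current interval")
    return "".join(message)

p = {"a": Fraction(2, 10), "b": Fraction(5, 10), "c": Fraction(3, 10)}
f = {"a": Fraction(0), "b": Fraction(2, 10), "c": Fraction(7, 10)}

print(decode(Fraction(49, 100), 3, p, f))  # bbc
```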


Representing a real number

Binary fractional representation:

$.75 = .11$

$1/3 = .\overline{01}$

$11/16 = .1011$

Algorithm (repeat for each output bit)

1. x = 2 * x

2. If x < 1 output 0

3. else x = x - 1; output 1

So how about just using the shortest binary fractional representation in the sequence interval?

e.g. [0,.33) = .01, [.33,.66) = .1, [.66,1) = .11
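The three-step algorithm emits one bit per iteration. A minimal sketch, assuming the caller decides how many bits to produce (the parameter `nbits` and the function name are made up here), since fractions such as 1/3 never terminate in binary.

```python
from fractions import Fraction

def binary_fraction(x, nbits):
    """Return the first nbits binary digits of x in [0, 1), as a string."""
    bits = []
    for _ in range(nbits):
        x = 2 * x
        if x < 1:
            bits.append("0")
        else:
            x = x - 1
            bits.append("1")
    return "." + "".join(bits)

print(binary_fraction(Fraction(3, 4), 2))    # .11
print(binary_fraction(Fraction(1, 3), 4))    # .0101
print(binary_fraction(Fraction(11, 16), 4))  # .1011
```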


Representing a code interval

Can view binary fractional numbers as intervals by considering all completions.

We will call this the code interval.

code    min          max          interval
.11     .11000...    .11111...    [.75, 1.0)
.101    .10100...    .10111...    [.625, .75)

(min and max are the code completed with all 0s and with all 1s)


Selecting the code interval

To find a prefix code, find a binary fractional number whose code interval is contained in the sequence interval (dyadic number).

Can use L + s/2 truncated to 1 + ⌈log (1/s)⌉ bits

e.g. the Sequence Interval [.61,.79) contains the Code Interval (.101) = [.625,.75)
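The truncation rule can be checked on the [.61,.79) example. A sketch under the assumption that the logarithm is base 2 and rounded up; `select_code` and `code_interval` are hypothetical helper names, and `L` and `s` must be passed as exact fractions.

```python
from fractions import Fraction

def select_code(L, s):
    """Bits of L + s/2 truncated to 1 + ceil(log2(1/s)) bits."""
    k = 0
    while Fraction(1, 2 ** k) > s:   # smallest k with 2**-k <= s, i.e. ceil(log2(1/s))
        k += 1
    x = L + s / 2
    bits = ""
    for _ in range(1 + k):
        x *= 2
        if x < 1:
            bits += "0"
        else:
            x -= 1
            bits += "1"
    return bits

def code_interval(bits):
    """Interval of all numbers whose binary expansion starts with bits."""
    lo = Fraction(int(bits, 2), 2 ** len(bits))
    return lo, lo + Fraction(1, 2 ** len(bits))

L, s = Fraction(61, 100), Fraction(18, 100)
bits = select_code(L, s)
lo, hi = code_interval(bits)
print(bits, float(lo), float(hi))
```

For this interval the rule yields the 4-bit code 1011, whose code interval [.6875,.75) lies inside [.61,.79). The shorter code .101 also fits, so the rule is safe rather than always shortest.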


Bound on Arithmetic length

Note that -log s + 1 = log (2/s)


Bound on Length

Theorem: For a text of length n, the arithmetic encoder generates at most

$1 + \lceil \log (1/s) \rceil = 1 + \lceil \log \prod_{i=1}^{n} (1/p_i) \rceil$

$\le 2 + \sum_{i=1}^{n} \log (1/p_i)$

$= 2 + \sum_{k=1}^{|\Sigma|} n p_k \log (1/p_k)$

$= 2 + n H_0$ bits

$n H_0 + 0.02 n$ bits in practice because of rounding
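The bound can be checked numerically on a short text. A sketch assuming the probabilities are the empirical symbol frequencies of the text itself and logarithms are base 2; the function name and the sample string are chosen only for illustration.

```python
import math
from collections import Counter
from fractions import Fraction

def code_length_and_bound(text):
    """Return (1 + ceil(log2(1/s)), 2 + n*H0) for the given text."""
    n = len(text)
    counts = Counter(text)
    s = Fraction(1)
    for c in text:                       # s = product of p[c_i]
        s *= Fraction(counts[c], n)
    k = 0
    while Fraction(1, 2 ** k) > s:       # k = ceil(log2(1/s)), computed exactly
        k += 1
    h0 = sum((cnt / n) * math.log2(n / cnt) for cnt in counts.values())
    return 1 + k, 2 + n * h0

bits, bound = code_length_and_bound("ACCBACCACBAB")
print(bits, round(bound, 3))
```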


Integer Arithmetic Coding

Problem is that operations on arbitrary precision real numbers are expensive.

Key Ideas of integer version:

Keep integers in range [0..R) where R = 2^k

Use rounding to generate integer interval

Whenever the sequence interval falls into top, bottom or middle half, expand the interval by a factor 2

Integer Arithmetic is an approximation


Integer Arithmetic (scaling)

If l ≥ R/2 then (top half): Output 1 followed by m 0s; m = 0; Message interval is expanded by 2

If u < R/2 then (bottom half): Output 0 followed by m 1s; m = 0; Message interval is expanded by 2

If l ≥ R/4 and u < 3R/4 then (middle half): Increment m; Message interval is expanded by 2

All other cases, just continue...
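The three scaling rules can be sketched as one loop. Two things here are assumptions not stated on the slide: u is treated as an inclusive upper bound, and "expanded by 2" is implemented with the usual doubling formulas (subtract the offset of the half, double, and add 1 to u). The function name `rescale` is hypothetical.

```python
def rescale(l, u, m, R):
    """Apply the scaling rules until none fires.

    l, u: inclusive integer bounds in [0, R); m: pending middle-half expansions.
    Returns the new (l, u, m) and the bits output along the way.
    """
    half, quarter = R // 2, R // 4
    out = []
    while True:
        if l >= half:                            # top half
            out.append("1" + "0" * m)
            m = 0
            l, u = 2 * (l - half), 2 * (u - half) + 1
        elif u < half:                           # bottom half
            out.append("0" + "1" * m)
            m = 0
            l, u = 2 * l, 2 * u + 1
        elif l >= quarter and u < 3 * quarter:   # middle half
            m += 1
            l, u = 2 * (l - quarter), 2 * (u - quarter) + 1
        else:                                    # all other cases: just continue
            break
    return l, u, m, "".join(out)

print(rescale(140, 150, 0, 256))  # (64, 239, 1, '100')
```

With R = 256, the bounds 140 and 150 share the leading bits 100, which are output; one middle-half expansion then remains pending in m.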


You find this at


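Arithmetic ToolBox (ATB)

As a state machine: in state (L,s), i.e. the interval [L, L+s), ATB takes the distribution (p1,...,p|Σ|) and the next symbol c, and moves to the state (L',s'), the sub-interval of [L, L+s) assigned to c. The next step is ATB(L',s').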

Therefore, even the distribution can change over time
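Because each step only needs the current state and a distribution, the distribution can be swapped between steps. A minimal sketch: `atb` is a hypothetical name for one transition, and the count-based adaptive model in the usage part is an assumption chosen only to show a distribution that changes over time.

```python
from fractions import Fraction

def atb(state, probs, c):
    """One transition: (L, s) plus a distribution and a symbol gives (L', s')."""
    L, s = state
    f = Fraction(0)                  # cumulative probability before c
    for sym, prob in probs.items():
        if sym == c:
            return (L + s * f, s * prob)
        f += prob
    raise KeyError(c)

# Hypothetical adaptive model: start every count at 1, bump it after coding.
counts = {"a": 1, "b": 1, "c": 1}
state = (Fraction(0), Fraction(1))
for c in "ab":
    total = sum(counts.values())
    probs = {sym: Fraction(n, total) for sym, n in counts.items()}
    state = atb(state, probs, c)
    counts[c] += 1

print(state)  # (Fraction(1, 6), Fraction(1, 12))
```

A decoder that updates the same counts in the same order reconstructs the same sequence of distributions.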


K-th order models: PPM

Use the previous k characters as the context. This makes use of conditional probabilities; this is the changing distribution.

Base probabilities on counts: e.g. if th has been seen 12 times, and 7 of those times it was followed by e, then the conditional probability p(e|th) = 7/12.

Need to keep k small so that dictionary does not get too large (typically less than 8).
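Count-based conditional probabilities can be sketched as follows. The helper names and the sample text are made up for illustration, and the sketch assumes the queried context has already been seen at least once.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def context_counts(text, k):
    """For every context of k characters, count the characters that follow it."""
    counts = defaultdict(Counter)
    for i in range(k, len(text)):
        counts[text[i - k:i]][text[i]] += 1
    return counts

def cond_prob(counts, context, c):
    """p(c | context) as a ratio of counts; the context must have been seen."""
    seen = counts[context]
    return Fraction(seen[c], sum(seen.values()))

counts = context_counts("the then that thin", 2)
print(cond_prob(counts, "th", "e"))  # 1/2: th occurs 4 times, followed by e twice
```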


PPM: Partial Matching

Problem: What do we do if we have not seen context followed by character before?

Cannot code 0 probabilities!

The key idea of PPM is to reduce context size if previous match has not been seen.

If the character has not been seen before with the current context of size 3, send an escape-msg and then try the context of size 2, and then again an escape-msg and the context of size 1, and so on.

Keep statistics for each context size < k

The escape is a special character with some probability.

Different variants of PPM use different heuristics for the probability.
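The escape mechanism can be sketched as a walk from the longest context down to the empty one. Everything specific here is an assumption: the escape count is taken to be the number of distinct characters seen in the context (one possible heuristic among the variants), the counts are invented, exclusion of already-rejected characters is omitted, and a character never seen in any context simply raises an error.

```python
from fractions import Fraction

ESC = "esc"

def coding_steps(history, c, counts, k):
    """List the (context, symbol, probability) triples sent to code c."""
    steps = []
    for order in range(min(k, len(history)), -1, -1):
        ctx = history[len(history) - order:]
        seen = counts.get(ctx)
        if not seen:
            continue                       # context never seen: nothing to send
        total = sum(seen.values()) + len(seen)   # counts plus escape count
        if c in seen:
            steps.append((ctx, c, Fraction(seen[c], total)))
            return steps
        steps.append((ctx, ESC, Fraction(len(seen), total)))
    raise ValueError("character never seen in any context")

# Hypothetical counts for contexts of size 2, 1 and 0.
counts = {
    "th": {"e": 7, "a": 5},
    "h": {"e": 9, "a": 6, "i": 3},
    "": {"t": 20, "h": 18, "e": 16, "a": 11, "i": 3},
}
steps = coding_steps("with", "i", counts, 2)
print(steps)
```

Here i has not been seen after th, so an escape is coded in that context first, and i itself is coded in the shorter context h.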


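PPM + Arithmetic ToolBox

In state (L,s), i.e. the interval [L, L+s), ATB takes the conditional distribution p[ · | context ] and the next symbol, which is c or esc, and moves to the state (L',s'). The next step is ATB(L',s').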

Encoder and Decoder must know the protocol for selecting the same conditional probability distribution (PPM-variant)


PPM: Example Contexts

String = ACCBACCACBA B, k = 2 ($ is the escape symbol)

Context    Counts
Empty      A = 4, B = 2, C = 5, $ = 3
A          C = 3, $ = 1
B          A = 2, $ = 1
C          A = 1, B = 2, C = 2, $ = 3
AC         B = 1, C = 2, $ = 2
BA         C = 1, $ = 1
CA         C = 1, $ = 1
CB         A = 2, $ = 1
CC         A = 1, B = 1, $ = 2
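The counts in this example can be rebuilt from the string. A minimal sketch, assuming that the $ entry of each context equals the number of distinct characters seen after it, which is what the numbers in the example suggest; the function name `ppm_table` is hypothetical.

```python
from collections import Counter, defaultdict

def ppm_table(text, k):
    """Counts of following characters for every context of size 0..k."""
    table = defaultdict(Counter)
    for order in range(k + 1):
        for i in range(order, len(text)):
            table[text[i - order:i]][text[i]] += 1
    for ctx in table:
        # Escape count: number of distinct characters seen in this context.
        table[ctx]["$"] = len(table[ctx])
    return table

table = ppm_table("ACCBACCACBA", 2)
for ctx in sorted(table, key=lambda c: (len(c), c)):
    print(ctx or "Empty", dict(table[ctx]))
```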


You find this at: compression.ru/ds/