-
INFORMATION THEORY
Željko Jeričević, dr. sc.
Department of Computer Science (Zavod za računarstvo), Faculty of Engineering (Tehnički fakultet)
& Department of Biology and Medical Genetics (Zavod za biologiju i medicinsku genetiku), Faculty of Medicine
51000 Rijeka, Croatia
Phone: (+385) 51-651 594 E-mail: [email protected]
http://www.riteh.uniri.hr/~zeljkoj/Zeljko_Jericevic.html
-
10 February 2012 [email protected] 2
Information theory

From the material covered so far we know that information must be prepared before it is sent through a channel. This is done by transforming the information into a form whose entropy is close to the maximum, which brings the transmission efficiency close to the maximum as well. Such a transformation can be achieved by lossless compression, e.g. arithmetic coding.
The second transformation concerns the reliability of transmission: the information is converted into a form in which automatic correction is possible for certain types of errors (e.g. Hamming coding).
-
Compression (sažimanje)
-
Entropy coding: the Kraft inequality (used in Huffman & Shannon-Fano coding)

1.4.1 The Kraft inequality
We shall prove the existence of efficient source codes by actually constructing some codes that are important in applications. However, getting to these results requires some intermediate steps.
A binary variable-length source code is described as a mapping from the source alphabet A to a set of finite strings, C, from the binary code alphabet, which we always denote {0, 1}. Since we allow the strings in the code to have different lengths, it is important that we can carry out the reverse mapping in a unique way. A simple way of ensuring this property is to use a prefix code, a set of strings chosen in such a way that no string is also the beginning (prefix) of another string. Thus, when the current string belongs to C, we know that we have reached the end, and we can start processing the following symbols as a new code string. In Example 1.5 an example of a simple prefix code is given.
If ci is a string in C and l(ci) its length in binary symbols, the expected length of the source code per source symbol is

L(C) = Σ_{i=1..N} P(ci) l(ci).

If the set of lengths of the code is {l(ci)}, any prefix code must satisfy the following important condition, known as the Kraft inequality:

Σ_i 2^(-l(ci)) ≤ 1.   (1.10)
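As a quick numerical sanity check (a minimal Python sketch, not part of the original slides), the Kraft sum can be computed for any proposed set of codeword lengths:

```python
def kraft_sum(lengths):
    """Sum of 2^(-l) over all codeword lengths; a prefix code with these
    lengths exists iff the sum is at most 1 (the Kraft inequality, eq. 1.10)."""
    return sum(2.0 ** -l for l in lengths)

# Lengths 1, 2, 3, 3 (the prefix code {0, 10, 110, 111}) meet the bound exactly:
print(kraft_sum([1, 2, 3, 3]))   # 1.0
# Lengths 1, 2, 2, 2 violate the inequality, so no such prefix code exists:
print(kraft_sum([1, 2, 2, 2]))   # 1.25
```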
-
Entropy coding: the Kraft inequality

1.4.1 The Kraft inequality (continued)
The code can be described as a binary search tree: starting from the root, two branches are labelled 0 and 1, and each node is either a leaf that corresponds to the end of a string, or a node that can be assumed to have two continuing branches. Let lm be the maximal length of a string. If a string has length l(c), it follows from the prefix condition that none of the 2^(lm - l(c)) extensions of this string are in the code. Also, two extensions of different code strings are never equal, since this would violate the prefix condition. Thus, by summing over all codewords, we get

Σ_i 2^(lm - l(ci)) ≤ 2^lm

and the inequality follows. It may further be proven that any uniquely decodable code must satisfy (1.10), and that if this is the case there exists a prefix code with the same set of code lengths. Thus the restriction to prefix codes imposes no loss in coding performance.
-
Entropy coding: the Kraft inequality

1.4.1 The Kraft inequality
Example 1.5 (A simple code). The code {0, 10, 110, 111} is a prefix code for an alphabet of four symbols. If the probability distribution of the source is (1/2, 1/4, 1/8, 1/8), the average length of the code strings is 1 · 1/2 + 2 · 1/4 + 3 · 1/4 = 7/4, which is also the entropy of the source.
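The arithmetic in Example 1.5 can be checked directly (a small Python sketch, assuming only the probabilities and lengths given above):

```python
import math

# Example 1.5: code {0, 10, 110, 111} with source distribution (1/2, 1/4, 1/8, 1/8)
probs   = [1/2, 1/4, 1/8, 1/8]
lengths = [1, 2, 3, 3]

avg_len = sum(p * l for p, l in zip(probs, lengths))   # expected code length L(C)
entropy = -sum(p * math.log2(p) for p in probs)        # source entropy H(X)

print(avg_len, entropy)   # 1.75 1.75 -> the code achieves the entropy exactly
```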
-
Entropy coding: the Kraft inequality

1.4.1 The Kraft inequality (continued)
If all the numbers -log P(ci) were integers, we could choose these as the lengths l(ci). In this way the Kraft inequality would be satisfied with equality, and furthermore

L = Σ_i P(ci) l(ci) = -Σ_i P(ci) log P(ci) = H(X)

and thus the expected code length would equal the entropy. Such a case is shown in Example 1.5. However, in general we have to select code strings that only approximate the optimal values. If we round -log P(ci) up to the nearest integer ⌈-log P(ci)⌉, the lengths satisfy the Kraft inequality, and by summing we get an upper bound on the code lengths:

l(ci) = ⌈-log P(ci)⌉ ≤ -log P(ci) + 1.   (1.11)

The difference between the entropy and the average code length may be evaluated from

H(X) - L = -Σ_i P(ci) log P(ci) - Σ_i P(ci) li = Σ_i P(ci) log(2^(-li) / P(ci)) ≤ log Σ_i 2^(-li) ≤ 0,

where the inequalities are those established by Jensen and Kraft, respectively. This gives

H(X) ≤ L ≤ H(X) + 1,   (1.12)

where the right-hand side is given by taking the average of (1.11).
The loss due to the integer rounding may give a disappointing result when the coding is done on single source symbols. However, if we apply the result to strings of N symbols, we find an expected code length of at most NH + 1, and the result per source symbol becomes at most H + 1/N. Thus, for sources with independent symbols, we can get an expected code length close to the entropy by encoding sufficiently long strings of source symbols.
-
Arithmetic coding

Suppose we want to send a message consisting of 3 letters, A, B & C, with roughly equal probabilities of occurrence.
Using 2 bits per symbol is inefficient: one of the bit combinations would never be used.
A better idea is to use real numbers between 0 & 1 in the base-3 number system, where each digit represents one symbol.
For example, the sequence ABBCAB becomes 0.011201 (with A=0, B=1, C=2).
-
Arithmetic coding

Converting the real number 0.011201 from base 3 to binary gives 0.001011001.
Using 2 bits per symbol requires 12 bits for the sequence ABBCAB, while the binary representation of 0.011201 (base 3) requires 9 bits, a saving of 25%.
The method relies on efficient in-place algorithms for converting from one base to another.
-
Fast conversion from one base to another

The Linux/Unix bc program. Examples:

echo "ibase=2; 0.1" | bc
.5
echo "ibase=3; 0.1000000" | bc
.3333333
echo "ibase=3; obase=2; 0.011201" | bc
.00101100100110010001
echo "ibase=2; obase=3; .001011001" | bc
.0112002011101011210
(rounded to .011201, length 6)
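The same conversions can be sketched in a few lines of Python (an illustrative sketch; bc remains the tool shown above):

```python
def frac_to_float(digits, base):
    """Value of the given digits after the radix point in the given base."""
    return sum(d * base ** -(i + 1) for i, d in enumerate(digits))

def float_to_frac(x, base, ndigits):
    """First ndigits of 0 <= x < 1 after the radix point in the given base."""
    out = []
    for _ in range(ndigits):
        x *= base
        out.append(int(x))
        x -= int(x)
    return out

# ABBCAB with A=0, B=1, C=2 is 0.011201 in base 3; converted to binary:
x = frac_to_float([0, 1, 1, 2, 0, 1], 3)
print(float_to_frac(x, 2, 9))   # [0, 0, 1, 0, 1, 1, 0, 0, 1] -> 0.001011001
```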
-
Arithmetic decoding

Arithmetic coding can achieve a result close to the optimum (the optimum is -log2 p bits for each symbol of probability p).
An example with four symbols, the arithmetic code 0.538 and the following probability distribution (D marks the end of the message):

Symbol       A    B    C    D
Probability  0.6  0.2  0.1  0.1
-
The arithmetic code of the sequence is 0.538 (ACD)

First step: divide the initial interval [0, 1) into subintervals proportional to the probabilities:

Symbol    A         B           C           D
Interval  [0, 0.6)  [0.6, 0.8)  [0.8, 0.9)  [0.9, 1)

0.538 falls into the first interval (symbol A).
-
The arithmetic code of the sequence is 0.538 (ACD)

Second step: divide the interval [0, 0.6) chosen in the first step into subintervals proportional to the probabilities:

Symbol    A          B             C             D
Interval  [0, 0.36)  [0.36, 0.48)  [0.48, 0.54)  [0.54, 0.6)

0.538 falls into the third subinterval (symbol C).
-
The arithmetic code of the sequence is 0.538 (ACD)

Third step: divide the interval [0.48, 0.54) chosen in the second step into subintervals proportional to the probabilities:

Symbol    A              B               C               D
Interval  [0.48, 0.516)  [0.516, 0.528)  [0.528, 0.534)  [0.534, 0.54)

0.538 falls into the fourth subinterval (symbol D, which is also the end-of-sequence symbol).
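The three steps above can be collected into a small decoder loop (a minimal sketch, assuming the probability table given earlier and D as the stop symbol):

```python
def arithmetic_decode(code, symbols, probs, stop):
    """Repeatedly locate `code` inside the subinterval assigned to each
    symbol, narrow the interval to that subinterval, and stop once the
    end-of-message symbol is decoded."""
    low, high = 0.0, 1.0
    decoded = []
    while not decoded or decoded[-1] != stop:
        width = high - low
        cum = low
        for symbol, p in zip(symbols, probs):
            if cum <= code < cum + p * width:   # code falls in this subinterval
                decoded.append(symbol)
                low, high = cum, cum + p * width
                break
            cum += p * width
    return "".join(decoded)

print(arithmetic_decode(0.538, "ABCD", [0.6, 0.2, 0.1, 0.1], "D"))   # ACD
```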
-
The arithmetic code of the sequence is 0.538 (ACD)

[Figure: graphical illustration of the arithmetic decoding steps]
-
The arithmetic code of the sequence is 0.538 (ACD)

(Non-)uniqueness: the same sequence could also have been represented as 0.534, 0.535, 0.536, 0.537 or 0.539. Using decimal instead of binary digits introduces inefficiency: the information content of three decimal digits is about 9.966 bits (why?).
The same message can be encoded in binary as 0.10001010, which corresponds to 0.5390625 in decimal and requires 8 bits.
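Both numbers quoted above are easy to verify (a quick Python check, not part of the original slides):

```python
import math

# Information content of three decimal digits:
print(math.log2(10 ** 3))            # 9.9658... ~ 9.966 bits
# The 8-bit binary fraction 0.10001010 as a decimal number:
print(int("10001010", 2) / 2 ** 8)   # 0.5390625
```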
-
The arithmetic code of the sequence is 0.538 (ACD)

8 bits is more than the actual entropy of the message (1.58 bits), because the message is short and the assumed distribution is wrong. If the actual distribution of the symbols in the message is taken into account, the message can be encoded using the following intervals: [0, 1/3); [1/9, 2/9); [5/27, 6/27); and the binary interval [0.001011110, 0.001110001). The result of the encoding is the message 111, i.e. 3 bits.
Correct message statistics are crucial for coding efficiency!
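The claim about the actual statistics can also be checked numerically (a sketch, under the assumption that the three symbols of the message are treated as equiprobable):

```python
import math

# Three equiprobable symbols: entropy log2(3) bits per symbol
print(math.log2(3))                # 1.5849... ~ 1.58 bits
# 0.00111 in binary is 7/32; it lies inside the final interval [5/27, 6/27),
# so the three digits 111 after the shared prefix identify the message
print(5 / 27 <= 7 / 32 < 6 / 27)   # True
```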
-
Arithmetic coding
[Figure: iterative decoding of a message]
-
Arithmetic coding
[Figure: iterative encoding of a message]
-
Arithmetic coding
[Figure: two symbols with occurrence probabilities px = 2/3 & py = 1/3]
-
Arithmetic coding
[Figure: three symbols with occurrence probabilities px = 2/3 & py = 1/3]
-
Arithmetic coding