UNIT-I
INFORMATION THEORY
1. INTRODUCTION
Communication theory deals with systems for transmitting information from one point to another.
Fig 1: Communication system
2. UNCERTAINTY, INFORMATION and ENTROPY
– Any information source produces an output that is random in nature, so the source output
is modeled as a discrete random variable S, which takes values in the set of symbols
S = { s0, s1, ..., sK-1 }
with probabilities P(S = sk) = pk, where k = 0, 1, ..., K-1.
– This set of probabilities must satisfy the condition
∑(k=0 to K-1) pk = 1
– The symbols emitted by the source during successive signaling intervals are statistically
independent. The source having this property is known as discrete memoryless source.
– The information associated with an event can be viewed in three ways:
o Uncertainty (before the event S = sk occurs)
o Surprise (when the event occurs)
o Information gain (after the event has occurred)
– The amount of information is related to the inverse of the probability of the occurrence.
– Information Gain or self information: The amount of information gained after observing
the event S = sk, which occurs with the probability pk is termed as,
I (sk) =log(1/ pk) = - log pk
– The Units of information I (sk) are determined by the base of the logarithm, which is
usually selected as 2 or e.
When the base is 2 – units are in bits
When the base is e – units are in nats (natural units)
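As a quick numeric sketch of the definition above (the probabilities are chosen purely for illustration):

```python
import math

def self_information(p):
    """Self-information I(s) = log2(1/p) in bits for an event of probability p."""
    return math.log2(1.0 / p)

print(self_information(1.0))   # a certain event: 0 bits
print(self_information(0.5))   # a fair coin flip: 1 bit
print(self_information(0.25))  # 2 bits: rarer events carry more information
```

Note that the information of two independent events adds, which is exactly why the logarithm appears in the definition.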
2.1 Properties of Information:
– Consider the event S = sk, the emission of symbol sk by the source with probability pk. Then:
1. I (sk) =0 , for pk =1
Outcome of an event is known before it occurs, therefore no information is gained.
2. I (sk) ≥ 0 for 0≤ pk ≤ 1
The occurrence of an event S = sk either provides some or no information.
3. I (sk) > I(si) for pk < pi
(i.e.) the less probable an event is, the more information we gain when it occurs.
4. I (sk, si) = I(sk) + I(si) if sk & si are statistically independent.
2.2 ENTROPY :
– It is a measure of average information content per source symbol.
– Denoted by H(S)
H(S) = E[I(sk)] = − ∑(k=0 to K−1) pk log2 pk bits/symbol
– The quantity H(S) is called the entropy of a discrete memoryless source with source letter S.
2.2.1 Properties of Entropy:
The entropy H(S) of such a source is bounded as follows:
0 ≤ H(S) ≤ log2 K
where K is the radix (number of symbols) of the source alphabet.
1. H(S) = 0, if and only if pk = 1 for some k, and pk = 0 otherwise.
This lower bound on entropy corresponds to no uncertainty.
2. H(S) = log2 K, if & only if pk =1/ K , for all k .
This upper bound on entropy corresponds to maximum uncertainty.
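Both bounds can be verified numerically; a minimal sketch (the example distributions and K = 4 are illustrative):

```python
import math

def entropy(probs):
    """Entropy H(S) = sum(pk * log2(1/pk)) in bits/symbol; zero-probability
    terms are skipped, following the convention 0 * log(0) = 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

K = 4
deterministic = [1.0, 0.0, 0.0, 0.0]   # lower bound: no uncertainty
uniform = [1.0 / K] * K                # upper bound: maximum uncertainty

print(entropy(deterministic))  # 0.0
print(entropy(uniform))        # log2(4) = 2.0
```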
2.2.2 Entropy of binary memoryless source:
– Consider a binary source for which symbol 0 occurs with probability p0 & symbol 1 with probability p1= 1- p0.
– Successive symbols emitted by the source are statistically independent (memoryless).
– The entropy of this source is H(S) = − p0 log2 p0 − (1 − p0) log2 (1 − p0) = H(p0), the entropy function of p0.
6. HUFFMAN CODING
– Each symbol of a given alphabet is assigned a sequence of bits according to the symbol probability.
– The Huffman tree is built by a bottom-up approach.
Procedure:
1. Calculate the probability of each symbol in the list.
2. Source symbols are listed in order of decreasing probability.
3. The two source symbols of lowest probability are assigned a 0 and a 1. This step is referred to as a
splitting stage.
4. These two source symbols are combined into a new source symbol with probability equal to
the sum of the two original probabilities and it is placed in the list according to its new
value.
5. Recursively apply steps 3 and 4, until each symbol has become a corresponding code leaf on
a tree.
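The five steps above can be sketched with a priority queue. This is an illustrative implementation, not the notation of any particular textbook; the tie-breaking counter is a choice of this sketch:

```python
import heapq
from itertools import count

def huffman_codes(prob_map):
    """Build Huffman codewords for a {symbol: probability} map, bottom-up."""
    tick = count()  # tie-breaker so heapq never compares tree nodes directly
    heap = [(p, next(tick), sym) for sym, p in prob_map.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)   # two lowest-probability entries
        p2, _, right = heapq.heappop(heap)  # (the "splitting stage")
        heapq.heappush(heap, (p1 + p2, next(tick), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):         # internal node: branch on 0 / 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                               # leaf: record the codeword
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

probs = {"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}
codes = huffman_codes(probs)
avg_len = sum(probs[s] * len(c) for s, c in codes.items())
print(codes, avg_len)  # average codeword length ≈ 2.2 bits/symbol
```

This alphabet is the one used in the problem below; ties can produce different trees, but every Huffman tree for it has the same average length of 2.2 bits/symbol.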
Problem:
The five symbols of the alphabet of a discrete memoryless source and their probabilities are {s0,
s1, s2, s3, s4} and {0.4, 0.2, 0.2, 0.1, 0.1} respectively. Compute the codewords of Huffman Code.
Also compute the entropy of the source.
Solution:
(The Huffman tree construction gives an average codeword length L = 2.2 bits/symbol and a source entropy H(S) = 2.12193 bits/symbol.)
Coding efficiency:
η = H(S) / L = 2.12193 / 2.2 = 0.96
The average codeword length satisfies the following source coding property,
H(S) ≤ L < H(S) + 1
2.12 ≤ 2.2 < 3.12
6.1 Properties of Huffman Coding
Huffman coding uses longer codewords for symbols with smaller probabilities and shorter codewords for symbols that often occur.
The two longest codewords differ only in the last bit.
The codewords are prefix codes and uniquely decodable.
It should satisfy Shannon's first theorem (source coding theorem), H(S) ≤ L < H(S) + 1.
6.2 Extended Huffman Coding:
We can encode a group of symbols together and get better performance.
The extended code should also satisfy Shannon's first theorem, H(S) ≤ L < H(S) + 1.
Problem:
Consider the source with alphabet A = {a1, a2, a3} and the probabilities p(a1) = 0.8, p(a2) = 0.02, p(a3) = 0.18.
Solution:
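The worked solution is not reproduced in the source. As a sketch of what extending the alphabet buys for this example, the following compares the average Huffman codeword length per symbol when coding single symbols versus pairs; the merge-and-sum shortcut computes the average length without building the codewords:

```python
import heapq
from itertools import product

def huffman_avg_length(probs):
    """Average Huffman codeword length via the merge trick: every time two
    subtrees are merged, each symbol below them gains one bit, so the sum of
    all merged-node probabilities equals the expected codeword length."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

p = [0.8, 0.02, 0.18]
single = huffman_avg_length(p)                                 # bits per symbol
pairs = huffman_avg_length([x * y for x, y in product(p, p)])  # bits per pair
print(single, pairs / 2)  # the extended code's rate is closer to H(S)
```

Coding pairs can never do worse than applying the single-symbol code twice, and for a skewed distribution like this one it does noticeably better.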
7. JOINT AND CONDITIONAL ENTROPY
7.1 Joint Entropy:
– The joint entropy H(X,Y) of a pair of discrete random variables (X,Y) with joint probability distribution p(x,y) is defined as
H(X,Y) = − ∑(x) ∑(y) p(x,y) log2 p(x,y)
7.2 Conditional Entropy:
– It is the amount of uncertainty remaining about the channel input X after the channel output Y has been observed.
– The conditional entropy H(X|Y) is defined as
H(X|Y) = − ∑(j=0 to J−1) ∑(k=0 to K−1) p(xj, yk) log2 p(xj | yk)
– The mutual information, considered next, is the average amount of information gained about X from observing the value of Y.
7.3 MUTUAL INFORMATION (M.I):
– The difference H(X) – H(X|Y) represents the uncertainty about the channel input that is
resolved by observing the channel output.
– Therefore the mutual information is termed as,
I(X;Y) = H(X) – H(X|Y)
Similarly, I(Y;X) = H(Y) – H(Y|X)
H(X) - is the entropy of the channel input X
H(X|Y) – is the conditional entropy of the channel input X after observing the channel
output Y
7.3.1 Properties of Mutual information :
Property 1 : The mutual information of a channel is symmetric; (i.e)
I(X;Y) = I(Y;X)
Proof:
I(X;Y) = H(X) − H(X|Y), where
H(X) = ∑(j=0 to J−1) p(xj) log2 [1 / p(xj)]        (1)
Multiplying the summand of equation 1 by ∑(k=0 to K−1) p(yk | xj) = 1 and using the joint probability p(xj, yk) = p(xj) p(yk | xj), we get
H(X) = ∑(j=0 to J−1) ∑(k=0 to K−1) p(xj, yk) log2 [1 / p(xj)]
Now substituting this and H(X|Y) into I(X;Y) = H(X) − H(X|Y), we obtain
I(X;Y) = ∑(j=0 to J−1) ∑(k=0 to K−1) p(xj, yk) log2 [ p(xj | yk) / p(xj) ]        (2)
From Bayes' rule for conditional probabilities,
p(xj | yk) / p(xj) = p(yk | xj) / p(yk)        (3)
Substituting equation 3 in 2, we get
I(X;Y) = ∑(j=0 to J−1) ∑(k=0 to K−1) p(xj, yk) log2 [ p(yk | xj) / p(yk) ] = I(Y;X)
Hence proved.
Property 2: The mutual information is always nonnegative, (i.e) I(X;Y) ≥ 0
Proof:
From the conditional probability p(xj | yk) = p(xj, yk) / p(yk), substituting this in equation 2, we get
I(X;Y) = ∑(j=0 to J−1) ∑(k=0 to K−1) p(xj, yk) log2 [ p(xj, yk) / (p(xj) p(yk)) ]
By applying the fundamental inequality of the logarithm directly, we obtain
I(X;Y) ≥ 0
I(X;Y) ≥ 0 means we cannot lose information, on the average, by observing the output of a channel.
I(X;Y) = 0 means the channel input and output are statistically independent.
Property 3: The mutual information of a channel is related to the joint entropy of the
channel input and channel output by,
I(X;Y) = H(X) + H(Y) – H(X,Y)
where H(X,Y) is the joint entropy,
H(X,Y) = − ∑(j=0 to J−1) ∑(k=0 to K−1) p(xj, yk) log2 p(xj, yk)
7.4 Chain Rule:
– The relationship between joint and conditional entropy is given as,
H(X,Y) = H(X) + H(Y|X)
H(Y,X) = H(Y) + H(X|Y)
Proof:
H(X,Y) = − ∑(x) ∑(y) p(x,y) log2 p(x,y)
Using p(x,y) = p(x) p(y|x),
H(X,Y) = − ∑(x) ∑(y) p(x,y) [ log2 p(x) + log2 p(y|x) ]
= − ∑(x) p(x) log2 p(x) − ∑(x) ∑(y) p(x,y) log2 p(y|x)
= H(X) + H(Y|X)
The second form, H(Y,X) = H(Y) + H(X|Y), follows in the same way from p(x,y) = p(y) p(x|y).
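The chain rule and the mutual-information identities can be checked numerically; a minimal sketch with a made-up 2x2 joint distribution:

```python
import math

# A small joint distribution p(x, y); rows index x, columns index y.
# (These numbers are invented purely for illustration.)
P = [[0.25, 0.25],
     [0.10, 0.40]]

def H(probs):
    """Entropy in bits of a probability list, skipping zero entries."""
    return sum(q * math.log2(1.0 / q) for q in probs if q > 0)

px = [sum(row) for row in P]                                 # marginal p(x)
py = [sum(row[k] for row in P) for k in range(len(P[0]))]    # marginal p(y)
Hxy = H([q for row in P for q in row])                       # joint entropy H(X,Y)
Hy_given_x = sum(sum(row) * H([q / sum(row) for q in row]) for row in P)
I = H(px) + H(py) - Hxy                                      # mutual information

# Chain rule: H(X,Y) = H(X) + H(Y|X); and I(X;Y) = H(Y) - H(Y|X).
print(Hxy, H(px) + Hy_given_x, I)
```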
8. DISCRETE MEMORYLESS CHANNELS:
– A discrete memoryless channel is a statistical model with an input of X and output of Y
which is a noisy version of X (here both are random variables)
– In each time slot, the channel accepts an input symbol X selected from a given alphabet X = {x0, x1, ..., xJ−1} and it emits an output symbol Y from an alphabet Y = {y0, y1, ..., yK−1}.
– The channel is said to be “discrete” when both of the alphabets have finite sizes.
– It is said to be "memoryless" when the current output symbol depends only on the current input symbol and not on any earlier inputs or outputs.
– Also, the input alphabet X and output alphabet Y need not have the same size.
– A discrete memoryless channel is described by arranging the various transition probabilities of the
channel in the form of a matrix as follows:
P = [ p(yk | xj) ], with rows j = 0, 1, ..., J−1 and columns k = 0, 1, ..., K−1
– The J-by-K matrix P is called channel matrix or transition matrix.
– The fundamental property of the channel matrix P is that the sum of the elements along any row
of the matrix is always equal to 1:
∑(k=0 to K−1) p(yk | xj) = 1, for all j
– The joint probability distribution of the random variables X and Y is given by
p(xj, yk) = P( X = xj, Y = yk )
= P(Y = yk | X = xj) p(xj)
= p(yk | xj) p(xj)        (8.1)
– The marginal probability distribution of the output random variable Y is obtained by
averaging out the dependence of p(xj ,yk) on xj as shown by
p(yk) = P(Y = yk)
= ∑(j=0 to J−1) P(Y = yk | X = xj) p(xj)
= ∑(j=0 to J−1) p(yk | xj) p(xj), for k = 0, 1, ..., K−1        (8.2)
– The probabilities p(xj ) for j= 0, 1,…….J-1, are known as the a priori probabilities of the
various input symbols.
– Equation 8.2 states that, given the a priori probabilities p(xj) and the channel matrix
[ p(yk | xj) ], we may calculate the output probabilities p(yk).
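Equation 8.2 is a matrix-vector product in disguise; a small sketch with a hypothetical 2-input, 3-output channel (the probabilities are invented for illustration):

```python
# Equation 8.2 as code: output probabilities from the priors and the
# channel matrix (J = 2 inputs, K = 3 outputs, values chosen arbitrarily).
P_channel = [[0.7, 0.2, 0.1],   # row j holds p(yk | xj); each row sums to 1
             [0.1, 0.3, 0.6]]
priors = [0.5, 0.5]             # a priori probabilities p(xj)

p_y = [sum(priors[j] * P_channel[j][k] for j in range(len(priors)))
       for k in range(len(P_channel[0]))]
print(p_y)  # ≈ [0.4, 0.25, 0.35]
```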
9. CHANNEL CAPACITY :
– Capacity of a channel is defined as the intrinsic ability of the channel to convey information.
– The channel capacity of a discrete memoryless channel is the maximum of the mutual information
I(X;Y) in any single use of the channel, where the maximization is over all possible input
probability distributions { p(xj) } on X.
– Channel capacity is denoted by C:
C = max over { p(xj) } of I(X;Y)
– The channel capacity C is measured in bits per channel use, or bits per transmission.
9.1 BINARY SYMMETRIC CHANNEL:
– It is the special case of the discrete memoryless channel with J=K=2.
– The channel has two input symbols (x0 = 0, x1 = 1) and two output symbols (y0 = 0, y1 =1 )
– The channel is symmetric because the probability of receiving a 1 if a 0 is sent is the same
as the probability of receiving a 0 if a 1 is sent .
– Conditional probability of error is denoted by p.
Fig 3: Transition Probability diagram of Binary symmetric channel
Channel Capacity for Binary Symmetric channel:
– Consider the binary symmetric channel which is described by the transition probability
diagram fig 3.
– This diagram is defined by the conditional probability of error p.
– The entropy H(X) is maximized when the channel input probability p(x0)=p(x1)= 1/2.
– The mutual information I(X;Y) is maximized by the same input distribution, so the capacity can be written as
C = I(X;Y) evaluated at p(x0) = p(x1) = 1/2        (9.1)
– From fig 3, p(y0 | x1) = p(y1 | x0) = p and
p(y0 | x0) = p(y1 | x1) = 1 − p
– Substituting these channel transition probabilities into the expression for I(X;Y)
with J = K = 2, and setting the input probabilities p(x0) = p(x1) = 1/2 in accordance with
equation 9.1, the capacity of the binary symmetric channel is
C = 1 + p log2 p + (1 − p) log2 (1 − p)        (9.2)
– By using the entropy function
H(p0) = − p0 log2 (p0) − (1 − p0) log2 (1 − p0)
equation 9.2 can be reduced to
C = 1 − H(p)
– Thus the channel capacity for the binary symmetric channel is C = 1 − H(p).
– The Channel capacity C varies with the probability of error p as shown in fig 4.
Observations:
When p=0, the channel is noise free. (i.e) the channel capacity C attains its maximum
value of 1 bit per channel use, which is exactly the information in each channel input. At
this value of p, the entropy function H(p) attains its minimum value of zero.
When p=1/2 due to noise, the channel capacity C attains its minimum value of 0,
whereas H(p) attains its maximum value of one. In such a case the channel is said to be
useless.
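These observations can be checked directly from C = 1 − H(p); a minimal sketch:

```python
import math

def bsc_capacity(p):
    """Channel capacity C = 1 - H(p) of a binary symmetric channel
    with transition (error) probability p."""
    if p in (0.0, 1.0):
        return 1.0  # noise-free (or deterministic bit-flip) channel
    Hp = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return 1.0 - Hp

# C falls from 1 bit/use at p = 0 to 0 at p = 1/2 (a useless channel).
for p in (0.0, 0.1, 0.5):
    print(p, bsc_capacity(p))
```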
9.2 CHANNEL CODING THEOREM ( SHANNON’S SECOND THEOREM):
– The design goal of channel coding is to increase the resistance of the communication
systems to channel noise.
– Channel coding consists of mapping the incoming data sequence into a channel input
sequence and inverse mapping the channel output sequence into an output data sequence,
so that the channel noise of the system is minimized.
– Mapping and inverse mapping operations are performed by encoders and decoders.
– The channel encoder and decoder should be designed to optimize the overall reliability of a
communication system.
– Block Codes: The message sequence is divided into sequential blocks, each k bits long.
– Code Rate: Each k-bit block is mapped into an n-bit block by the channel coder, where
n > k. The ratio r = k/n is called the code rate, where r is less than unity.
– The discrete memoryless source has a source alphabet S and entropy H(S) bits/source symbol.
The source emits one symbol every Ts seconds. Hence the average information rate
of the source is H(S)/Ts bits/second.
– The discrete memoryless channel has a channel capacity equal to C bits per use of the channel.
– The channel is capable of being used once every Tc seconds. Hence the channel capacity
per unit time is C/Tc bits/second, which represents the maximum rate of information
transfer over the channel.
– The channel coding theorem for a discrete memoryless channel is stated in two parts:
1. If H(S)/Ts ≤ C/Tc, where C/Tc is called the critical rate, there exists a coding scheme
such that the source output can be transmitted over the channel and reconstructed with an
arbitrarily small probability of error.
2. If H(S)/Ts > C/Tc, it is not possible to transmit the source output over the channel
with an arbitrarily small probability of error.
C = channel capacity, Ts and Tc = signaling intervals of the source and the channel, H(S) = source entropy.
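A quick numeric check of the two parts of the theorem (all rates here are made-up illustrative values):

```python
def reliable_transmission_possible(H_S, Ts, C, Tc):
    """Part 1 of the channel coding theorem: reliable transmission
    requires the source rate H(S)/Ts not to exceed the critical rate C/Tc."""
    return H_S / Ts <= C / Tc

# Source: 2 bits/symbol, one symbol every 1 ms -> 2000 bits/s.
# Channel: 0.5 bit per use, one use every 0.2 ms -> 2500 bits/s.
print(reliable_transmission_possible(2.0, 1e-3, 0.5, 0.2e-3))  # True
# Slowing the channel to one use every 0.4 ms gives 1250 bits/s < 2000 bits/s.
print(reliable_transmission_possible(2.0, 1e-3, 0.5, 0.4e-3))  # False
```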
Drawbacks:
– It does not show us how to construct a good code.
– It does not give a precise result for the probability of symbol error after decoding the channel
output.
11. SHANNON LIMIT:
– Shannon showed that any communications channel such as a telephone line, a radio band, a
fiber-optic cable could be characterized by two factors:
1. bandwidth 2. noise
– Bandwidth is the range of electronic, optical or electromagnetic frequencies that can be used
to transmit a signal;
– Noise is anything that can disturb that signal.
– Given a channel with particular bandwidth and noise characteristics, Shannon showed how
to calculate the maximum rate at which data can be sent without error.
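The rate Shannon showed how to calculate is the Shannon-Hartley capacity C = B log2(1 + S/N); a small sketch (the bandwidth and SNR values are hypothetical):

```python
import math

def shannon_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley limit C = B * log2(1 + S/N) in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# A telephone-line-like channel: about 3 kHz of bandwidth at 30 dB SNR.
snr = 10 ** (30 / 10)               # 30 dB -> S/N = 1000
print(shannon_capacity(3000, snr))  # roughly 3.0e4 bits/s
```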