EE515A – Information Theory II, Spring 2012
Prof. Jeff Bilmes
University of Washington, Seattle, Department of Electrical Engineering
Spring Quarter, 2012
http://j.ee.washington.edu/~bilmes/classes/ee515a_spring_2012/
Lecture 19 - March 27th, 2012

Outstanding Reading
Read all chapters assigned from IT-I (EE514, Winter 2012).
Read chapter 8 in the book (but what book you might ask? You'll soon see if you don't know).
Information Theory I and II
This two-quarter course will be a thorough introduction to information theory.
Information Theory I (EE514, Winter 2012): entropy, mutual information, asymptotic equipartition properties, data compression to the entropy limit (source coding theorem), Huffman, communication at the channel capacity limit (channel coding theorem), method of types, arithmetic coding, Fano codes.
Information Theory II (EE515, Spring 2012): Lempel-Ziv, convolutional codes, differential entropy, maximum entropy, ECC, turbo, LDPC and other codes, Kolmogorov complexity, spectral estimation, rate-distortion theory, alternating minimization for computation of the RD curve and channel capacity, more on the Gaussian channel, network information theory, information geometry, and some recent results on the use of polymatroids in information theory.
Additional topics throughout will include information theory as it is applicable to pattern recognition, natural language processing, computer science and complexity, biological science, and communications.
our web page (http://j.ee.washington.edu/~bilmes/classes/ee515a_spring_2012/)
our dropbox (https://catalyst.uw.edu/collectit/dropbox/bilmes/21171), which is where all homework will be due, electronically, in PDF format. No paper homework accepted.
our discussion board (https://catalyst.uw.edu/gopost/board/bilmes/27386/), which is where you can ask questions. Please use this rather than email so that all can benefit from answers to your questions.
Your task is to give a 15-20 minute presentation that summarizes 2-3 related and significant papers that come from IEEE Transactions on Information Theory.
The papers must not be ones that we covered in class, although they can be related.
You need to do the research to find the papers yourself (i.e., that is part of the assignment).
The papers must have been published in the last 10 years (so no old or classic papers).
Your grade will be based on how clear, understandable, and accurate your presentation is.
This is a real challenge and will require significant work! Many of the papers are complex. To get a good grade, you will need to work to present complex ideas in a simple way.
“A First Course in Information Theory”, Raymond W. Yeung, 2002. Very good chapters on information measures and network information theory.
“Elements of Information Theory”, Thomas M. Cover and Joy A. Thomas, 1991. Q360.C68 (first edition of our book).
“Information Theory and Reliable Communication”, Robert G. Gallager, 1968. Q360.G3 (classic text by a foundational researcher).
“Information Theory and Statistics”, Solomon Kullback, 1968. Math QA276.K8.
“Information Theory: Coding Theorems for Discrete Memoryless Systems”, Imre Csiszár and János Körner, 1981. Q360.C75 (another key book, but a little harder to read).
“Information Theory, Inference, and Learning Algorithms”, David J.C. MacKay, 2003.
Shannon’s general model of communication:
[Figure: source → source encoder → channel encoder → channel (with noise) → channel decoder → source decoder → receiver]
This has been our guiding principle so far, and will continue to guide us for a while (a key exception is network information theory, which we will cover, where the above picture may become an arbitrary directed graph with multiple sources/receivers).
So far the channel has been discrete, but real-world channels are not discrete; we thus need differential entropy.
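For reference (this is not spelled out on the slide; it is the standard chapter 8 definition from Cover & Thomas): the differential entropy of a continuous random variable X with density f and support set S is
$$h(X) = -\int_{S} f(x) \log f(x)\, dx,$$
the continuous analogue of $H(X) = -\sum_x p(x) \log p(x)$.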
Note: continuous (differential) entropy can be either positive or negative.
How can entropy (which we know to mean “uncertainty”, or“information”) be negative?
In fact, entropy (as we’ve seen perhaps once or twice) can be interpreted as the exponent of the “volume” of a typical set.
Example: $2^{H(X)}$ is the number of things that happen, on average, and we can have $2^{H(X)} \le |\mathcal{X}|$. Consider a uniform r.v. $Y$ such that $2^{H(X)} = |\mathcal{Y}|$. Thus, having a negative exponent just means the volume is small.
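As a quick numerical sketch of this point (my own illustration, not from the slides; it assumes the standard Gaussian formula $h(X) = \frac{1}{2}\log_2(2\pi e \sigma^2)$ bits), the differential entropy of a Gaussian goes negative once σ is small, and $2^{h(X)}$ then just behaves like a small effective “width” of the typical set:

import numpy as np

# Differential entropy (in bits) of a univariate Gaussian N(0, sigma^2):
# h(X) = 0.5 * log2(2*pi*e*sigma^2)
def gaussian_h_bits(sigma):
    return 0.5 * np.log2(2 * np.pi * np.e * sigma**2)

for sigma in [10.0, 1.0, 0.1, 0.01]:
    h = gaussian_h_bits(sigma)
    # 2**h is the "volume" (length) of the typical set; small sigma => negative h, small volume
    print(f"sigma={sigma:6.2f}   h(X)={h:8.3f} bits   2^h(X)={2**h:10.4f}")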
This makes sense. We start with a continuous random variable X and quantize it at n-bit accuracy.
For a discrete representation to represent $2^n$ values, we expect the entropy to go up with n; as n gets large, so would the entropy, but adjusted by $h(X)$.
$H(X^\Delta)$ is the number of bits needed to describe this n-bit, equally spaced quantization of the continuous random variable X.
$H(X^\Delta) \approx h(f) + n$ says that it might take either more than or fewer than n bits to describe X at n-bit accuracy.
If X is very concentrated ($h(f) < 0$), then fewer than n bits; if X is very spread out, then more than n bits.
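A small empirical check of this relation (my own sketch, not from the slides; it assumes f is a standard Gaussian): quantize samples to bins of width $\Delta = 2^{-n}$, estimate the discrete entropy of the bin index, and compare it to $h(f) + n$.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(2_000_000)             # samples from f = N(0, 1)
h_f = 0.5 * np.log2(2 * np.pi * np.e)          # differential entropy of N(0,1), in bits

for n in [2, 4, 6]:
    delta = 2.0 ** (-n)                          # bin width for n-bit accuracy
    bins = np.floor(x / delta).astype(np.int64)  # the quantized variable X^Delta (bin index)
    _, counts = np.unique(bins, return_counts=True)
    p = counts / counts.sum()
    H_disc = -(p * np.log2(p)).sum()             # empirical discrete entropy H(X^Delta)
    print(f"n={n}:  H(X^Delta) ~ {H_disc:.3f}   h(f) + n = {h_f + n:.3f}")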
For discrete entropy, we have monotonicity, i.e., $H(X_1, X_2, \ldots, X_k) \le H(X_1, X_2, \ldots, X_k, X_{k+1})$. More generally,
$f(A) = H(X_A)$   (42)
is monotone non-decreasing in the set A (i.e., $f(A) \le f(B)$ for all $A \subseteq B$).
Is $f(A) = h(X_A)$ monotone? No: consider a Gaussian with diagonal $\Sigma$ having small diagonal values. Then $h(X) = \frac{1}{2} \log\big[(2\pi e)^n |\Sigma|\big]$ can get smaller as more random variables are added.
Similarly, adding independent variables with negative differential entropy can decrease the overall entropy.
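A tiny sketch of this non-monotonicity (my own example, using the Gaussian formula from this slide, in bits): append an independent coordinate with very small variance and the joint differential entropy drops.

import numpy as np

# Differential entropy (bits) of an n-dimensional Gaussian with covariance Sigma:
# h(X) = 0.5 * log2((2*pi*e)^n * det(Sigma))
def gaussian_h_bits(Sigma):
    n = Sigma.shape[0]
    return 0.5 * np.log2((2 * np.pi * np.e) ** n * np.linalg.det(Sigma))

Sigma_A = np.eye(2)                          # X_A: two independent unit-variance coordinates
Sigma_B = np.diag([1.0, 1.0, 1e-4])          # X_B: add an independent, tiny-variance coordinate

print("h(X_A) =", gaussian_h_bits(Sigma_A))  # about +4.09 bits
print("h(X_B) =", gaussian_h_bits(Sigma_B))  # about -0.50 bits: smaller, even though A is a subset of B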