Collaborators: Tomas Gedeon, Alexander Dimitrov, John P. Miller, Zane Aldworth

Information Theory and Neural Coding
PhD Oral Examination, November 29, 2001
Albert E. Parker
Complex Biological Systems
Department of Mathematical Sciences
Center for Computational Biology
Montana State University
Outline
• The Problem
• Our Approach
  o Build a Model: Probability and Information Theory
  o Use the Model: Optimization
• Results
• Bifurcation Theory
• Future Work
Why are we interested in neural coding?
• We are computationalists: All computations underlying an animal's behavioral decisions are carried out within the context of neural codes.
• Neural prosthetics: to enable a silicon device (artificial retina, cochlea, etc.) to interface with the human nervous system.
Neural Coding and Decoding
The Problem: Determine a coding scheme: How does neural activity represent information about environmental stimuli?
Demands:
• An animal needs to recognize the same object on repeated exposures. Coding has to be deterministic at this level.
• The code must deal with uncertainties introduced by the environment and neural architecture. Coding is by necessity stochastic at this finer scale.
Major Obstacle: The search for a coding scheme requires large amounts of data
How to determine a coding scheme?
Idea: Model a part of a neural system as a communication channel using Information Theory. This model enables us to:
• Meet the demands of a coding scheme:
  o Define a coding scheme as a relation between stimulus and neural response classes.
  o Construct a coding scheme that is stochastic on the finer scale yet almost deterministic on the classes.
• Deal with the major obstacle:
  o Use whatever quantity of data is available to construct coarse but optimally informative approximations of the coding scheme.
  o Refine the coding scheme as more data becomes available.
Probability Framework (coding scheme ~ encoder)

(1) Want to Find: The Encoder Q(Y|X)

X (environmental stimuli) --Q(Y|X)--> Y (neural responses)

The coding scheme between X and Y is defined by the conditional probability Q.
Probability Framework (elements of the respective probability spaces)

(2) We have data: realizations of the r.v.'s X and Y, i.e. pairs (stimulus X = x, neural response Y = y), related by Q(Y=y|X=x).

X: environmental stimuli in $\mathbb{R}^k$, 25 ms windows over discretized time.
Y: neural responses in $\{0,1\}^k$, 10 ms windows over discretized time.

We assume that $X_n(\omega) = X(T^n \omega)$ and $Y_n(\omega) = Y(T^n \omega)$ are stationary ergodic r.v.'s, where T is a time shift.
[Figure: the joint space of environmental stimuli (X) and neural responses (Y), with regions labeled 1 through 4.]
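To make the response space concrete, here is a small sketch (ours, not from the talk; the spike times and the 1 ms bin width are invented for illustration) of discretizing a spike train into binary words in $\{0,1\}^k$ with k = 10 bins per 10 ms window:

```python
import numpy as np

# Invented spike times (ms) for one trial.
spikes = np.array([1.3, 4.7, 5.2, 12.8, 18.1, 19.6])

bin_ms, window_ms, total_ms = 1.0, 10.0, 20.0
counts, _ = np.histogram(spikes, bins=int(total_ms / bin_ms), range=(0, total_ms))
binary = (counts > 0).astype(int)     # discretized time: one {0,1} entry per bin

k = int(window_ms / bin_ms)           # word length k = 10
words = binary.reshape(-1, k)         # each row is a response word in {0,1}^k
print(words)
```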
Overview of Our Approach
How to determine stimulus/response classes? Given a joint probability p(X,Y):
The Stimulus and Response Classes
[Figure: distinguishable stimulus/response classes: the joint space of environmental stimuli (X) and neural responses (Y) partitioned into four classes, labeled 1 through 4.]
Information Theory: The Foundation of the Model
• A signal x is produced by a source (r.v.) X with a probability p(X=x). A signal y is produced by another source Y with probability p(Y=y).
• A communication channel is a relation between two r.v.’s X and Y. It is described by the (finite) conditional probability or quantizer: Q(Y | X).
• Entropy: the uncertainty or self-information of a r.v.:
$$H(X) = E_X \log \frac{1}{p(X)}$$
• Conditional Entropy:
$$H(Y|X) = -E_{X,Y} \log Q(Y|X)$$
• Mutual Information: the amount of information that one r.v. contains about another r.v.:
$$I(X;Y) = E_{X,Y} \log \frac{p(X,Y)}{p(X)\,p(Y)} = H(X) + H(Y) - H(X,Y)$$
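As a numerical check of these definitions, here is a minimal sketch (ours, not from the talk) that computes H(X), H(Y|X), and I(X;Y) in bits for a small invented joint distribution `p_xy`, and verifies the identity I(X;Y) = H(X) + H(Y) - H(X,Y):

```python
import numpy as np

# Invented joint distribution p(X, Y): 2 stimuli x 3 responses.
p_xy = np.array([[0.30, 0.10, 0.10],
                 [0.05, 0.15, 0.30]])

p_x = p_xy.sum(axis=1)               # marginal p(X)
p_y = p_xy.sum(axis=0)               # marginal p(Y)
q_yx = p_xy / p_x[:, None]           # the encoder Q(Y|X)

H_x   = -np.sum(p_x * np.log2(p_x))                        # H(X)
H_y   = -np.sum(p_y * np.log2(p_y))                        # H(Y)
H_xy  = -np.sum(p_xy * np.log2(p_xy))                      # H(X,Y)
H_y_x = -np.sum(p_xy * np.log2(q_yx))                      # H(Y|X) = -E log Q(Y|X)
I_xy  = np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y)))  # I(X;Y)

assert np.isclose(I_xy, H_x + H_y - H_xy)                  # the identity above
print(H_x, H_y_x, I_xy)
```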
Why Information Theory?

The entropy and mutual information of the data asymptotically approach the true population entropy and mutual information, respectively.

Shannon-McMillan-Breiman Theorem (i.i.d. case)

If $\{X_i\}_{i=1}^{n}$ are i.i.d., then
$$\lim_{n \to \infty} -\frac{1}{n} \log p(X_0, X_1, \ldots, X_{n-1}) = H(X) \quad \text{a.s.}$$

Proof: The $Y_i = \log p(X_i)$ are i.i.d., so the theorem follows from the Strong Law of Large Numbers. □

This result also holds if $\{X_i\}_{i=1}^{n}$ is a stationary ergodic sequence. This is important for us since our data are not i.i.d., but we do assume that X and Y are stationary ergodic.
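A quick numerical illustration (ours) of the i.i.d. case: sampling from an invented four-symbol source, the quantity $-\frac{1}{n}\log_2 p(X_0,\ldots,X_{n-1}) = -\frac{1}{n}\sum_i \log_2 p(X_i)$ approaches H(X) as n grows:

```python
import numpy as np

rng = np.random.default_rng(0)

p = np.array([0.5, 0.25, 0.15, 0.10])    # invented source distribution
H = -np.sum(p * np.log2(p))              # true entropy H(X)

for n in (100, 10_000, 1_000_000):
    x = rng.choice(len(p), size=n, p=p)  # n i.i.d. samples
    est = -np.mean(np.log2(p[x]))        # -(1/n) log2 p(X_0, ..., X_{n-1})
    print(n, est, H)
```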
Conceptual Model

Major Obstacle: To determine a coding scheme, Q, between X and Y requires large amounts of data.

Idea: Determine the coding scheme, Q*, between X and Y_N, a quantization of Y, such that Y_N preserves as much mutual information with X as possible:

X (environmental stimuli) --Q(Y|X)--> Y (neural responses) --q*(Y_N|Y)--> Y_N (quantized neural responses); the composition Q*(Y_N|X) is the induced coding scheme.

New Goal: Find the quantizer q*(Y_N|Y) that maximizes I(X;Y_N).
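A sketch (ours; the joint distribution and the function names are invented) of how a quantizer q(Y_N|Y), stored as a |Y| x N row-stochastic matrix, induces the joint $p(x, y_N) = \sum_y p(x,y)\, q(y_N|y)$ and hence I(X;Y_N):

```python
import numpy as np

def mutual_information(p_joint):
    """I in bits for a joint distribution given as a 2-D array (zeros allowed)."""
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_b = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return np.sum(p_joint[mask] * np.log2(p_joint[mask] / (p_a * p_b)[mask]))

def quantized_information(p_xy, q):
    """I(X;Y_N) induced by q(Y_N|Y): p(x, y_N) = sum_y p(x, y) q(y_N|y)."""
    return mutual_information(p_xy @ q)

# Invented example: 2 stimuli, 4 responses, quantized into N = 2 classes.
p_xy = np.array([[0.20, 0.15, 0.10, 0.05],
                 [0.05, 0.10, 0.15, 0.20]])
q = np.array([[1.0, 0.0],                 # rows: y; columns: y_N; rows sum to 1
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])
print(quantized_information(p_xy, q), mutual_information(p_xy))
```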
Mathematical Model

The space of quantizers is
$$\Delta := \Big\{ q(y_N|y) \;\Big|\; \sum_{y_N \in Y_N} q(y_N|y) = 1 \text{ and } q(y_N|y) \ge 0 \;\; \forall y \in Y \Big\}$$

We search for the maximizer q*(Y_N|Y) which satisfies
max_q H(Y_N|Y) constrained by I(X;Y_N) = I_max,
where I_max := max_q I(X;Y_N).

The feasible region Δ assures that q*(Y_N|Y) is a true conditional probability.

Δ = Δ_{y_1} × Δ_{y_2} × Δ_{y_3} × ⋯ is a product of simplices (each simplex Δ_y is a discrete probability space).
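A small sketch (ours; the sizes are invented) of drawing a random point of Δ, the product of |Y| simplices, and checking that it is a true conditional probability:

```python
import numpy as np

rng = np.random.default_rng(1)
num_y, N = 4, 3                    # |Y| responses, N quantization classes

# Each row q[y, :] lies in its own simplex Delta_y (a discrete probability space).
q = rng.dirichlet(np.ones(N), size=num_y)

assert np.all(q >= 0)                      # q(y_N|y) >= 0
assert np.allclose(q.sum(axis=1), 1.0)     # sum_{y_N} q(y_N|y) = 1 for every y
```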
We begin our search for the maximizer q*(Y_N|Y) by solving

① q* = argmax_q I(X;Y_N)

② If there are multiple solutions to ①, then, by Jaynes' maximum entropy principle, we take the one that maximizes the entropy:

③ max_q H(Y_N|Y) constrained by I(X;Y_N) = I_max

In order to solve ③, use the method of Lagrange multipliers to get

max_q H(Y_N|Y) + β I(X;Y_N)

④ Annealing: In practice, we increment β in small steps from 0 toward ∞. For each β, we solve

q*_β = argmax_q H(Y_N|Y) + β I(X;Y_N)

Note that lim_{β→∞} q*_β = q* from ③.
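To see the trade-off that annealing sweeps through, here is a sketch (ours; the toy distribution is invented) evaluating $F_\beta(q) = H(Y_N|Y) + \beta I(X;Y_N)$ for the uniform quantizer (maximal entropy, zero information) and a deterministic split (zero entropy, maximal information): small β favors the former, large β the latter.

```python
import numpy as np

def F(q, p_xy, beta):
    """Annealing objective H(Y_N|Y) + beta * I(X;Y_N), in bits."""
    p_y = p_xy.sum(axis=0)
    qq = np.maximum(q, 1e-300)                    # guard log(0); 0*log(0) -> 0
    H_cond = -np.sum(p_y[:, None] * q * np.log2(qq))
    p_xn = p_xy @ q                               # induced joint p(x, y_N)
    px = p_xn.sum(axis=1, keepdims=True)
    pn = p_xn.sum(axis=0, keepdims=True)
    mask = p_xn > 0
    I = np.sum(p_xn[mask] * np.log2(p_xn[mask] / (px * pn)[mask]))
    return H_cond + beta * I

p_xy = np.array([[0.20, 0.15, 0.10, 0.05],        # invented toy p(X, Y)
                 [0.05, 0.10, 0.15, 0.20]])
q_uniform = np.full((4, 2), 0.5)                  # max H(Y_N|Y), I = 0
q_split = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
for beta in (0.0, 1.0, 10.0):
    print(beta, F(q_uniform, p_xy, beta), F(q_split, p_xy, beta))
```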
Justification
Some nice properties of the model: the information functions are nice.

Theorem 1: H(Y_N|Y) is concave and I(X;Y_N) is convex.

Δ is really nice.

Lemma 2: Δ is the convex hull of vertices(Δ).

We can reformulate ① as two different optimization problems.

Theorem 3: An equivalent problem to ① is to solve
$$q^*(Y_N|Y) = \operatorname{argmax}_{q \in \text{vertices}(\Delta)} I(X;Y_N)$$

Proof: This result follows from Theorem 1 and Lemma 2. □

Corollary 4: The extrema of ① lie on the vertices of Δ.

Theorem 5: If q*_M is the maximizer of
$$\max_{q \in \Delta_M} H(Y_N|Y) \quad \text{constrained by } I(X;Y_N) = I_{max},$$
where
$$\Delta_M := \Big\{ q(y_N|y) \;\Big|\; \sum_{y_N} q(y_N|y) = M \text{ and } q(y_N|y) \ge 0 \;\; \forall y \in Y \Big\},$$
then q* = (1/M) q*_M.

Proof: By Theorem 3 and the fact that Δ_M is the convex hull of vertices(Δ_M). □
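A brute-force sketch of the vertex search suggested by Theorem 3 (ours; only viable at toy sizes, since |vertices(Δ)| = N^{|Y|}): each vertex of Δ is a deterministic quantizer, i.e., an assignment of each response y to a single class y_N.

```python
import numpy as np
from itertools import product

p_xy = np.array([[0.20, 0.15, 0.10, 0.05],       # invented joint p(X, Y)
                 [0.05, 0.10, 0.15, 0.20]])
num_y, N = p_xy.shape[1], 2

def info(p_xn):
    """I(X;Y_N) in bits for a joint 2-D array."""
    px = p_xn.sum(axis=1, keepdims=True)
    pn = p_xn.sum(axis=0, keepdims=True)
    mask = p_xn > 0
    return np.sum(p_xn[mask] * np.log2(p_xn[mask] / (px * pn)[mask]))

best = max(
    (info(p_xy @ np.eye(N)[list(assign)]), assign)  # each vertex: a 0/1 matrix
    for assign in product(range(N), repeat=num_y)   # all N^{|Y|} assignments
)
print(best)  # (maximal I(X;Y_N), class assigned to each y)
```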
Optimization Schemes

Goal: Build a sequence $\{q_n\}_{n=1}^{\infty} \to q^*$.

Annealing:
$$q^*_\beta = \operatorname{argmax}_q \; H(Y_N|Y) + \beta I(X;Y_N)$$

Implicit solution: Set the gradient with respect to q equal to zero and solve implicitly for q:
$$q_{n+1}(y_N|y) = \frac{\exp\!\left(\frac{\beta}{p(y)} \left.\frac{\partial I}{\partial q(y_N|y)}\right|_{q_n}\right)}{\sum_{y_N} \exp\!\left(\frac{\beta}{p(y)} \left.\frac{\partial I}{\partial q(y_N|y)}\right|_{q_n}\right)}$$
Drawback: the current choice of the β-increments is ad hoc.

Augmented Lagrangian Method

Vertex Search Algorithm:
$$\max_{q \in \text{vertices}(\Delta)} I(X;Y_N)$$
Drawback: $|\text{vertices}(\Delta)| = N^{|Y|}$
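A sketch of the annealing scheme driven by the implicit solution above, used as a fixed-point iteration (ours, not the talk's exact algorithm: the toy distribution, β-schedule, iteration counts, and tolerance are all invented, and natural logs are used so the update matches the exponential form directly):

```python
import numpy as np

def anneal(p_xy, N=2, betas=np.linspace(0.1, 50.0, 100), iters=200):
    """Track q*_beta = argmax H(Y_N|Y) + beta*I(X;Y_N) as beta increases."""
    rng = np.random.default_rng(0)
    num_y = p_xy.shape[1]
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    q = rng.dirichlet(np.ones(N), size=num_y)     # start in the interior of Delta
    for beta in betas:
        for _ in range(iters):
            J = p_xy @ q                          # induced joint p(x, y_N)
            p_n = np.maximum(J.sum(axis=0), 1e-300)
            # dI/dq(y_N|y) = sum_x p(x,y) ln[ p(x,y_N) / (p(x) p(y_N)) ]
            # (up to a per-y constant, absorbed below by the normalization)
            dI = p_xy.T @ np.log(np.maximum(J, 1e-300) / (p_x[:, None] * p_n))
            z = (beta / p_y[:, None]) * dI
            z -= z.max(axis=1, keepdims=True)     # stabilize the exponentials
            q_new = np.exp(z)
            q_new /= q_new.sum(axis=1, keepdims=True)
            if np.allclose(q_new, q, atol=1e-10):
                q = q_new
                break
            q = q_new
    return q

p_xy = np.array([[0.20, 0.15, 0.10, 0.05],        # invented toy p(X, Y)
                 [0.05, 0.10, 0.15, 0.20]])
print(anneal(p_xy).round(3))   # near-deterministic classes at large beta
```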
[Figure: the annealing objective (H + βI)(q(y_N|y)) over the feasible region Δ = Δ_{y_1} × Δ_{y_2} × Δ_{y_3}.]
Results: Application to synthetic data (p(X,Y) is known)