Page 1: Encoding or decoding

Encoding or decoding

Page 2: Encoding or decoding

How well can we learn what the stimulus is by looking at the neural responses? We will discuss two approaches:

• devise and evaluate explicit algorithms for extracting a stimulus estimate

• directly quantify the relationship between stimulus and response using information theory

Decoding

Page 3: Encoding or decoding

Let’s start with a rate response, r(t), and a stimulus, s(t). The optimal linear estimator comes closest to satisfying

Want to solve for K. Multiply by s(t-t’) and integrate over t:

The optimal linear estimator

Page 4: Encoding or decoding

produced terms which are simply correlation functions:

Since this is a convolution, take the Fourier transform:

Now we have a straightforward algebraic equation for K(ω):

Solving for K(t),

The optimal linear estimator
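The equations on this and the previous slide did not survive the transcript. A sketch of one standard way to fill them in, written with the kernel applied to the stimulus (the convention that makes the white-noise result on the next slide come out directly; the same steps apply with r and s exchanged for the decoding direction):

\[ r_{\mathrm{est}}(t) = \int d\tau\, K(\tau)\, s(t-\tau) \]

Multiplying by s(t − t′) and integrating over t turns both sides into correlation functions,

\[ C_{rs}(t') = \int d\tau\, K(\tau)\, C_{ss}(t'-\tau), \]

which is a convolution, so the Fourier transform gives an algebraic equation,

\[ \tilde{C}_{rs}(\omega) = \tilde{K}(\omega)\, \tilde{C}_{ss}(\omega) \quad\Rightarrow\quad \tilde{K}(\omega) = \frac{\tilde{C}_{rs}(\omega)}{\tilde{C}_{ss}(\omega)}, \]

and K(τ) follows by inverse Fourier transform.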

Page 5: Encoding or decoding

For white noise, the stimulus autocorrelation is C_ss(τ) = σ² δ(τ),

so K(τ) is simply C_rs(τ) (up to the factor 1/σ²).

The optimal linear estimator

Page 6: Encoding or decoding

[Figure: the reconstruction kernel K as a function of time t]

Stimulus reconstruction

Page 7: Encoding or decoding

Stimulus reconstruction

Page 8: Encoding or decoding

Stimulus reconstruction
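As a complement to these slides, here is a minimal numerical sketch (not from the lecture) of the white-noise result K(τ) = C_rs(τ)/σ²: a white-noise stimulus is filtered by a hypothetical kernel k_true, and the kernel is then recovered from the stimulus–response cross-correlation. All names and parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
N = 100_000                    # number of time samples
sigma = 1.0                    # stimulus standard deviation

# White-noise stimulus: C_ss(tau) ≈ sigma^2 δ(tau)
s = rng.normal(0.0, sigma, N)

# A hypothetical "true" kernel, chosen just for this demo
taus = np.arange(50)
k_true = np.exp(-taus / 10.0) * np.sin(taus / 5.0)

# Response = linear filtering of the stimulus plus additive noise
r = np.convolve(s, k_true)[:N] + rng.normal(0.0, 0.5, N)

# Recover the kernel from data: K(tau) = C_rs(tau) / sigma^2,
# with C_rs(tau) = <r(t) s(t - tau)>
lags = len(taus)
k_est = np.array([np.mean(r[lags:] * s[lags - lag : N - lag]) for lag in range(lags)]) / sigma**2

print(np.corrcoef(k_true, k_est)[0, 1])   # close to 1 for large N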

Page 9: Encoding or decoding

Yang Dan, UC Berkeley

Reading minds: the LGN

Page 10: Encoding or decoding

Other decoding approaches

Page 11: Encoding or decoding

Britten et al. ‘92: measured both behavior + neural responses

Binary choice tasks

Page 12: Encoding or decoding

Behavioral performance

Page 13: Encoding or decoding

Discriminability: d' = (<r>+ − <r>−) / σ_r

Predictable from neural activity?

Page 14: Encoding or decoding

[Figure: response distributions p(r|+) and p(r|−), with means <r>+ and <r>−, and decision threshold z]

Decoding corresponds to comparing test, r, to threshold, z.

α(z) = P[r ≥ z | −]   false alarm rate, “size”

β(z) = P[r ≥ z | +]   hit rate, “power”

Find z by maximizing P[correct] = p[+] β(z) + p[−](1 − α(z))

Signal detection theory
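A small illustrative sketch of choosing the threshold z by maximizing P[correct]; the Gaussian response distributions and all parameter values here are made-up assumptions, not taken from the lecture.

import numpy as np
from scipy.stats import norm

mu_minus, mu_plus, sigma = 0.0, 2.0, 1.0     # assumed Gaussian response distributions
p_plus, p_minus = 0.5, 0.5                   # priors p[+], p[-]

z = np.linspace(-4.0, 6.0, 1001)             # candidate thresholds
alpha = norm.sf(z, mu_minus, sigma)          # false alarm rate P[r >= z | -]
beta = norm.sf(z, mu_plus, sigma)            # hit rate         P[r >= z | +]
p_correct = p_plus * beta + p_minus * (1.0 - alpha)

print(z[np.argmax(p_correct)])   # ≈ 1.0: midway between the means when the priors are equal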

Page 15: Encoding or decoding

Summarize the performance of the test for different thresholds z.

Want β → 1, α → 0.

ROC curves

Page 16: Encoding or decoding

Threshold z is the result from the first presentation

The area under the ROC curve corresponds to P[correct]

ROC: two alternative forced choice

Page 17: Encoding or decoding

The optimal test function is the likelihood ratio,

l(r) = p[r|+] / p[r|-].

(Neyman-Pearson lemma)

Then

l(z) = (dβ/dz) / (dα/dz) = dβ/dα

i.e. slope of ROC curve

Recall α(z) = P[r ≥ z | −]   false alarm rate, “size”

β(z) = P[r ≥ z | +]   hit rate, “power”

Is there a better test to use than r?

Page 18: Encoding or decoding

If p[r|+] and p[r|-] are both Gaussian, one can show that P[correct] = ½ erfc(-d’/2). To interpret results as two-alternative forced choice, need simultaneous responses from “+ neuron” and from “– neuron”. Simulate “- neuron” responses from same neuron in response to – stimulus.

Ideal observer: performance equals the area under the ROC curve.

The logistic function
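A quick numerical check (illustrative, not from the lecture) that the 2AFC fraction correct, the area under the ROC curve, and ½ erfc(−d′/2) agree for Gaussian response distributions; d_prime and the sample sizes are arbitrary choices.

import numpy as np
from scipy.special import erfc
from scipy.stats import norm

d_prime = 1.5
rng = np.random.default_rng(1)
r_minus = rng.normal(0.0, 1.0, 200_000)       # responses to the - stimulus
r_plus = rng.normal(d_prime, 1.0, 200_000)    # responses to the + stimulus

# 2AFC: on each trial, choose the presentation with the larger response
p_2afc = np.mean(r_plus > r_minus)

# Area under the ROC curve from alpha(z), beta(z) on a grid of thresholds
z = np.linspace(-8.0, 10.0, 2001)
alpha = norm.sf(z, 0.0, 1.0)        # false alarm rate P[r >= z | -]
beta = norm.sf(z, d_prime, 1.0)     # hit rate         P[r >= z | +]
auc = -np.sum(0.5 * (beta[:-1] + beta[1:]) * np.diff(alpha))   # integrate beta d(alpha)

print(p_2afc, auc, 0.5 * erfc(-d_prime / 2))   # all approximately 0.856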

Page 19: Encoding or decoding

Again, if p[r|−] and p[r|+] are Gaussian, and p[+] and p[−] are equal, P[+|r] = 1 / [1 + exp(−d′ (r − <r>) / σ)]. d′ is the slope of the sigmoid fitted to P[+|r].

More d’

Page 20: Encoding or decoding

Close correspondence between neural and behavioural performance.

Why so many neurons? Correlations limit performance.

Neurons vs organisms

Page 21: Encoding or decoding

[Figure: response distributions p[r|+] and p[r|−], with means <r>+ and <r>−, and threshold z]

Role of priors:

Find z by maximizing P[correct] = p[+] β(z) + p[−](1 − α(z))

Priors

Page 22: Encoding or decoding

Classification of noisy data: single photon responses

The wind or a tiger?

Rieke

Page 23: Encoding or decoding

Classification of noisy data: single photon responses

[Figure: distributions P(I|signal) and P(I|noise) as a function of the current I]

Nonlinear separation of signal and noise

Rieke


Page 26: Encoding or decoding

Nonlinear separation of signal and noise

Classification of noisy data: single photon responses

[Figure: scaled distributions P(I|signal) P(signal) and P(I|noise) P(noise) as a function of the current I]

Rieke

Page 27: Encoding or decoding

How about costs?

[Figure: P(I|signal) P(signal) and P(I|noise) P(noise), as before]

Page 28: Encoding or decoding

Penalty for an incorrect answer: L+, L−

For an observation r, what is the expected loss of each answer?

Loss+ = L+ P[−|r]      Loss− = L− P[+|r]

Cut your losses: answer + when Loss+ < Loss−, i.e. L+ P[−|r] < L− P[+|r].

Using Bayes’, P[+|r] = p[r|+] P[+] / p(r) and P[−|r] = p[r|−] P[−] / p(r), so answer + when

l(r) = p[r|+] / p[r|−] > L+ P[−] / (L− P[+]).

Building in cost
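A toy numerical illustration of the resulting decision rule; the Gaussian likelihoods, losses and priors are made-up values for this sketch.

from scipy.stats import norm

L_plus, L_minus = 1.0, 4.0        # penalties for incorrectly answering + and -
P_plus, P_minus = 0.5, 0.5        # prior probabilities of the two stimuli
r = 0.7                           # the observed response

# Likelihood ratio l(r) = p[r|+]/p[r|-] for assumed unit-variance Gaussians at 2 and 0
l_r = norm.pdf(r, 2.0, 1.0) / norm.pdf(r, 0.0, 1.0)
threshold = (L_plus * P_minus) / (L_minus * P_plus)

print('+' if l_r > threshold else '-')   # the large L_minus lowers the bar for answering +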

Page 29: Encoding or decoding

For small stimulus differences s and s + ds

like comparing

to threshold

Relationship of likelihood to tuning curves

Page 30: Encoding or decoding

• Population code formulation

• Methods for decoding:
   population vector
   Bayesian inference
   maximum likelihood
   maximum a posteriori

• Fisher information

Decoding from many neurons: population codes

Page 31: Encoding or decoding

Jacobs G A et al. J Exp Biol 2008;211:1819-1828

©2008 by The Company of Biologists Ltd

Cricket cercal cells

Page 32: Encoding or decoding

Cricket cercal cells

Page 33: Encoding or decoding

Theunissen & Miller, 1991

RMS error in estimate

Population vector
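As a sketch of population-vector decoding in this system, here is a toy implementation assuming the usual textbook idealization of the four cercal interneurons: preferred directions at 45°, 135°, 225° and 315°, and half-wave rectified cosine tuning. The numbers and function names are illustrative, not taken from the paper.

import numpy as np

# Idealized cercal system: four interneurons with preferred directions 45, 135, 225, 315 deg
pref = np.deg2rad(np.array([45.0, 135.0, 225.0, 315.0]))
c = np.stack([np.cos(pref), np.sin(pref)], axis=1)     # preferred-direction unit vectors

def rates(s):
    """Half-wave rectified cosine tuning: r_a / r_max = [cos(s - s_a)]_+ ."""
    return np.maximum(np.cos(s - pref), 0.0)

def pop_vector_direction(r):
    """Direction of the population vector: rates times preferred-direction vectors, summed."""
    v = r @ c
    return np.arctan2(v[1], v[0])

s_true = np.deg2rad(76.0)
r = rates(s_true)                              # noise could be added here to study the RMS error
print(np.rad2deg(pop_vector_direction(r)))     # ≈ 76 in the noise-free case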

Page 34: Encoding or decoding

Cosine tuning curve of a motor cortical neuron

Hand reaching direction

Population coding in M1


Page 35: Encoding or decoding

Cosine tuning:

Pop. vector:

For sufficiently large N,

is parallel to the direction of arm movement

Population coding in M1
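The tuning-curve and population-vector formulas did not survive the transcript; one standard way to write them (following the usual textbook treatment, with r0 the baseline rate, c_a the preferred-direction unit vector of cell a, and s a unit vector along the movement direction):

\[ \frac{f_a(\vec{s}) - r_0}{r_{\max}} = \vec{c}_a \cdot \vec{s}, \qquad \vec{v}_{\mathrm{pop}} = \sum_{a=1}^{N} \frac{r_a - r_0}{r_{\max}}\, \vec{c}_a \]

For large N with preferred directions spread roughly uniformly, v_pop points, on average, along the movement direction.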

Page 36: Encoding or decoding

Cosine tuning:

Pop. vector:

Population coding in M1

Difficulties with this coding scheme?

Page 37: Encoding or decoding

The population vector is neither general nor optimal. “Optimal” means making use of all the information in the stimulus/response distributions.

Is this the best one can do?

Page 38: Encoding or decoding

Bayes’ law:   p[s|r] = p[r|s] p[s] / p[r]

p[s|r]: a posteriori distribution;  p[r|s]: likelihood function (a conditional distribution);  p[s]: prior distribution;  p[r]: marginal distribution

Bayesian inference

Page 39: Encoding or decoding

Want an estimator s_Bayes.

Introduce a cost function, L(s, s_Bayes); minimize the mean cost.

For the least-squares cost, L(s, s_Bayes) = (s − s_Bayes)²,

the solution is the conditional mean.

Bayesian estimation
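To fill in the step from the least-squares cost to the conditional mean: differentiate the expected cost with respect to the estimate and set it to zero,

\[ \frac{\partial}{\partial s_{\mathrm{Bayes}}} \int ds\, (s - s_{\mathrm{Bayes}})^2\, p[s|r] = -2 \int ds\, (s - s_{\mathrm{Bayes}})\, p[s|r] = 0 \quad\Rightarrow\quad s_{\mathrm{Bayes}} = \int ds\; s\, p[s|r]. \]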

Page 40: Encoding or decoding

By Bayes’ law, the a posteriori distribution is proportional to the likelihood function times the prior: p[s|r] ∝ p[r|s] p[s].

Bayesian inference

Page 41: Encoding or decoding

Find the maximum of P[r|s] over s.

More generally: maximize the probability of the data given the “model”.

Here the “model” is the stimulus; assume a parametric form for the tuning curve.

Maximum likelihood

Page 42: Encoding or decoding

By Bayes’ law, the a posteriori distribution is proportional to the likelihood function times the prior: p[s|r] ∝ p[r|s] p[s].

Bayesian inference

Page 43: Encoding or decoding

Theunissen & Miller, 1991

RMS error in estimate

Population vector

Page 44: Encoding or decoding

ML: s* which maximizes p[r|s]
MAP: s* which maximizes p[s|r]
The difference is the role of the prior: they differ by a factor p[s]/p[r].

For cercal data:

MAP and ML

Page 45: Encoding or decoding

Work through a specific example:
• assume independence
• assume Poisson firing

Noise model: Poisson distribution

P_T[k] = (λT)^k exp(−λT) / k!

Decoding an arbitrary continuous stimulus

Page 46: Encoding or decoding

E.g. Gaussian tuning curves

Decoding an arbitrary continuous stimulus

Page 47: Encoding or decoding

Assume Poisson:

Assume independent:

Population response of 11 cells with Gaussian tuning curves

Need to know full P[r|s]
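The likelihood the slide refers to, written out under the two assumptions (with k_a = r_a T the spike count of cell a in a window of duration T, and f_a(s) its tuning curve):

\[ P[\mathbf{r}\,|\,s] = \prod_{a=1}^{N} \frac{\big(f_a(s)\,T\big)^{k_a}}{k_a!}\; e^{-f_a(s)\,T} \]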

Page 48: Encoding or decoding

Apply ML: maximize ln P[r|s] with respect to s

Set the derivative to zero; use the fact that Σ_a f_a(s) is approximately constant.

From Gaussianity of tuning curves,

If all σ_a are the same,

ML
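Filling in the algebra the slide refers to (using the standard approximation that Σ_a f_a(s) is constant for dense, evenly spaced tuning curves):

\[ \frac{\partial}{\partial s} \ln P[\mathbf{r}|s] = T \sum_a r_a \frac{f_a'(s)}{f_a(s)} = T \sum_a r_a \frac{s_a - s}{\sigma_a^2} = 0 \quad\Rightarrow\quad s^* = \frac{\sum_a r_a s_a / \sigma_a^2}{\sum_a r_a / \sigma_a^2}, \]

which reduces to s* = Σ_a r_a s_a / Σ_a r_a when all the widths σ_a are equal.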

Page 49: Encoding or decoding

Apply MAP: maximize ln p[s|r] with respect to s

Set the derivative to zero; use the fact that Σ_a f_a(s) is approximately constant.

From Gaussianity of tuning curves,

MAP

Page 50: Encoding or decoding

Given this data:

Constant prior

Prior with mean -2, variance 1

MAP:
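A minimal simulation sketch of this comparison; everything except the prior with mean −2 and variance 1 (taken from the slide) is an illustrative assumption. One population response from 11 Gaussian-tuned Poisson neurons is decoded by scanning a grid of candidate stimuli, once with a constant prior (ML) and once with the Gaussian prior (MAP).

import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: 11 neurons with Gaussian tuning curves tiling the stimulus axis
s_pref = np.linspace(-5.0, 5.0, 11)     # preferred stimuli
sigma_tc = 1.0                          # tuning-curve width
r_max, T = 20.0, 1.0                    # peak rate and counting window

def f(s):
    return r_max * np.exp(-0.5 * ((s - s_pref) / sigma_tc) ** 2)

s_true = -1.0
counts = rng.poisson(f(s_true) * T)     # one population response

# log p[r|s] up to s-independent terms, on a grid of candidate stimuli
s_grid = np.linspace(-6.0, 6.0, 1201)
log_lik = np.array([np.sum(counts * np.log(f(s) + 1e-12) - f(s) * T) for s in s_grid])

# ML uses a constant prior; MAP adds the Gaussian prior with mean -2 and variance 1
log_prior = -0.5 * (s_grid + 2.0) ** 2
print(s_grid[np.argmax(log_lik)],              # ML estimate, near s_true
      s_grid[np.argmax(log_lik + log_prior)])  # MAP estimate, pulled toward -2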

Page 51: Encoding or decoding

For stimulus s, we have an estimate s_est.

Bias:

Cramér–Rao bound:

Mean square error:

Variance:

Fisher information

(ML is unbiased: b = b’ = 0)

How good is our estimate?
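The quantities behind this slide, written out (b(s) is the bias of the estimator and I_F the Fisher information; for an unbiased estimator, b = b′ = 0 and the bound is 1/I_F):

\[ I_F(s) = \left\langle -\frac{\partial^2}{\partial s^2} \ln p[\mathbf{r}|s] \right\rangle, \qquad b(s) = \langle s_{\mathrm{est}} \rangle - s, \qquad \sigma^2_{\mathrm{est}}(s) \;\ge\; \frac{\big(1 + b'(s)\big)^2}{I_F(s)} \quad \text{(Cramér–Rao).} \]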

Page 52: Encoding or decoding

Alternatively:

Quantifies local stimulus discriminability

Fisher information

Page 53: Encoding or decoding

For the Gaussian tuning curves w/Poisson statistics:

Fisher information for Gaussian tuning curves
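Written out for independent Poisson firing with tuning curves f_a(s):

\[ I_F(s) = T \sum_a \frac{\big(f_a'(s)\big)^2}{f_a(s)} \;=\; \frac{T}{\sigma^4} \sum_a f_a(s)\,(s - s_a)^2 \quad \text{for } f_a(s) = r_{\max}\, e^{-(s - s_a)^2 / 2\sigma^2}. \]

In the usual continuum approximation (replacing the sum over cells by an integral over preferred values), this gives I_F ∝ 1/σ in one dimension, which is the sense in which narrower tuning is better there.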

Page 54: Encoding or decoding

Approximate:

Thus, narrow tuning curves are better.

But not in higher dimensions!

Are narrow or broad tuning curves better?

..what happens in 2D?

Page 55: Encoding or decoding

Recall d' = mean difference/standard deviation

Can also discriminate using decoded values: to discriminate s and s + Δs, the difference in the ML estimates is Δs (the estimator is unbiased) and the variance of each estimate is 1/I_F(s).

Fisher information and discrimination
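Combining the two statements: for a discrimination between s and s + Δs based on the (unbiased) ML estimate,

\[ d' = \frac{\Delta s}{\sigma_{\mathrm{est}}} = \Delta s\, \sqrt{I_F(s)}. \]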

Page 56: Encoding or decoding

• Tuning curve/mean firing rate

• Correlations in the population

Limitations of these approaches

Page 57: Encoding or decoding

The importance of correlation

Shadlen and Newsome, ‘98

Page 58: Encoding or decoding

The importance of correlation

Page 59: Encoding or decoding

The importance of correlation

Page 60: Encoding or decoding

Model-based vs. model-free

Entropy and Shannon information

Page 61: Encoding or decoding

For a random variable X with distribution p(x), the entropy is H[X] = −Σ_x p(x) log₂ p(x)

The information conveyed by a particular outcome x is −log₂ p(x)

Entropy and Shannon information

Page 62: Encoding or decoding

Typically, “information” = mutual information: how much knowing the value of one random variable r (the response) reduces uncertainty about another random variable s (the stimulus). Variability in response is due both to different stimuli and to noise. How much response variability is “useful”, i.e. can represent different messages, depends on the noise. Noise can be specific to a given stimulus.

Mutual information

Page 63: Encoding or decoding

Information quantifies how far r and s are from being independent: I(s;r) = D_KL[ P(r,s) ‖ P(r)P(s) ]

Alternatively: I(s;r) = H[P(r)] − Σ_s P(s) H[P(r|s)].

Mutual information

Page 64: Encoding or decoding

Need to know the conditional distribution P(s|r) or P(r|s). Take a particular stimulus s=s0 and repeat many times to obtain P(r|s0). Compute variability due to noise: noise entropy

Mutual information is the difference between the total response entropy and the mean noise entropy: I(s;r) = H[P(r)] − Σ_s P(s) H[P(r|s)].

Mutual information
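A minimal sketch (illustrative, not from the lecture) of computing I(s;r) in bits directly from a joint probability table, using the equivalence of the two formulas above; the example tables are made up.

import numpy as np

def mutual_information(p_joint):
    """I(s;r) in bits from a joint probability table p_joint[s, r]."""
    p = np.asarray(p_joint, dtype=float)
    p = p / p.sum()
    p_s = p.sum(axis=1, keepdims=True)       # marginal over stimuli
    p_r = p.sum(axis=0, keepdims=True)       # marginal over responses
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (p_s @ p_r)[nz])))

print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))   # independent r and s: 0 bits
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))       # r perfectly predicts s: 1 bit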

Page 65: Encoding or decoding

Information is symmetric in r and s

Examples:
• response is unrelated to stimulus: p[r|s] = ?, MI = ?
• response is perfectly predicted by stimulus: p[r|s] = ?

Mutual information

Page 66: Encoding or decoding

r+ encodes stimulus +, r− encodes stimulus −, but with a probability of error:

P(r+|+) = 1 − p,   P(r−|−) = 1 − p

What is the response entropy H[p]? What is the noise entropy?

Simple example

Page 67: Encoding or decoding

Entropy:

H[p+] = −p+ log₂ p+ − (1 − p+) log₂(1 − p+)     (response entropy)

H[P(r|s)] = −p log₂ p − (1 − p) log₂(1 − p)     (noise entropy)

Information:

When p+ = ½, the response entropy is 1 bit, so I = 1 + p log₂ p + (1 − p) log₂(1 − p).

Entropy and Shannon information
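A short numerical check of this example (the value of p and the equal prior are illustrative choices): with equally likely stimuli, P(r+) = ½, the response entropy is 1 bit, and the information is 1 − H[p].

import numpy as np

def binary_entropy(q):
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return float(-q * np.log2(q) - (1 - q) * np.log2(1 - q))

p = 0.1                       # error probability: P(r-|+) = P(r+|-) = p
P_plus = 0.5                  # prior probability of stimulus +

p_rplus = P_plus * (1 - p) + (1 - P_plus) * p      # P(r+) = 1/2 here
response_entropy = binary_entropy(p_rplus)         # 1 bit
noise_entropy = binary_entropy(p)                  # same for either stimulus
print(response_entropy - noise_entropy)            # information, about 0.531 bits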