What’s optimal about N choices? Tyler McMillen & Phil Holmes, PACM/CSBMB/Conte Center, Princeton University. Banbury, Bunbury, May 2005 at CSH. Thanks to NSF & NIMH.
Page 1

What’s optimal about N choices?

Tyler McMillen & Phil Holmes,

PACM/CSBMB/Conte Center,

Princeton University.

Banbury, Bunbury, May 2005 at CSH.

Thanks to NSF & NIMH.

Page 2

Neuro-inspired decision-making models*

1. The two-alternative forced-choice task (2-AFC). Optimal decisions: SPRT, LAM and DDM**.
2. Optimal performance curves.
3. MSPRT: an asymptotically optimal scheme for n > 2 choices (Dragalin et al., 1990-2000).
4. LAM realizations of n-AFC; mean RT vs ER; Hick’s law.
5. Summary

(the maximal order statistics)

* Optimality viewpoint: maybe animals can’t do it, but they can’t do better.
** Sequential probability ratio test, leaky accumulator model, drift-diffusion model.

Page 3

2-AFC, SPRT, LAM & DDM

[Figure: overlapping probability densities p1(x), p2(x) of the observations under the two alternatives]

Choosing between 2 alternatives with noisy incoming data

Set thresholds +Z, -Z and form the running product of likelihood ratios, tracking its logarithm: Rn = sum_{i=1..n} log[ p2(xi) / p1(xi) ].

Decide 1 (resp. 2) when Rn first falls below -Z (resp. exceeds +Z).

Theorem (Wald, 1947; Barnard, 1946): SPRT is optimal among fixed or variable sample size tests in the sense that, for a given error rate (ER), expected # samples to decide is minimal. (Or, for given # samples, ER is minimal.)
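The test above can be sketched in a few lines. This is a minimal simulation, not from the slides: the two hypotheses are taken to be Gaussians with unit variance (an illustrative assumption), data are drawn from alternative 1, and the log likelihood ratio is accumulated until it leaves (-Z, +Z).

```python
import random

def sprt(mu1, mu2, sigma, z, rng):
    """One SPRT trial with data drawn from N(mu1, sigma).

    Accumulates Rn = sum of log[p2(x)/p1(x)] per sample, deciding 1
    when Rn falls below -z and 2 when it exceeds +z (the slide's
    sign convention).
    """
    r, n = 0.0, 0
    while -z < r < z:
        x = rng.gauss(mu1, sigma)  # observation from alternative 1
        # log p2(x) - log p1(x) for equal-variance Gaussians
        r += ((x - mu1) ** 2 - (x - mu2) ** 2) / (2 * sigma ** 2)
        n += 1
    return (1 if r <= -z else 2), n

rng = random.Random(0)
trials = [sprt(0.0, 1.0, 1.0, z=3.0, rng=rng) for _ in range(2000)]
er = sum(1 for choice, _ in trials if choice == 2) / len(trials)
mean_n = sum(n for _, n in trials) / len(trials)
print(f"error rate ~= {er:.3f}, mean samples ~= {mean_n:.1f}")
```

With symmetric log thresholds +/-z, Wald's approximation gives ER ~ 1/(1 + e^z), which the estimate above tracks.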

Page 4

DDM is the continuum limit of SPRT. Let x = log Rn accumulate in continuous time; for small, frequent increments this becomes the drift-diffusion process

dx = a dt + c dW,  with drift a and noise strength c,

and decision thresholds at +Z and -Z.

[Figure/movie: a sample path diffusing between the thresholds +Z and -Z]

Extensive modeling of behavioral data (Stone, Laming, Ratcliff et al., ~1960-2005).
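The continuum limit can be checked numerically. Below is a small Euler-Maruyama sketch (parameter values are illustrative, not from the slides) comparing simulated first-passage statistics against the standard closed-form DDM expressions ER = 1/(1 + e^(2az/c^2)) and DT = (z/a) tanh(az/c^2).

```python
import math
import random

def ddm_trial(a, c, z, dt, rng):
    """First passage of dx = a dt + c dW through +z (correct) or -z (error)."""
    x, t = 0.0, 0.0
    sd = c * math.sqrt(dt)
    while -z < x < z:
        x += a * dt + rng.gauss(0.0, sd)
        t += dt
    return (x >= z), t

rng = random.Random(1)
a, c, z = 1.0, 1.0, 1.0
results = [ddm_trial(a, c, z, 0.005, rng) for _ in range(4000)]
er_sim = sum(1 for ok, _ in results if not ok) / len(results)
dt_sim = sum(t for _, t in results) / len(results)

# closed-form DDM error rate and mean decision time
er_th = 1.0 / (1.0 + math.exp(2 * a * z / c ** 2))
dt_th = (z / a) * math.tanh(a * z / c ** 2)
print(er_sim, er_th, dt_sim, dt_th)
```

The small residual discrepancy comes from threshold overshoot in the discrete scheme and shrinks with the step size dt.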

Page 5

There’s also increasing neural evidence for DDM:

FEF: Schall, Stuphorn & Brown, Neuron, 2002.

LIP: Gold & Shadlen, Neuron, 2002.

Page 6

Balanced LAM reduces to DDM on an invariant line. Linearized, the two units obey

dx1 = (-k x1 - w x2 + I1) dt + c dW1,   dx2 = (-k x2 - w x1 + I2) dt + c dW2

(a race model if w = 0). Uncouple via y1 = (x1 + x2)/sqrt(2), y2 = (x2 - x1)/sqrt(2):

stable OU flow in y1 if k + w is large; pure drift-diffusion in y2 if k = w (balanced).

Absolute thresholds in (x1, x2) become relative (x2 - x1) thresholds at +Z and -Z!

Page 7

LAM sample paths collapse towards an attracting invariant manifold. (cf. C. Brody: Machens et al., Science, 2005)


First passage across threshold determines choice.
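The collapse and the residual diffusion can be seen in a short simulation. This sketch (parameter values are illustrative) runs the balanced two-unit linear LAM and checks that the summed activity x1 + x2 relaxes to its OU fixed point (I1 + I2)/(k + w), while the difference x1 - x2 drifts like a DDM with drift I1 - I2.

```python
import random

def lam2(k, w, i1, i2, c, dt, steps, rng):
    """One sample path of the two-unit linear LAM:
       dx1 = (-k x1 - w x2 + i1) dt + c dW1  (and symmetrically for x2)."""
    x1 = x2 = 0.0
    sd = c * dt ** 0.5
    for _ in range(steps):
        d1 = (-k * x1 - w * x2 + i1) * dt + rng.gauss(0.0, sd)
        d2 = (-k * x2 - w * x1 + i2) * dt + rng.gauss(0.0, sd)
        x1 += d1
        x2 += d2
    return x1, x2

rng = random.Random(2)
k = w = 3.0                       # balanced: leak equals mutual inhibition
i1, i2, c = 1.2, 1.0, 0.2
paths = [lam2(k, w, i1, i2, c, 0.001, 2000, rng) for _ in range(200)]
mean_sum = sum(x1 + x2 for x1, x2 in paths) / len(paths)
mean_diff = sum(x1 - x2 for x1, x2 in paths) / len(paths)
print(mean_sum, (i1 + i2) / (k + w))  # sum relaxes to the OU fixed point
print(mean_diff)                      # difference drifts like a DDM
```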

Page 8

Simple expressions for first passage times and ERs: for drift a, noise c and thresholds +/-z,

ER = 1 / (1 + e^(2az/c^2)),   DT = (z/a) tanh(az/c^2).

Redn to 2 params: a~ = (a/c)^2, z~ = z/a, so that DT = z~ tanh(a~ z~) and ER = 1/(1 + e^(2 a~ z~)).

Can compute thresholds that maximize reward rate:

(Gold-Shadlen, 2002; Bogacz et al., 2004-5) This leads to …

(1)
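The threshold optimization can be done by brute force. In this sketch the reward rate is taken as RR = (1 - ER)/(DT + delay), where delay is a single parameter lumping all non-decision time between responses (an illustrative simplification; the names reward_rate, a_t, z_t and the parameter values are mine, not from the slides).

```python
import math

def reward_rate(z_t, a_t, delay):
    """RR = (1 - ER) / (DT + delay) for the reduced DDM, with
    DT = z~ tanh(a~ z~) and ER = 1 / (1 + exp(2 a~ z~))."""
    dt = z_t * math.tanh(a_t * z_t)
    er = 1.0 / (1.0 + math.exp(2 * a_t * z_t))
    return (1.0 - er) / (dt + delay)

a_t, delay = 1.0, 2.0
grid = [i * 0.001 for i in range(1, 5000)]       # scan z~ over (0, 5)
z_opt = max(grid, key=lambda z: reward_rate(z, a_t, delay))
print(f"optimal normalized threshold z~ ~= {z_opt:.3f}")
```

A very low threshold responds fast but at chance; a very high one is accurate but slow, so RR peaks at an interior z~.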

Page 9

Optimal performance curves (OPCs): human behavioral data. The best subjects are optimal, but what about the rest? Bad objective function, or bad learners?

Left: RR defined previously; right: a family of RRs weighted for accuracy (arrow: increasing accuracy weight). Learning not considered here. (Bogacz et al., 2004; Simen, 2005.)

Page 10

N-AFC: MSPRT & LAM

MSPRT chooses among n alternatives by a max vs. next test: writing yi(t) for the accumulated log likelihood of alternative i, choose i at the first time yi exceeds max_{j != i} yj by the threshold Z.

MSPRT is asymptotically optimal in the sense that # samples is minimal in the limit of low ERs (Dragalin et al, IEEE trans., 1999-2000).

A LAM realization of MSPRT (Usher-McClelland 2001)

asymptotically predicts RT scaling as log(n-1) (cf. Usher et al, 2002)
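A minimal race sketch shows the growth of mean RT with n under a max vs. next stopping rule (parameters and the race_trial name are illustrative assumptions: drift on the correct unit only, unit noise, fixed threshold).

```python
import math
import random

def race_trial(n, a, c, z, dt, rng):
    """Race of n accumulators: unit 0 gets drift a, the rest pure noise.
    Stop at the first step where the leader exceeds the runner-up by z."""
    x = [0.0] * n
    sd = c * math.sqrt(dt)
    t = 0.0
    while True:
        x[0] += a * dt + rng.gauss(0.0, sd)
        for j in range(1, n):
            x[j] += rng.gauss(0.0, sd)
        t += dt
        srt = sorted(x)
        if srt[-1] - srt[-2] >= z:   # max vs. next criterion
            return t

rng = random.Random(3)
mean_rt = {}
for n in (2, 3, 5, 9):
    rts = [race_trial(n, 1.0, 1.0, 1.0, 0.01, rng) for _ in range(300)]
    mean_rt[n] = sum(rts) / len(rts)
    print(n, round(mean_rt[n], 3))
```

More competitors push up the runner-up, so the leader needs longer to open the required gap; the growth with n is slow, roughly logarithmic.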

Page 11

The log(n-1) dependence is similar to Hick’s Law: RT = A + B log n or RT = B log (n+1).

W.E. Hick, Q.J. Exp. Psych, 1952.

We can provide a theoretical basis and predict explicit SNR and ER dependence in the coefficients A, B.

Page 12

Multiplicative constants blow up logarithmically as ER -> 0.

Behavior for small and larger ERs:

Empirical formula, generalizes (1),

(2)


Page 13

But a running max vs next test is computationally costly (?). LAM can approximately execute a max vs average test via absolute thresholds. The n-unit LAM is decoupled by splitting off the summed activity y1 = (x1 + … + xn)/sqrt(n) from the difference directions in the orthogonal hyperplane.

y1 is attracted to the hyperplane y1 = A, so max vs average becomes an absolute test!

Attraction is faster for larger n: stable eigenvalue λ1 ~ n.

Drift-diffusion on the hyperplane.
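The n-dependence of the attraction rate can be verified in the noise-free case. For the linear LAM dx_i/dt = -k x_i - w sum_{j != i} x_j + I, the summed activity decays with rate k + (n-1)w, so its half-life shrinks roughly like 1/n. This sketch (function name and parameter values are mine) measures that half-life by Euler integration.

```python
def sum_halflife(n, k, w, dt=0.0005, i0=1.0):
    """Noise-free n-unit LAM dx_i/dt = -k x_i - w sum_{j != i} x_j + i0.
    Returns the time for the summed activity to reach half its fixed
    point n*i0 / (k + (n-1) w); the stable eigenvalue is k + (n-1) w."""
    fp = n * i0 / (k + (n - 1) * w)   # fixed point of the sum
    x = [0.0] * n
    t = 0.0
    while sum(x) < 0.5 * fp:
        s = sum(x)
        x = [xi + (-k * xi - w * (s - xi) + i0) * dt for xi in x]
        t += dt
    return t

t2, t8 = sum_halflife(2, 1.0, 1.0), sum_halflife(8, 1.0, 1.0)
print(t2, t8)   # attraction to the hyperplane is faster for larger n
```

With k = w = 1 the half-life is ln 2 / n: about 0.35 for n = 2 and 0.087 for n = 8.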

Page 14

Max vs average is not optimal, but it’s not so bad:

[Plots: RT and ER curves for the absolute, max vs average, and max vs next tests]

Unbalanced LAMs - OU processes

Max vs next and max vs ave coincide for n = 2. As n increases, max vs ave deteriorates, approaching absolute-test performance. But it’s still better for n < 8-10!

Page 15

Simple LAM/DD predicts log (n-1), not log n or log (n+1) as in Hick’s law:

but a distribution of starting points gives approx log n scaling for 2 < n < 8, and ER and SNR effects may also enter.

Page 16

The effect of nonlinear activation functions, bounded below, is to shift scaling toward linear in n:

The limited dynamic range degrades performance, but can be offset by suitable bias (recentering).

[Figure: RT scaling for nonlinear LAMs vs. the linearized LAM]

Page 17

Summary: N-AFC

• MSPRT max vs next test is asymptotically optimal in the low ER limit.
• LAM (& race model) can perform the max vs next test.
• Hick’s law: RT ≈ A + B log(n-1) emerges for max vs next, max vs ave & absolute tests. A, B smallest for max vs next, OK for max vs ave.
• LAM executes a max vs average test on its attracting hyperplane using absolute thresholds.
• Variable start points give log n scaling for ‘small n.’
• Nonlinear LAMs degrade performance: RT ~ n for sufficiently small dynamic range.

More info: http://mae.princeton.edu/people/e21/holmes/profile.html