Mathematical Problems of Decision Making Tyler McMillen California State University at Fullerton April 25, 2007
Questions
How do you choose between multiple alternatives?
Is there a “best” way to choose?
Is the brain “hard-wired” to choose in the best way? (or not such a good way…)
Overview
1. Description of problem
2. Modeling perceptual choice
3. Hypothesis testing
4. Decision making
5. Sequential effects
pure … or … applied
country … or … western
run … or … fight
hit … or … stay
…or…
door number 1, 2 or 3?
whose face is that?
lied … died … lien … reconstruction
Bars on a circle: 90 or 0? 45 or 25? 45 or 40?
Models of decision making
• Hard! Even the simplest types of decisions are only partially understood
• Statistical regularities:
• Reaction Times (RT), Error Rates (ER), etc.
• Hick's Law: RT ~ log(N)
• Loss avoidance
• Magic number 7 (plus or minus 2)
Hick’s Law & Information Transmission
RT ~ A log(N) + B
(up to a point…)
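Hick's relation above can be checked with a quick least-squares fit; a minimal sketch with made-up reaction times (the data below are illustrative, not from the talk):

```python
import math

# Hypothetical mean reaction times (seconds) for N = 2, 4, 8, 16 alternatives;
# these numbers are illustrative, not data from the talk.
N = [2, 4, 8, 16]
rt = [0.35, 0.45, 0.55, 0.65]

# Closed-form least-squares fit of RT = A*log2(N) + B (one regressor).
x = [math.log2(n) for n in N]
xbar = sum(x) / len(x)
ybar = sum(rt) / len(rt)
A = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, rt)) / \
    sum((xi - xbar) ** 2 for xi in x)
B = ybar - A * xbar

print(f"RT ~ {A:.3f} * log2(N) + {B:.3f}")
```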
Threshold Crossing
dx = a dt + c dW   (drift-diffusion equation)
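The threshold-crossing idea can be sketched by Euler-Maruyama simulation of the drift-diffusion equation; the parameters (drift a, noise c, threshold z) below are illustrative:

```python
import math
import random

def first_passage(rng, a=0.1, c=1.0, z=1.0, dt=1e-3):
    """Euler-Maruyama simulation of dx = a dt + c dW until x crosses +-z.

    Returns (decision_time, choice); choice is +1 (upper) or -1 (lower)."""
    x, t = 0.0, 0.0
    while abs(x) < z:
        x += a * dt + c * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
    return t, (1 if x > 0 else -1)

rng = random.Random(0)
trials = [first_passage(rng) for _ in range(200)]
# With positive drift, the upper threshold is the "correct" answer.
error_rate = sum(1 for _, ch in trials if ch < 0) / len(trials)
mean_rt = sum(t for t, _ in trials) / len(trials)
print(error_rate, mean_rt)
```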
Stochastic Differential Equations (SDEs)
dx = f(x, t) dt + c(x, t) dW

∂p/∂t = −∑_i ∂/∂x_i [ f_i(x, t) p ] + (1/2) ∑_{i,j} ∂²/∂x_i∂x_j { [c(x, t) cᵀ(x, t)]_{ij} p }   (Fokker-Planck equation)
Drift-diffusion equation
dx = A dt + c dW

∂p/∂t = −A·∇p + (1/2) c² ∇²p

In 1-D:  ∂p/∂t = −a ∂p/∂x + (1/2) c² ∂²p/∂x²
1-D Ornstein-Uhlenbeck equation
dx = [λx + a] dt + c dW

∂p/∂t = −∂/∂x [ (λx + a) p ] + (1/2) c² ∂²p/∂x²

dW = W(t + dt) − W(t) ~ N(0, dt)
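A quick sanity check on the (stable, λ < 0) OU equation: simulate it and compare the long-run mean and variance to the known stationary values −a/λ and c²/(2|λ|). All parameters are illustrative:

```python
import math
import random

# Euler-Maruyama for the OU equation dx = (lam*x + a) dt + c dW.
# With lam < 0 the process is stable; its stationary distribution is
# Gaussian with mean -a/lam and variance c**2 / (2*|lam|).
lam, a, c = -1.0, 0.5, 1.0
dt, burn, n = 1e-2, 10_000, 200_000
rng = random.Random(1)

x, samples = 0.0, []
for step in range(burn + n):
    x += (lam * x + a) * dt + c * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    if step >= burn:
        samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)   # stationary values here are roughly 0.5 and 0.5
```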
Perceptual model for 2 choices
Input: I1, I2 (+ noise)
Neural units: x1, x2
Decay: k
Inhibition: w
Q: Which is larger, I1 or I2?
Perceptual model for 2 choices
Collapse to a line: dynamics determined by x = x1 − x2

dx = [(w − k) x + a] dt + c dW

dx = a dt + c dW   (when "balanced", w = k)

Equivalent to SPRT – the optimal test! (Best when k = w.) ER, RT, and RR can be calculated explicitly. The behavior of humans and chimps seems to fit that predicted by the drift-diffusion model (cf. Ratcliff et al.).
(Figure: sample paths, no noise vs. noisy; x1 correct)
Dashed: no inhibition or decay
Solid: inhibition & decay
Inhibition "sharpens" acuity (spreads alternatives)
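The collapse to a line can be illustrated in simulation; a minimal sketch in which the parameters (k, w, I1, I2, c) are assumptions, not values from the talk:

```python
import math
import random

# Two leaky competing accumulators (illustrative parameters):
#   dx1 = (-k*x1 - w*x2 + I1) dt + c dW1
#   dx2 = (-k*x2 - w*x1 + I2) dt + c dW2
# Their difference x = x1 - x2 obeys dx = ((w-k)*x + (I1-I2)) dt + sqrt(2)*c dW,
# i.e. pure drift-diffusion with drift a = I1 - I2 when w = k ("balanced").
k = w = 1.0
I1, I2 = 1.2, 1.0          # unit 1 gets the larger input
c, dt, steps = 0.3, 1e-3, 2000

def run_trial(rng):
    x1 = x2 = 0.0
    for _ in range(steps):
        d1 = (-k * x1 - w * x2 + I1) * dt + c * math.sqrt(dt) * rng.gauss(0, 1)
        d2 = (-k * x2 - w * x1 + I2) * dt + c * math.sqrt(dt) * rng.gauss(0, 1)
        x1, x2 = x1 + d1, x2 + d2   # simultaneous update
    return x1 - x2

rng = random.Random(2)
diffs = [run_trial(rng) for _ in range(200)]
mean_x = sum(diffs) / len(diffs)
# In the balanced case E[x(t)] = (I1 - I2) * t = 0.2 * 2 = 0.4 here.
print(mean_x)
```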
Neural models of perceptual choice

Input: I1, I2, …, IM (+ noise)
Neural units: x1, x2, …, xM
Decay, inhibition between units
Q: Which is larger, I1, I2, …, IM?
Does the model capture observed behavior, e.g., Hick’s Law?
Can we show that the model performs optimally? (or not?)
Two different kinds of tasks:
Free-response (make a decision at any time)
Interrogation (forced to decide at a given time)
What does the model say about the difference in behavior in the two kinds of tasks?
Optimality
The optimal decision making algorithm is the one that minimizes the time needed to make the decision (RT) for a given error rate (ER). This is equivalent to maximizing the reward rate (RR), the ratio of the probability of being correct to the time needed to make a decision:
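Written as a formula (the inter-trial delay D in the second form is a common extension, not stated above):

```latex
\mathrm{RR} \;=\; \frac{1-\mathrm{ER}}{\mathrm{RT}}
\qquad\text{or, with an inter-trial delay } D:\quad
\mathrm{RR} \;=\; \frac{1-\mathrm{ER}}{\mathrm{RT}+D}
```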
Hypothesis Testing
Neyman & Pearson (1933) – optimal tests for fixed sample sizes
Wald, Friedman, Wallis, Barnard, Turing (1940s) – optimizing the sample size in tests between two alternatives
Wald, Sobel, Armitage, Lorden, Dragalin, … (1940s–present) – nearly optimal tests for more than two alternatives
Example: 3 hypotheses

Testing between M alternatives: H1, H2, …, HM
Know: p_i(x) = P(x|H_i)  (if H_i is true, the density of x's is p_i(x))

Which is the correct distribution? Suppose we draw 5 samples:

Set 1: x = −0.5145, −1.0050, 0.8634, −1.2762, 0.2765  (Ave: −1.22)
Set 2: x = 2.2189, 1.7253, 2.9901, 2.2617, 3.2134  (Ave: 2.48)

How confident can we be in our decision? How many trials should we make before we stop?
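A sketch of the likelihood comparison for the two sample sets above, assuming hypothetical unit-variance Gaussian hypotheses with means −1, 0, and 2.5 (the talk's actual densities are not given in the text):

```python
import math

# Hypothetical hypotheses: unit-variance Gaussians with these means
# (assumed for illustration; the slide's densities are not specified).
means = [-1.0, 0.0, 2.5]

set1 = [-0.5145, -1.0050, 0.8634, -1.2762, 0.2765]
set2 = [2.2189, 1.7253, 2.9901, 2.2617, 3.2134]

def log_likelihood(samples, mu):
    """Log-likelihood of the samples under N(mu, 1)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2
               for x in samples)

for name, data in [("set 1", set1), ("set 2", set2)]:
    lls = [log_likelihood(data, mu) for mu in means]
    best = lls.index(max(lls))
    print(name, "-> H%d" % (best + 1))
```

With unit-variance Gaussians, the maximum-likelihood hypothesis is simply the one whose mean is closest to the sample mean.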
Another way to view the problem.

Test between two hypotheses H1 and H2. The decision will depend on the "path" of the sum of samples, the log-likelihood ratio:

S_n = ∑_{i=1}^{n} log[ p1(x_i) / p2(x_i) ]   (likelihood ratio)

As the number of samples grows, this path approximates a drift-diffusion equation.
(a) Fixed Sample Size Tests
If the number N of samples x1, …, xN is fixed, the Neyman-Pearson Lemma (1933) says the best result is obtained by comparing the likelihood ratio to a constant: choose H1 if ∏_{i=1}^{N} p1(x_i)/p2(x_i) > K, otherwise H2.
The value of K determines the accuracy.
Test between two hypotheses H1 and H2:
SPRT: Continue testing until the (log) likelihood ratio crosses an upper or lower threshold
SPRT Optimality Theorem: (Wald) Among all tests with a given bound on the error rate, the SPRT minimizes the expected number of trials
Q: Is there a generalization of the SPRT, an “MSPRT,” with the same optimality property?
(b) Sequential Tests
If testing can stop at any time, the SPRT gives the best result: sampling continues until one threshold is crossed (choose H1 at one boundary, H2 at the other).
As the number of samples increases, SPRT approaches threshold test on drift-diffusion equation (sampling at each instant).
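A minimal SPRT sketch for unit-variance Gaussian hypotheses, using Wald's approximate thresholds (all parameters illustrative):

```python
import math
import random

def sprt(rng, mu_true, mu1=0.0, mu2=1.0, alpha=0.05):
    """SPRT between H1: N(mu1, 1) and H2: N(mu2, 1), unit variance assumed.

    Uses Wald's approximate thresholds for a symmetric error bound alpha."""
    upper = math.log((1 - alpha) / alpha)
    lower = -upper
    llr, n = 0.0, 0
    while lower < llr < upper:
        x = rng.gauss(mu_true, 1.0)
        # log p2(x)/p1(x) for unit-variance Gaussians
        llr += (mu2 - mu1) * (x - (mu1 + mu2) / 2)
        n += 1
    return ("H2" if llr >= upper else "H1"), n

rng = random.Random(3)
results = [sprt(rng, mu_true=1.0) for _ in range(500)]
accuracy = sum(1 for h, _ in results if h == "H2") / len(results)
avg_n = sum(n for _, n in results) / len(results)
print(accuracy, avg_n)
```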
Two Approaches
• Continue testing until one hypothesis is preferred to all others. (Use SPRT's as component tests between the hypotheses.)
  – Sobel-Wald Test on 3 hypotheses (1949)
  – Armitage Test on multiple hypotheses (1950)
  – Simons Test on 3 hypotheses (1967)
  – MSPRT (1990s)
• Continue testing until all but one hypothesis can be rejected. (In the spirit of significance testing, based on generalized likelihood ratios.)
  – Lorden Test (1972)
  – m-SPRTs
Test between more than two hypotheses:
No optimal test!
Multi-Sequential Probability Ratio Tests (MSPRT’s)
THEOREM (Dragalin, Tartakovsky and Veeravalli, 1999): The MSPRT's are "asymptotically optimal": as the error rate approaches zero, the expected sample size in the MSPRT's attains (to first order) the infimum over all tests.
π_j: prior probability of H_j
Note: Both tests reduce to SPRT when M=2
Continue testing until p_n^j or L_j(n) crosses a threshold; choose the first one that crosses.
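A posterior-probability sketch of an MSPRT-style stopping rule, with hypothetical unit-variance Gaussian hypotheses (the means, priors, and threshold are assumptions):

```python
import math
import random

def msprt(rng, mu_true, means=(-1.0, 0.0, 1.0), priors=None, thresh=0.95):
    """Posterior-based MSPRT sketch for unit-variance Gaussian hypotheses.

    Stop as soon as some posterior probability p_n^j exceeds thresh."""
    M = len(means)
    priors = priors or [1.0 / M] * M
    log_post = [math.log(p) for p in priors]   # unnormalized log posteriors
    n = 0
    while True:
        x = rng.gauss(mu_true, 1.0)
        n += 1
        log_post = [lp - 0.5 * (x - mu) ** 2 for lp, mu in zip(log_post, means)]
        # normalize (stably) to get posterior probabilities
        m = max(log_post)
        z = sum(math.exp(lp - m) for lp in log_post)
        post = [math.exp(lp - m) / z for lp in log_post]
        j = post.index(max(post))
        if post[j] >= thresh:
            return j, n

rng = random.Random(4)
choices = [msprt(rng, mu_true=1.0) for _ in range(200)]
accuracy = sum(1 for j, _ in choices if j == 2) / len(choices)
print(accuracy)
```

With M = 2 and symmetric priors this reduces to an SPRT-like threshold on the likelihood ratio, matching the note above.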
MSPRT on 3 alternatives

Samples: x1, x2, …
Equal prior probabilities (unbiased)
Unequal prior probabilities: π1 = .8, π2 = .15, π3 = .05 (biased)
(Figure legend: red - a, blue - b)
Boundaries for M alternatives
Perceptual model for M > 2 choices (Usher & McClelland, '01)

Input: I1, I2, …, IM (+ noise)
Neural units: x1, x2, …, xM
Decay: k; Inhibition: w
Q: Which is larger, I1, I2, …, IM?
Connectionist Model
This model has been successful in modeling response time, error rate, and related statistics in several cases. It additionally captures the loss-avoidance phenomenon.
Q: Is it optimal? Can we say anything about what happens when the number of alternatives increases?
Connectionist Model
MSPRT b test: choose the first i that satisfies the threshold condition.

The M = 2 model performs the optimal test. What about for M > 2?
Absolute and relative tests
• max-vs-next
• max-vs-average
• absolute
Relative tests perform better (because of noise).
(Figure: max-vs-average vs. max-vs-next; 0.6 0.6 and 0.1 0.6)
Max-vs-next is better (more information), but computationally more expensive.
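The two relative tests can be compared in a toy simulation; the drifts, noise level, and threshold below are illustrative assumptions:

```python
import math
import random

def race(rng, drifts, rule, theta=1.0, c=0.5, dt=1e-2):
    """Accumulate noisy evidence y_i and stop per `rule` (illustrative sketch).

    max-vs-next: stop when (largest - second largest) >= theta
    max-vs-ave : stop when (largest - mean of all units) >= theta"""
    y = [0.0] * len(drifts)
    while True:
        y = [yi + a * dt + c * math.sqrt(dt) * rng.gauss(0, 1)
             for yi, a in zip(y, drifts)]
        s = sorted(y, reverse=True)
        gap = s[0] - s[1] if rule == "max-vs-next" else s[0] - sum(y) / len(y)
        if gap >= theta:
            return y.index(max(y))

rng = random.Random(5)
drifts = [0.6, 0.1, 0.1, 0.1]   # alternative 0 is correct
results = {}
for rule in ("max-vs-next", "max-vs-ave"):
    wins = sum(1 for _ in range(200) if race(rng, drifts, rule) == 0)
    results[rule] = wins / 200
    print(rule, results[rule])
```

Note that at a common threshold the two rules trade off speed and accuracy differently, so a fair comparison (as in the talk) matches them at equal error rate.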
Collapse to a Hyperplane
Transform onto eigenvectors:
A threshold crossing on the x_i is equivalent to the "max-vs-average" test.
3 choices
Calculating RR
For 2 alternatives, one can write (backward Kolmogorov) equations for the first passage time (RT) and error rate (ER) as BVPs; these can be solved explicitly to give expressions for RR as a function of the parameters.
For M > 2 alternatives, the backward Kolmogorov equations are drift-diffusion BVPs on (hyper)triangles; there is no explicit solution, and solving numerically is no easier than Monte Carlo simulation.
Pr(correct) = 0.95
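For the balanced two-alternative case, the closed-form expressions standard in the drift-diffusion literature (error rate and mean decision time for drift a, noise c, symmetric thresholds ±z, start at 0) can be evaluated directly; a sketch:

```python
import math

def dd_error_rate(a, c, z):
    """Closed-form error rate for dx = a dt + c dW, start 0, thresholds +-z."""
    return 1.0 / (1.0 + math.exp(2 * a * z / c ** 2))

def dd_decision_time(a, c, z):
    """Closed-form mean decision time for the same process."""
    return (z / a) * math.tanh(a * z / c ** 2)

print(dd_error_rate(0.1, 1.0, 1.0))      # ~ 0.45
print(dd_decision_time(0.1, 1.0, 1.0))
```

Raising the threshold z lowers the error rate but lengthens the decision time, which is exactly the trade-off the reward rate balances.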
Hick’s Law
4 alternatives
Best: max-vs-next
Good: max-vs-average (same as threshold crossing)
Worst: "unbalanced"
Balanced (w = k) gives the best result.
Interrogation Protocol
TI = time to reach a given accuracy
Optimal when w = k (magnitude of w, k irrelevant)
Interrogation Protocol
Hick’s “type” Law
Interrogation vs. Free Response
(2 choices)
Time to reach a given accuracy P.
Free response does better: a particular example of the fact that sequential tests perform better than fixed-sample-size tests. That's why they were invented!
Sequential effects
Cho et al. (2002). Mechanisms underlying dependencies of performance on stimulus history in a two-alternative forced-choice task.
Effects of inter-trial delay
Sommer, W., Leuthold, H. and Soetens, E. (1999). Covert signs of expectancy in serial reaction time tasks revealed by event-related potentials. Perception & Psychophysics, vol. 61 (2), pp. 342-353.
A “simple” model
dx = [−λx + a] dt + c dW
Basic idea: Stable OU process with varying threshold
Why it “works”…
Conclusions & Questions
• The simple threshold crossing test in the connectionist model is not optimal, but pretty good.
• Suboptimality is compensated by simplicity.
• Decay--if balanced by inhibition--is an advantage.
• What are the accumulators? Are there accumulators!?
• What is the actual mechanism underlying sequential effects?
References
• Hick, W. (1952). On the rate of gain of information. Quart. J. Exp. Psych., vol. 4, pp. 11-26.
• McMillen, T. and Holmes, P. (2006). The dynamics of choice among multiple alternatives. J. Math. Psych., vol. 50, pp. 30-57.
• Miller, G.A. (1956). The magical number 7 (plus or minus 2). The Psychological Review, vol. 63, pp. 81-97.
• Teichner, W. and Krebs, M. (1974). Laws of visual choice reaction time. Psych. Rev., vol. 81, pp. 75-98.
• Usher, M. and McClelland, J. (2001). On the time course of perceptual choice: The leaky competing accumulator model. Psych. Rev., vol. 108, pp. 550-592.
Collaborators
The Princeton neuroscience crew: Philip Holmes, Jonathan Cohen, Juan Gao, Patrick Simen, et al.