Bayesian Prediction & Estimation

$p(y|x,\theta)$

Hypothesized models, parameterized by $\theta$, map each $x$ value to a probability distribution over $y$ values.

Kruschke, IPAM GSS 2007
Bayesian Prediction & Estimation

$p(y|x,\theta)$, with prior $p(\theta)$

There is a distribution of probabilities regarding values of $\theta$.
Bayesian Prediction & Estimation

For a given $x$, we predict $y$ by marginalizing over parameter values:

$p(y|x) = \int d\theta \; p(y|x,\theta)\, p(\theta)$

For SSE loss, $\hat{y} = \int dy \; y \, p(y|x)$.
Bayesian Prediction & Estimation

For a given $x, y$ pair, we estimate parameters by Bayes' rule:

$p(\theta|y,x) = \dfrac{p(y|x,\theta)\, p(\theta)}{\int d\theta \; p(y|x,\theta)\, p(\theta)}$
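To make the two operations concrete, here is a minimal numerical sketch (my illustration, not from the talk), assuming a toy Bernoulli model in which $\theta$ is a response bias on a discrete grid and $x$ is ignored:

```python
import numpy as np

# Toy model (illustrative assumption): theta is a Bernoulli response bias
# on a grid, with p(y=1|x,theta) = theta and x ignored for simplicity.
theta = np.linspace(0.01, 0.99, 99)            # parameter grid
prior = np.full(theta.size, 1.0 / theta.size)  # uniform p(theta)

def predict(p_theta):
    """Prediction: p(y=1|x) = sum over theta of p(y=1|x,theta) p(theta)."""
    return float(np.sum(theta * p_theta))

def update(p_theta, y):
    """Estimation by Bayes' rule: p(theta|y,x) is proportional to
    p(y|x,theta) p(theta)."""
    like = theta if y == 1 else 1.0 - theta
    post = like * p_theta
    return post / post.sum()

posterior = update(prior, y=1)
print(predict(prior), predict(posterior))  # prediction sharpens after one datum
```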
Bayesian Prediction & Estimation

The formalism doesn't care what it refers to in the world. Suppose that $x$ is a stimulus, $y$ is a response, and $\theta$ is a hypothesis.
Bayesian Prediction

$\hat{y} = \int dy \; y \, p(y|x)$, where $p(y|x) = \int d\theta \; p(y|x,\theta)\, p(\theta)$

Then $\theta$, $p(\theta)$, and $p(y|x,\theta)$ are in (or refer to) the mind; the stimulus $x$ is in the world.
Bayesian Estimation = Learning

$p(\theta|y,x) = \dfrac{p(y|x,\theta)\, p(\theta)}{\int d\theta \; p(y|x,\theta)\, p(\theta)}$
Bayesian Cognition

Prediction: $\hat{y} = \int dy \; y \, p(y|x)$, with $p(y|x) = \int d\theta \; p(y|x,\theta)\, p(\theta)$

Learning: $p(\theta|y,x) = \dfrac{p(y|x,\theta)\, p(\theta)}{\int d\theta \; p(y|x,\theta)\, p(\theta)}$
Not only cognition by Bayes...
Bayesian cognition by others, too
Bayesian Cognition?

[A series of example images; one is credited: Image from Jacob, Litorco & Lee (2004).]
To Ponder:
• For a Bayesian model of “cognitive behavior”, what level of analysis is appropriate?
• If a system is Bayesian at one level of analysis, is it Bayesian at other levels?
Bayesian Cognition?

Marr (1982): Image Intensity → Primal Sketch → 2½D Sketch → 3D Model

Is the overall mapping, from image to 3D model, Bayesian? Is each component Bayesian?
Consider a Chain of Bayesians

Thomas1 ($p(y_1|x_1,\theta_1)$, $p(\theta_1)$) → Thomas2 ($p(y_2|x_2,\theta_2)$, $p(\theta_2)$) → Thomas3 ($p(y_3|x_3,\theta_3)$, $p(\theta_3)$)

Image Intensity → Primal Sketch → 2½D Sketch → 3D Model
Not Parallel Bayesians

Thomas1 ($p(y_1|x_1,\theta_1)$, $p(\theta_1)$), Thomas2 ($p(y_2|x_2,\theta_2)$, $p(\theta_2)$), and Thomas3 ($p(y_3|x_3,\theta_3)$, $p(\theta_3)$) working side by side, each on its own data.
A Chain of Bayesians

$x_1$ → Thomas1 → $y_1 = x_2$ → Thomas2 → $y_2 = x_3$ → Thomas3 → $y_3$, with Thomas$k$ holding $p(y_k|x_k,\theta_k)$ and $p(\theta_k)$.
Not Iterated Bayesians

Thomas, Son of Thomas, and Grandson of Thomas: each generation $k$ has a generative model $p(y_k, x_k|\theta_k)$ and a prior $p(\theta_k)$, and one generation's output pairs become the next generation's training data.
Could Be Generative Bayesians

The same chain, $x_1$ → Thomas1 → Thomas2 → Thomas3 → $y_3$, but with each agent holding a generative model $p(y_k, x_k|\theta_k)$ and prior $p(\theta_k)$. But not pursued here.
The Globally Bayesian Approach

The standard approach: the three heads are conjoined over a joint parameter space, with joint prior $p(\theta_1,\theta_2,\theta_3)$, external input $x_1$, and final output $y_3$.

$p(y_3|x_1,\theta_1,\theta_2,\theta_3) = \int dy_1\, dy_2 \; p(y_3|y_2,\theta_3)\, p(y_2|y_1,\theta_2)\, p(y_1|x_1,\theta_1)$

$p(y_3|x_1) = \int d\theta_1\, d\theta_2\, d\theta_3 \; p(y_3|x_1,\theta_1,\theta_2,\theta_3)\; p(\theta_1,\theta_2,\theta_3)$
The Globally Bayesian Approach

$p(\theta_1,\theta_2,\theta_3|y_3,x_1) = \dfrac{p(y_3|x_1,\theta_1,\theta_2,\theta_3)\; p(\theta_1,\theta_2,\theta_3)}{\int d\theta_1\, d\theta_2\, d\theta_3 \; p(y_3|x_1,\theta_1,\theta_2,\theta_3)\; p(\theta_1,\theta_2,\theta_3)}$
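A minimal sketch of this globally Bayesian update (my illustration), assuming tiny discrete hypothesis sets and a toy noisy-copy likelihood at each stage; note that the sum runs over the joint space $\theta_1 \times \theta_2 \times \theta_3$, which grows multiplicatively with chain length:

```python
import itertools

THETAS = (0.2, 0.8)   # toy hypothesis set: each stage's copy-fidelity

def chain_like(y3, x1, t1, t2, t3):
    # p(y3|x1,t1,t2,t3) = sum_{y1,y2} p(y3|y2,t3) p(y2|y1,t2) p(y1|x1,t1)
    total = 0.0
    for y1 in (0, 1):
        for y2 in (0, 1):
            p1 = t1 if y1 == x1 else 1.0 - t1   # toy noisy-copy stage
            p2 = t2 if y2 == y1 else 1.0 - t2
            p3 = t3 if y3 == y2 else 1.0 - t3
            total += p1 * p2 * p3
    return total

def global_update(joint, y3, x1):
    # Bayes' rule over the JOINT parameter space (theta1, theta2, theta3).
    post = {t: chain_like(y3, x1, *t) * p for t, p in joint.items()}
    z = sum(post.values())
    return {t: p / z for t, p in post.items()}

joint = {t: 1.0 / 8 for t in itertools.product(THETAS, repeat=3)}
joint = global_update(joint, y3=1, x1=1)   # one globally Bayesian trial
```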
Yes, we are all individuals!

$x_1$ → Thomas ($p(y_1|x_1,\theta_1)$, $p(\theta_1)$) → $y_1 = x_2$ → Richard ($p(y_2|x_2,\theta_2)$, $p(\theta_2)$) → $y_2 = x_3$ → Harold ($p(y_3|x_3,\theta_3)$, $p(\theta_3)$) → $y_3$
Locally Bayesian Prediction

Each Bayesian agent (Thomas, Richard, Harold) computes its best prediction and propagates it forward. This process needs integrals over only the individual parameter spaces.
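A sketch of locally Bayesian prediction under the same toy noisy-copy assumptions as above: each agent marginalizes over only its own parameters and hands a point prediction forward.

```python
THETAS = (0.2, 0.8)   # toy hypothesis set shared by all agents

def local_predict(prior_k, x_k):
    # p(y=1|x) = sum over theta of p(y=1|x,theta) p(theta),
    # under the toy noisy-copy likelihood p(y=x|theta) = theta.
    p1 = sum((t if x_k == 1 else 1.0 - t) * p
             for t, p in zip(THETAS, prior_k))
    return 1 if p1 >= 0.5 else 0   # point prediction y_hat_k

chain_priors = [(0.3, 0.7), (0.3, 0.7), (0.3, 0.7)]  # Thomas, Richard, Harold
x = 1                                                # external stimulus x_1
for prior_k in chain_priors:
    x = local_predict(prior_k, x)   # each y_hat_k becomes the next x_{k+1}
y3_hat = x
```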
Locally Bayesian Learning

Harold updates $p(\theta_3|y_3,x_3)$ by Bayes' rule. This involves integrating only over the $\theta_3$ parameter space.
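Continuing the toy example, a sketch of this local update, which integrates over only the top agent's own hypothesis set:

```python
THETAS = (0.2, 0.8)   # same toy hypothesis set as above

def local_update(prior_k, x_k, y_k):
    # Bayes' rule over this agent's OWN parameter space only:
    # p(theta_k|y_k,x_k) is proportional to p(y_k|x_k,theta_k) p(theta_k)
    like = [(t if y_k == x_k else 1.0 - t) for t in THETAS]
    post = [l * p for l, p in zip(like, prior_k)]
    z = sum(post)
    return tuple(p / z for p in post)

posterior_3 = local_update((0.5, 0.5), x_k=1, y_k=1)  # -> (0.2, 0.8)
```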
Locally Bayesian Learning

But how should poor Richard update his beliefs about $\theta_2$? He needs a $y_2$ value to learn about!
Locally Bayesian Learning

Let $y_2 = x_3^* = \arg\max_{x_3} p(y_3|x_3)$.

Harold tells Richard to produce a value that is consistent with Harold's beliefs!
Locally Bayesian Learning

Let $y_2 = x_3^* = \arg\max_{x_3} p(y_3|x_3)$. In practice, one doesn't need to maximize; just get a value of $y_2$ with $p(y_3|y_2) > p(y_3|\hat{y}_2)$.
Locally Bayesian Learning

Likewise, let $y_1 = x_2^* = \arg\max_{x_2} p(y_2|x_2)$.

Richard tells Thomas to produce a value that is consistent with Richard's beliefs!
Locally Bayesian Learning

Let $y_1 = x_2^* = \arg\max_{x_2} p(y_2|x_2)$. In practice, one doesn't need to maximize; just get a value of $y_1$ with $p(y_2|y_1) > p(y_2|\hat{y}_1)$.
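A sketch of this target back-propagation under the same toy assumptions: each superior hands down the input value $x^*$ that best fits its own beliefs, and the inferior treats it as observed data.

```python
THETAS = (0.2, 0.8)   # toy hypothesis set, as above

def p_y_given_x(prior_k, x_k, y_k):
    # p(y_k|x_k) under the agent's current beliefs (toy noisy-copy likelihood)
    return sum((t if y_k == x_k else 1.0 - t) * p
               for t, p in zip(THETAS, prior_k))

def best_input(prior_k, y_k, candidates=(0, 1)):
    # x* = argmax over x of p(y_k|x): the datum handed down as a target
    return max(candidates, key=lambda x: p_y_given_x(prior_k, x, y_k))

harold_prior = (0.1, 0.9)    # Harold mostly believes in faithful copying
richard_prior = (0.1, 0.9)
y2_target = best_input(harold_prior, y_k=1)          # Harold -> Richard
y1_target = best_input(richard_prior, y_k=y2_target) # Richard -> Thomas
```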
Locally Bayesian Learning

Other updating dynamics are possible. For example, first propagate $y_3$ all the way back to the first agent and update $p(\theta_1|y_1,x_1)$; then compute the predicted $\hat{y}_1$; then update $p(\theta_2|y_2,\hat{y}_1)$; and so on.
Locally Bayesian Learning

Each agent is told by its superior to learn a datum that is maximally consistent (or minimally inconsistent) with the superior's current beliefs.
Locally Bayesian Learning

This process protects the superior's beliefs from disconfirmation! The inferior will learn to "distort the data" to avoid disconfirming the superior.
Locally Bayesian Learning (LBL)

LBL preserves current beliefs and creates "epicycles" for new data. Perhaps not perfectly optimal, but then, are real systems?
Put your models where your data are...

• Some real behavior, in the domain of associative learning, to which Locally Bayesian Learning can be applied.
Typical Learning Task

Stimulus presentation and response collection: e.g., the cue words RADIO and OCEAN are presented, and the learner presses F, G, H, or J.
Phenomena Suggestive of Attention in Learning

• Fewer relevant cues → faster learning.
• Intradimensional shifts are faster than extradimensional.
• Attenuated learning after blocking.
• Overshadowing.
• Context-specific attention.
• Highlighting.
• Et cetera!
Highlighting:

Early Training: I.PE → E
Late Training: I.PE → E, I.PL → L
Testing Results: I → ? (E!); PE.PL → ? (L!)

[Network diagram: cues PE, I, PL; outcomes E, L.]
Design: Highlighting

Phase: Cues → Outcome
Initial Training: (2x) I1.PE1 → E1; (2x) I2.PE2 → E2
3:1 base-rate Training: (3x) I1.PE1 → E1; (3x) I2.PE2 → E2; (1x) I1.PL1 → L1; (1x) I2.PL2 → L2
1:3 base-rate Training: (1x) I1.PE1 → E1; (1x) I2.PE2 → E2; (3x) I1.PL1 → L1; (3x) I2.PL2 → L2
Testing: PE.PL → ?, etc.
"Canonical" Design: Highlighting

# Blocks: Cues → Outcome
N1: (2x) I1.PE1 → E1; (2x) I2.PE2 → E2
N2: (3x) I1.PE1 → E1; (3x) I2.PE2 → E2; (1x) I1.PL1 → L1; (1x) I2.PL2 → L2
N1+N2: (1x) I1.PE1 → E1; (1x) I2.PE2 → E2; (3x) I1.PL1 → L1; (3x) I2.PL2 → L2
Testing: PE.PL → ?, etc.

Across the whole design, the frequency of I.PE → E trials equals the frequency of I.PL → L trials.
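As an illustration (the helper names and block counts are mine), one way to assemble a trial sequence for this canonical design; the final comment verifies the equal-frequency property noted above:

```python
import random

def phase_block(n_e, n_l):
    # One block: n_e copies of each early item, n_l copies of each late item.
    trials = ([("I1.PE1", "E1")] * n_e + [("I2.PE2", "E2")] * n_e +
              [("I1.PL1", "L1")] * n_l + [("I2.PL2", "L2")] * n_l)
    random.shuffle(trials)
    return trials

N1, N2 = 4, 4  # illustrative block counts
sequence = [t for _ in range(N1) for t in phase_block(2, 0)]        # initial
sequence += [t for _ in range(N2) for t in phase_block(3, 1)]       # 3:1 phase
sequence += [t for _ in range(N1 + N2) for t in phase_block(1, 3)]  # 1:3 phase
# Per cue set, I.PE->E occurs 2*N1 + 3*N2 + (N1+N2) times and I.PL->L occurs
# N2 + 3*(N1+N2) times; both equal 3*N1 + 4*N2, so overall frequencies match.
```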
An Application: Highlighting while web browsing.

If a page is browsed left-to-right and top-to-bottom, then I.PE → E tends to be encountered before I.PL → L.
Test items: I and PE.PL.

Results: I yields a strong preference for the Early quality; PE.PL yields a strong preference for the Later quality.
An Application: Highlighting of personal attributes.

Early Training: honest(+) & conventional(−) → Fred
Late Training: honest(+) & conventional(−) → Fred; honest(+) & materialistic(−) → Jack

[Diagram: honest(+) plays the role of I; conventional(−) plays PE; materialistic(−) plays PL; Fred and Jack play the outcomes E and L.]

[Figure: Fred and Jack, with likability ratings of 5.60 and 6.47.]
What causes highlighting?

• Can your favorite model of learning account for highlighting?
• How about various Bayesian approaches?
  – The only candidates are Bayesian approaches with sensitivity to time or trial order.
Rational Model (J. R. Anderson, 1990)

• Representation:
  – There are internal clusters that represent subsets of training items.
  – Each cluster has its own set of Dirichlet distributions over beliefs about feature probabilities.
• Learning:
  – Each presented item is assigned to the cluster that is most probable.
  – The Dirichlet parameters of that cluster are updated by Bayes' rule (see the sketch below).
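A much-simplified sketch of this assign-then-update step (my assumptions: binary features, Beta/Dirichlet smoothing with parameter ALPHA, and a fixed set of clusters, omitting the full model's coupling-probability and new-cluster mechanisms):

```python
ALPHA = 1.0   # symmetric Dirichlet/Beta smoothing (illustrative)

def item_prob(cluster, item):
    # Product over features of the cluster's posterior-predictive probability.
    p = 1.0
    for feature, value in item.items():
        counts = cluster["counts"].get(feature, {0: 0, 1: 0})
        total = counts[0] + counts[1]
        p *= (counts[value] + ALPHA) / (total + 2 * ALPHA)
    return p

def learn(clusters, item):
    # Assign the item to the most probable cluster ...
    best = max(clusters, key=lambda c: item_prob(c, item))
    # ... then Bayesian-update that cluster's Dirichlet (here Beta) counts.
    for feature, value in item.items():
        counts = best["counts"].setdefault(feature, {0: 0, 1: 0})
        counts[value] += 1

clusters = [{"counts": {}}, {"counts": {}}]
learn(clusters, {"I": 1, "PE": 1, "E": 1})   # one training item
```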
Kalman Filter (Sutton 1992; Dayan, Kakade, et al. 2000+)

[Network: cue activations $a_1^{cue}, a_2^{cue}$ feed weights $w_1, w_2$ into a linear output $a^{out} = \sum_i w_i\, a_i^{cue}$.]

$p(w) \sim N(w \mid \bar{w}, C) \qquad p(t) \sim N(t \mid a^{out}, v)$
Kalman Filter Updating: Step 1. Linear Dynamics

$\bar{w}^* = D\,\bar{w} \qquad C^* = D\,C\,D^{T} + U$
Kalman Filter Updating: Step 2. Bayesian Learning

$\bar{w}' = \bar{w}^* + C^* a^{cue} \left( a^{cue\,T} C^* a^{cue} + v \right)^{-1} \left( t - a^{out} \right)$

$C' = C^* - C^* a^{cue} \left( a^{cue\,T} C^* a^{cue} + v \right)^{-1} a^{cue\,T} C^*$

with $p(w) \sim N(w \mid \bar{w}^*, C^*)$ and $a^{out} = \sum_i \bar{w}_i^*\, a_i^{cue}$.
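A sketch of one complete trial implementing the two updating steps above; the constants $D$, $U$, $v$ and all numerical values are illustrative:

```python
import numpy as np

def kalman_trial(w_bar, C, a_cue, t, D, U, v):
    # Step 1: linear dynamics  w* = D w,  C* = D C D^T + U
    w_star = D @ w_bar
    C_star = D @ C @ D.T + U
    # Step 2: Bayesian learning toward target t for output a_out = a_cue . w*
    a_out = a_cue @ w_star
    gain = C_star @ a_cue / (a_cue @ C_star @ a_cue + v)  # Kalman gain
    w_new = w_star + gain * (t - a_out)
    C_new = C_star - np.outer(gain, a_cue) @ C_star
    return w_new, C_new

n = 3                                       # weights for cues PE, I, PL
w_bar, C = np.zeros(n), 0.04 * np.eye(n)    # Gaussian prior on w
D, U, v = np.eye(n), 0.01 * np.eye(n), 0.5  # illustrative constants
a_cue = np.array([1.0, 1.0, 0.0])           # PE and I present
w_bar, C = kalman_trial(w_bar, C, a_cue, t=1.0, D=D, U=U, v=v)
```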
Kalman Filter Does Not Show Highlighting:

Symmetric weights:
– The weight from cue I is near zero.
– The weights from PE and PL are equal and opposite.

Explanation of Highlighting:

• Attention rapidly shifts to the distinctive feature of the later-learned outcome.

[Diagrams: the taught network (I and PE → E; I and PL → L) versus the learned network (asymmetric, with I → E and PL → L dominant).]
Models of Attention Shifting: RASHNL (/ALCOVE)
Roughly analogous to Automatic Relevance Determination (ARD) in Radial Basis Function (RBF) networks.
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22-44.
Kruschke, J. K., & Johansen, M. K. (1999). A model of probabilistic category learning. Journal of Experimental Psychology: Learning, Memory and Cognition, 25, 1083-1119.

Models of Attention Shifting: EXIT (/ADIT)
Kruschke, J. K. (1996). Base rates in category learning. Journal of Experimental Psychology: Learning, Memory and Cognition, 22, 3-26.
Kruschke, J. K. (2001). Toward a unified model of attention in associative learning. Journal of Mathematical Psychology, 45, 812-863.

Models of Attention Shifting: ATRIUM & POLE
Erickson, M. A., & Kruschke, J. K. (1998). Rules and exemplars in category learning. Journal of Experimental Psychology: General, 127, 107-140.
Kalish, M. L., Lewandowsky, S., & Kruschke, J. K. (2004). Population of linear experts: Knowledge partitioning and function learning. Psychological Review, 111(4), 1072-1099.

Models of Attention Shifting: Locally Bayesian
Kruschke, J. K. (2006). Locally Bayesian learning with applications to retrospective revaluation and highlighting. Psychological Review, 113, 677-699.
Locally Bayesian Learning Implemented in an Attentional Learning Model

Cues (PE, I, PL): input activations $a^{in}$.
Attention (Hidden): $p(a \mid a^{in}, w^{att})$, with prior $w^{att} \sim p(w^{att})$.
Outcome (E): $p(c \mid a, w^{out})$, with prior $w^{out} \sim p(w^{out})$.
Locally Bayesian Learning Implemented in an Attentional Learning Model

$a_i^{in} = 1$ if cue $i$ is present, 0 otherwise.

$p(a_j = 1) = \operatorname{sig}\!\left( \sum_i w_{ji}^{att}\, a_i^{in} \right)$
Locally Bayesian Learning Implemented in an Attentional Learning Model

Hidden activations are attentionally filtered copies of the input activations.

[Figure: example attentional weight hypotheses $w_j^{att}$, with values such as 6, 4, 0, and −4.]
Locally Bayesian Learning Implemented in an Attentional Learning Model

Each combination of weights constitutes a hypothesis. The hypotheses are symmetrically distributed, with a uniform prior.
Locally Bayesian Learning Implemented in an Attentional Learning Model

$p(E = 1) = \operatorname{sig}\!\left( \sum_j w_j^{out}\, a_j \right)$, with $a_j \in \{0,1\}$ and $\hat{a}_j = p(a_j = 1)$.
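A sketch of the two layers' likelihoods with one hypothesized weight matrix and vector (the specific values are illustrative, drawn from the slides' range):

```python
import numpy as np

def sig(z):
    return 1.0 / (1.0 + np.exp(-z))

a_in = np.array([1.0, 1.0, 0.0])     # cues: PE and I present, PL absent
w_att = np.array([[6.0, 0.0, 0.0],   # rows: attention units for PE, I, PL
                  [0.0, 6.0, -4.0],  # PL (if present) would inhibit I's unit
                  [0.0, 0.0, 6.0]])
w_out = np.array([5.0, 5.0, 0.0])    # outcome weights for outcome E

a_hat = sig(w_att @ a_in)            # hat(a)_j = p(a_j = 1)
p_E = sig(w_out @ a_hat)             # outcome probability under mean attention
print(a_hat, p_E)
```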
Locally Bayesian Learning Implemented in an Attentional Learning Model

The outcome is an arbitrary combination of cues. The prior favors all-zero weights and is symmetrically distributed.

[Figure: example outcome weight hypotheses $w^{out}$, with values such as 0, 5, and −5.]
Highlighting: Prior Distribution

Prior beliefs are symmetric: there are 50-50 beliefs in neutral (0) versus inhibitory (−4) weights from PE and PL to attention on I.
Hypotheses After Initial Learning of PE.I → E

[Figure: surviving weight hypotheses; visible values include outcome weights of 5 from PE and I to E, and attention weights of 6.]
Hypotheses After All Learning, PE.I → E and I.PL → L

Inhibition of I by PL prevents disconfirmation of the previous learning that I → E.

[Figure: surviving weight hypotheses, including an inhibitory weight (−4) from PL to attention on I.]
Highlighting: End of training

• PE does not inhibit attention to I: beliefs in weights from PE to I-attention have shifted toward 0.
• PL does inhibit attention to I: beliefs in weights from PL to I-attention have shifted toward −4.
• Beliefs about I are asymmetric: stronger beliefs in +5 weights than in −5 weights.
• Beliefs about PE and PL are asymmetric: PL beliefs are more extreme than PE beliefs.
Layers of Kalman Filters Applied to Highlighting

[Architecture: cues PE, I, PL feed an attention layer of Kalman filters (weights $w_{PE}, w_I, w_{PL}$ per attention node), which feeds an outcome layer of Kalman filters (weights $w_{PE}, w_I, w_{PL}$ per outcome node) for outcomes E and L.]

Layers of Kalman Filters: Likelihood and Prior Distributions

[Each layer has Gaussian priors on its weights and a Gaussian outcome likelihood, as in the single Kalman filter above.]
Layers of Kalman Filters: Outcome generation

[The input activation vector $x$ is filtered by the attention layer, whose output $y$ drives the outcome layer.]
Layers of Kalman Filters: Target for Attention

The target for the attention layer is the attention vector that is maximally consistent with the outcome layer's current beliefs. (To determine a unique maximum, a tiny cost was included for unequal attention values, and a tiny cost for non-zero attention on an absent cue.)
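A sketch of this target search (the restriction to binary attention vectors, the Gaussian outcome likelihood, and the cost constants are my simplifying assumptions): enumerate candidate attention vectors and score each by outcome consistency minus the tiny costs.

```python
import numpy as np
from itertools import product

def attention_target(w_out_mean, a_in, t, v=1.0, eps=1e-3):
    best, best_score = None, -np.inf
    for alpha in product((0.0, 1.0), repeat=len(a_in)):
        alpha = np.array(alpha)
        log_lik = -(t - alpha @ w_out_mean) ** 2 / (2 * v)  # Gaussian outcome
        cost = eps * np.var(alpha)                # cost for unequal attention
        cost += eps * np.sum(alpha * (1 - a_in))  # cost for absent-cue attention
        if log_lik - cost > best_score:
            best, best_score = alpha, log_lik - cost
    return best

# Illustrative call: PE and I present, outcome target t = 1.
print(attention_target(np.array([0.6, 0.5, -0.2]),
                       np.array([1.0, 1.0, 0.0]), t=1.0))
```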
Layers of Kalman Filters: Dynamics and Bayesian Learning

[Each layer then applies the two Kalman updating steps, linear dynamics followed by Bayesian learning, to its own weights.]
Layers of Kalman Filters Applied to Highlighting: Initial p(w)

[Figure: pairwise weight distributions over $w_{PE}, w_I, w_{PL}$ for Outcome Nodes 1-2 and Attention Nodes 1-3. Initially every weight has mean 0.00 (var = 0.04), except each attention node's own-cue weight, which has mean 1.00 (var = 0.00); all covariances are 0.00.]
Layers of Kalman Filters Applied to Highlighting: Final p(w)

[Figure, after Phase 3, Epoch 3, Trial 4: posterior means for Outcome Node 1 (E) are $w_{PE} = .63$, $w_I = .57$, $w_{PL} = -.16$; for Outcome Node 2 (L), $w_{PE} = -.13$, $w_I = .17$, $w_{PL} = .93$. For Attention Node 2 (attention to I), the weight from PE has mean $-.14$ while the weight from PL has mean $-.68$: the PE and PL weights are unequal (≠), so the final beliefs are asymmetric.]

Inhibition of I by PL prevents disconfirmation of the previous learning that I → E.
Summary

• Locally Bayesian learning was applied to attentional shifts in associative learning, specifically to account for "highlighting" (taught: I.PE → E and I.PL → L; learned: asymmetric associations).
• Different levels of analysis invite the possibility of a chain of Bayesian learners (Thomas, Richard, Harold, each with likelihood $p(y_k|x_k,\theta_k)$ and prior $p(\theta_k)$).
• Locally Bayesian learning prevents disconfirmation of the superior's beliefs and creates distortions in the inferior's beliefs.