Hierarchical Bayesian Models for Aggregating Retrieved Memories across Individuals
Mark Steyvers
Department of Cognitive Sciences
University of California, Irvine
Joint work with: Michael Lee, Brent Miller, Pernille Hemmer, Bill Batchelder, Paolo Napoletano
Ordering problem: what is the correct order of these Presidents?
[Slide figure: names to be placed along a time axis]
- Thomas Jefferson
- Andrew Jackson
- James Monroe
- George Washington
- John Adams
Goal: aggregating responses
Individual responses: D A B C, A B D C, B A D C, A C B D, A D B C
Aggregation Algorithm: individual responses -> group answer (A B C D)
Group answer =? ground truth (A B C D)
Bayesian Approach
Individual responses: D A B C, A B D C, B A D C, A C B D, A D B C
Generative Model: ground truth (A B C D) = latent common cause of the responses
Important notes:
- No communication between individuals
- There is always a true answer (ground truth)
- Aggregation algorithm never has access to ground truth; ground truth is only used for evaluation
Matching problem:
[Slide figure: match the painters Rembrandt, Van Gogh, Monet, Renoir to the options A, B, C, D]
Wisdom of crowds phenomenon
Crowd estimate is often better than any individual in the crowd
(Think of independent noise influencing each individual)
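The independent-noise intuition can be checked with a small simulation. This is a minimal sketch, assuming Gaussian noise and a made-up true value; the function name `simulate_crowd` and all numbers are hypothetical and do not come from the talk.

```python
import random
import statistics

def simulate_crowd(true_value=100.0, n_people=500, noise_sd=20.0, seed=0):
    """Simulate independent noisy estimates of a single quantity."""
    rng = random.Random(seed)
    # Each individual reports the truth plus independent Gaussian noise.
    estimates = [true_value + rng.gauss(0.0, noise_sd) for _ in range(n_people)]
    # Crowd estimate: the median of all individual estimates.
    crowd_error = abs(statistics.median(estimates) - true_value)
    # Typical individual: mean absolute error across the crowd.
    individual_errors = [abs(e - true_value) for e in estimates]
    return crowd_error, statistics.mean(individual_errors)

crowd_error, mean_individual_error = simulate_crowd()
print(f"crowd error: {crowd_error:.2f}")
print(f"mean individual error: {mean_individual_error:.2f}")
```

Because the noise terms are independent, they largely cancel in the median, so the crowd error comes out far smaller than the typical individual error.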
Examples of wisdom of crowds phenomenon
- Who wants to be a millionaire?
- Galton’s Ox (1907): Median of individual estimates comes close to true answer
Limitations of Current “Wisdom of Crowds” Research
Studies restricted to numeric or categorical judgments
- simple averaging schemes: Mode, Median, Mean
No treatment of individual differences
- every “vote” is treated equally
- downplayed role of expertise
Cultural Consensus Theory (CCT)
E.g. Romney, Batchelder, and Weller (1987)
Finds the “answer key” to multiple choice questions when ground truth is lost
- takes person and item differences into account
Informal version of CCT also developed for ranking data
Research Goals
Generalize “wisdom of crowds” effect to more complex data
Aggregation of permutations
- Ranking data
- Matching (assignment) data
Hierarchical Bayesian Models
Probability distributions over all permutations of items
- with N items, there are N! permutations
- e.g., when N=44, we have 44! > 10^53 permutations
- Approximate inference methods: MCMC
Cognitively plausible generative processes
Treatment of individual differences
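The size of the permutation space is easy to verify directly. This short check uses only the Python standard library and illustrates why exhaustive enumeration is ruled out and approximate inference is needed.

```python
import math

n_items = 44
# Number of distinct orderings of 44 items.
n_orderings = math.factorial(n_items)

print(f"{n_items}! = {n_orderings:.3e}")
print("exceeds 10^53:", n_orderings > 10**53)
```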
Part I: Ordering Problems
Experiment 1
Task: order all 44 US presidents
Methods
- 26 participants (college undergraduates)
- Names of presidents written on cards
- Cards could be shuffled on large table
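Measuring performance
Kendall’s Tau: the number of adjacent pairwise swaps needed to turn a response ordering into the true ordering
[Slide figure: example orderings with swap counts = 1 and = 1+1]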
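The swap count can be computed by counting item pairs that appear in opposite order in the response and in the truth. This is a minimal sketch; the function name `kendall_tau_distance` is chosen here for illustration, and the example responses are the ones shown earlier in the talk with ground truth A B C D.

```python
def kendall_tau_distance(response, truth):
    """Count item pairs ordered differently in response and truth.

    This count equals the minimum number of adjacent pairwise swaps
    needed to turn the response into the true ordering.
    """
    position = {item: idx for idx, item in enumerate(truth)}
    ranks = [position[item] for item in response]
    swaps = 0
    for i in range(len(ranks)):
        for j in range(i + 1, len(ranks)):
            if ranks[i] > ranks[j]:  # this pair is in the wrong order
                swaps += 1
    return swaps

truth = "ABCD"
for response in ["DABC", "ABDC", "BADC", "ACBD", "ADBC"]:
    print(response, kendall_tau_distance(response, truth))
```

The double loop is O(N^2), which is fine for lists of 10 to 44 items.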
6 videos
- 3 videos with stereotyped event sequences (e.g. wedding)
- 3 “unpredictable” videos (e.g., example video)
- extracted 10 stills for testing
Method
- study video followed by immediate ordering test of 10 items
Bayesian Thurstonian Model
[Figure: yogurt commercial. Event labels as shown on the slide: event1 (1), event2 (2), event3 (3), event4 (4), event5 (7), event6 (6), event7 (5), event8 (8), event9 (9), event10 (10). Reported on the slide: R=0.890, τ = 3]
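For readers new to Thurstonian models, the generative idea can be sketched as follows: each item has a latent position, each person perceives that position with Gaussian noise, and the reported order is the order of the noisy samples. This is a generic illustration, not the specific model from the talk; the function name, item positions, and noise levels are all hypothetical.

```python
import random

def sample_ordering(item_means, person_sd, rng):
    """Draw one person's ordering from a simple Thurstonian process."""
    # Perturb each item's latent position with person-specific Gaussian noise.
    noisy = {item: rng.gauss(mu, person_sd) for item, mu in item_means.items()}
    # The reported ordering is the items sorted by their noisy positions.
    return sorted(noisy, key=noisy.get)

rng = random.Random(1)
# Hypothetical latent positions; the true order is A, B, C, D.
item_means = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}

accurate_person = sample_ordering(item_means, person_sd=0.01, rng=rng)
noisy_person = sample_ordering(item_means, person_sd=2.0, rng=rng)
print(accurate_person, noisy_person)
```

A small `person_sd` plays the role of an expert who nearly always reproduces the true order, which is how such a model can weight individuals differently.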
Two other examples
[Figure: clay animation. event1 (1), event2 (2), event3 (3), event4 (4), event5 (6), event6 (5), event7 (7), event8 (8), event9 (9), event10 (10); τ = 1]
[Figure: wedding. event1 (1) through event10 (10), all in order; τ = 0]
Overall Results
[Figure: y-axis "Mean" (ticks 0 to 15) by individuals (x-axis, ticks 1 to 30). Legend: Thurstonian Model, Borda count, Mode, Individuals]
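The Borda count baseline named in the legend can be sketched in a few lines: sum each item's rank across individuals and sort by the total. This is a minimal sketch of the standard method, not code from the talk; the responses are the five example orderings shown earlier, and ties are broken alphabetically as an arbitrary choice.

```python
def borda_order(responses):
    """Aggregate orderings by summed rank (Borda count)."""
    totals = {}
    for response in responses:
        for rank, item in enumerate(response):
            totals[item] = totals.get(item, 0) + rank
    # A lower total rank means the item was placed earlier on average.
    # Ties are broken alphabetically (an arbitrary choice for this sketch).
    return sorted(totals, key=lambda item: (totals[item], item))

responses = ["DABC", "ABDC", "BADC", "ACBD", "ADBC"]
group_answer = borda_order(responses)
print(group_answer)
```

Unlike the Thurstonian approach, every individual counts equally here.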
Part III: Matching Problems
Example Matching Problem (one-to-one)
Languages: Dutch, Danish, Yiddish, Thai, Vietnamese, Chinese, Georgian, Russian, Japanese
Response options: A, B, C, D, E, F, G, H, I
Greetings recovered from the slide:
- godt nytår
- gelukkig nieuwjaar
- a gut yohr
- С Новым Годом
- สวัสดีปีใหม่
- Chúc Mừng Năm Mới
- გილოცავთ ახალწელს
Experiment
17 Participants
8 matching problems, e.g.
- car logos and brand names
- first and last names of philosophers
- flags and countries
- Greek symbols and letter names
Number of items varied between 10 and 24
- with 24 items, we have 24! possibilities
Overall Results
[Figure: mean accuracy (y-axis, 0 to 1) for individuals 1 to 17 (x-axis)]
Heuristic Aggregation Approach
Combinatorial optimization problem
- maximizes agreement in assigning N items to N responses
Hungarian algorithm
- construct a count matrix M
- M_ij = number of people that paired item i with response j
- find row and column permutations to maximize diagonal sum
- O(n^3)
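The count-matrix construction and the objective can be sketched as below. To stay within the standard library, this sketch replaces the Hungarian algorithm with an exhaustive search over permutations, which finds the same optimum but costs O(n!) and is only usable for very small n; in practice a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` would be used instead. The function names and the response data are hypothetical.

```python
from itertools import permutations

def count_matrix(responses, n_items):
    """M[i][j] = number of people who paired item i with response j."""
    M = [[0] * n_items for _ in range(n_items)]
    for assignment in responses:
        for i, j in enumerate(assignment):
            M[i][j] += 1
    return M

def best_assignment(M):
    """Find the one-to-one assignment with maximal total agreement.

    Brute force over all permutations (O(n!)); a stand-in for the
    Hungarian algorithm, which solves the same problem in O(n^3).
    """
    n = len(M)
    return max(permutations(range(n)),
               key=lambda perm: sum(M[i][perm[i]] for i in range(n)))

# Hypothetical data: each list gives the response index chosen for items 0..3.
responses = [
    [0, 1, 2, 3],
    [0, 1, 3, 2],
    [1, 0, 2, 3],
    [0, 2, 1, 3],
]
M = count_matrix(responses, 4)
group_matching = best_assignment(M)
print(M)
print(group_matching)
```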
- match “known” items
- guess between remaining ones
Individual differences:
- some items easier to know
- some participants know more
[Slide example: languages Dutch, Danish, Yiddish, Russian; greetings godt nytår, gelukkig nieuwjaar, a gut yohr, С Новым Годом]
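Graphical Model
[Figure: graphical model with a plate over i items and a plate over j individuals]
- z: latent ground truth
- y_j: observed matching
- x_j: knowledge state
- s_j: prob. of knowing
- a_j: person ability
- d_i: item easiness
\operatorname{logit}(s_{ij}) = d_i + a_j
x_{ij} \sim \operatorname{Bernoulli}(s_{ij})
p(y_{ij} \mid z) = \begin{cases} 1 & x_{ij} = 1 \\ 1/n! & x_{ij} = 0 \end{cases}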
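The generative story behind this model (know some items, guess among the rest) can be simulated as follows. This is a sketch under stated assumptions: the logit is taken to be item easiness plus person ability, and n in the 1/n! term is taken to be the number of items the person does not know, so that guessing is a uniformly random one-to-one assignment among the unknown items. Function and variable names are hypothetical.

```python
import math
import random

def logistic(v):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-v))

def simulate_matching(truth, easiness, ability, rng):
    """Simulate one person's matching response.

    Known items are answered correctly; the correct responses of the
    unknown items are shuffled among the unknown items (assumed here).
    """
    # Knowledge state: x_i = 1 with probability logistic(d_i + a).
    known = [rng.random() < logistic(d + ability) for d in easiness]
    response = list(truth)
    unknown = [i for i, k in enumerate(known) if not k]
    guesses = [truth[i] for i in unknown]
    rng.shuffle(guesses)  # each of the n! arrangements is equally likely
    for i, g in zip(unknown, guesses):
        response[i] = g
    return known, response

rng = random.Random(0)
truth = list(range(6))
easiness = [0.0] * 6
known_hi, response_hi = simulate_matching(truth, easiness, ability=50.0, rng=rng)
known_mid, response_mid = simulate_matching(truth, easiness, ability=0.0, rng=rng)
print(known_mid, response_mid)
```

Inference in the talk runs in the opposite direction, recovering the ground truth and the ability and easiness parameters from observed responses.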
Overall Modeling Results
[Figure: mean accuracy (y-axis, 0 to 1) by individuals (x-axis, ticks 1 to 20). Legend: Bayesian Matching, Hungarian Algorithm, Individuals]
Calibration at level of items and people (for paintings problem)
[Figure: scatter plots of inferred vs. actual values (both axes 0 to 1), one panel per problem]
ITEMS, D (inferred) vs. D (actual):
- Greek symbols: R=0.953
- Philosophers: R=0.978
- Flags: R=0.973
- Paintings: R=0.916
- US presidents: R=0.960
- Car logos: R=0.918
- Languages: R=0.947
- Sport balls: R=0.963
INDIVIDUALS, A (inferred) vs. A (actual):
- Greek symbols: R=0.990
- Philosophers: R=0.992
- Flags: R=0.987
- Paintings: R=0.975
- US presidents: R=0.992
- Car logos: R=0.992
- Languages: R=0.968
- Sport balls: R=0.995
How predictive are subject-provided confidence ratings?
[Figure: accuracy (y-axis, 0 to 1) by number of guesses (bins 0, 1-2, 3-4, 5+). Panels: # guesses estimated by individual; # guesses estimated by model (based on variable A). Correlations shown on the slide, in that order: r=-.42, r=-.77]
Part IV: Open Issues
When do we get wisdom of crowds effect?
Independent errors
- different people knowing different things
Population response centered around ground truth
Some minimal number of individuals
- 10-20 individuals often sufficient
What are methods for finding experts?
1) Self-reported expertise:
- unreliable
- has led to claims of “myth of expertise”
2) Based on explicit scores by comparing to ground truth
- but ground truth might not be immediately available
3) Endogenously discover experts
- Use the crowd to discover experts
- Small groups of experts can be effective
What to do about systematic biases?
In some tasks, individuals systematically distort the ground truth
- spatial and temporal distortions
- memory distortions (e.g. false memory)
- decision-making distortions
Does this diminish the wisdom of crowds effect?
- maybe… but a model that predicts these systematic distortions might be able to “undo” them
Can we build domain specific models?
Thurstonian model applied to wide variety of problems
How about domain specific models?
- e.g., apply serial recall models to serial recall
- better specify sources of noise
- model systematic biases
[Figure legend: Mallows Model, Borda Counts, Mode, Thurstone v1, Humans, Thurstone v2]
Notes
Noise in Thurstonian models
- acquisition / encoding noise
- retrieval noise
Link to crowd within (Ed Vul)
- are our results due to wisdom of crowds or individuals?
- Probably a bit of both, and we cannot tell with our experiments
- However, there is probably a fair amount of encoding noise that would not benefit from repeated measurements within individuals
- Different individuals probably do know different things
To Do
Compare explicitly estimated number of guesses with latent confidence
Identifiability issue
- fix mean A?
Hierarchical model
- test on small numbers of subjects
Model comparisons on small sets of subjects
TO DO: look at kurtosis of sigma distributions
Modeling Group Serial Recall
Goal: infer distribution over orderings of events given verbal reports
- i.e., P( original order | verbal report )
Many models for serial recall, e.g.
- Estes Perturbation model (1972)
- Shiffrin & Cook (1978)
- SOB (2002)
- SIMPLE (2007)
but many of these models do not have a likelihood function
- p( item 1, item 2, …, item N | memory contents )
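The need for a likelihood function follows from Bayes' rule, which is how the target posterior would be computed. As a sketch, writing o for the original order and r for the verbal report (symbols introduced here for illustration, not taken from the slides):

```latex
P(o \mid r) = \frac{P(r \mid o)\, P(o)}{\sum_{o'} P(r \mid o')\, P(o')}
```

The term P(r | o) is the likelihood that a serial recall model would have to supply, and the sum in the denominator runs over all N! candidate orders, which is why approximate inference is attractive here.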
Bayesian Algorithm: not every person has equal weight
Extended wisdom of crowds to combinatorial problems
- approximate inference (MCMC) to infer probability distributions over permutations
Bayesian methods that are calibrated
- we can tell who is likely to be accurate without having ground truth available
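Graphical Model
[Figure: graphical model with a plate over i items and a plate over j individuals]
- z: latent ground truth
- y_j: observed matching
- x_j: knowledge state
- s_j: prob. of knowing
- d_i, a_j: item and person parameters
\operatorname{logit}(s_{ij}) = d_i + a_j
x_{ij} \sim \operatorname{Bernoulli}(s_{ij})
p(y_{ij} \mid z) = \begin{cases} 1 & x_{ij} = 1 \\ 1/n! & x_{ij} = 0 \end{cases}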
When do we get Wisdom of Crowds effect?
Analyze model performance in a variety of tasks
MDS solution of pairwise tau distances
[Figure: states west-east problem. Points: Individuals, Truth, Thurstonian Model; point labels give distance to truth]
MDS solution of pairwise tau distances
[Figure: ten commandments problem. Points: Individuals, Truth, Thurstonian Model; point labels give distance to truth]
Modeling Performance Across Tasks
Current model is applied independently across tasks
Extend hierarchical model with random effects approach to tasks
- Each person has an overall ability (Spearman’s “g”)
- Ability in a specific task varies around overall ability
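One way to write such a random-effects extension, purely as a sketch: person j has an overall ability g_j, and the ability a_{jt} in task t is drawn around it. The Gaussian form and the symbols g_j, a_{jt}, \sigma_t, \mu_g, \sigma_g are assumptions made here for illustration and are not specified in the slides.

```latex
g_j \sim \mathcal{N}(\mu_g, \sigma_g^2), \qquad
a_{jt} \sim \mathcal{N}(g_j, \sigma_t^2)
```

Under this form, a small \sigma_t means that performance in task t is mostly explained by overall ability, while a large \sigma_t allows for task-specific expertise.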