A Model of Information Foraging via Ant Colony Simulation Matthew Kusner
Jan 12, 2016
Information Foraging
Theory Background
– People search for information in roughly the same way that animals search for food in their surroundings.
Information Scent
– Ex: “the text associated with Web links” (Fu & Pirolli, 2007)
– Background knowledge
– Recommendations
Ant Colony Simulation
Pheromone trails
– Laid by ants who've found food.
– Followed by other ants with probability p.
– Path Evaporation
– Path Optimization
Simulation specifics
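The slides do not spell out the simulation's update rules, so the following is only a minimal sketch of the pheromone mechanics named above. It assumes that an ant picks a path with probability proportional to its pheromone level, that shorter paths receive more pheromone per trip, and that trails evaporate by a fixed fraction each step. The function name `simulate` and all parameters are hypothetical.

```python
import random

def simulate(distances, n_ants=200, n_steps=50, evaporation=0.1, deposit=1.0, seed=0):
    """Return the number of ant visits per food source.

    distances: positive distance from the nest to each food source.
    The update rule and every parameter here are illustrative assumptions,
    not the simulation used in the talk.
    """
    rng = random.Random(seed)
    n = len(distances)
    pheromone = [1.0] * n   # equal initial scent on every path
    visits = [0] * n
    for _ in range(n_steps):
        deposits = [0.0] * n
        for _ in range(n_ants):
            # An ant follows a path with probability proportional to its pheromone.
            choice = rng.choices(range(n), weights=pheromone)[0]
            visits[choice] += 1
            # Assumption: shorter paths receive more pheromone per trip.
            deposits[choice] += deposit / distances[choice]
        # Path evaporation: every trail decays, then the new pheromone is added.
        pheromone = [(1.0 - evaporation) * p + d for p, d in zip(pheromone, deposits)]
    return visits

visits = simulate([1.0, 2.0, 4.0])
```

Under these assumptions the positive feedback between deposits and path choice concentrates visits on the nearest food source, which is the path-optimization effect the slide refers to.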
AOL Data Set
– 21 million queries (March 1 – May 31, 2006)
– 650k users
– 19 million click-through events
Quantities:
– query
– time of query
– click URL
– user ID
– clicked link rank
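As an illustration of how the click-through events could be turned into per-website visit counts for one query, here is a small sketch. It assumes a tab-separated log with a header row whose column names are `AnonID`, `Query`, `QueryTime`, `ItemRank`, and `ClickURL`; those names, the helper `visits_per_website`, and the sample rows are assumptions for illustration, not part of the talk.

```python
import csv
import io
from collections import Counter

def visits_per_website(lines, query):
    """Count click-through events per clicked URL for a single query.

    lines: an iterable of tab-separated text lines, header row first.
    Column names are assumed; rows without a click URL are skipped.
    """
    counts = Counter()
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["Query"] == query and row["ClickURL"]:
            counts[row["ClickURL"]] += 1
    return counts

# Hypothetical sample rows, not taken from the real data set.
sample = (
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "1\tvacation\t2006-03-01 10:00:00\t1\thttp://a.example\n"
    "2\tvacation\t2006-03-02 11:00:00\t\t\n"
    "3\tvacation\t2006-03-03 12:00:00\t2\thttp://b.example\n"
    "4\tvacation\t2006-03-04 13:00:00\t1\thttp://a.example\n"
    "5\tzebra\t2006-03-05 14:00:00\t1\thttp://z.example\n"
)
counts = visits_per_website(io.StringIO(sample), "vacation")
```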
Information Foraging → Ant Colony
– user → ant
– clicked link → food
– information scent → pheromone path
– website importance → food distance
where website importance is defined by:
– 1. Rank
– 2. Popularity of website
– 3. Combination of above methods
Distancing Methods
• Ranking
• Popularity
• Combination
[based on data in Joachims et al., 2005]
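The slides name the three distancing methods but not their formulas. The sketch below shows one way each could map website importance to a food distance, with more important websites placed closer to the nest. Every formula, function name, and the `weight` parameter is an assumption made for illustration.

```python
def rank_distance(rank):
    # Assumption: the link at rank 1 is closest; distance grows linearly with rank.
    return float(rank)

def popularity_distance(clicks, total_clicks):
    # Assumption: distance is the inverse of the website's share of all clicks,
    # so a website with half of the clicks sits at distance 2.
    return total_clicks / clicks

def combined_distance(rank, clicks, total_clicks, weight=0.5):
    # Assumption: a weighted average of the two distances above.
    return (weight * rank_distance(rank)
            + (1.0 - weight) * popularity_distance(clicks, total_clicks))
```

For example, under these assumptions a website at rank 2 that received 50 of 100 clicks has `combined_distance(2, 50, 100)` equal to 2.0.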
Results
• AOL user-visit per website vector
– [numWvisits_1, numWvisits_2, ..., numWvisits_n]
• Simulation ant-visit per food vector
– [numAvisits_1, numAvisits_2, ..., numAvisits_n]
• Pearson Correlation Score (PCS)
• Bootstrapping → 95% Confidence Interval (CI)
– Select (AOL_data_i, simulation_data_i) pairs with replacement
• Permutation Test → p-value
– Shuffle AOL vector
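The evaluation steps listed above can be sketched with the standard library alone. The code computes the PCS between the two visit vectors, an interval from resampling (AOL, simulation) pairs with replacement, and a p-value from shuffling the AOL vector. The function names, the number of resamples, the one-sided test, and the example vectors are assumptions; the talk does not state these details.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    return cov / (sx * sy)

def resampled_interval(x, y, n_resamples=2000, level=0.95, seed=0):
    """Percentile interval for the PCS from resampling pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    scores.sort()
    lo = scores[int((1 - level) / 2 * n_resamples)]
    hi = scores[int((1 + level) / 2 * n_resamples) - 1]
    return lo, hi

def shuffle_p_value(x, y, n_shuffles=2000, seed=0):
    """One-sided p-value: how often a shuffled x correlates with y at least as well."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(x)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if pearson(shuffled, y) >= observed:
            count += 1
    return (count + 1) / (n_shuffles + 1)

# Hypothetical visit vectors, not data from the study.
aol = [10, 8, 6, 4, 2, 1]
sim = [9, 9, 5, 4, 3, 1]
r = pearson(aol, sim)
lo, hi = resampled_interval(aol, sim)
p = shuffle_p_value(aol, sim)
```

With very few distinct websites, as for several queries in the study, resampled intervals like this tend to be wide.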
Query       Type of distancing  # of users  # of clicked links  # of distinct websites visited  Average PCS  Average 95% CI Start  Average 95% CI End  Significant p-val?
vacation    ranking             125         59                  19                              0.8182       0.3203                0.9364              Yes
vacation    popularity          125         59                  19                              0.1296       -0.1768               0.6624
vacation    combination         125         59                  19                              0.1488       -0.3819               0.3920
rhino       ranking             39          25                  6                               0.7631       -0.4781               0.9854
rhino       popularity          39          25                  6                               0.3906       -0.2484               0.9919
rhino       combination         39          25                  6                               0.2013       -0.7389               0.9657
zebra       ranking             53          61                  12                              -0.1825      -0.5426               0.4706
zebra       popularity          53          61                  12                              -0.0110      -0.4667               0.5079
zebra       combination         53          61                  12                              0.1558       -0.3655               0.6754
lion        ranking             52          39                  9                               0.6118       -0.1797               0.9214
lion        popularity          52          39                  9                               0.0699       -0.5776               0.7296
lion        combination         52          39                  9                               0.0304       -0.6170               0.6609
football    ranking             194         56                  21                              0.5358       -0.0952               0.9301
football    popularity          194         56                  21                              0.2693       -0.1583               0.6722
football    combination         194         56                  21                              0.4149       -0.0223               0.7612
basketball  ranking             220         74                  16                              0.7137       -0.4225               0.9529
basketball  popularity          220         74                  16                              0.2228       -0.1755               0.6455
basketball  combination         220         74                  16                              0.1415       -0.3470               0.6661
Results
• Queries with significant p-values:
– “vacation” (ranking), “baseball” (ranking), “reebok” (ranking), “adidas” (ranking), “marbles” (ranking), “helicopter” (ranking), “car” (ranking), “potatoes” (ranking), “coffee” (ranking), “farming” (ranking), “rock” (popularity), “shirts” (ranking), “playstation” (ranking), “sega” (popularity), “tom cruise” (ranking), “mel gibson” (ranking), “burger king” (ranking), “chicago” (ranking), “los angeles” (ranking), and “paris” (ranking)
• Distancing methods without 95% CI overlap:
– Ranking:
• “potatoes” - neither popularity nor combination
• “shirts” - not popularity
• “playstation” - not popularity
• “burger king” - not combination
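Whether two methods' intervals overlap can be checked with a one-line comparison. The sketch below applies it to the average 95% CIs reported for “vacation” in the results table; the helper name `intervals_overlap` is hypothetical.

```python
def intervals_overlap(a, b):
    """True if the closed intervals a=(start, end) and b=(start, end) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Average 95% CIs for "vacation", copied from the results table.
ranking = (0.3203, 0.9364)
popularity = (-0.1768, 0.6624)
combination = (-0.3819, 0.3920)

ranking_vs_popularity = intervals_overlap(ranking, popularity)
ranking_vs_combination = intervals_overlap(ranking, combination)
```

Both comparisons report an overlap, which is consistent with “vacation” not appearing in the list above.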
Discussion
• Disadvantages of popularity and combination methods
– “vacation” example
• Possible reasons for 95% CI overlap
– Randomness
– Disregard of structure
• Significance of queries with low p-values
– Search engine matching
• Future directions
– Different Simulation
– Other similarity metrics
– Random beginnings
References
• Fu, W., & Pirolli, P. (2007). SNIF-ACT: A cognitive model of user navigation on the World Wide Web. Human-Computer Interaction, 22(4), 355-412.
• Joachims, T., Granka, L., Pan, B., Hembrooke, H., & Gay, G. (2005). Accurately interpreting clickthrough data as implicit feedback. Proceedings of the ACM Conference on Research and Development in Information Retrieval (SIGIR).