A Model of Information Foraging via Ant Colony Simulation

Matthew Kusner


Information Foraging Theory

Background

– People search for information in roughly the same way that animals search for food in their surroundings.

Information Scent

– Example: “the text associated with Web links” (Fu & Pirolli, 2007)

– Background knowledge

– Recommendations

Ant Colony Simulation

Pheromone trails

– Laid by ants who've found food.

– Followed by other ants with probability p.

– Path evaporation

– Path optimization

– Simulation specifics
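The pheromone mechanism above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the simulation used in this work: only the follow probability `p_follow` (the slide's p) comes from the slides, while the function name `simulate_colony`, the exploration rule (nearer food is proportionally more likely to be found), the evaporation rate, and the deposit of 1/distance per trip are hypothetical choices.

```python
import random


def simulate_colony(distances, n_ants=1000, p_follow=0.7, evaporation=0.05, seed=0):
    """Return the number of ant visits per food source (hypothetical sketch).

    distances: positive food distances; a shorter distance means a more important site.
    p_follow: probability p that an ant follows existing pheromone trails.
    evaporation: fraction of every trail lost after each ant's trip (assumed value).
    """
    rng = random.Random(seed)
    n = len(distances)
    pheromone = [0.0] * n
    visits = [0] * n
    for _ in range(n_ants):
        if sum(pheromone) > 0 and rng.random() < p_follow:
            # Follow the trails: pick a food source in proportion to its pheromone level.
            food = rng.choices(range(n), weights=pheromone)[0]
        else:
            # Explore: assume nearer food is proportionally more likely to be found.
            food = rng.choices(range(n), weights=[1.0 / d for d in distances])[0]
        visits[food] += 1
        # Path evaporation: every trail decays a little after each trip.
        pheromone = [(1.0 - evaporation) * t for t in pheromone]
        # Path optimization: shorter paths receive more pheromone per trip.
        pheromone[food] += 1.0 / distances[food]
    return visits
```

For example, `simulate_colony([1.0, 2.0, 4.0])` returns three visit counts summing to 1000, with most visits going to the nearest food source. Evaporation keeps early random choices from locking in forever, while the 1/distance deposit is what lets shorter paths win out over time.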

AOL Data Set

– 21 million queries (March 1 – May 31, 2006)

– 650k users

– 19 million click-through events

– Quantities: query, time of query, click URL, user ID, clicked link rank
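Turning the log into per-website visit counts for one query might look like the sketch below. It assumes the tab-separated layout of the public AOL release (header columns AnonID, Query, QueryTime, ItemRank, ClickURL, with the last two fields empty when the user did not click); verify this against the actual files. The names `LogRecord`, `parse_log`, `visits_per_website`, and `visit_vector` are hypothetical.

```python
from collections import Counter
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    user_id: str
    query: str
    query_time: str
    rank: Optional[int]  # clicked link rank, None if there was no click
    url: Optional[str]   # click URL, None if there was no click


def parse_log(text):
    """Parse tab-separated log text whose first line is a header row."""
    records = []
    for line in text.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        fields = line.split("\t")
        fields += [""] * (5 - len(fields))  # pad rows whose empty click fields were trimmed
        user_id, query, query_time, rank, url = fields[:5]
        records.append(LogRecord(user_id, query, query_time,
                                 int(rank) if rank else None, url or None))
    return records


def visits_per_website(records, query):
    """Count click-through events per URL for one exact query string."""
    return Counter(r.url for r in records if r.query == query and r.url)


def visit_vector(counts, websites):
    """Order the counts as a vector over a fixed list of websites (0 if never clicked)."""
    return [counts.get(w, 0) for w in websites]
```

On a few made-up rows where two users click `http://a.example` and one clicks `http://b.example` for "vacation", `visit_vector(visits_per_website(records, "vacation"), ["http://a.example", "http://b.example"])` gives `[2, 1]`.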

Information Foraging → Ant Colony

– user → ant

– clicked link → food

– information scent → pheromone path

– website importance → food distance

Website importance is defined by:

– 1. Rank

– 2. Popularity of website

– 3. Combination of the above methods

Distancing Methods

• Ranking

• Popularity

• Combination

[based on data in Joachims et al., 2005]
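The slides do not give formulas for turning website importance into food distance, so the functions below are hypothetical illustrations of the three methods, not the ones used in this work. In particular, the slide notes that the ranking method draws on data from Joachims et al. (2005); this sketch does not reproduce that data and simply uses the rank itself as the distance. The popularity rule (distance inversely proportional to click count) and the combination rule (average of the two distances after scaling each so its largest value is 1) are also assumptions.

```python
def ranking_distance(ranks):
    """Hypothetical: a site's distance is its result rank, so rank 1 is nearest."""
    return [float(r) for r in ranks]


def popularity_distance(click_counts):
    """Hypothetical: distance is inversely proportional to clicks; the most clicked site is at 1.0."""
    top = max(click_counts)
    return [top / c for c in click_counts]


def combination_distance(ranks, click_counts):
    """Hypothetical: mean of the two distances after scaling each so its maximum is 1."""
    r = ranking_distance(ranks)
    p = popularity_distance(click_counts)
    r_max, p_max = max(r), max(p)
    return [(a / r_max + b / p_max) / 2 for a, b in zip(r, p)]
```

For three sites with ranks `[1, 2, 3]` and click counts `[10, 5, 2]`, the popularity distances are `[1.0, 2.0, 5.0]`, and the combined distance of the top site is (1/3 + 1/5) / 2 = 4/15.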

Results

• AOL user-visit per website vector

– $[\mathrm{numWvisits}_1, \mathrm{numWvisits}_2, \ldots, \mathrm{numWvisits}_n]$

• Simulation ant-visit per food vector

– $[\mathrm{numAvisits}_1, \mathrm{numAvisits}_2, \ldots, \mathrm{numAvisits}_n]$

• Pearson correlation score (PCS) between the two vectors

• Bootstrapping → 95% confidence interval (CI)

– Resample $(\mathrm{AOL\_data}_i, \mathrm{simulation\_data}_i)$ pairs with replacement

• Permutation test → p-value

– Shuffle the AOL vector
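The three statistics above can be sketched as follows: the Pearson correlation between the AOL vector and the simulation vector, a percentile bootstrap CI built by resampling (AOL, simulation) pairs with replacement, and a permutation p-value built by shuffling the AOL vector. The resample counts, the percentile method, and the one-sided alternative (testing for positive correlation) are assumptions, as are the function names.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation score (PCS); NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(aol, sim, n_resamples=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the PCS, resampling (aol_i, sim_i) pairs with replacement."""
    rng = random.Random(seed)
    n = len(aol)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson([aol[i] for i in idx], [sim[i] for i in idx])
        if not math.isnan(r):  # skip resamples where one vector is constant
            scores.append(r)
    scores.sort()
    lo = scores[int((alpha / 2) * (len(scores) - 1))]
    hi = scores[int((1 - alpha / 2) * (len(scores) - 1))]
    return lo, hi


def permutation_p_value(aol, sim, n_permutations=10000, seed=0):
    """One-sided p-value: share of AOL-vector shuffles whose PCS reaches the observed PCS."""
    rng = random.Random(seed)
    observed = pearson(aol, sim)
    shuffled = list(aol)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if pearson(shuffled, sim) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

Shuffling breaks any pairing between websites and simulated food sources, so it approximates the "no correlation" null distribution. Resampling pairs keeps each pairing intact and instead measures how much the PCS would vary with a different sample of websites.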

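| Query | Distancing method | # of users | # of clicked links | # of distinct websites visited | Average PCS | Average 95% CI lower bound | Average 95% CI upper bound | Significant p-value? |
|---|---|---|---|---|---|---|---|---|
| vacation | ranking | 125 | 59 | 19 | 0.8182 | 0.3203 | 0.9364 | Yes |
| vacation | popularity | 125 | 59 | 19 | 0.1296 | -0.1768 | 0.6624 | |
| vacation | combination | 125 | 59 | 19 | 0.1488 | -0.3819 | 0.3920 | |
| rhino | ranking | 39 | 25 | 6 | 0.7631 | -0.4781 | 0.9854 | |
| rhino | popularity | 39 | 25 | 6 | 0.3906 | -0.2484 | 0.9919 | |
| rhino | combination | 39 | 25 | 6 | 0.2013 | -0.7389 | 0.9657 | |
| zebra | ranking | 53 | 61 | 12 | -0.1825 | -0.5426 | 0.4706 | |
| zebra | popularity | 53 | 61 | 12 | -0.0110 | -0.4667 | 0.5079 | |
| zebra | combination | 53 | 61 | 12 | 0.1558 | -0.3655 | 0.6754 | |
| lion | ranking | 52 | 39 | 9 | 0.6118 | -0.1797 | 0.9214 | |
| lion | popularity | 52 | 39 | 9 | 0.0699 | -0.5776 | 0.7296 | |
| lion | combination | 52 | 39 | 9 | 0.0304 | -0.6170 | 0.6609 | |
| football | ranking | 194 | 56 | 21 | 0.5358 | -0.0952 | 0.9301 | |
| football | popularity | 194 | 56 | 21 | 0.2693 | -0.1583 | 0.6722 | |
| football | combination | 194 | 56 | 21 | 0.4149 | -0.0223 | 0.7612 | |
| basketball | ranking | 220 | 74 | 16 | 0.7137 | -0.4225 | 0.9529 | |
| basketball | popularity | 220 | 74 | 16 | 0.2228 | -0.1755 | 0.6455 | |
| basketball | combination | 220 | 74 | 16 | 0.1415 | -0.3470 | 0.6661 | |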

Results

• Queries with significant p-values:

– “vacation” (ranking), “baseball” (ranking), “reebok” (ranking), “adidas” (ranking), “marbles” (ranking), “helicopter” (ranking), “car” (ranking), “potatoes” (ranking), “coffee” (ranking), “farming” (ranking), “rock” (popularity), “shirts” (ranking), “playstation” (ranking), “sega” (popularity), “tom cruise” (ranking), “mel gibson” (ranking), “burger king” (ranking), “chicago” (ranking), “los angeles” (ranking), and “paris” (ranking)

• Distancing methods without 95% CI overlap:

– Ranking, compared with the other methods:

• “potatoes”: no overlap with popularity or with combination

• “shirts”: no overlap with popularity

• “playstation”: no overlap with popularity

• “burger king”: no overlap with combination
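Checking whether two distancing methods' 95% CIs overlap reduces to a closed-interval intersection test. The sketch below is illustrative: the function names are hypothetical, and it compares the average CIs from the results table, whereas the comparisons above may have been made on other CI estimates.

```python
from itertools import combinations


def intervals_overlap(a, b):
    """True if closed intervals a = (start, end) and b = (start, end) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def non_overlapping_pairs(cis):
    """List the pairs of distancing methods whose 95% CIs do not overlap.

    cis: dict mapping a method name to its (CI start, CI end).
    """
    return [(m1, m2) for m1, m2 in combinations(sorted(cis), 2)
            if not intervals_overlap(cis[m1], cis[m2])]


# Average 95% CIs for the query "vacation", taken from the results table.
vacation = {
    "ranking": (0.3203, 0.9364),
    "popularity": (-0.1768, 0.6624),
    "combination": (-0.3819, 0.3920),
}
print(non_overlapping_pairs(vacation))  # → []
```

Even though only the ranking method gives “vacation” a significant p-value, its average CI still overlaps the combination CI (0.3203 ≤ 0.3920), which is consistent with “vacation” not appearing in the no-overlap list.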

Discussion

• Disadvantages of popularity and combination methods

– “vacation” example

• Possible reasons for 95% CI overlap

– Randomness

– Disregard of structure

• Significance of queries with low p-values

– Search engine matching

• Future directions

– Different Simulation

– Other similarity metrics

– Random beginnings

References

• Fu, W., & Pirolli, P. (2007). SNIF-ACT: A cognitive model of user navigation on the World Wide Web. Human-Computer Interaction, 22(4), 355–412.

• Joachims, T., Granka, L., Pan, B., Hembrooke, H., & Gay, G. (2005). Accurately interpreting clickthrough data as implicit feedback. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR).


[Figures: per-query results for the ranking distancing method: “reebok”, “chicago”, “playstation”, “vacation”, “baseball”, “adidas”, “farming”, “coffee”]