Hippocampus: Answering Memory Queries using Transactive Search
Michele @pirroh Catasta, Alberto Tonon, Djellel Eddine Difallah, Gianluca Demartini, Karl Aberer, and Philippe Cudre-Mauroux
May 10, 2015
“A transactive memory system is a mechanism through which groups collectively encode, store, and retrieve knowledge.”
–Wikipedia
“[…] a memory system that is more complex and potentially more effective than that of any of its individual constituents.”
–Wikipedia
A transactive search system discovers and aggregates the information stored in a transactive memory.
“[…] it is a property of a group. This unique quality of transactive memory brings with it the realization that we are speaking of a constructed system, a mode of group operation that is built up over time by its individual constituents.”
–Daniel M. Wegner
INFORMATION NEED: reconstruct the attendee list of the 86th Academy Awards (2014)
MISTAKES: not all the nominees attend the ceremony
PRECISION :-(
MISSING ENTRIES: what about all the people working “behind the scenes”?
RECALL :-(
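To make the two failure modes concrete, here is a minimal sketch of how precision and recall could be computed for a reconstructed list against the true attendee set. The function name and the toy names are placeholders introduced for illustration; they are not data from the talk.

```python
def precision_recall(retrieved, ground_truth):
    """Return (precision, recall) of a retrieved name set against the true attendee set."""
    retrieved, ground_truth = set(retrieved), set(ground_truth)
    true_positives = len(retrieved & ground_truth)
    # Precision drops when the list contains people who did not attend.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall drops when real attendees are missing from the list.
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical toy data: using the nominee list as a guess for the attendee list.
nominees = {"nominee_a", "nominee_b", "nominee_c", "nominee_d"}
attendees = {"nominee_a", "nominee_b", "nominee_c", "crew_x", "crew_y", "crew_z"}
p, r = precision_recall(nominees, attendees)
```

In this toy case one nominee stayed home (hurting precision) and three crew members are missing from the guess (hurting recall).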
FROM THE IDEA…
• for data that is stored in the memories of a group of people, the current query strategies are suboptimal
• we need a new form of human computation, different from standard crowdsourcing (i.e., no anonymous crowd)
Navigational: The immediate intent is to reach a particular Web site.
Informational: The intent is to acquire some information assumed to be present on one or more Web pages.
Transactional: The intent is to perform some Web-mediated activity.
Transactive: The intent is to acquire some information that can be reconstructed only by an [ephemeral] social network.
“A taxonomy of Web Search”
— A. Broder (2002)
…TO THE TESTING ENVIRONMENT
• We want to reconstruct the attendee lists of two Semantic Web conferences, ISWC 2012 and ISWC 2013
• We were given access to the ground truth but, in general, such lists are not publicly available
• Additional data sources: author lists (first author, last author, etc.), mentions in Online Social Networks
EXPERIMENT ARCHITECTURE
• tailored Web UI + results aggregator
• iterative reconstruction: every time a new person was mentioned, Hippocampus sent her an invitation to contribute to the attendees list
[Figure: Hippocampus architecture: discovery (Web UI + messaging), results aggregator, storage layer]
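The iterative reconstruction can be sketched as a snowball expansion over mentions. This is only an illustration under stated assumptions: `ask` is a hypothetical callback standing in for the Web UI + messaging step, everyone invited is assumed to answer, and the mention counter is an assumed input for an aggregator rather than a description of how Hippocampus actually scores names.

```python
from collections import deque


def reconstruct_attendees(seeds, ask):
    """Snowball expansion: invite every newly mentioned person exactly once.

    `ask(person)` is a hypothetical callback that returns the names
    that person remembers seeing at the event.
    """
    invited = set(seeds)
    queue = deque(seeds)
    mentions = {}  # name -> number of contributors who mentioned that name
    while queue:
        person = queue.popleft()
        for name in ask(person):
            mentions[name] = mentions.get(name, 0) + 1
            if name not in invited:
                # A new name was mentioned: invite that person to contribute too.
                invited.add(name)
                queue.append(name)
    return invited, mentions


# Toy memory: in a real deployment the answers come from people, not a dict.
memory = {"ann": ["bob", "cat"], "bob": ["ann", "dan"], "cat": [], "dan": ["cat"]}
found, mentions = reconstruct_attendees(["ann"], lambda p: memory.get(p, []))
```

Starting from a single seed, the loop stops once no contributor mentions anyone who has not already been invited.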
MACHINE LEARNING APPROACHES
• we collected the proceedings information and all the tweets with the conference hashtags
• we trained state-of-the-art classifiers with these features:
not possible without the ground truth!
ML + CROWDSOURCING APPROACHES
• Uncertain cases (precision): we asked the crowd to revise the low-confidence results of the ML classifier (e.g., people who didn’t attend the conference but tweeted about it)
• Unseen cases (recall): we asked the crowd to actively look for attendees not included in the authors list (e.g., organizers mentioned on the Web site)
the crowd has access only to public data on the Web!
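A minimal sketch of the "uncertain cases" idea: confident classifier output is accepted as-is, and only the low-confidence band is routed to the crowd. The thresholds, the function name, and the `crowd_says_attended` callback are assumptions for illustration, not values or interfaces from the experiments.

```python
def hybrid_uncertain(scores, crowd_says_attended, low=0.3, high=0.7):
    """Accept confident classifier output; send the uncertain band to the crowd.

    scores: name -> classifier confidence in [0, 1] that the person attended.
    crowd_says_attended: hypothetical callback standing in for a crowd task.
    low/high: illustrative thresholds, not values from the experiments.
    """
    accepted, asked = set(), []
    for name, score in scores.items():
        if score >= high:
            accepted.add(name)  # confident positive: no crowd task needed
        elif score > low:
            asked.append(name)  # uncertain: let the crowd revise the result
            if crowd_says_attended(name):
                accepted.add(name)
        # score <= low: confident negative, dropped without asking
    return accepted, asked


# Toy scores and crowd answers (placeholders).
scores = {"author_1": 0.9, "tweeter_2": 0.5, "tweeter_3": 0.55, "tweeter_4": 0.1}
crowd = {"tweeter_2": False, "tweeter_3": True}
accepted, asked = hybrid_uncertain(scores, lambda n: crowd[n])
```

Routing only the uncertain band keeps the number of crowd tasks small, but the crowd can still only confirm what is visible in public Web data.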
Transactive vs ML & Crowdsourcing ISWC 2013
• Authors and Tweets: baseline (exhaustive list of authors and twitterers)
• Machine Learning: SVM, M5P Regression
• Machine Learning + Crowdsourcing: Hybrid_(uncertain, unseen, uncertain_unseen)
[Figure: Transactive Search, attendees found over time. x-axis: Day (10-Dec to 16-Dec); y-axis: Retrieved Results (0 to 450); series: Retrieved Attendees 2012, Duplicate Names 2012, Retrieved Attendees 2013, Duplicate Names 2013]
Transactive Memory Graph: in green, two isolated “components” discovered by top contributors
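One way to look for isolated components in such a graph is a plain connected-components pass. The sketch below assumes that nodes are people and that an undirected edge links a contributor to each person they mentioned; the slide does not say how the components were actually identified, so treat this as an illustration only.

```python
def components(edges):
    """Group people into connected components of an undirected mention graph."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, result = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first traversal collecting everyone reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(neighbours[node] - seen)
        result.append(group)
    # Largest component first; smaller ones are the isolated groups.
    return sorted(result, key=len, reverse=True)


# Toy mention edges (placeholder names).
edges = [("ann", "bob"), ("bob", "cat"), ("dan", "eve"), ("fay", "gus")]
groups = components(edges)
```

Components other than the largest one can only be reached if some contributor happens to remember a member of that group.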
CONCLUSIONS
• for a specific class of queries, our Transactive Search performs up to 46% better than the best alternative approach (i.e., Machine Learning + Crowdsourcing)
• we will explore incentives for Hippocampus, as it is currently two orders of magnitude slower than the alternative approaches
• we reported some initial evidence that, as human memories fade with time, our approach works best with recent events