CCF ADL, Jul 31 2011. Susan Dumais, Microsoft Research, http://research.microsoft.com/~sdumais. Temporal Dynamics and Information Retrieval. In collaboration with: Eric Horvitz, Jaime Teevan, Eytan Adar, Jon Elsas, Ed Cutrell, Dan Liebling, Richard Hughes, Merrie Ringel Morris, Evgeniy Gabrilovich, Krysta Svore, Anagha Kulkarni
Change is pervasive in digital information systems:
- New documents appear all the time
- Document content changes over time
- Queries and query volume change over time
- What’s relevant to a query changes over time (e.g., U.S. Open 2011 in May vs. Sept)
- User interaction changes over time (e.g., tags, anchor text, social networks, query-click streams, etc.)
- Relations between entities change over time (e.g., President of the US in 2008 vs. 2004 vs. 2000)
Looking for: recent email from Fedor that contained a link to his new demo. Initiated from: Start menu. Query: from:Fedor
Looking for: the pdf of a SIGIR paper on context and ranking (not sure it used those words) that someone (don’t remember who) sent me about a month ago. Initiated from: Outlook. Query: SIGIR
Looking for: meeting invite for the last intern handoff. Initiated from: Start menu. Query: intern handoff kind:appointment
Looking for: C# program I wrote a long time ago. Initiated from: Explorer pane. Query: QCluster*.*
Lots of metadata… especially time
Stuff I’ve Seen
Stuff I’ve Seen: Findings
Evaluation:
- Internal to Microsoft, ~3000 users in 2004
- Methods: free-form feedback, questionnaires, usage patterns from log data, in situ experiments, lab studies for richer data
- Personal store characteristics: 5k–1500k items
Information needs:
- Desktop search != Web search
- Short queries (1.6 words)
- Few advanced operators in the initial query (~7%), but many advanced operators and query iteration in the UI
People know a lot about what they are looking for, and we need to provide a way to express it!
Susan's (Laptop) World
Type    N          Size
Web     3k         0.2 Gb
Files   28k        23.0 Gb
Mail    60k        2.2 Gb
Total   91k items  25.4 Gb
Index:  190 Mb (+1.5 Mb/week)
Stuff I’ve Seen: Findings
Information needs:
- People are important: 29% of queries involve names/aliases
- Date is the most common sort order, even with the “best-match” default
- Few searches for the “best” matching object; many other criteria (e.g., time, people, type), depending on task; need to support flexible access
- Abstraction is important: “useful” dates, people, pictures
- Age of items retrieved: today (5%), last week (21%), last month (47%); need to support episodic access to memory
Memory Landmarks
Importance of episodes in human memory:
- Memory is organized into episodes (Tulving, 1983)
- People-specific events serve as anchors (Smith et al., 1978)
- The time of an event is often recalled relative to other events, historical or autobiographical (Huttenlocher & Prohaska, 1997)
Identify and use landmarks to facilitate search and information management:
- Timeline interface, augmented with landmarks
- Learn Bayesian models to identify memorable events
- Extensions beyond search, e.g., LifeBrowser
Memory Landmarks: Search Results
- Memory landmarks: general (world, calendar) and personal (appointments, photos), linked to results by time
- Distribution of results over time
[Ringel et al., 2003]
Memory Landmarks: Findings
[Chart: search time in seconds (0–30 scale), with landmarks vs. without landmarks, for the “Dates Only” and “Landmarks + Dates” conditions]
Memory Landmarks: Learned Models of Memorability
[Horvitz et al., 2004]
LifeBrowser [Horvitz & Koch, 2010]
Data sources: images & videos, appointments & events, desktop & search activity, whiteboard capture, locations
LifeBrowser: Learned Models of Selective Memory
News is a stream of information with evolving events, but it’s hard to consume it as such.
Personalized news using information novelty:
- Identify clusters of related articles
- Characterize what a user knows about an event
- Compute the novelty of new articles, relative to this
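The novelty step above can be pictured with a simple bag-of-words model: a new article is novel to the extent that it is dissimilar from everything the user has already read about the event. This is an illustrative sketch, not the system's actual implementation; the tokenization and cosine measure are assumptions.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def novelty(new_article: str, seen_articles: list) -> float:
    """Novelty = 1 - max similarity to any article the user has already read.

    A score near 1 means the article adds mostly new material;
    near 0 means it largely repeats what the user has seen."""
    new_vec = Counter(new_article.lower().split())
    if not seen_articles:
        return 1.0
    sims = [cosine(new_vec, Counter(s.lower().split())) for s in seen_articles]
    return 1.0 - max(sims)
```

Articles scoring above some novelty threshold would be surfaced first in a personalized feed; duplicates of already-read coverage would be demoted.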
Temporal IR:
- Query frequency over time
- Retrieval models that incorporate time
Ranking algorithms typically look only at a single snapshot in time, but both content and user interaction with the content change over time:
- Model content change on a page
- Model user interactions
- Tasks evolve over time
Query Dynamics
Queries sometimes mention time, but often don’t:
- Explicit time (e.g., World Cup Soccer 2011)
- Explicit news (e.g., earthquake news)
- Implicit time (e.g., Harry Potter reviews; implicit “now”)
Queries are not uniformly distributed over time; they are often triggered by events in the world.
Using temporal query patterns to:
- Cluster similar queries
- Identify events and find related news
Query Dynamics: Modeling query frequency over time [Vlachos et al., SIGMOD 2004]
Example query: cinema
- Discrete Fourier Transform: keep the best k components (vs. the first k); the best k significantly reduce reconstruction error
- Burst detection: bursts as deviations from a moving average
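A minimal sketch of the two ideas above, using NumPy: keep the k largest-magnitude DFT components (rather than the first k low frequencies) when compressing a query's time series, and flag bursts as large deviations from a trailing moving average. The window size and threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def best_k_reconstruction(series, k):
    """Reconstruct a time series from the k DFT components with the
    largest magnitude (rather than the first k low frequencies)."""
    series = np.asarray(series, dtype=float)
    coeffs = np.fft.rfft(series)
    keep = np.argsort(np.abs(coeffs))[-k:]   # indices of the top-k components
    pruned = np.zeros_like(coeffs)
    pruned[keep] = coeffs[keep]
    return np.fft.irfft(pruned, n=len(series))

def detect_bursts(series, window=7, threshold=2.0):
    """Flag time steps whose value exceeds the trailing moving average
    by `threshold` standard deviations."""
    series = np.asarray(series, dtype=float)
    bursts = []
    for t in range(window, len(series)):
        past = series[t - window:t]
        mu, sigma = past.mean(), past.std()
        if sigma > 0 and series[t] > mu + threshold * sigma:
            bursts.append(t)
    return bursts
```

For a strongly periodic query like "cinema" (weekly spikes), a handful of best-k components capture most of the signal, which is why best-k beats first-k for compression and similarity search.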
Query Dynamics: Types of query popularity patterns [Kulkarni et al., WSDM 2011]
- Number of spikes (0, 1, multiple)
- Periodic (yes, no)
- Shape of rise and fall (wedge, sail, castle)
- Trend (flat, up, down)
Relate changes in query popularity and content to changes in user intent (i.e., what is relevant to the query) … more on this later.
Using Query Dynamics to Find Similar Queries [Chien & Immorlica, WWW 2005]
- Model query patterns using empirical query frequency, normalized by total queries per day
- Identify “similar” queries using the correlation coefficient between the normalized time series
- Example queries: movies, scott peterson, weather report
- A nice use of time to identify semantic similarity between queries/entities, but not predictive
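The similarity measure described above can be sketched as a Pearson correlation between two queries' daily frequencies, each normalized by total query volume that day. Function names here are illustrative, not from the paper.

```python
import numpy as np

def normalize(daily_counts, total_per_day):
    """Normalize a query's daily frequency by total query volume that day."""
    return np.asarray(daily_counts, dtype=float) / np.asarray(total_per_day, dtype=float)

def temporal_similarity(q1_counts, q2_counts, total_per_day):
    """Pearson correlation between two queries' normalized time series.

    Values near 1 suggest the queries spike together (e.g., they concern
    the same event or entity); values near 0 suggest unrelated demand."""
    a = normalize(q1_counts, total_per_day)
    b = normalize(q2_counts, total_per_day)
    return float(np.corrcoef(a, b)[0, 1])
```

Queries whose time series correlate strongly can then be clustered, even when they share no terms, which is what makes this a temporal rather than textual notion of similarity.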
Using Query Dynamics to Identify “News” Queries [Diaz, 2009]
Many queries to Web search engines are motivated by events in the world. Should you show just Web results, or provide an integrated view of news and Web?
Learn a model to predict the “newsworthiness” of a query (i.e., will a user click on news results?):
- Is the query part of a burst? [content consumption]
- Are the top-ranked news results very recent? [content production]
- Improve prediction using ongoing click data for this and related queries
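One way to picture the prediction step: combine the three signals above in a logistic model. The feature weights below are invented for illustration; the actual model and feature set in this line of work differ.

```python
import math

def newsworthiness(in_burst: bool, top_news_age_hours: float,
                   news_click_rate: float) -> float:
    """Toy logistic score for 'should news results be shown for this query?'

    Weights are hand-set for illustration, not learned from data."""
    x = (1.5 * (1.0 if in_burst else 0.0)          # query volume is bursting
         + 1.2 * math.exp(-top_news_age_hours / 6.0)  # fresher top news, higher score
         + 2.0 * news_click_rate                   # past clicks on news for this query
         - 2.5)                                    # bias term
    return 1.0 / (1.0 + math.exp(-x))
```

A bursting query with hour-old news coverage and a history of news clicks scores high; a stable query with stale coverage scores low, so the engine would show plain Web results.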
Temporal Retrieval Models 1
Current retrieval algorithms look only at a single snapshot of a page, but Web page content changes over time. Can we leverage this to improve retrieval?
- Pages have different rates of change, suggesting different document priors (change-based vs. link-based)
- Terms have different longevity (staying power): some are always on the page, some are transient
- Language modeling approach to ranking
P(D|Q) ∝ P(D) · P(Q|D)   [P(D): change prior; P(Q|D): term longevity]
[Elsas et al., WSDM 2010]
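A toy rendering of the ranking formula P(D|Q) ∝ P(D) · P(Q|D): the document prior comes from the page's change behavior, and each query term's probability is a mixture of term distributions estimated over long-, medium-, and short-horizon views of the page. The mixture weights and smoothing constant are assumptions, not the paper's estimates.

```python
import math

def temporal_lm_score(query_terms, doc, change_prior, weights=(0.6, 0.3, 0.1)):
    """log P(D|Q) up to a constant: log P(D) + sum over q of log P(q|D).

    `doc` maps a horizon name to a term-probability dict, e.g.
    {"long": {...}, "medium": {...}, "short": {...}}, where "long"
    holds terms with high staying power. Weights are illustrative."""
    horizons = ("long", "medium", "short")
    score = math.log(change_prior)
    for q in query_terms:
        # Mixture over staying-power classes, with tiny smoothing for unseen terms.
        p = sum(w * doc[h].get(q, 0.0) for w, h in zip(weights, horizons)) + 1e-9
        score += math.log(p)
    return score
```

A page that carries the query terms in its stable ("long") content outscores one where they appear only transiently, which is the intuition behind weighting by staying power.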
Relevance and Page Change
Page change is related to relevance judgments:
- Human relevance judgments on a 5-point scale: Perfect / Excellent / Good / Fair / Bad
- Rate of change: ~60% for Perfect pages vs. ~30% for Bad pages
- Use change rate as a document prior (vs. priors based on link structure, like PageRank)
- Use shingleprints to measure change
P(D|Q) ∝ P(D) · P(Q|D)   [P(D): change prior]
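The shingle-based change measurement mentioned above can be sketched as follows: represent each page version as a set of w-word shingles and measure change as one minus their Jaccard overlap. Real shingleprint systems hash and sample shingles for efficiency; this sketch skips that, and the window size is an assumption.

```python
def shingles(text: str, w: int = 4) -> set:
    """Set of w-word shingles (contiguous word windows) of a page."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(max(1, len(words) - w + 1))}

def change_rate(old_version: str, new_version: str, w: int = 4) -> float:
    """1 - Jaccard overlap of shingle sets: 0 = unchanged, 1 = fully rewritten."""
    a, b = shingles(old_version, w), shingles(new_version, w)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0
```

Averaging this rate across crawl snapshots gives the per-page change statistic that feeds the change-based document prior.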
Relevance and Term Change
Term patterns vary over time. Represent a document as a mixture of terms with different “staying power”: long, medium, short.
Evaluation on navigational queries:
- 2k queries identified with a “Perfect” judgment
- Assume these relevance judgments are consistent over time
Experimental Results
[Chart: retrieval effectiveness for four conditions: Baseline Static Model, Dynamic Model, Change Prior, and Dynamic Model + Change Prior]
Temporal Retrieval Models 2
Initial evaluation used navigational queries and assumed relevance is “static” over time. But relevance often changes over time:
- E.g., Stanley Cup in 2011 vs. in 2010
- E.g., US Open 2011 in May (golf) vs. in Sept (tennis)
- E.g., March Madness 2011: before the event, schedule and tickets (e.g., stubhub); during the event, real-time scores (e.g., espn, cbssports); after the event, general sites (e.g., wikipedia, ncaa)

Detecting research missions:
- Many features: textual (similarity of q1, q2); session-based (queries, clicks, queries since last click, etc.); time-based (time between q1 and q2, total session time, etc.)
- Trigger Yahoo! Scratch Pad if a research mission is detected
[Donato et al., WWW 2010] [Jones & Klinkner, CIKM 2008]
Cross-Session Tasks [Kotov et al., SIGIR 2011]
Many tasks extend across sessions, e.g., medical diagnosis and treatment, event planning, how-to advice, shopping research, academic research, etc.
- 10-15% of tasks continue across multiple sessions
- 20-25% of queries are from multi-session tasks
Develop methods to support task resumption over time:
- Same task: find (previous) related queries/clicks
- Task resumption: predict whether the user will resume the task
Cross-Session Tasks
Approach: classification (logistic regression, MART), with query, pair-wise, session-based, and history-based features.
Results: [charts for the Same Task and Task Continuation prediction tasks]
Develop support for task continuation.
[Kotov et al., SIGIR 2011]
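A toy version of the classification setup: extract pair-wise features for a (past query, new query) pair and score them with a hand-weighted logistic function. The feature names and weights below are invented for illustration; the paper learns weights with logistic regression or MART over much richer query, session, and history features.

```python
import math

def pairwise_features(q1: str, q2: str, minutes_apart: float) -> dict:
    """Illustrative query-pair features (textual and time-based)."""
    w1, w2 = set(q1.lower().split()), set(q2.lower().split())
    union = w1 | w2
    return {
        "term_jaccard": len(w1 & w2) / len(union) if union else 0.0,  # textual overlap
        "shares_term": 1.0 if (w1 & w2) else 0.0,
        "log_minutes_apart": math.log1p(minutes_apart),  # time gap between queries
    }

def same_task_score(feats: dict) -> float:
    """Toy logistic scorer with hand-set (not learned) weights."""
    weights = {"term_jaccard": 3.0, "shares_term": 1.0, "log_minutes_apart": -0.3}
    x = sum(weights[k] * v for k, v in feats.items()) - 1.0  # -1.0 is a bias term
    return 1.0 / (1.0 + math.exp(-x))
```

Pairs scoring above a threshold are grouped into the same task; the same machinery, fed history-based features, can then predict whether the user will resume that task in a later session.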
Other Examples of Dynamics and Information Systems
- Document dynamics, for crawling and indexing: Adar et al. (2009); Cho & Garcia-Molina (2000); Fetterly et al. (2003)
- Query dynamics: Kulkarni et al. (2011); Jones & Diaz (2004); Diaz (2009); Kotov et al. (2010)
- Temporal retrieval models: Elsas & Dumais (2010); Li & Croft (2004); Efron (2010); Aji et al. (2010)
- Extraction of temporal entities within documents
- Protocol extensions for retrieving versions over time, e.g., Memento (Van de Sompel et al., 2010)
References
Dumais et al. (SIGIR 2003); Ringel et al. (Interact 2003); Horvitz & Koch (2010); Gabrilovich et al. (WWW 2004); Diaz (2009).
Document dynamics: Adar et al. (WSDM 2009); Cho & Garcia-Molina (VLDB 2000); Fetterly et al. (WWW 2003).
User interaction: Adar et al. (CHI 2009); Teevan et al. (SIGIR 2007); Tyler et al. (WSDM 2010); Teevan et al. (WSDM 2011); Adar et al. (CHI 2010); Teevan et al. (UIST 2009); Teevan et al. (CHI 2010).
Query dynamics: Kulkarni et al. (WSDM 2011); Jones & Diaz (TOIS 2007); Vlachos et al. (SIGMOD 2004); Chien & Immorlica (WWW 2005).
Temporal retrieval models: Elsas & Dumais (WSDM 2010); Radinsky (in prep); Li & Croft (CIKM 2003); Efron & Golovchinsky (SIGIR 2011).
Tasks over time: White et al. (CIKM 2010); Donato et al. (WWW 2010); Kotov et al. (SIGIR 2011); Jones & Klinkner (CIKM 2008).