Search, Interrupted: Understanding and Predicting Search Task Continuation
Eugene Agichtein* Emory University Atlanta, GA, USA [email protected]
Ryen W. White, Susan T. Dumais, and Paul N. Bennett Microsoft Research Redmond, WA, USA {ryenw, sdumais, pauben}@microsoft.com
ABSTRACT
Many important search tasks require multiple search sessions to complete. Tasks such as travel planning, large purchases, or job searches can span hours, days, or even weeks. Inevitably, life interferes, requiring the searcher either to recover the “state” of the search manually (most common), or plan for interruption in advance (unlikely). The goal of this work is to better understand, characterize, and automatically detect search tasks that will be continued in the near future. To this end, we analyze a query log from the Bing Web search engine to identify the types of intents, topics, and search behavior patterns associated with long-running tasks that are likely to be continued. Using our insights, we develop an effective prediction algorithm that significantly outperforms both the previous state-of-the-art method, and even the ability of human judges, to predict future task continuation. Potential applications of our techniques would allow a search engine to pre-emptively “save state” for a searcher (e.g., by caching search results), perform more targeted personalization, and otherwise better support the searcher experience for interrupted search tasks.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – search process; selection process
Keywords
Search session analysis; Search behavior; Personalization.
1. INTRODUCTION
As Web search becomes increasingly important for planning and decision making, the complexity and scope of search tasks performed on search engines are increasing. Search engines are now often used for tasks such as travel planning, job hunting, or real estate searching.
However, these tasks require significantly more effort and time to complete [10][21][24][25], potentially spanning days, weeks, or even months. While existing commercial Web search engines such as Bing and Google now provide tools to help users maintain and manage their search histories, the support they provide is not sufficient and the tools are not specifically designed to allow searchers to resume tasks that may have been interrupted.
A challenge for search engines is to detect when a searcher is performing a long-running search task and predict whether they will continue it in the future. To this end, we analyze a query log from Bing to understand the types of intents, motivations, topics, and search behaviors associated with long-running tasks that are likely to be continued. Specifically, we try to understand search task continuation by analyzing tasks that were and were not continued by over a thousand Web searchers.
For example, consider the task of planning a wedding. The searcher might begin by checking recommended venues and their availabilities. However, at that point the task could be interrupted, as it requires checking dates and venues with the immediate family. When the task is continued the next day, the searcher has to restart from the beginning, unless the user planned for this event and manually saved the most promising intermediate results. Indeed, there has been previous work on system support that lets users explicitly record promising content [10][27]. However, a perfect search engine could save the user the trouble if it could reliably detect that a suspended search session is likely to be continued at a later time.
While previous studies have considered long-running tasks spanning multiple sessions (e.g., [10][21][24][25]), we dive deeper into the problem of task continuation to analyze the intent, motivation, and topics of these tasks.
The more extensive analysis we perform allows for a fuller understanding of which tasks are most commonly resumed, in turn resulting in more accurate task continuation prediction. Potential applications include pre-emptively “saving state” for a searcher (e.g., by caching search results), more targeted personalization, and otherwise better supporting the searcher experience for long-running searches.
More formally, our problem is predicting task continuation:
Given an active search task that has been suspended, predict whether the searcher will continue the task in the near future (e.g., within the next five days).
This problem is challenging, since it requires a search engine to make predictions about the kinds of tasks that tend to be continued, which intuitively would require substantial knowledge about the world. Yet, this work presents techniques to make these predictions automatically as well as, and often better than, experienced human annotators. Our contributions are threefold:
A large-scale characterization of the intents, motivations, and topics associated with long-running search tasks (Section 3).
Novel features to effectively capture these characteristics for automated prediction of task continuation (Section 4).
Techniques for accurate prediction of continuation that outperform both a state-of-the-art automatic baseline and human predictions, coupled with the analysis of the most effective features used by the predictive algorithms (Section 5).
Next, we present related work to put our contributions in context.
2. RELATED WORK
Prior research that relates to what we describe in this paper falls into four main areas: (i) behavioral analysis and modeling of search, (ii) understanding search intent, (iii) analysis of cross-session tasks, and (iv) task switching and interruptions.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR’12, August 12–16, 2012, Portland, Oregon, USA. Copyright 2012 ACM 978-1-4503-1472-5/12/08...$15.00. * Work done while visiting Microsoft Research.
Search behavior has been studied intensely in recent years. Log
data from search engines have proven to be extremely valuable in
studying how people search in naturalistic settings across a wide
variety of search intents. Most previous work has focused on
search behavior analysis and prediction within a single search
session [1][7][42]. Related queries within a session can be part of
the same search goal [16][19], a construct that attempts to represent
the more abstract concept of search intent given only observable events.
However, there is growing interest in using long-term search log
data to build models of users’ interests [39] and improve search
result ranking [34].
An important part of representing search intent is understanding
the various types of search tasks and the different motivations that
searchers may have for pursuing their information goals. Earlier
work on understanding search behavior focused on classifying
queries into high-level search goals, such as informational,
navigational and transactional [6][8][32]. Kellar et al. [20] con-
ducted a field study in which they logged detailed Web usage and
asked participants to provide task categorizations of their Web
usage based on the following categories: fact finding, information
gathering, browsing, and transactions. They showed that search
behavior differed by task type. In particular, information gathering
tasks were the most complex: participants spent more time
completing them, viewed more pages, and used Web browser
functions most heavily during them. Li and Belkin [23] review
and discuss previously-proposed task classifications and develop a
faceted classification that can be used to describe searchers’ work
tasks and information search tasks. They identify essential facets
and categorize them into generic task facets (e.g., source, product,
and goal) and common task attributes (e.g., task characteristics
and user perceptions). Rather than characterizing the nature of the
search intent, Radlinski et al. [30] model search intent from
queries and clicks in a way that could be directly consumed by search
engines. Goals and related constructs (such as search intent) have
also been widely studied in psychological research. Austin and
Vancouver [4] review the theoretical development of the structure
and properties of goals, goal establishment and striving processes,
and goal-content taxonomies, which we use to motivate the
selection of task dimensions to analyze. To our knowledge, our
research is the first attempt to bring theories of motivation from
psychology to bear on search intent analysis.
In this paper we focus on tasks extending across multiple sessions.
Search behavior can be analyzed over time to identify queries that
express the same underlying information need. Previous work has
tried to automatically identify queries on the same task. Mei et al.
[26] proposed a framework to study sequences of search activities
and focused on simple prediction and classification tasks, ranging
from predicting whether the next click will be on an algorithmic
result to segmenting the query stream into goals and missions.
Teevan et al. [37] showed, via query log analysis, that nearly 40%
of queries were attempts to re-find previously encountered results.
Aula et al. [3] studied the search and information re-access
strategies of experienced Web users using a survey. They found
that people often have difficulty remembering the queries they
used originally to discover information of interest. MacKay and
Watters [25] explored a variety of Web-based information seeking
tasks and found that almost 60% of complex information
gathering tasks continued across sessions. Liu and Belkin [24]
examined the structure (parallel or dependent) of tasks that extend
across different search sessions. Jones and Klinkner [19] proposed
methods to partition a query stream into research missions and
goals, where each mission corresponds to a set of related
information needs and may include multiple search goals. Morris
et al. [27] developed SearchBar, a system that proactively and
persistently stores query histories, browsing histories, and users’
notes and ratings. SearchBar supports multi-session investigations
by assisting with task context resumption and information
re-finding. Donato et al. [10] developed SearchPad, a system that
automatically identifies research missions and presents a search
workspace comprising previous queries and results related to the
mission. SearchPad uses measures of topic coherence between
pairs of consecutive queries and user engagement to identify such
research missions. This work was further extended by Aiello et al.
[2] to group queries into mission-coherent clusters based on
searcher behavior. However, none of the research described so far
specifically addressed the important challenge of predicting
search task continuation.
The most similar research to this paper is that of Kotov et al. [21].
In that paper, the authors describe research on modeling cross-
session information needs, and address the challenge of
identifying all previous queries in a user’s search history on the
same task as the current query, and predicting whether a user will
return to the task in future sessions. Kotov et al. developed
classifiers for these two tasks and through evaluation using
labeled data from search logs showed that their classifiers can
perform both tasks effectively. We use these classifiers as a
baseline for some of the analysis presented later in the paper.
Also relevant to this work is previous research on task switching
and interruptions. Multi-tasking and external factors such as
interruptions have been previously associated with prolonged
search tasks. Spink [35] studied the multi-tasking behavior of a
single searcher in a public library using diary, observation and
interviews and found that switching between tasks was common.
On the basis of that study, she then developed a model of
information multi-tasking and information task switching. Czerwinski
et al. [9] present the findings of a week-long diary study of task
interleaving amidst interruptions, following eleven information
workers in a non-search setting. They show that task complexity,
task duration, length of absence, interruption count, and task type
influence the perceived difficulty of switching back to tasks, with
participants reporting that it was most difficult to recommence
complex tasks. The features we devise to represent tasks adapt and
operationalize these ideas.
The research presented in this paper extends previous work in a
number of ways. First, we perform a detailed descriptive analysis
of the cross-session search tasks that maps task intents and
motivations derived from the information science and psychology
literatures to evidence of task continuation mined from search
logs and labeled by trained human annotators. Second, we
propose new features to model characteristics of cross-session
search tasks, focusing on future task continuation, using features
of search behavior mined from annotated log data. Third, we show
that these features can improve continuation modeling and
prediction over a previously-reported state-of-the-art baseline, and
even over experienced human annotators attempting to perform
the same prediction task.
3. UNDERSTANDING CROSS-SESSION TASKS: DESCRIPTIVE ANALYSIS
This section describes the data collection (Section 3.1) and the human
annotation of the dimensions hypothesized to be related to
search task continuation (Section 3.2), and presents an analysis of
task characteristics (Sections 3.4-3.5) based on both manual
annotation of the tasks and an extended set of search log data.
3.1 Data Collection
The data were gathered from the Microsoft Bing commercial Web
search engine by sampling a set of sessions over a one-week
period for more than 1,000 users. Similar to [21], we study what have
previously been defined as “early dominant” tasks, identified for
each user. An early dominant task is defined as having at least two
distinct queries issued within a two-day period at the beginning of
the week of interest. Some of these tasks are continued later
during the week, while others are not. The data that we used for our
study are summarized in Table 1.
Users and tasks (1 early-dominant task per user)    1,191
Unique queries                                      28,474
Active period                                       Last week of February 2010
Prior history                                       2 weeks prior
Continued tasks                                     683 (57%)
Table 1. Search log data used in this study.
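The early dominant task definition above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the `(timestamp, query)` input format, the lowercase normalization used to decide which queries are "distinct", and the rule that any query after the two-day window counts as continuation are all assumptions made here for the example.

```python
from datetime import datetime, timedelta

def is_early_dominant(query_events, week_start, window_days=2, min_distinct=2):
    """Return True if the task has at least `min_distinct` distinct queries
    in the first `window_days` days of the week of interest.

    query_events: list of (timestamp, query_text) pairs for ONE task of ONE
    user (hypothetical input format; task grouping is assumed to be given).
    """
    window_end = week_start + timedelta(days=window_days)
    # Assumption: queries are compared after trimming and lowercasing.
    early_queries = {
        text.strip().lower()
        for ts, text in query_events
        if week_start <= ts < window_end
    }
    return len(early_queries) >= min_distinct

def is_continued(query_events, week_start, window_days=2, week_days=7):
    """Assumed continuation rule: any query on the task after the early
    window but still within the week of interest."""
    window_end = week_start + timedelta(days=window_days)
    week_end = week_start + timedelta(days=week_days)
    return any(window_end <= ts < week_end for ts, _ in query_events)

# Made-up example data, for illustration only.
week_start = datetime(2010, 2, 22)
events = [
    (datetime(2010, 2, 22, 9, 0), "wedding venues"),
    (datetime(2010, 2, 23, 10, 0), "wedding venues atlanta"),
    (datetime(2010, 2, 25, 8, 0), "wedding caterers"),
]
print(is_early_dominant(events, week_start), is_continued(events, week_start))
```

Keeping the window length and the minimum number of distinct queries as parameters makes it easy to check how sensitive the sample is to those two thresholds.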
Additionally, we augmented these data with up to two weeks of
prior search history for each user in the sample, drawn from the
two weeks immediately before the week of interest. This history
contained the queries issued and URLs visited, grouped into search
sessions using a 30-minute inactivity timeout [40]. This allowed us
to study the potential for utilizing additional profile information
to predict task continuation.
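Segmenting a user's activity into sessions with an inactivity timeout can be sketched as follows. The function name and the input format (a flat list of event timestamps for one user) are assumptions for illustration, as is the boundary choice that a gap of exactly 30 minutes stays in the same session; the source only states that a 30-minute inactivity timeout was used.

```python
from datetime import datetime, timedelta

def split_into_sessions(timestamps, timeout=timedelta(minutes=30)):
    """Group one user's event timestamps into sessions.

    A new session starts whenever the gap since the previous event
    exceeds `timeout` (assumed boundary rule: gap > timeout).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)   # still within the current session
        else:
            sessions.append([ts])     # inactivity gap: open a new session
    return sessions

# Made-up timestamps, for illustration only.
t0 = datetime(2010, 2, 22, 9, 0)
stamps = [
    t0,
    t0 + timedelta(minutes=10),
    t0 + timedelta(minutes=45),
    t0 + timedelta(minutes=50),
    t0 + timedelta(hours=3),
]
print(len(split_into_sessions(stamps)))
```

Sorting first means the input does not need to arrive in chronological order, at the cost of an O(n log n) pass per user.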
3.2 Data Annotation
We annotated the characteristics of early dominant search tasks
(defined above) according to a range of dimensions derived from
information and cognitive science literatures, following the
procedure in Section 3.3. Our goals were: (1) to analyze the relationship
of task characteristics to task continuation and (2) to learn to
automatically identify these characteristics for better search
continuation modeling and prediction. In particular, we wished to
investigate how task intent and motivation, as well as other contextual
factors such as task urgency, relate to the likelihood of continuing
a search task (within the one-week horizon that we used in our
study). In the remainder of this subsection we define the
dimensions on which we annotated tasks.
Intent Type: The type of the task, derived from previous studies
in the information science literature (e.g., [20][23]). The
hypothesis is that some task types, such as information gathering or
transactions, are associated with task continuation. The specific intent
types chosen for labeling were:
Fact finding (focused): Find specific piece(s) of information (e.g., a query such as “mc gilvery oil wolsey”).
Information gathering (exploration): Find information on a topic rather than for a specific fact (e.g., “english comedy”).
Undirected browsing: Explore a site or the Web without an obvious goal (e.g., “portland craigs list”).
Transaction: Accomplish a task or perform a transaction online (e.g., “pay discover card bill”).
Communication (social): Read or interact in online social sites such as forums.
Information maintenance or update: Monitor information on a running topic and possibly update a Web resource.
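One way to hold the intent-type labels listed above, and to relate them to task continuation, is sketched below. The enum member names, the `(intent, was_continued)` input format, and the helper function are hypothetical, and the sample data is invented purely to show the computation; none of it reflects results from the study.

```python
from collections import defaultdict
from enum import Enum

class IntentType(Enum):
    """The six intent types used for labeling (member names are assumed)."""
    FACT_FINDING = "fact finding"
    INFORMATION_GATHERING = "information gathering"
    UNDIRECTED_BROWSING = "undirected browsing"
    TRANSACTION = "transaction"
    COMMUNICATION = "communication"
    INFORMATION_MAINTENANCE = "information maintenance or update"

def continuation_rate_by_intent(annotated_tasks):
    """Fraction of continued tasks per intent type.

    annotated_tasks: iterable of (IntentType, was_continued) pairs
    (hypothetical format for annotated early dominant tasks).
    """
    totals = defaultdict(int)
    continued = defaultdict(int)
    for intent, was_continued in annotated_tasks:
        totals[intent] += 1
        continued[intent] += int(was_continued)
    return {intent: continued[intent] / totals[intent] for intent in totals}

# Invented toy annotations, NOT data from the paper.
toy = [
    (IntentType.INFORMATION_GATHERING, True),
    (IntentType.INFORMATION_GATHERING, True),
    (IntentType.INFORMATION_GATHERING, False),
    (IntentType.FACT_FINDING, False),
]
for intent, rate in continuation_rate_by_intent(toy).items():
    print(f"{intent.value}: {rate:.2f}")
```

Intent types with no annotated tasks are simply absent from the result, which avoids a division by zero.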
Motivation: The cognitive or affective motivation inferred to be
behind the task, derived and simplified from cognitive science and
psychology literature [4]. Our intuition was that some motivations
are more likely to be associated with task continuation than others.
The motivations selected for labeling were:
Affective: Based on emotion or feeling, with sub-categories of