Incorporating Clicks, Attention and Satisfaction into a SERP Evaluation Model
Aleksandr Chuklin (Google Research Europe; University of Amsterdam)
Maarten de Rijke (University of Amsterdam)
Outline: Background, Motivation, Model & Metric, Experimental Setup, Results, Summary
Background
Search Engine Result Page (SERP) Evaluation
Main problem
Combining relevance of individual SERP items ($R_k$) into a whole-page metric.
Examples
Precision at N:
$$P@N = \frac{1}{N} \sum_{k=1}^{N} R_k$$
Discounted Cumulative Gain (DCG):
$$DCG@N = \sum_{k=1}^{N} \frac{1}{\log_2(1 + k)} \cdot R_k$$
Model-Based Metrics (Chuklin et al. 2013):
$$Utility@N = \sum_{k=1}^{N} P(C_k = 1) \cdot R_k$$
[Figure: example ranked list of documents: document 3, document 4, document 1, document 2, document 5]
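The three example metrics can be sketched in a few lines of Python. This is only an illustration: the function names, the relevance labels and the click probabilities are made up for the example and are not taken from the slides.

```python
import math
from typing import Sequence


def precision_at_n(rels: Sequence[float], n: int) -> float:
    """P@N: mean relevance of the top-n items."""
    return sum(rels[:n]) / n


def dcg_at_n(rels: Sequence[float], n: int) -> float:
    """DCG@N: relevance discounted by 1 / log2(1 + k), with k starting at 1."""
    return sum(r / math.log2(1 + k) for k, r in enumerate(rels[:n], start=1))


def utility_at_n(rels: Sequence[float], click_probs: Sequence[float], n: int) -> float:
    """Utility@N: relevance weighted by the click probability P(C_k = 1)."""
    return sum(p * r for p, r in zip(click_probs[:n], rels[:n]))


# Hypothetical binary relevance labels and click probabilities for a 3-item SERP.
rels = [1, 0, 1]
click_probs = [0.5, 0.3, 0.2]
print(precision_at_n(rels, 3))
print(dcg_at_n(rels, 3))
print(utility_at_n(rels, click_probs, 3))
```

The only difference between the three functions is the per-position weight: a constant 1/N, a logarithmic discount, or a click probability supplied by a click model.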
Main Goal of This Paper
Better measure for SERP utility
Namely, improve this (Chuklin et al. 2013):
$$\sum_{k=1}^{N} P(C_k = 1) \cdot R_k$$
Motivation
Complex Heterogeneous SERPs
Motivation 1: Non-Trivial Attention Patterns
[Figure: page excerpt from Diaz et al. (2013). It shows a module-level representation of mouse-tracking data on a SERP for the query "hello world", with panels (a) Presentation, (b) Arrangement and (c) Mouse Data, and the linear scan model next to its relaxation.]
Image credits: F. Diaz, R.W. White, G. Buscher, and D. Liebling. Robust models of mouse movement on dynamic web search results pages. In CIKM, 2013. ACM Press.
Motivation 2: Satisfaction Without Clicks
High direct page utility (measured by DCG or ERR) leads to a higher abandonment rate (SERPs with no clicks)
[Figure: plot; axis label: direct page utility]
Image credits: from A. Chuklin and P. Serdyukov. Good abandonments in factoid queries. In WWW, 2012.
Problems of Existing Models and Evaluation Metrics
existing models mostly do not model non-trivial user attention patterns
existing models do not use explicit user satisfaction data
Model & Metric
Clicks + Attention + Satisfaction (CAS) Model
[Figure: graphical model of CAS with nodes SERP, $\vec{\varphi}_k$, $E_k$ and $C_k$ for each SERP item k, satisfaction $S$, and Utility]
Click Model
Examination assumption: a click happens only when an item was examined and attractive:
$$P(C_k = 1) = P(E_k = 1) \cdot P(C_k = 1 \mid E_k = 1)$$
N.B. Here we assume that $P(C_k = 1 \mid E_k = 1) = \alpha(\vec{R}_k)$, where $\vec{R}_k$ comes from the raters and $\alpha$ is a logistic function.
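A minimal Python sketch of the examination assumption follows. The slides only say that α is a logistic function of the rater labels. The linear score inside it, the parameter names `weights` and `bias`, and the example histogram are assumptions made for illustration.

```python
import math
from typing import Sequence


def logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def attractiveness(rating_hist: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """alpha(R_k): P(C_k = 1 | E_k = 1), assumed here to be logistic in a linear score."""
    return logistic(bias + sum(w * r for w, r in zip(weights, rating_hist)))


def click_probability(p_exam: float, rating_hist: Sequence[float],
                      weights: Sequence[float], bias: float) -> float:
    """P(C_k = 1) = P(E_k = 1) * P(C_k = 1 | E_k = 1)."""
    return p_exam * attractiveness(rating_hist, weights, bias)


# Hypothetical item: examined with probability 0.8, all rater votes in the middle bucket.
print(click_probability(0.8, [0.0, 1.0, 0.0], [-1.0, 0.0, 1.0], 0.0))
```

An item that is never examined gets a click probability of zero regardless of its ratings, which is the point of the assumption.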
Attention (Examination) Model
Logistic regression model:
$$P(E_k = 1) = \varepsilon(\vec{\varphi}_k),$$
where $\vec{\varphi}_k$ is a vector of features for SERP item k.

Feature group | Features | # of features
rank | user-perceived rank of the SERP item (can be different from k) | 1
CSS classes | SERP item type (Web, News, Weather, Currency, Knowledge Panel, etc.) | 10
geometry | offset from the top, first or second column (binary), width (w), height (h), w × h | 5
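To make the feature table concrete, here is a sketch of how a 16-dimensional feature vector could be assembled and passed through a logistic function. Only five item types are named on the slide, so the remaining five entries of `ITEM_TYPES` are placeholders. The function names, the encoding choices and the example values are likewise assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence

# Five types named on the slide plus five placeholder names (hypothetical).
ITEM_TYPES = ["web", "news", "weather", "currency", "knowledge_panel"] + [
    f"other_{i}" for i in range(1, 6)
]


def item_features(rank: int, item_type: str, top_offset: float,
                  in_second_column: bool, width: float, height: float) -> List[float]:
    """Build [rank] + one-hot item type (10) + geometry (5) = 16 features."""
    one_hot = [1.0 if item_type == t else 0.0 for t in ITEM_TYPES]
    geometry = [top_offset, 1.0 if in_second_column else 0.0,
                width, height, width * height]
    return [float(rank)] + one_hot + geometry


def examination_probability(features: Sequence[float], weights: Sequence[float],
                            bias: float = 0.0) -> float:
    """P(E_k = 1) as a logistic function of a linear score over the features."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical first-column web result at the top of the page.
phi = item_features(1, "web", 0.0, False, 500.0, 80.0)
print(len(phi), examination_probability(phi, [0.0] * len(phi)))
```

In practice the geometry features would probably need scaling before training, since raw pixel areas are several orders of magnitude larger than the other inputs.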
Satisfaction Model
in previous models, satisfaction comes only from clicked results;
in our model it also comes from the SERP items that simply attracted attention;
$$P(S = 1) = \sigma(\tau_0 + U) = \sigma\Big(\tau_0 + \sum_k P(E_k = 1)\, u_d(\vec{D}_k) + \sum_k P(C_k = 1)\, u_r(\vec{R}_k)\Big)$$
where $\vec{D}_k$ and $\vec{R}_k$ are ratings assigned by the raters for direct snippet relevance and result relevance respectively. $u_d$ and $u_r$ are linear functions of rating histograms.
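The satisfaction formula can be sketched as follows. The sketch assumes that $u_d$ and $u_r$ are dot products of a weight vector with a rating histogram (no intercept), and all names and numbers are illustrative rather than taken from the paper.

```python
import math
from typing import Sequence


def logistic(z: float) -> float:
    """Numerically stable logistic function (sigma in the slides)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_utility(hist: Sequence[float], weights: Sequence[float]) -> float:
    """u_d or u_r: assumed here to be a plain dot product with the rating histogram."""
    return sum(w * h for w, h in zip(weights, hist))


def satisfaction_probability(p_exam: Sequence[float], p_click: Sequence[float],
                             d_hists: Sequence[Sequence[float]],
                             r_hists: Sequence[Sequence[float]],
                             w_d: Sequence[float], w_r: Sequence[float],
                             tau0: float) -> float:
    """P(S = 1) = sigma(tau0 + sum_k P(E_k) u_d(D_k) + sum_k P(C_k) u_r(R_k))."""
    attention_term = sum(pe * linear_utility(d, w_d) for pe, d in zip(p_exam, d_hists))
    click_term = sum(pc * linear_utility(r, w_r) for pc, r in zip(p_click, r_hists))
    return logistic(tau0 + attention_term + click_term)


# Hypothetical one-item SERP that is always examined but never clicked.
print(satisfaction_probability([1.0], [0.0], [[0.0, 1.0]], [[1.0, 0.0]],
                               w_d=[0.0, 2.0], w_r=[0.0, 1.0], tau0=-2.0))
```

In this example the click term is zero, so any satisfaction above the baseline σ(τ0) comes from the attention term alone.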
The CAS Metric
Utility that determines the satisfaction probability:
$$U = \underbrace{\sum_k P(E_k = 1)\, u_d(\vec{D}_k)}_{\text{NEW}} + \underbrace{\sum_k P(C_k = 1)\, u_r(\vec{R}_k)}_{\text{Chuklin et al. 2013}}$$
has an additional term
trained on mousing and satisfaction (in addition to clicks)
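Two limiting cases, worked out directly from the definition of U, may help to see what the additional term contributes. They are illustrations only and are not taken from the slides.

```latex
% Case 1: no credit for attention alone, i.e. u_d \equiv 0.
% The new term vanishes and U keeps only the click-based sum:
U = \sum_k P(C_k = 1)\, u_r(\vec{R}_k)

% Case 2: no clicks are expected, i.e. P(C_k = 1) = 0 for all k.
% The click-based sum vanishes, yet U can still be non-zero:
U = \sum_k P(E_k = 1)\, u_d(\vec{D}_k)
```

Case 1 has the same form as the click-based utility, with $u_r(\vec{R}_k)$ in place of $R_k$. Case 2 is the situation from Motivation 2: a SERP can provide utility even when nobody clicks.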
Experimental Setup
Dataset
199 queries with explicit unambiguous feedback (satisfied / not satisfied);
1,739 rated results
direct snippet relevance (D)
result relevance (R)
Baselines and CAS Model Variants
UBM: model that agrees well with online team-draft experimental outcomes;
PBM: position-based model, a robust model with fewer parameters than UBM;
random: model that predicts click and satisfaction with fixed probabilities (learned from the data);
uUBM: from Chuklin et al. 2013. Similar to UBM, but parameters are trained on a different and much bigger dataset.
CASnod is a stripped-down version that does not use (D) labels;
CASnosat is a version of the CAS model that does not include the satisfaction term while optimizing the model;
CASnoreg is a version of the CAS model that does not use regularization while training. All other models were trained with L2-regularization.
Results
Is the New Metric Really New?
Correlation Between Metrics
Table: Correlation between metrics measured by average Pearson’s correlation coefficient.