Page 1

The Impact of Ranker Quality on Rank Aggregation Algorithms: Information vs. Robustness

Sibel Adalı, Brandeis Hill, and Malik Magdon-Ismail
Rensselaer Polytechnic Institute

Page 2

• Given a set of ranked lists of objects, what is the best way to aggregate them into a final ranked list?

• The correct answer depends on what the objective is:

• The consensus among the input rankers

• The most correct final ordering

• In this paper:

➡ We implement existing rank aggregation methods and introduce new ones.

➡ We develop a statistical framework for evaluating the methods and report on their relative performance.

Motivation

[Figure: three input rankers (Ranker1, Ranker2, Ranker3) assigning ranks 1 through 5 to a set of objects]

Page 3

Related Work

• Rank aggregation methods

• Use of cheap methods such as average and median is common

• Methods based on consensus were first introduced by Dwork, Kumar, Naor, and Sivakumar [WWW 2001], with median rank as an approximation by Fagin, Kumar, and Sivakumar [SIGMOD 2003]

• Methods that integrate rank and textual information are common in meta-searching, for example Lu, Meng, Shu, Yu, Liu [WISE 2005]

• Machine learning methods learn the best factors for a user by incorporating user feedback, for example Joachims [SIGKDD 2002]

• Evaluations of rank aggregation methods mainly use real data with fairly small data sets, for example Renda, Straccia [SAC 2003]

Page 4

• Given two rankers A and B

• Precision (p) counts the number of objects A and B have in common (a maximization problem)

• Kendall-tau (τ) counts the total number of pairwise disagreements between A and B (a minimization problem)

Error Measures

Input Rankers:
A: o1, o2, o3, o4, o5
B: o2, o3, o1, o4, o6
C: o4, o2, o3, o1, o7

Aggregate:
D: o2, o1, o3, o4, o5

• Precision of D with respect to A, B, and C

p(A,D) + p(B,D) + p(C,D) = 5 + 4 + 4 = 13

• Kendall-tau of D with respect to A, B, and C

τ(A,D) + τ(B,D) + τ(C,D) = 1 + 1 + 4 = 6

• Missing values for τ are handled separately.
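To make the two measures concrete, here is a minimal Python sketch (not from the paper); Kendall-tau is counted only over the objects both lists rank, since missing values are handled separately:

from itertools import combinations

def precision(a, b):
    # number of objects the two ranked lists have in common
    return len(set(a) & set(b))

def kendall_tau(a, b):
    # pairwise disagreements, counted over objects appearing in both lists
    pos_a = {o: r for r, o in enumerate(a)}
    pos_b = {o: r for r, o in enumerate(b)}
    common = set(a) & set(b)
    return sum(1 for x, y in combinations(common, 2)
               if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0)

A = ["o1", "o2", "o3", "o4", "o5"]
D = ["o2", "o1", "o3", "o4", "o5"]
print(precision(A, D), kendall_tau(A, D))  # 5 1, matching p(A,D) and τ(A,D) above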

Page 5

Aggregation Methods

• Cheap Methods:

• Average (Av)

• Median (Me)

• Precision optimal (PrOPT)

• Methods that aim to optimize the Kendall-tau error of the aggregate with respect to the input rankers

• Markov chain methods (Pagerank, Pg)

• Iterative methods that improve a given aggregate

• adjacent pairs (ADJ)

• iterative best flip (IBF)

Page 6

• Rank objects with respect to the number of times they appear in all the lists

• Break ties with respect to the objects' average rank in the input rankers

• Break remaining ties randomly

Precision Optimal

Input Rankers:
A: o1, o2, o3, o4, o5
B: o2, o3, o1, o4, o6
C: o4, o2, o5, o1, o7

Number of times each object appears:
D: {o1, o2, o4} (3 times), {o3, o5} (2 times), {o6, o7} (once)

Break ties:
D: o2, o1, o4, o3, o5, o6, o7

Choose top K (here K = 5):
D: o2, o1, o4, o3, o5
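A sketch of this procedure in Python (the tie-breaking details beyond the slide, such as using a random sort key for residual ties, are assumptions):

import random
from statistics import mean

def precision_optimal(rankers, k):
    # rank by appearance count, break ties by average rank, then randomly
    objects = {o for r in rankers for o in r}
    count = {o: sum(o in r for r in rankers) for o in objects}
    avg_rank = {o: mean(r.index(o) for r in rankers if o in r) for o in objects}
    ranked = sorted(objects, key=lambda o: (-count[o], avg_rank[o], random.random()))
    return ranked[:k]  # choose top K

A = ["o1", "o2", "o3", "o4", "o5"]
B = ["o2", "o3", "o1", "o4", "o6"]
C = ["o4", "o2", "o5", "o1", "o7"]
print(precision_optimal([A, B, C], 5))  # ['o2', 'o1', 'o4', 'o3', 'o5']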

Page 7

[Figure: graph built from ranker A = (o1, o2, o3); nodes o1, o2, o3 with weighted edges (weights 1, 1/3, 2/3, 2/3, 1/3)]

• Construct a graph from the rankings (similar to Dwork et al., WWW 2001)

• Each object returned in a ranked list is a vertex

• Insert an edge (i,j) for each ranked list where i is ranked higher than j

• Compute the pagerank [Brin & Page, WWW 1998] on this graph

• Each edge (j, i) is weighted w_{j,i} in proportion to the rank difference it represents

• The navigation probability is proportional to the edge weights

• The random jump probability (pi) is proportional to the indegree of each node

• Alpha (α) is set to 0.85.

• The pagerank Pg_i is the solution to the equation below:

Pg_i = \alpha \, p_i + (1 - \alpha) \sum_{(j,i) \in E} Pg_j \, w_{j,i}

Pagerank

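A sketch of this aggregator under one reading of the slide: edges run from lower-ranked to higher-ranked objects so that mass accumulates at preferred objects, and the fixed point is found by simple iteration (both choices are assumptions):

import numpy as np

def pagerank_aggregate(rankers, alpha=0.85, iters=100):
    objects = sorted({o for r in rankers for o in r})
    idx = {o: i for i, o in enumerate(objects)}
    n = len(objects)
    W = np.zeros((n, n))  # W[j, i]: weight of edge j -> i
    for r in rankers:
        for hi in range(len(r)):
            for lo in range(hi + 1, len(r)):
                W[idx[r[lo]], idx[r[hi]]] += lo - hi  # ~ rank difference
    row = W.sum(axis=1, keepdims=True)
    W = W / np.where(row > 0, row, 1.0)        # navigation probabilities
    indeg = (W > 0).sum(axis=0).astype(float)
    p = indeg / indeg.sum()                    # jump probability ~ indegree
    pg = np.full(n, 1.0 / n)
    for _ in range(iters):                     # iterate the equation above
        pg = alpha * p + (1 - alpha) * (pg @ W)
    return sorted(objects, key=lambda o: -pg[idx[o]])

print(pagerank_aggregate([["o1", "o2", "o3"]]))  # ['o1', 'o2', 'o3']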

Page 8

Iterative Improvement Methods

• Adjacent Pairs (ADJ)
• Given an aggregate ranking, flip adjacent pairs as long as the total error with respect to the input rankers is reduced; normally the Kendall-tau error metric is used [Dwork]

• Iterative Best Flip (IBF)
• Given an aggregate ranking:

while not done:
    for each object:
        record the current configuration
        find the best flip among all other objects; do this flip even if it
        temporarily increases the error, and make the result the current configuration
    choose the lowest-error configuration from the history
    if its overall error is lower, or it is a configuration not seen before:
        make it the current configuration
    else:
        break
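A compact Python sketch of IBF as read from the pseudocode above; the error function (e.g. summed Kendall-tau against the input rankers) is supplied by the caller:

def swapped(r, i, j):
    r = list(r)
    r[i], r[j] = r[j], r[i]
    return r

def iterative_best_flip(agg, error):
    current, seen = list(agg), {tuple(agg)}
    while True:
        start_error = error(current)
        history = [list(current)]
        for i in range(len(current)):
            # best flip of position i with any other position, taken even
            # if it temporarily increases the error
            current = min((swapped(current, i, j)
                           for j in range(len(current)) if j != i), key=error)
            history.append(list(current))
        best = min(history, key=error)  # lowest-error configuration of this run
        if error(best) < start_error or tuple(best) not in seen:
            seen.add(tuple(best))
            current = best              # continue from the run's best
        else:
            return best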

Page 9

Iterative Best Flip

Input Rankers:
A: o1, o2, o3, o4, o5
B: o5, o2, o3, o4, o1
C: o1, o4, o2, o3, o5

Aggregate D: o5, o1, o2, o4, o3 (Error_τ = 14)

After best flip for o5: D = o1, o5, o2, o4, o3 (Error_τ = 13)
After best flip for o1: D = o2, o5, o1, o4, o3 (Error_τ = 14)
After best flip for o2: D = o5, o2, o1, o4, o3 (Error_τ = 13)
After best flip for o4: D = o4, o2, o1, o5, o3 (Error_τ = 12)
After best flip for o3: D = o4, o2, o1, o3, o5 (Error_τ = 11)

Choose the minimum error configuration from this run, and continue.

IBF seems to outperform ADJ and do well even when we start from a random ranking.

Page 10

Analysis of Aggregation Methods

• Complex aggregators incorporate subtle nuances of the input rankers. They use more information but are sensitive to noise.

• Simple aggregators disregard information contained in the input rankers but are less sensitive to noise.

• For example, average is more complex than median and precision optimal.

• What about pagerank and the other Kendall-tau-based optimizers?

Input Rankers:
A: o1, o2, o3
B: o3, o1, o2

Kendall-tau optimal aggregations:
D1: o3, o1, o2
D2: o1, o2, o3
D3: o1, o3, o2

The question we would like to answer is which aggregator performs well under which conditions. Does minimizing the Kendall-tau error with respect to the input rankers always lead to a good solution?
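This three-object example can be verified by brute force; a small sketch using the Kendall-tau definition from the Error Measures page:

from itertools import combinations, permutations

def tau(a, b):
    pos_a = {o: i for i, o in enumerate(a)}
    pos_b = {o: i for i, o in enumerate(b)}
    return sum(1 for x, y in combinations(a, 2)
               if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0)

A, B = ("o1", "o2", "o3"), ("o3", "o1", "o2")
best = min(tau(A, d) + tau(B, d) for d in permutations(A))
print([d for d in permutations(A) if tau(A, d) + tau(B, d) == best])
# all three aggregations D1, D2, D3 achieve the optimal total error of 2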

Page 11

Statistical Model of Aggregators

• Suppose there is a correct ranked list called the ground truth that represents the correct ordering.

• The correct ordering is computed for each object using:

• A set of factors that measure the fit of an object for a specific criterion (factors F = f_1, ..., f_F, where f_l ∈ [-3, 3])

• Examples of factors are number of occurrences of a keyword, recency of updates to a document or pagerank

• A weight for each factor (W = w_1, ..., w_F, where w_1 + ... + w_F = 1)

• The final score of each object oi is computed using a linear combination function

• Objects are ranked with respect to the scores.

V_i = \sum_{l=1}^{F} w_l f_l(o_i)
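As a concrete sketch of the ground-truth construction (the uniform factor scores and the specific weights mirror the test setup described later; everything else is illustrative):

import numpy as np

rng = np.random.default_rng(0)
n_objects, n_factors = 100, 5

F = rng.uniform(-3, 3, size=(n_objects, n_factors))  # factor scores f_l(o_i)
W = np.arange(1, n_factors + 1) / 15                  # weights w_l, summing to 1

V = F @ W                       # V_i = sum_l w_l * f_l(o_i)
ground_truth = np.argsort(-V)   # object indices, ranked best-first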

Page 12

[Figure: the GROUND TRUTH combines factors f_1, ..., f_5 with weights w_1, ..., w_5 over objects o_1, ..., o_n; RANKER j combines perturbed factors f_l^j = f_l + ε_l with its own weights w_1^j, ..., w_5^j]

V_i = \sum_{l=1}^{F} w_l f_l(o_i)        V_i^j = \sum_{l=1}^{F} w_l^j f_l^j(o_i)

• Each ranker produces a ranked list using the same formula and the same factors
• Ranker j tries to estimate the factors' true values for each object, producing F^j
• It also guesses the correct weights for the combination formula, producing W^j

Statistical Model of Aggregators

Page 13

• The ranker's estimate F^j of a factor introduces an error ε, i.e., F^j = F + ε^j

• The magnitude of the error depends on a variance parameter σ²

• The distribution of the error can be adjusted to model different types of spam

• Our model can also capture various types of correlation between the factors and the errors, but we do not report on those here.

Statistical Model of Aggregators

\mathrm{Var}(\epsilon^j_{il}) = \sigma^2 \, \frac{(\delta - f_l(o_i))^{\gamma} \, (\delta + f_l(o_i))^{\beta}}{\max_{f \in [-3,3]} (\delta - f)^{\gamma} (\delta + f)^{\beta}}
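A sketch of this variance schedule, with the γ, δ, β defaults taken from the test setup on the next page (the dense grid for the normalizing maximum is my choice):

import numpy as np

def error_variance(f, sigma2, gamma=1.0, delta=5.0, beta=0.01):
    # Var(ε) = σ² (δ - f)^γ (δ + f)^β, normalized by the maximum over [-3, 3]
    g = lambda x: (delta - x) ** gamma * (delta + x) ** beta
    grid = np.linspace(-3, 3, 601)
    return sigma2 * g(f) / g(grid).max()

print(error_variance(3.0, sigma2=1.0))   # ~0.25: small errors for "good" objects
print(error_variance(-3.0, sigma2=1.0))  # ~1.0: larger errors for "bad" objects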

Page 14

• We distribute the scores for each factor uniformly over 100 objects, and use 5 factors and 5 rankers

• We set γ = 1, δ = 5, β = 0.01, which models a case where rankers make small mistakes for “good” objects and increasingly larger mistakes for “bad” objects

• We vary σ² over 0.1, 1, 5, and 7.5

• We set the ground truth weights to W

• We assign 1, 2, 3, 4, or 5 rankers the correct weights (W) and the remaining rankers the incorrect weights (Wr); nMI denotes the number of rankers with the wrong weights

W = ⟨1/15, 2/15, 3/15, 4/15, 5/15⟩
Wr = ⟨5/15, 4/15, 3/15, 2/15, 1/15⟩

Test Setup

Page 15

Test Setup

• For each setting, we construct 40,000 different data sets
• For each data set, we construct each aggregator for the top 10 from the input rankers and output the top 10
• We compare the performance of each aggregator against the ground truth using precision and Kendall-tau
• For each error metric, we compute the difference between all pairs of aggregators
• For each test case and error metric, we output for every pair of aggregators [A1, A2] a range [l, h] with 99.9% confidence
• We assume A1 and A2 are roughly equivalent (A1 ≡ A2) if the range [l, h] crosses zero
• Otherwise, we construct an ordering A1 > A2 or A1 < A2 based on the range and the error metric
• We order the aggregators using a topological sort based on this ordering, for each test and each error metric
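A sketch of this comparison pipeline (the normal-approximation interval and graphlib topological sort are illustrative choices, not necessarily the paper's):

import numpy as np
from itertools import combinations
from graphlib import TopologicalSorter

def order_aggregators(errors, z=3.29):
    # errors: {aggregator: per-dataset error array}; z ~ two-sided 99.9%.
    # Assumes a minimization metric such as Kendall-tau; flip for precision.
    preds = {name: set() for name in errors}
    for a, b in combinations(errors, 2):
        d = errors[a] - errors[b]
        half = z * d.std(ddof=1) / np.sqrt(len(d))
        lo, hi = d.mean() - half, d.mean() + half
        if lo > 0:              # a's error significantly larger: b beats a
            preds[a].add(b)
        elif hi < 0:            # a beats b
            preds[b].add(a)
        # else [lo, hi] crosses zero: a ≡ b, no edge
    return list(TopologicalSorter(preds).static_order())  # better first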

Page 16

[Figure: partial orderings of the aggregators by precision, with pairwise differences on the edges; left panel σ² = 0.1, right panel σ² = 1.0]

Results, precision for nMI = 0

Legend

Av Average

Me Median

Pg Pagerank

Rnd Random

PrOpt Precision Optimal

xADJ ADJ opt. after aggregator x

xIBF IBF opt. after aggregator x


Page 17

[Figure: partial orderings of the aggregators by precision, with pairwise differences on the edges; left panel σ² = 5, right panel σ² = 7.5]

Results, precision for nMI = 0


Page 18

Kendall-tau results for nMI = 2


[Figure: partial orderings of the aggregators by Kendall-tau error, with pairwise differences on the edges; panels for σ² = 0.1, 1, 5, and 7.5, including NB variants of Av, Me, and Pg]

Page 19

[Figure: partial orderings of the aggregators by precision, with pairwise differences on the edges; panels for σ² = 0.1 and σ² = 7.5]

Precision results for nMI = 4


Page 20

Result Summary

• Low noise:

• Average is best when all the rankers are the same

• Median is best when there is asymmetry among the rankers

• High noise

• Robustness is needed; PrOpt, IBF, and Pg are the best

• As misinformation increases, robust but more complex rankers tend to do better

[Figure: summary grid of the best-performing aggregators, arranged from low to high noise and from less to more misinformation; Av and Me variants lead under low noise, while PrOpt, Pg*, MeIBF, and other robust methods lead under high noise and more misinformation]

Page 21

Conclusion and Future Work

• We introduce two new aggregation methods, PrOpt and IBF, that seem to do well in many cases; IBF does well even when started from a random ranking

• No single rank aggregation method is best; there is a trade-off between information and robustness

• Further evaluation of rank aggregation methods is needed:

• Testing with various correlations, both positive and negative, between the ranking factors and the errors made on them

• Testing the model with negative weights, where misinformation is more misleading