Ranking in DB. Laks V.S. Lakshmanan, Dept. of CS, UBC

Jan 13, 2016


Ranking in DB

Laks V.S. Lakshmanan

Dept. of CS

UBC


04/21/23 2

Why ranking in query answering? 1/3

• Multimedia data – fuzzy querying: e.g., “find top 2 red objects with a soft texture”.

Ranked list 1        Ranked list 2
Obj  Score           Obj  Score
D    0.85            A    0.9
B    0.80            D    0.8
A    0.75            C    0.4
E    0.65            B    0.3
C    0.60            E    0.1

Combine scores → Overall score
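
To make the "combine scores" step concrete, here is a minimal Python sketch over the two ranked lists above. The slide does not say which combination function is used, so both variants shown are assumptions: `min` (a common reading of fuzzy AND) and SUM. The helper name `top_k` is hypothetical.

```python
# Per-predicate scores copied from the two ranked lists on the slide.
list1 = {"D": 0.85, "B": 0.80, "A": 0.75, "E": 0.65, "C": 0.60}
list2 = {"A": 0.9, "D": 0.8, "C": 0.4, "B": 0.3, "E": 0.1}


def top_k(combine, k):
    """Score every object with `combine` and return the k best (object, score) pairs."""
    overall = {obj: combine(list1[obj], list2[obj]) for obj in list1}
    return sorted(overall.items(), key=lambda kv: kv[1], reverse=True)[:k]


top2_min = top_k(min, 2)                   # assumption: fuzzy AND modelled as minimum
top2_sum = top_k(lambda a, b: a + b, 2)    # assumption: SUM as the combination function
print(top2_min)
print(top2_sum)
```

Under either assumed function the top 2 objects are A and D; only their relative order depends on the choice.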


Why ranking? 2/3

• IR: “find top 5 documents relevant to ‘computational’, ‘neuroscience’ and ‘brain theory’”.
– IR systems maintain full-text indexes: inverted lists of docs w.r.t. each keyword.
– Same Q/A paradigm as before.

• Buying a home: several criteria – price, location, area, #BRs, school district. ORDER BY query in SQL.

• Finding hotels while traveling.


Why ranking? 3/3

• Data stream, e.g., of network flow data: “find 10 users with the max. BW consumption and max. #packets communicated”.
– Score may be a complex aggregation of these two measures.
• In a social net, find 5 items tagged as most relevant to “lawn mowing” and belonging to users socially close to the seeker.
• And now, find top-k recs (recommender systems).
• etc.
• Fagin et al. – pioneering papers PODS ’96, ’01, JCSS 2003. Burgeoned into a field now.
• Focus on middleware algorithm, which, given a score combo. function, computes top-k answers by probing diff. subsystems (or ranked lists).


Computational model

• Naïve method.
• How to compute top-K efficiently?
• Access methods:
– Sorted access (sequential access) [SA].
– Random access [RA].
• Diff. optimization metrics:
– Overall running time of algorithm.
– cost(SA) < cost(RA): minimize RAs.
– RA not possible#: avoid RAs.
– Combined optimization.
• Has led to a variety of algorithms.
• Memory vs. disk model.
• For the most part, assume score agg. is a monotone function; use SUM in examples.

#: typical in IR systems.
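
The two access methods and the naïve method can be sketched as follows. This is only an illustration under stated assumptions: the class `RankedList`, its counters, the function `naive_top_k`, and the toy data are all hypothetical names and values, and SUM is used as the aggregation function as in the slides' examples.

```python
class RankedList:
    """One subsystem's list, sorted by descending score, with access counters."""

    def __init__(self, entries):
        self.entries = sorted(entries, key=lambda e: e[1], reverse=True)
        self.index = dict(entries)   # object -> score, backs random access
        self.cursor = 0
        self.sa_count = 0            # number of sorted accesses performed
        self.ra_count = 0            # number of random accesses performed

    def sorted_access(self):
        """Return the next (object, score) pair in descending order, or None."""
        if self.cursor >= len(self.entries):
            return None
        self.sa_count += 1
        entry = self.entries[self.cursor]
        self.cursor += 1
        return entry

    def random_access(self, obj):
        """Look up the score of a given object directly."""
        self.ra_count += 1
        return self.index[obj]


def naive_top_k(lists, k):
    """Naive method: read every list to the end under SA, then sort all totals."""
    totals = {}
    for lst in lists:
        while (entry := lst.sorted_access()) is not None:
            obj, score = entry
            totals[obj] = totals.get(obj, 0.0) + score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy data (hypothetical, not from the slides).
lists = [
    RankedList([("a", 0.9), ("b", 0.5), ("c", 0.1)]),
    RankedList([("b", 0.8), ("c", 0.7), ("a", 0.2)]),
]
result = naive_top_k(lists, 1)
total_sa = sum(lst.sa_count for lst in lists)
print(result, total_sa)
```

The naïve method always pays one sorted access per list entry, which is the cost the later algorithms try to avoid.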


Fagin’s Algorithm (FA)

• m lists sorted by descending scores.
• Access (SA) all lists in parallel.
– For each new object x seen, fetch its scores from the other lists by RA. Overall score t(x) = t(x1, …, xm). Store (obj, score) in set Y.
– Remember each object seen (under SA) in all lists in set H.
• Repeat until |H| >= K.
• Sort Y in descending order of scores, breaking ties arbitrarily, and output top K.
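
The steps above can be sketched in Python as follows. This is a minimal in-memory sketch, not the authors' implementation: it assumes t = SUM, assumes every list has the same length and contains every object, and simulates random access with a dictionary. The data are the four lists of the FA example; the names `fagin`, `t` and `LISTS` are hypothetical.

```python
# The four ranked lists of the FA example (descending scores).
L1 = [("H", 0.95), ("B", 0.90), ("E", 0.85), ("C", 0.80), ("G", 0.75),
      ("I", 0.70), ("D", 0.65), ("A", 0.60), ("J", 0.55), ("F", 0.50)]
L2 = [("J", 1.00), ("C", 0.95), ("G", 0.85), ("H", 0.80), ("E", 0.75),
      ("B", 0.75), ("F", 0.60), ("A", 0.50), ("D", 0.40), ("I", 0.30)]
L3 = [("C", 0.95), ("J", 0.80), ("D", 0.70), ("H", 0.65), ("G", 0.60),
      ("B", 0.55), ("I", 0.50), ("E", 0.45), ("F", 0.40), ("A", 0.30)]
L4 = [("E", 1.00), ("G", 0.95), ("H", 0.90), ("B", 0.85), ("D", 0.80),
      ("C", 0.70), ("A", 0.65), ("I", 0.55), ("F", 0.45), ("J", 0.30)]
LISTS = [L1, L2, L3, L4]


def t(scores):
    """Aggregation function: SUM, rounded to 2 decimals to avoid float noise."""
    return round(sum(scores), 2)


def fagin(lists, k):
    index = [dict(lst) for lst in lists]   # simulated random access per list
    seen_in = {}                           # object -> set of lists where seen under SA
    Y = {}                                 # object -> overall score
    H = set()                              # objects seen under SA in all lists
    depth = 0
    while len(H) < k and depth < len(lists[0]):
        for i, lst in enumerate(lists):    # one sorted access on each list
            obj, _ = lst[depth]
            seen_in.setdefault(obj, set()).add(i)
            if obj not in Y:               # new object: fetch its other scores by RA
                Y[obj] = t(idx[obj] for idx in index)
        depth += 1
        H = {o for o, s in seen_in.items() if len(s) == len(lists)}
    top = sorted(Y.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return top, depth, H


top, depth, H = fagin(LISTS, 4)
print(top, depth, sorted(H))
```

With K = 4 the sketch stops at depth 6 with H = {B, C, G, H}. Its top 4 are C, H, G, plus one of the two objects tied at 3.05.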


Example of FA

Four lists, each sorted by descending score (K = 4, t = SUM):

L1        L2        L3        L4
H(0.95)   J(1.00)   C(0.95)   E(1.00)
B(0.90)   C(0.95)   J(0.80)   G(0.95)
E(0.85)   G(0.85)   D(0.70)   H(0.90)
C(0.80)   H(0.80)   H(0.65)   B(0.85)
G(0.75)   E(0.75)   G(0.60)   D(0.80)
I(0.70)   B(0.75)   B(0.55)   C(0.70)
D(0.65)   F(0.60)   I(0.50)   A(0.65)
A(0.60)   A(0.50)   E(0.45)   I(0.55)
J(0.55)   D(0.40)   F(0.40)   F(0.45)
F(0.50)   I(0.30)   A(0.30)   J(0.30)

Y: answers seen in >= 1 list (unsorted). H: answers seen (under SA) in all 4 lists.

Depth   New objects added to Y (overall score)        H
1       H (3.30), J (2.65), C (3.40), E (3.05)        {}
2       B (3.05), G (3.15)                            {}
3       D (2.55)                                      {}
4       none                                          {H}
5       none                                          {H, G}
6       I (2.05)                                      {H, G, B, C}

|H| = 4.


FA Example concluded

• A, F – not seen in any list. Yet, we are sure they can’t make it to top-4. Why?
• Based on where the cursors are now, what’s the max. possible score for A, F?
• What assumptions are being made about t()?
• FA is shown to be optimal with very high probability [Fagin: PODS 1996].
• But can be beaten by other algorithms on specific inputs.
• What about buffer size?
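
One way to check the first two questions numerically, assuming t = SUM (which is monotone): an object not yet seen under SA can score at most the cursor score on each list, so its overall score is bounded by t of the cursor scores. The values used here are the depth-6 cursor scores and the overall scores computed in the FA example; the variable names are hypothetical.

```python
def t(scores):
    """Aggregation function: SUM, rounded to 2 decimals to avoid float noise."""
    return round(sum(scores), 2)


# Scores under the cursors when FA stops (depth 6): I on L1, B on L2, B on L3, C on L4.
cursor_scores = [0.70, 0.75, 0.55, 0.70]

# Overall scores FA has computed for the objects in Y.
seen_scores = {"C": 3.40, "H": 3.30, "G": 3.15, "E": 3.05,
               "B": 3.05, "J": 2.65, "D": 2.55, "I": 2.05}

upper_bound_unseen = t(cursor_scores)                      # best case for A or F
kth_best = sorted(seen_scores.values(), reverse=True)[3]   # 4th best score in Y
print(upper_bound_unseen, kth_best)
```

Since the bound for any unseen object (2.70) is below the 4th best score already in Y (3.05), neither A nor F can enter the top 4. The argument relies on t being monotone.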


Threshold Algorithm

• Do parallel SA on all m lists.
• For each object x seen under SA in a list, fetch its scores from other lists by RA and compute overall score.
• If |Buffer| < K, add x to Buffer;
• Else if score(x) <= k-th score in buffer, toss;
• Else replace bottom of buffer with (x, score(x)) & re-sort.
• Stop when threshold <= k-th score in buffer.
• Threshold := t(worst score seen on L1, …, worst score seen on Lm).
• Output the top-K objects & scores (in buffer).
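
A minimal in-memory sketch of these steps, under the same assumptions as the slides' examples (t = SUM, equal-length lists containing every object). Random access is simulated with a dictionary, and the names `threshold_algorithm`, `t` and `LISTS` are hypothetical. The data are the four lists used in the FA and TA examples.

```python
L1 = [("H", 0.95), ("B", 0.90), ("E", 0.85), ("C", 0.80), ("G", 0.75),
      ("I", 0.70), ("D", 0.65), ("A", 0.60), ("J", 0.55), ("F", 0.50)]
L2 = [("J", 1.00), ("C", 0.95), ("G", 0.85), ("H", 0.80), ("E", 0.75),
      ("B", 0.75), ("F", 0.60), ("A", 0.50), ("D", 0.40), ("I", 0.30)]
L3 = [("C", 0.95), ("J", 0.80), ("D", 0.70), ("H", 0.65), ("G", 0.60),
      ("B", 0.55), ("I", 0.50), ("E", 0.45), ("F", 0.40), ("A", 0.30)]
L4 = [("E", 1.00), ("G", 0.95), ("H", 0.90), ("B", 0.85), ("D", 0.80),
      ("C", 0.70), ("A", 0.65), ("I", 0.55), ("F", 0.45), ("J", 0.30)]
LISTS = [L1, L2, L3, L4]


def t(scores):
    """Aggregation function: SUM, rounded to 2 decimals to avoid float noise."""
    return round(sum(scores), 2)


def threshold_algorithm(lists, k):
    index = [dict(lst) for lst in lists]    # simulated random access per list
    buffer = []                             # (score, object), best first, at most k entries
    for depth in range(len(lists[0])):
        for lst in lists:                   # one sorted access on each list
            obj = lst[depth][0]
            if any(o == obj for _, o in buffer):
                continue                    # already held in the buffer
            score = t(idx[obj] for idx in index)
            if len(buffer) < k:
                buffer.append((score, obj))
            elif score > buffer[-1][0]:
                buffer[-1] = (score, obj)   # replace the bottom entry
            else:
                continue                    # toss
            buffer.sort(key=lambda e: e[0], reverse=True)
        threshold = t(lst[depth][1] for lst in lists)
        if len(buffer) == k and threshold <= buffer[-1][0]:
            return buffer, depth + 1, threshold
    return buffer, len(lists[0]), None


buffer, depth, threshold = threshold_algorithm(LISTS, 4)
print(buffer, depth, threshold)
```

On this input the sketch stops at depth 5 with threshold 2.90, one round earlier than FA, and it never holds more than K entries in its buffer.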


TA Example

Four lists, each sorted by descending score (K = 4, t = SUM):

L1        L2        L3        L4
H(0.95)   J(1.00)   C(0.95)   E(1.00)
B(0.90)   C(0.95)   J(0.80)   G(0.95)
E(0.85)   G(0.85)   D(0.70)   H(0.90)
C(0.80)   H(0.80)   H(0.65)   B(0.85)
G(0.75)   E(0.75)   G(0.60)   D(0.80)
I(0.70)   B(0.75)   B(0.55)   C(0.70)
D(0.65)   F(0.60)   I(0.50)   A(0.65)
A(0.60)   A(0.50)   E(0.45)   I(0.55)
J(0.55)   D(0.40)   F(0.40)   F(0.45)
F(0.50)   I(0.30)   A(0.30)   J(0.30)

Threshold bar after each round of SA (x1, …, x4 = last scores seen on L1, …, L4; X = tossed or dropped from the buffer):

Depth   x1     x2     x3     x4     T      Buffer scores               X
1       0.95   1.00   0.95   1.00   3.90   3.40, 3.30, 3.05, 2.65
2       0.90   0.95   0.80   0.95   3.60   3.40, 3.30, 3.15, 3.05      2.65, 3.05
3       0.85   0.85   0.70   0.90   3.30   3.40, 3.30, 3.15, 3.05      2.55
4       0.80   0.80   0.65   0.85   3.10   3.40, 3.30, 3.15, 3.05
5       0.75   0.75   0.60   0.80   2.90   3.40, 3.30, 3.15, 3.05

Threshold Bar: T = 2.90 ==> can stop!

TA Remarks

TA is Instance Optimal

TA IO Proof (contd.)

Proof (contd.)

Proof (contd.)

Proof (contd.)

Proof (concluded)

No Random Access Algorithm (NRA)

• What if cost(RA) > cost(SA) or RA wasn’t allowed?
• Do SA on all lists in parallel. At depth d:
– Maintain the worst (last-seen) score on each list; call them w1, …, wm.
– Let x be any object seen in lists {1, …, i}, with scores x1, …, xi.
• Best(x) = t(x1, …, xi, wi+1, …, wm).
• Worst(x) = t(x1, …, xi, 0, …, 0).
– TopK contains K objects with max Worst scores at depth d. Break ties using Best. M = k-th Worst score in TopK.
– Object y is viable if Best(y) > M.
• Stop when TopK contains >= K distinct objects and no object outside TopK is viable. Return TopK.
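
The bookkeeping above can be sketched as follows. This is an illustrative in-memory version under assumptions: t = SUM, equal-length lists, and scores in [0, 1]. Objects not yet seen anywhere are covered by the bound t(bottom), and the names `nra`, `bottom`, `known`, `t` and `LISTS` are hypothetical. The data are the four lists used in the examples.

```python
L1 = [("H", 0.95), ("B", 0.90), ("E", 0.85), ("C", 0.80), ("G", 0.75),
      ("I", 0.70), ("D", 0.65), ("A", 0.60), ("J", 0.55), ("F", 0.50)]
L2 = [("J", 1.00), ("C", 0.95), ("G", 0.85), ("H", 0.80), ("E", 0.75),
      ("B", 0.75), ("F", 0.60), ("A", 0.50), ("D", 0.40), ("I", 0.30)]
L3 = [("C", 0.95), ("J", 0.80), ("D", 0.70), ("H", 0.65), ("G", 0.60),
      ("B", 0.55), ("I", 0.50), ("E", 0.45), ("F", 0.40), ("A", 0.30)]
L4 = [("E", 1.00), ("G", 0.95), ("H", 0.90), ("B", 0.85), ("D", 0.80),
      ("C", 0.70), ("A", 0.65), ("I", 0.55), ("F", 0.45), ("J", 0.30)]
LISTS = [L1, L2, L3, L4]


def t(scores):
    """Aggregation function: SUM, rounded to 2 decimals to avoid float noise."""
    return round(sum(scores), 2)


def nra(lists, k):
    m = len(lists)
    known = {}                                # object -> {list index: score seen under SA}

    def worst(obj):
        return t(known[obj].values())         # unknown scores count as 0

    def best(obj, bottom):
        # Unknown scores are bounded by the last score seen on that list.
        return t(known[obj].get(i, bottom[i]) for i in range(m))

    for depth in range(len(lists[0])):
        for i, lst in enumerate(lists):       # sorted access only, no RA
            obj, score = lst[depth]
            known.setdefault(obj, {})[i] = score
        bottom = [lst[depth][1] for lst in lists]
        ranked = sorted(known, key=lambda o: (worst(o), best(o, bottom)), reverse=True)
        topk, rest = ranked[:k], ranked[k:]
        if len(topk) < k:
            continue
        M = worst(topk[-1])
        viable = [o for o in rest if best(o, bottom) > M]
        if not viable and t(bottom) <= M:     # t(bottom) bounds wholly unseen objects
            return [(o, worst(o), best(o, bottom)) for o in topk], depth + 1
    return None, len(lists[0])


result, depth = nra(LISTS, 4)
print(result, depth)
```

On this input the sketch has to read to depth 8 before no outside object remains viable, deeper than TA's depth 5, but it performs no random accesses.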


NRA Example

Four lists, each sorted by descending score (K = 4, t = SUM):

L1        L2        L3        L4
H(0.95)   J(1.00)   C(0.95)   E(1.00)
B(0.90)   C(0.95)   J(0.80)   G(0.95)
E(0.85)   G(0.85)   D(0.70)   H(0.90)
C(0.80)   H(0.80)   H(0.65)   B(0.85)
G(0.75)   E(0.75)   G(0.60)   D(0.80)
I(0.70)   B(0.75)   B(0.55)   C(0.70)
D(0.65)   F(0.60)   I(0.50)   A(0.65)
A(0.60)   A(0.50)   E(0.45)   I(0.55)
J(0.55)   D(0.40)   F(0.40)   F(0.45)
F(0.50)   I(0.30)   A(0.30)   J(0.30)

[Worst, Best] interval of each object seen so far, after each depth of SA:

Obj   Depth 1        Depth 2        Depth 3        Depth 4        Depth 5        Depth 6
H     [0.95, 3.90]   [0.95, 3.65]   [1.85, 3.40]   [3.30, 3.30]   [3.30, 3.30]   [3.30, 3.30]
J     [1.00, 3.90]   [1.80, 3.65]   [1.80, 3.55]   [1.80, 3.45]   [1.80, 3.35]   [1.80, 3.20]
C     [0.95, 3.90]   [1.90, 3.75]   [1.90, 3.65]   [2.70, 3.55]   [2.70, 3.50]   [3.40, 3.40]
E     [1.00, 3.90]   [1.00, 3.65]   [1.85, 3.40]   [1.85, 3.30]   [2.60, 3.20]   [2.60, 3.15]
B     not seen       [0.90, 3.60]   [0.90, 3.35]   [1.75, 3.20]   [1.75, 3.10]   [3.05, 3.05]
G     not seen       [0.95, 3.60]   [1.80, 3.35]   [1.80, 3.25]   [3.15, 3.15]   [3.15, 3.15]
D     not seen       not seen       [0.70, 3.30]   [0.70, 3.15]   [1.50, 3.00]   [1.50, 2.95]
I     not seen       not seen       not seen       not seen       not seen       [0.70, 2.70]


NRA Features

• What sort of t() do we need to assume, for NRA to work correctly?

• How large can the buffers get?

• How does the amount of bookkeeping compare with TA?

• NRA is instance optimal over algo’s not making RA (and of course, not making wild guesses).


Combined optimization

• What if we are told cost(RA) = α·cost(SA), for some known ratio α?

• Can we find algo’s better than NRA and TA in this case?

• Combined algorithm = CA. (See Fagin et al.’s paper for details.)


Worrying about I/O cost

• Based on Bast et al. VLDB 2006.

• Inverted lists of (itemID, score) entries in desc. score order, as usual, but on disk.

• Blocks sorted by itemID; across blocks still in desc. score order.

Inverted Block Index (IBI) Algorithm.

• What is an IBI?


A Motivating Example

List 1         List 2         List 3
Doc17 : 0.8    Doc25 : 0.7    Doc83 : 0.9
Doc78 : 0.2    Doc38 : 0.5    Doc17 : 0.7
·              Doc14 : 0.5    Doc61 : 0.3
·              Doc83 : 0.5    ·
·              ·              ·
·              Doc17 : 0.2    ·
·              ·              ·

Round 1 (SA on 1,2,3)
Doc17 : [0.8 , 2.4]
Doc25 : [0.7 , 2.4]
Doc83 : [0.9 , 2.4]
unseen: ≤ 2.4

Round 2 (SA on 1,2,3)
Doc17 : [1.5 , 2.0]
Doc25 : [0.7 , 1.6]
Doc83 : [0.9 , 1.6]
unseen: ≤ 1.4

Round 3 (SA on 2,2,3!)
Doc17 : [1.5 , 2.0]
Doc83 : [1.4 , 1.6]
unseen: ≤ 1.0

Round 4 (RA for Doc17)
Doc17 : 1.7
all others < 1.7
done!

Note deviation from round-robin.


IBI Algorithm

• Same setting as NRA/CA, except use IBI.
• Maintain two lists: Top-K items (T = d1, …, dk) and StillHaveASHot (SHASH) items (S = dk+1, …, dk+q).
• pos_i = current cursor position on list Li.
• high_i = score in Li at current cursor position (upper-bounds the score of unseen items).
• For items d in S:
– Which attr scores are known: E(d).
– Which attr scores are unknown: E~(d).
– Worst(d) = total score from E(d).
– Best(d) = Worst(d) + Σ{high_i | i ∈ E~(d)}. (Exactly as Fagin.)
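
As a small check of the Worst/Best definitions, the sketch below recomputes the Round 2 intervals of the motivating example, where the high_i values are 0.2, 0.5 and 0.7. It assumes SUM aggregation; the function name `bounds` and the dictionary encoding of E(d) are hypothetical.

```python
def bounds(known, high):
    """Return (Worst, Best) for one item.

    known: list index -> score, i.e. the attributes in E(d)
    high:  current high_i for every list; lists missing from `known` form E~(d)
    """
    worst = round(sum(known.values()), 2)
    best = round(worst + sum(h for i, h in enumerate(high) if i not in known), 2)
    return worst, best


# State after Round 2 of the motivating example (lists are indexed 0, 1, 2).
high = [0.2, 0.5, 0.7]
doc17 = bounds({0: 0.8, 2: 0.7}, high)   # seen in List 1 and List 3
doc25 = bounds({1: 0.7}, high)           # seen in List 2 only
doc83 = bounds({2: 0.9}, high)           # seen in List 3 only
unseen = round(sum(high), 2)             # best score any unseen doc can have
print(doc17, doc25, doc83, unseen)
```

The computed values agree with the Round 2 intervals in the example: Doc17 [1.5, 2.0], Doc25 [0.7, 1.6], Doc83 [0.9, 1.6], unseen ≤ 1.4.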


IBI Algorithm (contd.)

• In each round, compute:
– min-k = min{Worst(d) | d ∈ T}.
– bestscore that any unseen doc can have = sum of all high_i’s.
– For dj ∈ S: def_j = min-k – Worst(dj). [denotes deficit below qualification level for top-k.]
• T sorted in desc. Worst(); S sorted in desc. Best(). [sorting on (score, ItemID) for fast processing.]
• Invariant: min-k >= max{Worst(d) | d ∈ S}.
• Termination: when min-k >= max{Best(d) | d ∈ S}.
• Can remove an obj from S whenever its Best <= min-k; stop when S = {}.
• Early termination AND minimal bookkeeping are BOTH important for performance.
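
The per-round bookkeeping can be illustrated with the motivating example, assuming k = 1 (the example returns a single document). In this sketch the function name `prune` and the (Worst, Best) tuple encoding are hypothetical, and Doc25's Round 3 upper bound of 1.2 is derived here from the high_i values (0.7 + 0.2 + 0.3) rather than read off the slide.

```python
def prune(T, S):
    """One round of bookkeeping over T (top-k) and S (still have a shot).

    T, S: dicts mapping item -> (Worst, Best).
    Returns (min_k, deficits, surviving candidates from S).
    """
    min_k = min(w for w, _ in T.values())
    deficits = {d: round(min_k - w, 2) for d, (w, _) in S.items()}
    survivors = {d: wb for d, wb in S.items() if wb[1] > min_k}   # drop if Best <= min-k
    return min_k, deficits, survivors


# Round 3: Doc17 holds the single top-k slot.
T3 = {"Doc17": (1.5, 2.0)}
S3 = {"Doc83": (1.4, 1.6), "Doc25": (0.7, 1.2)}
min_k3, deficits3, survivors3 = prune(T3, S3)

# Round 4: the RA resolves Doc17 to 1.7, so nothing in S is viable any more.
T4 = {"Doc17": (1.7, 1.7)}
min_k4, deficits4, survivors4 = prune(T4, survivors3)
print(min_k3, deficits3, survivors3, survivors4)
```

After Round 3 only Doc83 survives, and after the random access in Round 4 the candidate set is empty, which is the termination condition S = {}.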


More on IBI Framework

• Instead of scheduling SAs using RR, use a differential approach for diff. lists based on expected score reductions at future cursor positions (Knapsack).

• Do SA*RA*.

• Order RAs based on estimated Prob[dj can get into top-k answers].