The University of Hong Kong
Department of Computer Science
Continuous Monitoring of Top-k Queries over Sliding Windows
Authors: Kyriakos Mouratidis, Spiridon Bakiras, Dimitris Papadias
Presenter: Kamiru

Dec 14, 2015


Lorin Howard


Outline

• Motivation
• Problem Setting
• Related Works
  – Top-k Queries
  – Skyband
• Solutions
  – Top-k Computation
  – Maintenance Module
  – Skyband Monitoring Algorithm
• Experimental Evaluation
• Conclusion
• Future Works


Motivation

We first define the top-k query: given a dataset P and a preference function f, a top-k query retrieves the k tuples in P with the highest scores according to f.

One real-life application: find the top 5 hotels under the preference function f(hotel) = -hotel.price + hotel.quality
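
The definition above can be sketched in a few lines of Python. This is only an illustration: the hotel names, prices and quality values are made up, and `top_k` is a hypothetical helper name, not something from the paper.

```python
import heapq

def top_k(tuples, f, k):
    """Return the k tuples with the highest score f(t), best first."""
    return heapq.nlargest(k, tuples, key=f)

# Hypothetical hotels as (name, price, quality); the values are invented.
hotels = [("A", 120, 140), ("B", 80, 90), ("C", 200, 260),
          ("D", 60, 50), ("E", 150, 155)]

def f(hotel):
    # Preference function from the slide: -price + quality.
    _, price, quality = hotel
    return -price + quality

best_two = top_k(hotels, f, 2)
print([name for name, _, _ in best_two])
```

This is a one-shot computation over a static dataset; the rest of the talk is about keeping such a result up to date while tuples arrive and expire.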


Motivation

Existing methods are not applicable to streaming environments

Internet traffic flow monitoring is one real-life application of the streaming case.
• Data on the internet arrive at a very high rate
• Each tuple may include the source IP address, destination IP address, start time, end time, MTU, TTL, etc.


Motivation

The availability of such records is useful for
• traffic estimation
• network security
• troubleshooting

For instance, a top-k query helps the system prevent a DDoS (Distributed Denial of Service) attack if it monitors, in real time, the top-k flows with the largest individual throughput


Motivation

The server 155.223.2.4 is more likely to be under a DDoS attack than 155.223.2.3 on this network.

NoPackets   destination ip
11          155.223.2.4
22          155.11.5.6
2           155.223.2.1

NoPackets   destination ip
32          155.213.2.4
2           155.11.5.6

NoPackets   destination ip
12          155.213.2.3
2           155.11.5.2
50          155.223.2.4

Figure: network diagram with the servers 155.223.2.4 and 155.223.2.3.


Problem Setting

A function f is increasingly monotone on dimension x_i if, for any pair of tuples (points) p1, p2 with

p1.x_i ≥ p2.x_i and p1.x_j = p2.x_j for all j ≠ i

we have

score(p1) ≥ score(p2),

where score(p) = f(p.x_1, …, p.x_n)

Decreasing monotonicity is defined in the same way with the inequality on the scores reversed (≤).


Problem Setting

Notice that a function may be increasingly monotone on some dimensions and decreasingly monotone on the remaining ones.

For instance,

f(p) = p.x1 − p.x2,

f is increasingly monotone on x1 and decreasingly monotone on x2

Figure: the x1-x2 plane with the line defined by f = x1 − x2 and two points a and b; f has a higher value on one side of the line and a lower value on the other.
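
The monotonicity property of f(p) = p.x1 − p.x2 can be checked on a handful of sample points. The sketch below is a sampled sanity check, not a proof; `is_monotone_on` and the sample grid are names and values assumed for illustration.

```python
import itertools

def is_monotone_on(f, dim, grid, increasing=True):
    """Sampled check: raise only coordinate `dim` and watch the score."""
    for p in itertools.product(grid, repeat=2):
        for v in grid:
            if v < p[dim]:
                continue
            q = list(p)
            q[dim] = v          # q >= p on `dim`, equal on the other dimension
            q = tuple(q)
            if increasing and f(q) < f(p):
                return False
            if not increasing and f(q) > f(p):
                return False
    return True

def f(p):
    return p[0] - p[1]          # f = x1 - x2 from the slide

grid = [0, 1, 2, 3]             # hypothetical sample coordinates
inc_x1 = is_monotone_on(f, 0, grid, increasing=True)
dec_x2 = is_monotone_on(f, 1, grid, increasing=False)
inc_x2 = is_monotone_on(f, 1, grid, increasing=True)
print(inc_x1, dec_x2, inc_x2)
```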


Problem Setting

Problem definition:

Given a set of queries Q and a set of points P, the top-k result R_q of a query q ∈ Q is

{R_q | |R_q| = k, f(r_i) > f(r_j)},

where r_i ∈ R_q and r_j ∉ R_q

At each timestamp
• insert the newly arrived objects P_ins
• remove the expired objects P_del
• output the top-k result of each query q ∈ Q over the remaining P
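
As a baseline for the problem definition, the per-timestamp cycle can be sketched as a brute-force loop that re-evaluates every query over the whole window. This is deliberately naive and is not one of the algorithms proposed in the talk; the function name `monitor`, the query names and the sample points are assumptions.

```python
import heapq

def monitor(stream, queries, k):
    """stream yields (P_ins, P_del) per timestamp; queries maps a name to f."""
    P = set()
    for p_ins, p_del in stream:
        P |= set(p_ins)         # insert the newly arrived objects
        P -= set(p_del)         # remove the expired objects
        # Re-evaluate every query from scratch over the remaining points.
        yield {name: heapq.nlargest(k, P, key=f) for name, f in queries.items()}

queries = {
    "q1": lambda p: p[0] + 2 * p[1],
    "q2": lambda p: p[0] - p[1],
}
stream = [
    ([(1, 4), (3, 1), (2, 2)], []),   # timestamp 1
    ([(5, 0)], [(1, 4)]),             # timestamp 2
]
results = list(monitor(stream, queries, k=1))
print(results)
```

Every timestamp costs a full scan per query here, which is what the monitoring algorithms later in the talk try to avoid.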


Related Works – Top-k query computation

Several existing methods solve the top-k computation in various scenarios.

They focus on computing the top-k results from multiple data repositories.

Fagin et al. introduce two efficient methods for processing ranked queries:
• Threshold Algorithm (TA)
• No Random Access algorithm (NRA)


TA and NRA

Both methods perform sorted access in parallel to each of the m sorted lists S_i, where m is the number of inputs (attributes) and the data of attribute i are stored in S_i

The data points of every S_i are scanned in descending order


TA and NRA

When an object o is seen in input S_i

• TA performs random accesses to the other lists to find the grade x_j of object o in every list S_j, then computes the value of function f.

• NRA does not access the other lists. Instead of computing the value of function f, it only updates two bounds (a lower and an upper bound) on the score of o.

Both algorithms stop when the score of the k-th result is no smaller than the threshold T


Example of TA and NRA

Assume that we have 3 ranked inputs and 5 records (a to e) in our database. Answer a top-1 query with the preference function f = SUM using TA and NRA.


Example of TA and NRA

TA

First loop

Get object c from S1, compute f(c) = 0.9 + 0.2 + 0.9 = 2.0
• Update result R = {(c, 2.0)}
• Threshold value T = 0.9 + ∞ + ∞ = ∞ > Rk.value, continue

Get object a from S2, compute f(a) = 0.1 + 0.9 + 0.8 = 1.8
• Do not update the result since Rk.value > 1.8
• Threshold value T = 0.9 + 0.9 + ∞ = ∞ > Rk.value, continue

Get object c from S3, do not compute f again
• Threshold value T = 0.9 + 0.9 + 0.9 = 2.7 > Rk.value, continue

Second loop, … until T ≤ Rk.value

S1        S2        S3
c 0.9     a 0.9     c 0.9
d 0.8     b 0.8     a 0.8
b 0.6     e 0.6     b 0.6
e 0.3     d 0.4     d 0.6
a 0.1     c 0.2     e 0.5
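
The TA walk-through can be replayed with a short Python sketch. Two simplifications are assumed here: grades are stored as integer tenths (9 stands for 0.9) to avoid floating-point noise, and the threshold is checked once per full round of sorted accesses rather than after every single access as on the slide. The function name `threshold_algorithm` is hypothetical.

```python
import heapq

# The three ranked inputs from the example, grades in tenths.
S1 = [("c", 9), ("d", 8), ("b", 6), ("e", 3), ("a", 1)]
S2 = [("a", 9), ("b", 8), ("e", 6), ("d", 4), ("c", 2)]
S3 = [("c", 9), ("a", 8), ("b", 6), ("d", 6), ("e", 5)]

def threshold_algorithm(lists, k):
    """TA with f = SUM; returns (top-k list, number of rounds read)."""
    lookup = [dict(s) for s in lists]       # random-access index per list
    scores = {}                             # exact score of every seen object
    for depth in range(len(lists[0])):
        for s in lists:
            obj = s[depth][0]
            if obj not in scores:           # random access to all lists
                scores[obj] = sum(index[obj] for index in lookup)
        # Threshold: f applied to the grades last seen under sorted access.
        threshold = sum(s[depth][1] for s in lists)
        top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth + 1
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1]), len(lists[0])

top, rounds = threshold_algorithm([S1, S2, S3], k=1)
print(top, rounds)
```

On this data the thresholds after the three rounds are 27, 24 and 18 tenths, so the scan stops in the third round with c (score 2.0) as the top-1 answer; b has the same score and the tie is broken by the order in which objects were first seen.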


Example of TA and NRA

NRA maintains, for every object, an upper bound r^ub and a lower bound r^lb on its aggregate score

Initial setting, if the range of values is [0,1]: r^lb = {0,0,0,0,0}, r^ub = {∞,∞,∞,∞,∞}


Example of TA and NRA

NRA

Get object c (0.9), a (0.9), and c (0.9) from S1, S2, and S3

• r^lb = {0.9, 0, 1.8, 0, 0} (objects in the order a, b, c, d, e)
  – Update newly accessed objects
  – e.g. update r_a^lb = 0.9 + r_a^lb = 0.9

• r^ub = {2.7, 0, 2.7, 0, 0}
  – Update objects which have been seen so far
  – e.g. update r_a^ub = 0.9 + 0.9 + 0.9 = 2.7

• R = {(c, 1.8)}
• t = min{r_x^lb : x ∈ R} = 1.8
• u = max{r_x^ub : x ∉ R} = 2.7
• If t < u then repeat, otherwise stop

Get object d (0.8), b (0.8), and a (0.8) from S1, S2, and S3

• …

S1        S2        S3
c 0.9     a 0.9     c 0.9
d 0.8     b 0.8     a 0.8
b 0.6     e 0.6     b 0.6
e 0.3     d 0.4     d 0.6
a 0.1     c 0.2     e 0.5
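
The bound bookkeeping of NRA can be sketched for the top-1 case as follows. Assumptions in this sketch: grades are integer tenths, bounds are kept only for objects seen so far, unseen objects are covered by the sum of the grades last seen, and the helper name `nra_top1` is invented. It records the pair (t, u) after every round so the first round can be compared with the slide.

```python
S1 = [("c", 9), ("d", 8), ("b", 6), ("e", 3), ("a", 1)]
S2 = [("a", 9), ("b", 8), ("e", 6), ("d", 4), ("c", 2)]
S3 = [("c", 9), ("a", 8), ("b", 6), ("d", 6), ("e", 5)]

def nra_top1(lists):
    """NRA for k = 1 and f = SUM, using sorted access only."""
    m = len(lists)
    seen = {}                   # object -> {list index: grade seen there}
    history = []                # (t, u) after each round
    for depth in range(len(lists[0])):
        frontier = [s[depth][1] for s in lists]   # grades last seen
        for i, s in enumerate(lists):
            obj, grade = s[depth]
            seen.setdefault(obj, {})[i] = grade
        # Lower bound: known grades only. Upper bound: unknown grades are
        # replaced by the last grade seen in the corresponding list.
        lb = {o: sum(g.values()) for o, g in seen.items()}
        ub = {o: sum(g.get(i, frontier[i]) for i in range(m))
              for o, g in seen.items()}
        best = max(lb, key=lb.get)
        t = lb[best]
        # Best upper bound among all other objects, seen or unseen.
        u = max([ub[o] for o in ub if o != best] + [sum(frontier)])
        history.append((t, u))
        if t >= u:
            break
    return best, t, history

best, score, history = nra_top1([S1, S2, S3])
print(best, score, history)
```

The first recorded pair is (18, 27), matching t = 1.8 and u = 2.7 on the slide. On this small input NRA ends up reading all five rows before the bounds meet.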


LARA

Mamoulis proposed the LARA (Lattice-based Rank Aggregation) algorithm, which is an optimized NRA method

LARA separates the algorithm into two phases

Growing phase
• While t = min{r_x^lb : x ∈ R} < T, it is impossible to attempt any pruning.
• T is the sum of the values last seen in all inputs. In the above example, T = 2.7 after the first loop.

Shrinking phase
• If an object o is not seen in the growing phase, then o is not a result of the query
• The r^ub values are stored only in the lattice nodes instead of in the objects themselves
• This avoids many updates to the objects which have been seen so far

Figure: lattice over the inputs, with the node S1S2S3 on one level, the nodes S1S2, S1S3, S2S3 on the next, and the nodes S3, S2, S1 on the last.


Conclusion of Top-k query computation

NRA should perform better than TA in a conventional database, since it avoids many random accesses.

According to their experiments, LARA performs much better than NRA.


Related Works – Skyband

• The skyline consists of the points which are not dominated by any other point
• A record pi is said to dominate another record pj if and only if pi is preferable to pj on every attribute
• The skyline of a dataset contains all tuples that belong to the result of any top-1 query with a monotone function.
• The k-skyband contains the tuples that are dominated by at most k-1 other points

Figure: points p1 to p7, with the skyline and the 2-skyband marked.
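
A brute-force sketch of the skyline and k-skyband definitions follows. It assumes larger values are preferable on every attribute and uses the common dominance test (at least as good everywhere, strictly better somewhere), which is slightly more permissive than the wording on the slide. The point coordinates are invented and do not come from the figure.

```python
def dominates(p, q):
    """p dominates q: no worse on every attribute, better on at least one."""
    return (all(a >= b for a, b in zip(p, q))
            and any(a > b for a, b in zip(p, q)))

def k_skyband(points, k):
    """Names of the points dominated by at most k-1 other points."""
    return {name for name, p in points.items()
            if sum(dominates(q, p) for q in points.values()) <= k - 1}

# Hypothetical 2-d points, larger is better on both attributes.
points = {"p1": (1, 9), "p2": (3, 7), "p3": (2, 6), "p4": (6, 4),
          "p5": (5, 3), "p6": (8, 1), "p7": (1, 1)}

skyline = k_skyband(points, 1)      # the skyline is the 1-skyband
skyband2 = k_skyband(points, 2)
print(sorted(skyline), sorted(skyband2))
```

The quadratic scan is only meant to make the definition concrete; it is not how a skyband would be maintained over a stream.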


Related Works – Skyband

The skyband is used to monitor the top-k results in score-time space.

Assume that we want to monitor the top-2 results in the following example:

Figure: two plots of the points p1 to p5 in score versus expiration-time space, annotated with the result sets {p1,p2}, {p1,p4}, {-}, {p1,p3} and {p4}.


Top-k computation

• A grid-based indexing method is used
• For each cell c in grid G, maxscore(c) is the maximum possible score of a point in cell c
• For each query q
  – Start: the algorithm starts from the cell c which has the highest maxscore(c)
  – Termination: the search terminates when the cell c under consideration has maxscore(c) ≤ Rk.value


Top-k computation

An example is given to explain how the top-k computation works.

Assume that we have two inputs (x1 and x2) and a function f = x1 + 2x2

The cell with the highest maxscore(c) is c4,4, with maxscore(c4,4) = f(P)
• Scan c4,4

The next cell to scan is c3,4
• f(P') > f(P'')

… until maxscore(c) ≤ Rk.value

Figure: grid with the labelled cells c1,1, c3,4 and c4,4, the cell corner points P, P', P'', P''', P'''' and the data points p1, p2, p3.
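
The cell-by-cell search can be sketched with a max-heap over cells. Everything concrete here is assumed for illustration: a 4x4 grid with cell side 10, zero-based cell indices (so the slide's c4,4 is cell (3, 3)), integer coordinates, and made-up data points. The sketch relies on f = x1 + 2x2 being increasingly monotone on both dimensions, so a cell's maxscore is f at its upper-right corner and only the left and lower neighbours need to be enqueued.

```python
import heapq
from collections import defaultdict

DELTA, N = 10, 4                # hypothetical cell side and grid size

def f(p):
    return p[0] + 2 * p[1]

def maxscore(cell):
    # Upper bound for any point in the cell: f at the upper-right corner.
    return f(((cell[0] + 1) * DELTA, (cell[1] + 1) * DELTA))

def top_k(points, k):
    grid = defaultdict(list)
    for p in points:
        grid[(p[0] // DELTA, p[1] // DELTA)].append(p)
    start = (N - 1, N - 1)
    heap = [(-maxscore(start), start)]      # max-heap via negated keys
    queued = {start}
    result = []                             # min-heap of (score, point)
    visited = []
    while heap:
        neg, cell = heapq.heappop(heap)
        if len(result) == k and -neg <= result[0][0]:
            break                           # maxscore(c) <= Rk.value
        visited.append(cell)
        for p in grid.get(cell, ()):
            if len(result) < k:
                heapq.heappush(result, (f(p), p))
            else:
                heapq.heappushpop(result, (f(p), p))
        for nb in ((cell[0] - 1, cell[1]), (cell[0], cell[1] - 1)):
            if nb[0] >= 0 and nb[1] >= 0 and nb not in queued:
                queued.add(nb)
                heapq.heappush(heap, (-maxscore(nb), nb))
    return sorted(result, reverse=True), visited

points = [(35, 12), (22, 33), (8, 25), (14, 5)]
result, visited = top_k(points, k=1)
print(result, visited)
```

With these points the search scans six of the sixteen cells and never touches the cells holding the three lower-scoring points.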


The maintenance module

Given two datasets: P_ins and P_del

For all p ∈ P_ins
• Insert p into the corresponding cell c
• For all q that visited c
  – Insert p into q.R if f(p) ≥ q.Rk.value

For all p ∈ P_del
• Delete p from the corresponding cell c
• For all q that visited c
  – If p ∈ q.R, mark q as affected


The maintenance module

For each affected query q
• Invoke Top-k Computation(q)
• For all cells c which are not scanned by Top-k Computation(q)
  – Delete q from c.visitedquery
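
The maintenance logic for a single query can be sketched as below. This is a simplification under stated assumptions: the grid and the per-cell visited-query lists are left out, so the from-scratch step is a plain scan of the window; the class name `TopKMonitor` and the point coordinates are invented.

```python
import heapq

class TopKMonitor:
    """Keeps the top-k result of one query under insertions and deletions."""

    def __init__(self, f, k):
        self.f, self.k = f, k
        self.points = set()         # all valid points in the window
        self.result = []            # current top-k, best first
        self.recomputations = 0

    def update(self, p_ins, p_del):
        affected = False
        for p in p_ins:
            self.points.add(p)
            # An arrival only needs a comparison with the k-th score.
            if len(self.result) < self.k or self.f(p) >= self.f(self.result[-1]):
                self.result.append(p)
                self.result.sort(key=self.f, reverse=True)
                del self.result[self.k:]
        for p in p_del:
            self.points.discard(p)
            if p in self.result:
                affected = True     # a result point expired
        if affected:
            # Stand-in for "Invoke Top-k Computation(q)".
            self.result = heapq.nlargest(self.k, self.points, key=self.f)
            self.recomputations += 1
        return self.result

p1, p2, p3, p4, p5 = (3, 3), (1, 2), (2, 1), (4, 1), (1, 1)
monitor = TopKMonitor(lambda p: p[0] + 2 * p[1], k=1)
r0 = list(monitor.update([p1, p2], []))
r1 = list(monitor.update([p3, p4], [p1, p2]))   # p1 was the result
r2 = list(monitor.update([p5], [p3]))           # p3 was not in the result
print(r0, r1, r2, monitor.recomputations)
```

Only the second update triggers a recomputation, because it is the only one that deletes a point of the current result.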


Example of maintenance module

q: f = x1 + 2x2, find the top-1 result

Timestamp 1
• P_ins = {p3, p4}, P_del = {p1, p2}

Timestamp 2
• P_ins = {p5}, P_del = {p3}

Figure: the points p1 to p5 on the grid.


Summary of the maintenance module

• Insertion does not invoke any top-k re-computation
• Deletion has a higher cost than insertion
• An affected query needs to
  – run the top-k computation
  – update the cells which are not scanned by the top-k computation; in the worst case this touches all |cell| cells


Skyband Monitoring Algorithm

I demonstrated in a previous slide how to use the k-skyband to monitor the results in score-time space

The dominance counter (DC) can be used to get the k-skyband
• p.DC is the number of records with a higher score that expire after p

Figure: monitoring a top-2 query. The points p1 to p6 in score versus expiration-time space, labelled with their dominance counters.
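
The dominance counter definition can be made concrete with a direct quadratic count. The scores and expiration times below are invented and do not reproduce the figure; `dominance_counters` is a hypothetical helper name.

```python
def dominance_counters(points):
    """points maps a name to (score, expiration time)."""
    return {
        name: sum(1 for s, e in points.values() if s > score and e > exp)
        for name, (score, exp) in points.items()
    }

# Hypothetical (score, expiration time) pairs.
points = {"p1": (9, 3), "p2": (8, 1), "p3": (7, 5),
          "p4": (5, 4), "p5": (4, 6), "p6": (2, 2)}

dc = dominance_counters(points)
k = 2
skyband = {name for name, count in dc.items() if count < k}
print(dc, sorted(skyband))
```

A point with DC ≥ k can never enter the top-k result, since at least k better points outlive it; here that is only p6.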


Skyband Monitoring Algorithm

The dominance counters can be computed with a balanced tree (BT)
• The expiration time of every processed element of q.skyband is stored in a balanced tree BT sorted in descending order
• The elements are inserted in descending score order
• p.DC is simply the number of tuples that precede p in BT

Figure: the points p1 to p5 in score versus expiration-time space, next to the balanced tree built from them.
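
The same idea can be sketched with a sorted Python list and `bisect` standing in for the balanced tree. This is a substitution, not the structure from the talk: `bisect.insort` costs linear time per insertion, whereas a balanced tree with subtree counts would give logarithmic time. The sketch keeps expiration times in ascending order and counts the entries to the right of p, which is equivalent to counting the entries that precede p in descending order. Scores are assumed to be distinct, and the data is invented.

```python
import bisect

def dominance_counters(points):
    """points maps a name to (score, expiration time), scores distinct."""
    expirations = []            # expiration times of processed points, ascending
    dc = {}
    # Process points in descending score order, as on the slide.
    for name, (score, exp) in sorted(points.items(),
                                     key=lambda kv: kv[1][0], reverse=True):
        # Every processed point has a higher score, so p.DC is the number
        # of stored expiration times greater than p's expiration time.
        dc[name] = len(expirations) - bisect.bisect_right(expirations, exp)
        bisect.insort(expirations, exp)
    return dc

points = {"p1": (9, 3), "p2": (8, 1), "p3": (7, 5),
          "p4": (5, 4), "p5": (4, 6), "p6": (2, 2)}
dc = dominance_counters(points)
print(dc)
```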


Skyband Monitoring Algorithm

Given two datasets: P_ins and P_del

For all p ∈ P_ins
• Insert p into the corresponding cell c
• For all q that visited c
  – If f(p) ≥ q.Rk.value
    » Insert p into q.skyband and set p.DC = 0
    » For each p' in q.skyband with f(p') < f(p): update p'.DC = p'.DC + 1, and if p'.DC = k evict p' from q.skyband


Skyband Monitoring Algorithm

For all p ∈ P_del
• Delete p from the corresponding cell c
• For all q that visited c
  – If p ∈ q.R, delete p from q.skyband

For all q whose skyband has changed
• If q.skyband has at least k points
  – q.R = top-k(q.skyband)
• Else
  – Invoke Top-k Computation(q)
  – Compute dominance counters
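
The insertion and deletion steps can be sketched for a single query as follows. Several things are assumed: the grid is omitted, so the from-scratch step scans the whole window; points expire in arrival order, so a later arrival always expires later; the query is registered with one explicit from-scratch computation; and the class name `SkybandMonitor` and the data are invented.

```python
import heapq

class SkybandMonitor:
    def __init__(self, k):
        self.k = k
        self.window = {}        # name -> (score, arrival sequence number)
        self.skyband = {}       # name -> [score, dominance counter]
        self.kth_score = float("-inf")
        self.seq = 0
        self.recomputations = 0

    def insert(self, name, score):
        self.seq += 1
        self.window[name] = (score, self.seq)
        if score < self.kth_score:
            return              # cannot affect the current result
        for other in list(self.skyband):
            entry = self.skyband[other]
            if entry[0] < score:        # the newcomer dominates `other`
                entry[1] += 1
                if entry[1] == self.k:
                    del self.skyband[other]
        self.skyband[name] = [score, 0]

    def expire(self, name):
        del self.window[name]
        self.skyband.pop(name, None)

    def recompute(self):
        """Stand-in for Top-k Computation(q) plus the dominance counters."""
        self.recomputations += 1
        top = heapq.nlargest(self.k, self.window.items(),
                             key=lambda kv: kv[1][0])
        self.kth_score = top[-1][1][0] if len(top) == self.k else float("-inf")
        self.skyband = {}
        for name, (score, seq) in top:
            dc = sum(1 for _, (s, q) in top if s > score and q > seq)
            self.skyband[name] = [score, dc]

    def result(self):
        if len(self.skyband) < self.k and len(self.skyband) < len(self.window):
            self.recompute()
        ranked = sorted(self.skyband.items(),
                        key=lambda kv: kv[1][0], reverse=True)
        return [name for name, _ in ranked[:self.k]]

m = SkybandMonitor(k=2)
for name, score in [("a", 5), ("b", 7), ("c", 6)]:
    m.insert(name, score)
m.recompute()                   # query registration, computed from scratch
m.insert("d", 3)                # below the k-th score, kept in the window only
m.expire("a")
first = m.result()
m.expire("b")                   # skyband drops below k points
second = m.result()
print(first, second, m.recomputations)
```

The expiry of a changes nothing, while the expiry of b leaves fewer than k points in the skyband and forces the second from-scratch computation.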


Experimental Evaluation

The authors evaluate the proposed methods using streams of both independent (IND) and anti-correlated (ANT) data

Figure: sample IND (d=2) and ANT (d=2) datasets.


Experimental Evaluation

Default experimental setting
• Data dimensionality (d): 4
• Data cardinality (N): 1M
• Arrival rate (r): 10K
• Query cardinality (Q): 1K
• Result cardinality (k): 20



Conclusions

• The top-k computation module processes the minimum number of cells
• Two monitoring algorithms are proposed: TMA and SMA
  – TMA re-computes the result from scratch
  – SMA maintains a superset of the current answer in the form of a k-skyband
• In the experimental evaluation, SMA outperforms the other proposed solutions


Future works

• Non-monotone preference functions
• Support for queries of varying dimensionality
• Cluster the queries into a super query SQ, and monitor the results for this superset of queries


Thank you for your attention!

PS. I hope I can reach this page in time!
