Processing Transitive Nearest-Neighbor Queries in Multi-Channel Access Environments Xiao Zhang 1 , Wang-Chien Lee 1 , Prasenjit Mitra 1, 2 , Baihua Zheng 3 1 Department of Computer Science and Engineering 2 College of Information Science and Technology The Pennsylvania State University 3 School of Information Systems, Singapore Management University EDBT, Nantes, France, 03/28/2008

Dec 17, 2015

Page 1:

Processing Transitive Nearest-Neighbor Queries in Multi-Channel Access Environments

Xiao Zhang 1, Wang-Chien Lee 1, Prasenjit Mitra 1, 2, Baihua Zheng 3

1 Department of Computer Science and Engineering
2 College of Information Science and Technology
The Pennsylvania State University
3 School of Information Systems, Singapore Management University

EDBT, Nantes, France, 03/28/2008

Page 2: Outline

Background
Problem Analysis
New TNN Algorithms
Optimization
Experiments
Conclusions & Future Work

Page 3: Background – TNN

What is TNN?
◦ S is a set of banks
◦ R is a set of restaurants
◦ TNN distance = 5+1 = 6

Page 4: Background – TNN

What is TNN?

Given a query point p and two datasets S and R, TNN returns a pair of objects (s, r) such that ∀(s’, r’) ∈ S×R, dis(p, s) + dis(s, r) ≤ dis(p, s’) + dis(s’, r’), where dis(p, s) is the Euclidean distance between p and s.

First proposed by Zheng, Lee and Lee [1].

[1] B. Zheng, K. C. Lee and W.-C. Lee. Transitive nearest neighbor search in mobile environments. SUTC 2006.
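The definition above can be checked directly with a brute-force scan over S×R. The following is a minimal sketch, not the broadcast-based algorithms of the talk; the function names and the sample coordinates are hypothetical and chosen so that the answer reproduces the 5+1 = 6 example.

```python
import math
from itertools import product


def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def tnn_brute_force(p, S, R):
    """Return (s, r, distance) minimizing dis(p, s) + dis(s, r) over S x R."""
    s, r = min(product(S, R), key=lambda sr: dis(p, sr[0]) + dis(sr[0], sr[1]))
    return s, r, dis(p, s) + dis(s, r)


# Hypothetical data: S holds banks, R holds restaurants.
p = (0.0, 0.0)
S = [(3.0, 4.0), (1.0, 0.0)]
R = [(3.0, 5.0), (10.0, 0.0)]
s, r, d = tnn_brute_force(p, S, R)
print(s, r, d)
```

Note that the bank nearest to p, (1, 0), is not part of the answer pair here: TNN minimizes the whole path length, not each leg separately.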

Page 5: Background – broadcast

The server has all the data and broadcasts it as radio signals over channels.

Mobile clients (cell phones and PDAs) tune in to broadcast channels, download necessary data and process queries.

Broadcast vs. on-demand
◦ Supports simultaneous access by an arbitrary number of mobile devices
◦ Efficient use of limited bandwidth
◦ Light workload on the server side

Page 6: Background – motivation

Assumption:
◦ Zheng, Lee and Lee assumed a single broadcast channel.
◦ Based on existing technology (dual-mode, dual-standby cell phones), we assume multiple channels.
◦ A mobile client can access information in multiple channels simultaneously.

Challenges:
◦ How to utilize the parallel processing ability of mobile clients to facilitate query processing?
◦ How to reduce access time?
◦ How to reduce energy consumption?

Page 7: Our contributions

1. We developed two new algorithms for TNN queries in multi-channel access environments.

2. We proposed two new distance metrics (MinTransDist and MinMaxTransDist) that let our new algorithms reduce search cost efficiently.

3. We proposed an optimization technique to reduce energy consumption.

Page 8: Background – settings

1. Two broadcast channels, for S and R
2. 2-dim points
3. Air-indexing: R-tree [2]
4. Broadcast in depth-first order, in order to avoid back-tracking
5. (1, m) interleaving [3]
6. Performance metrics (in # of pages):
◦ Access time
◦ Tune-in time

[2] A. Guttman. R-trees: a dynamic index structure for spatial searching. In SIGMOD ’84.
[3] T. Imielinski, S. Viswanathan, and B. Badrinath. Data on air: organization and access. TKDE 1997.
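As an illustration of item 5, the sketch below lays out one broadcast cycle under (1, m) interleaving as described in [3]: the complete index is repeated in front of each of m data segments. The page labels and the function name are hypothetical, and the sketch ignores how the R-tree pages themselves are ordered (depth-first in the talk).

```python
def one_m_interleave(index_pages, data_pages, m):
    """Build one broadcast cycle: the full index precedes each of m data segments."""
    if m < 1:
        raise ValueError("m must be at least 1")
    seg = -(-len(data_pages) // m)  # ceiling division: data pages per segment
    cycle = []
    for k in range(m):
        segment = data_pages[k * seg:(k + 1) * seg]
        if segment:  # skip empty trailing segments when m exceeds the data size
            cycle.extend(index_pages)
            cycle.extend(segment)
    return cycle


index_pages = ["I0", "I1"]
data_pages = ["D0", "D1", "D2", "D3", "D4", "D5"]
cycle = one_m_interleave(index_pages, data_pages, m=3)
print(cycle)
```

A larger m shortens the wait for the next index copy but lengthens the cycle, which is the trade-off behind the access-time metric; tune-in time counts only the pages the client actually listens to.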

Page 9: Problem Analysis

Randomly choose ANY pair of objects (s’, r’) and use its transitive distance as the search range.

The search range is guaranteed to enclose the answer pair (s, r).

Page 10: Problem Analysis

Theorem [1]:
◦ The transitive distance determined by any pair of objects (s, r) is an upper bound on the TNN distance.

General ideas of answering TNN queries:
◦ Estimate: find a search range from the query point p by searching the index
◦ Filter: filter out unqualified data objects in the search range determined earlier to find the pair of objects with minimum transitive distance.
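The estimate/filter idea can be sketched with plain lists standing in for the broadcast R-trees. This is an illustrative assumption, not the talk's implementation: the seed pair, the function name and the small tolerance EPS are hypothetical. The circle around p with radius equal to the seed pair's transitive distance must contain the answer pair, because dis(p, r) ≤ dis(p, s) + dis(s, r) by the triangle inequality.

```python
import math

EPS = 1e-9  # tolerance for floating-point ties on the circle boundary (assumption)


def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def estimate_and_filter(p, S, R, seed_pair):
    """Estimate a search radius from any pair, then filter inside that radius."""
    s0, r0 = seed_pair
    bound = dis(p, s0) + dis(s0, r0)  # Estimate: upper bound on the TNN distance
    cand_S = [s for s in S if dis(p, s) <= bound + EPS]
    cand_R = [r for r in R if dis(p, r) <= bound + EPS]
    # Filter: exhaustive check over the surviving candidates only
    best = min(((s, r) for s in cand_S for r in cand_R),
               key=lambda sr: dis(p, sr[0]) + dis(sr[0], sr[1]))
    return best, len(cand_S), len(cand_R)


p = (0.0, 0.0)
S = [(3.0, 4.0), (1.0, 0.0), (50.0, 50.0)]
R = [(3.0, 5.0), (10.0, 0.0), (-60.0, 0.0)]
best, n_s, n_r = estimate_and_filter(p, S, R, seed_pair=(S[1], R[1]))
print(best, n_s, n_r)
```

A tighter seed pair gives a smaller circle and fewer candidates, which is why the algorithms that follow invest in a good estimate.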

Page 11: Problem Analysis

Deficiencies of existing algorithms:
◦ Approximate-TNN-Search:
  Uses an equation to estimate the search range in the first step
  Search range may be too large or too small
◦ Window-Based-TNN-Search:
  Two sequential NN searches in estimation step
  Search range estimation is done in sequential order
  Large access time

Page 12: New TNN Algorithms – algo 1

Algo 1: Double-NN-Search
◦ Issue two NN queries in estimation step
◦ p’s NN in S, and p’s NN in R
◦ (s1, r2)
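A minimal sketch of the Double-NN estimation step follows, assuming linear scans in place of the two R-tree NN searches that would run in parallel on the two channels. The helper names and sample points are hypothetical.

```python
import math


def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def nn(q, points):
    """Nearest neighbor of q by linear scan (stand-in for an R-tree NN search)."""
    return min(points, key=lambda x: dis(q, x))


def double_nn_estimate(p, S, R):
    """Search range from two independent NN queries issued from p."""
    s = nn(p, S)  # would run on the channel carrying S
    r = nn(p, R)  # would run on the channel carrying R, at the same time
    return s, r, dis(p, s) + dis(s, r)


p = (0.0, 0.0)
S = [(3.0, 4.0), (1.0, 0.0)]
R = [(3.0, 5.0), (10.0, 0.0)]
s, r, bound = double_nn_estimate(p, S, R)
print(s, r, bound)
```

Because the two searches do not depend on each other, neither has to wait for the other to finish, which is where the access-time saving over a sequential estimate comes from.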

Page 13: New TNN Algorithms – algo 2

Hybrid-NN-Search
◦ Increases interaction between two channels
◦ Uses result of the finished NN to guide the unfinished NN in order to reduce search range
◦ Uses new distance metrics to perform branch-and-bound
◦ Treats TNN distance as a whole

Page 14: New TNN Algorithms – algo 2

NN in Channel 1 finishes first
Already found s=p.NN(S)
Looking for r2, instead of r1

Page 15: New TNN Algorithms – algo 2

NN in Channel 2 finishes first
Already found r=p.NN(R)
Looking for s2 instead of s1

Use new criteria when searching the index
Need new distance metrics for branch-and-bound

Page 16: New TNN Algorithms – algo 2

MinTransDist:
◦ Lower bound on the transitive distance from p, through an MBR, to r.

MinMaxTransDist:
◦ Upper bound on the transitive distance from p, through an MBR, to r.

Details are given in the paper.

Page 17: New TNN Algorithms – Hybrid

Algorithm description:
◦ Case 1: while neither of the two NN searches has finished, follow the Double-NN algorithm.
◦ Case 2: if the NN search in Channel 1 (dataset S) finishes first, let s=p.NN(S), use s as the new query point and perform NN on the remaining portion of the R-tree for dataset R.
◦ Case 3: if the NN search in Channel 2 (dataset R) finishes first, change distance metrics and use MinTransDist and MinMaxTransDist to perform branch-and-bound. Find an s that minimizes the transitive distance.
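The effect of the two switching cases on the estimate can be sketched with linear scans instead of R-tree traversals (an assumption for brevity; the function names and data are hypothetical). In both cases the resulting bound can never exceed the Double-NN bound dis(p, p.NN(S)) + dis(p.NN(S), p.NN(R)), since the second search optimizes the remaining leg given the first result.

```python
import math


def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def nn(q, points):
    """Nearest neighbor of q by linear scan (stand-in for an R-tree NN search)."""
    return min(points, key=lambda x: dis(q, x))


def hybrid_estimate(p, S, R, s_finishes_first):
    """Estimate step of Hybrid-NN once one of the two NN searches has finished."""
    if s_finishes_first:
        s = nn(p, S)
        r = nn(s, R)  # switch the query point from p to s
    else:
        r = nn(p, R)
        # look for the s that minimizes the whole path p -> s -> r
        s = min(S, key=lambda x: dis(p, x) + dis(x, r))
    return s, r, dis(p, s) + dis(s, r)


p = (0.0, 0.0)
S = [(3.0, 4.0), (1.0, 0.0)]
R = [(3.0, 5.0), (10.0, 0.0)]
case2 = hybrid_estimate(p, S, R, s_finishes_first=True)
case3 = hybrid_estimate(p, S, R, s_finishes_first=False)
double_nn_bound = dis(p, nn(p, S)) + dis(nn(p, S), nn(p, R))
print(case2, case3, double_nn_bound)
```

On this data the second branch already returns the exact TNN pair; in general it only yields a search range that the filter step still has to verify.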

Page 18: New TNN Algorithms – Hybrid

Updating and pruning strategy
◦ Use a queue to keep potential MBRs, sorted by their arrival time
◦ Case 2 (s=p.NN(S) finishes first):
  Switch the NN query point to s
  Initial upper bound update: if there is an intermediate result r’, update the upper bound with dis(p, s)+dis(s, r’)
  Scan the queue of MBRs and use the distance metrics of traditional NN queries.

Page 19: New TNN Algorithms – Hybrid

Updating and pruning strategy (cont.)
◦ Case 3 (r=p.NN(R) finishes first):
  If there is an intermediate result s’, use dis(p, s’)+dis(s’, r) as the new upper bound
  Then scan all the MBRs in the queue and use z = min_{M_i ∈ MBR_queue} {MinMaxTransDist(p, M_i, r)} to update the upper bound.
  In traversal, use MinMaxTransDist to update the upper bound; use MinTransDist for pruning
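The Case 3 update-then-prune step can be sketched as follows, assuming the two metric values have already been computed for every queued MBR. The tuple layout, names and numbers are hypothetical and only illustrate the bookkeeping.

```python
def update_and_prune(mbr_queue, upper_bound):
    """mbr_queue holds (name, min_trans_dist, min_max_trans_dist) per MBR.

    Tighten the upper bound with z = min of MinMaxTransDist over the queue,
    then drop every MBR whose MinTransDist exceeds the tightened bound.
    """
    z = min((hi for _, _, hi in mbr_queue), default=upper_bound)
    upper_bound = min(upper_bound, z)
    survivors = [name for name, lo, _ in mbr_queue if lo <= upper_bound]
    return upper_bound, survivors


# Hypothetical queue in arrival order; 10.0 comes from an intermediate result s'.
queue = [("M1", 7.0, 9.0), ("M2", 10.5, 12.0), ("M3", 8.0, 11.0)]
bound, kept = update_and_prune(queue, upper_bound=10.0)
print(bound, kept)
```

Here M2 is discarded because even its best possible path (10.5) is longer than the path guaranteed by M1 (9.0).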

Page 20: New TNN Algorithms – Hybrid

Example for pruning: [figure]

Page 21: Optimization

Goal: reduce energy consumption

Analysis:
◦ Previous algorithms minimize the search range in the Estimate Step by issuing “exact” search
◦ Energy consumption in Filter Step is low
◦ Energy consumption in Estimate Step is high

Approach:
◦ Use “approximate” search in Estimate Step to save energy in this step

Page 22: Optimization

Approximate Search:
◦ Relax the pruning condition
◦ Use ratio of overlapping area to estimate the probability
◦ Compare the ratio with a threshold α

Page 23: Optimization

How to determine α?

Factors:
◦ R-tree height and node depth
  Use small α on the root and large α on leaves
◦ Difference in densities of the two datasets involved
  Small α or 0 on the dataset with smaller density

[Figure: α axis from 0 to 1; α = 0 is exact search, larger α is approximate search]
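The slides do not spell out how the overlap ratio is computed, so the following is only one possible reading, with every detail an assumption: the search range is a circle of a given radius around p, its bounding square is used to approximate the overlapping area, and a node is skipped when the overlapping share of its MBR falls below α. With α = 0 the rule reduces to the exact MINDIST test.

```python
import math


def mindist(p, mbr):
    """Smallest distance from point p to the rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - p[0], 0.0, p[0] - xmax)
    dy = max(ymin - p[1], 0.0, p[1] - ymax)
    return math.hypot(dx, dy)


def overlap_ratio(p, mbr, radius):
    """Share of the MBR covered by the bounding square of the search circle."""
    xmin, ymin, xmax, ymax = mbr
    w = min(xmax, p[0] + radius) - max(xmin, p[0] - radius)
    h = min(ymax, p[1] + radius) - max(ymin, p[1] - radius)
    if w < 0 or h < 0:
        return 0.0
    area = (xmax - xmin) * (ymax - ymin)
    return 1.0 if area == 0 else (w * h) / area


def should_visit(p, mbr, radius, alpha):
    """Exact pruning first; then the relaxed, threshold-based pruning."""
    if mindist(p, mbr) > radius:
        return False  # no exact search would visit this node either
    return overlap_ratio(p, mbr, radius) >= alpha


p, radius = (0.0, 0.0), 5.0
wide = (4.0, 0.0, 14.0, 10.0)  # touches the range with about 5% of its area
print(should_visit(p, wide, radius, alpha=0.0), should_visit(p, wide, radius, alpha=0.1))
```

Skipping such a node saves the tune-in cost of its subtree at the risk of missing an object, which loosens the estimated search range but does not affect the correctness of the later filter step as long as some pair is found.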

Page 24: Performance Evaluation – settings

Dataset 1:
◦ 39,000 × 39,000 square region
◦ Densities: 10^-7.0, 10^-6.6, 10^-6.2, 10^-5.8, 10^-5.4, 10^-5.0, 10^-4.6, 10^-4.2
◦ # of points: 152, 382, 960, 2411, 6055, 15210, 38206, 95969

Dataset 2:
◦ 39,000 × 39,000 square region
◦ # of points: 2,000 to 30,000 in increments of 2,000
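The point counts of Dataset 1 follow from the densities and the region size (number of points = area × density, rounded). A short check, with variable names chosen here for illustration:

```python
side = 39_000
exponents = [-7.0, -6.6, -6.2, -5.8, -5.4, -5.0, -4.6, -4.2]

# area of the square region times density, rounded to the nearest integer
counts = [round(side * side * 10 ** e) for e in exponents]
print(counts)
```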

Page 25: Performance Evaluation – settings

R-tree as air index
Broadcast in depth-first order
STR packing algorithm [4]
(1, m) interleaving [3]
1,000 query points generated for each of the experiments

Parameter       Size
Index pointer   2 bytes
Coordinate      4 bytes
Data content    1k bytes
Page capacity   64 – 512 bytes

[3] T. Imielinski, S. Viswanathan, and B. Badrinath. Data on air: organization and access. TKDE 1997.
[4] S. Leutenegger, M. Lopez and J. Edgington. STR: a simple and efficient algorithm for R-tree packing. ICDE 1997.

Page 26: Performance Evaluation

Algorithms with exact search:
◦ Access time: Double-NN and Hybrid-NN have the same access time, which is smaller than that of Window-Based
◦ 1.8 ≥ size(S) / size(R) ≥ 1 / 40

Page 27: Performance Evaluation

Algorithms with exact search:
◦ Tune-in time: when 0.01 ≤ size(S)/size(R) ≤ 0.4, Hybrid-NN gives the best tune-in time

Page 28: Performance Evaluation

ANN (approximate NN search) vs. eNN (exact NN search)
◦ Improvement in tune-in time ranges from 11% to 20%

Page 29: Performance Evaluation

Hybrid algorithm with ANN: [figure]

Page 30: Conclusions

Double-NN and Hybrid-NN effectively reduce access time

Cases in which our algorithms reduce tune-in time are stated and discussed

Optimization technique effectively reduces tune-in time of all three algorithms

Page 31: Future Work

Generalized TNN queries in broadcast environment:
◦ More than 2 datasets are involved
◦ Visiting order not specified
◦ Complete route query

Using new distance metrics in disk-based environment

Page 32:

Any questions?

Thank you!

Page 33: New TNN Algorithms – distance metrics (backup slides)

Def 1: (MinTransDist)
◦ Given two points p and r, and an MBR M_S, MinTransDist(p, M_S, r) finds a point s on M_S such that MinTransDist(p, M_S, r) = dis(p, s)+dis(s, r) and, for any point s’ ≠ s, s’ ∈ M_S, dis(p, s’)+dis(s’, r) ≥ MinTransDist(p, M_S, r)
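One way to evaluate Def 1 is sketched below. It is not taken from the paper, whose procedure is not shown on the slide; it assumes the MBR is a filled axis-parallel rectangle (xmin, ymin, xmax, ymax) and relies on two facts: the sum dis(p, x)+dis(x, r) is convex in x, and on a line it is minimized where the segment from p to the mirror image of r crosses that line. All names are hypothetical.

```python
import math


def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def _best_on_side(p, r, fixed, lo, hi, vertical):
    """Minimum of dis(p, x) + dis(x, r) over one axis-parallel side of the MBR."""
    # pf/rf: coordinate across the side's line; pv/rv: coordinate along it
    pf, pv = (p[0], p[1]) if vertical else (p[1], p[0])
    rf, rv = (r[0], r[1]) if vertical else (r[1], r[0])
    a, b = abs(pf - fixed), abs(rf - fixed)
    # reflection argument: the minimizer on the infinite line splits pv..rv as a:b
    v = pv if a + b == 0 else (pv * b + rv * a) / (a + b)
    v = max(lo, min(v, hi))  # convexity along the side makes clamping sufficient
    x = (fixed, v) if vertical else (v, fixed)
    return dis(p, x) + dis(x, r)


def min_trans_dist(p, mbr, r):
    """Lower bound on dis(p, s) + dis(s, r) for any point s in the rectangle."""
    xmin, ymin, xmax, ymax = mbr

    def inside(q):
        return xmin <= q[0] <= xmax and ymin <= q[1] <= ymax

    if inside(p) or inside(r):
        return dis(p, r)  # choosing s = p (or s = r) adds no detour
    return min(
        _best_on_side(p, r, xmin, ymin, ymax, True),
        _best_on_side(p, r, xmax, ymin, ymax, True),
        _best_on_side(p, r, ymin, xmin, xmax, False),
        _best_on_side(p, r, ymax, xmin, xmax, False),
    )


p, r = (0.0, 0.0), (10.0, 0.0)
print(min_trans_dist(p, (4.0, -1.0, 6.0, 1.0), r))  # MBR crossed by segment pr
print(min_trans_dist(p, (4.0, 3.0, 6.0, 5.0), r))   # MBR off to one side
```

When the straight segment from p to r crosses the MBR, the bound degenerates to dis(p, r), so such a node can never be pruned by this metric alone.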

Page 34: New TNN Algorithms – distance metrics (backup slides)

Def 2: (MaxDist)
◦ Given two points p and r, and a line segment ℓ, MaxDist(p, ℓ, r) = max_{i=1,2} {dis(p, v_i)+dis(v_i, r)}, where v_i (i=1, 2) are the two end points of ℓ
◦ MaxDist(p, ℓ, r) gives a tight upper bound for all the transitive distances from p to any point on ℓ, to r.

Page 35: New TNN Algorithms – distance metrics (backup slides)

Def 3: (MinMaxTransDist)
◦ Given two points p and r, and an MBR M_S, MinMaxTransDist(p, M_S, r) = min_{1≤i≤4} {MaxDist(p, ℓ_i, r)}, where ℓ_i (1≤i≤4) are the four sides of MBR M_S

Lemma:
◦ Given a starting point p, an ending point r, and an MBR M_S enclosing a point dataset S, ∃s ∈ S such that dis(p, s)+dis(s, r) ≤ MinMaxTransDist(p, M_S, r)
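MaxDist and MinMaxTransDist translate almost literally into code. The sketch below assumes an MBR given as (xmin, ymin, xmax, ymax); the function names and the sample dataset are hypothetical. The final check illustrates the Lemma on a dataset whose MBR is tight, i.e. every side touches at least one object, which is what makes the bound hold.

```python
import math


def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def max_dist(p, segment, r):
    """MaxDist: the larger of the two transitive distances via the end points."""
    return max(dis(p, v) + dis(v, r) for v in segment)


def min_max_trans_dist(p, mbr, r):
    """MinMaxTransDist: the smallest MaxDist over the four sides of the MBR."""
    xmin, ymin, xmax, ymax = mbr
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    sides = [(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    return min(max_dist(p, side, r) for side in sides)


p, r = (0.0, 0.0), (10.0, 0.0)
S = [(4.0, 4.0), (5.0, 3.0), (6.0, 4.5), (5.0, 5.0)]  # one object on each side
mbr = (min(x for x, _ in S), min(y for _, y in S),
       max(x for x, _ in S), max(y for _, y in S))
upper = min_max_trans_dist(p, mbr, r)
best_in_S = min(dis(p, s) + dis(s, r) for s in S)
print(mbr, upper, best_in_S)
```

Taking the maximum over only the two end points is enough because the transitive distance is convex along a segment, so it cannot peak in the interior.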