Cheng, Chen, Chen, Xie
Evaluating Probability Threshold k-Nearest-Neighbor Queries over Uncertain Data
Reynold Cheng (University of Hong Kong)
Lei Chen (Hong Kong University of Science & Technology)
Jinchuan Chen (Hong Kong Polytechnic University)
Xike Xie (University of Hong Kong)
International Conference on Extending Database Technology 2009
Data uncertainty is inherent in various applications:
- Location-based services (e.g., using GPS, RFID) [TDRP98, SSDBM99]
- Natural habitat monitoring with sensor networks [VLDB04a]
- Biomedical and biometric databases [ICDE06, ICDE07]
Attribute Uncertainty Model [TDRP98, ISSD99, VLDB04b]
[Figure: a precise location (x, y) versus a cloaked location, which lies somewhere in an uncertainty region U described by a probability density function (pdf)]
We represent an uncertainty pdf as a histogram
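A minimal sketch of such a histogram, assuming (our assumption; the slide does not give the layout) a rectangular uncertainty region split into an nx-by-ny grid of equal cells, each holding a probability mass. The class `HistogramPdf` and its methods are hypothetical names. `distance_cdf` approximates the distribution of the distance between q and the object by placing each cell's mass at the cell centre, which is the kind of quantity a k-NN evaluation needs.

```python
import math
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class HistogramPdf:
    """Hypothetical histogram pdf over the rectangle [x0, x1] x [y0, y1].

    weights[i][j] is the probability mass of grid cell (i, j), where i indexes
    the x-axis and j the y-axis; all masses together sum to 1.
    """
    x0: float
    y0: float
    x1: float
    y1: float
    weights: List[List[float]]

    @classmethod
    def uniform(cls, x0: float, y0: float, x1: float, y1: float,
                nx: int, ny: int) -> "HistogramPdf":
        """Uniform pdf: every one of the nx * ny cells gets the same mass."""
        mass = 1.0 / (nx * ny)
        return cls(x0, y0, x1, y1, [[mass] * ny for _ in range(nx)])

    def cells(self) -> Iterator[Tuple[float, float, float]]:
        """Yield (centre_x, centre_y, mass) for every cell."""
        nx, ny = len(self.weights), len(self.weights[0])
        dx = (self.x1 - self.x0) / nx
        dy = (self.y1 - self.y0) / ny
        for i in range(nx):
            for j in range(ny):
                yield (self.x0 + (i + 0.5) * dx,
                       self.y0 + (j + 0.5) * dy,
                       self.weights[i][j])

    def distance_cdf(self, qx: float, qy: float, r: float) -> float:
        """Approximate P(dist(q, object) <= r), with each cell's mass at its centre."""
        return sum(m for cx, cy, m in self.cells()
                   if math.hypot(cx - qx, cy - qy) <= r)


pdf = HistogramPdf.uniform(0.0, 0.0, 2.0, 2.0, nx=2, ny=2)
print(pdf.distance_cdf(0.0, 0.0, 1.0))  # 0.25: only the cell centred at (0.5, 0.5) is within 1
```

A finer grid trades memory for a closer approximation of the underlying pdf; a uniform pdf is simply the special case where all cells carry equal mass.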
k-NN Queries
k-NN Query over Precise Data
- applications in LBS [VLDB03]
- natural habitat monitoring systems [VLDB04a]
- network traffic analysis [ICDCS07]
- pattern matching in CAM [VLDB04c]
k-NN over Uncertain Objects
- [VLDB08a] ranks objects by the probability that each one is the NN of the query point.
- [ICDE07a] uses expected distance and does not discuss probabilities.
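Why expected distance and NN probability can disagree, illustrated with a hypothetical two-object example (the objects and their discrete distance distributions are invented for illustration): object A is at distance 0 with probability 0.6 and at distance 100 with probability 0.4, so its expected distance is 40; object B is always at distance 10. Ranking by expected distance puts B first, yet A is the nearest neighbour of q with probability 0.6. The sketch computes both quantities, assuming independent objects and enumerating all possible worlds.

```python
from itertools import product

# Hypothetical discrete distance distributions to q: name -> [(distance, probability)]
dists = {
    "A": [(0.0, 0.6), (100.0, 0.4)],  # usually very close, sometimes far away
    "B": [(10.0, 1.0)],               # always at distance 10
}


def expected_distance(dist):
    """Mean distance to q under a discrete distance distribution."""
    return sum(d * p for d, p in dist)


def nn_probabilities(dists):
    """P(object is the nearest neighbour of q), by enumerating all possible worlds.

    Assumes independent objects; ties are broken by object name.
    """
    names = sorted(dists)
    probs = {n: 0.0 for n in names}
    for world in product(*(dists[n] for n in names)):
        world_prob = 1.0
        for _, p in world:
            world_prob *= p
        nearest = min(range(len(names)), key=lambda i: (world[i][0], names[i]))
        probs[names[nearest]] += world_prob
    return probs


print({n: expected_distance(d) for n, d in dists.items()})
print(nn_probabilities(dists))
```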
Probability Threshold k-Nearest-Neighbor Query (T-k-PNN)
INPUT
1. A query point q, parameter k, threshold T
2. A set of n objects with uncertainty regions and pdfs
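OUTPUT: all k-subsets S of D whose qualification probability is at least T
$\{\, S \mid S \subseteq D \wedge |S| = k \wedge p(S) \ge T \,\}$, where $D = \{O_1, O_2, \ldots, O_n\}$
p(S) is the qualification probability of the k-subset S, i.e., the probability that the objects in S are the k nearest neighbors of q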
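To make the definition concrete, here is a brute-force sketch of p(S) and of the T-k-PNN answer set. It is not the evaluation method of the paper: it assumes independent objects whose distances to q follow small discrete distributions, enumerates every possible world (exponential in n), and in each world credits the world's probability to the k-subset formed by its k closest objects. The function names and the example data are hypothetical.

```python
from itertools import combinations, product


def qualification_probabilities(dists, k):
    """p(S) for every k-subset S of the objects, by brute-force possible worlds.

    dists maps an object name to a discrete distance distribution
    [(distance to q, probability), ...]; objects are assumed independent and
    ties are broken by name. Exponential in the number of objects.
    """
    names = sorted(dists)
    p = {frozenset(c): 0.0 for c in combinations(names, k)}
    for world in product(*(dists[n] for n in names)):
        world_prob = 1.0
        for _, prob in world:
            world_prob *= prob
        # the k objects with the smallest distances in this world form its k-NN set
        order = sorted(range(len(names)), key=lambda i: (world[i][0], names[i]))
        p[frozenset(names[i] for i in order[:k])] += world_prob
    return p


def t_k_pnn(dists, k, T):
    """T-k-PNN answer: every k-subset S with p(S) >= T."""
    return {S: ps for S, ps in qualification_probabilities(dists, k).items() if ps >= T}


example = {
    "O1": [(1.0, 1.0)],
    "O2": [(2.0, 0.5), (5.0, 0.5)],
    "O3": [(3.0, 1.0)],
}
for S, ps in t_k_pnn(example, k=2, T=0.4).items():
    print(sorted(S), ps)
```

In this example O2 is at distance 2 or 5 with equal probability, so {O1, O2} and {O1, O3} each have p(S) = 0.5 and both qualify for T = 0.4, while {O2, O3} has p(S) = 0. The number of k-subsets and possible worlds explodes quickly, which is what the filtering and verification techniques in this talk aim to avoid.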
Example of a k-PNN query (k=3)
{O1, O2, O3}, {O1, O2, O4}
[Figure: query point q and uncertain objects O1 to O8]
Example of a k-PNN query (k=3)
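[Figure: query point q, uncertain objects O1 to O8, and the k-bound]
Without filtering: {O1, O2, O3}, {O1, O2, O4}, …, {O6, O7, O8}, i.e., $\binom{8}{3} = 56$ k-subsets
With the k-bound (O7 and O8 pruned): {O1, O2, O3}, {O1, O2, O4}, …, {O4, O5, O6}, i.e., $\binom{6}{3} = 20$ k-subsets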
k-bound Filtering (k=3)
[Figure: uncertain objects O1 to O8 around query point q; f1, f2, f3 mark the three smallest maximum distances from q, and f3 is the k-bound]
fk (k-bound) is the k-th smallest maximum distance from q over all objects
Since min(r7), the smallest possible distance of O7 from q, is greater than f3, O7 cannot be one of the 3 NNs of q: there are always 3 objects whose distances are smaller than f3. We apply k-bound filtering on an index (e.g., an R-tree) to prune unqualified objects.
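The k-bound test can be sketched as follows, assuming (our assumption) that each uncertainty region is an axis-aligned rectangle. For simplicity the sketch scans all objects linearly instead of traversing an R-tree as the slide suggests; the names `min_dist`, `max_dist` and `k_bound_filter` and the example regions are hypothetical. An object survives only if its minimum distance from q does not exceed fk, the k-th smallest maximum distance.

```python
import math
from typing import Dict, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # uncertainty region (x0, y0, x1, y1)


def min_dist(q: Point, r: Rect) -> float:
    """Smallest distance from q to any point of r (0 if q lies inside r)."""
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)


def max_dist(q: Point, r: Rect) -> float:
    """Largest distance from q to any point of r (reached at a corner)."""
    dx = max(abs(q[0] - r[0]), abs(q[0] - r[2]))
    dy = max(abs(q[1] - r[1]), abs(q[1] - r[3]))
    return math.hypot(dx, dy)


def k_bound_filter(q: Point, regions: Dict[str, Rect], k: int) -> Set[str]:
    """Keep objects whose minimum distance does not exceed f_k,
    the k-th smallest maximum distance over all objects."""
    f_k = sorted(max_dist(q, r) for r in regions.values())[k - 1]
    return {name for name, r in regions.items() if min_dist(q, r) <= f_k}


regions = {
    "O1": (1.0, 0.0, 2.0, 1.0),
    "O2": (0.0, 1.0, 1.0, 2.0),
    "O3": (-2.0, -1.0, -1.0, 0.0),
    "O4": (2.0, 0.0, 3.0, 1.0),
    "O5": (5.0, 5.0, 6.0, 6.0),
}
candidates = k_bound_filter((0.0, 0.0), regions, k=3)
print(sorted(candidates), math.comb(len(candidates), 3), "of", math.comb(len(regions), 3))
```

Here f3 is the square root of 5 (about 2.24); O5 has minimum distance about 7.07 and is pruned, so the number of candidate 3-subsets drops from 10 to 4, the same effect the 56-to-20 reduction illustrates on the slides.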
Experimental setup
- Dataset: Long Beach (53k) (http://www.census.gov/geo/www/tiger/)
- Uncertainty pdf: Uniform (default); Gaussian (represented by histograms)
- Threshold (T): 0.1
- k: 6
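Gaussian pdfs are represented by histograms in the experiments. One way to build such a histogram is sketched below under our own assumptions, none of which come from the slides: the Gaussian is centred in a rectangular uncertainty region, its two axes are independent, it is truncated to the region and renormalized, and sigma is one sixth of each side (so the region spans plus or minus 3 sigma). Each bin's mass is the difference of the Gaussian CDF at the bin edges, computed with `math.erf`. The function names and the `sigma_frac` default are hypothetical.

```python
import math
from typing import List


def gaussian_bin_masses(lo: float, hi: float, mean: float, sigma: float,
                        nbins: int) -> List[float]:
    """Mass of N(mean, sigma^2) in each of nbins equal-width bins of [lo, hi],
    renormalized so the truncated distribution sums to 1."""
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

    edges = [lo + (hi - lo) * i / nbins for i in range(nbins + 1)]
    masses = [cdf(b) - cdf(a) for a, b in zip(edges, edges[1:])]
    total = sum(masses)  # mass of the Gaussian that falls inside [lo, hi]
    return [m / total for m in masses]


def gaussian_histogram_2d(x0: float, y0: float, x1: float, y1: float,
                          nbins: int, sigma_frac: float = 1.0 / 6.0) -> List[List[float]]:
    """nbins x nbins cell masses of a Gaussian centred in the rectangle,
    with independent axes and sigma = sigma_frac * side length (hypothetical default)."""
    wx = gaussian_bin_masses(x0, x1, (x0 + x1) / 2, (x1 - x0) * sigma_frac, nbins)
    wy = gaussian_bin_masses(y0, y1, (y0 + y1) / 2, (y1 - y0) * sigma_frac, nbins)
    # independence: each cell's mass is the product of its x-bin and y-bin masses
    return [[px * py for py in wy] for px in wx]


weights = gaussian_histogram_2d(0.0, 0.0, 6.0, 6.0, nbins=5)
for row in weights:
    print(" ".join(f"{m:.3f}" for m in row))
```

The central cell receives the largest mass and the grid is symmetric around the centre, as expected for a centred Gaussian.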
1. k-bound Filtering
2. Performance of GVR
3. k-subset Generation
3. k-subset Generation
4. Verification and Refinement
5. Time Analysis
6. Gaussian Distribution
Conclusion
We proposed an efficient evaluation framework for T-k-PNN queries
We proposed various techniques:
- k-bound to filter out unqualified objects
- PCS to reduce the number of k-subsets
- verification/refinement methods to avoid exact calculation
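The slides do not say how the verifiers compute their bounds, so the sketch below shows only the generic decision logic that lets verification avoid exact calculation, assuming each candidate k-subset S comes with a lower and an upper bound on p(S). A subset is accepted if its lower bound reaches T, rejected if its upper bound falls below T, and only the undecided ones are refined with an exact (expensive) computation. The function name, the bounds, and the exact values are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Tuple


def verify_and_refine(
    bounds: Dict[Hashable, Tuple[float, float]],
    T: float,
    exact_prob: Callable[[Hashable], float],
) -> Tuple[List[Hashable], int]:
    """Classify each candidate k-subset S from bounds lower <= p(S) <= upper.

    Returns the accepted subsets and how many needed exact refinement.
    """
    answers: List[Hashable] = []
    refined = 0
    for S, (lower, upper) in bounds.items():
        if lower >= T:        # verified: p(S) >= T is guaranteed
            answers.append(S)
        elif upper < T:       # verified: p(S) < T is guaranteed, drop S
            continue
        else:                 # undecided: fall back to the expensive exact p(S)
            refined += 1
            if exact_prob(S) >= T:
                answers.append(S)
    return answers, refined


bounds = {"S1": (0.30, 0.50), "S2": (0.00, 0.05), "S3": (0.08, 0.20)}
exact = {"S3": 0.12}
print(verify_and_refine(bounds, T=0.1, exact_prob=lambda S: exact[S]))
```

With T = 0.1, S1 is accepted and S2 rejected from their bounds alone; only S3 needs an exact computation. The tighter the bounds, the fewer subsets reach the refinement step.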
Future Work
- extend the techniques to other queries
References
[TDRP98] P. A. Sistla, O. Wolfson, S. Chamberlain, and S. Dao, “Querying the uncertain position of moving objects,” in Temporal Databases: Research and Practice, 1998.
[SSDBM99] D. Pfoser and C. Jensen, “Capturing the uncertainty of moving-objects representations,” in Proc. SSDBM, 1999.
[VLDB04a] A. Deshpande, C. Guestrin, S. Madden, J. Hellerstein, and W. Hong, “Model-driven data acquisition in sensor networks,” in Proc. VLDB, 2004.
[ICDE06] C. Böhm, A. Pryakhin, and M. Schubert, “The Gauss-tree: Efficient object identification in databases of probabilistic feature vectors,” in Proc. ICDE, 2006.
[ICDE07a] V. Ljosa and A. K. Singh, “APLA: Indexing arbitrary probability distributions,” in Proc. ICDE, 2007.
[SIGMOD03] R. Cheng, D. Kalashnikov, and S. Prabhakar, “Evaluating probabilistic queries over imprecise data,” in Proc. ACM SIGMOD, 2003.
[ICDE07b] J. Chen and R. Cheng, “Efficient evaluation of imprecise location-dependent queries,” in Proc. ICDE, 2007.
[VLDB06a] M. Mokbel, C. Chow, and W. G. Aref, “The New Casper: Query processing for location services without compromising privacy,” in Proc. VLDB, 2006.
[TKDE92] D. Barbara, H. Garcia-Molina, and D. Porter, “The management of probabilistic data,” IEEE TKDE, vol. 4, no. 5, 1992.
[VLDB04b] N. Dalvi and D. Suciu, “Efficient query evaluation on probabilistic databases,” in Proc. VLDB, 2004.
[VLDB06b] P. Agrawal, O. Benjelloun, A. D. Sarma, C. Hayworth, S. Nabar, T. Sugihara, and J. Widom, “Trio: A system for data, uncertainty, and lineage,” in Proc. VLDB, 2006.
[VLDB03] G. Iwerks, H. Samet, and K. Smith, “Continuous k-nearest neighbor queries for continuously moving points with updates,” in Proc. VLDB, 2003.
[ICDCS07] S. Ganguly, M. Garofalakis, R. Rastogi, and K. Sabnani, “Streaming algorithms for robust, real-time detection of DDoS attacks,” in Proc. ICDCS, 2007.
[AKDDM96] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Advances in Knowledge Discovery and Data Mining. AAAI Press/MIT Press, 1996.
[VLDB04c] N. Koudas, B. Ooi, K. Tan, and R. Zhang, “Approximate NN queries on streams with guaranteed error/performance bounds,” in Proc. VLDB, 2004.
[VLDB08a] G. Beskales, M. Soliman, and I. Ilyas, “Efficient search for the top-k probable nearest neighbors in uncertain databases,” in Proc. VLDB, 2008.
[VLDB06c] O. Benjelloun, A. D. Sarma, A. Halevy, and J. Widom, “ULDBs: Databases with uncertainty and lineage,” in Proc. VLDB, 2006.
[VLDB07a] L. Antova, C. Koch, and D. Olteanu, “Query language support for incomplete information in the MayBMS system,” in Proc. VLDB, 2007.
[SIGMOD08a] S. Singh et al., “Orion 2.0: Native support for uncertain data,” in Proc. ACM SIGMOD, 2008.
[ICDE08a] S. Singh et al., “Database support for pdf attributes,” in Proc. ICDE, 2008.
[TKDE04] R. Cheng, D. V. Kalashnikov, and S. Prabhakar, “Querying imprecise data in moving object environments,” IEEE TKDE, vol. 16, no. 9, Sept. 2004.
[DASFAA07] H. Kriegel, P. Kunath, and M. Renz, “Probabilistic nearest-neighbor query on uncertain objects,” in Proc. DASFAA, 2007.
[MUD08] Y. Qi, S. Singh, R. Shah, and S. Prabhakar, “Indexing probabilistic nearest-neighbor threshold queries,” in Proc. Workshop on Management of Uncertain Data, 2008.
[TKDE08] X. Lian and L. Chen, “Probabilistic group nearest neighbor queries in uncertain databases,” IEEE TKDE, vol. 20, no. 6, 2008.
[ICDE08b] R. Cheng, J. Chen, M. Mokbel, and C. Chow, “Probabilistic verifiers: Evaluating constrained nearest-neighbor queries over uncertain data,” in Proc. ICDE, 2008.
[VLDB05] Y. Tao, R. Cheng, X. Xiao, W. K. Ngai, B. Kao, and S. Prabhakar, “Indexing multi-dimensional uncertain data with arbitrary probability density functions,” in Proc. VLDB, 2005.
[VLDB07b] J. Pei, B. Jiang, X. Lin, and Y. Yuan, “Probabilistic skylines on uncertain data,” in Proc. VLDB, 2007.
[SIGMOD08b] X. Lian and L. Chen, “Monochromatic and bichromatic reverse skyline search over uncertain databases,” in Proc. ACM SIGMOD, 2008.
[ICDE07c] M. Soliman, I. Ilyas, and K. Chang, “Top-k query processing in uncertain databases,” in Proc. ICDE, 2007.
[SIGMOD08c] M. Hua, J. Pei, W. Zhang, and X. Lin, “Ranking queries on uncertain data: A probabilistic threshold approach,” in Proc. ACM SIGMOD, 2008.
[VLDB08b] V. Rastogi, D. Suciu, and E. Welbourne, “Access control over uncertain data,” in Proc. VLDB, 2008.
[VLDB08c] C. Koch and D. Olteanu, “Conditioning probabilistic databases,” in Proc. VLDB, 2008.
[VLDB08d] R. Cheng, J. Chen, and X. Xie, “Cleaning uncertain data with quality guarantees,” in Proc. VLDB, 2008.
[SIGMOD84] A. Guttman, “R-trees: A dynamic index structure for spatial searching,” in Proc. ACM SIGMOD, 1984.