Data Structures for Range Searching

JON LOUIS BENTLEY

Departments of Computer Science and Mathematics, Carnegie-Mellon University, Pittsburgh, Pennsylvania 15213

JEROME H. FRIEDMAN

Computation Research Group, Stanford Linear Accelerator Center, Stanford, California 94305

Much research has recently been devoted to "multikey" searching problems. In this paper the particular multikey problem of range searching is investigated and a number of data structures that have been proposed as solutions to this problem are surveyed. The purposes of this paper are to bring together a collection of widely scattered results, to acquaint the reader with the structures currently available for solving the particular problem of range searching, and to display a set of general methods for attacking multikey searching problems.

Keywords and Phrases: analysis of algorithms, orthogonal range queries, range searching, cells, multidimensional binary search trees, projection

CR Categories: 3.63, 3.74, 5.25

INTRODUCTION

The study of data structures for facilitating rapid searching is a fascinating subject of both practical and theoretical interest. Knuth [KNUT73] provides a definitive treatise on the subject of searching when the search is based on only one "key," but he points out that not much was known at the time his book was published about data structures for sets that have many "keys." This subject area, which is often called "multikey searching," "multidimensional searching," or "multiple attribute retrieval," has been the focus of a great deal of research in the past few years. In this paper we study a small part of this area by surveying the work that has been done on one particular multikey searching problem. This problem is important in itself (having applications in such areas as database systems, statistics, and design automation) and, in addition, serves as a representative of the entire class of multikey searching problems.

This research was supported in part by the Office of Naval Research under Contract N00014-76-C-0370 and in part by the Department of Energy

We need some definitions to describe this particular searching problem precisely. In database terminology a file is a collection of records, each containing several attributes or keys. A query asks for all records satisfying certain characteristics. An orthogonal range query asks for all records with key values each within specified ranges (that is, each key is between specified upper and lower bounds). The process of retrieving the appropriate records is called range searching. This problem can also be cast in geometric terms by regarding the record attributes as coordinates and the k values for each record as representing a point in a k-dimensional coordinate space. The file of records then becomes a point set in k-space. The intersection of the query ranges is a k-dimensional hyperrectangle in the space (that is, a "box"), and a range query calls for finding all points lying inside this hyperrectangle. We will often cast range searching in this geometric framework as an aid to intuition.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1979 ACM 0010-4892/79/1200-0397 $00.75

Computing Surveys, Vol. 11, No. 4, December 1979

CONTENTS

INTRODUCTION
1. THE DATA STRUCTURES
1.1 Sequential Scan
1.2 Projection
1.3 Cells
1.4 k-d Trees
1.5 Range Trees
1.6 k-ranges
1.7 Other Structures
1.8 Comparison of Methods
2. ADDITIONAL WORK
3. CONCLUSIONS
REFERENCES

Range searching arises in many applications. In a geographic database of U.S. cities one might seek a list of all those with latitude between 37° and 41° and longitude between 102° and 109° (defining the state of Colorado). To compile an honor list of older students, a university administrator may wish to know those students whose age is between 21 and 24 years and whose grade point average is between 3.5 and 4.0. In data analysis it is often useful to do separate analyses on sets of data lying in different regions (hyperrectangles) of the observation space and then compare (or contrast) the respective results. (At the Stanford Linear Accelerator Center, for example, over 10 hours per week of IBM 370/168 time is devoted to this application.) In statistics, range searching can be employed to determine the empirical probability content of a hyperrectangle, to determine empirical cumulative distributions, and to perform density estimation (see LOFT65). Lauther [LAUT78] describes how range searching can be used to solve a design automation problem in very large-scale integrated circuitry (VLSI).

This paper has been written with two distinct audiences in mind. For the expert in searching (with background either in database systems or theoretical computer science), this paper is intended as a survey that gathers together and presents in a common terminology a number of results that have recently appeared on the problem of range searching. This problem is of particular interest for two reasons: First, it is an important problem in many practical applications (and a difficult theoretical problem!); second, the methods that we investigate are broadly applicable to many other multikey searching problems. The second type of reader for whom this paper is intended is a computer scientist who is somewhat familiar with data structures for single-key searching, and who would like a tutorial on the problem of range searching. For this reader, the methods that we discuss are described on an intuitive level, and references are given to more precise descriptions elsewhere in the literature.

In Section 1 of this paper we examine six data structures for the range searching problem in some detail, and then briefly compare those structures at the end of the section. Additional work (that both has been done and needs to be done) is described in Section 2, and conclusions are then offered in Section 3.

1. THE DATA STRUCTURES

In this section we investigate a number of search methods for range searching. Each search method is specified by a data structure for storing the data and algorithms for building (which we call preprocessing) and searching the structure. We will analyze a search structure (say A) by giving three cost functions of N (the number of points) and k (the number of dimensions):

• $P_A(N, k)$, the cost of preprocessing N points in k-space into a data structure;

• $S_A(N, k)$, the storage required by the data structure;

• $Q_A(N, k)$, the search time or query cost.

These costs can be analyzed in terms of their average or their worst case; we usually speak of the worst-case cost, explicitly mentioning the average whenever we employ it.

In many applications one may desire various utility operations on data structures, such as insertion and deletion. In this section we ignore this issue, considering only static (unchanging) files; we then return to the question of dynamic structures in Section 2.

FIGURE 1. Illustration of projection.

1.1 Sequential Scan

The simplest approach to range searching is to store the N points in a sequential list. As each query arrives, all elements of the list are scanned and every record that satisfies the query is reported. If the queries do not have to be handled immediately, then they can be "batched" so that many queries can be processed with one sequential pass through the file. Since all k keys of the N records must be stored and each k-key record is examined as the structure is built or searched, it is easy to see that the sequential scan structure SS has the properties

$P_{SS}(N, k) = O(Nk)$,

$S_{SS}(N, k) = O(Nk)$,

$Q_{SS}(N, k) = O(Nk)$.

Sequential scanning has the advantage of being trivial to implement on any storage medium. It is competitive with the more sophisticated methods described in this paper when the file is small and the number of attributes is large, or when a large fraction of the records in the file satisfy the query (or queries, if they are batched).
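As a minimal sketch of the sequential scan, assuming records are tuples of numeric keys and a query is one inclusive (low, high) pair per key, the search might look as follows. The names `in_range` and `sequential_scan` and the sample city coordinates are hypothetical and serve only as illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Range = Tuple[float, float]  # (low, high), both bounds inclusive (an assumption)


def in_range(point: Point, query: Sequence[Range]) -> bool:
    """Inclusion test: every key lies within its [low, high] bounds."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(point, query))


def sequential_scan(points: List[Point], query: Sequence[Range]) -> List[Point]:
    """Examine every k-key record once, so each query costs O(Nk)."""
    return [p for p in points if in_range(p, query)]


# Hypothetical (latitude, longitude) records, for illustration only.
cities = [(39.7, 105.0), (40.8, 74.0), (38.8, 104.8), (34.1, 118.2)]
colorado = [(37.0, 41.0), (102.0, 109.0)]
print(sequential_scan(cities, colorado))
```

The same inclusion test reappears as the final filtering step in the more elaborate structures, which differ mainly in how many records they hand to it.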

1.2 Projection

The projection technique involves keeping, for each attribute, a sequence of the records in the file sorted by that attribute. One can view this geometrically as a projection of the points on each coordinate. The k lists representing the projections can be obtained by using a standard sorting algorithm k times. After preprocessing, a range query can be answered by the following search procedure: Choose one of the attributes, say the ith. Look up the two positions in the ith sequence (using a binary search) of the extreme values defining the range on the ith attribute of the query. All records satisfying the query will be in the list between these two positions just found. This (smaller) list is then searched by brute force. The projection technique is referred to as inverted lists by Knuth [KNUT73]. This technique was applied by Friedman, Baskett, and Shustek [FRIE75] in their solution of the "nearest neighbor" problem and by Lee, Chin, and Chang [LEEC76] to a number of database problems.

The projection technique is illustrated in Figure 1. The points represent a set of sixteen records of two keys each, represented by the x- and y-coordinates. The dashed lines are the projection of the records onto the x-coordinate (that is, the records sorted into x-order). The vertical slab is the x-range of the query, the horizontal slab is the y-range, and the rectangle that is their intersection contains those points which satisfy the query. To answer this query, we need only investigate the six points that are inside the vertical slab marked by the 45° lines.

One can apply the projection technique with only one sorted list (projection). If the distribution of values of the various attributes is more or less uniform over similar ranges and the query ranges of each attribute are similar, then one list is sufficient. If this is not the case, however, then keeping several lists can often lead to substantial reductions in the query time. The multiple projections are exploited by performing two binary searches in each to find the lower and upper bounds of the respective range, and then searching that projection with the smallest number of records in the range.

The cost analysis of projection is straightforward. To preprocess a file of N records of k keys each, we must perform k sorts of N elements. To store such a file, we must store k lists of N elements each. These facts immediately yield

$P_P(N, k) = O(kN \log N)$,

$S_P(N, k) = O(kN)$.

Friedman, Baskett, and Shustek [FRIE75] show that for searches that have almost cubical query regions and find a small number of records (and are therefore similar to nearest neighbor searches), the query time of projection is given by

$Q_P(N, k) = O(N^{1-1/k})$ (average case)

when the point set is drawn from a smooth underlying distribution. The projection technique is most effective when the queries almost always contain one range that excludes most of the file.
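The multiple-projection search described in this section can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: it assumes a non-empty list of equal-length tuples, inclusive query bounds, and hypothetical function names (`build_projections`, `projection_query`).

```python
from bisect import bisect_left, bisect_right


def build_projections(points):
    """Sort record indices by each attribute: k sorts of N elements."""
    k = len(points[0])
    projections = []
    for i in range(k):
        order = sorted(range(len(points)), key=lambda r: points[r][i])
        keys = [points[r][i] for r in order]
        projections.append((keys, order))
    return projections


def projection_query(points, projections, query):
    """Two binary searches per projection, then scan the shortest candidate run."""
    best = None
    for (keys, order), (lo, hi) in zip(projections, query):
        start = bisect_left(keys, lo)    # first position with key >= lo
        stop = bisect_right(keys, hi)    # first position with key > hi
        if best is None or stop - start < best[1] - best[0]:
            best = (start, stop, order)
    start, stop, order = best
    # Brute-force inclusion test on the candidates from the chosen projection.
    return [points[r] for r in order[start:stop]
            if all(l <= x <= h for x, (l, h) in zip(points[r], query))]


pts = [(1, 5), (2, 1), (3, 3), (4, 4), (5, 2)]
proj = build_projections(pts)
print(projection_query(pts, proj, [(2, 4), (2, 5)]))
```

Storing record indices rather than copies of the records keeps the k lists at O(kN) storage, in line with the cost analysis above.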

1.3 Cells

There are two ways they can search [for the murder weapon]: from the body outward in a spiral, or divide the room up into squares--that's the grid method.

From the CBS series Kojak, "Death Is Not a Passing Grade"

Cartographers as well as detectives use the grid (or cell) method. Street maps of metropolitan areas are often printed in the form of books. The first page of the book shows the entire area, and the remaining pages are detailed maps of (say) one-mile-square regions. To find (for example) all schools in a specified rectangle, one would look at the first page to find which squares overlap the rectangle and then check only on those pages of the book to find the schools. This approach can be mechanized immediately. A square of the map corresponds to a cell in k-space, and the points of the file within the cell are stored together in an implementation. The first page of the map book corresponds to a directory that allows one to take a hyperrectangle and look up the set of cells.

The cell technique is illustrated in Figure 2. The sixteen points in that figure represent sixteen records containing two keys each. The points in each cell are stored together in an implementation. The query is given by the rectangle in the upper part of the figure, and to answer it, only those points in the four dashed cells need be investigated. The squares in that figure are the "directory" corresponding to the first page of the map book.

FIGURE 2. Illustration of cells.

The directory can be implemented in two ways. If the points are (say) uniformly distributed on $[0, 10]^2$ and we have chosen 1 × 1 cells, then we can use a two-dimensional array as the directory, named DIRECT(0..9, 0..9). In DIRECT(i, j) we would keep a pointer to a list of all points in the cell [i, i + 1] × [j, j + 1]. If we wanted to find all points in [5.2, 6.3] × [1.2, 3.4], then we would only have to examine cells (5, 1), (5, 2), (5, 3), (6, 1), (6, 2), and (6, 3)--we call this "translating" from a range query to a set of cell id's. The multidimensional array works very well when the points are known a priori to be uniformly distributed over some given rectangle in the key space. When this is not known to be the case, one would probably use a search method, such as hashing, for the directory. In this method we name each cell as before; so cell (i, j) is a pointer to the points in [i, i + 1] × [j, j + 1]. Instead of storing all cells, however, we store only those cells that actually contain records of the file. To process a query, we translate the rectangle into a set of cell id's (as we did above), look up those id's, and check all the points in the occupied cells for inclusion in the rectangle. The storage required for the cell technique is the storage for the directory plus locations for the linked list representing points in cells; the size of the directory is usually much smaller than N.
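The hashed directory can be sketched with a dictionary keyed by cell id. The sketch below is an assumption-laden illustration: the function names are hypothetical, cells are square with side `size`, and a point on a cell boundary is assigned to the higher-numbered cell (the text leaves boundary handling open).

```python
from collections import defaultdict
from itertools import product
from math import floor


def build_cells(points, size=1.0):
    """Hash each point into the cell whose id is its floored coordinates."""
    directory = defaultdict(list)
    for p in points:
        directory[tuple(floor(x / size) for x in p)].append(p)
    return directory


def cell_ids(query, size=1.0):
    """Translate a range query into the ids of all cells it overlaps."""
    spans = [range(floor(lo / size), floor(hi / size) + 1) for lo, hi in query]
    return product(*spans)


def cell_query(directory, query, size=1.0):
    """One directory look-up per cell id, then inclusion tests on its points."""
    found = []
    for cid in cell_ids(query, size):
        for p in directory.get(cid, ()):   # unoccupied cells are not stored
            if all(lo <= x <= hi for x, (lo, hi) in zip(p, query)):
                found.append(p)
    return found


# The query from the text translates to cells (5,1) through (6,3).
print(list(cell_ids([(5.2, 6.3), (1.2, 3.4)])))
```

Only occupied cells take up space in the dictionary, which mirrors the remark that the directory stores only those cells that actually contain records.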

Knuth [KNUT73] has discussed this scheme for the two-dimensional case. Levinthal [LEVI66] used a cell technique in three-dimensional Euclidean space for determining all atoms within 5 angstroms of every atom in a protein molecule--he referred to this as "cubing." The idea of using hashing for the cell directory was first described by Yuval [YUVA75], and was later used by Rabin [RABI76] to solve the "closest pair" problem. Bentley, Stanat, and Williams [BENT77] discuss a number of different implementations for the directory (two of which we have seen).

The basic parameters of the cell technique are the size and shape of each cell. In analyzing a search there are two costs to count: cell accesses (the number of directory look-ups) and inclusion tests (testing whether a point satisfies the range query). If the cell size is extremely large, there will be few cell accesses and many inclusion tests. If the cell size is very small, on the other hand, there will be very many cell accesses and very few inclusion tests. Clearly, either extreme is to be avoided.

The best cell size and shape depend on the size and shape of the query hyperrectangle. Bentley, Stanat, and Williams [BENT77] show that if the query hyperrectangles have constant size and shape so that only their location (in the coordinate space) is unspecified, then for a single grid a nearly optimum size and shape for the cells are the same as those of the query hyperrectangle. For this case the number of cells accessed is $2^k$, and the expected search time is proportional to $2^k$ times the number of points in the range. In this context the performance of cells is given by

$P_C(N, k) = O(Nk)$,

$S_C(N, k) = O(Nk)$,

$Q_C(N, k) = O(2^k F)$ (average),

where F is the number of records found. In most applications, however, the queries will vary in size and shape as well as in location, so there is little information available for making a good choice of cell size and shape.

1.4 k-d Trees

In this section we examine a data structure called the "k-dimensional binary search tree," which is usually abbreviated as "k-d tree." This structure is a natural generalization of the standard one-dimensional binary search tree, so we will briefly review a special type of that structure (a complete description of binary trees can be found in KNUT73). To build a file of single-key records into a binary search tree, we choose the median of the set as the discriminator value and build all records with key values less than or equal to the discriminator into the left subtree of the root (recursively) and all elements with greater key values into the right subtree. This process continues recursively until there are only a few (say six or less) nodes in the set, at which point we store them as a linked list. Note that no records are stored in the internal nodes of such a binary search tree; they are contained only in the leaf nodes or "buckets" at the bottom of the tree. We can answer a range search in this structure by a recursive algorithm that compares the range to the discriminator of the node it is currently visiting. If the range is entirely to one side or the other of the discriminator, only the appropriate son is searched; otherwise both sons are searched recursively.

The single-key binary search tree performs three functions at once: It stores the records of the file (in the external nodes, or "buckets"), it divides the data space into segments (by choosing the discriminators), and it gives a directory among the segments (the tree structure). We now investigate a multidimensional generalization of the binary search tree that performs these same three functions: storing the records, dividing space into hyperrectangles, and providing a directory among the hyperrectangles. It accomplishes this by using the same idea as the one-dimensional algorithm with one critical exception: In the one-dimensional tree we only have one key to use as the discriminator; in a multidimensional tree we have to choose at each internal node one of k keys to use as a discriminator.
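The single-key bucketed tree reviewed above can be sketched briefly before moving to k dimensions. This is an illustrative sketch under stated assumptions: Python lists stand in for the linked-list buckets, internal nodes are tuples, query bounds are inclusive, and the names `build` and `range_search` are hypothetical.

```python
BUCKET_SIZE = 6  # "six or less" records per bucket, following the text


def build(keys):
    """Return a bucket (sorted list) or an internal node (disc, left, right)."""
    keys = sorted(keys)
    if len(keys) <= BUCKET_SIZE:
        return keys
    disc = keys[(len(keys) - 1) // 2]          # median as discriminator
    left = [x for x in keys if x <= disc]
    right = [x for x in keys if x > disc]
    if not right:                              # median equals the maximum: stop
        return keys
    return (disc, build(left), build(right))


def range_search(node, lo, hi):
    """Report all keys x with lo <= x <= hi."""
    if isinstance(node, list):                 # bucket: test every record
        return [x for x in node if lo <= x <= hi]
    disc, left, right = node
    if hi <= disc:                             # range entirely on the left side
        return range_search(left, lo, hi)
    if lo > disc:                              # range entirely on the right side
        return range_search(right, lo, hi)
    return range_search(left, lo, hi) + range_search(right, lo, hi)


tree = build(range(20))
print(range_search(tree, 5, 9))
```

No records live in the internal nodes; they carry only the discriminator, which is the property the multidimensional generalization preserves.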

The algorithm for constructing a k-d tree is to choose for the discriminator that coordinate j for which the spread of attribute values (as measured by any convenient statistic, such as variance or distance from minimum to maximum) is maximum for the subcollection represented by the node. The partitioning value is chosen to be the median value of this attribute. This algorithm is then applied recursively to the two subcollections represented by the two sons of the node just partitioned. The partitioning is stopped, creating a terminal node (or bucket), when the cardinality of the subcollection is less than a prespecified maximum, which is a parameter of the procedure. (Friedman, Bentley, and Finkel [FRIE77] found empirically that values ranging from 8 to 16 work well in a Fortran implementation.) The result of this procedure is that the coordinate space is divided into a number of buckets, each containing approximately the same number of points (by the stopping criterion) and each approximately "cubical" in shape (by choosing as discriminator the dimension of maximum spread, which slowly chops long and skinny rectangles into cubes).

Range searching with k-d trees is straightforward. Starting at the root, the k-d tree is recursively searched in the following manner. When visiting a node that discriminates by the jth key (which we call a j-discriminator), one compares the jth range of the query with the discriminator value. If the query range is totally above (or below) that value, then one need only search the right subtree (respectively, left) of that node; the other son can be pruned from the search because any node it contains does not satisfy the query in that particular key. If the query range overlaps the node's key (that is, the key is between the low and high bounds of the range), then both sons need be searched. This can be accomplished by searching both sons recursively (the search being implemented by a stack).
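The construction and search procedures for the adaptive k-d tree can be sketched together. The sketch is illustrative only and rests on several assumptions: spread is measured as the distance from minimum to maximum, the bucket size is 8, query bounds are inclusive, and the names `build_kd` and `kd_search` are hypothetical.

```python
BUCKET_SIZE = 8  # assumed; the text reports that 8 to 16 worked well


def build_kd(points):
    """Bucket: list of points. Internal node: (j, value, left, right)."""
    if len(points) <= BUCKET_SIZE:
        return list(points)
    k = len(points[0])

    def spread(i):
        values = [p[i] for p in points]
        return max(values) - min(values)

    j = max(range(k), key=spread)                  # coordinate of maximum spread
    ordered = sorted(points, key=lambda p: p[j])
    value = ordered[(len(ordered) - 1) // 2][j]    # median of coordinate j
    left = [p for p in ordered if p[j] <= value]
    right = [p for p in ordered if p[j] > value]
    if not right:                                  # median equals the maximum: stop
        return ordered
    return (j, value, build_kd(left), build_kd(right))


def kd_search(node, query):
    """Report every point lying inside the query hyperrectangle."""
    found, stack = [], [node]
    while stack:                                   # explicit stack replaces recursion
        node = stack.pop()
        if isinstance(node, list):                 # bucket: inclusion tests
            found += [p for p in node
                      if all(lo <= x <= hi for x, (lo, hi) in zip(p, query))]
            continue
        j, value, left, right = node
        lo, hi = query[j]
        if lo <= value:                            # jth range reaches the left son
            stack.append(left)
        if hi > value:                             # jth range reaches the right son
            stack.append(right)
    return found


pts = [(x, (7 * x) % 13) for x in range(40)]
print(sorted(kd_search(build_kd(pts), [(5, 20), (3, 9)])))
```

A son is pushed only when the jth query range can intersect its side of the discriminator, which is the pruning rule described in the paragraph above.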

The application of k-d trees to (two-dimensional) range searching is illustrated in Figure 3. The k-d tree is depicted in two ways: Figure 3a shows the structure in 2-space, and Figure 3b shows the abstract tree. The root of the tree is internal node A; it is an x-discriminator. The vertical line in the right part of the figure labeled A is the discriminating line. That is, every point to the left of that vertical line is in the left subtree of A (with B as root), and every point to the right is in the subtree with root C. This partitioning continues recursively, and the resulting cells (buckets) in this tree each contain two points. The query rectangle is illustrated in Figure 3a, and the search for all points within the rectangle is illustrated in both figures. The search starts at the root, and since the query rectangle is entirely to the right of the vertical line defined by A, the left subtree of A (with B as root) can be pruned from the search. This is illustrated in Figure 3b by the perpendicular line through the son link from A to B. The search continues, searching both sons of C, both sons of F, and only the left son of G. A total of three buckets are searched; these buckets are dashed in the planar representation and are marked by an S in the tree representation.

FIGURE 3. Illustration of k-d trees: (a) planar representation; (b) tree representation.

In the k-d tree as introduced by Bentley [BENT75a], the discriminators are chosen cyclically (that is, the root is discriminated by the first key, its sons by the second, and so on). The idea of "adaptive partitioning" was proposed by Friedman, Bentley, and Finkel [FRIE77] and makes the k-d tree a structure very "sensitive" to the particular file that it represents. The application of k-d trees to a host of problems can be found in BENT79b, GOTL78, and SILV78b.

Analysis of k-d trees for range searching has been considered by several researchers. The work required to construct a k-d tree and its storage requirements (see BENT79b) are

$P_K(N, k) = O(N \log N)$,

$S_K(N, k) = O(Nk)$.


The search cost depends on the nature ofthe query. Lee and Wong [LEEW80] haveshown tha t in the worst case,

Qk(N, k) < O ( N H/k + F)

where F is the number of points found inthe region. If the query range is almost

cubical and the number of records thatsatisfies the query is small (so that therange query is similar to a nearest ne ighborsearch), then Friedman, Bentley, and Fin-kel's [FRIE77] analysis shows tha t

Qk(N, k) = O(log N + F)

(average case for small answer).

For the case where a large fraction of the file satisfies the query, Bentley and Stanat [BENT75b] and Silva-Filho [SILV78a] show that

$$Q_k(N, k) = O(F)$$

(average case for large answer).

The k-d tree structure is most effective in situations where little is known about the nature of the queries or a wide variety of queries are expected. It is also useful if other types of queries (in addition to range queries) are anticipated; many other queries supported by k-d trees are discussed by Bentley [BENT79b].

1.5 Range Trees

A number of very similar structures for range searching (of primarily theoretical rather than practical interest) have recently been described by Lueker [LUEK78], Lee and Wong [LEEW80], and Willard [WILL78a]. In this section we investigate the range tree, a structure introduced by Bentley [BENT79a] that is also similar to the former structures. It achieves the best worst-case search time of all the structures we have seen so far in this paper, but has relatively high preprocessing and storage costs. For most applications the high storage will be prohibitive, but the range tree is very interesting from a theoretical viewpoint. Since the range tree is defined recursively in dimension (that is, the k-dimensional structure is defined in terms of the (k - 1)-dimensional structure), we begin our discussion by looking at a one-dimensional structure and then generalize that structure to higher dimensions.

The simplest structure for one-dimensional range searching is a sorted array. The preprocessing sorts the N elements in ascending order by key. To answer a range query, we do two binary searches to find the positions of the low and high end of the range in the array. After these two positions have been found, we can list all the points in that part of the array as the answer to the range query. (Note that this is precisely the projection method applied to the one-dimensional problem.) For this structure we use linear storage and $O(N \log N)$ preprocessing time. The two binary searches each cost $O(\log N)$, and the cost of listing the points found in the region will, of course, be proportional to the number of such points. Letting F be the number of points found in the region, we have

$$P_r(N, 1) = O(N \log N),$$

$$S_r(N, 1) = O(N),$$

$$Q_r(N, 1) = O(\log N + F).$$
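The sorted-array structure just described is short enough to sketch directly. The following Python fragment is a minimal illustration using the standard `bisect` module; the function names `build_sorted_array` and `range_query_1d` are hypothetical and chosen only for this example.

```python
from bisect import bisect_left, bisect_right
from typing import List


def build_sorted_array(keys: List[float]) -> List[float]:
    """Preprocessing: sort the N keys in ascending order."""
    return sorted(keys)


def range_query_1d(sa: List[float], lo: float, hi: float) -> List[float]:
    """Return all keys x with lo <= x <= hi from the sorted array sa."""
    first = bisect_left(sa, lo)    # binary search for the low end
    last = bisect_right(sa, hi)    # binary search for the high end
    return sa[first:last]          # listing cost is proportional to F
```

The two binary searches account for the logarithmic term in the query cost, and the slice accounts for the F term.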

We will now build a two-dimensional range tree, using as a tool the one-dimensional sorted arrays (SA's) we described above. The range tree is similar to the "binary search trees" described by Knuth [KNUT73, Sect. 6.2], so we will use his terminology in our discussions. The range tree is a rooted binary tree in which every node has a left son, a right son, a discriminating value (all nodes in the left subtree have a discriminating value less than the node's), and (unlike a regular binary search tree) every node contains an SA. The root of the range tree contains an SA (sorted by y-coordinate) and has as a discriminating value the median x-value for all points. The left subtree of the root has an SA containing the N/2 points with x-value less than the median sorted by y-coordinate. Similarly, the right son of the root represents the N/2 points with x-value greater than the median and has an SA of those points sorted by y-coordinate. This partitioning continues so that i levels away from the root we have $2^i$ subtrees, each representing $N/2^i$ points contiguous in the x-dimension and each containing an SA of the points sorted by y-coordinate. This partitioning continues for a total of (approximately) log N levels; we handle small point sets (say, less than a dozen points) by brute force.

The search algorithm for a range tree is most easily described recursively. Each node in the tree represents a range in the x-dimension from the least x-value contained in the subtree to the greatest. When visiting a node, we compare the x-range of the query to the range of the node, and if the node's range is entirely within the query's, then we search that structure's SA for all points in the query's y-range and return. If the query's range does not wholly contain the node's, then we compare the query's x-range to the node's discriminator value. If the range is entirely below the discriminator, we recursively visit the left subtree; if it is above, we visit the right; and if the range overlaps the discriminator, then we visit both subtrees.
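The construction and the recursive search can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a definitive implementation: the names `RangeNode`, `build_range_tree`, and `range_tree_query` are hypothetical, the brute-force threshold `leaf_size` (assumed to be at least 1) is arbitrary, and each node re-sorts its points by y, which is simpler but slower than the "clever techniques" needed for the best preprocessing bound.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Point2 = Tuple[float, float]


class RangeNode:
    """Node of a planar range tree: an x-range, a discriminator, and an SA."""

    def __init__(self, by_x: List[Point2], leaf_size: int) -> None:
        self.x_min = by_x[0][0]                  # least x-value in the subtree
        self.x_max = by_x[-1][0]                 # greatest x-value in the subtree
        self.by_y = sorted(by_x, key=lambda p: p[1])   # the node's SA
        self.ys = [p[1] for p in self.by_y]      # y-keys for binary search
        self.disc = None
        self.left = None
        self.right = None
        if len(by_x) > leaf_size:                # small sets stay as leaves
            mid = len(by_x) // 2
            self.disc = by_x[mid][0]             # median x-value
            self.left = RangeNode(by_x[:mid], leaf_size)    # x <= disc
            self.right = RangeNode(by_x[mid:], leaf_size)   # x >= disc


def build_range_tree(points: List[Point2], leaf_size: int = 4) -> Optional[RangeNode]:
    return RangeNode(sorted(points), leaf_size) if points else None


def range_tree_query(node: Optional[RangeNode], x_lo: float, x_hi: float,
                     y_lo: float, y_hi: float,
                     found: Optional[List[Point2]] = None) -> List[Point2]:
    if found is None:
        found = []
    if node is None:
        return found
    if x_lo <= node.x_min and node.x_max <= x_hi:
        # Node's x-range lies within the query's: search its SA on y only.
        first = bisect_left(node.ys, y_lo)
        last = bisect_right(node.ys, y_hi)
        found.extend(node.by_y[first:last])
    elif node.left is None:
        # Small point set: brute force.
        found.extend(p for p in node.by_y
                     if x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi)
    else:
        # Compare the query's x-range with the discriminator.
        if x_lo <= node.disc:
            range_tree_query(node.left, x_lo, x_hi, y_lo, y_hi, found)
        if x_hi >= node.disc:
            range_tree_query(node.right, x_lo, x_hi, y_lo, y_hi, found)
    return found
```

The comparisons against the discriminator use non-strict inequalities so that points sharing the median x-value are found whichever son holds them.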

The analysis of the planar tree is rather complicated. Since there are log N levels in the tree and N points are stored on each level, the total storage required is $O(N \log N)$. The preprocessing can be performed in $O(N \log N)$ time if clever techniques are employed. Analysis shows that at most two SA searches are done on each level of the tree (each of cost approximately log N), so the total cost for a search is $O(\log^2 N)$ plus the time for listing the points in the region. Letting F stand, as before, for the total number of points found in the region we have

$$P_r(N, 2) = O(N \log N),$$

$$S_r(N, 2) = O(N \log N),$$

$$Q_r(N, 2) = O(\log^2 N + F).$$

If we step back for a moment, we can see how we built the structure: We constructed a two-dimensional structure by building a tree of one-dimensional structures. We can perform essentially the same operation to yield a three-dimensional structure: We construct a tree containing two-dimensional structures in the nodes. This process can be continued to yield a structure for k-dimensions, which will be a tree containing (k - 1)-dimensional structures. This will yield a structure with performances

$$P_r(N, k) = O(N \log^{k-1} N),$$

$$S_r(N, k) = O(N \log^{k-1} N),$$

$$Q_r(N, k) = O(\log^{k} N + F).$$
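One way to see where these bounds come from is to write the recursion in dimension as recurrences. The sketch below is a reconstruction from the description in this section, not a derivation taken from the cited papers; it assumes that level $i$ of the top-level tree holds $2^i$ nodes of $N/2^i$ points each, and that (as in the planar case) at most two nodes per level have their associated structure searched.

```latex
\begin{align*}
S_r(N, k) &= \sum_{i=0}^{\log N} 2^{i}\, S_r\!\left(N/2^{i},\, k-1\right)
           \;\le\; (\log N + 1)\, S_r(N,\, k-1), \\
S_r(N, 1) &= O(N)
           \;\Longrightarrow\; S_r(N, k) = O\!\left(N \log^{k-1} N\right), \\[4pt]
Q_r(N, k) &\le 2 \log N \cdot O\!\left(\log^{k-1} N\right) + O(F)
           \;=\; O\!\left(\log^{k} N + F\right).
\end{align*}
```

Each added dimension therefore multiplies both the storage and the non-reporting part of the query time by a factor of about log N, which agrees with the planar case k = 2.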

The range tree structure is very interesting from a theoretical viewpoint. The asymptotic search time is very fast, but the amount of storage used is usually prohibitive in practice. Although the application of this structure to practical problems will probably be limited to cases when k = 2 or 3, it does provide an important theoretical benchmark. It also gives us an interesting technique (recursion in dimension) that might yield fruit in practice. (Indeed, there are some very interesting relationships between range trees and the k-d trees of Section 1.4.)

1.6 k-ranges

The k-range is an efficient worst-case structure for range searching introduced by Bentley and Maurer [BENT80b]. They developed two types of k-ranges, overlapping and nonoverlapping. Both of these structures involve storing sets of lists of points sorted by different coordinates; additional dimensions are added recursively, much like the range trees of the last section. Because k-ranges are rather complicated to describe and are of primarily theoretical interest, we will not describe them here but only mention their performance. The overlapping k-ranges can be made to have performance

$$P_o(N, k) = O(N^{1+\epsilon}),$$

$$S_o(N, k) = O(N^{1+\epsilon}),$$

$$Q_o(N, k) = O(\log N + F)$$

for any $\epsilon > 0$. It is pleasing to note that the constants "hidden" in the O's of the above equations are just $k/\epsilon$. Overlapping k-ranges have very efficient retrieval time but somewhat high preprocessing and storage costs; their dual structures, nonoverlapping k-ranges, have very efficient preprocessing and storage costs but increased query times. Their performance is

$$P_n(N, k) = O(N \log N),$$

$$S_n(N, k) = O(N),$$

$$Q_n(N, k) = O(N^{\epsilon} + F),$$

for any fixed $\epsilon > 0$. The details of these structures can be found in BENT80b. Although these structures were developed primarily as a theoretical device, they might prove efficient in some implementations. (Their primary drawback is that their space requirements are high, and space is usually a critical resource.)

1.7 Other Structures

In the previous sections we have investigated six structures for the range searching problem that (in the authors' opinion) dominate other structures proposed for this problem. In this section we briefly investigate some of these other structures.

Knuth [KNUT73] points out that the notion of cells can be applied recursively. That is, when one of the cubes has more than some certain number of points, the cube is further divided into subcubes of yet smaller size. This scheme implies a multidimensional tree with multiway branching. In terms of both the partitioning imposed on the space and the ease of implementation, this idea seems to be dominated by a data structure called the quad tree.

The quad tree was first described by Finkel and Bentley [FINK74]. It is a generalization of the standard binary search tree, in which every node has $2^k$ sons. Bentley and Stanat [BENT75b] analyzed the performance of quad trees for "square" range searches in uniform planar point sets, and Linn [LINN73] discussed the fact that quad trees (which he called "search-sort k trees") have advantages over binary trees when used in a synchronized multiprocessor system. This application aside, however, the quad tree seems to be dominated by the k-d trees of Section 1.4.
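For the planar case (k = 2, so four sons per node), the idea can be sketched in Python as below. This is an illustrative sketch under assumptions, not the structure as specified by Finkel and Bentley: the names `QuadNode`, `quad_insert`, and `quad_range_search` are hypothetical, points are inserted one at a time with no balancing, and ties on a coordinate are sent to the "high" side.

```python
from typing import List, Optional, Tuple

Point2 = Tuple[float, float]


class QuadNode:
    """A point together with four sons, one per quadrant around the point."""

    def __init__(self, point: Point2) -> None:
        self.point = point
        # Son index = 2 * (x >= node x) + (y >= node y).
        self.sons: List[Optional["QuadNode"]] = [None, None, None, None]


def quad_insert(root: Optional[QuadNode], p: Point2) -> QuadNode:
    """Insert p by descending into the quadrant that contains it."""
    if root is None:
        return QuadNode(p)
    node = root
    while True:
        q = 2 * int(p[0] >= node.point[0]) + int(p[1] >= node.point[1])
        if node.sons[q] is None:
            node.sons[q] = QuadNode(p)
            return root
        node = node.sons[q]


def quad_range_search(node: Optional[QuadNode], lo: Point2, hi: Point2,
                      found: Optional[List[Point2]] = None) -> List[Point2]:
    """Collect all points p with lo[i] <= p[i] <= hi[i] for i = 0, 1."""
    if found is None:
        found = []
    if node is None:
        return found
    px, py = node.point
    if lo[0] <= px <= hi[0] and lo[1] <= py <= hi[1]:
        found.append(node.point)
    for q in range(4):
        x_high, y_high = q >> 1, q & 1
        # Skip a quadrant that cannot intersect the query rectangle.
        if x_high and hi[0] < px:
            continue
        if not x_high and lo[0] >= px:
            continue
        if y_high and hi[1] < py:
            continue
        if not y_high and lo[1] >= py:
            continue
        quad_range_search(node.sons[q], lo, hi, found)
    return found
```

Each comparison at a node discriminates on both keys at once, which is the sense in which the quad tree generalizes the binary search tree.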

A great deal of work has been done recently on multikey searching problems that are similar in flavor to the range searching problem. Dobkin and Lipton [DOBK76] and Bentley [BENT80a] have investigated a number of searching problems defined on sets of points in k-dimensional space. Rivest [RIVE76] provides a number of interesting data structures for answering "partial-match" queries, which are essentially range queries in a file in which the keys assume discrete values. For discussions of efficient search methods in the context of database systems, the reader is referred to such papers as LIOU77, SHNE77, YANG77, and YANG78.

1.8 Comparison of Methods

In Sections 1.1 through 1.6 we have discussed six structures for range searching. The performances of these six structures (seven including the two variants of k-ranges) are summarized in Table 1, which shows the preprocessing, storage, and query costs of each structure. All the functions in that table reflect worst-case costs, except those query costs that are footnoted. For those functions the probabilistic assumptions are described in the notes.

Four of these six structures (sequential scan, projection, cells, and k-d trees) have been presented as providing practical solutions to the range searching problem. For each structure there are situations in which it is clearly superior and other situations where it performs badly. In this section we will mention some of these situations and compare the performance of the four methods.

TABLE 1. Performance of Data Structures for Range Searching

Structure                  P(N, k)                  S(N, k)                  Q(N, k)
Sequential scan            $O(N)$                   $O(N)$                   $O(N)$
Projection                 $O(N \log N)$            $O(N)$                   $O(N^{1-1/k} + F)$ a(1)
Cells                      $O(N)$                   $O(N)$                   $O(F)$ a(2)
k-d trees                  $O(N \log N)$            $O(N)$                   $O(N^{1-1/k} + F)$
                                                                             $O(\log N + F)$ a(3)
Nonoverlapping k-ranges    $O(N \log N)$            $O(N)$                   $O(N^{\epsilon} + F)$
Range trees                $O(N \log^{k-1} N)$      $O(N \log^{k-1} N)$      $O(\log^{k} N + F)$
Overlapping k-ranges       $O(N^{1+\epsilon})$      $O(N^{1+\epsilon})$      $O(\log N + F)$

a Query times that indicate average case analysis. Probabilistic assumptions are:
(1) Smooth data sets; very small query region.
(2) Any data set; cell size equals query size.
(3) Smooth data set.

If the file is small and the number of attributes large, if the file is to be searched only a few times, or if the queries can be batched so that nearly all the records in the file satisfy at least one, then sequential scan is the method of choice. In other cases one of the more sophisticated methods is likely to be more efficient. Projection does best when the query range on one of the attributes is usually sufficient to eliminate nearly all the file records. For this case the low overhead of searching this structure allows it to dominate the others. In situations where several or many of the attributes serve to restrict the range query, the projection technique performs relatively poorly.

Both the cell and k-d tree structures are appropriate in situations where the query restricts several of the attributes. If the approximate size and shape of the queries are roughly constant and known in advance, then cells defined by a fixed grid with size and shape similar to those of the expected queries are most advantageous. For queries with sizes and shapes that differ considerably from the design, however, performance can be quite poor.

The k-d tree structure is characterized by its robustness to wildly varying queries. The cell design adapts to the distribution of the attribute values of the file records in the k-dimensional coordinate space. The cells all contain very nearly the same number of records; there are no empty cells. In dense regions there are many cells and a correspondingly fine division of the coordinate space; in sparse regions there is a coarser division with fewer cells. For most applications of range searching that are not characterized in the preceding paragraphs, k-d trees are likely to be the method of choice.

2. ADDITIONAL WORK

Our discussion of the data structures in Section 1 is on a very abstract conceptual level, and we have ignored many problems that arise in actual applications of range searching. In this section we briefly examine some of those problems and the solutions that have been proposed to handle them.

All files that we have discussed so far have been static; that is, they represent unchanging files. Many applications, however, require dynamic structures, in which insertions and deletions can be made. The sequential scan structure is easy to maintain dynamically, and so is the projection structure using methods for maintaining one-dimensional sorted lists described by Knuth [KNUT73]. The cell technique can support insertions and deletions by merely keeping a linked list of the points in each cell and inserting or deleting the new or old record in the appropriate list. Dynamic k-d trees are a more subtle problem and have been discussed by Bentley [BENT79b] and Willard [WILL78b].
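The dynamic cell technique can be sketched in a few lines of Python. The sketch makes several assumptions that are not in the text: cells are cubes of a single side length `cell_size`, the directory of cells is a hash table keyed by integer cell coordinates, a Python list stands in for the linked list of each cell, and the class name `CellGrid` and its methods are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import floor
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


class CellGrid:
    """Fixed grid of cubical cells, each holding a list of its points."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Tuple[int, ...], List[Point]] = defaultdict(list)

    def _cell(self, p: Point) -> Tuple[int, ...]:
        # Integer coordinates of the cell containing p.
        return tuple(floor(c / self.cell_size) for c in p)

    def insert(self, p: Point) -> None:
        self.cells[self._cell(p)].append(p)

    def delete(self, p: Point) -> bool:
        """Remove one copy of p; return False if p is not stored."""
        key = self._cell(p)
        bucket = self.cells.get(key)
        if not bucket or p not in bucket:
            return False
        bucket.remove(p)
        if not bucket:
            del self.cells[key]        # drop cells that become empty
        return True

    def range_query(self, lo: Point, hi: Point) -> List[Point]:
        """Scan every cell that overlaps the box [lo, hi] and filter its points."""
        lo_cell, hi_cell = self._cell(lo), self._cell(hi)
        found: List[Point] = []
        for key in product(*(range(a, b + 1) for a, b in zip(lo_cell, hi_cell))):
            for p in self.cells.get(key, ()):
                if all(l <= c <= h for l, c, h in zip(lo, p, hi)):
                    found.append(p)
        return found
```

Insertion and deletion touch only one cell, which is why the text calls the dynamic case easy. The query loop visits every overlapping cell, so it is efficient only when the query is not much larger than a cell.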

Considerable research remains to be done in the development of heuristics for aiding the search methods we have seen. For example, if the range queries in a seven-dimensional problem almost always involve only two of the attributes, then the design of the structure should involve only those two attributes. Heuristics for detecting these and other similar situations would be very helpful. Techniques described by Bentley and Burkhard [BENT76] might prove useful in such an investigation.

Our discussion of all of the data structures has been for the case in which they are implemented in primary memory. Many applications (particularly databases) inherently involve secondary storage media such as disks and tapes. All the structures of Section 1 can be efficiently implemented on such media.¹

Several researchers have recently considered an interesting generalization of the range searching problem, which calls for adding a range restriction to an existing data structure. That is, we already have some structure for performing a particular type of query, and we want to have the capability of saying "perform that query on all records in which this key lies in that range." Bentley [BENT79a], Lueker [LUEK79], and Willard [WILL78a] have developed a number of transformations on data structures that allow one to add the range restriction capability. (These transformations actually led to the discovery of both the range tree and the k-range data structures of Section 1.) Although the storage requirements of the resulting structures seem to be too high to make them of immediate practical interest, this approach is a novel attack on the problem of constructing data structures for range searching.

¹ For details of these implementations, the reader is referred to BENT78, which is an earlier version of this paper.

An interesting theoretical problem that could prove to be of practical value is proving lower bounds on the complexity of the range searching problem. Saxe [SAXE79] has investigated this problem using the standard "decision tree" model of concrete complexity theory and has shown that k-ranges have optimal worst-case query times. These k-ranges have very high storage requirements, however; so it would be very desirable to have lower bounds that make stronger statements of the form, "if you only use this much storage and preprocessing, then this is the fastest search time you can have." Fredman [FRED79] has recently made progress in this direction. Another interesting open problem is to show lower bounds on the average complexity, rather than just the worst-case complexity.

3. CONCLUSIONS

In this paper we have investigated a number of data structures for the range searching problem. In 1973 Knuth [KNUT73, p. 554] was able to write that "no really nice data structures seem to exist" for the problem of range searching. In this paper we have tried to show that this situation has changed in the interim, and that these changes can have a substantial impact on both the theory and practice of multikey searching.

REFERENCES

BENT75a  BENTLEY, J. L. "Multidimensional binary search trees used for associative searching," Comm. ACM 18, 9 (Sept. 1975), 509-517.

BENT75b  BENTLEY, J. L., AND STANAT, D. F. "Analysis of range searches in quad trees," Inf. Process. Lett. 3, 6 (July 1975), 170-173.

BENT76  BENTLEY, J. L., AND BURKHARD, W. A. "Heuristics for partial match retrieval data base design," Inf. Process. Lett. 4, 5 (Feb. 1976), 132-135.

BENT77  BENTLEY, J. L., STANAT, D. F., AND WILLIAMS, E. H., JR. "The complexity of fixed-radius near neighbor searching," Inf. Process. Lett. 6, 6 (Dec. 1977), 209-212.

BENT78  BENTLEY, J. L., AND FRIEDMAN, J. H. A survey of algorithms and data structures for range searching, Carnegie-Mellon Computer Science Rep. CMU-CS-78-136 and Stanford Linear Accelerator Center Rep. SLAC-PUB-2189, preliminary version in Proc. Computer Science and Statistics: 11th Ann. Symp. on the Interface, March 1978, pp. 297-307.

BENT79a  BENTLEY, J. L. "Decomposable searching problems," Inf. Process. Lett. 8, 5 (June 1979), 133-136.

BENT79b  BENTLEY, J. L. "Multidimensional binary search trees in database applications," IEEE Trans. Softw. Eng. SE-5, 4 (July 1979), 333-340.

BENT80a  BENTLEY, J. L. "Multidimensional divide-and-conquer," to appear in Comm. ACM.

BENT80b  BENTLEY, J. L., AND MAURER, H. A. "Efficient worst-case data structures for range searching," to appear in Acta Inf.

DOBK76  DOBKIN, D., AND LIPTON, R. J. "Multidimensional searching problems," SIAM J. Comput. 5, 2 (1976), 181-186.

FINK74  FINKEL, R. A., AND BENTLEY, J. L. "Quad trees: a data structure for retrieval on composite keys," Acta Inf. 4, 1 (1974), 1-9.

FRED79  FREDMAN, M. "A near optimal data structure for a type of range query problem," in Proc. 11th ACM Symp. Theory of Computing, May 1979, pp. 62-66.

FRIE75  FRIEDMAN, J. H., BASKETT, F., AND SHUSTEK, L. J. "An algorithm for finding nearest neighbors," IEEE Trans. Comput. C-24, 10 (Oct. 1975), 1000-1006.

FRIE77  FRIEDMAN, J. H., BENTLEY, J. L., AND FINKEL, R. A. "An algorithm for finding best matches in logarithmic expected time," ACM Trans. Math. Softw. 3, 3 (Sept. 1977), 209-226.

GOTL78  GOTLIEB, C. C., AND GOTLIEB, L. R. Data types and structures, Prentice-Hall, Englewood Cliffs, N.J., pp. 357-363.

KNUT73  KNUTH, D. E. The art of computer programming, vol. 3: sorting and searching, Addison-Wesley, Reading, Mass., 1973.

LAUT78  LAUTHER, U. "4-dimensional binary search trees as a means to speed up associative searches in design rule verification of integrated circuits," J. Des. Autom. Fault-Tolerant Comput. 2, 3 (July 1978), 241-247.

LEEC76  LEE, R. C. T., CHIN, Y. H., AND CHANG, S. C. "Application of principal component analysis to multi-key searching," IEEE Trans. Softw. Eng. SE-2, 3 (Sept. 1976), 185-193.

LEEW78  LEE, D. T., AND WONG, C. K. "Worst-case analysis for region and partial region searches in multidimensional binary search trees and quad trees," Acta Inf. 9, 1 (1978), 23-29.

LEEW80  LEE, D. T., AND WONG, C. K. "Quintary trees: a file structure for multidimensional database systems," to appear in ACM Trans. Database Syst.

LEVI66  LEVINTHAL, C. "Molecular model-building by computer," Sci. Am. 214 (June 1966), 42-52.

LINN73  LINN, J. General methods for parallel searching, Tech. Rep. 61, Digital Systems Lab., Stanford U., Stanford, Calif., May 1973.

LIOU77  LIOU, J. H., AND YAO, S. B. "Multi-dimensional clustering for data base organization," Inf. Syst. 2 (1977), 187-198.

LOFT65  LOFTSGAARDEN, D. O., AND QUESENBERRY, C. P. "A nonparametric density function," Ann. Math. Stat. 36 (1965), 1049-1051.

LUEK78  LUEKER, G. "A data structure for orthogonal range queries," in Proc. 19th Symp. Foundations of Computer Science, IEEE, Oct. 1978, pp. 28-34.

LUEK79  LUEKER, G. "A transformation for adding range restriction capability to dynamic data structures for decomposable searching problems," Tech. Rep. 129, U. of California at Irvine, 1979.

RABI76  RABIN, M. O. "Probabilistic algorithms," in Algorithms and complexity: new directions and recent results, J. F. Traub (Ed.), Academic Press, New York, 1976, pp. 21-39.

RIVE76  RIVEST, R. L. "Partial match retrieval algorithms," SIAM J. Comput. 5, 1 (March 1976), 19-50.

SAXE79  SAXE, J. B. "On the number of range queries in k-space," to appear in Discrete Appl. Math.

SHNE77  SHNEIDERMAN, B. "Reduced combined indexes for efficient multiple attribute retrieval," Inf. Syst. 2 (1977), 149-154.

SILV78a  SILVA-FILHO, Y. V. Average case analysis of region search in balanced k-d trees, Rep., U. of Kent, Canterbury, England, Nov. 1978.

SILV78b  SILVA-FILHO, Y. V. Multidimensional search trees as indices of files, Rep., U. of Kent, Canterbury, England, Dec. 1978.

WILL78a  WILLARD, D. E. Predicate-oriented database search algorithms, Rep. TR-20-78, Harvard U. Aiken Lab., 1978.

WILL78b  WILLARD, D. E. "Balanced forests of k-d* trees as a dynamic data structure," informative abstract, Harvard U., Boston, Mass., 1978.

YANG77  YANG, C. "Avoiding redundant record accesses in unsorted multilist file organizations," Inf. Syst. 2 (1977), 155-158.

YANG78  YANG, C. "A class of hybrid list file organizations," Inf. Syst. 3 (1978), 49-58.

YUVA75  YUVAL, G. "Finding near neighbors in k-dimensional space," Inf. Process. Lett. 3, 4 (March 1975), 113-114.

RECEIVED JANUARY 1979; FINAL REVISION ACCEPTED AUGUST 1979.
