Mobile Query Processing Incorporating Server
and Client Based Approaches
by
James Winly Jayaputera, BAppSci(Comp.Sci), MIT
Thesis
for fulfillment of the Requirements for the Degree of
Doctor of Philosophy (0190)
Clayton School of Information Technology
Monash University
September, 2008
Abstract
This thesis studies query processing in a mobile environment. The main objective
is to investigate performance improvements in mobile query processing, focusing
on the server and client sides.
In server-side query processing, we consider single-cell and multi-cell queries,
whereby a cell is the service area within which a single stationary host communicates
with the static network. A quick response to a mobile query is important, because
mobile users invariably move to another location while awaiting the query result.
To handle such a dynamic situation, we propose solutions to answer single-cell
and multi-cell queries. The proposed solutions for processing single-cell queries are
divided into static query scopes, dynamic query scopes, and angle of movement. The
static and dynamic query scopes are extended to process multi-cell queries. Furthermore,
another solution is added to deal with situations where the areas of several
base stations are either disjoint or overlapping. Finally, our algorithms also handle
disconnections which occur during query result transmission from a base station to
the mobile users.
Indexing mechanisms are important for speeding up query processing, especially
for handling multi-cell queries. We propose two indexing mechanisms, called the Local
Index and Global Index mechanisms. The local index stores indexes of requested
objects in a limited number of slots, whereas the global index is built while a base
station is starting up. For both mechanisms, we developed algorithms to deal with
the existence and non-existence of replicated objects at the requested cell.
Frequent disconnection is a common problem in a mobile environment, so
providing a cache in a mobile device is an important consideration. A cache is
useful if many repeated queries can be answered from it. Due to the limited
storage space of mobile devices, we have developed three cache
replacement policies, called the Path-based, Density-based and Probability Density Area
Inverse Distance (PDAID) mechanisms, which are based on distance, weight and
cost factors respectively.
In order to analyse the behaviour of the proposed methods, we have implemented
and simulated each algorithm, and the performance results are compared and
analysed. The server-side query processing shows an improvement
in the total number of retrieved objects, while the query processing time and the amount of
data transfer are reduced. Furthermore, the server is able to decide whether the
next query result needs to be produced when a mobile user misses the current
query result. The proposed indexing mechanisms reduce the execution time
compared with the conventional approach to processing multi-cell queries. The
proposed approaches for the client side also improve the cache-hit rate while
reducing the amount of data transfer.
Declaration
I declare that this thesis is my own work and has not been submitted in any form for another degree or diploma at any university or other institute of tertiary education. Information derived from the published and unpublished work of others has been acknowledged in the text and a list of references is given.
James Winly Jayaputera
September 20, 2008
Acknowledgments
This thesis would never have come into existence without precious encouragement,
guidance, and both personal and academic support from two of my supervisors, Dr.
David Taniar and Professor Bala Srinivasan.
I would like to dedicate this thesis to my family, who have supported me to the
end of this journey. Without them, I would not have completed this thesis.
I would also like to thank all of my friends, without mentioning them individually,
who helped to make this possible, and Bruna Pomella for correcting
Table 2.2: Comparison of Wireless LAN standards 802.11b and 802.11a

Factor                   802.11b                          802.11a
Modulation               Spread Spectrum                  OFDM (Orthogonal Frequency
                                                         Division Multiplexing)
Distance coverage        Up to 300 feet                   60 feet; speed goes down
                                                         with increased distance
Maturity                 More matured products            Less matured but
                                                         progressing fast
Number of access         Every 200 feet in each           Every 50 feet
points required          direction
Market penetration       Quite widespread                 Just starting in 2002
Interference with        Band is more polluted;           Less interference because
other devices            significant interference here    few devices in this band
Interoperability         Current problems expected to     Problems now but expect
                         be resolved in future            resolution soon
Cost                     Cheaper: $300 for access point   More expensive: $500 (in
                         and $75 for adapter              2001/2002); will come down
Vendors                  Major vendors in both camps
CHAPTER 2. LITERATURE REVIEW 16
On the other hand, Table 2.2 [79] shows a comparison of the Wireless LAN standards
802.11a and 802.11b in more detail, considering several factors.
• Broadband Wireless Network
Broadband Wireless (BW) [81] is a wireless technology, recently deployed in
metropolitan areas, that allows the simultaneous wireless delivery of voice,
data, and video. It requires a clear line of sight between the transmitter
and the mobile computing devices. Two types of this technology are Local
Multi-point Distribution Service (LMDS) and Multi-channel Multi-point Distribution
Service (MMDS). The first, LMDS, uses a high-bandwidth wireless
frequency within the range of 20-31 GHz. The second, MMDS, uses a lower-bandwidth
wireless frequency of around 2 GHz and has a coverage of up to 35
miles (roughly 56 km).
• Wide Area Wireless/Radio Network
Wide Area Wireless is designed to provide data transmission and its infrastruc-
ture consists of base stations, network control centres and switches to transmit
the data [127]. The characteristics of Wide Area Wireless are high mobility,
wide ranging and low data rate digital communication [88, 127]. This network
type can be categorised into public and private radio networks [88]. The first
category is wireless data communication supplied to the public by service
providers, with data rates ranging from 4800 bps to 19.2 Kbps [127]. The second
category is provided by a private company for its own purposes. Examples of
public packet data networks are ARDIS, CDPD, Ericsson's Enhanced Digital
Access Communication Systems (EDACS), Metricom, Mobitex and Motorola
Datatrac [33].
• Satellite-based Network
The satellite network has been used to deliver communications, relaying
voice, video or data, since the 1960s [26]. The satellite-based network is
characterised by wide coverage, high cost, two-way communication and low
voice quality. Its wide area coverage spans oceans as well as remote land
areas [70]. It provides two-way communication; however, voice quality is low
and data capacity is limited [127, 88]. It is also expensive to provide this
type of network [31].
There are three common terms used for these satellites based on their dis-
tance and spatial relationship with the earth, namely GEOstationary Satellites
(GEOS), Medium Earth Orbit Satellites (MEOS) and Low Earth Orbit Satel-
lites (LEOS) [88, 31, 110]. GEOS, MEOS and LEOS are located at altitudes
of 35,786 km, 10,000 km and 1,000 km respectively.
• Cellular Network
The cellular network has evolved from the first generation to the fourth generation.
The first generation (1G) of cellular systems appeared in the early 1980s and
was based on analog technology [6]. Voice is transmitted using Frequency Modulation
(FM) [88]. The first generation is characterised by low capacity, lack of
security, and unsuitability for non-voice applications [6]. The data transfer rate
is 1.2-9.6 Kbps [88].
In the early 1990s, the second generation (2G) of cellular systems appeared
and was heralded by the arrival of digital modulation techniques that promised
increased capacity, better speech quality, enhanced security features, and more
efficient terminals [6]. It has a data transfer rate of 9 to 14 Kbps [88]. Examples
of the second generation cellular network include Time Division Multiple
Access (TDMA), Code Division Multiple Access (CDMA), Global System
for Mobile Communications (GSM), and Personal Digital Cellular (PDC).
The second-and-a-half generation (2.5G) is an enhancement of the second generation.
Examples include Enhanced Data Rates for Global Evolution (EDGE),
High-Speed Circuit-Switched Data (HSCSD) and General Packet Radio Services
(GPRS). Their data transfer rates are 474 Kbps, 38.4 Kbps and 171.2 Kbps
respectively [82].
The third generation was developed in 1992. Examples of the third generation
include the Universal Mobile Telecommunications System (UMTS) and Code
Division Multiple Access 2000 (CDMA2000). This generation has three categories
of data rates, as follows [6]:
– 2.4 Mbps for stationary users (fixed location)
– 384 Kbps for pedestrian users (travel speed: 3 km/hour)
– 144 Kbps for vehicular users (travel speed: 60 km/hour)
The next generation of the 3G wireless network is 3.5G, with data rates of
3 Mbps [29].
The fourth generation has not officially been released yet, but it is expected
that this generation will support data rates of up to 1 Gbps [53].
As we mentioned earlier in this section, a cell is a service area for one BS, where
each cell may have the same or a different size. According to [71, 35], cells are
classified into three types: Macro, Micro and Pico cells. A Macro cell has a
radius of 700-8000 metres, a data transfer rate of 144-384 Kbps and a
bandwidth frequency of 11.34 MHz. A Micro cell has a radius of 75-700 metres,
a data transfer rate of 384 Kbps and a bandwidth frequency of 1.26 MHz. A Pico
cell has a radius of 20-75 metres, a data transfer rate of 384 Kbps - 2 Mbps
and a bandwidth frequency of 1.26 MHz.
2.2.2 Location Positioning Systems
This section discusses available location positioning systems, which are used to register
mobile user details so that mobile users can use a wireless facility.
• Satellite Positioning
The most popular satellite positioning system is the Global Positioning System (GPS).
This system provides two basic types of services: the Standard Positioning
Service (SPS) and the Precise Positioning Service (PPS) [56]. The SPS is a
positioning and timing service focusing on civilian users, whereas the PPS
is a positioning, velocity, and timing service for military applications. The
second service is restricted to authorised users only (such as the United States
and allied military and US government agencies). Another satellite positioning system,
Galileo, is expected to start operation in 2009.
• Cellular Positioning
Cellular positioning integrates GPS so that the cellular
network provides terminals with assistance data and corrections for the satellites [56].
Examples of cellular positioning for the second generation cellular network
(GSM, which stands for Global System for Mobile Communications) are Cell-Id in
combination with timing advance, Enhanced Observed Time Difference (E-OTD),
Uplink Time Difference of Arrival (U-TDoA), and Assisted GPS (A-GPS).
The introduction of Cell-Id and A-GPS into existing GSM networks is
comparatively simple, while E-OTD and U-TDoA require substantial modifications
and extensions.
Table 2.3: Performance characteristics of cellular positioning methods

Method         Accuracy                                Consistency  Yield
               Rural      Suburban    Urban
Cell-Id        >10 km     2–10 km     50–1,000 m       Poor         Good
E-OTD & OTDoA  50–150 m   50–250 m    50–300 m         Average      Average
U-TDoA         50–120 m   40–50 m     40–50 m          Average      Average
A-GPS          10–40 m    20–100 m    30–150 m         Good         Good
Examples of cellular positioning for the third generation cellular network
(UMTS) are cell-based methods, Observed Time Difference of Arrival with Idle
Period Downlink (OTDoA-IPDL), and Assisted GPS (A-GPS). Table 2.3 shows the
performance characteristics of each cellular positioning method [56]. From the
table, A-GPS is the most accurate and consistent of the methods, even though
its service area is the smallest compared with the others.
Assisted GPS (A-GPS) is a hybrid solution that uses information from both the
satellites and the network [4]. This technology enables a mobile terminal with a
GPS receiver to be positioned faster and more accurately [112]. The A-GPS
equipment is located at BSs and feeds information to mobile computing devices. This
technology has been used in the “KDDI au network” in Japan [112]. The advantages
of using A-GPS are: (i) improved accuracy, (ii) reduced position
acquisition time, (iii) less power consumption at the GPS receiver, and (iv)
increased receiver sensitivity [4].
• Indoor Positioning
This positioning system operates within an indoor or local environment, such
as shopping centres or buildings. There are four indoor-based positioning systems:
WLAN-based, Radio Frequency Identification (RFID)-based, infrared-based
and ultrasound-based. The first method is the most popular and uses IEEE
802.11 devices. RFID-based positioning is an emerging technology that is
primarily used today for applications such as asset management, access control,
textile identification, toll collection and factory automation [56].
Some such projects include Xerox ParcTab [117], the Wireless Indoor Position-
ing System (WIPS) project [119], Active Bat [118] and the Cricket system [92].
The first two projects use infrared-based positioning [117, 119]. The last two
projects use ultrasound, and a combination of ultrasound and radio, respectively
[118, 92].
2.3 Query Types
This section describes the classification of query types in a mobile environment. Query
types are divided into two general classes: Traditional and Mobile Queries. The
traditional query category contains common query types that exist in a wired
network database, whereas the mobile query category contains queries that exist only in a
wireless environment.
Figure 2.3 shows the query type classifications in a mobile environment. Traditional
queries are the typical database queries. If we classify traditional queries
based on geographical presentation, they can be divided into two
classes: Location-Aware and Non-location queries. In the mobile computing environment,
the location of mobile users is dynamic and the query results often depend on this
dynamic location. Therefore, this situation creates an additional class,
called Location-Dependent Queries.
Figure 2.3: Query types classification
2.3.1 Traditional Query
The traditional query is the most widely known query type used in a database. Traditional
queries can be classified as: Spatial, Temporal, Spatio-Temporal
(Hybrid) and Others.
A Spatial query performs operations which include spatial searches and map
overlay, as well as distance-related operations [37]. A spatial query always requests
spatial data. Spatial data have a complex structure, are often dynamic, and
have no standard algebra defined for them.
A Temporal query specifies a validity or deadline for the query results to be
returned. For example: “A student retrieves a subject timetable for this year”. The
subject timetable will not be valid for a past or future year.
A Spatio-Temporal query requests a spatial search and
specifies the validity or deadline for the query results to be received. For example:
“Retrieve the five ambulances that were nearest to the location of the accident
between 4-5pm.” [90].
The last category, Others, contains the remaining queries that do not
belong to any of the classifications above. Examples:
• A tourist requests restaurant information.
• Students request their academic records or contact details.
2.3.2 Location Query
[50] were the first authors to introduce the idea of queries with location constraints.
These queries have one parameter, location, and the
query result is related to, or depends on, that parameter.
A Location-Dependent Query [130, 63, 94] is a type of query whose answers
depend on the current location of the requester. For example, “select all restaurants
within 500 metres from my location”. The answer should give a list of restaurants
within 500 metres of the current location of the requester, as illustrated in Figure
2.4. If the requester moves to a new location, the list of restaurants will change.
The location is an important field in this type of query and it can be implicitly
or explicitly mentioned in the query [94].
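The restaurant example can be sketched as a simple distance filter over object positions, re-evaluated whenever the requester moves; the names and coordinates below are illustrative, not from the thesis:

```python
import math

def location_dependent_query(requester, objects, radius_m):
    """Return the names of all objects within radius_m metres of the requester.

    requester: (x, y) position in metres; objects: dict of name -> (x, y).
    Because the answer depends on the requester's current location, it must
    be recomputed after the requester moves.
    """
    rx, ry = requester
    return sorted(name for name, (ox, oy) in objects.items()
                  if math.hypot(ox - rx, oy - ry) <= radius_m)

# Hypothetical restaurants around the origin, positions in metres:
restaurants = {"A": (100, 0), "B": (300, 400), "C": (900, 0)}
print(location_dependent_query((0, 0), restaurants, 500))    # ['A', 'B']
print(location_dependent_query((800, 0), restaurants, 500))  # ['C']: the answer changed with the location
```

The second call illustrates why such a result can become invalid in transit: the same query, issued from the new position, returns a different list.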
These queries can be further categorised into two groups: the first group
is based on sources and objects, and the second is based on query retrieval [113].
The sources and objects are the users sending the query and the
objects being searched; their states can be either static or moving. The second group is
based on the state of query retrieval: either one-time or continuous. A one-time
As we mentioned in the earlier section, mobile users need to register with a Base
Station or location positioning device (as mentioned in Section 2.2.2) in order to use
a wireless facility. This registration process includes registering location details of
mobile users [67, 19].
Once the registration process is complete, a location-dependent query sent
by a moving user is received by a base station. In processing this
query, user mobility factors are considered, since they are important in
answering a location-dependent query [30]. The mobility factors include the current
position, velocity and direction of the user, all of which are linked to the query.
This information is used to predict the user's next location. Once the next position is
known, the server probes its database to match the object information against the
user query.
While the query is being processed, the mobile user may move to another location, which
could be inside the same cell or another cell. In addition, the movement of a
mobile user can be differentiated into two categories: Constrained and Unconstrained
movement [85]. The former is movement within a network; for example, users
may be driving a car, riding a bicycle, or travelling by tram or train, and roads
can be either one-way or two-way. The latter is movement that is not
restricted, for example, walking.
Furthermore, there are three important situations that will lead to wrong answers
being given to the recipients. To illustrate, consider that
most wireless applications use GPS to obtain an accurate location, and suppose the
server is starting to process the query. In one situation, the user might disconnect
while reporting its position to the GPS system. The system therefore does not have
the correct location information for the user, and gives the server the old location
instead of the current one. In another situation, the server uses the current location
information collected from the GPS system, but the location information given is not
current, since the system has not been updated with the latest position. A third
situation might occur where the user is expected to enter cell A but instead enters cell B.
This situation will lead the server in cell A to process unnecessary requests.
Figure 2.5: Requesting a static object and moving within a single cell.
Figure 2.5 shows a global overview of location-dependent query processing within
a single cell. A mobile user sends a location-dependent query to a server, asking for
static objects through a BS. The server generates a query result for that query. The
query result is received by the user, but the result is invalid: the user
has reached a new location, and the result does not apply to any objects within
the query scope of the new location.
Figure 2.6: Requesting a static object and moving to another cell.
Figure 2.6 illustrates requesting a static object with multi-cell movement.
A moving user requests a static object while moving into another cell. The
server processes the query and sends the query results to the requester. Since the
requester has moved to another cell, the BS forwards the query result to the BS
where the requester is now located. When the query result is received by the requester,
it is invalid, since it contains object information from the
previous location.
Figure 2.7: Requesting a moving object and moving within a single cell.
A global overview of requesting moving objects while moving within a
single cell is shown in Figure 2.7. In the figure, a moving truck requests a moving
object from its location. At the same time, the moving object is registering itself with
the BS. The server processes the query and returns a result to the requester, which
is the moving truck. By the time the query result is received, the moving object has moved to
another location which is out of range of the query scope. Therefore, the truck
receives an invalid query result.
Figure 2.8: Requesting a moving object (user and object moves to another cell).
Figure 2.8 shows a user searching for a moving object where both move to another
cell. While sending the query, the user moves to another location inside a different
cell. The server generates the query result and sends it back to
the requester. At the same time, the object moves to another cell before it
reports its new position to the current cell. Therefore, the server sends the
old position of the moving object to the requester, and the user
receives an invalid result.
Figure 2.9: Requesting a moving object and user stays at same position.
Figure 2.9 shows a user in the control room requesting a moving object. If the
object updates its position before the server processes the query, the
server generates a correct result that is received by the user. On the other hand,
if the server finishes processing the query before the object updates its position,
the user receives incorrect information.
Figure 2.10: Static user requests a moving object.

Figure 2.10 shows a similar situation to that shown in Figure 2.9. However,
the object moves to another cell. When the object updates its location before the
server has finished processing the query, the user receives the correct information.
Otherwise, the user will receive a query result that contains incorrect information.
Figure 2.11: Periodic query
Figure 2.11 presents a periodic query illustration. In the figure, a user sends a
query once while expecting a query result to be sent at every time interval. The
server processes the query and sends the result at every time interval. The process
ends when the user asks the server to stop sending query results.
Figure 2.12: Non periodic query
Figure 2.12 presents an illustration of a one-time query, where a user sends a
query and receives a query result once. After that, the server no longer processes the query.
2.4.2 Query Processing for a Single Cell
This section presents query processing mechanisms that focus in particular on
location-dependent query processing while the mobile user is travelling within a
single cell. The discussion involves a variety of query scope shapes
and approaches to predicting the next location.
A number of scope shapes exist, such as rectangles, circles, polygons and hexagons [75].
Defining a valid scope for a mobile client is important for generating a correct answer
to a given query, since the mobile user may have moved to a new location. In this section,
we analyse previous studies on defining a valid scope. The existing works focused on
defining a valid scope using polygons, rectangles and circles.
• Polygon
An approach called Polygonal Endpoints (PE) uses a polygon shape to process
a location-dependent query [130]. The PE scheme is a direct way to express the
valid scope of a data value: all endpoints of the polygon are
recorded to define the valid scope.
• Circle
Another way to define a valid scope is by using the Approximate Circle (AC)
scheme. The AC scheme is one of the most convenient ways to generate a
valid scope if we know the distance within which the user would like to find
an object. In the AC scheme, a valid scope is defined by the centre of the
circle and a radius value. The maximum size of the circle can be derived
from the current velocity of the user [128]. The advantage is that the size of
the valid scope can be predicted from the current speed over a time interval.
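A minimal sketch of an AC-style valid scope, assuming (as one reading of [128]) that the radius is derived as current speed multiplied by a time interval; the function names are our own:

```python
import math

def valid_scope(centre, speed_mps, interval_s):
    """AC scheme sketch: a circular valid scope defined by a centre and a
    radius derived from the user's current speed over a time interval."""
    return centre, speed_mps * interval_s

def in_scope(point, scope):
    """Check whether a point lies inside the circular valid scope."""
    (cx, cy), radius = scope
    return math.hypot(point[0] - cx, point[1] - cy) <= radius

# A user at the origin moving at 10 m/s, with a 30 s interval: radius 300 m.
scope = valid_scope((0.0, 0.0), speed_mps=10.0, interval_s=30.0)
print(in_scope((200.0, 100.0), scope))  # True
print(in_scope((400.0, 0.0), scope))    # False: outside the valid scope
```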
As mentioned earlier, the movement of a mobile user can be within either a con-
strained or unconstrained network. There are two ways to predict user movement:
using a time function and indexing.
The Moving Objects Spatio-Temporal (MOST) model, devised by [102], is a data
modelling concept that represents the position of moving objects in databases as a
function of time. The aim of this approach is to estimate the position of objects
when a query is entered, so that excessive updates are avoided. In their work,
the location of a moving object is modelled as a dynamic attribute which is divided
into three sub-attributes: function, updatetime and initial value. The function
denotes how the value of the dynamic attribute changes over time, and it
can answer both one-time and periodic query types.
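A minimal sketch of a MOST-style dynamic attribute with the three sub-attributes named above; the linear motion function and field names are illustrative assumptions, not the authors' exact formulation:

```python
from dataclasses import dataclass

@dataclass
class DynamicAttribute:
    """A dynamic attribute with the three MOST sub-attributes: an initial
    value, an update time, and a function (here linear motion,
    parameterised by a velocity vector)."""
    initial: tuple      # (x, y) position recorded at updatetime
    updatetime: float   # time of the last explicit update
    velocity: tuple     # (vx, vy) parameters of the motion function

    def value(self, t):
        """Estimate the position at query time t without a fresh update."""
        dt = t - self.updatetime
        return (self.initial[0] + self.velocity[0] * dt,
                self.initial[1] + self.velocity[1] * dt)

pos = DynamicAttribute(initial=(0.0, 0.0), updatetime=100.0, velocity=(2.0, 1.0))
print(pos.value(110.0))  # (20.0, 10.0): inferred 10 s after the last update
```

Because the server can evaluate the function at any query time, the object only needs to send an update when its motion parameters change, which is how excessive updates are avoided.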
Another way to determine the next location of moving objects is by using indexing.
Section 2.5 discusses indexing structures in detail.
In order to answer a query in an efficient way, the query or object space is partitioned
into several regions. [103] provided a solution to answer Reverse Nearest
Neighbour (RNN) queries in two-dimensional space. They divide the space around
the client location into six equal regions using straight lines intersecting the client
location. Thus, there exist at most six RNN objects around the client location.
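The six-region idea can be sketched by bucketing objects into 60-degree sectors around the client and keeping only the nearest object per sector, so at most six RNN candidates remain; the sector orientation chosen here is arbitrary:

```python
import math

def rnn_candidates(client, objects):
    """Partition the plane around `client` into six 60-degree sectors and
    keep only the nearest object in each sector, so at most six Reverse
    Nearest Neighbour candidates remain."""
    best = {}  # sector index -> (distance, object)
    cx, cy = client
    for obj in objects:
        dx, dy = obj[0] - cx, obj[1] - cy
        distance = math.hypot(dx, dy)
        # Normalise the angle to [0, 2*pi) and floor-divide by 60 degrees.
        sector = int(math.atan2(dy, dx) % (2 * math.pi) // (math.pi / 3))
        if sector not in best or distance < best[sector][0]:
            best[sector] = (distance, obj)
    return [obj for _, obj in best.values()]

points = [(1, 0.1), (5, 0.2), (-1, 1), (0.1, 1)]
print(rnn_candidates((0, 0), points))  # (5, 0.2) is pruned: a closer point shares its sector
```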
Moreover, the Region Quad-tree is an indexing structure
which uses a minimum bounding rectangle to store data points in four quadrants
of equal size [97, 98]. Section 2.5 presents more details on Quad-tree indexing
structures.
2.4.3 Query Processing for Multiple Cells
While travelling, a user may move to another cell, since cell boundaries are
transparent to the user. When a user moves to another cell, a handover event occurs.
The current base station may send an invalid query result after
the user moves to another cell.
Zheng et al. [128] categorised various handover mechanisms into four types:
Naive, Priority, Intelligent and Hybrid. The Naive method is the simplest
of the four to implement; however, the waiting time for
answers from a server is shorter compared with the Priority method, which can
answer the queries of normal users unless the number of urgent users keeps increasing.
The Hybrid method does not give a better result because, if the number of users is
large, the waiting time lengthens. The Intelligent method gives a better
result, since it calculates the expected time to leave the current
cell. If the expected time to leave the current cell is known, the BSs
of the new cells know when to process the queries, assuming that no unexpected
delays occur.
On the other hand, [73] proposed four other handover approaches, namely Ping-Pong
Avoidance (PPA), Towards the Border (TTB), MGIS Data Resolution (MDR)
and Transmission Power and Interference Optimization (TPIO). In the PPA approach,
undesirable handoffs can be minimised by using area
information and a mobility model to predict users' movement. The TTB approach is useful for
predicting when users will reach the boundary of the BS.
The Intelligent and TTB approaches have the same purpose, which is to predict
how long it takes users to reach the BS boundary. However, the Intelligent
approach computes the arrival time in the new BS coverage in a straightforward
way and ignores the movement direction. In contrast, the TTB approach
considers the user's direction in computing this time.
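The contrast between the two estimates can be sketched numerically, under the simplifying assumptions of a circular cell centred at the BS and straight-line motion (not how [128] or [73] implement it):

```python
import math

def time_to_boundary_simple(dist_to_bs, radius, speed):
    """Direction-agnostic estimate: remaining radial distance over speed."""
    return (radius - dist_to_bs) / speed

def time_to_boundary_directed(pos, vel, radius):
    """TTB-style estimate: solve |pos + vel * t| = radius for the earliest
    exit time t >= 0, given straight-line motion inside a circular cell
    centred at the BS."""
    px, py = pos
    vx, vy = vel
    a = vx * vx + vy * vy
    b = 2 * (px * vx + py * vy)
    c = px * px + py * py - radius * radius
    disc = b * b - 4 * a * c  # non-negative while the user is inside the cell
    return (-b + math.sqrt(disc)) / (2 * a)  # larger root: the exit time

# A user 600 m from the BS in a 1000 m cell, moving at 20 m/s tangentially:
print(time_to_boundary_simple(600.0, 1000.0, 20.0))                  # 20.0 s
print(time_to_boundary_directed((600.0, 0.0), (0.0, 20.0), 1000.0))  # 40.0 s
```

The tangential case shows why ignoring direction underestimates the remaining time: the user is not heading straight for the boundary, so the directed estimate is twice as long here.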
2.5 Indexing Structures for Query Processing
The indexing technique is a common mechanism that helps in accessing a collection of
records and improves the efficiency of query processing [93, 129]. It
uses an index structure, which is a data structure that organises data records to
optimise certain kinds of retrieval operations. An index allows us to efficiently load
all records that match search conditions on the search key fields of the index.
This section discusses existing indexing mechanisms for conventional and mobile
query processing, together with their outstanding problems.
2.5.1 Conventional Index Query Processing
To answer queries efficiently, database records are indexed
and placed into an index structure. Various types of index structures have been
developed [93, 32, 54]. Amongst the existing index mechanisms, tree-based
schemes are prominent and widely used due to their easy traversal [123, 115].
The B+-tree [32] is one of the most widely known index data structures; it contains
a subtree of non-leaf nodes and a set of leaf nodes. A subtree is formed by
a collection of non-leaf nodes. A non-leaf node contains up to m keys and m+1
pointers to the nodes on the next level of the tree hierarchy. All nodes on the left-hand
side of a parent key have key values less than or equal to that key;
in contrast, the nodes on the right-hand side have key values
greater than it. The bottom-most nodes are called
leaf nodes.
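A minimal sketch of a B+-tree lookup following the key ordering described above (keys less than or equal to a parent key lie in the left subtree); the node layout and tiny two-level tree are illustrative:

```python
import bisect

class Node:
    def __init__(self, keys, children=None, values=None):
        self.keys = keys          # up to m sorted keys
        self.children = children  # m+1 child pointers (non-leaf nodes only)
        self.values = values      # records (leaf nodes only)

def search(node, key):
    """Descend from the root along a single path to a leaf, then look the
    key up in that leaf. Keys <= a parent key lie in the left subtree, so
    we follow the first child whose separator key is >= the search key."""
    while node.children is not None:
        node = node.children[bisect.bisect_left(node.keys, key)]
    if key in node.keys:
        return node.values[node.keys.index(key)]
    return None

leaf1 = Node(keys=[5, 10], values=["a", "b"])
leaf2 = Node(keys=[20, 30], values=["c", "d"])
root = Node(keys=[10], children=[leaf1, leaf2])
print(search(root, 20))  # c
print(search(root, 7))   # None
```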
The R-tree index structure, developed by Guttman [41], stores multi-dimensional
data (such as points). This index structure can handle both point and region data
efficiently. Many researchers [109, 41, 11, 99], to name a few, extended the features
of the original R-tree into many variations, with the aim of providing an efficient and
dynamic index structure for spatial data.
The structure of the R-tree is similar to that of the B-tree indexing structure;
Figure 2.13 illustrates the R-tree. In the B-tree, a node of the tree holds single
index entries; in the R-tree, however, a node stores a set of d-dimensional geometric
objects represented as rectangles. A Minimum Bounding Rectangle (MBR) is used
to group the closest objects together into a rectangle with the least
enlargement of area.
The R-tree insertion operation can be explained as follows. Assume that several data points are to be inserted into an R-tree with a maximum of 6 points per node. When inserting a data point with id p into rectangle R, a bounding box is computed for the object and the pair <p,R> is inserted into the tree. The bounding box is enlarged as data points are inserted. If the tree is empty, this bounding box becomes the root node of the tree.
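The bounding-box enlargement used throughout this insertion procedure can be sketched as follows. The rectangle representation (xmin, ymin, xmax, ymax) and the function names are assumptions made for illustration.

```python
# Sketch of the MBR enlargement step in R-tree insertion: the smallest
# rectangle covering both the existing MBR and a newly inserted point,
# and the area increase used when choosing a subtree.

def enlarge(mbr, point):
    """Return the smallest rectangle covering the MBR and the point."""
    xmin, ymin, xmax, ymax = mbr
    px, py = point
    return (min(xmin, px), min(ymin, py), max(xmax, px), max(ymax, py))

def area(mbr):
    xmin, ymin, xmax, ymax = mbr
    return (xmax - xmin) * (ymax - ymin)

r = (0, 0, 4, 4)
assert enlarge(r, (6, 2)) == (0, 0, 6, 4)
# Enlargement cost: new area 24 minus old area 16.
assert area(enlarge(r, (6, 2))) - area(r) == 8
```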
(a) R-tree in two-dimensional space; (b) R-tree
Figure 2.13: The R-tree illustration [93]
When the maximum number of points for a bounding box has been reached, a new bounding box is created to accommodate new data points, and the existing objects are redistributed between the boxes. This adjustment of the bounding boxes is called splitting. In general, a splitting process groups the objects in the existing bounding box so as to minimise the need for enlargement. Once the splitting has been completed, both nodes become leaf nodes and a new root node is created to cover both bounding boxes.
Now, assume that an R-tree exists and a data point with id d is to be inserted. A traversal starts at the root node and follows a single path from the root to a leaf. At each level, the child node is chosen whose bounding box requires the least enlargement to cover the data point d. If several children have bounding boxes that cover d, the one with the smallest bounding box is selected. At the leaf level, the data point is inserted and, if necessary, the bounding box of the leaf is enlarged to cover d. Any such enlargement must be propagated to the ancestors of the leaf after the insertion is made, since the bounding box of every node must cover the bounding boxes of all its descendants. If the leaf node lacks space for the new object, a process similar to the one above is applied: the node is split, entries are reallocated between the old leaf and the new node, the bounding boxes are adjusted, and these changes are propagated up the tree. Algorithm 2.1 shows the R-tree insertion algorithm for inserting a data entry E(I,B).
Algorithm 2.1: The R-tree insertion algorithm.
begin
    N ← root node
    if N is a leaf then
        return N
    end
    Select an entry A in N whose bounding box A.I needs the least enlargement to store E.I
    Traverse until a leaf node is reached by setting N to the child node pointed to by A
    if the selected node A is a leaf node and has free space for E then
        Insert E
    else
        Split node A using one of the splitting algorithms
    end
    Propagate any changes upwards by invoking Adjust Tree
    if Adjust Tree requires the root node to be split then
        Increase the height of the tree
    end
end
Algorithm 2.2 shows the tree-adjustment algorithm. The process starts from a leaf node and proceeds upwards until it reaches the root of the tree. When a node overflows because of the insertion of a new record or a previous split, a new node is created to store the remaining contents of the existing node. The adjustment is propagated to the parent node, and so on, until the root node is reached.
Algorithm 2.2: The adjusting R-tree algorithm.
begin
    N ← a leaf node
    if N was split previously then
        NN ← LL, where LL is the second split node
    end
    while N ≠ root node do
        P ← parent node of N
        PN ← the bounding box of N in P
        Adjust PN.I so that it tightly covers all bounding box entries in N
        if NN is the partner of N resulting from an earlier split then
            Produce a new bounding box PNN which covers all rectangles in NN, and the pointer PNN.ptr pointing to NN in P
            Add this new bounding box PNN to P
            if P has no free slot then
                Execute Split Node to separate the contents of P into P and PP
            end
        end
        N ← P
        NN ← PP
    end
end
When a node is full, node splitting occurs. The splitting mechanism is not as simple as for the B-tree, since MBRs may overlap. The original R-tree proposed three splitting mechanisms [41], as follows:
• Linear
This splitting algorithm selects, as seeds, two entries that are far apart. The remaining entries are then considered in random order, and each is allocated to the group whose MBR requires the smallest enlargement.
• Quadratic
This algorithm tries to minimise the area of a split; however, it is not guaranteed to produce the smallest area possible. Similar to the Linear mechanism, it first selects as seeds the two entries whose grouping would waste the most area, and then allocates each remaining entry to the group that requires the least expansion.
• Exponential
This algorithm is the most straightforward of the three candidates, though also the most expensive: it enumerates all possible groupings and selects the best one, so the minimum-area split is always found.
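The seed-picking step of the Quadratic strategy above can be illustrated with a short sketch: choose the pair of rectangles whose combined bounding box wastes the most area. The rectangle representation and function names are assumptions for illustration, not Guttman's original code.

```python
# Sketch of the quadratic split's seed selection: the pair of entries with
# maximum dead space (area of combined MBR minus the two entry areas).

from itertools import combinations

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def combine(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

def pick_seeds(entries):
    """Return the pair maximising area(combined) - area(a) - area(b)."""
    return max(combinations(entries, 2),
               key=lambda p: area(combine(*p)) - area(p[0]) - area(p[1]))

rects = [(0, 0, 1, 1), (0, 0, 2, 2), (9, 9, 10, 10)]
a, b = pick_seeds(rects)
# The two far-apart rectangles waste the most area when grouped together.
assert {a, b} == {(0, 0, 1, 1), (9, 9, 10, 10)}
```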
To search for a point matching a query Q in an R-tree, a traversal begins at the R-tree root node and proceeds towards the leaf level. The bounding box of each child of the root is checked to see whether it overlaps the query. If more than one child of the root has a bounding box that overlaps Q, all corresponding subtrees are traversed. At the leaf level, the node is checked to determine whether it contains the desired point. It is also possible that no leaf node is visited at all, if the query point is not in the indexed dataset. Algorithm 2.3 shows the search algorithm.
R-trees can also be used to answer Nearest-Neighbour (NN) queries, which find the objects closest to a query point [96]. Two ordering metrics, the Minimum Distance (MinDist) and the Minimum of Maximum possible distances (MinMaxDist), are used in the R-tree search algorithm. MinDist is used to decide the closest possible distance from a point P to any object enclosed in a rectangle R. MinMaxDist is the minimum of the maximum distances between the query point and points on each of the n axes respectively; this metric guarantees that there is an object within the MBR at a distance less than or equal to MinMaxDist.
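The MinDist metric can be sketched as the squared distance from the query point to the nearest face of an MBR, which is zero when the point lies inside the rectangle. The 2-D rectangle format (xmin, ymin, xmax, ymax) and the use of squared distance are assumptions made for illustration.

```python
# Sketch of MinDist(P, R): squared distance from a query point to the
# nearest face of an MBR; zero if the point is inside the rectangle.

def min_dist(point, mbr):
    px, py = point
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - px, 0, px - xmax)   # 0 when px is within [xmin, xmax]
    dy = max(ymin - py, 0, py - ymax)
    return dx * dx + dy * dy

r = (2, 2, 5, 5)
assert min_dist((3, 3), r) == 0     # inside the MBR
assert min_dist((0, 3), r) == 4     # 2 units left of the box
assert min_dist((7, 7), r) == 8     # diagonal to the corner (5, 5)
```

During the NN search, subtrees are visited in increasing MinDist order, so rectangles that cannot contain a closer object are never descended into.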
Algorithm 2.3: The R-tree search algorithm.
begin
    N ← the root node
    if N is not a leaf node then
        Find each child of the current node whose bounding box overlaps the query point / region
        if found then
            Recursively search that child of the node
        end
    else
        Verify all entries to discover whether an entry overlaps the query
        Return the entries that overlap the query point / region
    end
end
Algorithm 2.4 shows the Nearest-Neighbour search algorithm, which uses a depth-first traversal starting from the R-tree root node down to the leaf level. Initially, the value of nearestN (the nearest-neighbour distance) is infinity. At each non-leaf node visited during the downward traversal, the algorithm calculates the ordering metrics for all of its MBRs and sorts the corresponding entries into a list called the Active Branch List (ABL).
Once the ABL has been created, pruning strategies 1 and 2 are applied to the list to eliminate unnecessary branches. The algorithm then iterates through the entries of the ABL until it is empty, calling itself recursively with the entry, the query point and the current nearestN value. At a leaf node, a function objectDIST is called to calculate the distance between the point and each object; if the returned value is smaller than the current value of nearestN, nearestN is updated. This step is repeated for each entry in the leaf node. On returning from the recursion, the new estimate of the NN distance is taken and pruning strategy 3 is applied to eliminate all branches with MinDist(P,M) > Nearest for all MBRs M in the ABL.

Algorithm 2.4: The Nearest-Neighbour search algorithm.
begin
    currNode ← Node                    // current node
    searchPoint ← Point                // search point
    nearestN ← Nearest                 // nearest neighbour
    if Node is a leaf then
        // At leaf level - compute distance to actual objects
        for i ← 1 to Node.count do
            dist ← objectDIST(Point, Node.branch[i].rect)
            if dist < Nearest.dist then
                Nearest.dist ← dist
            end
        end
    else
        // Non-leaf level - order, prune and visit nodes
        genBranchList(Point, Node, branchList)      // generate Active Branch List
        sortBranchList(branchList)                  // sort ABL on ordering metric values
        // Perform downward pruning (may discard all branches)
        last ← pruneBranchList(Node, Point, Nearest, branchList)
        // Iterate through the Active Branch List
        for i ← 1 to last do
            newNode ← Node.branch[branchList[i]]
            // Recursively visit child nodes
            nearestNeighbourSearch(newNode, Point, Nearest)
            // Perform upward pruning
            last ← pruneBranchList(Node, Point, Nearest, branchList)
        end
    end
end
The three strategies of the pruning theorem are described as follows [96]:
1. An MBR M with MinDist(P,M) greater than the MinMaxDist(P,M1) of an-
other MBR M1 is discarded because it cannot contain the NN. This is used
for downward pruning.
2. An object O whose actual distance from P is greater than the MinMaxDist(P,M) of an MBR M can be discarded, because M is guaranteed to contain an object O1 which is nearer to P. This is used for upward pruning.
3. Every MBR M with MinDist(P,M) greater than the actual distance from P to a given object O is eliminated, because it cannot contain an object closer than O. This is used for upward pruning.
In the context of retrieving objects from several servers, the above algorithms are not efficient, because a tree traversal must always start from the root node on each of those servers.
2.5.2 Moving Object Index Query Processing
Some researchers have also applied the concepts of existing index structures to query processing in a mobile environment. This section discusses existing works that use an indexing structure to process queries in the mobile environment.
The authors of [107, 27] used the PMR Quadtree index structure to answer continuous queries that change as a function of time. This index structure is a variant of the quadtree that is used to store segment fragments and has a hierarchical vector representation [80]. The index values contain a function of time in the two-dimensional time-attribute space. More specifically, the PMR Quadtree stores information about a line segment in every quadrant of the underlying space that the segment crosses.
The RQ-tree index structure combines the R-tree and the Quad-tree to index the locations of objects [39]. The authors argued that spatial entities are not distributed evenly and can form objects of different shapes, and that the R-tree performs poorly when the located objects are not close to rectangular in shape. Therefore, the RQ-tree uses the R-tree as the outer tree to store regular objects (objects whose form is close to a rectangle) and the Quad-tree to store the remaining, irregular objects, where the Quad-tree root node is a leaf of the R-tree.
In [91], the R-tree is used to index static range queries, together with a velocity-constrained index, for processing continuous spatial queries over moving objects. In this mechanism, all incoming queries are indexed in one R-tree, while a second, R-tree-based index (the Velocity Constrained Index, VCI) with an additional field vmax in each node is used to index all moving objects. The vmax entry of an internal node is the maximum of the vmax entries of its children; at the leaf level, the vmax entry is the maximum allowed speed among the objects pointed to by the node.
The Lazy Update R-tree (LUR-tree) was proposed in [59]. This approach indexes the current positions of moving objects and decreases the update cost by eliminating unnecessary modifications of the tree during position updates. The index structure is updated only when an object leaves its corresponding MBR; the LUR-tree simply replaces the position of an object in the leaf node when the new position is still inside the MBR.
The TPR-tree indexing structure is based on the R-tree and indexes continuously moving objects at any time in the future [12]. In this scheme, the size of a rectangle grows as a function of velocity and time, so that the number of targets remaining inside the rectangle increases as the rectangle grows. This indexing structure has also been used to index the uncertainty of moving objects [45].

The TPR*-tree indexing structure is an enhancement of the TPR-tree that targets predictive queries [105]. The TPR*-tree adapts its insertion and deletion algorithms from the R*-tree indexing structure.
Another variant of the TPR-tree is the TPROM-tree [28], an index structure that indexes the current and future positions of moving objects. It also handles object updates efficiently by adopting a memory-based update approach, which reduces the update cost by avoiding the need to delete old data items from the index structure during updates.
The Q+R-tree [121] indexing structure is similar to the RQ-tree in combining the Quad-tree and R-tree indexing structures; however, the Q+R-tree is used to index moving objects. The R-tree component indexes quasi-static objects, whereas the Quad-tree component indexes fast-moving objects that are distributed over wider regions. In other words, the R-tree indexes objects that are currently moving slowly, such as those crowded together in buildings or houses.
Another variant of the Q+R-tree is the PQR-tree [40], which efficiently indexes the current and near-future positions of moving objects and substantially decreases the update cost. The PQR-tree differs from the Q+R-tree in how the structures are integrated to hold the moving objects. The benefit of this index structure is that it can manage moving objects inside and outside road networks at the same time, so that the current and near-future positions of moving objects can be queried effectively.
The D-tree is similar to the KD-tree indexing structure [124]. The D-tree is a height-balanced binary tree constructed by partitioning the data regions: a space is recursively partitioned into two subspaces containing a similar number of regions, until every subspace contains one region. The partition between two subspaces is represented by the group of boundaries between regions, expressed as one or more polylines in a two-dimensional space.
The KD-tree is a binary search tree that represents a recursive subdivision of the universe into subspaces by means of (d-1)-dimensional hyperplanes [37]. The hyperplanes are iso-oriented, and their direction alternates among the d possibilities. The KD-tree is also known as the Range tree [13]. In [55], the authors proposed an approach that maps moving objects and their velocities into points and keeps those points in a KD-tree index structure.
The Spatio-Temporal R-tree (STR-tree) and the Trajectory-Bundle tree (TB-tree) [86] are two indexing structures, both extensions of the R-tree, for indexing moving-object trajectories. The former considers the trajectory identity in the index, whereas the latter is a hybrid structure that preserves trajectories while allowing typical R-tree range search over the data.
2.6 Mobile Query Processing at Client Side
This section discusses issues related to query processing mechanisms for mobile client devices. These issues are grouped into three classifications: Mobile-join, Top-K queries and Caching. The first and third categories are similar. In the first category, data is downloaded from several cells and executed on the mobile device in order to obtain explicit results; once the results have been produced and presented to the user, they are deleted within a short amount of time. In the third category, the data is retrieved from the current cell on the first request and loaded from the local copy on subsequent requests. In contrast to the first category, data in the local copy is kept until there is not enough room to store new incoming data. Providing a cache for frequently accessed data items on the client side is therefore an effective approach to improving system performance [10, 126]. The second category focuses on retrieving records that are ranked in the Top-K.
2.6.1 Mobile-Join
Explicit query results can be obtained at the client side by retrieving data from several cells and joining them locally. Downloading all relations from those cells may not be a good solution given the limited resources of a mobile device, which include a small memory for storing large volumes of data and a small display for viewing all results [68]. Several join mechanisms have been proposed, and they are explained in this section.
Three query processing mechanisms at the mobile client side were proposed in [66]. In the first approach, a mobile client requests data from the related cells and the data are joined on the mobile client device. In the second approach, all data are downloaded from one cell while only the primary keys are retrieved from the other; the information is then matched at the mobile client side, and any missing information is retrieved from the other side. In the last approach, only the primary keys are downloaded from all cells and matched at the client side, after which the data for the matched keys are downloaded from the cells.
The authors of [83] proposed two query processing mechanisms for the case where the pieces of data are located either on other mobile devices or on servers. In the first approach, a mobile user sends a query to a server, which then informs the other mobile users that hold the remaining parts of the data; those mobile users and the server send their data to the requester. The second approach is similar, except that the server is in charge of joining the data before sending the result to the requester.
Block-Based Processing is a mechanism that breaks the data down into blocks and transfers the blocks one by one [68, 66]. The aim is to overcome the memory capacity limitations and narrow bandwidth of the mobile environment. Two block-based mechanisms for client-side query processing were proposed in [66], namely Static and Dynamic blocks. Both mechanisms are similar in that a fixed number of records per block is downloaded from one server and compared with another list from another server. They differ in the way records are eliminated from a block. In the dynamic mechanism, the last records of the two blocks are compared, and the block whose last record is smaller is removed entirely; in other words, the block with the larger last record is preserved, with only the qualified matches removed from it.
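The dynamic block-elimination step above can be sketched as follows, assuming both servers return blocks sorted on the join key: the block whose last key is smaller can be discarded whole, since later blocks from the other server contain only larger keys. The function name and return convention are assumptions for illustration.

```python
# Sketch of dynamic block elimination in a client-side block-based join
# over two sorted key blocks from two servers.

def join_blocks(block_a, block_b):
    """Return the matching keys, plus which block to discard and refill."""
    matches = sorted(set(block_a) & set(block_b))
    # The block ending in the smaller key is exhausted: any key it could
    # still match has already been seen in the other block.
    drop = 'a' if block_a[-1] < block_b[-1] else 'b'
    return matches, drop

matches, drop = join_blocks([1, 3, 5], [3, 5, 8])
assert matches == [3, 5]
assert drop == 'a'   # block_a's last key (5) < block_b's last key (8)
```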
The Recursive and Adaptive Mobile Join (RAMJ) mechanism executes on a mobile device and joins two relations located on non-collaborative remote servers [69]. The data space is partitioned into several parts, and statistical information about the data in each part is retrieved from the servers without downloading the original data. Based on this information, RAMJ adaptively decides, for each partition, whether to download the data and relation tuples falling into that partition from the servers and join them, or to recursively repartition further.
2.6.2 Top-K Queries
Processing or displaying the entire set of query results is unnecessary if the k most highly ranked results can be processed or displayed instead. Queries of this type are called Top-K queries. Researchers have studied Top-K queries for Web databases, Peer-to-Peer (P2P) networks, data streams, sensor networks and mobile environments [15, 9, 48, 120, 25].
Some work has been undertaken on processing Top-K queries in web applications [62, 122]. This work takes into account relation attributes that can only be accessed through external web form interfaces, a situation which causes a potentially large set of data sources to be queried repeatedly. To tackle these challenges, a technique was proposed for executing Top-K queries in a setting where the attributes for which users specify target values are controlled by external, autonomous sources with a variety of access interfaces.
Content sharing, one of the main applications in the P2P environment, has been receiving increasing attention from users. As a result, the number of people using such applications is growing, which affects network performance, so it is essential to reduce the amount of network data transfer. Most users are generally interested only in the results that correlate best with their query. One solution is to apply Top-K queries in this environment so that only the most relevant results are returned.
Several researchers have worked on applying Top-K query algorithms in P2P networks [8, 76, 21, 43], developing decentralised Top-K query schemes. They use local ranking, optimised routing and merging to reduce the number of results returned to the users. Consequently, the data transfer load is reduced; however, the ranking and merging of results increases the computational workload.
Top-K processing also enables the most relevant objects to be output at the earliest stage, which is useful in mobile environments because the amount of data transfer and power consumption can be reduced. Several existing works, for example KLEE and SR-Combine [77, 78], have demonstrated their efficiency in meeting tight response times in a mobile environment. The proposed schemes deliver some initial results early, which reduces waiting time, data transfer cost and processing power. Their most important feature is the capability to adapt to environments with varying network bandwidth, because the retrieval is self-adapting and concentrates on real-time requirements.
Due to the increasing number of popular applications, the size of the data streams flowing over the network is also increasing, to the point where the data can overload the network traffic. The impact of this is that users may have problems if they are unable to handle the large, continuous flow of data. One proposed approach is the space-saving algorithm [77], whose main idea is to maintain only partial information about the items of interest. The aim of this algorithm is to process the stream before the data is discarded forever. In this setting, the Top-K over a data stream may be defined in terms of element frequency; for example, the elements receiving 0.5 percent or more of the total hits might comprise the top 500 elements. The algorithm is space-efficient and provides strict guarantees on the errors of the estimated element counts, with memory requirements proportional to the Top-K.
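The core of the space-saving idea referenced above can be sketched compactly: keep only k counters, and when an unmonitored element arrives while all counters are in use, evict the element with the smallest count and let the newcomer inherit that count plus one. The capacity k = 2 below is an illustrative assumption.

```python
# Sketch of the space-saving algorithm for approximate heavy hitters
# over a stream, using at most k counters.

def space_saving(stream, k):
    counts = {}
    for x in stream:
        if x in counts:
            counts[x] += 1
        elif len(counts) < k:
            counts[x] = 1
        else:
            # Evict the minimum-count element; the newcomer inherits its
            # count + 1, so counts may overestimate but never undercount.
            victim = min(counts, key=counts.get)
            counts[x] = counts.pop(victim) + 1
    return counts

counts = space_saving(['a', 'a', 'b', 'a', 'c'], k=2)
assert counts['a'] == 3                       # the true heavy hitter
assert max(counts, key=counts.get) == 'a'
```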
2.6.3 Cache Replacement Policies
In general, placing a cache at the client side reduces network activity between client and server. Caching mechanisms have been widely used to store frequently accessed data in database [7, 106], distributed [106, 36] and web systems [125], in both wired networks and mobile environments [104, 47, 94]. This section focuses on cache replacement policies for the mobile environment.
The Least Recently Used (LRU) cache replacement policy uses a timestamp, recorded when a data item is received or accessed, to select objects for eviction. When the cache is not large enough to accommodate new objects, this approach eliminates the data items with the oldest access time.
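The LRU policy can be sketched with an ordered dictionary, where insertion order tracks recency of access; the class name and capacity of 2 are illustrative assumptions.

```python
# Minimal LRU cache sketch: the least recently accessed item is evicted
# first when the cache overflows.

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # refresh the access time
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the oldest entry

cache = LRUCache(2)
cache.put('x', 1)
cache.put('y', 2)
cache.get('x')        # 'x' becomes the most recently used item
cache.put('z', 3)     # evicts 'y', the least recently used item
assert cache.get('y') is None
assert cache.get('x') == 1
```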
A client caching scheme based on a clustering structure that exploits both semantic and temporal locality, called Two-level LRU, was proposed in [95]. This approach clusters semantically or adjacently related query results together in the cache. Because of its intrinsic properties, semantic caching is regarded as an ideal cache scheme for mobile computing, and the aim of this approach is to keep the most profitable data in the cache with the help of clustering. Thus, if a query Q2 can be totally or partially answered by Q1, it is put into the same cluster as Q1. In the clustering process, when a query can be partially answered by a segment of a group, the part of the segment that answers the query is removed from the segment and combined with the remaining answer to the query into a new segment. If a segment becomes empty as a result of the removal, it is removed from the cluster. On the other hand, if the query is partially answered by segments belonging to different clusters, those clusters are merged into one. The aim of this approach is to effectively reduce wireless network traffic and to deal with disconnections.
A semantic model for client-side caching and replacement in a client-server database system, called Manhattan-Distance based caching, was proposed in [22]. In this approach, the client maintains a semantic description of the data in its cache, together with a remainder query describing the missing parts that are not available in the cache. Usage information for replacement is maintained adaptively for semantic regions and associated with collections of tuples. Maintaining a semantic description of the cached data makes it possible to use sophisticated value functions that capture semantic notions of locality. This policy gives the highest replacement priority to the cached objects at the greatest Manhattan distance from the client's current location.
The Furthest Away Replacement (FAR) policy was proposed in [94]. This approach makes eviction decisions based on the current location and movement direction of the mobile client: priority is given to data items that are located furthest away and in the opposite direction from the user's movement. The cached objects with the highest priority are evicted first, since the user is unlikely to access those objects within a short time.
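The FAR eviction priority can be sketched as follows: items that are far away, and that lie behind the direction of travel, score highest and are evicted first. The scoring function, the penalty factor and the 2-D vector arithmetic are assumptions for illustration, not the exact formulation of [94].

```python
# Illustrative FAR-style eviction score: distance from the client,
# penalised further when the item lies behind the movement direction.

import math

def far_priority(client_pos, direction, item_pos):
    """Higher score = evict sooner (far away and behind the client)."""
    dx = item_pos[0] - client_pos[0]
    dy = item_pos[1] - client_pos[1]
    dist = math.hypot(dx, dy)
    # A negative dot product means the item lies behind the heading.
    behind = (dx * direction[0] + dy * direction[1]) < 0
    return dist * (2.0 if behind else 1.0)

client, heading = (0, 0), (1, 0)          # client moving east
ahead, behind = (5, 0), (-5, 0)           # equidistant items
assert far_priority(client, heading, behind) > far_priority(client, heading, ahead)
```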
The RBF-FAR replacement policy slightly modifies FAR [65]. Its authors claim that FAR fails in some cases because it does not consider predicting the next possible location. RBF-FAR improves FAR by adding intelligent knowledge to predict the next possible movement: a Radial Basis Function Neural Network (RBFNN) is used to predict the next location instead of the velocity used in FAR. The RBFNN is a self-learning model that can learn from the historical information in the semantic segment index.
The Probability Area (PA) and Probability Area Inverse Distance (PAID) approaches are two cache replacement policies for location-dependent data under a geometric location model [130]. In the PA approach, the valid scope area and the access probability of a data item are the two factors considered in replacement decisions, whereas the PAID approach considers the inverse of the distance as an additional factor. In both approaches, the data item with the lowest cost, computed as the product of these factors, has the highest priority for removal from the cache.
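The two cost functions above can be sketched as follows. The exact combination used in [130] may differ; this form only illustrates that the lowest-cost item becomes the eviction victim, and the function names and sample values are assumptions.

```python
# Sketch of PA and PAID replacement costs: PA multiplies access
# probability by valid scope area; PAID additionally divides by distance.

def pa_cost(access_prob, valid_area):
    return access_prob * valid_area

def paid_cost(access_prob, valid_area, distance):
    return access_prob * valid_area / distance

# (access probability, valid scope area, distance from client)
items = {
    'near_popular': (0.9, 4.0, 1.0),
    'far_unpopular': (0.1, 4.0, 10.0),
}
victim = min(items, key=lambda k: paid_cost(*items[k]))
assert victim == 'far_unpopular'   # lowest cost, evicted first
```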
The Mobility Aware Replacement Scheme (MARS) uses a gain-based cache replacement policy that considers the client's location, movement direction and access probability [60]. This approach is unable to detect a user's regular movements; therefore, the authors proposed an improved approach, called MARS+ [61], to deal with the temporal and spatial properties of a client's access pattern and thereby improve caching performance. MARS+ also makes it possible to detect regular client movement paths, and this movement-pattern knowledge is used when deciding which cached objects to evict.
The Prioritized Predicted Region-based Cache Replacement Policy (PRRP) analyses the cost of a data item on the basis of its access probability, valid scope area, size in the cache, and distance based on the predicted region, a combination not considered in any of the existing policies [57]. The fundamental aim of this approach is to select eviction victims using a predicted-region-based cost function: a predicted region is selected based on the client's movement and applied to determine the data distance of an item.
The Weighted Predicted Region-based Cache Replacement Policy (WRRP) picks a predicted region based on the client's movement and then applies it to calculate the weighted data distance of an item [58]. This approach is similar to PRRP in considering access probability, valid scope area and data size in the cache, but takes the weighted data distance from the predicted region into account as an additional factor when picking an eviction victim among the cached data items.
The Rule-based Least Profit Value (R-LPV) approach considers the profit gained from caching a data item [18]. This policy considers several caching parameters: data access probability, update frequency, retrieval delay from the server, cache invalidation delay, and data size. The eviction victim is selected using a profit function, whose purpose is to determine the profit obtained from caching an item. This cache replacement policy is similar to one used in the client-server environment [52].
The Proactive caching model caches the result objects as well as the index that supports these objects as results [46]. The purpose of caching the indexes is to enable the cached objects to be reused by all common types of queries.
The Complementary Space (CS) cache replacement policy maintains a global view of the whole dataset [64]. Different portions of the global view are cached at varying granularity based on their access probabilities in future queries, and the cached objects with very high access probabilities are kept in the cache.
2.7 Outstanding Problems
This section presents a discussion of the problems and shortcomings of the existing works. Earlier in this chapter we reviewed the literature on mobile query processing at the client and server sides. Our review reveals that there are still problems and issues that need to be addressed and resolved.

The problems left open by previous researchers are examined in the next three subsections. Mobile query processing problems are discussed in Section 2.7.1, followed by the indexing mechanism problems of processing multi-cell queries. The last problem to be discussed is the client cache replacement policy, which is presented in the last subsection.
2.7.1 Mobile Query Processing at Server Side
This section presents issues that are still outstanding in the mobile query involving
both single and multiple cells. After analysing the existing work on query processing
at the server side, several major problems still exist, one is the use of circle as a query
scope. There are objects which are located outside the circle boundary, which are
not retrieved by the query. Another is that objects have been passed are not of any
interest to the user. Hence, in this thesis, we focus on using square as a query scope
and excluding unnecessary items, which have been passed. We will demonstrate
that it is possible to retrieve additional items without including unnecessary items
into the query result, so as to improve the performance of query result retrieval.
The next problem is overcoming frequent disconnections during query processing.
The query processor needs to know when to process a query and when to
preserve the existing result in order to deal with frequent disconnections while
processing a single-cell or multi-cell query. To date, we have not come across
any existing work that addresses this issue. Hence, our research in this thesis
(Chapter 3 in particular) will look at improving query processing at the server
side.
The following questions state some of the major problems to be addressed:
• How do we model efficient mechanisms for query processing within a single
cell?
• How do we design an efficient mechanism to process queries which involve
several cells?
• How can the proposed model cope with overlapping or non-overlapping cell
boundaries?
• How does the proposed model deal with frequent disconnections?
2.7.2 Indexing Structures for Multi-Cell Query Processing
A query has to be processed quickly, before the mobile user passes the predicted
location at which the query result is to be received. In this section, we describe
an indexing problem that arises when retrieving query results from multiple cells,
which involves a larger number of records.
Indexing is a convenient way to keep database record information in server
memory, owing to its small size. Many approaches have been devised to organise
such indexes in memory; the tree index is one of the most prominent data
structures used in practice.
Some major problems of the existing tree indexing structures are as follows:
• How can we have a single structure that contains all static items in active
cells?
• How can we store all requested items from neighbouring cells into the current
cell without reducing any performance of the current cell?
• If we are able to solve the above problem, how do we manage the requested
objects when they do not exist in a cell?
2.7.3 Client Cache Management
This section describes an outstanding problem in query processing, focusing
on the case where users repeatedly travel around the same locations and pose the
same queries.
As we discussed in the previous section, the mobile environment has some limited
features despite its advantage of being able to establish communication everywhere
at anytime. Its unreliable network connection, narrow network bandwidth and ex-
pensive data transfer cost are some negative factors in the mobile environment.
Providing a cache on the client device is one way to overcome this issue, because
incoming query results are stored in the local cache. The problem is that, to date,
all existing approaches store every incoming query result in the local cache.
The next problem is cache maintenance when the available cache slots cannot
accommodate all incoming query results. Improving cache hit performance is
another focus of our research in this thesis. We attempt to use an existing
grouping mechanism to group the cached objects.
The following questions concern the development of cache maintenance:
• How can we model a cache that stores at least K items per request, rather
than receiving the full set of incoming items, in order to cope with the
limitations of the mobile environment?
• How can the quality of a cache hit be improved by considering a weight factor?
• How can we model a cache by adapting one of the grouping algorithms so that
each request receives at least K items?
• How can we design a cache model that considers distance, grouping and at
least K items per request?
2.8 Conclusion
At the beginning of this chapter, we presented the architecture of a mobile
environment, including the current wireless technologies. The mobile computing
environment has constraints such as narrow bandwidth, short battery life,
limited storage and frequent disconnections, all of which make the task of processing
mobile queries more complex.
A user’s mobility creates a unique class of mobile queries beyond the traditional
query, called the Location-Dependent Query. The locations of the user and of the
objects are two important parameters in a location-dependent query. They add
further complexity to query processing, since both can change during the query
processing period.
Finally, the main contributions of this chapter can be categorised as follows:
• A query taxonomy is presented. This classification is important since it is pos-
sible to analyse all types of queries for data management in a mobile computing
environment.
• Mobile query processing issues at both server and client sides are shown. The
issues, arising from location-dependent query processing in a mobile computing
environment, need further investigation.
Chapter 3
Query Processing at Server Side
3.1 Introduction
A mobile query is a query that is issued while the user is travelling. The current
location of the mobile user is a unique factor that must be considered, because
the query result depends on the current location of the requester. Users may
remain in the same location or move to another location while waiting for the
query results. If a user stays in the same location, an invalid query result is
unlikely to occur, since the receiving location is the same as the sending location.
On the other hand, when the user moves to another location, the sending location
differs from the receiving location. Although the receiving location can be
predicted from the travel velocity of the mobile user, an invalid query result
might still occur if the user passes beyond the predicted location.
A query may request objects located in the same cell as the user or in a different
cell. The query scope is the area within which the user requests objects. For
example, for the query “retrieve a list of hospitals within 500 metres”, the area
within 500 metres of the user’s location is the query scope. A cell is
CHAPTER 3. QUERY PROCESSING AT SERVER SIDE 62
the service area of one base station. A base station is an intermediate host which
connects mobile devices and static hosts. If a query scope does not cross a cell
boundary, the query processing is called single-cell query processing. However, if
a query scope spans more than one cell, it is called multi-cell query processing.
In this chapter, we propose schemes for single-cell and multi-cell query processing
at the server side, focusing on the retrieval of static objects. The proposed
approaches for single-cell query processing are divided into three categories, based
on the orientation of the query scope relative to the base station: namely (i) Static,
(ii) Dynamic and (iii) Angle. In the static category, the query scope is parallel
to the base station; here we have developed three algorithms, based on horizontal,
vertical and diagonal movement. In the dynamic category, the query scope is
perpendicular to the user’s travel direction. Finally, the angle category is based
on the angle of the user’s travel direction.
The proposed approaches for multi-cell query processing fall into two categories.
The first considers overlapping and non-overlapping areas amongst base stations.
The second considers how to handle disconnections at the base station boundary.
The structure of this chapter is shown in Figure 3.1. Section 3.2 presents
preliminary knowledge as the foundation of this chapter. Sections 3.3 and 3.4
describe the proposed single-cell and multi-cell query processing approaches at the
server side, respectively. Section 3.5 discusses the proposed approach for handling
disconnections in single-cell and multi-cell query processing. Case studies, which
illustrate and support the explanation of the proposed approaches, are described
in Section 3.6. A further discussion of both approaches is provided in Section 3.7.
The last section concludes the chapter.
Figure 3.1: The framework of chapter 3
3.2 Preliminaries
This section presents an overview of query processing at the server side. The
section is divided into three subsections, as outlined in Figure 3.1. The first
subsection introduces all terms used in this chapter. The next subsection
(Section 3.2.2) discusses the criteria for selecting a shape to be used as the
query scope. Several query types are described in the last subsection (Section 3.2.3).
3.2.1 All Terms Used
In this section, we introduce terms used in our work. These are:
• Cell scope: an area serviced by one base station. Mobile users can exchange
information with a base station within this area.
• Base station (BS): a stationary host which does message forwarding from and
to a static network. BS can connect to one or multiple database servers.
We assume that a BS communicates with a single database server, even if in
practice it may be connected to multiple database servers.
• Query scope (QS): an area within which mobile users query static objects. We
use the terms ‘query scope’ and ‘valid scope’ interchangeably. This scope can
be represented using a standard shape, such as a circle, hexagon or square.
• Parallel query scope: a query scope which is parallel to a BS where mobile
users currently reside.
• Dynamic query scope: a query scope which is not located in parallel to a BS
where mobile users currently reside.
• Location: a point in two-dimensional coordinates which represents the position
of a mobile user or a static object. For simplicity, we assume that the location
of an object is represented as a point.
• Travel direction: a straight line which is measured from starting to ending
points.
3.2.2 Shape Selection for a Query Scope
In this section, we discuss how to choose a shape as the query scope. There are a
number of shapes that can be used as a query scope, such as a rectangle, triangle,
square or circle.
Figure 3.2 shows all locations of vending machines, a restaurant and a user within
the BS boundary. Assume the user would like to find the nearest restaurant within
distance n, or within a square of area m, where n and m represent the distance
and the area in which the target will be probed, respectively. All targets within
the boundary of a BS are valid for that BS only. In order to get a valid answer
to a user’s query, the BS needs
to keep track of the current location and the query scope of the user. Otherwise,
some targets in a generated query result become invalid once the user has moved,
even though the movement is still within the boundary of the same BS.
Figure 3.2: A scenario presented in two-coordinates
Now, we describe the shapes. First, let us consider a rectangle. A rectangle has
different horizontal and vertical lengths, which makes it hard to apply as a query
scope. Second, let us consider a triangle to represent the valid scope. Assume the
distances from the centre to the left, right and top are the same; these distances
imply that the base is twice the height, so the area of the triangle equals the
area of the corresponding square. However, it is hard to decide whether a target
is inside the triangle’s boundary. Therefore, we do not consider using a rectangle
or a triangle as the query scope.
The next two candidates, a square and a circle, have similar capabilities. A
square is accurate and can more easily be used than other shapes to catch the
target closest to the user, and its length is the same as its width. The dimensions
of the square are given by the distance from the user to the left, right, top and
bottom. If an area is entered, then the side of the square can be found by taking
the square root of that area (√area). The dimension of the square therefore
defines the valid scope of the query.
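As a small sketch of this calculation (the function name is ours, not the thesis’s):

```python
import math

def square_side_from_area(area):
    """Side length (dimension) of a square query scope, given its area."""
    return math.sqrt(area)

# An entered area of 250,000 square metres gives a 500 m x 500 m scope:
print(square_side_from_area(250000))  # 500.0
```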
On the other hand, the circle is one of the most popular shapes, because it is a
natural shape for retrieving nearest-neighbour objects efficiently: all objects
within a distance of n units can be found.
Figure 3.3: The proposed approach
Now, let us apply both shapes to the illustration in Figure 3.2, with the sample
query: “a user would like to find a restaurant within n units of the current
location”. The restaurant will be found if we use a square as the valid scope,
because a square of side 2n covers a greater area than a circle of radius n, as
shown in Figure 3.3. Furthermore, one can argue that one could:
• Increase the size of the circle
This is a common argument in order to retrieve objects located close to the
query scope. However, we do not know the optimum size of the circle that
needs to be enlarged. If we increase the size of the circle, too many objects
are retrieved and resources are wasted (bandwidth, power consumption and
memory).
• Resend the same query
Resending the same query requires more processing time and power, and by
the time the new result arrives some objects may already have been passed or
lie outside the scope area. Furthermore, the user may miss the query result
altogether if the server is busy.
Therefore, a square is the preferred query scope, owing to its efficiency in query
processing at the server side. This shape is efficient and accurate, and objects
within it can easily be discovered. Furthermore, the possibility of finding the
restaurant is higher than if a circle is used.
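The comparison can be illustrated numerically (a sketch of the argument, not the thesis’s exact figures: we compare a square of side 2n against a circle of radius n, and the example restaurant coordinates are ours):

```python
import math

def square_area(n):
    """Area of a square scope reaching n units left/right/up/down."""
    return (2 * n) ** 2

def circle_area(n):
    """Area of a circular scope of radius n."""
    return math.pi * n ** 2

n = 500
print(square_area(n))         # 1000000
print(round(circle_area(n)))  # 785398

# A restaurant at (400, 400), relative to the user, is caught by the
# square but missed by the circle:
print(math.hypot(400, 400) <= n)         # False: outside the circle
print(max(abs(400), abs(400)) <= n)      # True: inside the square
```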
3.2.3 Query Types
This section briefly describes location-dependent queries. Several query types
are similar to a location-dependent query.
A common query type that resembles the location-dependent query is the
spatial, location-independent query. An example of a spatial query is to find a
certain region at location (X1, Y1). Note that this is not a location-dependent
query, because it asks for a certain object whose answer does not depend on the
current location of the user.
The results of a location-dependent query type are dependent on the current
location of the mobile user who initiates the query. Current location means the
location of the mobile user when he/she receives a query result. This type of query
exists only in a mobile environment. Figure 3.4 shows a situation when a mobile user
sends a location-dependent query and receives its result. The sending and receiving
locations are not the same. Objects located inside the query scope are valid objects
which are returned to the requester.
Figure 3.4: A location-dependent query in details
The location-dependent query processing can involve object retrieval from both
single-cell and multi-cell. Single-cell query processing is a query processing where
the query scope does not pass the base station boundary (as shown in Figure 3.4).
On the other hand, if the query scope passes more than one base station boundaries,
the query processing is called multi-cell query processing.
Consider the query: “retrieve a list of restaurants within 500 metres”. Here
the location is mentioned only implicitly, and the query can be either a spatial
or a location-dependent query. It is a spatial query if the location of the requester
remains the same from the time the query is issued until the result is received.
In contrast, it is a location-dependent query if the location from which the query
was sent differs from the location at which the result is received, since the answer
then depends on the current location of the requester.
Furthermore, consider the query: “retrieve a list of restaurants within 500 metres
of hotel A”. This is not a location-dependent query, since the result depends on
the location of hotel A and is independent of the location of the user. However,
it can be classified as a location-related query.
3.3 Query Processing for Single-Cell
In this section, we discuss in detail how our proposed algorithms handle the
situation described above. The proposed algorithms are divided into three
categories, elaborated in the next three subsections:
• Static Query Scope
This category is based on the movement of mobile users. We propose three
algorithms, for horizontal, vertical and diagonal movement. The query scope
is parallel to the base station.
• Dynamic Query Scope
In this category, the query scope is perpendicular to the travel direction of
the mobile user.
• Angle of Movement
Here, we consider the angle of the travel direction, which is calculated between
the travel direction and the centre horizontal line of the query scope. We
classify the angle of the travel direction into three groups: 0 < α ≤ 30, 30 < α ≤ 60 and 60 < α < 90 degrees.
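The three-way classification above can be sketched as a small helper (the function name and the returned group labels are ours, not the thesis’s):

```python
def angle_group(alpha):
    """Classify the travel-direction angle alpha (degrees, strictly
    between 0 and 90) into the three groups used by the
    angle-of-movement category."""
    if 0 < alpha <= 30:
        return 1
    elif 30 < alpha <= 60:
        return 2
    elif 60 < alpha < 90:
        return 3
    raise ValueError("angle must lie strictly between 0 and 90 degrees")

print(angle_group(25), angle_group(45), angle_group(75))  # 1 2 3
```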
3.3.1 Static Query Scope Category
In this category, the query scope is parallel to the BS shape. Keeping it parallel
avoids creating a query scope based on the travel direction of mobile users;
hence, this category simplifies the process of creating the query scope.
The following three subsections present the steps of the proposed approaches in
this category. The first subsection describes the main part of the proposed
approaches. The remaining two subsections describe the parts responsible for
retrieving objects based on the user’s movement.
The Main Part
In general, this part is an entry point and contains the entire process of query result
retrieval in this category. The process includes receiving a user input, predicting an
expected recipient location, creating a query scope, searching objects in the current
BS, and sending the query results back to the mobile user. This process terminates
when the mobile user receives the requested information.
While the server is processing the user’s request, the user moves from one location
to another. The query result is therefore based on the recipient location instead
of the sender location. Once the recipient location is known, a query scope is
generated whose size is based on a given value. Information about all objects
located inside the query scope is then retrieved. Finally, the information is
shipped to the requester and an acknowledgement is expected. If the server does
not receive an acknowledgement, it produces a new query result and sends it to
the user; the new result might differ from the current one owing to the user’s
mobility.
Algorithm 3.1 shows the details of our main proposed algorithm. It can be
explained as follows:
(i) The server receives an input from the user. It contains the following factors:
the current location of mobile user, travel direction, velocity, and searching
distance. The first factor is very straight-forward, which is the location when
Algorithm 3.1: The main proposed algorithm
Input: Location, Query, Speed
Output: Results
begin
    tstart ← time when a query is received (assume zero)
    (TDx, TDy) ← travel distance from tstart to tstart+1 at the current velocity
    (X1, Y1) ← sender location at time tstart
    Create a query scope with dimensions 2 * SD by 2 * SD at location (X2, Y2)
    Divide the scope into 4 equal areas where the user is located at the centre point
    Dir ← user travel direction
    objsInOverlappingArea ← call the check-overlapping-area algorithm(allObjsFound)
    allObjsFound ← call the algorithm to get valid objects based on user movement
    isReceived ← send allObjsFound to user
    if isReceived is false then
        tstart ← tstart+1
        (X1, Y1) ← update the location at time tstart
        (X2, Y2) ← update the location at time tstart+1
    end
end
the user sent the query to the server. The travel direction can be determined
using either the travel history or two points in two-dimensional coordinates; we
simplify the process by using two coordinates connected by a straight line, which
shows the direction of travel between the start and end points. The velocity is
taken from the user’s current velocity. The last factor measures how far the
searching area extends.
(ii) Predict the next location at which the mobile user is expected to receive the
query result. It is calculated from the current travel direction, speed, and
query processing time.
(iii) Create the query scope. Since a square has the same height and width, the
dimension of the square is given by a single length parameter: the searching
distance from the client request multiplied by two. We multiply by two
because the side covers the distance from the user to both the left and the
right.
(iv) Once the query scope has been created, it is divided equally into four regions.
The aim of this division is to speed up the searching process on the server
side: the regions that have already been passed, located in the opposite
direction, will not be processed further.
(v) Verify whether there is an overlapping area, i.e. an overlap between the
previous and current query scopes. This area exists if the mobile user failed
to receive the query result at time tstart−1, the time at which the user
previously expected to receive the query result. The purpose of checking the
overlapping area of the query scope is to avoid re-processing existing targets
in the next time interval. The details of this process are discussed in
Section 3.5.1, where the disconnection issue is presented. The result of this
process is either the set of objects located in the overlapping area or an
empty set.
(vi) Load information about objects whose locations fall within the query scope.
The result set of the overlapping scope is passed in, so that the overlap area
is excluded from the current probing of objects. Unless the mobile user is
predicted to stop while retrieving the query results, the server decides which
area of the query scope to process. When the travel direction of the user is
horizontal or vertical with respect to the query scope, two regions of the
scope are processed; if the travel direction is diagonal, one region is
processed. The regions processed lie in the same direction as the travel
direction. If the user previously missed a query result, the overlapped area
is subtracted from these regions. The information about all objects in the
resulting area is retrieved. The details of these processes are presented in
the next two subsections.
(vii) Send the generated query result to the user. Once the query result is ready
to be shipped, the collected information is sent to the user and the server
waits for an acknowledgement. The mobile user sends an acknowledgement
to the server once the query result has been received. In contrast, when the
mobile user receives only a partial query result or none at all, owing to a
weak signal or a disconnection, no acknowledgement is sent. At the server
side, a parameter keeps track of whether an acknowledgement has been
received: its value is true if an acknowledgement from the user has arrived.
Otherwise, its value is false and the server prepares the next query result
for time tstart+1.
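The seven steps above can be sketched in Python (a minimal illustration under our own simplifications: the travel direction is given in degrees, objects and scopes live in 2-D coordinates, and all function names are ours, not the thesis’s):

```python
import math

def predict_location(x, y, direction_deg, speed, processing_time):
    """Step (ii): predict where the user will be when the result arrives."""
    rad = math.radians(direction_deg)
    d = speed * processing_time
    return x + d * math.cos(rad), y + d * math.sin(rad)

def make_scope(cx, cy, sd):
    """Step (iii): square scope of side 2*SD centred on the predicted
    point, as (left, bottom, right, top)."""
    return (cx - sd, cy - sd, cx + sd, cy + sd)

def overlap(a, b):
    """Step (v): the overlapping rectangle of two scopes, or None."""
    left, bottom = max(a[0], b[0]), max(a[1], b[1])
    right, top = min(a[2], b[2]), min(a[3], b[3])
    return (left, bottom, right, top) if left < right and bottom < top else None

def inside(rect, p):
    """Point-in-rectangle test; a None rectangle contains nothing."""
    return (rect is not None and
            rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3])

def answer_query(objects, x, y, direction_deg, speed, sd,
                 processing_time=1.0, prev_scope=None):
    """Step (vi): retrieve objects inside the new scope, skipping any
    that already lay in the overlap with a previously missed scope."""
    px, py = predict_location(x, y, direction_deg, speed, processing_time)
    scope = make_scope(px, py, sd)
    ov = overlap(scope, prev_scope) if prev_scope else None
    return [o for o in objects if inside(scope, o) and not inside(ov, o)], scope

# User at the origin heading east (0 degrees) at 10 units per time unit,
# with searching distance 50: only the first object is in scope.
results, scope = answer_query([(30, 0), (100, 0)], 0, 0, 0, 10, 50)
print(results)  # [(30, 0)]
```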
The Vertical/Horizontal Movement Algorithm
This section focuses on horizontal and vertical movement in two-dimensional
coordinates. Vertical movement is when a mobile user travels along the Y-axis,
whereas horizontal movement is when the user travels along the X-axis. The
proposed vertical movement algorithm is discussed first, followed by the horizontal
movement.
The proposed vertical movement approach retrieves information about the
requested objects based on the travel direction of the mobile user. At the start, it
receives the current position, the query scope and the travel direction of the
mobile user from the main part (see the previous section). The current position
and travel direction determine which regions of the query scope are processed.
When the mobile user is moving up, information about all objects located in the
two upper regions (regions 1 and 2) is retrieved; however, objects already present
in the overlapping collection are not loaded again. When the mobile user is moving
down, a similar approach is applied, except that only objects located in the two
bottom regions (regions 3 and 4) are retrieved. The scheme is shown in
Algorithm 3.2.
Figure 3.5 shows examples of how this algorithm works when a user moves
vertically. All information about targets located in the shaded regions is sent to
the user. While a user is travelling down (south), all information about objects
located in the bottom regions (regions 3 and 4) is sent to the mobile user, as
shown in Figure 3.5a. On the other hand, all information about targets located
in the top regions (regions 1 and 2) is sent to the mobile user while the user is
moving up (north), as in Figure 3.5b.
Algorithm 3.2: The vertical movement algorithm
Output: Results
begin
    Objects ← objects collection in the current Base Station boundary
    (X, Y) ← current location at tstart+1
    Dir ← user direction (either up or down)
    overlapping collection ← list of objects in the overlapping area
    while (still have more Objects) do
        if (object is not in overlapping collection) and (object is in scope) then
            if (direction is up) and (object.Ycoordinate ≥ Y) then
                collection ← collection + objects found in regions 1 and 2
            else
                collection ← collection + objects found in regions 3 and 4
            end
        end
    end
end
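The selection rule of Algorithm 3.2 can be sketched as follows (a simplified illustration under our own assumptions: objects are (x, y) points, the upper regions are those with a y-coordinate at or above the user’s, and the function name is ours):

```python
def vertical_movement(objects, y, direction, overlapping, in_scope):
    """Sketch of the vertical movement rule: going up keeps objects in
    the upper regions (y-coordinate >= user's y), going down keeps the
    lower ones. `overlapping` is a set of already-sent objects and
    `in_scope` is a membership predicate for the query scope."""
    collection = []
    for obj in objects:
        if obj in overlapping or not in_scope(obj):
            continue  # skip objects already sent or outside the scope
        if (direction == "up") == (obj[1] >= y):
            collection.append(obj)
    return collection

objs = [(1, 5), (2, -3), (4, 2)]
print(vertical_movement(objs, 0, "up", set(), lambda o: True))    # [(1, 5), (4, 2)]
print(vertical_movement(objs, 0, "down", set(), lambda o: True))  # [(2, -3)]
```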
returned. When the user is coming from the Top Right direction, all objects found
in region 3 will be returned to the user (Figure 3.7c). Another example is that if
a user goes to Top Left, the opposite region (region 2) will be probed (shown in
Figure 3.7a).
3.3.2 Dynamic Query Scope Category
This category focuses on a query scope that changes dynamically with the user’s
direction. This does not mean that the shape of the query scope changes; rather,
the query scope is not parallel to the cell scope (as shown in Figure 3.8). Instead,
the query scope is perpendicular to the direction of the user, and the angle of
movement is not needed. Making the query scope perpendicular to the direction
of the user significantly reduces complexity. After the query scope has been
created, information about objects within the shaded area is retrieved and
returned to the user.
Figure 3.8: Dynamic query scope for the diagonal movement
Algorithm 3.5 shows the process of this approach. The details are described as
follows:
(i) Generate the line equation of the travel direction.
(ii) Form a query scope of a given size, perpendicular to the above line.
(iii) Find the overlapping area between the current and previous query scopes.
If there is one, select the objects in the overlapping area.
(iv) Retrieve information about all objects located in the shaded regions of the
current query scope and outside the overlapping area. The shaded regions
are the areas that have not yet been passed.
Algorithm 3.5: The dynamic query scope algorithm
Output: Results
begin
    objects ← objects collection in the current Base Station boundary
    (X, Y) ← current location at tstart+1
    (TDX, TDY) ← user searching distance
    Travel line ← line equation for travel direction
    Query Scope ← query scope which is perpendicular to Travel line
    Searching area ← two regions of Query Scope located in front of the current location
    overlapping collection ← list of objects in the overlapping area
    while (still have more objects) do
        if (object is not in overlapping collection) and (object is in Searching area) then
            collection ← collection + object
        end
        object ← next object
    end
end
Figure 3.11 shows three different types of multi-cell queries where there is no
overlapping area amongst the BSs. Figure 3.11a shows movement within the same
BS, with the query scope crossing the corresponding BS boundary. In Figure 3.11b
a user moves toward the BS border. Figure 3.11c shows a user who moves into a
neighbouring BS.
As mentioned before, the illustration in Figure 3.11a shows a user travelling
within BS1 and the query scope is crossing the BS1 boundary. The target of query
scope (shaded area) is decided by the user direction. When BS1 knows that the query
scope is crossing its boundary, it processes the query within its area (shaded area
from user location to ∆X) and gets partial information about the query result from
BS2 by forwarding the remaining query scope information from BS1 to BS2. Once
BS2 finishes generating query results, it forwards the query results to its requester
neighbour, BS1. Then, BS1 combines the partial query results retrieved from the
other BS (BS2) and forwards the joined query results to the user.
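The splitting-and-forwarding scheme described above can be sketched with cells and the query scope modelled as 1-D x-intervals (a deliberately simplified illustration; the interval model and all names are ours, not the thesis’s):

```python
def answer_multicell(query_scope, cells):
    """Sketch of the forwarding scheme of Figure 3.11a: the current BS
    answers the portion of the scope inside its own interval and
    forwards each remaining portion to the neighbour that covers it;
    the partial results are then merged in order.
    `query_scope` and each cell area are (left, right) x-intervals."""
    results = []
    for name, (left, right) in cells:
        lo, hi = max(query_scope[0], left), min(query_scope[1], right)
        if lo < hi:  # this BS covers part of the query scope
            results.append((name, (lo, hi)))
    return results

# BS1 covers [0, 100), BS2 covers [100, 200); the scope [80, 130] spans both:
cells = [("BS1", (0, 100)), ("BS2", (100, 200))]
print(answer_multicell((80, 130), cells))
# [('BS1', (80, 100)), ('BS2', (100, 130))]
```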
Figure 3.11b shows a user location at the border line of BS1 and BS2. In this
situation, BS1 does not process the user query because, when the user misses the
query results, BS1 needs to forward the user query twice to its neighbour. After
BS1 receives the user query, it forwards the user query and the query scope to its
neighbour, BS2, which processes the query and forwards the query results to the
user immediately. In both figures, handovers do not happen since the user remains
within one cell.
Figure 3.11c shows a user moving from the current BS1 to BS2. BS1 receives
the query and calculates the predicted location of the user. BS1 then forwards the
user query and the predicted user location to BS2, and the query is handled by
BS2. In this situation, generating query results for a number of users depends on
knowing when the users enter new cells. Predicting when users enter new cells
has been discussed in Section 2.4.3 [128]. So, in this case, the next neighbour,
BS2, knows when users enter its area. The remaining processes of query result
retrieval are the same as in the previous example.
(a) Movement within one cell (b) Movement to BS border line
(c) Movement within another cell
Figure 3.11: Three types of users’ movement
Although the figure above shows non-overlapping areas of multiple BSs, these
areas can also overlap each other, which raises further issues in answering a
multi-cell query. The issue and its proposed solutions are addressed in Section
3.4.1. Later, in Section 3.4.2, we describe how these proposed approaches are
applied to a multi-cell query, whether static or dynamic.
3.4.1 Non-Overlapping and Overlapping Area Algorithms
The challenges of multi-cell query retrieval are to avoid duplicate data items
retrieved from other BSs and to reduce the waiting time for query results from
other BSs. This section describes two proposed solutions for multi-cell query
retrieval, involving non-overlapping and overlapping areas of multiple BSs,
respectively.
Non-Overlapping Area algorithm
The proposed solution to answer a multi-cell query from non-overlapping areas of
multiple BSs is the focus of this section. Before the proposed solution is presented,
two major types of non-overlapping BS areas are described. Figure 3.12 shows two
major types of non-overlapping BS scopes. The first figure shows a whole area that
is covered by many BSs, whereas the second figure shows an area which is not
covered by those BSs (indicated by the shaded area). Handling mobile query
retrieval in both situations is the aim of our proposed approach.
Figure 3.12: Non-overlapping base stations(BS)
As mentioned previously, our proposed approach keeps track of all online BSs,
such that every BS is required to register with all of its surrounding neighbour BSs.
When there is a multi-cell query, the current BS retrieves information on local and
remote objects. The current BS refers to the BS in whose service area the mobile
user is located. Hence, when the user is expected to arrive at a new cell, the current
BS is the one that sends the query result. Because the new location of the user lies
in the area of the new BS, which is different from the BS that received the user
query and forwarded it to this new BS, we assume that a handover has been
carried out.
In retrieving information on remote objects, the current BS searches for the part of
the query scope which overlaps with any of the online BSs in its list. For example, when the
query scope overlaps with the area of BS A, the overlapped area is sent to BS A in
order to get the query result. Using the approach mentioned in Section 3.3, BS A
generates the query result and returns it to the current BS, which merges all
information on remote and local objects and sends it to the user.
Algorithm 3.7 shows the details of the proposed algorithm. The steps are described
as follows:
(i) Retrieve information of all objects from the current BS which are covered by
the query scope. The details of the information retrieval process will be discussed
in Section 3.4.2.
(ii) Load the information of each online BS from the list in sequential order.
(iii) For every online BS, find the overlap area between the query scope and the area
of the online BS. If an overlap area exists, the current BS sends the overlap area
to that BS. In other words, the overlap area is the query for that particular
BS.
(iv) The online neighbour BS that receives the query executes it in the same
way as the current BS.
(v) The online neighbour BS returns a query result, which contains either a list
of object information or an empty list, to the current BS.
(vi) The current BS combines the returned query result into its query result.
(vii) Repeat the process until information of all objects inside the query scope is
retrieved.
(viii) Return the query result to the requester.
Algorithm 3.7: Non-overlapping algorithm.

Input: Query, NoOfBSOnline
Output: Results
begin
    Query_scope ← scope of query
    Current_BS_scope ← current base station boundary
    Current_BS_ID ← current base station ID
    NoOfBSOnline ← number of online neighbour BSs
    CollectionOfOnlineBS ← list of online BSs
    Result ← Get_Result(Query_scope, Current_BS_scope)
    while (index < NoOfBSOnline) do
        Current_Neighbour_BS ← CollectionOfOnlineBS at position index
        if intersection(Query_scope, Current_Neighbour_BS_scope) exists then
            // Append the results retrieved from the neighbour BS
            // to the collection of all retrieved results
            Result ← Result + Current_Neighbour_BS(Query_scope)
        end
        index ← index + 1
    end
    return Result
end
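The loop above can be sketched in executable form. This is a minimal illustration only: the `Rect` and `BaseStation` classes and the `get_result` function are hypothetical names, objects are bare points, and calls to neighbour BSs are plain function calls rather than messages between base stations.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    """Axis-aligned rectangle standing in for a query or BS scope."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersect(self, other):
        """Return the overlapping rectangle, or None if the areas are disjoint."""
        xmin, ymin = max(self.xmin, other.xmin), max(self.ymin, other.ymin)
        xmax, ymax = min(self.xmax, other.xmax), min(self.ymax, other.ymax)
        if xmin >= xmax or ymin >= ymax:
            return None
        return Rect(xmin, ymin, xmax, ymax)

    def contains(self, pt):
        x, y = pt
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

@dataclass
class BaseStation:
    scope: Rect
    objects: list   # list of (x, y) points

def get_result(query_scope, current_bs, online_neighbours):
    # (i) local objects inside the query scope
    result = [o for o in current_bs.objects if query_scope.contains(o)]
    # (ii)-(vi) forward the overlapping part of the scope to each online neighbour
    for bs in online_neighbours:
        overlap = query_scope.intersect(bs.scope)
        if overlap is not None:
            result += [o for o in bs.objects if overlap.contains(o)]
    return result
```

Only the part of the query scope that overlaps a neighbour is forwarded, so a neighbour never reports objects outside the requested area.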
Figure 3.13 shows an illustration of multi-cell query retrieval. MU2 and MU1
send queries to BS1 and BS5 respectively. On behalf of mobile user MU2, BS1
forwards the query by sending the area overlapping BS2 to BS2 and the area
overlapping BS4 to BS4. BS2 and BS4 return information on the objects inside
the query scope.
The retrieval process for MU1 is similar to that for MU2, except that the uncovered
area is not sent to any online BS. Furthermore, the query scope covers the areas of
BS1 and BS2, which are not direct neighbours of BS5. Fetching result information
from both BSs can be done by recursively passing the overlapping part of the query
scope to all neighbour BSs of the current BS. These neighbour BSs pass overlapping
parts of the received query scope to their own neighbour BSs. This process continues
until the whole
Figure 3.13: Multi-cell query illustration
area of the query scope has been processed and there is no further overlapping area
between any BS and the query scope.
Overlapping Area Algorithm
The query result retrieval for overlapping areas of multiple BSs follows a similar
process to that for non-overlapping areas. What makes the two processes different
is the overlapping area itself, because a mechanism has to be applied to avoid
duplicating any objects in a query result.
This section elaborates on two proposed approaches to handle this situation:
area elimination and query result elimination. Both proposed approaches are
explained as follows:
1. Eliminating neighbour BS overlapping area
This proposed approach avoids reprocessing an overlapping area of multiple
BSs that has already been processed. When the query scope covers an overlapping
area of multiple BSs, the overlapping area is searched only once; the first BS in
the list of online BSs is in charge of searching for objects within that area.
Algorithm 3.8 shows the complete process. The steps can be explained as follows:
(i) Retrieve a query from a user, extract the information in the query and
generate a query scope based on that information.
(ii) Search for the requested information on all objects located inside the
query scope within the current BS area.
(iii) Load the information of the online BS based on its position in the list of
online BSs. Verify that BS against a list of the processed BSs, which
contains all BSs that have already processed the query. If the online
neighbour BS being processed is in the list, this BS is not given the task
of processing the current query.
(iv) Before forwarding the query to that neighbour BS, the current BS elim-
inates the overlapping area between the current neighbour BS being
processed and the current BS.
(v) Once the overlapping area has been eliminated, the query and the list of
executed BSs are forwarded to that BS, which then generates a query
result using the same mechanism as the current BS.
(vi) Repeat the process until all online BSs have been processed.
2. Eliminating items from neighbour BS query results
This proposed approach is similar to the previous one. The only difference
is that the overlapping neighbour BS areas are not eliminated; instead, the
duplicated items in the returned query results are eliminated from the query
results.
Algorithm 3.9 shows our second proposed algorithm for retrieving items from
multiple overlapping cells by eliminating duplicate items in the query result.
To simplify our description, we do not discuss the whole algorithm since it is
Algorithm 3.8: Eliminating neighbour BS overlapping area algorithm.

Input: Query, list_of_BS_done
Output: Results
begin
    Query_scope ← scope of query
    BS_scope ← current base station boundary
    Current_BS_ID ← current base station ID
    Result ← Get_Result(Current_BS_ID)
    Area_Taken ← BS_scope
    list_of_BS_done ← list_of_BS_done + Current_BS_ID
    while (index < NoOfBSOnline) do
        Current_Neighbour_BS ← CollectionOfOnlineBS at position index
        if (Current_Neighbour_BS.ID is in list_of_BS_done) then
            continue to the next neighbour BS
        end
        list_of_BS_done ← list_of_BS_done + Current_Neighbour_BS.ID
        Current_Neighbour_BS_scope ← Current_Neighbour_BS_scope - Area_Taken
        if intersection(Query_scope, Current_Neighbour_BS_scope) exists then
            // Append the results retrieved from the neighbour BS
            // to the collection of all retrieved results
            Result ← Result + Current_Neighbour_BS(Query_scope, list_of_BS_done)
        end
        Area_Taken ← Area_Taken + Current_Neighbour_BS_scope
        index ← index + 1
    end
    return Result
end
similar to the one mentioned in the previous subsection. We highlight only those
parts which are different.
Algorithm 3.9: Eliminating items from neighbour query result.

Input: Query, list_of_BS_done
Output: Results
begin
    Query_scope ← scope of query
    BS_scope ← current base station boundary
    Current_BS_ID ← current base station ID
    Result ← Get_Result(Current_BS_ID)
    list_of_BS_done ← list_of_BS_done + Current_BS_ID
    while (index < NoOfBSOnline) do
        Current_Neighbour_BS ← CollectionOfOnlineBS at position index
        if (Current_Neighbour_BS.ID is in list_of_BS_done) then
            continue to the next neighbour BS
        end
        list_of_BS_done ← list_of_BS_done + Current_Neighbour_BS.ID
        if intersection(Query_scope, Current_Neighbour_BS_scope) exists then
            // Retrieve results from the neighbour BS, then remove any items
            // already present in the collected results before merging
            tempNeighBSResult ← Current_Neighbour_BS(Query_scope, list_of_BS_done)
            tempNeighBSResult ← eliminate duplicate items from tempNeighBSResult against Result
            Result ← Result + tempNeighBSResult
        end
        index ← index + 1
    end
    return Result
end
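The duplicate-filtering step of this variant can be sketched as follows. The function name `merge_without_duplicates` is illustrative, and tuple equality stands in for real object identifiers; a deployed system would compare object IDs instead.

```python
def merge_without_duplicates(current_result, neighbour_results):
    """Merge neighbour BS results into the caller's result, dropping duplicates.

    current_result: list of items collected by the caller BS.
    neighbour_results: list of result lists returned by neighbour BSs.
    """
    seen = set(current_result)
    merged = list(current_result)
    for neigh in neighbour_results:
        for item in neigh:
            if item not in seen:   # filter out items already collected elsewhere
                seen.add(item)
                merged.append(item)
    return merged
```

Keeping a `seen` set makes each membership test constant-time, so the merge stays linear in the total number of returned items.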
This algorithm does not track the BS areas that have already been processed;
instead, it treats the overlapping BSs as if their areas were non-overlapping. This
means that a neighbour BS collects any items in the overlapping area covered by
the query scope, even though these items may also be collected by other neighbour
BSs. As a result, the returned query result contains some duplicated items when it
is merged with the one in the caller BS. Therefore, an additional step is taken to
filter out the
duplicate items before joining the query results of the other neighbour BSs with
that of the current BS.
3.4.2 Static and Dynamic Query Scope Algorithm
A static query scope is a query scope parallel to the base station boundary. A
dynamic query scope is a query scope perpendicular to the straight line of the
travel direction of the mobile user. Section 3.3 presented these query scopes, and
the proposed query processing algorithms within a single cell, in more detail. This
section discusses how the proposed algorithms of Section 3.4.1 are applied to static
and dynamic query scopes to retrieve a query result. The information retrieval
algorithm for the static query scope is presented first, followed by that for the
dynamic one.
Static Query Scope Algorithm
Before the proposed approaches are discussed, an illustration is given. Figure 3.14
illustrates a static query scope that covers a partial area of multiple cells. When
MU2 sends a query scope, ACFH, which covers a partial area of the BS, the BS
decides which part of the query scope to process, depending on the travel direction.
The scope KCFL is then processed. The BS then reduces the query scope towards
its own area. The aim of the reduction is to eliminate the part of the query scope
which does not belong to the BS. The reduction process is very straightforward,
since it cuts back any of the four sides of the query scope that extend beyond the
BS scope. After the reduction process has been done in BS1, BS2 and BS4, we
have three smaller, separate query scopes inside the three BSs: KBEJ in BS1,
BCDE in BS2 and DFLJ in BS4.
After every BS has completed the query scope reduction process, each BS searches
for all items located inside its smaller query scope. If any items are found, they are
returned to the caller BS; otherwise, the BS returns nothing. Therefore, BS2 and
BS4 return the items inside areas BCDE and DFLJ respectively to BS1. Then,
BS1 sends the returned results, together with its own result, back to the user.
Figure 3.14: An illustration of static query scope
Algorithm 3.10 shows the details of the proposed information retrieval approach.
The steps can be explained as follows:
(i) If the scope of the BS is smaller than the query scope, return all items inside
the area of the BS.
(ii) Otherwise, create a new query scope that is bounded by the scope of the BS.
(iii) Find all items which are located in the new scope and store them in a col-
lection called result.
(iv) Return the result to the requester.
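Steps (i)–(iv) can be sketched as follows, assuming rectangles are `(xmin, ymin, xmax, ymax)` tuples and items are points; the function names `clip_scope` and `get_static_result` are illustrative, not from the thesis.

```python
def clip_scope(query_scope, bs_scope):
    """Cut each side of the query scope back to the BS boundary where it overshoots."""
    qx0, qy0, qx1, qy1 = query_scope
    bx0, by0, bx1, by1 = bs_scope
    return (max(qx0, bx0), max(qy0, by0), min(qx1, bx1), min(qy1, by1))

def get_static_result(query_scope, bs_scope, items):
    """Return the items of this BS that fall inside the reduced query scope."""
    x0, y0, x1, y1 = clip_scope(query_scope, bs_scope)
    return [(x, y) for (x, y) in items if x0 <= x <= x1 and y0 <= y <= y1]
```

The four `if` branches of Algorithm 3.10 collapse into the `max`/`min` pairs above, since clamping a coordinate only changes it when the query scope extends past the BS scope.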
Dynamic Query Scope
Dynamic query scope is different from static query scope. There are many
possibilities when a dynamic query scope is used to retrieve query results from several
Algorithm 3.10: Get_Result algorithm for static query scope.

Input: queryScope, BSScope
Output: Results
begin
    list_of_items ← items in this BS
    // Check whether the query scope is greater than the BS scope
    if (BSScope < queryScope) then
        return all items in list_of_items
    end
    // Check whether part of the query scope passes the BS scope
    New_Query_Scope ← queryScope
    if (queryScope.Xmax > BSScope.Xmax) then
        New_Query_Scope.Xmax ← BSScope.Xmax
    end
    if (queryScope.Xmin < BSScope.Xmin) then
        New_Query_Scope.Xmin ← BSScope.Xmin
    end
    if (queryScope.Ymin < BSScope.Ymin) then
        New_Query_Scope.Ymin ← BSScope.Ymin
    end
    if (queryScope.Ymax > BSScope.Ymax) then
        New_Query_Scope.Ymax ← BSScope.Ymax
    end
    while (item in list_of_items) do
        if (item is inside New_Query_Scope) then
            result ← result + item
        end
    end
    return result
end
BSs, since the scopes come at different angles. These possibilities can be classified
into four categories based on the coverage of the query scope over a BS area, as
shown in Figure 3.15. The new shape of the query scope can be either a polygon or
a triangle. The first three figures show polygons, whereas the last figure, on the
bottom right, shows a triangle. The boundary of the new query scope intersects
with one or more BS boundaries.
(a) two parallel lines (b) two perpendicular lines
(c) two parallel lines (d) two perpendicular lines
Figure 3.15: Dynamic query intersects a base station (BS): (top) on the same line; (bottom) on two different lines
Algorithm 3.11 shows the retrieval algorithm for when a query scope passes into a
neighbour BS area. This algorithm starts by initialising some parameters. It then
checks whether the BS area is smaller than the query scope; if it is, the neighbour
BS immediately returns all items in its area to the mobile user.
When a query scope partially overlaps a neighbour BS, there is an intersection
area, as shown in Figure 3.16. This area is a polygon with n vertices, where n is
between 3 and 6. The intersection area is formed by any corner points of the BS
boundary and a number of intersection points between
Figure 3.16: An illustration of dynamic query situation
the query and BS scopes. Hence, the BS needs to know where these intersection
points are located and which corner points of the BS boundary are inside the
intersection area. An intersection point lies on two line equations: a line equation
of the BS boundary and the stored line equation of the query scope and searching
distance. These points are stored in clockwise order.
The BS checks each item in the list of items to determine whether its location is
inside the query scope, using the right hand rule, which is specified as follows:
• Take two consecutive points from the collection of intersection points.
• Use the formula (p.y − p0.y)(p1.x − p0.x) − (p.x − p0.x)(p1.y − p0.y) to determine
whether point p is located inside the query scope. This formula returns a value
less than, equal to, or greater than 0.
• Point p lies inside the query scope if the value is less than or equal to 0 for
every pair of consecutive points.
All points located inside the query scope are collected and returned to the
requesting BS. Then, the requesting BS combines this result with its own result
and sends the combined result to the requester.
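The right hand rule above can be sketched as a sign test over consecutive clockwise-ordered vertices. The function name `inside_convex_polygon` is illustrative; the sketch assumes, as in Figure 3.16, that the intersection area is a convex polygon.

```python
def inside_convex_polygon(p, vertices):
    """Test whether point p lies inside (or on the boundary of) a convex
    polygon whose corners are given in clockwise order."""
    n = len(vertices)
    for i in range(n):
        p0 = vertices[i]
        p1 = vertices[(i + 1) % n]   # wrap around to close the polygon
        # sign of the cross product (p1 - p0) x (p - p0)
        cross = (p[1] - p0[1]) * (p1[0] - p0[0]) - (p[0] - p0[0]) * (p1[1] - p0[1])
        if cross > 0:                # p lies to the left of this clockwise edge
            return False
    return True
```

For a clockwise-ordered convex polygon, an interior point lies to the right of every edge, so the cross product is at most zero for each edge; a single positive value rejects the point early, as in the inner loop of Algorithm 3.11.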
Algorithm 3.11: Neighbour cell retrieval algorithm for dynamic query scope.

Input: queryScope, BSScope
Output: Results
begin
    list_of_items ← items in this BS
    list_of_BS_Vertices ← BS vertices
    Intersection_points ← all intersection points lying on the query scope and
        the neighbour BS boundary
    // Check whether the query scope covers the whole BS scope
    if (all points in list_of_BS_Vertices are covered by queryScope) then
        return all items in BSScope
    end
    if (any point in list_of_BS_Vertices is covered by queryScope) then
        add that point to Intersection_points
    end
    sort the elements of Intersection_points in clockwise order
    for (each item in list_of_items) do
        p ← item
        isInside ← true
        index ← 0
        while (isInside is true) and (index < number of elements in Intersection_points) do
            p0 ← Intersection_points[index]
            p1 ← Intersection_points[index + 1]
            isRS ← (p.y - p0.y)(p1.x - p0.x) - (p.x - p0.x)(p1.y - p0.y)
            if (isRS > 0) then
                isInside ← false
            end
            index ← index + 1
        end
        if (isInside) then
            result ← result + p
        end
    end
    return result
end
3.5 Handling Disconnections
In a mobile environment, mobile users may not receive any query results due to a
disconnection between the mobile user and the base station. A disconnection can
be either unpredictable or predictable, caused by interference or by the recipient's
location being outside the coverage of any base station, respectively. This section
discusses our proposed algorithms for handling disconnections.
A server can handle a predictable disconnection more easily than an unpredictable
one, because the server knows the wake-up time of the mobile user. For an
unpredictable disconnection, on the other hand, the server has no knowledge of
when the mobile device reconnects or of the recipient's next location.
As a result of a disconnection, either the query result has not been received
by the mobile user or an acknowledgement has been lost in transit. To deal with
the missing-result problem, the server could reprocess the query when the
disconnection is predictable. On the other hand, reprocessing the query result
might not be a good idea, since the server needs a certain amount of time to
produce the query result and disconnections may happen frequently. Preserving
the existing query result and resending it periodically within a certain amount of
time is one solution for delivering the query result in this situation. For both
solutions, the mobile user needs to send an acknowledgement once the query result
has been received. The server keeps a query result only for a certain amount of
time, to avoid excessive query result regeneration for mobile users. If the server
fails to receive the acknowledgement, the situation is treated as if the mobile user
had not received the query result that was sent.
The rest of this section discusses the proposed approaches for handling the two
types of disconnection, for a single cell and for multiple cells. The proposed
approach for handling disconnection in a single cell is presented first, followed by
that for multiple cells. In each subsection, the proposed techniques for handling
predictable and unpredictable disconnections are presented.
3.5.1 Single Cell
Predictable Disconnection
This section elaborates on a proposed mechanism for handling the situation where
mobile users miss query results within a predictable time. In other words, mobile
users are alerted to be ready to receive query results in the next interval.
Figure 3.17: Illustration of predicted disconnection situation
Consider a user at location Z0 who sends a query requesting objects within a
distance D at time tstart, and who travels at a constant speed S. The user does not
receive a query result at location Z1 at time tstart+1, and is expected to arrive at
location Z2 to receive a query result at time tstart+2, as shown in Figure 3.17. If
the gap between Z1 and Z2 is less than the distance of the user query, an
overlapping area of the two query scopes (indicated by TRSQU) is generated at
times tstart+1 and tstart+2. Regenerating the same results in the future overloads
the server.
In our proposed approach, retrieving the same result set can be avoided when
the above situation happens. The approach is divided into two major steps. The
first step is to determine whether the two query scopes overlap, which is the case
if the gap between Z1 and Z2 is shorter than the distance value of the query. The
second step is to exclude items from the result set, and this can be done in one
of two ways: item elimination or area elimination. Both ways are similar to the
methods for eliminating duplicated items in multi-cell query processing for
overlapping BSs (described in Section 3.4.1), and both collect the items located
inside the overlapped region from the existing query result. In item elimination,
we do not eliminate the overlapped area while searching for items; instead, all
items are compared with the items in the overlapped region. In area elimination,
we eliminate the overlapped area while searching for items, so only the remaining
part of the query scope has to be searched.
Algorithm 3.12 presents our proposed algorithm. The details of this algorithm
are explained below:
(i) Verify the previous position, (X1, Y1). If it is outside the current query scope,
terminate the algorithm and return an empty result set.
(ii) Form the overlapping area of the two query scopes.
(iii) Retrieve all objects which are located in the query scope but outside the
overlapping area. This step is the second part of the proposed approach;
hence, an elimination process is completed here by choosing one of the two ways.
(iv) Keep all items in the overlapped region.
(v) Add those objects to the ones in the overlapping collection.
Algorithm 3.12: Handling a predictable disconnection in a single cell.

begin
    Objects ← object collection within the current base station boundary
    (X1, Y1) ← location at time tstart+1
    (X2, Y2) ← location at time tstart+2
    Dist ← distance of user query
    overlapping_collection ← empty
    if ((X1, Y1) is outside queryScope) then
        return overlapping_collection
    end
    overlapping_area ← area formed by the TRSQU coordinates, i.e. the
        intersection of the query scopes of half-width Dist at (X1, Y1)
        and at (X2, Y2)
    overlapping_collection ← all objects in the overlapping area
    result ← overlapping_collection + all objects in the new query scope
        that are not located in the overlapping area
    return result
end
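The first step, forming the overlap of the two query scopes, can be sketched as follows. The sketch assumes each scope is a square of half-width D centred on the user's position, as in Figure 3.17; the names `square_scope` and `reusable_items` are illustrative.

```python
def square_scope(center, d):
    """Query scope as (xmin, ymin, xmax, ymax): a square of half-width d."""
    x, y = center
    return (x - d, y - d, x + d, y + d)

def reusable_items(z1, z2, d, existing_result):
    """Items of the existing result that fall in the overlap of the scopes
    at z1 (time tstart+1) and z2 (time tstart+2), and so need not be re-fetched."""
    a = square_scope(z1, d)
    b = square_scope(z2, d)
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 > x1 or y0 > y1:     # the scopes are disjoint: nothing can be reused
        return []
    return [(x, y) for (x, y) in existing_result
            if x0 <= x <= x1 and y0 <= y <= y1]
```

The overlap is non-empty exactly when the gap between Z1 and Z2 is shorter than the scope width, matching the condition in the first step of the approach.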
Unpredictable Disconnection
This subsection presents a proposed technique for managing situations where an
unpredictable disconnection occurs. There are two possible solutions for handling
such a disconnection: not reprocessing, or reprocessing, the query result.
Algorithm 3.13 shows the proposed non-reprocessing algorithm for when an
unpredictable disconnection occurs. This algorithm is executed when the BS has
received information that the mobile user is ready. At the start of the algorithm,
the query scope is the scope of the query whose result the mobile user missed; this
scope is already available on the server. (X1, Y1) is the last position at which the
mobile user missed the query result, and (X2, Y2) is the current position of the
mobile user. When (X2, Y2) is still inside the query scope, the server sends the
existing query result. Otherwise, the server needs to reprocess the existing query
with the next location of the mobile user.
Algorithm 3.13: Non-reprocessing algorithm.

begin
    Query_scope ← query scope from the user
    (X1, Y1) ← sender location at time tmissed
    (X2, Y2) ← location at time tcurrent
    if ((X2, Y2) is inside Query_scope) then
        send the existing query results
    else
        regenerate the query result
    end
end
The advantage of this algorithm is that it reduces the server load by keeping the
existing query result, subject to the server configuration. Two drawbacks of this
algorithm are that it increases server memory consumption, because it retains
existing query results, and that some objects may be invalid, since the requester
has moved to a new location.
Alternatively, the server regenerates a new query result without considering the
existing query result. Algorithm 3.14 shows the reprocessing algorithm for when
the mobile user has missed a query result. This algorithm is executed when the
mobile user reconnects to the current BS. As in the previous algorithm, the server
collects the current location information of the mobile user and predicts the next
location at the beginning of the algorithm. Then, the server generates a query
scope at (X2, Y2) with the same searching distance that was passed to the server
Algorithm 3.14: Reprocessing algorithm.

begin
    (X1, Y1) ← sender location at time tcurrent
    (TDx, TDy) ← travel distance of the mobile user
    // Prediction of the next location at time tcurrent+1
    (X2, Y2) ← (X1 + TDx, Y1 + TDy)
    Query_scope ← generate query scope at (X2, Y2) with the same searching distance
    Result ← reproduce the query result for time tcurrent+1
    send Result to the user
end
beforehand. In the next step, the server reproduces the query result and, finally,
sends it to the requester.
3.5.2 Multiple Cells
Disconnections also occur while retrieving multi-cell query results. A problem
occurs if the mobile user travels to an area outside the service area of a BS: the
BS needs to avoid processing the query when the user cannot be reached. This
section presents a discussion of this problem, focusing on predictable and
unpredictable disconnections.
Predictable Disconnection
This subsection describes the proposed algorithm for handling a predictable
disconnection, in which the period of disconnection can be known in advance; for
example, when a mobile user is outside the service area of the base station for a
certain period of time.
Algorithm 3.15 shows our proposed algorithm for handling a predictable
disconnection while receiving the query result. At the start of the algorithm, the
server retains the existing query results that were not sent while the next recipient
location
Algorithm 3.15: Predictable disconnection algorithm for multi-cell retrieval.

begin
    (Xt, Yt) ← new location at time t
    if ((Xt, Yt) is outside the current BS area) then
        isOutCurrBS ← true
    else
        while ((Xt, Yt) is inside queryScope) do
            send the existing query result when the connection is established
            isSent ← acknowledgement from the user upon receiving the result
        end
    end
    while (isOutCurrBS) and (BS in list of online BSs) do
        if ((Xt, Yt) is inside the BS area) then
            forward the query and location to that BS
            exit loop
        end
        BS ← next BS
    end
    remove query result
end
was still inside the query scope. Otherwise, a new query result is generated, as
the location of the mobile user is outside the query scope.
In addition, the mobile user may leave the current BS area, in which case the
current BS needs to send the query to a neighbour BS. However, the next location
of the mobile user may not belong to any online BS. Therefore, the current BS
needs to calculate the next location at which the mobile user enters one of the
online BSs. The new BS then processes the query and sends the result once the
mobile user is connected to it.
Unpredictable Disconnections
The problem of unpredictable disconnections for multi-cell queries is different from
that for a single cell. The difference arises from the movement of the user to
another cell: the current cell should know whether to remove or keep the query
result. We propose an approach to handle this situation in this section.
Algorithm 3.16 shows an algorithm for handling an unpredictable disconnection.
The server waits for an acknowledgement once it has finished sending the query
result. The acknowledgement parameter is either true or false; it is true if the
mobile user has received the query result completely.
Algorithm 3.16: Unpredictable disconnection algorithm for multi-cell retrieval.

begin
    queryResult ← existing query result
    isSent ← false
    while (numOfSendingTrial < maxSendingAllowed) and (not isSent) do
        if (connected) then
            if (recipient location is outside the query scope) then
                regenerate the query result based on the new location
                exit loop
            end
            send query result
            wait for an acknowledgement for a period t
            acknowledgement ← acknowledgement from the user upon receiving the query result
            if (acknowledgement is received) and (acknowledgement is true) then
                isSent ← true
            end
        else
            exit loop
        end
        numOfSendingTrial ← numOfSendingTrial + 1
    end
    remove query result
end
The waiting period is calculated as the maximum number of query result
transmissions multiplied by the waiting period for receiving an acknowledgement.
The values of both parameters are configurable, depending on the server capacity.
The query result is kept on the server side until the maximum number of
transmissions has been reached or the user is disconnected from the server.
The formula to calculate the waiting period before a query result is deleted is
given below:

WP = MS * WPA

where:
WP is the waiting period before a query result is deleted
MS is the maximum number of sends
WPA is the waiting period to receive an acknowledgement (timeout)
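As a minimal worked example of the formula (the numbers are illustrative only): 5 send attempts with a 30-second acknowledgement timeout give a 150-second retention window.

```python
def waiting_period(max_sends, ack_timeout):
    """WP = MS * WPA: how long a query result is retained before deletion."""
    return max_sends * ack_timeout

print(waiting_period(5, 30))   # -> 150
```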
If one of the above conditions is reached, the server removes the query result.
The removal avoids running out of server space, even if the server has a large
amount of space. After this period, or once the new location falls outside the
query scope, the mobile user needs to send the query again.
This algorithm assumes that the recipient location resides inside the query
scope; otherwise, keeping the query result at the server side is ineffective, since
the query result is invalid for the new location.
3.6 Case Studies
In this section, we describe case studies for single cell and multi-cell queries to
illustrate how these proposed approaches work and how query results are computed.
3.6.1 Single-Cell Query Processing
We illustrate situations where the user has stopped or is moving while receiving the
query result. The user may move slowly or quickly and the movement direction may
be vertical, horizontal or diagonal. We define a slow velocity as one where the
user's movement is less than the distance of the user query; in other words, if a
user query searches for targets within x, the user moves less than x. In contrast,
a fast velocity is one where the user's movement is greater than or equal to the
distance of the user query. Consequently, the user may either receive or miss the
query result while it is being delivered by the server.
Based on the above situations, this case study is divided into four cases, which
discuss zero, vertical, horizontal and diagonal movement respectively. Each case
discusses two situations of query result retrieval, hitting and missing the query
result, while the user is moving.
Case Study 3.6.1. The mobile user stays in the same location
In this example, we assume that the user does not move to any other location
while a query result is being received. Consider a mobile user located at point
(5,5) who sends a query to a server (refer to Figure 3.18). The query is "Find the
closest restaurant within 2 km". The user stays at the same location while the
answer is given by the server; in other words, the location at time tstart+1 is the
same as the one at time tstart.
The server generates a valid scope by adding and subtracting the distance
to/from the mobile user's position. Therefore, we have a square formed by the
following coordinates: Top Right: (7,7), Bottom Right: (7,3), Top Left: (3,7),
Bottom Left: (3,3).
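The scope construction can be written directly from the description above; `valid_scope` is an illustrative name, not from the thesis.

```python
def valid_scope(x, y, distance):
    """Build the square valid scope by adding/subtracting the query distance
    to/from the user's position, as in Case Study 3.6.1."""
    return {
        "top_right":    (x + distance, y + distance),
        "bottom_right": (x + distance, y - distance),
        "top_left":     (x - distance, y + distance),
        "bottom_left":  (x - distance, y - distance),
    }
```

For the user at (5,5) with a 2 km distance, this reproduces the corners (7,7), (7,3), (3,7) and (3,3) listed above.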
After the valid scope has been produced, the server searches for a restaurant within
the ranges 3 < x < 7 and 3 < y < 7; in other words, all regions are searched. Once the
Figure 3.18: Stay at the same location (Case Study 3.6.1)
server finds a restaurant within that range, it generates a query result for that
query and forwards it to the user. An acknowledgement flag is set to true if the
user has received the query result successfully; otherwise, the server keeps
processing the query result for the next time interval.
Case Study 3.6.2. Vertical Movement
We illustrate the case where the user is moving vertically with a constant speed. First, we
show the user receiving the query result immediately from the server at time
tstart+1. Later, a situation where the user misses the query result from the server at
time tstart+1 is shown.
After the scope has been created and divided into four equal regions, the server
identifies the user movement direction. The examples in Figure 3.19 show the user
travelling vertically. The server then executes Algorithm 3.2 in order to find
all queried objects. In Figure 3.19a, the user is moving upward; therefore,
the server will search for targets within regions 1 and 2 of the scope
instead of the whole scope. This is due to our assumption that the user is interested
only in targets that have not yet been passed. Hence, the valid targets will be
vending machines (V6, V8, V9, V11, V13), and these targets are forwarded to the
(a) Move up (b) Move down
Figure 3.19: Vertical movement (Case Study 3.6.2-1)
user. The server sets the parameter forwarded to be true once the user has received
the answer successfully.
On the other hand, if the user is moving down, regions 3 and 4 will be probed
(Figure 3.19b). Hence, the vending machines (V2, V7, V10, V14) are valid results
and are forwarded to the user, and the parameter forwarded is set to true.
We have shown a situation where the user receives the query result at time
tstart+1 above. Now, we assume that the user missed the query result at time tstart+1
when the user is moving up with a constant speed. The user will receive it in the
next interval time tstart+2 as shown in Figure 3.20.
The beginning of this algorithm is the same as above: the server initialises the
required parameters by assigning values from the received user query. After
parameter initialisation, the server generates a new scope for location Z1 and
divides the scope into four equal regions. Once the scope has been created and
divided, the server searches for targets within regions 1 and 2, rather than all
regions, based on our assumption above. Therefore, the only valid vending machines
at time tstart+1 will be (V9, V13).
Figure 3.20: Vertical movement with overlap situation (Case Study 3.6.2-2)
In handling the disconnection, the server regenerates new query results for the
next time interval, when the user is predicted to reach location Z2 at time tstart+2.
In recreating the query results for time tstart+2, the server verifies the targets in
the old query results: it will invalidate targets that are not bounded by the
overlapping area PQRS (see Figure 3.20). In this scenario, no target is invalidated.
Then, the server probes for new targets in regions 1 and 2 inside the square generated
at time tstart+2. The new targets found are joined to the existing valid targets.
Hence, the query result returned at time tstart+2 consists of vending machines (V6, V8, V9, V13).
Case Study 3.6.3. Horizontal Movement
Here, we present three examples of the horizontal movement of a user. First, two
hit illustrations show a user receiving the query result at time tstart+1 while
moving horizontally with a constant speed. Later, a situation where the user misses
the query result at tstart+1 is presented.
Let us consider an illustration where a user queries “Find all vending machines
within 2 kms” while travelling at speed S, as presented in Figure 3.21. The horizontal
(a) Move right (b) Move left
Figure 3.21: Horizontal movement (case study 3.6.3-1)
movement to the right with speed S is shown in Figure 3.21a. At the
beginning of the process, the server receives the user query, including the current
travel information. The server creates a query scope based on the information in
the user query and then selects the regions to be searched. Due to our assumption
mentioned above, the server assigns all targets located within regions 1 and 4 of
the scope into the parameter collection. We assume that the user will arrive at
point (5,5) at time tstart+1. Therefore, the valid vending machines (V9, V10, V11,
V13 and V14) are forwarded to the user and the parameter forwarded is set to true.
On the other hand, when the user is moving left with velocity S, regions 2 and 3
will be searched to find valid objects (shown in Figure 3.21b). Therefore, the
valid targets (V2, V6, V7, V8) are forwarded to the user and the parameter
forwarded is set to true.
Now, let us consider a situation where the user missed the query result at time
tstart+1 and then the user will receive a new result at time tstart+2 as shown in Figure
3.22. Assume that the server had processed and found the vending machines: (V9,
V11, V13 and V14) as valid targets of the user query at time tstart+1.
Figure 3.22: Horizontal movement with overlap situation (case study 3.6.3)
However, the user could not receive the query result when he/she was at location
Z1 at time tstart+1, because a disconnection occurred while receiving the results. We
assume the user keeps travelling with a constant speed and is predicted to arrive at
location Z2 at time tstart+2. The server then regenerates the query result for the
next time interval. Since the user is moving slowly with a constant speed, there is
an overlapping area, formed by points P, Q, R, S, between the squares at times
tstart+1 and tstart+2. Therefore, the server will invalidate some targets in the
existing query result. In this case, the vending machines (V9, V14) have expired
and are eliminated. In other words, the vending machines whose locations are bounded
by the overlapping area PQRS remain valid targets for time tstart+2. After the server
has eliminated the invalid targets, it probes targets located within regions 1 and 4
of the scope (excluding the overlapping area), since the user is moving horizontally
and is interested only in the targets that have not been passed. The newly found
targets are merged with the targets found in area PQRS. Afterwards, these targets
(V10, V11 and V13) are forwarded to and received by the user at time tstart+2.
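The overlap-based revalidation described above can be sketched as follows; the coordinates are hypothetical, and `overlap_rect`/`revalidate` are our own names, not from the thesis:

```python
def overlap_rect(a, b):
    """Intersection of two axis-aligned rectangles, each given as
    (xmin, ymin, xmax, ymax); None if they are disjoint."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin, ymin, xmax, ymax)

def inside(pt, rect):
    return rect[0] <= pt[0] <= rect[2] and rect[1] <= pt[1] <= rect[3]

def revalidate(old_results, old_scope, new_scope):
    """Keep only the old targets whose locations fall inside the
    overlapping area PQRS of the two scopes."""
    pqrs = overlap_rect(old_scope, new_scope)
    if pqrs is None:
        return []
    return [t for t in old_results if inside(t[1], pqrs)]

# Hypothetical coordinates: the user moved 2 units right between intervals.
old = [("V9", (3.5, 5.0)), ("V14", (3.8, 4.2)), ("V11", (6.0, 5.5))]
print(revalidate(old, (3, 3, 7, 7), (5, 3, 9, 7)))  # only V11 survives
```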
Case Study 3.6.4. Diagonal Movement
In this example, two illustrations (as shown in Figure 3.23) of diagonal movement
are presented. The first illustration demonstrates a situation where a user receives
a query result at time tstart+1 when he/she moves diagonally with a constant speed.
In contrast, the last illustration shows a stage when a user misses a query result at
time tstart+1 and is expected to receive a new query result at time tstart+2.
(a) Diagonal movement (b) Overlap
Figure 3.23: Diagonal movement and overlap situation (Case Study 3.6.4)
Let us consider that a user sends the same query as in the previous examples
(Figures 3.19 and 3.21) and moves in a top-right direction. At the start of the
process, a server receives the user query. The server then produces a scope around
the predicted location at time tstart+1, which is point (5,5), and divides the scope
into four equal regions. The next process analyses the user direction by calling
the diagonal movement algorithm (Algorithm 3.4) to check targets in the region
ahead, opposite to the area already passed. In the algorithm, the server verifies
the user movement direction. In our example, the user moves in a top-right
direction; therefore, based on our assumption, the server will search targets
within region 1 instead of all regions. Then, the valid vending machine, V11, is
sent to the user.
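Across the four case studies, the mapping from movement direction to the regions probed can be summarised in a small table (the region numbering 1 = top-right, 2 = top-left, 3 = bottom-left, 4 = bottom-right is inferred from the case studies, not stated explicitly in the thesis):

```python
# Region numbering inferred from the case studies: 1 = top-right,
# 2 = top-left, 3 = bottom-left, 4 = bottom-right.
REGIONS_TO_PROBE = {
    "up":           {1, 2},        # top half (Case Study 3.6.2)
    "down":         {3, 4},        # bottom half
    "right":        {1, 4},        # right half (Case Study 3.6.3)
    "left":         {2, 3},        # left half
    "top-right":    {1},           # single region (Case Study 3.6.4)
    "top-left":     {2},
    "bottom-left":  {3},
    "bottom-right": {4},
    "none":         {1, 2, 3, 4},  # stationary user: search all regions
}

print(REGIONS_TO_PROBE["top-right"])  # {1}
```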
In the next illustration (Figure 3.23b), the scenario is similar to that above;
however, the user experiences a disconnection while receiving the query result at
time tstart+1. Therefore, the user misses the query result at that time and is
expected to receive the next query result at time tstart+2.
When the server is aware that the user has missed the current result, it
regenerates a new query result for location Z2, since the user is predicted to
arrive there at time tstart+2. The server generates a scope for location Z2. The
overlapping area is searched in order to invalidate the existing targets that are
not bounded within this area. Then, the server searches for targets located within
the scope (excluding the overlapping area), and the new targets found are joined
to the existing targets. Hence, the query result containing (V11, V13) is returned
to the user. An acknowledgement is set once the user receives the query result.
3.6.2 Multi-Cell Query Processing
In this subsection, we present examples to show how our proposed query processing
algorithm for multiple cells works. We divide this into two parts: a non-overlapping
BS area and an overlapping BS area. First, we discuss how to retrieve query results
where there is no overlapping area; then, we discuss the process for the situation
where there is an overlapping area. As a running example, we use the same query as
mentioned in Section 3.6.1, which is sent to a server through BS1.
Non-Overlapping BS Area
Two examples are given to illustrate two situations: a user who moves within the
current cell, and a user who moves to another cell and requests information about
objects from multiple cells.
Figure 3.24 shows a situation where a query scope crosses eight BS boundaries.
In this situation, BS3 receives the query scope and forwards it to its neighbour
BSs (BS2, BS4, BS7, BS8, BS9). Those BSs search for objects within the requested
area and verify their lists of online BSs. BS4 and BS9 forward the query again to
BS5 and BS10 respectively to request objects in their areas. BS5 and BS10 return
information about the requested objects to the requesters, BS4 and BS9. All BSs
that received the forwarded query from BS3 return their information to BS3. BS3
merges all of the results and then sends them to the user.
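The forwarding scheme above can be sketched as a recursive search over neighbouring BSs; the rectangle-based coverage model and all names here are illustrative assumptions, not the thesis implementation:

```python
def intersects(area, scope):
    """True when two axis-aligned rectangles (xmin, ymin, xmax, ymax) overlap."""
    return not (area[2] <= scope[0] or scope[2] <= area[0] or
                area[3] <= scope[1] or scope[3] <= area[1])

def covered(pt, scope):
    return scope[0] <= pt[0] <= scope[2] and scope[1] <= pt[1] <= scope[3]

def answer_query(bs, scope, coverage, neighbours, objects, visited=None):
    """Each BS returns the objects it holds inside the scope and forwards
    the scope to neighbours whose areas the scope also covers."""
    if visited is None:
        visited = set()
    visited.add(bs)
    results = [o for o in objects.get(bs, []) if covered(o[1], scope)]
    for nb in neighbours.get(bs, []):
        if nb not in visited and intersects(coverage[nb], scope):
            results += answer_query(nb, scope, coverage, neighbours,
                                    objects, visited)
    return results

# A hypothetical chain of three BSs; the scope spans all three areas.
coverage = {"BS3": (0, 0, 10, 10), "BS4": (10, 0, 20, 10), "BS5": (20, 0, 30, 10)}
neighbours = {"BS3": ["BS4"], "BS4": ["BS3", "BS5"], "BS5": ["BS4"]}
objects = {"BS3": [("A", (5, 5))], "BS4": [("B", (15, 5))], "BS5": [("C", (25, 5))]}
print(answer_query("BS3", (4, 4, 26, 6), coverage, neighbours, objects))
```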
Figure 3.24: A query scope is crossing multiple cells
Figure 3.25 shows a situation where a user moves to another BS. The user sends
the same query to BS3. Once BS3 receives the user query, the predicted location of
the user is calculated as a function of time, formulated as the travel speed
multiplied by time. Since the new location of the mobile user is outside its area,
BS3 forwards the query to BS8. BS8 creates a query scope and processes the query
inside the shaded area.
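The location prediction step (travel speed multiplied by time) can be sketched as follows, with a hypothetical rectangular coverage map; the function names are ours:

```python
def predicted_location(x, y, vx, vy, dt):
    """Predicted position after dt time units, assuming constant
    velocity (vx, vy): distance travelled is speed multiplied by time."""
    return (x + vx * dt, y + vy * dt)

def serving_bs(pt, coverage):
    """First BS whose area rectangle contains the predicted point; if it
    is not the current BS, the query is forwarded there."""
    for bs, (xmin, ymin, xmax, ymax) in coverage.items():
        if xmin <= pt[0] <= xmax and ymin <= pt[1] <= ymax:
            return bs
    return None

coverage = {"BS3": (0, 0, 10, 10), "BS8": (0, -10, 10, 0)}
loc = predicted_location(5, 1, 0, -2, 1)   # moving down, crosses into BS8
print(loc, serving_bs(loc, coverage))      # (5, -1) BS8
```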
In fetching the query result, BS8 searches for objects inside the requested area
and checks its list of online BSs. The query is then forwarded to those online
neighbour BSs whose areas are covered by the query scope. In this situation, BS7
and BS9 receive the forwarded query and perform the same process as BS8. Then, BS9 forwards the
Figure 3.25: Moving across to another base station (BS) boundary
query to BS10, since the query covers part of the BS10 area. BS10 performs the
same process and sends the query result back to BS9. BS9 merges its query result
with the one from BS10. BS9 and BS7 send their query results back to BS8. BS8
combines its query results with the ones from BS9 and BS7. Once the query results
are merged, the result is sent to the user.
Overlapping BS Area
Having presented two examples of non-overlapping BSs, we now show three examples
of possible situations in which the new location of the user, or the query scope,
interacts with BSs whose areas overlap with others. We provide examples where the
new location of the user is within the intersection area and before the
intersection area.
Figure 3.26a illustrates the case when a user moves to another BS and there is an
overlapping area amongst the BSs. After BS1 receives the query from a user, it
searches for targets within its area. After that, it verifies whether its area
intersects with those of its neighbours. If there is any intersection between BS1
and other neighbour BSs, BS1 forwards the query to those intersected BSs. In this
figure, BS1 forwards the query to BS2. BS2 adjusts its minimum boundary by assigning the maximum boundary of
(a) Within an overlapping BS area (b) Outside an overlapping BS area
(c) Within many overlapping BS areas
Figure 3.26: Three situations of overlapping base station area
BS1. Then, BS2 generates a new query scope by subtracting the current query
scope from the BS2 minimum boundary. Finally, all targets within the query scope
and the BS2 area are collected and sent back to BS1. BS1 combines the results from
BS2 into its own collection, and the final results are sent to the user.
Figure 3.26b shows a situation in which the new location of the user is before the
overlapping area (the shaded area between BS1 and BS2) and the query scope extends
beyond the BS1 area. In this situation, BS1 searches for targets within its area
and the overlapping area (shaded). Once the search has been completed, BS1 checks
whether the query scope is beyond its boundary. If it is not, BS1 sends the query
result to the requester.
On the other hand, if the query scope is beyond the BS1 area, BS1 forwards the
query to all BSs whose areas the scope passes through. In this example, BS1
forwards the query to BS2. BS2 searches its area, excluding the overlapping area, since BS1 has
already searched that area. BS2 returns the query results to BS1, which returns
them to the requester.
Figure 3.26c shows an example of a more complex situation. Once BS3 has received
the user query, it searches its area to match objects within it against the user
query; the overlapping areas with BS4 and BS5 are included too. Then, BS3 passes
the user query to either BS4 or BS5, depending on which registered with BS3 first.
If BS4 registered first, the area overlapping BS5 is included in the BS4 search;
otherwise, that area is excluded from the BS4 area. The situation is similar for
BS5: if this BS registered first, the area overlapping BS4 is included in the BS5
search; otherwise, the overlapping area does not belong to the BS5 area. After all
BSs whose areas the user query passes have returned their answers to BS3, it joins
those answers and sends them to the user.
3.7 Discussion
Our proposed approaches are designed to retrieve all requested objects that have
not been passed while mobile users are travelling and awaiting the query result.
The approaches focus on straight-line movement at a constant speed.
The proposed approaches are concerned with minimising query processing and the
data transfer of query results while mobile users are travelling within a single
cell or multiple cells. They are divided into two categories: query processing and
disconnection handling. The query processing is further divided into single-cell
and multi-cell queries.
The advantage of the proposed query processing approaches is that they avoid
processing an unnecessary part of the query, namely the area that has already been
passed by the user. Another benefit is a reduction in the amount of data
transferred when sending the query result.
We also proposed a solution to handle disconnections that occur while transferring
the query result. The proposed solutions are divided into handling predictable and
unpredictable disconnections. The benefit of both approaches is that a query
result can be generated, based on the predicted future location, without
requesting the user to resend the query when a disconnection occurs. However,
there is a limit on how many times a query result is reproduced when the user has
repeatedly failed to receive it; this limit is configurable depending on the
server load.
3.8 Conclusion
This chapter discussed mobile query processing for single and multiple cells. At
the beginning of the chapter, the effectiveness of using a square shape as the
query scope in location-dependent query processing was introduced, focusing on
retrieving static object information within a single cell. The advantages of a
square shape over other shapes as the query scope were also presented. Algorithms
for retrieving object location information were developed to eliminate objects
that have been passed by users, even though they are still inside the query scope.
Finally, when users miss query results and their movements are slow, the past and
current query scopes overlap each other; an algorithm was therefore developed to
handle this situation and prevent redundant information from being sent.
In the second part of the chapter, we discussed three methods to retrieve items
from multiple cells. The first method considers overlapping and non-overlapping BS
scopes and a query scope parallel to the base station. The second deals with a
dynamic query scope. Finally, we proposed an algorithm to deal with disconnections
that occur while receiving query results. We discussed the efficiency of these
proposed algorithms in retrieving query results from multiple cells, and case
studies were provided to demonstrate it.
Chapter 4
Indexing for Multiple Servers
Retrieval
Chapter 3 focused on how mobile query processing is performed; however, it did not
discuss the indexing mechanism used when multi-cell queries are requested. Thus,
this chapter presents our new contribution to processing multi-cell queries using
indexing, namely the Local Index and the Global Index.
4.1 Introduction
It is a characteristic of mobile queries that the locations of mobile users are
dynamic and that users often request data items located inside a single cell or
multiple cells. This dynamic change creates the need for better query processing
speed and a reduced number of invalid query results.
Query processing to retrieve objects located in multiple cells raises an issue
that impacts query processing performance: each cell finishes query processing in
a different amount of time. The difference in query processing time for each cell
is caused by differing transfer speeds, queue sizes and query
CHAPTER 4. INDEXING FOR MULTIPLE SERVERS RETRIEVAL 125
processing speeds in each server. These three factors increase the overall query
processing time.
One way to improve query processing is to provide an index structure for each
cell. Indexing is a common mechanism for accessing a collection of records and
improving the efficiency of query processing [93, 129]. An index organises data
records to optimise certain kinds of retrieval operations. Several indexing
schemes have been proposed in the past, the most prominent among them being the
tree-based schemes [123]. Tree indexing schemes search from the root node down to
the leaf nodes. Tree index structures help to process single-cell queries;
however, their disadvantage in processing multi-cell queries is that each cell
needs to traverse from its root node in order to produce the query result.
This chapter proposes two index mechanisms, namely Local and Global Indexing. The
aim is to address the limitations of multi-cell query processing when examining
index structures in those cells. Neither proposed approach intends to create a new
type of index structure; rather, both extend existing index structures to improve
the efficiency of multi-cell query processing. We also do not concentrate on
concurrency issues, which occur in tree searching, or their solutions.
Moreover, both proposed approaches use the original type of multidimensional
index structure, called the R-tree [41], and can also be applied to any member of
the R-tree family. The proposed mechanisms have their own characteristics, which
can be summarised as follows:
• Local Index mechanism
As the name implies, this mechanism tries to process a multi-cell query within
a single cell. When there is a multi-cell query, the remote indices of the
objects in the query result are stored locally in the current cell. In storing
the remote indices, the remote object information is either replicated along
with its indexes or kept in the original cell, in which case pointers from the
leaf nodes that hold the remote indexes refer to it. Then, if a future
multi-cell query requests the same area, the current cell can answer it
locally. On the other hand, we do not store all requested remote indices
locally, because this would slow down query processing and consume more space.
• Global Index mechanism
This mechanism differs from the Local Index mechanism in terms of index
organisation. Instead of storing the remote indices of the objects in the
query result, a global index structure is created when a base station starts
up: each cell propagates its index structure to every other cell. Hence, when
processing a multi-cell query, the tree index traversal can be done over this
global index structure.
Figure 4.1: Chapter 4 framework
Figure 4.1 shows this chapter's framework. Section 4.2 gives an overview of this
chapter as a foundation for the proposed approaches. The two proposed indexing
approaches using the original R-tree are discussed in Sections 4.3 and 4.4
respectively. Examples of the usage of both proposed approaches are discussed in
Section 4.5. Section 4.6 presents a discussion of the two proposed approaches.
Finally, the conclusion summarises the contents of this chapter.
4.2 Preliminary Study
This section presents an overview of the original R-tree indexing structure with
a brief explanation of the multi-cell query processing scenario.
When a base station receives a multi-cell query, it verifies whether the query
scope extends beyond the base station area. The part of the query scope that is
beyond its area is forwarded to the base stations whose areas are covered by the
query scope. Assume that the index structure for each base station is the original
R-tree and has been built in advance. Each base station searches its index
structure to find the area that overlaps with the query scope. At each base
station, the probing process starts from the root node and goes down to the leaf
nodes. Objects of the matched leaf nodes are collected to be returned to the user.
The R-tree structure is an adaptation of the B+-tree to deal with spatial data.
It is a height-balanced data structure with internal and leaf nodes [93]. Internal
nodes consist of index entries of the form <n-dimensional box, pointer to a child
node>. Leaf nodes hold data entries. A data entry is a pair <n-dimensional box,
rid>, where rid is the identifier of an object and the box is the smallest box
that contains the object, which can be represented as a point or a region. The
n-dimensional box for internal and leaf nodes is called the Minimum Bounding
Rectangle (MBR) or Minimum Bounding Box (MBB).
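The node structure described above can be sketched in a few lines (a toy model: the field names are ours, and the MBRs are limited to two dimensions for brevity):

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

MBR = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

@dataclass
class DataEntry:
    """Leaf-level data entry: <n-dimensional box, rid>."""
    mbr: MBR
    rid: str

@dataclass
class Node:
    """Internal node entries pair an MBR with a child node; leaf node
    entries pair an MBR with a data entry."""
    is_leaf: bool
    entries: List[Tuple[MBR, Union["Node", DataEntry]]] = field(default_factory=list)

def mbr_of(entries):
    """Minimum bounding rectangle enclosing a list of child MBRs."""
    xs0, ys0, xs1, ys1 = zip(*(m for m, _ in entries))
    return (min(xs0), min(ys0), max(xs1), max(ys1))

leaf = Node(True, [((1, 1, 2, 2), DataEntry((1, 1, 2, 2), "R8")),
                   ((3, 3, 4, 4), DataEntry((3, 3, 4, 4), "R9"))])
print(mbr_of(leaf.entries))  # (1, 1, 4, 4)
```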
Figure 4.2 shows a set of two-dimensional regions and the corresponding R-tree
indexing structure. Figure 4.2a demonstrates the geometric locations of objects
presented as two-dimensional coordinates. Figure 4.2b shows the R-tree index
structure for these coordinates. In the figure, there are 12 data object regions
(shaded boxes), denoted by (R8, R9, R10, R11, R12, R13, R14, R15, R16, R17, R18,
R19). These regions appear as leaf entries of the R-tree index structure, as shown
in Figure 4.2b. Regions at the upper levels of the R-tree represent bounding boxes
for internal nodes. The middle level of the R-tree index structure consists of
internal nodes, presented as white boxes; in the figure, there are five internal
nodes, namely (R3, R4, R5, R6, R7). The top level of the R-tree index is the root
node, which has two entries: (R1, R2).
(a) 2D coordinates
(b) R-tree
Figure 4.2: R-tree and 2D coordinates [93]
The bounding rectangles of two or more nodes can overlap each other; for example,
the bounding rectangles R1 and R2 overlap. This implies that a given data object
could satisfy the bounding rectangle boundaries of more than one leaf node [93].
However, every data object is stored in exactly one leaf node, even though its
bounding rectangle may overlap the regions corresponding to two or more
higher-level nodes. Consider data objects R8 and R9: although they fall within
both regions R3 and R4, each of them is stored in only one of R3 or R4.
4.3 Local Index
This section discusses the Local Index (LI) mechanism for mobile query processing
to retrieve data items from multiple cells. Data indexing is an efficient way to
retrieve data items across multiple cells: the indexing mechanism improves
retrieval time by supplying the information needed for a client to retrieve remote
data items from the current cell. The R-tree indexing structure is used to store
the multi-dimensional data item indexes.
The LI mechanism is a tree indexing mechanism in which the index structure of one
cell contains indexes from other cells. In this mechanism, indexes from different
cells are not replicated wholesale; rather, only the requested data item indices
are stored in the local cell. In other words, the tree index structure is expanded
by adding the new remote data item indices to the local index structure. However,
there is a situation where the maximum number of nodes held has been reached. In
this case, a number of nodes holding remote indexes need to be deleted from the
tree in order to make room for the new remote indices: the number of eliminated
items plus the available space must equal the number of new remote indexes. The
insertion or deletion operation is similar to insertion or deletion of an index
within a single cell. For simplicity, the geometric locations of data items are
described in two-dimensional coordinates. In the LI mechanism, each cell has its
own R-tree indexing structure to index its local data items, and the R-tree
structure of each cell differs from the others.
Figure 4.3: Three index structures in three cells
A simple scenario is that the current server sends a query scope to two
neighbouring cells to find Automatic Teller Machines (ATMs). Figure 4.3 depicts
the initial index tree for each cell, where the ATM location is used as the index
partitioning attribute, the same attribute used for table partitioning. The
following range partitioning rules are assumed: cell 1 holds index locations 1 to
30, cell 2 holds index locations 31 to 60, and the rest go to cell 3. Each key in
a local index corresponds to a local record. Note that although the internal nodes
of these cells differ, they follow the same naming convention.
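The range partitioning rule of this running example can be written as a small helper (the cell names follow the figure; the function name is ours):

```python
def home_cell(index: int) -> str:
    """Range-partitioning rule from the running example: cell 1 holds
    index locations 1-30, cell 2 holds 31-60, and the rest go to cell 3."""
    if 1 <= index <= 30:
        return "cell 1"
    if 31 <= index <= 60:
        return "cell 2"
    return "cell 3"

print(home_cell(29), home_cell(45), home_cell(73))  # cell 1 cell 2 cell 3
```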
The indexing structure above is developed from the tables presented in Figure 4.4,
which consist of the index number, location and name of each object.
Figure 4.4: Tables for cell 1, cell 2 and cell 3 (from left to right)
Upon receiving query results, there are two ways for the current server to store
the remote data items in the LI mechanism: the server can store the remote indexes
only, or the indexes together with the original data items. The details of both
processes are discussed in Sections 4.3.1 and 4.3.2. For simplicity, the LI
mechanism with remote indexes only is called LI-1, whereas the LI mechanism with
remote indexes and data items is called LI-2.
Algorithm 4.1 shows the details of the Local Index algorithm, which can be
explained as follows:
(i) A mobile client sends a query scope that involves query result retrieval from
multiple cells. The server receives the query and probes the keys in the local
index structure to determine whether any keys are located within the query
range.
(ii) If the server can answer only part of the query scope, the remaining query
scope is sent to its neighbouring cells. Upon receiving the result, the server
receives either the index only, or both the index and data values of the
requested data items. The local server stores all incoming indices into
available nodes of the local index structure. If the server receives only the
indices of data items, pointers from those nodes to the data values in the
neighbouring cells are created to link the cached index and the original data
values. In the second situation, pointers are created from the nodes to local
storage.
(iii) If the server finds all indices within its index structure, the server
retrieves the data values by following the available pointers in those nodes.
(iv) The query result is sent to the mobile user once it is ready.
Algorithm 4.1: The Local Index algorithm
Input: QScope
begin
    Rtree ← store indexes of objects into an R-tree index
    Query scope ← QScope
    NoOfBSOnline ← number of online neighbour BSs
    CollectionOfOnlineBS ← list of online BSs
    Result ← search rtree(query scope)
    while (index < NoOfBSOnline) do
        Current Neighbour BS ← CollectionOfOnlineBS at position index
        if intersection(Query scope, Current Neighbour BS scope) exists then
            Result neigh ← Current Neighbour BS(Query scope)
            Rtree ← update Rtree(Result neigh)
            // Append retrieved results from the current BS
            // to the end of the collection of all retrieved results
            Result ← Result + Result neigh
        end
        index ← increment index by one
    end
end
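A minimal executable sketch of this flow, with toy stand-ins for the R-tree and the neighbour BSs (none of these classes come from the thesis), might look like:

```python
class SimpleIndex:
    """Toy stand-in for the local R-tree: a list of (rid, point) pairs."""
    def __init__(self, items):
        self.items = list(items)

    def search(self, scope):
        x0, y0, x1, y1 = scope
        return [(r, p) for r, p in self.items
                if x0 <= p[0] <= x1 and y0 <= p[1] <= y1]

    def update(self, remote_items):
        self.items += remote_items   # cache remote indices locally

class NeighbourBS:
    def __init__(self, area, index):
        self.area, self.index = area, index

    def area_intersects(self, s):
        a = self.area
        return not (a[2] < s[0] or s[2] < a[0] or a[3] < s[1] or s[3] < a[1])

    def process(self, scope):
        return self.index.search(scope)

def local_index_query(rtree, scope, online_bss):
    """Answer locally first, then forward the scope to each online
    neighbour BS whose area it intersects; cache the remote indices."""
    result = rtree.search(scope)
    for bs in online_bss:
        if bs.area_intersects(scope):
            remote = bs.process(scope)
            rtree.update(remote)
            result += remote
    return result

local = SimpleIndex([("5", (2, 2))])
nb = NeighbourBS((10, 0, 20, 10), SimpleIndex([("29", (12, 3))]))
print(local_index_query(local, (0, 0, 15, 5), [nb]))
# A repeated query over the same area is now answered from the local index:
print(local.search((0, 0, 15, 5)))
```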
4.3.1 Cache Remote Indexes Only
This section discusses the proposed LI mechanism in which the data and index
segments are located in different cells. The aim is to speed up query processing
by searching for commonly requested objects locally and by maintaining the cache.
Consider the indexing structures and data tables shown in Figures 4.3 and 4.4
respectively. Assume that the server of cell 2 sends a query scope to cell 1 and
receives an item with index 29 from cell 1. The server adds index 29 to its
structure and creates a pointer to the original data item in cell 1. Figure 4.5
shows the indexing structure after index 29 has been inserted into cell 2; note
the pointer from index 29 to data item 29 in cell 1.
Figure 4.5: Index structure after record insertion using Local Index-1
This mechanism has tree management similar to that of a single cell under R-tree
management [41]. The insertion and deletion procedures can be summarised as
follows. After a new index has been received by the appropriate cell, it is
appended to the existing index structure. Algorithm 4.2 shows the insertion
algorithm of LI-1, whose steps can be described as follows.
(i) If the maximum number of entries has been reached, remove an existing index
from the tree which did not originate from this cell. The removal procedure
will be discussed later.
(ii) Find the right leaf node into which to insert the new key.
(iii) Insert the new key if there is enough space in the leaf node.
(iv) Otherwise, this leaf node must be split into two leaf nodes, propagating the
split up to the root node if needed. This splitting can be done using one of
the existing splitting algorithms for the original R-tree.
(v) The last step is to create a data pointer from the entry in the leaf node
where the new index key is inserted to the data item in the remote cell.
Algorithm 4.2: The insertion algorithm of Local Index-1
Input: indexes
begin
    MAX CAPACITY ← maximum nodes to be stored in this Rtree
    Rtree ← indexes of objects in the R-tree
    rtree capacity ← current capacity of Rtree
    num nodes freed ← 0
    size of indexes ← get size(indexes)
    if MAX CAPACITY − rtree capacity < size of indexes then
        num nodes freed ← size of indexes − (MAX CAPACITY − rtree capacity)
        Rtree ← remove nodes(num nodes freed)
    end
    for each index in indexes do
        Inserted node ← insert index into the R-tree and return a reference to the inserted node
        create a pointer from Inserted node to the original data of index
    end
end
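As a sketch, the LI-1 insertion above could be expressed as follows (Python; the `LocalIndex1` class and its attribute names are hypothetical stand-ins for the thesis's R-tree structures, with a flat dictionary standing in for the tree):

```python
class LocalIndex1:
    """Sketch of LI-1: cache remote indexes only, recording for each key
    the cell that holds the original data item (the "data pointer")."""

    def __init__(self, max_capacity, home_cell):
        self.max_capacity = max_capacity
        self.home_cell = home_cell
        self.entries = {}  # key -> cell holding the original data item

    def evict(self, num_to_free):
        # Remove entries that did not originate from this cell, as in step (i).
        remote = [k for k, cell in self.entries.items() if cell != self.home_cell]
        for key in remote[:num_to_free]:
            del self.entries[key]

    def insert(self, indexes, source_cell):
        # Free space first if the new indexes would exceed the capacity.
        free = self.max_capacity - len(self.entries)
        if free < len(indexes):
            self.evict(len(indexes) - free)
        for key in indexes:
            # Record the pointer back to the cell owning the original item.
            self.entries[key] = source_cell
```

A real implementation would insert into an R-tree (with splits) rather than a dictionary; the sketch only shows the capacity check, remote-first eviction, and pointer creation.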
When the maximum number of entries has been reached, some entries need
to be removed from the cache. The process for evicting entries is shown in
Algorithm 4.3, which is described as follows:
(i) Select a key victim using one of the existing cache replacement policies.
(ii) Remove the data pointer from the desired key, then discard the desired key
from the cache.
(iii) When the desired key is removed from a leaf node, the node may underflow.
If this occurs, try to find a sibling node that needs the least enlargement and
redistribute the entries between the node and its sibling so that both are at
least half full; otherwise, merge the node into its sibling, decreasing the
number of nodes.
Algorithm 4.3: The deletion algorithm of Local Index-1
Input: num nodes freed
begin
    Rtree ← indexes of objects in the R-tree
    for i ← 0 to num nodes freed do
        index ← select victim()
        Deleted node ← find the node that matches the index
        remove pointer(Deleted node)
        Rtree ← remove node(Deleted node)
    end
end
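The deletion loop above depends on a cache replacement policy for `select victim`; the thesis leaves the policy open. A minimal sketch using LRU as one possible policy (the `LI1Cache` class and its names are illustrative, not from the thesis):

```python
from collections import OrderedDict

class LI1Cache:
    """Sketch of LI-1 deletion: evict cached remote keys, choosing victims
    with an LRU policy (one possible cache replacement policy)."""

    def __init__(self):
        # OrderedDict keeps recency order: oldest entry first.
        self.entries = OrderedDict()  # key -> remote cell of the data item

    def touch(self, key):
        # Mark a key as most recently used.
        self.entries.move_to_end(key)

    def select_victim(self):
        # Least recently used key is the first in the order.
        return next(iter(self.entries))

    def delete(self, num_nodes_freed):
        for _ in range(num_nodes_freed):
            victim = self.select_victim()
            # Dropping the entry discards both the key and its data pointer.
            del self.entries[victim]
```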
4.3.2 Cache Remote Indexes and Data Items
Caching remote indexes alone still requires every data item to be transferred from
the remote cell. This can lead to a bandwidth bottleneck, even though network
bandwidth is much greater nowadays. One solution is to cache the commonly requested
remote indexes together with their data items in the desired cell. This issue
is the focus of this section.
The LI process is similar to that described in the previous section, except this
time the actual data item is cached. To simplify our discussion, we use the same
illustration as in the previous section. In the previous section, the data item is not
copied into cell 2 and a data pointer is created and points to the data item located
in cell 1. Figure 4.6 illustrates the caching of an index together with its record. In
the figure, the data item from cell 1 is copied to cell 2, and the data pointer points
to the local data table instead of the remote data table.
Figure 4.6: Index structure after the records insertion using local index-2
The procedures for insertion and deletion can be summarised as follows.
Algorithm 4.4 presents the details of the LI-2 insertion process. It is similar to
that of LI-1, differing only in how the pointer to the data item is created, which
affects the last two steps of the insertion process. These steps can be described as follows.
(i) Store the remote data items in the requester cell.

(ii) Create data pointers from the entries on the leaf node, where the new index
keys are inserted, to the data items at the requester cell.
Algorithm 4.4: The insertion algorithm of Local Index-2
Input: indexes, data items
begin
    MAX CAPACITY ← maximum nodes to be stored in this Rtree
    Rtree ← indexes of objects in the R-tree
    rtree capacity ← current capacity of Rtree
    num nodes freed ← 0
    size of indexes ← get size(indexes)
    if MAX CAPACITY − rtree capacity < size of indexes then
        num nodes freed ← size of indexes − (MAX CAPACITY − rtree capacity)
        Rtree ← remove nodes(num nodes freed)
    end
    for each index in indexes do
        data storage[index] ← store data(data item[index])
        Inserted node ← insert index into the R-tree and return a reference to the inserted node
        create a pointer from Inserted node to data storage[index]
    end
end
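The LI-2 insertion differs from LI-1 only in replicating the data item locally and pointing at the local copy. A sketch (Python; the `LocalIndex2` class and names are hypothetical, with a dictionary standing in for the R-tree):

```python
class LocalIndex2:
    """Sketch of LI-2: cache remote indexes together with local copies
    of their data items, so queries need not revisit the remote cell."""

    def __init__(self, max_capacity):
        self.max_capacity = max_capacity
        self.index = {}         # key -> local storage slot (the "data pointer")
        self.data_storage = {}  # local replicas of remote data items

    def insert(self, indexes, data_items):
        # Free space first if the new indexes would exceed the capacity.
        free = self.max_capacity - len(self.index)
        need = len(indexes) - free
        for victim in list(self.index)[:max(0, need)]:
            # LI-2 deletion also discards the replicated data item.
            del self.data_storage[self.index.pop(victim)]
        for key, item in zip(indexes, data_items):
            self.data_storage[key] = item   # replicate the remote data item
            self.index[key] = key           # pointer now targets the local copy
```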
Algorithm 4.5 shows the deletion algorithm of LI-2. The deletion process of
LI-2 is similar to that of LI-1, except that it includes an additional step to
remove the replicated data items. It can be explained as follows:
(i) Find a node to be deleted.
(ii) Remove the data item that the data pointer points to.

(iii) Remove the data pointer.

(iv) Remove the node from the R-tree and adjust the R-tree if necessary.
Algorithm 4.5: The deletion algorithm of Local Index-2
Input: num nodes freed
begin
    Rtree ← indexes of objects in the R-tree
    for i ← 0 to num nodes freed do
        index ← select victim()
        Deleted node ← find the node that matches the index
        remove dataItem(Deleted node)
        remove pointer(Deleted node)
        Rtree ← remove node(Deleted node)
    end
end
4.4 Global Index
While a server requests data items from neighbouring cells on behalf of mobile
clients, it performs several activities before sending the query result back to the
client. These activities involve waiting for the query result to be received, and
caching the new data items. The caching processes include inserting the indexes
into the local tree index structure and adjusting its nodes after the insertion. These
processes slow down query processing; this limitation can be addressed by the
Global Indexing (GI) mechanism.
Unlike the LI mechanism, in GI the index structure is built while the server in
each cell starts up. The GI mechanism also has some degree of replication, and
all indexes are maintained globally. In other words, each cell holds a different part
of a global index, while the overall global index structure is preserved.
In this mechanism, the ownership of each index node needs to be maintained in
order to preserve the global indexing structure. The ownership rule of each index
node is that the cell owning a leaf node also owns all nodes from the root to that
leaf. Consequently, the root node is replicated to all cells and the internal nodes
(except root node) may be replicated to some cells. Furthermore, if a leaf node has
some keys belonging to different cells, this leaf node is replicated to each cell owning
one of those keys.
As a running example, let us consider three different cells and 30 point data items,
with each cell holding 10 indices. Cell 1 holds data item indices 1-10, cell 2 holds
indices 11-20, and the remaining indices go to cell 3.
Figure 4.7: Global Index for all cells using GI mechanism.
Figure 4.7 shows a GI (global index) structure partitioned by cell
boundaries. The root node is replicated to all three cells and some nodes are replicated
to neighbour cells. The leaf node with keys P10-P12 (the fifth leaf node) is copied to cells 1 and 2,
because this node holds entries that belong to two cells. The key P10 belongs to cell
1 and keys P11 and P12 belong to cell 2. Due to some leaf nodes being replicated,
some of the internal nodes are replicated whereas others are not. For example, the
non-leaf R2 is replicated to cells 1 and 2, whereas the non-leaf R5 is not replicated.
Each leaf node has a data pointer which points to a data item located in either the
same cell or a different cell.
Similar to the LI mechanism, replication can occur at the index level only, or at
both the index and data item levels. The GI mechanism with indexes only is called GI-1. On
the other hand, the GI mechanism with indexes and data items is called GI-2.
4.4.1 Remote Data Items Located at Different Cell
Our discussion here elaborates on the GI mechanism when the data items located
at the remote cell are not replicated to other neighbour cells. The global index
structure maintenance and query processing are the two main topics of discussion.
Algorithm 4.6 maintains the global index structure when the data items are not
replicated. The algorithm covers the insertion and deletion of entries in neighbour
cells. However, the details of the R-tree splitting procedures
are not discussed here; these can be found in [41].
The algorithm is used to match a node with a given key and to perform the
insertion or deletion process. Details of the algorithm are as follows. The key here
is a Minimum Bounding Box (MBB) value. The algorithm recursively probes
tree nodes from the root node down to a leaf node. The key insertion or deletion
is performed once the node has been found. Then, a data pointer is established or removed
between the entry and the actual data item, depending on the operation. If the key
is not found in the current cell, a child tree (cellTree) is passed to a neighbouring cell to
probe its tree. When a node overflows or underflows after a key has been inserted or
deleted, the existing splitting or merging algorithm for a single-cell R-tree is applied.
The starting data pointer is adjusted if the entry is moved to a new node.
In this mechanism, the data structure for the GI mechanism can be explained
as follows:
If a child node exists locally, the node pointer points to this local copy only,
even though the child node is also replicated to other cells. For example, from
MBB R1 at cell 1, there is only a node pointer to the local MBB R2. The MBB R2
at cell 2 will not accept an incoming node pointer from the MBB R1 at cell 1; it
accepts one node pointer from the local MBB R1 only.

Algorithm 4.6: Node maintenance of GI-1 algorithm
Input: Tree, Key, Operation
begin
    Node ← the root node of Tree
    if Key ∉ Node then
        cellTree ← the child tree in a neighbour cell
        Node Maintenance(cellTree, Key, Operation)
    else if Node is a leaf node then
        Execute the insert/delete operation on the local node
        Create/remove a data pointer from the entry to the actual data item
        if Node overflows or underflows then
            Execute split/merge on the leaf node
            Adjust all starting points of data pointers in the leaf node
        end
    end
end
If a child node does not exist locally, a single node pointer is created to the closest
copy of the child node (in case multiple copies exist elsewhere). For example,
from the MBB R1 at cell 1, there is only one outgoing right node pointer
to the child node (R3,R4) at cell 2. In this case, an assumption is made that cell 2,
rather than cell 3, is the closest neighbour of cell 1. Hence, the MBBs R3 and R4, which also exist at
cell 3, will not accept a node pointer from root node R1 at cell 1.
Using the single node pointer model discussed above, it is always possible to
trace a node from any parent node in a different cell. Figure 4.8 shows the single
node pointer model for the GI mechanism, presenting only the top three levels
of the index tree exhibited previously in Figure 4.7. From the figure, it is possible
to trace to node (R10,R11) from the root node R1 at cell 1, although there is no
direct connection from root node R1 to its direct child node (R3,R4) at cell 3. This
tracing to node (R10,R11) can also be done through node (R3,R4) at cell 2.
Figure 4.8: GI mechanism uses single node pointers.
A single node pointer model can be more formally described as follows.
1. Given that a parent node is duplicated when its child nodes are spread over
multiple places, there is always a direct connection from any copy of this
parent node to each of its child nodes.

2. Applying the same reasoning to a replicated grandparent node, there is always
a direct connection from any copy of this grandparent node to each of the
parent nodes.
Considering both the above statements, we can conclude that there is always a
direct connection from whichever copy of the grandparent node to any of its child
nodes.
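The reachability property above can be illustrated with a small sketch (Python; the cell and node layout below is a hypothetical stand-in for Figure 4.8, not the thesis's exact structure):

```python
# Sketch: each replicated node keeps a single pointer per child, yet every
# descendant remains reachable from any copy of an ancestor node.
# Nodes are named (cell, label); edges follow the single-node-pointer rule.
edges = {
    ("c1", "R1"): [("c1", "R2"), ("c2", "R3R4")],  # (R3,R4) only via closest cell
    ("c2", "R1"): [("c2", "R3R4")],
    ("c2", "R3R4"): [("c3", "R10R11")],
    ("c3", "R3R4"): [("c3", "R10R11")],
}

def reachable(start, target):
    """Depth-first search over the single node pointers."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, []))
    return False
```

For instance, (R10,R11) in cell 3 is reachable from the root copy in cell 1 even though no direct pointer crosses from cell 1 to cell 3.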
Apart from the issues of node pointers at internal nodes, those at leaf nodes are
also worth discussing. As some leaf nodes are replicated, it is important to manage
data pointers at leaf nodes. Figure 4.9 shows a data structure where the data items
are not replicated anywhere. Not all data pointers are shown, in
order to improve the readability of the figure. As shown in the figure, the leaf node that
contains indexes P10-P12 is replicated at cells 1 and 2. Under the single node
pointer mechanism, each of these data items accepts two data pointers. For example, the
record for entry P10 accepts two incoming data pointers, from cells 1 and 2. Similarly,
the records for entries P11 and P12 each receive two incoming data pointers from cells 1 and 2.
This mechanism has a similar concept to LI-1; that is, the leaf node is replicated
and its pointed record is not replicated. The main difference between GI-1 and LI-1
is the fact that GI schemes have one global index, whereas the LI schemes use a
local index.
Figure 4.9: Global Index without replicated remote data items.
4.4.2 Remote Indexes and Data Items Located at Same Cell
In this mechanism, the data items are replicated to any cell to which the entries at
the leaf node level are replicated. In other words, GI-2 follows the same idea as GI-1
in terms of non-leaf node replication. The two approaches differ in the way they
establish data pointers at the leaf node level: GI-2 has an extra step to replicate
the remote data items.
In this mechanism, the data structure for the internal nodes is similar to that of GI-1,
except that a data pointer at a leaf node points to a record located in the same cell.
This data pointer can be explained as follows:
If a leaf node exists locally, a data item is not replicated and it is linked with the
entry in the leaf node by a data pointer. Figure 4.10 illustrates the GI mechanism
with replicated data items. Not all data items and data pointers are
shown, in order to keep the figure clear. For example, from the entries P4 and P5
at cell 1, there is only one data pointer to each data item, and those data items are not
replicated to cells 2 and 3.
Figure 4.10: GI mechanism where data items are replicated.
If a leaf node is replicated, the data items which belong to entries in the leaf
node are replicated to cells where the leaf node is duplicated. Once the data items
have been replicated, data pointers are created to link entries in the leaf node to
the appropriate data items in each cell. For example, leaf node (P10,P11,P12)
is replicated at cells 1 and 2, where the data items of those replicated entries are
duplicated. In particular, the record for entry P10 is replicated from cell 1 to cell 2.
The data pointer for the entry P10 at cell 1 points to record P10 at cell 1 (dotted
line), whereas the data pointer for the same entry at cell 2 points to record P10 at
cell 2 (solid line). Similarly, the record for entry P11 is replicated from cell 2 to cell 1. A
cell 2 (line). Similarly, the record for entry P11 is replicated from cell 2 to cell 1. A
data pointer is established between entry and record for P11 at cell 1 and another
data pointer is created to link the entry and the record for P11 at cell 2.
Algorithm 4.7 maintains index nodes in the global index when the remote
data items are replicated. The GI model with replicated data items can be described
more formally as follows:
1. If a leaf node is replicated to another cell, there is always a copy of data items
for each entry in the leaf node. In addition, a direct connection from an entry
to a data item in the same cell always exists.
2. When a leaf node is not replicated to another cell, there are always original
data items for each entry in the leaf node. A single direct connection exists
from an entry of the leaf node to its data item.
3. The number of direct connections between leaf node and data items is always
equal to the number of entries in each node.
4.5 Case Study
This section presents two case studies, one for each proposed indexing approach.
To simplify our explanation, we reuse the indexing structure in Figure 4.3 on
page 130. A Local Index case study is presented first, followed by a Global Index
case study.
Let us suppose that the proposed Local Index mechanism is applied for the first
case study. Assume that a mobile user requests data items from cell 2 and the query
Algorithm 4.7: Node Maintenance of GI-2 algorithm
Input: Tree, Key, Operation
begin
    Node ← the root node of Tree
    if Key ∉ Node then
        cellTree ← the child tree in a neighbour cell
        Node Maintenance(cellTree, Key, Operation)
    else if Node is a leaf node then
        Execute the insert/delete operation on the local node
        Replicate the remote data item
        Create/remove a data pointer from the entry to the actual data item
        if Node overflows or underflows then
            Execute split/merge on the leaf node
            Adjust all starting points of data pointers in the leaf node
        end
    end
end
begin
    sending loc ← sender location
    receiving loc ← recipient location
    next loc ← next predicted location from the receiving location
    Groups ← list of groups
    numOfRequiredSlots ← number of required slots to be freed
    while numOfRequiredSlots ≥ numOfSlotsFreed do
        for each group in Groups do
            if (sending loc or receiving loc or next loc) ∈ group then
                continue
            end
            Dist to next ← distance from the centroid of the group to the centroid of next loc
            Dist to recipient ← distance from the centroid of the group to the recipient position
            Min dist group ← Min(Dist to next, Dist to recipient)
            List Min dist ← add Min dist group to the list
        end
        Max dist group ← Max(List Min dist)
        Groups ← remove the group with Max dist group
        numOfAvailableSlots ← numOfObjects + numOfAvailableSlots
    end
end
In the figure, the distances from the centroids of groups G0, G2, G6 and G7 to the
current and next positions are calculated. Then, the minimum value for each group
is collected; for example, MinG0(8,11) = 8 and MinG2(10,15) = 10. After that, the
maximum of those minimum values is selected, and the group with this maximum
value is eliminated from the cache.
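This max-of-min selection can be sketched as follows (Python; the group centroids and locations below are hypothetical values, not those of the figure):

```python
import math

def centroid_distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def select_victim_group(groups, receiving_loc, next_loc):
    """For each group take the minimum of its distance to the next location
    and to the recipient location, then evict the group whose minimum is
    the largest (the group least likely to be needed soon)."""
    def min_dist(centroid):
        return min(centroid_distance(centroid, next_loc),
                   centroid_distance(centroid, receiving_loc))
    return max(groups, key=lambda g: min_dist(groups[g]))

# Hypothetical centroids for four candidate groups.
groups = {"G0": (0, 8), "G2": (10, 0), "G6": (2, 2), "G7": (6, 6)}
victim = select_victim_group(groups, receiving_loc=(5, 5), next_loc=(6, 7))
```

Here G2 has the largest nearest-distance to both locations, so it is the eviction victim.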
On the other hand, a future query scope may overlap with more than one group.
To handle this situation, a mechanism similar to the one in the previous example
is applied. The distance from a group to each overlapped group and to the current
position is measured. The minimum of these distances for the group is
CHAPTER 5. CLIENT CACHING FOR A MOBILE ENVIRONMENT 171
Figure 5.5: Complex illustration of our elimination approach
selected. Using the same mechanism, the distance calculation is also applied to all
groups in the cache, and the minimum distance for each group is chosen. Then, a
maximum value from those minimum distances is taken. The cached objects within
the group that has the maximum distance value are eliminated from the cache.
Figure 5.6: A query scope overlaps with multiple groups
Figure 5.6 illustrates a query scope overlapping with two groups (G1a and G1b).
The figure shows that G0 and G2 are the two candidate groups for elimination. To
choose between them, the distances from the centre point of G0 to the centres of
the overlapped groups (G1a and G1b) are measured. After that, the distance from
G0 to the receiving location is computed. Then, the minimum of those distances is
taken, which is 8.5. A similar procedure is applied to find the minimum distance
value for G2, which is 9.5. In the final stage, the maximum of the two minimum
distance values is chosen, which is 9.5. Hence, G2 is eliminated from the cache.
5.3.2 Density Based Elimination Algorithm
This section presents the density-based cache replacement policy. Density is
the ratio between the number of items in a group and the area of the group. If
a group has fewer objects, its density is small, implying that the group does not
hold many items of interest. Therefore, the group which contains fewer items has
a higher priority to be removed.
Algorithm 5.2: The density-based elimination algorithm
begin
    next location ← predicted next location after the user receives the query result
    groups ← currently available groups
    numOfRequiredSlots ← number of required slots
    while numOfRequiredSlots ≥ numOfSlotsFreed do
        for each group in groups do
            group ← find the group that has the smallest collection and the least accessed times
            isReqNext ← check the possibility of the group being requested next
            if isReqNext ≠ true then
                numOfSlotsFreed ← numOfSlotsFreed + numItemsInTheGroup
                group ← remove all cached items in the selected group
            end
        end
    end
end
Algorithm 5.2 shows the elimination of cached objects based on density. First, it
predicts the next location of the user and then starts the elimination process. During
eviction, a group with a smaller collection has a higher priority to be removed;
in other words, the group with the smallest density value is eliminated first. After
a group has been eliminated, the number of available slots is recalculated. If the
number of slots is still insufficient, the elimination process evicts more groups until
the required number of slots is available. This algorithm does not prioritise user
movement patterns.
Figure 5.7: Illustration of density elimination
Figure 5.7 presents an illustration of density elimination, where N denotes the
number of objects in a group. Every group covers an area of the same size. Consider
that the user moves from the current location (G5) to the retrieving location (shaded
area), where the user receives the query result. While the query result is being
retrieved, the cache is full and cached objects need to be evicted. The location after
the recipient location is predicted to determine whether a group will be requested
in the next time interval. In our scenario, the least dense group is G1; however, this
group is not eliminated because it is predicted to be requested. Hence, the group of
cached objects in G0 is evicted from the cache. The elimination continues until
there is enough space to store the incoming objects.
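A minimal sketch of this policy (Python; the group contents and the next-location prediction below are hypothetical stand-ins, since the thesis leaves the prediction mechanism to the movement model):

```python
def density_evict(groups, predicted_next, required_slots):
    """Evict whole groups in increasing density order (fewest items first,
    since all groups cover equal areas), skipping any group predicted to be
    requested next, until enough slots are freed."""
    freed = 0
    evicted = []
    # Sort candidate groups by item count (density is proportional to count
    # when all group areas are equal).
    for name in sorted(groups, key=lambda g: len(groups[g])):
        if freed >= required_slots:
            break
        if name == predicted_next:
            continue  # protect the group expected to be requested next
        freed += len(groups[name])
        evicted.append(name)
        del groups[name]
    return evicted

# Hypothetical cache: G1 is least dense but predicted to be requested next.
cache = {"G0": ["a", "b", "c"], "G1": ["d"], "G2": ["e", "f", "g", "h"]}
removed = density_evict(cache, predicted_next="G1", required_slots=3)
```

As in the Figure 5.7 scenario, the least dense group survives because it is predicted to be requested, and the next least dense group is evicted instead.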
5.3.3 PDAID Elimination Algorithm
This section presents our cost-based replacement policy, called Probability
Density Area Inverse Distance (PDAID). The proposed approach eliminates a
group of cached objects based on a cost value. The cost value is calculated during
cache retrieval and is based on several factors. This section is therefore divided
into two subsections: the modification of the cached objects retrieval algorithm,
and the cache replacement algorithm. The first subsection discusses how the cost
value is calculated and updated. The second presents the proposed cache replacement
algorithm, which uses the calculated cost to remove cached objects from the cache.
Modification of Cached Objects Retrieval Algorithm
This section discusses the proposed cached objects retrieval approach, which modifies
the general cache retrieval algorithm and is used with the proposed PDAID
approach. The main discussion focuses on our cost formula and the modified
cached objects retrieval algorithm.
As mentioned in Section 5.2.7, the density of the valid scope area is taken into
account as an additional factor in our proposed approach. Hence, the PAID formula
is modified to include a density factor by replacing the value A with Da, where Da
is the density value of an area. The value of Da is calculated as follows:

Da = N / A

where Da is the density value of an area,
N is the number of objects in the area, and
A is the valid scope area.
Therefore, the modified formulas leading to the cost value C are as follows. The
access probability of a group is:

Pg = P * Da

where Pg is the access probability of a group,
P is the access probability of an item, and
Da is the density of the area.

To simplify the access probability formula, we assume that α is constant. Thus, the
access probability formula becomes:

P = 1 / (tc − ti)

where tc is the current access time, and
ti is the last access time.

Hence, the cost value of the data becomes:

C = Da / ((tc − ti) * D) = Pg / D

where Pg is the access probability of a group,
D is the data distance, and
C is the elimination cost.
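The cost computation can be expressed directly (Python; the function names are illustrative, and the worked numbers below follow the Figure 5.8 example discussed later, where Da = 4 and P = 0.5 give Pg = 2):

```python
def density(num_objects, area):
    """Da = N / A: density of the valid scope area."""
    return num_objects / area

def group_access_probability(da, t_current, t_last):
    """Pg = P * Da, with P = 1 / (tc - ti)."""
    return da / (t_current - t_last)

def elimination_cost(pg, data_distance):
    """C = Pg / D: elimination cost of a group."""
    return pg / data_distance
```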
Using the modified PAID formula, the cached objects retrieval and elimination
processes are slightly changed. The aim is to include a weight factor in the formula.
The value of Pg is computed during the retrieval process, because computing it
during the elimination process would increase the cost of that process. The value
of C is calculated during the cache elimination process, because the user moves
dynamically and the distance depends on the current location of the user.
A group of cached objects is accessed when cached objects are retrieved or when
new objects are stored in the cache. When new objects are stored, they are grouped
and all of the factors mentioned above are kept. In grouping the cached objects,
they either form new groups or merge into existing groups. When new groups are
formed, their Pg values are initialised to zero. If new objects are merged into
existing groups, the existing groups are split into new groups; the Pg values of
these groups are not reset to zero, because they contain existing cached objects.
The Pg value of a group is recalculated when information about the cached objects
in the group is retrieved; the access time and the Pg value are then updated.
Algorithm 5.3: Cache retrieval for PDAID algorithm
begin
    Groups ← find any groups that intersect with the query scope at the next position
    tc ← current time
    ti ← 0
    for each group in Groups do
        groupResult ← find the cached objects in the group that match the query scope
        if groupResult ≠ empty then
Algorithm 5.3 shows the cache retrieval algorithm, which considers multiple
factors. When the query scope intersects with a group of cached objects, the value of
Pg is updated. The value of Pg keeps track of the cost of a group that holds
the requested query result. After the value of Pg has been calculated, the query result
is added to the parameter results. The algorithm then continues finding the
next intersecting group, calculating its Pg value and adding the result found to the
parameter results. Once the cached result has been assembled, it is sent to
the user.
Figure 5.8: Illustration of PDAID retrieval
Figure 5.8 shows an illustration of PDAID retrieval. Assume that the area of every
group is the same. G0 and G1 are two groups that were stored at times t0 and t1
respectively. At time t2, a number of new objects are stored and all cached objects
are regrouped. As a result, G0 is split into two groups: G0 and G2.

The access probability of a group can be explained as follows. The Pg values of
G0 and G1 at times t0 and t1 are initialised to zero. When G2 is formed, its Pg value
is not zero if it contains any existing cached objects. The calculation of Pg for G0 is
as follows: the density value (Da) is 4, the value of P is 0.5, and so the value
of Pg is 4 * 0.5 = 2. The Pg calculation for G2 is done in the same way as for G0;
its value is 2.5.
PDAID Replacement Algorithm
This subsection presents the PDAID replacement algorithm itself. Earlier, we
showed how to calculate the access probability cost for each group when a group
of cached objects is accessed. To simplify our proposed approach, we assume that
the groups of all cached objects have been formed and that each group has had its
access probability value calculated.
The formula for calculating the value of C is as follows:

C = Pg / D

where Pg is the access probability of a group,
D is the data distance, and
C is the elimination cost.
When the cache does not have enough space, a group is eliminated. This is
done by calculating the value of C for all groups and removing the group with the
smallest value of C, where C is the elimination cost. The value of C is calculated
by dividing the value of Pg by a distance. The distance is measured between the
central point of a group and the next predicted location of the user. The next
predicted location is the location after the receiving location, determined from the
travel history of the mobile user. A group with a larger distance has a smaller
chance of being accessed again. Therefore, the group with the smallest value of C
is eliminated.
Algorithm 5.4 shows the eviction of cached objects based on multiple criteria. At
the start of the algorithm, all groups in the cache are assigned to the parameter Groups
and the number of required slots is assigned to the parameter numOfRequiredSlots. Once
the parameter assignments are complete, the algorithm starts eliminating
groups: for each group it retrieves the Pg value, measures the distance and calculates the value
of C. The group eviction process is similar to that of the algorithms in the previous
sections. The algorithm selects the group with the smallest value of C as the eviction
victim. Once the evicted group has been removed from the cache, the parameter
numOfSlotsFreed is increased by the number of items in the evicted group and the
Algorithm 5.4: Cached objects elimination of PDAID algorithm.
begin
    groups ← all groups in the cache
    userLocation ← current location of the user
    numOfRequiredSlots ← number of required slots
    while numOfRequiredSlots ≥ numOfSlotsFreed do
        min value ← maximum value
        for each group in groups do
            D ← calculate distance(group, userLocation)
            Pg ← retrieve the Pg value of the group
            C ← Pg / D
            current value ← C
            if current value < min value then
                min value ← current value
                min group ← group
            end
        end
        if min group ≠ empty then
            groups ← remove(min group)
            numOfSlotsFreed ← numOfSlotsFreed + numItemsInTheGroup
        end
    end
end
parameter min value is reset to the maximum value. Then, the elimination process
continues until the number of required slots has been made available.
The illustration in Figure 5.8 is reused to describe the PDAID replacement policy.
Recall that the values of Pg for G0, G1 and G2 are 2, 0 and 2.5 respectively. The user
at the current position (shaded circle) stores new incoming objects; however, the cache
does not have enough space to accommodate them, so existing cached objects are
evicted. The eviction proceeds by choosing the smallest value of C amongst all
cached groups, where C is obtained by dividing the value of Pg by the distance.
The resulting values of C for the three groups are 0.2, 0 and 0.208 respectively.
Hence, the eviction order is G1, G0 and G2.
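The worked example can be checked numerically (Python; the distances 10 and 12 are assumed values chosen so that C = Pg / D matches the figures quoted in the text, since the figure's exact distances are not given):

```python
def eviction_order(groups):
    """Sort groups by elimination cost C = Pg / D, smallest C evicted first."""
    return sorted(groups, key=lambda g: groups[g][0] / groups[g][1])

# Per group: (Pg, assumed distance to the next predicted location).
groups = {"G0": (2.0, 10.0), "G1": (0.0, 10.0), "G2": (2.5, 12.0)}
order = eviction_order(groups)  # C values: 0.2, 0.0, roughly 0.208
```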
5.4 Case Studies
This section presents several case studies to illustrate our proposed approach. The
initial situation is given first, followed by illustrations and explanations for every
proposed approach.
Figure 5.9: Initial situation after cached objects have been grouped
Figure 5.9 shows the initial client cache status after the user has sent some
queries. The figure shows that some groups of objects have been formed and also
shows the current position of the mobile user. In the current situation, the user
would like to store the incoming query results to the cache; however, the cache
cannot store all incoming objects. Therefore, some cached objects are going to be
evicted to free more cache slots.
To explain our cache replacement policies with the example above, we present our proposed approaches in three parts, describing the density-based, path-based and PDAID (cost-based) replacement policies respectively. The discussion of the three proposed approaches is as follows:
Case Study 5.4.1. The density-based policy
Assume that areas of all groups are the same size, which is a 2-unit area. Two
case studies are given below:
Case 1: The number of cached objects in each group is as follows: G0: 10, G1: 5, G2: 6, G3: 8, G4: 10, G5: 8, G6: 10, G7: 6. This case is illustrated in Figure 5.10.
Figure 5.10: Density based approach (Case Study 5.4.1-1)
In this case, G1 is the group that has the fewest cached objects. However, this group is not going to be removed, since it is predicted to be the next requested query. Hence, either G2 or G7 will be removed in this case; the group that was accessed least recently has the higher chance of being removed.
Case 2: The number of cached objects for each group is the same as in Case 1, except for the following: G1: 10, G3: 4, G4: 5, G7: 8. Figure 5.11 shows an
illustration of this case study.
Figure 5.11: Density-based approach (Case Study 5.4.1-2)
The group with the fewest cached objects is G3. However, this group is not going to be removed, since it has been accessed recently. The next candidate victim is G4; like G3, it has been recently accessed and will not be removed. Group G2 is then the group with the smallest collection and the oldest access time. Therefore, this group is going to be removed.
For both cases, the new objects are inserted into the cache after the eviction, followed by the creation of new groups or adjustment of the existing groups. Adjustment is performed only on the groups that receive newly inserted objects. The outcome is that existing groups gain new member objects and/or new groups are created.
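The density-based selection described in the two cases above can be sketched as below. This is a hedged illustration: the Group fields are hypothetical, and the exemption of the predicted and recently accessed groups is modelled with a simple flag.

```java
import java.util.List;

/** Sketch of density-based victim selection: among groups that are neither
 *  predicted next nor recently accessed, evict the one with the fewest
 *  objects per unit area; ties go to the least recently accessed group. */
public class DensityVictim {

    static class Group {
        final String name; final int count; final double area;
        final long lastAccess; final boolean exempt;  // predicted or recent
        Group(String name, int count, double area, long lastAccess, boolean exempt) {
            this.name = name; this.count = count; this.area = area;
            this.lastAccess = lastAccess; this.exempt = exempt;
        }
        double density() { return count / area; }
    }

    static String selectVictim(List<Group> groups) {
        Group victim = null;
        for (Group g : groups) {
            if (g.exempt) continue;            // keep predicted/recent groups
            if (victim == null
                    || g.density() < victim.density()
                    || (g.density() == victim.density()
                        && g.lastAccess < victim.lastAccess))
                victim = g;
        }
        return victim == null ? null : victim.name;
    }
}
```

On the numbers of Case 2 (with G3 and G4 exempt), the sketch selects G2, matching the discussion above.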
Case Study 5.4.2. The path-based policy
In this part, we discuss the Path-based approach through two cases. In the first case, the query scope overlaps only one group, whereas in the second it overlaps multiple groups. The illustrations of the two cases are slightly different.
Figure 5.12: Path-based approach (Case Study 5.4.2-1)
Figure 5.12 shows the first case of Path-based elimination, where the query scope covers only a single group. Victim selection is done by calculating two different distances for each candidate group: the first is measured between the centroids of two groups, and the other between the centroid of the group and the user's receiving location. For example, the distance between G7 and G0 is 8, while the distance between G7 and the user is 15. Once the two distances have been calculated, the smaller value is selected, which is 8. A similar procedure is applied for G1, G2 and G6, and the smallest distance value for each group is selected; those values are 5, 18 and 10 respectively. Distances for groups G3, G4 and G5 are not computed, since these three groups are in use. Once the smallest distance values have been selected, the maximum among them is targeted as the victim. Hence, G1 is the victim and is eliminated from the cache.
Figure 5.13: Path-based approach (Case Study 5.4.2-2)
Figure 5.13 shows a situation similar to the first case, except that the next predicted query scope covers two cached groups. The elimination process is the same as in the first case. In brief, the distances for G7 are 8, 15 and 17 (denoted by H, G and I respectively), the distances for G2 are 18, 10 and 26 (denoted by C, D and K respectively), and the distances for G6 are 12, 25 and 27 (denoted by F, E and J respectively). The minimum distance values are 8, 10 and 12 for G7, G2 and G6 respectively. Then, the maximum of those minimum values is selected, which is 12. Hence, G6 is eliminated from the cache.
After one or more groups have been eliminated, the remaining processes are the
same as for case study 5.4.1, which inserts new objects and regroups the cached
objects. The regrouping process adjusts the existing groups and/or creates new
groups.
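The min-max victim selection used in both cases can be sketched as follows; the map of pre-measured distances (centroid-to-centroid and centroid-to-user) is an assumption made for illustration.

```java
import java.util.List;
import java.util.Map;

/** Sketch of the MinMax victim selection used by the path-based policy:
 *  for every candidate group keep the smallest of its measured distances,
 *  then evict the candidate whose smallest distance is largest. */
public class MinMaxVictim {

    /** candidates maps a group name to its list of measured distances
     *  (centroid-to-centroid and centroid-to-user). Returns the victim. */
    static String selectVictim(Map<String, List<Double>> candidates) {
        String victim = null;
        double maxOfMins = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, List<Double>> e : candidates.entrySet()) {
            double min = Double.POSITIVE_INFINITY;
            for (double d : e.getValue())      // smallest distance per group
                min = Math.min(min, d);
            if (min > maxOfMins) {             // largest of the minima
                maxOfMins = min;
                victim = e.getKey();
            }
        }
        return victim;
    }
}
```

With the Case 2 distances (minima 8, 10 and 12), the sketch returns G6.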
Case Study 5.4.3. The PDAID (cost-based) policy
This case study shows how to eliminate a group of objects based on the cost of
a group. The cost value is calculated based on PDAID mentioned in Section 5.3.3.
Figure 5.14 shows an illustration of this case study. For simplicity, assume the cache has been filled with objects, the groups have been formed, and none of the cached objects has been accessed again. The client then receives new objects at the current position while the cache is full. The cached objects located in the next predicted location are not evicted, because these objects will be requested next. In this situation, the access probability of every group is zero, since none has been accessed; the probability changes only once the user accesses a group such as G3 or G4. The eviction is therefore based on insertion order: G0 is eliminated first if it was the first group inserted into the cache.
After one or more groups have been evicted, the rest of the caching process is the same as in Case Study 5.4.1: new objects are inserted, and existing groups are adjusted or new groups are created.
Figure 5.14: PDAID-based approach (Case Study 5.4.3)
5.5 Discussion
This section discusses our proposed approaches: first the elimination approach based on distance, then the approach based on density, and finally the elimination based on multiple factors.
Our first elimination approach is similar to that of the FAR algorithm, except that we eliminate a group of objects rather than individual objects. A group is eliminated if it has the maximum distance to the next predicted location and the minimum distance to the current location. The distance between two groups is the distance between their centroids. Each group may have a different shape, and each shape would require a different formula to locate its centre; therefore, we use the K-means algorithm [34] to find the centroid of each group. In addition, when a query scope overlaps more than one group, the minimum value over the overlapped groups is used.
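Finding a group's centroid and the centroid-to-centroid distance can be sketched as below; for illustration, a single mean computation over the group's points stands in for the full K-means step, and groups are assumed to be simple point sets.

```java
import java.awt.geom.Point2D;
import java.util.List;

/** Sketch of finding a group's centroid as the mean of its member
 *  coordinates; the distance between two groups is then the ordinary
 *  Euclidean distance between their centroids. */
public class GroupCentroid {

    static Point2D.Double centroid(List<Point2D.Double> members) {
        double sx = 0, sy = 0;
        for (Point2D.Double p : members) { sx += p.x; sy += p.y; }
        return new Point2D.Double(sx / members.size(), sy / members.size());
    }

    static double distance(Point2D.Double a, Point2D.Double b) {
        return a.distance(b);   // Euclidean distance between two centroids
    }
}
```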
The second elimination approach is based on density. Here, a group with fewer objects has higher priority to be evicted; a group that is far away but holds many objects within a small area is not eliminated. Therefore,
distance is not counted in this approach. When more than one group has the same density value, the group which was formed first has a greater chance of being eliminated first.
The last approach is based on multiple factors. The approach, called PDAID, calculates the cost of a group from several factors. PDAID is similar to the PAID approach, except that we consider both the density and the area of a group rather than its area alone. The reason is that a large area may contain only a few objects compared with a smaller one. A group has higher priority for elimination if it has the furthest distance, the longest time since last access, fewer objects and a small area.
5.6 Conclusion
This chapter discussed our three proposed approaches for client caching, focusing on object elimination. The aim of our proposed approaches is to answer client queries with at least K objects served from the cache. The approaches eliminate a group of objects based on three different criteria: the first eliminates a group based on distance; the second is density-based; and the last eliminates a group based on multiple criteria.
The distance-based elimination uses the MinMax algorithm, which eliminates the group furthest from the next predicted location once the user has received a query result at the current location. The density-based elimination drops the group with the fewest objects. The last approach is based on the cost of a group, considering four factors in order to choose the group to eliminate: the access probability, valid scope area, density and data distance factors.
Chapter 6
Performance Evaluation
This chapter presents the performance evaluation of our approaches that have been
elaborated in Chapters 3, 4 and 5. The purpose of this chapter is to evaluate those
approaches under various conditions.
The evaluation is performed by implementing and simulating the proposed approaches using Java™ and Planimate™. The implementation and its results are presented in Section 6.1, while the simulation and its results are presented in Section
6.2. The implementation section briefly describes our implementation and its results
for query processing at the server side. The simulation section contains a short sum-
mary of the simulation model, and more comprehensive results. Our simulation also
validates the outcomes of the implementation.
6.1 Implementation and its Results
An evaluation of the implementation of mobile query processing at the server side
is described in this section. This section is divided into two parts: a short summary
of the implementation details, and an elaboration of implementation results.
6.1.1 Implementation Environment
A summary of the implementation details is given in this section. The summary
includes implementation settings and the architecture.
Parameter            Value
Database records     250,000 - 1,250,000
BS dimension         10,000 x 10,000
Searching distance   500 - 2,500
Shape used           Circle, square
Speed                0, 50
Direction            Horizontal, vertical and diagonal
Once users have passed these values to our simulation, the simulation assigns them to the appropriate variables. After all parameter values have been assigned, we create the boundary of the base station and a query scope. The simulation then retrieves all records from the chosen database and stores them in an array. The retrieved records are valid for this base station, since they were generated by our generator.
The next step is to find the records that belong to the query scope. Recall that our proposed approaches retrieve only records that have not yet been passed.
APPENDIX A. IMPLEMENTATION MODEL 254
Thus, we divide the query scope into four equal regions. To simplify the discussion, the regions are numbered anti-clockwise starting from the top right. Region selection is done by identifying the velocity entered by the user. The velocity consists of two elements, X and Y. Both elements are positive if the client travels north east; in contrast, both are negative if the client travels south west. Once the travel direction has been identified, the regions can be selected. If the travel direction is east, for example, the selected regions are one and four. The full complexity of region selection can be seen in the case study in Chapter 2.
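A possible mapping from velocity to regions is sketched below. Only the east case (regions one and four) is stated explicitly in the text; the remaining cases are assumptions for illustration, and the actual selection follows the case study in Chapter 2.

```java
import java.util.List;

/** Sketch of selecting query-scope regions from the user's velocity.
 *  Regions are numbered anti-clockwise from the top right: 1 = top right,
 *  2 = top left, 3 = bottom left, 4 = bottom right. */
public class RegionSelector {

    static List<Integer> selectRegions(double vx, double vy) {
        if (vx > 0 && vy > 0) return List.of(1);   // north east
        if (vx < 0 && vy > 0) return List.of(2);   // north west
        if (vx < 0 && vy < 0) return List.of(3);   // south west
        if (vx > 0 && vy < 0) return List.of(4);   // south east
        if (vx > 0) return List.of(1, 4);          // east (as in the text)
        if (vx < 0) return List.of(2, 3);          // west
        if (vy > 0) return List.of(1, 2);          // north
        if (vy < 0) return List.of(3, 4);          // south
        return List.of(1, 2, 3, 4);                // stationary: search all
    }
}
```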
Object validation is done in the next step (shown in Figure A.1). In this step, an object is retrieved from the coordinate collection (line 3). Then, we compare the location of the object with the chosen regions of a square and a circle used as the query scope. Lines 5 to 12 validate whether the object is located inside the square presented as the query scope. If it is inside, the counter for the square is incremented, and the object is further verified to check whether it is located inside the circle. The distance between the object and the user is measured using the Euclidean distance (lines 15-17), and the counter for the circle is incremented if the object is inside the circle. This verification continues until the value of internalCounter equals the number of objects in the coordinate collection (line 1).
Figure A.2 shows how our program is run and its output when a user does not move. We requested some information (location, searching distance, speed, travel direction) from the user, since we do not have any device for collecting live information from a user. The time is measured in time units.
The next experiment implements the case where a user misses the query results, so that the server needs to reproduce the next query results. The implementation process is
 1  while ( internalCounter < coordinate.size() )
 2  {
 3      ptDblBuff = (Point2D.Double) coordinate.elementAt( internalCounter );
 4      /* Is a coordinate inside the region? */
 5      if ( ( ptDblBuff.x < qsTopRight.x ) &&
 6           ( ptDblBuff.y < qsTopRight.y ) &&
 7           ( ptDblBuff.x > qsTopLeft.x ) &&
 8           ( ptDblBuff.y < qsTopLeft.y ) &&
 9           ( ptDblBuff.x < qsBottomRight.x ) &&
10           ( ptDblBuff.y > qsBottomRight.y ) &&
11           ( ptDblBuff.x > qsBottomLeft.x ) &&
12           ( ptDblBuff.y > qsBottomLeft.y ) )
13      {
14          /* Find distance of target inside circle to source */
15          pwrDistance = Math.pow( ( ptDblBuff.x - source.x ), 2.0 ) +
16                        Math.pow( ( ptDblBuff.y - source.y ), 2.0 );
17          ptDblDistance = Math.sqrt( pwrDistance );
18          if ( ptDblDistance <= distanceFromSource )
19          {
20              noOfPlacesFoundInCircle++;
21          }
22          noOfPlacesFoundInSquare++;
23      }
24      internalCounter++;
25  }
Figure A.1: Implementation for object validation against query scope
similar to that of the initial process. However, if there are existing query results or the receiving flag is false, the server checks whether there is any overlap between the current and the previous scopes. When no overlapping scope exists, the query processing is the same as before. In contrast, when the current and previous query scopes overlap, the server invalidates any objects in the query results which are not located inside the overlapping area. Then, the server searches for objects within the non-overlapping area of the current query scope, merges the objects from the overlapping and non-overlapping areas, and sends the query results to the users.
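The reuse of the previous result can be sketched as follows, assuming rectangular query scopes (java.awt.Rectangle); the method name and parameters are illustrative only.

```java
import java.awt.Rectangle;
import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.List;

/** Sketch of reusing a previous query result: objects outside the overlap
 *  of the previous and current scopes are invalidated, and only the
 *  non-overlapping part of the current scope is searched again. */
public class ScopeReuse {

    static List<Point2D.Double> answer(Rectangle previous, Rectangle current,
                                       List<Point2D.Double> previousResult,
                                       List<Point2D.Double> database) {
        Rectangle overlap = previous.intersection(current);
        List<Point2D.Double> result = new ArrayList<>();
        // Keep previous results that are still inside the overlapping area.
        for (Point2D.Double p : previousResult)
            if (!overlap.isEmpty() && overlap.contains(p.x, p.y))
                result.add(p);
        // Search the database only in the non-overlapping part of the scope.
        for (Point2D.Double p : database)
            if (current.contains(p.x, p.y)
                    && (overlap.isEmpty() || !overlap.contains(p.x, p.y)))
                result.add(p);
        return result;
    }
}
```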
The processing time is measured from when the server starts processing the query. The measurement covers both invalidating objects from existing query results and generating query results from scratch.
[jjayaput@sng-1 experiment1]$ java generateCoordinate datadata50k
Enter your current position including floating point (0-10000) : 5000
Enter distance that you would like to search : 500
Enter your current speed (0 - stop) : 0
Total : 50000 Coordinates
Time : t0
Source (5000.0,5000.0)
direction: h
Searching in Region 0
Number of Places Found in Square: 127
Number of Places Found in Circle: 100
Searching in Region 1
Number of Places Found in Square: 114
Number of Places Found in Circle: 83
Searching in Region 2
Number of Places Found in Square: 112
Number of Places Found in Circle: 84
Searching in Region 3
Number of Places Found in Square: 123
Number of Places Found in Circle: 93
Figure A.2: Snapshot of experiment 1 simulation
A.3 Implementation for Query Processing in Multi-Cells
The implementation of query processing in multiple cells is quite complex, since it involves a number of servers. In our implementation, we use TCP/IP as the communication protocol. The time it takes to send a query result from a server to a mobile user is ignored, since we assume that this time is constant.
To simulate the multiple-cell implementation, we use three machines, where each machine runs one server that serves one cell and has its own database.
Figure A.3: Class diagram of server implementation
Our class diagram for server implementation is shown in Figure A.3. It has five
classes: Server, ThreadedSocket, BSEntity, Message and Result. The explanation
for each class is as follows:
• Server
This class is the front end of the server; it initialises the server boundary and listens for any incoming request. The server can be instantiated in two modes: default or custom. In the default mode, the BS boundary is decided automatically by the simulation; in other words, the default mode is used to initiate the main server. If the user does not give any parameters to this class, the default values are used. The default values for the main server are given in Table A.3:
Table A.3: Server default setting

Parameter     Value
BS Width      900
BS Height     2,000
Server Port   8189
Figure A.4 shows an implementation snapshot of a server registering itself to the main server.
If the server configuration is set by the user, there must be at least one main server up and running, because any customised server needs to register with the main server. The port for the main server is 8189. The registration process for the other servers is very simple: they connect to the incoming port of the main server and send their identity to it. Then, they wait for an acknowledgment from the main server that their registration has been successful. If so, they can stand by to listen for incoming requests; otherwise, the instantiation of the server has failed.
if ( port != 8189 )
{
    try
    {
        // Establish connection to the main server
        Socket socketToNeigh = new Socket( "203.24.130.25", 8189 );

        PrintWriter output = new PrintWriter( socketToNeigh.getOutputStream() );
        input = new BufferedReader( new InputStreamReader(
                    socketToNeigh.getInputStream() ) );
        StringTokenizer st = null;

        output.println( request );   // Asking for registration.
        output.flush();

        String buffInputFromSocket = null;

        try
        {
            while ( ( buffInputFromSocket = input.readLine() ) != null )
            {
                st = new StringTokenizer( buffInputFromSocket );
                System.out.println( "updating Neighbour List" );

                bsentity.updateNeighbour( inetAddr, port, position(x, y), Dimension );

                System.out.println( "Neighbour List updated" );
            }
        }
        catch ( Exception e )
        { System.out.println( "Registering main server to neighbour list failed" ); }

        // We don't need to keep the connection to the main server
        // once the registration is completed.
        // The communication between servers will be handled by class BSEntity.
        socketToNeigh.close();
    }
    catch ( Exception e ) { System.out.println( "Server Registration failed" ); }
}
Figure A.4: Implementation of a server registering itself to a main server
In the next step, this class initiates the listening port and keeps listening for incoming requests from other servers and clients. The server port can be chosen directly by the simulation or by the user. Figure A.5 shows an implementation snapshot for listening to incoming requests using the ServerSocket class provided by the Java library.
When there is an incoming request, this class remembers the requester's port and calls the ThreadedSocket class, passing the requester's port and its boundary. What the ThreadedSocket class does
ServerSocket server = new ServerSocket( port );
System.out.println( "Server is ready" );
while ( true )
{
    Socket socket = server.accept();
    if ( socket != null )
    {
        new ThreadedSocket( socket, counter, bsentity ).start();
        System.out.println( "Thread started" );
    }
}
Figure A.5: Implementation of how a server keeps listening for incoming requests
is to create a separate process (child process) for each incoming request, while the Server class keeps listening for the next request.
• ThreadedSocket
This class provides the actual implementation behind the Thread facility of the Java™ standard library, which supplies only the Thread interface. As mentioned earlier, this class is a child process of the Server class.
Class ThreadedSocket contains two methods: the constructor and the run method. The constructor initialises the class variables with the values sent by the Server class. The run method carries out the work of the class and is invoked automatically once the instantiated thread is started.
Early in the run method, the incoming request is read from the socket and converted into a string. Verification is then carried out to identify whether the incoming request is a query from a mobile user or a registration request from another server.
If the incoming request is a server registration from another server, it calls updateNeighbour of class BSEntity. As confirmation, the main server's identification is sent to that server.
If the request comes from a mobile user, the poolingInput method of class BSEntity is called. The poolingInput method pools requests from mobile users before the query is processed and returns the query result as the answer; this query result answers the LDQ of the mobile user. At the end of this method, the query result is sent to the mobile user through the requester's port.
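The behaviour described above can be sketched as a Thread subclass. This is a simplified illustration, not the thesis code: BSEntity is stubbed as an interface, the wire format is assumed to be one request line per connection, and the registration branch is omitted.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

/** Sketch of the ThreadedSocket child process: the constructor stores the
 *  values passed in by the Server class, and run() reads the request,
 *  dispatches it to BSEntity and writes the answer back to the requester. */
public class ThreadedSocketSketch extends Thread {

    interface BSEntity { String poolingInput(String request); }

    private final Socket socket;
    private final BSEntity bsentity;

    ThreadedSocketSketch(Socket socket, BSEntity bsentity) {
        this.socket = socket;       // values sent by the Server class
        this.bsentity = bsentity;
    }

    @Override
    public void run() {
        try (Socket s = socket) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream()));
            PrintWriter out = new PrintWriter(s.getOutputStream(), true);
            String request = in.readLine();                  // incoming request
            if (request != null)
                out.println(bsentity.poolingInput(request)); // query result
        } catch (Exception e) {
            System.out.println("Request handling failed: " + e);
        }
    }
}
```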
• Message
Class Message is used to pool a query from a mobile user. It splits the received query and stores the parts in class variables: userID, currentPosition, movement, searchingDistance, newPosition and scope. The usage of the first four parameters is straightforward. The last parameter, scope, is used to generate the valid scope of the user query.
To form a valid scope, the velocity of the mobile user is analysed first. Once this is done, the valid scope is created by adding the components of newPosition to the components of searchingDistance. If the movement is vertical, the x-coordinate of the valid scope ranges from "minus one" to "twice the x-coordinate" of searchingDistance, while the y-coordinate ranges from the y-coordinate of newPosition to the y-coordinate of searchingDistance. The valid scope creation for horizontal movement is similar to that for vertical movement; the only difference is that the roles of the x- and y-coordinates are swapped.
• Result
This class is responsible for generating and storing a query result. It has two constructors and two methods. The constructors are a default and a copy constructor that initialise the class variables. The two methods are generateResult and getResult; the latter returns the generated result to the caller. The generateResult method compares each object's location from the database with the valid scope; for objects located inside the valid scope, the counter of objects found is incremented and the locations of those objects are stored.
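A sketch of this class is given below, assuming a rectangular valid scope; the field names follow the description above rather than the actual thesis code.

```java
import java.awt.Rectangle;
import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.List;

/** Sketch of the Result class: generateResult compares each object's
 *  location with the valid scope, counts and stores those inside, and
 *  getResult returns them to the caller. */
public class ResultSketch {

    private final List<Point2D.Double> found = new ArrayList<>();
    private int objectsFound = 0;

    void generateResult(List<Point2D.Double> database, Rectangle validScope) {
        for (Point2D.Double obj : database) {
            if (validScope.contains(obj.x, obj.y)) {
                objectsFound++;          // counter of objects found
                found.add(obj);          // store the object's location
            }
        }
    }

    List<Point2D.Double> getResult() { return found; }
    int count() { return objectsFound; }
}
```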
• BSEntity
This class is the main class of the server implementation. Query processing and server registration are the two tasks of this class. Let us discuss the server registration process first, followed by the query processing task.
The server registration process is handled by a method called updateNeighbour. This method accepts five parameters: address, port, position, BSWidth and BSHeight. The first two parameters are the other server's address and listening port; the last three are the bottom-left position of the other server and the other BS's width and height respectively. The method then finds an empty slot in its list of neighbour BSs and, once found, creates an object of type NeighbourDetails from the accepted parameters. At the end of the method, it sends a confirmation message to the caller.
The second task is query processing, which involves both the server and the client. A method called poolingInput filters the request type: if the query is asked by a BS, it calls the method generateQueryFromBS; otherwise, it calls generateQueryFromClient to process a query from a client.
After the query type has been filtered, the query processing task is done by the following methods, the last two of which retrieve objects from neighbour cells:
– generateQueryFromClient
In the beginning, incoming requests are pooled in an array, using FIFO as the queuing priority. Once the server has finished processing one request, it processes the next element of the array.
The query processing is done by retrieving all objects in the current BS; the procedure is the same as static object retrieval for a single cell. Once this finishes, the method calls generateQueryResultsFromNeighbour to retrieve static objects from neighbour cells. At the end of the method, the information about static objects from the current and neighbour cells is combined and returned to the caller. The processing time, covering both the current and neighbour cells, is also measured here.
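The FIFO pooling described above can be sketched as follows; an ArrayDeque stands in for the array mentioned in the text, and the method names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Sketch of the FIFO pooling in generateQueryFromClient: incoming
 *  requests are queued, and the server takes the next element only
 *  after the current one has finished processing. */
public class RequestPool {

    private final Queue<String> pool = new ArrayDeque<>();  // FIFO order

    void poolRequest(String request) { pool.add(request); }

    /** Returns the next request to process, or null when the pool is empty. */
    String nextRequest() { return pool.poll(); }
}
```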
– generateQueryResultsFromNeighbour
This method finds which neighbour BSs overlap with the query scope. It goes through its database to obtain the list of overlapped neighbour BSs, then passes the overlapping parts of the query scope to each such BS by opening a connection to it. While a neighbour BS is processing the query, the current BS waits until it receives the results from that neighbour.
– generateQueryFromBS
The time measurement begins at the start of this method, before it searches for static objects in its database. The second step determines which area of the query scope needs to be searched. Once the area has been determined, the search compares each object that belongs to that area, and the matching objects are collected into a result collection. At the end of the process, the result collection is sent to the caller and the time measurement is stopped.
Appendix B
Simulation Model
B.1 Simulation Package Overview
Planimate is a discrete-event animation software platform for prototyping, developing and operating highly visual, dynamic discrete-event simulation models and interactive applications [89]. Figure B.1 shows the opening page when the package is loaded.
Figure B.1: Opening page of Planimate
Planimate contains two types of palettes, namely Objects and Items, as shown in Figures B.2 and B.3. The first type, the Objects palette, contains 18 different objects which symbolise different activities for simulating features of the real environment.
Figure B.2: Planimate Objects
Figure B.3 shows the Items palette. An item is a temporary object that interacts with the permanent objects and moves through the system; it cooperates with the permanent objects through paths which need to be defined.
B.2 Query Processing Model
This section gives a brief explanation of our proposed simulation models. The explanation includes a description of the features that are available in