Spatio-Temporal Indexing for Large Multimedia Applications

Submitted to the 3rd IEEE International Conference on Multimedia Computing and Systems (ICMCS’96).


Yannis Theodoridis∗, Michael Vazirgiannis, Timos Sellis

Computer Science Division

Department of Electrical and Computer Engineering

National Technical University of Athens

Zographou, Athens, 157 73 GREECE

phone: +30-1-7721402

fax: +30-1-7722459

e-mail: {theodor, mvazirg, timos}@cs.ntua.gr

∗ Author to whom correspondence should be addressed.


ABSTRACT: Multimedia applications usually involve a large number of multimedia objects (texts, images, sounds, etc.). A multimedia authoring tool should efficiently support and retrieve the spatial and temporal relationships among these objects. In this paper we present several spatial, temporal and spatio-temporal relationships of interest and propose efficient indexing schemes, based on multi-dimensional (spatial) data structures, for large multimedia applications that involve thousands of objects. We also present evaluation models for the proposed schemes, as well as hints for selecting the most appropriate one according to the multimedia author’s requirements.

1. INTRODUCTION

A Multimedia Application (MAP) involves a variety of individual multimedia objects presented according to the MAP scenario. The multimedia objects that participate in the MAP, called actors, are transformed spatially or temporally in order to be presented according to the author's requirements. Moreover, the author has to define the spatial and temporal ordering of actors within the application context, as well as the relationships among them. Finally, the author has to define how the user will interact with the application and how the application will handle application or system events.

It is evident that authoring complex MAPs that involve a large number of actors can be a very complicated task, given the large set of possible events that may occur in the application context, the number of actors, and the various potential combinations of these parameters. Typically, we would expect the number of actors and their relationships in the context of an MAP to be on the order of 10⁴. Taking into account the vast number of possible events and their combinations as regards interaction, the number of entities that MAP authors have to manage is considerable.

Thus an indexing scheme is required that supports the author in managing the large number of actors and the spatio-temporal relationships among them. Current authoring tools do not provide such facilities. Sample requirements would include:

• which actors appear in the application at a specific time instance,
• what is the spatial layout (screen layout) at a specific time instance during the application,
• what is the temporal layout of the application in terms of temporal intervals,
• what is the spatio-temporal relationship among a set of actors in the application (e.g., “does actor A spatially overlap with actor B in the application?” or “which actors temporally overlap with actor A?”).

In this paper we propose indexing schemes for large multimedia applications in order to help authors manage the large number of actors, obtain spatial and temporal layouts of parts of or the entire application, and answer queries regarding the spatio-temporal relationships among actors. For multimedia application modelling we build on previous research efforts [Vazi93, Vazi95a], and we exploit a complete set of spatio-temporal operators for the description of the spatial and/or temporal relationships between actors [Vazi95c].

The proposed indexing schemes are based on the R-tree index [Gutt84], which is widely used for indexing spatial data in several applications, such as Geographic Information Systems (GIS), CAD and VLSI design. We adapt R-trees in order to index spatial, temporal or spatio-temporal occurrences of actors and the relationships between them. Moreover, we evaluate the proposed schemes against serial storage of actors and their spatio-temporal coordinates, and we give hints to multimedia database designers for selecting the most efficient scheme according to the requirements of MAP authors.

To our knowledge, there is no previous work in the literature on indexing the spatio-temporal characteristics of actors¹. Research has mainly focused on content-based image indexing, i.e., fast retrieval of objects using their content characteristics (colour, texture, shape). For example, [Falo94a] proposes a system called QBIC, which couples several features from machine vision with fast indexing methods from the database area in order to support colour, shape and texture matching queries. Nearest-neighbour queries (based on image content) are addressed in [Chiu94]. In general, indexing of actors’ contents is an active research area, while indexing of actors’ extents in the spatio-temporal coordinate system sets a new direction.

¹ In the rest of the paper we will use the terms actor and object interchangeably.

The paper is organised as follows. Section 2 presents the underlying model for the MAPs of interest, together with the spatio-temporal relationships and operators among actors of a MAP that need efficient support. In Section 3 we propose two indexing schemes based on the R-tree spatial index, a simple one and a unified one, in order to support these operators. The analytical evaluation of the proposed schemes is presented in Section 4. We conclude in Section 5 by summarising our work and giving hints for future research.

2. MULTIMEDIA APPLICATION MODELLING

Modelling of MAPs is a matter of current research [Duda95, Hirz95]. As part of our research, we have specified object-oriented models for multimedia applications. More specifically, we have specified models for compositions of multimedia objects in MAPs [Vazi95c] and models for the representation of interactive MAP scenarios based on events [Vazi95a, Vazi95b].

The life-cycle of a MAP, as regards authoring, involves the following phases:

• High Level Specification: In this phase the author defines the high-level scenario of the application. More specifically, the overall spatial layouts (screens) and the high-level functionality of the application are designed, independently of specific media content.

• Media content selection and transformation: In this phase the author must select the specific media content for the actors. The media content should go through a transformation phase so that it conforms to the high-level specifications (spatial and temporal layouts) of the MAP. For instance, an image originally sized 300x200 pixels may have to be resized to 150x100 pixels for the sake of an application. These transformations may be spatial and/or temporal and should not alter the original data files.

• Specific scenario definition: In this phase the specific scenario for each actor must be defined. The scenario belongs to one of two categories: pre-orchestrated (pre-defined spatial and temporal ordering of actors) or interactive (the application flow is affected by events related to the user, the application or the system). In both cases, a complete representation of the spatio-temporal ordering of and relationships among actors is required.

In the past, the term “synchronisation” has been widely used to describe the temporal ordering of actors in a MAP. However, a MAP specification should describe both the temporal and the spatial ordering of actors in the context of the MAP, and spatial ordering issues (i.e., the absolute positions of and spatial relationships among actors) have not been adequately addressed. We claim that the term synchronisation is inadequate for MAPs; instead, we propose the term “composition” to represent both the temporal and the spatial composition of actors. We define “composition” as the spatio-temporal ordering of actors in the context of the MAP. Composition has spatial and temporal features and must represent the corresponding relationships among the actors [Vazi95c].

A model for scenarios should cover both pre-orchestrated and event-based cases. The general form of a scenario tuple (the fundamental unit of a scenario) [Vazi95a] should represent the triggering event (simple or complex), the stopping event, the set of actions to be executed, and the constraints to be fulfilled for the scenario to start its execution. In interactive scenarios, the author should define the events (simple or complex) that will be consumed by the application and will trigger actions to be executed. These actions are essentially composed presentations of media objects. The events may be related to user interaction, the system or the application [Vazi95b].

In this paper we focus on the pre-orchestrated case, since event-based (or interactive) scenarios involve non-deterministic temporal and/or spatial occurrences of actors; the interactive case is a matter of our future research.

2.1. Spatio-Temporal Relationships and Operators

As mentioned above, a crucial aspect of MAP development is the specification of the spatial and temporal presentation of the participating actors, as well as of the relationships among them. We present here a set of operators that represent every possible spatio-temporal relationship between actors. For spatial relationships we exploit the set of operators defined in [Papa96], which represent the possible topological-directional relationships between two 2-dimensional objects (Table 1). The 169 relationships Ri_j (i = 1, ..., 13, j = 1, ..., 13) compose a complete set of spatial operators. For temporal relationships we exploit a complete set of temporal operators defined in [Vazi95c], which represent temporal relationships between multimedia object presentations in MAPs (Figure 1).

[Table 1: 13 × 13 grid of spatial relationships; rows R1_j to R13_j, columns Ri_1 to Ri_13]

Table 1: Spatial relationships between two objects (covering directional-topological information)

[Figure 1 panels, for the operator A -t-> B: before (t>0), meets (t = 0), overlaps (t<0, |t|<durA), during (t<0, |t|>durA, durA<durB), begins (A\/B), ends (A/\B)]

Figure 1: The temporal operators that are defined and the relationships they represent

With this set of operators we can represent any spatio-temporal relationship among actors. For instance, the composition “Image B is to appear 3 seconds after image A, 4cm to the right of and 5cm below the bottom-right vertex of image A” would be represented by the following composition tuple [Vazi95c]:

r1’ = A [(r13_13, v3, v2, 4, 5), (-3->)] B

where r13_13 is the corresponding spatial relationship (from Table 1), (-3->) is the temporal relationship between the actors, v3 and v2 are the named vertices of the actors, and (4, 5) are their spatial distances along the two axes.

In the rest of the paper we use the following spatial, temporal and spatio-temporal relationships as typical queries, in order to illustrate the proposed indexing schemes:

• spatial operators:
- overlap(p,q): returns the list of objects p that spatially overlap object q.
- above(p,q): returns the list of objects p that spatially lie above object q.
• temporal operators:
- during(p,q): returns the list of objects p that are temporally included in the temporal interval that corresponds to the execution of object q.
- before(p,q): returns the list of objects p that are executed before object q.
• spatio-temporal operators:
- overlap_during(p,q): returns the list of objects p that spatially overlap object q during its execution.
- overlap_before(p,q): returns the list of objects p that spatially overlap object q and whose execution ends before the execution of q begins.
- above_during(p,q): returns the list of objects p that spatially lie above object q during its execution.
- above_before(p,q): returns the list of objects p that spatially lie above object q and whose execution ends before the execution of q begins.
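To make the eight operators concrete, the following Python sketch evaluates them as predicates over the actors’ extents. It rests on our own assumptions: each actor is reduced to a box [x1,x2]×[y1,y2]×[t1,t2] with y growing upward, touching boundaries count as overlapping, and the “_during” operators only require p to be on stage at some instant of q’s execution (our reading of query 3 in subsection 2.2, where C qualifies although its interval is not contained in D’s). The names Actor, coexist and select are hypothetical. A, B and C use the records of Figure 3, read as (id, x1, y1, x2, y2, t1, t2); D’s interval follows the scenario of subsection 2.2, but its screen extent is invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Actor:
    """Extent of an actor: [x1,x2] x [y1,y2] on screen (y grows upward), [t1,t2] in time."""
    id: str
    x1: float
    y1: float
    x2: float
    y2: float
    t1: float
    t2: float

def meet(a1, a2, b1, b2):
    """True if the closed intervals [a1, a2] and [b1, b2] share at least one point."""
    return a1 <= b2 and b1 <= a2

# Pairwise predicates p R q (our reading of the definitions above).
def overlap(p, q):
    return meet(p.x1, p.x2, q.x1, q.x2) and meet(p.y1, p.y2, q.y1, q.y2)

def above(p, q):
    return p.y1 > q.y2                      # p lies entirely above q

def during(p, q):
    return q.t1 <= p.t1 and p.t2 <= q.t2    # p's interval inside q's interval

def before(p, q):
    return p.t2 < q.t1                      # p ends before q starts

def coexist(p, q):
    return meet(p.t1, p.t2, q.t1, q.t2)     # p is on stage while q executes

def overlap_during(p, q):
    return overlap(p, q) and coexist(p, q)

def above_during(p, q):
    return above(p, q) and coexist(p, q)

def overlap_before(p, q):
    return overlap(p, q) and before(p, q)

def above_before(p, q):
    return above(p, q) and before(p, q)

def select(actors, q, pred):
    """Ids of all actors p != q with pred(p, q), i.e. 'the list of objects p'."""
    return [p.id for p in actors if p.id != q.id and pred(p, q)]

A = Actor("A", 10, 50, 50, 75, 0, 17)   # Figure 3
B = Actor("B", 68, 45, 100, 63, 10, 17) # Figure 3
C = Actor("C", 68, 5, 110, 38, 17, 28)  # Figure 3
D = Actor("D", 90, 13, 110, 23, 20, 24) # screen extent invented for the example
MAP = [A, B, C, D]

print(select(MAP, D, overlap_during))  # ['C']
print(select(MAP, D, above))           # ['A', 'B']
```

Every call scans all actors; the indexing schemes of Section 3 exist precisely to avoid this.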

We classify the above operators into two categories: inclusive operators (which indicate spatial or temporal overlap or inclusion among actors) and exclusive operators (which indicate spatial or temporal disjointness among actors). A general assumption is that queries involving inclusive operators are highly selective (i.e., the result is a limited set of objects), while queries involving exclusive operators have low selectivity (i.e., the result is a large set of objects). In the next subsection we define a sample MAP that serves as a running example for the indexing schemes proposed in Section 3.

2.2. An Example Multimedia Application

In this subsection we describe an indicative MAP in terms of the spatio-temporal relationships defined above. The spatial layout of the application appears in Figure 2a and the temporal one in Figure 2b. The high-level scenario of the application is the following:

“The application starts with video clip A (located at point (10,50) relative to the application origin Θ). At the same time a narration E starts. 10 sec after the start of video A, image B appears 18 pts to the right of A, with its upper side 12 pts below the upper side of A. Image B is displayed for 7 sec. Just after image B disappears, video A stops while a text window C appears 7 pts below image B, left-aligned with it. Text window C remains on the screen for 11 sec. 3 sec after C appears, a small image D appears inside C, 8 pts above the bottom side of C and aligned to its right side. D remains on the screen for 4 sec. Meanwhile, at the 10th sec of the application, a bitmap logo F appears at the bottom-left corner of the application window. F disappears after 3 sec. The application ends when narration E ends.”

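[Figure 2 panels: (a) spatial layout of actors A, B, C and D relative to the application origin Θ, with offsets of 18, 12, 7 and 8 pts; (b) temporal layout of actors A to F along a time axis marked at 10, 13, 17, 20, 24, 28 and 30 sec]

Figure 2: Spatial and temporal layout of a multimedia application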
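The scenario fixes the actors’ intervals through simple arithmetic, and its spatial offsets can be checked against the records listed later in Figure 3. The Python sketch below does both. It assumes that Figure 3 lists each record as (id, x1, y1, x2, y2, t1, t2) with y growing upward; the sizes of B and C are taken from Figure 3, since the scenario does not state them. Narration E is left out because its end (30 sec) is only visible in Figure 2b. The function names are hypothetical.

```python
# Records of Figure 3, read as (x1, y1, x2, y2, t1, t2) and keyed by actor id.
FIG3 = {
    "A": (10, 50, 50, 75, 0, 17),
    "B": (68, 45, 100, 63, 10, 17),
    "C": (68, 5, 110, 38, 17, 28),
}

def derive_schedule():
    """(start, end) in seconds for each actor, following the scenario text."""
    t = {}
    t["B"] = (10, 10 + 7)                        # B appears at 10 sec, shown for 7 sec
    t["A"] = (0, t["B"][1])                      # A starts at 0, stops when B disappears
    t["C"] = (t["B"][1], t["B"][1] + 11)         # C appears then, stays 11 sec
    t["D"] = (t["C"][0] + 3, t["C"][0] + 3 + 4)  # 3 sec after C appears, for 4 sec
    t["F"] = (10, 10 + 3)                        # logo F at the 10th sec, for 3 sec
    return t

def scenario_offsets(rec):
    """Spatial offsets of the scenario, measured on the Figure 3 records."""
    ax1, ay1, ax2, ay2 = rec["A"][:4]
    bx1, by1, bx2, by2 = rec["B"][:4]
    cx1, cy1, cx2, cy2 = rec["C"][:4]
    return {
        "B right of A": bx1 - ax2,           # scenario: 18 pts
        "B top below A top": ay2 - by2,      # scenario: 12 pts
        "C below B": by1 - cy2,              # scenario: 7 pts
        "C left-aligned with B": cx1 - bx1,  # scenario: 0 (aligned)
    }

schedule = derive_schedule()
offsets = scenario_offsets(FIG3)
print(schedule)
print(offsets)
print(all(schedule[k] == FIG3[k][4:] for k in FIG3))  # True
```

The derived intervals reproduce the ticks of Figure 2b (10, 13, 17, 20, 24, 28) and the temporal coordinates of Figure 3.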

The objects to be included in a composition tuple are those that are spatially and/or

temporally related to each other. Typical queries submitted by the author of an

application would be the following:

1. Which actors temporally overlap the presentation of actor D? (only a temporal relationship is involved)
2. Which actors spatially lie above actor D in the application window? (only a spatial relationship is involved)
3. Which actors spatially overlap actor D during its presentation? (a spatio-temporal relationship is involved)
4. What is the spatial layout of the screen at the 22nd sec of the application?
5. What is the temporal layout between the 10th and the 20th sec of the application?

In the next section we propose efficient indexing mechanisms to support such queries in a large multimedia application.

3. SPATIO-TEMPORAL INDEXING

MAPs usually involve a large number of non-traditional objects, such as images, video, sound and text. Quick retrieval, among this huge amount of data, of the set of objects that satisfies a query based on spatio-temporal relationships is necessary for the efficient construction of a MAP. The spatial and temporal features of an object are identified by six coordinates: its projections on the x-axis (points x1, x2), the y-axis (points y1, y2) and the t-axis (points t1, t2). A serial scheme that maintains each object's characteristics as a tuple of seven values (id, x1, y1, x2, y2, t1, t2), as illustrated in Figure 3 for the objects of the example MAP, is not an efficient solution.

[Figure 3: Multimedia DB with serial records (A,10,50,50,75,0,17) (B,68,45,100,63,10,17) (C,68,5,110,38,17,28) ..., pointing to video A, image B, text C, ...]

Figure 3: A serial indexing scheme of actors

Queries involving spatio-temporal operators, such as the ones presented in Section 2, would usually require the retrieval of all pages of the serial scheme, since such a scheme provides no ordering. In conclusion, MAP tools need to be accompanied by efficient indexing mechanisms that can support a wide range of spatio-temporal operators. In the next subsections we propose two indexing schemes and their retrieval procedures.
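The cost of the serial scheme can be seen in a few lines of Python. This sketch is illustrative only: records follow Figure 3 read as (id, x1, y1, x2, y2, t1, t2), D’s screen extent is invented, and the page capacity of two records is a hypothetical value chosen so that the example spans several pages. Because the records carry no ordering, answering overlap_during(p, D) reads every page.

```python
# Serial scheme: unordered records (id, x1, y1, x2, y2, t1, t2) packed into pages.
RECORDS = [
    ("A", 10, 50, 50, 75, 0, 17),    # Figure 3
    ("B", 68, 45, 100, 63, 10, 17),  # Figure 3
    ("C", 68, 5, 110, 38, 17, 28),   # Figure 3
    ("D", 90, 13, 110, 23, 20, 24),  # hypothetical screen extent for D
]
PAGE_CAPACITY = 2  # hypothetical; real pages hold many more records

def paginate(records, capacity=PAGE_CAPACITY):
    return [records[i:i + capacity] for i in range(0, len(records), capacity)]

def intersects(a, b):
    """Closed boxes a, b given as (x1, y1, x2, y2, t1, t2) share at least one point."""
    return (a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
            and a[4] <= b[5] and b[4] <= a[5])

def serial_overlap_during(pages, q):
    """overlap_during(p, q) by a full scan; returns (answer ids, pages read)."""
    qid, qbox = q[0], q[1:]
    answer, pages_read = [], 0
    for page in pages:
        pages_read += 1  # no ordering to exploit: every page is read
        for rec in page:
            if rec[0] != qid and intersects(rec[1:], qbox):
                answer.append(rec[0])
    return answer, pages_read

pages = paginate(RECORDS)
print(serial_overlap_during(pages, RECORDS[3]))  # (['C'], 2)
```

The number of pages read equals the total number of pages regardless of how selective the query is, which is the linear behaviour the indexed schemes improve upon.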


3.1. A Simple Spatial and Temporal Indexing Scheme

A simple indexing scheme able to handle the spatial and temporal characteristics of actors consists of two indexes:

• a spatial (two-dimensional) index for the spatial characteristics (size and x- and y-coordinates) of the objects, and
• a temporal index for the temporal characteristics (duration and start/stop time) of the objects.

In the spatial database literature, several data structures have been proposed for the manipulation of spatial data (a survey can be found in [Same89]). Among them, R-trees [Gutt84] and their variants [Sell87, Beck90] seem to be the most efficient. On the other hand, temporal information can be managed either by one-dimensional versions of the above data structures (since all of them have been designed for n-dimensional space in general) or by specialised temporal data structures, such as Segment Trees [Bent75] or Segment R-trees [Kolo91].

For uniformity, we select a single multi-dimensional data structure (the R-tree) to play the role of both the spatial (2D R-tree) and the temporal (1D R-tree) index. The resulting indexing scheme is illustrated in Figure 4.

[Figure 4: Multimedia DB indexed by a 2D R-tree (spatial info) and a 1D R-tree (temporal info)]

Figure 4: A simple (spatial and temporal) indexing scheme of actors

Adopting the above indexing scheme clearly improves the retrieval of spatio-temporal operators compared to serial retrieval. The logarithmic response times of hierarchical tree indexes are generally far better than the linear ones obtained when no indexes are present, especially when thousands or millions of objects are involved. Even for complex operators where both indexes need to be accessed (e.g., the overlap_during operator), the combined cost of accessing the two indexes is expected to be lower than the cost of serial retrieval.

A weak point of the above scheme has already been mentioned: retrieving objects according to their spatio-temporal relationships with others (e.g., overlap_during) demands access to both indexes and, in a second phase, the computation of the intersection of the two answer sets. Accessing both indexes is usually costly and, in many cases, most of the elements of the two answer sets are not found in the intersection. In other words, most of the disk accesses to each index separately are useless. To eliminate this problem, several techniques for spatial joins (e.g., between two R-trees) have been proposed [Brin93] and could be applied to our indexing scheme. However, this solution is not applicable when two completely different indexes are used (e.g., an R-tree for spatial information and a Segment Tree for temporal information). A more efficient solution is to merge the two indexes (the spatial and the temporal one) into a unified mechanism. This scheme is proposed in the next subsection.
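The following Python sketch illustrates the weak point of the simple scheme: overlap_during is answered by two independent lookups whose answer sets are then intersected. To keep it short, the 2D and 1D R-tree searches are replaced by plain filters (an assumption of the sketch, not how the indexes work), and the actors and reference object are synthetic, so the printed sizes only show how many candidates each lookup returns that the intersection then discards.

```python
import random

def meet(a1, a2, b1, b2):
    """Closed intervals [a1, a2] and [b1, b2] share at least one point."""
    return a1 <= b2 and b1 <= a2

def random_actors(n, seed=7):
    """Synthetic actors (id, x1, y1, x2, y2, t1, t2); all sizes are arbitrary."""
    rng = random.Random(seed)
    actors = []
    for i in range(n):
        x, y, t = rng.uniform(0, 100), rng.uniform(0, 100), rng.uniform(0, 300)
        actors.append((i, x, y, x + rng.uniform(1, 10), y + rng.uniform(1, 10),
                       t, t + rng.uniform(1, 20)))
    return actors

def simple_scheme_overlap_during(actors, q):
    """Spatial lookup, temporal lookup, then intersection of the two answer sets."""
    # Stand-ins for the 2D and 1D R-tree searches: plain filters over the same data.
    spatial = {a[0] for a in actors
               if meet(a[1], a[3], q[1], q[3]) and meet(a[2], a[4], q[2], q[4])}
    temporal = {a[0] for a in actors if meet(a[5], a[6], q[5], q[6])}
    return spatial & temporal, len(spatial), len(temporal)

actors = random_actors(10_000)
q = (-1, 40, 40, 60, 60, 100, 130)  # hypothetical reference object
answer, n_spatial, n_temporal = simple_scheme_overlap_during(actors, q)
print(len(answer), n_spatial, n_temporal)
```

Every element of either answer set that is missing from the intersection corresponds to wasted work in the index that produced it; a unified 3D index can prune on all three axes at once.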

3.2. A Unified Spatio-Temporal Indexing Scheme

In this subsection we propose a unified spatio-temporal indexing scheme that eliminates the inefficiencies of the previous one and further improves the performance of a MAP tool. The proposed scheme consists of only one index: a spatial (three-dimensional) index for the complete spatio-temporal information (location in space and time coordinates) of the objects. Assuming that the R-tree is an efficient spatial indexing mechanism, the unified scheme is illustrated in Figure 5.

The main advantages of the proposed scheme, compared to the previous one, are the following:

• The indexing mechanism is based on a unified framework: only one spatial data structure (e.g., an R-tree) needs to be implemented and maintained.
• Spatio-temporal operators are efficiently supported: using appropriate definitions, spatio-temporal operators are implemented as three-dimensional queries and answered using the three-dimensional index, so the need for (time-consuming) spatial joins is eliminated.

[Figure 5: Multimedia DB indexed by a single 3D R-tree (spatio-temporal info)]

Figure 5: A unified (spatio-temporal) indexing scheme of actors

The superiority of the second indexing scheme over the first one (and, obviously, over the serial scheme) on the retrieval of spatio-temporal operators will be shown in Section 4, where analytical models that predict the performance of each scheme are presented. In the rest of this section we present the retrieval process for such operators when the unified indexing scheme is available within a multimedia application tool.

3.3. Retrieval of Spatio-Temporal Operators Using R-trees

The majority of multi-dimensional data structures have been designed as extensions of the classic alphanumeric index, the B-tree. They usually divide the space into appropriate sub-regions and store these sub-regions in hierarchical tree structures. Multi-dimensional objects are represented in the tree structure by an approximation instead of their actual shape, for simplicity and efficiency reasons².

² In this paper we examine spatial data structures based on the traditional approximation of Minimum Bounding Rectangles (MBRs). MBRs are the most commonly used approximations in spatial data structures because they need only 2n values (the lower and upper projections on each of the n axes) to represent an n-dimensional object.

When the MBRs of two objects are disjoint, we can conclude that the objects they represent are also disjoint. If the MBRs share common points, however, no conclusion can be drawn about the topological relation between the objects. For this reason, spatial queries involve the following two-step strategy [Oren86]:

• Filter step: The tree structure is used to rapidly eliminate objects that could not possibly satisfy the query. The result of this step is a set of candidates which includes all the results and possibly some false hits.
• Refinement step: Each candidate is examined (by using computational geometry techniques). False hits are detected and eliminated.
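A small Python sketch of the filter-refine strategy follows, using circles as a stand-in for objects whose exact shape is not a rectangle (our choice for illustration; the function names are hypothetical). The window (0.8, 0.8)-(1.0, 1.0) intersects the MBR of circle P but not the circle itself, so P passes the filter step as a false hit and is removed only by the refinement step.

```python
import math

def mbr(circle):
    """Minimum bounding rectangle (x1, y1, x2, y2) of a circle (cx, cy, r)."""
    cx, cy, r = circle
    return (cx - r, cy - r, cx + r, cy + r)

def rects_intersect(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def circle_meets_rect(circle, rect):
    """Exact test: the rectangle point nearest to the centre lies within radius r."""
    cx, cy, r = circle
    nx = min(max(cx, rect[0]), rect[2])
    ny = min(max(cy, rect[1]), rect[3])
    return math.hypot(cx - nx, cy - ny) <= r

def window_query(objects, window):
    # Filter step: cheap MBR test (a tree index applies it node by node).
    candidates = [oid for oid, c in objects.items() if rects_intersect(mbr(c), window)]
    # Refinement step: exact geometry removes the false hits.
    answer = [oid for oid in candidates if circle_meets_rect(objects[oid], window)]
    return candidates, answer

objects = {"P": (0.0, 0.0, 1.0), "S": (3.0, 3.0, 0.5)}
print(window_query(objects, (0.8, 0.8, 1.0, 1.0)))  # (['P'], [])
```

For the actors of a MAP, whose extents are themselves boxes, the MBR is exact and the refinement step has nothing to remove; it matters when actors have irregular shapes.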

One of the most efficient hierarchical multi-dimensional data structures is the R-tree [Gutt84]. It is a height-balanced tree consisting of intermediate and leaf nodes. The MBRs of the actual data objects are stored in the leaf nodes of the tree. Intermediate nodes are built by grouping rectangles (or, in general, hyper-rectangles) of the lower level; each intermediate node is associated with a rectangle that encloses all the rectangles of its lower-level nodes. In order to retrieve the objects that belong to the answer set of a spatio-temporal operator with respect to a reference object, we determine the region in which the MBRs of qualifying objects may lie and then search only the intermediate nodes whose rectangles intersect this region. This technique was proposed and implemented in [Papa96] in order to support high-resolution spatial operators (e.g., meet, contains) that are popular in GIS applications.

As an example, Figure 6 shows how the MBRs of Figure 2 are grouped and stored in the 3D R-tree of our unified scheme. We assume a branching factor of 4, i.e., each intermediate node contains at most four entries. At the lower level, the MBRs of the objects are grouped into two intermediate nodes R1 and R2, which in turn compose the root of the index. Query 3 of subsection 2.2 corresponds to the overlap_during operator with D as the reference object q. In order to answer this query, only R2 is selected for propagation. Among the entries of R2, objects C and (obviously) D constitute the qualifying answer set.

Note that only the right subtree of the R-tree index of Figure 6a was traversed in order to answer the query. The fraction of accessed nodes depends heavily on the size of the reference object q and, of course, on the kind of operator (more selective operators result in a smaller number of accessed nodes).
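The descent just described can be mimicked with a hand-built two-level R-tree in Python. This is a sketch, not the paper’s implementation: the grouping R1 = {A, B} and R2 = {C, D} is our own (Figure 6 also stores E and F), A, B and C use the records of Figure 3 read as (x1, y1, x2, y2, t1, t2), and D’s screen extent is invented. Node MBRs are computed from their entries, and a node is read only if its MBR intersects the query box, here the extent of D.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (low corner, high corner)

def intersects(a: Box, b: Box) -> bool:
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1]))

def enclose(boxes: List[Box]) -> Box:
    """Minimum bounding box of a list of boxes."""
    lows = tuple(min(c) for c in zip(*(b[0] for b in boxes)))
    highs = tuple(max(c) for c in zip(*(b[1] for b in boxes)))
    return lows, highs

@dataclass
class Node:
    leaf: bool
    entries: list  # (Box, object id) in leaves, (Box, Node) in intermediate nodes

def make_leaf(items):
    return Node(True, [(box, oid) for oid, box in items])

def make_inner(children):
    return Node(False, [(enclose([b for b, _ in c.entries]), c) for c in children])

def search(node: Node, q: Box, stats: dict) -> list:
    """Ids of leaf entries whose boxes intersect q; counts visited nodes."""
    stats["nodes"] = stats.get("nodes", 0) + 1  # one disk access per node read
    hits = []
    for box, child in node.entries:
        if intersects(box, q):
            hits.extend([child] if node.leaf else search(child, q, stats))
    return hits

# Boxes ((x1, y1, t1), (x2, y2, t2)).
A = ((10, 50, 0), (50, 75, 17))
B = ((68, 45, 10), (100, 63, 17))
C = ((68, 5, 17), (110, 38, 28))
D = ((90, 13, 20), (110, 23, 24))   # hypothetical screen extent
root = make_inner([make_leaf([("A", A), ("B", B)]), make_leaf([("C", C), ("D", D)])])

stats = {}
print(search(root, D, stats), stats["nodes"])  # ['C', 'D'] 2
```

Only the root and the node holding C and D are read; the MBR of the other node lies entirely above D on the y-axis and is pruned without being accessed.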

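[Figure 6 panels: (a) the R-tree, whose root holds R1 and R2, with the MBRs of actors A, B, C, D, E and F at the leaf level; (b) the MBRs of A to F and of R1, R2 in x-y-t space]

Figure 6: Retrieval of overlap_during operator using 3D R-trees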

Query 2 of subsection 2.2 corresponds to the above operator with D as the reference object q. Since the query gives no temporal information on the reference object, the unified scheme transforms it into a large box that covers the whole t-axis. In this case the simple scheme presented in subsection 3.1 could be more efficient, since its 2D R-tree, which is dedicated to the spatial information of objects, can answer the query on its own. Similarly, query 1 of subsection 2.2 corresponds to the during operator and could also be efficiently supported by the simple scheme.

A special type of query, very popular during MAP authoring, is spatial or temporal layout retrieval. In other words, queries of the type “Find the objects and their positions on the screen at second T0” (spatial layout) or “Find the objects that appear in the application during the temporal segment (T1,T2) and their temporal duration” (temporal layout) need to be supported by the underlying scheme. As we show next, both types of queries are efficiently supported by the unified scheme, since they correspond to the overlap_during operator with an appropriate reference object q: a rectangle q1 that intersects the t-axis at point T0, or a cube q2 that overlaps the t-axis at the segment (T1,T2), respectively. The reference objects q1 and q2 are illustrated in Figure 7a. In a second step, the objects that compose the answer set are processed in main memory in order to draw their positions on the screen (spatial layout) or the intersection of their t-projections with the given temporal segment (temporal layout).

Queries 4 and 5 of subsection 2.2 are “layout” queries and can be processed as described above. In particular, query 4 can be answered using the reference object q1 at the specific time instance T0 = 22 sec. The result is a list of objects (the identifiers of the actors and their spatial and temporal coordinates) that are displayed on the screen at that instant. This result may be visualised as a screen snapshot with the objects of the answer set drawn on it (Figure 7b). Query 5 can be answered using as reference object a cube q2 with dimensions (Xmax-0) × (Ymax-0) × (T2-T1), where Xmax × Ymax is the dimension of the screen and (T2-T1) is the requested temporal interval; T1 = 10 and T2 = 20 in our example. The result is a list of objects (the identifiers of the actors and their spatial and temporal coordinates) that are included in or overlap the cube q2. This result can be visualised as a temporal layout by drawing the temporal line segments of the retrieved objects that lie within the requested temporal interval (T2-T1) (Figure 7c).
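Both layout queries reduce to one window query plus in-memory post-processing, as the Python sketch below shows under several assumptions of ours: records are (id, x1, y1, x2, y2, t1, t2) with A, B and C taken from Figure 3, the screen extents of D and F and the screen size XMAX × YMAX are invented, narration E is omitted because it has no screen extent, and zero-length contacts (D starts exactly at T2 = 20) are dropped from the temporal layout. A linear filter stands in for the 3D R-tree search.

```python
# Records (id, x1, y1, x2, y2, t1, t2): A, B, C from Figure 3; D, F extents invented.
ACTORS = [
    ("A", 10, 50, 50, 75, 0, 17),
    ("B", 68, 45, 100, 63, 10, 17),
    ("C", 68, 5, 110, 38, 17, 28),
    ("D", 90, 13, 110, 23, 20, 24),
    ("F", 0, 0, 15, 10, 10, 13),
]
XMAX, YMAX = 120, 80  # hypothetical screen size

def window_query(actors, window):
    """overlap_during against a 3D window (x1, y1, x2, y2, t1, t2)."""
    wx1, wy1, wx2, wy2, wt1, wt2 = window
    return [a for a in actors
            if a[1] <= wx2 and wx1 <= a[3] and a[2] <= wy2 and wy1 <= a[4]
            and a[5] <= wt2 and wt1 <= a[6]]

def spatial_layout(actors, t0):
    q1 = (0, 0, XMAX, YMAX, t0, t0)        # rectangle crossing the t-axis at T0
    return {a[0]: a[1:5] for a in window_query(actors, q1)}

def temporal_layout(actors, t1, t2):
    q2 = (0, 0, XMAX, YMAX, t1, t2)        # box spanning the screen and [T1, T2]
    segments = {}
    for a in window_query(actors, q2):
        start, end = max(a[5], t1), min(a[6], t2)  # clip the t-projection
        if end > start:                            # drop zero-length contacts
            segments[a[0]] = (start, end)
    return segments

print(spatial_layout(ACTORS, 22))       # query 4: C and D with their rectangles
print(temporal_layout(ACTORS, 10, 20))  # query 5: clipped segments of A, B, C, F
```

Whether an actor that merely touches the window boundary should appear in a layout is a presentation choice; the sketch keeps such actors in the answer set but omits their empty segments.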

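[Figure 7 panels: (a) query windows for spatial and temporal layout: the rectangle q1 crossing the t-axis at T0 and the box q2 spanning T1 to T2, shown with the MBRs of A to F and R1, R2 in x-y-t space; (b) spatial layout, a screen snapshot relative to the origin Θ; (c) temporal layout on a time axis marked at 10, 13, 17 and 20 sec]

Figure 7: Spatial and temporal layout retrieval using 3D R-trees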

On the other hand, the simple indexing scheme (consisting of two index structures) cannot give straightforward answers to the above layout queries, since information stored in both indexes needs to be retrieved.

In this section we proposed two schemes for indexing the actors that appear in MAPs and presented the retrieval procedure for spatio-temporal operators on these objects. In the next section both schemes are analytically evaluated and compared with each other, leading to general conclusions on the advantages and disadvantages of each solution.

4. ANALYTICAL EVALUATION OF THE INDEXING SCHEMES

We present an analytical model that estimates the performance of R-trees on the retrieval of n-dimensional queries. The analytical formula is applicable to both indexing schemes, keeping in mind that the simple one consists of one 2D R-tree and one 1D R-tree, while the unified one consists of one 3D R-tree. Using this model we can estimate the performance of both schemes and compare their efficiency on several spatio-temporal operators.

4.1 Analysis of R-tree Performance

Most of the work in the literature has dealt with the expected performance of R-trees for processing overlap queries, i.e., the retrieval of data objects p that share common area with a query window q [Page93, Falo94b, Theo95a]. More specifically, let N be the total number of data objects indexed in an R-tree, D the density of the data objects in the global space, and f the average capacity of each R-tree node. If the query window q has average extent q_i along the i-th of the n axes, so that its average size is $q = \prod_{i=1}^{n} q_i$, then the expected retrieval cost (number of disk accesses) of an overlap query using R-trees is [Theo95a]:

$$C(q) = 1 + \sum_{j=1}^{1 + \log_f (N/f)} \left\{ \frac{N}{f^{j}} \cdot \prod_{i=1}^{n} \left[ \left( D_j \cdot \frac{f^{j}}{N} \right)^{1/n} + q_i \right] \right\} \qquad (1)$$

where the average density D_j of the R-tree nodes at each level j is given by:

$$D_j = \left( 1 + \frac{D_{j-1}^{1/n} - 1}{f^{1/n}} \right)^{n} \qquad (2)$$

In other words, D_j can be computed recursively starting from D_0, which denotes the density D of the data MBRs. Qualitatively, this means that we can estimate the retrieval cost of an overlap query based only on knowledge of the data set and the query window.
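Eqs. 1 and 2 translate directly into a few lines of Python. The sketch assumes that the data space is normalised to the unit hypercube, so that D and the q_i are dimensionless, and it rounds the upper summation limit 1 + log_f(N/f) down to an integer; both are our assumptions, and the parameter values in the usage line are arbitrary.

```python
import math

def node_densities(d0: float, f: float, n: int, levels: int) -> list:
    """D_0 .. D_levels via Eq. 2, starting from the data density D_0 = D."""
    dens = [d0]
    for _ in range(levels):
        prev = dens[-1]
        dens.append((1 + (prev ** (1 / n) - 1) / f ** (1 / n)) ** n)
    return dens

def overlap_cost(q_sides, N: int, D: float, f: float) -> float:
    """Expected disk accesses C(q) of Eq. 1 for a window with extents q_sides."""
    n = len(q_sides)
    # Upper summation limit 1 + log_f(N/f); rounding it down is our assumption.
    h = math.floor(1 + math.log(N / f, f))
    dens = node_densities(D, f, n, h)
    cost = 1.0  # the leading 1 of Eq. 1
    for j in range(1, h + 1):
        nodes = N / f ** j                          # number of nodes at level j
        side = (dens[j] * f ** j / N) ** (1 / n)    # average node extent per axis
        cost += nodes * math.prod(side + qi for qi in q_sides)
    return cost

print(overlap_cost((0.01, 0.01, 0.01), N=10_000, D=0.5, f=50))
```

The same function serves the 1D, 2D and 3D R-trees: only the number of extents passed in q_sides (and the density of the corresponding data set) changes.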

Since Eq. 1 expresses the expected performance of R-trees on overlap queries using a query window q, estimating the retrieval cost of a spatio-temporal operator R(p,q) requires the transformation R(p,q) ⇒ overlap(p,Q). In other words, retrieving a spatio-temporal operator using R-trees is equivalent (in terms of cost) to retrieving an overlap query using an appropriate query window Q. The transformation Q for each operator R should take into consideration the corresponding constraint on the intermediate nodes, because only these nodes are important when estimating the retrieval cost [Papa95]. For the spatio-temporal operators considered in this paper, the appropriate query windows Q are illustrated in Figure 8.

[Figure 8 panels: (a) query windows Q (3D boxes in x-y-t space) for 3D R-trees, for overlap(p,q), above(p,q), during(p,q), before(p,q), overlap_during(p,q), above_during(p,q), overlap_before(p,q) and above_before(p,q); (b) query windows Q (2D rectangles in x-y space) for 2D R-trees, for overlap(p,q) and above(p,q); (c) query windows Q (1D segments on the t-axis) for 1D R-trees, for during(p,q) and before(p,q)]

Figure 8: Query windows Q for spatio-temporal operators

Figure 8a illustrates the query windows Q (3D boxes) for the eight operators discussed, while Figures 8b and 8c illustrate the query windows Q (2D rectangles and 1D line segments, respectively) that correspond to the spatial (overlap, above) and temporal (during, before) operators.
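The following Python sketch encodes one possible reading of Figure 8a: for each operator it builds the box Q that an intermediate node must intersect in order to possibly contain a qualifying object. The unit cube [0,1]³ with axes (x, y, t), y growing upward and time starting at 0 are assumptions, as are the function names. The side lengths of Q are the q_i that enter Eq. 1.

```python
SPATIAL = ("overlap", "above")
TEMPORAL = ("during", "before")

def query_window(op, q):
    """Box Q = (low, high) in the unit cube (x, y, t) for operator op and reference box q."""
    parts = op.split("_")
    spatial = next((p for p in parts if p in SPATIAL), None)
    temporal = next((p for p in parts if p in TEMPORAL), None)
    (qx1, qy1, qt1), (qx2, qy2, qt2) = q
    if spatial == "overlap":    # node must overlap q on screen
        xs, ys = (qx1, qx2), (qy1, qy2)
    elif spatial == "above":    # node must reach above the top side of q
        xs, ys = (0.0, 1.0), (qy2, 1.0)
    else:                       # no spatial constraint
        xs, ys = (0.0, 1.0), (0.0, 1.0)
    if temporal == "during":    # node must share time with q's execution
        ts = (qt1, qt2)
    elif temporal == "before":  # node must start before q starts
        ts = (0.0, qt1)
    else:                       # no temporal constraint
        ts = (0.0, 1.0)
    return (xs[0], ys[0], ts[0]), (xs[1], ys[1], ts[1])

def sides(Q):
    """Extents q_i of Q along x, y and t."""
    return tuple(hi - lo for lo, hi in zip(*Q))

q = ((0.4, 0.4, 0.5), (0.5, 0.45, 0.6))  # hypothetical reference object
for op in ("overlap", "above", "during", "before",
           "overlap_during", "above_during", "overlap_before", "above_before"):
    print(op, [round(s, 3) for s in sides(query_window(op, q))])
```

Purely spatial or purely temporal operators yield windows that span the whole remaining axis, which is why, in a 3D index, they touch far more nodes than the combined operators.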

Using Figure 8 and Eq. 1 we can estimate the expected cost for the query window Q, which equals the expected cost C(R) of retrieving a spatio-temporal operator R. The accuracy of the above analytical model has already been evaluated on spatial relationships of varying selectivity (e.g., inside, near, northeast, and combinations) in [Theo95b]. Intuitively, we expect the unified scheme to be the most efficient one when both spatial and temporal information are included in the query, whereas in the remaining cases the simple scheme seems preferable. The validity of these intuitive conclusions is examined in the next subsection, where the above analytical model is used for the analytical comparison of the proposed schemes.

4.2 Analytical Comparison

In order to compare the efficiency of each scheme in retrieving spatio-temporal operators, we assumed a multimedia application including 10,000 actors with the following distribution:

• a portion of 75% characterised by small projections on all three axes (x-, y-, t-), e.g., text or video that covers a small space on the screen and lasts a short time interval,

• a portion of 15% characterised by zero projection on the two spatial axes (x-, y-) and a small projection on the third axis (t-), e.g., sounds that cover no space on the screen and last a short time interval,

• a portion of 5% characterised by small projections on the two spatial axes (x-, y-) and a large projection on the third axis (t-), e.g., heading titles or logos that cover a small space on the screen and last a long time interval, and

• a portion of 5% characterised by large projections on the two spatial axes (x-, y-) and a small projection on the third axis (t-), e.g., full text or background patterns that cover a large space on the screen and last a short time interval.

The above distribution characterises, in general terms, a typical MAP and is used as the sample for comparing the two indexing schemes. Other distributions of actors can be handled in the same way by adapting their density D.
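The density values used in the analytical estimates (D = 145, 145 and 1.6 for the 1D, 2D and 3D indexes) can be derived from this distribution as the total MBR length, area or volume over the unit workspace. The sketch below assumes that "small" and "large" space mean 5% and 50% of the screen along each of the x- and y-axes, and "short" and "long" mean 1% and 10% of the application duration; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ActorClass:
    share: float  # fraction of all actors in this class
    space: float  # projection on EACH of the x- and y-axes (fraction of the screen axis)
    time: float   # projection on the t-axis (fraction of the application duration)


# Assumed reading: 5% / 50% of the screen per axis, 1% / 10% of the duration.
SMALL, LARGE, SHORT, LONG = 0.05, 0.50, 0.01, 0.10

TYPICAL_MAP = [
    ActorClass(0.75, SMALL, SHORT),  # text or video
    ActorClass(0.15, 0.0, SHORT),    # sounds (zero screen space)
    ActorClass(0.05, SMALL, LONG),   # heading titles, logos
    ActorClass(0.05, LARGE, SHORT),  # full text, background patterns
]


def densities(n_actors: int, mix: list) -> dict:
    """Density = sum of MBR lengths / areas / volumes over the unit workspace."""
    return {
        "N_2D": sum(n_actors * c.share for c in mix if c.space > 0),  # sounds excluded
        "D_1D": sum(n_actors * c.share * c.time for c in mix),
        "D_2D": sum(n_actors * c.share * c.space ** 2 for c in mix),
        "D_3D": sum(n_actors * c.share * c.space ** 2 * c.time for c in mix),
    }


print(densities(10_000, TYPICAL_MAP))
```

Under this per-axis reading the sums give D_1D = 145, D_2D = 145 and D_3D ≈ 1.56 (1.6 after rounding), with 8,500 objects in the 2D index; a different actor mix only requires a different `TYPICAL_MAP` list.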


For the analytical estimates we used Eq. 1 with the following values: number of data objects N = 10,000 (8,500) for the 1D and 3D (2D) R-tree indexes³, density of data objects D = 145, 145, 1.6 for the 1D, 2D, and 3D indexes, respectively⁴, and average node capacity f = 0.67 · M, where M = 84, 50, 35 for 1D, 2D, and 3D R-trees, respectively⁵. The sizes of the reference objects q varied from 0% up to 50% of the global space per axis, and the corresponding query windows Q for each combination of R-tree index and operator are those illustrated in Figure 8. Table 2 summarises the comparative results for the operators discussed in the paper. For uniformity, we set the cost of serial retrieval⁶ (as illustrated in Figure 3) to 100% and express the cost of each scheme per operator as a percentage of that value.

³ The number of data objects stored in the 2D index is smaller than in the 1D and 3D indexes because zero-space objects (e.g., sounds) are not included in the dataset of the 2D index.

⁴ The D values follow from the above distribution if we assume that small (large) space corresponds to 5% (50%) of the screen extent along each axis and a short (long) time period corresponds to 1% (10%) of the whole duration of the application.

⁵ 67% is a typical average node utilisation for R-trees and their variants, while the M values are the maximum node capacities for pages of 1024 bytes.

⁶ The cost of serial retrieval is computed as follows. In our example, we store 10,000 objects, each requiring 28 bytes (4 bytes × 7 numbers). With a disk page size of 1024 bytes, the serial scheme requires 285 pages, all of which must be accessed to answer any spatio-temporal operator.

Operator          Simple scheme    Unified scheme
overlap           5% - 10%         5% - 15%
above             45% - 50%        80% - 95%
during            2% - 10%         25% - 45%
before            25% - 35%        80% - 95%
overlap_during    5% - 20%         1% - 5%
overlap_before    35% - 40%        3% - 10%
above_during      55% - 60%        15% - 25%
above_before      70% - 85%        50% - 65%

Table 2: Comparison of the two schemes (cost as a percentage of the serial retrieval cost)

Several conclusions arise from the analytical comparison results presented in Table 2:

• The intuition that the simple scheme outperforms the unified one for operators that involve only temporal or only spatial information, while the opposite holds for spatio-temporal operators, is confirmed. The first four operators are more efficiently supported by the simple scheme, while the cost of the unified scheme is usually two or three times higher. The reverse situation appears for the last four operators.

• Both schemes are much more efficient than serial retrieval. For the most selective operators (overlap, during, overlap_during), the improvement reaches one or even two orders of magnitude compared to the serial cost. For the least selective operators (above, before, above_before), the cost of the most efficient scheme is roughly between one quarter and two thirds of the serial cost.

The above conclusions are more or less expected. In real cases, however, a mixture of temporal, spatial and spatio-temporal operators needs to be supported, which raises the question of which scheme is most efficient for such mixed requirements. To provide guidelines, Figure 9 presents the average cost of each indexing scheme when (a) all eight operators are involved, (b) only the most selective (inclusive) operators are involved, and (c) only the least selective (exclusive) operators are involved. For each case, we vary the ratio of spatial or temporal operators to spatio-temporal ones from 1:9 up to 9:1 and locate the threshold point, i.e., the ratio at which the preferable scheme changes.
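The threshold point can be computed in closed form if, as a simplifying assumption, the average cost of each scheme is a linear mix of two group averages: the cost of the purely spatial or temporal operators and the cost of the spatio-temporal ones, weighted by their shares of the workload. The function names and the example costs below are hypothetical illustrations, not values taken from Table 2 or Figure 9.

```python
def mixed_cost(cost_single: float, cost_st: float, share_st: float) -> float:
    """Average cost when a fraction share_st of the queries are spatio-temporal.

    cost_single: average cost of the purely spatial or temporal operators
    cost_st:     average cost of the spatio-temporal operators (same scheme)
    """
    return (1.0 - share_st) * cost_single + share_st * cost_st


def threshold(simple: tuple, unified: tuple) -> float:
    """Share of spatio-temporal queries at which both schemes cost the same.

    Each argument is (cost_single, cost_st); the two straight lines given by
    mixed_cost() are intersected in closed form.
    """
    (s_single, s_st), (u_single, u_st) = simple, unified
    slope_gap = (s_st - s_single) - (u_st - u_single)
    if slope_gap == 0:
        raise ValueError("parallel cost lines: one scheme always dominates")
    return (u_single - s_single) / slope_gap


# Hypothetical group averages (% of serial cost) for the two schemes.
simple, unified = (20.0, 50.0), (50.0, 20.0)
for k in range(1, 10):                  # ratio k:(10-k) of spatial/temporal to spatio-temporal
    share_st = (10 - k) / 10
    best = "unified" if mixed_cost(*unified, share_st) < mixed_cost(*simple, share_st) else "simple"
    print(f"{k}:{10 - k}", best)
print("threshold share of spatio-temporal queries:", threshold(simple, unified))
```

With these made-up costs the lines cross at a 50% share of spatio-temporal queries; a threshold below 50% would indicate that the unified scheme wins even when spatio-temporal queries are a minority.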

Figure 9 leads to the following conclusions:

• First, if we do not distinguish between selective and non-selective operators, the threshold point appears at the ratio 4.5:5.5. In other words, if spatio-temporal queries make up more than 45% of the total, the unified scheme is the best solution. The fact that the threshold lies below the middle point (i.e., 50%) indicates that the unified scheme is more robust than the simple one: the extra cost due to the third axis is usually lower than the cost of maintaining two indexes.

• If we distinguish between high- and low-selective operators, the threshold shifts right (high-selective operators) or left (low-selective operators). In other words, when dealing with selective operators, the simple scheme is sometimes preferable even if the majority (up to 65%) of the queries involve spatio-temporal information. A plausible explanation is that 1D and 2D R-trees are extremely fast for high-selective temporal and spatial operators, respectively, while the extra one or two axes of the 3D R-tree significantly raise its cost. The conclusion is reversed for low-selective operators.


[Figure 9: Retrieval cost of the two indexing schemes (% of the serial cost), simple vs. unified, for ratios of spatial or temporal to spatio-temporal operators from 1:9 to 9:1. Panels: (a) all operators involved; (b) only the most selective (inclusive) operators involved; (c) only the least selective (exclusive) operators involved.]

The above conclusions provide a clear evaluation of the two proposed indexing schemes with respect to several parameters (type of operators, ratio of spatial and temporal operators to spatio-temporal ones, selectivity of operators). It is up to the multimedia database designer to select the preferable solution according to the requirements of the MAP author.

5. CONCLUSION

In this paper we proposed a mechanism for the management of actors in large multimedia applications. This mechanism is based on indexing the spatial and temporal presentation features of the actors during the application. The two proposed indexing schemes are based on the R-tree structure: the first scheme includes one 1D and one 2D R-tree that index the temporal and spatial characteristics of actors, respectively, while the second scheme includes one 3D R-tree that indexes the spatio-temporal characteristics of actors, considering time to be the third axis of the coordinate system. We evaluated the two schemes against the “serial” scheme and presented guidelines that help select the most appropriate solution.

Authoring complex MAPs that involve a large number of actors is a complicated task, given the large set of possible events that may occur in the application context, the number of actors, and the many potential combinations of these parameters. Thus, a scheme that helps the author manage the large number of actors and the spatio-temporal relationships among them is required.

Current authoring tools do not provide such facilities. The proposed mechanism provides an actor management scheme that lets authors query the application scenario, before application execution, for spatio-temporal relationships among actors (e.g., “does actor A spatially overlap with actor B in the application?” or “which actors temporally overlap with actor A?”). Moreover, authors may request spatio-temporal layouts of the application at specific spatial and/or temporal instances (e.g., “which actors appear in the application at a specific time instance?”, “what is the spatial (screen) layout at a specific time instance during the application?”, or “what is the temporal layout of the application in terms of temporal intervals?”).

A limitation of our approach is that it does not support interactive scenarios due to the

non-deterministic spatial and temporal occurrences of the actors. However, the

proposed scheme could be further extended towards:

• indexing of interactive scenarios: the indexing scheme should be extended so as to

cover the case of interactive scenarios, where the spatio-temporal presence of an

actor depends on the occurrence of events.

• playout management based on the indexing scheme: the proposed model could also be used during the execution phase of the scenario. In this case, the appropriate media would be quickly located on the basis of the scenario.

REFERENCES

[Beck90] N. Beckmann, H.-P. Kriegel, R. Schneider, B. Seeger, “The R*-tree: An

Efficient and Robust Access Method for Points and Rectangles”,

Proceedings of ACM SIGMOD International Conference on Management

of Data, 1990.


[Bent75] J.L. Bentley, “Multidimensional Binary Search Trees Used for Associative

Searching”, Communications of the ACM, vol. 18, pp. 509-517, 1975.

[Brin93] T. Brinkhoff, H.-P. Kriegel, B. Seeger, “Efficient Processing of Spatial

Joins using R-trees”, Proceedings of ACM SIGMOD International

Conference on Management of Data, 1993.

[Chiu94] T. Chiueh, “Content-Based Image Indexing”, Proceedings of the 20th

International Conference on Very Large Databases (VLDB), 1994.

[Duda95] A. Duda, C. Keramane, “Structured Temporal Composition of Multimedia

Data”, Proceedings of the 1st IEEE International Workshop for MM-

DBMSs, 1995.

[Falo94a] C. Faloutsos, W. Equitz, M. Flickner, W. Niblack, D. Petkovic, R. Barber,

“Efficient and Effective Querying by Image Content”, Journal of Intelligent

Information Systems, vol. 3, pp. 1-28, 1994.

[Falo94b] C. Faloutsos, I. Kamel, “Beyond Uniformity and Independence: Analysis of

R-trees Using the Concept of Fractal Dimension”, Proceedings of the 13th

ACM Symposium on Principles of Database Systems (PODS), 1994.

[Gutt84] A. Guttman, “R-trees: A Dynamic Index Structure for Spatial Searching”,

Proceedings of ACM SIGMOD International Conference on Management

of Data, 1984.

[Hirz95] N. Hirzalla, B. Falchuck, A. Karmouch, “A Temporal Model for

Interactive Multimedia Scenarios”, IEEE Multimedia Magazine, Fall 1995.

[Kolo91] C.P. Kolovson, M. Stonebraker, “Segment Indexes: Dynamic Indexing

Techniques for Multi-Dimensional Interval Data”, Proceedings of ACM

SIGMOD International Conference on Management of Data, 1991.

[Oren86] J. Orenstein, “Spatial Query Processing in an Object-Oriented Database

System”, Proceedings of ACM SIGMOD International Conference on

Management of Data, 1986.

[Page93] B.-U. Pagel, H.-W. Six, H. Toben, P. Widmayer, “Towards an Analysis of

Range Query Performance”, Proceedings of the 12th ACM Symposium on

Principles of Database Systems (PODS), 1993.

[Papa95] D. Papadias, Y. Theodoridis, T. Sellis, M. Egenhofer, “Topological

Relations in the World of Minimum Bounding Rectangles: a Study with R-

trees”, Proceedings of ACM SIGMOD International Conference on

Management of Data, 1995.

[Papa96] D. Papadias, Y. Theodoridis, “Spatial Relations, Minimum Bounding

Rectangles, and Spatial Data Structures”, International Journal of

Geographic Information Systems (to appear), 1996.





[Same89] H. Samet, “The Design and Analysis of Spatial Data Structures”, Addison-

Wesley, 1989.

[Sell87] T. Sellis, N. Roussopoulos, C. Faloutsos, “The R+-tree: A Dynamic Index

for Multidimensional Objects”, Proceedings of the 13th International

Conference on Very Large Databases (VLDB), 1987.

[Theo95a] Y. Theodoridis, T. Sellis, “On the Performance Analysis of Multi-

dimensional R-tree-based Data Structures”, Technical Report KDBSLAB-

TR-95-03, Knowledge and Database Systems Laboratory, National

Technical University of Athens, Greece, 1995.

[Theo95b] Y. Theodoridis, D. Papadias, “Range Queries Involving Spatial Relations:

A Performance Analysis”, Proceedings of the 2nd International Conference

on Spatial Information Theory (COSIT), 1995.

[Vazi93] M. Vazirgiannis, C. Mourlas, “An Object Oriented Model for Interactive

Multimedia Applications”, The Computer Journal, British Computer

Society, vol. 36, no. 1, 1993.

[Vazi95a] M. Vazirgiannis, M. Hatzopoulos, “Integrated Multimedia Object and

Application Modeling Based on Events and Scenarios”, Proceedings of the

1st IEEE International Workshop for MM-DBMSs, 1995.

[Vazi95b] M. Vazirgiannis, T. Sellis, “Event and Action Representation and

Composition for Multimedia Application Scenario Modeling”, Technical

Report KDBSLAB-TR-95-08, Knowledge and Database Systems

Laboratory, National Technical University of Athens, 1995.

[Vazi95c] M. Vazirgiannis, Y. Theodoridis, T. Sellis, “Spatio-Temporal Composition

in Multimedia Applications”, Technical Report KDBSLAB-TR-95-09,

Knowledge and Database Systems Laboratory, National Technical

University of Athens, 1995.
