
PROGRAMMING OUTDOOR DISTRIBUTED EMBEDDED SYSTEMS

BY CRISTIAN M. BORCEA

A dissertation submitted to the

Graduate School—New Brunswick

Rutgers, The State University of New Jersey

in partial fulfillment of the requirements

for the degree of

Doctor of Philosophy

Graduate Program in Computer Science

Written under the direction of

Professor Liviu Iftode

and approved by

New Brunswick, New Jersey

October, 2004

© 2004

CRISTIAN M. BORCEA

ALL RIGHTS RESERVED

ABSTRACT OF THE DISSERTATION

Programming Outdoor Distributed Embedded Systems

by CRISTIAN M. BORCEA

Dissertation Director: Professor Liviu Iftode

The next generation of computing systems will be embedded everywhere in the physical world.

These ubiquitous systems, deployed in a virtually unbounded number and dynamically con-

nected, will create outdoor computing environments. The thesis of my dissertation is that these

environments can be programmed to execute distributed applications using programming mod-

els and system architectures specifically designed to address their volatility, heterogeneity, and

scale.

My dissertation proposes Spatial Programming, a location-aware programming model for

outdoor distributed computing, and Smart Messages, a system architecture based on execution

migration that supports distributed computing over ad hoc networks of embedded systems.

Spatial Programming is a location-aware programming model that enables programmers to

easily develop distributed applications over dynamic networks of potentially mobile embedded

systems. Central to Spatial Programming is the concept of spatial reference, which defines a

virtual name space over networks of embedded systems using the expected locations and prop-

erties of these systems. Programmers use spatial references to access the content or services

provided by nodes in the network in the same way they use variables in a conventional program.

Similar to the mappings from virtual to physical memory in a conventional computer system, a

runtime system maintains mappings between spatial references and nodes in the physical space.


A Spatial Programming runtime is implemented on top of the Smart Messages system ar-

chitecture, which provides a cooperative execution environment in networks of embedded sys-

tems. A Smart Message is a user-defined distributed program that executes on nodes of interest

named by their properties and reached using explicit execution migration. Smart Messages rep-

resent an attractive alternative to traditional distributed computing based on message passing in

mobile ad hoc networks because they adapt quickly to highly dynamic networks and provide

support for deploying new applications in existing networks.

To demonstrate the feasibility of the proposed solutions, we have designed and implemented

a prototype system, and we have performed simulations for larger scale networks. The experi-

mental results for several applications executed over wireless networks of pocket PCs indicate

that Spatial Programming and Smart Messages are viable solutions for outdoor distributed com-

puting.


Acknowledgements

I would like to begin by expressing my gratitude to Liviu Iftode, my thesis advisor and mentor

for the past four years. His passion to conduct research that matters along with his enthusiasm

for exploring new ideas have been a constant source of inspiration for me. His guidance during

my time at Rutgers has been invaluable.

With Uli Kremer, I had countless discussions about the design of both Smart Messages and

Spatial Programming. I thank Uli for helping me focus on the high-level picture of my research.

His advice and feedback have greatly enhanced and strengthened the work. I also appreciate

very much the other two members of my thesis committee, Badri Nath and Yuanyuan Zhou, for

their support.

During my years at Rutgers, I certainly benefited from the interaction with many professors.

I especially enjoyed the system classes or seminars taught by Thu Nguyen, Rich Martin, and

Ricardo Bianchini. I also appreciated Ricardo’s skills on the soccer field, even though he was

usually my opponent.

The Smart Messages project involved the participation of a larger group of people. Special

thanks to Porlin Kang for his work in the implementation of Smart Messages and for being

always around to help. I also enjoyed a short, but very productive collaboration with Chalermek

Intanagonwiwat during his PostDoc at Rutgers.

I especially acknowledge all my colleagues from DiscoLab, and I would like to thank

Aniruddha Bohra, Murali Rangarajan, and Florin Sultan for providing me with feedback for

many practice talks.

To my friend Anda Iamnitchi, I owe my interest in pursuing a Ph.D. in the United States. I

also thank her for always being online to answer the many questions that I had.

I would like to thank all my Romanian friends at Rutgers who made my life in grad school

much more colorful than I initially hoped. Special thanks to Andreea and Cristi Francu who


helped me get through the difficult period of adapting to a new country. I would like to thank

Cristi Popescu for the useful advice that he constantly offered. Costel Serban has been a great

friend for such a long time, starting with our undergrad years in Romania and continuing all

this period at Rutgers.

I am deeply grateful to my brother for helping me with everything I asked him during the

last six years. Knowing that I can rely on him has provided me with peace of mind in many

situations. Last, but certainly not least, I am indebted to my parents for everything I am.

This work was supported in part by the NSF grant ANI-0121416.


Dedication

To My Family


Table of Contents

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii

Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv

Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

List of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1. Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2. Outdoor Computing Environments . . . . . . . . . . . . . . . . . . . . . . . . 1

1.3. The Programmability Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.4. Dissertation Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.5. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.6. Contributors to Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.7. Dissertation Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2. Spatial Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.3. Space Regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.4. Spatial References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.5. Reference Consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.6. Space Casting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.7. Spatial Reference Access Timeout . . . . . . . . . . . . . . . . . . . . . . . . 20


2.8. Defining New Space Regions . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.9. Creating/Removing Network Resources . . . . . . . . . . . . . . . . . . . . . 23

2.10. Putting It All Together: Program Example . . . . . . . . . . . . . . . . . . . . 23

2.11. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3. Smart Messages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.1. Smart Messages Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.2. Cooperative Node Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.2.1. Virtual Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.2.2. Local Injector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.2.3. Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.2.4. Admission Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.2.5. Tag Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.2.6. Synchronization Mechanism . . . . . . . . . . . . . . . . . . . . . . . 34

3.3. Smart Messages API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.3.1. Creation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.3.2. Migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.3.3. Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.3.4. Setting Resource Requirements . . . . . . . . . . . . . . . . . . . . . 36

3.3.5. Tag Space Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.4. Security Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.4.1. Access Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.4.2. Protection Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.5. Application Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.5.1. Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.5.2. SPIN using Smart Messages . . . . . . . . . . . . . . . . . . . . . . . 40

3.5.3. Directed Diffusion using Smart Messages . . . . . . . . . . . . . . . . 42

3.6. Smart Messages Simulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.7. Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43


3.8. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

4. Smart Messages Self-Routing Mechanism. . . . . . . . . . . . . . . . . . . . . 46

4.1. Content-Based Migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.2. Application Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.2.1. Selecting the Routing Algorithm . . . . . . . . . . . . . . . . . . . . . 49

4.2.2. Dynamically Changing the Routing Algorithm . . . . . . . . . . . . . 50

4.3. Implementing Routing Algorithms with Smart Messages . . . . . . . . . . . . 52

4.3.1. On-Demand Content-Based Routing . . . . . . . . . . . . . . . . . . . 52

4.3.2. Geographical Routing . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.3.3. Proactive Routing using Bloom Filters . . . . . . . . . . . . . . . . . . 54

4.3.4. Rendez-Vous Routing . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4.4. Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

4.5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5. Prototype Implementation and Evaluation . . . . . . . . . . . . . . . . . . . . 62

5.1. Smart Messages Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 62

5.1.1. Creating New Smart Messages . . . . . . . . . . . . . . . . . . . . . . 63

5.1.2. Memory Management . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5.1.3. Lightweight Migration . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5.1.4. Code Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

5.1.5. I/O Tags for Interaction with the OS and I/O System . . . . . . . . . . 67

5.2. Smart Messages Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.2.1. Cost of SM Creation . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.2.2. Cost of SM Migration . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5.2.3. Tag Space Operations . . . . . . . . . . . . . . . . . . . . . . . . . . 71

5.2.4. Routing Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

5.2.5. Application Case Study: EZCab . . . . . . . . . . . . . . . . . . . . . 74

5.3. Spatial Programming using Smart Messages . . . . . . . . . . . . . . . . . . . 77

5.4. Spatial Programming Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 81


5.5. Experiences and Lessons Learned from Building our Prototypes . . . . . . . . 84

5.6. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

6. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

Vita . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97


List of Tables

1.1. Traditional Distributed Computing Environments vs. Outdoor Distributed Com-

puting Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

3.1. Smart Messages API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

5.1. Effect of Code Brick Size on createSMFromFiles . . . . . . . . . . . . . . . . 69

5.2. Effect of Data Brick Size on spawnSM and createSM . . . . . . . . . . . . . . 69

5.3. Cost of Tag Space Primitives for Application Tags . . . . . . . . . . . . . . . 72

5.4. Cost of Reading I/O Tags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

5.5. Completion Time for Routing Algorithms . . . . . . . . . . . . . . . . . . . . 73


List of Figures

2.1. How to Program Motion Sensors and Intelligent Cameras Deployed over Two

Hills to Perform Distributed Object Tracking? . . . . . . . . . . . . . . . . . 13

2.2. Analogy Between Spatial Programming and Two Traditional Programming Mod-

els . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.3. Example of Spatial References for Object Tracking in a Network Consisting of

Motion Sensors and Intelligent Cameras Deployed over Two Hills . . . . . . . 17

2.4. Example of Program using Spatial References . . . . . . . . . . . . . . . . . 18

2.5. Reference Consistency Example: A Spatial Reference is Mapped to the Same

System as long as this System Remains in the Same Space Region . . . . 19

2.6. Space Casting: The Same System is Referenced in Different Space Region . . 20

2.7. Code Example for Spatial Reference Access Timeout . . . . . . . . . . . . . . 21

2.8. Dynamic Definition of a Relative Space Region . . . . . . . . . . . . . . . . . 22

2.9. Spatial Programming Application for Object Tracking . . . . . . . . . . . . . . 24

3.1. Traditional Distributed Applications vs. Smart Messages Applications . . . . . 26

3.2. Distributed Computing Using Smart Messages . . . . . . . . . . . . . . . . . 27

3.3. Smart Message Code Example . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3.4. Execution Path for the Above Smart Message . . . . . . . . . . . . . . . . . . 28

3.5. Cooperative Node Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.6. Application and I/O Tag Structures . . . . . . . . . . . . . . . . . . . . . . . 33

3.7. SM Protection Domains for Tag Space Access . . . . . . . . . . . . . . . . . 38

3.8. Access Control Example For Smart Message Family Cooperation (Ni are Nodes,

SMi are Smart Messages, and T is a Tag) . . . . . . . . . . . . . . . . . . . . . 38

3.9. Access Control Example For Single Originator Cooperation (Ni are Nodes, SMi

are Smart Messages, and T is a Tag) . . . . . . . . . . . . . . . . . . . . . . . 39


3.10. Access Control Example for Code-based Cooperation (Ni are Nodes, SMi are

Smart Messages, and T is a Tag) . . . . . . . . . . . . . . . . . . . . . . . . . 39

3.11. Implementation of SPIN with Smart Messages . . . . . . . . . . . . . . . . . 41

3.12. Directed Diffusion using Smart Messages . . . . . . . . . . . . . . . . . . . . 44

3.13. SPIN using Smart Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.14. Directed Diffusion - Multiple Smart Messages . . . . . . . . . . . . . . . . . 44

3.15. SPIN - Multiple Smart Messages . . . . . . . . . . . . . . . . . . . . . . . . 44

4.1. Example of Smart Message Using Content-based Migration . . . . . . . . . . 47

4.2. Example of migrate Implementation . . . . . . . . . . . . . . . . . . . . . . . 48

4.3. Dynamic Change of Routing Due to Application’s Requirements . . . . . . . . 50

4.4. Dynamic Change of Routing Due to Network’s Conditions . . . . . . . . . . . 51

4.5. Example of On-demand Routing Implementation with Smart Messages . . . . 53

4.6. Lookup in Proactive Routing: An SM arrives at node A, looking for a “fire”

tag. Applying the hash functions on “fire”, it concludes that the neighbors of C

might know better about “fire”, and migrates to C. A lookup on node C leads

to the conclusion that the “fire” tag exists on node F. . . . . . . . . . . . . . . 55

4.7. Rendez-Vous Routing with Smart Messages . . . . . . . . . . . . . . . . . . . 56

4.8. Completion Time for Experiment 1 . . . . . . . . . . . . . . . . . . . . . . . 58

4.9. Bytes Sent in the Network for Experiment 1 . . . . . . . . . . . . . . . . . . . 58





5.1. Smart Message Transfer (Main Operations) . . . . . . . . . . . . . . . . . . . 66

5.2. I/O Tag Example (Using GPS to Get the Current Location) . . . . . . . . . . . 68

5.3. Cost of Data Brick Serialization . . . . . . . . . . . . . . . . . . . . . . . . . 70

5.4. Cost of Data Brick De-Serialization . . . . . . . . . . . . . . . . . . . . . . . 70

5.5. Effect of Code Brick Size on Single Hop Migration . . . . . . . . . . . . . . . 71

5.6. Effect of Data Brick Size on Single Hop Migration . . . . . . . . . . . . . . . 71


5.7. Network Topology for Routing Experiments . . . . . . . . . . . . . . . . . . . 72

5.8. Route Discovery in EZCab . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

5.9. Cab Booking following a Route Discovery in EZCab . . . . . . . . . . . . . . 74

5.10. EZCab Prototype . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

5.11. Estimated Completion Time for EZCab . . . . . . . . . . . . . . . . . . . . . 77

5.12. Implementation of Spatial References with Smart Messages . . . . . . . . . . 78

5.13. Example of Spatial Reference Access . . . . . . . . . . . . . . . . . . . . . . 80

5.14. Java Code for Intrusion Detection Application . . . . . . . . . . . . . . . . . 82

5.15. The Network Topology for Intrusion Detection Application . . . . . . . . . . 83

5.16. Typical Camera Node with GPS Receiver Attached . . . . . . . . . . . . . . . 83

5.17. Smart Message Code Breakdown for Intrusion Detection Application . . . . . 83

5.18. Spatial Programming Runtime Library Code Breakdown . . . . . . . . . . . . 83

5.19. Execution Time for Intrusion Detection Application . . . . . . . . . . . . . . 84


List of Abbreviations

NES Networks of Embedded Systems

SP Spatial Programming

SM Smart Message

GPS Global Positioning System

VM Virtual Machine

AN Active Networks



Chapter 1

Introduction

1.1 Thesis

The thesis of my dissertation is that outdoor computing environments can be programmed to

execute distributed applications using programming models and system architectures specifi-

cally designed to address their volatility, heterogeneity, and scale.

1.2 Outdoor Computing Environments

Recent advances in technology will likely realize the vision of ubiquitous computing [85, 53],

where the physical world is populated with a vast number of heterogeneous embedded systems

that can sense, monitor, or control our surrounding environment. Unlike traditional embedded

systems which have very limited resources and are used to execute simple, dedicated functions,

these emergent systems are much more powerful and can be used to program a variety of tasks.

For instance, we are witnessing computers embedded in cars, video cameras, cell phones, or

even watches [6] that are powerful enough to run applications on top of reduced versions of

traditional operating systems.

So far, these systems have been mostly used for local computations. Most of them are,

however, equipped with short-range wireless network interfaces (e.g., IEEE 802.11, Bluetooth).

Hence, they can create mobile ad hoc networks of embedded systems (NES) which can be pro-

grammed to execute distributed applications. NES offer the opportunity to program a large

spectrum of distributed applications, ranging from simple data collection and data dissemina-

tion [42, 39, 34] to remote object tracking using robots equipped with video cameras [43] or

inter-car collaboration to improve the safety and fluidity of the traffic [4]. This type of dis-

tributed applications will soon span non-traditional computing domains, such as health care,


Term of Comparison   Traditional Networks                Networks of Embedded Systems
Location             Indoor                              Outdoor
Nodes                Functionally Homogeneous            Functionally Heterogeneous
Operation            Under User’s Control                Unattended
Scale                Relatively Small                    Large
Topology             Stable                              Ad Hoc and Volatile
Resources            Known A Priori/Infrequent Changes   Limited A Priori Knowledge/Highly Dynamic

Table 1.1: Traditional Distributed Computing Environments vs. Outdoor Distributed Computing Environments

transportation, or homeland security. This huge potential will not be achieved, however, with-

out proper support for programming outdoor distributed applications.

1.3 The Programmability Problem

This dissertation tries to answer the question: how to program outdoor distributed applications?

Most of the recent research in the NES area has focused on hardware, operating

systems, or network protocols for sensor networks [42, 71, 35, 39]. We believe that a crucial

challenge which has been only marginally tackled is how to program distributed applications in

outdoor computing environments. Developing outdoor distributed applications requires us to

understand the unique challenges posed by NES. Table 1.1 presents a comparison between

the networks used in traditional distributed computing and NES. Unlike traditional distributed

computing which takes place “indoors” over relatively small scale networks with stable con-

figurations, distributed computing over NES takes place “outdoors” over large scale networks

with highly dynamic configurations. Since NES are composed of a massive number of hetero-

geneous systems, which may be mobile and volatile, it is impossible to know the exact number

or location of various network resources over time.

To leverage the raw computing power provided by NES into distributed applications, we

need programming models and system architectures that are able to overcome the NES volatil-

ity, heterogeneity, and scale. Traditional distributed computing models (e.g., message passing,

shared memory) cannot satisfy this requirement because they have not been designed for out-

door computing environments. Writing distributed programs under these models is relatively

easy when the underlying networks are composed of functionally homogeneous nodes and

have stable configurations with acceptable delays. On the other hand, when the networks are


composed of functionally heterogeneous nodes and have volatile configurations with unknown

delays, such as NES, developing distributed applications becomes much more difficult.

Several basic assumptions in traditional distributed computing models render them unusable

in NES. Their main assumption is end-to-end data transfer between applications residing

on different nodes. One problem with end-to-end data transfers is that they may complete

very slowly, or may not complete at all in volatile networks [84]. Even if the network topol-

ogy is stable, the wireless nature of communication (e.g., in sensor networks, the experienced

packet loss between two neighbor nodes is as high as 40-50%) will lead to the same problems.

Since applications have no control over the network, they are forced to wait indefinitely (or

until the connection times out) each time something goes wrong in the network. To be able

to adapt quickly to network volatility, applications need to regain control as soon as

possible.

Another problem with traditional end-to-end data transfers is that they do not allow in-

network processing in order to reduce the size of data transferred by applications [33]. Reduc-

ing the amount of traffic in the network is important in mobile ad hoc networks, such as NES,

since it leads to reduced bandwidth and energy consumption. Therefore, outdoor applications

would also benefit from the ability to perform in-network processing.

Traditional distributed computing assumes fixed bindings between names and node ad-

dresses. This naming is too rigid for NES. After a fixed binding has been established dur-

ing the name resolution phase, an application is forced to contact the same node each time

it needs to access a resource of the same type. Commonly, name resolvers react slowly to

network changes, and applications may keep trying to contact a node long after this node has

become unreachable, even though nodes with similar resources exist in the network. To pre-

vent such a situation, more flexible naming is needed in NES. We believe that content-based

naming [39, 13] can provide a solution because it allows applications to contact any node that

has a certain resource.
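
The contrast between fixed bindings and content-based naming can be illustrated with a minimal Java sketch. It assumes a hypothetical in-memory registry; the names Node, ContentNaming, and findByContent are illustrative and are not part of the system described in this dissertation. A fixed binding keeps pointing at one node after that node becomes unreachable, while a content-based lookup selects any reachable node that offers the requested resource.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical node description: an id, the resources it offers, and reachability.
final class Node {
    final String id;
    final Set<String> resources;
    boolean reachable;

    Node(String id, Set<String> resources, boolean reachable) {
        this.id = id;
        this.resources = resources;
        this.reachable = reachable;
    }
}

// Illustrative registry (an assumption for this sketch, not the Smart Messages API).
final class ContentNaming {
    private final List<Node> nodes = new ArrayList<>();

    void add(Node n) {
        nodes.add(n);
    }

    // Content-based lookup: any reachable node offering the resource qualifies.
    Optional<Node> findByContent(String resource) {
        return nodes.stream()
                .filter(n -> n.reachable && n.resources.contains(resource))
                .findFirst();
    }

    public static void main(String[] args) {
        ContentNaming net = new ContentNaming();
        Node a = new Node("A", Set.of("camera"), true);
        Node b = new Node("B", Set.of("camera"), true);
        net.add(a);
        net.add(b);

        Node fixed = a;       // fixed binding established at name resolution time
        a.reachable = false;  // node A becomes unreachable

        System.out.println("fixed binding still usable: " + fixed.reachable);
        System.out.println("content-based choice: "
                + net.findByContent("camera").map(n -> n.id).orElse("none"));
    }
}
```

In this sketch the application that holds the fixed binding is stuck with node A, whereas the content-based lookup falls back to node B without any change to the application logic.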

Since content-based naming makes fixed addresses (e.g., IP) irrelevant, the routing and

name resolution should be integrated in NES. Additionally, given the diversity of applications,

no single routing algorithm will provide good performance for all applications. Therefore, similar to

active networks [25, 59, 77], it would be desirable to let applications use the best-suited routing


for their needs. For instance, an application may use geographical routing to reach a node with

known location, while another application may use content-based routing to reach a node with

a certain property.

In traditional distributed computing models, the programmers install the applications man-

ually on all the nodes involved in computation. Under this assumption, it is practically impos-

sible to have a new application in NES after the network deployment phase. Thus, we would

like to have an automatic method for deploying new applications in existing networks.

The conclusion of the above arguments is that traditional distributed computing models

cannot work in NES due to the unique characteristics exhibited by these networks. Currently,

the only alternative approaches are ad hoc and provide limited flexibility; they are designed for

specific classes of applications (e.g., querying the network for certain data) and can hardly ac-

commodate new applications or services after the network has been deployed. As the domain of

possible applications diversifies, there will be an increasing demand for a common distributed

computing platform to support arbitrary applications over NES. Such a platform has to support

simple development and rapid prototyping of new distributed applications. It also has to allow

applications to cope with the uncertainty encountered in NES (i.e., the network topology as

well as the resources at nodes are unknown a priori and can vary greatly over time).

1.4 Dissertation Contributions

Spatial Programming is a location-aware programming model that enables programmers to

easily develop distributed applications over dynamic networks of potentially mobile embedded

systems. Similar to the view of the network as a database (used in sensor networks), Spa-

tial Programming (SP) views the network as a single virtual name space. Central to SP is the

concept of spatial reference which defines a virtual name space over networks of embedded

systems using the expected locations and properties of these systems. In SP, network resources

(content or services provided by nodes) are accessed using spatial references in the same way

memory is accessed using variables in conventional programming. Similar to the mappings

from virtual to physical memory in a conventional computer system, a runtime system main-

tains mappings between spatial references and nodes in the physical space. For every access


to a spatial reference, the runtime system takes care of name resolution and binding, commu-

nication, and routing. We implemented a runtime system for SP on top of Smart Messages.

SP is presented in two papers that appeared in the 9th International Workshop on Future Trends of

Distributed Computing Systems (FTDCS 2003) [38] and the 24th International Conference on

Distributed Computing Systems (ICDCS 2004) [19], respectively.
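
The mapping between spatial references and physical nodes can be sketched in Java as follows. This is a simplified illustration under stated assumptions, not the runtime implemented in this work: the classes EmbeddedSystem and SpatialRuntimeSketch, the string form "region:property" of a reference, and the access method are all hypothetical. The sketch keeps an existing binding while the bound system stays in the named region and rebinds the reference to another matching system otherwise.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical embedded system: an id, one offered property, and a current region.
final class EmbeddedSystem {
    final String id;
    final String property;
    String region; // changes when the system moves

    EmbeddedSystem(String id, String property, String region) {
        this.id = id;
        this.property = property;
        this.region = region;
    }
}

// Illustrative runtime that maps "region:property" references to systems.
final class SpatialRuntimeSketch {
    private final List<EmbeddedSystem> systems = new ArrayList<>();
    private final Map<String, EmbeddedSystem> bindings = new HashMap<>();

    void deploy(EmbeddedSystem s) {
        systems.add(s);
    }

    // Returns the system bound to the reference, or null if none matches.
    // An existing binding is kept while the bound system stays in the region;
    // otherwise the reference is rebound to another matching system.
    EmbeddedSystem access(String region, String property) {
        String ref = region + ":" + property;
        EmbeddedSystem bound = bindings.get(ref);
        if (bound != null && matches(bound, region, property)) {
            return bound;
        }
        bindings.remove(ref);
        for (EmbeddedSystem s : systems) {
            if (matches(s, region, property)) {
                bindings.put(ref, s);
                return s;
            }
        }
        return null;
    }

    private static boolean matches(EmbeddedSystem s, String region, String property) {
        return s.region.equals(region) && s.property.equals(property);
    }

    public static void main(String[] args) {
        SpatialRuntimeSketch rt = new SpatialRuntimeSketch();
        EmbeddedSystem cam1 = new EmbeddedSystem("cam1", "camera", "Hill1");
        EmbeddedSystem cam2 = new EmbeddedSystem("cam2", "camera", "Hill1");
        rt.deploy(cam1);
        rt.deploy(cam2);

        System.out.println(rt.access("Hill1", "camera").id); // first access binds cam1
        cam1.region = "Hill2";                                // cam1 leaves the region
        System.out.println(rt.access("Hill1", "camera").id); // reference is rebound to cam2
    }
}
```

The application code only names a region and a property; which physical system answers is decided by the runtime at each access, in the same spirit as a virtual-to-physical memory mapping.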

The Smart Messages system architecture, based on execution migration, content-based

naming, and self-routing, has been designed specifically to support the development of arbi-

trary distributed applications over NES. The SM computing platform assumes a decentralized

architecture, where nodes in the network act as peers. A Smart Message (SM) is a user-defined

distributed program which executes on nodes of interest named by their properties and reached

using explicit migrations. An SM carries its execution state (and possibly its code) during mi-

grations and self-routes at each intermediate node between two nodes of interest. SMs do not

make any assumptions about the underlying network configuration, except for minimal system

support provided by nodes, which includes a virtual machine and a name-based memory,

called tag space. The virtual machine offers a hardware abstraction layer for SM execution

which shields SMs from heterogeneous node configurations. The tag space offers a name-

based memory, persistent across SM executions. It consists of named data pairs, called tags,

which are used for data exchange among SMs (application tags) or for accessing the local host

properties (I/O tags).
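
A tag space of this kind can be approximated with the following Java sketch. It is only an illustration under stated assumptions: the class TagSpaceSketch and the methods writeTag, registerIoTag, and readTag are hypothetical names chosen for this example, and the location value is a placeholder rather than real GPS output. Application tags hold data left behind by one SM for later SMs, while I/O tags are read through a callback that stands in for a query to the local host.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Illustrative per-node tag space (hypothetical names, not the actual SM interface).
final class TagSpaceSketch {
    // Application tags: named data that persists across SM executions on this node.
    private final Map<String, String> applicationTags = new HashMap<>();
    // I/O tags: names whose value is computed by the host on every read.
    private final Map<String, Supplier<String>> ioTags = new HashMap<>();

    void writeTag(String name, String value) {
        applicationTags.put(name, value);
    }

    void registerIoTag(String name, Supplier<String> reader) {
        ioTags.put(name, reader);
    }

    // Reads an I/O tag if one is registered under the name,
    // otherwise the application tag; returns null if neither exists.
    String readTag(String name) {
        Supplier<String> io = ioTags.get(name);
        if (io != null) {
            return io.get();
        }
        return applicationTags.get(name);
    }

    public static void main(String[] args) {
        TagSpaceSketch node = new TagSpaceSketch();

        // Placeholder for a device query such as reading a GPS receiver.
        node.registerIoTag("location", () -> "lat=0.0,lon=0.0");

        // A first SM leaves data on the node before migrating away.
        node.writeTag("fire", "detected");

        // A later SM executing on the same node finds the data by name.
        System.out.println(node.readTag("fire"));
        System.out.println(node.readTag("location"));
    }
}
```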

In networks of embedded systems, SMs represent an attractive alternative to traditional

distributed computing based on end-to-end message passing for several reasons. First, SMs

allow applications to adapt to highly dynamic network configurations. The routing code can

be instructed to return control to the application as soon as a route cannot be found or after

an application-set timeout. Since its execution state is already at the same node, the SM can

quickly adapt to changes in the network. Second, the content-based routing provides the flexi-

bility to reach a node that offers a certain property in an application-controlled manner. Third,

SMs simplify the deployment of new applications in the network after the network deployment

phase has ended. A user can inject SMs at any node in the network, and consequently the

SMs migrate their code each time the code is not cached at the node they are executing on.

And fourth, SMs can significantly reduce the amount of traffic generated by certain classes of


applications (e.g., by processing large amounts of data at their source).

To demonstrate the feasibility of SMs, we have designed and implemented the SM prototype

in Java over Linux. The SM system support is implemented within Sun Microsystems’

K Virtual Machine which was designed specifically for resource constrained devices (its size

is 160KB). The testbed used for the prototype’s evaluation consisted of an ad hoc network of

PDAs (HP iPAQs equipped with IEEE 802.11 wireless cards). The details of the SM prototype

are presented in a Computer Journal paper [44].

For larger scale evaluation, we have simulated SM-based implementations of existing applications

for sensor networks. The results are presented in a paper that appeared in the 22nd International

Conference on Distributed Computing Systems (ICDCS 2002) [21] and a chapter

in the Handbook of Sensor Networks [37]. We have also demonstrated the benefits of the SM

self-routing mechanism using different routing algorithms (e.g., content-based on-demand routing,

geographical routing, proactive routing with Bloom filters). This work resulted in a paper that

appeared in the 1st IEEE Conference on Pervasive Computing and Communication (PerCom 2003) [20].

For this research, we developed an event-driven simulator extended with support for SM exe-

cution.

The security issues and solutions for the SM architecture have been presented in a paper

that appeared in the 1st Workshop on Mobile Distributed Computing (MDC 2003) [86]. To

demonstrate the feasibility of the SM computing platform for real-world applications, we have

developed EZCab, an application for locating and booking free cabs in densely crowded traffic

environments, which resulted in a technical report [67].

1.5 Related Work

Recent projects [31, 73, 12, 15, 70] have presented programming models for ubiquitous/pervasive

computing. SP shares some of their goals, but its main design goal is to provide simple abstrac-

tions to program distributed applications for systems embedded in the physical space. These

abstractions decouple the access to network resources from the networking details.

Although geographical routing [48, 49] and content-based naming and routing [13, 39]

have been extensively studied, a simple and intuitive programming model that allows the user


to express the computation in terms of physical location and content (or services) provided by

nodes is still missing. SP offers such a model, and its runtime system takes advantage of these

routing algorithms (especially of those developed for ad hoc networks).

The “database” model [18, 55] for programming sensor networks is a line of research complementary

to SP. For instance, TAG [55] defines an SQL-like language for sensor networks. Both SP

and TAG provide simple programming constructs that shield the programmer from the underly-

ing network. There are two main differences between SP and this work. First, the programmer

has fine-grained control over execution in SP, while TAG depends entirely on the compiler (i.e.,

essentially SP offers an imperative language, while TAG offers a declarative language). Sec-

ond, SP focuses on flexible abstractions that support programming for uncertainty in highly

dynamic networks, while TAG focuses on a set of queries executed efficiently in the network.

Abstract regions [54] and EnviroTrack [82] have a similar goal to SP. While SP provides

a high-level programming model for outdoor distributed computing, they offer high-level pro-

gramming abstractions for sensor networks. Abstract regions select a subset of nodes of interest

based on certain properties and allow programmers to apply various operations (e.g., maximum,

reduction) on these subsets. Unlike abstract regions, SP is able to access individual nodes in

the network in a consistent fashion. Additionally, its implementation based on SMs allows

simple deployment of new applications in existing networks and quick adaptability to network

conditions. EnviroTrack defines a distributed middleware that provides a convenient API for

programmers writing applications that track objects in the physical world. While EnviroTrack

is focused on a specific category of applications, SP provides a general programming model

that can be used for any type of distributed application over networks of embedded systems.

SensorWare [22] can be an alternative solution for the SP runtime system, especially in

networks composed of devices with extremely limited resources (e.g., sensor networks). Sen-

sorWare is similar to SMs in the sense that both SensorWare and SMs are systems based on

code migration. Therefore, both are suitable for re-programming the network. SMs, however,

offer the advantage of programming in a well-known language (Java), which is supported on

many embedded systems today [1]. Also, the tag space abstraction provided by the SM ar-

chitecture and the SM self-routing mechanism simplify the implementation of the SP runtime

system.


The SM platform shares the idea of execution migration with process migration [58, 64, 11],

mobile agents [46, 29], and active networks [25, 59, 77].

Unlike process migration which has been used to increase performance or availability in

stable networks, the main goal of the SM platform is to provide flexible support for program-

ming distributed applications over highly dynamic NES. Additionally, process migration and

SM migration differ in two aspects. First, the SM migration is explicit (i.e., the programmer

decides when and where to migrate), while process migration is implicit (i.e., the system de-

cides when and where to migrate a process). Second, the SM architecture avoids one of the

most difficult problems in process migration: transferring the kernel state (e.g., sockets, file de-

scriptors). The SM platform does not transfer any kernel state because SMs interact with local

hosts through atomic operations performed on the tag space, and they do not explicitly open

communication channels.

SMs are influenced by the design of mobile agents. Similar to a mobile agent, an SM may

be viewed as an application that explicitly migrates between nodes of interest. Mobile agents,

however, name nodes by fixed addresses and commonly know the network configuration a pri-

ori, while SMs name nodes by content and discover the network configuration dynamically.

In contrast to mobile agents, SMs are responsible for their own routing at each node in the

path between two nodes of interest. This feature allows SMs to adapt quickly to changes that

may occur both in the network topology and the availability of resources at nodes. Further-

more, the SM system architecture is suitable for resource constrained devices since it defines a

lightweight system support at nodes, with most of the “intelligence” incorporated into SMs.

Although the SM computing platform (especially the self-routing mechanism) shares some

of the design goals and leverages work done in active networks (AN), it differs from AN in

several key features. A first difference comes from the problems they try to solve: AN target

improved performance for end-to-end data transfer in relatively stable networks, while the SM

platform helps the development of distributed applications on top of a new computing infras-

tructure which is significantly under-used due to the lack of programmability support. Unlike

AN, we define a computing model whereby several SMs can cooperate, exchange data, and

synchronize with each other through the tag space. In terms of migration, AN do not transfer


the execution state from node to node whereas the SM model does. The migration of the execu-

tion state for SMs trades off overhead for flexibility to react “on-the-spot” to adverse network

conditions.

Sensor networks represent the first attempt toward deploying large scale NES. Most of

the research in this area has focused on hardware [42, 71], operating systems [35], or net-

work protocols [39, 34, 17]. Even though sensor networks act primarily as huge distributed

databases [18, 56], more sophisticated applications might be needed in the future. Toward this

end, SensorWare [22] and Mate [51] have proposed solutions for network re-programmability.

The SM architecture takes one step further and proposes a distributed computing model that is

flexible enough to be implemented for nodes with very limited resources such as those encoun-

tered in sensor networks.

Among many projects that target the programmability of ubiquitous computing environments,

one.world [31] is similar to SMs in the sense that both consider migration an essential

mechanism to adapt to highly dynamic computing environments. Each application in one.world

has at least one environment that contains tuples (similar to tags in the SM architecture), the application’s

components, and other nested environments. When needed, a migration moves a

checkpointed copy of an environment to another node. A significant difference between SMs

and one.world is that our work proposes a computing model based on execution migration,

while one.world uses migration just as a mechanism to adapt to changes (i.e., in their programming

model, the applications reside on nodes and communicate through remote event passing).

Another difference is that the SM architecture is more suitable for resource constrained devices

whereas one.world is designed for more powerful nodes.

SMs represent the underlying platform for Spatial Views, a high-level programming model

for networks of embedded systems, targeting their dynamic, space-sensitive and resource-

restrained characteristics. The core of the model is iterative programming over a dynamic

collection of nodes identified by the physical spaces they are in and the services they provide.

Hidden in the iteration is execution migration, as the main collaboration paradigm, constrained

by user specified limits on resource usage such as response time and energy consumption. A

Spatial Views prototype has been implemented and first results are reported in [62]. A Spatial

Views compiler with SMs as its target is currently being implemented.


The tag space bears some similarity with tuple spaces [24, 50]. While both offer persistent

shared memory for applications, the essential difference is that the tag space is local to each

node. Also, unlike tuple spaces, the tag space provides SMs with I/O tags for interaction with

the local OS and I/O subsystem. The concept of I/O tags shares the same goal as Linux

Procfs [7], which allows user-level programs to access certain kernel information.

Content-based naming has been recently presented for both the Internet [13, 83, 32] and

sensor networks [33]. SMs use content-based migration to reach the nodes of interest. This

high-level migration function implements routing algorithms which leverage work done for

mobile ad hoc networks [41, 68, 48].

Although security for both mobile agents [30, 47] and ad hoc networks [69, 36] has

been extensively studied, we have faced a new and more difficult problem: how to define a

security architecture for a system based on execution migration over mobile ad hoc networks?

Given the complexity of this problem, our current architecture provides solutions for protecting

the hosts against SMs and SMs against each other. It is much harder, however, to prevent

an SM from being tampered with by a malicious host. Since SMs have to execute at any host,

end-to-end authentication based on digital signatures or encrypting the entire message are not

possible. Hardware solutions [9, 66] represent an option, but they involve extra costs. Complete

software solutions are not known yet, but code confusion and encryption techniques have been

investigated [27, 76] in the context of mobile agents.

Coupled with security comes the issue of admission control at nodes. A significant amount

of research has been done to solve this problem for real-time systems [78, 74] and active networks

[28, 59]. Given that we did not want to limit the expressiveness of the programming

language (as is done, e.g., in SNAP [59]), our solution is based on user-provided lower bounds for resources

and non-preemptive execution. Each node has the flexibility to implement its own schedul-

ing and resource allocation policies which are typically integrated. These policies guarantee

enough resources to satisfy the lower bounds and let the SM migrate in case no more resources

are allocated. A problem that remains to be solved is how to protect the network, as a whole,

against malicious SMs that waste network resources, but respect the admission contract at each

node. TTL-based [25] or market-based [30] schemes offer possible solutions.


1.6 Contributors to Dissertation

Porlin Kang has contributed significantly to the current implementation of the Smart Messages

prototype [44]. Deepa Iyer has implemented a preliminary Smart Message prototype [21].

Phillip Stanley-Marbell has participated in the initial design of Smart Messages [79]. Akhilesh

Saxena has implemented the routing algorithm based on Bloom filters, which has been used

together with other routing algorithms to demonstrate the Smart Messages self-routing mech-

anism [20]. Gang Xu has implemented the protection domains and the corresponding API

for the Smart Messages security architecture [86]. Peng Zhou has implemented a flooding-based

routing algorithm and the GUI for the EZCab application [67]. The following is a list of all

my colleagues who co-authored papers from which I used material in this dissertation: Porlin

Kang, Chalermek Intanagonwiwat, Deepa Iyer, Akhilesh Saxena, Gang Xu, Peng Zhou, Phillip

Stanley-Marbell, Kiran Nagaraja, Andrzej Kochut, and Tamer Nadeem.

1.7 Dissertation Roadmap

This dissertation is organized as follows. Chapter 2 describes Spatial Programming, a location-aware

programming model for outdoor distributed computing. In Chapter 3, we present

Smart Messages, a system architecture based on execution migration which provides system

support for Spatial Programming. The Smart Messages self-routing mechanism, which al-

lows applications to dynamically change the routing algorithm, is discussed in Chapter 4. The

prototype implementation and evaluation for Smart Messages and Spatial Programming are

presented in Chapter 5. The dissertation concludes with Chapter 6.


Chapter 2

Spatial Programming

This chapter presents the design of Spatial Programming (SP), a location-aware programming

model for outdoor distributed computing. We start with the motivation for a high-level programming

model that can shield the programmers from the complex networking aspects encountered

in NES, thus allowing them to focus on the algorithmic details of the applications.

SP is introduced through an analogy with conventional programming and distributed program-

ming using shared virtual memory. After presenting a short SP overview, we describe the

main SP concepts, including spatial references, space regions, reference consistency, and ac-

cess timeout. The chapter concludes with an SP application for distributed object tracking that

illustrates all these concepts.

2.1 Motivation

Massive networks of embedded systems (NES) will become common in the near future as the

trend of embedding “intelligence” everywhere in the physical world increases. These networks

can be programmed to execute a large variety of distributed applications. Traditionally, the

main focus of distributed computing has been on performance or availability. Instead, the focus

of distributed computing over NES will be on enabling the systems embedded in the physi-

cal world to perform collaborative tasks. This type of distributed computing is more difficult

than traditional distributed computing because the state of the network evolves continuously

over time. Therefore, it is practically impossible to know the network topology or the node

properties at any moment in time.

To motivate the need for a novel programming model for outdoor distributed computing, let

us consider a collaborative object tracking application as illustrated in Figure 2.1. For this application,

two types of nodes are assumed available across a given geographical region: motion

sensors and intelligent cameras. Each node is capable of determining its location (e.g., using

GPS [45] or other localization methods [71, 63]). The motion sensors remain static after deployment,

but the cameras can be mobile (e.g., carried by mobile robots [43]). The nodes may

fail or be deployed far from each other, which prevents them from participating in the computation.

Since motion sensors are less expensive, their number is significantly greater than the number

of cameras.

Figure 2.1: How to Program Motion Sensors and Intelligent Cameras Deployed over Two Hills to Perform Distributed Object Tracking?

A potentially mobile user can start an application (e.g., from a wireless-enabled PDA) that

performs object tracking across a given geographical region. This application checks the status

of motion sensors in the desired region. Each time motion is detected, the application turns on a

certain number of cameras located in the proximity of that sensor and instructs them to perform

collaborative object tracking in order to identify the object that triggered the motion sensor.

During this process, the application repeatedly accesses the selected cameras and uses the partial

results computed at each node to dynamically determine the next action. Once the object

tracking completes, the active cameras are turned off. This application emphasizes the main

question that any programming model for outdoor distributed computing has to answer: how

to program an unknown number of volatile embedded systems (i.e., mobile or even disposable)

to execute a user-defined application in a certain geographical area?

The task stated above is difficult and tedious to program using the traditional message pass-

ing programming model. Characteristics of message passing systems (e.g., Message Passing

Interface (MPI) standard [8]) include explicit management of communication with possible

deadlocks due to mismatched communication pairs and “all or nothing” semantics. The pro-

grammers would also have to take care of all the details involved in reaching the area of interest


and contacting the target nodes located there. This is not a trivial task in a volatile network with

unknown configurations. In our example, the programmer does not know how many camera

nodes there are, or where exactly they are located. Additionally, the network dynamics (caused

by failures, mobility, or deployment of new nodes) may cause the application to fail since fixed

addressing schemes treat exceptions as failures.

To simplify the development of distributed applications in NES as well as to allow for rapid

prototyping, we need a programming model that shields the programmers from most of the

networking aspects. A simple way to present the programmers with high level abstractions

for writing distributed applications is to use a declarative programming style. Declarative pro-

gramming, used mostly for querying databases, is goal-oriented in the sense that programmers

simply specify what they want instead of how to algorithmically obtain the results. Multiple so-

lutions for programming sensor networks illustrate this programming style in NES [18, 55, 56].

The “database” model of programmability suits sensor networks well because these networks

act primarily as large “distributed databases” for the environments where they are deployed.

Despite its simplicity, declarative programming is not a panacea for every type of task

or NES. Imperative programming is more appropriate for complex tasks that go beyond data

collection, especially tasks whereby algorithmic details matter. Also, networks composed of

more powerful nodes (e.g., systems in cars, cell phones, intelligent cameras, mobile robots [6,

43]) cannot be programmed in a simple and effective way without having fine-grained control

over individual network resources. To summarize, a programming model for NES needs to

answer the following questions:

• How to write simple and intuitive programs for NES?

• How to refer to nodes in a network-transparent way?

• How to use the location of the nodes in computation?

• How to discover and repeatedly access resources at nodes?

• How to cope with network dynamics?

Figure 2.2: Analogy Between Spatial Programming and Two Traditional Programming Models

[Panels: Conventional Computer System (application, variable access, virtual address space, page table, physical memory); Shared Virtual Memory (distributed application, variable access, shared virtual address space, page table and message passing, physical memories); Spatial Programming (outdoor distributed application, spatial reference, space region, Spatial Programming runtime, systems embedded in physical space).]

2.2 Overview

Spatial Programming (SP) is a location-aware programming model designed to satisfy these

requirements. The main idea of SP is to offer network-transparent, fine-grained access to data

and services distributed on systems embedded in the physical space. In SP, a network of physi-

cally distributed systems is viewed as a single virtual address space, and its individual resources

at nodes can be accessed by applications like normal variables. SP hides the distributed nature

of the underlying infrastructure. An application written under the SP model is a sequential

program that can transparently read and write network resources as if they were local variables

declared in this program. Similar to the mappings from virtual to physical memory in a con-

ventional computer system, a runtime system maintains mappings between spatial references

and nodes in the physical space. SP applications can cooperate indirectly through shared net-

work resources. The SP model allows for a large spectrum of outdoor distributed applications,

ranging from computing the average/maximum temperature over a given geographical region

to collaborative applications such as distributed object tracking or coordinating military forces

on the battlefield. Typical applications for SP are those which execute a distributed algorithm

over a set of nodes selected based on their location and properties.

Given the scale of NES and the fact that most of the nodes work unattended, it is practi-

cally impossible to re-program each node individually for every new application. Therefore,

the SP implementation (described in Chapter 5) moves the application, sequentially, to each

node whose properties or content have to be accessed by the application. Thus, although trans-

parent to the application programmer, the actual execution takes place on the nodes hosting the


resources being accessed.

The high level view of the network as a single virtual address space is similar to the one

presented by shared virtual memory systems [52] (i.e., shared virtual memory shields the pro-

grammers from message passing communication, while offering a shared virtual address space

for distributed applications). A major difference, however, is that shared virtual memory is per-

formed over a stable and robust network, with an acceptable upper bound for memory access

time, while SP must tolerate dynamic network configurations, with unknown time bounds for

accessing systems embedded in the physical space. Figure 2.2 illustrates this analogy and the

simple abstractions defined by SP to support outdoor distributed programming: space regions

and spatial references.

2.3 Space Regions

Unlike traditional distributed systems where the physical location of the nodes does not mat-

ter, the spatial distribution of nodes across physical space is a key feature of massive NES.

These networks will span buildings, large facilities such as campuses or airports, or even roads

and forests. Most envisioned distributed applications for NES will exhibit a location-aware

behavior. In order to achieve their prescribed objectives, they will need to run within certain

geographical regions. For instance, the motivating application described at the beginning of

the chapter may want to activate intelligent cameras within a physical range of the trigger node

(the sensor that detected motion) since otherwise no causal relation can be established.

SP considers location a first order programming concept and exposes it to applications

through space regions. A space region is a virtual representation of a given physical space.

SP applications may use statically defined spaces or dynamically create new spaces. Static

definitions are used to describe physical spaces that do not change over time and are commonly

provided in the form of names associated with geographical regions (e.g., using topological

maps). In Figure 2.3, Hill1 and Hill2 are defined as two circular regions in a two-dimensional

space. For clarity of exposition, we will describe the creation of dynamic space regions in

a subsequent section after the introduction of spatial references.
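As a rough illustration of a statically defined space region, the sketch below models a circular region in a two-dimensional plane and tests whether a position falls inside it. The `SpaceRegion` interface, the `CircularRegion` class, and the choice of planar coordinates are assumptions made for this example; they are not part of the SP definition.

```java
/** Hypothetical abstraction of a space region: a predicate over 2-D positions. */
interface SpaceRegion {
    boolean contains(double x, double y);
}

/** Hypothetical statically defined region: a named circle in a 2-D plane. */
final class CircularRegion implements SpaceRegion {
    private final String name;
    private final double centerX;
    private final double centerY;
    private final double radius;

    CircularRegion(String name, double centerX, double centerY, double radius) {
        if (radius < 0) {
            throw new IllegalArgumentException("radius must be non-negative");
        }
        this.name = name;
        this.centerX = centerX;
        this.centerY = centerY;
        this.radius = radius;
    }

    String name() {
        return name;
    }

    /** A position is inside if its distance from the center is at most the radius. */
    @Override
    public boolean contains(double x, double y) {
        double dx = x - centerX;
        double dy = y - centerY;
        // Compare squared distances to avoid an unnecessary square root.
        return dx * dx + dy * dy <= radius * radius;
    }
}
```

A region such as Hill1 could then be declared once, for instance as `new CircularRegion("Hill1", 0, 0, 100)`, and reused by name wherever the application needs it.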

Figure 2.3: Example of Spatial References for Object Tracking in a Network Consisting of Motion Sensors and Intelligent Cameras Deployed over Two Hills

[Spatial references shown in the figure: {Hill1:motion[0]}, {Hill1:camera[0]}, {Hill1:camera[1]}, {Hill1:camera[2]}, {Hill2:motion[0]}, {Hill2:camera[0]}.]

2.4 Spatial References

A spatial reference is defined as a {space:tag} pair which is mapped to a system embedded in

the physical space. The space is a space region that represents the geographical scope of this

system. The tag is the name of a property or service provided by the same system. Tags are

not globally unique because they name properties or services that can be provided by multiple

systems. Spatial references, like variables, are defined within applications; hence, a spatial

reference has meaning only within the application that defined it.

Spatial references provide applications with a virtual resource naming in the network. Ap-

plications access network resources using spatial references in the same way they access phys-

ical memory through variables in conventional systems (or in shared virtual memory systems).

Given that programmers have only limited knowledge about such dynamic networks (i.e., a

programmer does not know how many resources are in a given space, what types they are,

or even if they exist at all), spatial references offer a convenient method to refer to network

resources using their expected locations and properties.

Figure 2.3 presents examples of spatial references. To differentiate among systems with

the same space-tag pair referenced in the same application, programmers can use indexes to

refer to distinct systems. Thus, a spatial reference becomes a triplet {space:tag[index]}. SP

guarantees that spatial references with distinct indexes (but the same space-tag pair) map to

different systems. The figure shows how a programmer can use three distinct indexes to refer

to distinct cameras on Hill1.


1 Image[] getImages(Location location, int n){
2   Image[] image = new Image[n];
3   for(int i=0; i<n; i++){
4     {Hill1:camera[i]}.active = ON;
5     {Hill1:camera[i]}.focus = location;
6     image[i] = {Hill1:camera[i]}.image;
7   }
8   return image;
9 }

Figure 2.4: Example of Program using Spatial References

An SP application can name and access multiple network resources provided by a node

using just one spatial reference. The construct {space:tag[index]}.resource refers to a certain

resource located on the system referenced by {space:tag[index]}. In Figure 2.3,

{Hill1:camera[0]}.active may denote the status of the camera, while {Hill1:camera[0]}.location

may represent the location of this system in space. To better illustrate the use of these concepts

in applications, Figure 2.4 shows a code fragment, where a program activates three cameras on

Hill1, focuses them toward a certain location, and collects the images taken by these cameras.

This example demonstrates the simplicity of SP, where applications can write (lines 4-5) or read

(line 6) resources at nodes using spatial references in a similar fashion to the way they use

variables. Spatial references relieve programmers from the burden of having to cope with all

the networking details of reaching the nodes of interest and accessing data or services on those

nodes. This is possible since applications are built on top of an SP runtime system which takes

care of name resolution, communication, and access to resources. The SP runtime system also

guarantees that each index maps to a different system in the same space region and ensures

reference consistency.

2.5 Reference Consistency

Conventional computer systems maintain reference consistency for variables. The operating

system uses per-application page tables to guarantee that each time an allocated variable is

used, it accesses the same physical memory location. Similarly, SP guarantees that each time an

application uses a certain spatial reference, it accesses the same system as long as this system

remains in its original space region. This property provides the ability to perform arbitrary

distributed computations over a subset of nodes selected based on their location and properties.

Figure 2.5: Reference Consistency Example: A Spatial Reference is Mapped to the Same System as long as this System Remains in the Same Space Region

The SP runtime system maintains mappings between spatial references and the nodes they

refer to. These mappings are maintained in a per-application mapping table and are persistent

during the SP program execution. At the time of the first access, a spatial reference is mapped to

a node located in the desired space region which provides the required property. Each mapping

table entry contains the location of the referenced node and a unique per-application network

address for this node. The location is used for faster subsequent accesses to this node. The

network address is assigned by the application (i.e., it has no global meaning) and is used to

confirm the identity of the node for subsequent accesses (a referenced node may move from

its recorded location, and another node may take its place). This address can also be used to

locate, in the same space, referenced nodes that moved from their recorded locations. Figure 2.5

shows how reference consistency works for SP applications. Once a spatial reference has been

mapped to a camera node, it can be used repeatedly by its application to access the same camera

even when this camera moves. To be semantically acceptable, the node has to remain in the

space region it was in at the time of the first access (i.e., when the mapping was created).
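The behavior described above can be sketched in Java as a small per-application mapping table. This is only an illustration under stated assumptions: the `Node` and `MappingTable` classes, the integer addresses, and the list of reachable nodes passed to `resolve` are hypothetical, and the sketch stores the node object itself rather than a separate location and address. It shows two properties from the text: a reference is bound on first access and keeps returning the same node while that node stays in its region, and distinct indexes never share a node.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical snapshot of a node as seen by the runtime. */
final class Node {
    final int address;   // per-application address, assumed to be an int here
    final String tag;    // property or service the node provides
    String space;        // name of the region the node is currently in (may change)

    Node(int address, String space, String tag) {
        this.address = address;
        this.space = space;
        this.tag = tag;
    }
}

/** Hypothetical per-application mapping table from spatial references to nodes. */
final class MappingTable {
    private final Map<String, Node> entries = new HashMap<>();
    private final Set<Integer> boundAddresses = new HashSet<>();

    /** Returns the node for {space:tag[index]}, binding it on the first access. */
    Optional<Node> resolve(String space, String tag, int index, List<Node> reachable) {
        String key = space + ":" + tag + "[" + index + "]";
        Node bound = entries.get(key);
        if (bound != null) {
            // Reference consistency: same node, but only while it stays in its region.
            return bound.space.equals(space) ? Optional.of(bound) : Optional.empty();
        }
        for (Node candidate : reachable) {
            boolean matches = candidate.space.equals(space) && candidate.tag.equals(tag);
            // add() returns false if the node already backs another reference.
            if (matches && boundAddresses.add(candidate.address)) {
                entries.put(key, candidate);
                return Optional.of(candidate);
            }
        }
        return Optional.empty(); // a real runtime would report a timeout here
    }
}
```

An empty result stands in for the failed access that the application would observe; how that failure is reported is outside the scope of this sketch.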

In some situations, reference consistency is not necessary. For instance, an application that

needs to periodically contact a number of temperature sensors located in a certain region and

compute the average temperature may accept any sensor that provides the desired space-tag

pair. In such a case, if a referenced node cannot be found in its space region, the runtime

system should transparently remap the spatial reference to a similar node rather than returning

an exception for a failed access. To implement this feature, SP allows an application to specify

a remap flag for spatial references.

Figure 2.6: Space Casting: The Same System is Referenced in Different Space Regions

[The figure shows a camera moving from Hill1 to Hill2; it is referenced as {Hill1:camera[0]} before the motion and as {Hill2:(Hill1:camera[0])} after the motion.]

2.6 Space Casting

The SP runtime system locates the same node each time an application uses the same spatial

reference, provided that the node is still in its space region. If a node moves out of its space re-

gion, it becomes semantically unacceptable. Thus, the application receives a timeout exception

(the system could not find the node during the timeout interval). However, if the programmer

still wants to access this node and has knowledge about the node’s mobility patterns, the space

region for the spatial reference mapped to this node can be modified using space casting. The

construct {space2:(space1:tag[index])} changes the geographical scope of the spatial reference

from space1 to space2. Figure 2.6 shows how space casting is used to reach a camera

carried by a mobile robot which has moved from Hill1 to Hill2. If the new space for a node

is unknown, a programmer can use the Anywhere space constant to cast a spatial reference to

any space. Note that in such a case the timeout ensures that the attempted access will not take

forever.
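
As a rough illustration of space casting, the Python sketch below keeps the identity of the mapped node and changes only the space region in which it is looked up. The SpatialRef record, the cast and access helpers, and the encoding of Anywhere as a circle of infinite radius are hypothetical; in SP, casting is a language construct rather than a function call.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Circle:
    cx: float
    cy: float
    r: float

    def contains(self, pos):
        return math.hypot(pos[0] - self.cx, pos[1] - self.cy) <= self.r


ANYWHERE = Circle(0.0, 0.0, math.inf)  # assumed encoding of the Anywhere constant


@dataclass(frozen=True)
class SpatialRef:
    space: Circle
    tag: str
    index: int
    node_id: str  # identity recorded when the reference was first mapped


def cast(ref, new_space):
    """Model {space2:(space1:tag[index])}: same node, new geographical scope."""
    return replace(ref, space=new_space)


def access(ref, positions):
    """Return the node's position, or raise if it is outside the reference's space."""
    pos = positions.get(ref.node_id)
    if pos is None or not ref.space.contains(pos):
        raise TimeoutError(f"{ref.tag}[{ref.index}] not found in its space region")
    return pos
```

Here positions is an assumed lookup table from node identity to current coordinates; a camera that has moved from Hill1 to Hill2 is unreachable through the original reference but reachable through cast(ref, hill2) or cast(ref, ANYWHERE).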

2.7 Spatial Reference Access Timeout

Unlike traditional computer systems where the access time to memory is finite and an upper

bound for this time can be computed (i.e., by adding the miss penalties in the memory hier-

archy), in a volatile and dynamic NES, it is difficult to estimate how long it takes to access a

network resource. This problem happens both for new references (no more available systems

with the required space-tag pair) and for mapped references (they may become invalid because

the referenced node can move from its space or simply cease to exist).

1  Image[] getImages(Location location, int n, int timeout){
2    Image[] image = new Image[n];
3    try{
4      for(int i=0; i<n; i++){
5        {Hill1:camera[i], timeout}.active = ON;
6        {Hill1:camera[i], timeout}.focus = location;
7        image[i] = {Hill1:camera[i], timeout}.image;
8      }
9    }catch(TimeoutException e){
10     if (i < n/2)
11       return null; // abort if less than half cameras
12     // otherwise continue with a lower quality result
13   }
14   return image;
15 }

Figure 2.7: Code Example for Spatial Reference Access Timeout

SP requires application programmers to reason about the possibility of not reaching a node

by imposing a timeout on each spatial reference (i.e., the format of a spatial reference becomes

{space:tag[index], timeout}). This timeout allows a programmer to limit the access time to

a network resource which, given the volatility of the network, may take forever. Essentially,

SP defines a “best effort” semantics that allows an application to make progress and get a

semantically acceptable result even in adverse network conditions. If a node cannot be reached

in the specified time interval, the SP runtime throws a timeout exception; once the application

catches this exception, it can decide about further actions.

Figure 2.7 shows the same code presented before in Figure 2.4, except for the added timeout

at each spatial reference. If the access to one of the spatial references times out, the application

catches a Timeout exception (line 9). In this example, the application goes ahead with a partial

result if images from at least half of the required number of cameras have been acquired already

(lines 10-12). Otherwise, it aborts the computation and returns null.

Commonly, the programmer sets each timeout based on a constraint imposed by the user on

the total execution time (e.g., the total time is divided equally among all accesses, or each new

access can have the entire remaining time). If no such constraint is imposed, the SP runtime

system considers a “default” value for the timeout.

[Figure 2.8 shows a motion sensor on Hill1, {Hill1:motion[0]}, and a camera within Range of it,

referenced as {rangeOf({Hill1:motion[0]}, Range):camera[0]}.]

Figure 2.8: Dynamic Definition of a Relative Space Region
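
The two budgeting policies mentioned above (equal division of the total time, or the entire remaining time for each new access) can be sketched in Python as follows. The helper names and the injectable clock are assumptions for illustration; SP itself only requires that every spatial reference carries a timeout.

```python
import time


def equal_share(total, accesses):
    """Divide the total execution time equally among all accesses."""
    return [total / accesses] * accesses


class RemainingBudget:
    """Give each new access whatever is left of the total execution time."""

    def __init__(self, total, clock=time.monotonic):
        self._clock = clock
        self._deadline = clock() + total

    def next_timeout(self):
        return max(0.0, self._deadline - self._clock())
```

The second policy lets a slow early access consume time that later accesses would otherwise have had, so it suits applications in which the first nodes matter most.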

2.8 Defining New Space Regions

Besides statically defined space regions, SP also supports dynamically defined space regions.

Composed space regions can be defined using the union or intersection operators (i.e., these

space regions are also defined as circles that circumscribe the actual physical space). If we

consider the hills from our examples throughout this section, a spatial reference {(Hill1 +

Hill2):camera[0]} returns a camera node located on either Hill1 or Hill2.

Defining relative space regions based on the position of a referenced node offers two benefits

for applications: access to systems located in dynamically defined space regions, and the

possibility to “remember” a space region where a certain event took place, even after the node

that produced (or detected) this event is no longer there.

The rangeOf operator defines a space region in the proximity of a node referenced by a

spatial reference. Figure 2.8 shows how such a relative space is dynamically defined and used

to refer to a camera node located in the proximity of a motion sensor. Similar to rangeOf, SP

defines the northOf, southOf, eastOf, and westOf operators. They create space regions relative

to the position of a referenced node and the respective cardinal direction (the center of the

circular region is located toward that cardinal direction at a given distance from the position of

the referenced node).
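
A possible geometric reading of these operators is sketched below in Python. It assumes planar coordinates with y growing northward, models the union of two regions as the smallest circle circumscribing both, and gives north_of an explicit radius parameter; none of these details are specified in the text, and the intersection operator is omitted.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    cx: float
    cy: float
    r: float


def union(a, b):
    """Smallest circle that circumscribes both a and b."""
    d = math.hypot(b.cx - a.cx, b.cy - a.cy)
    if d + b.r <= a.r:
        return a  # b lies inside a
    if d + a.r <= b.r:
        return b  # a lies inside b
    r = (d + a.r + b.r) / 2
    t = (r - a.r) / d  # fraction of the way from a's center to b's center
    return Circle(a.cx + (b.cx - a.cx) * t, a.cy + (b.cy - a.cy) * t, r)


def range_of(pos, radius):
    """Region in the proximity of a referenced node located at pos."""
    return Circle(pos[0], pos[1], radius)


def north_of(pos, distance, radius):
    """Region centered `distance` north of pos (assumes y grows northward)."""
    return Circle(pos[0], pos[1] + distance, radius)
```

Because range_of captures a position rather than a node, the resulting region stays valid after the node that triggered its creation has moved away, which matches the "remember" benefit described above.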

2.9 Creating/Removing Network Resources

In addition to accessing resources that already exist at nodes, SP programs can also dynami-

cally create/remove their own resources. For instance, an application may need to create new

resources in order to store data in the network (i.e., similar to creating files in a file system).

The primitives that offer this functionality are:

create({space:tag[index], timeout}.resource)

remove({space:tag[index], timeout}.resource)

Currently, SP provides just a limited resource sharing policy: the resources provided by

nodes are shared, and the resources created by applications are private.

2.10 Putting It All Together: Program Example

We conclude this chapter by presenting the code (Figure 2.9) for the object tracking application

used throughout the chapter. This application emphasizes the novel concepts introduced by

SP, as well as the simplicity of programming under this model. Additionally, it represents the

class of applications that can benefit most from the SP model: applications that execute a

distributed algorithm over a set of nodes selected based on their content and spatial properties,

and that repeatedly access the nodes in this set during the computation.

The application checks the status of Ns motion sensors on Hill1 (lines 1-3). Once the motion

is detected at one of the monitored sensors, a relative space, motionSpace, is created around

that sensor in order to perform object tracking within its proximity (line 4). Any node located

in motionSpace that is not active (i.e., not working for other applications) is turned on, focused

to the location of motion, and added to the set of active cameras until the desired number of

Nc active cameras has been reached (lines 5-11). If a timeout exception is raised during this

computation, the application has to decide what to do next. In our example, the application

accepts a possibly lower quality of result and goes ahead if at least half of the desired number

of cameras is found. Otherwise, it restarts monitoring the motion sensors (lines 13-17). During

the object tracking (line 18), the cameras may be accessed multiple times due to the reference

consistency feature of SP. The actions taken at a camera node depend on the partial results

computed at previously visited nodes. If a camera moves out of motionSpace during the object

tracking, the application may just ignore it or use space casting to re-discover it (considering

the execution time and typical motion speeds, the camera should be in the proximity of

motionSpace). The application ends by turning off the set of active cameras (lines 19-20). This

operation is also enabled by the reference consistency property of SP.

1  for(i=0; i<Ns; i++){ // loop over Ns motion sensors
2    try{
3      if ({Hill1:motion[i], timeout}.detect == true){
4        motionSpace = rangeOf({Hill1:motion[i], timeout}, Range);
5        location = {Hill1:motion[i], timeout}.location;
6        for(j=0, k=0; j<Nc; k++) // build the set of Nc cameras
7          if ({motionSpace:camera[k], timeout}.active == OFF){
8            {motionSpace:camera[k], timeout}.active = ON;
9            {motionSpace:camera[k], timeout}.focus = location;
10           activeCameras[j++] = {motionSpace:camera[k], timeout};
11         }
12     }
13   }catch(TimeoutException e){
14     if (j < Nc/2)
15       continue; // continue monitoring the motion sensors
16     // otherwise, do object tracking with lower quality of result
17   }
18   result=objectTracking(activeCameras);
19   for(k=0; k<j; k++)
20     activeCameras[k].active = OFF;
21   return result;
22 }

Figure 2.9: Spatial Programming Application for Object Tracking

2.11 Summary

In this chapter, we have presented the design of Spatial Programming (SP), a location-aware

programming model for outdoor distributed computing. SP offers fine-grained, network-transparent

access to systems embedded in the physical space. Central to SP is the concept of spatial refer-

ence, which defines a virtual name space over NES using the expected locations and properties

of these systems. Programmers use spatial references to access the content or services pro-

vided by nodes in the network in the same way they use variables in a conventional program.

The main benefits of SP are the flexibility and simplicity to program user-defined distributed

applications in highly volatile outdoor computing environments.

Chapter 3

Smart Messages

This chapter describes the Smart Messages (SMs) distributed computing platform for networks

of embedded systems, which can be used to program any user-defined distributed application.

To simplify the application development, we have used the SM platform to implement Spatial

Programming. In this chapter, we describe the SM system architecture, based on execution

migration, content-based naming, and self-routing. Additionally, we present the node archi-

tecture (i.e., nodes in the network cooperate by providing a common system support) and the

security architecture for SMs. After describing the SM API, we demonstrate the features of the

SM platform by implementing and evaluating two previously proposed applications for sen-

sor networks (SPIN [34] and Directed Diffusion [39]). For evaluation, we have developed an

event-driven simulator, extended with support for SM execution. The chapter concludes with

simulation results for these two applications.

3.1 Smart Messages Architecture

Smart Messages (SM) define a distributed computing platform for NES based on execution

migration. Instead of transferring data (i.e., data migration) among nodes involved in the com-

putation, applications developed over the SM platform transfer the execution to each of these

nodes. Figure 3.1 illustrates the difference between the traditional distributed computing using

data migration and distributed computing with Smart Messages.

Let us assume that a user needs to contact three nodes that provide certain services. In the

data migration approach, the user application gets the addresses of the nodes, and then it sends

requests and waits for answers. This approach works well in relatively stable networks such as

the Internet. On the other hand, in more volatile networks such as NES, the user application may

take an indefinite amount of time to complete with this approach. For instance, let us consider

that these responses are time-sensitive. If the third one does not arrive (e.g., congestion, broken

routes, failed service node), the application can only wait (for an indefinite amount of time) or

re-issue all the requests.

[Figure 3.1 contrasts two ways for a user node to work with Node1, Node2, and Node3 across the network.

Data migration (steps 1-6, alternating Request Data and Receive Data messages):

for(i=0; i<3; i++){
  send(request, address[i]);
  receive(data, address[i]);
  // do computation
}

Execution migration (steps 1-4, each one an Execution Migration):

for(i=0; i<3; i++){
  migrate(property[i]);
  read(data);
  // do computation
}]

Figure 3.1: Traditional Distributed Applications vs. Smart Messages Applications

Using execution migration, however, the application can adapt dynamically to changing

network conditions. In the SM architecture, the application discovers the nodes of interest

sequentially and executes on each of them. Thus, the application can make incremental progress

and eventually complete even in highly volatile networks. Additionally, since the nodes are

named by properties, the application can discover similar nodes even if its initial targets become

unavailable.

Distributed applications built on top of the SM architecture are collections of SMs. An

SM is a user-defined application whose execution is moved sequentially over a series of nodes

using execution migration. The nodes on which SM applications execute, called “nodes of in-

terest”, are named by properties, discovered using application-controlled routing, and switched

when the SM application calls for execution migration. The payload of an SM consists of data

“bricks”, explicitly identified in the application, and execution control state. Code “bricks”

may also be transferred if the code is not cached at destination. An SM can carry multiple data

and code bricks, and it can use them to create new SMs during its execution. In this way, an

application can eventually generate multiple SMs although it has started as a single SM.

[Figure 3.2 shows three nodes, each with a virtual machine, a tag space, and a code cache: Node 1

(node of interest), Node 2 (intermediate node), and Node 3 (node of interest). The SM migrates from Node 1

to Node 2 and from Node 2 to Node 3; application code runs on Node 1 and Node 3, routing code on Node 2.]

Figure 3.2: Distributed Computing Using Smart Messages

The SM computing platform assumes a decentralized architecture, where nodes in the network

act as peers. SMs do not make any assumptions about the underlying network configuration,

except for a minimal system support provided by nodes: a virtual machine, a name-based

memory, called tag space, and a code cache. The virtual machine offers a hardware abstraction

layer for SM execution, which shields SMs from heterogeneous node configurations. The

tag space offers a name-based memory, persistent across SM executions. It consists of (name,

data) pairs, called tags, which are used for data exchange among SMs. Special I/O tags are

used as interface to the host OS and I/O system. Tags serve also to name the destination of

SM migrations and store routing information (routing tags). The code cache stores frequently

accessed code bricks in order to amortize the cost of transferring code over time. Figure 3.2

depicts the execution of an SM over three nodes. The SM application code starts on Node1 and

finishes on Node3. The SM reaches Node3 by explicitly migrating from node to node. Node2

is used as an intermediate hop, where only the SM routing code executes. Note that an SM

executes and potentially carries both application and routing code.
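
To make the role of the code cache concrete, the Python sketch below transfers only the code bricks that the destination does not already hold. Identifying a brick by the SHA-256 hash of its code, and the CodeCache and transfer_code names, are assumptions for illustration rather than the SM implementation.

```python
import hashlib


def brick_id(code):
    """Assumed brick identifier: a hash of the code itself."""
    return hashlib.sha256(code).hexdigest()


class CodeCache:
    def __init__(self):
        self._bricks = {}  # brick id -> code bytes

    def missing(self, ids):
        return [i for i in ids if i not in self._bricks]

    def store(self, code):
        self._bricks[brick_id(code)] = code


def transfer_code(bricks, cache):
    """Send only bricks the destination lacks; return the number of bytes sent."""
    wanted = set(cache.missing([brick_id(b) for b in bricks]))
    sent = 0
    for brick in bricks:
        if brick_id(brick) in wanted:
            cache.store(brick)
            sent += len(brick)
    return sent
```

The first SM of an application pays for the full code transfer; later SMs that reuse the same bricks at that node send only data bricks and execution state, which is how the transfer cost is amortized over time.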

To illustrate how NES are programmed using SMs, we present a very simple example con-

sisting of an SM that books cabs in a densely populated city. Let us consider a group of people

attending a conference, who want to return to the conference venue after an “off-site” lunch.

Instead of calling a cab company or waiting on the street for a free cab, one of them uses her

handheld device to inject an SM in the network to book a certain number of free cabs. Each cab

provides support for SM execution and is identified by a FreeCab tag name. The code for this

application is shown in Figure 3.3, and the SM execution path through the network is depicted

in Figure 3.4. The SM migrates to free cabs, changes their status from free to occupied (by

removing the FreeCab tag), and instructs them to come to the client’s location (by writing to

the Location tag). The SM completes after booking the desired number of cabs.

int numCabs, i; // stored in data brick
Location loc;   // stored in data brick
for(i=0; i<numCabs; i++){
  migrate("FreeCab");
  deleteTag("FreeCab");
  writeTag("Location", loc);
}

Figure 3.3: Smart Message Code Example

[Figure 3.4 shows the Smart Message (data brick, application code, routing code) travelling from the

client through an occupied cab to a free cab, then through another occupied cab to a second free cab.

Every hop is a sys_migrate call; the two migrate("FreeCab") calls return only on the free cabs, and the

value of i carried in the data brick grows from 0 to 2 along the path.]

Figure 3.4: Execution Path for the Above Smart Message

The key operation in the SM programming model is multi-hop, content-based migration,

which implements routing using tags. An SM names the nodes of interest by tag names (which

represent properties or content of that node), and then calls a high-level migrate function to

route itself to a node that has the desired tags. In our example, migrate(“FreeCab”) routes

the SM to free cabs using the occupied cabs as intermediate nodes. This high-level function

uses the low-level sys_migrate primitive, provided by the SM system software, for one-hop

migration. After a migration, the SM resumes from the next instruction following the migrate

call. It is important to notice that migration is explicit (i.e., the programmer calls migrate when

needed).
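
The layering of a high-level migrate over one-hop sys_migrate can be imitated with the toy Python simulation below, which replays the cab-booking example. It is only a sketch: the Network and SmartMessage classes are hypothetical, and the breadth-first search over a globally known topology is a stand-in for real SM routing, which relies on routing information stored in tags at the nodes.

```python
from collections import deque


class Network:
    def __init__(self, links, tags):
        self.links = links  # node name -> list of neighbor names
        self.tags = tags    # node name -> {tag name: data}


class SmartMessage:
    def __init__(self, net, start, data):
        self.net, self.here, self.data = net, start, data  # data models the data brick
        self.hops = []

    def sys_migrate(self, neighbor):
        """One-hop migration to a direct neighbor."""
        if neighbor not in self.net.links[self.here]:
            raise RuntimeError(f"{neighbor} is not a neighbor of {self.here}")
        self.here = neighbor
        self.hops.append(neighbor)

    def migrate(self, tag):
        """Multi-hop, content-based migration; returns only on a node holding `tag`."""
        parent = {self.here: None}
        queue = deque([self.here])
        while queue:
            node = queue.popleft()
            if tag in self.net.tags[node]:
                path = []
                while parent[node] is not None:
                    path.append(node)
                    node = parent[node]
                for hop in reversed(path):
                    self.sys_migrate(hop)  # intermediate nodes only run routing code
                return
            for nxt in self.net.links[node]:
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        raise LookupError(tag)


def book_cabs(sm):
    """Python rendering of the cab-booking loop of Figure 3.3."""
    while sm.data["i"] < sm.data["numCabs"]:
        sm.migrate("FreeCab")
        del sm.net.tags[sm.here]["FreeCab"]                # deleteTag("FreeCab")
        sm.net.tags[sm.here]["Location"] = sm.data["loc"]  # writeTag("Location", loc)
        sm.data["i"] += 1
```

The application loop in book_cabs never mentions intermediate nodes: they appear only in sm.hops, which mirrors the way migrate hides hop-by-hop routing from the application programmer.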

Figure 3.4 emphasizes two major characteristics of the SM architecture. First, the high-

level, content-based migration shields the application programmer from the routing details.

Although the routing code is executed at each node as the SM migrates hop-by-hop through the

network, migrate returns the control to the application only on nodes of interest (i.e., free cabs).

Second, the data transferred during a migration is specified by the programmer as data bricks;

the variables numCabs, i, and loc are stored in a data brick and carried from node to node during

migrations (the figure shows how i is modified during execution).

From a user’s perspective, this model offers resilience to dynamic network configurations

and simple deployment of new distributed applications in the network. An application pro-

grammer can write simple sequential programs that migrate to nodes named by content and

execute there, while ignoring the routing which is embedded in migrate functions. These are

user-level functions, typically developed by system programmers. Applications can choose between

multiple migrate functions and adapt to dynamic network configurations by switching

these functions during execution.

To achieve good performance in networks composed of resource constrained nodes, we

have decided against involving the VM in determining which data is needed across migrations.

In our architecture, the VM captures the minimal execution control state required for SMs to

resume at the instruction following a migration. Although this decision clearly puts a burden

on programmers, it avoids the overhead of having the VM collect the “live data” of SMs; many

times this operation is not only time consuming, but also collects more data than necessary (i.e.,

conservative approach), thus increasing the amount of traffic in the network.

3.2 Cooperative Node Architecture

In order to execute SM-based applications, the nodes must cooperate to support SM execution

and routing. The entire SM model is built under the assumption that the node architecture

must be kept as simple and flexible as possible. Figure 3.5 shows the system components of a

cooperative node.

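[Figure 3.5 shows the SM platform on top of the node's OS and I/O system. Its components are the

Admission Manager, Local Injector, SM Ready Queue, Scheduler, Virtual Machine, Code Cache,

Authorization, and the Tag Space (Application Tags and I/O Tags). SMs enter from the network as

Incoming SMs and leave to the network as Migrating SMs.]

Figure 3.5: Cooperative Node Architecture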

3.2.1 Virtual Machine

The virtual machine (VM) executes VM-level threads generated by incoming SMs. To migrate

an SM, the VM captures the execution state and sends it along with the code and data bricks to

the next hop. The VM at the destination resumes the SM at the instruction following the migrate

call.

3.2.2 Local Injector

The local injector allows the users to start new SMs at the local node. A VM-level thread is

generated for each new SM. This thread is stored in the SM ready queue and dispatched for

execution according to the scheduling policies.

3.2.3 Scheduler

The SM execution is non-preemptive; other SMs can be accepted, but they are not dispatched

for execution before the current SM completes. The non-preemptive scheduling simplifies the

implementation of inter-SM synchronization and sharing. Additionally, we envision that the

overhead introduced by more complex scheduling will not be justified for NES applications,

which typically have short execution time.

3.2.4 Admission Manager

To prevent excessive use of resources (e.g., processor cycles, tag space memory, runtime mem-

ory, bandwidth), the nodes have to perform admission control on incoming SMs. The admission

control at nodes ensures the progress of all SMs running in the network. It also prevents SMs

from migrating to nodes where they cannot achieve anything due to resource constraints. SMs

present their resource requirements in a resource table. The admission manager receives the re-

source table, decides whether to accept the SMs or not, and enqueues the accepted SMs into the

SM ready queue. It also instructs an accepted SM to transfer only the missing code bricks (i.e.,

the code bricks that are not stored locally) and stores them in the code cache upon reception.

The admission manager makes the admission decision based on the current state of the node

and the SM’s resource requirements. This decision is based on the admission policy in effect

at that node. An accepted SM is guaranteed non-preemptive execution as long as its resource

usage does not exceed certain limits defined by the admission policy. For instance, a node may

be running out of battery and decide to accept only SMs for which it is a node of interest, but reject all

SMs that need to route through it. If an SM is rejected, the migration call fails at the source,

and the SM regains the control.

Precise resource usage for SMs cannot be predicted in advance because their computations

depend not only on user-provided input data, but also on data gathered from the network during

execution. To be able to perform admission, the admission manager needs, however, at least

approximate information about SMs’ resource requirements. One solution would be to specify

upper bounds for the resource requirements. We have dismissed this idea for two reasons: com-

puting relatively precise upper bounds is as hard as predicting the actual resource usage (i.e.,

we do not have knowledge about data acquired at runtime), and large upper bounds may lead

to frequent rejections at nodes even though the SM may consume significantly fewer resources

during its execution.

Our solution requires each SM to specify its lower bounds for resource requirements. The

programmers can set them before any one-hop migration, and they define the minimum amount

of resources that may lead to SM completion or migration. The programmers may use com-

piler support to derive lower bounds for resource requirements. The declaration of these lower

bounds serves two purposes: protect SMs from migrating to a node that cannot offer enough

resources for any semantically acceptable result, and protect the resources at the node from be-

ing wasted on such SMs. Based on the admission policy, the system may grant more resources

to SMs that have exceeded their lower bounds during execution. If no more resources could

be granted, the system raises an exception which, by default, terminates the SM. The SM is al-

lowed, however, to catch this exception, to save data of interest in data bricks, and to migrate. A

limited amount of resources is reserved during admission for the exception handler. To ensure

a successful migration for this case, the SM has to declare, during admission, the maximum

amount of data it plans to carry to the next hop.
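
The admission logic based on lower bounds can be sketched in Python as follows. The resource names, the accept_routing switch (modeling a node that only accepts SMs for which it is a node of interest), and the simple reserve-on-admit policy are assumptions; the actual admission policy is left to each node.

```python
class ResourceExhausted(Exception):
    """Models the exception raised when no more resources can be granted."""


class AdmissionManager:
    def __init__(self, free, accept_routing=True):
        self.free = dict(free)                # resource name -> amount still available
        self.accept_routing = accept_routing  # False: reject SMs that only route through

    def admit(self, lower_bounds, node_of_interest=True):
        """Accept an SM only if every declared lower bound can be reserved."""
        if not node_of_interest and not self.accept_routing:
            return False
        if any(self.free.get(name, 0) < amount for name, amount in lower_bounds.items()):
            return False
        for name, amount in lower_bounds.items():
            self.free[name] = self.free.get(name, 0) - amount
        return True

    def grant_more(self, name, amount):
        """Called when a running SM exceeds its declared lower bound."""
        if self.free.get(name, 0) < amount:
            raise ResourceExhausted(name)
        self.free[name] -= amount
```

An SM that catches ResourceExhausted would, in the model described above, save its data of interest in data bricks and migrate, using the small reserve set aside for the exception handler at admission time (not modeled here).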

3.2.5 Tag Space

The tag space provides a name-based memory and a unique interface to the local OS and I/O

system. It consists of a collection of tags that can be divided into two categories: (1) application

tags which are created by SMs and used for inter-SM communication and synchronization, and

(2) I/O tags which belong to nodes and allow SMs to access system resources. The structures

of these tags are shown in Figure 3.6. Each tag has a name (unique at a node, but not globally

unique) which is similar to a file name in a file system. SMs use this name for content-based

naming.

Application tags are commonly used for data exchange among SMs because their data

portion can store application-specific data. For instance, an SM can build a routing table in a

tag, and other SMs can subsequently read the routes from this tag. Each application tag has a

lifetime that specifies the duration after which the tag expires and its memory is reclaimed by

the node.
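
A minimal Python sketch of application tags with lifetimes is given below. The method names follow the tag space primitives of the SM API, but the lazy expiry on access, the injectable clock, and the choice to keep the original expiry time on writes are assumptions for illustration.

```python
import time


class TagSpace:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._tags = {}  # tag name -> (data, expiry time)

    def _expire(self):
        """Reclaim the memory of every tag whose lifetime has elapsed."""
        now = self._clock()
        for name in [n for n, (_, expiry) in self._tags.items() if expiry <= now]:
            del self._tags[name]

    def create_tag(self, name, lifetime, data):
        self._expire()
        if name in self._tags:
            raise KeyError(f"tag {name!r} already exists at this node")
        self._tags[name] = (data, self._clock() + lifetime)

    def read_tag(self, name):
        self._expire()
        return self._tags[name][0]

    def write_tag(self, name, data):
        self._expire()
        _, expiry = self._tags[name]
        self._tags[name] = (data, expiry)  # assumed: a write does not extend the lifetime

    def delete_tag(self, name):
        self._expire()
        del self._tags[name]
```

For example, one SM could call create_tag("route", 10.0, ["B", "C"]) to publish a routing table that other SMs read with read_tag("route") until the lifetime expires.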

I/O tags act as a gateway between SMs and the underlying OS and I/O system. Usually,

each I/O tag is associated with an external process, which communicates with the VM through

a standard interface. Each time an I/O tag is accessed by an SM, its associated external process

interacts with the local resources and returns a response to the SM.

The access to tag space is protected using an access control list (ACL). The application tags

have also ownership information (i.e., OwnerID and FamilyID). We defer the description of the

protection mechanism to Section 3.4.

Application Tag: Name | Data | Lifetime | OwnerID | FamilyID | ACL | SM Blocked Queue

I/O Tag: Name | ACL | I/O Handler

Figure 3.6: Application and I/O Tag Structures

Similar to existing solutions [10, 5], we use namespaces to avoid naming conflicts; a tag

name is preceded by a namespace (i.e., namespace:tagname). The I/O tags have a pre-defined

namespace, ions, which is known by any SM. The namespaces for application tags, on the other

hand, are defined by the SMs that create them. Each SM has a unique default namespace which

is used when a reference to a tag name is not preceded by a namespace. The system where the

SM is injected generates this unique namespace, and every SM created dynamically inherits it

from its parent SM.

An SM may use other namespaces to cooperate with SMs that do not belong to its family.

Accessing tags in other namespaces does not create problems because the access is subject

to access control. Creating new tags, however, may lead to naming conflicts. For instance,

two different SMs may create two tags with the same name, but with different semantics. A

solution to this issue is to ensure that conflicting namespaces are extremely rare in practice

(e.g., a namespace is a long random string of bits). The developers that need to cooperate can

exchange these namespaces off-line.

Although simple, this solution is not bullet-proof. If an SM needs to ensure that conflicts are

avoided, it has to use secure namespaces (i.e., by definition, a secure namespace is preceded by

the keyword secure). At compilation time, the compiler builds the list of secure namespaces

used in tag creation invocations throughout each code brick. The compiler has to be able

to generate the list of namespaces (i.e., the namespaces are either directly specified, or the

compiler is able to determine them using static analysis); if the compiler is not able to find at

least one possible namespace for a tag, the compilation fails.

At injection time, the SM must present a capability for each namespace in the compiler-

generated list. Therefore, the developer of a code brick (or the developer of an SM) has to

acquire these capabilities such that each code brick of an SM has an associated list of capabili-

ties. During SM injection, the system verifies the capabilities and creates a list of namespaces

for each code brick. This list together with the default namespace is maintained in the SM

structure and cannot be modified over time. A child SM inherits the list of namespaces for the

code bricks that compose it. If an SM does not present a capability for every namespace in the

list generated by the compiler, it will be rejected during the injection phase.

A central authority (CA) keeps track of all secure namespaces and their owners. Each time

a namespace owner decides to allow a code brick to create tags within that namespace, she

associates a capability, digitally signed by the CA, with this code brick; the capability contains

the hash value of the code brick. Similar to ANTS [25], this value is obtained by applying a

hash function on the code itself. Each node has the public key of the CA and the common hash

function. During SM injection, the VM uses the CA’s public key and the capability to verify

that the code bricks are authorized to use the secure namespaces.
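
The capability check can be approximated by the Python sketch below. Note the substitution: the text describes a capability digitally signed by the CA and verified with the CA's public key, whereas this sketch uses an HMAC with a shared demo key only because the Python standard library has no public-key signatures. The capability layout, the SHA-256 hash, and all names are likewise assumptions.

```python
import hashlib
import hmac

CA_KEY = b"demo-ca-key"  # stand-in for the CA's signing key (assumption, see above)


def issue_capability(namespace, code):
    """Bind a secure namespace to the hash of one code brick."""
    code_hash = hashlib.sha256(code).hexdigest()
    body = f"{namespace}:{code_hash}".encode()
    signature = hmac.new(CA_KEY, body, hashlib.sha256).hexdigest()
    return {"namespace": namespace, "code_hash": code_hash, "signature": signature}


def verify_capability(cap, namespace, code):
    """Check, at injection time, that `code` may create tags in `namespace`."""
    body = f"{cap['namespace']}:{cap['code_hash']}".encode()
    expected = hmac.new(CA_KEY, body, hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(expected, cap["signature"])
        and cap["namespace"] == namespace
        and cap["code_hash"] == hashlib.sha256(code).hexdigest()
    )
```

Because the capability contains the hash of the code brick, changing even one byte of the brick invalidates it, so a capability cannot be reused for different code.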

3.2.6 Synchronization Mechanism

Given the non-preemptive SM execution, we have devised a simple update-based synchroniza-

tion mechanism for inter-SM communication. An SM can block on an application tag until

another SM performs a write on that tag. A blocked SM is appended to the SM blocked queue

and yields the processor (this is the only exception to our run-to-completion model of execu-

tion). After an SM blocks, the scheduler may dispatch other SMs for execution. When an SM

writes to an application tag with a non-empty SM blocked queue, all SMs in the queue are

woken up and made ready for scheduling. To prevent infinite blocking, if no write operation

takes place within a given timeout, SMs are unblocked and made ready for scheduling.
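
The update-based synchronization mechanism can be mimicked in Python with generators, as in the sketch below. Each SM is modeled as a generator that runs without preemption until it yields a (tag, timeout) pair, which stands for a blockSM call; the Node class, the logical clock advanced by hand, and the single per-node table of blocked SMs are simplifying assumptions.

```python
from collections import deque


class Node:
    def __init__(self):
        self.ready = deque()  # SM ready queue
        self.blocked = {}     # tag name -> list of (sm, deadline)
        self.tags = {}
        self.now = 0          # logical time, advanced explicitly

    def inject(self, sm):
        self.ready.append(sm)

    def write_tag(self, name, data):
        self.tags[name] = data
        for sm, _ in self.blocked.pop(name, []):
            self.ready.append(sm)  # wake every SM blocked on this tag

    def advance(self, dt):
        """Advance the clock and unblock SMs whose timeout has expired."""
        self.now += dt
        for tag, waiting in list(self.blocked.items()):
            self.ready.extend(sm for sm, deadline in waiting if deadline <= self.now)
            self.blocked[tag] = [(sm, d) for sm, d in waiting if d > self.now]

    def run(self):
        """Non-preemptive scheduling: an SM runs until it blocks or completes."""
        while self.ready:
            sm = self.ready.popleft()
            try:
                tag, timeout = next(sm)
                self.blocked.setdefault(tag, []).append((sm, self.now + timeout))
            except StopIteration:
                pass  # SM completed


def waiter(node, log):
    yield ("route", 5)  # models blockSM("route", 5)
    log.append(f"waiter saw {node.tags.get('route')}")


def discoverer(node, log):
    node.write_tag("route", "via-B")  # wakes the SMs blocked on "route"
    log.append("discoverer wrote route")
    yield from ()  # no blocking point; keeps this function a generator
```

If the waiter and then the discoverer are injected and run is called, the waiter blocks, the discoverer runs to completion and writes the tag, and the waiter then resumes; if nobody writes the tag, advance(5) followed by run resumes the waiter through the timeout path.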

3.3 Smart Messages API

The SM API is presented in Table 3.1. SMs are allowed to create new SMs dynamically, migrate

one-hop to neighbor nodes, access the tag space, set lower bounds for resource requirements,

and synchronize on tags. Also, the SMs can use the uniform interface provided by the tag space

to execute system calls on the local host (i.e., through I/O tags). The use of the SM primitives

is extensively illustrated in Section 3.5 and throughout the next chapter.

Category        Primitives
--------------  -----------------------------------------
Smart Messages  createSMFromFiles(codefiles, databricks);
                createSM(codebricks, databricks);
                spawnSM();
                sys_migrate();
                blockSM(tagname, timeout);
                setResources(resources);
Tag Space       createTag(tagname, lifetime, data);
                deleteTag(tagname);
                readTag(tagname);
                writeTag(tagname, data);

Table 3.1: Smart Messages API

3.3.1 Creation

Initially, an SM is injected at a node as a program file, and it calls createSMFromFiles with a list

of program file names and data bricks to create a new SM structure. An SM may use createSM

to assemble a new, possibly smaller SM using some of its code and data bricks. A createSM call

is commonly used to build an SM that cooperates with the current one (e.g., a route discovery

SM). An application that needs to clone itself calls spawnSM (similar to the fork system call in

Unix). Typically, spawnSM is invoked when the current SM needs to migrate a copy of itself

to nodes of interest while continuing the execution at the local node. A new SM generated by

createSM or spawnSM is scheduled for execution at the local node.

3.3.2 Migration

The sys_migrate primitive implements one-hop migration. It captures the execution state, sends

the resource table for admission, transfers the accepted SMs, and resumes these SMs at the

destination. The sys_migrate primitive is used by high-level migrate functions to route SMs to nodes of interest.

More details about the SM self-routing mechanism are presented in Chapter 4.

3.3.3 Synchronization

The blockSM primitive allows SMs to block on a tag pending a write by another SM. Typically,

an SM uses this primitive to wait for a route. For instance, an SM can create a route discovery

SM and block on a routing tag until the route discovery SM returns (i.e., the route discovery

SM writes to the routing tag, and thus wakes up the blocked SM).

3.3.4 Setting Resource Requirements

Programmers invoke setResources each time they need to set new lower bounds for resource

requirements. Typically, this primitive is called once per high-level migration invocation and

specifies two categories of lower bounds: resources needed for routing, and resources needed

for computation at the node of interest (i.e., the target of migration). The resources are imple-

mentation specific, but they include at least: number of VM cycles, amount of runtime memory,

amount of tag space memory and the duration for which this memory is needed, I/O tags to be

accessed, and maximum number of bytes that would be generated when migrating this SM to

another node. An SM is not required, however, to set the resource requirements. In such a case,

the admission is based only on the size of the SM, but the node does not provide any type of

guarantees (i.e., the SM can be terminated or asked to migrate at any moment). Our current

prototype, described in Section 5.1, uses this very simple solution.
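The two admission modes described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class AdmissionSketch, the Resources fields, the Decision values, and the rule that the size check also applies when requirements are set are hypothetical and only a subset of the listed resources is modeled.

```java
// Hypothetical admission check; names, fields, and units are illustrative assumptions.
class AdmissionSketch {
    static final class Resources {
        final long vmCycles;
        final long runtimeMemoryBytes;
        final long tagSpaceBytes;

        Resources(long vmCycles, long runtimeMemoryBytes, long tagSpaceBytes) {
            this.vmCycles = vmCycles;
            this.runtimeMemoryBytes = runtimeMemoryBytes;
            this.tagSpaceBytes = tagSpaceBytes;
        }
    }

    enum Decision { REJECTED, ADMITTED_WITHOUT_GUARANTEES, ADMITTED_WITH_GUARANTEES }

    // requested == null models an SM that never set its resource requirements.
    static Decision admit(Resources available, long freeBytes, long smSizeBytes, Resources requested) {
        if (smSizeBytes > freeBytes) {
            return Decision.REJECTED;                      // the SM does not even fit
        }
        if (requested == null) {
            return Decision.ADMITTED_WITHOUT_GUARANTEES;   // size-only admission
        }
        boolean fits = requested.vmCycles <= available.vmCycles
                && requested.runtimeMemoryBytes <= available.runtimeMemoryBytes
                && requested.tagSpaceBytes <= available.tagSpaceBytes;
        return fits ? Decision.ADMITTED_WITH_GUARANTEES : Decision.REJECTED;
    }

    public static void main(String[] args) {
        Resources available = new Resources(1000, 4096, 512);
        System.out.println(admit(available, 2048, 1024, null));
        System.out.println(admit(available, 2048, 1024, new Resources(500, 1024, 128)));
    }
}
```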

3.3.5 Tag Space Access

An SM can create, delete, or access application tags. As mentioned in Section 3.2, the tags are

accessed subject to authorization. The same interface is used to access the I/O tags: SMs can

issue commands to I/O devices by writing into I/O tags, or can get I/O data by reading from

I/O tags (an SM cannot create or delete I/O tags).

3.4 Security Architecture

One of the traditional pitfalls of existing systems based on mobile code is security. Similar

to mobile agents, there are three main issues that have to be solved: (1) protecting recipient

hosts from SMs, (2) protecting SMs from each other, and (3) protecting SMs from malicious

hosts. These problems become more severe for SMs due to the volatile nature of NES. Unlike

traditional mobile agents for relatively stable IP-based networks, the SMs have to overcome

the lack of an infrastructure or a central authority, specific to mobile ad hoc networks, which

increases significantly the difficulty of key authentication and group management.


In this section, we present a basic security architecture for SMs, which focuses on providing

protected access to the tag space. This security architecture offers protection against malicious

SMs under the assumption that the SM system software at nodes is trusted (i.e., we do not

protect SMs against compromised hosts). To protect against compromised systems, we plan

to develop a distributed trust mechanism [23], which helps a node assign trust values to its

one-hop neighbors; a node deemed untrusted is simply removed from the list of neighbors.

Optionally, an SM may ask to be migrated in an encrypted form between neighbor nodes. To

support this, each node carries a pair of public/private keys.

3.4.1 Access Control

A unique characteristic of SMs is that no direct access is allowed to system resources (i.e.,

the SMs access both their data and system resources through the tag space). The advantage of

this design is that the tag space is a single point of access control, which can be implemented

and enforced uniformly. Compared to mobile agent systems [30], the tag space simplifies

greatly the control mechanisms. The SM creating a tag, called tag’s owner, determines the

access control policy and delegates the host to enforce this policy on its behalf. Protecting the

application tags ensures that SM executions do not interfere with each other, and therefore,

provides a secure channel for SM cooperation.

A tag incorporates the ID of its owner, the ID of its owner’s family, the address of the

node where its owner’s family originated, and its ACL (access control list). SMs are uniquely

identified by the node address where they originated and the time of their creation. We define a

family of SMs as all SMs originated from an SM injected in the network by a user. The family

ID is the ID of the original SM. Since an SM can migrate or spawn new SMs at intermediate

nodes, its family information can be used to enforce access control for an entire family of SMs.

The ACL is a matrix of subjects and their access permissions to tags, read (r) or write (w). The

ACL contains five protection domains: Owner, Family, Origin, Code, and Others.

Each time an SM tries to execute an operation on a tag, the VM performs the authorization process. Based on the credentials presented during admission and the currently executing code brick, the SM is associated with at least one protection domain. A user or the SM itself cannot forge an SM’s identification information because this information is set automatically by the system. The request is granted if the SM has the necessary permissions to access the tag in any of the protection domains it has been associated with.

Figure 3.7: SM Protection Domains for Tag Space Access (domains shown: Others, Owner, Origin, Code, Family)

Figure 3.8: Access Control Example for Smart Message Family Cooperation (Ni are Nodes, SMi are Smart Messages, and T is a Tag; T carries the ACL {Family, rw})

3.4.2 Protection Domains

The Owner and Others protection domains define the access permissions for the SM that owns the tag and for any SM, respectively. The group concept, defined as an arbitrary relation over SMs, supports more flexible cooperation, but it also incurs a high overhead for managing the group membership on the fly. Currently, our architecture does not support dynamic cooperation among totally independent SMs. Instead, we define three protection domains that allow cooperation among well-defined groups of SMs (i.e., Family, Origin, Code). Figure 3.7 shows that an SM can be associated with multiple protection domains for a tag. In the following, we present three scenarios that illustrate the protection domains for group cooperation.

Family cooperation. In Figure 3.8, all cooperative SMs originate from a common SM ancestor. For instance, SM1 is created on N1 and migrates to N2. At this node it creates a child, SM2, which migrates and creates a tag T on node N5. To allow SM1 to access this tag, SM2 sets the ACL to {Family, rw} (i.e., the family ID of T is the same as the family ID of SM1).

Figure 3.9: Access Control Example for Single Originator Cooperation (Ni are Nodes, SMi are Smart Messages, and T is a Tag; T carries the ACL {Origin, rw})

Figure 3.10: Access Control Example for Code-based Cooperation (Ni are Nodes, SMi are Smart Messages, and T is a Tag; SM1 carries the code bricks (Cr, C1), SM2 carries (Cr, C2), and T carries the ACL {Code=(Cr), rw})

Single originator cooperation. Figure 3.9 shows the scenario in which the group of cooperative SMs originates from a common node. SM1 and SM2 are created on node N1 and migrate to a target node N5 via different paths. SM1 arrives at N5 before SM2 and creates a tag T. It also sets the ACL to {Origin, rw} so that SM2 will be able to access T (i.e., the unique IDs of SM1 and SM2 contain the same origin ID). This scenario is very likely to be encountered since many nodes are small devices, such as PDAs or cell phones, owned by a single user.

Code-based cooperation. In addition to the simple groups described before, the SM group cooperation can be coordinated more flexibly based on code bricks. To ensure cooperation among SMs that are aware of the code used for data sharing or data exchange, each tag has a list of associated hash values for certain code bricks. These hash values define the members of the Code group (they may or may not belong to the owner of the tag). By definition, an SM is a member of the Code group if the hash value of its currently executing code brick belongs to this list. For instance, SMs using the same routing brick can add the hash value corresponding to this brick to the tag’s list of hash values in order to facilitate route sharing among them. Figure 3.10 presents such an example. SM1 creates a tag T and sets the ACL to {Code=(Cr), rw} to grant access to all the other SMs using the Cr routing brick. Hence, SM2 has the permissions to use T.
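The authorization rule of Sections 3.4.1 and 3.4.2 (a request is granted if any protection domain associated with the SM carries the needed permission) can be sketched in Java as follows. The sketch is illustrative only: the class TagAclSketch, the string identifiers, and in particular the choice to model the origin as the node where the SM's family was injected are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of tag access control with five protection domains.
class TagAclSketch {
    enum Domain { OWNER, FAMILY, ORIGIN, CODE, OTHERS }
    enum Perm { READ, WRITE }

    static final class Sm {
        final String id, familyId, originNode, currentBrickHash;

        Sm(String id, String familyId, String originNode, String currentBrickHash) {
            this.id = id;
            this.familyId = familyId;
            this.originNode = originNode;            // assumed: node where the family was injected
            this.currentBrickHash = currentBrickHash;
        }
    }

    static final class Tag {
        final String ownerId, familyId, originNode;
        final Map<Domain, Set<Perm>> acl = new EnumMap<>(Domain.class);
        final Set<String> codeHashes = new HashSet<>();  // members of the Code group

        Tag(Sm owner) {
            this.ownerId = owner.id;
            this.familyId = owner.familyId;
            this.originNode = owner.originNode;
        }

        void grant(Domain domain, Perm... perms) {
            acl.computeIfAbsent(domain, d -> EnumSet.noneOf(Perm.class)).addAll(Arrays.asList(perms));
        }
    }

    // Every SM is in Others; the other domains depend on IDs and the executing code brick.
    static Set<Domain> domainsOf(Sm sm, Tag tag) {
        Set<Domain> domains = EnumSet.of(Domain.OTHERS);
        if (sm.id.equals(tag.ownerId)) domains.add(Domain.OWNER);
        if (sm.familyId.equals(tag.familyId)) domains.add(Domain.FAMILY);
        if (sm.originNode.equals(tag.originNode)) domains.add(Domain.ORIGIN);
        if (tag.codeHashes.contains(sm.currentBrickHash)) domains.add(Domain.CODE);
        return domains;
    }

    // Granted if any associated domain carries the requested permission.
    static boolean authorize(Sm sm, Tag tag, Perm perm) {
        for (Domain domain : domainsOf(sm, tag)) {
            Set<Perm> granted = tag.acl.get(domain);
            if (granted != null && granted.contains(perm)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Sm sm1 = new Sm("N1@t1", "N1@t1", "N1", "hash(C1)");
        Sm sm2 = new Sm("N2@t2", "N1@t1", "N1", "hash(C2)");  // child of sm1
        Tag t = new Tag(sm2);
        t.grant(Domain.FAMILY, Perm.READ, Perm.WRITE);         // {Family, rw}
        System.out.println("SM1 may write T: " + authorize(sm1, t, Perm.WRITE));
    }
}
```

The family scenario of Figure 3.8 corresponds to the main method; the Code scenario would instead add the hash of the routing brick to codeHashes and grant permissions to Domain.CODE.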

3.5 Application Examples

To prove that virtually any protocol or application can be written using SMs, we have imple-

mented two previously proposed applications: SPIN [34] and Directed Diffusion [39]. They

present different paradigms for content-based communication and computation in sensor net-

works: SPIN is a protocol for data dissemination, and Directed Diffusion implements data

collection.

3.5.1 Background

SPIN [34] is a family of adaptive protocols that disseminate information among nodes in a

sensor network. We present an implementation of SPIN-1, which is a three-stage handshake

protocol for data dissemination. Each time a node obtains new data, it disseminates this data in

the network by sending an advertisement to its neighbors. The node receiving the advertisement

checks if it has already received or requested that data. If not, it sends a request message back

to the sender asking for the advertised data. The initiator sends the requested data, and then,

the process is executed recursively for the entire network.
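The three-stage handshake can be illustrated with a small message-counting model. The sketch below is a sequential approximation written for this example (it is neither the SPIN reference implementation nor the SM implementation): the class Spin1Sketch and the adjacency-list representation are assumptions, and nodes are processed one at a time rather than concurrently.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sequential model of the advertise / request / data handshake.
class Spin1Sketch {
    // Returns {advertisements, requests, dataMessages} for one dissemination.
    static int[] disseminate(List<List<Integer>> neighbors, int source) {
        boolean[] hasData = new boolean[neighbors.size()];
        int advertisements = 0, requests = 0, dataMessages = 0;
        Deque<Integer> pending = new ArrayDeque<>();   // nodes that still have to advertise
        hasData[source] = true;
        pending.add(source);
        while (!pending.isEmpty()) {
            int sender = pending.remove();
            for (int receiver : neighbors.get(sender)) {
                advertisements++;                      // stage 1: advertise new data
                if (!hasData[receiver]) {
                    requests++;                        // stage 2: receiver asks for the data
                    dataMessages++;                    // stage 3: sender transfers the data
                    hasData[receiver] = true;
                    pending.add(receiver);             // receiver advertises in turn
                }
            }
        }
        return new int[] {advertisements, requests, dataMessages};
    }

    public static void main(String[] args) {
        // Line topology 0 - 1 - 2, data produced at node 0.
        List<List<Integer>> line = Arrays.asList(
                Arrays.asList(1), Arrays.asList(0, 2), Arrays.asList(1));
        System.out.println(Arrays.toString(disseminate(line, 0)));
    }
}
```

In this model every node advertises once to each neighbor, so the number of advertisements equals the sum of node degrees, while requests and data transfers each equal the number of nodes that did not initially hold the data.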

In Directed Diffusion [39], a sink node requests data by sending “interests” for named data.

Data matching an interest is then “drawn” from source nodes toward the sink node. Interme-

diate nodes can cache and aggregate data; they may also direct interests based on previously

cached data. At the beginning, the sink may receive data from multiple paths, but after a while it

will reinforce the path providing the best data rate. All future data will arrive on the reinforced

path only.

3.5.2 SPIN using Smart Messages

To illustrate a distributed application written using SMs, Figure 3.11 presents the code for our

implementation of SPIN. The tag space at each node hosts two tags: the value of the most

recent data received (tagData), and the timestamp associated with this data (tagTimestamp).

 1 DisseminateSM(String tag, int timeout){
 2   // Data Brick
 3   int timestamp;
 4   Data data;
 5   String tagData=tag+"data";
 6   String tagTimestamp=tag+"timestamp";
 7   Address src, dest;
 8   // Code Brick
 9   while(true){ // SM at source
10     blockSM(tagData, timeout);
11     timestamp = readTag(tagTimestamp);
12     if (spawnSM() == 0){ // child SM
13       while(true){ // SM at every node
14         src = getLocalAddress();
15         sys_migrate(all); // migrate to all neighbors
16         int localTimestamp = readTag(tagTimestamp);
17         if (timestamp <= localTimestamp){
18           // the same or more recent data exists at this node
19           System.exit(0);
20         }
21         writeTag(tagTimestamp, timestamp);
22         dest = getLocalAddress();
23         sys_migrate(src); // migrate back to source
24         data = readTag(tagData);
25         sys_migrate(dest); // bring data to destination
26         writeTag(tagData, data);
27       }
28     }
29   }
30 }

Figure 3.11: Implementation of SPIN with Smart Messages

The protocol is initiated by injecting a Disseminate SM into a node that produces data. This SM blocks on tagData (line 10) waiting for new data. Each time new data is produced, the SM reads tagTimestamp and spawns a copy of itself (lines 11-12). The “child” SM migrates to all one-hop neighbors to advertise the new data (line 15). If a destination node does not have this data or more recent data, the “child” SM updates tagTimestamp and migrates back to the source to fetch the data (lines 16-23). Upon data arrival (lines 24-26), the “child” SM recursively executes the same algorithm until the data is disseminated in the entire network.

3.5.3 Directed Diffusion using Smart Messages

For the implementation of Directed Diffusion using SMs, the tag space at each node hosts three tags: the most recent data value (tagData), the best data rate available at that node (tagDataRate), and the best next hop toward the source (tagBestRoute). Directed Diffusion is initiated by injecting an SM at the sink. The execution of this SM has two main phases: (1) exploration starts at the sink and floods the network to find data of interest, and (2) reinforcement chooses the best path and brings data from source to sink.

If the information of interest is not locally available (no tagDataRate value), the explore SM spawns itself; the “child” SM migrates to all neighbors, while the “parent” SM blocks on tagDataRate. This operation is performed recursively at every node until an SM reaches a node containing tagDataRate. At this point, the “child” SM migrates back to its parent carrying the discovered data rate. If the new data rate is better than the value stored in tagDataRate, the SM updates tagDataRate with the new value and tagBestRoute with its previous hop as the best node on the path toward the source of data. This update unblocks the “parent” SM, which will carry the data rate one hop back. Eventually, the sink node is reached and the reinforcement phase begins.

During the reinforcement phase, a collect SM migrates to the best next hop starting from the sink. At each intermediate node, this SM spawns; the “child” SM migrates to the best next hop, while the “parent” SM blocks waiting for data. When the SM reaches the source, it spawns new SMs to carry the data one hop back at the promised data rate. Recursively, a blocked SM is awakened by the data arrival, and it carries the data back until it reaches the sink.

3.6 Smart Messages Simulator

For large scale evaluation, we have developed an event-driven simulator, similar to ns-2 [57],

extended with support for SM execution. The simulator is written in Java to allow rapid pro-

totyping of applications. To get accurate results, both the communication and the execution

time have to be accounted for. The simulator provides accurate measurements of the execution

time by counting, at the VM level, the number of cycles per VM instruction. To account for

the execution time, we have simulated each node with a Java thread, and we have implemented

a new mechanism for scheduling these threads inside the JVM. The communication model used in

our simulator is “generic wireless”, with contention resolved at the message level. Before any

transmission, a node “senses” the medium and backs off in case of contention.

3.7 Simulation Results

The main goal in conducting the simulation experiments was to quantify the data convergence

time for our implementations of SPIN and Directed Diffusion using SMs and to compare these

results with the results for traditional message passing implementations. We define the data

convergence time as the time when a certain percentage of the total number of nodes have

received the data (SPIN), or the data rate (Directed Diffusion). In both cases, due to flooding,

all nodes end up receiving the data and the data rate. SPIN completes after all nodes have

received the data, while Directed Diffusion will start the reinforcement phase after all nodes

have received the data rate. We use the same network configuration for all experiments. The

network has 256 nodes distributed uniformly over a square area, and each node has the same

transmission range. The average number of neighbors per node is 4.
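The definition of data convergence time given above can be made concrete with a short computation. The following Java sketch is only an illustration of the metric: the class ConvergenceSketch and its input format (one arrival time per node, in seconds) are assumptions, not part of the simulator.

```java
import java.util.Arrays;

// Hypothetical helper: time at which a given percentage of nodes has received the data.
class ConvergenceSketch {
    static double convergenceTime(double[] arrivalTimes, double percent) {
        if (arrivalTimes.length == 0) {
            throw new IllegalArgumentException("at least one node is required");
        }
        double[] sorted = arrivalTimes.clone();
        Arrays.sort(sorted);
        // Smallest number of nodes that reaches the requested percentage.
        int needed = (int) Math.ceil(percent / 100.0 * sorted.length);
        needed = Math.max(1, Math.min(needed, sorted.length));
        return sorted[needed - 1];
    }

    public static void main(String[] args) {
        double[] arrivals = {0.0, 0.4, 0.1, 0.9};   // made-up arrival times for four nodes
        System.out.println(convergenceTime(arrivals, 50.0));
        System.out.println(convergenceTime(arrivals, 100.0));
    }
}
```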

The first set of experiments evaluates the data convergence time when only one SM is injected in the network. Figure 3.12 presents the data convergence time for a single Directed Diffusion SM, with the sink and source located at diagonally opposite corners of the square region. We plot the data convergence time for three different cases of the same SM and a base case for the same application using passive communication (no SM). The top curve shows the time when code caching is not used. In the second curve, we can see an improvement of more than 4 times in performance when code caching is activated during the first execution of the SM in the network. The code is cached when an SM visits a node for the first time and will be used by subsequent SMs during the same execution. The effects of caching are very important in this case because the SMs visit a node multiple times in Directed Diffusion: they travel the network both forward (looking for the source) and backward (diffusion of data rate). In the third curve we can observe a 30% decrease in the completion time when the code is already cached at all nodes. The fourth curve shows the data convergence time for a traditional implementation: the protocol is implemented at each node, only data is transferred through the network, and the execution time is not accounted for. We observe that the degradation in performance for our implementation, when the code is cached at all nodes, compared to the traditional implementation is only 5%. We believe that this is a reasonable price for the flexibility to program any user-defined distributed application in NES.

Figure 3.12: Directed Diffusion using Smart Messages

Figure 3.13: SPIN using Smart Messages

Figure 3.14: Directed Diffusion - Multiple Smart Messages

Figure 3.15: SPIN - Multiple Smart Messages

Figure 3.13 plots the same curves for a single SPIN SM launched in the network at a node located in a corner of the square area. During the first execution, code caching leads to a 3 times improvement in performance (i.e., reducing the size of SMs is essential for a protocol based on flooding and three-stage communication). The third curve shows a 30% decrease in the completion time (similar to Directed Diffusion) when the code is already cached at all nodes. The completion time is 10% to 15% higher than that of the traditional implementation.

The second set of experiments quantifies the performance of our applications when multiple

SMs run simultaneously in the network. Figures 3.14 and 3.15 show the data convergence time

for both Directed Diffusion and SPIN with the code already cached at nodes. For these experi-

ments, data convergence time is the time when a certain percentage of nodes have received the

data (or data rate) for all the SMs running in parallel. The nodes at which the SMs start are dis-

tributed uniformly in the network. The results show that data convergence time increases with

the number of SMs, but only during the initial flooding phase because of increased contention

in the network. After that, the shapes of the curves are the same, independent of the number of

SMs. The results also indicate that SPIN completes faster than Directed Diffusion in all cases

(i.e., 2.3 s compared to 3.4 s for the top curves in the figures). The cause is that SPIN floods

only the neighbors and then brings the data to them, while Directed Diffusion needs to flood

the entire network until it finds the source and then brings the data rate back to all nodes. In

the initial phase Directed Diffusion generates more messages in the network leading to higher

contention, but its performance will increase as soon as the reinforcement phase begins.

3.8 Summary

In this chapter, we have presented the Smart Messages (SM) distributed computing platform

which provides a common execution environment for distributed applications developed on top

of highly dynamic networks of embedded systems. The SM platform overcomes the volatility,

heterogeneity, and scale encountered in these networks by using execution migration, content-

based naming, and self-routing. Furthermore, the SM system architecture is suitable for re-

source constrained devices since it defines a lightweight system support at nodes, with most of

the “intelligence” incorporated into SMs.

To prove that virtually any user-defined distributed application can be implemented using

SMs, we have implemented and evaluated through simulations two previously proposed appli-

cations for sensor networks, SPIN and Directed Diffusion. The simulation results show that the

SM platform is able to provide high flexibility for user-defined distributed applications while

limiting the increase in the response time to at most 15% over the traditional non-active com-

munication implementations.


Chapter 4

Smart Messages Self-Routing Mechanism

This chapter presents the Smart Messages (SMs) self-routing mechanism. Similar to most

mobile ad hoc networks, the separation between hosts and routers disappears in NES. In our

approach, there is no support for routing at nodes. SMs are responsible for their own routing

in the network (i.e., self-routing), and they can control routing in two ways: select their routing

algorithms, or change the routing algorithm during execution. To show how routing algorithms

can be implemented with SMs, we describe four such implementations corresponding to dif-

ferent types of routing. The chapter concludes with simulation results that demonstrate the

benefits of the self-routing mechanism.

4.1 Content-Based Migration

The key SM operation is content-based migration, which implements routing. Each SM has to include at least one routing brick among its code bricks. A routing brick defines a high-level migrate function. SMs name the nodes of interest by tag names, which denote content or properties, and then call migrate to route themselves to a node that has the desired tags. Additionally, migrate can be instructed to check if the nodes with these tags meet certain conditions (i.e., migrate implements a conditional content-based migration). This function is a user-level function, which can be provided as a library routing brick (e.g., implemented by system programmers) or implemented directly by application programmers. For instance, a simple implementation of migrate takes a list of tag names as parameter and migrates the SM to a node that contains all those tags. Nothing prevents a programmer, however, from expressing more complex conditions within this function.

Commonly, migrate takes a timeout as an additional parameter in order to deal with network volatility. If a timeout occurs (i.e., the routing algorithm has not been able to find a node of interest during the given period), the SM regains control at an arbitrary node. In this way, the SM is able to quickly adapt to changing network conditions. For instance, it may decide to change the routing, change the nodes of interest, or abandon the migration.

 1 int n = 0, sum = 0, lifetime = 1000; //stored in data brick
 2 String tempTag = "Temp", avgTag = "AvgTemp"; //stored in data brick
 3 createTag(avgTag, lifetime, null);
 4 while(n < 10){
 5   if (migrate(tempTag, timeout)){ // true on a node of interest
 6     sum += readTag(tempTag);
 7     n++;
 8   } else{ // migrate returns false in case of timeout
 9     if (n >= 5)
10       break; // go ahead if average over at least 5 nodes
11     return; // otherwise, abort the execution
12   }
13 }
14 if (migrate(avgTag, timeout))
15   writeTag(avgTag, sum/n);

Figure 4.1: Example of Smart Message Using Content-based Migration

Figure 4.1 illustrates the use of migrate in an SM. To compute the average temperature over a certain geographic region, the SM needs to run on ten nodes providing temperature sensors. To simplify the example, we use a single tag name (“Temp”) as parameter of migrate. The SM starts by creating a tag for average temperature at the source node (line 3). Then, it calls migrate (line 5) until ten nodes are visited and the sum of temperatures is computed. If ten nodes have been found, the SM calls migrate again to return to the source and writes the average value in the average temperature tag (lines 14-15). The migration to the source node may use a different routing brick than the first one, and implicitly, another implementation of migrate.

If a route to a node of interest is not found, the SM will not stay in the network forever (i.e., an SM can use limited resources and if it stays for too long in the network, it will eventually be dropped by a node). This is ensured by the timeout parameter of migrate. If the timeout expires before finding one of the ten nodes, migrate times out on an arbitrary node and returns the control to the SM. In this example, if a timeout happens, the SM accepts a partial result if at least half of the nodes have been visited (lines 8-10). This is a simple example of application-defined quality of result, which shows the ability of SMs to adapt to adverse network conditions. Without such a policy, for instance, the SM might never complete if ten nodes providing temperature readings do not exist in that region.

 1 String tagID, routeTagID; // stored in data brick
 2 int migrateTimeout; // stored in data brick
 3 boolean migrate(String tag, int timeout){
 4   tagID = tag;
 5   routeTagID = "route" + tag;
 6   migrateTimeout = timeout + getLocalTime();
 7   while(readTag(tagID) == null){
 8     Address nextHop = readTag(routeTagID);
 9     if (nextHop != null){
10       sys_migrate(nextHop);
11       if (migrateTimeout <= getLocalTime())
12         return false; // migrate timed out
13     }else{
14       RouteDiscovery rd = getDataBrick("RouteDiscovery");
15       rd.setTag(tagID);
16       createSM("RouteDiscovery", rd);
17       int blockTimeout = migrateTimeout - getLocalTime();
18       if (blockSM(routeTagID, blockTimeout) == TIMEOUT)
19         return false; // migrate timed out
20     }
21   }
22   return true; // migrated to a node of interest
23 }

Figure 4.2: Example of migrate Implementation

Figure 4.2 shows an example of a migrate implementation using sys_migrate for one-hop

migration and routing tags. To be capable of routing, SMs need to maintain routing information

within the tag space. They create tags at visited nodes, caching discovered routing information

in the data portion of these tags. Since tags are persistent across SM executions (as long as

their lifetimes have not expired), the routing information can be used by subsequent SMs with

similar interests, thus amortizing the route discovery effort over time.

In our example, we present a simple on-demand routing based on flooding the network

when looking for a tag name. As long as a next hop toward a node of interest is available, the

entire SM eagerly migrates there (lines 8-10). If the migration timeout has expired, the routing

code returns the control to the application code of the SM (lines 11-12). The assumption in

this example is that all the nodes have the time synchronized (i.e., using either GPS receivers

or more accurate time synchronization algorithms [40, 75]).
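The way cached routing tags guide an SM hop by hop can be illustrated with a small model. The Java sketch below is an assumption-laden simplification of Figure 4.2: the class RouteFollowSketch is hypothetical, node tag spaces are plain maps, the wall-clock timeout is replaced by a hop budget, and a missing route simply ends the search instead of triggering route discovery.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of following "route" + tag entries until a node of interest is reached.
class RouteFollowSketch {
    // tagSpaces maps a node name to that node's tags (tag name -> value).
    // Returns the node of interest that was reached, or null if none was reached.
    static String migrate(Map<String, Map<String, String>> tagSpaces,
                          String start, String tag, int maxHops) {
        String current = start;
        for (int hops = 0; hops <= maxHops; hops++) {
            Map<String, String> tags = tagSpaces.get(current);
            if (tags.containsKey(tag)) {
                return current;                      // node of interest reached
            }
            String nextHop = tags.get("route" + tag);
            if (nextHop == null) {
                return null;                         // Figure 4.2 would start route discovery here
            }
            current = nextHop;                       // one-hop migration
        }
        return null;                                 // hop budget exhausted (stands in for the timeout)
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> net = new HashMap<>();
        net.put("N1", new HashMap<>());
        net.put("N2", new HashMap<>());
        net.put("N3", new HashMap<>());
        net.get("N1").put("routeTemp", "N2");        // routing tags left by earlier SMs
        net.get("N2").put("routeTemp", "N3");
        net.get("N3").put("Temp", "21");             // node of interest
        System.out.println(migrate(net, "N1", "Temp", 5));
    }
}
```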


If a route toward a node of interest is not available, a route discovery SM is created and

its data brick is initialized with the tag name that defines the nodes of interest (lines 13-16).

The goal of the newly created SM is to migrate through the network, find nodes of interest,

set routes to these nodes, and report back the newly learned routes. During this process, the

current SM is blocked waiting for routing information (line 18). The blocked SM is woken up

when the discovery SM returns with a route and writes the routing tag. The implementation of

the route discovery SM is presented in Section 4.3, which describes various classes of routing

algorithms implemented with SMs. A problem generated by content-based routing (not shown

in this example) is how to ensure that the SM does not end up on a node of interest already

visited. In our programs, we have used two solutions. One is to let the SM record the nodes

of interest visited and pass this list as a parameter to migrate. The other one is to “mark” the

visited nodes with temporary tags.

4.2 Application Examples

To illustrate the flexibility provided by self-routing, we present several scenarios for applica-

tions that benefit from this mechanism. These scenarios correspond to the two possible ways

for an SM to control the routing: (1) choosing its routing algorithms, and (2) dynamically

changing its current routing algorithm. Section 4.3 describes the SM implementations of the

routing algorithms supporting these applications.

4.2.1 Selecting the Routing Algorithm

A first scenario involves an application that needs to perform image recognition on a number of camera nodes that have acquired an image with a certain resolution within a given time interval. In the absence of routing information, a naive solution would be to use an on-demand content-based routing algorithm to discover camera nodes. Once migrated to a camera node, the SM has to check if the resolution of the image and its acquisition time satisfy the application’s requirements, and then proceed with the computation. The disadvantage of such a method is that the SM has to pay the cost of migrating to nodes that do not satisfy the requirements of the application (e.g., they have low resolutions or old images). The self-routing mechanism allows the application to define its own routing that discovers only the nodes having the desired combination of tag names and values (i.e., they satisfy the required content-based condition). Thus, the network bandwidth, the energy consumed, and the response time are all reduced for this application. It is important to mention that self-routing offers the power to use any arbitrary condition expressed by a program to select the nodes of interest.

Figure 4.3: Dynamic Change of Routing Due to Application’s Requirements (labels: SM injected; Geographical Routing; Space reached; On-Demand Routing; SM done)

A second example presents an SM routing algorithm that builds an ad hoc content-based

topology over a network of hand-held devices belonging to the attendees at a conference. For

instance, CEOs attending a conference may decide to have an important discussion and, for se-

curity reasons, they would like to have their messages sent directly to destinations or forwarded

toward destinations only by other CEO devices. Under the assumption that it is possible to ob-

tain a connected graph using only CEO nodes, a simple SM routing algorithm can be developed

such that the routing entries stored in the tag space always have the next hop value set to a CEO

node.

4.2.2 Dynamically Changing the Routing Algorithm

Using multiple routing bricks during the lifetime of an SM may improve the completion time

or even help the application complete in the presence of adverse network conditions.

Figure 4.3 presents an SM that incorporates two routing bricks: a geographical routing and an on-demand content-based routing. The nodes containing the tag of interest are colored grey, but the application is interested only in the grey nodes located in the circular region. Therefore, a simple on-demand content-based routing would perform poorly since it would have to flood the entire network to discover the nodes of interest. The performance can be radically improved if the application has knowledge about the geographical region where the nodes of interest should reside. In such a case, a geographical routing is used to reach the desired region. Once there (the black node in the figure), the SM changes its routing to the on-demand content-based algorithm, which will flood only a limited area.

Figure 4.4: Dynamic Change of Routing Due to Network’s Conditions (labels: SM injected, Routing Alg R1; Dense network, Low mobility; SM timeouts, Routing Alg R2; Sparse network, High mobility; SM done)

Figure 4.4 shows another example of an SM that changes its routing dynamically. The

grey nodes are nodes of interest for the application. In the dense and relatively stable part of

the network, the SM may use routes established by a proactive routing algorithm. Once the

SM enters the unstable part of the network, the adverse conditions (low density of nodes, high

mobility) lead to a timeout in the migrate call. Let us assume that the SM is executing on

the black node when the timeout expires. At this time, the SM decides to change its routing.

It does so by calling a migrate function which corresponds to an on-demand content-based

routing. Using the new routing, the SM is able to visit all nodes of interest and complete its

execution.
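The dynamic change of routing described in this section amounts to trying a second migrate implementation when the first one times out. A minimal Java sketch of this idea follows; the interface RoutingBrick, the class RoutingSwitchSketch, and the method migrateWithFallback are hypothetical names introduced for illustration, and the routing bricks in main are stubs that merely simulate a timeout and a success.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of an SM that switches routing bricks after a timeout.
class RoutingSwitchSketch {
    interface RoutingBrick {
        // Returns true if a node of interest was reached, false on timeout.
        boolean migrate(String tag, int timeout);
    }

    // Tries the routing bricks in order; returns the index of the first one
    // that reaches a node of interest, or -1 if all of them time out.
    static int migrateWithFallback(List<RoutingBrick> bricks, String tag, int timeout) {
        for (int i = 0; i < bricks.size(); i++) {
            if (bricks.get(i).migrate(tag, timeout)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        RoutingBrick proactive = (tag, timeout) -> false;  // stub: times out in the sparse region
        RoutingBrick onDemand = (tag, timeout) -> true;    // stub: discovers a route
        int used = migrateWithFallback(Arrays.asList(proactive, onDemand), "Temp", 1000);
        System.out.println("routing brick used: " + used);
    }
}
```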


4.3 Implementing Routing Algorithms with Smart Messages

In the following, we describe briefly the proof-of-concept implementations for several routing

algorithms using SMs. It is not our intention to show finely tuned routing implementations.

Our goal is to show the potential of the SM self-routing mechanism in implementing flexible

content-based routing in NES. With this mechanism, virtually any routing algorithm for ad hoc

networks can be used to implement a migrate function.

4.3.1 On-Demand Content-Based Routing

Previous research, such as DSR [41] and AODV [68], has shown that on-demand routing is

suitable for highly mobile environments. We extend this work to implement an on-demand

content-based routing algorithm using SMs. Figure 4.5 presents a simplified implementation

of an on-demand routing (similar to AODV). Essentially, AODV builds routes using a route

request/route reply query cycle. When a source node needs a route to a destination for which it

does not already have a route, it broadcasts a route request packet across the network. Nodes

receiving this packet update their information for the source node and set up backward pointers

to the source node in the routing tables.

Each time routing information is not available at the current node, route discovery SMs

flood the network looking for either a node of interest (defined by a certain tag name) or for

a node containing routing information about a node of interest. An SM that arrives at a node

already visited stops its execution (lines 5-6). A new node is marked, and if it does not have

the required data, the SM migrates to all one-hop neighbors and sets backpointers to the source

of one-hop migration (lines 7-10). After finding a node of interest or a route to a node of

interest, a route discovery SM returns to its source and sets up the routing tags at each node in

the path (lines 13-19). The first SM updating the routing tag at the source unblocks the initial

SM, which subsequently migrates to the next hop, as shown in Figure 4.2. Each time the next

hop becomes unavailable, the route discovery process is restarted. Thus, routing around broken

paths is possible. Such situations emphasize one of the advantages of using self-routing SMs

over the traditional request/reply paradigm: an application is able to make progress even in

poor network conditions, moving toward nodes of interest and eventually arriving there. In the

request/reply paradigm, the round-trip communication may never complete and the application

may fail to achieve any result.

 1 String tagID, markTag, routeTagID, prevTagID; // stored in data brick
 2 Address prevHop, nextHop; // stored in data brick
 3 int lifetime; // stored in data brick
 4 while((readTag(tagID) == null) && (readTag(routeTagID) == null)){
 5   if (readTag(markTag) != null)
 6     return;
 7   createTag(markTag, lifetime, "visited");
 8   prevHop = getLocalAddress();
 9   sys_migrate(all); // migrate to all neighbors
10   createTag(prevTagID, lifetime, prevHop);
11 }
12 // found tagID or a tag with a route to tagID
13 while(prevHop != null){
14   nextHop = getLocalAddress();
15   sys_migrate(prevHop);
16   createTag(routeTagID, lifetime, nextHop);
17   prevHop = readTag(prevTagID);
18   writeTag(routeTagID, previous());
19 }

Figure 4.5: Example of On-demand Routing Implementation with Smart Messages
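To make the route discovery cycle concrete, the following Java sketch simulates it on a static graph. It is not the SM implementation: the flood is modeled as a breadth-first traversal (real route discovery SMs run concurrently), nodes are plain integers, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

final class OnDemandDiscovery {
    /**
     * neighbors[i] lists the one-hop neighbors of node i; hasTag[i] tells whether
     * node i hosts the tag of interest. Returns nextHop, where nextHop[i] is the
     * neighbor that node i should migrate to, or -1 if node i is not on the route.
     */
    static int[] discover(int[][] neighbors, boolean[] hasTag, int source) {
        int n = neighbors.length;
        int[] prevHop = new int[n];          // backpointers toward the source
        int[] nextHop = new int[n];          // routing entries left on the way back
        boolean[] visited = new boolean[n];  // plays the role of the "visited" mark tag
        Arrays.fill(prevHop, -1);
        Arrays.fill(nextHop, -1);

        Deque<Integer> frontier = new ArrayDeque<>();
        visited[source] = true;
        frontier.add(source);
        int found = -1;
        while (!frontier.isEmpty()) {
            int node = frontier.poll();
            if (hasTag[node]) {
                found = node;
                break;
            }
            for (int next : neighbors[node]) {
                if (!visited[next]) {        // a copy arriving at a marked node stops
                    visited[next] = true;
                    prevHop[next] = node;
                    frontier.add(next);
                }
            }
        }
        // Walk back to the source, leaving a routing entry at every node on the path.
        for (int node = found; node >= 0 && prevHop[node] >= 0; node = prevHop[node]) {
            nextHop[prevHop[node]] = node;
        }
        return nextHop;
    }
}
```

For a chain 0-1-2-3 with an extra leaf attached to node 1 and the tag hosted at node 3, the returned entries lead from node 0 through nodes 1 and 2 to node 3, while the leaf receives no routing entry.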

4.3.2 Geographical Routing

Unlike traditional distributed systems where the physical location of nodes does not matter,

the spatial distribution of nodes across the physical space is a key feature of massive NES.

Many times, the applications running in NES will prefer to express their interest for content

located within well-defined geographical regions. Therefore, a geographical routing algorithm

becomes a necessity. GPSR [48] is a well-known geographical routing algorithm that makes greedy

forwarding decisions using only information about a node’s immediate neighbors. When a packet

reaches a region where greedy forwarding is impossible, the algorithm recovers by routing

around the perimeter of that region. We have implemented a simple geographical routing algorithm,

similar to the greedy forwarding used by GPSR, that takes a circular region as parameter and keeps

migrating the SM to the neighbor node closest to the center of the region until it reaches a

node located within that area. The SM system software at nodes provides the list of one-hop

neighbors together with their locations.
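The greedy step can be sketched as follows in Java. The sketch assumes planar coordinates and a hypothetical helper named nextHop. It also requires strict progress toward the center, which is an assumption added here to keep the sketch loop free and is not stated in the text.

```java
final class GreedyForwarding {
    /** Euclidean distance between (x1, y1) and (x2, y2). */
    static double distance(double x1, double y1, double x2, double y2) {
        return Math.hypot(x1 - x2, y1 - y2);
    }

    /**
     * Returns the index of the neighbor closest to the region center, provided it
     * is strictly closer than the current node. Returns -1 when the current node
     * is already inside the region or when no neighbor makes progress.
     * neighborXY[i] holds the {x, y} location of neighbor i.
     */
    static int nextHop(double selfX, double selfY, double[][] neighborXY,
                       double centerX, double centerY, double radius) {
        double best = distance(selfX, selfY, centerX, centerY);
        if (best <= radius) {
            return -1; // already inside the target region
        }
        int bestIndex = -1;
        for (int i = 0; i < neighborXY.length; i++) {
            double d = distance(neighborXY[i][0], neighborXY[i][1], centerX, centerY);
            if (d < best) {
                best = d;
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```

A return value of -1 outside the region corresponds to the situation in which GPSR would switch to perimeter routing; the simple algorithm described above has no such recovery step.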


4.3.3 Proactive Routing using Bloom Filters

Exchanging routing information among all nodes in NES is practically impossible, but a lim-

ited exchange of information among neighbors can be useful even in the absence of global con-

vergence. We have implemented an algorithm that maintains approximate information (sum-

maries) about content location in the network as Bloom filters [16]. This algorithm is similar to

probabilistic routing [72]. A Bloom filter is a bit vector of length n that uses several independent

hash functions to map the elements of a set to integers in the interval [0, n). To form a Bloom

filter summary, each element in the set is hashed with every hash function and the corresponding

bits in the bit vector are set. For an element lookup, the element is hashed and the corresponding

bits are checked. If all the bits are set, there is a certain probability that the element is contained

in the set, so false positives can occur. If any one of the bits is not set, however, we can

guarantee that the element is not in the set.
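A minimal Bloom filter along these lines can be written in Java with java.util.BitSet. The class name BloomSummary is hypothetical, and deriving the hash functions from String.hashCode by double hashing is an assumption of this sketch; the text does not say which hash functions the prototype uses.

```java
import java.util.BitSet;

final class BloomSummary {
    private final BitSet bits;
    private final int n;          // length of the bit vector
    private final int hashCount;  // number of hash functions

    BloomSummary(int n, int hashCount) {
        this.bits = new BitSet(n);
        this.n = n;
        this.hashCount = hashCount;
    }

    /** i-th hash of the tag name, mapped into [0, n). Double hashing is an assumption. */
    private int index(String tagName, int i) {
        int h1 = tagName.hashCode();
        int h2 = (h1 >>> 16) | 1; // odd step derived from the same hash code
        return Math.floorMod(h1 + i * h2, n);
    }

    /** Sets the bits selected by every hash function for this tag name. */
    void add(String tagName) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(tagName, i));
        }
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(String tagName) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(tagName, i))) {
                return false;
            }
        }
        return true;
    }
}
```

An added tag name is always reported as possibly present, whereas a name that was never added is rejected unless all of its bit positions happen to have been set by other names.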

Our algorithm builds summaries for the content (i.e., tag names) present at each node. These

summaries are disseminated among neighbors, and they are diluted as they move away from the

source. Nodes closer to certain content have more accurate knowledge about its existence than

nodes farther away from it. This information continues to degrade as we move farther from the

content. However, it is still possible for an SM to discover a route to a content located far away

from its current node using the approximate information maintained locally. This knowledge

may not be accurate, but it is expected that the next hop will be able to provide more precise

information. Thus, choosing nodes which have an a priori better knowledge about the location

of the content as intermediate hops may finally lead to the desired destination.

Initializing the network for the proactive algorithm can be done on demand by injecting

a Routing SM that will replicate itself at the participating nodes. The Routing SMs maintain

summaries about the information learned so far and store them in the tag space. They maintain

exact summaries for the local node and its one-hop neighbors, but only approximate information

about their larger neighborhood. The approximate information for a node located N hops

away from a content is a logical OR of the summaries for the nodes located up to N-1 hops

away (N is an implementation parameter).
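The dilution caused by the logical OR can be illustrated with a short Java sketch. The class and method names are hypothetical. The bit positions {7,4,1,0} and {7,6,4,2} are the ones listed for "fire" and "rain" in Figure 4.6; the second summary, with bits 2 and 6 set, stands for some other tag and is invented for the example.

```java
import java.util.BitSet;
import java.util.List;

final class SummaryDilution {
    /** Returns the bitwise OR of the given summaries; the inputs are left unchanged. */
    static BitSet combine(List<BitSet> summaries) {
        BitSet result = new BitSet();
        for (BitSet summary : summaries) {
            result.or(summary);
        }
        return result;
    }

    /** True if every bit position produced by the hash functions is set in the summary. */
    static boolean mayContain(BitSet summary, int[] hashBits) {
        for (int bit : hashBits) {
            if (!summary.get(bit)) {
                return false;
            }
        }
        return true;
    }
}
```

Combining a summary that holds only "fire" (bits 0, 1, 4, 7) with one that has bits 2 and 6 set yields a vector in which all four "rain" positions are set, although neither node hosts a "rain" tag. This is the kind of inaccuracy that the next hop, holding more precise summaries, is expected to resolve.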

Figure 4.6: Lookup in Proactive Routing: An SM arrives at node A, looking for a “fire” tag. Applying the hash functions on “fire”, it concludes that the neighbors of C might know better about “fire”, and migrates to C. A lookup on node C leads to the conclusion that the “fire” tag exists on node F. (Figure labels: nodes A to G with their Bloom filter summaries S(·); hash(rain) = {7,6,4,2}; hash(fire) = {7,4,1,0}.)

Routing SMs block on a tag, and they wake up in two situations. First, they are woken

up by SMs bringing new summaries. Second, they wake up periodically to disseminate

information in the network. In the initialization phase, the local summaries are disseminated

to each of the neighbors one-hop away. After the exact summaries of the neighbors have been

received, the local summaries are updated, and new SMs propagate them. Periodically, each

Routing SM creates a heartbeat SM that migrates to each of its immediate neighbors, informing

them that the local node is still alive. If no heartbeat is received from a node within a timeout

period, the node is assumed to be dead and its summary is discarded. When summaries are

modified, the Routing SM incorporates the differences (if any) in the heartbeat, and informs all

its immediate neighbors about the change. This change is recursively propagated by heartbeats

created by Routing SMs residing on the neighbor nodes.

A migrating SM checks the summaries at each node to find routes to the desired content.

An SM arriving at a previously visited node looks up the summary and chooses a different

neighbor that may have the desired content. If no such neighbor exists, the SM randomly

chooses a neighboring node for migration. It stops its execution if it arrives again at the same

node, in order to avoid a loop.
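The next-hop choice made by a migrating SM can be sketched in Java as follows. The names are hypothetical, neighbors are identified by their index in a list of summaries, and the rule that an SM stops when it returns to the same node again is left out of the sketch.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class SummaryLookup {
    /** True if every bit position produced by the hash functions is set in the summary. */
    static boolean mayContain(BitSet summary, int[] hashBits) {
        for (int bit : hashBits) {
            if (!summary.get(bit)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Picks the first neighbor not tried before whose summary may contain the tag.
     * If no such neighbor exists, picks a neighbor at random. Returns -1 when the
     * node has no neighbors at all.
     */
    static int chooseNextHop(List<BitSet> neighborSummaries, int[] hashBits,
                             Set<Integer> alreadyTried, Random random) {
        for (int i = 0; i < neighborSummaries.size(); i++) {
            if (!alreadyTried.contains(i) && mayContain(neighborSummaries.get(i), hashBits)) {
                return i;
            }
        }
        if (neighborSummaries.isEmpty()) {
            return -1;
        }
        return random.nextInt(neighborSummaries.size());
    }
}
```

The set of already tried neighbors would have to be kept at the node (for example, in a tag) so that an SM returning to the node can consult it; how the prototype records this is not specified here.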

Figure 4.6 shows an example of a lookup operation performed in a network with intelligent

cameras. The cameras are programmed such that a tag is set when fire or rain is detected. When

the application arrives at node A, it looks up the summaries to find the next hop which has the

“fire” tag, or more precisely information about the location of this tag of interest. The routing

algorithm applies the hash functions on “fire” and checks whether the resulting bit positions are

set in the local summaries. It concludes that the neighbors of C might know better about

“fire” and migrates to C. The same algorithm is applied at C, and the SM discovers the “fire” tag

on node F.

Figure 4.7: Rendez-Vous Routing with Smart Messages (figure labels: Disseminate; West, East, South, North; SM blocked waiting for routes)

4.3.4 Rendez-Vous Routing

We introduce the term “rendez-vous” routing to define a category of routing algorithms that

use a combination of on-demand and proactive routing. Such an approach can be beneficial

for certain applications both in terms of scalability and adaptability to highly dynamic network

configurations. For example, with such an approach, we can disseminate routes for important

content in certain locations across the network, and then the SMs can migrate to these locations

to find the necessary routing information. Conceptually, the idea of rendez-vous routing is

similar to the one presented in the Internet Indirection Infrastructure [81].

Figure 4.7 illustrates a simple rendez-vous routing that combines geographic dissemination


with limited flooding. An SM, running at the grey circle in the figure, needs routing information

for a certain tag name. The algorithm starts by broadcasting Explore SMs to its one-hop

neighbors, and then blocks waiting for routing tag updates. The Explore SMs check if the

neighbor nodes have the given routing tag. If the tag is found, an Explore SM returns to the source

and updates the routing tag; this operation unblocks the initial SM. If routing information is not

available at the neighbors, Explore SMs create a tag for the desired routing data, and block on

it. If no result is received for a certain amount of time (passed as timeout parameter to the block

call), each Explore SM broadcasts itself one more hop and doubles the timeout. The algorithm

works recursively until it reaches the established limit of the number of hops to be visited. The

exploring process is stopped by the initial SM after it receives the required route. This SM

floods Cancel SMs, which will let the Explore SMs know that they have to finish. A Cancel

SM stops at the first node that does not contain the desired routing tag (i.e., no Explore SM has

passed through that node).
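The doubling of the timeout bounds how long the exploration can last. The Java sketch below works this out under one reading of the description, namely that an Explore SM blocked at hop h uses a timeout of T·2^(h-1), where T is the initial timeout. The class and method names are hypothetical.

```java
final class ExploreSchedule {
    /** Timeout used by an Explore SM blocked at the given hop distance (1 = one-hop neighbor). */
    static long timeoutAtHop(long initialTimeoutMillis, int hop) {
        return initialTimeoutMillis << (hop - 1);
    }

    /**
     * Total time spent blocked along one exploration path before the hop limit
     * is reached, assuming that no routing information is ever found.
     */
    static long worstCaseWait(long initialTimeoutMillis, int maxHops) {
        long total = 0;
        for (int hop = 1; hop <= maxHops; hop++) {
            total += timeoutAtHop(initialTimeoutMillis, hop);
        }
        return total;
    }
}
```

With an initial timeout of 100 ms and a limit of three hops, the timeouts are 100, 200, and 400 ms, so under this reading an unsuccessful exploration gives up after 700 ms.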

SMs that create important tags generate a new SM to disseminate routing information.

According to our algorithm, an SM running at the black node in the figure creates 4 SMs

that will travel in 4 directions, based on geographic coordinates: East, West, North, South.

These SMs are inherently loop free. We assume that each node knows its own location and

its neighbors’ locations. The intuitive idea behind our approach is that the rendez-vous can

happen in two situations: (1) one of the dissemination SMs intersects the flooded area, or (2)

an Explore SM reaches a node storing the disseminated information. There are two advantages

to this rendez-vous algorithm. First, we avoid a global dissemination, which would be too

expensive in terms of network resources, but at the same time we propagate routing information

eagerly. And second, we limit the flooding process that takes place in on-demand algorithms.

Consequently, routes to important information are discovered faster, and the response time for

applications decreases. In the example presented in the figure, the SM moving North updates

the routing tag at the grey square node, and the Explore SM blocked there brings the routing

information to the source (grey circle).

Figure 4.8: Completion Time for Experiment 1 (x-axis: Number of Nodes of Interest; y-axis: Completion Time (sec); series: On-Demand Routing, Conditional On-Demand Routing)

Figure 4.9: Bytes Sent in the Network for Experiment 1 (x-axis: Number of Nodes of Interest; y-axis: Bytes Sent in the Network (KBytes); series: On-Demand Routing, Conditional On-Demand Routing)

4.4 Simulation Results

Our main goal in conducting the simulation experiments was to quantify the effects of the

self-routing mechanism for applications running in large scale NES. We chose two metrics

to analyze the performance of our solution: (1) the completion time, which measures the

user-observed response time for an application, and (2) the total number of bytes sent, which

measures the total amount of traffic (generated by an application) throughout the network. The latter

metric reflects the energy and bandwidth consumed by an application and consequently, it also

indicates the overall lifetime of the network. For all the simulations, we uniformly distribute

256 nodes in a 1000m by 1000m square. The transmission range for each node is 100m. A

node can communicate with an average of 6 neighbors (ranging from 2 to 11 neighbors) at the

network bandwidth of 2Mb/s.

Our first set of simulation experiments studies the SM feature that allows programmers

to select the most appropriate routing for their applications or even to implement their own

routing. The SM starts on a node located in the bottom-left corner of the square region that

contains the network. The goal of this application is to visit a number of nodes of interest

(defined by a given tag name) which satisfy a certain condition. Without loss of generality,

we simply check if the value associated with the given tag is over a certain threshold. We use

two on-demand routing algorithms for this experiment (similar to those described in 4.2.1): a

simple on-demand content-based routing, and a conditional on-demand content-based routing

(which enhances the simple on-demand algorithm with a few lines of code that check the

desired condition).

Figure 4.10: Completion Time for Experiment 2 (x-axis: Region Radius (meters); y-axis: Completion Time (sec); series: On-Demand Routing, Geographic + On-Demand Routing)

[Figure 4.11: plot of Bytes Sent in the Network (KBytes)]

Figure 4.12: Completion Time for Experiment 3 (y-axis: Completion Time (sec))

[Figure 4.13: plot of Bytes Sent in the Network (KBytes)]




We distribute uniformly over the network area a total of five nodes containing the tag of

interest and vary the number of nodes of interest (in this experiment, nodes whose tag values

satisfy the desired condition) from one to four (by setting the values of the tags of interest).

Since the results of using the simple on-demand content-based routing depend on the order

of visiting the nodes, we take all the possible combinations and compute the average for both

routing algorithms. Our results indicate that the conditional routing improves the response

time by as much as 40% (see Figure 4.8) because it does not visit any unnecessary nodes

(i.e., nodes that have the desired tag but whose tag value does not meet the condition), whereas the

simple routing does.

Additionally, our bytes-sent results (see Figure 4.9) indicate that the conditional routing

consumes significantly less energy and bandwidth (40% fewer bytes sent for one node of inter-

est) than the simple routing. As expected, when the number of nodes of interest increases, the

savings of our conditional routing are less evident because the simple on-demand routing visits


fewer unnecessary nodes. When the number of nodes of interest is close to the number of nodes

hosting the tag of interest, the simple routing even performs slightly better than the conditional

routing. The primary reason is that the code size of the conditional routing is approximately

150 bytes larger than that of the simple routing. Even though this additional size is small, the

impact of the additional overhead for programming the network becomes noticeable, given that

the network size is sufficiently large.

In the second set of experiments, we study the SM ability to change its routing during

execution. Specifically, we compare SMs using only on-demand routing with SMs using a

combination of geographical and on-demand routing (as described in 4.2.2). The SM starts

on a node located at the bottom-left corner of the region. The goal of this SM is to visit five

nodes of interest identified by a given tag name. The network contains exactly five nodes of

interest uniformly distributed over a region delimited by a circular area with the center at the

opposite corner and a 500m radius. If the SM has approximate information of the geographical

region containing these nodes, it can migrate to this area using geographical routing. Upon

reaching the specified area, the SM dynamically changes its routing to geographically-bound

on-demand routing (i.e., on-demand routing that floods a limited region) in order to discover

the target nodes. In our simulations, we vary the approximate geographical information (of the

target nodes) by changing the radius of the circular region defined above (the nodes of interest

remain the same).

The performance of the on-demand routing remains constant (regardless of the radius) be-

cause this simple on-demand routing always floods the entire network (see Figure 4.10). Con-

versely, the more accurate the target area is, the faster the combination scheme completes (as

much as 38% reduction in completion time). For the 1500m radius, the combination scheme

performs roughly the same as the on-demand algorithm because the target region already covers

the entire network.

It is well documented that the use of flooding in large scale networks adversely impacts the

system scalability [61]. Figure 4.11 shows that the combination approach can significantly im-

prove the scalability by reducing the total number of bytes sent in the network (consuming less

energy and bandwidth). The combination scheme can achieve up to 80% energy and bandwidth

savings. Surprisingly, for a larger radius (≥ 1100m), on-demand routing sends fewer bytes than


the combination scheme. There are two reasons for this result. First, given such a large target

region, the combination scheme unavoidably floods almost the entire network. Second, the

code size of geographically bound on-demand routing is 400 bytes larger than that of simple

on-demand routing. This additional code size can significantly decrease the performance, given

the sufficiently large network size and the flooding nature of our on-demand routing.

Nevertheless, for some SMs, the combination scheme can achieve much better perfor-

mance. Similar to the previous experiment, we consider an SM that starts at the same node

at the bottom-left corner. However, unlike the previous experiment, the goal of this SM is to

visit three nodes, each residing in one of the other corners. Additionally, the SM

has to visit these three nodes (identified by different tag names) in clockwise order. Under our

investigated scenarios, the combination scheme (with limited flooding) completes, as expected,

faster (between 25% and 40%) than the on-demand algorithm which floods the entire network

(Figure 4.12). The difference between full flooding and limited flooding is more evident

because the on-demand routing floods the entire network three times. The faster completion

time is consistent with the reduction in bytes sent shown in Figure 4.13 (between 62% and 92% bytes

savings).

4.5 Summary

In this chapter, we have presented the Smart Messages (SM) self-routing mechanism. The main

feature of SM self-routing is its flexibility in the presence of highly dynamic network config-

urations. Content-based migration is the high level primitive used by applications to name the

nodes of interest by content and to migrate the execution there. Using this primitive, SM appli-

cations can choose the most suitable routing for their needs, implement their own routing, or

change the routing dynamically. Our simulation results indicate that the above flexibility can

improve the responsiveness of SM applications and provide significant energy and bandwidth

savings.


Chapter 5

Prototype Implementation and Evaluation

This chapter presents the design and implementation of the Smart Messages (SMs) prototype,

as well as the implementation of Spatial Programming (SP) over an SM runtime system. The

SM prototype is implemented in Java over Linux. The SM system support is implemented

within Sun Microsystems’ K Virtual Machine, which has a memory footprint suitable for

resource-constrained devices.

We also describe EZCab, an SM-based application for locating and booking free cabs

in densely crowded traffic environments using only short-range wireless communication. To

demonstrate the simplicity of SP, we have implemented a simple intrusion detection application.

Throughout this chapter, we present experimental results for the basic SM operations, SM

routing algorithms, and the two applications mentioned above. The testbed used for the evalu-

ation consisted of ad hoc networks of PDAs (HP iPAQs) equipped with IEEE 802.11 wireless

cards. The experimental results demonstrate the feasibility of our approach in programming

distributed applications for outdoor computing environments.

5.1 Smart Messages Implementation

To leverage the existing user base, we have implemented the SM prototype in the Java

programming environment over Linux. Specifically, we have modified Sun Microsystems’ KVM

(Kilobyte Virtual Machine) [2] because its source code is available and it has a small memory

footprint (i.e., it is suitable for resource constrained devices such as those encountered in NES).

The SM API is encapsulated in two Java classes: SmartMessage and TagSpace. For efficiency,

we have implemented the API as Java native methods. Besides the KVM interpreter thread,

we have introduced two additional threads for admission control and local code injection. The

design of the SM computing platform is not specific to any hardware or software environment.


It can be implemented on any virtual machine (e.g., Mate [51], Scylla [80]), programming

language, or underlying operating system.

In the rest of this section, we describe the most important components of our prototype

implementation: the primitives for SM creation, the memory management mechanism which

ensures thread-safety in KVM, the lightweight migration mechanism, the code caching, and

the I/O tags. Currently, the admission manager is very simple; it accepts any SM as long as the

destination node has enough memory to accommodate this SM.

5.1.1 Creating New Smart Messages

New SMs can be created at a node by the local injector or the VM interpreter. Each SM in the

system is associated with a VM-level thread. The admission manager can also create VM-level

threads for SMs arriving from the network.

A user can inject a new SM by passing a Java class name and a list of arguments to the local

injector. The injector attempts to load, link, verify, and initialize the class file. Upon successful

initialization, the injector creates a new VM-level thread with an initial stack frame for the main

method of the class and inserts the thread into the ready queue. The arguments passed by the

user are pushed onto the stack as arguments of the main method. At this point, the VM-level

thread has no associated SM structure. When the VM-level thread starts its execution, it has to

call createSMFromFiles to associate itself with a new SM structure.

The interpreter thread also creates new VM-level threads in response to createSM and

spawnSM invocations. When an SM calls createSM, the data bricks of the new SM are cloned

from the current SM, and the code bricks of the new SM refer to the verified code bricks in the

code cache. The spawnSM call is similar to createSM, except that the new SM starts its

execution from the next bytecode after spawnSM. To implement this primitive, the execution stack

frame associated with the VM-level thread of the original SM is duplicated onto the VM-level

thread of the new SM.


5.1.2 Memory Management

The garbage collector in KVM is designed for a single-threaded environment. Since any of

the three threads in the SM prototype (i.e., interpreter, local injector, admission manager) could

allocate memory from the dynamic heap, we protect the garbage collector data structures using

a heap lock and restrict the garbage collection to a limited number of locations (i.e., GC

Points [14]). We have modified the mark-sweep garbage collector in KVM such that garbage

collection is performed by the interpreter only during context switches (i.e., the interpreter

has a single GC Point). The interpreter triggers a garbage collection during a context switch

if the available memory falls below a threshold. Before performing garbage collection, the

VM ensures that the admission manager and the local injector threads have reached their GC

Points (defined as the regions where all valid memory references are reachable from the garbage

collector’s root set). The GC Points of the three VM threads are demarcated using a single

read-write lock. During garbage collection, the interpreter thread holds the write lock. The ad-

mission manager and the injector hold the read lock to protect the critical regions from garbage

collection.
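The locking discipline can be illustrated with a Java analogy built on java.util.concurrent.locks.ReentrantReadWriteLock. The prototype implements this inside the modified KVM rather than in Java, so the class GcPointGuard and its methods are purely illustrative names.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class GcPointGuard {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int collections = 0;

    /** Admission manager or local injector: run a critical region under the read lock. */
    void runCriticalRegion(Runnable region) {
        lock.readLock().lock();
        try {
            region.run();
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Interpreter, at a context switch: collect only if free memory is below the
     * threshold. The write lock waits until no critical region is in progress.
     */
    boolean collectIfNeeded(long freeBytes, long thresholdBytes, Runnable collector) {
        if (freeBytes >= thresholdBytes) {
            return false;
        }
        lock.writeLock().lock();
        try {
            collector.run();
            collections++;
        } finally {
            lock.writeLock().unlock();
        }
        return true;
    }

    /** Number of collections performed so far. */
    int collections() {
        return collections;
    }
}
```

Because read locks can be shared, the admission manager and the injector do not block each other; only the collector excludes both of them.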

5.1.3 Lightweight Migration

One of the main obstacles in implementing an efficient execution migration arises from the

strong coupling between the execution entity and the host. For example, traditional process

migration needs to deal with sockets and file descriptors during migration. Two key features in

the design of our system helped us circumvent the problem of strong coupling.

First, the tag space shields the SMs from direct coupling with the underlying OS. The read

and write operations on tags are complete and atomic transactions; no state of the underlying

OS resources is kept in the SM structure. Hence, an SM can be completely extracted from its

execution environment, migrated, and resumed at destination.

Second, an SM program never creates a communication endpoint directly since it is based

on execution migration, not message passing. Communication channels are managed implicitly

by the underlying system. In contrast, traditional message passing programs create communi-

cation channels explicitly to transfer data. Hence, SM programs do not have any reference to


OS network descriptors.

Our migration is lightweight in the sense that we do not migrate the complete memory

referred to by SMs. Instead, we migrate data bricks which are explicitly identified in the SM.

To simplify the task of programmers, however, we also migrate the this self-reference for non-static

methods. Therefore, these methods can use object member variables safely after migration.

For clarity of exposition, we will describe the SM migration mechanism as three logical

phases: SM capture, SM transfer, and SM resumption.

SM Capture Phase. An SM enters into this phase when it invokes sys_migrate directly

or as part of a routing library. In this phase, we convert the SM into a machine-independent

representation. The code bricks are already in the machine-independent Java class format, and

therefore, only the data bricks and execution stack frames need to be converted.

To implement this conversion, we have developed a simple object serialization mechanism

(KVM does not provide one). Each data brick is serialized recursively into the values and types

that represent its internal structure. During serialization, we also generate a temporary

structure which provides a unique identifier for each data brick reference. The unique identi-

fiers of a data brick object and its sub-objects are determined solely by the structure of the data

bricks.

The execution control state of an SM is represented by the execution stack frames of its

associated VM-level thread. Each stack frame is serialized into a tuple of six values: the current

offsets of the instruction and operand stack pointers, method name, signature name, class name,

and a flag indicating whether the method is non-static. For non-static methods, we also encode

the machine-independent identifier for the this self-reference.
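The six-value tuple can be pictured as the following Java class. The class FrameRecord and its byte layout (two integers, three UTF strings, one boolean, written with DataOutputStream) are assumptions made for illustration; the prototype's actual wire format is not described here, and the identifier of the this self-reference is omitted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class FrameRecord {
    final int instructionOffset;   // offset of the instruction pointer
    final int operandStackOffset;  // offset of the operand stack pointer
    final String methodName;
    final String signature;
    final String className;
    final boolean nonStatic;

    FrameRecord(int instructionOffset, int operandStackOffset, String methodName,
                String signature, String className, boolean nonStatic) {
        this.instructionOffset = instructionOffset;
        this.operandStackOffset = operandStackOffset;
        this.methodName = methodName;
        this.signature = signature;
        this.className = className;
        this.nonStatic = nonStatic;
    }

    /** Encodes the six values in a machine-independent (big-endian) form. */
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(instructionOffset);
            out.writeInt(operandStackOffset);
            out.writeUTF(methodName);
            out.writeUTF(signature);
            out.writeUTF(className);
            out.writeBoolean(nonStatic);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds a frame record from the bytes produced by toBytes. */
    static FrameRecord fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new FrameRecord(in.readInt(), in.readInt(), in.readUTF(),
                                   in.readUTF(), in.readUTF(), in.readBoolean());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Storing offsets and names rather than raw pointers is what makes the record meaningful on the destination node, where the method is located again by class name, method name, and signature.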

SM Transfer Phase. Using the data brick and stack information sizes obtained during

the capture phase, the interpreter initiates a three-way handshake protocol with the destination

node. The operation of this protocol is shown in Figure 5.1. If the SM is accepted, the admission

manager sends back a list of missing code bricks as part of the acknowledgment. Otherwise,

the admission manager just drops the request. Upon the receipt of the acknowledgment, the source

node sends the complete SM, which consists of missing code bricks, serialized data bricks, and

execution control state. To simplify the implementation, we have used TCP for reliable

single-hop communication between neighbors. For better performance, single-hop communication

can be implemented on top of a reliable single-hop protocol over 802.11.

Figure 5.1: Smart Message Transfer (Main Operations). Steps shown between Node1 and Node2: (1) Send ResourceTable, (2) CheckCache, (3) Ack with Missing = CB2, (4) Send SM, (5) Add CB2, (6) Enqueue SM.
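The decision taken by the admission manager during the handshake can be sketched in Java as follows. The class AdmissionSketch, the admit method, and the representation of the resource table as a list of code brick identifiers plus a byte count are assumptions of this sketch, not the prototype's data structures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class AdmissionSketch {
    /**
     * Returns the code bricks the source still has to send, or an empty Optional
     * when the SM does not fit in memory and the request is dropped.
     */
    static Optional<List<String>> admit(List<String> codeBrickIds, long requiredBytes,
                                        long freeBytes, Set<String> codeCache) {
        if (requiredBytes > freeBytes) {
            return Optional.empty(); // no acknowledgment is sent
        }
        List<String> missing = new ArrayList<>();
        for (String id : codeBrickIds) {
            if (!codeCache.contains(id)) {
                missing.add(id);
            }
        }
        return Optional.of(missing);
    }
}
```

With the configuration of Figure 5.1, where the SM carries code bricks CB1 and CB2 and the destination cache holds CB1 and CB3, the acknowledgment lists CB2 as the only missing brick.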

SM Resumption Phase. After the admission manager has successfully received the code

bricks, data bricks, and execution control information from a source node, a new VM-level

thread and its associated SM structure are constructed. The missing code bricks sent from the

source node are verified by the KVM verifier and stored in the code cache by the admission

manager. We have modified the existing KVM class loader to search the code cache each time

the VM needs a class. During data brick de-serialization, the admission manager constructs

a temporary structure (similar to the structure constructed during the data brick capture at the

source node) which maps a unique identifier to each data brick reference. The execution stack

frames are reconstructed using the tuples sent from the source. Finally, the interpreter thread is

notified if it is currently idle.

5.1.4 Code Caching

Each code cache entry consists of the Java class file of a code brick, a reference count, and a

reference to the internal VM class representation. The original class format is stored for future

migrations to nodes that do not have it cached. The reference count keeps track of the number

of SMs currently referring to this code brick. Each time an SM referring to this code brick

migrates or terminates, the reference count is decremented. When the reference count becomes


zero, the code cache entry is moved to a free list. Should the same code brick be referenced by

a new SM, the cache entry is resurrected from the free list. The memory associated with free

list entries is reclaimed according to an LRU policy. When a cache entry is evicted, the code

brick memory is freed, and the corresponding internal VM class representation is unloaded

(since KVM does not have a class unloading capability, we have implemented our own class

unloading mechanism).
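The reference counting, free list, and LRU reclamation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's code: the class and method names (CodeCache, acquire, insert, release) are hypothetical, eviction is triggered by a fixed limit on the number of free-list entries rather than by memory pressure, and class unloading is only indicated by a comment.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a reference-counted code cache with an LRU free list.
class CodeCache {
    static final class Entry {
        final byte[] classFile; // original class file, kept for future migrations
        int refCount;           // number of SMs currently referring to this code brick
        Entry(byte[] classFile) { this.classFile = classFile; }
    }

    private final Map<String, Entry> active = new HashMap<>();
    // Insertion-ordered: the entry released longest ago is evicted first.
    private final LinkedHashMap<String, Entry> freeList = new LinkedHashMap<>();
    private final int maxFreeEntries; // assumption: eviction by entry count

    CodeCache(int maxFreeEntries) { this.maxFreeEntries = maxFreeEntries; }

    // Called when an arriving SM refers to the brick; false means a cache miss.
    boolean acquire(String name) {
        Entry e = active.get(name);
        if (e == null) {
            e = freeList.remove(name);   // resurrect from the free list if possible
            if (e == null) return false; // the source node must send the code
            active.put(name, e);
        }
        e.refCount++;
        return true;
    }

    // Called after a missing code brick has been received and verified.
    void insert(String name, byte[] classFile) {
        Entry e = new Entry(classFile);
        e.refCount = 1;
        active.put(name, e);
    }

    // Called when an SM referring to the brick migrates or terminates.
    void release(String name) {
        Entry e = active.get(name);
        if (e == null) return;
        if (--e.refCount == 0) {
            active.remove(name);
            freeList.put(name, e);
            evictIfNeeded();
        }
    }

    private void evictIfNeeded() {
        Iterator<String> it = freeList.keySet().iterator();
        while (freeList.size() > maxFreeEntries && it.hasNext()) {
            it.next();
            it.remove(); // a real system would also unload the VM class here
        }
    }

    boolean isCached(String name) {
        return active.containsKey(name) || freeList.containsKey(name);
    }
}
```

A node would call acquire for every code brick named by an incoming SM and request only the bricks for which it returns false.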

5.1.5 I/O Tags for Interaction with the OS and I/O System

An application uses the readTag and writeTag primitives to access an I/O tag. It is up to the

system to define the source of the data, but readTag typically translates to an OS call. A

writeTag translates to an OS call which sets certain parameters for an I/O device. Examples of

I/O tags currently available in our prototype can be found in Table 5.4.

Since each I/O tag requires specific native code, adding new I/O tags involves adding new

native code to the node. We have identified three possible solutions for this issue. The first

option is to statically link the native code into the VM. This is not viable because adding

new I/O tags would involve shutting down the VM. The second option is to implement new

I/O tags as dynamic shared libraries. This is not viable because we cannot assume that every

node supports dynamic linking. The third option is to implement new I/O tags as external

processes which communicate with the VM using a standard interface. We have chosen the

third alternative since it enables users to dynamically extend the I/O tags without requiring the

VM to be shut down or the host to support dynamic shared libraries. For efficiency, a few basic

I/O tags (e.g., free memory and systemtime) are implemented and linked permanently into the

VM executable.

Commonly, an I/O tag is associated with an external program, termed handler, which incorporates

the code for reading and writing this I/O tag. When the VM receives a request to add

a new I/O tag, it creates a new Unix process for this handler. We use Unix pipes for communication

between the VM and the handler process. Figure 5.2 shows the interaction between an SM

and a handler process for a GPS device. When the SM issues a read request for the Location

tag, the interpreter sends a read command to the handler and blocks waiting for an answer.

Once the handler has obtained the data from the GPS device (connected on the serial port in


[Figure 5.2 components: SM code Location l = readTag("Location"); tag space (Location tag with ACL); interpreter; Unix pipes; GPS I/O handler; serial port (/dev/ttyS); GPS device; returned Location object.]

Figure 5.2: I/O Tag Example (Using GPS to Get the Current Location)

our example), the handler encodes the data and sends it back to the VM. The VM de-serializes

the results into a Java Object and returns it to the SM. A write operation is performed similarly.
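The read path described above (the interpreter sends a read command, blocks, and decodes the handler's answer) can be sketched as follows. To stay self-contained, this sketch replaces the external Unix process with a thread and the Unix pipes with in-memory piped streams. The line-based protocol (a READ command answered by a comma-separated latitude and longitude), the class name, and the coordinate values are all assumptions made for illustration; the prototype's wire format is not given in the text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a thread stands in for the external handler process.
class IoTagHandlerSketch {

    // Handler side: answer each READ command with an encoded location.
    static void startHandler(PipedInputStream fromVm, PipedOutputStream toVm) {
        Thread handler = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(
                         new InputStreamReader(fromVm, StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(toVm, true)) {
                String command;
                while ((command = in.readLine()) != null) {
                    // A real handler would query the GPS device here;
                    // the coordinates below are placeholders.
                    out.println("READ".equals(command) ? "40.5,-74.4" : "ERROR");
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        handler.setDaemon(true);
        handler.start();
    }

    // Interpreter side: send a read command and block until the answer arrives.
    static double[] readLocationTag() {
        try {
            PipedOutputStream vmOut = new PipedOutputStream();
            PipedInputStream handlerIn = new PipedInputStream(vmOut);
            PipedOutputStream handlerOut = new PipedOutputStream();
            PipedInputStream vmIn = new PipedInputStream(handlerOut);
            startHandler(handlerIn, handlerOut);
            try (PrintWriter out = new PrintWriter(vmOut, true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(vmIn, StandardCharsets.UTF_8))) {
                out.println("READ");
                String[] parts = in.readLine().split(",");
                // De-serialize the encoded answer into a Java value.
                return new double[] {
                    Double.parseDouble(parts[0]), Double.parseDouble(parts[1])
                };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the handler behind a stream interface is what lets new I/O tags be added without relinking or restarting the VM, which is the design argument made above.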

Certain SMs may have a user interface (in the form of an external process) which allows

users to interact with SMs via special I/O tags, termed UI tags. Unlike regular I/O tags, a UI

tag behaves similarly to a producer-consumer circular buffer. Each UI process can communicate

with multiple SMs. This communication is done through a pair of UI tags: a write tag for

passing data to SMs, and a read tag for receiving data from SMs. These tags persist for the

entire duration of the UI process.
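The producer-consumer behavior of a UI tag can be illustrated with a bounded, array-backed queue. This is a sketch only: the class name and the choice to reject writes when the buffer is full (rather than block or overwrite the oldest item) are assumptions, since the text does not specify the overflow policy.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of one UI tag as a bounded FIFO buffer.
class UiTagSketch {
    private final BlockingQueue<String> buffer;

    UiTagSketch(int capacity) {
        // ArrayBlockingQueue is a fixed-size, array-backed FIFO queue.
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    // Producer side: returns false if the buffer is full (assumed policy).
    boolean write(String item) {
        return buffer.offer(item);
    }

    // Consumer side: returns the oldest item, or null if the buffer is empty.
    String read() {
        return buffer.poll();
    }
}
```

A UI process would hold two such buffers, one per direction, matching the write tag and read tag pair described above.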

5.2 Smart Messages Evaluation

To evaluate the performance of our prototype, we have measured the cost of the SM primitives

and the completion time of two routing algorithms (on-demand and geographical routing).

Additionally, we have implemented and analyzed EZCab, a real-life application for booking

cabs in densely populated cities using only short-range wireless communication. The testbed

consists of HP iPAQ 3870 devices running Linux 2.4.18. Each iPAQ contains an Intel StrongARM

206MHz processor, 32MB of flash memory, and 64MB of RAM. For communication, we

use Orinoco 802.11b Silver PC Cards.

5.2.1 Cost of SM Creation

createSMFromFiles. This primitive allows a user to inject a new SM at a node. After an

invocation, the VM loads the class files from the local file system, unless the classes are already


Size (KB)   Time (ms)
            Uncached   Cached
1           2.622      0.032
2           5.112      0.034
4           9.953      0.042
8           20.151     0.063

Table 5.1: Effect of Code Brick Size on createSMFromFiles

Size (KB)   Time (ms)
            spawnSM    createSM
2           0.270      0.243
4           0.367      0.326
8           0.508      0.469
16          0.913      0.822

Table 5.2: Effect of Data Brick Size on spawnSM and createSM

in the VM code cache, and creates a new SM structure. To evaluate its cost, we have performed

two series of experiments. In the first, we invoke createSMFromFiles for un-cached code

of different sizes while keeping the data brick size constant (53 bytes). Then, we repeat the

same experiment with the class cached. In both experiments, we have used 1KB class files

and we varied the number of class files used to create an SM. Table 5.1 shows that the cost of

createSMFromFiles almost doubles (when the code is not cached) as we double the size of the

code brick. These results show that the cost of class loading dominates the cost of creating a

new SM structure. The cost of creating a new SM structure is essentially the cost measured

when the code is cached.

createSM and spawnSM. Table 5.2 shows the costs of spawnSM and createSM for different

data brick sizes. The code brick and stack size are fixed at 1527 and 131 bytes, respectively.

Typically, an SM has a mixture of static and non-static call frames. Therefore, we consider a

stack consisting of two stack frames, one for a static method and the other for a virtual method

call. Although these two primitives are similar, the results show that the cost of spawnSM

is slightly higher than the cost of createSM. The difference is the time spent to duplicate the

execution stack frames for spawnSM.

5.2.2 Cost of SM Migration

The most significant factors that determine the cost of our migration are the data brick serial-

ization, the SM transfer, and data brick de-serialization.

Data Brick Serialization and De-Serialization. Since the code bricks need not be serial-

ized, we perform this operation only on data bricks and execution stack frames. Our measure-

ments indicate that the serialization cost for the execution stack frames is small compared to


Figure 5.3: Cost of Data Brick Serialization

Figure 5.4: Cost of Data Brick De-Serialization

the cost of data brick serialization; it varies from 0.204ms to 0.567ms as we vary the execution

stack from 2 to 15 frames. To study the effect of data brick serialization, we vary the data brick

size from 2KB to 16KB, while using a fixed size code brick (1197 bytes) and two fixed size

stack frames (131 bytes).

Commonly, the data bricks in an SM consist of a mixture of objects and primitive types.

We use two types of data bricks in this evaluation: an array of integers, and an array of objects.

The serialization costs for these two data bricks provide practical lower and upper bounds for

the cost of data brick serialization. The object array represents an upper bound since each of its

elements causes a call to the top level VM serialization method. The integer array represents a

lower bound since it involves only one call to the top level VM serialization method.

Figure 5.3 shows that the serialization cost is below 6ms for data bricks as large as 16KB.

SMs commonly process data at its source and therefore carry little data. The

applications that we have developed carry less than 2KB, which costs less than 1ms to serialize.

Figure 5.4 presents the de-serialization cost for the same data bricks. We observe that

the de-serialization cost is as much as 30% higher than the cost of serialization due to memory

allocation during object de-serialization.

SM Transfer. The variation of execution control state size is small compared to that of

code bricks and data bricks. Thus, we only consider the effect of code bricks and data bricks in

the subsequent experiments. We have performed two sets of experiments to evaluate the cost of

migration (serialization, transfer, de-serialization) for different code brick and data brick sizes.

In the first set, we vary the code brick size while keeping the data brick size and stack frame


Figure 5.5: Effect of Code Brick Size on Single Hop Migration

Figure 5.6: Effect of Data Brick Size on Single Hop Migration

size fixed at 53 bytes and 131 bytes, respectively. In the second experiment, we vary the data

brick size while keeping the code brick size and stack frame size fixed at 1197 bytes and 131

bytes. Figures 5.5 and 5.6 show the results of these two experiments.

The values in Figure 5.5 represent the total time for single hop migration in two situations:

the code is not cached, and the code is cached. The time to transfer the SM when the code is

cached is constant and represents the overhead of the three-way handshake protocol. Figure 5.6

shows that the data brick size contributes significantly to the total cost of migration. Thus, it is

important to have a serialization scheme with minimal space overhead.

5.2.3 Tag Space Operations

Table 5.3 shows the cost of the tag space operations for application tags. The readTag primitive

has the lowest cost since it performs the fewest operations; when an SM reads a tag,

the interpreter acquires a lock, performs a lookup in the tag space, verifies the access rights,

and returns the data to the SM. The writeTag operation costs slightly more since the interpreter

has to check for and unblock any SMs blocked on the tag. The blockSM operation costs more

than both readTag and writeTag since it also needs to append the SM to the SM blocked queue

and suspend the VM-level thread. The deleteTag primitive has the second highest cost since the

interpreter needs to wake up all SMs blocked on the tag, remove the timer for the tag lifetime,

and remove the tag structure from the tag space, while the createTag primitive has the highest

cost since it involves additional steps to register a timer for the tag lifetime and create access


Operation   Time (µs)
createTag   101.781
deleteTag   75.071
readTag     34.548
writeTag    50.289
blockSM     59.844

Table 5.3: Cost of Tag Space Primitives for Application Tags

Tag Name               Time (ms)
gps location           0.20
neighborlist           0.34
imagecapture (32 Kb)   341.23
light sensor           0.11
batterylifetime        25.63
systemtime             0.09
free memory            0.12

Table 5.4: Cost of Reading I/O Tags

Figure 5.7: Network Topology for Routing Experiments

control data structures.
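The relative costs above follow from the amount of work each primitive performs, which the following sketch makes explicit. It is an illustration under stated assumptions rather than the prototype's implementation: SMs are represented by string identifiers, the access control list is a plain set of identifiers, blockSM requires the tag to exist, the tag lifetime timer is omitted, and waking an SM is modeled by returning its identifier to the caller.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the tag space primitives for application tags.
class TagSpaceSketch {
    private static final class Tag {
        Object value;
        final Set<String> acl;                              // SMs allowed to access the tag
        final Deque<String> blockedSms = new ArrayDeque<>(); // SMs waiting on the tag
        Tag(Object value, Set<String> acl) { this.value = value; this.acl = acl; }
    }

    private final Map<String, Tag> tags = new HashMap<>();

    // createTag: build the tag and its access control data (timer omitted here).
    synchronized void createTag(String name, Object value, Set<String> acl) {
        tags.put(name, new Tag(value, acl));
    }

    // readTag: lock, lookup, access check, return the data.
    synchronized Object readTag(String smId, String name) {
        return lookup(smId, name).value;
    }

    // writeTag: as readTag, plus unblocking any SMs blocked on the tag.
    synchronized List<String> writeTag(String smId, String name, Object value) {
        Tag tag = lookup(smId, name);
        tag.value = value;
        return wakeAll(tag);
    }

    // blockSM: append the SM to the tag's blocked queue.
    synchronized void blockSM(String smId, String name) {
        lookup(smId, name).blockedSms.addLast(smId);
    }

    // deleteTag: wake all blocked SMs and remove the tag structure.
    synchronized List<String> deleteTag(String smId, String name) {
        Tag tag = lookup(smId, name);
        tags.remove(name);
        return wakeAll(tag);
    }

    private Tag lookup(String smId, String name) {
        Tag tag = tags.get(name);
        if (tag == null) throw new IllegalArgumentException("no such tag: " + name);
        if (!tag.acl.contains(smId)) throw new SecurityException("access denied: " + smId);
        return tag;
    }

    private static List<String> wakeAll(Tag tag) {
        List<String> woken = new ArrayList<>(tag.blockedSms);
        tag.blockedSms.clear();
        return woken;
    }
}
```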

Table 5.4 presents the access time to several I/O tags that are currently implemented in

our prototype: GPS location query, neighbor discovery, camera image capture, light sensor,

and system status inquiry (battery lifetime, system time, and amount of free memory). The

gps location tag is updated by a user-level process which reads from the GPS serial interface. The

locations of the neighbors, along with their identifiers, are returned by reading the neighborlist

tag. This tag is typically used by geographical routing algorithms carried and executed by

SMs. To get the information about neighbor nodes, we have implemented a neighbor discovery

protocol which maintains a cache of known neighbors. For the imagecapture tag, the I/O

handler converts the image received from the camera from YUYV format to RGB format before

returning it to the SM. All the other tag values are obtained directly from Linux using system

calls.

5.2.4 Routing Algorithms

We present the evaluation of two simple SM routing algorithms (geographical and on-demand

content-based) executed over our SM prototype. Since one of these routing algorithms might


Routing Algorithm   Code not cached (ms)   Code cached (ms)
Geographical        415.6                  126.6
On-demand           506.6                  314.7

Table 5.5: Completion Time for Routing Algorithms

be more suitable than the other for some applications, we do not intend to compare them. In

fact, a judicious use of both algorithms might yield significantly better results than either of them

alone.

Our goals in conducting this experimental evaluation study were three-fold: (1) to demon-

strate the flexibility of the SM architecture for application-level self-routing, (2) to understand

the re-programmability issues in NES, and (3) to explore the influence of code caching on our

unattended re-programmable system. Our testbed consists of eight HP iPAQs running Linux

and using Orinoco 802.11b PC cards for wireless communication. The network topology is

typically four hops across (see Figure 5.7). The SM starts at the grey node and discovers the tag

of interest at the black node using geographical routing or on-demand content-based routing.

In the first experiment, we measure the completion time of an SM using geographical rout-

ing. The SM routes itself from the grey node to the black node and returns on a different path.

The round-trip time for this task is 415.6 ms (Table 5.5). At the beginning of our experiments,

there was no SM program (or routing) installed at any node. Therefore, the result also includes

the latency imposed by programming the network. The program size of our SM with geograph-

ical routing is approximately 4.4KB. To factor out the installation latency, we study the impact

of code caching on this experiment by re-running the same SM at the grey node. The second

execution of the same SM (the code is cached by all nodes) takes only 126.6 ms (or 3.2 times

faster).

We also conduct a similar experiment for an SM with on-demand content-based routing.

When the code is not cached, the route discovery time for this SM is 506.6 ms. This result is a

bit surprising, given that the program size of this route discovery SM is only 2.8KB. However,

the result is reasonable given the significant delay imposed by the wireless contention (due

to route discovery flooding). When the code is cached, the route discovery time for this SM

decreases to 314.7 ms (or only 1.6 times faster). Understandably, one might also expect a 3-

times speedup for this SM after code caching. However, the impact of code caching is less


Figure 5.8: Route Discovery in EZCab

Figure 5.9: Cab Booking following a Route Discovery in EZCab

evident when the program size is smaller, given an unavoidable overhead coupled with such

wireless contention.

5.2.5 Application Case Study: EZCab

To demonstrate the feasibility of the SM computing platform for real-world applications, we

have developed EZCab, an application for locating and booking free cabs in densely crowded

traffic environments (like Manhattan, where looking for a free cab can be an annoying experi-

ence). We envision that the use of embedded devices in cars will soon become a reality [60, 65].

Instead of calling a cab company or merely “gesturing” to negotiate a cab for her destination,

a client can simply inject an SM through her handheld device to perform the same action

seamlessly. Unlike the existing solutions for inter-car communication that are based on certain

infrastructures (which are expensive, cannot be deployed on every road, and provide only limited

information), EZCab uses a peer-to-peer approach whose key benefits are scalability and prac-

ticality. The minimal infrastructure needed by EZCab is the availability of the SM support in

the cabs, a location service (e.g., GPS), and wireless connectivity.

The main component of EZCab is an SM that migrates to a cab identified by a FreeCab tag,

negotiates the price according to a client-established limit, lets the cab know the identity of the

client, and instructs the cab to go to the client’s location. The booking is complete after the cab

sends a message with its identification to the client and the client acknowledges this message.

When the cab arrives at the client’s location, a validation process takes place to ensure that the

client gets her booked cab (and the cab takes the client that booked it). In the following, we

present a brief description of the basic operations in EZCab: (1) discovering the routes to free


cabs, (2) booking a free cab, and (3) performing the validation between the cab and the client.

We conclude the section with an analysis of our application based on experimental results.

Route Discovery. The EZCab application starts at the client node and takes as a parameter

the radius of the circular geographical region to be covered (the maximum number of hops,

maxHops, for which any EZCab SM is allowed to migrate is computed based on this radius and

the transmission range of the nodes). To reach a free cab, the SM uses routing tables that specify,

for each next hop, the probability of reaching a free cab from the current node (similar to probabilistic

routing [72]). If the probability to find a free cab using the existing routes is too low, or there are

no routes at all, the SM creates a route discovery SM and blocks waiting for routes (Figure 5.8

illustrates this process). Each route discovery SM migrates through the network until it arrives

at a node already visited by another discovery SM (i.e., it ends its execution) or reaches the

maximum number of hops that it is allowed to migrate. Once this threshold is reached, the

SM migrates back one hop and reports its current information. This is a recursive process that

builds the routing tables at nodes. We have chosen to wait for replies for a given period of

time because it is difficult to wait for a fixed number of replies in a volatile network (i.e., those

replies may never arrive).

Cab Booking. Booking a cab is a three-way handshake protocol. If a node has routes to

free cabs, the application creates a booking SM to find a free cab and blocks for a certain amount

of time. If the cab is not free, the booking SM chooses the next neighbor greedily (i.e., using

the greatest probability in the routing table), as shown in Figure 5.9. Once a free cab is found,

the SM removes the FreeCab tag, writes the client’s location in the Location tag, and creates a

reporting SM to confirm the booking with the client. Then, it blocks at the cab waiting for an

acknowledgment from the client.
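The greedy next-hop choice made by the booking SM can be sketched as below. This is a minimal illustration, assuming the routing table at a node is a map from neighbor identifier to the estimated probability of reaching a free cab through that neighbor; the class name, the method name, and the minProbability threshold (below which route discovery would be triggered instead) are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of greedy next-hop selection over a probabilistic routing table.
class GreedyNextHop {

    // Returns the neighbor with the greatest probability, or empty if there are
    // no routes or the best probability is below the given threshold.
    static Optional<String> choose(Map<String, Double> routes, double minProbability) {
        String best = null;
        double bestProbability = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> route : routes.entrySet()) {
            if (route.getValue() > bestProbability) {
                best = route.getKey();
                bestProbability = route.getValue();
            }
        }
        if (best == null || bestProbability < minProbability) {
            return Optional.empty(); // caller would start a route discovery instead
        }
        return Optional.of(best);
    }
}
```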

The reporting SM migrates to the client’s location using geographical routing to improve

the efficiency. Once it has informed the client that a cab is on its way, it returns to the cab

with an acknowledgment to let the cab know that the handshake has succeeded. If no reply is

received from a cab after a timeout, EZCab will re-start with a new best route. Consequently,

the booking SM waiting at the cab times out and re-creates a FreeCab tag to reflect the change

in the cab’s status.

Validation. Upon reaching the client’s location, the validation mechanism is initiated. To


make the validation possible, the booking SM carries the public key of the client to the cab,

and the reporting SM carries the public key of the cab to the client. To validate the client, the

cab broadcasts a challenge in the zone by encrypting a text using the client’s public key. The

client, upon receiving the encrypted text, decrypts it using its private key. In turn, it uses the

cab’s public key to encrypt the text again and send it to the cab. If the reply text is identical, the

client is validated.
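The challenge-response exchange above can be sketched with the standard Java cryptography API. The text does not name a cipher, so the use of RSA with OAEP padding, the 2048-bit key size, and the random 16-byte challenge are assumptions, as are the class and method names. The exchange is simulated in one process; message transport is omitted.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;

// Hypothetical sketch of the cab-client validation handshake.
class CabValidationSketch {
    private static final String RSA_OAEP = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] encrypt(PublicKey key, byte[] plain) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(RSA_OAEP);
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return cipher.doFinal(plain);
    }

    static byte[] decrypt(PrivateKey key, byte[] encrypted) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(RSA_OAEP);
        cipher.init(Cipher.DECRYPT_MODE, key);
        return cipher.doFinal(encrypted);
    }

    // bookedClient: public key carried to the cab by the booking SM.
    // responder: private key of whoever answers the broadcast challenge.
    // Any cryptographic failure is treated as a failed validation.
    static boolean validate(KeyPair cab, PublicKey bookedClient, PrivateKey responder) {
        try {
            byte[] challenge = new byte[16];
            new SecureRandom().nextBytes(challenge);
            byte[] broadcast = encrypt(bookedClient, challenge); // cab -> zone
            byte[] recovered = decrypt(responder, broadcast);    // responder side
            byte[] reply = encrypt(cab.getPublic(), recovered);  // responder -> cab
            return Arrays.equals(challenge, decrypt(cab.getPrivate(), reply));
        } catch (GeneralSecurityException e) {
            return false;
        }
    }
}
```

Only the holder of the booked client's private key can recover the challenge, so a responder with a different key fails the comparison (or the decryption itself).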

Analysis. For EZCab, it is of particular importance to evaluate its completion time given re-

alistic configurations. In the following, we present an analysis which demonstrates that EZCab

can cover a circular area up to 1km radius around the client’s location, and the user-perceived

response time is less than 2 seconds. Figure 5.10 shows our EZCab prototype.

The first part of our evaluation tries to determine the maximum distance at which two mov-

ing cars can communicate and the time for which the topology is relatively stable. Using two

HP iPAQs with 802.11 cards for communication, and various mobility scenarios (as much as

170km/h relative speed between two cars moving in opposite directions), we have experienced

a substantial increase in the packet loss rate for distances greater than 60m. We consider this

distance feasible for our target networks (we have also experimented with external antennas

and amplifiers to increase the range to as much as 400m). Given this distance, two cars are in the

communication range of each other for approximately 2 seconds at a relative speed of 120km/h

(i.e., typical speed for two cars moving in opposite directions in a crowded city). Therefore,

our application should complete faster than that in order to reduce the effects of mobility on the

established routes.
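The 2-second figure appears to follow from dividing the 60m range by the relative speed; that reading is an assumption, since the text does not spell out the calculation. Under it, the arithmetic is:

```java
// Hypothetical helper: time two cars need to cover a given distance
// relative to each other at a given relative speed.
class ContactTime {
    static double seconds(double rangeMeters, double relativeSpeedKmh) {
        double metersPerSecond = relativeSpeedKmh * 1000.0 / 3600.0;
        return rangeMeters / metersPerSecond;
    }
}
```

For a 60m range and a 120km/h relative speed (about 33.3 m/s), this gives roughly 1.8 seconds, consistent with the approximately 2 seconds quoted above.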

The second part of our evaluation examines whether EZCab can finish within this time bound, and

how big the geographical region covered by our application is (i.e., a bigger region increases the

probability to locate a free cab). The response time for EZCab is defined as the time spent until

the client receives a confirmation from the cab. The design of EZCab makes it easy to bound

this response time. All the main operations (route discovery, booking a cab, and reporting a

booked cab) are bounded by a timeout. Therefore, the maximum response time for a successful

booking is the sum of the timeouts for route discovery and booking a cab.

We compute the timeouts for each SM generated by EZCab as the products of the round

trip time of each SM between two nodes (RTT) and the maximum number of hops traveled


Figure 5.10: EZCab Prototype

Figure 5.11: Estimated Completion Time for EZCab

by an SM (maxHops). We further assume that all cabs have the code cached. SMs transfer

only small size data bricks and execution control state when the code is cached. Hence, the

measured values of the RTTs for the three SMs are almost identical (24.3ms, 25.4ms, 25.1ms).

To include the costs of SM execution and wireless contention, we consider, conservatively, a

value three times greater. Since booking a cab and reporting back to the client do not involve

any broadcast, we just double the minimum timeout value for booking a cab and reporting back

to the client.
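The timeout computation can be sketched as follows under one reading of the text: each timeout is the per-hop RTT, scaled by a safety factor, times maxHops, with a factor of three for route discovery and two for booking, and the maximum response time is the sum of the two. The class and method names are hypothetical, and taking the covered radius as maxHops times the transmission range is an additional assumption. The sketch does not attempt to reproduce Figure 5.11.

```java
// Hypothetical sketch of the EZCab timeout arithmetic.
class EzCabTimeouts {

    // Timeout for one SM: scaled per-hop round trip time times the hop limit.
    static double timeoutMs(double rttMs, double safetyFactor, int maxHops) {
        return rttMs * safetyFactor * maxHops;
    }

    // Upper bound on the response time of a successful booking:
    // route discovery timeout plus booking timeout.
    static double maxResponseMs(double rttMs, int maxHops) {
        double routeDiscovery = timeoutMs(rttMs, 3.0, maxHops); // assumed factor
        double booking = timeoutMs(rttMs, 2.0, maxHops);        // assumed factor
        return routeDiscovery + booking;
    }

    // Assumption: each hop extends the covered radius by one transmission range.
    static double coveredRadiusMeters(int maxHops, double rangeMeters) {
        return maxHops * rangeMeters;
    }
}
```

With a 25ms RTT and 16 hops of 60m each (a radius of 960m), these assumptions give a bound of 2000ms, in the neighborhood of the figures reported in this section.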

Figure 5.11 shows the response time as a function of the size of the covered region. The re-

sults indicate that EZCab can finish in less than 2 seconds for a region radius of approximately

1km even if it needs to perform a route discovery. As determined by the first part of our evalua-

tion, we expect the network topology to be relatively stable during this time period. Therefore,

we conclude that SMs offer the flexibility to program the EZCab application without any in-

frastructure, and the analysis of the actual SM implementation demonstrates the feasibility of

EZCab for densely populated cities.

5.3 Spatial Programming using Smart Messages

SP requires a set of programming constructs that have to be exposed to programmers and a

runtime system to support the model. The constructs can be added as extensions to any pro-

gramming language or implemented as library calls. In this section, we describe the SP imple-

mentation using SMs. Under this implementation, SP applications are Java programs. The SP


Figure 5.12: Implementation of Spatial References with Smart Messages

programming constructs can be invoked as Java methods, which are supported by our SM-based

runtime system.

An SM-based runtime system is suitable for SP not only because SMs provide the abil-

ity to re-program the network on-the-fly, but also because the tag space offers a simple and

uniform interface for accessing data or services at nodes. Additionally, SP benefits from the

SM self-routing mechanism; in reaching a node, the runtime system may use different routing

algorithms and change the routing dynamically.

The main idea in our implementation is to translate high level SP programs into SMs.

However, SP programs (written in Java) are not aware of the underlying SMs. To use the

SM-based runtime system, they have to follow three simple rules: (1) extend an SMWrapper

class which provides methods for the SP programming constructs, (2) initialize the SMWrapper

by passing the class names for all classes that do not belong to our SM distribution (i.e., in order

to be incorporated in the SM as code bricks), (3) use only class member variables (in this way,

the SMWrapper knows what data needs to be transferred as data bricks). Under these rules, SP


applications are just normal Java programs that access transparently network resources using

spatial references.

At initialization, the SMWrapper creates the mapping table which maintains the mappings

between spatial references and nodes. Also, the SMWrapper includes the code and data bricks

for two routing algorithms: geographical routing (to reach the space of interest), and space-bound

content-based routing (to reach a node of interest within a given space). After the initialization

is done, the SMWrapper creates and injects a new SM in the network. This SM includes

the code and data for the SP program.

Essentially, the SMWrapper performs the SP-to-SM translation by transforming each ac-

cess to a network resource (read/write) into an SM migration. Figure 5.12 illustrates the main

steps necessary to read/write a resource located on a referenced node. For both mapped and

unmapped spatial references, the SM migrates to the desired space using geographical routing.

We have implemented a greedy geographical routing similar to GPSR [48].

When the space is reached, the SM checks if the spatial reference exists by performing

a lookup in the mapping table. In the left part of the figure, we show how to reach a node

referenced by an existing spatial reference (i.e., reference consistency). The right part shows

how a new node of interest is found and mapped to a spatial reference.

If the reference does not exist, the SM has to discover a node of interest in the given space.

Therefore, it dynamically switches its routing to a content-based on-demand routing (similar to

AODV [68]) which is used to discover a node of interest. Due to its limited geographical scope,

flooding does not represent a major problem for scalability. Once a matching node is found,

the SM assigns a unique network address to this node by creating a unique tagID in this node’s

tag space. Subsequently, the tagID and the location of the node are stored in the associated

mapping table entry. In the process of mapping a new spatial reference, the mapped nodes

having the same space-tag pair must be avoided (i.e., the application asked for a new node). To

solve this problem, we retrieve the list of unique tagIDs corresponding to the mapped nodes

and pass it to the routing algorithm. It is the responsibility of the routing to find an unmapped

node.
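The mapping table operations used in this process can be sketched as below. This is an illustration under stated assumptions: the class, field, and method names are hypothetical, a location is reduced to a pair of coordinates, and the table is an in-memory map keyed by (space, tag, index), whereas in the prototype it travels in the SM data brick.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the spatial reference mapping table.
class MappingTableSketch {
    static final class Key {
        final String space;
        final String tag;
        final int index;
        Key(String space, String tag, int index) {
            this.space = space; this.tag = tag; this.index = index;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return index == k.index && space.equals(k.space) && tag.equals(k.tag);
        }
        @Override public int hashCode() { return Objects.hash(space, tag, index); }
    }

    static final class Mapping {
        final String tagId; // unique tag created in the mapped node's tag space
        final double x;     // last known location of the mapped node
        final double y;
        Mapping(String tagId, double x, double y) {
            this.tagId = tagId; this.x = x; this.y = y;
        }
    }

    private final Map<Key, Mapping> table = new HashMap<>();

    // Returns the mapping for an existing spatial reference, or null if unmapped.
    Mapping lookup(String space, String tag, int index) {
        return table.get(new Key(space, tag, index));
    }

    // Records a newly discovered node for a spatial reference.
    void map(String space, String tag, int index, String tagId, double x, double y) {
        table.put(new Key(space, tag, index), new Mapping(tagId, x, y));
    }

    // Unique tag IDs already mapped for this space-tag pair; these nodes are
    // ineligible when the routing searches for a node for a new reference.
    List<String> mappedNodes(String space, String tag) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<Key, Mapping> entry : table.entrySet()) {
            Key key = entry.getKey();
            if (key.space.equals(space) && key.tag.equals(tag)) {
                ids.add(entry.getValue().tagId);
            }
        }
        return ids;
    }
}
```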

To ensure reference consistency, subsequent accesses to an existing spatial reference must

reach the same node. Therefore, the SM retrieves the location of the mapped node from the


// Instruction in an SP Application
image = {Hill1:Camera[1], timeout}.image;

// Content of Mapping Table for the Above Spatial Reference
// (space, tag, index) = (unique_tag, location)
{Hill1, Camera, 1} = {yU78GH5, location}

// SMWrapper Code to Access the Above Spatial Reference
try {
    // First try the last known location of the mapped node
    location = MappingTable.getLocation(Hill1, Camera, 1);
    GeographicalRouting.migrate(location, timeout);
} catch (Exception e) { // LocationUnreachable or Timeout
    try {
        // Fall back to searching the same space for the node's unique tag
        unique_tag = MappingTable.getUniqueTag(Hill1, Camera, 1);
        ContentBasedRouting.migrate(unique_tag, Hill1, timeout);
    } catch (TimeoutException te) {
        throw te; // node not found in Hill1 within the timeout
    }
}
return TagSpace.readTag("image");

Figure 5.13: Example of Spatial Reference Access

mapping table and migrates directly to this location. According to spatial references’ semantics,

if the node is not present at that location anymore, the SM will try to reach it in the same space

region using its unique tagID.

When the node of interest is reached, the SP program resumes its execution (i.e., it starts

with the read/write operation which triggered the entire migration process). The tag space

primitives are used to give the application access to local resources. If a node of interest is

not found during the time interval specified by the application, or the space is unreachable, an

exception is thrown to let the application decide further actions.

Figure 5.13 illustrates the main operations performed for a spatial reference access. The

SP application uses an already mapped spatial reference to read an image at a Camera node on

Hill1. The mapping table (carried in the SM data brick) contains the unique tag created on this

node at the time of the first access, as well as the location of this node. The SM tries first to use

geographical routing to migrate to the referenced node. If this operation fails, either because

of a timeout or because the location is unreachable (e.g., the node might have moved from its

previously recorded location), the SM tries to use the space-bound content-based routing to find


the node. If the node is still in the same space region (Hill1), the SM migrates to it, reads the

image tag, and returns its value. Otherwise, the SP application receives a timeout exception.

5.4 Spatial Programming Evaluation

This section presents the implementation and evaluation of an SP application executed over our

SM-based runtime system. We have evaluated this application on a testbed consisting of ten HP

iPAQs. For wireless communication, we use Orinoco 802.11b PC cards in ad hoc mode. Each

node supports the Smart Messages (SM) architecture. Our goal in conducting this evaluation

study was twofold: (1) to verify the viability of the SP model in terms of ease of programming,

and (2) to analyze the performance of our SM-based runtime system.

The application is similar to the object tracking application described in Chapter 2. Essen-

tially, the application (injected by a user from a handheld device) performs intrusion detection

over a monitored space region. It verifies the status of the motion sensors, and if one of them

has detected motion, the application turns on a certain number of cameras to perform face

recognition. After all these cameras have been turned on, the application returns to each of

them to verify the result of the face recognition program. If at least half of the cameras have

recognized a face, the application informs the user that an intruder has been detected.

For this application, some of our nodes are identified by a Camera tag (i.e., they have

an attached video camera), while others are identified by a Light tag (i.e., instead of motion

sensors, we use light sensors incorporated in iPAQs; we consider that motion was detected

when the light intensity is above a certain threshold). The camera nodes also provide tags to

activate the camera and get the result of the face recognition program. Figure 5.16 shows a

typical camera node used in our experiments.

The Java code for this application, presented in Figure 5.14, demonstrates the main benefit

of SP: flexibility to program complex distributed applications in outdoor computing environments

in a simple, network-transparent fashion. The run method shows how spatial references

shield the programmers from the networking details. It also demonstrates reference consis-

tency; the runtime system guarantees that the same cameras which have been activated to per-

form face recognition are turned off after the operation completes. Note that the SM-based


public class IntruderDetection extends SMWrapper {

    public Space userSpace, monitoredSpace;
    public int i, j, count, numSensors, numCameras, timeout, threshold;
    public SpatialReference srLight, srUser;
    public SpatialReference[] srCamera;

    public static void main(String[] args) {
        IntruderDetection intruderDetection = new IntruderDetection();
        // read and store application’s parameters
        String[] userClasses = {"IntruderDetection"};
        intruderDetection.initSMWrapper(userClasses, intruderDetection);
        intruderDetection.run();
    }

    public void run() {
        try {
            for (i = 0; i < numSensors; i++) {
                srLight = getSpatialReference(monitoredSpace, "Light", i, timeout);
                if (((Integer) srLight.read("Intensity")).intValue() > threshold) {
                    // motion detected: turn on the cameras
                    srCamera = new SpatialReference[numCameras];
                    for (j = 0; j < numCameras; j++) {
                        srCamera[j] = getSpatialReference(monitoredSpace, "Camera", j, timeout);
                        srCamera[j].write("Active", "ON");
                    }
                    // return to the same cameras, collect the results, and turn them off
                    for (j = 0, count = 0; j < numCameras; j++) {
                        if (((Boolean) srCamera[j].read("FaceRecognition")).booleanValue())
                            count++;
                        srCamera[j].write("Active", "OFF");
                    }
                    if (count > numCameras / 2) {
                        srUser = getSpatialReference(userSpace, "User", 0, timeout);
                        srUser.write("Message", "intruder detected!!");
                        return;
                    }
                }
            }
        } catch (TimeoutException e) {
            // a node of interest was not reached in time; the application ends
        }
    }
}

Figure 5.14: Java Code for Intrusion Detection Application

runtime system is transparent to the programmer, except in the main method which performs

the initialization (i.e., the SMWrapper is initialized in order to allow it to create the SM that

will carry the SP application through the network).

For experiments, we have considered the simple network topology presented in Figure 5.15.

The response time is heavily influenced by the size of the payload carried by the SM “incar-

nation” of our SP application. Figure 5.17 presents the breakdown of the SM payload (code,

data, and execution state). The code consists of the SP application and the SM-based runtime library

code (i.e., the SM needs to carry the runtime library code to those nodes where this code is not

cached). We can see that the execution state is small (under 3% of the total size). The biggest

Figure 5.15: The Network Topology for Intrusion Detection Application (legend: user node, light sensor, camera, regular node, monitored space)

Figure 5.16: Typical Camera Node with GPS Receiver Attached

Figure 5.17: Smart Message Code Breakdown for Intrusion Detection Application

Figure 5.18: Spatial Programming Runtime Library Code Breakdown

contribution comes from the library code (the sizes of its components are shown in Figure 5.18).

This code, however, is cached at nodes in the common case. In Figure 5.19, we present the

total execution time for the application in two cases: (1) the code is not cached at any node

when the application starts (but the caching is activated in the network), and (2) the SM-based

runtime library code is cached at every node (i.e., only the application code is migrated through

the network). In this experiment, we do not perform the face recognition because our goal is to

evaluate the performance of the SP runtime system (i.e., the execution time for the face recog-

nition is an order of magnitude greater than the rest of the application). The results indicate

that our SP runtime implementation based on SMs can achieve good performance, especially

when the runtime library is cached at nodes. We observe that caching leads to a 57% decrease

in the overall response time. The time breakdown shows how each basic operation is affected

by code caching. The time to reach the space of interest and the time to migrate to target

nodes are significantly reduced (as much as 70%). The route discovery time experiences a less

significant decrease due to the unavoidable contention encountered in wireless networks for

flooding-based algorithms.

Figure 5.19: Execution Time for Intrusion Detection Application

5.5 Experiences and Lessons Learned from Building our Prototypes

In this section, we present the experiences and lessons learned from building prototypes for

Smart Messages (SM) and Spatial Programming (SP), as well as testing them on top of ad

hoc networks of PDAs. In many respects, this was pioneering work because the technology

that allows the deployment of wireless embedded systems in the physical world has started to

become more mature (and implicitly commercially available) only in the last two to three years.

The first lesson learned during our initial evaluation of possible hardware and software op-

tions for the prototype is that a good balance needs to exist between the amount of resources

available at nodes and the ease of programming. The hardware limitations, specific to em-

bedded systems, can lead to a very low level programming interface, and consequently, make

programming extremely tedious. From a programmer’s perspective, working with resource-constrained

systems is similar to “going several decades back in time” (i.e., these systems are

similar to the computers used twenty or thirty years ago). The use of extremely limited sys-

tems should be avoided, if possible, for the sake of faster implementation of real prototypes.

The danger is that a certain idea becomes irrelevant if too much time is spent on non-essential

implementation issues.

Since we needed an open platform, we discarded from the beginning any systems

based on Microsoft Windows CE and PalmOS. Thus, the only remaining options were either

TinyOS-based sensors or Linux-based embedded systems. Programming sensor networks us-

ing TinyOS [35] has been demonstrated to be very difficult and time consuming due to the low

level programming interface. Additionally, our target systems were significantly more power-

ful than sensors. Therefore, we have focused on systems capable of running Linux. One of our

original ideas was to build a prototype using Axis micro-controllers [3] and Bluetooth for com-

munication. Although these micro-controllers ran a reduced version of Linux (microLinux),

writing distributed applications on top of them has been difficult for two main reasons. The

first was the lack of a memory management unit on these micro-controllers. Hence, it was

very easy to write buggy programs that crashed the entire system. The second was the

immaturity of Bluetooth technology at that time. The Bluetooth chips had bugs that

ultimately rendered them unusable.

Finally, we ended up with HP iPAQ PDAs running Linux and communicating through

802.11 PC cards. The iPAQs provided a good balance between the amount of resources (much

less than traditional PCs, but enough to develop our prototype) and ease of programming. From

a programmer’s point of view, their best feature was the ability to run an unmodified Linux kernel.

Thus, we have been able to develop our prototype on PCs using traditional programming

tools for Linux, and then cross-compile it for ARM-based processors (iPAQs have a 206 MHz

StrongARM processor). From a hardware point of view, iPAQs come equipped with two PC

card slots and a serial interface. For instance, this configuration allowed us to use one PC card

slot for wireless communication, the other for a video camera, and the serial interface for a GPS

receiver. Such a node is relatively powerful and can be used in non-trivial outdoor distributed

applications.

A problem that we faced during our experiments was the accuracy of raw GPS data, which

sometimes varies significantly. However, this problem can be solved using various “smoothing”

techniques. In a different work [26], we have shown how such techniques can improve the

localization of cars on the roads despite the relative inaccuracy of raw GPS data.
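As an illustration only, the following minimal Java sketch shows one simple smoothing technique, an exponential moving average over raw GPS fixes. It is not claimed to be the technique used in [26] or in our prototype; the class name GpsSmoother and the weight alpha are hypothetical, and the sketch assumes fixes are given as decimal latitude/longitude pairs that do not cross the ±180° longitude boundary.

```java
// Hypothetical helper, not part of the SM/SP prototype: smooths raw GPS fixes
// with an exponential moving average (EMA).
final class GpsSmoother {
    private final double alpha;   // weight of the newest fix, in (0, 1]
    private double lat, lon;      // current smoothed position
    private boolean initialized = false;

    GpsSmoother(double alpha) {
        if (alpha <= 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    // Feeds one raw fix and returns the smoothed {latitude, longitude}.
    double[] update(double rawLat, double rawLon) {
        if (!initialized) {
            // the first fix is taken as-is
            lat = rawLat;
            lon = rawLon;
            initialized = true;
        } else {
            // blend the new fix with the previous smoothed estimate
            lat = alpha * rawLat + (1.0 - alpha) * lat;
            lon = alpha * rawLon + (1.0 - alpha) * lon;
        }
        return new double[] {lat, lon};
    }
}
```

A smaller alpha suppresses more of the jitter in the raw data but makes the estimate lag behind a moving node, so the value would have to be tuned to the expected node speed.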

The biggest problem that we encountered was testing and running experiments in real-life

conditions. The lesson learned is that a good emulator is needed to ensure that the software

functions properly indoors; only after that does it make sense to do outdoor experiments. Otherwise,

the entire process is a huge waste of time for many people (i.e., outdoor experiments

require nodes distributed across large areas; thus, many people are needed).

Testing ad hoc networks with multi-hop communication has presented significant logistical challenges

because it is hard to build topologies for certain experiments when the connectivity

varies a lot between nodes. This variation occurs mostly because of obstacles located between

the nodes. When the nodes are mobile, maintaining a certain degree of connectivity while

having multi-hop communication is also very difficult. This is especially true for small

networks, where a node can easily move out of range. In a different testbed [26], we have used

omni-directional antennas to increase the communication range. Another solution we came up

with recently is to emulate mobility indoors. In this way, the implement-test-debug cycle can

be shortened significantly.

Many times we have experienced faults due to lost connectivity or lost GPS coverage. De-

signing robust applications in outdoor computing environments is a problem that needs further

study. Since it is impossible to provide real-time guarantees, our mechanisms are based on

soft deadlines. Currently, it is the programmer’s responsibility to decide what to do when the

application receives a timeout exception generated by a non-satisfied soft deadline. A sys-

tematic study of the problems encountered in outdoor computing environments may help us

improve the system support for fault tolerance, and consequently, relieve the programmer from

the burden of taking care of all the exceptions.
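To make the programmer’s side of this concrete, the sketch below shows one possible policy for reacting to a missed soft deadline: retry the operation with a doubled timeout and fall back to a default value if every attempt fails. This is a hedged illustration, not the mechanism of our runtime system; SoftDeadlineException, TimedOperation, and SoftDeadlineRetry are hypothetical stand-ins introduced here, and the prototype’s own TimeoutException and spatial reference API are not reproduced.

```java
// Hypothetical stand-ins for illustration only; the prototype's own
// TimeoutException and spatial-reference API are not reproduced here.
class SoftDeadlineException extends Exception {
    SoftDeadlineException(String message) { super(message); }
}

// An operation that either completes within the given soft deadline
// or signals that the deadline was missed.
interface TimedOperation<T> {
    T run(int timeoutMs) throws SoftDeadlineException;
}

final class SoftDeadlineRetry {
    // Runs op up to maxAttempts times, doubling the soft deadline after each
    // timeout; returns fallback if every attempt misses its deadline.
    static <T> T withRetry(TimedOperation<T> op, int timeoutMs, int maxAttempts, T fallback) {
        int timeout = timeoutMs;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.run(timeout);
            } catch (SoftDeadlineException e) {
                timeout *= 2; // give the next attempt more time
            }
        }
        return fallback;
    }
}
```

In an SP application, the operation passed to such a helper could wrap a read on a spatial reference; whether retrying, skipping the node, or aborting is appropriate remains an application-level decision.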

One problem we have observed is that programmers have their mind set on the traditional

message passing model (send/receive between two fixed end-points), and it takes a certain

amount of time until they learn how to program with migration when writing SM-based dis-

tributed applications. Furthermore, to make execution migration efficient, we require the pro-

grammers to explicitly specify the data that needs to be accessed across migrations. The most

common bug encountered in SMs was forgetting to include a certain variable in a data brick

(i.e., forgetting to make it a global object variable in that data brick). This situation occurs in

more complex programs such as those using recursion.
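The effect of this bug can be illustrated with a loose analogy in standard Java. The sketch below does not use the SM API: ordinary Java serialization stands in for a migration hop, and a transient field stands in for a variable that was left out of a data brick. The class MigratingState and its fields are hypothetical.

```java
// Loose analogy only: standard Java serialization stands in for SM migration,
// and a transient field stands in for a variable left out of a data brick.
final class MigratingState implements java.io.Serializable {
    private static final long serialVersionUID = 1L;

    int count;                 // carried across the simulated migration
    transient int forgotten;   // NOT carried, like a variable missing from a data brick

    // Simulates one migration hop by serializing and deserializing the state.
    static MigratingState migrate(MigratingState s) {
        try {
            java.io.ByteArrayOutputStream bytes = new java.io.ByteArrayOutputStream();
            try (java.io.ObjectOutputStream out = new java.io.ObjectOutputStream(bytes)) {
                out.writeObject(s);
            }
            try (java.io.ObjectInputStream in = new java.io.ObjectInputStream(
                    new java.io.ByteArrayInputStream(bytes.toByteArray()))) {
                return (MigratingState) in.readObject();
            }
        } catch (java.io.IOException | ClassNotFoundException e) {
            throw new IllegalStateException("simulated migration failed", e);
        }
    }
}
```

After the simulated hop, count keeps its value while forgotten is silently reset to its default of zero. No error is raised at the point of migration, which is why this kind of bug tends to surface only later in the execution.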

Although both SP and SM can be used to write any type of distributed application, we have

so far used them successfully only for relatively simple, sequential programs. Commonly, we had

one or a few applications running concurrently in the network. Additionally, these applications

were mostly cooperative applications. Injecting multiple competing applications in the network

should provide better insights into the design and implementation of our systems.

5.6 Summary

In this chapter, we have described the prototype implementation of Smart Messages (SM) and

Spatial Programming (SP) using SMs. We have demonstrated the feasibility of these implemen-

tations through applications executed over ad hoc networks of PDAs (HP iPAQs equipped with

IEEE 802.11 wireless cards). Although difficult to build and test, our prototype has enabled

rapid development of outdoor distributed applications. The experimental results have indicated

that the SP model and the SM system architecture are viable solutions for outdoor distributed

computing.

Chapter 6

Conclusions

With the emergence of outdoor computing environments consisting of massive numbers of net-

worked embedded systems deployed everywhere in the physical world, computing is becoming

pervasive for the first time in history. However, this huge computing infrastructure lacks proper

support for programmability. The volatility, heterogeneity, and scale that characterize the net-

works of embedded systems (NES) make programming these networks a very challenging task.

The main question that this dissertation has tried to answer is:

• Can we take advantage of the ubiquity of NES and program distributed applications on

top of them?

The traditional distributed computing models and system architectures have not been de-

signed for networks such as NES, where the systems are extremely heterogeneous and the

network configuration evolves continuously over time. Therefore, we have raised the following

questions:

• Can we provide a simple and intuitive programming model that allows programmers to

reason about the algorithmic details of the applications rather than spend time coping

with the highly volatile nature of NES?

• Can we provide a common distributed computing platform that supports a cooperative

execution environment across NES for virtually any user-defined application?

This dissertation has presented the design and implementation of Spatial Programming

(SP) and Smart Messages (SM), which provide a programming model and a distributed com-

puting platform for programming distributed applications on top of NES. To the best of our

knowledge, SP is the first attempt to design and implement a location-aware programming

model for outdoor distributed computing. SP offers fine-grained, network-transparent access

to systems embedded in the physical space. Central to SP is the concept of spatial reference,

which defines a virtual name space over NES using the expected locations and properties of

these systems. Programmers use spatial references to access the content or services provided

by nodes in the network in the same way they use variables in a conventional program. The

main benefits of SP are the flexibility and simplicity to program user-defined distributed appli-

cations in highly volatile outdoor computing environments.

The SM architecture provides a common distributed computing platform across NES. SMs

overcome the volatility, heterogeneity, and scale encountered in NES by migrating the execu-

tion to nodes of interest and self-routing between these nodes. The SM system architecture

is suitable for resource constrained systems because it defines a lightweight system support at

nodes, with most of the “intelligence” incorporated into SMs. SMs represent an attractive alter-

native to traditional distributed computing based on end-to-end message passing because they

adapt quickly to highly dynamic networks and provide support for deploying new applications

in existing networks.

To demonstrate the feasibility of the proposed solutions, we have designed and implemented

a prototype system. The experimental results for several applications executed over ad hoc

wireless networks of PDAs have indicated that the SP model and the SM architecture are viable

solutions for outdoor distributed computing. Additionally, simulation results for larger scale

networks have shown the performance benefits of the SM self-routing mechanism.

The conclusion of this dissertation is that, although difficult, programming outdoor dis-

tributed applications is possible when the programming models and system architectures are

specifically designed to address the volatility, heterogeneity, and scale exhibited by networks

of embedded systems.

References

[1] “Java 2 Platform, Micro Edition (J2ME).” http://java.sun.com/j2me/.

[2] “K Virtual Machine.” http://java.sun.com/products/cldc/.

[3] “Axis Communications.” http://www.axis.com.

[4] “Intelligent Transportation Systems, U.S. Department of Transportation.” http://www.its.dot.gov.

[5] “JavaSpaces.” http://wwws.sun.com/software/jini/specs/jini1.1html/js-title.html.

[6] “Linux Devices.” http://www.linuxdevices.com.

[7] “Linux Kernel Procfs.” http://www.kernelnewbies.org/documents/kdoc/procfs-guide/intro.html.

[8] “The Message Passing Interface (MPI) Standard.” http://www-unix.mcs.anl.gov/mpi/.

[9] “Trusted Computing.” http://www.cl.cam.ac.uk/~rja14/tcpa-faq.html.

[10] “XML.” http://www.w3.org/XML/.

[11] M. Accetta, R. Baron, W. Bolosky, D. Golub, R. Rashid, A. Tevanian, and M. Young, “Mach: A new kernel foundation for UNIX development,” in Proceedings of the USENIX 1986 Summer Conference, (Atlanta, GA), July 1986, pp. 93–113.

[12] S. Adhikari, A. Paul, and U. Ramachandran, “D-Stampede: Distributed Programming System for Ubiquitous Computing,” in Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS 2002), (Vienna, Austria), July 2002, pp. 209–216.

[13] W. Adjie-Winoto, E. Schwartz, H. Balakrishnan, and J. Lilley, “The Design and Implementation of an Intentional Naming System,” in Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP 1999), (Charleston, SC), ACM Press, New York, NY, 1999, pp. 186–201.

[14] O. Agesen, “GC Points in a Threaded Environment,” Technical Report SMLI TR-98-70, Sun Microsystems Laboratories, Palo Alto, CA, December 1998.

[15] G. Banavar, J. Beck, E. Gluzberg, J. Munson, J. Sussman, and D. Zukowski, “Challenges: An Application Model for Pervasive Computing,” in Proceedings of the Sixth annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 2000), (Boston, MA), August 2000, pp. 266–274.

[16] B. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol. 13, no. 7, pp. 422–426, July 1970.

[17] B. Blum, P. Nagaraddi, A. Wood, T. Abdelzaher, S. Son, and J. Stankovic, “An Entity Maintenance and Connection Service for Sensor Networks,” in Proceedings of the First International Conference on Mobile Systems, Applications, and Services (MobiSys 2003), (San Francisco, CA), May 2003, pp. 201–214.

[18] P. Bonnet, J. E. Gehrke, and P. Seshadri, “Querying the Physical World,” IEEE Personal Communications, vol. 7, no. 5, pp. 10–15, October 2000.

[19] C. Borcea, C. Intanagonwiwat, P. Kang, U. Kremer, and L. Iftode, “Spatial Programming using Smart Messages: Design and Implementation,” in Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS 2004), (Tokyo, Japan), March 2004, pp. 690–699.

[20] C. Borcea, C. Intanagonwiwat, A. Saxena, and L. Iftode, “Self-Routing in Pervasive Computing Environments using Smart Messages,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom 2003), (Dallas-Fort Worth, TX), March 2003, pp. 87–96.

[21] C. Borcea, D. Iyer, P. Kang, A. Saxena, and L. Iftode, “Cooperative Computing for Distributed Embedded Systems,” in Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS 2002), (Vienna, Austria), July 2002, pp. 227–236.

[22] A. Boulis, C. Han, and M. Srivastava, “Design and Implementation of a Framework for Efficient and Programmable Sensor Networks,” in Proceedings of the First International Conference on Mobile Systems, Applications, and Services (MobiSys 2003), (San Francisco, CA), May 2003, pp. 187–200.

[23] V. Cahill et al., “Using trust for secure collaboration in uncertain environments,” IEEE Pervasive Computing, vol. 2, no. 3, pp. 52–61, 2003.

[24] N. Carriero and D. Gelernter, “Linda in context,” Communications of the ACM, vol. 32, no. 4, pp. 444–458, April 1989.

[25] D. Wetherall, “Active Network Vision and Reality: Lessons from a Capsule-based System,” in Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP 1999), (Charleston, SC), ACM Press, New York, NY, December 1999, pp. 64–79.

[26] S. Dashtinezhad, T. Nadeem, B. Dorohonceanu, C. Borcea, P. Kang, and L. Iftode, “TrafficView: A Driver Assistant Device for Traffic Monitoring based on Car-to-Car Communication,” in Proceedings of the 59th IEEE Semiannual Vehicular Technology Conference, May 2004.

[27] F. Hohl, “Time Limited Blackbox Security: Protecting Mobile Agents from Malicious Hosts,” in G. Vigna, editor, Mobile Agents and Security, volume 1419 of Lecture Notes in Computer Science, pp. 92–113, Springer-Verlag, London, UK, 1998.

[28] V. Galtier, K. Mills, Y. Carlinet, S. Bush, and A. Kulkarni, “Predicting resource demand in heterogeneous active networks,” in Military Communications Conference, 2001 (MILCOM 2001). Communications for Network-Centric Operations: Creating the Information Force, (Washington, D.C.), October 2001, pp. 905–909.

[29] R. Gray, G. Cybenko, D. Kotz, and D. Rus, “Mobile agents: Motivations and state of the art,” in J. Bradshaw, editor, Handbook of Agent Technology, AAAI/MIT Press, 2002.

[30] R. Gray, D. Kotz, G. Cybenko, and D. Rus, “D’Agents: Security in a multiple-language, mobile-agent system,” in G. Vigna, editor, Mobile Agents and Security, volume 1419 of Lecture Notes in Computer Science, pp. 154–187, Springer-Verlag, London, UK, 1998.

[31] R. Grimm et al., “Systems Directions for Pervasive Computing,” in Proceedings of the 8th Workshop on Hot Topics in Operating Systems (HotOS-VIII), (Elmau/Oberbayern, Germany), IEEE Computer Society, Washington, DC, May 2001, pp. 147–151.

[32] M. Gritter and D. Cheriton, “An Architecture for Content Routing Support in the Internet,” in Proceedings of the 3rd USENIX Symposium on Internet Technologies and Systems (USITS 2001), (San Francisco, CA), March 2001, pp. 37–48.

[33] J. Heidemann, F. Silva, C. Intanagonwiwat, R. Govindan, D. Estrin, and D. Ganesan, “Building Efficient Wireless Sensor Networks with Low-Level Naming,” in Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP 2001), (Banff, Canada), ACM Press, New York, NY, October 2001, pp. 146–159.

[34] W. R. Heinzelman, J. Kulik, and H. Balakrishnan, “Adaptive Protocols for Information Dissemination in Wireless Sensor Networks,” in Proceedings of the Fifth annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 1999), (Seattle, WA), ACM Press, New York, NY, August 1999, pp. 174–185.

[35] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, and K. Pister, “System Architecture Directions for Networked Sensors,” in Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), (Cambridge, MA), ACM Press, New York, NY, November 2000, pp. 93–104.

[36] Y. Hu, A. Perrig, and D. Johnson, “Ariadne: a secure on-demand routing protocol for ad hoc networks,” in Proceedings of the 8th annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 2002), (Atlanta, GA), ACM Press, New York, NY, September 2002, pp. 12–23.

[37] L. Iftode, C. Borcea, and P. Kang, “Cooperative Computing in Sensor Networks,” in M. Ilyas, editor, Handbook of Sensor Networks: Compact Wireless and Wired Sensing Systems, CRC Press, July 2004.

[38] L. Iftode, C. Borcea, A. Kochut, C. Intanagonwiwat, and U. Kremer, “Programming Computers Embedded in the Physical World,” in Proceedings of the 9th Workshop on Future Trends of Distributed Computing Systems (FTDCS 2003), May 2003, pp. 78–85.

[39] C. Intanagonwiwat, R. Govindan, and D. Estrin, “Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks,” in Proceedings of the Sixth annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 2000), (Boston, MA), ACM Press, New York, NY, August 2000, pp. 56–67.

[40] J. Elson, L. Girod, and D. Estrin, “Fine-Grained Network Time Synchronization using Reference Broadcasts,” in Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI 2002), December 2002, pp. 64–79.

[41] D. Johnson and D. Maltz, Dynamic Source Routing in Ad Hoc Wireless Networks. T. Imielinski and H. Korth, (Eds.). Kluwer Academic Publishers, 1996.

[42] P. Juang, H. Oki, Y. Wang, M. Martonosi, L. Peh, and D. Rubenstein, “Energy-Efficient Computing for Wildlife Tracking: Design Tradeoffs and Early Experiences with ZebraNet,” in Proceedings of the Tenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), (San Jose, CA), ACM Press, New York, NY, October 2002, pp. 96–107.

[43] B. Jung and G. S. Sukhatme, “Cooperative Tracking using Mobile Robots and Environment-Embedded, Networked Sensors,” in the 2001 IEEE International Symposium on Computational Intelligence in Robotics and Automation.

[44] P. Kang, C. Borcea, G. Xu, A. Saxena, U. Kremer, and L. Iftode, “Smart Messages: A Distributed Computing Platform for Networks of Embedded Systems,” The Computer Journal, Special Focus-Mobile and Pervasive Computing, vol. 47, no. 4, pp. 475–494, July 2004. The British Computer Society. Oxford University Press.

[45] E. Kaplan, editor, Understanding GPS: Principles and Applications. Artech House, 1996.

[46] N. Karnik and A. Tripathi, “Agent Server Architecture for the Ajanta Mobile-Agent System,” in Proceedings of the 1998 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’98), (Las Vegas, NV), July 1998, pp. 66–73.

[47] N. Karnik and A. Tripathi, “Security in the Ajanta Mobile Agent System,” Software Practice and Experience, vol. 31, no. 4, pp. 301–329, January 2001.

[48] B. Karp and H. Kung, “Greedy Perimeter Stateless Routing for Wireless Networks,” in Proceedings of the Sixth annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 2000), (Boston, MA), ACM Press, New York, NY, August 2000, pp. 243–254.

[49] Y.-B. Ko and N. H. Vaidya, “Location-Aided Routing (LAR) in Mobile Ad Hoc Networks,” in Proceedings of the Fourth annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), October 1998, pp. 66–75.

[50] T. Lehman, A. Cozzi, Y. Xiong, J. Gottschalk, V. Vasudevan, S. Landis, P. Davis, B. Khavar, and P. Bowman, “Hitting the distributed computing sweet spot with TSpaces,” Computer Networks: The International Journal of Computer and Telecommunications Networking, vol. 35, no. 4, pp. 457–472, March 2001.

[51] P. Levis and D. Culler, “Maté: A Virtual Machine for Tiny Networked Sensors,” in Proceedings of the Tenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), (San Jose, CA), ACM Press, New York, NY, October 2002, pp. 85–95.

[52] K. Li, “Shared virtual memory on loosely-coupled multiprocessors.” Ph.D. Thesis, Yale University, October 1986. Tech Report YALEU-RR-492.

[53] M. Satyanarayanan, “Pervasive Computing: Vision and Challenges,” IEEE Personal Communications, August 2001.

[54] M. Welsh and G. Mainland, “Programming Sensor Networks Using Abstract Regions,” in Proceedings of the First USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI 2004), March 2004.

[55] S. Madden, M. Franklin, J. Hellerstein, and W. Hong, “TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks,” in Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI), December 2002.

[56] S. Madden, M. Franklin, J. Hellerstein, and W. Hong, “The Design of an Acquisitional Query Processor for Sensor Networks,” in Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, (San Diego, CA), ACM Press, New York, NY, June 2003, pp. 491–502.

[57] S. McCanne and S. Floyd. ns Network Simulator. http://www.isi.edu/nsnam/ns/.

[58] D. Milojicic, F. Douglis, Y. Paindaveine, R. Wheeler, and S. Zhou, “Process migration,” ACM Computing Surveys, vol. 32, no. 3, pp. 241–299, September 2000.

[59] J. Moore, M. Hicks, and S. Nettles, “Practical Programmable Packets,” in Proceedings of the 20th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 2001), (Anchorage, AK), April 2001, pp. 41–50.

[60] R. Morris, J. Jannotti, F. Kaashoek, J. Li, and D. Decouto, “CarNet: A Scalable Ad Hoc Wireless Network System,” in Proceedings of the 9th ACM SIGOPS European Workshop, (Kolding, Denmark), ACM Press, New York, NY, September 2000, pp. 61–65.

[61] S.-Y. Ni, Y.-C. Tseng, Y.-S. Chen, and J.-P. Sheu, “The Broadcast Storm Problem in a Mobile Ad Hoc Network,” in Proceedings of the Fifth Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 1999), (Seattle, WA), 1999, pp. 151–162.

[62] Y. Ni, U. Kremer, and L. Iftode, “Spatial Views: space-aware programming for networks of embedded systems,” in Proceedings of the 16th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2003), (College Station, TX), October 2003.

[63] D. Niculescu and B. Badrinath, “Ad hoc positioning system (APS),” in Proceedings of the GLOBECOM 2001 Conference, 2001.

[64] J. Ousterhout, A. Cherenson, F. Douglis, M. Nelson, and B. Welch, “The Sprite network operating system,” IEEE Computer, vol. 21, no. 2, pp. 23–36, February 1988.

[65] P. Koopman, “Critical Embedded Automotive Networks,” IEEE Micro, vol. 22, no. 4, pp. 14–18, July-August 2002.

[66] E. Palmer, “An Introduction to Citadel - A Secure Crypto Coprocessor for Workstations,” in Proceedings of IFIP SEC’94 Conference, (Curacao, Dutch Antilles), May 1994.

[67] P. Zhou, T. Nadeem, P. Kang, C. Borcea, and L. Iftode, “EZCab: A Cab Booking Application Using Short-Range Wireless Communication,” Technical Report DCS-TR-550, Rutgers University, March 2004.

[68] C. Perkins and E. Royer, “Ad Hoc On Demand Distance Vector Routing,” in Proceedings of the 2nd IEEE Workshop on Mobile Computing Systems and Applications (WMCSA 1999), (New Orleans, LA), February 1999, pp. 90–100.

[69] A. Perrig, R. Szewczyk, V. Wen, D. Culler, and J. Tygar, “SPINS: Security Protocols for Sensor Networks,” in Proceedings of the 7th annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 2001), (Rome, Italy), ACM Press, New York, NY, July 2001, pp. 189–199.

[70] S. Ponnekanti, B. Lee, A. Fox, P. Hanrahan, and T. Winograd, “ICrafter: A Service Framework for Ubiquitous Computing Environments,” in Proceedings of the Third International Conference on Ubiquitous Computing (Ubicomp), (Atlanta, GA), Springer-Verlag, London, UK, September 2001, pp. 56–75.

[71] N. Priyantha, A. Miu, H. Balakrishnan, and S. Teller, “The Cricket Compass for Context-Aware Mobile Applications,” in Proceedings of the 7th annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 2001), ACM Press, New York, NY, July 2001, pp. 1–14.

[72] S. Rhea and J. Kubiatowicz, “Probabilistic Location and Routing,” in Proceedings of the 21st Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM’02), (New York, NY), June 2002, pp. 1248–1257.

[73] M. Roman and R. Campbell, “GAIA: Enabling Active Spaces,” inProceedings of the 9thACM SIGOPS European Workshop, (Kolding, Denmark), ACM Press, New York, NY,September 2000, pp. 229–234.

[74] D. Rosu, K. Schwan, and S. Yalamanchili, “Fara - a framework for adaptive resourceallocation in complex real-time systems,” inProceedings of the Fourth IEEE Real-TimeTechnology and Applications Symposium, (Denver, CO), May 1998, pp. 79–84.

[75] S. Ganeriwal and R. Kumar and M. Srivastava, “Timing-sync Protocol for Sensor Net-works,” in Proceedings of the 1st International Conference on Embedded Networked Sen-sor Systems (Sensys 2003), November 2003, pp. 138–149.

[76] T. Sander and C. Tschudin, “Protecting Mobile Agents against Malicious Hosts,” in G. Vi-gna, editor,Mobile Agents and Security, volume 1419 ofLecture Notes in Computer Sci-ence, pp. 44–60, Springer-Verlag, 1998.

[77] B. Schwartz, A. Jackson, W. Strayer, W. Zhou, R. Rockwell, and C. Partridge, “Smartpackets: Applying active networks to network management,”ACM Transactions on Com-puter Systems, vol. 18, no. 1, pp. 67–88, 2000.

[78] J. Stankovic and K. Ramamritham, “The spring kernel: A new paradigm for real-timesystems,”IEEE Software, vol. 8, pp. 62–72, May 1991.

[79] P. Stanley-Marbell, C. Borcea, K. Nagaraja, and L. Iftode, “Smart messages: A systemarchitecture for large networkws of embedded systems,” inProceedings of HotOS-VIII,May 2001. Position Paper, 2001. Longer version: Rutgers University Technical ReportDCS-TR-430.

[80] P. Stanley-Marbell and L. Iftode, “Scylla: A smart virtual machine for mobile embed-ded systems,” in3rd IEEE Workshop on Mobile Computing Systems and Applications,WMCSA2000, (Monterey, CA), December 2000, pp. 41–50.

[81] I. Stoica, D. Adkins, S. Zhaung, S. Shenker, and S. Surana, “Internet Indirection Infras-tructure,” inProceedings of ACM SIGCOMM ’02, August 2002, pp. 73–86.

[82] T. Abdelzaher and B. Blum and Q. Cao and D. Evans and J. George and S. Georgeand T. He and L. Luo and S. Son and R. Stoleru and J. Stankovic and A. Wood, “En-viroTrack: Towards an Environmental Computing Paradigm for Distributed Sensor Net-works,” in Proceedings of the 24th International Conference on Distributed ComputingSystems (ICDCS 2004), March 2004, pp. 582–589.

[83] A. Vahdat, M. Dahlin, T. Anderson, and A. Aggarwal, “Active Names: Flexible Location and Transport of Wide-Area Resources,” in Proceedings of the Second USENIX Symposium on Internet Technologies and Systems (USITS 1999), (Boulder, CO), October 1999, pp. 151–164.

[84] C. Wan, A. Campbell, and L. Krishnamurthy, “PSFQ: A Reliable Transport Protocol For Wireless Sensor Networks,” in Proceedings of the 1st ACM International Workshop on Wireless Sensor Networks and Applications (WSNA 2002), (Atlanta, GA), ACM Press, New York, NY, September 2002, pp. 1–11.

[85] M. Weiser, “The computer for the twenty-first century,” Scientific American, September 1991.

[86] G. Xu, C. Borcea, and L. Iftode, “Toward a Security Architecture for Smart Messages: Challenges, Solutions, and Open Issues,” in Proceedings of the 1st International Workshop on Mobile Distributed Computing (MDC’03), May 2003.

Vita

Cristian Borcea

Education

Ph.D. Computer Science, Rutgers University, New Jersey (2004)

M.S. Computer Science, Rutgers University, New Jersey (2002)

M.S. Computer Science, Polytechnic University of Bucharest, Romania (1997)

B.S. Computer Science, Polytechnic University of Bucharest, Romania (1996)

Publications

• Nishkam Ravi, Cristian Borcea, Porlin Kang, and Liviu Iftode. “Portable Smart Messages for Ubiquitous Java-Enabled Devices”. Proceedings of The First Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous 2004), August 2004.

• Liviu Iftode, Cristian Borcea, and Porlin Kang, “Cooperative Computing in Sensor Networks”. Handbook of Sensor Networks: Compact Wireless and Wired Sensing Systems, Mohammad Ilyas (ed.), CRC Press, July 2004.

• Porlin Kang, Cristian Borcea, Gang Xu, Akhilesh Saxena, Ulrich Kremer, and Liviu Iftode, “Smart Messages: A Distributed Computing Platform for Networks of Embedded Systems”. The Computer Journal, Special Focus on Mobile and Pervasive Computing, Volume 47, British Computer Society, Oxford University Press, July 2004.

• Sasan Dashtinezhad, Tamer Nadeem, Bogdan Dorohonceanu, Cristian Borcea, Porlin Kang, and Liviu Iftode. “TrafficView: A Driver Assistant Device for Traffic Monitoring based on Car-to-Car Communication”. Proceedings of the 59th IEEE Semiannual Vehicular Technology Conference (VTC 2004 Spring), May 2004.

• Liviu Iftode, Cristian Borcea, Nishkam Ravi, Porlin Kang, and Peng Zhou, “Smart Phone: An Embedded System for Universal Interactions”. Proceedings of the 10th IEEE International Workshop on Future Trends of Distributed Computing Systems (FTDCS 2004), May 2004.

• Cristian Borcea, Chalermek Intanagonwiwat, Porlin Kang, Ulrich Kremer, and Liviu Iftode, “Spatial Programming using Smart Messages: Design and Implementation”. Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS 2004), March 2004.

• Peng Zhou, Tamer Nadeem, Porlin Kang, Cristian Borcea, and Liviu Iftode. “EZCab: A Cab Booking Application Using Short-Range Wireless Communication”, Rutgers University Technical Report DCS-TR-550, March 2004.

• Liviu Iftode, Cristian Borcea, Andrzej Kochut, Chalermek Intanagonwiwat, and Ulrich Kremer, “Programming Computers Embedded in the Physical World”. Proceedings of the 9th IEEE International Workshop on Future Trends of Distributed Computing Systems (FTDCS 2003), May 2003.

• Gang Xu, Cristian Borcea, and Liviu Iftode, “Toward a Security Architecture for Smart Messages: Challenges, Solutions, and Open Issues”. Proceedings of the 1st International Workshop on Mobile Distributed Computing (MDC 2003), May 2003.

• Cristian Borcea, Chalermek Intanagonwiwat, Akhilesh Saxena, and Liviu Iftode, “Self-Routing in Pervasive Computing Environments using Smart Messages”. Proceedings of the 1st IEEE Annual Conference on Pervasive Computing and Communications (PerCom 2003), March 2003.

• Cristian Borcea, Deepa Iyer, Porlin Kang, Akhilesh Saxena, and Liviu Iftode. “Cooperative Computing for Distributed Embedded Systems”. Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS 2002), July 2002.

• Phillip Stanley-Marbell, Cristian Borcea, Kiran Nagaraja, and Liviu Iftode. “Smart Messages: A System Architecture for Large Networks of Embedded Systems”, Proceedings of the 8th Workshop on Hot Topics in Operating Systems (HotOS-VIII), Position Summary, May 2001.
