
N° d’ordre: 2010-ISAL-0052 Année 2010

Thèse

Partage d'informations sensible à la mobilité et à l’intérêt des utilisateurs dans les réseaux mobiles ad-hoc

Présentée devant L’Institut National des Sciences Appliquées de Lyon

(INSA de Lyon)

Pour obtenir Le grade de Docteur

Ecole doctorale INFOMATHS : « Informatique et Mathématiques»

(Spécialité : Informatique)

Par Addisalem Negash Shiferaw

Soutenue le 12 juillet 2010 devant la Commission d’examen

composée de:

Prof. Sylvain Lecomte Université de Valenciennes Rapporteur

Prof. Jean-Marc Pierson Université de Paul Sabatier-Toulouse 3 Rapporteur

Prof. Ernesto Damiani Université de Milan Examinateur

Dr. Richard Chbeir Université de Bourgogne Examinateur

Dr. Dawit Bekele Gouvernance de l’Internet en Afrique Examinateur

Prof. Lionel Brunie INSA de Lyon Directeur de thèse

Dr. Marian Scuturici INSA de Lyon Co-Directeur de thèse

Order No.: 2010-ISAL-0052 Year 2010

Thesis

Mobility and Interest Aware Information Sharing in MANETs

Submitted to the National Institute of Applied Sciences (INSA de Lyon)

In fulfillment of the requirements for the Doctoral Degree

Doctoral School INFOMATHS: « Computer Science and Mathematics »

(Affiliated Area: Computer Science)

Prepared by Addisalem Negash Shiferaw

Defended on 12 July 2010 before

the examination committee:

Prof. Sylvain Lecomte University of Valenciennes Reviewer

Prof. Jean-Marc Pierson University of Paul Sabatier-Toulouse 3 Reviewer

Prof. Ernesto Damiani University of Milan Examiner

Dr. Richard Chbeir University of Bourgogne Examiner

Dr. Dawit Bekele African Regional Bureau Internet Society Examiner

Prof. Lionel Brunie INSA de Lyon Supervisor

Dr. Marian Scuturici INSA de Lyon Co-Supervisor

Remerciements

Plusieurs personnes ont contribué à la préparation et à la réalisation de cette thèse et m’ont apporté leur aide précieuse. C’est un grand plaisir pour moi de saisir cette occasion pour exprimer ma gratitude à toutes.

Tout d'abord, je tiens à transmettre mes chaleureux remerciements à mon directeur de thèse, Prof. Lionel Brunie, pour ses encouragements, ses conseils, son soutien inconditionnel et l’expérience qu’il m’a transmise tout au long de ces années de doctorat. Son énergie perpétuelle et son enthousiasme dans la recherche ont rendu mon séjour dans le laboratoire agréable et enrichissant. En outre, il était toujours présent et prêt à m'aider à surmonter les défis de la vie scolaire et sociale. Je voudrais également remercier sa famille pour son hospitalité et son accueil pendant mon séjour en France.

Je tiens également à remercier mon co-directeur de thèse, Dr. Marian Scuturici. Il a toujours été heureux d'échanger et de discuter de mes travaux de recherche et de me fournir des conseils constructifs.

Mes remerciements vont également aux membres du jury qui ont accepté de rapporter et d’examiner ce travail. Je remercie Prof. Ernesto Damiani d’avoir accepté de présider le jury. J’exprime aussi ma gratitude à Prof. Jean-Marc Pierson et à Prof. Sylvain Lecomte qui ont accepté d’être rapporteurs. Je les remercie pour la lecture approfondie du mémoire et les nombreuses remarques pertinentes qu’ils ont formulées. Enfin, je remercie Dr. Richard Chbeir et Dr. Dawit Bekele pour les questions très intéressantes qui ont contribué à approfondir ma réflexion.

Je suis reconnaissante à l’ambassade de France en Ethiopie d’avoir accepté de financer mes recherches et mon séjour en France. À cet égard, je ne veux pas manquer de mentionner l'hospitalité que j'ai reçue du personnel du CROUS de Lyon. Je tiens aussi à remercier Dr. Dawit Bekele pour avoir facilité le processus administratif concernant ma bourse avec l’ambassade de France en Ethiopie.

Je tiens à remercier tous les membres de ma famille, surtout Negash Shiferaw, Aregash Mamo, Yalemzewd Negash, Yelewtfrie Negash et Helen Negash, pour leurs encouragements et leur soutien indispensables. J’ai de la chance d'avoir Shewangizaw Mengesha, mon fiancé, à mes côtés pendant les moments les plus heureux et les plus tristes. Il a toujours été à mes côtés et a consacré beaucoup de temps à m'aider à résoudre les problèmes que j'ai rencontrés pendant mes études. Je n'oublierai jamais le soutien et l’aide de mes amis et collègues éthiopiens, y compris Dejene Ejigu, Elizabeth Addis, Fana Belay, Netsanet Mitiku, Girma Berhe et Rahel Kifle. Je voudrais aussi remercier la communauté éthiopienne de Lyon qui a contribué de près ou de loin au succès de mon travail.

Je suis reconnaissante à mes amis Yaser Fawaz et Sonia Lajmi, avec qui j'ai eu de très bonnes discussions scientifiques et passé des moments inoubliables tout au long de la thèse. Surtout, je n'oublierai jamais leur soutien dans des moments difficiles tels que les deadlines d’articles, la rédaction de la thèse, etc. Par ailleurs, je voudrais remercier Faiza Najjar pour son soutien et ses conseils lors de l’identification de la problématique de recherche. Je tiens à remercier tous mes collègues et le personnel du LIRIS / INSA, surtout Valérie Lebey, Mabrouka Gheraissa, Talar Atéchian, Omar Hasan, Lyes Limam, Zeina Torbey, Armelle-Natacha Ndjafa-Yakou, Vanessa El-Khoury, Adel Ayara, Christian Vilsmaier, Tobias Mayer, Jingwei Miao, Sonia Ben Mokhtar, Nadia Bennani, Sylvie Calabretto et Elod Egyed-Zsigmond.

Et enfin, le dernier, mais non le moindre, je tiens à remercier Dieu, que ton nom soit honoré et glorifié!

Addisalem Negash Shiferaw, 12 juillet, 2010, Lyon France

Acknowledgments

Several people have contributed and extended their valuable assistance in the preparation and completion of this thesis. It is a pleasure to convey my gratitude to them all in my humble acknowledgment.

First and foremost, I would like to extend my heartfelt thanks to my supervisor, Prof. Lionel Brunie, for his encouragement, guidance and unconditional support throughout my doctoral study. Working with him gave me extraordinary and invaluable experience throughout the research work. His perpetual energy, intelligence and enthusiasm for research made my stay in the laboratory smooth and rewarding. In addition, he was always present and willing to help me overcome the challenges of academic and social life. I would also like to thank his family for the hospitality they showed me during my stay in France.

I would like to thank my co-advisor, Dr. Marian Scuturici. He was always delighted to interact and discuss my research work, and he provided me with valuable ideas and concepts to realize my research.

My thanks also go to the examination committee members who agreed to examine and review this research work. I thank Prof. Ernesto Damiani for agreeing to chair the examination committee. I also express my gratitude to Prof. Jean-Marc Pierson and Prof. Sylvain Lecomte, who agreed to be reviewers. I am grateful for their thorough reading of the thesis and the pertinent remarks they made. Finally, I thank Dr. Richard Chbeir and Dr. Dawit Bekele for posing very interesting questions that helped me deepen my reflection.

I owe many thanks to the French Embassy in Addis Ababa, Ethiopia, for sponsoring my PhD study. In this regard, I cannot fail to mention the hospitality that I received from the staff of CROUS de Lyon. I would also like to extend my special thanks to Dr. Dawit Bekele for facilitating the administrative process concerning my scholarship with the Embassy.

I would like to thank all members of my family, especially Negash Shiferaw, Aregash Mamo, Yalemzewd Negash, Yelewtfrie Negash and Helen Negash, for their indispensable encouragement and support. I am thankful to have Shewangizaw Mengesha, my fiancé, at my side during the happiest and saddest times. He always found time to help me resolve the problems that I encountered during my study. I will never forget the support and help of my Ethiopian colleagues and friends, including Dejene Ejigu, Elizabeth Addis, Fana Belay, Girma Berhe, Netsanet Mitiku and Rahel Kifle. I want to use this opportunity to thank the Ethiopian community in Lyon, who have contributed in one way or another to the success of my research work.

I am grateful to my friends Yaser Fawaz and Sonia Lajmi, with whom I had very good scientific discussions and a wonderful time throughout my study. Above all, I will never forget their support in difficult times, such as during the proofreading of articles and the preparation of the thesis manuscript. I would like to use this occasion to thank Faiza Najjar for her support and advice during the identification of the research problem. I would like to extend my thanks to all colleagues and staff of LIRIS/INSA, especially Valérie Lebey, Mabrouka Gheraissa, Talar Atéchian, Omar Hasan, Zeina Torbey, Armelle-Natacha, Vanessa El-Khoury, Adel Ayara, Christian Vilsmaier, Tobias Mayer, Jingwei Miao, Sonia Ben Mokhtar, Nadia Bennani, Sylvie Calabretto and Elod Egyed-Zsigmond.

Last, but not least, I would like to thank God, may your name be honored and glorified!

Addisalem Negash Shiferaw, July 12, 2010, Lyon, France

Résumé

Le partage d'informations au sein d'un réseau pair à pair mobile est devenu un sujet de

recherche important grâce aux progrès rapides des technologies de communication sans fil

et des dispositifs mobiles intelligents. Le partage d’informations, c'est mettre à disposition

des personnes avec lesquelles on est en contact des données afin de les visualiser, les

modifier ou les télécharger.

Les utilisateurs peuvent partager des informations d’ordre général (par exemple, des

documents portant sur l’éducation ou le tourisme), des informations d’ordre personnel (par

exemple, des photos et des profils personnels), ou des émissions en direct (par exemple,

des émissions radio ou télévisées). Les informations à partager sont, généralement,

présentées sous la forme d'un fichier. Dans ce cas, le partage d'information peut être

considéré comme le partage de fichiers. Cette thèse traite, généralement, le problème de

partage de fichiers.

En général, les utilisateurs nomades communiquent en utilisant des réseaux sans fil

fournis par leurs fournisseurs d’accès (3G et bientôt 4G) ou des points d'accès publics

répartis dans la ville. Toutefois, les réseaux à infrastructure ne sont pas toujours les plus

appropriés vu (i) leur indisponibilité partielle, par exemple, dans les moyens de transport,

à la campagne, etc., (ii) leur coût potentiellement élevé, en particulier pour le partage de

documents multimédia et (iii) la répartition non uniforme de leurs points d’accès publics.

Ainsi, les réseaux mobiles ad-hoc (MANETs) peuvent être une solution plus efficace dans

les endroits où l'installation d'une infrastructure est impossible. Dans un avenir proche, un

MANET sera plus puissant grâce à l’utilisation de la technologie Wi-Fi Direct.

L'objectif de nos travaux de recherche est de concevoir et d’implémenter un système de

partage d’information dans un environnement ad-hoc. Ce système permet aux utilisateurs

de partager les informations où et quand ils ont l'occasion sur MANET. La thèse se

focalise, particulièrement, sur les challenges liés à la mobilité et aux intérêts des

utilisateurs.

Dans un MANET, le partage de l'information est généralement effectué par la

distribution d’annonces et de requêtes. Afin d’éviter la surcharge de l'environnement avec

des annonces et des requêtes inutiles, il est important de concevoir une politique d’annonce

appropriée. Une politique d’annonce spécifie le volume d'informations à annoncer, la période

après laquelle une annonce doit être relancée et le nombre de pairs maximum traversé par

une annonce. Elle doit considérer la consommation et la fourniture de l'information qui

sont liées au temps de connexion des utilisateurs (i.e., le temps qu'ils restent ensemble dans

un MANET) et à leurs contextes. Par conséquent, une politique d’annonce devrait être

paramétrée selon le temps de connexion des utilisateurs et leurs contextes.

Vu la quantité massive d’informations à partager, un contrôle/ filtrage de fichiers est mis

en place pour éviter la surcharge du réseau qui peut empêcher d’aboutir l'activité de

partage. En outre, l’interface minuscule des dispositifs mobiles n’est pas appropriée pour

parcourir tous les fichiers disponibles dans l’environnement. Par conséquent, nous

proposons que les fichiers partageables soient choisis en fonction des intérêts des

utilisateurs.

Dans cette thèse, nous proposons un middleware appelé SAMi pour permettre aux

utilisateurs nomades de partager l'information en fonction de leurs intérêts, de leurs contextes et de

leurs temps de connexion. Nous proposons une approche pour paramétrer les politiques

d’annonces en fonction des profils des utilisateurs et de leurs contextes. Le processus de

paramétrage est effectué semi-automatiquement par l'analyse des activités de partage

d’informations.

SAMi classe hiérarchiquement les fichiers et les présente dans une structure appelée

arborescence de fichiers. Au cours du processus d’annonce, le middleware annonce les

fichiers en utilisant soit (i) une description détaillée (située à un niveau

profond dans l’arborescence de fichiers), soit (ii) une description générale (située à un

niveau peu profond). Cette approche permet à un utilisateur de connaître le potentiel d'un

pair à fournir des informations sans recevoir d’annonces pour chaque fichier partageable.

Ainsi, la diffusion d'une requête est limitée aux seuls pairs ayant le potentiel de fournir les

fichiers demandés.

Les utilisateurs peuvent spécifier leurs intérêts à recevoir ou à fournir des informations

de manière réactive. Les intérêts des utilisateurs peuvent également être automatiquement

déterminés en utilisant les règles d'associations. Ces règles associent les intérêts des

utilisateurs à leur contexte. Nous proposons également d'utiliser les réseaux sociaux pour

faciliter le processus d'identification d'intérêts.

SAMi a été testé dans deux environnements; un simulé et un autre réel en le déployant

sur des dispositifs mobiles reliés entre eux par Bluetooth. Les évaluations qui ont été faites,

nous ont permis de conclure que SAMi a un très bon potentiel pour aider les utilisateurs

nomades à partager l'information en fonction de leurs intérêts. Nos principaux travaux

futurs sont liés à la gestion du contexte et à la vie privée des utilisateurs.

Mots-clés: partage des données, sensibilité à la mobilité, sensibilité aux intérêts,

classification de fichiers, réactivité au contexte, informatique mobile, réseaux ad-hoc

Abstract

Mobile peer-to-peer information sharing has become an important research topic due to

the rapid advancement in wireless communication technologies and smart devices.

Information sharing is the practice of making information available for other individuals to

view, modify and download. Users may share general information (e.g., documents about

education and tourism), personal information (e.g., personal photos and profiles), or live

information (e.g., news being transmitted on the radio). The information to be shared is

usually presented in the form of a file. In this case, information sharing can be regarded as

file sharing. This thesis specifically focuses on issues related to file sharing.

Nowadays, nomadic users usually communicate by using infrastructure-based wireless

networks provided by telecommunication operators (3G and soon 4G) and public

hotspots distributed in the city. However, infrastructure-based wireless networks are not

always adequate because (i) there are places where no infrastructure-based wireless

network exists; (ii) it is costly to use telecommunication networks especially for

multimedia data and (iii) public hotspots are not uniformly distributed. Thus, an

infrastructure-less or a mobile ad-hoc network (MANET) can provide a more efficient

solution in the places where installing an infrastructure is not possible. In the near future, a

MANET will become more powerful with the use of Wi-Fi Direct.

The focus of our research is to build an information sharing system that allows users to

share information wherever and whenever they get the opportunity by using a MANET.

The thesis particularly focuses on the challenges related to the mobility and the interests of

users.

In a MANET, information sharing is usually performed by distributing advertisements

and queries. The preparation and the distribution of an advertisement are guided by an

advertisement policy. An advertisement policy describes the volume of information to be

advertised, the period after which an advertisement can be repeated and the number of

hops that an advertisement traverses. In order not to overload the environment with

unnecessary advertisements and queries, an advertisement policy should be prepared

according to the information consumptions and provisions of users. The information

consumptions and provisions of users are affected by their stay time, i.e., the time that they stay

together in a MANET. Consequently, an advertisement policy should be parameterized

according to the users’ stay time. The users’ stay time is affected by their mobility patterns,

which are expressed by their speeds, movement directions and pause times.
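
As a rough illustration of the three parameters named above, an advertisement policy can be sketched as a small record chosen from the expected stay time. The class name, field names and thresholds below are assumptions made for this sketch only; they are not taken from SAMi, which parameterizes policies semi-automatically from observed sharing activities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdvertisementPolicy:
    """Hypothetical container for the three policy parameters."""
    volume: int              # max number of descriptions per advertisement
    repeat_period_s: float   # seconds before the advertisement may be repeated
    max_hops: int            # number of hops the advertisement may traverse


def policy_for_stay_time(stay_time_s: float) -> AdvertisementPolicy:
    """Pick a policy from the expected stay time.

    The thresholds and values are illustrative assumptions: peers that
    stay together briefly get small, local, frequently repeated
    advertisements; long-lived groups get larger, wider, rarer ones.
    """
    if stay_time_s < 60:
        return AdvertisementPolicy(volume=5, repeat_period_s=10.0, max_hops=1)
    if stay_time_s < 600:
        return AdvertisementPolicy(volume=20, repeat_period_s=60.0, max_hops=2)
    return AdvertisementPolicy(volume=50, repeat_period_s=300.0, max_hops=3)
```

For example, under these assumed thresholds, `policy_for_stay_time(30)` yields a one-hop policy, whereas `policy_for_stay_time(1800)` allows three hops and a larger advertisement volume.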

Furthermore, users have a lot of information to share with each other. If files to be shared

are not controlled, the overloading of information will hinder the sharing activity.

Moreover, the input and the output facilities of mobile phones do not allow nomadic users

to browse all of the sharable files in the vicinity. Therefore, we argue that sharable files

should be selected according to the users’ interests.

In this thesis, we propose an advertisement-based middleware called SAMi to allow

nomadic users to share information according to their interests, contexts and stay times.

We propose an information discovery approach, which is used by SAMi, to parameterize

advertisement policies according to users’ profiles and contexts. The parameterization

process is performed semi-automatically by analyzing users’ information sharing

activities.

SAMi classifies files hierarchically and presents them in a file tree. Files are advertised

according to users’ profiles and contexts. During advertisements, the middleware advertises

files by using descriptions at either a shallow or a deep level of the file tree. This approach

permits a user to know the potentials of a peer in information provision without receiving

advertisements for each sharable file. Thus, the dissemination of a query is limited only to

those peers having the potential to provide the required file.
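
The idea of advertising at a shallow or a deep level of a file tree can be sketched as follows. This is a minimal illustration under assumed names (`FileTreeNode`, `descriptions_at_depth`) and an invented example tree; it does not reproduce SAMi's actual classification or data structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileTreeNode:
    """Hypothetical node of a file tree: a description plus sub-clusters."""
    label: str
    children: List["FileTreeNode"] = field(default_factory=list)


def descriptions_at_depth(node: FileTreeNode, depth: int) -> List[str]:
    """Collect the descriptions to advertise at a given tree depth.

    A small depth gives few, general descriptions; a larger depth gives
    more, detailed ones. Branches that end before the requested depth
    contribute their leaf description.
    """
    if depth == 0 or not node.children:
        return [node.label]
    labels: List[str] = []
    for child in node.children:
        labels.extend(descriptions_at_depth(child, depth - 1))
    return labels


# Invented example tree, for illustration only.
tree = FileTreeNode("photos", [
    FileTreeNode("holidays", [FileTreeNode("beach"), FileTreeNode("mountain")]),
    FileTreeNode("family"),
])

shallow = descriptions_at_depth(tree, 1)  # general descriptions
deep = descriptions_at_depth(tree, 2)     # detailed descriptions
```

In this sketch, a peer receiving the shallow advertisement learns that the sender can provide holiday and family photos without receiving one advertisement per file, and can direct a more specific query to that sender only.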

Users can specify their interests to receive/provide information reactively. Users’

interests can also be automatically determined by using association rules, which associate

users’ interests with their context. We also propose to use the users’ social networks to

facilitate the interest identification processes.

SAMi has been deployed in a simulated environment. It has also been deployed over real

devices interconnected by Bluetooth. From the evaluations that have been made, we have

observed that SAMi has a very good potential to serve nomadic users to share information

according to their interests. Our main future work concerns context management

and the privacy of users.

Keywords: data sharing, mobility awareness, interest awareness, classification of files,

mobile computing, context aware computing, ad-hoc network

Table of Contents

Chapter 1 Introduction

1.1 Background

1.2 Motivation and Requirements

1.3 Research Problem

1.4 Objective

1.5 Research contributions

1.6 Structure of the Thesis

Chapter 2 Related Work

2.1 Information Sharing in Peer to Peer Systems

2.2 Information sharing in MANET

2.3 Service Discovery

2.4 Routing

2.5 Summary

Chapter 3 Interest Awareness

3.1 Motivation

3.2 Definitions

3.3 Interest aware Information Discovery

3.4 Interest Identification

3.5 Social Networking

3.6 Discussion

3.7 Conclusion

Chapter 4 Lifetime Awareness

4.1 Overview

4.2 Formalization

4.3 Mobility Class Generation

4.4 Mobility class Identification

4.5 Conclusion

Chapter 5 File classification and Organization

5.1 Motivation

5.2 Information Representation

5.3 Classification Algorithm

5.4 Information Sharing Based on File Organization

5.5 Discussion

5.6 Conclusion

Chapter 6 Implementation and Evaluation

6.1 SAMi: a Self-Adaptive Middleware

6.2 Implementation

6.3 Evaluation

6.4 Discussion

6.5 Conclusion

Chapter 7 Conclusion and Future Work

7.1 Summary of Contributions

7.2 Conclusion

7.3 Future Work

Glossary of Acronyms

Bibliography

Annex A. Résumé Etendu

Annex B. Detailed Design of SAMi

Annex C. Important classes of SAMi

List of figures

Figure 2-1: System Layers of P2P Information Management Systems

Figure 2-2: Transiently Shared Tuple space

Figure 2-3: An example of the global virtual data structure managed by PeerWare

Figure 3-1: A MANET in Bus 37

Figure 3-2: Advertisement Distribution by p1

Figure 4-1. Augment-Volume

Figure 4-2. Reduce-Volume

Figure 4-3: An example of stay-time computation

Figure 5-1: Query resolution via advertisements about individual files

Figure 5-2: File organization and Query resolution

Figure 5-3: Example of specialized metadata of a photo

Figure 5-4: An example metadata of a cluster

Figure 5-5: Vector representation of a cluster

Figure 5-6: An example of association between a file-tree with mobility classes.

Figure 5-7: The redundancy created by considering all mobility classes

Figure 5-8: A possible result of k-means classification

Figure 6-1: Architecture of SAMi

Figure 6-2: Examples of user agenda and habits

Figure 6-3: Context management in SAMi

Figure 6-4: Example of adaptation process

Figure 6-5: Implementation of ConAMi by the file adaptation module

Figure 6-6: SAMi deployment

Figure 6-7: Component diagram of SAMi

Figure 6-8: Core classes and their relationships

Figure 6-9: Class diagram to manage historical data

Figure 6-10: Classes for information classification

Figure 6-11: A Test bed to simulate a MANET

Figure 6-12: Examples of representation of metadata in local repository

Figure 6-13: Browsing photo by their directory organization

Figure 6-14: Browsing photos by their organization in a file-tree

Figure 6-15: Querying

Figure 6-16: collaboration during photo annotation

Figure 6-17: Deliverability of files for experiment one

Figure 6-18: Deliverability of files for experiment two

Figure 6-19: Performance of interest extraction algorithm

Figure 6-20: Rules to identify information demand

Figure 6-21: Rules to identify mobility classes

Figure 6-22: Content based classification performance

Figure 6-23: Metadata based classification performance

Figure 6-24: Vector production in the PC

Figure 6-25: Performance metadata based classification

Figure 6-26: Advertisement content determination in a mobile phone

Figure 27: Architecture de SAMi

Figure B-1: State of a device

Figure B-2: States of a user

Figure B-3: Activity diagram of advertisement

Figure B-4: Searching information for a user query

Figure B-5: Activity diagram of information extraction

Figure B-6: Activity diagram of query treatment

Figure B-7: File searching from incoming advertisement

Figure B-8: Activity diagram of rule mining

Figure B-9: Activity diagram file representation and classification

Figure B-10: A SAMi-Adapotor yahoo messenger

Figure B-11: SAMi-thin

Figure B-12: SAMi GUI

Figure B-13: Classes in SAMi-GUI

Figure B-14: SAMi-core

Figure B-15: SAMi-ext

List of Tables

Table 2-1: Analysis of P2P systems .......... 27

Table 2-2: Summary of the information sharing systems designed for MANETs .......... 36

Table 2-3: Analyzes of information sharing system of MANETs .......... 38

Table 2-4: Analyzes of service discovery protocols .......... 45

Table 3-1: Examples of sharing contexts .......... 59

Table 3-2: Examples of Information demands of Pascal .......... 60

Table 3-3: Examples of queries .......... 70

Table 3-4: Similarity values calculated by using the formula presented in Definition 3.2 .......... 72

Table 3-5: Example of execution flows during the decomposition of queries .......... 73

Table 3-6: Interests produced from queries listed in .......... 74

Table 3-7: Historical data of Pascal .......... 77

Table 3-8: Tie-Strengths between Pascal, Anne, Bob and Eve .......... 79

Table 4-1: example of a mobility class .......... 90

Table 4-2: Examples of sharing statistics .......... 92

Table 4-3: Range-lifetimes and advertisement volumes of classes .......... 92

Table 4-4: Merging of mobility classes .......... 94

Table 4-5: Pascal’s sharing statistics .......... 104

Table 5-1: Basic description of a file photo .......... 108

Table 5-2: Description of a cluster .......... 109

Table 6-1: The inputs of the test-bed .......... 151

Table 6-2: Types of Environments .......... 152

Table 6-3: Constants for the query extraction algorithm .......... 154

Table 6-4: Characteristics of information demands .......... 155

Table 6-5: Characteristics of sharing-statistics .......... 157

Table 6-6: Constants considered during rule mining evaluation .......... 157

Table 6-7: Range-lifetimes of mobility classes designed for sharing context (“”,∅) .......... 158

Table 6-8: Parameters used classification algorithm in the first experimentation .......... 160

Table 6-9: Inputs for classification algorithm for the second type of experimentation .......... 162

Table 6-10: Test data used during filtering advertisements .......... 164

Table 7-1: Comparing SAMi to existing information sharing systems .......... 172

Table B-1: Important activities to perform advertisement .......... xx

Table B-2: Activities to extract and search information .......... xxiii


Chapter 1 Introduction

1.1 Background

1.1.1 Information Sharing

Information sharing is the practice of making information available for other individuals

to view, modify and download. Users may share general information like documents about

education and tourism. They may also share personal information like personal photos and

profiles. It is also possible to exchange live information like the one being transmitted on

the radio.

The information to be shared is often presented in the form of a file. In this case,

information sharing can be regarded as file sharing. The information to be shared can also

be presented as a stream of data. However, as the most popular information sharing

applications are based on file sharing, this thesis specially focuses on issues related to file

sharing.

Information sharing is accomplished via three activities: information discovery, delivery

and routing. These activities can be managed by using a centralized, a partially centralized

or a purely distributed architecture. In a centralized architecture, dedicated server(s)

manage(s) the sharing activities. In a partially centralized architecture, one or more

administrator peers are responsible for managing the information sharing activities. These

administrator peers can hold dedicated or non-dedicated devices. In a purely distributed

architecture, all peers are equal and they share responsibilities equally. In this thesis, we

consider a purely distributed architecture since finding administrator peers is difficult in a

MANET.

Finally, an information sharing system can be anonymous or social network based. In

anonymous systems, information sharing is performed without considering users’



acquaintances. This feature characterizes old file sharing systems. Social network based

systems allow users to share information according to their social relationships. Especially

in MANETs where resources are limited, exploitation of social networking can facilitate

the collaboration of users.

1.1.2 Mobile Ad-hoc Networks

The initial step towards a MANET was the Packet Radio Network (PRNET). The

architecture of PRNET was quite close to the current view of a MANET. Indeed, a PRNET

comprises mobile terminals and mobile repeaters (prefiguring mobile routers). During the

1990s, a number of projects that were inspired by PRNET led to the development of ad-

hoc routing algorithms, and eventually led to the creation of the IETF MANET group. This

group focused mainly on routing algorithms with various goals but evolved to a broader

research scope. These days, various applications/services can be implemented on MANETs.

Users, who are opportunistically co-located in places like airports, train stations, coffee

shops, pubs, malls, and highways, can use MANETs to share information instantly. Ad-hoc

networks can also be used for entertainment purposes like providing instant connectivity

for multi-user games.

Ad-hoc networks can be deployed to support emergency services when the existing

network infrastructure has ceased to operate or has been damaged by a disaster such as

an earthquake, a hurricane or a fire. Similarly, on a battlefield, a MANET can be

deployed to facilitate communication among the soldiers in the field.

The following features [1, 2] characterize a MANET:

1. Mobility of nodes: The movement of peers cannot be controlled in a MANET.

Peers can move from location to location freely and hence, can leave and join the

network at any time.

2. Lack of infrastructure: As the name implies, a MANET is an infrastructure-less

network. A message from a source peer to a destination peer goes through



multiple peers due to the limited transmission radius. As there is no centralized

control, the network management should be distributed across peers.

3. Scarce resources: Wireless links have limited bandwidth and variable capacity.

In addition, peers participating in a MANET are battery-powered.

In summary, MANETs can provide solutions in situations where infrastructure-based

networks cannot be accessed due to their non-availability or cost. They can also be used

to establish communication efficiently between co-located users. However, the

characteristics of MANETs, i.e., mobility of peers, lack of infrastructure and scarcity of

computing resources, create challenges for the use of MANETs. Thus, in this thesis, our

goal is to design an information sharing middleware that works by considering these

characteristics of MANETs.

1.1.3 Advancement in Mobile Phones and Communication Technologies

Production of mobile devices, mostly cell phones, is increasing exponentially. The

number of subscriptions reached 3.3 billion worldwide in October 2008. Moreover, it is

forecast to reach 5.32 billion by 2013 [3].

Mobile devices have become capable of storing many files and performing complex

computations that were previously handled only by personal computers. They are equipped

with wireless network technologies, sensors and applications; their storage capacities are

increasing steadily; and their processing power has improved dramatically. Cell phones’

battery life is also improving continuously. Today, there are devices that can operate for

more than 8 hours in active mode, i.e., talking without interruption [4].

The introduction of the iPhone [5] drastically changed people’s view of cell phones. A lot of

applications and games have been produced for iPhones. These days, people are using their

iPhones to access emails and social network sites such as Facebook.


These days, most mobile phones are equipped with short-range wireless communication

technologies. In most cases, either Bluetooth or Wi-Fi technology is integrated [6] with

them.

Bluetooth [12] allows devices to communicate over short distances at moderately fast

transmission speeds. Bluetooth provides a wireless point-to-point network for PDAs,

notebooks, printers, mobile phones, audio components, and other devices. Bluetooth

operates in the 2.400 GHz to 2.4835 GHz frequency band (83.5 MHz wide). Typically,

devices with Bluetooth technology have a range of 10 meters to 100 meters and data

transfer rates of up to 3 Mbps. Two or more Bluetooth-enabled devices form a so-called

piconet. In a Bluetooth piconet, one master can communicate with up to 7 active slaves,

while up to 248 other devices can remain in sleep mode (they may participate actively in

the communication when an active device goes into sleep mode). Multiple independent

piconets can form a scatternet. In a scatternet, some slaves act as bridges by

participating in two or more piconets. In Bluetooth scatternets, the number of devices is

not limited.

In 1997, IEEE ratified the 802.11 WLAN standard, establishing a global standard for

implementing and deploying WLANs. The original IEEE 802.11, which is now obsolete, had a

throughput of 2 Mbps. Today's Wi-Fi devices, based on IEEE 802.11a and 802.11g, provide

transmission rates up to 54 Mbps [7]. A new standard called IEEE 802.11n [7] that can

support up to 600 Mbps is being standardized. Wi-Fi devices communicate with each other

with the help of a controller device known as a wireless access point or "hot spot". Hot

spots usually combine three primary functions: physical support for interfacing wireless

and wired networking, routing between devices on the network, and service provisioning to

add and remove devices from the network. The Wi-Fi Alliance is nearing completion of a

new specification, named Wi-Fi Direct, to enable Wi-Fi devices to connect to one another

without wireless access points [8]. It allows devices equipped with Wi-Fi communication

technology (IEEE 802.11a, 802.11g or 802.11n) to take part in an ad-hoc network by

embedding a software access point in these devices.



ZigBee is a low-power, low-cost, low-rate, short-range wireless technology. It is built on

top of the IEEE 802.15.4 WPAN standard [9]. ZigBee radios operate in three different

frequency bands, 868 MHz, 915 MHz and 2.4 GHz, and support data rates of up to 250 kbps

[10]. ZigBee protocols are intended for use in embedded applications requiring low data

rates and low power consumption. ZigBee's current focus is to define a general-purpose,

inexpensive, self-organizing mesh network that can be used for industrial control,

embedded sensing, medical data collection, smoke and intruder warning, building

automation, home automation, etc.

The maturity of communication and computing technologies indicates that MANETs can

feasibly allow mobile devices to communicate with each other anywhere and at any time.

Thus, in this thesis, we place more emphasis on mobile phones. We do not consider any

specific communication technology in our information sharing middleware. However,

Bluetooth is considered during the evaluation of the middleware.

1.2 Motivation and Requirements

1.2.1 Scenario

The following scenario will be used to discuss the requirements of an information

sharing system in MANETs. We will also use this scenario to discuss our propositions

throughout the thesis.

Pascal, a first year Ph.D. student at INSA, uses MANETs to exchange information in

different locations. In a bus, his PDA connects with devices of fellow passengers via

wireless network technologies. Passengers advertise sharable files to others in their

surrounding. Pascal usually browses the advertisements that he has received in order to

discover the files that he is looking for. If he does not find the files that he needs, he

formulates queries describing these files. The required files are then searched for by

querying the neighborhood.


In his office, Pascal searches the Internet for documents that help him reinforce his

research. When he leaves the office, unresolved queries are transferred to his cell phone.

During lunch at the university restaurant and while having a coffee with colleagues in the

university cafeteria, documents matching the queries are searched for in the

neighborhood.

Pascal brainstorms with colleagues and professors in their laboratory. In this

laboratory, discussions are customarily held in a park during the summer and at a

restaurant during the winter. Most of the participants use a laptop to take notes. Pascal

is also required to take some courses at INSA. He notes doubts and questions during

the brainstorming sessions and lectures. From these doubts and questions, queries are

prepared and searched for in the neighborhood. When documents matching the

queries are found in the neighborhood, they are downloaded and saved temporarily until

Pascal approves the download.

On weekends, Pascal likes shopping, watching football matches and going

to nightclubs with his friends. In a shop, Pascal exchanges information about goods. In

nightclubs, Pascal and his friends take photos and share them with each other. Pascal is a

supporter of the football team “Olympique Lyonnais”. Whenever he goes to a stadium, he

exchanges information about Olympique Lyonnais’ players and matches with other

supporters.

1.2.2 Requirements

An information sharing middleware should fulfill the requirements listed below. We

discuss the requirements by using the scenario.

Pervasiveness: nomadic users should be able to share information anywhere, at

any time and using any device. Pascal exchanges information in the morning, at

night, at midday etc. In the scenario, the middleware works in different places like a

nightclub, a restaurant, a cafeteria and a stadium. Laptops, mobile phones and PDAs can be

involved in the information sharing process.



Mobility-awareness: the dynamicity of the environment, which is described by the

mobility of users, determines the quantity of information to be presented. For instance, it is

not necessary to present all the files stored in Pascal’s device to the users in a shop since

they do not have time to check and/or download all the presented files.

Interest-awareness: users’ interests vary with the context of their environment. Pascal is

interested in sharing academic information at school and information about Olympique

Lyonnais’ players and matches at a stadium. He does not have the same interests when he is

with his friends as when he is with his colleagues.

High-level semantics: advertising descriptions of sharable files one by one consumes a

lot of bandwidth and energy. It is important to classify files in order to present

them in groups. For instance, Pascal’s photos can be categorized as photos taken at a

nightclub, on campus, with fellow students and so on.

Context-aware content delivery: content delivery should be adapted to

the context of users and their environment. Online and offline delivery can

be applied according to the dynamicity of the environment. Delivery can be performed in

offline mode, via email for example, if the time that the source and requester peers stay

connected is too short to download the requested file. Distributed content delivery can be

applied if there are several sources of information.

Social awareness: information sharing should be conducted according to social

relationships and networking of users. For instance, photos taken by Pascal at a nightclub

should not be proposed to his co-workers but only to his best friends.

Data dissemination: unlike traditional networks, there are no dedicated routing

infrastructures to disseminate advertisements and queries; this is also true for routing

requests, data and responses. Therefore, efficient routing and dissemination algorithms are

needed in order to share information in a MANET efficiently.
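
As an illustration of the context-aware content delivery requirement, the choice between online and offline delivery can be sketched as a comparison between the estimated transfer time and the expected stay-time. The function name, its parameters and the assumption that throughput and stay-time estimates are available are introduced only for this example; they do not describe the delivery protocol of the middleware.

```python
def choose_delivery_mode(file_size_bytes, throughput_bps, stay_time_s):
    """Return "online" if the file can be transferred while the peers are
    expected to stay connected, otherwise "offline".

    All three inputs are assumed estimates (hypothetical parameters).
    """
    if throughput_bps <= 0:
        # No usable link: the file can only be delivered later (e.g. by email).
        return "offline"
    transfer_time_s = file_size_bytes * 8 / throughput_bps
    return "online" if transfer_time_s <= stay_time_s else "offline"
```

For example, a 3 MB photo sent over a 1 Mbps link needs about 24 seconds, so it would be delivered online when the peers are expected to stay together for a minute and offline when they are expected to separate after 10 seconds.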



1.3 Research Problem

Despite the maturity of the technologies, information sharing over the mobile Internet does

not function as nomadic users expect, due to the following main problems:

• Accessing the Internet from a cell phone (via GSM networks) is still expensive,

especially for exchanging multimedia data and in developing countries.

• Public hotspots, directory, routing and other important services required for

information exchange are not uniformly distributed.

To resolve the above problems, MANETs can be used in places where the Internet cannot be

used due to the non-availability of infrastructure-based networks or due to their cost.

Moreover, the realization of Wi-Fi Direct will dramatically improve the utility of

MANETs.

This thesis aims to propose an information sharing middleware for MANETs.

Information sharing is a popular and mature domain of research. It has been studied

since the invention of computer networking. However, information sharing is taking on

another dimension because of the characteristics of MANETs and of nomadic users. In

this new dimension, information sharing systems should overcome challenges coming

from mobile devices and wireless network technologies. This thesis particularly focuses on

the following two main challenges of information sharing.

Mobility: In a MANET, information sharing is usually performed via the distribution of

advertisements and queries. In order not to overload the environment with unnecessary

advertisements and queries, an advertisement policy should be designed according to the

information consumption and provision of users. The information consumption and

provision of users are limited by their stay-time (the time that they stay together). Users’

stay-time is affected by their mobility patterns, which depend on their speeds, movement

directions and pause times. Therefore, an information sharing middleware should be mobility

aware in order to overcome the challenges arising from the mobility of users. We define

mobility awareness as the design of the advertisement policy according to the

dynamicity of the network. We use the users’ stay-time to measure the dynamicity of the

network.

Interests: Users have a lot of information to share with each other. If the files to be shared

are not controlled, information overload will hinder the use of the information

sharing middleware. Thus, a pervasive information sharing system should be interest

aware. We define interest awareness as the ability to adapt the information discovery

approach according to the users’ interests. In other words, the system needs to capture

users’ interests and determine the information to be shared accordingly.
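
To illustrate how speeds and movement directions bound the stay-time discussed under the mobility challenge, the following sketch estimates how long two users remain within transmission range. It assumes straight-line movement at constant velocity, a fixed circular range and no pause times; these simplifications and all names are hypothetical and are not the model developed in this thesis.

```python
import math

def estimate_stay_time(p1, v1, p2, v2, radio_range):
    """Estimate how long (in seconds) two users stay within radio range.

    p1, p2: (x, y) positions in meters; v1, v2: (vx, vy) velocities in m/s.
    Assumes constant velocities and ignores pause times (illustrative only).
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]      # relative position
    wx, wy = v2[0] - v1[0], v2[1] - v1[1]      # relative velocity
    c = dx * dx + dy * dy - radio_range ** 2
    if c > 0:
        return 0.0                              # already out of range
    a = wx * wx + wy * wy
    if a == 0:
        return math.inf                         # moving together: never separate
    b = 2 * (dx * wx + dy * wy)
    # Largest root of |d + w*t| = radio_range, i.e. the moment the link breaks.
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
```

Under these assumptions, two users who start at the same spot and separate at 1 m/s with a 10 m range stay connected for 10 seconds, whereas users walking side by side at the same velocity never leave each other's range.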

1.4 Objective

The main objective of the thesis is to design a middleware that allows co-located users to

exchange information anywhere and anytime by using MANETs according to their

interests and stay-time (the time they stay together).

The following are our specific objectives:

(1). Design a theoretical model that

• determines sharable files and queries according to the interests of users to

receive and provide information.

• disseminates queries and advertisements according to the users’ interests.

• determines the volume, the radius and the period of advertisements

according to the users’ context and their stay-time.

• identifies the users’ interests to receive and provide information according

to their context and their social networks.

• classifies files hierarchically in such a way that information discovery is

facilitated and simplified.

(2). Design and implement an information sharing middleware that

implements the theoretical model specified by (1) and satisfies the

requirements stated in section 1.2.2.



1.5 Research contributions

The main contributions of this research work are:

Interest awareness: In this thesis, we introduce a concept called Interest that expresses

the information that a user wants to receive or provide. We formalize the concept of interest

as well as its usage in information sharing.

Lifetime awareness: We introduce a concept called mobility class that we use to describe

MANETs according to the users’ stay time and their context in such a way that similar

advertisement policies are applied in MANETs described by the same mobility class.

Formalization, computation and application of mobility classes are discussed in this thesis.

Classification of files: We propose that files are hierarchically classified into a “file tree”.

The dimensions of a file tree, i.e., the height of the tree and the number of clusters at each

depth, are computed based on the mobility classes under consideration. The formation and usage

of file trees are discussed in this thesis.

Adaptable information sharing approach: SAMi (Self Adaptive Middleware) integrates

an adaptable information sharing approach. In SAMi, the delivery of information is

performed from one or more information sources in order to minimize the delivery time

and maximize the chance of obtaining the required information. SAMi can carry out file

discovery by a “push” discovery approach, a “pull” discovery approach or hybrid of the

two. In the “push” approach, a data source makes others aware of its sharable files by

disseminating advertisements; in the “pull” approach, a requester peer searches the

source(s) of a file by distributing queries.

1.6 Structure of the Thesis

This thesis is organized into seven chapters. In this chapter, we have discussed the

motivation behind our research work, our research objectives and the core research issues

of the thesis. Related work on data routing, service discovery and

information sharing is discussed and analyzed in chapter 2. Interest-awareness and

lifetime-awareness are discussed in chapters 3 and 4 respectively. File representation and

organization are covered in chapter 5. SAMi, a middleware that implements the

propositions of the thesis, is discussed in chapter 6. In that chapter, we also present the

implementation and the evaluation of the middleware. Finally, we summarize our research

contributions and highlight future work in chapter 7. Abbreviations used in the thesis are

listed with their descriptions in the glossary. A detailed view of the design of the

middleware is provided in Annex A and Annex B.


Chapter 2 Related Work

Most research efforts on MANETs have been devoted to solving problems

related to information dissemination/routing. In the last decade, research interest

in information sharing and service discovery has also increased. As

MANETs are peer-to-peer (P2P) networks by nature, work on information

sharing in traditional P2P networks can serve as a basis for designing information sharing systems for

MANETs. In this chapter, we review and analyze data routing protocols, information

sharing systems and service discovery protocols designed for MANETs and traditional

peer-to-peer networks.

This chapter is organized as follows. Section 2.1 discusses the information sharing

systems that have been proposed in traditional peer-to-peer systems and analyzes the

possibility of their adoption in MANETs. Section 2.2 presents information sharing systems

designed for MANETs. Section 2.3 discusses service discovery protocols with respect to

the objective of the thesis. Section 2.4 presents routing protocols. Finally, we conclude the

chapter by summarizing the important contributions of the reviewed approaches in section

2.5.


2.1 Information Sharing in Peer to Peer Systems

Peer to peer (P2P) systems are designed to facilitate the sharing of computing resources

and data by direct involvement of participants, which are usually referred to as peers or end-peers.

P2P systems are closely associated with information sharing, though they can also be used to

share other types of resources such as CPU cycles and storage space [11].

Peers are in most cases personal computers. They are autonomous and are assumed to

have equal chance to participate in the consumption and the provision of resources.

As displayed in Figure 2-1, P2P systems are structured in three layers [12]:

infrastructure, application and user. The infrastructure layer focuses on the construction of

a virtual (overlay) network, which is a computer network built on top of the physical

network. The application layer enables communication and collaboration of entities in the

absence of a centralized control. The user layer is used to facilitate social interaction

among users.

Layer 3: User

Layer 2: Application

Layer 1: Infrastructure

Figure 2-1: System Layers of P2P Information Management Systems

According to the links between peers in the overlay network, which is specified by the

infrastructure layer, peer-to-peer networks are classified [13] as unstructured, structured,

and loosely structured. In unstructured P2P networks, the overlay links are established

arbitrarily. In contrast, in order to resolve queries efficiently, the topologies of structured

P2P networks are tightly controlled. In such networks, contents and peers are placed

systematically in the overlay network by using a hash function. Finally, loosely structured

P2P networks are similar to unstructured P2P networks with respect to the links between peers

and similar to structured P2P networks with respect to the placement of files.
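
The systematic placement of contents and peers by a hash function in structured P2P networks can be sketched as follows. This is a generic illustration in the spirit of ring-based distributed hash tables, not the scheme of any particular system reviewed here: peers and file names are hashed onto the same identifier space, and a file is assigned to the first peer whose identifier follows the file's key on the ring. The function names and the 16-bit identifier space are assumptions made for the example.

```python
import hashlib
from bisect import bisect_left

def ident(name, bits=16):
    """Map a peer or file name to an identifier in [0, 2**bits)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

def responsible_peer(file_name, peers, bits=16):
    """Return the peer responsible for file_name on the identifier ring.

    The responsible peer is the first one whose identifier is greater than
    or equal to the file key, wrapping around at the end of the ring.
    """
    ring = sorted((ident(p, bits), p) for p in peers)
    key = ident(file_name, bits)
    idx = bisect_left(ring, (key, ""))
    return ring[idx % len(ring)][1]
```

Because every peer applies the same hash function, any peer can compute where a file should be stored or looked up without flooding the network, which is what makes query resolution efficient in structured overlays.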

According to the communication and the collaboration of entities in the network, the

topology of an unstructured peer-to-peer system can be centralized, pure or hybrid.

In centralized P2P systems, the content is separated from its description.

The description of the content is stored in a centralized server while the content is

stored at the end-peer level. The central server performs file localization but file delivery is

performed in a peer-to-peer manner.

Unstructured pure peer-to-peer systems have a decentralized topology. In such systems,

there is no central directory server. Indices of shared files are stored locally among all

peers. A requester peer is responsible for searching for the files that it needs.

Unstructured hybrid peer-to-peer systems possess features of both centralized and

pure P2P systems. In such systems, some of the peers are used to keep the indices of files

owned by other devices. Such peers are called super peers. Super peers are responsible for

performing file localization.

Finally, according to the social interactions of users, peer-to-peer systems can be further

classified as anonymous peer-to-peer systems and social network based peer-to-peer

systems. Anonymous peer-to-peer systems are designed for users who do not know, or are not

interested in knowing, each other. Social network based peer-to-peer systems are designed for

users having social relationships with each other.

This section is organized as follows. Centralized, pure and hybrid unstructured peer-to-

peer systems are presented in sections 2.1.1, 2.1.2 and 2.1.3 and section 2.1.4 respectively.

Section 2.1.5 discusses structured and loosely structured peer-to-peer systems. Section

2.1.6 discusses social network based peer-to-peer systems. Finally, section 2.1.7 compares,

analyzes and summarizes the studied peer-to-peer systems.



2.1.1 Unstructured centralized peer to peer systems

The topology of a centralized P2P system is very similar to the traditional client/server

model. Napster, which allowed users to exchange songs stored on their own

computers rather than on full-time servers, is an example of a centralized P2P system [14].

The central server is a fundamental entity of any centralized P2P system. The central

server is used to manage the files of the end peers. An end-peer logs onto the system by

informing the central server of its IP address and the indices of the files that it is willing to

share. The central server maintains a directory of the end-peers. This directory is updated

every time users log on or log off.

The server is also responsible for searching files on behalf of the peers [15]. A peer

contacts the server when it needs a file. The server checks its directory and then sends

the requester peer the list of addresses of the peers owning the required file.

Afterwards, the file is downloaded from a peer that the requester selects from the list.
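
The role of the central server described above can be summarized by a minimal sketch of its directory. The class and method names are hypothetical and chosen for readability; they do not reproduce the protocol of Napster or of any other system, and the file transfer itself, which happens directly between peers, is left out.

```python
class CentralDirectory:
    """Toy directory kept by the central server of a centralized P2P system."""

    def __init__(self):
        self._files_by_peer = {}   # peer address -> set of shared file names

    def log_on(self, address, file_names):
        """Register a peer together with the indices of the files it shares."""
        self._files_by_peer[address] = set(file_names)

    def log_off(self, address):
        """Remove a peer and its file indices from the directory."""
        self._files_by_peer.pop(address, None)

    def search(self, file_name):
        """Return the addresses of all peers that share file_name."""
        return sorted(address
                      for address, files in self._files_by_peer.items()
                      if file_name in files)
```

A requester would call `search`, pick one of the returned addresses and then download the file directly from that peer.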

Centralized P2P systems were the first to be used on the Internet, but recently they

have mainly been used for non-file-sharing systems such as SETI@Home, BOINC and

Skype. It is very difficult to adopt centralized P2P systems for MANETs, as they

require a dedicated server. The only possible way to use centralized P2P is to replace the

central server with an ordinary peer chosen by an election process that takes into consideration

the devices’ processing power and battery life as well as the time that the peer stays in the

vicinity. However, there are MANETs populated only by thin devices like cell phones,

PDAs and pagers. Even if it is sometimes possible to find more powerful devices like

laptops in MANETs, they are usually battery powered as well.

2.1.2 Unstructured pure peer to peer systems

A pure P2P system seeks to avoid the central server used in centralized P2P systems. In

such a system, all the functionalities of the server are distributed to end peers. Gnutella is

an example of a pure (decentralized) P2P system [16].



Each Gnutella peer has a direct connection to a small number of other Gnutella peers,

typically around four [16, 17]. A peer can, however, be connected with more peers with the

help of intermediate peers. The number of intermediate peers between two peers is used

to define the number of hops. If the number of intermediate peers is n, the number of hops is

n+1.

When a peer searches for a file, it sends a query to each directly connected peer (one hop

peer). Upon the reception of the query, peers forward the query to their neighbors, which in

turn forward the query, and so on, until the query packet reaches a predetermined number

of hops counted from the requester-peer.

When a file matching the query is found, information about the file is routed back to the

requester peer [16]. As in a centralized P2P system, the requester peer then decides from

which peer it will download the file; finally, the transfer takes place directly between the

requester peer and the selected peer owning the file.
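
The hop-limited query forwarding described above can be sketched as a breadth-first traversal of the overlay. This is a simplified model under stated assumptions: the function name `flood_query`, the dictionary-based overlay and the `seen` set used to suppress duplicate forwarding are illustrative choices, not details of the Gnutella protocol.

```python
from collections import deque


def flood_query(neighbors, files, requester, wanted, max_hops):
    """Return {peer: hop_count} for peers within max_hops that hold `wanted`."""
    hits = {}
    seen = {requester}                 # peers that already received the query
    frontier = deque([(requester, 0)])
    while frontier:
        peer, hops = frontier.popleft()
        if peer != requester and wanted in files.get(peer, ()):
            hits[peer] = hops          # match: information is routed back
        if hops == max_hops:
            continue                   # hop limit reached: stop forwarding
        for nxt in neighbors.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return hits


overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B", "E"], "E": ["D"]}
shared = {"D": {"song"}, "E": {"song"}}
nearby_owners = flood_query(overlay, shared, "A", "song", max_hops=2)
```

With a limit of two hops only peer D is found; peer E also owns the file but sits three hops away, which shows how the hop limit trades recall for reduced traffic.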

Purely decentralized systems are relatively suitable for MANETs. However, they are not

scalable because queries are forwarded indiscriminately. Scalability degrades further

in a MANET since devices are battery powered and the bandwidth is limited.

2.1.3 Unstructured hybrid peer to peer systems

Hybrid peer-to-peer systems have been proposed as a solution to the scalability problem

faced by both centralized and decentralized systems. They have properties of centralized

and decentralized systems. As they organize the sharing activities in a similar manner, we

describe the hybrid P2P systems by using FastTrack [18], OpenFT [19], JXTA [20] and

eDonkey [11]. FastTrack is the file sharing protocol used by Kazaa [21]. OpenFT (Open

FastTrack) is the file sharing protocol developed by the giFT project. JXTA (Juxtapose) is

a peer-to-peer platform specification initiated by Sun Microsystems in 2001. eDonkey was

a system used to facilitate sharing of files by using a number of servers.

In hybrid architectures, there are two types of peers. Super peers, named search peers in

OpenFT and rendezvous peers in JXTA, are responsible for searching files on behalf of



ordinary peers. In most hybrid peer-to-peer systems, super peers can be ordinary

peers that enter and leave the peer-to-peer network as they want. Dedicated (i.e., static)

super peers are usually used to monitor and keep track of the network. Bootstrapping peers

in FastTrack [18] and index peers in OpenFT [19] are dedicated peers. In eDonkey, all of

the super peers were dedicated servers.

FastTrack and OpenFT use Gnutella as background for file localization [25, 26].

Localization of a file is accomplished through the broadcasting of queries between super

peers. When an ordinary peer prepares a query, it sends the query to a super peer to which

the ordinary peer is connected; the super peer will in turn broadcast the same query to all

other super peers to which it is currently connected; these super peers will forward the

query to other super peers to which they are connected. This process is repeated a fixed

number of times. Super peers are also responsible for gathering the results of the query and for

transferring the results back to the requester.

JXTA provides a peer-to-peer infrastructure over which other peer-to-peer applications

can be built. JXTA provides protocols that allow peers to (1) organize themselves in peer groups,

(2) discover each other, (3) communicate with each other and (4) monitor each other.

In eDonkey, a protocol named BitTorrent [23] can be used to download a file. BitTorrent

[23] works as follows. Originally, the file to be distributed is available from one server,

called the seed. In addition to the seed, there is a tracker server which keeps track of all the

clients of the file in the network. A client which wants to download the file needs to get the

so-called “torrent-file” which contains metadata about the file and the address of the

tracker for that file. The client, then, contacts the tracker and receives a list of peers which

are currently downloading that file or have already downloaded it. The client then selects

some peers from this peer set and starts downloading the file block by block from the

selected peers.

All the discussed hybrid P2P systems consider traditional computing devices. There are,

however, some efforts that have been done to include Java-ME enabled devices in JXTA

networks [26, 27]. JXTA for Java ME, also called JXME, allows a Java ME

compliant device to participate in the JXTA network. The Mobile Information Device

Profile (MIDP) of Java ME provides the needed APIs that are already recognized by the

Java development community, and thus acts as a firm foundation for the creation of

wireless JXTA peers. However, MIDP has too many constraints to fully implement JXTA:

limited libraries, lack of an XML parser and no security support. As a result, Java ME peers

can only act as edge peers.

The large majority of file sharing systems of the Internet are based on the hybrid

unstructured P2P architecture. However, adopting a hybrid architecture in MANETs is

difficult since devices must be grouped. In MANETs, the only feasible grouping

is geographical, i.e., devices that are located in the same area are

grouped together. Such techniques face two important challenges: the

election of group leaders and the communication between them.

2.1.4 Structured P2P systems

Structured P2P systems such as CAN [26], Chord [27], Pastry [28], and Tapestry [29, 30]

manage peer nodes with a logical structure that is formed by using the concept of

distributed hash tables. Distributed hash tables (DHT) are based on the same idea as

conventional hash tables. Each peer has a unique identifier which is calculated, by using a

hash function, from some properties of the peer (e.g., IP address). Similarly, each sharable

file is mapped to the same hash space, for example, by calculating the hash value of the

file’s name. A peer is responsible for files that are mapped “near” to its place in the hash

space. The definition of the metric for proximity (i.e., what “near” means) depends on

the actual implementation of the DHT.
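
The mapping of peers and files into one hash space can be illustrated with a small sketch. It assumes one particular proximity metric, namely that a file belongs to the first peer whose identifier is at or after the file's key on a ring (the successor rule used by Chord); other DHTs define “near” differently. The names `dht_id`, `responsible_peer` and the 16-bit space are hypothetical choices made for the example.

```python
import hashlib
from bisect import bisect_left

SPACE_BITS = 16  # assumed size of the identifier space (illustrative only)


def dht_id(text):
    """Map a string (peer address or file name) into the shared hash space."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << SPACE_BITS)


def responsible_peer(peer_addresses, file_name):
    """Return the peer whose id is the first at or after the file key, wrapping around."""
    ring = sorted((dht_id(addr), addr) for addr in peer_addresses)
    key = dht_id(file_name)
    index = bisect_left(ring, (key, ""))
    return ring[index % len(ring)][1]


peers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
owner = responsible_peer(peers, "report.pdf")
```

Because both functions are deterministic, every peer that knows the same membership list computes the same owner for a given file name, which is what makes lookup deterministic in structured systems.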

On top of such a communication network, high level protocols can be deployed, for

example, service discovery, multicast communication, information retrieval, and file

sharing.

Structured peer-to-peer systems are more scalable than unstructured systems. In

structured peer-to-peer systems, information discovery is a deterministic process.

However, these important properties (i.e., scalability and deterministic information



discovery) are achieved due to the arrangement of peers in an overlay network. The state of

the art systems over structured peer-to-peer networks are too complex to be adopted in a

MANET. In these systems, the physical location of peers is not considered by the hash

function.

2.1.5 Loosely structured peer to peer networks

Loosely structured P2P systems systematically organize files in the network. To our

knowledge, Freenet [31, 32] is the only example of such a system. Freenet is a file sharing

system that allows peers to publish, replicate, and retrieve files in the peer-to-peer network.

Files, in Freenet, are represented by binary file-keys obtained by applying a hash

function [32]. Each peer keeps a local table that contains the list of the keys of the files

stored in neighboring peers. When a peer adds a file to the Freenet network, a key is

assigned to it. The file is forwarded to the neighboring peers. Peers receiving the file

forward it further. The forwarding process continues until a suitable location is found for

the file or it is forwarded for a specified number of times. The key of a file is used to

determine a suitable location to store a file. For a file having a key value k, a peer provides

a suitable storage location for the file if the peer has files having keys similar to k.

When a peer searches for a file, it sends a query to a neighbor that it considers closest to the

target. This neighbor peer forwards the query in the same way if it does not have a file

matching the query. The query forwarding process is repeated a fixed number of times.

When a file matching the query is found, it is replicated over the search path.

Replicated files are deleted only if a peer has run out of space. The deletion of files is

performed in a probabilistic manner; rarely used files have a greater chance to be deleted

than popular ones.
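
The key-based search and the replication along the search path can be sketched as follows. This is a simplified model, not Freenet's actual routing: keys are plain integers, closeness is the absolute difference between keys, backtracking is omitted, and the names `freenet_search`, `stores` and `routing` are hypothetical.

```python
def freenet_search(stores, routing, start, target, max_hops):
    """Greedy search by key closeness; on success, replicate along the path.

    stores:  peer -> set of keys held locally
    routing: peer -> {neighbour: keys that neighbour is known to hold}
    Returns (owner or None, list of visited peers).
    """
    path = [start]
    current = start
    for _ in range(max_hops):
        if target in stores[current]:
            break
        candidates = {
            peer: keys
            for peer, keys in routing[current].items()
            if peer not in path and keys
        }
        if not candidates:
            return None, path
        # Forward to the neighbour whose known keys are closest to the target.
        current = min(
            candidates,
            key=lambda peer: min(abs(k - target) for k in candidates[peer]),
        )
        path.append(current)
    if target not in stores[current]:
        return None, path
    for peer in path:
        stores[peer].add(target)  # replicate the file over the search path
    return current, path


stores = {"A": set(), "B": {10}, "C": {40}, "D": {42}}
routing = {"A": {"B": {10}, "C": {40}}, "B": {}, "C": {"D": {42}}, "D": {}}
owner, path = freenet_search(stores, routing, "A", 42, max_hops=3)
```

After this call the key is stored on every peer of the path, so a later search for the same key starting at A succeeds without any forwarding.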

Hash function based file indexing is an important contribution of Freenet. In a MANET,

this kind of indexing can be used to facilitate the search process. However, in MANETs,

peers should not be selected arbitrarily to store a file. Thus, to adopt Freenet in a MANET,



the hash function should be modified to consider the stay time of the devices, their storage

capacities and their battery power.

2.1.6 Social Network based P2P system

A social network is a social structure between participants who are connected through

various social relationships [33]. Web based social networking occurs through a variety of

websites, which are usually called social network sites. Danah M. Boyd [34] defines a

social network site (SNS) as: “a web-based service that allows individuals to (1) construct

a public or semi-public profile within a bounded system, (2) articulate a list of other users

with whom they share a connection, and (3) view and traverse their list of connections and

those made by others within the system”.

In SNSs, users identify their relationships with others. The label for these relationships

depends on the site's predefined terms, e.g., "Friends", "Contacts" and "Fans". A

social network site (SNS) allows users to share content, interact with each other and

develop communities around similar interests.

There are mobile-specific SNSs (e.g., Dodgeball [35, 36]), and some web-based SNSs

support limited mobile interactions (e.g., MySpace [37] and Facebook [38]).

Mobile-specific SNSs work by exchanging SMS messages.

Dodgeball is a commercial system that delivers SMS messages to users to alert them

about nearby friends [35]. Dodgeball does not use any location system such as GPS or cell

connection. When a user wants to communicate with nearby friends, he/she sends an SMS

message, which specifies the location of the user, to the Dodgeball system. Dodgeball

distributes the message to all his/her pre-selected friends, as well as any friends of friends

within a ten-block radius. Today, Dodgeball is available in 22 cities within the United

States including New York, San Francisco, Los Angeles, Chicago, Washington DC,

Boston and Seattle [36].



ImaHima is a location-specific application created in 1999 by a Japanese company

named ImaHima [39]. This system allows users to share their current personal status (such

as location, activity or mood) and pictures with nearby friends [35].

In summary, exploiting the social network of a user gives SNS-based peer-to-peer

networks an advantage over traditional ones. However, social network based P2P systems

are at an early stage. Recall that a peer-to-peer system has three layers: user,

application and overlay. Currently, all SNSs are peer-to-peer only at the user level. In

addition, in these systems, users have to perform information sharing manually.

2.1.7 Summary

Peer-to-peer information sharing systems concentrate on (1) organization of files and (2)

the organization and interconnection of peers in the overlay network. However, these

systems do not take into consideration the location of peers, their capacities, the time that

they stay together and the bandwidth of the network technology.

Table 2-1 compares the P2P systems Gnutella, Freenet, CAN, eDonkey and Dodgeball

according to the requirements discussed in chapter 1. The requirement “pervasiveness” is not fulfilled by most

of the systems. Only Dodgeball satisfies the pervasiveness requirement by using SMS to

interconnect nearby friends. The requirement “social awareness” is satisfied only by

Dodgeball, since in this system users communicate with their friends. Interest

awareness is not considered by the mentioned peer-to-peer systems.

As files are arranged systematically in the overlay networks, we consider that structured

and loosely structured peer-to-peer systems partially consider the requirement “High level

semantics”.

The requirements “mobility awareness” and “data dissemination” are not the concerns of

most of the existing peer-to-peer systems. There are dedicated routers in the Internet; so, to

our knowledge, there is no peer-to-peer system integrating a routing protocol. Finally, as

peers in the Internet are less mobile, mobility awareness is not considered by most of these

systems.



Most of the discussed peer-to-peer systems help peers to discover files, but peers

perform file delivery by themselves. However, recent peer-to-peer systems (including

eDonkey) integrate content delivery protocols like BitTorrent to facilitate the file delivery

process.

Table 2-1: Analysis of P2P systems

Requirement                      Gnutella  Freenet  CAN  eDonkey  Dodgeball
Pervasiveness                    -         -        +    -        ++
Mobility awareness               X         X        X    X        -
High level semantics             -         +        +    -        X
Social awareness                 -         -        -    -        ++
Context aware content delivery   X         X        X    ++       X
Data dissemination               X         X        X    X        X
Interest-awareness               -         -        -    -        +

X: not applicable; -: not considered; +: considered partially or in a limited way; ++: fully considered



2.2 Information sharing in MANET

A number of information sharing systems such as ORION [40], Code-Torrent [41],

LIME [42, 43], Limeone [44], TOTA [45, 46], PeerWare [47], AdHocFS [48, 49], Ad-hoc

InfoWare [50] and XMIDDLE [51] have been proposed for MANETs in this decade. In

this section, we discuss these systems with respect to information discovery and delivery

activities. Section 2.2.1 discusses ORION and Code-Torrent. Section 2.2.2 presents LIME,

Limeone and TOTA. PeerWare, AdHocFS, Ad-hoc InfoWare and XMIDDLE are

discussed in sections 2.2.3 to 2.2.6. Finally, we compare and analyze the information

sharing systems in section 2.2.7 according to the requirements presented in chapter 1.

2.2.1 ORION and Code-Torrent

ORION [40] is a system that allows peers in a multi-hop MANET to share files. ORION

combines application layer tasks, which are used to search and download files, with

routing tasks, which are used to distribute queries, responses and data. A simple multicast

and broadcast protocol for a MANET [52] is used to disseminate queries.

ORION maintains two routing tables: a response routing table and a file routing table.

Similar to the routing tables used by AODV [53], the response routing table is used to store

the address of the peer from which a query message has been received as next hop on the

reverse path. Thus, a peer is able to return responses to the requesting peer without explicit

route discovery.
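
The role of the response routing table can be sketched as follows: while a query spreads, each peer remembers the neighbor it received the query from, and a response simply follows these entries backwards. The function names, the dictionary layout and the query identifier `"q1"` are assumptions made for the illustration; they are not taken from ORION.

```python
def forward_query(neighbors, requester, query_id):
    """Flood a query and record, for every peer, the reverse-path next hop."""
    response_table = {}   # (peer, query_id) -> neighbour the query came from
    visited = {requester}
    frontier = [requester]
    while frontier:
        next_frontier = []
        for peer in frontier:
            for nxt in neighbors.get(peer, ()):
                if nxt not in visited:
                    visited.add(nxt)
                    response_table[(nxt, query_id)] = peer
                    next_frontier.append(nxt)
        frontier = next_frontier
    return response_table


def route_response(response_table, source, requester, query_id):
    """Follow the reverse-path entries from the responding peer to the requester."""
    path = [source]
    while path[-1] != requester:
        path.append(response_table[(path[-1], query_id)])
    return path


links = {"R": ["A"], "A": ["R", "B"], "B": ["A", "S"], "S": ["B"]}
table = forward_query(links, "R", "q1")
reverse_path = route_response(table, "S", "R", "q1")
```

No separate route discovery is needed for the response because every hop of the reverse path was recorded as a side effect of forwarding the query.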

In ORION, a peer prepares a query as a list of keywords when it needs files and forwards

the query in the vicinity. A source peer prepares a response as a list of identifiers of files

and sends it in the direction of the requester. When an intermediate peer p1 gets a response

for a query q from a peer p2, it checks the response before forwarding it. If the response

contains files that the peer p1 does not have, p1 notes p2 as the next suitable peer for the

query q.



In ORION, a file is divided into a fixed number of blocks and is downloaded block by

block. When a requester gets a response for a query, it chooses a file and sends a

data-request message to a peer from which it received a response containing the identifier of the

selected file. A data-request message indicates the block of a file that a peer wants to

download. If the receiver peer has this file, it forwards the requested block to the requester;

otherwise, it forwards the message to the next suitable peer. An intermediate peer sends a

route-error message to the peer from which it received the request message when its next

suitable peers owning the selected file are disconnected. When a peer receives a route-error

message, it forwards the request message to another next suitable peer. The requester

cancels the download of the file if all the peers owning the file are disconnected before the

download is completed.
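
The block-by-block download with fallback to another source can be sketched as follows. The sketch deliberately ignores the multi-hop forwarding of data-request and route-error messages and only models the outcome seen by the requester; the function `download_file` and the data layout are hypothetical.

```python
def download_file(num_blocks, sources, is_connected):
    """Fetch blocks 0..num_blocks-1, trying each known source for every block.

    sources: peer -> {block index: block bytes}
    Returns the reassembled content, or None if some block has no reachable owner.
    """
    data = []
    for block in range(num_blocks):
        for peer, blocks in sources.items():
            if is_connected(peer) and block in blocks:
                data.append(blocks[block])
                break                 # block obtained, move on to the next one
        else:
            return None               # all owners disconnected: cancel download
    return b"".join(data)


sources = {"p1": {0: b"ab", 1: b"cd"}, "p2": {1: b"cd", 2: b"ef"}}
content = download_file(3, sources, lambda peer: True)
```

In this example no single peer holds the whole file, yet the download completes because each block is available from at least one connected source.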

ORION is not scalable as it performs file localization by query flooding. The main

drawback of ORION is the use of an overlay network. Even if the overlay is constructed by

considering the physical location of peers, it is still costly as there is no guarantee that

peers will remain where they are. The important contribution of ORION is the delivery of

files from one or more sources.

Code-Torrent [41], a file sharing system designed for one hop MANETs, is similar to

ORION in the delivery of files, i.e., downloading of files is performed block by block from

one or more peers. The difference between the two protocols lies in the file discovery

process. ORION uses a pull discovery approach (i.e., querying the neighborhood) while

Code-Torrent applies a push discovery approach (i.e., advertising sharable files). In Code-

Torrent, every peer distributes the descriptions of sharable files to its neighbors. A file

description contains the identifier of the file, its name and the number of blocks into which

the file is divided.

2.2.2 Lime, Limeone and TOTA

LIME [42, 43], Limeone [44] and TOTA [45,46] inherit and adapt the communication

model proposed by Linda [54]. The coordination aspect in Linda is accomplished by using

a tuple space, which is globally shared by all participating peers.



A tuple is a record of one or more fields, each field having a value of a certain data type.

Any process can write a tuple into the tuple space. Another process can then take or read a

tuple from the space. Taking and reading differ in that a tuple is no

longer available after it has been taken. To find a tuple to take or read, a process must supply

a special type of tuple called a template.

Each field of a template is filled with a specific value or a “wildcard”. The template is

matched against all tuples in the tuple space. A template and a tuple match if they have

the same number of fields and the fields have the same data types and values (a wildcard

matches any value).

A process can request to take/read a tuple by sending a template; in this case, it receives

a tuple matching the template. If multiple tuples match the template, one tuple is chosen

arbitrarily; if no tuple matches the template, the process is blocked until there is a match.
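
The write, read and take operations with template matching can be sketched as follows. Two simplifications are assumed and should be kept in mind: the wildcard `ANY` is untyped, and `read`/`take` return `None` instead of blocking when nothing matches. The class and function names are hypothetical and do not reproduce the API of Linda or of any system discussed here.

```python
ANY = object()  # wildcard marker usable in any template field


def matches(template, tup):
    """A template matches a tuple of equal length whose fields agree in type and value."""
    if len(template) != len(tup):
        return False
    for wanted, actual in zip(template, tup):
        if wanted is ANY:
            continue
        if type(wanted) is not type(actual) or wanted != actual:
            return False
    return True


class TupleSpace:
    """Hypothetical non-blocking tuple space."""

    def __init__(self):
        self._tuples = []

    def write(self, tup):
        self._tuples.append(tuple(tup))

    def read(self, template):
        """Return a matching tuple without removing it, or None."""
        return next((t for t in self._tuples if matches(template, t)), None)

    def take(self, template):
        """Return and remove a matching tuple, or None."""
        found = self.read(template)
        if found is not None:
            self._tuples.remove(found)
        return found


space = TupleSpace()
space.write(("temperature", 21))
space.write(("temperature", 25))
first = space.read(("temperature", ANY))
```

Reading leaves the tuple in the space, so repeated reads with the same template keep succeeding, whereas a second `take` of the same tuple finds nothing.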

In LIME, the shared tuple space is replaced by a transiently shared tuple space. As

displayed in Figure 2-2, a transiently shared tuple space is formed from tuple spaces of

peers in the MANET. Each mobile entity is associated with a personal tuple space,

accessed through an interface tuple space (ITS). When mobile entities meet together, their

ITSs are merged. Each mobile entity performs a tuple operation over its personal tuple

space, which is updated with other personal tuple space information when possible. The

transiently shared tuple space is recomputed upon the arrival or departure of a peer. Peers

cannot access the tuple space while this update is in progress. Consequently, this

system is not appropriate for a dynamic MANET.

Figure 2-2: Transiently shared tuple space (the local tuple spaces of three peers merged into one transiently shared tuple space)



In Limeone [44] and TOTA [45, 46], no global tuple space is constructed. In both

systems, peers manage their own tuple spaces. In Limeone, mobile agents are used to

access the tuple spaces of neighbors while in TOTA, peers make others aware of sharable

information by distributing tuples.

In Limeone, a software agent is attached to each tuple space. An agent can access the

tuple space of a neighbor upon connection. An agent can inform other agents in the

neighborhood about the tuples that it manages. However, this approach limits the sharing

of data since peers share data only when they have a direct connection.

TOTA works by propagating tuples; each tuple carries a propagation rule. The

propagation rule determines how the tuple should be distributed in the environment. This

includes determining the “scope” of the tuple (i.e., the distance to which the tuple should

be propagated and possibly the spatial direction of the propagation) and how the

propagation can be affected by the presence or the absence of other tuples in the system.

The propagation rules can also be used to determine how the tuple content should change

while it is propagated. Attaching a propagation rule to a tuple is an important contribution

of TOTA. This rule can be used to make TOTA work despite the network dynamicity.

However, the authors did not explicitly discuss the design of propagation rules according

to the network dynamicity.
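
A propagation rule with a hop-count scope and a content-changing step can be sketched as follows. The function `propagate`, its parameters and the example rule are illustrative assumptions and do not reproduce the TOTA API; spatial direction and dependence on other tuples are left out.

```python
def propagate(neighbors, source, content, scope, change=lambda value, hops: value):
    """Spread a tuple hop by hop up to `scope` hops.

    `change(value, hops)` is the part of the rule that rewrites the content
    each time the tuple moves one hop further. Returns {peer: stored content}.
    """
    stored = {source: content}
    frontier = [source]
    for hops in range(1, scope + 1):
        next_frontier = []
        for peer in frontier:
            for nxt in neighbors.get(peer, ()):
                if nxt not in stored:
                    stored[nxt] = change(stored[peer], hops)
                    next_frontier.append(nxt)
        frontier = next_frontier
    return stored


chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
# Example rule: the tuple carries a counter that grows by one at every hop.
field = propagate(chain, "A", 0, scope=2, change=lambda value, hops: value + 1)
```

With a scope of two hops the tuple reaches B and C but never D, and the stored value tells each peer how far it is from the source.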

2.2.3 PeerWare

PeerWare [47] is a coordination model. It is developed as a core component of the

MOTION [55] platform, which is designed for a MANET containing fixed backbone

peers.

PeerWare exploits the notion of a global virtual data structure (GVDs). GVDs is a meta-

model of communication for mobile environments, centered around the idea of supporting

coordination among a set of peers through a data space that is transiently shared and

dynamically built out of the data spaces provided by the peers.



The GVDs is represented as a hierarchy of nodes containing documents. As shown in Figure 2-3, a

document may actually be accessible from multiple nodes, i.e., a document can have

multiple parents. For example, the thesis you are reading can be classified in the mobile

computing domain as well as in the information management domain. Changes in the

connectivity state of peers determine the content of the GVDs, as data become

inaccessible when peers move out of reach.

Figure 2-3: An example of the global virtual data structure managed by PeerWare
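
The way the global structure follows connectivity can be sketched as follows. The sketch flattens the hierarchy into a map from node names to document sets, which is a simplification; `build_gvds` and the example data are hypothetical.

```python
def build_gvds(local_spaces, connected):
    """Merge the node -> documents maps of the connected peers into one view."""
    view = {}
    for peer in connected:
        for node, documents in local_spaces.get(peer, {}).items():
            view.setdefault(node, set()).update(documents)
    return view


local_spaces = {
    "p1": {"mobile computing": {"thesis.pdf"}},
    "p2": {"information management": {"thesis.pdf", "survey.pdf"}},
}
view = build_gvds(local_spaces, ["p1", "p2"])
view_after_departure = build_gvds(local_spaces, ["p1"])
```

The same document appears under two nodes while both peers are connected; once p2 leaves, everything it contributed disappears from the view, which is why the structure has to be rebuilt whenever membership changes.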

The model provides operations that allow peers to query the GVDs, to subscribe for

events and to receive corresponding notifications. The execute operation allows peers to

execute an arbitrary piece of code on a selected set of items held by connected peers. The

subscribe operation allows peers to subscribe to events occurring on a selected set of items.

The publish operation allows peers to notify the occurrence of events.

Organizing documents in a GVDs is an important contribution of PeerWare. However,

allowing a document to have multiple parents makes information retrieval complicated.

Furthermore, the modification of a GVDs is costly for a MANET because it should be

done each time a peer joins/leaves the MANET.



All peers have a local data structure except thin devices like PDAs. As a result, small

devices can play only the role of a client. However, nowadays, there are MANETs formed

without any powerful devices, and peers in these MANETs store important

information that can be shared with each other.

2.2.4 AdHocFS

AdHocFS [48, 49] is a file system that permits mobile devices to access information

stored in traditional file systems. In this system, mobile devices are considered as terminals

that cache files and exchange caches in MANETs. In the system, every peer is assumed to

have a UID that uniquely identifies the peer. Peers organize themselves by using their

UIDs.

Initially, each peer is a group leader by itself. Group-leaders discover other peers by

broadcasting messages periodically. Upon discovering other groups, groups are merged

and the peer having a minimum UID becomes the leader of the enlarged group. The leader

has connection to every peer in the group. Every peer is connected to a peer having a UID

less than its own.
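
The group formation rule can be sketched as follows: every peer starts as its own group, groups that discover each other merge, and the smallest UID leads the merged group. The function `form_groups` and the `can_hear` predicate, which stands in for the periodic discovery broadcasts, are hypothetical.

```python
def form_groups(uids, can_hear):
    """Merge singleton groups until no two groups can discover each other.

    can_hear(a, b) tells whether peers a and b receive each other's broadcasts.
    Returns a list of {"leader": uid, "members": set of uids}.
    """
    groups = [{uid} for uid in uids]      # initially each peer leads itself
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if any(can_hear(a, b) for a in groups[i] for b in groups[j]):
                    groups[i] = groups[i] | groups[j]
                    del groups[j]
                    merged = True
                    break
            if merged:
                break
    # The peer with the minimum UID becomes the leader of each group.
    return [{"leader": min(group), "members": group} for group in groups]


links = {(1, 2), (2, 3), (7, 9)}
hear = lambda a, b: (a, b) in links or (b, a) in links
groups = form_groups([3, 1, 2, 9, 7], hear)
```

The leader is chosen purely from the identifiers, so a peer with little storage or battery can end up leading a group simply because its UID is the smallest.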

The leader broadcasts the hierarchical directory structure of any peer of the group to

every other peer of the same group. The group leader is responsible for ensuring that (i)

members access the most recent versions of data within a group and (ii) peers have the

same versions of data. In addition, a group leader performs file replication in order to avoid

loss of data due to a sudden disconnection of peers.

As a group leader is needed to manage communications with all peers in the network,

using AdHocFS in a multi-hop MANET seems a very difficult task. Moreover, the leaders

are selected by their UIDs rather than by their capacity (e.g., storage and battery).



2.2.5 InfoWare

InfoWare [50] is a system designed for a MANET that contains gateways to the Internet.

Even if the network infrastructure is formed in an ad-hoc way, the members of the network

are pre-defined. The system consists of a knowledge manager, a resource manager,

watchdogs, a distributed event notification service and a security manager.

The knowledge manager module adds a layer of knowledge to the information shared in

the network. It is also used to relate metadata descriptions of information items to a

semantic context. Moreover, it enables querying and retrieval of information items and

resources available in the network.

The distributed event notification service (DEN) is employed to facilitate the exchange

of information. The DEN consists of three services: a publisher, a subscriber and an event

notifier. Any peer involved in the information sharing needs to implement at least one of

these services. Subscribers specify the content that they need. Publishers produce an event

if they have the information needed by the subscribers.

The watchdog module, implemented by every publisher, is used to detect events and to

check the fulfillment of the condition specified by subscribers.
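
The interplay of subscribers, publishers and the condition check can be sketched as follows. The class `EventNotifier`, the use of Python callables as subscription conditions and the example topics are assumptions made for the illustration, not the design of the system described here.

```python
class EventNotifier:
    """Hypothetical notifier that connects subscribers with publishers."""

    def __init__(self):
        self._subscriptions = []   # list of (condition, callback) pairs

    def subscribe(self, condition, callback):
        """Register the content a subscriber needs, expressed as a predicate."""
        self._subscriptions.append((condition, callback))

    def publish(self, event):
        """Check every subscription condition and notify those that are fulfilled."""
        delivered = 0
        for condition, callback in self._subscriptions:
            if condition(event):
                callback(event)
                delivered += 1
        return delivered


received = []
notifier = EventNotifier()
notifier.subscribe(lambda event: event["topic"] == "triage", received.append)
sent = notifier.publish({"topic": "triage", "id": 1})
ignored = notifier.publish({"topic": "weather", "id": 2})
```

The condition check inside `publish` plays the part that the text assigns to the watchdog: an event reaches a subscriber only when the subscriber's condition holds.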

The resource manager module is responsible for delivering information in response to a

query and for controlling watchdogs. It is also in charge of managing the replication of

information and of performing predictions about the probability of network partitions.

The security manager is used to control the access to shared information and resources.

The system groups peers according to their roles in the emergency operation. The system

coordinates resources, performs information sharing and provides support according to

these groupings.

The InfoWare system makes two important contributions: (1) file replication and the

management of resources are performed with respect to the connectivity patterns of the

devices and (2) the management of information and resources is done according to the

peers’ roles in an emergency operation. However, the authors only vaguely discuss the



knowledge acquisition related to the connectivity pattern of devices. Moreover, they only

discuss the replication process at a high level.

2.2.6 XMIDDLE

XMIDDLE [51] is a system that allows devices in a MANET to share structured

documents with each other. To use this system, peers should present their sharable files in

a tree form (“data tree”). Access points to a data tree are defined in order to present parts of

a file as sharable information.

Hosts having a direct connection can perform data sharing. Host A can modify data in

host B by creating a link to an access point in the data tree of a file on host B. Such a link is

similar to the remote directory mounting in a typical distributed file system.

XMIDDLE has a number of primitive operations to allow peers to share information.

The connect operation is used to connect two peers. During this operation, peers exchange

information about access points of sharable documents. The disconnect operation allows a

peer to explicitly decide to work offline. The link and the unlink operations allow peers to

connect/disconnect to/from an access point of a data tree.

Representing a document in a tree format and allowing only some parts of a document to

be sharable are important contributions of XMIDDLE. However, the system can only be

applied to one-hop MANETs.

2.2.7 Discussion

In previous sections, we have discussed the most representative information sharing

systems designed for MANETs. As shown in Table 2-2, these systems apply different

methods to discover files. The methods can be divided into three types: (1) distribution of

queries/advertisements (push/pull approaches), (2) publish/subscribe systems and (3)

shared/transient memory.



As the distribution of queries in ORION is not controlled systematically, it can create

excessive overhead on the participating peers. In Code-Torrent, the information

sharing is performed in a one-hop MANET by distributing advertisements blindly. As

blind distribution of advertisements creates significant overhead on the participating peers,

this system cannot be adopted in multi-hop MANETs.

Table 2-2: Summary of the information sharing systems designed for MANETs

Activity               Code-Torrent  Lime  LimeOne   TOTA  ORION  PeerWare  AdHocFS  InfoWare  XMIDDLE
Information discovery  OPH           LM    OPL, OPH  CPH   CPL    SM, PS    GPH      PS        OPH
Information delivery   PBP           WC    WC        WC    PBP    WC        WC       WC        CP
Mobility control       -             -     -         C     -      -         -        R         R

Information discovery: CPL: hop count pull; OPL: one hop pull; OPH: one hop push; CPH: hop count push; GPH: group based push; SM: shared memory; PS: publish/subscribe
Information delivery: WC: whole content; PBP: part by part; CP: certain parts; OD: offline delivery
Mobility control: R: replication; C: change of sharing strategy



The distribution of queries/advertisements can be a simple, a hop count based, or a group

based distribution process. In simple flooding [52, 56], a peer distributes a packet to peers

having a direct connection with it. Each of those peers in turn redistributes the packet in

the same fashion and this continues until all reachable peers have received the packet. This

kind of packet dissemination has an evident scalability problem and places a heavy

burden on devices. In hop-count-based flooding, a packet is distributed up to a certain number of hops

counted from the requester peer (for queries) and from the source peer (for

advertisements). In a group-based dissemination, a requester peer disseminates a packet to

peers in the group to which it belongs.

The need for brokers to establish a publish-subscribe system complicates the application

of PeerWare in a multi-hop MANET. Finally, the shared/transient memory proposed in

LIME is not suitable for dynamic environments since peers cannot communicate while the

transient memory is changing.

In AdHocFS [48, 49], peers in the MANET form groups in such a way that there is a

path between any two peers in the same group. The system raises an important issue;

however, the number of peers in a group is not controlled and the group leader is not

elected according to its processing capacity and battery power. Other researchers have

proposed to use a geographical based hierarchical index (GHI) [57] to facilitate information

sharing in a MANET. GHI is formed according to the geographical location of peers. This

approach tries to adopt CAN [26] on a MANET by using a geographical indexing function

that hashes files into a geographical coordinate. Normal distribution and high availability

of peers, however, are required in order to use the system.

Content delivery is a required functionality of any information sharing system. Content

can be delivered at once as it is done in traditional peer-to-peer systems. However, this

may not always be possible in MANETs. The alternative is to deliver the content

part by part from one or more peers as it is done in ORION [40], Code-Torrent [41] and

XMIDDLE. However, only XMIDDLE [51] allows users to download some parts of a file.

Most of the systems do not give any special attention to problems caused by the mobility

of peers (e.g. disconnection of a peer while downloading a file from it). The propagation



rules in TOTA can be used to resist the network dynamicity by adapting the scope of a tuple

propagation rule (i.e., the number of hops that a tuple traverses and the modification of the

tuple) according to the event occurring in the environment (e.g., a time alarm or a network

structure change). InfoWare and XMIDDLE try to handle problems caused by mobility of

peers by data replication.

In Table 2-3, we compare the discussed systems according to the requirements identified

in chapter 1, i.e., pervasiveness, mobility awareness, high level semantics, social awareness,

context aware content delivery, data routing and interest-awareness.

Table 2-3: Analysis of information sharing systems for MANETs

Requirement                     CodeTorrent  Lime  Limeone  TOTA  ORION  PeerWare  AdHocFS  Ad-Hoc InfoWare  XMIDDLE
Pervasiveness                   ++           ++    ++       ++    ++     ++        ++       +                ++
Mobility awareness              -            -     +        +     -      -         +        +                +
High level semantics            -            +     +        +     -      ++        +        -                +
Social awareness                -            -     -        -     -      -         -        ++               -
Context aware content delivery  +            -     -        -     +      -         -        -                +
Data dissemination              -            -     -        -     ++     -         -        -                -
Interest-awareness              -            -     -        -     -      +         -        +                -

- not considered, + considered partially or in a limited way, ++ considered

As discussed in chapter one, the criterion “pervasiveness” describes the possibility of

exchanging information in any place, at any time and by using any device equipped with

wireless network communication facilities. As all systems are designed for MANETs,

almost all of them satisfy this criterion. Ad-Hoc InfoWare satisfies this criterion only



partially since users should authenticate in the Internet in order to share information in a

MANET.

The mobility awareness criterion is satisfied if the information-sharing activities are

performed according to the network dynamicity¹. None of the systems fully considers this

criterion. Ad-Hoc InfoWare and AdHocFS use replication of data to resist disconnection

of devices. AdHocFS blindly replicates the data of a peer to every other peer. However,

blind replication cannot be a solution in an environment where devices are battery

powered and have limited storage capacity. InfoWare replicates the data of a device that

will be disconnected soon. However, replication of all the data of disconnected devices

may not be necessary. TOTA adjusts the tuples that peers advertise according to a

propagation rule. However, the identification of propagation rules is not discussed

formally. As Limeone uses software agents to manipulate the tuple space, it can tolerate

disconnection. However, the authors do not demonstrate explicitly the use of agents to

tolerate disconnections.

High-level semantics refers to the classification and the representation of sharable files

according to their semantic similarities. Lime, Limeone, TOTA and PeerWare use a tuple

space, a mathematical model that expresses the semantic interdependencies of files. However,

the semantic richness of the file representation depends on the implementation of the tuple space.

As PeerWare considers a structure that represents files hierarchically, we consider that it

fully satisfies the criterion “High level semantics”. XMIDDLE uses a tree structure to

represent a file in order to exploit the sharable parts of the file; thus, we consider that it

satisfies the criterion “High level semantics” to a limited extent.

The social-awareness criterion is satisfied if the social relationships of users are

considered during the selection of sharable files. Most of the systems do not satisfy this

criterion. The InfoWare system groups peers according to their roles in the emergency

operation. The system coordinates resources, manages information flow and provides support

according to these groupings.

¹ Network dynamicity describes the change of the state of a MANET caused by the mobility of devices. In

this thesis, network dynamicity is measured by the stay-time of peers participating in a MANET.



In most information sharing systems, the delivery of files and the routing of messages

are not discussed. ORION integrates the AODV routing protocol to disseminate queries.

Code-Torrent and ORION allow delivery of a file part by part. This kind of delivery process

makes it possible to download files from several sources and thus accelerates the delivery of files.

However, their delivery techniques are limited and do not consider all of the important

contexts that could be encountered during content delivery. For example, they do not

consider the following context “there is only one source for a file; a source and a requester

cannot stay connected until the download is completed”.

Finally, none of the systems gives special attention to the interest awareness criterion.

Publish/Subscribe information discovery approaches used by PeerWare and InfoWare can

be used to take the interests of users into consideration. However, publish/subscribe

systems are not necessarily interest-aware. Interest-awareness of these systems depends on

the kind of the events produced by publishers.

From the above analysis, we can observe that information sharing in MANETs is still in

its infancy. Mobility awareness, one of the key criteria, is supported in a limited

way by a few of the systems. Interest awareness, another key criterion, is not

considered by any of them.

2.3 Service Discovery

Service discovery protocols are basic components of any information sharing system. In

this thesis, we focus on service discovery protocols for infrastructure-based networks and

infrastructure-less networks. SLP (Service Location Protocol) [58] and Universal Plug and

Play (UPnP) [59] are examples of service discovery protocols (SDPs) designed for wired

networks. There are also a number of service discovery protocols including DEAPSpace

[60], GSD [61, 62], Allia [63], Konark [64], and service ring [65] designed for

infrastructure-free networks.

Service discovery protocols can be divided into three types: directory-based,

directory-less, and a hybrid of the two. In directory-based SDPs, one or more devices provide directory



services to facilitate the service discovery process. Service-providers advertise their

services to the directory. To access a service, a client first contacts the directory to obtain

the service description; then, it contacts the suitable service provider directly. In

directory-less SDPs, every device is required to provide some form of a directory (service registry).

Each service provider advertises its services to others in the vicinity. Any device interested

in the advertisements can store them in its local service registry. A service consumer can

use the cached advertisements to discover a service or it can disseminate a service

discovery message throughout the environment. In Sections 2.3.1, 2.3.2 and 2.3.3, the three

types of service discovery protocols are discussed in detail. In section 2.3.4, we discuss the

main contributions of the discussed SDPs with respect to the requirements pointed out in

chapter 1.

2.3.1 Directory-based Service Discovery

Directory-based SDPs like service ring [65] and protocols proposed in [66, 67] perform

the following activities:

• forming a virtual backbone (i.e., electing directories and creating interconnection

links between them),

• performing service discovery functionalities and

• maintaining the virtual backbone.

In directory-based SDPs, mobile peers are organized in groups, usually based on distance

or communication proximity. A peer in each group is elected as coordinator to handle

routing and service discovery activities. These coordinators establish connections with

each other to form a virtual backbone.

Peers can also be grouped according to the similarity of services that they provide as in

Service ring [65]. A service ring groups together peers that are both physically close to

each other and offer similar services. Each ring possesses a designated service access point

(SAP), which knows a summary of all services offered within its ring. SAPs can also be

connected to other rings, which leads to a hierarchical structure.



Directory-based service discovery protocols have two main advantages. First,

scalability is achieved when the network size becomes larger since there are many

directories to handle service discovery and routing. Secondly, the response time for

locating a service is highly reduced. However, dynamic assignment of directories presents

an extra load to the network due to frequent change of the network topology. Moreover, in

a MANET, finding peers that provide directory services may be a difficult task because

these peers should have a good capacity in terms of storage space, processing power and

battery power.

2.3.2 Directory-less SDPs

SDPs like GSD [61, 62], Allia [63], Konark [64], MoGATU [68], and protocols

proposed in [69-71] do not use directories; thus, they are called directory-less SDPs. In

such SDPs, every peer is required to provide the tasks performed by directory services and

to discover resources in a peer-to-peer manner.

These SDPs are deployed with two different working models: a “Push model” and a

“Pull model”. In the push model, service providers disseminate service-advertisements in

the vicinity so that others passively learn about the available services. On the contrary, in

the pull model, clients are active: they flood service queries in the hope that a

service provider will eventually reply to their queries. Almost all directory-less SDPs use

a hybrid of the two models.
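
The push model, the pull model and their hybrid can be illustrated with a minimal sketch. It is not taken from any of the cited protocols: the `Peer` class, the `link` helper, the one-hop advertisement range and the `ttl` hop limit are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer that may offer services and caches advertisements."""
    name: str
    services: set = field(default_factory=set)
    neighbors: list = field(default_factory=list)
    cache: dict = field(default_factory=dict)  # service -> provider name

    def push(self):
        """Push model: advertise local services to one-hop neighbors."""
        for n in self.neighbors:
            for s in self.services:
                n.cache[s] = self.name

    def discover(self, service, ttl=3):
        """Hybrid lookup: consult the cache first, then flood a query (pull)."""
        if service in self.cache:
            return self.cache[service]
        return self._flood(service, ttl, {self.name})

    def _flood(self, service, ttl, seen):
        # A provider answers the query; other peers forward it while ttl > 0.
        if service in self.services:
            return self.name
        if ttl == 0:
            return None
        for n in self.neighbors:
            if n.name not in seen:
                seen.add(n.name)
                found = n._flood(service, ttl - 1, seen)
                if found:
                    return found
        return None


def link(a, b):
    """Create a bidirectional one-hop link between two peers."""
    a.neighbors.append(b)
    b.neighbors.append(a)
```

In this sketch, a peer that has overheard an advertisement answers from its cache without generating traffic, while a peer outside the advertisement range falls back to flooding, which is where the bandwidth cost of directory-less SDPs comes from.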

In most of the protocols, for example in Allia [63] and in the protocols described in

[69-71], peers advertise each of their local services. In DEAPSpace [60], a peer

advertises its worldview, i.e., information about all services of which it is aware. In

GSD [61, 62], services are classified into groups and a peer advertises its services and the

groups of services that it has seen in its vicinity. In Konark [64], services are arranged in a

tree. In this system, an advertisement can be generic or specific, i.e., it can contain only

services found at the shallow levels or at the bottom levels of the service tree, respectively.



Service information can be updated in two main ways. The first way is to update the

service whenever an event occurs, for example the non-availability of a route to the service

provider. The other way is to update the service information on a regular basis, for example

by periodic advertisements as done by Konark [64], GSD [61, 62] and the protocol

proposed in [70]. A hybrid of the two can also be used, as in [69]. The period of

advertisement can be variable according to the mobility patterns of the peers like in Allia

[63] and Konark [64].

The advertisement can be sent to neighbors that are one or more hops away. The

advertisement diameter is defined as the number of hops that the advertisement crosses. In

DEAPSpace [60] and MoGATU [68], the diameter is fixed to one. Some protocols change

the diameter of advertisements with respect to the mobility of the peer. If the peers are

moving faster, then the rate of advertisement increases and the diameter decreases. This

type of technique is implemented in GSD [61, 62] and Allia [63].
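
The relation between mobility, advertisement rate and advertisement diameter can be sketched as follows. The rule below is purely hypothetical: the formula, the function name `advertisement_policy` and the parameters `base_period`, `max_diameter` and `ref_speed` are assumptions chosen only to show the direction of the adaptation, not the parameterization of GSD or Allia.

```python
def advertisement_policy(speed, base_period=10.0, max_diameter=4, ref_speed=1.0):
    """Return (period_seconds, diameter_hops) for a peer moving at `speed`.

    Hypothetical rule: the faster a peer moves, the shorter the period
    between advertisements (higher rate) and the smaller the diameter.
    """
    factor = 1.0 + speed / ref_speed           # 1.0 for a static peer
    period = base_period / factor              # advertise more often when fast
    diameter = max(1, int(max_diameter / factor))  # never below one hop
    return period, diameter
```

With these assumed defaults, a static peer advertises every 10 seconds over 4 hops, while a peer moving at the reference speed advertises every 5 seconds over 2 hops.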

Service discovery is generally done by consulting the cached advertisements and if

necessary by flooding a service request query. In the protocol proposed in [70], a peer sends

a query to a service query multicast group when it needs a service. A service query

multicast group is formed during the bootstrapping phase and consists of a service provider

with its possible consumers.

2.3.3 Hybrid service discovery protocols

Scalability is the main drawback of directory-less service discovery protocols. Indeed,

they consume a lot of bandwidth. Hybrid service discovery protocols minimize this

drawback by using directories or service brokers whenever possible. Service Location

Protocol (SLP) [58], Universal Plug and Play (UPnP) [59] and Java Enhanced Service

Architecture (JESA) [72] are examples of hybrid service discovery protocols. Except for

JESA, all the mentioned protocols are designed for wired networks.

SLP is a service discovery protocol designed for TCP/IP networks. SLP has two different

modes of operation: (1) when a directory is present, it collects all service information



advertised by service providers and consumers send their requests to the directory, and (2)

when there is no directory, consumers repeatedly disseminate their requests in the

environment; service providers listen for these requests and send responses to the

consumers.

Similarly, UPnP can operate with or without a lookup/directory-service. When a service

wants to join the network, it first sends out an advertisement message to notify its

presence. If a lookup or directory service is present, it records such advertisements.

Meanwhile, other services in the network may directly see these advertisements as well.

When a peer wants to discover a service, it can contact the service directly through the

URL that is provided in the service advertisement, or it can send out a multicast query

request. In the case of discovering a service through the multicast query request, the client

request may get a response from the service directly or from a lookup/directory service.

Finally, like SLP and UPnP, JESA can work with or without a service broker. Services are

advertised and searched by broadcasting service-advertisements and service-requests

respectively. Service-providers reply to the requester-peers via unicast messaging. In larger

networks, service brokers register services provided in the environment and service-

providers stop multicasting advertisements and replying to requests. Brokers are

discoverable as services. If the broker that a service provider has registered with is

disconnected, the service is no longer available. In this protocol, searching for a broker

poses complications by itself.

2.3.4 Summary

Providing information is a service. Hence, discovering a service is logically equivalent to

localizing information (document/file). However, implementing a service for each sharable

file is a resource-consuming task. Nevertheless, service discovery protocols can be used as

bases for designing information localization algorithms.



Among the requirements discussed in chapter one, the requirements “Mobility

awareness” and “High-level semantics” are considered by some service discovery

protocols. Table 2-4 lists the service discovery protocols satisfying these requirements.

Table 2-4: Analysis of service discovery protocols

Requirement            Considered by
Mobility awareness     GSD, Allia
High level semantics   GSD, Service ring, Konark

Allia decides the diameter and the period of advertisement depending on the mobility

pattern of the peers. GSD uses the same concept to decide the diameter. However, the

authors of Allia and GSD do not explicitly specify how they measure the mobility patterns

of peers and how the diameter and the period of advertisement are parameterized.

Konark, GSD and Service ring organize services according to their semantic similarities.

Thus, these protocols satisfy the requirement “high level semantics”. However, except

Konark, they do not exploit semantic rearrangements of services during service

advertisements. In Konark, services are rearranged in a tree and the protocol selects some

branches with respect to some criteria for advertisement. However, the authors of the

protocol discuss these criteria vaguely.

2.4 Routing

The routing of queries, responses and data is a fundamental task of any information

sharing system. Indeed, an information sharing system is of little use without

a reliable routing protocol, especially in MANETs. In MANETs,

peers are responsible for performing data routing. The processing power and the storage

capacity of these peers are limited. Moreover, they are battery powered. As a result,

information sharing systems should be integrated with a routing protocol that considers

these specific conditions.

According to the responsibilities of peers, routing protocols can be classified into two

types: flat and hierarchical [73]. In flat routing protocols, every peer is equally responsible



for forming and maintaining the routing information. As the name indicates, in hierarchical

protocols, the network is structured into clusters and cluster-heads, which form a virtual

backbone for routing. The peers forming the backbone perform the routing task. As our

middleware is based on a pure peer-to-peer architecture, we will analyze the flat routing

protocols.

According to the number of destinations, routing protocols can also be classified into two

types: multicast and unicast. Unicast routing protocols are targeted for a single destination;

multicast protocols, for a group of destinations. In this thesis, we will review only unicast

protocols.

Finally, routing algorithms can be further classified as global positioning-based protocols

and global position-less protocols [73]. A global positioning-based protocol considers the

positions of peers during routing. Conversely, global position-less protocols do not

make any assumption about the positions of peers. In section 2.4.1 and section 2.4.2, we

present the two types of routing protocols. Later in section 2.4.3, we discuss the possible

integration of routing protocols with an information sharing system of a MANET.

2.4.1 Global Position-Less routing protocols

AODV [53], WRP [74, 75], DSDV [76], DSR [77] and ZRP [78] are popular unicast

MANET routing protocols. DSDV is a proactive routing protocol. In this protocol, devices

exchange routing information among each other. DSR and AODV are reactive routing

protocols in which routing information is searched on demand. ZRP is a hybrid protocol

that combines reactive and proactive features.

DSDV is designed based on the Bellman-Ford algorithm. In DSDV, every peer maintains

a routing table and broadcasts the table periodically to its direct neighbors. A routing table

contains the shortest distance to reach every possible destination in the network, the next

hop peer through which the shortest path passes, the address of the destination and a

sequence number, which indicates the presence/absence of a peer. A sequence number is

generated by the destination as an even number and can be updated to an odd number if the path



leading to the destination is broken. In the context of a MANET, the periodic

broadcasting of routes would drain the batteries of the participating devices

and would drastically increase the network traffic.
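
The table fields described above can be made concrete with a small sketch of how a peer might merge a neighbor's broadcast into its own table. The update rule used here (prefer a newer sequence number, and on equal sequence numbers prefer the shorter distance) is the commonly described DSDV rule and is not spelled out in the text above; the names `Route`, `merge_advertisement` and `mark_broken` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Route:
    dest: str       # address of the destination
    next_hop: str   # neighbor through which the shortest path passes
    hops: float     # distance to the destination
    seq: int        # even: issued by the destination; odd: path reported broken


def merge_advertisement(table, neighbor, advertised):
    """Merge a neighbor's broadcast routes into `table` (dict dest -> Route).

    Returns True if at least one entry changed.
    """
    changed = False
    for adv in advertised:
        candidate = Route(adv.dest, neighbor, adv.hops + 1, adv.seq)
        current = table.get(adv.dest)
        newer = current is None or candidate.seq > current.seq
        shorter = (current is not None and candidate.seq == current.seq
                   and candidate.hops < current.hops)
        if newer or shorter:
            table[adv.dest] = candidate
            changed = True
    return changed


def mark_broken(table, neighbor):
    """On link loss, flag routes through `neighbor` with an odd sequence number."""
    for route in table.values():
        if route.next_hop == neighbor and route.seq % 2 == 0:
            route.seq += 1
            route.hops = float("inf")
```

Every call to `merge_advertisement` corresponds to one received broadcast, which makes the cost of the periodic exchange visible: each peer processes a full table from each neighbor at every period, whether or not anything has changed.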

WRP uses an improved Bellman-Ford Distance Vector routing algorithm. To adapt to the

dynamic features of mobile ad-hoc networks, mechanisms are introduced to ensure the

reliable exchange of update-messages and to limit the routing loops created by the

Bellman-Ford algorithm. In WRP, in addition to the routing table, peers maintain two other tables: a

distance table and a link-cost table. The distance table stores the distance to each

destination through each one-hop neighbor, and the link-cost table stores the corresponding cost. The cost of a link

can be the number of hops to a destination or the number of hops plus a bias value that

indicates the reliability of the link. In WRP, peers exchange routing tables with their

one-hop neighbors using update-messages. Update-messages can be sent either

periodically or whenever a link state changes. Additionally, if there is no change in its

routing table since the last update, a peer is required to send a “hello message” to ensure its

connectivity. When a peer receives a hello message from a new peer, the new peer is added

to its routing table, and the receiver peer sends a copy of its routing table information to the

new peer.

In general, proactive protocols like DSDV and WRP are not suitable for MANETs. As

MANETs are very dynamic networks, the routing tables must be updated whenever a peer

appears, disappears or reappears. This updating process creates a high burden on the

peers participating in a MANET.

AODV is based on the DSDV protocol. However, peers build their routing tables on demand by

using route request / route reply query cycles. When a source peer searches for a route to

a destination, it broadcasts a route request (RREQ) packet across the network. The request

contains a source address, a source sequence number, a broadcast ID, a destination address,

a destination sequence number, and a hop count. Peers receiving this packet update their

information for the source peer. A peer receiving a RREQ packet sends a route reply

(RREP) back to the source if it is the destination or if it has a route to the



destination. If this is not the case, it rebroadcasts the RREQ. As a route is searched when a

source needs to send something to the destination, the route searching process will increase

the data transfer time.
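
The handling of a route request at one peer can be sketched as follows. This is a simplified illustration and not the full AODV specification: duplicate suppression through the (source, broadcast ID) pair and the freshness check on the destination sequence number follow the usual description of AODV, while the names `RREQ`, `AodvPeer` and `on_rreq` and the tuple-based return values are assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RREQ:
    src: str            # source address
    src_seq: int        # source sequence number
    broadcast_id: int   # identifies this request together with `src`
    dst: str            # destination address
    dst_seq: int        # last destination sequence number known to the source
    hop_count: int = 0


class AodvPeer:
    def __init__(self, addr):
        self.addr = addr
        self.routes = {}    # dest -> (next_hop, hop_count, seq)
        self.seen = set()   # (src, broadcast_id) pairs already processed

    def on_rreq(self, rreq, sender):
        """Return ('drop',), ('rrep', hops_to_dst) or ('rebroadcast', rreq)."""
        key = (rreq.src, rreq.broadcast_id)
        if key in self.seen:
            return ("drop",)
        self.seen.add(key)
        # Update the information about the source (reverse route).
        self.routes[rreq.src] = (sender, rreq.hop_count + 1, rreq.src_seq)
        if rreq.dst == self.addr:
            return ("rrep", 0)
        known = self.routes.get(rreq.dst)
        if known is not None and known[2] >= rreq.dst_seq:
            return ("rrep", known[1])   # fresh enough route is already known
        return ("rebroadcast", replace(rreq, hop_count=rreq.hop_count + 1))
```

The sketch shows why the first packet to an unknown destination is delayed: the request may have to be rebroadcast hop by hop until it reaches the destination or a peer holding a sufficiently fresh route.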

DSR is a reactive routing protocol. In this protocol, each peer caches the routes that it has

learnt. Peers learn about possible routes during packet forwarding. The cached routes are

deleted after a predefined period if they are not updated. A host can obtain a suitable

source route by searching its cache of routes. If no route is found in its cache, it will initiate

a route discovery process to find a route to the destination dynamically. A route discovery

process involves broadcasting a route request packet. The route request packet contains the

address of the source and the destination, and a unique identification number. Each

intermediate peer checks whether it knows a route leading to the destination. If it does not,

it appends its address to the route record of the packet and forwards the packet to its

neighbors. If the route discovery is successful then the source host, which has initiated the

route discovery, receives a route reply packet. Learning routes during packet forwarding is

an important contribution of this protocol.
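
The route discovery process of DSR can be approximated by a breadth-first flood in which every request carries its route record. The sketch below works on a static adjacency map and ignores timing, route replies and cache expiry; the function name `dsr_discover` and the shape of the `graph` and `caches` arguments are assumptions made for illustration.

```python
from collections import deque


def dsr_discover(graph, source, dest, caches=None):
    """Flood a route request and return the first source route found, or None.

    graph:  dict peer -> list of one-hop neighbors.
    caches: optional dict peer -> {dest: route starting at that peer}.
    """
    caches = caches or {}
    queue = deque([[source]])   # each entry is the route record so far
    seen = {source}             # peers that already handled this request
    while queue:
        record = queue.popleft()
        peer = record[-1]
        if peer == dest:
            return record
        cached = caches.get(peer, {}).get(dest)
        if cached:
            # The peer knows a route: complete the record from its cache.
            return record + cached[1:]
        for neighbor in graph.get(peer, []):
            if neighbor not in seen:
                seen.add(neighbor)
                # The neighbor's address is appended to the route record.
                queue.append(record + [neighbor])
    return None
```

For example, on the hypothetical topology `{"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C", "E"], "E": ["D"]}`, a discovery from `"A"` to `"E"` returns the complete source route as a list of addresses, which the source can then place in the header of its data packets.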

ZRP divides the network into different zones with a variable size. The number of hops is

used to measure the size of a zone. Each peer has its own zone. A peer keeps routes to

every destination within its zone; hence, a packet can be delivered proactively when the

source and the destination are in the same zone. For routes beyond the local zone, a route

discovery is done in a reactive fashion. The source peer sends a route request to its border

peers. The border peers of a source are those found at r hops away from the source where r

is the radius of the zone defined for the source. The border peers check their local zone for

the destination. If the requested peer is not a member of the local zone, the peer adds its

own address to the route request packet and forwards the packet to its border peers. If the

destination is a member of the local zone of the peer, it sends a route reply on the reverse

path back to the source. The source peer uses the path saved in the route reply packet to

send data packets to the destination. Dividing peers into zones is an important contribution

of this work.
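
The notions of zone and border peers can be made concrete with a hop-limited breadth-first search. This is only a sketch on a static adjacency map; the function names `hop_distances`, `zone` and `border_peers` are assumptions and do not come from the ZRP specification.

```python
from collections import deque


def hop_distances(graph, source, radius):
    """Breadth-first hop counts from `source`, limited to `radius` hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        peer = queue.popleft()
        if dist[peer] == radius:
            continue  # do not explore beyond the zone radius
        for neighbor in graph.get(peer, []):
            if neighbor not in dist:
                dist[neighbor] = dist[peer] + 1
                queue.append(neighbor)
    return dist


def zone(graph, source, radius):
    """Peers for which `source` keeps proactive routes."""
    return set(hop_distances(graph, source, radius))


def border_peers(graph, source, radius):
    """Peers exactly `radius` hops away: the targets of reactive route requests."""
    return {p for p, d in hop_distances(graph, source, radius).items()
            if d == radius}
```

On a chain A-B-C-D-E with radius 2, the zone of A is {A, B, C} and its only border peer is C, so a request for E is handed to C, whose own zone contains E.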



2.4.2 Global Positioning based routing protocols

Position-based protocols use the geographic positions of peers, which are determined by

using a localization service, to facilitate the routing [79]. In these protocols, the routing

decision only depends on the position of the destination and the position of the direct

neighbors. Thus, they may not need to establish or maintain routes. LAR [80], DREAM

[81] and DG-CastoR [82, 83] are examples of position-based routing protocols.

Location-Aided Routing (LAR) [80] is an extension of DSR [77] in which routing is guided by

the physical locations of destinations and sources. If the source peer does not know

the position of the destination peer, it uses the DSR protocol to discover it. Otherwise, the

source peer sends packets to peers that are located in the direction of the destination.

In LAR, an expected zone and a request zone are defined to facilitate the routing process.

The expected zone is the area in which the destination is expected to be located. It is calculated

from the last known location of the destination and its velocity. The request zone is the area within which

query forwarding can be performed.

The protocol proposes two schemes to calculate the request zones. In the first scheme,

the request zone is calculated as the smallest rectangle including the source and the expected

zone. In this scheme, the source includes the four corners of the rectangle in the routing

message. A peer forwards the message if it finds itself in the rectangle. In the second

scheme, the request zone is defined by two constants: a and b. The source includes its

distance from the destination (let us refer to it as DISTs) in the routing message. Let

DISTi be the distance of the peer receiving the message to the destination peer. This peer

forwards the message, replacing DISTs by DISTi, if a × DISTs + b ≥ DISTi.
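
Both request-zone schemes reduce to a simple forwarding test, sketched below. Two points are assumptions rather than statements from the text above: the expected zone is modeled as a circle whose radius is the speed multiplied by the elapsed time, as in the usual presentation of LAR, and the second scheme forwards when DISTi is no larger than a × DISTs + b, as in the original LAR proposal. All function names are illustrative.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) positions."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def expected_zone(last_pos, speed, elapsed):
    """Circle (center, radius) in which the destination is expected to be."""
    return last_pos, speed * elapsed


def request_rectangle(source, center, radius):
    """Scheme 1: smallest axis-aligned rectangle (xmin, ymin, xmax, ymax)
    containing the source and the expected zone."""
    (sx, sy), (cx, cy) = source, center
    return (min(sx, cx - radius), min(sy, cy - radius),
            max(sx, cx + radius), max(sy, cy + radius))


def forwards_scheme1(peer, rect):
    """A peer forwards the message only if it lies inside the rectangle."""
    x, y = peer
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax


def forwards_scheme2(dist_s, dist_i, a=1.0, b=0.0):
    """Scheme 2: forward if the receiver is not farther from the destination
    than a * DISTs + b, where DISTs comes from the previous sender."""
    return a * dist_s + b >= dist_i
```

With a = 1 and b = 0, the second scheme only lets the message move to peers that are at least as close to the destination as the previous sender; a larger b tolerates small detours at the cost of more forwarding.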

In the Distance Routing Effect Algorithm for Mobility (DREAM) [81], each peer maintains a position

database that contains the geographical positions of the destination peers. Every peer

periodically floods its position up to a fixed distance. When a peer S wants to send a packet

to a peer R, it sends the packet to its neighbors found in the direction of R. The neighbors

forward the packet to their neighbors found in the direction of R. This process continues



until the packet arrives at R. DREAM uses a method similar to the second scheme of LAR

to determine the peers lying in the direction of R.

DG-CastoR [82, 83] is a routing protocol proposed in order to improve information

exchange in a VANET. In this algorithm, one-hop neighbors exchange their trajectories

(i.e., current and future positions) periodically. A query used to search information is

distributed to peers that follow the same trajectory as the source. Considering trajectory is

an important contribution of the protocol.

2.4.3 Discussion

In this section, we have reviewed popular routing algorithms such as DSDV, AODV, DSR and

LAR. A proactive routing protocol like DSDV is not convenient for MANETs since it

produces a lot of traffic. Route discovery time of reactive routing protocols like AODV

will increase the information discovery time.

The location-aware routing protocol (LAR) introduces location awareness over DSR in

order to facilitate the routing process. However, as it uses DSR, if the peer does not have

the position of the destination, the information discovery time will increase when peers

communicate for the first time. LAR does not differ significantly from DSR in a

MANET where peers change position too frequently. DREAM avoids the problems of

LAR by discovering the route of some peers proactively. However, as in all proactive

protocols, the broadcasting of update-messages will create high traffic. DG-CastoR

differs from DREAM in that it considers the future positions (trajectories) of peers.

As our thesis focuses on indoor MANETs where the dynamicity of the network is

limited, we choose LAR to integrate with our middleware. In the future, we will integrate a

routing protocol that combines the features of LAR and DG-CastoR (see section 7.3).



2.5 Summary

In this chapter, we have reviewed research works in the domain of information sharing,

service discovery and routing in MANETs and traditional peer-to-peer networks according

to the requirements presented in chapter one.

We have observed that it is difficult to adopt traditional P2P systems since they are not designed for

mobile environments, where users typically carry battery-powered thin devices.

Nevertheless, we have observed the important contributions of these systems. Content

downloading performed by BitTorrent protocol [23], which is used to download files block

by block, should be adopted in MANETs in order to facilitate the information delivery

process. File rearrangement performed by Freenet [31, 32] is another interesting feature

even if it is difficult to design a hash function that takes battery power and processing

power of devices into consideration. Social awareness considered by Dodgeball [35, 36]

and ImaHima [39] should be used in MANETs in order to facilitate information exchange

between nomadic users.

In MANETs, we have analyzed ORION [40], Code-Torrent [41], LIME [42, 43],

Limeone [44], TOTA [45, 46], PeerWare [47], AdHocFS [48, 49], InfoWare [50] and

XMIDDLE [51]. Mobility awareness is only partially touched by InfoWare and

XMIDDLE. These systems use the replication of files to cope with the problems arising

from the peers’ mobility patterns. We have also observed that propagation rules in TOTA

and mobile agents in Limeone could be used to resist the network dynamicity. It is difficult

to conclude that interest awareness is touched by these information-sharing systems. As

PeerWare and InfoWare use a Publish/Subscribe information discovery approach, they

might work according to the interests of users. However, as the authors do not discuss

the kinds of events that publishers produce, it is difficult to conclude that they have an

interest awareness feature.

We have also analyzed service discovery protocols like Allia, GSD and Konark. Allia

and GSD consider the requirement “mobility awareness”. These protocols propose to

change the strategy of service discovery method according to the mobility patterns of



users. However, the authors do not explicitly discuss the parameterization of the strategy of

service discovery according to the mobility pattern of users. Konark arranges services in a

tree and makes advertisement by selecting some branches of the service tree.

Data routing should be integrated with information sharing in order to facilitate the

information discovery and the information delivery processes. ORION and most service

discovery protocols integrate a reactive routing protocol. However, a reactive routing

protocol increases the information discovery time. From the survey that we have made in

the routing domain, we have observed that integrating a location aware routing protocol

will improve the performance of the information sharing activity.

Despite all the research efforts, the requirement “mobility awareness” has not been given

enough attention. We have also observed that the requirement “interest awareness” is

considered to a very limited extent.

In this thesis, we propose approaches that enable an information sharing system to

discover files according to the interests and the mobility patterns of users. We also propose

a method to organize files in a tree in such a way that file discovery can be performed with

minimum overhead. Based on the propositions, we design and implement a self-adaptive

middleware called SAMi. In the next chapters, we discuss our propositions and

the middleware SAMi in detail.


Chapter 3 Interest Awareness

Peer-to-peer information sharing over MANETs has become an important research topic

due to advances in information and communication technologies. To avoid

information overload, information discovery in MANETs, which is required to

carry out information sharing, should be performed according to the users’ interests.

Moreover, interest awareness can be used to increase the collaboration of users in the

information discovery process.

In this chapter, we propose an interest aware information discovery approach that

discovers files by disseminating advertisements and queries. In this approach, the users’

interests are used to filter the files to be advertised so that the usage of advertisements

increases and their overhead decreases. The interests of users are also used during query

resolution. Furthermore, in order to increase the scalability of the information discovery

approach, advertisements and queries are disseminated according to the users’ interests.

The proposed approach permits users to specify their interests reactively. It is also possible

to compute the users’ interests automatically by analyzing their habits of information

sharing. Finally, the users’ social networks are used to facilitate the interest identification

process.

The research work in this chapter was presented at the International Workshop on Mobile
P2P Data Management, Security and Trust (MP-DMST*), organized in conjunction with
the MDM 2010 conference [84], and at the Pervasive Computing and Communications
Workshops (PerComW 2010) [85].

This chapter is organized as follows. Section 3.1 presents the motivation behind the

design of an interest aware information discovery approach. Section 3.2 defines the

important concepts and operations used throughout the chapter. Section 3.3 discusses the

proposed information discovery approach. Section 3.4 deals with the identification of the

users’ interests. Section 3.5 illustrates the exploitation of users’ social networks in the

interest identification process. Section 3.6 discusses other interest-aware approaches.
Finally, Section 3.7 concludes the chapter.



3.1 Motivation

The effort and the time needed to search information will be minimized if the

information that users need is provided automatically with respect to their interests.

Filtering the files to be advertised with respect to the users’ interests will minimize the

volume of advertisements. Similarly, considering the interests of users during the query

resolution increases the satisfaction and the collaboration of users. Finally, the interests of

users can facilitate the routing of information and hence, can make the system scalable.

Therefore, we argue that the users’ interests should definitely be considered when
designing an information discovery approach.

Recall the scenario discussed in chapter 1: the buses and the cafeteria of INSA are among the
places where Pascal uses MANETs to share information. Let us examine the detailed
information needs of Pascal and of the people with whom he usually shares information.
Assume that Pascal can take either “Bus 27” or “Bus 37” to go to the office at 8 A.M. In
these buses, Pascal communicates with other passengers by using a MANET, which is
formed via Bluetooth. Most of his friends also take Bus 37, where they exchange jokes and
photos of tourist places. People working in banks take Bus 27. As Pascal is also
interested in financial affairs, he exchanges news about finance in this bus. Pascal also uses
MANETs at a cafeteria of INSA named INSA-Café, where he exchanges information about
new research areas and research issues with his colleagues.

According to the above scenario, Pascal and his friends are interested in exchanging jokes
and photos in Bus 37. Assume that the MANET formed in Bus 37 is as displayed in Figure 3-1,
such that Pascal is a neighbor of Eve and David, who are neighbors of Carol. Pascal
is also a neighbor of Bob and Anne. Let us observe the information exchange from the side

of Pascal. As Eve and Carol are interested in getting photos, Pascal should send

advertisements about photos to Eve and Eve should forward the advertisements to Carol.

Note that Bob and Anne are interested in providing jokes and photos respectively. As a

result, Pascal should send queries concerning jokes to Bob and queries concerning photos

to Anne.



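[Figure: the MANET formed in Bus 37, with the nodes Pascal, Anne, Bob, Carol, David and Eve connected by wireless links. The legend distinguishes joke and photo interests (interest in, interest out) and the flows of advertisements and queries.]

Figure 3-1: A MANET in Bus 37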

As illustrated by the scenario, interest awareness facilitates the information discovery

process. Therefore, the interests of users should be considered during selection of files to

be advertised and queries to be resolved. Similarly, the dissemination of advertisements

and queries should be done according to the interests of users.

It is difficult for nomadic users to specify their interests each time that they participate in

a MANET. Thus, the information discovery process should be able to identify the users’

interests automatically. We propose that analyzing historical queries and advertisements

can be used to identify the interests of users in receiving and providing information.

The location and the time contexts have an influence on the information that a user wants

to receive. For example, a student will be more interested in sharing research papers in

INSA than in other locations. Users may be more interested in exchanging news in the
morning than at night. Users may also be more interested in providing songs during their
vacation than at other times.



The interests of users differ depending on whether they are with friends or with colleagues.
A user may be more interested in sharing information about his/her office environment with
colleagues than with friends. For example, Pascal is interested in exchanging jokes with his
friends and research issues with his colleagues.

The contributions of this chapter are related to: (1) formalization of the user interests, (2)

illustration of the application of the users’ interests in information discovery, (3)

identification of the users’ interests and (4) application of the users’ social networking in

the interest identification process.

3.2 Definitions

In MANETs, users have sharable files to provide for others. We assume that each of the

files is described by a set of keywords. Users also want to receive files from others. They

search for the files that they are looking for by disseminating queries, which are also
represented as sets of keywords.

Definition 3.1. Interest: A user-interest² represents a set of files that a user wants to
receive or provide. An interest I is represented as ({k1, .., kn}, w) where
• k1, .., kn are keywords (referred to as Description(I)) and
• w ∈ (0, 1] is a weight (referred to as Weight(I)).

Description(I) expresses the files represented by the interest and Weight(I) indicates the

preference/capacity of a user to receive/provide the files represented by I as compared to

his/her other interests.

An empty interest is defined to represent the files that a user is unable to describe. Like
any other interest, the files represented by an empty interest can be provided or received.
The description of an empty interest Ie contains nothing, i.e., Description(Ie) = ø.

2 We use the terms interest and user-interest interchangeably.



Definition 3.2. Similarity of Interests: The similarity of two interests Ii and Ij is defined

by the similarity of their descriptions. Let Similarity (Di, Dj) be a similarity value of two

descriptions Di and Dj.

Similarity(Di, Dj) can be computed by using a semantic similarity function such as the one
proposed in [86] or a lexical similarity function such as the one proposed in [87]. Lexical
similarity functions are simple to implement and cheap to execute, especially on thin devices
(e.g., a mobile phone). Thus, during our experiments, we have used a lexical similarity
function derived from the one introduced in [87]. However, our information discovery
approach can use other similarity functions without significant modifications. More
precisely, we define the similarity value of two descriptions Di and Dj as:

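$$
\mathrm{Similarity}(D_i, D_j) =
\begin{cases}
\dfrac{|D_i \cap D_j| \times (|D_i| + |D_j|)}{2 \times |D_i| \times |D_j|} & \text{if } D_i \neq \emptyset \text{ and } D_j \neq \emptyset \\[1ex]
0 & \text{otherwise}
\end{cases}
$$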

Note that Similarity(Di, Dj) is 0 if Di and Dj are disjoint and 1 if they are identical. As
described in Definition 3.1, Di = ø is the description of an empty interest. As the
descriptions of the files represented by an empty interest are unknown, it is not
possible to compare an empty interest with a non-empty interest. For the same reason, two
empty interests cannot be compared either. Taking the pessimistic case, we define
Similarity(Di, Dj) as 0 if Di = ø or Dj = ø.

The similarity value of interests Ii and Ij is computed as:

Similarity (Ii, Ij) =Similarity(Description(Ii), Description(Ij))

We define the similarity value of an interest and a file/query in the same way. Let Df be
the description of a file f and let q be a query. Similarity(I, f) and Similarity(I, q) are defined
as:
Similarity(I, f) = Similarity(Description(I), Df)
Similarity(I, q) = Similarity(Description(I), q)
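The lexical similarity of Definition 3.2 can be sketched in a few lines of Python. This is only an illustration: descriptions, files and queries are all modelled as plain sets of keywords, the function names are ours, and the default threshold of 0.5 for accSim is an assumption borrowed from the examples of this chapter.

```python
def similarity(d_i, d_j):
    """Lexical similarity of two keyword sets (Definition 3.2)."""
    d_i, d_j = set(d_i), set(d_j)
    if not d_i or not d_j:
        return 0.0  # pessimistic case: an empty description matches nothing
    common = len(d_i & d_j)
    # Average of the two overlap ratios |Di ∩ Dj|/|Di| and |Di ∩ Dj|/|Dj|.
    return common * (len(d_i) + len(d_j)) / (2 * len(d_i) * len(d_j))


def is_similar(d_i, d_j, acc_sim=0.5):
    """The relation written ≈ in the text; acc_sim plays the role of accSim."""
    return similarity(d_i, d_j) >= acc_sim


q1 = {"tree", "bush", "grass", "sidewalk"}
q2 = {"tree", "bush", "sidewalk"}
print(similarity(q1, q2))  # 0.875
```

The same function serves for interest-to-file and interest-to-query comparisons, since both reduce to comparing two keyword sets.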



We say that an interest Ii and an interest Ij are similar (denoted as Ii ≈ Ij) if and only if
their similarity value is greater than or equal to a predefined value accSim, i.e.,

Ii ≈Ij ⇔ Similarity (Ii,Ij) ≥ accSim

Similarly, we use the same similarity threshold to determine if an interest I is similar to a

file f (or a query q), i.e.,

Ii ≈f ⇔ Similarity (Ii, f) ≥ accSim

Ii ≈q ⇔ Similarity (Ii, q) ≥ accSim

Definition 3.3. Collaboration tie strength: For users having a habit of appearing

together in MANETs, we define the function Tie-strength(p1, p2) to indicate the degree of

their collaboration. For us, the collaboration between users is expressed by their sharing

habits.

Let numberSh(p1, p2) be the number of files that the peers p1 and p2 have shared, let
co-total-Time(p1, p2) be the total time that the peers p1 and p2 have stayed connected in
MANETs and let total-numberPr(p) be the number of files that a peer p provides to others in
MANETs. The collaboration tie strength between p1 and p2 (denoted as Tie-strength(p1, p2))
is computed as:

$$
\text{Tie-strength}(p_1, p_2) = \frac{\text{numberSh}(p_1, p_2)}{\text{total-numberPr}(p_1) + \text{total-numberPr}(p_2) + \text{co-total-Time}(p_1, p_2)}
$$

Note that the Tie-strength is symmetric, i.e., Tie-strength (p1, p2) = Tie-strength (p2, p1).

In a MANET, peers do not have global knowledge. Thus, it is difficult for a peer p to
compute the tie-strength between an arbitrary pair of peers p1 and p2. However, in our
protocol, the peer p needs to know the degree of collaboration between p1 and p2 in
order to forward advertisements of p1 to p2 (the advertisement forwarding process is
discussed in section 3.3). We propose that p estimates the tie strength between p1 and p2 by
combining his/her own degrees of collaboration with p1 and with p2, i.e., p computes
Tie-strength(p1, p2) as:



$$
\text{Tie-strength}(p_1, p_2) = \frac{\text{Tie-strength}(p, p_1) + \text{Tie-strength}(p, p_2)}{2}
$$

Definition 3.4. Sharing-context: A sharing-context³ of a peer describes a situation in
which the peer allows others to download files from his/her machine. A sharing context is
expressed as a tuple (L, [Ts, Tf]) where L is a location and [Ts, Tf] is a time interval.

For example, (Bus 1, [8AM, 10AM]) is a sharing context describing that a peer allows

others to download files from his/her machine when he/she is in Bus 1 from 8AM to

10AM. Table 3-1 lists other examples of sharing contexts.

Table 3-1: Examples of sharing contexts

Context Description

(Bus,[8AM, 10AM]) any bus from 8AM to 10AM

(“”, [8AM, 10AM]) any place from 8AM to 10AM

(“”, ø) anywhere and anytime

We define two types of sharing contexts: abstract and actual. An abstract sharing

context describes when and where a peer allows others to download files from his/her

machine. For example, a user can specify that others can download files from his/her

machine wherever and whenever she/he is in a MANET by fixing the sharing context to

(“”, ø). However, this does not mean that he/she is in a MANET 24 hours a day, 7
days a week. An actual sharing context is derived from an abstract sharing context by

considering the actual time and place in which data were shared. Assume that Pascal

having an abstract sharing context (“”, ø) is interconnected with other nomadic users via a

MANET in Bus 27 from 8 AM to 8:10 AM. As a result, (Bus 27, [8 AM, 8:10 AM]) is the

actual context derived from the abstract context of (“”, ø).

3 In this chapter, we use the terms sharing-context and context interchangeably.



Definition 3.5. Sharing-interest: A sharing-interest is the set of interests of a user in a

given context. It can be used as a demand or a provision. An information demand of a peer

is a sharing interest that contains the interests of the peer to receive information. An

information provision of a peer is a sharing interest that contains the interests of the peer to

provide information. Table 3-2 lists examples of Pascal’s information demands.

A sharing interest S has the following properties.

[1] |S| ≥ 1,
[2] Description(I1) ≠ Description(I2) for any two distinct interests I1, I2 ∈ S,
[3] $\sum_{I \in S} \mathrm{Weight}(I) = 1$,
[4] Weight(I) ≥ minW for every I ∈ S.

Table 3-2: Examples of information demands of Pascal

Context c                  Information-Demand(Pascal, c)
(Bus 37, [8AM-8:05AM])     {({accounts, banking, economics, financial affairs}, 0.3), ({fund, treasure, capital, currency, change}, 0.3), ({commerce, buying, selling, exchange, stock, trade}, 0.4)}
(Bus 27, [8AM-8:07AM])     {({joke, fun, quip, buffoonery}, 0.5), ({photo-places, photo-friends, photo-mountain, photo-nightclub}, 0.5)}
(INSA-Café, ø)             {({Social-networking, social analysis, social software}, 0.7), ({iPhone-models, iPhone models, iPhone history, iPod}, 0.3)}

We assume that a user participating in a MANET is interested to receive and provide

information. Thus, a sharing interest of a user will contain at least one interest. Two

interests in a sharing interest should represent different kinds of files; as a result, their



descriptions should not be identical, although they may overlap. As the weight of an interest
is defined as a comparative value (Definition 3.1), the sum of the weights of the interests in a
sharing interest should be one. Like any other data in a computer, the number of interests
in a sharing interest should be finite. Consequently, we define a minimum weight (minW)
to limit the number of interests, such that an interest cannot have a weight less than minW.
For example, if minW is 0.25, a sharing interest contains at most 4 interests, since the
weights sum to one and each of them is at least minW.
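The four properties of a sharing interest lend themselves to a direct check. The sketch below is illustrative only: a sharing interest is modelled as a list of (keywords, weight) pairs, and the function names are hypothetical rather than part of SAMi.

```python
import math


def is_valid_sharing_interest(sharing_interest, min_w):
    """Check properties [1]-[4] of Definition 3.5.

    sharing_interest: list of (keywords, weight) pairs.
    """
    if len(sharing_interest) < 1:                                    # [1]
        return False
    descriptions = [frozenset(k) for k, _ in sharing_interest]
    if len(set(descriptions)) != len(descriptions):                  # [2]
        return False
    if not math.isclose(sum(w for _, w in sharing_interest), 1.0):   # [3]
        return False
    return all(w >= min_w for _, w in sharing_interest)              # [4]


def max_interests(min_w):
    """Upper bound on |S| implied by properties [3] and [4]."""
    return math.floor(1 / min_w)


cafe_demand = [
    ({"Social-networking", "social analysis", "social software"}, 0.7),
    ({"iPhone-models", "iPhone models", "iPhone history", "iPod"}, 0.3),
]
print(is_valid_sharing_interest(cafe_demand, min_w=0.25))  # True
print(max_interests(0.25))                                 # 4
```

The weights are compared with a tolerance because sums of floating-point weights rarely equal 1 exactly.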

Empty sharing interest: An empty sharing-interest, denoted as {(ø, 1)}, is defined as a

set containing only an empty interest.

Information-Demand(p, pd, c): Information-Demand(p, pd, c) is an information
demand of a peer p observed by a peer pd in a context c. When p is the same as pd,
Information-Demand(p, pd, c) is referred to as Information-Demand(p, c).
Information-Provision(p, pd, c): Information-Provision(p, pd, c) is an information
provision of a peer p observed by a peer pd in a context c. When p is the same as pd,
Information-Provision(p, pd, c) is referred to as Information-Provision(p, c).
Overall-Demand(P) and Overall-Provision(P): For a set of peers P, their common
interests to receive and provide information are referred to as Overall-Demand(P) and
Overall-Provision(P) respectively. The overall demand of peers is computed by
aggregating their information demands via the operation described in Definition 3.7.
Similarly, the overall provision of peers is computed by aggregating their information
provisions.

Definition 3.6. Similarity of sharing interests: The primary condition that we use to
decide the similarity of two sharing interests S1 and S2 is the similarity of the interests in
the two sets, i.e., for every interest Ii in S1, there should be an interest Ij in S2 such that Ii ≈ Ij
and vice versa. The sharing-interests are not similar if the primary condition is not
satisfied. We use the cosine similarity measure to determine the similarity of two
sharing-interests satisfying the primary condition.



Assume that S1 = {I11, .., I1n} and S2 = {I21, .., I2m} such that |S1| = n and |S2| = m. Let the
weights of I1i and I2i be W1i and W2i respectively. Let P1 be the vector representation of S1
and P2 be the vector representation of S2; we define these vectors as:
• P1 = (W11, .., W1n) and
• P2 = (W21, .., W2m)
Let W12i be the average weight of the interests in S1 matching the interest I2i and
W21i be the average weight of the interests in S2 matching the interest I1i. For a
sharing interest S, let Sim(S, I) be the set of interests in S similar to the interest I; W12i and
W21i are computed as:

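$$
W_{12i} = \frac{\sum_{I \in \mathrm{Sim}(S_1, I_{2i})} \mathrm{Weight}(I)}{|\mathrm{Sim}(S_1, I_{2i})|}
\qquad
W_{21i} = \frac{\sum_{I \in \mathrm{Sim}(S_2, I_{1i})} \mathrm{Weight}(I)}{|\mathrm{Sim}(S_2, I_{1i})|}
$$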

Let P12 be the vector representation of S1 with respect to S2 and P21 be the vector

representation of S2 with respect to S1; we define these vectors as:

• P12 = (W121,..,W12m)

• P21=(W211, ..,W21n)

The primary similarity condition is satisfied by S1 and S2 if and only if

∀ Ii ∈ S1, ∃ Ij∈ S2 such that Ii ≈ Ij and

∀ Ii ∈ S2, ∃ Ij∈ S1 such that Ii ≈ Ij

We define the similarity value between the sharing interests S1 and S2 as:



$$
\mathrm{Similarity}(S_1, S_2) =
\begin{cases}
\dfrac{\cos(P_1, P_{21}) + \cos(P_2, P_{12})}{2} & \text{if the primary similarity condition is satisfied} \\[1ex]
0 & \text{otherwise}
\end{cases}
$$

Similarity(S1 ,S2) is a commutative operation. The sharing interests S1 and S2 are said to

be similar if and only if

S1 ≈ S2 ⇔ Similarity(S1, S2) ≥ accC

where accC is a predefined similarity threshold.

Example: Assume that S1 = {({Social-networking}, 0.3), ({iPhone-models}, 0.7)}, S2 =
{({Social-networking}, 0.7), ({iPhone-models}, 0.3)} and accC = 0.5.
Observe that P1 = (0.3, 0.7), P2 = (0.7, 0.3), P21 = P2 and P12 = P1. Thus, cos(P1, P2) = cos(P1,
P21) = cos(P2, P12) = (0.3 × 0.7 + 0.7 × 0.3) / (0.3² + 0.7²) = 0.42 / 0.58 ≈ 0.72 and hence,
Similarity(S1, S2) ≈ 0.72 ≥ accC; therefore, S1 and S2 are similar.
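To make Definition 3.6 concrete, the following Python sketch computes the similarity of two sharing interests. It is a minimal illustration under stated assumptions: sharing interests are lists of (keywords, weight) pairs, the lexical similarity of Definition 3.2 is used for the primary condition, and the function names and the two sample sharing interests are hypothetical.

```python
import math


def similarity(d_i, d_j):
    """Lexical similarity of Definition 3.2 (0 when either set is empty)."""
    d_i, d_j = set(d_i), set(d_j)
    if not d_i or not d_j:
        return 0.0
    common = len(d_i & d_j)
    return common * (len(d_i) + len(d_j)) / (2 * len(d_i) * len(d_j))


def cosine(u, v):
    """Cosine of the angle between two weight vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def matched_weights(s_from, s_to, acc_sim):
    """For each interest of s_from, the average weight of the similar
    interests in s_to; None if some interest has no similar counterpart."""
    averages = []
    for description, _ in s_from:
        weights = [w for d, w in s_to if similarity(description, d) >= acc_sim]
        if not weights:
            return None  # primary similarity condition violated
        averages.append(sum(weights) / len(weights))
    return averages


def sharing_similarity(s1, s2, acc_sim=0.5):
    p21 = matched_weights(s1, s2, acc_sim)  # S2 with respect to S1, length |S1|
    p12 = matched_weights(s2, s1, acc_sim)  # S1 with respect to S2, length |S2|
    if p21 is None or p12 is None:
        return 0.0
    p1 = [w for _, w in s1]
    p2 = [w for _, w in s2]
    return (cosine(p1, p21) + cosine(p2, p12)) / 2


s_a = [({"joke"}, 0.5), ({"photo"}, 0.5)]
s_b = [({"joke"}, 0.9), ({"photo"}, 0.1)]
print(round(sharing_similarity(s_a, s_b), 2))
```

Two sharing interests are then declared similar when the returned value reaches the threshold accC.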

Definition 3.7. Aggregation of sharing interests: The aggregation of a set of
sharing-interests is used to extract the common features of the users’ sharing interests. The
aggregation of a set of sharing interests T = {S1, S2, …, Sn}, denoted as ⊕∑Si, is computed
by using the following two steps.
Step 1: Decompose interests
Let TNI be the set of all non-empty interests in the sharing interests to be aggregated. The
interests in TNI are decomposed into GTNI = {T1, …, Tm} such that the interests in the same
group are more similar than the interests in different groups. We propose to perform the
decomposition of non-empty interests by using a method⁴ derived from the agglomerative
hierarchical clustering approaches [88].

4 The grouping of interests is performed in the same way as the grouping of queries. The algorithm proposed to group queries is discussed in section 3.4.1.



More specifically, the sets in GTNI satisfy the following properties:
Ip, Iq ∈ Tk ∈ GTNI ⇒ Ip ≈ Iq
Tk, Ts ∈ GTNI and k ≠ s ⇒ ∃ Ip ∈ Tk and ∃ Iq ∈ Ts such that Ip !≈ Iq
Let Sim(Ts, Iq) be the set of interests in Ts similar to Iq; for Iq ∈ Tk, the
following property holds:

$$
\frac{\sum_{I_p \in T_k - \{I_q\}} \mathrm{Similarity}(I_p, I_q)}{|T_k| - 1}
\;\geq\;
\frac{\sum_{I_p \in \mathrm{Sim}(T_s, I_q)} \mathrm{Similarity}(I_p, I_q)}{|T_s|}
$$

Step 2: Identifying interests in ⊕∑Si
From each Tk, an interest Ik is computed in such a way that
• $\mathrm{Weight}(I_k) = \sum_{I \in T_k} \mathrm{Weight}(I) \,/\, |T|$
• $\mathrm{Description}(I_k) = \bigcap_{I \in T_k} \mathrm{Description}(I)$

As discussed in Definition 3.5, every interest in a sharing interest should have a weight
greater than or equal to minW (the predefined threshold introduced in Definition 3.5).
Consequently, the interest Ik is added to ⊕∑Si only if Weight(Ik) ≥ minW.
Finally, according to Definition 3.5, the sum of the weights of the interests in a sharing
interest should be one. Let SumNI be the sum of the weights of the non-empty interests in
⊕∑Si. If 1 − SumNI ≥ minW, an empty interest Ie is added to ⊕∑Si such that
Weight(Ie) = 1 − SumNI. If 1 − SumNI < minW but SumNI < 1, the weight of each interest I
in ⊕∑Si is normalized using the formula below.
Weight(I) = Weight(I) ÷ SumNI
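Step 2 of the aggregation can be sketched as follows. The sketch assumes that Step 1 has already produced the groups T1, …, Tm; interests are (keywords, weight) pairs, the empty interest is modelled as an empty frozenset, and the function name and sample data are hypothetical.

```python
def aggregate_groups(groups, n_sharing_interests, min_w):
    """Step 2 of Definition 3.7.

    groups: the groups T1..Tm from Step 1, each a non-empty list of
            (keywords, weight) pairs.
    n_sharing_interests: |T|, the number of aggregated sharing interests.
    """
    result = []
    for group in groups:
        weight = sum(w for _, w in group) / n_sharing_interests
        description = frozenset.intersection(*(frozenset(k) for k, _ in group))
        if weight >= min_w:  # interests below minW are dropped
            result.append((description, weight))

    sum_ni = sum(w for _, w in result)
    if 1 - sum_ni >= min_w:
        # Enough weight is left over: represent it by an empty interest.
        result.append((frozenset(), 1 - sum_ni))
    elif sum_ni < 1:
        # Otherwise rescale so that the weights sum to one again.
        result = [(d, w / sum_ni) for d, w in result]
    return result


groups = [
    [({"joke", "fun"}, 0.6), ({"joke", "quip"}, 0.5)],
    [({"photo"}, 0.4)],
    [({"news"}, 0.5)],
]
for description, weight in aggregate_groups(groups, n_sharing_interests=2, min_w=0.25):
    print(sorted(description), round(weight, 4))
```

In this run the photo group falls below minW and is discarded, and the remaining two weights are rescaled because the leftover weight is too small to justify an empty interest.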



3.3 Interest aware Information Discovery

In a MANET, information discovery can be performed by using two approaches: push

and pull. In a push approach, data sources make others aware about their sharable files by

disseminating advertisements; in a pull approach, a requester peer searches the source of a

file by distributing queries. As discussed in section 3.1, both approaches should be

conducted according to the interests of users. Thus, data-sources should prepare and

disseminate advertisements about their sharable files according to the information demands

of data-requesters. Similarly, data-requesters should resolve queries according to the

information provisions of data-sources.

When joining a MANET, data-sources and data-requesters distribute their interests to

provide and receive information in their vicinities. Let P be a set of peers in the MANET

about which a peer p is aware of; we propose that the peer p estimates the overall demand

of peers in P and their overall provision by using the aggregation operation described in

Definition 3.7, i.e.,

$$
\text{Overall-Demand}(P) = \oplus\!\sum_{p \in P} \text{Information-Demand}(p)
$$

$$
\text{Overall-Provision}(P) = \oplus\!\sum_{p \in P} \text{Information-Provision}(p)
$$

Let Sod be the overall demand of the requester peers in a MANET. A data-source peer in
this MANET should preferably advertise files matching the interests of the overall
demand Sod. Let Adv-Volume⁵ be the number of metadata entries that the data source can
use to advertise sharable files. Let N(I) be the number of metadata entries that it can use to
advertise files matching the interest I ∈ Sod. N(I) is computed as:
N(I) = Weight(I) × Adv-Volume

5 We will discuss the computation of Adv-Volume in the next chapter.



Let F(I) be a set of files matching the interest I∈ Sod and ADV(I) be a container used to

store advertisements related to the interest I. The data source selects at maximum N(I) files

from F(I) and puts their metadata in ADV(I) via Algorithm 3-1.

For each non-empty interest I, Algorithm 3-1 places the files matching the interest I in

F(I) (lines 1 to 3). For an empty interest Ie, F(Ie) is filled with the files that do not match

any of the non-empty interests (lines 4 to 6). If the advertisement quota for I, i.e., N(I), is

sufficient to advertise all the files in F(I), the metadata of each file in F(I) is placed in

ADV(I) (lines 8 to 9). Otherwise, for each non-empty interest I, some of the files in F(I) are

selected to be advertised according to their similarities (relevance) to the interest (lines 10

to 11). If f1 and f2 are sharable files in F(I) such that Similarity(f1, I) is greater than
Similarity(f2, I), f1 is said to be more relevant to I than f2. In this case, f1 has a better
chance of matching the users’ needs. Therefore, the data source peer prefers f1 over f2 for
inclusion in ADV(I). If the interest I is an empty interest and the
advertisement quota of I is not enough to advertise all the files in F(I), some of the files in
F(I) are randomly selected to be advertised (lines 12 to 13).

For any interest I, the dissemination of ADV(I) is performed according to: (1) the

direction of peers having information demand matching the interest I or/and (2) the degree

of collaboration between the data source peer and the peer to which the advertisement will

be forwarded. An information demand S matches an interest I if and only if ∃ Ii ∈ S such

that Ii is similar to I.

The Tie-strength function, described in Definition 3.3, is used to calculate the degree
of collaboration between two peers. Let min-tie be the minimum tie strength that a peer
must have with a data source in order to receive its advertisements. A peer p is said to have
a high degree of collaboration with the data source peer ps if and only if
Tie-strength(ps, p) ≥ min-tie



Algorithm: Advertisement message preparation
Input:  Sod, N(I) ∀ I ∈ Sod, F
        Sod: overall demand
        F: sharable files
        N(I): advertisement quota of an interest I
Output: ADV(I) ∀ I ∈ Sod
        ADV(I): metadata to be advertised w.r.t. an interest I
Begin
    // put files matching a non-empty interest I in F(I)
1.  For any I ∈ Sod | Description(I) ≠ ø
2.      F(I) ← {f | f ≈ I and Similarity(f, I) ≥ Similarity(f, Ij) for ∀ Ij ∈ Sod}
3.  End For
    /* put all files that are not similar to any of the non-empty interests in F(Ie), where Ie is an empty interest */
4.  If ∃ Ie ∈ Sod | Description(Ie) = ø
5.      F(Ie) ← F − ∪ {F(I) | I ∈ Sod − {Ie}}
6.  End If
7.  For any I ∈ Sod
8.      If (|F(I)| < N(I))
9.          ADV(I) ← {metadata(f) | f ∈ F(I)}
10.     Else if (Description(I) ≠ ø)
            /* Relevant(F, I, n): contains the n most relevant (similar) files to the interest I in F */
11.         ADV(I) ← {metadata(f) | f ∈ Relevant(F(I), I, N(I))}
12.     Else
            // Random(F, n): contains n files which are taken randomly from F
13.         ADV(I) ← {metadata(f) | f ∈ Random(F(I), N(I))}
14.     End if
15. End for
End Algorithm

Algorithm 3-1: Advertisement message preparation
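A runnable approximation of Algorithm 3-1 may help to follow the quota logic. The Python sketch below is not the SAMi implementation: the data model (keyword sets, file names standing in for metadata), the rounding of N(I) down to an integer, the default accSim of 0.5 and the assignment of each file to a single best-matching interest are all simplifying assumptions.

```python
import random


def similarity(d_i, d_j):
    """Lexical similarity of Definition 3.2 (0 when either set is empty)."""
    d_i, d_j = set(d_i), set(d_j)
    if not d_i or not d_j:
        return 0.0
    common = len(d_i & d_j)
    return common * (len(d_i) + len(d_j)) / (2 * len(d_i) * len(d_j))


def prepare_advertisements(overall_demand, files, adv_volume, acc_sim=0.5, rng=None):
    """Sketch of Algorithm 3-1.

    overall_demand: list of (frozenset_of_keywords, weight); the empty
                    interest is represented by frozenset().
    files:          dict mapping a file name to its frozenset of keywords.
    Returns a dict mapping each interest description to the advertised names.
    """
    rng = rng or random.Random()
    non_empty = [desc for desc, _ in overall_demand if desc]

    # Lines 1-6: build F(I); files matching no non-empty interest go to F(Ie).
    f_of = {desc: [] for desc, _ in overall_demand}
    unmatched = []
    for name, keywords in files.items():
        best = max(non_empty, key=lambda d: similarity(keywords, d), default=None)
        if best is not None and similarity(keywords, best) >= acc_sim:
            f_of[best].append(name)
        else:
            unmatched.append(name)
    if frozenset() in f_of:
        f_of[frozenset()] = unmatched

    # Lines 7-15: fill ADV(I) within the quota N(I) = Weight(I) * Adv-Volume.
    adv = {}
    for desc, weight in overall_demand:
        quota = int(weight * adv_volume)  # assumption: quota rounded down
        candidates = f_of[desc]
        if len(candidates) < quota:
            adv[desc] = list(candidates)           # everything fits
        elif desc:
            ranked = sorted(candidates,
                            key=lambda n: similarity(files[n], desc),
                            reverse=True)
            adv[desc] = ranked[:quota]             # most relevant files first
        else:
            adv[desc] = rng.sample(candidates, quota)  # empty interest: random pick
    return adv
```

A call such as `prepare_advertisements(demand, files, adv_volume=4)` returns, per interest, the names of the files whose metadata would be advertised.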



The data source forwards ADV(I) to direct neighbors located in the direction of peers

having information demand matching the interest I. The method proposed by LAR [80]

(described in chapter 2) is used to select neighbors according to their locations. The data

source peer ps also forwards ADV(I) to his/her direct neighbors having a high degree of

collaboration with him/her. A peer accepting the advertisement forwards the advertisement

in a similar fashion.

Example: Assume that min-tie is 0.8. In the MANET displayed in Figure 3-2, p1
advertises to p3, p4 and p5 since p3 is interested in the advertisement, p4 is located in the
direction of peers interested in the advertisement and p5 has a high degree of collaboration
with p1.

[Figure: a MANET of peers p1 to p9. Wireless communication links are labelled with the tie-strength α (files/day) of the collaborating peers, min-tie is 0.8, and arrows show the advertisement flow. p1 prepares ADV(I), the sharable files matching I ∈ Overall-Demand(P), where Overall-Demand = ⊕Σ Information-Demand(P); the peers p3, p6 and p7 hold interests Ii3, Ii6 and Ii7 similar to I.]

Figure 3-2: Advertisement Distribution by p1



As discussed in Definition 3.3, the peer p5 computes the tie-strength between p1 and each
of his neighbors by taking the average of his degrees of collaboration with p1 and with that
neighbor. Thus, Tie-Strength(p1, p9) = (0.9 + 0.6)/2 = 0.75 and Tie-Strength(p1, p8) =
(0.9 + 0.7)/2 = 0.8. As min-tie is 0.8, p5 forwards the advertisement of p1 to p8 but not to
p9.
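The forwarding decision of p5 can be reproduced with a short sketch. It covers only the collaboration criterion (the location-based selection borrowed from LAR is left out); the function names and the small tolerance used for the floating-point comparison are assumptions.

```python
def estimated_tie(tie_to_source, tie_to_neighbor):
    """Estimate of Tie-strength(source, neighbor) made by a forwarding peer."""
    return (tie_to_source + tie_to_neighbor) / 2


def neighbors_to_forward(tie_to_source, ties_to_neighbors, min_tie, eps=1e-9):
    """Neighbors whose estimated tie with the source reaches min-tie.

    eps absorbs floating-point rounding when the estimate equals min_tie.
    """
    return sorted(
        neighbor
        for neighbor, tie in ties_to_neighbors.items()
        if estimated_tie(tie_to_source, tie) >= min_tie - eps
    )


# p5 has tie 0.9 with the source p1, 0.7 with p8 and 0.6 with p9.
print(neighbors_to_forward(0.9, {"p8": 0.7, "p9": 0.6}, min_tie=0.8))  # ['p8']
```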

A peer uses advertisements to identify potential sources of interesting files. It can also
search for a file by distributing queries. A query q is resolved if there is a data-source having
an information provision matching q. Let S be an information provision; we say that S
matches the query q if and only if ∃ I ∈ S such that I ≈ q.

We propose to disseminate queries in the same way as advertisement dissemination. A

data-requester forwards a query to its neighbors located in the direction of a data-source

having an information provision matching with the query. The peers receiving the query

forward it to some of their neighbors in a similar way. The query can be forwarded up to a

fixed number of hops.

The discovery approach introduced in this section performs file discovery by combining

a push method (i.e., the distribution of advertisements) and a pull method (i.e., the

distribution of queries). Both methods are performed according to the interests of users. In

addition to the interests of users, the users’ patterns of collaboration are considered during

the advertisement dissemination.

3.4 Interest Identification

3.4.1 Interest Identification from Historical Data

Users can specify their interests themselves, reactively. In our scenario, Pascal

can state that he is interested in receiving jokes in Bus 37. The precise interests of users

can also be computed automatically from queries and advertisements.



An information demand of a peer (i.e., the interests of the peer to receive information)

can be identified from his/her historical queries. Let Q be the set of queries distributed by a

peer p in a MANET in a context c. We propose to identify Information-demand(p, c) from

Q using the following two steps.

Step 1: Decomposing queries

Queries in Q are classified into different groups by using Algorithm 3-2, which is

derived from the agglomerative hierarchical clustering approach [88]. We will use the

queries in Table 3-3 and their similarity in Table 3-4 to illustrate Algorithm 3-2.

Table 3-3: Examples of queries

Query Description

q1 {tree, bush, grass, sidewalk}

q2 {tree, bush, sidewalk}

q3 {tree, bush, grass, ground}

q4 {tree, bush, grass, sidewalk, rock}

q5 {tree, bush, flower, grass}

q6 {clear, sky, tree, bush, ground}

q7 {overcast, sky, tree, bush, grass, sidewalk}

q8 {tree, grass, sky}

q9 {tree, grass, clouds, sky}



Algorithm: Decomposition of queries
Input:  Q
        Q: a set of historical queries
Output: G
        G: a subset of the power set of Q such that queries in the same element of G are more similar than queries in other elements of G
Begin
    // initialize grouping
1.  G ← ∅
2.  For all q ∈ Q
3.      G ← G ∪ {{q}}
4.  End for
5.  Repeat
        // merge two similar sets of queries
6.      Gnew ← ∅
7.      While (G ≠ ∅)
8.          Qc ← randomly selected element of G
9.          G ← G − {Qc}
            /* search a set Qk such that every element in Qc is similar to every element in Qk */
10.         If (∃ Qk ∈ G such that ∀ qi ∈ Qc, ∀ qj ∈ Qk, qi ≈ qj && for any Qs ∈ G, one of the following properties holds)
                // there are dissimilar elements in Qc and Qs
                (A) ∃ qi ∈ Qc, ∃ qj ∈ Qs such that qi !≈ qj
                // or Qc is more similar to Qk than to Qs
                (B) [Σ(qi ∈ Qc) Σ(qj ∈ Qk) Similarity(qi, qj)] / (|Qc| * |Qk|) ≥ [Σ(qi ∈ Qc) Σ(qj ∈ Qs) Similarity(qi, qj)] / (|Qc| * |Qs|)
11.             G ← G − {Qk}
12.             Gnew ← Gnew ∪ {Qc ∪ Qk}
13.         Else
14.             Gnew ← Gnew ∪ {Qc}
15.         End if
16.     End while
17.     G ← Gnew
18.     // repeat the above computations until any two sets contain dissimilar queries
19. Until: ∀ Qc, Qk ∈ G, ∃ qi ∈ Qc, ∃ qj ∈ Qk such that qi !≈ qj
End Algorithm

Algorithm 3-2: classification of Description
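The decomposition can be sketched in Python as follows. This is one possible reading of Algorithm 3-2 rather than the thesis code: queries are keyword sets, groups are lists, the outer loop stops after a pass in which no merge happened (which implies that every pair of groups contains dissimilar queries), and the sample queries are hypothetical.

```python
import random


def similarity(d_i, d_j):
    """Lexical similarity of Definition 3.2 (0 when either set is empty)."""
    d_i, d_j = set(d_i), set(d_j)
    if not d_i or not d_j:
        return 0.0
    common = len(d_i & d_j)
    return common * (len(d_i) + len(d_j)) / (2 * len(d_i) * len(d_j))


def all_similar(group_a, group_b, acc_sim):
    """True if every query of group_a is similar to every query of group_b."""
    return all(similarity(a, b) >= acc_sim for a in group_a for b in group_b)


def average_similarity(group_a, group_b):
    """Mean pairwise similarity, the quantity compared in condition (B)."""
    total = sum(similarity(a, b) for a in group_a for b in group_b)
    return total / (len(group_a) * len(group_b))


def decompose(queries, acc_sim=0.5, rng=None):
    rng = rng or random.Random()
    groups = [[frozenset(q)] for q in queries]               # lines 1-4
    merged = True
    while merged:                                            # lines 5-19
        merged = False
        new_groups = []
        while groups:                                        # line 7
            qc = groups.pop(rng.randrange(len(groups)))      # lines 8-9
            candidates = [g for g in groups if all_similar(qc, g, acc_sim)]
            if candidates:                                   # line 10
                # Among fully similar groups, take the most similar one.
                qk = max(candidates, key=lambda g: average_similarity(qc, g))
                groups.remove(qk)                            # line 11
                new_groups.append(qc + qk)                   # line 12
                merged = True
            else:
                new_groups.append(qc)                        # line 14
        groups = new_groups                                  # line 17
    return groups


sample = [{"tree", "bush"}, {"tree", "bush", "grass"},
          {"sky", "cloud"}, {"sky", "cloud", "rain"}]
for group in decompose(sample, rng=random.Random(1)):
    print([sorted(q) for q in group])
```

Because the element Qc is picked at random, the order of the groups and, for less clear-cut inputs, the grouping itself can vary between runs.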



Table 3-4: Similarity values calculated by using the formula presented in Definition 3.2, where the similarity threshold is 0.5

      q1     q2     q3     q4     q5     q6     q7     q8     q9
q1    1      0.9    0.8    0.9    0.8    0.2    0.6    0.6    0.3
q2    0.9    1      0.6    0.8    0.6    0.3    0.5    0.3    0
q3    0.8    0.6    1      0.7    0.8    0.5    0.4    0.6    0.3
q4    0.90   0.8    0.7    1      0.7    0.2    0.6    0.53   0.2
q5    0.75   0.6    0.8    0.7    1      0.2    0.4    0.6    0.3
q6    0.20   0.3    0.5    0.2    0.2    1      0.6    0.3    0.2
q7    0.60   0.5    0.4    0.6    0.4    0.6    1      0.5    0.4
q8    0.60   0.3    0.6    0.5    0.6    0.3    0.5    1      0.6
q9    0.20   0      0.2    0.2    0.3    0.2    0.4    0.7    1

Algorithm 3-2 starts the decomposition process by forming sets of queries such that each

of the sets contains one query (lines 1-4) and places them in G. In our example, G = {{q1},

{q2}, {q3}, {q4}, {q5}, {q6} {q7}, {q8}, {q9}}.

As described in lines 10 to 12, the algorithm merges two sets. Qc is merged with Qk
∈ G if and only if (i) any two elements in the two sets are similar and (ii) if there is another
set Qs in G satisfying condition (i), Qc is more similar to Qk than to
Qs. The merging of sets is repeated until there are no similar sets in G (lines 5 to 19).

According to the example execution flows of the algorithm in Table 3-5, the queries in

our example are decomposed into G= {Q1, Q2, Q3} where Q1 = {q1, q2, q3, q4, q5} and Q2 =

{q6, q7} and Q3 = {q8, q9}.



Table 3-5: Example of execution flows during the decomposition of queries

Input Q={ q1, q2, q3, q4, q5, q6, q7, q8, q9}

Initialization G={{q1}, { q2}, {q3}, {q4}, {q5}, {q6} {q7}, {q8}, {q9}}

The first iteration G={{q1, q4}, { q2, q3}, {q5}, {q6, q7}, {q8, q9}}

The second iteration G={{q1, q4, q2, q3}, {q5}, {q6 ,q7}, {q8 ,q9}}

The third iteration G={{q1, q4, q2, q3, q5}, {q6, q7}, {q8, q9}}

Output G={{q1, q4, q2, q3, q5}, {q6, q7}, {q8, q9}}

Step 2: Identifying information demand

In this step, the notation Occurrence(Qc, k) is used to represent the number of queries in
Qc containing the keyword k, and the notation Keys(Qc) is used to represent the union of
the queries in Qc, i.e., Keys(Qc) = $\bigcup_{q \in Q_c} q$.

Let G = {Q1, Q2,…, Qn} be the result of step 1. Information-demand(p,c) ={I1, I2, …,In}

is computed from G = {Q1, Q2,…, Qn} such that the interest Ic has the following properties.

Weight(Ic) = |Qc| / |Q|
Description(Ic) is the set of the most popular keywords⁶ in the queries of the set Qc, i.e.:
- |Description(Ic)| = min[maxK, |Keys(Qc)|] where maxK is the maximum number of
keywords used to describe an interest;
- Description(Ic) ⊆ Keys(Qc);
- Occurrence(Qc, ki) ≥ Occurrence(Qc, kj) ∀ ki ∈ Description(Ic) and
∀ kj ∈ Keys(Qc) – Description(Ic)

Example: Assume that maxK is 3, consider the queries in Table 3-3 and their

decompositions in Table 3-5, the information demand consists of the interests displayed in

Table 3-6.

6 If two keywords have the same popularity, one of them is selected randomly.



Table 3-6: Interests produced from queries listed in Table 3-3.

Interest   Description            Weight
I1         {tree, bush, grass}    0.6 (≅ 5/9)
I2         {sky, trees, bush}     0.2 (≅ 2/9)
I3         {tree, grass, sky}     0.2 (≅ 2/9)

Interests with identical descriptions can be formed since we have fixed the maximum

number of keywords and this maximum number might be much less than the number of

keywords in the queries. If maxK were 2 in our example, the interests ({tree, bush}, 0.6), ({tree, bush}, 0.2) and ({sky, cloud}, 0.2) would have been produced.

According to Definition 3.5, the descriptions of two interests should not be identical. Thus,

the interests with the same descriptions should be combined together as follows. If there

are interests I1, …., Ik in Information-demand(p,c) having identical descriptions, they will

be replaced by an interest I having the following properties:

• Description(I) = Description(I1)

• $\text{Weight}(I) = \sum_{i=1}^{k} \text{Weight}(I_i)$

An interest I is removed from Information-demand(p, c) if Weight(I) < minW, the

minimum threshold introduced in Definition 3.5. Assuming that minW is 0.5, the second and the third interests are removed from the set of interests listed in Table 3-6.

Let sumNI be the sum of the weights of the interests in Information-demand(p, c). An

empty interest Ie is computed with weight 1-sumNI if 1-sumNI ≥ minW. If 1-sumNI <

minW but sumNI < 1, the weight of each interest I is normalized as:

$\text{Weight}(I) = \frac{\text{Weight}(I)}{\text{sumNI}}$

For our example, the information demand contains one interest with weight 0.6, so sumNI is also 0.6. Since 1 - sumNI = 0.4 is less than minW, no empty interest is added, and the normalization of this interest’s weight (0.6/0.6) makes its weight 1.
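The post-processing steps (combining identical descriptions, removing weak interests, then adding an empty interest or normalizing) can be sketched as below. The function name `finalize_demand` is hypothetical, the empty interest is represented by an empty keyword set, and the guard against sumNI = 0 is an assumption added to avoid a division by zero.

```python
def finalize_demand(interests, min_w):
    """interests: list of (description, weight) pairs, description being a set."""
    # 1. Combine interests with identical descriptions by summing their weights
    merged = {}
    for description, weight in interests:
        key = frozenset(description)
        merged[key] = merged.get(key, 0.0) + weight
    # 2. Remove interests whose weight is below minW
    kept = {d: w for d, w in merged.items() if w >= min_w}
    sum_ni = sum(kept.values())
    # 3. Add an empty interest if it is heavy enough, otherwise normalize
    if 1 - sum_ni >= min_w:
        kept[frozenset()] = 1 - sum_ni
    elif 0 < sum_ni < 1:
        kept = {d: w / sum_ni for d, w in kept.items()}
    return kept

# Interests of Table 3-6 with minW = 0.5
table_3_6 = [({"tree", "bush", "grass"}, 0.6),
             ({"sky", "trees", "bush"}, 0.2),
             ({"tree", "grass", "sky"}, 0.2)]
demand = finalize_demand(table_3_6, min_w=0.5)
```

With these inputs only the first interest survives the threshold and its weight is normalized to 1, which matches the worked example in the text.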



In summary, this section has discussed the identification of the information demands of a peer from historical queries. The information provisions of a peer can be determined from historical advertisements in a similar manner.

3.4.2 Interest Estimation via Association Rules

Identifying users’ interests by analyzing historical data forces a data-requester to wait

until enough queries are produced. However, we argue that users often show similar

interests in similar contexts. Therefore, we propose to use association rules to resolve the

mentioned problem.

An association rule has two parts: antecedence and consequence. The antecedence of a rule describes the condition that must be satisfied for the consequence of the rule to hold true. Association rules are attached with two important values: support and confidence. The support of a rule is the probability that a randomly selected element has the properties described by both its antecedence and its consequence. The confidence of a rule is the probability that an element has the property specified by the consequence of the rule given that the element matches the antecedence.

Association rules can be produced from historical data. A produced rule is rejected if its support or its confidence is below the pre-defined constants called minimum support and minimum confidence, respectively.

An association rule with respect to an information demand is written as:

<Context=c> ⇒ <Information-Demand=D>

As discussed in Definition 3.5, a context is represented as (L, [Ts,Tf]) where Ts is the start time, Tf is the end time and L is a location. For a sharing context c = (L, [Ts,Tf]), let us refer to Ts as Start-Time(c) and to Tf as End-Time(c). The antecedence <context=c> is equivalent to



• <Start-Time=Ts> and <End-Time=Tf> and <Location=L> when both time and

location are specified.

• <Location=L> when the time interval is ∅ but the location is specified

• <Start-Time=Ts> and <End-Time=Tf> when the location is not specified but

the time is specified

A temporal condition in the form of <Start-Time=Ts> can be mined by observing the patterns of the start-time contexts with the help of the method proposed by Sheng Ma et al. [89]. Spatial conditions (we call them location antecedents) in the form of <Location=value> can be produced by mining the location contexts with the help of the Apriori algorithm [90, 91].

Suppose a temporal condition <Start-Time=Ts> has already been identified, and assume that the contexts C = {c1, c2, c3, …, cn} match this condition. A time antecedent in the form of <Start-Time=Ts> and <End-Time=Tf> can then be obtained if

• End-Time(c) ≤ Tf, ∀c ∈ C and

• ∃ c ∈ C such that End-Time(c) = Tf
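Taken together, the two conditions say that Tf is the latest end time among the matching contexts. A minimal sketch, in which the function name `time_antecedent` and the sample end times are hypothetical:

```python
from datetime import time

def time_antecedent(start, end_times):
    """Return (Ts, Tf) for the contexts matching <Start-Time=Ts>."""
    # Tf bounds every End-Time(c) and is reached by at least one context,
    # so it is the maximum of the observed end times.
    return start, max(end_times)

ts, tf = time_antecedent(time(8, 0), [time(8, 7), time(8, 6), time(8, 5)])
```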

Let ant be an antecedence representing a context, and let Demands(ant,p) be the set of information demands of the peer p in the context described by ant; Demands(ant,p) is known from history. Let min-conf be the minimum confidence, and finally let SD ⊂ Demands(ant,p) be a set satisfying the following condition.

Di ≈ Dj, ∀Di , Dj ∈ SD.

If |SD|/|Demands(ant,p)| > min-conf, we derive the following rule.

ant ⇒ < Information-Demand=Ds>

where Ds is an information-demand computed as:

$D_s = \sum^{\oplus}_{D \in SD} D$



Illustration: In the scenario discussed in section 3.1, Pascal exchanges information in

Bus 27 with people working in a bank and in Bus 37 with friends. Assume that the

antecedent <context=(””, [8AM-8:10AM])> has been already identified and the

information demands of Pascal are listed in Table 3-7.

Table 3-7: Historical data of Pascal

Historical data   Start-Time   End-Time   Location context   Information-Demand
s1                8AM          8:07AM     Bus 27             D1
s2                8AM          8:06AM     Bus 27             D1
s3                8AM          8:05AM     Bus 37             D2
s4                8AM          8:07AM     Bus 27             D1
s5                8AM          8:05AM     Bus 37             D3
s6                8AM          8:07AM     Bus 37             D4

D1= {({finance},0.70), ({news},0.30)}, D2={({joke},1)}

D3 = {({photo},1)}, D4={({news},1)}

D1 is the information demand of Pascal in the historical data s1, s2, and s4. Therefore, the following rule is produced7 to indicate the information demand of Pascal at the context (””, [8AM-8:10AM]).

<context = (””, [8AM-8:10AM])> ⇒ <Information-Demand = D1>

Association rules can be modified in order to increase their confidence. For instance, in order to make the confidence of the above rule equal to 1, we can simply modify the antecedence to <context = (Bus 27, [8AM-8:10AM])>.
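The effect of refining the antecedence can be checked on the records of Table 3-7. The sketch below is a simplification: the similarity test between demands (≈) is reduced to equality of the demand labels, and the function name `confidence` is hypothetical.

```python
# Rows of Table 3-7 as (location, demand label). All records start at 8AM
# and end within [8AM, 8:10AM], so only the location distinguishes them.
history = [("Bus 27", "D1"), ("Bus 27", "D1"), ("Bus 37", "D2"),
           ("Bus 27", "D1"), ("Bus 37", "D3"), ("Bus 37", "D4")]

def confidence(history, demand, location=None):
    """Fraction of the records matching the antecedence that carry `demand`.

    location=None stands for an unspecified location context.
    """
    matching = [d for loc, d in history if location is None or loc == location]
    if not matching:
        return 0.0
    return sum(d == demand for d in matching) / len(matching)

conf_any_location = confidence(history, "D1")       # 3 of 6 records
conf_bus_27 = confidence(history, "D1", "Bus 27")   # 3 of 3 records
```

Restricting the location to Bus 27 raises the confidence from 0.5 to 1, at the price of a rule that covers fewer contexts.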

7 To produce a rule, we should compare the confidence and the support of a rule with predefined constants. For the sake of simplicity, we skip this step in the illustration.



In this section, we discuss mining of association rules with respect to the users’

information demands. The information provisions of users can be computed in the same

manner.

3.5 Social Networking

A requester peer can use association rules to identify his/her information demands. A

data source peer can produce association rules to identify information demand of requester

peers. However, rule-mining processes are too expensive to be used for every requester

peer encountered in a MANET. Therefore, a data-source should select the important peers for which association rules are produced. We propose that the social links of a data source be

used to identify the important peers.

As a data-source can have several social links, the reasoning required to identify the

interests of requester peers could be expensive. We propose organizing peers that have a

habit of sharing information with a data source peer into social groups according to the

similarity of their interests. The social groups are then used to identify the interests of the

peers.

Social networks, which include social groups and links, are computed semi-automatically based on the following assumption: “a social network exists between users who collaborate frequently with each other”. In the scenario discussed in section 3.1, Pascal, David and Carol exchange jokes whenever they meet in Bus 37. Such frequent collaboration indicates that there is a social link between them.

3.5.1 Social Link

A social link, denoted as L(pi,pj), is a relationship between two peers pi and pj. The

notation Valid-context(L(pi,pj)) is used to represent the set of sharing contexts in which the

social link L(pi,pj) is valid. In our scenario, Pascal exchanges information with his colleagues; assume that he communicates with them only at INSA. He therefore has social links to his colleagues, and (INSA, ∅) is the valid context of these social links.



As is done on social networking sites (e.g. Facebook), a user can manually specify

his/her social links. In the scenario, Pascal, David and Carol are friends. Assume that

Pascal specifies David and Carol as his friends; in this case, there will be a social link

between Pascal and David as well as between Pascal and Carol.

A social link can also be computed semi-automatically by analyzing the degree of

users’ collaboration in MANETs. We argue that the existence of a high degree of

collaborations between p1 and p2 indicates the existence of a social link between these two

peers. As discussed in section 3.3, the threshold of Tie-strength (i.e., min-tie) indicates a

high degree of collaboration between peers. As a result, a social link L(p1, p2) is formed if

and only if Tie-strength(p1, p2) ≥ min-tie. The context (“”,∅) is placed in Valid-

context(L(p1, p2)). Users can also specify the valid contexts of L(p1, p2).

In the scenario discussed in section 3.1, Pascal, Anne, Bob and Eve exchange

information in Bus 37. Suppose they did not specify the fact that they are friends. Their tie-strengths, measured in files/day, are listed in Table 3-8. Let min-tie be 0.1 files/day; we can conclude that there is a social link between Pascal and Eve, and that Anne has social links with all the other persons.

Table 3-8: Tie-Strengths between Pascal, Anne, Bob and Eve

         Pascal   Anne   Bob     Eve
Pascal   -        0.5    0.001   1
Anne     0.5      -      0.3     0.6
Bob      0.001    0.3    -       0.04
Eve      1        0.6    0.04    -
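The semi-automatic derivation of social links from Table 3-8 amounts to a threshold test on each pair of peers. A minimal sketch, in which the function name `social_links` and the pair-keyed dictionary are illustrative choices:

```python
from itertools import combinations

# Tie-strengths of Table 3-8 in files/day (the matrix is symmetric)
tie_strength = {
    frozenset({"Pascal", "Anne"}): 0.5,
    frozenset({"Pascal", "Bob"}): 0.001,
    frozenset({"Pascal", "Eve"}): 1.0,
    frozenset({"Anne", "Bob"}): 0.3,
    frozenset({"Anne", "Eve"}): 0.6,
    frozenset({"Bob", "Eve"}): 0.04,
}

def social_links(peers, tie_strength, min_tie):
    """Return the pairs of peers whose tie-strength reaches min_tie."""
    return {frozenset(pair) for pair in combinations(peers, 2)
            if tie_strength.get(frozenset(pair), 0.0) >= min_tie}

links = social_links(["Pascal", "Anne", "Bob", "Eve"], tie_strength, min_tie=0.1)
```

Storing each pair as a frozenset reflects the symmetry of the tie-strength matrix, so every pair is tested only once.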



3.5.2 Social Grouping

A social group, denoted as G(P,C,p), is a set of peers in P having similar links with a

peer p in a context c ∈ C; the peer p is called “observer peer” and C is a set of valid

contexts for the group. Demand-In-Group(G(P,C,p),c) denotes the common information

demand of peers in P as observed by p in a context c. The social group G(P,C,p) satisfies

the following properties:

• Information-Demand(pi,p,c) ≈ Information-Demand(pj,p,c) for pi, pj ∈ P and c ∈ C; and

• Demand-In-Group(G(P,C,p),c) is an empty-sharing interest if c ∉ C.

The common information demand of peers in the social group G(P,C,p) in the context c ∈ C is obtained by aggregating the information demands of peers in P, i.e.,

$\text{Demand-In-Group}(G(P,C,p), c) = \sum^{\oplus}_{p_i \in P} \text{Information-Demand}(L(p_i, p), c)$

We propose to compute a social-group as follows. Let P be a set of peers having social

links with a data-source pp. Let PG ⊂ P be a set of peers and let C be a set of contexts. The social group G(PG,C,pp) is formed if and only if:

• $C = \bigcap_{p_i \in P_G} \text{Valid-context}(L(p_i, p_p))$, where L(pi,pp) is a social link between the peers pi and pp; and

• Information-Demand(pj,pp,c) ≈ Information-Demand(pi,pp,c), ∀c ∈ C and ∀pi, pj ∈ PG

In the scenario introduced in section 3.1, Carol and David are interested in receiving

jokes in a bus. According to section 3.5.1, there are social links L(Pascal, Carol) and

L(Pascal, David). Therefore, the following group can be formed.

G({Carol, David}, {(Bus 37, [8AM-8:05AM])}, Pascal)



We propose to determine the users’ information demands by analyzing their social

groups. For a peer p and a context c, a data source pp computes the information demand of

the peer p by analyzing the social groups of p in the context c. GPrt(p,c), the set of groups

of p in the context c, is computed by pp as follows.

GPrt(p,c) = {G(P,C,pp) | p ∈ P and ∃ci ∈ C such that ci ≈ c}.

The context ci is similar to c if and only if the time context and the location context in ci

are similar to the respective contexts in c. The time context in ci is similar to the time

context in c if (1) the time context is not specified in c or (2) the time context in ci is

included in the time context in c. The location context in ci is similar to the corresponding

context in c if (1) the location context is not specified in c or (2) the location context in ci is

the same as, or an instance of, the location context in c.
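The similarity test between contexts can be sketched as a predicate. Several elements are assumptions of this sketch: the `Context` class, the encoding of times as minutes since midnight, the `INSTANCE_OF` table standing in for the "instance of" relation between locations, and the choice to treat an unspecified field in ci as not similar to a specified field in c (the text does not cover that case).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Context:
    location: Optional[str]               # None means "not specified"
    interval: Optional[Tuple[int, int]]   # (start, end) in minutes since midnight

# Hypothetical "instance of" relation between location contexts
INSTANCE_OF = {"Bus 27": "Bus", "Bus 37": "Bus"}

def similar(ci: Context, c: Context) -> bool:
    """True if the group context ci is similar to the requested context c."""
    time_ok = c.interval is None or (
        ci.interval is not None
        and c.interval[0] <= ci.interval[0]
        and ci.interval[1] <= c.interval[1])
    location_ok = c.location is None or (
        ci.location is not None
        and (ci.location == c.location
             or INSTANCE_OF.get(ci.location) == c.location))
    return time_ok and location_ok

group_ctx = Context("Bus 37", (8 * 60, 8 * 60 + 5))   # Bus 37, 8:00 to 8:05
```

For instance, `similar(group_ctx, Context("Bus", (480, 490)))` holds because Bus 37 is an instance of Bus and the interval 8:00 to 8:05 is included in 8:00 to 8:10.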

If GPrt(p, c) is not empty, pp can compute the information demand of the data-requester

p at context c as:

$\text{Information-Demand}(p, p_p, c) = \sum^{\oplus}_{G \in \text{GPrt}(p,c)} \text{Demand-In-Group}(G, c)$

In this section, we discussed the computation and the usage of social groups with respect to information demands. Social groups with respect to information provisions are computed and used in the same manner.

3.6 Discussion

Publish/subscribe systems such as the one proposed in [92] can be considered as interest-aware systems. A publish/subscribe system consists of publishers, subscribers and a delivery infrastructure (a sequence of brokers). In such a system, a publisher produces events; subscribers declare their interest in the events and the delivery infrastructure forwards the events from a publisher to the corresponding subscribers. However, in

MANETs, construction and modification of delivery infrastructures are expensive since

most of the involved computing devices are mobile, battery-powered and thin. Moreover,



publish/subscribe systems are not necessarily interest-aware. The interest-awareness of such a system depends on the kind of events that publishers produce.

Lindemann and Waldhorst [25] proposed replication of indices of popular queries at

several mobile devices in order to facilitate the localization of files for those queries. Even

if popular queries represent common interests of users, identifying them is time consuming

and difficult in MANETs where users may not be aware of other participants in advance.

In contrast to MANETs, the learning of users’ interests has been extensively studied in

Web search and mining [93, 94]. Click history [95] and browse history [96] have been

proposed to capture the users’ interests automatically.

Recommender systems such as Letizia [97] and Watson [98] make suggestions to users

based on inferences made about their interests gathered from the recently viewed Web

pages or the contents of active desktop applications.

StumbleUpon (stumbleupon.com) is a recommender system that uses collaborative

filtering (CF), which is an automated process combining human opinions with machine

learning of personal preferences, to create virtual communities of like-minded Web surfers.

Rating Web sites updates a personal profile (a blog-style record of rated sites) and

generates peer networks of Web surfers linked by a common interest. These social

networks coordinate the distribution of Web content, so that users “stumble upon” pages

explicitly recommended by friends and peers. However, recommendations from CF

systems typically require explicit action from a large community of users [99].

Recommendation engines of e-commerce sites (e.g. Amazon and Netflix) work by

exploiting the users’ interests, which are derived both from their explicit actions (e.g.

buying a product) and from their interaction log behavior (e.g. clicking on certain

categories of products). In the Web search area, the interests of users, which are derived

from interaction logs, can be used to create automated-Web-search-engine evaluation

facilities [100].



To identify the interests of users in a MANET, we were inspired by approaches that identify users’ interests from click history, browse history and interaction logs. Our interest identification approach differs from the discussed approaches in that (1) we identify interests according to the context of users, and (2) we use the social networks of users and association rules to facilitate the interest identification process.

3.7 Conclusion

In this chapter, we have proposed a comprehensive solution to discover files according to

the interests of users. We have introduced two types of interests: information demand and

information provision. The information demand is the set of interests of a user to receive

information; and the information provision is the set of interests of a user to provide

information. The users’ information demands are used to determine the files to be advertised and to guide the dissemination of advertisements, while information provisions are used to determine where and how queries are resolved. In this chapter, we have also proposed

approaches to (1) identify the users’ interests by analyzing their information sharing

activities and (2) to produce association rules that correlate the users’ interests with their

contexts. Finally, we have investigated how to use the users’ social networks to facilitate

the interest identification process.

Chapter 4 Lifetime Awareness



As discussed in the previous chapter, in a Mobile Ad-hoc NETwork (MANET),

information discovery is usually performed by distributing advertisements and queries. In

order not to load the environment with unnecessary traffic, an advertisement policy should

be designed according to the users’ information consumptions and provisions. Information

consumptions and provisions are limited by the users’ context and stay-time (i.e., the time

that they stay together). Consequently, these parameters are of primary importance for the

design of an efficient advertisement policy.

In this chapter, we define a concept called mobility class to parameterize advertisements

according to the mobility profile of a data source. A peer can manually specify mobility

classes. We also propose an approach to generate the mobility classes semi-automatically

according to the peers’ sharing habits. Finally, we discuss the identification of mobility

classes by using both the users’ mobility patterns and habit rules.

The research work of this chapter was presented in the 11th IEEE International

Conference on Mobile Data Management (MDM 2010) [101].

The chapter is organized as follows. We discuss the motivation behind the

parameterization of advertisements according to the users’ profile and their context in

section 4.1. Section 4.2 formalizes the concept mobility class; Section 4.3 deals with the

semi-automatic generation of mobility classes. Section 4.4 discusses the identification of

mobility classes. Finally, we conclude the chapter in section 4.5.



4.1 Overview

In MANETs, file discovery is usually conducted by using two approaches: pull and push. In the former, requester peers discover files by querying peers in the vicinity. In the latter, data-sources make others aware of their sharable files by distributing advertisements. As in most discovery protocols, information discovery in MANETs should be performed by using a hybrid of the two approaches.

An advertisement policy dictates the volume of information to be advertised, the period

after which another advertisement could be made and the number of hops that an

advertisement traverses. We argue that an appropriate advertisement policy must be

designed in such a way that the usage of the push approach is maximized without imposing unsustainable overhead.

An advertisement policy should be designed with respect to the information

consumptions and provisions of users, which are limited by the users’ stay-times and their

contexts. For instance, in the scenario presented in chapter 1, Pascal is involved in

information sharing in a bus, a stadium, a street, a shop and a restaurant. Assume that

Pascal stays from five to seven minutes in a bus. As passengers do not have time to download all the advertised files, it is not necessary to advertise all of Pascal’s sharable files in this kind of environment. Conversely, the stay-time of Pascal in a stadium is much longer than in a bus. Therefore, the volume of an advertisement in a stadium should be greater than in a bus.

Furthermore, the shorter the stay-time, the higher the dynamicity of the network. A MANET formed in a street is highly dynamic: every minute, a number of cars join and leave it, so the stay-time of peers is short. In order to make new peers aware of the sharable files, advertisements should be made more frequently in this kind of MANET than in others.



The peers’ stay-times also indicate the feasibility of the dissemination of advertisements8

with respect to the distance between peers. Assume that a data source peer and a target

peer are found at the two extreme ends of a big shop hall and the data source peer sends an

advertisement about a file to the target peer. First, the target might leave the shop before getting the advertisement. Second, even if the target receives the advertisement, the data source may leave the shop while the download is in progress, or even before receiving the download request.

We introduce, in this chapter, a concept called “mobility class” to describe categories of

MANETs with respect to the users’ stay-times and contexts. The same policy of

advertisement is applied in all MANETs of the same mobility class. In order to consider

the evolution of the users’ information needs and their capacity of information provision,

mobility classes are continuously revised and updated.

The contributions of this chapter are (1) the formalization of the concept of mobility

classes, (2) the generation of mobility classes semi-automatically and (3) the identification

of mobility classes.

4.2 Formalization

A MANET is defined as a collection of devices interconnected by wireless

communication technologies. At the application level, a MANET can be seen as a

collection of users (peers). Peers can have a direct connection or they can be connected

through other peers. They are called one-hop neighbors if they have a direct connection

and are multi-hop neighbors if they are connected with the help of other peers.

Every peer has a local view of a MANET. Assume that every passenger of a bus has a

computing device. There is always a MANET in the bus in its normal operation. However,

the MANET view for a particular user is bounded by the times at which he/she enters and leaves the bus. In addition to the time and location contexts, a MANET view is bounded by the peer’s awareness of the others in the environment. A peer is aware of his/her one-hop

8 Remember that we implement the dissemination of advertisements with respect to the interests of users (cf. chapter 3).



neighbors. He/she is aware of his/her multi-hop neighbors through different

communications that he/she has made with his/her neighbors.

Definition 4.1. MANET and MANET View: A MANET, denoted as V(P), is a set of

peers communicating via wireless communication technologies. A MANET-view9,

denoted as V(P,p0), is a projection of a MANET defined by using the local knowledge of a

peer p0. Thus, V(P,p0) is a collection of peers in a MANET V(P) that the peer p0 is aware

of, i.e.,

V(P,p0) = {p|p∈P such that p0 is aware of the existence of p}.

Definition 4.2. Connectivity-lifetime: For peers p1 and p2, let stay-time(p1, p2) denote the time that p1 and p2 are expected to stay connected together. Connectivity-lifetime(V(P,p0)) is defined as the average time that a peer p0 stays with the other peers in a MANET-view V(P,p0), i.e.,

$\text{Connectivity-lifetime}(V(P,p_0)) = \frac{\sum_{p \in V(P,p_0)} \text{stay-time}(p, p_0)}{|V(P,p_0)|}$

Definition 4.3. Adv-usage: Let us consider a MANET view V(P,p0) and a peer p0 ∈ P;

Adv-usage(p0) denotes the usage factor of advertisements made by p0 in V(P,p0). It

describes the number of files provided by p0 with respect to the number of files advertised

by the peer. It is computed as

Adv-usage(p0) =|DAF|/ (|Adv-population| *|AF|) where

AF: collection of files that have been advertised by p0. AF may contain duplicated

elements; a file appears twice in AF if it has been advertised twice.

DAF: collection of files that have been advertised and have been downloaded from

p0. By definition, all files in DAF are also members of AF. A file appears twice in

DAF, if it has been downloaded twice.

Adv-Population: peers in V(P,p0) that received the advertisement made by p0.

9 We use the terms view and MANET view interchangeably.



Definition 4.4. Sharing statistics: Let us consider a MANET view V(P,p0) and a peer p0

∈ P; a sharing statistics, denoted as s(p0,c), describes the quantitative behavior of peers in

V(P,p0) in the actual sharing context c as observed by p0. A sharing statistics s(p0,c) is

composed of the following attributes.

Hop(s(p0,c)): average distance that exists between the peer p0 and the other peers ,

Files-provisioned(s(p0,c)): average number of files provided by p0 to a peer,

Queries-received(s(p0,c)): average number of queries received by p0,

Usage-factor(s(p0,c)): Adv-usage(p0) described in Definition 4.3 and

Co-lifetime(s(p0,c)): connectivity-lifetime(V(P,p0)) described in Definition 4.2.
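Two of the attributes above, the connectivity lifetime (Definition 4.2) and the usage factor (Definition 4.3), can be computed as in the following sketch. The function names, peer identifiers, file names and values are all hypothetical.

```python
def connectivity_lifetime(stay_times):
    """Average stay-time between p0 and the peers of its MANET view.

    stay_times maps each peer of the view to stay-time(p, p0).
    """
    return sum(stay_times.values()) / len(stay_times)

def adv_usage(advertised, downloaded, adv_population):
    """|DAF| / (|Adv-population| * |AF|); AF and DAF may contain duplicates."""
    return len(downloaded) / (len(adv_population) * len(advertised))

# p0 expects to stay 4, 6 and 5 minutes with three peers
lifetime = connectivity_lifetime({"p1": 4, "p2": 6, "p3": 5})
# Two files advertised to three peers, three downloads in total
usage = adv_usage(["a.mp3", "b.jpg"], ["a.mp3", "a.mp3", "b.jpg"],
                  {"p1", "p2", "p3"})
```

Lists are used for AF and DAF because both collections keep duplicates: a file that is advertised or downloaded twice is counted twice.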

Definition 4.5. Mobility Class: A mobility-class m(p,c) is a structure used by a data-

source peer p to describe a class of MANET-views according to their connectivity-lifetimes

in the abstract sharing context c in such a way that the same advertisement policy is

applied in the MANETs described by the same mobility classes. More precisely, a

mobility-class m(p,c) is characterised by the abstract context c and a range of connectivity lifetimes, which is referred to as range-lifetimes(m(p,c)). The MANET views described by the mobility class m(p,c) should satisfy the following properties:

• The contexts of the MANET views are actual sharing contexts of the abstract context c.

• Their connectivity-lifetime lies in range-lifetimes(m(p,c)). If range-lifetimes(m(p,c)) is [tmn,tmx), the connectivity lifetime of a MANET-view described by m(p,c) is greater than or equal to tmn and less than tmx.

The advertisement policy attached to the mobility class is denoted as adv-policy(m(p,c)); it is the advertisement policy applied in the MANET-views described by m(p,c). The advertisement policy is composed of three attributes: Adv-volume(m(p,c)), the maximum volume of an advertisement; Adv-radius(m(p,c)), the maximum number of hops that an advertisement traverses; and Adv-period(m(p,c)), the time after which an advertisement should be repeated.



Two efficiency measures, Satisfactory-Factor(m(p,c)) and Overload-Factor(m(p,c)), are attached to the mobility class. Satisfactory-Factor(m(p,c)) is the

minimum usage factor of advertisement that a MANET view described by m(p,c) is

targeted to achieve. Overload-Factor(m(p,c)) is the targeted maximum number of queries

received by p in a MANET view described by m(p,c).

Example: Assume that there is a mobility class with the context, the range-lifetimes and

the advertisement policy displayed in Table 4-1. This mobility class describes that a peer

can advertise a maximum of two files to its one-hop neighbors every minute in a

MANET view formed in a bus from 8 AM to 10 AM and having a connectivity lifetime in

the range of [0,7) minutes.

Table 4-1: Example of a mobility class

Context                Range-lifetimes (minutes)   Adv-volume   Adv-radius   Adv-period (minutes)
(Bus, [8 AM, 10 AM])   [0, 7)                      2            1            1

An inactive mobility class describes a class of MANET views in which nothing is advertised. Thus, for an inactive mobility class m(p,c), Adv-volume(m(p,c)) = 0 and information discovery is performed by using the pull approach (i.e., query distribution) in the MANET-views described by this mobility class.
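A mobility class can be represented as a small record, as sketched below. The class and field names are hypothetical, the abstract context is kept as a plain label, and only the range-lifetimes test is implemented (matching an actual context against the abstract one is left out).

```python
from dataclasses import dataclass

@dataclass
class MobilityClass:
    context: str          # abstract sharing context, kept as a label here
    min_lifetime: float   # tmn, inclusive (minutes)
    max_lifetime: float   # tmx, exclusive (minutes)
    adv_volume: float     # maximum number of files per advertisement
    adv_radius: int       # maximum number of hops
    adv_period: float     # minutes between two advertisements

    def describes(self, connectivity_lifetime: float) -> bool:
        """True if the lifetime of a view lies in [tmn, tmx)."""
        return self.min_lifetime <= connectivity_lifetime < self.max_lifetime

    @property
    def inactive(self) -> bool:
        """An inactive class advertises nothing (pull-only discovery)."""
        return self.adv_volume == 0

# The mobility class of Table 4-1
bus_morning = MobilityClass("(Bus, [8 AM, 10 AM])", 0, 7,
                            adv_volume=2, adv_radius=1, adv_period=1)
```

A view with a connectivity lifetime of 5 minutes is described by this class, whereas a lifetime of exactly 7 minutes is not, because the upper bound of the range is exclusive.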

Definition 4.6. Sharing statistics of a mobility class: Let us consider a peer p and a

mobility class m(p,c); the sharing statistics of a mobility class m(p,c), denoted as S(m(p,c)),

is defined as the sharing statistics of p in the MANET-views that are described by the

mobility class m(p,c). More precisely, for a mobility-class m(p,c), S(m(p,c)) is the set of

sharing statistics s(p,ca) such that Co-lifetime(s(p,ca))∈ range-lifetimes(m(p,c)) and ca is an

actual sharing context of the abstract context c. S(m(p,c)) is described by the following

attributes.



Queries-received (S(m(p,c))) is the average number of queries received by the peer p

in the MANET-views described by m(p,c).

Usage-factor (S(m(p,c))) is the average usage factor of advertisements made by the

peer p in the MANET-views described by m(p,c).

4.3 Mobility Class Generation

4.3.1 Operation

Operation 4.1 Adv-characteristic-computation(m(p,c)): The operation is used to

compute the advertisement policy, the satisfaction factor and the overload-factor of a

mobility class m(p,c) by using the sharing statistics of m(p,c).

To avoid overloading the environment with advertisements, the volume of an advertisement is set less than or equal to the minimum number of queries seen in the history. The number of hops that an advertisement traverses is set to the minimum number of hops seen in the history. For the period, the average of the connectivity-lifetimes attached to the sharing statistics is taken. More precisely, the advertisement policy of the mobility class m(p,c) is computed as:

• Adv-volume(m(p,c)) = minimum(av-FP, min-QR),

• Adv-radius(m(p,c))= min-hops, and

• Adv-period(m(p,c)) = avg-cnt

where

• av-FP: average value of the “files-provisioned” attribute in S(m(p,c)).

• min-QR: minimum value of the “queries-received” attribute of the sharing

statistics in S(m(p,c)).

• min-hops: minimum hops in S(m(p,c)).

• avg-cnt: average connectivity lifetimes in S(m(p,c)).

Let’s observe the sharing statistics in Table 4-2; if S(m(p,c)) contains s1, s2, s3 and s4,

• Adv-volume(m(p,c)) = minimum( (2+3+4+5)/4 , 2)=2



• Adv-radius(m(p,c)) = 1 and

• Adv-period(m(p,c)) = (1+2+3.5+3)/4 ≈ 2.4

In the same way, we obtain the mobility classes listed in Table 4-3.

Table 4-2: Examples of sharing statistics

No   Co-lifetime (minutes)   Hops   Files-provided   Queries-received   Usage factor
s1   1                       1      2                3                  0.6
s2   2                       1      3                2                  0.3
s3   3.5                     2      4                10                 0.2
s4   3                       2      5                12                 0.6
s5   4                       2      5                20                 0.8
s6   5                       2      10               15                 0.7
s7   7                       2      15               15                 0.7
s8   9                       2      20               25                 0.6

Table 4-3: Range-lifetimes and advertisement volumes of classes

Mobility class Range of lifetimes Adv-volume Hops Period

m1 [0, 4) 2 1 2.4

m2 [4 , 6) 7.5 2 4.5

m3 [6 ,∞) 15 2 8
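The values of Table 4-3 can be reproduced from the sharing statistics of Table 4-2 with Operation 4.1. The sketch below uses a hypothetical function name, `adv_policy`, and encodes each row as a tuple.

```python
# Table 4-2 rows: (co-lifetime, hops, files-provided, queries-received, usage)
STATS = {
    "s1": (1, 1, 2, 3, 0.6), "s2": (2, 1, 3, 2, 0.3),
    "s3": (3.5, 2, 4, 10, 0.2), "s4": (3, 2, 5, 12, 0.6),
    "s5": (4, 2, 5, 20, 0.8), "s6": (5, 2, 10, 15, 0.7),
    "s7": (7, 2, 15, 15, 0.7), "s8": (9, 2, 20, 25, 0.6),
}

def adv_policy(low, high):
    """Return (Adv-volume, Adv-radius, Adv-period) for the range [low, high)."""
    rows = [s for s in STATS.values() if low <= s[0] < high]
    av_fp = sum(s[2] for s in rows) / len(rows)     # average files provided
    min_qr = min(s[3] for s in rows)                # minimum queries received
    min_hops = min(s[1] for s in rows)              # minimum hops
    avg_cnt = sum(s[0] for s in rows) / len(rows)   # average connectivity lifetime
    return min(av_fp, min_qr), min_hops, avg_cnt

m1 = adv_policy(0, 4)
m2 = adv_policy(4, 6)
m3 = adv_policy(6, float("inf"))
```

For m1 the period evaluates to 2.375 minutes, which Table 4-3 reports rounded to 2.4.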

We propose to compute Satisfactory-factor(m(p,c)) and Overload-Factor(m(p,c))

according to the following objectives.

1. At least half of the advertised files should be downloaded and

2. The number of the received queries in the mobility-class should be less than the

average number of queries that has been seen in the history.

Thus, the satisfaction and the overloading factors of a mobility class m(p,c) are assigned

as:



• Satisfactory-Factor(m(p,c)) = 0.5

• Overload-Factor(m(p,c)) = avg-QR, where avg-QR is the average value of the attribute queries-received in S(m(p,c)).

The satisfaction-factor and the overload-factor of a mobility class can also be changed

manually.

Operation 4.2. Efficient (m(p,c)): This operation is used to evaluate the efficiency of a

mobility class from its sharing statistics observed in the history. Thus, the efficiency of a

mobility class is computed if and only if S(m(p,c)) ≠ ∅. A mobility class is efficient, i.e., Efficient(m(p,c)) is true, if the following properties hold.

• The mobility class is not an inactive class, i.e., Adv-volume(m(p,c)) > 0.

• The average number of queries received in the history does not exceed the targeted number of queries for the mobility class, i.e., Queries-received(S(m(p,c))) ≤ Overload-Factor(m(p,c)).

• The average usage factor in the history is at least what has been targeted for the mobility class, i.e., Usage-factor(S(m(p,c))) ≥ Satisfactory-Factor(m(p,c)).
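The three conditions translate directly into a predicate. In the sketch below, the function name `efficient` is hypothetical; the statistics are those of Table 4-2 for the classes m1 and m2 of Table 4-3, with the satisfaction factor set to 0.5 and the overload factor set to the average number of queries received, as proposed above.

```python
def efficient(adv_volume, stats, satisfactory_factor, overload_factor):
    """stats: list of (queries_received, usage_factor) pairs of the class."""
    if not stats:
        raise ValueError("efficiency is only defined when S(m(p,c)) is not empty")
    avg_queries = sum(q for q, _ in stats) / len(stats)
    avg_usage = sum(u for _, u in stats) / len(stats)
    return (adv_volume > 0
            and avg_queries <= overload_factor
            and avg_usage >= satisfactory_factor)

m1_stats = [(3, 0.6), (2, 0.3), (10, 0.2), (12, 0.6)]   # s1 to s4
m2_stats = [(20, 0.8), (15, 0.7)]                       # s5 and s6
m1_ok = efficient(2, m1_stats, 0.5, overload_factor=6.75)
m2_ok = efficient(7.5, m2_stats, 0.5, overload_factor=17.5)
```

Under these settings m1 is not efficient, because its average usage factor (0.425) is below 0.5, whereas m2 is efficient.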

Operation 4.3. Consecutive(mi(p,c),mj(p,c)): Two mobility classes are consecutive if and only if the upper value in the range-lifetimes of one of the mobility classes is the lower value in the range-lifetimes of the other. Thus, Consecutive(mi(p,c),mj(p,c)) is true if one of the following conditions is satisfied.

• $t_2^i = t_1^j$, or

• $t_2^j = t_1^i$

where

• $[t_1^i, t_2^i)$ is range-lifetimes(mi(p,c))

• $[t_1^j, t_2^j)$ is range-lifetimes(mj(p,c)).

Operation 4.4. Included(mi(ps1,c),mj(ps2,c)): For mobility classes mi(ps1,c) and mj(ps2,c) in a context c, let the range-lifetimes of mi(ps1,c) and mj(ps2,c) be $[t_1^i, t_2^i)$ and $[t_1^j, t_2^j)$ respectively. The mobility class mi(ps1,c) is said to be included in the mobility class mj(ps2,c) if and only if its range-lifetimes lies inside that of mj(ps2,c), i.e.,

• $t_1^j \le t_1^i$, and

• $t_2^i \le t_2^j$.
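The predicates Consecutive and Included only compare interval bounds, so they can be sketched as follows. The representation of a range-lifetimes as a `(t1, t2)` pair standing for the half-open interval [t1, t2) is an assumption of this sketch, and Included is implemented as plain interval containment.

```python
INF = float("inf")


def consecutive(ri, rj):
    """True when the upper bound of one range is the lower bound of the other."""
    return ri[1] == rj[0] or rj[1] == ri[0]


def included(ri, rj):
    """True when the interval ri = [t1i, t2i) lies inside rj = [t1j, t2j)."""
    return rj[0] <= ri[0] and ri[1] <= rj[1]


# Range-lifetimes of the classes of Table 4-3
m1, m2, m3 = (0, 4), (4, 6), (6, INF)

print(consecutive(m1, m2))      # True:  [0, 4) and [4, 6) share the bound 4
print(consecutive(m1, m3))      # False
print(included(m2, (0, 6)))     # True:  [4, 6) lies inside [0, 6)
```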

Operation 4.5. Merge(mi(p,c),mj(p,c)): The merge operation can only be applied to two consecutive mobility classes mi(p,c) and mj(p,c). Let the range-lifetimes of the mobility class mi(p,c) be $[t_1^i, t_2^i)$ and the range-lifetimes of the mobility class mj(p,c) be $[t_1^j, t_2^j)$. The operation Merge(mi(p,c),mj(p,c)) produces a new mobility class m(p,c) such that m(p,c) has all the properties of mj(p,c) except its range-lifetimes and sharing statistics. The range-lifetimes of m(p,c) is calculated as:

\[
\text{Range-lifetimes}(m(p,c)) =
\begin{cases}
[t_1^i,\, t_2^j) & \text{if } t_2^i = t_1^j \\
[t_1^j,\, t_2^i) & \text{otherwise}
\end{cases}
\]

The sharing statistics attached to the mobility class are re-initialized, i.e., S(m(p,c)) = ∅.

The objective of applying the operation Merge is to enhance the efficiency of a non-

efficient mobility class by using the advertisement policy of an efficient mobility class.

Example: Consider the mobility classes in Table 4-3. Table 4-4 shows the merging of the mobility class m2 with the mobility class m1 and vice versa.

Table 4-4: Merging of mobility classes

Merging      Range-lifetimes   Adv-volume   Adv-radius   Adv-period
m2 with m1   [0, 6)            2            1            2.4
m1 with m2   [0, 6)            7.5          2            4.5
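The merge operation can be sketched as below, under the assumption that a mobility class is a small record holding its range bounds and its advertisement policy (the `MobilityClass` dataclass and its field names are hypothetical). The result keeps the policy of the second argument, takes the union of the two ranges, and starts with an empty history.

```python
from dataclasses import dataclass, field, replace


@dataclass
class MobilityClass:
    t1: float                 # lower bound of range-lifetimes (inclusive)
    t2: float                 # upper bound of range-lifetimes (exclusive)
    adv_volume: float
    adv_radius: int
    adv_period: float
    stats: list = field(default_factory=list)   # S(m(p,c))


def merge(mi, mj):
    """Merge(mi, mj): policy of mj, union of both ranges, empty statistics."""
    if mi.t2 == mj.t1:
        t1, t2 = mi.t1, mj.t2
    elif mj.t2 == mi.t1:
        t1, t2 = mj.t1, mi.t2
    else:
        raise ValueError("only consecutive mobility classes can be merged")
    return replace(mj, t1=t1, t2=t2, stats=[])


m1 = MobilityClass(0, 4, adv_volume=2, adv_radius=1, adv_period=2.4)
m2 = MobilityClass(4, 6, adv_volume=7.5, adv_radius=2, adv_period=4.5)

print(merge(m2, m1))   # range [0, 6) with the policy of m1 (2, 1, 2.4)
print(merge(m1, m2))   # range [0, 6) with the policy of m2 (7.5, 2, 4.5)
```

The two calls reproduce the two rows of Table 4-4.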

Operation 4.6. Copy-adv(mi(p1,c), mj(p2,c)): For mobility classes mi(p1,c) and mj(p2,c), Copy-adv(mi(p1,c),mj(p2,c)) produces a mobility class m(p1,c) having all the properties of mi(p1,c) except the advertisement policy and the sharing statistics. The advertisement policy of the new mobility class is the same as the advertisement policy of mj(p2,c), i.e., Adv-policy(m(p1,c)) = Adv-policy(mj(p2,c)). The sharing statistics attached to the mobility class are re-initialized, i.e., S(m(p1,c)) = ∅.

The objective of the operation Copy-adv¹⁰ is to enhance a mobility class prepared for a peer p1 in a context c by using a mobility class prepared for a peer p2 such that p1 and p2 have similar information provisions in the context c.

Operation 4.7. Divide(m(p,c)): The operation Divide returns nothing (i.e., ∅) if the mobility class m(p,c) is not divisible. The mobility class m(p,c) is said to be divisible if there are at least two sharing statistics in S(m(p,c)) having different connectivity lifetimes, i.e.,

∃s1, s2 ∈ S(m(p,c)) such that Co-lifetime(s1) ≠ Co-lifetime(s2)

Let us define the notation unique(S) before describing how the division is performed. For a set of sharing statistics S, unique(S) denotes a subset of S that holds exactly one sharing statistic per distinct connectivity lifetime, i.e.,

• ∀s1 ∈ S, ∃s2 ∈ unique(S) such that Co-lifetime(s1) = Co-lifetime(s2), and

• Co-lifetime(s1) ≠ Co-lifetime(s2) ∀s1, s2 ∈ unique(S) with s1 ≠ s2.

For a divisible mobility class m(p,c) with range-lifetimes [t1,t2), the operation Divide(m(p,c)) produces two mobility classes m1(p,c) and m2(p,c). Their range-lifetimes are [t1, tmd) and [tmd, t2), where tmd is the median connectivity lifetime in unique(S(m(p,c))). The sharing statistics attached to the two mobility classes satisfy the following properties:

• Co-lifetime(si) < Co-lifetime(sj) ∀si ∈ S(m1(p,c)) and ∀sj ∈ S(m2(p,c)).

• $|\text{unique}(S(m_k(p,c)))| = \frac{|\text{unique}(S(m(p,c)))|}{2} \pm 1$ and $|S(m_k(p,c))| > 1$ for k = 1 or 2.

• Co-lifetime(s) ≥ tmd ∀s ∈ S(m2(p,c)) and Co-lifetime(s) < tmd ∀s ∈ S(m1(p,c)).

• S(m1(p,c)) ∪ S(m2(p,c)) = S(m(p,c)).
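A minimal sketch of the division step is given below. Two points are assumptions of this sketch and not prescriptions of the operation: sharing statistics are reduced to `(co_lifetime, label)` pairs, and the median of an even number of distinct lifetimes is taken as the upper median (`statistics.median_high`), so that tmd is always an observed lifetime and both halves are non-empty.

```python
from statistics import median_high


def divide(t1, t2, stats):
    """Split a class with range [t1, t2) at the median distinct co-lifetime.

    stats is a list of (co_lifetime, label) pairs. Returns None when the
    class is not divisible, else ((range1, stats1), (range2, stats2)).
    """
    unique_lifetimes = sorted({lifetime for lifetime, _ in stats})
    if len(unique_lifetimes) < 2:
        return None                       # all statistics share one lifetime
    tmd = median_high(unique_lifetimes)
    low = [s for s in stats if s[0] < tmd]
    high = [s for s in stats if s[0] >= tmd]
    return ((t1, tmd), low), ((tmd, t2), high)


# Statistics s1..s4, which fall in the class m1 = [0, 4) of Table 4-3
history = [(1, "s1"), (2, "s2"), (3.5, "s3"), (3, "s4")]
first, second = divide(0, 4, history)
print(first)    # ((0, 3), [(1, 's1'), (2, 's2')])
print(second)   # ((3, 4), [(3.5, 's3'), (3, 's4')])
```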

The advertisement policies, the satisfactory factors and the overload factors of the resulting mobility classes are computed by using Operation 4.1.

¹⁰ For the sake of simplicity, we consider mobility classes defined for the same context. However, it is simple to extend the operation to consider similar contexts.

4.3.2 Computation

We propose a method to generate mobility-classes for a data source peer ps in a given

context according to the following objective. Mobility classes should be formed in such a

way that (1) the network traffic created by advertisements is manageable, (2) the discovery

of information is facilitated, and (3) the number of queries to be distributed is reduced.

Assume that a peer ps has just started sharing information in a sharing-context c. If there

is a peer pi such that the information provisions of ps and pi are similar, ps can use the

characteristics of the mobility classes of pi to define its own mobility classes. Otherwise, a

mobility-class m(ps,c) with range-lifetimes(m(ps,c))= [0, ∞) and adv-volume(m(ps,c))=0

will be used as the only mobility class. In this case, data-requesters discover the sharable

files of ps by using the pull discovery approach in the context c.

We propose to enhance the efficiency of mobility classes by using three types of

heuristics: optimistic, pessimistic and neutral. The optimistic heuristics modifies mobility

classes based on the assumption: “inefficiency occurs since (1) the advertisement volume is

too limited to include the files needed by users or/and (2) the advertisement radius is too

short to reach the potential users.” The pessimistic heuristics performs modification based

on the assumption: “the inefficiency of the mobility class occurs due to an over estimation

of the advertisement volume.” The neutral heuristics does not make any assumption but

tries to enhance the efficiency of the mobility classes by merging/dividing them as well as

by using the behavior of the mobility classes computed by similar peers.

The optimistic heuristics increases the volume of advertisement while the pessimistic

heuristics reduces the volume of advertisement. Let [t1, t2) be the range-lifetimes of a

mobility-class m(ps,c). Let ∆ and α be pre-defined incremental factors of the advertisement

volume and of the period respectively. Let β be the highest number of hops that an



advertisement can traverse. The pseudo-codes in Figure 4-1 and Figure 4-2 are used to

augment and reduce the total number of metadata distributed in a mobility-class m(ps,c).

1. if adv-volume(m(ps,c)) + ∆ ≤ Overload-Factor(m(ps,c)) then adv-volume(m(ps,c)) += ∆

2. if adv-radius(m(ps,c)) < β then adv-radius(m(ps,c))++

3. if adv-period(m(ps,c)) > α then adv-period(m(ps,c)) -= α

Figure 4-1. Augment-Volume

1. if adv-volume(m(ps,c)) ≤ ∆ && adv-radius(m(ps,c)) ≤ 1 && adv-period(m(ps,c)) + α ≥ (t2 - t1) then adv-volume(m(ps,c)) = 0

2. if adv-volume(m(ps,c)) > ∆ then adv-volume(m(ps,c)) -= ∆

3. if adv-radius(m(ps,c)) > 1 then adv-radius(m(ps,c))--

4. if adv-period(m(ps,c)) + α < (t2 - t1) then adv-period(m(ps,c)) += α

Figure 4-2. Reduce-Volume
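The two figures translate almost line by line into Python. The sketch below keeps the same order of tests; the `Policy` record and the parameter values used in the example (∆ = 2, α = 0.5, β = 3, overload factor 17.5) are illustrative assumptions, while the starting policy (7.5, 2, 4.5) is that of m2 in Table 4-3.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    adv_volume: float
    adv_radius: int
    adv_period: float


def augment_volume(p, overload_factor, delta, alpha, beta):
    """Figure 4-1: advertise more files, farther and more often."""
    if p.adv_volume + delta <= overload_factor:
        p.adv_volume += delta
    if p.adv_radius < beta:
        p.adv_radius += 1
    if p.adv_period > alpha:
        p.adv_period -= alpha


def reduce_volume(p, t1, t2, delta, alpha):
    """Figure 4-2: advertise fewer files, closer and less often."""
    if (p.adv_volume <= delta and p.adv_radius <= 1
            and p.adv_period + alpha >= t2 - t1):
        p.adv_volume = 0                  # the class becomes inactive
    if p.adv_volume > delta:
        p.adv_volume -= delta
    if p.adv_radius > 1:
        p.adv_radius -= 1
    if p.adv_period + alpha < t2 - t1:
        p.adv_period += alpha


grown = Policy(7.5, 2, 4.5)
augment_volume(grown, overload_factor=17.5, delta=2, alpha=0.5, beta=3)
print(grown)     # Policy(adv_volume=9.5, adv_radius=3, adv_period=4.0)

shrunk = Policy(7.5, 2, 4.5)
reduce_volume(shrunk, t1=4, t2=6, delta=2, alpha=0.5)
print(shrunk)    # Policy(adv_volume=5.5, adv_radius=1, adv_period=4.5)
```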

In Algorithm 4-1, we use the neutral heuristic as long as possible. However, the neutral

approach cannot be applied if the operations “Merge”, “Copy-adv” and “Divide” cannot be

performed. In this case, we propose the application of the optimistic heuristic.


Algorithm: Mobility Class Computation
Input: ps, M(p,c) for all p ∈ PA ∪ {ps}, Inf-Pr(p) for all p ∈ PA ∪ {ps}
    ps: data source peer under consideration
    M(p,c): mobility classes of the peer p in the sharing context c
    PA: set of peers that have an information provision similar to that of ps in the context c and a high degree of collaboration with ps
    Inf-Pr(p): denotes Information-Provision(p)
Output: M(ps,c)
Begin
  For each mi(ps,c) ∈ M(ps,c) such that S(mi(ps,c)) ≠ ∅
    If (Efficient(mi(ps,c)))
      #opt-mod(mi(ps,c)) ← 0
    Else
      // Merge: neutral heuristic
      Case 1: (∃ mj(ps,c) ∈ M(ps,c) | Efficient(mj(ps,c)) && Consecutive(mi(ps,c), mj(ps,c))) &&
              (If ∃ mk(ps,c) ∈ M(ps,c) | Consecutive(mi(ps,c), mk(ps,c)) && Efficient(mk(ps,c)) then Usage-Factor(mj(ps,c)) ≥ Usage-Factor(mk(ps,c)))
        M(ps,c) ← (M(ps,c) - {mi(ps,c), mj(ps,c)}) ∪ {Merge(mi(ps,c), mj(ps,c))}
        #opt-mod(m(ps,c)) ← #opt-mod(mi(ps,c))
        Break
      // Copy-adv: neutral heuristic
      Case 2: (∃ p ∈ PA && ∃ mj(p,c) ∈ M(p,c) | Included(mi(ps,c), mj(p,c)) && Efficient(mj(p,c))) &&
              (If ∃ p’ ∈ PA and ∃ mk(p’,c) ∈ M(p’,c) | Included(mi(ps,c), mk(p’,c)) && Efficient(mk(p’,c)) then Similarity(Inf-Pr(ps), Inf-Pr(p)) ≥ Similarity(Inf-Pr(ps), Inf-Pr(p’)))
        M(ps,c) ← (M(ps,c) - {mi(ps,c)}) ∪ {Copy-adv(mi(ps,c), mj(p,c))}
        #opt-mod(m(ps,c)) ← #opt-mod(mi(ps,c))
        Break
      // Divide: neutral heuristic
      Case 3: Efficient(m1(ps,c)) || Efficient(m2(ps,c)) for m1(ps,c), m2(ps,c) ∈ Divide(mi(ps,c))
        M(ps,c) ← (M(ps,c) - {mi(ps,c)}) ∪ {m1(ps,c), m2(ps,c)}
        For j = 1 to 2
          #opt-mod(mj(ps,c)) ← #opt-mod(mi(ps,c))
        End For
        Break
      // Augment: optimistic heuristic
      Case 4: #opt-mod(mi(ps,c)) < opt-limit
        Augment-Volume(mi(ps,c))
        #opt-mod(mi(ps,c))++
        Break
      // Reduce: pessimistic heuristic
      Case 5: Usage-Factor(mi(ps,c)) < Satisfactory-Factor(mi(ps,c))
        Reduce-Volume(mi(ps,c))
      End Case
  End For
End Algorithm

Algorithm 4-1: Mobility Class Computing


Let #opt-mod(m(ps,c)) be the number of consecutive optimistic modifications made on a mobility class m(ps,c) and opt-limit be the maximum number of times that the optimistic heuristic can be applied consecutively. The optimistic heuristic is said to have failed if #opt-mod(m(ps,c)) is equal to opt-limit. The pessimistic heuristic is applied when the optimistic heuristic has failed.

Algorithm 4-1 is used to enhance the efficiency of the mobility classes of ps in a context c. The algorithm accepts as inputs the mobility classes of the peer and of the peers that have information provisions similar to that of ps and a high degree of collaboration with it. The algorithm processes only the mobility classes that have been applied in the history, for the following reasons: (1) the efficiency of a mobility class is defined based on historical observations and (2) it is not important to process a mobility class that has never been used.

Let PA be the set of data-sources such that pi ∈ PA satisfies the following properties:

• tie-strength(pi, ps)>min-tie and

• Information-Provision(pi,c) ≈ Information-Provision(ps,c).

As discussed in section 3.3, min-tie, the minimum tie, is used to indicate a high degree of

collaboration between peers.

Let M(ps,c) be the mobility classes that a data source ps uses to determine the advertisement policy of MANET views in the context c. The peer ps can use Algorithm 4-1 to remedy the inefficiency of a mobility class mi(ps,c) ∈ M(ps,c) via one of the five cases described below.

Case 1 Merging: The merge operation is applied if there is an efficient mobility class mj(ps,c) ∈ M(ps,c) satisfying the following conditions: (1) mj(ps,c) is consecutive to mi(ps,c) and (2) if the other mobility class consecutive to mi(ps,c), let us refer to it as mk(ps,c), is efficient, then the usage factor of the advertisements of mj(ps,c) is greater than or equal to that of mk(ps,c). In this case, mi(ps,c) and mj(ps,c) are replaced by the mobility class resulting from merging them (let us call this class m(ps,c)). As described in Operation 4.5, m(ps,c) has all the properties of mj(ps,c) except the range-lifetimes and the historical statistics.

Case 2 Copying: This case is applied if it is not possible to apply case 1 and there is a peer p ∈ PA having a mobility class mj(p,c) such that the following properties are satisfied:

• The range-lifetimes of mi(ps,c) is included in the range-lifetimes of mj(p,c),

• The mobility class mj(p,c) is efficient, and

• If there is any other peer p’ satisfying conditions (1) and (2), the information provision of ps is more similar to the information provision of p than to that of p’ in the context c.

In this case, the mobility class mi(ps,c) is replaced by the mobility class resulting from copying the advertisement policy of mj(p,c) to mi(ps,c).

Case 3 Division: This case is applied if there is no efficient mobility class as described in case 1 and case 2 and one of the mobility classes resulting from the division operation is efficient. In this case, mi(ps,c) is replaced by the mobility classes resulting from the division operation.

Case 4 Advertisement augmentation: This case is applied if it is not possible to apply the above cases. In this case, an optimistic approach is used to increase the volume of advertisement via the augmentation method presented in Figure 4-1.

Case 5 Advertisement reduction: This case is applied if the inefficiency of a mobility class is due to an unsatisfactory usage of advertisements and the optimistic approach can no longer be applied to the mobility class. A pessimistic approach is used to reduce the total volume of advertisements via the reduction method presented in Figure 4-2, with the objective of increasing the usage of advertisements.

To summarize, in this section, we have proposed to compute mobility classes semi-

automatically. In a given context, mobility classes with respect to a peer can be initialized



by the mobility classes of a similar peer or by an inactive mobility class. Mobility classes

are enhanced by operations presented in section 4.3.1.

4.4 Mobility class Identification

4.4.1 Stay time

A peer can identify a mobility class by using the connectivity lifetime of a MANET

view, which is determined by the stay times of the peers. The stay time of two peers p1 and

p2 is the time that they stayed together (Definition 4.2).

The stay time of direct neighbors having mobility patterns described by constant

velocities and directions can be calculated by using the method proposed in [102]. Let

(xi,yi), vi and θi be the position, the velocity and the direction of a peer pi respectively and r

be the transmission range of the communication technology. As discussed in [102], the

stay time of p1 and p2 can be computed as follows:

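\[
\text{Stay-time}(p_1,p_2) = \frac{-(ab+cd) + \sqrt{(a^2+c^2)\,r^2 - (ad-bc)^2}}{a^2+c^2}
\]

where $a = v_1\cos\theta_1 - v_2\cos\theta_2$, $b = x_1 - x_2$, $c = v_1\sin\theta_1 - v_2\sin\theta_2$ and $d = y_1 - y_2$. This is the positive root of $(b+at)^2 + (d+ct)^2 = r^2$, i.e., the instant at which the distance between the two peers reaches the transmission range.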
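As a cross-check, the stay time of two direct neighbors can be computed numerically. The sketch below solves $(b+at)^2 + (d+ct)^2 = r^2$ for the largest root t, with a, b, c and d as defined above; the function name `stay_time` and its argument layout are choices of this sketch, and it assumes that the two peers are within range of each other at time 0.

```python
import math


def stay_time(pos1, v1, theta1_deg, pos2, v2, theta2_deg, r):
    """Time until two peers moving at constant velocity are r apart.

    Positions are (x, y) pairs, directions are given in degrees, and the
    result is expressed in the time unit of the velocities.
    """
    th1, th2 = math.radians(theta1_deg), math.radians(theta2_deg)
    a = v1 * math.cos(th1) - v2 * math.cos(th2)
    b = pos1[0] - pos2[0]
    c = v1 * math.sin(th1) - v2 * math.sin(th2)
    d = pos1[1] - pos2[1]
    rel_speed_sq = a * a + c * c
    if rel_speed_sq == 0:
        # No relative motion: the link never breaks if it currently exists.
        return math.inf if b * b + d * d <= r * r else 0.0
    disc = rel_speed_sq * r * r - (a * d - b * c) ** 2
    if disc < 0:
        return 0.0                        # the peers never come within range
    return max(0.0, (-(a * b + c * d) + math.sqrt(disc)) / rel_speed_sq)


# p1 at (1,3), 40 m/minute, direction 0°; p2 at (4,1), 45 m/minute, 315°
t = stay_time((1, 3), 40, 0, (4, 1), 45, 315, r=10)
print(round(t, 2))   # 0.25 (minutes)
```

With these inputs the relative position after a quarter of a minute is about (-0.96, 9.96), whose length is 10 meters, which confirms the value.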

Figure 4-3 illustrates the computation for a peer p1 located at (1,3) and a peer p2 located at (4,1), moving with the velocities and directions shown in the figure and connected through a network technology with a transmission range of 10 meters: their stay time is about 0.25 minutes.

Before defining the stay time of multi-hop neighbors, let us define the stay time of a path. Assume that ph is a path connecting peers p0 and pm. The path ph can be expressed as a sequence of peers p0, p1, …, pm such that there is a direct connection between pi and pi+1 for 0 ≤ i < m. The path ph is valid if and only if every pair of adjacent peers in ph stays connected. Therefore, the stay time of ph is calculated as follows:

\[
\text{stay-time-path}(ph) = \min_{0 \le i \le m-1} \text{stay-time}(p_i, p_{i+1})
\]


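[Figure 4-3 shows, in the X-Y plane, p1 at (1,3) moving at v1 = 40 m/minute in direction θ1 = 0° and p2 at (4,1) moving at v2 = 45 m/minute in direction θ2 = 315°, with r = 10 m. The computation reads:]

a = 40·cos(0°) − 45·cos(315°) ≈ 8.18
b = 1 − 4 = −3
c = 40·sin(0°) − 45·sin(315°) ≈ 31.82
d = 3 − 1 = 2
ab + cd ≈ −24.54 + 63.64 = 39.10
a² + c² ≈ 66.92 + 1012.50 = 1079.42
(a² + c²)·r² ≈ 107942
(ad − bc)² ≈ (16.36 + 95.46)² ≈ 12504
Stay-time(p1,p2) ≈ (−39.10 + √95438) / 1079.42 ≈ 0.25

Figure 4-3: An example of stay-time computation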

In a MANET, multiple paths can be used to connect two peers. Let Ph be the set of paths connecting the peers p1 and p2. The two peers stay connected if and only if there is at least one valid path in Ph. Therefore, the stay time of p1 and p2 is computed as:

\[
\text{stay-time}(p_1, p_2) = \max_{ph \in Ph} \text{stay-time-path}(ph)
\]
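The two definitions compose naturally: a path lasts as long as its weakest link, and two peers stay connected as long as their longest-lived path. The sketch below assumes that the stay times of direct neighbors are already known and passed in as a lookup function; the peer names and link values are invented for illustration, and every path is assumed to contain at least two peers.

```python
def stay_time_path(path, link_stay_time):
    """Minimum stay time over the consecutive pairs (p_i, p_i+1) of a path."""
    return min(link_stay_time(u, v) for u, v in zip(path, path[1:]))


def stay_time_multi_hop(paths, link_stay_time):
    """Maximum path stay time over all paths connecting the two end peers."""
    return max(stay_time_path(ph, link_stay_time) for ph in paths)


# Hypothetical stay times (in minutes) of the direct links
links = {
    frozenset(("p1", "pa")): 3.0, frozenset(("pa", "p2")): 1.5,
    frozenset(("p1", "pb")): 2.0, frozenset(("pb", "p2")): 2.5,
}


def lookup(u, v):
    return links[frozenset((u, v))]


paths = [["p1", "pa", "p2"], ["p1", "pb", "p2"]]
print(stay_time_path(paths[0], lookup))      # 1.5
print(stay_time_multi_hop(paths, lookup))    # 2.0
```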

4.4.2 Association Rule Mining

Mobility classes are determined as a function of the peers’ stay-times. As discussed in section 4.4.1, the stay-times of peers are determined by analyzing their mobility patterns in a MANET. However, determining the stay-times of peers in this way is difficult since peers may not follow pre-defined mobility patterns. We argue that this challenge can be overcome by analyzing the historical behavior of peers.

The average stay-time of a peer can be computed from the history by using the actual time that the peer stayed with other peers in a MANET in a given context. We propose to use association rules, which we also call “habit rules”, to estimate the average stay-time of a peer in a given context and to determine the mobility class of a MANET view.

The association rule below associates the mobility class m3 with the context (Bus 3, [8AM-8:10AM]).

<Context = (Bus 3, [8AM-8:10AM])> ⇒ <mobility class = m3>

A habit rule has two parts: an antecedent context (e.g., <Context = (Bus 3, [8AM-8:10AM])>) and a consequent mobility class (e.g., <mobility class = m3>). As discussed in chapter 3, antecedents can be produced by a method derived from those proposed in [89] and [90, 91]. Let minConf be the minimum confidence threshold of a rule and S(ant) be the set of sharing statistics matching the antecedent ant. Assume that there is a mobility class m(p,c) such that ant is an actual sharing context of the abstract context c. According to Definition 4.6, S(m(p,c)) is the set of sharing statistics matching the mobility class m(p,c). A rule ant ⇒ <mobility class = m(p,c)> is formed if and only if

\[
\frac{|S(ant) \cap S(m(p,c))|}{|S(ant)|} \ge minConf
\]

In the scenario presented in chapter 1 and in section 3.1, Pascal exchanges information in a bus with friends and people working in a bank. Let m(Pascal, (Bus,φ)) be a mobility class with range-lifetimes [0, 6). Let us observe the sharing statistics of Pascal displayed in Table 4-5 and the antecedent <context = (“Bus”,[8AM-8:05AM])>.

The contexts attached to the sharing statistics in Table 4-5 are actual contexts of the abstract context (Bus,φ). The sharing statistics s1, s2, s4 and s5 have a connectivity lifetime in the range-lifetimes of the mobility class m(Pascal, (Bus,φ)), i.e., less than 6. Thus, the rule below can be formed from the above historical data¹¹.

<context = (“Bus”,[8AM-8:05AM])> ⇒ <mobility class = m(Pascal, (Bus,φ))>

Table 4-5: Pascal’s sharing statistics

Sharing statistics   Context                    Co-lifetime
s1                   (Bus 27, [8AM-8:07AM])     5
s2                   (Bus 27, [8AM-8:06AM])     5.9
s3                   (Bus 37, [8AM-8:05AM])     7
s4                   (Bus 27, [8AM-8:07AM])     5
s5                   (Bus 27, [8AM-8:05AM])     2
s6                   (Bus 37, [8AM-8:07AM])     7
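The confidence test behind a habit rule can be illustrated on the data of Table 4-5. The sketch below makes two assumptions that are not stated in the text: all six statistics are taken to match the antecedent, and the threshold `MIN_CONF = 0.6` is an arbitrary illustrative value.

```python
def rule_confidence(ant_lifetimes, class_range):
    """Share of the statistics matching the antecedent whose co-lifetime
    lies in the range-lifetimes [low, high) of the mobility class."""
    low, high = class_range
    if not ant_lifetimes:
        return 0.0
    inside = sum(1 for t in ant_lifetimes.values() if low <= t < high)
    return inside / len(ant_lifetimes)


# Co-lifetimes of Table 4-5 (assumed to all match the antecedent)
pascal = {"s1": 5, "s2": 5.9, "s3": 7, "s4": 5, "s5": 2, "s6": 7}
MIN_CONF = 0.6                      # hypothetical threshold

conf = rule_confidence(pascal, (0, 6))
rule_formed = conf >= MIN_CONF
print(round(conf, 2), rule_formed)  # 0.67 True
```

Four of the six statistics (s1, s2, s4 and s5) fall in [0, 6), which gives a confidence of 4/6.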

In general, association rules can be used to estimate the mobility class of a MANET view according to the actual sharing contexts. These rules can be produced by mining contextual patterns in the historical sharing statistics and the similarity of the connectivity lifetimes attached to these historical data.

4.5 Conclusion

Information discovery in a MANET can be performed by using the pull approach (via

querying) and by using the push approach (via advertisements). To maximize the usage of

the push approach, we introduce a novel concept called a mobility-class that parameterizes

the advertisement policy according to users’ stay-times and their context. Mobility classes

can also be computed semi-automatically by using the approach proposed in this chapter.

Peers can determine the current mobility classes by analyzing their stay-times or by using habit rules.

¹¹ To produce a rule, we should compare the confidence and support of the rule with predefined constants. For the sake of simplicity, we skip this step in the illustration.

Chapter 5 File Classification and Organization

In the previous two chapters, we have presented interest-aware and lifetime-aware information sharing methodologies. In these two chapters, sharable files are advertised by disseminating their metadata. This kind of advertisement can impose a high burden on devices. In this chapter, we propose an algorithm that organizes sharable files in a tree, named a file tree, so that files can be advertised briefly or in detail by using their organization in the tree.

The research work in this chapter has been published in the International Journal on Computer Science and Information Systems [103] and has been presented at the conference on Pervasive Computing and Communications Workshops (PerComW 2010) [85].

This chapter is organized as follows: we discuss our motivation in section 5.1; file representation and organization are covered in section 5.2; the classification of files is presented in section 5.3; we illustrate the application of file trees in file discovery in section 5.4; section 5.5 discusses the main contributions presented in this chapter; finally, we conclude this chapter in section 5.6.


5.1 Motivation

Organizing files into a tree facilitates the presentation of files and minimizes the load of the peers involved in information sharing. According to the scenario presented in chapter 1, Pascal exchanges photos with his friends. Let us consider the MANET displayed in Figure 5-1, where Pascal is connected to Bob, Carol, David and Eve. Assume that Pascal is interested in receiving photos of vegetables and that the other participants of the MANET are interested in providing photos of vegetables. Suppose that the advertisement quota is 2 and the forwarding factor is also 2. As a result, each of the participants of the MANET can advertise two vegetable photos to Pascal.

Assume that Pascal wants to receive a photo of a Jerusalem artichoke. The advertisement quota is too small to indicate the peers owning the photo that Pascal is looking for. Most probably, he will be forced to search for the photo by distributing a query. As the forwarding factor is small, he will need to distribute the query repeatedly. This type of file discovery overloads the environment with queries and takes time to satisfy users’ information needs.

Now, assume that the participants of the MANET organize their files in file trees as shown in Figure 5-2. Bob informs Pascal that he has photos of tubers and seeds. The other participants advertise their files in a similar fashion. Pascal knows that the Jerusalem artichoke is a tuber vegetable. As a consequence, he learns that Bob has the potential to provide the required photo. Therefore, Pascal decides to send the query only to Bob.

Organizing files in a tree permits users to advertise files at a high level. This kind of file advertisement lets a user know the potential of a peer to provide the required files and thus limit the dissemination of his/her queries to potential providers.

In this chapter, we introduce a concept called “cluster” that organizes files hierarchically. In other words, a cluster represents a group of files or a group of other clusters. We then propose an algorithm that classifies files into clusters.


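[Figure 5-1 depicts Pascal, who is searching for a photo of a Jerusalem artichoke, connected to Bob, Carol, David and Eve. With Adv-Volume = 2 and forwarding factor = 2, the peers advertise individual photos (Cabbage, Carrot; Potato, Bean; Cabbage, Onion; Tomato, Broccoli), after which Pascal distributes queries to them. Numbers in the figure give the order of the tasks.]

Figure 5-1: Query resolution via advertisements about individual files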

[Figure 5-2 depicts the same MANET when each peer organizes its files in a file tree rooted at “Vegetable”. The four trees hold the clusters (leaves, roots), (tubers, seeds), (bulbs, leaves) and (fruits, flowers, leaves). With Adv-Volume = 2 and forwarding factor = 2, the peers advertise these clusters to Pascal, who then sends his query for a photo of a Jerusalem artichoke to a single peer.]

Figure 5-2: File organization and Query resolution



5.2 Information Representation

Files are represented via their metadata. The metadata of a file are composed of basic metadata and specialized metadata. Basic metadata are described in Table 5-1. The attribute FileID is assigned by using a sequential number and the MAC address of the device where the file is created.

Table 5-1: Basic description of a file

Attribute     Description
FileID        unique identifier of the file
Description   list of keywords that describe the file
FileSize      the size of the file

Specialized metadata depend on the type of the file. As displayed in Figure 5-3, the specialized metadata of a photo, for example, can contain the objects of interest identified in the photo and the spatial/temporal context in which the photo was taken.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Description of the file format -->
<actor>Pascal, Anne, Michael</actor>
<Format>
  <Type>jpeg</Type>
</Format>
<location>Part Dieu</location>
<Time>28/04/2010</Time>

Figure 5-3: Example of specialized metadata of a photo



Well-known content description metadata models like Dublin-core [104] and MPEG-7

[105] can be used to represent the metadata of a file. In this chapter, we use an abstract

representation of metadata to present and discuss our work.

To facilitate file searching and categorization, files are mapped in a space via vector

space modeling (VSM) techniques [87,106]. Vector space modeling is a standard

technique in information retrieval to represent documents through their contents.

In VSM, a document di is represented by a vector di = (wi1, wi2, ..., win), where wij represents the weight of the term j in the document di. To produce this vector for a text document, the document is parsed into a series of words, and the parsing process removes stop words such as prepositions, conjunctions, common verbs, pronouns, articles and common adjectives. The documents are then represented in a term × frequency matrix [87,106]. A document vector can be considered as a vector in the term × frequency space, which is usually referred to as the vector space.
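The construction of document vectors can be sketched as follows. The sketch uses raw term counts as the weights wij and a very small stop-word list; both are simplifying assumptions made for illustration, and any other weighting scheme or stop-word list could be substituted.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption of this sketch)
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "is", "to"}


def tokenize(text):
    """Lower-case the text, split it into words and drop the stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOP_WORDS]


def term_vectors(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]


vocab, vectors = term_vectors(["Photo of a potato and a carrot",
                               "Carrot and carrot soup"])
print(vocab)     # ['carrot', 'photo', 'potato', 'soup']
print(vectors)   # [[1, 1, 1, 0], [2, 0, 0, 1]]
```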

In this thesis, we propose to organize files hierarchically into clusters. The structure

containing the clusters is called a file tree. The metadata of a cluster are described in Table

5-2. The metadata of a cluster contain its description as well as the IDs of files and sub-

clusters grouped under it. The metadata also contain the average size of files grouped in the

cluster. An example of metadata of a cluster is given in Figure 5-4.

Table 5-2: Description of a cluster

Attribute       Description
ClusterID       unique identifier of the represented cluster
Description     list of keywords that describe the cluster
FilesIDs        IDs of the files found under the represented cluster
SubClusterIDs   IDs of the clusters found under the represented cluster
AvgFileSize     the average size of the files grouped in the cluster


<?xml version="1.0" encoding="UTF-8"?>
<ClusterID>C0083blueD11</ClusterID>
<Description>Pascal, Campus</Description>
<FilesIDs>F0083blueD333, F0083blueD343, F0083blueD355, F0083blueD356, F0083blueD360</FilesIDs>
<SubClusterIDs>C0083blueD21, C0083blueD22</SubClusterIDs>
<AvgFileSize>544KB</AvgFileSize>

Figure 5-4: An example metadata of a cluster

The description of a cluster c is computed by using the descriptions of the files/clusters under it. Assume that maxKey is the maximum number of keywords used to describe a file/cluster. Suppose G contains the descriptions of the files/clusters grouped under the cluster c. Let Occurrence(G,k) be the number of descriptions in G containing the keyword k and Keys(G) be the union of the descriptions in G. Description(c), the description of the cluster c, is defined as the set of the most popular keywords that appear in Keys(G), i.e.,

- |Keys(G)| < maxKey ⇒ |Description(c)| = |Keys(G)|

- |Keys(G)| ≥ maxKey ⇒ |Description(c)| = maxKey

- Description(c) ⊆ Keys(G)

- Occurrence(G, ki) ≥ Occurrence(G, kj), ∀ki ∈ Description(c), ∀kj ∈ Keys(G) − Description(c)
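The four conditions above amount to keeping the maxKey most frequent keywords. A minimal sketch follows; the alphabetical tie-break between equally frequent keywords is an assumption of this sketch, since the definition leaves the choice open, and the sample keyword sets are invented.

```python
from collections import Counter


def cluster_description(G, max_key):
    """Return the max_key keywords that occur in the most descriptions of G.

    G is a list of keyword collections, one per file/cluster under c.
    """
    occurrence = Counter(k for description in G for k in set(description))
    ranked = sorted(occurrence, key=lambda k: (-occurrence[k], k))
    return set(ranked[:max_key])


G = [{"Pascal", "Campus"}, {"Pascal", "Bus"}, {"Campus", "Pascal", "Lyon"}]
print(cluster_description(G, 2))    # the set {'Pascal', 'Campus'}
print(cluster_description(G, 10))   # all four keywords, as |Keys(G)| < maxKey
```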

The vector representation of a cluster c is computed as follows. Let V be the set of vector representations of the files/clusters classified under c. The vector representation of c is the centroid vector¹² of V. In the example displayed in Figure 5-5, V contains the vectors in the circle and the center of the circle is the vector representation of the cluster.

12 The centroid vector is the average vector of the vectors in V.
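The centroid can be computed component by component, as in the short sketch below (the function name and the sample vectors are illustrative; V is assumed to be non-empty and to contain vectors of equal length).

```python
def centroid(vectors):
    """Average the vectors of V component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


print(centroid([[1, 0, 2], [3, 2, 0]]))   # [2.0, 1.0, 1.0]
```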


[Figure 5-5 shows the vectors classified under the cluster as points inside a circle, with the vector representation of c at the center of the circle.]

Figure 5-5: Vector representation of a cluster

We propose to apply a VSM technique to construct a virtual vector space to represent

advertised files. As keywords in the textual descriptions of files are the most important

terms of the file, in this thesis, we propose to construct the virtual vector space from the

term statistics with respect to textual descriptions of files.

5.3 Classification Algorithm

5.3.1 File Classification

Clusters are organized into a structure called a file tree. The root of the tree is an artificial

cluster representing all sharable files.

A file-tree is constructed in a bottom up fashion; files are classified into clusters; the

resulting clusters are then classified into other clusters; the classification continues until a

tree of the required height is obtained.

New sharable files, which are added after the classification has been performed, can be

automatically added into clusters found at the leaves of the file tree.

Files/clusters can be classified by using a content-based approach, a metadata-based

approach or a hybrid of the two. A content-based approach performs classifications by



using the files’/clusters’ vector-representations; a metadata-based approach performs

classifications by using the textual-description of the files/clusters.

A content-based approach may not always be applicable since some of the files may not have vector representations. A vector space cannot be computed every time a file is added, for the following reasons: (1) vector space computation is expensive; and (2) when a file is added on a thin device, this device may not immediately encounter a device that is capable of computing the vector space on its behalf. Consequently, recently added files may not have vector representations. We propose to apply a hybrid approach if there are files that do not have vector representations¹³.

In a hybrid classification approach, files are first classified by a metadata-based approach into clusters such that each of the clusters contains some files that have a vector representation. The vector representations of the resulting clusters are determined by using only the files that have a vector representation. Afterwards, the clusters are classified by a content-based approach.

The k-means* algorithm (Algorithm 5-1) classifies files/clusters according to their similarities. Let us discuss the similarity of files and clusters before discussing the algorithm.

The similarity between files and clusters can be computed based on the similarity of

either their textual descriptions or their vector-representations. Let E1 and E2 represent

clusters, or/and files. Let the sets of keywords D1 and D2 be the textual descriptions of E1

and E2 respectively.

As discussed in chapter 3, D1 is similar to D2 if and only if Similarity(D1, D2) ≥ minSim, where minSim is the similarity threshold. The similarity value between E1 and E2 is equal to the similarity value between D1 and D2, i.e.,

Similarity(E1,E2) = Similarity(D1, D2).

13 In this chapter, all files are assumed to have metadata.


E1 and E2 are similar if and only if D1 and D2 are similar, i.e.,

E1 ≈ E2 ⇔ D1 ≈ D2

The similarity of files and clusters can also be derived from the similarity of their vector representations. Let $\gamma_{12}$ be the angle between the vector representations of E1 and E2; the similarity of E1 and E2 is calculated as:

Similarity(E1,E2) = $\cos(\gamma_{12})$

The elements E1 and E2 are said to be similar if and only if

Similarity(E1,E2) ≥ minCosSim

where minCosSim is the minimum cosine similarity value.
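The cosine of the angle between two vectors is their dot product divided by the product of their norms, which gives the following sketch. Returning 0 for a zero vector is a convention chosen here, not something stated in the text, and the threshold 0.7 in the example is arbitrary.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between the vectors u and v."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def similar(u, v, min_cos_sim):
    """True when the cosine similarity reaches the threshold minCosSim."""
    return cosine_similarity(u, v) >= min_cos_sim


print(cosine_similarity([1, 0], [0, 1]))   # 0.0: orthogonal vectors
print(similar([1, 0], [1, 1], 0.7))        # True: cos(45°) is about 0.707
```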

K-means* performs the classification process according to the similarity of files and

clusters. Let k be the number of clusters required and S be the textual descriptions or vector representations of the files/clusters to be classified. The algorithm puts k or fewer mutually dissimilar elements in the set heads (lines 3-9) such that the elements in heads satisfy the following condition.

Similarity(si, sj) < minSim for all si, sj ∈ heads with si ≠ sj

Note that the value of minSim is different for content-based and metadata-based

classifications.

As described in lines 11-14, k-means* classifies the elements in S according to their

similarities to the elements in the set heads. It then copies the content of the set heads, which was initialized at the beginning of the algorithm, to the set oldHeads and resets the set heads (lines 15 and 16). From each group, the algorithm selects as the new head the element that is more similar to the elements of the group than any other element of this group (lines 17-19). The regrouping and the recomputation of the group-heads continue until the heads obtained in two consecutive steps are the same or the loop has been executed a maximum number of times (maxIteration).



Algorithm: k-means*
INPUT: S, k, minSim
  S: representations of the files/clusters
  k: number of clusters
  minSim: a threshold indicating the similarity of elements. Note that minSim has different values for content-based and metadata-based classifications.
OUTPUT: heads, members(e) ∀e ∈ heads
  heads: group-heads of the resulting clusters
  members(e): files/clusters that are members of the cluster headed by e
BEGIN
      // take dissimilar elements randomly
  1.  heads = ø
  2.  S’ = S
      // find k or fewer heads
  3.  WHILE (|heads| < k) && (S’ ≠ ø)
  4.    α = S’.randomSelect() // take an element randomly
  5.    S’ = S’ - {α} // remove the element
        // add α to heads if it is dissimilar to the other elements in heads
  6.    IF ((heads = ø) || (Similarity(α, β) < minSim, ∀β ∈ heads))
  7.      heads.add(α)
  8.    END IF
  9.  END WHILE
  10. i = 0
  11. DO
        // map elements into clusters
  12.   FOR each s ∈ S
          /* put s in the group headed by α if s is more similar to α than to the other group heads */
  13.     members(α).add(s) for α ∈ heads such that Similarity(s, α) ≥ Similarity(s, β) ∀β ∈ heads
  14.   END FOR
        // copy the content of heads into oldHeads
  15.   oldHeads = heads
        // reset heads
  16.   heads = ø
        // re-compute heads
  17.   FOR each h ∈ oldHeads
          // determine the best head for the group currently headed by h
  18.     heads.add(α) such that α ∈ members(h) and, ∀β ∈ members(h) with β ≠ α,
          $$\sum_{w \in members(h)} similarity(\alpha, w) \geq \sum_{w \in members(h)} similarity(\beta, w)$$
  19.   END FOR
  20.   i++
  21. WHILE ((oldHeads != heads) && (i < maxIteration))
END ALGORITHM

Algorithm 5-1: file clustering based on k-means*
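The following Python sketch illustrates Algorithm 5-1 for the content-based case. It is only an illustration under stated assumptions: elements are distinct tuples of numbers, the similarity is the cosine similarity, the membership lists are rebuilt at every iteration (the pseudocode does not say when they are reset), and a final assignment is made for the last set of heads. The names k_means_star, assign and cosine are hypothetical.

```python
import math
import random


def cosine(v1, v2):
    """Cosine similarity of two vectors (0.0 if one of them is all zeros)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norms = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norms if norms else 0.0


def assign(elements, heads, similarity):
    """Lines 12-14: attach every element to its most similar head."""
    members = {h: [] for h in heads}
    for s in elements:
        members[max(heads, key=lambda h: similarity(s, h))].append(s)
    return members


def k_means_star(elements, k, min_sim, similarity=cosine,
                 max_iteration=20, rng=None):
    rng = rng or random.Random()
    # Lines 1-9: draw elements at random and keep up to k mutually
    # dissimilar ones as the initial heads.
    pool = list(elements)
    rng.shuffle(pool)
    heads = []
    for candidate in pool:
        if len(heads) == k:
            break
        if all(similarity(candidate, h) < min_sim for h in heads):
            heads.append(candidate)
    if not heads:
        return [], {}

    for _ in range(max_iteration):
        members = assign(elements, heads, similarity)
        # Lines 15-19: in each non-empty group, the new head is the member
        # with the largest total similarity to the members of the group.
        new_heads = [
            max(group, key=lambda a: sum(similarity(a, w) for w in group))
            for group in members.values() if group
        ]
        if set(new_heads) == set(heads):
            break
        heads = new_heads
    # Assumption: one last assignment so that members match the final heads.
    return heads, assign(elements, heads, similarity)
```

Because every initial head is itself one of the elements to be classified, each group starts with at least one member, which is the property that distinguishes k-means* from k-means in this chapter.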



Let the height of a file tree be h and the number of clusters at depth i be ni for 1 ≤ i ≤ h.

The file tree is computed as:

Step 1: files are classified into nh clusters by using the k-means* algorithm.

Step 2: for each depth i, i = h-1, h-2, …, 1, clusters found at depth i + 1 are classified

into ni clusters.

Step 3: all clusters at depth 1 are grouped into the root cluster, which is an artificial

cluster representing all sharable files.
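The three steps above can be sketched as follows. This is a minimal sketch, not the implementation used in this thesis: the clustering routine is passed as a parameter (cluster_fn), and a trivial round-robin grouping stands in for k-means* so that the example stays self-contained. The dictionary layout of a cluster and all function names are assumptions made for illustration.

```python
def build_file_tree(files, cluster_counts, cluster_fn):
    """Build the file tree bottom-up.

    cluster_counts[i - 1] is n_i, the number of clusters at depth i.
    cluster_fn(items, k) must return at most k groups (lists) of items.
    """
    levels = {}
    items = list(files)
    # Steps 1 and 2: classify the files at depth h, then the clusters of
    # depth i + 1 into the clusters of depth i, for i = h-1, ..., 1.
    for depth in range(len(cluster_counts), 0, -1):
        groups = cluster_fn(items, cluster_counts[depth - 1])
        levels[depth] = [{"depth": depth, "children": g} for g in groups]
        items = levels[depth]
    # Step 3: the artificial root cluster groups all clusters of depth 1.
    root = {"depth": 0, "children": items}
    return root, levels


def round_robin(items, k):
    """Hypothetical stand-in for k-means*: deal items into at most k groups."""
    k = min(k, len(items))
    return [items[i::k] for i in range(k)]


def count_files(node):
    """Number of files below a node; files are the non-dict leaves."""
    if isinstance(node, dict):
        return sum(count_files(child) for child in node["children"])
    return 1
```

With 16 files and cluster_counts = [2, 8], the sketch produces 8 clusters at depth 2, 2 clusters at depth 1 and a root that covers all 16 files.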

The next section studies the determination of the dimension of a tree (i.e., the height of the tree and the number of clusters at each depth of the tree) as a function of the mobility classes.

5.3.2 Computation of the File tree’s Dimension

We propose to compute the height of a file tree and the number of clusters at each depth

according to the mobility classes considered by a source peer to determine the

advertisement policies in MANETs. The number of clusters at a depth of the file tree

should be related to the volume of advertisement attached with a mobility class m so that

the clusters in this depth will be advertised in MANET-views described by the mobility

class m.

Consider the file tree in Figure 5-6. Assume that there are two mobility classes m1 and m2

with advertisement volumes equal to 2 and 8 metadata respectively. As a result, the clusters at depth 1 correspond to the mobility class m1 and those at depth 2 to the mobility class m2.

Therefore, the clusters c11 and c12 will be advertised in MANET-views described by the

mobility class m1; Clusters c21, c22, c23, c24, c25, c26, c27 and c28 will be advertised in

MANET-views described by the mobility class m2. We discuss advertisements of

files/clusters in section 5.5.



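[Figure: file tree rooted at C0. Depth 1, associated with the mobility class m1, contains the clusters C11 and C12. Depth 2, associated with the mobility class m2, contains the clusters C21 to C28.]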

Figure 5-6: An example of association between a file-tree with mobility classes.

Not all of the mobility classes can be considered when computing the dimension of the file tree (i.e., the height of the file tree and the number of clusters at each depth of the tree), because the advertisement volumes of some mobility classes may be equal or may not differ significantly, which creates redundant clusters. Assume that there are mobility classes m1, m2 and m3 with advertisement volumes of 3, 4 and 8 metadata respectively. If m1, m2 and m3 are all considered to compute the dimension of the file tree, the result is a file tree that looks like the one in Figure 5-7. Note that there are clusters representing the same group of files at depths 1 and 2: the clusters c11 and c21, as well as the clusters c12 and c22, represent the same files. To avoid this kind of redundancy, we propose to identify representative mobility classes that will be used to compute the dimension of the file tree.

Representative mobility classes are those mobility classes that show significant differences in terms of advertisement volumes. Let β be a significance factor such that β > 1. A mobility class mi is said to be significantly greater than a mobility class mj (denoted as mi > mj) if and only if

$$\frac{\text{adv-volume}(m_i)}{\text{adv-volume}(m_j)} \geq \beta$$



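[Figure: file tree rooted at C0 with three depths. Depth 1, associated with the mobility class m1, contains three clusters (C11, C12 and a third cluster); depth 2, associated with m2, contains the clusters C21 to C24; depth 3, associated with m3, contains the clusters C31 to C38.]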

Figure 5-7: The redundancy created by considering all mobility classes

A mobility class mi is said to be significantly less than a mobility class mj (denoted as mi < mj) if and only if mj > mi. The mobility classes mi and mj are called significantly different if and only if mi > mj or mj > mi.

Let nf be the number of sharable files. Let M be the set of all mobility classes considered by a peer during information sharing. The set Mimp ⊂ M is called a set of representative classes if and only if the following properties are satisfied:

[1] mi < mj or mj < mi, ∀mi, mj ∈ Mimp with mi ≠ mj

[2] for ∀m ∈ M - Mimp, one of the following properties is satisfied

a. ∃mi ∈ Mimp such that $\frac{\text{adv-volume}(m_i)}{\text{adv-volume}(m)} < \beta$

b. $\frac{nf}{\text{adv-volume}(m)} < \beta$

[3] one of the following conditions holds true for ∀m ∈ Mimp

a. the mobility class is found inside the list, i.e., ∃mi, mj ∈ Mimp – {m} such that mi < m < mj,

b. the mobility class is found at the end of the list, i.e., mi < m, ∀mi ∈ Mimp – {m} and β * adv-volume(m) ≤ nf,



c. the mobility class is found at the beginning of the list, i.e., m<mi, ∀mi ∈

Mimp –{m} and there is no mj ∈ M such that mj<m

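Let us consider the mobility classes that are used to produce the file tree in Figure 5-7 (remember that m1, m2 and m3 have advertisement volumes of 3, 4 and 8 metadata in the example). Assume that nf is 16 and β is 2. Then m2 and m3 are the representative mobility classes: m3 is significantly greater than m2 (8/4 ≥ 2) and satisfies β * adv-volume(m3) ≤ nf, whereas m1 is not significantly different from m2 (4/3 < 2).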

Algorithm 5-2 is used to calculate the representative mobility classes as follows. As described in line 2, all mobility classes in M that have an advertisement volume significantly less than nf (the number of sharable files) are placed in a set named M’. The algorithm then identifies, from M’, the mobility class having the maximum advertisement volume as a representative class (step 1). Let us call this mobility class mcp. The algorithm reinitializes M’ to contain only the mobility classes that are significantly less than mcp (step 2). The algorithm repeats steps 1 and 2 until M’ becomes an empty set.

Algorithm: representative mobility class computation
Input: M, β, nf
  M: list of mobility classes
  β: significance factor
  nf: the number of sharable files
Output: Mimp
  Mimp: list of representative classes
Begin
     /* initialization */
  1. Mimp = ∅
     /* identify mobility classes that have advertisement volumes significantly less than nf */
  2. M’ = {m | m ∈ M and adv-volume(m) * β ≤ nf}
     /* compute representative mobility classes */
  3. While (M’ != ∅)
       /* Step 1: identify the mobility class having the maximum adv-volume as a representative class */
  4.   Remove mcp ∈ M’ such that adv-volume(mcp) ≥ adv-volume(m) ∀m ∈ M’
  5.   Mimp += {mcp}
       /* Step 2: reinitialize M’ to contain mobility classes significantly less than mcp */
  6.   M’ = {m | m ∈ M’ and m < mcp}
  7. End while
End Algorithm

Algorithm 5-2: representative mobility class computation
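The computation can be sketched in Python as follows. The sketch assumes that mobility classes are given as a dictionary from a class name to a strictly positive advertisement volume, and it uses the non-strict comparison β * adv-volume(m) ≤ nf of property [3]b, so that the worked example with volumes 3, 4 and 8 yields m2 and m3. The function tree_dimension further assumes that the representative class with the smallest volume is mapped to depth 1, as in Figure 5-6. All names are hypothetical.

```python
def representative_classes(adv_volume, beta, nf):
    """Return the representative classes, largest advertisement volume first."""
    m_imp = []
    # Line 2: classes whose volume is significantly less than nf.
    candidates = {m for m, v in adv_volume.items() if v * beta <= nf}
    while candidates:
        # Step 1: the candidate with the maximum volume is representative.
        # Sorting first only makes ties deterministic.
        mcp = max(sorted(candidates), key=lambda m: adv_volume[m])
        m_imp.append(mcp)
        # Step 2: keep only the candidates significantly less than mcp.
        candidates = {
            m for m in candidates
            if m != mcp and adv_volume[mcp] / adv_volume[m] >= beta
        }
    return m_imp


def tree_dimension(adv_volume, m_imp):
    """Number of clusters at depths 1..h, assuming smaller volumes are shallower."""
    return sorted(adv_volume[m] for m in m_imp)
```

For the example of Figure 5-7 (volumes 3, 4 and 8, nf = 16 and β = 2), the sketch returns m3 and m2, which gives a tree of height 2 with 4 clusters at depth 1 and 8 clusters at depth 2.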



The condition Mimp = ∅ indicates that the number of sharable files is not significantly

different from the volume of advertisement attached with any of the mobility classes. In

this case, advertisement can be made by using metadata of all individual files; thus,

classification is not needed. If that is not the case, the height of the tree is |Mimp| and the

number of clusters at each depth i equals the advertisement volume attached with the ith

mobility class listed in Mimp.

5.4 Information Sharing Based on File Organization

5.4.1 Information Advertisement

As discussed in chapter 3, a data source can make advertisement by using the metadata

of every sharable file. As discussed in the beginning of this chapter, this kind of

advertisement will overload the environment with queries. In this chapter, we propose advertising files by using descriptions of clusters that represent groups of sharable files.

The advertisement message can contain only clusters found at the shallowest or the

deepest level of a file-tree. As discussed in chapter 4, the current mobility class is used to

determine the volume of advertisement. As discussed in chapter 3, the overall demand of

the peers in the MANET view is used to determine the content of advertisements.

Files and clusters are mapped to the users’ interests in the overall demand according to

their reciprocal similarities. Let F(I) and C(I) be files and clusters matching the interest I.

A file f and a cluster c are placed in sets F(I) and C(I) respectively if and only if (1) c and f

are relevant to I and (2) for any interest Ij in the overall demand, c and f are more relevant

to I than to Ij. The relevance of files and clusters is computed according to their similarity to the interest.

We propose to compute the content of advertisements by using Algorithm 5-3 according

to the interests of users, the mobility class of the MANET view and the arrangement of the

sharable files in the file-tree. Let m be a mobility class describing the current MANET

view and let Sod be the overall-demand of the peers in the MANET view. The data source

peer prepares advertisements of files using Algorithm 5-3 according to the overall demand



Sod and the advertisement volume attached with the mobility class m. The total volume of

advertisements with respect to the interests in Sod should be adv-Volume(m) and the sum of

the weight of the interests in Sod is 1; thus, the advertisement quota for the interest I in Sod,

denoted as N(I), is computed as:

N (I) = weight (I)*adv-Volume (m)

Let F be a set of files and Ck be a set of clusters found at the depth k of the file tree. Let F(I) ⊆ F and Ck(I) ⊆ Ck be sets of files and clusters matching the interest I; and ADV(I) be an advertisement container for an interest I. For an empty interest Ie, i.e., Description(Ie)=∅, F(Ie) and Ck(Ie) are computed as follows.

• $F(I_e) = F - \bigcup_{I \in S_{od} - \{I_e\}} F(I)$

• $C_k(I_e) = C_k - \bigcup_{I \in S_{od} - \{I_e\}} C_k(I)$

Let E be a set of files/clusters; the set Relevant(E, I, n) represents the n most relevant

(similar) elements of E with respect to the interest I, i.e., similarity(ei, I) ≥ similarity(ej, I)

for ∀ei ∈ Relevant(E, I, n), ej ∈ E- Relevant(E, I, n).

Algorithm 5-3 selects the metadata of files and clusters to be distributed in the

environment as follows. As indicated in lines 3 to 6, all metadata of files in F(I) are

selected if N(I) is large enough for advertising all of these files. Otherwise, starting from the leaves of the file tree, the algorithm searches for a depth of the file tree where the number of clusters is less than N(I) (lines 7-10).

Let us call this depth k. If the above search is unsuccessful, the metadata of the most

relevant clusters at depth 1 are placed in ADV(I) (lines 11-14). Otherwise, as described in

lines 15 to 22, the metadata of the most relevant clusters found from the depth k to the

depth h (the height of the tree) are placed in the set ADV(I) according to their position in

the file tree and their similarity with the interest I. After considering all the above clusters,

some metadata of individual files might be placed according to the available slots in

ADV(I) (lines 23-25).



Algorithm: Advertisement content determination
Input: h, Sod, F(I) ∀I ∈ Sod, Ck(I) for 0 < k ≤ h and every I ∈ Sod
  h: height of the file-tree
  Sod: overall demand
  F(I): files matching the interest I
  Ck(I): clusters matching the interest I and found at the depth k
Output: ADV(I) for all I ∈ Sod
  ADV(I): advertisement for every I ∈ Sod
Begin
  1.  For each I ∈ Sod
  2.    ADV(I) = ∅
        /* select all metadata of files if N(I) is large enough to advertise them one by one */
  3.    If (N(I) ≥ |F(I)|)
  4.      ADV(I) = {metadata(f) | f ∈ F(I)}
  5.      Exit
  6.    End If
        /* search the depth where there are fewer than N(I) clusters */
  7.    k = h
  8.    While ((N(I) ≤ |Ck(I)|) && (k > 0))
  9.      k--
  10.   End while
        /* if there is no depth where there are fewer than N(I) clusters, select some of the clusters at depth one and exit */
  11.   If (k == 0)
  12.     ADV(I) = {metadata(c) | c ∈ Relevant(C1(I), I, N(I))}
  13.     Exit
  14.   End If
        // select clusters according to their depth in the file tree
  15.   While ((|ADV(I)| < N(I)) && (k ≤ h))
  16.     If (N(I) - |ADV(I)| ≥ |Ck(I)|)
  17.       ADV(I) = {metadata(c) | c ∈ Ck(I)} ∪ ADV(I)
  18.     Else
  19.       ADV(I) = {metadata(c) | c ∈ Relevant(Ck(I), I, N(I) - |ADV(I)|)} ∪ ADV(I)
  20.     End If
  21.     k++
  22.   End while
        // select some of the files if there are still free slots in ADV(I)
  23.   If (|ADV(I)| < N(I))
  24.     ADV(I) = {metadata(f) | f ∈ Relevant(F(I), I, N(I) - |ADV(I)|)} ∪ ADV(I)
  25.   End If
  26. End for
End Algorithm

Algorithm 5-3: Advertisement content determination
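For a single interest I, the selection performed by Algorithm 5-3 can be sketched as follows. The sketch makes several assumptions that are not part of the algorithm: the quota N(I) is an integer (the caller is expected to round weight(I) * adv-Volume(m)), the height of the tree is at least 1, the metadata of an element is represented by the element itself, and similarity is any function scoring an element against the interest. The function names are hypothetical.

```python
def relevant(elements, interest, n, similarity):
    """Relevant(E, I, n): the n elements of E most similar to the interest."""
    ranked = sorted(elements, key=lambda e: similarity(e, interest), reverse=True)
    return ranked[:max(n, 0)]


def advertisement_content(interest, quota, files, clusters_by_depth,
                          height, similarity):
    """files is F(I); clusters_by_depth[k] is Ck(I) for 1 <= k <= height."""
    # Lines 3-6: advertise every file if the quota allows it.
    if quota >= len(files):
        return list(files)
    # Lines 7-10: from the leaves upwards, look for a depth that holds
    # fewer than quota clusters.
    k = height
    while k > 0 and quota <= len(clusters_by_depth[k]):
        k -= 1
    # Lines 11-14: no such depth, so take the best clusters of depth 1.
    if k == 0:
        return relevant(clusters_by_depth[1], interest, quota, similarity)
    # Lines 15-22: fill the quota with clusters from depth k down to depth h.
    adv = []
    while len(adv) < quota and k <= height:
        free = quota - len(adv)
        level = clusters_by_depth[k]
        adv += level if free >= len(level) else relevant(level, interest, free, similarity)
        k += 1
    # Lines 23-25: use the remaining slots for individual files.
    if len(adv) < quota:
        adv += relevant(files, interest, quota - len(adv), similarity)
    return adv
```

With 10 files, 2 clusters at depth 1 and 4 clusters at depth 2, a quota of 3 selects the two clusters of depth 1 and the most relevant cluster of depth 2, whereas a quota of 7 selects the four clusters of depth 2 and the three most relevant files.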



A source can make another advertisement after adv-period(m). In the meantime, the

information discovery method will try to adjust its knowledge about stay-times of peers

and the mobility class of the MANET-View. Moreover, in addition to the mobility class,

the volume of advertisement will be affected by Adv-usage(ps), the advertisement usage

factor of ps described in definition 4.3 (in chapter 4). Before sending the advertisements, a data source asks its neighbors for their advertisement usage and for the volume of advertisements¹⁴ that they will distribute in the next period.

Assume that the peers in Pn are direct neighbors. Let Adv-volume-total be the total volume of advertisements that the peers in Pn will distribute in the next period. The advertisement volume for ps ∈ Pn is zero if its advertisement usage factor is zero. Otherwise, the advertisement volume is calculated as follows.

$$\text{Adv-volume}(p_s) = \frac{\text{Adv-usage}(p_s)}{\sum_{p \in P_n} \text{Adv-usage}(p)} \ast \text{Adv-volume-total}$$

Considering the usage factor in the computation of the advertisement volume gives more chances to popular peers and minimizes the unnecessary advertisements produced by less popular peers.
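The proportional allocation of the advertisement volume can be illustrated by the short sketch below. It assumes that the usage factors are given as a dictionary indexed by peer and that fractional volumes are acceptable; the rounding policy is not specified in the text and is left out. The function name is hypothetical.

```python
def advertisement_volumes(usage, total_volume):
    """Share total_volume among peers in proportion to their usage factors."""
    total_usage = sum(usage.values())
    if total_usage == 0:
        # No advertisement of any neighbor has been used: nobody gets a share.
        return {p: 0.0 for p in usage}
    return {p: u / total_usage * total_volume for p, u in usage.items()}
```

For instance, with usage factors 3, 1 and 0 and a total volume of 8 metadata, the three peers obtain 6, 2 and 0 metadata respectively.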

5.4.2 Information Discovery

In chapter 3, we have discussed query resolution according to information provisions of

users. In this chapter, we discuss resolving queries by using the received advertisements.

A query is resolved in two phases: an information discovery phase and an information delivery phase.

The information discovery phase is used to discover peers owning files matching with the

query while the information delivery phase is used to fetch the files.

14 As each peer computes mobility classes independently, peers can have different advertisement volumes.

Let F(q) be the files matching a user query q, which is expressed by a list of keywords, and let C(q) be the clusters matching q. A file f and a cluster c are placed in F(q) and C(q) respectively if they are relevant to the query q. A file fi is relevant to a query q if and only if fi is similar to the query q. A file fi is more relevant to the query q than a file fk if

Similarity(fi,q) > Similarity(fk,q)

Clusters are compared with queries in the same way.

Let owners(e) be the set of peers owning an element e (which represents a cluster/file).

Let downloadTime(f) be the time needed to download a file f and let disAnddelTime(c) be

the average time needed to discover and deliver a file grouped under a cluster c.

downloadTime(f) is estimated from the attribute “FileSize” in the metadata of the file.

disAnddelTime(c) is estimated by using the average size of files represented by the cluster

c (i.e., by using the attribute “AvgSizeFile” in the metadata of a cluster).

Let p be the data-requester posing the query q. As discussed in section 4.5.1, stay-time(pi,pj) is the time that the peers pi and pj stay together. For a set of elements E, let Relevant(E,k,q) be the set of the k most relevant elements of E with respect to q, i.e., the set satisfying the following properties

• |Relevant(E,k,q)| = min(|E|,k)

• similarity(ei,q) ≥ similarity(ej,q) ∀ei ∈ Relevant(E,k,q) and ∀ej ∈ E - Relevant(E,k,q)

Algorithm 5-4 is used to prepare the messages that can be used to deliver or to discover files for the query q from F(q) and C(q) respectively. More precisely, this algorithm

prepares two sets Delivery(q) and Discovery(q). The set Delivery(q) contains tuples of the

form (p, f) where p is the peer owning a file f that matches the query q. Discovery(q)

contains tuples of the form (p, q) where p is a peer owning a cluster matching with q.

Algorithm 5-4 first removes files from F(q) if it is not possible to deliver these files (line

1). It also removes clusters from C(q) if it is not possible to discover files grouped under

these clusters (line 2). The algorithm, then, selects some of the files from F(q) according to

their relevance to q and to the required number of files (lines 4 to 7). The algorithm ends

without preparing the discovery message if enough advertisements about files are found. In the case that |Delivery(q)| < n (n is the number of responses displayed to the user), the owners of the most relevant clusters in C(q) are selected as potential sources of the files matching q, and discovery messages are prepared for them (lines 9 to 15).

For each tuple (p,f), the metadata of the file f is displayed to the user. If the user approves

the downloading of the file, the file will be delivered from the peer p. For each tuple (p,q)

in Discovery(q), the query q is sent to the peer p. A peer that receives the query q searches for a file matching the query. If the search is successful, the peer sends the description of the file to the requester peer. The requester peer may then decide to download the file from the peer p. We will discuss the delivery of files in the next chapter.

Algorithm: Prepare delivery and discovery messages
Input: F(q), C(q), n, pr
  F(q): files matching the query q
  C(q): clusters matching the query q
  n: maximum number of files searched for a query
  pr: requester peer
Output: Discovery(q), Delivery(q)
  Discovery(q): set of tuples (p, q) where p is a peer owning a cluster matching q
  Delivery(q): set of tuples (p, f) where p is a peer owning a file f matching the query q
Begin
  1.  Remove any f in F(q) such that stay-time(pr, pi) < downloadTime(f) for all pi ∈ owners(f)
  2.  Remove any c in C(q) such that stay-time(pr, pi) < disAnddelTime(c) for all pi ∈ owners(c)
  3.  // prepare delivery messages
  4.  For all f ∈ Relevant(F(q), n, q)
  5.    For all p ∈ owners(f)
  6.      Put (p, f) in Delivery(q)
  7.    End For
  8.  End For
  9.  If (|Delivery(q)| < n)
  10.   For all c ∈ Relevant(C(q), n - |Delivery(q)|, q)
  11.     For all pi ∈ owners(c)
  12.       Put (pi, q) in Discovery(q)
  13.     End for
  14.   End For
  15. End If
End Algorithm

Algorithm 5-4: Prepare delivery and discovery messages
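A possible Python rendering of Algorithm 5-4 is sketched below. It assumes that the stay-times with the requester, the estimated download and discovery times, the owners and the relevance scores are already available as dictionaries; these data structures and the function name are hypothetical. As in the pseudocode, a tuple is prepared for every owner of a selected element, including owners whose own stay-time is too short.

```python
def prepare_messages(query, files, clusters, n, owners, stay_time,
                     download_time, dis_and_del_time, relevance):
    """Return (Delivery(q), Discovery(q)) as lists of tuples.

    stay_time[p] stands for stay-time(pr, p) and relevance[e] for the
    similarity between the element e and the query.
    """
    # Lines 1-2: drop the elements that no owner stays long enough to serve.
    files = [f for f in files
             if any(stay_time[p] >= download_time[f] for p in owners[f])]
    clusters = [c for c in clusters
                if any(stay_time[p] >= dis_and_del_time[c] for p in owners[c])]

    def most_relevant(elements, k):
        ranked = sorted(elements, key=lambda e: relevance[e], reverse=True)
        return ranked[:max(k, 0)]

    # Lines 4-8: one delivery tuple per owner of each selected file.
    delivery = [(p, f) for f in most_relevant(files, n) for p in owners[f]]
    # Lines 9-15: complete with discovery tuples if files are missing.
    discovery = []
    if len(delivery) < n:
        for c in most_relevant(clusters, n - len(delivery)):
            for p in owners[c]:
                if (p, query) not in discovery:  # Discovery(q) is a set
                    discovery.append((p, query))
    return delivery, discovery
```

In this sketch, a file whose only owner leaves before the download can finish is ignored, and the query is then forwarded to the owners of the most relevant reachable cluster.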



5.5 Discussion

In this thesis, we have proposed to organize files in a file tree in order to facilitate the file

advertisement process. The file tree is formed in a bottom up fashion. First, files are

classified into clusters. The clusters are then repeatedly classified into other clusters until a

file tree with a required dimension is obtained. The dimension of the tree is computed

according to the number of representative mobility classes so that determination of the

content of advertisement is simplified.

We have proposed an algorithm, named k-means*, to classify files and clusters. This

algorithm is derived from the k-means classification algorithm [107,108]. The difference

between this algorithm and k-means is the selection of the group head, which is the

centroid of a cluster. The modification is needed due to the following weaknesses of k-

means.

• As group-heads are initialized by elements that may represent nothing, it may

happen in k-means that a cluster contains nothing [109].

• When the number of files/clusters is small, the initial grouping significantly determines the resulting clusters [110].

Let us consider the content-based classification approach. K-means takes group-heads from

the vector space randomly. As a result, a group head may be selected in such a way that all

files/clusters to be grouped are less similar to this group head than to the other group-

heads. In this case, a cluster represented by this group-head will contain nothing. Let us

consider the example displayed in Figure 5-8. Assume k-means selects v1, v2 and v3 as

group heads; some of the files are similar to v1 and the others to v2; none of the files is

more similar to v3 than to v1 or v2; thus, the cluster headed by v3 will contain nothing. The algorithm k-means* resolves this problem by initializing group-heads with elements that

represent files/clusters to be grouped. As a result, in k-means*, a cluster is initialized in

such a way that it will contain at least one element.



[Figure: the files f1 to f9 and the group heads v1, v2 and v3 plotted in the vector space. Legend: group head, file.]

Figure 5-8: A possible result of k-means classification

In k-means, the initial grouping may determine the content of the resulting clusters. For the example displayed in Figure 5-8, the cluster headed by v3 always remains empty regardless of the number of times that k-means is iterated. As a result, the classification may not have semantic meaning when the number of files/clusters is small, since the cluster-heads are initially selected randomly in k-means. In order to resolve this problem, our algorithm initializes group-heads with dissimilar elements.

5.6 Conclusion

In this chapter, we have discussed the organization of files in a file tree in such a way that the dimension of the file tree is computed as a function of the mobility classes that are considered during information sharing. The data sources can determine the content of advertisements from the file tree according to the mobility class of the MANET view. We have also demonstrated the application of a file tree in the file discovery process.


Chapter 6 Implementation and Evaluation

In the previous three chapters, we have proposed and discussed methods used to conduct

information discovery in MANETs according to the interests and the stay-times of users.

Based on these approaches, we propose a self-adaptive information sharing middleware

called SAMi. SAMi is designed to fulfill the requirements described in chapter one. In this

chapter, we present the design, the implementation and the evaluation of this middleware.

The chapter has been designed according to our research work presented in the previous

chapters and those presented in the International Conference on Wireless Applications and

Computing [111,112] and in the fourth IEEE International Conference on Pervasive

Services (ICPS’07)[113].

The chapter is organized as follows. The design of the middleware is presented in section

6.1. Section 6.2 discusses the implementation of SAMi in simulated and real environments.

The evaluation of SAMi is covered in section 6.3. Section 6.4 discusses the challenges

encountered during the design and the implementation of the middleware. Finally, we

conclude the chapter in section 6.5.


6.1 SAMi: a Self-Adaptive Middleware

In this thesis, we propose a self-adaptive middleware called SAMi that works according

to the following requirements specified in chapter 1.

• Pervasiveness: nomadic users should be allowed to share information anywhere,

anytime and by using any device.

• Mobility-awareness: the advertisement policy should be determined according to the

dynamicity of the environment, which is described by the mobility patterns of users.

• Interest-awareness: sharable files should be selected according to the users’ interests

to receive information and the users’ interests to provide information should be

considered during query resolution.

• High-level semantics: sharable files should be advertised at high level according to

their similarities.

• Context-aware content delivery: file delivery should be performed according to the

context of users and their environments.

• Social awareness: sharable files should be selected according to the social networks

of the users.

• Data dissemination: advertisements and queries should be disseminated according to

the users’ interests.

SAMi is a pure peer-to-peer middleware. Every device participating in information

sharing is required to install SAMi. However, thin devices can be helped by more powerful devices to perform complex operations.

Figure 6-1 displays the architecture of SAMi. The main input of the middleware is a

query. Personal information including the basic information (age, name, address, etc), the

agendas, the habits, the states (e.g., busy) and the interests of users can be accepted as

inputs.



[Figure: architecture of SAMi. Inputs: query, agenda, habit, state, interest and user basic data. Modules: Context Manager, Advertisement Management and File Manager (File Discovery, File Delivery and File Adaptation). Data stores: advertisement data-store, local repository, personal data-store, rule base and MANET-View data store.]

Figure 6-1: Architecture of SAMi

SAMi stores the data that allows information sharing to be performed efficiently in four data repositories, namely the local repository, the advertisement data-store, the MANET view data-store and the rule base. A device can contain zero or more data-stores.

Local repository and advertisement data-store contain descriptions of sharable files in

the local machine and in the vicinity respectively. In addition to the descriptions of

sharable files, the advertisement data-store contains platform and service advertisements.

The MANET view data store contains historical information about sharing activities. It

contains the sharing statistics and the mobility classes discussed in chapter 4. It also

contains the queries received in the history as well as the information demands and the

information provisions of users.

Rule base contains the association rules that are used to associate statistically the users’

context to their interests and to the mobility classes.

The middleware is composed of three modules; namely, context manager, advertisement

manager and file manager. A device can contain one or more modules. Every device that

participates in information sharing is required to have the file manager module.



Context manager determines the MANET-views’ mobility classes and the users’

interests from their contexts by using association rules. It also determines the users’

information needs by analyzing their agenda, habits and historical queries.

File manager, the core of SAMi, carries out file management functionalities, which

include searching, delivering and classifying files. The file discovery, the file delivery and

the file adaptation modules perform information sharing activities. The file discovery

module is responsible for searching for information sources; the file delivery module is

used to download files; finally, file adaptation is used to help the file delivery module to

fetch the file according to the context of users and their profiles.

Advertisement Manager is responsible for making other peers aware of the sharable files

stored in the device of a data source. It determines the content and the distribution of

advertisements according to the mobility class of the MANET view and the interests of the

peers participating in the MANET.

6.1.1 Design goals

The design goals of the SAMi middleware are listed below.

Flexibility: The system should be easy for anyone to use with minimum effort. It should also be adaptable to the capacity of mobile devices. This flexibility is achieved by making the middleware use the interfaces of other well-known messengers such as Yahoo Messenger. It should also provide its own user interface when it is not possible to use such messengers.

Discovery Optimization: The system should decrease the time needed to search a file.

The search time can be decreased by optimizing the usage of the push type of information

discovery approach.

Fairness: All peers in the network should equally profit from the information exchange.

This can be performed by fixing a quota on the volume of information to be advertised.



Automatic computing: The interests of users and mobility classes of MANET views

should be computed automatically as much as possible. Moreover, to facilitate information

sharing, association rules should be produced for estimating the mentioned profile

information.

Scalability: the middleware should work regardless of the number of users and the

number of sharable files in a MANET.

6.1.2 User Profile

A user profile is the representation of a user in the virtual world. It describes persistent and

context dependent information about a user. Persistent personal information includes age,

birthday and sex. Context dependent personal information includes habit, preference,

agendas and so on.

The agenda and the habit of a user are used to determine the activities of the user. A habit

indicates repetitive activities of a user during a certain context. For example, while

travelling in a bus, a user may have the habit of reading news and listening to music. An

agenda describes the planned activities of a user. In an agenda, a user can specify the

documents that she/he needs to accomplish the planned activities. Examples of user agendas

and user habits are given in Figure 6-2.

A preference of a user describes the format of the information that he/she is interested in.

For instance, a user may prefer audio data during driving. Preferences of users can vary

according to the spatial or the temporal context of users.

The user profile can describe the information demands and provisions of a user. As

discussed in the previous chapter, the information demand describes the interests of a user

to receive information and the information provision describes his/her interests to provide

information.



A user profile can indicate the social groups of a user. A user can be a member of

different groups with respect to professional activities, social relationships and hobbies as

well as their information sharing habits.

User Agenda

Start   | End     | Event   | Activity                         | Required Documents
10 A.M. | 12 P.M. | meeting | Strategic plan preparation       | Strategic plan preparation
12 P.M. | 1 P.M.  | lunch   | -                                |
1 P.M.  | 3 P.M.  | meeting | Discussing with business persons | Efficient way of chairing a meeting; How to deal with business persons

User Habit

Activity            | When     | Time needed | Habit
Shopping            | Week end | 30 minutes  | -
Journey in train    | Friday   | 2 hours     | Listening music
Talking with friend | Night    | 10 minutes  | Exchange jokes

Figure 6-2: Examples of user agenda and habits

6.1.3 Context Management Module

Dey [114] defines context as “any information that can be used to characterize the

situation of an entity. An entity is a user, a place, or a physical or computational object

that is considered relevant to the interaction between a user and an application, including

the user and application themselves.” According to Winograd [115], “something is context

because of the way it is used in interpretation, not due to its inherent properties. The

voltage on the power lines is a context if there is some action by the user and/or computer

whose interpretation is dependant on it, but otherwise is just part of the environment.”

From the above two definitions, Dejene Ejigu [116], a former Ph.D. student in our research

team, describes context as “an operational term whose definition depends on the intention

of the operations involved on an entity at a particular time and space rather than the

inherent characteristics of the entities and the operations themselves”.



The concept “sharing context”, defined in chapter 3, is based on this definition of “context”.

A sharing context describes the situation where the user is willing to provide files to others

in the vicinity. There are two types of sharing context: abstract and actual. An abstract

sharing context is a sharing context which is manually specified by a user to describe when

and where he allows others to download files from his machine. An actual sharing context

is derived from an abstract sharing context by considering the actual time and place in

which data were shared.

In the scenario presented in chapter 1, Pascal has a habit of sharing information in a class

room. Assume that he specifies (“Class-Room”, ø) as an abstract sharing context, where ø

denotes any time. Pascal is interconnected with other students via a MANET in Room 331

where a course is going on. According to the course schedule, the course will be conducted

from 9 AM to 10 AM. Therefore, (Room-331, [9 AM, 10 AM]) is the actual context

derived from the abstract context (“Class-Room”, ø).
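The derivation in this example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the way SAMi implements it: the `SharingContext` type, the `PLACE_CATEGORY` lookup table and the `derive_actual` function are hypothetical names, and the mapping of Room-331 to the category Class-Room is assumed for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical lookup: concrete place -> abstract place category (assumed).
PLACE_CATEGORY = {"Room-331": "Class-Room"}


@dataclass(frozen=True)
class SharingContext:
    place: str
    # None plays the role of ø (any time); otherwise (start_hour, end_hour).
    time: Optional[Tuple[int, int]]


def derive_actual(abstract: SharingContext, place: str,
                  interval: Tuple[int, int]) -> Optional[SharingContext]:
    """Return the actual context if it falls under the abstract one, else None."""
    if PLACE_CATEGORY.get(place) != abstract.place:
        return None
    if abstract.time is not None:
        start, end = abstract.time
        if interval[0] < start or interval[1] > end:
            return None
    return SharingContext(place, interval)


abstract = SharingContext("Class-Room", None)           # ("Class-Room", ø)
actual = derive_actual(abstract, "Room-331", (9, 10))   # (Room-331, [9 AM, 10 AM])
print(actual)
```

A place that does not belong to the abstract category, or a time interval outside the one allowed by the abstract context, yields no actual sharing context.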

As displayed in Figure 6-3, the context manager module uses the RAID-Action Engine

proposed by Dejene Ejigu [116] to interpret sharing contexts. The engine uses the HCom

model, also proposed by Dejene Ejigu [116], to manage the context semantics and the context

data. The RAID-engine uses the Jena reasoner [117] to produce actions, which are used to

identify the context dependent personal profile, the mobility classes of MANET views and

the information needs of the user.

SAMi identifies the mobility class of a MANET view as follows. The RAID-Action Engine accepts as input the actual context of a data source peer and identifies the abstract context that matches it. Let us refer to the actual context as cA and to the data source peer as p. The engine then determines the set of mobility classes defined by the data source peer for the abstract sharing context; let us refer to this set as M(p,cA). A mobility class is selected from M(p,cA) by using the association rules and the actual sharing context of the data source.



[Figure 6-3 diagram. Context Manager components: RAID-Action Engine, Rule mining. Input: User agenda, User habit, Actual sharing context. Output: Context dependent personal profile, Mobility class, Information needs. Data stores: Personal data-store, MANET View data store, Rule base.]

Figure 6-3: Context management in SAMi

Indeed, as discussed in section 4.4.2, mobility classes are determined by using rules stored in the rule base data-store. For instance, the rule <context = (Restaurant, ∅)> → <mobility class = m3> indicates that the MANETs observed in a restaurant at any time (see footnote 15) are described by the mobility class m3.

SAMi also uses the RAID-Action Engine to identify the interests of the user by analyzing

association rules. The association rules associate the contexts of users with their interests.

They can also be used to associate social networks of users with their interests.

As discussed in chapter 3, habit rules are used to determine the information provisions of users. The following rule may be used to determine the information provision of a user in a bus.

<context = (Bus, ∅)> → <information provision = {({Football}, 0.5), ({news}, 0.50)}>

Footnote 15: ∅ is used to represent any time.

Different social groups present in the MANET view can be used to determine the information provision of a user. The following rules may be used to determine the information provisions of users in these groups.

<Group = colleagues> → <information provision = {({research}, 0.7), ({news}, 0.3)}>

<Group = friends> → <information provision = {({news}, 0.9), ({music}, 0.10)}>
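The rules above can be read as lookups from a context or a group to a weighted set of interests. The following sketch illustrates that reading only; the dictionary layout, the function names and the fallback to an empty provision are assumptions made for the example, and each interest is reduced to a single keyword for brevity.

```python
from typing import Dict, Optional, Tuple

ANY_TIME = None  # stands for ∅ (any time)

# Rule bases copied from the example rules; the dict layout is an assumption.
CONTEXT_RULES: Dict[Tuple[str, Optional[str]], Dict[str, float]] = {
    ("Bus", ANY_TIME): {"football": 0.5, "news": 0.5},
}
GROUP_RULES: Dict[str, Dict[str, float]] = {
    "colleagues": {"research": 0.7, "news": 0.3},
    "friends": {"news": 0.9, "music": 0.1},
}


def provision_for_context(place: str, time: Optional[str]) -> Dict[str, float]:
    """Prefer a rule for the exact time, then fall back to the any-time rule."""
    for key in ((place, time), (place, ANY_TIME)):
        if key in CONTEXT_RULES:
            return CONTEXT_RULES[key]
    return {}  # assumed default: empty provision when no rule applies


def provision_for_group(group: str) -> Dict[str, float]:
    """Return the information provision associated with a social group."""
    return GROUP_RULES.get(group, {})
```

For instance, `provision_for_context("Bus", "8 AM")` falls back to the any-time bus rule, while `provision_for_group("friends")` gives news a weight of 0.9.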

The RAID-Action Engine is also used to identify the information need of users from their

context dependent personal profile. As described in the previous section, the context

dependent personal profile contains the user agenda, habits, preferences and interests.

The information needs of a user can be determined from his agenda and habits. Assume that Pascal has the habit of listening to music during a long journey, with a preference for Whitney Houston’s songs. SAMi starts searching for these songs when he plans to travel to another country.

A user can describe the information that he needs to perform the activities in the agenda; for example, he can specify that the documents about “How to deal with business persons” are needed for a meeting with businesspersons. The information needs of a user are also determined by using his interests. For instance, sport news should be searched for a user interested in such news.

To sum up, the context manager module determines the profile of a MANET and of the

users participating in the network as well as their information needs. The module works by

using habit rules. Habit rules with respect to mobility classes and user interests are

produced by the rule-mining component of the context manager module. This component

works as described in chapters 3 and 4.

6.1.4 Information Sharing in SAMi



In SAMi, information sharing is performed in two phases: information discovery and information delivery. The information discovery phase is used to discover files, while the information delivery phase is used to fetch the selected file.

The information discovery phase is guided by an advertisement policy. As discussed in

chapter 3, an advertisement policy is used to determine the volume of advertisement, its

radius and its frequency.

An advertisement policy can be determined from the mobility class that characterizes the

MANET view. Mobility classes are identified from the stay-time of the users and the

association rules. If it is impossible to determine the mobility class, an inactive mobility

class is used as the default class.

The information demands of users are used to determine the files to be advertised. The

information demands of users are identified by habit rules. If it is not possible to determine

the information demand of a user, an empty sharing interest is taken as the default demand.

The advertisement manager module of SAMi advertises descriptions of files (metadata of

files and clusters) to other peers in the vicinity by using the algorithm proposed in section

5.3.2. A data requester can identify the files that he is looking for from the advertisements.

A data requester peer can also discover files by distributing queries as discussed in chapter

3.

After the information discovery phase is completed, the information delivery phase

starts. The purpose of this phase is to select one or more information sources to deliver a

file. Information delivery is performed as follows:

• SAMi identifies information sources that can deliver the whole file. Such a source is identified by comparing the file size with the time that the source and the requester peer stay together (see footnote 16). If there are several such sources, SAMi selects one according to how far away it is and how long it will stay around.

• If the search in the above step is not successful, SAMi searches for a combination of peers (p1, p2, …, pk) such that each information source pi delivers a portion sfi of the file, and merging (sf1, sf2, …, sfk) gives the required file.

Footnote 16: Remember that the stay-time of two peers is computed by taking the intermediate peers into consideration.
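The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the `Source` type, the hop count used as the distance measure, the constant bandwidth per source, the ranking "nearest first, then longest stay-time" and the greedy split over several sources are all hypothetical choices, not the selection procedure of SAMi itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Source:
    name: str
    hops: int          # distance to the requester (assumed metric)
    stay_time: float   # seconds the source and the requester stay connected
    bandwidth: float   # bytes per second (assumed constant)

    def capacity(self) -> float:
        """Bytes this source can deliver before it leaves."""
        return self.stay_time * self.bandwidth


def plan_delivery(file_size: float, sources: List[Source]
                  ) -> Optional[List[Tuple[str, float]]]:
    """Return [(source name, bytes to fetch)], or None if delivery is impossible."""
    # Step 1: sources able to deliver the whole file on their own.
    whole = [s for s in sources if s.capacity() >= file_size]
    if whole:
        best = min(whole, key=lambda s: (s.hops, -s.stay_time))
        return [(best.name, file_size)]
    # Step 2: combine several sources, largest portions first (greedy).
    plan, remaining = [], file_size
    for s in sorted(sources, key=Source.capacity, reverse=True):
        if remaining <= 0:
            break
        portion = min(s.capacity(), remaining)
        if portion > 0:
            plan.append((s.name, portion))
            remaining -= portion
    return plan if remaining <= 0 else None
```

With two sources able to deliver 1000 and 3000 bytes, a 3500-byte file is split into portions of 3000 and 500 bytes, and a 5000-byte file cannot be delivered at all.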

Faults can occur in the information discovery and the information delivery phases. The

requester peer estimates the maximum time required to discover files for a query. The

information discovery for the query fails if the requester peer does not get the required

number of responses for the distributed discovery messages in the estimated time. In that

case, discovery messages are sent to other peers.

When a peer requests an information source to deliver a file or a portion of it, SAMi

estimates the maximum time needed to get the required file/portion. If the peer does not get

the information within that estimated time, the middleware concludes that the delivery of

the file or the portion of the file has failed. In this case, it checks whether there are other

information sources to deliver the required file or the portion of the file.

In the above paragraphs, we have discussed the normal information delivery process,

i.e., when the following conditions hold true:

1. The context of the requester peer matches the format of a file that he is looking for.

2. The peers owning the required file will stay connected with the requester peer until

the delivery process is completed.

In reality, these conditions may not always hold true. Therefore, we apply offline

delivery and adaptation of content in the case that these conditions are not satisfied.

In pervasive computing environments, it is possible that a user discovers an important

file but he is not able to download the file. Offline information delivery can be performed

if a requester peer does not need the information right away. The delivery can be

performed offline, for example by using email, in the following two cases:

• the source and the requester peers know each other

• the requester peer cannot wait until the download is completed, but the source peer can deliver the file to another peer such that this peer and the requester peer know each other

In the first case, the source and the target exchange the file later. In the second case, the

source delivers the file to an intermediate peer in a MANET and this peer will deliver the

file to the requester peer via offline delivery.

Adaptation of content is a solution when the format of the information does not match

the context of the user. Assume that Pascal is in the airport waiting for a delayed flight and

looks for some jokes to pass the time. He finds jokes in a text format but he cannot read

the text since he is taking care of his baby. Here, the middleware has to convert the text to

audio.

SAMi uses the ConAMi system proposed by Yaser Fawaz, a PhD student in our research

team, to perform the content adaptation process [112,113,118]. ConAMi determines the

adaptation process by comparing the format of the localized file with the format that fits

the context of the user and his environment. In ConAMi, the adaptation process is divided

into simple adaptation tasks in such a way that each of these tasks can be performed by a

single service. Figure 6-4 shows an example of an adaptation process.

[Figure 6-4 diagram: an adaptation process composed of the tasks TextToAudioConversionTask, TextSummarizationTask and TextTranslationTask]

Figure 6-4: Example of adaptation process

ConAMi performs content adaptation as follows. It identifies the tasks involved in the adaptation process. It then searches for services that execute these tasks. Next, it constructs the content adaptation tree, which shows the best service composition plans. The services in the optimal path of the tree are used to execute the required content adaptation. More precisely, content adaptation in SAMi is implemented as in Figure 6-5.

[Figure 6-5 diagram. Inputs of the file adaptation module: file to be adapted, context dependent personal profile, adaptation ontology, environmental context, adaptation rules, and the adaptation services available in the MANET. Output: adapted file.]

Figure 6-5: Implementation of ConAMi by the file adaptation module

The file adaptation module accepts as inputs: the file to be adapted, the environmental

context (e.g., screen dimensions, memory size, CPU speed, darkness, noisiness,

speechlessness and bandwidth), the context dependent personal profile (e.g., user’s

preferences), the adaptation rules, and the adaptation ontology. The adaptation ontology is

used to describe the entities involved in the adaptation process such as device, user,

network, location, adaptation service and data. The adaptation ontology is built based on

the EHRAM model proposed by Dejene Ejigu [116]. Adaptation rules are a set of

predefined rules identified by Yaser Fawaz [116]. They are used to determine the tasks

involved in the adaptation process according to the user’s preferences, the device

capabilities, and the network bandwidth [118].

The file manager module manages, analyzes and performs reasoning on the input data in order to determine the tasks needed to perform content adaptation on behalf of the requester peer. The identified tasks are then used to select the adaptation services that perform the content adaptation process.

6.1.5 Deployment

Some of SAMi’s functionalities (e.g., classification of information and rule mining)

are too expensive to be performed in a MANET. Fortunately, it is enough to perform these

activities occasionally by using heavyweight devices. Therefore, we decompose the

middleware into two important components: SAMi-basic and SAMi-ext. SAMi-basic

performs the basic functionalities of the middleware that are needed to perform a file

sharing activity in a MANET. SAMi-ext performs functionalities that are expensive but are

not required to be performed frequently. An example deployment of SAMi is displayed in

Figure 6-6.

As displayed in Figure 6-6, SAMi-basic is deployed in each mobile device. There are

different approaches to deploy SAMi-ext. It can be deployed in servers, which are

accessible from the Internet. This solution, however, will make the middleware inflexible

and highly dependent on the Internet. To resolve this problem, we propose that a user

installs the SAMi-ext component on a PC equipped with a wireless network interface, so that

he/she accesses the SAMi-ext installed on his/her personal PC at home and the one installed

on remote servers in other places.

In order to facilitate the performance of SAMi-ext, this component can use other services

to perform advanced functionalities. Rule mining and classification of files are

implemented as services.



[Figure 6-6 diagram. Labels: MANET, Internet; users Pascal, David and Anne; SAMi-basic and SAMi-ext instances; services Social Networking, Rule mining and File Classification.]

Figure 6-6: SAMi deployment

As displayed in Figure 6-7, SAMi-basic is composed of four sub-systems: SAMi-core,

SAMi-GUI, SAMi-thin and SAMi-adaptor.

• SAMi-core is used to perform the fundamental information sharing activities (file

advertisement, file discovery and file delivery).

• SAMi-GUI provides a graphical user interface to accept inputs into the

middleware and to display its outputs.

• SAMi-adaptor enables SAMi to work with well-known messengers like Yahoo Messenger and Google Talk.

• SAMi-thin allows thin devices with scarce resources to use SAMi. This component consists of the functionalities used to discover files (by querying the neighborhood) and to deliver files.

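[Figure 6-7 diagram. Components: SAMi-ext and SAMi-basic; within SAMi-basic, SAMi-core, SAMi-GUI, SAMi-thin and SAMi-adaptor, connected by associations labelled with multiplicity 1.]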

Figure 6-7: Component diagram of SAMi

6.1.6 Core Implementation Classes of SAMi

In this section, we discuss the main classes of SAMi-core and SAMi-ext. The remaining

classes and other components are presented in the annex. As shown in Figure 6-8, SAMi-core

is composed of the classes: Context-Manager, Adv-Manager, Info-Manager, Rule-

Manipulator and Env-Behavior. Context-Manager is used to capture the user and the

environmental contexts as well as to determine the profile of the user and his/her

neighbors. Adv-Manager prepares advertisement messages about files and platform (i.e.,

descriptions of devices and adaptation services). Info-Manager is used to fetch and provide

information from and to the devices in the vicinity. Rule-Manipulator manages the rules

that are used to determine the users’ interests and the MANET-views’ mobility classes.

Env-Behavior is used to compute the possible mobility classes that a peer uses during

information sharing.



Figure 6-8: Core classes and their relationships

The important classes of the SAMi-ext component are displayed in Figure 6-9 and Figure

6-10. Association rules, which we have named habit rules, are extracted by analyzing

historical data.

The class diagram displayed in Figure 6-9 shows the classes that are involved in

analyzing historical data. Historical data are stored in the form of sharing-statistics (the

data structure defined in chapter 4) and sharing-histories, which is a structure that contains

the queries, the advertisements and the information demand and information provision of a

user in a given context. Mobility-Manager computes mobility-classes (chapter 3) from

sharing statistics (managed by the Sharing-Statistics class). Habit rules with respect to

mobility classes are identified by analyzing the same data. Interest-Manager extracts the

user’s sharing-interest, which can be either an information demand or an information

provision, by using sharing-histories managed by the sharing-history class. The mining of

rules with respect to sharing interest is done by using the same historical data.



Figure 6-9: Class diagram to manage historical data

As shown in Figure 6-10, the hierarchical classification algorithm implements the

Classification-Manager interface. In this thesis, we use k-means* repeatedly in order to

get a file tree of a required dimension (the height of the tree and the number of clusters at

each depth). The Hierarchical-k-mean* class performs this classification as discussed in

chapter 5.

Figure 6-10: Classes for information classification

6.2 Implementation

6.2.1 SAMi over a Simulated Environment

We have developed a test-bed, depicted in Figure 6-11, to simulate a MANET and to implement the proposed middleware. This test-bed has been designed to output the rate of file delivery of the SAMi middleware. The rate of file delivery expresses the number of delivered files with respect to the number of files that have been requested.

[Figure 6-11 diagram. Inputs: Simulation Parameters, Files, File-Requests, Resources, Mobility Model, Network Characteristics. Test bed modules: Application Linker, Connectivity Manager, Mobility Manager, Info-Agent, Messenger, Mobile Node. Output: rate of delivery.]

Figure 6-11: A Test bed to simulate a MANET

The test-bed accepts the following inputs: Resources, Files, File-Requests, Simulation-

Parameters, Mobility Model and Network Characteristics.

Files and File-Requests are the most important inputs of the middleware. A file is

represented by its file-size (size of the file), file-ID (a number identifying the file uniquely)

and metadata-size (size of the file’s metadata). The users’ information-needs are

represented by file-requests. A file request contains the request time, the ID of the

requested file and the size of the request-message.
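The two inputs described above, together with the rate of file delivery output by the test-bed, can be pictured with the following sketch. The class and field names, the units and the `delivery_rate` helper are assumptions made for illustration; in particular, a request is counted as satisfied here simply when the ID of its file appears in the set of delivered files.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class File:
    file_id: int        # number identifying the file uniquely
    file_size: int      # size of the file in bytes (unit assumed)
    metadata_size: int  # size of the file's metadata in bytes (unit assumed)


@dataclass(frozen=True)
class FileRequest:
    request_time: float  # simulation time of the request, in seconds
    file_id: int         # ID of the requested file
    message_size: int    # size of the request message in bytes (unit assumed)


def delivery_rate(requests: Iterable[FileRequest], delivered_ids: Set[int]) -> float:
    """Number of delivered files divided by the number of requested files."""
    reqs = list(requests)
    if not reqs:
        return 0.0
    delivered = sum(1 for r in reqs if r.file_id in delivered_ids)
    return delivered / len(reqs)
```

For example, four requests of which two are satisfied give a rate of delivery of 0.5.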

The Resources parameter presently allows specifying only the memory capacities of the

devices involved in the MANET. However, the test bed is open to include other important

resources like the CPU power.



Simulation Parameters consist of the area coverage of the MANET and the duration of

the simulation. Mobility Model is the description of the mobility patterns of peers.

Currently, the test-bed implements a random-way-point distribution model to determine

the distribution of the peers and their movement patterns; thus, it only accepts the

maximum and the minimum values for the speed and the pause time of the peers.

Nevertheless, the test-bed can be easily modified to consider other mobility models.
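As an illustration of the random waypoint model, the sketch below generates the successive waypoints of one peer from the minimum and maximum speed and pause time. It is a generic textbook version of the model, not the code of the test-bed; the function name, the square area and the use of a seeded `random.Random` are assumptions.

```python
import random
from typing import Iterator, Tuple


def random_waypoint(area: float, v_min: float, v_max: float,
                    p_min: float, p_max: float, duration: float,
                    rng: random.Random) -> Iterator[Tuple[float, float, float]]:
    """Yield (arrival time, x, y) waypoints of one peer in an area x area square."""
    t = 0.0
    x, y = rng.uniform(0, area), rng.uniform(0, area)
    yield (t, x, y)
    while True:
        # Pick the next destination and a speed (v_min must be positive).
        nx, ny = rng.uniform(0, area), rng.uniform(0, area)
        speed = rng.uniform(v_min, v_max)
        travel = ((nx - x) ** 2 + (ny - y) ** 2) ** 0.5 / speed
        pause = rng.uniform(p_min, p_max)
        t += travel
        if t > duration:
            return
        x, y = nx, ny
        yield (t, x, y)
        t += pause  # the peer rests before moving again


# Example with the area and speed range of Table 6-1 and a short duration.
trace = list(random_waypoint(33.0, 1.0, 45.0, 0.0, 5.0, 60.0, random.Random(0)))
```

Each peer of the simulated MANET would get its own trace; the pause-time bounds are what distinguish the environments considered in the evaluation.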

Network Characteristics represent the characteristics of the network technology used to

connect peers in a MANET. The test-bed assumes peers are equipped with the same

network technology. Thus, it accepts the bandwidth and the line of sight of the network

technology.

The test bed is composed of 6 modules: Info-Agent, Mobile Node, Messenger,

Connectivity Manager, Mobility Manager and Application Linker. Info-Agent and Mobile

node modules implement the SAMi middleware and simulate a mobile peer respectively.

The other four modules are used to simulate a MANET. Messenger is responsible for

message transactions. Connectivity Manager is in charge of checking the connectivity

between two peers according to the characteristics of the communication technology.

Mobility Manager is responsible for changing the location context of a peer according to the

mobility model. Application Linker is responsible for invoking Info-Agent when

internal/external events occur.
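A connectivity check of the kind performed by Connectivity Manager can be sketched from the two network characteristics accepted by the test-bed. The functions below are illustrative assumptions: direct connectivity is reduced to a distance test against the line of sight, and the transfer time ignores protocol overhead.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def connected(a: Point, b: Point, line_of_sight: float) -> bool:
    """Two peers are directly connected when they are within radio range."""
    return math.dist(a, b) <= line_of_sight


def transfer_time(size_bytes: int, bandwidth_bps: float) -> float:
    """Seconds needed to send size_bytes over a link of bandwidth_bps bits/s."""
    return size_bytes * 8 / bandwidth_bps
```

With a line of sight of 10 meters, two peers 5 meters apart are connected, and a 125000-byte file takes one second over a 1 Mbps link.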

6.2.2 Application of SAMi in Photo Sharing and Annotation

A prototype has been developed to illustrate the application of the SAMi middleware in

photo sharing and annotation in MANETs. Java and J2ME were used to implement SAMi

on a standard PC and a mobile phone respectively.



The prototype was deployed on top of the Sun Wireless Toolkit (an emulation environment), Sony Ericsson w880i and w910i mobile phones, a laptop with a 2 GHz processor and a desktop computer with a 3 GHz processor.

A simple data structure has been used to represent metadata in the local repository of the

mobile phones to allow fast processing and to avoid extra memory usage. Figure 6-12

displays an example representation of metadata in the local repository of a mobile phone.

Thus, the photo described in Figure 6-12 has the ID F0083blueD333, is described by the

keywords campus and Pascal, and was taken at the place called Part-Dieu on 13/04/2010

at 15:58:29 (stored as 20100413155829).

F0083blueD333|campus-Pascal| Part Dieu | 20100413155829

photo

Figure 6-12: Examples of representation of metadata in local repository
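A record of this form is cheap to parse. The sketch below shows one possible reading of the layout in Figure 6-12; the field names, the use of "-" as the keyword separator and the `YYYYMMDDhhmmss` interpretation of the last field are inferred from the example, not taken from the prototype's code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class PhotoMetadata:
    file_id: str
    keywords: List[str]
    place: str
    taken_at: datetime


def parse_record(record: str) -> PhotoMetadata:
    """Parse 'ID|keyword-keyword|place|YYYYMMDDhhmmss' (layout inferred)."""
    file_id, keywords, place, stamp = (field.strip() for field in record.split("|"))
    return PhotoMetadata(
        file_id=file_id,
        keywords=keywords.split("-"),
        place=place,
        taken_at=datetime.strptime(stamp, "%Y%m%d%H%M%S"),
    )


meta = parse_record("F0083blueD333|campus-Pascal| Part Dieu | 20100413155829")
```

A flat delimited record like this avoids the memory and processing cost of a richer metadata format on the phone, at the price of reserving the separator characters.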

In the prototype, a user can exploit the system to browse photos in his/her phone and

other phones in the vicinity according to the directory structure or their organization in a

file tree as displayed in Figure 6-13 and Figure 6-14.



Figure 6-13: Browsing photos by their directory organization

The first message that a peer exchanges with a neighbor is a ‘hello message’ containing the interests of the peer to receive and provide information. When a peer joins a network, it searches for Bluetooth-enabled devices and sends a ‘hello message’ to them. The peers accepting the message try to discover the sender device and welcome the new neighbor by sending a ‘hello message’ in return. After exchanging ‘hello messages’, peers advertise their sharable files.

A user can download the advertised files and request detailed information about the

advertised clusters. A peer sends metadata of some of the files under the cluster for which

a detail request was received. The user can ask again more detailed information about the

cluster to obtain the metadata of other files in the cluster.



Figure 6-14: Browsing photos by their organization in a file-tree

If a user does not find the file that he/she is looking for, he/she can search the file by

using a query as displayed in Figure 6-15. The prototype applies a simple keyword-

matching algorithm to compare queries and files.
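A simple keyword-matching step of this kind might look as follows. The sketch is an assumption about what "simple keyword matching" means here: queries are split on whitespace, matching is case-insensitive, and files are ranked by the number of shared keywords; none of these details is specified for the prototype.

```python
from typing import Dict, List, Set


def match_query(query: str, files: Dict[str, Set[str]]) -> List[str]:
    """Return file IDs ranked by the number of query keywords they share."""
    terms = {t.lower() for t in query.split()}
    scored = []
    for file_id, keywords in files.items():
        common = terms & {k.lower() for k in keywords}
        if common:
            scored.append((len(common), file_id))
    # Most shared keywords first; ties broken by file ID for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [file_id for _, file_id in scored]
```

Files that share no keyword with the query are left out of the result.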

Figure 6-15: Querying



As photo annotation is a very important activity in photo management, the prototype

allows users to annotate the photos on their devices while browsing. As shown in Figure

6-16, users can perform annotations in collaboration.

Figure 6-16: Collaboration during photo annotation

6.3 Evaluation

In this section, we present the experiments conducted to evaluate the SAMi middleware. A PC with a 3 GHz processor has been used to evaluate the data delivery of the middleware. Its main functionalities are tested on the same PC and on a Sony Ericsson w890i mobile phone.

6.3.1 Data Delivery Rate

The middleware was tested by fixing the simulation area (i.e., the area over which

devices involved in a MANET are distributed) and the simulation time (i.e., the length of

time that the MANET is valid).



We have assumed that the interests of users are unknown (i.e., information-demands and

information-provisions of users are empty sharing interests). Table 6-1 shows the input

parameters of the test-bed during the experimentation.

Table 6-1: The inputs of the test-bed

Parameter | Value
Simulation area | 33 meters by 33 meters
Simulation time (STime) | 10800 seconds
Network bandwidth | 1 Mbps
Line of sight | 10 meters
Overall demand / Overall provision | {(Ø,1)}
Number of files | 10 * Number of peers
Number of replications for each file | Random(0, 8)
Request Time (RT) | Random(0, STime)
Deadline for file delivery, experiment one | Random(RT + 1200s, STime)
Deadline for file delivery, experiment two | Random(RT + 2400s, STime)
Number of file requests per peer | Random(0, number of files)
Metadata size | Random(1 KB, 4 KB)
Storage capacity of a peer | Random(256 MB, 160 GB)
Speed | Random(1 m/s, 45 m/s)

We have collected 557 abstracts of research papers in the domains of information

retrieval, multimedia, pervasive computing, GIS and e-learning. These abstracts are used

as sharable files that are distributed to the peers randomly. Two or more peers might own

identical files, i.e., the files are replicated. A peer, however, does not keep the files

received from other peers as sharable files. We set the network characteristics with

Bluetooth in mind. The minimum speed is set by assuming that the peers are walking, while

the maximum speed is set by assuming that the peers are using some means of transportation.

As described in Table 6-2, three types of mobile ad-hoc environments (mobility classes):

highly dynamic, dynamic and moderate were considered.

Table 6-2: Types of Environments

Pause time | Environment
[0,5) | Highly dynamic
[5,10) | Dynamic
[10,∞) | Moderate

We made two experiments for each environment by changing the number of peers from

10 to 110. Figure 6-17 and Figure 6-18 show the result of experiment one and two

respectively. The difference between the two experiments lies in the deadline of file

delivery.

Not all requested files could be delivered, for one of the following three reasons:

1. the requested file might be stored in a device which was far from the requester peer.

2. the information source might disappear before the file was completely delivered and

the information source and the requester peers didn’t meet again.

3. the deadline of the file delivery was reached before it was completely delivered.

As observed in Figure 6-17 and Figure 6-18, the rate of delivery, i.e., the number of files delivered out of the number of files requested, shows similar patterns regardless of the environmental changes. This balance between the rates of delivery is achieved because the middleware uses three or more data sources to deliver a file when a single peer cannot deliver the whole file. Moreover, different portions of a file can be delivered at different times. The changes of the advertisement period also play a role in balancing the rate of delivery: they allow peers to learn about the information around them while imposing minimal overhead on the bandwidth.

[Figure 6-17 plot: rate of delivery (y-axis) versus number of peers (x-axis, 10 to 110) for the highly dynamic, dynamic and moderate environments]

Figure 6-17: Deliverability of files for experiment one

[Figure 6-18 plot: rate of delivery (y-axis) versus number of peers (x-axis, 10 to 110) for the highly dynamic, dynamic and moderate environments]

Figure 6-18: Deliverability of files for experiment two



6.3.2 Interest Awareness

We used the descriptions of the photos in [119] as queries to test the algorithm described

in section 3.4. We have made evaluations to measure the execution time of the proposed

method to identify the users’ interests. Constants displayed in Table 6-3 were used during

the evaluation of the interest extraction algorithm presented in section 3.4.1. We have used

the algorithm to produce an information demand from the list of queries.

We identify the minimum weight of an interest using the following objective. A sharing-interest is designed to contain at most 5 interests so that the effort required to specify interests is minimized on a mobile phone. This indicates that 0.2 (1/5) is the minimum weight of an interest. For a nomadic user equipped with a mobile phone, it is not simple to specify several keywords in interests; thus, we limit the number of keywords in an interest to 5. We set the minimum similarity value indicating the similarity of interests to 0.4, which indicates that two interests are similar if they have at least two keywords in common. We set the minimum cosine similarity to 0.4, which is a little less than 0.5. Note that the cosine value 0 indicates that the interests are totally different and the cosine value 1 indicates that they are identical.
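The two thresholds can be made concrete with a small sketch. It assumes that an interest is a set of at most 5 keywords, that the interest similarity is the number of shared keywords divided by maxKeys (so that two shared keywords give 2/5 = 0.4), and that the cosine is computed on binary keyword vectors; these readings are assumptions for illustration, since the exact measures are defined in section 3.4.

```python
import math
from typing import Set

MAX_KEYS = 5   # maxKeys in Table 6-3
ACC_SIM = 0.4  # accSim in Table 6-3


def overlap_similarity(a: Set[str], b: Set[str]) -> float:
    """Shared keywords divided by maxKeys (assumed reading of the 0.4 threshold)."""
    return len(a & b) / MAX_KEYS


def cosine_similarity(a: Set[str], b: Set[str]) -> float:
    """Cosine of the binary keyword vectors: 0 = disjoint, 1 = identical."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))
```

Under these assumptions, two five-keyword interests that share exactly two keywords reach the similarity threshold of 0.4, identical interests have cosine 1, and disjoint interests have cosine 0.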

Figure 6-19 displays the performance of the algorithm versus the number of queries. Processing 540 queries takes less than one minute on a mobile phone and 32 milliseconds on a PC. Therefore, the performance of the algorithm is acceptable for both a PC and a mobile phone.

Table 6-3: Constants for the query extraction algorithm

Constant | Value
Minimum weight of an interest (minW) | 0.2
Maximum keywords in interests (maxKeys) | 5
Minimum interest similarity value (accSim) | 0.4
Minimum cosine similarity value (accV) | 0.5
Acceptable support | 4%
Acceptable confidence (minConf) | 80%



[Figure 6-19 plot: execution time in ms (y-axis) versus number of queries (x-axis, 40 to 520) for the PC and the mobile phone]

Figure 6-19: Performance of interest extraction algorithm

Rule mining is performed to produce rules that can be used to identify information

demand of a user. The historical information demands of a peer are generated by using the

data displayed in Table 6-4. Figure 6-20 shows the relationships between the volume of

sharing histories and the performance of the rule mining algorithm.

Table 6-4: Characteristics of information demands

Context | Interest | Description | Weight
(8 AM, Bus 27) | I01 | news | 30-40
(8 AM, Bus 27) | I02 | finance | 60-70
(8 AM, Bus 37) | I11 | research | 65-70
(8 AM, Bus 37) | I12 | joke | 30-35
(ø, Stadium) | I21 | football | 70-80
(ø, Stadium) | I22 | tennis | 20-30
(12 PM, INSA-Café) | I31 | Research | 65-70
(12 PM, INSA-Café) | I32 | place | 35-40
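The acceptable support and confidence used in the evaluation are thresholds on candidate rules. The sketch below shows the standard association-rule definitions of these two measures for a rule of the form context → interest; the flat (context, interest) representation of a sharing history and the function names are simplifying assumptions, and the actual mining algorithm is the one described in chapters 3 and 4.

```python
from typing import List, Tuple

History = Tuple[str, str]  # (context, interest) observed in one sharing history


def support_confidence(histories: List[History], context: str,
                       interest: str) -> Tuple[float, float]:
    """Support and confidence of the rule 'context -> interest'."""
    total = len(histories)
    with_context = sum(1 for c, _ in histories if c == context)
    both = sum(1 for c, i in histories if c == context and i == interest)
    support = both / total if total else 0.0
    confidence = both / with_context if with_context else 0.0
    return support, confidence


def accept(histories: List[History], context: str, interest: str,
           min_support: float = 0.04, min_conf: float = 0.80) -> bool:
    """Keep a rule only if it reaches both thresholds (4% and 80% here)."""
    support, confidence = support_confidence(histories, context, interest)
    return support >= min_support and confidence >= min_conf
```

With 20 histories, of which 9 pair the stadium with football and 1 pairs it with tennis, the rule Stadium → football has support 0.45 and confidence 0.9 and is kept, whereas Stadium → tennis reaches the support threshold but not the confidence threshold.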



Processing 180 sharing-histories takes around 20 seconds on the PC, while processing 120 historical information demands takes around 20 seconds on the mobile phone. As a result, the algorithm is acceptable for a PC. It also has an acceptable performance for a mobile phone as long as the number of sharing histories is less than 120.

The rule mining process becomes heavier for a mobile phone as the number of information demands increases. However, as rule mining is performed only occasionally, the load on a mobile phone is not excessive.

Assume that a user collects sharing-statistics 3 times a day. We need 60 days (two months) to produce 180 sharing-histories. In reality, it is very rare that a user would stay away from the Internet for 2 months. Therefore, heavyweight devices can perform rule mining on behalf of a mobile phone.

[Figure 6-20 plot: execution time in seconds (y-axis, 0 to 120) versus number of information demands (x-axis, 50 to 180) for the PC and the mobile phone]

Figure 6-20: Rules to identify information demand



6.3.3 Mobility Awareness

The rule-mining algorithm proposed in section 4.4.2 was performed to produce rules with respect to the mobility classes of MANET-views. Table 6-5 describes the data used to generate sharing-statistics. Table 6-6 lists the values of the constants of the algorithm used during the evaluation. The mobility-classes described in Table 6-7 are considered during the evaluation of the algorithm.

The result of the experiment is displayed in Figure 6-21. It takes around 26 seconds to process 200 sharing-statistics in a PC. Thus, the performance of the rule-mining algorithm is acceptable for a PC.

Table 6-5: Characteristics of sharing-statistics

Actual sharing context (Location, Time) | Range-lifetime (in minutes)
(Bus, 8 AM - 8:10 AM) | 3-4
(Restaurant, 12 AM - 12:30 AM) | 22-28
(Stadium, ∅) | 90-120
(Café, ∅) | 11-15

Table 6-6: Constants considered during rule mining evaluation

Recent-Time                            0 (all sharing-statistics are considered)
Minimum network referred by a rule     5
Acceptable support                     4%
Acceptable confidence                  80%
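To make the role of the constants in Table 6-6 concrete, the following is a minimal sketch of how support and confidence thresholds can filter candidate rules. It is a generic association-rule filter, not the algorithm of section 4.4.2: the function name `mine_rules`, the shape of the input (one `(context, mobility_class)` pair per observed network) and the reading of "minimum network referred by a rule" as a minimum number of supporting observations are all assumptions made for illustration.

```python
from collections import Counter

MIN_SUPPORT = 0.04      # "Acceptable support" from Table 6-6
MIN_CONFIDENCE = 0.80   # "Acceptable confidence" from Table 6-6
MIN_NETWORKS = 5        # "Minimum network referred by a rule" (assumed meaning)


def mine_rules(statistics):
    """Return {context: mobility_class} for rules passing all three thresholds.

    statistics: list of (context, mobility_class) pairs (hypothetical format).
    """
    total = len(statistics)
    by_context = Counter(context for context, _ in statistics)
    by_pair = Counter(statistics)
    rules = {}
    for (context, mobility_class), count in by_pair.items():
        support = count / total                  # share of all observations
        confidence = count / by_context[context]  # share within this context
        if (count >= MIN_NETWORKS
                and support >= MIN_SUPPORT
                and confidence >= MIN_CONFIDENCE):
            rules[context] = mobility_class
    return rules


# Invented sample history: 10 bus rides, 6 stadium visits, 4 cafe visits.
history = ([("Bus", 1)] * 9 + [("Bus", 2)] * 1
           + [("Stadium", 3)] * 6
           + [("Cafe", 1)] * 2 + [("Cafe", 2)] * 2)
rules = mine_rules(history)
print(rules)
```

With this invented history, only the "Bus" and "Stadium" contexts yield a rule; the "Cafe" observations are too few and too evenly split to pass the thresholds.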



Table 6-7: Range-lifetimes of mobility classes designed for sharing context (“”,∅)

Mobility classes Range-Lifetimes(in minutes)

1 [0,15)

2 [15,30)

3 [30,∞)
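As an illustration of Table 6-7, the sketch below maps a range-lifetime in minutes to its mobility class using the three half-open intervals listed in the table. The function and constant names are hypothetical; only the interval bounds come from the table.

```python
import math

# (class id, inclusive lower bound, exclusive upper bound), in minutes (Table 6-7)
MOBILITY_CLASSES = [(1, 0, 15), (2, 15, 30), (3, 30, math.inf)]


def mobility_class(range_lifetime_minutes):
    """Return the mobility class whose interval contains the range-lifetime."""
    for class_id, low, high in MOBILITY_CLASSES:
        if low <= range_lifetime_minutes < high:
            return class_id
    raise ValueError("range-lifetime must be a non-negative number of minutes")


print([mobility_class(m) for m in (3.5, 15, 25, 100)])
```

Because the intervals are half-open, a range-lifetime of exactly 15 minutes falls into class 2, not class 1.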

For 100 sharing-statistics, the algorithm took around 22 seconds on a mobile phone. On a
mobile phone, the performance of rule mining degrades as the volume of sharing-statistics
increases: it takes around 4.6 minutes to process 200 sharing-statistics. Assume that a
user is involved in information sharing in a MANET 3 times a day. It then takes around
67 days to accumulate 200 sharing-statistics. A mobile phone will contact powerful
devices several times in 67 days. As a result, it can be helped by powerful devices to
mine rules. Moreover, we perform rule mining in an incremental manner.
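The day counts quoted in this section follow from a simple division, shown here as a small worked example. The function name is hypothetical, and the fixed rate of 3 records per day is the assumption stated in the text.

```python
import math


def days_to_collect(target_records, records_per_day):
    """Whole days needed to accumulate target_records at a fixed daily rate."""
    return math.ceil(target_records / records_per_day)


print(days_to_collect(200, 3))  # 200 sharing-statistics, 3 per day
print(days_to_collect(180, 3))  # 180 sharing-histories, 3 per day
```

The two calls give 67 and 60 days, matching the figures used for sharing-statistics and sharing-histories respectively.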

Figure 6-21: Rules to identify mobility classes (x-axis: sharing-statistics, 50 to 200; y-axis: execution time in s, 0 to 300; series: PC, Mobile)



6.3.4 File Classification

We have made two types of experiments to evaluate our classification algorithm (section
5.3.1). In the first type of experimentation, the execution time of the algorithm is tested
versus the number of representative mobility classes (cf. section 5.3.2). The second type of
experimentation is used to evaluate the execution time of the algorithm versus the number
of files.

As discussed in section 5.3.2, representative mobility classes are those mobility classes
having significantly different advertisement volumes. These mobility classes are used to
determine the dimension of the file tree, i.e., its height and number of clusters at each depth
of the tree. The significance factor is used to determine the number of representative
mobility classes and hence, this factor affects the dimension of the file tree. Consequently,
evaluating the classification of files versus the representative classes is the same as
evaluating the classification of files versus the significance factor.

During the evaluation, we have produced mobility classes in such a way that all of the
mobility classes are also representative classes.

As discussed in section 5.3.1, the hierarchical classification algorithm proposed in this
thesis can be implemented by using the content of files or by using their metadata. The
content-based classification is performed by using the files’ vector representations. The
metadata-based classification is performed by using the files’ textual descriptions.

The content-based classification was evaluated by using 557 abstracts of research papers
collected in the domains of information retrieval, multimedia, pervasive computing, GIS
and e-learning. The algorithm, however, does not accept the raw files but their vector
descriptions. The metadata-based algorithm accepts metadata of photos collected and
prepared by the Department of Computer Science and Engineering of the University of
Washington [119]. The classification algorithms were tested using the parameters
described in Table 6-8. We have used 0.3 as the minimum lexical similarity value. The value
0.3 is a reasonable similarity value to classify files in the same group: assume that two files
are described by textual descriptions containing 10 keywords; the two files are similar if
they have at least 3 keywords in common. For content-wise similarity, 0.4 is a reasonable
cosine similarity value for classifying files. Note that a cosine value of 1 indicates that the
files are identical and 0 indicates that they are totally dissimilar.

Figure 6-22 and Figure 6-23 show the relationships of the execution times and the
significance factors for the content-based and the metadata-based algorithms. As displayed,
we can observe that the execution time of the algorithm does not depend much on the
significance factor.

Table 6-8: Parameters used by the classification algorithm in the first experimentation

Input                 Content based    Metadata based
Files                 557              200
File type             text             photo
minSim                0.4              0.3
maxIteration          100              30
File representation   vector           textual description

From the result of the experiments, we have observed that the content-based
classification algorithm has a good performance as compared to the metadata-based
classification algorithm. However, this is only true if the vectors are computed in advance.
In reality, the vectors of files depend on one another; thus, we need to re-compute the
vector space as new files are added to the system. Vector production is not a simple process, as
displayed in Figure 6-24. The vector space was produced with the help of a library called
Jama [120]. The Jama library does not have a version for mobile phones. Nevertheless, it is
easy to see that vector production would be expensive on mobile phones.
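The two similarity thresholds discussed in this section (0.3 for textual descriptions, 0.4 for vectors) can be illustrated with a short sketch. The cosine function is the standard definition. The lexical measure is an assumption: the text only says that two 10-keyword descriptions sharing 3 keywords reach 0.3, which is consistent with dividing the number of shared keywords by the size of the larger description, but the thesis may define it differently. All names below are hypothetical.

```python
import math


def keyword_overlap(a, b):
    """Shared keywords divided by the size of the larger description (assumed)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


def cosine(u, v):
    """Cosine of the angle between two equal-length term vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Two invented 10-keyword descriptions sharing exactly three keywords.
doc_a = [f"k{i}" for i in range(10)]       # k0 .. k9
doc_b = [f"k{i}" for i in range(7, 17)]    # k7 .. k16
print(keyword_overlap(doc_a, doc_b) >= 0.3)   # meets the lexical minSim
print(cosine([1, 0], [0, 1]))                 # orthogonal vectors: dissimilar
```

Under this assumed measure, the pair with three shared keywords just meets the 0.3 threshold, while a pair with only two shared keywords would fall below it.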



Figure 6-22: Content based classification performance (x-axis: significance factor, 1 to 15; y-axis: execution time in ms, 0 to 30000; series: PC, Mobile)

Figure 6-23: Metadata based classification performance (x-axis: significance factor, 2 to 16; y-axis: execution time in ms, 0 to 120000; series: PC, mobile phone)



Figure 6-24: Vector production in the PC (x-axis: number of files, 50 to 540; y-axis: execution time in ms, 0 to 45000)

In the second type of experimentation, we only considered the metadata-based
classification algorithm. As displayed in Table 6-9, the metadata classification was applied
to produce a file tree from the test data downloaded from the website [119]. The file tree
was produced for three mobility classes in which 2, 6 and 20 metadata can be advertised,
respectively. Thus, the height of the file tree is 3, with two clusters at depth 1, six
clusters at depth 2 and twenty clusters at depth 3.

Table 6-9: Inputs for classification algorithm for the second type of experimentation

Data                                       data in [119]
Height of the file tree                    3
Number of clusters at the first depth      2
Number of clusters at the second depth     6
Number of clusters at the third depth      20
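The relation between mobility classes and the file tree's dimension can be sketched as follows, assuming (as in this experiment) that each representative mobility class contributes one tree level whose cluster count equals its advertisement volume. The helper `depth_for_volume`, which picks the deepest level that still fits a given advertisement volume, is a further assumption added for illustration; both function names are hypothetical.

```python
from bisect import bisect_right


def file_tree_dimension(advertisement_volumes):
    """One tree level per mobility class, ordered by advertisement volume."""
    clusters_per_depth = sorted(advertisement_volumes)
    return len(clusters_per_depth), clusters_per_depth


def depth_for_volume(volume, clusters_per_depth):
    """Deepest level whose cluster count fits in the volume (0 if none fits)."""
    return bisect_right(clusters_per_depth, volume)


height, clusters = file_tree_dimension([20, 2, 6])
print(height, clusters)
print(depth_for_volume(7, clusters))
```

For the volumes 2, 6 and 20 this gives a tree of height 3 with 2, 6 and 20 clusters per depth, and an advertisement volume of 7 metadata would be served from depth 2.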



Figure 6-25 shows the execution time required to perform the metadata-based file
classification versus the number of photos to be classified. The algorithm performs well
on a PC. It also has an acceptable performance on a mobile phone as long as the number of
photos is less than 250. However, its performance on a mobile phone deteriorates as the
number of photos increases.

Figure 6-25: Performance of metadata based classification (x-axis: photos, 50 to 650; y-axis: execution time in s, 0 to 350; series: Mobile phone, PC)

The algorithm took around 5 minutes to process 650 photos on the mobile phone. In
reality, the number of photos on a mobile phone rarely exceeds 250; indeed, thin devices
like Sony Ericsson w910i mobile phones are not usually used to store more than 250 photos
(more than 500 megabytes). Furthermore, as photo classification is done only occasionally,
thin devices can be helped by more powerful devices to perform photo classification.

6.3.5 Advertisement Selection

The algorithm in section 5.4.1, which is used to filter the advertisements according to the

users’ interests, was tested by using the metadata of photos collected and prepared by the



Department of Computer Science and Engineering of the University of Washington [119].

Table 6-10 describes the data used to filter the advertisements.

The advertisement policy is designed to distribute 3 metadata for each of the first two
interests (7*0.4 = 2.8, rounded to 3) and 1 metadata for the remaining interest
(7*0.2 = 1.4, rounded to 1). We consider 0.3 as the minimum similarity value between an
interest and a file or cluster. Assume that the textual description of a file contains 7
keywords. A file then matches an interest if the interest and the file’s textual
description have at least three keywords in common.
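The allocation and matching rules just described can be sketched as below, using the overall demand of Table 6-10. Rounding each share to the nearest integer and measuring similarity as shared keywords divided by the length of the file's description are assumptions that reproduce the numbers in the text; they are not taken from the algorithm of section 5.4.1. Function names and the sample photo descriptions are hypothetical.

```python
def allocate_slots(demand, volume):
    """Advertisement slots per interest: volume * weight, rounded (assumed rule)."""
    return [round(volume * weight) for _, weight in demand]


def matches(interest, description, threshold=0.3):
    """True if shared keywords / description length reaches the threshold."""
    description = set(description)
    if not description:
        return False
    return len(set(interest) & description) / len(description) >= threshold


# Overall demand from Table 6-10: (interest keywords, weight).
demand = [
    ({"tree", "grass", "sky", "bench"}, 0.4),
    ({"flower", "tree", "bush", "sky", "car"}, 0.4),
    ({"trunk", "sidewalk", "rock", "sky"}, 0.2),
]
slots = allocate_slots(demand, 7)
print(slots)

# Invented 7-keyword photo descriptions.
photo_a = ["tree", "grass", "sky", "lake", "boat", "cloud", "people"]
photo_b = ["tree", "sky", "lake", "boat", "cloud", "people", "road"]
print(matches(demand[0][0], photo_a), matches(demand[0][0], photo_b))
```

Note that rounded shares do not sum to the advertisement volume in general; they happen to sum to 7 for the weights used here.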

Figure 6-26 illustrates the relationship between the execution time of the algorithm and
the number of sharable photos. For 200 sharable photos, the algorithm takes around 68 ms
on the mobile phone. Therefore, the execution time is acceptable.

Table 6-10: Test data used during filtering advertisements

Overall demand           {({tree, grass, sky, bench}, 0.4),
                          ({flower, tree, bush, sky, car}, 0.4),
                          ({trunk, sidewalk, rock, sky}, 0.2)}
Advertisement volume     7 metadata
Data                     data in [119]
File-tree’s dimension    Height 3; there are 2, 6, and 20 clusters at each level of the tree.
Similarity threshold     0.3



Figure 6-26: Advertisement content determination in a mobile phone (x-axis: photos, 50 to 200; y-axis: execution time in ms, 0 to 80)

6.4 Discussion

In this chapter, we have discussed the design and the implementation of the SAMi
middleware. Flexibility is an important design goal of SAMi. One way of achieving this
goal could be to integrate the middleware with well-established messengers such as
Google Talk and Yahoo. Even though most of these messengers permit this kind of
integration via a plug-in, to our knowledge no messenger works without accessing the
Internet. Therefore, we designed our own interface for the middleware.

We have tried to use third party simulation software like ns-2. However, the simulators

are designed to test routing protocols and it is difficult to use them to deploy an advanced

information sharing system like SAMi. As a result, we have developed our own simulator.

The main functionalities of the middleware are implemented in a real environment.
J2ME has been used as the programming language. As it is a high-level language,
determining the load of a device and the state of users was difficult. In J2ME, almost
every computation has to be implemented using elementary operations. Advanced
computations (e.g., string manipulation like splitting and merging) are not supported in
J2ME.



The current version of MIDP does not provide support for processing metadata expressed
in content description metadata models like Dublin Core [104] and MPEG-7 [105]. It does
not even provide an XML parser. As a result, simple strings are used to represent messages
and describe files.

We have used Sony Ericsson mobile phones w910i and w890i as computing devices

during the implementation of the middleware. In these phones and most ordinary phones,

database management systems do not exist. In addition, J2ME does not have a library to

access a database management system. As a result, the data stores are implemented by

using a system called the record management system17 provided by MIDP.
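The storage model described above (file descriptions kept as simple strings, stored as records that are plain byte arrays) can be sketched as follows. This is an in-memory analogy written in Python, not the MIDP record management system API; the class `ToyRecordStore`, the helper names and the `|` and `,` delimiters are all hypothetical.

```python
class ToyRecordStore:
    """In-memory stand-in for a record-based store: a list of byte arrays."""

    def __init__(self):
        self._records = []

    def add(self, data):
        """Append a record and return its position in the list."""
        self._records.append(bytes(data))
        return len(self._records) - 1

    def get(self, record_id):
        return self._records[record_id]


def encode_description(name, keywords):
    """Flatten a file description into one delimited string, then into bytes."""
    return (name + "|" + ",".join(keywords)).encode("utf-8")


def decode_description(record):
    """Inverse of encode_description."""
    name, _, keyword_part = record.decode("utf-8").partition("|")
    return name, keyword_part.split(",") if keyword_part else []


store = ToyRecordStore()
rid = store.add(encode_description("photo1.jpg", ["tree", "sky"]))
print(decode_description(store.get(rid)))
```

A delimited string of this kind avoids the need for an XML parser, at the cost of reserving the delimiter characters.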

The prototype works correctly on a Sun Wireless Toolkit emulator, on Sony Ericsson
w880i and w910i mobile phones, on a PC with 3 GHz processing speed and on a laptop with
2 GHz processing speed. We used the prototype to enable photo sharing involving 4 Sun
Wireless Toolkit emulators. We also used it to perform photo sharing between the laptop
and the desktop computer. However, we faced some difficulties with communications
involving mobile phones due to the instability of Bluetooth.

6.5 Conclusion

In this chapter, we have discussed a self-adaptive middleware named SAMi. The

middleware uses the approaches discussed in the previous three chapters to perform

information discovery according to the profile of the MANET and the peers participating

in the network. In the middleware, information delivery is performed according to the

profile of the users and the deadline of the file delivery. The middleware is decomposed
into two components, named SAMi-Basic and SAMi-ext. SAMi-Basic is used to perform
the basic functionalities of SAMi and is installed on every device participating in the

information sharing. SAMi-ext is used to perform the expensive activities of SAMi like

rule mining and file classification.

17 The record management system stores data as a list of records. A record is an array of bytes.



This chapter has also presented the design, the implementation and the evaluation of
SAMi. The test-bed that simulates MANETs and implements the middleware was designed
by assuming that peers are equipped with a uniform network technology. The test-bed was
used to evaluate the data delivery rate of the middleware.

Furthermore, the prototype, which was developed to illustrate the application of SAMi in
photo sharing and annotation, considers mobile phones as computing devices and
Bluetooth as the network technology. It was used to evaluate the major functionalities of
the SAMi middleware on Sun Wireless Toolkit emulators, a PC with 3 GHz processing speed
and a Sony Ericsson w910i mobile phone.


Chapter 7 Conclusion and Future Work

In the first two chapters, we have identified interest-awareness and mobility awareness as

important research problems in information sharing in a MANET and we have reviewed

important research works in the field of information sharing, service discovery and data

routing with respect to the identified research problems. In the previous four chapters, we

have presented and evaluated our information sharing middleware that has been proposed

to resolve the identified research problems. In this chapter, we summarize our important

contributions. We also analyze our research work according to the requirements stated in

the first chapter and research work presented in the second chapter. Finally, we conclude

the thesis and the chapter by pointing out the main future work envisaged for extending our

research work.

The chapter is organized as follows: section 7.1 summarizes the contribution of the

thesis; section 7.2 analyses our middleware; finally, section 7.3 winds up the chapter and

the thesis by pointing out future works.


7.1 Summary of Contributions

In this thesis, we have proposed a novel middleware named SAMi to allow nomadic

users to share information in MANETs. The middleware works by distributing

advertisements and queries. The advertisement of files and the resolution of queries are

performed according to the users’ profile and their context.

The middleware is designed to fulfill the requirements stated in the first chapter, i.e.,
“Pervasiveness”, “Mobility awareness”, “Interest-awareness”, “High level semantics”,
“Social awareness”, “Context aware content delivery” and “Routing of data”. As existing
information-sharing systems for MANETs give less emphasis to the challenges related to
the mobility of users and their interests, our work focuses on the requirements
“Mobility awareness” and “Interest-awareness”.

To facilitate the advertisement process, we have studied how files are hierarchically

classified into clusters. Clusters are organized in a file tree. The dimension of the tree (i.e.,

its height and the number of clusters at each depth) is determined from the mobility classes

considered by the data source to share information in MANETs.

A mobility class is a concept used to describe a category of MANET-views according to

the users’ stay-times and their contexts. The same advertisement policy is applied in

MANET-views described by the same mobility class. Mobility classes can be determined

from the peers’ stay times or by using habit rules. We have proposed an approach to

compute mobility classes semi-automatically by analyzing the peers’ historical information

sharing behaviors.

SAMi determines the content of advertisements and their dissemination according to the

users’ interests so that the advertisements’ volumes and the load of their routing are

minimized. Similarly, the resolution of a query (i.e., where and how the query is posed) is

performed by observing the users’ interests to provide information. Historical query

analysis, habit rules and social groups are involved in the interest identification process.



In SAMi, a file is delivered from one or more information sources. Information sources

are selected according to their profile, the time they stay with the requester and how far

they are found from the requester in terms of distance and number of hops.

SAMi has been deployed in a simulated environment where devices are assumed to be
interconnected by a wireless network technology with uniform bandwidths. It has also
been deployed over real devices (mobile phones and PCs) interconnected by Bluetooth. The
simulation environment was used to evaluate the data delivery rate of the middleware.
Experimentation on real devices was used to evaluate the main functionalities of the
middleware. From the evaluations that have been made, we have observed that SAMi has a
very good potential to serve nomadic users to share information according to their
interests.

7.2 Conclusion

This thesis is designed especially to tackle the challenges related to the mobility of users
and their interests. Table 7-1, a copy of Table 2-3 extended with the column “SAMi”,
compares SAMi to the existing information sharing systems. As discussed in chapter 2,
none of them gives special attention to the requirements “mobility awareness” and
“interest awareness”. The column “SAMi” indicates that our middleware gives attention to
all of these requirements.

As described below, SAMi deals with all of the requirements specified in the first
chapter, and in particular with mobility awareness and interest awareness.

Pervasiveness: SAMi can be deployed in computing devices ranging from mobile phones

to PCs. As it is designed for a MANET, the middleware can be used anywhere and any

time.

Mobility awareness: SAMi adjusts its information discovery strategies according to the

network dynamicity, which is measured by the connectivity lifetime of MANET views.

Mobility classes are used to parameterize the advertisement policy, which determines the



extent to which the push discovery approach is applied, according to the connectivity

lifetime of MANET-views.

Interest-awareness: SAMi is designed to work according to the interests of users. Users,

first, exchange their interests to receive and provide information. Data requesters and data

sources resolve queries and make advertisements based on the received interests. We have

proposed an approach that can be used by data-sources to identify and facilitate the data

requesters’ interests.

Table 7-1: Comparing SAMi to existing information sharing systems

Requirement                     CodeTorrent  Lime  LimeOne  TOTA  ORION  PeerWare  AdHocFS  Ad-Hoc InfoWare  MIDDLE  SAMi
Pervasiveness                   ++           ++    ++       ++    ++     ++        ++       +                ++      ++
Mobility awareness              -            -     +        +     -      -         +        +                +       ++
Interest-awareness              -            -     -        -     -      +         -        +                -       ++
High-level semantics            -            +     +        +     -      ++        +        -                +       +
Social awareness                -            -     -        -     -      -         -        ++               -       +
Context aware content delivery  +            -     -        -     +      -         -        -                +       ++
Data dissemination              -            -     -        -     ++     -         -        -                -       +

Legend: - not considered; + considered partially or in a limited way; ++ considered

High-level semantics: In SAMi, files are hierarchically classified into clusters. Clusters

are used to make the file advertisement at high level. However, the semantic meaning of a

cluster is limited since we have used an unsupervised classification approach.

Social awareness: In SAMi, the users’ social networks are used to facilitate the interest

identification process. We have proposed an algorithm that identifies the implicit social



networks of users by analyzing their collaborations in a MANET. However, SAMi does not
assist users in identifying their explicit social networks (i.e., friends, neighbors,
families and so on).

Context aware content delivery: In SAMi, files are delivered block by block from one or

more sources. In order to facilitate the downloading of rare files, SAMi applies offline

delivery (by using email for example).

Data dissemination: In SAMi, queries and advertisements are disseminated according to
the interests of users. The LAR routing protocol is employed to determine the peers located
in the direction of peers interested in the advertisement and able to resolve the query.

7.3 Future Work

Information sharing in MANETs touches many issues and integrates several domains.

An information sharing middleware for MANETs should deal with issues related to data

routing, content delivery, information discovery, information classification and social

networking. As the fields are too numerous to be covered in a thesis, we concentrated on

the main aspects of information sharing and leave the others as open works. The following

are some of the main envisaged future works:

Privacy: The problem of privacy is more challenging in MANETs than in traditional

networks. Access rights can be used to keep the privacy of the users. SAMi can be used, as

it is, by nomadic users to share files with public access right. SAMi can also be easily

extended to advertise files having access right limited to some individuals or groups.

However, assigning access rights manually is a tedious task for users. In the future, we
will investigate a method that assists users in assigning access rights to files.

File classification: In this thesis, we have tested an unsupervised classification algorithm

to produce a file tree. However, an ontology based classification approach may give a more

semantically meaningful file tree. The problems of an ontology-based classification

approach are related to the creation of the domain knowledge and the formation of a

balanced file tree. Investigating hybrid classification technique is an important future work



that can enable a more meaningful and balanced file tree. Another important future work is

the production of a file tree according to the interests of users. Considering the interest of

users during the production of a file tree will facilitate the advertisement content selection

process.

Data routing: In SAMi, we have used the LAR routing protocol. In order to allow LAR
to work in a dynamic environment, it can be hybridized with DG-CastoR, a routing
algorithm developed in our research team. In the future, we plan to study precisely the
hybridization of LAR and DG-CastoR.

Context Management: In SAMi, the context of users is used to determine the mobility
class of MANET-views and the users’ interests. However, we consider only the time and
the location contexts. Moreover, we have not considered complex manipulation of
contexts. For example, we consider two location contexts similar only if they are
identical or have an inheritance relationship. In our approach, the contexts (“Bus”, ∅)
and (“Tram”, ∅) are therefore not the same. In the future, we plan to use the work of
Dejene Ejigu, a former PhD student in our research group, to equip the SAMi middleware
with an advanced context management feature.
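The location-similarity rule mentioned above (identical, or related by inheritance) can be sketched as follows. The hierarchy in `PARENT` is invented for illustration, and treating inheritance as transitive is an assumption; only the rule itself and the Bus/Tram example come from the text.

```python
# Hypothetical location hierarchy: child -> parent (None marks a root).
PARENT = {
    "Bus 27": "Bus",
    "Bus": "Public transport",
    "Tram": "Public transport",
    "Public transport": None,
}


def ancestors(location):
    """Yield every ancestor of a location, nearest first."""
    current = PARENT.get(location)
    while current is not None:
        yield current
        current = PARENT.get(current)


def similar_locations(a, b):
    """True if the locations are identical or one inherits from the other."""
    return a == b or a in ancestors(b) or b in ancestors(a)


print(similar_locations("Bus 27", "Bus"))   # inheritance relationship
print(similar_locations("Bus", "Tram"))     # siblings only
```

Under this rule "Bus" and "Tram" are not similar even though they share a parent, which is the limitation that a richer context model would address.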


Glossary of Acronyms

AODV Ad-hoc On Demand Distance Vector

DEN Distributed Event Notification services

DREAM Distance Routing Effect Algorithm for Mobility

GHI Geographical based Hierarchical Index

GSM Global System for Mobile Communications

GVDs Global Virtual Data structure

ITS Interface Tuple Space

JESA Java Enhanced Service Architecture

JXME JXTA for java ME

JXTA JuXTApose

MANET Mobile Ad-hoc NETwork

MIDP Mobile Information Device Profile

NAT Network Address Translation

P2P Peer to Peer

PDA Personal Digital Assistant

PRNET Packet Radio NETwork

SAMi Self-Adaptive Middleware

SAP Service Access Point

SDP Service Discovery Protocols

SLP Service Location Protocol

SNS Social-Network Sites

UPnP Universal Plug and Play

VANET Vehicular Ad-Hoc NETwork

VSM Vector Space Modeling

WLAN Wireless Local Area Networks

WPAN Wireless Personal Area Networks

WWAN Wireless Wide Area Networks

XML eXtensible Markup Language


Bibliography

[1] R. Prasad and L. Deneire, “Chapter 7 - Mobile Ad Hoc Networks (MANET),” From

WPANs to Personal Networks: Technologies and Applications, Artech House, 2006,

available on http://common.books24x7.com/book/id_14823/book.asp, last accessed

on 15 April 2010.

[2] S.K. Sarkar, T.G. Basavaraju and C. Puttamadappa, “Chapter 1 - Introduction,”
Ad Hoc Mobile Wireless Networks: Principles, Protocols and Applications,
Auerbach Publications, 2008, available on
http://common.books24x7.com/book/id_26393/book.asp, last accessed on 15 April
2010.

[3] S. Churchil, “Cellular’s 25th Anniversary”, http://www.dailywireless.org/2008/10/

10/cellulars-25th-anniversary/, last accessed on April 15, 2010, Oct. 2008.

[4] K. German, “Cell phone battery life charts- CNET Reviews,” Feb. 2010, available

on http://reviews.cnet.com/cell-phone-battery-life-charts/, last accessed on 15 April

2010.

[5] “iPhone 3GS,” http://www.apple.com/iphone/, last accessed on 15 April 2010.

[6] M. Mühlhäuser and I. Gurevych, “Chapter IX - Opportunistic Networks,” Handbook
of Research on Ubiquitous Computing Technology for Real Time Enterprises, IGI
Global, 2008, available on http://common.books24x7.com/book/id_24561/book.asp,
last accessed on 15 April 2010.

[7] “IEEE 802.11 - Wikipedia,” the free encyclopedia, available on

http://en.wikipedia.org/wiki/Wi-Fi, last accessed on 21 Jan 2009.

[8] Wi-Fi Alliance, “Wi-Fi Direct,” Oct. 2009, available on
http://www.wi-fi.org/news_articles.php?f=media_news&news_id=909, last accessed on
24 Feb 2010.

[9] ZigBee Alliance, “ZigBee and Wireless Radio Frequency Coexistence”, 2009,

available on http://www.zigbee.org/LearnMore/WhitePapers/tabid/257/Default.aspx,

last accessed Feb 2 2010.


[10] K. Tuan Le, “ZigBee SoCs provide cost-effective solutions,” 2005, available on

http://www.wirelessnetdesignline.com/howto/173500576, last accessed on 15 May

2010.

[11] P. Piccard, “Chapter 10 - eDonkey and eMule,” Securing IM and P2P Applications

for the Enterprise, Syngress Publishing, 2006, available on

http://common.books24x7.com/book/id_10710/book.asp, last accessed on 15 April

2010.

[12] R. Subramanian and D. Brian, “Chapter I - Core Concepts in Peer-to-Peer

Networking,” Peer-to-Peer Computing: The Evolution of a Disruptive Technology,

IGI Publishing, 2005, available on http://common.books24x7.com/book/id_9175/book.asp,
last accessed on 15 April 2010.

[13] M. Mühlhäuser and I. Gurevych, “Chapter VIII - Peer-to-Peer Systems,” Handbook

of Research on Ubiquitous Computing Technology for Real Time Enterprises, IGI

Global, 2008, available on http://common.books24x7.com/book/id_24561/book.asp,

last accessed on 20 April 2010.

[14] S. Liebowitz, “Chapter 7 - Copyright and the Internet,” Rethinking the Network

Economy: The True Forces that Drive the Digital Marketplace, AMACOM, 2002,

available on http://common.books24x7.com/book/id_5192/book.asp last accessed on

20 April 2010.

[15] I.J. Taylor, “Chapter 2 - Peer-2-Peer Systems,” From P2P to Web Services and

Grids: Peers in a Client/Server World, Springer, 2005, available on
http://common.books24x7.com/book/id_16228/book.asp, last accessed on 12 April
2010.

[16] M. Miller, “Chapter 10 - The Gnutella Network: The Next Napster?,” Discovering

P2P, Sybex, 2001, available on http://common.books24x7.com/book/id_3239/book.asp,
last accessed on 15 April 2010.

[17] I.J. Taylor, “Chapter 6 - Gnutella,” From P2P to Web Services and Grids: Peers in a

Client/Server World, Springer, 2005, available on
http://common.books24x7.com/book/id_16228/book.asp, last accessed on 15 April 2010.


[18] giFT-FastTrack, “Documentation of the known parts of the FastTrack protocol,”
2004, available on http://cvs.berlios.de/cgi-bin/viewcvs.cgi/gift-fasttrack/giFT-FastTrack,
last accessed on 10 April 2010.

[19] R. Subramanian and D.G. Brian, “Chapter II - Peer-to-Peer Networks for Content

Sharing,” Peer-to-Peer Computing: The Evolution of a Disruptive Technology, IGI

Publishing, 2005, available on http://common.books24x7.com/book/id_9175/book.asp,
last accessed on 15 April 2010.

[20] J.D. Gradecki, “Chapter 2 - An Overview of JXTA,” Mastering JXTA: Building

Java Peer-to-Peer Applications, John Wiley & Sons, 2002, available on

http://common.books24x7.com/book/id_5393/book.asp, last accessed on 15 April

2010.

[21] M. Miller, “Chapter 12 - The KaZaA/ MusicCity Network,” Discovering P2P,

Sybex, 2001, available on http://common.books24x7.com/book/id_3239/book.asp,


[22] H. Balakrishnan, M.F. Kaashoek, D. Karger, R. Morris, and I. Stoica, “Looking up

data in P2P systems,” Commun. ACM, vol. 46, 2003, p. 43-48.

[23] bittorrent, http://www.bittorrent.com/, last accessed on 26 May 2010.

[24] A. Arora, C. Haywood, and K. Pabla, “JXTA for J2ME Extending the Reach of

Wireless With JXTA Technology,” Sun Microsystems, UK, 3 pages, 2002.

[25] C. Lindemann and O.P. Waldhorst, “A Distributed Search Service for Peer-to-Peer

File Sharing in Mobile Applications,” IEEE Computer Society, 2002, 8 pages.

[26] S. Ratnasamy, P. Francis, S. Shenker, R. Karp, and M. Handley, “A Scalable
Content-Addressable Network,” In Proceedings of ACM SIGCOMM, 2001,
p. 161-172.

[27] I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, and H. Balakrishnan, “Chord: A

Scalable Peer-to-Peer Lookup Service for Internet Applications,” Proceedings of the

2001 conference on Applications, technologies, architectures, and protocols for

computer communications, San Diego, California, USA, 2001, p. 149-160.

[28] A.I.T. Rowstron and P. Druschel, “Pastry: Scalable, Decentralized Object Location,

and Routing for Large-Scale Peer-to-Peer Systems,” Springer-Verlag, 2001, p. 329-

350.


[29] B.Y. Zhao, L. Huang, J. Stribling, S.C. Rhea, A.D. Joseph, and J.D. Kubiatowicz,
“Tapestry: A Resilient Global-scale Overlay for Service Deployment,” IEEE
Journal on Selected Areas in Communications, vol. 22, 2004, p. 41-53.

[30] B.Y. Zhao, J.D. Kubiatowicz, and A.D. Joseph, Tapestry: An Infrastructure for

Fault-tolerant Wide-area Location and routing, Technical Report: CSD-01-1141,

University of California at Berkeley, 28 pages, 2001.

[31] I.J. Taylor, “Chapter 9 - Freenet,” From P2P to Web Services and Grids: Peers in a
Client/Server World, Springer, 2005, available on
http://common.books24x7.com/book/id_16228/book.asp, last accessed 2010.

[32] L.T. Yang and M. Guo, “Chapter 29 - Resource Discovery in Peer-to-Peer

Infrastructure,” High Performance Computing: Paradigm and Infrastructure, John

Wiley & Sons, 2006, available on http://common.books24x7.com/book/id_22774/book.asp,
last accessed on 15 April 2010.

[33] D. Goh and F. Schubert, “Chapter VIII - Adaptive Peer-to-Peer Social Networks for

Distributed Content-Based Web Search,” Social Information Retrieval Systems:

Emerging Technologies and Applications for Searching the Web Effectively, IGI

Publishing, 2008, available on http://common.books24x7.com/book/id_23246/book.asp,
last accessed on April 2010.

[34] D. Boyd and N. Ellison, “Social network sites: Definition, history, and scholarship,”

Journal of Computer-Mediated Communication, vol. 13, 2007, available on

http://jcmc.indiana.edu/vol13/issue1/boyd.ellison.html, last accessed on 12 April

2010.

[35] D. Martinez, Introduction location related aspects to mobile multimedia

environments, Reports from MSI, university of Växjö, 52 pages, 2006.

[36] N.D. Ziv and B. Mulloth, “An Exploration on Mobile Social Networking: Dodgeball

as a Case in Point”, Proceedings of the International Conference on Mobile Business

IEEE Computer Society, Washington, DC, USA, 21 pages, 2006.

[37] Myspace, http://www.myspace.com, last accessed on 26 May 2010.

[38] Facebook, www.facebook.com, last accessed on 26 May 2010.

http://common.books24x7.com/book/id_16228/book.asp


[39] N. Jhanji, “ImaHima,” Sep. 2001, available on http://90.146.8.18/en/archives/

prix_archive/prix_projekt.asp?iProjectID=10954, accessed Dec 10, 2010.

[40] A. Klemm, E. Klemm, C. Lindemann, and O.P. Waldhorst, “A Special-Purpose

Peer-to-Peer File Sharing System for Mobile Ad Hoc Networks,” Proceeding of the

IEEE Semiannual Vehicular Technology Conference (VTC2003-Fall), Orlando, FL,

USA, 6 pages, October 2003.

[41] U. Lee, J. Park, J. Yeh, G. Pau, and M. Gerla, “Code torrent: content distribution

using network coding in VANET”, ACM Press, 6 pages, 2006.

[42] A.L. Murphy, G.P. Picco, and G. Roman, “LIME: A coordination model and

middleware supporting mobility of hosts and agents,” ACM Trans. Softw. Eng.

Methodol., vol. 15, 2006, p. 279-328.

[43] G.P. Picco, A.L. Murphy, and G. Roman, “LIME: Linda meets mobility,” In

Proceedings of the 21st International Conference on Software Engineering, Los

Angeles (USA): 1999, p. 368-377.

[44] C. Fok, G. Roman, and G. Hackmann, “A lightweight coordination middleware for

mobile computing,” in Proceedings of the 6th International

Conference on Coordination Models and Languages, vol. 2949,

2004, p. 135-151.

[45] M. Mamei and F. Zambonelli, “Programming pervasive and mobile computing

applications with the TOTA middleware,” Proceedings of the Second IEEE

Annual Conference on PerCom, 2004, p. 263-273.

[46] M. Mamei and F. Zambonelli, “Self-Maintained Distributed Tuples for Field-based

Coordination in Dynamic Networks,” Concurrency and Computation: Practice and

Experience, vol. 18, 2004, p. 427-443.

[47] G. Cugola and G.P. Picco, PeerWare: Core Middleware Support for

Peer-to-Peer and Mobile Systems, Technical report, Politecnico di Milano, Italy, 11

pages, 2001.

[48] M. Boulkenafed and V. Issarny, “AdHocFS: Sharing Files in WLANs,” 2nd Int.

Symp. on Network Computing and Applications, 2003, p. 156–63.


[49] M. Boulkenafed and V. Issarny, “A Middleware Service for Mobile Ad Hoc Data

Sharing, Enhancing Data Availability,” Proceedings of ACM/IFIP

International Middleware Conference, Rio de Janeiro, 2003, p.

493-511.

[50] T. Plagemann, J. Andersson, O. Drugan, V. Goebel, C. Griwodz, P. Halvorsen, E.

Munthe-Kaas, M. Puzar, N. S, and K. Steml, “Middleware services for information

sharing in mobile ad-hoc networks - challenges and approach,” in Workshop on

Challenges of Mobility, IFIP TC6 World Computer Congress, 12 pages, 2004.

[51] C. Mascolo, L. Capra, S. Zachariadis, and W. Emmerich, “XMIDDLE: A Data-

Sharing Middleware for Mobile Computing,” Int. Journal on Personal

and Wireless Communications, vol. 21, 2001, p. 77-103.

[52] J. Jetcheva, Y. Hu, D. Maltz, and D. Johnson, “A Simple Protocol for Multicast and

Broadcast in Mobile Ad Hoc Networks,” IETF Internet Draft, 2001, available on

http://www.monarch.cs.rice.edu/internet-drafts/draft-ietf-manet-simple-mbcast-

01.txt, accessed 24 Nov 2009.

[53] C. Perkins, E. Belding-Royer, and S. Das, “Ad hoc On-Demand Distance Vector

(AODV) Routing, Nokia Research Center,” July 2003, available on

http://www.ietf.org/rfc/rfc3561.txt, last accessed on 26 Nov 2009.

[54] D. Gelernter, “Generative communication in Linda,” ACM Transactions on

Programming Languages and Systems, vol. 7, 1985, p. 80-112.

[55] G. Cugola and G.P. Picco, “Peer-to-Peer for Collaborative Applications,” IEEE

Computer Society, 2002, p. 359-364.

[56] C. Ho, K. Obraczka, G. Tsudik, and K. Viswanath, “Flooding for Reliable Multicast

in Multi-Hop Ad Hoc Networks,” in Proceedings of the 3rd

International Workshop on Discrete Algorithms and

Methods for Mobile Computing and Communications, vol. 7,

1999, p. 64-71.

[57] H. Wu, H. Peng, Q. Zhou, M. Yang, B. Sun, and B. Yu, “P2P Multimedia Sharing

over MANET,” Advances in Multimedia Modeling, 2006, p. 635-642.


[58] E. Guttman, “Service Location Protocol: Automatic Discovery of IP Network

Services,” IEEE Internet Computing, vol. 3, 1999, p. 71-80.

[59] E. Meshkova, J. Riihijarvi, M. Petrova, and P. Mahonen, “A survey on resource

discovery mechanisms, peer-to-peer and service discovery frameworks,” Computer

Networks, vol. 52, Aug. 2008, p. 2097-2128.

[60] R. Hermann, D. Husemann, M. Moser, M. Nidd, C. Rohner, and A. Schade,

“DEAPspace: transient ad-hoc networking of pervasive devices,” IEEE Press,

Boston, Massachusetts, 2000, p. 133-134.

[61] D. Chakraborty, A. Joshi, Y. Yesha, and T. Finin, “GSD: A Novel Group-based

Service Discovery Protocol for MANETS,” in 4th IEEE Conference on

Mobile and Wireless Communications Networks (MWCN), 2002,

p. 140-144.

[62] D. Chakraborty, A. Joshi, Y. Yesha, and T. Finin, “Toward Distributed Service

Discovery in Pervasive Computing Environments,” IEEE Transactions on Mobile

Computing, vol. 5, 2006, p. 97-112.

[63] O. Ratsimor, D. Chakraborty, A. Joshi, and T. Finin, “Allia: Alliance-based Service

Discovery for Ad-Hoc Environments,” in Proc. of ACM Mobile

Commerce Workshop, 2002, p. 1-9.

[64] S. Helal, N. Desai, V. Verma, and C. Lee, “Konark - A Service Discovery and

Delivery Protocol for Ad-Hoc Networks,” In Proceedings of the Third IEEE

Conference on Wireless Communication Networks (WCNC), New Orleans, USA, 7

pages, 2003.

[65] M. Klein, B. König-Ries, and P. Obreiter, “Service Rings - A Semantic Overlay for

Service Discovery in Ad hoc Networks,” Proceedings of the 14th International

Workshop on Database and Expert Systems Applications, IEEE Computer Society,

7 pages, 2003.

[66] U.C. Kozat, L. Tassiulas, and M. Ad, “Network Layer Support for Service

Discovery in Mobile Ad Hoc Networks,” Proc. of IEEE/INFOCOM-2003, San

Francisco, USA, 11 pages, 2003.


[67] F. Sailhan and V. Issarny, “Proceedings of the Third IEEE International Conference

on Pervasive Computing and Communications,” IEEE Computer Society, 2005, p.

235-244.

[68] F. Perich, A. Joshi, T. Finin, and Y. Yesha, “On Data Management in Pervasive

Computing Environments,” IEEE Trans. on Knowl. and Data Eng., vol. 16, 2004, p.

621-634.

[69] L. Cheng and I. Marsic, “Service Discovery and Invocation for Mobile Ad Hoc

Networked Appliances,” IEEE Second International Workshop on Networked

Appliances(IWNA'2000), New Jersey, USA, 5 pages, 2000.

[70] A. Varshavsky, B. Reid, and E. de Lara, “A cross-layer approach to service

discovery and selection in MANETs,” IEEE International Conference on Mobile

Adhoc and Sensor Systems Conference, Washington DC, USA, 2005, p. 459-466.

[71] V. Lenders, M. May, and B. Plattner, “Service discovery in mobile ad hoc networks:

A field theoretic approach,” Pervasive and Mobile Computing, vol. 1, Sep. 2005, p.

343-370.

[72] S. Preu, “ESA Service Discovery Protocol,” Proceedings of Networking, Pisa, Italy:

LNCS, 2002, p. 1196-1201.

[73] N. Nikaein, H. Labiod, and C. Bonnet, “Distributed Dynamic routing algorithm for

mobile ad hoc networks Mobile and Ad Hoc Networking and Computing,” First

Annual Workshop on Mobile Ad Hoc Network & Computing (MobiHOC),

Boston, USA: 2000, p. 19-27.

[74] S. Murthy and J.J. Garcia-Luna-Aceves, “A routing protocol for packet radio

networks,” Proceedings of the 1st annual international conference on Mobile

computing and networking, Berkeley, California, United States: ACM, 1995, p. 86-

95.

[75] S. Murthy and J.J. Garcia-Luna-Aceves, “An efficient routing protocol for wireless

networks,” Mob. Netw. Appl., vol. 1, 1996, p. 183-197.

[76] C.E. Perkins and P. Bhagwat, “DSDV routing over a multihop wireless network of

mobile computers,” Ad hoc networking, Addison-Wesley Longman Publishing Co.,

Inc., 2001, p. 53-74.


[77] D. Johnson, D. Maltz, and J. Broch, “DSR: the dynamic source routing protocol for

multihop wireless ad hoc networks,” Ad hoc networking, Addison-Wesley Longman

Publishing Co., Inc., 2000, p. 139-172.

[78] Z. Haas, “A New Routing Protocol For The Reconfigurable Wireless Networks,” In

Proceedings of the 6th International Conference on Universal Personal

Communications, vol. 2, p. 562-566, 1997.

[79] M. Mauve, J. Widmer, and H. Hartenstein, “A Survey on Position-Based Routing in

Mobile Ad-Hoc Networks,” IEEE NETWORK, vol. 15, 2001, p. 30-39.

[80] Y. Ko and N.H. Vaidya, “Location-aided routing (LAR) in mobile ad hoc

networks,” Journal of wireless networks, Kluwer Academic Publishers, vol. 6, 2000,

p. 307-321.

[81] S. Basagni, I. Chlamtac, V. Syrotiuk, and B. Woodward, “A distance routing effect

algorithm for mobility (DREAM),” ACM Press, 1998, p. 76-84.

[82] T. Atechian and L. Brunie, “DG-CastoR : Direction-based Geocast Routing protocol

for VANET,” IADIS International Conference Telecommunications Networks and

Systems TNS, Amsterdam, Netherlands, 8 pages, 2008.

[83] T. Atechian and L. Brunie, “DG-CastoR for query packets dissemination in

VANET,” 5th IEEE Mobile Ad hoc and Sensor Networks MASS, Atlanta, USA, 6

pages, 2008.

[84] A. Shiferaw, L. Brunie, and V. Scuturici, “Interest-Awareness for Information

Sharing in MANETs,” International workshop on Mobile P2P Data Management,

Security and Trust (MP-DMST*) in conjunction with the 11th IEEE International

Conference on Mobile Data Management (MDM), Kansas City, USA, May, 6 pages,

2010.

[85] A. Shiferaw, S. Lajmi, V. Scuturici, and L. Brunie, “PASMi: self-adaptive Photo

Annotation and Sharing Middleware of Mobile Ad-hoc Networks,” Conference on

Pervasive Computing and Communications Workshops (PerComW) 2010,

Mannheim, Germany, 6 pages, 2010.


[86] L. Limam, D. Coquil, H. Kosch, and L. Brunie, “Extracting user interests from

search query logs: A clustering approach,” In the 7th International Workshop on

Text-based Information Retrieval (TIR '10) in conjunction with the 21st

International Conference on Database and Expert Systems Applications (DEXA

'10), Bilbao, Spain: IEEE, 5 pages, 2010.

[87] D. Metzler, S. Dumais, and C. Meek, “Similarity Measures for Short Segments of

Text,” Advances in Information Retrieval, 2007, p. 16-27.

[88] R. Xu and D. Wunsch, “Chapter 3 - Hierarchical Clustering,” Clustering, 2009,

available on http://common.books24x7.com/book/id_27271/book.asp, last accessed

on 15 April 2010.

[89] S. Ma and J.L. Hellerstein, “Mining Partially Periodic Event Patterns With

Unknown Periods,” Proceedings of the International Conference on Data

Engineering (ICDE), Heidelberg, Germany, 2000, p. 205-214.

[90] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,”

Proceeding of the Very Large Data Bases (VLDB) Conference, Santiago de Chile,

Chile, 1994, p. 487-499.

[91] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets

of Items in Large Databases,” Proceedings of the ACM SIGMOD International

Conference on Management of Data, Washington, D.C., USA, 1993, p. 207-216.

[92] M. Denko, “PUSMAN: Publish-Subscribe Middleware for Ad Hoc Networks”,

Proceeding of the Canadian Conference on Electrical and Computer Engineering,

Ottawa, ON, Canada: 2006, p. 1677-1681.

[93] B. Mobasher, R. Cooley, and J. Srivastava, “Automatic personalization based on

Web usage mining,” Commun. ACM, vol. 43, 2000, p. 142-151.

[94] M. Eirinaki and M. Vazirgiannis, “Web mining for web personalization,” ACM

Trans. Internet Technology, vol. 3, 2003, p. 1-27.

[95] T. Joachims, “Optimizing search engines using clickthrough data,” Proceedings of

the eighth ACM SIGKDD international conference on Knowledge discovery and

data mining, Edmonton, Alberta, Canada, 2002, p. 133-142.


[96] K. Sugiyama, K. Hatano, and M. Yoshikawa, “Adaptive web search based on user

profile constructed without any effort from users,” Proceedings of the 13th

international conference on World Wide Web, New York, USA, 2004, p. 675-684.

[97] H. Lieberman, “Letizia: An Agent That Assists Web Browsing,” International Joint

Conference on Artificial Intelligence, Montreal, Quebec, Canada, 1995, p. 924-929.

[98] J. Budzik and K. Hammond, “Watson: Anticipating and Contextualizing

Information Needs,” In 62nd annual meeting of the American society for information

science, Washington, DC, USA, 1999, p. 727-740.

[99] D. Goldberg, D. Nichols, B.M. Oki, and D. Terry, “Using collaborative filtering to

weave an information tapestry,” Communications of the ACM, vol. 35, 1992, p. 61-

70.

[100] G. Dupret, “Web search engine evaluation using click-through data and a user

model,” Proceeding of the workshop on query log analysis (WWW), Banff, Canada,

8 pages, 2007.

[101] A. Shiferaw, L. Brunie, V. Scuturici, and Y. Fawaz, “Mobility Awareness for

Information Sharing in MANETs,” the 11th IEEE International Conference on

Mobile Data Management (MDM), Kansas City, USA, May, 3 pages, 2010.

[102] W. Su, S. Lee, and M. Gerla, “Mobility Prediction and Routing in Ad Hoc Wireless

Networks,” International Journal of Network Management, vol. 11, 31 pages, 2001.

[103] A. Negash, L. Brunie, and V. Scuturici, “A context aware Information sharing

Middleware for a dynamic pervasive computing environment,” The International

Journal on Computer Science and Information Systems, 2007, p. 65-82.

[104] C. White, Q. Liam, and L. Burman, “Chapter 23 - Introducing the Dublin Core,”

Mastering XML Premium Edition, 2001, available on


2010.

[105] H. Kosch, “Chapter 2 - MPEG-7: The Multimedia Content Description Standard,”

Distributed Multimedia Database Technologies Supported by MPEG-7 and MPEG-

21, Auerbach Publications, 2004, available on


2010.


[106] E. Chisholm and T.G. Kolda, “New term weighting formulas for the vector space

method in information retrieval,” Technical report, Oak Ridge National Laboratory,

20 pages, 1999.

[107] T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A. Wu, “An

Efficient k-Means Clustering Algorithm: Analysis and Implementation,” IEEE

Trans. Pattern Anal. Mach. Intell., vol. 24, July 2002, p. 881-892.

[108] Y. Zhao and G. Karypis, “Empirical and Theoretical Comparisons of Selected

Criterion Functions for Document Clustering,” Machine Learning, vol. 55, June

2004, p. 311-331.

[109] “A Tutorial on Clustering Algorithms,” Feb. 2010, available on

http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/, last accessed on 17 Jan

2010.

[110] K. Teknomo, “K-Mean Clustering Tutorials”, available on

http://people.revoledu.com/kardi/tutorial/kMean/index.html, last accessed on 25 Feb

2010.

[111] A. Negash, L. Brunie, and V. Scuturici, “A Self-Adaptive Information sharing

Middleware for a dynamic pervasive computing environment,” The 3rd International

Conference on Wireless Applications and Computing, pp. 35-42, Lisbon, Portugal,

July, 2007.

[112] Y. Fawaz, A. Negash, L. Brunie, and V. Scuturici, “Service Composition-

Based Content Adaptation for Pervasive Computing Environment,” The 4th IEEE

International Conference on Pervasive Services (ICPS’07), pp 189-192, Istanbul,

Turkey, July, 2007.

[113] Y. Fawaz, A. Negash, L. Brunie, and V. Scuturici, “ConAMi: Collaboration Based

Content Adaptation Middleware for Pervasive Computing Environment,” The 3rd

International Conference on Wireless Applications and Computing. pp 35-42,

Lisbon, Portugal, July, 2007.

[114] A.K. Dey, “Understanding and Using Context,” Personal Ubiquitous Computing,

vol. 5, 2001, p. 4-7.


[115] T. Winograd, “Architectures for context,” HUMAN-COMPUTER INTERACTION,

vol. 16, 2001, p. 401-419.

[116] D. Ejigu, “Context Modeling and Collaborative Context-Aware Services for

Pervasive Computing,” PhD Thesis, INSA de Lyon - France, 245 pages, 2007.

[117] HP Labs, “Jena - a Semantic Web Framework for Java,” available on:

http://jena.sourceforge.net/, last accessed on 20 March 2010.

[118] Y. Fawaz, “Context-Aware Service Composition and Execution for Pervasive

Computing: a data driven approach,” PhD Thesis, INSA de Lyon - France, 213

pages, 2010.

[119] “Testdata,” Department of Computer Science and Engineering, University of

Washington, http://www.cs.washington.edu/research/imagedatabase/groundtruth,


[120] Jama, “A Java Matrix Package,” available on

http://math.nist.gov/javanumerics/jama/, last accessed on 16 Sept 2009.


Annex A. Extended Summary

Information sharing within a mobile peer-to-peer network has become an important research topic thanks to rapid progress in wireless communication technologies and smart mobile devices. Users can share general information (for example, documents on education or tourism), personal information (for example, photos and personal profiles), or live broadcasts (for example, radio or television programmes).

The objective of our research is to design and implement an information sharing system for an ad hoc environment. Sharing information means making data available to the people one is in contact with, so that they can view, modify, or download it. The system lets users share information wherever and whenever they have the opportunity on a MANET. This thesis focuses, in particular, on the following requirements:

• Ubiquity: nomadic users should be able to share information anywhere, at any time, and from any device.

• Mobility awareness: the mechanisms that implement information sharing must take user mobility into account.

• Interest awareness: shareable files must be selected according to users' interests.

• High-level semantics: shareable files must be advertised through high-level descriptions.

• Context-aware content delivery: files must be delivered according to the context of the users and of their environment.

• Social network awareness: shareable files must be selected by considering the social networks to which users belong.

• Data forwarding/routing: advertisements and queries must be routed according to users' interests.


To address these challenges, we propose a middleware called SAMi that enables nomadic users to share information.

This chapter is organized as follows. First, interest awareness and mobility awareness are examined in Section 1. Next, SAMi is presented in Section 2. Finally, we conclude the chapter by presenting some directions for future research in Section 3.

1 Interest and Mobility Awareness

1.1 Interest Awareness

In a MANET, information is shared by distributing advertisements and queries. Peers hold files that they intend to share with others. We assume that a file is described by a set of keywords. Peers that wish to receive files search for them by broadcasting queries. Queries, in turn, are represented by a set of keywords.

To avoid overloading the environment with useless advertisements and queries, file advertisement and query resolution must be carried out according to users' interests.

An interest represents a set of files that the user intends to receive or provide. An interest, denoted I, is represented by ((k1, .., kn), w) such that

• k1, .., kn are keywords (called Description(I)), and

• w ∈ (0, 1] is a weight (called Weight(I)).

Description(I) describes the files represented by the interest I. Weight(I) indicates a user's preference or capacity for receiving or providing the files represented by I. An empty interest is defined to represent the files that the user cannot describe. The description of the empty interest Ie is the empty set, i.e., Description(Ie) = ø.

The similarity between two interests Ii and Ij is defined as the similarity of their descriptions. Let Similarity(Di, Dj) denote the similarity value of two descriptions Di and Dj. Similarity(Di, Dj) can be computed using the semantic similarity function proposed in [86] or one of the lexical similarity functions proposed in [87].

The similarity value of the interests Ii and Ij is computed as follows:

Similarity(Ii, Ij) = Similarity(Description(Ii), Description(Ij))

We define the similarity value between an interest and a file or a query in the same way. Let Df be the description of a file f and let q be a query. Similarity(I, f) and Similarity(I, q) are defined by:

Similarity(I, f) = Similarity(Df, Description(I))

Similarity(I, q) = Similarity(q, Description(I))

Let Ei and Ej be two elements, each of which may be an interest, a file, or a query. The two elements are similar (denoted Ei ≈ Ej) if and only if:

Similarity(Ei, Ej) ≥ accSim, where accSim is a predefined similarity threshold.
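
The test Ei ≈ Ej can be sketched in Python as follows. This is only an illustration: the text defers Similarity(Di, Dj) to [86] or [87], so the Jaccard overlap used here is a stand-in, and the names `keyword_similarity` and `is_similar` and the default threshold 0.5 are hypothetical.

```python
def keyword_similarity(d1, d2):
    """Jaccard overlap of two keyword sets (illustrative stand-in for Similarity(Di, Dj))."""
    d1, d2 = set(d1), set(d2)
    if not d1 and not d2:
        return 1.0  # assumption: two empty descriptions are treated as identical
    return len(d1 & d2) / len(d1 | d2)


def is_similar(d1, d2, acc_sim=0.5):
    """Ei ≈ Ej holds when the similarity reaches the threshold accSim (default value assumed)."""
    return keyword_similarity(d1, d2) >= acc_sim


interest_description = {"finance", "stock", "market"}
query = {"stock", "market"}
print(keyword_similarity(interest_description, query))
print(is_similar(interest_description, query))
```

Because files, queries, and interests are all reduced to keyword sets, the same function covers Similarity(I, f) and Similarity(I, q).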

The sharing interest of a peer is the set of that peer's interests in a given sharing context. A sharing context¹⁸ of a peer describes a situation in which the peer allows others to download files from its device. A sharing context is expressed as a tuple (L, [Ts, Tf]), where L is a location and [Ts, Tf] is a time interval.

For example, (Bus 1, [8AM, 10AM]) is a sharing context stating that a peer allows other peers to download files from its device while it is on Bus 1 from 8 AM to 10 AM. Other examples of sharing contexts are listed in Table 1.

Table 1: Examples of sharing contexts

Context               Description
(Bus, [8AM, 10AM])    Any bus from 8 AM to 10 AM
(“”, [8AM, 10AM])     Any place from 8 AM to 10 AM
(“”, ø)               Anywhere and at any time

18 We use the terms “sharing context” and “context” interchangeably.


We define two types of sharing context: abstract and real. An abstract sharing context describes when and where a peer allows other peers to download files from its device. For example, a user can specify that others may download files from the user's device wherever and whenever it is in a MANET by setting the sharing context to ("", ø). This does not mean, however, that the device is in a MANET 24 hours a day, 7 days a week. A real sharing context is derived from an abstract sharing context by considering the actual time and place in which data were shared. As an illustration, suppose that a peer with the abstract sharing context ("", ø) is connected to other nomadic users through a MANET on Bus 27 from 8:00 to 8:10. Then (Bus 27, [8:00, 8:10]) is a real context derived from the abstract context ("", ø).
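
The derivation of a real context from an abstract one can be sketched as below. The encoding is an assumption made for illustration: times are minutes since midnight, `None` stands for ø (any time), the empty string stands for any place, and locations are compared by exact name (so the "any bus" case of Table 1 is not modelled). `SharingContext` and `derive_real_context` are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple


class SharingContext(NamedTuple):
    location: str                        # "" means any place
    interval: Optional[Tuple[int, int]]  # (Ts, Tf) in minutes since midnight; None stands for ø


def derive_real_context(abstract, location, start, end):
    """Return the real context for an observed connection, or None if the
    abstract context does not permit sharing at that place and time."""
    if abstract.location not in ("", location):
        return None
    if abstract.interval is not None:
        # Keep only the part of the connection that falls inside the allowed interval.
        start = max(start, abstract.interval[0])
        end = min(end, abstract.interval[1])
        if start >= end:
            return None
    return SharingContext(location, (start, end))


# The peer shares anywhere, at any time, and meets others on Bus 27 from 8:00 to 8:10.
anywhere = SharingContext("", None)
print(derive_real_context(anywhere, "Bus 27", 8 * 60, 8 * 60 + 10))
```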

A sharing interest S satisfies:

[1] |S| ≥ 1,

[2] Description(I1) ≠ Description(I2) for all distinct I1, I2 ∈ S,

[3] $\sum_{I \in S} \mathit{Weight}(I) = 1$,

[4] Weight(I) ≥ minW for every I ∈ S, where minW denotes the minimum weight of an interest.
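
The four conditions can be checked mechanically. The sketch below assumes a sharing interest is stored as a list of (keywords, weight) pairs; the function name and the default value of `min_w` are hypothetical.

```python
import math


def is_valid_sharing_interest(interests, min_w=0.05):
    """Check conditions [1]-[4] on a list of (keywords, weight) pairs."""
    if len(interests) < 1:                                  # [1] at least one interest
        return False
    descriptions = [frozenset(keywords) for keywords, _ in interests]
    if len(set(descriptions)) != len(descriptions):         # [2] pairwise distinct descriptions
        return False
    if any(w < min_w or w > 1 for _, w in interests):       # [4] minW <= Weight(I) <= 1
        return False
    total = sum(w for _, w in interests)
    return math.isclose(total, 1.0)                         # [3] weights sum to 1


print(is_valid_sharing_interest([({"finance"}, 0.7), ({"tourism"}, 0.3)]))
```

The sum is compared with `math.isclose` rather than `==` because weights such as 0.7 and 0.3 do not add up to exactly 1 in floating point.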

The condition we use to decide whether two sharing interests S1 and S2 are similar is the similarity of the interests in the two sets: for each interest Ii in S1 there must be an interest Ij in S2 such that Ii ≈ Ij, and conversely. Sharing interests are not similar if this condition is not satisfied. We use the cosine measure to determine the similarity between two sharing interests that satisfy this main condition.

Consider two sharing interests S1 = {I11, .., I1n} and S2 = {I21, .., I2m}, with |S1| = n and |S2| = m.

Let W1i and W2i be the relative weights of I1i and I2i, respectively. The vector representations of S1, denoted P1, and of S2, denoted P2, are given by

• P1 = (W11, .., W1n), and

• P2 = (W21, .., W2m).

Let W12i be the average weight of the interests in S1 that are similar to the interest I2i. Let W21i be the average weight of the interests in S2 that are similar to the interest I1i.


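For a sharing interest S, let Sim(S, I) be the set of interests of S that are similar to the interest I, i.e., Sim(S, I) = {Ij | Ij ∈ S and Ij ≈ I}. W12i and W21i are computed as follows:

\[ W_{12i} = \frac{\sum_{I \in \mathit{Sim}(S_1, I_{2i})} \mathit{Weight}(I)}{|\mathit{Sim}(S_1, I_{2i})|} \]

\[ W_{21i} = \frac{\sum_{I \in \mathit{Sim}(S_2, I_{1i})} \mathit{Weight}(I)}{|\mathit{Sim}(S_2, I_{1i})|} \]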

Let P12 be the vector representation of S1 with respect to S2, and let P21 be the vector representation of S2 with respect to S1. We define these vectors as follows:

• P12 = (W121, .., W12m)

• P21 = (W211, .., W21n)

The similarity condition is satisfied by S1 and S2 if and only if:

∀ Ii ∈ S1, ∃ Ij ∈ S2 such that Ii ≈ Ij, and

∀ Ii ∈ S2, ∃ Ij ∈ S1 such that Ii ≈ Ij

We define the similarity value between the sharing interests S1 and S2 as follows:

\[ \mathit{Similarity}(S_1, S_2) = \begin{cases} \dfrac{\cos(P_{12}, P_2) + \cos(P_{21}, P_1)}{2} & \text{if the main similarity condition is satisfied} \\ 0 & \text{otherwise} \end{cases} \]

Similarity(S1, S2) is commutative. Two sharing interests S1 and S2 are similar if and only if:

S1 ≈ S2 ⇔ Similarity(S1, S2) ≥ accC, where accC is a predefined similarity threshold.
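
A minimal Python sketch of this computation follows. It assumes a sharing interest is a list of (keywords, weight) pairs and, purely for illustration, uses a Jaccard overlap with threshold `acc_sim` as the interest similarity; all function names and the default threshold are hypothetical.

```python
import math


def jaccard(d1, d2):
    """Illustrative stand-in for the interest similarity Similarity(Ii, Ij)."""
    d1, d2 = set(d1), set(d2)
    return len(d1 & d2) / len(d1 | d2) if (d1 or d2) else 1.0


def cosine(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_similar_weight(s, description, acc_sim):
    """Average weight of the interests in s similar to `description` (None if Sim is empty)."""
    weights = [w for d, w in s if jaccard(d, description) >= acc_sim]
    return sum(weights) / len(weights) if weights else None


def sharing_similarity(s1, s2, acc_sim=0.5):
    p1 = [w for _, w in s1]                                       # P1, length n
    p2 = [w for _, w in s2]                                       # P2, length m
    p21 = [mean_similar_weight(s2, d, acc_sim) for d, _ in s1]    # P21, length n
    p12 = [mean_similar_weight(s1, d, acc_sim) for d, _ in s2]    # P12, length m
    if None in p21 or None in p12:
        return 0.0  # main similarity condition not satisfied
    return (cosine(p12, p2) + cosine(p21, p1)) / 2


s1 = [({"finance"}, 0.7), ({"tourism"}, 0.3)]
s2 = [({"tourism"}, 0.4), ({"finance"}, 0.6)]
print(sharing_similarity(s1, s2))
```

An empty Sim set for any interest means the main condition fails, which is why `None` entries short-circuit to 0.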

A sharing interest can be used as an information demand or as an information provision. A peer's information demand is a sharing interest containing the interests that describe the information the peer wishes to receive. An information provision contains the interests that describe the information the peer is ready to provide.


Information-Demand(p, pd, c) represents the information demand of peer p as observed by peer pd in context c. When p equals pd, Information-Demand(p, pd, c) is written Information-Demand(p, c).

Information-Provision(p, pd, c) represents the information provision of a peer p as observed by a peer pd in a context c. When p equals pd, Information-Provision(p, pd, c) is written Information-Provision(p, c).

Overall-Demand(P) represents the interests of a set of peers P that describe the information these peers wish to receive.

Overall-Provision(P) represents the interests of a set of peers P that describe the information these peers are ready to provide.

Users can state their interests manually. For example, a user can declare an interest in receiving jokes in the context Bus 37. User interests can also be computed automatically from the queries and advertisements recorded in the exchange history. Interests can likewise be determined using association rules.

An association rule for an information demand is written as follows:

<Context = c> ⇒ <Information-Demand = D>

Example: the rule below states that from 8:00 to 8:10, at any location, the user's information demand contains one interest related to finance (70%) and another related to tourism (30%).

<Context = ("", [8AM, 8:10AM])> ⇒ <Information-Demand = {({finance}, 0.70), ({tourism}, 0.30)}>
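
Once such rules have been mined, applying them amounts to a lookup by context. The sketch below is an assumption about how a rule set might be stored and queried, not the procedure of the thesis: times are minutes since midnight, "" stands for any location, and `RULES` and `demand_for` are hypothetical names.

```python
# Each rule maps a context (location, (start, end)) to an information demand,
# itself a list of (keywords, weight) pairs.
RULES = [
    (("", (8 * 60, 8 * 60 + 10)), [({"finance"}, 0.70), ({"tourism"}, 0.30)]),
]


def demand_for(rules, location, minute):
    """Return the demand of the first rule whose context covers (location, minute), else None."""
    for (rule_location, (start, end)), demand in rules:
        if rule_location in ("", location) and start <= minute <= end:
            return demand
    return None


print(demand_for(RULES, "Bus 27", 8 * 60 + 5))
```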

A requesting peer can use association rules to identify its own information demands. A data source can generate association rules to identify the information demands of requesting peers. However, rule extraction is too costly to be applied to every requesting peer encountered in a MANET. A data source must therefore choose the important peers for which association rules will be generated. We propose using the social ties of a data source to identify these important peers.

Because a data source may have many social ties, identifying the interests of its peers can still be costly. We therefore propose identifying the peers that habitually share information with a data source, placing them in social groups, and sorting them by the similarity of their sharing interests. The social groups created in this way are then used to identify the peers' interests.

1.2 Mobility Awareness

As discussed in the previous section, information sharing in a MANET is generally carried out by broadcasting advertisements and queries. The selection and dissemination of file advertisements are governed by an advertisement policy, which controls the volume of information in an advertisement, the period after which another advertisement should be made, and the number of peers an advertisement traverses. To avoid overloading the environment with unnecessary traffic, an advertisement policy must be designed according to users' information demands and provisions. These in turn depend on the users' context, their connection time, and the time they remain together in a MANET.

A MANET is defined as a collection of devices connected by wireless communication technologies. In practice, a MANET can be regarded as a set of peers. Two peers may have a direct connection or may be connected indirectly through other peers. They are called direct neighbours if they have a direct connection and multi-hop neighbours if they are connected indirectly through other peers.

Each peer has a local view of a MANET, called a MANET View. Suppose that every user on a bus carries a mobile device. There is then always a MANET on the bus. However, the MANET View of a particular user is bounded by the moments at which that user boards and leaves the bus. In addition to the time and location contexts, a peer's MANET View is limited by its knowledge: a peer's MANET View contains the set of peers with which it can communicate (directly or over multiple hops).


A MANET, denoted V(P), is a set of peers P communicating through wireless communication technologies. A MANET View, denoted V(P, p0), is a projection of a MANET defined using the knowledge of a peer p0.

For peers p1 and p2, let stay-time(p1, p2) be the estimated time that p1 and p2 remain connected. Connectivity-lifetime(V(P, p0)) is defined as the average time that peer p0 remains connected to the other peers in a MANET View V(P, p0).

Consider a MANET View V(P, p0) and a peer p0 ∈ P. A sharing statistic, denoted s(p0, c), describes the quantitative behaviour of the peers in the MANET View V(P, p0) in the sharing context c. A sharing statistic s(p0, c) consists of the following attributes:

• Hop(s(p0, c)): average distance between peer p0 and the other peers,

• Files-provisioned(s(p0, c)): average number of files provided by p0 to a peer,

• Queries-received(s(p0, c)): average number of queries received by p0,

• Usage-factor(s(p0, c)): number of files discovered and downloaded from p0 thanks to advertisements made by p0,

• Co-lifetime(s(p0, c)): the Connectivity-lifetime(V(P, p0)) described above.

A mobility class, denoted m(p, c), is a structure used by a peer p to describe a group of MANET-Views according to their connectivity lifetime in the abstract sharing context c. The idea underlying the notion of mobility class is that the same advertisement policy is applied in all MANET-Views described by the same mobility class. The important attributes of a mobility class are range-lifetimes(m(p,c)) and adv-policy(m(p,c)). range-lifetimes(m(p,c)), written [tmn, tmx), indicates that the connectivity lifetime of a MANET-View described by m(p,c) is greater than or equal to tmn and less than tmx. adv-policy(m(p,c)) is the advertisement policy applied in the MANET-Views described by m(p,c). It is defined by:

• Adv-volume(m(p,c)): maximum volume of an advertisement,

• Adv-radius(m(p,c)): maximum number of hops that an advertisement traverses, and

• Adv-period(m(p,c)): time after which an advertisement must be repeated.
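The structure of a mobility class and the lookup of the class that describes a MANET-View can be sketched as follows. This is a minimal sketch under stated assumptions: the class names m1 to m3, the lifetime bounds and the policy values are invented for illustration, and lifetimes are expressed in seconds.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MobilityClass:
    name: str
    t_min: float       # inclusive lower bound of range-lifetimes (tmn)
    t_max: float       # exclusive upper bound of range-lifetimes (tmx)
    adv_volume: int    # maximum volume of an advertisement
    adv_radius: int    # maximum number of hops an advertisement traverses
    adv_period: float  # time after which an advertisement is repeated


def classify(lifetime: float,
             classes: Sequence[MobilityClass]) -> Optional[MobilityClass]:
    """Return the class whose interval [t_min, t_max) contains the lifetime."""
    for mc in classes:
        if mc.t_min <= lifetime < mc.t_max:
            return mc
    return None


# Hypothetical classes; every value below is illustrative only.
CLASSES = [
    MobilityClass("m1", 0.0, 300.0, adv_volume=5, adv_radius=1, adv_period=60.0),
    MobilityClass("m2", 300.0, 1800.0, adv_volume=15, adv_radius=2, adv_period=300.0),
    MobilityClass("m3", 1800.0, float("inf"), adv_volume=40, adv_radius=3, adv_period=900.0),
]
```

Because the intervals are half-open, a lifetime that falls exactly on a boundary belongs to the upper class, which matches the [tmn, tmx) notation.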


A peer can identify a mobility class from the connectivity lifetime of a MANET-View, which is determined by the connection times of the peers.

In the manuscript, we describe how to use association rules to determine mobility classes in a given context. For example, the association rule below associates the mobility class m3 with the context "bus 3 between 8:00 and 8:10".

<Context = (Bus 3, [8:00-8:10])> ⇒ <mobility class = m3>

1.3 File classification

In this thesis, we propose to organize files hierarchically in structures called clusters. The structure containing the clusters is called a file tree. The root of the tree is an artificial cluster representing all shareable files.

A file tree is built bottom-up. First, the files are classified into clusters; the resulting groups are then classified into further clusters. Classification continues in this way until a tree of the required depth is obtained. Shareable files added after the classification has been computed are inserted into level-1 clusters.

We propose to compute the height of a file tree and the number of clusters defined at each depth from the mobility classes that are representative of the user's sharing activity. The representative mobility classes are those that show significant differences in terms of advertisement volume.

Let β > 1 be a threshold indicating that two advertisement volumes differ significantly. A mobility class mi is said to be significantly larger than a mobility class mj (written mi > mj) if and only if:

$$\frac{\text{adv-volume}(m_i)}{\text{adv-volume}(m_j)} \geq \beta$$

A mobility class mi is said to be significantly smaller than a mobility class mj (written mi < mj) if and only if mj > mi. Let nf be the number of shareable files, and let M be the set of mobility classes considered by a peer during information sharing. The list Mimp ⊂ M is called the set of representative mobility classes if and only if:

1. mi < mj or mj < mi, for all distinct mi, mj ∈ Mimp

2. for every m ∈ M − Mimp, one of the following conditions holds:

a. ∃ mi ∈ Mimp such that $\frac{\text{adv-volume}(m)}{\text{adv-volume}(m_i)} < \beta$,

b. $\frac{nf}{\text{adv-volume}(m)} < \beta$

3. for every m ∈ Mimp, one of the following conditions holds:

a. the mobility class lies inside the list, i.e., ∃ mi, mj ∈ Mimp − {m} such that mi < m < mj,

b. the mobility class lies at the end of the list, i.e., mi < m ∀ mi ∈ Mimp − {m} and β · adv-volume(m) ≤ nf,

c. the mobility class lies at the beginning of the list, i.e., m < mi ∀ mi ∈ Mimp − {m} and ∃ mj ∈ M such that m > mj

The height of the file tree is |Mimp|. The number of clusters at each depth k equals the advertisement volume attached to the mobility class at the k-th position of Mimp.
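The relation between representative mobility classes and the shape of the file tree can be sketched as follows. The sketch assumes that "significantly larger" is the ratio test on advertisement volumes with threshold β, and that Mimp is ordered by increasing advertisement volume, so that depth 1 has the fewest clusters. Both the ordering and the function names are assumptions of this sketch, not statements from the thesis.

```python
def significantly_larger(vol_i, vol_j, beta):
    """True if a class with volume vol_i is significantly larger than one with vol_j."""
    return vol_i / vol_j >= beta


def pairwise_significant(volumes, beta):
    """Check that every pair of distinct classes differs significantly.

    With beta > 1 it is enough to test neighbours in sorted order,
    because the ratios multiply along the sorted list.
    """
    ordered = sorted(volumes)
    return all(significantly_larger(b, a, beta)
               for a, b in zip(ordered, ordered[1:]))


def tree_shape(representative_volumes):
    """Return (height, clusters per depth) for the representative classes."""
    ordered = sorted(representative_volumes)  # assumed ordering of Mimp
    return len(ordered), {k: v for k, v in enumerate(ordered, start=1)}


height, clusters_per_depth = tree_shape([40, 5, 15])
```

With three representative classes advertising 5, 15 and 40 items, this sketch yields a tree of height 3 with 5, 15 and 40 clusters at depths 1, 2 and 3.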

2 SAMi

In this thesis, we propose a self-adaptive middleware called SAMi. Figure 27 illustrates the SAMi architecture. Every peer wishing to participate in the MANET must run the middleware. SAMi stores its management data in four databases: “local repository”, “advertisement data-store”, “MANET View data-store” and “rule base”.

“Local repository” and “advertisement data-store” contain file descriptions. “MANET View data-store” contains historical information about sharing activities. “Rule base” contains the association rules used to associate a context with a sharing interest or with a mobility class.

The middleware consists of three modules: (i) the “context manager”; (ii) the “advertisement manager”; and (iii) the “file manager”. A device may run one or several modules. The “file manager” module is, however, mandatory.

The “context manager” module determines mobility classes and sharing interests. It also determines users' information needs by analyzing their agendas, habits and query histories.

[Figure 27 diagram. Modules: Context Manager, Advertisement Management, File Manager (File Discovery, File Delivery, File Adaptation). Data stores: Advertisement data-store, Local repository, Personal data-store, Rule base, MANET-View data-store.]

Figure 27: Architecture of SAMi

The “File manager” module carries out file management through the “file discovery”, “file delivery” and “file adaptation” modules. The “file discovery” module searches for information sources; the “file delivery” module downloads files; and the “file adaptation” module converts file formats according to the context and the users' preferences.


Finally, the “Advertisement Manager” module informs other peers of the shareable files stored on the device of a data source. It determines the content and the distribution of advertisements according to the mobility class of a MANET-View and the interests of the peers participating in the MANET.

2.1 The ‘Context manager’ module

The “Context manager” module is responsible for determining the mobility class corresponding to a MANET-View. The mobility class describing a MANET-View is determined by analyzing the connectivity lifetime of that MANET-View. Mobility classes can also be determined with association rules. The rule below states that the MANET-View observed in a bus at any time (footnote 19) is described by the mobility class m3.

<context = (Bus, ∅)> ⇒ <mobility class = m3>

The “Context manager” module is also responsible for detecting information needs. A user's information needs are determined from the user's agendas and habits. For example, for a user who usually listens to songs during long trips and prefers songs by the singer Whitney Houston, SAMi starts searching for songs by this singer as soon as the user plans a trip.

A user can also describe the information needed to carry out the activities in his or her agenda. For example, the user can specify that documentation on “How to deal with businessmen” is needed for an agenda entry concerning a meeting with businessmen.

19 ∅ is used to denote any time.


Finally, a user's information needs are determined from his or her interests. For example, sports news is searched for if the user is identified as being interested in this type of information.

The “Context manager” module is also responsible for determining users' information provisions and information demands. Association rules (such as the rules below) are used to determine users' information provisions and information demands with respect to their context.

<context = (Bus, ∅)> ⇒ <information provision = {({Football}, 0.5), ({news}, 0.5)}>

<context = (Bus, ∅)> ⇒ <information demand = {({film}, 0.5), ({music}, 0.5)}>

The social groups present in the MANET-View can also be used to determine users' information provisions and information demands.

2.2 The ‘Advertisement Manager’ module

The “Advertisement manager” module is responsible for distributing advertisements to the neighbors of a peer. An advertisement message contains clusters found at some depth of the file tree. The current mobility class determines the volume of the advertisement. The overall demand of the peers determines the content of the advertisements, so that the advertised information is likely to interest them.

We propose to compute the content of advertisements from:

• the interests of the peers present in the MANET,

• the mobility class describing the MANET-View,

• and the location of the files in the file tree.

Let m be the mobility class describing the current MANET-View. Let Sod be the overall demand of the peers in the MANET-View (i.e., Sod is Overall-Demand(P) as defined in section 1.1, where P is the set of peers participating in the MANET-View). The advertisement quota for an interest I in Sod, denoted N(I), is computed as:

N(I) = weight(I) * adv-volume(m)
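The quota formula can be illustrated with a short sketch. The overall demand is represented here as a dictionary mapping each interest to its weight; rounding the product down to a whole number of advertisement entries is an assumption of this sketch, since the text does not say how fractional quotas are handled.

```python
import math


def advertisement_quotas(overall_demand, adv_volume):
    """Compute N(I) = weight(I) * adv-volume(m) for every interest I.

    overall_demand maps an interest name to its weight.
    Fractional quotas are rounded down (assumption of this sketch).
    """
    return {interest: math.floor(weight * adv_volume)
            for interest, weight in overall_demand.items()}


# Hypothetical overall demand with an advertisement volume of 10.
quotas = advertisement_quotas({"film": 0.5, "music": 0.25, "news": 0.25}, 10)
```

Rounding down can leave part of the advertisement volume unused; another rounding rule would distribute the remainder differently.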

Let F be a set of files and let Ck be the set of clusters found at depth k of the file tree. Let F(I) ⊆ F be the set of files corresponding to interest I, and let Ck(I) ⊆ Ck be the set of clusters corresponding to interest I. A file f and a cluster c are placed in F(I) and Ck(I), respectively, if and only if (i) f and c are similar to I; and (ii) for any other interest Ij in the overall demand, f and c are more similar to I than to Ij.

For an empty interest Ie, i.e., Description(Ie) = ∅, F(Ie) and Ck(Ie) are computed as follows:

• $F(I_e) = F - \bigcup_{I \in S_{od} - \{I_e\}} F(I)$, and

• $C_k(I_e) = C_k - \bigcup_{I \in S_{od} - \{I_e\}} C_k(I)$

Algorithm 5 determines the file and cluster metadata to be distributed in the environment. The algorithm works as follows. In lines 3 to 6, the metadata of all files in F(I) are selected if N(I) is large enough to advertise every file individually. Otherwise, the algorithm searches for a depth of the file tree at which the number of clusters is smaller than N(I). This depth is called k. If the search fails, the metadata of the most similar clusters at depth 1 are placed in ADV(I). Otherwise, as described in lines 15 to 22, the metadata of the most relevant clusters found from depth k to depth h are placed in ADV(I), according to their position in the file tree and their similarity to interest I. After all these clusters have been examined, some file metadata may also be placed in ADV(I).


Algorithm: Preparation of advertisement messages
Input: h, Sod, F(I) ∀I ∈ Sod, Ck(I) for 0 < k ≤ h and each I ∈ Sod
    h: height of the file tree
    Sod: the overall demand
    F(I): files corresponding to interest I
    Ck(I): clusters corresponding to interest I and found at depth k
Output: ADV(I) for every I ∈ Sod
    ADV(I): advertisement metadata with respect to interest I ∈ Sod
Begin
1.  For each I ∈ Sod
2.      ADV(I) = ∅
        /* Select the metadata of all files if N(I) is large enough to advertise them one by one */
3.      If (N(I) ≥ |F(I)|)
4.          ADV(I) = {metadata(f) | f ∈ F(I)}
5.          Exit
6.      End If
        /* Search for the depth at which there are fewer than N(I) clusters */
7.      k = h
8.      While ((k > 0) && (N(I) ≤ |Ck(I)|))
9.          k--
10.     End While
        /* If no depth has fewer than N(I) clusters, select some clusters at depth one.
           Relevant(C, I, n): contains the n clusters of C most similar to interest I */
11.     If (k == 0)
12.         ADV(I) = {metadata(c) | c ∈ Relevant(C1(I), I, N(I))}
13.         Exit
14.     End If
        // Select clusters according to their depth in the file tree
15.     While ((|ADV(I)| < N(I)) && (k ≤ h))
16.         If (N(I) − |ADV(I)| ≥ |Ck(I)|)
17.             ADV(I) = {metadata(c) | c ∈ Ck(I)} ∪ ADV(I)
18.         Else
19.             ADV(I) = {metadata(c) | c ∈ Relevant(Ck(I), I, N(I) − |ADV(I)|)} ∪ ADV(I)
20.         End If
21.         k++
22.     End While
        /* Select files if there is still free space in ADV(I).
           Relevant(F, I, n): contains the n files of F most relevant (i.e., similar) to interest I */
23.     If (|ADV(I)| < N(I))
24.         ADV(I) = {metadata(f) | f ∈ Relevant(F(I), I, N(I) − |ADV(I)|)} ∪ ADV(I)
25.     End If
26. End For
End Algorithm

Algorithm 5: Preparation of advertisement messages
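For readers who prefer executable code, the selection performed by Algorithm 5 for a single interest I can be sketched in Python. This is a sketch under stated assumptions and not the SAMi implementation: items are represented by their names instead of metadata records, the similarity to I is supplied as a dictionary of scores, and the names `relevant`, `prepare_advertisement`, `clusters_by_depth` and `similarity` are hypothetical.

```python
def relevant(items, similarity, n):
    """Return the n items most similar to the interest (Relevant in Algorithm 5)."""
    ranked = sorted(items, key=lambda item: similarity[item], reverse=True)
    return ranked[:max(n, 0)]


def prepare_advertisement(quota, files, clusters_by_depth, height, similarity):
    """Select what to advertise for one interest I, given its quota N(I).

    files: F(I); clusters_by_depth[k]: Ck(I) for 1 <= k <= height.
    """
    # Lines 3-6: advertise every file individually if the quota allows it.
    if quota >= len(files):
        return list(files)

    # Lines 7-10: look for a depth with fewer than `quota` clusters.
    k = height
    while k > 0 and quota <= len(clusters_by_depth.get(k, [])):
        k -= 1

    # Lines 11-14: no such depth, keep the best clusters at depth 1.
    if k == 0:
        return relevant(clusters_by_depth.get(1, []), similarity, quota)

    # Lines 15-22: add clusters from depth k up to depth `height`.
    adv = []
    while len(adv) < quota and k <= height:
        level = clusters_by_depth.get(k, [])
        free = quota - len(adv)
        adv.extend(level if free >= len(level) else relevant(level, similarity, free))
        k += 1

    # Lines 23-25: fill any remaining space with the most relevant files.
    if len(adv) < quota:
        adv.extend(relevant(files, similarity, quota - len(adv)))
    return adv


# Hypothetical data: six files, one cluster at depth 1, three at depth 2.
files = ["f1", "f2", "f3", "f4", "f5", "f6"]
clusters = {1: ["c1"], 2: ["c2a", "c2b", "c2c"]}
scores = {"c1": 0.8, "c2a": 0.9, "c2b": 0.2, "c2c": 0.5,
          "f1": 0.1, "f2": 0.2, "f3": 0.3, "f4": 0.4, "f5": 0.5, "f6": 0.6}
selection = prepare_advertisement(3, files, clusters, 2, scores)
```

With a quota of 3, the sketch keeps the single depth-1 cluster and completes the advertisement with the two most similar depth-2 clusters.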

After computing ADV(I), the data source forwards it to those direct neighbors that satisfy one of two conditions: (i) the neighbor lies in the direction of peers whose information demand corresponds to interest I; or (ii) the neighbor has a high degree of collaboration with the data source. In the first case, the method proposed by the LAR routing algorithm [81] is used to select neighbors according to their location. In the second case, neighbors are determined from their sharing history. A peer that accepts the advertisement retransmits it in the same way.

A data-source peer eventually retransmits the advertisement after a period denoted adv-period(m). In the meantime, the peer tries to improve its knowledge of the connection times of the peers in the MANET-View and to refine the mobility class describing the MANET-View.

2.3 The ‘File Manager’ module

The “File-Manager” module is responsible for discovering and downloading the files corresponding to a query in two phases: (i) information discovery; and (ii) information download. The discovery phase identifies the peers that hold files corresponding to the query. The download phase retrieves the files.

F(q) and C(q) denote, respectively, the files and the clusters corresponding to a query q. The query q is described by a list of keywords. A file f and a cluster c are placed in F(q) and C(q), respectively, if they are similar to the query q.

First, files in F(q) that cannot be provided are removed. Likewise, clusters in C(q) are removed if the files grouped in them cannot be discovered. Files are placed in F(q) according to their relevance to q. If the number of selected files is insufficient, the most relevant clusters in C(q) are selected as potential sources of files corresponding to q, and discovery messages are sent to these potential sources. Discovery messages are also sent to the peers whose information provision corresponds to q.


Once the information discovery phase is complete, the information download phase begins. The purpose of this phase is to choose one or more information sources from which to download a file. The download proceeds as follows:

• SAMi looks for information sources that can deliver the whole file. If several peers can do so, SAMi selects one according to its distance from the requesting peer and the time the two peers will remain together.

• If no peer is able to transfer the whole file, SAMi looks for a combination of peers (p1, p2, …, pk) such that pi provides a portion of the file (called sfi) and merging (sf1, sf2, …, sfk) yields the requested file.
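The two download cases above can be sketched as follows. Several points are assumptions of this sketch rather than statements from the thesis: a file is modeled as a fixed number of indexed portions, candidate sources are ranked by fewer hops first and longer stay-time second, and the combination of peers is found with a simple greedy cover. The names `Source` and `choose_sources` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Source:
    peer: str
    hops: int              # distance to the requesting peer
    stay_time: float       # estimated time the two peers remain connected
    parts: FrozenSet[int]  # indices of the file portions the peer can deliver


def choose_sources(sources: List[Source],
                   num_parts: int) -> Optional[Dict[str, FrozenSet[int]]]:
    """Return a plan {peer: portions to request}, or None if no plan exists."""
    wanted = frozenset(range(num_parts))
    # Assumed ranking: nearest peer first, then the one that stays longest.
    ranked = sorted(sources, key=lambda s: (s.hops, -s.stay_time))

    # Case 1: a single peer can deliver the whole file.
    for source in ranked:
        if source.parts >= wanted:
            return {source.peer: wanted}

    # Case 2: combine peers so that their portions cover the file.
    plan: Dict[str, FrozenSet[int]] = {}
    missing = set(wanted)
    for source in ranked:
        take = missing & source.parts
        if take:
            plan[source.peer] = frozenset(take)
            missing -= take
    return plan if not missing else None
```

The greedy cover does not minimize the number of peers involved; it only illustrates how portions sf1, …, sfk from different peers can be assembled into the requested file.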

3 Conclusion and perspectives

In this thesis, we propose a theoretical model of an information sharing system adapted to MANETs, able to discover files according to the interests of users and the dynamicity of the network. We also propose a method for organizing files in a tree in order to facilitate file discovery. To implement the proposed theoretical model, we describe a self-adaptive middleware called SAMi.

Currently, SAMi allows nomadic users to share files subject to access rights. In the future, we plan to extend SAMi so that file advertisements are distributed according to the rights of the peers.

In this thesis, a file tree is built with an unsupervised classification technique in order to facilitate file discovery. In the future, we plan to use an ontology-based technique to enrich the proposed unsupervised technique. We also plan to use users' interests to optimize file classification.


Annex B. Detailed Design of SAMi

State Diagram

In the middleware, a user and a device have states as shown in the state diagrams displayed

in Figure B-1 and Figure B-2.

Figure B-1: State of a device

A device has four main states: isolated-idle, isolated-busy, inMANET-idle and inMANET-busy. The prefixes isolated and inMANET indicate that a device is outside or inside a MANET, respectively. The suffix idle indicates that no program is running on the device, while the suffix busy indicates that programs are running.

A user has two important states: idle and busy. A user can be interrupted in the idle state.


Figure B-2: States of a user

Activity Diagram

Advertisement (Figure B-3) is performed when a peer enters a MANET, using the activities listed in Table B-1. The advManager determines the advertisement policy (period, content and radius of the advertisement), computes the advertisement time and prepares the advertisement message. When the advertisement time arrives, the messenger object distributes the advertisement message. This process is repeated until the device leaves the network.

In SAMi, the information needs of a user can be identified from the user's queries, as shown in Figure B-4, and from the user's agenda, as shown in Figure B-5, using the activities mentioned in Table B-1. A query entered by the user is searched immediately if the device is in a MANET and is idle. Otherwise, the query is passed to the query manager for later treatment.

When a user enters an agenda, the information manager extracts a query in order to search for the documents needed to accomplish the agenda. If the device state is inMANET-idle and either the query must be treated urgently (the agenda item is planned within a few hours) or the query fits the context of the environment (the interests of the users in the MANET match the query), the query is treated immediately; otherwise, it is treated later.

[Figure B-3 diagram: Advertisement. Swimlanes: Messenger, ContextManager, AdvManager, Device. Guards: [enters in a MANET], [it is TimeToAdv]. Activities: determine profile, determine adv-policy, prepare advMessage, calculateTimeToAdv, sendAdvMessage.]

Figure B-3: Activity diagram of advertisement

Table B-1: Important activities to perform advertisement

| Activity | Description |
|---|---|
| Determine profile | Calculates the interests and the mobility class of users |
| Determine Adv policy | Determines the period, the content and the radius of advertisement with respect to users' interests and mobility class |
| Prepare advMessage | Prepares the advertisement message |
| calculateTimeToAdv | Determines the time to advertise as the current time plus a random number between zero and the period of advertisement |
| sendAdvMessage | Distributes the advertisement in the vicinity |
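The calculateTimeToAdv activity of Table B-1 can be sketched in a few lines. The function signature, the use of seconds and the injectable random generator are assumptions of this sketch; the table only states that the advertisement time is the current time plus a random number between zero and the advertisement period.

```python
import random


def calculate_time_to_adv(now, adv_period, rng=random):
    """Return the next advertisement time: now plus a random delay in [0, adv_period]."""
    return now + rng.uniform(0.0, adv_period)


# Hypothetical use with a seeded generator so that runs are repeatable.
next_adv = calculate_time_to_adv(100.0, 30.0, random.Random(0))
```

Drawing the delay at random spreads the advertisements of peers that join a MANET at the same moment, so that they do not all transmit simultaneously.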


[Figure B-4 diagram: Information extraction from user. Swimlanes: Device, InfManager, user. Guards: [Enter a query], [state = inMANET-idle], [state != inMANET-idle], search is successful, search isn't successful. Activities: Search file, Treat a query later.]

Figure B-4: Searching information for a user query

As shown in Figure B-6, queries that have been kept for later treatment are searched when the device enters another MANET. The query whose deadline is approaching is treated first. The query that matches the information provision capacity of the user is treated next.
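The order in which deferred queries are taken up can be sketched as follows. The representation of a query as a dictionary, the `urgency_window` parameter used to decide that a deadline is approaching, and the matching of query categories against the provisions available in the vicinity are all assumptions of this sketch.

```python
def next_query(pending, now, urgency_window, vicinity_provisions):
    """Pick the next deferred query: urgent queries first, contextual ones next.

    pending: list of dicts with keys "id", "deadline" and "categories" (a set).
    A query is urgent if its deadline falls within urgency_window of now.
    """
    urgent = [q for q in pending if q["deadline"] - now <= urgency_window]
    if urgent:
        # Treat the query with the closest deadline first.
        return min(urgent, key=lambda q: q["deadline"])
    contextual = [q for q in pending if q["categories"] & vicinity_provisions]
    return contextual[0] if contextual else None


# Hypothetical deferred queries.
pending = [
    {"id": "q1", "deadline": 5000.0, "categories": {"music"}},
    {"id": "q2", "deadline": 900.0, "categories": {"film"}},
    {"id": "q3", "deadline": 600.0, "categories": {"news"}},
]
chosen = next_query(pending, now=0.0, urgency_window=1000.0,
                    vicinity_provisions={"music"})
```

When no query is urgent and none matches the provisions of the vicinity, the sketch returns None, which corresponds to the "no query to treat" guard of Figure B-6.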


[Figure B-5 diagram: Extract information need from Agenda. Swimlanes: InfManager, Device, User. Guards: [agenda is entered], [need information for the agenda], [Nothing is needed], [state = inMANET-idle], [State != inMANET-idle], [query goes with environmental context], [urgent query], [Other queries], [search is successful], [Search is not successful]. Activities: extract Query, search file, Treat a query later.]

Figure B-5: Activity diagram of information extraction

[Figure B-6 diagram: Query treatment. Swimlanes: InfManager, Device. Guards: [state = INMANET-idle], [state != INMANET-idle], [no query to treat], there are queries, no urgent query, [search is successful], [Search is not successful]. Activities: take an urgent query, take a contextual query, search a file, treat a query later.]

Figure B-6: Activity diagram of query treatment


Table B-2: Activities to extract and search information

| Activity | Description |
|---|---|
| Search file | Searches for a file expressed by a query |
| Treat a query later | Stores a query for later treatment |
| Extract a query | Extracts a query from a user agenda or habit |
| Take urgent query | Selects a query that will expire before the peer joins another MANET |
| Take contextual query | Selects a query that matches the information provisions of users in the vicinity |

An advertisement can be used to identify information sources for a query, as shown in Figure B-7. The file indicated by the advertisement is downloaded if it does not exist locally and matches a query.

[Figure B-7 diagram: Usage of Advertisement. Swimlanes: InfManager, Messenger. Guards: [advertisement is accepted], [the advertisement is for a file], matches a query, matches with historical query. Activity: download file.]

Figure B-7: File searching from incoming advertisement

Rule identification is done offline, as shown in Figure B-8. When a device is in the isolated-idle state, the class rule-miner estimates the time that the peer will stay in that state (calculate life-in-state). If this time is sufficient to mine rules, rule mining is performed.


[Figure B-8 diagram: Rule mining. Swimlanes: Rule Miner, Device. Guards: [device is isolated-idle], [life-in-state < mining-time]. Activities: calculate life-in-state (the time that a node stays in this state), mine-rules.]

Figure B-8: Activity diagram of rule mining

Like rule identification, file classification and representation are done offline. As shown in Figure B-9, when the middleware starts, it represents the files, classifies them into clusters and then represents the clusters. A new file is grouped under the leaf cluster that is most similar to it. When a tree becomes unbalanced, it is modified to restore balance. The tree can be modified when the environmental context changes. Like file classification, tree modification is done offline.

Figure B-9: Activity diagram of file representation and classification


Sub-system Decomposition

The component SAMi-adaptor (Figure B-10) contains only one package. It passes inputs entered through another messenger application to SAMi-basic and displays the output produced by SAMi-basic using the interface provided by that messenger. The main component of SAMi-adaptor is the interface plug-in.

Figure B-10: A SAMi-adaptor for Yahoo Messenger

SAMi-thin (Figure B-11) allows thin devices to participate in the information exchange. It is composed of three packages: Login, Collaborator and UserInterface. Note that these packages are not unique to this component. They can also be used, with or without a mediator, by the other components.

The Login package verifies that only an authorized user accesses the middleware. The UserInterface package accepts the basic inputs of SAMi, i.e., a query, a user profile and an agenda. The Collaborator package asks other peers in the vicinity to search for information on behalf of a user who owns a thin device.


Figure B-11: SAMi-thin

As shown in the diagram displayed in Figure B-12, the SAMi-GUI component contains a package called UserInterface, which is also a part of the component SAMi-thin. The package contains four interfaces and four classes that implement the interfaces. The class guiFileIO is used to accept a query and to display a query recommendation, advertisements and recently downloaded files. The class uiUserIO is used to accept a user preference, state, agenda and profile. The class envIO is used by the administrator to configure mobility classes. The class guiMain is used by a user to navigate from one interface to another.

Figure B-12: SAMi GUI


SAMi-GUI is implemented by extending the user interface classes of J2ME. It consists of the classes displayed in Figure B-13.

[Figure B-13 diagram. Classes: mainMenu, browseMenu, aboutForm, settingMenu, searchForm, advForm, ruleForm, tempStoreForm, HistForm, habitForm, agendaForm, prefForm, mobilityForm.]

Figure B-13: Classes in SAMi-GUI

The component SAMi-core (Figure B-14) is composed of the three packages that access the advertisement data-store, the MANET-View data-store and the local repository. The advertiser package is responsible for distributing and managing advertisements according to the context of a user and the environment. The inf-Manager package searches for and downloads files according to a user's query, agenda and habits.


Figure B-14: SAMi-core

SAMi-ext (Figure B-15) is composed of two packages: ruleExtractor and extInfMang. The ruleExtractor package identifies rules by analyzing the historical data-store and stores the resulting rules in the rule base. The extInfManager package is used to classify files into clusters, represent files and clusters in vector space, and manage file adaptation.

Figure B-15: SAMi-ext


Annex C. Important classes of SAMi

Inf-Manager

lstFiles: the list of metadata of shareable files.

lstCluster: the list of metadata of clusters found in each depth of the file tree.

THeight: the height of the file tree.

resp-limit: the maximum number of files returned for a query.

numUploads: the number of files sent to the neighbors in the current session.

processQuery(): identifies files that match a query and prepares a response.

prepareReponse(): prepares a response for a query.

removeException(): removes the files that are identified as exceptions by a query.

searchByTitle(): searches files according to their title.

searchbyCategory(): searches files according to their category.

getTHeight(): returns the height of the file tree.

mapFileToInterest(): identifies files that match the interest of a user.

mapFileByCategory(): identifies files that have, or are similar to, a given category.

mapFileByTitle(): identifies files whose title is, or is similar to, the title passed as an argument.

mapClusterToInterest(): identifies clusters that match the interest of a user.

mapClusterByCategory(): identifies clusters containing files that have, or are similar to, a given category.

mapClusterByTitle(): identifies clusters whose title is, or is similar to, the title passed as an argument.

nearer(): returns the interest that is most similar to a given file.

uploadFile(): sends the whole or a part of a file.

isExist(): returns the file that has the given metadata.

Discovery-Manager

lstDiscovery: a list of discovery objects that can be used to discover files for a query.

maxFile: the maximum number of files searched for a query.

searchFile(): searches files and their sources by creating a discovery object.

cleanDiscovery(): removes a discovery object and registers the query handled by the object in the historical data-store.

accpetResponse(): accepts a response and hands it to the appropriate discovery object.

searchDiscovery(): finds the discovery object that collects responses for a given query.


File Discovery

maxfile: the maximum number of files that can be discovered for a query.

discoveryDeadline: the latest time by which files should be discovered.

q: the query for which information is discovered.

lstInf: a list of files discovered for a query.

lstRsp: a list of responses accepted for a query.

searchFile(): searches files for a query by distributing discovery messages and by consulting the advertisement data-store.

setMetafile(): adds the metadata of a file and the source of the file to lstInf.

distributeMessage(): distributes discovery messages to potential sources.

getInfDis(): returns the attribute lstInf.

searchAgain(): performs a further search.

approvalDelivery(): passes lstInf to the deliveryManager object.

getQueryID(): returns the id of the query that the object is dealing with.

acceptResponse(): accepts a response to the distributed message.


DeliveryManager

download: assigns objects to download a given list of files.

cleanDelivery: removes a delivery object.

acceptDelivery: transfers a portion of a file to an appropriate delivery object.

searchDelObject: searches a delivery object that deals with the file specified by a given

metadata.

File Delivery

maxSource: the maximum number of sources from which the file can be downloaded.

Meta: the metafile considered by the object.

lstReq: the number of delivery-requests prepared by the object.

lstInf: the downloaded parts of a file with their owners.

lstSources: a list of the profiles of sources of the file with the metafile referred.

downloadFile(): searches list of sources to download the required file.

downloadPartially(): downloads some parts of the file.

acceptFilePortions(): accepts a portion of a file.

distributeMessageDelivery(): distributes delivery messages to the sources of the file that the object is dealing with.

mergResult(): merges the portions of the file.

canDownLoadFull(): checks whether the file that the object deals with can be downloaded using the given list of sources.

searchRequest(): searches for a request that was sent to a given source.

Response

queryID: the identifier of the query that the response answers.

numFile: the number of files matching the query.

lstFiles: metadata of the files matching the query.

simValues: list of similarity values, where the i-th value indicates the similarity between the i-th file and the query.

Query

queryID: the identifier of a query.

title: the title of the file that the query concerns.

catagories: the categories of the file to be searched.

deadline: the time after which no file should be searched or downloaded for the query.

exceptions: the identifiers of the files that the query excludes.

setDeadline(): assigns the deadline of the query.

Download Request

meta: the metadata of the file to be downloaded.

sourceID: the identifier of the peer from which parts of the file will be downloaded.

requestTime: the time when the request is distributed.

divisionBy: the number of parts into which the file is divided.

requestedPart: the part of the file that will be downloaded from the peer referred to by the object.

FileAdv

meta: the description of a file

owners: the identifiers of peers that have advertised the file referred by the object


ClusterAdv

meta: the metadata of a cluster

owners: the identifiers of peers that have advertised the cluster referred to by the object

Adv-Manager


radius: the number of hops that the advertisement traverses

period: the time interval between two successive advertisements

numAdv: the volume of the advertisements

advCont: the content of the advertisement

advTimeThershold: the latest time until which the distribution of the advertisement can be delayed

IDAdvFile: the identifiers of the files that have been advertised to the current neighbors

IDAdvCluster: the identifiers of the clusters that have been advertised to the current neighbors

prevAdv: advertisements that have been made in recent history

intializeAdv(): sets the attributes of the object according to the mobility class

setNumAdv(): recalculates the volume of advertisements in relation to the usability of the previous advertisements

scheduleAdv(): schedules the advertisement

prepareCont(): initializes the content of advertisement

setCont( ): selects the files and clusters to be advertised

sendAdv(): sends advertisement to a neighbor

setIDFilesClustersAdvertised(): determines the identifiers of the files and clusters that have been advertised to the current neighbors

addFile(): adds files to be advertised

addIsolatedFile(): adds a file that is classified under no cluster that the adv-manager is aware of

addCluster(): adds clusters to be advertised

addNonIsolatedFile(): adds a file that is classified under a cluster that the adv-manager is aware of

createBalanceDoc(): ensures, as far as possible, that the numbers of files/clusters selected for the different interests do not differ significantly

resolveConflict(): makes the sets of files matching two interests disjoint

setIDFileCluster(): determines the identifiers of the files/clusters advertised to neighbors

PositiveDoc

isoFiles: files that match an interest and are classified under no cluster in the attribute clusters

nonIsoFiles: files that match the same interest as the files in isoFiles and are classified under at least one cluster in the attribute clusters

clusters: clusters that match the same interest as the files in isoFiles

THeight: the height of the file tree

numAdv: the volume of advertisement

intialize(): initializes the attributes isoFiles, THeight and numAdv

addClusters(): adds clusters to the attribute clusters and moves the files classified under these clusters from isoFiles to nonIsoFiles

getNumAdv(): returns the volume of advertisements

getNumFiles(): returns the number of files that are referred by the object

getNumCluster(): returns the number of clusters that are referred by the object

getFiles(): returns the IDs of files referred by the object

getNumIsoFiles(): returns the number of isolated files

getNumNonIsoFiles(): returns the number of non-isolated files

getIsoFiles(): returns the IDs of the isolated files

getNonIsoFiles(): returns the IDs of the non-isolated files

getNumIsoClusters(): returns the number of isolated clusters, i.e. the clusters that are referred to by the object and are classified under no cluster referred to by the object

getNumNonIsoClusters(): returns the number of non-isolated clusters, i.e. the clusters that are referred to by the object and are classified under at least one cluster referred to by the object

getIsolatedClusters(): returns the IDs of the isolated clusters referred to by the object

getNonIsolatedClusters(): returns the IDs of the non-isolated clusters referred to by the object
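The bookkeeping performed by addClusters(), which moves files from isoFiles to nonIsoFiles, can be sketched as follows. This is a simplified Python illustration: representing each cluster as a mapping from a cluster ID to the set of file IDs classified under it is an assumption of the example, and attributes such as THeight and numAdv are omitted.

```python
class PositiveDoc:
    """Hypothetical, reduced stand-in for the PositiveDoc class described above."""

    def __init__(self, iso_files):
        self.iso_files = set(iso_files)   # matching files under no known cluster
        self.non_iso_files = set()        # matching files under at least one known cluster
        self.clusters = {}                # cluster ID -> IDs of the files under it (assumed)

    def add_clusters(self, clusters):
        for cluster_id, members in clusters.items():
            self.clusters[cluster_id] = set(members)
            # Files now covered by a cluster stop being isolated.
            moved = self.iso_files & set(members)
            self.iso_files -= moved
            self.non_iso_files |= moved

    def get_num_files(self) -> int:
        return len(self.iso_files) + len(self.non_iso_files)


doc = PositiveDoc(["f1", "f2", "f3"])
# "f9" belongs to the cluster but does not match the interest, so it is not tracked.
doc.add_clusters({"c1": {"f2", "f9"}})
```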

PositiveCluster

isolatedClusters: a list of clusters that match an interest, are found at the same depth, and are classified under no cluster matching the same interest

nonIsolatedClusters: a list of clusters that match the same interest as the clusters in isolatedClusters, are found at the same depth, and are classified under another cluster matching the same interest

addCluster(): adds the ID of a cluster to isolatedClusters

getIsolated(): returns the IDs of the isolated clusters in isolatedClusters

removeIsoCluster(int at): removes a cluster from isolatedClusters

addNonIsoCluster(): adds the ID of a cluster to nonIsolatedClusters

getNumClusters(): returns the number of clusters

getNumIsolated(): returns the number of isolated clusters

getNumNonIsolated(): returns the number of non-isolated clusters

getIsolated(): returns the IDs of the clusters in isolatedClusters

ProQueryManager

maxNum: the maximum number of proactive queries.

lstProQuery: the list of proactive queries.

maxTime: the maximum time that a proactive query can be kept for approval.

setQueryForApproval(): adds a proactive query in lstProQuery.

deleteQuery(): deletes a proactive query.

cleanProQuery(): deletes proactive queries that have been waiting for approval longer than maxTime.

approvesQuery(): starts searching for files for the query approved by a user.

TempFileManager

tempFolderPath: the path of the folder where files can be stored temporarily.

totalSize: the maximum size of memory that can be occupied by temporary files.

occupiedSize: the actual size of memory occupied by the temporary files.

lstTemfile: the metadata of the files stored temporarily.

maxTime: the maximum lifetime of a temporary file.

downloadFileTemporarly(): downloads files temporarily.

deleteTemporarly(): deletes temporary files that are disapproved by a user.

approvedTempory(): moves a file to the location indicated by a user upon approval.

cleanTempory(): deletes unapproved files whose lifetime exceeds maxTime.
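The interplay of totalSize, occupiedSize and maxTime can be sketched in Python as follows. The sketch only tracks metadata (no files are written), uses numeric timestamps, and rejects a download that would exceed totalSize; these choices, like the snake_case names, are assumptions of the example rather than details taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class TempFile:
    file_id: str
    size: int             # size in bytes
    downloaded_at: float  # timestamp of the temporary download


class TempFileManager:
    """Hypothetical, metadata-only stand-in for the TempFileManager class."""

    def __init__(self, total_size: int, max_time: float):
        self.total_size = total_size
        self.max_time = max_time
        self.lst_temp_file = {}  # file_id -> TempFile

    @property
    def occupied_size(self) -> int:
        return sum(f.size for f in self.lst_temp_file.values())

    def download_file_temporarily(self, temp_file: TempFile) -> bool:
        # Refuse the file if it would exceed the space reserved for temporary files.
        if self.occupied_size + temp_file.size > self.total_size:
            return False
        self.lst_temp_file[temp_file.file_id] = temp_file
        return True

    def clean_temporary(self, now: float):
        # Drop unapproved files that have outlived max_time; return their IDs.
        expired = [fid for fid, f in self.lst_temp_file.items()
                   if now - f.downloaded_at > self.max_time]
        for fid in expired:
            del self.lst_temp_file[fid]
        return expired


manager = TempFileManager(total_size=100, max_time=60.0)
accepted_a = manager.download_file_temporarily(TempFile("a", 70, 0.0))
accepted_b = manager.download_file_temporarily(TempFile("b", 50, 10.0))
accepted_c = manager.download_file_temporarily(TempFile("c", 30, 50.0))
removed = manager.clean_temporary(now=90.0)
```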

AdvStore

lstFileAdv: a list of the advertised files.

lstClstAdv: a list of the advertised clusters.

lstPlatAdv: a list of the advertised devices’ profiles.

timeTolerance: the time during which the advertisements of expired/disconnected peers are tolerated.

uselessTime: the minimum stay-time that a peer should have in order to be considered during query resolution.

memLimit: the maximum amount of memory allowed for storing advertisements.

addAdv(): adds advertisements to the advertisement data store.

modifyDeviceProfile(): modifies a device’s profile.

adjustAdv(): deletes some advertisements according to the time and memory constraints.

deleteFileAdv(): deletes the file advertisements belonging to a peer with a given ID.

deleteClusters(int dID): deletes the cluster advertisements belonging to a peer with a given ID.

searchFile(): searches for files that match a given query.

searchPotential(): searches for potential sources of the file that a query concerns.

Sharing-history

t: the time context of a MANET-view.

loc: the location context of a MANET-view.

lstLT: a list of peers’ stay-times.

lstDemands: a list of users’ information demands in the MANET-View.

lstProvisions: a list of users’ information provisions in the MANET-View.

lstFileDist: the metadata of files that have been advertised in the view.

lstQueries: queries that have been distributed in the view.

getInterestsFromInfExc(): extracts interests from the files exchanged.

getInterestsFromQueries(): extracts interests from queries.

HistDataDistributed

distID: an identifier of the file that has been distributed.

title: the title of the file.

category: the category of the file.

frequency: the number of times that the file has been distributed.

MANETID: the identifiers of the MANET-Views where the distribution has been done.

addFrequency(): increments the distribution frequency.

mergUnder(): merges a given historical record with the one referred to by the object.

getSimValue(): calculates the similarity between the file that the object refers to and a given file.

histMANET

MID: an identifier of a MANET-View.

location: the location context of the view.

time: the time context of the view.

Overall-demand: the overall information demand of peers in the view.

Overall-provision: the overall information provision of peers in the view.

avgLT: the average stay-time of peers in the view.

MetaFile

ID: the identifier of a file.

title: the title of the file.

categories: the categories of the mentioned file.

Device Profile

Id: the identifier of the device referred by the object.

Stay-time: the time during which the device remains connected to the peer under consideration.

modTime: the time at which the referred device sent its stay-time.

X: the x position of the device.

Y: the y position of the device.

getTTL(): returns the time after which the device is unreachable.

modifyStayTime(): changes the values of the Stay-time and modTime attributes.
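One plausible reading of getTTL() is sketched below in Python. It assumes that the stay-time counts from the moment the device announced it (modTime), so that the device is considered unreachable after modTime plus Stay-time; this formula, the numeric timestamps, and the is_reachable() helper are assumptions of the example and are not stated in the description above.

```python
from dataclasses import dataclass


@dataclass
class DeviceProfile:
    """Hypothetical stand-in for the Device Profile class described above."""
    device_id: str
    stay_time: float  # announced remaining connection time
    mod_time: float   # time at which the stay-time was sent
    x: float = 0.0
    y: float = 0.0

    def get_ttl(self) -> float:
        # Assumed rule: the device is expected to leave at mod_time + stay_time.
        return self.mod_time + self.stay_time

    def is_reachable(self, now: float) -> bool:
        return now <= self.get_ttl()

    def modify_stay_time(self, stay_time: float, mod_time: float) -> None:
        # A fresh announcement replaces both values together.
        self.stay_time = stay_time
        self.mod_time = mod_time


profile = DeviceProfile("d1", stay_time=30.0, mod_time=100.0)
ttl_before = profile.get_ttl()
profile.modify_stay_time(stay_time=45.0, mod_time=125.0)
ttl_after = profile.get_ttl()
```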

User habit

rTime: the time at which a user usually performs the habit referred to by the object.

duriation: the duration of the activity referred to by the habit.

lstReqDocs: the documents needed to perform the habit.

location: the location where a user performs the activities referred to by the habit.

User Agenda

rTime: the time at which a user performs the agenda referred to by the object.

duriation: the duration of the agenda.

activities: activities included in the agenda.

lstReqDocs: the documents needed to perform the agenda.

location: the location where the agenda is performed.

Interest

description: the description of a user interest.

location: the locations on which the interest depends.

now: marks if the interest is only used for the current MANET-View.

always: marks if the interest is applicable anywhere and anytime.

TimeFrom: the time from which the interest is applicable.

TimeTo: the time after when the interest is no longer applicable.

histQuery

queryID: the identifier of a query.

title: titles of files that have been discovered for the query.

catagory: categories of files that have been discovered for the query.

frequency: the number of times that a user poses the query.

MANETID: the identifiers of the MANET-views where the query has been posed.

addFrequency(): increments the usage frequency of the query.

mergUnder(): merges a given query with the one referred to by the object.

getSimValue(): calculates the similarity between the query that the object refers to and a given query.

getProQuery(): prepares a proactive query.

DocDescriptors

Description: a list of words describing a cluster or a file.

maxKeywords: the maximum number of keywords used to describe a cluster/file.

addWord(): adds a word to lstWords.

getSimValue(): calculates the similarity between the description and a given title, which can be a list of strings/keywords.

adjustList(): adjusts the keywords in the description according to the quota by removing less important words.

Keyword

word: a word that describes a cluster/file.

stemWord: the stemmed form of the word.

Freq: the number of times that a word appears in the title of the file or the cluster.

increaseFreq(): increases the value of the attribute freq by one
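The way adjustList() could enforce the maxKeywords quota is sketched below in Python. The sketch assumes that the importance of a word is its frequency (Freq), that ties are broken in favour of the word seen first, and that lowercasing stands in for real stemming; these are illustrative assumptions, since the description above does not define how importance is measured.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    word: str
    stem_word: str
    freq: int = 1

    def increase_freq(self) -> None:
        self.freq += 1


class DocDescriptors:
    """Hypothetical stand-in for the DocDescriptors class described above."""

    def __init__(self, max_keywords: int):
        self.max_keywords = max_keywords
        self.description = {}  # stem -> Keyword, in insertion order

    def add_word(self, word: str) -> None:
        stem = word.lower()  # placeholder for a real stemmer (assumption)
        if stem in self.description:
            self.description[stem].increase_freq()
        else:
            self.description[stem] = Keyword(word, stem)

    def adjust_list(self) -> None:
        # Stable sort: among equally frequent words, the earlier one is kept.
        ranked = sorted(self.description.values(), key=lambda k: -k.freq)
        self.description = {k.stem_word: k for k in ranked[: self.max_keywords]}


descriptors = DocDescriptors(max_keywords=2)
for token in ["mobile", "network", "peer", "Mobile", "network"]:
    descriptors.add_word(token)
descriptors.adjust_list()
kept = [k.word for k in descriptors.description.values()]
```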

MetaCluster

CID: the identifier of a cluster.

depth: the depth of the cluster in the file tree.

IDFlsUnder: the files classified under the cluster.

IDClstUnder: the clusters classified under the cluster.

lstKeywords: a list of keywords describing the cluster.

addElement(): groups a given file/cluster under the cluster referred by the object.

getSimValue(): calculates the similarity between the referred cluster and the given file/cluster.

ClusterAtDept

Clusters: the clusters found at the same depth.

get(): returns the ith cluster

infClassifier

lstCollection: the clusters at each depth of the file tree.

lstFile: a list of files’ metadata.

THeight: the height of the file tree.

lstNBClusteres: the number of clusters at each depth of the file tree.

balanceTree(): balances the file tree.

createTree(): creates a file tree.

classifyFiles(): classifies files into different clusters.

classifyClusters(): classifies clusters into other clusters.

List of Publications

I. International Journals

• A. Negash, L. Brunie, V. Scuturici, “A context aware Information sharing Middleware for a dynamic pervasive computing environment”, The International Journal on Computer Science and Information Systems, Vol. 2, No. 2, pp. 65-82, ISSN: 1646-3692, 2007.

II. International Conferences

• A. Negash, L. Brunie, V. Scuturici, “SAMi: A Self-Adaptive Information sharing Middleware for a dynamic pervasive computing environment”, The International Conference Wireless Applications and Computing, 6 pages, Lisbon, Portugal, 2007.

• Y. Fawaz, A. Negash, Lionel Brunie, and Vasile-Marian Scuturici, “ConAMi: Collaboration-based content Adaptation Middleware for Pervasive Computing Environment”, The 4th IEEE International Conference on Pervasive Services (ICPS’07), pp. 189-192, Istanbul, Turkey, July 2007.

• Y. Fawaz, A. Negash, Lionel Brunie, and Vasile-Marian Scuturici, “Service Composition-based Content Adaptation for Pervasive Computing Environment”, The 3rd International Conference on Wireless Applications and Computing, pp. 35-42, Lisbon, Portugal, July 2007 (chosen as a best paper).

• A. Shiferaw, S. Lajmi, V. Scuturici and L. Brunie, “PASMi: self-adaptive Photo Annotation and Sharing Middleware of Mobile Ad-hoc Networks”, Conference on Pervasive Computing and Communications Workshops (PerComW 2010), 6 pages, Mannheim, Germany, 2010.

• A. Shiferaw, L. Brunie, V. Scuturici, Y. Fawaz, “Mobility Awareness for Information Sharing in MANETs”, The 11th IEEE International Conference on Mobile Data Management (MDM 2010), 3 pages, Kansas City, USA, May 2010 (in press).

• A. Shiferaw, L. Brunie and V. Scuturici, “Interest-Awareness for Information Sharing in MANETs”, International workshop on Mobile P2P Data Management, Security and Trust (MP-DMST*), 6 pages, Kansas City, USA, May 2010 (in press).

http://liris.cnrs.fr/publis/?id=2876





Curriculum Vitae

Personal Information

Surname, first name: Addisalem Negash Shiferaw

Date and place of birth: 11 March 1977 in Addis Ababa (Ethiopia)

Marital status: Single

Nationality: Ethiopian

Languages: English, French, Amharic (language of Ethiopia)

Education

• PhD student in computer science at the LIRIS laboratory, INSA de Lyon (Jan. 2006 - July 2010).

• Master's degree in computer science, Department of Computer Science, Addis Ababa University, Addis Ababa, Ethiopia (Sept. 2002 - July 2004).

• Bachelor of Science (BSc) in computer science, Department of Mathematics, Addis Ababa University, Addis Ababa, Ethiopia (Sept. 1996 - July 2000).

• E.S.L.C.E (secondary school leaving certificate), Addis Ababa, Ethiopia (May 1996).

Professional Experience

• Teaching in computer science, Department of Mathematics and Computer Science, Addis Ababa University (Sept. 2000 - Oct. 2004).

Administrative Experience

• Instructor and coordinator for courses on office software and application software, continuing education, Department of Mathematics and Computer Science, Addis Ababa University (June 2002 - Sept. 2002).