N° d’ordre: 2010-ISAL-0052 Année 2010
Thèse
Partage d'informations sensible à la mobilité et à l’intérêt des utilisateurs dans les réseaux mobiles ad-hoc
Présentée devant L’Institut National des Sciences Appliquées de Lyon
(INSA de Lyon)
Pour obtenir Le grade de Docteur
Ecole doctorale INFOMATHS : « Informatique et Mathématiques»
(Spécialité : Informatique)
Par Addisalem Negash Shiferaw
Soutenue le 12 juillet 2010 devant la Commission d’examen
composée de:
Prof. Sylvain Lecomte Université de Valenciennes Rapporteur
Prof. Jean-Marc Pierson Université de Paul Sabatier-Toulouse 3 Rapporteur
Prof. Ernesto Damiani Université de Milan Examinateur
Dr. Richard Chbeir Université de Bourgogne Examinateur
Dr. Dawit Bekele Gouvernance de l’Internet en Afrique Examinateur
Prof. Lionel Brunie INSA de Lyon Directeur de thèse
Dr. Marian Scuturici INSA de Lyon Co-Directeur de thèse
Order No.: 2010-ISAL-0052 Year 2010
Thesis
Mobility and Interest Aware Information Sharing in MANETs
Submitted to the National Institute of Applied Sciences (INSA de Lyon)
In fulfillment of the requirements for the Doctoral Degree
Doctoral School INFOMATHS: « Computer Science and Mathematics »
(Affiliated Area: Computer Science)
Prepared by Addisalem Negash Shiferaw
Defended on 12 July 2010 before
the examination committee:
Prof. Sylvain Lecomte University of Valenciennes Reviewer
Prof. Jean-Marc Pierson University of Paul Sabatier-Toulouse 3 Reviewer
Prof. Ernesto Damiani University of Milan Examiner
Dr. Richard Chbeir University of Bourgogne Examiner
Dr. Dawit Bekele African Regional Bureau Internet Society Examiner
Prof. Lionel Brunie INSA de Lyon Supervisor
Dr. Marian Scuturici INSA de Lyon Co-Supervisor
Remerciements
Plusieurs personnes ont contribués et ont étendus leur aide précieuse dans la préparation et la réalisation de cette thèse. C’est un grand plaisir pour moi de saisir cette occasion d’exprimer ma gratitude pour tous.
Tout d'abord, je tiens à transmettre mes chaleureux remerciements à mon directeur de thèse, prof. Lionel Brunie, pour ses encouragements, ses conseils, son soutien inconditionnel et l’expérience qu’il me la transmise tout au long de ces années de doctorat, Son énergie perpétuelle et son enthousiasme dans la recherche ont rendu mon séjour dans le laboratoire agréable et enrichissant. En outre, il était toujours présent et prêt pour m'aider à surmonter les défis de la vie scolaire et sociale. Je voudrais également remercier sa famille pour l'hospitalité et l’accueil pendant mon séjour en France.
Je tiens également à remercier mon co-directeur de thèse, Dr. Marian Scuturici. Il a été toujours heureux d'interagir et de discuter de mes travaux de recherche et de fournir des conseils constructifs.
Mes remerciements vont également aux membres du jury qui ont accepté de rapporter et examiner ce travail. Je remercie Prof. Ernesto Damiani d’avoir accepté de présider le jury. J’exprime aussi ma gratitude à Prof. Jean-Marc Pierson et à Prof. Sylvain Lecomte qui ont accepté d’être rapporteurs. Je les remercie pour la lecture approfondie du mémoire et les nombreuses remarques pertinentes qu’ils ont formulés. Et enfin je remercie Dr Richard Chbeir et Dr. Dawit Bekele pour les questions très intéressantes qui ont contribué à approfondir ma réflexion.
Je reconnaissante à l’ambassade de France en Ethiopie pour avoir accepté de financer mes recherche et mon séjour en France. À cet égard, je ne veux pas passer sans parler de l'hospitalité que j'ai reçue du personnel du CROUS de Lyon. Je tiens aussi à remercier Dr. Dawit Bekele pour avoir facilité le processus administratif concernant ma bourse avec l’ambassade de France en Ethiopie.
Je tiens à remercier tout les membres de ma famille, surtout Negash Shiferaw, Aregash Mamo, Yalemzewd Negash, Yelewtfrie Negash et Helen Negash pour leurs encouragements et soutiens indispensables. J’ai de la chance d'avoir Shewangizaw Mengesha, mon fiancé, à mes côtés pendant les plus heureux et les plus tristes moments. Il a toujours été de mon coté et a consacré beaucoup de temps pour m'aider à résoudre les problèmes que j'ai rencontrés pendant mes études. Je n'oublierai jamais les soutiens et les aides de mes amis et collègues éthiopiens, y compris Dejene Ejigu, Elizabeth Addis, Fana Belay, Netsanet Mitiku, Girma Berhe et Rahel Kifle. Je voudrais aussi remercier la communauté Ethiopienne de Lyon qui a contribué de près ou de loin au succès de mon travail.
Je suis reconnaissante à mes amis Yaser Fawaz et Sonia Lajmi avec qui j'ai fait de très bonnes discussions scientifiques et nous avons passé des moments inoubliables tout au long de la thèse. Surtout, je n'oublierai jamais leurs soutiens dans des moments difficiles
tels que les deadlines d’articles, la rédaction de la thèse, etc. Par ailleurs, je voudrais remercier Faiza Najjar pour son soutien et ses conseils lors de l’identification de la problématique de recherche. Je tiens à remercier tous mes collègues et le personnel du LIRIS / INSA, surtout Valérie Lebey, Mabrouka Gheraissa, Talar Atéchian, Omar Hasan, Lyes Limam, Zeina Torbey, Armelle-Natacha Ndjafa-Yakou, Vanessa El-Khoury, Adel Ayara, Christian Vilsmaier, Tobias Mayer, Jingwei Miao, Sonia Ben Mokhtar, Nadia Bennani, Sylvie Calabretto et Elod Egyed-Zsigmond
Et enfin, le dernier, mais non le moindre, je tiens à remercier Dieu, que ton nom soit honoré et glorifié!
Addisalem Negash Shiferaw, 12 juillet, 2010, Lyon France
Acknowledgments
Several people have contributed and extended their valuable assistance in the preparation and completion of this thesis. It is a pleasure to convey my gratitude to them all in my humble acknowledgment.
First and foremost, I would like to express my heartfelt thanks to my supervisor, Prof. Lionel Brunie, for his encouragement, guidance and unconditional support throughout my doctoral study. Working with him gave me extraordinary and invaluable experience throughout the research work. His perpetual energy, intelligence and enthusiasm for research made my stay in the laboratory smooth and rewarding. In addition, he was always present and willing to help me overcome the challenges of academic and social life. I would also like to thank his family for the hospitality they showed me during my stay in France.
I would like to thank my co-advisor, Dr. Marian Scuturici. He was always delighted to interact and discuss my research work, and he provided me with valuable ideas and concepts to realize my research.
My thanks also go to the examination committee members who agreed to examine and review this research work. I thank Prof. Ernesto Damiani for agreeing to chair the examination committee. I also express my gratitude to Prof. Jean-Marc Pierson and Prof. Sylvain Lecomte, who agreed to be reviewers. I am grateful for their thorough reading of the thesis and the pertinent remarks they made. Finally, I thank Dr. Richard Chbeir and Dr. Dawit Bekele for posing very interesting questions that helped me deepen my reflection.
I owe many thanks to the French Embassy in Addis Ababa, Ethiopia, for sponsoring my PhD study. In this regard, I cannot omit the hospitality I received from the staff of CROUS de Lyon. I would also like to extend my special thanks to Dr. Dawit Bekele for facilitating the administrative process concerning my scholarship with the Embassy.
I would like to thank all members of my family, especially Negash Shiferaw, Aregash Mamo, Yalemzewd Negash, Yelewtfrie Negash and Helen Negash, for their indispensable encouragement and support. I am thankful to have Shewangizaw Mengesha, my fiancé, at my side during the happiest and saddest times. He always found time to help me resolve the problems that I encountered during my study. I will never forget the support and help of my Ethiopian colleagues and friends, including Dejene Ejigu, Elizabeth Addis, Fana Belay, Girma Berhe, Netsanet Mitiku and Rahel Kifle. I want to take this opportunity to thank the Ethiopian community in Lyon, which contributed in one way or another to the success of my research work.
I am grateful to my friends Yaser Fawaz and Sonia Lajmi, with whom I had very good scientific discussions and a wonderful time throughout my study. Above all, I will never forget their support in difficult times, such as during the proofreading of articles and the preparation of the thesis manuscript. I would like to take this occasion to thank Faiza Najjar for her support and advice during the identification of the research problem. I would like to extend my thanks to all colleagues and staff of LIRIS/INSA, especially Valérie Lebey, Mabrouka Gheraissa, Talar Atéchian, Omar Hasan, Zeina Torbey, Armelle-Natacha Ndjafa-Yakou, Vanessa El-Khoury, Adel Ayara, Christian Vilsmaier, Tobias Mayer, Jingwei Miao, Sonia Ben Mokhtar, Nadia Bennani, Sylvie Calabretto and Elod Egyed-Zsigmond.
Last but not least, I would like to thank God; may your name be honored and glorified!
Addisalem Negash Shiferaw, July 12, 2010, Lyon, France
Résumé
Le partage d'informations au sein d'un réseau pair à pair mobile est devenu un sujet de
recherche important grâce aux progrès rapides des technologies de communication sans fil
et des dispositifs mobiles intelligents. Le partage d’informations, c'est mettre à disposition
des personnes avec lesquelles on est en contact des données afin de les visualiser, les
modifier ou les télécharger.
Les utilisateurs peuvent partager des informations d’ordre générale (par exemple, des
documents portant sur l’éducation ou le tourisme), des informations d’ordre personnel (par
exemple, des photos et des profils personnels), ou des émissions en direct (par exemple,
des émissions radio ou télévisé). Les informations à partager sont, généralement,
présentées sous la forme d'un fichier. Dans ce cas, le partage d'information peut être
considéré comme le partage de fichiers. Cette thèse traite, généralement, le problème de
partage de fichiers.
En général, les utilisateurs nomades communiquent en utilisant des réseaux sans fil
fournis par leurs fournisseurs d’accès (3G et bientôt 4G) ou des points d'accès publics
répartis dans la ville. Toutefois, les réseaux à infrastructures ne est pas toujours les plus
appropriés vue (i) leur indisponibilité partielle, par exemple, dans les moyens de transports,
dans la compagne, etc. (ii) leur coût potentiellement élevé en particulier pour le partage des
documents multimédia et (iii) la répartition de leurs points d’accès publics non uniforme.
Ainsi, les réseaux mobiles ad-hoc (MANETs) peuvent être une solution plus efficace dans
les endroits où l'installation d'une infrastructure est impossible. Dans un avenir proche, un
MANET sera plus puissant grâce à l’utilisation de la technologie de Wi-Fi direct.
L'objectif de nos travaux de recherche est de concevoir et d’implémenter un système de
partage d’information dans un environnement ad-hoc. Ce système permet aux utilisateurs
de partager les informations où et quand ils ont l'occasion sur MANET. La thèse se
focalise, particulièrement, sur les challenges liés à la mobilité et aux intérêts des
utilisateurs.
Dans un MANET, le partage de l'information est généralement effectué par la
distribution d’annonces et de requêtes. Afin d’éviter la surcharge de l'environnement avec
des annonces et des requêtes inutiles, il est important de concevoir une politique d’annonce
appropriée. Une politique d’annonce spécifie le volume d'informations à avertir, la période
après laquelle une annonce doit être relancée et le nombre de pairs maximum traversé par
une annonce. Elle doit considérer la consommation et la fourniture de l'information qui
sont liées au temps de connexion des utilisateurs (i.e., le temps qu'ils restent ensemble dans
un MANET) et à leurs contextes. Par conséquence, une politique d’annonce devrait être
paramétrée selon le temps de connexion des utilisateurs et leurs contextes.
Vu la quantité massive d’informations à partager, un contrôle/ filtrage de fichiers est mis
en place pour éviter la surcharge du réseau qui peut empêcher d’aboutir l'activité de
partage. En outre, l’interface minuscule des dispositifs mobiles n’est pas appropriée pour
parcourir tous les fichiers disponibles dans l’environnement. Par conséquent, nous
proposons que les fichiers partageables soient choisis en fonction des intérêts des
utilisateurs.
Dans cette thèse, nous proposons un middleware appelé SAMi pour permettre aux
utilisateurs nomades de partager l'information en fonction de leurs intérêts, les contextes et
leurs temps de connexion. Nous proposons une approche pour paramétrer les politiques
d’annonces en fonction des profils des utilisateurs et de leurs contextes. Le processus de
paramétrage est effectué semi-automatiquement par l'analyse des activités de partage
d’informations.
SAMi classe hiérarchiquement des fichiers et les présente dans une structure appelée une
arborescence de fichiers. Au cours du processus d’annonces, le middleware procède à un
annoncement des fichiers en utilisant soit (i) une description détaillée (situé à un niveau
profond dans l’arborescence des fichiers ou soit (ii) une descrition générale (située à un
niveau peu profond). Cette approche permet à un utilisateur de connaître le potentiel d'un
pair de fournir d'informations sans recevoir d’annonces pour chaque fichier partageables.
Ainsi, la diffusion d'une requête est limitée aux seuls pairs ayant le potentiel de fournir les
fichiers demandés.
Les utilisateurs peuvent spécifier leurs intérêts à recevoir ou à fournir des informations
de manière réactive. Les intérêts des utilisateurs peuvent également être automatiquement
déterminés en utilisant les règles d'associations. Ces règles associent les intérêts des
utilisateurs à leur contexte. Nous proposons également d'utiliser les réseaux sociaux pour
faciliter le processus d'identification d'intérêts.
SAMi a été testé dans deux environnements; un simulé et un autre réel en le déployant
sur des dispositifs mobiles reliés entre eux par Bluetooth. Les évaluations qui ont été faites,
nous ont permis de conclure que SAMi a un très bon potentiel pour aider les utilisateurs
nomades à partager l'information en fonction de leurs intérêts. Nos futurs travaux
importants sont liés à la gestion du contexte et la vie privée des utilisateurs.
Mots-clés: partage des données, sensibilité à la mobilité, sensibilité aux intérêts,
classification de fichiers, réactivité au contexte, informatique mobile, réseaux ad-hoc
Abstract
Mobile peer-to-peer information sharing has become an important research topic due to
the rapid advancement in wireless communication technologies and smart devices.
Information sharing is the practice of making information available for other individuals to
view, modify and download. Users may share general information (e.g., documents about
education and tourism), personal information (e.g., personal photos and profiles), or live
information (e.g., news being transmitted on the radio). The information to be shared is
usually presented in the form of a file. In this case, information sharing can be regarded as
file sharing. This thesis focuses specifically on issues related to file sharing.
Nowadays, nomadic users usually communicate by using infrastructure-based wireless
networks, namely telecommunication networks (3G and soon 4G) and public hotspots
distributed across the city. However, infrastructure-based wireless networks are not
always adequate because (i) there are places where no infrastructure-based wireless
network exists; (ii) telecommunication networks are costly to use, especially for
multimedia data; and (iii) public hotspots are not uniformly distributed. Thus, an
infrastructure-less network, or mobile ad-hoc network (MANET), can provide a more
efficient solution in places where installing an infrastructure is not possible. In the near
future, MANETs are expected to become more powerful through the use of Wi-Fi Direct.
The focus of our research is to build an information sharing system that allows users to
share information wherever and whenever they get the opportunity by using a MANET.
The thesis particularly focuses on the challenges related to the mobility and the interests of
users.
In a MANET, information sharing is usually performed by distributing advertisements
and queries. The preparation and the distribution of an advertisement are guided by an
advertisement policy. An advertisement policy describes the volume of information to be
advertised, the period after which an advertisement can be repeated and the number of
hops that an advertisement traverses. In order not to overload the environment with
unnecessary advertisements and queries, an advertisement policy should be prepared
according to the information consumption and provision of users. These are affected by
the users’ stay time, i.e., the time that they stay together in a MANET. Consequently, an
advertisement policy should be parameterized according to the users’ stay time. The users’
stay time is in turn affected by their mobility patterns, which are expressed by their speeds,
movement directions and pause times.
Furthermore, users have a lot of information to share with each other. If the files to be
shared are not controlled, information overload will hinder the sharing activity.
Moreover, the input and output facilities of mobile phones do not allow nomadic users
to browse all of the sharable files in the vicinity. Therefore, we argue that sharable files
should be selected according to the users’ interests.
In this thesis, we propose an advertisement-based middleware called SAMi to allow
nomadic users to share information according to their interests, contexts and stay times.
We propose an information discovery approach, which is used by SAMi, to parameterize
advertisement policies according to users’ profiles and contexts. The parameterization
process is performed semi-automatically by analyzing users’ information sharing
activities.
SAMi classifies files hierarchically and presents them in a file tree. Files are advertised
according to users’ profiles and contexts. During advertisement, the middleware describes
files at either a shallow (general) or a deep (detailed) level of the file tree. This approach
lets a user know a peer’s potential to provide information without receiving an
advertisement for each sharable file. Thus, the dissemination of a query is limited to
those peers having the potential to provide the required file.
Users can specify their interests to receive/provide information reactively. Users’
interests can also be automatically determined by using association rules, which associate
users’ interests with their context. We also propose to use the users’ social networks to
facilitate the interest identification processes.
SAMi has been deployed in a simulated environment. It has also been deployed on real
devices interconnected by Bluetooth. From the evaluations performed, we have
observed that SAMi has very good potential to help nomadic users share information
according to their interests. Our main future work concerns context management
and user privacy.
Keywords: data sharing, mobility awareness, interest awareness, classification of files,
mobile computing, context aware computing, ad-hoc network
List of Tables
Table 2-1: Analysis of P2P systems
Table 2-2: Summary of the information sharing systems designed for MANETs
Table 2-3: Analyzes of information sharing system of MANETs
Table 2-4: Analyzes of service discovery protocols
Table 3-1: Examples of sharing contexts
Table 3-2: Examples of Information demands of Pascal
Table 3-3: Examples of queries
Table 3-4: Similarity values calculated by using the formula presented in Definition 3.2
Table 3-5: Example of execution flows during the decomposition of queries
Table 3-6: Interests produced from queries listed in
Table 3-7: Historical data of Pascal
Table 3-8: Tie-Strengths between Pascal, Anne, Bob and Eve
Table 4-1: example of a mobility class
Table 4-2: Examples of sharing statistics
Table 4-3: Range-lifetimes and advertisement volumes of classes
Table 4-4: Merging of mobility classes
Table 4-5: Pascal’s sharing statistics
Table 5-1: Basic description of a file photo
Table 5-2: Description of a cluster
Table 6-1: The inputs of the test-bed
Table 6-2: Types of Environments
Table 6-3: Constants for the query extraction algorithm
Table 6-4: Characteristics of information demands
Table 6-5: Characteristics of sharing-statistics
Table 6-6: Constants considered during rule mining evaluation
Table 6-7: Range-lifetimes of mobility classes designed for sharing context (“”,∅)
Table 6-8: Parameters used classification algorithm in the first experimentation
Table 6-9: Inputs for classification algorithm for the second type of experimentation
Table 6-10: Test data used during filtering advertisements
Table 7-1: Comparing SAMi to existing information sharing systems
Table B-1: Important activities to perform advertisement
Table B-2: Activities to extract and search information
Chapter 1 Introduction
1.1 Background
1.1.1 Information Sharing
Information sharing is the practice of making information available for other individuals
to view, modify and download. Users may share general information such as documents
about education and tourism. They may also share personal information such as personal
photos and profiles. It is also possible to exchange live information, such as news being
transmitted on the radio.
The information to be shared is often presented in the form of a file. In this case,
information sharing can be regarded as file sharing. The information to be shared can also
be presented as a stream of data. However, as the most popular information sharing
applications are based on file sharing, this thesis focuses specifically on issues related to
file sharing.
Information sharing is accomplished via three activities: information discovery, delivery
and routing. These activities can be managed by using a centralized, a partially centralized
or a purely distributed architecture. In a centralized architecture, dedicated server(s)
manage(s) the sharing activities. In a partially centralized architecture, one or more
administrator peers are responsible for managing the information sharing activities. These
administrator peers can hold dedicated or non-dedicated devices. In a purely distributed
architecture, all peers are equal and they share responsibilities equally. In this thesis, we
consider a purely distributed architecture since finding administrator peers is difficult in a
MANET.
Finally, an information sharing system can be anonymous or social network based. In
anonymous systems, information sharing is performed without considering users’
acquaintances. This feature characterizes old file sharing systems. Social network based
systems allow users to share information according to their social relationships. Especially
in MANETs where resources are limited, exploitation of social networking can facilitate
the collaboration of users.
1.1.2 Mobile Ad-hoc Networks
The initial step towards a MANET was the Packet Radio Network (PRNET). The
architecture of PRNET was quite close to the current view of a MANET. Indeed, a PRNET
comprises mobile terminals and mobile repeaters (prefiguring mobile routers). During the
1990s, a number of projects that were inspired by PRNET led to the development of ad-
hoc routing algorithms, and eventually led to the creation of the IETF MANET group. This
group focused mainly on routing algorithms with various goals but evolved to a broader
research scope. These days, various applications and services can be implemented on MANETs.
Users who are opportunistically co-located in places like airports, train stations, coffee
shops, pubs, malls, and highways can use MANETs to share information instantly. Ad-hoc
networks can also be used for entertainment purposes, such as providing instant connectivity
for multi-user games.
Ad-hoc networks can be deployed to support emergency services when the
existing network infrastructure has ceased to operate or has been damaged by a
disaster such as an earthquake, hurricane or fire. Similarly, on a battlefield, a
MANET can be deployed to facilitate communication among the soldiers involved in the
field.
The following features [1, 2] characterize a MANET:
1. Mobility of nodes: The movement of peers cannot be controlled in a MANET.
Peers can move freely from location to location and hence can leave and join the
network at any time.
2. Lack of infrastructure: As the name implies, a MANET is an infrastructure-less
network. A message from a source peer to a destination peer goes through
multiple peers due to the limited transmission radius. As there is no centralized
control, the network management should be distributed across peers.
3. Scarce resources: Wireless links have limited bandwidth and variable capacity.
Moreover, peers participating in a MANET are battery-powered.
In summary, MANETs can provide solutions in situations where infrastructure-based
networks cannot be accessed due to their unavailability or cost. They can also be used
to efficiently establish communications between co-located users. However, the
characteristics of MANETs, i.e., mobility of peers, lack of infrastructure and scarcity of
computing resources, create challenges for their use. Thus, in this thesis, our
goal is to design an information sharing middleware that takes these
characteristics of MANETs into account.
1.1.3 Advancement in Mobile Phones and Communication Technologies
Production of mobile devices, mostly cell phones, is increasing exponentially.
The number of subscriptions reached 3.3 billion worldwide in October 2008
and is forecast to reach 5.32 billion by 2013 [3].
Mobile devices are now capable of storing large numbers of files and performing complex
computations that were previously possible only on personal computers. They are equipped
with wireless network technologies, sensors and applications; their storage capacities
keep increasing; and their processing power has improved dramatically.
Cell phones’ battery life is also continuously improving. Today, there are
devices that can operate for more than 8 hours in active mode, i.e., talking without interruption
[4].
The introduction of the iPhone [5] drastically changed people’s view of cell phones. Many
applications and games have been produced for iPhones. These days, people use their
iPhones to access email and social network sites such as Facebook.
These days, most mobile phones are equipped with short-range wireless
communication technologies. In most cases, either Bluetooth or Wi-Fi technology is
integrated [6] into them.
Bluetooth [12] allows devices to communicate over short distances at moderately fast
transmission speeds. Bluetooth provides a wireless point-to-point network for PDAs,
notebooks, printers, mobile phones, audio components, and other devices. Bluetooth
operates in the 2.400 GHz to 2.4835 GHz frequency band (83.5 MHz wide). Typically,
devices with Bluetooth technology have a range of 10 meters to 100 meters and data
transfer rates of up to 3 Mbps. A group of Bluetooth-enabled devices forms a so-called
piconet. In a Bluetooth piconet, one master can communicate with up to 7 active slaves,
while up to 248 other devices can remain in sleep mode (they may take part in the
communication actively when an active device goes into sleep mode). Multiple
independent piconets can form a scatternet. In a scatternet, some slaves act as
bridges by participating in two or more piconets. In Bluetooth scatternets, the number of
devices is not limited.
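The piconet membership rules described above can be illustrated with a small Python sketch. It is a toy model only, not part of SAMi or of any Bluetooth stack: the class name Piconet, the methods join and sleep, and the first-in-first-out wake-up order are assumptions made for illustration, while the limits of 7 active slaves and 248 sleeping devices are the figures quoted in the text.

```python
from dataclasses import dataclass, field

MAX_ACTIVE_SLAVES = 7   # from the text: one master, up to 7 active slaves
MAX_PARKED = 248        # figure quoted in the text for devices in sleep mode


@dataclass
class Piconet:
    """Toy model of one piconet: a master plus active and parked slaves."""
    master: str
    active: list = field(default_factory=list)
    parked: list = field(default_factory=list)

    def join(self, device: str) -> str:
        """Admit a device as active if a slot is free, otherwise park it."""
        if len(self.active) < MAX_ACTIVE_SLAVES:
            self.active.append(device)
            return "active"
        if len(self.parked) < MAX_PARKED:
            self.parked.append(device)
            return "parked"
        return "rejected"

    def sleep(self, device: str) -> None:
        """Put an active slave to sleep and wake the longest-parked device.

        The FIFO wake-up order is an assumption of this sketch.
        """
        self.active.remove(device)
        if self.parked:
            self.active.append(self.parked.pop(0))
        self.parked.append(device)


if __name__ == "__main__":
    net = Piconet(master="phone-0")
    for i in range(1, 10):
        net.join(f"phone-{i}")   # phones 1-7 become active, 8 and 9 are parked
    net.sleep("phone-1")         # phone-8 takes the slot freed by phone-1
    print(net.active, net.parked)
```

The sketch covers a single piconet; a scatternet would be modeled by letting the same device name appear as a slave in two or more Piconet instances.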
In 1997, IEEE ratified the 802.11 WLAN standard, establishing a global standard for
implementing and deploying WLANs. The original IEEE 802.11, which is now obsolete, had a
throughput of 2 Mbps. Today's Wi-Fi devices, based on IEEE 802.11a and 802.11g, provide
transmission rates up to 54 Mbps [7]. A new standard called IEEE 802.11n [7] that can
support up to 600 Mbps is being standardized. Wi-Fi devices communicate with each other
with the help of a controller device known as a wireless access point or "hotspot".
Hotspots usually combine three primary functions: physical support for interfacing wireless
and wired networking, routing between devices on the network, and service provisioning to
add and remove devices from the network. The Wi-Fi Alliance is nearing completion of a
new specification, named Wi-Fi Direct, to enable Wi-Fi devices to connect to one another
without wireless access points [8]. It allows devices equipped with Wi-Fi communication
technology (IEEE 802.11a, 802.11g or 802.11n) to take part in an ad-hoc
network by embedding a software access point into these devices.
Chapter 1 : Introduction
ZigBee is a low-power, low-cost, low-rate, short-range wireless technology. It is built on
top of the IEEE 802.15.4 WPAN standard [9]. ZigBee radio operates within three different
frequency ranges, 868 MHz, 915 MHz, and 2.4 GHz, and supports data rates of up to 250 kbps
[10]. ZigBee protocols are intended for use in embedded applications requiring low data
rates and low power consumption. ZigBee's current focus is to define a general-purpose,
inexpensive, self-organizing mesh network that can be used for industrial control,
embedded sensing, medical data collection, smoke and intruder warning, building
automation, home automation, etc.
The maturity of communication and computing technologies indicates that MANETs are a
feasible way to let mobile devices communicate with each other anywhere and anytime.
In this thesis, we give more emphasis to mobile phones. We do not consider any
specific communication technology in our information sharing middleware. However,
Bluetooth is considered during the evaluation of the middleware.
1.2 Motivation and Requirements
1.2.1 Scenario
The following scenario will be used to discuss the requirements of an information
sharing system in MANETs. We will also use this scenario to discuss our propositions
throughout the thesis.
Pascal, a first-year Ph.D. student at INSA, uses MANETs to exchange information in
different locations. In a bus, his PDA connects with the devices of fellow passengers via
wireless network technologies. Passengers advertise sharable files to others in their
surroundings. Pascal usually browses the advertisements that he has received in order to
discover the files that he is looking for. If he does not find the files that he needs, he
formulates queries expressing these files. The required-files are, then, searched by
P21)= cos(P2,P12)=0.39 and hence, Similarity(S1,S2)=0.39; therefore, S1 and S2 are not
similar.
Definition 3.7. Aggregation of sharing interests: The aggregation of a set of sharing
interests is used to extract the common features of the users’ sharing interests. The
aggregation of a set of sharing interests T = {S1, S2, … , Sn}, denoted as ⊕∑Si, is computed
by using the following two steps.
Step 1: Decompose interests
Let TNI be the set of all non-empty interests in the sharing interests to be aggregated. The
interests in TNI are decomposed into GTNI = {T1, …, Tm} such that the interests in the same
group are more similar than the interests in different groups. We propose to perform the
decomposition of non-empty interests by using a method⁴ derived from the agglomerative
hierarchical clustering approaches [88].
⁴ The grouping of interests is performed in the same way as the grouping of queries. The algorithm proposed to group queries is discussed in
section 3.4.1.
Chapter 3 Interest Awareness
More specifically, the sets in GTNI satisfy the following properties:
Ip, Iq ∈ Tk ∈ GTNI ⇒ Ip ≈ Iq
Tk, Ts ∈ GTNI and k ≠ s ⇒ ∃ Ip ∈ Tk and ∃ Iq ∈ Ts such that Ip !≈ Iq
Let Sim(Ts, Iq) be the set of interests in Ts that are similar to Iq; for Iq ∈ Tk, the
following property holds true:
\[ \frac{\sum_{I_p \in T_k - \{I_q\}} Similarity(I_p, I_q)}{|T_k| - 1} \;\geq\; \frac{\sum_{I_p \in Sim(T_s, I_q)} Similarity(I_p, I_q)}{|T_s|} \]
Step 2: Identifying interests in ⊕∑Si
From each Tk, an interest Ik is computed in such a way that
• \( Weight(I_k) = \sum_{I \in T_k} Weight(I) \, / \, |T| \)
• \( Description(I_k) = \bigcap_{I \in T_k} Description(I) \)
As discussed in Definition 3.5, every interest in a sharing interest should have a weight
greater than minW (the predefined threshold introduced in Definition 3.5). Consequently, the
interest Ik is added to ⊕∑Si if Weight(Ik) ≥ minW.
Finally, according to Definition 3.5, the sum of the weights of the interests in a sharing
interest should be one. Let SumNI be the sum of the weights of the non-empty interests in
⊕∑Si. If 1 − SumNI ≥ minW, an empty interest Ie is added to ⊕∑Si such that Weight(Ie) = 1 −
SumNI. If 1 − SumNI < minW but SumNI < 1, the weight of each interest I in ⊕∑Si is
normalized by using the formula below.
Weight(I) = Weight(I) ÷ SumNI
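Step 2 of Definition 3.7 can be illustrated by the minimal Python sketch below. It assumes that Step 1 has already produced the groups Tk; the function name `aggregate_groups`, the representation of an interest as a (description, weight) pair and the use of an empty `frozenset` for the empty interest are choices of this sketch, not part of the definition.

```python
def aggregate_groups(groups, n_sharing_interests, min_w):
    """Sketch of Step 2 of Definition 3.7 for already-clustered interests.

    groups: list of groups T_k; each group is a list of (description, weight)
            pairs, where a description is a set of keywords.
    n_sharing_interests: |T|, the number of sharing interests being aggregated.
    min_w: the threshold minW.
    Returns a list of (description, weight); frozenset() is the empty interest.
    """
    result = []
    for group in groups:
        # Weight(I_k) = sum of the weights in T_k divided by |T|
        weight = sum(w for _, w in group) / n_sharing_interests
        # Description(I_k) = intersection of the descriptions in T_k
        description = frozenset.intersection(*(frozenset(d) for d, _ in group))
        if weight >= min_w:
            result.append((description, weight))

    sum_ni = sum(w for _, w in result)
    if 1 - sum_ni >= min_w:
        # The remaining weight is large enough to form an empty interest.
        result.append((frozenset(), 1 - sum_ni))
    elif 0 < sum_ni < 1:
        # Otherwise rescale so that the weights sum to one.
        result = [(d, w / sum_ni) for d, w in result]
    return result
```

For two sharing interests whose non-empty interests are grouped as {({tree, bush}, 0.5), ({tree, grass}, 0.75)} and {({sky}, 0.5)}, and minW = 0.1, the sketch yields the interests ({tree}, 0.625), ({sky}, 0.25) and an empty interest of weight 0.125.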
3.3 Interest aware Information Discovery
In a MANET, information discovery can be performed by using two approaches: push
and pull. In a push approach, data sources make others aware of their sharable files by
disseminating advertisements; in a pull approach, a requester peer searches for the source of a
file by distributing queries. As discussed in section 3.1, both approaches should be
conducted according to the interests of users. Thus, data-sources should prepare and
disseminate advertisements about their sharable files according to the information demands
of data-requesters. Similarly, data-requesters should resolve queries according to the
information provisions of data-sources.
When joining a MANET, data-sources and data-requesters distribute their interests to
provide and receive information in their vicinities. Let P be the set of peers in the MANET
that a peer p is aware of; we propose that the peer p estimates the overall demand
of the peers in P and their overall provision by using the aggregation operation described in
Definition 3.7, i.e.,
\[ \text{Overall-Demand}(P) = \oplus\sum_{p_i \in P} \text{Information-Demand}(p_i) \]
\[ \text{Overall-Provision}(P) = \oplus\sum_{p_i \in P} \text{Information-Provision}(p_i) \]
Let Sod be the overall demand of the requester peers in a MANET. A data-source peer in
this MANET should preferably advertise files matching the interests of the overall
demand Sod. Let Adv-Volume⁵ be the number of metadata that the data source can use to
advertise sharable files. Let N(I) be the number of metadata that it can use to advertise files
matching the interest I ∈ Sod. N(I) is computed as:
N(I) = Weight(I) * Adv-Volume
⁵ We will discuss the computation of Adv-Volume in the next chapter.
Let F(I) be the set of files matching the interest I ∈ Sod and ADV(I) be a container used to
store advertisements related to the interest I. The data source selects at most N(I) files
from F(I) and puts their metadata in ADV(I) via Algorithm 3-1.
For each non-empty interest I, Algorithm 3-1 places the files matching the interest I in
F(I) (lines 1 to 3). For an empty interest Ie, F(Ie) is filled with the files that do not match
any of the non-empty interests (lines 4 to 6). If the advertisement quota for I, i.e., N(I), is
sufficient to advertise all the files in F(I), the metadata of each file in F(I) is placed in
ADV(I) (lines 8 to 9). Otherwise, for each non-empty interest I, some of the files in F(I) are
selected to be advertised according to their similarity (relevance) to the interest (lines 10
to 11). If f1 and f2 are sharable files in F(I) such that Similarity(f1,I) is greater than
Similarity(f2,I), f1 is said to be more relevant to I than f2. In this case, f1 has a better
chance of matching the users’ needs. Therefore, the data source peer gives f1 priority over
f2 for inclusion in ADV(I). If the interest I is an empty interest and the
advertisement quota of I is not enough to advertise all the files in F(I), some of the files in
F(I) are randomly selected to be advertised (lines 12 to 13).
For any interest I, the dissemination of ADV(I) is performed according to: (1) the
direction of peers having an information demand matching the interest I and/or (2) the degree
of collaboration between the data source peer and the peer to which the advertisement will
be forwarded. An information demand S matches an interest I if and only if ∃ Ii ∈ S such
that Ii is similar to I.
The Tie-strength notation, described in Definition 3.3, is used to calculate the degree
of collaboration between two peers. Let min-tie be the minimum tie between a data source
and a peer that has a chance to receive the advertisements. A peer p is said to have a high
degree of collaboration with the data source peer ps if and only if
Tie-strength(ps, p) ≥ min-tie
Algorithm: Advertisement message preparation
Input: Sod, N(I) ∀ I ∈ Sod, F
    Sod: overall demand
    F: sharable files
    N(I): advertisement quota of an interest I
Output: ADV(I) ∀ I ∈ Sod
    ADV(I): metadata to be advertised w.r.t. an interest I
Begin
    // put files matching a non-empty interest I in F(I)
1.  For any I ∈ Sod | description(I) ≠ ∅
2.      F(I) ← {f | f ≈ I and Similarity(f, I) ≥ Similarity(f, Ij) for ∀ Ij ∈ Sod}
3.  End For
    /* put all files that are not similar to any of the non-empty interests in F(Ie),
       where Ie is an empty interest */
4.  If ∃ Ie ∈ Sod | description(Ie) = ∅
5.      \( F(I_e) \leftarrow F - \bigcup_{I \in S_{od} - \{I_e\}} F(I) \)
6.  End If
7.  For any I ∈ Sod
8.      If (|F(I)| < N(I))
9.          ADV(I) ← {metadata(f) | f ∈ F(I)}
10.     Else if (description(I) ≠ ∅)
            /* Relevant(F, I, n): contains the n most relevant (similar) files to the interest I in F */
11.         ADV(I) ← {metadata(f) | f ∈ Relevant(F(I), I, N(I))}
12.     Else
            // Random(F, n): contains n files which are taken randomly from F
13.         ADV(I) ← {metadata(f) | f ∈ Random(F(I), N(I))}
14.     End if
15. End for
End Algorithm
Algorithm 3-1: Advertisement message preparation
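The advertisement preparation can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation evaluated in the thesis: the Jaccard index stands in for the similarity of Definition 3.2, the threshold 0.5, the rounding down of N(I), the assignment of a file to a single best-matching interest, and all function and parameter names are assumptions of the sketch.

```python
import random


def jaccard(a, b):
    """Assumed similarity measure: Jaccard index of two keyword sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def prepare_advertisements(overall_demand, files, adv_volume,
                           threshold=0.5, rng=random):
    """overall_demand: dict mapping an interest (frozenset of keywords,
    frozenset() for the empty interest) to its weight.
    files: dict mapping a file name to its set of keywords.
    Returns a dict mapping each interest to the list of advertised files."""
    non_empty = [i for i in overall_demand if i]
    matched = {i: [] for i in overall_demand}
    unmatched = []
    # Lines 1-3: a file goes to the non-empty interest it matches best.
    for name, keywords in files.items():
        best = max(non_empty, key=lambda i: jaccard(keywords, i), default=None)
        if best is not None and jaccard(keywords, best) >= threshold:
            matched[best].append(name)
        else:
            unmatched.append(name)
    # Lines 4-6: the empty interest collects the files that match nothing.
    if frozenset() in overall_demand:
        matched[frozenset()] = unmatched
    # Lines 7-15: apply the quota N(I) = Weight(I) * Adv-Volume (rounded down).
    adv = {}
    for interest, weight in overall_demand.items():
        quota = int(weight * adv_volume)
        candidates = matched[interest]
        if len(candidates) <= quota:
            adv[interest] = list(candidates)
        elif interest:
            ranked = sorted(candidates,
                            key=lambda n: jaccard(files[n], interest),
                            reverse=True)
            adv[interest] = ranked[:quota]      # most relevant files first
        else:
            adv[interest] = rng.sample(candidates, quota)  # random selection
    return adv
```

Passing a seeded `random.Random` instance as `rng` makes the random selection for the empty interest reproducible.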
The data source forwards ADV(I) to direct neighbors located in the direction of peers
having an information demand matching the interest I. The method proposed by LAR [80]
(described in chapter 2) is used to select neighbors according to their locations. The data
source peer ps also forwards ADV(I) to the direct neighbors that have a high degree of
collaboration with it. A peer accepting the advertisement forwards the advertisement
in a similar fashion.
Example: Assume that min-tie is 0.8; in the MANET displayed in Figure 3-2, p1
advertises to p3, p4 and p5 since p3 is interested in the advertisement, p4 is located in the
direction of peers interested in the advertisement, and p5 has a high degree of collaboration
with p1.
[Figure 3-2 shows peers p1 to p9 connected by wireless links. Collaboration links are labelled with their tie-strength α in files/day, and arrows mark the advertisement flow. ADV(I) contains the sharable files matching I ∈ Overall-Demand(P), where Overall-Demand(P) = ⊕∑ Information-Demand(p); the peers p3, p6 and p7 each have an interest similar to I; min-tie = 0.8.]
Figure 3-2: Advertisement Distribution by p1
As discussed in Definition 3.3, the peer p5 computes the tie-strength between p1 and each of its
neighbors by taking the average of its degrees of collaboration towards p1 and towards that
neighbor. Thus, Tie-Strength(p1,p9) = (0.9+0.6)/2 = 0.75 and Tie-Strength(p1,p8) =
(0.9+0.7)/2 = 0.8. As min-tie is 0.8, p5 forwards the advertisement of p1 to p8 but not to
p9.
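The forwarding decision of p5 in this example can be reproduced with the short Python sketch below. The function names are hypothetical, and exact rational numbers are used only because the tie-strength of p8 lies exactly on the threshold, where binary floating-point rounding could change the outcome.

```python
from fractions import Fraction


def tie_strength(collab_to_source, collab_to_neighbor):
    """Average of the forwarder's degrees of collaboration towards the
    data source and towards one of its neighbors (as in the example)."""
    return (collab_to_source + collab_to_neighbor) / 2


def forward_targets(collab_to_source, collab_to_neighbors, min_tie):
    """Neighbors whose tie-strength with the data source reaches min-tie."""
    return [neighbor
            for neighbor, collab in collab_to_neighbors.items()
            if tie_strength(collab_to_source, collab) >= min_tie]


# Values of the example: p5 collaborates with p1 (0.9), p8 (0.7) and p9 (0.6).
targets = forward_targets(
    Fraction("0.9"),
    {"p8": Fraction("0.7"), "p9": Fraction("0.6")},
    min_tie=Fraction("0.8"),
)
print(targets)  # ['p8']
```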
A peer uses advertisements to identify potential sources of interesting files. It can also
search for a file by distributing queries. A query q is resolved if there is a data-source having
an information provision matching q. Let S be an information provision; we say that S
matches the query q if and only if ∃ I ∈ S such that I ≈ q.
We propose to disseminate queries in the same way as advertisement dissemination. A
data-requester forwards a query to its neighbors located in the direction of a data-source
having an information provision matching with the query. The peers receiving the query
forward it to some of their neighbors in a similar way. The query can be forwarded up to a
fixed number of hops.
The discovery approach introduced in this section performs file discovery by combining
a push method (i.e., the distribution of advertisements) and a pull method (i.e., the
distribution of queries). Both methods are performed according to the interests of users. In
addition to the interests of users, the users’ patterns of collaboration are considered during
the advertisement dissemination.
3.4 Interest Identification
3.4.1 Interest Identification from Historical Data
Users can specify their interests themselves reactively. In our scenario, Pascal
can state that he is interested in receiving jokes in Bus 37. The precise interests of users
can also be computed automatically from queries and advertisements.
An information demand of a peer (i.e., the interests of the peer to receive information)
can be identified from his/her historical queries. Let Q be the set of queries distributed by a
peer p in a MANET in a context c. We propose to identify Information-demand(p, c) from
Q using the following two steps.
Step 1: Decomposing queries
Queries in Q are classified into different groups by using Algorithm 3-2, which is
derived from the agglomerative hierarchical clustering approach [88]. We will use the
queries in Table 3-3 and their similarity in Table 3-4 to illustrate Algorithm 3-2.
Table 3-3: Examples of queries
Query Description
q1 {tree, bush, grass, sidewalk}
q2 {tree, bush, sidewalk}
q3 {tree, bush, grass, ground}
q4 {tree, bush, grass, sidewalk, rock}
q5 {tree, bush, flower, grass}
q6 {clear, sky, tree, bush, ground}
q7 {overcast, sky, tree, bush, grass, sidewalk}
q8 {tree, grass, sky}
q9 {tree, grass, clouds, sky}
Algorithm: Decomposition of queries
Input: Q
    Q: a set of historical queries
Output: G
    G: a subset of the power set of Q such that queries in the same element of G are
       more similar than queries in other elements of G
Begin
    // initialize grouping
1.  G = ∅
2.  For all q ∈ Q
3.      G = G ∪ {{q}}
4.  End for
5.  Repeat
        // merge two similar sets of queries
6.      Gnew ← ∅
7.      While (G != ∅)
8.          Qc ← randomly selected element of G
9.          G ← G − {Qc}
            /* search a set Qk such that every element in Qc is similar to every element in Qk */
10.         If (∃ Qk ∈ G such that ∀ qi ∈ Qc, ∀ qj ∈ Qk, qi ≈ qj && for any Qs ∈ G, one of the
                following properties holds)
                // there are dissimilar elements in Qc and Qs
                (A) ∃ qi ∈ Qc, ∃ qj ∈ Qs such that qi !≈ qj
                // or Qc is more similar to Qk than to Qs
                (B) \( \frac{\sum_{q_i \in Q_c} \sum_{q_j \in Q_k} Similarity(q_i, q_j)}{|Q_c| * |Q_k|} \geq \frac{\sum_{q_i \in Q_c} \sum_{q_j \in Q_s} Similarity(q_i, q_j)}{|Q_c| * |Q_s|} \)
11.             G ← G − {Qk}
12.             Gnew ← Gnew ∪ {Qc ∪ Qk}
13.         Else
14.             Gnew ← Gnew ∪ {Qc}
15.         End if
16.     End while
17.     G ← Gnew
18.     // Repeat the above computations until any two sets contain dissimilar queries
19. Until: ∀ Qc, Qk ∈ G, ∃ qi ∈ Qc, ∃ qj ∈ Qk such that qi !≈ qj
End Algo
Algorithm 3-2: classification of Description
Table 3-4: Similarity values calculated by using the formula presented in Definition 3.2
where similarity threshold is 0.5
      q1     q2    q3    q4    q5    q6    q7    q8     q9
q1    1      0.9   0.8   0.9   0.8   0.2   0.6   0.6    0.3
q2    0.9    1     0.6   0.8   0.6   0.3   0.5   0.3    0
q3    0.8    0.6   1     0.7   0.8   0.5   0.4   0.6    0.3
q4    0.90   0.8   0.7   1     0.7   0.2   0.6   0.53   0.2
q5    0.75   0.6   0.8   0.7   1     0.2   0.4   0.6    0.3
q6    0.20   0.3   0.5   0.2   0.2   1     0.6   0.3    0.2
q7    0.60   0.5   0.4   0.6   0.4   0.6   1     0.5    0.4
q8    0.60   0.3   0.6   0.5   0.6   0.3   0.5   1      0.6
q9    0.20   0     0.2   0.2   0.3   0.2   0.4   0.7    1
Algorithm 3-2 starts the decomposition process by forming sets of queries such that each
of the sets contains one query (lines 1-4) and places them in G. In our example, G = {{q1},
{q2}, {q3}, {q4}, {q5}, {q6}, {q7}, {q8}, {q9}}.
As described in lines 10 to 12, the algorithm merges two sets. Qc is merged with Qk
∈ G if and only if (i) any two elements in the two sets are similar and (ii) if there is another
set Qs in G satisfying the condition stated in (i), Qc is more similar to Qk than to
Qs. The merging of sets is repeated until there are no similar sets in G.
According to the example execution flows of the algorithm in Table 3-5, the queries in
our example are decomposed into G= {Q1, Q2, Q3} where Q1 = {q1, q2, q3, q4, q5} and Q2 =
{q6, q7} and Q3 = {q8, q9}.
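The merging strategy can be sketched in Python as shown below. The sketch is a simplified, deterministic variant written for illustration: groups are visited in list order instead of being selected randomly, a symmetric similarity function is assumed, and the Jaccard index is used in place of the formula of Definition 3.2, so it is not expected to reproduce Table 3-4 or Table 3-5. The names `decompose` and `jaccard` are hypothetical.

```python
def jaccard(a, b):
    """Assumed similarity measure: Jaccard index of two keyword sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def decompose(queries, sim, threshold=0.5):
    """Group queries so that all members of a group are pairwise similar."""

    def all_similar(ga, gb):
        # Every query of ga must be similar to every query of gb.
        return all(sim(a, b) >= threshold for a in ga for b in gb)

    def avg_sim(ga, gb):
        # Left- and right-hand sides of property (B).
        return sum(sim(a, b) for a in ga for b in gb) / (len(ga) * len(gb))

    groups = [[q] for q in queries]          # lines 1-4
    merged = True
    while merged:                            # lines 5-19
        merged = False
        pending, new_groups = groups, []
        while pending:
            current, pending = pending[0], pending[1:]
            candidates = [g for g in pending if all_similar(current, g)]
            if candidates:
                # Pick the mergeable group that is most similar on average.
                best = max(candidates, key=lambda g: avg_sim(current, g))
                pending = [g for g in pending if g is not best]
                new_groups.append(current + best)
                merged = True
            else:
                new_groups.append(current)
        groups = new_groups
    return groups
```

With the five toy queries {tree, bush, grass}, {tree, bush}, {sky, clouds}, {sky, clouds, sun} and {car}, the sketch returns three groups: the two vegetation queries, the two sky queries, and the isolated query.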
Table 3-5: Example of execution flows during the decomposition of queries
Association rules can be modified in order to increase their confidence. For instance, in
order to make the confidence of the above rule equal to 1, we can simply modify the
antecedent as <context = (Bus 27, [8AM-8:10AM])>.
7 To produce a rule, we should compare the confidence and the support of a rule with predefined constants. For the sake of simplicity, we skip this step in the illustration.
In this section, we discuss mining of association rules with respect to the users’
information demands. The information provisions of users can be computed in the same
manner.
3.5 Social Networking
A requester peer can use association rules to identify his/her information demands. A
data source peer can produce association rules to identify the information demands of requester
peers. However, rule-mining processes are too expensive to be used for every requester
peer encountered in a MANET. Therefore, a data-source should select the important peers
for which association rules are produced. We propose that the social links of a data source
be used to identify the important peers.
As a data-source can have several social links, the reasoning required to identify the
interests of requester peers could be expensive. We propose organizing peers that have a
habit of sharing information with a data source peer into social groups according to the
similarity of their interests. The social groups are then used to identify the interests of the
peers.
Social networks, which include social groups and links, are computed semi-
automatically based on the following assumption: “social network exists between users
who collaborate frequently with each other”. In the scenario discussed in section 3.1,
Pascal, David and Carol exchange jokes whenever they meet in Bus 37. These frequent
collaborations between these persons indicate that there is a social link between them.
3.5.1 Social Link
A social link, denoted as L(pi,pj), is a relationship between two peers pi and pj. The
notation Valid-context(L(pi,pj)) is used to represent the set of sharing contexts in which the
social link L(pi,pj) is valid. In our scenario, Pascal exchanges information with his
colleagues; assume that he communicates with these persons only at INSA. Thus, he
has social links to his colleagues and (INSA, ∅) is the valid context of these social links.
As it is done in social-network-sites (e.g. Facebook), a user can manually specify
his/her social links. In the scenario, Pascal, David and Carol are friends. Assume that
Pascal specifies David and Carol as his friends; in this case, there will be a social link
between Pascal and David as well as between Pascal and Carol.
A social link can also be computed semi-automatically by analyzing the degree of
users’ collaboration in MANETs. We argue that the existence of a high degree of
collaborations between p1 and p2 indicates the existence of a social link between these two
peers. As discussed in section 3.3, the threshold of Tie-strength (i.e., min-tie) indicates a
high degree of collaboration between peers. As a result, a social link L(p1, p2) is formed if
and only if Tie-strength(p1, p2) ≥ min-tie. The context (“”,∅) is placed in Valid-
context(L(p1, p2)). Users can also specify the valid contexts of L(p1, p2).
In the scenario discussed in section 3.1, Pascal, Anne, Bob and Eve exchange
information in Bus 37. Suppose they did not specify the fact that they are friends. Their
tie-strengths, which are measured in files/day, are listed in Table 3-8. Let min-tie be 0.1
files/day; we can conclude that there is a social link between Pascal and Eve and that Anne has
social links with all the other persons mentioned.
Table 3-8: Tie-Strengths between Pascal, Anne, Bob and Eve
         Pascal   Anne   Bob     Eve
Pascal   -        0.5    0.001   1
Anne     0.5      -      0.3     0.6
Bob      0.001    0.3    -       0.04
Eve      1        0.6    0.04    -
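The semi-automatic derivation of social links from Table 3-8 can be sketched in Python as follows. The dictionary layout and the function name `social_links` are assumptions of this sketch; the rule itself is the one stated above, i.e., a link is formed if and only if the tie-strength reaches min-tie.

```python
# Tie-strengths of Table 3-8 (files/day); each unordered pair is stored once.
TIE_STRENGTH = {
    ("Pascal", "Anne"): 0.5,
    ("Pascal", "Bob"): 0.001,
    ("Pascal", "Eve"): 1.0,
    ("Anne", "Bob"): 0.3,
    ("Anne", "Eve"): 0.6,
    ("Bob", "Eve"): 0.04,
}


def social_links(tie_strength, min_tie):
    """Return the set of peer pairs whose tie-strength is at least min_tie."""
    return {frozenset(pair)
            for pair, value in tie_strength.items()
            if value >= min_tie}


links = social_links(TIE_STRENGTH, min_tie=0.1)
for link in sorted(sorted(pair) for pair in links):
    print(link)
```

With min-tie = 0.1 files/day, the links are Pascal-Anne, Pascal-Eve, Anne-Bob and Anne-Eve, which agrees with the conclusion drawn in the text.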
3.5.2 Social Grouping
A social group, denoted as G(P,C,p), is a set of peers in P having similar links with a
peer p in a context c ∈ C; the peer p is called the “observer peer” and C is a set of valid
contexts for the group. Demand-In-Group(G(P,C,p),c) denotes the common information
demand of the peers in P as observed by p in a context c. The social group G(P,C,p) satisfies
the following properties:
• Information-Demand(pi,p,c) ≈ Information-Demand(pj,p,c) for pi, pj ∈ P and
c ∈ C; and
• Demand-In-Group(G(P,C,p),c) is an empty sharing interest if c ∉ C.
The common information demand of the peers in the social group G(P,C,p) in the context c
∈ C is obtained by aggregating the information demands of the peers in P, i.e.,
10 For the sake of simplicity, we consider mobility classes defined for the same contexts. However, it is simple to extend the operation to
consider similar contexts.
Chapter 4 Lifetime Awareness
The advertisement policies, the satisfaction factors and the overload factors of the
resulting mobility classes are computed by using operation 4.1.
4.3.2 Computation
We propose a method to generate mobility-classes for a data source peer ps in a given
context according to the following objective. Mobility classes should be formed in such a
way that (1) the network traffic created by advertisements is manageable, (2) the discovery
of information is facilitated, and (3) the number of queries to be distributed is reduced.
Assume that a peer ps has just started sharing information in a sharing-context c. If there
is a peer pi such that the information provisions of ps and pi are similar, ps can use the
characteristics of the mobility classes of pi to define its own mobility classes. Otherwise, a
mobility-class m(ps,c) with range-lifetimes(m(ps,c))= [0, ∞) and adv-volume(m(ps,c))=0
will be used as the only mobility class. In this case, data-requesters discover the sharable
files of ps by using the pull discovery approach in the context c.
We propose to enhance the efficiency of mobility classes by using three types of
heuristics: optimistic, pessimistic and neutral. The optimistic heuristic modifies mobility
classes based on the assumption: “inefficiency occurs since (1) the advertisement volume is
too limited to include the files needed by users and/or (2) the advertisement radius is too
short to reach the potential users.” The pessimistic heuristic performs modifications based
on the assumption: “the inefficiency of the mobility class occurs due to an overestimation
of the advertisement volume.” The neutral heuristic does not make any assumption but
tries to enhance the efficiency of the mobility classes by merging/dividing them as well as
by using the behavior of the mobility classes computed by similar peers.
The optimistic heuristics increases the volume of advertisement while the pessimistic
heuristics reduces the volume of advertisement. Let [t1, t2) be the range-lifetimes of a
mobility-class m(ps,c). Let ∆ and α be pre-defined incremental factors of the advertisement
volume and of the period respectively. Let β be the highest number of hops that an
advertisement can traverse. The pseudo-codes in Figure 4-1 and Figure 4-2 are used to
augment and reduce the total number of metadata distributed in a mobility-class m(ps,c).
1. If adv-volume(m(ps,c))+ ∆ ≤ Overload-Factor (m(ps,c)) then
adv-volume(m(ps,c))+=∆
2. if Adv-radius(m(ps,c))< β then Adv-radius(m(ps,c))++
3. if adv-period(m(ps,c)) > α then adv-period(m(ps,c))-= α
Figure 4-1. Augment-Volume
1. if adv-volume(m(ps,c)) ≤ ∆ && (Adv-radius(m(ps,c)) ≤ 1) &&
adv-period(m(ps,c))+α ≥ (t2 - t1) then adv-volume(m(ps,c))=0
2. if adv-volume(m(ps,c))>∆ then adv-volume(m(ps,c))-=∆
3. if Adv-radius(m(ps,c))>1 then Adv-radius(m(ps,c))--
4. if adv-period(m(ps,c))+α <(t2- t1) then adv-period(m(ps,c))+= α
Figure 4-2. Reduce-Volume
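The two procedures of Figure 4-1 and Figure 4-2 translate directly into the Python sketch below. The `MobilityClass` data class and its field names are assumptions made for the sketch; only the update rules follow the figures, with ∆ passed as `delta`, α as `alpha` and β as `beta`.

```python
from dataclasses import dataclass


@dataclass
class MobilityClass:
    """Hypothetical container for the parameters of a mobility class m(ps,c)."""
    t1: float             # lower bound of range-lifetimes [t1, t2)
    t2: float             # upper bound of range-lifetimes [t1, t2)
    adv_volume: int       # number of metadata advertised
    adv_radius: int       # number of hops an advertisement travels
    adv_period: float     # time between two advertisements
    overload_factor: int  # upper limit for adv_volume


def augment_volume(m, delta, alpha, beta):
    """Figure 4-1: advertise more, farther and more often."""
    if m.adv_volume + delta <= m.overload_factor:
        m.adv_volume += delta
    if m.adv_radius < beta:
        m.adv_radius += 1
    if m.adv_period > alpha:
        m.adv_period -= alpha


def reduce_volume(m, delta, alpha):
    """Figure 4-2: advertise less, closer and less often."""
    if (m.adv_volume <= delta and m.adv_radius <= 1
            and m.adv_period + alpha >= m.t2 - m.t1):
        m.adv_volume = 0  # nothing left to reduce: stop advertising
    if m.adv_volume > delta:
        m.adv_volume -= delta
    if m.adv_radius > 1:
        m.adv_radius -= 1
    if m.adv_period + alpha < m.t2 - m.t1:
        m.adv_period += alpha
```

For example, with ∆ = 2, α = 1 and β = 3, a mobility class with range-lifetimes [0, 6), volume 4, radius 2, period 2 and overload factor 6 moves to volume 6, radius 3 and period 1 after one call to `augment_volume`, and back to volume 4, radius 2 and period 2 after one call to `reduce_volume`.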
In Algorithm 4-1, we use the neutral heuristic as long as possible. However, the neutral
approach cannot be applied if the operations “Merge”, “Copy-adv” and “Divide” cannot be
performed. In this case, we propose the application of the optimistic heuristic.
Algorithm: Mobility Class Computation
Input: ps, M(p,c) for all p ∈ PA ∪ {ps}, Inf-Pr(p) for all p ∈ PA ∪ {ps}
    ps: data source peer in consideration
    M(p,c): mobility classes of the peer p in the sharing context c
    PA: set of peers that have an information provision similar to that of ps in the
        context c and have a high degree of collaboration with ps
    Inf-Pr(p): denotes Information-Provision(p)
Output: M(ps,c)
Begin
1.  For each mi(ps,c) ∈ M(ps,c) such that S(mi(ps,c)) ≠ ∅
2.      If (Efficient(mi(ps,c)))
3.          #opt-mod(mi(ps,c)) ← 0
4.      Else
            // Reduce: Pessimistic Heuristics
23.         Case 5: Usage-Factor(mi(ps,c)) < Satisfactory-Factor(mi(ps,c))
24.             Reduce-Volume(mi(ps,c))
25.         End Case
26. End For
End Algorithm
Algorithm 4-1: Mobility Class Computing
Let #opt-mod(m(ps,c)) be the number of consecutive optimistic modifications made on
a mobility class m(ps,c) and opt-limit be the maximum number of times that an optimistic
heuristic can be consecutively applied. The optimistic heuristic is said to have failed if
#opt-mod(m(ps,c)) is equal to opt-limit. A pessimistic heuristic is applied when the optimistic
heuristic has failed.
Algorithm 4-1 is used to enhance the efficiency of the mobility classes of ps in a context c.
The algorithm accepts as inputs the mobility classes of the peer and of the set of peers that
have information provisions similar to that of ps and have a high degree of collaboration with this
peer. The algorithm processes only the mobility classes that have already been applied,
for the following reasons: (1) the efficiency of a mobility class is defined based on
historical observations and (2) it is not important to process a mobility class that has never
been used.
Let PA be the set of data-sources such that pi ∈ PA satisfies the following properties:
A habit rule has two parts: an antecedent context (e.g., <Context=(Bus 3, [8AM-
8:10AM])>) and a consequent mobility class (e.g., <mobility class=m3>). As discussed in
chapter 3, antecedents can be produced by a method derived from those proposed in [89]
and [90, 91]. Let minConf be the minimum threshold confidence of a rule and S(ant) be the
set of sharing statistics matching the antecedent ant. Assume that there is a mobility class
m(p,c) such that ant is an actual sharing context of the abstract context c. According to
Definition 4.6, S(m(p,c)) is the set of sharing statistics matching the mobility class m(p,c).
A rule ant → <mobility class=m(p,c)> is formed if and only if
\[ \frac{|S(ant) \cap S(m(p,c))|}{|S(ant)|} \geq minConf \]
In the scenario presented in chapter 1 and in section 3.1, Pascal exchanges information in
a bus with friends and people working in a bank. Let m(Pascal, (Bus,φ)) be a mobility
class with range-lifetimes [0, 6). Let’s observe the sharing statistics of Pascal displayed in
Table 4-5 and the antecedent <context = (“Bus”,[8AM-8:05AM])>.
The contexts attached to the sharing statistics in Table 4-5 are actual contexts of the
abstract context (Bus,φ). The sharing statistics s1, s2, s4, and s5 have a connectivity
lifetime in the range-lifetimes of the mobility class m(Pascal, (Bus,φ)), i.e., less than 6.
Thus, the rule below can be formed from the above historical data¹¹.
<context=(“Bus”,[8AM-8:05AM])> → <mobility class = m(Pascal, (Bus,φ))>
Table 4-5: Pascal’s sharing statistics
Sharing statistics   Context                    Co-lifetime
s1                   (Bus 27, [8AM-8:07AM])     5
s2                   (Bus 27, [8AM-8:06AM])     5.9
s3                   (Bus 37, [8AM-8:05AM])     7
s4                   (Bus 27, [8AM-8:07AM])     5
s5                   (Bus 27, [8AM-8:05AM])     2
s6                   (Bus 37, [8AM-8:07AM])     7
In general, association rules can be used to estimate the mobility class of a MANET view
according to the actual sharing contexts. These rules can be produced by mining contextual
patterns in the historical sharing statistics and the similarity of the connectivity lifetimes
attached to these historical data.
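The confidence test used to form a habit rule can be sketched in Python with the data of Table 4-5. The sketch rests on two assumptions that are not stated in the text: every statistic recorded in a bus is treated as matching the antecedent (the time intervals are ignored), and the matching predicates are passed in as plain functions. All names are hypothetical.

```python
def rule_confidence(stats, matches_antecedent, in_mobility_class):
    """|S(ant) ∩ S(m(p,c))| / |S(ant)| for a list of sharing statistics."""
    s_ant = [s for s in stats if matches_antecedent(s)]
    if not s_ant:
        return 0.0
    hits = [s for s in s_ant if in_mobility_class(s)]
    return len(hits) / len(s_ant)


# (identifier, location, connectivity lifetime) taken from Table 4-5.
STATS = [
    ("s1", "Bus 27", 5), ("s2", "Bus 27", 5.9), ("s3", "Bus 37", 7),
    ("s4", "Bus 27", 5), ("s5", "Bus 27", 2), ("s6", "Bus 37", 7),
]


def in_class(s):
    """Connectivity lifetime inside the range-lifetimes [0, 6)."""
    return 0 <= s[2] < 6


# Antecedent on any bus: four of the six statistics fall in the class.
confidence_bus = rule_confidence(STATS, lambda s: s[1].startswith("Bus"), in_class)

# Narrower antecedent on Bus 27 only: all four matching statistics fall in it.
confidence_bus27 = rule_confidence(STATS, lambda s: s[1] == "Bus 27", in_class)
```

Under these assumptions the general antecedent gives a confidence of 4/6, whereas restricting the antecedent to Bus 27 raises it to 1; the rule is formed whenever the confidence reaches minConf.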
4.5 Conclusion
Information discovery in a MANET can be performed by using the pull approach (via
querying) and by using the push approach (via advertisements). To maximize the usage of
the push approach, we introduce a novel concept called a mobility-class that parameterizes
the advertisement policy according to users’ stay-times and their context. Mobility classes
can also be computed semi-automatically by using the approach proposed in this chapter.
Peers can determine the current mobility classes by analyzing their stay-times or by using
habit rules.
¹¹ To produce a rule, we should compare the confidence and support of a rule with predefined constants. For
the sake of simplicity, we skip this step in the illustration.
Chapter 5 File classification and Organization
In the previous two chapters, we have presented interest-aware and lifetime-aware
information sharing methodologies. In these two chapters, sharable files are advertised by
disseminating their metadata. This kind of advertisement can impose a high burden on
devices. In this chapter, we propose an algorithm that organizes sharable files in a tree,
named a file tree, so that files can be advertised briefly or in detail by using their
organization in the tree.
The research work in this chapter has been published in the International Journal on
Computer Science and Information Systems [103] and has been presented at the
conference on Pervasive Computing and Communications Workshops (PerComW 2010)
[85].
This chapter is organized as follows: we discuss our motivation in section 5.1; file
representation and organization are covered in section 5.2; classification of files is
presented in section 5.3; we illustrate the application of file trees in file discovery in
section 5.4; section 5.5 discusses the main contributions presented in this chapter; finally,
we conclude this chapter in section 5.6.
5.1 Motivation
Organizing files into a tree facilitates the presentation of files and minimizes the load of
peers involved in information sharing. According to the scenario presented in chapter 1,
Pascal exchanges photos with his friends. Let’s consider the MANET displayed in Figure
5-1 where Pascal is connected to Bob, Carol, David and Eve. Assume that Pascal is
interested in receiving photos of vegetables and the other participants of the MANET are
interested in providing photos of vegetables. Suppose that the advertisement quota is 2
and the forwarding factor is also 2. As a result, each of the participants of the MANET can
advertise two photos of vegetables to Pascal.
Assume that Pascal wants to receive a photo of Jerusalem artichoke. The
advertisement quota is too small to indicate the peers owning the photo that Pascal is
looking for. Most probably, he will be forced to search for the photo by distributing a query.
As the forwarding factor is small, he will need to distribute the query repeatedly. This type
of file discovery overloads the environment with queries and takes time to
satisfy users’ information needs.
Now, assume that the participants of the MANET organize their files in file-trees as shown
in Figure 5-2. Bob informs Pascal that he has photos of tubers and seeds. Other participants
advertise files in a similar fashion. Pascal knows that Jerusalem artichoke is a tuber
vegetable. As a consequence, he learns that Bob has the potential to provide the required
photo. Therefore, Pascal decides to send the query only to Bob.
Organizing files in a tree permits users to advertise files at a high level. This kind of file
advertisement allows users to know the potential of a peer to provide the required files and
so to limit the dissemination of his/her queries to potential peers.
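The benefit of high-level advertisements can be illustrated with the small Python sketch below, in which each peer advertises the labels of two clusters of its file tree and the requester keeps only the peers whose labels cover the category of the requested file. Only Bob's advertisement (tubers and seeds) is taken from the scenario; the assignment of the other cluster labels to Carol, David and Eve, as well as the function name, are hypothetical.

```python
# Cluster labels advertised by each peer (Adv-Volume = 2).
ADVERTISED_CLUSTERS = {
    "Bob": {"tubers", "seeds"},
    "Carol": {"leaves", "roots"},       # hypothetical assignment
    "David": {"bulbs", "leaves"},       # hypothetical assignment
    "Eve": {"fruits", "flowers"},       # hypothetical assignment
}


def candidate_sources(advertised, category):
    """Peers whose advertised clusters contain the requested category."""
    return sorted(peer for peer, clusters in advertised.items()
                  if category in clusters)


# Pascal knows that Jerusalem artichoke is a tuber vegetable.
print(candidate_sources(ADVERTISED_CLUSTERS, "tubers"))  # ['Bob']
```

The query is then sent only to the returned peers instead of being flooded to every neighbor.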
In this chapter, we introduce a concept called “cluster” that organizes files hierarchically.
In other words, a cluster represents a group of files or a group of other clusters. We then
propose an algorithm that classifies files into clusters.
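[Figure 5-1 shows Pascal, who is searching for a photo of Jerusalem artichoke, connected to Bob, Carol, David and Eve. With Adv-Volume = 2 and a forwarding factor of 2, each peer advertises two individual photos (cabbage and carrot; potato and bean; cabbage and onion; tomato and broccoli). Numbered arrows give the task order of the advertisements and of the queries that Pascal distributes.]
Figure 5-1: Query resolution via advertisements about individual files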
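[Figure 5-2 shows the same MANET with the files of each peer organized in a file tree rooted at "Vegetable". The trees contain the clusters (leaves, roots), (tubers, seeds), (bulbs, leaves) and (fruits, flowers, leaves), and the peers advertise cluster labels instead of individual files, again with Adv-Volume = 2 and a forwarding factor of 2. Numbered arrows give the task order of the advertisements and of the query for a photo of Jerusalem artichoke.]
Figure 5-2: File organization and Query resolution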
5.2 Information Representation
Files are represented via their metadata. The metadata of a file are composed of basic
metadata and specialized metadata. Basic metadata are described in Table 5-1. The
FileID attribute is assigned by using a sequential number and the MAC address of the
device where the file is created.
Table 5-1: Basic description of a file photo
Attributes Descriptions
FileID unique identifier of the file
Description list of keywords that describes the file
FileSize the size of the file
Specialized metadata depend on the type of the file. As displayed in Figure 5-3, the
specialized metadata of a photo, for example, can contain the objects of interest identified in
the photo and the spatial/temporal context in which the photo was taken.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Description of the file format -->
<actor>Pascal, Anne, Michael</actor>
<Format>
  <Type>jpeg</Type>
</Format>
<location>Part Dieu</location>
<Time>28/04/2010</Time>
Figure 5-3: Example of specialized metadata of a photo
Well-known content description metadata models like Dublin Core [104] and MPEG-7
[105] can be used to represent the metadata of a file. In this chapter, we use an abstract
representation of metadata to present and discuss our work.
To facilitate file searching and categorization, files are mapped into a space via vector
space modeling (VSM) techniques [87,106]. Vector space modeling is a standard
technique in information retrieval to represent documents through their contents.
In VSM, a document di is represented by a vector di = (wi1, wi2, ..., win), where wij
represents the weight of the term j in the document di. To produce this vector for a text
document, the document is parsed into a series of words, and the parsing
process removes stop words such as prepositions, conjunctions, common verbs, pronouns,
articles and common adjectives. The documents are then represented in a term x frequency
matrix [87,106]. A document vector can be considered as a vector in the term x frequency
space, which is usually referred to as the vector space.
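As an illustration of the term x frequency representation described above, the following minimal sketch builds raw term-frequency vectors from textual descriptions. The function name `term_vectors`, the tiny `STOP_WORDS` list and the use of raw counts as weights are assumptions made for the example; they are not prescribed by the thesis.

```python
from collections import Counter

# Illustrative stop-word list (assumption); a real parser would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "on"}


def term_vectors(descriptions):
    """Return (vocabulary, vectors), one raw term-frequency vector per description."""
    # Parse each description into words and drop the stop words.
    tokenized = [
        [word for word in text.lower().split() if word not in STOP_WORDS]
        for text in descriptions
    ]
    # The vocabulary fixes the dimensions of the vector space.
    vocabulary = sorted({word for doc in tokenized for word in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


vocabulary, vectors = term_vectors(["photo of a potato tuber", "photo of an onion bulb"])
```

Here each row of `vectors` plays the role of a document vector di, with the weight wij taken to be the number of occurrences of term j.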
In this thesis, we propose to organize files hierarchically into clusters. The structure
containing the clusters is called a file tree. The metadata of a cluster are described in Table
5-2. The metadata of a cluster contain its description as well as the IDs of files and sub-
clusters grouped under it. The metadata also contain the average size of files grouped in the
cluster. An example of metadata of a cluster is given in Figure 5-4.
Table 5-2: Description of a cluster
Attributes Descriptions
ClusterID unique identifier of the represented cluster
Description list of keywords that describes the cluster
FilesIDs ids of the files found under the represented cluster
SubClusterIDs ids of the clusters found under the represented cluster
AvgFileSize the average size of files grouped in the cluster
The vector representation of a cluster c is computed as follows. Let V be the set of
vector representations of the files/clusters classified under c. The vector representation of c is
the centroid vector of V, i.e., the average of the vectors in V. In the example displayed in
Figure 5-5, V contains the vectors in the circle, and the center of the circle is the vector
representation of the cluster.
[Figure legend: the center of the circle is the vector representation of c; the other points are the vectors classified under the cluster.]
Figure 5-5: Vector representation of a cluster
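The centroid computation can be sketched in a few lines. The function name `centroid` and the representation of vectors as Python lists of equal length are assumptions of this example.

```python
def centroid(vectors):
    """Average the vectors of V component by component (V must be non-empty)."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


# Vector representation of a cluster grouping three files (illustrative values).
cluster_vector = centroid([[1, 0], [0, 1], [2, 2]])
```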
We propose to apply a VSM technique to construct a virtual vector space to represent
advertised files. As keywords in the textual descriptions of files are the most important
terms of the file, in this thesis, we propose to construct the virtual vector space from the
term statistics with respect to textual descriptions of files.
5.3 Classification Algorithm
5.3.1 File Classification
Clusters are organized into a structure called a file tree. The root of the tree is an artificial
cluster representing all sharable files.
A file-tree is constructed in a bottom up fashion; files are classified into clusters; the
resulting clusters are then classified into other clusters; the classification continues until a
tree of the required height is obtained.
New sharable files, which are added after the classification has been performed, can be
automatically added into clusters found at the leaves of the file tree.
Files/clusters can be classified by using a content-based approach, a metadata-based
approach or a hybrid of the two. A content-based approach performs classifications by
using the files’/clusters’ vector-representations; a metadata-based approach performs
classifications by using the textual-description of the files/clusters.
A content-based approach may not always be applicable since some of the files may not
have vector representations. A vector space cannot be computed every time a file is added
for the following reasons: (1) vector space computation is expensive; and (2) when a
file is added on a thin device, this device may not immediately encounter a device that is
capable of computing the vector space on its behalf. Consequently, recently added files
may not have vector representations. We propose to apply a hybrid approach if there are
files that do not have vector representations (see footnote 13).
In a hybrid classification approach, files are first classified by using a metadata-based
approach into clusters such that each of the clusters contains some files that have a vector
representation. The vector representations of the resulting clusters are determined by using
only the files that have a vector representation. Afterwards, the classification of the
clusters is performed by a content-based approach.
The k-means* algorithm (Algorithm 5-1) classifies files/clusters according to their
similarities. Let us discuss the similarity of files and clusters before discussing the
algorithm.
The similarity between files and clusters can be computed based on the similarity of
either their textual descriptions or their vector-representations. Let E1 and E2 represent
clusters, or/and files. Let the sets of keywords D1 and D2 be the textual descriptions of E1
and E2 respectively.
As discussed in chapter 3, D1 is similar to D2 if and only if Similarity(D1, D2) ≥ minSim,
where minSim is the similarity threshold. The similarity value between E1 and E2 is equal
to the similarity value between D1 and D2, i.e.,
Similarity(E1, E2) = Similarity(D1, D2).
Footnote 13: In this chapter, all files are assumed to have metadata.
E1 and E2 are similar if and only if D1 and D2 are similar, i.e.,
E1 ≈ E2 ⇔ D1 ≈ D2
The similarity of files and clusters can also be derived from the similarity of their vector
representations. Let γ12 be the angle between the vector representations of E1 and E2; the
similarity of E1 and E2 is calculated as:
Similarity(E1, E2) = cos(γ12)
The elements E1 and E2 are said to be similar if and only if
Similarity(E1, E2) ≥ minCosSim
where minCosSim is the minimum cosine similarity value.
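The content-based similarity test can be sketched as follows. The function names `cosine_similarity` and `are_similar` are hypothetical, and returning 0 for a zero vector is an assumption of the example (the thesis does not discuss that case).

```python
import math


def cosine_similarity(u, v):
    """cos(gamma_12): cosine of the angle between the vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Assumption: a zero vector is treated as dissimilar to everything.
    return dot / norm if norm else 0.0


def are_similar(u, v, min_cos_sim):
    """E1 and E2 are similar if and only if Similarity(E1, E2) >= minCosSim."""
    return cosine_similarity(u, v) >= min_cos_sim
```

For instance, with a threshold of 0.7, the vectors (1, 0) and (1, 1) are similar (cosine of about 0.707), whereas the orthogonal vectors (1, 0) and (0, 1) are not.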
K-means* performs the classification according to the similarity of files and
clusters. Let k be the number of clusters required and S be the textual descriptions or vector
representations of the files/clusters to be classified. The algorithm puts k or fewer mutually
dissimilar elements in the set heads (lines 3-9), such that every pair of distinct elements in
heads satisfies the following condition.
Similarity(si, sj) < minSim for si, sj ∈ heads, si ≠ sj
Note that the value of minSim is different for content-based and metadata-based
classifications.
As described in lines 11-14, k-means* classifies the elements in S according to their
similarities to the elements in the set heads. It then copies the content of the set heads, which
was initialized at the beginning of the algorithm, to the set oldHeads and re-initializes
the set heads (lines 15 and 16). From each group, the algorithm selects as head the
element that is more similar to the elements in the
group than any other element of this group (lines 17-19). Regrouping and recomputation of
group-heads continue until the heads of two consecutive steps are identical or the loop has
been performed a maximum number of times (maxIteration).
Algorithm: k-means*
INPUT: S, k, minSim
  S: files/clusters representations
  k: number of clusters
  minSim: a threshold indicating the similarity of elements. Note that minSim has
          different values for content-based and metadata-based classifications.
OUTPUT: heads, members(e) ∀e ∈ heads
  heads: group-heads of the resulting clusters
  members(e): files/clusters members of the cluster headed by e
BEGIN
     //take dissimilar elements randomly
  1. heads = ø
  2. S' = S
     //find k or less number of heads
  3. WHILE (|heads| < k) && (S' ≠ ø)
  4.   α = S'.randomSelect() //take an element randomly
  5.   S' = S' - {α} //remove the element
       //add α in heads if it is dissimilar to the other elements in heads
  6.   IF ((heads = ø) || (Similarity(α, β) < minSim, ∀β ∈ heads))
  7.     heads.add(α)
  8.   END IF
  9. END WHILE
 10. i = 0
 11. DO
       //map elements into clusters
 12.   FOR each s ∈ S
         /*put s in the group headed by α if s is more similar to α than to other group heads*/
 13.     members(α).add(s) for α ∈ heads such that Similarity(s, α) ≥ Similarity(s, β) ∀β ∈ heads
 14.   END FOR
       //copy the content of heads into oldHeads
 15.   oldHeads = heads
       //reset heads
 16.   heads = ø
       //re-compute heads
 17.   FOR each h ∈ oldHeads
         //determine the best head for the group currently headed by h
 18.     heads.add(α) such that, ∀β ≠ α ∈ members(h),
           Σ_{w ∈ members(h)} Similarity(α, w) ≥ Σ_{w ∈ members(h)} Similarity(β, w)
 19.   END FOR
 20.   i++
 21. WHILE ((oldHeads != heads) && (i < maxIteration))
END ALGORITHM
Algorithm 5-1: file clustering based on k-means*
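The following is a minimal runnable sketch of the steps of Algorithm 5-1, not a reference implementation. It assumes that elements are hashable tuples, that k is at least 1, and that cosine similarity is used (content-based case); the names `cosine` and `k_means_star`, the `seed` parameter and the default `max_iteration` are choices made for the example. Unlike the pseudocode, the sketch rebuilds the member lists at every iteration so that groups never accumulate stale members.

```python
import math
import random


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def k_means_star(items, k, min_sim, similarity=cosine, max_iteration=20, seed=0):
    """Return {head: members}; heads are always elements of `items`."""
    # Lines 1-9: pick up to k mutually dissimilar elements as initial heads.
    pool = list(items)
    random.Random(seed).shuffle(pool)
    heads = []
    for candidate in pool:
        if len(heads) == k:
            break
        if all(similarity(candidate, h) < min_sim for h in heads):
            heads.append(candidate)

    def assign(current_heads):
        # Lines 12-14: map every element to its most similar head.
        groups = {h: [] for h in current_heads}
        for s in items:
            best = max(current_heads, key=lambda h: similarity(s, h))
            groups[best].append(s)
        return groups

    for _ in range(max_iteration):
        groups = assign(heads)
        # Lines 17-19: the new head maximizes the summed similarity to its group.
        new_heads = [
            max(groups[h], key=lambda a: sum(similarity(a, w) for w in groups[h]))
            if groups[h] else h
            for h in heads
        ]
        if new_heads == heads:  # line 21: heads unchanged, stop
            break
        heads = new_heads
    return assign(heads)


files = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
clusters = k_means_star(files, k=2, min_sim=0.5)
```

With these four illustrative vectors, the two files close to the first axis end up in one group and the two files close to the second axis in the other, whichever element is drawn first.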
Let the height of a file tree be h and the number of clusters at depth i be ni for 1 ≤ i ≤ h.
The file tree is computed as:
Step 1: files are classified into nh clusters by using the k-means* algorithm.
Step 2: for each depth i, i = h-1, h-2, …, 1, clusters found at depth i + 1 are classified
into ni clusters.
Step 3: all clusters at depth 1 are grouped into the root cluster, which is an artificial
cluster representing all sharable files.
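The three steps above can be sketched as a bottom-up loop. In this sketch, `classify` is a placeholder for any classifier (such as k-means*), and `round_robin` is only a stand-in used to keep the example self-contained; it does not group elements by similarity. The names `build_file_tree` and `round_robin`, and the representation of a cluster as a tuple of its members, are assumptions of the example.

```python
def round_robin(elements, n):
    """Stand-in classifier (assumption): deal the elements into at most n groups."""
    return [elements[i::n] for i in range(min(n, len(elements)))]


def build_file_tree(files, clusters_per_depth, classify):
    """clusters_per_depth[i - 1] is n_i; returns {depth: list of clusters}."""
    levels = {}
    current = list(files)
    # Steps 1 and 2: classify files at depth h, then clusters at depths h-1, ..., 1.
    for depth in range(len(clusters_per_depth), 0, -1):
        groups = classify(current, clusters_per_depth[depth - 1])
        levels[depth] = groups
        # Each group becomes one element to classify at the next shallower depth.
        current = [tuple(group) for group in groups]
    # Step 3: the artificial root cluster groups all clusters in levels[1].
    return levels


tree = build_file_tree([f"f{i}" for i in range(1, 9)], [2, 4], round_robin)
```

In this usage example, eight files are grouped into four clusters at depth 2, which are in turn grouped into two clusters at depth 1.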
The next section studies the determination of the dimension of a tree (i.e., the height of the
tree and the number of clusters at each depth of the tree) as a function of the mobility classes.
5.3.2 Computation of the File tree’s Dimension
We propose to compute the height of a file tree and the number of clusters at each depth
according to the mobility classes considered by a source peer to determine the
advertisement policies in MANETs. The number of clusters at a depth of the file tree
should be related to the volume of advertisement attached with a mobility class m so that
the clusters in this depth will be advertised in MANET-views described by the mobility
class m.
Consider the file tree in Figure 5-6. Assume that there are two mobility classes m1 and m2
with advertisement volumes equal to 2 and 8 metadata respectively. As a result, the clusters at
depth 1 correspond to the mobility class m1 and those at depth 2 to the mobility class m2.
Therefore, the clusters c11 and c12 will be advertised in MANET-views described by the
mobility class m1; the clusters c21, c22, c23, c24, c25, c26, c27 and c28 will be advertised in
MANET-views described by the mobility class m2. We discuss advertisements of
files/clusters in section 5.5.
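[Figure: a file tree rooted at C0; the clusters C11 and C12 at depth 1 are associated with the mobility class m1, and the clusters C21 to C28 at depth 2 with the mobility class m2.]
Figure 5-6: An example of association of a file-tree with mobility classes.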
Not all of the mobility classes can be considered to compute the dimension of the file tree
(i.e., the height of the file tree and the number of clusters at each depth of the tree), for
the following reason. Redundant clusters can be created since the advertisement volumes
of some mobility classes may be the same or may not be significantly different.
Assume that there are mobility classes m1, m2 and m3 with advertisement volumes 3, 4 and
8 metadata. If m1, m2 and m3 are all considered to compute the dimension of the file tree, a file
tree that looks like the one in Figure 5-7 will result. Note that there are clusters
representing the same group of files at depths 1 and 2; the clusters c11 and c21 as well as
the clusters c12 and c22 represent the same files. To avoid this kind of redundancy of
clusters, we propose to identify representative mobility classes that will be used to compute
the dimension of the file-tree.
Representative mobility classes are those mobility classes that show significant
differences in terms of advertisement volumes. Let β be a significance factor such that β >
1. A mobility class mi is said to be significantly greater than a mobility class mj (denoted
as mi > mj) if and only if
\[ \frac{\text{adv-volume}(m_i)}{\text{adv-volume}(m_j)} \geq \beta \]
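[Figure: a file tree rooted at C0 with three depths; the clusters of depth 1 correspond to the mobility class m1, the clusters C21 to C24 of depth 2 to m2, and the clusters C31 to C38 of depth 3 to m3.]
Figure 5-7: The redundancy created by considering all mobility classes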
A mobility class mi is said to be significantly less than a mobility class mj (denoted as mi
< mj) if and only if mj > mi. The mobility classes mi and mj are called significantly
different if and only if mi > mj or mj > mi.
Let nf be the number of sharable files. Let M be the set of all mobility classes considered
by a peer during information sharing. The set Mimp ⊂ M is called a set of representative
classes if and only if it satisfies the following properties
[1] mi < mj or mj < mi, ∀mi, mj ∈ Mimp with mi ≠ mj
[2] for ∀m ∈ M - Mimp, one of the following properties is satisfied
a. ∃mi ∈ Mimp such that \( \frac{\text{adv-volume}(m_i)}{\text{adv-volume}(m)} < \beta \)
b. \( \frac{nf}{\text{adv-volume}(m)} < \beta \)
[3] one of the following conditions holds true for ∀m ∈ Mimp
a. the mobility class is found inside the list, i.e., ∃mi, mj ∈ Mimp - {m}
such that mi < m < mj,
b. the mobility class is found at the end of the list, i.e., mi < m, ∀mi ∈ Mimp -
{m} and β * adv-volume(m) ≤ nf,
c. the mobility class is found at the beginning of the list, i.e., m<mi, ∀mi ∈
Mimp –{m} and there is no mj ∈ M such that mj<m
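Let us consider the mobility classes that are used to produce the file tree in Figure 5-7
(remember that m1, m2 and m3 have advertisement volumes 3, 4 and 8 metadata in the
example). Assume that nf is 16 and β is 2. Then m2 and m3 are the representative mobility
classes: m3 is retained because β * adv-volume(m3) = 16 ≤ nf, m2 is retained because
adv-volume(m3) / adv-volume(m2) = 2 ≥ β, and m1 is discarded because
adv-volume(m2) / adv-volume(m1) = 4/3 < β.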
Algorithm 5-2 is used to calculate the representative mobility classes as follows. As
described in line 2, all mobility classes in M that have an advertisement volume
significantly less than nf, the number of sharable files (i.e., β * adv-volume(m) ≤ nf), are
placed in a set named M'. The algorithm then identifies, from M', the mobility class having
the maximum advertisement volume as a representative class (step 1). Let us call this
mobility class mcp. The algorithm reinitializes M' to contain the mobility classes that are
significantly less than mcp (step 2). The algorithm repeats steps 1 and 2 until M' becomes
an empty set.
Algorithm: representative mobility class computation
Input: M, β, nf
  M: list of mobility classes
  β: significance factor
  nf: the number of sharable files
Output: Mimp
  Mimp: list of representative classes
Begin
    /*initialization*/
 1. Mimp = ∅
    /*identify mobility classes that have advertisement volumes significantly less than nf*/
 2. M' = {m | m ∈ M and adv-volume(m) * β ≤ nf}
    /*compute representative mobility classes*/
 3. While (M' != ∅)
      /*Step 1: identify the mobility class having the maximum adv-volume as a representative class*/
 4.   Remove mcp ∈ M' such that adv-volume(mcp) ≥ adv-volume(m) ∀m ∈ M'
 5.   Mimp += {mcp}
      /*Step 2: reinitialize M' to contain the mobility classes significantly less than mcp*/
 6.   M' = {m | m ∈ M' and m < mcp}
 7. End while
End Algorithm
Algorithm 5-2: representative mobility class computation
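A compact sketch of Algorithm 5-2 is given below. It assumes that mobility classes are given as a dictionary mapping a class name to its advertisement volume, and that the initial filter uses β * adv-volume(m) ≤ nf, which is the reading consistent with the worked example (volumes 3, 4 and 8, nf = 16, β = 2). The function name `representative_classes` and the ascending ordering of the result are choices made for the example.

```python
def representative_classes(volumes, beta, nf):
    """Return the representative classes, ordered by increasing advertisement volume."""
    # Line 2: keep the classes whose volume is significantly less than nf.
    candidates = [m for m, volume in volumes.items() if volume * beta <= nf]
    selected = []
    while candidates:
        # Step 1: the candidate with the maximum volume is representative.
        mcp = max(candidates, key=lambda m: volumes[m])
        selected.append(mcp)
        # Step 2: keep only the classes significantly less than mcp
        # (mcp itself is dropped because its ratio is 1 < beta).
        candidates = [m for m in candidates if volumes[mcp] / volumes[m] >= beta]
    return sorted(selected, key=lambda m: volumes[m])


classes = representative_classes({"m1": 3, "m2": 4, "m3": 8}, beta=2, nf=16)
# classes == ["m2", "m3"]
```

With the values of the example, the resulting tree has height 2, with 4 clusters at depth 1 and 8 clusters at depth 2.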
The condition Mimp = ∅ indicates that the number of sharable files is not significantly
different from the advertisement volume attached to any of the mobility classes. In
this case, advertisement can be made by using the metadata of all individual files; thus,
classification is not needed. Otherwise, the height of the tree is |Mimp| and the
number of clusters at each depth i equals the advertisement volume attached to the ith
mobility class of Mimp, the classes being listed in increasing order of advertisement volume.
5.4 Information Sharing Based on File Organization
5.4.1 Information Advertisement
As discussed in chapter 3, a data source can make advertisements by using the metadata
of every sharable file. As discussed at the beginning of this chapter, this kind of
advertisement will overload the environment with queries. In this chapter, we propose
advertising files by using descriptions of clusters that represent groups of sharable files.
The advertisement message can contain only clusters found at the shallowest or the
deepest level of a file-tree. As discussed in chapter 4, the current mobility class is used to
determine the volume of advertisement. As discussed in chapter 3, the overall demand of
the peers in the MANET view is used to determine the content of advertisements.
Files and clusters are mapped to the users' interests in the overall demand according to
their reciprocal similarities. Let F(I) and C(I) be the files and clusters matching the interest I.
A file f and a cluster c are placed in the sets F(I) and C(I) respectively if and only if (1) c and f
are relevant to I and (2) for any other interest Ij in the overall demand, c and f are more relevant
to I than to Ij. The relevance of files and clusters is computed according to their similarity
to the interest.
We propose to compute the content of advertisements by using Algorithm 5-3 according
to the interests of users, the mobility class of the MANET view and the arrangement of the
sharable files in the file-tree. Let m be a mobility class describing the current MANET
view and let Sod be the overall-demand of the peers in the MANET view. The data source
peer prepares advertisements of files using Algorithm 5-3 according to the overall demand
Sod and the advertisement volume attached with the mobility class m. The total volume of
advertisements with respect to the interests in Sod should be adv-Volume(m) and the sum of
the weight of the interests in Sod is 1; thus, the advertisement quota for the interest I in Sod,
denoted as N(I), is computed as:
N (I) = weight (I)*adv-Volume (m)
Let F be the set of files and Ck be the set of clusters found at depth k of the file tree. Let
F(I) ⊆ F and Ck(I) ⊆ Ck be the sets of files and clusters matching the interest I, and let ADV(I) be
an advertisement container for the interest I. For an empty interest Ie, i.e.,
Description(Ie) = ∅, F(Ie) and Ck(Ie) are computed as follows.
• \( F(I_e) = F - \bigcup_{I \in S_{od} - \{I_e\}} F(I) \)
• \( C_k(I_e) = C_k - \bigcup_{I \in S_{od} - \{I_e\}} C_k(I) \)
Let E be a set of files/clusters; the set Relevant(E, I, n) represents the n most relevant
(similar) elements of E with respect to the interest I, i.e., similarity(ei, I) ≥ similarity(ej, I)
for ∀ei ∈ Relevant(E, I, n), ej ∈ E- Relevant(E, I, n).
Algorithm 5-3 selects the metadata of files and clusters to be distributed in the
environment as follows. As indicated in lines 3 to 6, the metadata of all files in F(I) are
selected if N(I) is large enough to advertise all of them individually. Otherwise, starting from
the leaves of the file tree, the algorithm searches for a depth of the file tree where the number
of clusters is less than N(I) (lines 7-10).
Let us call this depth k. If the above search is unsuccessful, the metadata of the most
relevant clusters at depth 1 are placed in ADV(I) (lines 11-14). Otherwise, as described in
lines 15 to 22, the metadata of the most relevant clusters found from depth k to
depth h (the height of the tree) are placed in the set ADV(I) according to their position in
the file tree and their similarity with the interest I. After considering all the above clusters,
the metadata of some individual files might be added according to the slots still available in
ADV(I) (lines 23-25).
Algorithm: Advertisement content determination
Input: h, Sod, F(I) ∀I ∈ Sod, Ck(I) for 0 < k ≤ h and every I ∈ Sod
  h: height of the file-tree
  Sod: overall demand
  F(I): files matching the interest I
  Ck(I): clusters matching the interest I and found at depth k
Output: ADV(I) for all I ∈ Sod
  ADV(I): advertisement for every I ∈ Sod
Begin
  1. For each I ∈ Sod
  2.   ADV(I) = ∅
       /*select all metadata of files if N(I) is large enough to advertise them one by one*/
  3.   If (N(I) ≥ |F(I)|)
  4.     ADV(I) = {metadata(f) | f ∈ F(I)}
  5.     Exit //continue with the next interest
  6.   End If
       /*search the depth where there are less than N(I) clusters*/
  7.   k = h
  8.   While ((k > 0) && (N(I) ≤ |Ck(I)|))
  9.     k--
 10.   End while
       /*if there is no depth where there are less than N(I) clusters, select some of the clusters at depth one and exit*/
 11.   If (k == 0)
 12.     ADV(I) = {metadata(c) | c ∈ Relevant(C1(I), I, N(I))}
 13.     Exit //continue with the next interest
 14.   End If
       //select clusters according to their depth in the file tree
 15.   While ((|ADV(I)| < N(I)) && (k ≤ h))
 16.     If (N(I) - |ADV(I)| ≥ |Ck(I)|)
 17.       ADV(I) = {metadata(c) | c ∈ Ck(I)} ∪ ADV(I)
 18.     Else
 19.       ADV(I) = {metadata(c) | c ∈ Relevant(Ck(I), I, N(I) - |ADV(I)|)} ∪ ADV(I)
 20.     End If
 21.     k++
 22.   End while
       //select some of the files if there are still free slots in ADV(I)
 23.   If (|ADV(I)| < N(I))
 24.     ADV(I) = {metadata(f) | f ∈ Relevant(F(I), I, N(I) - |ADV(I)|)} ∪ ADV(I)
 25.   End If
 26. End for
End Algorithm
Algorithm 5-3: Advertisement content determination
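The selection performed for a single interest can be sketched as follows. This is an illustration under stated assumptions, not the middleware code: elements are identified by strings, `clusters_by_depth` maps each depth (1 to h, with h ≥ 1) to the clusters matching the interest, `relevance` maps each element to its similarity with the interest, and `quota` stands for N(I). All of these names are hypothetical.

```python
def advertisement_for_interest(quota, files, clusters_by_depth, relevance):
    """Return the list of elements whose metadata are advertised for one interest."""

    def most_relevant(elements, count):
        ranked = sorted(elements, key=lambda e: relevance[e], reverse=True)
        return ranked[:max(count, 0)]

    # Lines 3-6: advertise every file individually if the quota allows it.
    if quota >= len(files):
        return list(files)
    # Lines 7-10: from the leaves upwards, find a depth with fewer than quota clusters.
    height = max(clusters_by_depth)
    k = height
    while k > 0 and quota <= len(clusters_by_depth[k]):
        k -= 1
    # Lines 11-14: no such depth, so advertise the best clusters of depth 1.
    if k == 0:
        return most_relevant(clusters_by_depth[1], quota)
    # Lines 15-22: fill the quota with clusters from depth k down to the leaves.
    adv = []
    while len(adv) < quota and k <= height:
        free = quota - len(adv)
        level = clusters_by_depth[k]
        adv += list(level) if free >= len(level) else most_relevant(level, free)
        k += 1
    # Lines 23-25: use the remaining slots for individual files.
    if len(adv) < quota:
        adv += most_relevant(files, quota - len(adv))
    return adv
```

For example, with ten matching files, two clusters at depth 1 and four clusters at depth 2, a quota of 3 selects both depth-1 clusters and the most relevant depth-2 cluster.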
A source can make another advertisement after adv-period(m). In the meantime, the
information discovery method will try to adjust its knowledge about the stay-times of peers
and the mobility class of the MANET-view. Moreover, in addition to the mobility class,
the volume of advertisement is affected by Adv-usage(ps), the advertisement usage
factor of ps described in definition 4.3 (in chapter 4). Before sending advertisements, a
data source asks its neighbors for their advertisement usage factors and for the volume of
advertisements (see footnote 14) that they will distribute in the next period.
Assume that the peers in Pn are direct neighbors. Let Adv-volume-total be the total volume of
advertisements that the peers in Pn will distribute in the next period. The advertisement
volume for ps ∈ Pn is zero if its advertisement usage factor is zero. Otherwise, the
advertisement volume is calculated as follows.
\[ \text{Adv-volume}(p_s) = \frac{\text{Adv-usage}(p_s)}{\sum_{p \in P_n} \text{Adv-usage}(p)} \times \text{Adv-volume-total} \]
Considering the usage factor in the computation of the advertisement volume gives more
chances to popular peers and minimizes the unnecessary advertisements produced by less
popular peers.
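The proportional sharing of the advertisement volume can be illustrated with a short sketch. The function name `advertisement_volumes` and the dictionary-based inputs are assumptions of the example; the usage factors below are illustrative values, not measurements.

```python
def advertisement_volumes(usage, total_volume):
    """Split total_volume among neighbors in proportion to their usage factors."""
    total_usage = sum(usage.values())
    if total_usage == 0:
        # No peer has a non-zero usage factor, so nobody advertises.
        return {peer: 0.0 for peer in usage}
    return {peer: total_volume * factor / total_usage for peer, factor in usage.items()}


volumes = advertisement_volumes({"bob": 3, "carol": 1, "eve": 0}, total_volume=8)
# volumes == {"bob": 6.0, "carol": 2.0, "eve": 0.0}
```

A peer whose usage factor is zero receives no advertisement slots, in line with the rule stated above.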
5.4.2 Information Discovery
In chapter 3, we have discussed query resolution according to information provisions of
users. In this chapter, we discuss resolving queries by using the received advertisements.
A query is resolved via an information discovery phase and an information delivery phase.
The information discovery phase is used to discover the peers owning files matching the
query, while the information delivery phase is used to fetch the files.
Footnote 14: As peers compute mobility classes independently, they can have different volumes of advertisements.
Let F(q) be the files matching a user query q, which is expressed by a list of keywords,
and let C(q) be the clusters matching q. A file f and a cluster c are placed in F(q) and C(q)
respectively if they are relevant to the query q. A file fi is relevant to a query q if and only
if fi is similar to the query q. A file fi is more relevant to the query q than a file fk if
Similarity(fi, q) > Similarity(fk, q)
Clusters are compared with queries in the same way.
Let owners(e) be the set of peers owning an element e (which represents a cluster/file).
Let downloadTime(f) be the time needed to download a file f and let disAnddelTime(c) be
the average time needed to discover and deliver a file grouped under a cluster c.
downloadTime(f) is estimated from the attribute "FileSize" in the metadata of the file.
disAnddelTime(c) is estimated by using the average size of the files represented by the cluster
c (i.e., by using the attribute "AvgFileSize" in the metadata of the cluster).
Let p be the data requester posing the query q. As discussed in section 4.5.1, stay-
time(pi, pj) is the time that the peers pi and pj stay together. For a set of elements E, let
Relevant(E, k, q) be the set of the most relevant elements of E with respect to q, satisfying the
following properties
• |Relevant(E, k, q)| = min(|E|, k)
• similarity(ei, q) ≥ similarity(ej, q) ∀ei ∈ Relevant(E, k, q) and ∀ej
∈ E - Relevant(E, k, q)
Algorithm 5-4 is used to prepare the messages that can be used to deliver or to discover
files for the query q from F(q) and C(q) respectively. More precisely, this algorithm
prepares two sets Delivery(q) and Discovery(q). The set Delivery(q) contains tuples of the
form (p, f) where p is the peer owning a file f that matches the query q. Discovery(q)
contains tuples of the form (p, q) where p is a peer owning a cluster matching with q.
Algorithm 5-4 first removes files from F(q) if it is not possible to deliver these files (line
1). It also removes clusters from C(q) if it is not possible to discover files grouped under
these clusters (line 2). The algorithm then selects some of the files from F(q) according to
their relevance to q and to the required number of files (lines 4 to 8). The algorithm ends
without preparing the discovery message if enough advertisements about files are found. In
the case that |Delivery(q)| < n (n is the number of responses displayed to the user), the
owners of the most relevant clusters in C(q) are selected as potential sources of the files
matching q and discovery messages are prepared for them (lines 9 to 15).
For each tuple (p, f) in Delivery(q), the metadata of the file f are displayed to the user. If the
user approves the downloading of the file, the file will be delivered from the peer p. For each
tuple (p, q) in Discovery(q), the query q is sent to the peer p. A peer that receives the query q
searches for a file matching the query. If the search is successful, the peer sends the description of
the file to the requester peer. The requester peer may then decide to download the file from the
peer p. We will discuss the delivery of files in the next chapter.
Algorithm: Prepare delivery and discovery messages
Input: F(q), C(q), n, pr
  F(q): files matching the query q
  C(q): clusters matching the query q
  n: maximum number of files searched for a query
  pr: requester peer
Output: Discovery(q), Delivery(q)
  Discovery(q): set of tuples (p, q) where p is a peer owning a cluster matching q
  Delivery(q): set of tuples (p, f) where p is a peer owning a file f matching the query q
Begin
  1. Remove any f in F(q) such that stay-time(pr, pi) < downloadTime(f) for all pi ∈ owners(f)
  2. Remove any c in C(q) such that stay-time(pr, pi) < disAnddelTime(c) for all pi ∈ owners(c)
  3. //prepare delivery messages
  4. For all f ∈ Relevant(F(q), n, q)
  5.   For all p ∈ owners(f)
  6.     Put (p, f) in Delivery(q)
  7.   End For
  8. End For
     //prepare discovery messages
  9. If (|Delivery(q)| < n)
 10.   For all c ∈ Relevant(C(q), n - |Delivery(q)|, q)
 11.     For all pi ∈ owners(c)
 12.       Put (pi, q) in Discovery(q)
 13.     End For
 14.   End For
 15. End If
End Algorithm
Algorithm 5-4: Prepare delivery and discovery messages
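The preparation of the two message sets can be sketched as follows. The sketch assumes dictionary inputs that are not defined in the thesis: `owners` maps an element to its owner peers, `stay_time` maps a peer to its stay-time with the requester, `duration` maps a file to downloadTime(f) and a cluster to disAnddelTime(c), and `relevance` maps an element to its similarity with q. The function name `prepare_messages` is also hypothetical.

```python
def prepare_messages(query, files, clusters, n, owners, stay_time, duration, relevance):
    """Return (Delivery(q), Discovery(q)) as lists of tuples."""

    def reachable(element):
        # Keep the element if at least one owner stays long enough (lines 1-2).
        return any(stay_time[p] >= duration[element] for p in owners[element])

    def most_relevant(elements, count):
        ranked = sorted(elements, key=lambda e: relevance[e], reverse=True)
        return ranked[:max(count, 0)]

    files = [f for f in files if reachable(f)]
    clusters = [c for c in clusters if reachable(c)]
    # Lines 4-8: delivery tuples for the n most relevant files.
    delivery = [(p, f) for f in most_relevant(files, n) for p in owners[f]]
    # Lines 9-15: discovery tuples only if too few files were advertised.
    discovery = []
    if len(delivery) < n:
        for c in most_relevant(clusters, n - len(delivery)):
            discovery += [(p, query) for p in owners[c]]
    return delivery, discovery
```

As in the pseudocode, a delivery tuple is produced for every owner of a retained file, even for owners whose own stay-time is too short; a stricter variant could filter the owners as well.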
5.5 Discussion
In this thesis, we have proposed to organize files in a file tree in order to facilitate the file
advertisement process. The file tree is formed in a bottom up fashion. First, files are
classified into clusters. The clusters are then repeatedly classified into other clusters until a
file tree with a required dimension is obtained. The dimension of the tree is computed
according to the number of representative mobility classes so that determination of the
content of advertisement is simplified.
We have proposed an algorithm, named k-means*, to classify files and clusters. This
algorithm is derived from the k-means classification algorithm [107,108]. The difference
between this algorithm and k-means is the selection of the group head, which is the
centroid of a cluster. The modification is needed due to the following weaknesses of k-
means.
• As group-heads are initialized by elements that may represent nothing, it may
happen in k-means that a cluster contains nothing [109].
• When the number of files/clusters is small, the initial grouping strongly determines the
resulting clusters [110].
Let us consider the content-based classification approach. K-means takes group-heads from
the vector space randomly. As a result, a group head may be selected in such a way that all
files/clusters to be grouped are less similar to this group head than to the other group-
heads. In this case, the cluster represented by this group-head will contain nothing. Let us
consider the example displayed in Figure 5-8. Assume that k-means selects v1, v2 and v3 as
group heads; some of the files are similar to v1 and the others to v2; none of the files is
more similar to v3 than to v1 or v2; thus, the cluster headed by v3 will contain nothing. The
k-means* algorithm resolves this problem by initializing group-heads with elements that
represent files/clusters to be grouped. As a result, in k-means*, a cluster is initialized in
such a way that it will contain at least one element.
[Figure: the files f1 to f9 and the group heads v1, v2 and v3 plotted in the vector space.]
Figure 5-8: A possible result of k-means classification
In k-means, the initial grouping may determine the content of the resulting clusters. For the
example displayed in Figure 5-8, the cluster headed by v3 always contains nothing,
regardless of the number of times that k-means is iterated. As a result, the classification may
not have semantic meaning when the number of files/clusters is small, since the cluster-
heads are initially selected randomly in k-means. In order to resolve this
problem, our algorithm initializes group-heads with dissimilar elements.
5.6 Conclusion
In this chapter, we have discussed the organization of files in a file tree in such a way that
the dimension of the file tree is computed as a function of the mobility classes that are
considered during information sharing. The data sources can determine the content of
advertisements from the file tree according to the mobility class of the MANET view. We
have also demonstrated the application of a file-tree in the file discovery process.
Chapter 6 Implementation and Evaluation
In the previous three chapters, we have proposed and discussed methods used to conduct
information discovery in MANETs according to the interests and the stay-times of users.
Based on these approaches, we propose a self-adaptive information sharing middleware
called SAMi. SAMi is designed to fulfill the requirements described in chapter one. In this
chapter, we present the design, the implementation and the evaluation of this middleware.
The chapter has been designed according to our research work presented in the previous
chapters and those presented in the International Conference on Wireless Applications and
Computing [111,112] and in the fourth IEEE International Conference on Pervasive
Services (ICPS’07)[113].
The chapter is organized as follows. The design of the middleware is presented in section
6.1. Section 6.2 discusses the implementation of SAMi in simulated and real environments.
The evaluation of SAMi is covered in section 6.3. Section 6.4 discusses the challenges
encountered during the design and the implementation of the middleware. Finally, we
conclude the chapter in section 6.5.
6.1 SAMi: a Self-Adaptive Middleware
In this thesis, we propose a self-adaptive middleware called SAMi that works according
to the following requirements specified in chapter 1.
• Pervasiveness: nomadic users should be allowed to share information anywhere,
anytime and by using any device.
• Mobility-awareness: the advertisement policy should be determined according to the
dynamicity of the environment, which is described by the mobility patterns of users.
• Interest-awareness: sharable files should be selected according to the users’ interests
in receiving information, and the users’ interests in providing information should be
considered during query resolution.
• High-level semantics: sharable files should be advertised at high level according to
their similarities.
• Context-aware content delivery: file delivery should be performed according to the
context of users and their environments.
• Social awareness: sharable files should be selected according to the social networks
of the users.
• Data dissemination: advertisements and queries should be disseminated according to
the users’ interests.
SAMi is a pure peer-to-peer middleware. Every device participating in information
sharing is required to install SAMi. However, thin devices can be assisted by heavyweight
devices in performing complex operations.
Figure 6-1 displays the architecture of SAMi. The main input of the middleware is a
query. Personal information, including the basic information (age, name, address, etc.), the
agendas, the habits, the states (e.g., busy) and the interests of users, can also be accepted
as input.
[Figure placeholder. Labels recovered from the diagram: data stores (Advertisement
data-store, Local repository, Personal data-store, Rule base, MANET-View data store);
modules (Advertisement Management, File Manager with File Discovery, File Delivery and
File Adaptation, Context Manager); inputs (Query, Agenda, Habit, User basic data, State,
Interest).]
Figure 6-1: Architecture of SAMi
SAMi stores the data needed to share information efficiently in four data repositories:
the local repository, the advertisement data-store, the MANET View data-store and the
rule base. A device can contain zero or more data-stores.
The local repository and the advertisement data-store contain descriptions of the
sharable files on the local machine and in the vicinity, respectively. In addition to the
descriptions of sharable files, the advertisement data-store contains platform and service
advertisements.
The MANET View data-store contains historical information about sharing activities. It
contains the sharing statistics and the mobility classes discussed in chapter 4. It also
contains the queries received in the past as well as the information demands and the
information provisions of users.
The rule base contains the association rules that are used to associate statistically the
users’ contexts with their interests and with the mobility classes.
The middleware is composed of three modules; namely, context manager, advertisement
manager and file manager. A device can contain one or more modules. Every device that
participates in information sharing is required to have the file manager module.
The context manager determines the mobility classes of the MANET views and the
users’ interests from their contexts by using association rules. It also determines the users’
information needs by analyzing their agendas, habits and historical queries.
The file manager, the core of SAMi, carries out the file management functionalities,
which include searching, delivering and classifying files. The file discovery, the file
delivery and the file adaptation modules perform the information sharing activities. The
file discovery module is responsible for searching for information sources; the file delivery
module is used to download files; finally, the file adaptation module helps the file delivery
module to fetch the file according to the context of users and their profiles.
The advertisement manager is responsible for making other peers aware of the sharable
files stored in the device of a data source. It determines the content and the distribution of
advertisements according to the mobility class of the MANET view and the interests of the
peers participating in the MANET.
6.1.1 Design goals
The design goals of the SAMi middleware are listed below.
Flexibility: The system should be easy for anyone to use with minimum effort. It should
also be adaptable to the capacity of mobile devices. This flexibility is achieved by having
the middleware use the interfaces of other well-known messengers such as Yahoo
Messenger. It should also provide its own user interface when it is not possible to use such
messengers.
Discovery Optimization: The system should decrease the time needed to search for a
file. The search time can be decreased by optimizing the usage of the push type of
information discovery approach.
Fairness: All peers in the network should profit equally from the information exchange.
This can be achieved by fixing a quota on the volume of information to be advertised.
Automatic computing: The interests of users and mobility classes of MANET views
should be computed automatically as much as possible. Moreover, to facilitate information
sharing, association rules should be produced for estimating the mentioned profile
information.
Scalability: The middleware should work regardless of the number of users and the
number of sharable files in a MANET.
6.1.2 User Profile
A user profile is the representation of a user in the virtual world. It describes persistent
and context-dependent information about the user. Persistent personal information includes
age, birthday and sex. Context-dependent personal information includes habits, preferences,
agendas and so on.
The agenda and the habits of a user are used to determine the activities of the user. A
habit indicates the repetitive activities of a user in a certain context. For example, while
travelling in a bus, a user may have the habit of reading news and listening to music. An
agenda describes the planned activities of a user. In an agenda, a user can specify the
documents that she/he needs to accomplish the planned activities. Examples of user agendas
and user habits are given in Figure 6-2.
A preference of a user describes the format of the information that he/she is interested in.
For instance, a user may prefer audio data during driving. Preferences of users can vary
according to the spatial or the temporal context of users.
The user profile can describe the information demands and provisions of a user. As
discussed in the previous chapter, the information demand describes the interests of a user
to receive information and the information provision describes his/her interests to provide
information.
A user profile can indicate the social groups of a user. A user can be a member of
different groups with respect to professional activities, social relationships and hobbies as
well as their information sharing habits.
User Agenda
Start     End       Event     Activity                           Required documents
10 A.M.   12 P.M.   meeting   Strategic plan preparation         Strategic plan preparation
12 P.M.   1 P.M.    lunch     -
1 P.M.    3 P.M.    meeting   Discussing with business persons   Efficient way of chairing a meeting;
                                                                 How to deal with business persons

User Habit
Activity              When       Time needed   Habit
Shopping              Week end   30 minutes    -
Journey in train      Friday     2 hours       Listening music
Talking with friend   Night      10 minutes    Exchange jokes
Figure 6-2: Examples of user agenda and habits
6.1.3 Context Management Module
Dey [114] defines context as “any information that can be used to characterize the
situation of an entity. An entity is a user, a place, or a physical or computational object
that is considered relevant to the interaction between a user and an application, including
the user and application themselves.” According to Winograd [115], “something is context
because of the way it is used in interpretation, not due to its inherent properties. The
voltage on the power lines is a context if there is some action by the user and/or computer
whose interpretation is dependant on it, but otherwise is just part of the environment.”
Building on these two definitions, Dejene Ejigu [116], a former Ph.D. student in our
research team, describes context as “an operational term whose definition depends on the
intention of the operations involved on an entity at a particular time and space rather than
the inherent characteristics of the entities and the operations themselves”.
The concept of “sharing context”, defined in chapter 3, is based on this definition of
“context”. A sharing context describes the situation in which the user is willing to provide
files to others in the vicinity. There are two types of sharing context: abstract and actual.
An abstract sharing context is a sharing context that is manually specified by a user to
describe when and where he allows others to download files from his machine. An actual
sharing context is derived from an abstract sharing context by considering the actual time
and place in which data were shared.
In the scenario presented in chapter 1, Pascal has a habit of sharing information in a
classroom. Assume that he specifies (“Class-Room”, ø) as an abstract sharing context,
where ø denotes any time. Pascal is interconnected with other students via a MANET in
Room 331, where a course is in progress. According to the course schedule, the course is
conducted from 9 AM to 10 AM. Therefore, (Room-331, [9 AM, 10 AM]) is the actual
context derived from the abstract context (“Class-Room”, ø).
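The relation between an abstract and an actual sharing context can be sketched as a simple matching test. The class name, the use of hours as integers and the location_types mapping (which stands in for the semantic knowledge that Room-331 is a classroom) are assumptions made for this illustration only; SAMi itself relies on the context model and reasoner described below.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SharingContext:
    location: str                          # "" stands for any location
    interval: Optional[Tuple[int, int]]    # None stands for ø (any time), else (start, end) hours


def matches(abstract, actual, location_types):
    """Return True if the actual context can be derived from the abstract one."""
    # The abstract location may name the place itself or its type (hypothetical mapping).
    location_ok = abstract.location in (
        "",
        actual.location,
        location_types.get(actual.location),
    )
    if abstract.interval is None:
        time_ok = True
    elif actual.interval is None:
        time_ok = False
    else:
        # The actual interval must lie within the abstract one.
        time_ok = (abstract.interval[0] <= actual.interval[0]
                   and actual.interval[1] <= abstract.interval[1])
    return location_ok and time_ok
```

For Pascal's scenario, the abstract context SharingContext("Class-Room", None) matches the actual context SharingContext("Room-331", (9, 10)) as soon as the mapping states that Room-331 is a Class-Room.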
As displayed in Figure 6-3, the context manager module uses the RAID-Action Engine
proposed by Dejene Ejigu [116] to interpret sharing contexts. The engine uses the HCom
model, also proposed by Dejene Ejigu [116], to manage the context semantics and the
context data. The RAID-Action Engine uses the Jena reasoner [117] to produce actions,
which are used to identify the context-dependent personal profile, the mobility classes of
MANET views and the information needs of the user.
SAMi identifies the mobility class of a MANET view as follows. The RAID-Action Engine
identifies an abstract context that matches the actual context, which is accepted as an
input for a data source peer. Let cA denote the actual context and let p denote the data
source peer. The RAID-Action Engine then determines the set of mobility classes defined
by the data source peer for the abstract sharing context. Let M(p,cA) denote this set. A
mobility class is selected from M(p,cA) by using the association rules and the actual
sharing context of the data source.
[Figure placeholder. Labels recovered from the diagram: Context Manager (RAID-Action
Engine, Rule mining); Input (User agenda, User Habit, Actual sharing Context); Output
(context dependent personal profile, Mobility Class, Information needs); data stores
(Personal data-store, MANET View data store, Rule Base).]
Figure 6-3: Context management in SAMi
Indeed, as discussed in section 4.4.2, mobility classes are determined by using rules
stored in the rule base data-store. For instance, the rule <context =
(Restaurant, ∅)> ⇒ <mobility class = m3> indicates that the MANETs observed in a
restaurant at any time¹⁵ are described by the mobility class m3.
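To make the role of such rules concrete, the sketch below replaces the RAID-Action Engine and the Jena reasoner with a plain lookup over a list of rules. The tuple layout of a rule, the function name and the use of an hour of the day are assumptions made for this example and are not part of SAMi.

```python
def select_mobility_class(rules, location_type, hour, candidates):
    """Return the first mobility class whose rule matches the actual context.

    rules: list of (location, interval_or_None, mobility_class) tuples, where
    location "" means any place and interval None means any time (ø).
    candidates: the set M(p, cA) of classes defined for the abstract context.
    """
    for location, interval, mobility_class in rules:
        location_ok = location in ("", location_type)
        time_ok = interval is None or interval[0] <= hour < interval[1]
        if location_ok and time_ok and mobility_class in candidates:
            return mobility_class
    return None  # no rule applies; the class has to be estimated otherwise
```

With the rule ("Restaurant", None, "m3"), a data source observed in a restaurant at 13:00 with candidates {"m1", "m3"} obtains the mobility class m3.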
SAMi also uses the RAID-Action Engine to identify the interests of the user by analyzing
association rules. The association rules associate the contexts of users with their interests.
They can also be used to associate social networks of users with their interests.
As discussed in chapter 3, habit rules are used to determine the
information provisions of users. The following rule may be used to determine the
We define the similarity value between an interest and a file or a query in the same
way. Let Df denote the description of a file f and let q denote a query. Similarity(f, I) and
Similarity(q, I) are defined by:
Similarity(f, I) = Similarity(Df, Description(I))
Similarity(q, I) = Similarity(q, Description(I))
Let Ei and Ej be two elements, each of which may be an interest, a file or a query. The
two elements are similar (denoted Ei ≈ Ej) if and only if:
Similarity(Ei, Ej) ≥ accSim, where accSim is a predefined similarity threshold.
The sharing interest of a peer is the set of interests of that peer in a given sharing
context. A sharing context¹⁸ of a peer describes a situation in which the peer allows others
to download files from its device. A sharing context is expressed by a tuple (L, [Ts, Tf]),
where L is a location and [Ts, Tf] is a time interval.
For example, (Bus 1, [8AM, 10AM]) is a sharing context stating that a peer allows the
other peers to download files from its device when it is in Bus 1 from 8AM to 10AM. Other
examples of sharing contexts are listed in Table 1.
Table 1: Examples of sharing contexts
Context               Description
(Bus, [8AM, 10AM])    Any bus from 8AM to 10AM
(“”, [8AM, 10AM])     Any place from 8AM to 10AM
(“”, ø)               Anywhere and at any time
18 We use the terms “sharing context” and “context” interchangeably.
We define two types of sharing context: abstract and actual. An abstract sharing context
describes when and where a peer allows the other peers to download files from its device.
For example, a user can specify that others may download files from his device wherever
and whenever he is in a MANET by setting the sharing context to ("", ø). However, this
does not mean that he is in a MANET 24 hours a day, 7 days a week. An actual sharing
context is derived from an abstract sharing context by considering the actual time and
place in which the data were shared. As an illustration, suppose that a peer with the
abstract sharing context ("", ø) is connected with other nomadic users via a MANET in
Bus 27 from 8:00 to 8:10. Then (Bus 27, [8:00, 8:10]) is an actual context derived from the
abstract context ("", ø).
A sharing interest S is described by:
[1] |S| ≥ 1,
[2] Description(I1) ≠ Description(I2) for all distinct I1, I2 ∈ S,
[3] $\sum_{I \in S} \mathrm{Weight}(I) = 1$,
[4] Weight(I) ≥ minW, where minW denotes the minimal weight of an interest.
The condition we use to decide whether two sharing interests S1 and S2 are similar is
the similarity of the interests in the two sets: for each interest Ii in S1 there must be an
interest Ij in S2 such that Ii ≈ Ij, and vice versa. The sharing interests are not similar if
this condition is not satisfied. We use the cosine measure to determine the similarity
between two sharing interests that satisfy this main condition.
Consider two sharing interests S1 = {I11, ..., I1n} and S2 = {I21, ..., I2m} such that
|S1| = n and |S2| = m.
Let W1i and W2i be the relative weights of I1i and I2i respectively; the vector
representations of S1, denoted P1, and of S2, denoted P2, are given by
• P1 = (W11, ..., W1n) and
• P2 = (W21, ..., W2m)
Let W12i denote the average weight of the interests of S1 that are similar to the interest I2i.
Let W21i denote the average weight of the interests of S2 that are similar to the interest I1i.
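For a sharing interest S, let Sim(S, I) be the set of interests of S that are similar to the
interest I, i.e., Sim(S, I) = {Ij | Ij ∈ S and Ij ≈ I}. W12i and W21i are computed as follows:
\[ W_{12i} = \frac{\sum_{I \in \mathrm{Sim}(S_1, I_{2i})} \mathrm{Weight}(I)}{|\mathrm{Sim}(S_1, I_{2i})|} \]
\[ W_{21i} = \frac{\sum_{I \in \mathrm{Sim}(S_2, I_{1i})} \mathrm{Weight}(I)}{|\mathrm{Sim}(S_2, I_{1i})|} \]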
Let P12 be the vector representation of S1 with respect to S2 and let P21 be the vector
representation of S2 with respect to S1; we define these vectors as follows:
• P12 = (W121, ..., W12m)
• P21 = (W211, ..., W21n)
The similarity condition is satisfied by S1 and S2 if and only if:
∀ Ii ∈ S1, ∃ Ij ∈ S2 such that Ii ≈ Ij, and
∀ Ii ∈ S2, ∃ Ij ∈ S1 such that Ii ≈ Ij
We define the similarity value between the sharing interests S1 and S2 as follows:
\[ \mathrm{Similarity}(S_1, S_2) =
\begin{cases}
\dfrac{\cos(P_1, P_{21}) + \cos(P_2, P_{12})}{2} & \text{if the main similarity condition is satisfied} \\
0 & \text{otherwise}
\end{cases} \]
Similarity(S1, S2) is commutative. Two sharing interests S1 and S2 are similar if and
only if:
S1 ≈ S2 ⇔ Similarity(S1, S2) ≥ accC, where accC is a predefined similarity threshold.
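The computation of Similarity(S1, S2) can be sketched as follows. In this illustration an interest is a pair (set of keywords, weight) and two interests are considered similar when the Jaccard coefficient of their keyword sets reaches accSim; both choices, as well as all function names, are assumptions made for the example, since the similarity measure between individual interests is defined elsewhere in the thesis.

```python
import math


def jaccard(a, b):
    """Keyword-set similarity (assumed stand-in for Similarity(Ii, Ij))."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(u, v):
    """Cosine of the angle between two weight vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relative_vector(s_from, s_to, acc_sim):
    """For each interest of s_to, the mean weight of the similar interests of s_from.

    Returns None when some interest of s_to has no similar interest in s_from,
    i.e. when the main similarity condition is violated.
    """
    vector = []
    for description_to, _ in s_to:
        weights = [w for d, w in s_from if jaccard(d, description_to) >= acc_sim]
        if not weights:
            return None
        vector.append(sum(weights) / len(weights))
    return vector


def sharing_similarity(s1, s2, acc_sim=0.5):
    """Similarity(S1, S2) as the mean of cos(P1, P21) and cos(P2, P12)."""
    p1 = [w for _, w in s1]
    p2 = [w for _, w in s2]
    p12 = relative_vector(s1, s2, acc_sim)  # S1 with respect to S2, length m
    p21 = relative_vector(s2, s1, acc_sim)  # S2 with respect to S1, length n
    if p12 is None or p21 is None:
        return 0.0
    return (cosine(p1, p21) + cosine(p2, p12)) / 2
```

Two sharing interests with the same descriptions and the same weights obtain a similarity close to 1, while sharing interests without any pair of similar interests obtain 0.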
A sharing interest can be used as an information demand or an information provision.
An information demand of a peer is a sharing interest that contains the interests describing
the information that this peer wishes to receive. An information provision contains the
interests describing the information that this peer is ready to provide.
Information-Demand(p, pd, c) denotes the information demand of the peer p as observed
by the peer pd in the context c. When p equals pd, Information-Demand(p, pd, c) is
written Information-Demand(p, c).
Information-Provision(p, pd, c) denotes the information provision of a peer p as
observed by a peer pd in a context c. When p equals pd, Information-Provision(p, pd, c) is
written Information-Provision(p, c).
Overall-Demand(P) denotes the interests of a set of peers P that describe the
information these peers wish to receive.
Overall-Provision(P) denotes the interests of a set of peers P that describe the
information these peers are ready to provide.
Users can express their interests manually. For example, a user can declare that he is
interested in receiving jokes in the context Bus 37. The interests of users can also be
computed automatically from the queries and the advertisements exchanged in the past.
Interests can also be determined by using association rules.
An association rule related to an information demand is written as follows:
<Context = c> ⇒ <Information-Demand = D>
Example: the rule below indicates that from 8:00 to 8:10, in any place, the information
demand of the user contains an interest related to finance (70%) and other
The social groups present in the MANET view can be used to determine the information
provisions and the information demands of users.
2.2 The ‘Advertisement Manager’ module
The “Advertisement Manager” module is responsible for distributing advertisements to
the neighbours of a peer. An advertisement message contains clusters found at a shallower
or deeper level of the file tree. The current mobility class is used to determine the volume
of the advertisement. The overall demand of the peers is used to determine the content of
the advertisements (i.e., to offer information that is a priori of interest to them).
We propose to compute the content of the advertisements as a function of:
• the interests of the peers present in the MANET,
• the mobility class describing the MANET view,
• and the location of the files in the file tree.
Let m be a mobility class describing the current MANET view. Let Sod be the overall
demand of the peers in the MANET view (i.e., Sod is Overall-Demand(P) as defined in
section 1.1, where P is the set of peers participating in the MANET view). The
advertisement quota for an interest I in the set Sod, denoted N(I), is computed as:
N(I) = Weight(I) * adv-Volume(m)
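As a small worked example of the quota formula, the sketch below splits an advertisement volume among the interests of an overall demand. The function name and the decision to round quotas down to whole entries are assumptions; the formula itself does not state how fractional quotas are handled.

```python
import math


def advertisement_quotas(overall_demand, adv_volume):
    """Compute N(I) = Weight(I) * adv-Volume(m) for each interest I.

    overall_demand maps an interest description to its weight (weights sum to 1).
    Quotas are rounded down (assumption); the small epsilon guards against
    floating-point results such as 28.999999999999996.
    """
    return {
        interest: math.floor(weight * adv_volume + 1e-9)
        for interest, weight in overall_demand.items()
    }
```

For an overall demand of 70% finance and 30% sport and an advertisement volume of 10 entries, finance receives 7 entries and sport 3.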
Let F denote a set of files and let Ck denote the set of clusters found at depth k of the
file tree. Let F(I) ⊆ F be the set of files corresponding to the interest I and let
Ck(I) ⊆ Ck be the set of clusters corresponding to the interest I. A file f and a cluster c
are placed in F(I) and Ck(I) respectively if and only if (i) c and f are similar to I; and
(ii) for any other interest Ij in the overall demand, c and f are more similar to I than to Ij.
For an empty interest Ie, that is, Description(Ie) = ∅, F(Ie) and Ck(Ie) are computed
as follows.
• $F(I_e) = F - \bigcup_{I \in S_{od} - \{I_e\}} F(I)$ and
• $C_k(I_e) = C_k - \bigcup_{I \in S_{od} - \{I_e\}} C_k(I)$
Algorithm 5 determines the metadata of the files and of the clusters to be distributed in
the environment. The principle of the algorithm is as follows. In lines 3 to 6, the metadata
of all files in F(I) are selected if N(I) is large enough to advertise the files individually,
using the metadata of each file. Otherwise, the algorithm searches for a depth of the file
tree at which the number of clusters is smaller than N(I). This depth is called k. If the
search fails, the metadata of the most similar clusters at depth 1 are placed in ADV(I).
Otherwise, as described in lines 15 to 22, the metadata of the most relevant clusters found
from depth k to depth h are placed in the set ADV(I), according to their position in the file
tree and their similarity to the interest I. After all of the above clusters have been
examined, the metadata of some files may be placed in ADV(I).
Algorithm: Preparation of advertisement messages
Input: h, Sod, F(I) ∀ I ∈ Sod, Ck(I) for 0 < k ≤ h and each I ∈ Sod
   h: height of the file tree
   Sod: the overall demand
   F(I): files corresponding to the interest I
   Ck(I): clusters corresponding to the interest I and found at depth k
Output: ADV(I) for all I ∈ Sod
   ADV(I): advertisement metadata with respect to the interest I ∈ Sod
Begin
1.  For each I ∈ Sod
2.    ADV(I) = ∅
      /* Select the metadata of all files if N(I) is large enough to advertise them one by one */
3.    If (N(I) ≥ |F(I)|)
4.      ADV(I) = {metadata(f) | f ∈ F(I)}
5.      Exit
6.    End If
      /* Search for the depth at which there are fewer than N(I) clusters */
7.    k = h
8.    While ((N(I) ≤ |Ck(I)|) && (k > 0))
9.      k--
10.   End While
      /* If there is no depth with fewer than N(I) clusters, select some clusters at depth one.
         Relevant(C, I, n): contains the n clusters of C most similar to the interest I */
11.   If (k == 0)
12.     ADV(I) = {metadata(c) | c ∈ Relevant(C1(I), I, N(I))}
13.     Exit
14.   End If
      // Select clusters according to their depth in the file tree
15.   While ((|ADV(I)| < N(I)) && (k ≤ h))
16.     If (N(I) - |ADV(I)| ≥ |Ck(I)|)
17.       ADV(I) = {metadata(c) | c ∈ Ck(I)} ∪ ADV(I)
18.     Else
19.       ADV(I) = {metadata(c) | c ∈ Relevant(Ck(I), I, N(I) - |ADV(I)|)} ∪ ADV(I)
20.     End If
21.     k++
22.   End While
      /* Select files if there are still free places in ADV(I).
         Relevant(F, I, n): contains the n files of F most relevant (i.e., similar) to the interest I */
23.   If (|ADV(I)| < N(I))
24.     ADV(I) = {metadata(f) | f ∈ Relevant(F(I), I, N(I) - |ADV(I)|)} ∪ ADV(I)
25.   End If
26. End For
End Algorithm
Algorithm 5: Preparation of advertisement messages
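The body of the loop of Algorithm 5, for a single interest I, can be sketched in Python as follows. This is an illustrative transcription under several assumptions: files and clusters are represented by identifiers rather than metadata, clusters_by_depth is a dictionary with keys 1 to h, similarity is a caller-supplied function returning the similarity of an item to I, and the Exit statements are read as "stop processing this interest".

```python
def relevant(items, similarity, n):
    """Return the n items most similar to the interest (Relevant(C, I, n))."""
    return sorted(items, key=similarity, reverse=True)[:max(n, 0)]


def prepare_advertisement(files, clusters_by_depth, quota, similarity):
    """Select what to advertise for one interest I, given its quota N(I)."""
    h = len(clusters_by_depth)  # height of the file tree
    # Lines 3-6: advertise every file individually if the quota allows it.
    if quota >= len(files):
        return list(files)
    # Lines 7-10: find the deepest level holding fewer than N(I) clusters.
    k = h
    while k > 0 and quota <= len(clusters_by_depth[k]):
        k -= 1
    # Lines 11-14: no such level, so keep the best clusters of depth 1.
    if k == 0:
        return relevant(clusters_by_depth.get(1, []), similarity, quota)
    # Lines 15-22: fill the advertisement level by level, from depth k to h.
    adv = []
    while len(adv) < quota and k <= h:
        free = quota - len(adv)
        level = clusters_by_depth[k]
        if free >= len(level):
            adv.extend(level)
        else:
            adv.extend(relevant(level, similarity, free))
        k += 1
    # Lines 23-25: use any remaining places for the most relevant files.
    if len(adv) < quota:
        adv.extend(relevant(files, similarity, quota - len(adv)))
    return adv
```

For example, with six matching files, one cluster at depth 1, three clusters at depth 2 and a quota of 3, the sketch advertises the depth-1 cluster and the two most similar depth-2 clusters.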
After computing ADV(I), the data source forwards it to those direct neighbours that
satisfy one of two conditions: (i) the neighbour is located in the direction of peers whose
information demand matches the interest I; or (ii) the neighbour has a high degree of
collaboration with the data source. In the first case, the method proposed by the LAR
routing algorithm [81] is used to select the neighbours according to their location. In the
second case, the neighbours are determined from their sharing history. A peer that accepts
the advertisement retransmits it in the same way.
A data source peer may retransmit the advertisement after a period denoted
adv-period(m). In the meantime, the peer tries to improve its knowledge of the connection
time of the peers in the MANET view and to refine the mobility class describing the
MANET view.
2.3 The ‘File Manager’ module
The “File Manager” module is in charge of discovering and downloading the files that
match a query, in two phases: (i) information discovery; and (ii) information download.
The information discovery phase is used to discover the peers that hold files matching the
query. The download phase is used to retrieve the files.
F(q) and C(q) denote, respectively, the files and the clusters corresponding to a query q.
The query q is described by a list of keywords. A file f and a cluster c are placed in F(q)
and C(q), respectively, if they are similar to the query q.
First, some files in F(q) are removed if it is not possible to provide them. Likewise,
some clusters in C(q) are removed if it is not possible to discover the files grouped in these
clusters. Files are placed in F(q) according to their relevance to q. If the number of
selected files is not sufficient, the most relevant clusters in C(q) are selected as potential
sources of files matching q, and discovery messages are sent to these potential sources.
Discovery messages are also sent to the peers whose information provision matches q.
Once the information discovery phase is complete, the information download phase
begins. The aim of this phase is to choose one or more information sources from which to
download a file. The download is carried out as follows:
• SAMi looks for the information sources that can deliver the whole file. If several
peers are able to deliver it, SAMi selects a peer according to its distance from the
requesting peer and the time they will stay together.
• If no peer is able to transfer the whole file, SAMi looks for a combination of peers
(p1, p2, …, pk) such that pi provides a portion of the file (called sfi) and the merging
of (sf1, sf2, …, sfk) yields the requested file.
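The two download rules above can be illustrated with the following sketch. The representation of a file as a set of chunk identifiers, the ranking of complete sources by hop count and then by stay time, and the greedy choice of partial sources are all assumptions made for this example; the text above only states which criteria are considered, not how they are combined.

```python
def choose_sources(peers, all_chunks):
    """Return a download plan as a list of (peer name, chunks to fetch).

    peers: list of dicts with keys 'name', 'chunks' (set), 'hops', 'stay_time'.
    An empty list means that the file cannot be rebuilt from the known sources.
    """
    wanted = set(all_chunks)
    # Rule 1: prefer a single peer that holds the whole file.
    complete = [p for p in peers if p["chunks"] >= wanted]
    if complete:
        best = min(complete, key=lambda p: (p["hops"], -p["stay_time"]))
        return [(best["name"], wanted)]
    # Rule 2: combine peers, each contributing a portion of the file.
    plan = []
    missing = set(wanted)
    candidates = list(peers)
    while missing:
        best = max(candidates, key=lambda p: len(p["chunks"] & missing), default=None)
        if best is None or not best["chunks"] & missing:
            return []
        portion = best["chunks"] & missing
        plan.append((best["name"], portion))
        missing -= portion
        candidates.remove(best)
    return plan
```

For instance, if peer A holds chunks {1, 2} and peer B holds chunk {3} of a three-chunk file, the plan fetches {1, 2} from A and {3} from B.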
3 Conclusion and perspectives
In this thesis, we propose a theoretical model of an information sharing system adapted
to MANETs that can discover files according to the interests of users and the dynamicity
of the network. We also propose a method for organizing files in a tree in order to
facilitate file discovery. To implement the proposed theoretical model, we describe a
self-adaptive middleware called SAMi.
Currently, SAMi can be used to allow nomadic users to share files subject to access
rights. In the future, we plan to extend SAMi to distribute file advertisements according to
the rights of the peers.
In this thesis, a file tree is built by using an unsupervised classification technique in
order to facilitate file discovery. In the future, we plan to use an ontology-based technique
to enrich the proposed unsupervised technique. We also plan to use the interests of users
to optimize the classification of files.
Annex B. Detailed Design of SAMi
State Diagram
In the middleware, a user and a device have states as shown in the state diagrams displayed
in Figure B-1 and Figure B-2.
Figure B-1: State of a device
A device has four main states: isolated-idle, isolated-busy, inMANET-idle and
inMANET-busy. The prefixes isolated and inMANET indicate that a device is outside or
inside a MANET, respectively. The suffix idle indicates that no program is running on the
device, while the suffix busy indicates that programs are running.
There are two important states for a user: idle and busy. A user can be interrupted in
the idle state.
Figure B-2: States of a user
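The device and user states described above can be encoded as enumerations. The following sketch is only an illustration; the class and function names are hypothetical and do not come from the SAMi implementation.

```python
from enum import Enum


class DeviceState(Enum):
    ISOLATED_IDLE = "isolated-idle"
    ISOLATED_BUSY = "isolated-busy"
    IN_MANET_IDLE = "inMANET-idle"
    IN_MANET_BUSY = "inMANET-busy"


class UserState(Enum):
    IDLE = "idle"
    BUSY = "busy"


def device_state(in_manet, programs_running):
    """Combine the prefix (network membership) and the suffix (load)."""
    prefix = "inMANET" if in_manet else "isolated"
    suffix = "busy" if programs_running else "idle"
    return DeviceState(f"{prefix}-{suffix}")


def can_interrupt(user_state):
    """A user can be interrupted only in the idle state."""
    return user_state is UserState.IDLE
```

A device that is connected to a MANET and runs no program is thus in the state inMANET-idle, which is the state in which queries can be searched directly.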
Activity Diagram
Advertisement (Figure B-3) is performed when a peer enters a MANET, by using the
activities listed in Table B-1. The advertisement policy (period, content and radius of the
advertisement) is determined by the advManager, which also determines the advertisement
time and prepares the advertisement message. When the advertisement time arrives, the
advertisement message is distributed by the messenger object. This process is repeated
until the device leaves the network.
In SAMi, the information needs of a user can be identified from the queries of the user,
as shown in Figure B-4, and from the agendas of the user, as shown in Figure B-5, by using
the activities mentioned in Table B-1. An entered query is searched for directly if the
device is in a MANET and is idle. Otherwise, the query is passed to the query manager for
later treatment.
When a user enters an agenda, the information manager extracts a query in order to
search for the documents that are needed to accomplish the agenda. The query is treated
directly if the device state is inMANET-idle and either the query must be treated urgently
(the agenda is planned within a few hours) or the query fits the context of the environment
(the interests of the user in the MANET match the query). Otherwise, it is treated later.
[Figure placeholder. Swimlanes recovered from the diagram: Messenger, Context Manager,
AdvManager, Device. Activities and guards: [enters in a MANET], determine profile,
determine adv-policy, prepare advMessage, calculateTimeToAdv, [it is TimeToAdv],
sendAdvMessage.]
Figure B-3: Activity diagram of advertisement
Table B-1: Important activities to perform advertisement
Activity               Description
Determine profile      calculates the interests and the mobility class of users
Determine Adv policy   determines the period, the content and radius of advertisement
                       with respect to users’ interest and mobility class
Prepare advMessage     prepares advertisement message
calculateTimeToAdv     determines the time to make advertisement as current time plus
                       a random number between zero and the period of advertisement
sendAdvMessage         distributes advertisement in the vicinity
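The calculateTimeToAdv activity of Table B-1 can be sketched in a few lines. The function signature, the use of seconds as unit and the injectable random generator are assumptions made for this illustration.

```python
import random


def calculate_time_to_adv(now, adv_period, rng=random):
    """Return the next advertisement time: now plus a random delay in [0, adv_period].

    The random offset spreads the advertisements of different peers over the
    period, so that peers entering a MANET together do not all send at once.
    rng can be the random module or a random.Random instance (for reproducibility).
    """
    return now + rng.uniform(0, adv_period)
```

For example, with a current time of 100 seconds and an advertisement period of 30 seconds, the advertisement is scheduled somewhere between 100 and 130 seconds.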
[Figure placeholder. Diagram title: Information extraction from user. Swimlanes: Device,
InfManager, user. Activities and guards: [Enter a query], [state = inMANET-idle],
[state != inMANET-idle], Search file, search is successful, search isn't successful,
Treat a query later.]
Figure B-4: Searching information for a user query
As shown in Figure B-6, queries that have been kept for later treatment are searched
for when the device enters another MANET. The queries whose deadline is approaching
are treated first. The queries that match the information provision capacity of the users
are treated next.
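The treatment order of deferred queries can be sketched as follows. The dictionary layout of a query, the urgency window used to decide that a deadline is approaching, and the matching of a query topic against a set of provided topics are hypothetical simplifications for this example.

```python
def order_deferred_queries(queries, now, urgency_window, provisions):
    """Return the deferred queries to treat now, in treatment order.

    queries: list of dicts with keys 'id', 'deadline' and 'topic'.
    provisions: topics provided by the users in the vicinity.
    Urgent queries come first (earliest deadline first), then the queries that
    match the available information provisions; the others stay deferred.
    """
    urgent = sorted(
        (q for q in queries if q["deadline"] - now <= urgency_window),
        key=lambda q: q["deadline"],
    )
    contextual = [
        q for q in queries
        if q["deadline"] - now > urgency_window and q["topic"] in provisions
    ]
    return urgent + contextual
```

Queries that are neither urgent nor contextual are not returned and remain stored until the device enters another MANET.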
[Figure placeholder. Diagram title: Extract information need from Agenda. Swimlanes:
InfManager, Device, User. Activities and guards: [agenda is entered], [need information
for the agenda], [Nothing is needed], extract Query, [state = inMANET-idle],
[state != inMANET-idle], [urgent query], [query go environmental context], [Other
queries], search file, [search is successful], [Search is not successful], Treat a query
later.]
Figure B-5: Activity diagram of information extraction
[Figure placeholder. Diagram title: Query treatment. Swimlanes: InfManager, Device.
Activities and guards: [state = INMANET-idle], [state != INMANET-idle], There are
queries, [no query to treat], take a urgent query, no urgent query, take a contextual query,
search a file, [search is successful], [Search is not successful], treat a query later.]
Figure B-6: Activity diagram of query treatment
Table B-2: Activities to extract and search information
Activity                Description
Search file             Searches a file expressed by a query
Treat a query later     Puts a query for later treatment
Extract a query         Extracts a query from a user agenda or habit
Take urgent query       Selects a query which will be expired before a peer involves
                        in another MANET
Take contextual query   Selects a query which go with the information provisions of
                        users in the vicinity
Advertisements can be used to identify information sources for a query, as shown in
Figure B-7. The file indicated by an advertisement is downloaded if it does not exist
locally and matches a query.
[Figure placeholder. Diagram title: Usage of Advertisement. Swimlanes: InfManager,
Messenger. Activities and guards: [advertisement is accepted], [the advertisement is for a
file], matches a query, matches with historical query, download file.]
Figure B-7: File searching from incoming advertisement
Rule identification is done offline, as shown in Figure B-8. When a device is in the isolated-idle
state, the class rule-miner estimates the time that the peer will stay in that state (calculate
life-in-state). If this time is long enough to mine rules, rule mining is performed.
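The check described above can be sketched as follows. This is a minimal sketch under stated assumptions: the class RuleMiningGate and its methods are hypothetical, and the estimator (mean of past isolated-idle durations minus the time already spent in the state) is only one possible way to obtain life-in-state; the estimation actually used by the rule-miner is not given in this annex.

```java
final class RuleMiningGate {
    /**
     * Assumed estimator of the remaining time in the isolated-idle state:
     * mean of past durations minus the time already elapsed, never negative.
     */
    static long estimateLifeInState(long[] pastDurationsMillis, long elapsedMillis) {
        if (pastDurationsMillis.length == 0) {
            return 0L; // no history, so no time can be promised
        }
        long sum = 0L;
        for (long d : pastDurationsMillis) {
            sum += d;
        }
        long mean = sum / pastDurationsMillis.length;
        return Math.max(0L, mean - elapsedMillis);
    }

    /** Mining starts only when the device is isolated-idle and the estimated time suffices. */
    static boolean shouldMine(boolean isolatedIdle, long lifeInStateMillis, long miningTimeMillis) {
        return isolatedIdle && lifeInStateMillis >= miningTimeMillis;
    }
}
```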
[Activity diagram "Rule mining". Swimlanes: Rule Miner, Device. Activities: calculate life-in-state (the time that a node stays in this state), mine-rules. Guards: device is isolated-idle; life-in-state < mining-time.]
Figure B-8: Activity diagram of rule mining
Like rule identification, file classification and representation are done offline. As shown in
Figure B-9, when the middleware starts working, it represents the files, classifies them into
clusters and then represents the clusters. A new file is grouped under the leaf-cluster that is
most similar to the file. When the tree becomes unbalanced, it is modified to create a balanced
one. The tree can also be modified when the environmental context changes. Like the
classification of files, the modification of the tree is done offline.
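The assignment of a new file to its most similar leaf-cluster can be sketched as follows. The sketch assumes that files and clusters are already represented as vectors of equal length and uses cosine similarity as the similarity measure; both the measure and the names LeafClusterAssigner, cosine and mostSimilarLeaf are assumptions made for the illustration.

```java
final class LeafClusterAssigner {
    /** Cosine similarity of two vectors of equal length; 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the index of the leaf-cluster most similar to the file, or -1 if there is none. */
    static int mostSimilarLeaf(double[] fileVector, double[][] leafVectors) {
        int best = -1;
        double bestSimilarity = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < leafVectors.length; i++) {
            double similarity = cosine(fileVector, leafVectors[i]);
            if (similarity > bestSimilarity) {
                bestSimilarity = similarity;
                best = i;
            }
        }
        return best;
    }
}
```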
Figure B-9: Activity diagram of file representation and classification
Sub-system Decomposition
The component SAMi-adaptor (Figure B-10) contains only one package. It passes inputs
entered through another messenger to SAMi-basic and displays the output produced by
SAMi-basic by using the interface provided by that messenger. The main component of
SAMi-adaptor is the interface plug-in.
Figure B-10: A SAMi-adaptor for Yahoo Messenger
SAMi-thin (Figure B-11) allows thin devices to participate in the information
exchange. It is composed of three packages: Login, Collaborator and UserInterface. Note that
these packages are not unique to this component; they can also be used, with or without a
mediator, by the other components.
The Login package verifies that only an authorized user accesses the middleware. The
UserInterface package accepts the basic inputs of SAMi, i.e., a query, a user profile
and an agenda. The Collaborator package asks other peers in the vicinity to
search for information on behalf of a user who owns a thin device.
Figure B-11: SAMi-thin
As shown in the diagram in Figure B-12, the SAMi-GUI component contains a
package called UserInterface, which is also a part of the component SAMi-thin. The
package contains four interfaces and four classes that implement the interfaces. The class
guiFileIO is used to accept a query and to display query recommendations,
advertisements, and recently downloaded files. The class uiUserIO is used to
accept a user preference, state, agenda and profile. The class envIO is used by the administrator
to configure mobility classes. The class guiMain is used by a user to navigate from one interface
to another.
Figure B-12: SAMi GUI
SAMi-GUI is implemented by extending the user interface classes of J2ME. It consists
of the classes displayed in Figure B-13.
[Class diagram. Classes: mainMenu, browseMenu, aboutForm, settingMenu, searchForm, advForm, ruleForm, tempStoreForm, HistForm, habitForm, agendaForm, prefForm, mobilityForm.]
Figure B-13: Classes in SAMi-GUI
The component SAMi-core (Figure B-14) is composed of three packages that access
the advertisement data-store, the MANET-View data-store and the local repository. The
advertiser package is responsible for distributing and managing advertisements according to
the context of the user and the environment. The inf-Manager package searches for and
downloads files according to a user's query, agenda and habit.
Figure B-14: SAMi-core
SAMi-ext (Figure B-15) is composed of two packages: ruleExtractor and extInfMang. The
ruleExtractor package identifies rules by analyzing the historical data-store and puts the
resulting rules in the rule base. The extInfManager package is used to classify files into
clusters, represent files and clusters in vector space, and manage file adaptation.
Figure B-15: SAMi-ext
Annex C. Important classes of SAMi
Inf-Manager
lstFiles: the list of metadata of shareable files.
lstCluster: the list of metadata of clusters found in each depth of the file tree.
THeight: the height of the file tree.
resp-limit: the maximum number of files returned for a query.
numUploads: the number of files sent to the neighbors in the current session.
processQuery(): identifies files that match a query and prepares a response.
prepareReponse(): prepares a response for a query.
removeException(): removes the files that are identified as exception by a query.
searchByTitle(): searches files according to their title.
searchbyCategory(): searches files according to their category.
getTHeight(): returns the height of the file-tree.
mapFileToInterest(): identifies files that match the interest of a user.
mapFileByCategory(): identifies files that have, or are similar to, a given category.
mapFileByTitle(): identifies files whose title is the same as, or similar to, the title passed as an argument.
mapClusterToInterest(): identifies clusters that match the interest of a user.
mapClusterByCategory(): identifies clusters containing files that have, or are similar to, a given
category.
mapClusterByTitle(): identifies clusters whose title is the same as, or similar to, the title passed as an argument.
nearer(): returns the interest that is most similar to a given file.
uploadFile(): sends the whole or a part of a file.
isExist(): returns the file that has the given metadata.
Discovery-Manager
lstDiscovery: a list of discovery objects that can be used to discover files for a query.
maxFile: the maximum number of files searched for a query.
searchFile(): searches files and their sources by creating a discovery object.
cleanDiscovery(): removes a discovery object and registers the query handled by the object in
the historical data-store.
accpetResponse(): accepts a response and hands it to an appropriate discovery object.
searchDiscovery(): searches a discovery object that searches a response for a given query.
File Discovery
maxfile: the maximum number of files that can be discovered for a query.
discoveryDeadline: the latest time by which files should be discovered.
q: a query for which information is discovered.
lstInf: a list of files discovered for a query.
lstRsp: a list of responses accepted for a query.
searchFile(): searches files for a query by distributing discovery messages and by consulting
the advertisement data-store.
setMetafile(): adds the metadata of a file and the source of the file to lstInf.
distributeMessage(): distributes discovery messages to potential sources.
getInfDis(): returns the attribute lstInf.
searchAgain(): performs a further search.
approvalDelivery(): passes the lstInf to the deliveryManager object.
getQueryID(): returns the id of the query that the object is dealing with.
acceptResponse(): accepts a response for the distributed message.
DeliveryManager
download: assigns objects to download a given list of files.
cleanDelivery: removes a delivery object.
acceptDelivery: transfers a portion of a file to an appropriate delivery object.
searchDelObject: searches a delivery object that deals with the file specified by a given
metadata.
File Delivery
maxSource: the maximum number of sources from which the file can be downloaded.
Meta: the metafile considered by the object.
lstReq: the list of delivery-requests prepared by the object.
lstInf: the downloaded parts of a file with their owners.
lstSources: a list of the profiles of sources of the file with the metafile referred.
downloadFile(): searches list of sources to download the required file.
downloadPartially(): downloads some parts of the file.
acceptFilePortions(): accepts a portion of a file.
distributeMessageDelivery(): distributes delivery messages to the sources of the file that
the object is dealing with.
mergResult(): merges the portions of the file.
canDownLoadFull(): checks whether the file that the object deals with can be downloaded in full by
using the given list of sources.
searchRequest(): searches for the request that was sent to a given source.
Response
queryID: the identifier of the query that the response answers.
numFile: the number of files matching the query.
lstFiles: the metadata of the files matching the query.
simValues: a list of similarity values, where the i-th value indicates the similarity between the
i-th file and the query.
Query
queryID: the identifier of a query.
title: the title of the file that the query asks for.
catagories: the categories of the file to be searched.
deadline: the time after which files should no longer be searched or downloaded for the
query.
exceptions: the identifiers of the files that the query excludes.
setDeadline(): assigns the deadline of the query.
Download Request
meta: the metadata of the file to be downloaded.
sourceID: the identifier of the peer from which parts of the file will be downloaded.
requestTime: the time when the request is distributed.
divisionBy: the number of parts into which the file is divided.
requestedPart: the part of the file that will be downloaded from the peer referred to by the
object.
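The relation between divisionBy and requestedPart can be illustrated by mapping a part index to a byte range. This is a sketch under assumptions that are not stated in the thesis: parts are numbered from 0, a part is a contiguous byte range, and the class PartRange and its method byteRange are hypothetical names.

```java
final class PartRange {
    /**
     * Returns {startInclusive, endExclusive} for one part of a file.
     * Consecutive parts are contiguous and together cover the whole file.
     * The multiplication may overflow for sizes close to Long.MAX_VALUE;
     * this is ignored in the sketch.
     */
    static long[] byteRange(long fileSize, int divisionBy, int requestedPart) {
        if (fileSize < 0 || divisionBy <= 0 || requestedPart < 0 || requestedPart >= divisionBy) {
            throw new IllegalArgumentException("invalid size or part index");
        }
        long start = fileSize * requestedPart / divisionBy;
        long end = fileSize * (requestedPart + 1) / divisionBy;
        return new long[] {start, end};
    }
}
```

With this convention, a 10-byte file divided by 3 yields the ranges [0, 3), [3, 6) and [6, 10), so the last part absorbs the remainder.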
FileAdv
meta: the description of a file
owners: the identifiers of peers that have advertised the file referred by the object
ClusterAdv
meta: the metadata of a cluster
owners: the identifiers of peers that have advertised the cluster referred by the object
Adv-Manager
radius: the number of hops that the advertisement traverses
period: the time interval between two successive advertisements
numAdv: the volume of the advertisements
advCont: the content of the advertisement
advTimeThershold: the latest time until which the distribution of the advertisement can be delayed
IDAdvFile: the identifiers of the files that have been advertised to the current neighbors
IDAdvCluster: the identifiers of the clusters that have been advertised to the current neighbors
prevAdv: advertisements that have been made in recent history
intializeAdv(): sets the attributes of the object according to the mobility class
setNumAdv(): recalculates the volume of advertisements according to the usability of the
previous advertisements
scheduleAdv(): schedules the advertisement
prepareCont(): initializes the content of advertisement
setCont( ): selects the files and clusters to be advertised
sendAdv(): sends advertisement to a neighbor
setIDFilesClustersAdvertised(): determines the identifiers of the files and clusters that have
been advertised to the current neighbors
addFile(): adds files to be advertised
addIsolatedFile: adds a file which is classified under no cluster that the adv-manager is
aware of
addCluster(): adds clusters to be advertised
addNonIsolatedFile(): adds a file which is classified under a cluster that the adv-manager
is aware of
createBalanceDoc(): ensures, as far as possible, that the numbers of files/clusters selected for
the different interests do not differ significantly
resolveConflict(): makes the sets of files matching two interests disjoint
setIDFileCluster(): identifies the files/clusters advertised to neighbors
PositiveDoc
isoFiles: files that match an interest and are classified under no cluster in the attribute clusters
nonIsoFiles: files that match the same interest as the files in isoFiles and are classified under
at least one cluster in the attribute clusters
clusters: clusters that match the same interest as the files in isoFiles
THeight: the height of the file tree
numAdv: the volume of advertisement
intialize(): initializes the attributes isoFiles, THeight and numAdv
addClusters(): adds clusters to the attribute clusters and moves the files classified under these
clusters from isoFiles to nonIsoFiles
getNumAdv(): returns the volume of advertisements
getNumFiles(): returns the number of files that are referred by the object
getNumCluster(): returns the number of clusters that are referred by the object
getFiles(): returns the IDs of files referred by the object
getNumIsoFiles(): returns the number of isolated files
getNumNonIsoFiles(): returns the number of non-isolated files
getIsoFiles(): returns the IDs of the isolated files
getNonIsoFiles(): returns the IDs of the non-isolated files
getNumIsoClusters(): returns the number of isolated clusters, i.e. the clusters that are
referred to by the object and are classified under no cluster referred to by the object
getNumNonIsoClusters(): returns the number of non-isolated clusters, i.e. the clusters that
are referred to by the object and are classified under at least one cluster referred to by the object
getIsolatedClusters(): returns the IDs of the isolated clusters referred to by the object
getNonIsolatedClusters(): returns the IDs of the non-isolated clusters referred to by the object
PositiveCluster
isolatedClusters: a list of clusters that match an interest, are found at the same depth and
are classified under no cluster matching the same interest
nonIsolatedClusters: a list of clusters that match the same interest as the clusters in
isolatedClusters, are found at the same depth and are classified under another cluster matching
the same interest
addCluster(): adds the id of a cluster to isolatedClusters
getIsolated(): returns the ids of the isolated clusters in isolatedClusters
removeIsoCluster(int at): removes a cluster from isolatedClusters
addNonIsoCluster(): adds the id of a cluster to nonIsolatedClusters
getNumClusters(): returns the number of clusters
getNumIsolated(): returns the number of isolated clusters
getNumNonIsolated(): returns the number of non-isolated clusters
getIsolated(): returns the ids of the clusters in isolatedClusters
ProQueryManager
maxNum: the maximum number of proactive queries.
lstProQuery: the list of proactive queries.
maxTime: the maximum time that a proactive query can be kept for approval.
setQueryForApproval(): adds a proactive query in lstProQuery.
deleteQuery(): deletes a proactive query.
cleanProQuery(): deletes proactive queries that have been kept for approval longer than maxTime.
approvesQuery(): starts searching files for the query approved by a user.
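The interplay of maxNum, maxTime and cleanProQuery() can be sketched as follows. The class ProQueryStore, the nested ProQuery type and the return values are assumptions made for illustration; only the attribute and method names setQueryForApproval, cleanProQuery, maxNum, maxTime and lstProQuery come from the list above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class ProQueryStore {
    /** Hypothetical proactive query with its creation time. */
    static final class ProQuery {
        final String text;
        final long createdAtMillis;

        ProQuery(String text, long createdAtMillis) {
            this.text = text;
            this.createdAtMillis = createdAtMillis;
        }
    }

    private final int maxNum;          // maximum number of proactive queries
    private final long maxTimeMillis;  // maximum time a query is kept for approval
    private final List<ProQuery> lstProQuery = new ArrayList<ProQuery>();

    ProQueryStore(int maxNum, long maxTimeMillis) {
        this.maxNum = maxNum;
        this.maxTimeMillis = maxTimeMillis;
    }

    /** Adds a query for approval; returns false when the store is already full. */
    boolean setQueryForApproval(ProQuery query) {
        if (lstProQuery.size() >= maxNum) {
            return false;
        }
        lstProQuery.add(query);
        return true;
    }

    /** Deletes the queries kept longer than maxTime; returns how many were deleted. */
    int cleanProQuery(long nowMillis) {
        int removed = 0;
        Iterator<ProQuery> it = lstProQuery.iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().createdAtMillis > maxTimeMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return lstProQuery.size();
    }
}
```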
TempFileManager
tempFolderPath: the path of the folder where files can be stored temporarily.
totalSize: the maximum size of memory that can be occupied by temporary files.
occupiedSize: the actual size of memory occupied by the temporary files.
lstTemfile: the metadata of files stored temporarily.
maxTime: the maximum lifetime of a temporary file.
Date and place of birth: 11 March 1977, Addis Ababa (Ethiopia)
Marital status: Single
Nationality: Ethiopian
Languages: English, French, Amharic (language of Ethiopia)
Education
• PhD student in computer science at the LIRIS laboratory, INSA de Lyon (Jan. 2006 - July 2010)
• Master's degree in computer science, Department of Computer Science, Addis Ababa University, Addis Ababa, Ethiopia (Sept. 2002 - July 2004).
• Bachelor of Science (BSc) in computer science, Department of Mathematics, Addis Ababa University, Addis Ababa, Ethiopia (Sept. 1996 - July 2000).
• E.S.L.C.E (secondary school leaving certificate), Addis Ababa, Ethiopia (May 1996).
Professional experience
• Teaching of computer science, Department of Mathematics and Computer Science, Addis Ababa University (Sept. 2000 - Oct. 2004).
Administrative experience
• Instructor and coordinator for office software and application software courses, continuing education, Department of Mathematics and Computer Science, Addis Ababa University (June 2002 - Sept. 2002).