Mobile Social Networking for Enhanced Group Communication

Author: Martin Wirz

Advisors: Michael Kuhn, Reto Grob

Master Thesis

August 31, 2008

Preface

This report documents my Master Thesis, written at Swisscom Strategy and Innovation and the Distributed Computing Group at ETH Zurich from March 1 until August 31, 2008.

During this time I not only gathered a lot of knowledge concerning my project but also gained insight into a cutting-edge and absorbing field of research.

Without the support of many people, this Master Thesis would not have been possible and I wish to express my gratitude to them all. First of all, I would like to thank Prof. Dr. Roger Wattenhofer for giving me the opportunity to write my Master Thesis at the Distributed Computing Group and for the supervision of my work.

I greatly appreciate the guidance of my supervisors, Michael Kuhn from the Distributed Computing Group and Reto Grob from Swisscom Innovations.

Additionally, I thank the whole Swisscom Strategy and Innovation Group for the great working atmosphere.

Bern, August 31, 2008

Martin Wirz

Contents

Abstract

1 Introduction and Outline
    1.1 Introduction
    1.2 Scope of this Thesis
    1.3 Outline

2 Background
    2.1 Ubiquitous Mobile Communication
    2.2 Drivers for the Mobile Internet
    2.3 Towards Mobile Social Services

3 Motivation
    3.1 Survey and Conclusion about Group-Based Communication on Mobile Devices
        3.1.1 Results
        3.1.2 Conclusion
    3.2 Specification of a new Mobile Group Communication Service
        3.2.1 Idea
        3.2.2 Use Case
        3.2.3 Requirements
    3.3 Existing Services
        3.3.1 Mobile Version of classical SNS
        3.3.2 Location Based Services
        3.3.3 Group Communication and Collaboration Services
        3.3.4 Summary

4 User Recommendation based on Social Graph Clustering
    4.1 Introduction
    4.2 Related Work
    4.3 Social Network as a Graph Model
        4.3.1 Social Graph
        4.3.2 Ego-Graph
        4.3.3 Metrics
    4.4 User Recommendation Algorithm
        4.4.1 Idea
        4.4.2 Graph Clustering
        4.4.3 Recommendation Algorithm
    4.5 Evaluation
        4.5.1 Clustering Accuracy
        4.5.2 Clustering Stability
    4.6 Summary

5 Ego Graph Retrieval based on Communication Behavior
    5.1 Motivation
    5.2 Related Work
        5.2.1 Mining Social Network
        5.2.2 Mining Spatio-Temporal Co-Occurrences
        5.2.3 Mining Temporal Co-Occurrences in Communication
    5.3 Problem Statement
        5.3.1 Estimating Contact Interlinkage
        5.3.2 Communication Stream
    5.4 Temporal Co-Occurrence Algorithm
        5.4.1 Temporal Co-Occurrence
        5.4.2 Correlation Weighting
    5.5 Evaluation
        5.5.1 Data Set
        5.5.2 Precision and Recall of Temporal Co-Occurrence Algorithm
    5.6 Temporal Association Pattern Analysis
    5.7 Behavior Analysis using Markov Chain
    5.8 Summary

6 Platform and Implementation
    6.1 Specification of Cluestr
        6.1.1 General Usage of the Service
        6.1.2 Client Use Cases
    6.2 Platform for Mobile Social Applications
        6.2.1 Native Mobile Applications
        6.2.2 Mobile Web Applications
        6.2.3 Conclusion
    6.3 Infrastructure Overview
    6.4 Client Design
        6.4.1 Data Handler and Queue
        6.4.2 Background Synchronization Engine
    6.5 Server Design
    6.6 Ego-Graph Clustering
    6.7 Mobile User Interface
        6.7.1 General Design
        6.7.2 Contact List
        6.7.3 Cluestr Initialization Process
    6.8 TodayScreen Extension

7 Conclusion and Outlook
    7.1 Conclusion
    7.2 Outlook
        7.2.1 Platform for Mobile Social Networking Services
        7.2.2 Mobile Social Networking Services

A Notations

B Questionnaire

C Task Description

List of Figures

3.1 Classification of existing mobile SNS

4.1 Visualization of a sample ego-graph
4.2 Visualization of a real ego-graph exhibiting characteristic community structures
4.3 Visualization of recall, precision and the F-measure
4.4 Comparison of the accuracy of clustering the subject’s ego graph at different stages
4.5 Time measurement of the group initialization experiment with 4 involved subjects
4.6 Clusters fall apart by randomly removing links
4.7 Effect of clustering a degenerated ego-graph with missing friendship links
4.8 Effect of clustering a degenerated ego-graph with missing contacts

5.1 Sample communication stream and dedicated co-occurrence pairs
5.2 Sample ego-graph retrieved out of the communication stream
5.3 Co-occurrence statistics for an ego-graph
5.4 Plot of precision and recall for three different egos
5.5 Precision and recall with two different timeframe durations
5.6 Precision, recall and F-measure with variable timeframe duration
5.7 Weighted co-occurrence statistics for an ego-graph
5.8 Plot of the correlation coefficients
5.9 Precision and recall comparison of weighted and unweighted co-occurrence statistics
5.10 Histogram of the communication distribution for each cluster
5.11 Weekday histogram of the communication distribution of the ego with each contact
5.12 Markov Chain of the communication behavior

6.1 Infrastructure of Cluestr
6.2 Device architecture stack
6.3 Client-Server sequence diagram for requesting updates by the client’s downloader worker
6.4 Device architecture
6.5 Cluestr user interface
6.6 Screenshot of the friendlist and the profile view
6.7 Screenshot of the Cluestr initialization process
6.8 Screenshot of the TodayScreen widget

List of Tables

4.1 Clustering accuracy (recall, precision and F-measure) for each subject
4.2 Comparison of the number of identified clusters

6.1 Mobile client use cases
6.2 Mobile user interface
6.3 4 different contact selection modes are available for inviting contacts to participate in a Cluestr

Abstract

Social Networking Services (SNS) are becoming ubiquitous on the Internet. At the current stage, however, SNS see little use in the mobile space. This is expected to change. Upcoming technological and economic drivers will provide a reliable basis for incorporating social networking functionality into mobile devices. The objective of this thesis is to explore novel concepts in the field of mobile social networking.

A survey conducted at the beginning of this thesis revealed that mobile phones are often used for group-based communication with members belonging to the same community. Yet, today’s mobile communication alternatives do not cope with this social behavior. This thesis introduces ’Cluestr’, a social networking service offering enhanced group communication and collaboration functionality, aiming to lower the effort for organizing, managing and coordinating group activities.

A key element of Cluestr is a contact recommendation engine which helps a user to find contacts belonging together faster. To do so, the recommendation engine automatically classifies a user’s contacts into groups representing communities like ’university colleagues’, ’coworkers’, ’family’, ’friends’ etc. solely by clustering his ego-graph.

Evaluation on real social networking data and subject questioning revealed that a person’s contacts can be grouped into communities and that clustering an ego-graph can extrapolate these communities accurately.

Further experiments showed that recommendation based on community affiliation significantly lowers the time required to find contacts. It is also shown that even on incomplete social structures, enough information is available for the recommendation algorithm to come up with adequate suggestions.

Traditional SNS rely on a centralized infrastructure where a central entity is responsible for managing the social network information. An in-depth analysis in the second part of this thesis investigates the feasibility of a decentralized approach to determine a person’s social network structures. The question of interest is whether, by mining a person’s communication stream, meaningful predictions about his and his contacts’ social roles and relationships can be deduced. With an evaluation using mobile communication logs, it is shown that temporal communication patterns can be found which indicate social structures to some degree.

Chapter 1

Introduction and Outline

1.1 Introduction

Social Networking Services (SNS) such as Facebook, LinkedIn and many others are becoming ubiquitous on the Internet. At the current stage, however, SNS see little use on mobile devices. The main obstacles are device and user interface constraints as well as slow data rates. Recent developments indicate major improvements that tear down these traditional barriers. With these changes in mind, we see nothing preventing the success of social networking services on mobile phones in the near future.

Seeing mobile devices as a personal communication tool that is constantly carried around and accessible all the time, we believe that incorporating social networking functionality could significantly enhance mobile communication and change the way we communicate.

We believe mobile SNS should not just copy features of their desktop counterparts which might have proven successful there, but rather take special characteristics of mobile communication such as mobility and ubiquity into account and focus on mobile communication behavior and usage patterns.

1.2 Scope of this Thesis

The objective of this thesis is to illuminate novel concepts in the field of mobile social networking and to explore how future mobile social networking services could incorporate such ideas to provide an enhanced communication experience. In the first part of this thesis, a survey focusing on users’ communication behavior will reveal requirements for future services, from which we derive our own concepts and ideas. A prototype implementation should give insight into appropriate technologies and platforms for developing such services. The implementation will also be used to test and evaluate the concepts developed in the course of this thesis with real users.

Traditional SNS rely on a centralized infrastructure. The prototype implementation mentioned above likewise relies on a central entity responsible for managing social network information. In the second part of this thesis, we study the feasibility of a distributed approach. We investigate and analyze whether social network structures can be determined on each mobile device independently by mining communication streams.

1.3 Outline

This document is structured as follows: In Chapter 2, we highlight recent developments in mobile communication technology and address trends in mobile social networking. This background information forms the basis for the ideas presented later on. In Chapter 3, we present the idea behind Cluestr, a mobile SNS for enhanced group communication. A key element of Cluestr and a major contribution of this thesis is an efficient method for a user to invite contacts for group-based communication, supported by a contact recommendation engine that helps to find suitable contacts faster. In Chapter 4, we introduce the algorithm of our contact recommendation engine. An in-depth analysis of its performance and applicability completes this chapter. Chapter 5 deals with the second main topic of this master thesis: a distributed approach to determining social network structures. Chapter 6 then explains the design of our prototype implementation. We evaluate state-of-the-art platforms and explain our preferred choice. Finally, Chapter 7 concludes this thesis and gives an outlook.

Chapter 2

Background

2.1 Ubiquitous Mobile Communication

The number of registered mobile phones in Switzerland recently surpassed the number of residents. Mobile phones have become an important communication device in many aspects of daily life. Today’s mobile network infrastructure covers almost every corner of the populated areas and provides instant and ubiquitous access.

The wired telephony network once lowered the communication delay and made spatial distance between communicating parties negligible. By shifting to a mobile infrastructure, the setup time for communicating is being lowered as well: Today’s mobile infrastructure allows us to reach everyone everywhere all the time - instantly.

2.2 Drivers for the Mobile Internet

With a state-of-the-art mobile phone, we carry a powerful computer in our pockets which is able to do far more than offer voice services and send text messages. However, the way we use a mobile phone for communication has not changed much over the past years. We believe that technological as well as economic drivers will force a significant change in the way we use a mobile device for communication. This shift will be comparable to what we experienced in the nineties when the Internet became popular. The main technological drivers forcing this shift include:

Migration to an all-IP Infrastructure. Mobile network infrastructure is migrating from a circuit-switched network to a packet-switched all-IP infrastructure, such as SIP-based IMS¹ and XML/HTTP-based XDMS². This change has a huge impact on the telecommunication industry since everyone will be able to offer mobile Internet services on top of the telephone companies’ network infrastructures.

Wireless Broadband Technology. Today’s 3G (third generation) mobile network access is ubiquitous. In Switzerland, Swisscom offers HSDPA access with data transfer rates of up to 14.4 Mbit/s. Over the next years, data rates on mobile networks will increase significantly as new technologies emerge. This development is an enabler for the mobile Internet, since it makes fast access with low delay and high data rates possible.

¹ IP Multimedia Subsystem
² XML Document Management Server

New Platforms and Rich User Interfaces. New platforms are being pushed onto the market. For example, Android and the iPhone provide rich user interfaces on big screens that are intuitive, simple to use and appealingly presented. A whole ecosystem is built around the devices: not only software development tools are provided, but also distribution channels and revenue models.

Sophisticated Browsers. Rich web browsers are increasingly built into mobile phones to enhance the browsing experience. The WebKit browser, currently implemented in Nokia devices, the iPhone and Android, as well as Opera Mobile, implemented in HTC devices and Sony Ericsson phones, enable a rich browsing experience. This transforms mobile devices into sophisticated Internet clients and blurs the line between mobile and desktop browsing.

Flat Rate Pricing. Unlimited and affordable flat rate pricing will dispel the fear of receiving a high bill at the end of the month. Mobile network capacity and competition among the telephone companies will force the introduction of such pricing models in the near future.

2.3 Towards Mobile Social Services

Due to the drivers discussed above, state-of-the-art mobile phones are starting to offer a user experience comparable to that of desktop computers. Thus, we expect to see more and more services expanding their reach into the mobile domain.

As part of the Web 2.0 movement³, a big trend for online services was the incorporation of social aspects. Classical social networking services such as Facebook⁴ or MySpace⁵ became extremely popular. In most of these social networking services, the general aim is to connect members and offer a platform for interaction in virtual space. Besides classical social networking services, social features found their way into a variety of other services. Amazon⁶, eBay⁷ and Digg⁸ are just a few examples which rely on social aspects.

Having access to the Internet on a mobile device opens the door to offering such services there. We believe that in mobile space two main developments will make the mobile Internet successful:

• Location-based Services (LBS)

• Mobile Social Services (MSS)

Several LBS already exist (see Section 3.3) and the share of mobile devices with built-in GPS is growing constantly. LBS are not a topic of this thesis; we therefore refer to the literature covering this field of research.

Mobile social services have existed for some time. However, we believe that currently available implementations by far do not exploit their full potential. This has several reasons.

³ http://en.wikipedia.org/wiki/Web2.0
⁴ www.facebook.com
⁵ www.myspace.com
⁶ www.amazon.com
⁷ www.ebay.com
⁸ www.digg.com

A major challenge is their implementation on mobile devices. We cover this topic in Section 6.2 when we discuss different platforms and implementation issues. Another element preventing existing mobile social network services from being successful is that they try to copy approaches from their desktop counterparts. However, the usage pattern of a mobile phone differs significantly from that of a desktop computer. Simply porting existing services will not necessarily be successful on mobile devices.

The main question we would like to address in this thesis is how to lift today’s communication behavior to a whole new level by introducing social networking aspects.

Chapter 3

Motivation

The technological and economic developments addressed in the previous chapter motivated us to conceive new concepts and ideas for mobile communication. We would like to explore how future communication systems can profit from social aspects since we strongly believe that the incorporation of social elements will significantly change the way we communicate.

3.1 Survey and Conclusion about Group-Based Communication on Mobile Devices

To find out how mobile phones are used and what social networking services could contribute, we conducted a survey and questioned a total of 342 people. The participants can be divided into three groups:

1. 193 students from the faculty of Electrical Engineering at the ETH

2. 136 personal contacts and contacts of their contacts

3. 13 employees of Swisscom Innovations

The questionnaire, which can be found in Appendix B, aims to investigate the current usage of mobile phones for group communication and collaboration in mobile space.

Although the three groups are rather heterogeneous, the outcome of the survey shows a lot of similarities. Therefore, if not mentioned otherwise, the results of the different groups are combined.

3.1.1 Results

The outcome of the survey can be summarized into five statements:

1. The acceptance and usage of social networking services such as Facebook or Xing¹ is widespread. Only 9% of the participants are not registered in any social networking service. Among those who are subscribed to at least one service, 59% use it at least once a week and 19% even on a daily basis.

¹ www.xing.com

2. Most participants agreed that the contacts stored in their mobile phone’s address book can be grouped into communities such as ’university colleagues’, ’coworkers’, ’family’, ’friends’ etc., and that communication often occurs among the members of a certain community simultaneously. However, although functionality for grouping contacts exists on mobile phones, it is hardly used. Only 16% use the built-in grouping function to assign contacts to groups.

3. 68% of all participants use the feature to send text messages to multiple receivers. It is used for different tasks including holiday greetings, invitations, scheduling meetings, event organization and polls.

4. The conference call feature is not often used in daily life. 11% claim to use it occasionally. Only one person claimed to use it regularly. The main purpose for using this feature is business meetings.

5. Asked how they use the mobile Internet, 74% of non-Swisscom employees (Group 1 and Group 2) said that they do not use it at all, mostly because either their mobile phone does not support displaying rich content or because it is too expensive. Part of the phone bill of the Swisscom employees (Group 3) is paid by Swisscom. Therefore, these participants do not worry about the cost too much. This has a strong effect: in this group, 54% use the mobile Internet regularly and 33% occasionally. The main purpose is sending and receiving e-mails, followed by checking timetables, news sites, Wikipedia, etc.

3.1.2 Conclusion

Based on the results of our survey, we conclude that:

1. Messaging is widely accepted as a tool for communication. Occasionally, messages are sent to multiple receivers.

2. The mobile phone is often used for organizing and coordinating activities among multiple people, e.g. for deciding about a meeting point, time and the activity.

3. Group communication is often performed with members of a community existing in real life such as members of a sport team, coworkers, class mates or family.

Today’s mobile communication alternatives do not cope with this social behavior. SMS text messages can be sent to multiple receivers simultaneously but have severe limitations for efficient group communication. When user A sends a message to five other users and one of them, let us assume user B, replies, the other four do not get the answer from B. SMS therefore only offers a 1:N way of communication. For efficient group communication, an N:N solution is required, where everyone can send messages to everyone else and receives all the messages from the others.

The conference call feature offers such functionality for voice communication but is impractical since it requires synchronous communication, where all the participants have to be present (at the phone) at the same time in order to receive the information. E-mail could offer such possibilities. It is however not very successful on mobile devices in Europe due to a lack of integration in current devices.

We believe that efficient group communication and collaboration require a new form of asynchronous communication with a focus on enhanced intragroup interaction. We are going to present our ideas in the next section.
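The difference between 1:N and N:N delivery can be illustrated with a minimal Python sketch. It replays the scenario of user A writing to five other users and user B replying. The function names, the inbox representation and the message texts are assumptions made for this illustration only; they do not describe how SMS or Cluestr is actually implemented.

```python
from collections import defaultdict


def deliver_sms(inboxes, sender, recipients, text):
    """1:N delivery: only the recipients named by the sender get a copy."""
    for recipient in recipients:
        inboxes[recipient].append((sender, text))


def deliver_group(inboxes, members, sender, text):
    """N:N delivery: every group member except the sender gets a copy."""
    for member in members:
        if member != sender:
            inboxes[member].append((sender, text))


members = ["A", "B", "C", "D", "E", "F"]  # hypothetical users

# SMS: A writes to five users, B replies. The reply reaches A only.
sms = defaultdict(list)
deliver_sms(sms, "A", ["B", "C", "D", "E", "F"], "Meet at 8?")
deliver_sms(sms, "B", ["A"], "Fine with me")

# Group: the same two messages, posted to an N:N group.
group = defaultdict(list)
deliver_group(group, members, "A", "Meet at 8?")
deliver_group(group, members, "B", "Fine with me")

# Number of messages user C has seen in each model.
print(len(sms["C"]), len(group["C"]))
```

In the SMS model, user C only ever sees the original message from A, whereas in the group model C also receives the reply from B.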

3.2 Specification of a new Mobile Group Communication Service

Based on the survey and our conclusion, we specify a new communication service which focuses on group communication. We name our service Cluestr, a fusion of cluster and clue. The name should indicate the possibility to collaborate in small groups by using this service. These small groups are called Cluestrs. A user can participate in multiple Cluestrs at the same time.

3.2.1 Idea

The key conclusion drawn from the survey is the need for an efficient group communication service in mobile space. This service should enhance communication within a group and in particular should lower the effort to organize, manage and coordinate group activities like going out in the evening with friends or arranging meeting point and time for an upcoming football match with the team mates. As a further enhancement, we want to incorporate collaboration capabilities that can be used among members of a group.

3.2.2 Use Case

To illustrate how our service can be used, we give an example of a typical situation where Cluestr could be useful:

Every Saturday, a local football team has a match against a rival team. The game is played either at home or away. The team captain can initiate a Cluestr and invite all team mates. On a billboard, team mates can inform the others whether they will join the game and discuss the meeting point. Using the poll function, they can vote for the person who has to be the chauffeur and drive to the game. Using the ToDo list, the team manages the logistics for the BBQ afterwards. Everyone can tick what he will contribute to the buffet.

3.2.3 Requirements

Based on the use case, we propose the following requirements for our service:

Social Networking Functionality. The service should be based on a social networking platform which provides basic functionality as known from similar services:

• Building a social network by connecting to other members.

• Sending messages to other members.

• Having a profile with personal information and status information. This information can be browsed by other friends.

Group Communication and Collaboration. The strength of our service lies in group communication and collaboration. Users should be able to establish groups and invite other users to participate. Participants in a group can profit from intragroup communication and collaboration tools. Most importantly, a thread-like billboard should be available, where everyone can post messages and read those of the others. In addition, collaboration tools are available for each group separately: a poll, where participants can vote, and a ToDo list, where participants can add elements or mark elements as accomplished.

Mobility and Usability. Since this service should run on mobile devices, several criteria have to be met in terms of usability:
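The group tools described above (billboard, poll and ToDo list) could be modeled roughly as in the following Python sketch. All class, field and method names are hypothetical and chosen for this illustration; the rule that a second vote by the same participant replaces the first is likewise an assumption, not something the requirements specify.

```python
from dataclasses import dataclass, field


@dataclass
class Poll:
    question: str
    options: list
    votes: dict = field(default_factory=dict)  # participant -> chosen option

    def vote(self, participant, option):
        if option not in self.options:
            raise ValueError(f"unknown option: {option!r}")
        # Assumption: a participant's new vote replaces the previous one.
        self.votes[participant] = option

    def tally(self):
        counts = {option: 0 for option in self.options}
        for option in self.votes.values():
            counts[option] += 1
        return counts


@dataclass
class Group:
    name: str
    participants: set = field(default_factory=set)
    billboard: list = field(default_factory=list)  # (author, text), oldest first
    polls: list = field(default_factory=list)
    todo: dict = field(default_factory=dict)       # item -> accomplished flag

    def post(self, author, text):
        if author not in self.participants:
            raise PermissionError(f"{author} is not in group {self.name}")
        self.billboard.append((author, text))

    def add_todo(self, item):
        self.todo.setdefault(item, False)

    def mark_done(self, item):
        if item not in self.todo:
            raise KeyError(item)
        self.todo[item] = True


# Football team example with made-up participant names.
team = Group("Saturday match", participants={"Anna", "Ben", "Carl"})
team.post("Anna", "I will join. Meet at the station?")
driver = Poll("Who drives?", options=["Anna", "Ben", "Carl"])
team.polls.append(driver)
driver.vote("Anna", "Ben")
driver.vote("Carl", "Ben")
driver.vote("Ben", "Carl")
team.add_todo("Bring charcoal")
team.mark_done("Bring charcoal")
print(driver.tally())
```

Keeping the billboard, polls and ToDo list as fields of the group object mirrors the requirement that these tools are available for each group separately.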

• Very simple and intuitive user interface for efficient usage

• Registration to the service should not be required for participation in a group. However, only registered users will be able to have a profile, establish their social network and initiate a group.

Efficient Group Initialization Process. Most importantly, and here lies a main contribution of this thesis, we want to offer an efficient way to establish a group and to invite contacts to participate. This should require as little user effort as possible. Selecting participants from an alphabetically ordered contact list is not an intuitive method. Our approach, a contact recommendation engine, is presented in Chapter 4.

3.3 Existing Services

A variety of mobile SNS already exist and only a few are presented here. We mainly focus on services competing with Cluestr.

3.3.1 Mobile Version of classical SNS

Recently, classical social networking services started to offer mobile versions of their desktop counterparts. The services are accessible through the devices’ built-in browser and are optimized for mobile phone screens. Facebook Mobile² and MySpace Mobile³ are just two representatives of this class. Compared to their desktop versions, the functionality is limited. Only the most important features are available, including messaging, status updates and news feeds to capture recent activities of friends. The big advantage of these services is that they can profit from an already well-established social network which is not available to new mobile SNS. New services have to build their own social network from scratch.

3.3.2 Location Based Services

The recently emerged location-based mobile social networking services offered by providers such as GyPSii⁴, Pelago⁵ and Loopt⁶ allow users to share real-life experiences via geo-tagged user-generated multimedia content, exchange recommendations about places, identify nearby friends and set up ad hoc face-to-face meetings.

² m.facebook.com
³ mobile.myspace.com
⁴ www.gypsii.com
⁵ www.pelago.com
⁶ www.loopt.com

3.3.3 Group Communication and Collaboration Services

Some services already exist which offer group communication and collaboration features.

Jaiku⁷. Jaiku is a social networking, micro-blogging and lifestreaming service comparable to Twitter⁸. Jaiku is compatible with Nokia S60 platform mobile devices. The software allows users to post messages onto their Jaiku page. One of the main differences between Jaiku and its competitor Twitter is Lifestream, an Internet feed that shares users’ online activities from other programs, such as flickr for photos and last.fm for music, as well as their location by mobile phone. Users can find friends and get their latest updates. They can comment and see their availability, location, and calendar events if they have Jaiku Mobile on their phone.

Zingku⁹. Zingku aims to make it easier for people to share photos, send invitations or conduct polls among friends via mobile phone. On the mobile phone, Zingku uses standard text messaging and picture messaging features that come with every phone. On the web, the service uses a standard web browser and instant messenger.

Groovr¹⁰. Groovr is a platform allowing users to stay in touch and share content among friends, straight from their mobile phone. Groovr lets users broadcast messages, photos, videos or presence information to their friends. Friends will receive notifications and can reply to them through their mobile device.

Limbo¹¹. Limbo amplifies the social lives of its members by making it easier to connect with friends in the real world. In addition to finding people nearby, members can also find people doing specific things, such as drinking at a nearby pub, playing basketball, or shopping at the mall. The idea is to combine ’what’, ’who’ and ’where’ to create more social opportunities for members. Limbo also offers a group messaging tool which enables threaded chat with anyone, on any mobile phone or network.

SLAM¹². Slam is a research project of the Microsoft Research Community Technologies Group. It is a mobile application that enables group-centric real-time communication, location awareness and photo sharing. The core concept behind Slam is a “Slam”, a group of people with whom users can exchange messages and photos. When a user sends a message in Slam, it is automatically sent to everyone in the group. For smartphone users who have the Slam client installed, the phone will buzz and show an indication on its home screen that there is a new message. SMS users will receive the message as an SMS from the Slam server. Like smartphone users, SMS users can be members of multiple Slam groups.

This is only a short overview of mobile SNS. Many more exist that offer similar functionality.

⁷ www.jaiku.com
⁸ www.twitter.com
⁹ www.zingku.com
¹⁰ www.groovr.com
¹¹ www.limbo.com
¹² www.msslam.com


3.3.4 Summary

The common denominator of most mobile social networking services is a set of features ported from classical social networking services. This includes

• establishing friendship connections

• creating user profiles and browsing other friends’ profiles

• messaging and commenting

• sharing and distributing multimedia content

Figure 3.1 illustrates how existing services can be classified in terms of their social networking features, group communication functionality and collaboration tools.

[Figure 3.1 places Facebook, MySpace, Jaiku, Limbo, Groovr and Zingku along three dimensions: classical social network service features, group communication, and collaboration.]

Figure 3.1: Classification of mobile SNS in terms of their SNS features, group communication and collaboration functionality

None of the existing mobile SNS offers an appropriate platform for group-based communication and collaboration in the mobile space.

In the next chapter, we present a contact recommendation engine which supports the initiator of a new group by suggesting suitable contacts. In Chapter 5 we investigate the feasibility of a decentralized approach. In Chapter 6 we present the platform used to implement Cluestr.

Chapter 4

User Recommendation based on Social Graph Clustering

4.1 Introduction

The ability to communicate and collaborate with contacts in small groups is the key idea of Cluestr. Such groups are established by users directly on the device. Everyone using the service is able to do this. The person establishing a new group then becomes the initiator of this group and is able to invite contacts to participate.

Cluestr is designed to run on mobile devices. Thus, inviting contacts should be kept as simple and intuitive as possible. Selecting participants from a contact list in alphabetic order is inefficient, since the desired contacts are in general randomly distributed over the whole range. We would like to offer a more efficient method. In this chapter we present a recommendation engine that supports the initiator by suggesting suitable contacts for invitation. For better understanding, we give an illustration:

If the initiator of a new group invites contact A to participate, the engine proposes that contacts B, D and E might also fit into this group, but not contact C. The initiator decides to invite E as well. With this information and its knowledge, the engine assumes that the initiator is not interested in D and will only recommend B.

We have to define criteria according to which the engine comes up with recommendations. The principle behind our recommendation algorithm is based on the outcome of our survey: the initiator belongs to different communities and often communicates with members of one community at the same time. Our recommendation engine is able to detect a user’s communities and to assign contacts to these communities. When a new group is established, the recommendation engine ranks contacts according to their community affiliation and the previously selected contacts, which leads to a recommendation.

One could think of different approaches to infer the community affiliation of the initiator’s contacts:

• Tagging of contacts and manual grouping

• Content analysis of the communication

• Social graph topology


Tagging and manual grouping of contacts requires a large management and maintenance effort by the user and is therefore undesirable. This feature is already implemented in many mobile phones but is hardly used according to our survey. We want our recommendation engine to be able to recommend prospective participants with the least possible user effort.

Another approach is to analyze the content of the communication and group the contacts around topics. [1] and [2] follow this idea. However, a lot of communication is voice based. Extracting relevant content information using voice recognition is hard to achieve on mobile devices.

The approach followed in the course of this thesis is based on the relationships among contacts. This information can be used to find contacts belonging together. Relationship information is gathered following a ’social networking’ idea. In a social networking service, users link to each other to indicate a relationship. This leads to a relationship model in the form of a network with the users as nodes connected through ties interpreted as relationships. We designed our recommendation engine to require only this network information to find the community affiliation of contacts and to use this information to deduce a recommendation.

In the next section we start by giving an overview of related work. In Section 4.3 we introduce the social graph and common metrics which we are going to use throughout this thesis. We will then present our algorithm, followed by an evaluation of its performance and a conclusion.

4.2 Related Work

Social network structures are analyzed to gain insight into different aspects. For instance, important actors in the network, those with the most connections or the greatest influence [3, 4], can be found. Alternatively, it may be the paths connecting actors that are of interest. Analysts may look for the shortest paths [5] or the most novel types of connections. An emerging field of research in social network analysis is concerned with finding subsets of the network that are especially cohesive in order to find actors belonging together [6, 7, 8, 9, 10, 11, 12, 13].

Knowledge of social networks is useful in various application areas. In law enforcement concerning organized crime such as drug trafficking and money laundering [5] or terrorism [14, 15], knowing how the perpetrators are connected to one another would assist the effort to disrupt a criminal act or to identify additional suspects. In commerce, viral marketing exploits the relationship between existing and potential customers to increase sales of products and services [3, 4]. Members of a social network may also take advantage of their connections to get to know others, for instance through web sites facilitating networking or dating among their users [16].

4.3 Social Network as a Graph Model

In this section we introduce the social network as a graph model and present common metrics.

A social network describes a group of social entities and the pattern of inter-relationships among them. What the relationship means varies, from those of a social nature, such as kinship or friendship among people, to those of a transactional nature, such as trading relationships between countries. Social networks share a common structure in which social entities, named actors, are inter-linked through units of relationship between a pair of actors, known as ties, interlinkages or simply links. By representing actors as vertices and ties as edges, social networks can be represented as a graph.

4.3.1 Social Graph

Social networks can be interpreted as a graph G = (V, E), where the set V of vertices represents actors and the set E of edges represents links between actors. We assume that all edges are undirected and, if not mentioned otherwise, of weight 1. In a social graph, a link between two actors is denoted as ’friendship’ or ’relation’, meaning that both of these actors deliberately accepted this interlinkage.

4.3.2 Ego-Graph

We define an ego-graph to be a graph consisting of a single actor of interest (ego E) together with the actors the ego is connected to (alters, contacts $C_i$) and all the links among those alters. Such graphs are also known as the first-order neighborhood of the ego. Figure 4.1 illustrates a sample ego-graph with the ego E in the center surrounded by its contacts C1 to C10.

[Figure: ego E in the center, surrounded by its contacts C1 to C10]

Figure 4.1: Visualization of a sample ego-graph
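The ego-graph definition above can be illustrated with a minimal sketch. It assumes the social graph is stored as a dictionary mapping each actor to the set of its neighbours; the function name `ego_graph` and the sample data are hypothetical and not taken from the Cluestr implementation.

```python
def ego_graph(friends, ego):
    """Return the ego-graph of `ego` as an adjacency dictionary.

    friends: dict mapping each actor to the set of actors it is linked to
             (undirected, so every link appears in both directions).
    The result contains the ego, its alters, the ego-alter ties and all
    ties among the alters; everything else is dropped.
    """
    nodes = set(friends[ego]) | {ego}
    # Keep only those neighbours that are themselves part of the ego-graph.
    return {v: friends[v] & nodes for v in nodes}


# Illustrative data: X is a friend of C1 but not of the ego E.
friends = {
    "E": {"C1", "C2"},
    "C1": {"E", "C2", "X"},
    "C2": {"E", "C1"},
    "X": {"C1"},
}
graph = ego_graph(friends, "E")
```

In this example `graph` contains E, C1 and C2 with the ties among them, while X and the tie C1-X are left out because X is not directly connected to the ego.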

4.3.3 Metrics

In the field of social network analysis, several measurements are widely accepted throughout the literature. Our recommendation algorithm relies on such measurements. We are going to introduce the most important metrics. If not stated otherwise, we follow the definitions of Brandes et al. introduced in [17].

4.3.3.1 Length of a Path

A path from $s \in V$ to $t \in V$ is defined as an alternating sequence of vertices and edges, beginning with s and ending with t, such that each edge connects its preceding with its succeeding vertex. The length of a path is the sum of the weights of its edges. We use $d_G(s, t)$ to denote the distance between vertices s and t, i.e. the minimum length of any path connecting s and t in G.

4.3.3.2 Centrality Measurement

Several measures capture variations of a vertex’s importance in a graph. Let $\sigma_{st} = \sigma_{ts}$ denote the number of shortest paths from $s \in V$ to $t \in V$. Let $\sigma_{st}(v)$ denote the number of shortest paths from s to t that some $v \in V$ lies on. Centrality indices are an essential measurement for the analysis of social networks. They are designed to rank the actors according to their position in the network and can be interpreted as the prominence of actors embedded in a social structure. Many centrality indices are based on shortest paths, e.g. to measure the average distance from other actors, or the ratio of shortest paths an actor lies on. The following are standard measures of centrality:

Definition 1 (Closeness Centrality) Closeness centrality is based on geodesic distance. It measures how far away a node is from all other nodes. Closeness is preferred in network analysis to mean shortest-path length, as it gives higher values to more central vertices.

\[ C_C(v) = \frac{1}{\sum_{t \in V} d_G(v, t)} \tag{4.1} \]

After Sabidussi, 1966 in [18].

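Definition 2 (Betweenness Centrality) The betweenness of a vertex v, $C_B(v)$, is defined as the sum, over all pairs of vertices s and t, of the fraction of shortest paths between s and t that pass through v. The betweenness of an edge is defined analogously, counting the shortest paths that pass along the edge. A high betweenness means that the vertex or edge acts as a bottleneck between a large number of vertex pairs and suggests that it is connecting different clusters.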

\[ C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}} \tag{4.2} \]

After Freeman, 1977 in [19].
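To make Equations 4.1 and 4.2 concrete, the following sketch computes both measures for an unweighted, connected graph. It is an illustration under stated assumptions, not the thesis implementation: the graph is an adjacency dictionary, all edge weights are 1 (so breadth-first search yields shortest paths), and betweenness is accumulated with the dependency-accumulation scheme of Brandes. All function names are hypothetical.

```python
from collections import deque


def bfs_shortest_paths(adj, s):
    """Breadth-first search from s in an unweighted graph.

    Returns the distances d_G(s, t), the shortest-path counts sigma_st,
    the predecessor lists on shortest paths, and the visiting order.
    """
    dist = {s: 0}
    sigma = {v: 0 for v in adj}
    sigma[s] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:            # w is discovered for the first time
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:   # v precedes w on a shortest path
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def closeness(adj, v):
    """Equation 4.1: inverse of the summed distances from v (connected graph assumed)."""
    dist, _, _, _ = bfs_shortest_paths(adj, v)
    total = sum(dist.values())
    return 1.0 / total if total else 0.0


def betweenness(adj):
    """Equation 4.2, summed over ordered pairs (s, t) with s != v != t."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        _, sigma, preds, order = bfs_shortest_paths(adj, s)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # farthest vertices first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb


# Path graph a - b - c: b lies on the only shortest path between a and c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
centrality = betweenness(path)
```

Because the sum in Equation 4.2 runs over ordered pairs, the pair (a, c) and the pair (c, a) each contribute once, so b obtains a betweenness of 2 in this example.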

4.4 User Recommendation Algorithm

In this section we introduce the concepts behind our recommendation engine. We start by highlighting the idea which then leads to the algorithm.

4.4.1 Idea

One of the key conclusions that can be drawn from the survey is that a user’s contacts can be assigned to various communities and that group communication is often performed with members of such a community or a subset of it. When a new group is initialized, our recommendation engine tries to detect the community the initiator is interested in by analyzing the affiliation of already chosen contacts, and then recommends other contacts with a comparable community affiliation. To achieve this goal, the recommendation engine needs to know

• The different communities the initiator belongs to

• Which of the initiator’s contacts belongs to which communities


Our algorithm deduces this information from the initiator’s ego graph.

Figure 4.2 shows the topology of a real ego-graph retrieved from Facebook. (In this figure, the ego and the ego-alter ties are not displayed for simplicity.) Taking this illustration as an example, ego-graphs exhibit a characteristic structure. Examining their topology reveals that some alters are densely connected and may therefore form an almost complete subgraph. In contrast, other alters possess only sparse linkage among themselves.

Figure 4.2: Visualization of a real ego-graph exhibiting characteristic community structures. In this figure, the ego and the ego-alter ties are not displayed for simplicity.

The existence of such dense and sparse regions can best be explained by taking social characteristics of communities into account. Usually, all members of a community know each other and therefore share a relationship with each other. In the social graph, this is represented by a tie between nodes. On the other hand, members of one community may not know members of other communities which the ego also belongs to. As a consequence, only a few (if any) alters of one community are connected with alters belonging to other communities.

Therefore, by detecting dense regions in the ego-graph, communities and their members can be found.


Such dense regions are often referred to as clusters. We define a cluster in the following way:

Definition 3 (Cluster) A cluster is a subgraph such that the density of edges within it (intra-cluster edges) is larger than the density of edges connecting its vertices to vertices outside the cluster (inter-cluster edges).

In the course of this thesis, we use the term cluster to refer to a set of vertices algorithmically deduced to represent a community. The aim is to have clusters as congruent as possible with real communities.

With this, we present the procedure of our recommendation engine:

1. Detect hidden community structures by finding clusters in an ego-graph

2. Assign contacts to these clusters

3. Rank contacts based on previously selected contacts and their cluster affiliation.

4. Recommend contacts to the initiator

5. Continue with Step 3 after the initiator has selected a next contact to invite

4.4.2 Graph Clustering

As stated in the previous section, a fundamental requirement for our recommendation algorithm is the ability to detect clusters. Cluster affiliation is important for an adequate recommendation. Clustering a graph is a broad research topic and a wide range of algorithms have been developed to do so, including [8, 20, 9, 21, 22].

A well known algorithm for finding community structure in a graph is Girvan and Newman’s (GN) algorithm presented in [9, 22], which is based on the betweenness centrality measure [19]. The algorithm is computationally expensive, with a worst-case time complexity of $O(m^2 n)$ for m edges and n vertices; however, it gives relatively good results. The algorithm works as follows:

1. Calculate betweenness centrality of all edges in a graph

2. Find edge with highest betweenness centrality and remove it

3. Recalculate betweenness centrality for all remaining edges

4. Repeat from step 2 until no edges remain

This is a hierarchical, divisive clustering algorithm. After one or more iterations, removing an edge causes the graph to split into two clusters. As further edges are removed, each cluster again splits, until n singleton clusters remain. In order to find an optimal clustering stage, Newman proposed modularity as an optimization criterion for graph partitioning into clusters [8]. This modularity measurement is introduced in Section 4.4.2.1.
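The four steps of the GN algorithm can be sketched as follows. This is a minimal illustration for small, unweighted graphs without self-loops, assuming an adjacency dictionary of sets; the function names are hypothetical, ties between equally central edges are broken arbitrarily, and no attempt is made to restrict recalculation to the affected component.

```python
from collections import deque


def edge_betweenness(adj):
    """Betweenness of every edge, summed over all ordered vertex pairs."""
    eb = {frozenset((u, w)): 0.0 for u in adj for w in adj[u]}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:                       # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0.0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):          # accumulate path fractions
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += share
                delta[v] += share
    return eb


def components(adj):
    """Connected components of the graph as a list of frozensets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        result.append(frozenset(comp))
    return result


def girvan_newman(adj):
    """Return the successive partitions produced by removing edges."""
    adj = {v: set(ns) for v, ns in adj.items()}    # work on a copy
    levels = [components(adj)]
    while any(adj.values()):                       # until no edges remain
        eb = edge_betweenness(adj)                 # steps 1 and 3
        u, w = max(eb, key=eb.get)                 # step 2: most central edge
        adj[u].discard(w)
        adj[w].discard(u)
        parts = components(adj)
        if len(parts) > len(levels[-1]):           # the graph has split
            levels.append(parts)
    return levels


# Two triangles joined by the bridge 3 - 4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
levels = girvan_newman(graph)
```

In this example the bridge carries all shortest paths between the two triangles and is removed first, so the first split separates {1, 2, 3} from {4, 5, 6}; the final level consists of six singleton clusters.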

The GN algorithm assumes that communities are disjoint, placing each node in one cluster only. However, in social networks, communities often overlap. An alter may belong to multiple groups and therefore should be found in multiple clusters as well. Several algorithms for clustering a graph into overlapping clusters exist [6, 23, 10, 11, 12, 13]. The CONGA algorithm presented by Gregory in [6] is a hierarchical, divisive algorithm, based on Girvan and Newman’s algorithm but extended to allow overlapping clusters.


In addition to the concepts of the GN algorithm, CONGA introduces the split betweenness, with which a decision can be made whether to cut a link between two nodes because they belong to two different clusters, or to split a node and assign it to both clusters. The CONGA algorithm comprises a sequence of steps. Each iteration removes an edge from the network or splits a vertex into two vertices:

1. Calculate edge betweenness of edges and split betweenness of nodes.

2. Remove the edge with maximum edge betweenness or split the node with maximum split betweenness, if greater.

3. Recalculate edge betweenness and split betweenness.

4. Repeat from Step 2 until no edges remain.

Initially, the graph is treated as a single cluster, assuming it is connected. Eventually, Step 2 causes the cluster to split into two clusters. Clusters continue to split until only singleton clusters remain. If a node v splits into multiple nodes which are then distributed between the clusters, the interpretation is that v occurs in all these clusters and therefore belongs to multiple clusters.

As introduced in Section 4.3.3.2, the edge betweenness of an edge e is the number of shortest paths between all pairs of vertices that pass along e. The split betweenness of a vertex v is the number of shortest paths that would pass between the two parts of v if it were split. In general, there are $2^{d(v)-1} - 1$ ways to split v into two, where d(v) is the degree of v. The split that maximizes the split betweenness is chosen as the best split. In [6], an approximate algorithm for calculating the split betweenness of a node is given: a greedy method that takes only d(v) − 2 steps to find a good split. This method is not guaranteed to find the best split. However, it is much more efficient than examining all possible splits and, in practice, usually finds the best split or a close approximation to it.

The GN algorithm has a worst-case time complexity of $O(m^2 n)$, where m is the number of edges and n is the number of vertices. In CONGA, each vertex v splits into an average of up to m/n vertices, so the number of vertices after splitting is $O(m)$ instead of n. This makes the time complexity $O(m^3)$ in the worst case: there are $O(m)$ iterations, and both Step 1 and Step 3 are $O(m^2)$.

In practice, the speed depends heavily on the number of vertices that are split and on how easily the network breaks into separate components. This is because in Step 3, betweenness needs to be calculated only for the component containing the removed edge or split vertex, or for both components if Step 2 caused the component to split.

4.4.2.1 Modularity

In order to find the optimal clustering stage in a hierarchical algorithm, a measurement is required that expresses how good a given partition of a graph into clusters is. A simple approach that has become widely accepted was proposed by Newman in [8]. It is based on the intuitive idea that random networks do not exhibit community structure. Let us imagine that we have an arbitrary network and an arbitrary partition of that network into $n_c$ clusters. It is then possible to define an $n_c \times n_c$ matrix e whose elements $e_{ij}$ represent the fraction of total links starting at a node in partition i and ending at a node in partition j. Then, the sum of any row of e, $a_i = \sum_j e_{ij}$, corresponds to the fraction of links connected to i.


If the network does not exhibit cluster structure, or if the partitions are allocated without any regard to the underlying structure, the expected value of the fraction of links within partitions can be estimated. It is simply the probability that a link begins at a node in i, $a_i$, multiplied by the fraction of links that end at a node in i, which is also $a_i$. So the expected fraction of intra-cluster links is $a_i a_i = a_i^2$. On the other hand, we know that the real fraction of links exclusively within a partition is $e_{ii}$. So we can compare the two values directly and sum over all the partitions in the graph. This measurement is known as modularity.

Definition 4 (Modularity) The modularity of a partition is the amount by which the number of edges between vertices in the same subset exceeds the number predicted by the degree-distribution preserving random graph model of Chung [24].

\[ Q = \sum_i \left( e_{ii} - a_i^2 \right) \tag{4.3} \]
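Equation 4.3 can be evaluated directly from an edge list, as the following sketch shows. It assumes an undirected graph without duplicate edges and a disjoint partition given as a dictionary from node to cluster label; how modularity is evaluated for the overlapping clusters produced by CONGA is not covered here. The function name `modularity` and the sample graph are illustrative.

```python
def modularity(edges, partition):
    """Modularity Q = sum_i (e_ii - a_i^2) of a disjoint partition.

    edges:     list of undirected edges (u, v), each listed once
    partition: dict mapping every node to its cluster label
    """
    m = len(edges)
    e_within = {}   # e_ii: fraction of edges with both ends in cluster i
    a = {}          # a_i: fraction of edge ends attached to cluster i
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        # Every edge has two ends, each worth 1 / (2m) of all edge ends.
        a[cu] = a.get(cu, 0.0) + 1.0 / (2 * m)
        a[cv] = a.get(cv, 0.0) + 1.0 / (2 * m)
        if cu == cv:
            e_within[cu] = e_within.get(cu, 0.0) + 1.0 / m
    return sum(e_within.get(c, 0.0) - a[c] ** 2 for c in a)


# Two triangles joined by one bridge, partitioned into the two triangles.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
by_triangle = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y", 6: "y"}
q = modularity(edges, by_triangle)
```

For this partition each triangle holds 3 of the 7 edges and half of all edge ends, so Q = 2 · (3/7 − 1/4) = 5/14 ≈ 0.357. Placing all nodes in a single cluster gives Q = 0, which reflects that a trivial partition reveals no community structure.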

4.4.3 Recommendation Algorithm

By automatically detecting dense structures in the initiator’s ego-graph, we infer his real-life communities such as classmates, work colleagues, family members etc.

If the initiator selects one member of a community, the chance is high that other members of the same community will also be invited. Therefore, the recommendation engine suggests other members of the same cluster as the previous selection. If a selected contact belongs to more than one cluster, the idea is simply to propose all members of all clusters the selected contact belongs to, since further information about the initiator’s preference is not known. In addition, contacts sharing more than one cluster with the selected ones should receive additional promotion. This produces a ranking, from which we can provide a recommendation to the initiator. The ranking is done by giving each contact one point for every cluster he shares with previously selected contacts.

After the initiator has selected the next participant, the remaining nodes are ranked again according to the same criteria. Iterating this step after each selection strengthens the proposal of members belonging to a cluster from which the initiator has already selected participants. This leads to a more accurate recommendation after each iteration. Of course, at the beginning, when no contact has been selected, no recommendation can be given. Then, contacts are presented in another form, e.g. in alphabetic order.
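The ranking step can be sketched as below. The text leaves open how exactly points accumulate over several selections; this sketch assumes one reading, namely that a contact receives one point for each pair of (cluster, selected contact) it shares, so that every further selection from a cluster strengthens the remaining members of that cluster. The function name `rank_contacts` and the sample clusters are hypothetical.

```python
def rank_contacts(clusters, selected):
    """Rank not-yet-selected contacts by shared cluster membership.

    clusters: list of sets of contacts; sets may overlap
    selected: contacts the initiator has already invited
    Returns (contact, points) pairs, highest score first; contacts that
    share no cluster with any selected contact are not listed.
    """
    selected = set(selected)
    points = {}
    for cluster in clusters:
        hits = len(cluster & selected)    # selected contacts in this cluster
        if hits == 0:
            continue
        for contact in cluster - selected:
            points[contact] = points.get(contact, 0) + hits
    # Sort by descending score, then by name for a stable presentation.
    return sorted(points.items(), key=lambda item: (-item[1], item[0]))


# Illustrative clusters matching the example in Section 4.1.
clusters = [{"A", "B", "E"}, {"A", "D"}, {"C", "F"}]
after_first = rank_contacts(clusters, {"A"})
after_second = rank_contacts(clusters, {"A", "E"})
```

After A is invited, B, D and E are proposed with equal rank while C is not proposed. After E is invited as well, B moves ahead of D because it shares a cluster with both selected contacts.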

4.5 Evaluation

In this section we evaluate our recommendation engine. We decided to do this in multiple steps to evaluate different aspects consecutively.

The first evaluation investigates the performance of clustering a social graph. Here we evaluate whether the clustering algorithm is able to reveal real community structures in ego-graphs. In the next stage, we evaluate the recommendation engine and verify whether it brings any advantage when initiating groups. In the last part, we study the stability of our recommendation engine. In particular, we investigate whether the recommendation is accurate even if a user’s social graph is a small subset of his real-life social network.


4.5.1 Clustering Accuracy

A common way to evaluate graph clustering algorithms is to use networks with a known community structure and to compare the known communities with the clusters found by the algorithm. The comparison can be done in various ways, including the Mutual Information measure [25] and the Rand Index [26]. In [6], Gregory used the F-measure to evaluate the CONGA algorithm.

Definition 5 (F-measure) The F-measure is defined as the harmonic mean of recall and precision:

\[ F = 2 \cdot \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{4.4} \]

where recall and precision are defined as follows:

Definition 6 (Recall) The number of vertices in a cluster that also belong to the same community divided by the total number of vertices in the community.

Definition 7 (Precision) The number of vertices in a cluster that also belong to the same community divided by the total number of vertices in the same cluster.

Figure 4.3 illustrates recall, precision and the F-measure by giving a simple example.

[Figure: cluster i = {A, B, D, F} and community j = {A, B, C, D, E}, overlapping in {A, B, D}]

Recall_ij = |{A, B, D}| / |{A, B, C, D, E}| = 3/5
Precision_ij = |{A, B, D}| / |{A, B, D, F}| = 3/4
F-measure_ij = 2 · (3/4 · 3/5) / (3/4 + 3/5) = 2/3

Figure 4.3: Visualization of recall, precision and the F-measure

An in-depth evaluation of CONGA is given in [6] and [23]. We, however, are especially interested in the performance on social graphs and in particular on ego-graphs.

Evaluating an algorithm on real social graphs is challenging. We are not able to characterize the performance of the algorithm without knowledge of the correct community structure, the ground truth.

Therefore, for a proper analysis, we require information not only about the social graph but also about existing communities. This information can only be retrieved by questioning the subjects. We present an experiment on real social networking data with a known community structure based on subject questioning, in order to evaluate the clustering performance and the accuracy of our recommendation algorithm.

4.5.1.1 Facebook Dataset

For our analysis, we require a realistic and meaningful data set. We decided to focus on Facebook data for several reasons:


• Facebook is considered to be one of the biggest social networking services. Many users on Facebook have been registered for several months or even years. Therefore, the social graph is well established and stable. This means that registered people have already found and connected to most of their friends who are also registered on Facebook.

• We assume that the overall structure and especially the clustering characteristics of Facebook ego-graphs are comparable to those of the address book in people’s mobile phones.

We wrote a Facebook application that retrieves a user’s ego-graph and stores it in a database. For this, we fetch a user’s friend list and store the ID, first name and last name of each friend. In a second step, we check for every pair of these friends whether a friendship connection exists between them. This provides us with the information required to build the ego-graph of a user. In addition, the users were interviewed and asked to group their friends into communities and name them. This data forms the ground truth against which we can evaluate the performance of our algorithm.

A total of four subjects participated in the experiment, each having a Facebook account with between 59 and 151 friends. The ego-graph of each subject was retrieved. In addition, the subjects were asked to manually assign their Facebook friends to an arbitrary number of communities. This classification represents the ground truth against which the clustering algorithm is measured.

4.5.1.2 Clustering Accuracy

In an experiment, we investigate the accuracy and applicability of clustering algorithms for partitioning an ego-graph into communities. For the evaluation, we compared the clustering result of CONGA to the manual grouping. The F-measure was chosen as a meaningful measurement. The number of clusters found by CONGA might differ from the number of communities defined by the subject. Therefore, a direct one-to-one mapping is not necessarily possible.

The following procedure has been applied to calculate the F-measure: Given are n communities ($O_1 \dots O_n$) identified by the subject and m clusters ($C_1 \dots C_m$) found by the algorithm. Each $C_i$ is compared to all $O_j$. Nodes that show up in both $C_i$ and $O_j$ are counted. This gives a similarity measurement with which recall, precision and F-measure can be calculated according to Equations 4.5 and 4.6.

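\[ \text{recall}_{C_i} = \frac{\#\,\text{similarity pairs in } O_j \text{ and } C_i}{\#\,\text{nodes in } O_j} \tag{4.5} \]

\[ \text{precision}_{C_i} = \frac{\#\,\text{similarity pairs in } O_j \text{ and } C_i}{\#\,\text{nodes in } C_i} \tag{4.6} \]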

The $O_j$ which maximizes the F-measure is chosen as the representative community for cluster $C_i$. This is done for all clusters $C_i$. Averaging yields a global precision and recall value for the clustering performance. Table 4.1 shows the accuracy of clustering our subjects’ ego-graphs at maximized modularity.
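The matching procedure can be sketched as follows, with clusters and communities represented as sets of node identifiers. The function names are illustrative. The text does not state whether the global F-measure is the average of the per-cluster values or is derived from the averaged recall and precision; this sketch assumes the former and averages all three values over the clusters.

```python
def recall_precision_f(cluster, community):
    """Recall, precision and F-measure of one cluster against one community."""
    common = len(cluster & community)     # nodes showing up in both sets
    if common == 0:
        return 0.0, 0.0, 0.0
    recall = common / len(community)      # Equation 4.5
    precision = common / len(cluster)     # Equation 4.6
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, f_measure


def evaluate_clustering(clusters, communities):
    """Average recall, precision and F-measure over all clusters.

    Each cluster is represented by the community that maximizes its
    F-measure; several clusters may map to the same community.
    """
    best = [
        max((recall_precision_f(c, o) for o in communities),
            key=lambda scores: scores[2])
        for c in clusters
    ]
    return tuple(sum(scores[k] for scores in best) / len(best) for k in range(3))


# The example of Figure 4.3.
cluster_i = {"A", "B", "D", "F"}
community_j = {"A", "B", "C", "D", "E"}
recall, precision, f_measure = recall_precision_f(cluster_i, community_j)
overall = evaluate_clustering([cluster_i], [community_j, {"F", "G"}])
```

For the example of Figure 4.3 this yields a recall of 3/5, a precision of 3/4 and an F-measure of 2/3. In the second call, community j is chosen as the representative of cluster i because it gives a higher F-measure than {F, G}.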

Subject     Recall   Precision   F-measure
Subject 1   0.94     0.96        0.95
Subject 2   0.78     0.87        0.82
Subject 3   0.80     0.79        0.80
Subject 4   0.78     0.67        0.72
Average     0.83     0.82        0.82

Table 4.1: Clustering accuracy (recall, precision and F-measure) for each subject by comparing the algorithmically clustered ego-graph to manually identified communities.

The average F-measure value of 0.82 is promising. Still, we do not know whether modularity is an appropriate optimization criterion for finding an adequate graph partitioning. To investigate whether modularity maximization also maximizes the F-measure, we clustered each ego-graph until singleton clusters remained. At each step, recall, precision and the resulting F-measure were calculated. A plot of clustering stages 4 to 50 is given in Figure 4.4. The vertical line indicates the clustering stage at which maximal modularity is achieved. Except for Subject 4, clustering at maximal modularity yielded the number of clusters that maximizes the F-measure. For Subject 4, it differs by only one cluster.

[Figure: four panels (Subject 1 to Subject 4), each plotting recall, precision and F-measure (vertical axis, 0 to 1) against the number of clusters (horizontal axis, 0 to 50)]

Figure 4.4: Comparison of the accuracy of clustering the subject’s ego-graph at different stages. The vertical line indicates the clustering stage with maximal modularity.

The number of clusters detected by the algorithm might differ from the number of communities identified by a subject. To gain insight into why this happened in our experiment, we asked the subjects to name their communities and also to give names to the clusters recognized by the algorithm. We are now able to compare the two outcomes and give a qualitative interpretation. Basically, two effects caused the number of clusters to differ:


• Two independent groups of friends without interlinkage that a subject combined into one community are detected as different clusters by the algorithm. To give an example, one subject put all friends she got to know during her internship into one community. However, this community was recognized as two clusters, one containing the flatmates she was living with and the other containing co-workers. There were no ties connecting these two groups.

• Two communities were discovered as only one cluster when the interlinkage between friends was too high to separate them. This may happen when one community is a subset of another one. To give an example, a subject put her friends from university into one group and defined a second community with friends she knows from a student organization. Members of this organization, however, go to the same university and have friendships with other students not in the organization. This made it impossible to separate the two communities, and only one cluster was detected.

            Communities   Clusters (max Mod)   Clusters (max F-measure)
Subject 1   7             7                    7
Subject 2   7             5                    5
Subject 3   4             5                    5
Subject 4   7             5                    4

Table 4.2: Comparison of the number of clusters identified by subject questioning and algorithmically.
Communities: Number of communities identified by a subject.
Clusters (max Mod): Number of clusters found at maximal modularity.
Clusters (max F-measure): Number of clusters that resulted at highest F-measure.

Table 4.2 summarizes the results of our experiment. The first column gives the number of communities identified by a subject. The second column gives the number of clusters found at maximal modularity. The third column gives the number of clusters at maximal F-measure. The partition at maximal F-measure yields the best possible result in a clustered graph using CONGA. The fact that the number of clusters at maximal modularity does not deviate much from this optimal cut convinced us that modularity is an appropriate optimization criterion.

The F-measure values achieved by automated clustering (see Table 4.1) are promising, especially when the two previously described causes for non-congruency are taken into account. Therefore, CONGA is suitable for detecting communities in an ego-graph.

4.5.1.3 Performance of the Recommendation Engine

In the previous section, we showed that clustering an ego-graph using CONGA and maximized modularity results in an adequate estimation of communities and their affiliated contacts. In this section, we evaluate whether recommendation based on graph clustering is able to improve the group initialization process, that is, we analyze the performance of our recommendation engine. For this, we asked our subjects to establish groups using Cluestr and invite friends to participate. Three different groups had to be established according to different tasks:


1. “You would like to invite all members of a community of your choice for a BBQ.”

2. “Invite some of your friends from one community of your choice to watch a movie at your home.”

3. “Invite some of your friends for a weekend trip in the mountains.”

Each of these tasks poses an increasing challenge to the recommendation engine. The first group consists of all members of one community. In the second group, only a part of a community’s members should be invited. The last group includes friends regardless of their community affiliation.

This experiment was performed on a mobile device¹ using the Cluestr platform described in Chapter 6. The subjects’ ego-graphs from Facebook were used as the data set for this experiment.

Each task had to be solved following the same procedure: In a first step, the subjects were asked to write down on paper all participants they wished to invite. Then, these participants had to be invited using Cluestr. Each subject had to perform the invitation procedure on the device three times: once from an alphabetic list of his/her friends, once using the cluster view, where friends are grouped according to the detected clusters, and once by choosing the participants based on recommendation. During the experiment, the time required to initiate a group was measured.

Figure 4.5(a) shows the average time required to select a participant for each case and each selection mode. In Figure 4.5(b), the mean values of all subjects combined are plotted.

We conclude from the results that the recommendation algorithm performs better the more community-centric the selection of friends is. The more randomized the selected contacts are (in terms of community affiliation), the more challenging it gets for the recommendation engine to come up with adequate suggestions. In a completely randomized selection, as evaluated with Case 3, recommendation performance breaks down.

During our experiment, whenever the recommendation was inappropriate, the subject could switch to either the alphabetic or the clustered list to select the next participant. The less community-focused the group was, the more frequently a subject had to switch to these selection modes in order to find contacts. However, the outcome of our survey indicated that most groups are established with contacts that form a community in real life. Therefore, Case 1 and Case 2 are more realistic.

These results show that both listing contacts according to cluster affiliation and recommendation-based selection save time. Contacts can be found faster and in a more convenient way. In our experiment, the average selection time per user in Case 1 was cut roughly in half, from 12.2s to 6s, by using our recommendation engine. In cases where only a part of the community is selected (Case 2), a reduction of 23% from 11.5s to 8.9s could be achieved. Besides the improvement in time, the subjects also mentioned that the recommendation itself helps them not to forget any participants when inviting.
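As a quick check of the quoted savings, the relative reductions follow directly from the measured averages given above. This is only a worked calculation on those four numbers, not an additional result:

```latex
\[
\frac{12.2\,\mathrm{s} - 6\,\mathrm{s}}{12.2\,\mathrm{s}} \approx 0.51,
\qquad
\frac{11.5\,\mathrm{s} - 8.9\,\mathrm{s}}{11.5\,\mathrm{s}} \approx 0.23 .
\]
```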

¹ HTC Touch Cruiser


[Figure 4.5(a): four panels (Subject 1 to Subject 4). X-axis: Case (1 to 3). Y-axis: t/contact [s]. Series: Alphabetic, Clustering, Recommendation.]

(a) Time required to initiate a group by each subject for each case.

[Figure 4.5(b): one panel (Average). X-axis: Case (1 to 3). Y-axis: t/contact [s].]

(b) Average time required to initiate a group for each case.

Figure 4.5: Time measurement of the group initialization experiment with 4 involved subjects. A total of three different experiments (Case 1 - 3) were run, each of which included three different selection modes: choosing contacts from alphabetic list, from clustered list and based on recommendation.

4.5.2 Clustering Stability

The ego-graphs retrieved from Facebook are considered to be well established and therefore stable. This means that people who are registered on Facebook are already connected with most of the users they know and want to be connected to and that, for a given ego, a significant number of friends are registered on Facebook. We do not expect to see rapid changes of the network structure for one user. This is not the case for a new SNS: when a new social service is established, only a few people will be registered in the beginning.


Our recommendation engine, however, should not rely on the completeness of the social graph. It should yield good results in any state of the network.

In an experiment, we investigated our recommendation algorithm’s performance on degenerated networks. This simulates an underdeveloped social graph comparable to the topological structure new social services have to deal with. Two cases were investigated:

• Missing friendships: We investigate the effect of missing ties between friends and how many ties may be missing until clustering becomes inaccurate. For this, we took the subjects’ ego-graphs, randomly removed links and clustered the resulting degenerated graph. The percentage of removed links was increased from 0% to a total of 90% in 10% steps. Each step was evaluated 30 times. Mean and variance of the number of detected clusters were recorded.

• Missing friends: This experiment is similar to the previous one, but instead of removing links, nodes were randomly removed.
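The degeneration procedure of both cases can be sketched as follows. This is a minimal illustration, not the code used in the thesis: all function names are hypothetical, and because CONGA is not reproduced here, the number of connected components serves as a stand-in for the number of detected clusters.

```python
import random
import statistics


def remove_random_links(edges, fraction, rng):
    """Return the links that remain after removing `fraction` of them at random."""
    edges = list(edges)
    keep = len(edges) - round(fraction * len(edges))
    return rng.sample(edges, keep)


def remove_random_nodes(nodes, edges, fraction, rng):
    """Remove `fraction` of the nodes at random together with all their links."""
    nodes = list(nodes)
    keep = set(rng.sample(nodes, len(nodes) - round(fraction * len(nodes))))
    return keep, [(u, v) for u, v in edges if u in keep and v in keep]


def count_components(nodes, edges):
    """Stand-in for the cluster count (assumption): connected components."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(n) for n in nodes})


def degrade_links(nodes, edges, trials=30, seed=0):
    """Mean and variance of the cluster count for 0% to 90% removed links."""
    rng = random.Random(seed)
    results = {}
    for percent in range(0, 100, 10):
        counts = [
            count_components(nodes, remove_random_links(edges, percent / 100, rng))
            for _ in range(trials)
        ]
        results[percent] = (statistics.mean(counts), statistics.pvariance(counts))
    return results
```

Replacing `count_components` with an actual clustering step (for example CONGA at maximal modularity) would yield the kind of statistics plotted in Figure 4.7(a); the node-removal case follows the same loop with `remove_random_nodes`.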

[Figure 4.6: Cluster 1 splits into Cluster 1.1 and Cluster 1.2.]

Figure 4.6: Clusters fall apart by randomly removing links

In both cases, we are interested in whether clusters can still be detected and whether the F-measure remains stable or decays.

To analyze the influence of missing ties, Figure 4.7(a) plots mean and variance of the number of detected clusters at each step. Figure 4.7(b) shows recall, precision and the F-measure compared to clustering the complete network at maximal modularity.

Figure 4.7(a) illustrates that with an increasing number of missing links, the number of clusters increases. This effect is best explained with Figure 4.6: densely connected nodes which previously formed a cluster lose connectivity among each other if ties are removed. This causes a cluster to fall apart into smaller pieces. This also explains why the precision of the newly generated clusters (and therefore the global value) remains high. Nodes do not get assigned to wrong clusters; they still remain together with other nodes of the same community. The recall, however, decreases rapidly, since the split of a cluster has the effect that not all nodes of a community are present in one cluster anymore.
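The argument can be made concrete with a small example. The sketch below assumes a simple per-cluster definition (precision is the share of cluster members that belong to the community, recall is the share of community members found in the cluster); the measure used in the evaluation may aggregate over clusters differently, and the function name is hypothetical.

```python
def precision_recall(cluster, community):
    """Precision and recall of one detected cluster against one community."""
    cluster, community = set(cluster), set(community)
    hits = len(cluster & community)
    return hits / len(cluster), hits / len(community)


community = {"a", "b", "c", "d", "e", "f"}

# Complete graph: the whole community is found as one cluster.
print(precision_recall({"a", "b", "c", "d", "e", "f"}, community))  # (1.0, 1.0)

# After removing ties, the cluster splits into two halves (cf. Figure 4.6).
print(precision_recall({"a", "b", "c"}, community))  # (1.0, 0.5)
```

Each half still contains only members of the community, so precision stays at 1, while recall drops because half of the community is now in a different cluster.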

The second experiment investigates another issue a social network service has to deal with: the “bootstrap problem”. When a new social service is launched, not many users might be subscribed to it. Nevertheless, the service has to be usable in order to attract new users. Automatic recommendation strongly relies on the social graph topology, which only develops when new users subscribe to the service.


[Figure 4.7(a): four panels (Subject 1 to Subject 4). X-axis: % of removed links (0 to 100). Y-axis: Number of clusters (0 to 20).]

(a) Mean value and variance of number of detected clusters at each stage

[Figure 4.7(b): four panels (Subject 1 to Subject 4). X-axis: % of removed links (0 to 100). Y-axis: 0 to 1. Curves: Recall, Precision, F-measure.]

(b) Recall, precision and F-measure at each stage

Figure 4.7: Effect of clustering a degenerated ego-graph with missing friendship links. Links were removed randomly. Each stage was evaluated 30 times.


The difference to the previous experiment is that now nodes are removed from the graph instead of links. This simulates that not all of an initiator’s real-life contacts are subscribed to the service.

The same measurements as in the first experiment were applied and evaluated. Figure 4.8(a) shows that the more nodes are missing, the fewer clusters are found. Figure 4.8(b) shows precision, recall and F-measure values. The F-measure is high regardless of the number of removed nodes. Although the number of clusters decreases, the high precision value indicates an accurate clustering. The number of estimated clusters decays because clusters dissolve once their nodes are no longer present.

Missing nodes thus do not significantly compromise the algorithm’s performance, and recommendation based on an incomplete graph is still adequate.

We conclude that only a small subset of a user’s real-life ego graph is required to estimate the community affiliation of these contacts. Therefore, even with a small user base, Cluestr is able to provide acceptable recommendations. This helps to overcome the bootstrap problem. The disadvantage is that if a contact is not registered to the service, he/she will never be recommended to the initiator. This person might therefore easily be forgotten.

4.6 Summary

Browsing through a mobile phone’s contact list to find a friend is a tedious undertaking. When many contacts have to be found consecutively, this repetitive task demands considerable effort from the user.

In this chapter we introduced a recommendation engine able to recommend contacts based on previously selected participants.

The idea behind our approach is based on the outcome of our survey: A person is able to group his/her contacts into communities, and group communication is often performed with members of such a community.

Our recommendation engine is able to detect these communities by clustering a person’s ego-graph. When a new group is established and contacts are invited, the algorithm tries to detect the preferences of the initiator and then recommends suitable contacts to the initiator.

We evaluated our recommendation engine using real data and subject questioning for verification. The results are promising. The most important conclusions are:

• The structure of a person’s ego-graph indicates communities through regions with dense interlinkage between nodes. Existing clustering algorithms such as CONGA are able to extract these community structures accurately. Modularity is an adequate optimization criterion.

• Using knowledge about community affiliation to recommend contacts based on previous selections enhances the initialization process by reducing the time required to select participants. In addition, the recommendation engine is able to present contacts to the initiator that he/she might not have thought of otherwise.

• Our experiments on degenerated graphs showed that even with only a small subset of an initiator’s ego-graph, enough information is present for the recommendation algorithm to come up with adequate suggestions.


[Figure 4.8(a): four panels (Subject 1 to Subject 4). X-axis: % of removed nodes (0 to 100). Y-axis: Number of clusters (0 to 10).]

(a) Mean value and variance of number of detected clusters at each stage

[Figure 4.8(b): four panels (Subject 1 to Subject 4). X-axis: % of removed nodes (0 to 100). Y-axis: 0 to 1.]

(b) Recall, precision and F-measure at each stage

Figure 4.8: Effect of clustering a degenerated ego-graph with missing contacts. Nodes were removed randomly. Each stage was evaluated 30 times.


Taking all this into account, we believe that a recommendation engine as presented in this section brings value to the user.

A challenge that comes with our approach is that information about a user’s ego-graph is required. The whole social network has to be established by the users: friendships have to be requested and confirmed. The simplest approach is to store this information centrally on a server, which sends the required topology and clustering information back to the client upon request.

In the next chapter, we go one step further and explore whether it is possible to determine a person’s ego graph based on his/her communication behavior. If we are able to find patterns indicating social structures between contacts solely by observing the ego’s communication stream, we could offer a distributed method able to determine the community affiliation of contacts on each device independently. The big advantage would be that the bootstrapping problem is overcome: manually establishing the social network by requesting interlinkage would no longer be required, and every contact could be recommended regardless of whether this contact is registered to the service or not.

Chapter 5

Ego Graph Retrieval based on Communication Behavior

5.1 Motivation

The recommendation algorithm proposed in the previous chapter is based on graph clustering. A requirement for its applicability is knowledge of a person’s social structures in the form of an ego-graph. Two drawbacks arise from this approach:

• A centralized infrastructure is required to maintain the social graph. A decentralized approach, however, could result in less data traffic and in a more scalable infrastructure. All computing could be shifted to the client device.

• The social network consists of users connected through ties. To build the social graph, users have to be registered and they have to request and confirm interlinkage manually. A person who is not registered will be ignored by the system, and if two persons did not deliberately establish a tie, they are considered as not knowing each other. Obviously, some user effort is required to manage the social network. Otherwise, it may not be congruent with the real social structures.

In this chapter, we investigate whether a fully decentralized approach is feasible in which social structures are deduced on every mobile phone independently. While it is fairly simple to obtain information about a user’s contacts, knowledge about the relationships among them is not directly evident. Our approach observes an ego’s communication stream and tries to detect significant patterns which provide indications of relationships among a user’s contacts. It does not rely on any semantic information contained in the communication.

The next section presents related work in the field of mining social networks. We then introduce our approach for estimating a user’s ego-graph, followed by an analysis of the performance and an evaluation of optimization criteria.


5.2 Related Work

5.2.1 Mining Social Network

Several criteria have been used to infer ties between actors, for example self-report, similarity, co-occurrence and communication.

Self-reporting uses only links reported by individual actors. Such links are subjective; there can be cases where a claimed tie is not mutual. Classical tools like questionnaires and interviews are based on this principle [28]. A phone’s address book can be seen as such a list: mutuality is not required and often not given. A similar idea is also present in the buddy list features of instant messaging systems [29].

Similarity has its foundation in the sociological idea that friends tend to be alike [30]. This leads to the assumption that the more people have in common, the likelier it is that they are related. For example, homepages with similar textual content and links may represent a group of related individuals [31]. Other forms of similarity include having the same communication patterns and sharing the same opinions or areas of interest. Comparing user profiles in SNS is a widely used approach for similarity identification.

Co-occurrence assumes that if several entities occur together more frequently than expected by random chance alone, they may be associated.

The work on connection subgraphs presented in [32] uses a huge network whose ties identify pairs of people whose names are frequently mentioned together on the same web pages. Co-authorship networks, as a second example, relate people who co-author the same publications [3, 33].

Communication, defined generally as the transfer of information or resources, is common among socially related people. Conversely, evidence of communication may indicate association. Examples where communication can be traced include emails [34], newsgroups [7], instant messaging [29] or mobile phones [15].

Our approach only observes an ego’s communication stream. It completely disregards semantic analysis of the content. We therefore have to focus on behavior patterns to deduce an ego-graph.

Several kinds of sensory data and their fusion can be analyzed on mobile phones and be considered for finding behavior patterns. Spatial as well as communication events in relation to time are examples of such sources of information.

5.2.2 Mining Spatio-Temporal Co-Occurrences

Mining spatio-temporal co-occurrences deals with tuples that have both a space and a time component. Lauw et al. [35] argue that individuals who are frequently found together at the same location at the same time are likely to be associated with each other. They estimate the social graph from data instances where several actors co-occur in space and time, presumably due to an underlying interaction. Group pattern mining [36] goes in a similar direction, arguing that people who are consistently moving together may belong to a group.

We do not have spatial information in our communication stream. Therefore, this approach cannot be followed. We have to focus on communication events only.


5.2.3 Mining Temporal Co-Occurrences in Communication

Temporal co-occurrence investigates events in time. Every tuple 〈t, i〉 is an item i occurring at time t, and subsets are derived using the time component of the tuples. In the simplest case, two tuples 〈t1, i1〉 and 〈t2, i2〉 support a co-occurrence pair {i1, i2} if |t1 − t2| ≤ T for a given interval bound T. Baums et al. [15] use temporal co-occurrences in a stream of communication to detect hidden communities. They evaluate the frequency of co-occurrence pairs and estimate ties.

Instead of a sliding window, a discretized timeline could be used: two events co-occur if they fall into the same timeframe.
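The two variants can be contrasted in a few lines. The sketch below is an illustration under stated assumptions: the function names are hypothetical, events are (t, item) tuples with numeric times, and pairs of events with the same item are skipped.

```python
from itertools import combinations


def cooccurrence_pairs(events, bound):
    """Sliding-window variant: pairs {i1, i2} with |t1 - t2| <= bound.

    Every qualifying pair of events contributes one entry to the result.
    """
    pairs = []
    for (t1, i1), (t2, i2) in combinations(sorted(events), 2):
        if i1 != i2 and abs(t1 - t2) <= bound:
            pairs.append(frozenset((i1, i2)))
    return pairs


def same_timeframe_pairs(events, frame):
    """Discretized variant: two events co-occur if they fall into the same bin."""
    pairs = []
    for (t1, i1), (t2, i2) in combinations(sorted(events), 2):
        if i1 != i2 and t1 // frame == t2 // frame:
            pairs.append(frozenset((i1, i2)))
    return pairs


events = [(0, "A"), (50, "B"), (70, "C"), (200, "A")]
print(cooccurrence_pairs(events, 60))    # pairs {A, B} and {B, C}
print(same_timeframe_pairs(events, 60))  # pair {A, B} only
```

The example shows the main difference: with a discretized timeline, two events that are close in time but fall on different sides of a bin boundary (here B at 50 and C at 70) are not paired.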

5.3 Problem Statement

We are interested in how accurately the interlinkage of two contacts can be predicted by detecting significant patterns in a stream of communication on the ego’s device, without having insight into the direct communication between these contacts.

5.3.1 Estimating Contact Interlinkage

All approaches presented in Section 5.2 have in common that they are able to observe, analyze and mine the entire network. In our case of a decentralized system with independent devices, this is not possible. We do not have a full view of the communication flow in the whole network; only an ego-centric perspective is available.

The aim of our algorithm is to deduce whether contact Ci knows contact Cj (and therefore has a tie in the social graph) based on the communication streams E ↔ Ci and E ↔ Cj.

5.3.2 Communication Stream

A communication stream is a set of tuples of the form

〈SenderID, ReceiverID, t, msg〉 , (5.1)

where SenderID sends the message msg to ReceiverID at time t. For our analysis, we do not take any semantic information of msg into account.

In our communication stream, each tuple includes the ego and one contact. We define the communication stream Si of contact Ci as the combination of all tuples where contact Ci is either sender or receiver.
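One possible in-memory representation of the tuples in (5.1) and of the per-contact stream Si is sketched below. The type and function names are hypothetical, and the time unit is an assumption; the msg field is carried along but never inspected.

```python
from typing import NamedTuple


class Event(NamedTuple):
    sender: str    # SenderID
    receiver: str  # ReceiverID
    t: float       # time of the event (unit is an assumption, e.g. seconds)
    msg: str       # payload, never inspected by the analysis


def contact_stream(stream, ego, contact):
    """S_i: all events exchanged between the ego and one contact, ordered by time."""
    return sorted(
        (e for e in stream if {e.sender, e.receiver} == {ego, contact}),
        key=lambda e: e.t,
    )
```

For example, `contact_stream(stream, "E", "C1")` keeps the events in which C1 is either sender or receiver and drops those involving other contacts.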

5.4 Temporal Co-Occurrence Algorithm

Our algorithm observes the communication streams Si and Sj to deduce the interlinkage of Ci and Cj. We follow a statistical approach in which we detect temporal communication patterns in the form of temporal co-occurrence.

5.4.1 Temporal Co-Occurrence

Group communication exhibits characteristic communication patterns. From an ego-centric perspective, two patterns may indicate group communication:

• Broadcasting appears when the ego sends a message to multiple contacts simultaneously. (In the case of voice calls, broadcasting corresponds to calling different contacts consecutively within a short timeframe.)

• Relaying occurs when a contact sends a message to the ego, which then triggers the ego to send a message to other contacts.

In both of these cases, two (or more) communication events take place within a short timeframe with two (or more) involved contacts.

We define timeframe durations and introduce the concept of temporal communication co-occurrence as follows:

Definition 8 (Timeframe Duration) In the course of this thesis, we refer to a short timeframe when the duration lies between seconds and one hour, and to a long timeframe when the duration exceeds one week.

Definition 9 (Temporal Communication Co-Occurrence) We speak of a temporal communication co-occurrence {Ci, Cj} of contacts Ci and Cj if a communication event E ↔ Ci and another event E ↔ Cj fall into the same timeframe T.

Figure 5.1 illustrates a sample communication stream with the corresponding co-occurrence statistics underneath.

[Figure 5.1: timeline t with communication events between E and the contacts C1 to C4 and a marked timeframe T. Co-occurrence pairs: {C1,C2}, {C3,C4}, {C3,C1}, {C1,C2}, {C2,C3}. Co-occurrence statistics: {C1,C2}: 2, {C2,C3}: 1, {C3,C4}: 1.]

Figure 5.1: Sample communication stream and dedicated co-occurrence pairs. The frequency of each pair is calculated to retrieve the co-occurrence statistics.

Such a temporal communication co-occurrence can be interpreted as an indicator of either a broadcasting or a relaying pattern and therefore of group communication between E, Ci and Cj. A co-occurrence pair {Ci, Cj} becomes statistically significant when its frequency of appearance exceeds the frequency expected from random background communication. Such a significant co-occurrence pair is considered to imply a hidden tie between Ci and Cj. We investigate whether a person’s ego-graph can be estimated by analyzing temporal communication co-occurrence over a long time.
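Building the co-occurrence statistics from an ego’s stream and turning them into estimated ties could look as follows. This is a sketch under assumptions that are not taken from the thesis: the function names are hypothetical, a pair is counted at most once per timeframe, and a fixed frequency threshold (compared with >=) stands in for the significance test against background communication.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_statistics(events, frame):
    """Count, per contact pair, the timeframes in which both contacts appear.

    `events` holds (t, contact) tuples of the ego's communication stream and
    `frame` is the timeframe duration T in the same unit as t.
    """
    contacts_per_frame = {}
    for t, contact in events:
        contacts_per_frame.setdefault(int(t // frame), set()).add(contact)

    stats = Counter()
    for contacts in contacts_per_frame.values():
        for pair in combinations(sorted(contacts), 2):
            stats[pair] += 1
    return stats


def estimated_ties(stats, threshold):
    """Pairs whose co-occurrence frequency reaches the (assumed) threshold."""
    return {pair for pair, count in stats.items() if count >= threshold}


# Times in minutes, timeframe T = 60 minutes.
events = [(5, "C1"), (20, "C2"), (70, "C1"), (95, "C2"), (130, "C3"), (150, "C2")]
stats = cooccurrence_statistics(events, 60)
print(stats[("C1", "C2")], stats[("C2", "C3")])  # 2 1
print(estimated_ties(stats, 2))                  # {('C1', 'C2')}
```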

Taking the frequency of communication co-occurrence pairs as the decision criterion for interlinkage between contacts raises the following problem: the frequency of a co-occurrence pair depends on the frequency of the communication events of each of the two contacts. If E communicates frequently with Ci, the probability increases that a communication event occurs in the same timeframe as an event of a completely unrelated contact. This leads to statistics which could cause the detection of wrong ties. Conversely, if a contact rarely communicates with the ego, its co-occurrences with other contacts may never gain significance and the corresponding ties will remain undiscovered. Therefore, normalization is required to overcome these problems.

In the next section we introduce a weighting function which is applied to lower the influence of frequent and rare communication behavior of contacts.

5.4.2 Correlation Weighting

By evaluating the frequency of co-occurrences in a communication stream, the focus lies on analyzing short-time communication behavior only. Long-term patterns are disregarded completely. We investigate whether the overall temporal communication behavior of an ego with contacts belonging to the same community exhibits characteristic similarities.

The communication stream Si between contact Ci and E can be interpreted as a time-discrete signal with value 1 if a communication took place within the timeframe ti and 0 otherwise. This signal representation characterizes the communication behavior between Ci and E. We assume that two contacts sharing a hidden tie exhibit similar communication behavior, so that the signal representations of Si and Sj correlate better than two random signals.

The correlation coefficient R(Si, Sj) is the zeroth lag of the normalized cross-covariance function C(Si, Sj). The cross-covariance is a measure of similarity of two signals. We weight the co-occurrence frequency of a contact pair with their correlation coefficient. By doing so, the influence of frequent communication with a contact is damped, and if communication with one contact occurs rarely, the weighted co-occurrence statistics might still be significant if the contact shares a similar communication pattern with another contact. With this approach we address both issues: frequent and rare communication behavior become less influential.
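A minimal sketch of this weighting is given below. It assumes that the zeroth lag of the normalized cross-covariance is computed as the usual sample correlation coefficient of the two binary signals and that the weight is applied by plain multiplication; all function names are hypothetical.

```python
import math


def binary_signal(times, frame, n_frames):
    """1 for every timeframe in which at least one event took place, else 0."""
    signal = [0] * n_frames
    for t in times:
        index = int(t // frame)
        if 0 <= index < n_frames:
            signal[index] = 1
    return signal


def correlation(x, y):
    """Zeroth lag of the normalized cross-covariance of two equal-length signals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    # A constant signal has no variance; treat it as uncorrelated.
    return cov / norm if norm else 0.0


def weighted_statistics(stats, signals):
    """Multiply each pair's co-occurrence frequency by its correlation coefficient."""
    return {
        (ci, cj): count * correlation(signals[ci], signals[cj])
        for (ci, cj), count in stats.items()
    }
```

Note that with this simple multiplication, negatively correlated pairs receive a negative weighted statistic; whether such values should be clipped to zero is a design decision left open here.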

5.5 Evaluation

In this section, we evaluate the previously introduced concepts on real communication logs. In Section 5.5.2 we analyze whether temporal co-occurrence implies ties among contacts. We evaluate the effect of changing the timeframe duration in Section 5.5.2.1 and examine in Section 5.5.2.2 how weighting the co-occurrence statistics with the corresponding correlation coefficients performs.

5.5.1 Data Set

For the evaluation of our algorithm, we use real records from Swisscom’s mobile call database. Each call record entry contains information about

• Initiator ID

• Receiver ID

• Initiation Timestamp

• Type (Call or SMS)


• Duration (= null, if SMS)

We queried 40 randomly chosen egos and retrieved all their communication events during a time period of six months. The contacts of these egos were identified and the communication events among them were retrieved. With this, we have all mobile communication events (calls and SMS) needed to estimate 40 independent ego-graphs. We take an ego’s communication stream as input data for our algorithm. Communication events between two contacts indicate interlinkage if they appear mutually; they are therefore used to build the social graph that serves as the ground truth against which our algorithm was evaluated. Figure 5.2 is a plot of such an ego-graph.
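The mutuality rule for the ground truth can be expressed compactly. The sketch below uses a hypothetical function name and reduces each call record to an (initiator, receiver) pair; the remaining record fields are not needed for this step.

```python
def ground_truth_ties(records, ego):
    """Ties between contacts that have each initiated an event to the other.

    `records` is an iterable of (initiator, receiver) ID pairs. Events that
    involve the ego itself are skipped because only ties among contacts
    belong to the ground truth.
    """
    directed = {(a, b) for a, b in records if ego not in (a, b) and a != b}
    return {frozenset((a, b)) for a, b in directed if (b, a) in directed}


records = [("C1", "C2"), ("C2", "C1"), ("C1", "C3"), ("E", "C1")]
print(ground_truth_ties(records, "E"))  # only the tie between C1 and C2
```

C1 and C3 are not linked in this example because only C1 initiated an event; a single direction is not considered mutual.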

Figure 5.2: Sample ego-graph retrieved from the communication stream. A tie is assumed if at least one communication event has been initiated by each of the two contacts towards the other.

5.5.2 Precision and Recall of Temporal Co-Occurrence Algorithm

Precision and recall are suitable measures to evaluate the accuracy of estimated ties between contacts. Precision is the number of correctly estimated ties divided by the total number of estimated ties, whereas recall is the number of correctly estimated ties divided by the number of expected ties (ground truth). Figure 5.3 plots the co-occurrence frequency of all co-occurrence pairs that appeared in the communication stream of one specific ego. For evaluation, Figure 5.4 shows typical plots of precision (rising curve) and recall (falling curve) for three estimated ego-graphs. The Y-axis represents precision and recall values. All co-occurrence pairs with a (weighted) co-occurrence statistic less than a threshold are ignored. The X-axis indicates the percentage of ignored contact pairs. This normalization makes it possible to compare different graphs and different metrics with each other.
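The evaluation described above can be sketched as a sweep over the number of ignored pairs. The function names are hypothetical, and the sketch assumes that estimated and expected ties use the same pair representation (here sorted tuples) and that pairs are ignored in ascending order of their statistic.

```python
def tie_precision_recall(estimated, expected):
    """Precision and recall of estimated ties against the ground truth."""
    correct = len(estimated & expected)
    precision = correct / len(estimated) if estimated else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


def sweep(stats, expected):
    """Precision and recall while ignoring ever more of the weakest pairs.

    Returns (percent_ignored, precision, recall) triples, one per number of
    ignored pairs.
    """
    ranked = sorted(stats, key=stats.get)
    curve = []
    for ignored in range(len(ranked)):
        kept = set(ranked[ignored:])
        percent = 100 * ignored / len(ranked)
        curve.append((percent, *tie_precision_recall(kept, expected)))
    return curve


stats = {("a", "b"): 5, ("a", "c"): 1, ("b", "c"): 3}
expected = {("a", "b"), ("b", "c")}
for percent, precision, recall in sweep(stats, expected):
    print(f"{percent:5.1f}% ignored: precision={precision:.2f} recall={recall:.2f}")
```

In this toy example, ignoring the weakest pair removes the only wrong tie, so precision rises to 1; ignoring one more pair then costs recall, which mirrors the rising and falling curves described for Figure 5.4.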

[Figure 5.3: X-axis: Co-occurrence pair (0 to 80). Y-axis: Unweighted co-occurrence (0 to 14). Legend: Co-occurrence pair, Threshold.]

Figure 5.3: Co-occurrence statistics for an ego-graph. Pairs above the threshold are considered as being connected through a tie in the ego-graph.

[Figure 5.4: X-axis: % of ignored co-occurrence pairs (0 to 100). Y-axis: Precision & Recall (0 to 1). Curves: Ego1, Ego2, Ego3.]

Figure 5.4: Plot of precision (rising curve) and recall (falling curve) for three different egos with timeframe duration of T = 1h and correlation weighting.

As visible in Figure 5.4, the maximal recall for an ego-graph is reached by taking all detected co-occurrence pairs into account for estimating ties. The corresponding precision, however, is not satisfactory. When the threshold is increased, the statistical significance of a co-occurrence pair has to exceed a certain value before a tie is assumed. Therefore, recall drops whereas precision rises. This behavior can be observed in all analyzed ego-graphs, regardless of the weighting function and of the short timeframe durations we investigated. It supports our initial assumption that broadcasting and relaying patterns, represented as temporal co-occurrence, indicate ties between contacts.

It is, however, difficult to define an optimal threshold, since precision and recall diverge significantly and the F-measure reaches its maximal value at 0% of ignored pairs.

We also evaluated the average ratio of the co-occurrence frequency of contacts which do share interlinkage to that of contacts which do not share a tie. This ratio is given by

(Co-occurrence frequency of linked contacts) / (Co-occurrence frequency of unlinked contacts) = 1.7    (5.2)

This implies that co-occurrences of interlinked contacts occur on average close to twice as often as co-occurrences of unrelated contacts.

5.5.2.1 Varying Co-Occurrence Timeframe Duration

The timeframe T in which co-occurrence is detected can be optimized to best match the broadcasting and relaying behavior. Figure 5.5 shows a comparison of precision and recall for two different timeframe durations, T = 24h and T = 1h. Comparing the two plots shows that reducing the duration of the timeframe lowers the recall. The precision, on the other hand, increases significantly.

[Figure 5.5: two panels (Precision, Recall). X-axis: 0 to 100. Y-axis: 0 to 1. Curves: T = 24h, T = 1h.]

Figure 5.5: Precision and recall with two different timeframe durations of T = 1h and T = 24h

To find the optimal timeframe, we investigated the effect of varying the duration of a timeframe, trying to optimize the F-measure of the detected co-occurrence pairs. The results are presented in Figure 5.6. The maximal F-measure, and therefore the optimized timeframe duration, was achieved with T = 1 hour (indicated by the dashed vertical line).

5.5.2.2 Correlation Weighting

In Figure 5.8, the correlation coefficients between any two signal representations of communication streams are plotted. Contact pairs which are connected through a tie on the ego-graph are plotted with a blue circle; unrelated pairs are marked with a red cross. The Y-axis shows the corresponding correlation values. The plot on the right side shows the contact pairs in ascending order of their correlation coefficient, whereas on the left, contact pairs are randomly arranged. The plots show that a relation exists between the correlation coefficient of two contacts’ communication streams and their interlinkage. On average, related vertices have an 8.1 times higher correlation coefficient than unrelated vertices. This supports our assumption that related contacts share a higher correlation value than unrelated ones. Figure 5.7 shows a plot of the weighted co-occurrence statistics for one ego-graph.

[Figure 5.6, plot "Variation of Timeframe Duration": Precision, Recall and F-Measure (y-axis, 0 to 1) over Timeframe Duration [min] (x-axis, logarithmic, 10^0 to 10^3); maximal F-measure marked at T = 60 min]

Figure 5.6: Precision, recall and F-measure with variable timeframe duration T (in minutes) between 1 minute and 1 day. The maximum F-measure is reached at T = 60 min. Here, co-occurrence pairing yielded the best estimation of the ego-graph.

Weighting the co-occurrence statistic with the correlation value increases precision and recall. Figure 5.9 compares weighted and unweighted co-occurrence statistics for a sample ego-graph estimation. In our dataset, correlation weighting increased the average recall value by 12% and precision by 5% and therefore yields significantly better results.
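A minimal sketch of correlation weighting is given below. It rests on several assumptions that are not taken from the thesis: the signal representation of a contact is the number of communication events per fixed-size time bin, the correlation value is the Pearson coefficient, negative coefficients are clipped to zero, and the weight is applied by multiplication. All function names are hypothetical.

```python
import math


def to_signal(timestamps, bin_size, n_bins):
    """Turn event timestamps into a signal of event counts per time bin."""
    signal = [0] * n_bins
    for t in timestamps:
        index = int(t // bin_size)
        if 0 <= index < n_bins:
            signal[index] += 1
    return signal


def pearson(x, y):
    """Pearson correlation coefficient of two equally long signals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def weighted_cooccurrence(count, signal_a, signal_b):
    """Scale a co-occurrence count by the (non-negative) correlation."""
    return count * max(0.0, pearson(signal_a, signal_b))


a = to_signal([10, 130, 140], bin_size=60, n_bins=4)
b = to_signal([5, 20, 125, 130, 135, 150], bin_size=60, n_bins=4)
print(a, b, weighted_cooccurrence(3, a, b))
```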

5.6 Temporal Association Pattern Analysis

In the previous section we evaluated whether temporal co-occurrence can be used to estimate hidden ties. We saw that weighting the co-occurrence statistics with the correlation value increases precision and recall. This implies that two contacts belonging to the same community exhibit communication patterns that correlate more strongly than those of contacts from different communities.

In a next step, we analyze whether communities feature a natural regularity in their communication pattern which could help to infer a contact’s community affiliation. One could think that a community prefers to communicate at certain hours of the day or only on certain weekdays. A person might communicate with his co-workers during business hours. In the evening he then prefers to communicate with his friends and family members. During a week, from Monday to Friday, he often communicates with co-workers, whereas during the weekend he rarely has to think of his job but prefers to communicate with his friends and family members. If a unique regularity pattern for a community can be found, it would then be possible to group contacts with similar temporal communication behavior and hence find clusters without estimating ties between contacts.

We analyze the data stream of the ego-graph illustrated in Figure 5.2. In a first step, we manually identified 4 clusters and assigned contacts to them. Some nodes may belong to multiple clusters, whereas several nodes do not belong to any cluster; the latter are combined into the rest group. For each cluster and the rest group, we analyzed the communication regularity separately. We investigated the communication distribution over weekdays as well as over the hours of a day. Figure 5.10 shows two histograms for each cluster as well as the rest group. The histograms in 5.10(a) illustrate the hourly activity during a day and those in 5.10(b) show the daily activity during a week.

The variations in the distributions of each group are not strong enough to yield a clear distinction. In addition, looking at the communication distribution of each contact separately, it is not evident to which cluster a contact belongs, even when the exact number of clusters and their distribution are known in advance. Figure 5.11 shows the normalized weekday histogram of each contact and the clusters they belong to. A clear and distinct assignment would evidently be rather difficult. We see the following as the main reason for not finding significant differences: mobile phones are used before and after, but usually not during, a meeting with members of a community. Therefore, communication events may strongly interfere with the communication of other communities. As an example, planning the weekend is usually done during the week, often during work time.

[Figure 5.7, plot: Weighted co-occurrence (y-axis, 0 to 1.6) per Co-occurrence pair (x-axis, 0 to 70), with the threshold marked]

Figure 5.7: Co-occurrence statistics for an ego-graph weighted by the correlation coefficient of the two contacts’ signal representations. Co-occurrence pairs exceeding a threshold are considered to be connected through a tie.

[Figure 5.8, two panels: Correlation Coefficient (y-axis, 0 to 1) per Vertex Pair (x-axis, 0 to 400), once in arbitrary arrangement and once in ascending order of correlation coefficient; legend: Unlinked Vertices, Linked Vertices]

Figure 5.8: Plot of the correlation coefficients between the signal representations of any two contacts’ communication streams. Contact pairs sharing a link in the ego-graph are marked with an ’o’. Otherwise, they are marked with an ’x’. The average correlation coefficient of related pairs is 8.1 times higher than that of unrelated pairs. The plot on the right side shows the contact pairs in ascending order of their correlation coefficient, whereas on the left, contact pairs are randomly arranged.

5.7 Behavior Analysis using Markov Chain

As a short aside, we inspect in this section whether the communication behavior differs between clusters. To gain insight into the communication behavior, we modeled the communication stream as a Markov Chain. We included four different states:

• Incoming short message (IM)

• Outgoing short message (OM)

• Incoming call (IC)

• Outgoing call (OC)

In a next step, we calculated the transition probabilities from every state to every other state, based on consecutive communication events. Figure 5.12 illustrates this for the ego-graph illustrated in Figure 5.2.

We evaluated the behavior of all four clusters and the rest group independently and compared the results to the overall outcome of the whole network. Surprisingly, the behavior of all groups was very similar and the differences lie within the variance. No significant distinction could be discovered. It is, however, possible to conclude some interesting facts:

• The probability of changing the medium for the next communication event is about 30%.

• In 35% of the cases, the next communication event is of the same kind as the previous one in terms of incoming/outgoing and short message/phone call.

• In roughly 50% of the cases, the successive event changes from incoming to outgoing or vice versa.
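The Markov Chain model and the three figures above can be sketched as follows. The sketch assumes that the communication stream is given as a sequence of the state labels IM, OM, IC and OC, and it derives the three summary values directly from consecutive events. The function names and the sample sequence are hypothetical.

```python
from collections import Counter

STATES = ("IM", "OM", "IC", "OC")


def transition_probabilities(sequence):
    """Estimate P(next state | current state) from consecutive events."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    probabilities = {}
    for current in STATES:
        total = sum(pair_counts[(current, nxt)] for nxt in STATES)
        probabilities[current] = {
            nxt: (pair_counts[(current, nxt)] / total if total else 0.0)
            for nxt in STATES
        }
    return probabilities


def summary_statistics(sequence):
    """Fractions of consecutive events that change medium or initiator, or stay the same."""
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        raise ValueError("need at least two events")
    n = len(pairs)
    return {
        # Second letter encodes the medium: M(essage) or C(all).
        "media_changes": sum(a[1] != b[1] for a, b in pairs) / n,
        # First letter encodes the direction: I(ncoming) or O(utgoing).
        "initiator_changes": sum(a[0] != b[0] for a, b in pairs) / n,
        "sustain": sum(a == b for a, b in pairs) / n,
    }


stream = ["IM", "OM", "OM", "IC", "OC"]
print(transition_probabilities(stream)["OM"])
print(summary_statistics(stream))
```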


[Figure 5.9, two panels: Precision and Recall (y-axis, 0 to 1; x-axis, 0 to 100), each with one curve for unweighted and one for correlation-weighted co-occurrence statistics]

Figure 5.9: Precision and recall comparison of weighted and unweighted co-occurrence statistics. By weighting the co-occurrence statistics with the correlation coefficient of the two contacts’ signal representations, recall could be increased by 12% and precision by 5%.

5.8 Summary

Recent studies showed that communication behavior can indicate social relationships and that these structures can be retrieved without any knowledge of the content of exchanged messages. Just by observing the stream of communication events, the social graph can be deduced. In this chapter we present an approach to estimate a person’s ego-graph by observing the communication stream from an ego-centric perspective. No information about the communication behavior of this person’s contacts is available.

We introduce co-occurrence pairing. Co-occurrence indicates broadcasting and relaying patterns, which are characteristic of group communication. A tie between two contacts is assumed if the frequency of their co-occurrence becomes statistically relevant.

Evaluating our approach on real data revealed that our initial assumptions were correct: broadcasting and relaying indicate hidden ties. However, both of these patterns do not occur frequently enough, and very often communication events of completely unrelated contacts interfere and distort the co-occurrence statistics significantly. This finding contradicts the outcome of our survey, where participants indicated this behavior.

By analyzing call log data, we found the optimal timeframe duration for recording co-occurrence to be one hour. In addition, we could increase the recall by 12% and the precision by 5% by weighting the co-occurrence statistics with the correlation coefficient of the two contacts’ communication streams.

The significant enhancement that correlation weighting introduced indicates that the long-term communication patterns of contacts belonging to the same community do in fact possess natural similarities. The evaluation of daytime and weekday distributions, on the other hand, did not imply that communication possesses regularity.

We therefore assume that group communication occurs more or less randomly over time, but that contacts of the same community show similar communication behavior.

[Figure 5.10, two rows of histograms, each with panels for Cluster 1, Cluster 2, Cluster 3, Cluster 4 and Rest: (a) over Hour (x-axis, 0 to 20; y-axis, 0 to 60); (b) over Day of the Week (x-axis, 1 to 7; y-axis, 0 to 100)]

(a) Histogram of the hour distribution during a day.

(b) Histogram of the day distribution during a week.

Figure 5.10: Histogram of the communication distribution for each cluster. 5.10(a) shows the hourly distribution over a day. 5.10(b) shows the daily distribution over a week.

The obtained results are not very promising. Although our evaluation showed the statistical significance of our assumption, the estimation is too poor for real-world applicability, e.g. for usage in our recommendation engine.

The major difficulty in our approach is the lack of information about the whole social graph. We believe that with access to all communication streams, the social graph could be retrieved very accurately. This, however, would require a centralized management entity.

To avoid this, spatial data could be used in addition to temporal communication information to improve the accuracy of the estimation of the social graph. Cell ID or GPS positioning could be used to detect proximity among contacts. Contacts exhibiting similar spatial patterns could indicate community membership, as presented in [35].

In the next chapter we introduce the platform we developed to run Cluestr. Due to the unreliable retrieval of a person’s ego-graph by observing the communication stream, we do not follow the approach presented in this chapter but present a social networking service where the social graph has to be established intentionally by the users themselves.


Figure 5.11: Normalized weekday histogram of the communication distribution of the ego with each contact separately as well as combined into their affiliated clusters. A clear assignment of a contact to any cluster is not obvious.

[Figure 5.12, state diagram with the four states IM, IC, OM and OC. Edge labels as extracted, with the edge assignment not recoverable here: 0.27, 0.69, 0.12, 0.31, 0.28, 0.23, 0.06, 0.13, 0.46, 0.13, 0.27, 0.15, 0.11, 0.15, 0.27, 0.37. Summary values: P(Media changes) = 0.31, P(Initiator changes) = 0.51, P(Sustain) = 0.35]

Figure 5.12: Markov Chain and the transition probabilities of the communication behavior. IM: Incoming message, IC: Incoming call, OM: Outgoing message, OC: Outgoing call

Chapter 6

Platform and Implementation

In Section 3.2 we stated the idea and key elements of Cluestr. In this chapter we specify the design of our application and present the platform. We begin by defining the requirements for our application. Next, we evaluate common platforms for developing mobile applications and then discuss our implementation.

6.1 Specification of Cluestr

In this section we define the system specification. We do this by stating use cases and deducing the requirements from them. Requirements define the capabilities a system must have. Use cases describe an interaction between the user and the system; they are principally used to derive the functional requirements of the system which is to be designed and implemented.

6.1.1 General Usage of the Service

Cluestr is designed for mobile usage. This implies some constraints and raises some key considerations: the service has to be accessible through a mobile client. However, Cluestr should also be accessible from desktop computers, so that a member can use the service even if he/she does not possess a phone able to run Cluestr.

Another key requirement is that no registration is required for participation. Registration is possible, however, and even necessary if a user wants to initiate his/her own group or wants to profit from the social networking functionality to connect with other registered friends and share profile and status updates.

Unregistered participants are able to participate in Cluestrs. Within such a Cluestr, they can:

• Write on the Billboard

• Participate in Polls

• Manage the ToDo list


6.1.2 Client Use Cases

Table 6.1 gives an overview of use cases. The application is designed according to them.

6.2 Platform for Mobile Social Applications

To find the optimal platform to implement Cluestr, we evaluated two different approaches for developing mobile social applications:

• Native mobile applications

• Mobile web applications

6.2.1 Native Mobile Applications

A variety of approaches exist for developing mobile applications. Until recently, native applications for mobile devices offered the most sophisticated and thus preferred platform. The most common platforms include:

• J2ME¹

• Symbian²

• Windows Mobile³

• Android⁴

• iPhone⁵

However, several barriers exist when developing native mobile applications. Most importantly, the wide fragmentation of devices makes it extremely difficult to offer one piece of software to a large variety of mobile devices. Additionally, distributing the software, and later its updates, is difficult to achieve.

6.2.2 Mobile Web Applications

With the Web 2.0 movement on desktop computers, many applications and services are now available on the web, accessible through a web browser. We assume that this trend will spill over to the mobile world. In state-of-the-art devices such as the iPhone, the Nokia N95 or Android phones, mobile browsers have evolved into sophisticated browsers offering a user experience similar to desktop browsing. We expect such feature-rich browsing technology to be implemented in more and more devices in the near future. The trend towards web applications and services as observed on personal computers will therefore likely continue on mobile devices.

¹ java.sun.com/javame
² www.symbian.com
³ www.microsoft.com/windowsmobile/
⁴ code.google.com/android
⁵ www.apple.com/iphone

| Layer | Use Case | Short Description | RS | IS |
|---|---|---|---|---|
| System | Register | Registration is required for certain features like initiating a Cluestr, requesting friendships or sending messages. | R | √ |
| System | Login, Logout | Login and logout to the service using username and password. | R | √ |
| System | Search user | Finding people also registered to the service. | R | X |
| SN | Request/Confirm friendships | Friendship requests can be sent to other members. Requests have to be accepted by the requested party in order to establish a connection between the two members. | R | √ |
| SN | Update profile and status | Change own profile information and current status. A profile may contain information about the name, contact information, etc. The status information may contain location and presence information. | R | √ |
| SN | See a friend’s profile and status | Browsing through profile and status of friends. | R | √ |
| SN | Send and receive private messages | Sending messages to friends and receiving messages from them. | R | √ |
| Cluestr | Initiate a Cluestr | A user can initiate a new Cluestr and invite other users to participate. | R | √ |
| Cluestr | Participate in a Cluestr | Participating in a Cluestr does not require registration. However, unregistered users are not able to see profile and status information from other members and may not have friendship links. | U | √ |
| Cluestr | Write messages on the Billboard | Participants can broadcast messages to all participants in a Cluestr. | U | X |
| Cluestr | Participate in a Poll | Participants can vote in a poll. Only one poll is active at a time. | U | X |
| Cluestr | Manage the ToDo list | Participants can contribute to the ToDo list by adding new elements or marking existing elements as accomplished. | U | X |

Table 6.1: Mobile client use cases. R: Registration required, U: Registration not required. IS: Implementation Status. Features with a √ are implemented, those with an X are not implemented.


Compared to native applications, mobile web applications offer several advantages:

• Every device featuring a web browser is able to run the service. Application developers do not have to be concerned about the mobile phone’s operating system and do not have to build different versions for different platforms. Therefore, device fragmentation does not add complexity to software development. Browser fragmentation becomes less of an issue when web standards are followed.

• To run a web application, no software distribution is required. No software has to be downloaded and installed. The URL of the service is the gateway to using the application immediately.

• Adding new features or fixing bugs does not require a redistribution of the application. Users do not have to be notified to download and install updates. The service provider remains in control of the software at all times and users always experience the most recent version.

On the other hand, web applications on mobile devices still come with several drawbacks:

• The application can be slow due to network latency and the large amount of data that has to be transmitted, e.g. when browsing from one web page to the next.

• Only limited system access is available. Accessing address book data, location information, etc. through a browser is currently not supported.

• Mobile web applications cannot be used without persistent connectivity to the server. Ubiquitous network access is required.

To make mobile web applications successful, the next phase needs to overcome these issues and make the mobile web responsive and ’always on’ (persistent), and enable it to merge with other mobile functions such as telephony, messaging, location, etc.

Attempts to overcome these issues are currently being implemented. The most promising trends are:

Enhancing Responsiveness: Using JavaScript and Ajax XML HTTPRequests shifts an application’s logic from the server to the client. For a data request, not the whole page description has to be retransmitted but only the raw data. This makes a web application more responsive and lowers the amount of transmitted data.

Accessing System Resources: To access system resources, APIs can provide functionality to invoke device capabilities from JavaScript, such as access to the current location (e.g., via GPS), phone dialer, camera, address book, calendar and SMS. Previously, these services were only available to native applications. Such an API is being standardized by the OpenAjax Alliance⁶.

⁶ http://www.openajax.org/member/wiki/Mobile Device APIs

Accessing Data Offline: In order to react to real-time events, mobile applications have to be persistent. Offline storage can support persistence by allowing a copy of web-bound data to be accessed and updated locally in real time. Gears⁷ offers such a possibility. Gears is an open source browser extension that lets developers create web applications that can run offline. Gears provides three key features:

• A local server, to cache and serve application resources (HTML, JavaScript, images, etc.) without needing to contact a server.

• A database, to store and access data from within the browser.

• A worker thread pool, to make web applications more responsive by performing expensive operations in the background.

Offline capabilities make it possible to use the service when no connectivity to either the wireless network or the server is available.

Using Gears in combination with XML HTTPRequest makes it possible to synchronize data between the device and the server as a background task.

6.2.3 Conclusion

We believe that mobile web applications accessible through a device’s browser will become a valuable alternative to native applications with a huge potential in the future.

Based on the previous evaluation, we decided to build Cluestr as a mobile web application.

6.3 Infrastructure Overview

We use Gears to offer offline access and background synchronization. This keeps the application highly responsive since data can be retrieved from the local database and does not always have to be requested from the server. To keep latency low and to reduce the amount of transmitted data, Ajax XML HTTPRequests are used to transfer data between client and server. Data objects are serialized using JSON (JavaScript Object Notation), a lightweight, text-based, human-readable data interchange format for representing simple data structures. JSON has less overhead than XML and is therefore preferred in mobile web applications to keep the amount of transmitted data as low as possible.
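The difference in overhead can be made concrete for a single record. The following sketch serializes one hypothetical message object both as compact JSON and as a simple element-per-field XML document and compares the sizes. The record and the XML layout are assumptions chosen for illustration; the comparison says nothing about other encodings of the same data.

```python
import json
import xml.etree.ElementTree as ET


def to_json_bytes(record):
    """Serialize a flat record as compact JSON."""
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


def to_xml_bytes(record, root_tag="message"):
    """Serialize the same record as XML with one child element per field."""
    root = ET.Element(root_tag)
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode").encode("utf-8")


record = {"id": 7, "sender": "alice", "text": "See you at 8"}
as_json = to_json_bytes(record)
as_xml = to_xml_bytes(record)
print(len(as_json), as_json.decode("utf-8"))
print(len(as_xml), as_xml.decode("utf-8"))
```

For this record the XML form is longer because every field name appears twice, once in the opening and once in the closing tag.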

Gears is currently available for most desktop PCs. In the mobile area, Gears is only supported by Pocket Internet Explorer (PIE) running on Windows Mobile 5 and 6 devices. However, Gears is fully open source and is currently being ported to other platforms such as Opera Mobile as well as the WebKit browser for Nokia phones and Android devices. For evaluation and testing, we used an HTC Touch Cruise device running Windows Mobile 6.1.

On the server side, we decided to use a Ruby on Rails infrastructure. Rails is a full-stack framework for developing web applications according to the Model-View-Controller pattern⁸. Using Ruby on Rails, we can use the same infrastructure for both the mobile and the PC version. Figure 6.1 gives an overview of the infrastructure.

⁷ http://gears.google.com
⁸ en.wikipedia.org/wiki/Model-view-controller


[Figure 6.1, diagram labels: MySQL DB; Web Access; XML HTTPRequest using JSON Serialization]

Figure 6.1: Infrastructure of Cluestr using Gears and Pocket Internet Explorer on Windows Mobile as the client infrastructure and a Ruby on Rails server as the back-end infrastructure

6.4 Client Design

An overview of the architecture stack is given in Figure 6.2. For Cluestr, we implemented a Data Handler (DH) and a Synchronization Engine (SE).

The Data Handler is responsible for local data management. The Synchronization Engine, running in the background, ensures data consistency between the device and the server. On top of SE and DH, the Application Logic (AL) and the User Interface (UI) are implemented.

6.4.1 Data Handler and Queue

Data is either stored in the device’s local database or available on the server. In our architecture, storing and retrieving data on the device is managed by the DH, which therefore provides an abstraction to the Application Logic and the User Interface. The idea is that the AL requests the required data from the DH. The DH then assembles the data and hands it back to the application. The AL itself does not have to be concerned with where the data lies, whether it is locally available or has to be requested from the server. Likewise, if the AL wants to store data, it sends a data object to the DH, which handles the content and stores it in the local database and/or sends it to the server.

6.4.1.1 Internal Data Handling

The Data Handler itself handles requests from the Application Logic. In a first step, it distinguishes whether data is requested or whether data needs to be stored or altered. When the AL requests data, the DH knows how to retrieve the specific data, which can be stored either in the local database or on the server. If a request has to be sent to the server, the DH manages the XML HTTPRequest. It sends a request to the server, handles the reply message and reports it back to the application. However, direct data requests to the server by the DH should be avoided since connectivity to the network cannot be guaranteed and the delay might be high. This could cause time-outs and lead to an unsatisfactory user experience. In our design, the DH only occasionally has to establish a connection to the server itself. This is the case, for example, when logging into Cluestr, where user verification is required.

[Figure 6.2, stack diagram labels: User Interface (UI); Application Logic (AL); Data Handler (DH); Synchronization Engine (SE); Database; Web Browser; Device; Server]

Figure 6.2: Device architecture stack featuring a User Interface layer, an Application Logic layer, a Data Handler layer responsible for managing local storage and a Synchronization Engine running in the background, responsible for ensuring data consistency between the device and the server

If the AL sends a request to either update existing data (e.g. a profile update) or to add a new dataset (e.g. by sending a message), the DH takes responsibility for updating the local database.

Updating the local database causes data inconsistency: the server has no information that data has been altered on a client, so the server’s database has to be updated as well. However, direct connections by the DH should be avoided as much as possible.

To guarantee data consistency, we introduce a Synchronization Engine running as a background thread. With it, the DH does not have to deal with sending updated data to the server. Additionally, the DH does not have to be concerned about not having the most recent data from the server. Both of these tasks are handled by the Synchronization Engine.

To inform the SE about altered data, we introduce a queue to which datasets that need to be synchronized to the server can be appended. If a dataset needs to be sent to the server, the DH posts it to the queue. The Synchronization Engine manages this queue and sends the data objects to the server.
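The division of labor described in this section can be sketched in a few lines. The sketch is written in Python for brevity, although the actual client runs in the browser. A dictionary stands in for the local database, a list for the queue, and a callback for the occasional direct server request. The class and method names are hypothetical.

```python
class DataHandler:
    """Hides from the application logic where data is stored (illustrative only)."""

    def __init__(self, fetch_from_server):
        self.local = {}   # stands in for the local database
        self.queue = []   # entries waiting for the Synchronization Engine
        self.fetch_from_server = fetch_from_server

    def get(self, key):
        # Serve from the local store whenever possible.
        if key in self.local:
            return self.local[key]
        # Fall back to a direct server request (to be avoided where possible).
        value = self.fetch_from_server(key)
        self.local[key] = value
        return value

    def put(self, key, value):
        # Update locally first, then leave the upload to the background engine.
        self.local[key] = value
        self.queue.append((key, value))


server_calls = []


def fake_server(key):
    server_calls.append(key)
    return {"name": key}


handler = DataHandler(fake_server)
handler.get("profile/1")
handler.get("profile/1")            # second read is served locally
handler.put("message/1", {"text": "hi"})
print(server_calls, handler.queue)
```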

6.4.1.2 Queue

Data which has to be sent to the server is appended to the queue. Entries in the queue are then processed by the Synchronization Engine, which passes them to the server.

The queue takes a queue-entry as input. This entry contains the following information:

• Type: Type of the data object that has to be sent to the server.


• Id: Local Id of the data object. This is optional but sometimes required to handle replies from the server.

• Priority: Processing priority. Usually, all entries share the same priority and are processed according to a FIFO rule.

• JSON: The data which has to be sent to the server in JSON notation.

• URL: The URL to where the Synchronization Engine will send the data.

The queue itself is implemented as a table in the local database with the following structure:

| QueueIndex | Type | Id | Priority | JSON | URL |
|---|---|---|---|---|---|
| integer, primary key | varchar | integer | integer | varchar | varchar |

The advantage of implementing the queue as a table in the local database is that entries are persistent and are not lost when the browser is exited or the device is shut down.
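As an illustration, the queue table can be reproduced with Python’s sqlite3 module standing in for the browser-side database. The column names follow the table above. The helper functions, the example URLs and the rule that a larger Priority value is served first are assumptions made for this sketch.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS Queue (
    QueueIndex INTEGER PRIMARY KEY,
    Type       VARCHAR,
    Id         INTEGER,
    Priority   INTEGER,
    JSON       VARCHAR,
    URL        VARCHAR
)
"""


def open_queue(path=":memory:"):
    """Open (or create) the database holding the queue table."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def append(db, type_, payload, url, local_id=None, priority=0):
    """Append one entry; the payload is stored in JSON notation."""
    db.execute(
        "INSERT INTO Queue (Type, Id, Priority, JSON, URL) VALUES (?, ?, ?, ?, ?)",
        (type_, local_id, priority, json.dumps(payload), url),
    )
    db.commit()


def next_entry(db):
    """Return the next entry: highest priority first, FIFO within a priority."""
    return db.execute(
        "SELECT QueueIndex, Type, Id, Priority, JSON, URL FROM Queue "
        "ORDER BY Priority DESC, QueueIndex ASC LIMIT 1"
    ).fetchone()


def remove(db, queue_index):
    """Delete an entry after it has been transmitted successfully."""
    db.execute("DELETE FROM Queue WHERE QueueIndex = ?", (queue_index,))
    db.commit()


db = open_queue()
append(db, "message", {"text": "hi"}, "/sync/messages", local_id=1)
append(db, "profile", {"status": "busy"}, "/sync/profile")
print(next_entry(db))
```

Passing a file path instead of ":memory:" keeps the entries across restarts, which corresponds to the persistence argument made above.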

6.4.2 Background Synchronization Engine

The Synchronization Engine is responsible for keeping the data on the client device and on the server consistent. The benefits of synchronizing and storing the data locally are:

• Data is available at all times. Whenever the user chooses to go offline or is accidentally disconnected, datasets can be retrieved locally.

• Performance is enhanced on a slow Internet connection since data is loaded from the local database.

The downside is that the SE might consume resources or slow down the online experience with its processing. Using the WorkerPool offered by Gears, the cost of synchronizing is minimized and no longer affects the user’s experience since it runs as a thread in the background.

The SE has two tasks to fulfill:

1. Sending altered data from the client to the server.

2. Downloading new and updated data from the server and storing it locally.

This is achieved using two independent workers: the uploader worker and the downloader worker.

Uploader Worker: The uploader worker periodically checks the queue for data entries. If the queue is not empty, the worker tries to establish a connection to the server and sends the data. After a successful transmission, the worker removes the entry from the queue. It might also have to report information back to the local database. This is necessary in various cases: if a user writes a message to a friend, he/she can hit the send button without having to worry about the current connection status. The message is enqueued and marked as undelivered as long as it has not been transmitted to the server. After successful transmission, it is the uploader’s duty to mark the message as sent.
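One pass of the uploader worker might look like the following sketch. It assumes that a failed transmission raises ConnectionError, that entries stay in the queue until they have been sent, and that entries carrying a local Id are reported back as sent. The function and parameter names are hypothetical, and a deque replaces the database-backed queue for brevity.

```python
from collections import deque


def run_uploader_once(queue, send, mark_sent):
    """Send queued entries in FIFO order and stop at the first failure.

    Returns the number of entries that were transmitted successfully.
    """
    sent = 0
    while queue:
        entry = queue[0]
        try:
            send(entry["url"], entry["json"])
        except ConnectionError:
            break  # keep the entry and retry on the next periodic run
        queue.popleft()
        if entry.get("id") is not None:
            mark_sent(entry["id"])  # e.g. flag the local message as delivered
        sent += 1
    return sent


delivered = []
queue = deque([{"id": 1, "url": "/sync/messages", "json": '{"text": "hi"}'}])
print(run_uploader_once(queue, lambda url, payload: None, delivered.append), delivered)
```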

Downloader Worker: The downloader worker is responsible for downloading updates from the server and updating the local database accordingly. We decided to implement this as a polling infrastructure with timestamps from the server to keep track of changes.

The client stores a timestamp of the last successful synchronization in the database. This timestamp is sent to the server along with a request to retrieve all changes since the last synchronization. The server then checks for updates and transmits them back together with a new timestamp. After a successful request, the client updates the local database with the elements received and stores the new timestamp, which is used for the next update request.

[Figure 6.3, sequence diagram between Client and Server:
1. Client to Server: { "ts": <timestamp> }
2. Server: ts_new = CurrentTime(), data = GetChangesSince(ts)
3. Server to Client: { "data": <data>, "ts": <ts_new> }
4. Client: updateDB(data), updateTS(ts)]

Figure 6.3: Client-server sequence diagram for requesting updates by the client’s downloader worker

6.5 Server Design

The design of the server infrastructure relies on the Model-View-Controller concept offered by Ruby on Rails. A synchronization controller is responsible for handling requests from the client’s Synchronization Engine. The uploader worker sends data to this controller, which then updates the database. For a request by a client’s downloader worker, the synchronization controller retrieves all updates made to the server’s database since the last request and sends them back to the client together with the current timestamp. Figure 6.3 illustrates this process.
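The timestamp-based polling between downloader worker and synchronization controller can be sketched with two small classes. The sketch is in Python rather than Ruby or JavaScript and uses a logical counter instead of wall-clock time, which is an assumption made to keep the example deterministic. The class and method names are hypothetical. The request and reply use the "ts" and "data" fields from Figure 6.3.

```python
class SyncServer:
    """Stands in for the synchronization controller (illustrative only)."""

    def __init__(self):
        self.clock = 0
        self.changes = []  # list of (timestamp, item)

    def record(self, item):
        self.clock += 1
        self.changes.append((self.clock, item))

    def handle_sync(self, request):
        ts = request["ts"]
        ts_new = self.clock
        # Return everything changed after the client's last synchronization.
        data = [item for stamp, item in self.changes if ts < stamp <= ts_new]
        return {"data": data, "ts": ts_new}


class SyncClient:
    """Stands in for the downloader worker and the local database."""

    def __init__(self):
        self.items = []
        self.ts = 0  # timestamp of the last successful synchronization

    def sync(self, server):
        reply = server.handle_sync({"ts": self.ts})
        self.items.extend(reply["data"])  # update the local database
        self.ts = reply["ts"]             # remember the new timestamp
        return len(reply["data"])


server, client = SyncServer(), SyncClient()
server.record("status: busy")
server.record("new message")
print(client.sync(server), client.sync(server))
```

Because the client only stores the new timestamp after a successful reply, a failed request is simply repeated with the old timestamp on the next poll.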

6.6 Ego-Graph Clustering

In order to run the recommendation engine, the device requires information about the clustering of a user’s ego-graph. We implemented the following procedure to achieve this: if a user wants to initiate a new group, a clustering request is sent to the server. The computationally complex task of clustering is performed on the server. After the user’s ego-graph has been clustered successfully, all information is sent back to the client device and stored until a new clustering request is sent.


[Figure 6.4 components: Application, Data Handler, Database, Queue, Synchronization Engine (Downloader, Uploader), Server. Labeled interactions: app.request, app.reply, db.new, db.select, db.update, db.delete, q.append, server.download, server.upload]

Figure 6.4: Device architecture
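One possible reading of this architecture is sketched below in Python: the data handler applies each change to the local database and appends it to a queue, and the uploader later drains that queue towards the server. The operation names follow the labels in Figure 6.4, while the classes, the dictionary standing in for the database and the `upload` callable are assumptions made for illustration.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Optional, Tuple

Change = Tuple[str, str, Optional[Any]]   # (operation, key, value)


class DataHandler:
    """Serves application requests from the local database and queues every change."""

    def __init__(self) -> None:
        self.db: Dict[str, Any] = {}      # stands in for the local database
        self.queue: Deque[Change] = deque()

    def new(self, key: str, value: Any) -> None:
        self.db[key] = value
        self.queue.append(("new", key, value))

    def update(self, key: str, value: Any) -> None:
        self.db[key] = value
        self.queue.append(("update", key, value))

    def delete(self, key: str) -> None:
        self.db.pop(key, None)
        self.queue.append(("delete", key, None))

    def select(self, key: str) -> Optional[Any]:
        return self.db.get(key)           # reads never touch the queue


class Uploader:
    """Sends queued changes to the server in the order they were made."""

    def __init__(self, queue: Deque[Change], upload: Callable[[Change], None]):
        self.queue = queue
        self.upload = upload              # hypothetical transport function

    def flush(self) -> int:
        sent = 0
        while self.queue:
            self.upload(self.queue[0])    # if this raises, the change stays queued
            self.queue.popleft()
            sent += 1
        return sent
```

Because the application only talks to the local database, it keeps working while the device is offline; queued changes are sent once `flush()` succeeds.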

6.7 Mobile User Interface

6.7.1 General Design

The user interface on the mobile device is kept simple and intuitive. It has 5 main tabs on top and various sub tabs on the left side. The main tabs and their purposes are summarized in Table 6.2.

Tab       Description

Hub       Used to display an activity feed, change personal status and initiate a new Cluestr. Synchronization can be switched on and off here.

Cluestr   Used for Cluestr activities. Offers viewing shouts, polls and the ToDo list. Only one Cluestr is active at one time.

Friends   Lists friends, shows their profile and status activities.

Messages  Inbox and Outbox for private messages sent to friends or received from them. Also able to write a new message.

Settings  Various settings such as changing profile information, selecting the active Cluestr etc.

Table 6.2: Mobile user interface

Screenshot 6.5 shows the design and highlights important parts: The main navigation with its 5 tabs is placed at the top of the window (red square). On the left side, the sub tabs are accessible (green square). In the screenshot, the user is currently inviting contacts to participate in a new Cluestr. The contacts already invited are listed at the beginning of the main window (yellow square). Underneath, the contact list (blue square) is displayed, separated by the selection view buttons (pink square).

Figure 6.5: Cluestr user interface

6.7.2 Contact List

Figure 6.6(a) shows the contact list. It directly offers the possibility to send a message, send an SMS, send an e-mail or call the person. Clicking on a contact's name displays the profile and status information, as seen in Figure 6.6(b).


(a) Screenshot Friendlist (b) Screenshot Profile view

Figure 6.6: Screenshot of the friendlist and the profile view

6.7.3 Cluestr Initialization Process

When a new Cluestr is initiated, four different views of the contact list are provided for selecting the contacts to invite:

Name                 Description

Recommendation View  Displays the top 5 recommended contacts

Cluster View         Displays a user’s contacts grouped into detected clusters

Activity View        Displays contacts according to communication activities

Alphabetic View      Displays a user’s contacts in an alphabetically ordered list

Table 6.3: Four different contact selection modes are available for inviting contacts to participate in a Cluestr

Figure 6.7 shows the Recommendation View and the Cluster View. The Recommendation View displays the top 5 recommended friends (Screenshot 6.7(a)). This recommendation is based on the algorithm presented in Section 4. By hitting the “+”-button, the contact is promoted to an invited member. Undesired recommendations can be excluded from the list by clicking the “-”-button.

The Cluster View displays a user’s contacts grouped into the clusters they belong to (Screenshot 6.7(b)).

The Alphabetic View simply lists all friends in alphabetic order, as known from the phone book.
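The interaction in the Recommendation View can be modelled with a small amount of state, sketched here in Python. The ranking itself is taken as given: the sketch assumes a fixed, precomputed list of contacts ordered by recommendation score, which is a simplification and not the recommendation algorithm of this thesis. The class and method names are hypothetical.

```python
from typing import Iterable, List, Set


class InvitationSelection:
    """Tracks invited and excluded contacts and derives the visible top-5 list."""

    def __init__(self, ranked_contacts: Iterable[str], page_size: int = 5):
        self.ranked = list(ranked_contacts)   # assumed: best recommendation first
        self.page_size = page_size
        self.invited: List[str] = []
        self.excluded: Set[str] = set()

    def recommendations(self) -> List[str]:
        """Contacts currently shown: ranked order, minus invited and excluded ones."""
        hidden = self.excluded.union(self.invited)
        visible = [c for c in self.ranked if c not in hidden]
        return visible[: self.page_size]

    def invite(self, contact: str) -> None:
        """Effect of the '+' button: promote the contact to an invited member."""
        if contact not in self.invited:
            self.invited.append(contact)

    def exclude(self, contact: str) -> None:
        """Effect of the '-' button: drop an undesired recommendation."""
        self.excluded.add(contact)
```

In this sketch a freed slot is refilled with the next-ranked contact, which is an assumption about the behavior of the list.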


(a) Screenshot recommendation view (b) Screenshot clustering view

Figure 6.7: Screenshot of the Cluestr initialization process

6.8 TodayScreen Extension

To access Cluestr on a mobile device, a user has to launch the web browser and navigate to the website. This procedure involves some effort just to launch the service. On mobile devices, users are not willing to navigate back and forth. The most important information should be placed in a very prominent position that is easily visible and simple to access.

As a proof of concept, we developed a widget for the Windows Mobile Today Screen. Widgets support persistence by allowing the web application to always be running and always be visible to the user.

Windows Mobile does not come with a widget engine. We therefore developed a Today Screen item with an embedded web browser instance pointing to a widget-optimized version of Cluestr. This website also incorporates Gears as well as our Data Handler and Synchronization Engine. Thus, the widget is able to interact with the server. This is used to send notifications about new messages or status updates from the server back to the widget, which immediately displays them on the Today Screen. Figure 6.8 shows a screenshot of the widget on the Today Screen.


(a) Screenshot of the TodayScreen widget message notification

(b) Screenshot of the TodayScreen widget status update

Figure 6.8: Screenshot of the TodayScreen widget

Chapter 7

Conclusion and Outlook

7.1 Conclusion

New technological and economic drivers in mobile communication and the emergence of social networking services motivated us to explore how both can be combined to provide enhanced mobile communication services that cope better with users’ behavior.

We conducted a survey among 342 people to find out how mobile phones are used and what social network services could contribute. The outcome led to the following statements:

• The mobile phone is often used for organizing and coordinating activities among multiple people.

• Group communication is often performed with members of a community such as members of a sports team, co-workers, classmates or family.

However, current mobile phones do not offer functionality for efficient group communication.

In this thesis, we proposed a new service able to fulfill the demand for group communication and collaboration. The service, named Cluestr, offers N:N messaging and collaboration functionality such as polls and ToDo lists which can be used among members of a group. Groups can be initiated ad hoc and directly on mobile devices by sending invitations to arbitrary contacts.

The main contribution of this thesis is a recommendation engine that supports the initiator of a new group by recommending appropriate contacts.

We evaluated different aspects of our approach. From the results obtained, it can be concluded that:

• A person belongs to multiple communities. Communities are often independent of each other. This is directly reflected in the structure of the person’s ego-graph, where contacts belonging to the same community are densely connected among each other. Graph clustering algorithms are able to detect these community structures. We showed that by knowing a person’s ego-graph, the community affiliation of the contacts can be discovered accurately.

• Although common, an alphabetically ordered list of contacts is unnatural. We showed that by grouping contacts according to their community affiliation, they can be found much faster, especially if contacts belonging to the same community have to be selected consecutively.

• To detect a user’s communities, knowledge of the ego-graph is required. In a ‘social networking’ approach, a relationship between two users is indicated by mutual agreement. This requires some effort by the users to keep their social network congruent with their real social structures. We showed that even if only a small subset of a person’s social graph is available, clustering is adequate and recommendation performs well. This is very promising and helps to overcome the ‘bootstrap problem’ that arises when not many users are registered to the service.

Following the ‘social networking’ approach, a centralized infrastructure responsible for managing the social graph is an unavoidable requirement. In the second part of this thesis we evaluated the feasibility of a decentralized approach where users’ ego-graphs are estimated on their devices independently. Estimation is done by analyzing patterns in the communication stream directly on a mobile phone. No semantic information about the content of a message is considered; only the temporal co-occurrence of communication events is taken into account. By evaluating this approach on real call logs, we could conclude that:

• Broadcasting and relaying patterns indicate hidden ties. However, neither pattern occurs frequently enough, and very often communication events of completely unrelated contacts interfere and distort the co-occurrence statistics significantly. This finding contradicts the outcome of our survey, where participants indicated that they often send text messages to multiple receivers.

• Besides the detection of short-term patterns like broadcasting and relaying, we analyzed the long-term behavior of related contacts. The evaluation showed that the communication behavior of the ego with related contacts is on average more similar than with completely unrelated contacts. However, a clear distinction is not possible, and this measure is better suited to identifying completely unrelated contacts than to predicting relationships.

• Although related contacts exhibit similarities in their communication behavior, no natural regularity could be discovered. The evaluation of the daytime and weekday distribution of communication events did not reveal any regularity. We believe that the ubiquity of mobile communication is the main cause, since a phone call can be made at any time and anywhere. We conclude that group communication occurs more or less randomly over time, but contacts belonging to the same community show similar communication behavior.

The results obtained for the investigated decentralized approach are not very promising. Although our evaluation showed the statistical significance of our assumption, the estimation is too poor for real-world applicability, e.g. for usage in our recommendation engine. The influence of unrelated communication events is too strong. The unsatisfactory performance of the decentralized approach forced us to follow a ‘social networking’ approach for the prototype implementation of Cluestr.
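The idea of temporal co-occurrence can be made concrete with a short Python sketch. It counts, for every pair of distinct contacts, how often communication events with the two contacts lie within a given time window of each other. The function name, the event format and the window length are assumptions chosen for illustration; the sketch does not reproduce the exact statistics evaluated in this thesis.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Tuple


def cooccurrence_counts(events: Iterable[Tuple[int, str]],
                        window_seconds: int) -> "Counter[FrozenSet[str]]":
    """Count, per unordered contact pair, the event pairs at most window_seconds apart.

    events: (timestamp in seconds, contact) tuples taken from a call or message log.
    """
    ordered = sorted(events)
    counts: "Counter[FrozenSet[str]]" = Counter()
    start = 0                                   # index of the oldest event still inside the window
    for i, (t_i, contact_i) in enumerate(ordered):
        while ordered[start][0] < t_i - window_seconds:
            start += 1
        for _, contact_j in ordered[start:i]:
            if contact_j != contact_i:          # repeated events with one contact are ignored
                counts[frozenset((contact_i, contact_j))] += 1
    return counts
```

The sketch also shows why unrelated events distort the statistics: any two contacts that happen to be called within the same window increase the count, whether or not they know each other.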


7.2 Outlook

7.2.1 Platform for Mobile Social Networking Services

The decision to implement Cluestr as a mobile web application arises from the huge potential we see in this technology. Offline storage of data, background synchronization, widgets on the home screen, push mechanisms and browser APIs to access system resources such as location and address book data will provide a rich development environment, where developers do not have to rely on any distribution channel and where device fragmentation does not add complexity. We are just at the beginning of this development and many concepts and ideas have to prove themselves first. Standardization of technologies and concepts has to provide a base on which software developers and device manufacturers can work hand in hand to offer sophisticated platforms for a rich, intuitive and elaborate user experience.

7.2.2 Mobile Social Networking Services

In this thesis we explored new concepts for mobile communication. We showed how future communication systems can profit from social features, and we strongly believe that the incorporation of social elements in mobile devices will significantly change the way we communicate.

The success of a service, however, relies on its acceptance by users. Reaching the critical mass that makes a service successful is difficult in such a highly competitive field. Social services, however, rely strongly on a large user base for building an adequate social graph. To overcome the ‘bootstrapping’ problem, a decentralized approach as investigated in this thesis can become an alternative.

However, as our evaluation showed, user behavior cannot easily be predicted and further analysis is required. Beyond evaluating temporal communication events alone, fusing them with other sensor data could significantly enhance the results. We believe that taking spatial data into account could lead to a tremendous improvement in estimating social structures. Incorporating semantic information from textual messages could also help to understand social structures or communication topics.

In any case, a better understanding of how people behave in socio-technical systems could lead to a more accurate model, allowing group interaction patterns to reveal social structures with better performance.

All in all, we see a huge potential in combining social information with sensor data in future mobile services. Many interesting and novel services may emerge, mining such data to assist the owner of a mobile phone in everyday activities.

Appendix A

Notations

The following terms are used with a consistent meaning throughout this thesis.

Social Network A social network is a social structure made of nodes linked by ties. Nodes are the individual actors within the networks, and ties are the relationships between the actors.

Social Network Analysis Social network analysis is the mapping and measuring of relationships and flows between people, groups, organizations, computers, web sites, and other information/knowledge processing entities. Social network analysis provides both a visual and a mathematical analysis of human relationships. Management consultants use this methodology with their business clients and call it Organizational Network Analysis.

Social Networking Service A social network service focuses on building online communities of people who share interests and activities, or who are interested in exploring the interests and activities of others. Most social network services are web based and provide a variety of ways for users to interact, such as e-mail and instant messaging services.

Social Graph A graph representation of the social network structures.

Ego-Graph An ego-graph is a sub-graph of a social graph consisting of a single actor (ego E) together with the actors the ego is connected to (alters, contacts Ci) and all the links among those alters. Such graphs are also known as the first order neighborhoods of ego.

Ego An ego is an individual whose social structures are of interest.

Alter Alters are individuals an ego is connected to. Together with the ego, they span the ego-graph. On mobile phones, alters are also referred to as contacts; both terms are used interchangeably throughout this thesis.

Community A community is a group of interacting people sharing a common identity. Examples of communities are ‘university colleagues’, ‘coworkers’, ‘family’, ‘friends’ etc.


Cluster A cluster is a subgraph such that the density of edges within it (intra-cluster edges) is larger than the density of edges connecting its vertices to vertices outside the cluster (inter-cluster edges). We use the term cluster for a set of vertices algorithmically deduced to represent a community. The aim is to have clusters as congruent as possible with real communities.
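The notions of ego-graph and cluster can be illustrated with a short Python sketch. It extracts the alters of an ego and the links among them from an edge list, and compares intra-cluster with inter-cluster edge density for a candidate cluster. Density is taken here as the number of existing edges divided by the number of possible edges, which is an assumption made for this example; the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple


def ego_graph(edges: Iterable[Tuple[str, str]], ego: str):
    """Return (alters, links among alters) for `ego` in an undirected edge list."""
    edges = list(edges)
    alters: Set[str] = set()
    for a, b in edges:
        if a == ego:
            alters.add(b)
        elif b == ego:
            alters.add(a)
    alters.discard(ego)                      # ignore a possible self-loop of the ego
    alter_links: Set[FrozenSet[str]] = {
        frozenset((a, b))
        for a, b in edges
        if a in alters and b in alters and a != b
    }
    return alters, alter_links


def edge_density(num_edges: int, num_possible: int) -> float:
    return num_edges / num_possible if num_possible else 0.0


def cluster_densities(nodes: Set[str], links: Set[FrozenSet[str]], cluster: Iterable[str]):
    """Return (intra-cluster density, inter-cluster density) for a candidate cluster."""
    cluster = set(cluster)
    outside = set(nodes) - cluster
    intra = sum(1 for link in links if link <= cluster)            # both ends inside
    inter = sum(1 for link in links if len(link & cluster) == 1)   # exactly one end inside
    n = len(cluster)
    return (
        edge_density(intra, n * (n - 1) // 2),
        edge_density(inter, n * len(outside)),
    )
```

A candidate set of alters matches the definition of a cluster above when the first returned value exceeds the second.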


Appendix B

Questionnaire

[Questionnaire results table. Only the question texts are legible in this copy; the response options and percentages are not recoverable.]

Which social networking services are you subscribed to?

How often do you visit such websites?

What is your main motivation to visit these sites?

Are you able to split your circle of friends into different groups? Which?

Are you using the web service Doodle.ch?

Would you think of using Doodle on your mobile phone?

Group-based Communication using a Mobile Phone

Are you using the conference call function of your mobile phone? If yes, for what purposes?

Are you sending an SMS to multiple contacts? If yes, for what purposes?

Are you using the built-in grouping functionality in your phone’s address book?

Are you using your mobile phone for scheduling meetings? If yes, how many people are typically involved?

Mobile Data Communication

Are you sending e-mails from your mobile phone?

Do you visit mobile websites with your mobile phone? If yes, which ones?


Appendix C

Task Description

Prof. Roger Wattenhofer
phone +41 1 632 6312
fax +41 1 632 [email protected]

Master Thesis “Mobile Social Networking”

This document describes the context and the objectives of the master thesis of Martin Wirz. The thesis will be jointly conducted by ETH and Swisscom AG.

Changes in the actual work are possible provided that both parties agree.

Context Social networking services (SNS) such as Facebook, LinkedIn and many others are becoming ubiquitous on the internet. However, at the current stage, there is little usage of SNS on mobile devices due to device and user interface constraints. The main SNS on mobile is still the address book application coupled with SMS. In 2008, we recognize the first signs of a shift in the mobile ecosystem as mobile phone operating systems become open and more user friendly, such as with Google Android, WebKit browser technologies, Apple’s UI designs or Linux-based software stacks.

Through this change and with the newly available transmission speeds, traditional barriers are torn down and it becomes possible to implement mobile services with a good user experience. This master thesis shall focus on mobile social networking services and leverage the new opportunities from the recent developments.

Objectives The objective of this master thesis is to come up with novel features in the area of social mobile networking, and to develop a mobile social networking engine into which these features can be incorporated. In the end, the system has to be evaluated and tested with real users.

• Mobile Social Networking Engine
Identify the major components required for a next-generation social networking system running on mobile platforms. Where should these components reside and who should be in control of the data? Is a decentralized approach feasible?
Functionality of the engine could include

– Optimized use of state-of-the-art mobile platforms and existing mobile networking infrastructure

– Establish relationships with friends and share data (e.g. profiles)

– Support grouping or tagging of persons, relationships and other elements

– Provide fine-grained permission control to data accessible by other persons and groups of persons

– Push and request functionality for data transfer (e.g. location request)

– Offer automation to master today’s information overload, such as by defining rules on events

• Applications
What applications and services are most appealing and promising on mobile devices? Example applications could include:

– Address book replacement by user profiles (similar to social network sites)

– Location based services and navigation

– Calendar and poll functionality such as Doodle

– Search functionality

Based on an analysis, use cases will be defined and applications will be implemented on top of the engine.

• Use Cases
To verify the feasibility and the acceptance of such services, an in-depth study based on subject questioning and behavior analysis will be performed. Therefore, use cases and corresponding metrics will be defined and measured using a test user group.

Work Plan

• Introduction and state of the art [2W] What exists on the market already, what knowledge is available at ETH and Swisscom? What are the new opportunities with the open mobile operating systems?

• Concept [3W] What does the mobile social engine have to offer? Which features do we focus on? What services and applications do we want to deploy? What use cases do we envision for the test users?

• Detail Design [4W] How shall the implementation be done? Which platform and which frameworks should be used? How can the platform be extended and its components be reused in the future?

• Implementation [9W] Provide a stable demonstrator to be used by anyone (if possible, work together with UI designers).

• Test [5W] Provide the service to a test user group of 10-20 people. Do qualitative feedback rounds on the use cases (acceptance of users, feature requests) and analyse usage according to predefined metrics.

• Final report and presentation [4W]


Martin’s Duties

• One meeting per week with the advisers at Swisscom.

• Weekly status updates by mail to the adviser at ETH (Michael).

• An introduction presentation at ETH (after approximately 2 months).

• A final presentation to be held once at ETH and once at Swisscom premises.

• A final report, presenting work and results. This report should also include a critical review of the work.

• Two copies of the report (each containing a CD with relevant code, etc.) should be handed in at the end.

General Remarks

• While the master candidate will gather input through discussions with Swisscom (social networking experts, usability professionals, computer scientists), it is his own and independent work.

• Swisscom provides a working place including all necessary hardware and software infrastructure at the Swisscom tower at Ostermundigenstrasse 93, 3006 Bern.

• Generally, the candidate works on his thesis at Swisscom premises. It is however possible to work at remote locations for up to 20% of the time.

• The candidate provides weekly status reports to ETH and Swisscom.

• All results have to be publishable.

• Any data (usage data, measurements, etc.) acquired during the project is made available to both parties (Swisscom and ETH) for further use.

• The software (including code) developed during the thesis is made available to both parties (Swisscom and ETH) for further use. For ETH only academic (non-commercial) use is allowed.

• Duration: March 1, 2008 − August 31, 2008

Contact Persons

• Michael Kuhn: [email protected], ETZ G61.4, phone 044 632 77 30

• Reto Grob, Swisscom AG: [email protected], phone 079 770 4267

• Rolf von Behrens, Swisscom AG: [email protected], phone 079 200 7983

• Roger Wattenhofer: [email protected], ETZ G63, phone 044 632 63 12

Bibliography

[1] G. Cselle. Organizing email. Master’s thesis, ETH Zurich, 2006.

[2] S. Farnham, W. Portnoy, A. Turski, L. Cheng, and D. Vronay. Personal Map:Automatically Modeling the User’s Online Social Network. Proceedings of INTER-ACT’03, pages 567–574, 2003.

[3] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence througha social network. Proceedings of the ninth ACM SIGKDD international conferenceon Knowledge discovery and data mining, pages 137–146, 2003.

[4] M. Richardson and P. Domingos. Mining knowledge-sharing sites for viral market-ing. Proceedings of the eighth ACM SIGKDD international conference on Knowl-edge discovery and data mining, pages 61–70, 2002.

[5] J.J. Xu and H. Chen. Fighting organized crimes: using shortest-path algorithms toidentify associations in criminal networks. Decision Support Systems, 38(3):473–487, 2004.

[6] S. Gregory. An Algorithm to Find Overlapping Community Structure in Networks.LECTURE NOTES IN COMPUTER SCIENCE, 4702:91, 2007.

[7] R. Agrawal, S. Rajagopalan, R. Srikant, and Y. Xu. Mining newsgroups using net-works arising from social behavior. Proceedings of the 12th international conferenceon World Wide Web, pages 529–535, 2003.

[8] MEJ Newman. Modularity and community structure in networks. Proceedings ofthe National Academy of Sciences, 103(23):8577–8582, 2006.

[9] M. Girvan and MEJ Newman. Community structure in social and biological net-works. Proceedings of the National Academy of Sciences, 99(12):7821, 2002.

[10] J. Baumes, M. Goldberg, MS Krishnamoorthy, M. Magdon-Ismail, and N. Preston.Finding communities by clustering a graph into overlapping subgraphs. Proceedingsof the IADIS International Conference on Applied Computing, pages 97–104, 2005.

[11] J. Baumes, M. Goldberg, and M. Magdon-Ismail. Efficient identification of over-lapping communities. IEEE International Conference on Intelligence and SecurityInformatics (ISI), pages 27–36, 2005.

[12] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek. Uncovering the overlapping com-munity structure of complex networks in nature and society. Arxiv preprintphysics/0506133, 2005.

78 Bibliography

[13] B. Adamcsek, G. Palla, I.J. Farkas, I. Derenyi, and T. Vicsek. CFinder: locatingcliques and overlapping modules in biological networks, 2006.

[14] V.E. Krebs. Mapping networks of terrorist cells. Connections, 24(3):43–52, 2002.

[15] J. Baumes, M. Goldberg, M. Hayvanovych, M. Magdon-Ismail, W. Wallace, andM. Zaki. Finding Hidden Group Structure in a Stream of Communications. LEC-TURE NOTES IN COMPUTER SCIENCE, 3975:201, 2006.

[16] S. Lo and C. Lin. WMR–A Graph-Based Algorithm for Friend Recommendation.

[17] U. Brandes. A faster algorithm for betweenness centrality. Journal of MathematicalSociology, 25(2):163–177, 2001.

[18] G. Sabidussi. The centrality index of a graph. Psychometrika, 31(4):581–603, 1966.

[19] L.C. Freeman. A set of measures of centrality based on betweenness. Sociometry,40(1):35–41, 1977.

[20] U. Brandes, M. Gaertler, and D. Wagner. Experiments on graph clustering al-gorithms. Proceedings of the 11th Annual European Symposium on Algorithms(ESAS03), pages 568–579, 2003.

[21] MEJ Newman. Finding community structure in networks using the eigenvectors ofmatrices. Physical Review E, 74(3):36104, 2006.

[22] MEJ Newman and M. Girvan. Finding and evaluating community structure innetworks. Physical Review E, 69(2):26113, 2004.

[23] S. Gregory. A Fast Algorithm to Find Overlapping Community Structure in Net-works.

[24] W. Aiello, F. Chung, and L. Lu. A random graph model for massive graphs.Proceedings of the thirty-second annual ACM symposium on Theory of computing,pages 171–180, 2000.

[25] A. Fred and A.K. Jain. Robust data clustering. Proc. Conference on ComputerVision and Pattern Recognition, CVPR, 2003.

[26] W.M. Rand. Objective criteria for the evaluation of clustering methods. Journalof the American Statistical Association, 66(336):846–850, 1971.

[27] C. Haythornthwaite. Social networks and Internet connectivity effects. Information,Communication & Society, 8(2):125–147, 2005.

[28] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications.Cambridge University Press, 1994.

[29] J. Resig, S. Dawara, C.M. Homan, and A. Teredesai. Extracting social networksfrom instant messaging populations. Workshop on link analysis and group detec-tion, Knowledge Discovery in Databases, August, 2004.

[30] K. Carley. A theory of group stability. American Sociological Review, 56(3):331–354, 1991.

Bibliography 79

[31] L.A. Adamic and E. Adar. Friends and neighbors on the Web. Social Networks,25(3):211–230, 2003.

[32] C. Faloutsos, K.S. McCurley, and A. Tomkins. Connection subgraphs in socialnetworks. SIAM International Conference on Data Mining, Workshop on LinkAnalysis, Counterterrorism and Security, 2004.

[33] H.Chalupsky Shou-de Lin. Unsupervised Link Discovery in Multi-relational Datavia Rarity Analysis. 2003.

[34] M.F. Schwartz and D.C.M. Wood. Discovering shared interests using graph anal-ysis. Communications of the ACM, 36(8):78–89, 1993.

[35] H.W. Lauw, E.P. Lim, T.T. Tan, and H.H. Pang. Mining Social Network fromSpatio-Temporal Events. Workshop on Link Analysis, Counterterrorism and Se-curity, in conjunction with SIAM Data Mining Conference, 2005.

[36] Y. Wang, E.P. Lim, and S.Y. Hwang. On Mining Group Patterns of Mobile Users.
