STARTING EXPLORING MOBILE PHONE DATA IN THE SANDBOX Pilar Rey del Castillo

Dec 21, 2015



Page 2

Mobile phone data in the Sandbox

• Special case: only since October 2014
• Limited information provided in the dataset
• Still very interesting to analyse
  – Sensors of human and social behaviour (location...)
  – Example of the requirements of the exploratory step, compared with other types of data in the Sandbox
  – Aim: describe the initial steps in attempting to produce meaningful results for statistical purposes

Page 3

Location or positioning data

• Concept in the mobile phone and statistics context
• User assigned to a number of neighbouring antennas for load-balancing reasons
• Types
  – Active
  – Passive: Call Detail Records (CDRs)...

Passive location: occasional samples of the approximate locations of the phone's user
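As a minimal illustration of passive location, the Python sketch below treats each CDR as one observation of the antenna serving an outgoing call or SMS and turns a user's CDRs into a time-ordered list of approximate positions. The record fields (`user_id`, `timestamp`, `antenna_id`) and the antenna coordinate lookup are assumptions for illustration only; they are not the layout of any particular operator's files.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]


@dataclass(frozen=True)
class CDR:
    """Hypothetical CDR layout: one outgoing call or SMS and its serving antenna."""
    user_id: str
    timestamp: datetime
    antenna_id: str


def passive_location_samples(
    cdrs: List[CDR], antenna_coords: Dict[str, LatLon]
) -> Dict[str, List[Tuple[datetime, LatLon]]]:
    """Turn CDRs into time-ordered, approximate location samples per user.

    A position is observed only when the user generates a CDR, and only at
    the precision of the serving antenna. CDRs whose antenna has no known
    coordinates are skipped.
    """
    samples: Dict[str, List[Tuple[datetime, LatLon]]] = defaultdict(list)
    for cdr in cdrs:
        coords = antenna_coords.get(cdr.antenna_id)
        if coords is not None:
            samples[cdr.user_id].append((cdr.timestamp, coords))
    for user_samples in samples.values():
        user_samples.sort(key=lambda sample: sample[0])  # chronological order
    return dict(samples)
```

Because a sample exists only when the user generates a CDR, the spacing between samples follows calling activity rather than movement.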

Page 4

Mobile phone datasets (1)

• D4D Challenge: Orange's “Data for Development” in Ivory Coast

• Anonymised Call Detail Records (CDRs) of outgoing phone calls and SMS exchanges
  – Orange's customers in Ivory Coast
  – Between December 1, 2011 and April 28, 2012 (150 days, 5 months)

• Sandbox IT infrastructure: a perfect fit

Page 5

Mobile phone datasets (2)

• Total antenna-to-antenna traffic on an hourly basis (5 million customers)

• Individual trajectories for 50,000 customers over two-week time windows
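Because the antenna-to-antenna table is already aggregated over customers, it can be rolled up further without handling individual records. The Python sketch below, an illustration only, sums hourly call counts into daily flows between groups of antennas (for instance the county x urbanization sections described later). The field names and the `antenna_to_section` lookup are hypothetical and do not reproduce the actual D4D file layout.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)
class HourlyTraffic:
    """Hypothetical row of the antenna-to-antenna table (one hour, one pair)."""
    hour: datetime  # start of the hourly slot
    origin_antenna: str
    destination_antenna: str
    n_calls: int


def daily_section_flows(
    rows: Iterable[HourlyTraffic], antenna_to_section: Dict[str, str]
) -> Counter:
    """Sum call counts per (origin section, destination section, day).

    Rows involving an antenna without a known section are dropped.
    """
    flows: Counter = Counter()
    for row in rows:
        origin = antenna_to_section.get(row.origin_antenna)
        destination = antenna_to_section.get(row.destination_antenna)
        if origin is not None and destination is not None:
            flows[(origin, destination, row.hour.date())] += row.n_calls
    return flows
```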

Page 6

Literature exploiting location

• Supplementary information at the micro level (ground truth)
  – Lausanne Data Collection Campaign (Nokia, 2009-2011)
  – Reality Mining Project (MIT, 2004-2005)
  – Ad hoc experiments, conducting surveys…: Isaacman et al. (2011), de Oliveira et al. (2011)
  – …

• Just CDRs: assumptions on the users' behaviour…
  – Orange Data Challenges (Ivory Coast, Senegal)
  – Järv et al. (Estonia, 2012)
  – Kung et al. (Portugal, Ivory Coast, Saudi Arabia, Boston, Milan; 2014)
  – …

Page 7

Ivory Coast data

• Positioning data; our aim: figures on people's home -> work commuting
• Way to proceed: obtain results under certain assumptions and compare them
• First assumptions
  – Orange's customers represent the population (96 mobile subscriptions per 100 inhabitants, 2013)
  – The mobility behaviour of the sample of 50,000 customers is representative (to be assessed later)

Page 8

2nd step: model to draw meaningful information

• Problem of oscillations (a user's records switching between neighbouring antennas): antennas aggregated by section, where section = county x urbanization, giving 157 sections

• Problem of giving meaning to a user's location: daily and weekly patterns of use as discriminative features
  – Isaacman et al. (2011):
    • home: weekends, plus weekdays between 7 pm and 7 am
    • work: weekdays between 1 pm and 5 pm
  – Kung et al. (2014):
    • home: weekdays between 8 pm and 8 am
    • work: weekdays between 8 am and 8 pm

Apart from other sophisticated filtering…
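The two sets of time windows above can be turned into a simple home/work detector. The Python sketch below first maps each antenna to its section (the `antenna_to_section` table is a hypothetical input), which absorbs oscillations between neighbouring antennas of the same section, then labels each event with one of the two rules and takes the most frequent section within the home and work windows. It is a minimal sketch under these assumptions: none of the additional filtering used in the cited papers is included, and ties are broken by first occurrence.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Callable, Dict, Iterable, Optional, Tuple

Event = Tuple[str, datetime, str]  # (user_id, timestamp, antenna_id)
Rule = Callable[[datetime], Optional[str]]  # returns "home", "work" or None


def isaacman_rule(ts: datetime) -> Optional[str]:
    """Home: weekends, plus weekdays 7 pm-7 am. Work: weekdays 1 pm-5 pm."""
    if ts.weekday() >= 5 or ts.hour >= 19 or ts.hour < 7:  # Sat=5, Sun=6
        return "home"
    if 13 <= ts.hour < 17:
        return "work"
    return None  # weekday hours outside both windows are not used


def kung_rule(ts: datetime) -> Optional[str]:
    """Home: weekdays 8 pm-8 am. Work: weekdays 8 am-8 pm. Weekends unused."""
    if ts.weekday() >= 5:
        return None
    return "home" if ts.hour >= 20 or ts.hour < 8 else "work"


def locate_home_work(
    events: Iterable[Event],
    antenna_to_section: Dict[str, str],
    rule: Rule,
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {user_id: (home_section, work_section)}, None where not found."""
    counts: Dict[str, Dict[str, Counter]] = defaultdict(
        lambda: {"home": Counter(), "work": Counter()}
    )
    for user_id, ts, antenna_id in events:
        section = antenna_to_section.get(antenna_id)  # aggregate antennas
        label = rule(ts)
        if section is None or label is None:
            continue
        counts[user_id][label][section] += 1

    def mode(counter: Counter) -> Optional[str]:
        # Most frequent section; Counter keeps first-seen order among ties.
        return counter.most_common(1)[0][0] if counter else None

    return {
        user_id: (mode(c["home"]), mode(c["work"]))
        for user_id, c in counts.items()
    }
```

Passing either `isaacman_rule` or `kung_rule` to the same function makes it easy to compare how sensitive the located home and work sections are to the choice of time windows.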


Page 10

Commuting in Ivory Coast

• Sample of 50,000 customers: 51% in cluster 1, 28% in cluster 2, 21% in cluster 3

• Home -> work locations identified for almost 50% of the sample, used to estimate a cross-tabulation of commuting between Ivory Coast sections
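Once home and work sections are available for each user, the commuting cross-tabulation reduces to counting home -> work pairs and expressing each row as a percentage of the located users living in that home section; the table of main commutes appears to follow this convention, since the shown columns of each row add up to its "% Shown" value. The Python sketch below assumes hypothetical `(home_section, work_section)` pairs per user, with `None` where a location could not be determined, and also reports the share of the sample for which both locations were found.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

HomeWork = Tuple[Optional[str], Optional[str]]


def commuting_crosstab(
    home_work: Dict[str, HomeWork], sample_size: int
) -> Tuple[float, Dict[str, Dict[str, float]]]:
    """Return (% of sample located, {home: {work: % of that home's users}})."""
    pairs = [(h, w) for h, w in home_work.values() if h is not None and w is not None]
    share_located = 100.0 * len(pairs) / sample_size if sample_size else 0.0

    counts: Dict[str, Counter] = defaultdict(Counter)
    for home, work in pairs:
        counts[home][work] += 1

    # Row percentages: each home section's users sum to 100%.
    row_pct: Dict[str, Dict[str, float]] = {}
    for home, row in counts.items():
        total = sum(row.values())
        row_pct[home] = {work: 100.0 * n / total for work, n in row.items()}
    return share_located, row_pct


def percent_shown(row: Dict[str, float], shown_columns: Iterable[str]) -> float:
    """Sum of a row's percentages over the work sections displayed in a table."""
    return sum(row.get(col, 0.0) for col in shown_columns)
```

Because each row is normalised by the number of located users in that home section, rows based on very few users can only take coarse values.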

Page 11

Main commutes (%), home -> work, between sections

Home \ Work | Abidjan_1 | Bouafle_3 | Korhogo_0 | Bandama Sud_0 | Basso Sassandra_0 | Sinfra_1 | Touba_1 | % Shown
Abidjan_0 | 88.4 | 0.2 | 0.0 | 0.4 | 1.7 | 0.2 | 0.0 | 90.9
Abidjan_3 | 85.8 | 0.2 | 0.0 | 0.5 | 0.8 | 0.2 | 0.0 | 87.5
Bongouanou_0 | 54.1 | 0.0 | 0.0 | 0.5 | 6.0 | 1.6 | 0.5 | 62.8
Bongouanou_3 | 66.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 68.0
Dabou_0 | 87.5 | 0.0 | 0.0 | 0.0 | 12.5 | 0.0 | 0.0 | 100.0
Ferkessedougou_3 | 65.5 | 0.0 | 0.0 | 0.0 | 1.7 | 0.0 | 0.0 | 67.2
Jacqueville_0 | 70.6 | 0.0 | 0.0 | 5.9 | 0.0 | 0.0 | 0.0 | 76.5
Jomoro_1 | 35.7 | 0.0 | 7.1 | 0.0 | 7.1 | 0.0 | 0.0 | 50.0
Korhogo_3 | 57.1 | 14.3 | 14.3 | 0.0 | 0.0 | 14.3 | 0.0 | 100.0
Niakaramandougou_0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 100.0
Odienne_3 | 66.7 | 0.0 | 0.0 | 0.0 | 33.3 | 0.0 | 0.0 | 100.0
Laghi_0 | 55.0 | 0.3 | 0.0 | 1.9 | 5.8 | 0.0 | 0.0 | 63.1
Fromager_0 | 60.6 | 0.9 | 0.0 | 0.9 | 6.3 | 0.0 | 0.0 | 68.8
Seguela_0 | 33.3 | 33.3 | 0.0 | 33.3 | 0.0 | 0.0 | 0.0 | 100.0
Tengrela_3 | 50.0 | 0.0 | 0.0 | 0.0 | 11.1 | 0.0 | 0.0 | 61.1
Touba_3 | 33.3 | 0.0 | 0.0 | 0.0 | 22.2 | 0.0 | 22.2 | 77.8

Page 12

Final remarks

• CDRs are a useful tool to learn and test new methods (although no reliable figures were produced)

• Only a portion of the possible ways to exploit CDRs has been explored: a promising source (more research needed)

• Another possible research strand: develop an "OfficialStatistics" app for smartphones to gather ground truth

Page 15

References

• de Oliveira, R., Karatzoglou, A., Cerezo, P. C., de Vicuña, A. A. L. and Oliver, N. (2011), “Towards a psychographic user model from mobile phone usage”, in Desney S. Tan, Saleema Amershi, Bo Begole, Wendy A. Kellogg and Manas Tungare (eds.), CHI Extended Abstracts, ACM.

• Isaacman, S., Becker, R., Cáceres, R., Kobourov, S., Martonosi, M., Rowland, J. and Varshavsky, A. (2011), “Identifying Important Places in People’s Lives from Cellular Network Data”, Lecture Notes in Computer Science, Vol. 6696, pp. 133-151.

• Järv, O., Ahas, R., Saluveer, E., Derudder, B. and Witlox, F. (2012), “Mobile Phones in a Traffic Flow: A Geographical Perspective to Evening Rush Hour Traffic Analysis Using Call Detail Records”, PLoS ONE 7(11), http://dx.plos.org/10.1371/journal.pone.0049171

• Kung, K. S., Greco, K., Sobolevsky, S. and Ratti, C. (2014), “Exploring Universal Patterns in Human Home-Work Commuting from Mobile Phone Data”, PLoS ONE 9(6): e96180, doi:10.1371/journal.pone.0096180
