Analysis of Call Detail Records Based on Scilab
Foued Melakessou, University of Luxembourg
SCILABTEC 2015, Friday 22 May
Jul 29, 2015
SCILABTEC 2015 Analysis of Call Detail Records Based on Scilab
Outline
- Project Description
  - Problem Statement
  - Big Data Analysis
  - Population Mobility Modeling
- D4D Challenge (2015 Senegal)
- Scilab Contribution: NARVAL
- Conclusion & Perspective
- FNR Core Project (April 2014 to March 2017)
- The MAMBA (MultimodAl MoBility Assistance) project aims to propose and validate a multimodal mobility platform that relies on new Internet technologies to interconnect different mobile services. The goal is to provide relevant travel advice based on users' contexts, so as to optimize overall system performance:
  - Real-time traffic conditions
  - Status of existing public transport services (e.g., buses, trains)
  - User preferences
- Analysis of large mobility datasets (e.g., mobile phone traces)
- Influence users' itineraries by suggesting new multimodal routes based on their preferences
MAMBA Project
- Transportation research: human mobility
  - Urban planning
  - Traffic forecasting
  - Resource management
- Evaluating mobility models helps to design and develop future infrastructure that better supports the actual demand
- Lack of tools to monitor the time-resolved location of individuals
  - Observation studies
  - Forms: population census
- Call Detail Records (CDRs), generated by mobile phone operators, can be used to retrieve the mobility patterns of the population under study
  - Extraction of realistic mobility models adapted to the Senegal use case
Problem Statement
- D4D Senegal is an open innovation challenge on ICT Big Data for the purposes of societal development
  - NETMOB'15, Media Lab, MIT, April 8-10, 2015
- Sonatel and the Orange Group are making anonymous data, extracted from the mobile network in Senegal, available to international research laboratories
- 5 priority subject matters: health, agriculture, transport/urban planning, energy and national statistics
  - Advancing research in the field of Big Data (anonymisation, data mining, visualization and cross-referencing)
  - Involving local stakeholders and guaranteeing benefits in education and development of the ecosystem of local start-ups
  - Advancing anonymisation techniques to allow sharing of data that is relevant for society while respecting privacy
Data 4 Development Challenge: Senegal
- Sonatel/Orange (Société nationale des télécommunications) is the main mobile telecommunication operator
- Anonymous call patterns of Orange's mobile phone users in Senegal were released for the 2015 D4D challenge: phone calls and text exchanges between 9M Orange customers in 2013
- 1666 base stations: antenna ID, Lat, Lon
- SET1 (53.8GB): voice and text traffic (#Call, TCall and #Text) for each month (antenna-to-antenna traffic on an hourly basis)
  - 2013-01-01 00,1,1,1,54
  - 2013-01-01 00,1,2,1,39
- SET2 (38.4GB): fine-grained mobility data on a rolling 2-week basis (trajectories of 300000 randomly sampled users)
  - 1,2013-01-07 13:10:00,461
  - 1,2013-01-07 17:20:00,454
- SET3 (16.5GB): coarse-grained mobility data, month by month, at the district level (trajectories of 150000 randomly sampled users)
  - 37509, 2013-01-29 15:00:00,3
  - 84009, 2013-01-14 07:00:00,3
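The sample records above can be read with a few lines of Python. This is only a sketch: the slides do not name the columns, so the field names below (`src_antenna`, `dst_antenna`, `n_calls`, `total_duration`, `user_id`, `antenna_id`) are assumptions inferred from the sample lines and from the description of each set.

```python
from datetime import datetime
from typing import NamedTuple


class Set1Record(NamedTuple):
    hour: datetime        # start of the one-hour slot
    src_antenna: int      # assumed: originating antenna ID
    dst_antenna: int      # assumed: receiving antenna ID
    n_calls: int          # assumed: number of calls (#Call)
    total_duration: int   # assumed: total call duration (TCall)


class Set2Record(NamedTuple):
    user_id: int          # assumed: anonymised user ID
    timestamp: datetime
    antenna_id: int       # assumed: antenna where the activity was seen


def parse_set1(line: str) -> Set1Record:
    """Parse one antenna-to-antenna voice line such as '2013-01-01 00,1,2,1,39'."""
    stamp, src, dst, n_calls, duration = (f.strip() for f in line.split(","))
    return Set1Record(datetime.strptime(stamp, "%Y-%m-%d %H"),
                      int(src), int(dst), int(n_calls), int(duration))


def parse_set2(line: str) -> Set2Record:
    """Parse one trajectory line such as '1,2013-01-07 13:10:00,461'."""
    user, stamp, antenna = (f.strip() for f in line.split(","))
    return Set2Record(int(user),
                      datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"),
                      int(antenna))
```

The SET3 samples have the same three-field shape, so `parse_set2` also reads them if the last field is interpreted as a district ID rather than an antenna ID.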
Orange/Sonatel Call Detail Records
[Figure: map of Senegal (latitude vs. longitude) showing the antennas and the population of each region; the population legend spans 0.15M to 3.14M.]
Regions: 1 Dakar, 2 Thies, 3 Diourbel, 4 Fatick, 5 Louga, 6 Kaolack, 7 Kaffrine, 8 Saint-Louis, 9 Kolda, 10 Sedhiou, 11 Ziguinchor, 12 Kedougou, 13 Tambacounda, 14 Matam
- Human activities heavily influence the pattern of calls and messages sent and received during each period of the day
  - Model the daily traffic profile supported by each base station (aggregation level = 1h)
  - Traffic variables: number of calls #Call, duration of calls TCall and number of messages #Text
- Outlier removal at 2σ
  - Normal traffic behavior
  - Detection of traffic anomalies (cultural or sporting events)
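One way to read the 2σ outlier removal is sketched below in Python. It assumes that, for each hour slot, observations further than two standard deviations from the slot mean are discarded before averaging; the slides do not spell out the exact rule, and the function names are illustrative.

```python
from statistics import mean, pstdev


def trimmed_mean(values, n_sigma=2.0):
    """Mean of the values lying within n_sigma standard deviations of the raw mean."""
    mu = mean(values)
    sigma = pstdev(values)
    # With sigma == 0 every value equals mu, so nothing is discarded.
    kept = [v for v in values if abs(v - mu) <= n_sigma * sigma]
    return mean(kept)


def daily_profile(samples_by_hour, n_sigma=2.0):
    """Build a 24-value profile.

    samples_by_hour[h] holds one observation per day for hour slot h (0..23),
    e.g. the number of calls seen at one antenna during that hour.
    """
    return [trimmed_mean(obs, n_sigma) for obs in samples_by_hour]
```

The observations rejected by the filter are the candidate anomalies; the remaining ones describe the normal behavior of the antenna.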
Daily Profile Model
[Figure: six panels (Voice NS, Voice DS, Text S, Voice ND, Voice DD, Text D) plotting #Call, TCall and #Text against T.]
- P_X(a) = {X_1, X_2, X_3, …, X_24}
  - X_i is the expected value of the traffic variable X collected at antenna a during the i-th one-hour time slot
- E(a) = {E_1, E_2, …, E_e}
  - List of extrema computed with a gradient algorithm
  - Succession of local minima and maxima
- NaN, a statistics and machine learning toolbox for Scilab focusing on data with and without missing values encoded as NaNs
- Tmax_f: time when the first local maximum Nmax_f occurs (beginning of diurnal activities)
- Tmax_l: time when the last local maximum Nmax_l occurs (end of diurnal activities)
- M (respectively S) is the average traffic (respectively the standard deviation) of the diurnal activities
- Ne: number of extrema
- Tmin: time when the global minimum Nmin occurs
- Classification based on the k-means algorithm to find a set of maximally disjoint clusters (k=4)
  - Urban (Class 1)
  - Suburban (Classes 2 and 3)
  - Rural (Class 4)
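The feature extraction and clustering described above can be sketched in plain Python. This is not the NARVAL or NaN toolbox code: the extrema are found from sign changes of the discrete gradient, plateaus are ignored, "diurnal activities" is taken to mean the slots between the first and last local maxima, and the k-means routine is a basic Lloyd iteration written for illustration. Hour indices are 0-based here, whereas the slide counts slots from 1.

```python
import random
from statistics import mean, pstdev


def local_extrema(profile):
    """Return (slot, value, kind) wherever the discrete gradient changes sign."""
    extrema = []
    for i in range(1, len(profile) - 1):
        left = profile[i] - profile[i - 1]
        right = profile[i + 1] - profile[i]
        if left > 0 and right < 0:
            extrema.append((i, profile[i], "max"))
        elif left < 0 and right > 0:
            extrema.append((i, profile[i], "min"))
    return extrema


def profile_features(profile):
    """Feature vector [Tmax_f, Tmax_l, M, S, Ne, Tmin] for one daily profile."""
    extrema = local_extrema(profile)
    slots = range(len(profile))
    # Fall back to the global maximum if no interior local maximum exists.
    maxima = [i for i, _, kind in extrema if kind == "max"] or \
             [max(slots, key=profile.__getitem__)]
    tmax_f, tmax_l = maxima[0], maxima[-1]
    diurnal = profile[tmax_f:tmax_l + 1]
    tmin = min(slots, key=profile.__getitem__)
    return [tmax_f, tmax_l, mean(diurnal), pstdev(diurnal), len(extrema), tmin]


def kmeans(points, k, iterations=50, seed=0):
    """Basic Lloyd iteration; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid in squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((x - y) ** 2
                                        for x, y in zip(p, centroids[c])))
                  for p in points]
        # Update step: an emptied cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels, centroids
```

With one feature vector per antenna, `kmeans(features, k=4)` would give the four classes. The features live on different scales (hours, traffic counts), so rescaling them before clustering is a choice worth considering; the slides do not say whether this was done.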
K-Means Classification
[Figure: daily profile N versus T (hours, 2 to 24), with the first maximum (First MAX) marked at Tmax_f.]
- First and largest peak centered at 12AM
- Diurnal activities start at 08AM and finish at 11PM
- Rural population represents 55% of the total population (14M)
  - Highest population density in the area of Dakar, the capital of Senegal
#Call
[Figure: #Call versus T (hours) for each class (Class 0 to Class 4).]
- High activity during the night (08PM to 02AM)
- Young population
  - The median age is 18.4 years
  - 60% of the population is less than 24 years old
#Text
[Figure: #Text versus T (hours) for each class (Class 0 to Class 4).]
- Mobile phone activity between the first and last peaks
  - Mean
  - Standard deviation
- Local vs. distant calls
Geographical distribution of Caller/Callee activities (#Call)
- Level of aggregation: 1 day
- Detection of local anomalies
- Correlation between #Call and TCall
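Both steps can be illustrated for a single antenna. The sketch below assumes that a day is anomalous when its traffic deviates from the yearly mean by more than a multiple of the standard deviation (the accompanying plots mark σ and 2σ levels), and uses the Pearson coefficient for the #Call/TCall correlation; the slides do not state which correlation measure was used.

```python
from statistics import mean, pstdev


def anomalous_days(daily_values, n_sigma=2.0):
    """Return the 1-based day numbers whose value is more than n_sigma
    standard deviations away from the mean of the series."""
    mu, sigma = mean(daily_values), pstdev(daily_values)
    if sigma == 0:
        return []  # a constant series has no anomalies
    return [day for day, v in enumerate(daily_values, start=1)
            if abs(v - mu) > n_sigma * sigma]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series.

    Raises ZeroDivisionError if either series is constant.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```

For one antenna, `anomalous_days(calls_per_day)` lists the candidate event days and `pearson(calls_per_day, duration_per_day)` measures how closely the two voice variables move together.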
Local Traffic Anomaly Detection
[Figure: daily #Call, TCall and #Text for Antenna 1 versus day of the year, with σ and 2σ levels marked.]
- Level of aggregation: 1 day
- Number of antennas that present an anomaly on the same day
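The national indicator can be sketched as a per-day count over all antennas. As before, the per-antenna rule (deviation from the mean above 2σ) is an assumption, and the input layout `{antenna_id: [value for day 1, value for day 2, ...]}` is hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def antennas_per_anomalous_day(daily_by_antenna, n_sigma=2.0):
    """Count, for each day, how many antennas show an anomaly on that day.

    daily_by_antenna maps an antenna ID to its list of daily values
    (index 0 is day 1). Days with no anomalous antenna are absent
    from the returned Counter.
    """
    counts = Counter()
    for series in daily_by_antenna.values():
        mu, sigma = mean(series), pstdev(series)
        if sigma == 0:
            continue  # constant series: nothing to flag
        for day, value in enumerate(series, start=1):
            if abs(value - mu) > n_sigma * sigma:
                counts[day] += 1
    return counts
```

Days on which many antennas are flagged at once, as returned by `counts.most_common()`, are the candidates to compare with the calendar of national events.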
National Traffic Anomaly Detection
Date (2013)                           Event
1 January                             New Year
24 January                            Mawlid (Prophet's birthday)
31 March - 1 April                    Easter
4 April                               Independence Day
9 July - 7 August                     Ramadan
4 August                              Laylat Al Qadr
8 August                              Aid El Fitr
15 August                             Assumption
15 October                            Aid El Kebir
13-14 November                        Tamkharit (Achoura)
15 December to beginning of January   Magal of Touba
25 December                           Christmas
- On 1 August 2013, the new highway between Dakar and Diamniadio was inaugurated
Specific Event Analysis: Highway Inauguration
[Figure: daily #Call, TCall and #Text versus day of the year for Antenna 237 and Antenna 397.]
- 67 antennas located near the new highway
- Computation of the traffic growth after the inauguration
  - #Call: +15%
  - TCall: +23%
  - #Text: +29%
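The slides do not define how the growth is computed. A plausible reading, sketched below, is the relative change of the mean daily traffic after the inauguration compared with the mean before it; this definition and the function name are assumptions.

```python
from datetime import date
from statistics import mean

# Day of the year (1-based) of the inauguration, 1 August 2013.
INAUGURATION_DAY = date(2013, 8, 1).timetuple().tm_yday


def traffic_growth(daily_values, event_day=INAUGURATION_DAY):
    """Relative change of the mean daily traffic from before to after event_day.

    daily_values[0] is day 1 of the year; the event day itself is counted
    in the "after" period. A result of 0.15 means +15%.
    """
    before = daily_values[:event_day - 1]
    after = daily_values[event_day - 1:]
    return (mean(after) - mean(before)) / mean(before)
```

Applied to each of the 67 highway antennas and to each traffic variable, this gives one growth value per antenna, which can then be averaged.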
Traffic Growth After the Opening of the New Highway
[Figure: growth (from -1 to 1) of TCall, #Call and #Text for each highway antenna, indexed by Antenna (Highway) Id.]
- Maps of primary flows
  - Mobility graphs are composed of all interconnections between antennas (respectively districts) between which users moved during the traffic collection period
  - The congestion map provides, for each antenna A, the number of users that crossed A during the traffic measurement period
  - Each congestion map presents high values near urban areas
  - Mobility graphs are highly correlated with the road network
  - Long links: when the inter-arrival time between two mobile activities is too large, there is a lack of information about the intermediate locations through which the users traveled
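A mobility graph and a congestion map can be derived from SET2-style trajectories along the following lines. The sketch assumes records of the form `(user_id, timestamp, antenna_id)`, counts one edge traversal for each pair of consecutive activities of a user at different antennas, and counts each user once per antenna in the congestion map; the names are illustrative.

```python
from collections import Counter, defaultdict


def mobility_graph(records):
    """Build (edges, congestion) from (user_id, timestamp, antenna_id) records.

    edges[(a, b)] is the number of observed moves from antenna a to antenna b.
    congestion[a] is the number of distinct users seen at antenna a.
    Timestamps only need to sort chronologically (ISO strings or datetimes).
    """
    by_user = defaultdict(list)
    for user, stamp, antenna in records:
        by_user[user].append((stamp, antenna))

    edges = Counter()
    visitors = defaultdict(set)
    for user, visits in by_user.items():
        visits.sort()  # chronological order for this user
        for _, antenna in visits:
            visitors[antenna].add(user)
        for (_, a), (_, b) in zip(visits, visits[1:]):
            if a != b:  # staying at the same antenna is not a move
                edges[(a, b)] += 1

    congestion = {antenna: len(users) for antenna, users in visitors.items()}
    return edges, congestion
```

An edge between two distant antennas corresponds to the "long links" case: the user was not observed at any intermediate location, so the edge does not describe the route actually taken.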
Mobility Demand: SET2 and SET3
- Other data sets used in this project
  - Type of data: OpenStreetMap
  - Perl scripts (automatic extraction of national boundaries, highway/primary/secondary/trunk roads)
- New Scilab module (CDR analysis) added to the NARVAL toolbox
  - Complete software environment enabling the understanding of available communication algorithms, but also the design of new schemes
  - Graph Optimization, Topology, Internet Traffic, Routing, Transmission Protocol, Route Diversity, Mobility, Security, Anonymity, Path Planning, Wireless Sensor Network, etc.
  - Target audience: academics, students, engineers and scientists
  - http://atoms.scilab.org/toolboxes/NARVAL
Scilab Contribution
NARVAL: Network Analysis and Routing eVALuation, Ver 3.0
- Daily Profile Model
  - Characterization of each base station's traffic (number of calls, duration of calls and number of text messages)
  - Classification: urban, suburban and rural modes
  - Correlation between each base station's traffic and the population of its coverage area
- Traffic Anomaly Detection
  - Analysis of national anomalies
  - Analysis of local anomalies, e.g. inauguration of the new highway between Dakar and Diamniadio
- Mobility graphs
  - Computation of mobility flows
  - Performance of congestion maps
- Future Work
  - Separation of daily profiles into 7 daily profiles
  - Working day and weekend
  - Anomaly detection with an aggregation level of 1h
Conclusion
[Figure: Antenna 266 week profiles, #Call versus time of day (hours), one curve per day from Monday to Sunday.]
- A global open innovation initiative for climate resilience
- Now is the time to use big data to fight climate change
  - World leaders will decide on a universal climate agreement at COP21, potentially the most decisive climate conference in 25 years
- COP21 (Paris, 30 November to 11 December 2015) is critical as the impacts of climate change are accelerating, and time to develop solutions is limited
  - Achieve a new international agreement on the climate, applicable to all countries, with the aim of keeping global warming below 2°C
- Analyzing anonymised, aggregated data from digital sources such as mobile phones and bank transactions can provide valuable insights into human behavior and climate risks
D4C: Data for Climate Action Challenge 2015
Questions
- Thank You!
- Acknowledgement
  - This data was made available by Orange/Sonatel within the framework of the D4D Challenge. The author would like to thank Orange and Sonatel for making these CDR datasets available.
  - The author would like to thank the National Research Fund of Luxembourg (FNR) for providing financial support through the CORE 2013 MAMBA project (C13/IS/5825301)
- Contact
  - [email protected]