Big Data Challenge COMP 41700 Seminars in Data Science

Telecom Italia Big Data Challenge

Feb 12, 2017

Page 1: Telecom Italia Big Data Challenge

Big Data Challenge
COMP 41700
Seminars in Data Science

Page 2: Telecom Italia Big Data Challenge

Summary of the presentation:

• Short introduction to the Telecom Italia Big Data Challenge – Donagh

• Summary of Paper 1 and Paper 2 – Rajesh

• Other interesting insights we can draw from this dataset – Malika

Page 3: Telecom Italia Big Data Challenge

A contest designed to stimulate the creation and development of innovative technological ideas in the Big Data field

Page 4: Telecom Italia Big Data Challenge

history

• Early 2014: Telecom Italia released the first edition, which was closed

• Its success meant that the next iteration was open

• The data is freely available for anyone to use

• https://dandelion.eu/datamine/open-big-data/

Page 5: Telecom Italia Big Data Challenge

data sets

• Geo-referenced (Milan and the Autonomous Province of Trento)

• Anonymised

• Millions of records

• November to December 2013

• Extracted from telecom records, energy, weather, public and private transport, social networks

Page 6: Telecom Italia Big Data Challenge

Milano / Trentino

• Grid

Page 7: Telecom Italia Big Data Challenge

grid

Page 8: Telecom Italia Big Data Challenge
Page 9: Telecom Italia Big Data Challenge

Milano datasets

Domain              Datasets
Telecommunications  SMS, Call, Internet; MI to Provinces; MI to MI
Weather             Weather Station Data; Precipitation
Environment         Air Quality
News                Milano Today
Social              Tweets

Page 10: Telecom Italia Big Data Challenge

tweets

• username - anonymised

• entities

• language

• municipality

• Tweet time

• geometry

Page 11: Telecom Italia Big Data Challenge

Paper 1 (Anatomy and efficiency of urban multimodal mobility)

Main Goal: To find the optimal time-respecting path between two geographic locations in the multi-modal layer

Where l(a,b) is the length of the quickest (time-respecting and minimal) trip on the network, and d(a,b) is the Euclidean distance from the origin 'a' to the destination 'b'
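To make "time-respecting" concrete: a trip is valid only if each leg departs after the previous leg has arrived. The sketch below is a minimal earliest-arrival search in the style of a connection scan over a timetable. It is an illustration only, not the method used in Paper 1; the `Connection` type, the stop names and the times are hypothetical, and every connection is assumed to have a positive duration.

```python
from typing import NamedTuple


class Connection(NamedTuple):
    """One timetabled hop between two stops (hypothetical structure)."""
    dep_stop: str
    arr_stop: str
    dep_time: int  # minutes after midnight
    arr_time: int  # minutes after midnight
    mode: str      # e.g. "bus", "metro", "rail"


def earliest_arrival(connections, origin, destination, start_time):
    """Return (arrival_time, legs) for the quickest time-respecting trip,
    or None if the destination cannot be reached."""
    infinity = float("inf")
    best = {origin: start_time}  # earliest known arrival time per stop
    came_by = {}                 # connection that gave that arrival time

    # Scan connections in order of departure time.
    for c in sorted(connections, key=lambda c: c.dep_time):
        at_stop_in_time = best.get(c.dep_stop, infinity) <= c.dep_time
        improves = c.arr_time < best.get(c.arr_stop, infinity)
        if at_stop_in_time and improves:
            best[c.arr_stop] = c.arr_time
            came_by[c.arr_stop] = c

    if destination not in best:
        return None

    # Walk back from the destination to recover the legs of the trip.
    legs = []
    stop = destination
    while stop != origin:
        c = came_by[stop]
        legs.append(c)
        stop = c.dep_stop
    legs.reverse()
    return best[destination], legs


timetable = [
    Connection("A", "B", 480, 490, "bus"),
    Connection("B", "C", 485, 500, "metro"),  # leaves before we reach B
    Connection("B", "C", 495, 510, "metro"),
    Connection("A", "C", 480, 520, "bus"),
]
arrival, legs = earliest_arrival(timetable, "A", "C", start_time=475)
print(arrival, [leg.mode for leg in legs])
```

In this toy timetable the direct bus is slower than changing to the metro at B, and the 485 metro is skipped because it departs before the bus from A arrives.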

Page 12: Telecom Italia Big Data Challenge

Rail then becomes dominant at around 40 km, and air travel dominates trips with distances of the order of 700 km. Other transportation modes play a secondary role, with peaks at 22 km for the Metro, 40 km for ferries and 70 km for coaches.

Page 13: Telecom Italia Big Data Challenge

The bus system covers most of the short trips, whereas the advantage of using the Metro and Rail systems emerges progressively for longer distances.

Page 14: Telecom Italia Big Data Challenge

The total number of stop events Omega grows proportionally with the urban area population P.

Where C(alpha) is the number of stop events in the layer 'alpha' and Delta-t is the duration of the time interval

Page 15: Telecom Italia Big Data Challenge

Paper 2 (High resolution population estimates from telecommunications data)

Data Sources:

• Telecommunications (provided by Telecom Italia)

• Census data

• Satellite images (provided by Landsat)

Main Goal: Create high-resolution (235m × 235m) population estimates in time and space

Difficulties:

• Population counts can change rapidly, which makes it hard to acquire local census estimates in a timely and accurate manner.

• The correlation coefficient between call volume and the underlying population distribution varies with time.

Page 16: Telecom Italia Big Data Challenge

Building map:

41% of the area on the map is generated directly.

To classify the remaining 59%, they train a random forest classifier using OpenStreetMap data as labelled training examples.

Page 17: Telecom Italia Big Data Challenge

Population Distribution:

Population is distributed exponentially in the beginning:

• 29% of grid-squares have zero population

• 5% of grid-squares have a population of 1

• 3% of grid-squares have a population of 2, and so on

39% of grid-squares have a population over 100

These then follow a normal distribution with a mean of 400 persons

Page 18: Telecom Italia Big Data Challenge

Telecommunications activity:

• Recorded at 10-minute intervals for each of the 235m × 235m grid cells.

• Communication activity is approximately log-normal.

• There are 5 types of communication activity: SMSIN, SMSOUT, CALLIN, CALLOUT, and INTERNET.

Page 19: Telecom Italia Big Data Challenge

Elementary Model:

Previous research has suggested the following relation between population and telecommunication activity at location i:

(w stands for call volume, p stands for population)

Not Perfect:

The relationship between call volume and population in this region is much weaker below a threshold of 351 persons.

The main reason is that densely populated areas tend to have more cell towers, which makes the relationship easier to observe.

Model(1):
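The model formula itself did not survive in this transcript. Purely as an illustration, the sketch below assumes a power-law form p_i = alpha * w_i^beta, a form often used for relations of this kind; this is an assumption and not taken from the slides. Under that assumption, alpha and beta can be estimated by ordinary least squares on the log-transformed values. The function name and sample values are hypothetical.

```python
import math


def fit_power_law(call_volumes, populations):
    """Fit p = alpha * w**beta by least squares on log(p) vs log(w).

    The power-law form is an assumption made for illustration.
    Cells with zero volume or zero population are skipped, because
    their logarithm is undefined.
    """
    pairs = [
        (math.log(w), math.log(p))
        for w, p in zip(call_volumes, populations)
        if w > 0 and p > 0
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two cells with positive values")

    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0:
        raise ValueError("call volumes must not all be equal")

    beta = sxy / sxx                            # slope in log-log space
    alpha = math.exp(mean_y - beta * mean_x)    # intercept, back-transformed
    return alpha, beta


# Made-up per-cell values for demonstration only.
volumes = [1.0, 4.0, 9.0, 16.0]
people = [2.0, 4.0, 6.0, 8.0]
alpha, beta = fit_power_law(volumes, people)
print(round(alpha, 3), round(beta, 3))
```

Skipping zero-population cells is a simplification; with 29% of grid-squares empty in this dataset, a real analysis would need to treat them explicitly.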

Page 20: Telecom Italia Big Data Challenge

Model(2):

Finding the best hour of call volume data:

Each type correlates most strongly with population during the hour from 10 am to 11 am and, as with the total call volumes, CALLOUT has the greatest correlation, approximately 0.68. Thus we use CALLOUT from 10 am to 11 am for the w_i in Model(2).
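The hour selection described here can be sketched as follows: compute the Pearson correlation between each hour's per-cell call volume and the per-cell population, then keep the hour with the highest value. This is a minimal sketch under assumed data structures (a dict from hour to per-cell volumes); the function names and the numbers are hypothetical and not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # treat a constant series as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def best_hour(hourly_volume, population):
    """Return (hour, correlation) for the hour whose per-cell volume
    correlates most strongly with the per-cell population.

    hourly_volume: {hour: [volume for each grid cell]}
    population:    [population for each grid cell], same cell order
    """
    scores = {
        hour: pearson(volumes, population)
        for hour, volumes in hourly_volume.items()
    }
    hour = max(scores, key=scores.get)
    return hour, scores[hour]


# Made-up CALLOUT volumes for four grid cells at three hours of the day.
population = [100, 200, 300, 400]
callout = {
    3: [5, 1, 4, 2],
    10: [10, 21, 29, 41],
    20: [30, 10, 20, 25],
}
hour, r = best_hour(callout, population)
print(hour, round(r, 3))
```

The same loop could be run once per activity type (SMSIN, SMSOUT, CALLIN, CALLOUT, INTERNET) to compare them.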

Page 21: Telecom Italia Big Data Challenge

Where else can we use the Telecom Italia Dataset?

Page 22: Telecom Italia Big Data Challenge

Analyzing cities using the space-time structure of the mobile phone network

• Attempts to connect telecom usage data from Telecom Italia mobile to the geography of human activity

• Usage of telecom data to enhance the understanding of cities as spaces of flows

Page 23: Telecom Italia Big Data Challenge

Using the Telecom dataset for social network analysis

• Social network analysis: investigating social structures through the use of network and graph theories.

• Anthropology, Biology, Communication Studies, … etc.

Page 24: Telecom Italia Big Data Challenge

Traffic monitoring in urban areas

• Use of telecom data to track the dense regions

• Rerouting strategies

• Increase public transport in dense areas

• Provide more taxis in dense areas

Page 25: Telecom Italia Big Data Challenge

Other Usages

• User localization

• Security

• Health care: tracking users' exercise

Page 26: Telecom Italia Big Data Challenge

Thank you...

Special thanks to my team members: Hao Wu and He Ping