APPLYING BIG DATA TOOLS TO ACQUIRE AND PROCESS DATA ON CITIES
JACEK MAŚLANKOWSKI, Ph.D.
DEPARTMENT OF BUSINESS INFORMATICS
FACULTY OF MANAGEMENT
UNIVERSITY OF GDAŃSK, POLAND
Bucharest, December 7-8, 2017
AGENDA
Prerequisites
Possible use cases and pilots conducted
Conclusions
PREREQUISITES
CHALLENGES AND OVERVIEW
THE GOAL OF THE STUDY AND GENERAL CHARACTERISTICS
CHALLENGES AND OVERVIEW
Lack of statistical data at the regional level (sampling frame, cost).
New data sources have to be identified to acquire data at the regional or city level at minimal (ideally zero) cost.
Based on the UNECE taxonomy, Big Data sources can be divided into three categories:
• Human-sourced information
• Process-mediated data
• Machine-generated data
Many Big Data sources exist, but they are not easy to access (e.g., MCR, CDR, road sensors, …).
OVERVIEW AND THE GOAL OF THE STUDY
The goal of the paper was to study and analyze possible alternative Big Data sources for analysis at the regional (city) level.
Smart cities will become smarter when current and reliable data support the decision-making process.
The paper presents pilot surveys conducted in various statistical areas.
POSSIBLE USE CASES AND PILOTS CONDUCTED
METHODS, TECHNOLOGIES, DATA SOURCES
WHAT DO WE NEED TO ACQUIRE AND PROCESS BIG DATA?
Methods: web scraping, text mining/NLP, machine learning
Data quality framework: input, throughput, output
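Web scraping, the first method listed, typically means downloading portal pages and extracting fields from their HTML. The slides do not say how this was implemented, so the following is only a minimal sketch using Python's standard `html.parser`. It assumes a hypothetical job portal that marks each offer title with a `class="offer-title"` attribute, and it works on an HTML string that has already been downloaded (the download step is left out).

```python
from html.parser import HTMLParser


class OfferTitleParser(HTMLParser):
    """Collect the text of every element whose class list contains
    'offer-title' (a hypothetical class name of an assumed job portal)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._tag = None   # tag name of the matching element we are inside
        self._depth = 0    # nesting depth of that same tag name
        self._buffer = []  # text fragments of the current title

    def handle_starttag(self, tag, attrs):
        if self._tag is not None:
            if tag == self._tag:  # same tag nested inside the match
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if "offer-title" in classes:
            self._tag, self._depth, self._buffer = tag, 1, []

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Join the fragments and collapse runs of whitespace.
                self.titles.append(" ".join(" ".join(self._buffer).split()))
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)


def extract_offer_titles(html):
    """Return the job-offer titles found in one HTML page."""
    parser = OfferTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles
```

Real portals each use their own markup, so the class name (and often the parsing approach) would have to be adapted site by site.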
LIST OF ALTERNATIVE DATA SOURCES
Business data:
• Job vacancies
• Data on enterprises
• Accommodation establishments (hotel chains)
• Road traffic
Social statistics:
• Life satisfaction
• People’s opinion
CASE 1: JOB VACANCIES
Data we have:
• A variety of European Labour Force surveys (demand, supply, earnings, etc.)
• Sampling does not allow the data to be processed at the regional level
• Several Big Data sources exist
Data to acquire with Big Data tools:
• Job supply at the regional and city level
• Job demand at the regional and city level
Number of job offers scraped (thous.): 140; 124; 46; 39; 9.9
Issues:
• Duplicates in the data
• Classification is needed
• Location is not always shown
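The issues listed for the scraped job offers (duplicates, missing locations) can be made concrete with a small sketch. The slides do not describe how duplicates were detected. This example assumes each offer is a dictionary with hypothetical `title`, `employer` and `location` fields, and it treats two offers as duplicates only when those fields match after case-folding and whitespace normalisation.

```python
def normalise(text):
    """Case-fold and collapse whitespace; None or empty becomes ""."""
    return " ".join((text or "").casefold().split())


def deduplicate(offers, keys=("title", "employer", "location")):
    """Keep the first offer for each normalised (title, employer, location)."""
    seen = set()
    unique = []
    for offer in offers:
        key = tuple(normalise(offer.get(k)) for k in keys)
        if key not in seen:
            seen.add(key)
            unique.append(offer)
    return unique


def missing_location_share(offers):
    """Fraction of offers that carry no usable location."""
    if not offers:
        return 0.0
    missing = sum(1 for o in offers if not normalise(o.get("location")))
    return missing / len(offers)


# Hypothetical scraped records: the second is a re-post of the first.
offers = [
    {"title": "Java Developer", "employer": "ABC sp. z o.o.", "location": "Gdańsk"},
    {"title": "java developer ", "employer": "ABC  sp. z o.o.", "location": "GDAŃSK"},
    {"title": "Accountant", "employer": "XYZ S.A.", "location": None},
]
unique = deduplicate(offers)
print(len(unique), missing_location_share(unique))  # → 2 0.5
```

Exact-key matching only catches trivially repeated offers. Offers re-posted with different wording across portals would need fuzzier matching.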
CASE 1: RESULTS
Number of enterprises: 10.3 thous.
Number of locations: 9.5 thous.
Figure: Job vacancies by city. Bar chart of the number of offers before classification and the number of offers after classification for Warszawa, Wrocław, Kraków, Poznań, Katowice, Łódź, Gdańsk, Niemcy (Germany) and Szczecin; x-axis: location, y-axis: job vacancies.
CASE 2: ENTERPRISES
What do we know? Business registers, which:
• are not frequently updated
• do not have full information on enterprises
What can be observed?
• Current NACE (type of economic activity)
• Communication channels (which channels are the most important)
• Methods the enterprise uses for its activities (e.g., e-commerce)
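One way to observe an enterprise's communication channels and its use of e-commerce is to inspect its website. The sketch below illustrates the idea under stated assumptions and is not the method used in the pilot. The domain-to-channel mapping and the keyword list are hypothetical heuristics. The keyword list includes the Polish words `koszyk` ("cart") and `sklep` ("shop"). The input is an HTML page that has already been downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical mapping from link domains to communication channels.
CHANNEL_DOMAINS = {
    "facebook.com": "Facebook",
    "twitter.com": "Twitter",
    "linkedin.com": "LinkedIn",
    "youtube.com": "YouTube",
    "instagram.com": "Instagram",
}
# Hypothetical keywords (English and Polish) that hint at an online shop.
ECOMMERCE_KEYWORDS = ("cart", "checkout", "koszyk", "sklep")


class LinkCollector(HTMLParser):
    """Gather all <a href> targets and all visible text fragments."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_data(self, data):
        self.text.append(data)


def profile_website(html):
    """Return the detected channels and whether e-commerce keywords occur."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    channels = set()
    for href in collector.hrefs:
        host = (urlparse(href).hostname or "").lower()  # relative links: ""
        for domain, name in CHANNEL_DOMAINS.items():
            if host == domain or host.endswith("." + domain):
                channels.add(name)
    haystack = " ".join(collector.text + collector.hrefs).lower()
    has_ecommerce = any(word in haystack for word in ECOMMERCE_KEYWORDS)
    return {"channels": sorted(channels), "e_commerce": has_ecommerce}
```

Substring matching is crude ("cart" also matches "cartoon"), so results from a sketch like this would need manual validation on a sample of enterprises.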
CASE 3: ACCOMMODATION ESTABLISHMENTS (INCLUDING HOTEL CHAINS)
What do we know? A list of registered hotels and places
What can be added?
• A full, updated list of the accommodation base
• Categories of establishments that are not even registered
• Establishments that have been assigned to the wrong category
Figure: Price category of agritourism places (pie chart; categories: 20 PLN and less, 20.01-40 PLN, 40.01-60 PLN, 60.01-80 PLN, 80.01-100 PLN, 100.01 PLN and more).
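The price categories in the chart are intervals with inclusive upper bounds. The sketch below bins prices into those categories and computes the share of places in each one. It assumes the scraped prices have already been parsed into numbers in PLN. The slide does not state what the price refers to (for example per night or per person).

```python
from collections import Counter
from decimal import Decimal

# Inclusive upper bounds and labels of the price categories on the slide;
# anything above 100 PLN falls into the open-ended top category.
BOUNDS = [
    (Decimal("20"), "20 PLN and less"),
    (Decimal("40"), "20.01-40 PLN"),
    (Decimal("60"), "40.01-60 PLN"),
    (Decimal("80"), "60.01-80 PLN"),
    (Decimal("100"), "80.01-100 PLN"),
]
TOP = "100.01 PLN and more"
LABELS = [label for _, label in BOUNDS] + [TOP]


def price_category(price):
    """Map one price (in PLN) to its category label."""
    value = Decimal(str(price))  # str() avoids binary floating-point artefacts
    for upper, label in BOUNDS:
        if value <= upper:
            return label
    return TOP


def category_shares(prices):
    """Percentage of places in each category, in slide order."""
    if not prices:
        return {label: 0.0 for label in LABELS}
    counts = Counter(price_category(p) for p in prices)
    return {label: round(100 * counts[label] / len(prices), 1) for label in LABELS}
```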
CASE 4: ROAD TRAFFIC
Current data sources:
• Traffic loops
• Road traffic sensors
Possible data sources:
• Google Maps
• Traffic density and anomalies (access to historical data)
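The slides list traffic density and anomalies as possible outputs but do not say how anomalies would be detected. As one simple illustration, a z-score rule can compare the latest reading on a road segment with that segment's own history. The sketch assumes density readings per segment have already been collected. The segment ids are hypothetical, and no specific Google Maps API is implied.

```python
from statistics import mean, stdev


def flag_anomalies(history, current, threshold=3.0):
    """Flag road segments whose current traffic density deviates from
    their own history by more than `threshold` standard deviations.

    history: segment id -> past density readings (e.g. same hour, past weeks)
    current: segment id -> latest density reading
    Returns segment id -> True/False, or None when history is too short.
    """
    flags = {}
    for segment, value in current.items():
        past = history.get(segment, [])
        if len(past) < 2:
            flags[segment] = None  # stdev needs at least two readings
            continue
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            flags[segment] = value != mu  # constant history: any change is unusual
        else:
            flags[segment] = abs(value - mu) / sigma > threshold
    return flags
```

A threshold of 3 standard deviations is a common rule of thumb, not a value taken from the study.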
CASE 5: PEOPLE’S OPINION
Current work:
• Sample survey data are gathered and processed, but not in real time
• A typical opinion poll is expensive
Data possible to gather
CASE 6: SOCIAL STATISTICS – LIFE SATISFACTION
Figure: Twitter popularity in Europe
Training and testing dataset:
class          precision  recall  f1-score  support
happy               0.62    0.75      0.68       56
neutral             0.60    0.71      0.65       34
calm                0.43    0.30      0.35       10
upset               0.67    0.15      0.25       13
depressed           0.59    0.62      0.60       21
discouraged         0.59    0.50      0.54       20
indeterminate       0.00    0.00      0.00        3
avg / total         0.59    0.60      0.58      157
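The table appears to follow the layout of scikit-learn's `classification_report`, in which the last row is the support-weighted mean of the per-class rows. The short check below recomputes that row from the rounded values in the table. It also recomputes two per-class F1 scores using F1 = 2PR/(P + R). It verifies the arithmetic only and is not the author's code.

```python
# Per-class (precision, recall, f1-score, support) copied from the table.
REPORT = {
    "happy": (0.62, 0.75, 0.68, 56),
    "neutral": (0.60, 0.71, 0.65, 34),
    "calm": (0.43, 0.30, 0.35, 10),
    "upset": (0.67, 0.15, 0.25, 13),
    "depressed": (0.59, 0.62, 0.60, 21),
    "discouraged": (0.59, 0.50, 0.54, 20),
    "indeterminate": (0.00, 0.00, 0.00, 3),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_average(report):
    """Support-weighted mean of precision, recall and F1, plus total support."""
    total = sum(row[3] for row in report.values())
    averages = tuple(
        sum(row[i] * row[3] for row in report.values()) / total for i in range(3)
    )
    return averages, total


(avg_p, avg_r, avg_f1), support = weighted_average(REPORT)
print(round(avg_p, 2), round(avg_r, 2), round(avg_f1, 2), support)  # → 0.59 0.6 0.58 157
```

Because the inputs are already rounded to two decimals, recomputed values could in general differ from the table in the last digit. Here they agree. The weak scores for "calm", "upset" and "indeterminate" match their small supports (10, 13 and 3 tweets).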
THE FRAMEWORK
(1) Web scraping
• HTML/PHP/ASP, etc.
(2) Twitter API
• Text
(3) Machine learning
• Classification
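The framework ends with a machine-learning classification step, but the slides do not name the classifier. As a stand-in, here is a minimal multinomial Naive Bayes text classifier with add-one smoothing, written with the standard library only. The toy training sentences are invented for illustration. The labels are borrowed from the classes in Case 6.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"\w+")  # Unicode-aware, so Polish letters are kept


def tokenize(text):
    return TOKEN.findall(text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(w.values()) for c, w in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / self.n_docs)  # log prior
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data, for illustration only.
texts = ["I am so happy today", "what a great happy day",
         "I feel sad and depressed", "depressed and tired again"]
labels = ["happy", "happy", "depressed", "depressed"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("happy great day"))  # → happy
```

Real tweets would need language-specific preprocessing (for example stemming or lemmatisation and stop-word removal). A library implementation would normally be preferred over this hand-written one.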
CONCLUSIONS
SUMMARY
FUTURE WORK
SUMMARY
There is a variety of alternative data sources
Data quality is an issue
"Bad statistical data drives out good"
FUTURE WORK
Combining the data
Increase the data quality
Publish a dashboard
Dashboard for labour market mismatch
THANK YOU!
JACEK MAŚLANKOWSKI
UNIVERSITY OF GDAŃSK
POLAND