Geographical Knowledge Discovery applied to the Social Perception of Pollution in Mexico City
Roberto Zagal, Instituto Politecnico Nacional, ESCOM-IPN
Felix Mata, Instituto Politecnico Nacional, UPIITA-IPN
Christophe Claramunt, Naval Academy Research Institute
Geographic knowledge discovery (PhD Theme) by Roberto Zagal
Introduction (1)
• Traditionally, pollution data has been produced by institutions, government and vendors
• But now pollution data is produced by individuals, too
Introduction (2)
Information about pollution is expressed in different ways by:
• Government
• News media
• People in social networks
Introduction (3)
But what about the certainty of this information?
Introduction (4)
What about inconsistency?
Id | Type | Description
1 | Tweet (Newspaper1) | The index of IMECAS is 135 #CDMX
2 | Tweet (Newspaper2) | @ the #contamination of air is 127 IMECAS #CDMX #bad #new
Related work
• The social data problem has been addressed through:
1. KDD and social mining
2. Formal publications (news media) that guide the classification of the interests of social media users [1]
3. Opinion mining and topic modeling [2]
• But not with GKD and an approach based on crossing data layers
Goal
Know how to discover the certainty level of information by crossing geographic and social information
Solution proposed: GKD Framework for Air Pollution Data
[Figure: framework diagram showing Phase 1, Phase 2 and Phase 3]
Data extraction: Sample tweet (Phase 1)
Id | Type | Description
1 | Tweet (Newspaper1) | The index of IMECAS is 135 #CDMX
2 | Tweet (Newspaper2) | @ the #contamination of air is 127 IMECAS #CDMX #bad #news
We consider tweets from accounts that periodically report air pollution data
Data extraction: Domain Detection (Phase 1)
Id | Type | Description
2 | Tweet (Newspaper2) | @ #contamination air is 127 IMECAS #CDMX #bad #new
The post is related to a pollution topic
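The slides do not say how the domain is detected. As a minimal sketch, assuming a simple keyword match, the check could look like the following; the term list `POLLUTION_TERMS` and both function names are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical keyword list; the slides do not specify the detection method.
POLLUTION_TERMS = {"contamination", "pollution", "imecas", "imeca", "ozone", "smog"}


def tokenize(text: str) -> set:
    """Lowercase the post and return its alphabetic words ('#' and '@' are dropped)."""
    return set(re.findall(r"[a-záéíóúñ]+", text.lower()))


def is_pollution_post(text: str) -> bool:
    """Return True when the post mentions at least one pollution term."""
    return bool(tokenize(text) & POLLUTION_TERMS)


tweet = "@ #contamination air is 127 IMECAS #CDMX #bad #new"
print(is_pollution_post(tweet))
```

A keyword match is cheap enough to run on every incoming tweet, so it could serve as a filter before the heavier preprocessing and classification phases.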
Preprocessing (Phase 2)
• Emotion detection [3]
• Location extraction
Id | Type | Description
2 | Tweet (Newspaper2) | @ #contamination air is 127 IMECAS #CDMX #bad #new
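One way to picture the preprocessing output is a small record holding the reported index value, the hashtags, any known place names and a coarse emotion label. The sketch below is only an assumption: the gazetteer, the negative-tag lexicon and the `Preprocessed` record are hypothetical stand-ins, and the emotion step is far simpler than the method cited in [3].

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical lookup tables, not taken from the slides.
GAZETTEER = {"cdmx": "Mexico City"}
NEGATIVE_TAGS = {"bad"}


@dataclass
class Preprocessed:
    imeca: Optional[int]                      # first integer found in the text
    hashtags: List[str] = field(default_factory=list)
    locations: List[str] = field(default_factory=list)
    emotion: str = "neutral"


def preprocess(text: str) -> Preprocessed:
    """Extract the index value, hashtags, known locations and a naive emotion."""
    hashtags = [tag.lower() for tag in re.findall(r"#(\w+)", text)]
    number = re.search(r"\b\d+\b", text)
    locations = [GAZETTEER[tag] for tag in hashtags if tag in GAZETTEER]
    emotion = "negative" if NEGATIVE_TAGS & set(hashtags) else "neutral"
    return Preprocessed(
        imeca=int(number.group()) if number else None,
        hashtags=hashtags,
        locations=locations,
        emotion=emotion,
    )


result = preprocess("@ #contamination air is 127 IMECAS #CDMX #bad #new")
print(result)
```

Note that the only location recoverable from this tweet is the city itself, which is why a finer location has to come from another data source.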
Classification: C5 algorithm (Phase 3)
• If we detect which category each set of data belongs to (for example Health and Pollution, or Transport and Pollution),
• then we can select which data sources should be crossed with the tweet in order to discover knowledge
Id | Description | Category
2 | @ #contamination air is 127 IMECAS #CDMX #bad #new | Health and pollution
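To illustrate how a category can drive the choice of data sources, here is a minimal sketch. It does not implement C5: a keyword-overlap rule stands in for the decision tree, and the keyword sets, the transport source names and the function names are hypothetical. Only the pairing of "Health and pollution" with health and pollution reports comes from the slides.

```python
import re
from typing import List, Optional

# Hypothetical keyword rules standing in for the C5 decision tree.
CATEGORY_KEYWORDS = {
    "Health and pollution": {"contamination", "imecas", "health", "bad"},
    "Transport and pollution": {"traffic", "metro", "bus", "transport"},
}

# Sources to cross per category (the transport entry is an assumption).
SOURCES_BY_CATEGORY = {
    "Health and pollution": ["health reports", "pollution reports"],
    "Transport and pollution": ["transport reports", "pollution reports"],
}


def classify(text: str) -> Optional[str]:
    """Return the category sharing the most keywords with the text, or None."""
    tokens = set(re.findall(r"\w+", text.lower()))
    scores = {cat: len(tokens & words) for cat, words in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def sources_to_cross(text: str) -> List[str]:
    """Look up which official sources should be crossed with this post."""
    category = classify(text)
    return SOURCES_BY_CATEGORY.get(category, [])


tweet = "@ #contamination air is 127 IMECAS #CDMX #bad #new"
print(classify(tweet), sources_to_cross(tweet))
```

Keeping the category-to-source mapping in a separate table means the classifier can be replaced later without touching the crossing step.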
Crossing data (Phase 4)
• Example 1: are tweets 1 and 2 inconsistent?
Id | Type | Description
1 | Tweet (Newspaper1) | The index of IMECAS is 135 #CDMX
2 | Tweet (Newspaper2) | @ the #contamination of air is 127 IMECAS #CDMX
Which one is correct? How can we know which tweet is correct?
Answer: the tweet was classified in the domain Health and pollution (in Phase 3). The official data from health reports and pollution reports are therefore selected to be crossed with the tweet (in Phase 4).
28/10/16
Crossing data (Phase 4)
• Data are crossed on several attributes; the date and hour of publication are taken from the tweet
• When these are crossed with the date and hour of the official air quality reports, a match is found
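The crossing step can be sketched as a join on date, hour and reported value. The code below is only an illustration under stated assumptions: the station names, hours and index values mirror the example in this transcript, while the calendar date, the publication minutes, the record layout and the `cross` function are hypothetical.

```python
from datetime import datetime
from typing import Dict, List

# Hypothetical official readings; the date is assumed, since the slides give only hours.
OFFICIAL_REPORTS = [
    {"station": "Taxqueña", "time": datetime(2016, 10, 1, 10, 0), "imeca": 135},
    {"station": "Indios Verdes", "time": datetime(2016, 10, 1, 15, 0), "imeca": 127},
]

# Tweets with their publication time (minutes are assumed) and the value they report.
TWEETS = [
    {"id": 1, "published": datetime(2016, 10, 1, 10, 5), "imeca": 135},
    {"id": 2, "published": datetime(2016, 10, 1, 15, 12), "imeca": 127},
]


def cross(tweet: Dict, reports: List[Dict]) -> List[str]:
    """Return the stations whose report matches the tweet's date, hour and value."""
    hour = tweet["published"].replace(minute=0, second=0, microsecond=0)
    return [
        report["station"]
        for report in reports
        if report["time"] == hour and report["imeca"] == tweet["imeca"]
    ]


for tweet in TWEETS:
    print(tweet["id"], cross(tweet, OFFICIAL_REPORTS))
```

An empty result would mean that no official reading supports the tweet, while a single match both confirms the value and supplies a location.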
We discovered that both tweets are correct but refer to different locations (the location is not included in the original tweets)
Id | Type | Description
1 | Tweet (Newspaper1) | The index of IMECAS is 135 #CDMX #Taxqueña 10:00 hours
2 | Tweet (Newspaper2) | The #contamination of air is 127 IMECAS #CDMX #Indios Verdes 15:00 hours
Knowledge Discovered!
Other preliminary results
• Following the same approach
• Knowledge discovered: which topics are discussed in each region
Topic | Geographic | Period
Health | South, West | March-June
Transport | North, East | January-December
Policy and programs | Center | January-December
Pollution | Surrounding Mexico City | January-June
Public roads | Surrounding Mexico City | January-December
Conclusions and future work
• Integrating the geographical and temporal dimensions allows us to discover data correlations; this knowledge can increase the certainty of some information in social networks.
• The main contribution is the discovery of the domain: classifying the information is a key element of new approaches to discovering geographic information.
Conclusions and future work
• Future work
• Use clustering or deep learning approaches to improve the classification process
• Location detection is a hard problem; other machine learning methods for social media could be tested [4, 5]
• How can we improve geographic knowledge discovery considering that there are no explicit links between traditional data sources and
[1] Jonghyun Han, Hyunju Lee, Characterizing the interests of social media users: Refinement of a topic model for incorporating heterogeneous media, Information Sciences, Volumes 358–359, 1 September 2016, Pages 112-128, ISSN 0020-0255.
[2] Schubert, E., Weiler, M., & Kriegel, H. P. (2014, August). SigniTrend: scalable detection of emerging topics in textual streams by hashed significance thresholds. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 871-880). ACM.
[3] Carlos Acevedo Miranda, Ricardo Clorio Rodriguez, Roberto Zagal Flores, and Consuelo V. Garcia Mendoza. Web architecture for analysis of feelings in Facebook with semantic approach (in Spanish). Research in Computing Science 75 (2014), pp. 59–69. http://www.rcs.cic.ipn.mx/rcs/2014_75/
[4] Ting Hua, Liang Zhao, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. 2016. How events unfold: spatiotemporal mining in social media. SIGSPATIAL Special 7, 3 (January 2016), 19-25. DOI=http://dx.doi.org/10.1145/2876480.2876485
[5] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes twitter users: real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, pages 851–860. ACM, 2010.