Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data

Transcript
Page 1: Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data

Copyright 2008 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Towards the Integration of Spatiotemporal Sensor Data and User-Generated Content

Cornelius Rabsch • [email protected]

“Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data” by C. Rabsch

Abstract: Pervasive sensor networks are the source of continuous data streams about our physical environment. With the rise of the Mobile Web, people-centric sensing yields a new layer of spatiotemporal contextual data, from qualitative user-generated content (e.g. geo-referenced multimedia messages) to quantitative sensor measurements (e.g. earthquake or hazard alerts). This mobile sensed content is made accessible within an ecosystem of heterogeneous service providers, from social networks to social data networks. The mining, analysis and processing of these streams poses many challenges, and semantic technologies can be utilized to overcome this heterogeneity. The integration of sensor data with user-generated context will provide increased situational awareness and contextual knowledge, enabling application scenarios from more efficient emergency response management to improved urban planning.

This talk gives an overview of the research I am doing at the Digital Enterprise Research Institute as part of my final thesis for my studies of Business Administration and Computer Science (Diploma degree, equivalent to a Master) at the University of Mannheim, Germany. A detailed presentation about the spatiotemporal data integration steps and formalisms will be published at a later stage.

Alternative title: “A geospatial activity-based approach to semantically link user-generated content and sensor data.”

Thesis supervision by:
Prof. Dr. Manfred Hauswirth - Sensor Middleware Unit, Digital Enterprise Research Institute, National University of Ireland, Galway - www.deri.ie
Prof. H. Stuckenschmidt - Knowledge Representation and Knowledge Management Research Group, University of Mannheim, Germany - ki.informatik.uni-mannheim.de

Index terms: people-centric sensing, mobile sensing, user-generated content, sensor data, spatiotemporal integration

Contact: Cornelius Rabsch - [email protected] - http://www.inperspektive.com

Page 2: Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data

Ubiquitous Mobile Sensing

Source: Nokia Research Center, 2008 [1]

Source: National Research Council USA

The global trend of pervasive sensor networks and sensors embedded into everyday devices goes along with a steadily increasing number of deployed mobile devices, already reaching over 4 billion in 2009 [1]. Sensors and mobile phones are closely connected by the fact that every modern phone embeds a variety of sensors, for example for location, sound, light or orientation. According to Nokia Research Center [1] there will be a shift from traditional sensor networks to a participatory sensing infrastructure that leverages available devices and puts humans in the loop, leading to a sensing network that utilizes people and their mobility. Mobile sensing is the origin of large heterogeneous spatiotemporal data sets that have to be mined, processed and analyzed, which poses challenging tasks and complex problems.

[1] Nokia Research Center, Sensing the World with Mobile Devices, http://research.nokia.com/files/insight/NTI_Sensing_-_Dec_2008.pdf, 2008

Page 3: Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data

Sensors + Mobile Phones + People

To give an overview of the main concepts, we take a closer look at sensors connected to mobile phones, people carrying mobile phones, and mobile phones as the place of origin for user-generated content and sensor data.

Looking solely at networked sensors, we have a well-understood infrastructure where sensors are part of sensor networks, and sensor bases or social data networks can provide a persistence layer with varying degrees of data accessibility. Providers of this kind of service are Sensorbase [2], Sensorpedia [3] or Pachube [4], for example. Sensor middleware can be utilized to collect, process and analyze large amounts of sensor data.
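As an illustration of what such a persistence layer could look like from a consumer's point of view, the following minimal Python sketch polls a hypothetical social data network that exposes a sensor datastream as CSV over HTTP. The endpoint URL and the column layout are assumptions for illustration only and do not correspond to the actual APIs of Sensorbase, Sensorpedia or Pachube.

```python
# Minimal collector sketch: poll a hypothetical CSV datastream and accumulate
# readings. The URL and the "timestamp,lat,lon,value" layout are assumptions.
import csv
import io
import time
import urllib.request

FEED_URL = "http://example.org/datastreams/42.csv"  # hypothetical endpoint


def fetch_readings(url):
    """Download one CSV snapshot (no header row assumed) and return dicts."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    reader = csv.DictReader(io.StringIO(text),
                            fieldnames=["timestamp", "lat", "lon", "value"])
    return list(reader)


def collect(url, interval_seconds=60, rounds=3):
    """Poll the datastream a few times and accumulate the readings."""
    readings = []
    for _ in range(rounds):
        readings.extend(fetch_readings(url))
        time.sleep(interval_seconds)
    return readings

# Example usage (requires a reachable endpoint):
# readings = collect(FEED_URL, interval_seconds=60, rounds=3)
```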

On the other hand, people in a networked world are part of online communities and create content within social networks. All user-generated content in social media is distributed over a variety of heterogeneous services, for example the photo community Flickr [5], the video community YouTube [6] or the generic social network Facebook [7]. These services act as valuable centralized places to collect and analyze social media contributions.

Bridging the quantitative sensor world with the human-centric social media world is a challenging task. Mobile phones can play an important role as the place of origin for user-generated content and sensor data. The concept of mobile sensing focuses on the mobile device and its embedded sensors. Humans are seen as carriers providing the mobility, which results in an opportunistic sensing infrastructure [1].

In both scenarios, sensors embedded into mobile phones and people submitting content with mobile phones, the networking aspect is essential and Internet connectivity is the enabler for exchanging and aggregating data. The Internet, and the World Wide Web in particular, provides a scalable infrastructure with shared, well-accepted standards and technologies and thus forms a ready-to-use foundation and platform for data distribution, mining and analysis.

The specifics of data sharing around the aspects of provenance, trust, permissions and access rights won't be the focus of this research.

[1] Lane et al., Urban sensing systems: opportunistic or participatory?, 9th Workshop on Mobile Computing Systems, 2008, www.cl.cam.ac.uk/~mm753/papers/hotmobile08.pdf
[2] Sensorbase.org
[3] Sensorpedia.org
[4] Pachube.com
[5] Flickr.com
[6] Youtube.com
[7] Facebook.com
[8] Campbell et al., The Rise of People-Centric Sensing, IEEE Internet Computing, vol. 12 (4), pp. 12-21, 2008

Page 4: Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data

Mobile + Sensors

CitySense, SenseNetworks

NoiseTube, SONY ComputerScience Laboratory

BioMapping.net by C. Nold

Real Time Rome, MIT SENSEable City Lab

We continue with a short overview of typical mobile sensing scenarios to clarify how mobile devices can be used to sense the physical environment.

The NoiseTube project “turns your mobile phone into an environmental sensor and participates to the monitoring of noise pollution” [1]. Users can also annotate the sensed data with tags such as “Construction building” to give more context about the sensed location.

CitySense [2] is built on top of Sense Networksʼ Macrosense location analytics platform [3] to report about nightlife activity in San Francisco by analyzing location traces and consumer behavior.

BioMapping [4] measures the Galvanic Skin Response (GSR) to recognize emotional arousal at specific places such as bridges or hard-to-cross streets.

The Real Time Rome project [5] by the MIT SENSEable City Lab [6] analyzes cell phone data within the city of Rome to figure out consumer behavior and find patterns in urban data collections [7].

The goal of these sensing applications can be the utilization and analysis of the sensed data to influence human behavior, for example where to go next, which places to avoid, or how to select the fastest path to a specific place.

[1] NoiseTube by SONY Computer Science Laboratory, http://noisetube.net/
[2] CitySense, Sense Networks, http://www.citysense.com, “Life San Francisco Nightlife Activity”
[3] MacroSense, Sense Networks, http://www.sensenetworks.com/macrosense.php
[4] BioMapping, C. Nold, http://biomapping.net
[5] Real Time Rome, MIT SENSEable City Lab, http://senseable.mit.edu/realtimerome/
[6] MIT SENSEable City Lab, http://senseable.mit.edu/
[7] Reades et al., Cellular Census: Explorations in Urban Data Collection, IEEE Pervasive Computing, vol. 6 (3), pp. 30-38, 2007

Page 5: Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data

Mobile + People

Source: twitpic.com/135xa

“There’s a plane in the Hudson. i’m on the ferry going to pick up people. Crazy.”

Source: twitter.com/arunshanbhag

Source: www.flickr.com/photos/vinu

User-generated content, also referred to as social media contributions, can be personal, entertaining, of uncertain quality, timely, contextual and often unpredictable. For the sake of simplicity and persuasiveness, we give an overview of typical contributions of high importance and high quality. Citizen journalism refers to the act of citizens reporting on their neighborhood and the community they live in, ranging from hyper-local news to on-the-spot emergency reports. The tools that facilitate citizen journalism are manifold and often free and easy to use.

User “arunshanbhag” reports on a terrorist attack at the Taj Hotel in Mumbai [1] using the micro-blogging service Twitter, user “vinu” on Flickr uploads and geo-references a photo of the same event with only a minor delay [2], and user “jkrums” witnesses a plane crash and immediately takes a picture that received tens of thousands of views on Twitpic [3], to name a few real-life citizen journalism examples.

Mobile mapping of earthquake catastrophes [4] is another example where it is important to know at which places houses are burning or have collapsed and how big the impact was.

It is important to note that location often matters and that user-generated content is in many cases already geo-referenced, either by utilizing built-in GPS sensors in mobile phones or by manually selecting the correct location on a web mapping interface.

[1] “Mumbai Blasts”, Twitter user arunshanbhag, http://twitter.com/arunshanbhag, accessed April 2009
[2] “Mumbai Attacks on Nov 26th 2008”, Flickr user vinu, http://www.flickr.com/photos/vinu/sets/72157610144709049/, accessed April 2009
[3] “US Airways Hudson River Plane Crash”, Twitpic user jkrums, http://twitpic.com/135xa, accessed April 2009
[4] “Earthquake L'Aquila”, Google Maps user TMG, http://maps.google.com/maps/ms?msa=0&msid=112463924814795169379.000466dc7c10ff9a99bd4&ie=UTF8&ll=42.3496,13.397613&spn=0.013638,0.032187&t=p&z=15, accessed April 2009

Page 6: Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data

Related Work

References: see slide notes

To organize the related work we use (a) a breakdown by type of content and (b) a breakdown by heterogeneity. The types of data can be user-generated content, sensor data, or scenarios where both are considered for integration and analysis. With regard to heterogeneity, the distinction is between a closed-world scenario, where one or multiple well-known data sources are used, and an open-world scenario, where multiple heterogeneous data sources are used. The latter is considerably more complex because of the missing semantic interoperability between service providers and their data schemas.

The focus lies on the data integration and analysis of spatiotemporal user-generated content and sensor data in a mobile context.

Note: Will be clarified and regrouped later on.

[1] Eagle and Pentland, Reality Mining: Sensing Complex Social Systems, Personal and Ubiquitous Computing, 2006
[2] Sheth et al., Semantic Sensor Web, IEEE Internet Computing, 2008
[3] Calabrese et al., Wikicity: Real-time Location-sensitive Tools for the City, IEEE Pervasive Computing, 2007
[4] Girardin et al., Digital Footprinting: Uncovering Tourists with User-Generated Content, IEEE Pervasive Computing, vol. 7 (4), pp. 36-43, 2008
[5] Urban Sensing, CENS/UCLA, Center for Embedded Networked Sensing, http://urban.cens.ucla.edu/
[6] PEIR, the Personal Environmental Impact Report, http://urban.cens.ucla.edu/projects/peir/
[7] Real Time Rome, MIT SENSEable City Lab, http://senseable.mit.edu/realtimerome/
[8] Sensorbase.org
[9] Sensorpedia.org
[10] Pachube.com
[11] SIOC, Semantically-Interlinked Online Communities, http://sioc-project.org/
[12] BioMapping, C. Nold, http://biomapping.net
[13] NoiseTube by SONY Computer Science Laboratory, http://noisetube.net/

Page 7: Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data

Research Questions

How can spatiotemporal user-generated content be semantically linked to sensor data?

What kind of semantics are required to describe, analyze and query heterogeneous geospatial activities?

How can we increase and assess the contextual information provided by sensor data and user-generated content?

Why? Increased efficiency in emergency response management, urban planning or targeted advertising,...

The focus will be on data access and data integration in a scenario where the data's life-cycle is important to understand, i.e. mobile-originated, web-accessible content provided by heterogeneous services.

Page 8: Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data

Geospatial Activity Semantics

*M. Perry. A framework to support spatial, temporal and thematic analytics over semantic web data, 2008

The underlying concepts of the geospatial activity vocabulary are the shared semantics of time, space and theme [1] of the user-generated content and the sensor data. GeoAct is a vocabulary that utilizes Semantic Web technologies such as RDF for the representation of spatiotemporal thematic data. RDF provides the basis for easy-to-use extension and integration mechanisms.

The red circles show common vocabularies such as FOAF, Dublin Core or SIOC that can be interlinked with the GeoAct Activity base class. A gazetteer schema such as the one of GeoNames [2] can be utilized to map GPS coordinates to well-known geographical identifiers as a way to geographically cluster activity data.
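To make the idea more concrete, the following sketch (using the rdflib Python library) builds one hypothetical geoact:Activity that combines time, space and theme and links out to Dublin Core, FOAF, the W3C Basic Geo vocabulary and a GeoNames feature. The GeoAct namespace URI and the property names used here are illustrative assumptions, not the vocabulary defined in the thesis.

```python
# Minimal sketch of a geoact:Activity tying together time, space and theme.
# The GeoAct namespace and its properties are assumptions for illustration.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, FOAF, RDF, XSD

GEOACT = Namespace("http://example.org/geoact#")             # hypothetical URI
GEO = Namespace("http://www.w3.org/2003/01/geo/wgs84_pos#")  # W3C Basic Geo

g = Graph()
g.bind("geoact", GEOACT)
g.bind("geo", GEO)
g.bind("dcterms", DCTERMS)
g.bind("foaf", FOAF)

activity = URIRef("http://example.org/activities/1")
g.add((activity, RDF.type, GEOACT.Activity))
# Theme: what happened, described with Dublin Core terms.
g.add((activity, DCTERMS.title, Literal("M 4.2 earthquake reported")))
# Time: when it happened.
g.add((activity, DCTERMS.created,
       Literal("2009-04-02T10:15:00Z", datatype=XSD.dateTime)))
# Space: raw coordinates plus a GeoNames feature for geographic clustering.
g.add((activity, GEO.lat, Literal("40.7580", datatype=XSD.decimal)))
g.add((activity, GEO.long, Literal("-73.9855", datatype=XSD.decimal)))
g.add((activity, GEOACT.nearFeature,
       URIRef("http://sws.geonames.org/5128581/")))          # New York City
# Provenance: who contributed the content, via FOAF.
g.add((activity, FOAF.maker, URIRef("http://example.org/people/alice")))

print(g.serialize(format="turtle"))
```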

This section and the following one about the GeoAct schema and its underlying concepts will be reviewed in detail in a different presentation.

[1] Sheth and Perry, Traveling the Semantic Web through Space, Time, and Theme, IEEE Internet Computing, 2008
[2] http://www.geonames.org/

Page 9: Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data

Activity Example as RDF/XML

Atom / RSS Feed Content: Earthquakes, Hazards, Weather, Traffic, Photos, Multimedia, Messages, Events,...

Semantic Gap! RDF / Atom Triples?

A sample RDF/XML extract of a geoact:Activity describing an earthquake notification as provided by the U.S. Geological Survey.

The semantic gap shows that the more specific semantics of the earthquake notification and of the seismic wave sensors are hidden in unstructured text. A solution is to use more domain-specific ontologies that extend the GeoAct vocabulary.

Web feeds (Atom or RSS with GeoRSS extensions) already provide a variety of activity information, from earthquakes and hazards to multimedia messages or event information.

Sample feed sources for New York City, USA:
USGS M2.5+ Earthquakes: http://earthquake.usgs.gov/eqcenter/catalogs/1day-M2.5.xml
Brightkite Place Stream: http://brightkite.com/places/ede07eeea22411dda0ef53e233ec57ca/objects.rss?limit=100&filters=notes,photos
Flickr Photo Stream: http://www.flickr.com/services/feeds/geo/?tags=newyork or http://www.flickr.com/services/feeds/geo/?tags=manhattan
Upcoming Event Stream: http://upcoming.yahoo.com/syndicate/v2/place/hVUWVhqbBZlZSrZU
Yahoo! Traffic Alert Stream: http://local.yahooapis.com/MapsService/rss/trafficData.xml?appid=YdnDemo&location=new%20york%20city
Weather, NYC, Central Park: http://www.weather.gov/xml/current_obs/KNYC.rss
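The following minimal Python sketch shows how a single Atom entry with a GeoRSS point, of the kind the feeds listed above publish, could be lifted into a plain activity record as a first step of the GeoAct mapping. The embedded sample entry is illustrative, not an actual USGS item.

```python
# Extract title, timestamp and coordinates from Atom/GeoRSS entries using only
# the standard library; the sample feed below is a simplified illustration.
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
}

SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <entry>
    <title>M 2.7, Southern California</title>
    <updated>2009-04-02T10:15:00Z</updated>
    <georss:point>34.05 -118.25</georss:point>
  </entry>
</feed>"""


def entries_to_activities(feed_xml):
    """Map each GeoRSS-annotated entry to a simple activity record."""
    root = ET.fromstring(feed_xml)
    activities = []
    for entry in root.findall("atom:entry", NS):
        point = entry.findtext("georss:point", default="", namespaces=NS)
        lat, lon = (float(v) for v in point.split()) if point else (None, None)
        activities.append({
            "theme": entry.findtext("atom:title", namespaces=NS),
            "time": entry.findtext("atom:updated", namespaces=NS),
            "lat": lat,
            "lon": lon,
        })
    return activities


print(entries_to_activities(SAMPLE_FEED))
```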

Page 10: Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data

GeoAct Data Flow

The GeoAct prototype utilizes the GeoAct vocabulary for the integration of sensor data and user-generated content.

Social networks for user-generated content and social data networks for sensor data are the “sinks” of the platform that provide web accessibility by supporting web feeds such as Atom or other open standards. These services should act as neutral storage and aggregation services in an ecosystem where sensor data and user-generated content are shared in equal ways.

Note: More to follow. For now the focus is on available web content, setting aside the assumption that the data should be mobile-originated.

Page 11: Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data

GeoAct TimeMap Visualization

The screenshot shows one dynamic view of the TimeMap [1] interface where spatiotemporal activity content from a variety of sources around New York City was aggregated from web feeds and visualized. The data mining is decoupled from the querying and visualization part.

The GeoAct prototype takes a bounding box as the geospatial query parameter and a time period as the temporal query parameter to visualize crawled, web-accessible geospatial activity data on an interactive TimeMap. The goal is to visualize heterogeneous spatiotemporal data from a variety of service providers, e.g. Flickr, Brightkite, Twitter, Yahoo! Traffic or the U.S. Geological Survey.

The implementation is built around the Ruby on Rails web framework [2] and the relational database PostgreSQL [3] with the PostGIS [4] extension for improved geospatial queries. A triple store with a SPARQL endpoint will be used for advanced querying.
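As a simplified illustration of the query step, the following pure-Python sketch selects activities by a bounding box and a time window, mirroring the two query parameters described above. The actual prototype delegates this work to PostgreSQL/PostGIS rather than filtering in application code.

```python
# Filter activity records by bounding box and time window; a simplification of
# the spatiotemporal query the prototype runs against PostGIS.
from datetime import datetime, timezone


def in_bbox(activity, west, south, east, north):
    """True if the activity's coordinates lie inside the bounding box."""
    return south <= activity["lat"] <= north and west <= activity["lon"] <= east


def in_window(activity, start, end):
    """True if the activity's timestamp lies inside the time window."""
    t = datetime.fromisoformat(activity["time"].replace("Z", "+00:00"))
    return start <= t <= end


def query(activities, bbox, window):
    west, south, east, north = bbox
    start, end = window
    return [a for a in activities
            if in_bbox(a, west, south, east, north) and in_window(a, start, end)]


# Example: roughly Manhattan, during one day in April 2009.
bbox = (-74.03, 40.68, -73.90, 40.88)
window = (datetime(2009, 4, 2, tzinfo=timezone.utc),
          datetime(2009, 4, 3, tzinfo=timezone.utc))
activities = [{"lat": 40.758, "lon": -73.985, "time": "2009-04-02T10:15:00Z"}]
print(query(activities, bbox, window))
```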

For simplicity, privacy considerations are not taken into account in this work and all crawled content is publicly available. For user-generated content this means that the user agreed to make his or her social media contribution accessible to everyone.

[1] Timemap.js, a Javascript library to help use Google Maps with a SIMILE timeline, http://code.google.com/p/timemap/
[2] Ruby on Rails, http://rubyonrails.org/
[3] PostgreSQL, http://www.postgresql.org/
[4] PostGIS, http://postgis.refractions.net/

Page 12: Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data

First Conclusions

Sensor data and UGC can fit together
Integration & analysis of both required to fully understand the context and to increase situational awareness
Raw sensor data streams not useful in the open world scenario -> middleware, social data networks?
Mobile phone provides same origin for production of UGC & SD
Neighborhood level “real-time” data not (yet) realistic
Demand for (social) sensor sharing infrastructure: FireEagle, Pachube, SensorPedia, SensorBase, Web Feeds, UGC Metadata (machine tags; EXIF)

To increase situational awareness, heterogeneous data sets with spatiotemporal sensor data and user-generated content can be used. Shared semantics for time, location and theme provide the central point for the data integration steps.
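To illustrate one possible integration step, the following Python sketch links each user-generated item to the sensor readings that are close to it in both space and time. The haversine distance and the fixed thresholds are illustrative choices, not the method developed in the thesis.

```python
# Link UGC items and sensor readings by spatiotemporal proximity.
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def link(ugc_items, sensor_readings, max_km=5.0, max_dt=timedelta(hours=1)):
    """Pair each UGC item with sensor readings in its spatiotemporal neighborhood."""
    links = []
    for item in ugc_items:
        nearby = [r for r in sensor_readings
                  if haversine_km(item["lat"], item["lon"], r["lat"], r["lon"]) <= max_km
                  and abs(item["time"] - r["time"]) <= max_dt]
        links.append((item, nearby))
    return links


ugc = [{"lat": 40.758, "lon": -73.985, "time": datetime(2009, 4, 2, 10, 30)}]
sensors = [{"lat": 40.760, "lon": -73.990, "time": datetime(2009, 4, 2, 10, 15)}]
print(link(ugc, sensors))
```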

There is a growing demand for a sensor data sharing infrastructure in which sensors and sensor networks provide their collected data streams in accessible ways to third parties interested in integrating and remixing the data. These services not only take care of data aggregation and accessibility; a management and provenance layer can also help to track the flow and origin of the data, for example. We refer to them as social data networks, in analogy to social networks. Social networks or online communities are carriers of user-generated content and provide privacy layers and mechanisms to distribute publicly available content.

In [1] the integration of social networks and sensor networks is considered, along with the question of how sensors can extend social networks or replace humans in answering certain queries. The focus is on using existing connections and privacy concepts within a social network to share and access sensor devices.

[1] Breslin et al., Integrating Social Networks and Sensor Networks, W3C Workshop on the Future of Social Networking, 2009, http://www.w3.org/2008/09/msnws/papers/sensors.html

Page 13: Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data

Next Steps

Sample SPARQL Queries & Inference: “All activity topics in case of an earthquake on April 2nd 2009 by the service www.flickr.com near Times Square”
Advanced Web Mining: Named entity recognition to extract location data from unstructured text (e.g. Twitter messages) via APIs
Prototype refinements (querying, export, documentation, ...)
Thesis write-up (until June '09)

Walking through several SPARQL [1] queries that require the availability of both sensor data and user-generated content helps to understand how querying over the crawled web content can be done. Reasoning can be demonstrated by utilizing extended GeoAct activities.
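As an illustration, the sample query above could be expressed roughly as follows with rdflib. The geoact:sourceService and geoact:nearFeature properties, the GeoAct namespace URI and the data file name are assumptions, and the GeoNames feature used here is a coarse placeholder (New York City) standing in for Times Square.

```python
# Sketch of the sample query over a graph of GeoAct activities; property names
# and the namespace are illustrative assumptions, not the thesis vocabulary.
from rdflib import Graph

g = Graph()
g.parse("activities.rdf")  # previously crawled and mapped activity data

QUERY = """
PREFIX geoact:  <http://example.org/geoact#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

SELECT ?topic WHERE {
  ?activity a geoact:Activity ;
            dcterms:title   ?topic ;
            dcterms:created ?when ;
            geoact:sourceService <http://www.flickr.com> ;
            # Placeholder GeoNames feature (New York City); a finer-grained
            # feature would stand in for Times Square.
            geoact:nearFeature   <http://sws.geonames.org/5128581/> .
  FILTER (?when >= "2009-04-02T00:00:00Z"^^xsd:dateTime &&
          ?when <  "2009-04-03T00:00:00Z"^^xsd:dateTime)
}
"""

for row in g.query(QUERY):
    print(row.topic)
```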

[1] SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/