NoizCrowd: A Crowd-Based Data Gathering and Management System for Noise Level Data Mariusz Wisniewski, Gianluca Demartini, Apostolos Malatras, and Philippe Cudré-Mauroux University of Fribourg, Switzerland
May 10, 2015
Gianluca Demartini 2
Motivation - Big Data
• Large datasets are necessary to enable analytics and support decision making
  – Meteorological stations / car traffic
• Setting up a large-scale sensing infrastructure is costly and time-consuming
• Creating a large amount of valuable data
  – Crowdsourcing
  – Data generation models
  – Smartphones as sensors
  – Big Data analytics
NoizCrowd
• A crowd-sensing approach to big data generation using commodity sensors
• Crowd-source noise levels in a geographic region
• Noise propagation models to generate data
• Array data management techniques to scale
• Results accessible via a visual interface
• Support decisions (e.g., where to live)
Outline
• Related approaches
• NoizCrowd Architecture Overview
  – Data Gathering
  – Storage
  – Modeling
  – Export and Visualization
• Data Models
• Performance Evaluation
Related Work
• Participatory Sensing vs Sensor Networks
  – Low cost / High cost
  – Mobile phones / Sensors
  – Distributed / Centralized management
  – Privacy, data quality
• Applications: Environment, vehicle routing
Related Work
• Noise Mapping Apps
  – NoiseTube: open source, widespread usage
  – NoiseMap: control over data
  – SoundSense: machine learning to classify sounds
• NoizCrowd
  – Data in RDF linkable to other datasets (linkeddata.org)
  – Scalable storage: generate data by interpolation
NoizCrowd Architecture
Data Gathering
• By means of crowdsourcing
  – GPS: location
  – Microphone: noise level
  – Internet connection: send data to server
• Microphone Calibration
  – Sound level meter
  – Sharing conversion tables for smartphone models
Data Storage
• App sends median and peak dB values computed over a few seconds
• Spatio-temporal data: non-relational storage system (SciDB)
  – Durable storage
  – Retrieve data to build models
  – Export data for visualization
• Multi-dimensional array (space and time)
• Distributed storage
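The reduction and array layout described on this slide can be sketched as follows. This is a minimal illustration only: the function names, the grid origin, the cell size, and the time bucket are assumptions made for the example, not values taken from NoizCrowd or from SciDB.

```python
import math
from statistics import median

def summarize_window(db_samples):
    """Reduce a few seconds of dB readings to (median, peak)."""
    if not db_samples:
        raise ValueError("need at least one sample")
    return median(db_samples), max(db_samples)

def cell_index(lat, lon, t_seconds, origin=(0.0, 0.0), cell_deg=0.25, bucket_s=60):
    """Map a reading to integer (x, y, t) array coordinates.

    origin, cell_deg and bucket_s are illustrative defaults (assumptions),
    not the grid actually used by NoizCrowd.
    """
    x = math.floor((lon - origin[1]) / cell_deg)
    y = math.floor((lat - origin[0]) / cell_deg)
    t = int(t_seconds // bucket_s)
    return x, y, t

# Toy window of readings (made-up values for illustration).
median_db, peak_db = summarize_window([50, 52, 61, 49, 55])
cell = cell_index(lat=1.0, lon=0.5, t_seconds=125)
```

Indexing by two spatial dimensions plus a time bucket is what makes range queries in space and time natural on an array store.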
Noise Modeling
• Data from the crowd is noisy and skewed/sparse
• Raw data is not shown to end users
• Models to deal with
  – Overlapping data
  – Missing data
Data Export and Visualization
• Data from SciDB is
  – converted to RDF
  – stored in dipLODocus[RDF]
  – made available via SPARQL
• Visualization
  – Overlay noise level on a map
  – Additional chart for time evolution
Data Models
• Spatial Interpolation
  – In the same time interval, data from different locations
  – Needs to be computationally simple (large volume)
  – Bi-dimensional range queries in space (SciDB)
  – K-nearest neighbor interpolation
  – Computed in parallel
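A minimal sketch of k-nearest-neighbor interpolation for one time interval is shown below. The slide only names the technique; the inverse-distance weighting, the planar (x, y) coordinates, and the function name are assumptions made for illustration.

```python
import math

def knn_interpolate(query, measurements, k=3):
    """Estimate the noise level at `query` from the k nearest measurements.

    measurements: list of ((x, y), db) taken in the same time interval.
    Weighting by inverse distance is an assumption, not the slide's method.
    """
    if not measurements:
        raise ValueError("need at least one measurement")
    nearest = sorted(measurements, key=lambda m: math.dist(query, m[0]))[:k]
    weighted = []
    for position, db in nearest:
        d = math.dist(query, position)
        if d == 0:
            return float(db)  # exact hit: use the measured value directly
        weighted.append((1.0 / d, db))
    total_weight = sum(w for w, _ in weighted)
    return sum(w * db for w, db in weighted) / total_weight

# Toy measurements (made-up values): two phones equally far from the query.
samples = [((1.0, 0.0), 40.0), ((-1.0, 0.0), 60.0), ((10.0, 10.0), 90.0)]
estimate = knn_interpolate((0.0, 0.0), samples, k=2)
```

Each query point depends only on its own neighbors, so estimates for different cells can be computed independently, which fits the slide's note about parallel computation.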
Data Models
• Temporal interpolation
  – Short ranges (minutes): like spatial interpolation, in 3D
  – Long ranges: look for patterns and infer
    • E.g., every Monday at 11am we have 50dB and we miss a Monday measurement
    • E.g., same measurement (50dB) in same area 2h ago and now
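The long-range case (the "every Monday at 11am" example) can be sketched as a weekly profile lookup. Grouping by weekday and hour and taking the median are assumptions chosen for simplicity; the slides do not say how patterns are detected.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

def build_weekly_profile(readings):
    """Group (timestamp, db) readings by (weekday, hour)."""
    profile = defaultdict(list)
    for ts, db in readings:
        profile[(ts.weekday(), ts.hour)].append(db)
    return profile

def infer_missing(profile, ts):
    """Fill a missing reading from the same weekday/hour slot, if any exist."""
    values = profile.get((ts.weekday(), ts.hour))
    return median(values) if values else None

# Toy history (made-up values): three consecutive Mondays at 11am.
history = [
    (datetime(2015, 4, 20, 11), 50),
    (datetime(2015, 4, 27, 11), 51),
    (datetime(2015, 5, 4, 11), 49),
]
profile = build_weekly_profile(history)
monday_guess = infer_missing(profile, datetime(2015, 5, 11, 11))
tuesday_guess = infer_missing(profile, datetime(2015, 5, 12, 11))
```

Returning `None` when no matching slot exists keeps the sketch from inventing values where there is no pattern to rely on.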
Noise Propagation Models
• We adopt an existing model that takes into account:
  – Sound power
  – Distance from source
  – Directivity
  – Atmospheric absorption
  – Excess attenuation (we use meteorological conditions)
• Difficult to measure with smartphone
• Constant in a given region (and use GPS info)
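The slide lists the model's terms but not its formula. As an illustration, the sketch below uses the textbook free-field point-source relation, L_p = L_w - 20 log10(r) - 10 log10(4π) + DI - A_atm - A_excess; whether NoizCrowd uses exactly this form is an assumption, as are the parameter names and the per-metre treatment of atmospheric absorption.

```python
import math

def sound_pressure_level(l_w, r, directivity_index=0.0,
                         atm_absorption_db_per_m=0.0, excess_attenuation_db=0.0):
    """Sound pressure level (dB) at distance r (m) from a point source.

    l_w: sound power level of the source in dB.
    Textbook spherical-spreading form; an assumed stand-in for the
    existing model mentioned on the slide.
    """
    if r <= 0:
        raise ValueError("distance must be positive")
    spreading = 20 * math.log10(r) + 10 * math.log10(4 * math.pi)  # about 11 dB at 1 m
    return (l_w - spreading + directivity_index
            - atm_absorption_db_per_m * r - excess_attenuation_db)

# Toy values: a 100 dB source heard at 10 m and at 20 m.
level_10m = sound_pressure_level(100.0, 10.0)
level_20m = sound_pressure_level(100.0, 20.0)
```

With the optional terms left at zero, doubling the distance lowers the level by about 6 dB, the usual spherical-spreading behaviour.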
Materialization of Models
• Data from models
  – Is computationally expensive to generate
  – May be voluminous, since we can cover any region
• We do late materialization
  – At query time
  – Only for the specific request
  – Cached and indexed for future requests
  – Incremental updates of views, if possible
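Late materialization as described here can be sketched as a compute-on-demand cache. The class name, the cell key, and the invalidation method are hypothetical; the sketch shows the idea rather than NoizCrowd's actual view maintenance.

```python
class LazyNoiseView:
    """Materialize model output only when a cell is first queried."""

    def __init__(self, model):
        self._model = model      # callable: cell -> modeled dB value
        self._cache = {}         # cell -> materialized value
        self.computations = 0    # how many times the model actually ran

    def query(self, cell):
        """Return the modeled value for `cell`, computing it at most once."""
        if cell not in self._cache:
            self._cache[cell] = self._model(cell)
            self.computations += 1
        return self._cache[cell]

    def invalidate(self, cells):
        """Drop cached cells affected by new measurements (simple update)."""
        for cell in cells:
            self._cache.pop(cell, None)

# Toy model (assumption): the value depends only on the cell coordinates.
view = LazyNoiseView(lambda cell: 40.0 + cell[0] + cell[1])
first = view.query((2, 4, 2))
second = view.query((2, 4, 2))   # served from the cache, no recomputation
```

Invalidating only the affected cells is a crude stand-in for the incremental view updates the slide mentions.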
Performance Evaluation (1)
• 30 outdoor deployments
  – 2, 3, or 4 smartphones
  – Multiple noise sources
  – Urban setting, flat area of 50x50 meters
• Professional-grade noise level meter as gold standard measurement
• 85% of interpolated data within ±6dB error
• 63% of interpolated data within ±4dB error
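Figures of this kind can be computed as the share of interpolated values that fall within a tolerance of the reference meter. The sketch below shows that calculation on made-up numbers; the values are illustrative and are not the data from the evaluation.

```python
def fraction_within(predicted, reference, tolerance_db):
    """Share of predictions whose absolute error is at most tolerance_db."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(1 for p, r in zip(predicted, reference) if abs(p - r) <= tolerance_db)
    return hits / len(predicted)

# Toy values only (not from the NoizCrowd evaluation).
interpolated = [50, 54, 61, 70]
gold_standard = [52, 49, 60, 63]
within_6db = fraction_within(interpolated, gold_standard, 6)
within_4db = fraction_within(interpolated, gold_standard, 4)
```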
Performance Evaluation (2)
• Sound propagation and source location
• 3 smartphones, 100dB source
Performance Evaluation (3)
• Sound level of source error
  – 16% with 3 measurements
  – 10% with 4 measurements
  – 9% with 5 measurements
• Source location
  – 3m error on average
NoizCrowd - Conclusions
• Large scale data is key for decision making
• Crowd-source noise level data using mobiles
  – Scale-out using an array backend
  – Generate missing data and visualize
• Next steps
  – Android app
  – Data recording as background feature
  – Additional materialization strategies
http://exascale.info