Data Provenance in Remote Environmental Monitoring Dr. Christian Skalka, University of Vermont, USA

Mar 28, 2015


David Moore


Data Provenance in Remote Environmental Monitoring (REM)

REM = automated collection of data from the natural environment in remote settings.

Central points:

Data provenance is fundamental to REM: data source, times, and ownership are intrinsic to the data.

REM hardware and software architectures pose unique challenges for establishing provenance: they are heterogeneous, distributed, low-power systems.


Outline

Two REM case studies and problem statements:

1. Snowpack monitoring (SnowMAN): the SnowMAN project summary; microcosmic provenance issues and challenges; SnowMAN provenance “coping mechanisms”.

2. Sagehen Creek Field Station network: overview of the project setting; macrocosmic provenance issues and challenges; possible approaches to the central challenges.


How Much Snow is Out There?

Snow/Water Equivalent (SWE): a measurement of the water content of the snowpack. It is not the same as snow height.
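The difference between height and SWE can be made concrete with the standard relation between snow depth h_s, bulk snow density ρ_s, and the density of water ρ_w. The numbers are illustrative only, not SnowMAN measurements.

```latex
\mathrm{SWE} = h_s \,\frac{\rho_s}{\rho_w}
\qquad\text{e.g.}\qquad
h_s = 100\ \mathrm{cm},\;
\rho_s = 300\ \mathrm{kg\,m^{-3}},\;
\rho_w = 1000\ \mathrm{kg\,m^{-3}}
\;\Rightarrow\;
\mathrm{SWE} = 100\ \mathrm{cm} \times 0.3 = 30\ \mathrm{cm}
```

A snowpack of the same 100 cm height at 150 kg/m³ would hold only 15 cm of water, which is why height alone does not determine SWE.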


How Much Snow is Out There?

Regional snowpack profiles are critically important to natural resource planning and public safety.

Real-world measurement is complicated by terrain, forest canopies, wind, and exposure.

Accurate real-time SWE measurement is a “holy grail” of REM.


The UVM SnowMAN Project

A new approach to SWE measurement:

Use modern computer technology for data acquisition and retrieval.

A multi-modal approach to SWE approximation.

Lightweight, low-cost, robust, adaptable.

Improved spatial and temporal resolution.


Multimodal Sensor Fusion

Algorithms on sensing nodes combine multiple sensing technologies of variable power cost:

1. Snow height via ultrasound (cheap)

2. Snow density via microwave absorption (moderate)

3. Snow density via gamma ray attenuation (expensive)
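The fusion idea above can be sketched as a per-sample policy: always take the cheap height reading, then spend the remaining energy budget on the costliest density modality it can afford. The energy costs, the assumption that the costlier modality is the more accurate one, and all names below are hypothetical; the slides do not describe SnowMAN's actual fusion algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical per-reading energy costs in millijoules; the slides only rank the
# modalities (cheap / moderate / expensive) and give no actual figures.
COST_MJ = {"ultrasound_height": 1.0, "microwave_density": 20.0, "gamma_density": 200.0}
WATER_DENSITY = 1000.0  # kg/m^3


@dataclass
class FusedReading:
    height_m: float
    density_kg_m3: float
    density_source: str  # which modality produced the density, kept as provenance
    swe_m: float


def choose_density_source(budget_mj: float) -> Optional[str]:
    """Return the costliest density modality the budget allows (assumed to be the
    most accurate), or None if only the always-on height sensor is affordable."""
    remaining = budget_mj - COST_MJ["ultrasound_height"]
    for source in ("gamma_density", "microwave_density"):
        if COST_MJ[source] <= remaining:
            return source
    return None


def fuse(read_height: Callable[[], float],
         density_sensors: Dict[str, Callable[[], float]],
         budget_mj: float,
         last_density: float) -> FusedReading:
    height = read_height()  # the cheap ultrasound reading is always taken
    source = choose_density_source(budget_mj)
    if source is None:
        density, source = last_density, "last_known"  # reuse the previous estimate
    else:
        density = density_sensors[source]()  # only the chosen sensor is powered up
    return FusedReading(height, density, source, height * density / WATER_DENSITY)


sensors = {"microwave_density": lambda: 280.0, "gamma_density": lambda: 300.0}
reading = fuse(lambda: 1.0, sensors, budget_mj=50.0, last_density=250.0)
print(reading.density_source, round(reading.swe_m, 3))  # microwave_density 0.28
```

Recording density_source alongside the result anticipates the provenance requirements discussed next: without it, a fused SWE value cannot be traced back to the modality that produced it.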


SnowMAN System Architecture

Multiple data gathering-and-processing nodes connected via a Wireless Sensor Network (WSN).

An Arduino-based on-site gateway provides datalogging via SD card and data processing.

Remote data retrieval via TCP/IP over a cellmodem.
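The split in this architecture between logging on site and retrieving remotely can be sketched as a store-and-forward log on the gateway. This is a Python illustration of the idea only (the real gateway is Arduino-based); the class name, the JSON line format, and the send callback are hypothetical.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


class GatewayLog:
    """Hypothetical store-and-forward log: each reading is written to local storage
    (a file standing in for the SD card) before any upload is attempted."""

    def __init__(self, path: str) -> None:
        self.path = path
        # Count of log lines already acknowledged by the remote server. Kept in
        # memory for brevity; a real gateway would persist it alongside the log.
        self.uploaded = 0

    def append(self, reading: Dict) -> None:
        with open(self.path, "a") as f:
            f.write(json.dumps(reading) + "\n")

    def pending(self) -> List[Dict]:
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            lines = f.readlines()
        return [json.loads(line) for line in lines[self.uploaded:]]

    def flush(self, send: Callable[[Dict], bool]) -> int:
        """Upload pending readings in order, stopping at the first failed send so
        that no reading is skipped; returns how many were sent."""
        sent = 0
        for reading in self.pending():
            if not send(reading):
                break
            self.uploaded += 1
            sent += 1
        return sent


log = GatewayLog(os.path.join(tempfile.mkdtemp(), "sd_card.log"))
for t in range(3):
    log.append({"node": 7, "t": t, "raw": 512 + t})
link = iter([True, False])  # the cellmodem link drops after one upload
print(log.flush(lambda r: next(link, False)))  # 1
print(len(log.pending()))                      # 2
```

Because readings reach local storage before the uplink is touched, a dropped cellular connection delays data rather than losing it.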


Provenance Issues in SnowMAN

Data reported by sensors is meaningless without provenance information: the time of the sampling event, the location of the sample, and the sensor type and its ADC conversion formula.

Refining the multimodal fusion algorithm requires the history/cause of each sampling event.
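To see why a bare reading is meaningless, consider the minimal record a node would have to report for a raw ADC count to be interpreted. The field names, the linear conversion formulas, and the cause field (standing in for the history/cause of the sampling event) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical per-sensor conversion formulas from ADC voltage to engineering units.
# The slides say each sensor type has such a formula but do not give it.
CONVERSIONS: Dict[str, Callable[[float], float]] = {
    "ultrasound_height": lambda volts: 1.0 * volts,           # metres (assumed 1 m/V)
    "microwave_density": lambda volts: 100.0 * volts + 50.0,  # kg/m^3 (assumed fit)
}


@dataclass
class Sample:
    raw: int                       # the ADC count actually reported by the sensor
    timestamp: int                 # time of the sampling event (seconds since epoch)
    location: Tuple[float, float]  # (latitude, longitude) of the node
    sensor_type: str               # selects the conversion formula
    adc_bits: int                  # ADC resolution
    vref: float                    # ADC reference voltage
    cause: str                     # why the sample was taken, e.g. "scheduled"


def to_engineering_units(s: Sample) -> float:
    """Raw count -> voltage -> physical quantity; impossible without provenance."""
    volts = s.raw * s.vref / ((1 << s.adc_bits) - 1)
    return CONVERSIONS[s.sensor_type](volts)


s = Sample(raw=1023, timestamp=1427500800, location=(0.0, 0.0),  # placeholder location
           sensor_type="microwave_density", adc_bits=10, vref=3.3, cause="scheduled")
print(round(to_engineering_units(s), 1))  # 380.0
```

Under these assumed formulas, the same raw count of 1023 would instead mean 3.3 m of snow if sensor_type were ultrasound_height; only the provenance fields tell the two apart.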


Provenance Challenges in SnowMAN

Low-bandwidth requirements in WSNs: messages must be small and infrequent.

Volatility of low-cost devices: WSN node failures require data reliability solutions.

Heterogeneous network architecture: data formats must be converted in network communications.

Time synchronization.
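The bandwidth constraint suggests carrying provenance in a compact, fixed binary layout rather than as text. The layout below is hypothetical and assumes location is looked up from node_id at the gateway instead of being transmitted. It packs a reading plus its source, time, and sensor type into 9 bytes, well within the 127-byte maximum frame size of IEEE 802.15.4.

```python
import struct

# Hypothetical wire layout, little-endian with no padding:
#   node_id (uint16) | timestamp (uint32) | sensor code (uint8) | raw ADC value (uint16)
PACKET = struct.Struct("<HIBH")

SENSOR_CODES = {"ultrasound_height": 1, "microwave_density": 2, "gamma_density": 3}
SENSOR_NAMES = {code: name for name, code in SENSOR_CODES.items()}


def encode(node_id: int, timestamp: int, sensor: str, raw: int) -> bytes:
    return PACKET.pack(node_id, timestamp, SENSOR_CODES[sensor], raw)


def decode(packet: bytes) -> tuple:
    node_id, timestamp, code, raw = PACKET.unpack(packet)
    return node_id, timestamp, SENSOR_NAMES[code], raw


pkt = encode(7, 1427500800, "microwave_density", 612)
print(PACKET.size, decode(pkt))  # 9 (7, 1427500800, 'microwave_density', 612)
```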


Managing Provenance in SnowMAN

Reliability is ensured by datalogging on the gateway and replication within the WSN. This requires the data source and time to be stored with readings.

Provenance information is reported with data readings. It is a component of the packet format, and not onerously large.

Data are converted at “protocol boundaries”: 802.15.4 to RS232 to TCP/IP to SQL.

Time synchronization is handled by simple protocols. Low precision is sufficient; the cellmodem provides “true” time.
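One example of a simple protocol in this spirit is a single request/response exchange in the style of Cristian's algorithm, with the gateway's cellmodem-derived time as the reference. This is a named, swapped-in technique for illustration; the slides do not say which protocol SnowMAN uses, and the function names are hypothetical.

```python
from typing import Tuple


def estimate_offset(t_send: float, node_clock: float, t_recv: float) -> Tuple[float, float]:
    """Cristian-style estimate of a node's clock offset relative to the gateway.

    t_send and t_recv are gateway times (assumed to be set from the cellmodem's
    network time) when a time request left and its reply arrived; node_clock is
    the node's local time carried in the reply. Assumes the node read its clock
    halfway through the round trip, so the error is at most half the round trip.
    """
    offset = node_clock - (t_send + t_recv) / 2.0
    max_error = (t_recv - t_send) / 2.0
    return offset, max_error


def correct(node_timestamp: float, offset: float) -> float:
    """Map a timestamp from the node's clock onto the gateway's ("true") timeline."""
    return node_timestamp - offset


offset, err = estimate_offset(t_send=1000.0, node_clock=1042.5, t_recv=1001.0)
print(offset, err, correct(1100.0, offset))  # 42.0 0.5 1058.0
```

An error bound of half a round trip (here 0.5 s) is coarse by networking standards, but it is the kind of low precision the slide says is sufficient.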


Outstanding Provenance Issues in SnowMAN

How to verify that data are converted properly at protocol boundaries?

How to encode the history of multimodal readings for analysis and refinement of algorithms?

How to detect errors in data readings due to sensor faults, time synchronization, or node failure?
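For the first question, one possible check is an end-to-end round trip: push a reading across each boundary, read it back from the database, re-encode it in the original radio format, and compare byte for byte. The frame layout, the comma-separated RS232 line format, and the table schema are hypothetical, the TCP/IP hop is omitted for brevity, and sqlite3 stands in for whatever SQL database is actually used.

```python
import sqlite3
import struct

FRAME = struct.Struct("<HIH")  # hypothetical radio frame: node_id, timestamp, raw value


def radio_to_serial(frame: bytes) -> str:
    """802.15.4 payload -> hypothetical RS232 text line."""
    node_id, ts, raw = FRAME.unpack(frame)
    return f"{node_id},{ts},{raw}\r\n"


def serial_to_sql(line: str, db: sqlite3.Connection) -> int:
    """RS232 text line -> SQL row; returns the new row's rowid."""
    node_id, ts, raw = (int(x) for x in line.strip().split(","))
    cur = db.execute("INSERT INTO readings (node_id, ts, raw) VALUES (?, ?, ?)",
                     (node_id, ts, raw))
    return cur.lastrowid


def check_round_trip(frame: bytes, db: sqlite3.Connection) -> bool:
    """Re-encode the stored row in the original frame format and compare bytes."""
    rowid = serial_to_sql(radio_to_serial(frame), db)
    row = db.execute("SELECT node_id, ts, raw FROM readings WHERE rowid = ?",
                     (rowid,)).fetchone()
    return FRAME.pack(*row) == frame


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE readings (node_id INTEGER, ts INTEGER, raw INTEGER)")
print(check_round_trip(FRAME.pack(7, 1427500800, 612), db))  # True
```

Such a check catches lossy or inconsistent conversions, such as truncated fields or dropped provenance columns, but it cannot tell whether a value that survives the round trip was interpreted correctly in between.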


REM in Macrocosm: Sagehen Creek Field Station

Sagehen Creek Field Station and Experimental Forest located near Truckee, CA

Research and teaching facility of UC Berkeley.

9,000 acres of undisturbed wilderness; extensive REM technology.


REM in Macrocosm: Sagehen Creek Field Station

Literally hundreds of sensor devices of various kinds: temperature, wind, humidity; streamflow, stream temperature; snow height, SWE; video.

9 hubs with (programmable) dataloggers, power, and wireless transmission.

Goal: wireless connectivity to the field house and the internet, with off-site data warehousing.

Multiple user and administration groups.


Sagehen Creek Field Station


Provenance Issues at Sagehen

Inherits the microcosmic issues (time, location, and sensor modality are essential to the data).

Video triggering events should be reported.

Group data ownership is now important to report (and to maintain through the data cycle).

Sagehen provenance should be credited in the myriad end-uses of the data.

Diagnostics of network functionality and services.


Provenance Challenges at Sagehen

Inherits the microcosmic challenges, but with:

Increased sampling rates and network traffic.

Much more complex time synchronization.

GPS auto-location for some sensors, manual location for others.

Much greater diversity of devices and communication media (wired, wireless).

More protocol boundaries.

Multimedia.


Sagehen Provenance Issues: Scalability

The Sagehen network is modeled as a source-to-sink dataflow, from sensors to end users.

Sources are extensible by user groups: new sensors and sensor networks (e.g. WSNs), and new remote datalogging/replication architectures.

The sink is usable by end-user groups: arbitrary visualization technologies, and diverse research and education applications.
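The source-to-sink model can be sketched as a small registry in which sources and sinks are plugged in independently and ownership is attached where data enters the network. The class, group names, and field names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Reading = Dict[str, object]
Source = Callable[[], Iterable[Reading]]
Sink = Callable[[Reading], None]


class Dataflow:
    """Hypothetical source-to-sink dataflow with pluggable ends: user groups add
    sources, end-user groups add sinks, and the core attaches ownership in between."""

    def __init__(self, station: str) -> None:
        self.station = station
        self.sources: List[Tuple[str, Source]] = []  # (owning group, source)
        self.sinks: List[Sink] = []

    def add_source(self, group: str, source: Source) -> None:
        self.sources.append((group, source))

    def add_sink(self, sink: Sink) -> None:
        self.sinks.append(sink)

    def run_once(self) -> None:
        """Poll every source once and deliver each reading, tagged with its
        owning group and the station, to every sink."""
        for group, source in self.sources:
            for reading in source():
                tagged = {**reading, "owner": group, "station": self.station}
                for sink in self.sinks:
                    sink(tagged)


flow = Dataflow("Sagehen")
flow.add_source("hydrology", lambda: [{"sensor": "streamflow", "value": 1.8}])
flow.add_source("snow", lambda: [{"sensor": "snow_height", "value": 0.9}])
warehouse: List[Reading] = []
flow.add_sink(warehouse.append)
flow.run_once()
print([(r["owner"], r["sensor"]) for r in warehouse])
```

Because sinks only ever see tagged readings, a new sensor network can be added at the source end without changing any end-user application.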


Sagehen Network: The Current Reality

A data communications backbone is being established over IEEE 802.11 wireless LAN.

Limited data collection over the network (one hop) via canned proprietary software.

Most data collection is done manually from dataloggers.

Sensors are hardwired to dataloggers; there are no WSNs in the field.

Some one-hop connectivity between hubs.


Sagehen Network: The Vision

Seamless source-to-sink dataflow: from sensors in the field to an off-site, permanent data warehouse, with data also accessible on site at the remote hubs (for reliability).

Wireless sensor network capabilities in the field.

Attribution of data to source groups and to Sagehen.

Easy extensibility of the network at the source end, to allow addition of new sensors (and WSNs).


Some Ideas for Supporting Provenance in the Sagehen Software Architecture

Treat data like messages on a protocol stack, with the stack defined across device (protocol) boundaries:

Sensor data starts out “raw” and collects more provenance information as it moves towards the sink.

Higher layers of provenance (time, ownership) encapsulate lower layers.

This allows a compositional (principled) treatment of cross-protocol data transformation.
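A minimal sketch of the protocol-stack idea, assuming each device boundary simply wraps the message it receives in a new provenance layer, the way network protocols wrap lower-layer packets. The layer names and metadata fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Layer:
    """One provenance layer: metadata added at a device/protocol boundary,
    wrapping the message received from the layer below."""
    name: str
    metadata: Dict[str, Any]
    payload: Any  # either the raw reading or another Layer


def encapsulate(name: str, metadata: Dict[str, Any], lower: Any) -> Layer:
    return Layer(name, metadata, lower)


def provenance_chain(msg: Any) -> List[str]:
    """Walk from the outermost layer down to the raw reading."""
    names = []
    while isinstance(msg, Layer):
        names.append(msg.name)
        msg = msg.payload
    return names


def raw_value(msg: Any) -> Any:
    """Strip every provenance layer and return the original sensor value."""
    while isinstance(msg, Layer):
        msg = msg.payload
    return msg


# A raw value picks up provenance as it moves toward the sink (all fields hypothetical).
sensor = encapsulate("sensor", {"type": "stream_temp", "adc_bits": 12}, 2048)
hub = encapsulate("hub", {"hub_id": 4, "timestamp": 1427500800}, sensor)
at_sink = encapsulate("owner", {"group": "hydrology", "station": "Sagehen"}, hub)

print(provenance_chain(at_sink))  # ['owner', 'hub', 'sensor']
print(raw_value(at_sink))         # 2048
```

Since no boundary rewrites the layers beneath it, each transformation can be examined on its own and the raw reading remains recoverable at the sink, which is one way to read the compositional treatment the slide mentions.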


Some Ideas for Supporting Provenance in the Sagehen Software Architecture

Watermarking data to establish Sagehen and group ownership.

Easily done for video media: video is retrieved only from the internet, and watermarking is performed on a traditional platform.

Watermarking sensor data? The need to preserve the data may rule out traditional techniques, and in-the-field retrieval requires in-the-field watermarking.
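The concern about sensor data can be made concrete with the classic least-significant-bit (LSB) technique, used here purely as an illustration and not as a proposal from the slides: each raw ADC count gives up its lowest bit to a watermark bit. The ownership bit pattern and the sample values are hypothetical.

```python
from typing import List


def embed_lsb(samples: List[int], mark: List[int]) -> List[int]:
    """Classic LSB watermark: overwrite the lowest bit of each raw ADC count with
    one watermark bit, repeating the mark across the series."""
    return [(s & ~1) | mark[i % len(mark)] for i, s in enumerate(samples)]


def extract_lsb(samples: List[int], length: int) -> List[int]:
    """Read the first `length` watermark bits back out of the marked samples."""
    return [s & 1 for s in samples[:length]]


raw = [2048, 2051, 2047, 2050]
mark = [1, 0, 1, 1]  # hypothetical ownership bit pattern
marked = embed_lsb(raw, mark)
print(marked)                                        # [2049, 2050, 2047, 2051]
print(extract_lsb(marked, len(mark)) == mark)        # True
print(max(abs(a - b) for a, b in zip(raw, marked)))  # 1
```

Every marked sample may now differ from the true reading by one ADC count. In video that change is invisible; in a measurement series it is a deliberate error in the data, which is why traditional techniques may not be acceptable for sensor data.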


Conclusion

Remote environmental monitoring requires provenance for correct interpretation of data.

REM networks are heterogeneous, and some components are computationally “weak”. Power and cost restrictions. Protocol hodgepodge!

Adapting to the REM environment is a unique challenge for provenance in software.


Conclusion

Two case studies:

SnowMAN: lightweight, low-cost SWE monitoring.

Sagehen Creek Field Station: REM in macrocosm.

http://www.cs.uvm.edu/~skalka

http://sagehen.ucnrs.org/