SAN DIEGO SUPERCOMPUTER CENTER NEAR REAL TIME VISUALIZATION OF USGS INSTANTANEOUS DATA: INTEGRATION OF OPEN SOURCE DATA TURBINE IN CUAHSI HIS Thomas Whitenack David Ryan, David Valentine, Ilya Zaslavsky, Matt Rodriguez


Feb 25, 2016


NEAR REAL TIME VISUALIZATION OF USGS INSTANTANEOUS DATA: INTEGRATION OF OPEN SOURCE DATA TURBINE IN CUAHSI HIS. Thomas Whitenack, David Ryan, David Valentine, Ilya Zaslavsky, Matt Rodriguez. USGS Instantaneous water data services. 15 minute intervals, 10,000+ sites (7,000+ have discharge) - PowerPoint PPT Presentation
Transcript
Page 1: Thomas Whitenack David Ryan, David Valentine,  Ilya Zaslavsky, Matt Rodriguez

SAN DIEGO SUPERCOMPUTER CENTER

NEAR REAL TIME VISUALIZATION OF USGS INSTANTANEOUS DATA:

INTEGRATION OF OPEN SOURCE DATA TURBINE IN CUAHSI HIS

Thomas Whitenack

David Ryan, David Valentine, Ilya Zaslavsky, Matt Rodriguez

Page 2

USGS Instantaneous water data services

• 15-minute intervals

• 10,000+ sites (7,000+ have discharge)

• Up to 60 days of data available

• http://waterservices.usgs.gov/WOF/InstantaneousValues

• Data provided using CUAHSI WaterML
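The slides do not show client code. As a minimal sketch of what consuming this WaterML involves, the Python below pulls (dateTime, value) pairs out of a simplified, hypothetical WaterML 1.1 response. The namespace URI, the trimmed element layout, and the sample values are assumptions; real responses also carry site, variable, and qualifier metadata, and WaterML 1.0 uses a different namespace.

```python
import xml.etree.ElementTree as ET

# Assumed WaterML 1.1 namespace; WaterML 1.0 responses use .../waterML/1.0/ instead.
WML_NS = "http://www.cuahsi.org/waterML/1.1/"

# Hypothetical, heavily trimmed response holding one 15-minute series.
SAMPLE = f"""<timeSeriesResponse xmlns="{WML_NS}">
  <timeSeries>
    <values>
      <value dateTime="2009-06-01T00:00:00">412</value>
      <value dateTime="2009-06-01T00:15:00">409</value>
      <value dateTime="2009-06-01T00:30:00">405</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""


def parse_values(xml_text: str) -> list[tuple[str, float]]:
    """Return (dateTime, value) pairs from every namespaced <value> element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for elem in root.iter(f"{{{WML_NS}}}value"):
        pairs.append((elem.attrib["dateTime"], float(elem.text)))
    return pairs
```

Each pair keeps its own timestamp string, which matters later because every value pushed into DataTurbine needs an explicit time.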

Page 3

Open Source Data Turbine (Ring Buffered Network Bus)

• DataTurbine is a robust open-source streaming data middleware system, designed for sensor-based systems.

• Co-developed by our UCSD / Calit2 colleagues.

• Solution for accessing both streaming and static data, from different vendor systems, via a common interface.

• Released under the Apache 2.0 Open Source License.

• Provides high-performance data streaming: 10+ MB/sec, 1,000 frames/sec.
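To illustrate the "ring buffered" idea in the slide title, here is a toy fixed-capacity channel in Python. It is a sketch of the concept only, not OSDT's implementation (which the later slides describe as backed by a 50 GB disk cache); the class and method names are hypothetical. When the buffer is full, each new frame silently evicts the oldest one, and readers fetch by time window.

```python
from collections import deque
from typing import Optional


class RingBufferChannel:
    """Toy channel that keeps only the newest `capacity` (time, value) frames."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen drops items from the left once it is full.
        self.frames: deque = deque(maxlen=capacity)

    def put(self, t: float, value: float) -> None:
        """Append a frame; assumes frames arrive in time order."""
        self.frames.append((t, value))

    def oldest_time(self) -> Optional[float]:
        """Earliest time still held, or None if the channel is empty."""
        return self.frames[0][0] if self.frames else None

    def request(self, start: float, duration: float) -> list:
        """Return frames with start <= t < start + duration."""
        end = start + duration
        return [(t, v) for t, v in self.frames if start <= t < end]
```

The fixed capacity is what bounds storage: with a 15-minute feed, the capacity directly determines how far back in time a viewer can scroll.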

Page 4

Open Source DataTurbine

• Supported by NASA SBIR, 15 years in development

• Supports multiple types of streams: real-time monitoring, video and multimedia, telemetry, instant messages, etc.

• Scalable: DataTurbine servers can be interconnected to handle large streams

• Can manipulate the streams: fast forward or slow motion playback (TiVo-like)
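TiVo-like playback amounts to rescaling stream time against wall-clock time. A minimal sketch, assuming timestamps in seconds and a single speed factor (this is an illustration of the idea, not RDV's or OSDT's code; the function name is hypothetical):

```python
def playback_offsets(timestamps: list[float], speed: float) -> list[float]:
    """Map stream timestamps (seconds) to wall-clock offsets for replay at `speed`.

    speed > 1 replays faster than real time (fast forward);
    0 < speed < 1 replays slower (slow motion).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not timestamps:
        return []
    t0 = timestamps[0]
    return [(t - t0) / speed for t in timestamps]
```

For example, one day of 15-minute data (86,400 s of stream time) replayed at speed 3600 takes 24 seconds of wall-clock time.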

Page 5

Goal of Integrating Data Turbine with CUAHSI HIS

• Get the two systems to work together.

• Maintain an up-to-date view of a large volume of near real time data, in house.

• Store data locally beyond the 60 days it is made available.

• Enable viewing of the NWIS Instantaneous data in the Realtime Data Viewer (RDV).

Page 6

Challenges of Project

• Integrate CUAHSI HIS with the Data Turbine.

• CUAHSI HIS perspective:

– Consuming WaterML from a Java environment.

– Obtaining and storing NWIS 15-minute data beyond 60 days.

• Data Turbine perspective: CUAHSI data presented unusual challenges.

– Pulling data.

– Time stamps have to be set for each value.

• 7,000 “channels” needed to be organized for the RDV client.

– Visualizing / navigating massive volumes of data.
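Because NWIS data has to be pulled (polled) rather than pushed, and each WaterML value carries its own timestamp, a source must work out on every poll which values it has not yet forwarded. A minimal sketch, assuming `YYYY-MM-DDTHH:MM:SS` timestamps in UTC and a hypothetical in-memory map of the last time sent per channel:

```python
from datetime import datetime, timezone


def to_epoch(date_time: str) -> float:
    """Convert 'YYYY-MM-DDTHH:MM:SS' (assumed UTC) to epoch seconds."""
    dt = datetime.strptime(date_time, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return dt.timestamp()


def new_frames(channel: str,
               pairs: list[tuple[str, float]],
               last_sent: dict[str, float]) -> list[tuple[float, float]]:
    """Return (epoch_time, value) frames newer than the last time sent on `channel`.

    Frames come back sorted by time so they can be appended in order, and
    `last_sent` is advanced so the next poll skips them.
    """
    cutoff = last_sent.get(channel, float("-inf"))
    frames = sorted((to_epoch(t), v) for t, v in pairs)
    fresh = [(t, v) for t, v in frames if t > cutoff]
    if fresh:
        last_sent[channel] = fresh[-1][0]
    return fresh
```

Since each poll can return up to 60 days of overlapping data, filtering against the last sent time keeps the ring buffer from receiving duplicate or out-of-order frames.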

Page 7

CUAHSI –> Data Turbine

Page 8

OSDT Custom Source

• Each source is a separate connection.

• 7,000 sources were too many for OSDT.

• Sources can have multiple channels and subchannels.

• Sites were organized by state and county to make them navigable.

• 50 GB disk cache: ~1 year of 15-minute data for 7,000 sites.

• Cycling through 7,000+ getValues requests takes ~18 hours for the first iteration, or upon restart.

• Subsequent iterations can still complete in under 8 hours.
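The slides say only that sites were organized by state and county to keep the source count manageable. One hypothetical layout is one source per state, with slash-separated channel paths of `county/site/variable`; the `Site` type, the path scheme, and the placeholder site codes below are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    code: str      # USGS site number (placeholders in the example below)
    state: str     # two-letter state code
    county: str
    variable: str  # e.g. "discharge"


def plan_sources(sites: list[Site]) -> dict[str, list[str]]:
    """Group sites into one source per state with 'county/site/variable' channels.

    Both the grouping and the path scheme are hypothetical.
    """
    sources: dict[str, list[str]] = defaultdict(list)
    for s in sites:
        sources[s.state].append(f"{s.county}/{s.code}/{s.variable}")
    # Sort states and channel paths so the viewer tree is stable between runs.
    return {state: sorted(chans) for state, chans in sorted(sources.items())}
```

With roughly 50 states, this turns 7,000+ connections into a few dozen sources, each browsable by county in a viewer tree.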

Page 9

Realtime Data Viewer (RDV)

Page 10

OSDT Custom “Sink”

• Is essentially a custom client connection to DataTurbine (RDV is a sink process).

• Pulls data and writes it to SQL batch files for batch inserts.

• Used to update the local ODM instance of NWIS instantaneous data.
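A minimal sketch of writing one such SQL batch, assuming a simplified table modelled on the ODM DataValues table. The column subset and both helper names are hypothetical; a real ODM insert also needs MethodID, SourceID, CensorCode, QualityControlLevelID, and similar keys resolved from lookup tables. The SQLite helper exists only to check that the generated text parses and loads.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical column subset modelled on the ODM DataValues table.
COLUMNS = "SiteID, VariableID, LocalDateTime, UTCOffset, DateTimeUTC, DataValue"


def batch_insert_sql(site_id: int, variable_id: int,
                     rows: list[tuple[str, float]], utc_offset_hours: float) -> str:
    """Render one multi-row INSERT for a series of (UTC ISO timestamp, value) rows.

    Timestamps are parsed and numbers coerced before formatting, so only
    well-formed dates and numbers reach the SQL text.
    """
    if not rows:
        return ""
    tuples = []
    for utc_text, value in rows:
        utc = datetime.fromisoformat(utc_text)
        local = utc + timedelta(hours=utc_offset_hours)
        tuples.append(
            f"({int(site_id)}, {int(variable_id)}, '{local.isoformat(sep=' ')}', "
            f"{float(utc_offset_hours)!r}, '{utc.isoformat(sep=' ')}', {float(value)!r})"
        )
    return f"INSERT INTO DataValues ({COLUMNS}) VALUES\n" + ",\n".join(tuples) + ";\n"


def load_batch_in_sqlite(sql: str) -> list[tuple]:
    """Smoke-test a generated batch against a throwaway in-memory SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE DataValues (SiteID INTEGER, VariableID INTEGER, "
                "LocalDateTime TEXT, UTCOffset REAL, DateTimeUTC TEXT, DataValue REAL)")
    con.executescript(sql)
    rows = con.execute(
        "SELECT LocalDateTime, DataValue FROM DataValues ORDER BY DateTimeUTC"
    ).fetchall()
    con.close()
    return rows
```

Batching many rows per statement, rather than issuing one INSERT per value, is the usual reason to write batch files when thousands of series update at once. Some servers cap rows per VALUES list, so large batches may need splitting.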

Page 11

Conclusions

• CUAHSI HIS WaterML can be used successfully in Java/non-Windows environments.

• Displaying near real-time data in RDV is very fast, and RDV is a valuable visualization tool.

• Data Turbine is designed to ingest much more data than this.

– Capable of 10 MB/second; we're feeding it < 1 KB/second.

• Updating 7,000+ data channels worked, but is well beyond what the OSDT developers had in mind when designing it.

• Organizing 7,000+ channels in a viewer display presents organizational challenges.
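The "< 1K/second" figure can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes each frame is a 64-bit value plus a 64-bit timestamp with no protocol overhead, and reads "10 MB/second" as 10,000,000 bytes/second; both are assumptions, not measurements.

```python
SECONDS_PER_SAMPLE = 15 * 60       # one value per site every 15 minutes
BYTES_PER_FRAME = 8 + 8            # assumption: 64-bit value + 64-bit timestamp
STATED_CAPACITY = 10_000_000       # "10 MB/second", read as decimal megabytes


def feed_rate_bytes_per_sec(sites: int) -> float:
    """Average ingest rate if every site delivers one frame per sampling interval."""
    return sites * BYTES_PER_FRAME / SECONDS_PER_SAMPLE


def headroom(sites: int) -> float:
    """How many times larger the stated capacity is than the estimated feed."""
    return STATED_CAPACITY / feed_rate_bytes_per_sec(sites)
```

For 7,000 sites this comes to about 124 bytes/second, consistent with "< 1K/second" and roughly 80,000 times below the stated capacity. Because sites are polled one request at a time, actual traffic likely arrives in bursts rather than at this average.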

Page 12

Questions?


• http://www.dataturbine.org