SAN DIEGO SUPERCOMPUTER CENTER
NEAR REAL TIME VISUALIZATION OF USGS INSTANTANEOUS DATA: INTEGRATION OF OPEN SOURCE DATA TURBINE IN CUAHSI HIS
Thomas Whitenack
David Ryan, David Valentine, Ilya Zaslavsky, Matt Rodriguez
Feb 25, 2016
USGS Instantaneous water data services
• 15-minute intervals
• 10,000+ sites (7,000+ have discharge)
• Up to 60 days of data available
• http://waterservices.usgs.gov/WOF/InstantaneousValues
• Data provided using CUAHSI WaterML
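The figures above imply a rough data volume, worked out in the short sketch below. It assumes one value per site per 15-minute interval for a single variable (discharge) and uses 7,000 as a lower bound for the "7,000+" site count; the constant names are illustrative only.

```python
# Back-of-the-envelope volume estimate from the figures on this slide.
# Assumption: one discharge value per site per 15-minute interval.
MINUTES_PER_DAY = 24 * 60
INTERVAL_MINUTES = 15        # reporting interval
RETENTION_DAYS = 60          # window served by the USGS service
DISCHARGE_SITES = 7_000      # lower bound for "7,000+" sites

values_per_site_per_day = MINUTES_PER_DAY // INTERVAL_MINUTES
values_per_site_in_window = values_per_site_per_day * RETENTION_DAYS
values_per_day_all_sites = values_per_site_per_day * DISCHARGE_SITES

print(values_per_site_per_day)    # 96
print(values_per_site_in_window)  # 5760
print(values_per_day_all_sites)   # 672000
```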
Open Source Data Turbine (Ring Buffered Network Bus)
• DataTurbine is a robust open-source streaming data middleware system, designed for sensor-based systems.
• Co-developed by our UCSD / Calit2 colleagues.
• Solution for accessing both streaming and static data, from different vendor systems, via a common interface.
• Released under the Apache 2.0 open source license.
• Provides high-performance data streaming: 10+ MB/sec, 1000 frames/sec.
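The "ring buffered" idea in the slide title can be illustrated with a minimal sketch: a fixed-capacity buffer where the newest frame overwrites the oldest, and clients request frames by time range. This is not the DataTurbine API; the class `RingBuffer` and its methods `put` and `fetch` are hypothetical names chosen for illustration.

```python
from collections import deque


class RingBuffer:
    """Fixed-capacity buffer: once full, each new frame evicts the oldest."""

    def __init__(self, capacity):
        # deque with maxlen drops items from the left when it overflows.
        self._frames = deque(maxlen=capacity)

    def put(self, timestamp, value):
        """Append one time-stamped frame."""
        self._frames.append((timestamp, value))

    def fetch(self, start, end):
        """Return frames whose timestamp lies in [start, end]."""
        return [(t, v) for t, v in self._frames if start <= t <= end]

    def __len__(self):
        return len(self._frames)


buf = RingBuffer(capacity=3)
for t in range(5):
    buf.put(t, t * 10)

print(len(buf))          # 3
print(buf.fetch(0, 10))  # [(2, 20), (3, 30), (4, 40)]
```

The same buffer serves both live viewing (ask for the newest frames) and recent history (ask for an older time range), which is the "streaming and static data via a common interface" point in miniature.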
Open Source DataTurbine
• Supported by NASA SBIR, 15 years in development
• Supports multiple types of streams: real-time monitoring, video and multimedia, telemetry, instant messages, etc.
• Scalable: DataTurbine servers can be interconnected to handle large streams
• Can manipulate the streams: fast forward or slow motion playback (TiVo-like)
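TiVo-like playback amounts to replaying recorded timestamps with the gaps between frames scaled by a speed factor. The sketch below shows only that arithmetic; `playback_delays` is a hypothetical helper, not part of DataTurbine or RDV.

```python
def playback_delays(timestamps, speed=1.0):
    """Return the wait in seconds before each frame when replaying at `speed`x.

    speed > 1 is fast forward, 0 < speed < 1 is slow motion.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not timestamps:
        return []
    delays = [0.0]  # the first frame is shown immediately
    for prev, cur in zip(timestamps, timestamps[1:]):
        delays.append((cur - prev) / speed)
    return delays


# Four readings recorded 15 minutes (900 s) apart.
recorded = [0, 900, 1800, 2700]
print(playback_delays(recorded, speed=900))  # [0.0, 1.0, 1.0, 1.0]
print(playback_delays(recorded, speed=0.5))  # [0.0, 1800.0, 1800.0, 1800.0]
```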
Goal of Integrating Data Turbine with CUAHSI HIS
• Get the two systems to work together.
• Maintain an up-to-date view of a large volume of near real time data, in house.
• Store data locally beyond the 60 days it is made available.
• Enable viewing of the NWIS Instantaneous data in the Realtime Data Viewer (RDV).
Challenges of Project
• Integrate CUAHSI HIS with the Data Turbine
• CUAHSI HIS perspective:
  – Consuming WaterML from a Java environment
  – Obtaining and storing NWIS 15-minute data beyond 60 days
• Data Turbine perspective:
  • CUAHSI data presented unusual challenges
    – Pulling data.
    – Time stamps have to be set for each value.
  • 7,000 “Channels” needed to be organized for the RDV client
    – Visualizing / navigating mass volumes of data.
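The "time stamps have to be set for each value" point can be sketched as follows: each WaterML value carries its own timestamp, so a consumer extracts (timestamp, value) pairs rather than stamping data on arrival. The project did this in Java; the sketch uses Python only for brevity. The XML fragment is a hand-written, heavily simplified stand-in with made-up readings, and the namespace and the `dateTime` attribute are assumed to follow WaterML 1.1.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumed WaterML 1.1 namespace; adjust for the version actually served.
NS = {"wml": "http://www.cuahsi.org/waterML/1.1/"}

# Simplified, hand-written fragment with invented readings (not real data).
SAMPLE = """<timeSeriesResponse xmlns="http://www.cuahsi.org/waterML/1.1/">
  <timeSeries>
    <values>
      <value dateTime="2010-06-01T00:00:00">12.5</value>
      <value dateTime="2010-06-01T00:15:00">12.7</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""


def parse_values(xml_text):
    """Return a list of (datetime, float) pairs, one per <value> element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for node in root.iterfind(".//wml:value", NS):
        stamp = datetime.fromisoformat(node.get("dateTime"))
        pairs.append((stamp, float(node.text)))
    return pairs


pairs = parse_values(SAMPLE)
print(len(pairs))                                      # 2
print((pairs[1][0] - pairs[0][0]).total_seconds())     # 900.0
```

Each pair can then be pushed to the stream with its own timestamp, which is the opposite of the usual sensor case where the arrival time is the data time.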
CUAHSI –> Data Turbine
OSDT Custom Source
• Each source is a separate connection
• 7,000 sources were too many for OSDT.
• Sources can have multiple channels and sub-channels
• Sites were organized by state and county to make them navigable
• 50 GB disk cache: ~1 year of 15-minute data for 7,000 sites.
• Cycling through 7,000+ getValues requests takes ~18 hours for the first iteration, or upon restart.
• Subsequent iterations can still complete in under 8 hours.
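One way to picture the state and county organization is as a small number of sources, each holding hierarchical channel paths. The slides do not give the actual naming scheme, so the sketch below assumes one source per state and a `county/site/variable` channel path; the helper names and the site codes are hypothetical.

```python
from collections import defaultdict


def channel_path(county, site_code, variable):
    """Build a hierarchical channel name; '/' separates the levels."""
    parts = (county, site_code, variable)
    # Replace any embedded '/' so it cannot create an extra level.
    return "/".join(p.strip().replace("/", "_") for p in parts)


def build_sources(sites):
    """Group sites into one source per state (an assumption, see above)."""
    sources = defaultdict(list)
    for site in sites:
        sources[site["state"]].append(
            channel_path(site["county"], site["code"], site["variable"])
        )
    return {state: sorted(paths) for state, paths in sources.items()}


# Hypothetical site records; the codes are placeholders, not USGS site numbers.
sites = [
    {"state": "CA", "county": "San Diego", "code": "SITE-0001", "variable": "discharge"},
    {"state": "CA", "county": "Kern", "code": "SITE-0002", "variable": "discharge"},
    {"state": "TX", "county": "Travis", "code": "SITE-0003", "variable": "discharge"},
]

sources = build_sources(sites)
print(len(sources))   # 2
print(sources["CA"])  # ['Kern/SITE-0002/discharge', 'San Diego/SITE-0001/discharge']
```

Grouping this way keeps the connection count near the number of states instead of the number of sites, while the path levels give the viewer a tree to navigate.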
Realtime Data Viewer (RDV)
OSDT Custom “Sink”
• Is essentially a custom client connection to DataTurbine (RDV is a sink process).
• Pulls data and writes it to SQL batch files for batch inserts.
• Used to update local ODM instance of NWIS instantaneous data.
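The sink's "SQL batch files for batch inserts" step might look roughly like the sketch below, which turns pulled rows into multi-row INSERT statements. The table name `DataValues` and the four columns are a simplified assumption loosely based on the ODM schema (the real table has more required columns), and the function names are hypothetical.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (strings quoted and escaped)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return repr(value)


def batch_inserts(rows, batch_size=500, table="DataValues"):
    """Return one multi-row INSERT statement per batch of rows.

    Each row is (site_id, variable_id, local_datetime, data_value).
    Column names are an assumed subset of the ODM DataValues table.
    """
    cols = "(SiteID, VariableID, LocalDateTime, DataValue)"
    statements = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        tuples = ",\n  ".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in chunk
        )
        statements.append(f"INSERT INTO {table} {cols} VALUES\n  {tuples};")
    return statements


# Invented rows for illustration only.
rows = [
    (1, 7, "2010-06-01 00:00:00", 12.5),
    (1, 7, "2010-06-01 00:15:00", 12.7),
    (2, 7, "2010-06-01 00:00:00", 3.1),
]
statements = batch_inserts(rows, batch_size=2)
print(len(statements))  # 2
print(statements[1])
```

Writing the statements to a file and loading them in bulk keeps database round trips low; for direct inserts from application code, parameterized queries would be the safer choice.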
Conclusions
• CUAHSI HIS WaterML can be used successfully in Java / non-Windows environments.
• Displaying near-real-time data in RDV is very fast, and RDV is a valuable visualization tool.
• Data Turbine is designed to ingest much more data than this.
  – Capable of 10 MB/second; we're feeding it < 1 KB/second.
• Updating 7,000+ data channels worked, but is well beyond what the OSDT developers had in mind when designing it.
• Organizing 7,000+ channels in a viewer display presents organizational challenges.
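The gap between rated and actual throughput can be checked with simple arithmetic. The sketch assumes a steady-state average of one value per site every 15 minutes across 7,000 sites, and reads "K" as 1,000 bytes and "MB" as 1,000,000 bytes; the variable names are illustrative.

```python
# Steady-state feed rate implied by the figures in the conclusions.
# Assumptions: 7,000 sites, one value per site per 15 minutes,
# K = 1,000 bytes and MB = 1,000,000 bytes.
SITES = 7_000
INTERVAL_SECONDS = 15 * 60
FEED_LIMIT_BYTES_PER_S = 1_000        # "< 1K/second"
RATED_BYTES_PER_S = 10_000_000        # "10 MB/second"

values_per_second = SITES / INTERVAL_SECONDS
max_bytes_per_value = FEED_LIMIT_BYTES_PER_S / values_per_second
headroom = RATED_BYTES_PER_S / FEED_LIMIT_BYTES_PER_S

print(round(values_per_second, 2))    # 7.78
print(round(max_bytes_per_value, 1))  # 128.6
print(headroom)                       # 10000.0
```

Under these assumptions the feed uses about one ten-thousandth of the rated bandwidth, so the strain reported here comes from the number of channels, not the data rate.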
Questions?
• http://www.dataturbine.org