SAN DIEGO SUPERCOMPUTER CENTER, UCSD SciR&D A SCALABLE SYSTEM FOR ONLINE ACCESS TO NATIONAL AND LOCAL REPOSITORIES OF HYDROLOGIC TIME SERIES Ilya Zaslavsky, Reza Wahadj, David Valentine, Blair Jennings (San Diego Supercomputer Center, UCSD) David Maidment (CRWR, UT-Austin) and other HIS development partners from UT-Austin, Utah State U, Drexel U, Duke U

Dec 20, 2015

Page 1

SAN DIEGO SUPERCOMPUTER CENTER, UCSD SciR&D

A SCALABLE SYSTEM FOR ONLINE ACCESS TO NATIONAL AND LOCAL REPOSITORIES OF HYDROLOGIC TIME SERIES

Ilya Zaslavsky, Reza Wahadj, David Valentine, Blair Jennings (San Diego Supercomputer Center, UCSD)

David Maidment (CRWR, UT-Austin)

and other HIS development partners from UT-Austin, Utah State U, Drexel U, Duke U

Page 2

The Grid is becoming the backbone for collaborative science and data sharing

CI is about RE-USING data and research resources !!

Page 3

CI Vision for Hydrologic Science

• Leverage ongoing cyberinfrastructure projects:
  • Geosciences Network (GEON)
  • Share data between Earth disciplines
  • Secure access to Grid resources, single sign-on authentication/authorization, distributed data management, data publication, search, information integration, knowledge management, scientific workflows, archiving

• Integrate with common COTS (commercial off-the-shelf) software:
  • Excel, ArcGIS, Matlab…
  • and Fortran… mostly on Windows…
  • Interesting survey of CUAHSI partners by David Tarboton!

Page 4

HIS User Assessment (Chapter 4 in Status Report)

Which of the four HIS goals is most important to you?

• Data Access
• Science
• Observatory support
• Education

Page 5

Tuning to unique features of hydrology

• Hydrologic observations:
  • Reliance on federally-organized data collection (NWIS, STORET, Ameriflux, etc.) with huge and complex nomenclatures: simplifying access to federal repositories; relatively lower emphasis on data ownership
  • Handling time in both UTC and local
  • Various spatial offsets
  • Multiple data types: time series, fields, spatial data

• Integrative discipline:
  • Interoperation with atmospheric, ocean, soils, geomorphology, social datasets and services…

• Community:
  • Organized by “natural boundaries”: natural object hierarchy; networks of relatively autonomous self-managed data nodes
  • Partnership with public sector water management

ontologies
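The point about handling time in both UTC and local can be illustrated with a minimal sketch. It assumes a reading logged in local standard time at a known, fixed UTC offset; the function name, the example timestamp, and the offset are all hypothetical and are not taken from HIS.

```python
from datetime import datetime, timedelta, timezone


def to_utc_and_local(local_naive: datetime, utc_offset_hours: float):
    """Return (utc_time, local_time) for a reading logged at a fixed UTC offset."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # Attach the offset to the naive local timestamp, then convert to UTC.
    local_time = local_naive.replace(tzinfo=local_tz)
    return local_time.astimezone(timezone.utc), local_time


# Made-up example: a reading logged at 08:30 local time, 8 hours behind UTC.
utc_time, local_time = to_utc_and_local(datetime(2006, 1, 15, 8, 30), -8)
print(utc_time.isoformat())    # 2006-01-15T16:30:00+00:00
print(local_time.isoformat())  # 2006-01-15T08:30:00-08:00
```

Keeping both values (or the UTC value plus the offset) lets series from different time zones be aligned without losing the time of day at the station.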

Page 6

Problems

• Microsoft and .NET vs Linux and J2EE

• Open source vs proprietary

• Free vs not free

Open architecture, web services, well-defined interfaces

Page 7

Main Components

• Web services for accessing hydrologic repositories

• Hydrologic Observations Data Model

• Hydrologic Data Access System + Time Series Viewer

• Collection of CUAHSI nodes

[Diagram: CUAHSI Web Services, shown with NWIS, NCAR, Unidata, NASA, Storet, NCDC, Ameriflux, ArcGIS, Excel, Matlab, Access, SAS, Fortran, Visual Basic, C/C++]

Page 8

[Architecture diagram. Numbered components:]

(1) Core grid services: monitoring nodes, scheduling, data transfer, replication, collection management, …

(2) Resource drivers

(3) Sensor management services

(4) Sensor data filtering

(5) Hosted data services

(6) Data registration/Search/Query rewriting & orchestration

(7) Ontology source and services

(8) Application services: analysis, mapping, charting, models, workflow, integration

(9) User registration/authentication/authorization

(10) Web services registry and related services

Service consumers: Matlab, ArcGIS, Excel, Web browser, Fortran/C/VB/Java codes

Other labels: portal; Data Nodes; Sensors; NWIS, NAWQA, STORET, . . .; External data resources; registry metadata; RServer; ArcGIS Server; Conversion engine; Certificate authority

Page 9

Some operational services

[Diagram: CUAHSI Web Services between Data Sources and Applications, with Extract, Transform, Load steps]

Data Sources: NWIS, NCAR, Unidata, NASA, Storet, NCDC, Ameriflux

Applications: ArcGIS, Excel, Matlab, Access, SAS, Fortran, Visual Basic, C/C++

http://www.cuahsi.org/his/

Page 10

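Database Sizes

[Table: rows EPA, NWS, USGS; columns Records, Stations, Time range. Values as extracted, row order not preserved:]

Records: 200 million; ?; 250 million

Stations: 800,000; 1.5 million; 19,000

Time range: 100 years; 100 years; 100 years

(From Jon Goodall, Duke U.)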

Page 11

Language for Data Representation

Concept | EPA | NWS | USGS

Unique identifier for an observation station | Station ID | COOPID | site_no

Latitude, longitude | Station Latitude, Station Longitude | LATITUDE, LONGITUDE | dec_lat_va, dec_long_va

Time of measurement | Activity Start | YEAR, MO, DA, TIME | dv_dt

Lots of semantic differences in parameter names, methods, etc.
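One way to picture resolving these naming differences is a per-agency lookup that renames source fields to a shared vocabulary. The sketch below is only an illustration. The source field names come from the slide, but their assignment to EPA, NWS, and USGS is an assumed reading of the flattened table, and the common names (`site_code`, `latitude`, `longitude`), the `normalize` helper, and the sample record are hypothetical.

```python
# Source field names are from the slide; the agency assignment is an assumed
# reading of the table, and the target names on the right are hypothetical.
FIELD_MAP = {
    "USGS": {"site_no": "site_code", "dec_lat_va": "latitude", "dec_long_va": "longitude"},
    "EPA": {"Station ID": "site_code", "Station Latitude": "latitude", "Station Longitude": "longitude"},
    "NWS": {"COOPID": "site_code", "LATITUDE": "latitude", "LONGITUDE": "longitude"},
}


def normalize(agency: str, record: dict) -> dict:
    """Rename agency-specific station fields to the common names; drop the rest."""
    mapping = FIELD_MAP[agency]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# Made-up record, for illustration only.
usgs_record = {"site_no": "X1", "dec_lat_va": 30.0, "dec_long_va": -97.0, "agency_cd": "USGS"}
print(normalize("USGS", usgs_record))
```

A table-driven mapping like this keeps agency quirks in data rather than in code, so adding a repository means adding one dictionary entry. Differences in parameter names and methods would need their own, larger vocabularies.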

Page 12

Typical Example of Data Retrieval

National Water Information System (NWIS)

Page 13

Core Web Services

Service | Input | Output

GetSites | Obs network, filter | Station codes in the network

GetSiteInfo | Station code | Lat/long, station name

GetVariables | Obs network or data source, filter | Variable codes

GetVariableInfo | Variable code | Description of variable

GetValues | Station code or lat/long point, variable code, begin date, end date | A time series of values

GetChart | As for GetValues | A chart plotting the values
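The service list above can be read as a small interface. The following is a toy in-memory stand-in, not the CUAHSI implementation: the class, method names, argument shapes, and sample data are all assumptions made for illustration, and GetChart is omitted because it would need a plotting library.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Site:
    code: str
    name: str
    lat: float
    lon: float


class InMemoryHydroService:
    """Hypothetical stand-in that mirrors the inputs and outputs in the table."""

    def __init__(self, sites, variables, values):
        self._sites = {site.code: site for site in sites}
        self._variables = variables  # variable code -> description
        self._values = values        # (site code, variable code) -> [(date, value)]

    def get_sites(self):
        return sorted(self._sites)

    def get_site_info(self, site_code):
        return self._sites[site_code]

    def get_variables(self):
        return sorted(self._variables)

    def get_variable_info(self, variable_code):
        return self._variables[variable_code]

    def get_values(self, site_code, variable_code, begin, end):
        series = self._values.get((site_code, variable_code), [])
        # Keep observations whose date falls inside the inclusive range.
        return [(day, value) for day, value in series if begin <= day <= end]


# Made-up data, for illustration only.
service = InMemoryHydroService(
    sites=[Site("S1", "Example Creek", 30.0, -97.0)],
    variables={"Q": "Streamflow"},
    values={("S1", "Q"): [(date(2006, 1, 1), 1.5), (date(2006, 1, 2), 2.0), (date(2006, 2, 1), 3.0)]},
)
print(service.get_values("S1", "Q", date(2006, 1, 1), date(2006, 1, 31)))
```

Keeping the same small set of calls for every repository is what lets one client work against many data sources.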

Page 14

CUAHSI Web Services

http://www.cuahsi.org/his/webservices.html

NCEP North American Forecast Model 12 km grid for continental US

Page 15

CUAHSI Point Hydrologic Observations Data Model

• A relational database stored in Access, PostgreSQL, MS SQL Server, …

• Stores observation data made at points

• Consistent format for storage of observations from many different sources and of many different types.

[Diagram labels: Streamflow; Flux tower data; Precipitation & Climate; Groundwater levels; Water Quality; Soil moisture data]

(D. Tarboton, USU)

Community design requirements (22 reviewers)
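To make the idea of one consistent relational format for many observation types concrete, here is a minimal sketch using SQLite from the Python standard library. It is not the HODM schema: every table name, column name, and value below is hypothetical, and SQLite merely stands in for the database engines listed on the slide.

```python
import sqlite3

# Hypothetical three-table layout: sites, variables, and point observations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (site_id INTEGER PRIMARY KEY, site_code TEXT, latitude REAL, longitude REAL);
CREATE TABLE variables (variable_id INTEGER PRIMARY KEY, variable_name TEXT, units TEXT);
CREATE TABLE observations (
    site_id INTEGER REFERENCES sites(site_id),
    variable_id INTEGER REFERENCES variables(variable_id),
    observed_utc TEXT,
    value REAL
);
""")

# Made-up rows: two different observation types share the same table.
conn.execute("INSERT INTO sites VALUES (1, 'SITE-A', 30.0, -97.0)")
conn.executemany("INSERT INTO variables VALUES (?, ?, ?)",
                 [(1, "Streamflow", "m3/s"), (2, "Soil moisture", "percent")])
conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?)",
                 [(1, 1, "2006-01-15T16:30:00Z", 12.5),
                  (1, 1, "2006-01-15T17:30:00Z", 13.0),
                  (1, 2, "2006-01-15T16:30:00Z", 21.0)])

# One query shape serves every variable type.
rows = conn.execute("""
    SELECT v.variable_name, COUNT(*)
    FROM observations o JOIN variables v USING (variable_id)
    GROUP BY v.variable_name
    ORDER BY v.variable_name
""").fetchall()
print(rows)
```

Because the variable is a row in a lookup table rather than a separate table per data type, new kinds of observations can be added without changing the schema.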

Page 16

Schema

Page 17

Hydrologic Observations Data Model: independent of, but coupled to, geographic representation

[Diagram: HODM coupled to Arc Hydro. Link labels: 1, *, OR. Classes and attributes shown:]

HODM

• MonitoringPoint: WaterID, HydroCode, Name, Latitude, Longitude, …

• CouplingTable: WaterID (GUID), HydroID (Integer)

Arc Hydro

• Feature

• Waterbody: HydroID, HydroCode, FType, Name, AreaSqKm, JunctionID

• HydroPoint: HydroID, HydroCode, FType, Name, JunctionID

• Watershed: HydroID, HydroCode, DrainID, AreaSqKm, JunctionID, NextDownID

• ComplexEdgeFeature

• HydroEdge: HydroID, HydroCode, ReachCode, Name, LengthKm, LengthDown, FlowDir, FType, EdgeType, Enabled

• EdgeType: Flowline, Shoreline

• SimpleJunctionFeature

• HydroJunction: HydroID, HydroCode, NextDownID, LengthDown, DrainArea, FType, Enabled, AncillaryRole

• HydroNetwork
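The coupling between the two models can be sketched as a lookup from WaterID to HydroID. The attribute names follow the diagram labels (WaterID as a GUID, HydroID as an integer). The Python classes, the `find_feature` helper, and all sample values are hypothetical, and the sketch does not claim to reproduce how Arc Hydro or HODM implement the link.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringPoint:
    """Observation-side record, keyed by WaterID (a GUID)."""
    water_id: uuid.UUID
    hydro_code: str
    name: str
    latitude: float
    longitude: float


@dataclass(frozen=True)
class HydroPoint:
    """Geographic-side feature, keyed by HydroID (an integer)."""
    hydro_id: int
    hydro_code: str
    name: str


def find_feature(point, coupling, features):
    """Follow WaterID -> HydroID through the coupling table; None if uncoupled."""
    hydro_id = coupling.get(point.water_id)
    return features.get(hydro_id)


# Made-up records, for illustration only.
gauge = MonitoringPoint(uuid.uuid4(), "G1", "Example gauge", 30.0, -97.0)
loose = MonitoringPoint(uuid.uuid4(), "G2", "Uncoupled gauge", 31.0, -98.0)
features = {101: HydroPoint(101, "G1", "Example gauge")}
coupling = {gauge.water_id: 101}  # plays the role of CouplingTable

print(find_feature(gauge, coupling, features))
print(find_feature(loose, coupling, features))
```

Keeping the link in a separate table means observation records can exist without any geographic feature, which matches the slide's point that the two models are independent but coupled.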

Page 18

Uses and tools for HODM

• HODM is central to HIS infrastructure, but lacks tools

• Testing HODM with two types of data: federal repositories, and external databases (Panola). Personal and enterprise versions.

• Mapping wizard: loading Excel observation data to HODM database:
  • Can save mapping files for subsequent runs of similarly formatted spreadsheets

• Local data analysis can be done: charts and stats

• HDAS as an interface to HODM datasets, but it shall not be the only one, so exposing HODM as Web services
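The saved-mapping idea behind the wizard can be sketched as a column mapping stored as JSON and reapplied to any similarly formatted sheet. This is an illustration under stated assumptions, not the wizard itself: CSV text stands in for an Excel spreadsheet so the example needs only the standard library, and the column names, target field names, and `load_rows` helper are hypothetical.

```python
import csv
import io
import json

# Hypothetical mapping from spreadsheet columns to target field names.
mapping = {"Gauge": "site_code", "Date": "observed", "Flow (cfs)": "value"}
saved_mapping = json.dumps(mapping)  # what a saved mapping file could contain


def load_rows(sheet_text: str, mapping_json: str) -> list:
    """Apply a saved column mapping to a sheet; unmapped columns are dropped."""
    column_map = json.loads(mapping_json)
    reader = csv.DictReader(io.StringIO(sheet_text))
    return [{target: row[source] for source, target in column_map.items()}
            for row in reader]


# Made-up sheet contents, for illustration only.
sheet = "Gauge,Date,Flow (cfs),Notes\nG1,2006-01-15,12.5,ok\nG1,2006-01-16,13.0,\n"
rows = load_rows(sheet, saved_mapping)
print(rows)
```

Values stay as strings here; a real loader would also convert types and validate units before inserting into the database.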

Page 19

Hydrologic Data Access System

http://river.sdsc.edu/hdas/

Page 20

Hydrologic Data Access System

Page 21

Cross-platform design

[Diagram: a Central CUAHSI HIS Node (Windows), a GEON Data Node (Linux), and four Remote CUAHSI HIS Nodes (Windows)]

Labels on the central and GEON nodes: Data, Apache Tomcat, IIS Web Server, ASP.NET, GEON Software Stack, SQL Server, Proxy, ArcGIS Technologies, HDAS, HODM, Web Service, Web Services, Web Service proxies

Labels on each remote node: Data, IIS Web Server, ASP.NET, SQL Server, ArcGIS Technologies, HDAS, HODM, Web Service, Web Services, Web Service proxies

Page 22

HIS Scalability

• Adding…
  …data types and datasets; processing models and services; servers; users and roles
  shall not create unmanageable bottlenecks that require system re-engineering

• Designing for scalability:
  • Distilling a generic set of web service signatures; resolving semantic and structural heterogeneities
  • Using HODM as a common generic format for time series data, for ease of coding and uniform search interfaces
  • HDAS GUI design to abstract specifics of disparate repositories
  • Leveraging common CI components developed in GEON

Need to work with agencies to remove web services bottleneck

Page 23

Future Work

• Updating and standardizing web services; services against additional repositories

• Adopting HODM for storing time series observations, and developing tools for loading data, querying, analyzing and visualizing data in HODM

• Finalizing the Windows-based CUAHSI Node, and preparing it for distribution, along with documentation

• “Digital Watershed” conceptualization