HYDROSEEK and HYDROTAGGER A Search Engine for Hydrologists GIS in Water Resources Lecture

Post on 30-Jan-2016



04/22/23 Department of Civil, Architectural & Environmental Engineering 1

HYDROSEEK and HYDROTAGGER

A Search Engine for Hydrologists

GIS in Water Resources Lecture

M. Piasecki

November, 2007

Lecture

• Demo of HydroSeek: What are the search criteria? Functionality of the engine interface
• Data Sources: Common sources; common problems (completeness, syntax, semantics)
• Ontologies: Ontology details; concept-to-data variable tagging
• Architecture: Flow chart; technologies used
• Demo of HydroTagger: Why the tagging? Technologies


www.HydroSeek.org

HIS Goals

• Hydrologic Data Access System: better access to a large volume of high quality hydrologic data
• Support for Observatories: synthesizing hydrologic data for a region
• Advancement of Hydrologic Science: data modeling and advanced analysis
• Hydrologic Education: better data in the classroom, basin-focused teaching

Objective

Search multiple heterogeneous data sources simultaneously, regardless of semantic or structural differences between them.

What we are doing now …

[Figure: each data source (NWIS, NARR, NAWQA, NAM-12) is queried through its own separate request/return exchange.]

What we would like to do …

[Figure: a single generic request goes to a Semantic Mediator, which issues GetValues calls to NWIS, NAWQA, NARR, and HODM.]
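The mediator idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HydroSeek implementation: the adapter functions, the per-source variable codes, and the returned values are all hypothetical placeholders.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, str, float]  # (source, source-specific code, value)

# Hypothetical per-source adapters. A real adapter would call the
# source's GetValues service; these return placeholder values.
def nwis_get_values(code: str) -> List[Record]:
    return [("NWIS", code, 1.0)]

def nawqa_get_values(code: str) -> List[Record]:
    return [("NAWQA", code, 2.0)]

ADAPTERS: Dict[str, Callable[[str], List[Record]]] = {
    "NWIS": nwis_get_values,
    "NAWQA": nawqa_get_values,
}

# Hypothetical mapping from a generic concept to each source's own code.
CONCEPT_TO_CODE: Dict[str, Dict[str, str]] = {
    "water level": {"NWIS": "nwis_stage_code", "NAWQA": "nawqa_level_code"},
}

def get_values(concept: str) -> List[Record]:
    """Fan one generic request out to every source that knows the concept."""
    results: List[Record] = []
    for source, code in CONCEPT_TO_CODE.get(concept, {}).items():
        results.extend(ADAPTERS[source](code))
    return results

print(get_values("water level"))
```

The caller only names the concept; the translation to source-specific codes stays inside the mediator.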

Data sources…

USGS

EPA

CIMS

TCEQ

NADP

Spatial Coverage

STORET has 758 sites in Texas; TCEQ has 8,407.

STORET has 47,602 sites in Florida; NWIS has 27,906.

NWIS has 121,545 sites in Minnesota; STORET has 22,260.

Data Availability

Temporal Coverage

[Figure: nitrogen, shown for the periods 1957-1977, 1977-2003, and 2003-2007.]

Interface Problem

NWIS: ~175 form elements on a single page

STORET + NWIS + TCEQ + CIMS = ???

A drop down menu ∞

String search across parameter list? How about synonyms? ‘Elevation, water surface’ vs. ‘stage height’

Completeness Problem: Metadata Catalog

• Better query performance
• Freedom
• Fewer errors

Availability of geographic identifiers for stations in EPA STORET

Total number of sites                  274,918
Sites with geographic coordinates      274,435
Sites with State/County information    273,113
Sites with Hydrologic Unit Codes       128,646
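To make the completeness gap concrete, the counts can be turned into coverage percentages. The numbers are the ones quoted for EPA STORET; the function and variable names are only illustrative.

```python
TOTAL_SITES = 274_918  # total number of STORET sites quoted in the slide

IDENTIFIER_COUNTS = {
    "geographic coordinates": 274_435,
    "State/County information": 273_113,
    "Hydrologic Unit Codes": 128_646,
}

def coverage_percent(count: int, total: int = TOTAL_SITES) -> float:
    """Share of sites carrying a given identifier, in percent."""
    return 100.0 * count / total

for name, count in IDENTIFIER_COUNTS.items():
    print(f"{name}: {coverage_percent(count):.1f}%")
```

Coordinates and State/County information are present for more than 99% of sites, while fewer than half of the sites carry a Hydrologic Unit Code, so a search by hydrologic unit cannot rely on the stored codes alone.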

Heterogeneity Problem

• Syntax: e.g. date & time formats, Gregorian versus Julian
• Data format/structure: e.g. XML, HTML, tab/tilde/comma separated text, gzipped tarballs…
• Semantics: more …
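The syntax problem with dates can be illustrated with a small normalizer. This is a sketch under assumptions: the list of formats is hypothetical, and "Julian" is taken to mean year plus day-of-year, which is how the term is commonly used in data files.

```python
from datetime import date, datetime

# Formats tried in order. "%Y%j" is a four-digit year followed by the
# day of the year. The list itself is an assumption for illustration.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%Y%j")

def parse_date(text: str) -> date:
    """Return a calendar date for any of the known spellings."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

for raw in ("2006-08-13", "08/13/2006", "13 August 2006", "2006225"):
    print(raw, "->", parse_date(raw).isoformat())
```

All four spellings resolve to the same day. Trying formats in order cannot detect genuinely ambiguous inputs (for example day-first versus month-first), so in practice the format has to be recorded per data source.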

Issues with Semantics

• Hyponymy: parameters “Groundwater level”, “Stream stage”, “Reservoir level” versus “Water level”
• Pseudo hyponymy due to lack of metadata: parameter “Manganese, 6N hydrochloric acid extracted, recoverable, dry weight, milligrams per kilogram” versus “Manganese, milligrams per kilogram”
• Synonymy: ‘Total Kjeldahl Nitrogen’ vs. ‘Ammonia+Organic Nitrogen’

Search Strategy

Search → Fine tune → Retrieve

rather than

Search → Retrieve

to avoid ‘high precision, low recall’ and ‘low precision, high recall’ problems.
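The precision/recall trade-off named in this slide can be made concrete with the standard definitions: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that were retrieved. The result sets below are made up for illustration; only the four water-level parameter names come from the lecture.

```python
from typing import Set, Tuple

def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Standard information-retrieval precision and recall."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"Water level", "Groundwater level", "Stream stage", "Reservoir level"}

# Exact string match: everything returned is relevant, but most is missed.
narrow = {"Water level"}

# Very loose match: everything relevant is found, plus unrelated hits
# (the four extra names are hypothetical).
broad = relevant | {"Water temperature", "Water color", "Sea level pressure", "Noise level"}

print(precision_recall(narrow, relevant))
print(precision_recall(broad, relevant))
```

The narrow query has high precision and low recall, the broad one the reverse. A fine-tuning step between search and retrieval lets the user start broad and then discard the unrelated hits.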

Layered Ontology Model

[Figure: ontology layers labeled Navigation, Compound, and Core.]

Knowledge Base: OWL Ontologies

‘Escherichia coli’ = ‘E. coli’
‘E. coli’ is-a ‘Indicator Organism’

‘Copper’ is-a ‘Micronutrient’
‘Copper’ isMeasuredIn ‘Medium’
‘Medium’ = {Water, Soil…}
‘Micronutrient’ is-a ‘Nutrient’

• Supports classification of search results
• Entities in the ontology are associated with measured variables in a relational database
• Helps solve semantic heterogeneity issues between data repositories
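A toy version of this knowledge base shows how equivalence and is-a statements support classification. This sketch uses plain dictionaries rather than OWL, and it assumes each term has a single parent, which real ontologies do not require. The statements themselves are the ones listed in the slide.

```python
from typing import Dict, List

SYNONYMS: Dict[str, str] = {"Escherichia coli": "E. coli"}  # alias -> preferred label

IS_A: Dict[str, str] = {
    "E. coli": "Indicator Organism",
    "Copper": "Micronutrient",
    "Micronutrient": "Nutrient",
}

def canonical(term: str) -> str:
    """Map an alias to its preferred label; other terms pass through."""
    return SYNONYMS.get(term, term)

def ancestors(term: str) -> List[str]:
    """Follow is-a links upward from a term, nearest class first."""
    chain: List[str] = []
    current = canonical(term)
    while current in IS_A:
        current = IS_A[current]
        chain.append(current)
    return chain

def is_a(term: str, cls: str) -> bool:
    """True if cls is reachable from term through is-a links."""
    return cls in ancestors(term)

print(ancestors("Copper"))
print(is_a("Escherichia coli", "Indicator Organism"))
```

Because is-a is followed transitively, a search for ‘Nutrient’ can return copper measurements even though no repository labels them that way.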


Point Observations Information Model

Level         Example
Data Source   USGS
Network       Streamflow gages
Sites         Neuse River near Clayton, NC
Variables     Discharge, stage (daily or instantaneous)
Values        206 cfs, 13 August 2006

Values: {Value, Time, Qualifier, Offset}

• A data source operates an observation network
• A network is a set of observation sites
• A site is a point location where one or more variables are measured
• A variable is a property describing the flow or quality of water
• A value is an observation of a variable at a particular time
• A qualifier is a symbol that provides additional information about the value
• An offset allows specification of measurements at various depths in water
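The hierarchy described above maps naturally onto nested records. The following is a minimal sketch, not the CUAHSI schema: the class and field names are chosen for illustration, and the units field on the variable is an assumption added so that the "206 cfs" example can be represented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class Value:
    value: float
    time: date
    qualifier: Optional[str] = None   # symbol with extra information about the value
    offset: Optional[float] = None    # e.g. depth at which the measurement was taken

@dataclass
class Variable:
    name: str
    units: str                        # assumed field, see lead-in
    values: List[Value] = field(default_factory=list)

@dataclass
class Site:
    name: str
    variables: List[Variable] = field(default_factory=list)

@dataclass
class Network:
    name: str
    sites: List[Site] = field(default_factory=list)

@dataclass
class DataSource:
    name: str
    networks: List[Network] = field(default_factory=list)

# The example from the slide, expressed in this structure.
discharge = Variable("Discharge", "cfs", [Value(206.0, date(2006, 8, 13))])
site = Site("Neuse River near Clayton, NC", [discharge])
usgs = DataSource("USGS", [Network("Streamflow gages", [site])])

print(usgs.networks[0].sites[0].variables[0].values[0])
```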

http://www.cuahsi.org/his/webservices.html

GetSites

GetSiteInfo

GetVariables

GetVariableInfo

GetValues

HydroSeek Web Services

• Most HydroSeek functions are available as web services (SOAP)
• Supports queries using Global Change Master Directory (GCMD) keywords
• Supports output in Geography Markup Language (GML) as well as WaterML

[Figure: components shown are the Drexel server, HydroSeek, native services, a Microsoft server with the Virtual Earth map, the San Diego Supercomputer Center server, and WaterOneFlow services for USGS Daily, EPA STORET, USGS Realtime, TCEQ, and CIMS.]

GetStations

BoundingBox

Request

Response
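The logic behind a bounding-box station query can be sketched locally. This does not reproduce the actual SOAP interface: the type names, field names, and station coordinates below are hypothetical.

```python
from typing import List, NamedTuple

class Station(NamedTuple):
    station_id: str
    lat: float
    lon: float

class BoundingBox(NamedTuple):
    south: float
    west: float
    north: float
    east: float

def stations_in_box(stations: List[Station], box: BoundingBox) -> List[Station]:
    """Keep stations inside the box (boxes crossing the 180° meridian are not handled)."""
    return [
        s for s in stations
        if box.south <= s.lat <= box.north and box.west <= s.lon <= box.east
    ]

# Hypothetical stations and search box.
stations = [
    Station("A", 35.6, -78.4),
    Station("B", 40.0, -75.2),
    Station("C", 29.7, -95.4),
]
box = BoundingBox(south=35.0, west=-80.0, north=41.0, east=-75.0)

print([s.station_id for s in stations_in_box(stations, box)])
```

A query like this only works for sites that carry geographic coordinates, which is why the completeness of the metadata catalog matters.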


GetStationsByHU

HUC_Code

Request

Response

Request

Response
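Hydrologic Unit Codes are hierarchical: longer codes refine shorter ones (two digits for a region, up to eight for a subbasin). A lookup by hydrologic unit can therefore be sketched as a prefix match. The function name, the station ids, and the sample codes are illustrative assumptions, not the service's actual behavior.

```python
from typing import Dict, List, Optional

def stations_by_hu(station_hucs: Dict[str, Optional[str]], huc_code: str) -> List[str]:
    """Return ids of stations whose HUC starts with huc_code.

    Stations with no recorded HUC (None) can never match.
    """
    return sorted(
        sid for sid, huc in station_hucs.items()
        if huc is not None and huc.startswith(huc_code)
    )

# Illustrative catalog: station id -> 8-digit HUC, or None when missing.
catalog = {
    "S1": "03020201",
    "S2": "03020202",
    "S3": "12070101",
    "S4": None,
}

print(stations_by_hu(catalog, "0302"))
print(stations_by_hu(catalog, "03020201"))
```

Station S4 is invisible to this query. With HUCs stored for fewer than half of the STORET sites, a catalog would need to derive the missing codes (for example from coordinates) to return complete results.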


GetStationCatalogueFiltered

Request

Response


GetStationCatalogue

Architecture Outline

• Allows searching multiple heterogeneous data sources simultaneously regardless of semantic or structural differences between them
• Modular & extensible

Inside the CUAHSI HOD Module

The Database-Ontology Link

www.HydroTagger.org

The HydroSeek ODM needed an upgrade, i.e. additional tables:

1) MappingsApproved_Table
2) FrequentUpDates_Table

How does the Tagging work?

Step 1

Users need to register on the web site before they can use the HydroTagger.

When registering, select the testbed site you are affiliated with. Each testbed site needs ONE administrator, who can then admit additional users for that specific testbed site.

Please send an email to identify the designated tagger site administrator so we can promote that person to the role.

How does the Tagging work?

WATERS Network Information System

Step 2

The “Sniffer” jumps into action and trawls through the testbed sites to find and identify new variable names (once a week, currently every Sunday night).

It does so by using the regular web services published through the WSDL (no “hacking”!).

It returns i) data update information and ii) the variable names used, and compares these to those used by HydroSeek.

How does the Tagging work?

Step 3

The Tagger now updates the HydroSeek catalogue (an amalgamation of all 10 testbed catalogues) with the newly found data entries.

If it finds a new variable name (introduced during the data loading process using the Data-Loader), it puts it into a table and offers it up to the HydroTagger GUI for semantic tagging.

Test-Bed    VarName      Site exists?  VarName exists?  Content          Action
CCBay       DOConcSuf    Y             Y                new data         update Cat (Time)
CCBay       DOConcBot    Y             N                new variable     place in TaggerBin => DO
CCBay       DOConcMid    N             Y                new data         update Cat (Site+Time)
SRBHOS      DO_Water     Y             Y                new data         update Cat (Time)
Minnehaha   TempSurf     Y             N                new variable     place in TaggerBin => Temp
Minnehaha   StreamDOCon  Y             N                new variable     place in TaggerBin => DO
SantaFe     WaterDOCon   Y             N                new variable     place in TaggerBin => DO
SantaFe     GoldConc     Y             N                new var/no conc  place in TaggerBin => ??
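The decision rule implied by the table can be written down directly. This is a reading of the table rows, not the Tagger's actual code: the function name is hypothetical, and the case where both the site and the variable name are unknown does not appear in the table, so its handling here is an assumption.

```python
def tagger_action(site_exists: bool, varname_known: bool) -> str:
    """Choose the catalogue action for one sniffed (site, variable name) pair."""
    if not varname_known:
        # New variable name: queue it for semantic tagging in the GUI.
        # Assumption: an unknown site with an unknown variable name is
        # treated the same way (no such row appears in the table).
        return "place in TaggerBin"
    if site_exists:
        # Known site and known variable: only the time coverage changes.
        return "update Cat (Time)"
    # Known variable at a new site: add the site and its time coverage.
    return "update Cat (Site+Time)"

rows = [
    ("CCBay", "DOConcSuf", True, True),
    ("CCBay", "DOConcBot", True, False),
    ("CCBay", "DOConcMid", False, True),
]
for testbed, varname, site_exists, varname_known in rows:
    print(testbed, varname, "->", tagger_action(site_exists, varname_known))
```

The concept each queued name is eventually tagged with (DO, Temp, or none at all for GoldConc) is chosen by a person in the HydroTagger GUI, so it is deliberately left out of the rule.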


Thank you… Questions?