Towards Personalized and Active Information Management for Meteorological Investigations Beth Plale Indiana University USA

Jan 11, 2016

Page 1: Towards Personalized and Active Information Management for Meteorological Investigations

Towards Personalized and Active Information Management for Meteorological Investigations

Beth Plale

Indiana University

USA

Page 2: Towards Personalized and Active Information Management for Meteorological Investigations

Problem Statement

• Mesoscale meteorology research is highly data-driven.
  – Large percentage of data streams in from observational platforms. Available in OPeNDAP servers.
  – Data that is over 10 minutes old is too old.
  – Researchers are currently working on increasing real-time responsiveness to developing weather conditions.
• Mesoscale meteorology is a vast information space.
  – Forecasting models assimilate data from a growing number of sources

Page 3: Towards Personalized and Active Information Management for Meteorological Investigations

Solution Statement

• Internet has proven the utility of a user-oriented view towards information space management
  – Browser, bookmarks to organize
  – Blogs, web page tools (FrontPage, Dreamweaver) to publish
• We apply the concept of a user-oriented view to management of the mesoscale meteorology information space.
• myLEAD: tool to help an investigator make sense of, and operate in, the vast information space that is mesoscale meteorology.

Page 4: Towards Personalized and Active Information Management for Meteorological Investigations

Motivation for LEAD

• Each year, mesoscale weather – floods, tornadoes, hail, strong winds, lightning, and winter storms – causes hundreds of deaths, routinely disrupts transportation and commerce, and results in annual economic losses > $13B.

Page 10: Towards Personalized and Active Information Management for Meteorological Investigations

Conventional Numerical Weather Prediction

OBSERVATIONS
• Radar Data
• Mobile Mesonets
• Surface Observations
• Upper-Air Balloons
• Commercial Aircraft
• Geostationary and Polar Orbiting Satellite
• Wind Profilers
• GPS Satellites

Analysis/Assimilation
• Quality Control
• Retrieval of Unobserved Quantities
• Creation of Gridded Fields

Prediction
• PCs to Teraflop Systems

Product Generation, Display, Dissemination

End Users
• NWS
• Private Companies
• Students

The process is entirely serial and pre-scheduled: no response to weather!

Page 11: Towards Personalized and Active Information Management for Meteorological Investigations

The LEAD Vision: No Longer Serial or Static

OBSERVATIONS
• Radar Data
• Mobile Mesonets
• Surface Observations
• Upper-Air Balloons
• Commercial Aircraft
• Geostationary and Polar Orbiting Satellite
• Wind Profilers
• GPS Satellites

Analysis/Assimilation
• Quality Control
• Retrieval of Unobserved Quantities
• Creation of Gridded Fields

Prediction
• PCs to Teraflop Systems

Product Generation, Display, Dissemination

End Users
• NWS
• Private Companies
• Students

Page 13: Towards Personalized and Active Information Management for Meteorological Investigations

LEAD data: initial working data set

• ETA model gridded analysis

• METAR surface observations

• Rawinsondes – upper air balloon observations

• ACARS – commercial aircraft temperature and wind observations

• NEXRAD Level II data

• GOES visible satellite data

Page 14: Towards Personalized and Active Information Management for Meteorological Investigations
Page 15: Towards Personalized and Active Information Management for Meteorological Investigations

Returning to Solution Statement

• We apply the concept of a user-oriented view to management of the mesoscale meteorology information space.
• myLEAD: tool to help an investigator make sense of, and operate in, the vast information space that is mesoscale meteorology.

Page 16: Towards Personalized and Active Information Management for Meteorological Investigations

Information space management tool

• At core is metadata catalog
  – Why? Observational products already being stored elsewhere.
    • Public file and could be large, so do not want to copy to user’s file system. Instead maintain “bookmark”
• Scale to support thousands of distributed users, including individual investigators, pre-college classroom investigators, casual observers.
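
The “bookmark” idea can be sketched in a few lines of Python. This is an illustrative sketch only, not the myLEAD implementation: the `Bookmark` and `Catalog` names, their fields, and the example URL are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Bookmark:
    """A catalog entry that points at a data product instead of copying it."""
    logical_name: str   # name the user sees in the workspace (hypothetical)
    location: str       # where the product already lives (hypothetical URL)
    size_bytes: int     # recorded for display; the bytes themselves stay remote


@dataclass
class Catalog:
    """Per-user metadata catalog: stores references, never file contents."""
    entries: dict = field(default_factory=dict)

    def add(self, owner: str, bookmark: Bookmark) -> None:
        self.entries.setdefault(owner, []).append(bookmark)

    def list_for(self, owner: str) -> list:
        return list(self.entries.get(owner, []))


catalog = Catalog()
catalog.add("abe", Bookmark("NEXRAD-level2", "https://example.org/nexrad/KXYZ", 250_000_000))
print([b.logical_name for b in catalog.list_for("abe")])
```

The catalog grows with the number of entries, not with the size of the files they describe, which is the property the slide argues for.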

Page 17: Towards Personalized and Active Information Management for Meteorological Investigations

Technical Challenges

• Querying must be efficient
  – Over data products described by rich domain-specific metadata
  – Over data products whose description can be augmented over time
• Obtaining metadata is hard
  – Automate as much as possible
• Privacy must be fully enforced
  – Any data product that user designates as private must remain private
• Publishing
  – Publish product to larger community: data file, model output, full experiment
  – Must be under user control
  – Discovery of information that has been made public
• Build trust
  – User may work within myLEAD space for 5 years of graduate work, for instance
  – User must be convinced of privacy, reliability, longevity, etc.
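
The privacy and publishing requirements above can be illustrated with a small Python sketch. It assumes a private-by-default flag on each product and an owner check on publishing; the `Product`, `publish`, and `visible_to` names are hypothetical and do not come from myLEAD.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    owner: str
    published: bool = False  # private unless the owner publishes it


def publish(product: Product, requester: str) -> None:
    """Only the owner may make a product public (publishing under user control)."""
    if requester != product.owner:
        raise PermissionError(f"{requester} does not own {product.name}")
    product.published = True


def visible_to(products: list, requester: str) -> list:
    """Return the requester's own products plus anything that has been published."""
    return [p for p in products if p.owner == requester or p.published]


products = [
    Product("wrf-out1", owner="abe"),
    Product("wrf-out2", owner="abe"),
    Product("metar-subset", owner="bing"),
]
publish(products[0], requester="abe")
print([p.name for p in visible_to(products, "bing")])
```

Putting the filter in the query path, rather than in each caller, is one way to make "private must remain private" hold for every kind of search.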

Page 18: Towards Personalized and Active Information Management for Meteorological Investigations

Rundown on Implementation Specs

• Building on top of MCS and OGSA-DAI
  – MCS for extensible db schema, general db schema, and security infrastructure already in place
  – OGSA-DAI for grid/web service architecture
• Database used is MySQL 5.0
  – Supports stored procedures
  – OGSA-DAI connects to MySQL via JDBC
• Data product descriptions in and out of database conform to LEAD-specific XML schema.
• myLEAD server and myLEAD agent are written in Java.

Page 19: Towards Personalized and Active Information Management for Meteorological Investigations

Related Work

• mySpace – AstroGrid, UK
  – Similar to myLEAD in reining in information space
  – Creates swatches in large federation of data archives for the cache and persistent data for a “community”
  – Provides common query access over cache space and persistent space
• RDF (Resource Description Framework)
  – Basic building block is the subject-predicate-object triple:
    [S] – P -> [O]
    [Dickens] – hasWritten -> [Pickwick Papers]
  – Good for storing detailed relationship information (good for understanding the relationship between two terms)
• NEESgrid – NCSA
  – Uses RDF
  – Little available in public literature
• myGrid Information Repository (MIR) – myGRID, Manchester
  – Most similar to myLEAD
  – Support for text search of scientific papers, uses Life Sciences Identifier (LSID)
  – myLEAD has stronger personal orientation (guarantees, publishing, automatic metadata generation)
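
The subject-predicate-object triple mentioned under RDF can be sketched in plain Python. This is a toy in-memory illustration, not an RDF library; the `Triple` and `match` names and the two extra example triples are assumptions added for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching every field that is not None (None is a wildcard)."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


triples = [
    Triple("Dickens", "hasWritten", "Pickwick Papers"),   # example from the slide
    Triple("Dickens", "hasWritten", "Bleak House"),       # extra example (assumption)
    Triple("Pickwick Papers", "hasGenre", "novel"),       # extra example (assumption)
]
print(match(triples, subject="Dickens", predicate="hasWritten"))
```

Querying by leaving fields open is what makes the triple form convenient for relationship questions such as "what links these two terms?".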

Page 20: Towards Personalized and Active Information Management for Meteorological Investigations

myLEAD Architecture

[Architecture diagram. Labels: myLEAD service; server side services; client side services; data model; MCS; myLEAD stored procedures; OGSA-DAI; JDBC; MCS client; myLEAD agent; portal access to myLEAD; user interface; relational DB.]

Page 21: Towards Personalized and Active Information Management for Meteorological Investigations

myLEAD use scenario

[Diagram. Labels: factory; myLEAD “agent” instance; WRF model; data mining task; workflow; myLEAD service; Storage Repository Service (RLS); myLEAD portlet as component of LEAD portal; /var/tmp/wrf_tmp; IU; NCSA.]

Workflow confers with myLEAD “agent” to determine location of scratch space

Page 22: Towards Personalized and Active Information Management for Meteorological Investigations

Metadata Catalog Data Model

• Users
• Investigations
  – Tornado April 20 Chicago Illinois
• Experiments
  – Ensemble: run of 100 simultaneous forecast models parameterized slightly differently
• Collections
• Logical files
  – Input observational files, input parameters, derived files, analysis results, images, model results, workflows, execution status messages

Abe Bing Caru
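
The levels listed above can be pictured as nested Python dataclasses. This is a sketch under the assumption that the levels nest strictly in the order listed (user, investigation, experiment, collection, logical file); the class and field names are hypothetical, and the real catalog stores these in a relational database.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalFile:
    name: str


@dataclass
class Collection:
    name: str
    files: list = field(default_factory=list)


@dataclass
class Experiment:
    name: str
    collections: list = field(default_factory=list)


@dataclass
class Investigation:
    name: str
    experiments: list = field(default_factory=list)


@dataclass
class User:
    name: str
    investigations: list = field(default_factory=list)


def count_files(user: User) -> int:
    """Walk user -> investigation -> experiment -> collection and count logical files."""
    return sum(
        len(c.files)
        for inv in user.investigations
        for exp in inv.experiments
        for c in exp.collections
    )


abe = User("Abe", [Investigation("Tornado April 20 Chicago Illinois", [
    Experiment("Ensemble", [
        Collection("Input observational", [LogicalFile("NEXRAD"), LogicalFile("METAR")]),
        Collection("WRF-out", [LogicalFile("Wrf-out1")]),
    ]),
])])
print(count_files(abe))
```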

Page 23: Towards Personalized and Active Information Management for Meteorological Investigations

Data Model

[Diagram. Labels: User – DublinCore; Investigation; Collection; Logical file.]

Attributes stored in “type” tables: i.e., string, float, temporal, int. Great extensibility, but need to carefully control naming; efficient querying could be an issue as well.
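
The “type” table layout can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, not the MCS or myLEAD schema: the table names, column names, and attribute names are all hypothetical, and SQLite stands in for the production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logical_file (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One table per value type; table and column names are hypothetical.
    CREATE TABLE attr_string (file_id INTEGER, name TEXT, value TEXT);
    CREATE TABLE attr_float  (file_id INTEGER, name TEXT, value REAL);
""")

# Map Python value types to the table that stores them.
TABLE_FOR_TYPE = {str: "attr_string", float: "attr_float"}


def add_attribute(file_id, name, value):
    """Store an attribute in the table matching its type; no schema change needed."""
    table = TABLE_FOR_TYPE[type(value)]
    conn.execute(f"INSERT INTO {table} (file_id, name, value) VALUES (?, ?, ?)",
                 (file_id, name, value))


def find_files(name, value):
    """Return names of logical files whose attribute `name` equals `value`."""
    table = TABLE_FOR_TYPE[type(value)]
    rows = conn.execute(
        f"SELECT f.name FROM logical_file f JOIN {table} a ON a.file_id = f.id "
        "WHERE a.name = ? AND a.value = ? ORDER BY f.name",
        (name, value),
    )
    return [r[0] for r in rows]


conn.execute("INSERT INTO logical_file (id, name) VALUES (1, 'Wrf-out1'), (2, 'METAR')")
add_attribute(1, "platform", "WRF")
add_attribute(2, "platform", "surface station")
add_attribute(1, "grid_spacing_km", 4.0)
print(find_files("platform", "WRF"))
```

The sketch shows both sides of the trade-off: new attributes need no schema change, but every lookup is a join keyed on a free-form attribute name, so naming must be controlled and query cost can grow.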

Page 24: Towards Personalized and Active Information Management for Meteorological Investigations

Data Model

[Diagram: example workspace, annotated with “Collection level” and “Logical file level”.]

myWorkspace: J. Kowaleski
preferences
Experiment 1: Norman, OK 21Oct04:23:11:45
Workflow template vizEta 03Aug04:13:35:40
Workflow template WRF 15May04:05:25:59
Favorite spaces
  Home disk space
  Thor cluster scratch space
Input parameters
Input observational
  NEXRAD 26Oct04:13:45:40
  GOES-infrared 26Oct04:12:00:00
  METAR 26Oct04:09:10:05
WRF-out
  Wrf-out1-26Oct04:13:35:40
  Wrf-out2-26Oct04:13:37:25
  Wrf-out3-26Oct04:13:43:15
workflow instance

Have associated a set of attributes that describe this data product

Browser provides user a hierarchical view of space that is essentially flat. Users like hierarchy.

Page 25: Towards Personalized and Active Information Management for Meteorological Investigations

myLEAD agent

• Separate transient grid/web service
  – Has state about user, current investigation and experiment
  – Embeds myLEAD client API
• Purpose:
  – Controls naming
  – Helps use database structure in repeatable, meaningful way
    • Maintains FSM of current state of execution; stores into new collection based on state
      – Input → model run → analysis → final results
  – Derives metadata attributes for new data product object when created during course of workflow by means of:
    • Case-based reasoning
    • Internal state
    • Consulting ontology
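
The finite state machine behavior described above can be sketched as follows. This is an assumption-laden illustration, not the myLEAD agent: it assumes the four stages form a simple linear chain, and the `Stage` and `Agent` names are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    INPUT = "input"
    MODEL_RUN = "model run"
    ANALYSIS = "analysis"
    FINAL_RESULTS = "final results"


# Allowed forward transitions; a linear chain is an assumption of this sketch.
NEXT_STAGE = {
    Stage.INPUT: Stage.MODEL_RUN,
    Stage.MODEL_RUN: Stage.ANALYSIS,
    Stage.ANALYSIS: Stage.FINAL_RESULTS,
}


class Agent:
    """Tracks the workflow stage and files new products under a per-stage collection."""

    def __init__(self):
        self.stage = Stage.INPUT
        self.collections = {stage: [] for stage in Stage}

    def advance(self):
        if self.stage not in NEXT_STAGE:
            raise RuntimeError("workflow already at final stage")
        self.stage = NEXT_STAGE[self.stage]

    def store(self, product_name):
        # The collection is chosen from the current state, not by the caller.
        self.collections[self.stage].append(product_name)


agent = Agent()
agent.store("METAR 26Oct04")
agent.advance()
agent.store("Wrf-out1")
print(agent.collections[Stage.MODEL_RUN])
```

Because the agent, not the workflow, picks the collection, products land in the same place on every run, which is the "repeatable, meaningful" use of the database structure the slide calls for.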

Page 26: Towards Personalized and Active Information Management for Meteorological Investigations

Data Product Metadata

Resources: “things that need describing (i.e., metadata)”

[Diagram. Labels: resources; geo-data products; workflow scripts; compute resources, storage resource; data analytics resources (statistics table); services; observational data; model generated data; collections; derived data; data analytics; model input resources; data mining.]

Page 27: Towards Personalized and Active Information Management for Meteorological Investigations

Data Product Metadata          Notes

Global ID                      “LSID” for geosciences
Temporal coverage              Same as spatial
Spatial coverage               GML, THREDDS, FGDC, COARDS-CF
Geophysical quantity           Defined by common vocabulary
Platform                       Goes10, Goes8; WSR-88, CASA
Instrument type
Site                           East-west; KXYZ
Model run info                 Model derived data product
Syntactic description          Binary format of data product
Contact info                   Dublin core
Physical location of service
Protocol to access service
Dataset summary                Dublin core
List of predecessors           GID of input data products, workflow instance
Event                          mesocyclone, storm cell, tornado
Quality                        Complex
Completeness
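
The “list of predecessors” attribute makes it possible to trace how a product was derived. A minimal sketch, assuming each product records the global IDs of its direct inputs; the IDs and the `lineage` helper are made up for illustration.

```python
from collections import deque

# Each product maps to the global IDs of its direct predecessors (IDs are made up).
PREDECESSORS = {
    "gid:analysis-1": ["gid:wrf-out-1"],
    "gid:wrf-out-1": ["gid:nexrad-1", "gid:metar-1"],
    "gid:nexrad-1": [],
    "gid:metar-1": [],
}


def lineage(gid, predecessors):
    """Return every ancestor of `gid`, nearest first, without duplicates."""
    seen = set()
    order = []
    queue = deque(predecessors.get(gid, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        queue.extend(predecessors.get(current, []))
    return order


print(lineage("gid:analysis-1", PREDECESSORS))
```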

Page 28: Towards Personalized and Active Information Management for Meteorological Investigations

Current Research Challenges

• Publishing
  – Publishing data product to larger community: data file, model output, full experiment
  – Discovery of information that has been made public
• Guarantees
  – Any data product that user designates as private must remain private
  – When request for product is issued, product must exist
• Flexible yet efficient schema
  – Inherited from MCS, supports evolved understanding of data product over time by means of extended attributes
• Immutable investigations
  – Collections, views, and logical files can be reused from earlier investigations without destroying integrity of earlier investigation
• Proactive agent
  – Infers metadata attributes from context of active experiment using case-based reasoning.

Page 29: Towards Personalized and Active Information Management for Meteorological Investigations

Beth Plale

[email protected]

4 days away from our national elections … wish us well.