Integrated Grid workflow for mesoscale weather modeling and visualization
Zhizhin, M., A. Polyakov, D. Medvedev, A. Poyda, S. Berezin
Space Research Institute of the Russian Academy of Sciences
Jan 14, 2016
Abstract
• For the model input and output we use a scalable parallel storage and data mining system called ActiveStorage. It can store different types of weather data, provided they share the same Common Data Model (Unidata CDM): NCEP reanalysis, NCDC station weather data, and MM5 model output.
• MM5 is a mesoscale weather forecast model. As input boundary conditions it takes basic parameters such as elevation, air pressure and temperature, and it can ingest both reanalysis and direct observation data. As output it provides high-resolution regional weather grids.
• To make the MM5 input data and the modeling results accessible on the Grid to the Earth Science community, we have developed a set of grid services (resources and activities) inside the OGSA-DAI (both ver. 2 and 3) grid service container.
• To visualize the weather data we have developed a plugin for NASA World Wind that reads the data directly from the OGSA-DAI resources and plots it over the 3D globe as contour lines, filled areas or vector fields.
Active Storage, Modeling, Data Mining and Visualization Services
• Active Storage
– Common Data Model
– Microsoft SQL Server Cluster
– OGSA-DAI and MATLAB API
• Numerical Modeling
– Parallel mesoscale meteorological model MM5
– Windows Compute Cluster + MPI parallelization
• Data Analysis
– Environmental Scenario Search Engine (ESSE)
– Trend and change detection algorithms
• Visualization
– Microsoft Virtual Earth
– NASA World Wind
– EVL UIC Scalable Graphics Environment (SAGE+SAIL)
[Diagram: data flows between the services. Labels: derived products from satellite data; weather observations and reanalysis time series; geographical information (elevation, hydrology, ...); raw data input; model output; time series; grids; trajectories; trends and relations; raw data; numerical models.]
ActiveStorage
• ActiveStorage is a generic storage for arrays of primitive data types.
• Its data model is based on Unidata's Common Data Model, used in netCDF, HDF5 and OpenDAP.
• Basically, ActiveStorage is a SQL Server database with CLR stored procedures and a client library.
• The stored procedures and the client library provide an abstraction layer for data access.
• Large arrays are split into chunks and can be spread across several parallel database servers for better performance.
ActiveStorage components
[Diagram: two SQL Server 2005/2008 DB instances, each with metadata tables and data and directory tables, plus the stored procedures and the client library.]
Common Data Model
This is the Common Data Model (CDM) used in the recent versions of OpenDAP, netCDF and HDF5. Its purpose is the representation of multidimensional scientific data.
• Group: name
• Attribute: name, value, dataType
• Variable: name, shape, dataType
• Dimension: name, length
• DataType: char, byte, short, int, long, float, double, String
• Dataset: name
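The entities of the Common Data Model can be sketched as plain Python data classes. This is only an illustration of the fields named on the slide, not the ActiveStorage schema. The containment links (a variable holding attributes, a group holding variables, a dataset holding one root group) are assumptions based on the general CDM description, and the `long_name` attribute is a hypothetical example.

```python
from dataclasses import dataclass, field
from enum import Enum
from math import prod
from typing import Any, List


class DataType(Enum):
    """Primitive types listed for the Common Data Model."""
    CHAR = "char"
    BYTE = "byte"
    SHORT = "short"
    INT = "int"
    LONG = "long"
    FLOAT = "float"
    DOUBLE = "double"
    STRING = "String"


@dataclass
class Dimension:
    name: str
    length: int


@dataclass
class Attribute:
    name: str
    value: Any
    data_type: DataType


@dataclass
class Variable:
    name: str
    shape: List[Dimension]
    data_type: DataType
    # Assumed containment: attributes attached to a variable.
    attributes: List[Attribute] = field(default_factory=list)

    def element_count(self) -> int:
        """Number of array elements implied by the shape."""
        return prod(d.length for d in self.shape)


@dataclass
class Group:
    name: str
    # Assumed containment: variables live inside a group.
    variables: List[Variable] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    root: Group  # Assumed: one root group per dataset.


# A 73 x 144 grid of packed short values, as read in the MATLAB script.
lat = Dimension("lat", 73)
lon = Dimension("lon", 144)
air = Variable("air", [lat, lon], DataType.SHORT,
               [Attribute("long_name", "air temperature", DataType.STRING)])  # hypothetical attribute
dataset = Dataset("NCEP_01", Group("root", [air]))
print(air.element_count())
```

Keeping the shape as a list of `Dimension` objects, rather than bare integers, lets several variables share the same named axes.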
How it works
[Diagram: application, client library, SQL Server DB]
1. Pass a multi-dimensional data request to the client library
2. Issue commands to the database server
3. Select the requested data from several chunks
4. Return the data parts to the client library
5. Assemble the data parts into one multi-dimensional array
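The request flow above can be illustrated with a small in-memory sketch: an array is stored as fixed-size chunks, a region request touches only the chunks it overlaps, and the returned parts are copied into one result array. The function names, the 2-D restriction and the dictionary standing in for the chunk tables are assumptions made for illustration. The real client library and stored procedures are not shown in the slides.

```python
def split_into_chunks(array, chunk_rows, chunk_cols):
    """Store a 2-D list as chunks keyed by (chunk row, chunk column).

    The dict is a stand-in for the database chunk tables.
    """
    chunks = {}
    for r0 in range(0, len(array), chunk_rows):
        for c0 in range(0, len(array[0]), chunk_cols):
            chunks[(r0 // chunk_rows, c0 // chunk_cols)] = [
                row[c0:c0 + chunk_cols] for row in array[r0:r0 + chunk_rows]
            ]
    return chunks


def read_region(chunks, chunk_rows, chunk_cols, origin, count):
    """Return the region starting at `origin` with `count` rows and columns.

    The region is assumed to lie inside the stored array.
    """
    r_start, c_start = origin
    n_rows, n_cols = count
    result = [[None] * n_cols for _ in range(n_rows)]
    # Visit only the chunks that overlap the requested region.
    first_ci = r_start // chunk_rows
    last_ci = (r_start + n_rows - 1) // chunk_rows
    first_cj = c_start // chunk_cols
    last_cj = (c_start + n_cols - 1) // chunk_cols
    for ci in range(first_ci, last_ci + 1):
        for cj in range(first_cj, last_cj + 1):
            chunk = chunks[(ci, cj)]
            row0, col0 = ci * chunk_rows, cj * chunk_cols
            # Copy the overlapping part of this chunk into the result array.
            for r in range(max(r_start, row0), min(r_start + n_rows, row0 + len(chunk))):
                for c in range(max(c_start, col0), min(c_start + n_cols, col0 + len(chunk[0]))):
                    result[r - r_start][c - c_start] = chunk[r - row0][c - col0]
    return result


array = [[10 * r + c for c in range(6)] for r in range(5)]
chunks = split_into_chunks(array, chunk_rows=2, chunk_cols=4)
region = read_region(chunks, 2, 4, origin=(1, 2), count=(3, 3))
print(region)
```

Here a 3 x 3 request spans four chunks, so the result is assembled from four partial copies.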
Parallel query processing
[Diagram: the application calls the client library, which queries SQL Server DB 1 and SQL Server DB 2 in parallel.]
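A minimal sketch of the parallel case, assuming chunks are placed on servers round-robin and that the client sends one request per server at the same time. The placement rule, the function names and the dictionaries standing in for database servers are all assumptions. The slides do not describe how ActiveStorage distributes chunks.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(chunks, n_servers):
    """Assign chunks to servers round-robin (an assumed placement rule)."""
    servers = [dict() for _ in range(n_servers)]
    for i, key in enumerate(sorted(chunks)):
        servers[i % n_servers][key] = chunks[key]
    return servers


def query_server(server, wanted_keys):
    """Stand-in for one database round trip: return the wanted chunks held here."""
    return {key: server[key] for key in wanted_keys if key in server}


def parallel_fetch(servers, wanted_keys):
    """Query all servers concurrently and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        partials = list(pool.map(query_server, servers, [wanted_keys] * len(servers)))
    merged = {}
    for part in partials:
        merged.update(part)
    return merged


chunks = {i: [i] * 3 for i in range(8)}
servers = distribute(chunks, n_servers=4)
fetched = parallel_fetch(servers, wanted_keys=[1, 2, 5])
print(sorted(fetched))
```

Threads are enough for this pattern because the client mostly waits on I/O while each server works on its own share of the chunks.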
Parallel query performance
[Chart: query performance with 1 database server compared with 4 parallel database servers.]
NCEP/NCAR Weather Reanalysis
• Continually updated gridded data set
• Incorporates observations and global climate model output
• 74 weather parameters
• 5000 netCDF files, 30–500 MB each
NCDC Integrated Surface Database
• 1901 – 2008 time coverage.
• 30 million sensors.
• 1.7 billion observations.
[Figure: station types: fixed ground stations, ships, mobile stations, buoys.]
Sample record:
0189010020999992007022817004+80050+016250FM-12+000899999V0202201N008019999999N0090001N1+00631+00541098651ADDGA1031+003009999KA1120N+99999...
Record layout: control data section (with the date, time, lat and lon fields), mandatory data section, section marker, additional data section.
• 470 000 ASCII files packed with gzip.
• 50 GB packed; 400 GB unpacked.
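As an illustration of the fixed-width record format, the sketch below extracts the date, time, latitude and longitude from the control data section of the sample record. The column offsets and the scaling of coordinates by 1000 are taken from the published ISD format description and should be treated as assumptions here. The function name and the station identifier field are also not part of the slides.

```python
from datetime import datetime


def parse_control_section(record):
    """Extract a few control-section fields from one ISD record.

    Offsets are 0-based Python slices over the fixed-width text
    (assumed from the ISD format description, not from the slides).
    """
    return {
        "station": record[4:10],                                   # station identifier
        "time": datetime.strptime(record[15:27], "%Y%m%d%H%M"),    # date + time
        "lat": int(record[28:34]) / 1000.0,                        # degrees, scaled by 1000
        "lon": int(record[34:41]) / 1000.0,                        # degrees, scaled by 1000
    }


# Leading part of the sample record shown on the slide.
sample = "0189010020999992007022817004+80050+016250FM-12"
obs = parse_control_section(sample)
print(obs["time"].isoformat(), obs["lat"], obs["lon"])
```

The signed, zero-padded coordinate fields convert directly with `int()`, so no extra string handling is needed.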
When you’ve downloaded and unpacked the data...
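% Make the ActiveStorage connector and the SQL Server JDBC driver visible to MATLAB
import ru.wdcb.mdb.NcConnector
import com.microsoft.sqlserver.jdbc.SQLServerDriver

% Open the NCEP database through the netCDF-style API and look up the 'air' variable
s = 'jdbc:sqlserver://localhost:1433;databaseName=NCEP_01;user=guest;password=guest';
connector = NcConnector();
ncid = connector.nc_open(s, 0);
varid = connector.nc_inq_varid(ncid, 'air');

% Series of 80000 values along the first dimension (presumably time) at one grid point
origin = [0 0 10 10];
count = [80000 1 1 1];   % named count so the built-in size() is not shadowed
stride = [1 1 1 1];
A = connector.nc_get_vars_short(ncid, varid, origin, count, stride);
plot(A, 'DisplayName', 'A', 'YDataSource', 'A');
figure

% One 73 x 144 grid at the first index of the leading dimensions
origin = [0 0 0 0];
count = [1 1 73 144];
stride = [1 1 1 1];
B = connector.nc_get_vars_shortm(ncid, varid, origin, count, stride);
B = reshape(B, [73 144]);
imagesc(B);
figure(gcf);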
MATLAB script using ActiveStorage library
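The script reads the variable as short integers. In netCDF files, values stored as shorts are commonly packed and have to be converted to physical units with the `scale_factor` and `add_offset` attributes. The sketch below shows that conversion. Whether ActiveStorage exposes these attributes is an assumption, and the numeric values are illustrative only.

```python
def unpack(raw_values, scale_factor, add_offset, missing_value=None):
    """Convert packed integers to physical values: raw * scale_factor + add_offset.

    Entries equal to `missing_value` are returned as None.
    """
    return [
        None if raw == missing_value else raw * scale_factor + add_offset
        for raw in raw_values
    ]


# Illustrative numbers only; real values come from the variable's attributes.
packed = [-100, 0, 100, 32766]
temperatures = unpack(packed, scale_factor=0.01, add_offset=477.65, missing_value=32766)
print(temperatures)
```

Packing halves the storage compared with 32-bit floats, at the cost of this extra step on every read.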
Environmental Data Service: OGSA-DAI plugin
[Diagram: clients (user, MS Excel, MM5 weather model, any client) call the OGSA-DAI service (DAI) running in Tomcat, which is connected to the NCEP, NWS and SPIDR databases and to ActiveStorage.]
Data export activities and what they return:
• getMetadata: metadata XML
• getProperty (sources): sources list
• getXMLData: data XML
• getNetCDFData: URL to a NetCDF file, produced by NetCDF file serialisation
Activities for data export
• XML output stream
– We have a plugin for NASA World Wind to visualize XML-formatted data
– Can easily be transformed with XSLT into a web page or another XML document, e.g. for MS Excel
– Can be used as input for the ESSE fuzzy logic search engine
• NetCDF binary data file
– Standard for scientific data storage in files
– Several visualization programs exist for NetCDF
– Compatible with the Unidata Common Data Model standard
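To show how the XML output stream could feed a spreadsheet, the sketch below converts an XML document into CSV text. It uses Python's standard `xml.etree.ElementTree` in place of XSLT. The element and attribute names and the sample values are hypothetical, since the slides do not show the schema returned by the service.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout with made-up values; the real service schema is not shown.
SAMPLE_XML = """
<data variable="air">
  <point time="2007-02-28T12:00" lat="80.0" lon="15.0" value="263.1"/>
  <point time="2007-02-28T18:00" lat="80.0" lon="15.0" value="262.4"/>
</data>
"""

COLUMNS = ["time", "lat", "lon", "value"]


def xml_to_csv(xml_text):
    """Flatten every <point> element into one CSV row."""
    root = ET.fromstring(xml_text.strip())
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    for point in root.iter("point"):
        writer.writerow([point.get(name) for name in COLUMNS])
    return out.getvalue()


csv_text = xml_to_csv(SAMPLE_XML)
print(csv_text)
```

The resulting text can be opened directly in a spreadsheet program.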
Data flow management by OGSA-DAI
[Diagrams: OGSA-DAI query from a single data source; OGSA-DAI query from distributed data sources.]
Parallel mesoscale weather model MM5
Same Source Parallel MM5
• The source code for the parallel MPI version and the single-process MM5 model is the same
• Automated parallel code generation from MM5 sources by ANL:
– FLIC compiler
– RSL library for model domain segmentation and message exchange
• We have ported the MM5 code to the MS Windows Server 2008 HPC platform
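Domain segmentation can be pictured as cutting the model grid into rectangular patches, one per MPI process. The sketch below computes such patches for a process grid. It is a generic block decomposition written for illustration, not the algorithm used by RSL. The function names and the 100 x 120 domain size are assumptions, and the message exchange between neighbouring patches is not shown.

```python
def split_range(n, parts):
    """Split range(n) into `parts` contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    bounds = []
    start = 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def decompose(ny, nx, py, px):
    """Map each rank of a py x px process grid (row-major) to its patch.

    A patch is ((row_start, row_end), (col_start, col_end)) with exclusive ends.
    """
    rows = split_range(ny, py)
    cols = split_range(nx, px)
    return {r * px + c: (rows[r], cols[c]) for r in range(py) for c in range(px)}


# Hypothetical 100 x 120 model domain on 6 processes arranged as 2 x 3.
patches = decompose(ny=100, nx=120, py=2, px=3)
for rank, patch in sorted(patches.items()):
    print(rank, patch)
```

Each process then integrates the model on its own patch and exchanges boundary rows and columns with its neighbours after every time step.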
MM5 model as a grid client
Visualizing data from ActiveStorage with NASA World Wind
A NASA World Wind plugin, developed at Moscow State University, retrieves data from ActiveStorage via an OGSA-DAI service.
Several kinds of visualization are available:
- isolines
- color map
- vector field
OGSA-DAI services can also be used by other applications to retrieve data from ActiveStorage.
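For the isoline rendering listed above, the basic building block is finding where a contour level crosses the grid. The sketch below marks the grid cells crossed by a level and interpolates the crossing position along one cell edge. It is a generic illustration of the first step of contouring, not the plugin's implementation, and all names are hypothetical.

```python
def edge_crossing(v0, v1, level):
    """Fraction (0..1) along an edge from v0 to v1 where the field equals `level`.

    Returns None when both ends lie on the same side of the level.
    """
    if (v0 < level) == (v1 < level):
        return None
    return (level - v0) / (v1 - v0)


def crossed_cells(grid, level):
    """List the (row, column) indices of cells whose corners straddle `level`."""
    cells = []
    for i in range(len(grid) - 1):
        for j in range(len(grid[0]) - 1):
            corners = (grid[i][j], grid[i][j + 1], grid[i + 1][j], grid[i + 1][j + 1])
            below = [v < level for v in corners]
            if any(below) and not all(below):
                cells.append((i, j))
    return cells


# A single peak in the middle of a 3 x 3 grid.
field = [
    [0.0, 0.0, 0.0],
    [0.0, 10.0, 0.0],
    [0.0, 0.0, 0.0],
]
print(crossed_cells(field, level=5.0))
print(edge_crossing(0.0, 10.0, 2.5))
```

Connecting the interpolated crossing points inside each marked cell yields the line segments of the isoline.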
NASA World Wind as a grid client
Using OGSA-DAI services and a special API plugin, NASA World Wind can visualize both the MM5 input and output datasets.