Integration and Processing of Environmental Data
ADMIRE Technology
Ondrej Habala
CRISIS Seminar, 18.10.2011
ITMS project 26240220060
Feb 23, 2016
Goals
• Accelerate access to and increase the benefits from data exploitation;
• Deliver consistent and easy to use technology for extracting information and knowledge;
• Cope with the complexity, distribution, change and heterogeneity of services, data, and processes through an abstract view of data mining and integration; and
• Provide power to users and developers of data mining and integration processes.
ADMIRE Architecture: Separation of Concerns
ADMIRE Architecture
ADMIRE’s High-Level Architecture
ADMIRE Gateways
USMT
DISPEL – Data Intensive Systems Process-Engineering Language
• Data-intensive distributed systems
• Connection point of complex application requests and complex enactment systems
  – Benefit: method development, engineering and evolution of supported practices can take place independently in each world
• Describes enactment requests for streaming-data workflow processes
• “Process-engineering time” – transform and optimize the process in preparation for the enactment period
DISPEL: Simple Example
String sql1 = "SELECT * FROM some_table";
String sql2 = "SELECT * FROM table2";
String resource = "128.18.128.255";
SQLQuery query = new SQLQuery;
// Creating streams of literals
|- sql1, sql2 -| => query.expression;
|- resource -| => query.resource;
Tee tee = new Tee;
// Creating connections
query.result => tee.connectInput;
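For readers who have not met stream connections before, the wiring in the example can be mimicked in plain Python. This is only an analogy, not DISPEL and not the ADMIRE API: `literal_stream`, `sql_query` and `fake_run` are hypothetical names, and reusing the single resource literal for every expression is an assumption of this sketch.

```python
from itertools import tee
from typing import Callable, Iterable, Iterator, List


def literal_stream(*values: str) -> Iterator[str]:
    """Analogue of |- a, b -|: emit the given literals in order."""
    yield from values


def sql_query(expressions: Iterable[str], resources: Iterable[str],
              run: Callable[[str, str], List[str]]) -> Iterator[List[str]]:
    """Hypothetical stand-in for the SQLQuery element.

    Assumption: the one resource literal is reused for each expression.
    """
    resource = next(iter(resources))
    for expression in expressions:
        yield run(resource, expression)


def fake_run(resource: str, expression: str) -> List[str]:
    """Placeholder executor so the sketch runs without a database."""
    return [f"{resource}: {expression}"]


results = sql_query(
    literal_stream("SELECT * FROM some_table", "SELECT * FROM table2"),
    literal_stream("128.18.128.255"),
    fake_run,
)
# Analogue of Tee: duplicate the result stream for two consumers.
branch_a, branch_b = tee(results, 2)
first_copy = list(branch_a)
second_copy = list(branch_b)
```

Both branches see the same two result blocks, which is the property the Tee element provides in the example.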
DISPEL – real use
APPLICATION STUDIES
DEPLOYMENT OF ADMIRE TECHNOLOGY IN THE ENVIRONMENTAL DOMAIN
18.10.2011
Flood Application: Data sets used in hydrological scenarios
Dataset | Domain | Description | Volume | Temporal coverage | Spatial coverage
HUSAV | Hydrology | Data from two probes, containing water saturation of soil | 10s of MB | 1998-2007 | Two distinct points
MARS | Meteorology | Historical meteorological data (temperature, rainfall, etc.) for Slovakia | 100s of MB | 1975-2007 | Slovakia (grid 50x50 km)
SVP | Hydrology | Data from waterworks in western Slovakia (mainly river Váh): outflows, water levels, temperature, rainfall | 100s of MB | 1998-2007 | 15 distinct waterworks
DAISY | Pedology | Various pedological parameters for one probe in southern Slovakia | 10s of MB | 1961-2000 | One point
WOFOST | Pedology | Crop data (with attached soil and meteorological data) for Slovakia, year 2006 | 10s of MB | 2006 | Slovakia (grid)
SHMU_CURR | Meteorology | On-line database of meteorological data, copied from SHMI web; including radar imagery | 10s of GB + | 2008- | Slovakia (about 100 distinct probes)
SHMU_HIST | Meteorology | Historical meteorological data from SHMI probes | 100s of MB | 1998-2007 | Slovakia (more than 100 distinct probes)
SHMU_GRIB | Meteorology | Historical temperatures and rainfall amounts in a gridded binary format | 100s of GB | 1998-2007 | Slovakia (grid, various sizes)
RADAR | Meteorology | Weather radar imagery | 100s of GB | 2005-2008 | Slovakia
SHMU_HYDRO | Hydrology | Historical data from hydrological measurement stations | 10s of MB | 1998-2007 | Orava and upper Váh river
SOIL_RET | Pedology | Water retention capacities of soil | 10s of MB | current (no time series applicable) | Váh river watershed area
Source: FSKD 2010, Yantai, China, August 10-12
Orava scenario
• Legend
  – Green area – Orava (part of northern Slovakia)
  – Blue – Orava reservoir and local rivers
  – Red dots – hydrological measurement stations
• Notes
  – We are interested only in hydrological stations below the Orava reservoir
  – In our tests we will use the hydrological station 5830 (Tvrdosin)
ORAVA – data mining concept
• Predictors – rainfall amount (reservoir and station), air temperature (reservoir and station), reservoir discharge, reservoir temperature
• Targets – water level and temperature at a station below the reservoir
Time | Water temp Orava | Rainfall Orava | Air temp Orava | Air temp Station | Rainfall Station | Outflow Orava | Water level Station | Water temp Station
T-4 | E-4 | R-4 | A-4 | B-4 | S-4 | D-4 | X-4 | Y-4
T-3 | E-3 | R-3 | A-3 | B-3 | S-3 | D-3 | X-3 | Y-3
T-2 | E-2 | R-2 | A-2 | B-2 | S-2 | D-2 | X-2 | Y-2
T-1 | E-1 | R-1 | A-1 | B-1 | S-1 | D-1 | X-1 | Y-1
T | E | R | A | B | S | D | X | Y
T+1 | | R+1 | A+1 | B+1 | S+1 | D+1 | X+1 | Y+1
T+2 | | R+2 | A+2 | B+2 | S+2 | D+2 | X+2 | Y+2
T+3 | | R+3 | A+3 | B+3 | S+3 | D+3 | X+3 | Y+3
T+4 | | R+4 | A+4 | B+4 | S+4 | D+4 | X+4 | Y+4
T+5 | | R+5 | A+5 | B+5 | S+5 | D+5 | X+5 | Y+5
T+6 | | R+6 | A+6 | B+6 | S+6 | D+6 | X+6 | Y+6
Legend (cell colours in the original slide): predicted by a meteo model; given in a schedule; targets of data mining
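The lagged layout of the concept table can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ADMIRE implementation: `build_row` is a hypothetical helper, only three of the eight columns are used, and the numbers are made up.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[Dict[str, float], Dict[str, float]]


def build_row(series: Dict[str, Sequence[float]], t: int, lags: int,
              horizon: int, known_ahead: List[str],
              targets: List[str]) -> Optional[Row]:
    """Return (features, targets) for time index t, or None near the edges."""
    length = min(len(values) for values in series.values())
    if t - lags < 0 or t + horizon >= length:
        return None
    features: Dict[str, float] = {}
    for name, values in series.items():
        # Past and present values of every column: name-4 ... name-1, name.
        for k in range(lags, 0, -1):
            features[f"{name}-{k}"] = values[t - k]
        features[name] = values[t]
    for name in known_ahead:
        # Future values that come from a forecast or a schedule: name+1 ...
        for k in range(1, horizon + 1):
            features[f"{name}+{k}"] = series[name][t + k]
    wanted = {f"{name}+{k}": series[name][t + k]
              for name in targets for k in range(1, horizon + 1)}
    return features, wanted


# Made-up hourly values, for illustration only.
data = {
    "R": [0.0, 0.2, 0.0, 0.1, 0.0, 0.0, 0.3, 0.0],          # rainfall, Orava
    "D": [30.0, 30.0, 30.0, 50.0, 50.0, 50.0, 30.0, 30.0],  # outflow, Orava
    "X": [28.0, 28.6, 28.6, 28.6, 28.0, 28.0, 28.0, 28.3],  # water level
}
row = build_row(data, t=4, lags=4, horizon=2,
                known_ahead=["R", "D"], targets=["X"])
```

Each training instance then pairs one feature dictionary with the future target values, which is the shape a regression learner expects.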
ORAVA – data integration
• Integration of data from
  – GRIB files
  – Reservoirs
• Inputs
  – Time period of experiment
  – Reservoir ID
  – List of hydro stations
  – Geo coordinates
ORAVA – data sets
Dataset | Domain | Description | Volume | Temporal coverage | Spatial coverage
SVP | Hydrology | Data from waterworks in western Slovakia (mainly river Váh): outflows, water levels, temperature, rainfall | 100s of MB | 1998-2007 | 15 distinct waterworks
SHMU_CURR | Meteorology | On-line database of meteorological data, copied from SHMI web; including radar imagery | 10s of GB + | 2008- | Slovakia (about 100 distinct probes)
SHMU_HIST | Meteorology | Historical meteorological data from SHMI probes | 100s of MB | 1998-2007 | Slovakia (more than 100 distinct probes)
SHMU_GRIB | Meteorology | Historical temperatures and rainfall amounts in a gridded binary format | 100s of GB | 1998-2007 | Slovakia (grid, various sizes)
SHMU_HYDRO | Hydrology | Historical data from hydrological measurement stations | 10s of MB | 1998-2007 | Orava and upper Váh river
ORAVA Scenario: Integrated and preprocessed data
Rows are consecutive time steps (Time [hours]).

Integrated raw data
Water_temp [24 hours] Orava | Air_temp Orava | Rainfall Orava | Outflow Orava | Rainfall Station | Air_temp Station | Flow/Height Station | Water_temp Station
(blank) | -4 | (blank) | 30 | -5.55E-20 | 269.0278 | 28 | 0.7
1 | -4 | (blank) | 30 | -5.55E-20 | 269.0476 | 28.62 | 0.7
(blank) | -5 | (blank) | 30 | -4.24E-20 | 269.5059 | 28.62 | 0.7
(blank) | -5 | (blank) | 30 | -8.47E-20 | 270.2394 | 28.62 | 0.7
(blank) | -5 | (blank) | 30 | -8.47E-20 | 270.8507 | 28 | 0.7
(blank) | -3 | (blank) | 50 | -8.47E-20 | 271.2792 | 28 | 0.7
(blank) | -3 | (blank) | 50 | -8.47E-20 | 271.9238 | 28 | 0.8

Integrated preprocessed data
Water_temp Orava | Air_temp Orava | Rainfall Orava | Outflow Orava | Rainfall Station | Air_temp Station | Flow/Height Station | Water_temp Station
1.000000 | -4.0 | 0.0 | 30.0 | 0.0 | -3.12223 | 28.00 | 0.7
1.000000 | -4.0 | 0.0 | 30.0 | 0.0 | -3.10240 | 28.62 | 0.7
0.995833 | -5.0 | 0.0 | 30.0 | 0.0 | -2.64408 | 28.62 | 0.7
0.991667 | -5.0 | 0.0 | 30.0 | 0.0 | -1.91062 | 28.62 | 0.7
0.987500 | -5.0 | 0.0 | 30.0 | 0.0 | -1.29926 | 28.00 | 0.7
0.983333 | -3.0 | 0.0 | 50.0 | 0.0 | -0.87076 | 28.00 | 0.7
0.979167 | -3.0 | 0.0 | 50.0 | 0.0 | -0.22617 | 28.00 | 0.8
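Two of the preprocessing effects visible in the data above are that the 24-hourly reservoir water temperature becomes an hourly series, and that tiny negative rainfall values become 0.0. A minimal Python sketch of both follows. `fill_linear` is a hypothetical helper, the method (linear interpolation) is an assumption, and so is the next daily reading of 0.9, although both are consistent with the hourly step seen in the preprocessed column.

```python
from typing import List, Optional


def fill_linear(values: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between known neighbours.

    Leading and trailing gaps are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one known value is required")
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(values[right]))
        elif right is None:
            filled.append(float(values[left]))
        else:
            weight = (i - left) / (right - left)
            filled.append(values[left] + weight * (values[right] - values[left]))
    return filled


readings: List[Optional[float]] = [None] * 26
readings[1] = 1.0    # daily reading, as in the raw data
readings[25] = 0.9   # assumed next daily reading, 24 hours later
hourly = fill_linear(readings)

# Clamp numerical noise in rainfall to zero; 0.3 is a made-up wet hour.
rainfall = [max(0.0, value) for value in (-5.55e-20, -4.24e-20, 0.3)]
```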
Orava Scenario: Water temperature prediction
Properties \ Model | Linear regression | Multilayer perceptron
Correlation coefficient | 0.9639 | 0.9821
Mean absolute error | 1.1791 | 0.7748
Root mean squared error | 1.4607 | 1.0386
Relative absolute error | 23.8739 % | 15.6884 %
Root relative squared error | 26.609 % | 18.9195 %
Total number of instances | 8760 | 8760
Orava Scenario: Water level prediction
Properties \ Model | Multilayer perceptron
Correlation coefficient | 0.9816
Mean absolute error | 0.4105
Root mean squared error | 0.9673
Relative absolute error | 30.5869 %
Root relative squared error | 19.2384 %
Total number of instances | 8735
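The quality measures reported for the Orava models can be computed as in the Python sketch below. It uses the commonly given definitions, with the mean of the actual values as the baseline for the relative errors. The exact baseline used by the tool that produced the slides may differ, and `regression_report` is a hypothetical helper name.

```python
import math
from typing import Dict, Sequence


def regression_report(actual: Sequence[float],
                      predicted: Sequence[float]) -> Dict[str, float]:
    """Correlation, MAE, RMSE and the two relative errors (in percent)."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Baseline: always predict the mean of the actual values.
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "correlation": cov / math.sqrt(base_sq * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100.0 * abs_err / base_abs,
        "rrse_percent": 100.0 * math.sqrt(sq_err / base_sq),
    }


# Made-up values: every prediction is 0.5 too high.
report = regression_report([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

A constant offset leaves the correlation at 1 while the error measures are non-zero, which is why the slides report correlation and error measures side by side.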
Orava Scenario: Data integration workflow
Orava Scenario: Training workflow
Orava Scenario: Prediction workflow
Implementation Notes
• We needed to write custom activities for certain data extraction tasks
• Data integration was the most complex part of the scenario in terms of workflow design
• Data integration was quite easy to write and modify in DISPEL once we had all the PEs in place
  – We used a composite PE to extract different types of quantities from meteorological GRIB files
ADMIRE Architecture: Separation of Concerns
Orava Scenario Portal
Radar Scenario
Very short-term rainfall prediction from weather radar data
Radar Scenario: Description
Network of synoptic stations in Slovakia
• 27 stations in Slovakia
• Used data from years 2007 and 2008
• Available variables: rainfall, humidity, radar reflectivity, atmospheric pressure and temperature values for each hour
• Very short-term rainfall prediction from weather radar data
Movement of areas with higher air moisture content, and thus also higher precipitation potential
Radar Scenario: Main predictors and target variables
Time | Wind | Radar reflectivity | Rainfall Orava
T-2 | W-2 | D-2 | F-2
T-1 | W-1 | D-1 | F-1
T | W | D | F
T+1 | W+1 | D+1 | F+1
T+2 | W+2 | D+2 | F+2
• Overview of the main predictors and target variables in the Radar scenario.
• Green cells are predicted by the meteo model. Blue cells come from the model based on motion vectors. Yellow cells are the final target of data mining.
Radar Scenario: Attributes of model
• Isotonic regression model
• 10-fold cross-validation
• Hydro-meteorological performance
Numerical characteristic | Value
Correlation coefficient | 0.4593
Mean absolute error | 0.1105
Root mean squared error | 0.5490
Total number of instances | 89 746

Attribute \ Threshold | 0.3 mm | 0.6 mm
Probability of detection | 0.6387 | 0.5622
Miss rate | 0.0185 | 0.0158
Hanssen-Kuipers true skill score | 0.5987 | 0.5383
Proportion correct | 0.9443 | 0.9618
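Threshold-based scores of this kind are derived from a 2x2 contingency table of observed versus forecast exceedances. The Python sketch below uses the standard definitions of probability of detection, probability of false detection, true skill score (the difference of the two) and proportion correct. Treating a value at or above the threshold as an event is an assumption, `skill_scores` is a hypothetical helper, and the slide's "miss rate" is left out because its definition is not given.

```python
from typing import Dict, Sequence


def skill_scores(observed: Sequence[float], forecast: Sequence[float],
                 threshold: float) -> Dict[str, float]:
    """Categorical verification scores for one rainfall threshold."""
    hits = misses = false_alarms = correct_negatives = 0
    for obs, fc in zip(observed, forecast):
        event = obs >= threshold     # rain was observed
        warned = fc >= threshold     # rain was forecast
        if event and warned:
            hits += 1
        elif event:
            misses += 1
        elif warned:
            false_alarms += 1
        else:
            correct_negatives += 1
    events = hits + misses
    non_events = false_alarms + correct_negatives
    if events == 0 or non_events == 0:
        raise ValueError("need at least one event and one non-event")
    pod = hits / events
    pofd = false_alarms / non_events
    return {
        "pod": pod,
        "pofd": pofd,
        "tss": pod - pofd,
        "proportion_correct": (hits + correct_negatives) / (events + non_events),
    }


# Made-up hourly rainfall in mm, for illustration only.
observed = [0.0, 0.5, 0.4, 0.0, 0.1, 0.7]
forecast = [0.1, 0.4, 0.2, 0.0, 0.5, 0.9]
scores = skill_scores(observed, forecast, threshold=0.3)
```

Because dry hours dominate rainfall data, proportion correct can be high even for a weak model, so the true skill score is the more informative of the two.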
RADAR model
• Other tested models
  – Neural networks, SMOreg, linear regression, ...
  – Reached a correlation coefficient between 0.35 and 0.42
  – Validation: 10-fold cross-validation
• Problems in model creation:
  – The process is significantly stochastic
  – Some input variables/parameters (humidity) are backwards dependent on the output (rainfall)
  – The meteorological process is very sensitive
  – The radar reflectivity matrix represents the quantity of water in the atmosphere, not the exact rainfall rate in a specified area, as opposed to data from synoptic stations
Radar Scenario: Training and forecast workflows
[Figure: two workflow diagrams, labelled Training and Forecast]
Elements of the forecast workflow: SelectRadarFiles (inputs: Start, End, Step), ReadRawRadarData, RadarDataTime Synchronization, Rainfall data SQL Query (inputs: Expression, Resource), Tuple arithmetic project (inputs: Column names, Expression), Generic Tuple Transform, Tuple Simple Merge, RadarDataSpace Synchronization, Load model via ObtainFromFTP (inputs: Filename, Host), Deserializer, Classify, TupleToCSVHeader, Result
Elements of the training workflow: SelectRangeFiles (inputs: Start, End, Step), ReadRawRadarData, RadarDataTime Synchronization, Precipitation SQL Query (inputs: Expression, Resource), RadarDataSpace Synchronization, Tuple arithmetic project, Generic Tuple Transform, BuildClassifier (inputs: Algorithm class, Class index), Serialiser, DeliverToFTP (inputs: Host, Filename)
Radar Scenario: Motion vector computation
[Figure: workflow diagram]
Elements shown: three Read From File elements (inputs: file name, resource), Radar Image Motion (ImageMotion Vector), and three RadarImage Visualization elements, each followed by DeliverToFTP (inputs: file name, host)
SVP Scenario
Forecast of reservoir inflow based on temperature, precipitation and snow cover
SVP Scenario: Structure of data
Time | AirTemperature | Rainfall Orava | Snow_prev | Snow | Inflow_prev | Inflow
t-1 | E(t-1) | R(t-1) | | S(t-1) | | F(t-1)
t | E(t) | R(t) | P(t) | S(t) | I(t) | F(t)
t+1 | E(t+1) | R(t+1) | P(t+1) | S(t+1) | I(t+1) | F(t+1)
t+2 | E(t+2) | R(t+2) | P(t+2) | S(t+2) | I(t+2) | F(t+2)
t+3 | E(t+3) | R(t+3) | P(t+3) | S(t+3) | I(t+3) | F(t+3)
t+4 | E(t+4) | R(t+4) | P(t+4) | S(t+4) | I(t+4) | F(t+4)
Two steps of prediction:
1. Copy previous values of snow quantity and inflow volume:
   P(t) = S(t-1)
   I(t) = F(t-1)
2. Apply trained models (snow model first, then inflow model):
   S(t) = f(P(t), R(t), E(t))
   F(t) = h(I(t), S(t), E(t), R(t))
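The two-step scheme can be rolled forward over a forecast horizon as in the Python sketch below. The recursion follows the equations of the slide. The models `toy_snow` and `toy_inflow` are made-up stand-ins for the trained models f and h, and `forecast` is a hypothetical helper name.

```python
from typing import Callable, List, Sequence, Tuple

SnowModel = Callable[[float, float, float], float]            # f(P, R, E)
InflowModel = Callable[[float, float, float, float], float]   # h(I, S, E, R)


def forecast(snow0: float, inflow0: float,
             temperature: Sequence[float], rainfall: Sequence[float],
             f: SnowModel, h: InflowModel) -> List[Tuple[float, float]]:
    """Return (S(t), F(t)) for each step, starting from S(t-1) and F(t-1)."""
    snow_prev, inflow_prev = snow0, inflow0
    out: List[Tuple[float, float]] = []
    for e, r in zip(temperature, rainfall):
        # Step 1: copy previous values, P(t) = S(t-1) and I(t) = F(t-1).
        p, i = snow_prev, inflow_prev
        # Step 2: apply the snow model first, then the inflow model.
        s = f(p, r, e)
        flow = h(i, s, e, r)
        out.append((s, flow))
        snow_prev, inflow_prev = s, flow
    return out


def toy_snow(p: float, r: float, e: float) -> float:
    """Made-up rule: snow grows with precipitation below 0, melts above."""
    return max(0.0, p + r) if e < 0 else max(0.0, p - 2.0 * e)


def toy_inflow(i: float, s: float, e: float, r: float) -> float:
    """Made-up rule mixing previous inflow, rainfall and snow."""
    return 0.5 * i + r + 0.1 * s


steps = forecast(10.0, 20.0, temperature=[-1.0, 2.0], rainfall=[3.0, 0.0],
                 f=toy_snow, h=toy_inflow)
```

Because each step feeds its own output into the next one, errors of the snow model propagate into the inflow forecast, which is one reason for training and validating the two models separately.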
SVP Scenario: Models & Attributes
• 10-fold cross-validation, 8760 records; models for inflow prediction
Properties \ Model | Perceptron Neural Network | Gaussian Process | Linear Regression | Decision Tree M5P
Correlation coefficient | 0.8810 | 0.8469 | 0.8079 | 0.8899
Mean absolute error | 7.0577 | 6.9821 | 8.3816 | 5.2562
Root mean squared error | 14.1005 | 15.4974 | 17.0586 | 13.1983
Relative absolute error | 40.5821% | 40.1472% | 48.1942% | 30.2231%
Root relative squared error | 48.6547% | 53.4747% | 58.8616% | 45.5415%

• N-fold cross-validation, 8760 records; Decision Tree Model M5P
Properties \ N-Fold | N = 10 | N = 20 | N = 25 | N = 50 | N = 100
Correlation coefficient | 0.8899 | 0.8933 | 0.8855 | 0.8937 | 0.8934
Mean absolute error | 5.2562 | 5.1253 | 5.2484 | 5.0973 | 5.0908
Root mean squared error | 13.1983 | 13.0090 | 13.4454 | 12.9807 | 13.0033
Relative absolute error | 30.2231% | 29.4869% | 30.2017% | 29.3317% | 29.2915%
Root relative squared error | 45.5415% | 44.9218% | 46.4373% | 44.8306% | 44.9086%
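N-fold cross-validation, as used for the comparisons above, splits the records into N disjoint folds and tests on each fold a model trained on the remaining ones. The Python sketch below shows the mechanics only. The shuffling, the seed, and the trivial `fit_mean` learner are assumptions of this example, not the setup used in the experiments.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float]                 # (input x, target y)
Model = Callable[[float], float]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Split indices 0..n-1 into k disjoint folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_validated_mae(samples: Sequence[Sample], k: int,
                        fit: Callable[[List[Sample]], Model]) -> float:
    """Mean absolute error over all held-out predictions."""
    total = 0.0
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train = [s for i, s in enumerate(samples) if i not in held]
        model = fit(train)
        total += sum(abs(model(samples[i][0]) - samples[i][1])
                     for i in held_out)
    return total / len(samples)


def fit_mean(train: List[Sample]) -> Model:
    """Trivial learner: always predict the mean training target."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


folds = k_fold_indices(10, 3)
constant = [(float(x), 5.0) for x in range(10)]
mae = cross_validated_mae(constant, k=5, fit=fit_mean)
```

Larger N means each model is trained on more of the data, at the cost of N training runs.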
SVP Scenario: Data integration workflow
[Figure: workflow diagram]
Elements shown: three SQL Query elements (inputs: Query, Resource) for inflow into reservoir, quantity of snow, and temperature and rainfall at reservoir; Daily Aggregation; two Tuple merge elements; Eliminate summer seasons (GenericTupleTransform); Final projection (TupleAritmeticProject; inputs: Expression, Result col. names); Transform to WRS (TupleToWRS); output: Integrated data
SVP Scenario: Model training workflow
[Figure: workflow diagram]
Elements shown: input Integrated data; Preprocessing 1 and Preprocessing 2, with Delete invalid rows, Data correction and Linear trend filter (for snow column; input: Snow index); Build classifier - Linear regression model (input: Class index); Build classifier - decision tree model (input: Class index); two Serializer elements; Store snow model to repository (input: Model name); Store inflow model to repository (input: Model name)
SVP Scenario: Forecast workflow
ADMIRE Tools
• Registry client GUI
• Process designer
• SKSA
• Gateway Process Manager
• DMI Model Visualizer
Registry client GUI
• Read-only access to ADMIRE Registry
  – list PEs and view their properties
  – search, sort PEs
• Write access to Registry is done via DISPEL documents
Process Designer
Manage your DMI project (files, directories – project structure)
Edit your DMI process graphically
View the canonical (DISPEL) representation of your DMI process in real time
Select elements from the Registry
View the properties of your chosen elements
Semantic Knowledge Sharing Assistant
• Context the user works in
  – Several reservoirs, one settlement
• Knowledge that may be useful in this context
  – previously entered by other users
Provides access to existing users’ knowledge, sorting and selecting it automatically according to the user’s current working context
Gateway Process Manager
• Keep track of running processes
  – stop/pause/cancel the process
  – view the process’ source DISPEL
• Access process’ results (if available) in several ways – raw or visualized
DMI Model Visualizer
For data mining experts
• Visualization of data mining models
  – Read Weka classifier object
  – Produce PMML description of the model
  – Show the PMML as a graphical tree
Custom Application Portal
for end-users (domain experts)
Thank you for your attention