Page 1

Integration and processing of environmental data

ADMIRE technology

Ondrej Habala, CRISIS seminar, 18.10.2011

ITMS 26240220060

Page 2

ITMS project 26240220060

Goals

• Accelerate access to and increase the benefits from data exploitation;

• Deliver consistent and easy-to-use technology for extracting information and knowledge;

• Cope with complexity, distribution, change and heterogeneity of services, data, and processes, through an abstract view of data mining and integration; and

• Provide power to users and developers of data mining and integration processes.

Page 3

ADMIRE Architecture: Separation of Concerns

Page 4

ADMIRE Architecture

Page 5

ADMIRE’s High-Level Architecture

Page 6

ADMIRE Gateways

USMT

Page 7

DISPEL – Data Intensive Systems Process-Engineering Language

• For data-intensive distributed systems

• Connection point between complex application requests and complex enactment systems
  – Benefit: method development, engineering and evolution of supported practices can take place independently in each world

• Describes enactment requests for streaming-data workflow processes

• "Process-engineering time": the process is transformed and optimized in preparation for the enactment period

Page 8

DISPEL: Simple Example

Creating connections (the => operator) and creating streams of literals (the |- ... -| brackets):

String sql1 = "SELECT * FROM some_table";
String sql2 = "SELECT * FROM table2";
String resource = "128.18.128.255";

SQLQuery query = new SQLQuery;
|- sql1, sql2 -| => query.expression;
|- resource -| => query.resource;

Tee tee = new Tee;
query.result => tee.connectInput;
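To make the stream semantics of this example concrete, here is a minimal Python analogue. It is not DISPEL and not the ADMIRE API: each literal in the stream |- sql1, sql2 -| triggers one query, and the Tee duplicates the result stream for two consumers. The in-memory SQLite database stands in for the remote resource; its tables and the helper names (sql_query, make_demo_db) are hypothetical.

```python
import sqlite3
from itertools import tee
from typing import Iterable, Iterator, List, Tuple


def sql_query(expressions: Iterable[str], connection: sqlite3.Connection) -> Iterator[List[Tuple]]:
    """Hypothetical analogue of the SQLQuery PE: one result list per expression.

    Each string in the input stream (like the literals in |- sql1, sql2 -|)
    triggers one query, and its rows become one element of the output stream.
    """
    for expression in expressions:
        yield connection.execute(expression).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the remote resource; tables and rows are made up."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE some_table (id INTEGER, value REAL)")
    connection.execute("CREATE TABLE table2 (id INTEGER, name TEXT)")
    connection.executemany("INSERT INTO some_table VALUES (?, ?)", [(1, 0.5), (2, 1.5)])
    connection.execute("INSERT INTO table2 VALUES (1, 'station')")
    return connection


if __name__ == "__main__":
    db = make_demo_db()
    # Stream of literals, analogous to |- sql1, sql2 -| => query.expression
    literals = iter(["SELECT * FROM some_table", "SELECT * FROM table2"])
    results = sql_query(literals, db)
    # Analogous to query.result => tee.connectInput: both branches see every result.
    branch_a, branch_b = tee(results, 2)
    print(list(branch_a))
    print(sum(len(rows) for rows in branch_b))
```

Because generators are lazy, no query runs until a downstream consumer pulls data, which loosely mirrors how a streaming enactment only moves data when connections demand it.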

Page 9

DISPEL – real use

Page 10

APPLICATION STUDIES

DEPLOYMENT OF ADMIRE TECHNOLOGY IN THE ENVIRONMENTAL DOMAIN

18.10.2011

Page 11

Flood application: data sets used in hydrological scenarios

FSKD 2010, Yantai, China, August 10-12

Dataset | Domain | Description | Volume | Temporal coverage | Spatial coverage
HUSAV | Hydrology | Data from two probes, containing water saturation of soil | 10s of MB | 1998-2007 | Two distinct points
MARS | Meteorology | Historical meteorological data (temperature, rainfall, etc.) for Slovakia | 100s of MB | 1975-2007 | Slovakia (grid 50x50 km)
SVP | Hydrology | Data from waterworks in western Slovakia (mainly river Váh): outflows, water levels, temperature, rainfall | 100s of MB | 1998-2007 | 15 distinct waterworks
DAISY | Pedology | Various pedological parameters for one probe in southern Slovakia | 10s of MB | 1961-2000 | One point
WOFOST | Pedology | Crop data (with attached soil and meteorological data) for Slovakia, year 2006 | 10s of MB | 2006 | Slovakia (grid)
SHMU_CURR | Meteorology | On-line database of meteorological data, copied from the SHMI web site; includes radar imagery | 10s of GB + | 2008- | Slovakia (about 100 distinct probes)
SHMU_HIST | Meteorology | Historical meteorological data from SHMI probes | 100s of MB | 1998-2007 | Slovakia (more than 100 distinct probes)
SHMU_GRIB | Meteorology | Historical temperatures and rainfall amounts in a gridded binary format | 100s of GB | 1998-2007 | Slovakia (grid, various sizes)
RADAR | Meteorology | Weather radar imagery | 100s of GB | 2005-2008 | Slovakia
SHMU_HYDRO | Hydrology | Historical data from hydrological measurement stations | 10s of MB | 1998-2007 | Orava and upper Váh river
SOIL_RET | Pedology | Water retention capacities of soil | 10s of MB | Current (no time series applicable) | 

Váh river watershed area

Page 12

Orava scenario

• Legend
  – Green area: Orava (part of northern Slovakia)
  – Blue: Orava reservoir and local rivers
  – Red dots: hydrological measurement stations

• Notes
  – We are interested only in hydrological stations below the Orava reservoir
  – In our tests we use hydrological station 5830 (Tvrdosin)

Page 13

ORAVA – data mining concept

• Predictors – rainfall amount (reservoir and station), air temperature (reservoir and station), reservoir discharge, reservoir temperature

• Targets – water level and temperature at a station below the reservoir

Time | Water temp Orava | Rainfall Orava | Air temp Orava | Air temp Station | Rainfall Station | Outflow Orava | Water level Station | Water temp Station
T-4 | E-4 | R-4 | A-4 | B-4 | S-4 | D-4 | X-4 | Y-4
T-3 | E-3 | R-3 | A-3 | B-3 | S-3 | D-3 | X-3 | Y-3
T-2 | E-2 | R-2 | A-2 | B-2 | S-2 | D-2 | X-2 | Y-2
T-1 | E-1 | R-1 | A-1 | B-1 | S-1 | D-1 | X-1 | Y-1
T | E | R | A | B | S | D | X | Y
T+1 | – | R+1 | A+1 | B+1 | S+1 | D+1 | X+1 | Y+1
T+2 | – | R+2 | A+2 | B+2 | S+2 | D+2 | X+2 | Y+2
T+3 | – | R+3 | A+3 | B+3 | S+3 | D+3 | X+3 | Y+3
T+4 | – | R+4 | A+4 | B+4 | S+4 | D+4 | X+4 | Y+4
T+5 | – | R+5 | A+5 | B+5 | S+5 | D+5 | X+5 | Y+5
T+6 | – | R+6 | A+6 | B+6 | S+6 | D+6 | X+6 | Y+6

Legend: predicted by a meteo model; given in a schedule; targets of data mining
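The table describes a sliding window over hourly series. As a hedged sketch (the slides do not show how the rows were actually built), the following Python function turns aligned series into one row per time step T with columns from T-4 to T+6; series listed in no_future, like the reservoir water temperature E, stop at T. The function name and the column naming scheme are hypothetical.

```python
from typing import Dict, List, Sequence


def build_window_rows(
    series: Dict[str, Sequence[float]],
    lags: int = 4,
    leads: int = 6,
    no_future: Sequence[str] = ("E",),
) -> List[Dict[str, float]]:
    """Build one row per time step t with columns name-lags .. name+leads.

    Series named in no_future (e.g. E, the reservoir water temperature)
    only contribute columns up to t, as in the slide's table.
    """
    if lags < 0 or leads < 0:
        raise ValueError("lags and leads must be non-negative")
    length = min(len(values) for values in series.values())
    rows = []
    # Only time steps with a complete window on both sides produce a row.
    for t in range(lags, length - leads):
        row = {}
        for name, values in series.items():
            last = 0 if name in no_future else leads
            for k in range(-lags, last + 1):
                row[f"{name}{k:+d}"] = values[t + k]
        rows.append(row)
    return rows


if __name__ == "__main__":
    hourly = {"E": [float(h) for h in range(12)], "R": [10.0 + h for h in range(12)]}
    rows = build_window_rows(hourly)
    print(len(rows), sorted(rows[0]))
```

For training, the X and Y columns at future offsets act as labels; at prediction time only the predictor columns would be filled in.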

Page 14

ORAVA – data integration

• Integration of data from
  – GRIB files
  – Reservoirs

• Inputs
  – Time period of the experiment
  – Reservoir ID
  – List of hydrological stations
  – Geographic coordinates
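The inputs include geographic coordinates and the sources include GRIB files, but the slides do not show how a value is picked from a gridded field at a given location. One straightforward option is nearest-grid-point lookup, sketched here in Python under the assumption of a regular latitude/longitude grid whose values are already decoded into a 2-D list (decoding GRIB itself is out of scope). RegularGrid and value_at are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class RegularGrid:
    """Hypothetical description of a regular latitude/longitude grid."""
    lat0: float  # latitude of the first row, in degrees
    lon0: float  # longitude of the first column, in degrees
    dlat: float  # row spacing in degrees (negative when rows run north to south)
    dlon: float  # column spacing in degrees
    rows: int
    cols: int

    def nearest_index(self, lat: float, lon: float) -> Tuple[int, int]:
        """Return (row, column) of the nearest grid point along each axis."""
        # Python's round() rounds exact halves to the nearest even integer.
        i = round((lat - self.lat0) / self.dlat)
        j = round((lon - self.lon0) / self.dlon)
        if not (0 <= i < self.rows and 0 <= j < self.cols):
            raise ValueError(f"({lat}, {lon}) lies outside the grid")
        return i, j


def value_at(grid: RegularGrid, field: Sequence[Sequence[float]], lat: float, lon: float) -> float:
    """Pick the value of a decoded 2-D field at the grid point nearest to (lat, lon)."""
    i, j = grid.nearest_index(lat, lon)
    return field[i][j]
```

Bilinear interpolation between the four surrounding points would be a natural refinement; nearest-point lookup is shown only because it is the simplest correct choice.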

Page 15

ORAVA – data sets

Dataset | Domain | Description | Volume | Temporal coverage | Spatial coverage
SVP | Hydrology | Data from waterworks in western Slovakia (mainly river Váh): outflows, water levels, temperature, rainfall | 100s of MB | 1998-2007 | 15 distinct waterworks
SHMU_CURR | Meteorology | On-line database of meteorological data, copied from the SHMI web site; includes radar imagery | 10s of GB + | 2008- | Slovakia (about 100 distinct probes)
SHMU_HIST | Meteorology | Historical meteorological data from SHMI probes | 100s of MB | 1998-2007 | Slovakia (more than 100 distinct probes)
SHMU_GRIB | Meteorology | Historical temperatures and rainfall amounts in a gridded binary format | 100s of GB | 1998-2007 | Slovakia (grid, various sizes)
SHMU_HYDRO | Hydrology | Historical data from hydrological measurement stations | 10s of MB | 1998-2007 | Orava and upper Váh river

Page 16

ORAVA scenario: integrated and preprocessed data

Integrated raw data (rows ordered by time, in hours)

Columns: Water_temp [24 hours] Orava, Air_temp Orava, Rainfall Orava, Outflow Orava, Rainfall Station, Air_temp Station, Flow/Height Station, Water_temp Station

-4 30 -5.55E-20 269.0278 28 0.71
-4 30 -5.55E-20 269.0476 28.62 0.7
-5 30 -4.24E-20 269.5059 28.62 0.7
-5 30 -8.47E-20 270.2394 28.62 0.7
-5 30 -8.47E-20 270.8507 28 0.7
-3 50 -8.47E-20 271.2792 28 0.7
-3 50 -8.47E-20 271.9238 28 0.8

Integrated preprocessed data (rows ordered by time, in hours)

Water_temp Orava | Air_temp Orava | Rainfall Orava | Outflow Orava | Rainfall Station | Air_temp Station | Flow/Height Station | Water_temp Station
1.000000 | -4.0 | 0.0 | 30.0 | 0.0 | -3.12223 | 28.00 | 0.7
1.000000 | -4.0 | 0.0 | 30.0 | 0.0 | -3.10240 | 28.62 | 0.7
0.995833 | -5.0 | 0.0 | 30.0 | 0.0 | -2.64408 | 28.62 | 0.7
0.991667 | -5.0 | 0.0 | 30.0 | 0.0 | -1.91062 | 28.62 | 0.7
0.987500 | -5.0 | 0.0 | 30.0 | 0.0 | -1.29926 | 28.00 | 0.7
0.983333 | -3.0 | 0.0 | 50.0 | 0.0 | -0.87076 | 28.00 | 0.7
0.979167 | -3.0 | 0.0 | 50.0 | 0.0 | -0.22617 | 28.00 | 0.8
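Two effects are visible when the raw and preprocessed samples are compared: numerical noise such as -5.55E-20 in rainfall becomes 0.0, and the reservoir water temperature, labelled as a 24-hour value, changes gradually from row to row. A minimal Python sketch of both steps follows. It assumes linear interpolation of daily values to hourly ones, which the slides do not state; the function names and the eps tolerance are hypothetical.

```python
from typing import List, Sequence


def clamp_numerical_noise(values: Sequence[float], eps: float = 1e-9) -> List[float]:
    """Replace values whose magnitude is below eps (e.g. -5.55E-20) with exactly 0.0."""
    return [0.0 if abs(v) < eps else v for v in values]


def daily_to_hourly(daily: Sequence[float]) -> List[float]:
    """Linearly interpolate one value per day into 24 hourly values per day.

    The last daily value is kept as the final sample, so n days give
    24 * (n - 1) + 1 values.
    """
    if not daily:
        return []
    hourly: List[float] = []
    for today, tomorrow in zip(daily, daily[1:]):
        step = (tomorrow - today) / 24
        hourly.extend(today + hour * step for hour in range(24))
    hourly.append(daily[-1])
    return hourly


if __name__ == "__main__":
    print(clamp_numerical_noise([-5.55e-20, -8.47e-20, 0.3]))
    print([round(v, 6) for v in daily_to_hourly([1.0, 0.9])[:4]])
```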

Page 17

Orava scenario: water temperature prediction

Properties \ Model | Linear regression | Multilayer perceptron
Correlation coefficient | 0.9639 | 0.9821
Mean absolute error | 1.1791 | 0.7748
Root mean squared error | 1.4607 | 1.0386
Relative absolute error | 23.8739 % | 15.6884 %
Root relative squared error | 26.609 % | 18.9195 %
Total number of instances | 8760 | 8760
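The five measures carry the names Weka prints in its evaluation summary. A minimal Python sketch of their usual definitions, assuming paired lists of actual and predicted values, is shown below; the function name is hypothetical, and the relative errors use the mean of the actual values as the reference predictor, which is a simplification.

```python
import math
from typing import Dict, Sequence


def regression_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Compute the five error measures reported in the tables.

    Relative errors are taken against a predictor that always outputs the
    mean of the actual values (an assumption; Weka derives its reference
    from the training data, so its figures can differ slightly).
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    base_abs = sum(abs(a - mean_a) for a in actual)
    return {
        # Pearson correlation; raises ZeroDivisionError if either series is constant.
        "correlation": cov / math.sqrt(var_a * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100.0 * abs_err / base_abs,
        "rrse_percent": 100.0 * math.sqrt(sq_err / var_a),
    }
```

Read this way, a relative absolute error of 15.69 % for the multilayer perceptron means its absolute errors sum to roughly a sixth of those of always predicting the average temperature.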

Page 18

Orava scenario: water level prediction

Properties \ Model | Multilayer perceptron
Correlation coefficient | 0.9816
Mean absolute error | 0.4105
Root mean squared error | 0.9673
Relative absolute error | 30.5869 %
Root relative squared error | 19.2384 %
Total number of instances | 8735

Page 19

Orava scenario: data integration workflow

Page 20

Orava scenario: training workflow

Page 21

Orava scenario: prediction workflow

Page 22

Implementation notes

• We needed to write custom activities for certain data extraction tasks

• Data integration was the most complex part of the scenario in terms of workflow design

• Data integration was quite easy to write and modify in DISPEL once we had all the PEs in place
  – We used a composite PE to extract different types of quantities from meteorological GRIB files

Page 23

ADMIRE Architecture: Separation of Concerns

Page 24

Orava Scenario Portal

Page 25

Orava Scenario Portal

Page 26

Radar Scenario

Very short-term rainfall prediction from weather radar data

Page 27

Radar scenario: description

• Network of synoptic stations in Slovakia
  – 27 stations in Slovakia
  – Data used from the years 2007 and 2008
  – Available variables, as hourly values: rainfall, humidity, radar reflectivity, atmospheric pressure and temperature

• Very short-term rainfall prediction from weather radar data
  – Based on the movement of areas with higher air moisture content, and thus also higher precipitation potential

Page 28

Radar scenario: main predictors and target variables

Time | Wind | Radar reflectivity | Rainfall Orava
T-2 | W-2 | D-2 | F-2
T-1 | W-1 | D-1 | F-1
T | W | D | F
T+1 | W+1 | D+1 | F+1
T+2 | W+2 | D+2 | F+2

• Overview of the main predictors and target variables in the Radar scenario.

• The green cells are predicted by a meteorological model, the blue cells come from a model based on motion vectors, and the yellow cells are the final target of data mining.

Page 29

Radar scenario: model attributes

• Isotonic regression model
• 10-fold cross-validation
• Hydro-meteorological performance

Numerical characteristic | Value
Correlation coefficient | 0.4593
Mean absolute error | 0.1105
Root mean squared error | 0.5490
Total number of instances | 89,746

Attribute \ Threshold | 0.3 mm | 0.6 mm
Probability of detection | 0.6387 | 0.5622
Miss rate | 0.0185 | 0.0158
Hanssen-Kuipers true skill score | 0.5987 | 0.5383
Proportion correct | 0.9443 | 0.9618
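The threshold scores are standard categorical verification measures computed from a 2x2 contingency table of rainfall events. The Python sketch below uses the textbook definitions: probability of detection POD = hits / (hits + misses), probability of false detection POFD = false alarms / (false alarms + correct negatives), Hanssen-Kuipers score = POD - POFD, and proportion correct = (hits + correct negatives) / total. The slide does not define its "Miss rate", so it is not reproduced; the "rainfall >= threshold" event convention and all names are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ContingencyTable:
    hits: int = 0               # event observed and forecast
    misses: int = 0             # event observed, not forecast
    false_alarms: int = 0       # event forecast, not observed
    correct_negatives: int = 0  # neither observed nor forecast


def build_table(observed: Sequence[float], forecast: Sequence[float], threshold: float) -> ContingencyTable:
    """Classify each pair as an event when rainfall >= threshold (assumed convention)."""
    table = ContingencyTable()
    for obs, fc in zip(observed, forecast):
        obs_event, fc_event = obs >= threshold, fc >= threshold
        if obs_event and fc_event:
            table.hits += 1
        elif obs_event:
            table.misses += 1
        elif fc_event:
            table.false_alarms += 1
        else:
            table.correct_negatives += 1
    return table


def _ratio(num: int, den: int) -> float:
    """Division that yields NaN instead of failing on an empty denominator."""
    return num / den if den else math.nan


def skill_scores(t: ContingencyTable) -> Dict[str, float]:
    pod = _ratio(t.hits, t.hits + t.misses)                              # probability of detection
    pofd = _ratio(t.false_alarms, t.false_alarms + t.correct_negatives)  # probability of false detection
    total = t.hits + t.misses + t.false_alarms + t.correct_negatives
    return {
        "pod": pod,
        "pofd": pofd,
        "hanssen_kuipers": pod - pofd,
        "proportion_correct": _ratio(t.hits + t.correct_negatives, total),
    }
```

Because dry hours dominate, proportion correct can stay high (as with 0.94 to 0.96 here) even when POD is modest, which is why the Hanssen-Kuipers score is reported alongside it.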

Page 30

RADAR model

• Other tested models
  – Neural networks, SMOreg, linear regression, ...
  – Reached correlation coefficients between 0.35 and 0.42
  – Validation: 10-fold cross-validation

• Problems in model creation:
  – The process is significantly stochastic
  – Some input variables/parameters (humidity) are backwards dependent on the output (rainfall)
  – The meteorological process is very sensitive
  – The reflectivity matrix represents the quantity of water in the atmosphere, not the exact rainfall rate in a specified area, as opposed to data from synoptic stations

Page 31

Radar scenario: training and forecast workflows

Forecast workflow (processing elements shown in the diagram, with their inputs):
– SelectRadarFiles (Start, End, Step)
– ReadRawRadarData
– RadarDataTime Synchronization
– Rainfall data (SQL Query; Expression, Resource)
– Tuple aritmetic project (Column names, Expression)
– Generic Tuple Transform
– Tuple Simple Merge
– RadarDataSpace Synchronization
– Classify
– Load model: ObtainFromFTP (Filename, Host), Deserializer
– TupleToCSVHeader
– Result

Training workflow (processing elements shown in the diagram, with their inputs):
– SelectRangeFiles (Start, End, Step)
– ReadRawRadarData
– RadarDataTime Synchronization
– Precipitation SQL Query (Expression, Resource)
– RadarDataSpace Synchronization
– Tuple Aritmetic Project
– Generic Tuple Transform
– BuildClassifier (Algorithm class, Class index)
– Serialiser
– DeliverToFTP (Host, Filename)
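The two workflows are linked through a stored model: the training side ends with BuildClassifier, Serialiser and DeliverToFTP, and the forecast side obtains the model over FTP and deserializes it before Classify. A minimal Python sketch of that hand-off follows. It assumes a one-variable least-squares model, JSON serialization and a dict in place of the FTP server; none of these come from the slides, and all names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Sequence


@dataclass
class LinearModel:
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


def build_classifier(xs: Sequence[float], ys: Sequence[float]) -> LinearModel:
    """Ordinary least squares with one predictor (stand-in for BuildClassifier)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LinearModel(slope, mean_y - slope * mean_x)


def serialise(model: LinearModel) -> bytes:
    """Stand-in for Serialiser. JSON is an assumption; the slides do not name the format."""
    return json.dumps(asdict(model)).encode("utf-8")


def deserialise(blob: bytes) -> LinearModel:
    """Stand-in for Deserializer."""
    return LinearModel(**json.loads(blob.decode("utf-8")))


# A dict plays the role of the FTP server used by DeliverToFTP / ObtainFromFTP.
model_store: Dict[str, bytes] = {}

if __name__ == "__main__":
    # Training workflow: build, serialise, deliver.
    model_store["radar-model"] = serialise(build_classifier([0, 1, 2, 3], [1, 3, 5, 7]))
    # Forecast workflow: obtain, deserialise, classify.
    print(deserialise(model_store["radar-model"]).predict(10))
```

Keeping training and forecasting as separate workflows lets the expensive training run rarely while the forecast workflow runs on every new radar frame.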

Page 32

Radar scenario: motion vector computation

Processing elements shown in the diagram:
– Read From File, three instances (inputs: file name, resource)
– ImageMotion Vector
– Radar Image Motion
– RadarImage Visualization, three instances, each followed by DeliverToFTP (inputs: file name, host)
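The slides do not say how the ImageMotion Vector element computes motion. A common technique for radar nowcasting of this kind is block matching: find the shift that best aligns two consecutive images, then move the latest image along that shift to extrapolate. The Python sketch below uses two frames and a single global vector for the whole image, which is a simplification (practical implementations usually estimate vectors per block); the function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def motion_vector(prev: Grid, curr: Grid, max_shift: int) -> Tuple[int, int]:
    """Return the shift (dy, dx) with curr[y][x] ~ prev[y - dy][x - dx].

    Every shift with |dy|, |dx| <= max_shift is tried; the one with the lowest
    mean absolute difference over the overlapping area wins.
    """
    rows, cols = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            for y in range(max(0, dy), min(rows, rows + dy)):
                for x in range(max(0, dx), min(cols, cols + dx)):
                    total += abs(curr[y][x] - prev[y - dy][x - dx])
                    count += 1
            if count and total / count < best_cost:
                best, best_cost = (dy, dx), total / count
    return best


def extrapolate(curr: Grid, vector: Tuple[int, int], steps: int = 1, fill: float = 0.0) -> Grid:
    """Move the current image along the vector to obtain a nowcast 'steps' frames ahead."""
    dy, dx = vector[0] * steps, vector[1] * steps
    rows, cols = len(curr), len(curr[0])
    out = [[fill] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            sy, sx = y - dy, x - dx
            if 0 <= sy < rows and 0 <= sx < cols:
                out[y][x] = curr[sy][sx]
    return out
```

Pixels that move in from outside the image are filled with the fill value, so the nowcast degrades near the edges as the lead time grows.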

Page 33

SVP Scenario

Forecast of reservoir inflow based on temperature, precipitation and snow cover

Page 34

SVP scenario: structure of data

Time | Air temperature | Rainfall Orava | Snow_prev | Snow | Inflow_prev | Inflow
t-1 | E(t-1) | R(t-1) | – | S(t-1) | – | F(t-1)
t | E(t) | R(t) | P(t) | S(t) | I(t) | F(t)
t+1 | E(t+1) | R(t+1) | P(t+1) | S(t+1) | I(t+1) | F(t+1)
t+2 | E(t+2) | R(t+2) | P(t+2) | S(t+2) | I(t+2) | F(t+2)
t+3 | E(t+3) | R(t+3) | P(t+3) | S(t+3) | I(t+3) | F(t+3)
t+4 | E(t+4) | R(t+4) | P(t+4) | S(t+4) | I(t+4) | F(t+4)

1. $P(t) = S(t-1)$, $I(t) = F(t-1)$

2. $S(t) = f(P(t), R(t), E(t))$, $F(t) = h(I(t), S(t), E(t), R(t))$

Two steps of prediction:
1. Copy the previous values of snow quantity and inflow volume.
2. Apply the trained models (first the snow model, then the inflow model).
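The two prediction steps define a recursive forecast: the predicted snow S(t) and inflow F(t) become the "previous" values P(t+1) and I(t+1) of the next step. The Python sketch below shows that loop, assuming the air temperature E and rainfall R for the horizon are already given (for example from a meteorological model) and the trained models are plain callables; forecast_svp and the toy models are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# f(P, R, E) -> S: snow quantity from previous snow, rainfall and air temperature
SnowModel = Callable[[float, float, float], float]
# h(I, S, E, R) -> F: inflow from previous inflow, snow, air temperature and rainfall
InflowModel = Callable[[float, float, float, float], float]


def forecast_svp(
    snow_model: SnowModel,
    inflow_model: InflowModel,
    last_snow: float,
    last_inflow: float,
    temperature: Sequence[float],
    rainfall: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Roll the two trained models forward over the forecast horizon."""
    snow_series: List[float] = []
    inflow_series: List[float] = []
    prev_snow, prev_inflow = last_snow, last_inflow
    for e, r in zip(temperature, rainfall):
        p, i = prev_snow, prev_inflow  # step 1: P(t) = S(t-1), I(t) = F(t-1)
        s = snow_model(p, r, e)        # step 2a: snow model first
        f = inflow_model(i, s, e, r)   # step 2b: then the inflow model, using the new S(t)
        snow_series.append(s)
        inflow_series.append(f)
        prev_snow, prev_inflow = s, f
    return snow_series, inflow_series


if __name__ == "__main__":
    # Toy stand-ins for the trained models (made up for illustration only).
    toy_snow = lambda p, r, e: p + r if e < 0 else max(0.0, p - e)
    toy_inflow = lambda i, s, e, r: 0.5 * i + (r if e >= 0 else 0.0)
    print(forecast_svp(toy_snow, toy_inflow, 10.0, 4.0, [-1.0, 3.0], [2.0, 1.0]))
```

Since each step feeds on the previous prediction, errors accumulate over the horizon; this is the price of not having observed snow and inflow values for future days.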

Page 35

SVP scenario: models and attributes

• 10-fold cross-validation, 8760 records; models for inflow prediction

Properties \ Model | Perceptron neural network | Gaussian process | Linear regression | Decision tree M5P
Correlation coefficient | 0.8810 | 0.8469 | 0.8079 | 0.8899
Mean absolute error | 7.0577 | 6.9821 | 8.3816 | 5.2562
Root mean squared error | 14.1005 | 15.4974 | 17.0586 | 13.1983
Relative absolute error | 40.5821% | 40.1472% | 48.1942% | 30.2231%
Root relative squared error | 48.6547% | 53.4747% | 58.8616% | 45.5415%

• N-fold cross-validation, 8760 records; decision tree model M5P

Properties \ N-fold | N = 10 | N = 20 | N = 25 | N = 50 | N = 100
Correlation coefficient | 0.8899 | 0.8933 | 0.8855 | 0.8937 | 0.8934
Mean absolute error | 5.2562 | 5.1253 | 5.2484 | 5.0973 | 5.0908
Root mean squared error | 13.1983 | 13.0090 | 13.4454 | 12.9807 | 13.0033
Relative absolute error | 30.2231% | 29.4869% | 30.2017% | 29.3317% | 29.2915%
Root relative squared error | 45.5415% | 44.9218% | 46.4373% | 44.8306% | 44.9086%
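N-fold cross-validation splits the records into N folds, trains on N - 1 of them and tests on the remaining one, rotating until every fold has served as the test set. A minimal Python sketch of the split is shown below; Weka's own randomisation and fold assignment may differ, and the function name is hypothetical.

```python
import random
from typing import List, Optional, Tuple


def k_fold_splits(n_records: int, k: int, seed: Optional[int] = None) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs; each record is tested exactly once."""
    if not 2 <= k <= n_records:
        raise ValueError("k must satisfy 2 <= k <= n_records")
    indices = list(range(n_records))
    if seed is not None:
        random.Random(seed).shuffle(indices)  # optional randomisation before splitting
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    # With the slide's 8760 records, N = 100 gives test folds of 87 or 88 records.
    sizes = {len(test) for _, test in k_fold_splits(8760, 100)}
    print(sorted(sizes))
```

With larger N each model is trained on more of the data (99 % of it for N = 100), which is consistent with the small gains in the table, while the number of training runs grows linearly with N.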

Page 36

SVP scenario: data integration workflow

Elements shown in the diagram:
– Inflow into reservoir (SQL Query; inputs: Query, Resource)
– Quantity of snow (SQL Query; inputs: Query, Resource)
– Temperature and rainfall at reservoir (SQL Query; inputs: Query, Resource)
– Daily Aggregation
– Tuple merge (two instances)
– Final projection (TupleAritmeticProject; inputs: Expression, Result col. names)
– Transform to WRS (TupleToWRS)
– Eliminate summer seasons (GenericTupleTransform)
– Output: Integrated data
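The core of this workflow (aggregate to days, merge the three sources, drop summer) can be sketched in a few lines of Python. This is not the ADMIRE implementation; it assumes that daily temperature is the mean and daily rainfall the sum of hourly values, that sources are joined on date, and that "summer season" means June to August. All of these choices and all names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, List, Tuple

# Hypothetical definition of "summer season"; the slides do not give one.
SUMMER_MONTHS = frozenset({6, 7, 8})


def daily_aggregate(hourly: Iterable[Tuple[datetime, float, float]]) -> Dict[date, Tuple[float, float]]:
    """(timestamp, air temperature, rainfall) rows -> {day: (mean temperature, total rainfall)}."""
    temperatures: Dict[date, List[float]] = defaultdict(list)
    rainfall: Dict[date, float] = defaultdict(float)
    for timestamp, temperature, rain in hourly:
        day = timestamp.date()
        temperatures[day].append(temperature)
        rainfall[day] += rain
    return {day: (sum(temps) / len(temps), rainfall[day]) for day, temps in temperatures.items()}


def merge_daily(
    weather: Dict[date, Tuple[float, float]],
    snow: Dict[date, float],
    inflow: Dict[date, float],
    summer: frozenset = SUMMER_MONTHS,
) -> List[Tuple[date, float, float, float, float]]:
    """Inner-join the three daily sources on date and drop days in summer months."""
    rows = []
    for day in sorted(weather.keys() & snow.keys() & inflow.keys()):
        if day.month in summer:
            continue  # "Eliminate summer seasons"
        temperature, rain = weather[day]
        rows.append((day, temperature, rain, snow[day], inflow[day]))
    return rows
```

An inner join silently drops days missing from any source; an outer join followed by explicit gap handling would be the alternative if missing days matter.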

Page 37

SVP scenario: model training workflow

Elements shown in the diagram:
– Input: Integrated data
– Preprocessing 1 and Preprocessing 2: Linear trend filter (for snow column; input: Snow index), Delete invalid rows, Data correction
– Build classifier – linear regression model (input: Class index), Serializer, Store snow model to repository (input: Model name)
– Build classifier – decision tree model (input: Class index), Serializer, Store inflow model to repository (input: Model name)

Page 38

SVP scenario: forecast workflow

Page 39

ADMIRE Tools

• Registry client GUI
• Process designer
• SKSA
• Gateway Process Manager
• DMI Model Visualizer

Page 40

Registry client GUI

• Read-only access to the ADMIRE Registry
  – list PEs and view their properties
  – search and sort PEs

• Write access to the Registry is done via DISPEL documents

Page 41

Process Designer

• Manage your DMI project (files, directories – project structure)
• Edit your DMI process graphically
• View the canonical (DISPEL) representation of your DMI process in real time
• Select elements from the Registry
• View the properties of your chosen elements

Page 42

Semantic Knowledge Sharing Assistant

Provides access to existing knowledge of users, sorting and selecting it automatically according to the user's current working context.

• Context the user works in
  – Several reservoirs, one settlement

• Knowledge that may be useful in this context
  – previously entered by other users

Page 43

Gateway Process Manager

• Keep track of running processes
  – stop/pause/cancel the process
  – view the process's source DISPEL

• Access the process's results (if available) in several ways: raw or visualized

Page 44

DMI Model Visualizer for data mining experts

• Visualization of data mining models
  – Read a Weka classifier object
  – Produce a PMML description of the model
  – Show the PMML as a graphical tree

Page 45

Custom application portal for end users (domain experts)

Page 46

Thank you for your attention