Jan 13, 2015
iMarine
Donatella Castelli (CNR-ISTI)
Anton Ellenbroek (FAO)
Marc Taconet (FAO)
Pasquale Pagano (CNR-ISTI)
Outline
1. Project Info & Objectives (D. Castelli)
2. e-Infrastructure selected capabilities (A. Ellenbroek)
3. e-Infrastructure governance (M. Taconet)
4. Concluding remarks (M. Taconet)
Marine Knowledge All Projects meeting,
11-12 October 2012
iMarine project info
• Research Infrastructures CP & CSA funded by the
European Commission under the FP7 Capacities
Programme - eInfrastructure Unit DG INFSO
• 1 Nov 2011 - 30 Apr 2014
• 13 partners
• 660 p/m co-funded by EU + 123 p/m in-kind
contribution from external collaborators
iMarine Community
Objective
Launch an initiative aimed at establishing and
operating an e-infrastructure contributing to the
implementation of the principles of the Ecosystem
Approach to Fisheries Management and
Conservation of Marine Living Resources.
Implementing the EA
• Analysis and processing of a large amount of heterogeneous
information produced across domains
• Multidisciplinary and multifaceted collaboration at the local,
national, regional and international levels
Physical and chemical features
Inventories of biological
information
Habitat types
Socio-economic
aspects
Marine resource
assessment
Fishery operation,
processing and trade
e-Infrastructure
Electronic platform operated by a responsible
entity offering an open set of basic enabling
services (including access to resources) to a
distributed Community of Practice. By
exploiting these shared services the members of
the Community of Practice realise economies of
scale.
iMarine focus…
Assemble
Analyse
Production
…
«The creation of the marine knowledge begins with
the observation of the sea and oceans. Data from
these observations are assembled, then analysed
to create information and knowledge.
Subsequently, the knowledge can be applied to
deliver smart sustainable growth, to assess the
health of the marine ecosystem or to protect
coastal communities.»
Marine Knowledge 2020 Communication
iMarine offer…
Assemble
Analyse
Production
…
Functionality Capacity
Building upon existing e-Infrastructures
e-Infrastructure services
• GBIF
• MyOcean
• GENESI-DEC
• VENUS-C (cloud)
• EGI (Grid & Cloud)
• EMODNET
e-Infrastructure ecosystem
• Interoperability
– Each e-Infrastructure can outsource
required facilities to other e-Infrastructures
– The same e-infrastructure can play
both provider and consumer roles
• Competition
– The most effective and sustainable
e-Infrastructures will survive
Data infrastructure components
Physical architecture
(computing & storage resources)
Data & sw tool resources
e-Infrastructure software system
Governing procedures and policies
resources
Software system
• Application: Workspace, Time Series, Ecological Niche Modelling, Business Document Workflow, Vessel Activity Analyser
• Enabling: Resource Management, Resource Discovery, Process Execution, Security
• Data Management: Search, Mining, Storage, Access, Transformation
Functionality classes
Data import & sharing
Data harmonization, validation and
enrichment
Data transformation, publishing and
visualization
Advanced data analysis
Collaborative environments
(Virtual Research
Environments)
Data resources
• FIGIS (reports)
• MyOcean (environmental data)
• GENESI-DEC (earth observation data)
• DRIVER (publications)
• OBIS (marine species data)
• GBIF (occurrence points)
• Catalogue of Life (taxonomy)
• WoRMS (marine taxonomy)
• ITIS (taxonomy)
• FAO SDMX Registry (statistical data)
• AquaMaps (species maps)
• FLOD (open linked data)
• Geonetwork (georeferenced data)
• ….
Everything accessible through
– TAPIR, DiGIR, OAI-PMH, OpenSearch, OGC W*S, SDMX, …
-> GENESI-DEC
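To give a feel for what protocol-based access to these resources looks like from a client, the sketch below builds an OAI-PMH ListRecords request URL. The endpoint address is a placeholder and not an actual iMarine or provider URL; `verb`, `metadataPrefix` and `set` are standard OAI-PMH request parameters, while the function name is made up for this example.

```python
from urllib.parse import urlencode


def oai_pmh_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a given endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, for illustration only.
EXAMPLE_ENDPOINT = "https://example.org/oai"
print(oai_pmh_list_records_url(EXAMPLE_ENDPOINT))
```

The other protocols in the list follow the same pattern of a well-known request vocabulary over HTTP, which is what lets one platform mediate many providers.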
Products
The initiative
( CoP, board, policies, sustainability,…..)
The e-infrastructure
(the operational platform)
The system
(the enabling sw system)
We are not starting from scratch
• DILIGENT (2004-2006)
• D4Science (2008-2009)
• D4ScienceII (2010-2011)
• iMarine (2011-2014)
Diagram labels: Technology, Information Technology, EA, Partnerships
iMarine Opportunities
Anton Ellenbroek, FAO
1. Functionality
2. Technology
1. Functionality
Select from different Applications
iMarine e-Infrastructure - Options
iMarine e-Infrastructure – Selected Options
Time Series Data
• Import
• Validation
• Analysis
• Mining
Biodiversity Data
• Discovery
• Access
• Analysis
Geospatial Data
• Discovery
• Access
• Process
Time Series - Import
Import formatted and unformatted data
CSV import, harmonize, structure, publish as Time Series
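The import step on this slide can be sketched in a few lines. This is only an illustration of the idea (parse CSV, harmonize key values, structure rows into one series per key); the column names, the sample data and the function name are invented for the example and do not describe the platform's actual importer.

```python
import csv
import io
from collections import defaultdict


def import_csv_as_time_series(csv_text, time_col, value_col, key_cols):
    """Parse CSV text and group rows into one time series per key."""
    series = defaultdict(list)
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Harmonize: trim whitespace and upper-case the key values.
        key = tuple(row[c].strip().upper() for c in key_cols)
        series[key].append((int(row[time_col]), float(row[value_col])))
    # Structure: each series is sorted by time.
    return {k: sorted(v) for k, v in series.items()}


# Made-up sample data with inconsistent spelling of the same key.
SAMPLE = """year,species,catch_t
2009,cod ,10.5
2008,COD,12.0
2009,hake,3.2
"""

ts = import_csv_as_time_series(SAMPLE, "year", "catch_t", ["species"])
for key, points in ts.items():
    print(key, points)
```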
Time Series - Validation
Harmonize, format, and structure data
Use rules for formatting, range check, code-list recognition, etc.
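A minimal sketch of rule-based validation, assuming three kinds of rule named on the slide: a format check (is the value numeric), a range check, and code-list recognition. The field names, the codes and the range are illustrative assumptions, not the rules shipped with the platform.

```python
def validate_rows(rows, code_list, value_range):
    """Return (row_index, problem) pairs for every rule a row breaks."""
    low, high = value_range
    problems = []
    for i, row in enumerate(rows):
        # Code-list recognition: the code must be a known one.
        if row["species"] not in code_list:
            problems.append((i, f"unknown code {row['species']!r}"))
        # Format rule: the value must parse as a number.
        try:
            value = float(row["catch_t"])
        except ValueError:
            problems.append((i, "catch_t is not a number"))
            continue
        # Range check.
        if not low <= value <= high:
            problems.append((i, f"catch_t {value} outside [{low}, {high}]"))
    return problems


# Made-up rows and code list, for illustration only.
rows = [
    {"species": "COD", "catch_t": "10"},
    {"species": "XXX", "catch_t": "-1"},
    {"species": "HKE", "catch_t": "abc"},
]
problems = validate_rows(rows, code_list={"COD", "HKE"}, value_range=(0, 1_000_000))
for index, problem in problems:
    print(index, problem)
```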
Time Series – Data Analysis
Time Series are treated as tabular data
The options include:
• Union / Join / Merge / Sum
• Graphs
• Plot on maps
• Analysis with R, Weka, RapidMiner
• Safely share
• Publish to ‘World’
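Because time series are treated as tabular data, the first group of options maps onto ordinary table operations. The sketch below shows plain-Python versions of union, sum and join on lists of row dictionaries; the function names, column names and values are assumptions made for the example.

```python
from collections import defaultdict


def union(table_a, table_b):
    """Union: stack the rows of two tables that share the same columns."""
    return table_a + table_b


def sum_by(table, key_cols, value_col):
    """Sum: total value_col over rows that share the same key columns."""
    totals = defaultdict(float)
    for row in table:
        totals[tuple(row[c] for c in key_cols)] += row[value_col]
    return dict(totals)


def join(table_a, table_b, on):
    """Join: pair rows from both tables that agree on the `on` column."""
    index = defaultdict(list)
    for row in table_b:
        index[row[on]].append(row)
    return [{**a, **b} for a in table_a for b in index.get(a[on], [])]


# Made-up tables, for illustration only.
catches_2008 = [{"year": 2008, "species": "COD", "catch_t": 12.0}]
catches_2009 = [
    {"year": 2009, "species": "COD", "catch_t": 10.5},
    {"year": 2009, "species": "HKE", "catch_t": 3.2},
]
effort = [{"year": 2009, "days": 40}]

all_catches = union(catches_2008, catches_2009)
totals = sum_by(all_catches, ["species"], "catch_t")
joined = join(all_catches, effort, on="year")
print(totals)
print(joined)
```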
Time Series – Code Lists
Import formatted and unformatted Code Lists
Create your own, or import from SDMX registry. Useful in validation
Time Series Analysis - Data Mining
• Outlier detection: recognizes anomalies in n-dimensional data sets
• Frequency detection
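The slides do not say which outlier-detection algorithm is used. As one simple way to flag anomalies in n-dimensional data, the sketch below standardizes every dimension and reports points whose distance from the mean exceeds a threshold; the threshold value and the sample points are assumptions.

```python
import math
import statistics


def find_outliers(points, threshold=3.0):
    """Return indices of points whose standardized distance from the mean exceeds threshold."""
    dims = list(zip(*points))
    means = [statistics.fmean(d) for d in dims]
    # A constant dimension has zero spread; use 1.0 to avoid dividing by zero.
    stdevs = [statistics.pstdev(d) or 1.0 for d in dims]
    outliers = []
    for i, p in enumerate(points):
        dist = math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(p, means, stdevs)))
        if dist > threshold:
            outliers.append(i)
    return outliers


# Made-up 2-dimensional observations with one obvious anomaly at the end.
points = [(1, 2), (2, 1), (1, 1), (2, 2), (1, 2),
          (2, 1), (1, 1), (2, 2), (1, 2), (2, 1), (50, 50)]
print(find_outliers(points))
```

With very few points a single extreme value inflates the standard deviation it is measured against, so a threshold of this kind only becomes meaningful on reasonably sized data sets.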
Time Series Analysis - Vessel Position Analysis
Example: Time Series analysis of Position Observations
Time Series processing techniques can be exploited for:
- Aggregating Vessel Data
- Calculating fishing effort
- Classifying the fishing activity
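A rough sketch of how position observations could be turned into a fishing-effort figure. It assumes a speed-based classification rule, which is a common heuristic but is not stated in the slides; the speed band, the field names and the sample track are all illustrative.

```python
from datetime import datetime


def classify_activity(speed_knots, fishing_speed=(2.0, 5.0)):
    """Classify one observation by speed; the speed band is an assumed rule."""
    low, high = fishing_speed
    if speed_knots < low:
        return "stopped_or_drifting"
    if speed_knots <= high:
        return "fishing"
    return "steaming"


def fishing_hours(observations):
    """Sum the hours between consecutive observations whose earlier point is classified as fishing."""
    obs = sorted(observations, key=lambda o: o["time"])
    hours = 0.0
    for prev, nxt in zip(obs, obs[1:]):
        if classify_activity(prev["speed_knots"]) == "fishing":
            hours += (nxt["time"] - prev["time"]).total_seconds() / 3600.0
    return hours


# Made-up track of a single vessel.
track = [
    {"time": datetime(2012, 10, 11, 0, 0), "speed_knots": 10.0},
    {"time": datetime(2012, 10, 11, 1, 0), "speed_knots": 3.0},
    {"time": datetime(2012, 10, 11, 3, 0), "speed_knots": 4.0},
    {"time": datetime(2012, 10, 11, 4, 0), "speed_knots": 9.0},
]
print(fishing_hours(track))
```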
Biological datasets
Biological Data Provider   Status
Catalogue of Life          Released (2.9.0)
GBIF                       Released (2.9.0)
ITIS                       Released (2.9.0)
OBIS                       Released (2.9.0)
WoRMS                      Released (2.9.0)
IRMNG                      Release (2.10.0) – October 16
NCBI                       Release (2.10.0) – October 16
Biodiversity Products Retrieval
Discovery and access across heterogeneous providers
Search by scientific name or common name; retrieval of taxonomy items and
occurrence points
Biodiversity – Taxonomic Items
• Active links on selected
items
• Common names matrix
• Checklists (DwC-A)
production via jobs
– Executed in batch
– Concurrent jobs
– Live monitoring
Biodiversity – Occurrence Points Visualization
• Active links on selected
items
• Geo-visualisation
• Export
– DarwinCore
– CSV
– CSV for openModeller
Biodiversity – Occurrence Points Analysis
A set of probabilistic operations on Occurrence Points.
Two thresholds: T° for spatial proximity, Ts for similarity confidence.
Operations:
• Merge(A, B, T°, Ts)
• Inters(A, B, T°, Ts)
• NoDuplicates(A, T°, Ts)
• OnEarth(A)
• InSea(A)
Comparison of two occurrence points A and B:
• x,y: distance < T°
• Event Date: =
• Modif Date: take the most recent
• Author, Species Scientific Name: LexicalD(Author) * LexicalD(SciName) > Ts
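One possible reading of the duplicate test on this slide, written as a sketch. Several details are assumptions: Euclidean distance in degrees stands in for spatial proximity, `difflib.SequenceMatcher` stands in for LexicalD, event dates are required to be equal, and all field and function names are invented for the example.

```python
import math
from difflib import SequenceMatcher


def lexical_similarity(a, b):
    """Similarity in [0, 1]; a stand-in for the slide's LexicalD."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_point(a, b, t_deg, t_sim):
    """True when two occurrence points are close, same-dated and lexically similar."""
    close = math.hypot(a["x"] - b["x"], a["y"] - b["y"]) < t_deg
    same_event = a["event_date"] == b["event_date"]
    similarity = (lexical_similarity(a["author"], b["author"])
                  * lexical_similarity(a["sci_name"], b["sci_name"]))
    return close and same_event and similarity > t_sim


def no_duplicates(points, t_deg, t_sim):
    """Keep one record per group of matching points, preferring the most recently modified."""
    kept = []
    for p in points:
        for i, q in enumerate(kept):
            if same_point(p, q, t_deg, t_sim):
                if p["modif_date"] > q["modif_date"]:
                    kept[i] = p
                break
        else:
            kept.append(p)
    return kept


# Made-up records; ISO date strings compare correctly as text.
records = [
    {"x": 10.00, "y": 45.00, "event_date": "2010-05-01", "modif_date": "2011-01-01",
     "author": "Linnaeus", "sci_name": "Gadus morhua"},
    {"x": 10.01, "y": 45.01, "event_date": "2010-05-01", "modif_date": "2012-03-01",
     "author": "Linnaeus", "sci_name": "gadus morhua"},
    {"x": 20.00, "y": 50.00, "event_date": "2010-05-01", "modif_date": "2011-06-01",
     "author": "Linnaeus", "sci_name": "Gadus morhua"},
]
unique = no_duplicates(records, t_deg=0.5, t_sim=0.8)
print(len(unique))
```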
Biodiversity – Occurrence Points Analysis
• Dedicated environment for
occurrence points
management
• Open environment
• Export
– DarwinCore
– CSV
– CSV for openModeller
Geospatial – A Simple Scenario
After the Joint Activity with the other participants
Visualization Example: Neural Network inferred suitable range maps
[Map: Aquamaps Suitable Distribution] DISTRIBUTIONS: AQUAMAPS_SUITABLE
[Map: Aquamaps Neural Network Suitable Distribution] DISTRIBUTIONS: AQUAMAPS_SUITABLE_NEURAL_NETWORK
A big plus: Integrated Quality Analysis with Biodiversity products
EVALUATORS: QUALITY_ANALYSIS
Quality Analysis on Absence/Presence Points (Res. 0.5 degrees)
                      Aquamaps_suitable   Neural Network_suitable
                      (eq. to native)     (eq. to native)
TRUE POSITIVES        13                  32
FALSE POSITIVES       0                   0
TRUE NEGATIVES        7                   7
FALSE NEGATIVES       21                  2
ACCURACY              0.49                0.95
SENSITIVITY           0.38                0.94
SPECIFICITY           1                   1
OMISSION RATE         0.62                0.059
ROC BEST THRESHOLD    0.17                0
AUC                   0.41                1
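The accuracy, sensitivity, specificity and omission rate reported on this slide follow directly from the four confusion-matrix counts using the standard definitions, as the sketch below shows. The function name is made up for the example; ROC best threshold and AUC are not included because they depend on the underlying probability scores, which the slide does not give.

```python
def quality_metrics(tp, fp, tn, fn):
    """Derive summary metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),    # share of presences correctly predicted
        "specificity": tn / (tn + fp),    # share of absences correctly predicted
        "omission_rate": fn / (tp + fn),  # share of presences missed
    }


# Counts taken from the slide.
aquamaps = quality_metrics(tp=13, fp=0, tn=7, fn=21)
neural = quality_metrics(tp=32, fp=0, tn=7, fn=2)
for name, metrics in (("Aquamaps_suitable", aquamaps), ("Neural Network_suitable", neural)):
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```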
iMarine e-Infrastructure
EUPL
Application Platform
• Application: Workspace, Time Series, Ecological Niche Modelling, Business Document Workflow, Vessel Activity Analyser
• Enabling: Resource Management, Resource Discovery, Process Execution, Security
• Data Management: Search, Mining, Storage, Access, Transformation
External interactions
Subsystem boundary
gCube Enabling Technology
Secure, Powerful, and Standard-based
[Diagram: a portal application and server applications running on several servers]
• Secure: all data moved over the network and all server-to-server communications
are authorised and encrypted; data can be stored encrypted
Virtual Research Environments
A VRE is the hardware, data, and applications allocated for a timeframe
to a group of people for effective collaboration
User uploads/selects apps
• Stored in Software Repository
User registers/selects data sets
• Accessible through Mediators
Apps are executed on the most suitable HW
• System deploys, configures, executes and monitors
User invites other users
• System controls authentication and enforces policies
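The VRE definition on this slide can be pictured as a small data structure: a set of apps and data sets allocated to a group of members for a time window, with the system checking membership before granting access. This is a conceptual sketch only; the class, its fields and its methods are invented for the example and are not the gCube API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class VirtualResearchEnvironment:
    """Resources allocated for a timeframe to a group of people."""
    name: str
    start: date
    end: date
    members: set = field(default_factory=set)
    apps: set = field(default_factory=set)
    datasets: set = field(default_factory=set)

    def invite(self, inviter, invitee):
        """Only an existing member may invite another user."""
        if inviter not in self.members:
            raise PermissionError(f"{inviter} is not a member of {self.name}")
        self.members.add(invitee)

    def can_use(self, user, on):
        """Access is granted to members while the VRE is active."""
        return user in self.members and self.start <= on <= self.end


# Made-up VRE, users and resources.
vre = VirtualResearchEnvironment(
    name="demo-vre",
    start=date(2012, 10, 1),
    end=date(2012, 10, 31),
    members={"alice"},
    apps={"time-series-analysis"},
    datasets={"sample-catches"},
)
vre.invite("alice", "bob")
print(vre.can_use("bob", date(2012, 10, 11)))
```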
Summary
• An advanced data e-Infrastructure
• across location and ownership boundaries
• across technological boundaries
• regulated by governance and policies (EUPL)
• Designed for future developments
• integrate or develop applications
• designed to grow incrementally in size
• share, harmonize, transform data
e-Infrastructure Governance
Marc Taconet
(FAO)
e-Infrastructure governance
• Objective 1 of iMarine:
«To develop community-driven policies enabling:
– governance and operation of a data infrastructure,
– sharing of data and other resources, and processing data
in order to support the Community of Practice in implementing the Ecosystem
Approach to fishery management and marine living resource conservation»
→ role of the iMarine Board
iMarine Board’s tasks
• Mobilize user community
• Develop governance model
• Address systems harmonization
The iMarine Board
• Mobilize the user community
– Core set of influential partners
stimulated to work together on three
main business cases:
• Support to implementation of the EU Common Fishery Policy
• Support to FAO’s deep seas fisheries programme
• Support to regional tropical LME pelagic EAF community
– Raise awareness on the offer
– Position/align the offer versus the needs
Partners (grouped in the diagram under Fisheries, Environment and Biodiversity):
• FAO
• DG MARE
• Eurostat
• NEAFC
• MEDDE/DOF
• IRD
• ICES
• IOC/OBIS
• FIN
• CRIA
• VLIZ
• T2/GENESI-DEC
The iMarine Board
• Develop Governance model, with sustainability focus
» Governance is the combination of
processes (including relationships among stakeholders),
structures (including formal and informal institutions), and
instruments (policies, laws)
implemented by the board,
» through which stakeholders’ interests are articulated, rights and
obligations are established, and differences are mediated
The iMarine Board – develop Governance model
• An interface between the CoP and the data infrastructure owners
• Designed to allow control of EA Community on Data Infrastructure Developments
• Forerunner of a Governance structure
[Diagram: iMarine Board, Advisory Council, iMarine Project, Project Boards and Marine Community, linked by flows labelled Recommendations, Decision and Explain]
Marine Community members shown: ICES, Eurostat, DG MARE, UNEP, IUCN (GOBI), FAO, RFBs, FIGIS, Emodnet, Cat. of Life, Ecoscope, IFSA, OBIS
The iMarine Board
• Develop Governance model, with sustainability
focus
– Policies as governing instruments
• Data access and sharing policies
• Software and hardware sharing policies
The iMarine Board
• Address the harmonization of systems
– Rationalizing solutions among partners (cost efficiency)
• 4 technical clusters: Biodiversity, Geo-spatial, Statistical, Semantics
– Agreeing on “iMarine” standards
• OGC, Darwin Core, SDMX, RDF
• Includes promotion of new standards (e.g. FLUX)
– Mainstreaming requirements and specifications
→ Operationalization of the agreed formats and protocols
The iMarine Board
• Result of Systems harmonization
[Diagram: partners and data sources grouped by domain, with the standards and shared services they converge on]
Domains: Ocean environment, Taxonomy, Biodiversity, Fisheries
Partners and sources: VLIZ, IOC, T2, MyOcean, SeaDatanet, Emodnet, WORMS, OBIS, FIN, CRIA, IRD, FAO, FIGIS, FLOD, ESTAT, DG-MARE, National DOF, Ecoscope, ICES, RDB, Aquamaps, other sources
Standards: OGC, SDMX, DC, RDF, FLUX std
Shared results: Policy fmk, Geo-Processing services, Niche modelling algorithms, Open Source software (OpenSDMX - CLM)
Conclusion
What’s peculiar about iMarine?
What are we looking for?
What’s peculiar about iMarine
• The iMarine resources offer is about
– Integrating and managing services across systems’
administrative boundaries
– End users:
• Interactive gateway for collaborative science: VREs
– Infrastructure owners:
• Platform providing outsourcing services: computing,
distribution, scaling, interoperability
• Cloud hosting services
What’s peculiar about iMarine
• iMarine Positioning in Open data perspective
[Diagram labels: iMarine (cross-thematic federator); thematic aggregators: OBIS, SEADATANET, GEOSEAS, Genomics, FIRMS; data locked in isolated computers]
For major impact, we look for
• Strategic alliances with Federators:
– Thematic aggregators of data providers
– Scientists end-user needs
– Governance and policy models
Landscape
• D4Science e-Infrastructure
• gCube Framework
• gCube Apps
Thanks for your attention
Discussion
www.i-marine.org
portal.i-marine.d4science.org
www.gcube-system.org
gcube.wiki.gcube-system.org