iMarine Data e-Infrastructure Initiative for Fisheries Management and Conservation of Marine Living Resources
Data access, harmonization, analysis, and management platform
Pasquale Pagano (CNR), iMarine Technical Director, pasquale.pagano@isti.cnr.it
Gianpaolo Coro (CNR), iMarine Data Analyst, gianpaolo.coro@isti.cnr.it
Open Channel Webinar, 22 July 2014
iMarine data e-infrastructure: Data access, harmonization, analysis, and management platform
On 22 July 2014, OpenChannels.org and the EBM Tools Network, two of the premier sources of information about coastal and marine planning and management tools in the United States and internationally, hosted the iMarine webinar: iMarine Data e-Infrastructure Initiative for Fisheries Management and Conservation of Marine Living Resources.
The webinar presented the iMarine initiative and its data e-infrastructure and services, followed by a set of use cases related to Geospatial Analysis, Ecology, Biodiversity, and Life History Traits. The presentations were given by Pasquale Pagano, CNR-ISTI and iMarine Technical Director, and Gianpaolo Coro, CNR-ISTI. Watch the video of the webinar here: https://www.youtube.com/watch?v=lgf30BPyBbk
Transcript
iMarine Data e-Infrastructure Initiative for Fisheries Management and Conservation of Marine Living Resources
Data access, harmonization, analysis, and management platform
Pasquale Pagano (CNR), iMarine Technical Director, pasquale.pagano@isti.cnr.it
Gianpaolo Coro (CNR), iMarine Data Analyst, gianpaolo.coro@isti.cnr.it
Open Channel Webinar, 22 July 2014
Pasquale Pagano
• Master Degree in Computer Science
• Ph.D. in Information Engineering on Distributed Systems
• it.linkedin.com/in/pasqualepagano/
Gianpaolo Coro
• Master Degree in Physics - Cybernetics
• Ph.D. in Computer Science
• it.linkedin.com/pub/gianpaolo-coro/16/665/b5a
iMarine - Just an overview
Concepts
• The initiative (the visionary leadership)
• The e-infrastructure (the operational platform)
• The system (the enabling software system)
THE INITIATIVE Distinguishing capabilities of the iMarine initiative
iMarine Objective
Launch an initiative aimed at establishing and operating a data infrastructure supporting the principles of the Ecosystem Approach to Fisheries Management and Conservation of Marine Living Resources
[Timeline labels: Nov 2011, Sept 2014, Apr 2016]
Address system harmonization
[Figure: systems and data sources to be harmonized. Domain labels: Fisheries, Ocean environment, Biodiversity, Taxonomy. Source labels: VLIZ, IOC, FIN, CRIA, IRD, FAO, T2, MyOcean, SeaDataNet, other sources, FIGIS, ESTAT, DG-MARE, National DOF, Ecoscope, ICES RDB, EMODnet Biology, WoRMS, OBIS, AquaMaps, niche modelling algorithms, open source software, Open SDMX - CLM, SDMX, GBIF, EoL, NEAFC, FishBase. Courtesy of Marc Taconet (FAO).]
Role of the iMarine Board
• Mobilize user community
  – Core set of influential partners mobilized to work on two main business cases:
    support to the implementation of the EU Common Fishery Policy;
    support to FAO’s deep seas fisheries programme
• Develop governance model
  – Public partnerships
  – Policies
  – Data sharing
  – Software sharing
Distinguishing capabilities of the iMarine e-infrastructure and its enabling software
Concepts and Definitions
The D4Science infrastructure
iMarine is exploiting a Hybrid Data Infrastructure combining over 500 software components into a coherent and centrally managed system of hardware, software, and data resources.
Born from the user needs
• I need to host my applications in a secure and scalable environment
• I need to maintain my database
• I need to back up my data
• I need to securely deliver my data to a set of known people
• I want to offer a flexible sharing, storage, reporting, search and retrieval tool
Born from the user needs
• I need to manage and analyze biological and ecological data
• I need to manage the full data life-cycle from import to validation, curation, harmonization and publication
• I need to offer my team a powerful tool to manage code lists
• I need to store and analyze geospatially explicit information
• I need to reduce the data maintenance costs of my department
Born from the user needs
• I need to access authoritative biological and ecological data
• I need to simplify the access to my geospatial data
• I need to mash up statistical and biodiversity data
• I need to validate my datasets and provide a standard access to them
• I need to analyse my big datasets
User Needs Analysis
• Needs
  – Not isolated
  – Not disconnected
  – Not trivial
• Solutions
  – Current, with an eye to the future
  – Designed for individuals and looking at the community
iMarine e-infrastructure
iMarine is exploiting D4Science.org
• Geographically distributed computing infrastructure
  – Across administrative boundaries
  – Across private and commercial providers
• Service allocations, deployment, monitoring, and operation
  – Uniform resource and data access
• Operation
  – Built on SLAs
  – Support for monitoring, auditing, reporting, and notification
• Trust
  – Privacy, governance, and attribution
  – Security, trusted network
Infrastructure: key characteristics
• Efficient and tailored storage technologies
• Computational environments dealing with the volume of the data
• Elastic management of the resources, monitoring, alerting, recovery
• Collaborative environment to support scientific communities
• Rich portfolio of applications to perform access, validation, enriching, processing, sharing, and mash-up of data
Capacities: Storage as a Service
to host and maintain data
• Database: high-availability, standard, ready-to-use
• Cloud storage: scalable, reliable, secure
• Geographical DB: scalable, OGC standard, privacy and attribution
Capacities: Computing as a Service
to process and extract knowledge
• Scalable: easy to manage, across boundaries
• Tailored
• Elastic: assignment of computing, assignment of processors
• Virtual Research Environment
• Rich and heterogeneous: high throughput, Map-Reduce, parallel R
Applications as a Service
to curate and manage data
• Metadata generation: geospatial data, biodiversity data, statistical data
• Harmonization: disambiguate, validate, integrate and consistency check
• Data exchange: OGC protocols, DarwinCore, SDMX
THE APPLICATIONS CATALOGUE
Distinguishing capabilities of the iMarine catalogue of applications
• Management and interpretation of biological and ecological data in the environment
• Complete full life-cycle data framework, from observational data to aggregated data repositories enriched with validation and analytical tools
• Storage and interpretation of geospatially explicit information, including WPS processing
• Flexible sharing, storage, reporting, search and retrieval, aggregation and projection facilities
Applications as a Service
A BUNDLE is a set of services and technologies grouped according to a family of related tasks for achieving a common objective
• Occurrence and Taxonomic Data Discovery, Occurrence Data Processing, Species Distribution Modeling, Species Distribution Maps Discovery, Taxonomic Data Comparison, Taxonomic Data Matching
• Code List Discovery, Code List Management, Statistical Engine, Tabular Data Discovery, Tabular Data Enrichment, Tabular Data Management, Tabular Data Processing
• Geospatial Data Discovery, Geospatial Data Processing
Distinguishing capabilities of the iMarine catalogue of applications
Data
[Figure: data sources connected to iMarine: OBIS, WoRMS, WoRDS, GBIF, CoL, ITIS, IRMNG, NCBI, MyOcean, WOA, EuroStat, Data.FAO, …]
iMarine Registries
[Figure labels: Validation, Enriching, Processing, Sharing. Data types: ontologies and data warehouses, biological and ecological data, geospatial data, statistical data, documents.]
DarwinCore / ISO19139
• >35 M observations (OBIS)
• ≈ 120 K observed species (OBIS)
• ≈ 500 K taxa (WoRMS)
• >600 K scientific names (ITIS)
• >12 K species maps (AquaMaps)
• ≈ 600 species extents (FAO)
• … FishBase, SeaLifeBase
• … CoL, GBIF
Distinguishing capabilities of the iMarine collaborative environment
Is this enough?
• An ecosystem of participatory data e-infrastructures
• Regulated by policies
• Enabled by standards
• Promoting not only access but mash-up of heterogeneous data
User centric
Virtual Research Environment
iMarine is user-centric and workflow-oriented thanks to the gCube VRE technology
A Virtual Research Environment (VRE) is
• a distributed and dynamically created environment
• where a subset of data, services, computational, and storage resources
• regulated by tailored policies
• are assigned to a subset of users via interfaces
• for a limited timeframe
• at little or no cost for the providers of the participatory data e-infrastructures
L. Candela, D. Castelli, P. Pagano (2013) Virtual Research Environments: An Overview and a Research Agenda. Data Science Journal, Vol. 12
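The VRE definition above can be read as a small data model: a named environment that binds a subset of users to a subset of resources, under a policy, for a limited timeframe. The following is a minimal sketch of that reading only. The class name, field names, and the read-only policy are illustrative assumptions and do not reflect the gCube API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Set


@dataclass
class VirtualResearchEnvironment:
    """Hypothetical model of a VRE; not the gCube implementation."""
    name: str
    users: Set[str]                 # subset of users assigned to the VRE
    resources: Set[str]             # subset of data, services, compute, storage
    valid_from: date                # start of the limited timeframe
    valid_until: date               # end of the limited timeframe
    read_only: Set[str] = field(default_factory=set)  # assumed tailored policy

    def is_active(self, day: date) -> bool:
        """A VRE only exists for its limited timeframe."""
        return self.valid_from <= day <= self.valid_until

    def can_use(self, user: str, resource: str, day: date,
                write: bool = False) -> bool:
        """Apply membership, resource scope, timeframe, and policy in turn."""
        if not self.is_active(day):
            return False
        if user not in self.users or resource not in self.resources:
            return False
        if write and resource in self.read_only:
            return False
        return True


# Illustrative, made-up VRE
vre = VirtualResearchEnvironment(
    name="ExampleVRE",
    users={"alice", "bob"},
    resources={"occurrence-db", "stat-engine"},
    valid_from=date(2014, 1, 1),
    valid_until=date(2014, 12, 31),
    read_only={"occurrence-db"},
)
print(vre.can_use("alice", "occurrence-db", date(2014, 7, 22)))
```

Keeping the timeframe and policy on the environment, rather than on each user, mirrors the idea that the same user can hold different rights in different VREs.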
[Figure: one e-infrastructure hosting several VREs]
Virtual Research Environment
to share and collaborate
• Share: database, tables, workflow, files
• Communicate: post, favourite, connection
• Organize: dynamic VRE creation
• Secure: policy control
Infrastructure: Collaborative Environment
A single place to
• Get status and updates from applications and other users they are interested in
• Get notifications about messages, job completion, newly generated products, etc.
[Screenshot labels: share updates; user news feed; VREs the user is a member of]
[Screenshot labels: feeds from applications; feed from users]
Infrastructure: Collaborative Environment
A single place to
• Manage data, store and preserve them
• Share data
THE SOFTWARE
Distinguishing capabilities of gCube software
iMarine Technology
• iMarine is powered by gCube
openhub.net
USE CASES
A few examples of the analytics capabilities
Practical Examples
• Geospatial Analysis
• Ecology
• Biodiversity
• Life History Traits
Geospatial Analysis
Rasterization
A polygonal map is transformed into a raster map or into a point map
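The rasterization step described above can be sketched as follows. This is a minimal illustration, assuming a regular grid and a "cell centre inside the polygon" rule; it is not the algorithm used by the iMarine services, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray casting: count crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon: List[Point], x_min: float, y_min: float,
              cell_size: float, n_cols: int, n_rows: int) -> List[List[int]]:
    """Polygon -> raster: a cell is 1 when its centre lies in the polygon.

    Row 0 is the row with the smallest y values.
    """
    grid = []
    for row in range(n_rows):
        yc = y_min + (row + 0.5) * cell_size
        grid.append([
            1 if point_in_polygon(x_min + (col + 0.5) * cell_size, yc, polygon)
            else 0
            for col in range(n_cols)
        ])
    return grid


def raster_to_points(grid: List[List[int]], x_min: float, y_min: float,
                     cell_size: float) -> List[Point]:
    """Raster -> point map: one point at the centre of every filled cell."""
    return [
        (x_min + (c + 0.5) * cell_size, y_min + (r + 0.5) * cell_size)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value
    ]


# Illustrative square polygon on a 4 x 4 grid of unit cells
square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
grid = rasterize(square, x_min=0.0, y_min=0.0, cell_size=1.0, n_cols=4, n_rows=4)
points = raster_to_points(grid, x_min=0.0, y_min=0.0, cell_size=1.0)
for row in grid:
    print(row)
```

The cell-centre rule is the simplest choice; an area-coverage threshold would treat cells on the polygon border differently.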
Estimated impact of climate change over 20 years on 11549 species
Example: Pseudanthias evansi
The occupancy of Pseudanthias evansi decreases in Area 71 but increases in Area 77
[Figure labels: Bioclimate HSpec; overall occupancy in time]
Similarity between habitats
Habitat Representativeness Score (HRS):
1. Measures the similarity between the environmental features of two areas
2. Assesses the quality of models and environmental features
[Figure labels: HRS=10.5; Habitat Representativeness Score; Latimeria chalumnae]
Occurrence data from GBIF (A) and occurrence data from OBIS (B)
Record fields in both sets: x, y; event date; modification date; author; species scientific name
Operations on occurrence points, based on record similarity:
• ∩ Intersection
• - Difference
• ∪ Union
• DD Duplicates Deletion
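The operations listed above can be sketched on two small record sets. This is a minimal illustration: the similarity rule (same scientific name, same event date, coordinates within a tolerance) is an assumption made here, and the records are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Occurrence:
    x: float
    y: float
    event_date: str
    modif_date: str
    author: str
    scientific_name: str


def similar(a: Occurrence, b: Occurrence, tol: float = 0.01) -> bool:
    """Assumed similarity rule: same name and event date, nearby coordinates."""
    return (a.scientific_name.strip().lower() == b.scientific_name.strip().lower()
            and a.event_date == b.event_date
            and abs(a.x - b.x) <= tol
            and abs(a.y - b.y) <= tol)


def intersection(a_set: List[Occurrence], b_set: List[Occurrence]) -> List[Occurrence]:
    """Records of A that have a similar record in B."""
    return [a for a in a_set if any(similar(a, b) for b in b_set)]


def difference(a_set: List[Occurrence], b_set: List[Occurrence]) -> List[Occurrence]:
    """Records of A that have no similar record in B."""
    return [a for a in a_set if not any(similar(a, b) for b in b_set)]


def delete_duplicates(records: List[Occurrence]) -> List[Occurrence]:
    """Keep the first record of every group of similar records."""
    kept: List[Occurrence] = []
    for record in records:
        if not any(similar(record, k) for k in kept):
            kept.append(record)
    return kept


def union(a_set: List[Occurrence], b_set: List[Occurrence]) -> List[Occurrence]:
    """All records of A and B, with similar records merged."""
    return delete_duplicates(list(a_set) + list(b_set))


# Made-up records for illustration only
set_a = [
    Occurrence(43.25, -11.60, "2001-05-01", "2010-01-01", "A1", "Latimeria chalumnae"),
    Occurrence(44.00, -12.00, "2003-07-10", "2010-01-01", "A1", "Latimeria chalumnae"),
]
set_b = [
    Occurrence(43.251, -11.601, "2001-05-01", "2012-03-04", "B1", "latimeria chalumnae"),
    Occurrence(30.00, -30.00, "1999-02-02", "2012-03-04", "B1", "Latimeria chalumnae"),
]
print(len(intersection(set_a, set_b)), len(difference(set_a, set_b)),
      len(union(set_a, set_b)))
```

Because similarity is approximate rather than exact equality, these operations are not guaranteed to behave like strict set algebra; the tolerance decides how aggressively records are merged.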
BiOnym
A flexible workflow approach to taxon name matching
Accounts for:
• Variations in the spelling and interpretation of taxonomic names
• Combination of data from different sources
• Harmonization and reconciliation of taxa names
Raw input string: Gadus morua Lineus 1758
Correct transcription: Gadus morhua (Linnaeus, 1758)
Workflow stages: Preprocessing and Parsing; Taxon name Matcher 1, Taxon name Matcher 2, …, Taxon name Matcher n; PostProcessing
Reference sources: ASFIS, FISHBASE, WoRMS, other sources in DwC-A
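The idea of parsing a raw string and passing it through a series of matchers can be sketched as below. This is a minimal illustration, not the BiOnym implementation: the two matchers (exact lookup, then fuzzy matching with the standard-library `difflib`), the cutoff value, and the tiny reference list are assumptions, and authorship correction is left out.

```python
import difflib
from typing import Callable, Dict, List, Optional

# Tiny illustrative reference list; a real run would load a reference source
REFERENCE_NAMES = ["Gadus morhua", "Gadus macrocephalus", "Latimeria chalumnae"]


def parse_binomial(raw: str) -> str:
    """Preprocessing and parsing: keep genus and species epithet only."""
    tokens = raw.split()
    return " ".join(tokens[:2]).lower()


def exact_matcher(name: str, index: Dict[str, str]) -> Optional[str]:
    """Matcher 1: case-insensitive exact lookup."""
    return index.get(name)


def fuzzy_matcher(name: str, index: Dict[str, str]) -> Optional[str]:
    """Matcher 2: closest reference name above an assumed similarity cutoff."""
    candidates = difflib.get_close_matches(name, list(index), n=1, cutoff=0.8)
    return index[candidates[0]] if candidates else None


MATCHERS: List[Callable[[str, Dict[str, str]], Optional[str]]] = [
    exact_matcher,
    fuzzy_matcher,
]


def match_name(raw: str, reference: List[str]) -> Optional[str]:
    """Try each matcher in order and return the first reference name found."""
    index = {ref.lower(): ref for ref in reference}
    name = parse_binomial(raw)
    for matcher in MATCHERS:
        result = matcher(name, index)
        if result is not None:
            return result
    return None


print(match_name("Gadus morua Lineus 1758", REFERENCE_NAMES))
```

Ordering the matchers from strict to permissive keeps cheap, unambiguous matches from being overridden by fuzzy ones.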
Trendylyzer - Recognize Big Changes in Species Presence
• Fill some knowledge gaps on marine species
• Account for sampling biases
• Define trends for common species
[Figure labels: plankton regime shift; herring recovered after the fishing ban]
Life History Traits
Length-Weight Relationships
$W = aL^{b}$
Calculate the a and b parameters for 14 230 species by means of Bayesian methods
bluewatermag.com.au
Approach:
• Collaborative development with the final user
• Integration of the user’s R scripts
• Usage of parallel processing for R scripts
• Periodic runs
• Porting to the D4Science Statistical Manager allowed the scripts to run in a distributed fashion
• The time reduction was from 20 days to 11 hours! 95.4% reduction
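To make the length-weight relation concrete: taking logarithms of W = aL^b gives log W = log a + b log L, a straight line whose slope is b and whose intercept is log a. The sketch below estimates a and b with ordinary least squares on that line. This is a deliberately simpler technique than the Bayesian method mentioned in the slide, and the data are synthetic, generated from assumed values a = 0.01 and b = 3.

```python
import math
from typing import List, Tuple


def fit_length_weight(lengths: List[float], weights: List[float]) -> Tuple[float, float]:
    """Estimate (a, b) in W = a * L**b by least squares on the log-log line."""
    xs = [math.log(length) for length in lengths]
    ys = [math.log(weight) for weight in weights]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope of the log-log line
    a = math.exp(mean_y - b * mean_x)    # intercept transformed back
    return a, b


# Synthetic measurements generated from a = 0.01, b = 3 (no noise)
lengths = [10.0, 20.0, 30.0, 40.0]
weights = [0.01 * length ** 3 for length in lengths]

a, b = fit_length_weight(lengths, weights)
print(round(a, 4), round(b, 4))
```

Each species can be fitted independently of the others, which is what makes this kind of per-species estimation a natural candidate for the parallel and distributed runs described in the slide.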
Safe Biological Limits of Large Stocks
[Figure labels: re-estimated SSB limit; re-estimated HS; rule-based HS; re-estimated precautionary limit]
Estimate biological limits for 50 Northeast Atlantic fish stocks
• Use real measures
• Rely on previous expert knowledge
• Use Bayesian models to combine information
Resilience vs Productivity of a Species
Best Resilience and Productivity pair for the species
THE WAY TO USE IT
Distinguishing capabilities of the exploitation models