This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee.
Project Acronym: DataBio
Grant Agreement number: 732064 (H2020-ICT-2016-1 – Innovation Action)
Project Full Title: Data-Driven Bioeconomy
Project Coordinator: INTRASOFT International
DELIVERABLE
D6.2 – Data Management Plan
Dissemination level PU -Public
Type of Document Report
Contractual date of delivery M06 – 30/6/2017
Deliverable Leader CREA
Status - version, date Final – v1.0, 30/6/2017
WP / Task responsible WP6
Keywords: Data management plan, big data, bioeconomy
D6.2 – Data Management Plan H2020 Contract No. 732064 Final – v1.0, 30/6/2017
Dissemination level: PU -Public Page 2
Executive Summary
This document presents DataBio’s deliverable D6.2, the Data Management Plan (DMP), a key
element of good data management. DataBio participates in the European Commission’s H2020
extended Open Research Data Pilot, and a DMP is therefore required.
Consequently, the DataBio project’s datasets will be as open as possible and as closed as
necessary, focusing on sound big data management for the sake of best research practice,
and in order to create value and foster knowledge and technology from big datasets for the
common good. The deliverable describes the data management life cycle for the data to be
collected, processed and/or generated by the DataBio project, accounting also for the need
to make research data findable, accessible, interoperable and reusable (FAIR).
DataBio’s partners will be encouraged to adhere to sound data management to ensure that
data are well-managed, archived and preserved. Data preservation is synonymous with data
relevance since: (1) data can then be reused by other researchers, (2) the data collector can direct
requests for data to the database, rather than address requests individually, (3) preserved
data have the potential to lead to new, unanticipated discoveries, (4) preserved data prevent
duplication of scientific studies that have already been conducted, and (5) archiving data
insures against loss by the data collector. The main issues addressed in this deliverable
include: (1) the purpose of data collection, (2) data type, format, size, velocity, beneficiaries,
and provenance, (3) use of historical data, (4) making data FAIR, (5) data management
support, (6) data security, and (7) ethical aspects.
Big data is a new paradigm that is forcing changes in businesses and other
organizations. A few entities in the EU are starting to manage the massive data sets and
non-traditional data structures that are typical of big data, and/or are managing big data by extending
their data management skills and their portfolios of data management software. Big data
management empowers those entities to efficiently automate business operations, operate
closer to real time and, through analytics, add value and learn valuable new facts about
business operations, customers, partners, etc. Within the DataBio framework, big data
management (BDM) is a mixture of conventional and new best practices, skills, teams, data
types, and in-house grown or vendor-built functionality. All of these are being realigned under the
DataBio platform, built upon the partners’ own experiences and tools. It is anticipated that DataBio
will provide a solution which assumes that datasets will be distributed among different
infrastructures and that their accessibility could be complex, and which therefore needs mechanisms
that facilitate data retrieval, processing, manipulation and visualization as seamlessly as
possible. The infrastructure will open new possibilities for the ICT sector, including SMEs, to
develop a new Bioeconomy 4.0, and will also open new possibilities for companies from the
Earth Observation sector.
Some partners have scaled up pre-existing applications and databases to handle burgeoning
volumes of relational big data, or they have acquired new data management platforms that
are purpose-built for managing and analyzing multi-structured big data, including streaming
big data. Others are evaluating big data platforms in order to create a brisk market of vendor
products and services for managing and harnessing big data. The Hadoop Distributed File
System (HDFS), MapReduce, various Hadoop tools, complex event processing (for streaming
big data), NoSQL databases (for schema-free big data), in-memory databases (for real-time
analytic processing of big data), private clouds, in-database analytics, and grid computing will
be some of the software products implemented within the DataBio framework.
During the lifecycle of the DataBio project, big data will be collected, that is, very large data
sets (multi-terabyte or larger) consisting of a wide range of data types (relational, text,
multi-structured data, etc.) from numerous sources. Most data will come from farm and forestry
machinery, fishing vessels, remote and proximal sensors and imagery, and many other
technologies. DataBio is purposefully collecting big data, specifically:
• Forestry: Big Data methods are expected to make it possible both to increase the value of the forests and to decrease costs, within the sustainability limits set by natural growth and ecological aspects. The key technology is to gather more, and more accurate, information about the trees from a host of sensors, including a new generation of satellites, UAV images, laser scanning, mobile devices through crowdsourcing, and machines operating in the forests.
• Agriculture: Big Data in agriculture is currently a hot topic. DataBio aims at building a European vision of Big Data for agriculture. This vision is to offer solutions which will increase the role of Big Data in agri-food chains in Europe: a perspective which will prepare recommendations for future big data development in Europe.
• Fisheries: the ambition of this project is to herald and promote the use of Big Data analytical tools within fisheries applications by initiating several pilots which will demonstrate benefits of using Big Data in an analytical way for the fisheries, such as improved analysis of operational data, tools for planning and operational choices, crowdsourcing methods for fish stock estimation.
This is the first version of the DataBio DMP; it will be updated over the course of the project as
warranted by significant changes arising during the project implementation, and by the
requirements of the project consortium. At least two updates will be prepared, in Months
18 and 36 of the project.
Deliverable Leader: Ephrem Habyarimana (CREA)
Contributors:
Jaroslav Šmejkal (ZETOR), Tomas Mildorf (UWB), Bernard
Simonis (OGSE), Christian Zinke (INFAI), Karel Charvat (LESPRO)
Reviewers: Kyrill Meyer (INFAI), Tomas Mildorf (UWB), Erwin Goor (VITO),
Fabiana Fournier (IBM), Marco Folegani (MEEO)
Approved by: Athanasios Poulakidas (INTRASOFT)
Document History
Version | Date | Contributor(s) | Description
0.1.1-2 | 12/05/2017 | Ephrem Habyarimana | TOC
0.1.3 | 22/05/2017 | Ephrem Habyarimana | Reviewed TOC, First assignments
0.2 | 30/05/2017 | Tomas Mildorf | Section 4.1 FAIR data costs
0.3 | 05/06/2017 | Bernard Stevenot | Section 6 Ethical issues
0.4 | 09/06/2017 | Irene Matzakou, Athanasios Poulakidas | Section 5.4 - 5.5 Privacy and sensitive data management
0.5.1 | 21/06/2017 | Ingo Simonis | Section 3.3 and 3.4 added
0.5.2 | 22/06/2017 | Christian Zinke, Jaroslav Šmejkal | Sections 2.2.4.4 Machine-generated data and 4.2 added
0.6 | 23/06/2017 | Ephrem Habyarimana | Added: Executive summary, sections 1.2 & 2.1, and chapter 7
0.7 | 27/06/2017 | Ephrem Habyarimana | Added section 1.3 and made edits throughout the document.
0.8 | 28/06/2017 | Tomas Mildorf | Update of Section 2.2.4.3, Section 2.5.4, Section 2.5.5, Section 3.1.3 and Section 4.1
0.9 | 30/06/2017 | Ephrem Habyarimana | Included all tables for currently described DataBio’s datasets; overall edit of entire document.
1.0 | 30/06/2017 | Athanasios Poulakidas | Compliance to submission format and minor changes.
Table of Contents
EXECUTIVE SUMMARY
TABLE OF CONTENTS
TABLE OF FIGURES
LIST OF TABLES
DEFINITIONS, ACRONYMS AND ABBREVIATIONS
DATA SUMMARY
  2.1 PURPOSE OF DATA COLLECTION
  2.2 DATA TYPES AND FORMATS
    2.2.1 Structured data
    2.2.2 Semi-structured data
    2.2.3 Unstructured data
    2.2.4 New generation big data
  2.3 HISTORICAL DATA
  2.4 EXPECTED DATA SIZE AND VELOCITY
  2.5 DATA BENEFICIARIES
FAIR DATA
  3.1 DATA FINDABILITY
    3.1.1 Data discoverability and metadata provision
    3.1.2 Data identification, naming mechanisms and search keyword approaches
    3.1.3 Data lineage
  3.2 DATA ACCESSIBILITY
    3.2.1 Open data and closed data
    3.2.2 Data access mechanisms, software and tools
    3.2.3 Big data warehouse architectures and database management systems
  3.3 DATA INTEROPERABILITY
    3.3.1 Interoperability mechanisms
    3.3.2 Inter-discipline interoperability and ontologies
  3.4 PROMOTING DATA REUSE
DATA MANAGEMENT SUPPORT
  4.1 FAIR DATA COSTS
  4.2 BIG DATA MANAGERS
    4.2.1 Project manager
    4.2.2 Business Analysts
    4.2.3 Data Scientists
    4.2.4 Data Engineer / Architect
    4.2.5 Platform architects
    4.2.6 IT/Operation manager
    4.2.7 Consultant
    4.2.8 Business User
    4.2.9 Pilot experts
DATA SECURITY
  5.1 INTRODUCTION
  5.2 DATA RECOVERY
  5.3 PRIVACY AND SENSITIVE DATA MANAGEMENT
    5.3.1 Introduction
    5.3.2 Enterprise Data (commercial sensitive data)
    5.3.3 Personal Data
  5.4 GENERAL PRIVACY CONCERNS
APPENDIX A DATABIO DATASETS
  A.1 SMART POI DATA SET (UWB - D03.01)
  A.2 OPEN TRANSPORT MAP (UWB - D03.02)
  A.3 SENTINELS SCIENTIFIC HUB DATASETS VIA FEDEO GATEWAY (SPACEBEL - D07.01)
  A.4 NASA CMR LANDSAT DATASETS VIA FEDEO GATEWAY (SPACEBEL - D07.02)
  A.5 OPEN LAND USE (LESPRO - D02.01)
  A.6 FOREST RESOURCE DATA (METSAK - D18.01)
  A.7 CUSTOMER AND FOREST ESTATE DATA (METSAK - D18.02)
  A.8 STORM DAMAGE OBSERVATIONS AND POSSIBLE RISK AREAS (METSAK - D18.03)
  A.9 QUALITY CONTROL DATA (METSAK - D18.04)
  A.10 ONTOLOGY FOR (PRECISION) AGRICULTURE (PSNC - D09.01)
  A.11 WUUDIS DATA (MHGS - D20.01)
  A.12 SIGPAC (TRAGSA - D11.05)
  A.13 FIELD DATA - PILOT B2 (TRAGSA - D11.07)
  A.14 IACS (NP - D13.01)
  A.15 SENTINEL DATA
  A.16 TREE SPECIES MAP (FMI - D14.03)
  A.17 STAND AGE MAP (FMI - D14.04)
  A.18 CANOPY HEIGHT MAP (FMI - D14.05)
  A.19 LEAF AREA INDEX (FMI - D14.06)
  A.20 FOREST DAMAGE (FMI - D14.07)
  A.21 HYPERSPECTRAL IMAGE ORTHOMOSAIC (SENOP - D44.02)
  A.22 GAIATRONS IOT (DS13.01)
  A.23 PHENOMICS, METABOLOMICS, GENOMICS AND ENVIRONMENTAL DATASETS (CERTH - DS40.01)
Table of Figures
FIGURE 1: DATABIO’S ANALYTICS AND BIG DATA VALUE APPROACH
FIGURE 2: THE PROCESSING DATA LIFECYCLE
FIGURE 3: THE “DISCIPLINARY DATA INTEGRATION PLATFORM: WHERE DO YOU SIT?” (SOURCE: WYBORN)
FIGURE 4: DATABIO’S DATA MANAGERS
FIGURE 5: DATA LIFECYCLE
FIGURE 6: THE DATA MODEL OF SMART POINTS OF INTEREST
FIGURE 7: THE DATA MODEL OF OPEN TRANSPORT MAP
FIGURE 8: FEDEO CLIENT (C07.05)
List of Tables
TABLE 1: THE DATABIO CONSORTIUM PARTNERS
TABLE 2: SENSOR DATA TOOLS, RESOLUTION AND SPATIAL DENSITY
TABLE 3: GEOSPATIAL DATA TOOLS, FORMAT AND ORIGIN
TABLE 4: GENOMIC, BIOCHEMICAL AND METABOLOMIC DATA TOOLS, DESCRIPTION AND ACQUISITION
Definitions, Acronyms and Abbreviations
Acronym/Abbreviation | Title
BDVA | Big Data Value Association
EC | European Commission
EO | Earth Observation
ETL | Extract Transform Load
DMP | Data Management Plan
GSM | Global System for Mobile Communications
GPS | Global Positioning System
FAIR | Findable Accessible Interoperable and Reusable
HDFS | Hadoop Distributed File System
ICT | Information and Communications Technology
IoT | Internet of Things
JDBC | Java DataBase Connectivity
JSON | JavaScript Object Notation
NoSQL | Not Only SQL
ODBC | Open Database Connectivity
OEM | Object Exchange Model
OGC | Open Geospatial Consortium
REST | Representational State Transfer
RFID | Radio-Frequency IDentification
RPAS | Remotely Piloted Aircraft Systems
SME | Small-Medium Enterprise
SOAP | Simple Object Access Protocol
SQL | Structured Query Language
UAV | Unmanned Air Vehicle
UI | User Interface
WP | Work Package
XML | eXtensible Markup Language
Introduction
1.1 Project Summary
The data intensive target sector on which the DataBio project focuses is the Data-Driven
Bioeconomy. DataBio focuses on utilizing Big Data to contribute to the production of the
best possible raw materials from agriculture, forestry and fishery (aquaculture) for the
bioeconomy industry, as well as their further processing into food, energy and
biomaterials, while taking into account various accountability and sustainability issues.
DataBio will deploy state-of-the-art big data technologies and existing partners’ infrastructure
and solutions, linked together through the DataBio Platform. These will aggregate Big Data
from the three identified sectors (agriculture, forestry and fishery), intelligently process them
and allow the three sectors to selectively utilize numerous platform components, according
to their requirements. The execution will be through continuous cooperation of end user and
technology provider companies, bioeconomy and technology research institutes, and
stakeholders from the big data value PPP programme.
DataBio is driven by the development, use and evaluation of a large number of pilots in the
three identified sectors, where associated partners and additional stakeholders are also
involved. The selected pilot concepts will be transformed to pilot implementations utilizing
co-innovative methods and tools. The pilots select and utilize the most suitable market-ready
or almost market-ready ICT, Big Data and Earth Observation methods, technologies, tools and
services to be integrated into the common DataBio Platform.
Based on the pilot results and the new DataBio Platform, new solutions and new business
opportunities are expected to emerge. DataBio will organize a series of trainings and
hackathons to support its uptake and to enable developers outside the consortium to design
and develop new tools, services and applications based on and for the DataBio Platform.
The DataBio consortium is listed in Table 1. For more information about the project see [REF-01].
Table 1: The DataBio consortium partners
Number | Name | Short name | Country
1 (CO) | INTRASOFT INTERNATIONAL SA | INTRASOFT | Belgium
2 | LESPROJEKT SLUZBY SRO | LESPRO | Czech Republic
3 | ZAPADOCESKA UNIVERZITA V PLZNI | UWB | Czech Republic
4 | FRAUNHOFER GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG E.V. | Fraunhofer | Germany
5 | ATOS SPAIN SA | ATOS | Spain
6 | STIFTELSEN SINTEF | SINTEF ICT | Norway
7 | SPACEBEL SA | SPACEBEL | Belgium
8 | VLAAMSE INSTELLING VOOR TECHNOLOGISCH ONDERZOEK N.V. | VITO | Belgium
9 | INSTYTUT CHEMII BIOORGANICZNEJ POLSKIEJ AKADEMII NAUK | PSNC | Poland
10 | CIAOTECH Srl | CiaoT | Italy
11 | EMPRESA DE TRANSFORMACION AGRARIA SA | TRAGSA | Spain
12 | INSTITUT FUR ANGEWANDTE INFORMATIK (INFAI) EV | INFAI | Germany
47 | ZETOR TRACTORS AS | ZETOR | Czech Republic
48 | COOPERATIVA AGRICOLA CESENATE SOCIETA COOPERATIVA AGRICOLA | CAC | Italy
1.2 Document Scope
This document outlines DataBio’s data management plan (DMP), formally documenting how
data will be handled both during the implementation and upon natural termination of the
project. Many DMP aspects will be considered, including metadata generation, data
preservation, data security and ethics, accounting for the FAIR (Findable, Accessible,
Interoperable, Re-usable) data principles. DataBio, the Data-Driven Bioeconomy project, is a
big-data-intensive innovation action involving a public-private partnership to promote
productivity in EU companies in three of the major bioeconomy sectors, namely agriculture,
forestry and fishery. Experience from the US shows that the bioeconomy can get a significant boost
from Big Data. In Europe, this sector has until now attracted few large ICT vendors. A central
goal of DataBio is to increase the participation of the European ICT industry in the development of
Big Data systems for boosting the lagging bioeconomy productivity. As a good case in point,
European agriculture, forestry and fishery can benefit greatly from the European Copernicus
space programme, which has now launched its third Sentinel satellite, as well as from telemetry
IoT, UAVs, etc.
Farm and forestry machinery and fishing vessels in use today collect large quantities of data
in unprecedented patterns. Remote and proximal sensors and imagery, and many other
technologies, are all working together to give details about crop and soil properties, the marine
environment, weeds and pests, sunlight and shade, and many other variables relevant to primary
production. Deploying big data analytics on these data can help farmers, foresters
and fishers to adjust and improve the productivity of their business operations. On the other
hand, large data sets such as those coming from the Copernicus earth monitoring
infrastructure are increasingly available at different levels of granularity, but they are
heterogeneous, at times also unstructured, hard to analyze and distributed across various
sectors and different providers. It is here that the data management plan comes in. It is
anticipated that DataBio will provide a solution which assumes that datasets will be
distributed among different infrastructures and that their accessibility could be complex,
and which therefore needs mechanisms that facilitate data retrieval, processing, manipulation and
visualization as seamlessly as possible. The infrastructure will open new possibilities for the ICT
sector, including SMEs, to develop a new Bioeconomy 4.0, and will also open new possibilities
for companies from the Earth Observation sector.
This DMP will be updated over the course of the DataBio project whenever significant changes
arise. The updates of this document will provide increasing depth on the DataBio DMP
strategies, with particular attention to the findability, accessibility, interoperability
and reusability of the Big Data the project produces. At least two updates will be prepared,
in Month 18 and Month 36 of the project.
1.3 Document Structure
This document comprises the following chapters:
Chapter 1 presents an introduction to the project and the document.
Chapter 2 presents the data summary including the purpose of data collection, data size, type
and format, historical data reuse and data beneficiaries.
Chapter 3 outlines DataBio’s FAIR data strategies.
Chapter 4 describes data management support.
Chapter 5 describes data security.
Chapter 6 describes ethical issues.
Chapter 7 presents the concluding remarks.
Appendix A presents the managed data sets.
Data Summary
2.1 Purpose of data collection
During the lifecycle of the DataBio project, big data will be collected, that is, very large data
sets (multi-terabyte or larger) consisting of a wide range of data types (relational, text,
multi-structured data, etc.) from numerous sources, including relatively new generation big data
(machines, sensors, genomics, etc.). The ultimate purpose of data collection is to use the data
as a source of information in the implementation of a variety of big data analytics algorithms,
services and applications that DataBio will deploy to create value, new business facts and insights,
with a particular focus on the bioeconomy industry. The big datasets are part of the building
blocks of DataBio’s big data technology platform (Figure 1), which was designed to help
European companies increase productivity. Big Data experts provide common analytic
technology support for the main common and typical bioeconomy applications/analytics that
are now emerging through the pilots in the project.
Data from the past will be managed and analyzed, drawing on many different kinds of data
sources, through descriptive analytics and classical query/reporting. This requires variety
management: the handling and analysis of all of the data from the past, including performance
data, transactional data, attitudinal data, descriptive data, behavioural data, location-related
data and interactional data, from many different sources.
Big data from the present will be harnessed for monitoring and real-time analytics in the pilot
services. This requires velocity processing: the handling of real-time data from the present,
for example for triggering alarms, actuators, etc.
Harnessing big data for the future includes forecasting, prediction and recommendation
analytics in the pilot services. This requires volume processing: the processing of large amounts
of data combining knowledge from the past and present, and from models, to provide insight
for the future.
Figure 1: DataBio’s analytics and big data value approach
Specifically:
• Forestry: Big Data methods are expected to make it possible both to increase the value of the forests and to decrease costs, within the sustainability limits set by natural growth and ecological aspects. The key technology is to gather more and more accurate information about the trees from a host of sensors, including the new generation of satellites, UAV images, laser scanning, mobile devices through crowdsourcing and machines operating in the forests.
• Agriculture: Big Data in agriculture is currently a hot topic. The DataBio intention is to build a European vision of Big Data for agriculture. This vision is to offer solutions which will increase the role of Big Data in agri-food chains in Europe: a perspective which will prepare recommendations for future big data development in Europe.
• Fisheries: the ambition is to herald and promote the use of Big Data analytical tools within fisheries applications by initiating several pilots which will demonstrate benefits of using Big Data in an analytical way for the fisheries, such as improved analysis of operational data, tools for planning and operational choices, crowdsourcing methods for fish stock estimation.
• The use of Big Data analytics will bring about innovation. It will generate significant economic value, extend the relevant market sectors, and herald novel business/organizational models. The cross-cutting character of the geo-spatial Big Data solutions allows the straightforward extension of the scope of applications beyond the bio-economy sectors. Such extensions of the market for the Big Data technologies are foreseen in economic sectors such as: Urban planning, Water quality, Public safety (incl. technological and natural hazards), Protection of critical infrastructures, Waste management. On the other hand, the Big Data technologies revolutionize the business approach in the geospatial market and foster the emergence of innovative business/organizational models; indeed, to achieve the cost effectiveness of the services to the customers, it is necessary to organize the offer to the market on a territorial/local basis, as the users share the same geospatial sources of data and are best served by local players (service providers). This can be illustrated by a network of European service providers, developing proximity relationships with their customers and sharing their knowledge through the network.
2.2 Data types and formats
The DataBio-specific data types, formats and sources are listed in detail in Appendix A; the key
features of the data used in the project are described below.
2.2.1 Structured data
Structured data refers to any data that resides in a fixed field within a record or file. This
includes data contained in relational databases, spreadsheets, and data in forms of events
such as sensor data. Structured data first depends on creating a data model – a model of the
types of business data that will be recorded and how they will be stored, processed and
accessed. This includes defining what fields of data will be stored and how that data will be
stored: data type (numeric, currency, alphabetic, name, date, address) and any restrictions
on the data input (number of characters; restricted to certain terms such as Mr., Ms. or Dr.;
M or F).
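As an illustration of such a data model, the following minimal Python sketch defines a record with fixed fields, data types and input restrictions. The record name, the field names and the 40-character limit are hypothetical and chosen only for this example; they are not taken from any DataBio data set.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical restrictions, used only for illustration.
ALLOWED_TITLES = {"Mr.", "Ms.", "Dr."}
ALLOWED_SEX_CODES = {"M", "F"}
MAX_NAME_LENGTH = 40


@dataclass(frozen=True)
class ContactRecord:
    """A structured record: every field has a fixed name and type."""
    title: str
    name: str
    birth_date: date
    sex: str

    def __post_init__(self):
        # Restriction to certain terms.
        if self.title not in ALLOWED_TITLES:
            raise ValueError(f"title must be one of {sorted(ALLOWED_TITLES)}")
        # Restriction on the number of characters.
        if not self.name or len(self.name) > MAX_NAME_LENGTH:
            raise ValueError("name must have 1 to 40 characters")
        # Data type check for the date field.
        if not isinstance(self.birth_date, date):
            raise TypeError("birth_date must be a date")
        # Restriction to a fixed code list.
        if self.sex not in ALLOWED_SEX_CODES:
            raise ValueError("sex must be 'M' or 'F'")


record = ContactRecord("Dr.", "Ada Example", date(1980, 1, 2), "F")
```

A record that violates one of the restrictions is rejected when it is created, which is the property that distinguishes structured data from the looser forms described in the following sections.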
2.2.2 Semi-structured data
Semi-structured data is a cross between structured and unstructured data. It is a type of
structured data, but lacks the strict data model structure. With semi-structured data, tags or
other types of markers are used to identify certain elements within the data, but the data
doesn't have a rigid structure. For example, word processing software now can include
metadata showing the author's name and the date created, with the bulk of the document
just being unstructured text. Emails have the sender, recipient, date, time and other fixed
fields added to the unstructured data of the email message content and any attachments.
Photos or other graphics can be tagged with metadata such as the creator, date, location and
keywords, making it possible to organize and locate graphics. XML and other markup
languages are often used to manage semi-structured data. Semi-structured data is therefore
a form of structured data that does not conform with the formal structure of data models
associated with relational databases or other forms of data tables, but nonetheless contains
tags or other markers to separate semantic elements and enforce hierarchies of records and
fields within the data. Therefore, it is also known as self-describing structure. In semi-
structured data, the entities belonging to the same class may have different attributes even
though they are grouped together, and the attributes' order is not important. Semi-structured
data have become increasingly common since the advent of the Internet, where full-text documents
and databases are not the only forms of data anymore, and different applications need a
medium for exchanging information. In object-oriented databases, one often finds semi-
structured data.
XML and other markup languages, email, and EDI are all forms of semi-structured data. OEM
(Object Exchange Model) was created prior to XML as a means of self-describing a data
structure. XML has been popularized by web services that are developed utilizing SOAP
principles. Some types of data described here as "semi-structured", especially XML, suffer
from the impression that they are incapable of structural rigor at the same functional level as
Relational Tables and Rows. Indeed, the view of XML as inherently semi-structured
(previously, it was referred to as "unstructured") has handicapped its use for a widening range
of data-centric applications. Even documents, normally thought of as the epitome of semi-
structure, can be designed with virtually the same rigor as database schema, enforced by the
XML schema and processed by both commercial and custom software programs without
reducing their usability by human readers.
In view of this fact, XML might be referred to as having "flexible structure" capable of human-
centric flow and hierarchy as well as highly rigorous element structure and data typing. The
concept of XML as "human-readable", however, can only be taken so far. Some
implementations/dialects of XML, such as the XML representation of the contents of a
Microsoft Word document, as implemented in Office 2007 and later versions, utilize dozens
or even hundreds of different kinds of tags that reflect a particular problem domain - in
Word's case, formatting at the character and paragraph and document level, definitions of
styles, inclusion of citations, etc. - which are nested within each other in complex ways.
Understanding even a portion of such an XML document by reading it, let alone catching
errors in its structure, is impossible without a very deep prior understanding of the specific
XML implementation, along with assistance by software that understands the XML schema
that has been employed. Such text is not "human-understandable" any more than a book
written in Swahili (which uses the Latin alphabet) would be to an American or Western
European who does not know a word of that language: the tags are symbols that are
meaningless to a person unfamiliar with the domain.
JSON or JavaScript Object Notation, is an open standard format that uses human-readable
text to transmit data objects consisting of attribute–value pairs. It is used primarily to transmit
data between a server and web application, as an alternative to XML. JSON has been
popularized by web services developed utilizing REST principles. There is a new breed of
databases such as MongoDB and Couchbase that store data natively in JSON format,
leveraging the pros of semi-structured data architecture.
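The self-describing nature of semi-structured data can be sketched with a small JSON example in Python. The two records below belong to the same class of entity but carry different attributes; the identifiers and attribute names are invented for illustration only.

```python
import json

# Two hypothetical records of the same class with different attributes.
RAW = """
[
  {"id": "parcel-1", "crop": "olive", "area_ha": 2.5},
  {"id": "parcel-2", "crop": "tomato", "greenhouse": true,
   "sensors": ["temperature", "humidity"]}
]
"""


def load_records(text):
    """Parse a JSON array of objects into a list of dictionaries."""
    return json.loads(text)


def common_and_optional_keys(records):
    """Split attribute names into those shared by all records and the rest."""
    key_sets = [set(record) for record in records]
    common = set.intersection(*key_sets)
    optional = set.union(*key_sets) - common
    return common, optional


records = load_records(RAW)
common, optional = common_and_optional_keys(records)
```

Here `common` holds the attributes present in every record and `optional` those that appear only in some, which a fixed relational schema would have to model with nullable columns or additional tables.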
2.2.3 Unstructured data
Unstructured data (or unstructured information) refers to information that either does not
have a pre-defined data model or is not organized in a pre-defined manner. This results in
irregularities and ambiguities that make it difficult to understand using traditional programs
as compared to data stored in “field” form in databases or annotated (semantically tagged)
in documents. Unstructured data can't be so readily classified and fit into a neat box: photos
and graphic images, videos, streaming instrument data, webpages, PDF files, PowerPoint
presentations, emails, blog entries, wikis and word processing documents.
In 1998, Merrill Lynch cited a rule of thumb that somewhere around 80-90% of all potentially
usable business information may originate in unstructured form. This rule of thumb is not
based on primary or any quantitative research, but nonetheless is accepted by some. IDC and
EMC project that data will grow to 40 zettabytes by 2020, resulting in a 50-fold growth from
the beginning of 2010. Computer World states that unstructured information might account
for more than 70%–80% of all data in organizations.
Software that creates machine-processable structure can utilize the linguistic, auditory, and
visual structure that exists in all forms of human communication. Algorithms can infer this
inherent structure from text, for instance, by examining word morphology, sentence syntax,
and other small- and large-scale patterns. Unstructured information can then be enriched and
tagged to address ambiguities and relevancy-based techniques then used to facilitate search
and discovery. Examples of "unstructured data" may include books, journals, documents,
metadata, health records, audio, video, analog data, images, files, and unstructured text such
as the body of an e-mail message, Web page, or word-processor document. While the main
content being conveyed does not have a defined structure, it generally comes packaged in
objects (e.g. in files or documents, …) that themselves have structure and are thus a mix of
structured and unstructured data, but collectively this is still referred to as "unstructured
data".
2.2.4 New generation big data
The new generation big data is in particular focusing on semi-structured and unstructured
data, often in combination with structured data.
In the BDVA reference model for big data technologies, a distinction is made between six
different big data types.
2.2.4.1 Sensor data
Within the DataBio pilots, several key parameters will be monitored through sensor
platforms, and sensor data will be collected along the way to support the project activities.
Three types of sensor data have already been identified, namely: a) IoT data from in-situ
sensors and telemetric stations, b) imagery data from unmanned aerial sensing platforms
(drones), c) imagery from hand-held or mounted optical sensors.
2.2.4.1.1 Internet of Things data
The IoT data are a major subgroup of sensor data involved in multiple pilot activities in the
DataBio project. IoT data are sent via the TCP or UDP protocol in various formats (e.g. txt with
time series data, JSON strings) and can be further divided into the following categories:
• Agro-climatic/Field telemetry stations which contribute with raw data (numerical values) related to several parameters. As different pilots focus on different application scenarios, the following table summarizes several IoT-based monitoring approaches to be followed.
Table 2: Sensor data tools, resolution and spatial density
Pilot | Mission, instrument | Data resolution and spatial density
A1.1, B1.2, C1.1, C2.2 | NP’s GAIAtrons, which are telemetry IoT stations with modular/expandable design, will be used to monitor ambient temperature, humidity, solar radiation, leaf wetness, rainfall volume, wind speed and direction, barometric pressure (GAIAtron atmo), soil temperature and humidity (multi-depth) (GAIAtron soil) | Time step for data collection: every 10 minutes. One station per microclimate zone (300 ha - 1100 ha for atmo, 300 ha - 3300 ha for soil)
• Control data in the parcels/fields measuring sprinklers, drippers, metering devices, valves, alarm settings, heating, pumping state, pressure switches, etc.
• Contact sensing data that determine problems with great precision, speeding up the use of techniques which help to solve problems
• Vessel and buoy-based stations which contribute with raw data (numerical values), typically hydro acoustic and machinery data
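The two message formats mentioned above (JSON strings and text lines with time series data) could be normalised into one record type as in the following Python sketch. The field names, the semicolon-separated text layout and the station identifier are assumptions made for this example; the actual message layouts of the stations are not specified in this document.

```python
import json
from datetime import datetime
from typing import NamedTuple


class Reading(NamedTuple):
    """One measurement, independent of the format it arrived in."""
    station: str
    timestamp: datetime
    parameter: str
    value: float


def parse_json_message(payload: str) -> Reading:
    """Parse a JSON string with hypothetical keys station/timestamp/parameter/value."""
    obj = json.loads(payload)
    return Reading(
        station=obj["station"],
        timestamp=datetime.fromisoformat(obj["timestamp"]),
        parameter=obj["parameter"],
        value=float(obj["value"]),
    )


def parse_text_line(line: str, station: str) -> Reading:
    """Parse a hypothetical 'timestamp;parameter;value' time series line."""
    timestamp, parameter, value = line.strip().split(";")
    return Reading(station, datetime.fromisoformat(timestamp), parameter, float(value))


from_json = parse_json_message(
    '{"station": "st-01", "timestamp": "2017-06-30T10:00:00", '
    '"parameter": "air_temperature", "value": 21.4}'
)
from_text = parse_text_line("2017-06-30T10:10:00;air_temperature;21.9\n", "st-01")
```

Mapping both formats onto the same `Reading` type keeps the downstream analytics independent of how a particular station transmits its data.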
2.2.4.1.2 Drone data
A specific subset of sensor data generated and processed within the DataBio project consists of images
produced by cameras on-board drones or RPAS (Remotely Piloted Aircraft Systems). In
particular, some DataBio pilots will use optical (RGB), thermal or multispectral images and 3D
point-clouds acquired from RPAS. The information generated by drone-airborne cameras is
usually Image Data (JPEG or JPEG2000). A general description of the workflow is provided
below.
Data acquired by the RGB sensor
The RGB sensor acquires individual pictures in .JPG format, together with their ‘geotag’ files,
which are downloaded from the RPAS and processed into:
• .LAS files: 3D point clouds (x, y, z), which are then processed to produce Digital Models (Terrain- DTM, Surface-DSM, Elevation-DEM, Vegetation-DVM)
• .TIF files: which are then processed into an orthorectified mosaic. In order to obtain smaller files, mosaics are usually exported to compressed .ECW format.
Data acquired by the thermal sensor
The Thermal sensor acquires a video file which is downloaded from the RPAS and:
• split into frames in .TIF format (pixels contain Digital Numbers: 0-255)
• 1 of every 10 frames is selected (with an overlap of about 80%, so as not to process an excessive amount of information)
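The frame selection step can be sketched in Python as plain subsampling of the ordered frame list. The file names are hypothetical, and the sketch covers only the 1-in-10 selection; splitting the video into frames and checking the resulting overlap are outside its scope.

```python
def select_frames(frames, step=10):
    """Keep one frame out of every `step` frames, starting with the first."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return list(frames[::step])


# Hypothetical names for 35 frames extracted from a thermal video.
all_frames = [f"thermal_{index:04d}.tif" for index in range(35)]
selected = select_frames(all_frames)
```

With 35 input frames, the frames at positions 0, 10, 20 and 30 are kept.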
Data acquired by the multispectral sensor
The multispectral sensor acquires individual pictures from the 6 spectral channels in .RAW
format, which are downloaded from the RPAS and processed into:
• .TIF files (16 bits), which are then processed to produce a 6-band .TIF mosaic (pixels contain Digital Numbers: 0-255)
2.2.4.1.3 Data from hand-held or mounted optical sensors
Images from hand-held or mounted cameras will be collected using a truck-mounted or hand-held
full-range, high-resolution UV-VIS-NIR-SWIR spectroradiometer.
2.2.4.2 Machine-generated data
Machine-generated data in the DataBio project are data produced by ships, boats and
machinery used in agriculture and in forestry (such as tractors). These data will serve for
further analysis and optimisation of processes in the bio-economy sector.
For illustration purposes, examples of data collected by tractors in agriculture are described.
Tractors are equipped with the following units:
• Control units for data control, data collection and analyses including dashboards, transmission control unit, hydrostatic or hydrodynamic system control unit, engine control unit.
• Global Positioning System (GPS) units or Global System for Mobile Communications (GSM) units for tractor tracking.
• Unit for displaying field/soil characteristics including area, quality, boundaries and yields.
These units generate the following data:
• Identification of tractor + identification of driver by code or by RFID module.
• Identification of the current operation status.
• Time identification by the date and the current time.
• Tractor hours - monitoring working hours in time and place.
• Information from tachometer [Σ km] and [Σ working hrs and min].
• Identification of the current maintenance status.
• Tractor diagnostic: failure modes or failure codes
• Information about the date of the last calibration of each tractor system + information about settings, SW version, last update, etc.
• The amount of fuel in the fuel tank [L].
• Online information about sudden loss of fuel in the fuel tank.
• Fuel consumption per trip / per time period / per kilometer (monitoring of fuel consumption in various dependencies e.g. motor load).
• Total fuel consumption per day [L/day].
• Engine speed [rev/min].
• Possibility to set the engine speed range online [rev/min from - to], with signalling when the limits are exceeded.
• Current position of accelerator pedal [% from scale 0-100 %].
• Charging level of the main battery [V].
• Current temperature of the cooling water [°C or °F].
• Current temperature of the motor oil [°C or °F].
• Current temperature of after treatment [°C or °F].
• Current temperature of the transmission oil [°C or °F].
• Diagnosis gear shift [grades backward and forward].
• Current engine load [% from scale 0-100 %]
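Two of the items listed above, signalling when engine speed limits are exceeded and reporting a sudden loss of fuel, can be sketched as simple checks on consecutive telemetry samples. The record layout, the speed range and the fuel-loss threshold in this Python example are hypothetical and do not describe any particular tractor control unit.

```python
from dataclasses import dataclass


@dataclass
class TractorSample:
    """A reduced, hypothetical telemetry sample."""
    tractor_id: str
    engine_speed_rpm: float
    fuel_level_l: float


def engine_speed_exceeded(sample, min_rpm, max_rpm):
    """Return True when the engine speed lies outside the configured range."""
    return not (min_rpm <= sample.engine_speed_rpm <= max_rpm)


def sudden_fuel_loss(previous, current, threshold_l):
    """Return True when the fuel level dropped by more than threshold_l litres."""
    return (previous.fuel_level_l - current.fuel_level_l) > threshold_l


earlier = TractorSample("T-001", engine_speed_rpm=1800.0, fuel_level_l=120.0)
later = TractorSample("T-001", engine_speed_rpm=2450.0, fuel_level_l=95.0)

speed_alarm = engine_speed_exceeded(later, min_rpm=800.0, max_rpm=2200.0)
fuel_alarm = sudden_fuel_loss(earlier, later, threshold_l=20.0)
```

In practice the threshold would have to account for the time between samples and for normal consumption under load, which this sketch ignores.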
2.2.4.3 Geospatial data
The DataBio pilots will collect earth observation (EO) data from a number of sources which
will be refined during the project. Currently, it is confirmed that the following EO data will be
collected and used as input data:
Table 3: Geospatial data tools, format and origin
Mission, instrument | Format | Origin
Sentinel-1, C-SAR | SLC, GRD | Copernicus Open Access Hub (https://scihub.copernicus.eu/)
Sentinel-2, MSI | L1C | Copernicus Open Access Hub (https://scihub.copernicus.eu/)
Information about the expected sizes will be added when it becomes available.
In addition to EO data, DataBio will utilise other geospatial data from EU, national, local,
private and open repositories including Land Parcel Identification System data, cadastral data,
Open Land Use map (http://sdi4apps.eu/open_land_use/), Urban Atlas and Corine Land
Cover, Proba-V data (www.vito-eodata.be).
The meteorological data will be collected mainly from EO-based systems, drawing on
European data sources such as COPERNICUS products and EUMETSAT H-SAF products, but
other EO data sources such as VIIRS, MODIS and ASTER will also be considered. As
complementary data sources, the weather forecast model output (ECMWF) and the regional
weather service output, usually based on ground weather stations, can be considered
according to the specific target areas of the pilots.
2.2.4.4 Genomics data
Within the DataBio Pilot 1.1.2 different data will be collected and produced. Three categories
of data have already been identified for the pilot, namely: a) in-situ sensor data (including
image capture) and farm data, b) genomic data from plant breeding efforts in greenhouses
produced using Next Generation Sequencers (NGS), c) biochemical data of tomato fruits
produced by chromatographs (LC/MS/MS, GC/MS, HPLC).
In-situ sensors/Environmental outdoor: Wind speed and direction, Evaporation, Rain, Light
intensity, UVA, UVB.
In-situ sensors/Environmental indoor: Air temperature, Air relative humidity, Crop leaf
temperature (remotely and in contact), Soil/substrate water content, crop type, etc.
Farm Data:
• In-Situ measurements: Soil nutritional status.
• Farm logs (work calendar, technical practices at farm level, irrigation information).
The data platform and its architecture are among the most important parts of DataBio. In order
to ensure a valid platform design, systems integration and platform development, highly
experienced platform architects are needed. This role will be taken by Intrasoft, ATOS,
Fraunhofer IGD, SINTEF and VTT.
4.2.6 IT/Operation manager
Some of the realized pilots will be very processing intensive, which requires a very good
infrastructure. In order to provide and manage this infrastructure, dedicated operation managers
are needed. This function will be fulfilled by PSNC and Softeam.
4.2.7 Consultant
Big Data consultants are responsible for support, guidance and help in all design and
implementation phases. This requires extensive knowledge and practice in designing big data
solutions as well as in developing data pipelines that leverage structured and unstructured data
from multiple sources. The DataBio consortium has several partners which fulfil this role,
including SpaceBel, CIAOTECH, InfAI, FMI, Federunacoma, University of St. Gallen, CITOLIVA
and OGC.
4.2.8 Business User
Business users are direct (business) beneficiaries of the developed DataBio solutions. Further,
they play an important part in specifying detailed domain requirements and implementing the solutions.
These partners are TRAGSA, Neuropublic, Finnish Forest Centre, MHG, LIMETRI, Kings Bay,
Eros, Ervik & Saevik, Liegruppen Fiskeri, Norges Sildesalgslag SA, GAIA, MEEO, Echebastar,
Novamont, Rikola, UPV/EHU, ZETOR and CAC.
4.2.9 Pilot experts
In order to specify and prioritize requirements as well as to manage the different pilots, find
synergies and connect the different experts to the pilots, domain experts are needed.
These are Lesprojekt, FMI, VTT, SINTEF, Finnish Forest Centre and AZTI.
Figure 4: DataBio’s data managers
Data security
5.1 Introduction
In order to be able to address data security properly, one has to identify the various phases
of the data lifecycle, from creation to use, sharing, archiving and deletion. Handling
project data securely throughout their lifecycle lays the foundations of a sensitive data
protection strategy. In this context, the project consortium will determine specific security
controls to apply in each phase, evaluating during the course of the project their level of
compliance. Those data lifecycle phases are featured in the image below and are summarized
as follows:
Figure 5: Data lifecycle
1. Phase 1: Create
This first phase includes the creation of structured or unstructured (raw) data. For the needs
of the DataBio project, those sensitive data are classified in the following categories: a)
Enterprise Data (commercially sensitive data), b) Personal Data (personal sensitive data) and
c) other data that do not fall into either of the previous categories. For enterprise data
in particular, security classification already occurs in the creation phase, based on an
enterprise data security policy.
2. Phase 2: Store
Once data is created and included in a file, then it is stored somewhere. What needs to be
ensured is that stored data is protected and the necessary data security controls have been
implemented, so as to secure the data and minimize the risk of information leaks, ensuring efficient data
privacy. More information about this phase is found in sections 5.2 about data recovery and
5.3 about secure storage.
3. Phase 3: Use
During this phase when data is viewed, processed, modified and saved, security controls are
directly applied to data, with a focus on monitoring user activity and applying security controls
to ensure data leak prevention.
4. Phase 4: Share
Data is constantly being shared between employees, customers and partners, necessitating a
strategy that continuously monitors data stores and users. Data move among a variety of
public and private storage locations, applications and operating environments, and are
accessed by various data owners from different devices and platforms. That can happen at
any stage of the data security lifecycle, which is why it’s important to apply the right security
controls at the right time.
5. Phase 5: Archive
Data that leave active use but still need to be available should be securely
archived in appropriate storage, normally of low cost and performance, sometimes offline.
This may also cover version control, where older versions of original (raw) data files and data
source processing programs are maintained in archive storage, case by case. These backups are
then stored and can be brought back online within a reasonable timeframe, so
that the loss or corruption of data has no detrimental effect.
6. Phase 6: Destroy
Data that are no longer needed should be deleted securely so as to avoid any
data leakage.
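The six phases and the three data categories introduced above can be represented as enumerations, so that each data asset carries its classification from the creation phase onward. The Python sketch below is only an illustration: the class names are invented, and the single rule it enforces (destroyed data cannot re-enter the lifecycle) is an assumption, since this plan does not prescribe a formal transition model.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Phase(IntEnum):
    CREATE = 1
    STORE = 2
    USE = 3
    SHARE = 4
    ARCHIVE = 5
    DESTROY = 6


class Sensitivity(Enum):
    ENTERPRISE = "commercially sensitive data"
    PERSONAL = "personal sensitive data"
    OTHER = "other data"


@dataclass
class DataAsset:
    """A data set together with its classification and current phase."""
    name: str
    sensitivity: Sensitivity
    phase: Phase = Phase.CREATE

    def move_to(self, new_phase: Phase) -> None:
        # Assumed rule: DESTROY is terminal.
        if self.phase is Phase.DESTROY:
            raise ValueError("destroyed data cannot change phase")
        self.phase = new_phase


asset = DataAsset("hypothetical-sensor-log", Sensitivity.ENTERPRISE)
asset.move_to(Phase.STORE)
asset.move_to(Phase.SHARE)
```

Security controls for each phase could then be looked up by the pair of `sensitivity` and `phase`.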
5.2 Data recovery
A data recovery strategy (also called a disaster recovery plan) is not only a plan but also an ongoing
process of minimizing the risk of data loss that can result from various random
events.
Since DataBio is a project dealing with Big Data scenarios, the context of data recovery is
focused mostly on management procedures of data centers that are able to store and process
significant amounts of data. The disasters that can occur can be classified into two categories:
• Natural disasters (floods, hurricanes, tornadoes or earthquakes): they cannot be avoided, but their effects on IT infrastructure can be minimized (e.g. through distributed backups)
• Man-made disasters (infrastructure failure, software bugs, hacker attacks): besides minimizing their effects, it is possible to prevent them in different ways (regular software updates, good, active protection mechanisms, regular testing procedures)
The most important elements of a data recovery plan are:
• Backup management: well-designed automatic procedures for regularly storing copies of datasets on separate machines or even in geographically distributed locations
• Replication of data to an off-site location, which overcomes the need to restore the data (only the systems then need to be restored or synchronized), often making use of storage area network (SAN) technology
• Private Cloud solutions that replicate the management data (VMs, Templates and disks) into the storage domains that are part of the private cloud setup.
• Hybrid Cloud solutions that replicate both on-site and to off-site data centers. These solutions provide the ability to instantly fail-over to local on-site hardware, but in the event of a physical disaster, servers can be brought up in the cloud data centers as well.
• The use of high availability systems which keep both the data and system replicated off-site, enabling continuous access to systems and data, even after a disaster (often associated with cloud storage)
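The backup management element can be illustrated with a minimal Python sketch that copies a dataset to several target locations and verifies each copy against a SHA-256 checksum of the source. The directory names are hypothetical, and local directories stand in for separate machines or sites; a production setup would rely on the infrastructure providers' own tooling rather than on a script like this.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Iterable, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, target_dirs: Iterable[Path]) -> List[Path]:
    """Copy `source` into every target directory and verify each copy."""
    expected = sha256_of(source)
    copies = []
    for target_dir in target_dirs:
        target_dir.mkdir(parents=True, exist_ok=True)
        copy = target_dir / source.name
        shutil.copy2(source, copy)
        # A copy whose checksum differs from the source is treated as failed.
        if sha256_of(copy) != expected:
            raise IOError(f"checksum mismatch for {copy}")
        copies.append(copy)
    return copies
```

A call such as `back_up(Path("dataset.csv"), [Path("/mnt/site_a"), Path("/mnt/site_b")])` (hypothetical paths) would return the two verified copies or raise an error if a copy is corrupted.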
Several partners in the project are infrastructure providers. They ensure high quality in terms
of reliability and scalability.
5.3 Privacy and sensitive data management
5.3.1 Introduction
With regard to privacy and sensitive data management, it is confirmed that these activities
will be rigorously implemented in compliance with the privacy and data collection rules and
regulations as they are applied nationally and in the EU, as well as with the H2020 rules. The
next sections include more specific information regarding those activities, rules and measures
based on the classification of data made in the introduction of this section (5.1).
5.3.2 Enterprise Data (commercially sensitive data)
This category of data includes the (raw) data coming from specific sensor nodes and other
similar data management systems and sources from the various project partners in each pilot
case. They also include data about technologies and other assets protected by IPR and are
considered to be highly-commercially sensitive, belonging to the partner that provides them
for the various research and pilot activities within DataBio project. Therefore, access to those
data will be controlled and exchanges normally take place between specific end users and
partners involved in their use and management within each pilot case for DataBio related
activities.
In line with the project Grant Agreement (GA) and Consortium Agreement (CA), each partner who provides or otherwise makes available to
any other project partner shared information represents that: (i) it has the authority to
disclose this shared information, (ii) where legally required and relevant, it has obtained
appropriate informed consents from all individuals involved, or from any other applicable
institution, all in compliance with applicable regulations; and (iii) there is no restriction in
place that would prevent any such other project partner from using this shared information
for the purpose of DataBio project and the exploitation thereof.
The abovementioned rules also apply to any new data stemming from the project
activities. These data will also be anonymised and protected, and our partners will be able to
make data available to external industry stakeholders for their own purposes only on the basis
of the above rules. Related publications will be released and disseminated through
the project dissemination and exploitation channels to make these parties aware of the
project and of the appropriate means of access to any data (see Appendix A for DataBio specific data).
On a technical level, data protected by IPRs are often accessed as a service, with specific
access rights given under specific terms. Alternatively, they are shared encrypted or similarly
protected with the keys provided under specific terms.
5.3.3 Personal Data
According to the Grant Agreement, it has been agreed by all partners that any Background,
Results, Confidential Information and/or any and all data and/or information that is provided,
disclosed or otherwise made available between the Parties shall not include personal data.
Accordingly, each Party agreed that it will take all necessary steps to ensure that all Personal
Data is removed from the Shared Information, made illegible, or otherwise made inaccessible
(i.e. de-identified) to the other Parties prior to providing the Shared Information.
Therefore, no personal sensitive data are included in data exchanged between partners
within DataBio. Where data created within project activities, e.g. some pilot activities, initially
involve personal and/or sensitive data from human participants, such as location and ID, DataBio
will apply specific security measures for their informed consent and data protection in line
with the legislation and regulations in force in the countries where the research will be carried
out, with the rules most relevant to the project being the following:
• The Charter of Fundamental Rights of the EU, specifically the article concerning the protection of personal data
• Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data.
Regarding the procedure that is required in order to be able to participate in any DataBio
activities, we foresee that all potential participants will have to read and sign an informed
consent form before starting the participation. This form aims to fully inform the participants
about the study procedure and goals, so that they have the basic information
needed to decide whether or not to participate in the project activity. It
shall include a summary and schedule of the study, the objectives and descriptions of the
DataBio system and its components. All participants have the right to receive a copy of the
documents of this form. Participants will receive a generic user ID to identify them in the
system and to anonymise their identities. No full names will be stored anywhere
electronically. All gathered personal data shall be password protected and encrypted. Users’
personal data will be safeguarded from other people not involved in the project. No adults
unable to give informed consent will be involved.
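The assignment of generic user IDs described above can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory registry standing in for the project database; the function names `new_user_id` and `register_participant`, the `P-` prefix, the ID length and the list of name fields are illustrative choices and not part of the DataBio system. Password protection and encryption of the stored data are outside the scope of this sketch.

```python
import secrets


def new_user_id(existing_ids, prefix="P-"):
    """Return a random generic user ID that is not already in use.

    The ID is generated independently of any personal attribute, so it
    cannot be derived from a name. Prefix and length are assumptions.
    """
    while True:
        candidate = prefix + secrets.token_hex(4)  # 8 hexadecimal characters
        if candidate not in existing_ids:
            return candidate


def register_participant(registry, attributes):
    """Store a participant's study attributes under a generic user ID.

    `registry` is a hypothetical dict standing in for the project
    database. Name fields are dropped before storage, so that no full
    name is kept electronically.
    """
    name_fields = {"name", "first_name", "last_name", "full_name"}
    cleaned = {k: v for k, v in attributes.items() if k not in name_fields}
    user_id = new_user_id(registry.keys())
    registry[user_id] = cleaned
    return user_id


registry = {}
uid = register_participant(registry, {"full_name": "Jane Doe", "age": 41})
```

In this sketch the link between a person and the generic ID exists only at registration time; whether such a link should be kept elsewhere (for example on the signed paper consent form) is a decision for each pilot.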
It should be stated that the protection of the privacy of participants is a responsibility of all
persons involved in research with human participants. Privacy means that the participant can
control the access to personal information and is able to decide who has access to the
collected data in the future. Due to the principle of autonomy, the participants will be asked
for their agreement before private and personal information is collected. It will be ensured
that all persons involved in the project activities understand and respect the requirement for
confidentiality. The participants will be informed about the confidentiality policy that is used
in this research project.
5.4 General privacy concerns
Other privacy concerns will be addressed as follows:
• External experts: Any external experts that will be involved in the project shall be required to sign an appropriate non-disclosure agreement prior to participating in any project related meeting, decision or activity.
• Publications: Hints to, or identifiable personal information of, any participant should be omitted from (scientific) publications. The identity of participants in research must not be revealed, deliberately or inadvertently, without the express permission of the participants.
• Dissemination: Dissemination of data between partners. This relates to access to data, data formats, and methods of archiving (electronic and paper), including data handling, data analyses, and research communications. Access to private information will be granted only to DataBio partners for purposes of evaluation of the system and only in an anonymised form, i.e. any personally identifiable information such as name, phone number, location, address, etc. will be omitted.
• Protection: The lead project partner of every pilot case is responsible for the protection of the participants’ privacy throughout the whole project, including procedures such as communications, data exchange, presentation of findings, etc.
• Control: The responsible project partners are not allowed to circulate information without anonymisation. This means that only relevant attributes, such as gender and age, are retained.
• Information: As already mentioned above, protecting confidentiality implies informing the participants about what may be done with their data (e.g. data sharing). Individuals who participate in any study have the right to request and obtain, free of charge, information on their personal data subjected to processing, on the origin of such data and on its communication or intended communication.
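The Dissemination and Control items above can be sketched as an allow-list filter applied before a record is shared with partners. This is a minimal illustration only: the function name `anonymise_for_sharing`, the field names and the choice of retained attributes are assumptions, not a DataBio specification.

```python
# Attributes assumed to be relevant for evaluation (illustrative choice).
ALLOWED_ATTRIBUTES = {"user_id", "gender", "age"}


def anonymise_for_sharing(record, allowed=ALLOWED_ATTRIBUTES):
    """Return a copy of `record` containing only allow-listed attributes.

    With an allow-list, any field that has not been explicitly approved
    (name, phone number, location, address, ...) is omitted by default.
    """
    return {key: value for key, value in record.items() if key in allowed}


# Hypothetical raw record as it might be collected in a pilot.
raw = {
    "user_id": "P-1a2b3c4d",
    "name": "Jane Doe",
    "phone": "+00 000 0000",
    "location": (50.08, 14.43),
    "gender": "F",
    "age": 41,
}
shared = anonymise_for_sharing(raw)
```

An allow-list is used here instead of a list of fields to remove, so that a newly added field is not shared by accident. Dropping direct identifiers does not by itself guarantee anonymity, since combinations of retained attributes may still identify a person in a small group.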
Ethical issues
In line with the Consortium’s commitment in the DataBio proposal, the ethics and
responsibility work in the project is guided by the principles of responsible research and
Figure 6: The data model of Smart Points Of Interest
A.2 Open Transport Map (UWB - D03.02)
Name Open Transport Map
Identifier D03.02
Owner 03 - UWB
Description
The Open Transport Map displays a road network which
– is suitable for routing
– visualizes average daily Traffic Volumes for the whole EU
– visualizes time related Traffic Volumes (in OTN Pilot Cities - Antwerp, Birmingham, Issy-le-Moulineaux, Liberec region)
Technically speaking, the Open Transport Map
– can serve as a map itself as well as a layer embedded in your map
– is derived from the most popular open dataset - OpenStreetMap
– is accessible via both GUI and API
– covers the whole European Union
A.5 Open Land Use (Lespro - D02.01)
Name Open Land Use
Identifier D02.01
Owner 02 - Lesp
Description
Open Land Use Map is a composite map that is intended to create detailed land-use maps of various regions based on certain pan-European datasets, such as CORINE Land Cover and Urban Atlas, enriched by available regional data. The dataset is derived from available open data sources at different levels of detail and coverage. These data sources include:
1) Digital cadastral maps, if available
2) Land Parcel Identification System, if available
3) Urban Atlas (European Environment Agency)
4) CORINE Land Cover 2006 (European Environment Agency)
5) OpenStreetMap
The data sources are ordered by level of detail and, therefore, by priority for data integration.
Classification(s) Land Use, Cadastral parcels, Urban Atlas
Anonymised IACS data
A.15 Sentinel Data
• Sentinel-2 HR Optical data. Sentinel-2 archive. European Space Agency (ESA). Global coverage. NP has the data for its pilot areas (T1.2.1, T1.4.1, T1.4.2) corresponding to 6 tiles. Thematic Exploitation Platforms, such as the Forestry TEP (C16.10), are available for online analytics.
• Sentinel-2 L1 data (C14.01). Sentinel-2 L1 data archive. ESA. Czech Republic
• Sentinel-1 IWS data (C14.02). Sentinel-1 L1 data archive. EO data. Czech Republic
A.16 Tree species map (FMI - D14.03)
Name Tree species map
Identifier D14.03
Owner ESA
Description Tree species map
Classification(s) Raster dataset
Date 2017
Area coverage Czech Republic
Time coverage 2017
Format GeoTiff
Licence Property of FMI
Related datasets <Link to the related datasets, identifiers of the descriptions>
Data set size Approximately 1 GB
Frequency of update Fixed
Access interfaces <e.g. SQL, REST>