D1.3 Agriculture Pilot Final Report

This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee.

Project Acronym: DataBio

Grant Agreement number: 732064 (H2020-ICT-2016-1 – Innovation Action)

Project Full Title: Data-Driven Bioeconomy

Project Coordinator: INTRASOFT International

DELIVERABLE

D1.3 – Agriculture Pilot Final Report

Dissemination level: PU - Public

Type of Document: Report

Contractual date of delivery: M36 – 31/12/2019

Deliverable Leader: TRAGSA

Status - version, date: Final – v1.2, 21/1/2020

WP / Task responsible: WP1

Keywords: Agriculture, pilot, Big Data, modelling, stakeholders, final results


Executive Summary

D1.1 “Agriculture Pilot Definition”, submitted at the initial stage of the agriculture pilots, reported the definition of the use cases and the description of their requirements, collected through a collaborative effort involving Big Data Technology (BDT) experts, end users and other relevant stakeholders. D1.2 “Agriculture Pilots intermediate report” presented the intermediate progress of the agriculture pilots, focusing on the results of DataBio Trial 1. The current document, DataBio deliverable D1.3 “Agriculture Pilot Final Report”, covers the agriculture pilots as a whole and the final WP1 outcomes.

All the deliverables are in line with the objective of WP1 “Agriculture pilot”, which is to demonstrate how technologies dealing with Big Data will be implemented in pilots and validated on real-world cases in order to fulfil the end user communities’ expectations.

D1.3 highlights the results of the agriculture pilots, mostly from the second trial period (2018-2019), i.e. Trial 2, which built on the experiments run in Trial 1.

A total of 13 pilots have been completed in the DataBio project, testing Big Data technologies in key areas of interest including horticulture, arable farming, subsidies and insurance, with the ultimate aim of addressing different challenges facing the EU’s agriculture ecosystems:

(A) Precision Horticulture including vine and olives:

• Group A1: Precision agriculture in olives, fruits, grapes and vegetables

o Pilot A1.1: Precision agriculture in olives, fruits, grapes

o Pilot A1.2: Precision agriculture in vegetable seed crops

o Pilot A1.3: Precision agriculture in vegetables 2 (Potatoes)

• Group A2: Big Data management in greenhouse eco-systems

o Pilot A2.1: Big Data management in greenhouse eco-systems

(B) Arable Precision Farming:

• Group B1: Cereals and biomass crops

o Pilot B1.1: Cereals and biomass crops

o Pilot B1.2: Cereals and biomass and cotton crops 2

o Pilot B1.3: Cereals and biomass crops 3

o Pilot B1.4: Cereals and biomass crops 4

• Group B2: Machinery management

o Pilot B2.1: Machinery management

(C) Subsidies and insurance:

• Group C1: Insurance

o Pilot C1.1: Insurance (Greece)

o Pilot C1.2: Farm Weather Insurance Assessment

• Group C2: CAP support

o Pilot C2.1: CAP Support

o Pilot C2.2: CAP Support (Greece)



The present document offers the final outcomes of Tasks 1.2, 1.3 and 1.4, in which Big Data were exploited in the pilots together with IoT (Internet of Things) sensor data and EO (Earth Observation) data, e.g. multispectral data and satellite imagery-derived markers such as the NDVI (Normalized Difference Vegetation Index) or biophysical parameters such as fAPAR (fraction of Absorbed Photosynthetically Active Radiation), and with several algorithms (such as machine learning techniques). DataBio platform technological components were deployed through several applications, including the development of irrigation needs algorithms, in order to obtain full functionality in web applications based on high-frequency, scalable satellite image data at local and national level. Crop monitoring was carried out in order to fine-tune the models to plant growth, development, performance and health. The results achieved in the second and final trial were satisfactory, and this document summarises them succinctly, as measured against the defined objectives.
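For readers unfamiliar with the NDVI mentioned above, the index is conventionally defined as (NIR - Red) / (NIR + Red), where NIR and Red are the near-infrared and red reflectances of a pixel. The following is a minimal sketch of that standard definition only; the function name `ndvi` and the sample reflectance values are illustrative assumptions and do not reproduce the processing chain or data of any DataBio pilot.

```python
import math


def ndvi(nir, red):
    """Standard NDVI for one pixel: (NIR - Red) / (NIR + Red).

    Returns NaN when both reflectances are zero, since the
    index is undefined in that case.
    """
    total = nir + red
    if total == 0:
        return math.nan
    return (nir - red) / total


# Hypothetical (NIR, Red) reflectance pairs, for illustration only.
pixels = [(0.50, 0.10), (0.30, 0.30), (0.0, 0.0)]
values = [ndvi(nir, red) for nir, red in pixels]
print(values)
```

Values close to 1 indicate dense green vegetation, while values near 0 or below are typical of bare soil, water or built-up surfaces.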



Deliverable Leader: Jesús Estrada (TRAGSATEC)

Contributors:

Savvas Rogotis, Natassa Miliaraki, Kostas Mastrogiannis (NP)

Isabelle Picard (VITO)

Balestri Stefano (CAC)

Nicole Bartelds (NB Advies)

Ephrem Habyarimana (CREA)

Jesús Estrada (TRAGSATEC)

Karel Charvat Jr., Lukas Vojtěch (LESPRO)

Jaroslav Šmejkal (Zetor)

Raul Palma (PSNC)

Antonella Catucci, Laura De Vendictis, Alessia Tricomi (e-geos)

Maria Luisa Quarta (MEEO)

Maria Plakia, Dimitris Karamitros (EXUS)

Adrian Stoica, Olimpia Copacenaru (TerraS)

Reviewers:

Savvas Rogotis (NP)

Anagnostis Argiriou, Sofia Michailidou (CERTH)

Tomas Mildorf (UWB)

Approved by: Athanasios Poulakidas (INTRASOFT)

Document History

| Version | Date | Contributor(s) | Description |
|---|---|---|---|
| 0.1 | 30/10/2019 | Jesús Estrada, Savvas Rogotis, Karel Charvat Jr. | Table of Contents |
| 0.2 | 29/11/2019 | All contributors | First draft |
| 0.3 | 6/12/2019 | E. Habyarimana | Pilots A2.1 and B1.3 |
| 0.4 | 10/12/2019 | Reviewers | Editorial review |
| 0.5 | 28/12/2019 | Jesús Estrada | Final draft |
| 1.0 | 31/12/2019 | A. Poulakidas | Final version for submission |
| 1.1 | 20/01/2020 | Jesús Estrada | Updated input from partners |
| 1.2 | 21/01/2020 | A. Poulakidas | Final version for resubmission |



Table of Contents

EXECUTIVE SUMMARY
TABLE OF CONTENTS
TABLE OF FIGURES
LIST OF TABLES
DEFINITIONS, ACRONYMS AND ABBREVIATIONS

1 INTRODUCTION
  PROJECT SUMMARY
  DOCUMENT SCOPE
  DOCUMENT STRUCTURE

2 AGRICULTURE PILOTS SUMMARY
  OVERVIEW
  INTRODUCTION OF PILOT CASES

3 PILOT 1 [A1.1] PRECISION AGRICULTURE IN OLIVES, FRUITS, GRAPES
  PILOT OVERVIEW
  SUMMARY OF PILOT BEFORE TRIAL 2
  PREPARATION AND EXECUTION OF TRIAL 2
    Trial 2 timeline
    Preparation for Trial 2
    Trial 2 execution
    Trial 2 results
  COMPONENTS, DATASETS AND PIPELINES
    DataBio component deployment status
    Data Assets
  EXPLOITATION AND EVALUATION OF PILOT RESULTS
    Pilot exploitation based on results
    KPIs

4 PILOT 2 [A1.2] PRECISION AGRICULTURE IN VEGETABLE SEED CROPS
    Trial 2 timeline
    Preparation and execution of Trial 2
    Trial 2 results
  COMPONENTS, DATASETS AND PIPELINES
    DataBio component deployment status
    4.4.3 Data Assets


5 PILOT 3 [A1.3] PRECISION AGRICULTURE IN VEGETABLES_2 (POTATOES)

6 PILOT 4 [A2.1] BIG DATA MANAGEMENT IN GREENHOUSE ECO-SYSTEM





7 PILOT 5 [B1.1] CEREALS AND BIOMASS CROP
    Trial 2 timeline
    Preparation for Trial 2
    Trial 2 execution
    Trial 2 results
  COMPONENTS, DATASETS AND PIPELINES
    DataBio component deployment status
    Data Assets
  EXPLOITATION AND EVALUATION OF PILOT RESULTS
    Pilot exploitation based on results
    KPIs

8 PILOT 6 [B1.2] CEREALS, BIOMASS AND COTTON CROPS_2
  PILOT OVERVIEW
  SUMMARY OF PILOT BEFORE TRIAL 2
  PREPARATION AND EXECUTION OF TRIAL 2
    Trial 2 timeline
    Preparation for Trial 2
    Trial 2 execution
    Trial 2 results
  EXPLOITATION AND EVALUATION OF PILOT RESULTS
    Pilot exploitation based on results
    KPIs

9 PILOT 7 [B1.3] CEREAL AND BIOMASS CROPS_3
  PILOT OVERVIEW
  SUMMARY OF PILOT BEFORE TRIAL 2
  PREPARATION AND EXECUTION OF TRIAL 2
    Trial 2 timeline
    Preparation for Trial 2
    Trial 2 execution
    Trial 2 results
  EXPLOITATION AND EVALUATION OF PILOT RESULTS
    KPIs

10 PILOT 8 [B1.4] CEREALS AND BIOMASS CROPS_4
  PILOT OVERVIEW
  SUMMARY OF PILOT BEFORE TRIAL 2
    Linked Data
  PREPARATION AND EXECUTION OF TRIAL 2
    Trial 2 timeline
    Preparation for Trial 2
    Trial 2 execution
    Trial 2 results
  COMPONENTS, DATASETS AND PIPELINES
    DataBio Component deployment status
    Data Assets
  EXPLOITATION AND EVALUATION OF PILOT RESULTS
    Pilot exploitation based on results
    KPIs

11 PILOT 9 [B2.1] MACHINERY MANAGEMENT
  PILOT OVERVIEW
  SUMMARY OF PILOT BEFORE TRIAL 2
  PREPARATION AND EXECUTION OF TRIAL 2
  COMPONENTS, DATASETS AND PIPELINES
    DataBio component deployment status
    Data Assets


12 PILOT 10 [C1.1] INSURANCE (GREECE)
  PILOT OVERVIEW
  SUMMARY OF PILOT BEFORE TRIAL 2
  PREPARATION AND EXECUTION OF TRIAL 2
    Trial 2 timeline
    Preparation for Trial 2
    Trial 2 execution
    Trial 2 results



13 PILOT 11 [C1.2] FARM WEATHER INSURANCE ASSESSMENT

14 PILOT 12 [C2.1] CAP SUPPORT

15 PILOT 13 [C2.2] CAP SUPPORT (GREECE)
  COMPONENTS, DATASETS AND PIPELINES
    DataBio component deployment status
    Data Assets
  EXPLOITATION AND EVALUATION OF PILOT RESULTS
    Pilot exploitation based on results
    KPIs

16 CONCLUSION

Table of Figures

FIGURE 1: PILOT A1.1 HIGH-LEVEL OVERVIEW
FIGURE 2: PILOT A1.1 TIMELINE
FIGURE 3: SCREENSHOT OF THE UNIFIED UI DEVELOPED FOR A1.1 TRIAL 2. THE RED MENU ITEM INDICATES FARM LOG FUNCTIONALITIES WHILE THE ORANGE MENU ITEM THE FARM MANAGEMENT FUNCTIONALITIES RESPECTIVELY.
FIGURE 4: SCREENSHOTS OF THE ANDROID APP USED FOR COLLECTING FARM DATA
FIGURE 5: PARCEL MONITORING AT CHALKIDIKI PILOT SITE INDICATING INTRA-FIELD VARIATIONS IN TERMS OF VEGETATION INDEX (NDVI) AND CROSS-CORRELATIONS AMONG THE LATTER WITH: A) AMBIENT TEMPERATURE (°C) AND B) RAINFALL (MM)
FIGURE 6: PARCEL MONITORING AT STIMAGKA PILOT SITE INDICATING INTRA-FIELD VARIATIONS IN TERMS OF VEGETATION INDEX (NDVI) AND CROSS-CORRELATIONS AMONG THE LATTER WITH A) NDVI FROM 2018 CULTIVATING PERIOD AND B) RAINFALL (MM) FROM 2018 AND 2019 CULTIVATING PERIODS
FIGURE 7: IRRIGATION MONITORING AT A VERIA PILOT PARCEL SHOWING TWO (2) CORRECT IRRIGATIONS (WATER DROP ICONS) AFTER FOLLOWING THE ADVISORY SERVICES DURING 2019 CULTIVATING PERIOD. THE IMPACT OF RAINFALLS IN THE SOIL WATER CONTENT IS OBVIOUS (~10/6) AND IF TRANSLATED CORRECTLY CAN PREVENT UNNECESSARY IRRIGATIONS
FIGURE 8: CROP PROTECTION MONITORING AT A VERIA PILOT PARCEL SHOWING FOUR (4) CORRECT SPRAYS (SPRAYING ICONS) AFTER FOLLOWING THE ADVISORY SERVICES AND THE INDICATIONS FOR HIGH CURL LEAF RISK DURING 2019 CULTIVATING PERIOD. THE DASHED VERTICAL LINES INDICATE CRITICAL CROP PHENOLOGICAL STAGES
FIGURE 9: FERTILIZATION ADVICE FOR A CHALKIDIKI PILOT PARCEL
FIGURE 10: PILOT A1.1 AGGREGATED FINDINGS
FIGURE 11: REPRESENTATIVES OF E.C., FARM EUROPE AND OTHER PARTICIPANTS OF THE PILOT VISIT IN STIMAGKA
FIGURE 12: A1.2 FIELD LOCATIONS IN 2018 MONITORING PROGRAM
FIGURE 13: WATCHITGROW® SCREENSHOT OF THE “FIELD DASHBOARD”
FIGURE 14: “GREENNESS” FAPAR CURVE
FIGURE 15: CORRELATION BETWEEN THE HARVEST DATE FOR SUGAR BEET SEEDS IN 2019 ESTIMATED FROM SENTINEL-2 IMAGES (DATE WITH FAPAR = 0,4) AND THE ACTUAL HARVEST DATE RECORDED BY CAC SEEDS
FIGURE 16: FAPAR VALUES AT HARVEST FOR 2019
FIGURE 17: CORRELATION BETWEEN THE HARVEST DATE FOR SUGAR BEET SEEDS IN 2018 (LEFT) AND 2019 (RIGHT) ESTIMATED FROM FUSED SENTINEL-1 AND SENTINEL-2 IMAGES (DATE WITH CROPSAR FAPAR = 0,36) AND THE ACTUAL HARVEST DATE RECORDED BY CAC SEEDS
FIGURE 18: ERROR OF HARVEST DATE ESTIMATION, IN DAYS, FOR 2018 AND 2019 (138 FIELDS)
FIGURE 19: CORRELATION BETWEEN THE HARVEST DATE FOR SUGAR BEET SEEDS IN 2018 (LEFT) AND 2019 (RIGHT) ESTIMATED FROM FUSED SENTINEL-1 AND SENTINEL-2 IMAGES (CROPSAR FAPAR) ON 15 AUGUST (FULL SEASON) AND THE ACTUAL HARVEST DATE RECORDED BY CAC SEEDS
FIGURE 20: CORRELATION (R² VALUE) BETWEEN THE ESTIMATED AND ACTUAL HARVEST DATES AT DIFFERENT TIMES BEFORE HARVEST IN 2018 (BLUE) AND 2019 (GREEN)
FIGURE 21: CORRELATION BETWEEN THE HARVEST DATE FOR SUGAR BEET SEEDS IN 2019 ESTIMATED FROM (LEFT) ORIGINAL SENTINEL-2 IMAGES (DATE WITH FAPAR = 0,23) AND (RIGHT) FUSED SENTINEL-1 AND SENTINEL-2 IMAGES (DATE WITH CROPSAR FAPAR = 0,18) AND THE ACTUAL HARVEST DATE RECORDED BY CAC SEEDS
FIGURE 22: ERROR OF HARVEST DATE ESTIMATION FOR SOYBEANS, IN DAYS, FOR 2019 (41 FIELDS)
FIGURE 23: CORRELATION BETWEEN THE HARVEST DATE FOR SOYBEANS IN 2019 ESTIMATED FROM FUSED SENTINEL-1 AND SENTINEL-2 IMAGES (CROPSAR FAPAR) ON 20 OCTOBER (FULL SEASON) AND THE ACTUAL HARVEST DATE RECORDED BY CAC SEEDS
FIGURE 24: SUNFLOWER FIELD AT HARVESTING STAGE
FIGURE 25: PROCESSED SENTINEL DATA INTO GREENNESS; AVAILABLE FOR THE GROWING SEASON (A1.3)



FIGURE 26: GREENNESS GRAPH DURING GROWING SEASON (A1.3)
FIGURE 27: IMAGE DEMONSTRATING DROUGHT IN SUMMER 2018 FROM SENTINEL DATA (A1.3)
FIGURE 28: ANALYSIS OF GREENLAND MANAGEMENT BASED ON THE GREENNESS FROM SENTINEL DATA (A1.3)
FIGURE 29: CONCEPT OF A SIMPLE (STARCH) POTATO DSS
FIGURE 30: MAP OF SOIL CHARACTERISTICS FOR THE NETHERLANDS
FIGURE 31: WEATHER DATA (PRECIPITATION PER DAY VS TEMPERATURE) FROM WEATHER STATIONS
FIGURE 32: WEATHER DATA (PRECIPITATION) FROM WEATHER STATIONS
FIGURE 33: SOIL MOISTURE SENSORS
FIGURE 34: A1.3 GENERAL LOCATION
FIGURE 35: FARM AREAS SELECTED FOR THE PILOT A1.3
FIGURE 36: ONLINE PLATFORM FOR CROP MONITORING AND BENCHMARKING
FIGURE 37: LAI-WDVI POLYNOMIAL REGRESSION MODEL FOR SPRING POTATOES ACHIEVING HIGH R2. DOI: 10.1117/12.2029099
FIGURE 38: POTATO TRIAL FIELDS
FIGURE 39: UAV SPECTRAL IMAGE (RED EDGE NDVI -INDEX) IMAGE TAKEN 25 JUNE 2019
FIGURE 40: MONITORING OF TRIAL FIELDS DURING JULY AND AUGUST
FIGURE 41: PERFORMANCE OF YIELD POTENTIAL (MEAN VALUES VS DATE)
FIGURE 42: CROP MONITORING EXPRESSING VARIABILITY IN LAI
FIGURE 43: SOIL MOISTURE AND LAI INDEX DATA FOR THE PILOT FIELDS
FIGURE 44: PREDICTION DRY MATTER, BEGINNING OF JULY 2019
FIGURE 45: DATA FOR THE WATER-LIMITED GROWTH MODEL
FIGURE 46: WATER LIMITED CROP GROWTH MODEL WITHOUT GROUNDWATER
FIGURE 47: DRY MATTER AND TOTAL YIELD FOR PILOT FIELDS DURING THE BEGINNING OF JULY AND HARVEST TIME
FIGURE 48: POTENTIAL CROP PRODUCTION (A1.3)
FIGURE 49: A1.3 SAMPLES
FIGURE 50: TOMATO ACCESSIONS IN GLASSHOUSE UNDER BREEDING SETTINGS
FIGURE 51: DDRAD PROTOCOL MODIFIED FROM PETERSON ET AL., 2012. PMCID: PMC3365034, DOI:10.1371/JOURNAL.PONE.0037135
FIGURE 52: THE STACKS PIPELINE, AVAILABLE AT HTTP://CATCHENLAB.LIFE.ILLINOIS.EDU/STACKS/MANUAL-V1/
FIGURE 53: CREA’S SORGHUM PILOT FIELDS USED IN THE C22.03 GENOMIC MODELS PLATFORM
FIGURE 54: PRINCIPAL COMPONENT ANALYSIS FOR THE TOMATO POPULATIONS BASED ON THEIR GENETIC BACKGROUND
FIGURE 55: PRINCIPAL COMPONENT ANALYSIS FOR THE TOMATO INDIVIDUALS BASED ON THEIR GENETIC BACKGROUND
FIGURE 56: PRINCIPAL COMPONENT ANALYSIS FOR THE TOMATO INDIVIDUALS BASED ON THEIR BIOCHEMICAL BACKGROUND
FIGURE 57: DISTRIBUTION (BOXPLOT) OF GS MODELS VALIDATED ACCURACY IN EXTERNAL SAMPLE (NOT USED DURING MODEL TRAINING) OF 34 (30% OF THE TOTAL POPULATION) SORGHUM LINES. FEN, FLA, TAC, TAN, RESPECTIVELY, POLYPHENOLS, FLAVONOIDS, TOTAL ANTIOXIDANT CAPACITY, AND CONDENSED TANNINS. TRAITS MEANS ARE INCLUDED WITHIN THE BOXPLOT. TRAIT MEANS WITH SAME LETTER ARE NOT SIGNIFICANTLY DIFFERENT AT THE 5% LEVEL USING THE TUKEY'S HSD (HONESTLY SIGNIFICANT DIFFERENCE) TEST. REFER TO TEXT FOR THE DESCRIPTION OF THE GS MODELS.
FIGURE 58: PILOT B1.1 TIMELINE
FIGURE 59: KC AND NDVI EQUATIONS
FIGURE 60: LEFT TO RIGHT: NDVI IMAGE FROM MULTISPECTRAL RPAS DATA; RGB MOSAIC; THERMAL IMAGE OVER RGB MOSAIC; DSM
FIGURE 61: COMPARATIVE KC OBTAINS FOR REMOTE SENSOR IN FRONT FAO DATA PER CEREAL
FIGURE 62: RESULT: HIGH-SCALE VIGOUR MAP
FIGURE 63: CROPS CLASSIFICATION AND IRRIGATION NEEDS
FIGURE 64: MANAGEMENT PROFILE - IRRIGATION NEEDS OF THE WHOLE IRRIGATION COMMUNITY
FIGURE 65: FARMER PROFILE - IRRIGATION NEEDS FOR A SPECIFIC PARCEL AND CROP
FIGURE 66: RASPBERRY UNIT AND IOT SENSORS
FIGURE 67: DATA FLOW DIAGRAM OF THE MODEL FOR THE IMPLEMENTATION OF PRECISION AGRICULTURE TECHNIQUES
FIGURE 68: DEFINITION OF HISTOGRAMS. RESULT OF HOMOGENIZATION OF IMAGES
FIGURE 69: PILOT B1.2 HIGH-LEVEL OVERVIEW



FIGURE 70: PILOT B1.2 TIMELINE

FIGURE 71: SCREENSHOT OF THE UNIFIED UI DEVELOPED FOR TRIAL 2. THE RED MENU ITEM INDICATES FARM LOG FUNCTIONALITIES WHILE THE ORANGE MENU ITEM THE FARM MANAGEMENT FUNCTIONALITIES RESPECTIVELY

FIGURE 72: SCREENSHOTS OF THE ANDROID APP USED FOR COLLECTING FARM DATA

FIGURE 73: PARCEL MONITORING AT KILELER PILOT SITE INDICATING SOME SLIGHT INTRA-FIELD VARIATIONS IN TERMS OF VEGETATION INDEX (NDVI) AND CROSS-CORRELATIONS AMONG THE LATTER WITH AMBIENT TEMPERATURE AND RAINFALL (MM)

FIGURE 74: REFERENCE EVAPOTRANSPIRATION MONITORING AT KILELER (BOTH MODELLED USING ML METHODS DEVELOPED BY NP AND BASED ON COPERNICUS EO DATA) FOR JULY 2019

FIGURE 75: IRRIGATION MONITORING AT A KILELER PILOT PARCEL SHOWING ONE (1) CORRECT IRRIGATION (WATER DROP ICON) AFTER FOLLOWING THE ADVISORY SERVICES. THE IMPACT OF RAINFALLS IN THE SOIL WATER CONTENT IS OBVIOUS ON SEVERAL OCCASIONS AND IF TRANSLATED CORRECTLY CAN PREVENT UNNECESSARY IRRIGATIONS

FIGURE 76: AGGREGATED RESULTS OF THE PILOT IN COMPARISON WITH THE TARGET VALUES

FIGURE 77: SORGHUM PILOTS ESTABLISHED IN 2019

FIGURE 78: SORGHUM FOLIAR DISEASES DETECTED AREA WITH THE RELIABILITY OF 0,925

FIGURE 79: SORGHUM FOLIAR DISEASES DETECTED AREA WITH THE RELIABILITY OF 0,861

FIGURE 80: MAP OF ITALY (A) WITH A RECTANGLE INSET INDICATING THE GEOGRAPHICAL LOCATION OF THE EXPERIMENTAL SITES (RED DOTS) FOR PILOTS ESTABLISHED IN 2017 (B) AND 2018 (C)

FIGURE 81: LEFT: VISUALIZATION OF MODELS CROSS-VALIDATION MAE (T HA-1) DISPERSION USING BOXPLOT APPROACH AND FAPAR ACQUIRED FROM APRIL TO AUGUST. LM, BARTMACHINE, BAYESGLM, XGBTREE, RESPECTIVELY, SIMPLE LINEAR MODEL, BAYESIAN ADDITIVE REGRESSION TREES (BARTMACHINE METHOD), BAYESIAN GENERALIZED LINEAR MODEL (BAYESGLM METHOD), AND EXTREME GRADIENT BOOSTING (XGBTREE METHOD). RIGHT: RELATIVE IMPORTANCE OF REGRESSORS (DAY OF YEAR, D) ON SORGHUM BIOMASS YIELDS USING BARTMACHINE METHOD

FIGURE 82: YIELD MAPS REPRESENTED AS RELATIVE VALUES TO THE AVERAGE CROP YIELD OF EACH FIELD (HARVEST 2018)

FIGURE 83: TRANSFORMATION AND PUBLICATION OF CZECH DATA AS LINKED DATA WITH PROTOTYPE SYSTEM FOR VISUALISING

FIGURE 84: MAP VISUALISATION PROTOTYPE (HSLAYER APPLICATION) - HTTP://APP.HSLAYERS.ORG/PROJECT-DATABIO/LAND/

FIGURE 85: GRAPHS OF SENTINEL-2 NDVI DURING THE VEGETATION PERIOD 2019 FOR WINTER WHEAT (ABOVE) AND SPRING BARLEY (BELOW) AT LOCALITY OTNICE (ROSTENICE FARM). LOW PEAKS INDICATE OCCURRENCE OF CLOUDS WITHIN THE SCENE (SOURCE: SENTINEL-2, LEVEL L1C, GOOGLE EARTH ENGINE)

FIGURE 86: EXAMPLE OF THE OUTPUT MAP PRODUCTS FROM YIELD POTENTIAL ZONES CLASSIFICATION FROM EO TIME-SERIES ANALYSIS: CLASSIFICATION INTO 5% CLASSES (LEFT), 5-ZONE MAP (MIDDLE) AND 3-ZONE MAP (RIGHT). BLUE/GREEN AREAS INDICATE HIGHER EXPECTED YIELD

FIGURE 87: MAP OF YIELD POTENTIAL ZONES (5-ZONE MAP) UPDATED FOR 2019 SEASON FROM 8-YEAR TIME-SERIES IMAGERY; FOR SOUTHERN (LEFT) AND NORTHERN (RIGHT) PART OF ROSTENICE FARM

FIGURE 88: VARIABLE RATE APPLICATION OF SOLID FERTILIZERS BY TWIN BIN APPLICATOR ON TERRAGATOR

FIGURE 89: VARIABLE RATE APPLICATION OF LIQUID N FERTILIZERS (DAM390) BY 36M HORSCH LEEB PT330 SPRAYER

FIGURE 90: CROP YIELD MAPS FROM 2019 HARVEST

FIGURE 91: GRAPH WITH CHANGES OF CORRELATION COEFFICIENTS BETWEEN WINTER WHEAT AND SET OF SENTINEL-2 VEGETATION INDICES DURING THE VEGETATION PERIOD 2018. MOST SENSITIVE PERIOD WAS DETECTED IN MAY AND JUNE

FIGURE 92: GRAPH OF CORRELATION COEFFICIENTS BETWEEN WINTER WHEAT YIELD MAPS AND SENTINEL-2 NDMI (2018/06/10) AMONG OBSERVED FIELDS. HIGHEST CORRELATION WAS DETECTED ON THE FIELDS WITH HIGHER ACREAGE AND SPATIAL HETEROGENEITY

FIGURE 93: TRACTOR TRAJECTORY AND WORK LOG

FIGURE 94: ZETOR MAJOR

FIGURE 95: DAILY TRACTOR UTILISATION AND TRAJECTORY IN FARMTELEMETRY

FIGURE 96: SPIKES CAUSED BY 10 SECONDS INTERVAL

FIGURE 97: DATA COLLECTION WITH 2 SECONDS INTERVAL

FIGURE 98: FLUCTUATIONS IN FUEL TANK MEASUREMENT

FIGURE 99: PILOT TIMELINE

FIGURE 100: CROP NDVI PROBABILITY DISTRIBUTION REFERRING TO A DECAD OF THE YEAR (WHEAT-LARISA REGION-2ND DECAD OF FEBRUARY). ANOMALIES CAN BE FOUND AT THE DISTRIBUTION EXTREMES



FIGURE 101: COTTON MODEL IN KOMOTINI REGION (T35TLF TILE), MAIZE MODEL IN EVROS REGION (T35TMF TILE) AND WHEAT MODEL IN LARISA REGION (T34SFJ TILE) BY DECAD (HORIZONTAL AXIS)

FIGURE 102: AFTERMATH OF THE FLOODS IN KOMOTINI REGION (11/7/2019)

FIGURE 103: RAINFALL VOLUME (MM) IN THE KOMOTINI REGION

FIGURE 104: PARCEL MONITORING AT KOMOTINI REGION (COTTON) SHOWING NEGATIVE ANOMALY (DEVIATION) FOR TWO CONSECUTIVE DECADS JUST AFTER THE DISASTROUS INCIDENT

FIGURE 105: HIGH-LEVEL OVERVIEW OF THE AFFECTED AREA, COLOR CODED WITH THE OUTPUT OF THE FOLLOWED DAMAGE ASSESSMENT PROCEDURES

FIGURE 106: RISK ANALYSIS TOOL THAT MEASURES THE FREQUENCY OF PRESENCE OF EXTREME WEATHER CONDITIONS (AGAINST HEAT-WAVES, FROSTS, OR WINDSTORMS) AS DEFINED BY ELGA

FIGURE 107: FRAUNHOFER'S UI SCREENSHOT COLOUR CODING DIFFERENT CROP TYPES

FIGURE 108: FRAUNHOFER'S UI SCREENSHOT THAT INTEGRATES CSEM’S CLASSIFICATION RESULTS INTO PIXEL HEAT MAPS

FIGURE 109: MAP CLASSIFYING THE NETHERLANDS TERRITORY IN TERMS OF NUMBER OF YEARS WITH DAMAGES

FIGURE 110: MAP OF PRECIPITATION EXTRACTED FROM KNMI DATASET ON DATE 30/08/2015. YELLOW POINTS: LOCATIONS PROVIDED BY THE INSURANCE COMPANY – BLUE POINTS: FURTHER LOCATIONS WITH 24-HOURS PRECIPITATION VALUES ABOVE THE 50 MM THRESHOLD

FIGURE 111: INTRA-FIELD ANALYSIS BASED ON NDVI SPECTRAL INDEX WITH S2A AND S2B DATA (TILE T31UET - YEAR 2018)

FIGURE 112: SENTINEL-2 TILES OVER THE NETHERLANDS

FIGURE 113: SPATIAL DISTRIBUTION OF POTATO FIELDS WITH RESPECT TO VARIETY FOR YEAR 2017

FIGURE 114: COUNT OF SAMPLES PER TYPE OF POTATOES

FIGURE 115: SOIL TYPE MAP

FIGURE 116: METEO CLIMATE DATA FROM LOCAL WEATHER STATIONS

FIGURE 117: DATA FROM EO DATA SERVICE MEA

FIGURE 118: TEMPERATURE PROFILE (PARCEL NUMBER 1971186)

FIGURE 119: 2016-2018 RISK MAPS (SPLIT ACROSS PAGES)

FIGURE 120: NDVI PER CLUSTER

FIGURE 121: PARAMETER IMPORTANCE

FIGURE 122: NDVI PROFILES OF DIFFERENT TYPES OF POTATO (YEAR OF REFERENCE 2017)

FIGURE 123: FIVE GROUPS OF CONSUMPTION PARCELS BASED ON CUMULATIVE TEMPERATURE BETWEEN 90 AND 200 DAY OF YEAR

FIGURE 124: NDVI PROFILES OF CONSUMPTION PARCELS ACCORDING TO THE FIVE GROUPS IDENTIFIED BY THE TEMPERATURE ANALYSIS

FIGURE 125: AVERAGE TEMPERATURE TRENDS OF PARCELS IN AREAS CHARACTERIZED BY HIGHER TEMPERATURES (BLUE) AND LOWER TEMPERATURES (PURPLE)

FIGURE 126: FOUR GROUPS OF TBM PARCELS BASED ON CUMULATIVE TEMPERATURE BETWEEN 90 AND 200 DAY OF YEAR

FIGURE 127: NDVI PROFILES OF TBM PARCELS ACCORDING TO THE FOUR GROUPS IDENTIFIED BY THE TEMPERATURE ANALYSIS

FIGURE 128: AVERAGE TEMPERATURE TRENDS OF PARCELS IN AREAS CHARACTERIZED BY HIGHER TEMPERATURES (BLUE) AND LOWER TEMPERATURES (RED)

FIGURE 129: THREE GROUPS OF STARCH PARCELS BASED ON CUMULATIVE TEMPERATURE BETWEEN 90 AND 200 DAY OF YEAR

FIGURE 130: NDVI PROFILES OF STARCH PARCELS ACCORDING TO THE THREE GROUPS IDENTIFIED BY THE TEMPERATURE ANALYSIS

FIGURE 131: FOUR GROUPS OF NAK PARCELS BASED ON CUMULATIVE TEMPERATURE BETWEEN 90 AND 200 DAY OF YEAR

FIGURE 132: NDVI PROFILES OF NAK PARCELS ACCORDING TO THE FOUR GROUPS IDENTIFIED BY THE TEMPERATURE ANALYSIS

FIGURE 133: AVERAGE TEMPERATURE TRENDS OF PARCELS IN AREAS CHARACTERIZED BY HIGHER TEMPERATURES (BLUE) AND LOWER TEMPERATURES (RED)

FIGURE 134: INTRA-FIELD ANALYSIS BASED ON NDVI SPECTRAL INDEX WITH S2A AND S2B DATA (YEAR 2017)

FIGURE 135: AREAS OF ANOMALOUS GROWTH

FIGURE 136: CROP FAMILIES DETECTION USING SENTINEL 2 TEMPORAL SERIES

FIGURE 137: PIXEL-BASED RESULTS OF THE ANALYSIS REGARDING POTENTIAL INCONGRUENCES WITH RESPECT TO FARMERS’ DECLARATIONS STATING CROP TYPES AND AREAS COVERED

FIGURE 138: PILOT-BASED RESULTS OF THE ANALYSIS REGARDING POTENTIAL INCONGRUENCES WITH RESPECT TO FARMERS’ DECLARATIONS STATING CROP TYPES AND AREAS COVERED



FIGURE 139: NDVI TEMPORAL TREND WITH IDENTIFICATION OF RELEVANT PERIODS

FIGURE 140: TRIAL 2 TIMELINE OF ROMANIAN AOI IN PILOT C2.1

FIGURE 141: TRIAL 2 TIMELINE OF ITALIAN AOI IN C2.1

FIGURE 142: STRUCTURE OF THE DATA FOR THE 10,000 SQKM AREA OF INTEREST

FIGURE 143: AGRICULTURAL LAND PLOTS FOR THE 10,000 SQKM AREA OF INTEREST. DATA SOURCE: AGENCY FOR PAYMENTS AND INTERVENTION IN AGRICULTURE (APIA), ROMANIA

FIGURE 144: ROMANIA - TOTAL DECLARED AREA AND NUMBER OF PLOTS REGISTERED FOR CAP SUPPORT (2019). DATA SOURCE: AGENCY FOR PAYMENTS AND INTERVENTION IN AGRICULTURE (APIA), ROMANIA

FIGURE 145: LPIS CROP FAMILIES DISTRIBUTION

FIGURE 146: LPIS LEGEND WITH CROP TYPE AGGREGATION IN MACRO CLASSES

FIGURE 147: SUMMARY OF MARKERS PERIODS FOR EACH MACRO CLASS OF CROP TYPE

FIGURE 148: EXAMPLES OF VERIFIED (LEFT) AND NOT VERIFIED (RIGHT) AUTUMN-WINTER ARABLE LAND PARCEL

FIGURE 149: EXAMPLES OF VERIFIED (LEFT) AND NOT VERIFIED (RIGHT) SUMMER ARABLE LAND PARCEL

FIGURE 150: EXAMPLES OF VERIFIED (LEFT) AND NOT VERIFIED (RIGHT) TEMPORARY GRASSLAND PARCEL

FIGURE 151: EXAMPLES OF NOT VERIFIED (LEFT) AUTUMN-WINTER ARABLE LAND RE-CLASSIFIED AS SUMMER ARABLE LAND (RIGHT)

FIGURE 152: EXAMPLES OF NOT VERIFIED (LEFT) SUMMER ARABLE LAND RE-CLASSIFIED AS ARTEFACT (RIGHT) DUE TO THE PRESENCE OF A NEW BUILDING

FIGURE 153: EXAMPLE OF CAP SUPPORT ANALYSIS - TRIAL 2 RESULTS

FIGURE 154: TRIAL 2 RESULTS. OBSERVED CROP TYPE MAP (2019) FOR THE AREA OF INTEREST IN SOUTHEASTERN ROMANIA

FIGURE 155: TRIAL 2 RESULTS. OBSERVED CROP TYPE MAP (2019) FOR THE ENTIRE TERRITORY OF ROMANIA

FIGURE 156: RESULTS OF THE VALIDATION BASED ON INDEPENDENT DATA CONSISTING OF VERY-HIGH RESOLUTION IMAGERY AND FIELD-COLLECTED DATA

FIGURE 157: RESULTS OF THE VALIDATION BASED ON REFERENCE DATA PROVIDED BY APIA - THE ROMANIAN NATIONAL PAYING AGENCY

FIGURE 158: LPIS PARCEL CLASSIFIED ACCORDING TO VERIFIED PARCELS (IN GREEN), ANOMALOUS PARCELS (IN RED) AND NOT ANALYZED PARCELS (IN GREY) - ARABLE LAND AREA

FIGURE 159: LPIS PARCELS TYPE 2016 (LEFT) AND 2018 (RIGHT) AFTER RE-CLASSIFICATION OF ANOMALOUS PARCELS - ARABLE LAND AREA

FIGURE 160: 2016 LPIS SUMMER ARABLE LAND PARCELS UPDATE TO 2018

FIGURE 161: 2016 LPIS WINTER-AUTUMN ARABLE LAND PARCELS UPDATE TO 2018

FIGURE 162: 2016 LPIS IRRIGATED SUMMER ARABLE LAND PARCELS UPDATE TO 2018

FIGURE 163: LPIS PARCEL CLASSIFIED ACCORDING TO VERIFIED PARCELS (IN GREEN), ANOMALOUS PARCELS (IN RED) AND NOT ANALYZED PARCELS (IN GREY) - PERMANENT GRASSLAND AREA

FIGURE 164: 2016 LPIS PERMANENT GRASSLAND PARCELS UPDATE TO 2018

FIGURE 165: EXAMPLE OF NDVI TEMPORAL TRENDS (2017-2018) OF A VINEYARD PARCEL EXPLANTED ON MARCH 2018.

FIGURE 166: RESULTS OF THE VALIDATION BASED ON REFERENCE DATA EXTRACTED FROM VERY HIGH-RESOLUTION IMAGERY

FIGURE 167: GEOGRAPHICAL DISTRIBUTION OF THE PARCELS THAT TAKE PART TO THE PILOT C2.2 ACTIVITIES

FIGURE 168: C2.2 PILOT TIMELINE

FIGURE 169: FRAUNHOFER'S UI SCREENSHOT COLOUR CODING DIFFERENT CROP TYPES

FIGURE 170: FRAUNHOFER'S UI SCREENSHOT THAT INTEGRATES CSEM’S CLASSIFICATION RESULTS INTO PIXEL HEAT MAPS

FIGURE 171: NORMALIZED CROP CLASSIFICATION CONFUSION MATRIX (HORIZONTAL AXIS CORRESPONDS TO THE TRUE LABEL, WHEREAS THE VERTICAL ONE TO THE PREDICTED LABEL)

FIGURE 172: GREENING ELIGIBILITY ASSESSMENT USING A TRAFFIC LIGHT SYSTEM (MAP PROJECTION EXAMPLE)



List of Tables

TABLE 1: THE DATABIO CONSORTIUM PARTNERS

TABLE 2: OVERVIEW OF AGRICULTURE PILOT CASES

TABLE 3: ADVISORY SERVICES IN PILOT A1.1.

TABLE 4: MORPHOLOGICAL TRAITS OF THE PLANT, FLOWER AND LEAF IN 14 TOMATO GENOTYPES ACCORDING TO THE UPOV GUIDELINES.

TABLE 5: PLANT VIGOR AND TOLERANCE TO HIGH TEMPERATURES IN 14 TOMATO GENOTYPES.

TABLE 6: TOTAL PRODUCTION TRAITS IN 14 TOMATO GENOTYPES (SUM OF SIX WEEKLY HARVESTS).

TABLE 7: THE OBSERVED PERFORMANCE OF IMPLEMENTED MODELS.

TABLE 8: CROP CLASSIFICATION RESULTS

TABLE 9: GREENING ELIGIBILITY ASSESSMENT USING A TRAFFIC LIGHT SYSTEM.



Definitions, Acronyms and Abbreviations

Acronym / Abbreviation Title

BDVA Big Data Value Association

BDT Big Data Technology

BRR Bayesian Ridge Regression

CAP Common Agricultural Policy

CEN European Committee for Standardization

DSS Decision Support System

EAV Entity-Attribute-Value

EO Earth Observation

ESA European Space Agency

EAGF European Agricultural Guarantee Fund

EU European Union

FAO Food and Agriculture Organisation of the United Nations

fAPAR fraction of Absorbed Photosynthetically Active Radiation

FAS Farm Advisory System

GAEC Good Agricultural and Environmental Conditions

GBLUP Genomic Best Linear Unbiased Prediction

GEOSS Global Earth Observation System of Systems

GPRS General Packet Radio Service

GS Genomic Selection

HPC High Performance Computing

IACS Integrated Administration and Control System

ICT Information and Communication Technologies

IoT Internet of Things

ISO International Organization for Standardization

JSON JavaScript Object Notation

KPI Key Performance Indicator

LAI Leaf Area Index

LASSO Least Absolute Shrinkage and Selection Operator

LPIS Land Parcel Identification System

NDVI Normalized Difference Vegetation Index

NGS Next-Generation Sequencing

NUTS Nomenclature of Territorial Units for Statistics

PC Personal Computer

PCA Principal Component Analysis

PF Precision Farming

PU Public

RPAS Remotely Piloted Aircraft System



RTK Real Time Kinematic

SMEs Small and medium-sized enterprises

SNP Single Nucleotide Polymorphism

TRL Technology Readiness Level

UAV Unmanned Aerial Vehicle

UI User Interface

UVA, UVB (UV) ultraviolet rays, (A) long wave, (B) short wave

VRA Variable Rate Application

WP Work Package

WOFOST WOrld FOod STudies



1 Introduction

Project Summary

DataBio (Data-driven Bioeconomy) is an H2020 lighthouse project focusing on utilizing Big Data

to contribute to the production of the best possible raw materials from agriculture, forestry,

and fishery/aquaculture for the bioeconomy industry in order to produce food, energy and

biomaterials, also taking into account responsibility and sustainability issues.

DataBio has deployed state-of-the-art Big Data technologies taking advantage of existing

partners’ infrastructure and solutions. These solutions aggregate Big Data from the three

identified sectors (agriculture, forestry, and fishery) and intelligently process, analyse and

visualize them. The DataBio software environment allows the three sectors to selectively

utilize numerous software components, pipelines and datasets, according to their

requirements. The execution has been through continuous cooperation of end-users and

technology provider companies, bioeconomy and technology research institutes, and

stakeholders from the EU's Big Data Value PPP programme.

DataBio has been driven by the development, use and evaluation of 27 pilots, in which

associated partners and additional stakeholders have also been involved. The selected pilot

concepts have been transformed into pilot implementations utilizing co-innovative methods

and tools. Through intensive matchmaking with the technology partners in DataBio, the pilots

have selected and utilized market-ready or near market-ready ICT, Big Data and Earth

Observation methods, technologies, tools, datasets and services, mainly provided by the

partners within DataBio, in order to offer added-value services in their domain.

Based on the developed technologies and the pilot results, new solutions and new business

opportunities are emerging. DataBio has organized a series of stakeholder events, hackathons

and trainings to support result take-up and to enable developers outside the consortium to

design and develop new tools, services and applications based on the DataBio results.



The DataBio consortium is listed in Table 1. For more information about the project see

www.databio.eu.

Table 1: The DataBio consortium partners

Number | Name | Short name | Country

1 (CO) | INTRASOFT INTERNATIONAL SA | INTRASOFT | Belgium

2 | LESPROJEKT SLUZBY SRO | LESPRO | Czech Republic

3 | ZAPADOCESKA UNIVERZITA V PLZNI | UWB | Czech Republic

4 | FRAUNHOFER GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG E.V. | Fraunhofer | Germany

5 | ATOS SPAIN SA | ATOS | Spain

6¹ | STIFTELSEN SINTEF | SINTEF ICT | Norway

7 | SPACEBEL SA | SPACEBEL | Belgium

8 | VLAAMSE INSTELLING VOOR TECHNOLOGISCH ONDERZOEK N.V. | VITO | Belgium

9 | INSTYTUT CHEMII BIOORGANICZNEJ POLSKIEJ AKADEMII NAUK | PSNC | Poland

10 | CIAOTECH Srl | CiaoT | Italy

11 | EMPRESA DE TRANSFORMACION AGRARIA SA | TRAGSA | Spain

12 | INSTITUT FUR ANGEWANDTE INFORMATIK (INFAI) EV | INFAI | Germany

13 | NEUROPUBLIC AE PLIROFORIKIS & EPIKOINONION | NP | Greece

14 | Ústav pro hospodářskou úpravu lesů Brandýs nad Labem | UHUL FMI | Czech Republic

15 | INNOVATION ENGINEERING SRL | InnoE | Italy

16 | Teknologian tutkimuskeskus VTT Oy | VTT | Finland

17 | SINTEF FISKERI OG HAVBRUK AS | SINTEF Fishery | Norway

18 | SUOMEN METSAKESKUS-FINLANDS SKOGSCENTRAL | METSAK | Finland

19 | IBM ISRAEL - SCIENCE AND TECHNOLOGY LTD | IBM | Israel

20 | WUUDIS SOLUTIONS OY² | MHGS | Finland

21 | NB ADVIES BV | NB Advies | Netherlands

22 | CONSIGLIO PER LA RICERCA IN AGRICOLTURA E L'ANALISI DELL'ECONOMIA AGRARIA | CREA | Italy

23 | FUNDACION AZTI - AZTI FUNDAZIOA | AZTI | Spain

24 | KINGS BAY AS | KingsBay | Norway

25 | EROS AS | Eros | Norway

26 | ERVIK & SAEVIK AS | ESAS | Norway

27 | LIEGRUPPEN FISKERI AS | LiegFi | Norway

28 | E-GEOS SPA | e-geos | Italy

29 | DANMARKS TEKNISKE UNIVERSITET | DTU | Denmark

30 | FEDERUNACOMA SRL UNIPERSONALE | Federu | Italy

31 | CSEM CENTRE SUISSE D'ELECTRONIQUE ET DE MICROTECHNIQUE SA - RECHERCHE ET DEVELOPPEMENT | CSEM | Switzerland

32 | UNIVERSITAET ST. GALLEN | UStG | Switzerland

33 | NORGES SILDESALGSLAG SA | Sildes | Norway

34 | EXUS SOFTWARE LTD | EXUS | United Kingdom

35 | CYBERNETICA AS | CYBER | Estonia

36 | GAIA EPICHEIREIN ANONYMI ETAIREIA PSIFIAKON YPIRESION | GAIA | Greece

37 | SOFTEAM | Softeam | France

38 | FUNDACION CITOLIVA, CENTRO DE INNOVACION Y TECNOLOGIA DEL OLIVAR Y DEL ACEITE | CITOLIVA | Spain

39 | TERRASIGNA SRL | TerraS | Romania

40 | ETHNIKO KENTRO EREVNAS KAI TECHNOLOGIKIS ANAPTYXIS | CERTH | Greece

41 | METEOROLOGICAL AND ENVIRONMENTAL EARTH OBSERVATION SRL | MEEO | Italy

42 | ECHEBASTAR FLEET SOCIEDAD LIMITADA | ECHEBF | Spain

43 | NOVAMONT SPA | Novam | Italy

44 | SENOP OY | Senop | Finland

45 | UNIVERSIDAD DEL PAIS VASCO/ EUSKAL HERRIKO UNIBERTSITATEA | EHU/UPV | Spain

46 | OPEN GEOSPATIAL CONSORTIUM (EUROPE) LIMITED LBG | OGCE | United Kingdom

47 | ZETOR TRACTORS AS | ZETOR | Czech Republic

48 | COOPERATIVA AGRICOLA CESENATE SOCIETA COOPERATIVA AGRICOLA | CAC | Italy

49 | SINTEF AS | SINTEF | Norway

¹ Replaced by partner 49 as of 1/1/2018.

² Formerly MHG SYSTEMS OY. Terminated on 27/9/2019.

Document Scope

This deliverable focuses on the results of 13 agriculture pilots after Trial 2, highlighting KPIs

(Key Performance Indicators), final outcomes, datasets processed, and tools developed.

Document Structure

This document comprises the following chapters:

Chapter 1 introduces the project and the deliverable.

Chapter 2 offers an overview of individual tasks and pilots.

Chapters 3 to 15 are focused on individual pilots briefly introduced in chapter 2 and previously

described in deliverable D1.1 Agriculture Pilot Definition. The results of pilots include KPIs,

datasets utilisation and components overview.

Chapter 16 is the conclusion of the document.



2 Agriculture pilots summary

Overview

Work Package 1 (WP1) involves 13 agriculture pilots organised into three parallel tasks: T1.2

Precision Horticulture including vine and olives, T1.3 Arable Precision Farming, and T1.4 Subsidies

and insurance. The pilots were described in the deliverable D1.2 within task T1.1.

To allow DataBio tools and services to be adapted to the pilot needs, and to reflect the

experience from the pilots in the further development and integration of DataBio services, the time

frame of the project has been divided into the following stages:

The preparatory stage was the phase in which pilots and their needs were defined through a

collaborative process between the pilot Work Packages and the technical WP. During this period the

first version of the DataBio platform was defined, and the tools and services were adapted to the

needs of the pilots for the next stage, “Trial 1”. In this phase, the partners involved in pilots also

defined the first version of their business plans.

The Trial 1 stage was the period in which pilots focused on using and testing the DataBio tools

and services. Those components were developed or adapted to pilot needs in the previous

preparatory stage. In addition to pursuing various technological or scientific goals, the pilots

also focused on exploring and increasing their market potential. Deliverable D1.2 covers

these first two periods of the project.

In the Trial 2 stage, pilots used the updated DataBio platform and ran the second and final phase

of their experiments. In this stage, pilots also focused on their business goals and target

markets in cooperation with WP7 (Exploitation and Business Planning) partners.

In the final period of the DataBio project, pilots, as explained in this document, were able to

take advantage of their experience and results from the DataBio project and fully develop

their market potential.



Introduction of pilot cases

Table 2: Overview of agriculture pilot cases

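Task (topic) | Subtask | Pilot group | Pilot

T1.2 (A) Precision Horticulture including vine and olives | T1.2.1 | A1: Precision agriculture in olives, fruits, grapes and vegetables | A1.1: Precision agriculture in olives, fruits, grapes

T1.2 (A) Precision Horticulture including vine and olives | T1.2.1 | A1: Precision agriculture in olives, fruits, grapes and vegetables | A1.2: Precision agriculture in vegetable seed crops

T1.2 (A) Precision Horticulture including vine and olives | T1.2.1 | A1: Precision agriculture in olives, fruits, grapes and vegetables | A1.3: Precision agriculture in vegetables -2 (Potatoes)

T1.2 (A) Precision Horticulture including vine and olives | T1.2.2 | A2: Big Data management in greenhouse eco-systems | A2.1: Big Data management in greenhouse eco-systems

T1.3 (B) Arable Precision Farming | T1.3.1 | B1: Cereals and biomass crops | B1.1: Cereals and biomass crops

T1.3 (B) Arable Precision Farming | T1.3.1 | B1: Cereals and biomass crops | B1.2: Cereals and biomass and cotton crops 2

T1.3 (B) Arable Precision Farming | T1.3.1 | B1: Cereals and biomass crops | B1.3: Cereals and biomass crops 3

T1.3 (B) Arable Precision Farming | T1.3.1 | B1: Cereals and biomass crops | B1.4: Cereals and biomass crops 4

T1.3 (B) Arable Precision Farming | T1.3.2 | B2: Machinery management | B2.1: Machinery management

T1.4 (C) Subsidies and insurance | T1.4.1 | C1: Insurance | C1.1: Insurance (Greece)

T1.4 (C) Subsidies and insurance | T1.4.1 | C1: Insurance | C1.2: Farm Weather Insurance Assessment

T1.4 (C) Subsidies and insurance | T1.4.2 | C2: CAP support | C2.1: CAP Support

T1.4 (C) Subsidies and insurance | T1.4.2 | C2: CAP support | C2.2: CAP Support (Greece)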



A1.1: Precision agriculture in olives, fruits and grapes (NP, GAIA, IBM, Fraunhofer)

The Greek pilot focuses on offering smart farming advisory services dedicated to the

cultivation of olives, fruits and grapes, based on a set of complementary monitoring and data

management technologies (IoT, EO data, Big Data analytics). Smart farming services comprise

irrigation, fertilization and pest/disease management advice provided through flexible

mechanisms to farmers or agricultural advisors. The pilot aims to exploit

heterogeneous data, facts and scientific knowledge to facilitate decisions and

their application in the field. It promotes the adoption of Big Data enabled technologies and

collaborates with certified professionals to better manage natural resources, optimize

the use of agricultural inputs, and increase product quality and yields.

A1.2: Precision agriculture in vegetable seed crops (CAC seeds, VITO)

The Italian pilot focuses on the assessment of maturity and optimal harvest date for vegetable

seed crops using satellite images. Assessing the right time for harvesting in sugar beet seed

production is crucial to obtain the best quality in terms of vitality and germination of the seeds

harvested. Today, the decision to start harvesting operations in seed crops is taken by

experienced fieldsmen of CAC seeds according to empirical observations. In this pilot, satellite

observations, provided by VITO, were compared with information from the field, recorded by

CAC’s fieldsmen. The trials in 2017, 2018 and 2019 showed that the satellite-based

greenness index (fAPAR) derived from fused Sentinel-1 and Sentinel-2 data is well suited to assess

the maturity of sugar beet seeds, and a maturity model was set up to estimate the optimal

harvest date directly from the satellite images. In 2019 this information was provided in near real time

to CAC via VITO’s WatchITgrow® web application. The trials for sunflowers and soybeans were

also very promising and a similar maturity model was set up for soybeans.

A1.3: Precision agriculture in potatoes (NB Advies, VITO)

The Dutch pilot is developed by NB Advies in cooperation with VITO (Belgium). In the final stage, the pilot will focus on farmer alerts based on growth model information and satellite imagery. This service will provide farmers with timely and automated identification of problematic spots in potato fields, where crop growth is substantially lagging behind a certain benchmark level. With feedback information from field visits, the DSS could combine a high throughput of field and satellite data with machine learning algorithms. Eventually, it might be able to autonomously explain the causes of field problems to the farmers.
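As an illustration of the alerting idea, the sketch below flags field cells whose growth indicator falls well below a benchmark. It is not the pilot's implementation: the function name `lagging_spots`, the use of the field median as the default benchmark and the 80 % tolerance are assumptions made only for this example.

```python
from statistics import median


def lagging_spots(growth_index, benchmark=None, tolerance=0.8):
    """Return ids of field cells whose growth index lags the benchmark.

    growth_index: dict mapping a cell id to a growth indicator (e.g. fAPAR).
    benchmark: reference level; defaults to the field median (an assumption).
    tolerance: a cell is flagged when its value < tolerance * benchmark.
    """
    if not growth_index:
        return []
    if benchmark is None:
        benchmark = median(growth_index.values())
    threshold = tolerance * benchmark
    return sorted(cell for cell, value in growth_index.items() if value < threshold)


# Hypothetical per-cell values for one potato field.
field = {"c1": 0.72, "c2": 0.70, "c3": 0.45, "c4": 0.74, "c5": 0.69}
alerts = lagging_spots(field)
print(alerts)
```

In practice the benchmark would more plausibly come from the growth model mentioned above than from the field median; the median merely keeps the example self-contained.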

A2.1: Big Data management in greenhouse eco-system (CREA, CERTH)

The pilot was designed to implement Genomic Prediction Models (Genomic Selection, GS) as a solution to technological limitations met with current breeding approaches. Phenotypic breeding and marker-aided crop improvement have been implemented in tandem with good results; yet their impact on agricultural progress has reached a plateau. Indeed, both approaches are seriously impaired by their inability to capture the full package of genetic factors that underlie plant genetic and performance potential. GS technology demonstrated its superiority over previous techniques through its ability to capture all information reflecting the genomic profile that breeders work with to design technologies for improving the quality and quantity of agricultural products. It is in this context that pilot A2.1 was designed. The pilot is run as a collaborative effort between CREA (Italy) and CERTH (Greece). Several genetic models will be implemented. The main problem modelled is the performance of new and unphenotyped vegetable lines, integrating quantitative and population genetics, driven by Big Data streaming from large-scale high-throughput genomic platforms, biochemical analysis and phenotypic data. The technology is expected to significantly improve genetic gain per unit of time and cost, allowing farmers to grow better varieties sooner than with conventional approaches and thereby increase their income.

B1.1: Cereals and biomass crops (TRAGSA-TRAGSATEC-ATOS-IBM)

The Spanish pilot is developed by TRAGSA and TRAGSATEC with the help of ATOS and IBM Israel; Citoliva will also participate in the final stage. This pilot develops accurate agricultural "irrigation maps" and "vigor maps" (using EO data and sensor data as inputs) and sets up an information and management system for early warning of inhomogeneity. This service is a preventive tool for farmers and landowners to avoid production loss and aims to become a powerful system for managing large agricultural areas. The final goal of the pilot is to reduce costs for farmer communities through better exploitation and management of water and energy resources.

B1.2: Cereals, biomass and cotton crops_2 (NP, GAIA, Fraunhofer)

The Greek pilot B1.2 focuses on offering smart farming advisory services dedicated to arable crops (cotton cultivation), based on a set of complementary monitoring and data management technologies (IoT, EO data, Big Data analytics). Smart farming services are offered as irrigation advice through flexible mechanisms to the farmers or the agricultural advisors. The pilot aims to exploit heterogeneous data, facts and scientific knowledge to facilitate decisions and their application in the field. It promotes the adoption of Big Data enabled technologies and collaborates with certified professionals to better manage natural resources, specifically the use of fresh water.

B1.3: Cereal and biomass crops_3 (CREA, NOVAMONT, VITO, INFAI)

The pilot B1.3 was designed to implement remote sensing (satellite imagery, fAPAR, NDVI), IoT farm telemetry, and proximal sensor network-based Big Data technologies for biomass crop monitoring, prediction and management, in order to sustainably increase farming productivity and quality while minimizing farming and environmental risks. Biomass crops of interest included biomass sorghum and cardoon, which can be used for several bioeconomy-relevant purposes (e.g. biofuel, fiber and biochemicals). The IoT farm telemetry technology, implemented in preliminary trials and part of the first trials, was ultimately found to be poorly adapted to biomass sorghum, as the hardware was susceptible to damage caused by wild rodents and to several software glitches that were harmful to pilot operations. We are envisaging replacing IoT with a VIS-NIR machine in the 2019 trials, with similar expected output in terms of analytics and technological support for farming operations. The pilot secured the participation of private farmers and/or farming cooperatives. In collaboration with InfAI, CREA was able to extend crop monitoring to foliar diseases. The first results of the pilot are encouraging, as there is good agreement between satellite data and crop phenology. The machine learning techniques showed promising inferences with high predictive accuracy of biomass sorghum yields early on (up to 6 months before harvesting), with important business ramifications, particularly in terms of within-season decision support for the interested parties.

B1.4: Cereal, biomass and cotton crops 4 (LESP, UWB, PSNC, NB Advies)

The pilot aims to develop a platform for mapping crop vigor status using EO data (Landsat, Sentinel) as a support tool for variable rate application (VRA) of fertilizers and crop protection. This includes identification of crop status, mapping of spatial variability and delineation of management zones. The work was supported by the development of a platform for automatic downloading of Sentinel-2 data and automatic atmospheric correction. Currently, Lesprojekt is ready to offer commercial satellite data processing services for any farm in the Czech Republic. The pilot also focused on transferring the Czech LPIS into the FOODIE ontology and on developing effective tools for querying data. This work was done together with PSNC, and it currently supports open access to anonymous LPIS data through the FOODIE ontology and secure access to farm data.

The main focus of the pilot is to monitor cereal fields using high-resolution satellite imaging data (Landsat 8, Sentinel-2) and to delineate management zones within the fields for variable rate application of fertilizers. The main goal is to offer farmers a solution in the form of a web GIS portal, where users can monitor their fields from EO data for a specified period, select cloudless scenes and use them for further analysis. This analysis includes unsupervised classification into a defined number of classes to identify the main zones, and the generation of prescription maps for variable rate application of fertilizers or crop protection products based on the mean doses defined by farmers in the web GIS interface.
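The zoning and prescription steps described above can be sketched as follows. This is a minimal illustration and not the pilot's processing chain: it assumes a one-dimensional k-means on per-pixel vegetation index values as the unsupervised classifier, and hypothetical per-zone factors (`zone_factors`) that are rescaled so that the pixel-weighted average equals the mean dose entered by the farmer.

```python
def kmeans_1d(values, k, iterations=50):
    """Cluster scalar values into k classes; returns (labels, centers).

    Initial centers are taken in ascending order from the sorted values,
    so label 0 corresponds to the lowest-vigor zone.
    """
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign every pixel to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def prescription(labels, mean_dose, zone_factors):
    """Dose per zone, scaled so the pixel-weighted mean equals mean_dose."""
    counts = [labels.count(z) for z in range(len(zone_factors))]
    weighted = sum(f * n for f, n in zip(zone_factors, counts)) / len(labels)
    return [mean_dose * f / weighted for f in zone_factors]


# Hypothetical NDVI values for nine pixels of one field.
ndvi = [0.31, 0.35, 0.33, 0.52, 0.55, 0.50, 0.78, 0.80, 0.75]
zones, centers = kmeans_1d(ndvi, k=3)
# Assumed strategy: more fertilizer on low-vigor zones, less on high-vigor ones.
doses = prescription(zones, mean_dose=120.0, zone_factors=[1.2, 1.0, 0.8])
print(zones, [round(d, 1) for d in doses])
```

Whether low-vigor zones should receive more or less input depends on the agronomic strategy; the factors above are placeholders that a farmer or advisor would set.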

B2.1: Machinery management (LESP, ZETOR, FEDERUNACOMA, PSNC)

This pilot is mainly focused on collecting telematic data from tractors and other farm

machinery to analyse and compare with other farm data. The main goal is to collect and

integrate data and receive comparable results. A challenge associated with this pilot is that a

farm may have tractors and other machinery from manufacturers that use different telematic

solutions and data ownership/sharing policies.

C1.1: Insurance (Greece) (NP, CSEM, Fraunhofer)

The main focus of the pilot is to evaluate a set of tools and services dedicated to the agriculture insurance market that aim to eliminate the need for on-the-spot checks for damage assessment and promote rapid payouts. The pilot concentrates on fusing heterogeneous data (EO data, field data) for the assessment of damages at field level.

C1.2: Farm Weather Insurance Assessment (e-GEOS, NB Advies, MEEO, VITO, CSEM, EXUS)



The objective of the proposed pilot is the provision and assessment, on a test area, of services for the agriculture insurance market, based on the usage of Copernicus satellite data series, integrated with meteorological data and other available ground data. For the risk assessment phase, the integrated usage of historical meteorological series and satellite-derived indices, supported by proper modelling, will allow EO-based products to be tuned in support of risk estimation. For damage assessment, the operational adoption of services based on remotely sensed data will allow the optimization and tuning of new insurance products based on objective parameters, such as maps and indices derived from EO data, and a strong reduction of ground surveys, with a positive impact on insurance costs and a reduction of the premium to be paid by the farmers.

The key stakeholder of the pilot is the insurance company, which wants to:

• determine the regional spreading of risks for different types of bad weather event

(hail, heavy rain), to evaluate their insurance portfolio,

• determine temporal trend for different types of bad weather event (hail, heavy rain,

drought), to estimate possible influence of climate change on crop growth,

• determine the actual risk per crop on field to support the pricing of the insurance

package,

• assess the damage caused by a bad weather event, to ensure non-erroneous

compensation to farmers.

Farmers can nevertheless be considered secondary users and beneficiaries of the services, because they need to view the risk level for heavy rain and drought at field level (optionally crop specific) in order to evaluate the business case for prevention measures. The pilot activities will be performed in the south of the Netherlands, in an area of 1,500,000 ha, targeting high-impact crop types.

C2.1: CAP Support (e-GEOS, Terrasigna, Tragsa)

The objective of the pilot is the provision of products and services, based on specialized, highly automated processors processing Big Data, in support of the CAP and relying on multi-temporal series of free and open EO data, with a focus on Copernicus Sentinel-2 data. Products and services will be tuned in order to fulfil requirements from the 2015-20 EU CAP policy and will include general information layers and indicators on EU territory with different levels of aggregation and detail, up to farm level.

The proposed pilot project has been tailored to the specific needs of two end users, one operating at national level (Romania Agriculture Ministry) and the other operating at regional level (AVEPA Paying Agency), in one of the most important agricultural regions in Italy. Services provided by the pilot will rely on the processing of Big Data, such as those provided by the Copernicus Sentinel-1 and Sentinel-2 satellites, collecting SAR and multispectral image data with a 10-day frequency (the frequency will be increased to 5 days when the full constellation of Sentinel-2A, Sentinel-2B and Sentinel-1B is fully available).



The pilot services will demonstrate the implementation of functionalities that could be used for supporting the subsidy process in verifying specific requests set by the EU CAP. In particular, services in support of the control of direct payments for the improved use of natural resources will be addressed. In fact, to receive the decoupled green payment per ha, farmers must fulfil specific criteria, e.g. crop diversification.

Through the subsidy collection process, the compliance of agricultural parcel usage must be verified against the farmers’ declarations. Therefore, services will:

• Identify the different crops present inside a single farm when the total declared surface exceeds 10 ha. This is because the CAP requires crop diversification, such that farmers should cultivate at least two to three different crops. The service will be based on the management of optical satellite data together with farmer declaration information and limited ground measurements, if any, and will provide an indication of the farmer's compliance/non-compliance.

• Identify parcels (monitoring objects) over which the declared crop is different from the one extracted from the EO models (outliers). The service is based on Sentinel data and machine learning methods for the description of the crop and on analytic methods for the identification of the outliers. The service will allow Big Data analytics to be performed on various crop indicators at parcel level.
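The two checks listed above can be illustrated with a short sketch. It is deliberately simplified and not the pilot's service: the 10 ha threshold comes from the text, while the minimum of two crops, the function names and the parcel identifiers are assumptions for the example (the actual greening rules also constrain crop shares, which are ignored here).

```python
def diversification_check(parcels, min_area_ha=10.0, min_crops=2):
    """parcels: list of (crop, area_ha) pairs, e.g. from EO-based classification.

    The rule applies only when the total area exceeds min_area_ha.
    """
    total = sum(area for _, area in parcels)
    crops = sorted({crop for crop, area in parcels if area > 0})
    applicable = total > min_area_ha
    compliant = (not applicable) or len(crops) >= min_crops
    return {"total_ha": total, "crops": crops,
            "applicable": applicable, "compliant": compliant}


def declaration_outliers(declared, classified):
    """Return parcel ids whose declared crop differs from the classified one.

    Parcels without a classification result are skipped, not flagged.
    """
    return sorted(pid for pid, crop in declared.items()
                  if pid in classified and classified[pid] != crop)


# Hypothetical farm with two crops on 12.5 ha.
farm = [("wheat", 8.0), ("maize", 4.5)]
print(diversification_check(farm))

declared = {"p1": "wheat", "p2": "maize", "p3": "soy"}
classified = {"p1": "wheat", "p2": "wheat"}  # p3 not yet classified
print(declaration_outliers(declared, classified))
```

A flagged parcel is only an indication that the declaration and the EO-derived crop disagree; a follow-up check would still be needed before any decision on compliance.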

C2.2: CAP Support (Greece) (NP, GAIA, CSEM)

This Greek pilot C2.2 targets the evaluation of a set of EO-based services designed to support specific needs of the CAP value chain stakeholders. The pilot services rely on innovative tools and complementary technologies that will sustain the interconnection with IoT infrastructures and EO platforms, the collection and ingestion of spatiotemporal data, the multidimensional deep data exploration and modelling and the provision of meaningful insights, thus supporting the simplification and improving the effectiveness of the CAP. The pilot activities aim to support the farmer during the submission of the aid application, more specifically leading to improved “greening” compliance. The ambition of the current pilot is to deal effectively with CAP demands for agricultural crop type identification, systematic observation, tracking and assessment of eligibility conditions over a period, fully aligned with the main concepts of the new EU agricultural monitoring approach.



3 Pilot 1 [A1.1] Precision agriculture in olives, fruits, grapes

Pilot overview

The main focus of this pilot is to offer smart farming advisory services referring to the

cultivation of olives, fruits and grapes, based on a set of complementary monitoring and data

management technologies (IoT, EO data, Big Data analytics). Smart farming services comprise

irrigation, fertilization and pest/disease management advice and are provided through

flexible mechanisms to the farmers or the agricultural advisors. The pilot aims at

exploiting heterogeneous data, facts and scientific knowledge to facilitate decisions and field

applications. It promotes the adoption of Big Data enabled technologies and collaborates with

certified professionals to better manage the natural resources, optimize the use of

agricultural inputs and lead to increased product quality and yields. NP is leading the pilot

activities with the support of GAIA EPICHEIREIN, IBM and Fraunhofer for the execution of the

full lifecycle of the pilot. The pilot activities are being performed in three (3) pilot sites in

Greece, namely Chalkidiki (olive trees) – 600ha, Stimagka (grapes) – 3000ha and Veria

(peaches) – 10000ha (Figure 1).

In order to support the business expansion of the Big Data enabled technologies that are

introduced within the present DataBio pilot, NP and GAIA EPICHEIREIN have already

established an innovative business model that allows a swift market uptake. With no upfront

infrastructure investment costs and a subscription fee proportionate to a parcel’s size and

crop type, each smallholder farmer can now easily participate and benefit from the

provisioned advisory services. Moreover, as more than 70 agricultural cooperatives are

shareholders of GAIA EPICHEIREIN, there is a clear market presence and a

strong liaison with end-user communities for introducing the pilot innovations and promoting

the commercial adoption of DataBio’s technologies.

Figure 1: Pilot A1.1 high-level overview



Summary of pilot before Trial 2

The pilot completed the first round of trials during Trial 1. It effectively demonstrated how Big Data enabled technologies and smart farming advisory services can offer the means for better managing natural resources and for optimizing the use of agricultural inputs. These assumptions were validated through a set of pilot KPIs, the majority of which met (and in some cases even exceeded) the targeted expectations (documented in D1.2). This was achieved because farmers and agricultural advisors showed a collaborative spirit and followed the advice generated by DataBio’s solutions. As multiple parameters (climate and crop type related) affect agricultural production, it has been shown that a “one-size-fits-all” solution is not applicable and that several factors need to be taken into consideration when interpreting the trial results (e.g. biennial bearing phenomenon in olive trees, heavy seasonal/regional rains, multi-year fertilization strategies, etc.).

Preparation and execution of Trial 2

Trial 2 timeline

The following roadmap applies for all three (3) pilot sites (cultivation of olives in Chalkidiki,

cultivation of grapes in Stimagka, cultivation of peaches in Veria) of this pilot (Figure 2).

Figure 2: Pilot A1.1 timeline

Preparation for Trial 2

The following work was conducted by NP, as part of the preparatory work for Trial 2. As the requirements in terms of sensors deployed for in-the-field usage differ between pilot sites, it became obvious that several adaptations were necessary with respect to C13.03 and the way data were represented for both cloud-based storage and GAIAtron station configuration. More specifically, all relational and EAV (Entity-Attribute-Value) data representations were adapted to the more flexible and scalable JSON (JavaScript Object Notation) format, which performs better in a dynamic IoT measuring environment. JSON has gradually become the standard format for collecting and storing semi-structured datasets originating from IoT devices. Adopting a JSON format for modelling IoT data streams allows further processing, parsing, integration and sharing of data collections in support of system interoperability, through the adoption of well-established and favoured linked-data approaches (JSON-LD3).
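To make the change of representation concrete, the sketch below folds EAV rows into one JSON document per station reading. The station identifiers, attribute names and document layout are hypothetical and chosen only for illustration; they do not reproduce the actual GAIAtron or C13.03 schema.

```python
import json
from collections import defaultdict


def eav_to_documents(rows):
    """Fold EAV rows into one document per entity.

    rows: iterable of (entity, attribute, value), where the entity is a
    (station_id, timestamp) pair. Stations may report different attributes,
    which is where a fixed relational layout becomes awkward.
    """
    grouped = defaultdict(dict)
    for (station, timestamp), attribute, value in rows:
        grouped[(station, timestamp)][attribute] = value
    return [
        {"station": station, "timestamp": timestamp, "measurements": measurements}
        for (station, timestamp), measurements in sorted(grouped.items())
    ]


# Hypothetical EAV rows from two stations with different sensor sets.
rows = [
    (("GT-001", "2019-06-10T10:00:00Z"), "air_temperature_c", 24.1),
    (("GT-001", "2019-06-10T10:00:00Z"), "rainfall_mm", 0.0),
    (("GT-002", "2019-06-10T10:00:00Z"), "soil_moisture_pct", 31.5),
    (("GT-002", "2019-06-10T10:00:00Z"), "leaf_wetness", 1),
]
docs = eav_to_documents(rows)
print(json.dumps(docs, indent=2))
```

Each document carries only the attributes its station actually measures, so adding a new sensor type requires no schema change.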

User Interface integration was performed so that the farm management portal (holding all

data of agronomic value and the embedded DSS serving as the endpoint for providing the

advisory services) is integrated with the farm electronic calendar (the endpoint where the

farmer or the agricultural advisor ingests information to the system regarding the applied

cultivation practices, field level observations, sampling, etc.). Both these tools were

developed using the component C13.01. Integration activities were conducted in order to

offer a seamless user experience and allow the user to carry out his/her intended

operations without going back and forth across different systems.

Figure 3: Screenshot of the unified UI developed for A1.1 Trial 2. The red menu item indicates farm log functionalities while the orange menu item the farm management functionalities respectively.

A new mobile application, namely “gaiasense Field Collect”, was developed so that field-level data collection can be performed through an Android-powered device. Lessons learnt from Trial 1 indicated that portable smart devices would make it easier for the farmer or the agricultural advisor to ingest data into the system (farm and eye data dimensions as indicated in Figure 1). The application was implemented to support several functionalities, presented in Figure 4, such as:

1. detailed planning and control of the process of trapping and monitoring the population and the spread of insect infestation within a crop. Specifically, farmers are able to record insect infestation directly in the field with the help of a smartphone and use these data to control the damage caused by pests more effectively while reducing the amount of insecticides released into the soil,

2. recording of the phenological stage of the cultivation at the time of the field inspection,

3. recording of soil samples from points within the field, irrigation measurements, and cultivation symptoms, mainly from pests and diseases.

3 https://json-ld.org/

Figure 4: Screenshots of the android app used for collecting farm data

IBM extended the first event-driven implementation to accommodate one (1) additional model/rule for peach disease monitoring. In total, the current implementation monitors one (1) pest and one (1) disease breakout for each pilot site (6 scientific crop protection models in total), namely:

• Spilocaea oleaginea and Bactrocera oleae (olives)

• downy mildew and Lobesia botrana (grapes)

• Grapholita molesta and curl leaf (peaches)

PROTON performs a sophisticated temporal analysis exploiting the numerical output (risk indicator) of GAIA Cloud’s SmartFarm services, which calculate the risk associated with disease and pest breakouts using raw sensor measurements. PROTON results are sent via email back to NEUROPUBLIC at specified intervals (e.g. once a week) for integration and evaluation. For Trial 2, IBM moved PROTON’s running instance to new and more stable dockerised endpoints and server infrastructure.
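The idea of warning ahead of an alarm by reasoning over the recent history of a risk indicator can be sketched as below. This is not PROTON's rule set or API: the rule (a strictly rising indicator over a short window that has passed a pre-alarm level but not yet the alarm level), the thresholds and the function name are assumptions for illustration only.

```python
def early_warnings(series, alarm=0.8, pre_alarm=0.5, window=3):
    """Return timestamps at which an early warning would be raised.

    series: list of (timestamp, risk) pairs ordered by time, risk in [0, 1].
    A warning is raised when the last `window` readings are strictly rising
    and the latest one lies between pre_alarm and alarm.
    """
    warnings = []
    for i in range(window - 1, len(series)):
        recent = [risk for _, risk in series[i - window + 1 : i + 1]]
        rising = all(a < b for a, b in zip(recent, recent[1:]))
        if rising and pre_alarm <= recent[-1] < alarm:
            warnings.append(series[i][0])
    return warnings


# Hypothetical daily risk indicator for one pest at one pilot site.
risk = [("d1", 0.20), ("d2", 0.30), ("d3", 0.45),
        ("d4", 0.60), ("d5", 0.70), ("d6", 0.85)]
print(early_warnings(risk))
```

In this made-up series the alarm level is only reached on day d6, while the trend-based rule already fires on d4 and d5, which is the kind of lead time an early-warning layer aims for.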

The preparatory work conducted by FRAUNHOFER for Trial 2 concentrated on discussing how existing analytic services could be easily integrated into a web-based analytics platform. The starting point was the solution developed during Trial 1. During this discussion, a variety of ideas were developed on how different services could be integrated into a single platform that is also able to cope with multiple data sources, to fulfil the needs of specific use cases. One of the major challenges of such efforts is to reduce the complexity of integration. Principles of modern architecture styles such as Self-contained Systems (SCS) (https://scs-architecture.org) or Microservices were considered to promote the separation into independent components. Each component consists of capabilities to access data, process or analyse it and consequently visualize the result. The integration itself must be done at the UI layer. Following this approach provides more flexibility and eventually makes it possible to envisage a platform that enables users to build views for custom analytics tasks composed of a variety of components. The horizontal impact of this stage can provide solutions for multiple scenarios, spanning from Smart Farming to CAP support and Agri-Insurance.

The implementation of Trial 2 focuses primarily on the integration of external services. A

variety of visual analytic tools are included to allow efficient exploration of available data. The

integration of services and data sources is done using well-defined RESTful interfaces.

Trial 2 execution

During Trial 2 the following actions have been performed by the partners involved in the pilot

activities:

By M26, the growing season starts for all pilot sites. Moreover, DataBio platform v2 for the

pilot is fully operational and involves:

• Offering the farmers and the agricultural advisors technological tools (the unified UI and the “gaiasense Field Collect” Android app) with which to provide feedback, measurements, samplings (e.g. soil sampling for the fertilization advice), observations about pests and diseases, and detailed data regarding the farming practices. With respect to the farming practices in particular, information needs to be ingested into the system at regular intervals (once a week). As the farming ecosystem is complex, it is necessary to capture this information in detail, in order to shape a holistic view of the monitored parcels. NP was in charge of supervising the data collection process. Moreover, certified agricultural advisors are starting to use the aforementioned main pilot UIs in order to access the full set of collected data (in situ agro-climate, EO-based, crowdsourced, modelled, machine-generated), evaluate it and offer data-driven advice to the farmers towards better resource management, improved products and yields (more descriptions and figures can also be found in Deliverable D1.2). The advisory services provided at the three (3) pilot sites are shown in Table 3.

• PROTON starts processing the numerical output of NP’s GAIA SmartFarm services, which is essentially a risk indicator for specific pest/disease breakouts, in order to offer even earlier alerting/warning before conditions reach critical states.

Indicative figures from the pilot sites can be found in Figures 5 - 9.




Table 3: Advisory services in pilot A1.1.

| Service | Chalkidiki Pilot (Olives) | Veria Pilot (Peaches) | Stimagka Pilot (Grapes) |
|---|---|---|---|
| Irrigation | + | + | + |
| Fertilization | + | + | - |
| Crop Protection | + (exploiting scientific models against 1 pest and 1 disease) | + (exploiting scientific models against 3 pests and 4 diseases) | + (exploiting scientific models against 2 pests and 3 diseases) |

Figure 5: Parcel monitoring at Chalkidiki pilot site indicating intra-field variations in terms of vegetation index (NDVI) and cross-correlations among the latter with: a) ambient temperature (°C) and b) rainfall (mm)



Figure 6: Parcel monitoring at Stimagka pilot site indicating intra-field variations in terms of vegetation index (NDVI) and cross-correlations among the latter with a) NDVI from 2018 cultivating period and b) rainfall (mm) from 2018 and 2019 cultivating periods

Figure 7: Irrigation monitoring at a Veria pilot parcel showing two (2) correct irrigations (water drop icons) after following the advisory services during 2019 cultivating period. The impact of rainfalls in the soil water content is obvious (~10/6) and if translated correctly can prevent unnecessary irrigations



Figure 8: Crop protection monitoring at a Veria pilot parcel showing four (4) correct sprays (spraying icons) after following the advisory services and the indications for high curl leaf risk during 2019 cultivating period. The dashed vertical lines indicate critical crop phenological stages

Figure 9: Fertilization advice for a Chalkidiki pilot parcel

By M28, a preliminary architecture for FRAUNHOFER’s analytics platform had been drafted. The platform was the main discussion topic during the M28 DataBio Thessaloniki Codecamp, hosted in NEUROPUBLIC’s Northern Greece offices with the participation of other DataBio partners involved in the WP1 pilots led by NEUROPUBLIC. Furthermore, the generalization and simple adaptation to other scenarios was discussed intensively.

By M34, the growing season ends at all pilot sites and final KPI measurements are collected.

More specifically:

• 35 reports in total have been sent from IBM to NEUROPUBLIC offering PROTON’s CEP results during the growing season. These reports were sent at regular intervals (once a week) and provided flags about pest/disease breakouts in the pilot areas. As trained, the system provides warnings from ~1.5 to ~4 days before the original alarm for a pest breakout and several hours before the alarm for a disease breakout. These warnings were evaluated by certified agricultural advisors and contributed to the decision-making process regarding crop protection.

• With regular discussions with the farmers and the agronomists/agricultural advisors

involved in the pilot activities, final KPI measurements and feedback were collected

and can be found in Section 3.5.2. This work was conducted by NP and GAIA

EPICHEIREIN.

Trial 2 results

In Trial 2, the applied technologies and pipelines matured further and reached their expected TRL (Technology Readiness Level). The farmers and their agricultural advisors continued (for a second year) to benefit from irrigation, fertilization and pest/disease management advice aiming to facilitate the decision-making process and optimize the use of agricultural inputs. The collected KPIs validated the pilot assumptions. The aggregated results of the pilot’s Trial 2 are outlined in Figure 10.

Figure 10: Pilot A1.1 aggregated findings

It is effectively shown that in certain cases (irrigation) the results exceeded the initial targets for input cost reduction. This is because the farmers a) showed a collaborative spirit and adapted their farming practices using all the advice offered and b) benefited from the weather conditions (rainfall during June and July 2019), which reduced the fresh water requirements during critical phenological stages. The same weather conditions were the underlying reason for narrowly missing the targeted crop protection goals: the farmers chose to conduct additional proactive sprays to secure their production against threatening situations (e.g. fruit mucilage presence at the stage of swelling at the Veria pilot site). In terms of fertilization, the exhibited deviation (under-fertilization) is part of the farmers’ overall strategy, which derives from the fact that fertilization advice is offered with a two-to-three-year application window. This gives them a window for taking fertilization measures, and it is expected that this deviation will be acknowledged and will significantly shape the fertilization strategy over the next cultivating periods.

Components, datasets and pipelines

DataBio component deployment status

| Component code and name | Purpose for pilot | Deployment status | Component location |
|---|---|---|---|
| C13.01 Neurocode (NP) | Neurocode allows the creation of the main pilot UIs in order to be used by the end-users (farmer, agronomists) and offer smart farming services for optimal decision making | deployed | NP Servers |
| C13.03 GAIABus DataSmart Real-time streaming Subcomponent (NP) | Real-time data stream monitoring for NP’s GAIAtrons Infrastructure installed in all three pilot sites; real-time validation of data; real-time parsing and cross-checking | deployed | NP Servers |
| C19.01 Proton (IBM) | Early warning system for pest/disease management using temporal reasoning (PROTON) for olives, grapes and peaches | deployed | IBM’s lnx-blue.sl.cloud9.ibm.com |
| C04.02 – C04.04 Georocket, Geotoolbox, SmartVis3D (Fraunhofer) | Back-end system for Big Data preparation, handling fast querying and spatial aggregations (data courtesy of NP); front-end application for interactive data visualization and analytics | deployed | Fraunhofer Servers |

Data Assets

| Data Type | Dataset | Dataset original source | Dataset location | Volume (GB) | Velocity (GB/year) |
|---|---|---|---|---|---|
| Sensor measurements (numerical data) and metadata (timestamps, sensor id, etc.) | Gaiasense field. Dataset composed of measurements from NP’s telemetric IoT agro-climate stations called GAIAtrons for the pilot sites. | NEUROPUBLIC | GAIA Cloud (NP’s servers) | Several GBs | Configurable collection and transmission rates for all GAIAtrons. >20 GAIAtrons fully operational at the pilot sites collecting >30 MB of data per year each with current configuration (measurements every 10 minutes) |
| EO products in raster format and metadata | Dataset comprised of remote sensing data from the Sentinel-2 optical products (5 tiles) | ESA (Copernicus Data) | GAIA Cloud (NP’s servers) | >6000 | >1900 |
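The sensor figures in the table can be put into perspective with a short back-of-the-envelope calculation. It uses only the lower bounds quoted above (20 stations, 30 MB per station per year, one measurement every 10 minutes) and assumes decimal units (1 MB = 10^6 bytes), so the results are rough orders of magnitude rather than measured values.

```python
# Lower-bound figures taken from the Data Assets table.
STATIONS = 20                 # ">20 GAIAtrons"
MB_PER_STATION_YEAR = 30      # ">30 MB of data per year each"
INTERVAL_MINUTES = 10         # "measurements every 10 minutes"

# Number of transmissions per station per year.
readings_per_year = (60 // INTERVAL_MINUTES) * 24 * 365

# Average payload per transmission, assuming 1 MB = 10**6 bytes.
bytes_per_reading = MB_PER_STATION_YEAR * 10**6 / readings_per_year

# Yearly volume for the whole station network, in GB.
fleet_gb_per_year = STATIONS * MB_PER_STATION_YEAR / 1000

print(readings_per_year, round(bytes_per_reading), fleet_gb_per_year)
```

Under these assumptions the IoT stream stays below 1 GB per year for the three pilot sites, whereas the Sentinel-2 archive grows by more than 1900 GB per year, so the EO data dominate the storage requirements of the pilot.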

Exploitation and Evaluation of pilot results

Pilot exploitation based on results

NP and GAIA EPICHEIREIN already launched their Smart Farming program, called “gaiasense” (http://www.gaiasense.gr/en/gaiasense-smart-farming), in 2013. It aims to establish a nationwide network of telemetric stations with agri-sensors and use the data to create a wide range of smart farming services for agricultural professionals.

Within DataBio, the quality of the provided services benefited greatly from the collaboration with leading technological partners such as IBM and Fraunhofer, which specialize in the analysis of Big Data. Moreover, feedback from the end-users and lessons learnt from the pilot execution helped fine-tune, and will continue to shape, the suite of dedicated tools and services, thus facilitating the penetration of “gaiasense” in the Greek agri-food sector.

The success of the pilot was demonstrated at high-profile events⁴ (Figure 11) and in online articles⁵ that promoted the findings of the pilot and, consequently, the wider adoption of Big Data enabled smart farming advisory services in the coming years.

The sustainability of NP’s DataBio-enhanced smart farming services after the end of the project is achieved through: a) the commercial launch and market growth of “gaiasense” and b) participation in other EU and national R&D initiatives. This will allow the outcomes of the project to be continuously evolved and validated, by working with both new and existing (to DataBio) user communities and by applying its innovative approach to new and existing (again to DataBio) areas/crops.

4 http://www.gaiasense.gr/en/a-greek-innovation-gaiasense-evolves

5 https://www.ypaithros.gr/en/yannis-olive-grove-reduction-by-30-in-production-costs-and-parallel-increase-of-sales/

Figure 11: Representatives of E.C., Farm Europe and other participants of the pilot visit in Stimagka

KPIs

| KPI short name | KPI description | Goal description | Base value | Target value | Measured value | Unit of value | Comment |
|---|---|---|---|---|---|---|---|
| A1.1_1 | Reduction in the average cost of spraying per hectare for the three (3) crop types following the advisory services at a given period. | | Chalkidiki (olive trees): 250, Stimagka (grapes): 990, Veria (peaches): 810 | Chalkidiki (olive trees): 213, Stimagka (grapes): 955, Veria (peaches): 770 | Chalkidiki (olive trees): 219, Stimagka (grapes): 963, Veria (peaches): 781 | euros/ha | As a consequence of the rainy months of June and July 2019 in Greece, proactive sprays were conducted to treat mainly fungal diseases (for example in Veria, peaches were sprayed at that time mainly to treat fruit mucilage at the stage of swelling) |
| A1.1_2 | Reduction in the average number of unnecessary sprays per farm for the three (3) crop types following the advisory services at a given period. | | Chalkidiki (olive trees): 5, Stimagka (grapes): 4, Veria (peaches): 4 | Chalkidiki (olive trees): 1, Stimagka (grapes): 1, Veria (peaches): 1 | Chalkidiki (olive trees): 1.4, Stimagka (grapes): 1.8, Veria (peaches): 1.6 | number of sprays | |
| A1.1_3 | Reduction in the average cost of irrigation per hectare for the three (3) crop types following the advisory services at a given period. | | Chalkidiki (olive trees): 330, Stimagka (grapes): 3030, Veria (peaches): 870 | Chalkidiki (olive trees): 230, Stimagka (grapes): 2130, Veria (peaches): 610 | Chalkidiki (olive trees): 198, Stimagka (grapes): 2007, Veria (peaches): 497 | euros/ha | |
| A1.1_4 | Reduction in the amount of fresh water used per hectare following the advisory services at a given period | | Chalkidiki (olive trees): 817, Stimagka (grapes): 1868, Veria (peaches): 1703 | Chalkidiki (olive trees): 572, Stimagka (grapes): 1308, Veria (peaches): 1192 | Chalkidiki (olive trees): 492.4, Stimagka (grapes): 1,232, Veria (peaches): 971.18 | m3/ha | A significant reduction in the cost of irrigation has been witnessed, resulting both from the farmers following the offered Big Data enabled advisory services and from the many heavy rainfalls of June and July 2019 |
| A1.1_5 | Reduction in the nitrogen use per hectare following the advisory services at a given period | | Chalkidiki (olive trees): 230, Veria (peaches): 220 | Chalkidiki (olive trees): 210, Veria (peaches): 140 | Chalkidiki (olive trees): 161, Veria (peaches): 61.83 | kg/ha | |
| A1.1_6 | Quantify % divergence in the cost of the applied fertilization strategy compared to best practices per hectare (agronomist advice) | | Chalkidiki (olive trees): -40 (under fertilization), Veria (peaches): +20 | Chalkidiki (olive trees): -14, Veria (peaches): +7 | Chalkidiki (olive trees): -11.27 (under fertilization), Veria (peaches): -44 | %/ha | |
| A1.1_7 | Increase in production | | Chalkidiki (olive trees): 10375, Stimagka (grapes): 17117, Veria (peaches): 49825 | Chalkidiki (olive trees): 11205, Stimagka (grapes): 18436, Veria (peaches): 53811 | Chalkidiki (olive trees): 7,010, Stimagka (grapes): 18,011, Veria (peaches): 52,044 | kg/ha | |
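To make the KPI figures easier to compare, the relative reduction from base to measured value, and whether the target was reached, can be derived directly from the table. The following is a minimal sketch for two of the cost KPIs (A1.1_1 and A1.1_3); the helper names `reduction_pct` and `target_met` are illustrative and not part of the pilot's tooling, and the "lower is better" rule applies only to these reduction-type KPIs.

```python
# (base, target, measured) values copied from the KPI table, in euros/ha.
kpis = {
    "A1.1_1 spraying cost": {
        "Chalkidiki (olive trees)": (250, 213, 219),
        "Stimagka (grapes)": (990, 955, 963),
        "Veria (peaches)": (810, 770, 781),
    },
    "A1.1_3 irrigation cost": {
        "Chalkidiki (olive trees)": (330, 230, 198),
        "Stimagka (grapes)": (3030, 2130, 2007),
        "Veria (peaches)": (870, 610, 497),
    },
}


def reduction_pct(base, measured):
    """Relative reduction from the base value, in percent."""
    return 100.0 * (base - measured) / base


def target_met(target, measured):
    """For cost KPIs a lower measured value is better."""
    return measured <= target


for kpi, sites in kpis.items():
    for site, (base, target, measured) in sites.items():
        print(f"{kpi} | {site}: {reduction_pct(base, measured):.1f}% reduction, "
              f"target met: {target_met(target, measured)}")
```

With these inputs the spraying-cost targets are narrowly missed at all three sites, while the irrigation-cost targets are exceeded, which is consistent with the weather-related comments in the table.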



4 Pilot 2 [A1.2] Precision agriculture in vegetable seed crops

Pilot overview

The pilot’s main goal is to monitor the maturity of seed crops of different species with satellite imagery, in order to support farmers and fieldsmen in assessing seed maturity and the optimal harvesting time and thereby achieve maximum quality of their production. On-site observations of crop development and harvest date are matched with information derived from satellite images.


Preparatory Stage

In the first growing season (2017), the monitored crop was sugar beet for seed production, with the aim of tuning EO with “in situ” crop monitoring and establishing a correspondence between the empirical assessment and the parameters derived from the satellite sensors. In case of positive feedback, the trial would be expanded to a wider range of seed crops in the next stage.

In May 2017, five sugar beet fields (14,79 hectares in total) located in the Emilia-Romagna region were selected by CAC seeds for the trial. The fields were monitored with the web application WatchITgrow®, which was initially developed by VITO for potato monitoring and yield prediction in Belgium and was adapted in DataBio WP5 to monitor other crops (sugar beets in this case) in other regions (Italy).

Crop monitoring was performed with Sentinel-2 satellite images, from which “greenness” maps of the target fields were derived throughout the season. These “greenness” maps show the fraction of absorbed photosynthetically active radiation (fAPAR), a measure of the crop’s primary productivity. fAPAR is often used as an indicator of the state and evolution of crop cover. Low fAPAR values indicate that no crop is growing on the field (bare soil, fAPAR = 0). When the crop emerges, the index increases until the crop has reached its maximum growing activity (fAPAR = 95-100%); its values then decrease again until harvest. From this “crop growth curve”, information on phenology and crop development can be retrieved and a model can be designed to decide on the right moment for harvesting.

The results of the first trials were promising:

• Differences in maturity between sugar beet fields and variability within individual

fields were well visible from satellite greenness index maps.

• Analysis of the growth curve and discussions with the fieldsmen made CAC seeds and

VITO confident that the greenness index can be used to check when the sugar beet

seeds are ready to be harvested.



Based on these promising results for sugar beets it was decided to extend the EO and the in

situ monitoring in the growing season 2018 to a larger number of sugar beet seed production

fields and to include new seed crops into the trial.

Trial 1 execution and results

In 2018 the EO and field monitoring was extended to approximately 90 fields of seed crops (Figure 12), most of which were sugar beet fields. The aim of the sugar beet monitoring in 2018 was to confirm the correlation between the fAPAR “greenness” index and seed maturity, which had appeared fairly strong in the preparatory stage in 2017. Furthermore, the observation was extended to several other seed crops to assess whether the index could be used to determine the right maturity stage and the consequent harvesting operations, instead of the empirical methods used by farmers or the experience of the fieldsmen.

Figure 12: A1.2 field locations in 2018 monitoring program

The following crops were monitored in 2018:

• sugar beets – 61 fields

• onion – 5 fields, located in two different Provinces with different environmental

conditions

• cabbage – 5 fields, located in the same areas as the onion fields

• sunflower – 16 fields, located in the same area as sugar beet

• alfalfa – 3 fields

• soybean – 2 fields



While monitoring these fields, especially onion and cabbage, which mature early, problems were encountered due to the unpredictable weather conditions in the production areas during late spring and early summer. The high number of cloudy days prevented the fieldsmen from accessing the images during their field checks; hence, their reports and checks were not influenced by the satellite data.

For cabbage, the index is much more difficult to match with the harvesting dates decided by fieldsmen. The curve peaks during the winter and decreases until the onset of blooming in spring. Moreover, the curves obtained differed between fields and between areas. The index is probably affected by plant density – for which there are different recommendations according to the variety – and by differences in the ratio between female and male lines (note that the male lines are destroyed after flowering).

For onion fields, problems similar to those for cabbage were encountered: the greenness curves were highly heterogeneous with respect to the harvesting dates decided by fieldsmen.

For these reasons, these two species were excluded from the EO program in Trial 2.

The greenness curves resulting from the monitoring of the sunflower fields appeared more reliable. The index closely followed the growth of the plants and tended to replicate across all monitored fields. Harvesting of sunflower seeds was generally postponed for a few days after the greenness index reached its minimum between the end of August and September (the decreasing part of the greenness curve). This corresponds to actual field practice: sunflower harvesting operations are carried out after seed maturity, because the plants are left to dry in the field before they are fed into the combine, in order to ease threshing and to separate the seed easily from the heads.

Two fields of soybeans were introduced in the pilot, as they were close to the monitored fields

of sugar beet and sunflower. The resulting fAPAR curves appeared to be quite reliable and it

was decided to monitor this crop at a larger scale in Trial 2 and set up a model for estimating

the optimal harvest date according to the fAPAR index.

Overall, 61 fields of sugar beet were monitored in 2018. From the comparison of the fAPAR

curves and the harvesting dates assessed by fieldsmen, it was found that the average fAPAR

value at harvest was 0,39.

Since the field assessment had been carried out without consulting the index, not all fields were harvested at exactly this index value. The germination rate of the harvested seed lots was compared with the harvesting date to check whether harvesting at lower or higher fAPAR values – especially for those lots harvested in advance – is correlated with a difference in germination; no significant differences were observed.




Trial 2 timeline

Trial 2 covered the 2019 growing season.


Trial 2 focused on crops which in Trial 1 and in the preparatory stage showed reliable response

to seed maturity assessment parameters derived from satellite data. Sugar beets, sunflower

and soybean were the crops monitored during season 2019, from sowing/transplanting until

harvesting.

As in Trial 1, fields were periodically monitored on site by the fieldsmen, who reported the main growing stage of each crop and assessed the timing for harvesting in the traditional way. The same fields were monitored with EO through the web application WatchITgrow® developed by VITO (Figure 13).

Figure 13: WatchITgrow® screenshot of the “field dashboard”

The monitored fields were located on the map and a field polygon was drawn by the fieldsmen according to on-site inspection. The application also allowed field data reported by fieldsmen during their periodic visits to be added.

The scale of monitoring was increased for sugar beet (approx. 250 ha monitored) and for soybean (approx. 600 ha monitored), while for sunflower it was restricted to a few fields, just to tune the curve to the real seed maturity.

For each crop the field data were combined with satellite data to set up a robust model for

seed maturity assessment.

Trial 2 results

In 2019 CAC seeds monitored 77 sugar beet fields and 41 soybean fields with the WatchITgrow® application. Sentinel-2 satellite images provided information on the greenness and health of the crops. From the greenness (fAPAR) curves (Figure 14) the optimal harvest date of the sugar beet and soybean seeds was estimated in near real time, using the maturity model developed in 2018.

Figure 14: “Greenness” fAPAR curve

The objectives for the 2019 season were:

• for sugar beets:

o To validate the maturity model which was trained with data from 2018 with

data from 2019

o To use fused Sentinel-1 and -2 satellite images as input for harvest date

estimation and check the impact on the accuracy of the harvest date estimates

o To check the performance and the forecasting ability of the maturity model by determining the accuracy of the harvest date forecasts at different moments during the harvesting period

• for soybeans:

o To develop a maturity model for this crop, similar to the model developed for

sugar beets.

Results for sugar beets

Test 1: validate the approach for maturity assessment using the 2019 dataset

From the analysis of the 2018 dataset it was found that the “optimal harvest date” corresponded to the date within the harvest period at which the Sentinel-2 derived fAPAR reaches 0,4. Figure 15 shows the estimated vs. actual harvest dates for the sugar beet fields that were monitored in 2019. The correlation is much lower in 2019 than in 2018 (R² = 0,20 vs R² = 0,78). This can partly be explained by the suboptimal growing conditions in the summer of 2019.

The fAPAR values at harvest also showed a much larger variation in 2019 (Figure 16).
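The threshold approach of Test 1 can be illustrated with a short sketch: scan the declining part of a field's fAPAR time series and report the date on which it reaches the threshold. The linear interpolation between observations, the function name and the sample observations are assumptions made for illustration; the report does not describe how the crossing date was located between two acquisitions.

```python
from datetime import date, timedelta


def threshold_crossing_date(series, threshold=0.40):
    """Return the date on which a declining fAPAR curve reaches `threshold`.

    `series` is a list of (date, fAPAR) tuples sorted by date. Only a
    descending crossing counts, so the rise through the threshold early in
    the season is ignored. The crossing between two observations is located
    by linear interpolation (an assumption of this sketch).
    Returns None if the curve never descends through the threshold.
    """
    for (d0, v0), (d1, v1) in zip(series, series[1:]):
        if v0 > threshold >= v1:
            frac = (v0 - threshold) / (v0 - v1)
            return d0 + timedelta(days=round(frac * (d1 - d0).days))
    return None


# Hypothetical observations for one field (not pilot data).
obs = [
    (date(2019, 6, 20), 0.82),
    (date(2019, 6, 30), 0.65),
    (date(2019, 7, 10), 0.45),
    (date(2019, 7, 20), 0.35),
]
estimated = threshold_crossing_date(obs)
print(estimated)
```

In practice the series would first be restricted to the harvest window, so that dips unrelated to maturity do not trigger a crossing.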



Figure 15: Correlation between the harvest date for sugar beet seeds in 2019 estimated from Sentinel-2 images (date with fAPAR = 0,4) and the actual harvest date recorded by CAC seeds

Figure 16: fAPAR values at harvest for 2019

Test 2: use of improved fAPAR time series

In summer 2019 the weather conditions at harvesting (July) were not optimal, as there were many cloudy days. Optical satellites such as Sentinel-2 cannot see through clouds, which resulted in cloud-induced gaps in the observations. When cloud-free observations are lacking for several weeks, interpolation or smoothing techniques can no longer bridge the gap.
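A simple way to see when interpolation stops being adequate is to measure the longest interval between consecutive cloud-free acquisitions. The sketch below does this for a list of observation dates; the 21-day limit standing in for "several weeks" and the sample dates are hypothetical and not taken from the pilot.

```python
from datetime import date


def longest_gap_days(cloud_free_dates):
    """Longest interval, in days, between consecutive cloud-free observations."""
    ds = sorted(cloud_free_dates)
    return max(((b - a).days for a, b in zip(ds, ds[1:])), default=0)


MAX_GAP_DAYS = 21  # hypothetical limit beyond which interpolation is not trusted

# Hypothetical cloud-free acquisition dates for one field.
dates = [date(2019, 6, 25), date(2019, 6, 30), date(2019, 7, 25), date(2019, 7, 30)]
gap = longest_gap_days(dates)
needs_gap_filling = gap > MAX_GAP_DAYS
print(gap, needs_gap_filling)
```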

The CropSAR technology developed by VITO provides a way to keep on monitoring crop

growth and development, independent of weather conditions. CropSAR relies on



observations made by Sentinel-1, a constellation of two radar satellites. Even though optical

and radar sensors see completely different things, their measurements are nevertheless

correlated, as both hold information on the vegetation status. It is exactly that correlation

that CropSAR exploits to fill in the cloud-induced gaps in Sentinel-2’s optical measurements.

As the CropSAR fAPAR values are slightly lower than the original fAPAR values, the threshold for harvest date estimation was set at 0,36 (instead of 0,40). The results are shown in Figure 17. In a season such as 2019, with suboptimal weather conditions, the correlation between the actual and estimated harvest dates increases markedly when CropSAR fAPAR is used (R² = 0,43 compared to R² = 0,20 with original fAPAR inputs). In 2018 weather conditions were much better, and the correlations between actual and estimated harvest dates are comparable whether original or CropSAR fAPAR inputs are used (R² = 0,71 for CropSAR vs R² = 0,78 for original fAPAR).

When combining 2018 and 2019 (138 fields in total), the correlation further increased to R² = 0,99. Figure 18 shows the error (in days) over all fields. For 44% of the fields the harvest date is estimated to within +/- 1 day, and for 61% of the fields to within +/- 2 days.
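The two accuracy measures used here, R² between estimated and actual harvest dates and the share of fields within a given error in days, can be computed as sketched below. The report does not state how R² was calculated; the sketch assumes the squared Pearson correlation, and the day-of-year values are made up for illustration.

```python
def r_squared(estimated, actual):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(estimated)
    mx = sum(estimated) / n
    my = sum(actual) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(estimated, actual))
    sxx = sum((x - mx) ** 2 for x in estimated)
    syy = sum((y - my) ** 2 for y in actual)
    return sxy * sxy / (sxx * syy)


def share_within(estimated, actual, tolerance_days):
    """Fraction of fields whose estimate is within +/- tolerance_days."""
    hits = sum(abs(e - a) <= tolerance_days for e, a in zip(estimated, actual))
    return hits / len(actual)


# Hypothetical harvest dates as day-of-year values (not pilot data).
est = [190, 195, 200, 205, 210]
act = [191, 194, 203, 204, 214]

print(round(r_squared(est, act), 3))
print(share_within(est, act, 1), share_within(est, act, 3))
```

Reporting both measures is useful because R² is sensitive to the spread of harvest dates in the sample, whereas the error share directly reflects what a fieldsman experiences.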

Figure 17: Correlation between the harvest date for sugar beet seeds in 2018 (left) and 2019 (right) estimated from fused Sentinel-1 and Sentinel-2 images (date with CropSAR fAPAR = 0,36) and the actual harvest date recorded by CAC seeds

Figure 18: Error of harvest date estimation, in days, for 2018 and 2019 (138 fields)



Test 3: performance of the maturity model

Based on the assumption that the optimal harvest date corresponds to the date in the period

[15 June - 1 August] at which the CropSAR fAPAR value reaches the threshold of 0,36, a

“maturity model” was developed to estimate this date. The descending part (the slope) of the

CropSAR fAPAR curve was checked on a daily basis. A simple linear equation over 5 days was

used to forecast the date at which the fAPAR threshold of 0,36 would be reached.
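The forecasting step can be sketched as follows: fit a straight line to the most recent daily fAPAR values and extrapolate it to the threshold. This is a minimal illustration assuming an ordinary least-squares fit over a 5-day window; the report only mentions "a simple linear equation over 5 days", so the fitting method, the function name and the sample values are assumptions.

```python
from datetime import date, timedelta


def forecast_threshold_date(window, threshold=0.36):
    """Extrapolate a least-squares line through (date, fAPAR) points.

    Returns the date at which the fitted line reaches `threshold`, or None
    when the fitted slope is not negative (the curve is not descending).
    """
    t0 = window[0][0]
    xs = [(d - t0).days for d, _ in window]
    ys = [v for _, v in window]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    if slope >= 0:
        return None
    intercept = my - slope * mx
    days_to_threshold = (threshold - intercept) / slope
    return t0 + timedelta(days=round(days_to_threshold))


# Hypothetical 5-day window: fAPAR drops by 0.02 per day from 0.60.
window = [(date(2019, 7, 1) + timedelta(days=i), 0.60 - 0.02 * i) for i in range(5)]
forecast = forecast_threshold_date(window)
print(forecast)
```

Because the fit uses only a few days, a short plateau or a noisy observation changes the slope strongly, which is one plausible reason for the limited forecasting ability reported for the linear model.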

To assess the performance of the maturity model, it was run on the full seasonal time series

of CropSAR fAPAR values (1 February – 15 August 2019) and the resulting harvest date

estimates were compared with the actual harvest dates. The results are presented in Figure

19. For both seasons (2018 and 2019) the correlations are lower than when a simple threshold

(fAPAR = 0,36) is used to estimate the harvest date (R² = 0,58 vs 0,71 for 2018 and R² = 0,19

vs 0,43 in 2019).

Figure 19: Correlation between the harvest date for sugar beet seeds in 2018 (left) and 2019 (right) estimated from fused Sentinel-1 and Sentinel-2 images (CropSAR fAPAR) on 15 August (full season) and the actual harvest date recorded by CAC seeds



Test 4: forecasting ability of the maturity model

Finally, the forecasting ability of the maturity model was evaluated. Harvest dates were estimated several times between the start of the harvest period at the end of June and the end of the harvest period in mid-August. Each time, the estimated harvest date was compared with the actual harvest date. The resulting R² values are presented in Figure 20. Overall, the forecasting ability of the current (linear) model is rather low.

Figure 20: Correlation (R² value) between the estimated and actual harvest dates at different times before harvest in 2018 (blue) and 2019 (green)

Results for soybeans

For soybeans the harvest date was estimated in a similar way as for sugar beets. Thresholds were defined for the “original fAPAR” values (0,23 for soybeans) and the “CropSAR fAPAR” values (0,18), based on the actual harvest dates of the 41 soybean fields that were monitored in 2019. As shown in Figure 21, the correlations between the estimated and actual harvest dates were higher when CropSAR fAPAR input was used (R² = 0,49 vs 0,35 for original fAPAR input).

Figure 21: Correlation between the harvest date for soybeans in 2019 estimated from (left) original Sentinel-2 images (date with fAPAR = 0,23) and (right) fused Sentinel-1 and Sentinel-2 images (date with CropSAR fAPAR = 0,18) and the actual harvest date recorded by CAC seeds



Figure 22 shows the error of the harvest date estimation (in days) over all fields. For 49% of the fields the harvest date is estimated to within +/- 2 days, and for 72% of the fields to within +/- 3 days.

Figure 22: Error of harvest date estimation for soybeans, in days, for 2019 (41 fields)

To estimate the harvest date of the soybeans a “maturity model” was developed following

the same (linear) approach as for sugar beets. The performance of the model was checked by

comparing the estimated harvest dates, derived from a full seasonal time series of CropSAR

fAPAR values, with the actual harvest dates recorded by CAC seeds (Figure 23). The

correlations obtained with the model (R² = 0,53) were similar to the correlations obtained

when a simple threshold (0,18) was used to estimate the harvest date (R² = 0,49).

Figure 23: Correlation between the harvest date for soybeans in 2019 estimated from fused Sentinel-1 and Sentinel-2 images (CropSAR fAPAR) on 20 October (full season) and the actual harvest date recorded by CAC seeds



Results for sunflowers

The harvest of sunflower depends on the status of the plants, as the stems and the heads need to dry completely before harvesting so that the seeds are released without damage (Figure 24). Hence, the maturity of the seed does not correspond to the ideal stage for combining. In contrast to sugar beets, sunflower cannot be cut and left to dry in the field, so the fAPAR index is not helpful for assessing harvesting time. Nevertheless, we took advantage of the tools developed during the project to assess the different maturity stages in relation to the moisture of the seed.

In 2019 three sunflower fields were monitored for five weeks during maturation; samples of

the heads containing the seeds were taken from each field and brought to the CAC’s

laboratory where the seed was removed from the heads and tested for moisture and

germination. On each day of sampling the fAPAR index of each field was recorded.

As expected, the fields showed progress in germination that correlated with a reduction in moisture and in the fAPAR index. The index value at which the seed reached full germination in the three varieties monitored was approx. 0,20; however, this is considered only a preliminary test. Further investigation in additional fields would be desirable before setting up a rule.

Figure 24: Sunflower field at harvesting stage

Sugar beets and soybeans: conclusions and possible improvements

From the sugar beet and soybean pilots in 2018 and 2019 it was found that, since optical satellites cannot see through clouds, the fAPAR index derived from Sentinel-2 images has limited accuracy on cloudy days. If clouds persist for several days, the fieldsmen are “blind” and the advantage of the tool fades. The introduction of fused indexes based on optical and radar data can overcome this problem.



To get an optimal response from EO, however, fieldsmen have to draw the polygons representing the fields accurately. With a satellite pixel size of 10 m x 10 m, the index can be distorted if ditches, side roads or parts of neighbouring fields are included in the polygon.
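The effect of an inaccurate polygon can be illustrated with a simple area-weighted mean. In the sketch below all numbers are hypothetical: a crop with fAPAR 0,40 (not yet at the 0,36 CropSAR threshold) is averaged with 15% of non-crop pixels (ditch or side road) at fAPAR 0,10, and the helper name is illustrative.

```python
def polygon_mean(crop_value, other_value, other_fraction):
    """Area-weighted mean index of a polygon containing non-crop pixels."""
    return (1 - other_fraction) * crop_value + other_fraction * other_value


THRESHOLD = 0.36          # CropSAR fAPAR threshold used for sugar beets

crop_fapar = 0.40         # hypothetical true value of the crop
ditch_fapar = 0.10        # hypothetical value of ditch / road pixels
contaminated = polygon_mean(crop_fapar, ditch_fapar, other_fraction=0.15)

print(round(contaminated, 3), contaminated < THRESHOLD)
```

In this example the polygon mean drops to 0,355 and would signal maturity although the crop itself has not reached the threshold, which shows why accurate field boundaries matter for threshold-based estimates.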

The final conclusion therefore is:

• It is possible to estimate the optimal harvest date from the fAPAR curve with a

moderate to high accuracy when using fAPAR threshold values.

• The accuracy of the harvest date estimation increases when CropSAR fAPAR values are used, especially in cloudy periods.

• The maturity model that is currently used to forecast the harvest date (simple linear

approach to estimate the date that the fAPAR threshold is reached) is not accurate

enough.


The pilot uses the C08.02 Proba-V MEP EO component for processing, analysing and visualizing the Sentinel-2 fAPAR data.

| Component code and name | | | location |
|---|---|---|---|
| C08.02 (Proba-V MEP) | Sentinel-2 processing, dashboards, services for viewing and time series extraction | Adapted according to the needs of pilot A1.2 | Proba-V MEP at VITO |

4.4.3 Data Assets

| Data Type | Dataset | Dataset original source | Dataset location | Volume (GB) | Velocity (GB/year) |
|---|---|---|---|---|---|
| EO data | Sentinel-2 processed data (raw data -> fAPAR) | ESA | Proba-V MEP at VITO | 2630 | 1850 |



The performance of the maturity models could be improved by using more advanced

modelling techniques such as curve fitting, or by using machine learning techniques to predict

fAPAR values. The use of meteo data (rainfall, temperature) as additional input for maturity

modelling may also improve the accuracy of the harvest date estimation.

To enhance the reliability of the model, it is necessary to continue the EO monitoring, adding more data to the model and checking against on-site reports the factors that can distort the parameters.

The usability of the tool also has to be further improved in terms of speed and user-friendliness; fieldsmen are often out of their office and need the platform to be adapted into a mobile application with easy access and easy handling.

KPIs

During the project the KPIs could not be measured, only estimated.

In practice the fieldsmen did not save any travel; on the contrary, they had to drive more and write more reports to collect the information needed to support the project.

The advantages of having reliable support in assessing the maturity of the seed crops can be estimated as: a reduced number of visits to the fields close to the maturity stage, increased efficiency in assisting the growers in harvesting operations, and increased efficiency in warehouse planning and logistics.



| KPI short name | KPI description | Goal description | Base value | Target value | Measured value | Unit of value | Comment |
|---|---|---|---|---|---|---|---|
| KMT | Number of km driven by car | Reduced km driven | 100% (actual total yearly journey) | 85% | | | Estimated reduction of 15% of the km driven by fieldsmen using the tool |
| NOF | Number of farms controlled by each fieldsman | Increase of the number of farms controlled | 100% (actual number of farms) | 120% | | | Estimated potential increase of efficiency due to the tool |

The outcome of the pilot confirms that satellite-driven technology in agriculture can be used for more than the assisted driving of tractors. Joining EO with IoT and sensors is the future of agriculture.

Farmers are by nature conservative, but the development of new technologies is going to rapidly change the future of agriculture. The introduction of tools and devices for the control of harvesting operations contributes to making operators aware of the importance of being “on the spot”, ready to take advantage of the innovations that IT offers in this very traditional sector. The dissemination of this awareness can be considered, besides the expected KPI performance, one of the goals of this pilot.



5 Pilot 3 [A1.3] Precision agriculture in vegetables_2 (Potatoes)

Pilot overview

The product developed by NB Advies with the help of VITO is a system to generate ‘vigor’

maps for potato growers in the Netherlands, using Earth observation and weather data

sources combined with field information. The maps are included into an online platform for

monitoring and early warning of inhomogeneity. Yield prediction data can be made available at an early stage of the growing season, though their accuracy is not sufficient due to the lack of reliable training data.


For Trial 1 in 2018, the Sentinel data were systematically processed for visualisation in the app. Work on the cloud coverage issues (smoothing, data fusion) is ongoing in WP5.

It was intended that daily weather updates from KNMI (the Dutch weather service) would be added for aggregated visualisation in the app. Unfortunately, this service stopped providing data in February 2018. A group of 10 farmers was selected for the first trial, providing detailed data about their crops, such as the variety, the planting date and their mid-season and end-of-season yield data.

Preliminary results are visualisations of fAPAR (biomass index) from Sentinel-2 EO data of the area of interest, with new imagery presented every 5-10 days (if cloud coverage permits). The WatchITgrow® app can be used by the farmers for data entry of parcel information, such as crop variety and planting date. The pilot shows graphics of fAPAR development over time per parcel, compared to similar parcels in the surrounding area (Figures 25 – 28).

Figure 25: Processed Sentinel data into Greenness; available for the growing season (A1.3)



Figure 26: Greenness graph during growing season (A1.3)

Graphics of the weather data series, showing temperature and precipitation, were added to the interface. The data were also used in several demonstrations, e.g. of the impact of the drought in summer 2018 and of the effect of irrigation (center pivot) in mitigating the drought.

Figure 27: Image demonstrating drought in Summer 2018 from Sentinel data (A1.3)

Data were also used in a preliminary study on the impact of grassland management on the resilience of the grassland against climate change impacts such as drought and intense rainfall.

Figure 28: Analysis of grassland management based on the greenness from Sentinel data (A1.3)



In Trial 1 a general service based on the WatchITgrow® web application was made available to the farmers. The farmers’ feedback led to the following conclusions:

• fAPAR data are hard for farmers to interpret and understand. The maps were useful for showing inhomogeneity but did not provide actionable data. Maps using LAI (leaf area index) are easier for farmers to understand.

• Maps should give more insight into the actual situation compared to the potential of the field, with crop growth expressed in values relative to the potential.

• Farmers are not willing to visit a website to find out whether new EO images are available; an alert service should warn them only when their action is required.


Trial 2 timeline

January - June 2019: Collecting historical data (2017-2018) for a preliminary analysis and comparison of different crop models; preparing the gathering and processing of the current year’s data in a crop growth model.

June - October 2019: Running the prototype with a group of farmers; comparison of model results with EO field data, and reports for the farmers.


In preparation for Trial 2, the crop growth model WOFOST (WOrld FOod STudies) was introduced. A decision support system (DSS) was created using simulated potential and water-limited crop growth, based on weather and soil parameters respectively (Figure 29).



Figure 29: Concept of a simple (starch) potato DSS

Soil, crop and weather data from field measurements, satellites, weather stations, literature

and other sources were collected and, after pre-processing and storage in a database, were

used as input in a crop growth model. The model then establishes the benchmark crop

performance: an estimation of the best possible performance under the given set of

circumstances. For the calibration, model data are compared with historical EO data.
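The benchmark idea can be sketched as a comparison of an observed value with the simulated potential and water-limited values. The function names, the 0,8 tolerance, the alert rule and the sample figures below are hypothetical illustrations; they do not reproduce WOFOST output or the logic of the pilot's DSS.

```python
def performance_ratio(observed, benchmark):
    """Observed crop performance relative to a simulated benchmark."""
    if benchmark <= 0:
        raise ValueError("benchmark must be positive")
    return observed / benchmark


def assess_field(observed, potential, water_limited, tolerance=0.8):
    """Compare an observation with both benchmarks and flag large shortfalls.

    `tolerance` is a hypothetical cut-off: an alert is raised when the field
    falls below this share of the water-limited benchmark.
    """
    ratio_wl = performance_ratio(observed, water_limited)
    return {
        "vs_potential": performance_ratio(observed, potential),
        "vs_water_limited": ratio_wl,
        "alert": ratio_wl < tolerance,
    }


# Hypothetical dry-matter values in t/ha (not pilot data).
ok_field = assess_field(observed=9.0, potential=12.0, water_limited=10.0)
weak_field = assess_field(observed=7.0, potential=12.0, water_limited=10.0)
print(ok_field)
print(weak_field)
```

Expressing results relative to a benchmark and raising an alert only on a shortfall follows the farmers' Trial 1 feedback that maps should show growth relative to the potential and that they want to be warned only when action is required.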

The collected datasets include:

• Soil characteristics map BOFEK2012 spatial dataset for the Netherlands with soil

physical units, representing areas of corresponding soil structure and hydrological

behaviour (Figure 30)



Figure 30: Map of soil characteristics for the Netherlands

• Weather data (temperature, precipitation, radiation, evapotranspiration) from different KNMI weather stations, measured daily (Figures 31, 32; example: growing season average temperature and daily precipitation sum). For each field the nearest weather station was selected.

Figure 31: Weather data (precipitation per day vs temperature) from weather stations



Figure 32: Weather data (precipitation) from weather stations

• Soil moisture sensors (Figure 33, example one month); measured once per hour. In

each of the pilot fields, soil sensors (IoT) were installed to record soil moisture data.

Figure 33: Soil moisture sensors

Input data for the model were collected and transformed into the WOFOST format.
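The selection of the nearest weather station per field can be sketched as follows. This is a minimal illustration, assuming great-circle (haversine) distance between the field centroid and each station; the report does not state which distance measure was used, and the station names and coordinates below are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def nearest_station(field, stations):
    """Return the name of the station closest to the field centroid."""
    return min(
        stations,
        key=lambda name: haversine_km(field[0], field[1], *stations[name]),
    )


# Hypothetical (lat, lon) pairs, for illustration only.
stations = {
    "station_a": (53.12, 6.58),
    "station_b": (52.75, 6.57),
    "station_c": (53.05, 7.00),
}
field_centroid = (53.00, 6.95)
print(nearest_station(field_centroid, stations))
```

For a region of this size a planar distance in a projected coordinate system would likely give the same choice; the haversine form simply avoids needing a projection.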

Trial 2 execution

The pilot aims to create a Big Data analysis platform for farmers based on Sentinel-2 data: a DSS that provides benchmark information on simulated potential and water-limited crop growth, in order to obtain a higher yield (in dry matter) at lower costs.

The study area is located in the region of Veenkoloniën (ca. 51,000 ha) in the northern Netherlands. This area is characterized by large-scale arable farms. In 2007, 37% of the farms were already larger than 100 ha, and this share is growing.



Figure 34: A1.3 general location

In stage 2 of the pilot, eleven (11) farmers each selected one of the fields on their farm, totalling 111 ha (Figure 35).

Figure 35: Farm areas selected for the pilot A1.3



Online platform

The objective was to create an online platform for farmers for crop monitoring and

benchmarking, showing the in-field variation. Sentinel-2 satellite images are very helpful for

crop monitoring over large areas; yet for use in a DSS it is more useful to show just the field

information and not the complete images (Figure 36).

Figure 36: Online platform for crop monitoring and benchmarking

Crop growth model

The following Big Data sources were processed:

• Daily measured weather data (temperature, precipitation, radiation, evapotranspiration) from different KNMI weather stations

• Soil characteristics map according to the BOFEK2012 classification, representing areas of corresponding soil structure and hydrological behaviour

• Hourly soil moisture sensor measurements

• Sentinel-2 images with an average interval of 5 days

In order to benchmark crop performances, the WOFOST crop growth model was introduced

and was calibrated using historical (2017, 2018) and recent samples.

Processing of images refers to:

• Applying a cloud mask and a cloud-shadow mask

• Calculating the a-factor for WDVI from bare-soil pixels: $a = \mathrm{NIR}_{\text{bare soil}} / \mathrm{RED}_{\text{bare soil}}$

• Calculating WDVI from spectral data: $\mathrm{WDVI} = \mathrm{NIR} - a \cdot \mathrm{RED}$

• Calculating LAI for potato fields based on WDVI-LAI correlation data (Figure 37).



Figure 37: LAI-WDVI polynomial regression model for spring potatoes achieving high r2. doi: 10.1117/12.2029099
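The image-processing steps listed above can be sketched as follows. This is a minimal illustration, assuming reflectance arrays are already cloud-masked and that the a-factor is taken as the ratio of mean NIR to mean red reflectance over bare-soil pixels. The polynomial coefficients are placeholders, not the regression of Figure 37, and all function names are hypothetical.

```python
import numpy as np


def soil_line_slope(nir_soil, red_soil):
    """a-factor: ratio of mean NIR to mean red reflectance over bare soil."""
    return float(np.mean(nir_soil) / np.mean(red_soil))


def wdvi(nir, red, a):
    """Weighted Difference Vegetation Index: WDVI = NIR - a * RED."""
    return np.asarray(nir, dtype=float) - a * np.asarray(red, dtype=float)


def lai_from_wdvi(wdvi_values, coeffs):
    """Evaluate a polynomial LAI(WDVI); coeffs are given highest order first."""
    return np.polyval(coeffs, wdvi_values)


# Hypothetical bare-soil reflectances used to derive the a-factor.
nir_soil = np.array([0.20, 0.22, 0.18])
red_soil = np.array([0.10, 0.11, 0.09])
a = soil_line_slope(nir_soil, red_soil)

# Hypothetical reflectances for two vegetated pixels.
nir = np.array([0.45, 0.30])
red = np.array([0.05, 0.08])
index = wdvi(nir, red, a)

# PLACEHOLDER coefficients: replace with the fitted LAI-WDVI regression.
PLACEHOLDER_COEFFS = (8.0, 1.5, 0.0)
lai = lai_from_wdvi(index, PLACEHOLDER_COEFFS)
print(a, index, lai)
```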

UAV spectral data

In cooperation with a potato breeding farm, crop index data were gathered for the varieties that were also planted by the farmers. The layout of the test plots is presented in Figure 38. The trial fields were monitored by UAV (Unmanned Aerial Vehicle) once a month (June, July and August), gathering multispectral data (Figure 39). By processing the UAV data, multiple crop indices, including yield potential, were calculated for each plot. Different varieties are known to have different phenological development, so significant differences in crop development between the varieties were expected in the average crop index values for each variety. Differences were observed, but they were not significant. This may be because the unusual weather in 2019 dominated crop development.



Figure 38: Potato trial fields

Figure 39: UAV spectral image (Red Edge NDVI index) taken 25 June 2019



Figure 40: Monitoring of trial fields during July and August

Figure 41: Performance of yield potential (mean values vs date)



Trial 2 results

Crop monitoring

One of the issues after Trial 1 was that fAPAR data are hard for farmers to interpret. The fAPAR maps were useful for showing inhomogeneity, but the online platform now shows the variability in LAI (Leaf Area Index) instead. This index represents the leaf area intercepting solar radiation for crop growth, so maps using LAI are easier for farmers to understand.

The variability in the field indicates the areas that need attention because of limiting factors, which may be soil characteristics, water, fertilizer or pests. For each of the pilot fields the crop monitoring data were provided in the online platform, as presented in Figure 42, expressed in LAI for June-September 2019. The farmers received an email alert when new processed images were available.

This platform provided farmers with valuable information about:

• the in-field-variation and areas for inspection and site-specific management

• relative performance of their field compared to the surrounding fields

• relative performance of their field compared to the potential

• the need for irrigation (combined with soil moisture data) (Figure 43)



Figure 42: Crop monitoring expressing variability in LAI (panels: June, July, August and September 2019)



Figure 43: Soil moisture and LAI index data for the pilot fields

Yield prediction

In general, compared to the samples, the water-limited growth model underestimates the yield and the potential growth model overestimates it (Figures 44 – 46).

• The data available for validation of the WOFOST model proved to be a limiting factor for the results.

• Data for comparing model output with Sentinel-2 data were available for only 2 years

• Only 1 year (2018) of field data with parcel location information was available

• Weather conditions in 2018 and 2019 were quite out of the ordinary

• Yield differences between varieties influenced the calibration results more than anticipated

• The water-limiting effect was quite significant, but soil moisture data for previous years were not available

Due to limited data availability, the algorithm is not sufficiently trained for reliable yield predictions.



The potential yield prediction (dry matter) based on the weather data of the last 10 years

shows the relative differences between the years, but largely overestimates the yield at

harvest time.

Figure 44: Prediction dry matter, beginning of July 2019

Figure 45: Data for the water-limited growth model



Figure 46: Water limited crop growth model without groundwater

Comparison of the model predictions with the actual samples taken in the fields shows the same trend for the beginning of July and for harvest time (mid-September): an over-estimation by the potential model and an under-estimation by the water-limited model for the pilot fields, both in dry matter and in total yield.

Figure 47: Dry matter and total yield for pilot fields during the beginning of July and harvest time
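The comparison between model output and field samples can be expressed as a signed relative deviation, which is also the unit of the "% deviation from total realised yield" KPI. The sketch below is illustrative only: the function name is hypothetical and the yield values are invented, not pilot measurements.

```python
def relative_deviation(predicted, observed):
    """Signed deviation of a prediction from the observed value, in percent.

    Positive values indicate over-estimation, negative values under-estimation.
    """
    if observed == 0:
        raise ValueError("observed yield must be non-zero")
    return 100.0 * (predicted - observed) / observed


# Hypothetical dry-matter values in t/ha, for illustration only.
sample = 10.0
potential_model = 12.5
water_limited_model = 8.0

print(relative_deviation(potential_model, sample))      # over-estimation
print(relative_deviation(water_limited_model, sample))  # under-estimation
```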

Yield improvement

It is known that the best conditions for high yields in a field are created during spring: crop emergence at the beginning of May and full crop coverage by the 10th of June, which should remain until the end of August. Moreover, a full water supply is essential for retaining this curve.

In the current pilot the effect of a later seeding date, and subsequently a later crop emergence date, was tested.



Figure 48: Potential crop production (A1.3)

Seeding dates varied from April 10th to May 8th, resulting in differences in potato dry matter ranging between 2.9 and 5.3 ton/ha on August 8th. This underlines the known rule that yield improvement is best implemented during spring.

The upward trend of the yield prediction from the July samples points towards the objectives coming within reach.

Figure 49: A1.3 samples




From the current pilot, several datasets were produced:

• Sentinel-2 images

• KNMI Weather data (solar radiation, temperature, precipitation) based on the station

closest to the field

• Multispectral drone data (for potato-variety specific vegetation index data)

• Field data from farmers (field location, planting data, potato variety, irrigation data)

• BOFEK2012, spatial dataset for the Netherlands with soil physical units, representing

areas of corresponding soil structure and hydrological behaviour

Components:

• The WOFOST crop growth model was used to determine the reference crop growth, for benchmarking the actual crop growth derived from the Sentinel-2 images against the potential crop growth, and for yield prediction per field based on the actual weather data.

• For Trial 2 additional algorithms were developed to automate the search and retrieval

of Sentinel-2 images. The images are filtered on maximum cloud coverage and clipped

to the farmers’ fields to focus on relevant parts of the images. For the purpose of the

pilot additional vegetation indices NDVI, WDVI and LAI (potatoes) are calculated. In

addition, cloud masks and (experimental) cloud shadow masks are applied.

• A script was created to retrieve the weather data from the KNMI (Dutch weather

service) and transform them into a valid format for the WOFOST crop growth model
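The image selection and index calculation described in the components above can be sketched as follows. This is a minimal illustration under stated assumptions: scene metadata are plain dictionaries with a hypothetical `cloud_pct` field, band arrays are already clipped to the field, and the function names are not those of the pilot's actual scripts.

```python
import numpy as np


def select_scenes(scenes, max_cloud_pct):
    """Keep scenes whose reported cloud cover does not exceed the threshold."""
    return [s for s in scenes if s["cloud_pct"] <= max_cloud_pct]


def ndvi(nir, red):
    """NDVI = (NIR - RED) / (NIR + RED); NaN where the denominator is zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def apply_mask(index, cloud_mask):
    """Set pixels flagged as cloud or cloud shadow to NaN."""
    return np.where(cloud_mask, np.nan, index)


# Hypothetical scene metadata and reflectances, for illustration only.
scenes = [
    {"id": "scene_1", "cloud_pct": 5.0},
    {"id": "scene_2", "cloud_pct": 60.0},
]
usable = select_scenes(scenes, max_cloud_pct=20.0)

nir = np.array([[0.50, 0.40], [0.30, 0.00]])
red = np.array([[0.10, 0.10], [0.10, 0.00]])
cloud = np.array([[False, True], [False, False]])
field_ndvi = apply_mask(ndvi(nir, red), cloud)
print([s["id"] for s in usable])
print(field_ndvi)
```

Masked and undefined pixels are kept as NaN so that later per-field statistics can ignore them with functions such as `np.nanmean`.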


Component code and name | Purpose for pilot | Deployment status | Component location

C08.02 (Proba-V MEP) | Sentinel-2 processing, dashboards, services for viewing and time series extraction | Tested during Trial 1 | Proba-V MEP at VITO



Data Assets

Data Type | Dataset | Dataset original source | Dataset location | Volume (GB) | Velocity (GB/year)

EO data | Sentinel-2 processed data (raw data -> fAPAR) | ESA | Proba-V MEP at VITO | 2630 GB | 1850 GB

Raster | BOFEK2012 | WUR | https://www.wur.nl/nl/show/Bodemfysische-Eenhedenkaart-BOFEK2012.htm | <1 GB |

Vector | LPIS | Georegister | https://geodata.nationaalgeoregister.nl/brpgewaspercelen | <1 GB |

Vector | Soil moisture | IoT | https://monitor.sensoterra.com/login | <1 GB |

Raster | UAV spectral data | NB Advies | local | <1 GB |

Alphanumeric | Weather data | KNMI | https://data.knmi.nl/datasets/radar_corr_accum_24h/1.0 | <1 GB |

Alphanumeric | Field data | Farmer | local | <1 GB |



New Business opportunities can be found in:

• Implementing the yield prediction model that was tested in the pilot with AVEBE, but

also with other potato processing cooperatives.

• Implementing, with other partners in the Netherlands, the farmer decision support

system. This may be the processing cooperatives, but also other stakeholders.

• Elaborating on the potato growth model to create new services like variable rate

application and irrigation planning.

KPIs

KPI description | Goal description | Base value | Target value | Measured value | Unit of value | Comment

No of farmers reached in demonstrations | In order to get farmers committed to invest in or start using Big Data applications they need to be aware of the opportunities for their operation. | 0 | 250 | 50 | Number of farmers | During the pilot the quality of the results were limiting the involvement of more farmers.

No of agricultural organisations involved | Agricultural organisations are providers of services and knowledge transfer. They need to be involved to motivate farmers to adaption. | 0 | 4 | 1 | Number of organisations | Averis

No of app builders reached or involved | The pilot is just the first step in getting Big Data applications across to farmers. To spread the use of Big Data app builders need to be involved to build new applications | 0 | 5 | 1 | Number of app builders | Fieldfromspace.nl

No of proposals for change | The basic application needs will be extended and improved based on the users’ needs. The more proposals for change, the more lively the user-community proves to be. | 0 | 10 | n/a | No of RFC | Not commercially available yet

No of registered farmers | The number of registered users is an indicator of effectiveness and usefulness of the pilot | 0 | 50 | n/a | No of farmers | Not commercially available yet

No of additional use cases | The number of use cases implemented is an indicator of effectiveness and usefulness of the pilot | 0 | 10 | 3 | No of use cases | Online Platform; Crop inspection; Crop benchmarking

No of planned projects | Future implementations of the Big Data applications could be enhanced in future projects. | 0 | 2 | 1 | No of projects | Fieldfromspace.nl

No of positive responses | Stakeholders will be interviewed on the project results. The average response should be above neutral to be accounted for as a positive response. | 0 | 65% | ? | % of respondents | Responses from farmers of pilot fields

Starch per ha | Realizing the 20-15-10 goals (6) | 13.7 (7) | 15 | 5.6 - 11.9 | tons / ha | Due to unfavourable weather conditions in 2019; upward trend from 2013

Variable costs per 100 kg starch | Realizing the 20-15-10 goals (8) | 12.5 (9) | 10 | - | € / 100 kg starch | Input data not available

More reliable yield data | Currently the yield predictions are based on sampling in July and September. Increasing the accurateness of the prediction based on the Big Data implementation will be a benefit for the sales team. | < 5% | < 4% | n/a | % deviation from total realised yield | Due to limited data availability the method could be tested, but the algorithm is not sufficiently trained

Starch content | The starch content of the potatoes is an indicator for the quality. Although the starch content may vary from potato varieties, the average starch content should be around 20% | ? | 20% | 20.1% | % starch content | 20.1% at harvest-time; 21.4% at 1st of September

(6) Avebe project 20-15-10; goals set for 2020.

(7) Reference: average value 13.7 tons in 2012.

(8) Avebe project 20-15-10; goals set for 2020.

(9) Reference: average cost €12 - €13 in 2012.



Overall, the farmers' target is always to improve their yield and/or reduce their costs. In this pilot we focused on the yield (both yield prediction and yield improvement), because field data about the inputs were not available.

As the application of fertilizers and pesticides becomes more time- and site-specific according to the crop monitoring data, the inputs are expected to decrease in the future.

Expected trends:

Higher harvested quantity / Fertilizer consumption | Over a longer period, this is the trend, but during the pilot period this could not be demonstrated

Higher harvested quantity / Pesticide consumption | In potatoes, pesticide use is dominated by how (un)favourable the weather conditions are for Phytophthora. A higher yield may come with more crop protection due to more rain, which is favourable for crop growth, but also for Phytophthora

Higher harvested quantity / Irrigation water quantity | The irrigated area has increased, resulting in a higher yield of approx. 5 ton dry matter/ha, depending on the soil, irrigation intensity etc. A trend towards better irrigation efficiency is not known.

Higher harvested quantity / land (sq m) | Over a longer period, this is the trend, but during the pilot period this could not be demonstrated

Higher employee productivity (Revenues / Employee) | Higher productivity is expected, but not demonstrated yet

Higher revenues | This is the objective of Big Data Analysis. Given the short term of the pilot and the unfavourable weather conditions, this could not be demonstrated.

ROI | ROI on Big Data Analysis, including data collection and processing, is hard to demonstrate because of the lack of convincing data on higher yields

Lower quality deviations | The trend is that the lower quality is disappearing and thus deviations are getting smaller. The DSS is expected to strengthen this trend

Higher data usage | Data usage is rising. More data is collected from harvest machines, UAVs, satellites and sensors. Farmers take more data-driven decisions and apply site-specific management.

Higher data quality | Data quality is becoming more of an issue now that more data is available. Cross-referencing different data sources provides more insight into the good or bad quality of the data. This will lead the way to better data quality



6 Pilot 4 [A2.1] Big data management in greenhouse eco-system

Pilot overview

The pilot A2.1 was designed to implement Genomic Prediction Models (Genomic Selection, GS) as a solution to the technological limitations of current breeding approaches. Phenotypic selection (PS) and marker-assisted selection (MAS) are the modern breeding strategies upon which world agriculture has relied heavily. Although PS enabled the early green revolution in the mid-twentieth century, it is now recognized that its contribution has reached a plateau. On the other hand, thousands of marker-trait associations uncovered in the MAS process have not been routinely exploited, mainly due to intrinsic limitations of this technology. It is in this context that pilot A2.1 was designed. The pilot was run as a collaborative effort between CREA (Italy) and CERTH (Greece).

GS is a new paradigm in agriculture and demonstrates superior results in relation to other approaches implemented thus far. Different assumptions about the distribution of marker effects are accommodated in order to account for different models of genetic variation including, but not limited to: (1) the infinitesimal model, (2) the finite loci model, (3) algorithms extending Fisher’s infinitesimal model of genetic variation to account for non-additive genetic effects. Many problems are modelled, including the performance of new and unphenotyped lines, untested environments, and single-trait, multi-trait, single-environment and multi-environment settings. Genomic selection allows integrating quantitative genetics and population genetics in a novel GS breeding approach wherein intercrosses are driven by genomic predictions. Models are fed several data types: open-field phenotypic data, biochemical data, phenomic and genomic data. Subsequently, the resulting prediction equations are used to predict the breeding values of genotyped but unphenotyped candidates. In the process, several other Big Data types (e.g., those describing environmental properties) can be used as covariates. The Genomic Selection technology is expected to significantly improve genetic gain per unit of time and cost, allowing farmers to grow a better variety sooner than with conventional approaches, and thus to earn more income.

Specifically, for this pilot, the production of tomato Big Data from the greenhouses was slower than anticipated, due to the need to produce new genetic data in order to assess the genetic variability of the crosses, and to collect environmental and phenotypic data. However, preliminary results can be derived from the application of the GS model to the genomic data, since an extensive diversity study was carried out with ddRADseq technology. Despite this, it was not possible to validate the C22.03 component on tomato ddRADseq genomic data in combination with the phenomic data. As a suitable amount of genomic and phenomic data was available from the biomass sorghum pilots, in the last year of the project the potential of GS algorithms was successfully assessed in sorghum crops to improve health-promoting compounds used to manufacture specialty foods. The same approach is intended to be tested on the tomato data once the collection of metabolic data is complete.




The first stage of the trials started in 2018. In that year, CREA’s platform for genomic prediction and selection was detailed to accommodate CERTH’s requirements following a non-conventional approach. For this purpose, CERTH initiated a pilot study for the identification of the best tomato crosses bearing desirable traits, e.g. organoleptic properties, nutritional value and tolerance to various environmental conditions. The parental lines are Greek varieties that are well adapted to the local environmental conditions. In order to investigate as many crosses as possible, a holistic approach was applied for the best evaluation of the new genotypes, including: (1) biochemical characterization and nutritional value assessment; (2) next generation sequencing protocols to generate genomic/genotypic datasets; (3) environmental indoor data: air temperature, air relative humidity, solar radiation; (4) environmental outdoor data: wind speed and direction, evaporation, rain; (5) farm data: farm logs (work calendar, technical practices at farm level, irrigation information) and farm profile (static farm information, such as size, crop type, etc.). Biochemical, genomic and phenomic data were collected in tomato (landraces and several recombinant lines at diverse filial generations) raised in glasshouses (Figure 50).

Figure 50: Tomato accessions in glasshouse under breeding settings

CERTH also produced an initial molecular dataset through NGS (Next Generation Sequencing) technology based on the Double Digest RADseq approach (Figure 51) and performed the initial analysis and validation based on the STACKS pipeline (Figure 52). One hundred and thirty-eight samples, originating from 40 tomato lines, were included in the study and genotyped genome-wide using the ddRADseq protocol. Analysis with the STACKS pipeline resulted in 39,618 SNPs (Single Nucleotide Polymorphisms), using Solanum lycopersicum as the reference genome (assembly SL3.0). After quality control and removal of SNPs that did not meet the pre-specified thresholds, 10,402 SNPs remained to be further evaluated. In total, after next generation sequencing, 3 TB of raw data, including the scanned images, were produced for further implementation in GS algorithms. The size of the SNP marker matrix was sufficient to start running the model, but the number of genotyped individuals was still too low for the predictive models to be useful. More data, in particular a larger tomato population phenotyped and genotyped with whole-genome marker (SNP) information, were needed and expected in the third year (2019) of the project.



Figure 51: ddRAD protocol modified from Peterson et al., 2012. PMCID: PMC3365034, DOI:10.1371/journal.pone.0037135

Figure 52: The STACKS pipeline, available at http://catchenlab.life.illinois.edu/stacks/manual-v1/




In the meantime, CREA set up a GS platform in advance to accommodate the upcoming genomic and phenomic/phenotypic data from CERTH’s tomato breeding. In addition, CREA set up a genotyping and phenotyping platform integrated in the sorghum pilots (B1.3) for use as a testbed for CREA’s C22.03 (Genomic models) component (Figure 53).

Figure 53: CREA’s sorghum pilot fields used in the C22.03 genomic models platform

The DataBio algorithms were implemented as the DataBio C22.03 component, which is registered on the DataBio platform (https://www.databiohub.eu/registry/#services?tag=C22.03) and is deployed by CREA. For the predictive analytics run in 2018, available public datasets were used, and the outcome was encouraging. The analytics considered a single environment and several environments, to mimic single or several glasshouses. In a single environment, we implemented standard genomic modelling, predicting the performance of unphenotyped plant materials. Under multiple-environment scenarios, two cross-validation schemes were run. CV1 reflected prediction of tomato lines that have not been evaluated in any glasshouse trials. CV2 reflected prediction of tomato lines that have been evaluated in some but NOT all target environments (glasshouses). The rationale is that the prediction of lines not evaluated in a given environment benefits from borrowing information from lines that were evaluated in other environments (glasshouses). This is critical in cutting costs for varietal adaptability trials of a large number of lines in several target environments.
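The difference between the CV1 and CV2 schemes can be illustrated with a small sketch of how records might be split. This is not the C22.03 implementation: the record layout, function names and the rule of holding out exactly one environment per line in CV2 are assumptions made for illustration.

```python
import random


def cv1_split(records, test_fraction, seed=0):
    """CV1: hold out whole lines, so test lines are unobserved everywhere."""
    rng = random.Random(seed)
    lines = sorted({r["line"] for r in records})
    n_test = max(1, round(test_fraction * len(lines)))
    test_lines = set(rng.sample(lines, n_test))
    train = [r for r in records if r["line"] not in test_lines]
    test = [r for r in records if r["line"] in test_lines]
    return train, test


def cv2_split(records, seed=0):
    """CV2: hold out one environment per line; the line stays in training
    through its records from the other environments."""
    rng = random.Random(seed)
    by_line = {}
    for r in records:
        by_line.setdefault(r["line"], []).append(r)
    train, test = [], []
    for line in sorted(by_line):
        recs = by_line[line]
        if len(recs) < 2:  # a line seen in one environment cannot be split
            train.extend(recs)
            continue
        held_out = rng.choice(recs)
        test.append(held_out)
        train.extend(r for r in recs if r is not held_out)
    return train, test


# Hypothetical design: 6 lines evaluated in 3 glasshouses.
lines = [f"line_{i}" for i in range(1, 7)]
envs = ["glasshouse_1", "glasshouse_2", "glasshouse_3"]
records = [{"line": l, "env": e} for l in lines for e in envs]

cv1_train, cv1_test = cv1_split(records, test_fraction=0.5)
cv2_train, cv2_test = cv2_split(records)
print(len(cv1_train), len(cv1_test), len(cv2_train), len(cv2_test))
```

In CV1 no test line contributes any training record, whereas in CV2 every test line still has records from other glasshouses in the training set, which is what allows information to be borrowed across environments.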

BRR (Bayesian Ridge Regression), GBLUP (Genomic Best Linear Unbiased Prediction), LASSO (Least Absolute Shrinkage and Selection Operator), and Bayes B were implemented in this first trial. Under several environments, these algorithms were combined with environment information to generate further predictive analytics. For each algorithm, predictive analytics were run on a single-environment basis, across environments, with marker x environment interaction, and with the reaction norm model. In this report, the computational power of multiple environments and reaction norms was illustrated using the GBLUP algorithm. Our findings for the 2018 trial showed that the genomic models perform equally well under single environments. On the other hand, under multiple environments, CV2 was superior to CV1. Under CV2 settings, the single-environment model performed poorly. The model assuming equal marker effects across glasshouses worked well in relation to the single-environment model. Accounting for marker x environment interaction or implementing the reaction norm model performed comparably and produced superior results.
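As a rough illustration of single-environment genomic prediction, the sketch below fits marker effects by ridge regression (often called RR-BLUP, which is commonly described as giving the same predictions as GBLUP when the relationship matrix is built from the same markers). It is not the C22.03 component: the data are simulated, and the shrinkage parameter `lam` is an arbitrary assumption rather than a value estimated from variance components.

```python
import numpy as np


def fit_ridge_markers(X, y, lam):
    """Estimate marker effects by ridge regression on centred genotypes."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    lhs = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(lhs, Xc.T @ (y - y_mean))
    return beta, x_mean, y_mean


def predict(X_new, beta, x_mean, y_mean):
    """Predict phenotypes of genotyped but unphenotyped lines."""
    return y_mean + (np.asarray(X_new, dtype=float) - x_mean) @ beta


# Simulated data: 60 lines, 200 SNPs coded as 0/1/2 allele counts.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(60, 200)).astype(float)
true_beta = rng.normal(0.0, 0.1, size=200)
y = X @ true_beta + rng.normal(0.0, 0.5, size=60)

# Train on 40 phenotyped lines, predict the 20 treated as unphenotyped.
beta, x_mean, y_mean = fit_ridge_markers(X[:40], y[:40], lam=50.0)
y_hat = predict(X[40:], beta, x_mean, y_mean)
accuracy = np.corrcoef(y_hat, y[40:])[0, 1]
print(f"predictive correlation on held-out lines: {accuracy:.2f}")
```

Multi-environment, marker x environment and reaction norm models extend this idea with additional terms for environments and their interaction with markers; they are not shown here.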


Trial 2 timeline

The production of tomato Big Data from the greenhouses was slower than anticipated, since only two generations could be produced within a year and, moreover, development of large-fruited tomato cultivars was slower than expected. Hence, it was not possible to validate the C22.03 component on tomato-related data. Nevertheless, the genetic variability of the parental lines and their crosses was assessed using different algorithms. In addition, biochemical parameters of both the parental tomato lines and their crosses were measured, and their association with genetic variability was investigated.

January - November 2019: Greenhouse preparation and collecting phenotypic data,

metabolomic data, IoT data from tomato cropping greenhouses. Tomato sampling, ddRAD

library construction, next generation sequencing and bioinformatics analysis of NGS data.

Biochemical and nutritional characterization of tomato cultivars. Monitoring of phenotypic

data of F6 and F7 generations.

January - April 2019: seed calibration for annual sorghum genotypes, processing data from

preliminary sorghum trials (2017) and first year (2018) trial.

April - October 2019: Sorghum trials establishment, phenomic and genomic data collection,

pilot data integration and processing, preparation of reports and writing peer-reviewed

papers.


To prepare the Trial 2 stage, experimental protocols were designed, glasshouse seedbeds were set up, and open-field sites were identified and prepared according to regional recommendations, particularly in terms of fertilization and phytosanitary measures. Seeds were calibrated in time in order to anticipate the right planting density. For each tomato cross, ten seeds were sown in greenhouses to ensure that at least three individual plants would develop and could be sampled to further study their genetic, phenotypic and biochemical properties.

Trial 2 execution

Trials were sown on time and managed according to the designed experimental protocols.

Greenhouses for tomato pilot trials were established in Greece, whereas sorghum pilot trials

were established in Bologna, Italy. Tomato lines were genotyped using the double digest

restriction-site associated DNA (ddRADseq) approach, while sorghums were genotyped using

a genotyping-by-sequencing (GBS) strategy.

Concerning the tomato samples, sixty-nine samples were analysed in addition to the ones produced in 2018, resulting in 207 samples originating from 9 parental lines and clustering in 51 different populations. Crossings of the nine (9) parental lines were followed and analysed up to generation F7. DNA was extracted from young leaves using the NucleoSpin Plant II kit (Macherey-Nagel). DNA concentration was evaluated on a Qubit 4.0 fluorimeter, using the Qubit double-stranded DNA BR assay kit (Life Technologies, Carlsbad, CA), and its integrity was assessed on a 0.8% TAE agarose gel. Two hundred and seven NGS libraries were constructed by applying the ddRADseq protocol (Figure 51), using EcoRI and MspI as restriction enzymes. All libraries were quantified fluorometrically using the Qubit® dsDNA BR assay kit, and their molarity was calculated in relation to their size after indexing. Quantitative PCR (qPCR) was conducted on a Rotor-Gene Q thermocycler (Qiagen) with the KAPA Library Quantification kit for Illumina sequencing platforms (KAPA BIOSYSTEMS).

Next generation sequencing was performed at the Institute of Applied Biosciences (INAB) of the Centre for Research and Technology Hellas, on an Illumina NextSeq500 platform (Illumina Inc., San Diego, CA, USA) using the NextSeq™ 500/550 High Output Kit (2 x 150 cycles). Overall, 572,480,546 sequences of 150 bp (171 Gb) were produced to be mapped to the S. lycopersicum reference genome. NGS data were analysed using the reference-based STACKS v2.41 pipeline (Figure 52). Analysis was performed on the in-house HPC cluster at INAB, allocating 88 cores and 512 GB RAM to analyse the ddRADseq results. Results were analysed by applying different thresholds for inclusion/exclusion of SNPs and individual plants. Further filtering was conducted in PLINK v1.90 to reduce biases and incorrect inferences due to missing data (both for individuals and SNPs) and by Minor Allele Frequency (MAF). Post-filtering in PLINK followed the guidelines available at https://rdtarvin.github.io/RADseq_Quito_2017/main/2017/08/03/afternoon-ddrad-stacks.html. PLINK filtering was conducted for basic summary statistics by applying options for missing rate per SNP (--geno), missing rate per individual (--mind) and minor allele frequency (--maf). Again, loose and stringent criteria were used for the inclusion/exclusion of SNPs and individuals.
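The logic of the three filters can be sketched on a small genotype matrix. This is an illustration of the filtering idea only, not a reimplementation of PLINK: the order in which the filters are applied and all thresholds below are assumptions, since the report does not list the loose and stringent values that were used.

```python
import numpy as np


def filter_genotypes(G, max_snp_missing, max_ind_missing, min_maf):
    """Filter an individuals x SNPs matrix of allele counts (0, 1, 2; NaN = missing).

    Individuals with too many missing calls are dropped first, then SNPs are
    dropped on missing rate and on minor allele frequency.
    """
    G = np.asarray(G, dtype=float)

    # Per-individual missing rate (analogous in spirit to --mind).
    keep_ind = np.isnan(G).mean(axis=1) <= max_ind_missing
    G1 = G[keep_ind]

    # Per-SNP missing rate among retained individuals (analogous to --geno).
    snp_missing = np.isnan(G1).mean(axis=0)

    # Minor allele frequency from non-missing calls (analogous to --maf).
    called = (~np.isnan(G1)).sum(axis=0)
    alt_freq = np.nansum(G1, axis=0) / (2 * np.maximum(called, 1))
    maf = np.minimum(alt_freq, 1 - alt_freq)

    keep_snp = (snp_missing <= max_snp_missing) & (maf >= min_maf)
    return G1[:, keep_snp], keep_ind, keep_snp


# Hypothetical genotypes: 4 individuals x 4 SNPs.
nan = np.nan
G = np.array([
    [0, 1, 2, 0],
    [0, 1, 1, nan],
    [0, 2, 0, nan],
    [nan, nan, nan, 1],
])
filtered, keep_ind, keep_snp = filter_genotypes(
    G, max_snp_missing=0.2, max_ind_missing=0.5, min_maf=0.05
)
print(filtered.shape, keep_ind, keep_snp)
```

In this toy example the last individual is removed for missing data, the first SNP is removed because it is monomorphic, and the last SNP is removed for its high missing rate.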

Biochemical analysis and nutritional value assessment were carried out on the initial parental lines and on the final genotypes in order to evaluate the breeding process. For this purpose, a thorough biochemical analysis was carried out using both colorimetric and chromatographic methods. Total sugars and soluble solids were measured with a refractometer and expressed as Brix values; total polyphenol content was measured with the Folin-Ciocalteu method; total antioxidant activity was assessed with the DPPH radical assay; lycopene was measured spectrophotometrically; total flavonoid content was measured with the AlCl3 method; ascorbic acid was assessed with the Megazyme ascorbic acid assay kit; and amino acids were measured by GC-MS with the EZ:faast™ Free (Physiological) Amino Acid Analysis kit (Phenomenex).

The phenotypic characterization of F6 and F7 crosses was carried out according to the UPOV

guidelines. In Table 4, the morphological characteristics of different parts of the plant are

presented.




Table 4: Morphological traits of the plant, flower and leaf in 14 tomato genotypes according to the UPOV guidelines.

Genotype | Type of growth | Anthocyanin in the upper 1/3 of the stem | Flower color | Fasciation of the 1st flower | Leaf attitude at the middle 1/3 of the plant | Type of leaf blade | Attitude of petiole of leaflet in relation to main axis

F6 11x1_a | indeterminate | weak | yellow | absence | semi-drooping | bipinnate | horizontal

F6 11x1_b | indeterminate | absent or very weak | yellow | absence | | |

F6 3x1_f | indeterminate | absent or very weak | yellow | absence | horizontal | bipinnate | semi-erect

F6 3x1_e | indeterminate | absent or very weak | yellow | absence | horizontal | bipinnate | semi-erect

F6 3x1_d | indeterminate | absent or very weak | yellow | presence | drooping | bipinnate | horizontal

F6 3x1_c | indeterminate | absent or very weak | yellow | presence | | |

F6 3x1_a | indeterminate | absent or very weak | yellow | presence | drooping | bipinnate | horizontal

F6 3x1_b | indeterminate | weak | yellow | presence | | |

F6 1x9 | indeterminate | absent or very weak | yellow | absence | drooping | bipinnate | semi-erect

F7 32x30 | indeterminate | absent or very weak | yellow | absence | semi-erect | bipinnate | semi-erect

F7 17x32_b | indeterminate | weak | yellow | presence | semi-drooping | bipinnate | semi-erect

F7 17x32_a | indeterminate | weak | yellow | presence | semi-drooping | bipinnate | semi-erect

F7 32x36_a | indeterminate | absent or very weak | yellow | absence | drooping | bipinnate | horizontal

F7 32x36_b | indeterminate | absent or very weak | yellow | absence | | |


As shown in Table 4, most of the morphological characteristics are alike among the different genotypes. Two characteristics showed the highest variability: leaf attitude at the middle 1/3 of the plant and the attitude of the petiole of the leaflet in relation to the main axis. Since the climate of Greece is characterized by high temperatures during summer, the ability of the plants to tolerate heat stress was also evaluated. As shown in Table 5, significant variability was observed among the tomato genotypes. The most tolerant crosses were F6 11x1_a, F6 3x1_e and F6 3x1_d.



Table 5: Plant vigor and tolerance to high temperatures in 14 tomato genotypes.

Genotype | Plant vigor1 (%): weak, medium, good, very good | Tolerance to high temperatures1,2 (%): low, medium, high

F6 11x1_a 19 81 46 42 12

F6 11x1_b 3 54 43 84 10 6

F6 3x1_f 4 28 40 28 76 24

F6 3x1_e 5 13 21 61 80 20

F6 3x1_d 4 4 32 60 60 36 4

F6 3x1_c 5 44 51 37 46 17

F6 3x1_a 19 77 4 54 42 4

F6 3x1_b 7 17 73 3 76 24

F6 1x9 33 67 53 43 4

F7 32x30 13 33 54 34 66

F7 17x32_b 26 46 18 10 53 47

F7 17x32_a 5 40 55 30 45 25

F7 32x36_a 19 73 8 65 27 8

F7 32x36_b 10 90 34 66

1visually estimated at the end of the experiment (approximately 120 days after transplanting)

2estimated on the basis of aborted flowers during high temperatures



Table 6: Total production traits in 14 tomato genotypes (sum of six weekly harvests).

Genotype | Production weight (g) | Number of fruits | Mean fruit weight (g) | Number of fruits with catface | Number of fruits with cracking | Number of fruits with B.E.R.1 | Number of “off-type” fruits2 | Number of fruits with “nose”3

Mean S.E. Mean S.E. Mean S.E. Mean S.E. Mean S.E. Mean S.E. Mean S.E. Mean S.E.

F6 11x1_a 1434,69 108,26 19,23 0,89 75,99 5,33 0,00 0,00 3,35 0,34 0,00 0,00 5,12 0,35 0,15 0,15

F6 11x1_b 1712,32 98,81 19,54 1,04 92,55 6,86 0,04 0,04 4,57 0,54 0,00 0,00 5,43 0,44 0,00 0,00

F6 3x1_f 2174,59 138,19 51,89 3,29 42,22 1,01 0,22 0,15 6,78 1,03 0,04 0,04 7,85 0,73 0,04 0,04

F6 3x1_e 1984,54 120,67 58,62 3,70 34,09 0,79 0,08 0,08 1,73 0,40 0,04 0,04 5,46 0,59 0,08 0,05

F6 3x1_d 3004,52 190,57 40,89 3,55 80,97 5,67 12,93 1,84 19,74 1,84 0,22 0,12 9,82 0,79 0,15 0,07

F6 3x1_c 3646,69 138,99 46,46 2,63 83,09 4,77 8,39 1,68 12,81 1,49 0,27 0,10 6,69 0,43 0,04 0,04

F6 3x1_a 2902,92 122,40 16,27 0,80 184,34 7,77 2,54 0,26 9,96 0,56 0,04 0,04 1,92 0,29 0,12 0,06

F6 3x1_b 2823,00 149,25 15,59 0,78 181,02 4,46 2,17 0,28 9,83 0,60 0,00 0,00 1,45 0,17 0,00 0,00

F6 1x9 2271,83 145,76 26,83 11,44 144,90 7,19 0,38 0,14 10,90 0,64 0,38 0,09 3,17 0,30 0,03 0,03

F7 32x30 3315,39 89,45 126,46 4,18 26,50 0,57 0,00 0,00 0,00 0,00 0,00 0,00 6,73 0,74 0,04 0,04

F7 17x32_b 3709,44 277,02 35,84 2,50 103,62 3,39 17,60 2,05 14,12 1,39 0,28 0,20 7,16 0,69 0,12 0,07

F7 17x32_a 3496,41 258,97 34,96 2,50 100,39 3,66 12,00 1,20 10,73 1,46 0,00 0,00 7,64 0,75 0,09 0,06

F7 32x36_a 1954,19 88,29 19,23 0,97 104,16 3,54 8,92 0,78 16,77 1,00 0,00 0,00 5,42 0,22 0,31 0,09

F7 32x36_b 2169,75 107,43 21,43 0,84 101,10 3,19 8,50 0,69 17,96 0,82 0,14 0,11 5,21 0,47 0,43 0,11

1B.E.R. = blossom end rot

2marketable fruits that had slightly different attributes (shape) from the rest

3one carpel was not properly fused with the rest of the fruit and was protruding upwards

The mean values and their respective standard errors (S.E.) were calculated from 25-28 independent measurements per genotype.

Finally, the tomato genotypes were also phenotyped for specific production traits. As shown in Table 6, the most productive genotypes in terms of production weight were F6 3x1_a to F6 3x1_d, F7 32x30, and F7 17x32_a and F7 17x32_b. The genotype F7 32x30 was also the most productive of all in terms of the total number of fruits produced.

The sorghum experimental sites for this pilot coincided with the experimental sites for pilot B1.3. DNA was isolated from plantlets using the GeneJET Plant Genomic DNA Purification Kit. DNA concentration and purity were evaluated with a Tecan Infinite M200Pro spectrophotometer, while DNA integrity was evaluated through 1% agarose gel electrophoresis with GelRed (Biotium) as fluorescent dye. For each DNA sample, an aliquot of 60 µl at a concentration ≥ 10 ng/µl was used for downstream analyses. In sorghum, the methylation-sensitive restriction enzyme ApeKI was used for library preparation, and GBS was carried out on an Illumina HiSeq X Ten platform. The sequencing reads were aligned to the sorghum reference genome (Sorghum_bicolor NCBIv3) to enable variant discovery. The two batches yielded two matrices of 933,020 and 919,485 SNP loci, respectively, which were delivered as separate VCF files and subsequently merged into a single matrix of 1,252,091 loci using VCFtools. Marker quality control criteria were then applied to the merged dataset, considering only samples with both phenotypic and marker data. The final working matrix of 61,976 high-quality SNPs was used in this work for genomic selection and prediction analytics.
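The marker quality control criteria applied to the merged matrix are not detailed in this report. Purely as an illustration, the following minimal Python sketch filters SNP loci on two criteria that are commonly used for this purpose, call rate and minor allele frequency (MAF). The thresholds (0.8 and 0.05), the function names and the toy genotype matrix are assumptions made for this example and are not the criteria used in the pilot.

```python
from typing import Dict, List, Optional

# Genotype call: 0, 1 or 2 copies of the alternate allele; None = missing call
Genotype = Optional[int]


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype at one locus."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """Minor allele frequency at one diploid locus, ignoring missing calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(matrix: Dict[str, List[Genotype]],
                min_call_rate: float = 0.8,   # hypothetical threshold
                min_maf: float = 0.05         # hypothetical threshold
                ) -> Dict[str, List[Genotype]]:
    """Keep only the loci that pass both the call-rate and the MAF filter."""
    return {
        locus: calls
        for locus, calls in matrix.items()
        if call_rate(calls) >= min_call_rate
        and minor_allele_frequency(calls) >= min_maf
    }


# Toy matrix (hypothetical loci): locus name -> calls for five samples
snps = {
    "chr1_100": [0, 1, 2, 1, 0],            # fully typed and polymorphic
    "chr1_200": [0, 0, 0, 0, 0],            # monomorphic, MAF = 0
    "chr2_300": [1, None, None, 2, None],   # call rate 0.4
}
kept = filter_snps(snps)
print(sorted(kept))  # only chr1_100 passes both filters
```

In practice such filters would be applied to the merged VCF with dedicated tools; the sketch only shows the logic of reducing a large raw matrix to a smaller high-quality working matrix.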

Trial 2 results

In addition to genomic data, phenotypic data were produced, but the low number of phenotyped individuals was a bottleneck that did not allow the implementation of genomic selection and prediction analytics. Nevertheless, several population statistics models (Principal Component Analysis, PCA, and ADMIXTURE analysis) were applied to the dataset to profile the genetic background of the tomato cultivars in relation to the biochemical properties of the F6 and F7 plants (stable offspring), which was the aim of the current pilot. Analysis of the genetic diversity of the 207 genotypes revealed three major clusters: one enclosing the vast majority of the genotyped samples, a second enclosing the F6 and F7 of the 32x36 cross, and a third consisting of the F5, F6 and F7 17x32 crosses (Figures 54 and 55; PCA per population and per individual). Notably, all replicates of the F5_3x1 cross presented exactly the same clustering profile, indicating that the variance is low. The most diverse cross was 3x1, which presented a loose clustering, indicative of a less stable offspring over the generations compared to the other crosses used in this pilot. The first two principal components (PC1 and PC2) explained 48.87% of the total genetic variation. Admixture analysis confirmed the PCA results; the lowest cross-validation (CV) error for the 51 populations was obtained for K = 50.



Figure 54: Principal component analysis for the tomato populations based on their genetic background

Figure 55: Principal component analysis for the tomato individuals based on their genetic background

Along with the genetic diversity of the tomato genotypes, the variability of the crosses was also assessed based on their biochemical background. For this purpose, a PCA of the tomato cultivars was conducted on the following biochemical parameters: total sugars and soluble solids as expressed on the Brix scale, total phenol and flavonoid content, antioxidant activity, ascorbic acid content, amino acid content and lycopene content. This analysis showed three loose clusters of F6 and F7 crosses. As in the genomic analysis, the F6_3x1 cross clustered loosely in the PCA based on the biochemical data. The genetic diversity of the crosses F6_17x32 and F6_32x36 was also reflected in their biochemical background, since they did not group with any other cultivar.

Figure 56: Principal component analysis for the tomato individuals based on their biochemical background

In the open-field sorghum trials, the purpose was to assess the performance of four GS models (GBLUP, BRR, Bayesian LASSO, and BayesB) for four sorghum grain antioxidant traits (phenols, flavonoids, total antioxidant capacity, and condensed tannins), using whole-genome SNP markers. One key breeding problem modelled was predicting the antioxidant production of new, unphenotyped sorghum genotypes (validation set). The populations were weakly structured (analysis of molecular variance, AMOVA, R square = 9%), showed significant genetic diversity, and expressed antioxidant traits that were highly correlated and had a good level of variability. The perennial populations (S. bicolor × S. halepense) outperformed the annual populations (Sorghum bicolor) for all the antioxidants. The four genomic selection models implemented in this pilot performed comparably across traits, with accuracy ranging from 0.50 to 0.60 (Figure 57). These accuracies are considered high enough to sustain sorghum breeding for antioxidant production and to allow important genetic gains per unit of time and cost. The results produced in this pilot are expected to contribute to the implementation of genomic selection and to the genetic improvement of sorghum grain antioxidants for different purposes, including the manufacture of health-promoting and specialty foods, in Europe in particular and in the world in general.

Figure 57: Distribution (boxplot) of GS models validated accuracy in external sample (not used during model training) of 34 (30% of the total population) sorghum lines. FEN, FLA, TAC, TAN, respectively, polyphenols, flavonoids, total antioxidant capacity, and condensed tannins. Traits means are included within the boxplot. Trait means with same letter are not significantly different at the 5% level using the Tukey's HSD (honestly significant difference) test. Refer to text for the description of the GS models.
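The accuracy shown in Figure 57 is measured on lines that were held out from model training. A minimal sketch of that evaluation step is given below, assuming a random 70/30 split of the lines and Pearson's r between observed and predicted trait values as the accuracy measure. The function names, the seed and the toy values are hypothetical, and the GS models themselves (GBLUP, BRR, Bayesian LASSO, BayesB) are not reproduced here.

```python
import math
import random
from typing import List, Sequence, Tuple


def holdout_split(ids: Sequence[str], validation_fraction: float = 0.30,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split line identifiers into (training, validation) sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def pearson_r(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / math.sqrt(var_o * var_p)


# Toy example with ten hypothetical line identifiers
lines = [f"line_{i:02d}" for i in range(10)]
training, validation = holdout_split(lines)
print(len(training), len(validation))  # 7 training lines, 3 validation lines

observed = [2.1, 3.4, 1.8, 4.0]    # toy phenotypes of validation lines
predicted = [2.4, 3.0, 2.2, 3.5]   # toy genomic predictions for the same lines
print(round(pearson_r(observed, predicted), 2))
```

A model would be fitted on the training lines only, and the correlation would then be computed on the validation lines, so that the reported accuracy reflects performance on unphenotyped material.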





Component code and name | | | location
C22.03 Genomic models | Implementing genomic selection analytics to calibrate the phenomics against the genomics to successively predict the performance of unphenotyped plant lines and untested environments, with massive time and cost cutting, and meaningful genetic gain. | Validated with real pilot data | CREA (ephrem.habyarimana@crea.gov.it)

Data Assets

Data Type | Dataset | Dataset original source | Dataset location | Volume (GB) | Velocity (GB/year)
Phenomics, metabolomics, genomics, environmental data | DS-40.01 | CERTH, CREA | CERTH, CREA | 5000 | 2500





The phenotyping work in the tomato glasshouses proceeded more slowly than anticipated, which did not allow us to validate the GS algorithms on tomato materials. The algorithms were validated only in the sorghum (annual and perennial) pilots. Nonetheless, the ddRADSeq genotyping platform was validated and can be used for sequencing and genotyping (variant calling) services in several plant and animal breeding schemes. Current empirical evidence sets 0.5 as the baseline for genomic selection prediction accuracy in plant breeding. In addition, recent research has demonstrated that a genomic selection accuracy as low as 0.2 can allow substantial within-generation yield improvement. Therefore, the genomic selection model performances obtained in our pilots are high enough to sustain sorghum breeding for antioxidant production and to allow important genetic gains per unit of time and cost. In addition to accuracy, the value of the genomic selection strategy is also judged on other criteria, such as its potential to shorten the breeding cycle, with interesting economic returns, through intercrosses driven by genetic predictions. Hence, in the case of antioxidants, genomic selection offers the possibility to select for or against this trait early (e.g., at the seed or seedling stages) without waiting for seed setting or harvest. The genomic selection equations developed in this work can be directly used in sorghum breeding programs. The genomic selection results presented herein and the experimental designs used in this work can be applied in genetic investigations of antioxidants and in breeding programs to qualitatively and quantitatively improve antioxidant production for different purposes, including the manufacture of health-promoting and specialty foods.

KPIs

KPI short name | KPI description | Goal description | Base value | Target value | Measured value | Unit of value | Comment
A2.1-KPI-01 | Accuracy | Increased accuracy | 0.4 | 0.4-0.7 | 0.5-0.6 | Pearson's r | Pilot was successful
A2.1-KPI-02 | Breeding cycle (years) | Decrease the cycle relative to phenotypic breeding | - | 0.30 | 0.25 | Ratio | Too early to assess
A2.1-KPI-03 | Breeding costs (index) | Decrease costs relative to phenotypic breeding | - | 0.50 | 0.20 | Ratio | Too early to assess



7 Pilot 5 [B1.1] Cereals and biomass crop

Pilot overview

The product developed by TRAGSA Group with the help of ATOS and IBM Israel is a system that generates accurate "irrigation maps" and "vigor maps" of crops, using Big Data sources such as EO data and sensor data as inputs. These maps, covering different areas of Spain such as Castile and Andalusia, are included in an information management system for early warning of inhomogeneity.

In brief, this new service provides accurate analytical data on crop heterogeneity: differences in crop growth can appear because of irregular irrigation, mechanical problems affecting irrigation systems, incorrect distribution of fertilizers or any other source of inhomogeneity. This DataBio service is therefore a preventive tool that helps farmers and landowners avoid production losses, and a powerful tool for agricultural management in large productive areas.


Once the use case was defined, a first description of the required input data was decided. Massive and rapidly updated data, bio-agronomic data, sensor data, Earth observation data and geographic data from different sources were used, specifically:

• SENTINEL-2: Earth observation data owned by ESA (European Space Agency).

• Orthophotos: Earth observation data in image format obtained from the National Geographic Institute of Spain.

• RPAS: Earth observation data obtained by thermal and multispectral sensors, owned by TRAGSA.

• SigPAC: spatial data in ESRI Shapefile format that identify the parcels, owned by Junta de Castilla y León.

• Alphanumeric data from surveys and field visits, owned by TRAGSA.

Regarding Big Data processing, the remote sensing data used, such as Sentinel-2B, amount to terabytes per year, while the Spanish LPIS system and the Spanish orthophoto project (PNOA) each amount to hundreds of GBs. In terms of velocity, Sentinel-2 has the highest update rate among the information sources considered, with new data every five days. All other external sources have an annual update rate. The variety of formats includes images and terrain models, and the agricultural information varies, typically with the seasons of the year.

Some research has been carried out on the acquisition of own data through sensors or IoT devices, but this sub-pilot is still in an early stage of development.

After the capture and initial collection of the data, they were stored in MongoDB databases, and the processing of the Big Data with R to create models of inhomogeneities began. The outputs of this processing are raster images (images formed by an array of cells, or pixels, organized in rows and columns, in which each cell contains a value representing a given piece of information) and spatial databases.

As a first step, the assessment of the required satellite images and cloud processing service

platforms, part of DataBio platform, was carried out. This evaluation was made through its

application to the development of irrigation needs algorithms, in order to obtain full

functionality in web applications based on high frequency, scalable satellite image data at

national level.


Trial 2 timeline

Trial 1 timeline (Figure 58):

• First iteration of data acquisition + field work carried out on time.

• RPAS and field data acquisition and processing required a bigger effort than expected. Consequently, it was decided to improve the general monitoring based on satellite images in 2019, although some images were already available and pre-processed.

• Data processing periods took place after each field campaign and are still ongoing.

• The first implementation of these services as part of the DataBio platform was expected to be performed in M18-M28 and, currently, there is a first version that does not yet include GIS capabilities. Although this is a bit delayed, we are working on it and expect to have it ready for M28.

• KPIs were proposed to be measured in M12 (baseline) and M35 (final). In M12 it was

still too early to evaluate KPIs, so it was postponed, and an evaluation is included in

this document.

Trial 2 timeline (Figure 58):

• Second iteration of data acquisition + field work carried out on time.

• IoT Sensors installed. Use of ATOS Fiware Broker and PROTON.

• Final products and tools



Figure 58: Pilot B1.1 timeline


To prepare the Trial 2 stage, new data were collected and prepared according to the specifications defined in Trial 1. In addition, the agreement with the second pilot zone, Genil-Cabra (Andalusia), was defined.

Trial 2 execution

The final product developed during Trial 2 is composed of several technological elements aimed at achieving the following objectives: (i) reduction of inputs such as water, manure and fertilizers, (ii) reduction in energy consumption, (iii) automation of irrigation systems, (iv) optimization of irrigation area management, (v) deployment of Big Data in the agricultural environment, (vi) modernization of agriculture and (vii) traceability control.

The pilot objective is to integrate satellite and RPAS Big Data into decision-support tools in order to improve water efficiency and agricultural management for irrigated crops. The study area covers more than 100 ha; nevertheless, the methodology developed can be extended to bigger areas.

Consequently, the goal accomplished was to design, use and deploy tools and processes to create real Smart Agriculture in irrigation areas and to establish processes that are useful in other agri-food chains.

In order to achieve these objectives, the following Big Data sources have been processed and used:

Internet of Things:

Agro-climatic stations provide temperature, relative humidity, absolute humidity and wind data. The following sensors and controls are also used:

1. Contact sensors

2. Humidity sensors in the cropped soil, to know its actual conditions in order to determine and control the field capacity

3. Lysimeter (to control the level of nutrients in the field, adjusting the amount of manure and fertilizer needed)

4. Controls in the parcels with sprinklers, drippers, metering devices…

5. Irrigation network controls (pressure switches, pressure reducers, flow meters, manometers, solenoid valves for automatic opening and closure, counters) and pumping station controls (manometers, flow meters, pumping state, anti-return valves, alarm settings, heating…)

Images:

The pilot uses remote sensing data and RPAS/UAV-generated imagery. Satellite technology (SENTINEL, LANDSAT, etc.) helps stakeholders to monitor the general condition of the crop, obtaining its specific crop coefficient Kc (see footnote 10) and detecting pests, diseases, transpiration or excessive humidity problems. This technology helps to define the accurate amount of water and fertilization that the crop needs.

A comprehensive strategy combining Big Data remote sensing and field data has been helpful

for an improved and more efficient agriculture management. Satellite data are suitable for

monitoring large areas over time, while Remotely Piloted Aircraft Systems (RPAS) provide

specific data for calibration and validation purposes. Tragsa Group counts on different

solutions based on RPAS in order to fulfil these tasks.

The results of the pilot have therefore highlighted that, although the irrigation needed by crops is usually calculated using Kc reference values provided by FAO, the Normalized Difference Vegetation Index (NDVI) obtained by means of remote sensing has proven to be more useful for calculating a Kc factor adjusted to local conditions (Figure 59).

Kc = 1.25 × NDVI + 0.1 (Calera et al., 2014)

NDVI = (NIR - R) / (NIR + R), where R is the red band and NIR is the near-infrared band

Figure 59: Kc and NDVI equations
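The two equations in Figure 59 can be chained to go from band reflectances to a locally adjusted crop coefficient. The following minimal Python sketch does this for single pixel values. The last step, crop evapotranspiration as ETc = Kc × ET0, is the standard single crop coefficient relation from the FAO guidelines referenced in footnote 10 and is added here as an assumption about how Kc is turned into water needs; the function names, the reflectance values and the ET0 value are hypothetical.

```python
def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - R) / (NIR + R)."""
    if nir + red == 0:
        raise ValueError("NIR + R must be non-zero")
    return (nir - red) / (nir + red)


def kc_from_ndvi(ndvi_value: float) -> float:
    """Kc = 1.25 * NDVI + 0.1 (Calera et al., 2014, as quoted in the text)."""
    return 1.25 * ndvi_value + 0.1


def crop_evapotranspiration(kc: float, et0_mm_per_day: float) -> float:
    """ETc = Kc * ET0 (assumed FAO single crop coefficient approach)."""
    return kc * et0_mm_per_day


# Toy (NIR, R) reflectance pairs for two pixels; values are hypothetical
reflectances = [(0.45, 0.05), (0.30, 0.10)]
et0 = 5.0  # hypothetical reference evapotranspiration, mm/day

for nir, red in reflectances:
    v = ndvi(nir, red)
    kc = kc_from_ndvi(v)
    etc = crop_evapotranspiration(kc, et0)
    print(f"NDVI={v:.2f}  Kc={kc:.2f}  ETc={etc:.2f} mm/day")
```

For the first pixel, NDVI is about 0.8, which gives a Kc of about 1.1. Applied per pixel to a Sentinel-2 or RPAS image, the same functions would yield a Kc map rather than a single tabulated value per crop.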

DataBio pilot B1.1 has shown that Kc values obtained from NDVI derived from RPAS multispectral images improved on the ones provided by the FAO model and showed a very reliable relationship with Kc derived from SPOT 7 satellite images. In addition, some products were obtained from RPAS data, including RGB mosaics (3 cm GSD), thermal images and Digital Terrain and Surface Models (DTM, DSM) (Figure 60). These products provide valuable information for different purposes, such as monitoring plant health or estimating growth and biomass.

10 http://www.fao.org/3/X0490E/x0490e0b.htm




Figure 60: Left to right: NDVI image from multispectral RPAS data; RGB mosaic; thermal image over RGB mosaic; DSM

Finally, Big Data have provided new efficient decision-making tools to support agricultural development as well as biodiversity protection. Newly acquired, aggregated and shared data can serve as a breeding ground for extracting and sharing useful information and knowledge among different actors, as well as for combining large data sources (especially weather models and earth observation datasets) with advanced crop and environment models to provide actionable on-farm decisions.

Trial 2 results

This DataBio tool has been developed specifically for two Irrigation Communities as the final customers. In the current pilot, the objective was to save water and energy in irrigation areas using the following techniques:

• EO Big Data sources and remote sensing applications to calculate NDVI and the corrected crop factor Kc, balanced against precipitation, which needs to be measured on site.

• RPAS to address specific problems: pests and diseases, irrigation uniformity, soil problems, etc.

• Agroclimatic stations and IoT sensors to provide information in situ.

• Telecontrol systems.

• Irrigation equipment.



Figure 61: Comparison of Kc obtained by remote sensing against FAO data, per cereal

This DataBio pilot aims to reinforce the agribusiness sector by adapting the diversification of production to new economic and climate scenarios, and by introducing remote control and remote management systems in irrigated areas, where smart agriculture based on Big Data sources is essential.

The C5.0 algorithm (R language) has been used for the automatic classification of soil cover. It generates decision trees that combine data of different types (cartography, images, databases, etc.). It must be taken into account that these data will vary according to the zones, their availability and their quality. The classification algorithm was trained using samples of the different uses and coverages to be identified. The sample data are divided into two groups: 80% of the samples are used to build the model and, once the decision tree is generated, it is validated with the remaining 20% of the samples, which were not used in its construction. Using samples from all land uses, a decision tree is generated from which a classification of the major LPIS uses is obtained. Using only the agricultural samples, another tree is generated, from which a classification of agricultural crops is obtained. The combination of both classifications results in the crop and soil cover layer.
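The 80/20 training and validation procedure described above can be sketched in a few lines. The example below is not C5.0: as a stand-in, it fits a one-split decision stump chosen by entropy-based information gain, which is only the simplest member of the same family of tree learners. The feature vectors (a mean NDVI and an NDVI range per sample), the labels, the seed and all function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], str]  # (feature vector, land-use label)


def entropy(labels: Sequence[str]) -> float:
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def split_80_20(samples: List[Sample], seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle and split into 80% training and 20% validation samples."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def fit_stump(train: List[Sample]):
    """Return (feature index, threshold, label if <= threshold, label if > threshold)."""
    base = entropy([y for _, y in train])
    best, best_gain = None, -1.0
    for f in range(len(train[0][0])):
        values = sorted({x[f] for x, _ in train})
        # Candidate thresholds are midpoints between consecutive observed values
        for threshold in ((a + b) / 2 for a, b in zip(values, values[1:])):
            left = [y for x, y in train if x[f] <= threshold]
            right = [y for x, y in train if x[f] > threshold]
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(train)
            gain = base - weighted
            if gain > best_gain:
                best_gain = gain
                best = (f, threshold,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:
        raise ValueError("no valid split found")
    return best


def predict(stump, features: Sequence[float]) -> str:
    f, threshold, below, above = stump
    return below if features[f] <= threshold else above


def accuracy(stump, samples: List[Sample]) -> float:
    return sum(predict(stump, x) == y for x, y in samples) / len(samples)


# Toy samples: (mean NDVI, NDVI range over the season) -> land use
samples = ([((0.60 + 0.01 * i, 0.40), "agricultural") for i in range(10)]
           + [((0.10 + 0.01 * i, 0.05), "urban") for i in range(10)])
train, test = split_80_20(samples)
stump = fit_stump(train)
print(len(train), len(test), accuracy(stump, test))
```

Because the validation samples are never seen during fitting, the accuracy computed on them gives an estimate of how the tree would behave on new parcels. The two-stage scheme in the text would repeat this once with samples from all land uses and once with the agricultural samples only.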

After this technical description of the algorithms, it is necessary to emphasize that they had

(in the development phase) the following limitations: (i) in LPIS there are no differences

between arable crops, so it is not possible to verify if the crop identified by remote sensing

coincides with the one existing in the field and (ii) the spatial resolution of Sentinel-2 does not

allow the correct identification of woody crops, since it is limited to the response of the crop

and / or the plant coverings under it.

The results distinguish four levels of agreement:

• Null match: the use is agricultural in LPIS but classified as non-agricultural.

• Average coincidence: the use is agricultural both in LPIS and in the generated layer.

• High coincidence: the crop is of the same type both in LPIS and in the generated layer (in both cases it is a woody crop, or both are herbaceous).

• Perfect match: the crop is the same in both LPIS and the layer generated by remote sensing.
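The four match levels listed above amount to a small decision rule applied to each parcel. The sketch below is one possible reading of that rule, assuming each parcel carries an agricultural flag, a crop type (woody or herbaceous) and, where known, a crop name. The `ParcelRecord` structure, the field names and the "not evaluated" result for parcels that are non-agricultural in LPIS are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParcelRecord:
    is_agricultural: bool
    crop_type: Optional[str] = None  # "woody" or "herbaceous", if known
    crop: Optional[str] = None       # crop name; None when not recorded


def match_level(lpis: ParcelRecord, remote: ParcelRecord) -> str:
    """Compare the LPIS record with the remote-sensing classification."""
    if not lpis.is_agricultural:
        # Assumption: the levels are only defined for agricultural LPIS parcels
        return "not evaluated"
    if not remote.is_agricultural:
        return "null"
    if lpis.crop is not None and lpis.crop == remote.crop:
        return "perfect"
    if lpis.crop_type is not None and lpis.crop_type == remote.crop_type:
        return "high"
    return "average"


# Toy parcels (hypothetical values)
lpis_olive = ParcelRecord(True, "woody", "olive")
lpis_arable = ParcelRecord(True, "herbaceous", None)  # LPIS does not separate arable crops

print(match_level(lpis_olive, ParcelRecord(True, "woody", "olive")))       # perfect
print(match_level(lpis_arable, ParcelRecord(True, "herbaceous", "maize"))) # high
print(match_level(lpis_olive, ParcelRecord(True, "herbaceous", "wheat")))  # average
print(match_level(lpis_olive, ParcelRecord(False)))                        # null
```

The second example reflects the limitation noted earlier: since LPIS does not distinguish between arable crops, a herbaceous parcel can reach at most a high coincidence in this reading.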

The cause of the discrepancies should be analysed under the supervision of photo interpreters, with Sentinel-2 images being an important aid for this. Once the kind of crop has been defined, the developed methodology uses time series of Sentinel data to derive the Kc parameter and, from it, the irrigation needs of the specific crop.

Figure 62: Result: High-Scale vigour map

As can be seen in Figure 62, this DataBio system has obtained the temporal evolution of specific plots to determine both water needs and vigour maps.

Finally, a methodology for the calculation of water needs has been developed and applied to the Genil-Cabra (Andalusia) pilot zone. The farmer association involved in the pilot provided data on irrigated plots and crops from 2017. In addition, the pilot used Sentinel images from 2017. Using those datasets as initial Big Data sources, a classification process was developed to obtain the NDVI (Normalized Difference Vegetation Index). This index is the basis for the calculation of water needs. In the final cycle of the project, a data integration process was carried out to harmonize and unify the different datasets. Figure 63 shows how the processes explained above make it possible to classify the plots according to their irrigation needs.



Figure 63: Crops classification and irrigation needs

The maps, data sources and results obtained can be accessed through the component C11.01.


The development of the management application C11.01 has been completed. This tool allows the aforementioned Big Data sources to be accessed in a useful way both by the managers of the irrigation communities and by the farmers themselves. Figures 64 and 65 show the current status and general appearance of the web management application:

Figure 64: Management profile - Irrigation needs of the whole irrigation community



Figure 65: Farmer profile - Irrigation needs for a specific parcel and crop

Moreover, a general service has been developed (with irrigation communities as customers) to allow the publication of vigour maps and water needs of all the plots at provincial level. The analysis of the use of conventional sensors, IoT sensors and the tools by ATOS and IBM Israel (integrated in the DataBio platform) has also been completed, together with the publication of the processed and generated information through viewers for end users.

Massive and rapidly updated data, bio-agronomic data, sensor data, Earth observation data and geographic data from different sources were used, specifically:

• SENTINEL-2: Earth observation data owned by ESA.

• Orthophotos: Earth observation data in image format obtained from the National Geographic Institute of Spain.

• RPAS: Earth observation data obtained by thermal and multispectral sensors, owned by TRAGSA.

• SigPAC (LPIS): spatial data in ESRI Shapefile format that identify the parcels, owned by Junta de Castilla y León.

• Alphanumeric data from surveys and field visits, owned by TRAGSA.



Figure 66: Raspberry unit and IoT sensors

With regard to Big Data processing, the remote sensing data used, such as Sentinel-2B, amount to terabytes per year, while the Spanish LPIS system and the Spanish orthophoto project (PNOA) each amount to hundreds of GBs, as described in 7.2. Additional research has been carried out on the acquisition of own data through sensors or IoT devices with the help of ATOS and IBM Israel. After the capture and initial collection of data, storage and processing were conducted as described in 7.2, producing raster images and spatial databases.

As a first step, the assessment of the required satellite images and cloud processing service

platforms, part of DataBio platform, has been carried out. This evaluation was made in order

to obtain full functionality in web applications based on high frequency, scalable satellite

image data at national level. For this tool, data flow has been defined as presented in Figure

67.



Figure 67: Data flow diagram of the model for the implementation of precision agriculture techniques

As a first step, work is being done to improve the radiometry of the aerial orthophotos (PNOA) in order to increase their homogeneity and their subsequent possibilities of use, for both agrarian and environmental purposes, in automatic image analysis processes together with satellite images (radiometric intercalibration, DataBio Component C11.03). For this, several algorithms are being developed that improve the radiometry and the visual interpretation of the orthophotos (adjustment of colors and levels). Moreover, a software tool (called "Image Enhancer Framework") has been developed that allows these processes to be applied to large amounts of aerial images.
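The algorithms of component C11.03 are not described in this report. To give an idea of what an adjustment of levels can look like, the following minimal Python sketch linearly rescales the pixel values of one band so that their mean and standard deviation match those of a reference image. This mean and standard deviation matching is a generic, simple homogenization technique chosen for illustration; it is an assumption and not the method implemented in the "Image Enhancer Framework". The function name, the 0-255 value range and the toy pixel values are hypothetical.

```python
import statistics
from typing import List


def match_levels(band: List[float], reference: List[float]) -> List[float]:
    """Rescale `band` so its mean and spread match those of `reference`.

    Values are clipped to the 0-255 range assumed for 8-bit imagery.
    """
    mean_b, std_b = statistics.fmean(band), statistics.pstdev(band)
    mean_r, std_r = statistics.fmean(reference), statistics.pstdev(reference)
    if std_b == 0:
        # A flat band carries no contrast to stretch; shift it to the reference mean
        return [mean_r for _ in band]
    gain = std_r / std_b
    return [min(255.0, max(0.0, (v - mean_b) * gain + mean_r)) for v in band]


# Toy example: a dark, low-contrast band adjusted to a brighter reference
dark_band = [50.0, 60.0, 70.0]
reference_band = [100.0, 120.0, 140.0]
adjusted = match_levels(dark_band, reference_band)
print([round(v, 1) for v in adjusted])
```

Applied band by band to neighbouring orthophotos against a common reference, an adjustment of this kind reduces brightness and contrast differences between flights before the images are analysed automatically together with satellite data.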



Figure 68: Definition of histograms. Result of homogenization of images

On this topic, an operational methodology was designed for the generation of a layer of ground cover and changes using remote sensing, Big Data and GIS techniques; this methodology has proved functional for updating crop maps and crop health status maps. A machine-learning classification algorithm has been tested that combines the time series of Sentinel-2 images with reference data from different sources (CAP declarations, SIGPAC, Forest Map, etc.). Within this scope, four thematic layers have been designed:

• Large-scale dataset: this set of raster data identifies the major land uses: agricultural,

forestry, pasture, unproductive, water and urban.

• Change dataset: The changes observed are grouped into 3 classes: change, doubt and

no change.

• Crop dataset and soil cover: this raster dataset is generated on the agricultural land mask of LPIS. It represents the maximum level of disaggregation of coverages / crops / land uses that can be achieved in each zone, according to the reference data used.

• Dataset of discrepancies between the CAP declarations and the crop dataset obtained by remote sensing.


Component code and name | | | location
C11.01 | Pilot itself: tools, EO processing algorithms, WEB API developed | Completed | TRAGSA Group
C11.03 | Radiometric improvement of Orthophotos provided by the National Geographic Institute. This improvement and physical features harmonization (colour, intensity…) allows this datasource to be used with similar accuracy than Satellite Images. | Completed | TRAGSA Group
C05.02 | IoT Hub is a middleware component to support continuous data collection from IoT based resources. B1.1 Pilot collects field Data using IoT sensors which information is gathered by IoT Hub. | | ATOS
C19.01 | Complex event processing engine for event stream processing. The information centralized by C05.02 is the input for this component. It stores the rules defined by the final users of TRAGSA-TRAGSATEC | | IBM



Data Assets

| Data type | Dataset | Original source | Dataset location | Volume (GB) | Velocity (GB/year) |
| Raster | SENTINEL-2 | ESA. European Space Agency. | https://sentinel.esa.int/web/sentinel/missions/sentinel-2/data-products | ~ TB | ~ TB/year |
| Raster | PNOA (Spanish National Plan of Orthophotos) | IGN. National Geographical Institute. | http://centrodedescargas.cnig.es/CentroDescargas/buscadorCatalogo.do?codFamilia=02211 | ~ GB | Updating frequency ~ 5 years |
| Raster | RPAS - UAV | Developed by TRAGSA with its own drone fleet | N/A | ~ GB | On demand |
| Vectorial (Shapes) | LPIS - SIGPAC | Autonomous Communities | http://www.idecyl.jcyl.es/geonetwork/srv/spa/catalog.search#/home | ~ MB | Updating frequency ~ 2/3 years |
| Alphanumeric | IoT Sensors | TRAGSA | IoT Fiware | Small | Low |


Water scarcity is an increasingly common phenomenon worldwide. Hydrologic cycles do not coincide with the annual seasons, and periods of severe drought alternate with periods of heavy rain. In general, irrigated agriculture is vital to guarantee food security and to assure the levels of well-being and progress demanded by European citizens in the 21st century.


According to FAO estimates, in the first decade of this century the 17% of arable cropland under irrigation supplied 42% of the world's food. By 2020, irrigated arable crops are expected to provide 50% of food while using less water.

Therefore, the sustainability of irrigation areas must be promoted, and their specific problems must be solved in order to meet their needs. The specific problems are: 1) water scarcity, 2) increasing energy use, 3) absence of tools that determine the specific requirements of each crop at a given time, 4) lack of generalized and interoperable tools, 5) water quality problems, 6) low performance of irrigated arable crops, 7) lack of research on switching to alternative crops (developing pest-resistant local crop varieties, developing crops with low water requirements, etc.), 8) no control of the needs required to optimize the work.

The overall challenge is therefore to achieve a smart agriculture that ensures optimal conditions. Social and environmental challenges will have to be addressed in order to meet the needs of irrigation areas and turn them into optimized production areas.

Therefore, the following social challenges should be considered:

Sustainable Production: 1) Selecting better seeds that increase productivity, in order to meet the growing demand for food on a limited surface. Selection and genetic improvement will lead to better and more stable agricultural performance. 2) Water management for a secure and economically viable agriculture. Innovative technologies, such as Big Data, are needed to design new software that achieves an optimum use of water in agriculture. 3) Optimum fertilization, using technologies that reveal the availability of nutrients in the soil. The technologies used are Earth Observation (EO), models and soil sensors, which help to regulate the soil so that plant nutrition is maintained. 4) Technical processes to achieve the best soil quality. EO, models, soil sensors, machinery, etc. will be needed to identify problems and to establish the preventive, recovery and/or control and monitoring measures that must be implemented on the ground in order to improve environmental conditions and remove any risks that may result from soil contamination.

Cost: Water scarcity and increasing energy costs are the most important threats to irrigated agriculture. All agents involved in this sector are concerned about these challenges, which require the integration of continuous, sustainable technological innovation and new management structures to improve water and energy efficiency in each region. Furthermore, these problems could spread to the agribusiness sector created around the irrigated areas, because it needs security, stability and guarantees in its raw material supply. On the other hand, in many cases clean and renewable energy sources can be introduced, which will reduce the energy costs of irrigation areas.

Risk: Food safety and security are a major concern, and they must be guaranteed in food production. Unmanned aerial vehicles (UAVs) support pest and disease control. In addition, these technologies provide information that may help to identify the best variety for each area or to develop varieties that are resistant to pests and diseases.



Collective decision making: To support farmers' decision making on the use of resources (water, manure and fertilizers) and on their strategy for managing them.

The DataBio B1.1 pilot has used different kinds of sensors and actuators distributed across Irrigation Communities, first in experimental facilities for testing and finally in real scenarios involving daily activity, with a real impact on the services and infrastructures that are in place for systemic innovation in Water Communities. The kinds of sensors and actuators are very similar in all modernized irrigation areas, and their number varies depending on the scenario considered.

These technologies contribute to smart agriculture: they make it possible to apply the right amount of water at the optimum time, following water efficiency criteria that help to improve food security, in the sense that an increase in available water increases potential production.


The final service provides information for precision agriculture, mainly based on time series of high-resolution (Sentinel-2 type) satellite images, complemented with IoT sensor data and, in some specific cases defined by profitability, with RPAS data. The final cost savings for farmer communities, resulting from better management of agricultural zones and especially of irrigated crops, come mainly from better water and energy management. In addition, fertilizer control and monitoring eventually produce a prominent economic saving per year and hectare. This better management of water and energy resources is also related to a reduction in greenhouse gas emissions, directly linked to better environmental conditions in agriculture.

In summary, Spain has 3,621,722 hectares of irrigated agriculture, of which 73% is modernized pressurized irrigation and the remaining 27% is irrigated by gravity. Much of this area is managed by Irrigation Communities, which would be our addressable market.



KPIs

| KPI short name | KPI description | Goal description | Base value | Target value | Measured value | Unit of value | Comment |
| Surface | Processed Surface | 2 Irrigation Communities | 4000 | 12499.87 | 36445.87 | Ha | |
| Tool | Water needs tool | | 0 | 1 | 1 | Tool | Web API and Web service developed |
| Final users | Number of users | Stakeholders using the tool | 0 | 10 | 300 | user | |
| Campaign | Irrigation campaign (in Real conditions) managed by the tool | | | | 1 | | |



8 Pilot 6 [B1.2] Cereals, biomass and cotton crops_2

Pilot overview

The main focus of this pilot is to offer smart farming advisory services dedicated to arable crops, based on a set of complementary monitoring and data management technologies (IoT, EO data, Big Data analytics). Smart farming services are offered as irrigation advice, delivered through flexible mechanisms to farmers or agricultural advisors. The pilot aims to exploit heterogeneous data, facts and scientific knowledge to facilitate decisions and their application in the field. It promotes the adoption of Big Data enabled technologies and collaborates with certified professionals to better manage natural resources, specifically the use of fresh water. NP leads the pilot activities, with the support of GAIA EPICHEIREIN and Fraunhofer for the execution of the full lifecycle of the pilot. The pilot activities are performed at Kileler, Greece, in an area covering 5000 ha, and the targeted arable crop is cotton.

Figure 69: Pilot B1.2 high-level overview

In order to support the business expansion of the Big Data enabled technologies introduced within the present DataBio pilot, NP and GAIA EPICHEIREIN have already established an innovative business model that allows a swift market uptake. With no upfront infrastructure investment costs and a subscription fee proportionate to a parcel’s size and crop type, each smallholder farmer can now easily participate in and benefit from the provisioned advisory services. Moreover, as more than 70 agricultural cooperatives are shareholders of GAIA EPICHEIREIN, there is a clear face to the market and a strong liaison with end-user communities for introducing the pilot innovations and promoting the commercial adoption of DataBio’s technologies.


The pilot completed its first round of trials during Trial 1. It effectively demonstrated how Big Data enabled technologies and smart farming advisory services can offer the means for better managing natural resources and for optimizing the use of agricultural inputs (fresh water). These assumptions have been validated through a set of pilot KPIs, which met the targeted expectations (documented in D1.2). This was achieved because the farmers and the agricultural advisors showed a collaborative spirit and followed the advice generated by DataBio’s solutions.


Trial 2 timeline

The following roadmap applies for this pilot.

Figure 70: Pilot B1.2 timeline


The following work was conducted by NP as part of the preparatory work for Trial 2.

As the requirements for sensors deployed in the field differ between pilot sites, several adaptations were necessary with respect to C13.03 and the way data was represented, both for cloud-based storage and for Gaiatron station configuration. More specifically, all relational and EAV (Entity-Attribute-Value) data representations were adapted to the more flexible and scalable JSON format, which performs better in a dynamic IoT measuring environment. JSON has gradually become the standard format for collecting and storing semi-structured datasets that originate from IoT devices. Adopting a JSON format for modelling IoT data streams allows the further processing, parsing, integration and sharing of data collections, and supports system interoperability through the adoption of well-established and favoured linked-data approaches (JSON-LD).
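The move from an EAV layout to JSON documents can be illustrated with a short sketch. The row layout, station identifier and attribute names below are assumptions made for illustration; the report does not describe the actual C13.03 schema.

```python
import json


def eav_to_documents(rows):
    """Group (entity, attribute, value) rows into one dictionary per entity."""
    documents = {}
    for entity, attribute, value in rows:
        documents.setdefault(entity, {})[attribute] = value
    return documents


# Hypothetical EAV rows describing one reading of one station.
rows = [
    ("station-1", "timestamp", "2019-07-01T10:00:00Z"),
    ("station-1", "air_temperature", 31.4),
    ("station-1", "soil_moisture", 0.23),
]
documents = eav_to_documents(rows)
print(json.dumps(documents["station-1"], indent=2))
```

With this layout, a station that carries an extra sensor simply adds a key to its document, without a schema change. A JSON-LD variant would additionally attach an `@context` entry that maps each key to a shared vocabulary.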

User interface integration was performed so that the farm management portal (holding all data of agronomic value and the embedded DSS, and serving as the endpoint for providing the advisory services) is integrated with the farm electronic calendar (the endpoint where the farmer or the agricultural advisor enters information into the system regarding the applied cultivation practices, field-level observations, sampling, etc.). Both tools were developed using the component C13.01. The integration activities were conducted in order to offer a seamless user experience and to allow users to carry out their intended operations without going back and forth across different systems.

Figure 71: Screenshot of the unified UI developed for Trial 2. The red menu item indicates farm log functionalities while the orange menu item the farm management functionalities respectively

A new mobile application, namely “gaiasense Field Collect”, was developed so that field-level data collection can be performed through an Android-powered device. Lessons learnt from Trial 1 indicated that portable smart devices would make it easier for the farmer or the agricultural advisor to ingest data into the system (farm and eye data dimensions as indicated in Figure 1). The application was implemented to support several functionalities, such as:

a) detailed planning and control of the process of trapping and monitoring the population and the spread of insect infestation within a crop. More specifically, farmers can record insect infestation directly in the field with the help of a smartphone and use this data to control the damage caused by pests more effectively, while reducing the amount of insecticides released into the soil,

b) the recording of the phenological stage of the crop at the time of the field inspection,

c) the recording of soil samples from points within the field, of irrigation measurements, and of crop symptoms, mainly from pests and diseases.



Figure 72: Screenshots of the android app used for collecting farm data

Daily evapotranspiration is considered a critical parameter for generating irrigation advice. It essentially reflects the amount of water lost each day from both the plant and the soil. When this parameter is calculated using EO or modelled approaches, a dense network of irrigation sensors for monitoring soil moisture is no longer required. This significantly reduces infrastructure costs and leads to economies of scale, as irrigation advice can be extrapolated to many parcels that share similar agro-climatic characteristics (soft facts). Within the Trial 2 preparatory phase, a model-based approach was explored that attempts to simulate the operation of a high-end pyranometer measuring solar irradiance, an input parameter for reference evapotranspiration. ML methods (neural networks) were applied, correlating EO and sensor data (from both low-cost and high-end sensors) in order to generate highly accurate, low-cost reference evapotranspiration measurements even at parcel level (Figure 73). The results are encouraging, showing an accuracy of up to ~90% in estimating solar irradiance by fusing low-cost sensor measurements with EO data. This constitutes a major innovation of the pilot, as it sets the stage for a significant reduction in infrastructure costs that will make Smart Farming approaches even more accessible and appealing for adoption by farmer communities.
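To show why solar irradiance matters as an input, the sketch below computes daily reference evapotranspiration with the standard FAO-56 Penman-Monteith equation. This is an assumption made for illustration: the report does not state which formulation the pilot uses. The sketch also simplifies the method by taking saturation vapour pressure at the mean temperature and by expecting net longwave radiation as a given input. All input values are illustrative, not pilot measurements.

```python
import math


def reference_et0(t_mean, rs, rnl, u2, ea, altitude=0.0, g=0.0):
    """Daily reference evapotranspiration (mm/day), FAO-56 Penman-Monteith.

    t_mean: mean air temperature (deg C); rs: solar radiation (MJ m-2 day-1);
    rnl: net outgoing longwave radiation (MJ m-2 day-1); u2: wind speed at 2 m
    (m/s); ea: actual vapour pressure (kPa); g: soil heat flux (MJ m-2 day-1).
    """
    rn = (1.0 - 0.23) * rs - rnl  # net radiation, albedo 0.23 for the reference crop
    # Simplification: saturation vapour pressure taken at the mean temperature.
    es = 0.6108 * math.exp(17.27 * t_mean / (t_mean + 237.3))
    delta = 4098.0 * es / (t_mean + 237.3) ** 2  # slope of the vapour pressure curve
    pressure = 101.3 * ((293.0 - 0.0065 * altitude) / 293.0) ** 5.26
    gamma = 0.000665 * pressure  # psychrometric constant (kPa per deg C)
    radiation_term = 0.408 * delta * (rn - g)
    wind_term = gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    return (radiation_term + wind_term) / (delta + gamma * (1.0 + 0.34 * u2))


# Illustrative inputs for a single day.
et0 = reference_et0(t_mean=16.9, rs=22.07, rnl=3.71, u2=2.078, ea=1.409,
                    altitude=100.0, g=0.14)
print(f"ET0 = {et0:.2f} mm/day")
```

Because the radiation term dominates on clear summer days, an error in the estimated solar radiation `rs` carries over almost directly into the irrigation advice.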









into independent components. Each component consists of capabilities to access data,


at the UI-Layer. Following this approach provides more flexibility and eventually allows




thinking about a platform which enables the users to build views for custom analytic tasks



Insurance.

The implementation of Trial 2 focuses primarily on the integration of external services. A variety of visual analytics tools are included to allow efficient exploration of the available data. Services and data sources are integrated using well-defined RESTful interfaces.

Trial 2 execution


activities:

By M26, the growing season starts. Moreover, DataBio platform v2 for the pilot is fully operational and offers the farmers and the agricultural advisors technological tools (the unified UI and the “gaiasense” field collect Android app) through which they provide feedback, measurements, observations, and detailed data regarding the farming practices. In particular, information on the farming practices needs to be ingested into the system at regular intervals (once a week). As the farming ecosystem is really complex, it is essential to capture this information at this level of detail in order to shape a complete view of the monitored parcels. NP was in charge of supervising the data collection process.

Moreover, certified agricultural advisors started to use the aforementioned main pilot UIs in order to access the full set of collected data (in situ agro-climate, EO-based, crowdsourced, modelled, machine-generated), evaluate it and offer data-driven advice to the farmers towards better resource management and improved products and yields (more descriptions and figures can also be found in Deliverable D1.2).

Some indicative figures from the pilots are presented in Figures 73 - 75.

Figure 73: Parcel monitoring at Kileler pilot site indicating some slight intra-field variations in terms of vegetation index (NDVI) and cross-correlations among the latter with ambient temperature and rainfall (mm)



Figure 74: Reference evapotranspiration monitoring at Kileler (both modelled using ML methods developed by NP and based on Copernicus EO data) for July 2019

Figure 75: Irrigation monitoring at a Kileler pilot parcel showing one (1) correct irrigation (water drop icon) after following the advisory services. The impact of rainfalls in the soil water content is obvious on several occasions and if translated correctly can prevent unnecessary irrigations



hosted in NEUROPUBLIC’s N.Greece offices with the participation of other DataBio partners



By M34, the growing season ends and final KPI measurements are collected. More specifically, final KPI measurements and feedback were collected through regular discussions with the farmers and the agronomists/agricultural advisors involved in the pilot activities, and can be found in Section 8.5.2. This work was conducted by NP and GAIA EPICHEIREIN.



Trial 2 results


expected TRL. The farmers and their agricultural advisors continued (for a second year) to benefit from irrigation advice aiming to facilitate the decision-making process and optimize the use of agricultural inputs. The collected KPIs validate the pilot assumptions.

The results closely matched the initially set targets for irrigation cost reduction (Figure 76). This is because the farmers showed a collaborative spirit and adapted their farming practices using the advice offered, thus reducing the freshwater requirements during critical phenological stages of their crops.

Figure 76: Aggregated results of the pilot in comparison with the target values



| Component code and name | Purpose in the pilot | Status | Location |
| C13.01 Neurocode (NP) | Neurocode allows the creation of the main pilot UIs in order to be used by the end-users (farmer, agronomists) and offer smart farming services for optimal decision making | deployed | NP Servers |
| C13.03 GAIABus DataSmart Real-time streaming Subcomponent (NP) | Real-time data stream monitoring for NP’s GAIAtrons Infrastructure installed in the pilot site; Real-time validation of data; Real-time parsing and cross-checking | deployed | NP Servers |
| C04.02 – C04.04 Georocket, Geotoolbox, SmartVis3D (Fraunhofer) | Back-end system for Big Data preparation, handling fast querying and spatial aggregations (data courtesy of NP); Front-end application for interactive data visualization and analytics | deployed | Fraunhofer Servers |

Data Assets

| Data type | Dataset | Original source | Dataset location | Volume (GB) | Velocity (GB/year) |
| Sensor measurements (numerical data) and metadata (timestamps, sensor id, etc.) | Gaiasense field. Dataset composed of measurements from NP’s telemetric IoT agro-climate stations called GAIATrons for the pilot site. | NEUROPUBLIC | GAIA Cloud (NP’s servers) | Several GBs | Configurable collection and transmission rates for all GAIATrons. 4 GAIAtrons fully operational at the pilot sites collecting > 30MBs of data per year each with current configuration (measurements every 10 minutes) |
| EO products in raster format and metadata | Dataset comprised of remote sensing data from the Sentinel-2 optical products (1 tile) | ESA (Copernicus Data) | GAIA Cloud (NP’s servers) | >1000 | >350 |



NP and GAIA EPICHEIREIN launched their Smart Farming program, called “gaiasense” (http://www.gaiasense.gr/en/gaiasense-smart-farming), in 2013. It aims to establish a nationwide network of telemetric stations with agri-sensors and to use the data to create a wide range of smart farming services for agricultural professionals.

Within DataBio, the quality of the provided services greatly benefited from the collaboration with leading technological partners such as Fraunhofer, which specializes in the analysis of Big Data. Moreover, feedback from the end-users and lessons learnt from the pilot execution significantly fine-tuned, and will continue to shape, the suite of dedicated tools and services, thus facilitating the penetration of “gaiasense” in the Greek agri-food sector.

The sustainability of NP’s DataBio-enhanced smart farming services after the end of the project is achieved through: a) the commercial launch and market growth of “gaiasense” and b) the participation in other EU and national R&D initiatives. This will allow the outcomes of the project to be continuously evolved and validated, by working with both new and existing (to DataBio) user communities and applying its innovative approach to new and existing (again to DataBio) areas/crops.




KPIs

| KPI short name | KPI description | Goal description | Base value | Target value | Measured value | Unit of value | Comment |
| B1.2_1 | Reduction in the average cost of irrigation per hectare following the advisory services at a given period. | | 2670 | 1869 | 1881 | euros/ha | |
| B1.2_2 | Decrease in inputs focused on irrigation (amount of water used) | | 2670 | 1869 | 1881 | m3/ha | |
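The KPI values above are easier to compare as relative reductions from the base value. The short sketch below does this arithmetic with the figures from the table; the function name is hypothetical.

```python
def reduction_percent(base, value):
    """Relative reduction of value with respect to base, in percent."""
    return 100.0 * (base - value) / base


base, target, measured = 2670, 1869, 1881
print(round(reduction_percent(base, target), 2))    # targeted reduction
print(round(reduction_percent(base, measured), 2))  # measured reduction
```

The target corresponds to a 30% reduction and the measured value to roughly 29.6%, which is consistent with the statement that the results closely matched the targets.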



9 Pilot 7 [B1.3] Cereal and biomass crops_3

Pilot overview

This pilot was designed to implement Big Data technologies based on remote sensing, IoT farm telemetry and proximal sensor networks for biomass crop monitoring, prediction and management, in order to sustainably increase farming productivity and quality while minimizing farming and environmental risks. The biomass crops of interest include biomass sorghum and cardoon, which can be used for several purposes, including biofuel, fiber and biochemicals, with a high macroeconomic impact. Fiber hemp was anticipated but, due to unexpected aversion among farmers, this crop was not included in the pilots. The aversion was triggered in particular by the complicated market for the produce. Similarly, the IoT farm telemetry technology was used in year one for a preliminary observation, but it proved ill adapted to biomass sorghum because the hardware, particularly the cables, was frequently damaged by rodents. IoT was therefore removed from the trial settings, as frequent repairs were becoming a burden. The smart farming services offered include biomass crop monitoring using proximal sensors to derive vegetation indices, and crop growth and yield modelling using fAPAR derived from satellite (Sentinel-2A and 2B) imagery and appropriate machine learning techniques. The pilot secured the participation of private farmers and/or farming cooperatives. During the 2017 and 2018 cropping seasons, 43 sorghum pilots were run, covering 240 hectares. The work on this pilot was distributed between CREA, Novamont and VITO. CREA worked on sorghum and Novamont on cardoon. VITO supported remote sensing technologies, while CREA supported proximal sensor technology.

During 2018, an additional field of cardoon in the Umbria Region was included in the monitoring, beyond the one in Sardinia already included in the previous reports, in order to give an example of a different cultivation area and to cover some of the main areas where cardoon can be cultivated. In 2018, in collaboration with InfAI, CREA was able to extend crop monitoring to foliar diseases in one of the pilot fields in Anzola, Italy. The goal was to evaluate the possibilities of crop disease detection from Earth Observation products. For this investigation, an R-CNN (Region-based Convolutional Neural Network) was implemented. Despite the great potential we uncovered in the disease monitoring technology, we identified a weakness associated with relying heavily on natural disease inoculum. Indeed, natural inoculum is heterogeneous in the field, and diseased areas can range from a single plant to a few plants, which is very challenging in terms of resolution. This investigation was therefore discontinued in 2019.

In 2019, crop monitoring activities in biomass sorghum continued in collaboration with VITO and the agricultural cooperative CAB MASSARI. Four pilots were established in 2019, as depicted in the table below (Figure 77).



Figure 77: Sorghum pilots established in 2019


In terms of overall sorghum crop disease monitoring, five training and testing fields for crop disease detection were identified by CREA. Within the diseased field, CREA delimited the most diseased area, of about 1000 square meters (~232 m of perimeter), within which leaf disease occurred in about 60 to 70% of the plants. Two foliar diseases were observed, i.e., Anthracnose (most prevalent) and Bacterial stripe. The primary hypothesis is that most crop diseases correlate strongly with the chlorophyll content of the crop. Moreover, the chlorophyll content can be measured from multispectral images. Therefore, the NDVI (Normalized Difference Vegetation Index) was used. In the first run, excellent results were obtained: the network works as intended and detects the fields (Figure 78).

The network was even able to detect the disease and distinguish it from the surrounding areas (Figure 79).
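The NDVI used as input here is the normalized difference between near-infrared and red reflectance. A minimal sketch follows; the function name and sample reflectances are illustrative, and for Sentinel-2 the two inputs would typically be bands B8 and B4.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # convention chosen here for pixels with no signal
    return (nir - red) / total


# Illustrative reflectances: a dense healthy canopy and a stressed canopy.
print(ndvi(0.50, 0.05))
print(ndvi(0.30, 0.15))
```

Healthy, chlorophyll-rich vegetation reflects strongly in the near infrared and absorbs red light, so lower NDVI values inside a field are consistent with the hypothesis that disease reduces chlorophyll content.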

Figure 78: Detected area of sorghum foliar diseases, with a reliability of 0.925



Figure 79: Detected area of sorghum foliar diseases, with a reliability of 0.861

The dataset was very small: overall there were six training sets and two validation sets, so the results were limited. The main problem with small datasets is overfitting, which means that the model fits the training data too closely and generalizes poorly to new data. In order to overcome overfitting, we are working on the following issues:

• Expanding the database (contact with the local agricultural authority of Saxony; more will follow)

• Augmentation (expanding the database by manipulating existing images)

• Regularization

Up to now, we have created 1000 test cases from our starting point, and the success rate is still high.
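One common form of the augmentation mentioned above is to generate rotated and mirrored copies of each training tile. The sketch below illustrates this under the assumption that a tile is a small two-dimensional grid of values; it is not the augmentation pipeline actually used in the pilot, and the function names are hypothetical.

```python
def rotate_clockwise(tile):
    """Rotate a 2D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*tile[::-1])]


def flip_horizontal(tile):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in tile]


def augment(tile):
    """Return the 8 rotated and mirrored variants of a tile."""
    variants = []
    current = [list(row) for row in tile]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate_clockwise(current)
    return variants


# Hypothetical 2x2 tile of pixel values.
variants = augment([[1, 2], [3, 4]])
print(len(variants))
```

Each labelled tile therefore yields up to eight training samples. Annotations such as bounding boxes would have to be transformed in the same way.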

For crop monitoring using satellite imagery, forty-three pilot biomass sorghum trials were run by CREA over the 2017 and 2018 cropping seasons, as represented in Figure 80. The biomass sorghum pilot trials were mainly established on private farms and co-run by CREA with private farmers and private farming cooperatives operating in the northern Italian communes of Nonantola, Mirandola and Conselice. Only eight pilots were run at CREA’s experimental station of Cà Rossa (Anzola dell’Emilia) in both the 2017 and 2018 cropping seasons. During the 2018 cropping season, sorghum was monitored for phenology, yields and foliar diseases. Two cardoon fields were monitored in 2018. One, located in the north of Sardinia, continued the 2017 work; this cardoon field was established in 2014. The other field is located in Umbria, which is a fairly new area for cardoon and where breeding activity is also carried out by Novamont. In the last cultivation period (2017-18) in Umbria, the phenological phases were monitored together with the agronomic operations.



Figure 80: Map of Italy (A) with a rectangle inset indicating the geographical location of the experimental sites (red dots) for pilots established in 2017 (B) and 2018 (C)


Trial 2 timeline

January - May 2019: Pilot sites identification, preparing contracts between CREA and the

farming cooperative CAB MASSARI of Conselice, Italy, preparing fields and calibrating seeds,

sowing the pilots.

May - October 2019: Field visits, data collection, Data processing, and reporting.


In collaboration with the farming cooperative CAB Massari of Conselice, the pilot sites were identified and an ad hoc contract was signed between CREA and CAB Massari. The contract described the sequence of field activities that CAB Massari and CREA had to carry out in the pilots. The plot sites were geolocated and the coordinates entered into the VITO system for monitoring the fAPAR index throughout the cropping season. In addition, chlorophyll meters and NDVI meters were prepared for the respective data collection.

Trial 2 execution

The chlorophyll index and the NDVI were collected weekly. Fields were geolocated and the geolocation data saved as KML files before being integrated into the WatchItGrow® application. Sentinel-2A and Sentinel-2B images from tile 32TQQ were downloaded from ESA and processed. Processing included atmospheric correction with iCOR, cloud and shadow detection using Sen2Cor v2.5.5, and calculation of biophysical parameters using BV-NET (Biophysical Variable Neural Network). The BV-NET methodology is based on neural networks, which are trained on a synthetic dataset of around 50000 simulations using the PROSAIL model. Both Sen2Cor and BV-NET are made available through ESA’s SNAP (Sentinel Application Platform) toolbox. In this study, fAPAR was used to estimate biomass yield. The fAPAR estimates were generated at decametric spatial resolution (10 m pixel size) and a temporal resolution of 5 days, improving to 2-3 days in those areas where different satellite overpasses overlapped. Spatial resolution refers to the surface area measured on the ground and represented by an individual pixel, while temporal resolution is the amount of time, expressed in days, that elapses before a satellite revisits a particular point on the Earth's surface. For each experimental field, fAPAR or “greenness” maps were produced and a growth curve was built, showing the evolution of the fAPAR values throughout the cropping season. To correct for artefacts in the curve, such as abnormally low fAPAR values due to undetected clouds, shadows or haze, and to interpolate fAPAR values between subsequent acquisition dates, a Whittaker smoothing filter was applied to the curve. Finally, the fAPAR values from the curves were used for further analytics.
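The Whittaker smoother mentioned above balances fidelity to the observations against the roughness of the curve. With weights w, observations y and a second-order difference matrix D, the smoothed series z solves (W + λ DᵀD) z = W y, where W is the diagonal matrix of weights. The sketch below is a plain-Python illustration, not the implementation used in the pilot; the smoothing parameter `lam`, the sample series and the choice of giving suspect observations a weight of zero are assumptions.

```python
def second_difference_penalty(n):
    """Return D^T D for the second-order difference operator D (n columns)."""
    penalty = [[0.0] * n for _ in range(n)]
    for r in range(n - 2):
        row = {r: 1.0, r + 1: -2.0, r + 2: 1.0}  # one row of D
        for i, di in row.items():
            for j, dj in row.items():
                penalty[i][j] += di * dj
    return penalty


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def whittaker_smooth(y, weights, lam=10.0):
    """Smooth y; observations with weight 0 are ignored and interpolated."""
    n = len(y)
    penalty = second_difference_penalty(n)
    a = [[lam * penalty[i][j] + (weights[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [w * v for w, v in zip(weights, y)]
    return solve(a, b)


# Hypothetical fAPAR series rising linearly, with one cloud-contaminated value.
observed = [0.1 * i for i in range(10)]
observed[4] = -1.0                      # artefact caused by an undetected cloud
weights = [1.0] * 10
weights[4] = 0.0                        # exclude the suspect observation
smoothed = whittaker_smooth(observed, weights)
print(round(smoothed[4], 3))
```

Setting the weight of a suspect acquisition to zero lets the smoother fill the gap from neighbouring dates, which is the interpolation role described in the text.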

Four models were assessed including simple linear model (LM), Bayesian additive regression

trees (bartMachine method), Bayesian generalized linear model (bayesglm method), and

eXtreme Gradient boosting (xgbTree method). The simple linear model was used as a

benchmark to gauge the performance of the models implemented. The models evaluated

were selected based on their robustness. Fortnightly fAPAR values acquired from late April to

late August were used in this work, resulting in nine days of year (DOY) that is, from DOY 120

in April to DOY 240 in August. These days of year were used as explanatory (regressors)

variables in successive predictive modelling of sorghum biomass yields. The dataset was

randomly partitioned into training (80% of the entire dataset) and testing set (20% of the

entire dataset). The training set was used to run a cross-validation experiment to train and

assess the models using a 10x repeated 5-random fold cross-validation (CV), rendering a total

of 50 estimates of accuracy and prediction error. Models were validated on the testing set

which was an external test (validation) sample. The models were evaluated based on the

coefficient of determination (R2), mean absolute error (MAE), mean absolute percentage

error (MAPE), and symmetrical mean absolute percentage error (SMAPE). The MAPE makes

it possible to compare the prediction of different dependent variables that were evaluated

using different scales. The MAE measured the average magnitude of the errors in the set of

predicted values without considering their direction. The MAE provides an unambiguous

measure of the magnitude of the average error and is therefore more appropriate than the

Root Mean Square Error (RMSE) for dimensioned evaluations of average model performance

error. The symmetrical MAPE (SMAPE) was used to deal with some of the limitations of the

MAPE. As in MAPE, SMAPE averages the absolute percentage errors but these errors are

computed using a denominator representing the average of the forecast and observed values.

SMAPE has an upper limit of 200%, that is, a 0 to 2 range that is useful to judge the level of

accuracy and that should be influenced less by extreme values. Furthermore, SMAPE corrects



for the computation asymmetry of the percentage error. The MAE built within the repeated

cross validation procedure was used to assess the dependability of the model performance.

On the other hand, all the above metrics as obtained on the testing set were used to assess

the model predictive ability. The importance of the explanatory variables (useful prediction

times) was determined using a 0 to 100 index, with 0 no effect and 100 the highest magnitude

of the regressor’s importance.
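The error metrics and the resampling scheme described above can be sketched as follows. This is an illustrative sketch, not the pilot's code: R2 is computed here as 1 - SS_res/SS_tot, which is an assumption because the report does not state which definition was used, and MAPE and SMAPE are returned as fractions (so SMAPE lies in the 0 to 2 range).

```python
import numpy as np

def mae(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(obs - pred)))

def mape(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs((obs - pred) / obs)))

def smape(obs, pred):
    # Denominator is the average of the forecast and observed magnitudes.
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(pred - obs) / ((np.abs(obs) + np.abs(pred)) / 2)))

def r2(obs, pred):
    # Assumed definition: 1 - residual sum of squares / total sum of squares.
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1 - ss_res / ss_tot)

def repeated_kfold(n_samples, n_folds=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs: n_repeats random partitions into n_folds."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        perm = rng.permutation(n_samples)
        for fold in np.array_split(perm, n_folds):
            yield np.setdiff1d(perm, fold), fold

# 10 repeats of 5 random folds give 50 train/validation splits.
splits = list(repeated_kfold(n_samples=40))
```

Fitting a model on each training split and computing `mae` on the held-out fold would yield the 50 error estimates whose dispersion is used to judge how dependable a model is.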

Trial 2 results

The results obtained in the Trial 2 (third year of the project) were integrated with the previous

two years’ data in order to be meaningful. The MAE dispersion during the training experiment
narrowed in the order LM > bayesglm > xgbTree > bartMachine.
Over the months evaluated, the prediction errors in the testing set were mostly higher with
the linear model, which also displayed the lowest coefficient of determination
(Table 7). Overall, the bartMachine method showed relatively high R2 values and the lowest
prediction errors. The best regressors were D.150 (second half of May) and D.165 (first half

of June) (Figure 81). D.240, D.195, D.210, and D.120 showed minor effects, while D.135,

D.180, and D.225 showed no prediction importance.

Table 7: The observed performance of implemented models.

Model | SMAPE (%) | MAPE (%) | MAE (t ha-1) | R2
LM | 0.74 | 0.99 | 10.47 | 0.47
bartMachine | 0.18 | 0.16 | 2.32 | 0.51
bayesglm | 0.74 | 0.98 | 10.34 | 0.48
xgbTree | 0.44 | 0.36 | 4.07 | 0.62

SMAPE, MAPE, MAE, R2, respectively, symmetrical mean absolute percentage error, mean absolute percentage error, mean

absolute error, and coefficient of determination. LM, bartMachine, bayesglm, xgbTree, respectively, simple linear model,

Bayesian additive regression trees (bartMachine method), Bayesian generalized linear model (bayesglm method), and

eXtreme Gradient boosting (xgbTree method).



Figure 81: Left: visualization of models cross-validation MAE (t ha-1) dispersion using boxplot approach and fAPAR acquired from April to August. LM, bartMachine, bayesglm, xgbTree, respectively, simple linear model, Bayesian additive regression trees (bartMachine method), Bayesian generalized linear model (bayesglm method), and eXtreme Gradient boosting (xgbTree method). Right: relative importance of regressors (day of year, D) on sorghum biomass yields using the bartMachine method.

The pilot B1.3 was conducted yearly from 2017 through 2019. An integrative analysis was

carried out that accounted for: 1) the data collected from the 2017 preliminary trials, 2) the

data collected from the 2018 Trial 1, and 3) the data collected from the 2019 Trial 2. An

integrative conclusion is therefore in order. Clearly, Sentinel-2-derived fraction of absorbed

photosynthetically active radiation (fAPAR) was found to explain primary productivity and

was used in this study as biophysical variable in the predictive modelling of aboveground

biomass yields in annual and perennial sorghums. Bayesian additive regression trees

(bartMachine method), a Bayesian machine learning approach, was found to be more promising
than most artificial intelligence approaches, and days of year 150 and 165 contributed most
as regressors to the performance of sorghum biomass yield prediction.





Component code and name | Status | Location
C12.03 EO4CDD: Detect crop diseases, tested during trial stage 1 | Initial set up | InfAI [Germany] Server, https://www.databiohub.eu/registry/#service-view/EO4SDD
C08.02 (Proba-V MEP): Sentinel-2 processing, dashboards, services for viewing and time series extraction | Adapted according to the needs of pilot B1.3 | Proba-V MEP at VITO
C22.01: Crop monitoring and yields prediction | Adapted according to the history and events in the pilot B1.3 | CREA (ephrem.habyarimana@crea.gov.it)

Data Assets

Data type | Dataset | Dataset original source | Dataset location | Volume (GB) | Velocity (GB/year)
Phenotypic data | Sorghum biomass.CREA | CREA | CREA | 0.3x10^-3 | 0.15x10^-3
Geospatial data | Sentinel.sorghum.CREA | VITO | VITO | 3000 | 3000
Optical sensors data | NDVI.Chl.CREA | CREA | CREA | 3x10^-3 | 3x10^-3


KPIs

KPI short name | KPI description | Goal description | Base value | Target value | Measured value | Unit of value | Comment
CREA-B1.3-KPI-01 | Early within season yields prediction error | Reduce prediction error | 5 | 5 | 0.16 | % | MAPE (%, mean absolute percentage error)



10 Pilot 8 [B1.4] Cereals and biomass crops_4

Pilot overview

The pilot aims to develop a platform for mapping crop vigor status from EO data
(Landsat, Sentinel) as a support tool for variable rate application (VRA) of fertilizers and
crop protection products. This includes identification of crop status, mapping of spatial
variability and delineation of management zones. The platform is being developed on a
cooperative farm of 8300 ha in the Czech Republic; however, the basic datasets are already
prepared for the whole country, so the solution can currently be used on any farm in the
Czech Republic.

The pilot farm Rostenice a.s., with 8300 ha of arable land, is a large enterprise
established by aggregating several farms over the past 20 years. Production focuses on
cereals (winter wheat, spring barley, grain maize), oilseed rape and silage maize for a biogas
power station. Crops are cultivated under standard practices, with conservation practices
applied on sloped fields threatened by soil erosion. Over 1600 ha have been mapped since 2006
by high-density soil sampling (1 sample per 3 ha) as input information for variable
application of base fertilizers (P, K, Mg, Ca). Farm machines are equipped with RTK guidance
with 2-4 cm accuracy. Farm agronomists do not use any strategy for VRA of nitrogen fertilizers
and crop protection because of a lack of reliable solutions in CZ.

The work was supported by the development of a platform for automatic downloading and
atmospheric correction of Sentinel-2 data. Lesprojekt is now ready to offer commercial
satellite data processing services for any farm in the Czech Republic.

Another part of the work focused on transferring the Czech LPIS into the FOODIE ontology and
developing effective tools for querying the data. This work was done together with PSNC, and
the system currently supports open access to anonymous LPIS data through the FOODIE ontology
as well as secure access to farm data.

The main focus of the pilot is on the monitoring of cereal fields by high resolution satellite

imaging data (Landsat 8, Sentinel 2) and delineation of management zones within the fields

for variable rate application of fertilizers. The main innovation is to offer a solution in form of

web GIS portal for farmers, where users could monitor their fields from EO data based on the

specified time period, select cloudless scenes and use them for further analysis. This analysis

includes unsupervised classification for defined number of classes as identification of main

zones and generating prescription maps for variable rate application of fertilizers or crop

protection products based on the mean doses defined by farmers in web GIS interface.
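The zoning and prescription step described above can be sketched as a one-dimensional k-means classification of a vegetation index, followed by a dose map built around the mean dose entered by the farmer. This is a minimal sketch under stated assumptions: the actual classifier and dose rule of the web GIS portal are not given in the report, so the rule used here (dose proportional to the zone's mean index relative to the field mean) and the sample pixel values are hypothetical.

```python
import numpy as np

def kmeans_1d(values, n_zones=3, n_iter=50):
    """Unsupervised classification of pixel values into n_zones classes."""
    values = np.asarray(values, dtype=float)
    # Deterministic start: centres at evenly spaced quantiles of the data.
    centres = np.quantile(values, (np.arange(n_zones) + 0.5) / n_zones)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        new = np.array([
            values[labels == k].mean() if np.any(labels == k) else centres[k]
            for k in range(n_zones)
        ])
        if np.allclose(new, centres):
            break
        centres = new
    labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
    return labels, centres

def prescription(values, labels, centres, mean_dose):
    """Per-pixel dose; assumed rule: dose scales with the zone's mean index."""
    values = np.asarray(values, dtype=float)
    zone_dose = mean_dose * centres / values.mean()
    return zone_dose[labels]

# Hypothetical NDVI pixels of one field and a mean dose of 100 kg N/ha.
ndvi = [0.30, 0.32, 0.31, 0.55, 0.57, 0.80, 0.82, 0.81]
zones, zone_centres = kmeans_1d(ndvi, n_zones=3)
doses = prescription(ndvi, zones, zone_centres, mean_dose=100.0)
```

Because each converged centre is the mean of its zone, the field-average dose of the resulting map equals the mean dose entered by the farmer; only its spatial distribution changes.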


As a result of Trial 1, spatial crop yield data from harvesters were recorded from June to
September. Of the pilot farm's total acreage of 8300 ha, more than 3350 ha of arable land
were covered by yield mapping in the 2018 cropping season. Yields were recorded mainly for
grain cereals (winter wheat, spring barley, winter barley), oilseed rape and grain maize.
The data were then screened for outliers and spatially interpolated to obtain final crop
yield maps in absolute [t.ha-1] and relative [%] measures.

Figure 82: Yield maps represented as relative values to the average crop yield of each field (harvest 2018)

During the 2018 vegetation period, a field experiment was established to test variable rate
application of nitrogen fertilizer based on yield potential maps computed from Landsat
time-series imagery and a digital elevation model (DEM). Testing was carried out on three
fields with a total acreage of 133 ha. The main aim was to tailor nitrogen rates for spring
barley to the site-specific yield productivity and to avoid the risk of crop lodging in water
accumulation areas. Plant nutrition of spring barley for malt production is more difficult
than for other cereals because of the limit on maximum N content in grain. Balancing N rates
to reach the highest yield without exceeding the N content limit in grain is therefore crucial
for successful production of spring barley.

For definition of yield productivity zones, 8-year time-series of Landsat imagery data was

processed with the results of relative crop variability. Final map is represented as percentage

of the yield to the mean value of each plot, later multiplied by expected yield [t.ha-1] as the

numeric variable for each field and crop species. Values of yield potential were reclassified

into three categories – high, middle and low-yielded areas – nitrogen rate was increased in

the high expected yield areas.
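The derivation of yield productivity zones can be sketched as follows. This is an illustration rather than the pilot's processing chain: it assumes that each year's index is normalised by the field mean and then averaged over the years, and the zone thresholds (95 % and 105 %) and nitrogen rates are hypothetical values.

```python
import numpy as np

def yield_potential(index_stack, expected_yield_t_ha, low=95.0, high=105.0):
    """Relative and absolute yield potential plus a three-zone classification.

    index_stack: array of shape (years, pixels) with a vegetation index for
    one field. The thresholds `low` and `high` are illustrative assumptions.
    """
    stack = np.asarray(index_stack, dtype=float)
    # Percentage of each pixel relative to the field mean, per year.
    rel_by_year = 100.0 * stack / stack.mean(axis=1, keepdims=True)
    relative = rel_by_year.mean(axis=0)                # [%]
    absolute = relative / 100.0 * expected_yield_t_ha  # [t.ha-1]
    zone = np.digitize(relative, [low, high])          # 0 low, 1 middle, 2 high
    return relative, absolute, zone

# Two hypothetical seasons for a three-pixel field, expected yield 6 t.ha-1.
stack = [[0.4, 0.5, 0.6],
         [0.8, 1.0, 1.2]]
relative, absolute, zone = yield_potential(stack, expected_yield_t_ha=6.0)

# Hypothetical N rates (kg N/ha): higher rate in the high expected yield zone.
n_rate = np.array([90.0, 100.0, 110.0])[zone]
```

Normalising each year separately keeps seasons with generally high or low index values from dominating the multi-year average.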

To give farmers access to yield potential data and the opportunity to test it, yield
potential for the 2017 season was calculated at a basic level for the whole Czech Republic.
The data are now available as open data on the Lesprojekt server, and farmers can freely
test them for their own purposes.

Figure 83: Transformation and publication of Czech data as Linked data with prototype system for visualising

Linked Data

PSNC contributed to this pilot with the transformation and publication of Czech data as Linked

data in order to provide an integrated view over different and heterogeneous data sources.

This work has been carried out by applying the pipeline described in D4.4 Section 3.3 (final

version), taking as input data from the pilot partners (farm data) as well as different open

Czech datasets, and by transforming them into Linked Data using FOODIE ontology (described

in D4.i1 Section A.15) as the underlying model. In particular, the following datasets were

transformed:



• Farm data

○ Rostenice pilot farm data, including the name of each field with the associated cereal crop classifications arranged by year.

○ Data about the field boundaries, crop map and yield potential of most of the fields in the Rostenice pilot farm.

○ Yield records from two fields (Pivovarka, Predni) harvested in 2017.

• Open data

○ Czech LPIS data showing the actual field boundaries.

○ Czech erosion zones (strongly (SEO) and moderately (MEO) erosion-endangered soil zones).

○ Restricted areas near water bodies in the Czech Republic (example of a 25 m buffer according to the nitrate directive).

○ Soil type data for the whole Czech Republic.

These datasets were transformed into RDF format and published as linked data. The resulting

datasets (farm and open) are available as Linked Data in PSNC Virtuoso endpoint. In particular,

this work involved the following steps.

• Data modelling was one of the main tasks required to transform the input datasets

into RDF and to align them with the INSPIRE-based FOODIE data model (covering

farming and geospatial data). For this step, we took FOODIE ontology, which is based

on INSPIRE schema and the ISO 19100 series standards, as our base vocabulary and

created a Czech extension in order to represent all the farm and open data from the

input datasets. In particular the extension includes data elements and relations from

the input datasets that were not covered by the main FOODIE ontology and that were

specific to the Czech partners' needs.

• Generation of the RDF data required a mapping file that specifies how to map the

contents of a dataset to RDF triples, matching the source dataset schema to FOODIE

ontology and extensions. This mapping file is generally an RDF document itself, written

in R2RML/RML, and includes information about the data source, its format and

connection details. Generating this mapping file is not a trivial task, as most of the
available tools require manual editing of the R2RML[11]/RML[12] definitions. The tool used
to execute the transformation usually also depends on the type of source data. As both farm
and open data in this experiment were in the form of shapefiles, we used the GeoTriples[13]
tool to execute the mapping and generate RDF dumps from the source shapefiles.

[11] https://www.w3.org/TR/r2rml/
[12] http://rml.io/
[13] http://geotriples.di.uoa.gr/



• The RDF datasets that were generated were then loaded into the Virtuoso triplestore. A
SPARQL endpoint and a faceted search endpoint are available for querying and
exploiting the Linked Data in the Virtuoso instance within PSNC infrastructure.

• Our next task was to show the integrated view over the original datasets. As the target
datasets were particularly large (especially when considering connections with open
datasets), and the connections were not of equivalence (i.e., resources are related via
some properties, e.g. geometry, but they are not equivalent), it was decided to use
queries to access the integrated data as needed rather than using link discovery tools
like SILK or LIMES. Hence, cross-querying within the datasets was done in the Virtuoso
SPARQL endpoint for some use cases to establish possible links between agricultural
and other related open datasets. The public instance of SILK is available at
http://silk.foodie-cloud.org/.

• To visualize and explore the Linked Data on a map, different application/system
prototypes were created using the component called HSLayers NG, as mentioned
earlier (e.g. https://app.hslayers.org/project-databio/land/). One such visualization is
shown in Figure 84.



Figure 84: Map visualisation prototype (HSLayer application) - http://app.hslayers.org/project-databio/land/

The resulting linked datasets are accessible via: https://www.foodie-cloud.org/sparql.
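Querying the endpoint can be sketched with the Python standard library. This is an illustration under stated assumptions: it uses the standard SPARQL protocol (a `query` parameter in a GET request, JSON results requested through the `Accept` header) and a generic query, because the FOODIE class and property names are not listed in this section.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://www.foodie-cloud.org/sparql"

# Generic query; real use cases would use FOODIE ontology terms instead.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"

def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the list of result bindings."""
    with urlopen(build_request(query, endpoint), timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]

request = build_request(QUERY)
# bindings = run_query(QUERY)  # requires network access to the endpoint
```

Cross-dataset queries of the kind described above would follow the same pattern, with graph patterns that join farm and open data through shared properties such as geometry.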


Trial 2 at Rostenice pilot farm was focused on the full area mapping of cereals and other crops

by satellite multispectral imaging and variable rate application of nitrogen fertilizers during

the 2019 vegetation period. The farm area increased from 8300 to 10087 ha during 2018
through the acquisition of a smaller neighbouring farm enterprise.

The main fertilization strategy has changed during the project. Due to the frequent

occurrence of dry periods in the last two years (2018 and 2019), the monitoring of the current

crop status has gradually lost its importance and attention has been more focused on the

accuracy of delineation of management zones based on the EO data analysis. The reason is
simple: during a dry period, the current crop status observed by remote sensing does not
necessarily reflect the nutritional status. The dosage of N therefore depends more on the
expected yield than on the diagnosis of plant nutritional status. The main aim of Trial 2 was
thus to evaluate variable rate application of nitrogen fertilizers based on long-term analysis
of satellite multispectral imagery from freely available data sources, such as Landsat and
Sentinel-2.

Trial 2 timeline

The timeline of Trial 2 follows the vegetation period of cereal crops in 2019. Variable rate
application of nitrogen fertilizers was carried out in spring (March 2019) as the 1st
top-dressing application for winter cereals and as the application before sowing of
spring cereals (maize, spring barley).





Preparation for Trial 2 included processing of EO imagery from the Landsat 8 and Sentinel-2
repositories, both as surface reflectance products, followed by calculation of a basic set of
vegetation indices.

For definition of yield productivity zones, 8-year time-series of Landsat imagery data was

processed with the results of relative crop variability. Final map is represented as percentage

of the yield to the mean value of each plot, later multiplied by expected yield [t.ha-1] as the

numeric variable for each field and crop species. Values of yield potential can be also

reclassified into three or five categories (zone maps) – high, middle and low-yielded areas.

Figure 85: Graphs of Sentinel-2 NDVI during the vegetation period 2019 for winter wheat (above) and spring barley (below) at locality Otnice (Rostenice farm). Low peaks indicate occurrence of clouds within the scene (Source: Sentinel-2, Level L1C, Google Earth Engine)



Figure 86: Example of the output map products from yield potential zones classification from EO time-series analysis: classification into 5% classes (left), 5-zone map (middle) and 3-zone map (right). Blue/green areas indicate higher expected yield

Figure 87: Map of yield potential zones (5-zone map) updated for 2019 season from 8-year time-series imagery; for southern (left) and northern (right) part of Rostenice farm

Trial 2 execution

Variable rate application of fertilizers

Prescription maps for variable rate application of nitrogen fertilizers were prepared by simple

reclassification and values editing tools in GIS. The value of nitrogen rate was determined

based on the agronomist experience and knowledge of the site-specific production conditions

and crop variety needs. Final step was an export of prepared maps into shapefile/isoxml

format and upload into machinery board computers (mainly Trimble or Mueller Elektronik).



Figure 88: Variable rate application of solid fertilizers by Twin Bin applicator on Terragator

Figure 89: Variable rate application of liquid N fertilizers (DAM390) by 36m Horsch Leeb PT330 sprayer

Crop yield mapping

In 2019, yield maps were acquired by combine harvester on over 3675 ha of grain crops
(winter wheat, spring/winter barley, oilseed rape) and by forage harvester on 2786 ha of
silage maize. The data were then screened for outliers and spatially interpolated to obtain
final crop yield maps in absolute [t.ha-1] and relative [%] measures. Crop yield maps are
used to validate the yield potential maps estimated from EO imagery.

Statistical testing of crop yield maps from 2019 and regression analysis with a set of
Sentinel-2 vegetation indices are still in progress. However, results from recent years
showed a relationship between vegetation indices and crop yield values. Correlation
coefficients varied among the observed fields; a closer relationship was found on fields with
higher spatial variability.
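The per-field comparison of vegetation indices and yield maps can be sketched as a Pearson correlation computed field by field. This is a minimal illustration: the field names and values are hypothetical, and it assumes the index and yield rasters have already been resampled to the same pixels.

```python
import numpy as np

def field_correlation(index_values, yield_values):
    """Pearson correlation between co-located index and yield pixels.

    Pixels where either value is missing (NaN) are ignored.
    """
    x = np.asarray(index_values, dtype=float)
    y = np.asarray(yield_values, dtype=float)
    valid = ~(np.isnan(x) | np.isnan(y))
    return float(np.corrcoef(x[valid], y[valid])[0, 1])

# Hypothetical fields: (vegetation index pixels, yield pixels in t.ha-1).
fields = {
    "field_A": ([0.20, 0.35, 0.50, 0.65], [4.1, 5.0, 6.2, 6.8]),
    "field_B": ([0.48, 0.50, 0.52, 0.51], [6.0, 5.8, 6.1, 6.3]),
}
correlations = {name: field_correlation(ix, yl) for name, (ix, yl) in fields.items()}
```

A field with little spatial variability, like the second hypothetical field, offers a narrow range of index values, which tends to weaken the observed correlation.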



Figure 90: Crop yield maps from 2019 harvest

Figure 91: Graph with changes of correlation coefficients between winter wheat and set of Sentinel-2 vegetation indices during the vegetation period 2018. Most sensitive period was detected in May and June



Figure 92: Graph of correlation coefficients between winter wheat yield maps and Sentinel-2 NDMI (2018/06/10) among observed fields. Highest correlation was detected on the fields with higher acreage and spatial heterogeneity

Trial 2 results

In Trial 2, crop monitoring by Earth Observation tools was implemented on the pilot farm
over an area of more than 10000 ha. The main area of interest was the introduction of
variable rate application of nitrogen fertilizers according to the assessment of the
nutritional status of crop stands.

The main result of Trial 2 is the introduction of variable application of nitrogen fertilizers in

the pilot farm Rostěnice a.s. This was carried out on an area of about 3000 ha in the form of

a basic N application before sowing spring barley, maize and top-dressing N application during

the vegetation period of winter cereals. The main input layer is a yield potential map, which

is calculated from 8-year time series of satellite images (Landsat) and represents the

delimitation of management zones corresponding to the resulting land productivity.

Acquiring crop yield data in the form of yield maps makes it possible to validate the yield
potential maps from EO, which reached approximately 75% agreement with the yield maps.
Precise quantification of the benefits of the applied procedures on the pilot farm is
difficult because there were no direct savings of applied fertilizers, only increased
efficiency due to redistribution of nitrogen doses according to expected yield. Although the
total consumption of fertilizers has not changed, targeted application according to yield
levels is expected to increase the efficiency of fertilizer utilization by around 8%.




DataBio Component deployment status

Component code and name | Status | Component location
C09.12 OpenLink Virtuoso: Publishing the Czech farm and open data as Linked Data and allowing querying of the datasets via SPARQL endpoint | operational | PSNC infrastructures
C02.01 UWB/SensLog: Service for the collection, processing and publication of sensor data | testing | Lesprojekt servers
C02.03 LESPRO/HSLayers: Visualisation of data | operational | Lesprojekt servers
C02.06 LESPRO/Data model for PA: Integration of various farm data and data from other sources | operational | Lesprojekt servers



Data Assets

Data Type | Dataset | Dataset original source | Dataset location | Volume (GB) | Velocity (GB/year)
Sentinel-2 vegetation indices | Sentinel-2 L2A | ESA openhub repository | https://scihub.copernicus.eu/dhus/#/home | 1500 GB | 245 GB/year
Landsat vegetation indices | Landsat 5,8 Level 2 Surface Reflectance | USGS ESPA | https://espa.cr.usgs.gov/index/ | 300 GB | 24 GB/year
Sensor data | Yield maps - shp point data | grain harvester | Lespro server | 2,5 GB (2018) | 2,5 GB/year
Czech farm RDF data | Farm oriented Linked Data (field and crops, field boundaries in a farm, yield mass data for some fields) in N-triples format | Shape files provided by Czech partners to PSNC | Virtuoso server within PSNC infrastructure | ~ 1.5 GB |
Czech Open RDF data | Linked Open Data (Czech LPIS, soil maps, erosion zones, water buffers) in N-triples format | Shape files provided by Czech partners to PSNC | Virtuoso server within PSNC infrastructure | ~11 GB | mostly static
DEM | DMR4G | DMR4G CUZK | arcgis online | 0,1 GB |



The biggest success of the pilot Trial 2 is the successful introduction of variable application of

nitrogen fertilizers based on satellite monitoring into the real plant operation on the farm

fields. Although Rostěnice a.s. is a regional pioneer in the use of precision
farming technologies, the farm was long hesitant about choosing the right technology for
variable N fertilizer application. After initial scepticism about crop sensors because
of their operational demands, the farm finally decided on variable application based on

delineation of the management zones from the yield potential maps and the strategy of

increasing the N dose in areas with higher expected yield. This strategy has proved to be a

promising option with regard to more arid farming conditions (and the absence of irrigation,

where the main yield limiting factor is the availability of soil moisture). Testing of VRA has

been started on the selected fields with spring barley (over 150 ha) in 2018. In this case, spring

barley for beer production was chosen as the most sensitive crop to the N application,

because of the difficulty of achieving malting quality in more arid conditions (sum of

precipitation from March till July 2018 at the level of 152 mm). Inadequate nutrition of plants

by nitrogen leads to significant yield reductions, while excessive N doses decrease the malting

quality of grain. During the last growing season (2019), a variable application of N fertilizers

on an area of more than 3,000 ha was launched. This included base N fertilizing before sowing

spring barley and maize and 1st N application in top-dressing of winter cereals (winter wheat,

winter barley). Beyond the original plan, testing of variable application of crop growth
regulators in spring barley, combining yield potential zoning from EO time-series analysis
with actual crop status monitoring from Sentinel-2 imagery, was also started. The results of
this testing will be available during winter 2019/2020.



KPIs

KPI short name | KPI description | Goal description | Base value | Target value | Measured value | Unit of value | Comment
EO processing area | Area of processed EO data | Covering the maximum of pilot farm area | 1500 | 8300 | 10000 | ha |
Zone delineation accuracy | Accuracy of management zones delineation by field survey and yield maps. Estimated as the deviation to yield zones. | Increase the quality of field zoning | 50 | 75 | 75 | % |
Fertilizers use efficiency | Increase of fertilizers use efficiency and farm productivity | Increase of fertilizers use efficiency | 5 | 10 | 8 | % | Estimation of fertilizer usage efficiency was influenced by the drought occurrence during the vegetation period 2018 and 2019



11 Pilot 9 [B2.1] Machinery management

Pilot overview

This pilot focuses mainly on collecting telematic data from tractors and other farm
machinery and on analysing and comparing these data with other farm data. The main goal
is to collect and integrate data and obtain comparable results. A challenge associated with
this pilot is that a farm may have tractors and other machinery from manufacturers that use
different telematic solutions and data ownership/sharing policies.


During Trial 1 the number of monitored Zetor tractors increased to 50. The datasets on
LESPRO’s servers contain current or historical data from 21 tractors of various brands and
models.

Figure 93: Tractor trajectory and work log

Unlike most other DataBio agricultural pilots that target a field, farm, or wider Territory in

Task 1.4, the pilot B2.1 collects mainly data from tractors wherever they are working, so farm

data are available only for part of the farms, where the tractors are in operation.

However, even in cases where data directly provided from the farm are missing, the data from

tractors can be combined and analysed at least in context with data on the farmer’s blocks

across the whole Czech Republic because the boundaries of farmer’s blocks are part of

publicly available LPIS (Land Parcel Identification System). More detailed farm information is

available only for some farms where tractors are used.

Analysis of data from Zetor tractors during Trial 1 and comparing them to data from other

tractors collected before DataBio project or during DataBio project led to several findings.

The technical solution of the data collection process from tractors of different brands and

models is the easier part. The greater challenge is to ensure the comparability of the



information contained in the data for the purposes of various analyses, for example fuel

consumption in various parts of farmer’s blocks. In addition to GPS accuracy a key role is

played by the frequency of data collection, the interval between data transmissions, and data

processing between data acquisition and data transmission.

Some tractor models with some monitoring units are able to send only current values and do

not take into account the values between data transmission, others are able to measure

values more often and send aggregated values. In the first case, the results are rough

estimates, in the latter case they are values that are closer to reality. Although the data flows

through CAN bus and ISOBUS are based on standards, these standards include both
mandatory and customizable parts, and implementation differs between tractor brands.

Although it is generally known that correction of GPS signals is required for the purpose of

automatic guidance of tractors, accuracy without corrections is sufficient for some types of

analysis. In this case, the frequency of data collection is important. Too long intervals between

position recording cause the trajectory to be very inaccurate, especially in places where the

tractor is turning. Setting the position record frequency is a compromise between the

trajectory accuracy and the amount of data transferred. According to LESPRO’s findings, it is
hard to set a single recording frequency that is ideal both for diagnostics and tractor
maintenance planning and for precision farming analyses and inputs to economic analysis,
if the user wants to optimize the amount of data transferred. Apart from
the fact that different variables are of interest in these cases, diagnostic
purposes do not require GPS positions to be collected as frequently as analysis for precision

farming and economic analysis. It is therefore appropriate to use different data collection

frequencies, depending on what services the customer wants to use.
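The effect of the position recording interval on trajectory accuracy can be illustrated with a small sketch. The track below is synthetic: a headland U-turn with a 10 m radius, sampled once per second for 30 seconds near an arbitrary location. The coordinates, radius and intervals are assumptions for illustration, not pilot data.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 positions."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def path_length_m(track, step=1):
    """Length of the trajectory when only every `step`-th position is recorded."""
    pts = track[::step]
    if pts[-1] != track[-1]:
        pts = pts + [track[-1]]  # always keep the final position
    return sum(haversine_m(*a, *b) for a, b in zip(pts, pts[1:]))

# Synthetic U-turn: semicircle of radius 10 m, one position per second.
LAT0, LON0, RADIUS_M = 49.1, 16.9, 10.0
M_PER_DEG_LAT = 111320.0
M_PER_DEG_LON = M_PER_DEG_LAT * math.cos(math.radians(LAT0))
track = [
    (LAT0 + RADIUS_M * math.sin(math.pi * i / 30) / M_PER_DEG_LAT,
     LON0 + RADIUS_M * math.cos(math.pi * i / 30) / M_PER_DEG_LON)
    for i in range(31)
]

length_1s = path_length_m(track, step=1)    # close to the true arc length
length_10s = path_length_m(track, step=10)  # chords cut the turn short
```

With the longer interval the recorded path replaces the arc with a few straight chords, so the distance travelled in the turn is underestimated, while the number of transmitted positions drops by a factor of ten.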

PSNC took the initiative to perform an experiment associated with Pilot 9 [B2.1]
Machinery management, in which sensor data from the SensLog service (used by the
FarmTelemeter service) were transformed into Linked Data on the fly. SensLog performs

collection and processing of vital sensor data that served as the input for the transformation

and publication of sensor data as Linked Data. So, in this use case, data stays at the source

and only a virtual semantic layer was created on top of it to access it as Linked Data (RDF).


Trial 2 timeline

Most DataBio Agriculture Pilots were scheduled to begin between April and May 2019
and end between August and October 2019. The Machinery management pilot follows this
schedule for reporting purposes, but data collection continued between the trials because
this pilot does not depend directly on the growing season and it makes no
sense to interrupt data collection between trials.


As in the first trial, the Machinery management pilot has no specific trial site, as data are
collected wherever the monitored tractors move. The tractors used in both trials
are owned by Zetor, and the collected data are used mainly by the Testing and Development
Department of the Zetor company.

Most of the tractors are rented by farmers, and the Zetor Testing and Development Department monitors them in real-time operation during farm work using Zetor’s telemetry solution; the other tractors are operated in Zetor testing facilities. Farmers use the tractors for their daily activities on the farm, while Zetor uses telemetry to monitor the reliability of the tractors and their systems, to identify problems and to plan maintenance. Consent to the collection and processing of data is part of the tractor rental contract.

Figure 94: Zetor Major

50 tractors owned by Zetor were involved in Trial 2. The Zetor tractor models used in the DataBio project include Crystal 160, Crystal 170 HD, Forterra 140 CL, Forterra 140 HD, Forterra 150 HD, Forterra 140 HSX, Major CL, Major HS, Proxima CL 100, Proxima 110 GP and Proxima 120 HS.

All of these tractors are equipped with monitoring units and a telemetry service developed by an external supplier and adjusted to Zetor’s needs at Zetor’s request.

One of Zetor’s long-term goals is the gradual development of services and the adaptation of both hardware and software to the needs of precision agriculture. During the DataBio project, new functionalities were therefore added to Zetor telemetry before Trial 2. The extension mainly concerns basic information on the movement of tractors on LPIS land blocks; extending functionality in this direction is part of the services Zetor wants to offer its customers as a native Zetor solution. Another new functionality extends the number of variables that can be exported from Zetor telemetry. These new export options are important for viewing and analysing data from Zetor tractors in third-party systems.



Zetor’s telemetry service allows the frequency of data collection to be set. During Trial 1, the interval between recordings of positions and values of additional variables was 10 seconds. Although it would be beneficial to increase the frequency of data collection and use an interval of 1 or 2 seconds, the interval remained the same for Trial 2.

The reason is the contract between Zetor and the supplier of the telemetry service, under which the amount of transferred data and the storage space used affect the price of the service. As a 10-second interval is currently sufficient for the main needs of the Testing and Development Department, the interval was not adjusted for Trial 2.
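The effect of the interval on the amount of data can be estimated with simple arithmetic. The sketch below assumes that every record has the same size, so that the transferred volume scales with the number of records; the function names are illustrative and the calculation ignores any compression or protocol overhead.

```python
def records_per_hour(interval_s):
    """Number of records produced in one hour of operation at a fixed interval."""
    return 3600 // interval_s


def relative_volume(new_interval_s, old_interval_s=10):
    """Factor by which the data volume grows when the interval is shortened."""
    return old_interval_s / new_interval_s


print(records_per_hour(10))   # records per hour at the Trial 1 and Trial 2 setting
print(relative_volume(2))     # growth factor for a 2-second interval
print(relative_volume(1))     # growth factor for a 1-second interval
```

Under these assumptions, moving from 10 seconds to 2 seconds multiplies the volume by five, and moving to 1 second multiplies it by ten.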

The same telemetry service that is used by the Testing and Development Department is offered as an optional service to customers buying a Zetor tractor. Although data from those tractors would be valuable for the pilot, they are not part of the DataBio Machinery management trials for data protection reasons. It was decided that during Trial 2 additional monitoring units would be deployed on several Zetor tractors, in parallel with the monitoring units mentioned above, to test different ways of collecting and processing data.

For this purpose, the need to involve a third-party partner was identified, and ESTE Technology was selected as a new partner for Trial 2. ESTE Technology is a member of the FederUnacoma association, which has been a partner in the Machinery Management pilot since the beginning of the DataBio project. ESTE Technology will use its monitoring units and telemetry software to push collected data to Lesprojekt FarmTelemetry through the SensLog API.

As one of the goals of the pilot is to test the possibility of using Zetor’s data in third-party systems, data gathered in Trial 2 through both channels are transferred and imported into the FarmTelemetry application used by LESPROJEKT.

Lesprojekt will use the data gathered in Trial 2 together with farm-related data from other sources and test whether they can serve the same purposes as data gathered from other tractors outside the DataBio project. As the farmers who use tractors rented from Zetor are not members of the DataBio project, data about farms and fields will be limited to those that are publicly available, mainly as part of the public LPIS dataset. Lesprojekt will also use data from tractors by other manufacturers, gathered outside the DataBio project, to compare the information contained in the data and to evaluate their usability for farm-related analysis.

Trial 2 execution

The tractors involved in Trial 2 included various models of Zetor Major, Proxima, Forterra and Crystal. The variables recorded differed according to the specific configuration of the tractor. Some of the variables are useful or potentially useful for agriculture-related analysis in FarmTelemetry; others are important only for the purposes of Zetor’s Testing and Development Department.



Technically, any type of data coming directly from tractors or from another telemetry system can be imported into FarmTelemetry, as the list of recorded variables in FarmTelemetry can be extended automatically according to the incoming data.

If the data are to be processed further and used in agriculture-related analysis, a meaning must be assigned to the individual variables. Lesprojekt focused mainly on the variables that are useful for agriculture-related analysis; most of them have an equivalent with the same or a similar meaning in data from other tractor brands and models already recorded in FarmTelemetry.

From the point of view of agriculture-related analysis, the most important data for FarmTelemetry are time stamp, speed, GPS coordinates, fuel tank level, fuel consumption (l/hour or km/l), engine RPM, engine load, and RPM or status of the PTO.

Other very useful information includes the connected implement and data from it. At this stage, data about the connected implement are not recorded in Zetor Telemetry and cannot be imported into FarmTelemetry.

Two ways of importing Zetor data were tested during Trial 2: one uses imports from the native Zetor telemetry solution, the other uses monitoring units and services provided by ESTE.

Lesprojekt used a subset of the data from Zetor tractors to test various analyses related to tractor work on field (LPIS) blocks, such as the daily activity log including fuel consumption, and daily and monthly overviews of field activities.
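A daily activity log of this kind boils down to assigning each recorded position to a land block and summing the time. The following is a minimal sketch of that idea, not the FarmTelemetry implementation: block boundaries are given as simple polygons in planar coordinates, a fixed recording interval is assumed, and the function and block names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def seconds_per_block(fixes, blocks, interval_s=10):
    """Sum the time spent in each block, assuming one fix every interval_s seconds."""
    totals = {name: 0 for name in blocks}
    for x, y in fixes:
        for name, polygon in blocks.items():
            if point_in_polygon(x, y, polygon):
                totals[name] += interval_s
                break  # a fix is counted for at most one block
    return totals


blocks = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
fixes = [(1, 1), (5, 5), (25, 5), (15, 5)]  # the last fix is on the road between blocks
print(seconds_per_block(fixes, blocks))
```

With real LPIS geometries a GIS library and a projected coordinate system would normally be used; the plain ray-casting test is shown only to keep the example self-contained.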

Including tractor data from a new source sometimes requires existing analyses to be modified according to the available variables, data quality and data frequency. In some cases these modifications involve only adjusting several parameters; in others it is necessary to use different formulas or algorithms for analyses based on data from different tractors. One of the main points where the analysis differs by data source is fuel consumption. The original fuel consumption analyses in FarmTelemetry were based mainly on the l/hour variable, whereas for Zetor tractors it is necessary to use the fuel tank level.
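The difference between the two data sources can be sketched as two alternative formulas for the same quantity. The example below is an assumption-laden illustration rather than the FarmTelemetry code: the first function integrates an instantaneous rate in l/hour over time, the second takes the drop in tank level, and both function names and sample values are hypothetical.

```python
def fuel_from_rate(samples):
    """Integrate (time in seconds, litres per hour) samples with the trapezoidal rule."""
    total_litres = 0.0
    for (t0, rate0), (t1, rate1) in zip(samples, samples[1:]):
        hours = (t1 - t0) / 3600
        total_litres += (rate0 + rate1) / 2 * hours
    return total_litres


def fuel_from_tank(levels_litres):
    """Fuel used as the drop in tank level, assuming no refuelling in between."""
    return levels_litres[0] - levels_litres[-1]


# One hour of work at a constant 12 l/hour, seen through both kinds of data.
print(fuel_from_rate([(0, 12.0), (1800, 12.0), (3600, 12.0)]))
print(fuel_from_tank([80.0, 74.0, 68.0]))
```

An import routine can then select the formula according to which variables the source tractor actually provides.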

Trial 2 results

During Trial 2, data from 50 Zetor tractors were collected from winter to October 2019. Data collection and processing continue even after the formal close of Trial 2.

Data collected during the trials can be displayed and analysed in both Zetor Telemetry and FarmTelemetry.



Figure 95: Daily tractor utilisation and trajectory in FarmTelemetry

Beyond the needs of the Testing and Development Department, the analysis of the gathered data showed that the technological solutions are suitable for basic analysis of agricultural work, but the parameters of the data collection process impose some limits. One limit is the low frequency of data collection, which does not allow the tractor trajectory to be depicted accurately. The problem is particularly noticeable at the headland, where the recorded trajectory shows sharp spikes.



Figure 96: Spikes caused by 10 seconds interval

The following figure shows the trajectories of another tractor with a data collection interval of 2 seconds. The result is of course also affected by other factors, such as tractor speed and GPS accuracy. Neither record is accurate enough to calculate, for example, an application overlay. Both trajectories are sufficient for creating a daily activity log of the tractor or computing statistics such as the time spent on each field, but the trajectory in the second image provides a better overview. Importantly, this limitation is due only to the input data: customers who use Zetor telemetry as a commercial service and have priorities other than those of the Testing and Development Department can increase the frequency of data collection and get better results.
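Why the interval matters at the headland can be seen from the distance a tractor covers between two recorded positions. The short sketch below is purely illustrative; the working speed of 9 km/h is an assumed value, not a figure from the trials.

```python
def metres_between_fixes(speed_kmh, interval_s):
    """Distance travelled between two consecutive position records."""
    return speed_kmh / 3.6 * interval_s


for interval in (10, 2):
    distance = metres_between_fixes(9, interval)
    print(f"{interval} s interval: {distance:.1f} m between fixes")
```

At the assumed speed the tractor moves about 25 m between fixes at a 10-second interval, so a whole headland turn may be represented by one or two points, while a 2-second interval leaves about 5 m between points.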

Figure 97: Data collection with 2 seconds interval

The second limitation is that the calculation of fuel consumption for Zetor tractors is based only on the fuel level in the tank, not on instantaneous consumption. Results based solely on the tank level are sufficiently accurate over a longer time interval, but data for a shorter period may be affected by fluctuations caused by tractor movement and terrain. A combination of both measurement methods is ideal for fuel consumption calculations.
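One common way to reduce the influence of such fluctuations is to smooth the tank level readings before taking differences. The sketch below uses a running median for this; it is an illustrative assumption about how the problem could be handled, not the method used in Zetor Telemetry or FarmTelemetry, and the window size and sample values are made up.

```python
from statistics import median


def smooth(levels, window=5):
    """Running median over a centred window (shorter at both ends of the series)."""
    half = window // 2
    return [
        median(levels[max(0, i - half): i + half + 1])
        for i in range(len(levels))
    ]


def fuel_used(levels_litres, window=5):
    """Fuel used between the first and last sample, assuming no refuelling."""
    smoothed = smooth(levels_litres, window)
    return smoothed[0] - smoothed[-1]


# The last reading jumps to 95 l because the tractor stands on a slope.
levels = [100, 100, 100, 98, 96, 94, 92, 90, 90, 95]
print(levels[0] - levels[-1])   # raw difference, distorted by the last reading
print(fuel_used(levels))        # difference after smoothing
```

The raw difference suggests 5 litres, whereas the smoothed series gives 10 litres, which is consistent with the trend of the remaining readings.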



Figure 98: Fluctuations in fuel tank measurement

An additional result of Trial 2 is that data imported into FarmTelemetry are available for publication through the linked data pipeline provided by PSNC.



Component code and name | Purpose for pilot | Deployment status | Component location
C02.01 UWB/SensLog | Service for the collection, processing and publication of sensor data. SensLog is required by the FarmTelemetry service. | operational | Lesprojekt servers
C02.05 LESPRO/FarmTelemetry | Extension of SensLog for processing, analysis and publication of data from mobile sensor units. Tractors are considered to be a mobile sensor unit. | | servers
C02.03 LESPRO/HSLayers | Visualisation of data from tractors and other farm data. | | servers
D2RQ Server | Transformation of the Linked Data from the mapping file of SensLog data and publishing the data on the fly | operational | PSNC infrastructures
C02.06 LESPRO/Data model for PA | Linking data from tractors with other farm data. | | servers



Data Assets

Data type | Dataset | Dataset source | Dataset location | Volume (GB) | Velocity (GB/year)
Farm data | LPIS | Ministry of Agriculture, http://eagri.cz | Lesprojekt servers | ~4 GB | ~4 GB
Machinery data | Tractors data in FarmTelemetry | Collecting from tractors by Wirelessinfo and Lesprojekt | Lesprojekt servers | Depends on what is considered as part of the dataset. Raw positions + other variables: 20 GB. Basic data including indexes and various processed data: ~100 GB | Several GB/year
Machinery data | Original data from Zetor tractors | Data collected by Zetor | Servers of Zetor’s third-party service provider and Lesprojekt servers | ~1 GB (raw data optimised for transfers from monitoring units) | ~500 MB/year
Sensor data | Original sensor data from SensLog | Collection of sensor data by Lesprojekt into relational databases of SensLog | D2RQ server within PSNC infrastructure | ~10 MB | ~10 MB



There are several main directions for the exploitation of the pilot results. Zetor will continue to use its telemetry for the purposes of the Testing and Development Department. A tractor is a complex mechanical product that has to meet many mandatory safety, environmental, reliability and technical standards, and developing a new tractor usually takes many years. It is therefore necessary to look for technologies that could speed up the development process and make it cheaper and more efficient. Telemetry is very helpful here, as it enables remote, real-time observation of reliability tests, of tractor CAN bus communication, of tractor control units and more. Telemetry also helps to create a long-term library of parameters, which serve as objective real-world inputs for design work on new products. Telemetry implemented to support the development phases of new tractors can easily be adapted for additional commercial use.

One of the main user groups is the farmers using the tractors. Their requirements vary with the following factors:

• The number of brands and models of tractors they use at their farm, and the telemetry systems of other manufacturers

• The level of adoption of ICT for agriculture, farm management information systems etc.

The following paragraphs discuss various functionalities of telemetry systems and their usability for various groups of farmers and other users in the DataBio Machinery Management pilot.



Displaying tractor’s position in real time

This is one of the basic functionalities of most vehicle tracking systems. Knowing the current position of the tractor is useful mainly for security and fleet management. Another use is the supervision of the work of tractor drivers; here, however, the current position serves mainly to detect potential problems, such as the tractor moving in places where it should not be at that moment. For a more detailed analysis of the quality of work, the history of recorded positions combined with other data from the tractor and from external sources is more important. Knowing the current position of tractors is useful for all types of farmers, including those with low ICT use, as it places low demands on users’ knowledge, on information systems and on data inputs from farm management. Zetor telemetry covers this functionality. FarmTelemetry supports it as well, but for the Zetor tractors in the Machinery Management pilot the data are not transferred to FarmTelemetry in real time. Providing this information to a third-party system would depend on a strategic decision by Zetor management.

Tractor data recording and analysis of work on LPIS blocks

This functionality may include several different levels. Zetor telemetry supports displaying the trajectory and calculating basic statistics on the time and fuel consumed on individual LPIS blocks. This basic level requires minimal data input from the farmer, as the boundaries of LPIS blocks can be obtained from publicly available datasets. The results make it easier for farmers to calculate the cost of specific work and the cost per field or crop. Covering this functionality in Zetor telemetry is a step towards the development of additional services related to precision agriculture.

Dependence on the tractor manufacturer's telemetry system and the resulting fragmentation of information between systems can be a limiting factor for farmers, especially because Zetor production is not focused on the most powerful tractors and a Zetor is often the second tractor on the farm.

During the DataBio project, two ways of importing Zetor telemetry data into a third-party system (FarmTelemetry) and performing a similar analysis of field work were tested. This gives Zetor management a new opportunity to consider opening telemetry data to third-party systems and giving tractor owners more freedom to use telemetry data from their tractors in any way they need.

However, despite this possibility, further development of Zetor's native telemetry remains

one of the strategic priorities for Zetor management and the experience of the DataBio

project will be useful for this goal.

For LESPROJEKT, the Machinery Management pilot provided an opportunity to access tractor data from a new source and to extend the functionality of FarmTelemetry so that it can receive data from new sources and use them in analyses of field work.



An additional benefit is that tractor data are ready to be published through the linked data pipeline provided by PSNC, which allows future synergies with linked data activities carried out in other pilots, mainly B1.4.

Other users

In addition to the main telemetry users, which are tractor manufacturers, farmers and advisors providing services to farmers, banks are another user group. This applies where banks provide leasing products and require monitoring of the tractor, which is the bank's property for the duration of the lease. At present they use the native telemetry solutions provided by Zetor, but it is important to take into account that banks may later require direct access to the data and use their own tools.

KPIs

KPI short name | KPI description | Goal description | Base value | Target value | Measured value | Unit of value | Comment
Tractors totals | Numbers of tractors and agricultural machinery using DataBio solutions. | Include as many tractors as possible | 0 | 30 | 50 (71) | number | Data from 21 tractors as historical data for comparison. Data from 50 Zetor tractors gathered during DataBio.
| Number of various tractor brands/models tested. | Include data from multiple tractor models | NA | NA | 11 | number |
| Amount of collected data | | NA | NA | ~1 GB | GB | Raw data optimised for transfers from monitoring units. Amount of data including various precomputations and indexes can be ~10 times bigger.



12 Pilot 10 [C1.1] Insurance (Greece)

Pilot overview

The main focus of the pilot is to evaluate a set of tools and services dedicated for the

agriculture insurance market that aims to eliminate the need for on-the-spot checks for

damage assessment and promote rapid payouts. The pilot concentrates on fusing

heterogeneous data (EO data, field data) for the assessment of damages at field level. NP will

lead the activities for the execution of the full lifecycle of the pilot with the technical support

of FRAUNHOFER and CSEM. Moreover, a major Greek insurance company, INTERAMERICAN,

is actively engaged in the pilot activities, bringing critical insights and its long-standing

expertise into fine-tuning and shaping the technological tools to be offered to the agriculture

insurance market. The methodology of the pilot activities involves the integration of high-

power computing and EO-based geospatial data analytics for conducting damage assessment

with data from IoT agro-climate stations for field-level condition monitoring. The convergence

of the aforementioned technologies in a single dedicated framework is expected to deal

effectively with insurance market demands which require a smooth transition from

traditional insurance policies (expensive, require human experts for damage assessment) to

more flexible index-based insurances. Index-based insurance provides transparency and

reduces bureaucracy since it is based on objective predefined thresholds. It has low

operational costs requiring minimal human intervention. On top of that, this new type of

insurance can eliminate field loss assessment, adverse selection and moral hazards since the

whole process is fully automated, meaning that the point where the pay-out starts (trigger)

and the point where the maximum pay-out is reached (exit) are based on a prespecified fixed

model per crop. Key stakeholders of the pilot are the farmers, who wish to insure their crops

against weather-related systemic perils (e.g. floods, high/low temperatures, and drought) and

INTERAMERICAN, a major Greek insurance company with increased interest in agricultural

insurance products. The pilot activities are performed in Northern Greece, targeting

high-impact annual crops (e.g. tomato, maize, cotton, wheat).
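The trigger and exit points of an index-based product can be illustrated with a small payout function. This is a generic sketch and not the model used by INTERAMERICAN or the pilot: the linear increase between trigger and exit, the function name and the example index values are all assumptions made for illustration.

```python
def payout_fraction(index, trigger, exit_):
    """Share of the insured sum paid out for a given index value.

    Nothing is paid before the trigger is reached, the full sum is paid at the
    exit point, and the payout grows linearly in between (an assumption).
    Works whether the index worsens by falling (exit_ < trigger) or by rising.
    """
    if trigger == exit_:
        raise ValueError("trigger and exit must differ")
    progress = (index - trigger) / (exit_ - trigger)
    return min(1.0, max(0.0, progress))


# Hypothetical deficit-type index: payout starts below 40 and is full at 20.
for value in (45, 30, 10):
    print(value, payout_fraction(value, trigger=40, exit_=20))
```

Because the payout depends only on the index and the predefined thresholds, no on-the-spot loss assessment is needed to compute it.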


The pilot has completed the first round of trials during Trial 1 on annual crops (e.g. tomato,

maize, cotton) in two regions, namely Evros and Thessaly with significant economic footprint

on the Greek agri-food sector. The incidents that were evaluated (floods and heatwaves) fall

under the definition of the climate-related systemic perils. The pilot effectively demonstrated

how Big Data enabled technologies and services dedicated for the agriculture insurance

market can eliminate the need for on-the-spot checks for damage assessment and promote

rapid payouts. Important insights have been gained from Trial 1 and shaped the execution of

Trial 2. The role of field-level data has been revealed as their collection and monitoring is

important in order to determine if critical/disastrous conditions are present (heat waves,

excessive rains and high winds). Field-level data can be seen as the “starting point” of the

damage assessment methodology, followed within the pilot. Moreover, regional statistics



deriving from this data can serve as a baseline for the agri-climate underwriting processes

followed by the insurance companies who design new agricultural insurance products.


Trial 2 timeline

The following roadmap applies for the pilot activities

Figure 99: Pilot timeline


The following work was conducted by NP, as part of the preparatory work for Trial 2:

• As the requirements in terms of sensors deployed for in-the-field usage differ between

pilot sites, it became obvious that several adaptations were necessary in respect to

C13.03 and the way data was represented for both cloud-based storing and Gaiatron

station configuration. More specifically, all relational and EAV (Entity-Attribute-Value)

data representations were adapted to more flexible and scalable JSON format that

performs better in a dynamic IoT measuring environment. The latter is widely

acknowledged as JSON has become gradually the standard format for collecting and

storing semi-structured datasets that originate from IoT devices. The adaptation to a

JSON format for modelling IoT data streams allows the further processing, parsing,

integration and sharing of data collections in support of system interoperability

through the adoption of well-established and favoured linked-data approaches

(JSON-LD).

• The work initiated as part of C13.02 GAIABus DataSmart Machine Learning

Subcomponent evolved further on Trial 2 by using statistical methods for EO-based

crop modelling. Lessons-learnt from previous research activities validated the



applicability of statistical solutions in agro-insurance use cases14. More specifically,

crop type and area tailored crop models have been created for the whole Greek arable

area making use of NDVI measurements that have proven to be suitable for assessing

plant health. In total, for each one of the 55 Sentinel-2 tiles that cover the whole Greek

arable land, 7 major arable crops for the local agri-food sector were modelled (as

suggested by INTERAMERICAN) and namely: wheat, maize, maize silage, potato,

tomato, cotton and rice (55x7=385 models in total). The models were developed

exploiting multi-year NDVI measurements from the last three (3) available cultivating

periods. Instead of sample statistics (few objects of interest but many observations

referring to them), population statistics methods (a large number of objects of interest

but few observations referring to each of them) were employed in order to identify

NDVI anomalies. As sound insurance models are typically created

using large multi-year historical records (~30 years), this approach is ideal for deriving

robust estimates for setting anomaly thresholds (exploiting the space-time cube to

have enough degrees of freedom). The goal is to detect deviations in NDVI

measurements in respect to what is considered normal crop health behavior for a

specific time instance. Thereby, each crop model consists of 36 NDVI probability

distributions that refer to all decads of the year. By adjusting high and low thresholds on

these distributions (part of the strategy of the insurance company), measurements found

at the distribution extremes can be spotted and flagged as anomalies (Figure 100).

Typically, insurance companies are looking for negative anomalies (in the lowest 15% of

the distribution) that provide strong indications of a disastrous incident.
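The thresholding described above can be sketched as follows. This is a simplified illustration under stated assumptions and not the pilot's implementation: the thresholds are taken as empirical percentiles of the NDVI values observed for one crop, tile and decad, the 15% and 85% cut-offs are the ones mentioned in the text, and all function names are hypothetical.

```python
from statistics import quantiles


def decad_of_year(month, day):
    """Decad index 1..36: days 1-10, 11-20 and 21-end of each month."""
    return (month - 1) * 3 + min((day - 1) // 10, 2) + 1


def thresholds(ndvi_values, low_pct=15, high_pct=85):
    """Low and high NDVI thresholds as empirical percentiles of the population."""
    cuts = quantiles(ndvi_values, n=100, method="inclusive")  # 99 cut points
    return cuts[low_pct - 1], cuts[high_pct - 1]


def classify(ndvi, low, high):
    """Flag a single parcel measurement against the thresholds of its decad."""
    if ndvi < low:
        return "negative anomaly"
    if ndvi > high:
        return "positive anomaly"
    return "normal"


# Synthetic population of NDVI values for one crop, tile and decad.
population = [i / 100 for i in range(101)]
low, high = thresholds(population)
print(decad_of_year(2, 15))          # second decad of February
print(classify(0.10, low, high))
print(classify(0.50, low, high))
```

A full crop model in this sense would hold one pair of thresholds for each of the 36 decads, and the insurer could move the percentiles to match its own strategy.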

Figure 100: Crop NDVI probability distribution referring to a decad of the year (Wheat-Larisa region-2nd decad of February). Anomalies can be found at the distribution extremes

14 de Bie, C. A. J. M., B. H. P. Maathuis, and A. Vrieling. "Improved drought detection to support crop insurance models: powerpoint." Proba-V Symposium 2018. 2018.



The following figures graphically depict three different crop models created using the

aforementioned procedure:

Figure 101: Cotton model in Komotini region (T35TLF tile), Maize model in Evros region (T35TMF tile) and Wheat model in Larisa region (T34SFJ tile) by decad (horizontal axis)



In the previous figures, the light green threshold marks the lower 15% extreme and the dark green threshold the upper extreme (85%) of the probability distribution. The red line shows the status of a single parcel throughout 2018, with its NDVI measurements staying within “normal” ranges for the critical cultivating periods.

During the preparatory phase of Trial 2, CSEM continued to improve the accuracy of its C31.01 Neural Network Suite for specific crop classes, which can be considered a baseline for future crop modelling activities. As a first step, a structured method of digitizing expert knowledge in a data-driven architecture was offered. A pipeline was developed that significantly reduces the complexity of creating models by removing the need for hand-crafted filtering, making it a cost-effective option for bringing neural network models to the market. It was found important to verify the reliability of the data with minimum supervision and then use the clean data to train the network for the classification problem at hand. These efforts led to an overall classification accuracy of over 92% for Maize, Wheat and Legumes. Further investigation of particular taxonomical varieties found that training a crop model with one variety and testing with other varieties performed well, except for the crop type Legumes, which shows a large intra-class variability. Creating a model with only one variety has the potential to simplify the creation of models in the future. As this methodology is pixel-based, it follows that in the aftermath of a disastrous event, low classification probabilities for the monitored crop type could be a strong indication of disaster and could be used in damage assessment approaches.
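The last idea can be expressed as a simple parcel-level indicator: the share of pixels for which the classifier assigns a low probability to the declared crop type. The sketch below is an assumption about how such an indicator could be computed, not a description of the C31.01 component; the 0.5 threshold and the function name are hypothetical.

```python
def low_confidence_share(pixel_probabilities, threshold=0.5):
    """Share of a parcel's pixels whose probability for the declared crop is low."""
    if not pixel_probabilities:
        raise ValueError("the parcel has no classified pixels")
    low = sum(p < threshold for p in pixel_probabilities)
    return low / len(pixel_probabilities)


# Per-pixel probabilities for the declared crop type within one parcel.
print(low_confidence_share([0.9, 0.2, 0.3, 0.8]))
```

A high share after a reported event would mark the parcel as a candidate for closer damage assessment, while a low share suggests the crop still looks as expected.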

The preparatory work by FRAUNHOFER for Trial 2 concentrated on the development of an adaptive analytics platform for geospatial data that allows services to be integrated on top of it. For this purpose, a reference architecture was drafted that allows different data sources, processing services and UI components to be orchestrated to fulfil the needs of a specific use case. The preparatory stage of Trial 2 showed that this work has a horizontal impact and provides solutions for multiple use-case scenarios, spanning from Smart Farming to CAP Support and Agri-insurance.











at the UI-Layer. Following this approach provides more flexibility and eventually allows to

think about a platform which enables the users to build views for custom analytics tasks






Insurance.

The implementation of Trial 2 focuses primarily on the integration of external services. In this scenario, a web application was developed that enables professional users to perform crop type classification on demand using the latest or historic satellite images. A variety of visual analytics tools are included to allow efficient exploration of the available data. The classification capabilities are provided by external services, which in turn exploit methods from the domain of machine learning (ML). Services and data sources are integrated through well-defined RESTful interfaces.

Trial 2 execution


activities:

By M26, the DataBio platform v2 for the pilot is fully operational and offers the insurance company a set of tools and services for: a) damage assessment, targeting a faster and more objective claims monitoring approach just after the disaster (scenario 1); b) tackling the adverse selection problem: through the use of high-quality data, it will be possible to identify the underlying risks associated with a given agricultural parcel, thus supporting the everyday work of an underwriter (scenario 2); c) large-scale insurance product/risk monitoring, which will allow the insurer to assess and monitor the risk to which the insurance company is exposed from a higher level (scenario 3).

The effectiveness of the methodology was tested against a flooding event (11/7/2019) in

Komotini that affected cotton farmers in the region and led to significant crop losses (Figure

102).

Figure 102: Aftermath of the floods in Komotini region (11/7/2019)

Initially, Gaiatron measurements confirmed that flooding conditions were present in the area as a result of increased rainfall volumes (Figure 103). This indicates that the region might have been affected by the systemic risk and should be examined more thoroughly.



Figure 103: Rainfall volume (mm) in the Komotini region

This triggered an EO-based crop condition monitoring approach that captures the impact of the peril on crop health. After only 2 weeks, the approach identified statistically significant differences from the respective crop model, which indicate damage at field level (Figure 104). This supports the initial hypothesis that the floods severely affected the region’s crop health, and consequently shows that the established methodology can be a powerful tool for the early identification of potentially affected or damaged parcels, crop types and areas (as described in scenario 1). The findings were presented to both the insurance company and the farmers in order to show how these technologies can bridge the gap between the farming and the insurance worlds.
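The kind of check described here, a parcel falling below the crop model for consecutive decads after an event, can be sketched as follows. The rule of two consecutive decads, the function name and all NDVI values are illustrative assumptions and do not reproduce the pilot's statistical test.

```python
def first_sustained_anomaly(ndvi_by_decad, low_by_decad, run=2):
    """Return the first decad of a run of `run` consecutive decads below the
    low threshold of the crop model, or None if there is no such run."""
    streak_start = None
    previous = None
    for decad in sorted(ndvi_by_decad):
        below = ndvi_by_decad[decad] < low_by_decad[decad]
        consecutive = previous is not None and decad == previous + 1
        if below:
            # Start a new streak if none is open or a decad is missing.
            if streak_start is None or not consecutive:
                streak_start = decad
            if decad - streak_start + 1 >= run:
                return streak_start
        else:
            streak_start = None
        previous = decad
    return None


# Hypothetical parcel values around decad 20 (the second decad of July).
parcel = {19: 0.80, 20: 0.45, 21: 0.40, 22: 0.60}
model_low = {19: 0.60, 20: 0.62, 21: 0.63, 22: 0.63}
print(first_sustained_anomaly(parcel, model_low))
```

Requiring more than one decad reduces the chance that a single cloudy or noisy observation is mistaken for damage.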

Figure 104: Parcel monitoring at Komotini region (cotton) showing negative anomaly (deviation) for two consecutive decads just after the disastrous incident

By mapping the outcome of the damage assessment procedures, high-level assumptions can be made. These concern the risk to which the insurance company is exposed (scenario 3) and the prioritization of the work to be conducted by field damage evaluators (until now this process has not been data-driven), who are advised to begin with parcels exhibiting higher damage estimates and steadily move to those with lower ones.
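The prioritization rule for field evaluators amounts to ordering parcels by their estimated damage. A minimal sketch, with hypothetical parcel identifiers and damage estimates on an assumed 0 to 1 scale:

```python
def inspection_order(damage_by_parcel):
    """Parcel identifiers sorted from the highest to the lowest damage estimate."""
    return sorted(damage_by_parcel, key=damage_by_parcel.get, reverse=True)


estimates = {"parcel-1": 0.2, "parcel-2": 0.9, "parcel-3": 0.5}
print(inspection_order(estimates))
```

In practice the ordering could also take travel distance or insured value into account; the sketch covers only the rule stated in the text.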

Figure 105: High-level overview of the affected area, color coded with the output of the followed damage assessment procedures

Finally, the exploitation of the wealth of agro-meteorological data (Gaiatron stations, EO

meteorological open data) also leads to the provision of underwriting services (scenario 2)

that provide critical statistical insights for better shaping agro-insurance products (Figure

106).

Figure 106: Risk analysis tool that measures the frequency of presence of extreme weather conditions (against heat-waves, frosts, or windstorms) as defined by ELGA15



hosted in NEUROPUBLIC’s N. Greece offices with the participation of other DataBio partners

15 http://www.elga.gr/organismos/thesmiko-plaisio/52-thesmiko-plaisio/nomoi/70-2010-04-28-08-48-39






By M32, a first instance of the aforementioned analytics platform has been finalized and deployed. ML services are available, providing a proof of concept for their use in agri-insurance scenarios (e.g. scenarios 1 and 3). FRAUNHOFER was responsible for the development of the UI, integrating a map, pixel heat maps from the different classifiers and information visualization capabilities.

A CSEM-developed system for the management of machine learning models was used to facilitate simple and retraceable management of models. RESTful services, combined with security features in the form of JWT tokens and HTTPS encryption, were implemented and integrated into the service. The service has also been containerized to allow simple deployment. It communicates with FRAUNHOFER’s GeoRocket component and UI for the on-demand classification of crop types at both pixel and parcel level.

Figure 107: FRAUNHOFER's UI screenshot colour coding different crop types



Figure 108: FRAUNHOFER's UI screenshot that integrates CSEM’s classification results into pixel heat maps

By M34, the final KPI measurements had been collected. More specifically, final KPI measurements and feedback were collected through regular discussions with the farmers and INTERAMERICAN, and can be found in Section 12.5.2.

Trial 2 results


expected TRL. INTERAMERICAN continued (for a second year) to benefit from agri-insurance

tools and services that perform EO-based damage assessment at parcel level and target

towards evolving to next-generation index-based insurance solutions. The pilot results clearly

show that data-driven services can facilitate the work of the insurance companies, offering

tools that were previously unavailable and were responsible for severe bottlenecks in their

day-to-day activities (e.g. long wait for ELGA’s official reports, dependence on the human

factor, difficulties in prioritizing work after receiving several compensation claims). However,

there is still room for methodological improvements. Specifically, more effort should be placed on validating negative predictions in order to capture the true accuracy of the results. Data abundance holds the key to delivering even more precise solutions and to addressing the multi-parametric nature of the problem, as different climate-related perils affect different crop types in different ways across their phenological stages.





| Component code and name | Purpose for pilot | Deployment status | Component location |
| --- | --- | --- | --- |
| C13.01 Neurocode (NP) | Neurocode allows the creation of the main pilot UIs in order to be used by the end-users (insurance company, farmers) and offering insights regarding weather-related perils | deployed | NP Servers |
| C13.02 GAIABus DataSmart Machine Learning Subcomponent (NP) | Supports EO data preparation and handling functionalities. Supports multi-temporal object-based monitoring and modelling for damage assessment | deployed | NP Servers |
| C13.03 GAIABus DataSmart Real-time streaming Subcomponent (NP) | Real-time data stream monitoring for NP’s Gaiatrons Infrastructure installed in all pilot sites. Real-time validation of data. Real-time parsing and cross-checking | deployed | NP Servers |
| C31.01 Neural Network Suite (CSEM) | Machine learning crop identification system to be used for the detection of crop discrepancies that might derive from reported weather-related catastrophic events | deployed | CSEM’s Servers |
| C04.02 – C04.04 Georocket, Geotoolbox, SmartVis3D (Fraunhofer) | Back-end system for Big Data preparation, handling fast querying and spatial aggregations (data courtesy of NP). Front-end application for interactive data visualization and analytics | deployed | Fraunhofer Servers |



Data Assets


| Data type | Dataset | Dataset original source | Dataset location | Volume (GB) | Velocity (GB/year) |
| --- | --- | --- | --- | --- | --- |
| Sensor measurements (numerical data) and metadata (timestamps, sensor id, etc.) | Gaiasense field. Dataset composed of measurements from NP’s telemetric IoT agro-climate stations called Gaiatrons for the whole Greek area. | NEUROPUBLIC | GAIA Cloud (NP’s servers) | Several GBs | Configurable collection and transmission rates for all Gaiatrons. >200 Gaiatrons fully operational at several agricultural areas of Greece, collecting >30 MB of data per year each with the current configuration (measurements every 10 minutes) |
| EO products in raster format and metadata | Dataset comprised of remote sensing data from the Sentinel-2 optical products (55 tiles for the whole area of Greece) | ESA (Copernicus Data) | GAIA Cloud (NP’s servers) | >55000 | >18800 |
| Parcel Geometries (WKT), alphanumeric parcel-related data and metadata (e.g. timestamps) | Dataset comprised of agricultural parcel positions expressed in vectors along with several attributes and extracted multi-temporal vegetation indices associated with them. | NEUROPUBLIC | GAIA Cloud (NP’s servers) | Several GBs | 1 GB/year. The update frequency depends on the velocity of the incoming EO data streams and the assignment of vegetation indices statistics to each parcel. Currently, new Sentinel-2 products are available every 5 days approximately and the dataset is updated in regular intervals |



In the context of DataBio, NP has initiated a close collaboration with INTERAMERICAN

(https://www.interamerican.gr/), a major Greek insurance company with a clear target in

evolving its agri-insurance products. This collaboration is expected to continue in the next

years as part of another high-profile research project, H2020 e-shape (https://e-shape.eu/),

where NP is a key partner in S6P4 “Resilient and Sustainable ecosystems including Agriculture

and food” and INTERAMERICAN is the pilot’s co-designer. The two companies are also investigating the possibility of offering INTERAMERICAN’s agri-insurance services alongside NP’s smart farming services (and vice versa) as part of a joint exploitation plan. This would allow both companies to widen their market share.

From an implementation point of view, the quality of the provided services of NEUROPUBLIC

greatly benefited from the collaboration with leading technological partners like CSEM and

Fraunhofer, that specialize in the analysis of Big Data. Moreover, feedback from the end users

and lessons-learnt from DataBio’s pilot execution significantly fine-tuned and will continue to

shape the suite of dedicated tools and services, thus, facilitating their penetration in the agri-

insurance sector.




KPIs

| KPI short name | KPI description | Goal description | Base value | Target value | Measured value | Unit of value | Comment |
| --- | --- | --- | --- | --- | --- | --- | --- |
| C1.1_1 | Accuracy in damage assessment | | No prior information available | >80 | 95% precision | % | Results are available in real-world data, capturing disasters resulting from extreme weather events (July 2019 - Komotini region - Cotton cultivation affected by floods). As our first priority was to notify and assess the most-affected parcels, validation was focused on positive predictions. Precision reached ~95%, effectively showing that data-driven solutions can significantly prioritize and reduce the work required by an expert evaluator. |
| C1.1_2 | Decrease in the required time for conducting an assessment | | Several months | Several days | Two weeks approximately | Days, weeks, months | This KPI depends on the availability of reliable EO data in the post-disaster period. Cloud presence or absence plays a critical role in defining the required time for the assessment. We usually need at least 2 post-disaster EO-based measurements to reach reliable conclusions and, based on the Sentinel-2 measuring resolution, this happens approximately within 2 weeks. |
| C1.1_3 | Number of crop types covered | | Initially no crops were being covered by the system | 7 | 7 crop types (based on specific requirements from the insurance company) for all 55 tiles covering Greece. | plain number | 7 major annual crop types were modelled as suggested by the insurance company for the whole Greek arable area (55 tiles x 7 crops = 385 models in total created), namely: cotton, rice, maize, maize silage, tomato, corn, potato. In addition, continuous NDVI monitoring (measuring NDVI fluctuations before and after a disastrous incident) can actually be applied to any crop type to assess damages at field level |



13 Pilot 11 [C1.2] Farm Weather Insurance Assessment

Pilot overview

The objective of the proposed pilot is to provide and assess, on a test area, services for the agriculture insurance market based on Copernicus satellite data series, integrated with meteorological data and other available ground data.

Among the needs of insurers operating in agriculture, the ones that Earth Observation data can most promisingly address are risk assessment and damage estimation down to parcel level.

For the risk assessment phase, the integrated use of historical meteorological series and satellite-derived indices, supported by proper modelling, will make it possible to tune EO-based products in support of risk estimation.

For damage assessment, the operational adoption of services based on remotely sensed data will allow new insurance products to be optimized and tuned on objective parameters, such as maps and indices derived from EO data. This allows a strong reduction in ground surveys, with a positive impact on insurance costs and a reduction of the premiums paid by farmers.

In the initial stage of the pilot activities, a set of services has been planned, including:

1. Historical medium resolution Risk Map: historical risk maps, based on long time series of vegetation indices estimated from medium resolution satellite images (number of critical events for each area).

2. Field crop growth vs. similar crop (inter-field anomalies): indicator of crop behaviour (average, worse, better) during the current season, comparing the behaviour of a single parcel with the average in the area.

3. Intra-field Anomalies: information about the situation of a single parcel, used to assess growth homogeneity and highlight irregular areas within the parcel.

4. Correlation among weather historical data and critical events: specific indexes supporting the introduction of parametric insurance products, obtained by using machine learning methods that consider, as inputs:

○ meteorological relevant data

○ spectral specific indexes

○ field characteristic (e.g. soil type)

○ loss data from Insurance




The services set up in Trial 1 are briefly described below, together with their first results.

Historical medium resolution Risk Map: The scope of the service is to provide historical risk maps, based on long time series of vegetation indices estimated from medium resolution satellite images. The output is one risk map per crop (number of critical events for each area).

The historical risk map refers to the occurrence of “damage” in the past. The map is based on

an index derived from time series of low-medium resolution satellite images. The index is

assumed to be correlated with crop yield.

“Damages” are mapped for each year in the time series by calculating on pixel basis the

difference between the actual index value and the long-term average. When the difference

exceeds a certain threshold, we assume there is damage. Ideally, the damage threshold is

defined based on reference data such as actual losses reported on the field. Geo-localized

crop loss data will be made available by the insurance company for the period 2012-2018 but

have not been received yet.
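The per-pixel damage rule described above can be sketched as follows. This is a minimal illustration rather than the pilot's actual implementation: the function name, the array layout (years, rows, columns) and the choice to flag only negative deviations from the long-term average are assumptions made for the example, and the threshold value would ideally come from reported losses.

```python
import numpy as np

def damage_years(index_stack, threshold):
    """Count, per pixel, the years in which the index falls below the
    long-term average by more than `threshold`.

    index_stack: array of shape (years, rows, cols) holding one index
    value per pixel and year (hypothetical layout). NaN marks missing data.
    """
    stack = np.asarray(index_stack, dtype=float)
    long_term_mean = np.nanmean(stack, axis=0)   # per-pixel long-term average
    deficit = long_term_mean - stack             # positive when below average
    damaged = deficit > threshold                # NaN compares as False
    return damaged.sum(axis=0)                   # number of damage years per pixel
```

The resulting array has the same meaning as the map in Figure 109: each pixel holds the number of years flagged as damaged.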

Figure 109: Map classifying the Netherlands territory in terms of number of years with damages



Weather based risk map: A weather-based risk map is going to complement the historical risk map calculated by VITO to detect the occurrence of “damages” in the past, since such damages are not explicitly correlated with weather events. The weather-based risk map is intended to show the occurrence of extreme weather events in the past. It should then help establish a reliable correlation between crop damages and extreme weather events, heavy rains in particular, in order to better define certain damage patterns or to zoom in further on areas with a high damage frequency. In the end, 8 different risk maps are expected: 1 per threshold per year. The risk maps will be available as raster images in GeoTIFF format. Moreover, starting from the list of dates related to damage claims provided by the insurance companies for the years 2015-2018, precipitation values (with the respective location coordinates) have been extracted in order to find further locations (in addition to those provided by the insurance company) where heavy rain events occurred (Figure 110).


Figure 110: Map of precipitation extracted from KNMI dataset on date 30/08/2015. Yellow points: locations provided by the insurance company – Blue points: further locations with 24-hours precipitation values above the 50 mm threshold

Finding new locations showing heavy rain events should help in finding changes in the

vegetation index. Over the coming trial, further meteo-climate variables could be taken into

account, such as temperature.



Field crop growth vs. similar crop (inter-field analysis): The scope of the service is to represent the status of the crop during the current season and to use it, in case of critical weather events (flood, drought), to provide evidence on whether the potential damage really depends on the event or whether the parcel was already in a critical situation in terms of production capacity. The output of the service is an indicator of crop behaviour (average, worse, better) during the current season.

Starting from a shapefile grouping fields of the same crop in the area of interest, the developed tool applies an inner buffer to each parcel and extracts the temporal profiles. Figure 106 shows some results produced by the analysis. We tested the process on winter wheat, onions and potatoes considering S2 data from 2018-01-01 to 2018-11-15. In particular, areas affected by drought and frost have been taken into account, and the results reveal significant differences between the temporal profiles of impacted parcels, which show a high level of anomaly (assigned by the tool), and those of non-impacted parcels, which show “normal” behaviour.
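As an illustration of the inter-field comparison, the sketch below labels each parcel by comparing its temporal profile with the average profile of same-crop parcels in the area. The report does not describe how the tool assigns the anomaly level, so the scoring rule (mean offset from the area profile, compared with the spread of offsets) and the `tolerance` parameter are assumptions chosen only for this example.

```python
import numpy as np

def classify_parcels(profiles, tolerance=1.0):
    """Label parcels of one crop as 'worse', 'average' or 'better'.

    profiles: array of shape (n_parcels, n_dates) with the mean index value
    (e.g. NDVI) of each parcel on each acquisition date (hypothetical layout).
    """
    profiles = np.asarray(profiles, dtype=float)
    area_mean = np.nanmean(profiles, axis=0)               # average profile in the area
    deviation = np.nanmean(profiles - area_mean, axis=1)   # mean offset per parcel
    spread = np.nanstd(deviation)                          # variability between parcels
    labels = []
    for d in deviation:
        if spread == 0 or abs(d) <= tolerance * spread:
            labels.append("average")
        elif d < 0:
            labels.append("worse")
        else:
            labels.append("better")
    return labels
```

A parcel labelled "worse" before a weather event would suggest that it was already underperforming, which is the kind of evidence the service aims to provide.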

Intra-field Anomalies: The scope of the service is to analyse the situation of a single parcel in order to assess growth homogeneity and highlight irregular areas within the parcel, providing an indicator of field anomalies. The vegetation variability within a parcel is mainly due to soil characteristics such as texture and depth, which affect water consumption and cause irregular growth, but it is also affected by extreme weather events (e.g. drought, excess of rain, frost and heat). Starting from the temporal spectral profile of a parcel, the developed tool identifies the period of maximum growth of the crop (if the parcel is cultivated) and calculates the mean and the deviation, which are effective instruments for detecting anomalies.

Figure 111: Intra-field analysis based on NDVI spectral index with S2A and S2B data (tile T31UET - year 2018)




Trial 2 timeline


The first part of the preparatory work conducted by e-GEOS focused on collecting and processing both optical and SAR data over the Netherlands.

e-GEOS has implemented two pipelines consisting of several pre-processing steps performed directly on Sentinel-2 and Sentinel-1 data.

A brief description of the main steps follows:

● Sentinel-2 pipeline:

○ Automated product downloading and archiving

○ Pre-processing: atmospheric correction and cloud, snow and shadow masking

○ Vegetation index extraction (NDVI)

● Sentinel-1 pipeline:

○ Automated product downloading and archiving

○ Coherences and Amplitudes in VV/VH polarization
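The vegetation index extraction step of the Sentinel-2 pipeline can be illustrated with the standard NDVI formula, NDVI = (NIR - Red) / (NIR + Red). The sketch below assumes that the red and near-infrared reflectances (for Sentinel-2, typically bands B4 and B8) are already available as arrays after atmospheric correction, and that a boolean mask of valid pixels (no cloud, snow or shadow) is supplied. The function and parameter names are hypothetical.

```python
import numpy as np

def ndvi(red, nir, valid_mask=None):
    """Compute NDVI per pixel; masked or undefined pixels are set to NaN."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    ok = denom != 0                      # avoid division by zero
    if valid_mask is not None:
        ok &= np.asarray(valid_mask, dtype=bool)
    out = np.full(red.shape, np.nan)
    out[ok] = (nir[ok] - red[ok]) / denom[ok]
    return out
```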

e-GEOS has collected about 1 year of both Sentinel-2 and Sentinel-1 data. A total of 10 Sentinel-2 tiles have been processed (Figure 112) and 3 relative orbits of Sentinel-1 have been considered for generating amplitudes and coherences.



Figure 112: Sentinel-2 tiles over the Netherlands

The second part of the preparatory work has been focused instead on extracting parcels’

statistics starting from GSAA data.

e-GEOS has developed a tool for extracting maximum, minimum, standard deviation and

count of pixels for each parcel (expressed by a polygon geometry) and for each satellite

acquisition by applying also an inner buffer to mitigate border effects.
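A zonal statistics step of this kind can be sketched as follows. To keep the example self-contained, the parcel is assumed to be already rasterized into a boolean mask on the image grid, and the inner buffer is approximated by eroding that mask by a number of pixels; the actual tool works on polygon geometries, and its buffer distance is not stated in the report. All names are hypothetical.

```python
import numpy as np

def erode(mask, pixels=1):
    """Shrink a boolean mask by `pixels` using a 4-neighbour erosion."""
    m = np.asarray(mask, dtype=bool)
    for _ in range(pixels):
        p = np.pad(m, 1, constant_values=False)
        m = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
             & p[1:-1, :-2] & p[1:-1, 2:])
    return m

def parcel_stats(raster, parcel_mask, buffer_pixels=1):
    """Return max, min, standard deviation and pixel count for one parcel
    and one acquisition, ignoring border pixels and NaN (no-data) values."""
    inner = erode(parcel_mask, buffer_pixels)
    values = np.asarray(raster, dtype=float)[inner]
    values = values[~np.isnan(values)]
    if values.size == 0:
        return {"count": 0, "max": None, "min": None, "std": None}
    return {
        "count": int(values.size),
        "max": float(values.max()),
        "min": float(values.min()),
        "std": float(values.std()),
    }
```

Running `parcel_stats` once per parcel and per acquisition would yield the table of parcel statistics that the later analyses consume.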

Potato has been selected as the crop of interest, and we focused our analysis on 5 types of potatoes:

• Consumption

• NAK, seed

• TBM, seed

• Starch

• AM, disinfestation

Figure 113 presents an overview of the spatial distribution of potato fields (by type) in the Netherlands for the reference year 2017.



Figure 113: Spatial distribution of potato fields with respect to variety for year 2017

Figure 114 presents the count of samples per type.

Figure 114: Count of samples per type of potatoes

Activities related to interfacing the Insurance Final User (NB Advies):

● Extraction of potato fields from LPIS on 5 types of potatoes:

○ Consumption



○ NAK, seed

○ TBM, seed

○ Starch

○ AM, disinfestation

● Extraction of soil type for each parcel based on BOFEK2012 (Figure 115).

Figure 115: Soil type map

The preparatory work has been finalized by MEEO, to extract the following dataset for each

potato parcel:

• Precipitation (24H) from local weather stations (Figure 116)

• Evapotranspiration from EO Data Service MEA (Figure 117)

• Land Surface Temperature from EO Data Service MEA (Figure 117)

• Soil Moisture from EO Data Service MEA (Figure 117)

Figure 116: Meteo climate data from local weather stations



Figure 117: Data from EO Data Service MEA

Figure 118 presents an example of temperature profile related to a potato parcel.

Figure 118: Temperature profile (parcel number 1971186)

Data analysis focused on the possible application of machine learning techniques in order to

overcome the lack of data from insurances (EXUS).

Trial 2 execution

The activities and services set up in Trial 2 are briefly described below:

Weather risk map

A weather-based risk map is intended to show the occurrence of extreme weather events,

heavy rains in particular, in order to identify areas with possible high damage frequency. Four



risk maps per year (from 2016 to 2018) have been created according to the following

thresholds indicated by insurance companies:

• 50/71 mm in 24h (depending on the agreement between farmers and insurance

company)

• 84 mm in 48 h

• 100 mm in 96 h
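The thresholds listed above can be applied to a stack of daily (24 h) precipitation rasters with rolling sums. The sketch below is an illustration under stated assumptions: the input layout (days, rows, columns), the dictionary of threshold names and the use of "greater than or equal" for an exceedance are choices made for this example and are not taken from the pilot's code.

```python
import numpy as np

# (window length in days, limit in mm); names are hypothetical labels
THRESHOLDS = {
    "50mm_24h": (1, 50.0),
    "71mm_24h": (1, 71.0),
    "84mm_48h": (2, 84.0),
    "100mm_96h": (4, 100.0),
}

def count_exceedances(daily_precip, window_days, limit_mm):
    """Count, per pixel, the rolling windows whose accumulated
    precipitation reaches the limit.

    daily_precip: array of shape (days, rows, cols) with 24 h accumulations.
    """
    p = np.asarray(daily_precip, dtype=float)
    if p.shape[0] < window_days:
        return np.zeros(p.shape[1:], dtype=int)
    csum = np.cumsum(p, axis=0)
    csum = np.concatenate([np.zeros((1,) + p.shape[1:]), csum], axis=0)
    rolling = csum[window_days:] - csum[:-window_days]   # sums over each window
    return (rolling >= limit_mm).sum(axis=0)

def yearly_risk_maps(daily_precip):
    """Build one exceedance-count map per threshold for one year of data."""
    return {name: count_exceedances(daily_precip, w, limit)
            for name, (w, limit) in THRESHOLDS.items()}
```

Calling `yearly_risk_maps` once per year yields four maps per year, matching the number of risk maps described in the text.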

[Figure panels: 50 mm in 24h risk map for year 2016; 71 mm in 24h risk map for year 2016]








Figure 119: 2016-2018 risk maps (split across pages)

Detection of parcels with anomalous behaviours and identification of the most influential parameters

To identify the parameters (weather or soil related) with the dominant impact on crop yield, using Normalized Difference Vegetation Index (NDVI) measurements as a proxy, the following approach was first considered:



For the 2017 dataset we went through the following steps for the crop type considered (potato):

1) Split the parcels into two datasets.

2) Use the first part of the dataset for the clustering and create groups using satellite measurements, meteorological measurements and soil characteristics aggregated over periods of one or two months, considering a full growing season from March to October:

| Period | Satellite measurements | Meteo measurements |
| --- | --- | --- |
| March – April | Avg values for satellite measurements | Avg values for meteo measurements |
| May – June | Avg values for satellite measurements | Avg values for meteo measurements |
| July – August | Avg values for satellite measurements | Avg values for meteo measurements |
| Sep – Oct | Avg values for satellite measurements | Avg values for meteo measurements |

Additional features: soil characteristics of the parcel; coordinates of the parcel.

3) Characterize / label each group based on the NDVI values of their parcels.
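The clustering and labelling steps above can be sketched as follows. The report does not name the clustering algorithm, so plain k-means is used here purely as a stand-in. The feature matrix is assumed to hold one row per parcel with the bi-monthly averages, soil characteristics and coordinates already standardized, and `ndvi_peak` is a hypothetical per-parcel NDVI summary used to label each group.

```python
import numpy as np

def kmeans(features, k, iterations=50, seed=0):
    """Minimal k-means (Lloyd's algorithm); returns assignments and centres."""
    x = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    assign = np.zeros(len(x), dtype=int)
    for _ in range(iterations):
        # squared distance of every parcel to every centre
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        new_centers = np.array([
            x[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign, centers

def label_clusters(assign, ndvi_peak):
    """Characterize each cluster by the mean NDVI of its parcels."""
    ndvi_peak = np.asarray(ndvi_peak, dtype=float)
    return {int(j): float(ndvi_peak[assign == j].mean())
            for j in np.unique(assign)}
```

Clusters whose mean NDVI is clearly lower than the others would then be candidates for parcels with anomalous behaviour.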

After these steps, the plan was to proceed to prediction and feature selection, using the second part of the dataset in the following procedure:

1) For each parcel, identify the cluster / group it belongs to, considering its measurements from March to October.

2) Once the group has been selected, use the prediction model trained on the measurements of the parcels belonging to the same cluster to predict NDVI values.

Due to the limited number of usable measurements per parcel when only half of the dataset is used, we could not apply the prediction and feature selection per cluster. For that reason, we used the full 2017 dataset, considering SAR and meteorological measurements (such as precipitation, cumulative precipitation, temperature and cumulative temperature) and soil characteristics, to predict NDVI values 14 days ahead (or after any other preferred time window). For example, the SAR and meteorological measurements for 30/06/2017 are used to predict the NDVI value for 14/07/2017. The main purpose of this prediction model is to identify the dominant parameters that affect the growth of the parcels for each crop type. For the prediction and the feature importance we use random forests. The higher the importance value of a feature, the stronger its relationship with the NDVI value.

Note that in each case the parameter importance values sum to 1.
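A minimal sketch of this feature importance step, using scikit-learn's `RandomForestRegressor`, is shown below. The feature names and the synthetic data are hypothetical and serve only to make the example runnable; in the pilot, each row would hold the SAR, meteorological and soil measurements of a parcel on a given date, and the target would be the NDVI observed 14 days later.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature names for illustration only
FEATURES = ["sar_vv", "sar_vh", "precip", "cum_precip", "temp", "cum_temp"]

def rank_features(X, y, names, seed=0):
    """Fit a random forest and return features sorted by importance.

    The impurity-based importances returned by scikit-learn sum to 1.
    """
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X, y)
    importance = dict(zip(names, model.feature_importances_))
    return dict(sorted(importance.items(), key=lambda kv: kv[1], reverse=True))

# Synthetic demo data: the target depends almost only on "cum_temp"
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, len(FEATURES)))
y_demo = 0.6 + 0.1 * X_demo[:, 5] + 0.01 * rng.normal(size=300)
ranking = rank_features(X_demo, y_demo, FEATURES)
```

In this synthetic case `cum_temp` comes out on top by construction; with real data the ranking would correspond to a chart such as Figure 121.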

Figure 120: NDVI per cluster



Figure 121: Parameter importance

NDVI trends of potatoes and relation with temperature

| Type of potato | MAE (mean absolute error between true NDVI values and estimated NDVI) |
| --- | --- |
| Consumption | 0.14 |
| NAK | 0.11 |
| Disinfestation | 0.14 |
| TBM | 0.09 |
| Starch | 0.13 |
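For reference, the MAE reported in the table above is the average of the absolute differences between observed and estimated NDVI values. A minimal sketch, with hypothetical function and argument names, is:

```python
import numpy as np

def mean_absolute_error(true_ndvi, estimated_ndvi):
    """Mean absolute error between observed and estimated NDVI values."""
    t = np.asarray(true_ndvi, dtype=float)
    e = np.asarray(estimated_ndvi, dtype=float)
    return float(np.mean(np.abs(t - e)))
```

Since NDVI ranges between -1 and 1, an MAE of about 0.1 is best read against the typical NDVI range of the crop during the season.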

An analysis of the behaviour of different types of potatoes has been performed. Unfortunately, few observations are fully reliable due to the massive cloud coverage that affected the Netherlands during 2017. Nevertheless, different trends have been identified (see Figure 122).

We also decided to investigate the response of each variety to high temperatures.



Consumption potatoes

We classified consumption potatoes based on the cumulative temperature in the period 90-200 Day of Year (from April to the middle of July), obtaining five groups (see Figure 123). Figure 124 shows the average NDVI profile of the parcels belonging to these five groups; it is quite clear that high temperature reduces the NDVI maximum. Figure 125 plots the average temperature for the group characterized by higher temperature and lower maximum NDVI and for the one with lower temperature and higher maximum NDVI.
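The grouping by cumulative temperature can be sketched as follows. The report does not state how the group boundaries were chosen, so equal-frequency (quantile) bins are assumed here for illustration; the input layout (one row of daily temperatures per parcel) and the function names are likewise hypothetical.

```python
import numpy as np

def cumulative_temperature(daily_temp, doy, start=90, end=200):
    """Sum daily temperatures per parcel between two days of year (inclusive).

    daily_temp: array of shape (n_parcels, n_days); doy: day of year per column.
    """
    doy = np.asarray(doy)
    window = (doy >= start) & (doy <= end)
    return np.asarray(daily_temp, dtype=float)[:, window].sum(axis=1)

def group_parcels(cum_temp, n_groups=5):
    """Assign each parcel to a quantile group (0 = lowest cumulative temperature)."""
    cum_temp = np.asarray(cum_temp, dtype=float)
    edges = np.quantile(cum_temp, np.linspace(0, 1, n_groups + 1)[1:-1])
    return np.digitize(cum_temp, edges)
```

Averaging the NDVI profiles of the parcels in each group would then give curves comparable to those in Figure 124.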

Figure 122: NDVI profiles of different types of potato (year of reference 2017)

Figure 123: Five groups of consumption parcels based on cumulative temperature between 90 and 200 Day of Year



Figure 124: NDVI profiles of consumption parcels according to the five groups identified by the temperature analysis

Figure 125: Average temperature trends of parcels in areas characterized by higher temperatures (blue) and lower temperatures (purple)

TBM potatoes

The same approach has been followed for TBM potatoes and we got four groups based on

the cumulative temperature from 90 to 200 Day of Year, see Figure 126.

Figure 127 shows the average NDVI profile of parcels belonging to the above-mentioned four

different groups.



Figure 128 shows the plot related to the average temperature for the group characterized by

higher temperature and lower maximum NDVI and for the one with lower temperature and

higher maximum NDVI.

Figure 126: Four groups of TBM parcels based on cumulative temperature between 90 and 200 Day of Year

Figure 127: NDVI profiles of TBM parcels according to the four groups identified by the temperature analysis



Figure 128: Average temperature trends of parcels in areas characterized by higher temperatures (blue) and lower temperatures (red)

Starch potatoes

This variety of potato seems not to be affected by high temperatures thanks to its spatial

distribution (Figure 129).

Figure 130 plots the average NDVI profile of parcels belonging to the three different groups

defined according to previous analysis.

Figure 129: Three groups of Starch parcels based on cumulative temperature between 90 and 200 Day of Year



Figure 130: NDVI profiles of Starch parcels according to the three groups identified by the temperature analysis

NAK potatoes

The same approach has been followed for NAK potatoes and we got four groups based on the

cumulative temperature from 90 to 200 Day of Year, see Figure 131.

Figure 132 shows the average NDVI profile of parcels belonging to the four different groups.

Figure 133 shows the plot of the average temperature for the group characterized by higher temperature and lower maximum NDVI and for the one with lower temperature and higher maximum NDVI.

Figure 131: Four groups of NAK parcels based on cumulative temperature between 90 and 200 Day of Year



Figure 132: NDVI profiles of NAK parcels according to the four groups identified by the temperature analysis

Figure 133: Average temperature trends of parcels in areas characterized by higher temperatures (blue) and lower temperatures (red)

Intra-field analysis

The scope of the service is to analyse the situation of a single parcel in order to assess growth homogeneity and highlight irregular areas within the parcel, providing an indicator of field anomalies. The intra-field analysis can be summarized as follows:

• Creation of an inner buffer within the parcel polygon in order to avoid border effects.



• Extraction of the parcel temporal profile by calculating the mean value for each

observation.

• Identification of the observation that corresponds to the maximum growth stage of

the crop. Some filters are applied in order to exclude parcels that are not cultivated or

areas with no available images in the period of interest due to cloud cover.

• Calculation of mean value and classification of pixels within the parcel based on

thresholds.
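The steps listed above can be sketched as follows. The parcel mask is assumed to be already shrunk by the inner buffer, the filters for uncultivated or cloud-covered parcels are omitted, and the classification thresholds (mean plus or minus `k` standard deviations) are an assumption, since the report does not give the actual threshold values.

```python
import numpy as np

def intra_field_classes(ndvi_stack, parcel_mask, k=1.0):
    """Classify the pixels of a parcel at its maximum growth stage.

    ndvi_stack: array (dates, rows, cols); parcel_mask: boolean (rows, cols).
    Returns the index of the peak date and a class raster:
    0 = outside parcel or no data, 1 = below average, 2 = average, 3 = above average.
    """
    stack = np.asarray(ndvi_stack, dtype=float)
    mask = np.asarray(parcel_mask, dtype=bool)
    # temporal profile: mean NDVI of the parcel for each observation
    profile = np.array([np.nanmean(img[mask]) for img in stack])
    peak = int(np.nanargmax(profile))          # observation of maximum growth
    img = stack[peak]
    mean = np.nanmean(img[mask])
    std = np.nanstd(img[mask])
    classes = np.zeros(img.shape, dtype=int)
    classes[mask & (img < mean - k * std)] = 1
    classes[mask & (img >= mean - k * std) & (img <= mean + k * std)] = 2
    classes[mask & (img > mean + k * std)] = 3
    return peak, classes
```

Pixels in class 1 correspond to the areas of anomalous growth that the service aims to highlight.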

As anticipated in D1.2, the analysis has been performed over the Netherlands considering 2017 as the year of reference. Unfortunately, the dataset provided by the insurance companies involved was not sufficient to study the correlation between extreme weather events and losses. Nevertheless, this service is extremely useful for detecting areas where vegetation grows irregularly due to soil characteristics such as texture and depth.

Figure 134: Intra-field analysis based on NDVI spectral index with S2A and S2B data (year 2017)

Figure 135: Areas of anomalous growth



Trial 2 results

The Trial 2 results for each activity, reported in detail in the previous section, are summarized below:

• The weather risk map service has produced good results in identifying, over time, areas repeatedly affected by heavy rain according to the thresholds provided by insurance companies. This approach can also be applied to further meteo-climate variables and can help to identify and monitor high-risk areas.

• The clustering-based service has proved to be a very useful technique to identify

parcels with anomalous behaviour and to consider in a single analysis all the variables

that can affect the growth and the yield of a crop. Unfortunately, it was not possible

to validate the results due to lack of data from insurances, but the approach seems to

be very promising.

• The performed activity reveals that temperature is a factor with high impact on NDVI

of potatoes.

• The intra-field service is extremely effective in detecting soil anomalies that prevent crops from growing homogeneously within parcels. This service provides an indicator of soil quality: texture and depth, for instance, have consequences on water consumption and on regular growth.



| Component code and name | Purpose for pilot | Deployment status | Component location |
| --- | --- | --- | --- |
| C08.02 (Proba-V MEP) | EO data for historical risk mapping | Used only in Trial 1 | Proba-V MEP at VITO |
| C34.01 | Feature importance applying Machine Learning techniques for weather insurance based on satellite and meteorological data | The component is operational and it has been used in Trial 2 | EXUS internal server |
| C41.01 (MEA WCS) | Extraction of meteo data for weather-based risk map (precipitation values) | The component is fully [...] been already used in Trial 1 and Trial 2 | MEEO server |
| C41.02 (MEA GUI) | Extraction of meteo data for weather-based risk map (precipitation values) | The component is fully [...] been already used in Trial 1 and Trial 2 | MEEO server |
| C28.01 DataCube | Management and preprocessing of input EO data for their operational usage | The component is operational and it is already used in the Trial 1 and Trial 2 | e-GEOS Server |
| EO processing | Processing chain for multitemporal indices computation from EO data | The component is [...] and Trial 2 | |
| Intra-field analysis | | The component is [...] and Trial 2 | |
| Zonal statistics tool | | The component is [...] | |





Data Assets


| Data type | Dataset | Dataset original source | Dataset location | Volume (GB) | Velocity (GB/year) |
| --- | --- | --- | --- | --- | --- |
| NDVI data in raster format and metadata | Remote sensing data from Sentinel-2 optical satellite for 2017 on 10 tiles | ESA (Copernicus Data) | Copernicus Scihub https://scihub.copernicus.eu/ | 350 | |
| SAR data in raster format (Amplitude and Coherence in VV and VH polarization) | Remote sensing data from Sentinel-1 radar satellite for 2017 on 3 relative orbits | ESA (Copernicus Data) | Copernicus Scihub https://scihub.copernicus.eu/ | 1380 | |
| Vector data | Netherlands field declarations (2017) | Netherlands Paying Agency | WFS Service https://geodata.nationaalgeoregister.nl/brpgewaspercelen/wfs?&request=GetCapabilities&service=WFS | 1 | |
| Vector data | Netherlands Soil type | Netherlands | https://www.wur.nl/upload_mm/1/7/6/61a0f2aa-4cd1-4b5e-90db-9498465d3b6a_BOFEK2012_bestandenVersie2_1.zip | 1 | |
| LST raster dataset | Land Surface Temperature with 0.031 degrees of resolution | EUMETSAT | | | |
| Precipitation raster dataset | 24-hour precipitation accumulations from radar and rain gauges | KNMI | https://data.knmi.nl/datasets/radar_corr_accum_24h/1.0 | 0.2 | |
| Evapotranspiration dataset | Daily MSG Evapotranspiration | EUMETSAT | | | |
| Soil moisture raster dataset | Sentinel1-Soil Moisture | Copernicus Global Land Service | https://land.copernicus.eu/global/products/ssm | | |



The objective of the pilot was to identify services that give the insurer more insight into the risk and the impact of heavy rain events on crops in the Netherlands. Potato crops are very sensitive to heavy rain, which may flood the field (due to lack of runoff) and saturate the soil. This can destroy the potato yield in just a few days. Farmers in areas of greater risk can be charged higher premiums. Instead of just raising the premium, the intention of the pilot was to create awareness and incentives for farmers to prevent losses. Therefore, the services we created served multiple purposes.

Weather is an important factor in crop insurance because it is a critical aspect influencing yield. The analysis of long-term precipitation, categorized by threshold values for intense rain events, gave insight into the areas with higher risk. Under climate change these numbers may shift, so a service that tracks the changing patterns is of interest.
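The threshold-based categorisation of long-term precipitation can be sketched as follows. This is a minimal illustration rather than the pilot's implementation: the threshold values (in mm per 24 hours), the class names and the sample series are assumptions chosen for the example.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical class boundaries in mm per 24 hours (assumed, not from the pilot).
THRESHOLDS_MM = [25.0, 50.0, 75.0]
CLASS_NAMES = ["below threshold", "heavy", "very heavy", "extreme"]


def classify_daily_rain(amount_mm):
    """Return the intensity class of one 24-hour precipitation accumulation."""
    # A value equal to a boundary falls into the higher class.
    return CLASS_NAMES[bisect_right(THRESHOLDS_MM, amount_mm)]


def count_events(daily_mm):
    """Count how many days of a series fall into each intensity class."""
    return Counter(classify_daily_rain(value) for value in daily_mm)


# Illustrative 24-hour accumulations for one grid cell (made-up values).
series = [0.0, 3.2, 27.5, 51.0, 12.4, 80.3, 25.0]
counts = count_events(series)
```

Running the same count per grid cell and per year would give the kind of long-term picture from which higher-risk areas and changing patterns can be read.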

In the pilot we looked for the relation between a single event and the potential yield loss. For this we needed an annotated dataset in which actual losses had been determined. Because of privacy issues related to sharing the damage data, the location of damaged fields could not be pinpointed precisely enough for correlation with the EO data. Without details about historical events, this relationship could not be determined. Based on the information we had, though, we were able to detect the events and give information about damage risk for other areas. A service based on the alert that a heavy rain event took place would be useful for gaining insight into the impact on other locations. In order to find the most limiting aspect in crop development, we created a dataset based on the Sentinel-2 raster size that combines NDVI with SAR, cumulative precipitation, temperature and soil type. The potato type proved to be the predominant factor for predicting the NDVI. After splitting the dataset into subsets per potato type, precipitation was the most determining factor. Unfortunately, we could not find the connection with heavy rain, because the training set was not sufficient for that analysis. The dataset, however, is valuable for further analysis, not limited to insurance topics.
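One simple way to look for the most determining factor within each potato type is to split the records by type and compare how strongly each candidate variable correlates with NDVI. The report does not state which method was used, so the Pearson correlation, the field names and the sample records below are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / sqrt(var_x * var_y)


def strongest_factor_per_type(records, factors):
    """For each potato type, return the factor most correlated with NDVI."""
    by_type = defaultdict(list)
    for rec in records:
        by_type[rec["potato_type"]].append(rec)
    result = {}
    for potato_type, rows in by_type.items():
        ndvi = [r["ndvi"] for r in rows]
        scores = {f: abs(pearson([r[f] for r in rows], ndvi)) for f in factors}
        result[potato_type] = max(scores, key=scores.get)
    return result


# Made-up records; field names are hypothetical.
records = [
    {"potato_type": "A", "ndvi": 0.40, "precip_mm": 10.0, "temp_c": 15.0},
    {"potato_type": "A", "ndvi": 0.55, "precip_mm": 30.0, "temp_c": 14.0},
    {"potato_type": "A", "ndvi": 0.70, "precip_mm": 50.0, "temp_c": 16.0},
    {"potato_type": "B", "ndvi": 0.30, "precip_mm": 20.0, "temp_c": 12.0},
    {"potato_type": "B", "ndvi": 0.50, "precip_mm": 20.0, "temp_c": 16.0},
    {"potato_type": "B", "ndvi": 0.70, "precip_mm": 20.0, "temp_c": 20.0},
]
best = strongest_factor_per_type(records, ["precip_mm", "temp_c"])
```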

The consortium cooperated very well and in good spirit. The cooperation could be continued in future projects based on the results of this pilot. The results are not market ready yet; therefore, there are no specific plans for joint exploitation at this moment.

KPIs

The initial set of services and activities was reviewed and reconfigured after the analysis of available datasets and after Trial 1. Some of the initially planned services (in particular the correlation between historical weather data and critical events) were based on the assumption that a historical dataset of losses would be available, long enough to set the threshold values and, where necessary, train the machine learning tools. The datasets provided by the insurance company involved were not sufficient to implement these approaches.

As a result, the KPIs have been modified to reflect the available datasets.

KPI short name | KPI description | Goal description | Base value | Target value | Measured value | Unit of value | Comment
C1.2_1 | Spatial Scalability | Parcel analysis at country level | N/A | 500 | 1624 | km^2 |
C1.2_2 | Temporal Scalability | Parcel analysis over time | N/A | 365 | 365 | days |
C1.2_3 | Multi source analysis Capability | Analysis of multisource data (both EO and non EO data) | 2 | 5 | 6 | Number of datasets used combined for the analysis | Further included dataset: SAR
C1.2_4 | Identification of key parameters for crop yield Capability | Study and identification of parameters affecting crop growth and yield | N/A | 1 | 2 | Number of parameters identified | Further identified parameters



14 Pilot 12 [C2.1] CAP Support

Pilot overview

In the framework of the EU Common Agricultural Policy (CAP), farmers have access to subsidies from the European Union, which are provided through Paying Agencies operating at national or regional level. To grant the subsidies, Paying Agencies must carry out several controls to verify that cultivation complies with EU regulations. At present, most compliance controls are limited to a sample of the farmers’ declarations because of the high cost of acquiring high and very-high resolution satellite imagery. Moreover, they often focus on a specific time window and do not cover the whole lifecycle of the agricultural land plots during the year.

The free and open availability of Earth Observation data is bringing land monitoring to a

completely new level, offering a wide range of opportunities, particularly suited for

agricultural purposes, from local to regional and global scale, in order to enhance the

implementation of Common Agricultural Policy (CAP). Nowadays, satellite image time series

are increasingly used to characterize the status and dynamics of crops cultivated in different

agricultural regions across the globe.

Pilot C2.1 CAP Support provides products and services in support of the CAP, based on specialized, highly automated techniques for processing Big Data and relying on multi-temporal series of free and open EO data, with a focus on Copernicus Sentinel-2 data.

The main goal of the approach is to provide services that support the National and Local Paying Agencies and the authorized collection offices in a more accurate and complete farm compliance evaluation, i.e. the control of the farmers’ declarations related to the obligations introduced by the current Common Agricultural Policy (CAP).


Trial Stage in Romania

The general methodology for Trial 1 was based on the comparison between real crop behaviour and the expected trends for each crop typology. It involves image processing, data mining and machine learning techniques and is based on different categories of input data: Sentinel-2 and Landsat-8 satellite image time series (SITS) covering the time period of interest, farmers’ declarations of intention with respect to crop types, as well as in-situ / field data.



Figure 136: Crop families detection using Sentinel 2 temporal series

Figure 137: Pixel-based results of the analysis regarding potential incongruences with respect to farmers’ declarations stating crop types and areas covered



Figure 138: Plot-based results of the analysis regarding potential incongruences with respect to farmers’ declarations stating crop types and areas covered

Following the results of Trial 1, we can conclude that:

• There are well visible differences in declared crops versus crops identified through

unsupervised machine learning algorithms.

• The validation of the preliminary results against independent sources (in-situ data,

high or very high-resolution imagery) revealed promising results, with an accuracy

higher than 90% for all the selected crop families.

• There is a need for further trials, for more areas of interest, in order to compare the

results and refine parameter settings in algorithm design. Also, crop types will be used

during Trial 2 instead of crop families.

• The proposed highly automated approach allows Big Data analytics to be performed on various crop indicators. It is reliable and cost- and time-saving, and it allows a more complete and efficient management of EU subsidies, strongly enhancing the procedure for combating non-compliant behaviours. The developed technique is replicable at any scale and can be implemented for any other area of interest.



Trial Stage in Italy

The objective of Trial 1 was to set up a quick methodology, based on the computation of markers in relation to predefined scenarios in terms of crop type and of the reference periods during which agricultural practices must take place. The methodology detects LPIS\GSAA parcel anomalies in terms of crop type or crop family with respect to the last update (LPIS) or the farmer’s declaration (GSAA), and re-classifies the parcel itself. It works at parcel level: several markers, such as ploughing, presence and harvesting, are computed for each parcel depending on the specific crop type. The workflow is based on the following steps:

• Download of Sentinel-1 and Sentinel-2 satellite data from repositories. Images

collected in 2017 and 2018 have been

• Preprocessing of Sentinel-2 data in order to mask clouds and related shadows

• Generation of spectral indices from preprocessed Sentinel-2 satellite data, also by

composing data from different images, to be used for markers computation

• Intersection of Sentinel-2 spectral indices and preprocessed Sentinel-1 data with

parcels to be monitored

• Computation of markers at parcel level
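The cloud masking, spectral index and parcel-level steps listed above can be illustrated with a small sketch. It assumes NDVI as the spectral index and a per-pixel boolean cloud flag; the data layout and function names are hypothetical and stand in for the raster operations a production chain would use.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        return None  # undefined when both reflectances are zero
    return (nir - red) / total


def parcel_mean_ndvi(pixels):
    """Mean NDVI over the cloud-free pixels of one parcel.

    Each pixel is a dict with 'nir' and 'red' reflectances and a boolean
    'cloudy' flag (True for cloud or cloud shadow).
    """
    values = []
    for px in pixels:
        if px["cloudy"]:
            continue  # masked pixels do not contribute
        value = ndvi(px["nir"], px["red"])
        if value is not None:
            values.append(value)
    if not values:
        return None  # parcel fully masked on this date
    return sum(values) / len(values)


# Made-up reflectances for the pixels intersecting one parcel.
parcel = [
    {"nir": 0.50, "red": 0.10, "cloudy": False},
    {"nir": 0.40, "red": 0.20, "cloudy": False},
    {"nir": 0.90, "red": 0.80, "cloudy": True},
]
mean_value = parcel_mean_ndvi(parcel)
```

Repeating this for every acquisition date yields the per-parcel NDVI time series on which the markers are computed.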

Figure 139: NDVI temporal trend with identification of relevant periods

As an example, Figure 139 shows, for a generic summer arable land crop, the two periods in which a ploughing event (in blue) or a harvesting event (in pink) is expected. The results of Trial 1 highlighted that:



• Scalability has been considered by designing a processing environment that can be

easily ported to a cloud infrastructure, therefore removing processing and storage

limitations that could be a strong barrier for enlarging at regional or national level or

more.

• Another relevant issue that has been taken into consideration is the exchange of input and output data with the Paying Agency. Apart from privacy issues, which require the masking of several data items mainly related to property, this interaction requires a clear definition of output products in terms of specifications that can properly fit the compliance verification process. In fact, regulations for the same crop type can differ according to the location and the adopted application schema.

• For the definition of markers, it must be considered that each of them must be defined according to the geographic location, and that a specific algorithm and related parameters must be identified, which requires proper tuning based on time series analysis. This operation is supported by the analysis, for each crop, of its spectral behaviour over time, in order to identify, from a mathematical point of view, markers related to specific activities.

The methodology has been applied on the AOI of the project in Veneto (Verona Province), where the LPIS 2016 was available.



Under the framework of the DataBio project, Terrasigna ran CAP support monitoring service trials during 2017, 2018 and 2019 for a 10,000 sqkm AOI in Southeastern Romania.

The main goal of the CAP Support Monitoring pilot trials was to provide crop type maps for a

large area, characterized by geographical variability, for a broad variety of crops, distributed

over diverse location and including small and narrow plots, making use of the Copernicus

Sentinel-2 spatial and temporal resolution.

During Trial 1, carried out in 2018, important sets of results were provided, consisting of crop family maps and maps of crop inconsistencies. The results were based on farmers’ declarations regarding crop types and areas covered for the 2017 and 2018 agricultural seasons and involved five crop families: wheat-like cereals, maize-like cereals, sunflower and related crops, rapeseed and related crops, and grassland, pastures and meadows. The results delivered for the 2018 agricultural season have been validated through a series of in-situ data and Sentinel-2 backgrounds for the test area, resulting in a qualitative assessment that was used to define Trial 2 actions and expected results.

The main goal of Trial 2 was to review the key results of Trial 1, identify the emerging needs of the components involved in Trial 1 and further develop the Crop Monitoring Service, with products tuned to fulfil the requirements of the 2015-20 EU Common Agricultural Policy.



Therefore, Trial 2 was based on an adjusted version of the crop detection algorithms, updated

according to the previous validation work, aiming to increase accuracy and success rate.

Terrasigna's proposed methodology has undergone continuous development and

improvements over the last 4 years, now reaching version v.05 of the algorithms.

Apart from the optimised version of the algorithms used, Trial 2 objectives included:

• New measurements carried out for the analysed area, according to a field tracking

plan;

• The delivery of new results for the same area of interest, but based on a higher

number of target crops and using crop types instead of crop families;

• Further testing of the algorithms developed and extension of the service at national-

scale level.


Starting from the preliminary results of Trial 1, the main objective of Trial 2 was to refine and

validate the approach implemented. For these purposes the Trial 2 activities included:

• Refinement of the criteria adopted to aggregate the crops in crop families;

• Refinement of the rules adopted for the marker computation;

• Collection of validation data for accuracy assessment

• Validation of results

Trial 2 timeline


Trial 2 activities have been mainly divided into 3 steps:

Step 1: Trial 2 Start (M26)

Trial 2 started with the DataBio components ready for pilot trial implementation, as well as platform services ready for use in the final pilot iterations. All the specifications for Agriculture Pilots Trial 2 have been defined within internal deliverable D1.i2 – “Agriculture Pilots Trial 2 Specifications”.

Step 2: All pilot services were developed and run between M26 and M34. On average, the frost-free growing season in Romania starts at the beginning of April and ends by the beginning of November. However, most of the target crops used in the analysis performed within the pilot are harvested by the end of September. Therefore, final results were available by mid-October (M34). Pilot services included:

• Further adjustments of the crop detection algorithm, based on Trial 1 results, fine

tuning and comparisons based on the results obtained in Trial 1;

• Dialog with users / beneficiaries / stakeholders (APIA - the Romanian National Paying

Agency);



• Ingestion of new farm profile data (farmers declarations for 2019) and related Earth

Observation data (Sentinel-2, Landsat-8) acquired for the 2019 growing season;

• Further development of the Crop Monitoring Service, using the 2018 and 2019

farmers’ CAP declarations regarding crop types and areas covered;

• Accuracy level computation;

• Comparisons with Trial-1 results;

• Development and extension of the service at national-scale level, for the whole

territory of Romania, for both 2018 and 2019 agricultural seasons.

Step 3: Trial 2 End (M34)

The goals achieved were the final implementation of pilot activities (including delivery of results), final analytics, final pilot KPI measurements for Trial 2 and collection of feedback from pilot stakeholders.

Figure 140: Trial 2 timeline of Romanian AOI in pilot C2.1


M24-M30: Collecting historical S2 data (2017-2018) on the AOI (Verona Province, Italy) and

refinement of methodology (marker’s rules and crop type aggregation).

M31-M36: Running the prototype, analysis and validation of results.



Figure 141: Trial 2 timeline of Italian AOI in C2.1


Preparation of Trial Stage in Romania

Trial 2 started with fine tuning of algorithms. Various parameters have been tested in order

to implement a final version of the crop-type detection algorithm. The algorithms have been

modified in order to increase the number of analysed crop types and the overall accuracy of

the results.

The preparatory work for Trial 2 has focused on a preliminary collection and analysis of the

data needed for running the Romanian pilot.

Data collection activities included three main categories:

1. External data – farmers’ declarations: Pilot C2.1 CAP Support uses farmers’ declarations regarding crop types and areas covered as input data. These data are provided by the Romanian National Paying Agency and its regional offices. In 2019, the deadline for collecting the declarations was May 15th. After this deadline, an interval of approximately two weeks was needed to process the data before delivering it to TERRASIGNA. Therefore, the sample processing and ingestion of farmers’ declarations for 2019 started in June, and the processing and analysis stage ran over the next four months, from June until the beginning of October. For the 10,000 sqkm area of interest, more than 150,000 plots of different sizes were analyzed during the 2019 agricultural season. The analysis included parcels of over 0.3 ha, regardless of shape. However, the 10-meter spatial resolution made the narrower parcels difficult to label properly. Closely related groups of crops, which have synchronous phenological evolutions and a similar appearance, have been grouped into crop classes.

What is more, for the 2018 and 2019 agricultural years, Terrasigna extended its CAP

monitoring service and monitored the declarations for the entire agricultural area of

Romania. The total surveyed area exceeded 9 million ha, corresponding to more than

6 million plots of various sizes and shapes. The necessary Earth Observation (EO) data



required multiple Sentinel-2 scenes projected in 2 UTM zones. 21% of the total

number of plots within the test areas have surfaces below 1 ha.

The observed crop types maps included 32 crops, summing more than 98% of the total

declared area.

Figure 142: Structure of the data for the 10,000 sqkm area of interest

Figure 143: Agricultural land plots for the 10,000 sqkm area of interest. Data Source: Agency for Payments and Intervention in Agriculture (APIA), Romania



Figure 144: Romania - total declared area and number of plots registered for CAP support (2019). Data Source: Agency for Payments and Intervention in Agriculture (APIA), Romania

2. Optical Earth Observation (EO) data: Landsat-8 OLI and Sentinel-2 MSI - both Sentinel-

2A and Sentinel-2B have been downloaded for the area of interest, for a time interval

between March and September 2019. The 10-meter spatial resolution of the Sentinel-

2 data enables the survey of the smaller plots that in Romania represent a significant

number of CAP applications. The spectral resolution provides all the necessary

information (visible, NIR, SWIR) for observing the crop phenology. On a more general

note, TERRASIGNA’s technology uses both Copernicus Sentinel-2 and Landsat 8

imagery for a maximum of information availability and time series density compared

to using only Landsat 8 or Sentinel 2 images separately.



3. Field data: Field data have been collected according to a field tracking plan. Also, this

category includes different datasets provided by the Agency for Payments and

Intervention in Agriculture (APIA), based on the annual on-site compliance

verifications of the farmers that applied for subsidies. All the field data have been used

as independent validation data.

Preparation of Trial Stage in Italy

As a continuation of Trial 1, the first part of the preparatory work for Trial 2 focused on four main activities aimed at finalizing the preparation of the trial input data:

Satellite data collection:

• Sentinel-2 time series over the AOI: collection, cloud, snow and shadow masking and

vegetation index extraction (NDVI) of Sentinel-2 data acquired from May 2017 to

December 2018, related to the granule T32TPR. Temporal aggregation of NDVI data

over an interval of 20 days.
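The temporal aggregation of NDVI over 20-day intervals can be sketched as below. The report does not say which aggregation function was applied, so averaging is an assumption here (a maximum-value composite would be an equally plausible choice); the function name and sample dates are illustrative.

```python
from collections import defaultdict
from datetime import date


def aggregate_ndvi(observations, start, window_days=20):
    """Average NDVI observations into consecutive fixed-length windows.

    observations: iterable of (date, ndvi) pairs; start: first day of window 0.
    Returns {window_index: mean NDVI}; windows without data are absent.
    """
    buckets = defaultdict(list)
    for day, value in observations:
        offset = (day - start).days
        if offset < 0:
            continue  # ignore observations before the series start
        buckets[offset // window_days].append(value)
    return {idx: sum(vals) / len(vals) for idx, vals in sorted(buckets.items())}


# Made-up cloud-free NDVI values for one parcel.
obs = [
    (date(2017, 5, 1), 0.30),
    (date(2017, 5, 11), 0.40),
    (date(2017, 5, 21), 0.60),
    (date(2017, 6, 30), 0.80),
]
composites = aggregate_ndvi(obs, start=date(2017, 5, 1))
```

Windows with no cloud-free observation simply have no entry, which makes gaps in the series explicit for the later marker computation.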

LPIS 2016 data analysis and crop type aggregation:

• Analysis of crop types of the AOI and refinement of the LPIS macro classification:

aggregation in macro classes (23 families) and analysis of classes distribution.

• Selection of crop classes suitable for the automatic detection of anomalies and re-classification, based on the Sentinel-2 time series. The largest part (about 67%) of the AOI agricultural crop families belongs to 2 main groups: permanent grassland and arable land. The crop families of these 2 groups have been considered to test the algorithm for anomaly detection and re-classification at macro-class level. The anomaly analysis on the group of permanent crops has focused on the detection of explant (crop removal) cases.

Figure 145: LPIS crop families distribution



Figure 146: LPIS legend with crop type aggregation in macro classes

Validation data collection:

• Collection of a validation dataset, representative of the crop families distribution

(mainly permanent grassland, winter and summer arable land, temporary grassland),

from very high resolution imagery

Marker rules refinement:

• Refinement of markers rules and their computation: markers have been defined and

computed in relation to predefined scenarios in terms of selected macro crop type

reference periods and related thresholds during which agricultural practices must take

place (e.g. ploughing, presence\growth and harvesting)
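A marker of this kind can be thought of as a threshold test on the parcel's NDVI series within a reference period. The sketch below is a simplified illustration under that assumption; the rule structure, the periods and the threshold values are hypothetical and are not the rules used in the pilot.

```python
from datetime import date


def marker_detected(series, period, threshold, kind):
    """Check one marker on a parcel's NDVI time series.

    series: list of (date, ndvi); period: (first_day, last_day), inclusive.
    kind 'low' : at least one NDVI value at or below threshold in the period
                 (assumed proxy for ploughing or harvesting, i.e. bare soil).
    kind 'high': at least one NDVI value at or above threshold in the period
                 (assumed proxy for crop presence/growth).
    """
    first_day, last_day = period
    in_period = [v for d, v in series if first_day <= d <= last_day]
    if kind == "low":
        return any(v <= threshold for v in in_period)
    if kind == "high":
        return any(v >= threshold for v in in_period)
    raise ValueError(f"unknown marker kind: {kind}")


# Hypothetical rule set for a summer arable land macro class (illustrative only).
SUMMER_ARABLE_RULES = {
    "ploughing": ((date(2018, 3, 1), date(2018, 4, 30)), 0.25, "low"),
    "presence": ((date(2018, 6, 1), date(2018, 8, 15)), 0.60, "high"),
    "harvesting": ((date(2018, 9, 1), date(2018, 10, 31)), 0.30, "low"),
}


def compute_markers(series, rules):
    """Evaluate every marker of a macro class; returns {marker: bool}."""
    return {
        name: marker_detected(series, period, threshold, kind)
        for name, (period, threshold, kind) in rules.items()
    }


# Made-up NDVI series for one parcel.
parcel_series = [
    (date(2018, 3, 20), 0.18),
    (date(2018, 5, 20), 0.35),
    (date(2018, 7, 10), 0.82),
    (date(2018, 9, 25), 0.22),
]
markers = compute_markers(parcel_series, SUMMER_ARABLE_RULES)
```

A parcel with stable high NDVI throughout the year, such as permanent grassland, would fail the ploughing and harvesting markers of this rule set.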



Figure 147: Summary of markers periods for each macro class of crop type

All the listed markers are computed for each macro class. This is necessary in order to allow,

as much as possible, the assignment of a new macro class (re-classification) in case of anomaly

detection.

Trial 2 execution

Execution of Trial Stage in Romania

Trial 2 execution for the Romanian area of interest was entirely based on Terrasigna's toolbox for crop determination, consisting of a set of in-house developed algorithms for calculating CAP support-related products. Following an automatic learning process, the system becomes capable of recognizing several tens of crop types. The processing chain used during Trial 2 included the following activities:

A) Data Ingestion

Earth Observation data used within the framework of the CAP Support Pilot is derived from

two different sensors, which requires an effort to harmonize the spatial resolution and the

footprint of the native pixel grids. The ingestion process involves the following important

steps:

• Unzipping raw data (Sentinel-2 and Landsat 8 data, not atmospherically corrected);

• Harmonizing data covering the area of interest by using a common numeric format

and a tiles system;

• Automatic co-registration / georeferencing corrections;

• Cloud and shadow masking and extraction of masks of areas of interest.

B) Scene classification

• Use of statistical parameters for the crop classification (obtaining the native structure

of semantic clusters and applying them at tile level);

• Granting of semantic profile for the individual classified scenes (the pixels get the fuzzy

labels belonging to the crop class).



C) Time series analysis

• Building the time series of semantic profiles, at tile level;

• Defuzzification and application of a filter to reduce confusion between crop classes

D) Construction of graphical products and analytical data

• Concatenation of tile-level results;

• Delivery of single channel or RGB maps illustrating crop types, crop compliance,

classification confidence etc.;

• Extraction of numerical, quantitative syntheses based on the delivered products.
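The time series analysis step (C) can be illustrated with a toy defuzzification. Terrasigna's algorithms are not described in detail, so the approach below, which sums the fuzzy memberships of a pixel over all classified scenes and keeps the class with the largest total, is only an assumed stand-in; the class names and membership values are made up, and the confusion-reducing filter is not shown.

```python
def defuzzify(profiles):
    """Turn a time series of fuzzy labels into one crisp crop class.

    profiles: list of {crop_class: membership in [0, 1]}, one per classified
    scene. Memberships are summed over time and the class with the largest
    total wins. Returns (crop_class, confidence), where confidence is the
    winner's share of the summed memberships.
    """
    totals = {}
    for profile in profiles:
        for crop_class, membership in profile.items():
            totals[crop_class] = totals.get(crop_class, 0.0) + membership
    if not totals:
        return (None, 0.0)
    grand_total = sum(totals.values())
    winner = max(totals, key=totals.get)
    confidence = totals[winner] / grand_total if grand_total > 0 else 0.0
    return (winner, confidence)


# Made-up semantic profiles of one pixel across three scenes.
pixel_profiles = [
    {"maize": 0.6, "sunflower": 0.4},
    {"maize": 0.7, "sunflower": 0.3},
    {"maize": 0.5, "sunflower": 0.5},
]
label, confidence = defuzzify(pixel_profiles)
```

A confidence value of this kind is one way the classification confidence layers mentioned in step (D) could be derived.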

Execution of Trial Stage in Italy

The activities and services that have been set up in Trial 2 are briefly described below:

Anomalies detection

The markers computed in relation to predefined scenarios in terms of crop type, reference

periods and specific thresholds, during which agricultural practices must take place, have

been implemented in a decision model to verify parcel’s correct classification. The model has

been run for each parcel of the macro-classes considered as suitable for the automatic

detection of anomalies.

Below are some examples of parcels for which the original macro class has been confirmed through the automatic analysis based on the related markers, and of parcels that have been detected as anomalous.

Figure 148: Examples of verified (left) and not verified (right) autumn-winter arable land parcel



Figure 149: Examples of verified (left) and not verified (right) summer arable land parcel

Figure 150: Examples of verified (left) and not verified (right) Temporary Grassland parcel

Re-classification of LPIS anomalous parcels

Parcels detected as anomalous have been automatically re-classified by testing the validity of the markers of the other macro classes. Some examples are shown below.
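The verification and re-classification logic can be sketched as a small decision function. It assumes that a macro class is valid when all of its markers are satisfied and that a parcel is re-classified only when exactly one other class fits; both rules, and the marker names, are assumptions for illustration rather than the pilot's decision model.

```python
def verify_or_reclassify(declared_class, markers_by_class):
    """Confirm the declared macro class or propose another one.

    markers_by_class maps each macro class to {marker_name: bool}, i.e. the
    outcome of that class's markers on the parcel's time series.
    Returns (status, macro_class).
    """
    def is_valid(macro_class):
        results = markers_by_class.get(macro_class, {})
        return bool(results) and all(results.values())

    if is_valid(declared_class):
        return ("verified", declared_class)
    # Anomalous parcel: test the markers of the other macro classes.
    candidates = [
        c for c in markers_by_class if c != declared_class and is_valid(c)
    ]
    if len(candidates) == 1:
        return ("re-classified", candidates[0])
    return ("anomalous", None)  # none or several classes fit: leave unresolved


# Made-up marker outcomes for one parcel.
parcel_markers = {
    "autumn-winter arable land": {"ploughing": True, "presence": False, "harvesting": False},
    "summer arable land": {"ploughing": True, "presence": True, "harvesting": True},
    "permanent grassland": {"no_ploughing": False},
}
outcome = verify_or_reclassify("autumn-winter arable land", parcel_markers)
```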

Figure 151: Examples of not verified (left) Autumn-Winter arable land re-classified as Summer arable Land (right)



Figure 152: Examples of not verified (left) Summer arable land re-classified as Artefact (right) due to the presence of a new building

Trial 2 results

Trial 2 Results for the Romanian AOI

During Trial 2, Terrasigna's toolbox for crop determination and monitoring involved automatic

procedures for calculating the following products:

• Maps of the main types of crops, for an annual agricultural cycle completed;

• Intermediate maps with the main types of crops, during an annual agricultural cycle

(they may serve as early alarms for non-observance of the declared crop type);

• Early discrimination maps between winter and summer crops;

• Layers of additional information, with the degree of confidence for the crop type maps

delivered;

• Maps of the mismatches between the crop type declared by the farmer and the one

observed by the application;

• NDVI maps nationwide for a period of time, uncontaminated by clouds and cloud

shadows;

• Lists of parcels with problems, ordered by the surface affected by inconsistencies;

• National maps with RGB appearance averaged over a period of time, uncontaminated by clouds and shadows, obtained through the use of components C39.01 - Mosaic Cloud Free Background Service and C39.03 - S2 Clouds, Shadows and Snow Mask Tool.
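The mismatch products in the list above boil down to comparing the declared and the observed crop type per parcel and ranking the discrepancies by area. A minimal sketch, with hypothetical field names and made-up parcels:

```python
def list_mismatches(parcels):
    """Return parcels whose observed crop differs from the declared one,
    ordered by area, largest first."""
    mismatches = [p for p in parcels if p["declared"] != p["observed"]]
    return sorted(mismatches, key=lambda p: p["area_ha"], reverse=True)


# Made-up parcels; 'id', 'declared', 'observed' and 'area_ha' are
# illustrative field names.
parcels = [
    {"id": "P1", "declared": "maize", "observed": "maize", "area_ha": 12.0},
    {"id": "P2", "declared": "wheat", "observed": "rapeseed", "area_ha": 3.5},
    {"id": "P3", "declared": "sunflower", "observed": "maize", "area_ha": 20.1},
]
problem_list = list_mismatches(parcels)
```

Here the whole parcel area is used as the affected surface; a per-pixel product would instead count only the mismatching part of each parcel.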



Figure 153: Example of CAP Support Analysis - Trial 2 results

Figure 154: Trial 2 results. Observed crop type map (2019) for the area of interest in Southeastern Romania



Figure 155: Trial 2 results. Observed crop type map (2019) for the entire territory of Romania

VALIDATION

The validation stage consisted of two different types of activities:

• Independent validation activities, performed against very-high resolution imagery and

other data sources, mainly field-collected data;

• Validation using reference data provided by APIA - the Romanian National Paying

Agency.

Independent validation activities, performed against very-high resolution imagery and other data sources, took into account more than 5,800 plots, with a total surface of more than 77,000 ha. The validation work has shown 98.3% correct estimations for 8 crop categories: winter wheat, maize, sunflower, soybean, rapeseed, hayfields, peas and winter barley. Performance is higher for larger plots (more than 99% for all 8 crop categories for plots larger than 20 ha).



Figure 156: Results of the validation based on independent data consisting of very-high resolution imagery and field-collected data

Validation using reference data provided by APIA - the Romanian National Paying Agency

The Agency for Payments and Intervention in Agriculture (APIA) performs annual verifications

of the farmers that applied for subsidies using “classical” on-site compliance verifications, as

well as remote sensing-based checks. The reference plots used for the validation activities

cover the entire area of Romania eligible for CAP support and vary in terms of declared area.

For each plot, a dominant crop code (corresponding to Terrasigna’s crop codes system) was

assigned, provided it covers more than 40% of the plot’s area. Therefore, the validation

focused on the 32 predominant crops. Data have been then intersected with Terrasigna’s

observed crop type maps and finally joined with the initial set of declarations. This part of the

validation activities took into account more than 16,000 plots, with a total surface of more

than 60,000 ha and the results have been broken down for 7 plot classes:

• Very small plots: <0.5 ha, 0.5-1 ha;

• Small plots: 1-2 ha, 2-5 ha;

• Medium plots: 5-10 ha, 10-20 ha;

• Large plots: >20 ha.

Validation using reference data provided by APIA showed an accuracy of 97.28% for the 32 crops assessed, again with higher performance for larger plots (more than 99% for plots larger than 20 ha).
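The validation logic described above (a dominant crop assigned when it covers more than 40% of the plot, and accuracy broken down by plot size class) can be sketched as follows. The data layout and function names are assumptions, as is the handling of plots that fall exactly on a class boundary; the sample plots are made up.

```python
from bisect import bisect_right

# Boundaries (ha) between the 7 plot size classes listed above.
SIZE_BOUNDS_HA = [0.5, 1, 2, 5, 10, 20]
SIZE_LABELS = ["<0.5 ha", "0.5-1 ha", "1-2 ha", "2-5 ha",
               "5-10 ha", "10-20 ha", ">20 ha"]


def size_class(area_ha):
    """Size class of a plot; a boundary value goes to the upper class (assumed)."""
    return SIZE_LABELS[bisect_right(SIZE_BOUNDS_HA, area_ha)]


def dominant_crop(area_by_crop, plot_area_ha, min_share=0.40):
    """Crop covering the largest area, if it exceeds min_share of the plot."""
    crop, area = max(area_by_crop.items(), key=lambda kv: kv[1])
    return crop if area / plot_area_ha > min_share else None


def accuracy_by_size(plots):
    """Percentage of plots per size class where observed == reference."""
    totals, correct = {}, {}
    for p in plots:
        label = size_class(p["area_ha"])
        totals[label] = totals.get(label, 0) + 1
        if p["observed"] == p["reference"]:
            correct[label] = correct.get(label, 0) + 1
    return {label: 100.0 * correct.get(label, 0) / n for label, n in totals.items()}


# Made-up reference plots.
plots = [
    {"area_ha": 0.4, "reference": "wheat", "observed": "maize"},
    {"area_ha": 0.3, "reference": "wheat", "observed": "wheat"},
    {"area_ha": 25.0, "reference": "maize", "observed": "maize"},
    {"area_ha": 7.5, "reference": "maize", "observed": "maize"},
]
accuracy = accuracy_by_size(plots)
```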



Figure 157: Results of the validation based on reference data provided by APIA - the Romanian National Paying Agency

Results of Trial Stage for the Italian AOI

During Trial 2, e-GEOS generated the following products through automatic procedures:

• Maps of the anomalies between the crop type declared by the farmer and the one

observed by the application;

• Updated LPIS after the re-classification of the anomalies, for the macro crop classes

considered

Below are examples of the products for 2 areas of interest characterized by a different prevalent agricultural use: arable land and permanent grassland.

As expected, in the arable land area, due to the usual crop rotation practice, the largest part of the parcels changed their agricultural use between 2016 and 2018 (Figure 159). In most cases it is simply a change from autumn-winter to summer arable land or temporary grassland, and vice versa (Figure 159).



Figure 158: LPIS parcel classified according to verified parcels (in green), anomalous parcels (in red) and not analyzed parcels (in grey) - Arable land area

Figure 159: LPIS parcels type 2016 (left) and 2018 (right) after re-classification of anomalous parcels - Arable land area



This is confirmed also by the following pie charts that describe, for different crop families

(autumn-winter arable land, summer arable land and irrigated summer arable land), the

percentage of parcels having the crop family confirmed (percentage number in green) and

the percentages of parcels not confirmed, re-classified as other crop families.

Figure 160: 2016 LPIS Summer arable land parcels update to 2018

Figure 161: 2016 LPIS Winter-Autumn arable land parcels update to 2018



Irrigated summer arable land parcels (e.g. rice paddies) are mostly confirmed (few anomalies)

probably because these types of crop field, supported by irrigation systems, are not subject

to crop rotations.

Figure 162: 2016 LPIS Irrigated summer arable land parcels update to 2018

As for the permanent grassland area, as expected, the percentage of anomalies is significantly lower, because the agricultural use of these parcels is usually stable for several years (a grassland field, according to common regulations, is defined as permanent if it has not been ploughed for at least 5 years).

Figure 163: LPIS parcel classified according to verified parcels (in green), anomalous parcels (in red) and not analyzed parcels (in grey) - Permanent grassland area



Figure 164: 2016 LPIS Permanent grassland parcels update to 2018

Permanent crops have been analysed using markers designed to detect explant events. The percentage of explants is low (<1%). Below is an example of a vineyard parcel, present since 2012, explanted in March 2018.

*Google Earth

Figure 165: Example of NDVI temporal trends (2017-2018) of a vineyard parcel explanted on March 2018.

The accuracy of the methodology proposed for the LPIS anomalies detection and re-

classification has been assessed through a validation activity based on reference data

extracted from very high-resolution imagery.

About 1000 parcels, out of a total of 18,283, corresponding to 7.5% of the total hectares, have been considered for the accuracy assessment. The resulting validation dataset was composed of 4 main crop families (Autumn winter arable land, Summer arable land, Permanent grassland and Temporary grassland), reflecting the crop family distribution of the entire area. The other crop families were not represented by a statistically meaningful number of parcels; therefore, they have not been considered in the accuracy assessment.

Crop family | Parcel number | Accuracy (%)
Autumn winter arable land | 26 | 84.6
Summer arable land | 55 | 96.4
Permanent grassland | 973 | 96.5
Temporary grassland | 73 | 38.2

Figure 166: Results of the validation based on reference data extracted from very high-resolution imagery

The results show that the accuracy is quite high for permanent grassland and summer arable land (more than 95%) and high for winter arable land (85%). For the temporary grassland crop family, however, only about 40% of the farmers’ declarations are confirmed. The remaining 60%, which are mis-classified with respect to the farmers’ declarations, fall mainly into permanent grassland (33%) and require an additional refinement of the marker rules to improve the accuracy.



Component code and name | Description | Status | Component location
C07.01 - FedEO Gateway | Data Management (Collection, Curation, Access) – EO Collection Discovery, EO Product Discovery, Catalog, Metadata | Operational component, used in both Trial 1 and Trial 2 (for the Romanian AOI), in combination with FedEO Catalog and Data Manager. | Owner: Spacebel. Visibility: visible to project. The component is a Java application that can be made available as software or can be provided as a service hosted by Spacebel.
C07.03 - FedEO Catalog | The component was used in combination with the FedEO Gateway and Data Manager to set up a complete chain to retrieve and index Sentinel-1, Sentinel-2 or Landsat data and other data available through FedEO on a local processing platform. | Operational component, deployed on an application server, used in both Trial 1 and Trial 2 (for the Romanian AOI), in combination with FedEO Gateway and Data Manager. | Owner: Spacebel. Visibility: visible to project. This component is deployed on an application server (Tomcat) and can be accessed by any client application implementing the API.
C07.04 - Data Manager | The component will be used in combination with the FedEO Gateway and FedEO Catalog to set up a complete chain to retrieve and catalog Sentinel-1, Sentinel-2 or Landsat data (SciHub and CMR/USGS) and other data available through FedEO on a local processing platform. | Operational component, deployed on an application server, used in both Trial 1 and Trial 2 (for the Romanian AOI), in combination with FedEO Gateway and FedEO Catalog. | Owner: Spacebel. Visibility: visible to project. This component is a Java application (.war) deployed on an application server (GlassFish). It can be made available as software to be deployed in combination with the FedEO Gateway component (to access remote catalogs) and FedEO Catalog (to store metadata).
C39.01 - Mosaic Cloud Free Background Service | Data management and data curation: keeping an up-to-date collage (mosaic) of Sentinel-2 and Landsat-8 images, covering the area of interest (AOI) with the latest cloud-free satellite scenes. The fusion and harmonization between images are made only at RGB level, mainly for visual inspection, but also for other possible advanced processing. The whole process chain is independent and self-contained, based on cloud and shadow mask extraction, histogram matching procedures and, finally, a pixel-based analysis. Backgrounds will be updated automatically, soon after a new raw scene is available, during the whole Trial 2 period. | Operational component. Adjusted according to the needs of pilot C2.1 CAP Support (trials for the Romanian AOI). | The component is deployed on an application server and provides a remote sensing monitoring service developed in-house by Terrasigna. The service can run on a Linux server, delivering results via WMTS.
C39.02 - EO Crop Monitoring Service | Descriptive analytics – EO data processing. The component is able to assess the agricultural parcels from satellite data and farmers’ declarations in order to create a series of products such as Crop masks, Parcels used maps and Crop inadvertencies maps, based on SITS (Satellite Image Time Series). | Operational component. Adjusted according to the needs of pilot C2.1 CAP Support (trials for the Romanian AOI). | Service hosted by Terrasigna. The component is running on a Linux server.
C39.03 - S2 Clouds, Shadows and Snow Mask Tool | Data curation – EO data preprocessing. The tool produces Sentinel-2 Clouds, Shadows and Snow Masks, based only on raw data, improving the results of the genuine quality assessment band. The results are raster maps (GeoTiff) with 4 label codes: 0 – for no data, 1 – for uncontaminated/free pixels, 2 – for snow, 3 – for shadows and 4 – for clouds. | Operational component. Adjusted according to the needs of pilot C2.1 CAP Support (trials for the Romanian AOI). | A stand-alone executable file was prepared for the Linux environment and is deployed on Terrasigna’s servers.
C28.01 DataCube | Management and preprocessing of input EO data for their operational usage | The component is operational and is already used in Trial 1 | e-GEOS Server
EO processing | Processing chain for multitemporal indices computation from EO data | |
Markers engine | Computation of markers at agricultural parcel level | |
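As an illustration of how the label codes listed for the C39.03 mask rasters (0 to 4) could be consumed downstream, the following minimal sketch summarises the share of each label among valid pixels. It assumes the mask has already been read into nested lists of integers; the function name `mask_summary` and the example tile are hypothetical and not part of the Terrasigna tool.

```python
from collections import Counter

# Label codes as listed for the C39.03 mask rasters.
MASK_LABELS = {
    0: "no data",
    1: "clear",
    2: "snow",
    3: "shadow",
    4: "cloud",
}


def mask_summary(mask_rows):
    """Return the share of each label among valid (non no-data) pixels."""
    counts = Counter(code for row in mask_rows for code in row)
    unknown = set(counts) - set(MASK_LABELS)
    if unknown:
        raise ValueError(f"unexpected label codes: {sorted(unknown)}")
    valid = sum(n for code, n in counts.items() if code != 0)
    if valid == 0:
        return {}
    return {
        MASK_LABELS[code]: counts.get(code, 0) / valid
        for code in MASK_LABELS
        if code != 0
    }


# Hypothetical 3 x 4 mask tile, for illustration only.
example_mask = [
    [0, 1, 1, 4],
    [1, 1, 3, 4],
    [1, 2, 1, 1],
]
summary = mask_summary(example_mask)
```

A summary of this kind could, for example, be used to skip scenes whose clear-pixel share over a parcel is too low; any such cut-off would be a project-specific choice.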



Data Assets

Data type | Dataset | Dataset original source | Dataset location | Volume (GB) | Velocity (GB/year)
Optical satellite imagery | Landsat-8 OLI | NASA - USGS (U.S. Geological Survey), accessed via USGS Earth Explorer | Terrasigna’s servers (local storage) | Trial 2 (2019): approximately 35 – 40 GB; Trial 1 (2017+2018): approximately 60 GB; 2017 - 2019 (Trial 1 + Trial 2): approximately 100 GB | approximately 35 GB/year (the pilot area is covered by 3 Landsat-8 tiles - 181/29, 182/29, 183/29, with a 16-day revisit time; approximately 40 Landsat-8 scenes used for each agricultural season; each archive containing 185 km x 170 km tiles is about 900 MB)
Optical satellite imagery / Copernicus - Sentinel | Sentinel-2 MSI - both Sentinel-2A and Sentinel-2B | ESA (Copernicus Data), via Copernicus Open Access Hub | Terrasigna’s servers (local storage) | Trial 2 (2019): approximately 90 GB; Trial 1 (2017+2018): approximately 140 GB; 2017 - 2019 (Trial 1 + Trial 2): approximately 230 GB | approximately 85 GB/year considering the full constellation (Sentinel-2A + Sentinel-2B) (the pilot area is covered by 2 Sentinel-2 tiles - 35TMK and 35TNK, with a 5-day revisit time; more than 120 Sentinel-2 scenes used for each agricultural season; each archive containing 100 km x 100 km tiles is about 700 MB)
In-situ data | In-situ data | Field data | Terrasigna’s servers (local storage) | Trial 2: approximately 100 MB; Trial 1 (2017+2018): approximately 100 MB; 2017 - 2019 (Trial 1 + Trial 2): approximately 200 MB | approximately 100 MB/year
Farm profile data | Farm profile data - farmers' declarations regarding crop types and area covered, for each agricultural season | APIA (Agency for Payments and Intervention in Agriculture) - Romanian National Paying Agency | Terrasigna’s servers (local storage) | Trial 2: approximately 150 MB (farmers' declarations for 2019); Trial 1: approximately 150 MB (farmers' declarations for 2017 and 2018); 2017 - 2019 (Trial 1 + Trial 2): approximately 300 MB | approximately 150 MB/year
Optical satellite imagery / Copernicus - Sentinel | Sentinel-2 MSI - both Sentinel-2A and Sentinel-2B | ESA (Copernicus Data), via Copernicus Open Access Hub | e-GEOS servers (local storage) | 2017 - 2018: approximately 290 GB | approximately 170 GB/year considering the full constellation (Sentinel-2A + Sentinel-2B) and the raw data (.safe) and NDVI (the pilot area is covered by 1 Sentinel-2 tile - 32TPR, with a 5-day revisit time)
Vector data | LPIS Verona Province | Italian Paying Agency | e-GEOS servers (local storage) | 60 MB | 60 MB
Tables | Activity markers for agricultural fields | e-GEOS | e-GEOS servers (local storage) | 100 KB | 100 KB
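The yearly velocities reported for the Romanian AOI can be cross-checked from the scene counts and archive sizes given in the table. The sketch below is only a back-of-the-envelope calculation; the helper name `yearly_volume_gb` is illustrative, and it assumes 1 GB = 1000 MB and takes the "more than 120" Sentinel-2 scenes as exactly 120.

```python
def yearly_volume_gb(scenes_per_season, archive_size_mb):
    """Approximate yearly ingest volume in GB (assuming 1 GB = 1000 MB)."""
    return scenes_per_season * archive_size_mb / 1000.0


# Landsat-8: about 40 scenes per agricultural season, about 900 MB per archive.
landsat8_gb = yearly_volume_gb(40, 900)

# Sentinel-2: more than 120 scenes per season (120 used here), about 700 MB each.
sentinel2_gb = yearly_volume_gb(120, 700)

print(f"Landsat-8:  ~{landsat8_gb:.0f} GB/year")
print(f"Sentinel-2: ~{sentinel2_gb:.0f} GB/year")
```

Both estimates (about 36 GB/year and 84 GB/year) are consistent with the approximately 35 GB/year and 85 GB/year stated in the table.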



The highly automated, fuzzy-based approach developed by Terrasigna for the Romanian AOI within the C2.1 CAP Support pilot enables Big Data analytics on various crop indicators. It is reliable, saves cost and time, and allows a more complete and efficient management of EU subsidies, strongly enhancing the procedure for combating non-compliant behaviours. Terrasigna's methodology has undergone continuous development and improvement over the last 4 years. A further development of the Crop Monitoring Service can provide products tuned to fulfil the requirements of the 2015-20 EU Common Agricultural Policy. The technique is replicable at any scale and can be implemented for any other area of interest.

The methodology proposed by e-GEOS is a quick approach to detecting LPIS anomalies for some crop families, mainly arable land (winter and summer arable land) and temporary and permanent grassland. The performance and usefulness of the marker-based approach could be improved by using more refined marker rules, so that single crop types can be analysed, reducing the need to aggregate them into macro classes.

KPIs

KPI short name | KPI description | Goal description | Base value | Target value | Measured value | Unit of value
C2.1_1 (Values measured for the Italian AOI) | Percentage of LPIS area processed vs global LPIS coverage in terms of hectares | Agricultural territory coverage | N/A | 50% | 71% | %
C2.1_2 (Values measured for the Italian AOI) | Percentage of parcels > 0.5 hectares that are processed | Small parcel size capability | N/A | 80% | 98% | %
C2.1_3 (Values measured for the Italian AOI) | Anomalous parcels that are not re-classified | Re-classification performance | N/A | 10% | 2% | %
C2.1_4 (Values measured for the Romanian AOI) | Processed surface | Agricultural territory coverage | N/A | 10 000 | Trial 1: 10 000 km2; Trial 2: 130 000 km2 (whole country) | sqkm
C2.1_5 (Values measured for the Romanian AOI) | Number of crop types addressed | Diversity. Ability to recognize different crop cultivation patterns | N/A | 5 | Trial 1: 5 crop families; Trial 2: 32 crop types | crop types



15 Pilot 13 [C2.2] CAP Support (Greece)

Pilot overview

NEUROPUBLIC and GAIA EPICHEIREIN launched a highly ambitious pilot in Northern Greece, in an area covering 50,000 ha, to evaluate a set of EO-based services designed to support specific needs of the CAP value chain stakeholders. The pilot services rely on innovative tools and complementary technologies that sustain the interconnection with IoT infrastructures and EO platforms, the collection and ingestion of spatiotemporal data, multidimensional deep data exploration and modelling, and the provision of meaningful insights, thus supporting the simplification and improving the effectiveness of the CAP. The pilot activities aim at providing EO-based products and services designed to support key business processes, including the farmer's decision-making during the submission of the aid application, leading more specifically to improved “greening” compliance. The ambition of the pilot is to deal effectively with CAP demands for agricultural crop type identification, systematic observation, tracking and assessment of eligibility conditions over a period of time.

The pilot activities are fully aligned with the main concepts of the new agricultural monitoring approach, which is expected to lead to fewer controls, facilitate and expand the adoption of technology by farmer communities, promote the penetration of EO deeper into the CAP line of business, and raise the awareness of farmers, agronomists, agricultural advisors, farmer cooperatives and organizations (e.g. groups of producers) and national paying agencies (e.g. OPEKEPE) of how new technological tools could facilitate the crop declaration process. The pilot focuses mainly on annual crops with an important footprint in the Greek agricultural sector (rice, wheat, cotton, maize, etc.). The main stakeholders of the pilot activities are the farmers from the engaged agricultural cooperatives in the pilot area and GAIA EPICHEIREIN, which has a supporting role in the farmers’ declaration process. CSEM and FRAUNHOFER are also involved in the pilot, contributing their long-standing expertise to the technological development activities.


The pilot completed its first round of trials during Trial 1 in the greater area of Thessaloniki, Greece. It effectively demonstrated how Big Data enabled technologies and EO-based services can support specific needs of the CAP value chain stakeholders, and more specifically the systematic and more automated assessment of eligibility conditions for “greening” aid declarations. The lessons learnt from Trial 1 are valuable and critical for delivering even more accurate solutions. Certain technical considerations were reported with respect to the methodology followed, especially its applicability to challenging datasets comprising data that are new and unseen for the trained crop models. A systematic and exhaustive data screening activity, carried out in parallel, identified that inter-year changes in crop cultivation periods (begin, end, peak, length) should be considered carefully. These inter-year changes derive mostly from climatic changes, regulatory and market conditions, regional characteristics, etc. The pilot partners are making a major effort to exploit new data, features and classification methodologies that take all of the above into account and deliver even better pilot results. Moreover, in Trial 2 new visualization tools will be explored that can handle nice-to-have features such as intra-parcel crop classification results (pixel level) and validation of the classification outcomes. To this end, FRAUNHOFER will expand the suite of tools it provides for the DataBio pilots (until now for pilots A1.1, B1.2 and C1.1 of WP1) to cover the specific needs of the C2.2 pilot.

Figure 167: Geographical distribution of the parcels that take part in the pilot C2.2 activities


Trial 2 timeline

The following roadmap applies for the pilot activities:

Figure 168: C2.2 pilot timeline




The following work was conducted by NP as part of the preparatory work for Trial 2:

• As the requirements for sensors deployed in the field differ between pilot sites, several adaptations to C13.03 became necessary, in particular to the way data are represented for both cloud-based storage and Gaiatron station configuration. More specifically, all relational and EAV (Entity-Attribute-Value) data representations were converted to the more flexible and scalable JSON format, which performs better in a dynamic IoT measuring environment. JSON has gradually become the standard format for collecting and storing semi-structured datasets that originate from IoT devices. Modelling IoT data streams in JSON allows further processing, parsing, integration and sharing of data collections, and supports system interoperability through the adoption of well-established and favoured linked-data approaches (JSON-LD).

• Lessons learnt from Trial 1 led to the advancement of the C13.02 GAIABus DataSmart Machine Learning Subcomponent in two ways:

1. Methodologically, deep convolutional neural networks were explored, which have proven to outperform classical machine learning classification methods. Crop classification is performed into “super” classes or major crop types. The model predicts the crop type of each tested parcel together with a probability. The eligibility status is visualized at parcel level by a traffic light system.

2. In terms of EO data, Sentinel-2 derived NDVI measurements from multiple years (2016, 2017 and 2018) are available for the region of interest, offering a strong multi-year data record for building EO-based crop models that capture the inter-year trends and changes that hindered crop classification accuracy in Trial 1.
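To make the move from EAV rows to JSON documents described in the first bullet more concrete, the following minimal sketch groups (entity, attribute, value) rows into one JSON document per reading. The field names (`station`, `timestamp`, `air_temperature_c`, `soil_moisture_pct`) and the function name are hypothetical illustrations, not the actual C13.03 schema.

```python
import json
from collections import defaultdict


def eav_to_json_documents(eav_rows):
    """Group (entity, attribute, value) rows into one JSON document per entity."""
    docs = defaultdict(dict)
    for entity, attribute, value in eav_rows:
        docs[entity][attribute] = value
    return [
        json.dumps({"id": entity, **attrs}, sort_keys=True)
        for entity, attrs in docs.items()
    ]


# Hypothetical EAV rows: each reading only carries the attributes it measured.
rows = [
    ("reading-1", "station", "gaiatron-demo"),
    ("reading-1", "timestamp", "2019-05-01T10:00:00Z"),
    ("reading-1", "air_temperature_c", 18.4),
    ("reading-2", "station", "gaiatron-demo"),
    ("reading-2", "timestamp", "2019-05-01T10:10:00Z"),
    ("reading-2", "soil_moisture_pct", 23.1),
]
documents = eav_to_json_documents(rows)
```

Because each document lists only the attributes that were actually measured, stations with different sensor sets can share one collection without schema changes, which is the flexibility the bullet refers to.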

During the preparatory phase of Trial 2, CSEM continued to improve the accuracy of its C31.01 Neural Network Suite for specific crop classes, which can be considered a baseline for future crop modelling activities. As a first step, a structured method for digitizing expert knowledge in a data-driven architecture was provided. A pipeline was developed that significantly reduces the complexity of creating models by removing the need for hand-crafted filtering, making it a cost-effective option for bringing neural network models to the market. It was found to be important to verify the reliability of the data with minimum supervision and then use the clean data to train the network for the classification problem at hand. These efforts led to an overall classification accuracy of over 92% for Maize, Wheat and Legumes. Further investigation of particular taxonomical varieties found that training a crop model with one variety and testing it with other varieties performed well, except for the crop type Legumes, which shows large intra-class variability. Creating a model with only one variety has the potential to simplify the creation of models in the future. As the methodology is pixel-based, pixel probabilities are aggregated into a parcel-level binary result that fits the CAP Support use case exactly. In particular, a parcel is assigned a particular crop type label (classified) if the majority of its pixels have a probability of belonging to the class greater than a given threshold (i.e. 0.5).
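The pixel-to-parcel aggregation rule described above can be sketched as follows. This is a minimal illustration rather than CSEM's implementation: the function name is hypothetical, and treating an exact 50/50 split as "not classified" is an assumption, since the text only speaks of a majority.

```python
def classify_parcel(pixel_probabilities, threshold=0.5):
    """Return True if the majority of pixels exceed the probability threshold.

    pixel_probabilities: per-pixel probabilities of belonging to the crop class.
    A tie (exactly half of the pixels above the threshold) is NOT a majority here,
    which is an assumption made for this sketch.
    """
    if not pixel_probabilities:
        raise ValueError("parcel has no pixels")
    above = sum(1 for p in pixel_probabilities if p > threshold)
    return above > len(pixel_probabilities) / 2


# Hypothetical parcels: 3 of 5 pixels above 0.5, then 1 of 4 pixels above 0.5.
print(classify_parcel([0.9, 0.8, 0.6, 0.4, 0.2]))
print(classify_parcel([0.9, 0.3, 0.2, 0.1]))
```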











at the UI-Layer. Following this approach provides more flexibility and eventually allows

thinking about a platform which enables the users to build views for custom analytic tasks



Insurance.

The implementation of Trial 2 focuses primarily on the integration of external services. In this scenario, a web application was developed to enable professional users to perform crop type classification on demand using the latest or historic satellite images. A variety of visual analytics tools are included to allow efficient exploration of the available data. The classification functionality is provided by external services, which in turn exploit methods from the domain of machine learning (ML). Services and data sources are integrated through well-defined RESTful interfaces.

Trial 2 execution


activities:

By M26, the DataBio platform v2 for the pilot is fully operational and offers a valuable error

checking tool for assessing “greening” compliance.



hosted in NEUROPUBLIC’s N. Greece offices with the participation of other DataBio partners

involved in the WP1 pilots led by NEUROPUBLIC. Furthermore, the generalization and simple

adaptation to other scenarios was discussed intensively.

By M32, a first instance of the aforementioned analytics platform has been finalized and deployed. The ML services are available, providing a proof of concept for their use in CAP Support scenarios. FRAUNHOFER was responsible for the development of the UI, which integrates a map, pixel heat maps from the different classifiers and information visualization capabilities (Figure 169, Figure 170). A system developed by CSEM for the management of Machine Learning models was used to facilitate simple and retraceable model management. RESTful services, combined with security features in the form of JWT tokens and HTTPS encryption, were implemented and integrated into the service. The service has also been containerized to allow simple deployment. It enables communication with FRAUNHOFER’s GeoRocket component and UI for the on-demand classification of crop types at both pixel and parcel level.
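For readers unfamiliar with JWT-protected REST services, the sketch below shows how a signed token can be issued and verified using only the Python standard library. It is a generic illustration, not the CSEM service: the HS256 algorithm, the claim names and the shared secret are assumptions, and a production service would normally rely on a maintained JWT library and send the token in an `Authorization: Bearer` header over HTTPS.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(claims: dict, secret: bytes) -> str:
    """Create a compact HS256-signed token (algorithm choice is an assumption)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(token: str, secret: bytes, now=None) -> dict:
    """Check signature and expiry; return the claims or raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if "exp" in claims and now >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

A client would obtain such a token once and attach it to each classification request; the server only needs the shared secret to verify it, so no session state has to be kept between requests.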

Figure 169: FRAUNHOFER's UI screenshot colour coding different crop types

Figure 170: FRAUNHOFER's UI screenshot that integrates CSEM’s classification results into pixel heat maps

By M34, the assessment of “greening” compliance begins for the current year’s (2019) aid applications. Seven (7) crop types have been modelled and tested by the C13.02 GAIABus DataSmart Machine Learning Subcomponent: cereals, cotton, maize, tobacco, rapeseed, rice and sunflower, which correspond to the area of interest (Thessaloniki region, Greece). Treated as a multi-class classification problem, the performance of the trained crop models on the 2019 test data is shown in the following table and confusion matrix, respectively:



Table 8: Crop classification results

Crop | Precision | Recall | F1-measure | Accuracy
Maize | 0.994 | 0.932 | 0.945 | 0.986
Cotton | 0.990 | 0.954 | 0.961 | 0.982
Rapeseed | 1.000 | 0.713 | 0.833 | 0.997
Sunflower | 0.985 | 0.823 | 0.818 | 0.974
Tobacco | 0.999 | 0.712 | 0.762 | 0.996
Rice | 0.999 | 0.994 | 0.993 | 0.999
Cereals | 0.952 | 0.967 | 0.958 | 0.959

Figure 171: Normalized crop classification confusion matrix (horizontal axis corresponds to the true label, whereas the vertical one to the predicted label)



The classification results show that some crop types (e.g. Maize, Cotton, Rice, Cereals) can be identified more easily using EO-based deep learning methodologies. Other crops, such as rapeseed and tobacco, are more challenging and are sometimes confused with other crops (e.g. cereals) that exhibit similar annual cycles of vegetation growth and decline (as “seen” by NDVI measurements).

For the assessment of “greening” compliance, the trained models are the backbone of the methodology. As in Trial 1, the farmers who can benefit from the methodology are those holding parcels of >10 ha that are eligible for checks on the greening requirements related to crop diversification. A traffic light system is employed to inform farmers that there could be a problem with their declarations:

a) if the confidence level of the classification result is >85% and the declared crop type of the farmer was confirmed by the classification -> traffic light should be green

b) if the confidence level of the classification result is <85% and the declared crop type of the farmer was confirmed by the classification -> traffic light should be yellow

c) if the declared crop type of the farmer was not confirmed by the classification -> traffic light should be red

With this approach, the farmer is better protected with regard to receiving the payment, as robust and reliable feedback is provided to him/her.
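The three traffic light rules can be expressed as a small decision function. The sketch below is illustrative only: the function and parameter names are hypothetical, and because the rules do not say what happens at exactly 85% confidence, that boundary case is assumed here to be yellow.

```python
def traffic_light(declared_crop, detected_crop, confidence, threshold=0.85):
    """Map a classification result to a traffic light colour.

    confidence is a fraction between 0 and 1. A confidence exactly equal to the
    threshold is treated as yellow, which is an assumption of this sketch.
    """
    if declared_crop != detected_crop:
        return "red"      # rule c: declaration not confirmed
    if confidence > threshold:
        return "green"    # rule a: confirmed with high confidence
    return "yellow"       # rule b: confirmed, but with lower confidence


# Hypothetical results for three parcels.
print(traffic_light("Cereals", "Cereals", 0.93))
print(traffic_light("Maize", "Maize", 0.70))
print(traffic_light("Cotton", "Cereals", 0.95))
```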

The following example illustrates the CAP support methodology and the exploitation of the trained models. The farmer holds a total arable area of more than 10 ha, so the primary greening requirement is met. In terms of crop diversification, the main crop type is Cereals, with 6.88 ha in total. However, some issues have been identified and marked using the aforementioned traffic light system.



Table 9: Greening eligibility assessment using a traffic light system.

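AP ID | Crop group: Declared | Crop group: Detected | DataBio Assessment: Status | DataBio Assessment: Categorization | Traffic Light | Area Determined (ha)
001 | Cereals | Cereals | Assessed | Compliant | green | 2.08
002 | Cereals | Cereals | Assessed | Compliant | green | 1.67
003 | Maize | Cereals | Assessed | Not compliant | red | 1.1
004 | Maize | Maize | Assessed | Insufficient evidence | yellow | 1.46
005 | Cereals | Cereals | Assessed | Insufficient evidence | yellow | 1.25
006 | Cotton | Cotton | Assessed | Compliant | green | 0.82
007 | Cotton | Cereals | Assessed | Not compliant | red | 0.73
008 | Cereals | Cereals | Assessed | Compliant | green | 1.88
Total | | | | | | 10.99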

The farmer is notified of the issues, which raises awareness and allows follow-up activities to be taken. The red indications are especially important: Cereals, the main crop, seems to cover more than 75% of the cultivated land, which puts the farmer's eligibility for greening compliance at risk (the main crop may not cover more than 75% of the total arable land).
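The crop diversification check in this example can be reproduced with simple arithmetic on the figures quoted in the text and in Table 9: 6.88 ha of Cereals out of 10.99 ha in total, plus the two "Not compliant" parcels (1.1 ha and 0.73 ha) that were detected as Cereals. The sketch below is illustrative; the function and constant names are not part of the pilot software.

```python
MAX_MAIN_CROP_SHARE = 0.75  # the main crop may not cover more than 75% of the arable land


def main_crop_share(main_crop_area_ha, total_arable_area_ha):
    """Share of the total arable area covered by the main crop."""
    return main_crop_area_ha / total_arable_area_ha


total_area_ha = 10.99

# Main crop share before the two red parcels are counted as Cereals.
declared_share = main_crop_share(6.88, total_area_ha)

# Main crop share once the two red parcels (detected as Cereals) are added.
detected_share = main_crop_share(6.88 + 1.10 + 0.73, total_area_ha)

print(f"before red parcels: {declared_share:.1%}")   # about 62.6%
print(f"after red parcels:  {detected_share:.1%}")   # about 79.3%
print("at risk" if detected_share > MAX_MAIN_CROP_SHARE else "within limit")
```

On these figures Cereals stays within the 75% limit at 6.88 ha but exceeds it once the two red parcels are counted, which is why the red indications matter for this farmer.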



Figure 172: Greening eligibility assessment using a traffic light system (map projection example)

The final KPI measurements and feedback were collected through regular discussions with GAIA EPICHEIREIN and its Thessaloniki FSC.

Trial 2 results


expected TRL. The key pilot stakeholders (i.e. the farmers and GAIA EPICHEIREIN that has a

supporting role in the crop type declaration process), continued (for a second year) to benefit

from the EO-based geospatial data analytics, thus, promoting the simplification and

improving the effectiveness of CAP. The pilot is fully aligned with the main concepts of the

new agricultural monitoring system and adopts a technology-driven traffic light methodology.

The traffic light system has proven to be a powerful tool in assessing the “greening” eligibility

conditions and informing the farmers about the assessment outcomes, thus, leading to fewer

errors and increased funds absorption.





Component code and name | Description | Status | Component location
C13.01 Neurocode (NP) | Neurocode allows the creation of the main pilot UIs to be used by the end-users (GAIA EPICHEIREIN), offering insights regarding greening compliance | deployed | NP Servers
C13.02 GAIABus DataSmart Machine Learning Subcomponent (NP) | Supports EO data preparation and handling functionalities. Supports multi-temporal object-based monitoring and modelling and crop type identification | deployed | NP Servers
C13.03 GAIABus DataSmart Real-time streaming Subcomponent (NP) | Real-time data stream monitoring for NP’s Gaiatrons Infrastructure installed in the pilot sites. Real-time validation of data. Real-time parsing and cross-checking | deployed | NP Servers
C31.01 Neural Network Suite (CSEM) | Delivery of an accurate machine learning crop identification system to be used for the detection of crop discrepancies | deployed | CSEM’s Servers
C04.02 – C04.04 Georocket, Geotoolbox, SmartVis3D (Fraunhofer) | Back-end system for Big Data preparation, handling fast querying and spatial aggregations (data courtesy of NP). Front-end application for interactive data visualization and analytics | deployed | Fraunhofer Servers

Data Assets

Data type | Dataset | Dataset original source | Dataset location | Volume (GB) | Velocity (GB/year)
EO products in raster format and metadata | Dataset comprised of remote sensing data from the Sentinel-2 optical products (2 tiles) | ESA (Copernicus Data) | GAIA Cloud (NP’s servers) | >2600 | >850
Sensor measurements (numerical data) and metadata (timestamps, sensor id, etc.) | Gaiasense field. Dataset composed of measurements from NP’s telemetric IoT agro-climate stations called GAIATrons for the pilot area. | NEUROPUBLIC | GAIA Cloud (NP’s servers) | Several GBs | Configurable collection and transmission rates for all GAIATrons. >20 GAIAtrons fully operational in the area, each collecting >30 MB of data per year with the current configuration (measurements every 10 minutes)
Parcel Geometries (WKT), alphanumeric parcel-related data and metadata (e.g. timestamps) | Dataset comprised of agricultural parcel positions expressed in vectors along with several attributes and extracted multi-temporal vegetation indices associated with them. | NEUROPUBLIC | GAIA Cloud (NP’s servers) | Several GBs | 1 GB/year. The update frequency depends on the velocity of the incoming EO data streams and the assignment of vegetation indices statistics to each parcel. Currently, new Sentinel-2 products are available approximately every 5 days and the dataset is updated at regular intervals
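The sensor data velocity in the table can be put into perspective with a short calculation. The sketch below takes the stated lower bounds (20 stations, 30 MB per station per year, one measurement every 10 minutes) as exact values and assumes 1 MB = 1,000,000 bytes; the resulting per-reading size is therefore only an order-of-magnitude estimate, and the function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def readings_per_year(interval_minutes):
    """Number of transmissions per station per year at a fixed interval."""
    return MINUTES_PER_YEAR // interval_minutes


def fleet_volume_mb(stations, mb_per_station_per_year):
    """Total yearly volume for all stations, in MB."""
    return stations * mb_per_station_per_year


per_station_readings = readings_per_year(10)
fleet_mb = fleet_volume_mb(20, 30)
bytes_per_reading = 30 * 1_000_000 / per_station_readings

print(f"{per_station_readings} readings per station per year")
print(f"at least {fleet_mb} MB per year for the whole fleet")
print(f"roughly {bytes_per_reading:.0f} bytes per reading")
```

Under these assumptions the whole station network produces well under 1 GB per year, which is small next to the more than 850 GB/year of Sentinel-2 products listed in the first row.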



In the context of DataBio, NP has initiated a series of CAP Support activities to provide supporting tools and services in line with the requirements of the EC’s new agricultural monitoring approach. This effort is expected to continue in the coming years (contributing to the sustainability of the project's outcomes) as part of another high-profile research project, H2020 NIVA (https://www.niva4cap.eu/), in which NP is a key partner and collaborates closely with the Greek paying agency (OPEKEPE). This will allow the DataBio-enhanced services to be evolved and further validated, so that they progressively become part of the suite of CAP Support tools offered by GAIA EPICHEIREIN for aiding the crop declaration process.

From an implementation point of view, the quality of the services provided by NP greatly benefited from the collaboration with leading technological partners such as CSEM and FRAUNHOFER, which specialize in the analysis of Big Data. Moreover, feedback from the end users and lessons learnt from the execution of the DataBio pilot significantly fine-tuned, and will continue to shape, the suite of dedicated tools and services, thus facilitating their penetration into the CAP Support line of business.

KPIs

KPI short name | KPI description | Base value | Target value | Measured value | Unit of value | Comment
C2.2_1 | Decrease in false crop type declarations following the supporting services vs what would be expected based on historical data | 10 | 8 | 9.4 (of the initial declarations were identified as potentially problematic) | % | 9.4% of the initial farmer declarations exhibited potential errors based on the methodology followed. The farmers were notified and received follow-up information. The advisory services offered allow farmers holding parcels of >10 ha (a prerequisite for the greening aid application) to comply with the greening requirements for crop diversification, thus favouring a further reduction in the percentage of erroneous declarations that threaten funds absorption.
C2.2_2 | Accuracy in crop type identification | No prior information | >80 | 98.5 | % | The overall accuracy of the crop classification methodology used in the pilot reached 98.5%. Respectively, precision reached 99.1%, recall 94.6% and F1-measure 94.7%. Some crop types seem to be more easily identifiable (maize, cotton, rice, cereals), whereas others appear to be more challenging (rapeseed and tobacco).
C2.2_3 | Number of crop types covered | Initially no crops were being covered by the system | 7 | 7 crop types supported in the greater region of Thessaloniki, Greece | plain number |



16 Conclusion

Document D1.3 describes the final status of the agriculture pilots and concludes Trial 2, including the performance indicators of the pilots. The final status of the pilots also includes the utilisation of Big Data datasets and the implementation status of the DataBio components/services defined in WP4 and WP5.

This document shows how all the individual pilots have reached their defined level of maturity despite the different initial states of services and technologies and the different levels of service integration before the start of Trial 1. It also highlights how, despite different territories and different thematic scopes, the pilots have succeeded in developing a common approach to problem solving and a common focus on the use of Big Data Technology and DataBio components.

DataBio results in agriculture are already actively used in new projects, such as NIVA [16] or DEMETER [17], both of which work on the modernisation of European agriculture.

[16] The NIVA project (https://www.niva4cap.eu/project), developed by a consortium of 27 partners including nine CAP (Common Agricultural Policy) paying agencies, is a response to the current discussion on the modernization of the CAP. In this context, one of the main objectives of NIVA is to spread and obtain the maximum benefit from the ongoing digitization of the agricultural sector, in order to reduce administrative burdens and to improve the sustainability and competitiveness of the sector. Through this data-driven digitization process, new potential for data use and reuse will emerge, and with it improved accessibility of CAP data as Big Data sources. These data sources have proved to be a powerful tool for monitoring the societal benefits of agriculture for rural development or climate change mitigation; improved access to them will therefore support the current process and define new and promising ways of use.

[17] The objective of the DEMETER project (http://h2020-demeter.eu/) is to support farmers and cooperatives in their decisions regarding the control of their production and in managing Farming Information Systems and associated technologies more efficiently. Hence, fully aligned with DataBio results, a key objective of DEMETER is to demonstrate the impact of digital innovation and interoperable platforms, allowing farmers to combine more tools from different suppliers or providers.
