The Big Data Platform Initiative of the EC Joint Research Centre
European Commission, Joint Research Centre (JRC)
Directorate I Competences, Unit I.3 Text and Data Mining
EO&SS@BigData Project
Data analytics workshop for official statistics (daWos)
Amsterdam, 10/09/2018
URL: https://cidportal.jrc.ec.europa.eu
Contact: [email protected]
• Explosion of digital data sources led to the big data paradigm (Volume, Velocity, and Variety of data streams).
• Earth Observation (EO) is entering the big data era thanks to the Copernicus Sentinel satellites (full, free, and open data).
• JRC task force recommended in late 2014 to start a big data pilot project on EO and Social Sensing.
• Initial state: fragmented approach hampering collaborative working and knowledge sharing.
• Project start: January 2015.
Policy context
• Regulation (EU) No 377/2014 of the European Parliament and of the Council of 3 April 2014 establishing the Copernicus Programme and repealing Regulation (EU) No 911/2010. [JRC is also mentioned in the proposed new space programme regulation, due to enter into force by 1.1.2021.]
• Communication to the Commission on Data, information and knowledge management at the European Commission (C(2016) 6626 final).
• Communication from the Commission on the European Cloud Initiative (COM(2016) 178 final): the Commission and participating Member States should develop and deploy a large-scale European HPC, data and network infrastructure, including the establishment of a European Big Data centre, e.g. hosted by JRC for multidisciplinary data but focused on INSPIRE/GEOSS/Copernicus spatial data [COM(2016) 178 final].
• Communication from the Commission on Artificial Intelligence for Europe (COM(2018) 237 final).
• 2015: survey of user needs and proposal of solutions addressing their needs; endorsement of the concept of JRC Earth Observation Data and Processing Platform (JEODPP)
• 2016: procurement of hardware and first batch processing service with massive runs
• 2017: release of interactive visualisation/analysis and deployment of remote desktop services
• 2018: multi-petabyte extension, development of machine learning capabilities, JIPlib release, user base in continuous expansion
Big geospatial data for policy
Exploit data volume, velocity, and variety to generate policy relevant information:
• Using FAIR data principles (findable, accessible, interoperable, reusable)
• With data mining competence in a shared and collaborative environment
• Relying on reproducible workflows
[Figure labels: Big data (Volume, Velocity, Variety); Data; Policy relevant information; Indicators; Decisions. Data sources: Earth Observation, in situ, crowd sourcing, social sensing, text data, web scraping, … Policy inputs: directives, legislation, communications, … Thematic areas: atmosphere, marine, land, climate, emergency, security.]
JRC Big Data Platform: Conceptual representation
Infrastructure
Based on commodity hardware and open-source software stack:
• Storage
  • CERN EOS distributed file system
  • Currently 5 PiB net capacity
  • 2 more PiB net for development/testing
• Processing servers (batch processing)
  • 1,400 cores over 35 nodes
  • 3 GPU servers
  • Extensions, including further GPU servers, in late 2018
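The capacity figures above can be put in more familiar terms with a little arithmetic. This is a minimal sketch, not part of the original slides. It assumes the cores are spread evenly over the nodes (the slide gives only totals) and converts the binary unit PiB to the decimal unit PB. The variable and function names are illustrative.

```python
# Back-of-the-envelope figures derived from the slide's totals.
PIB_BYTES = 2**50   # one pebibyte (binary prefix)
PB_BYTES = 10**15   # one petabyte (decimal prefix)


def pib_to_pb(pib: float) -> float:
    """Convert pebibytes to decimal petabytes."""
    return pib * PIB_BYTES / PB_BYTES


production_pib = 5   # net storage capacity stated on the slide
dev_test_pib = 2     # additional net capacity for development/testing
cores = 1400
nodes = 35

# Assumption: an even spread of cores, so this is an average per node.
avg_cores_per_node = cores / nodes
total_pib = production_pib + dev_test_pib

print(f"Average cores per node: {avg_cores_per_node:.0f}")
print(f"Production storage: {pib_to_pb(production_pib):.2f} PB")
print(f"Total storage incl. dev/test: {pib_to_pb(total_pib):.2f} PB")
```

The PiB/PB distinction matters at this scale: the decimal figure is about 12.6% larger than the binary one for the same number of bytes.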
JEODPP in
As of September 2017
As of September 2018
Main software stack
Source: Soille et al., Future Generation Computer Systems, 2017. DOI: 10.1016/j.future.2017.11.007 (in press)