CERN openlab V preparation, Data Analytics (for research) Many contributors, especially EN-ICE and IT-DB

Dec 25, 2015


Calvin Payne
Transcript
Page 1:

CERN openlab V preparation, Data Analytics (for research)

Many contributors, especially EN-ICE and IT-DB

Page 2:

Challenges

Online triggers and DAQ

Offline simulation and processing

Data storage architectures

Resource management and provisioning

Data analytics

Networks and connectivity

Page 3:

Outline

Use cases and challenges

Technology

Analytics as a Service (AaaS)

Education

Page 4:

Use case: Quench Protection System

Critical system for LHC operation
• Major upgrade for LHC Run 2 (2015-2018)

High-throughput data storage requirement
• Constant load of 150k changes/s from 100k signals

Whole data set is transferred to long-term storage DB
• Query + Filter + Insertion

Analysis performed on both DBs

[Diagram labels: RDB Archive (16 projects around LHC); LHC Logging (long-term storage); Backup]

Credit: Kacper Szkudlarek EN-ICE
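
The Query + Filter + Insertion transfer could be sketched roughly as below. This is only an illustration, not the actual QPS or LHC Logging implementation: the table names (`archive`, `logging`), the columns, the signal names and the deadband filter are all assumptions made for the example, since the slide does not say which filter is applied.

```python
import sqlite3

def transfer(src, dst, deadband=0.0):
    """Copy rows from a short-term archive to long-term storage.

    Hypothetical filter: keep a sample only if it differs from the last
    kept sample of the same signal by more than `deadband`.
    """
    last_kept = {}  # signal name -> last value written to long-term storage
    rows = src.execute(
        "SELECT signal, ts, value FROM archive ORDER BY signal, ts"
    )  # query
    kept = []
    for signal, ts, value in rows:
        previous = last_kept.get(signal)
        if previous is None or abs(value - previous) > deadband:  # filter
            kept.append((signal, ts, value))
            last_kept[signal] = value
    dst.executemany("INSERT INTO logging VALUES (?, ?, ?)", kept)  # insertion
    dst.commit()
    return len(kept)

def demo():
    # In-memory databases stand in for the two real systems.
    src = sqlite3.connect(":memory:")
    dst = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE archive (signal TEXT, ts INTEGER, value REAL)")
    dst.execute("CREATE TABLE logging (signal TEXT, ts INTEGER, value REAL)")
    src.executemany(
        "INSERT INTO archive VALUES (?, ?, ?)",
        [("U_MAG", 1, 0.50), ("U_MAG", 2, 0.51), ("U_MAG", 3, 0.90),
         ("I_MEAS", 1, 11.0), ("I_MEAS", 2, 11.0)],  # made-up signals
    )
    written = transfer(src, dst, deadband=0.05)
    stored = dst.execute(
        "SELECT signal, ts FROM logging ORDER BY signal, ts"
    ).fetchall()
    return written, stored
```

Calling `demo()` writes three of the five samples; the two that stay within the deadband of the previously kept value are dropped before insertion.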

Page 5:

Use case: Quench Protection System

Nominal conditions
• Stable constant load of 150k changes/s
• 100 MB/s of I/O operations
• 500 GB of data stored each day

Peak performance
• Exceeded 1 million value changes per second
• 500-600 MB/s of I/O operations

All CERN production WinCC OA systems (accelerators, detectors and technical infrastructure, 600 servers) will benefit from these optimizations

Next challenge: ~10x increase
• Required for next major upgrade (2019-2020)

Credit: Kacper Szkudlarek EN-ICE
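
As a back-of-the-envelope check, the nominal figures can be combined as follows. Decimal units (1 GB = 10^9 bytes) and a purely linear scaling for the ~10x target are assumptions of this sketch, not statements from the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures quoted on the slide (decimal units assumed).
changes_per_second = 150_000
io_bytes_per_second = 100 * 10**6
stored_bytes_per_day = 500 * 10**9

changes_per_day = changes_per_second * SECONDS_PER_DAY
stored_bytes_per_change = stored_bytes_per_day / changes_per_day
io_bytes_per_day = io_bytes_per_second * SECONDS_PER_DAY

# Naive linear extrapolation for the ~10x target (an assumption).
scale = 10
future_changes_per_second = changes_per_second * scale
future_stored_tb_per_day = stored_bytes_per_day * scale / 10**12

print(f"{changes_per_day:,} changes/day")
print(f"{stored_bytes_per_change:.1f} stored bytes per change")
print(f"{io_bytes_per_day / 10**12:.2f} TB/day of I/O")
print(f"{future_changes_per_second:,} changes/s and "
      f"{future_stored_tb_per_day:.0f} TB/day stored at {scale}x")
```

Under these assumptions the nominal load corresponds to about 13 billion changes per day, roughly 39 stored bytes per change, and 8.64 TB/day of I/O; a linear 10x scale-up would mean 1.5 million changes/s and about 5 TB stored per day.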

Page 6:

Anomaly detection

>SVM - Support Vector Machines

Credit: Massimo Lamanna, Sebastien Ponce (IT-DSS), Stefano Alberto Russo (ex IT-DB)

Page 7:

Data Placement / ATLAS

>Use cases:
• Trace mining (user interactions with Distributed Data Management)
• Popularity (used for deciding which data to delete)
• Accounting and popularity (reports on data contents/popularity)
• Log file aggregation

>ATLAS Distributed Data Management uses both SQL and NoSQL
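
The popularity use case might look like the sketch below: count accesses per dataset from user traces and propose the least-used datasets for deletion. The trace format, the dataset names and the `keep` threshold are hypothetical; the real ATLAS tooling is not described on the slide.

```python
from collections import Counter

def deletion_candidates(traces, datasets, keep=2):
    """Return datasets least-accessed first, sparing the `keep` most popular."""
    accesses = Counter(trace["dataset"] for trace in traces)
    # Sort by access count, then by name so that ties are deterministic.
    ranked = sorted(datasets, key=lambda name: (accesses[name], name))
    return ranked[: max(len(ranked) - keep, 0)]

# Made-up catalogue and access traces.
datasets = ["data15.A", "data15.B", "mc15.C", "mc15.D"]
traces = [
    {"user": "alice", "dataset": "data15.A"},
    {"user": "bob", "dataset": "data15.A"},
    {"user": "carol", "dataset": "data15.A"},
    {"user": "alice", "dataset": "data15.B"},
    {"user": "bob", "dataset": "mc15.D"},
    {"user": "carol", "dataset": "mc15.D"},
]

candidates = deletion_candidates(traces, datasets, keep=2)
```

Here `candidates` lists the never-accessed dataset first, followed by the one accessed only once.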

Page 8:

Data Placement / CMS

>Intelligent data placement models for the CMS experiment

>Need to extract further knowledge from the monitoring data in order to implement effective data placement
• Correlate file-access monitoring with site status: readiness, queue length, storage and CPU available
• Classify analysis activities and needed resources
• Make recommendations
• Learn from past trends and patterns
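
A very simple, rule-based version of the "make recommendations" step is sketched below. It is not a learned model and not the CMS approach: the `SiteStatus` fields mirror the site-status quantities named above, while the site names and the scoring rule (pending jobs per free CPU slot) are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class SiteStatus:
    name: str
    ready: bool
    queue_length: int       # pending jobs
    free_storage_tb: float
    free_cpu_slots: int

def recommend_sites(sites, dataset_size_tb):
    """Rank sites that are ready and have room for the dataset.

    Hypothetical score: pending jobs per free CPU slot (lower is better).
    """
    eligible = [
        s for s in sites
        if s.ready
        and s.free_storage_tb >= dataset_size_tb
        and s.free_cpu_slots > 0
    ]
    return sorted(eligible, key=lambda s: s.queue_length / s.free_cpu_slots)

# Made-up monitoring snapshot.
sites = [
    SiteStatus("T2_A", True, 400, 120.0, 200),   # busy queue
    SiteStatus("T2_B", True, 50, 80.0, 100),     # lightly loaded
    SiteStatus("T2_C", False, 0, 500.0, 1000),   # not ready
    SiteStatus("T2_D", True, 10, 5.0, 300),      # too little free storage
]

ranking = [s.name for s in recommend_sites(sites, dataset_size_tb=20.0)]
```

A model that learns from past access patterns would replace the fixed scoring rule; the filtering on readiness and capacity would likely stay.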

Page 9:

Data Placement / EMBL-EBI

>To support the diverse data analysis that will take place within ELIXIR, the ability to ‘push’ data from a provider to a major analysis centre, or for the major analysis centre to ‘pull’ the required data set from a nearby source, becomes a critical capability

Page 10:

Logging service (1/2)

Credit: Chris Roderick

Page 11:

Logging service (2/2)

Credit: Chris Roderick

Page 12:

Domain specific language

>LHC Logging (50+ TB/year)

>Perform analysis as close to the data as possible; in-database analysis: built-in + ORE?

>Multi-source extraction API

>Domain-specific language

Credit: Chris Roderick BE-CO

Page 13:

Network monitoring

>Time correlation
• During a PS throughput test, was there any known activity on the same link?
• There is packet loss: does this appear as degraded performance somewhere at the same time?

>We observe loss of performance on some network link
• Is it a network problem, and where?
• Is it a storage problem?

Credit: Simone Campana
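
The time-correlation question can be phrased as an interval-overlap check. The sketch below assumes that monitoring data has already been reduced to events with a link name and a start/end time; the `Event` structure, the link names and the labels are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Event:
    link: str
    start: float  # seconds, any common time base
    end: float
    label: str

def overlapping(a, b, tolerance=0.0):
    """True if the two events share a link and their time spans overlap."""
    return (
        a.link == b.link
        and a.start <= b.end + tolerance
        and b.start <= a.end + tolerance
    )

def correlate(tests, activity, tolerance=0.0):
    """For each test, list known activity on the same link at the same time."""
    return {
        t.label: [e.label for e in activity if overlapping(t, e, tolerance)]
        for t in tests
    }

# Made-up events.
tests = [Event("link-A", 1000, 1600, "throughput test 1")]
activity = [
    Event("link-A", 1500, 1550, "packet loss"),
    Event("link-A", 2000, 2100, "bulk transfer"),
    Event("link-B", 1200, 1300, "packet loss (other link)"),
]

result = correlate(tests, activity)
```

Only the packet-loss event on the same link and within the test window is reported; `tolerance` widens the match for sources whose clocks or sampling intervals differ.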

Page 14:

ESA

>Envisage “intelligent” bots doing much of the researcher's work in scanning the archives to collect relevant information in a particular field.

>Such “automated bots” would present their results only when called upon and only focused on a problem at hand (e.g. give me serendipitous objects in the X-Ray range lying around the Crab Nebula, since an unexplained region of hot gas may have an effect on the infra-red region I am studying…).

>The bot may be further refined to extract only very good quality data from all X-Ray missions or for a given time

Credit: Salim Ansari

Page 15:

FCC

Credit: Johannes Gutleber

Page 16:

Analytics and Modelling for Availability Improvement in the FCC

>Near real-time modelling of the accelerator complex and its infrastructure services would further improve early warning capabilities, permit preventive maintenance and leverage co-scheduling of fault-prevention interventions

>Real-world use-cases taken from LHC accelerator operation shall serve as the basis to develop formal data analytics scenarios

Credit: Johannes Gutleber

Page 17:

Data analytics on scientific articles

>INSPIRE, ZENODO, ORCID

>Automated extraction of information (about authors, references, keywords, etc.)

>Semantic analysis of text allowing identification of the main field, keywords (not appearing in the text) and sentiment of references; validation based on their importance within the context of the publication; and the ability to join and correlate concepts from different domains and publications.

Credit: Tim Smith

Page 18:

Administrative Information System

>(among others)

>Make the data available using a bi-temporal model: one time dimension comes from the business (e.g. contractual dates); the other is purely technical, indicates when each piece of data was effectively part of the DWH, and allows writing queries using a “show data as of” date

Credit: Derek Mathieson
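
A bi-temporal table and a “show data as of” query could be sketched as follows. The schema, column names and sample rows are invented for the example and say nothing about the actual DWH; dates are stored as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contract (
        person      TEXT,
        grade       TEXT,
        valid_from  TEXT,  -- business time: contractual dates
        valid_to    TEXT,
        loaded_from TEXT,  -- technical time: when the row was part of the DWH
        loaded_to   TEXT
    )
""")
conn.executemany("INSERT INTO contract VALUES (?, ?, ?, ?, ?, ?)", [
    # Loaded in January: grade 5 for the whole year.
    ("p1", "5", "2014-01-01", "2014-12-31", "2014-01-10", "2014-03-01"),
    # Correction loaded in March: grade 6 from July onwards.
    ("p1", "5", "2014-01-01", "2014-06-30", "2014-03-01", "9999-12-31"),
    ("p1", "6", "2014-07-01", "2014-12-31", "2014-03-01", "9999-12-31"),
])

def grade_as_of(person, business_date, show_data_as_of):
    """Grade valid on `business_date`, as the DWH knew it on `show_data_as_of`."""
    row = conn.execute(
        """SELECT grade FROM contract
           WHERE person = ?
             AND valid_from <= ? AND ? <= valid_to
             AND loaded_from <= ? AND ? < loaded_to""",
        (person, business_date, business_date,
         show_data_as_of, show_data_as_of),
    ).fetchone()
    return row[0] if row else None

before_correction = grade_as_of("p1", "2014-09-01", "2014-02-01")
after_correction = grade_as_of("p1", "2014-09-01", "2014-04-01")
```

The same business date returns grade 5 when the warehouse is viewed as of February and grade 6 when viewed as of April, because the correction only became part of the DWH in March.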

Page 19:

Technology

Near real time processing
• Processing large amounts of data (gigabytes per second) with low latency (in the order of seconds) coming from different sources and domains

Batch processing (including predictive analytics)
• Linear and nonlinear modelling, classical statistical tests, complex time-series analysis and forecasting, classification, clustering

Data repositories, RDBMS and NoSQL

Integration challenges (Data Analytics as a Service)
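
To make the near-real-time item concrete, here is a minimal sketch of merging several time-ordered sources and aggregating them in fixed time windows. It only illustrates the idea at toy scale; the event format, the source names and the choice of a mean as the aggregate are assumptions, and a production system would rely on a dedicated streaming platform.

```python
import heapq

def merge_sources(*sources):
    """Merge time-ordered (timestamp, source, value) streams into one stream."""
    return heapq.merge(*sources, key=lambda event: event[0])

def windowed_mean(events, window_s=5):
    """Yield (window_start, mean value) for consecutive fixed-size windows."""
    current, total, count = None, 0.0, 0
    for ts, _source, value in events:
        start = int(ts // window_s) * window_s
        if current is not None and start != current:
            # The previous window is complete: emit it and start a new one.
            yield current, total / count
            total, count = 0.0, 0
        current = start
        total += value
        count += 1
    if count:
        yield current, total / count

# Two made-up sources, each already ordered by timestamp.
cryo = [(0.5, "cryo", 1.0), (6.0, "cryo", 3.0)]
power = [(1.0, "power", 3.0), (7.5, "power", 5.0), (12.0, "power", 2.0)]

result = list(windowed_mean(merge_sources(cryo, power), window_s=5))
```

Because both functions are generators, each window is emitted as soon as the first event of the next window arrives, which keeps latency bounded by the window size.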

Page 20:

Analytics as a service

> “Analytics platform” or (Big data) “Analytics-as-a-service” (A3S ?):

> Data fed from multiple sources (live)

> Stored reliably

> Data processing with multiple systems

> Easy access, domain expert natural language (DSL)

> Visualisation

> Special interest from Human Brain Project

Credit: CERN EN-ICE

Page 21:

Education

“Data scientist” role type

Variety of tools and ideas, important theoretical/academic background

Implement a workshop/training along the lines of the one on multi-threading and parallelism

Clear need for and interest in data analytics education and information sharing

Page 22:

Conclusion

Interest from many parts of CERN: experiments, engineering, administrative, IT

Leverages the work done in openlab IV

Combined from the beginning with a multi-department AaaS service

Education and outreach

Interest from other research laboratories and openlab partners
• Challenges
• Interest in shared research / investigation / deployment