DATAIA-JST International Symposium on Data Science and AI | 11/07/2018 | STREAMOPS: OPEN SOURCE PLATFORM FOR RESEARCH AND INTEGRATION OF ALGORITHMS FOR MASSIVE TIME SERIES FLOW ANALYSIS | CEA list (C. Gouy-Pailler, cedric.gouy-pailler@cea.fr) | UVSQ – DAVID laboratory (K. Zeitouni & Y. Taher) | Foch Hospital -- UVSQ INSERM – VIMA team (Ph. Aegerter & M. Fischler)
Page 1: CEA list (C. Gouy-Pailler, cedric.gouy-pailler@cea.fr ...dataia.eu/sites/default/files/DATAIA_JST...Microsoft Azure Predix (GE), Mindsphere (Siemens), Bosch MOA, research papers StreamOps.

DATAIA-JST International Symposium on Data Science and AI | 11/07/2018

STREAMOPS: OPEN SOURCE PLATFORM FOR RESEARCH AND INTEGRATION OF ALGORITHMS FOR MASSIVE TIME SERIES FLOW ANALYSIS

CEA list (C. Gouy-Pailler, cedric.gouy-pailler@cea.fr)

UVSQ – DAVID laboratory (K. Zeitouni & Y. Taher)

Foch Hospital -- UVSQ INSERM – VIMA team (Ph. Aegerter & M. Fischler)

Page 2:

IDENTITY CARD OF THE TEAM

AI and streaming algorithms

Time series management

Microservice-based infrastructure

Medical use cases

Page 3:

STREAMING APPLICATIONS LANDSCAPE

Innovative algorithms, specialized approaches from research community

Large-scale and robust software community

Knowledge from applicative side

Kafka, Spark, Flink, Redis, MongoDB, Cassandra, PostgreSQL, Microsoft Azure

Predix (GE), Mindsphere (Siemens), Bosch

MOA, research papers

StreamOps

Page 4:

MEDICAL USE CASE EXAMPLE

[Figure: connected patch =? classical monitoring devices]

• To validate data from a connected patch (e.g. to generate alerts) against classical devices (e.g. multi-parameter recordings from GE)
• Constrained environment: monitoring during surgeries or post-operative monitoring
• Long-term monitoring: 12h to 24h of continuous monitoring
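One common way to compare a new sensor against a reference device is a Bland-Altman agreement analysis. The slides do not state which validation method the team uses, so the sketch below is only an illustration under that assumption. The function name `bland_altman`, the heart-rate signal and all numeric values are hypothetical and synthetic.

```python
import numpy as np


def bland_altman(reference, candidate):
    """Return (bias, lower_limit, upper_limit) of agreement between two devices."""
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    # Only compare time points where both devices produced a value.
    valid = ~(np.isnan(reference) | np.isnan(candidate))
    diff = candidate[valid] - reference[valid]
    bias = diff.mean()
    spread = 1.96 * diff.std(ddof=1)  # 95% limits of agreement
    return bias, bias - spread, bias + spread


# Synthetic heart-rate traces (beats per minute), one sample per minute for 12 h.
rng = np.random.default_rng(0)
reference_hr = 70 + 5 * np.sin(np.linspace(0, 6 * np.pi, 720))
patch_hr = reference_hr + 1.0 + rng.normal(scale=2.0, size=720)  # assumed offset and noise
patch_hr[100:130] = np.nan  # simulated patch drop-out

bias, lower, upper = bland_altman(reference_hr, patch_hr)
print(f"bias={bias:.2f} bpm, limits of agreement=[{lower:.2f}, {upper:.2f}] bpm")
```

A narrow interval centred near zero would suggest the patch can stand in for the reference device. What counts as narrow enough is a clinical decision that this sketch does not address.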

Page 5:

POSITION OF THE STREAMOPS PROJECT – CHALLENGES

TRUST, USABILITY AND ROBUSTNESS

Data Trust: robust online preprocessing of time series

Algorithms Trust:
• Privacy and robustness in online contexts
• Non-regression tests in incremental ML

Make it usable: to practitioners and research community
• OPEN-SOURCE
• On top of existing streaming software (MOA)

Operational Efficiency and Robustness: bring advances from open-source software (Kafka, microservices, time series databases)

StreamOps

Page 6:

SENSOR RAW DATA PROBLEMS

Page 7:

SENSOR RAW DATA PROBLEMS

[Figure: data series from Sensor 1 and data series from Sensor 2, annotated with "Noise" and "Missing Data"]

Page 8:

SENSOR RAW DATA PROBLEMS

We adopt Basis Function Expansion: each data series is represented by a function F(x) built as a linear combination of basis functions.

[Figure: data series from Sensor 1 and Sensor 2 fitted by the functions F(x) and G(x); noise is smoothed and empty values are interpolated]
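As a rough illustration of the basis function expansion on this slide, the sketch below fits F(x) = Σ_k c_k φ_k(x) to a noisy series with a gap. The slides do not say which basis or fitting procedure StreamOps uses. The Gaussian basis, the number of basis functions, the ridge penalty and the helper names `gaussian_basis` and `fit_basis_expansion` are all assumptions made for the example.

```python
import numpy as np


def gaussian_basis(x, centers, width):
    """Design matrix Phi[i, k] = exp(-(x_i - c_k)^2 / (2 * width^2))."""
    diff = x[:, None] - centers[None, :]
    return np.exp(-0.5 * (diff / width) ** 2)


def fit_basis_expansion(x, y, n_basis=12, ridge=1e-3):
    """Fit the coefficients c_k on observed samples only (NaN marks a missing value)."""
    observed = ~np.isnan(y)
    centers = np.linspace(x.min(), x.max(), n_basis)
    width = (x.max() - x.min()) / n_basis
    phi = gaussian_basis(x[observed], centers, width)
    # Ridge-regularised least squares: (Phi^T Phi + ridge * I) c = Phi^T y
    gram = phi.T @ phi + ridge * np.eye(n_basis)
    coef = np.linalg.solve(gram, phi.T @ y[observed])

    def F(x_new):
        x_new = np.asarray(x_new, dtype=float)
        return gaussian_basis(x_new, centers, width) @ coef

    return F


# Synthetic sensor series: a sine wave with noise and a block of missing samples.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
clean = np.sin(x)
y = clean + rng.normal(scale=0.2, size=x.size)
y[80:100] = np.nan  # simulated sensor drop-out

F = fit_basis_expansion(x, y)
smoothed = F(x)  # defined everywhere, including inside the gap
```

Because F is a continuous function rather than a list of samples, a second sensor fitted the same way (G in the figure) can be evaluated on the same time grid even when the two devices sample at different instants.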

Page 9:

PRIVACY, ROBUSTNESS AND NON-REGRESSION IN INCREMENTAL LEARNING

• The need for privacy
  • When the ML model potentially carries information about individuals in the training set
  • Incremental updates of the model by a malicious attacker could leak these data
  • Homomorphic encryption does not fit all use cases
  • Research directions
    • Differential privacy in ML online applications
• Robustness and privacy
  • Robustness in adversarial or non-adversarial contexts (analyzed in the statistical query framework for deterministic algorithms)
  • In randomized online algorithms (necessary in private contexts), define and analyze a general definition of robustness (with attacks and defenses)
  • Research directions
    • Robustness in private online algorithms
• Non-regression in incremental learning
  • Analogy with software development: how do you ensure that your model updates still satisfy some constraints?
  • Unit testing and non-regression tests for ML models
  • Optimal training set sampling
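The software-testing analogy for incremental learning can be sketched as a guard around each model update: a candidate update is kept only if accuracy on a frozen reference set does not drop by more than a tolerance. This is an illustrative sketch, not the project's method. The toy classifier `OnlineCentroidClassifier`, the helper `guarded_update`, the tolerance value and the synthetic data are all hypothetical.

```python
import copy
import numpy as np


class OnlineCentroidClassifier:
    """Toy incremental model: one running mean per class (illustrative only)."""

    def __init__(self, n_features, n_classes):
        self.sums = np.zeros((n_classes, n_features))
        self.counts = np.zeros(n_classes)

    def partial_fit(self, X, y):
        for label in np.unique(y):
            self.sums[label] += X[y == label].sum(axis=0)
            self.counts[label] += np.sum(y == label)
        return self

    def predict(self, X):
        centroids = self.sums / np.maximum(self.counts, 1)[:, None]
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)


def accuracy(model, X, y):
    return float(np.mean(model.predict(X) == y))


def guarded_update(model, X_batch, y_batch, X_ref, y_ref, tolerance=0.02):
    """Apply an incremental update only if reference-set accuracy does not
    regress by more than `tolerance`; otherwise keep the previous model."""
    baseline = accuracy(model, X_ref, y_ref)
    candidate = copy.deepcopy(model).partial_fit(X_batch, y_batch)
    if accuracy(candidate, X_ref, y_ref) + tolerance >= baseline:
        return candidate, True
    return model, False


rng = np.random.default_rng(0)


def make_batch(n, flip=False):
    """Two well-separated Gaussian classes; flip=True simulates poisoned labels."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 2)) + np.where(y[:, None] == 1, 3.0, -3.0)
    return X, (1 - y if flip else y)


X_ref, y_ref = make_batch(200)  # frozen reference ("non-regression") set
model = OnlineCentroidClassifier(n_features=2, n_classes=2)
model.partial_fit(*make_batch(100))

model, ok_clean = guarded_update(model, *make_batch(100), X_ref, y_ref)
model, ok_poisoned = guarded_update(model, *make_batch(1000, flip=True), X_ref, y_ref)
print(ok_clean, ok_poisoned)
```

Here the clean batch passes the check and the label-flipped batch is rejected, so the deployed model keeps its reference-set accuracy. How the reference set is chosen and refreshed is the sampling question raised on the slide, and it is left open here.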

Page 10:

SOME REFERENCES

T. L. Coelho da Silva, K. Zeitouni, J. A. F. de Macêdo, and M. A. Casanova, "CUTiS: Optimized Online ClUstering of Trajectory Data Stream", in IDEAS 2016.

A. Morvan, K. Choromanski, C. Gouy-Pailler, and J. Atif, "Graph sketching-based Massive Data Clustering", SIAM Int. Conf. Data Min. (SDM) 2018.

A. Morvan, A. Souloumiac, C. Gouy-Pailler, and J. Atif, "Streaming Binary Sketching based on Subspace Tracking and Diagonal Uniformization", ICASSP 2018.

R. Pinot, "Minimum spanning tree release under differential privacy constraints", arXiv:1801.06423 [cs, math, stat], Jan. 2018.

R. Mousheimish, Y. Taher, and K. Zeitouni, "Automatic Learning of Predictive CEP Rules: Bridging the Gap Between Data Mining and Complex Event Processing", in ACM DEBS 2017.

Sandu Popa, K. Zeitouni, V. Oria, D. Barth, and S. Vial, "Indexing in-network trajectory flows", VLDB J., vol. 20, no. 5, p. 643, Oct. 2011.

R. Pinot, A. Morvan, F. Yger, C. Gouy-Pailler, and J. Atif, "Graph-based Clustering under Differential Privacy", to appear in UAI 2018, Monterey, USA.

Currently in Kyoto, JSPS summer program (2 months). Hisashi Kashima’s lab.