This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 779899. It is the property of the SecureIoT consortium and shall not be distributed or reproduced without the formal approval of the SecureIoT Management Committee. The content of this report reflects only the authors’ view. The Innovation and Networks Executive Agency (INEA) is not responsible for any use that may be made of the information it contains.

Project Acronym: SecureIoT
Grant Agreement number: 779899 (H2020-IoT03-2017 - RIA)
Project Full Title: Predictive Security for IoT Platforms and Networks of Smart Objects

DELIVERABLE
Deliverable Number: D4.3
Deliverable Name: Tools and Techniques for Predictive IoT Security
Dissemination level: PU
Type of Document: R
Contractual date of delivery: M11
Deliverable Leader: INRIA
Status & version: Final - V1.0
WP / Task responsible: INRIA
Keywords: Predictive algorithms, Process mining, Deep learning

Abstract (few lines): This deliverable introduces three major types of machine-learning algorithms for predictive security: process mining, variational autoencoders (deep learning) and behavioral analysis. All of them will be used to predict potential anomalies of the monitored IoT systems. Descriptions and initial results are given along with open-source solutions that can support their implementation.

Deliverable Leader: Jérôme François (INRIA)
Contributors: Nikos Kefalakis (INTRA), Abdelkader Lahmadi (INRIA), Remi Badonnel (INRIA), Adrien Hemmer (INRIA), Jérôme François (INRIA), Juergen Neises (FUJITSU), Thomas Walloschker (FUJITSU), Jose Fran Ruiz (ATOS), Mariza Konidi (INTRA)
Reviewers: Sofianna Menesidou (UBI), Stylianos Georgoulas (INTRA)
Approved by: Stylianos Georgoulas (INTRA)
Table of Figures
Figure 1: Anatomy of the Security Intelligence Layer
Figure 2: Process pipeline
Figure 3: Example of a Petri net
Figure 4: Data pre-processing block
Figure 5: Process mining block
Figure 6: Petri net deduced from process tree
Figure 7: Directly-follow graph obtained from refined data
Figure 8: Cuts of the directly-follow graph [4]
Figure 9: Sequence cut of L
Figure 10: Exclusive choice cut of L1
Figure 11: Loop cut of L3
Figure 12: Sequence cut of L5
Figure 13: Parallel cut of L2
Figure 14: How to transform part of the process tree
Figure 15: Parameters to characterize a state
Figure 16: Transition system with states defined by two maximum previous ones
Figure 17: Definition of a region S1 to S2 with S2 in S'
Figure 18: Definition of a region S1 to S2 with S1 in S'
Figure 19: Definition of a region S1 to S2 with S1 and S2 (not) in S'
Figure 20: Petri net generated by the transition system mining algorithm
Figure 21: Description of a data point extracted from the dataset
Figure 22: Refined data (XES file)
Figure 23: Behavioural model generated by the inductive mining algorithm (without filtering)
Figure 24: Behavioural model generated by the inductive mining algorithm (with filtering, threshold = 0.07)
Figure 25: Behavioural model generated by the inductive mining algorithm (low number of activities, K=1)
Figure 26: Extract of the behavioural model generated by the inductive mining algorithm (high number of activities, K=100)
Figure 27: An example of a short VAE, based on [7]
Figure 28: The general fully connected node (perceptron)
Figure 29: The rectifier linear unit function (ReLU) [9]
Figure 30: The latent representations of our testing data (CAN dataset)
Figure 31: High-level description of the L-ADS
Figure 32: Use of L-ADS in FIWARE-aware SecureIoT devices
D4.3 - Tools and Techniques for Predictive IoT Security,
Version: v1.0- Final, Date 30/11/2018
2 Requirements

In deliverable D2.2, the analysis led to the identification of particular requirements related to each task. Four of them are provided for T4.2 and thus fall within the scope of this deliverable:
• R4.2.1: Data should be protected during processing by predictive algorithms
• R4.2.2: Predictive analytics must discover and predict threats and vulnerabilities in a timely, scalable, consistent and automated manner
• R4.2.3: Support of multiple prediction algorithms and models
• R4.2.4: Prediction algorithms should describe their constraints

In this deliverable of task T4.2, we particularly aim to investigate different approaches and techniques (R4.2.3). In particular, we propose to work closely on deep learning and process mining methods applied in an IoT context, while also considering existing FIWARE enablers. For each of the proposed methods, this deliverable will assess the following properties from R4.2.2:
• Prediction time: how long the prediction techniques take to process the data and return valuable results, such as predicting a threat. In this project, this cannot be considered solely as the absolute time to execute the algorithm. Later, when used in conjunction with mitigation, this time must be estimated with respect to the time needed to properly trigger those counter-measures. Indeed, if a counter-measure is very simple and can be applied very fast, there is no need for predictions on a long-term horizon. Hence, we expect more concrete evaluations in subsequent deliverables produced in T4.2.
• Scalability: IoT data are very heterogeneous. A qualitative analysis of the first datasets provided in SecureIoT was performed in D4.1. Scalability thus concerns both the volume and the heterogeneity of the provided data. We will therefore carefully evaluate how our proposals can meet this requirement.
• Consistency: the predictions of the proposed algorithms, and the mitigations they trigger, have to be consistent with the monitored system states and the presence of threats. Consistency will be measured using the performance metrics available for each proposed algorithm. For example, for process mining methods, we measure the precision, the fitness and the generalization of the inferred system models to assess their consistency while predicting future states. Decisions based on the same rulesets and training conditions shall not be contradictory, even if positioned on various levels of the SecureIoT architecture.
• Automated execution: in this deliverable, we will quantify the complexity of configuring the proposed techniques to evaluate the degree of human intervention that is required.
R4.2.4 is focused on the further integration of the algorithms. For each algorithm, we will thus specify what the inputs are, optional or required, and the constraints on these inputs if they exist. R4.2.1 is less coupled with the predictive techniques themselves and more related to the process that happens beforehand: the data collection. Therefore, we consider as a mandatory
4 Predictive techniques for IoT Security

4.1 Prediction models using process mining

We present in this section the process mining methods used to generate prediction models to
infer monitored system models and predict their deviations and anomalies. Process mining
methods have been widely used for the analysis and the building of workflow models for business
process management applications through the analysis of their event logs. However, in this
project we apply them in a new context to build predictive models for IoT systems to identify
their anomalies when they deviate from their expected states. To be applicable, process mining requires us to refine the raw data we have in the project, as it cannot directly interpret continuous values as states. In that context, we detail the prior data processing required to transform raw data into refined data interpretable by a process miner, the algorithms supporting the process mining
activity and how they are used to generate behavioral models for prediction, and finally different
evaluation metrics.
4.1.1 Overview
The processing pipeline, detailed in Figure 2, summarizes the overall architecture of our
predictive approach for supporting IoT Security. It is composed of three main blocks
corresponding to the data preprocessing block, the process mining block and the prediction
block. The first two correspond to the ISTE and the last one relates to the TEE. This pipeline takes as input raw data, which correspond either to training datasets used by the process miner or to live monitoring data used for prediction purposes based on the behavioral models produced by the process miner.
Figure 2: Process pipeline
During the training phase, the raw data have first to be transformed, during a data pre-processing
step, to generate refined data interpretable by the process miner. We are considering a
commonly-used process miner tool, called ProM, that requires an input file specified according
to the XML eXtensible Event Stream (XES) format [1] representing event logs. We have
considered three datasets in this deliverable, provided by LuxAi, ISPRINT and IDIADA, which describe application data in the JSON format. We have mostly used the dataset provided by
the log are synchronized, otherwise it is equal to 1. The goal of the alignment method is to find
the optimal alignment, i.e. the alignment with the minimal cost.
To illustrate the alignment method more concretely, the Petri net given in Figure 6 is used as an example with the trace < a, b, f, e >. Table 1 provides a possible (maximum-cost) configuration for this alignment. The symbol “>>” means that the events in the model and in the event log are desynchronized, in which case 1 is added to the alignment cost. The optimal-cost alignment for this example is given in Table 2.

Move Log   | a  | b  | f  | e  | >> | >> | >> | >> | >> | >> | >>
Move Model | >> | >> | >> | >> | τ  | a  | b  | τ  | τ  | e  | τ

Table 1 Maximum cost alignment

Move Log   | >> | a  | b  | >> | >> | f  | e  | >>
Move Model | τ  | a  | b  | τ  | τ  | >> | e  | τ

Table 2 Optimal cost alignment
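The cost bookkeeping behind these alignment tables can be sketched in a few lines of Python. This is an illustrative reconstruction, not project code: an alignment is modelled as a list of (log move, model move) pairs, where ">>" marks a desynchronized side and "tau" a silent model move.

```python
SKIP = ">>"   # no move on this side
TAU = "tau"   # silent (invisible) model move, cost 0

def alignment_cost(alignment):
    """Sum the cost of an alignment: synchronized moves and silent
    model moves cost 0, every other desynchronized move costs 1."""
    cost = 0
    for log_move, model_move in alignment:
        if log_move == model_move:                 # synchronized move
            continue
        if log_move == SKIP and model_move == TAU:  # silent model move
            continue
        cost += 1                                  # move on log or model only
    return cost

# The two alignments of Tables 1 and 2 for the trace <a, b, f, e>
maximum = [("a", SKIP), ("b", SKIP), ("f", SKIP), ("e", SKIP),
           (SKIP, TAU), (SKIP, "a"), (SKIP, "b"), (SKIP, TAU),
           (SKIP, TAU), (SKIP, "e"), (SKIP, TAU)]
optimal = [(SKIP, TAU), ("a", "a"), ("b", "b"), (SKIP, TAU),
           (SKIP, TAU), ("f", SKIP), ("e", "e"), (SKIP, TAU)]

print(alignment_cost(maximum))  # → 7
print(alignment_cost(optimal))  # → 1
```

Running it yields a cost of 7 for the maximum-cost alignment and 1 for the optimal one, where only the move on f is desynchronized.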
Once the alignment is done, it is possible to quantify the performance of the behavioral models
with regard to the four metrics detailed below:
- Fitness: this metric indicates whether the generated model, noted P, can replay the event log, noted L, in an accurate manner. The closer this metric is to 1, the more the model is capable of replaying the given log. The value of this metric is calculated with the formula below.
Fitness(P, L) = 1 − fcost(P, L) / (Move(L) + Length(L) × Move(P))
Fcost(P,L) is the optimal alignment cost between L and P, Move(L) is the total cost of
desynchronized moves on the log, Move(P) is the same total cost on the model, and
Length(L) is the number of events in the log. The denominator of the formula represents
the maximum possible value of the total alignment cost, when there is not a single
synchronized move between the log L and the model P in the optimal alignment.
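The formula can be implemented literally; the numbers below are made-up illustrative values, not taken from the running example.

```python
def fitness(fcost, move_log, move_model, length):
    """Fitness(P, L) = 1 - fcost(P, L) / (Move(L) + Length(L) * Move(P)),
    implemented directly from the definition above."""
    return 1.0 - fcost / (move_log + length * move_model)

# Hypothetical values: optimal alignment cost 1, Move(L) = 4,
# Move(P) = 3, and a log of 4 events
print(fitness(1, 4, 3, 4))  # → 0.9375
```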
- Generalization: this metric indicates whether the model P is general enough to include behaviors that are not in the log L. The closer this metric is to 1, the more general the model is (i.e. the more unknown behaviors can be played by the model P). Maximizing this value helps to avoid overfitting, as it allows the model to adapt to a new set of events.
Calculating and using the generalization metric is not trivial, because it refers to unseen
examples and how the model reacts to them. As given in [8], the objective is to consider the relationship amongst the number of activities (noted w) leaving a state, the number of times this state was visited (noted n), and the probability of discovering an activity not seen before the next time the given state is visited, noted pnew(w,n). The value of this metric is calculated with the formula below.
Generalization(P, L) = 1 − ( Σ_{e∈L} pnew(w, n) ) / Length(L)
According to [9], the pnew(w,n) value can be estimated as follows:
pnew(w, n) = w(w + 1) / (n(n − 1))   if n ≥ w + 2
pnew(w, n) = 1                       otherwise
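The estimator is straightforward to implement; the stdlib-only sketch below (illustrative, not the ProM implementation) takes one (w, n) pair per event of the log:

```python
def p_new(w, n):
    """Estimated probability of seeing a not-yet-observed activity the
    next time the state is visited (estimator from [9])."""
    return w * (w + 1) / (n * (n - 1)) if n >= w + 2 else 1.0

def generalization(visits):
    """visits: one (w, n) pair per event e in the log L."""
    return 1.0 - sum(p_new(w, n) for w, n in visits) / len(visits)
```

A state left by 2 activities and visited 10 times gives p_new = 2·3/(10·9) ≈ 0.067, while rarely visited states (n < w + 2) contribute 1 and pull the generalization down.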
- Precision: this metric indicates whether the events in the log L follow the model P strictly. The closer this metric is to 1, the fewer additional behaviors not described in the log can be played by the given model. Maximizing this value helps to avoid underfitting, and thus over-generalization. The value of this metric is calculated with the formula below.
Precision(P, L) = (1 / Length(L)) × Σ_{e∈L} ( Fired(e) / Enabled(e) )
S is the state right before executing the event e in the log. Fired(e) is the number of activities of S already activated, Enabled(e) is the number of possible activities of S, and Length(L) is the number of events in the log.
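Given the per-event Fired(e) and Enabled(e) counts, the precision formula reduces to an average of ratios; a minimal sketch:

```python
def precision(events):
    """events: one (fired, enabled) pair per event e in the log L."""
    return sum(fired / enabled for fired, enabled in events) / len(events)

# Two events: one state where 1 of 2 enabled activities was already fired,
# one where every enabled activity has been fired
print(precision([(1, 2), (2, 2)]))  # → 0.75
```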
- Simplicity: this metric characterizes how simple the model is. Its value can typically be given by the number of states and transitions of the model. At the moment, this metric is not the most important one: simplifying the behavioral model makes the fitness and precision drop, and our goal is of course to build models that accurately describe the observed system.
4.1.5 Application to SecureIoT preliminary dataset

We will illustrate the application of the processing pipeline described in Figure 2 with the ISPRINT
dataset. The data included in the iSprint dataset comes from a simulation from CloudCare2U, a solution allowing chronic disease patients to live a life as normal as possible. Information in the dataset comes from room sensors, like temperature and illuminance, or from particular devices used by the patients, like heartbeat monitors (D4.1).
The principle of process mining algorithms is to work with different traces that record the behavior of the observed system several times. For this dataset, the JSON file containing application data can easily be split by considering that each recorded day stands for a new set of events and therefore a new trace. Let us focus on the JSON file (“bathroom_environment_it.json”) to bring
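The day-based splitting described above can be sketched as follows. The record layout (a JSON array of objects carrying an ISO timestamp field) is an assumption for illustration and not the exact iSPRINT schema:

```python
import json
from collections import defaultdict

def split_into_daily_traces(json_text, timestamp_key="timestamp"):
    """Group application records by calendar day: each recorded day
    becomes one trace for the process miner."""
    traces = defaultdict(list)
    for record in json.loads(json_text):
        day = record[timestamp_key][:10]  # 'YYYY-MM-DD' prefix of the ISO stamp
        traces[day].append(record)
    return dict(traces)
```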
4.1.6 Requirement mapping
The tables below evaluate the process mining algorithms with regard to the properties specified by requirement R4.2.2.
Table 4 evaluates the performance of the inductive miner algorithm.
Prediction time The generation of behavioral models (Petri nets) is relatively fast; a model is expected to be obtained in less than 30 seconds [5], but its evaluation may take more time depending on the model complexity. A behavioral model with high generalization may be difficult to evaluate due to the huge number of possibilities. With simple models, the generation and the evaluation of the Petri net can be done in a few hundred milliseconds.
Scalability The article [5] highlights the scalability of a variant of the inductive miner algorithm for discovering processes. It was tested on a complex input XES file (77 513 traces, 358 278 events and 3 300 activities), and the result was given in less than 30 seconds.
Consistency Not applicable at this stage of the project since we do not use the model to predict (the attacks) yet.
Automated execution Currently, the process mining part of the pipeline is not automated. The user has to select the inputs and adjust the parameters manually. A script will be developed to build and evaluate models in a more automated manner.
Table 4 Inductive Miner Performance
Table 5 evaluates the performance of the transition system miner algorithm.
Prediction time The generation of behavioral models (Petri nets) appears to be slower than with the inductive miner algorithm. In the article [4], the worst-case complexity of the algorithm is said to be exponential with respect to the size of the log.
Scalability A huge number of states and activities might be a problem for this algorithm. To find the behavioral model that best fits the log, the fitness has to be evaluated for each possible final marking. The complexity of this algorithm can be exponential with respect to the size of the log. Therefore, overly complex input files should be avoided.
Consistency Not applicable at this stage of the project since we do not use the model to predict (the attacks) yet.
Automated execution The algorithm requires knowledge of the final state of the model. As for the inductive mining algorithm, a script has to be developed to evaluate the fitness efficiently and find the models that adequately describe the system in an automated manner.
Those services are usually enriched by open elemental AI tools, e.g. Keras, TensorFlow, OpenPose, h2o, which are applied either specifically in the Zinrai context using a cloud service (PaaS) or on premise. In this way, Zinrai is a best-of-breed approach utilizing the technologies in the Zinrai tool set most suitable for the specific problem. Hence, a wide range of tools, each bringing its specific environment and APIs, regularly extends the Zinrai tool box.
Among those tools FUJITSU recommends:
• Clustering: scikit-learn
• Normalization: scipy, scikit, numpy
• Topological Data Analysis: scikit-tda
• Deep Learning: TensorFlow
5.2 Potential application to SecureIoT scenarios and datasets

A major objective of applying a standardized tool set based on services of the Zinrai platform for detecting anomalies in SecureIoT was to utilize available building blocks for fast development.
However, recent observations illustrate the limitations of this kind of building-block approach in terms of flexibility and accuracy. The application needs to be mapped to the defined usage scenarios, and those modules lack training for specific problems beyond their scope.
Moreover, in the last two years there has been a shift in the AI market towards a commoditization of available AI frameworks. Given the high investment required for effective in-house solutions, it is significant that existing open-source machine learning frameworks have become relatively easy to use and that high-precision results can be obtained for almost any use case. Various manufacturers, e.g. Facebook, Microsoft and Google, also provide publicly available AI frameworks, which are adapted to their backend systems.
Based on this dynamic landscape, it currently makes sense to adapt the approach and assess common open AI frameworks for their suitability and feasibility in the SECaaS services. Due to the simplified deployment, applicability and improved results of today's publicly available AI frameworks, Fujitsu regularly recommends the application of Keras in combination with the latest TensorFlow version for rapid development in AI-related projects. This combination has demonstrated its broad applicability and its suitability for the simple and rapid development of solutions for a wide range of models.
Beyond this, we see a promising development for rapid prototyping with the h2o framework, which may be assessed in the project.
5.2.3 Requirement mapping

The table below evaluates the algorithm with regard to the properties required by requirement R4.2.2.
Prediction time Prediction time depends on the selected model and the underlying infrastructure. TensorFlow supports optimal performance through recommendations to the developer. Moreover, major
6.3 Integration in IoT platforms

We are currently working on using this approach with deployments of IoT systems based on FIWARE (Figure 32). We will collect the data of the IoT systems compiled by the SecureIoT data probes and use it as input for the analysis of malicious behavior. This will allow us to perform predictive cybersecurity analysis of the system and raise alerts about possible threats before they can impact the system.
Figure 32: Use of L-ADS in FIWARE-aware SecureIoT devices
Although still at a preliminary stage, we think this approach will allow us to work with IoT devices implemented with FIWARE or other IoT platforms, simply by adapting the input for the data probes of each system.
Application in use cases
Currently we plan to use this approach in the connected cars use case, which will benefit greatly from it due to the large quantity of data exchanged and the need for fast response times given the criticality of the system. In a first evaluation, we found that the cybersecurity requirements of the use case fit the benefits of this approach, both in terms of response and reaction times. Regarding the other use cases, we are evaluating its applicability to them.
Requirement mapping
In the following, we present how this approach fits the properties specified by the requirements of R4.2.2.
Prediction time The generation of the training models depends on how much data is provided. A usual batch of information used in the testing/evaluation process of this tool (approximately 6,000 records) takes approximately 10 minutes. On the other hand, the evaluation and response of the analytics is immediate. Of course, the training time depends on the number of features to be considered for evaluation.
Scalability Due to how it works, each instance of the L-ADS must be deployed in a network node so that the traffic is accessible to it. Therefore, no scalability issues have arisen so far.
Consistency Accuracy is good according to the testing we performed, but we are working on improving it using several different models and sets of features.
Automated execution The configuration of the entry point (packets) is manual, but the rest of the processes are automatic. Providing data for the training is also manual and needs to be done before the monitoring and evaluation of traffic can start. The alarms and events generated are also provided automatically, and can then be used for dashboards, reports, user evaluation, etc.
Additionally, and following requirement R4.2.3:
Action Model training and data analysis
Data Required Netflow data of the analyzed datasets
Data Type Different features according to what we want to analyze
Desired model format Refining for a more mature version
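As a hedged illustration of this input mapping, the sketch below projects NetFlow-like records onto a configurable feature set, as one would before training or evaluating the L-ADS. The field names are hypothetical and not the actual probe schema:

```python
def flow_features(flow, selected=("duration", "packets", "bytes")):
    """Project one NetFlow record (a dict) onto the configured feature
    set; field names are illustrative, not the real L-ADS schema."""
    return [float(flow[name]) for name in selected]

def build_matrix(flows, selected=("duration", "packets", "bytes")):
    """One row per flow, ready to feed a training or evaluation batch."""
    return [flow_features(f, selected) for f in flows]
```

The `selected` tuple plays the role of the "different features according to what we want to analyze" noted above.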