This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 779899. It is the property of the SecureIoT consortium and shall not be distributed or reproduced without the formal approval of the SecureIoT Management Committee. The content of this report reflects only the authors’ view. The Innovation and Networks Executive Agency (INEA) is not responsible for any use that may be made of the information it contains.

Project Acronym: SecureIoT
Grant Agreement number: 779899 (H2020-IoT03-2017 - RIA)
Project Full Title: Predictive Security for IoT Platforms and Networks of Smart Objects

DELIVERABLE
Deliverable Number: D4.3
Deliverable Name: Tools and Techniques for Predictive IoT Security
Dissemination level: PU
Type of Document: R
Contractual date of delivery: M11
Deliverable Leader: INRIA
Status & version: Final - V1.0
WP / Task responsible: INRIA
Keywords: Predictive algorithms, Process mining, Deep learning

Abstract (few lines): This deliverable introduces three major types of machine-learning algorithms for predictive security: process mining, variational autoencoders (deep learning) and behavioral analysis. All of them will be used to predict potential anomalies of the monitored IoT systems. Descriptions and initial results are given along with open-source solutions that can support their implementation.

Deliverable Leader: Jérôme François (INRIA)
Contributors: Nikos Kefalakis (INTRA), Abdelkader Lahmadi (INRIA), Remi Badonnel (INRIA), Adrien Hemmer (INRIA), Jérôme François (INRIA), Juergen Neises (FUJITSU), Thomas Walloschker (FUJITSU), Jose Fran Ruiz (ATOS), Mariza Konidi (INTRA)
Reviewers: Sofianna Menesidou (UBI), Stylianos Georgoulas (INTRA)
Approved by: Stylianos Georgoulas (INTRA)
Table of Figures
Figure 1: Anatomy of the Security Intelligence Layer
Figure 2: Process pipeline
Figure 3: Example of a Petri net
Figure 4: Data pre-processing block
Figure 5: Process mining block
Figure 6: Petri net deduced from process tree
Figure 7: Directly-follow graph obtained from refined data
Figure 8: Cuts of the directly-follow graph [4]
Figure 9: Sequence cut of L
Figure 10: Exclusive choice cut of L1
Figure 11: Loop cut of L3
Figure 12: Sequence cut of L5
Figure 13: Parallel cut of L2
Figure 14: How to transform part of the process tree
Figure 15: Parameters to characterize a state
Figure 16: Transition system with states defined by two maximum previous ones
Figure 17: Definition of a region S1 to S2 with S2 in S'
Figure 18: Definition of a region S1 to S2 with S1 in S'
Figure 19: Definition of a region S1 to S2 with S1 and S2 (not) in S'
Figure 20: Petri net generated by the transition system mining algorithm
Figure 21: Description of a data point extracted from the dataset
Figure 22: Refined data (XES file)
Figure 23: Behavioural model generated by the inductive mining algorithm (without filtering)
Figure 24: Behavioural model generated by the inductive mining algorithm (with filtering, threshold = 0.07)
Figure 25: Behavioural model generated by the inductive mining algorithm (low number of activities, K=1)
Figure 26: Extract of the behavioural model generated by the inductive mining algorithm (high number of activities, K=100)
Figure 27: An example of a short VAE, based on [7]
Figure 28: The general fully connected node (perceptron)
Figure 29: The rectifier linear unit function (ReLU) [9]
Figure 30: The latent representations of our testing data (CAN dataset)
Figure 31: High-level description of the L-ADS
Figure 32: Use of L-ADS in FIWARE-aware SecureIoT devices
D4.3 - Tools and Techniques for Predictive IoT Security,
Version: v1.0- Final, Date 30/11/2018
2 Requirements

In deliverable D2.2, the analysis led to the identification of particular requirements related to each task. Four of them are provided for T4.2 and thus fall within the scope of this deliverable:
• R4.2.1: Data should be protected during processing by predictive algorithms
• R4.2.2: Predictive analytics must discover and predict threats and vulnerabilities in a timely, scalable, consistent and automated manner
• R4.2.3: Support of multiple prediction algorithms and models
• R4.2.4: Prediction algorithms should describe their constraints

In this deliverable of task T4.2, we particularly aim to investigate different approaches and techniques (R4.2.3). In particular, we propose to work closely on deep learning and process mining methods applied in an IoT context, while also considering existing FIWARE enablers. For each of the proposed methods, this deliverable will assess the following properties from R4.2.2:
• Prediction time: how long the prediction techniques take to process the data and return valuable results, such as predicting a threat. In this project, this cannot be considered solely as the absolute time to execute the algorithm. Later, when used in conjunction with mitigation, this time must be estimated with respect to the time needed to properly trigger those counter-measures. Indeed, if a counter-measure is very simple and can be applied very fast, there is no need for predictions on a long-term horizon. Hence, we expect more concrete evaluations in subsequent deliverables produced in T4.2.
• Scalability: IoT data are very heterogeneous. A qualitative analysis of the first datasets provided in SecureIoT was performed in D4.1. Scalability thus concerns both the volume and the heterogeneity of the provided data. We will therefore carefully evaluate how our proposals can meet this requirement.
• Consistency: the predictions of the proposed algorithms, and the mitigations they trigger, have to be consistent with the monitored system states and the presence of threats. Consistency will be measured using the performance metrics available for each proposed algorithm. For example, for process mining methods, we measure the precision, the fitness and the generalization of the inferred system models to assess their consistency while predicting future states. Decisions based on the same rulesets and training conditions shall not be contradictory, even if positioned on various levels of the SecureIoT architecture.
• Automated execution: in this deliverable, we will quantify the complexity of configuring the proposed techniques to evaluate the degree of human intervention that is required.
R4.2.4 is focused on the further integration of the algorithms. For each algorithm, we will thus specify what the inputs are, optional or required, and the constraints on these inputs if they exist. R4.2.1 is less coupled with the predictive techniques themselves and more related to the process that happens beforehand: the data collection. Therefore, we consider as a mandatory
4 Predictive techniques for IoT Security

4.1 Prediction models using process mining

We present in this section the process mining methods used to generate prediction models to
infer monitored system models and predict their deviations and anomalies. Process mining
methods have been widely used for the analysis and the building of workflow models for business
process management applications through the analysis of their event logs. However, in this
project we apply them in a new context to build predictive models for IoT systems to identify
their anomalies when they deviate from their expected states. To be applicable, process mining requires us to refine the raw data we have in the project, as it cannot directly interpret continuous values as states. In that context, we detail the prior data processing required to transform raw data into refined data interpretable by a process miner, the algorithms supporting the process mining
activity and how they are used to generate behavioral models for prediction, and finally different
evaluation metrics.
4.1.1 Overview
The processing pipeline, detailed in Figure 2, summarizes the overall architecture of our
predictive approach for supporting IoT Security. It is composed of three main blocks
corresponding to the data preprocessing block, the process mining block and the prediction
block. The first two correspond to the ISTE and the last one relates to the TEE. This pipeline takes as input raw data, which correspond either to training datasets used by the process miner or to live monitoring data used for prediction purposes based on the behavioral models produced by the process miner.
Figure 2: Process pipeline
During the training phase, the raw data have first to be transformed, during a data pre-processing
step, to generate refined data interpretable by the process miner. We are considering a
commonly-used process miner tool, called ProM, that requires an input file specified according
to the XML eXtensible Event Stream (XES) format [1] representing event logs. We have
considered three datasets in this deliverable, provided by LuxAi, ISPRINT and IDIADA, which describe application data in the JSON format. We have mostly used the dataset provided by
the log are synchronized, otherwise it is equal to 1. The goal of the alignment method is to find
the optimal alignment, i.e. the alignment with the minimal cost.
To illustrate the alignment method more concretely, the Petri net given in Figure 6 is used as an example with the trace < a, b, f, e >. Table 1 provides a possible (maximum-cost) configuration for this alignment. The symbol “>>” means that the events in the model and in the event log are desynchronized, in which case 1 is added to the alignment cost. The optimal-cost alignment for this example is given in Table 2.

Move Log   | a  | b  | f  | e  | >> | >> | >> | >> | >> | >> | >>
Move Model | >> | >> | >> | >> | τ  | a  | b  | τ  | τ  | e  | τ

Table 1 Maximum cost alignment

Move Log   | >> | a  | b  | >> | >> | f  | e  | >>
Move Model | τ  | a  | b  | τ  | τ  | >> | e  | τ

Table 2 Optimal cost alignment
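The cost bookkeeping behind these alignment tables can be sketched in a few lines of Python. This is an illustrative reconstruction, not project code: an alignment is modelled as a list of (log move, model move) pairs, where ">>" marks a desynchronized side and "tau" a silent model move.

```python
SKIP = ">>"   # no move on this side
TAU = "tau"   # silent (invisible) model move, cost 0

def alignment_cost(alignment):
    """Sum the cost of an alignment: synchronized moves and silent
    model moves cost 0, every other desynchronized move costs 1."""
    cost = 0
    for log_move, model_move in alignment:
        if log_move == model_move:                 # synchronized move
            continue
        if log_move == SKIP and model_move == TAU:  # silent model move
            continue
        cost += 1                                  # move on log or model only
    return cost

# The two alignments of Tables 1 and 2 for the trace <a, b, f, e>
maximum = [("a", SKIP), ("b", SKIP), ("f", SKIP), ("e", SKIP),
           (SKIP, TAU), (SKIP, "a"), (SKIP, "b"), (SKIP, TAU),
           (SKIP, TAU), (SKIP, "e"), (SKIP, TAU)]
optimal = [(SKIP, TAU), ("a", "a"), ("b", "b"), (SKIP, TAU),
           (SKIP, TAU), ("f", SKIP), ("e", "e"), (SKIP, TAU)]

print(alignment_cost(maximum))  # → 7
print(alignment_cost(optimal))  # → 1
```

Running it yields a cost of 7 for the maximum-cost alignment and 1 for the optimal one, where only the move on f is desynchronized.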
Once the alignment is done, it is possible to quantify the performance of the behavioral models
with regard to the four metrics detailed below:
- Fitness: this metric indicates whether the generated model, noted P, can replay the event log, noted L, in an accurate manner. The closer this metric is to 1, the more the model is capable of replaying the given log. The value of this metric is calculated with the formula below.
Fitness(P, L) = 1 − fcost(P, L) / (Move(L) + Length(L) × Move(P))
Fcost(P,L) is the optimal alignment cost between L and P, Move(L) is the total cost of
desynchronized moves on the log, Move(P) is the same total cost on the model, and
Length(L) is the number of events in the log. The denominator of the formula represents
the maximum possible value of the total alignment cost, when there is not a single
synchronized move between the log L and the model P in the optimal alignment.
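The formula can be implemented literally; the numbers below are made-up illustrative values, not taken from the running example.

```python
def fitness(fcost, move_log, move_model, length):
    """Fitness(P, L) = 1 - fcost(P, L) / (Move(L) + Length(L) * Move(P)),
    implemented directly from the definition above."""
    return 1.0 - fcost / (move_log + length * move_model)

# Hypothetical values: optimal alignment cost 1, Move(L) = 4,
# Move(P) = 3, and a log of 4 events
print(fitness(1, 4, 3, 4))  # → 0.9375
```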
- Generalization: this metric indicates whether the model P is general enough to include behaviors that are not in the log L. The closer this metric is to 1, the more general the model is (i.e. the more unknown behaviors can be played by the model P). Maximizing this value helps to avoid overfitting, as it allows the model to adapt to a new set of events.
Calculating and using the generalization metric is not trivial, because it refers to unseen
examples and how the model reacts to them. As given in [8], the objective is to consider the relationship amongst the number of activities (noted w) leaving a state, the number of times this state was visited (noted n), and the probability of discovering an activity not seen before the next time the given state is visited, noted pnew(w,n). The value of this metric is calculated with the formula below.
Generalization(P, L) = 1 − ( Σ_{e∈L} pnew(w, n) ) / Length(L)
According to [9], the pnew(w,n) value can be estimated as follows:
pnew(w, n) = w(w + 1) / (n(n − 1))   if n ≥ w + 2
pnew(w, n) = 1                       otherwise
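The estimator is straightforward to implement; the stdlib-only sketch below (illustrative, not the ProM implementation) takes one (w, n) pair per event of the log:

```python
def p_new(w, n):
    """Estimated probability of seeing a not-yet-observed activity the
    next time the state is visited (estimator from [9])."""
    return w * (w + 1) / (n * (n - 1)) if n >= w + 2 else 1.0

def generalization(visits):
    """visits: one (w, n) pair per event e in the log L."""
    return 1.0 - sum(p_new(w, n) for w, n in visits) / len(visits)
```

A state left by 2 activities and visited 10 times gives p_new = 2·3/(10·9) ≈ 0.067, while rarely visited states (n < w + 2) contribute 1 and pull the generalization down.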
- Precision: this metric indicates whether the events in the log L follow the model P strictly. The closer this metric is to 1, the fewer additional behaviors not described in the log can be played by the given model. Maximizing this value helps to avoid underfitting, and thus over-generalization. The value of this metric is calculated with the formula below.
Precision(P, L) = (1 / Length(L)) × Σ_{e∈L} ( Fired(e) / Enabled(e) )
S is the state right before executing the event e in the log. Fired(e) is the number of activities of S already activated, Enabled(e) is the number of possible activities of S, and Length(L) is the number of events in the log.
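Given the per-event Fired(e) and Enabled(e) counts, the precision formula reduces to an average of ratios; a minimal sketch:

```python
def precision(events):
    """events: one (fired, enabled) pair per event e in the log L."""
    return sum(fired / enabled for fired, enabled in events) / len(events)

# Two events: one state where 1 of 2 enabled activities was already fired,
# one where every enabled activity has been fired
print(precision([(1, 2), (2, 2)]))  # → 0.75
```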
- Simplicity: this metric characterizes how simple the model is. Its value can typically be given by the number of states and transitions of the model. At the moment, this metric is not the most important one: simplifying the behavioral model makes the fitness and precision drop, and our goal is of course to build models that accurately describe the observed system.
4.1.5 Application to SecureIoT preliminary dataset

We will illustrate the application of the processing pipeline described in Figure 2 with the ISPRINT
dataset. The data included in the iSprint dataset comes from a simulation from CloudCare2U, a solution allowing chronic disease patients to live a life as normal as possible. Information in the dataset comes from room sensors, like temperature and illuminance, or from particular devices used by the patients, like heartbeat monitors (D4.1).
The principle of process mining algorithms is to work with different traces that record the behavior of the observed system several times. For this dataset, the JSON file containing application data can easily be split by considering that each recorded day stands for a new set of events and therefore a new trace. Let us focus on the JSON file (“bathroom_environment_it.json”) to bring
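The day-based splitting described above can be sketched as follows. The record layout (a JSON array of objects carrying an ISO timestamp field) is an assumption for illustration and not the exact iSPRINT schema:

```python
import json
from collections import defaultdict

def split_into_daily_traces(json_text, timestamp_key="timestamp"):
    """Group application records by calendar day: each recorded day
    becomes one trace for the process miner."""
    traces = defaultdict(list)
    for record in json.loads(json_text):
        day = record[timestamp_key][:10]  # 'YYYY-MM-DD' prefix of the ISO stamp
        traces[day].append(record)
    return dict(traces)
```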
4.1.6 Requirement mapping
The tables below evaluate the process mining algorithms with regard to the properties specified by requirement R4.2.2.
Table 4 evaluates the performance of the inductive miner algorithm.
Prediction time The generation of behavioral models (Petri nets) is relatively fast; a model is expected to be obtained in less than 30 seconds [5], but its evaluation may take more time depending on the model complexity. A behavioral model with high generalization may be difficult to evaluate due to the huge number of possibilities. With simple models, the generation and the evaluation of the Petri net can be done in a few hundred milliseconds.
Scalability The article [5] highlights the scalability of a variant of the inductive miner algorithm for discovering processes. It was tested on a complex input XES file (77 513 traces, 358 278 events and 3 300 activities), and the result was given in less than 30 seconds.
Consistency Not applicable at this stage of the project since we do not use the model to predict (the attacks) yet.
Automated execution Currently, the process mining part of the pipeline is not automated. The user has to select the inputs and adjust the parameters manually. A script will be developed to build and evaluate models in a more automated manner.
Table 4 Inductive Miner Performance
Table 5 evaluates the performance of the transition system miner algorithm.
Prediction time The generation of behavioral models (Petri nets) appears to be slower than with the inductive miner algorithm. In the article [4], the worst-case complexity of the algorithm is said to be exponential with respect to the size of the log.
Scalability A huge number of states and activities might be a problem for this algorithm. To find the behavioral model that best fits the log, the fitness has to be evaluated for each possible final marking. The complexity of this algorithm can be exponential with respect to the size of the log. Therefore, overly complex input files should be avoided.
Consistency Not applicable at this stage of the project since we do not use the model to predict (the attacks) yet.
Automated execution The algorithm requires knowledge of the final state of the model. As for the inductive mining algorithm, a script has to be developed to evaluate the fitness efficiently and find the models that adequately describe the system in an automated manner.
Those services are usually enriched by open elemental AI tools, e.g. Keras, TensorFlow, OpenPose, h2o, which are applied either specifically in the Zinrai context using a cloud service (PaaS) or on premise. In this way, Zinrai is a best-of-breed approach utilizing the technologies in the Zinrai tool set most suitable for the specific problem. Hence, a wide range of tools, each bringing its specific environment and APIs, regularly extends the Zinrai tool box.
Among those tools FUJITSU recommends:
• Clustering: scikit-learn
• Normalization: scipy, scikit, numpy
• Topological Data Analysis: scikit-tda
• Deep Learning: TensorFlow
5.2 Potential application to SecureIoT scenarios and datasets

A major objective of applying a standardized tool set based on services of the Zinrai platform for detecting anomalies in SecureIoT was to utilize available building blocks for fast development.
However, recent observations illustrate the limitations of this kind of building-block approach in terms of flexibility and accuracy. The application needs to be mapped to the defined usage scenarios, and those modules lack training for specific problems beyond their scope.
Moreover, in the last two years there has been a shift in the AI market towards a commoditization of available AI frameworks. Given the high investment required for effective in-house solutions, it is significant that existing open-source machine learning frameworks have become relatively easy to use and that high-precision results can be obtained for almost any use case. Various manufacturers, e.g. Facebook, Microsoft and Google, also provide publicly available AI frameworks, which are adapted to their backend systems.
Based on this dynamic landscape, it currently makes sense to adapt the approach and assess common open AI frameworks for their suitability and feasibility in the SECaaS services. Due to the simplified deployment, applicability and improved results of today's publicly available AI frameworks, Fujitsu regularly recommends the application of Keras in combination with the latest TensorFlow version for rapid development in AI-related projects. This combination has demonstrated its broad applicability and its suitability for the simple and rapid development of solutions for a wide range of models.
Beyond this, we see a promising development for rapid prototyping with the h2o framework, which may be assessed in the project.
5.2.3 Requirement mapping

The table below evaluates the algorithm with regard to the properties required by requirement R4.2.2.
Prediction time Prediction time depends on the selected model and the underlying infrastructure. TensorFlow supports optimal performance through recommendations to the developer. Moreover, major
6.3 Integration in IoT platforms

We are currently working on using this approach with deployments of IoT systems based on FIWARE (Figure 32). We will collect the data of the IoT systems compiled by the SecureIoT data probes and use it as input for the analysis of malicious behavior. This will allow us to perform predictive cybersecurity analysis of the system and raise alerts about possible threats before they can impact the system.
Figure 32: Use of L-ADS in FIWARE-aware SecureIoT devices
Although still at a preliminary stage, we think this approach will allow us to work with IoT devices implemented with FIWARE or other IoT platforms, simply by adapting the input for the data probes of each system.
Application in use cases
Currently we plan to use this approach in the connected cars use case, which will benefit greatly from it due to the large quantity of data exchanged and the need for fast response times given the criticality of the system. In a first evaluation, we found that the cybersecurity requirements of the use case fit the benefits of this approach, both in terms of response and reaction times. Regarding the other use cases, we are evaluating its applicability to them.
Requirement mapping
In the following, we present how this approach fits the properties specified by the requirements of R4.2.2.
Prediction time The generation of the training models depends on how much data is provided. A usual batch of information used in the testing/evaluation process of this tool (approximately 6,000 records) takes approximately 10 minutes. On the other hand, the evaluation and response of the analytics is immediate. Of course, the training time depends on the number of features to be considered for evaluation.
Scalability Due to how it works, each instance of the L-ADS must be deployed in a network node so that the traffic is accessible to it. Therefore, no scalability issues have arisen so far.
Consistency Accuracy is good according to the testing we performed, but we are working on improving it using several different models and sets of features.
Automated execution The configuration of the entry point (packets) is manual, but the rest of the processes are automatic. Providing data for the training is also manual and needs to be done before the monitoring and evaluation of traffic can start. The alarms and events generated are also provided automatically, and can then be used for dashboards, reports, user evaluation, etc.
Additionally, and following requirement R4.2.3:
Action Model training and data analysis
Data Required Netflow data of the analyzed datasets
Data Type Different features according to what we want to analyze
Desired model format Refining for a more mature version
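As a hedged illustration of this input mapping, the sketch below projects NetFlow-like records onto a configurable feature set, as one would before training or evaluating the L-ADS. The field names are hypothetical and not the actual probe schema:

```python
def flow_features(flow, selected=("duration", "packets", "bytes")):
    """Project one NetFlow record (a dict) onto the configured feature
    set; field names are illustrative, not the real L-ADS schema."""
    return [float(flow[name]) for name in selected]

def build_matrix(flows, selected=("duration", "packets", "bytes")):
    """One row per flow, ready to feed a training or evaluation batch."""
    return [flow_features(f, selected) for f in flows]
```

The `selected` tuple plays the role of the "different features according to what we want to analyze" noted above.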