ICT, STREP
FERARI ICT-FP7-619491
Flexible Event pRocessing for big dAta aRchItectures
Collaborative Project
D1.3
Application Scenario and Prototype Report
01.02.2016 – 31.01.2017 (preparation period)
Contractual Date of Delivery: 31.01.2017
Actual Date of Delivery: 31.01.2017
Author(s): Damir Bogadi (HT), Marko Štajcer (PI),
Fabiana Fournier (IBM), Inna Skarbovsky
(IBM), Izchak Sharfman (TECHNION),
Michael Kamp (FhG), Michael Mock (FHG)
Institution: HT
Workpackage: WP1
Security: PU/CO (Appendix B and C)
Nature: R
Total number of pages: 51
Project funded by the European Community under the Information and Communication Technologies Programme Contract ICT-FP7-619491
Project coordinator name: Michael Mock
Project coordinator organisation name:
Fraunhofer Institute for Intelligent Analysis
and Information Systems (IAIS)
Revision: 1
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
URL: http://www.iais.fraunhofer.de
Abstract:
FERARI aims at developing a general-purpose architecture for flexible,
communication efficient distributed complex event processing on massively
distributed streams of data. While the architecture, developed in WP2 and described in D2.3, is general purpose and hence not restricted to specific use cases or application domains, it was instantiated and validated in the FERARI project in two specific use cases working with real-world data. The instantiation of the FERARI architecture in these use cases and its evaluation on real-world data is addressed in WP1. The use-case applications run on top of the general-purpose FERARI platform, work on real-world data provided by HT, and were instantiated in a test bed on cluster hardware installed at HT. The purpose of this document (D1.3) is to explain the setup of the test bed hardware, the definition of the evaluation criteria tested (number of fraudsters detected, detection time achieved, and theoretical revenue savings), the performance results in terms of latency and throughput, and the usability evaluation (end-user testing of the frontend).
Revision history
Administration Status
Project acronym: FERARI ID: ICT-FP7-619491
Document identifier: D1.3 Application Scenario and Prototype Report (01.02.2016 – 31.01.2017)
Leading Partner: HT
Report version: 1
Report preparation date: 22.01.2017
Classification: PU / CO (Appendix B and C)
Nature: REPORT
Author(s) and contributors: Damir Bogadi (HT), Marko Štajcer (PI), Fabiana Fournier
(IBM), Inna Skarbovsky (IBM), Izchak Sharfman
(TECHNION), Michael Kamp (FhG), Michael Mock (FHG)
Status: - Plan
- Draft
- Working
- Final
x Submitted
Copyright
This report is © FERARI Consortium 2017. Its duplication is restricted to the personal use
within the consortium and the European Commission. www.ferari-project.eu
Document History

Version  Date        Author             Change Description
0.1      21/12/2016  Damir Bogadi (HT)  First draft
1.0      20/01/2017  Damir Bogadi (HT)  Draft after integration of parts
1.1      30/01/2017  Damir Bogadi (HT)  Finalization for submission
Table of Contents 1 Introduction .......................................................................................................................................... 2
1.1 Purpose and Scope of the Document ........................................................................................... 2
1.2 FERARI Setting ............................................................................................................................... 3
1.3 Workpackage 1 Description .......................................................................................................... 4
1.4 Relationship with other Documents ............................................................................................. 5
2 Testing Environment ............................................................................................................................. 6
2.1 Prototype Environment ................................................................................................................ 6
2.2 Privacy Constraints for Publication ............................................................................................... 9
2.2.1 Privacy Requirements ........................................................................................................... 9
3 Scenarios Evaluation ........................................................................................................................... 10
3.1 Evaluation Procedure and Tasks Performed ............................................................................... 10
3.2 Fraud Use Case ............................................................................................................................ 12
3.2.1 Business Evaluation Criteria ................................................................................................ 12
3.2.1.1 KPIs event-driven implementation ................................................................................. 13
3.2.1.1.1 Event types ................................................................................................................ 14
3.2.1.2 Dataset ............................................................................................................................ 14
3.2.2 Usability Evaluation ............................................................................................................. 15
3.2.2.1 FERARI Dashboard Overview .......................................................................................... 17
3.2.3 Performance Evaluation ...................................................................................................... 20
3.2.3.1 Recall and Precision ........................................................................................................ 21
3.2.3.1.1 Alarm X ...................................................................................................................... 21
3.2.3.1.2 Alarm Y ...................................................................................................................... 22
3.2.3.1.3 Alarm Z ...................................................................................................................... 22
3.2.3.2 Timing of detection ......................................................................................................... 22
3.2.3.3 Latency ............................................................................................................................ 22
3.2.3.4 Throughput ..................................................................................................................... 25
3.2.3.5 Revenue at Risk ............................................................................................................... 25
3.2.4 Other Test Results ............................................................................................................... 26
3.2.4.1 Ease of Context Configuration or Event Processing ........................................................ 26
3.2.4.2 Interoperability ............................................................................................................... 28
3.2.5 Summary of our Findings in the Fraud Use Case ................................................................ 28
3.3 System Health Monitoring Use Case ........................................................................................... 30
4 Summary ............................................................................................................................................. 31
5 Appendix A - Hardware configuration ................................................................................................ 32
6 Appendix B – Fraud Alarms Description and Results (CONFIDENTIAL) .............................................. 33
6.1 Event processing agents.............................................................................................................. 33
6.1.1 EPA1: Alarm X ...................................................................................................................... 33
6.1.2 EPA2: Alarm Y ...................................................................................................................... 35
6.1.3 EPA3: Alarm Z ...................................................................................................................... 36
6.2 Timing of detection in more details ............................................................................................ 37
7 Appendix C - DSLAM Descriptive Statistics Results (CONFIDENTIAL) ................................................. 38
List of Tables Table 1: Acronyms ........................................................................................................................................ ix
Table 2: Data sources, periods and volumes of collection ........................................................................... 8
Table 3: Criteria Evaluation Matrix ............................................................................................................. 10
Table 4: KPI’s for fraud use case ................................................................................................................. 13
Table 5: Event types for the KPI EPN .......................................................................................................... 14
Table 6: Usability evaluation criteria categories ......................................................................................... 16
Table 7: MEGS' flagged calling numbers for alarm X .................................................................................. 21
Table 8: PROTON's derivations (situations) times for alarm X versus MEGS .............................................. 22
Table 9: Performance Results Summary (90th percentile values) .............................................. 24
Table 10: Revenue saved by earlier detection ............................................................................................ 25
Table 11: PROTON's derivations (situations) times for alarm X versus MEGS – real values ...................... 37
Table 12: The probability of a port's LSI on the next day given its current LSI ............................................ 39
List of Figures Figure 1: Relationship between WP1 (Prototype) and WP2 (Software Platform) ........................................ 2
Figure 2: Three Phases of the FERARI Project and WP1 ............................................................................... 4
Figure 3: Deliverables in WP1 ....................................................................................................................... 4
Figure 4: Integration of the prototype within the FERARI architecture ....................................................... 8
Figure 5: KPI EPN ......................................................................................................................................... 14
Figure 6: Login form .................................................................................................................................... 18
Figure 7: The derived events in graph representation ............................................................................... 18
Figure 8: List of the derived events ............................................................................................................. 19
Figure 9: List of the most frequent calls ..................................................................................................... 19
Figure 10: Information about subscriber .................................................................................................... 20
Figure 11: List of calls that led to the selected derived event .................................................................... 20
Figure 12: Mesos cluster topology .............................................................................................................. 24
Figure 13: The event processing (EPA) of type SUM that specifies Alarm X ............................................... 27
Figure 14: The definition of the input event (CallPOPDHW), i.e. the list of attributes and their types for this event type ............................................................................................................................................ 27
Figure 15: The definition of the segmentation context, in our case, we partition the input events (CallPOPDHW) by their calling_number attribute ...................................................................................... 28
Figure 16: Event recognition process for Alarm X EPA ............................................................................... 34
Figure 17: Context for Alarm X EPA per single customer ........................................................................... 35
Figure 18: Event recognition process for Alarm Y EPA ............................................................................... 35
Figure 19: Event recognition process for Alarm Z EPA ............................................................................... 36
Figure 20: Daily percentage of port link loss events and line stability index .............................................. 38
Figure 21: Daily percentage of port with LSI <= 2 ....................................................................................... 39
Figure 22: Distribution of link losses over the time of day. .......................................................................... 39
Acronyms
CEP Complex Event Processing
DSLAM Digital Subscriber Line Access Multiplexer
FERARI Flexible Event pRocessing for big dAta aRchItectures
PROTON PROactive Technology Online - IBM tool
WP Work Package

Table 1: Acronyms
1 Introduction
1.1 Purpose and Scope of the Document
In WP1 we develop the basic application scenarios on which all further development is dependent. This WP
is driven by the end user in the project, i.e. HT.
Specific objectives of WP 1 are:
Selection and definition of the application scenarios in the telecommunications;
Definition of testing and evaluation criteria for the end users at HT;
Setting up of a test bed in HT and at the project partner’s local sites; and
Implementation and evaluation of scenarios to demonstrate the advantage of FERARI with
respect to the state of the art as well as to demonstrate its business value.
Figure 1: Relationship between WP1 (Prototype) and WP2 (Software Platform)
Figure 1 summarizes the main characteristics of WP2 and WP1, showing the differences and relationship
between these two work packages. The architecture developed in WP2 is made available as open source
platform in the FERARI open source repository https://bitbucket.org/sbothe-iais/ferari. It is intended to be
of general purpose and usable in any application domain that works on distributed streaming data. It provides
flexible mechanisms for complex event processing, libraries and run-time components for the distributed
execution of applications. To this end, the FERARI open source platform makes use of communication-efficient protocols for in-situ processing and distributed execution of complex event processing runtimes. Application
development of large-scale Big Data streaming applications is supported and significantly simplified by the
FERARI open source platform. However, specific application-dependent algorithms like fraud detection or
system health monitoring are not part of the open source platform, but will be developed for the purpose of
validating the FERARI platform against the use cases developed and described in WP1. These specific use-
case related applications running on top of the general purpose FERARI platform are working on real-world
data provided by HT and will be instantiated in a test bed on cluster hardware installed at HT. As these
applications are very specific and closely related to the HT data, they will not be open sourced (see also DoW
p. 6 of 28 in the WP1 description).
The purpose of this document is to explain:
Testing performed against business driven KPIs. Please note that the use case specific
evaluation criteria being described here do not stand for their own, but extend the
general purpose goals of the FERARI architecture such as scalability at very large scale,
communication efficiency, and flexibility in the expressiveness of the Complex Event
Processing.
Usability evaluation as performed by HT’s experts. Although FERARI is of general
purpose and not restricted or related to any specific kind of GUI, the evaluation of a specific use case application makes it necessary to provide one.
Recall and Precision of the event-driven application. PROTON’s results are compared to
the MEGS system that is currently in place in HT.
Performance of the FERARI architecture including latency.
This report is structured as follows: Chapter 2 gives an overview of the testing environment, Chapter 3 describes the scenario setup, evaluation criteria and test results, and Chapter 4 summarizes the testing results.
1.2 FERARI Setting
The FERARI project aims to develop a highly scalable distributed streaming architecture supporting complex
event processing in a communication efficient manner. A key element of the architecture will be
communication efficient distributed methods for monitoring global functions on globally distributed states by
partitioning of the global function to distributed local functions that communicate only if needed. The general
applicability of these methods will be demonstrated in various application scenarios, including distributed
online machine learning. The use cases for the evaluation of the framework and the machine learning
algorithms are real world use cases from Hrvatski Telekom. One use case focuses on fraud discovery in
mobile networks, which includes SIMbox fraud, premium rate service fraud and roaming fraud amongst
others. The other use case is system health monitoring, a problem attracting increasing attention because current failure detectors typically model normal behavior from historical data. This approach is becoming ever more challenging as network components grow more complex and dynamic with evolving technologies and growing data consumption.
1.3 Workpackage 1 Description
WP1 will develop the basic application scenarios on which all further development crucially depends. This
WP is driven by the end user (HT) in the project.
Figure 2: Three Phases of the FERARI Project and WP1
Work package 1 (WP1) “Applications scenarios, Test Bed, Prototype” develops the basic application
scenarios and use cases of the FERARI project. The WP is driven by the end users in the project, who selected and defined the use cases and performed testing of the final product based on predetermined evaluation criteria.
Figure 3: Deliverables in WP1
WP 1 - Applications scenarios, Test Bed, Prototype
D1.1 Application Scenario Description
and Requirement Analysis
D1.2 Final Application Scenarios and
Description of Test Environment
D1.3 Application Scenario and
Prototype Report
Specific objectives as per Description of Work are:
1. Selecting and defining the application scenarios in the telecommunications domain;
2. Definition of testing and evaluation criteria for the end users at HT;
3. Setting up of a test bed both at HT and at the project partner’s local sites; and
4. Implementation and evaluation of scenarios to demonstrate the advantage of FERARI with
respect to the state of the art as well as to demonstrate its business value.
1.4 Relationship with other Documents
The general purpose architecture described in D2.3 was instantiated in WP1 to handle the use cases described
in D1.2 and test results reported in D1.3. Fraud detection was implemented via the instantiation of specific
fraud rules in the Complex Event Processing – see detailed description in D4.1 and its application in the
flexible event model described in D4.3. D3.2 evaluates a communication efficient in-situ implementation of a
fraud rule on the fraud data described in D1.1, and D1.2. The distributed CEP optimizer described in D5.2,
and D5.3 was used to find optimal distributed placements for the complex event expressions that implement
the fraud rules.
2 Testing Environment
2.1 Prototype Environment
HT has installed a dedicated cluster hardware as test bed for the FERARI project. This test bed served two
main purposes:
Data repository for real-world data: Real-world data from HT is copied from the HT
operational infrastructure on the FERARI test bed. It is anonymized on the test bed by
the partner PI. Anonymized data is made available to the rest of the consortium.
Installation of FERARI use cases: Use-case specific application running on the FERARI
architecture was installed at the test bed. Recall that one final goal of the FERARI
architecture is to support communication efficient in-situ operations, which implies that the data is processed close to where it is created (i.e. in the HT infrastructure). However, since FERARI is a research project, experimenting directly on the HT operational infrastructure was not possible.
The test bed environment of the FERARI project is located in the Utrine data center in Zagreb, where the FERARI server is hosted. All consortium partners received remote access to the environment. All partners have access to anonymized data, while only Poslovna Inteligencija has access to raw data, which it anonymizes and stores in a separate file directory for the other consortium members to use.
The Hardware and Software configuration of the FERARI test bed is as follows:
Testing environment is placed in Data Center Utrina, Kombolova 2, 10 000 Zagreb
Physical Server; CentOS Linux release 7.1.1503 (Core)
Hardware Components : HP configuration – details in Appendix A
System SW installed :
o OS CentOS,
o Backup App. Networker
Application SW installed:
o NodeJS with attached modules
o Redis
o MySQL
o Java
o Anonymization engine
Additionally: Storm/FERARI platform will be installed and configured
Scheme of File System directories with volume of data collected:
o System directories
o oriData filesystem – contains raw data
o anonData filesystem – contains anonymized data
In a similar way, data from the HT network infrastructure is collected and copied into the FERARI test bed
hardware. The following Table summarizes the data connections and amount of data being transferred
between HT infrastructure and the FERARI test bed:
Data Sources and Volumes
HT data sources with period of collecting
Data Warehouse (CDR) – batch load
Fraudsters data – batch load (weekly)
VMware – Cloud Health logs – on daily basis
DSLAM (POLraw, POLraw_1 data bases) –On daily basis
DSLAM HW measurement (Netcool) – On daily basis
Data Volumes
DWH CDR (currently): 0.4 TB – planned additional 0.6 TB in next batch
Cloud Health logs (currently): 0.5 TB
DSLAM Line data: 1.9 TB
Total: 2.8 TB
Table 2: Data sources, periods and volumes of collection
The following Figure shows the relationship between the FERARI test bed and the FERARI architecture
described in D2.2 and how the data for fraud detection and system health monitoring is collected and
processed on the FERARI test bed. Note that the data from the operational HT infrastructure is copied into
the FERARI test bed.
Figure 4: Integration of the prototype within the FERARI architecture
The complete FERARI runtime system is installed on the FERARI test bed. The (anonymized) real-world
data, previously being collected and copied from the HT infrastructure to the FERARI test bed, is being fed
as streaming data into the FERARI Architecture. Use case specific algorithms implemented in the FERARI
architecture evaluate the streaming data and generate use case specific output, which can be fed into a use
case specific GUI.
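As a minimal sketch of how the collected data can be fed as a stream, the following hypothetical Python replayer reads anonymized CDR rows and emits them with their original relative timing. The file layout, the epoch-seconds timestamp format, and the function name are assumptions for illustration; the actual FERARI ingestion uses the platform's own components.

```python
import csv
import time

def replay_cdr_stream(path, speedup=1.0):
    """Yield anonymized CDR rows one by one, sleeping between events to
    preserve the relative spacing of their timestamps, optionally
    compressed by `speedup`. Assumes epoch-second timestamps in a
    `call_start_date` column (a hypothetical layout)."""
    with open(path, newline="") as f:
        prev_ts = None
        for row in csv.DictReader(f):
            ts = float(row["call_start_date"])
            if prev_ts is not None:
                # Wait for the (compressed) gap between consecutive events.
                time.sleep(max(0.0, (ts - prev_ts) / speedup))
            prev_ts = ts
            yield row
```

A large `speedup` replays the trace almost instantly, which is also the idea behind the time compression applied to the dataset in Section 3.2.1.2.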
2.2 Privacy Constraints for Publication
Due to privacy restrictions, the real thresholds used in HT for detecting fraud cannot be published. However, these values were disclosed to consortium members in order to perform valid testing, i.e. to enable a comparison of results.
2.2.1 Privacy Requirements
Sensitive customer data is used as input for FERARI and therefore strict privacy policies and anonymisation
rules have to be applied. In month 3 of the project, new EU Opinion on Anonymisation caused re-evaluation
of anonymisation techniques previously in use.
New anonymisation techniques were developed by Poslovna Inteligencija and evaluated by Hrvatski Telekom
privacy team. Approval of new anonymisation techniques was required (and successfully obtained) from
Deutsche Telekom Headquarter in Bonn.
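The deliverable does not detail the new anonymisation techniques. As one common, purely illustrative approach (not necessarily what Poslovna Inteligencija implemented), keyed hashing yields stable but irreversible pseudonyms, so records belonging to the same subscriber remain joinable across data sets without revealing the subscriber number:

```python
import hashlib
import hmac

def pseudonymize_msisdn(msisdn: str, secret_key: bytes) -> str:
    """Replace a subscriber number with a stable pseudonym via keyed
    hashing (HMAC-SHA256): the same MSISDN always maps to the same
    token, but the mapping cannot be reversed without the key."""
    digest = hmac.new(secret_key, msisdn.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated token, still join-stable
```

Keeping the key with the anonymizing partner only (here, PI) ensures the other consortium members can correlate events per subscriber without access to the raw identifiers.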
3 Scenarios Evaluation
This chapter describes the use cases investigated in FERARI and testing methodologies and business values
in detail. Two real world use cases from Hrvatski Telekom were selected to test the solutions to be developed
within FERARI. These use cases are:
Fraud Mining; and
System health monitoring.
Each use case is described in a dedicated section.
3.1 Evaluation Procedure and Tasks Performed
When evaluating the FERARI solution, we took several criteria into consideration:
Criteria Evaluation Through KPIs

CRITERIA                          FRAUD USE CASE                 SYSTEM HEALTH MONITORING USE CASE
Processing time                   Fraud case detection time      Failure prediction time
Number of false alarms / value    Number of fraudsters;          False alarm ratio;
of the proposed solution          revenue at risk due to fraud   failure precision percentage

Table 3: Criteria Evaluation Matrix
For each use case testing we focused on:
Correct implementation of theory in the solution;
Testing of the solution automation;
The ability of key components to communicate among each other;
Correctness of output data format and content; and
The defined KPIs.
For the purpose of testing we performed the following:
Formulated evaluation criteria, verification procedure and feedback report guidelines
Created usability matrix which was used to perform usability evaluation of the implemented
Fraud use case by HT's end users
Held workshops to perform dedicated evaluation sessions engaging both end users and
developers
Evaluated the developed software tools by testing functionalities, accessibility, respect of user
needs
Evaluated integration and execution times
3.2 Fraud Use Case
This section defines the evaluated KPIs and business value from FERARI fraud detection use case.
Our goal is to assess the accuracy of the set of event rules specified for the mobile fraud use case. To this end,
we have implemented a new set of event rules applying real thresholds and tested it on a real data set as
described in the following sub-sections.
Note that, as per HT’s privacy requirements, the thresholds shown in the event patterns are masked with “???”. The implementation, however, uses the real values, so performance evaluation is possible.
3.2.1 Business Evaluation Criteria
Our aim was to benchmark HT’s fraud detection system against the results derived from the FERARI fraud use case. To make the results comparable, we set the real thresholds used in HT’s fraud detection system in the FERARI implementation. The result of the test is the set of subscribers flagged as fraudsters.
For the purpose of results evaluation, the following KPIs have been determined:
Fraud Use Case KPIs

KPI: Number of fraudsters
Description: Number of customers where fraudulent behavior is detected. Our goal is to detect as many customers who exhibit fraudulent behavior as possible; the expectation from the FERARI project is to increase the number of detected fraudsters. When counting the fraudsters detected in a data population, we have to take the false positive and false negative alarm ratios into account: declaring the whole population fraudulent would yield a perfect detection count but also the highest percentage of false positives. In general, the false positive rate should be no higher than that produced by the current fraud mining system deployed in HT.
Validation criteria: Increase in effectiveness or value with the proposed solution. Detect at least as many fraudsters as are detected today.

KPI: Fraud case detection time
Description: Time elapsed between the moment a fraud case happens and the moment HT detects it. We aim to detect fraudulent behavior as soon as possible and to take measures to prevent it. The expectation from the FERARI project is to decrease fraud detection time, meaning we will be able to detect fraud cases sooner than now.
Validation criteria: Velocity improvement. Decrease fraud detection time by 10 percent. The current fraud case detection time is 20 minutes.

KPI: Revenue at risk due to fraud
Description: Estimated sum of charges generated by fraudsters that will never be collected by HT. Fraudsters generate significant charges through fraudulent behavior and never pay the resulting invoices, which represents a direct revenue loss for the service provider. We aim to detect all fraudsters and to detect fraudulent behavior as soon as possible, so that fraudsters have limited time to generate fraudulent charges. This will in turn decrease the revenue at risk and bring more value.
Validation criteria: Increase in value with the proposed solution. Decrease revenue at risk due to fraud by 15 percent.

Table 4: KPIs for fraud use case
The first KPI is evaluated in terms of recall and precision as described in Section 3.2.3.1.
The second KPI is evaluated by comparing the detection times of the flagged calling numbers by MEGS to
their counterpart detection time derived by PROTON. A description of this evaluation is detailed in Section
3.2.3.2.
The third KPI is evaluated by multiplying revenue by the saving in detection time.
In addition, we also provide numbers related to the average latency of our application to give an estimation of
its performance (see Section 3.2.3.3).
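The computation behind the third KPI can be sketched as follows. The charge rate below is an illustrative placeholder; only the 20-minute current detection time and the 10 percent improvement target come from Table 4.

```python
def revenue_saved(charge_rate_per_min, old_detection_min, new_detection_min):
    """Revenue at risk avoided by earlier detection: the charges a
    fraudster would have generated during the detection-time difference."""
    return charge_rate_per_min * (old_detection_min - new_detection_min)

# Hypothetical charge rate of 5 units/min; detection in 18 minutes
# instead of the current 20 (the 10 percent improvement target):
saving_per_case = revenue_saved(5.0, 20.0, 18.0)  # 10.0 units per case
```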
3.2.1.1 KPIs event-driven implementation
The new event processing network (EPN) consisting of three event processing agents (EPAs) is shown in
Figure 5 and detailed in the following sections. For the sake of simplicity we only show the EPAs and the
events flow in the network.
In this new EPN we fire alerts in the cases described in detail in Appendix B – Fraud Alarms Description and Results (CONFIDENTIAL). As noted above, the implemented EPN runs with the real thresholds.
We follow the same semantics and naming conventions as in previous deliverables and show each EPA in terms of its event recognition process and context (please refer to the work package 4 deliverables on the FERARI project website1).
1 http://www.ferari-project.eu/key-deliverables/
Figure 5: KPI EPN
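To make the EPA semantics concrete, the following is an illustrative sketch (not the PROTON implementation) of a SUM-type EPA such as the one behind Alarm X: a segmentation context partitions the input calls by calling_number, and a situation is derived once the per-partition sum of charges exceeds the threshold, which is masked here behind a constructor parameter in line with HT's privacy requirements.

```python
from collections import defaultdict

class SumEPA:
    """Illustrative sketch of a SUM-type event processing agent: within a
    segmentation context keyed by calling_number, accumulate the charges
    of incoming Call events and derive an Alarm X situation once the
    running sum exceeds the threshold (the real HT value is confidential
    and masked as "???" in the deliverable)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.sums = defaultdict(float)  # one partition per calling number

    def on_call(self, event):
        key = event["calling_number"]
        self.sums[key] += event["total_call_charge_amount"]
        if self.sums[key] > self.threshold:
            return {"name": "Alarm X", "calling_number": key,
                    "amount_spent": self.sums[key]}
        return None
```

The attribute names (calling_number, total_call_charge_amount, amount_spent) follow Table 5; the in-memory dictionary stands in for PROTON's context management.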
3.2.1.1.1 Event types
Four event types have been defined that comprise the event inputs, outputs/derived, and situations as shown
in Table 5. As in our previous application, the input event is a CDR record, “Call” for short.
Event name               Payload
Billed_msisdn (or Call)  object_id; billed_msisdn; call_service; call_start_date; calling_number; call_number; call_direction; tap_related; conversation_duration; other_party_tel_number; total_call_charge_amount; customer_activation_date; customer_type
Alarm X                  calling_number; amount_spent; call_start_dates
Alarm Y                  calling_number; amount_spent; call_start_dates
Alarm Z                  calling_number; number_of_calls; call_start_dates
Table 5: Event types for the KPI EPN
3.2.1.2 Dataset
The dataset comprised 80 GB of masked CDRs over a time span of 30 days, corresponding to July 2016. We
counted 18,426,138 events per 24 hours on a random day, meaning an injection rate of roughly 213 events per second.
Extrapolating from the number of records per day, we arrive at approximately 553 million records per month.
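These rates follow directly from the daily count; a quick arithmetic check using the figures above:

```python
events_per_day = 18_426_138                 # counted on a random day of the dataset
events_per_second = events_per_day / (24 * 3600)
events_per_month = events_per_day * 30
print(round(events_per_second))             # 213 events per second
print(events_per_month)                     # 552,784,140, i.e. approx. 553 million per month
```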
We had to solve the following issues:
• Running the application on the FERARI server at HT, due to privacy issues as well as the size of the input file.
• Running the application as if events were arriving in real time (according to their timestamps).
• Compressing the timescale so that we do not have to wait one month to get the results of a run.
The first issue was solved by installing ProtonOnStorm on FERARI's server and running the application on
HT's premises.
We tackled the last two issues by applying our utility PETITE (Proton EvenT Injection & Time compression),
developed in the scope of the SPEEDD project2. PETITE provides a mechanism that takes the
input events, injects them into the engine at very high rates (much higher than in reality imposed by
their timestamps), and processes them very fast without altering the logic of the application (for a description of
PETITE refer to3).
The PETITE script is an external utility to PROTON that, given an ordered input file and a total elapsed time
for processing all the events in the input file (a compression "ratio"):
• Checks the feasibility of the requested ratio. This means that events can be injected in a way that
the engine can process without their being out of order, that contexts do not become too small, and
that a minimal interval between input events is preserved. For example, given an input file whose
first input event occurs on 1st Dec 2014 and whose last input event occurs on 7th Dec 2014, if we
target running all these events in only one hour, then our ratio is 168 (we "compress" or reduce all
times by 168, as we move from one week, or 168 hours, to one hour).
• Alters the temporal contexts in the JSON definition file for PROTON in such a way that the ratio
is met and the application logic is maintained.
• Alters the time intervals between input events so that they are compressed to the required
injection rate. The original date and time remain intact for the purposes of date comparisons
and manipulation; however, an additional column of timestamps is added to the input file, so
that it can be used as the "OccurenceTime" attribute in the Timed File Input adapter of PROTON
for injecting events based on those intervals.
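The interval-compression step can be illustrated with a minimal sketch (this illustrates the idea only, not the actual PETITE implementation; the function name is ours):

```python
from datetime import datetime, timedelta

def compress_timestamps(event_times, target_hours):
    """Compress a sorted list of event datetimes so that the whole
    span fits into target_hours, preserving relative spacing.
    Returns the ratio and the compressed injection timestamps."""
    span = event_times[-1] - event_times[0]
    ratio = span.total_seconds() / (target_hours * 3600.0)
    base = event_times[0]
    return ratio, [base + (t - base) / ratio for t in event_times]

# one week of daily events (starting 1st Dec 2014) squeezed into one hour
times = [datetime(2014, 12, 1) + timedelta(days=d) for d in range(8)]
ratio, injected = compress_timestamps(times, target_hours=1)
print(ratio)                        # 168.0
print(injected[-1] - injected[0])   # 1:00:00
```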
In our KPIs application, we compressed one month of input data into 8 hours.
As aforementioned we tested for the detection of three event patterns: Alarm X, Alarm Y, and Alarm Z.
3.2.2 Usability Evaluation
Based on the defined usability evaluation matrix, HT's experts evaluated the Fraud use case along the following
segments:
Fraud Use Case – Usability Evaluation
Criteria Results Description
Features and functionality  Features and functionality meet common user goals and objectives to a limited
extent. Additional tracking options for suspect fraudsters should be added to the
solution (e.g. flagging users). Also, additional filtering options and help should be added.
2 http://speedd-project.eu/
3 https://github.com/ishkin/Proton/
Homepage/starting page  The starting page provides a clear snapshot and overview of the content, features
and functionality available. Layout is clear and uncluttered with sufficient 'white
space'.
Navigation  Navigation is easier than in the current system in HT. The navigation is intuitive
and consistent. When adding complexity to the solution, care should be
taken not to clutter the interface and make the navigation
more complex and less intuitive.
Search  A search function is available for each field, which offers quick, Excel-like filtering.
However, the current system implemented in HT offers much more complex filtering
options (e.g. subscriber type, filtering by flags set by the observer, filtering by the
number of alarms triggered by a subscriber, etc.).
Control and feedback  Users can easily undo, go back, and change or cancel actions. This seems more
advanced than what is implemented in the current solution in HT.
Errors Error handling is not present on the Dashboard side of the prototype and as such
could not be evaluated.
Content and text  Terms, language and tone used are consistent (e.g. the same term is used
throughout). The language, terminology and tone used are appropriate and readily
understood by the target audience.
Performance Site performance does not inhibit the user experience (e.g. slow page downloads
and long delays). Various user technologies are supported (e.g. browsers,
resolutions, computer specs). No reliability issues were found that could inhibit
the user experience.
Table 6: Usability evaluation criteria categories
The following scoring was used:
Very poor (less than 29) - Users are likely to experience very significant difficulties using the site
and might not be able to complete a significant number of important tasks.
Poor (between 29 and 49) - Users are likely to experience some difficulties using the site and
might not be able to complete some important tasks.
Moderate (between 49 and 69) - Users should be able to use the site and complete most
important tasks, however the user experience could be significantly improved.
Good (between 69 and 89) - Users should be able to use this site with relative ease and should
be able to complete the vast majority of important tasks.
Excellent (more than 89) - The site provides an excellent user experience for users. Users should
be able to complete all important tasks on the site.
The overall usability score for the FERARI dashboard was evaluated as GOOD, with an absolute score of 77/100.
The architectural integration of the end user application into the FERARI platform is described in detail in
D2.3.
3.2.2.1 FERARI Dashboard Overview
The FERARI dashboard contains the minimum information needed for basic analysis. The MEGS system, however,
offers more filtering and settings options. In addition, MEGS enables better tracking of history per subscriber and better
case visualization. Visualization is one of the key tools used by HT's experts in analyzing potential fraud cases.
The FERARI frontend should be upgraded with a minimum set of options in order to provide real value
as a day-to-day tool:
• A more complex dashboard
• More filtering options
• More information about subscribers from CRM and other systems
• Graphical representations of events
• More statistical information about the subscribers
• Ability to set a case to a certain state (e.g. checked, unchecked, monitoring, etc.)
• A setup window for creating new alarms or editing existing ones
• A separate, more complex visualization of the traffic created by a subscriber
• Fraud expert notification via e-mail or an SMS gateway
Given that FERARI's advantage lies in its speed over the existing system, HT will consider the effort needed to
implement the underlying architecture and upgrade the frontend.
Figure 6 shows the login form, the first screen of the application.
Figure 6: Login form
Figure 7 shows a graphical representation of the derived events detected by the system in
the last hour, including a real-time counter of the incoming events.
Figure 7: The derived events in graph representation
Figure 8 shows all derived events detected by the system.
Figure 8: List of the derived events
Figure 9 shows the most frequent calls and the number of their occurrences.
Figure 9: List of the most frequent calls
Figure 10 gives information about a subscriber as well as export functionality.
Figure 10: Information about subscriber
Figure 11 lists the calls that led to the selected derived event.
Figure 11: List of calls that led to the selected derived event
3.2.3 Performance Evaluation
In order to assess the accuracy of our event-driven application, we performed two types of analysis:
calculation of recall and precision (see Section 3.2.3.1) and of latency (see Section 3.2.3.3).
We would like to note that the event processing speed currently in place is tuned to actual business needs. The
centralized system in HT could be tuned to higher performance if requested. Therefore, the results presented
below are compared to the actual production status of the platform and do not represent the maximal
performance of the fraud management system used in HT.
3.2.3.1 Recall and Precision
In order to assess the accuracy of our event-driven application in terms of recall and precision, we compared
PROTON's results to the results produced by MEGS on the same data set (July 2016) for the three alarms we had
(this relates to the first business KPI, which aims to detect at least as many fraudsters as before). That is, we
compared the calling numbers in the derived events from PROTON to the ones detected by MEGS, for each
of the three event rules. MEGS' flagged calling numbers were provided to us by partners HT and PI in
csv files containing the flagged calling numbers (Billed_msisdn) along with the detection time by MEGS,
one csv file for each of the three event patterns.
3.2.3.1.1 Alarm X
The csv file contained 10 flagged calling numbers, as shown in Table 7. Table 8 shows PROTON's
derivations on the data set for this alarm. As can be seen from the tables, calling numbers 3859932QJACY
and 3859942YKEWM (rows 1 and 3 in Table 7) were flagged by MEGS but not by PROTON. On the other
hand, the calling numbers color-coded green (385985CYDWI, 3859587OFFWQ, 3859941JSEPA) in Table 8
were found by PROTON but not by MEGS.
Row # Billed_msisdn MEGS’ detection date
1 3859942YKEWM 01/07/2016
2 3859932FHJRR 01/07/2016
3 3859932QJACY 01/07/2016
4 3859966BYSFE 04/07/2016
5 3859942IHCQC 06/07/2016
6 3859816SNBYQ 12/07/2016
7 385986LZGVG 13/07/2016
8 3859942HYCEA 16/07/2016
9 3859947RTRKA 26/07/2016
10 3859980ZBDXE 28/07/2016
Table 7: MEGS' flagged calling numbers for alarm X
Row # in Table 7  Billed_msisdn  PROTON's detection time  MEGS' detection time
2 3859932FHJRR 01/07/2016-00:39:25 Refer to Section 6.2
4 3859966BYSFE 04/07/2016-15:09:59 Refer to Section 6.2
5 3859942IHCQC 06/07/2016-00:29:35 Refer to Section 6.2
6 3859816SNBYQ 12/07/2016-13:37:17 Refer to Section 6.2
7 385986LZGVG 13/07/2016-17:56:32 Refer to Section 6.2
8 3859942HYCEA 16/07/2016-20:03:51 Refer to Section 6.2
3859587OFFWQ 16/07/2016-21:01:38 Refer to Section 6.2
385985CYDWI 25/07/2016-13:42:34 Refer to Section 6.2
9 3859947RTRKA 26/07/2016-16:42:33 Refer to Section 6.2
10 3859980ZBDXE 28/07/2016-00:04:01 Refer to Section 6.2
3859941JSEPA 31/07/2016-13:00:29 Refer to Section 6.2
Table 8: PROTON's derivations (situations) times for alarm X versus MEGS
Further analysis in collaboration with experts in HT resulted in the following:
Rows 1 and 3 in Table 7 could not have been detected by PROTON, since the dates for these calling numbers
are July 1, meaning we would need events from June 30 in order to derive the corresponding derived
event. Therefore, we exclude these two numbers from our calculations of precision and recall. Regarding the
calling numbers found by PROTON and not found by MEGS, two of these three numbers were not in
the list of X alarms but were mistakenly entered under Y alarms in the MEGS annotation table (3859941JSEPA
and 385985CYDWI), since the annotation table was prepared manually. Therefore, these two situations are
considered true positives. With regard to the third situation, i.e. 3859587OFFWQ, the reason why it was not
detected by MEGS is not fully clear. We apply a conservative approach and consider this calling number a
false positive, though our drill-down of the pattern matching clearly shows that this calling number should be
flagged as fraudulent according to the event rule.
Therefore, we have a recall of 100% (we identified all fraudulent numbers marked by MEGS), while
the precision is 91% (assuming MEGS is the ground truth, we have 10 true positives and 1 false positive, i.e. 10/11).
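The computation behind these figures is standard set arithmetic; a small sketch (the calling numbers here are hypothetical placeholders):

```python
def precision_recall(detected, ground_truth):
    """Precision and recall of detected calling numbers
    against an annotated ground-truth set."""
    tp = len(detected & ground_truth)
    fp = len(detected - ground_truth)
    fn = len(ground_truth - detected)
    return tp / (tp + fp), tp / (tp + fn)

# 11 numbers flagged by the engine, 10 of which are in the ground truth
truth = {f"num{i}" for i in range(10)}
flagged = truth | {"num_extra"}
precision, recall = precision_recall(flagged, truth)
print(round(precision, 2), recall)  # 0.91 1.0
```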
3.2.3.1.2 Alarm Y
The csv file contained 95 flagged calling numbers. A similar analysis as for Alarm X resulted in a
precision of 97% (as before, assuming MEGS is the ground truth, we have 95 true positives and 3 false
positives) and a recall of 100%. Again, we apply a conservative approach and consider these calling numbers
false positives, though our drill-down of the pattern matching clearly shows that they should
be flagged as fraudulent according to the event rule.
3.2.3.1.3 Alarm Z
This type of alarm had not been working properly in MEGS for some time, so we were not able to get any
annotations. However, PROTON also did not detect any fraudulent calling
number for this alarm type, so we cannot draw any conclusions based on it.
3.2.3.2 Timing of detection
One of the main characteristics of complex event processing systems is their ability to derive situations in
real-time. This is extremely important in use cases such as fraud detection, since the sooner an operator
blocks a fraudulent mobile phone number, the higher the savings can be. To check whether we were able to
detect fraudulent calls before MEGS, we compared the timestamps of PROTON derived events to their
counterparts in the csv file.
Time saving results are significant both for Alarm X and Y. Further details regarding timing are discussed in
Section 6.2 due to privacy restrictions.
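The comparison itself reduces to timestamp subtraction in the "dd/mm/yyyy-HH:MM:SS" format used in the tables; a sketch with hypothetical timestamps (the real MEGS detection times are confidential):

```python
from datetime import datetime

FMT = "%d/%m/%Y-%H:%M:%S"

def minutes_saved(proton_ts, megs_ts):
    """Whole minutes gained by PROTON's earlier detection."""
    delta = datetime.strptime(megs_ts, FMT) - datetime.strptime(proton_ts, FMT)
    return int(delta.total_seconds() // 60)

# hypothetical example: PROTON fires 23 minutes before MEGS
print(minutes_saved("01/07/2016-00:39:25", "01/07/2016-01:02:25"))  # 23
```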
3.2.3.3 Latency
Latency is defined as the elapsed time between the detection time of the last input event required for a
pattern matching and the corresponding detection time of the output event. For example, assuming that the
(derived) event D is defined as a sequence of events (E1, E2, E3), the latency is measured as the time
from when an instance of E3 arrives in the system until the emission of the corresponding instance of D.
Of course, this definition does not apply to all event patterns. For instance, it is not applicable to 'absence'
event patterns or any other event patterns triggered by the expiration of a time window. Therefore, for
this definition to hold, we analyze the performance of PROTON on a set of "applicable" patterns with the
IMMEDIATE evaluation policy. All patterns in our KPI implementation use the IMMEDIATE
evaluation policy and are TREND or aggregation patterns, and are therefore applicable for latency analysis.
In order to correlate the derived event with the latest input event that triggered the derivation, we leverage the
feature of PROTON that allows attaching to a derived event the matching set of contributing input events.
Thus, given an instance of the derived event, one can easily obtain the list of events that contributed to
the event pattern, along with their timestamps, so it is straightforward to compute the latency as defined
above.
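Given the matching set attached to a derived event, the latency computation itself is trivial; a sketch (the dictionary layout of the matching set is our assumption for illustration, not PROTON's actual API):

```python
from datetime import datetime

def pattern_latency_ms(derived_emit_time, matching_set):
    """Latency = time from arrival of the last contributing
    input event to emission of the derived event, in ms."""
    last_input = max(e["detection_time"] for e in matching_set)
    return (derived_emit_time - last_input).total_seconds() * 1000.0

matching = [
    {"name": "Call", "detection_time": datetime(2016, 7, 1, 0, 39, 25, 100000)},
    {"name": "Call", "detection_time": datetime(2016, 7, 1, 0, 39, 25, 212000)},
]
print(pattern_latency_ms(datetime(2016, 7, 1, 0, 39, 25, 228000), matching))  # 16.0
```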
The scale of the data requires that the performance analysis of latency be done on a cluster infrastructure,
testing different numbers of workers. To this end, we exploited the infrastructure developed in the scope of the
SPEEDD2 project, comprising a cluster of four physical machines with the following configuration:
• CPU: 2 x Intel Xeon E5520 @ 2.27 GHz, 16 threads (8 cores)
• RAM: 12 GB ECC
• Disks: 2 x 1 TB (RAID1)
• NIC: 4 x 1 Gbps
• OS: Debian 8
On this cluster we have already carried out performance tests and have a benchmark of different patterns and
applications. In our case we only have two patterns, COUNT and SUM with the IMMEDIATE pattern policy,
for which we already have results, as detailed below.
The cluster has the Mesos4 framework installed, which manages the computational resources and thus simplifies
the task of cluster configuration and resource allocation. A single virtual machine runs on every physical
machine (to simplify maintenance and management), with the exception of one machine on which an additional small VM
runs that functions as a Mesos gateway server. The storm-mesos5 framework is used to run the STORM cluster on
Mesos, and the kafka-mesos6 framework is used to run the Kafka cluster on Mesos.
4 http://mesos.apache.org/
5 https://github.com/mesos/storm
6 https://github.com/mesos/kafka
The topology of the Mesos cluster is shown in Figure 12 below:
Figure 12: Mesos cluster topology
Table 9 lists the different configurations tested. As can be seen in the table, configurations that involve multiple
Kafka partitions and brokers were not tested in this version. The reason is that initial test results
demonstrated that the messaging layer in its minimal configuration (single broker, single executor for the kafka-storm
spout, 1-2 executors for the kafka-bolt) was operating significantly below its capacity, and further increasing the
messaging power would not improve the performance of the entire system. As can be seen from the table, adding
more workers improves latency.
Config  Number of workers  CEP parallelization factor  End-to-end latency (ms), 50 events/sec  End-to-end latency (ms), 500 events/sec
1       1                  1                           31                                      189
2       1                  2                           86                                      253
3       2                  4                           25                                      111
4       4                  8                           14                                      44.7
5       4                  16                          16                                      87
6       8                  16                          11                                      16
Table 9: Performance Results Summary (90th percentile values)
Since the injection rate required by our use case is around 200 events/second, we can see that with the
specified configuration we can reach a latency of no more than 16 ms (90th percentile; the average latency
is probably much lower). If the rate of events increases, or faster processing is required, we can add
more STORM workers to further reduce the processing latency.
3.2.3.4 Throughput
Throughput is defined as the maximum rate at which events can be processed. The complete HT mobile network
generates approximately 500 million call/SMS events per month, which implies that the required average
throughput for the complete HT network is less than 200 events per second. Nevertheless, the required real-time
throughput might be higher, since in a peak minute the HT network may generate up to 340 events per second,
and on some occasions even more. The highest recorded rate was about 1,750 events per second.
Experiments on the current FERARI server show that we were able to run a stable system with an injection rate
of 500 events per second, which is more than enough to handle the current load.
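The average-rate figure follows directly from the monthly volume:

```python
events_per_month = 500_000_000            # approximate HT network volume
avg_per_second = events_per_month / (30 * 24 * 3600)
print(round(avg_per_second))              # 193, i.e. under 200 events per second
```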
FERARI server hardware configuration:
• CPU: 2 x Intel Xeon E5-2660v2 @ 2.2GHz -- Cores : 10
• RAM: 16 GB
• Disks: 8x3TB
• NIC : 1Gbps and 10Gbps
Setting up the system in a distributed manner on HT's infrastructure will be considered once it goes into
production use. Cell towers do not allow additional data processing and cannot be used to process call events,
but HT already has a set of 80 probe servers in 6 locations across Croatia that can be used to run the
FERARI sites in a distributed manner. These servers are used for network monitoring and generate aggregated
information per call, including the A and B numbers, timestamps, and additional call attributes needed for fraud
detection. Using these servers, we can scale out our architecture and improve throughput far beyond what
would ever be needed.
3.2.3.5 Revenue at Risk
In order to demonstrate the real business value of the implemented prototype, we estimated the sum of charges
generated by fraudsters that will never be collected by HT. Our aim was to decrease the revenue at risk due to
fraud by 15 percent.
Table 10: Revenue saved by earlier detection
Revenue at risk
ALARM    % of revenue at risk saved in relation to alarm threshold value    % of revenue saved on average per fraud
Alarm X  62%                                                                30%
Alarm Y  403%                                                               30%
The data in the previous table was calculated under the following assumptions and considerations:
• The actual average value of one hour of fraud and the actual average value of a fraud were taken
into consideration for Alarms X and Y (with Alarm X having a triggering threshold value
6.5 times higher).
• This value was extrapolated to the total minutes saved by the earlier detection time.
• We divided this number by the alarm threshold value of each alarm (??? minutes;
refer to Section 6.2 for exact values).
• % of revenue at risk saved in relation to alarm threshold value = Saved revenue per
alarm / Alarm threshold value.
• Further on, we divided the average saved revenue of earlier detection by the average
fraud value in HT for the selected alarms to calculate the decrease of revenue at risk.
• % of revenue saved on average per fraud = Saved revenue per alarm / Average fraud
value.
Since the average time saved by earlier detection is some ??? minutes (refer to Section 6.2 for exact values),
we can calculate that for Alarms X and Y we would decrease the revenue at risk by some 30%, which is
significantly more than the 15% set in the KPI.
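The two ratios above can be written out as a small sketch; all numeric inputs below are hypothetical placeholders, since the actual thresholds and fraud values are confidential:

```python
def revenue_ratios(saved_revenue_per_alarm, alarm_threshold_value, avg_fraud_value):
    """The two percentages reported in Table 10."""
    vs_threshold = saved_revenue_per_alarm / alarm_threshold_value
    vs_avg_fraud = saved_revenue_per_alarm / avg_fraud_value
    return vs_threshold, vs_avg_fraud

# hypothetical inputs: 60 kn saved per alarm, threshold 100 kn, average fraud 200 kn
vs_threshold, vs_avg_fraud = revenue_ratios(60.0, 100.0, 200.0)
print(vs_threshold, vs_avg_fraud)  # 0.6 0.3
```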
3.2.4 Other Test Results
3.2.4.1 Ease of Context Configuration or Event Processing
Proton's authoring tool is used to define CEP applications, i.e. Event Processing Networks (EPNs), and to
deploy them to an engine through a web-based interface.
The Proton metadata file is a JSON file containing all EPN definitions created by the Proton authoring tool.
This includes definitions for event and action types, EPAs, contexts, and producers and consumers.
When the Proton runtime starts, it accesses the metadata file, loads and parses all the definitions, creates a thread
per input and output adapter, and starts listening for events incoming from the input adapters and forwarding
events to the output adapters.
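As an illustration only, a runtime could load such a metadata file along the following lines (the JSON structure here is a simplified, hypothetical stand-in for the actual Proton schema):

```python
import json

metadata = json.loads("""
{
  "epn": {
    "events":    [{"name": "Call"}, {"name": "AlarmX"}],
    "epas":      [{"name": "EPA1_AlarmX", "epaType": "Sum"}],
    "contexts":  [{"name": "By_calling_number", "type": "segmentation"}],
    "producers": [{"name": "CDRFile", "type": "TimedFile"}],
    "consumers": [{"name": "AlarmFile", "type": "File"}]
  }
}
""")

# list each definition section and the names it contains
for section, defs in metadata["epn"].items():
    print(section, [d["name"] for d in defs])
```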
Figure 13: The event processing (EPA) of type SUM that specifies Alarm X
Figure 14: The definition of the input event (CallPOPDHW), i.e. the list of attributes and their types for this event type
Figure 15: The definition of the segmentation context, in our case, we partition the input events (CallPOPDHW) by their calling_number attribute
3.2.4.2 Interoperability
ProtonOnSTORM provides an API to integrate into a standard STORM topology. This means that the system
benefits from all the various interoperability solutions the STORM platform has to offer. STORM provides
pluggable implementations for integration with different queuing and data systems as information
sources/sinks: messaging systems like Kestrel, RabbitMQ, Kafka, and JMS, and databases like MongoDB, RDBMSs,
Cassandra, etc. Since STORM has a large open-source community, this list of ready-to-use integration
solutions grows continuously.
3.2.5 Summary of our Findings in the Fraud Use Case
Our results are very positive. First, we achieved a very high percentage of recall and precision. This is
not surprising, as we specified the rules according to the domain experts and the way they are implemented in
MEGS. However, it shows that our implementation runs correctly and as expected.
More importantly, we were able to detect the alarms in real time, as opposed to the batch
processing currently performed by MEGS, enabling more rapid detection of frauds. We derive situations
on average ??? minutes earlier than MEGS (refer to Section 6.2 for exact values). This slack can be used by
the operators for earlier inspection and, if necessary, blocking of a specific mobile phone number.
Regarding the KPIs listed in Section 3.2.1, as we alert immediately when the pattern is matched and do not
wait until the end of the hour, the time savings are (MEGS processing time + the remaining time until the
end of the hour time window). As already mentioned, the average saving surpassed the expected KPI1
outcome (target of 18 minutes after collection). Regarding KPI2, we have only 1 false positive for Alarm X while
we have 3 false positives for Alarm Y (the reason for which is undetermined, since judging by the data and the
rule definition they should have been marked as fraud by MEGS as well), with a recall of 1 in both cases.
3.3 System Health Monitoring Use Case
In the third year of the FERARI project, the Technion looked into DSLAM performance data provided
by HT. The data consisted of hourly readings of measurements from more than 678,000 DSLAM ports. In
addition, HT provided logs of events related to the performance of the ports. Examples of such events
are a port successfully establishing a link with the endpoint, a port losing a link, or the endpoint equipment
losing power. Finally, a daily quality score, known as the Line Stability Index (LSI), was calculated by
HT for every port in the dataset. LSI values range from 2 to 5, where a score of 5 indicates a very stable line
and a score of 2 indicates a very unstable line. Poor line quality (a score of 2 or 3) was taken as an indication of
a failure in the DSLAM equipment.
After closely examining the data and receiving further clarification from HT, it became apparent that while
poor line quality can be a product of malfunctioning DSLAM equipment, it is also affected by the
performance of the endpoint equipment (e.g. a DSL modem at the subscriber's house or network equipment
connected to it) and by actions of the end user (powering down, restarting, or disconnecting such equipment).
Since it is impossible to isolate the effects of endpoint equipment failures and actions on line quality from
those of DSLAM equipment failures, and since an off-the-shelf DSL modem is more likely to fail than
telco-grade DSLAM equipment, predicting poor line quality based on DSLAM performance indicators seems to be an
intractable task.
Hence, we decided to continue the evaluation of system health monitoring on other real-world data, as
described in D3.3.
We present descriptive statistics of the data in Appendix C due to its confidential nature.
4 Summary
In Year 3 of the FERARI project we tested the solution in terms of business-defined KPIs, system performance, and general usability.
Overall, the results show the business benefits of complex event processing for business users, i.e. quicker reaction times and potential cost savings compared to the current speed of the solution implemented in HT (though the current system in HT could be improved to a certain extent). We have achieved all defined business KPIs: detection of at least the same number of fraudsters, better detection time, and decreased revenue at risk through better detection time. Additionally, latency is easily reduced by adding more STORM workers to the system if needed as the rate of injected events increases. Proton's authoring tool enables easier configuration of event processing, thus enabling easier deployment in a production environment. Though the solution interface for the Fraud use case enables first-level analysis, it requires some additional development in order to provide a minimum set of usability options for day-to-day use. However, given that all three business KPIs for the Fraud use case were achieved, this investment should have a positive ROI and shall be considered.
We have analyzed the DSLAM performance data and come to the conclusion that, while poor line quality of DSLAM devices can be a sign of a malfunction of the device itself, it can also be caused by poor endpoint equipment. Since it is currently impossible to isolate the effects of endpoint equipment failures and actions on line quality from those of DSLAM equipment failures, and given that an off-the-shelf DSL modem is more likely to fail than telco-grade DSLAM equipment, predicting poor line quality based on DSLAM performance indicators seems to be an intractable task. Therefore, we continued the evaluation of system health monitoring on other real-world data, as described in D3.3.
5 Appendix A - Hardware configuration
HP ProLiant DL380p Gen8 12 LFF Configure-to-order Server
HP DL380p Gen8 Intel Xeon E5-2660v2 (2.2GHz/10-core/25MB/95W) FIO Processor Kit
HP DL380p Gen8 Intel Xeon E5-2660v2 (2.2GHz/10-core/25MB/95W) Processor Kit
HP 16GB (1x16GB) Dual Rank x4 PC3-12800R (DDR3-1600) Registered CAS-11 Memory Kit
8 x HP 3TB 6G SATA 7.2K rpm LFF (3.5-inch) SC Midline 1yr Warranty Hard Drive
HP 2GB P-series Smart Array Flash Backed Write Cache
HP Ethernet 1Gb 4-port 331FLR FIO Adapter
HP Ethernet 10Gb 2-port 561T Adapter
HP 750W Common Slot Platinum Plus Hot Plug Power Supply Kit
HP iLO Advanced including 1yr 24x7 Technical Support and Updates Single Server License
HP Insight Control including 1yr 24x7 Support ProLiant ML/DL/BL-bundle Tracking License
HP 2U LFF BB Gen8 Rail Kit with CMA
6 Appendix B – Fraud Alarms Description and Results (CONFIDENTIAL)
It is important to note that, as per HT's privacy requirements, the thresholds shown in the event patterns are
masked with "???". However, the implementation uses the correct numbers, so
performance evaluation is possible.
In this new event processing network (EPN) we fire alerts in the following cases (for detailed descriptions of
each EPA see Section 6.1):
• This alarm detects all customers that spend ??? kn or more on voice premium rate services in a
period of 24 hours (EPA1, Alarm X).
• This alarm detects all customers that have been active ??? days or less and spend ??? kn or more on
voice premium rate services in a period of 24 hours (EPA2, Alarm Y).
• This alarm detects all customers that make ??? calls or SMSes to premium rate voice and SMS
VAS services in a 4-hour period (EPA3, Alarm Z).
Note the following:
• As previously noted, the implemented EPN runs with the real thresholds.
• Premium services are calling numbers with prefix 06.
• Voice services are CDRs where call_service = T11.
• SMS services are CDRs where call_service = T12.
6.1 Event processing agents
Henceforth, we describe the EPAs in the following order: event name; motivation; event recognition
process; contexts along with the temporal context policy; and pattern policies.
In the event recognition process we only show the steps that are relevant in the specific EPA,
while the others are greyed out. For the filtering step we show the filtering expression; for the matching step we
denote the pattern variables; and for the derivation step we denote the value assignments and calculations.
Please note that for the sake of simplicity we only show the assignments that are not copies of values (all other
derived event attribute values are copied from the input events). For attributes, we just denote their names
without the 'attribute_name.' prefix.
6.1.1 EPA1: Alarm X
Motivation: We are looking for customers with high usage of voice premium services as potential fraudsters. For
this type of alarm MEGS is triggered every hour (45 min after the full hour, e.g., 8:45, 9:45, and so on)
and looks for this pattern 24 hours backwards.
Event recognition process
Figure 16: Event recognition process for Alarm X EPA
Pattern policies
Evaluation Cardinality Repeated Consumption
IMMEDIATE SINGLE FIRST REUSE
Context
Segmentation: calling_number
Temporal (fixed sliding window):
• Initiator: 8:45 AM
• Duration: 24 hours
• Sliding period: 1 hour
Meaning: For each customer, the first window opens at 8:45 AM, and a new window opens every hour afterwards, so we have overlapping sliding windows. Each window lasts 24 hours. For example (see Figure 17), let’s assume that for a certain customer we detect one alarm X in the first open temporal window (in blue), one in the second (in azure), and one in the third (in green). Note that in the first temporal window we actually detect two derived events (the second one in grey); however, since the policies applied are IMMEDIATE and SINGLE, we fire only one derived event, as soon as it is detected. This derived event is fired in the scope of the second temporal window (in azure). The color of the events in each temporal window denotes the events specific to that window, while the blue events denote the events common to all open windows in our example (non-overlapping events can only occur in the tails of the windows, when new input events arrive).
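The overlapping-window behavior described above can be sketched as follows. This is an illustrative helper only (not part of the PROTON implementation; all names are ours): given an event timestamp, it lists every open sliding window (initiator 8:45 AM, 24-hour duration, 1-hour slide) that contains the event.

```python
from datetime import datetime, timedelta

def open_windows(event_time, first_open, duration, slide):
    """Return the [start, end) spans of all sliding windows containing event_time.

    Windows open at first_open and then every `slide`; each stays open for
    `duration`, so up to duration/slide windows overlap at any moment.
    """
    windows = []
    start = first_open
    while start <= event_time:
        end = start + duration
        if start <= event_time < end:
            windows.append((start, end))
        start += slide
    return windows

# A call event at 10:00 AM on the second day falls into every window opened
# during the preceding 24 hours, i.e. 24 overlapping windows.
event = datetime(2016, 7, 2, 10, 0)
wins = open_windows(event, datetime(2016, 7, 1, 8, 45),
                    timedelta(hours=24), timedelta(hours=1))
```

Since duration/slide = 24, a single input event can participate in up to 24 window evaluations, which is why the pattern policies (IMMEDIATE, SINGLE) matter for suppressing duplicate derivations.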
Figure 17: Context for Alarm X EPA per single customer
6.1.2 EPA2: Alarm Y
Motivation: We are looking for relatively new customers in the system with heavy use of voice premium services as potential fraudsters. For this type of alarm, MEGS is triggered every hour (45 min after the rounded hour, e.g., 8:45, 9:45, and so on) and looks for this pattern 24 hours backwards.
Figure 18: Event recognition process for Alarm Y EPA
Pattern policies
• Evaluation: IMMEDIATE
• Cardinality: SINGLE
• Repeated type: FIRST
• Consumption: REUSE
Context
Segmentation: calling_number
Temporal (fixed sliding window): Same as for Alarm X (see Figure 17).
• Initiator: 8:45 AM
• Duration: 24 hours
• Sliding period: 1 hour
6.1.3 EPA3: Alarm Z
Motivation: We are looking for customers with a large number of calls to premium rate voice and SMS VAS services within a 4-hour period as potential fraudsters. For this type of alarm, MEGS is triggered every hour (30 min after the rounded hour, e.g., 8:30, 9:30, and so on) and looks for this pattern 4 hours backwards.
Figure 19: Event recognition process for Alarm Z EPA
Pattern policies
• Evaluation: IMMEDIATE
• Cardinality: SINGLE
• Repeated type: FIRST
• Consumption: REUSE
Context
Segmentation: calling_number
Temporal (fixed sliding window): Same as for Alarm X and Alarm Y, except that the length of the window is 4 hours instead of 24 hours, and the initiator is 30 min after the rounded hour (8:30, 9:30, 10:30, and so on).
• Initiator: 8:30 AM
• Duration: 4 hours
• Sliding period: 1 hour
6.2 Timing of detection in more detail
Row # in Table 2 | Billed_msisdn | PROTON’s detection time | MEGS’ detection time
2  | 3859932FHJRR | 01/07/2016-00:39:25 | 01/07/2016-00:58:28
4  | 3859966BYSFE | 04/07/2016-15:09:59 | 04/07/2016-17:54:08
5  | 3859942IHCQC | 06/07/2016-00:29:35 | 06/07/2016-02:58:28
6  | 3859816SNBYQ | 12/07/2016-13:37:17 | 12/07/2016-15:52:00
7  | 385986LZGVG  | 13/07/2016-17:56:32 | 13/07/2016-18:57:35
8  | 3859942HYCEA | 16/07/2016-20:03:51 | 16/07/2016-21:46:32
   | 3859587OFFWQ | 16/07/2016-21:01:38 |
   | 385985CYDWI  | 25/07/2016-13:42:34 |
9  | 3859947RTRKA | 26/07/2016-16:42:33 | 26/07/2016-18:50:02
10 | 3859980ZBDXE | 28/07/2016-00:04:01 | 28/07/2016-02:53:53
   | 3859941JSEPA | 31/07/2016-13:00:29 |
Table 11: PROTON’s derivation (situation) times for alarm X versus MEGS – real values
In our case, MEGS runs the query for each of the alarms every hour. Therefore, theoretically, the maximum potential time that can be gained by PROTON is (60 + MEGS processing time) minutes, assuming that the detection is done just after the window is opened. In the worst case, that is, when the detection by PROTON is done only at the end of the temporal window, the potential time that can be gained is (MEGS processing time) minutes.
Calculating the differences between the detection times of MEGS and PROTON from Table 11, we see that PROTON detects earlier by an average of 115 minutes, with a maximum of 169 minutes and a minimum of 20 minutes. The results for alarm Y were almost the same, showing that we can obtain a significant time saving due to our real-time detection.
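As a sanity check, the per-row gains can be recomputed directly from the timestamp pairs in Table 11 (restricted to the rows detected by both systems); the short sketch below uses only the Python standard library, and all variable names are ours:

```python
from datetime import datetime

FMT = "%d/%m/%Y-%H:%M:%S"

# (PROTON detection time, MEGS detection time) pairs from Table 11,
# for the rows where both systems raised the alarm.
pairs = [
    ("01/07/2016-00:39:25", "01/07/2016-00:58:28"),
    ("04/07/2016-15:09:59", "04/07/2016-17:54:08"),
    ("06/07/2016-00:29:35", "06/07/2016-02:58:28"),
    ("12/07/2016-13:37:17", "12/07/2016-15:52:00"),
    ("13/07/2016-17:56:32", "13/07/2016-18:57:35"),
    ("16/07/2016-20:03:51", "16/07/2016-21:46:32"),
    ("26/07/2016-16:42:33", "26/07/2016-18:50:02"),
    ("28/07/2016-00:04:01", "28/07/2016-02:53:53"),
]

# Gain in whole seconds for each row: MEGS time minus PROTON time.
gains = [
    int((datetime.strptime(m, FMT) - datetime.strptime(p, FMT)).total_seconds())
    for p, m in pairs
]

avg_minutes = sum(gains) // len(gains) // 60  # average gain, floored to minutes
max_minutes = max(gains) // 60                # largest gain, floored to minutes
```

Flooring to whole minutes reproduces the average of 115 minutes and the maximum of 169 minutes quoted above.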
7 Appendix C - DSLAM Descriptive Statistics Results (CONFIDENTIAL)
In this section we present several characteristics of the network equipment data, focusing on a subset of the data for October 2016 and on the following types of daily events:
1. Ports that lost their link to the endpoint equipment at least once in a given day
2. Ports with an LSI score of 2 or 3 (“unstable” or “very unstable”)
Figure 20 depicts the daily percentage of ports that experienced one of the events described above between 01.10.2016 and 31.10.2016. During this period, an average of 9.94% of the ports experienced at least one link loss per day, and an average of 2.41% of the ports were scored as “unstable” or “very unstable”. These event rates reached a peak of 21.95% for ports with at least one link loss on 15.10.2016, and of 9.28% for ports with an LSI of 2 or 3 on 03.10.2016. Figure 21 focuses on the daily percentage of ports with an LSI score of 2 or 3.
Figure 22 depicts the distribution of link losses over the time of day. Most link losses (5.39%) occur between 10:00 AM and 11:00 AM. After 06:00 PM the probability of a link loss starts declining, and it starts rising again at 02:00 AM.
Finally, Table 12 presents the probability of a port’s LSI score on the succeeding day given its current LSI. This table reveals that a port that received an LSI score of 5 (“very stable”) has a 97.85% chance of receiving the same score on the next day, and a 99.55% chance of receiving a score of 4 or above. Ports with an LSI score of 2 (“very unstable”) have a probability of 60.75% of receiving the same score on the next day and a probability of 73.92% of receiving a score of 3 or below (“unstable” or “very unstable”). Ports with an LSI score of 2 have a probability of only 17.15% of receiving an LSI score of 5 (“very stable”) on the succeeding day.
Figure 20: Daily percentage of port link loss events and line stability index (y-axis: percentage of ports, 0–25%; series: ports with link loss, LSI = 2, LSI = 3)
Figure 21: Daily percentage of ports with an LSI score of 2 or 3 (LSI ≤ 3)
Figure 22: Distribution of link losses over the time of day
Current LSI | LSI 2 on succeeding day | LSI 3 | LSI 4 | LSI 5
2 | 60.75% | 13.17% | 8.93%  | 17.15%
3 | 11.07% | 19.29% | 21.86% | 47.77%
4 | 2.06%  | 7.63%  | 22.49% | 67.82%
5 | 0.08%  | 0.37%  | 1.70%  | 97.85%
Table 12: The probability of a port’s LSI on the next day given its current LSI
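A quick consistency check (our own sketch, not part of the deliverable’s tooling) confirms that each row of Table 12 sums to roughly 100% and reproduces the aggregate probabilities quoted above:

```python
# Day-to-day LSI transition probabilities from Table 12 (percent),
# keyed by current LSI score; inner keys are next-day scores 2..5.
transitions = {
    2: {2: 60.75, 3: 13.17, 4: 8.93, 5: 17.15},
    3: {2: 11.07, 3: 19.29, 4: 21.86, 5: 47.77},
    4: {2: 2.06, 3: 7.63, 4: 22.49, 5: 67.82},
    5: {2: 0.08, 3: 0.37, 4: 1.70, 5: 97.85},
}

# Every row of a transition matrix must sum to (close to) 100%;
# small deviations come from rounding to two decimal places.
for current, row in transitions.items():
    assert abs(sum(row.values()) - 100.0) < 0.05, current

# Probability that a currently "very unstable" port (LSI = 2) stays
# "unstable" or "very unstable" (next-day LSI of 3 or below): 73.92%.
stays_unstable = transitions[2][2] + transitions[2][3]

# Probability that a "very stable" port (LSI = 5) scores 4 or above: 99.55%.
stays_stable = transitions[5][4] + transitions[5][5]
```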