Big Data Management for Process Mining July 2014 Shiva Shabaninejad B.F. van Dongen
Dec 25, 2015
Big Data In Process Mining
Bottleneck?
• Load event log in memory
• In-memory process mining
• In-memory decomposition
Addressing the “log in memory” issue
• Using a data storage
• Splitting data
• Data deduplication
• Data buffering
• Backward-compatible framework
• Aspect-oriented design
[Diagram: XLog, OpenXES, PersistentXES]
Overview of our approach
• Store log in database
• Filtering log
• Splitting log
• Buffering trace by trace
Store Log in Database
[Diagram: PersistentXES, XES log in database]
Log in Database
[Diagram: Log, Trace, Event]
Log in database 1/3

Log
Id  Name
l1  product view

Trace
Id
T1
T2

Event
Id
E1
E2

Log has trace
LogId  TraceId  Order no
l1     T1       2
l1     T2       2

Trace has event
TraceId  EventId  Order no
T1       E1       1
T2       E2       1
T1       E2       2
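The tables above can be tried out with a small in-memory database. The following is a minimal sketch, not the PersistentXES schema itself: the snake_case table and column names are assumptions modelled on the slide, and SQLite stands in for whichever database the framework actually uses.

```python
import sqlite3

# Hypothetical schema modelled on the slide; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE log   (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE trace (id TEXT PRIMARY KEY);
CREATE TABLE event (id TEXT PRIMARY KEY);
CREATE TABLE log_has_trace (
    log_id   TEXT REFERENCES log(id),
    trace_id TEXT REFERENCES trace(id),
    order_no INTEGER
);
CREATE TABLE trace_has_event (
    trace_id TEXT REFERENCES trace(id),
    event_id TEXT REFERENCES event(id),
    order_no INTEGER
);
""")

# Rows copied from the slide.
conn.execute("INSERT INTO log VALUES ('l1', 'product view')")
conn.executemany("INSERT INTO trace VALUES (?)", [("T1",), ("T2",)])
conn.executemany("INSERT INTO event VALUES (?)", [("E1",), ("E2",)])
conn.executemany("INSERT INTO log_has_trace VALUES (?, ?, ?)",
                 [("l1", "T1", 2), ("l1", "T2", 2)])
conn.executemany("INSERT INTO trace_has_event VALUES (?, ?, ?)",
                 [("T1", "E1", 1), ("T2", "E2", 1), ("T1", "E2", 2)])


def events_of(trace_id):
    """Return the event ids of one trace, ordered by their position in the trace."""
    rows = conn.execute(
        "SELECT event_id FROM trace_has_event "
        "WHERE trace_id = ? ORDER BY order_no",
        (trace_id,),
    )
    return [row[0] for row in rows]


print(events_of("T1"))  # ['E1', 'E2']
```

Keeping the order number in the link table rather than in the event row is what lets event E2 appear in both T1 and T2 at different positions.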
Log in database 3/3

Attribute
Id  AttrKey         AttrVal              AttrType  parentId  extId
A1  concept:name    ProductA             String              x1
A2  time:timestamp  1970-01-01T00:00:00  Date                x2

Event has attribute
eventId  attrId
E1       A1
E2       A1

Trace has attribute
traceId
T1
T2

Log has attribute
LogId  attrId  Trace global  Event global
l1     T1      0             1
l1     T2      0             0
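In the tables above, events E1 and E2 point at the same attribute row, which is one way to realise the data deduplication mentioned earlier. The sketch below shows how such sharing could be enforced at insert time. It is an assumption about the mechanism, not the framework's actual code: the `attach` helper, the integer ids and the SQLite `INSERT OR IGNORE` approach are all illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute (
    id        INTEGER PRIMARY KEY,
    attr_key  TEXT NOT NULL,
    attr_val  TEXT NOT NULL,
    attr_type TEXT NOT NULL,
    UNIQUE (attr_key, attr_val, attr_type)  -- one row per distinct attribute
);
CREATE TABLE event_has_attribute (event_id TEXT, attr_id INTEGER);
""")


def attach(event_id, key, value, attr_type):
    """Link an event to an attribute row, reusing the row if it already exists."""
    conn.execute(
        "INSERT OR IGNORE INTO attribute (attr_key, attr_val, attr_type) "
        "VALUES (?, ?, ?)",
        (key, value, attr_type),
    )
    (attr_id,) = conn.execute(
        "SELECT id FROM attribute "
        "WHERE attr_key = ? AND attr_val = ? AND attr_type = ?",
        (key, value, attr_type),
    ).fetchone()
    conn.execute("INSERT INTO event_has_attribute VALUES (?, ?)",
                 (event_id, attr_id))
    return attr_id


a = attach("E1", "concept:name", "ProductA", "String")
b = attach("E2", "concept:name", "ProductA", "String")
attribute_rows = conn.execute("SELECT COUNT(*) FROM attribute").fetchone()[0]
print(a == b, attribute_rows)  # True 1
```

Two events end up with two link rows but only one stored attribute, so repeated values such as activity names cost a reference instead of a copy.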
Log in database 2/3

Classifier
Id  Name                 Attr_key      LogId
c1  Activity Classifier  concept:name  l1

Extension
Id  Name     Prefix   uri
X1  Concept  Concept  /concept.xesext
Filtering Log
• Using the Hibernate framework
  • Object/relational data mapping
  • Generating SQL queries on the fly
  • Lazy loading
  • Transparent layer between database and Java code
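The lazy-loading idea listed above can be illustrated without Hibernate. The sketch below is a plain Python stand-in, not Hibernate's API: the `LazyTrace` class, the `queries` list and the table layout are hypothetical, and the point is only that no SQL runs until the events of a trace are actually requested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trace_has_event (trace_id TEXT, event_id TEXT, order_no INTEGER);
INSERT INTO trace_has_event VALUES ('T1','E1',1), ('T2','E2',1), ('T1','E2',2);
""")

queries = []  # records every SQL statement issued, to make the laziness visible


class LazyTrace:
    """Holds only a trace id; the events are fetched on first access."""

    def __init__(self, trace_id):
        self.trace_id = trace_id
        self._events = None  # nothing loaded yet

    @property
    def events(self):
        if self._events is None:
            sql = ("SELECT event_id FROM trace_has_event "
                   "WHERE trace_id = ? ORDER BY order_no")
            queries.append(sql)
            self._events = [r[0] for r in conn.execute(sql, (self.trace_id,))]
        return self._events


traces = [LazyTrace("T1"), LazyTrace("T2")]
print(len(queries))      # 0
print(traces[0].events)  # ['E1', 'E2']
print(len(queries))      # 1
```

Creating the trace objects costs no database access; only the first read of `events` triggers a query, and later reads reuse the cached list.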
XES Log Skeleton
<Log>
  <Name>product view</Name>
  <Trace>T1,T2,…Tn</Trace>
  <Attribute></Attribute>
</Log>
Splitting and Buffering Log
• Iteration over traces
[Diagram: PersistentXES, XLog skeleton, split XLog, persistent XLog in memory]
<Log>
  <Name>product view</Name>
  <Trace>
    <Event>
      <string key="concept:name" value="ProductA"/>
    </Event>
    <Event>
      <string key="time:timestamp" value="1970-01-01T00:00:00"/>
    </Event>
    <Attribute>
      <string key="time:timestamp" value="1970-01-01T00:00:00"/>
    </Attribute>
  </Trace>
  <Attribute>
    <string key="concept:name" value="Order"/>
  </Attribute>
</Log>
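Buffering trace by trace can be pictured as an iterator that keeps only the list of trace ids (the skeleton) in memory and fetches the events of one trace at a time. The sketch below is one way this might look. The `iter_traces` generator, the table names and the sample rows are illustrative assumptions rather than the PersistentXES implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE log_has_trace (log_id TEXT, trace_id TEXT, order_no INTEGER);
CREATE TABLE trace_has_event (trace_id TEXT, event_id TEXT, order_no INTEGER);
-- Illustrative rows, not measurements from the slides.
INSERT INTO log_has_trace VALUES ('l1','T1',1), ('l1','T2',2);
INSERT INTO trace_has_event VALUES ('T1','E1',1), ('T2','E2',1), ('T1','E2',2);
""")


def iter_traces(log_id):
    """Yield (trace_id, [event_id, ...]) one trace at a time."""
    # The skeleton: only the ordered trace ids are held in memory.
    trace_ids = [r[0] for r in conn.execute(
        "SELECT trace_id FROM log_has_trace "
        "WHERE log_id = ? ORDER BY order_no", (log_id,))]
    for trace_id in trace_ids:
        # The buffer: events of the current trace only.
        events = [r[0] for r in conn.execute(
            "SELECT event_id FROM trace_has_event "
            "WHERE trace_id = ? ORDER BY order_no", (trace_id,))]
        yield trace_id, events


for trace_id, events in iter_traces("l1"):
    print(trace_id, events)
```

A consumer that processes each trace and then drops it never needs more than one trace's events in memory, whatever the size of the log.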
Maximum memory use in PersistentXES vs. OpenXES (memory in KB)
                   OpenXES   PersistentXES
XES log            133895    48.87
Supported objects  7.44      19259.7
[Chart labels: interning string map, Hibernate session]
Real life log
• 13K traces
• 260K events
• 1,082K attributes
Database Configuration
List of existing logs in database
Export Log To Database
Experiment: CPU use of Heuristic Miner
Experiment: CPU use of ILP Miner (using 512 MB memory)
Conclusion
• Improving memory usage by
  • Loading only relevant data into memory
  • Splitting the log directly in the database
  • Buffering a subset of the event log
  • Filtering out unnecessary data
• Developing the PersistentXES framework
  • Backward compatible
  • On-the-fly SQL queries
Future work
• Creating database views to increase performance
• Intelligent importing system to prevent data duplication
  • Check whether the same data already exists in the database
• Database-level process mining
  • Store intermediate results in the database
  • e.g. heuristic mining on the database using one query
  • e.g. trace identification on the database
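As a rough illustration of database-level process mining, the sketch below computes directly-follows counts with a single SQL query. These counts are the frequencies that heuristic mining starts from, but the sketch covers only that first step and not a full miner. The flattened `event` table with an `activity` column and the sample traces are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flattened view: one row per event with its activity name.
CREATE TABLE event (trace_id TEXT, order_no INTEGER, activity TEXT);
INSERT INTO event VALUES
    ('T1', 1, 'A'), ('T1', 2, 'B'), ('T1', 3, 'C'),
    ('T2', 1, 'A'), ('T2', 2, 'C'),
    ('T3', 1, 'A'), ('T3', 2, 'B'), ('T3', 3, 'C');
""")

# One self-join pairs every event with its successor in the same trace.
DIRECTLY_FOLLOWS = """
SELECT a.activity, b.activity, COUNT(*)
FROM event AS a
JOIN event AS b
  ON b.trace_id = a.trace_id AND b.order_no = a.order_no + 1
GROUP BY a.activity, b.activity
ORDER BY a.activity, b.activity
"""

counts = {(src, dst): n for src, dst, n in conn.execute(DIRECTLY_FOLLOWS)}
print(counts)  # {('A', 'B'): 2, ('A', 'C'): 1, ('B', 'C'): 2}
```

Only the aggregated counts leave the database, so the memory needed on the client depends on the number of distinct activity pairs rather than on the number of events.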
Experiment: Memory use in PersistentXES vs. OpenXES
• Real-life log
  • 13K traces
  • 260K events
  • 1,082K attributes
• 85% memory usage reduction

Memory use (bytes)
                 OpenXES       PersistentXES
XLog object      137,108,632   50,048
Support objects  7,624         19,721,936
Total            137,116,256   19,780,984
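The reported reduction can be checked from the per-component figures. In the short calculation below the totals are recomputed from the XLog and support-object rows rather than copied from the Total row. The recomputed PersistentXES sum (19,771,984) is slightly lower than the printed total, which does not change the rounded result.

```python
# Figures taken from the XLog object and support objects rows above.
openxes_total = 137_108_632 + 7_624
persistent_total = 50_048 + 19_721_936

reduction = 1 - persistent_total / openxes_total
print(f"{reduction:.1%}")  # 85.6%
```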
Log Entity Relationship Diagram
Outline
• Bottleneck?
• Addressing the “log in memory” issue
• Overview of our approach
  • Store log in database
  • Filtering log
  • Splitting log
  • Buffering trace by trace
• Experiment: XLog memory usage
• Demo
• Experiment: CPU use of Heuristic & ILP Miner
• Conclusion
• Future work