Big Data Management for Process Mining July 2014 Shiva Shabaninejad B.F. van Dongen


Dec 25, 2015

Page 1

Big Data Management for Process Mining

July 2014

Shiva Shabaninejad

B.F. van Dongen

Page 2

Big Data in Process Mining


Page 3

Bottleneck?

• Load event log in memory


Page 4

Bottleneck?

• Load event log in memory

• In-memory process mining


Page 5

Bottleneck?

• Load event log in memory

• In-memory process mining

• In-memory decomposition


Page 7

Addressing the “log in memory” issue


Using a data store

Page 8

Addressing the “log in memory” issue


Using a data store

Splitting Data

Page 9

Addressing the “log in memory” issue


Using a data store

Splitting Data

Data Deduplication

Page 10

Addressing the “log in memory” issue


Using a data store

Data Deduplication

Data Buffering

Splitting Data
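Data deduplication can be illustrated by storing each distinct attribute only once and letting events refer to it by identifier. The sketch below is a minimal in-memory illustration under assumptions: `AttributeKey` and `AttributeStore` are hypothetical names, and a database-backed store would enforce uniqueness with a lookup or constraint rather than a `HashMap`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// An attribute identified by its key, value and type (hypothetical model).
record AttributeKey(String key, String value, String type) {}

// Stores each distinct attribute once; events keep only attribute ids.
final class AttributeStore {
    private final Map<AttributeKey, String> idByAttribute = new HashMap<>();
    private final Map<String, List<String>> attributeIdsByEvent = new HashMap<>();

    // Returns the id of an identical stored attribute, or assigns a new id.
    String idFor(AttributeKey attribute) {
        String existing = idByAttribute.get(attribute);
        if (existing != null) {
            return existing;
        }
        String id = "A" + (idByAttribute.size() + 1);
        idByAttribute.put(attribute, id);
        return id;
    }

    // Links an event to an attribute without duplicating the attribute itself.
    void attach(String eventId, AttributeKey attribute) {
        attributeIdsByEvent
                .computeIfAbsent(eventId, e -> new ArrayList<>())
                .add(idFor(attribute));
    }

    int distinctAttributes() {
        return idByAttribute.size();
    }

    List<String> attributeIdsOf(String eventId) {
        return attributeIdsByEvent.getOrDefault(eventId, List.of());
    }
}
```

Attaching `concept:name = ProductA` to two events stores the attribute once, and both events point to the same id.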

Page 11

Backward-Compatible Framework

Aspect-Oriented Design


XLog

OpenXES

PersistentXES
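One way to read “backward compatible” together with “aspect-oriented design” is that existing code keeps calling the same log interface while extra behaviour is woven in around those calls. The sketch below shows that interception idea with Java's built-in `java.lang.reflect.Proxy` rather than an AOP framework; `EventLogView` is a hypothetical interface standing in for `XLog`, and counting reads stands in for the database access a persistent implementation would add.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical interface standing in for the log type existing plug-ins use.
interface EventLogView {
    int traceCount();
    String traceName(int index);
}

// Plain in-memory implementation, as existing client code would see it.
final class InMemoryLog implements EventLogView {
    private final List<String> traces;

    InMemoryLog(List<String> traces) { this.traces = List.copyOf(traces); }

    public int traceCount() { return traces.size(); }

    public String traceName(int index) { return traces.get(index); }
}

final class Interception {
    private Interception() {}

    // Wraps any EventLogView so that every traceName call is intercepted
    // (an "advice" in AOP terms) before being delegated to the real object.
    static EventLogView countingReads(EventLogView target, AtomicInteger reads) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("traceName")) {
                reads.incrementAndGet(); // cross-cutting concern added here
            }
            return method.invoke(target, args);
        };
        return (EventLogView) Proxy.newProxyInstance(
                EventLogView.class.getClassLoader(),
                new Class<?>[] {EventLogView.class},
                handler);
    }
}
```

Callers receive an `EventLogView` either way, so code written against the interface does not change when the interception is added.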

Page 12

Overview of our approach


Store Log in Database

Filtering Log

Splitting Log

Buffering Trace by Trace

Page 14

Store Log in Database


PersistentXES

Page 15

XES Log in Database


[Diagram: an XES log in the database, with its Log, Trace and Event elements]

Page 16

Log in database 1/3


Log
Id | Name
l1 | product view

Trace
Id
T1
T2

Event
Id
E1
E2

Log has trace
Log Id | Trace Id | Order no
l1 | T1 | 2
L1 | T2 | 2

Trace has event
Trace Id | Event Id | Order no
T1 | E1 | 1
T2 | E2 | 1
T1 | E2 | 2
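Because traces and events are linked through the “Trace has event” table with an order number, a trace can be rebuilt by grouping the link rows by trace and sorting each group by order number. A minimal sketch, assuming the rows have already been read from the database into a hypothetical `TraceEventLink` record; in SQL this corresponds roughly to ordering by trace id and order number.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One row of the "Trace has event" link table (field names are assumptions).
record TraceEventLink(String traceId, String eventId, int orderNo) {}

final class TraceRebuilder {
    private TraceRebuilder() {}

    // Groups link rows by trace and orders each trace's events by orderNo.
    static Map<String, List<String>> eventsPerTrace(List<TraceEventLink> rows) {
        List<TraceEventLink> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing(TraceEventLink::traceId)
                .thenComparingInt(TraceEventLink::orderNo));
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (TraceEventLink row : sorted) {
            result.computeIfAbsent(row.traceId(), t -> new ArrayList<>()).add(row.eventId());
        }
        return result;
    }
}
```

With the three link rows from the table, T1 becomes the sequence E1, E2 and T2 contains only E2.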

Page 17

Log in database 3/3


Attribute
Id | AttrKey | AttrVal | AttrType | parentId | extId
A1 | concept:name | ProductA | String | - | x1
A2 | time:timestamp | 1970-01-01T00:00:00 | Date | - | x2

Event has attribute
eventId | attrId
E1 | A1
E2 | A1

Trace has attribute
traceId
T1
T2

Log has attribute
LogId | attrId | Trace global | Event global
L1 | T1 | 0 | 1
L1 | T2 | 0 | 0

Page 18

Log in database 2/3


Classifier
Id | Name | Attr_key | LogId
c1 | Activity Classifier | concept:name | l1

Extension
Id | Name | Prefix | URI
X1 | Concept | concept | /concept.xesext
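A classifier row stores only an attribute key, so an event's class can be resolved by looking up, among the event's attributes, the one with that key. A minimal sketch, assuming one event's attributes are available as a key-to-value map; `ActivityClassifier` is a hypothetical name, not the PersistentXES or OpenXES class.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical classifier defined, like the Classifier table row, by a name and an attribute key.
record ActivityClassifier(String name, String attributeKey) {

    // Returns the event's class, i.e. the value stored under attributeKey, if the event has one.
    Optional<String> classify(Map<String, String> eventAttributes) {
        return Optional.ofNullable(eventAttributes.get(attributeKey));
    }
}
```

For an event whose `concept:name` is ProductA, the “Activity Classifier” on `concept:name` yields ProductA; an event without that attribute yields an empty result.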

Page 19

Store Log in Database

Overview of our approach


Filtering Log

Splitting Log

Buffering Trace by Trace

Page 20

Filtering Log

• Using the Hibernate framework

  • Object/relational mapping

  • Generating SQL queries on the fly

  • Lazy loading

  • Transparent layer between the database and Java code

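Lazy loading, one of the Hibernate features listed above, means that related data is fetched only when it is first accessed. The sketch below shows the idea in plain Java; it is not Hibernate's API (Hibernate generates such proxies itself), and the `Lazy` class and the loader passed to it are hypothetical.

```java
import java.util.function.Supplier;

// Minimal lazy holder: the loader runs at most once, on first access.
// Not thread-safe; a sketch of the idea, not of Hibernate's proxies.
final class Lazy<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        if (!loaded) {
            value = loader.get(); // e.g. where an ORM would issue a SELECT
            loaded = true;
        }
        return value;
    }

    boolean isLoaded() {
        return loaded;
    }
}
```

Until `get()` is called, nothing is loaded, so a log whose traces are wrapped this way costs memory only for the traces actually visited.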

Page 21

XES Log Skeleton

<Log>
  <Name> product view </Name>
  <Trace>T1,T2,…Tn</Trace>
  <Attribute></Attribute>
</Log>

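The skeleton keeps only the log name, its attributes and the identifiers of its traces; the traces themselves are fetched when needed. A minimal sketch of that idea, assuming a hypothetical `traceLoader` function that reads one trace (here just a list of event ids) from the database by trace id. Exposing the skeleton as a `java.util.List` lets code that iterates over a list of traces use it unchanged.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.Function;

// Log skeleton: holds trace ids only; each trace is loaded on access.
final class LogSkeleton extends AbstractList<List<String>> {
    private final String name;
    private final List<String> traceIds;
    private final Function<String, List<String>> traceLoader; // hypothetical DB access

    LogSkeleton(String name, List<String> traceIds, Function<String, List<String>> traceLoader) {
        this.name = name;
        this.traceIds = List.copyOf(traceIds);
        this.traceLoader = traceLoader;
    }

    String name() {
        return name;
    }

    @Override
    public List<String> get(int index) {
        // Nothing is cached: each access fetches the trace again, keeping memory use low.
        return traceLoader.apply(traceIds.get(index));
    }

    @Override
    public int size() {
        return traceIds.size();
    }
}
```

The design trades repeated fetches for a small footprint; a buffer (as in trace-by-trace buffering) can sit between the skeleton and the loader.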

Page 22

Store Log in Database

Overview of our approach


Filtering Log

Splitting Log

Buffering Trace by Trace

Page 23

Splitting and Buffering Log

Iteration over traces


PersistentXES

XLog Skeleton

Split XLog
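Iterating over the traces while buffering only a subset of them can be sketched as an iterator that fetches traces in batches of a fixed size; splitting the log likewise amounts to partitioning the list of trace ids. A minimal sketch under assumptions: `batchLoader` is a hypothetical function that would run one query for a batch of trace ids and return one trace per id, in order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Iterates trace by trace while holding at most one batch in memory.
final class BufferedTraces<T> implements Iterator<T> {
    private final List<String> traceIds;
    private final int bufferSize;
    private final Function<List<String>, List<T>> batchLoader; // hypothetical: one query per batch
    private final Deque<T> buffer = new ArrayDeque<>();
    private int nextIndex = 0; // first trace id not yet requested from the loader

    BufferedTraces(List<String> traceIds, int bufferSize,
                   Function<List<String>, List<T>> batchLoader) {
        if (bufferSize < 1) {
            throw new IllegalArgumentException("bufferSize must be at least 1");
        }
        this.traceIds = List.copyOf(traceIds);
        this.bufferSize = bufferSize;
        this.batchLoader = batchLoader;
    }

    @Override
    public boolean hasNext() {
        return !buffer.isEmpty() || nextIndex < traceIds.size();
    }

    @Override
    public T next() {
        if (buffer.isEmpty()) {
            if (nextIndex >= traceIds.size()) {
                throw new NoSuchElementException();
            }
            int end = Math.min(nextIndex + bufferSize, traceIds.size());
            // Refill the buffer with the next batch only once the previous one is used up.
            buffer.addAll(batchLoader.apply(traceIds.subList(nextIndex, end)));
            nextIndex = end;
        }
        return buffer.removeFirst();
    }

    // Splitting: partitions trace ids into sub-logs of at most partSize traces.
    static List<List<String>> split(List<String> traceIds, int partSize) {
        if (partSize < 1) {
            throw new IllegalArgumentException("partSize must be at least 1");
        }
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < traceIds.size(); i += partSize) {
            parts.add(List.copyOf(traceIds.subList(i, Math.min(i + partSize, traceIds.size()))));
        }
        return parts;
    }
}
```

With five traces and a buffer size of 2, iteration issues three batch requests (2, 2 and 1 traces) and never holds more than two traces at once.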

Page 24

Persistent XLog in memory

<Log>
  <Name> product view </Name>
  <Trace>
    <Event>
      <string key="concept:name" value="ProductA"/>
    </Event>
    <Event>
      <string key="time:timestamp" value="1970-01-01T00:00:00"/>
    </Event>
    <Attribute>
      <string key="time:timestamp" value="1970-01-01T00:00:00"/>
    </Attribute>
  </Trace>
  <Attribute>
    <string key="concept:name" value="Order"/>
  </Attribute>
</Log>

Page 25

Maximum memory use in PersistentXES vs. OpenXES

Memory (KB) | OpenXES | PersistentXES
XES log | 133,895 | 48.87
Support objects | 7.44 | 19,259.7


Page 26

Maximum memory use in PersistentXES vs. OpenXES

Memory (KB) | OpenXES | PersistentXES
Support objects | 7.44 | 19,259.7
XES log | 133,895 | 48.87

Chart labels: Interning string map, Hibernate Session

Real-life log

• 13K traces

• 260K events

• 1,082K attributes
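The OpenXES support objects are labelled as an interning string map: equal strings, such as attribute keys repeated in every event, are replaced by one shared instance. A minimal sketch of such a map, assuming nothing about the actual OpenXES implementation beyond that idea; `StringInterner` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Keeps one canonical instance per distinct string value.
final class StringInterner {
    private final Map<String, String> canonical = new HashMap<>();

    // Returns the stored instance equal to s, storing s itself the first time it is seen.
    String intern(String s) {
        String existing = canonical.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return canonical.size();
    }
}
```

Interning saves memory only when values repeat often; the map itself must stay in memory for as long as the log does.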

Page 27

Database Configuration


Page 28

List of existing logs in the database


Page 29

List of existing logs in the database


Page 30

Export Log to Database


Page 31

Experiment: CPU use of Heuristic Miner


Page 32

Experiment: CPU use of ILP Miner (using 512 MB memory)


Page 33

Conclusion

• Reducing memory usage by

  • Loading only relevant data into memory

  • Splitting the log directly in the database

  • Buffering a subset of the event log

  • Filtering out unnecessary data

• Developing the PersistentXES framework

  • Backward compatible

  • On-the-fly SQL queries

Page 34

Future work

• Creating database views to increase performance

• Intelligent importing system to prevent data duplication

  • Checking whether the same data already exists in the database

• Database-level process mining

  • Storing intermediate results in the database

  • e.g. heuristic mining on the database using one query

  • e.g. trace identification on the database
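Heuristic mining starts from counts of how often one activity directly follows another within a trace, so mining on the database would amount to computing these counts there, for example with a self-join of event rows on trace id and consecutive order numbers. The sketch below computes the same counts in Java, assuming hypothetical rows of (trace id, order number, activity) such as a join of the “Trace has event” table with the activity attribute could produce.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// One event row: trace id, position within the trace, and activity name (assumed schema).
record EventRow(String traceId, int orderNo, String activity) {}

final class DirectlyFollows {
    private DirectlyFollows() {}

    // Counts how often activity a is immediately followed by activity b in the same trace.
    // Keys have the form "a>b"; pairs spanning two different traces are skipped.
    static Map<String, Integer> counts(List<EventRow> rows) {
        List<EventRow> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing(EventRow::traceId)
                .thenComparingInt(EventRow::orderNo));
        Map<String, Integer> result = new TreeMap<>();
        for (int i = 1; i < sorted.size(); i++) {
            EventRow prev = sorted.get(i - 1);
            EventRow curr = sorted.get(i);
            if (prev.traceId().equals(curr.traceId())) {
                result.merge(prev.activity() + ">" + curr.activity(), 1, Integer::sum);
            }
        }
        return result;
    }
}
```

For traces a, b, c and a, c, the counts are a>b = 1, b>c = 1 and a>c = 1.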

Page 35


Page 36

Experiment: Memory use in PersistentXES vs. OpenXES

• Real-life log

  • 13k traces

  • 260k events

  • 1,082k attributes

• 85% memory usage reduction

Memory (bytes) | OpenXES | PersistentXES
XLog object | 137,108,632 | 50,048
Support objects | 7,624 | 19,721,936
Total | 137,116,256 | 19,780,984
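The reduction figure follows from the totals in the table above (values in bytes), and dividing by 1024 gives the kilobyte values used in the earlier memory charts:

```latex
\[
1 - \frac{19\,780\,984}{137\,116\,256} \approx 1 - 0.1443 = 0.8557 \approx 85.6\%
\]
\[
\frac{137\,108\,632\ \text{B}}{1024} \approx 133\,895\ \text{KB}, \qquad
\frac{50\,048\ \text{B}}{1024} \approx 48.9\ \text{KB}
\]
```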

Page 37

Log Entity-Relationship Diagram


Page 38

Log Entity-Relationship Diagram


Page 39

Outline

• Bottleneck?

• Addressing the “log in memory” issue

• Overview of our approach

  • Store Log in Database

  • Filtering Log

  • Splitting Log

  • Buffering Trace by Trace

• Experiment: XLog memory usage

• Demo

• Experiment: CPU use of Heuristic & ILP Miner

• Conclusion

• Future work
