Decision Mining Revisited - Discovering Overlapping Rules

Felix Mannhardt, Massimiliano de Leoni, Hajo A. Reijers, Wil M.P. van der Aalst

Scope: Mining decision rules from event logs

PAGE 2

[Figure: process model. Activities: Apply, Extensive Check, Simple Check, Request Information, Receive Information, Grant, Reject. Data attributes: Amount, Eligibility, Income, Category. Legend: Activity, Data.]

Control-flow – Petri net defines order & possible choices

PAGE 3

[Figure: Petri net. Activities: Apply, Extensive Check, Simple Check, Request Information, Receive Information, Grant, Reject. Annotated constructs: Exclusive Choice, Sequence, Exclusive Choice.]

Data-perspective – Data Petri Net modelling decisions

PAGE 4

[Figure: Data Petri net. Activities: Apply, Extensive Check, Simple Check, Request Information, Receive Information, Grant, Reject. Data attributes: Amount, Eligibility, Rating. Guards: [Eligibility = Yes], [Eligibility = No]. Annotations: Decision point, Data recording, Decision rule.]

PAGE 5

DMN 1.1 released in 2016

Widely adopted by tool vendors, for example:

Decision Table

U   Eligibility   Outcome
1   Yes           Grant
2   No            Reject

[Figure: decision point between Grant and Reject with guards [Eligibility = Yes] and [Eligibility = No].]

Comparing the Petri net notation to DMN

Decision Rule / Guard
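
The correspondence between a decision table and transition guards can be sketched in a few lines of Python. This is only an illustration: the names `DECISION_TABLE`, `matches`, `decide_unique` and `guards_from_table` are hypothetical, and the sketch assumes that the `U` in the table header stands for DMN's unique hit policy, under which exactly one row may match.

```python
# Hypothetical encoding of the two-row decision table from this slide.
DECISION_TABLE = [
    {"when": {"Eligibility": "Yes"}, "then": "Grant"},
    {"when": {"Eligibility": "No"}, "then": "Reject"},
]


def matches(conditions, case):
    """True if every condition attribute has the required value in the case."""
    return all(case.get(attr) == value for attr, value in conditions.items())


def decide_unique(table, case):
    """Evaluate the table under a unique hit policy: exactly one row must match."""
    hits = [row["then"] for row in table if matches(row["when"], case)]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one matching row, got {len(hits)}")
    return hits[0]


def guards_from_table(table):
    """Read each row as a guard on the transition named in its outcome column.

    Assumes one row per outcome, which holds for this small table.
    """
    return {row["then"]: row["when"] for row in table}


print(decide_unique(DECISION_TABLE, {"Eligibility": "Yes"}))
print(guards_from_table(DECISION_TABLE))
```

Under this reading, a mutually-exclusive decision table and a set of guards on the outgoing transitions of a decision point carry the same information.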

Why are overlapping rules needed?

PAGE 6

Incomplete Information

• Not recorded
• Process context
• Confidential
• ...

• Expert approval
• Deferred choice
• Randomized check
• Inconsistent human behavior
• ...

Goal: Discover rules which may overlap

PAGE 7

[Figure: Process Model + Event Log → Overlapping Rule Discovery → Process Model with Overlapping Decision Rules]

Decision point - Mutually-exclusive rule

PAGE 8

[Figure: decision point between Grant and Reject with guards [Eligibility = Yes] and [Eligibility = No].]

Observation instances from an event log

Count   Eligibility   Outcome
5x      “No”          Reject
20x     “Yes”         Grant

Decision point – Overlapping rule

PAGE 9

[Figure: after Apply, a decision point chooses between Extensive Check, Simple Check and Request Information. Data attributes: Amount, Rating.]

Extensive Check: [Rating = Bad]
Simple Check: [Rating = Good OR Rating = Bad AND Amount = Low]
Request Information: [Rating = Unknown OR Rating = Bad AND Amount = High]

Alternative Decision Table Notation

C   Rating    Amount   Activity
1   Good      -        Simple Check
2   Bad       -        Extensive Check
3   Bad       Low      Simple Check
4   Bad       High     Request Information
5   Unknown   -        Request Information
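
To see what "overlapping" means operationally, the five-row table can be evaluated so that every matching row contributes its activity. The sketch below is an illustration under assumptions: `OVERLAPPING_TABLE` and `enabled_activities` are hypothetical names, `-` is read as "any value", and the `C` in the table header is taken to mean a collect-style hit policy.

```python
# Hypothetical encoding of the five-row table; "-" means "any value".
OVERLAPPING_TABLE = [
    ("Good", "-", "Simple Check"),
    ("Bad", "-", "Extensive Check"),
    ("Bad", "Low", "Simple Check"),
    ("Bad", "High", "Request Information"),
    ("Unknown", "-", "Request Information"),
]


def enabled_activities(table, rating, amount):
    """Collect the activity of every row that matches (collect-style evaluation)."""
    enabled = set()
    for row_rating, row_amount, activity in table:
        if row_rating in ("-", rating) and row_amount in ("-", amount):
            enabled.add(activity)
    return enabled


print(sorted(enabled_activities(OVERLAPPING_TABLE, "Bad", "Low")))
print(sorted(enabled_activities(OVERLAPPING_TABLE, "Good", "High")))
```

For Rating = Bad and Amount = Low, two activities are enabled at once, which a mutually-exclusive table cannot express.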

Proposed Discovery Method

PAGE 10

[Figure: Process Model + Event Log → Overlapping Rule Discovery → Process Model With Overlapping Rules]

foreach Decision Point:
1) Collect Instances
2) 1st Classification
3) Collect Misclassified
4) 2nd Classification
5) Build Rules

1) Collect Instances

PAGE 11

Event Log → collect → Observation instances

Count   Rating    Amount   Outcome
6x      Good      Low      Simple
6x      Good      High     Simple
6x      Bad       High     Extensive
4x      Bad       High     Request
6x      Bad       Low      Extensive
4x      Bad       Low      Simple
6x      Unknown   High     Request

• Cyclic Behavior
• Noise (Missing / Additional Events)
• Unassigned values
• Inconsistent recording

Alignment-based method
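
The idea of collecting observation instances can be sketched as follows. This is a deliberately simplified illustration, not the alignment-based method named on the slide: it assumes every trace fits the model perfectly (no noise, no cycles), and the toy log, `CHOICES` and `collect_instances` are hypothetical.

```python
from collections import Counter

# Hypothetical toy log: each trace is a list of (activity, attributes written).
LOG = [
    [("Apply", {"Rating": "Good", "Amount": "Low"}), ("Simple Check", {})],
    [("Apply", {"Rating": "Bad", "Amount": "High"}), ("Extensive Check", {})],
    [("Apply", {"Rating": "Bad", "Amount": "High"}), ("Request Information", {})],
    [("Apply", {"Rating": "Bad", "Amount": "High"}), ("Extensive Check", {})],
]

# Activities that can be chosen at the decision point after Apply.
CHOICES = {"Simple Check", "Extensive Check", "Request Information"}


def collect_instances(log, choices):
    """Count (Rating, Amount, chosen activity) triples seen at the decision point."""
    instances = Counter()
    for trace in log:
        state = {}  # latest attribute values written so far in this trace
        for activity, written in trace:
            if activity in choices:
                # Record the values as they were before the choice was executed.
                key = (state.get("Rating"), state.get("Amount"), activity)
                instances[key] += 1
            state.update(written)
    return instances


for (rating, amount, outcome), count in sorted(collect_instances(LOG, CHOICES).items()):
    print(f"{count}x {rating} {amount} {outcome}")
```

On real logs the issues listed above (cyclic behavior, noise, unassigned values, inconsistent recording) break this naive replay, which is why an alignment between log and model is used instead.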

2) 1st Classification & 3) Misclassified Instances

PAGE 12


Instances

6x Good Low Simple
6x Good High Simple
4x Bad High Request
4x Bad Low Simple

Decision Tree

Rating
  Good    → Simple      (12 OK)
  Bad     → Extensive   (12 OK, 8 NOK)
  Unknown → Request     (6 OK)
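
The OK / NOK counts can be reproduced from the observation instances of the "Collect Instances" slide. The sketch below is a stand-in for real decision tree induction: it assumes a one-level tree that splits on Rating and predicts the majority outcome per value. `INSTANCES`, `majority_by_rating` and `misclassified` are hypothetical names.

```python
from collections import Counter, defaultdict

# Observation instances: (count, rating, amount, outcome).
INSTANCES = [
    (6, "Good", "Low", "Simple"),
    (6, "Good", "High", "Simple"),
    (6, "Bad", "High", "Extensive"),
    (4, "Bad", "High", "Request"),
    (6, "Bad", "Low", "Extensive"),
    (4, "Bad", "Low", "Simple"),
    (6, "Unknown", "High", "Request"),
]


def majority_by_rating(instances):
    """One-level 'tree': predict the most frequent outcome for each Rating value."""
    votes = defaultdict(Counter)
    for count, rating, _amount, outcome in instances:
        votes[rating][outcome] += count
    return {rating: c.most_common(1)[0][0] for rating, c in votes.items()}


def misclassified(instances, tree):
    """Instances whose observed outcome differs from the tree's prediction."""
    return [row for row in instances if tree[row[1]] != row[3]]


tree = majority_by_rating(INSTANCES)
wrong = misclassified(INSTANCES, tree)
print(tree)
print(wrong, "NOK:", sum(row[0] for row in wrong))
```

The eight misclassified instances (4x Bad High Request, 4x Bad Low Simple) are exactly the ones handed to the 2nd classification.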

4) 2nd Classification

PAGE 13

Instances

4x Bad High Request
4x Bad Low Simple

2nd Decision Tree

Amount
  High → Request
  Low  → Simple
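
Why the 2nd tree splits on Amount rather than Rating can be checked with an information-gain computation. The slides do not state which split criterion is used, so information gain is an assumption here, and `MISCLASSIFIED`, `entropy` and `information_gain` are hypothetical names.

```python
import math
from collections import Counter

# Misclassified instances from the 1st classification: (count, rating, amount, outcome).
MISCLASSIFIED = [
    (4, "Bad", "High", "Request"),
    (4, "Bad", "Low", "Simple"),
]
ATTRIBUTE_INDEX = {"Rating": 1, "Amount": 2}


def entropy(rows):
    """Shannon entropy (in bits) of the outcome distribution, weighted by count."""
    totals = Counter()
    for row in rows:
        totals[row[3]] += row[0]
    n = sum(totals.values())
    return sum(-(c / n) * math.log2(c / n) for c in totals.values())


def information_gain(rows, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    idx = ATTRIBUTE_INDEX[attribute]
    n = sum(row[0] for row in rows)
    remainder = 0.0
    for value in {row[idx] for row in rows}:
        subset = [row for row in rows if row[idx] == value]
        remainder += sum(row[0] for row in subset) / n * entropy(subset)
    return entropy(rows) - remainder


for attribute in ATTRIBUTE_INDEX:
    print(attribute, information_gain(MISCLASSIFIED, attribute))
```

Rating has the same value in all misclassified instances and yields no gain, while Amount separates Request from Simple completely.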

5) Build Overlapping Decision Rules

PAGE 14

1st Decision Tree

Rating
  Good    → Simple
  Bad     → Extensive
  Unknown → Request

2nd Decision Tree

Amount
  High → Request
  Low  → Simple

Compiled to overlapping rules

If Rating = Good then Simple
If Rating = Unknown then Request
If Rating = Bad then Extensive
If Rating = Bad AND Amount = High then Request
If Rating = Bad AND Amount = Low then Simple
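
Compiling the two trees into guards amounts to grouping the leaf rules by outcome and joining them with OR. The sketch below illustrates that step; the encoding of the leaves and the names `FIRST_TREE`, `SECOND_TREE`, `build_guards` and `format_guard` are hypothetical.

```python
from collections import defaultdict

# Leaves of the two trees as (conditions, outcome); hypothetical encoding.
FIRST_TREE = [
    ({"Rating": "Good"}, "Simple"),
    ({"Rating": "Unknown"}, "Request"),
    ({"Rating": "Bad"}, "Extensive"),
]
# The 2nd tree was learned on instances misclassified under Rating = Bad,
# so its leaves are combined with that condition.
SECOND_TREE = [
    ({"Rating": "Bad", "Amount": "High"}, "Request"),
    ({"Rating": "Bad", "Amount": "Low"}, "Simple"),
]


def build_guards(*trees):
    """Group leaves by outcome; each guard is a disjunction of conjunctions."""
    guards = defaultdict(list)
    for tree in trees:
        for conditions, outcome in tree:
            guards[outcome].append(conditions)
    return dict(guards)


def format_guard(disjuncts):
    """Render a guard in the bracket notation used on the slides."""
    terms = [" AND ".join(f"{k} = {v}" for k, v in d.items()) for d in disjuncts]
    return "[" + " OR ".join(terms) + "]"


for outcome, disjuncts in build_guards(FIRST_TREE, SECOND_TREE).items():
    print(outcome, format_guard(disjuncts))
```

The guards for Simple and Request now both overlap with the guard for Extensive whenever Rating = Bad.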

Resulting Data-aware Process Model

PAGE 15

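[Figure: data-aware process model. After Apply, a decision point chooses between Extensive Check, Simple Check and Request Information; data attribute Amount; visible guard: [Rating = Bad].]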


Trade-off: Precise and fitting model

PAGE 16


Instances

6x Good Low Simple
6x Good High Simple
4x Bad High Request
4x Bad Low Simple

[Figure: three variants of the decision point after Apply (Extensive Check, Simple Check, Request Information; attributes Amount, Rating), labelled Unfitting, Imprecise [Underfitting] and Good Trade-off. Guards visible in the figure: [Rating = Bad] in one variant; [Rating = Unknown], [Rating = Bad], [Rating = Good] in another.]

Evaluation – Measures

PAGE 17

Precision: How much unobserved behavior is modelled?

Fitness: How much observed behavior is modelled?

Image source (CC BY-SA): https://en.wikipedia.org/wiki/Precision_and_recall#/media/File:Precisionrecall.svg
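
The two questions can be made concrete on the running example. The definitions below are simplified assumptions chosen for illustration and are not the measures used in the evaluation: fitness is taken as the share of observed choices whose guard holds, and precision as the share of allowed choices that were also observed for the same attribute values. All names are hypothetical.

```python
# Observation instances: (count, rating, amount, observed activity).
INSTANCES = [
    (6, "Good", "Low", "Simple"), (6, "Good", "High", "Simple"),
    (6, "Bad", "High", "Extensive"), (4, "Bad", "High", "Request"),
    (6, "Bad", "Low", "Extensive"), (4, "Bad", "Low", "Simple"),
    (6, "Unknown", "High", "Request"),
]

# Each model maps an activity to a guard over (rating, amount).
NO_RULES = {a: (lambda r, m: True) for a in ("Simple", "Extensive", "Request")}
EXCLUSIVE = {
    "Simple": lambda r, m: r == "Good",
    "Extensive": lambda r, m: r == "Bad",
    "Request": lambda r, m: r == "Unknown",
}
OVERLAPPING = {
    "Simple": lambda r, m: r == "Good" or (r == "Bad" and m == "Low"),
    "Extensive": lambda r, m: r == "Bad",
    "Request": lambda r, m: r == "Unknown" or (r == "Bad" and m == "High"),
}


def fitness(model, instances):
    """Share of observed choices that the model's guards allow."""
    total = sum(c for c, *_ in instances)
    ok = sum(c for c, r, m, a in instances if model[a](r, m))
    return ok / total


def precision(model, instances):
    """Share of allowed choices that were observed for the same attribute values."""
    seen = {}
    for _c, r, m, a in instances:
        seen.setdefault((r, m), set()).add(a)
    total = sum(c for c, *_ in instances)
    score = 0.0
    for c, r, m, _a in instances:
        allowed = {a for a, guard in model.items() if guard(r, m)}
        if allowed:
            score += c * len(allowed & seen[(r, m)]) / len(allowed)
    return score / total


for name, model in [("no rules", NO_RULES), ("exclusive", EXCLUSIVE), ("overlapping", OVERLAPPING)]:
    print(name, round(fitness(model, INSTANCES), 3), round(precision(model, INSTANCES), 3))
```

Under these simplified definitions the model without rules fits everything but is imprecise, the mutually-exclusive rules are precise but reject 8 of the 38 observed choices, and the overlapping rules score well on both.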

Evaluation – Setup

PAGE 18

Compared Methods

Method   Description                      Expected Precision   Expected Fitness
WO       Without rules                    Poor                 Good
DTF      Mutually-exclusive approach      Good                 Poor
DTT      Naïve overlapping approach       Poor                 Good
DTO      Presented overlapping approach   Balanced             Balanced

Datasets

Dataset      # Traces   # Events   # Attributes   # Decisions
Road Fines   150,000    500,000    9              5
Hospital     1,000      15,000     39             11

Evaluation – Example rules in the hospital data

PAGE 19

[Figure: process fragment with Triage, Intensive Care, Normal Care, skip, Infusions, Tests, Release; decision point at place S-p5; attributes Lactate (L) and Hypotensie (H).]

Method   Intensive Care     Normal Care   Skip                          Assessment
DTO      L > 0 ∧ H = true   L > 0         L ≤ 0 ∨ (L > 0 ∧ H = false)   Good trade-off
DTT      true               L > 0         L ≤ 0                         Imprecise
DTF      false              L > 0         L ≤ 0                         Unfitting
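
The three rule sets for the hospital example can be compared by evaluating which activities each one enables for a given patient. The sketch assumes that L and H abbreviate the attributes Lactate and Hypotensie, that H is boolean, and that the lactate values used are hypothetical; `GUARDS` and `enabled` are illustrative names.

```python
# Guards per method, with L = Lactate and H = Hypotensie (assumed abbreviations).
GUARDS = {
    "DTO": {
        "Intensive Care": lambda L, H: L > 0 and H,
        "Normal Care": lambda L, H: L > 0,
        "Skip": lambda L, H: L <= 0 or (L > 0 and not H),
    },
    "DTT": {
        "Intensive Care": lambda L, H: True,
        "Normal Care": lambda L, H: L > 0,
        "Skip": lambda L, H: L <= 0,
    },
    "DTF": {
        "Intensive Care": lambda L, H: False,
        "Normal Care": lambda L, H: L > 0,
        "Skip": lambda L, H: L <= 0,
    },
}


def enabled(method, L, H):
    """Return the set of activities whose guard holds for the given values."""
    return {activity for activity, guard in GUARDS[method].items() if guard(L, H)}


# Hypothetical patient with a positive lactate value and hypotension recorded.
for method in ("DTO", "DTT", "DTF"):
    print(method, sorted(enabled(method, 2.0, True)))
```

DTF never enables Intensive Care and DTT always does, whereas DTO enables it only when L > 0 and H is true, overlapping with Normal Care in that case.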

Evaluation – Precision & Fitness

PAGE 20

Fitness Precision

• Fitness: how often rules are violated
  • DTO improves fitness over DTF (mutually-exclusive)

• Precision: how strict the rules are
  • DTO improves precision against WO
  • DTO does sacrifice precision vs. DTF

Conclusion & Future Work

• Method: Discovery of overlapping rules using event logs
  • Based on decision tree induction
  • ProM framework: MultiPerspectiveExplorer, http://www.promtools.org
• Results: Trade-off fitness & precision
  • Improves the model fitness over standard trees
  • Improves the model precision over naïve approach
• Future work
  • Better experimental validation
  • Manage the complexity of discovered rules
  • Imbalanced distributions

PAGE 21

Questions?

PAGE 22

@fmannhardt - [email protected] - http://promtools.org

Multi-Perspective Explorer
