Decision Mining Revisited: Discovering Overlapping Rules
Felix Mannhardt, Massimiliano de Leoni, Hajo A. Reijers, Wil M.P. van der Aalst
Scope: Mining decision rules from event logs
[Figure: example process. Activities: Apply, Simple Check, Extensive Check, Request Information, Receive Information, Grant, Reject. Data: Amount, Income, Category, Eligibility.]
Control-flow – Petri net defines order & possible choices
[Figure: Petri net with transitions Apply, Simple Check, Extensive Check, Request Information, Receive Information, Grant, Reject. Annotations: Exclusive Choice, Sequence, Exclusive Choice.]
Data-perspective – Data Petri Net modelling decisions
[Figure: Data Petri net with transitions Apply, Simple Check, Extensive Check, Request Information, Receive Information, Grant, Reject and variables Amount, Rating, Eligibility. Guards: [Eligibility = Yes], [Eligibility = No]. Annotations: Decision point, Data recording, Decision rule.]
Comparing the Petri net notation to DMN
DMN 1.1 released in 2016
Widely adopted by tool vendors
Decision Table
  U  Eligibility  Outcome
  1  Yes          Grant
  2  No           Reject
Decision Rule / Guard
  Grant   [Eligibility = Yes]
  Reject  [Eligibility = No]
Why are overlapping rules needed?
Incomplete Information
• Not recorded
• Process context
• Confidential
• ...

• Expert approval
• Deferred choice
• Randomized check
• Inconsistent human behavior
• ...
Goal: Discover rules which may overlap
[Figure: Process Model + Event Log → Overlapping Rule Discovery → Process Model with Overlapping Decision Rules]
Decision point - Mutually-exclusive rule
Decision point
  Grant   [Eligibility = Yes]
  Reject  [Eligibility = No]
Observation instances from an event log
  Count  Eligibility  Outcome
  5x     “No”         Reject
  20x    “Yes”        Grant
Decision point – Overlapping rule
[Figure: decision point after Apply, with variables Amount and Rating]
  Request Information  [Rating = Unknown OR (Rating = Bad AND Amount = High)]
  Extensive Check      [Rating = Bad]
  Simple Check         [Rating = Good OR (Rating = Bad AND Amount = Low)]
Alternative Decision Table Notation
  C  Rating   Amount  Activity
  1  Good     -       Simple Check
  2  Bad      -       Extensive Check
  3  Bad      Low     Simple Check
  4  Bad      High    Request Information
  5  Unknown  -       Request Information
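To make the overlap concrete, the guards above can be read as predicates that are all evaluated, with every activity whose guard holds being enabled. The following is a minimal sketch under that reading; the names `GUARDS` and `enabled_activities` are illustrative and not taken from the talk or from any tool.

```python
# Guards from the slide, written as predicates over the recorded attribute values.
# GUARDS and enabled_activities are hypothetical names used only for illustration.
GUARDS = {
    "Request Information": lambda d: d["Rating"] == "Unknown"
    or (d["Rating"] == "Bad" and d["Amount"] == "High"),
    "Extensive Check": lambda d: d["Rating"] == "Bad",
    "Simple Check": lambda d: d["Rating"] == "Good"
    or (d["Rating"] == "Bad" and d["Amount"] == "Low"),
}


def enabled_activities(data):
    """Return every activity whose guard holds for the given attribute values."""
    return {activity for activity, guard in GUARDS.items() if guard(data)}


if __name__ == "__main__":
    for rating, amount in [("Good", "High"), ("Bad", "High"), ("Bad", "Low")]:
        values = {"Rating": rating, "Amount": amount}
        print(rating, amount, sorted(enabled_activities(values)))
```

For Rating = Bad two activities are enabled at once, which is exactly the situation a mutually-exclusive rule set cannot express.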
Proposed Discovery Method
[Figure: Process Model + Event Log → Overlapping Rule Discovery → Process Model With Overlapping Rules]
foreach Decision Point:
  Collect Instances → 1st Classification → Collect Misclassified → 2nd Classification → Build Rules
1) Collect Instances
Event Log → collect → Observation instances
  Count  Rating   Amount  Outcome
  6x     Good     Low     Simple
  6x     Good     High    Simple
  6x     Bad      High    Extensive
  4x     Bad      High    Request
  6x     Bad      Low     Extensive
  4x     Bad      Low     Simple
  6x     Unknown  High    Request
• Cyclic Behavior
• Noise (Missing / Additional Events)
• Unassigned values
• Inconsistent recording
Alignment-based method
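As a rough illustration of what an observation instance is, the sketch below walks each trace, tracks the latest attribute values, and records them together with the activity chosen at the decision point. It is a deliberately naive stand-in and does not implement the alignment-based method, so it would not cope with the noise and cyclic behavior listed above. The toy log, the assumption that Apply writes Rating and Amount, and all names are hypothetical.

```python
from collections import Counter

# Hypothetical toy log: each trace is a list of (activity, attributes written).
LOG = [
    [("Apply", {"Rating": "Good", "Amount": "Low"}), ("Simple Check", {})],
    [("Apply", {"Rating": "Bad", "Amount": "High"}), ("Extensive Check", {})],
    [("Apply", {"Rating": "Bad", "Amount": "High"}), ("Request Information", {})],
    [("Apply", {"Rating": "Bad", "Amount": "High"}), ("Extensive Check", {})],
]

# Activities that compete at the decision point after Apply.
DECISION_OUTCOMES = {"Simple Check", "Extensive Check", "Request Information"}
ATTRIBUTES = ("Rating", "Amount")


def collect_instances(log):
    """Count (attribute values, chosen activity) pairs at the decision point."""
    instances = Counter()
    for trace in log:
        state = {}
        for activity, written in trace:
            if activity in DECISION_OUTCOMES:
                # Use the values known before the chosen activity executes;
                # attributes that were never written show up as None.
                key = tuple(state.get(name) for name in ATTRIBUTES) + (activity,)
                instances[key] += 1
            state.update(written)
    return instances


if __name__ == "__main__":
    for row, count in collect_instances(LOG).items():
        print(f"{count}x", *row)
```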
2) 1st Classification & 3) Misclassified Instances
Instances
  Count  Rating   Amount  Outcome
  6x     Good     Low     Simple
  6x     Good     High    Simple
  6x     Bad      High    Extensive
  4x     Bad      High    Request
  6x     Bad      Low     Extensive
  4x     Bad      Low     Simple
  6x     Unknown  High    Request
Decision Tree
  Rating
    Good    → Simple     (12 OK)
    Bad     → Extensive  (12 OK, 8 NOK)
    Unknown → Request    (6 OK)
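The first classification and the collection of misclassified instances can be sketched as follows. This is a minimal sketch that assumes a one-level tree (a single split chosen by fewest misclassifications) in place of a full decision tree learner; `learn_stump` and `misclassified` are illustrative names, not part of the presented implementation.

```python
from collections import Counter, defaultdict

# Observation instances from the slide: (Rating, Amount, Outcome) -> count.
INSTANCES = {
    ("Good", "Low", "Simple"): 6,
    ("Good", "High", "Simple"): 6,
    ("Bad", "High", "Extensive"): 6,
    ("Bad", "High", "Request"): 4,
    ("Bad", "Low", "Extensive"): 6,
    ("Bad", "Low", "Simple"): 4,
    ("Unknown", "High", "Request"): 6,
}
ATTRIBUTES = ("Rating", "Amount")


def learn_stump(instances):
    """Pick the attribute whose majority-vote split misclassifies the fewest instances."""
    best = None
    for index, name in enumerate(ATTRIBUTES):
        votes = defaultdict(Counter)
        for row, count in instances.items():
            votes[row[index]][row[-1]] += count
        leaves = {value: c.most_common(1)[0][0] for value, c in votes.items()}
        errors = sum(n for row, n in instances.items() if leaves[row[index]] != row[-1])
        if best is None or errors < best[0]:
            best = (errors, index, name, leaves)
    return best[1:]  # (attribute index, attribute name, value -> predicted outcome)


def misclassified(instances, stump):
    """Return the instances whose observed outcome differs from the leaf prediction."""
    index, _, leaves = stump
    return {row: n for row, n in instances.items() if leaves[row[index]] != row[-1]}


if __name__ == "__main__":
    stump = learn_stump(INSTANCES)
    print("split on", stump[1], stump[2])
    print("misclassified", misclassified(INSTANCES, stump))
```

On the instances above this splits on Rating and leaves the 8 instances marked NOK as misclassified.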
4) 2nd Classification
Instances
  Count  Rating  Amount  Outcome
  4x     Bad     High    Request
  4x     Bad     Low     Simple
2nd Decision Tree
  Amount
    High → Request
    Low  → Simple
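The second classification repeats the learning step on the misclassified instances only. A minimal sketch, again assuming a single majority-vote split rather than a full tree learner (the helper names are hypothetical):

```python
from collections import Counter, defaultdict

# Instances misclassified by the first tree (from the slide).
MISCLASSIFIED = {
    ("Bad", "High", "Request"): 4,
    ("Bad", "Low", "Simple"): 4,
}
ATTRIBUTES = ("Rating", "Amount")


def split_on(instances, index):
    """Majority-vote leaves and error count when splitting on one attribute."""
    votes = defaultdict(Counter)
    for row, count in instances.items():
        votes[row[index]][row[-1]] += count
    leaves = {value: c.most_common(1)[0][0] for value, c in votes.items()}
    errors = sum(n for row, n in instances.items() if leaves[row[index]] != row[-1])
    return leaves, errors


def second_tree(instances):
    """Choose the attribute with the fewest errors on the misclassified instances."""
    candidates = [(split_on(instances, i), name) for i, name in enumerate(ATTRIBUTES)]
    (leaves, errors), name = min(candidates, key=lambda c: c[0][1])
    return name, leaves, errors


if __name__ == "__main__":
    print(second_tree(MISCLASSIFIED))
```

Rating cannot separate the two groups because both have Rating = Bad, so the split falls on Amount.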
5) Build Overlapping Decision Rules
Rating
  Good    → Simple
  Bad     → Extensive
  Unknown → Request
Amount
  High → Request
  Low  → Simple
Compiled to overlapping rules
  If Rating = Good then Simple
  If Rating = Unknown then Request
  If Rating = Bad then Extensive
  If Rating = Bad AND Amount = High then Request
  If Rating = Bad AND Amount = Low then Simple
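One way to picture the compilation step: every leaf of the first tree becomes a rule, and every leaf of the second tree becomes a rule that additionally carries the condition of the first-tree branch its instances came from. The sketch below hard-codes the two trees from the slides and assumes that all misclassified instances stem from the Rating = Bad branch, as in this example; the data layout and function name are illustrative only.

```python
from collections import defaultdict

# The two trees from the slides, written as (attribute, {value: outcome}).
FIRST_TREE = ("Rating", {"Good": "Simple", "Bad": "Extensive", "Unknown": "Request"})
SECOND_TREE = ("Amount", {"High": "Request", "Low": "Simple"})
# Assumption for this example: the second tree was learned only on instances
# from the Rating = Bad branch, so its rules inherit that condition.
SECOND_TREE_CONTEXT = "Rating = Bad"


def compile_rules(first, second, context):
    """Return outcome -> list of conjunctions; several entries read as OR."""
    rules = defaultdict(list)
    attribute, leaves = first
    for value, outcome in leaves.items():
        rules[outcome].append(f"{attribute} = {value}")
    attribute, leaves = second
    for value, outcome in leaves.items():
        rules[outcome].append(f"{context} AND {attribute} = {value}")
    return dict(rules)


if __name__ == "__main__":
    compiled = compile_rules(FIRST_TREE, SECOND_TREE, SECOND_TREE_CONTEXT)
    for outcome, conjunctions in compiled.items():
        print(f"{outcome}: " + " OR ".join(conjunctions))
```

Joining the conjunctions per outcome with OR yields one guard per activity.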
Resulting Data-aware Process Model
[Figure: decision point after Apply, with variables Amount and Rating]
  Request Information  [Rating = Unknown OR (Rating = Bad AND Amount = High)]
  Extensive Check      [Rating = Bad]
  Simple Check         [Rating = Good OR (Rating = Bad AND Amount = Low)]
Trade-off: Precise and fitting model
Instances
  Count  Rating   Amount  Outcome
  6x     Good     Low     Simple
  6x     Good     High    Simple
  6x     Bad      High    Extensive
  4x     Bad      High    Request
  6x     Bad      Low     Extensive
  4x     Bad      Low     Simple
  6x     Unknown  High    Request
Model without rules: Imprecise [Underfitting]
  Request Information, Extensive Check, Simple Check (no guards)
Model with overlapping rules: Good Trade-off
  Request Information  [Rating = Unknown OR (Rating = Bad AND Amount = High)]
  Extensive Check      [Rating = Bad]
  Simple Check         [Rating = Good OR (Rating = Bad AND Amount = Low)]
Model with mutually-exclusive rules: Unfitting
  Request Information  [Rating = Unknown]
  Extensive Check      [Rating = Bad]
  Simple Check         [Rating = Good]
Evaluation – Measures
Precision: How much unobserved behavior is modelled?
Fitness: How much observed behavior is modelled?
Image source (CC BY-SA): https://en.wikipedia.org/wiki/Precision_and_recall#/media/File:Precisionrecall.svg
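To get a feel for the two questions, the sketch below scores three rule sets on the observation instances of the running example. Both functions are crude stand-ins defined only for this illustration and are not the fitness and precision measures used in the evaluation: `fitness` is the share of observations whose outcome is allowed by the guards, and `precision` is the share of enabled activities that were also observed for the same attribute values.

```python
# Observation instances of the running example: (Rating, Amount, Outcome) -> count.
INSTANCES = {
    ("Good", "Low", "Simple"): 6,
    ("Good", "High", "Simple"): 6,
    ("Bad", "High", "Extensive"): 6,
    ("Bad", "High", "Request"): 4,
    ("Bad", "Low", "Extensive"): 6,
    ("Bad", "Low", "Simple"): 4,
    ("Unknown", "High", "Request"): 6,
}
ACTIVITIES = {"Simple", "Extensive", "Request"}


def no_rules(rating, amount):
    """Model without guards: everything is always enabled."""
    return set(ACTIVITIES)


def exclusive_rules(rating, amount):
    """Mutually-exclusive guards on Rating only."""
    return {{"Good": "Simple", "Bad": "Extensive", "Unknown": "Request"}[rating]}


def overlapping_rules(rating, amount):
    """Overlapping guards compiled from the two trees."""
    enabled = set()
    if rating == "Good" or (rating == "Bad" and amount == "Low"):
        enabled.add("Simple")
    if rating == "Bad":
        enabled.add("Extensive")
    if rating == "Unknown" or (rating == "Bad" and amount == "High"):
        enabled.add("Request")
    return enabled


def fitness(model):
    """Simplified: share of observations whose outcome the model allows."""
    total = sum(INSTANCES.values())
    allowed = sum(n for (r, a, o), n in INSTANCES.items() if o in model(r, a))
    return allowed / total


def precision(model):
    """Simplified: share of enabled activities also observed for the same values."""
    observed = {}
    for rating, amount, outcome in INSTANCES:
        observed.setdefault((rating, amount), set()).add(outcome)
    total = sum(INSTANCES.values())
    score = 0.0
    for (rating, amount, _), n in INSTANCES.items():
        enabled = model(rating, amount)
        score += n * len(enabled & observed[(rating, amount)]) / len(enabled)
    return score / total


if __name__ == "__main__":
    for model in (no_rules, exclusive_rules, overlapping_rules):
        print(model.__name__, round(fitness(model), 3), round(precision(model), 3))
```

On this toy data the mutually-exclusive rules leave 8 of 38 observations unexplained, while the model without rules enables activities that were never observed for some attribute values.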
Evaluation – Setup
Compared Methods
  Method  Description                     Expected Precision  Expected Fitness
  WO      Without rules                   Poor                Good
  DTF     Mutually-exclusive approach     Good                Poor
  DTT     Naïve overlapping approach      Poor                Good
  DTO     Presented overlapping approach  Balanced            Balanced
Datasets
  Dataset     # Traces  # Events  # Attributes  # Decisions
  Road Fines  150,000   500,000   9             5
  Hospital    1,000     15,000    39            11
Evaluation – Example rules in the hospital data
[Figure: process fragment with labels Triage, Intensive Care, Normal Care, skip, S-p5, Infusions, Tests, Release, Lactate, Hypotensie]
  Method  Intensive Care    Normal Care  Skip                         Assessment
  DTO     L > 0 ∧ H = true  L > 0        L ≤ 0 ∨ (L > 0 ∧ H = false)  Good trade-off
  DTT     true              L > 0        L ≤ 0                        Imprecise
  DTF     false             L > 0        L ≤ 0                        Unfitting
Evaluation – Precision & Fitness
[Charts: Fitness, Precision]
• Fitness: how often rules are violated
• DTO improves fitness over DTF (mutually-exclusive)
• Precision: how strict the rules are
• DTO improves precision over WO
• DTO sacrifices precision vs. DTF
Conclusion & Future Work
• Method: Discovery of overlapping rules using event logs
  • Based on decision tree induction
  • ProM framework: MultiPerspectiveExplorer http://www.promtools.org
• Results: Trade-off between fitness & precision
  • Improves the model fitness over standard trees
  • Improves the model precision over naïve approach
• Future work
  • Better experimental validation
  • Manage the complexity of discovered rules
  • Imbalanced distributions
Questions?
@fmannhardt - [email protected] - http://promtools.org
Multi-Perspective Explorer