Process-oriented System Analysis Process Mining
Jan 05, 2016
BPM Lifecycle
Motivation
Up until now: designed or pre-defined models
Assumption: these models are appropriate
Process Mining
Consideration of information from the execution of processes
This information is captured in log data
Logs
Sequence of log entries, which capture events in a company that relate to processes
Log entries
Examples of log entries
Check Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:19:57
Function StoreCustomerData(„Müller“, c1987, „Bad Bentheim“) completed on 12.11.2010 at 9:22:24
Send Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:23:18
Function ContactCustomer(c1987, PromoMailing) completed on 12.11.2010 at 9:24:10
Function StoreCustomerData(„Miller“, c1988, „Osnabrück“) completed on 12.11.2010 at 9:26:08
Check Invoice for Invoice No. 4568 completed on 12.11.2010 at 9:26:38
Function ContactCustomer(c1988, PromoMailing) completed on 12.11.2010 at 9:27:32
Logs bear valuable information
Logs help to answer questions like:
When were process instances executed, and how many?
Are there recurring patterns in the execution of activities?
Can process models be derived from the data?
How often is each execution path of the process model taken?
Are there paths which are never taken?
Process Discovery
Process Discovery is a technique for deriving a process model from log data
Input: execution logs as ordered lists of activities with time stamp and case id
Output: process model which could have generated the execution logs
The case id is often not directly contained in the data and needs to be generated during pre-processing
Process Conformance
Process Conformance is a technique to analyze the relationship between log data and process models
Input: Logs and process model
Output: information on the relationship, e.g. fitness
Overview
Execution Logs
Assumption
The execution log defines a complete order of events, all of which can be related to process activities
All events in the execution log relate to process instances of the considered process
Hint
Log entries often refer to different process models
This requires filtering of activities
Abstraction
Techniques often work on an abstraction of the logs
Focus on case id and activities
Execution Log Format
Log format: (caseID, activity)
Example
Check Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:19:57
Function StoreCustomerData(„Müller“, c1987, „Bad Bentheim“) completed on 12.11.2010 at 9:22:24
Send Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:23:18
Resulting log: (4567, Check Invoice), (c1987, StoreCustomerData), (4567, Send Invoice), etc.
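The abstraction from raw entries to (caseID, activity) pairs can be sketched as follows. This is a minimal sketch, assuming only the two entry shapes shown in the example; the regular expressions and the function name `to_case_activity` are hypothetical, and treating a customer number such as c1987 as the case id of `Function` entries is an assumption read off the resulting log above.

```python
import re

# Hypothetical patterns for the two entry shapes in the example;
# a real log would need patterns matching its own format.
INVOICE_ENTRY = re.compile(
    r"^(?P<activity>.+?) for Invoice No\. (?P<case>\d+) completed on")
FUNCTION_ENTRY = re.compile(
    r"^Function (?P<activity>\w+)\((?P<args>[^)]*)\) completed on")
CUSTOMER_ID = re.compile(r"\bc\d+\b")  # assumed case id: customer number like c1987


def to_case_activity(entry):
    """Abstract one raw log entry to a (caseID, activity) pair.

    Returns None for entries matching neither assumed pattern; such entries
    would be filtered out before mining.
    """
    match = INVOICE_ENTRY.match(entry)
    if match:
        return (match.group("case"), match.group("activity"))
    match = FUNCTION_ENTRY.match(entry)
    if match:
        customer = CUSTOMER_ID.search(match.group("args"))
        if customer:
            return (customer.group(), match.group("activity"))
    return None


entries = [
    "Check Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:19:57",
    "Function StoreCustomerData(„Müller“, c1987, „Bad Bentheim“) "
    "completed on 12.11.2010 at 9:22:24",
    "Send Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:23:18",
]
print([to_case_activity(e) for e in entries])
# [('4567', 'Check Invoice'), ('c1987', 'StoreCustomerData'), ('4567', 'Send Invoice')]
```

The timestamps are dropped; the order of the returned pairs is the order of the entries, which is all the later techniques use.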
Execution Log
Further abstraction
Activities abbreviated as letters (A, B, etc.)
(case id, task id)
Additional information
Event type, time, resource, data
Not considered here
Assumption
Activity execution captured by one event
No intermediate activities
case 1 : task A
case 2 : task A
case 3 : task A
case 3 : task B
case 1 : task B
case 1 : task C
case 2 : task C
case 4 : task A
case 2 : task B
case 2 : task D
case 5 : task E
case 4 : task C
case 1 : task D
case 3 : task C
case 3 : task D
case 4 : task B
case 5 : task F
case 4 : task D
The Alpha Algorithm
Process Discovery Algorithms
Simplest algorithm: the α-Algorithm
Relatively simple; some of its properties can be proven
Sensitive to noise, therefore not the first choice in practice
Noise refers to incomplete or erroneous logs
Furthermore, the α+ and α++ algorithms
α+ and α++ are extensions of the α-Algorithm that recognize more fine-grained structures in the process model
Also sensitive to noise
Finally, techniques for dealing with noise
Definitions
Let T be a set of activities (tasks) and T* the set of all sequences of arbitrary length over T. Then:
σ ∈ T* is called an execution sequence if all activities in σ belong to the same process instance
W ⊆ T* is called an execution log (workflow log)
Assumptions
In each process model, each activity appears at most once
Each direct neighbor relation between activities is represented at least once in the log
Execution Logs
Execution sequences:
Case 1: ABCD
Case 2: ACBD
Case 3: ABCD
Case 4: ACBD
Case 5: EF
Resulting workflow log: W = {ABCD, ACBD, EF}
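The grouping of events by case id, done by hand above, can be sketched in a few lines. This is a minimal sketch, assuming events are given as (case id, task id) tuples in log order and that task ids are single letters so each sequence can be written as a string; the names `EVENTS`, `execution_sequences` and `workflow_log` are hypothetical.

```python
from collections import defaultdict

# The example log as (case id, task id) events, in the order they were logged.
EVENTS = [(1, "A"), (2, "A"), (3, "A"), (3, "B"), (1, "B"), (1, "C"),
          (2, "C"), (4, "A"), (2, "B"), (2, "D"), (5, "E"), (4, "C"),
          (1, "D"), (3, "C"), (3, "D"), (4, "B"), (5, "F"), (4, "D")]


def execution_sequences(events):
    """Group events by case id; each case keeps the order of its events in the log."""
    sequences = defaultdict(list)
    for case_id, task in events:
        sequences[case_id].append(task)
    # Assumes single-letter task ids, so a sequence can be written as a string.
    return {case_id: "".join(tasks) for case_id, tasks in sequences.items()}


def workflow_log(events):
    """W is the set of distinct execution sequences (duplicates collapse)."""
    return set(execution_sequences(events).values())


print(execution_sequences(EVENTS))
# {1: 'ABCD', 2: 'ACBD', 3: 'ABCD', 4: 'ACBD', 5: 'EF'}
print(sorted(workflow_log(EVENTS)))
# ['ABCD', 'ACBD', 'EF']
```

Because W is a set, cases 1 and 3 (and cases 2 and 4) contribute only one sequence each; frequencies are lost, which the α-Algorithm does not need.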
Order relations
Log-based order relations for pairs of activities a, b ∈ T in a workflow log W:
Direct successor
a >w b, i.e., in some execution sequence b directly follows a
Causality
a →w b, i.e., a >w b and not b >w a
Concurrency
a ║w b, i.e., a >w b and b >w a
Exclusiveness
a #w b, i.e., not a >w b and not b >w a
Activity pairs that never directly follow each other
Execution log analysis
W = {ABCD, ACBD, EF}
1) Direct successor: A>B, A>C, B>C, B>D, C>B, C>D, E>F
2) Causality: A→B, A→C, B→D, C→D, E→F
3) Concurrency: B||C, C||B
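The relations can be computed mechanically from W. The sketch below (the function name `order_relations` and its string keys are hypothetical) reproduces the analysis for W = {ABCD, ACBD, EF}. Note that the exclusiveness set also contains pairs of an activity with itself, since no activity directly follows itself in this log.

```python
def order_relations(log):
    """Log-based order relations of a workflow log W (a collection of sequences).

    Returns the relations as sets of activity pairs under the keys
    ">" (direct successor), "->" (causality), "||" (concurrency), "#" (exclusiveness).
    """
    activities = sorted({a for seq in log for a in seq})
    succ = {(a, b) for seq in log for a, b in zip(seq, seq[1:])}      # a >w b
    return {
        ">": succ,
        "->": {(a, b) for (a, b) in succ if (b, a) not in succ},      # a →w b
        "||": {(a, b) for (a, b) in succ if (b, a) in succ},          # a ║w b
        "#": {(a, b) for a in activities for b in activities          # a #w b
              if (a, b) not in succ and (b, a) not in succ},
    }


relations = order_relations({"ABCD", "ACBD", "EF"})
print(sorted(relations[">"]))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'D'), ('C', 'B'), ('C', 'D'), ('E', 'F')]
print(sorted(relations["->"]))
# [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('E', 'F')]
print(sorted(relations["||"]))
# [('B', 'C'), ('C', 'B')]
```

Only >w is read from the log; the other three relations are derived from it, which is why completeness of >w matters later.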
α-Algorithm
The idea is to use the order relations to derive a workflow net that complies with them
More precisely, each order relation results in a Petri net fragment that enforces the respective relationship
Idea (a): sequence
a → b
Idea (b): XOR-split
a → b, a → c and b # c
Idea (c): XOR-join
b → d, c → d and b # c
Idea (d): AND-split
a → b, a → c and b || c
Idea (e): AND-join
b → d, c → d and b || c
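Which of the ideas (b) to (e) apply to a given log can be checked from its order relations: for every task with two causal successors (or predecessors), the relation between those two decides between a choice and concurrency; every single causal pair is an instance of idea (a). A minimal sketch, with hypothetical function names, assuming single-letter task ids:

```python
from itertools import combinations


def order_relations(log):
    """Return a function giving the relation ('->', '<-', '||', '#') of a pair."""
    succ = {(a, b) for trace in log for a, b in zip(trace, trace[1:])}

    def relation(a, b):
        if (a, b) in succ and (b, a) in succ:
            return "||"
        if (a, b) in succ:
            return "->"
        if (b, a) in succ:
            return "<-"
        return "#"

    return relation


def split_join_ideas(log):
    """Match the ideas (b)-(e) against the causal relations of a log."""
    relation = order_relations(log)
    tasks = sorted({t for trace in log for t in trace})
    found = []
    for x in tasks:
        successors = [t for t in tasks if relation(x, t) == "->"]
        predecessors = [t for t in tasks if relation(t, x) == "->"]
        for b, c in combinations(successors, 2):
            if relation(b, c) == "#":
                found.append(("(b)", x, b, c))   # XOR-split: x→b, x→c, b # c
            elif relation(b, c) == "||":
                found.append(("(d)", x, b, c))   # AND-split: x→b, x→c, b || c
        for b, c in combinations(predecessors, 2):
            if relation(b, c) == "#":
                found.append(("(c)", b, c, x))   # XOR-join: b→x, c→x, b # c
            elif relation(b, c) == "||":
                found.append(("(e)", b, c, x))   # AND-join: b→x, c→x, b || c
    return found


print(split_join_ideas({"ABCD", "ACBD", "EF"}))
# [('(d)', 'A', 'B', 'C'), ('(e)', 'B', 'C', 'D')]
print(split_join_ideas({"ABD", "ACD"}))
# [('(b)', 'A', 'B', 'C'), ('(c)', 'B', 'C', 'D')]
```

The second log, {ABD, ACD}, is a hypothetical contrast case in which B and C never follow each other, so the same shape is classified as a choice instead of concurrency.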
The Alpha-Algorithm (simplified)
1. Identify the set of all tasks in the log as TL.
2. Identify the set of all tasks that have been observed as the first task in some case as TI.
3. Identify the set of all tasks that have been observed as the last task in some case as TO.
4. Identify the set of all connections to be potentially represented in the process model as a set XL. Add the following elements to XL:
a. Pattern (a): all pairs for which a→b holds.
b. Pattern (b): all triples for which a→(b#c) holds.
c. Pattern (c): all triples for which (b#c)→d holds.
Note that triples for which Pattern (d) a→(b||c) or Pattern (e) (b||c)→d hold are not included in XL.
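Steps 1 to 4 can be sketched as follows. The encoding of XL is an assumption of this sketch, not part of the algorithm: a pair a→b is the tuple (a, b), a triple a→(b#c) is (a, frozenset({b, c})) and a triple (b#c)→d is (frozenset({b, c}), d). The function name `alpha_steps_1_to_4` is hypothetical.

```python
from itertools import combinations


def alpha_steps_1_to_4(log):
    """Steps 1-4 of the simplified alpha algorithm for a log of task sequences."""
    succ = {(a, b) for trace in log for a, b in zip(trace, trace[1:])}

    def causal(a, b):      # a → b
        return (a, b) in succ and (b, a) not in succ

    def exclusive(a, b):   # a # b
        return (a, b) not in succ and (b, a) not in succ

    TL = {t for trace in log for t in trace}                # step 1: all tasks
    TI = {trace[0] for trace in log}                         # step 2: first tasks
    TO = {trace[-1] for trace in log}                        # step 3: last tasks
    XL = {(a, b) for a in TL for b in TL if causal(a, b)}    # step 4a: a→b
    for x in TL:
        for b, c in combinations(sorted(TL), 2):
            if exclusive(b, c):
                if causal(x, b) and causal(x, c):
                    XL.add((x, frozenset({b, c})))           # step 4b: x→(b#c)
                if causal(b, x) and causal(c, x):
                    XL.add((frozenset({b, c}), x))           # step 4c: (b#c)→x
    # Pairs with b || c (patterns (d) and (e)) never produce triples here.
    return TL, TI, TO, XL


TL, TI, TO, XL = alpha_steps_1_to_4({"ABCD", "ACBD", "EF"})
print(sorted(TI), sorted(TO))   # ['A', 'E'] ['D', 'F']
print(sorted(XL))               # [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('E', 'F')]
```

For the example log, B || C holds, so XL contains only pattern (a) pairs; a log such as {ABD, ACD} would add the two triples A→(B#C) and (B#C)→D.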
The Alpha-Algorithm (cont.)
5. Construct the set YL as a subset of XL by:
a. Eliminating a→b and a→c if there exists some a→(b#c).
b. Eliminating b→d and c→d if there exists some (b#c)→d.
6. Connect start and end events in the following way:
a. If there are multiple tasks in the set TI of first tasks, then draw a start event leading to an XOR-split, which connects to every task in TI. Otherwise, directly connect the start event with the only first task.
b. For each task in the set TO of last tasks, add an end event and draw an arc from the task to the end event.
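Steps 5 and 6 can be sketched as two small functions. They assume a hypothetical encoding of XL: (a, b) for a→b, (a, frozenset({b, c})) for a→(b#c) and (frozenset({b, c}), d) for (b#c)→d. The node names "start", "xor-split(start)" and "end(…)" are illustrative only.

```python
def reduce_to_YL(XL):
    """Step 5: drop pairs a→b, a→c (resp. b→d, c→d) that are covered by a triple."""
    covered = set()
    for left, right in XL:
        if isinstance(right, frozenset):       # a→(b#c) covers a→b and a→c
            covered |= {(left, t) for t in right}
        elif isinstance(left, frozenset):      # (b#c)→d covers b→d and c→d
            covered |= {(t, right) for t in left}
    return XL - covered


def start_and_end_arcs(TI, TO):
    """Step 6: connect a start event to the first tasks and last tasks to end events."""
    arcs = []
    if len(TI) > 1:                            # several first tasks: XOR-split
        arcs.append(("start", "xor-split(start)"))
        arcs += [("xor-split(start)", t) for t in sorted(TI)]
    else:                                      # exactly one first task
        (first,) = TI
        arcs.append(("start", first))
    arcs += [(t, f"end({t})") for t in sorted(TO)]   # one end event per last task
    return arcs


XL = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
      ("A", frozenset("BC")), (frozenset("BC"), "D")}
print(sorted(reduce_to_YL(XL), key=repr))
print(start_and_end_arcs({"A", "E"}, {"D", "F"}))
```

The XL used here is a hypothetical one, as a log like {ABD, ACD} would produce it; for the slides' example log no triples exist, so YL equals XL there.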
The Alpha-Algorithm (cont.)
7. Construct the flow arcs in the following way:
a. Pattern (a): For each a→b in YL, draw an arc a to b.
b. Pattern (b): For each a→(b#c) in YL, draw an arc from a to an XOR-split, and from there to b and c.
c. Pattern (c): For each (b#c)→d in YL, draw an arc from b and c to an XOR-join, and from there to d.
d. Pattern (d) and (e): If a task in the process model constructed so far has multiple outgoing or multiple incoming arcs, bundle these arcs with an AND-split or AND-join, respectively.
8. Return the newly constructed process model.
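Step 7 can be sketched as a function from YL to a set of arcs. This is a minimal sketch under a hypothetical encoding: (a, b) for a→b, (a, frozenset({b, c})) for a→(b#c) and (frozenset({b, c}), d) for (b#c)→d; gateway names such as "xor-split(A)" and "and-join(D)" are illustrative, and task names are assumed not to start with "xor-" or "and-".

```python
from collections import Counter


def construct_flows(YL):
    """Step 7: turn YL into arcs between tasks and gateways."""
    arcs = set()
    for left, right in YL:
        if isinstance(right, frozenset):            # 7b: a → XOR-split → b, c
            gateway = f"xor-split({left})"
            arcs.add((left, gateway))
            arcs |= {(gateway, t) for t in right}
        elif isinstance(left, frozenset):           # 7c: b, c → XOR-join → d
            gateway = f"xor-join({right})"
            arcs |= {(t, gateway) for t in left}
            arcs.add((gateway, right))
        else:                                       # 7a: plain arc a → b
            arcs.add((left, right))

    # 7d: bundle multiple outgoing arcs of a task with an AND-split and
    # multiple incoming arcs with an AND-join.
    tasks = {n for arc in arcs for n in arc if not n.startswith(("xor-", "and-"))}
    outgoing = Counter(src for src, _ in arcs)
    incoming = Counter(dst for _, dst in arcs)
    and_split = {t for t in tasks if outgoing[t] > 1}
    and_join = {t for t in tasks if incoming[t] > 1}
    bundled = set()
    for src, dst in arcs:
        s = f"and-split({src})" if src in and_split else src
        d = f"and-join({dst})" if dst in and_join else dst
        bundled.add((s, d))
    bundled |= {(t, f"and-split({t})") for t in and_split}
    bundled |= {(f"and-join({t})", t) for t in and_join}
    return bundled


YL = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("E", "F")}
for arc in sorted(construct_flows(YL)):
    print(arc)
```

For the example log, A receives an AND-split towards B and C, and D an AND-join from B and C, while E→F stays a plain arc. Start and end events from step 6 are not included here.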
α-Algorithm Example
α(W):
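The resulting α(W) is a workflow net. The sketch below swaps in the Petri-net formulation of the α-algorithm rather than the simplified gateway variant above: each place p(A,B) corresponds to a maximal pair of activity sets (A, B) such that every a in A causally precedes every b in B and the activities within A, and within B, are pairwise exclusive; i and o are the source and sink places. The function and place names are hypothetical, and the subset enumeration is exponential, so this only suits small logs.

```python
from itertools import combinations


def alpha(log):
    """Return (places, arcs) of the workflow net alpha(W) for a small log."""
    tasks = sorted({t for trace in log for t in trace})
    succ = {(a, b) for trace in log for a, b in zip(trace, trace[1:])}

    def causal(a, b):
        return (a, b) in succ and (b, a) not in succ

    def pairwise_exclusive(s):   # a # b for all a, b in s (also excludes self-loops)
        return all((a, b) not in succ for a in s for b in s)

    subsets = [frozenset(c) for r in range(1, len(tasks) + 1)
               for c in combinations(tasks, r)]
    candidates = [s for s in subsets if pairwise_exclusive(s)]
    # All pairs (A, B) with a → b for every a in A and b in B.
    X = [(A, B) for A in candidates for B in candidates
         if all(causal(a, b) for a in A for b in B)]
    # Keep only the maximal pairs; each one becomes a place.
    Y = [(A, B) for (A, B) in X
         if not any(A <= A2 and B <= B2 and (A, B) != (A2, B2) for A2, B2 in X)]

    places = {"i", "o"}
    arcs = {("i", trace[0]) for trace in log} | {(trace[-1], "o") for trace in log}
    for A, B in Y:
        p = f"p({''.join(sorted(A))},{''.join(sorted(B))})"
        places.add(p)
        arcs |= {(a, p) for a in A} | {(p, b) for b in B}
    return places, arcs


places, arcs = alpha({"ABCD", "ACBD", "EF"})
print(sorted(places))
# ['i', 'o', 'p(A,B)', 'p(A,C)', 'p(B,D)', 'p(C,D)', 'p(E,F)']
```

Since A and E both consume from i, they are alternatives; A puts tokens into two separate places, so B and C run concurrently and D waits for both.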
α-Algorithm
Log Completeness
Level of completeness required for a log
Assume that the log contains no entries for the execution sequence EF
Then, the correct process model cannot be derived
Basic assumption: each execution sequence must be part of the log
Consequence: the complete behaviour is visible
Problem: the number of required instances grows dramatically
Example:
10 activities are executed in parallel
Number of potential execution sequences: 10! = 3,628,800
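With n activities that may run in any order, every ordering is a possible execution sequence, so there are n! of them. A quick check of the number above, printed next to the number of ordered activity pairs n·(n−1) for comparison (the function names are hypothetical):

```python
from math import factorial


def interleavings(n):
    """Possible execution sequences when n activities may run in any order."""
    return factorial(n)


def ordered_pairs(n):
    """Ordered pairs (a, b) with a != b among n activities."""
    return n * (n - 1)


for n in (3, 5, 10):
    print(n, interleavings(n), ordered_pairs(n))
# 3 6 6
# 5 120 20
# 10 3628800 90
```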
Log Completeness
Result
For the α-Algorithm, it is sufficient for the log to be complete with respect to the direct successor relation (>w)
Reason
All other relations are derived from the direct successor relation
Interpretation
Whenever two activities can directly follow each other, this must be visible in at least one execution sequence
Hint
In case of highly concurrent process models, this reduces the amount of required execution sequences dramatically
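The reduction can be checked on a toy case. Assuming a process that consists only of four parallel activities A, B, C and D (a setting chosen for this sketch), all 4! = 24 interleavings yield the same direct successor relation as four hand-picked sequences that together contain each of the 12 ordered pairs exactly once:

```python
from itertools import permutations


def direct_successors(log):
    """The >w relation: all pairs (a, b) where b directly follows a in some sequence."""
    return {(a, b) for seq in log for a, b in zip(seq, seq[1:])}


# Every possible ordering of four activities that may run in parallel.
full_log = {"".join(p) for p in permutations("ABCD")}
# Four sequences chosen so that each ordered pair occurs exactly once.
small_log = {"ABCD", "DCBA", "BDAC", "CADB"}

print(len(full_log), len(small_log))                                # 24 4
print(direct_successors(small_log) == direct_successors(full_log))  # True
```

Since >w is all the α-Algorithm reads, both logs lead to the same model.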
Summary
• Execution Logs
• Process Mining using the Alpha-Algorithm