Cleaning Structured Event Logs: A Graph Repair Approach
Jianmin Wang 1, Shaoxu Song 1, Xuemin Lin 2, Xiaochen Zhu 1, Jian Pei 3
1 Tsinghua University, China
2 University of New South Wales, Australia
3 Simon Fraser University, Canada
ICDE 2015
Event Log
Information systems record the business history in their event logs.
Huge Amount of Event Data:
Products No. of Event Traces
Power Generator 1,230,000
Machinery 3,260,000
Train 2,600,000
Event Name Operator Successor
t1 submit M. Liu F. Kang
t2 design F. Kang J. Zhe & O. Chu
t3 insulation proof J. Zhe X. Feng
t4 check inventory O. Chu X. Feng
t5 evaluate X. Feng System2
t6 archive System2 -------
Structured Event Data
Structural information does exist among events. Task passing relationships:
Event Name Operator Successor
t1 submit A B
t2 design B C & D
t3 insulation proof C E
t4 check inventory D E
t5 evaluate E F
t6 archive F -------
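The task passing relationships in the table above can be turned into an execution graph mechanically. The following is a minimal sketch, not the authors' implementation: it assumes each operator performs exactly one event in the trace (true for this table), and the names `LOG`, `build_execution_graph`, and `GRAPH` are hypothetical.

```python
from collections import defaultdict

# Rows copied from the table above: (event, name, operator, successors).
LOG = [
    ("t1", "submit", "A", ["B"]),
    ("t2", "design", "B", ["C", "D"]),
    ("t3", "insulation proof", "C", ["E"]),
    ("t4", "check inventory", "D", ["E"]),
    ("t5", "evaluate", "E", ["F"]),
    ("t6", "archive", "F", []),
]


def build_execution_graph(log):
    """Return {event: [successor events]}.

    An edge t -> t' is added when the operator of t' is listed as a
    successor of t (assumption: one event per operator in a trace).
    """
    by_operator = defaultdict(list)
    for event, _name, operator, _successors in log:
        by_operator[operator].append(event)

    graph = {}
    for event, _name, _operator, successors in log:
        graph[event] = [e for op in successors for e in by_operator[op]]
    return graph


GRAPH = build_execution_graph(LOG)
```

Here `GRAPH["t2"]` lists both t3 and t4, which reflects the parallel hand-off from design to insulation proof and check inventory.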
[Figure: the structured event log shown as an execution graph over the events submit, design, insulation proof, check inventory, evaluate, and archive; the legend distinguishes human tasks from service tasks.]
Process Specification
Business events often follow certain business rules or constraints.
Process specification, with constraints expressed by a Petri net:
• Sequence
• Parallel (AND split, AND join)
• Choice (XOR split, XOR join)
[Figure: Petri net specification with places start, a, b, c, d, e, f, g, h, s, end and transitions submit, design, insulation proof, electrician proof, check inventory, evaluate, revise, proof check, merge, re-evaluate, archive.]
Executions follow the specification:
[Figure: three example executions: (submit, design, insulation proof, check inventory, evaluate, archive); (submit, design, electrician proof, check inventory, evaluate, archive); (submit, revise, proof check, merge, re-evaluate, archive).]
Conformance
Representing an execution as a Causal Net (a Petri net without XOR).
A mapping relates the Causal Net to the process specification.
[Figure: Causal Net with places p0:start, p1:a, p2:b, p3:c, p4:d, p5:e, p6:s, p7:end and transitions t1:submit, t2:design, t3:insulation proof, t4:check inventory, t5:evaluate, t6:archive, mapped onto the process specification.]
Dirty Event Data
Two types of dirty event data, according to the specification:
• Inconsistent Labeling
[Figure: Causal Net with places p0:start, p1:a, p2:b, p3:c, p4:d, p5:e, p6:s, p7:end and transitions t1:submit, t2:revise, t3:proof, t4:--------, t5:evaluate, t6:archive. The labels t2:revise, t3:proof, and t4:-------- are set against the specification's transitions (revise, proof check, merge, re-evaluate; electrician proof, insulation proof, check inventory).]
• Unsound Structure
[Figure: Causal Net with places p0:start, p1:a, p2:b, p3:end and transitions t1:submit, t2:design, t3:archive.]
Meaning of Repair
The causes of dirty events:
• Man-made errors (typos)
• System failures (power down)
Survey in a bus manufacturer: 82% of executions are dirty; 77.62% have inconsistent labeling and 4.45% have unsound structure.
Dirty event data may:
• Return wrong provenance answers
• Mislead aggregation profiling
• Obstruct finding interesting process patterns
Repair Dirty Event
Inconsistent Labeling:
1. Find all consistent mappings
2. Choose the one with the minimum repairing cost
[Figure: two candidate repairs of the Causal Net (places p0:start, p1:a, p2:b, p3:c, p4:d, p5:e, p6:s, p7:end; transitions t1:submit, t2:design, t4:check inventory, t5:evaluate, t6:archive), one with t3:electrician proof and one with t3:insulation proof.]
Unsound Structure: No valid repair is found.
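The two steps for inconsistent labeling can be illustrated with a small sketch. It is not the authors' algorithm: step 1 is assumed to be already done (the two candidate labelings are copied from the repaired nets on this slide), and the repairing cost is assumed to be the sum of per-event string edit distances, which is only one possible cost function. The names `edit_distance`, `repair_cost`, and `best_repair` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


# Dirty execution from the "Dirty Event Data" slide; t4 has no label.
DIRTY = {"t1": "submit", "t2": "revise", "t3": "proof",
         "t4": "", "t5": "evaluate", "t6": "archive"}

# Step 1 (assumed given): labelings consistent with the specification.
CANDIDATES = [
    {"t1": "submit", "t2": "design", "t3": "electrician proof",
     "t4": "check inventory", "t5": "evaluate", "t6": "archive"},
    {"t1": "submit", "t2": "design", "t3": "insulation proof",
     "t4": "check inventory", "t5": "evaluate", "t6": "archive"},
]


def repair_cost(dirty, candidate):
    """Assumed cost: total edit distance over all event labels."""
    return sum(edit_distance(dirty[t], candidate[t]) for t in dirty)


def best_repair(dirty, candidates):
    """Step 2: pick the candidate with minimum cost; None if there is none."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: repair_cost(dirty, c))


BEST = best_repair(DIRTY, CANDIDATES)
```

Under this assumed cost the two candidates differ only at t3, and "insulation proof" is one edit closer to "proof" than "electrician proof", so `BEST` relabels t3 as insulation proof. A different cost function could select the other candidate. An empty candidate list returns `None`, mirroring the case where no valid repair is found.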
Hardness and Related Work
Hardness: Owing to choices and parallelization of flows, there is a vast number of possible repairs.
Existing methods:
• Event Log Alignment [1]: Does not exploit structural information.
• Graph Repair [2]: Does not consider AND and XOR constraints.
[1] M. de Leoni, F. M. Maggi, and W. M. P. van der Aalst. Aligning event logs and declarative process models for conformance checking. In BPM, pages 82–97, 2012.
[2] S. Song, H. Cheng, J. X. Yu, and L. Chen. Repairing vertex labels under neighborhood constraints. PVLDB, 7(11):987–998, 2014.