Turning Event Logs into Process Movies: Animating What Has
Really Happened
Massimiliano de Leoni, Suriadi Suriadi, Arthur H. M. ter
Hofstede, Wil M. P. van der Aalst
Abstract: Today's information systems log vast amounts of data that
contain information about the actual execution of business
processes. The analysis of this data can provide a solid starting
point for business process improvement. This is the realm of process
mining, an area which has provided a repertoire of many analysis
techniques. Despite the impressive capabilities of existing process
mining algorithms, dealing with the abundance of data recorded by
contemporary systems and devices remains a challenge. Of particular
importance is the capability to guide the meaningful interpretation
of this ocean of data by process analysts. To this end, insights
from the field of visual analytics can be leveraged. An approach is
proposed where process states are reconstructed from event logs and
visualised in succession, leading to an animated history of a
process. This approach is customisable in how a process state,
partially defined through a collection of activity instances, is
visualised: one can select a map and specify a projection of
activity instances on this map based on their properties. This paper
describes an implementation of the proposal for the open-source
process mining framework ProM and reports on an evaluation with one
of Australia's largest insurance companies: Suncorp.
Index Terms: Business Process Mining, Visual Analytics, Event-log
Animation, Process Visualisation
1 INTRODUCTION
As a result of increased automation and storage capacity, more and
more data is recorded by today's software systems and devices. The
McKinsey Global Institute (MGI) estimated that enterprises globally
stored more than 7 exabytes of new data on disk drives in 2010,
while consumers stored more than 6 exabytes of new data on devices
such as PCs and notebooks [1]. The amount of data recorded in
various domains has been growing exponentially, thereby following
Moore's law. While the availability of large amounts of data is an
enabler for various forms of analysis, the sheer quantity and
diversity of this data creates new challenges [2].
In the field of Business Process Management (BPM), so-called
process-aware information systems record information about the
execution of business processes in event logs. Analysing such event
logs has been the driver of the area of process mining (see e.g.
[3]), which emerged a little over a decade ago. In this relatively
short timespan, this discipline has proven to be capable of
providing deep insight into process-related problems that
contemporary enterprises face. Through the application of process
mining, organisations can discover the processes as they are
conducted in reality, check whether certain practices and
regulations were really followed, and gain insight into bottlenecks,
resource utilisation, and other performance-related aspects of
processes.
Although the field of process mining has shown itself to be a
valuable addition to the BPM landscape, dealing with large
collections of data still remains a challenge. In fact, although
automatic techniques are certainly needed, process analysts need to
be guided with regard to where to focus their attention in this
ocean of data, which automatic techniques to choose for further
analysis and how to fine-tune these techniques. To achieve this, one
can draw on the field of visual analytics, a term coined by Jim
Thomas in [4], which "combines automated analysis techniques with
interactive visualizations for an effective understanding, reasoning
and decision making on the basis of very large and complex data
sets" [5].

• M. de Leoni and W. M. P. van der Aalst are with Eindhoven
  University of Technology. M. de Leoni is also with University of
  Padua.
• S. Suriadi and A. H. M. ter Hofstede are with Queensland
  University of Technology.
A starting point of this paper is the belief that the application
of techniques from the field of visual analytics can play a
significant role in overcoming the challenges related to the
analysis of large collections of (process) data.
In [6] a map metaphor was used to aid people in the selection of
activities to perform. A map could, e.g., be a geographical map, a
timeline, or an organisational chart, and activity instances are
positioned on this map according to their properties. In addition,
the colour of a dot representing an activity instance is determined
by its status or distance, e.g., how close the activity is to its
deadline or how long it has been executing. The approach focussed on
showing the current state of the information system at run-time.
This approach can easily be extended to a-posteriori analysis: using
the information stored in event logs, it is possible to replay the
history and build the states the system went through. Hence, for
each map, a sequence of different photographs can be built, showing
how activities were projected on the map in each of these states.
If, for each map, the constructed sequence of photographs is played
in succession, one obtains a different process movie. These movies
or animations (one per map) provide analysts and domain experts with
a helicopter view of the past execution history seen from different
perspectives.
[Figure 1: screenshot of a process movie. Annotated interface
elements: play button; timeline and time slider; a clock showing the
current timestamp of the movie; slider to adjust the movie speed;
tabs for choosing the animation movie; aggregated dots, whose size
is positively correlated with the number of activity instances
aggregated; distribution of variables per dot (on mouse over); list
of activity instances that are not to be positioned on the map; list
of activity instances that cannot be placed on the map; colour
legend.]
Fig. 1. A screenshot of a process movie referring to the log data
of a process enacted in an Australian insurance company to deal with
claims. Activity instances are projected onto the Australian state
of the claimants.
An example of such a movie is shown in Figure 1, where the event
log refers to the execution of process instances to handle
insurance claims of an insurance company in Australia. The process
map is of a geographic type: each activity instance is projected
onto the map as a dot in the time frame when it was being executed,
as recorded in the log. The dot is positioned on the Australian
state of the claimant. At any given point in time, the size of the
dots on the map represents the number of activity instances which
are projected onto that position. The instances can have different
characteristics (e.g., in the figure, the claim type). Therefore,
the areas of dots are also sliced according to the percentage of
instances with given characteristics, and different colours are
assigned to the slices. Completed activity instances are no longer
shown on the map. Thus, Figure 1 is the photograph of the activity
instances at a particular point in time, as shown in the timestamp
box (i.e. 23 Dec 2011 14:18:05). The time slider at the bottom of
the figure allows users to go back or forth to a certain point in
time to view the corresponding photograph. Alternatively, the
photographs can be played in a continuous sequence by activating
the play button.
In [7] an initial framework was proposed that exploits the map
metaphor to show a summary of an event log in the form of an
animation. The corresponding implementation was realised as a
plug-in for the ProM open-source process mining framework
(www.processmining.org). A set of experiments was conducted with
some stakeholders from a Dutch municipality to obtain some quick
initial feedback [8]. As a result of the feedback, the framework was
adjusted to cope with the problems that were identified. In fact,
significant extensions and modifications of the initial framework
and of the corresponding implementation were needed. This paper
reports on the resulting framework, its corresponding ProM plug-in,
and a new validation effort, this time with subjects from Suncorp,
one of Australia's largest insurance companies. This evaluation is
concerned with determining whether the map metaphor is understood
and can help to provide meaningful insights.
The paper is organised as follows. Section 2 positions our work
with respect to the literature, highlighting the limitations that
exist in relevant state-of-the-art work. Section 3 discusses the
adjusted framework. Then, Section 4 provides some details of the
implementation of this framework, realised as a plug-in of the ProM
environment. Section 5 describes a case study illustrating the
approach in the context of home insurance claims processing at
Suncorp. Section 6 starts by reporting the feedback from
stakeholders of a Dutch municipality, followed by the changes that
were required to cope with the problems pointed out. Then, the
section describes the evaluation of the adapted framework done in
collaboration with Suncorp. Finally, Section 7 concludes the paper
by summarising the results obtained and identifying future
directions of work.
2 RELATED WORK
The approach presented in this paper builds on two emerging
disciplines: process mining [3] and visual analytics [5], [4]. As a
general comment, one can observe that while many techniques have
been developed for process mining, data mining and statistical
analysis, they often do not, or insufficiently, take visualisation
aspects into account. Conversely, one can observe that the research
community in the area of information visualisation (see e.g. [9] or
[10]) has not focussed on process-related aspects.
A number of research works (e.g., [11], [12], [13]) address aspects
related to visualisation in the field of business process
management. These works employ different metaphors and also mash-up
approaches to represent the state of a process at run-time. The
approach described in [6] elaborates on these ideas and captures a
process state as a map with a colouring scheme used for the
activities, representing their status or their characteristics.
This approach, though, is aimed at providing run-time support for
activity selection and not at providing support for the analysis of
the history of a process.
The term visual analytics was coined in [4]. This reference reviews
the early work in this field. A comprehensive, more recent
reference is provided in [5]. Examples of recent significant
research in the area of visual analytics can be found in document
analysis [14], financial analysis [15], [16] and geo-spatial object
analysis [17]. In [18] pioneering work is reported where visual
analytics is applied to the field of data mining. Another
interesting work is [19], which concerns displaying multiple time
series without aggregation. The above body of work uses static
representations for capturing time dependencies, i.e. images that
summarise the analysis of time-oriented data. When larger data sets
come into play, static representations show their limits as they
require large screens to represent the time axis. Conversely,
dynamic representations (i.e., using the physical time) show their
power when the focal point is to analyse the data over time. A
detailed up-to-date survey is reported in [20], where it is clear
that there is no significant research work that uses dynamic
representations: "A frequent goal is to integrate data from multiple
time steps in a single image. Often, time abstraction is used for
this purpose." Unfortunately, to visualise process-related data,
temporal aspects are of crucial importance as they are related to
concurrency and causality of the activities being performed.
Therefore, abstracting time would relegate the time dimension to
second-class status.
Most of the work in the visual analytics area is tailored to a
particular application or to a particular visualisation. This is
also confirmed by [21]: "To our knowledge, there exists no
visualisation framework that [. . . ] provides a broader selection
of possible representations. We think that an open framework fed
with pluggable visual and analytical components for analyzing
time-oriented data is useful. Such a framework will be able to
support multiple analysis tasks and data characteristics, which is a
goal of Visual Analytics." In fact, the configuration of different
maps allows one to plug in different representations, thus
supporting multiple views.
To our knowledge, only the approach of the so-called Fuzzy
Animations [22] can be considered to be process-aware. It focuses
on providing a graphical user interface where the past states
extracted from the event log are projected onto a process model
which is automatically derived from the events in the log. It is a
valuable approach, though quite specific: it only focuses on
control-flow aspects and visualisation is limited to process
graphs.
3 THE FRAMEWORK
In this section, we define a formal framework that captures the
mapping of event logs to movies. Through the resulting movies the
history of a process can be visualised in different contexts, thus
facilitating its analysis. The framework is derived from the
framework presented in [7], where the main difference is that a
richer notion of state is adopted. The notion of state as presented
here allows one to exploit temporal information when defining the
positioning of activity instances.
The events in an event log can be sorted chronologically and
subsequently be replayed in that order. The occurrence of an event
makes the system enter a particular state. Hence, by replaying all
events in the event log in chronological order, it is possible to
rebuild a process history, i.e. the sequence of states the system
went through. Each state can be represented as a configuration of
activity instances on a map of choice. In order to define how states
can be represented on a map, we have to choose an image for that
map and to define the positioning of activity instances as dots on
that image. Such an annotated map can be seen as a photograph and,
thus, a process history can be visualised as a sequence of such
photographs that together form a movie. To convey more information,
each dot can be filled with a colour, where this colour depends on
the value of a characteristic of choice of the corresponding
activity instance.
An activity instance is the execution of a certain activity in a
certain case and it can thus be represented as a pair $(at, cid)$,
with $at \in A$ the activity type ($A$ is the universe of activity
types) and $cid \in C$ the case ($C$ is the universe of case
identifiers). Processes also access and modify data. Let $V$ be the
universe of variable names, then a process data variable is a pair
$(vn, cid)$ where $vn \in V$ and $cid \in C$. These variables can
take on different values in different cases and also within the same
case as time progresses.
Our framework only provides visualisations for activity instances
that have been created but are not yet concluded. Such activity
instances can be in a number of states: they can be scheduled when
they have been created but not yet been assigned to a resource, they
can be assigned when they have been assigned to a resource but have
not yet started execution, executing when work on them has
commenced, and suspended when work on them has been temporarily
halted. We will use the set $Z$ to capture the various states which
an activity instance can be in, $Z = \{Scheduled, Assigned,
Executing, Suspended, Concluded\}$. An activity instance is referred
to as active if it is in a state that is an element of $Z$ except
for the state $Concluded$.
Definition 1 (Event): Let $U$ be the universe of values that
variables can take. An event $e$ is a tuple $(at, cid, t, z, P)$
where:
• $at \in A$ is an activity type;
• $cid \in C$ is a case identifier;
• $t$ is the timestamp when event $e$ occurred;
• $z \in Z$ is the state to which the corresponding activity
  instance moves;
• $P : V \nrightarrow U$ is an assignment of values to variables.
Function $P$ is partial since not every event has an associated
value for all process variables.
We use the following functions to access the constituent elements
of an event $e = (at, cid, t, z, P)$: $activity(e) = at$,
$case(e) = cid$, $timestamp(e) = t$, $state(e) = z$ and
$properties(e) = P$. The latter function will be overloaded and
$properties(e, vn) = P(vn)$. Moreover, given a function $f$,
$dom(f)$ represents the domain of $f$.
Definition 2 (System State): Let $T$ be the universe of possible
timestamps. A system state $S = (\nu, \mu, s)$ consists of:
• a function $\nu : (A \times C) \nrightarrow Z$ where
  $\nu(at, cid) = z$ denotes that activity instance $(at, cid)$ is
  in state $z \in Z$;
• a function $\mu : (V \times C) \nrightarrow U$ where
  $\mu(vn, cid) = v_{value}$ denotes that variable $(vn, cid)$ has
  value $v_{value}$;
• a function $s : (A \times C) \nrightarrow T$ where
  $s(at, cid) = t$ denotes that activity instance $(at, cid)$
  started at timestamp $t$.
Due to our definition of activity instance, as a pair consisting
of activity name and case identifier, it is not possible to
distinguish between different instances of the same activity within
the same case. Such instances may arise from the occurrence of loops
in a process model.
3.1 Creation of the Sequence of States
Similar to existing algorithms for conformance checking [3], this
framework is based on the principle of replay. Events in the log
are replayed to determine, a posteriori, the sequence of states
that the system has gone through. In order to formalise this notion
for our framework, let us first define the overriding operator
$\oplus$. Let $f$ be a function; the function
$f' = f \oplus (x', y)$ is defined by $f'(x') = y$ and
$f'(x) = f(x)$ for all $x \in dom(f) \setminus \{x'\}$.
The definition of $\oplus$ can be extended to tuple sets, by
iteratively applying the definition to all tuples in the set
(noting that the order in which the elements are chosen is not
important).
Definition 3 (Replaying of events): Let $S_i = (\nu_i, \mu_i, s_i)$
be the current state during replay and $e$ be the next event to
replay. Replaying $e$ causes the current state $S_i$ to change to
state $S_{i+1} = (\nu_{i+1}, \mu_{i+1}, s_{i+1})$. This change is
denoted as $S_i \xrightarrow{e} S_{i+1}$, where
  $\nu_{i+1} = \nu_i \oplus ((activity(e), case(e)), state(e))$
  $\mu_{i+1} = \mu_i \oplus \{((v, case(e)), properties(e, v)) \mid v \in dom(properties(e))\}$
and, if $state(e) \neq Executing$, then $s_{i+1} = s_i$; otherwise
  $s_{i+1} = s_i \oplus ((activity(e), case(e)), timestamp(e))$.
The initial state from which replaying starts is
$S_0 = (\nu_0, \mu_0, s_0)$ where
$dom(\nu_0) = dom(\mu_0) = dom(s_0) = \emptyset$.
Replaying is used to reconstruct the execution history.
Definition 4 (Execution History): Let $e_1, \ldots, e_n$ be the
sequence of events in an execution log ordered by timestamp, i.e.
for every $1 \le i < j \le n$, $timestamp(e_j) \ge timestamp(e_i)$.
Let $S_0 \xrightarrow{e_1} S_1 \xrightarrow{e_2} \ldots
\xrightarrow{e_n} S_n$ be the sequence of states visited when
replaying the event log. An execution history is a sequence of pairs
$H = \langle (S_1, t_1), \ldots, (S_n, t_n) \rangle$ where
$(S_i, t_i)$ denotes that the system entered state $S_i$ at time
$t_i = timestamp(e_i)$.
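The replay principle of Definitions 1 to 4 can be illustrated with a
minimal Python sketch. It is not the ProM implementation: the class
and function names (Event, State, replay, execution_history) and the
use of plain dictionaries for the three state functions are
assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Instance = Tuple[str, str]  # (activity type, case identifier)


@dataclass
class Event:
    """An event (at, cid, t, z, P) in the sense of Definition 1."""
    activity: str
    case: str
    timestamp: float
    state: str  # one of Scheduled, Assigned, Executing, Suspended, Concluded
    properties: Dict[str, Any] = field(default_factory=dict)  # partial function P


@dataclass
class State:
    """A system state; each partial function is modelled as a dict (assumption)."""
    nu: Dict[Instance, str] = field(default_factory=dict)        # instance -> state
    mu: Dict[Tuple[str, str], Any] = field(default_factory=dict)  # (variable, case) -> value
    s: Dict[Instance, float] = field(default_factory=dict)       # instance -> start time


def replay(state: State, e: Event) -> State:
    """Return the successor state; dict unpacking plays the role of overriding."""
    inst = (e.activity, e.case)
    nu = {**state.nu, inst: e.state}
    mu = {**state.mu, **{(v, e.case): val for v, val in e.properties.items()}}
    # The start time is only overridden when the instance starts executing.
    s = {**state.s, inst: e.timestamp} if e.state == "Executing" else dict(state.s)
    return State(nu, mu, s)


def execution_history(log: List[Event]) -> List[Tuple[State, float]]:
    """Replay the log in chronological order and collect the pairs (S_i, t_i)."""
    history = []
    state = State()  # S_0: all three functions have an empty domain
    for e in sorted(log, key=lambda ev: ev.timestamp):
        state = replay(state, e)
        history.append((state, e.timestamp))
    return history
```

Each replay step builds new dictionaries instead of mutating the
previous state, so every entry of the history remains a stable
snapshot that can later be rendered as one photograph.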
3.2 Mapping States onto Maps
Activity instances are visualised as dots on a map. By not fixing
the type of map, but allowing this choice to be configurable,
different types of relationships can be shown, thus providing a
deeper insight into the context of the work that was performed.
Many types of maps can be thought of: geographical maps (e.g., the
map of a university's campus), process schemas, organisational
diagrams, Gantt charts, etc. Naturally, one can also make highly
specialised maps to suit a particular purpose. The positioning of an
activity instance may vary across different maps. When the use of a
certain map is envisaged, the location of activity instances at
runtime on this map should be captured through a formal expression
specified at design time.
Definition 5 (Position function): Let $M$ be the set of maps of
interest. For each available map $m \in M$, there exists a partial
function that returns a pair of expressions for each activity type:
  $position_m : A \nrightarrow Expr(V \cup \{T, t\}) \times Expr(V \cup \{T, t\})$
where $Expr(X)$ is the domain of all expressions that use some of
the variables in $X$. For each activity instance $ai = (at, cid)$
and each map $m$, $position_m$ returns a pair of expressions. The
evaluation of these expressions, over a state $S$, returns a pair of
coordinates $(x, y)$ which is the position of $ai$ on map $m$ at
state $S$.
More specifically, variables $T$ and $t$ are used to incorporate
references in time: they are used to represent the starting time of
activity instances and the current time of replay, respectively.
Let $\sigma : X \nrightarrow U$ be a value assignment of a subset of
the variable names in $X$. We define $eval$ as a function which,
given an expression $f \in Expr(X)$ and a value assignment $\sigma$,
yields an integer number:
  $eval[[f]](\sigma) = c$
where $c \in \mathbb{Z}$.
Given a map $m \in M$, a state $S_i = (\nu_i, \mu_i, s_i)$ and an
activity instance $ai = (at, cid) \in dom(\nu_i)$, let
$position_m(at) = (f^x_{m,at}, f^y_{m,at})$. The coordinates of $ai$
on map $m$ for state $S_i$ at a given timestamp $t$ are:
  $coord_m(ai)|_{S_i} = (eval[[f^x_{m,at}]](\sigma_{ai}), eval[[f^y_{m,at}]](\sigma_{ai}))$
where $\sigma_{ai}(vn) = \mu_i(vn, cid)$, $\sigma_{ai}(T) = s_i(ai)$
and $\sigma_{ai}(t) = t$.
For example, consider a loan request process where each instance
corresponds to a different request. The applicant's monthly income
and the requested loan amount are stored in variables income and
loan. One can define a cartesian map $c$ where every activity
instance is associated with a distinct dot whose x and y coordinates
are determined by the values of these variables. Assume that the
maximum values of income and loan, as seen in the log, are 150000
and 10000 respectively. Also assume that the maximum x and y
coordinate values at which a dot can still be properly displayed on
map $c$ are 800 and 600 respectively. To ensure that an activity
instance with maximum income and loan values can be properly
displayed, we can define a position function such that
$position_c(at) = (income \cdot \frac{800}{150000}, loan \cdot \frac{600}{10000})$.
The projection of a state $S_i = (\nu_i, \mu_i, s_i)$ onto a map
$m$ is the projection of activity instances $ai \in dom(\nu_i)$
onto $m$ at position $coord_m(ai)|_{S_i}$. As can be seen from the
definition, the function $position_m$ is partial and hence not all
activity instances in the state are mapped. This may be because
mapping them simply is not meaningful. However, there may also be
activity instances in $dom(\nu_i)$ that are not mapped because some
of the variables in the position function do not have a value.
Another reason for an activity instance not to be projected onto a
given map is that its coordinates are invalid, i.e. falling outside
that map (for example because the x or the y coordinate is
negative). Given a map, activity instances that have invalid
coordinates or none at all need to be visualised differently from
instances with valid coordinates. To this end, each map is
associated with two lists of activity instances, one enumerating
the activity instances that do not have coordinates and one
enumerating those which have invalid ones.
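The split into placed dots and the two lists can be sketched as
follows. The function name classify_instances, the use of KeyError to
signal a missing variable value, and the bounds check against the
image width and height are assumptions for illustration; the text
only states that coordinates outside the map (e.g. negative ones) are
invalid.

```python
def classify_instances(instances, position, width, height):
    """Split activity instances into placed dots and the two exception lists.

    instances: dict mapping an instance id to its value assignment (a dict).
    position:  callable returning (x, y) for an assignment; it is expected to
               raise KeyError when a required variable has no value
               (hypothetical convention).
    """
    placed, without_coordinates, invalid_coordinates = {}, [], []
    for inst, assignment in instances.items():
        try:
            x, y = position(assignment)
        except KeyError:
            # Some variable used by the position function has no value.
            without_coordinates.append(inst)
            continue
        if 0 <= x <= width and 0 <= y <= height:
            placed[inst] = (x, y)
        else:
            # Coordinates fall outside the map image.
            invalid_coordinates.append(inst)
    return placed, without_coordinates, invalid_coordinates
```

Keeping the two exception lists separate mirrors the distinction made
above between instances that have no coordinates at all and instances
whose coordinates fall outside the map.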
As mentioned before, the dots representing activity instances can
be filled with colour. This allows for richer visualisation as one
can take the value of a variable of choice into account. Currently,
three colouring schemes are proposed, which are based (i) on the
state of the activity instances, (ii) on the characteristics of the
case of the activity instance or (iii) on the age of the activity
instances.
Based on the state of the activity instance. When this scheme is
used, an activity instance $ai$ is coloured according to
$\nu_i(ai)$ where $S_i = (\nu_i, \mu_i, s_i)$ is the current state.
In particular, for activity instances that are in the scheduled,
assigned, executing or suspended state, we fill the corresponding
dot with white, cyan, green or black, respectively. Of course,
activity instances that are concluded or not yet scheduled are not
represented on a map.
Based on the characteristics of the case of the activity instance.
When this scheme is used, the end user chooses one of the variables
$vn \in V$ present in the event log. The value of the selected
variable determines which colour is used to fill the dots of the
activity instances in the case. Let $S_n = (\nu_n, \mu_n, s_n)$ be
the last state in the execution history. The dot corresponding to an
activity instance $ai = (at, cid) \in dom(\nu_i)$ is coloured
according to $\mu_n(vn, cid)$, i.e. the last value assigned to
variable $vn$ for the case $cid$. In particular, the 15 most
commonly occurring variable values are associated with the 15
non-white colours of the 16-colour EGA palette (footnote 1). The
white colour is excluded as it is used to represent all other
variable values. The colour of dots for a particular case should not
change during an animation; otherwise, one can easily lose track of
dots. Therefore, the visualisation approach chosen is to ignore any
changes to a variable and to just use its final value. This means
that one should not choose a variable for visualisation purposes
whose value can change during the execution of a case. In addition,
the choice of variable should also be informed by the ability to use
it as a meaningful classifier of cases.
Based on the age of the activity instance. In this approach, the
dots are coloured according to the age of the activity instance,
i.e. the amount of time that has elapsed since the instance was
started. The colour white is associated with activity instances that
just started. As time progresses, the colour of instances that
have not completed becomes closer and closer to red. Let
$S_i = (\nu_i, \mu_i, s_i)$ be the state at time $t$, then the age
of an activity instance $ai = (at, cid) \in dom(\nu_i)$ is computed
as follows:
  $age(ai) = \exp\left(-\frac{\ln(2) \cdot (t - s_i(ai))}{MET(at)}\right)$
where $MET(at)$ is the average of the time that was taken to
complete instances of activity type $at$. For each activity instance
$ai$, $age(ai)$ is always between 0 and 1. If $t = s_i(ai)$, i.e.
activity instance $ai$ was just started, $age(ai) = 1$. Value
$age(ai)$ decreases exponentially as $ai$ ages. When
$t - s_i(ai) = MET(at)$, $age(ai) = 0.5$. An established approach is
used to map $age(ai)$ to a colour: the Fire Colour Palette [10]. The
colour ranges in intensity from a bright white ($age(ai) = 1$)
through yellow, orange ($age(ai) \approx 0.5$), brown, and then to
black (as $age(ai) \to 0$).

1. http://en.wikipedia.org/wiki/Enhanced_Graphics_Adapter
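The age formula can be checked numerically with a short Python
sketch. The function names age and mean_execution_time are
illustrative assumptions; timestamps and durations are taken to be
plain numbers in the same unit.

```python
import math
from statistics import mean


def mean_execution_time(durations):
    """MET(at): average completion time of instances of one activity type."""
    return mean(durations)


def age(t, start, met):
    """age(ai) = exp(-ln(2) * (t - start) / MET); 1 at start, 0.5 after one MET."""
    return math.exp(-math.log(2) * (t - start) / met)


met = mean_execution_time([4, 6])    # average duration of 5 time units
just_started = age(10, 10, met)       # equals 1
after_one_met = age(15, 10, met)      # approximately 0.5
after_two_met = age(20, 10, met)      # approximately 0.25
```

The value halves with every additional MET(at) of elapsed time, which
is why an instance that has been running for exactly the average
duration sits in the middle of the colour range.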
In order to deal with dots that may overlap on a certain map, they
are drawn semi-transparently. The colour of areas of overlap is
determined by the colours of the individual dots involved.
Unfortunately, this may lead to confusion in some cases as it may be
hard to correctly interpret the resulting colour, but the advantage
of this approach is that dots whose area is completely covered by
one or more other dots still remain visible. If the centres of dots
coincide, then the dots involved are merged to form bigger dots, so
that dots whose sizes and centre positions are identical remain
visually distinguishable. The diameter of such dots grows according
to the number $n$ of activity instances involved. Li et al. [23]
conducted an analysis with a number of subjects where they found
that quantities represented as circles are most intuitively
perceived when the circle grows as a power of 0.4. Applying this
observation to our case, i.e. when joining $n$ dots of different
activity instances, the diameter of the resulting dot is computed as
$n^{0.4}$. The amalgamated dots are also divided into as many slices
as there are constituting dots, and each slice is filled with the
colour of the dot to which it corresponds.
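The sizing and slicing of a merged dot can be sketched as follows.
The function name merged_dot and the unit_diameter parameter are
assumptions; grouping slices of equal colour into one fraction is a
presentational shortcut that yields the same picture as n equal
slices.

```python
from collections import Counter


def merged_dot(colours, unit_diameter=1.0):
    """Return the diameter and colour shares of a dot merging len(colours) dots.

    colours: one colour per constituent activity instance.
    unit_diameter: diameter of a single dot (hypothetical scaling parameter).
    """
    n = len(colours)
    diameter = unit_diameter * n ** 0.4  # growth exponent reported in [23]
    # Each constituent dot contributes one slice of size 1/n.
    shares = {colour: count / n for colour, count in Counter(colours).items()}
    return diameter, shares
```

For four coinciding instances the diameter grows by a factor of
4^0.4, roughly 1.74, rather than by a factor of 4.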
Another important feature is concerned with handling activity
instances when they are about to disappear. If a dot suddenly
disappears between two consecutive photographs, end users may not
notice that the corresponding activity instance is completed,
especially when many dots are visualised at the same time on the
maps. Therefore, we have introduced a fading effect: if a dot is
going to disappear in $x$ photographs, it starts fading out (i.e.,
becomes transparent). Value $x$ can be customised by an end user on
the fly while playing the movie. As the number of photographs in
which a dot is going to disappear becomes smaller, the fading effect
becomes more pronounced, until the dot completely vanishes.
4 IMPLEMENTATION OF THE FRAMEWORK
Figure 2 shows the architecture of the implementation. The yellow
component is implemented as a stand-alone Java application whereas
the red components are implemented as plug-ins of ProM, an
open-source pluggable framework for the implementation of process
mining tools in a standardised environment
(http://www.promtools.org). Plug-ins require a number of input
objects and produce one or more output objects. These input objects
could, for example, be event logs or output objects of other
plug-ins. In this way, one can define a chain of plug-in
invocations.
The core software is the Log-On-Map Replayer plug-in, which takes a
map-specification file and an event log as input. Each
map-specification file consists of a set of available maps (i.e.
the map name and the URL where the map image can be retrieved) with
corresponding position functions (Definition 5), one definition for
each map. The plug-in employs the framework defined in Section 3 and
generates for each available map a sequence of photographs. Playing
such a sequence of photographs in succession yields a movie. As
previously mentioned, each photograph captures the state of the
process as it existed at a certain point in time. The graphical user
interface provides controls to select one of the available movies
and put it in focus, play/stop that movie, or go to a specific
moment in time in that movie.

[Figure 2: architecture diagram with the components Event Log, Map
Designer, Map Specification, Automatic Map Generator, Log On Map
Replayer, Movies, and an optional Process Model.]
Fig. 2. The architecture of the implementation.
Map specifications can be drawn up through a stand-alone Java
application, the Map Designer. It allows process analysts to load
images (e.g., PNG or JPG) to use as maps and to define how to
project activity instances onto those maps. To do so, analysts can
simply drag and drop an activity type onto the map and place it at
the position of interest. This way, the activity type's position is
statically defined and applies to all instances of that type.
Alternatively, analysts can define a position as dynamic. The
position of an activity instance is then defined in terms of the
state of the process instance involved. As far as the implementation
is concerned, the state is encoded as an XML document and the
position function is defined as an XQuery over this document.
Section 5 illustrates the interface of the Map Designer application
and the Log On Map Replayer plug-in through the case study.
A number of potential maps can be applied to a wide range of
scenarios. For example, a process model can serve as a map where
activity instances are projected onto the icons representing the
corresponding activity types. Hence, as long as the positions of the
activity icons are known, this type of map can be used in different
scenarios and the corresponding position function can be
automatically generated. This concept of facilitating the generation
of maps with little effort required by users has led to the
implementation of a second plug-in of ProM, the
Automatic-Map-Generator. It takes an event log as input and
optionally a process model and produces a map, intended as the
background image, and the corresponding position function.
Currently, three types of maps can be generated automatically:
Cartesian, Deadline/Timeline and Process-Model. When generating a
map of these types, users need to define the size of the image by
specifying the width $w$ and the height $h$.
Cartesian Map. This type of map takes inspiration fromthe
Cartesian coordinate system. The projection of activ-ity instances
is determined by choosing two numericalvariables, vx and vy , to
derive the values of x and ycoordinates from. Let xmax (xmin) and
ymax (ymin) be themaximum (minimum) values for vx and vy as present
inthe event log. The position function for a Cartesian mapc for
instances of an activity type at A is defined aspositionc(at) =
(x
at, y
at) where
xat =vxxminxmaxxmin w and yat =
vyyminymaxymin h.
For example, suppose that the end user chooses variables payout and
amount as $v_x$ and $v_y$. Based on these variables, the plug-in
determines the corresponding minimum and maximum values as seen in the
log, e.g. $x_{min} = 0$, $x_{max} = 100$, $y_{min} = 1000$,
$y_{max} = 10000$. For a map of size, for example, 800 × 600 pixels
(i.e., $w = 800$ and $h = 600$), the plug-in thus specifies
$x_{at} = ((payout - 0)/(100 - 0)) \cdot 800$ and
$y_{at} = ((amount - 1000)/(10000 - 1000)) \cdot 600$.
Deadline/Timeline Map. Activity instances are positioned along the
x-axis according to the time that they have been active. When an
activity instance has just become active, its x-coordinate is equal to
w. Its y-coordinate is obtained by choosing a numerical variable
$v_y$, extracting its current value, and using the same computation as
for the Cartesian map. As time progresses, the x-coordinate decreases,
but the y-coordinate remains constant. The position function for a
time map $l$ for activity instances $ai = (at, cid)$ of an activity
type $at$ is defined as $position_l(at) = (x_{at}, y_{at})$ where
$$x_{at} = \left(1 - \frac{t - T}{d_{at}}\right) \cdot w \quad \text{and} \quad y_{at} = \frac{v_y - y_{min}}{y_{max} - y_{min}} \cdot h,$$
where $d_{at}$ is a constant which defines the maximum valid duration
for instances of an activity type $at$. If a certain activity instance
is active for more than $d_{at}$, $x_{at}$ becomes negative and the
instance is then enumerated in the list of activity instances with
invalid coordinates.
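A small Python sketch of the deadline/timeline position function
follows. The parameter `elapsed` stands for the quantity $t - T$ in
the formula, read here as the time for which the instance has been
active; that reading, the function names, and the sample values are
assumptions made for illustration.

```python
def deadline_position(elapsed, max_duration, vy, y_min, y_max, width, height):
    """Position of an activity instance on a deadline/timeline map.

    elapsed      -- time the instance has been active (assumed meaning of t - T)
    max_duration -- d_at, the maximum valid duration for the activity type
    """
    x = (1 - elapsed / max_duration) * width          # starts at w, decreases
    y = (vy - y_min) / (y_max - y_min) * height       # same as Cartesian map
    return (x, y)


def has_valid_coordinates(position, width, height):
    """True if the position falls inside the map boundary."""
    x, y = position
    return 0 <= x <= width and 0 <= y <= height


# Hypothetical instance: d_at = 10 time units, v_y = 5500 on an 800 x 600 map.
just_started = deadline_position(0, 10, 5500, 1000, 10000, 800, 600)
halfway = deadline_position(5, 10, 5500, 1000, 10000, 800, 600)
overdue = deadline_position(15, 10, 5500, 1000, 10000, 800, 600)
```

In this sketch an instance that has been active for longer than
`max_duration` receives a negative x-coordinate, so
`has_valid_coordinates` returns `False` and the instance would be
listed among those with invalid coordinates.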
Process-Model Map. As mentioned before, for a process-model map
activity instances are projected onto the corresponding icon in the
model. In order to enable the automatic generation of maps of this
type, end users need to provide a process model which also encodes the
actual coordinates of the icons that represent the model's activities.
Currently, we only support the Petri net formalism, though we believe
that it is relatively easy to extend the implementation to support
other process modelling formalisms. Petri nets are stored in files in
the standard PNML format, which also encodes the coordinates of the
positions of all activities (i.e. Petri net transitions) of the model.
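As an illustration of how static positions might be read from such a
file, the following Python sketch extracts the label and position of
each transition from a PNML document. It assumes the common PNML
layout in which a transition carries a `graphics` child with a
`position` element; the sample document and the function names are
made up for this example, and the sketch is not the plug-in's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal PNML fragment used only for illustration.
PNML_SAMPLE = """<pnml>
  <net id="n1">
    <page id="p1">
      <transition id="t1">
        <name><text>Conduct File Review</text></name>
        <graphics><position x="120" y="80"/></graphics>
      </transition>
      <transition id="t2">
        <name><text>Follow Up Requested</text></name>
        <graphics><position x="300" y="80"/></graphics>
      </transition>
    </page>
  </net>
</pnml>"""


def local(tag):
    """Strip an XML namespace prefix such as '{uri}transition'."""
    return tag.rsplit("}", 1)[-1]


def child(elem, name):
    """Return the first direct child with the given local name, if any."""
    for c in elem:
        if local(c.tag) == name:
            return c
    return None


def transition_positions(pnml_text):
    """Map each transition label to its (x, y) icon position."""
    positions = {}
    for elem in ET.fromstring(pnml_text).iter():
        if local(elem.tag) != "transition":
            continue
        name = child(elem, "name")
        text = child(name, "text") if name is not None else None
        # Fall back to the transition id when no label is present.
        label = text.text.strip() if text is not None and text.text else elem.get("id")
        graphics = child(elem, "graphics")
        pos = child(graphics, "position") if graphics is not None else None
        if pos is not None:
            positions[label] = (float(pos.get("x")), float(pos.get("y")))
    return positions


static_positions = transition_positions(PNML_SAMPLE)
```

The resulting dictionary plays the role of a static position
function: every instance of an activity type is placed at the
coordinates of the corresponding transition.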
5 A CASE STUDY WITH AN AUSTRALIAN INSURANCE COMPANY
We applied the visualisation framework in a case study that we
conducted with one of the largest insurance companies in Australia,
namely Suncorp, during the second half of 2012. Through regular
meetings (almost weekly) with the stakeholders from Suncorp, we
identified the need to communicate the current landscape of Suncorp's
claims processing performance to higher managers within the company.
To this end, we believed that the visualisation framework proposed in
this paper could be used to generate a number of movies summarising
Suncorp's claims processing trends and performance.
Suncorp provided us with data related to the processing of claims that
were finalised within a 6-month period (regardless of the starting
time of the claims). The data consists of over one million events for
34 activity types, which together describe the processing of over
32,000 claims from multiple departments within Suncorp.
For the purpose of evaluating the visualisation framework and the
usefulness of the resulting movies, a subset of the data (from one
department only) was used. This subset of data was selected because it
contains rich attribute information, including: loss type (i.e. the
cause of a loss that triggered an insurance claim, such as fire,
theft, or burglary), payout amount (i.e. the amount of money paid to a
customer as a result of an awarded insurance claim), team (i.e. the
team within Suncorp which processed the claims), and many others. As
will be detailed later, the richness of attribute information in this
subset of the log allowed us to produce interesting maps and,
consequently, movies.
5.1 Overview of the Maps (and Movies) Created for the Case Study
Using this subset of data, four movies were produced using four
different maps. A thorough explanation of the interpretation of these
maps and the various interactive configuration options available to
users (while a movie is being played) is provided in the remainder of
this section. Some screencasts of the tool showing these four movies
are available at http://www.processmining.org/online/logonmaps.
5.1.1 Australia Map
The first movie was produced using a geographical map of Australia. In
this movie, a dot is projected onto the map at the position that
corresponds to the state or territory where the claim was lodged.
Thus, the goal of this map is to display the distribution of insurance
claims across all Australian states and territories at any given point
in time, and how the distribution evolves over a period of time.
Figure 1 shows a snapshot of the movie.
The bottom part of Figure 1 shows the widgets to control the playback
of the movie. From left to right, this area contains the play button
to start/stop the movie, the timeline and time slider box which shows
the relative progression of the movie and the slider that can be
dragged by users to reach a particular snapshot, the clock showing the
current timestamp of the movie being played as obtained from the
timestamp information in the event log, and the slider to adjust the
playback speed on the fly. As also stated in [24], it is important to
play the movie at the right speed. The optimal speed may be hard to
predict: animations played too slowly may become boring, whereas the
opposite can cause relevant information to be missed. The timeline box
also shows a wave in which the x-axis represents time and the y-axis
the number of active activity instances. The purpose of the two
buttons above the timeline box (labelled as "Pos. Trend" and "Colour
Trend") will be explained later in this section.
At the top of Figure 1, we see the Maps panel which is made up of a
number of tabs, each associated with a different movie. By selecting a
tab, the corresponding movie is brought to the front to show the state
of the process at a specific point in time, in terms of the activity
instances that are active, their state, and their positions. While
playing the animation, users can change the animation they are
watching in case they wish to consider a different type of movie. In
Figure 1, two animation movies were executed simultaneously: the first
one is the Australian map animation (called the "Loss Cause/the State"
plot on the tab) and the other one is the "Incurred Amount/Claim
Duration" movie (not shown in Figure 1, but users can switch between
the two movies at any time).
There are two boxes on the panel on the left-hand side of the
screenshot. The top box enumerates the activity instances for which
there are no associated positions (i.e. they do not belong to the
domain of the corresponding position function). The bottom box
enumerates those activity instances whose positions are invalid (i.e.
the corresponding position function returns coordinates which either
fall outside the map boundary or could not be evaluated). In the
screenshot shown in Figure 1, there was no activity instance that was
projected as invalid at that particular point in time.
The centre of Figure 1 shows a snapshot of the movie being played. The
configuration file used to produce this movie projects activity
instances onto the corresponding Australian state/territory in which
the claim was lodged. The size of the dots is positively correlated
with the number of events that are simultaneously present on the map
at exactly the same position. As mentioned previously, dots projected
at the same coordinates are merged to form bigger dots; moreover, the
colouring scheme of the dots is configurable. In Figure 1, the colour
of the dot was configured so that each dot is coloured according to
the loss cause of the insurance claims. The association of colours
with values of loss cause is explained in the legend shown on the
right-hand side panel. Many dots are projected at the same coordinates
and, hence, they are merged in clusters. As mentioned previously, each
bigger dot is divided into as many slices as the number of different
colours associated with the merged dots. The size of a slice is
proportional to the percentage of dots of a given colour: the number
of activity instances with a particular colour (i.e. a particular loss
cause) in a dot over the total number of activity instances
represented by the dot. These percentages can be viewed by rolling the
mouse over the dots (as shown in Figure 1).
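The merging of dots and the computation of slice sizes can be
sketched in a few lines of Python. This is an illustrative
reconstruction under simple assumptions (dots are merged only when
their coordinates are identical, and colours are plain strings); the
function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict


def merge_dots(dots):
    """Merge dots that share coordinates and compute colour slices.

    dots -- iterable of ((x, y), colour) pairs, one per activity instance
    Returns a dict: position -> {"count": n, "slices": {colour: percent}}.
    """
    clusters = defaultdict(Counter)
    for position, colour in dots:
        clusters[position][colour] += 1

    merged = {}
    for position, colours in clusters.items():
        total = sum(colours.values())
        merged[position] = {
            "count": total,  # number of activity instances in the bigger dot
            "slices": {c: 100.0 * n / total for c, n in colours.items()},
        }
    return merged


# Hypothetical snapshot: four claims at one position, one at another.
snapshot = [
    ((10, 20), "fire"),
    ((10, 20), "fire"),
    ((10, 20), "theft"),
    ((10, 20), "theft"),
    ((50, 60), "burglary"),
]
merged = merge_dots(snapshot)
```

Here the dot at position (10, 20) represents four activity instances
and is split into two equally sized slices, one per loss cause.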
On the right-hand side of Figure 1, there are two tabs. The top one is
labelled as "Parameters" and the bottom one is labelled as "Legend"
(the Legend tab is not shown in Figure 1 because it was selected when
the screenshot was taken, which causes the tab itself to disappear).
As explained above, the purpose of the Legend tab is to describe the
meaning of each colour in the dot. The Parameters tab, on the other
hand, is used to configure the colouring scheme and the manner in
which the dots are to be displayed in the animation movie.
Figure 3 (top part) shows the expanded parameter tab. The first
check-box allows a user to merge multiple dots into one dot even if
the dots are only partially overlapping. If this option is ticked,
dots whose areas partly overlap are merged. Otherwise, if it is
unticked, dots are merged only if they are positioned at exactly the
same coordinates. The second check-box enables the fading out of dots
when the corresponding activity instances will soon no longer be
active. The number of frames required for the dots to begin to fade
out can be customised using the slider on the panel. When dots are
merged, by default, the bigger dots are annotated with the number of
dots that are merged. Sometimes the process owner does not want to
disclose this aggregate information for privacy and/or confidentiality
reasons. For instance, this happened in the Suncorp case study. The
last check-box allows users to display or remove the exact number of
merged dots that is shown in the middle of each dot: if this option is
unticked, the number will not be displayed in the movie (the legend
will also show an "N.A." status in place of the exact number).
The second half of the parameter panel allows users to customise the
colouring scheme of the dots in the movie according to one of the
schemes described in Section 3. The three options respectively enable
the colouring of dots based on the state, the age, or the values of a
particular variable of the corresponding activity instances.
Furthermore, for the third option, the variable to be used is selected
from a drop-down box, as shown in Figure 3.
Fig. 3. Screenshots of the interactive parameter panel and the
position trend graph
The "Pos. Trend" panel, shown in Figure 3, is used to present to
viewers the evolution of the number of activity instances that are
correctly projected onto the map (blue-lined graph) and not projected
onto the map (green-lined graph) over the time span of the movie. The
red-lined graph in this figure shows those activity instances that
could not be projected onto the map correctly (i.e. those activity
instances with an invalid projection); however, we do not see the
red-lined graph in Figure 3 because there were no invalidly projected
activity instances throughout the whole animation. The thick vertical
red line represents the current time of the movie being played. The
"Colour Trend" panel serves a similar function to the "Pos. Trend"
panel, except that it displays the evolution of the number of activity
instances per dot colour over the time span of the movie.
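The three series plotted in the position trend graph can be
approximated with the following Python sketch. It assumes that each
activity instance is a dictionary with `start` and `end` times and
that the position function returns `None` for instances outside its
domain; these conventions and all names are assumptions made for
illustration, not the tool's data model.

```python
def classify(instance, position, width, height):
    """Classify an instance as projected, not projected, or invalid."""
    try:
        pos = position(instance)
    except Exception:
        return "invalid"            # coordinates could not be evaluated
    if pos is None:
        return "not_projected"      # outside the position function's domain
    x, y = pos
    if 0 <= x <= width and 0 <= y <= height:
        return "projected"
    return "invalid"                # outside the map boundary


def position_trend(instances, position, times, width, height):
    """Count active instances per category at each sampled time."""
    trend = []
    for t in times:
        counts = {"projected": 0, "not_projected": 0, "invalid": 0}
        for inst in instances:
            if inst["start"] <= t < inst["end"]:   # instance is active at t
                counts[classify(inst, position, width, height)] += 1
        trend.append((t, counts))
    return trend


# Hypothetical instances with precomputed positions.
sample_instances = [
    {"start": 0, "end": 5, "pos": (10, 10)},
    {"start": 0, "end": 5, "pos": None},
    {"start": 2, "end": 4, "pos": (-5, 10)},
]
sample_trend = position_trend(
    sample_instances, lambda inst: inst["pos"], [0, 2, 5], 800, 600
)
```

Plotting the three counts against time would give curves analogous to
the blue, green, and red graphs described above.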
Typical insights that can be gained from this map include an
understanding of the distribution and the characteristics of claims
across Australian states and, more importantly, the differences in the
characteristics of the claims. For example, while playing the movie,
we noticed that the state of Tasmania had a higher proportion of
claims from damages of rental properties as compared to other states,
while natural hazard seems to be one of the most dominant causes for
insurance claims across all states.
Fig. 4. A screenshot of the second animation produced using the
quadrant map
Based on the description of our framework in Section 3, the LogOnMap
plug-in is used to project individual activity instances onto the map.
However, there are situations when stakeholders are more interested in
the analysis of the overall distribution of cases and their
characteristics, rather than of the single activity instances. This is
precisely the situation that we encountered in our case study with
Suncorp. In fact, two out of the four movies produced (i.e. the
Australian map and the Quadrant map) are concerned with case-level
performance. To address this situation, we inserted one dummy activity
instance for each case (i.e. trace) in the event log. These dummy
activity instances start before any other activity instances in the
case are active and complete when all other activity instances are no
longer active. Then, in the map configuration file, we project the
dummy activity instances onto the map to generate movies that show the
performance and trends of cases. The other activity instances are not
projected (i.e. the position function is only defined for the dummy
activity instances).
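The dummy-instance workaround can be sketched as follows. The log
representation (a dictionary from case identifier to a list of
activity instances with numeric `start` and `end` times), the
activity name `__case__`, and the helper names are assumptions made
for this illustration; in particular, the dummy here starts together
with the earliest instance rather than strictly before it.

```python
DUMMY = "__case__"  # hypothetical name for the inserted activity type


def add_case_dummies(log):
    """Return a copy of the log with one dummy instance spanning each case."""
    result = {}
    for case_id, instances in log.items():
        dummy = {
            "activity": DUMMY,
            "start": min(i["start"] for i in instances),  # first activation
            "end": max(i["end"] for i in instances),      # last completion
        }
        result[case_id] = [dummy] + list(instances)
    return result


def restrict_to_cases(position):
    """Wrap a position function so that it is defined only for dummies."""
    def restricted(instance):
        if instance["activity"] != DUMMY:
            return None  # ordinary activity instances are not projected
        return position(instance)
    return restricted


# Hypothetical case with two activity instances.
sample_log = {
    "c1": [
        {"activity": "A", "start": 3, "end": 5},
        {"activity": "B", "start": 4, "end": 9},
    ]
}
case_log = add_case_dummies(sample_log)
case_position = restrict_to_cases(lambda inst: (1.0, 2.0))
```

Each dot in the resulting movie then corresponds to one case, which is
visible from the first activation until the last completion within
that case.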
5.1.2 Quadrant Map
The second movie was produced based on a Cartesian map whereby the
x-axis represents the amount of insurance claim payouts, and the
y-axis represents the number of days taken to process the claims. We
call this map a quadrant map. Note that this second movie is also
about viewing cases; thus, the insertion of dummy activity instances
into the log was applied. A screenshot of the generated animation is
shown in Figure 4.
In this screenshot, dots are coloured based on the age of the activity
instances (see Section 3.2). Dots in the bottom-left quadrant
represent claims with low payout values and relatively quick
processing times (expected), dots in the top-right quadrant represent
claims with high payout values and relatively long processing times
(expected), dots in the bottom-right quadrant represent claims with
high payout values and quick processing times, and finally, dots in
the top-left quadrant represent claims with low payout values and long
processing times (under-performing claims).
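The reading of the four quadrants can be expressed as a small Python
function. The two thresholds that separate "low" from "high" payouts
and "quick" from "long" processing times are not given in the text,
so they appear here as explicit, hypothetical parameters.

```python
def quadrant(payout, days, payout_threshold, days_threshold):
    """Name the quadrant in which a claim falls on the quadrant map.

    payout_threshold and days_threshold are assumed cut-off values
    (for instance, the midpoints of the two axes).
    """
    high_payout = payout >= payout_threshold
    long_processing = days >= days_threshold
    if high_payout and long_processing:
        return "top-right"      # high payout, long processing (expected)
    if high_payout:
        return "bottom-right"   # high payout, quick processing
    if long_processing:
        return "top-left"       # low payout, long processing (under-performing)
    return "bottom-left"        # low payout, quick processing (expected)


# Hypothetical claim: small payout that took 90 days with a 30-day cut-off.
slow_small_claim = quadrant(100, 90, payout_threshold=1000, days_threshold=30)
```

Claims classified as top-left are the under-performing ones that the
movie is intended to make visible.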
In other words, this movie allows us to gain insights into the
performance of Suncorp's claims process over a period of time. In our
case study, this movie proved to be useful in conveying to business
analysts and managers the performance of their claims process.
5.1.3 Deadline Map
The third movie was produced based on a deadline map. This movie shows
those activity instances which were completed on time (before the
deadline) and those which were not (see Figure 5). The position of a
dot on the x-axis tells us the time remaining before the corresponding
activity instance reaches its deadline, while the position of that dot
on the y-axis reveals the type of that instance. In this figure, dots
are coloured according to the teams that performed the corresponding
activity instances.
Fig. 5. A screenshot of the third animation movie produced using a
deadline map
When an activity instance becomes active, a dot representing the
activity instance appears on the map. The y-axis position of the dot
is determined by the type of the activity instance, while the x-axis
position is initially determined by the amount of time the activity
instance has before the deadline is reached (note that the deadline
for every activity instance is provided as an event attribute in the
log). Thus, the later the deadline of an activity instance is relative
to the time when the instance becomes active, the further to the right
the starting position of the dot is.
After an activity instance becomes active, the dot representing that
activity instance moves from right to left as time progresses. The
amount of time until the deadline expires is represented through the
distance to the thick black line. Consequently, at any given point in
time, those activity instances which did not complete by the deadline
are captured by the dots to the left of the thick black line.
In our case study, this movie allowed us to identify a number of
activities that often ran overtime, such as the "Follow Up Requested"
activity, the "Conduct File Review" activity, and the "Incoming
Correspondence" activity. Other activities, such as the "New Claim
(IPI)" activity, mostly completed around the deadline.
5.1.4 Process model map
The fourth movie was produced based on a process model map (see
Figure 6). The process model used in this movie is the Fuzzy model
[22] that we discovered using the Disco tool
(http://www.fluxicon.com/disco).
Fig. 6. A screenshot of the fourth animation movie produced using a
process model map
As explained in Section 4, dots are projected onto the map according
to the position of the icon representing the activity captured by the
dots. In this screenshot, dots are coloured according to the age of
the activity instances.
A typical insight that can be gained from using this type of map is
the identification of activities in a process that can potentially be
a bottleneck. For example, while playing the movie, the appearance of
a large dot on a particular activity icon over an extended period of
time may indicate the piling up of work items (i.e. activity
instances) related to that activity. As can be seen from Figure 6,
many activity instances were piling up for two activity types, namely
"Follow Up Requested" and "Conduct File Review".
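The visual cue of a large dot persisting on one icon can be
approximated numerically. The Python sketch below counts, for each
activity type, the longest run of consecutive sampled times at which
at least `threshold` instances were simultaneously active. The
threshold, the sampling of times, and the data layout are assumptions
for illustration; the tool itself leaves this judgement to the
viewer.

```python
from collections import Counter


def active_counts(instances, t):
    """Number of active instances per activity type at time t."""
    return Counter(i["activity"] for i in instances if i["start"] <= t < i["end"])


def longest_pileup(instances, times, threshold):
    """Longest run of samples with at least `threshold` active instances."""
    longest, current = Counter(), Counter()
    for t in times:
        counts = active_counts(instances, t)
        for activity in set(current) | set(counts):
            if counts[activity] >= threshold:
                current[activity] += 1
                longest[activity] = max(longest[activity], current[activity])
            else:
                current[activity] = 0   # the pile-up was interrupted
    return dict(longest)


# Hypothetical instances: activity "A" accumulates work, "B" does not.
sample = [
    {"activity": "A", "start": 0, "end": 10},
    {"activity": "A", "start": 1, "end": 10},
    {"activity": "A", "start": 2, "end": 5},
    {"activity": "B", "start": 0, "end": 3},
]
pileups = longest_pileup(sample, range(10), threshold=2)
```

Activity types with long runs are candidates for closer inspection as
potential bottlenecks, in the same way that a persistent large dot
draws attention in the movie.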
5.2 Map Designer
As stated in Section 4, a Map Designer tool has been implemented.
Figure 7 shows two examples of how we used the map designer to
generate the configuration files for the movies used in the case
study.
The top part of Figure 7 shows how we used the map designer tool to
automatically generate the configuration file for the fourth movie
(the process model movie). Here, we can see how a user can simply drag
an activity name (from the 'Task List' window) to the desired position
on the map. By doing so, the map designer tool automatically generates
a map configuration file which specifies, for each activity whose
instances need to be projected onto the map, the static position of
those instances.
The bottom part of Figure 7 shows how we used the map designer to help
us generate the map configuration for the second movie (the quadrant
movie). This movie requires a dynamic positioning of dots based on the
variable values of the activity instances to be projected. Thus, to
enable such a dynamic positioning of dots, we inserted the desired
XQuery statements into the corresponding pop-up window.
Fig. 7. Screenshots of the Map Designer showing static (top) and
dynamic (bottom) activity projections. Annotations in the figure:
Static Projection ("Drag activities to be projected from the 'Task
List' window to the map.") and Dynamic Projection ("Manually insert
XQuery statements to specify the dynamic position of activities.").
Overall, we found the map designer tool to be quite useful in enabling
us to quickly define, and adapt, our map configuration files to suit
the type of visualisation that we would like to see.
6 EVALUATION WITH END USERS
The validity, the usefulness, and the intuitiveness of the approach
and of the resulting implementation have been thoroughly assessed
through engagement with end users. Specifically, the approach's
validation has been conducted in two phases.
A first version of the tool was released in the second half of 2012
and reported in [8]. This version was evaluated with three subjects
from a Dutch municipality: a process management specialist, a
communication and marketing specialist, and a business advisor for
customer contacts. In this case, we used a real-life event log
concerning the process of handling applications for house-building
permits submitted by Dutch residents. In particular, we defined four
maps, as reported in [8].
Through this evaluation process, we discovered a number of usability
issues and missing features in our tool which contributed to
unnecessary complication in the interpretation of the results. A
summary of the identified issues is provided below (see [8] for
details):
• In the first version of the tool, activity instances left no trace
  when they disappeared. The subjects found this particularly
  confusing as moving dots can disappear at different positions on
  the map.
• The subjects interviewed remarked that it was sometimes unclear how
  long activity instances were active.
• Activity instances could not be related to characteristics of the
  case, e.g., the type of permit requested.
After addressing the issues above (and other minor issues), we
released a second version of this tool, which is the version discussed
in this article. For instance, the fading effect was introduced to
make it more evident when activity instances are about to disappear
(see the end of Section 3). The different colour schemes discussed in
Section 3.2 were introduced to relate activity instances to case
characteristics or to draw attention to the age of activity instances.
After releasing the second version of the tool, we performed a more
extensive session of experiments where users personally interacted
with the tool. The participants of this second experiment session were
Suncorp employees and the movies used in the experiment were the four
movies already explained in Section 5. The use of a different case
study from another continent allowed us to assess the framework in
different settings and with subjects with a different cultural and
work background.
The evaluation was conducted using an established methodology, in
addition to a number of interviews with a relatively large number of
subjects. Section 6.1 details the evaluation methodology, the
background of the participants, and the experiment procedure. Section
6.2 reports the results of the evaluation and lessons learned, along
with directions for future development.
6.1 Methodology for the Evaluation
The evaluation of the second version of our tool was conducted using
the Co-operative Evaluation methodology, which is a mature,
fully-documented methodology in the field of human-computer
interaction [25]. This is a cost-effective technique for identifying
usability problems in prototype products and processes. The technique
encourages design teams and users to collaborate in order to identify
usability issues and their solutions.
6.1.1 Nature and Number of Participants
The participants of this experiment were Suncorp employees in various
roles: five team leaders, one manager, two business analysts, and one
claims officer. They had different levels of knowledge of the
insurance claim process.
In terms of the participants' familiarity with process mining
techniques and business process management technology, three out of
the nine participants were aware of the existence of Business Process
Management systems, while the rest were not aware at all, or only had
a very limited awareness, of such systems. Furthermore, only one
participant had experience with process analysis, while the rest had
none, or limited, experience with process analysis.
It is worth highlighting that, because of the user-intensive nature of
this method, it is difficult to run this experiment with a large
number of users. Nevertheless, as documented in [25], past
applications of such a method have shown that the careful choice of
experiment subjects, even if the group is relatively small, can
minimise the problem of obtaining subjective results.
6.1.2 Procedure to Conduct Experiments
The experiment with our visualisation tool was conducted using the
four movies generated using the LogOnMapReplay plug-in (detailed in
Section 5). The experiment was conducted with each participant
individually, one after another. Before the experiment started, we
gave a brief introduction to the framework and the four movies, after
which we let the participant play with the tool on their own. Each
participant was given roughly 10 minutes to interact with the tool. In
accordance with the methodology, no further comments were given, thus
letting the participants draw their own conclusions. Without our
interference, we could thus evaluate the level of understandability of
the map metaphor and the usefulness of the approach when extracting
knowledge from event logs.
While performing such tasks, participants had to explain what they
were doing by thinking aloud. During the experiment, notes were taken
on the behaviours exhibited by the participants in order to measure
the degree of efficiency and effectiveness observed. In particular,
they were asked to communicate any meaningful conclusions that they
managed to draw by observing the animations and interacting with the
tool.
At the end of the experiment, each subject was given the opportunity
to fill out a semi-structured questionnaire with questions designed to
measure the subject's impressions and expectations with regard to the
tool. To ensure the anonymity of responses, each participant inserted
the filled-out questionnaire into a ballot box him/herself.
In summary, our experiment methodology provides a valuable means to
not only verify the effectiveness and efficiency of the visualisation
framework, but also to elicit further possible improvement
opportunities (see Section 6.3). This method is, therefore, an
eminently formative evaluation method, rather than a summative one. It
is useful for identifying those usability bugs that can affect the
effectiveness of the system being evaluated.
6.2 Evaluation Results
The results of our experiment will be detailed based on the
questionnaire results (both quantitative and qualitative data were
collected) and our observations of the participants' responses to the
tool during the experiment. The first two questions in the
questionnaire (Q1 and Q2) were used to gauge participants' familiarity
with BPM systems and process analysis (the results of which have been
summarised in Section 6.1.1). The rest of the questions (Q3-Q8) were
used to gauge participants' satisfaction with the tool.
6.2.1 Questionnaire results
The questionnaire used consisted of both closed and open questions.
The third question (Q3) asked participants the type of insights that
they expected to obtain by using the visualisation tool. Participants'
expectations of the insights they wished to obtain by using the tool
varied, although they were roughly consistent with what can be
expected from visual analytics. A baseline expectation from all
participants was to understand the trend and volume of their claims
processing in terms of performance, and to understand why certain
trends occurred. Some participants also expected the tool to guide
them in identifying problems with their processes and how to resolve
them. Finally, a few participants also expected the tool to help them
identify opportunities in their processes with a focus on planning the
assignment of resources.
The remaining five questions (i.e. Q4-Q8) consisted of both closed and
open-ended questions. For each of these questions, the user selected
one out of a number of pre-determined satisfaction ratings (e.g. "very
much", "much", "not so much", "not at all", and "I don't know").
Furthermore, to gain more insights, participants were allowed to
provide reasons for the satisfaction rating they selected.
The results of the closed questions are displayed in Figure 8. With
respect to Q4, most participants (8 out of 9) found that the tool
allowed them to gain the expected process-related insights (expressed
in their answers to Q3). This is confirmed by analysing the related
comments, e.g. "Having a visual representation makes it clearer [. . .]
the flow on impacts when there is a problem in one area and how it
then relates to another area.", "Was good to see key area for the
business to improve on [. . .]", "You could clearly identify
bottlenecks, claims with long duration but low value etc.", "Give
graphical representation of claims incidents and work loading. Give
insights to claims costs compared to time of year."
However, there was one participant who did not find that the tool
helped him/her much with addressing the questions he/she had listed
earlier. Upon reading the comment, it turned out that the participant
expected a feature in our tool which was simply not built: the ability
to drill down into activity instances that appear in the movies to
learn more about them (e.g. obtaining the details of the customer
related to a particular activity instance).
Another recurrent comment was about the intuitiveness of the deadline
map, with dots moving towards the left as time progresses.
Participants would have expected them to move towards the right. This
is an interesting point: in the first version of the tool, the dots
moved towards the right. We changed this since the subjects from the
Dutch municipality found the movement towards the right not very
intuitive. This makes us suspect that intuitiveness may also be
subjective and depend on the cultural background.
In terms of the maturity of the tool (Q5), it was quite surprising to
note that one third of the participants did not provide any responses;
however, by looking at the related comments, it turned out that these
participants did not respond because they either did not understand
the question or they found that they had not spent enough time with
the tool to properly answer Q5. Nevertheless, those who responded to
this question all agreed that the tool is mature. The related comments
also confirmed this observation, e.g. "...it captures a lot of
relevant information and provides plenty of options to navigate to
specific areas you might want to examine...", "...it has all the
activities for 6 months and I can go back in time if I wanted to.
Obviously as we used it we would discover more things we may want to
see but for the moment, I think there is enough for me to play with."
At the same time, comments related to Q5 also suggested a number of
possible tool improvements, including the use of plain English in the
description and labelling of various configuration options.
Fig. 8. Closed questions results. The questions shown are: (Q4) Did
the tool assist you in gaining insights you listed in Question 3?
(Q5) Do you think the tool is mature? (Q6) Do you find the behaviour
of the tool intuitive? In other words, does the tool behave in a way
that you expect? (Q7) Did the tool run without interruptions and
without crashes during the experiment? (Q8) Do you think there are any
essential features that are missing?
Q6 evaluates the intuitiveness of the tool. As shown in Fig. 8, the
responses we obtained were quite similar, with 8 out of 9 participants
finding the tool quite intuitive. From those who found the tool to be
intuitive, we received comments such as "It acts exactly to what you
would expect, according to the selections you choose.", "Visual
interpretation easy to accept.", and "[The] tool is easy to use.". The
one participant who did not find the tool to be intuitive commented on
the need for the tool to have "clear labels, filters, and help boxes
so that we can easily navigate around it."
Another feature of the tool that we wanted to evaluate was its
operational stability. Therefore, Q7 asked participants to rate
whether the tool ran smoothly during the experiment. The responses to
this question are almost evenly divided, with 5 participants stating
that the tool ran smoothly during the experiment and another 4
participants stating the opposite. One participant commented that it
was frustrating that he/she managed to crash the tool, and other
participants commented on the low-quality graphics during the
experiment, which were due to the projector used.
Finally, we also asked participants (Q8) if there were any features
that they would like to have in the tool but that were not currently
available. Apart from one participant who did not respond, the
responses to this question are evenly split, with 4 participants
stating that there were missing features and another 4 participants
stating the opposite. Among those who responded "Yes" to this
question, a number of features that should be added were suggested.
These suggested features, except for those that have already been
stated in response to Q6, will be detailed in Section 6.3.
6.2.2 Additional Observations
As explained in Section 6.1, we took notes to try and gauge the degree
of efficiency and effectiveness as experienced by the involved
subjects during the empirical tests. In particular, we were interested
to learn whether the subjects could draw interesting conclusions,
which were either unexpected or confirmed previous intuitions.
The notes taken during the experiment indicate that participants
managed to derive important conclusions as suggested by the movies.
For example, one participant gained insights with regard to the
distribution of claim types across different Australian states and
raised the possibility of adjusting insurance premiums accordingly. A
number of other participants gained insights into the absurdity of
having small-value claims which took a very long time to complete. One
participant clearly acknowledged the usefulness of the deadline map to
compare performance levels across different teams. Another participant
noted a peak time period in terms of the number of claims. Overall,
the insights gained by participants from using this tool are
consistent with what we expected the tool to be able to provide.
The rest of the observation notes taken during the experiment
corroborated the results of the questionnaire, thus further
validating the results of our experiment.
6.3 Evaluation Conclusion and Future Tool Improvement
The analysis of the questionnaire answers and of the users'
interaction with the tool led us to make the following
observations:
1) In most cases, the tool behaved as expected.
2) A number of participants highlighted that the language
used in the tool was not written in plain
English, potentially leading to wrong interpretations of the options
or features.
3) The stability of the tool needed to be improved as it crashed
rather frequently during the experiment.
4) A recurrent request from many participants was to provide
filtering capabilities such that the movie can be configured to
display only information of interest, e.g. only display activity
instances which were executed by a certain group of resources or
which occurred within a specific time period. Note that ProM
already has sophisticated log filtering capabilities; nevertheless,
it is still worthwhile to integrate them directly into this plug-in
to allow users to do the filtering on the fly while playing the
movie.
5) One participant expressed his/her interest in being able to
drill down into the activity instances associated with certain
colours or having certain characteristics, and to extract the
respective details.
6) For dots representing more than one activity instance,
participants showed an interest in being able to quickly learn the
percentage of slices of each represented colour (e.g. 30% of slices
are coloured red, 10% blue, etc.). In the version evaluated, we
simply showed the number of dots that were amalgamated, without
detailing these statistics per colour. Since then, this issue has
been addressed. Moreover, more detailed bug testing has been
conducted and, as a result, many bugs have been fixed.
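The filtering requested in observation 4 and the per-colour statistics requested in observation 6 can be illustrated with a minimal sketch. It is not the plug-in's implementation: the ActivityInstance record, its field names, the sample data, and both helper functions are hypothetical and serve only to make the two requests concrete.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ActivityInstance:
    # Hypothetical record; the field names are assumptions, not the plug-in's data model.
    case_id: str
    activity: str
    resource: str
    start: datetime
    end: datetime
    colour: str  # colour assigned to the instance by some colouring scheme


def filter_instances(instances, resources=None, window=None):
    """Keep instances executed by one of `resources` that overlap `window`.

    Passing None for a criterion disables it.
    """
    kept = []
    for inst in instances:
        if resources is not None and inst.resource not in resources:
            continue
        if window is not None:
            lo, hi = window
            # Drop the instance if its [start, end] interval lies outside [lo, hi].
            if inst.end < lo or inst.start > hi:
                continue
        kept.append(inst)
    return kept


def colour_shares(instances):
    """Percentage of instances per colour for one amalgamated dot."""
    counts = Counter(inst.colour for inst in instances)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {colour: 100.0 * n / total for colour, n in counts.items()}


# Made-up sample data, used only to exercise the two helpers.
SAMPLE = [
    ActivityInstance("c1", "Assess claim", "team_a",
                     datetime(2012, 1, 1), datetime(2012, 1, 5), "red"),
    ActivityInstance("c2", "Assess claim", "team_a",
                     datetime(2012, 1, 3), datetime(2012, 1, 4), "red"),
    ActivityInstance("c3", "Assess claim", "team_b",
                     datetime(2012, 1, 2), datetime(2012, 1, 9), "blue"),
    ActivityInstance("c4", "Pay claim", "team_a",
                     datetime(2012, 2, 1), datetime(2012, 2, 2), "green"),
]

if __name__ == "__main__":
    january = (datetime(2012, 1, 1), datetime(2012, 1, 31))
    for inst in filter_instances(SAMPLE, resources={"team_a"}, window=january):
        print(inst.case_id, inst.activity)
    print(colour_shares(SAMPLE))
```

Both helpers work on the set of instances to be drawn rather than on the event log, which is what would allow such a filter to be changed on the fly while a movie is playing.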
Despite the issues listed above, we believe that the evaluation
has clearly confirmed the validity of the tool. The concept and the
design of the tool are desirable. The metaphor of maps and movies is
clearly understood and serves the purpose of gaining insights into
past executions to understand process trends and performance.
Overall, subjects were enthusiastic about the ability to generate
movies based on event data. Several participants expressed a
desire to start using the plug-in in their own analyses. In fact,
the tool has helped some end users find interesting patterns in
various situations, some of which were quite surprising to them.
Therefore, we conclude that the concept of our visualization
framework is promising and that the current tool already illustrates
its potential.
7 CONCLUSION
Process models can be viewed as geographic maps.
However, unlike real maps, the quality of process models often
leaves much to be desired. Man-made process models tend to be
subjective and disconnected from real process executions. Process
mining techniques can be used to improve the quality of process
maps, e.g., process discovery techniques can be used to automatically
derive process models from event data and conformance
checking techniques can be used to pinpoint and
quantify deviations between model and reality. However, this is not
sufficient as process maps are static and do not show the flow of
work. Therefore, we developed
an approach to visualise process histories in a generic manner.
Different maps can be used as long as activity instances can be
given coordinates on such a map, e.g., an activity instance may be
mapped onto a Gantt chart, an organisation chart, a process model,
etc. By showing a sequence of photographs of the process (i.e., a
movie), one can see concept drift, compliance problems, bottlenecks,
etc.
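The idea of a movie as a sequence of photographs can be sketched as follows. This is an illustration under simplifying assumptions, not the ProM implementation: a photograph is taken to contain only the activity instances running at the sampled moment, a map is reduced to a function returning (x, y) coordinates, and all names and data (active_at, make_frames, MODEL_MAP, LOG) are hypothetical.

```python
from datetime import datetime, timedelta


def active_at(instances, t):
    """Instances that have started but not yet ended at time t."""
    return [i for i in instances if i["start"] <= t < i["end"]]


def make_frames(instances, projection, start, end, step):
    """Return one (timestamp, [(x, y), ...]) frame per time step."""
    frames = []
    t = start
    while t <= end:
        dots = []
        for inst in active_at(instances, t):
            pos = projection(inst)
            if pos is not None:  # instances without a position on this map are not drawn
                dots.append(pos)
        frames.append((t, dots))
        t += step
    return frames


# Hypothetical process-model map: each activity has a fixed position on the picture.
MODEL_MAP = {"Register": (10, 20), "Assess": (50, 20)}


def on_model(inst):
    """Project an instance onto the process-model map, or None if it has no place there."""
    return MODEL_MAP.get(inst["activity"])


# Made-up activity instances for one morning.
LOG = [
    {"activity": "Register",
     "start": datetime(2012, 1, 1, 9), "end": datetime(2012, 1, 1, 11)},
    {"activity": "Assess",
     "start": datetime(2012, 1, 1, 10), "end": datetime(2012, 1, 1, 12)},
    {"activity": "Archive",
     "start": datetime(2012, 1, 1, 9), "end": datetime(2012, 1, 1, 12)},
]

frames = make_frames(LOG, on_model,
                     datetime(2012, 1, 1, 9), datetime(2012, 1, 1, 12),
                     timedelta(hours=1))

if __name__ == "__main__":
    for timestamp, dots in frames:
        print(timestamp.isoformat(), dots)
```

Replacing on_model with a different projection, for instance one that places an instance on a time-line according to its start time, would yield a different movie from the same instances without changing make_frames.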
This paper describes an implementation of these ideas in the ProM
framework. We also developed a map designer that allows end users to
define the maps of interest and to position activities on them.
Many maps are widely applicable (e.g., a time-line or a process
model map) and can be used in many different settings with few or no
changes. Therefore, we also developed a ProM plug-in that
semi-automates the definition of certain types of maps by
automatically generating the map picture and the projection of activity
instances. The implementation is generic and any collection of maps
can be used as long as it is possible to map instances onto map
coordinates. Moreover, dots on the map can be coloured using
various schemes according to the properties of the corresponding
activity or process instance.
Interested readers can try the ProM implementation with a sample
event log. The sample event log, the corresponding maps, the
configuration files, and the related instructions are available
from http://www.processmining.org/online/logonmaps.
The approach was initially evaluated using a case study in
the context of a Dutch municipality. The outcome triggered a series of
changes in the framework and the reference implementation. After the
adjustments, we performed a second, more extensive experiment where
participants actually worked with the tool. To minimize the
issue of obtaining subjective results, this experiment was conducted
based on a well-founded methodology,
i.e. the Co-operative Evaluation methodology [25]. This
second experiment was conducted with employees
of Suncorp (one of the largest insurance organisations in
Australia).
The results in this paper show the value of combining process
mining and visual analytics. Process mining results are often
perceived to be rather abstract and static. Visual analytics
approaches tend to be data-centric rather than process-centric. The
combination of both fields may yield innovative process-centric
visualisations such as the process movies proposed in this
paper.
REFERENCES
[1] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh,
and A. H. Byers, "Big data: The next frontier for innovation,
competition, and productivity," McKinsey Global Institute (MGI),
Tech. Rep., May 2011.
[2] D. Laney, A. Bitterer, R. Sallam, and L. Kart. (2012, December)
Predicts 2013: Information innovation. Gartner.
[3] W. M. P. van der Aalst, Process Mining: Discovery, Conformance and
Enhancement of Business Processes. Berlin Heidelberg:
Springer-Verlag, 2011.
[4] J. J. Thomas and K. A. Cook, Eds., Illuminating the Path: The
Research and Development Agenda for Visual Analytics. IEEE CS
Press, 2005.
[5] D. Keim, J. Kohlhammer, G. Ellis, and F. Mansmann, Eds.,
Mastering the Information Age: Solving Problems with Visual Analytics.
VisMaster, http://www.vismaster.eu/book/, 2010.
[6] M. de Leoni, M. Adams, W. M. P. van der Aalst, and A. H. M. ter
Hofstede, "Visual support for work assignment in process-aware
information systems: Framework formalisation and implementation,"
Decision Support Systems, vol. 54, no. 1, pp. 345–361, 2012.
[7] W. M. P. van der Aalst, M. de Leoni, and A. H. M. ter Hofstede,
Computational Intelligence. Nova Publisher, 2012, ch. 8: Process
Mining and Visual Analytics: Breathing Life into Business Process
Models, pp. 107–138.
[8] M. de Leoni, J. Buijs, W. M. P. van der Aalst, and A. H. M. ter
Hofstede, "Facilitating process analysis through visualising
process history: Experiences with a Dutch municipality," Eindhoven
University of Technology, Tech. Rep. BPM-12-24, 2012.
[9] C. Chen, Information Visualization: Beyond the Horizon.
Springer-Verlag, New York, Inc., 2006.
[10] R. Spence, Information Visualization: Design for Interaction,
2nd ed. Harlow, England: Pearson Education Limited, 2006.
[11] G. Alonso and C. Hagen, "Geo-Opera: Workflow Concepts for
Spatial Processes," in SSD'97: Proceedings of the 5th International
Symposium on Advances in Spatial Databases, ser. Lecture Notes in
Computer Science, vol. 1262. Springer Verlag, 1997, pp. 238–258.
[12] D. Kaster, C. Bauzer-Medeiros, and H. V. da Rocha, "Supporting
Modeling and Problem Solving from Precedent Experiences: The
Role of Workflows and Case-based Reasoning," Environmental
Modelling and Software, vol. 20, no. 6, pp. 689–704, 2005.
[13] B. Schonhage and A. Eliens, "Management Through Vision: A
Case Study Towards Requirements of BizViz," in AVI 2000:
International Conference of Information Visualisation. IEEE Computer
Society, 2000, pp. 387–392.
[14] D. Oelke, C. Ming, C. Rohrdantz, D. Keim, U. Dayal, H. Lars-Erik,
and H. Janetzko, "Visual Opinion Analysis of Customer Feedback
Data," in Proceedings of the IEEE Symposium on Visual Analytics
Science and Technology (IEEE VAST 2009). IEEE, 2009, pp. 187–194.
[15] D. Keim, T. Nietzschmann, N. Schelwies, J. Schneidewind,
T. Schreck, and H. Ziegler, "A Spectral Visualization System for
Analyzing Financial Time Series Data," in Proceedings of the
Eurographics/IEEE-VGTC Symposium on Visualization (EuroVis 2006).
Eurographics Association, 2006, pp. 195–200.
[16] H. Ziegler, T. Nietzschmann, and D. Keim, "Relevance Driven
Visualization of Financial Performance Measures," in Proceedings
of the Eurographics/IEEE-VGTC Symposium on Visualization (EuroVis
2007). Eurographics Association, 2007, pp. 19–26.
[17] P. Bak, F. Mansmann, H. Janetzko, and D. Keim, "Spatio-temporal
Analysis of Sensor Logs using Growth Ring Maps," IEEE
Transactions on Visualization and Computer Graphics, vol. 15, pp.
913–920, 2009.
[18] D. Keim, "Visual Exploration of Large Data Sets," Communications
of the ACM, vol. 44, pp. 38–44, 2001.
[19] M. Krstajic, E. Bertini, and D. A. Keim, "Cloudlines: Compact
display of event episodes in multiple time-series," IEEE
Transactions on Visualization and Computer Graphics, vol. 17, no.
12, pp. 2432–2439, December 2011.
[20] J. Kehrer and H. Hauser, "Visualization and Visual Analysis
of Multifaceted Scientific Data: A Survey," IEEE Transactions on
Visualization and Computer Graphics, vol. 19, no. 3, pp. 495–513,
2013.
[21] W. Aigner, S. Miksch, W. Muller, H. Schumann, and C. Tominski,
"Visualizing time-oriented data - a systematic view," Journal on
Computers and Graphics, vol. 31, no. 3, pp. 401–409, 2007.
[22] C. W. Gunther and W. M. P. van der Aalst, "Fuzzy Mining:
Adaptive Process Simplification Based on Multi-perspective Metrics,"
in International Conference on Business Process Management (BPM
2007), ser. LNCS, vol. 4714. Springer-Verlag, 2007, pp. 328–343.
[23] J. Li, J.-B. Martens, and J. J. van Wijk, "A model of symbol
size discrimination in scatterplots," in Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems (CHI '10).
ACM, 2010, pp. 2553–2562.
[24] J. Heer and G. Robertson, "Animated transitions in statistical
data graphics," IEEE Transactions on Visualization and Computer
Graphics, vol. 13, no. 6, pp. 1240–1247, November 2007.
[25] A. Dix, J. E. Finlay, G. D. Abowd, and R. Beale, Human-Computer
Interaction, 3rd ed. Prentice Hall, 2003.