Research Article
Modeling the Process of Event Sequence Data Generated for Working Condition Diagnosis

Jianwei Ding,1,2,3 Yingbo Liu,2,3 Li Zhang,2,3 and Jianmin Wang2,3

1 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
2 Institute of Information System & Engineering, School of Software, Tsinghua University, Beijing 100084, China
3 School of Software, Tsinghua University, East Main Building, Beijing 100084, China

Correspondence should be addressed to Jianwei Ding; dingjw09@mails.tsinghua.edu.cn

Received 1 June 2015; Accepted 5 July 2015

Academic Editor: Xiaoyu Song

Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2015, Article ID 693450, 13 pages, http://dx.doi.org/10.1155/2015/693450

Copyright © 2015 Jianwei Ding et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Condition monitoring systems are widely used to monitor the working condition of equipment, generating a vast amount and variety of telemetry data in the process. The main task of surveillance focuses on analyzing these routinely collected telemetry data to help analyze the working condition of the equipment. However, with the rapid increase in the volume of telemetry data, it is a nontrivial task to analyze all the telemetry data to understand the working condition of the equipment without any a priori knowledge. In this paper, we propose a probabilistic generative model called the working condition model (WCM), which is capable of simulating the process by which event sequence data are generated and depicting the working condition of equipment at runtime. With the help of WCM, we are able to analyze how the event sequence data behave in different working modes and, meanwhile, to detect the working mode of an event sequence (working condition diagnosis). Furthermore, we have applied WCM to illustrative applications such as automated detection of an anomalous event sequence for the runtime of equipment. Our experimental results on the real data sets demonstrate the effectiveness of the model.

1. Introduction

Currently, with the rapid development of technology for the Internet of Things [1, 2], condition monitoring systems (CMSs) [3, 4] are widely used to monitor the working condition of equipment. KOMTRAX (KOMTRAX: http://www.komatsuamerica.com/komtrax) from Komatsu and IEM (IEM: http://www.sanygroup.com/group/en-us/) from SANY are well-known CMSs that generate a large amount of telemetry data while monitoring the working condition of equipment at runtime. Event sequence data in particular are one main type of telemetry data, which record a sequence of the operations on the equipment. If we can analyze the working mode of equipment according to these event sequence data, it will help us better understand the working condition of equipment at runtime.

Diagnosis of the working condition of equipment mainly depends on analyzing the telemetry data collected at the runtime of the equipment. In most CMSs, data analysis is the only way for engineers to diagnose the working condition of equipment. Telemetry data mainly contain operation events, performance counters, alert events, and others in the real CMSs. Most telemetry data can be classified into two categories: continuous time series data and temporal event data. Time series data is a sequence of real-valued data points, typically sampled at successive time points spaced at a uniform time interval. For example, the engine temperature of equipment is a typical time series in the CMSs. An event sequence in CMSs is used to record the occurrences of a specific message indicating that something, such as an operation, has happened in the equipment. For example, an event sequence of "pumping concrete" in a concrete pump truck contains an event pumping concrete, which indicates that the concrete pump truck starts pumping the concrete.

As illustrated in Figure 1, an event sequence records all the specific operations on the equipment at runtime, which usually provides enough information to engineers for working condition diagnosis of equipment. The previous studies on analysis of event sequence data for working
condition diagnosis are mainly grouped into two categories: the correlation analysis between distinct event sequences [6–8] and event-based process mining algorithms [9, 10]. The correlation analysis between distinct event sequences usually provides useful hints for causality analysis. Although correlated metrics may not exactly be the root causes of events, they can still provide intermediate information that helps pinpoint the root causes of events. The process mining algorithms focus on the occurrence order of distinct events with the help of process models [11, 12], which mainly indicate the working process of equipment.

Figure 1: Actual example of telemetry data of a concrete pump truck at one runtime. Oil temperature is a typical continuous time series, which records the variation of oil temperature at one runtime of a concrete pump truck. Operation event sequence is a typical event sequence, which records the occurrences of six operation events (warm-up, landing leg unfolding, cantilever unfolding, concrete pumping, landing leg folding, and cantilever folding) on the concrete pump truck at one runtime.

However, the occurrence frequency of events also provides us with important information for working condition diagnosis, which is ignored by the previous studies. We take a concrete pump truck as an example. As illustrated in Figure 1, at a normal runtime, a concrete pump truck needs five concrete pumping events to finish pumping a hopper of concrete. With the wear and tear of the concrete pump truck, it needs seven or more concrete pumping events to finish pumping a hopper of concrete. Although the occurrence order of the six events (warm-up → landing leg unfolding → cantilever unfolding → concrete pumping → landing leg folding → cantilever folding) is the same, the working condition of the concrete pump truck has changed a lot. If the occurrence frequencies of events are taken into consideration, the ability of engineers to diagnose the working condition will be enhanced, which will help us better understand the working condition of equipment.
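The point made above can be checked on a toy example. The following is a minimal sketch, not taken from the paper: the two work cycles are hypothetical, built from the event names in Figure 1, and the helper names `occurrence_order` and `occurrence_counts` are assumptions introduced here. It shows that the order of distinct events is identical in both cycles while the occurrence frequency of concrete pumping differs.

```python
from collections import Counter
from itertools import groupby

# Hypothetical work cycles built from the event names in Figure 1.
PREFIX = ["warm-up", "landing leg unfolding", "cantilever unfolding"]
SUFFIX = ["landing leg folding", "cantilever folding"]
normal_cycle = PREFIX + ["concrete pumping"] * 5 + SUFFIX
worn_cycle = PREFIX + ["concrete pumping"] * 7 + SUFFIX


def occurrence_order(events):
    """Collapse consecutive repeats, keeping only the order of distinct events."""
    return [name for name, _ in groupby(events)]


def occurrence_counts(events):
    """Count how many times each event occurs in the work cycle."""
    return Counter(events)


# The order of distinct events is the same for both cycles ...
print(occurrence_order(normal_cycle) == occurrence_order(worn_cycle))
# ... but the number of concrete pumping events is not.
print(occurrence_counts(normal_cycle)["concrete pumping"])
print(occurrence_counts(worn_cycle)["concrete pumping"])
```

An analysis that looks only at the occurrence order would treat these two cycles as equivalent, which is why the frequency information is kept in the model that follows.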

In this paper, we propose a probabilistic generative model called the working condition model (WCM), which is capable of depicting the working condition of equipment at runtime. According to event sequence data in different working conditions of equipment, we simulate the process by which event sequence data are generated in order to obtain the WCM of the equipment. Furthermore, with the help of the WCM, we extend the application of event sequence data to more domains, such as anomaly detection and the variation trend analysis of working condition. Motivated by the real requirements of working condition diagnosis, our working condition model tries to answer the following three questions:

(a) How many types of working modes (details in Section 3) does the equipment have at runtime?

(b) In each type of working mode, how does an event sequence behave?

(c) For a new event sequence at runtime, which type of working mode does it belong to?

Our evaluation consists of multiple phases. First, we build the WCM from real event sequence data sets collected from 279 concrete pump trucks over a period of 6 months. We analyze the performance of the WCM of the concrete pump truck. Then, we apply the WCM of the concrete pump truck to more applications, including anomaly detection and the variation trend analysis of working condition.

Our work presents a probabilistic generative model named WCM to simulate the process by which event sequence data are generated and to depict the working condition of equipment at runtime. The contributions of this paper are as follows:

(i) Motivated by real applications, we propose the WCM to depict the working condition of equipment. To the best of our knowledge, this is the first attempt to simulate the generation process of event sequence data for working condition diagnosis.

(ii) We illustrate two useful applications based on WCM: automated detection for a new work cycle and automated detection for anomalous work cycles.

(iii) The experiments on real data from a well-known Chinese construction machinery manufacturer show the effectiveness of our model.

The rest of the paper is organized as follows. In Section 2, we introduce some related works. The problem statement and formulation are introduced in Section 3. We introduce our approach in Sections 4 and 5. The empirical evaluation is shown in Section 6. Finally, we conclude our work in Section 7.

2. Related Work

2.1. Analysis of Event Sequence Data. An event is a happening of interest [13, 14]. In the surveillance of equipment, the interest in events comes mostly from changes in the state of the equipment that are produced by equipment manipulation operations [15]. Example events in the actual surveillance of the concrete pump truck include warm-up, landing leg unfolding, cantilever unfolding, concrete pumping, landing leg folding, and cantilever folding, as shown in Figure 1. When a sequence of events takes place, we record these occurrences to obtain the event sequence data. The main idea of analysis of event sequence data is to process events to gather meaningful or valuable information and then to derive actions from them.

Events in an event sequence are often interrelated and form complex relationships. The correlation analysis of event sequence data [7, 8, 16, 17] focuses on detecting these relationships and is extended to other related applications such as anomaly detection [18–20]. A temporal, spatial, or causal relationship of events can determine the partial order between events [16]. Hence, process mining algorithms based on event sequence data focus on the causal relationship of events by analyzing the occurrence order of distinct events [21, 22]. There are many existing process mining algorithms [23, 24] and tools [25, 26] for mining the causal relationship of events, which help engineers better understand the operation procedure of equipment [27].

2.2. Working Condition Diagnosis. Working condition of equipment [28] is the condition in which the equipment works, including but not limited to such things as amenities, physical environment, stress and noise levels, degree of safety or danger, and the like. Working condition diagnosis usually uses specific models or variables for different applications. In the correlation analysis of event sequence data for working condition diagnosis, the correlation coefficients of event sequence data, for example, the Pearson correlation [6, 7, 29, 30] and the rank correlation [31], are used to depict the working conditions of equipment. In the process mining algorithms of event sequence data, varieties of process models, for example, Petri nets [32] and business process modeling notation [33, 34], are used to depict the working condition of equipment for different process mining algorithms. In this paper, we take the occurrence frequencies of events into consideration and simulate the process by which event sequences are generated in different working modes of equipment. Hence, we use the occurrence probability of events in the event sequence to depict the working condition of equipment at runtime.

2.3. Probabilistic Generative Model. In probability and statistics, a generative model [35] is a model for randomly generating observable data, typically given some hidden parameters. It specifies a joint probability distribution over observable data. Generative models are used in machine learning [36] either for modeling data directly (i.e., modeling observations drawn from a probability density function) or as an intermediate step in forming a conditional probability density function. A conditional distribution can be formed from a generative model through the Bayesian rule [36].

Figure 2: Graphic model of a typical generative model, LDA. The boxes are plates representing replicates. The outer plate represents documents, while the inner plate represents the repeated choice of topics and words within a document. $M$ denotes the number of documents and $N$ the number of words in a document. $\alpha$, $\beta$, $\theta$, $\varphi$, and $Z$ are hidden parameters and $W$ is the observations. For details about LDA, refer to [5].
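As a worked illustration of the Bayesian rule mentioned above, consider a generative model that specifies the joint distribution $p(x, h)$. The symbols $x$ (observed data) and $h$ (hidden variables) are assumptions introduced here for illustration and are not notation from the paper. The conditional distribution of the hidden variables given the observations is then obtained as follows:

```latex
p(h \mid x) \;=\; \frac{p(x, h)}{p(x)}
          \;=\; \frac{p(x \mid h)\, p(h)}{\sum_{h'} p(x \mid h')\, p(h')}
```

For continuous hidden variables, the sum in the denominator becomes an integral.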

For example, latent Dirichlet allocation (LDA) [5, 37, 38] is a typical generative model that is widely used in many domains. In natural language processing, LDA simulates well the process by which documents are generated, where observations are words collected into documents; it posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics. Figure 2 illustrates the graphic model [39] of LDA [5]. With plate notation, the dependencies among variables can be captured concisely.

3. Terminology and Notation

As the working condition of equipment always corresponds to a period, for example, one day or one week, we first determine the basic unit of observation for working condition diagnosis, which is intended to ease the diagnosis based on the event sequence data.

Definition 1 (work cycle). A work cycle of a piece of equipment, denoted by $s$, is a complete work period, that is, a complete usage period of the equipment from the time the equipment starts working until it shuts down.

We let $s$ take integer values in $\{1, \ldots, S\}$, where $S$ is the number of work cycles. There is one important advantage in adopting the work cycle as the basic unit of observation in terms of event sequence data analysis for working condition diagnosis. In our opinion, no matter in what kind of circumstances (e.g., different places and different climates) the equipment works, the working condition of equipment within one work cycle mainly behaves similarly. For example, in a work cycle of the concrete pump truck illustrated in Figure 1, different concrete pump trucks usually have a similar working process, warm-up → landing leg unfolding → cantilever unfolding → concrete pumping → landing leg folding → cantilever folding, even though the concrete pump trucks work in different working circumstances.

Definition 2 (event). An event of the equipment, denoted by $e$, records an occurrence of a specific message indicating that something, such as an operation, has happened in the equipment.

For example, in Figure 1, there are six events (warm-up, landing leg unfolding, cantilever unfolding, concrete pumping, landing leg folding, and cantilever folding), each of which reflects the working condition of some component in the concrete pump truck. We use integers to denote the entries in the event set, with each event $e$ taking a value in $\{1, \ldots, E\}$, where $E$ is the number of unique events in the event set, denoted by $\mathbf{E}$.

Definition 3 (event sequence). An event sequence of the equipment, denoted by $\mathbf{e}_s$, consists of the sequence of events that occur in work cycle $s$.

An event sequence is represented as a vector of events $\mathbf{e}_s$ with $N_s$ entries. For example, in Figure 1, the event set of the concrete pump truck contains six ($E = 6$) events, denoted by $\mathbf{E}_{\mathrm{pump}} = (1, \ldots, 6)$, where the integers represent the entries of the events warm-up, landing leg unfolding, cantilever unfolding, concrete pumping, landing leg folding, and cantilever folding, respectively. Hence, the event sequence is a vector of length $N_s = 10$, denoted by $\mathbf{e}_s = (1, 2, 3, 4, 4, 4, 4, 4, 5, 6)$.

Suppose that the data set has $S$ work cycles of the equipment, corresponding to $S$ event sequences. The data set with $S$ event sequences is represented as a concatenation of the event sequence vectors, which we denote by $\mathbf{e}$, having $N = \sum_{s=1}^{S} N_s$ entries.

In a work cycle, an event sequence provides us with the main working process of the equipment. However, the occurrence of an event is also related to the working place and working date of the equipment. For example, the concrete pump truck adds an operation event, concrete mixing, in order to prevent the concrete from setting if the working temperature is low. The working temperature is directly related to the working place (e.g., north or south of China) and the working date (e.g., winter or summer).

In addition to these events, we have information about the characteristics of each event sequence (work cycle): the working place, working date, and equipment piece number of the work cycle. We define $\mathbf{p}_s$ to be the set of working places of work cycle $s$; $\mathbf{p}_s$ consists of elements that are integers from $\{1, \ldots, P\}$, where $P$ is the number of working places that generated the event sequences in the data set. $P_s$ denotes the number of working places of work cycle $s$. We define $\boldsymbol{\tau}_s$ to be the set of working dates of work cycle $s$; $\boldsymbol{\tau}_s$ consists of elements that are integers from $\{1, \ldots, T\}$, where $T$ is the number of working dates. (To ease the notation, we record only the working month of the work cycle as the working date, which means $T = 12$.) $T_s$ denotes the number of working dates of work cycle $s$. We define $\boldsymbol{\omega}_s$ to be the set of equipment numbers of work cycle $s$; $\boldsymbol{\omega}_s$ consists of elements that are integers from $\{1, \ldots, \Omega\}$, where $\Omega$ is the number of equipment pieces.

Definition 4 (work cycle characteristic). A work cycle characteristic (WCC) is a five-tuple, denoted by $\mathbf{W}_s = \{\mathbf{E}, \mathbf{e}_s, \mathbf{p}_s, \boldsymbol{\tau}_s, \boldsymbol{\omega}_s\}$, which records all the information about work cycle $s$.
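One way to hold a WCC in code is a small immutable record. This is a minimal sketch under stated assumptions: the class name `WorkCycleCharacteristic`, its field names, and the example place, month, and equipment numbers are all hypothetical and chosen here for illustration; only the five components and the example event sequence from Figure 1 come from the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class WorkCycleCharacteristic:
    """One WCC W_s = {E, e_s, p_s, tau_s, omega_s} (field names are illustrative)."""
    event_set: Tuple[int, ...]   # E: event entries 1..E
    events: Tuple[int, ...]      # e_s: event sequence of work cycle s
    places: FrozenSet[int]       # p_s: working places, integers in 1..P
    dates: FrozenSet[int]        # tau_s: working months, integers in 1..12
    equipment: FrozenSet[int]    # omega_s: equipment numbers, integers in 1..Omega

    def __post_init__(self):
        # Every observed event must be an entry of the event set.
        if not set(self.events) <= set(self.event_set):
            raise ValueError("event sequence contains events outside the event set")
        if not all(1 <= month <= 12 for month in self.dates):
            raise ValueError("working dates are recorded as months 1..12")

    @property
    def n_events(self) -> int:
        """N_s, the length of the event sequence."""
        return len(self.events)


# Event sequence from Figure 1; place 3, month 7, and truck 42 are made-up values.
wcc = WorkCycleCharacteristic(
    event_set=tuple(range(1, 7)),
    events=(1, 2, 3, 4, 4, 4, 4, 4, 5, 6),
    places=frozenset({3}),
    dates=frozenset({7}),
    equipment=frozenset({42}),
)
print(wcc.n_events, len(wcc.places), len(wcc.dates))
```

Using frozen sets for places, dates, and equipment mirrors the set-valued definitions of $\mathbf{p}_s$, $\boldsymbol{\tau}_s$, and $\boldsymbol{\omega}_s$, so that $P_s$ and $T_s$ are simply the sizes of the corresponding fields.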

A WCC corresponds to a work cycle, so the original data set is redefined as a group of WCCs, denoted by $\mathbf{D} = \{\mathbf{W}_1, \ldots, \mathbf{W}_S\}$. The WCCs of two work cycles are likely to be different even though they have the same working place and working date. The main differences between work cycles center on the occurrence of the events. However, the occurrence patterns of the events are similar for work cycles in the same working mode. For example, the concrete pump truck has two main working modes: pumping mode and traveling mode. For a work cycle in the pumping mode of the concrete pump truck, the event concrete pumping occurs frequently, as shown in Figure 1. However, for a work cycle in the traveling mode, the event concrete pumping does not occur at all, since the concrete pump truck cannot pump concrete in the traveling mode.

Definition 5 (working mode). A working mode, denoted by $\pi$, represents a kind of work cycle that is about a specific subject, has an identifiable purpose, and can stand alone.

For event set $\mathbf{E}$, we define the working mode vector (WMV) $\mathbf{G}(\pi) = \{(1, c_1), \ldots, (E, c_E)\}$ to be the set of events $e$, each associated with its occurrence frequency $c_e$, where $\sum_{e=1}^{E} c_e = 1$. The WMV $\mathbf{G}(\pi)$ depicts the occurrence patterns of events according to the occurrence frequency of events. Therefore, if we can get a group of WMVs for a group of work cycles, it will help us better understand the occurrence patterns of events.
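To make the shape of a WMV concrete, the sketch below computes normalized occurrence frequencies $c_e$ from raw event sequences. This is only an empirical frequency count used for illustration; it is not the inference procedure of the paper, and the function name `working_mode_vector` is an assumption introduced here.

```python
from collections import Counter


def working_mode_vector(event_sequences, num_events):
    """Return [(e, c_e)] for e = 1..num_events, with the c_e summing to 1.

    event_sequences: iterable of integer event sequences assumed to belong
    to the same working mode (an assumption of this sketch).
    """
    counts = Counter()
    for sequence in event_sequences:
        counts.update(sequence)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one event is required")
    return [(e, counts[e] / total) for e in range(1, num_events + 1)]


# Event sequence of Figure 1: concrete pumping (event 4) occurs 5 times out of 10.
wmv = working_mode_vector([(1, 2, 3, 4, 4, 4, 4, 4, 5, 6)], num_events=6)
for event, frequency in wmv:
    print(event, frequency)
```

For a traveling-mode work cycle, as described above, the frequency attached to concrete pumping would be zero, which is what lets the vector separate the two modes.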

Definition 6 (working mode space (WMS)). A working mode space (WMS), denoted by $\mathbf{G} = \{\mathbf{G}(1), \ldots, \mathbf{G}(\Pi)\}$, is a set of WMVs for a group of given work cycles of equipment.

In fact, the WMS is akin to a group of cluster centers, each of which depicts the working condition of equipment in a different working mode.

4. The Inference of WMS

In this section, we develop effective algorithms for the inference of the WMS for a group of given work cycles of equipment. Before proceeding, we formulate our problem as follows.

WMS Inference Problem. Given a group of work cycles associated with the corresponding WCCs $\mathbf{D} = \{\mathbf{W}_1, \ldots, \mathbf{W}_S\}$, the inference problem is to infer the WMS model $\mathbf{G} = \{\mathbf{G}(1), \ldots, \mathbf{G}(\Pi)\}$, where $\Pi$ represents the number of working modes.

With the help of the WMS, we can find that, in different working places and on different working dates, the work cycles of equipment have different working modes. Meanwhile, there are several working modes in the same working place and on the same working date. The WMV of a working mode reflects the working condition of its corresponding work cycles, especially the occurrence patterns of events.

In the remainder of this section, we first introduce the WCM for learning the WMS for a group of given work cycles and then introduce the inference framework of the WCM.

4.1. The WCM. The WCM is a hierarchical generative model in which each event $e$ in a work cycle is associated with three latent variables: a working place $x$, a working date $y$, and a working mode $z$. These latent variables augment the $E$-dimensional vector $\mathbf{e}$ (indicating the values of all events in the event set $\mathbf{E}$) with three additional $E$-dimensional vectors $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$, indicating working place, working date, and working mode assignments for the $E$ events.

We assume that the set of working places and the set of working dates of each work cycle are observed. This leaves unresolved the issue of unobserved working places and working dates and avoids the need to define a prior on working places and working dates, which is outside the scope of our model. Each working place is associated with a multinomial distribution over working modes, and each working date is also associated with a multinomial distribution over working modes. Conditioned on the set of working places and the set of working dates, together with their distributions over working modes, the process by which the event sequence for a work cycle is simulated can be summarized as follows: first, a working place and a working date are each chosen uniformly at random for each event that will appear in the work cycle; next, a working mode is sampled for each event from both the distribution over working modes associated with the working place of that event and the distribution over working modes associated with the working date of that event; finally, the events themselves are sampled from the distribution over events associated with each working mode.

This simulating process can be expressed more formally by defining some of the other variables in the WCM. Assume we have $\Pi$ working modes. We can parameterize the multinomial distribution over working modes for each working place using a matrix $\Theta$ of size $\Pi \times P$, with elements $\theta_{\pi p}$ that stand for the probability of assigning working mode $\pi$ to an event occurring in working place $p$. Thus $\sum_{\pi=1}^{\Pi} \theta_{\pi p} = 1$, and for simplicity of notation we drop the index $\pi$ when convenient and use $\theta_p$ to stand for the $p$th column of the matrix $\Theta$. Similarly, we use a matrix $\Delta$ of size $\Pi \times T$ to parameterize the multinomial distribution over working modes for each working date, where elements $\delta_{\pi\tau}$ stand for the probability of assigning working mode $\pi$ to an event occurring on working date $\tau$. Thus $\sum_{\pi=1}^{\Pi} \delta_{\pi\tau} = 1$, and we again drop the index $\pi$ when convenient and use $\delta_{\tau}$ to stand for the $\tau$th column of the matrix $\Delta$. The multinomial distributions over events associated with each working mode are parameterized by a matrix $\Phi$ of size $E \times \Pi$, with elements $\phi_{e\pi}$ that stand for the probability of event $e$ occurring in working mode $\pi$. Again, $\sum_{e=1}^{E} \phi_{e\pi} = 1$, and $\phi_{\pi}$ stands for the $\pi$th column of the matrix $\Phi$. These three multinomial distributions are assumed to be generated from symmetric Dirichlet priors with hyperparameters $\alpha$, $\gamma$, and $\beta$, respectively. In the results of this paper, we assume that these hyperparameters are fixed. This notation is summarized in the Notations section.

Figure 3: The graphic representation of WCM.

The sequential simulating procedure of first picking a working place and a working date, followed by picking a working mode and then simulating an event to occur in this working mode according to the probability distributions, leads to the following generative process:

(1) For each working place $p = 1, \ldots, P$, choose $\theta_p \sim \mathrm{Dirichlet}(\alpha)$;
    for each working date $\tau = 1, \ldots, T$, choose $\delta_{\tau} \sim \mathrm{Dirichlet}(\gamma)$;
    for each working mode $\pi = 1, \ldots, \Pi$, choose $\phi_{\pi} \sim \mathrm{Dirichlet}(\beta)$.

(2) For each work cycle $s = 1, \ldots, S$:
    given the vector of working places $\mathbf{p}_s$,
    given the vector of working dates $\boldsymbol{\tau}_s$,
    for each event $i = 1, \ldots, N_s$:
        conditioned on $\mathbf{p}_s$, choose working place $x_{si} \sim \mathrm{Uniform}(\mathbf{p}_s)$;
        conditioned on $\boldsymbol{\tau}_s$, choose working date $y_{si} \sim \mathrm{Uniform}(\boldsymbol{\tau}_s)$;
        conditioned on $x_{si}$ and $y_{si}$, choose working mode $z_{si} \sim \mathrm{Discrete}(\theta_{x_{si}}, \delta_{y_{si}})$;
        conditioned on $z_{si}$, choose event $e_{si} \sim \mathrm{Discrete}(\phi_{z_{si}})$.
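The generative process above can be sketched as a short sampler. This is a minimal sketch, not the authors' implementation. It rests on one assumption that the text does not spell out: the working mode is drawn with probability proportional to the elementwise product of $\theta_{x_{si}}$ and $\delta_{y_{si}}$, renormalized over modes, which matches the product form that appears in the likelihood. The function name `sample_wcm`, its arguments, and the default hyperparameter values are hypothetical.

```python
import numpy as np


def sample_wcm(places_per_cycle, dates_per_cycle, lengths, P, T, n_modes, E,
               alpha=1.0, gamma=1.0, beta=1.0, seed=0):
    """Sample event sequences following steps (1) and (2) of the generative process.

    places_per_cycle[s] and dates_per_cycle[s] list the 1-based working places
    and working dates of work cycle s; lengths[s] is N_s.
    """
    rng = np.random.default_rng(seed)

    # Step (1): draw the parameter matrices; each column is a distribution.
    theta = rng.dirichlet(np.full(n_modes, alpha), size=P).T   # Pi x P
    delta = rng.dirichlet(np.full(n_modes, gamma), size=T).T   # Pi x T
    phi = rng.dirichlet(np.full(E, beta), size=n_modes).T      # E x Pi

    # Step (2): draw the events of every work cycle.
    sequences = []
    for p_s, tau_s, n_s in zip(places_per_cycle, dates_per_cycle, lengths):
        sequence = []
        for _ in range(n_s):
            x = rng.choice(p_s)                      # x_si ~ Uniform(p_s)
            y = rng.choice(tau_s)                    # y_si ~ Uniform(tau_s)
            # Assumption: mode weights are the normalized product theta * delta.
            weights = theta[:, x - 1] * delta[:, y - 1]
            z = rng.choice(n_modes, p=weights / weights.sum())
            e = rng.choice(E, p=phi[:, z]) + 1       # 1-based event entry
            sequence.append(int(e))
        sequences.append(sequence)
    return sequences, theta, delta, phi


# Two hypothetical work cycles: lengths 10 and 4, three places, twelve months.
sequences, theta, delta, phi = sample_wcm(
    places_per_cycle=[[1, 2], [3]],
    dates_per_cycle=[[1], [6, 7]],
    lengths=[10, 4],
    P=3, T=12, n_modes=2, E=6,
)
print(sequences)
```

Storing $\Theta$, $\Delta$, and $\Phi$ with one distribution per column follows the column convention used in the text, so that `theta[:, p - 1]` plays the role of $\theta_p$.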

The graphical model corresponding to this process is shown in Figure 3. Under this simulating process, each event is drawn independently when conditioned on $\Phi$, and each working mode is drawn independently when conditioned on $\Theta$, $\Delta$, and $\Pi$. The probability of the event sequence $\mathbf{e}$ conditioned on $\Theta$, $\Delta$, and $\Phi$ (and implicitly on a fixed number of working modes $\Pi$) is

$$P(\mathbf{e} \mid \Phi, \Delta, \Theta, \mathbf{P}, \mathbf{T}) = \prod_{s=1}^{S} P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s). \tag{1}$$

With the help of (1), we can first obtain the probability of the event sequence in each work cycle, $\mathbf{e}_s$, by summing over the latent variables $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ to get what is shown in (3). Consider

$$\begin{aligned}
P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s)
&= \prod_{i=1}^{N_s} P(e_{si} \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s) \\
&= \prod_{i=1}^{N_s} \sum_{\tau=1}^{T} \sum_{p=1}^{P} \sum_{\pi=1}^{\Pi} P(e_{si}, z_{si} = \pi, x_{si} = p, y_{si} = \tau \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s) \\
&= \prod_{i=1}^{N_s} \sum_{\tau=1}^{T} \sum_{p=1}^{P} \sum_{\pi=1}^{\Pi} P(e_{si} \mid z_{si} = \pi, \Phi)\, P(z_{si} = \pi \mid x_{si} = p, \Theta)\, P(z_{si} = \pi \mid y_{si} = \tau, \Delta)\, P(x_{si} = p \mid \mathbf{p}_s)\, P(y_{si} = \tau \mid \boldsymbol{\tau}_s),
\end{aligned} \tag{2}$$

$$P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s) = \prod_{i=1}^{N_s} \frac{1}{P_s} \frac{1}{T_s} \sum_{p \in \mathbf{p}_s} \sum_{\tau \in \boldsymbol{\tau}_s} \sum_{\pi=1}^{\Pi} \phi_{e_{si}\pi}\, \theta_{\pi p}\, \delta_{\pi\tau}, \tag{3}$$

$$P(\mathbf{e} \mid \alpha, \beta, \gamma, \mathbf{P}, \mathbf{T}) = \int_{\Theta} \int_{\Delta} \int_{\Phi} P(\mathbf{e} \mid \Theta, \Delta, \Phi, \mathbf{P}, \mathbf{T})\, P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)\, d\Theta\, d\Delta\, d\Phi \tag{4}$$

$$= \int_{\Theta} \int_{\Delta} \int_{\Phi} \left[ \prod_{s=1}^{S} \prod_{i=1}^{N_s} \frac{1}{P_s} \frac{1}{T_s} \sum_{p \in \mathbf{p}_s} \sum_{\tau \in \boldsymbol{\tau}_s} \sum_{\pi=1}^{\Pi} \phi_{e_{si}\pi}\, \theta_{\pi p}\, \delta_{\pi\tau} \right] P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)\, d\Theta\, d\Delta\, d\Phi. \tag{5}$$

In (3), the factorization makes use of the conditional independence assumptions of the model. Meanwhile, the variables $\mathbf{x}$ and $\mathbf{y}$ are mutually stochastically independent. Equation (3) represents the probability of the events $\mathbf{e}$ in terms of the entries of the parameter matrices $\Theta$, $\Phi$, and $\Delta$ introduced above. The probability distribution over working place assignments, $P(x_{si} = p \mid \mathbf{p}_s)$, is assumed to be uniform over the elements of $\mathbf{p}_s$ and deterministic if $P_s = 1$. Similarly, the probability distribution over working date assignments, $P(y_{si} = \tau \mid \boldsymbol{\tau}_s)$, is assumed to be uniform over the elements of $\boldsymbol{\tau}_s$ and deterministic if $T_s = 1$. The probability distributions over working mode assignments, $P(z_{si} = \pi \mid x_{si} = p, \Theta)$ and $P(z_{si} = \pi \mid y_{si} = \tau, \Delta)$, are the multinomial distributions $\theta_p$ and $\delta_\tau$ in $\Theta$ and $\Delta$ that correspond to working place $p$ and working date $\tau$, respectively. The probability of an event given a working mode assignment, $P(e_{si} \mid z_{si} = \pi, \Phi)$, is the multinomial distribution $\phi_\pi$ in $\Phi$ that corresponds to working mode $\pi$.

In (4) and (5), we treat $\Theta$, $\Phi$, and $\Delta$ as random variables and compute the marginal probability of the data set by integrating them out. $P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta) = P(\Theta \mid \alpha) P(\Delta \mid \gamma) P(\Phi \mid \beta)$ is the product of the Dirichlet priors on $\Theta$, $\Delta$, and $\Phi$, respectively, as defined before.
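As a rough illustration of how the expression in (3) can be evaluated for one work cycle, the following Python sketch computes its logarithm for given point values of $\Phi$, $\Theta$, and $\Delta$. The function name and the nested-list layout (`phi[e][k]`, `theta[k][p]`, `delta[k][t]`, chosen to follow the index order in (3)) are assumptions made for this example.

```python
import math

def log_likelihood_cycle(events, places, dates, phi, theta, delta):
    """Logarithm of the right-hand side of (3) for one work cycle.

    events : list of event indices e_si
    places : working places p_s of the cycle (indices into theta columns)
    dates  : working dates tau_s of the cycle (indices into delta columns)
    """
    n_modes = len(theta)
    norm = 1.0 / (len(places) * len(dates))   # the 1/P_s * 1/T_s factor
    total = 0.0
    for e in events:
        s = 0.0
        for p in places:
            for t in dates:
                for k in range(n_modes):
                    s += phi[e][k] * theta[k][p] * delta[k][t]
        # Summing logs avoids underflow of the N_s-fold product.
        total += math.log(norm * s)
    return total
```

Working in log space is a design choice of this sketch: a work cycle can contain many events, so the raw product in (3) would quickly underflow in floating point.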

5. Inference of WCM from Data

The WCM contains three continuous random variables, $\Theta$, $\Delta$, and $\Phi$. Various approximate inference approaches have recently been proposed for estimating the posterior distribution of continuous random variables in hierarchical Bayesian models. In this paper, our inference method is Gibbs sampling [40], which is a special form of Markov chain Monte Carlo.

Our target of estimation is the posterior distribution $P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)$. In order to sample from this distribution, we use the latent variables $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ to estimate the posterior distribution:

$$P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta) = \sum_{\mathbf{x}, \mathbf{y}, \mathbf{z}} P(\Theta, \Delta, \Phi \mid \mathbf{x}, \mathbf{y}, \mathbf{z}, \alpha, \gamma, \beta)\, P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid \alpha, \gamma, \beta). \tag{6}$$

The estimation process mainly involves two steps: first, we use Gibbs sampling to obtain an approximate posterior $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid \alpha, \gamma, \beta)$; second, $P(\Theta, \Delta, \Phi \mid \mathbf{x}, \mathbf{y}, \mathbf{z}, \alpha, \gamma, \beta)$ can be computed directly for each sample by exploiting the fact that the Dirichlet distribution is conjugate to the multinomial.

5.1. Gibbs Sampling. Using Gibbs sampling, we can generate a sample from the joint distribution $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid D_{\text{train}}, \alpha, \beta, \gamma)$ in two steps: first, sample the working place assignment $x_{si}$, working date assignment $y_{si}$, and working mode assignment $z_{si}$ for an individual event $e_{si}$, conditioned on fixed assignments of working places, working dates, and working modes for all other events in the data set; second, repeat this process for each event. A single Gibbs sampling iteration consists of sequentially performing this sampling of working place, working date, and working mode assignments for each individual event in the data set:

$$
P(x_{si} = p, y_{si} = \tau, z_{si} = \pi \mid e_{si} = e, \mathbf{x}_{-si}, \mathbf{y}_{-si}, \mathbf{z}_{-si}, \mathbf{e}_{-si}, \mathbf{P}, \mathbf{T}, \alpha, \beta, \gamma)
\propto \frac{C^{E\Pi}_{e\pi,-si} + \beta}{\sum_{e'} C^{E\Pi}_{e'\pi,-si} + E\beta} \cdot \frac{C^{\Pi P}_{\pi p,-si} + \alpha}{\sum_{p'} C^{\Pi P}_{\pi p',-si} + P\alpha} \cdot \frac{C^{\Pi T}_{\pi\tau,-si} + \gamma}{\sum_{\tau'} C^{\Pi T}_{\pi\tau',-si} + T\gamma}. \tag{7}
$$

According to (1)–(5), we can derive the basic equation needed for the Gibbs sampler, shown in (7). In (7), $C^{\Pi P}$ is the count matrix of working modes assigned to working places, where $C^{\Pi P}_{\pi p,-si}$ is the number of events assigned to working mode $\pi$ in working place $p$, excluding the working mode assignment of event $e_{si}$. Similarly, $C^{\Pi T}$ is the count matrix of working modes assigned to working dates, where $C^{\Pi T}_{\pi\tau,-si}$ is the number of events assigned to working mode $\pi$ in working date $\tau$, excluding the working mode assignment of event $e_{si}$. $C^{E\Pi}$ is the count matrix of events assigned to working modes, where $C^{E\Pi}_{e\pi,-si}$ is the number of occurrences of the $e$th entry of the event set assigned to working mode $\pi$, excluding the working mode assignment of event $e_{si}$. Finally, $\mathbf{x}_{-si}$, $\mathbf{y}_{-si}$, $\mathbf{z}_{-si}$, and $\mathbf{e}_{-si}$ denote the vectors of working place assignments, working date assignments, working mode assignments, and event observations in the data set, respectively, excluding the $i$th event in the $s$th work cycle.

The main sampling steps are as follows. We first initialize the working place, working date, and working mode assignments $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ randomly. In each Gibbs sampling iteration, we sequentially draw the working mode, working place, and working date assignment of each event from the joint conditional distribution in (7). As the number of iterations increases, the Gibbs sampler approaches its stationary distribution, the posterior distribution $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid D_{\text{train}}, \alpha, \beta, \gamma)$.
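The sampling steps above can be sketched as follows. This is a minimal, unoptimised illustration rather than the authors' code: the function names and data layout are hypothetical, and the place and date factors are assumed to be smoothed by $\alpha$ and $\gamma$, respectively, in line with the denominators of (7).

```python
import random

def init_state(tokens, cycles, n_modes, n_events, n_places, n_dates, rng):
    """Random initial assignments plus the count matrices used in (7).

    tokens : list of (s, e) pairs (work cycle index, event index)
    cycles : cycles[s] = (places of cycle s, dates of cycle s)
    """
    c_em = [[0] * n_modes for _ in range(n_events)]   # event x mode
    c_mp = [[0] * n_places for _ in range(n_modes)]   # mode x place
    c_mt = [[0] * n_dates for _ in range(n_modes)]    # mode x date
    n_k = [0] * n_modes                               # tokens per mode
    assign = []
    for s, e in tokens:
        places, dates = cycles[s]
        x, y = rng.choice(places), rng.choice(dates)
        z = rng.randrange(n_modes)
        assign.append([x, y, z])
        c_em[e][z] += 1
        c_mp[z][x] += 1
        c_mt[z][y] += 1
        n_k[z] += 1
    return assign, (c_em, c_mp, c_mt, n_k)

def gibbs_sweep(tokens, cycles, assign, counts, alpha, beta, gamma, rng):
    """One Gibbs iteration: resample (x, y, z) jointly for every token."""
    c_em, c_mp, c_mt, n_k = counts
    n_events, n_modes = len(c_em), len(c_mp)
    n_places, n_dates = len(c_mp[0]), len(c_mt[0])
    for i, (s, e) in enumerate(tokens):
        x, y, z = assign[i]
        # Remove token i from the counts (the "-si" subscript in (7)).
        c_em[e][z] -= 1
        c_mp[z][x] -= 1
        c_mt[z][y] -= 1
        n_k[z] -= 1
        places, dates = cycles[s]
        cands, weights = [], []
        for p in places:
            for t in dates:
                for k in range(n_modes):
                    # Each token has exactly one place and one date, so the
                    # three sums in the denominators of (7) all equal n_k[k].
                    w = ((c_em[e][k] + beta) / (n_k[k] + n_events * beta)
                         * (c_mp[k][p] + alpha) / (n_k[k] + n_places * alpha)
                         * (c_mt[k][t] + gamma) / (n_k[k] + n_dates * gamma))
                    cands.append((p, t, k))
                    weights.append(w)
        x, y, z = rng.choices(cands, weights=weights, k=1)[0]
        assign[i] = [x, y, z]
        c_em[e][z] += 1
        c_mp[z][x] += 1
        c_mt[z][y] += 1
        n_k[z] += 1
```

Calling `gibbs_sweep` repeatedly on the state returned by `init_state` yields successive samples of the assignments; the count matrices are updated in place.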

5.2. The Posterior Probability. Given $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$, $D_{\text{train}}$, $\alpha$, $\beta$, and $\gamma$, computing the posterior distributions on $\Theta$, $\Delta$, and $\Phi$ is straightforward. Because the Dirichlet distribution is conjugate to the multinomial distribution, we get

$$
\begin{aligned}
\phi_\pi \mid \mathbf{z}, \beta, D_{\text{train}} &\sim \mathrm{Dirichlet}(C^{E\Pi}_{\pi} + \beta), \\
\theta_p \mid \mathbf{x}, \mathbf{z}, \alpha, D_{\text{train}} &\sim \mathrm{Dirichlet}(C^{\Pi P}_{p} + \alpha), \\
\delta_\tau \mid \mathbf{y}, \mathbf{z}, \gamma, D_{\text{train}} &\sim \mathrm{Dirichlet}(C^{\Pi T}_{\tau} + \gamma),
\end{aligned} \tag{8}
$$

where $C^{E\Pi}_{\pi}$ is the vector of counts of the number of times each event has been assigned to working mode $\pi$; $C^{\Pi P}_{p}$ and $C^{\Pi T}_{\tau}$ are defined analogously. We can then evaluate the posterior mean of each element of $\Theta$, $\Delta$, and $\Phi$ as follows:

$$
\begin{aligned}
E[\phi_{e\pi} \mid \mathbf{z}, \beta, D_{\text{train}}] &= \frac{(C^{E\Pi}_{e\pi})^k + \beta}{\sum_{e'} (C^{E\Pi}_{e'\pi})^k + E\beta}, \\
E[\theta_{\pi p} \mid \mathbf{x}, \mathbf{z}, \alpha, D_{\text{train}}] &= \frac{(C^{\Pi P}_{\pi p})^k + \alpha}{\sum_{\pi'} (C^{\Pi P}_{\pi' p})^k + P\alpha}, \\
E[\delta_{\pi\tau} \mid \mathbf{y}, \mathbf{z}, \gamma, D_{\text{train}}] &= \frac{(C^{\Pi T}_{\pi\tau})^k + \gamma}{\sum_{\pi'} (C^{\Pi T}_{\pi'\tau})^k + T\gamma},
\end{aligned} \tag{9}
$$

where $(C^{E\Pi})^k$ is the matrix of event-to-working-mode counts exhibited in $(\mathbf{z})^k$, and $k$ refers to sample $k$ from the Gibbs sampler. These posterior means also provide point estimates for $\Phi$, $\Theta$, and $\Delta$ and correspond to the posterior predictive distributions for the next event from a working mode, the next working mode for a working place, and the next working mode for a working date, respectively.

[Figure 4: The stream of the concrete in the concrete pump truck and the operation sequence of the concrete pump truck at runtime. Systems shown: hopper, transportation cylinder, stirring system, pumping system, landing leg system, and cantilever system; the concrete flows to the specified location. Legend: concrete stream, operation sequence, related system.]
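The conjugacy argument behind (8) and (9) can be illustrated with a small Python sketch for $\Phi$. The helper `dirichlet_posterior_mean` implements the standard posterior mean of a symmetric Dirichlet prior updated with multinomial counts; both function names and the nested-list layout (`c_em[e][k]` for the count of event `e` in mode `k`) are assumptions of this example.

```python
def dirichlet_posterior_mean(counts, prior):
    """Posterior mean of a symmetric Dirichlet(prior) given multinomial counts."""
    total = sum(counts) + len(counts) * prior
    return [(c + prior) / total for c in counts]

def estimate_phi(c_em, beta):
    """Point estimate of Phi (E x Pi) from the event-by-mode count matrix.

    Each column (one working mode) is smoothed and normalised separately,
    as in the first line of (9).
    """
    n_events, n_modes = len(c_em), len(c_em[0])
    columns = [
        dirichlet_posterior_mean([c_em[e][k] for e in range(n_events)], beta)
        for k in range(n_modes)
    ]
    # Transpose back so that phi[e][k] = P(event e | mode k).
    return [[columns[k][e] for k in range(n_modes)] for e in range(n_events)]
```

The same helper can be reused for the place and date count matrices once the normalisation axis is fixed.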

6. Experimental Evaluation

6.1. Data Preparation. We trained the WCM on a real-world data set collected from a well-known Chinese construction machinery manufacturer. The data set consists of event sequence data from concrete pump trucks over 6 months (from June 2012 to November 2012). It contains $S = 32{,}632$ work cycles, $P = 5$ different working places, $T = 6$ different working dates, a total of $N = 22{,}418{,}756$ event tokens, and an event set of $E = 33$ unique events. The working date of each work cycle is its actual working month, so the working date set is $\mathbf{T} = \{\text{Jun}, \text{Jul}, \text{Aug}, \text{Sep}, \text{Oct}, \text{Nov}\}$. Because the event sequence data were all collected in the Chinese Mainland, we divide the working places into 5 regions according to the administrative regions of China: Northern China, Northeastern China, Eastern China, Mid-Southern China, and Western China.

The concrete pump truck is a type of construction machinery: a truck fitted with a concrete pump. It alternates between two working statuses, traveling and pumping. In the pumping status, it pushes the concrete to the specified location. In the traveling status, it is just a truck. In the experiment, we mainly focus on events in the pumping status. Figure 4 shows the stream of the concrete in the concrete pump truck at runtime and the operation sequence of the different systems in the concrete pump truck. The concrete pump truck first switches to pumping status and then unfolds and fixes the landing legs. Next, it unfolds the cantilever to the specified location. Afterwards, the concrete is poured into the hopper and, meanwhile, the stirring system starts stirring the concrete. Finally, the pumping system starts pumping the concrete in the hopper to the specified location. When the pumping ends, the concrete pump truck stops the pumping system and then folds the cantilever and the landing legs successively. Table 1 shows the relations between systems and events in the concrete pump truck.

Table 1: Event set.

| Event | Abbr. | Type | Related system |
|---|---|---|---|
| Stop pumping mandatorily | SPM | Alarm event | All |
| Reminder of concrete import | RCI | Alarm event | Hopper |
| Concrete piston withdrawing | CPW | Alarm event | Pumping system |
| Reminder of concrete cylinder water | RCSW | Alarm event | Hopper |
| Swing cylinder initiate | SCI | Operation event | Pumping system |
| Stalling of engine | SoE | Alarm event | All |
| Alteration of operation mode (remote or close) | AOM | Operation event | Pumping system |
| Alteration of pump truck status (pumping or travelling) | APTS | Operation event | All |
| Control of pumping displacement | CPD | Operation event | Pumping system |
| Transportation cylinder initiate | TCI | Operation event | Pumping system |
| Manual control of master cylinder | MCMC | Operation event | Pumping system |
| Manual control of swing cylinder | MCSC | Operation event | Pumping system |
| Detection of system pressure | DSP | Alarm event | Pumping system |
| Manual control of engine speed | MCES | Operation event | Pumping system |
| High pressure mode initiate | HPMI | Operation event | Pumping system |
| Warm-up initiate | WUI | Operation event | Pumping system |
| Water pump initiate | WPI | Operation event | Hopper |
| Concrete stirring initiate | CSI | Operation event | Stirring system |
| Cantilever folding initiate | CFI | Operation event | Cantilever system |
| Temperature control initiate | TCI | Operation event | Pumping system |
| Cantilever movement | CM | Operation event | Cantilever system |
| Landing leg movement | LLM | Operation event | Landing leg system |
| Detection of oil pressure | DOP | Alarm event | Pumping system |
| Landing leg folding | LLF | Operation event | Landing leg system |
| Rotary table movement | RTM | Operation event | Cantilever system |
| Oil pump initiate | OPI | Operation event | Pumping system |
| Energy accumulator initiate | EAI | Operation event | Pumping system |
| Bypath valve initiate | BVI | Operation event | Pumping system |
| Concrete pumping initiate | CPI | Operation event | Pumping system |
| Master cylinder initiate | MCI | Operation event | Pumping system |
| Cantilever shock absorbers initiate | CSAI | Alarm event | Cantilever system |
| Initiate of system cooling | ISC | Operation event | Pumping system |
| Hydraulic oil supplement | HOS | Operation event | Pumping system |

Table 1 lists all the events in the event set. There are two types of events: alarm events and operation events. An alarm event reminds the operator that some emergency has happened. For example, the occurrence of event RCI reminds the operator to import concrete into the hopper. An alarm event is not a regular operation. An operation event is the actual record of a regular operation on the concrete pump truck.

6.2. Analysis for Gibbs Sampling Using Perplexity. As mentioned earlier, in the experiments described in this paper we do not estimate the hyperparameters $\alpha$, $\beta$, and $\gamma$. Instead, they are fixed at $50/\Pi$, $0.01$, and $50/\Pi$, respectively. We use the perplexity of the model on test work cycles to determine when the performance of the model begins to stabilize.

The perplexity of a new, unobserved work cycle $s$ that contains events $\mathbf{e}_s$ and is conditioned on the working places $\mathbf{p}_s$ and working dates $\boldsymbol{\tau}_s$ of the work cycle is defined as

$$\mathrm{Perplexity}(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s) = \exp\left(-\frac{\log P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)}{N_s}\right), \tag{10}$$

where $P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)$ is the probability assigned by the WCM. To simplify notation, we do not show the explicit dependency on the hyperparameters. For multiple work cycles, we report the average perplexity over work cycles, defined as

$$\mathrm{Perplexity} = \frac{\sum_{s=1}^{S} \mathrm{Perplexity}(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)}{S}. \tag{11}$$

The lower the perplexity, the better the performance of the model. We can obtain an approximate estimate of perplexity by averaging over multiple samples according to (9), calculated as follows:

$$P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s) = \frac{1}{K} \sum_{k=1}^{K} \prod_{i=1}^{N_s} \frac{1}{P_s T_s} \sum_{p \in \mathbf{p}_s,\, \tau \in \boldsymbol{\tau}_s,\, \pi} E[\theta_{\pi p}\, \delta_{\pi\tau}\, \phi_{e_{si}\pi} \mid \mathbf{x}^k, \mathbf{y}^k, \mathbf{z}^k]. \tag{12}$$

[Figure 5: Perplexity as a function of iterations of the Gibbs sampler for a $\Pi = 200$ model. Each curve shows the perplexity averaged over a different number of samples $K$ ($K$ = 2, 4, 6, 8, and 10). Horizontal axis: iteration (0 to 200); vertical axis: perplexity (4500 to 5300).]

Experimental results using different values of $K$ indicated that $K = 10$ samples is a reasonable choice for a good approximation of the perplexity. Because of the exchangeability of the working modes, it is possible that quite different solutions of working modes are detected across different samples. In practice, however, we have found that the solutions of working modes are relatively stable across samples, with only a small subset of unique working modes appearing in any sample. Hence, we use the average perplexity values across samples in the experiment.
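The perplexity definitions in (10) and (11), together with the averaging over $K$ samples in (12), can be sketched in Python as below. The function names are hypothetical, and averaging the per-sample probabilities in log space (`log_mean_exp`) is an implementation choice of this sketch, made because the per-cycle probabilities are products over many events and would otherwise underflow.

```python
import math

def log_mean_exp(log_values):
    """log of the arithmetic mean of exp(v), computed stably.

    Used to average the K per-sample probabilities of (12) in log space.
    """
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values) / len(log_values))

def cycle_perplexity(log_prob, n_events):
    """Perplexity of one work cycle, as in (10)."""
    return math.exp(-log_prob / n_events)

def average_perplexity(log_probs, lengths):
    """Average perplexity over work cycles, as in (11)."""
    values = [cycle_perplexity(lp, n) for lp, n in zip(log_probs, lengths)]
    return sum(values) / len(values)
```

As a sanity check on the definition, a model that assigns every event probability 1/4 has a per-cycle perplexity of exactly 4, regardless of the cycle length.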

Figure 5 illustrates the perplexity as a function of iterations of the Gibbs sampler for a $\Pi = 200$ model fitted to the data set. The performance of the models (for different settings of the parameter $K$) trained using the Gibbs sampler stabilizes rather quickly, at least in terms of perplexity on the data set: the perplexity values flatten out after about 100 iterations of the Gibbs sampler.

6.3. The Number of Working Modes $\Pi$. Although the perplexity computation can be averaged over different Gibbs sampler runs, other applications of the model rely on the analysis of each working mode and are therefore based on individual samples. Meanwhile, the parameter $\Pi$, which represents the number of working modes, is also set according to the perplexity.

[Figure 6: Perplexity as a function of the parameter $\Pi$ of the Gibbs sampler for $K = 10$ samples. Horizontal axis: number of working modes $\Pi$ (0 to 300); vertical axis: perplexity (4450 to 4950).]

Figure 6 illustrates the perplexity as a function of the parameter $\Pi$ for $K = 10$ Gibbs samples. The average perplexity over the data set decreases as $\Pi$ increases, and the perplexity values flatten out once $\Pi$ reaches 200. This indicates that $\Pi = 200$ is a suitable setting for this data set.
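One way to make the notion of the perplexity curve flattening out operational is sketched below. This is a hypothetical heuristic, not a rule stated in this paper: the function name `pick_num_modes` and the relative tolerance `rel_tol` are assumptions, and the function simply returns the smallest candidate value of $\Pi$ after which the relative decrease in perplexity falls below the tolerance.

```python
def pick_num_modes(perplexity_by_modes, rel_tol=0.01):
    """Return the smallest number of modes after which perplexity flattens.

    perplexity_by_modes : dict mapping a candidate number of working modes
                          to the average perplexity measured for it
    rel_tol             : hypothetical threshold on the relative improvement
    """
    items = sorted(perplexity_by_modes.items())
    for (m_prev, p_prev), (_, p_next) in zip(items, items[1:]):
        # Relative improvement gained by moving to the next candidate.
        if (p_prev - p_next) / p_prev < rel_tol:
            return m_prev
    return items[-1][0]  # still improving at the largest candidate
```

The choice of tolerance is a judgment call; a visual inspection of the curve, as done with Figure 6, remains a reasonable alternative.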

6.4. Analysis of the WCM Results. To analyze the WCM results, we can use the point estimates of the WCM parameters to look at specific $\Theta$, $\Delta$, and $\Phi$ distributions and related quantities that can be derived from these parameters (such as the probability of a working place and a working date given a randomly selected event from a working mode). In the following results, we take a specific sample $\mathbf{x}^k$, $\mathbf{y}^k$, and $\mathbf{z}^k$ after 100 iterations from a single, arbitrarily selected Gibbs run and then generate point estimates of $\Theta$, $\Delta$, and $\Phi$ using (9).

There are 200 working modes in total ($\Pi = 200$). Describing each working mode with a WMV helps us better understand the occurrences of events. For the sake of analysis, Table 2 lists the highest probability working mode for each working place and each working date from the WCM. For each such working mode, we list the top 10 events most likely to be generated, conditioned on both the working place and the working date. For example, in the working place of Northern China and the working date of June, the most likely working mode is number 101 of the 200 working modes, and its top 10 events are OPI, CM, EAI, RTM, BVI, SPM, AOM, APTS, CPD, and TCI.
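A listing of this kind can be derived from the point estimates with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and scoring a working mode for a place and date by the product of the corresponding entries of $\Theta$ and $\Delta$ is an assumption about how the two conditions are combined.

```python
def most_likely_mode(theta, delta, place, date):
    """Index of the working mode with the highest score for (place, date).

    Assumption: the score of mode k is theta[k][place] * delta[k][date].
    """
    scores = [theta[k][place] * delta[k][date] for k in range(len(theta))]
    return max(range(len(scores)), key=scores.__getitem__)

def top_events(phi, mode, event_names, n=10):
    """Names of the n events with the highest probability under one mode.

    phi[e][mode] is the estimated probability of event e given the mode.
    """
    order = sorted(range(len(event_names)),
                   key=lambda e: phi[e][mode], reverse=True)
    return [event_names[e] for e in order[:n]]
```

For each working place and working date, calling `most_likely_mode` and then `top_events` on the returned mode gives one row of such a table.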

Experimental results show that different working places have different working modes even for the same working date, and the same working place also has different working modes for different working dates. This indicates that the working mode is indeed related to the working place and working date. Events related to the pumping system, such as OPI, MCI, and CPI, are likely to occur in most working modes, which indicates that the working modes of the concrete pump truck are consistent with the actual situation. Meanwhile, events related to the cantilever system and landing leg system, such as LLF and CFI, occur less often than events of the pumping system. Moreover, in the summer working dates (June, July, and August), the alarm event SPM is more likely to occur, which indicates that the concrete pump truck is more likely to fail in a hot climate. The operation event AOM is also likely to occur, which indicates that the operators prefer to operate the concrete pump truck remotely.

Table 2: The highest probability working mode for each working place and each working date from the WCM.

| Working place | Working date | Probability | Working mode | Events |
|---|---|---|---|---|
| Northern China | Jun | 0.0251 | 101 | OPI, CM, EAI, RTM, BVI, SPM, AOM, APTS, CPD, and TCI |
| Northern China | Jul | 0.0341 | 164 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, and TCI |
| Northern China | Aug | 0.0051 | 62 | LLF, CFI, APTS, CSI, AOM, ISC, RTM, SPM, CPD, and TCI |
| Northern China | Sep | 0.0342 | 12 | OPI, RTM, CM, CPI, BVI, LLF, SPM, AOM, APTS, and CPD |
| Northern China | Oct | 0.0351 | 49 | RTM, OPI, CM, BVI, MCI, SPM, AOM, APTS, CPD, and TCI |
| Northern China | Nov | 0.0353 | 129 | OPI, ISC, SPM, EAI, CSI, APTS, AOM, CPD, TCI, and MCMC |
| Northeastern China | Jun | 0.0258 | 176 | OPI, SPM, EAI, HOS, BVI, MCI, AOM, APTS, CPD, and TCI |
| Northeastern China | Jul | 0.0263 | 29 | OPI, LLF, ISC, SPM, CFI, APTS, CPI, HOS, AOM, and CPD |
| Northeastern China | Aug | 0.0141 | 71 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, and TCI |
| Northeastern China | Sep | 0.0114 | 111 | RTM, BVI, OPI, CM, MCI, HOS, EAI, SPM, AOM, and APTS |
| Northeastern China | Oct | 0.0146 | 69 | ISC, LLF, CSI, AOM, APTS, OPI, CFI, SPM, CPD, and TCI |
| Northeastern China | Nov | 0.0257 | 93 | RTM, OPI, BVI, MCI, CM, CPI, SPM, AOM, APTS, and CPD |
| Eastern China | Jun | 0.0279 | 177 | OPI, HOS, CPI, SPM, LLF, RTM, EAI, BVI, AOM, and APTS |
| Eastern China | Jul | 0.0201 | 72 | OPI, EAI, CPI, SPM, MCI, RTM, HOS, AOM, APTS, and CPD |
| Eastern China | Aug | 0.0277 | 87 | OPI, BVI, EAI, RTM, AOM, SPM, MCI, APTS, CPD, and TCI |
| Eastern China | Sep | 0.0274 | 9 | OPI, EAI, BVI, RTM, HOS, SPM, AOM, APTS, CPD, and TCI |
| Eastern China | Oct | 0.0214 | 191 | RTM, MCI, CPI, CM, EAI, OPI, HOS, SPM, AOM, and APTS |
| Eastern China | Nov | 0.0255 | 170 | OPI, MCI, BVI, RTM, CPI, HOS, SPM, AOM, APTS, and CPD |
| Mid-Southern China | Jun | 0.0122 | 74 | OPI, EAI, CSI, CPI, ISC, MCI, SPM, AOM, APTS, and CPD |
| Mid-Southern China | Jul | 0.0177 | 33 | OPI, CPI, CM, MCI, HOS, SPM, AOM, APTS, CPD, and TCI |
| Mid-Southern China | Aug | 0.0262 | 187 | HOS, MCI, CPI, OPI, EAI, BVI, CSI, SPM, AOM, and APTS |
| Mid-Southern China | Sep | 0.0205 | 104 | RTM, EAI, BVI, OPI, SPM, MCI, CFI, APTS, AOM, and CPD |
| Mid-Southern China | Oct | 0.0193 | 39 | OPI, HOS, BVI, CM, RTM, SPM, AOM, APTS, CPD, and TCI |
| Mid-Southern China | Nov | 0.0133 | 158 | OPI, BVI, RTM, MCI, CM, SPM, AOM, APTS, CPD, and TCI |
| Western China | Jun | 0.0037 | 4 | OPI, RTM, BVI, CM, EAI, SPM, CPI, MCI, AOM, and APTS |
| Western China | Jul | 0.0134 | 144 | HOS, MCI, CPI, OPI, CFI, EAI, SPM, AOM, APTS, and CPD |
| Western China | Aug | 0.0126 | 126 | OPI, SPM, CM, BVI, AOM, LLF, APTS, CSI, CPD, and TCI |
| Western China | Sep | 0.0122 | 88 | OPI, HOS, CPI, CM, LLF, AOM, CFI, MCI, BVI, and SPM |
| Western China | Oct | 0.0104 | 37 | OPI, EAI, MCI, HOS, CSI, ISC, CFI, LLF, SPM, and AOM |
| Western China | Nov | 0.0135 | 78 | OPI, HOS, RTM, BVI, CSI, EAI, MCI, APTS, AOM, and SPM |

Because the probability of a working mode reflects the probability of its occurrence, we can analyze the workloads of different working places on different working dates. According to the probabilities of the working modes in Table 2, the working modes in Eastern China are more likely to occur than the working modes in Western China. This indicates that the concrete pump trucks in Eastern China have heavier workloads than those in Western China. Meanwhile, the concrete pump trucks have heavier workloads in June than in November. Generally, we can analyze different working modes according to their probabilities.

6.5. Illustrative Applications for the WCM. In this section, we provide some illustrative examples of how the WCM can be used to answer different types of questions and prediction problems concerning the working modes of the equipment.


6.5.1. Automated Detection for a New Work Cycle. In real cases, we would like to quickly assess working mode assignments for new work cycles not contained in the training data set, especially for a real-time event sequence flow.

Our automated detection strategy is to apply the Gibbs sampling algorithm only to the event tokens in the new work cycle, instead of rerunning the algorithm on the whole data set for every new work cycle. The event tokens in the new work cycle are then quickly assigned to the most likely working places, working dates, and working modes. The main procedure is as follows: first, we assign events randomly to working places, working dates, and working modes; second, we sample new assignments of events by applying the Gibbs sampler only to the event tokens in the new work cycle, each time temporarily updating the count matrices $C^{E\Pi}$, $C^{\Pi P}$, and $C^{\Pi T}$ used in (7).

Table 3 shows the occurrences of events for a new work cycle. After the sampling, the WCM has assigned each event to its most likely working mode; Table 3 lists the top 3 most likely working modes assigned to each event for the new work cycle. Note that each event is assigned to different working modes according to its occurrence count. According to (7), although the events of this new work cycle are assigned to different working modes, the work cycle as a whole is assigned to working mode number 107 with probability 0.0003. The top 10 most likely events in working mode number 107 are as follows:

RTM, CM, OPI, BVI, SPM, CPI, MCI, SCI, ISC, and SoE.

Compared with the real occurrences of events, the automated detection result for the new work cycle is consistent with the actual situation.

6.5.2. Automated Detection of Anomalous Work Cycles. We illustrate in this section how our model can be useful for detecting anomalous work cycles. A work cycle assigned to a working mode with low probability is considered an anomalous work cycle.

We again take the work cycle shown in Table 3 as an example for the automated detection of an anomalous work cycle. The work cycle is assigned to working mode number 107 with probability 0.0003. Compared with most other working modes, working mode number 107 has a lower probability, so this work cycle is detected as an anomalous work cycle. The alarm events SPM and SoE occur frequently both in the work cycle and in working mode number 107, which supports this detection. We also analyzed the real failure records and confirmed that the engine indeed failed frequently during this work cycle. Generally, such anomalous work cycles can be detected automatically and efficiently with the help of the WCM.
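The detection rule described above can be sketched in Python as follows. The sketch rests on two assumptions that are not specified in this paper: the working mode of a whole work cycle is taken to be the most frequent mode among its event tokens, and the cut-off `threshold` below which a mode probability counts as low is a hypothetical parameter to be chosen by the analyst.

```python
from collections import Counter

def dominant_mode(mode_assignments):
    """Most frequent working mode among the event tokens of a work cycle."""
    return Counter(mode_assignments).most_common(1)[0][0]

def is_anomalous(mode_assignments, mode_prob, threshold):
    """Flag a work cycle whose working mode has a low probability.

    mode_assignments : working mode index z_si of every event in the cycle
    mode_prob        : dict mapping a working mode to its probability
    threshold        : hypothetical cut-off for a low probability
    Returns the (mode, flag) pair.
    """
    mode = dominant_mode(mode_assignments)
    return mode, mode_prob[mode] < threshold
```

With the probability 0.0003 reported for working mode number 107, any threshold above that value would flag the example work cycle.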

Table 3: Actual example of automated detection for a new work cycle (working date = Jun, working place = Eastern China). Each event is assigned to its most likely working mode according to its corresponding occurrence count. The table lists the top 3 most likely working modes for each event in the new work cycle.

| Event | Count | First | Second | Third |
|---|---|---|---|---|
| SPM | 72 | 107 | 181 | 112 |
| AOM | 33 | 169 | 67 | 183 |
| APTS | 23 | 90 | 15 | 76 |
| CPD | 42 | 145 | 139 | 59 |
| TCI | 2 | 118 | 134 | 112 |
| MCMC | 0 | Null | Null | Null |
| MCSC | 0 | Null | Null | Null |
| DSP | 0 | Null | Null | Null |
| MCES | 0 | Null | Null | Null |
| HPMI | 0 | Null | Null | Null |
| WUI | 2 | 159 | 104 | 77 |
| WPI | 23 | 54 | 175 | 71 |
| CSI | 55 | 147 | 29 | 61 |
| CFI | 25 | 2 | 132 | 100 |
| TCI | 23 | 95 | 185 | 53 |
| CM | 127 | 12 | 49 | 192 |
| LLM | 55 | 189 | 114 | 23 |
| RCI | 0 | Null | Null | Null |
| DOP | 0 | Null | Null | Null |
| LLF | 40 | 111 | 10 | 42 |
| RTM | 297 | 191 | 104 | 52 |
| CPW | 0 | Null | Null | Null |
| RCSW | 0 | Null | Null | Null |
| OPI | 95 | 177 | 176 | 101 |
| EAI | 56 | 126 | 100 | 170 |
| BVI | 77 | 177 | 53 | 146 |
| CPI | 60 | 164 | 104 | 149 |
| MCI | 60 | 177 | 175 | 162 |
| SCI | 66 | 120 | 149 | 73 |
| CSAI | 0 | Null | Null | Null |
| ISC | 51 | 68 | 149 | 23 |
| HOS | 0 | Null | Null | Null |
| SoE | 33 | 119 | 112 | 107 |

7. Conclusions and Future Work

The working condition model proposed in this paper provides a relatively simple probabilistic model for exploring the relationships between working place, working date, working mode, and events in a work cycle. This model provides significantly improved predictive power in terms of the analysis of working condition according to the event sequence data.

Our future work mainly includes optimizing the model and its training and conducting experiments on different data sets. Furthermore, the further analysis of the anomalous work cycles detected by our model is also an interesting question.

Notations Associated with the WCM As Used in This Paper

$\mathbf{P}$: Working places of all the work cycles (set)

$\mathbf{T}$: Working dates of all the work cycles (set)

$\mathbf{p}_s$: Working places of the $s$th work cycle ($P_s$-dimensional vector)

$P_s$: Number of working places of the $s$th work cycle (scalar)

$\boldsymbol{\tau}_s$: Working dates of the $s$th work cycle ($T_s$-dimensional vector)

$T_s$: Number of working dates of the $s$th work cycle (scalar)

$P$: Number of working places (scalar)

$S$: Number of work cycles (scalar)

$T$: Number of working dates (scalar)

$N_s$: Number of events in the $s$th work cycle (scalar)

$N$: Number of events in all the event sequences (scalar)

$\Pi$: Number of working modes (scalar)

$E$: Number of events in the event set (scalar)

$\mathbf{e}_s$: Event sequence vector for the $s$th work cycle ($N_s$-dimensional vector)

$e_{si}$: $i$th event in the $s$th work cycle ($i$th component of vector $\mathbf{e}_s$)

$\mathbf{x}$: Working place assignments ($N$-dimensional vector)

$x_{si}$: Working place assignment for event $e_{si}$ ($i$th component of vector $\mathbf{x}_s$)

$\mathbf{y}$: Working date assignments ($N$-dimensional vector)

$y_{si}$: Working date assignment for event $e_{si}$ ($i$th component of vector $\mathbf{y}_s$)

$\mathbf{z}$: Working mode assignments ($N$-dimensional vector)

$z_{si}$: Working mode assignment for event $e_{si}$ ($i$th component of vector $\mathbf{z}_s$)

$\alpha, \beta, \gamma$: Dirichlet priors (scalars)

$\Phi$: Probabilities of events given working modes ($E \times \Pi$ matrix)

$\phi_\pi$: Probabilities of events given working mode $\pi$ ($E$-dimensional vector)

$\Theta$: Probabilities of working modes given working places ($\Pi \times P$ matrix)

$\theta_p$: Probabilities of working modes given working place $p$ ($\Pi$-dimensional vector)

$\Delta$: Probabilities of working modes given working dates ($\Pi \times T$ matrix)

$\delta_\tau$: Probabilities of working modes given working date $\tau$ ($\Pi$-dimensional vector)

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J Holler V Tsiatsis CMulligan S Avesand S Karnouskos andD Boyle From Machine-to-Machine to the Internet of ThingsIntroduction to a New Age of Intelligence Academic Press 2014

[2] C Perera A Zaslavsky P Christen and D GeorgakopoulosldquoSensing as a service model for smart cities supported by Inter-net of Thingsrdquo Transactions on Emerging TelecommunicationsTechnologies vol 25 no 1 pp 81ndash93 2014

[3] R F Mesquita Brandao and J A Beleza Carvalho ldquoTheimportance of control monitoring systems in wind parksmaintenancerdquo British Journal of Applied Science amp Technologyvol 4 no 10 pp 1461ndash1471 2014

[4] C J Crabtree D Zappala and P J Tavner ldquoSurvey of com-mercially available condition monitoring systems for windturbinesrdquo Tech Rep Durham University 2014

[5] D M Blei A Y Ng and M I Jordan ldquoLatent dirichletallocationrdquoThe Journal ofMachine Learning Research vol 3 no4-5 pp 993ndash1022 2003

[6] S Kandula R Mahajan P Verkaik S Agarwal J Padhyeand P Bahl ldquoDetailed diagnosis in enterprise networksrdquo inProceedings of the ACM SIGCOMM Conference on Data Com-munication (SIGCOMMrsquo09) vol 39 pp 243ndash254ACMAugust2009

[7] J-G Lou Q Fu Y Wang and J Li ldquoMining dependency indistributed systems through unstructured logs analysisrdquo ACMSIGOPSOperating Systems Review vol 44 no 1 pp 91ndash96 2010

[8] C Luo J-G Lou Q Lin et al ldquoCorrelating events with timeseries for incident diagnosisrdquo in Proceedings of the 20th ACMSIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo14) pp 1583ndash1592 ACM August 2014

[9] J Chen and R Kumar ldquoOnline failure diagnosis of stochasticdiscrete event systemsrdquo in Proceedings of the IEEE ConferenceonComputerAidedControl SystemDesign (CACSD rsquo13) pp 194ndash199 IEEE August 2013

[10] J Chen and R Kumar ldquoFailure diagnosis of discrete-timestochastic systems subject to temporal logic correctness require-mentsrdquo in Proceedings of the 11th IEEE International Conferenceon Networking Sensing and Control (ICNSC rsquo14) pp 42ndash47IEEE April 2014

[11] Business ProcessModel and Notation (BPMN) Version 20 OMGSpecification Object Management Group 2011

[12] F Leymann ldquoBpel vs bpmn 20 should you carerdquo in BusinessProcess Modeling Notation pp 8ndash13 Springer Berlin Germany2011

[13] C C Aggarwal Managing and Mining Sensor Data Springer2013

[14] N H Gehani H V Jagadish andO Shmueli ldquoComposite eventspecification in active databasesmodel and implementationrdquo inProceedings of the 18th VLDBConference Vancouver (VLDB rsquo92)vol 92 pp 327ndash338 Citeseer British Columbia Canada 1992

[15] I Davidson S Gilpin and P B Walker ldquoBehavioral event dataand their analysisrdquo Data Mining and Knowledge Discovery vol25 no 3 pp 635ndash653 2012

[16] J Han and M Kamber Data Mining Southeast Asia EditionConcepts and Techniques Morgan Kaufmann 2006

[17] H RMotahari-Nezhad R Saint-Paul F Casati and B Benatal-lah ldquoEvent correlation for process discovery from web serviceinteraction logsrdquoThe VLDB Journal vol 20 no 3 pp 417ndash4442011

Mathematical Problems in Engineering 13

[18] F. Skopik and R. Fiedler, "Intrusion detection in distributed systems using fingerprinting and massive event correlation," in GI-Jahrestagung, pp. 2240–2254, 2013.

[19] G. A. Wilkin, P. Eugster, and K. R. Jayaram, "Decentralized fault-tolerant event correlation," ACM Transactions on Internet Technology, vol. 14, no. 1, article 5, 2014.

[20] H. Wei, "A correlation analysis method for network security events," in Informatics and Management Science III, vol. 206 of Lecture Notes in Electrical Engineering, pp. 269–277, Springer, London, UK, 2013.

[21] W. Van Der Aalst, A. Adriansyah, A. K. A. de Medeiros, et al., "Process mining manifesto," in Business Process Management Workshops, pp. 169–194, Springer, Berlin, Germany, 2012.

[22] J. C. A. M. Buijs, B. F. van Dongen, and W. M. P. van der Aalst, "Mining configurable process models from collections of event logs," in Business Process Management, pp. 33–48, Springer, 2013.

[23] A. Rebuge and D. R. Ferreira, "Business process analysis in healthcare environments: a methodology based on process mining," Information Systems, vol. 37, no. 2, pp. 99–116, 2012.

[24] J. Wang, R. K. Wong, J. Ding, Q. Guo, and L. Wen, "On recommendation of process mining algorithms," in Proceedings of the IEEE 19th International Conference on Web Services (ICWS '12), pp. 311–318, IEEE, Honolulu, Hawaii, USA, June 2012.

[25] R. S. Mans, W. M. P. van der Aalst, and H. M. W. Verbeek, "Supporting process mining workflows with RapidProM," in Proceedings of the Business Process Management Demo Sessions (BPMD '14), vol. 1295, pp. 56–60, Eindhoven, The Netherlands, September 2014.

[26] C. Li, M. Reichert, and A. Wombacher, "Mining business process variants: challenges, scenarios, algorithms," Data & Knowledge Engineering, vol. 70, no. 5, pp. 409–434, 2011.

[27] R. Accorsi, T. Stocker, and G. Muller, "On the exploitation of process mining for security audits: the process discovery case," in Proceedings of the 28th Annual ACM Symposium on Applied Computing, pp. 1462–1468, ACM, March 2013.

[28] B.-J. Lee, S.-G. Park, K.-B. Min, et al., "The relationship between working condition factors and well-being," Annals of Occupational and Environmental Medicine, vol. 26, no. 1, article 34, 2014.

[29] J. Cohen, Statistical Power Analysis for the Behavioral Sciences, Routledge Academic, New York, NY, USA, 2013.

[30] P. Bahl, R. Chandra, A. Greenberg, S. Kandula, D. A. Maltz, and M. Zhang, "Towards highly reliable enterprise network services via inference of multi-level dependencies," ACM SIGCOMM Computer Communication Review, vol. 37, no. 4, pp. 13–24, 2007.

[31] B. Rosner, Fundamentals of Biostatistics, Cengage Learning, 2010.

[32] A. Zimmermann, "Colored Petri nets," in Stochastic Discrete Event Systems: Modeling, Evaluation, Applications, pp. 99–124, Springer, 2008.

[33] A. Adriansyah, B. F. van Dongen, and W. M. P. van der Aalst, "Towards robust conformance checking," in Business Process Management Workshops, vol. 66 of Lecture Notes in Business Information Processing, pp. 122–133, Springer, Berlin, Germany, 2011.

[34] M. Weidlich and M. Weske, Business Process Modeling Notation, Springer, Berlin, Germany, 2010.

[35] C. M. Bishop and J. Lasserre, "Generative or discriminative? Getting the best of both worlds," in Bayesian Statistics, J. M. Bernardo, M. J. Bayarri, J. O. Berger, et al., Eds., vol. 8, pp. 3–23, Oxford University, 2007.

[36] C. M. Bishop, Pattern Recognition and Machine Learning, Volume 1, Springer, New York, NY, USA, 2006.

[37] D. M. Blei and J. D. Lafferty, "Dynamic topic models," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 113–120, ACM, June 2006.

[38] J. Foulds, L. Boyles, C. DuBois, P. Smyth, and M. Welling, "Stochastic collapsed variational Bayesian inference for latent Dirichlet allocation," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 446–454, ACM, 2013.

[39] J. Pearl, Bayesian Networks, Department of Statistics, UCLA, 2011.

[40] I. Porteous, D. Newman, A. Ihler, A. Asuncion, P. Smyth, and M. Welling, "Fast collapsed Gibbs sampling for latent Dirichlet allocation," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '08), pp. 569–577, ACM, August 2008.


Figure 1: Actual example of telemetry data of a concrete pump truck at one runtime. Oil temperature is a typical continuous time series, which records the variation of oil temperature at one runtime of a concrete pump truck. Operation event sequence is a typical event sequence, which records the occurrences of six operation events (warm-up, landing leg unfolding, cantilever unfolding, concrete pumping, landing leg folding, and cantilever folding) on the concrete pump truck at one runtime.

condition diagnosis are mainly grouped into two categories: the correlation analysis between distinct event sequences [6–8] and event-based process mining algorithms [9, 10]. The correlation analysis between distinct event sequences usually provides useful hints for causality analysis. Although correlated metrics may not be the exact root causes of events, they can still provide useful intermediate information that helps pinpoint those root causes. The process mining algorithms focus on the occurrence order of distinct events with the help of process models [11, 12], which mainly indicate the working process of equipment.

However, the occurrence frequency of events also provides important information for working condition diagnosis, which previous studies have ignored. We take a concrete pump truck as an example. As illustrated in Figure 1, at a normal runtime a concrete pump truck needs five concrete pumping events to finish pumping a hopper of concrete. With wear and tear, the truck needs seven or more concrete pumping events to finish pumping a hopper of concrete. Although the occurrence order of the six events (warm-up → landing leg unfolding → cantilever unfolding → concrete pumping → landing leg folding → cantilever folding) is the same, the working condition of the concrete pump truck has changed considerably. Taking the occurrence frequencies of events into consideration therefore strengthens working condition diagnosis and helps engineers better understand the working condition of equipment.

In this paper, we propose a probabilistic generative model called the working condition model (WCM), which is capable of depicting the working condition of equipment at runtime. Based on event sequence data collected under different working conditions of equipment, we simulate the process by which event sequence data are generated in order to obtain the WCM of the equipment. Furthermore, with the help of the WCM, we extend the application of event sequence data to more domains, such as anomaly detection and the variation trend analysis of working condition. Motivated by the real requirements of working condition diagnosis, our working condition model tries to answer the following three questions:

(a) How many types of working modes (details in Section 3) does the equipment have at runtime?

(b) In each type of working mode, how does an event sequence behave?

(c) For a new event sequence at runtime, which type of working mode does it belong to?

Our evaluation consists of multiple phases. First, we build the WCM from real event sequence data sets collected from 279 concrete pump trucks over a period of 6 months. We analyze the performance of the WCM of the concrete pump truck. Then we apply the WCM of the concrete pump truck to further applications, including anomaly detection and the variation trend analysis of working condition.

Our work presents a probabilistic generative model named WCM to simulate the process by which event sequence data are generated and to depict the working condition of equipment at runtime. The contributions of this paper are as follows:

(i) Motivated by real applications, we propose the WCM to depict the working condition of equipment. To the best of our knowledge, this is the first attempt to simulate the process by which event sequence data are generated for working condition diagnosis.

(ii) We illustrate two useful applications based on the WCM: automated detection for a new work cycle and automated detection for anomalous work cycles.

(iii) The experiments on real data from a well-known Chinese construction machinery manufacturer show the effectiveness of our model.

The rest of the paper is organized as follows. In Section 2, we introduce related work. The problem statement and formulation are introduced in Section 3. We introduce our approach in Sections 4 and 5. The empirical evaluation is shown in Section 6. Finally, we conclude our work in Section 7.


2. Related Work

2.1. Analysis of Event Sequence Data. An event is a happening of interest [13, 14]. In the surveillance of equipment, the interest in events comes mostly from changes in the state of equipment that are produced by equipment manipulation operations [15]. Example events in the actual surveillance of the concrete pump truck include warm-up, landing leg unfolding, cantilever unfolding, concrete pumping, landing leg folding, and cantilever folding, as shown in Figure 1. When a sequence of events takes place, we record these occurrences to obtain the event sequence data. The main idea of event sequence data analysis is to process events to gather meaningful or valuable information and then to derive actions from them.

Events in an event sequence are often interrelated and form complex relationships. The correlation analysis of event sequence data [7, 8, 16, 17] focuses on detecting these relationships and has been extended to related applications such as anomaly detection [18–20]. A temporal, spatial, or causal relationship of events can determine the partial order between events [16]. Hence, process mining algorithms based on event sequence data focus on the causal relationship of events by analyzing the occurrence order of distinct events [21, 22]. Many process mining algorithms [23, 24] and tools [25, 26] exist to mine the causal relationship of events, which can help engineers better understand the operation procedure of equipment [27].

2.2. Working Condition Diagnosis. The working condition of equipment [28] is the condition in which the equipment works, including but not limited to such things as amenities, physical environment, stress and noise levels, degree of safety or danger, and the like. Working condition diagnosis usually uses specific models or variables for different applications. In the correlation analysis of event sequence data for working condition diagnosis, the correlation coefficients of event sequence data, for example, the Pearson correlation [6, 7, 29, 30] and the rank correlation [31], are used to depict the working conditions of equipment. In the process mining algorithms for event sequence data, a variety of process models, for example, Petri nets [32] and business process modeling notation [33, 34], are used to depict the working condition of equipment, each specific to a particular process mining algorithm. In this paper, we take the occurrence frequencies of events into consideration and simulate the process by which event sequences are generated in different working modes of equipment. Hence, we use the occurrence probability of events in the event sequence to depict the working condition of equipment at runtime.

2.3. Probabilistic Generative Model. In probability and statistics, a generative model [35] is a model for randomly generating observable data, typically given some hidden parameters. It specifies a joint probability distribution over observable data. Generative models are used in machine learning [36] either for modeling data directly (i.e., modeling observations drawn from a probability density function) or as an intermediate step toward forming a conditional probability density function. A conditional distribution can be formed from a generative model through Bayes' rule [36].

Figure 2: Graphic model of a typical generative model, LDA. The boxes are plates representing replicates. The outer plate represents documents, while the inner plate represents the repeated choice of topics and words within a document. $M$ denotes the number of documents and $N$ the number of words in a document. $\alpha$, $\beta$, $\theta$, $\varphi$, and $Z$ are hidden parameters, and $W$ denotes the observations. For details about LDA, refer to [5].

For example, latent Dirichlet allocation (LDA) [5, 37, 38] is a typical generative model that is widely used in many domains. In natural language processing, LDA simulates the process by which documents are generated, where observations are words collected into documents; it posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics. Figure 2 illustrates the graphic model [39] of LDA [5]. With plate notation, the dependencies among variables can be captured concisely.

3. Terminology and Notation

As the working condition of equipment always corresponds to a period, for example, one day or one week, we first determine the basic unit of observation for working condition diagnosis, which eases the diagnosis based on event sequence data.

Definition 1 (work cycle). A work cycle of a piece of equipment, denoted by $s$, is a complete work period, that is, a complete usage period of the equipment from the time the equipment starts working until it shuts down.

We index work cycles by integers $s \in \{1, \ldots, S\}$, where $S$ is the number of work cycles. Adopting the work cycle as the basic unit of observation has one important advantage for event sequence data analysis in working condition diagnosis. In our view, no matter under what circumstances the equipment works (e.g., different places and different climates), the working condition of equipment within one work cycle behaves similarly. For example, in a work cycle of the concrete pump truck illustrated in Figure 1, different concrete pump trucks usually follow a similar working process (warm-up → landing leg unfolding → cantilever unfolding → concrete pumping → landing leg folding → cantilever folding), even though the trucks work under different working circumstances.


Definition 2 (event). An event of the equipment, denoted by $e$, records an occurrence of a specific message indicating that something, such as an operation, has happened in the equipment.

For example, in Figure 1 there are six events (warm-up, landing leg unfolding, cantilever unfolding, concrete pumping, landing leg folding, and cantilever folding), each of which reflects the working condition of some component in the concrete pump truck. We use integers to denote the entries in the event set, with each event $e$ taking a value from $\{1, \ldots, E\}$, where $E$ is the number of unique events in the event set, denoted by $\mathbf{E}$.

Definition 3 (event sequence). An event sequence of the equipment, denoted by $\mathbf{e}_s$, consists of the sequence of events that occur in work cycle $s$.

An event sequence is represented as a vector of events $\mathbf{e}_s$ with $N_s$ entries. For example, in Figure 1, the event set of the concrete pump truck contains six ($E = 6$) events, denoted by $\mathbf{E}_{\text{pump}} = (1, \ldots, 6)$, where the integers represent the entries of the events warm-up, landing leg unfolding, cantilever unfolding, concrete pumping, landing leg folding, and cantilever folding, respectively. Hence, the event sequence is a vector of length $N_s = 10$, denoted by $\mathbf{e}_s = (1, 2, 3, 4, 4, 4, 4, 4, 5, 6)$.

Suppose that the data set has $S$ work cycles of the equipment, corresponding to $S$ event sequences. The data set with $S$ event sequences is represented as a concatenation of the event sequence vectors, which we denote by $\mathbf{e}$, having $N = \sum_{s=1}^{S} N_s$ entries.
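The integer encoding described above can be sketched in a few lines of Python. This is only an illustration: the names `EVENT_SET`, `EVENT_ID`, `encode`, and `concatenate` are hypothetical and do not come from the paper, and the example work cycle is the one shown in Figure 1.

```python
# Hypothetical event vocabulary, following the six events of Figure 1.
EVENT_SET = [
    "warm-up",
    "landing leg unfolding",
    "cantilever unfolding",
    "concrete pumping",
    "landing leg folding",
    "cantilever folding",
]
# Events are numbered 1..E, as in the text.
EVENT_ID = {name: index + 1 for index, name in enumerate(EVENT_SET)}


def encode(event_names):
    """Map a list of event names to the integer vector e_s."""
    return [EVENT_ID[name] for name in event_names]


def concatenate(event_sequences):
    """Concatenate S event sequences into the data set vector e."""
    return [event for sequence in event_sequences for event in sequence]


cycle = encode(
    ["warm-up", "landing leg unfolding", "cantilever unfolding"]
    + ["concrete pumping"] * 5
    + ["landing leg folding", "cantilever folding"]
)
data_set = concatenate([cycle, cycle])  # two identical work cycles, S = 2
```

Here `len(cycle)` plays the role of $N_s$ and `len(data_set)` the role of $N$.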

In a work cycle, an event sequence describes the main working process of the equipment. However, the occurrence of an event is also related to the working place and working date of the equipment. For example, the concrete pump truck adds an operation event, concrete mixing, to prevent the concrete from setting if the working temperature is low. The working temperature is directly related to the working place (e.g., north or south of China) and the working date (e.g., winter or summer).

In addition to these events, we have information about the characteristics of each event sequence (work cycle): the working place, the working date, and the equipment number of the work cycle. We define $\mathbf{p}_s$ to be the set of working places of work cycle $s$; $\mathbf{p}_s$ consists of elements that are integers from $\{1, \ldots, P\}$, where $P$ is the number of working places that generated the event sequences in the data set. $P_s$ denotes the number of working places of work cycle $s$. We define $\boldsymbol{\tau}_s$ to be the set of working dates of work cycle $s$; $\boldsymbol{\tau}_s$ consists of elements that are integers from $\{1, \ldots, T\}$, where $T$ is the number of working dates. (To ease the notation, we record only the working month of the work cycle as its working date, which means $T = 12$.) $T_s$ denotes the number of working dates of work cycle $s$. We define $\boldsymbol{\omega}_s$ to be the set of equipment numbers of work cycle $s$; $\boldsymbol{\omega}_s$ consists of elements that are integers from $\{1, \ldots, \Omega\}$, where $\Omega$ is the number of equipment pieces.

Definition 4 (work cycle characteristic). A work cycle characteristic (WCC) is a five-tuple, denoted by $W_s = \{\mathbf{E}, \mathbf{e}_s, \mathbf{p}_s, \boldsymbol{\tau}_s, \boldsymbol{\omega}_s\}$, which records all the information about work cycle $s$.

A WCC corresponds to a work cycle, so the original data set is redefined as a group of WCCs, denoted by $D = \{W_1, \ldots, W_S\}$. The WCCs of two work cycles are likely to differ even when they share the same working place and working date. The main differences between work cycles center on the occurrence of events. However, the occurrence patterns of events are similar for work cycles in the same working mode. For example, the concrete pump truck has two main working modes: pumping mode and traveling mode. For a work cycle in the pumping mode, the event concrete pumping occurs frequently, as shown in Figure 1. However, for a work cycle in the traveling mode, the event concrete pumping does not occur, since the truck cannot pump concrete in the traveling mode.

Definition 5 (working mode). A working mode, denoted by $\pi$, represents a kind of work cycle that is about a specific subject, has an identifiable purpose, and can stand alone.

For event set $\mathbf{E}$, we define the working mode vector (WMV) $G(\pi) = \{(1, c_1), \ldots, (E, c_E)\}$ to be the set of events $e$, each associated with its occurrence frequency $c_e$, where $\sum_{e=1}^{E} c_e = 1$. The WMV $G(\pi)$ depicts the occurrence patterns of events according to their occurrence frequencies. Therefore, if we can obtain a group of WMVs for a group of work cycles, it will help us better understand the occurrence patterns of events.
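To make the shape of a WMV concrete, the following sketch computes the normalized occurrence frequencies of a single event sequence. It is only an illustration under a simplifying assumption: a real WMV belongs to a working mode and is inferred by the model, whereas the hypothetical helper `frequency_vector` below simply counts events in one observed work cycle.

```python
from collections import Counter


def frequency_vector(event_sequence, num_events):
    """Return [(e, c_e)] for e = 1..E, where c_e is the share of event e."""
    counts = Counter(event_sequence)
    total = len(event_sequence)
    return [(e, counts[e] / total) for e in range(1, num_events + 1)]


# Work cycle from Figure 1: five concrete pumping events (event 4).
normal = [1, 2, 3, 4, 4, 4, 4, 4, 5, 6]
# Same occurrence order, but seven pumping events after wear and tear.
worn = [1, 2, 3] + [4] * 7 + [5, 6]

normal_share = dict(frequency_vector(normal, 6))[4]  # 5 of 10 events
worn_share = dict(frequency_vector(worn, 6))[4]      # 7 of 12 events
```

The two sequences have the same event order, yet the share of concrete pumping rises from 5/10 to 7/12, which is the kind of difference an order-based process model would not register.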

Definition 6 (working mode space (WMS)). A working mode space (WMS), denoted by $\mathbf{G} = \{G(1), \ldots, G(\Pi)\}$, is a set of WMVs for a group of given work cycles of equipment.

In effect, the WMS is akin to a group of cluster centers, each of which depicts the working condition of equipment in a different working mode.

4. The Inference of WMS

In this section, we develop effective algorithms for inferring the WMS for a group of given work cycles of equipment. Before proceeding, we formulate our problem as follows.

WMS Inference Problem. Given a group of work cycles associated with the corresponding WCCs, $D = \{W_1, \ldots, W_S\}$, the inference problem is to infer the WMS model $\mathbf{G} = \{G(1), \ldots, G(\Pi)\}$, where $\Pi$ represents the number of working modes.

With the help of the WMS, we can find that work cycles of equipment have different working modes in different working places and on different working dates. Meanwhile, several working modes can occur in the same working place and on the same working date. The WMV of a working mode reflects the working condition of its corresponding work cycles, especially the occurrence patterns of events.

In the remainder of this section, we first introduce the WCM for learning the WMS for a group of given work cycles and then introduce the inference framework of the WCM.

4.1. The WCM. The WCM is a hierarchical generative model in which each event $e$ in a work cycle is associated with three latent variables: a working place $x$, a working date $y$, and a working mode $z$. These latent variables augment the $N$-dimensional vector $\mathbf{e}$ (indicating the values of all events in the data set) with three additional $N$-dimensional vectors $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$, indicating the working place, working date, and working mode assignments for the $N$ events.

We assume that the set of working places and the set of working dates for each work cycle are observed. This leaves unresolved the case of unobserved working places and working dates, and it avoids the need to define a prior on working places and working dates, which is outside the scope of our model. Each working place is associated with a multinomial distribution over working modes, and each working date is also associated with a multinomial distribution over working modes. Conditioned on the set of working places and the set of working dates, together with their distributions over working modes, the process by which the event sequence of a work cycle is simulated can be summarized as follows: first, a working place and a working date are each chosen uniformly at random for each event that will appear in the work cycle; next, a working mode is sampled for each event from both the distribution over working modes associated with the working place of that event and the distribution over working modes associated with the working date of that event; finally, the events themselves are sampled from the distribution over events associated with each working mode.

This simulating process can be expressed more formally by defining some of the other variables in the WCM. Assume we have $\Pi$ working modes. We can parameterize the multinomial distribution over working modes for each working place using a matrix $\Theta$ of size $\Pi \times P$, with elements $\theta_{\pi p}$ that stand for the probability of assigning working mode $\pi$ to an event occurring in working place $p$. Thus $\sum_{\pi=1}^{\Pi} \theta_{\pi p} = 1$, and for simplicity of notation we will drop the index $\pi$ when convenient and use $\theta_p$ to stand for the $p$th column of the matrix $\Theta$. Similarly, we use a matrix $\Delta$ of size $\Pi \times T$ to parameterize the multinomial distribution over working modes for each working date, where elements $\delta_{\pi\tau}$ stand for the probability of assigning working mode $\pi$ to an event occurring on working date $\tau$. Thus $\sum_{\pi=1}^{\Pi} \delta_{\pi\tau} = 1$, and we will also drop the index $\pi$ when convenient and use $\delta_\tau$ to stand for the $\tau$th column of the matrix $\Delta$. The multinomial distributions over events associated with each working mode are parameterized by a matrix $\Phi$ of size $E \times \Pi$, with elements $\phi_{e\pi}$ that stand for the probability of generating event $e$ in working mode $\pi$. Again, $\sum_{e=1}^{E} \phi_{e\pi} = 1$, and $\phi_\pi$ stands for the $\pi$th column of the matrix $\Phi$. These three multinomial distributions are assumed to be generated from symmetric Dirichlet priors with hyperparameters $\alpha$, $\gamma$, and $\beta$, respectively. In the results of this paper, we assume that these hyperparameters are fixed. This notation is summarized in Notations.

Figure 3: The graphic representation of WCM.

The sequential simulating procedure of first picking a working place and a working date, then picking a working mode, and then generating an event in this working mode according to the probability distributions leads to the following generative process:

(1) For each working place $p = 1, \ldots, P$, choose $\theta_p \sim \mathrm{Dirichlet}(\alpha)$;
    for each working date $\tau = 1, \ldots, T$, choose $\delta_\tau \sim \mathrm{Dirichlet}(\gamma)$;
    for each working mode $\pi = 1, \ldots, \Pi$, choose $\phi_\pi \sim \mathrm{Dirichlet}(\beta)$.

(2) For each work cycle $s = 1, \ldots, S$:
    given the vector of working places $\mathbf{p}_s$;
    given the vector of working dates $\boldsymbol{\tau}_s$;
    for each event $i = 1, \ldots, N_s$:
        conditioned on $\mathbf{p}_s$, choose working place $x_{si} \sim \mathrm{Uniform}(\mathbf{p}_s)$;
        conditioned on $\boldsymbol{\tau}_s$, choose working date $y_{si} \sim \mathrm{Uniform}(\boldsymbol{\tau}_s)$;
        conditioned on $x_{si}$ and $y_{si}$, choose working mode $z_{si} \sim \mathrm{Discrete}(\theta_{x_{si}} \delta_{y_{si}})$;
        conditioned on $z_{si}$, choose event $e_{si} \sim \mathrm{Discrete}(\phi_{z_{si}})$.
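The generative process above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: all function names are hypothetical, indices for places, dates, and modes are 0-based while events are returned 1-based, the matrices are stored row-wise per place, date, and mode (the transpose of the layout in the text), and $\mathrm{Discrete}(\theta_{x} \delta_{y})$ is interpreted as sampling in proportion to the elementwise product of the two vectors, which is an assumption about a normalization the text leaves implicit.

```python
import random


def dirichlet(rng, dim, concentration):
    """Draw from a symmetric Dirichlet via normalized gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_parameters(rng, n_places, n_dates, n_modes, n_events,
                      alpha, gamma, beta):
    """Step (1): theta[p], delta[t] are over modes; phi[mode] is over events."""
    theta = [dirichlet(rng, n_modes, alpha) for _ in range(n_places)]
    delta = [dirichlet(rng, n_modes, gamma) for _ in range(n_dates)]
    phi = [dirichlet(rng, n_events, beta) for _ in range(n_modes)]
    return theta, delta, phi


def generate_work_cycle(rng, places, dates, n_events_in_cycle,
                        theta, delta, phi):
    """Step (2) for one work cycle with observed place and date sets."""
    sequence = []
    for _ in range(n_events_in_cycle):
        x = rng.choice(places)  # working place, uniform over p_s
        y = rng.choice(dates)   # working date, uniform over tau_s
        # Assumption: mode weights are the elementwise product, renormalized
        # implicitly by rng.choices.
        weights = [t * d for t, d in zip(theta[x], delta[y])]
        z = rng.choices(range(len(weights)), weights=weights)[0]
        e = rng.choices(range(len(phi[z])), weights=phi[z])[0]
        sequence.append(e + 1)  # events are numbered 1..E
    return sequence


rng = random.Random(0)
theta, delta, phi = sample_parameters(rng, 3, 12, 2, 6, 1.0, 1.0, 1.0)
cycle = generate_work_cycle(rng, [0, 2], [5], 10, theta, delta, phi)
```

Fixing the seed makes the sketch reproducible; the sizes (3 places, 12 months, 2 modes, 6 events) are illustrative values only.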

The graphical model corresponding to this process is shown in Figure 3. Under this simulating process, each event is drawn independently when conditioned on $\Phi$, and each working mode is drawn independently when conditioned on $\Theta$, $\Delta$, and $\Pi$. The probability of the event sequence $\mathbf{e}$ conditioned on $\Theta$, $\Delta$, and $\Phi$ (and implicitly on a fixed number of working modes $\Pi$) is

$$P(\mathbf{e} \mid \Phi, \Delta, \Theta, \mathbf{P}, \mathbf{T}) = \prod_{s=1}^{S} P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s). \tag{1}$$


With the help of (1), we can first obtain the probability of the event sequence in each work cycle, $\mathbf{e}_s$, by summing over the latent variables $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ to get what is shown in (3). Consider

$$\begin{aligned}
P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{P}, \mathbf{T}) &= \prod_{i=1}^{N_s} P(e_{si} \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s) \\
&= \prod_{i=1}^{N_s} \sum_{\tau=1}^{T} \sum_{p=1}^{P} \sum_{\pi=1}^{\Pi} P(e_{si}, z_{si} = \pi, x_{si} = p, y_{si} = \tau \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s) \\
&= \prod_{i=1}^{N_s} \sum_{\tau=1}^{T} \sum_{p=1}^{P} \sum_{\pi=1}^{\Pi} P(e_{si} \mid z_{si} = \pi, \Phi)\, P(z_{si} = \pi \mid x_{si} = p, \Theta)\, P(z_{si} = \pi \mid y_{si} = \tau, \Delta)\, P(x_{si} = p \mid \mathbf{p}_s)\, P(y_{si} = \tau \mid \boldsymbol{\tau}_s),
\end{aligned} \tag{2}$$

$$P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{P}, \mathbf{T}) = \prod_{i=1}^{N_s} \frac{1}{P_s} \frac{1}{T_s} \sum_{p \in \mathbf{p}_s} \sum_{\tau \in \boldsymbol{\tau}_s} \sum_{\pi=1}^{\Pi} \phi_{e_{si}\pi}\, \theta_{\pi p}\, \delta_{\pi\tau}, \tag{3}$$

$$P(\mathbf{e} \mid \alpha, \beta, \gamma, \mathbf{P}, \mathbf{T}) = \int_{\Theta} \int_{\Delta} \int_{\Phi} P(\mathbf{e} \mid \Theta, \Delta, \Phi, \mathbf{P}, \mathbf{T})\, P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)\, d\Theta\, d\Delta\, d\Phi \tag{4}$$

$$= \int_{\Theta} \int_{\Delta} \int_{\Phi} \left[ \prod_{s=1}^{S} \prod_{i=1}^{N_s} \frac{1}{P_s} \frac{1}{T_s} \sum_{p \in \mathbf{p}_s} \sum_{\tau \in \boldsymbol{\tau}_s} \sum_{\pi=1}^{\Pi} \phi_{e_{si}\pi}\, \theta_{\pi p}\, \delta_{\pi\tau} \right] P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)\, d\Theta\, d\Delta\, d\Phi. \tag{5}$$

In (3), the factorization makes use of the conditional independence assumptions of the model. Meanwhile, the variables $\mathbf{x}$ and $\mathbf{y}$ are mutually stochastically independent. Equation (3) represents the probability of the events $\mathbf{e}_s$ in terms of the entries of the parameter matrices $\Theta$, $\Phi$, and $\Delta$ introduced above. The probability distribution over working place assignments, $P(x_{si} = p \mid \mathbf{p}_s)$, is assumed to be uniform over the elements of $\mathbf{p}_s$ and deterministic if $P_s = 1$. Similarly, the probability distribution over working date assignments, $P(y_{si} = \tau \mid \boldsymbol{\tau}_s)$, is assumed to be uniform over the elements of $\boldsymbol{\tau}_s$ and deterministic if $T_s = 1$. The probability distributions over working mode assignments, $P(z_{si} = \pi \mid x_{si} = p, \Theta)$ and $P(z_{si} = \pi \mid y_{si} = \tau, \Delta)$, are the multinomial distributions $\theta_p$ in $\Theta$ and $\delta_\tau$ in $\Delta$ that correspond to working place $p$ and working date $\tau$, respectively. The probability of an event given a working mode assignment, $P(e_{si} \mid z_{si} = \pi, \Phi)$, is the multinomial distribution $\phi_\pi$ in $\Phi$ that corresponds to working mode $\pi$.

In (4) and (5), we treat $\Theta$, $\Phi$, and $\Delta$ as random variables and compute the marginal probability of the data set by integrating them out. $P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta) = P(\Theta \mid \alpha) P(\Delta \mid \gamma) P(\Phi \mid \beta)$ is the product of the Dirichlet priors on $\Theta$, $\Delta$, and $\Phi$, respectively, as defined before.
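As a worked illustration of (3), the sketch below evaluates the log-probability of one work cycle for fixed parameter matrices. It is an assumption-laden sketch, not the authors' code: the function name is hypothetical, places, dates, and modes are 0-based indices, events are 1-based as in the text, and the matrices are stored as `theta[p][mode]`, `delta[t][mode]`, and `phi[mode][event - 1]`, that is, transposed relative to the $\Pi \times P$, $\Pi \times T$, and $E \times \Pi$ layout used above.

```python
import math


def work_cycle_log_likelihood(sequence, places, dates, theta, delta, phi):
    """Log of equation (3) for one work cycle."""
    uniform_weight = 1.0 / (len(places) * len(dates))  # (1/P_s)(1/T_s)
    log_likelihood = 0.0
    for event in sequence:
        total = 0.0
        for p in places:
            for t in dates:
                for mode in range(len(phi)):
                    total += phi[mode][event - 1] * theta[p][mode] * delta[t][mode]
        log_likelihood += math.log(uniform_weight * total)
    return log_likelihood


# Tiny hand-checkable case: one place and one date that both put all
# mass on mode 0, and mode 0 emits events 1 and 2 with probability 0.5.
theta = [[1.0, 0.0]]
delta = [[1.0, 0.0]]
phi = [[0.5, 0.5], [1.0, 0.0]]
value = work_cycle_log_likelihood([1, 2], [0], [0], theta, delta, phi)
```

In this example each event contributes a factor of 0.5, so `value` equals `2 * math.log(0.5)`. Working in log space avoids underflow when $N_s$ is large.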

5. Inference of WCM from Data

The WCM contains three continuous random variables, $\Theta$, $\Delta$, and $\Phi$. Various approximate inference approaches have recently been proposed for estimating the posterior distribution of continuous random variables in hierarchical Bayesian models. In this paper, our inference method is Gibbs sampling [40], which is a special form of Markov chain Monte Carlo.

Our target of estimation is the posterior distribution $P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)$. In order to sample from this distribution, we use the latent variables $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ to estimate the posterior distribution:

$$P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta) = \sum_{\mathbf{x}, \mathbf{y}, \mathbf{z}} P(\Theta, \Delta, \Phi \mid \mathbf{x}, \mathbf{y}, \mathbf{z}, \alpha, \gamma, \beta)\, P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid \alpha, \gamma, \beta). \tag{6}$$

The estimation process mainly involves two steps: first, we use Gibbs sampling to obtain an approximate posterior $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid \alpha, \gamma, \beta)$; second, $P(\Theta, \Delta, \Phi \mid \mathbf{x}, \mathbf{y}, \mathbf{z}, \alpha, \gamma, \beta)$ can be computed directly for each sample by exploiting the fact that the Dirichlet distribution is conjugate to the multinomial.
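The conjugacy used in the second step can be illustrated for a single multinomial. Under a symmetric Dirichlet prior with concentration $c$ over $K$ outcomes, observing counts $n_1, \ldots, n_K$ gives a Dirichlet posterior with parameters $n_k + c$, whose mean is $(n_k + c) / (\sum_{k'} n_{k'} + Kc)$. The helper below is a hypothetical sketch of that standard result; it is not claimed to be the exact estimator the paper applies to $\Theta$, $\Delta$, and $\Phi$.

```python
def posterior_mean(counts, concentration):
    """Posterior mean of a multinomial under a symmetric Dirichlet prior."""
    total = sum(counts) + len(counts) * concentration
    return [(count + concentration) / total for count in counts]


# Example: one working mode whose sampled assignments cover two event
# types, seen 3 times and 1 time, with concentration 1.0.
estimate = posterior_mean([3, 1], 1.0)
```

With these counts the estimate is $(4/6, 2/6)$: the smoothing pulls the raw frequencies $(3/4, 1/4)$ toward the uniform distribution.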

5.1. Gibbs Sampling. Using Gibbs sampling, we can generate a sample from the joint distribution $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid D_{\text{train}}, \alpha, \beta, \gamma)$ in two steps: first, sampling the working place assignment $x_{si}$, working date assignment $y_{si}$, and working mode assignment $z_{si}$ for an individual event $e_{si}$, conditioned on fixed assignments of working places, working dates, and working modes for all other events in the data set; second, repeating this process for each event. A single Gibbs sampling iteration consists of sequentially performing this sampling of working place, working date, and working mode assignments for each individual event in the data set:

$$
P(x_{si} = p, y_{si} = \tau, z_{si} = \pi \mid e_{si} = e, \mathbf{x}_{-si}, \mathbf{y}_{-si}, \mathbf{z}_{-si}, \mathbf{e}_{-si}, \mathcal{P}, \mathcal{T}, \alpha, \beta, \gamma)
\propto \frac{C^{E\Pi}_{e\pi,-si} + \beta}{\sum_{e'} C^{E\Pi}_{e'\pi,-si} + E\beta} \cdot \frac{C^{\Pi P}_{\pi p,-si} + \alpha}{\sum_{\pi'} C^{\Pi P}_{\pi' p,-si} + \Pi\alpha} \cdot \frac{C^{\Pi T}_{\pi\tau,-si} + \gamma}{\sum_{\pi'} C^{\Pi T}_{\pi'\tau,-si} + \Pi\gamma}. \tag{7}
$$

According to (1) to (5), we can derive the basic equation needed for the Gibbs sampler, as shown in (7). In (7), $C^{\Pi P}$ is the count matrix of working modes assigned to working places, where $C^{\Pi P}_{\pi p,-si}$ is the number of events assigned to working mode $\pi$ in working place $p$, excluding the working mode assignment of event $e_{si}$. Similarly, $C^{\Pi T}$ is the count matrix of working modes assigned to working dates, where $C^{\Pi T}_{\pi\tau,-si}$ is the number of events assigned to working mode $\pi$ on working date $\tau$, excluding the working mode assignment of event $e_{si}$. Likewise, $C^{E\Pi}$ is the count matrix of events assigned to working modes, where $C^{E\Pi}_{e\pi,-si}$ is the number of times the $e$th entry in the event set is assigned to working mode $\pi$, excluding the working mode assignment of event $e_{si}$. Meanwhile, $\mathbf{x}_{-si}$, $\mathbf{y}_{-si}$, $\mathbf{z}_{-si}$, and $\mathbf{e}_{-si}$ represent the vector of working place assignments, the vector of working date assignments, the vector of working mode assignments, and the vector of event observations in the data set, respectively, each excluding the $i$th event in the $s$th work cycle.

The main sampling steps are as follows. We first initialize the working place, working date, and working mode assignments $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ randomly. In each Gibbs sampling iteration, we sequentially draw the working mode, working place, and working date assignment of each event from the joint conditional distribution in (7). As the number of iterations increases, the Gibbs sampler approaches its stationary distribution, the posterior distribution $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid D_{\mathrm{train}}, \alpha, \beta, \gamma)$.
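The sampling steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `init_state` and `gibbs_sweep`, the toy work cycles, and the choice of 3 working modes are all hypothetical, and the place and date factors are normalised over working modes so that they match the Dirichlet posteriors of Section 5.2.

```python
import random

def init_state(cycles, n_event, n_place, n_date, n_mode, rng):
    """Randomly initialise x, y, z and build the three count matrices."""
    c_em = [[0] * n_mode for _ in range(n_event)]  # C^{E Pi}: event x mode
    c_mp = [[0] * n_place for _ in range(n_mode)]  # C^{Pi P}: mode x place
    c_mt = [[0] * n_date for _ in range(n_mode)]   # C^{Pi T}: mode x date
    x, y, z = [], [], []
    for events, places, dates in cycles:
        xs = [rng.choice(places) for _ in events]
        ys = [rng.choice(dates) for _ in events]
        zs = [rng.randrange(n_mode) for _ in events]
        for e, p, t, m in zip(events, xs, ys, zs):
            c_em[e][m] += 1
            c_mp[m][p] += 1
            c_mt[m][t] += 1
        x.append(xs)
        y.append(ys)
        z.append(zs)
    return x, y, z, c_em, c_mp, c_mt

def gibbs_sweep(cycles, x, y, z, c_em, c_mp, c_mt, alpha, beta, gamma, rng):
    """One iteration: resample (place, date, mode) for every event token."""
    n_event, n_mode = len(c_em), len(c_mp)
    for s, (events, places, dates) in enumerate(cycles):
        for i, e in enumerate(events):
            p, t, m = x[s][i], y[s][i], z[s][i]
            # Exclude the current token from the counts (the "-si" subscript).
            c_em[e][m] -= 1
            c_mp[m][p] -= 1
            c_mt[m][t] -= 1
            triples, weights = [], []
            for m2 in range(n_mode):
                mode_total = sum(row[m2] for row in c_em)
                w_event = (c_em[e][m2] + beta) / (mode_total + n_event * beta)
                for p2 in places:
                    place_total = sum(c_mp[k][p2] for k in range(n_mode))
                    w_place = (c_mp[m2][p2] + alpha) / (place_total + n_mode * alpha)
                    for t2 in dates:
                        date_total = sum(c_mt[k][t2] for k in range(n_mode))
                        w_date = (c_mt[m2][t2] + gamma) / (date_total + n_mode * gamma)
                        triples.append((p2, t2, m2))
                        weights.append(w_event * w_place * w_date)
            # Draw the new triple in proportion to the unnormalised weights.
            p, t, m = rng.choices(triples, weights=weights)[0]
            x[s][i], y[s][i], z[s][i] = p, t, m
            c_em[e][m] += 1
            c_mp[m][p] += 1
            c_mt[m][t] += 1

# Toy data (hypothetical): (event ids, candidate places, candidate dates) per work cycle.
cycles = [([0, 1, 1, 2], [0], [0]), ([2, 3, 3, 0], [1], [1]), ([1, 2, 0], [0, 1], [0])]
rng = random.Random(0)
state = init_state(cycles, n_event=4, n_place=2, n_date=2, n_mode=3, rng=rng)
for _ in range(20):
    gibbs_sweep(cycles, *state, alpha=50 / 3, beta=0.01, gamma=50 / 3, rng=rng)
```

The column totals are recomputed on every draw to keep the sketch short; a practical sampler would cache them and update them together with the counts.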

5.2. The Posterior Probability. Given $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$, $D_{\mathrm{train}}$, $\alpha$, $\beta$, and $\gamma$, computing the posterior distributions on $\Theta$, $\Delta$, and $\Phi$ is straightforward. Because the Dirichlet distribution is conjugate to the multinomial distribution, we get

$$
\begin{aligned}
\phi_\pi \mid \mathbf{z}, \beta, D_{\mathrm{train}} &\sim \mathrm{Dirichlet}\left(C^{E\Pi}_{\pi} + \beta\right), \\
\theta_p \mid \mathbf{x}, \mathbf{z}, \alpha, D_{\mathrm{train}} &\sim \mathrm{Dirichlet}\left(C^{\Pi P}_{p} + \alpha\right), \\
\delta_\tau \mid \mathbf{y}, \mathbf{z}, \gamma, D_{\mathrm{train}} &\sim \mathrm{Dirichlet}\left(C^{\Pi T}_{\tau} + \gamma\right),
\end{aligned} \tag{8}
$$

where $C^{E\Pi}_{\pi}$ represents the vector of counts of the number of times each event has been assigned to working mode $\pi$; $C^{\Pi P}_{p}$ and $C^{\Pi T}_{\tau}$ are defined similarly. Then we can evaluate the posterior expectation of each element of $\Theta$, $\Delta$, and $\Phi$ as follows:

$$
\begin{aligned}
E\left[\phi_{e\pi} \mid \mathbf{z}^k, \beta, D_{\mathrm{train}}\right] &= \frac{\left(C^{E\Pi}_{e\pi}\right)^k + \beta}{\sum_{e'} \left(C^{E\Pi}_{e'\pi}\right)^k + E\beta}, \\
E\left[\theta_{\pi p} \mid \mathbf{x}^k, \mathbf{z}^k, \alpha, D_{\mathrm{train}}\right] &= \frac{\left(C^{\Pi P}_{\pi p}\right)^k + \alpha}{\sum_{\pi'} \left(C^{\Pi P}_{\pi' p}\right)^k + \Pi\alpha}, \\
E\left[\delta_{\pi\tau} \mid \mathbf{y}^k, \mathbf{z}^k, \gamma, D_{\mathrm{train}}\right] &= \frac{\left(C^{\Pi T}_{\pi\tau}\right)^k + \gamma}{\sum_{\pi'} \left(C^{\Pi T}_{\pi'\tau}\right)^k + \Pi\gamma},
\end{aligned} \tag{9}
$$

where $\left(C^{E\Pi}\right)^k$ is the matrix of event-to-working-mode counts exhibited in $\mathbf{z}^k$ and $k$ refers to sample $k$ from the Gibbs sampler. These posterior probabilities also provide point estimates for $\Phi$, $\Theta$, and $\Delta$ and correspond to the posterior predictive distributions for the next event from a working mode, the next working mode in a working place, and the next working mode on a working date, respectively.

Figure 4: The stream of the concrete in the concrete pump truck and the operation sequence of the concrete pump truck at runtime. [Diagram labels: hopper, transportation cylinder, stirring system, pumping system, landing leg system, cantilever system, concrete, specified location. Legend: concrete stream, operation sequence, related system.]
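As a small worked illustration of the point estimates in Section 5.2, the following sketch turns the three count matrices of one Gibbs sample into estimates of $\Phi$, $\Theta$, and $\Delta$. The function name `posterior_means` and the toy counts are hypothetical; each column is treated as one symmetric Dirichlet posterior, so the smoothing constant in the denominator is the prior times the number of rows.

```python
def posterior_means(c_em, c_mp, c_mt, alpha, beta, gamma):
    """Point estimates of Phi, Theta and Delta from one sample's count matrices."""

    def normalise_columns(counts, prior):
        # Each column is one Dirichlet posterior; return its mean.
        n_rows, n_cols = len(counts), len(counts[0])
        totals = [sum(counts[r][c] for r in range(n_rows)) for c in range(n_cols)]
        return [[(counts[r][c] + prior) / (totals[c] + n_rows * prior)
                 for c in range(n_cols)] for r in range(n_rows)]

    phi = normalise_columns(c_em, beta)     # E x Pi, each column sums to 1
    theta = normalise_columns(c_mp, alpha)  # Pi x P, each column sums to 1
    delta = normalise_columns(c_mt, gamma)  # Pi x T, each column sums to 1
    return phi, theta, delta

# Hypothetical counts: 2 events, 2 working modes, 1 working place, 1 working date.
phi, theta, delta = posterior_means(
    c_em=[[3, 0], [1, 2]], c_mp=[[4], [2]], c_mt=[[4], [2]],
    alpha=25.0, beta=0.01, gamma=25.0)
print(phi[0][0])    # equals (3 + 0.01) / (4 + 2 * 0.01)
print(theta[0][0])  # equals (4 + 25) / (6 + 2 * 25)
```

With a large prior such as $\alpha = 25$, the estimate of $\theta$ is pulled strongly towards the uniform distribution, whereas the small $\beta$ leaves $\phi$ close to the empirical frequencies.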

6. Experimental Evaluation

6.1. Data Preparation. We trained the WCM on a real-world data set collected from a well-known Chinese construction machinery manufacturer. The data set consists of event sequence data from concrete pump trucks over 6 months (from June 2012 to November 2012). It contains $S = 32{,}632$ work cycles, $P = 5$ different working places, $T = 6$ different working dates, a total of $N = 22{,}418{,}756$ event tokens, and an event set of $E = 33$ unique events. The working date of each work cycle is its actual working month, so the working date set is $\mathcal{T} = \{\text{Jun}, \text{Jul}, \text{Aug}, \text{Sep}, \text{Oct}, \text{Nov}\}$. Because the event sequence data were all collected in the Chinese Mainland, we divide the working places into 5 regions according to the administrative regions of China: Northern China, Northeastern China, Eastern China, Mid-Southern China, and Western China.

The concrete pump truck is a type of construction machinery: a truck fitted with a concrete pump. It alternates between two working statuses: traveling and pumping. In the pumping status, it pushes the concrete to the specified location. In the traveling status, it is just a truck. In the experiment, we mainly focus on events in the pumping status. Figure 4 shows the stream of the concrete in the concrete pump truck at runtime and the operation sequence of the different systems in the concrete pump truck. The concrete pump truck first switches to the pumping status and then unfolds and fixes the landing leg. Next, it unfolds the cantilever to the specified location. Afterwards, the concrete is poured into the hopper, and meanwhile the stirring system starts stirring the concrete. Finally, the pumping system starts pumping the concrete in the hopper to the specified location. When the pumping ends, the concrete pump truck stops the pumping system and then folds the cantilever and landing leg successively. Table 1 shows the relations between systems and events in the concrete pump truck.

Table 1: Event set.

| Event | Abbr. | Type | Related system |
|---|---|---|---|
| Stop pumping mandatorily | SPM | Alarm event | All |
| Reminder of concrete import | RCI | Alarm event | Hopper |
| Concrete piston withdrawing | CPW | Alarm event | Pumping system |
| Reminder of concrete cylinder water | RCSW | Alarm event | Hopper |
| Swing cylinder initiate | SCI | Operation event | Pumping system |
| Stalling of engine | SoE | Alarm event | All |
| Alteration of operation mode (remote or close) | AOM | Operation event | Pumping system |
| Alteration of pump truck status (pumping or travelling) | APTS | Operation event | All |
| Control of pumping displacement | CPD | Operation event | Pumping system |
| Transportation cylinder initiate | TCI | Operation event | Pumping system |
| Manual control of master cylinder | MCMC | Operation event | Pumping system |
| Manual control of swing cylinder | MCSC | Operation event | Pumping system |
| Detection of system pressure | DSP | Alarm event | Pumping system |
| Manual control of engine speed | MCES | Operation event | Pumping system |
| High pressure mode initiate | HPMI | Operation event | Pumping system |
| Warm-up initiate | WUI | Operation event | Pumping system |
| Water pump initiate | WPI | Operation event | Hopper |
| Concrete stirring initiate | CSI | Operation event | Stirring system |
| Cantilever folding initiate | CFI | Operation event | Cantilever system |
| Temperature control initiate | TCI | Operation event | Pumping system |
| Cantilever movement | CM | Operation event | Cantilever system |
| Landing leg movement | LLM | Operation event | Landing leg system |
| Detection of oil pressure | DOP | Alarm event | Pumping system |
| Landing leg folding | LLF | Operation event | Landing leg system |
| Rotary table movement | RTM | Operation event | Cantilever system |
| Oil pump initiate | OPI | Operation event | Pumping system |
| Energy accumulator initiate | EAI | Operation event | Pumping system |
| Bypath valve initiate | BVI | Operation event | Pumping system |
| Concrete pumping initiate | CPI | Operation event | Pumping system |
| Master cylinder initiate | MCI | Operation event | Pumping system |
| Cantilever shock absorbers initiate | CSAI | Alarm event | Cantilever system |
| Initiate of system cooling | ISC | Operation event | Pumping system |
| Hydraulic oil supplement | HOS | Operation event | Pumping system |

Table 1 shows all the events in the event set. There are two types of events: alarm events and operation events. An alarm event reminds the operator that an emergency has occurred. For example, the occurrence of event RCI reminds the operator to import concrete into the hopper. An alarm event is not a regular operation. An operation event is the actual record of a regular operation in the concrete pump truck.

6.2. Analysis for Gibbs Sampling Using Perplexity. As mentioned earlier, in the experiment described in this paper, we do not estimate the hyperparameters $\alpha$, $\beta$, and $\gamma$. Instead, they are fixed at $50/\Pi$, $0.01$, and $50/\Pi$, respectively. We use the perplexity of the model on test work cycles to evaluate when the performance of the model begins to stabilize.

The perplexity of a new unobserved work cycle $s$ that contains events $\mathbf{e}_s$ and is conditioned on the working places $\mathbf{p}_s$ and working dates $\boldsymbol{\tau}_s$ of the work cycle is defined as

$$
\mathrm{Perplexity}(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s) = \exp\left(-\frac{\log P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)}{N_s}\right), \tag{10}
$$

where $P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)$ is the probability assigned by the WCM. To simplify notation, we do not write the explicit dependency on the hyperparameters here. For multiple work cycles, we report the average perplexity over work cycles, defined as follows:

$$
\mathrm{Perplexity} = \frac{\sum_{s=1}^{S} \mathrm{Perplexity}(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)}{S}. \tag{11}
$$

The lower the perplexity, the better the performance of the model. We can obtain an approximate estimate of perplexity by averaging over multiple samples according to (9), calculated as follows:

$$
P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s) = \frac{1}{K} \sum_{k=1}^{K} \prod_{i=1}^{N_s} \left[ \frac{1}{P_s T_s} \sum_{p \in \mathbf{p}_s,\, \tau \in \boldsymbol{\tau}_s,\, \pi} E\left[\theta_{\pi p}\, \delta_{\pi\tau}\, \phi_{e_{si}\pi} \mid \mathbf{x}^k, \mathbf{y}^k, \mathbf{z}^k\right] \right]. \tag{12}
$$

Figure 5: Perplexity as a function of iterations of the Gibbs sampler for a $\Pi = 200$ model. Each curve shows the perplexity obtained by averaging over a different number of samples ($K = 2, 4, 6, 8, 10$), over a larger range of sampling iterations. [Plot: horizontal axis, iteration (0 to 200); vertical axis, perplexity (4500 to 5300); panel title, the number of working modes $\Pi = 200$.]
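The perplexity computation in (10) to (12) can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, each Gibbs sample is assumed to be supplied as a tuple of point estimates `(phi, theta, delta)` indexed as `phi[e][mode]`, `theta[mode][place]`, and `delta[mode][date]`, and the expectation of the product in (12) is taken as the product of the three point estimates, which holds because the three Dirichlet posteriors in (8) are independent given the assignments. The average over samples is computed in log space to avoid underflow on long work cycles.

```python
import math

def log_likelihood(events, places, dates, samples):
    """log P(e_s | p_s, tau_s) as in (12), averaged over the given samples."""
    per_sample = []
    for phi, theta, delta in samples:
        n_mode = len(theta)
        log_p = 0.0
        for e in events:
            mix = sum(theta[m][p] * delta[m][t] * phi[e][m]
                      for p in places for t in dates for m in range(n_mode))
            log_p += math.log(mix / (len(places) * len(dates)))
        per_sample.append(log_p)
    # Stable log of the mean of exp(per_sample) (log-sum-exp trick).
    top = max(per_sample)
    return top + math.log(sum(math.exp(v - top) for v in per_sample) / len(per_sample))

def perplexity(events, places, dates, samples):
    """Perplexity of one work cycle, as in (10)."""
    return math.exp(-log_likelihood(events, places, dates, samples) / len(events))

def average_perplexity(cycles, samples):
    """Average perplexity over work cycles, as in (11)."""
    return sum(perplexity(ev, pl, dt, samples) for ev, pl, dt in cycles) / len(cycles)

# Hypothetical uniform model: 4 events, 2 working modes, 1 place, 1 date.
uniform = ([[0.25, 0.25] for _ in range(4)], [[0.5], [0.5]], [[0.5], [0.5]])
value = perplexity([0, 1, 2, 3], [0], [0], samples=[uniform, uniform])
print(round(value, 6))  # 8.0
```

In the uniform toy model every event receives probability $2 \times 0.5 \times 0.5 \times 0.25 = 0.125$, so the perplexity is $1/0.125 = 8$.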

Experimental results using different values for $K$ indicated that $K = 10$ samples is a reasonable choice to get a good approximation of the perplexity. Because of the exchangeability of the working modes, it is possible that quite different solutions of working modes are detected across different samples. In practice, however, we have found that the solutions of working modes are relatively stable across samples, with only a small subset of unique working modes appearing in any sample. Hence, we use the average perplexity values across samples in the experiment.

Figure 5 illustrates the perplexity as a function of iterations of the Gibbs sampler for a $\Pi = 200$ model fitted to the data set. The performance of the models (for different settings of the parameter $K$) trained using the Gibbs sampler stabilizes rather quickly, at least in terms of perplexity on the data set: the perplexity values flatten out after about 100 iterations of the Gibbs sampler.

6.3. The Number of Working Modes $\Pi$. Although the perplexity computation can be averaged over different Gibbs sampler runs, other applications of the model rely on the analysis of each working mode and are based on the analysis of a single sample. Meanwhile, the setting of the parameter $\Pi$, which represents the number of working modes, is also determined according to the perplexity.

Figure 6: Perplexity as a function of the parameter $\Pi$ of the Gibbs sampler for $K = 10$ samples. [Plot: horizontal axis, number of working modes $\Pi$ (0 to 300); vertical axis, perplexity (4450 to 4950).]

Figure 6 illustrates the perplexity as a function of the parameter $\Pi$, averaged over $K = 10$ Gibbs samples. The average perplexity over the data set decreases as $\Pi$ increases, and the experimental results confirm this trend. In particular, the perplexity values flatten out once $\Pi$ reaches 200, which indicates that $\Pi = 200$ is a suitable setting for this data set.
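One simple way to turn the "flattening out" criterion into a rule is sketched below. The paper does not state how the flattening point was identified, so the function `pick_num_modes`, the relative tolerance, and the perplexity values in the example are all hypothetical and serve only to illustrate the idea.

```python
def pick_num_modes(candidates, perplexities, tol=0.005):
    """Return the first candidate after which the relative perplexity drop is below tol."""
    for i in range(len(candidates) - 1):
        drop = (perplexities[i] - perplexities[i + 1]) / perplexities[i]
        if drop < tol:
            return candidates[i]
    return candidates[-1]

# Hypothetical perplexity values for increasing numbers of working modes.
candidates = [50, 100, 150, 200, 250, 300]
perplexities = [4950.0, 4800.0, 4650.0, 4520.0, 4510.0, 4505.0]
print(pick_num_modes(candidates, perplexities))  # 200
```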

6.4. Analysis of the WCM Results. To analyze the WCM results, we can use the point estimates of the WCM parameters to look at specific $\Theta$, $\Delta$, and $\Phi$ distributions and related quantities that can be derived from these parameters (such as the probability of a working place and a working date given a randomly selected event from a working mode). In the following results, we take a specific sample $\mathbf{x}^k$, $\mathbf{y}^k$, and $\mathbf{z}^k$ after 100 iterations from a single arbitrarily selected Gibbs run and then generate point estimates of $\Theta$, $\Delta$, and $\Phi$ using (9).

There are 200 working modes in total (parameter $\Pi = 200$). Using a WMV, each working mode helps us to better understand the occurrences of events. For the sake of analysis, we list the highest probability working mode for each working place and each working date from the WCM in Table 2. For each of these working modes, we list the top 10 events most likely to be generated, conditioned on both the working place and the working date. For example, in the working place of Northern China, for the most likely working mode (numbered 101 in the 200 working modes), the top 10 events (OPI, SPM, EAI, HOS, BVI, MCI, AOM, APTS, CPD, and TCI) are most likely to occur in the working date of June.
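The kind of lookup summarised in Table 2 can be sketched from the point estimates as follows. This is an assumption-laden illustration: the paper does not spell out how the working mode probability for a place and date pair is computed, so the sketch scores each working mode by the normalised product $\theta_{\pi p}\,\delta_{\pi\tau}$; the function names and the tiny matrices are hypothetical.

```python
def most_likely_mode(theta, delta, place, date):
    """Return (mode, probability) maximising theta[mode][place] * delta[mode][date]."""
    scores = [theta[m][place] * delta[m][date] for m in range(len(theta))]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best] / sum(scores)

def top_events(phi, mode, names, k=10):
    """Return the k event names with the highest probability under the given mode."""
    ranked = sorted(range(len(names)), key=lambda e: phi[e][mode], reverse=True)
    return [names[e] for e in ranked[:k]]

# Hypothetical estimates: 3 events, 2 working modes, 1 place, 1 date.
names = ["OPI", "SPM", "CPI"]
phi = [[0.5, 0.1], [0.3, 0.2], [0.2, 0.7]]
theta = [[0.7], [0.3]]
delta = [[0.4], [0.6]]
mode, prob = most_likely_mode(theta, delta, place=0, date=0)
print(mode, top_events(phi, mode, names, k=2))  # 0 ['OPI', 'SPM']
```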

Experimental results show that different working places have different working modes even on the same working date, and the same working place also has different working modes on different working dates. This indicates that the working mode is indeed related to the working place and working date. Events related to the pumping system, such as OPI, MCI, and CPI, are likely to occur in most working modes, which indicates that the working modes of the concrete pump truck are consistent with the actual situation. Meanwhile, events related to the cantilever system and landing leg system, such as LLF and CFI, occur less often than events of the pumping system. Moreover, in the summer working dates (June, July, and August), the alarm event SPM is more likely to occur, which indicates that the concrete pump truck is more likely to fail in a hot climate. The operation event AOM is also likely to occur, which indicates that the operators prefer to operate the concrete pump truck remotely.

Table 2: The highest probability working mode for each working place and each working date from the WCM.

| Working place | Working date | Probability | Working mode | Events |
|---|---|---|---|---|
| Northern China | Jun | 0.0251 | 101 | OPI, CM, EAI, RTM, BVI, SPM, AOM, APTS, CPD, TCI |
| Northern China | Jul | 0.0341 | 164 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, TCI |
| Northern China | Aug | 0.0051 | 62 | LLF, CFI, APTS, CSI, AOM, ISC, RTM, SPM, CPD, TCI |
| Northern China | Sep | 0.0342 | 12 | OPI, RTM, CM, CPI, BVI, LLF, SPM, AOM, APTS, CPD |
| Northern China | Oct | 0.0351 | 49 | RTM, OPI, CM, BVI, MCI, SPM, AOM, APTS, CPD, TCI |
| Northern China | Nov | 0.0353 | 129 | OPI, ISC, SPM, EAI, CSI, APTS, AOM, CPD, TCI, MCMC |
| Northeastern China | Jun | 0.0258 | 176 | OPI, SPM, EAI, HOS, BVI, MCI, AOM, APTS, CPD, TCI |
| Northeastern China | Jul | 0.0263 | 29 | OPI, LLF, ISC, SPM, CFI, APTS, CPI, HOS, AOM, CPD |
| Northeastern China | Aug | 0.0141 | 71 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, TCI |
| Northeastern China | Sep | 0.0114 | 111 | RTM, BVI, OPI, CM, MCI, HOS, EAI, SPM, AOM, APTS |
| Northeastern China | Oct | 0.0146 | 69 | ISC, LLF, CSI, AOM, APTS, OPI, CFI, SPM, CPD, TCI |
| Northeastern China | Nov | 0.0257 | 93 | RTM, OPI, BVI, MCI, CM, CPI, SPM, AOM, APTS, CPD |
| Eastern China | Jun | 0.0279 | 177 | OPI, HOS, CPI, SPM, LLF, RTM, EAI, BVI, AOM, APTS |
| Eastern China | Jul | 0.0201 | 72 | OPI, EAI, CPI, SPM, MCI, RTM, HOS, AOM, APTS, CPD |
| Eastern China | Aug | 0.0277 | 87 | OPI, BVI, EAI, RTM, AOM, SPM, MCI, APTS, CPD, TCI |
| Eastern China | Sep | 0.0274 | 9 | OPI, EAI, BVI, RTM, HOS, SPM, AOM, APTS, CPD, TCI |
| Eastern China | Oct | 0.0214 | 191 | RTM, MCI, CPI, CM, EAI, OPI, HOS, SPM, AOM, APTS |
| Eastern China | Nov | 0.0255 | 170 | OPI, MCI, BVI, RTM, CPI, HOS, SPM, AOM, APTS, CPD |
| Mid-Southern China | Jun | 0.0122 | 74 | OPI, EAI, CSI, CPI, ISC, MCI, SPM, AOM, APTS, CPD |
| Mid-Southern China | Jul | 0.0177 | 33 | OPI, CPI, CM, MCI, HOS, SPM, AOM, APTS, CPD, TCI |
| Mid-Southern China | Aug | 0.0262 | 187 | HOS, MCI, CPI, OPI, EAI, BVI, CSI, SPM, AOM, APTS |
| Mid-Southern China | Sep | 0.0205 | 104 | RTM, EAI, BVI, OPI, SPM, MCI, CFI, APTS, AOM, CPD |
| Mid-Southern China | Oct | 0.0193 | 39 | OPI, HOS, BVI, CM, RTM, SPM, AOM, APTS, CPD, TCI |
| Mid-Southern China | Nov | 0.0133 | 158 | OPI, BVI, RTM, MCI, CM, SPM, AOM, APTS, CPD, TCI |
| Western China | Jun | 0.0037 | 4 | OPI, RTM, BVI, CM, EAI, SPM, CPI, MCI, AOM, APTS |
| Western China | Jul | 0.0134 | 144 | HOS, MCI, CPI, OPI, CFI, EAI, SPM, AOM, APTS, CPD |
| Western China | Aug | 0.0126 | 126 | OPI, SPM, CM, BVI, AOM, LLF, APTS, CSI, CPD, TCI |
| Western China | Sep | 0.0122 | 88 | OPI, HOS, CPI, CM, LLF, AOM, CFI, MCI, BVI, SPM |
| Western China | Oct | 0.0104 | 37 | OPI, EAI, MCI, HOS, CSI, ISC, CFI, LLF, SPM, AOM |
| Western China | Nov | 0.0135 | 78 | OPI, HOS, RTM, BVI, CSI, EAI, MCI, APTS, AOM, SPM |

Because the probability of a working mode reflects how likely it is to occur, we can analyze the workloads of different working places on different working dates. According to the probabilities of the working modes in Table 2, the working modes in the working place of Eastern China are more likely to occur than the working modes in the working place of Western China. This indicates that the concrete pump trucks in Eastern China have heavier workloads than those in Western China. Meanwhile, the concrete pump trucks in the working date of June have heavier workloads than those in the working date of November. In general, we can analyze different working modes according to their probabilities.

6.5. Illustrative Applications for the WCM. In this section, we provide some illustrative examples of how the WCM can be used to answer different types of questions and prediction problems concerning working modes of the equipment.

6.5.1. Automated Detection for a New Work Cycle. In real cases, we would like to quickly assess working mode assignments for new work cycles not contained in the training data set, especially for real-time event sequence flows.

Our automated detection strategy is to apply the Gibbs sampling algorithm only to the event tokens in the new work cycle, instead of rerunning the algorithm on the whole data set for every new work cycle. The event tokens in the new work cycle are then quickly assigned to the most likely working places, working dates, and working modes. The main procedure is as follows: first, we assign the events randomly to working places, working dates, and working modes; second, we sample new assignments of events by applying the Gibbs sampler only to the event tokens in the new work cycle, each time temporarily updating the count matrices $C^{E\Pi}$, $C^{\Pi P}$, and $C^{\Pi T}$ shown in (7).
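The two-step procedure above can be sketched as a "fold-in" routine that leaves the trained counts untouched. This is a hedged illustration, not the authors' code: the function name `fold_in`, the number of sweeps, the majority vote used to pick each token's final working mode, and the toy trained counts are all assumptions, and the place and date factors are normalised over working modes as in the Dirichlet posteriors of Section 5.2.

```python
import copy
import random
from collections import Counter

def fold_in(events, places, dates, c_em, c_mp, c_mt, alpha, beta, gamma,
            n_iter=50, seed=0):
    """Assign working modes to the tokens of one new work cycle."""
    rng = random.Random(seed)
    # Work on copies so that the trained counts are only updated temporarily.
    c_em, c_mp, c_mt = (copy.deepcopy(c) for c in (c_em, c_mp, c_mt))
    n_event, n_mode = len(c_em), len(c_mp)

    def bump(e, p, t, m, step):
        c_em[e][m] += step
        c_mp[m][p] += step
        c_mt[m][t] += step

    # Step 1: random initial assignment of the new tokens.
    assign = []
    for e in events:
        p, t, m = rng.choice(places), rng.choice(dates), rng.randrange(n_mode)
        bump(e, p, t, m, +1)
        assign.append((p, t, m))

    # Step 2: Gibbs sweeps restricted to the new tokens.
    votes = [Counter() for _ in events]
    for _ in range(n_iter):
        for i, e in enumerate(events):
            bump(e, *assign[i], -1)
            triples, weights = [], []
            for m in range(n_mode):
                w_e = (c_em[e][m] + beta) / (sum(r[m] for r in c_em) + n_event * beta)
                for p in places:
                    w_p = (c_mp[m][p] + alpha) / (sum(r[p] for r in c_mp) + n_mode * alpha)
                    for t in dates:
                        w_t = (c_mt[m][t] + gamma) / (sum(r[t] for r in c_mt) + n_mode * gamma)
                        triples.append((p, t, m))
                        weights.append(w_e * w_p * w_t)
            assign[i] = rng.choices(triples, weights=weights)[0]
            bump(e, *assign[i], +1)
            votes[i][assign[i][2]] += 1
    # Final working mode per token: the mode sampled most often.
    return [v.most_common(1)[0][0] for v in votes]

# Hypothetical trained counts: event 0 belongs to mode 0, event 1 to mode 1.
trained_em = [[50, 0], [0, 50]]
trained_mp = [[50], [50]]
trained_mt = [[50], [50]]
modes = fold_in([0, 0, 1], [0], [0], trained_em, trained_mp, trained_mt,
                alpha=25.0, beta=0.01, gamma=25.0)
```

Because the new tokens are few compared with the training counts, the trained matrices dominate each conditional, which is why a small number of sweeps is usually enough in this setting.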

Table 3 shows the occurrences of events for a new work cycle. After the sampling, the WCM has assigned each event to its most likely working mode. Table 3 lists the top 3 most likely working modes assigned to each event for the new work cycle. Note that each event is assigned to different working modes according to its occurrence count. According to (7), although the events of this new work cycle are assigned to different working modes, the work cycle is assigned to working mode number 107 with probability 0.0003. The top 10 most likely events in working mode number 107 are as follows: RTM, CM, OPI, BVI, SPM, CPI, MCI, SCI, ISC, and SoE.

The automated detection result for the new work cycle is consistent with the actual situation when compared with the real occurrences of events.

6.5.2. Automated Detection of Anomalous Work Cycles. We illustrate in this section how our model can be used to detect anomalous work cycles. A work cycle assigned to a working mode with low probability is considered an anomalous work cycle.

We again take the work cycle shown in Table 3 as an example for the automated detection of an anomalous work cycle. The work cycle is assigned to working mode number 107 with probability 0.0003. Compared with most other working modes, working mode number 107 has a lower probability, so this work cycle is detected as an anomalous work cycle. The alarm events SPM and SoE occur frequently both in the work cycle and in working mode number 107, which indicates that this work cycle is anomalous. We also analyzed the real failure records and confirmed that the engine indeed failed frequently during this work cycle. In general, such anomalous work cycles can be detected automatically and efficiently with the help of the WCM.
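The detection rule can be written down in a few lines. The paper does not give a numeric cut-off, so the threshold below is purely hypothetical, as are the function name `flag_anomalous` and the two comparison work cycles; only working mode 107 with probability 0.0003 comes from the example above.

```python
def flag_anomalous(cycle_modes, threshold):
    """Return ids of work cycles whose assigned working mode has probability below threshold."""
    return [cid for cid, (_mode, prob) in cycle_modes.items() if prob < threshold]

cycle_modes = {
    "new_cycle": (107, 0.0003),  # the work cycle discussed in the text
    "cycle_a": (177, 0.0279),    # hypothetical comparison cycle
    "cycle_b": (101, 0.0251),    # hypothetical comparison cycle
}
print(flag_anomalous(cycle_modes, threshold=0.001))  # ['new_cycle']
```

In practice the threshold would have to be calibrated, for example against the distribution of working mode probabilities observed on the training work cycles.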

7. Conclusions and Future Work

Table 3: Actual example of automated detection for a new work cycle (working date = Jun, working place = Eastern China). Each event is assigned to its most likely working mode according to its corresponding occurrence count. The table lists the top 3 most likely working modes for each event in the new work cycle.

| Event | Count | First | Second | Third |
|---|---|---|---|---|
| SPM | 72 | 107 | 181 | 112 |
| AOM | 33 | 169 | 67 | 183 |
| APTS | 23 | 90 | 15 | 76 |
| CPD | 42 | 145 | 139 | 59 |
| TCI | 2 | 118 | 134 | 112 |
| MCMC | 0 | Null | Null | Null |
| MCSC | 0 | Null | Null | Null |
| DSP | 0 | Null | Null | Null |
| MCES | 0 | Null | Null | Null |
| HPMI | 0 | Null | Null | Null |
| WUI | 2 | 159 | 104 | 77 |
| WPI | 23 | 54 | 175 | 71 |
| CSI | 55 | 147 | 29 | 61 |
| CFI | 25 | 2 | 132 | 100 |
| TCI | 23 | 95 | 185 | 53 |
| CM | 127 | 12 | 49 | 192 |
| LLM | 55 | 189 | 114 | 23 |
| RCI | 0 | Null | Null | Null |
| DOP | 0 | Null | Null | Null |
| LLF | 40 | 111 | 10 | 42 |
| RTM | 297 | 191 | 104 | 52 |
| CPW | 0 | Null | Null | Null |
| RCSW | 0 | Null | Null | Null |
| OPI | 95 | 177 | 176 | 101 |
| EAI | 56 | 126 | 100 | 170 |
| BVI | 77 | 177 | 53 | 146 |
| CPI | 60 | 164 | 104 | 149 |
| MCI | 60 | 177 | 175 | 162 |
| SCI | 66 | 120 | 149 | 73 |
| CSAI | 0 | Null | Null | Null |
| ISC | 51 | 68 | 149 | 23 |
| HOS | 0 | Null | Null | Null |
| SoE | 33 | 119 | 112 | 107 |

The working condition model proposed in this paper provides a relatively simple probabilistic model for exploring the relationships between working place, working date, working mode, and events in a work cycle. This model provides significantly improved predictive power for the analysis of working condition from event sequence data.

Our future work mainly includes optimizing the model and its training and conducting experiments on different data sets. Furthermore, a deeper analysis of the anomalous work cycles detected by our model is also an interesting question.

Notations Associated with the WCM As Used in This Paper

| Symbol | Description |
|---|---|
| $\mathcal{P}$ | Working places of all the work cycles (set) |
| $\mathcal{T}$ | Working dates of all the work cycles (set) |
| $\mathbf{p}_s$ | Working places of the $s$th work cycle ($P_s$-dimensional vector) |
| $P_s$ | Number of working places of the $s$th work cycle (scalar) |
| $\boldsymbol{\tau}_s$ | Working dates of the $s$th work cycle ($T_s$-dimensional vector) |
| $T_s$ | Number of working dates of the $s$th work cycle (scalar) |
| $P$ | Number of working places (scalar) |
| $S$ | Number of work cycles (scalar) |
| $T$ | Number of working dates (scalar) |
| $N_s$ | Number of events in the $s$th work cycle (scalar) |
| $N$ | Number of events in all the event sequences (scalar) |
| $\Pi$ | Number of working modes (scalar) |
| $E$ | Number of events in the event set (scalar) |
| $\mathbf{e}_s$ | Event sequence vector for the $s$th work cycle ($N_s$-dimensional vector) |
| $e_{si}$ | $i$th event in the $s$th work cycle ($i$th component of vector $\mathbf{e}_s$) |
| $\mathbf{x}$ | Working place assignments ($N$-dimensional vector) |
| $x_{si}$ | Working place assignment for event $e_{si}$ ($i$th component of vector $\mathbf{x}_s$) |
| $\mathbf{y}$ | Working date assignments ($N$-dimensional vector) |
| $y_{si}$ | Working date assignment for event $e_{si}$ ($i$th component of vector $\mathbf{y}_s$) |
| $\mathbf{z}$ | Working mode assignments ($N$-dimensional vector) |
| $z_{si}$ | Working mode assignment for event $e_{si}$ ($i$th component of vector $\mathbf{z}_s$) |
| $\alpha, \beta, \gamma$ | Dirichlet priors (scalars) |
| $\Phi$ | Probabilities of events given working modes ($E \times \Pi$ matrix) |
| $\phi_\pi$ | Probabilities of events given working mode $\pi$ ($E$-dimensional vector) |
| $\Theta$ | Probabilities of working modes given working places ($\Pi \times P$ matrix) |
| $\theta_p$ | Probabilities of working modes given working place $p$ ($\Pi$-dimensional vector) |
| $\Delta$ | Probabilities of working modes given working dates ($\Pi \times T$ matrix) |
| $\delta_\tau$ | Probabilities of working modes given working date $\tau$ ($\Pi$-dimensional vector) |

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J. Holler, V. Tsiatsis, C. Mulligan, S. Avesand, S. Karnouskos, and D. Boyle, From Machine-to-Machine to the Internet of Things: Introduction to a New Age of Intelligence, Academic Press, 2014.

[2] C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos, "Sensing as a service model for smart cities supported by Internet of Things," Transactions on Emerging Telecommunications Technologies, vol. 25, no. 1, pp. 81–93, 2014.

[3] R. F. Mesquita Brandao and J. A. Beleza Carvalho, "The importance of control monitoring systems in wind parks maintenance," British Journal of Applied Science & Technology, vol. 4, no. 10, pp. 1461–1471, 2014.

[4] C. J. Crabtree, D. Zappala, and P. J. Tavner, "Survey of commercially available condition monitoring systems for wind turbines," Tech. Rep., Durham University, 2014.

[5] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent dirichlet allocation," The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.

[6] S. Kandula, R. Mahajan, P. Verkaik, S. Agarwal, J. Padhye, and P. Bahl, "Detailed diagnosis in enterprise networks," in Proceedings of the ACM SIGCOMM Conference on Data Communication (SIGCOMM '09), vol. 39, pp. 243–254, ACM, August 2009.

[7] J.-G. Lou, Q. Fu, Y. Wang, and J. Li, "Mining dependency in distributed systems through unstructured logs analysis," ACM SIGOPS Operating Systems Review, vol. 44, no. 1, pp. 91–96, 2010.

[8] C. Luo, J.-G. Lou, Q. Lin, et al., "Correlating events with time series for incident diagnosis," in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14), pp. 1583–1592, ACM, August 2014.

[9] J. Chen and R. Kumar, "Online failure diagnosis of stochastic discrete event systems," in Proceedings of the IEEE Conference on Computer Aided Control System Design (CACSD '13), pp. 194–199, IEEE, August 2013.

[10] J. Chen and R. Kumar, "Failure diagnosis of discrete-time stochastic systems subject to temporal logic correctness requirements," in Proceedings of the 11th IEEE International Conference on Networking, Sensing and Control (ICNSC '14), pp. 42–47, IEEE, April 2014.

[11] Business Process Model and Notation (BPMN) Version 2.0, OMG Specification, Object Management Group, 2011.

[12] F. Leymann, "Bpel vs. bpmn 2.0: should you care?" in Business Process Modeling Notation, pp. 8–13, Springer, Berlin, Germany, 2011.

[13] C. C. Aggarwal, Managing and Mining Sensor Data, Springer, 2013.

[14] N. H. Gehani, H. V. Jagadish, and O. Shmueli, "Composite event specification in active databases: model and implementation," in Proceedings of the 18th VLDB Conference, Vancouver (VLDB '92), vol. 92, pp. 327–338, Citeseer, British Columbia, Canada, 1992.

[15] I. Davidson, S. Gilpin, and P. B. Walker, "Behavioral event data and their analysis," Data Mining and Knowledge Discovery, vol. 25, no. 3, pp. 635–653, 2012.

[16] J. Han and M. Kamber, Data Mining, Southeast Asia Edition: Concepts and Techniques, Morgan Kaufmann, 2006.

[17] H. R. Motahari-Nezhad, R. Saint-Paul, F. Casati, and B. Benatallah, "Event correlation for process discovery from web service interaction logs," The VLDB Journal, vol. 20, no. 3, pp. 417–444, 2011.


[18] F. Skopik and R. Fiedler, "Intrusion detection in distributed systems using fingerprinting and massive event correlation," in GI-Jahrestagung, pp. 2240–2254, 2013.

[19] G. A. Wilkin, P. Eugster, and K. R. Jayaram, "Decentralized fault-tolerant event correlation," ACM Transactions on Internet Technology, vol. 14, no. 1, article 5, 2014.

[20] H. Wei, "A correlation analysis method for network security events," in Informatics and Management Science III, vol. 206 of Lecture Notes in Electrical Engineering, pp. 269–277, Springer, London, UK, 2013.

[21] W. Van Der Aalst, A. Adriansyah, A. K. A. de Medeiros, et al., "Process mining manifesto," in Business Process Management Workshops, pp. 169–194, Springer, Berlin, Germany, 2012.

[22] J. C. A. M. Buijs, B. F. van Dongen, and W. M. P. van der Aalst, "Mining configurable process models from collections of event logs," in Business Process Management, pp. 33–48, Springer, 2013.

[23] A. Rebuge and D. R. Ferreira, "Business process analysis in healthcare environments: a methodology based on process mining," Information Systems, vol. 37, no. 2, pp. 99–116, 2012.

[24] J. Wang, R. K. Wong, J. Ding, Q. Guo, and L. Wen, "On recommendation of process mining algorithms," in Proceedings of the IEEE 19th International Conference on Web Services (ICWS '12), pp. 311–318, IEEE, Honolulu, Hawaii, USA, June 2012.

[25] R. S. Mans, W. M. P. van der Aalst, and H. M. W. Verbeek, "Supporting process mining workflows with rapidprom," in Proceedings of the Business Process Management Demo Sessions (BPMD '14), vol. 1295, pp. 56–60, Eindhoven, The Netherlands, September 2014.

[26] C. Li, M. Reichert, and A. Wombacher, "Mining business process variants: challenges, scenarios, algorithms," Data & Knowledge Engineering, vol. 70, no. 5, pp. 409–434, 2011.

[27] R. Accorsi, T. Stocker, and G. Muller, "On the exploitation of process mining for security audits: the process discovery case," in Proceedings of the 28th Annual ACM Symposium on Applied Computing, pp. 1462–1468, ACM, March 2013.

[28] B.-J. Lee, S.-G. Park, K.-B. Min, et al., "The relationship between working condition factors and well-being," Annals of Occupational and Environmental Medicine, vol. 26, no. 1, article 34, 2014.

[29] J. Cohen, Statistical Power Analysis for the Behavioral Sciences, Routledge Academic, New York, NY, USA, 2013.

[30] P. Bahl, R. Chandra, A. Greenberg, S. Kandula, D. A. Maltz, and M. Zhang, "Towards highly reliable enterprise network services via inference of multi-level dependencies," ACM SIGCOMM Computer Communication Review, vol. 37, no. 4, pp. 13–24, 2007.

[31] B. Rosner, Fundamentals of Biostatistics, Cengage Learning, 2010.

[32] A. Zimmermann, "Colored petri nets," in Stochastic Discrete Event Systems: Modeling, Evaluation, Applications, pp. 99–124, Springer, 2008.

[33] A. Adriansyah, B. F. van Dongen, and W. M. P. van der Aalst, "Towards robust conformance checking," in Business Process Management Workshops, vol. 66 of Lecture Notes in Business Information Processing, pp. 122–133, Springer, Berlin, Germany, 2011.

[34] M. Weidlich and M. Weske, Business Process Modeling Notation, Springer, Berlin, Germany, 2010.

[35] C. M. Bishop and J. Lasserre, "Generative or discriminative? Getting the best of both worlds," in Bayesian Statistics, J. M. Bernardo, M. J. Bayarri, J. O. Berger, et al., Eds., vol. 8, pp. 3–23, Oxford University, 2007.

[36] C. M. Bishop, Pattern Recognition and Machine Learning, Volume 1, Springer, New York, NY, USA, 2006.

[37] D. M. Blei and J. D. Lafferty, "Dynamic topic models," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 113–120, ACM, June 2006.

[38] J. Foulds, L. Boyles, C. DuBois, P. Smyth, and M. Welling, "Stochastic collapsed variational Bayesian inference for latent dirichlet allocation," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 446–454, ACM, 2013.

[39] J. Pearl, Bayesian Networks, Department of Statistics, UCLA, 2011.

[40] I. Porteous, D. Newman, A. Ihler, A. Asuncion, P. Smyth, and M. Welling, "Fast collapsed gibbs sampling for latent dirichlet allocation," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '08), pp. 569–577, ACM, August 2008.

2. Related Work

2.1. Analysis of Event Sequence Data. An event is a happening of interest [13, 14]. In the surveillance of equipment, the interest in events comes mostly from the state changes of equipment that are produced by equipment manipulation operations [15]. Example events in the actual surveillance of the concrete pump truck include warm-up, landing leg unfolding, cantilever unfolding, concrete pumping, landing leg folding, and cantilever folding, as shown in Figure 1. When a sequence of events takes place, we record these occurrences to obtain the event sequence data. The main idea of the analysis of event sequence data is to process events to gather meaningful or valuable information and then to derive actions from them.

Events in an event sequence are often interrelated and form complex relationships. The correlation analysis of event sequence data [7, 8, 16, 17] focuses on detecting these relationships and has been extended to related applications such as anomaly detection [18–20]. A temporal, spatial, or causal relationship of events can determine the partial order between events [16]. Hence, process mining algorithms based on event sequence data focus on the causal relationship of events by analyzing the occurrence order of distinct events [21, 22]. There are many existing process mining algorithms [23, 24] and tools [25, 26] for mining the causal relationship of events, which can help engineers better understand the operation procedure of equipment [27].

2.2. Working Condition Diagnosis. The working condition of equipment [28] is the condition in which the equipment works, including but not limited to such things as amenities, physical environment, stress and noise levels, degree of safety or danger, and the like. Working condition diagnosis usually uses specific models or variables for different applications. In the correlation analysis of event sequence data for working condition diagnosis, correlation coefficients of event sequence data, for example, the Pearson correlation [6, 7, 29, 30] and the rank correlation [31], are used to depict the working conditions of equipment. In process mining algorithms for event sequence data, a variety of process models, for example, Petri nets [32] and business process modeling notation [33, 34], are used to depict the working condition of equipment, with the choice of model depending on the algorithm. In this paper, we take the occurrence frequencies of events into consideration and simulate the process by which event sequences are generated in different working modes of equipment. Hence, we use the occurrence probability of events in the event sequence to depict the working condition of equipment at runtime.

2.3. Probabilistic Generative Model. In probability and statistics, a generative model [35] is a model for randomly generating observable data, typically given some hidden parameters. It specifies a joint probability distribution over observable data. Generative models are used in machine learning [36] either for modeling data directly (i.e., modeling observations drawn from a probability density function) or as an intermediate step in forming a conditional probability density function. A conditional distribution can be formed from a generative model through Bayes' rule [36].

Figure 2: Graphical model of a typical generative model, LDA. The boxes are plates representing replicates. The outer plate represents documents, while the inner plate represents the repeated choice of topics and words within a document. $M$ denotes the number of documents and $N$ the number of words in a document. $\alpha$, $\beta$, $\theta$, $\varphi$, and $Z$ are hidden parameters, and $W$ denotes the observations. For details about LDA, refer to [5].

For example, latent Dirichlet allocation (LDA) [5, 37, 38] is a typical generative model that is widely used in many domains. In natural language processing, LDA simulates the process by which documents are generated, where the observations are words collected into documents; it posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics. Figure 2 illustrates the graphical model [39] of LDA [5]. With plate notation, the dependencies among variables can be captured concisely.

3. Terminology and Notation

As the working condition of equipment always corresponds to a period, for example, one day or one week, we first determine the basic unit of observation, which eases working condition diagnosis based on the event sequence data.

Definition 1 (work cycle). A work cycle of a piece of equipment, denoted by $s$, is a complete work period, that is, a complete usage period of the equipment from the time the equipment starts working until it shuts down.

We let $s$ take integer values from $\{1, \ldots, S\}$, where $S$ is the number of work cycles. There is one important advantage in adopting the work cycle as the basic unit of observation in event sequence data analysis for working condition diagnosis. In our view, no matter what circumstances (e.g., different places and different climates) the equipment works in, the working condition of equipment within one work cycle behaves similarly. For example, in a work cycle of the concrete pump truck illustrated in Figure 1, different concrete pump trucks usually follow a similar working process, warm-up → landing leg unfolding → cantilever unfolding → concrete pumping → landing leg folding → cantilever folding, even though the trucks work in different circumstances.

Definition 2 (event). An event of the equipment, denoted by $e$, records an occurrence of a specific message indicating that something, such as an operation, has happened in the equipment.

For example, in Figure 1, there are six events (warm-up, landing leg unfolding, cantilever unfolding, concrete pumping, landing leg folding, and cantilever folding), each of which reflects the working condition of some component of the concrete pump truck. We use integers to denote the entries in the event set, with each event $e$ taking a value from $\{1, \ldots, E\}$, where $E$ is the number of unique events in the event set, denoted by $\mathcal{E}$.

Definition 3 (event sequence). An event sequence of the equipment, denoted by $\mathbf{e}_s$, consists of the sequence of events that occur in work cycle $s$.

An event sequence is represented as a vector of events $\mathbf{e}_s$ with $N_s$ entries. For example, in Figure 1, the event set of the concrete pump truck contains six ($E = 6$) events, denoted by $\mathcal{E}_{\text{pump}} = (1, \ldots, 6)$, where the integers index the events warm-up, landing leg unfolding, cantilever unfolding, concrete pumping, landing leg folding, and cantilever folding, respectively. Hence, the event sequence is a vector of length $N_s = 10$, denoted by $\mathbf{e}_s = (1, 2, 3, 4, 4, 4, 4, 4, 5, 6)$.

Suppose that the data set has $S$ work cycles of the equipment, corresponding to $S$ event sequences. The data set with $S$ event sequences is represented as a concatenation of the event sequence vectors, which we denote by $\mathbf{e}$, having $N = \sum_{s=1}^{S} N_s$ entries.

In a work cycle, an event sequence gives us the main working process of the equipment. However, the occurrence of an event is also related to the working place and working date of the equipment. For example, the concrete pump truck adds an operation event, concrete mixing, to prevent the concrete from setting if the working temperature is low. The working temperature is directly related to the working place (e.g., north or south of China) and the working date (e.g., winter or summer).

In addition to these events, we have information about the characteristics of each event sequence (work cycle): the working place, the working date, and the equipment number of the work cycle. We define $\mathbf{p}_s$ to be the set of working places of work cycle $s$; $\mathbf{p}_s$ consists of elements that are integers from $\{1, \ldots, P\}$, where $P$ is the number of working places that generated the event sequences in the data set. $P_s$ denotes the number of working places of work cycle $s$. We define $\boldsymbol{\tau}_s$ to be the set of working dates of work cycle $s$; $\boldsymbol{\tau}_s$ consists of elements that are integers from $\{1, \ldots, T\}$, where $T$ is the number of working dates. (To ease the notation, we record only the working month of the work cycle, which means $T = 12$.) $T_s$ denotes the number of working dates of work cycle $s$. We define $\boldsymbol{\omega}_s$ to be the set of equipment numbers of work cycle $s$; $\boldsymbol{\omega}_s$ consists of elements that are integers from $\{1, \ldots, \Omega\}$, where $\Omega$ is the number of equipment pieces.

Definition 4 (work cycle characteristic). A work cycle characteristic (WCC) is a five-tuple, denoted by $\mathcal{W}_s = \{\mathcal{E}, \mathbf{e}_s, \mathbf{p}_s, \boldsymbol{\tau}_s, \boldsymbol{\omega}_s\}$, which records all the information about work cycle $s$.

A WCC corresponds to a work cycle, so the original data set is redefined as a group of WCCs, denoted by $\mathcal{D} = \{\mathcal{W}_1, \ldots, \mathcal{W}_S\}$. The WCCs of two work cycles are likely to differ even if they have the same working place and working date. The main differences between work cycles center on the occurrence of events. However, the occurrence patterns of events are similar for work cycles in the same working mode. For example, the concrete pump truck has two main working modes: pumping mode and traveling mode. For a work cycle in the pumping mode, the event concrete pumping occurs frequently, as shown in Figure 1. However, for a work cycle in the traveling mode, the event concrete pumping never occurs, since the concrete pump truck cannot pump concrete in the traveling mode.

Definition 5 (working mode). A working mode, denoted by $\pi$, represents a kind of work cycle that is about a specific subject, has an identifiable purpose, and can stand alone.

For event set $\mathcal{E}$, we define the working mode vector (WMV) $\mathbf{G}(\pi) = \{(1, c_1), \ldots, (E, c_E)\}$ to be the set of events $e$, each associated with its occurrence frequency $c_e$, where $\sum_{e=1}^{E} c_e = 1$. The WMV $\mathbf{G}(\pi)$ depicts the occurrence patterns of events according to their occurrence frequencies. Therefore, if we can obtain a group of WMVs for a group of work cycles, it will help us better understand the occurrence patterns of events.
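To make the WMV concrete, the following minimal sketch computes normalized occurrence frequencies for a set of event sequences. It assumes the sequences have already been grouped into one working mode; the function name `working_mode_vector` is hypothetical, and the empirical frequency used here is only an illustration, not the WCM inference developed later.

```python
from collections import Counter

def working_mode_vector(event_sequences, num_events):
    """Return [(event, frequency), ...] for events 1..num_events.

    Frequencies are empirical occurrence counts normalized to sum to 1.
    Assumption: every sequence passed in belongs to the same working mode.
    """
    counts = Counter(e for seq in event_sequences for e in seq)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one event is required")
    return [(e, counts.get(e, 0) / total) for e in range(1, num_events + 1)]

# The pumping work cycle from the running example, with events encoded as 1..6.
wmv = working_mode_vector([[1, 2, 3, 4, 4, 4, 4, 4, 5, 6]], num_events=6)
```

In this example, event 4 (concrete pumping) accounts for half of all occurrences, which is the kind of signature that separates a pumping-mode cycle from a traveling-mode one.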

Definition 6 (working mode space (WMS)). A working mode space (WMS), denoted by $\mathcal{G} = \{\mathbf{G}(1), \ldots, \mathbf{G}(\Pi)\}$, is a set of WMVs for a group of given work cycles of equipment.

In effect, the WMS is akin to a group of cluster centers, each of which depicts the working condition of equipment in a different working mode.

4. The Inference of WMS

In this section, we develop effective algorithms for the inference of the WMS for a group of given work cycles of equipment. Before proceeding, we formulate our problem as follows.

WMS Inference Problem. Given a group of work cycles associated with the corresponding WCCs $\mathcal{D} = \{\mathcal{W}_1, \ldots, \mathcal{W}_S\}$, the inference problem is to infer the WMS model $\mathcal{G} = \{\mathbf{G}(1), \ldots, \mathbf{G}(\Pi)\}$, where $\Pi$ represents the number of working modes.

With the help of the WMS, we can find that, in different working places and on different working dates, the work cycles of equipment have different working modes. Meanwhile, several working modes may occur in the same working place and on the same working date. The WMV of a working mode reflects the working condition of its corresponding work cycles, especially the occurrence patterns of events.

In the remainder of this section, we first introduce the WCM for learning the WMS for a group of given work cycles and then introduce the inference framework of the WCM.

4.1. The WCM. The WCM is a hierarchical generative model in which each event $e$ in a work cycle is associated with three latent variables: a working place $x$, a working date $y$, and a working mode $z$. These latent variables augment the $N$-dimensional vector $\mathbf{e}$ (the values of all events in the data set) with three additional $N$-dimensional vectors $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$, indicating the working place, working date, and working mode assignments for the $N$ events.

We assume that the sets of working places and working dates for each work cycle are observed. This leaves unresolved the case of unobserved working places and working dates and avoids the need to define a prior on them, which is outside the scope of our model. Each working place is associated with a multinomial distribution over working modes, and each working date is likewise associated with a multinomial distribution over working modes. Conditioned on the set of working places and the set of working dates, together with their distributions over working modes, the process by which the event sequence for a work cycle is simulated can be summarized as follows: first, a working place and a working date are each chosen uniformly at random for each event that will appear in the work cycle; next, a working mode is sampled for each event from the distribution over working modes associated with the working place of that event and from the distribution over working modes associated with the working date of that event; finally, the events themselves are sampled from the distribution over events associated with each working mode.

This simulating process can be expressed more formally by defining some of the other variables in the WCM. Assume we have $\Pi$ working modes. We can parameterize the multinomial distribution over working modes for each working place using a matrix $\Theta$ of size $\Pi \times P$, with elements $\theta_{\pi p}$ that stand for the probability of assigning working mode $\pi$ to an event occurring in working place $p$. Thus $\sum_{\pi=1}^{\Pi} \theta_{\pi p} = 1$, and for simplicity of notation we will drop the index $\pi$ when convenient and use $\theta_p$ to stand for the $p$th column of the matrix $\Theta$. Similarly, we use a matrix $\Delta$ of size $\Pi \times T$ to parameterize the multinomial distribution over working modes for each working date, where elements $\delta_{\pi\tau}$ stand for the probability of assigning working mode $\pi$ to an event occurring on working date $\tau$. Thus $\sum_{\pi=1}^{\Pi} \delta_{\pi\tau} = 1$, and we will also drop the index $\pi$ when convenient and use $\delta_\tau$ to stand for the $\tau$th column of the matrix $\Delta$. The multinomial distributions over events associated with each working mode are parameterized by a matrix $\Phi$ of size $E \times \Pi$, with elements $\phi_{e\pi}$ that stand for the probability of generating event $e$ in working mode $\pi$. Again, $\sum_{e=1}^{E} \phi_{e\pi} = 1$, and $\phi_\pi$ stands for the $\pi$th column of the matrix $\Phi$. These three multinomial distributions are assumed to be generated from symmetric Dirichlet priors with hyperparameters $\alpha$ (for $\Theta$), $\gamma$ (for $\Delta$), and $\beta$ (for $\Phi$). In the results of this paper, we assume that these hyperparameters are fixed. This notation is summarized in the Notations section.

Figure 3: The graphical representation of the WCM.

The sequential procedure of first picking a working place and a working date, then picking a working mode, and finally generating an event in this working mode according to the probability distributions leads to the following generative process.

(1) For each working place $p = 1, \ldots, P$, choose $\theta_p \sim \text{Dirichlet}(\alpha)$;
    for each working date $\tau = 1, \ldots, T$, choose $\delta_\tau \sim \text{Dirichlet}(\gamma)$;
    for each working mode $\pi = 1, \ldots, \Pi$, choose $\phi_\pi \sim \text{Dirichlet}(\beta)$.

(2) For each work cycle $s = 1, \ldots, S$:
    given the vector of working places $\mathbf{p}_s$;
    given the vector of working dates $\boldsymbol{\tau}_s$;
    for each event $i = 1, \ldots, N_s$:
        conditioned on $\mathbf{p}_s$, choose working place $x_{si} \sim \text{Uniform}(\mathbf{p}_s)$;
        conditioned on $\boldsymbol{\tau}_s$, choose working date $y_{si} \sim \text{Uniform}(\boldsymbol{\tau}_s)$;
        conditioned on $x_{si}$ and $y_{si}$, choose working mode $z_{si} \sim \text{Discrete}(\theta_{x_{si}}, \delta_{y_{si}})$;
        conditioned on $z_{si}$, choose event $e_{si} \sim \text{Discrete}(\phi_{z_{si}})$.
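The generative process above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: indices are 0-based, the helper names `sample_dirichlet` and `generate_work_cycle` are hypothetical, and the working mode is drawn with probability proportional to the elementwise product of $\theta_{x_{si}}$ and $\delta_{y_{si}}$, which is one reading of $\text{Discrete}(\theta_{x_{si}}, \delta_{y_{si}})$.

```python
import random

def sample_dirichlet(rng, concentration, dim):
    """Draw from a symmetric Dirichlet by normalizing Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_work_cycle(rng, theta, delta, phi, places, dates, length):
    """Simulate one event sequence given p_s (places) and tau_s (dates).

    theta[p] and delta[t] are distributions over working modes;
    phi[m] is the distribution over events for working mode m.
    """
    num_modes = len(phi)
    events = []
    for _ in range(length):
        x = rng.choice(places)  # x_si ~ Uniform(p_s)
        y = rng.choice(dates)   # y_si ~ Uniform(tau_s)
        # Assumption: combine the place and date distributions by product.
        weights = [theta[x][m] * delta[y][m] for m in range(num_modes)]
        z = rng.choices(range(num_modes), weights=weights)[0]
        e = rng.choices(range(len(phi[z])), weights=phi[z])[0]
        events.append(e)
    return events

rng = random.Random(0)
P, T, M, E = 2, 3, 4, 6  # places, dates, working modes, unique events
theta = [sample_dirichlet(rng, 0.5, M) for _ in range(P)]
delta = [sample_dirichlet(rng, 0.5, M) for _ in range(T)]
phi = [sample_dirichlet(rng, 0.5, E) for _ in range(M)]
cycle = generate_work_cycle(rng, theta, delta, phi,
                            places=[0, 1], dates=[2], length=10)
```

`random.choices` normalizes the weights internally, so the product does not need to be rescaled before sampling.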

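The graphical model corresponding to this process is shown in Figure 3. Under this simulating process, each working mode assignment is drawn independently when conditioned on $\Theta$ and $\Delta$, and each event is drawn independently when conditioned on $\Phi$ and its working mode assignment. Because the work cycles are therefore conditionally independent given the parameters, the probability of the event sequence $\mathbf{e}$ conditioned on $\Theta$, $\Delta$, and $\Phi$ (and implicitly on a fixed number of working modes $\Pi$) is

$$P(\mathbf{e} \mid \Phi, \Delta, \Theta, \mathcal{P}, \mathcal{T}) = \prod_{s=1}^{S} P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s). \tag{1}$$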

With the help of (1), we can first obtain the probability of the event sequence in each work cycle, $\mathbf{e}_s$, by summing over the latent variables $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$, which gives (3). Consider

$$
\begin{aligned}
P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s)
&= \prod_{i=1}^{N_s} P(e_{si} \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s) \\
&= \prod_{i=1}^{N_s} \sum_{\tau=1}^{T} \sum_{p=1}^{P} \sum_{\pi=1}^{\Pi} P(e_{si}, z_{si} = \pi, x_{si} = p, y_{si} = \tau \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s) \\
&= \prod_{i=1}^{N_s} \sum_{\tau=1}^{T} \sum_{p=1}^{P} \sum_{\pi=1}^{\Pi} P(e_{si} \mid z_{si} = \pi, \Phi)\, P(z_{si} = \pi \mid x_{si} = p, \Theta)\, P(z_{si} = \pi \mid y_{si} = \tau, \Delta)\, P(x_{si} = p \mid \mathbf{p}_s)\, P(y_{si} = \tau \mid \boldsymbol{\tau}_s),
\end{aligned} \tag{2}
$$

$$P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s) = \prod_{i=1}^{N_s} \frac{1}{P_s} \frac{1}{T_s} \sum_{p \in \mathbf{p}_s} \sum_{\tau \in \boldsymbol{\tau}_s} \sum_{\pi=1}^{\Pi} \phi_{e_{si}\pi}\, \theta_{\pi p}\, \delta_{\pi\tau}, \tag{3}$$

$$P(\mathbf{e} \mid \alpha, \beta, \gamma, \mathcal{P}, \mathcal{T}) = \int_{\Theta} \int_{\Delta} \int_{\Phi} P(\mathbf{e} \mid \Theta, \Delta, \Phi, \mathcal{P}, \mathcal{T})\, P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)\, d\Theta\, d\Delta\, d\Phi \tag{4}$$

$$= \int_{\Theta} \int_{\Delta} \int_{\Phi} \left[ \prod_{s=1}^{S} \prod_{i=1}^{N_s} \frac{1}{P_s} \frac{1}{T_s} \sum_{p \in \mathbf{p}_s} \sum_{\tau \in \boldsymbol{\tau}_s} \sum_{\pi=1}^{\Pi} \phi_{e_{si}\pi}\, \theta_{\pi p}\, \delta_{\pi\tau} \right] P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)\, d\Theta\, d\Delta\, d\Phi. \tag{5}$$

In (3), the factorization makes use of the conditional independence assumptions of the model. Meanwhile, the variables $x$ and $y$ are mutually independent. Equation (3) expresses the probability of the events $\mathbf{e}_s$ in terms of the entries of the parameter matrices $\Theta$, $\Phi$, and $\Delta$ introduced above. The probability distribution over working place assignments, $P(x_{si} = p \mid \mathbf{p}_s)$, is assumed to be uniform over the elements of $\mathbf{p}_s$ and deterministic if $P_s = 1$. Similarly, the probability distribution over working date assignments, $P(y_{si} = \tau \mid \boldsymbol{\tau}_s)$, is assumed to be uniform over the elements of $\boldsymbol{\tau}_s$ and deterministic if $T_s = 1$. The probability distributions over working mode assignments, $P(z_{si} = \pi \mid x_{si} = p, \Theta)$ and $P(z_{si} = \pi \mid y_{si} = \tau, \Delta)$, are the multinomial distributions $\theta_p$ in $\Theta$ and $\delta_\tau$ in $\Delta$ that correspond to working place $p$ and working date $\tau$, respectively. The probability of an event given a working mode assignment, $P(e_{si} \mid z_{si} = \pi, \Phi)$, is the multinomial distribution $\phi_\pi$ in $\Phi$ that corresponds to working mode $\pi$.

In (4) and (5), we treat $\Theta$, $\Phi$, and $\Delta$ as random variables and compute the marginal probability of the data set by integrating them out. Here $P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta) = P(\Theta \mid \alpha) P(\Delta \mid \gamma) P(\Phi \mid \beta)$, where the factors are the Dirichlet priors on $\Theta$, $\Delta$, and $\Phi$, respectively, as defined before.
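As a worked illustration of (3), the sketch below evaluates the logarithm of the per-work-cycle expression directly from the parameter matrices. It follows (3) literally, so the product of $\theta$ and $\delta$ is not renormalized over working modes. The function name and the 0-based list-of-lists layout (`theta[p][m]`, `delta[t][m]`, `phi[m][e]`) are assumptions made for this example.

```python
import math

def work_cycle_log_likelihood(events, places, dates, theta, delta, phi):
    """Log of expression (3) for one work cycle.

    events: event indices e_s; places: p_s; dates: tau_s (all 0-based).
    theta[p][m], delta[t][m]: working mode probabilities per place and date.
    phi[m][e]: probability of event e in working mode m.
    """
    num_modes = len(phi)
    norm = 1.0 / (len(places) * len(dates))  # (1 / P_s) * (1 / T_s)
    log_lik = 0.0
    for e in events:
        total = 0.0
        for p in places:
            for t in dates:
                for m in range(num_modes):
                    total += phi[m][e] * theta[p][m] * delta[t][m]
        log_lik += math.log(norm * total)
    return log_lik

# Two working modes, each emitting exactly one of two events.
log_lik = work_cycle_log_likelihood(
    events=[0, 1], places=[0], dates=[0],
    theta=[[0.5, 0.5]], delta=[[0.5, 0.5]],
    phi=[[1.0, 0.0], [0.0, 1.0]])
```

Working in log space avoids numerical underflow when $N_s$ is large, since the product over events in (3) becomes a sum.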

5. Inference of WCM from Data

The WCM contains three continuous random variables, $\Theta$, $\Delta$, and $\Phi$. Various approximate inference approaches have been proposed for estimating the posterior distribution of continuous random variables in hierarchical Bayesian models. In this paper, our inference method is Gibbs sampling [40], which is a special form of Markov chain Monte Carlo.

Our target of estimation is the posterior distribution $P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)$, where the dependence on the training data $D_{\text{train}}$ is left implicit. In order to sample from this distribution, we use the latent variables $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$:

$$P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta) = \sum_{\mathbf{x}, \mathbf{y}, \mathbf{z}} P(\Theta, \Delta, \Phi \mid \mathbf{x}, \mathbf{y}, \mathbf{z}, \alpha, \gamma, \beta)\, P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid \alpha, \gamma, \beta). \tag{6}$$

The estimation process involves two steps: first, we use Gibbs sampling to obtain an approximate posterior $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid \alpha, \gamma, \beta)$; second, $P(\Theta, \Delta, \Phi \mid \mathbf{x}, \mathbf{y}, \mathbf{z}, \alpha, \gamma, \beta)$ can be computed directly for each sample by exploiting the fact that the Dirichlet distribution is conjugate to the multinomial.

5.1. Gibbs Sampling. Using Gibbs sampling, we can generate a sample from the joint distribution $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid D_{\text{train}}, \alpha, \beta, \gamma)$ in two steps: first, sampling the working place assignment $x_{si}$, working date assignment $y_{si}$, and working mode assignment $z_{si}$ for an individual event $e_{si}$, conditioned on fixed assignments of working places, working dates, and working modes for all other events in the data set; second, repeating this process for each event. A single Gibbs sampling iteration consists of sequentially performing this sampling of working place, working date, and working mode assignments for each individual event in the data set.

$$
\begin{aligned}
&P(x_{si} = p, y_{si} = \tau, z_{si} = \pi \mid e_{si} = e, \mathbf{x}_{-si}, \mathbf{y}_{-si}, \mathbf{z}_{-si}, \mathbf{e}_{-si}, \mathcal{P}, \mathcal{T}, \alpha, \beta, \gamma) \\
&\quad \propto \frac{C^{E\Pi}_{e\pi,-si} + \beta}{\sum_{e'} C^{E\Pi}_{e'\pi,-si} + E\beta} \cdot \frac{C^{\Pi P}_{\pi p,-si} + \alpha}{\sum_{\pi'} C^{\Pi P}_{\pi' p,-si} + \Pi\alpha} \cdot \frac{C^{\Pi T}_{\pi\tau,-si} + \gamma}{\sum_{\pi'} C^{\Pi T}_{\pi'\tau,-si} + \Pi\gamma}.
\end{aligned} \tag{7}
$$

According to (1)–(5), we can derive the basic equation needed for the Gibbs sampler, as shown in (7). In (7), $C^{\Pi P}$ is the working-mode-by-working-place count matrix, where $C^{\Pi P}_{\pi p,-si}$ is the number of events assigned to working mode $\pi$ in working place $p$, excluding the assignment of event $e_{si}$. Similarly, $C^{\Pi T}$ is the working-mode-by-working-date count matrix, where $C^{\Pi T}_{\pi\tau,-si}$ is the number of events assigned to working mode $\pi$ on working date $\tau$, excluding the assignment of event $e_{si}$. Likewise, $C^{E\Pi}$ is the event-by-working-mode count matrix, where $C^{E\Pi}_{e\pi,-si}$ is the number of occurrences of the $e$th entry in the event set assigned to working mode $\pi$, excluding the working mode assignment of event $e_{si}$. Finally, $\mathbf{x}_{-si}$, $\mathbf{y}_{-si}$, $\mathbf{z}_{-si}$, and $\mathbf{e}_{-si}$ represent the vectors of working place assignments, working date assignments, working mode assignments, and event observations in the data set, respectively, except for the $i$th event in the $s$th work cycle.

The main sampling steps are as follows: we first initialize the working place, working date, and working mode assignments $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ randomly. In each Gibbs sampling iteration, we sequentially draw the working mode, working place, and working date assignment of each event from the joint conditional distribution in (7). As the number of iterations increases, the Gibbs sampler approaches its stationary distribution, the posterior distribution $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid D_{\text{train}}, \alpha, \beta, \gamma)$.
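The sampling steps just described can be sketched as a small collapsed Gibbs sampler. This is an illustrative sketch, not the authors' code: the names `initialize` and `gibbs_sweep` are hypothetical, indices are 0-based, and the working place and working date factors are normalized over working modes (with constants $\Pi\alpha$ and $\Pi\gamma$), which is an assumption consistent with $\theta_p$ and $\delta_\tau$ being distributions over working modes. The column sums are recomputed on every step for clarity rather than speed.

```python
import random

def initialize(rng, cycles, num_events, num_modes, num_places, num_dates):
    """Randomly assign (x, y, z) to every event and build the count matrices."""
    c_em = [[0] * num_modes for _ in range(num_events)]  # event x mode
    c_mp = [[0] * num_places for _ in range(num_modes)]  # mode x place
    c_mt = [[0] * num_dates for _ in range(num_modes)]   # mode x date
    x, y, z = [], [], []
    for events, places, dates in cycles:
        xs = [rng.choice(places) for _ in events]
        ys = [rng.choice(dates) for _ in events]
        zs = [rng.randrange(num_modes) for _ in events]
        for e, p, t, m in zip(events, xs, ys, zs):
            c_em[e][m] += 1
            c_mp[m][p] += 1
            c_mt[m][t] += 1
        x.append(xs)
        y.append(ys)
        z.append(zs)
    return x, y, z, c_em, c_mp, c_mt

def gibbs_sweep(rng, cycles, x, y, z, c_em, c_mp, c_mt, alpha, beta, gamma):
    """One in-place Gibbs iteration: resample (x, y, z) for every event."""
    num_events, num_modes = len(c_em), len(c_mp)
    for s, (events, places, dates) in enumerate(cycles):
        for i, e in enumerate(events):
            # Exclude the current assignment of event (s, i) from the counts.
            c_em[e][z[s][i]] -= 1
            c_mp[z[s][i]][x[s][i]] -= 1
            c_mt[z[s][i]][y[s][i]] -= 1
            candidates, weights = [], []
            for m in range(num_modes):
                f_e = (c_em[e][m] + beta) / (
                    sum(row[m] for row in c_em) + num_events * beta)
                for p in places:
                    f_p = (c_mp[m][p] + alpha) / (
                        sum(c_mp[k][p] for k in range(num_modes))
                        + num_modes * alpha)
                    for t in dates:
                        f_t = (c_mt[m][t] + gamma) / (
                            sum(c_mt[k][t] for k in range(num_modes))
                            + num_modes * gamma)
                        candidates.append((p, t, m))
                        weights.append(f_e * f_p * f_t)
            p, t, m = rng.choices(candidates, weights=weights)[0]
            x[s][i], y[s][i], z[s][i] = p, t, m
            c_em[e][m] += 1
            c_mp[m][p] += 1
            c_mt[m][t] += 1

rng = random.Random(1)
# Each work cycle is (events e_s, working places p_s, working dates tau_s).
cycles = [([0, 1, 1, 2], [0, 1], [0]), ([2, 2, 0], [1], [0, 1])]
state = initialize(rng, cycles, num_events=3, num_modes=2,
                   num_places=2, num_dates=2)
for _ in range(5):
    gibbs_sweep(rng, cycles, *state, alpha=0.5, beta=0.01, gamma=0.5)
```

Each sweep leaves the total number of counted events unchanged, because every event is removed from the count matrices once and added back once.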

5.2. The Posterior Probability. Given $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$, $D_{\text{train}}$, $\alpha$, $\beta$, and $\gamma$, computing posterior distributions on $\Theta$, $\Delta$, and $\Phi$ is straightforward. Because the Dirichlet distribution is conjugate to the multinomial distribution, we obtain

$$
\begin{aligned}
\phi_\pi \mid \mathbf{z}, \beta, D_{\text{train}} &\sim \text{Dirichlet}\left(C^{E\Pi}_{\pi} + \beta\right), \\
\theta_p \mid \mathbf{x}, \mathbf{z}, \alpha, D_{\text{train}} &\sim \text{Dirichlet}\left(C^{\Pi P}_{p} + \alpha\right), \\
\delta_\tau \mid \mathbf{y}, \mathbf{z}, \gamma, D_{\text{train}} &\sim \text{Dirichlet}\left(C^{\Pi T}_{\tau} + \gamma\right),
\end{aligned} \tag{8}
$$

where $C^{E\Pi}_{\pi}$ represents the vector of counts of the number of times each event has been assigned to working mode $\pi$; $C^{\Pi P}_{p}$ and $C^{\Pi T}_{\tau}$ are defined analogously. Then we can evaluate the posterior mean of each element of $\Theta$, $\Delta$, and $\Phi$ as follows:

$$
\begin{aligned}
E\left[\phi_{e\pi} \mid \mathbf{z}^k, \beta, D_{\text{train}}\right] &= \frac{(C^{E\Pi}_{e\pi})^k + \beta}{\sum_{e'} (C^{E\Pi}_{e'\pi})^k + E\beta}, \\
E\left[\theta_{\pi p} \mid \mathbf{x}^k, \mathbf{z}^k, \alpha, D_{\text{train}}\right] &= \frac{(C^{\Pi P}_{\pi p})^k + \alpha}{\sum_{\pi'} (C^{\Pi P}_{\pi' p})^k + \Pi\alpha}, \\
E\left[\delta_{\pi\tau} \mid \mathbf{y}^k, \mathbf{z}^k, \gamma, D_{\text{train}}\right] &= \frac{(C^{\Pi T}_{\pi\tau})^k + \gamma}{\sum_{\pi'} (C^{\Pi T}_{\pi'\tau})^k + \Pi\gamma},
\end{aligned} \tag{9}
$$

where $(C^{E\Pi})^k$ is the matrix of event-by-working-mode counts exhibited in $(\mathbf{z})^k$ and $k$ refers to sample $k$ from the Gibbs sampler. These posterior means also provide point estimates for $\Phi$, $\Theta$, and $\Delta$ and correspond to the posterior predictive distribution for the next event from a working mode, the next working mode for an event in a working place, and the next working mode for an event on a working date, respectively.

Figure 4: The stream of the concrete in the concrete pump truck and the operation sequence of the concrete pump truck at runtime. (Diagram components: hopper, transportation cylinder, stirring system, pumping system, landing leg system, cantilever system, concrete, specified location; legend: concrete stream, operation sequence, related system.)
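The point estimates for $\Phi$, $\Theta$, and $\Delta$ can be sketched as a smoothing-and-normalizing step over the count matrices of one Gibbs sample. The function name `posterior_means` and the list-of-lists layout are hypothetical, and each column is normalized over its own dimension (events for $\Phi$, working modes for $\Theta$ and $\Delta$), which is the standard posterior mean of a symmetric Dirichlet and is assumed here.

```python
def posterior_means(c_em, c_mp, c_mt, alpha, beta, gamma):
    """Posterior-mean estimates of Phi, Theta, and Delta from one sample.

    c_em[e][m], c_mp[m][p], c_mt[m][t] are the count matrices of that sample.
    Returns phi[m][e], theta[p][m], delta[t][m]; each inner list sums to 1.
    """
    num_events, num_modes = len(c_em), len(c_mp)

    def smoothed_columns(counts, num_rows, prior):
        # Add the symmetric Dirichlet prior, then normalize each column.
        columns = []
        for j in range(len(counts[0])):
            total = sum(counts[i][j] for i in range(num_rows)) + num_rows * prior
            columns.append([(counts[i][j] + prior) / total
                            for i in range(num_rows)])
        return columns

    phi = smoothed_columns(c_em, num_events, beta)    # one row per working mode
    theta = smoothed_columns(c_mp, num_modes, alpha)  # one row per working place
    delta = smoothed_columns(c_mt, num_modes, gamma)  # one row per working date
    return phi, theta, delta

# Toy counts: 2 events, 2 working modes, 1 working place, 1 working date.
phi, theta, delta = posterior_means(
    c_em=[[3, 0], [1, 4]], c_mp=[[2], [6]], c_mt=[[4], [4]],
    alpha=1.0, beta=1.0, gamma=0.5)
```

In practice such estimates would typically be averaged over several well-separated Gibbs samples; a single sample is used here only to keep the example small.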

6. Experimental Evaluation

6.1. Data Preparation. We trained the WCM on a real-world data set collected from a well-known Chinese construction machinery manufacturer. The data set consists of event sequence data from concrete pump trucks over 6 months (from June 2012 to November 2012). It contains $S = 32{,}632$ work cycles, $P = 5$ different working places, $T = 6$ different working dates, a total of $N = 22{,}418{,}756$ event tokens, and an event set of $E = 33$ unique events. The working date of each work cycle is its actual working month, which means the working date set is $\mathcal{T} = \{\text{Jun}, \text{Jul}, \text{Aug}, \text{Sep}, \text{Oct}, \text{Nov}\}$. Because the event sequence data were all collected in the Chinese Mainland, we divide the working places into 5 regions according to the administrative regions of China: Northern China, Northeastern China, Eastern China, Mid-Southern China, and Western China.

The concrete pump truck is a type of constructionmachinery which is a truck associated with a concrete pumpIt alternates between two working statuses traveling andpumping In the pumping status it will push the concreteto the specified location In the traveling status it is just atruck In the experiment we mainly focus on events in thepumping status Figure 4 shows the stream of the concrete inthe concrete pump truck at runtime and operation sequenceof different systems in the concrete pump truckThe concretepump truck first switches to pumping status and then unfoldsand fixes the landing leg Next it unfolds cantilever tothe specified location Afterwards the concrete is pouredto the hopper and meanwhile the stirring system initiatesstirring the concrete Finally the pumping system initiatespumping the concrete in the hopper to the specified locationWhen the pumping ends the concrete pump truck stops thepumping system and then folds the cantilever and landing leg

8 Mathematical Problems in Engineering

Table 1: Event set.

| Event | Abbr. | Type | Related system |
|---|---|---|---|
| Stop pumping mandatorily | SPM | Alarm event | All |
| Reminder of concrete import | RCI | Alarm event | Hopper |
| Concrete piston withdrawing | CPW | Alarm event | Pumping system |
| Reminder of concrete cylinder water | RCSW | Alarm event | Hopper |
| Swing cylinder initiate | SCI | Operation event | Pumping system |
| Stalling of engine | SoE | Alarm event | All |
| Alteration of operation mode (remote or close) | AOM | Operation event | Pumping system |
| Alteration of pump truck status (pumping or travelling) | APTS | Operation event | All |
| Control of pumping displacement | CPD | Operation event | Pumping system |
| Transportation cylinder initiate | TCI | Operation event | Pumping system |
| Manual control of master cylinder | MCMC | Operation event | Pumping system |
| Manual control of swing cylinder | MCSC | Operation event | Pumping system |
| Detection of system pressure | DSP | Alarm event | Pumping system |
| Manual control of engine speed | MCES | Operation event | Pumping system |
| High pressure mode initiate | HPMI | Operation event | Pumping system |
| Warm-up initiate | WUI | Operation event | Pumping system |
| Water pump initiate | WPI | Operation event | Hopper |
| Concrete stirring initiate | CSI | Operation event | Stirring system |
| Cantilever folding initiate | CFI | Operation event | Cantilever system |
| Temperature control initiate | TCI | Operation event | Pumping system |
| Cantilever movement | CM | Operation event | Cantilever system |
| Landing leg movement | LLM | Operation event | Landing leg system |
| Detection of oil pressure | DOP | Alarm event | Pumping system |
| Landing leg folding | LLF | Operation event | Landing leg system |
| Rotary table movement | RTM | Operation event | Cantilever system |
| Oil pump initiate | OPI | Operation event | Pumping system |
| Energy accumulator initiate | EAI | Operation event | Pumping system |
| Bypath valve initiate | BVI | Operation event | Pumping system |
| Concrete pumping initiate | CPI | Operation event | Pumping system |
| Master cylinder initiate | MCI | Operation event | Pumping system |
| Cantilever shock absorbers initiate | CSAI | Alarm event | Cantilever system |
| Initiate of system cooling | ISC | Operation event | Pumping system |
| Hydraulic oil supplement | HOS | Operation event | Pumping system |

successively. Table 1 shows the relations between systems and events in the concrete pump truck.

Table 1 lists all the events in the event set. There are two types of events: alarm events and operation events. An alarm event reminds the operator that an emergency has occurred. For example, the occurrence of event RCI reminds the operator to import concrete into the hopper. An alarm event is not a regular operation, whereas an operation event is the actual record of a regular operation on the concrete pump truck.

6.2. Analysis for Gibbs Sampling Using Perplexity. As mentioned earlier, in the experiments described in this paper we do not estimate the hyperparameters $\alpha$, $\beta$, and $\gamma$. Instead, they are fixed at $50/\Pi$, $0.01$, and $50/\Pi$, respectively. We use the perplexity of the model on test work cycles to evaluate when the performance of the model begins to stabilize.

The perplexity of a new, unobserved work cycle $s$ that contains events $\mathbf{e}_s$, conditioned on the working places $\mathbf{p}_s$ and working dates $\boldsymbol{\tau}_s$ of the work cycle, is defined as

$$\mathrm{Perplexity}(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s) = \exp\left(-\frac{\log P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)}{N_s}\right), \tag{10}$$

where $P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)$ is the probability assigned by the WCM. To simplify notation, we do not make the dependency on the hyperparameters explicit here. For multiple work cycles, we report the average perplexity over work cycles, defined as follows:

$$\mathrm{Perplexity} = \frac{\sum_{s=1}^{S} \mathrm{Perplexity}(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)}{S}. \tag{11}$$
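As a minimal sketch of (10) and (11), assuming the log-probability $\log P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)$ of each test work cycle has already been computed by the model, the two quantities could be evaluated as follows. The function names are hypothetical and are not taken from the original work.

```python
import math
from typing import Sequence


def work_cycle_perplexity(log_prob: float, n_events: int) -> float:
    """Perplexity of one work cycle as in (10): exp(-log P / N_s)."""
    if n_events <= 0:
        raise ValueError("a work cycle must contain at least one event")
    return math.exp(-log_prob / n_events)


def average_perplexity(log_probs: Sequence[float], n_events: Sequence[int]) -> float:
    """Average of the per-cycle perplexities over S work cycles as in (11)."""
    if len(log_probs) != len(n_events) or not log_probs:
        raise ValueError("need one log-probability and one length per work cycle")
    per_cycle = [work_cycle_perplexity(lp, n) for lp, n in zip(log_probs, n_events)]
    return sum(per_cycle) / len(per_cycle)
```

For intuition, a model that assigns every event of a 10-event work cycle probability 1/4 has perplexity 4 on that work cycle, which is why lower values indicate a better fit.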

The lower the perplexity, the better the performance of the model. We can obtain an approximate estimate of perplexity by averaging over multiple samples according to (9), calculated as follows:

$$P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s) = \frac{1}{K} \sum_{k=1}^{K} \prod_{i=1}^{N_s} \left[ \frac{1}{P_s T_s} \sum_{p \in \mathbf{p}_s,\, \tau \in \boldsymbol{\tau}_s,\, \pi} E\left[\theta_{\pi p}\, \delta_{\pi \tau}\, \phi_{e_{si} \pi} \mid \mathbf{x}^k, \mathbf{y}^k, \mathbf{z}^k\right] \right]. \tag{12}$$

Figure 5: Perplexity as a function of iterations of the Gibbs sampler for a $\Pi = 200$ model. Each curve shows the perplexity averaged over a different number of samples ($K$ = 2, 4, 6, 8, and 10). Axes: iteration (0 to 200) versus perplexity (4500 to 5300).

Experimental results using different values of $K$ indicated that $K = 10$ samples is a reasonable choice for obtaining a good approximation of the perplexity. Because of the exchangeability of the working modes, it is possible that quite different solutions of working modes are detected across different samples. In practice, however, we have found that the solutions of working modes are relatively stable across samples, with only a small subset of unique working modes appearing in any one sample. Hence, we use the average perplexity across samples in the experiment.
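The estimate in (12) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: each Gibbs sample is summarized by point estimates of $\Theta$, $\Delta$, and $\Phi$, the expectation in (12) is approximated by the product of those point estimates, indices are 0-based, and the function name is hypothetical. The average over samples is taken in log space to avoid underflow for long work cycles.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
# One entry per Gibbs sample: (theta[mode][place], delta[mode][date], phi[event][mode]).
Sample = Tuple[Matrix, Matrix, Matrix]


def cycle_log_prob(
    events: Sequence[int],
    places: Sequence[int],
    dates: Sequence[int],
    samples: Sequence[Sample],
) -> float:
    """Log of the K-sample average in (12) for one work cycle."""
    norm = len(places) * len(dates)  # P_s * T_s
    per_sample = []
    for theta, delta, phi in samples:
        n_modes = len(theta)
        log_p = 0.0
        for e in events:
            token = sum(
                theta[m][p] * delta[m][t] * phi[e][m]
                for p in places
                for t in dates
                for m in range(n_modes)
            ) / norm
            log_p += math.log(token)
        per_sample.append(log_p)
    # log of the mean of exp(per_sample), computed stably (log-mean-exp).
    top = max(per_sample)
    return top + math.log(sum(math.exp(v - top) for v in per_sample) / len(per_sample))
```

The returned value is the log-probability that enters the numerator of (10).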

Figure 5 illustrates the perplexity as a function of iterations of the Gibbs sampler for a $\Pi = 200$ model fitted to the data set. The performance of the models trained using the Gibbs sampler (for different settings of the parameter $K$) stabilizes rather quickly, at least in terms of perplexity on the data set: the perplexity values flatten out after 100 or so iterations.

6.3. The Number of Working Modes $\Pi$. Although the perplexity computation can be averaged over different Gibbs sampler runs, other applications of the model rely on the analysis of each working mode and are therefore based on the analysis of individual samples. The parameter $\Pi$, which represents the number of working modes, is also set according to the perplexity.

Figure 6: Perplexity as a function of the parameter $\Pi$ of the Gibbs sampler for $K = 10$ samples. Axes: number of working modes $\Pi$ (0 to 300) versus perplexity (4450 to 4950).

Figure 6 illustrates the perplexity as a function of the parameter $\Pi$ for $K = 10$ Gibbs samples. The average perplexity over the data set decreases as $\Pi$ increases, which the experimental results confirm. In particular, the perplexity values flatten out once $\Pi$ reaches 200, which indicates that $\Pi = 200$ is a suitable setting for this data set.
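One way to make the "flattening out" criterion explicit is sketched below. The relative tolerance `rel_tol` and the function name are assumptions introduced for illustration; the paper selects $\Pi$ by inspecting the perplexity curve and does not state a numerical rule.

```python
from typing import Sequence


def smallest_plateau_setting(
    settings: Sequence[int],
    perplexities: Sequence[float],
    rel_tol: float = 0.01,
) -> int:
    """Return the smallest setting after which every further step lowers
    perplexity by at most rel_tol (relative to the earlier value)."""
    if len(settings) != len(perplexities) or not settings:
        raise ValueError("need one perplexity per candidate setting")
    pairs = sorted(zip(settings, perplexities))
    for idx in range(len(pairs)):
        tail = pairs[idx:]
        # Compare each remaining setting with its successor.
        flat = all(
            (earlier[1] - later[1]) <= rel_tol * earlier[1]
            for earlier, later in zip(tail, tail[1:])
        )
        if flat:
            return pairs[idx][0]
    return pairs[-1][0]
```

With a curve that drops steeply and then levels off, the function returns the first setting on the level part; if the curve never levels off, it returns the largest candidate.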

6.4. Analysis of the WCM Results. To analyze the WCM results, we can use point estimates of the WCM parameters to examine specific $\Theta$, $\Delta$, and $\Phi$ distributions and related quantities that can be derived from these parameters (such as the probability of a working place and a working date given a randomly selected event from a working mode). In the following results, we take a specific sample $\mathbf{x}^k$, $\mathbf{y}^k$, and $\mathbf{z}^k$ after 100 iterations from a single, arbitrarily selected Gibbs run and then generate point estimates of $\Theta$, $\Delta$, and $\Phi$ using (9).

There are 200 working modes in total ($\Pi = 200$). Describing each working mode by a WMV helps us better understand the occurrences of events. For the sake of analysis, Table 2 lists the highest-probability working mode for each working place and each working date from the WCM. For each of these working modes, we list the top 10 events most likely to be generated, conditioned on both the working place and the working date. For example, in the working place of Northern China and the working date of June, the most likely working mode is number 101 of the 200 working modes, and its top 10 events (OPI, SPM, EAI, HOS, BVI, MCI, AOM, APTS, CPD, and TCI) are the most likely to occur.
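A listing of this kind could be derived from the point estimates roughly as follows. This is a sketch under an assumption that is not stated in the paper: the score of working mode $\pi$ for a given working place $p$ and working date $\tau$ is taken to be proportional to $\theta_{\pi p}\,\delta_{\pi\tau}$. The function names and 0-based indexing are likewise hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def most_likely_mode(theta: Matrix, delta: Matrix, place: int, date: int) -> Tuple[int, float]:
    """Best mode for (place, date) and its normalized score.

    theta[mode][place] and delta[mode][date] are point estimates; the product
    rule used here is an assumption made for this illustration.
    """
    scores = [theta[m][place] * delta[m][date] for m in range(len(theta))]
    total = sum(scores)
    if total <= 0.0:
        raise ValueError("all modes have zero score for this place and date")
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best] / total


def top_events(phi: Matrix, mode: int, names: Sequence[str], k: int = 10) -> List[str]:
    """Names of the k events with the highest probability phi[event][mode]."""
    ranked = sorted(range(len(phi)), key=lambda e: phi[e][mode], reverse=True)
    return [names[e] for e in ranked[:k]]
```

Calling `most_likely_mode` once per (place, date) pair and then `top_events` on the returned mode yields one table row per pair.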

Table 2: The highest probability working mode for each working place and each working date from the WCM.

| Working place | Working date | Probability | Working mode | Events |
|---|---|---|---|---|
| Northern China | Jun | 0.0251 | 101 | OPI, CM, EAI, RTM, BVI, SPM, AOM, APTS, CPD, and TCI |
| Northern China | Jul | 0.0341 | 164 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, and TCI |
| Northern China | Aug | 0.0051 | 62 | LLF, CFI, APTS, CSI, AOM, ISC, RTM, SPM, CPD, and TCI |
| Northern China | Sep | 0.0342 | 12 | OPI, RTM, CM, CPI, BVI, LLF, SPM, AOM, APTS, and CPD |
| Northern China | Oct | 0.0351 | 49 | RTM, OPI, CM, BVI, MCI, SPM, AOM, APTS, CPD, and TCI |
| Northern China | Nov | 0.0353 | 129 | OPI, ISC, SPM, EAI, CSI, APTS, AOM, CPD, TCI, and MCMC |
| Northeastern China | Jun | 0.0258 | 176 | OPI, SPM, EAI, HOS, BVI, MCI, AOM, APTS, CPD, and TCI |
| Northeastern China | Jul | 0.0263 | 29 | OPI, LLF, ISC, SPM, CFI, APTS, CPI, HOS, AOM, and CPD |
| Northeastern China | Aug | 0.0141 | 71 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, and TCI |
| Northeastern China | Sep | 0.0114 | 111 | RTM, BVI, OPI, CM, MCI, HOS, EAI, SPM, AOM, and APTS |
| Northeastern China | Oct | 0.0146 | 69 | ISC, LLF, CSI, AOM, APTS, OPI, CFI, SPM, CPD, and TCI |
| Northeastern China | Nov | 0.0257 | 93 | RTM, OPI, BVI, MCI, CM, CPI, SPM, AOM, APTS, and CPD |
| Eastern China | Jun | 0.0279 | 177 | OPI, HOS, CPI, SPM, LLF, RTM, EAI, BVI, AOM, and APTS |
| Eastern China | Jul | 0.0201 | 72 | OPI, EAI, CPI, SPM, MCI, RTM, HOS, AOM, APTS, and CPD |
| Eastern China | Aug | 0.0277 | 87 | OPI, BVI, EAI, RTM, AOM, SPM, MCI, APTS, CPD, and TCI |
| Eastern China | Sep | 0.0274 | 9 | OPI, EAI, BVI, RTM, HOS, SPM, AOM, APTS, CPD, and TCI |
| Eastern China | Oct | 0.0214 | 191 | RTM, MCI, CPI, CM, EAI, OPI, HOS, SPM, AOM, and APTS |
| Eastern China | Nov | 0.0255 | 170 | OPI, MCI, BVI, RTM, CPI, HOS, SPM, AOM, APTS, and CPD |
| Mid-Southern China | Jun | 0.0122 | 74 | OPI, EAI, CSI, CPI, ISC, MCI, SPM, AOM, APTS, and CPD |
| Mid-Southern China | Jul | 0.0177 | 33 | OPI, CPI, CM, MCI, HOS, SPM, AOM, APTS, CPD, and TCI |
| Mid-Southern China | Aug | 0.0262 | 187 | HOS, MCI, CPI, OPI, EAI, BVI, CSI, SPM, AOM, and APTS |
| Mid-Southern China | Sep | 0.0205 | 104 | RTM, EAI, BVI, OPI, SPM, MCI, CFI, APTS, AOM, and CPD |
| Mid-Southern China | Oct | 0.0193 | 39 | OPI, HOS, BVI, CM, RTM, SPM, AOM, APTS, CPD, and TCI |
| Mid-Southern China | Nov | 0.0133 | 158 | OPI, BVI, RTM, MCI, CM, SPM, AOM, APTS, CPD, and TCI |
| Western China | Jun | 0.0037 | 4 | OPI, RTM, BVI, CM, EAI, SPM, CPI, MCI, AOM, and APTS |
| Western China | Jul | 0.0134 | 144 | HOS, MCI, CPI, OPI, CFI, EAI, SPM, AOM, APTS, and CPD |
| Western China | Aug | 0.0126 | 126 | OPI, SPM, CM, BVI, AOM, LLF, APTS, CSI, CPD, and TCI |
| Western China | Sep | 0.0122 | 88 | OPI, HOS, CPI, CM, LLF, AOM, CFI, MCI, BVI, and SPM |
| Western China | Oct | 0.0104 | 37 | OPI, EAI, MCI, HOS, CSI, ISC, CFI, LLF, SPM, and AOM |
| Western China | Nov | 0.0135 | 78 | OPI, HOS, RTM, BVI, CSI, EAI, MCI, APTS, AOM, and SPM |

Experimental results show that different working places have different working modes even for the same working date, and that the same working place has different working modes for different working dates. This indicates that the working mode is indeed related to the working place and working date. Events related to the pumping system, such as OPI, MCI, and CPI, are likely to occur in most working modes, which indicates that the working modes of the concrete pump truck are consistent with the actual situation. Meanwhile, events related to the cantilever system and landing leg system, such as LLF and CFI, occur less often than events of the pumping system. Moreover, in summer (working date = June, July, and August), the alarm event SPM is more likely to occur, which indicates that the concrete pump truck is more likely to fail in a hot climate. The operation event AOM is also more likely to occur, which indicates that operators prefer to operate the concrete pump truck remotely.

Because the probability of a working mode reflects the probability of its occurrence, we can analyze the workloads of different working places on different working dates. According to the probabilities of the working modes in Table 2, the working modes in Eastern China are more likely to occur than those in Western China. This indicates that the concrete pump trucks in Eastern China have heavier workloads than those in Western China. Meanwhile, the concrete pump trucks have heavier workloads in June than in November. In general, we can analyze different working modes according to their probabilities.

6.5. Illustrative Applications for the WCM. In this section, we provide some illustrative examples of how the WCM can be used to answer different types of questions and prediction problems concerning the working modes of the equipment.


6.5.1. Automated Detection for a New Work Cycle. In real cases, we would like to quickly assess working mode assignments for new work cycles that are not contained in the training data set, especially for a real-time event sequence flow.

Our automated detection strategy is to apply the Gibbs sampling algorithm only to the event tokens in the new work cycle, instead of rerunning the algorithm on the whole data set for every new work cycle. The event tokens in the new work cycle are then quickly assigned to the most likely working places, working dates, and working modes. The main procedure is as follows: first, we assign events randomly to working places, working dates, and working modes; second, we sample new assignments of events by applying the Gibbs sampler only to the event tokens in the new work cycle, each time temporarily updating the count matrices $C^{E\Pi}$, $C^{\Pi P}$, and $C^{\Pi T}$ shown in (7).
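The two-step procedure above can be sketched as follows. Since (7) is not reproduced in this section, the sampling weights below assume a standard collapsed-Gibbs form built from the three count matrices and the Dirichlet hyperparameters; that form, the matrix layouts, the 0-based indices, and the function name are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Counts = List[List[float]]


def fold_in_work_cycle(
    events: Sequence[int],
    places: Sequence[int],
    dates: Sequence[int],
    c_em: Counts,  # assumed layout: c_em[event][mode]
    c_mp: Counts,  # assumed layout: c_mp[mode][place]
    c_mt: Counts,  # assumed layout: c_mt[mode][date]
    alpha: float,
    beta: float,
    gamma: float,
    iterations: int = 50,
    seed: int = 0,
) -> List[Tuple[int, int, int]]:
    """Return one (place, date, mode) assignment per event token of a new work cycle."""
    rng = random.Random(seed)
    n_event_types, n_modes = len(c_em), len(c_mp)
    # Work on copies so the training counts are only updated temporarily.
    c_em = [row[:] for row in c_em]
    c_mp = [row[:] for row in c_mp]
    c_mt = [row[:] for row in c_mt]
    mode_total = [sum(c_em[e][m] for e in range(n_event_types)) for m in range(n_modes)]
    place_total = [sum(c_mp[m][p] for m in range(n_modes)) for p in range(len(c_mp[0]))]
    date_total = [sum(c_mt[m][t] for m in range(n_modes)) for t in range(len(c_mt[0]))]

    def bump(e: int, p: int, t: int, m: int, step: int) -> None:
        c_em[e][m] += step
        mode_total[m] += step
        c_mp[m][p] += step
        place_total[p] += step
        c_mt[m][t] += step
        date_total[t] += step

    candidates = [(p, t, m) for p in places for t in dates for m in range(n_modes)]
    # Step 1: random initial assignment of every token.
    assignments = []
    for e in events:
        choice = rng.choice(candidates)
        bump(e, *choice, 1)
        assignments.append(choice)
    # Step 2: resample only the tokens of the new work cycle.
    for _ in range(iterations):
        for i, e in enumerate(events):
            bump(e, *assignments[i], -1)
            weights = [
                (c_em[e][m] + beta) / (mode_total[m] + n_event_types * beta)
                * (c_mp[m][p] + alpha) / (place_total[p] + n_modes * alpha)
                * (c_mt[m][t] + gamma) / (date_total[t] + n_modes * gamma)
                for p, t, m in candidates
            ]
            assignments[i] = rng.choices(candidates, weights=weights)[0]
            bump(e, *assignments[i], 1)
    return assignments
```

Because only the tokens of the new work cycle are resampled, the cost per sweep depends on the length of that work cycle and not on the size of the training data set.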

Table 3 shows the occurrences of events for a new work cycle. After the sampling, the WCM has assigned each event to its most likely working mode. Table 3 lists the top 3 most likely working modes assigned to each event for the new work cycle. Note that each event is assigned to different working modes according to its occurrence count. According to (7), although the events of this new work cycle are assigned to different working modes, the work cycle as a whole is assigned to working mode number 107 with probability 0.0003. The top 10 most likely events in working mode number 107 are as follows:

RTM, CM, OPI, BVI, SPM, CPI, MCI, SCI, ISC, and SoE.

Compared with the real occurrences of events, the automated detection result for the new work cycle is consistent with the actual situation.

6.5.2. Automated Detection of Anomalous Work Cycles. In this section, we illustrate how our model can be used to detect anomalous work cycles. A work cycle assigned to a working mode with low probability is considered an anomalous work cycle.

We again take the work cycle shown in Table 3 as an example. The work cycle is assigned to working mode number 107 with probability 0.0003. Compared with most other working modes, working mode number 107 has a lower probability, so this work cycle is detected as an anomalous work cycle. The alarm events SPM and SoE occur frequently both in the work cycle and in working mode number 107, which also indicates that this work cycle is anomalous. We analyzed the real failure records and confirmed that the engine indeed failed frequently during this work cycle. In general, such anomalous work cycles can be detected automatically and efficiently with the help of the WCM.
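The rule "low probability compared with most other working modes" can be made concrete in several ways. The sketch below flags a work cycle when the probability of its assigned working mode falls below a low quantile of a set of reference probabilities; the quantile value and the function name are assumptions, as the paper does not give a numerical threshold.

```python
from typing import Sequence


def is_anomalous(
    mode_probability: float,
    reference_probabilities: Sequence[float],
    quantile: float = 0.05,
) -> bool:
    """Flag a work cycle whose working-mode probability is below the chosen
    quantile of the reference working-mode probabilities."""
    if not reference_probabilities:
        raise ValueError("need at least one reference probability")
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must lie in [0, 1]")
    ranked = sorted(reference_probabilities)
    cutoff = ranked[int(quantile * (len(ranked) - 1))]
    return mode_probability < cutoff
```

For instance, with the six Northern China probabilities from Table 2 as the reference set, a working mode with probability 0.0003 falls below the cutoff and is flagged.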

Table 3: Actual example of automated detection for a new work cycle (working date = Jun; working place = Eastern China). Each event is assigned to its most likely working mode according to its corresponding occurrence count. The table lists the top 3 most likely working modes for each event for the new work cycle.

| Event | Count | First | Second | Third |
|---|---|---|---|---|
| SPM | 72 | 107 | 181 | 112 |
| AOM | 33 | 169 | 67 | 183 |
| APTS | 23 | 90 | 15 | 76 |
| CPD | 42 | 145 | 139 | 59 |
| TCI | 2 | 118 | 134 | 112 |
| MCMC | 0 | Null | Null | Null |
| MCSC | 0 | Null | Null | Null |
| DSP | 0 | Null | Null | Null |
| MCES | 0 | Null | Null | Null |
| HPMI | 0 | Null | Null | Null |
| WUI | 2 | 159 | 104 | 77 |
| WPI | 23 | 54 | 175 | 71 |
| CSI | 55 | 147 | 29 | 61 |
| CFI | 25 | 2 | 132 | 100 |
| TCI | 23 | 95 | 185 | 53 |
| CM | 127 | 12 | 49 | 192 |
| LLM | 55 | 189 | 114 | 23 |
| RCI | 0 | Null | Null | Null |
| DOP | 0 | Null | Null | Null |
| LLF | 40 | 111 | 10 | 42 |
| RTM | 297 | 191 | 104 | 52 |
| CPW | 0 | Null | Null | Null |
| RCSW | 0 | Null | Null | Null |
| OPI | 95 | 177 | 176 | 101 |
| EAI | 56 | 126 | 100 | 170 |
| BVI | 77 | 177 | 53 | 146 |
| CPI | 60 | 164 | 104 | 149 |
| MCI | 60 | 177 | 175 | 162 |
| SCI | 66 | 120 | 149 | 73 |
| CSAI | 0 | Null | Null | Null |
| ISC | 51 | 68 | 149 | 23 |
| HOS | 0 | Null | Null | Null |
| SoE | 33 | 119 | 112 | 107 |

7. Conclusions and Future Work

The working condition model proposed in this paper provides a relatively simple probabilistic model for exploring the relationships between working place, working date, working mode, and events in a work cycle. This model provides significantly improved predictive power in terms of the analysis of working condition according to the event sequence data.

Our future work mainly includes optimizing the model and its training and conducting experiments on different data sets. Further analysis of the anomalous work cycles detected by our model is also an interesting question.

Notations Associated with the WCM As Used in This Paper

$\mathcal{P}$: Working places of all the work cycles (set)
$\mathcal{T}$: Working dates of all the work cycles (set)
$\mathbf{p}_s$: Working places of the $s$th work cycle ($P_s$-dimensional vector)
$P_s$: Number of working places of the $s$th work cycle (scalar)
$\boldsymbol{\tau}_s$: Working dates of the $s$th work cycle ($T_s$-dimensional vector)
$T_s$: Number of working dates of the $s$th work cycle (scalar)
$P$: Number of working places (scalar)
$S$: Number of work cycles (scalar)
$T$: Number of working dates (scalar)
$N_s$: Number of events in the $s$th work cycle (scalar)
$N$: Number of events in all the event sequences (scalar)
$\Pi$: Number of working modes (scalar)
$E$: Number of events in the event set (scalar)
$\mathbf{e}_s$: Event sequence vector for the $s$th work cycle ($N_s$-dimensional vector)
$e_{si}$: $i$th event in the $s$th work cycle ($i$th component of vector $\mathbf{e}_s$)
$\mathbf{x}$: Working place assignments ($N$-dimensional vector)
$x_{si}$: Working place assignment for event $e_{si}$ ($i$th component of vector $\mathbf{x}_s$)
$\mathbf{y}$: Working date assignments ($N$-dimensional vector)
$y_{si}$: Working date assignment for event $e_{si}$ ($i$th component of vector $\mathbf{y}_s$)
$\mathbf{z}$: Working mode assignments ($N$-dimensional vector)
$z_{si}$: Working mode assignment for event $e_{si}$ ($i$th component of vector $\mathbf{z}_s$)
$\alpha$, $\beta$, $\gamma$: Dirichlet prior (scalar)
$\Phi$: Probabilities of events given working modes ($E \times \Pi$ matrix)
$\phi_\pi$: Probabilities of events given working mode $\pi$ ($E$-dimensional vector)
$\Theta$: Probabilities of working modes given working places ($\Pi \times P$ matrix)
$\theta_p$: Probabilities of working modes given working place $p$ ($\Pi$-dimensional vector)
$\Delta$: Probabilities of working modes given working dates ($\Pi \times T$ matrix)
$\delta_\tau$: Probabilities of working modes given working date $\tau$ ($\Pi$-dimensional vector)

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J Holler V Tsiatsis CMulligan S Avesand S Karnouskos andD Boyle From Machine-to-Machine to the Internet of ThingsIntroduction to a New Age of Intelligence Academic Press 2014

[2] C Perera A Zaslavsky P Christen and D GeorgakopoulosldquoSensing as a service model for smart cities supported by Inter-net of Thingsrdquo Transactions on Emerging TelecommunicationsTechnologies vol 25 no 1 pp 81ndash93 2014

[3] R F Mesquita Brandao and J A Beleza Carvalho ldquoTheimportance of control monitoring systems in wind parksmaintenancerdquo British Journal of Applied Science amp Technologyvol 4 no 10 pp 1461ndash1471 2014

[4] C J Crabtree D Zappala and P J Tavner ldquoSurvey of com-mercially available condition monitoring systems for windturbinesrdquo Tech Rep Durham University 2014

[5] D M Blei A Y Ng and M I Jordan ldquoLatent dirichletallocationrdquoThe Journal ofMachine Learning Research vol 3 no4-5 pp 993ndash1022 2003

[6] S Kandula R Mahajan P Verkaik S Agarwal J Padhyeand P Bahl ldquoDetailed diagnosis in enterprise networksrdquo inProceedings of the ACM SIGCOMM Conference on Data Com-munication (SIGCOMMrsquo09) vol 39 pp 243ndash254ACMAugust2009

[7] J-G Lou Q Fu Y Wang and J Li ldquoMining dependency indistributed systems through unstructured logs analysisrdquo ACMSIGOPSOperating Systems Review vol 44 no 1 pp 91ndash96 2010

[8] C Luo J-G Lou Q Lin et al ldquoCorrelating events with timeseries for incident diagnosisrdquo in Proceedings of the 20th ACMSIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo14) pp 1583ndash1592 ACM August 2014

[9] J Chen and R Kumar ldquoOnline failure diagnosis of stochasticdiscrete event systemsrdquo in Proceedings of the IEEE ConferenceonComputerAidedControl SystemDesign (CACSD rsquo13) pp 194ndash199 IEEE August 2013

[10] J Chen and R Kumar ldquoFailure diagnosis of discrete-timestochastic systems subject to temporal logic correctness require-mentsrdquo in Proceedings of the 11th IEEE International Conferenceon Networking Sensing and Control (ICNSC rsquo14) pp 42ndash47IEEE April 2014

[11] Business ProcessModel and Notation (BPMN) Version 20 OMGSpecification Object Management Group 2011

[12] F Leymann ldquoBpel vs bpmn 20 should you carerdquo in BusinessProcess Modeling Notation pp 8ndash13 Springer Berlin Germany2011

[13] C C Aggarwal Managing and Mining Sensor Data Springer2013

[14] N H Gehani H V Jagadish andO Shmueli ldquoComposite eventspecification in active databasesmodel and implementationrdquo inProceedings of the 18th VLDBConference Vancouver (VLDB rsquo92)vol 92 pp 327ndash338 Citeseer British Columbia Canada 1992

[15] I Davidson S Gilpin and P B Walker ldquoBehavioral event dataand their analysisrdquo Data Mining and Knowledge Discovery vol25 no 3 pp 635ndash653 2012

[16] J Han and M Kamber Data Mining Southeast Asia EditionConcepts and Techniques Morgan Kaufmann 2006

[17] H RMotahari-Nezhad R Saint-Paul F Casati and B Benatal-lah ldquoEvent correlation for process discovery from web serviceinteraction logsrdquoThe VLDB Journal vol 20 no 3 pp 417ndash4442011

Mathematical Problems in Engineering 13

[18] F Skopik and R Fiedler ldquoIntrusion detection in distributedsystems using fingerprinting and massive event correlationrdquo inGI-Jahrestagung pp 2240ndash2254 2013

[19] G A Wilkin P Eugster and K R Jayaram ldquoDecentralizedfault-tolerant event correlationrdquo ACM Transactions on InternetTechnology vol 14 no 1 article 5 2014

[20] H Wei ldquoA correlation analysis method for network securityeventsrdquo in Informatics and Management Science III vol 206 ofLecture Notes in Electrical Engineering pp 269ndash277 SpringerLondon UK 2013

[21] W Van Der Aalst A Adriansyah A K A de Medeiros etal ldquoProcess mining manifestordquo in Usiness Process ManagementWorkshops pp 169ndash194 Springer Berlin Germany 2012

[22] J C A M Buijs B F van Dongen and W M P van der AalstldquoMining configurable process models from collections of eventlogsrdquo inBusiness ProcessManagement pp 33ndash48 Springer 2013

[23] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[24] J Wang R K Wong J Ding Q Guo and L Wen ldquoOnrecommendation of process mining algorithmsrdquo in Proceedingsof the IEEE 19th International Conference onWeb Services (ICWSrsquo12) pp 311ndash318 IEEE Honolulu Hawaii USA June 2012

[25] R S Mans W M P van der Aalst and H M W VerbeekldquoSupporting process mining workflows with rapidpromrdquo inProceedings of the Business Process Management Demo Sessions(BPMD rsquo14) vol 1295 pp 56ndash60 Eindhoven The NetherlandsSeptember 2014

[26] C Li M Reichert and A Wombacher ldquoMining businessprocess variants challenges scenarios algorithmsrdquo Data ampKnowledge Engineering vol 70 no 5 pp 409ndash434 2011

[27] R Accorsi T Stocker and G Muller ldquoOn the exploitation ofprocess mining for security audits the process discovery caserdquoin Proceedings of the 28th Annual ACM Symposium on AppliedComputing pp 1462ndash1468 ACM March 2013

[28] B-J Lee S-G Park K-B Min et al ldquoThe relationship betweenworking condition factors and well-beingrdquo Annals of Occupa-tional and Environmental Medicine vol 26 no 1 article 342014

[29] J Cohen Statistical Power Analysis for the Behavioral SciencesRoutledge Academic New York NY USA 2013

[30] P Bahl R Chandra A Greenberg S Kandula D A Maltz andM Zhang ldquoTowards highly reliable enterprise network servicesvia inference of multi-level dependenciesrdquo ACM SIGCOMMComputer Communication Review vol 37 no 4 pp 13ndash24 2007

[31] B Rosner Fundamentals of Biostatistics Cengage Learning2010

[32] A Zimmermann ldquoColored petri netsrdquo in Stochastic DiscreteEvent Systems Modeling Evaluation Applications pp 99ndash124Springer 2008

[33] A Adriansyah B F van Dongen and W M P van der AalstldquoTowards robust conformance checkingrdquo in Business ProcessManagement Workshops vol 66 of Lecture Notes in BusinessInformation Processing pp 122ndash133 Springer Berlin Germany2011

[34] MWeidlich andMWeske Business Process Modeling NotationSpringer Berlin Germany 2010

[35] C M Bishop and J Lasserre ldquoGenerative or discriminativeGetting the best of both worldsrdquo in Bayesian Statistics J MBernardo M J Bayarri J O Berger et al Eds vol 8 pp 3ndash23 Oxford University 2007

[36] C M Bishop Pattern Recognition and Machine LearningVolume 1 Springer New York NY USA 2006

[37] D M Blei and J D Lafferty ldquoDynamic topic modelsrdquo inProceedings of the 23rd International Conference on MachineLearning (ICML rsquo06) pp 113ndash120 ACM June 2006

[38] J Foulds L Boyles C DuBois P Smyth and M WellingldquoStochastic collapsed variational Bayesian inference for latentdirichlet allocationrdquo in Proceedings of the 19th ACM SIGKDDInternational Conference on Knowledge Discovery and DataMining pp 446ndash454 ACM 2013

[39] J Pearl Bayesian Networks Department of Statistics UCLA2011

[40] I Porteous D Newman A Ihler A Asuncion P Smythand M Welling ldquoFast collapsed gibbs sampling for latentdirichlet allocationrdquo in Proceedings of the 14th ACM SIGKDDInternational Conference on Knowledge Discovery and DataMining (KDD rsquo08) pp 569ndash577 ACM August 2008


Definition 2 (event). An event of the equipment, denoted by $e$, records an occurrence of a specific message indicating that something, such as an operation, has happened in the equipment.

For example, in Figure 1 there are six events (warm-up, landing leg unfolding, cantilever unfolding, concrete pumping, landing leg folding, and cantilever folding), each of which reflects the working condition of some component of the concrete pump truck. We use integers to denote the entries in the event set, with each event $e$ taking a value from $\{1, \ldots, E\}$, where $E$ is the number of unique events in the event set, denoted by $\mathcal{E}$.

Definition 3 (event sequence). An event sequence of the equipment, denoted by $\mathbf{e}_s$, consists of a sequence of events that occur in work cycle $s$.

An event sequence is represented as a vector of events $\mathbf{e}_s$ with $N_s$ entries. For example, in Figure 1 the event set of the concrete pump truck contains six ($E = 6$) events, denoted by $\mathcal{E}_{\mathrm{pump}} = (1, \ldots, 6)$, where the integers index the events warm-up, landing leg unfolding, cantilever unfolding, concrete pumping, landing leg folding, and cantilever folding, respectively. Hence, the event sequence is a vector of length $N_s = 10$, denoted by $\mathbf{e}_s = (1, 2, 3, 4, 4, 4, 4, 4, 5, 6)$.

Suppose that the data set has $S$ work cycles of the equipment, corresponding to $S$ event sequences. The data set with $S$ event sequences is represented as a concatenation of the event sequence vectors, which we denote by $\mathbf{e}$, with $N = \sum_{s=1}^{S} N_s$.
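The integer encoding described above can be illustrated with a short sketch. The six event names come from the Figure 1 example; the function names and the data layout are hypothetical.

```python
from typing import Dict, Iterable, Sequence, Tuple

# Event set of the Figure 1 example; entry i is encoded as integer i + 1.
PUMP_EVENT_SET: Tuple[str, ...] = (
    "warm-up",
    "landing leg unfolding",
    "cantilever unfolding",
    "concrete pumping",
    "landing leg folding",
    "cantilever folding",
)


def encode_event_sequence(
    names: Sequence[str],
    event_set: Sequence[str] = PUMP_EVENT_SET,
) -> Tuple[int, ...]:
    """Map event names of one work cycle to integers in 1..E."""
    index: Dict[str, int] = {name: i for i, name in enumerate(event_set, start=1)}
    return tuple(index[name] for name in names)


def concatenate(sequences: Iterable[Sequence[int]]) -> Tuple[int, ...]:
    """Concatenate the S event sequence vectors into the data set vector e."""
    return tuple(e for seq in sequences for e in seq)
```

A work cycle consisting of warm-up, landing leg unfolding, cantilever unfolding, five occurrences of concrete pumping, landing leg folding, and cantilever folding encodes to the vector (1, 2, 3, 4, 4, 4, 4, 4, 5, 6) given in the text.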

In a work cycle, an event sequence describes the main working process of the equipment. However, the occurrence of an event is also related to the working place and working date of the equipment. For example, the concrete pump truck adds an operation event, concrete mixing, to prevent the concrete from setting if the working temperature is low. The working temperature is directly related to the working place (e.g., north or south of China) and working date (e.g., winter or summer).

In addition to these events, we have information about the characteristics of each event sequence (work cycle): the working place, working date, and equipment number of the work cycle. We define $\mathbf{p}_s$ to be the set of working places of work cycle $s$; $\mathbf{p}_s$ consists of integers from $\{1, \ldots, P\}$, where $P$ is the number of working places that generated the event sequences in the data set. $P_s$ denotes the number of working places of work cycle $s$. We define $\boldsymbol{\tau}_s$ to be the set of working dates of work cycle $s$; $\boldsymbol{\tau}_s$ consists of integers from $\{1, \ldots, T\}$, where $T$ is the number of working dates. (To ease the notation, we record only the working month of the work cycle, which means $T = 12$.) $T_s$ denotes the number of working dates of work cycle $s$. We define $\boldsymbol{\omega}_s$ to be the set of equipment numbers of work cycle $s$; $\boldsymbol{\omega}_s$ consists of integers from $\{1, \ldots, \Omega\}$, where $\Omega$ is the number of equipment pieces.

Definition 4 (work cycle characteristic). A work cycle characteristic (WCC) is a five-tuple, denoted by $W_s = \{\mathcal{E}, \mathbf{e}_s, \mathbf{p}_s, \boldsymbol{\tau}_s, \boldsymbol{\omega}_s\}$, which records all the information about work cycle $s$.

A WCC corresponds to a work cycle, so the original data set is redefined as a group of WCCs, denoted by $\mathcal{D} = \{W_1, \ldots, W_S\}$. The WCCs of two work cycles are likely to differ even if they have the same working place and working date. The main differences between work cycles lie in the occurrence of events. However, the occurrence patterns of events are similar for work cycles in the same working mode. For example, the concrete pump truck has two main working modes: pumping mode and traveling mode. For a work cycle in the pumping mode, the event concrete pumping occurs frequently, as shown in Figure 1. For a work cycle in the traveling mode, however, the event concrete pumping does not occur, since the concrete pump truck cannot pump concrete while traveling.

Definition 5 (working mode). A working mode, denoted by $\pi$, represents a kind of work cycle that is about a specific subject, has an identifiable purpose, and can stand alone.

For event set $\mathcal{E}$, we define the working mode vector (WMV) $G(\pi) = \{(1, c_1), \ldots, (E, c_E)\}$ to be the set of events $e$, each associated with its occurrence frequency $c_e$, where $\sum_{e=1}^{E} c_e = 1$. The WMV $G(\pi)$ depicts the occurrence patterns of events through their occurrence frequencies. Therefore, if we can obtain a group of WMVs for a group of work cycles, it will help us better understand the occurrence patterns of events.
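To make the shape of a WMV concrete, the sketch below builds one from the empirical event frequencies of work cycles that are already known to share a working mode. This is only an illustration of the data structure: in the paper the WMVs are inferred by the WCM, not counted directly, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def working_mode_vector(
    sequences: Iterable[Sequence[int]],
    n_events: int,
) -> List[Tuple[int, float]]:
    """Return [(1, c_1), ..., (E, c_E)] with the frequencies c_e summing to 1.

    Events are integers in 1..E; events that never occur get frequency 0.
    """
    counts = Counter(e for seq in sequences for e in seq)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one observed event")
    return [(e, counts[e] / total) for e in range(1, n_events + 1)]
```

For the pumping-mode work cycle (1, 2, 3, 4, 4, 4, 4, 4, 5, 6), event 4 (concrete pumping) receives frequency 0.5 and every other event 0.1.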

Definition 6 (working mode space (WMS)). A working mode space (WMS), denoted by $\mathcal{G} = \{G(1), \ldots, G(\Pi)\}$, is a set of WMVs for a group of given work cycles of equipment.

In effect, the WMS is akin to a group of cluster centers, each of which depicts the working condition of equipment in a different working mode.

4. The Inference of WMS

In this section, we develop effective algorithms for the inference of the WMS for a group of given work cycles of equipment. Before proceeding, we formulate our problem as follows.

WMS Inference Problem. Given a group of work cycles associated with the corresponding WCCs $\mathcal{D} = \{W_1, \ldots, W_S\}$, the inference problem is to infer the WMS model $\mathcal{G} = \{G(1), \ldots, G(\Pi)\}$, where $\Pi$ represents the number of working modes.

With the help of the WMS, we can find that, in different working places and on different working dates, the work cycles of equipment have different working modes. Meanwhile, there are several working modes in the same working place and on the same working date. The WMV of a working mode reflects the working condition of its corresponding work cycles, especially the occurrence patterns of events.

In the remainder of this section, we first introduce the WCM for learning the WMS for a group of given work cycles and then introduce the inference framework of the WCM.

4.1. The WCM. The WCM is a hierarchical generative model in which each event $e$ in a work cycle is associated with three latent variables: a working place $x$, a working date $y$, and a working mode $z$. These latent variables augment the $N$-dimensional vector $\mathbf{e}$ (indicating the values of all events in the data set) with three additional $N$-dimensional vectors $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$, indicating the working place, working date, and working mode assignments for the $N$ events.

As we observed the sets of working places and the sets ofworking dates for each work cycle are observed This leavesthe unresolved issue of having unobserved working placesand working dates and avoids the need to define a prioron working places and working dates which is outside ofthe scope of our model Each working place is associatedwith a multinomial distribution over working mode andeach working date is also associated with a multinomialdistribution over working mode Conditioned on the setof working places and the set of working dates associatedwith their distributions over working modes the process bywhich the corresponding event sequence for a work cycleis simulated can be summarized as follows first a workingplace and a working date are respectively chosen uniformlyat random for each event that will appear in the workcycle next a working mode is sampled for each event bothfrom the distribution over working mode associated withthe working place of that event and from the distributionover working mode associated with the working date ofthat event finally the events themselves are sampled fromthe distribution over events associated with each workingmode

This simulating process can be expressed more formally by defining some of the other variables in the WCM. Assume we have $\Pi$ working modes. We can parameterize the multinomial distribution over working modes for each working place using a matrix $\Theta$ of size $\Pi \times P$, with elements $\theta_{\pi p}$ that stand for the probability of assigning working mode $\pi$ to an event occurring in working place $p$. Thus $\sum_{\pi=1}^{\Pi} \theta_{\pi p} = 1$, and for simplicity of notation we will drop the index $\pi$ when convenient and use $\theta_p$ to stand for the $p$th column of the matrix $\Theta$. Similarly, we use a matrix $\Delta$ of size $\Pi \times T$ to parameterize the multinomial distribution over working modes for each working date, where the elements $\delta_{\pi\tau}$ stand for the probability of assigning working mode $\pi$ to an event occurring on working date $\tau$. Thus $\sum_{\pi=1}^{\Pi} \delta_{\pi\tau} = 1$, and we will also drop the index $\pi$ when convenient and use $\delta_\tau$ to stand for the $\tau$th column of the matrix $\Delta$. The multinomial distributions over events associated with each working mode are parameterized by a matrix $\Phi$ of size $E \times \Pi$, with elements $\phi_{e\pi}$ that stand for the probability of generating event $e$ in working mode $\pi$. Again, $\sum_{e=1}^{E} \phi_{e\pi} = 1$, and $\phi_\pi$ stands for the $\pi$th column of the matrix $\Phi$. The three families of multinomial distributions in $\Theta$, $\Delta$, and $\Phi$ are assumed to be generated from symmetric Dirichlet priors with hyperparameters $\alpha$, $\gamma$, and $\beta$, respectively. In the results of this paper, we assume that these hyperparameters are fixed. This notation is summarized in Notations.

Figure 3: The graphical representation of the WCM. (Plate diagram with hyperparameters $\alpha$, $\beta$, $\gamma$; parameters $\theta$, $\delta$, $\phi$; observed $\mathbf{p}_s$, $\boldsymbol{\tau}_s$, and $e$; latent $x$, $y$, $z$; plates $P$, $T$, $\Pi$, $N_s$, and $S$.)

The sequential procedure of first picking a working place and a working date, then picking a working mode, and finally generating an event in this working mode according to the probability distributions leads to the following generative process.

(1) For each working place $p = 1, \ldots, P$, choose $\theta_p \sim \mathrm{Dirichlet}(\alpha)$;
    for each working date $\tau = 1, \ldots, T$, choose $\delta_\tau \sim \mathrm{Dirichlet}(\gamma)$;
    for each working mode $\pi = 1, \ldots, \Pi$, choose $\phi_\pi \sim \mathrm{Dirichlet}(\beta)$.

(2) For each work cycle $s = 1, \ldots, S$,
    given the vector of working places $\mathbf{p}_s$ and
    given the vector of working dates $\boldsymbol{\tau}_s$,
    for each event $i = 1, \ldots, N_s$:
        conditioned on $\mathbf{p}_s$, choose working place $x_{si} \sim \mathrm{Uniform}(\mathbf{p}_s)$;
        conditioned on $\boldsymbol{\tau}_s$, choose working date $y_{si} \sim \mathrm{Uniform}(\boldsymbol{\tau}_s)$;
        conditioned on $x_{si}$ and $y_{si}$, choose working mode $z_{si} \sim \mathrm{Discrete}(\theta_{x_{si}}, \delta_{y_{si}})$;
        conditioned on $z_{si}$, choose event $e_{si} \sim \mathrm{Discrete}(\phi_{z_{si}})$.
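The generative process above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the zero-based integer coding of places, dates, modes, and events, and in particular the way $\theta_{x_{si}}$ and $\delta_{y_{si}}$ are combined (an elementwise product, renormalized when sampling) are assumptions made here for concreteness.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_discrete(weights, rng):
    """Draw an index with probability proportional to the given weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def simulate_wcm(cycles, n_places, n_dates, n_modes, n_events,
                 alpha, beta, gamma, seed=0):
    """Simulate event sequences.

    cycles: list of (places, dates, n_tokens), one tuple per work cycle,
    where places and dates are lists of integer ids (hypothetical encoding).
    """
    rng = random.Random(seed)
    # Step (1): one distribution over modes per place and per date,
    # and one distribution over events per mode.
    theta = [sample_dirichlet(alpha, n_modes, rng) for _ in range(n_places)]
    delta = [sample_dirichlet(gamma, n_modes, rng) for _ in range(n_dates)]
    phi = [sample_dirichlet(beta, n_events, rng) for _ in range(n_modes)]
    sequences = []
    # Step (2): generate every event token of every work cycle.
    for places, dates, n_tokens in cycles:
        events = []
        for _ in range(n_tokens):
            x = rng.choice(places)   # working place, uniform over p_s
            y = rng.choice(dates)    # working date, uniform over tau_s
            # Assumption: mode weights are the product theta * delta.
            weights = [theta[x][m] * delta[y][m] for m in range(n_modes)]
            z = sample_discrete(weights, rng)
            events.append(sample_discrete(phi[z], rng))
        sequences.append(events)
    return sequences
```

Very small concentration values can make the Gamma draws underflow, so moderate hyperparameters are advisable when experimenting with this sketch.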

The graphical model corresponding to this process is shown in Figure 3. Under this simulating process, each event is drawn independently when conditioned on $\Phi$ and its working mode assignment, and each working mode is drawn independently when conditioned on $\Theta$ and $\Delta$. The probability of the event sequence $\mathbf{e}$ conditioned on $\Theta$, $\Delta$, and $\Phi$ (and implicitly on a fixed number of working modes $\Pi$) is

$$P(\mathbf{e} \mid \Phi, \Delta, \Theta, \mathbf{P}, \mathbf{T}) = \prod_{s=1}^{S} P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s). \tag{1}$$

With the help of (1), we can first obtain the probability of the event sequence in each work cycle, $\mathbf{e}_s$, by summing over the latent variables $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ to get what is shown in (3). Consider

$$\begin{aligned}
P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s)
&= \prod_{i=1}^{N_s} P(e_{si} \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s) \\
&= \prod_{i=1}^{N_s} \sum_{\tau=1}^{T} \sum_{p=1}^{P} \sum_{\pi=1}^{\Pi} P(e_{si}, z_{si} = \pi, x_{si} = p, y_{si} = \tau \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s) \\
&= \prod_{i=1}^{N_s} \sum_{\tau=1}^{T} \sum_{p=1}^{P} \sum_{\pi=1}^{\Pi} P(e_{si} \mid z_{si} = \pi, \Phi)\, P(z_{si} = \pi \mid x_{si} = p, \Theta)\, P(z_{si} = \pi \mid y_{si} = \tau, \Delta)\, P(x_{si} = p \mid \mathbf{p}_s)\, P(y_{si} = \tau \mid \boldsymbol{\tau}_s),
\end{aligned} \tag{2}$$

$$P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s) = \prod_{i=1}^{N_s} \frac{1}{P_s} \frac{1}{T_s} \sum_{p \in \mathbf{p}_s} \sum_{\tau \in \boldsymbol{\tau}_s} \sum_{\pi=1}^{\Pi} \phi_{e_{si}\pi}\, \theta_{\pi p}\, \delta_{\pi\tau}, \tag{3}$$

$$P(\mathbf{e} \mid \alpha, \beta, \gamma, \mathbf{P}, \mathbf{T}) = \int_{\Theta} \int_{\Delta} \int_{\Phi} P(\mathbf{e} \mid \Theta, \Delta, \Phi, \mathbf{P}, \mathbf{T})\, P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)\, d\Theta\, d\Delta\, d\Phi \tag{4}$$

$$= \int_{\Theta} \int_{\Delta} \int_{\Phi} \left[ \prod_{s=1}^{S} \prod_{i=1}^{N_s} \frac{1}{P_s} \frac{1}{T_s} \sum_{p \in \mathbf{p}_s} \sum_{\tau \in \boldsymbol{\tau}_s} \sum_{\pi=1}^{\Pi} \phi_{e_{si}\pi}\, \theta_{\pi p}\, \delta_{\pi\tau} \right] P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)\, d\Theta\, d\Delta\, d\Phi. \tag{5}$$

In (3), the factorization makes use of the conditional independence assumptions of the model; in particular, the variables $x$ and $y$ are mutually independent. Equation (3) represents the probability of the events $\mathbf{e}_s$ in terms of the entries of the parameter matrices $\Theta$, $\Phi$, and $\Delta$ introduced above. The probability distribution over working place assignments, $P(x_{si} = p \mid \mathbf{p}_s)$, is assumed to be uniform over the elements of $\mathbf{p}_s$ and is deterministic if $P_s = 1$. Similarly, the probability distribution over working date assignments, $P(y_{si} = \tau \mid \boldsymbol{\tau}_s)$, is assumed to be uniform over the elements of $\boldsymbol{\tau}_s$ and is deterministic if $T_s = 1$. The probability distributions over working mode assignments, $P(z_{si} = \pi \mid x_{si} = p, \Theta)$ and $P(z_{si} = \pi \mid y_{si} = \tau, \Delta)$, are the multinomial distributions $\theta_p$ and $\delta_\tau$ in $\Theta$ and $\Delta$ that correspond to working place $p$ and working date $\tau$, respectively. The probability of an event given a working mode assignment, $P(e_{si} \mid z_{si} = \pi, \Phi)$, is the multinomial distribution $\phi_\pi$ in $\Phi$ that corresponds to working mode $\pi$.

In (4) and (5), we treat $\Theta$, $\Phi$, and $\Delta$ as random variables and compute the marginal probability of the data set by integrating them out. Here $P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta) = P(\Theta \mid \alpha)\, P(\Delta \mid \gamma)\, P(\Phi \mid \beta)$ is the product of the Dirichlet priors on $\Theta$, $\Delta$, and $\Phi$ defined before.
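As a concrete reading of (3), the following Python sketch evaluates the log-probability of a single work cycle for given parameter matrices. It implements the triple sum literally, without renormalizing the product $\theta_{\pi p}\,\delta_{\pi\tau}$; the function name and the nested-list layout (`phi[e][mode]`, `theta[mode][place]`, `delta[mode][date]`, mirroring the $E \times \Pi$, $\Pi \times P$, and $\Pi \times T$ shapes) are assumptions made for illustration.

```python
import math


def work_cycle_log_likelihood(events, places, dates, phi, theta, delta):
    """Log of equation (3) for one work cycle.

    events: list of event ids e_si in the cycle
    places, dates: ids in p_s and tau_s
    phi[e][m], theta[m][p], delta[m][t]: parameter matrices as nested lists
    """
    n_modes = len(theta)
    norm = 1.0 / (len(places) * len(dates))  # the 1/P_s * 1/T_s factor
    log_prob = 0.0
    for e in events:
        token_prob = 0.0
        for p in places:
            for t in dates:
                for m in range(n_modes):
                    token_prob += phi[e][m] * theta[m][p] * delta[m][t]
        # Work in log space so that long cycles do not underflow.
        log_prob += math.log(norm * token_prob)
    return log_prob
```

Summing logarithms instead of multiplying probabilities matters in practice, because a work cycle with hundreds of event tokens would otherwise underflow double precision.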

5. Inference of WCM from Data

The WCM contains three continuous random variables: $\Theta$, $\Delta$, and $\Phi$. Various approximate inference approaches have recently been proposed for estimating the posterior distribution of continuous random variables in hierarchical Bayesian models. In this paper, our inference method is Gibbs sampling [40], which is a special form of Markov chain Monte Carlo.

Our target of estimation is the posterior distribution $P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)$. In order to sample from this distribution, we use the latent variables $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$:

$$P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta) = \sum_{\mathbf{x}, \mathbf{y}, \mathbf{z}} P(\Theta, \Delta, \Phi \mid \mathbf{x}, \mathbf{y}, \mathbf{z}, \alpha, \gamma, \beta)\, P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid \alpha, \gamma, \beta). \tag{6}$$

The estimation process mainly involves two steps. First, we use Gibbs sampling to approximate the posterior $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid \alpha, \gamma, \beta)$. Second, $P(\Theta, \Delta, \Phi \mid \mathbf{x}, \mathbf{y}, \mathbf{z}, \alpha, \gamma, \beta)$ can be computed directly for each sample by exploiting the fact that the Dirichlet distribution is conjugate to the multinomial.

5.1. Gibbs Sampling. Using Gibbs sampling, we can generate a sample from the joint distribution $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid D_{\text{train}}, \alpha, \beta, \gamma)$ in two steps: first, sampling the working place assignment $x_{si}$, working date assignment $y_{si}$, and working mode assignment $z_{si}$ for an individual event $e_{si}$, conditioned on fixed assignments of working places, working dates, and working modes for all other events in the data set; second, repeating this process for each event. A single Gibbs sampling iteration consists of sequentially performing this sampling of working place, working date, and working mode assignments for each individual event in the data set.

$$\begin{aligned}
&P(x_{si} = p, y_{si} = \tau, z_{si} = \pi \mid e_{si} = e, \mathbf{x}_{-si}, \mathbf{y}_{-si}, \mathbf{z}_{-si}, \mathbf{e}_{-si}, \mathbf{P}, \mathbf{T}, \alpha, \beta, \gamma) \\
&\qquad \propto \frac{C^{E\Pi}_{e\pi,-si} + \beta}{\sum_{e'} C^{E\Pi}_{e'\pi,-si} + E\beta} \cdot \frac{C^{\Pi P}_{\pi p,-si} + \alpha}{\sum_{\pi'} C^{\Pi P}_{\pi' p,-si} + \Pi\alpha} \cdot \frac{C^{\Pi T}_{\pi\tau,-si} + \gamma}{\sum_{\pi'} C^{\Pi T}_{\pi'\tau,-si} + \Pi\gamma}.
\end{aligned} \tag{7}$$

According to (1)–(5), we can derive the basic equation needed for the Gibbs sampler, shown in (7). In (7), $C^{\Pi P}$ is the working-mode-by-working-place count matrix, where $C^{\Pi P}_{\pi p,-si}$ is the number of events assigned to working mode $\pi$ in working place $p$, excluding the assignment of event $e_{si}$. Similarly, $C^{\Pi T}$ is the working-mode-by-working-date count matrix, where $C^{\Pi T}_{\pi\tau,-si}$ is the number of events assigned to working mode $\pi$ on working date $\tau$, excluding the assignment of event $e_{si}$. Likewise, $C^{E\Pi}$ is the event-by-working-mode count matrix, where $C^{E\Pi}_{e\pi,-si}$ is the number of times the $e$th entry in the event set is assigned to working mode $\pi$, excluding the working mode assignment of event $e_{si}$. Meanwhile, $\mathbf{x}_{-si}$, $\mathbf{y}_{-si}$, $\mathbf{z}_{-si}$, and $\mathbf{e}_{-si}$ represent the vector of working place assignments, the vector of working date assignments, the vector of working mode assignments, and the vector of event observations in the data set, respectively, all except for the $i$th event in the $s$th work cycle.

The main sampling steps are as follows. We first initialize the working place, working date, and working mode assignments $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ randomly. In each Gibbs sampling iteration, we sequentially draw the working mode, working place, and working date assignment of each event from the joint conditional distribution in (7). As the number of iterations increases, the Gibbs sampler approaches its stationary distribution, which is the posterior distribution $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid D_{\text{train}}, \alpha, \beta, \gamma)$.
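The sampling steps above can be sketched as follows. This is an illustrative reading of the collapsed Gibbs update, not the authors' code. It assumes that each factor is a smoothed count ratio whose hyperparameter matches its prior ($\beta$ for events, $\alpha$ for working places, $\gamma$ for working dates) and that the place and date factors are normalized over working modes, as the Dirichlet priors on $\theta_p$ and $\delta_\tau$ imply. The `counts` dictionary layout and the function names are hypothetical.

```python
import random


def gibbs_update_token(e, places, dates, counts, alpha, beta, gamma, rng):
    """Resample (place, date, mode) for one token whose counts were removed.

    counts["em"][e][m]: event-by-mode counts
    counts["mp"][m][p]: mode-by-place counts
    counts["mt"][m][t]: mode-by-date counts
    """
    em, mp, mt = counts["em"], counts["mp"], counts["mt"]
    n_events, n_modes = len(em), len(mp)
    mode_totals = [sum(em[ev][m] for ev in range(n_events))
                   for m in range(n_modes)]
    candidates, weights = [], []
    for p in places:
        place_total = sum(mp[m][p] for m in range(n_modes))
        for t in dates:
            date_total = sum(mt[m][t] for m in range(n_modes))
            for m in range(n_modes):
                w = ((em[e][m] + beta) / (mode_totals[m] + n_events * beta)
                     * (mp[m][p] + alpha) / (place_total + n_modes * alpha)
                     * (mt[m][t] + gamma) / (date_total + n_modes * gamma))
                candidates.append((p, t, m))
                weights.append(w)
    # random.choices normalizes the weights internally.
    return rng.choices(candidates, weights=weights, k=1)[0]


def gibbs_sweep(tokens, assignments, cycle_places, cycle_dates, counts,
                alpha, beta, gamma, rng):
    """One Gibbs iteration over all tokens; updates counts and assignments.

    tokens: list of (cycle index s, event id e)
    assignments: list of (place, date, mode), aligned with tokens
    """
    for idx, (s, e) in enumerate(tokens):
        p, t, m = assignments[idx]
        # Remove the current assignment of this token from the counts.
        counts["em"][e][m] -= 1
        counts["mp"][m][p] -= 1
        counts["mt"][m][t] -= 1
        p, t, m = gibbs_update_token(e, cycle_places[s], cycle_dates[s],
                                     counts, alpha, beta, gamma, rng)
        # Add the new assignment back.
        counts["em"][e][m] += 1
        counts["mp"][m][p] += 1
        counts["mt"][m][t] += 1
        assignments[idx] = (p, t, m)
```

Recomputing the column totals for every token keeps the sketch short; a practical sampler would maintain them incrementally.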

5.2. The Posterior Probability. Given $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$, $D_{\text{train}}$, $\alpha$, $\beta$, and $\gamma$, computing posterior distributions on $\Theta$, $\Delta$, and $\Phi$ is straightforward. Because the Dirichlet distribution is conjugate to the multinomial distribution, we get

$$\begin{aligned}
\phi_\pi \mid \mathbf{z}, \beta, D_{\text{train}} &\sim \mathrm{Dirichlet}(C^{E\Pi}_{\pi} + \beta), \\
\theta_p \mid \mathbf{x}, \mathbf{z}, \alpha, D_{\text{train}} &\sim \mathrm{Dirichlet}(C^{\Pi P}_{p} + \alpha), \\
\delta_\tau \mid \mathbf{y}, \mathbf{z}, \gamma, D_{\text{train}} &\sim \mathrm{Dirichlet}(C^{\Pi T}_{\tau} + \gamma),
\end{aligned} \tag{8}$$

where $C^{E\Pi}_{\pi}$ represents the vector of counts of the number of times each event has been assigned to working mode $\pi$; $C^{\Pi P}_{p}$ and $C^{\Pi T}_{\tau}$ are defined analogously. Then we can evaluate the posterior expectation of each element of $\Theta$, $\Delta$, and $\Phi$ as follows:

$$\begin{aligned}
E[\phi_{e\pi} \mid \mathbf{z}^k, \beta, D_{\text{train}}] &= \frac{(C^{E\Pi}_{e\pi})^k + \beta}{\sum_{e'} (C^{E\Pi}_{e'\pi})^k + E\beta}, \\
E[\theta_{\pi p} \mid \mathbf{x}^k, \mathbf{z}^k, \alpha, D_{\text{train}}] &= \frac{(C^{\Pi P}_{\pi p})^k + \alpha}{\sum_{\pi'} (C^{\Pi P}_{\pi' p})^k + \Pi\alpha}, \\
E[\delta_{\pi\tau} \mid \mathbf{y}^k, \mathbf{z}^k, \gamma, D_{\text{train}}] &= \frac{(C^{\Pi T}_{\pi\tau})^k + \gamma}{\sum_{\pi'} (C^{\Pi T}_{\pi'\tau})^k + \Pi\gamma},
\end{aligned} \tag{9}$$

where $(C^{E\Pi})^k$ is the matrix of event-by-working-mode counts exhibited in $\mathbf{z}^k$ and $k$ refers to sample $k$ from the Gibbs sampler. These posterior expectations also provide point estimates for $\Phi$, $\Theta$, and $\Delta$ and correspond to the posterior predictive distributions for the next event from a working mode, the next working mode in a working place, and the next working mode on a working date, respectively.

Figure 4: The stream of the concrete in the concrete pump truck and the operation sequence of the concrete pump truck at runtime. (Diagram elements: hopper, transportation cylinder, stirring system, pumping system, landing leg system, cantilever system, concrete, specified location. Legend: concrete stream, operation sequence, related system.)
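The point estimates discussed above are posterior means of Dirichlet distributions, which reduce to smoothed, column-normalized counts. A minimal Python sketch, assuming the count matrices are stored as nested lists with one column per conditioning variable (the function name `posterior_means` is hypothetical):

```python
def posterior_means(count_table, prior):
    """Column-wise posterior mean under a symmetric Dirichlet prior.

    count_table[i][j]: how often category i was observed in group j.
    Each column of the result is a probability vector over the categories.
    """
    n_rows = len(count_table)
    n_cols = len(count_table[0])
    col_totals = [sum(count_table[i][j] for i in range(n_rows))
                  for j in range(n_cols)]
    return [[(count_table[i][j] + prior) / (col_totals[j] + n_rows * prior)
             for j in range(n_cols)]
            for i in range(n_rows)]
```

With this layout, calling the function on the event-by-working-mode counts with $\beta$ gives an estimate of $\Phi$, and calling it on the working-mode-by-place and working-mode-by-date counts with $\alpha$ and $\gamma$ gives estimates of $\Theta$ and $\Delta$. The normalizer uses the number of categories in each column, which is what makes every column sum to one.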

6. Experimental Evaluation

6.1. Data Preparation. We trained the WCM on a real-world data set collected from a well-known Chinese construction machinery manufacturer. The data set consists of event sequence data from concrete pump trucks over 6 months (from June 2012 to November 2012). It contains $S = 32{,}632$ work cycles, $P = 5$ different working places, $T = 6$ different working dates, a total of $N = 22{,}418{,}756$ event tokens, and an event set of $E = 33$ unique events. The working date of each work cycle is the month in which it actually took place, so the working date set is $\mathbf{T} = \{\text{Jun}, \text{Jul}, \text{Aug}, \text{Sep}, \text{Oct}, \text{Nov}\}$. Because the event sequence data were all collected in the Chinese mainland, we divide the working places into 5 regions according to the administrative regions of China: Northern China, Northeastern China, Eastern China, Mid-Southern China, and Western China.

The concrete pump truck is a type of construction machinery: a truck fitted with a concrete pump. It alternates between two working statuses, traveling and pumping. In the pumping status, it pushes the concrete to the specified location. In the traveling status, it is just a truck. In the experiment, we mainly focus on events in the pumping status. Figure 4 shows the stream of the concrete in the concrete pump truck at runtime and the operation sequence of the different systems in the concrete pump truck. The concrete pump truck first switches to the pumping status and then unfolds and fixes the landing legs. Next, it unfolds the cantilever to the specified location. Afterwards, the concrete is poured into the hopper, and meanwhile the stirring system starts stirring the concrete. Finally, the pumping system starts pumping the concrete in the hopper to the specified location. When the pumping ends, the concrete pump truck stops the pumping system and then folds the cantilever and the landing legs successively. Table 1 shows the relations between systems and events in the concrete pump truck.

Table 1: Event set.

| Event | Abbr. | Type | Related system |
|---|---|---|---|
| Stop pumping mandatorily | SPM | Alarm event | All |
| Reminder of concrete import | RCI | Alarm event | Hopper |
| Concrete piston withdrawing | CPW | Alarm event | Pumping system |
| Reminder of concrete cylinder water | RCSW | Alarm event | Hopper |
| Swing cylinder initiate | SCI | Operation event | Pumping system |
| Stalling of engine | SoE | Alarm event | All |
| Alteration of operation mode (remote or close) | AOM | Operation event | Pumping system |
| Alteration of pump truck status (pumping or travelling) | APTS | Operation event | All |
| Control of pumping displacement | CPD | Operation event | Pumping system |
| Transportation cylinder initiate | TCI | Operation event | Pumping system |
| Manual control of master cylinder | MCMC | Operation event | Pumping system |
| Manual control of swing cylinder | MCSC | Operation event | Pumping system |
| Detection of system pressure | DSP | Alarm event | Pumping system |
| Manual control of engine speed | MCES | Operation event | Pumping system |
| High pressure mode initiate | HPMI | Operation event | Pumping system |
| Warm-up initiate | WUI | Operation event | Pumping system |
| Water pump initiate | WPI | Operation event | Hopper |
| Concrete stirring initiate | CSI | Operation event | Stirring system |
| Cantilever folding initiate | CFI | Operation event | Cantilever system |
| Temperature control initiate | TCI | Operation event | Pumping system |
| Cantilever movement | CM | Operation event | Cantilever system |
| Landing leg movement | LLM | Operation event | Landing leg system |
| Detection of oil pressure | DOP | Alarm event | Pumping system |
| Landing leg folding | LLF | Operation event | Landing leg system |
| Rotary table movement | RTM | Operation event | Cantilever system |
| Oil pump initiate | OPI | Operation event | Pumping system |
| Energy accumulator initiate | EAI | Operation event | Pumping system |
| Bypath valve initiate | BVI | Operation event | Pumping system |
| Concrete pumping initiate | CPI | Operation event | Pumping system |
| Master cylinder initiate | MCI | Operation event | Pumping system |
| Cantilever shock absorbers initiate | CSAI | Alarm event | Cantilever system |
| Initiate of system cooling | ISC | Operation event | Pumping system |
| Hydraulic oil supplement | HOS | Operation event | Pumping system |

Table 1 lists all the events in the event set. There are two types of events: alarm events and operation events. An alarm event reminds the operator that some emergency has happened. For example, the occurrence of event RCI reminds the operator to import concrete into the hopper. An alarm event is not a regular operation, whereas an operation event is the actual record of a regular operation of the concrete pump truck.

6.2. Analysis for Gibbs Sampling Using Perplexity. As mentioned earlier, in the experiments described in this paper we do not estimate the hyperparameters $\alpha$, $\beta$, and $\gamma$. Instead, they are fixed at $50/\Pi$, $0.01$, and $50/\Pi$, respectively. We use the perplexity of the model on test work cycles to evaluate when the performance of the model begins to stabilize.

The perplexity of a new, unobserved work cycle $s$ that contains events $\mathbf{e}_s$ and is conditioned on the working places $\mathbf{p}_s$ and working dates $\boldsymbol{\tau}_s$ of the work cycle is defined as

$$\mathrm{Perplexity}(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s) = \exp\left(-\frac{\log P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)}{N_s}\right), \tag{10}$$

where $P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)$ is the probability assigned by the WCM. To simplify notation, we do not show the explicit dependency on the hyperparameters. For multiple work cycles, we report the average perplexity over work cycles, defined as follows:

$$\mathrm{Perplexity} = \frac{1}{S} \sum_{s=1}^{S} \mathrm{Perplexity}(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s). \tag{11}$$
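Equations (10) and (11) translate directly into code. The following Python sketch assumes that the log-probability of each test work cycle has already been computed by the model; the function names are illustrative.

```python
import math


def perplexity(log_prob, n_tokens):
    """Equation (10): per-cycle perplexity from log P(e_s | p_s, tau_s)."""
    return math.exp(-log_prob / n_tokens)


def average_perplexity(log_probs, token_counts):
    """Equation (11): mean of the per-cycle perplexities."""
    values = [perplexity(lp, n) for lp, n in zip(log_probs, token_counts)]
    return sum(values) / len(values)
```

As a sanity check, a work cycle of four events in which the model assigns probability 1/4 to every event has perplexity 4, which matches the usual reading of perplexity as an effective number of equally likely choices per event.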

The lower the perplexity, the better the performance of the model. We can obtain an approximate estimate of the perplexity by averaging over multiple samples according to (9), calculated as follows:

$$P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s) = \frac{1}{K} \sum_{k=1}^{K} \prod_{i=1}^{N_s} \left[ \frac{1}{P_s T_s} \sum_{p \in \mathbf{p}_s,\, \tau \in \boldsymbol{\tau}_s,\, \pi} E\left[\theta_{\pi p}\, \delta_{\pi\tau}\, \phi_{e_{si}\pi} \mid \mathbf{x}^k, \mathbf{y}^k, \mathbf{z}^k\right] \right]. \tag{12}$$

Figure 5: Perplexity as a function of iterations of the Gibbs sampler for a $\Pi = 200$ model. Each curve shows the perplexity obtained by averaging over a different number of samples ($K = 2, 4, 6, 8, 10$). (Horizontal axis: iteration, 0 to 200; vertical axis: perplexity.)
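The sample average in (12) can be sketched as follows. The sketch assumes that each Gibbs sample $k$ has already been turned into point estimates of $\Phi$, $\Theta$, and $\Delta$, and that the expectation of the product factorizes into the product of these estimates, which holds when the three posteriors are independent given the assignments, as the form of (8) suggests. The function names and the nested-list layout (`phi[e][mode]`, `theta[mode][place]`, `delta[mode][date]`) are assumptions.

```python
def cycle_probability(events, places, dates, phi, theta, delta):
    """Probability of one work cycle under a single set of point estimates."""
    n_modes = len(theta)
    norm = 1.0 / (len(places) * len(dates))
    prob = 1.0
    for e in events:
        prob *= norm * sum(phi[e][m] * theta[m][p] * delta[m][t]
                           for p in places
                           for t in dates
                           for m in range(n_modes))
    return prob


def averaged_cycle_probability(events, places, dates, samples):
    """Equation (12): average the cycle probability over K samples.

    samples: list of (phi, theta, delta) tuples, one per Gibbs sample.
    """
    total = sum(cycle_probability(events, places, dates, phi, theta, delta)
                for phi, theta, delta in samples)
    return total / len(samples)
```

For work cycles with many events the plain product underflows, so a practical version would accumulate logarithms per sample and combine the $K$ values with a log-sum-exp.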

Experimental results using different values of $K$ indicated that $K = 10$ samples is a reasonable choice for a good approximation of the perplexity. Because of the exchangeability of the working modes, it is possible that quite different solutions of working modes are detected across different samples. In practice, however, we have found that the solutions are relatively stable across samples, with only a small subset of unique working modes appearing in any sample. Hence, we use the average perplexity across samples in the experiment.

Figure 5 illustrates the perplexity as a function of iterations of the Gibbs sampler for a $\Pi = 200$ model fitted to the data set. The performance of models trained using the Gibbs sampler (for different settings of $K$) appears to stabilize rather quickly, at least in terms of perplexity on the data set: the perplexity values flatten out after about 100 iterations.

6.3. The Number of Working Modes $\Pi$. Although the perplexity computation can be averaged over different Gibbs sampler runs, other applications of the model rely on the analysis of each working mode and are therefore based on individual samples. Meanwhile, the parameter $\Pi$, which represents the number of working modes, is also set according to the perplexity.

Figure 6: Perplexity as a function of the parameter $\Pi$ of the Gibbs sampler for $K = 10$ samples. (Horizontal axis: number of working modes $\Pi$, 0 to 300; vertical axis: perplexity.)

Figure 6 illustrates the perplexity as a function of the parameter $\Pi$, using $K = 10$ Gibbs samples. The average perplexity over the data set decreases as $\Pi$ increases. In particular, the perplexity values flatten out once $\Pi$ reaches 200, which indicates that $\Pi = 200$ is a suitable setting for this data set.
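The paper reads the setting of $\Pi$ off the perplexity curve by eye. One way to make the "flattening out" criterion explicit is sketched below; the function name, the relative-improvement rule, and the tolerance `rel_tol` are assumptions introduced here, not part of the original experiments.

```python
def smallest_flat_setting(settings, perplexities, rel_tol=0.01):
    """Return the first setting after which perplexity stops improving.

    settings: candidate numbers of working modes, in increasing order
    perplexities: average perplexity measured for each setting
    rel_tol: relative improvement below which the curve counts as flat
    """
    for i in range(len(settings) - 1):
        improvement = (perplexities[i] - perplexities[i + 1]) / perplexities[i]
        if improvement < rel_tol:
            return settings[i]
    # The curve never flattened within the tested range.
    return settings[-1]
```

A rule of this kind only formalizes the visual inspection; the tolerance still has to be chosen with the application in mind.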

6.4. Analysis of the WCM Results. To analyze the WCM results, we can use the point estimates of the WCM parameters to look at specific $\Theta$, $\Delta$, and $\Phi$ distributions and at related quantities that can be derived from these parameters (such as the probability of a working place and a working date given a randomly selected event from a working mode). In the following results, we take a specific sample $\mathbf{x}^k$, $\mathbf{y}^k$, and $\mathbf{z}^k$ after 100 iterations from a single arbitrarily selected Gibbs run and then generate point estimates of $\Theta$, $\Delta$, and $\Phi$ using (9).

There are 200 working modes in total ($\Pi = 200$). Each working mode, through its WMV, helps us better understand the occurrences of events. For the sake of analysis, we list the highest-probability working mode for each working place and each working date from the WCM in Table 2. For each such working mode, we list the 10 events most likely to be generated, conditioned on both the working place and the working date. For example, for the working place Northern China and the working date June, the most likely working mode is number 101 of the 200 working modes, and its top 10 events are OPI, CM, EAI, RTM, BVI, SPM, AOM, APTS, CPD, and TCI.

Experimental results show that different working places have different working modes even on the same working date, and that the same working place also has different working modes on different working dates. This indicates that the working mode is indeed related to the working place and the working date. Events related to the pumping system, such as OPI, MCI, and CPI, are among the most likely events in most working modes, which indicates that the working modes of the concrete pump truck are consistent with the actual situation. Meanwhile, events related to the cantilever system and the landing leg system, such as LLF and CFI, occur less often than events of the pumping system. Moreover, in summer (working date = June, July, and August), the alarm event SPM is more likely to occur, which indicates that the concrete pump truck is more likely to fail in a hot climate. The operation event AOM is also likely to occur, which indicates that operators prefer to operate the concrete pump truck remotely.

Table 2: The highest probability working mode for each working place and each working date from the WCM.

| Working place | Working date | Probability | Working mode | Events |
|---|---|---|---|---|
| Northern China | Jun | 0.0251 | 101 | OPI, CM, EAI, RTM, BVI, SPM, AOM, APTS, CPD, TCI |
| Northern China | Jul | 0.0341 | 164 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, TCI |
| Northern China | Aug | 0.0051 | 62 | LLF, CFI, APTS, CSI, AOM, ISC, RTM, SPM, CPD, TCI |
| Northern China | Sep | 0.0342 | 12 | OPI, RTM, CM, CPI, BVI, LLF, SPM, AOM, APTS, CPD |
| Northern China | Oct | 0.0351 | 49 | RTM, OPI, CM, BVI, MCI, SPM, AOM, APTS, CPD, TCI |
| Northern China | Nov | 0.0353 | 129 | OPI, ISC, SPM, EAI, CSI, APTS, AOM, CPD, TCI, MCMC |
| Northeastern China | Jun | 0.0258 | 176 | OPI, SPM, EAI, HOS, BVI, MCI, AOM, APTS, CPD, TCI |
| Northeastern China | Jul | 0.0263 | 29 | OPI, LLF, ISC, SPM, CFI, APTS, CPI, HOS, AOM, CPD |
| Northeastern China | Aug | 0.0141 | 71 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, TCI |
| Northeastern China | Sep | 0.0114 | 111 | RTM, BVI, OPI, CM, MCI, HOS, EAI, SPM, AOM, APTS |
| Northeastern China | Oct | 0.0146 | 69 | ISC, LLF, CSI, AOM, APTS, OPI, CFI, SPM, CPD, TCI |
| Northeastern China | Nov | 0.0257 | 93 | RTM, OPI, BVI, MCI, CM, CPI, SPM, AOM, APTS, CPD |
| Eastern China | Jun | 0.0279 | 177 | OPI, HOS, CPI, SPM, LLF, RTM, EAI, BVI, AOM, APTS |
| Eastern China | Jul | 0.0201 | 72 | OPI, EAI, CPI, SPM, MCI, RTM, HOS, AOM, APTS, CPD |
| Eastern China | Aug | 0.0277 | 87 | OPI, BVI, EAI, RTM, AOM, SPM, MCI, APTS, CPD, TCI |
| Eastern China | Sep | 0.0274 | 9 | OPI, EAI, BVI, RTM, HOS, SPM, AOM, APTS, CPD, TCI |
| Eastern China | Oct | 0.0214 | 191 | RTM, MCI, CPI, CM, EAI, OPI, HOS, SPM, AOM, APTS |
| Eastern China | Nov | 0.0255 | 170 | OPI, MCI, BVI, RTM, CPI, HOS, SPM, AOM, APTS, CPD |
| Mid-Southern China | Jun | 0.0122 | 74 | OPI, EAI, CSI, CPI, ISC, MCI, SPM, AOM, APTS, CPD |
| Mid-Southern China | Jul | 0.0177 | 33 | OPI, CPI, CM, MCI, HOS, SPM, AOM, APTS, CPD, TCI |
| Mid-Southern China | Aug | 0.0262 | 187 | HOS, MCI, CPI, OPI, EAI, BVI, CSI, SPM, AOM, APTS |
| Mid-Southern China | Sep | 0.0205 | 104 | RTM, EAI, BVI, OPI, SPM, MCI, CFI, APTS, AOM, CPD |
| Mid-Southern China | Oct | 0.0193 | 39 | OPI, HOS, BVI, CM, RTM, SPM, AOM, APTS, CPD, TCI |
| Mid-Southern China | Nov | 0.0133 | 158 | OPI, BVI, RTM, MCI, CM, SPM, AOM, APTS, CPD, TCI |
| Western China | Jun | 0.0037 | 4 | OPI, RTM, BVI, CM, EAI, SPM, CPI, MCI, AOM, APTS |
| Western China | Jul | 0.0134 | 144 | HOS, MCI, CPI, OPI, CFI, EAI, SPM, AOM, APTS, CPD |
| Western China | Aug | 0.0126 | 126 | OPI, SPM, CM, BVI, AOM, LLF, APTS, CSI, CPD, TCI |
| Western China | Sep | 0.0122 | 88 | OPI, HOS, CPI, CM, LLF, AOM, CFI, MCI, BVI, SPM |
| Western China | Oct | 0.0104 | 37 | OPI, EAI, MCI, HOS, CSI, ISC, CFI, LLF, SPM, AOM |
| Western China | Nov | 0.0135 | 78 | OPI, HOS, RTM, BVI, CSI, EAI, MCI, APTS, AOM, SPM |

Because the probability of a working mode reflects the probability of its occurrence, we can analyze the workloads of different working places on different working dates. According to the probabilities of the working modes in Table 2, the working modes in Eastern China are more likely to occur than the working modes in Western China. This indicates that the concrete pump trucks in Eastern China have heavier workloads than those in Western China. Meanwhile, the concrete pump trucks have heavier workloads in June than in November. In general, we can analyze different working modes according to their probabilities.

6.5. Illustrative Applications for the WCM. In this section, we provide some illustrative examples of how the WCM can be used to answer different types of questions and prediction problems concerning the working modes of the equipment.

6.5.1. Automated Detection for a New Work Cycle. In real cases, we would like to quickly assess working mode assignments for new work cycles that are not contained in the training data set, especially for the real-time event sequence flow.

Our automated detection strategy is to apply the Gibbs sampling algorithm only to the event tokens in the new work cycle, instead of rerunning the algorithm on the whole data set for every new work cycle. In this way, the event tokens in the new work cycle are quickly assigned to their most likely working places, working dates, and working modes. The main procedure is as follows. First, we assign the events randomly to working places, working dates, and working modes. Second, we sample new assignments by applying the Gibbs sampler only to the event tokens in the new work cycle, each time temporarily updating the count matrices $C^{E\Pi}$, $C^{\Pi P}$, and $C^{\Pi T}$ used in (7).

Table 3 shows the occurrences of events for a new work cycle. After the sampling, the WCM has assigned each event to its most likely working mode; Table 3 lists the top 3 most likely working modes assigned to each event of the new work cycle. Note that each event is assigned to different working modes according to its occurrence count. According to (7), although the events of this new work cycle are assigned to different working modes, the work cycle as a whole is assigned to working mode number 107 with probability 0.0003. The top 10 most likely events in working mode number 107 are RTM, CM, OPI, BVI, SPM, CPI, MCI, SCI, ISC, and SoE.

Compared with the real occurrences of events, the automated detection result for the new work cycle is consistent with the actual situation.

6.5.2. Automated Detection of Anomalous Work Cycles. We illustrate in this section how our model can be useful for detecting anomalous work cycles. A work cycle assigned to a working mode with low probability is considered an anomalous work cycle.

We again take the work cycle shown in Table 3 as an example. The work cycle is assigned to working mode number 107 with probability 0.0003. Compared with most other working modes, working mode number 107 has a lower probability, so this work cycle is detected as an anomalous work cycle. The alarm events SPM and SoE occur frequently both in the work cycle and in working mode number 107, which supports the conclusion that this work cycle is anomalous. We also analyzed the real failure records and confirmed that the engine indeed failed frequently during this work cycle. In general, such anomalous work cycles can be detected automatically and efficiently with the help of the WCM.
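The detection rule described above amounts to thresholding the probability of the working mode assigned to a work cycle. A minimal Python sketch follows; the paper does not state a numeric cutoff, so the `threshold` argument, the function name, and the dictionary layout are assumptions.

```python
def flag_anomalous(cycle_mode_probs, threshold):
    """Return the ids of work cycles whose assigned mode is improbable.

    cycle_mode_probs: dict mapping a work cycle id to a pair
        (assigned working mode, probability of that working mode)
    threshold: hypothetical cutoff below which a cycle counts as anomalous
    """
    return sorted(cycle_id
                  for cycle_id, (_, prob) in cycle_mode_probs.items()
                  if prob < threshold)
```

With any cutoff between 0.0003 and the smallest probability in Table 2 (0.0037), the example work cycle assigned to working mode number 107 would be flagged, while the highest-probability working modes listed in Table 2 would not.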

Table 3: Actual example of automated detection for a new work cycle (working date = Jun, working place = Eastern China). Each event is assigned to its most likely working mode according to its corresponding occurrence count. The table lists the top 3 most likely working modes for each event of the new work cycle.

| Event | Count | First | Second | Third |
|---|---|---|---|---|
| SPM | 72 | 107 | 181 | 112 |
| AOM | 33 | 169 | 67 | 183 |
| APTS | 23 | 90 | 15 | 76 |
| CPD | 42 | 145 | 139 | 59 |
| TCI | 2 | 118 | 134 | 112 |
| MCMC | 0 | Null | Null | Null |
| MCSC | 0 | Null | Null | Null |
| DSP | 0 | Null | Null | Null |
| MCES | 0 | Null | Null | Null |
| HPMI | 0 | Null | Null | Null |
| WUI | 2 | 159 | 104 | 77 |
| WPI | 23 | 54 | 175 | 71 |
| CSI | 55 | 147 | 29 | 61 |
| CFI | 25 | 2 | 132 | 100 |
| TCI | 23 | 95 | 185 | 53 |
| CM | 127 | 12 | 49 | 192 |
| LLM | 55 | 189 | 114 | 23 |
| RCI | 0 | Null | Null | Null |
| DOP | 0 | Null | Null | Null |
| LLF | 40 | 111 | 10 | 42 |
| RTM | 297 | 191 | 104 | 52 |
| CPW | 0 | Null | Null | Null |
| RCSW | 0 | Null | Null | Null |
| OPI | 95 | 177 | 176 | 101 |
| EAI | 56 | 126 | 100 | 170 |
| BVI | 77 | 177 | 53 | 146 |
| CPI | 60 | 164 | 104 | 149 |
| MCI | 60 | 177 | 175 | 162 |
| SCI | 66 | 120 | 149 | 73 |
| CSAI | 0 | Null | Null | Null |
| ISC | 51 | 68 | 149 | 23 |
| HOS | 0 | Null | Null | Null |
| SoE | 33 | 119 | 112 | 107 |

7. Conclusions and Future Work

The working condition model proposed in this paper provides a relatively simple probabilistic model for exploring the relationships between working place, working date, working mode, and events in a work cycle. This model provides significantly improved predictive power in terms of the analysis of working condition according to the event sequence data.

Our future work mainly includes the optimization of the model and of the model training, and the conduct of experiments on different data sets. Furthermore, further analysis of the anomalous work cycles detected by our model is also an interesting question.

Notations Associated with the WCM, As Used in This Paper

$\mathbf{P}$: Working places of all the work cycles (set)
$\mathbf{T}$: Working dates of all the work cycles (set)
$\mathbf{p}_s$: Working places of the $s$th work cycle ($P_s$-dimensional vector)
$P_s$: Number of working places of the $s$th work cycle (scalar)
$\boldsymbol{\tau}_s$: Working dates of the $s$th work cycle ($T_s$-dimensional vector)
$T_s$: Number of working dates of the $s$th work cycle (scalar)
$P$: Number of working places (scalar)
$S$: Number of work cycles (scalar)
$T$: Number of working dates (scalar)
$N_s$: Number of events in the $s$th work cycle (scalar)
$N$: Number of events in all the event sequences (scalar)
$\Pi$: Number of working modes (scalar)
$E$: Number of events in the event set (scalar)
$\mathbf{e}_s$: Event sequence vector for the $s$th work cycle ($N_s$-dimensional vector)
$e_{si}$: $i$th event in the $s$th work cycle ($i$th component of vector $\mathbf{e}_s$)
$\mathbf{x}$: Working place assignments ($N$-dimensional vector)
$x_{si}$: Working place assignment for event $e_{si}$ ($i$th component of vector $\mathbf{x}_s$)
$\mathbf{y}$: Working date assignments ($N$-dimensional vector)
$y_{si}$: Working date assignment for event $e_{si}$ ($i$th component of vector $\mathbf{y}_s$)
$\mathbf{z}$: Working mode assignments ($N$-dimensional vector)
$z_{si}$: Working mode assignment for event $e_{si}$ ($i$th component of vector $\mathbf{z}_s$)
$\alpha$, $\beta$, $\gamma$: Dirichlet priors (scalars)
$\Phi$: Probabilities of events given working modes ($E \times \Pi$ matrix)
$\phi_\pi$: Probabilities of events given working mode $\pi$ ($E$-dimensional vector)
$\Theta$: Probabilities of working modes given working places ($\Pi \times P$ matrix)
$\theta_p$: Probabilities of working modes given working place $p$ ($\Pi$-dimensional vector)
$\Delta$: Probabilities of working modes given working dates ($\Pi \times T$ matrix)
$\delta_\tau$: Probabilities of working modes given working date $\tau$ ($\Pi$-dimensional vector)

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J. Holler, V. Tsiatsis, C. Mulligan, S. Avesand, S. Karnouskos, and D. Boyle, From Machine-to-Machine to the Internet of Things: Introduction to a New Age of Intelligence, Academic Press, 2014.

[2] C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos, “Sensing as a service model for smart cities supported by Internet of Things,” Transactions on Emerging Telecommunications Technologies, vol. 25, no. 1, pp. 81–93, 2014.

[3] R. F. Mesquita Brandao and J. A. Beleza Carvalho, “The importance of control monitoring systems in wind parks maintenance,” British Journal of Applied Science & Technology, vol. 4, no. 10, pp. 1461–1471, 2014.

[4] C. J. Crabtree, D. Zappala, and P. J. Tavner, “Survey of commercially available condition monitoring systems for wind turbines,” Tech. Rep., Durham University, 2014.

[5] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.

[6] S. Kandula, R. Mahajan, P. Verkaik, S. Agarwal, J. Padhye, and P. Bahl, “Detailed diagnosis in enterprise networks,” in Proceedings of the ACM SIGCOMM Conference on Data Communication (SIGCOMM ’09), vol. 39, pp. 243–254, ACM, August 2009.

[7] J.-G. Lou, Q. Fu, Y. Wang, and J. Li, “Mining dependency in distributed systems through unstructured logs analysis,” ACM SIGOPS Operating Systems Review, vol. 44, no. 1, pp. 91–96, 2010.

[8] C. Luo, J.-G. Lou, Q. Lin et al., “Correlating events with time series for incident diagnosis,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14), pp. 1583–1592, ACM, August 2014.

[9] J. Chen and R. Kumar, “Online failure diagnosis of stochastic discrete event systems,” in Proceedings of the IEEE Conference on Computer Aided Control System Design (CACSD ’13), pp. 194–199, IEEE, August 2013.

[10] J. Chen and R. Kumar, “Failure diagnosis of discrete-time stochastic systems subject to temporal logic correctness requirements,” in Proceedings of the 11th IEEE International Conference on Networking, Sensing and Control (ICNSC ’14), pp. 42–47, IEEE, April 2014.

[11] Business Process Model and Notation (BPMN) Version 2.0, OMG Specification, Object Management Group, 2011.

[12] F. Leymann, “Bpel vs. bpmn 2.0: should you care?” in Business Process Modeling Notation, pp. 8–13, Springer, Berlin, Germany, 2011.

[13] C. C. Aggarwal, Managing and Mining Sensor Data, Springer, 2013.

[14] N. H. Gehani, H. V. Jagadish, and O. Shmueli, “Composite event specification in active databases: model and implementation,” in Proceedings of the 18th VLDB Conference, Vancouver (VLDB ’92), vol. 92, pp. 327–338, Citeseer, British Columbia, Canada, 1992.

[15] I. Davidson, S. Gilpin, and P. B. Walker, “Behavioral event data and their analysis,” Data Mining and Knowledge Discovery, vol. 25, no. 3, pp. 635–653, 2012.

[16] J. Han and M. Kamber, Data Mining, Southeast Asia Edition: Concepts and Techniques, Morgan Kaufmann, 2006.

[17] H. R. Motahari-Nezhad, R. Saint-Paul, F. Casati, and B. Benatallah, “Event correlation for process discovery from web service interaction logs,” The VLDB Journal, vol. 20, no. 3, pp. 417–444, 2011.


[18] F. Skopik and R. Fiedler, “Intrusion detection in distributed systems using fingerprinting and massive event correlation,” in GI-Jahrestagung, pp. 2240–2254, 2013.

[19] G. A. Wilkin, P. Eugster, and K. R. Jayaram, “Decentralized fault-tolerant event correlation,” ACM Transactions on Internet Technology, vol. 14, no. 1, article 5, 2014.

[20] H. Wei, “A correlation analysis method for network security events,” in Informatics and Management Science III, vol. 206 of Lecture Notes in Electrical Engineering, pp. 269–277, Springer, London, UK, 2013.

[21] W. Van Der Aalst, A. Adriansyah, A. K. A. de Medeiros et al., “Process mining manifesto,” in Business Process Management Workshops, pp. 169–194, Springer, Berlin, Germany, 2012.

[22] J. C. A. M. Buijs, B. F. van Dongen, and W. M. P. van der Aalst, “Mining configurable process models from collections of event logs,” in Business Process Management, pp. 33–48, Springer, 2013.

[23] A. Rebuge and D. R. Ferreira, “Business process analysis in healthcare environments: a methodology based on process mining,” Information Systems, vol. 37, no. 2, pp. 99–116, 2012.

[24] J. Wang, R. K. Wong, J. Ding, Q. Guo, and L. Wen, “On recommendation of process mining algorithms,” in Proceedings of the IEEE 19th International Conference on Web Services (ICWS ’12), pp. 311–318, IEEE, Honolulu, Hawaii, USA, June 2012.

[25] R. S. Mans, W. M. P. van der Aalst, and H. M. W. Verbeek, “Supporting process mining workflows with rapidprom,” in Proceedings of the Business Process Management Demo Sessions (BPMD ’14), vol. 1295, pp. 56–60, Eindhoven, The Netherlands, September 2014.

[26] C. Li, M. Reichert, and A. Wombacher, “Mining business process variants: challenges, scenarios, algorithms,” Data & Knowledge Engineering, vol. 70, no. 5, pp. 409–434, 2011.

[27] R. Accorsi, T. Stocker, and G. Muller, “On the exploitation of process mining for security audits: the process discovery case,” in Proceedings of the 28th Annual ACM Symposium on Applied Computing, pp. 1462–1468, ACM, March 2013.

[28] B.-J. Lee, S.-G. Park, K.-B. Min et al., “The relationship between working condition factors and well-being,” Annals of Occupational and Environmental Medicine, vol. 26, no. 1, article 34, 2014.

[29] J. Cohen, Statistical Power Analysis for the Behavioral Sciences, Routledge Academic, New York, NY, USA, 2013.

[30] P. Bahl, R. Chandra, A. Greenberg, S. Kandula, D. A. Maltz, and M. Zhang, “Towards highly reliable enterprise network services via inference of multi-level dependencies,” ACM SIGCOMM Computer Communication Review, vol. 37, no. 4, pp. 13–24, 2007.

[31] B. Rosner, Fundamentals of Biostatistics, Cengage Learning, 2010.

[32] A. Zimmermann, “Colored petri nets,” in Stochastic Discrete Event Systems: Modeling, Evaluation, Applications, pp. 99–124, Springer, 2008.

[33] A. Adriansyah, B. F. van Dongen, and W. M. P. van der Aalst, “Towards robust conformance checking,” in Business Process Management Workshops, vol. 66 of Lecture Notes in Business Information Processing, pp. 122–133, Springer, Berlin, Germany, 2011.

[34] M. Weidlich and M. Weske, Business Process Modeling Notation, Springer, Berlin, Germany, 2010.

[35] C. M. Bishop and J. Lasserre, “Generative or discriminative? Getting the best of both worlds,” in Bayesian Statistics, J. M. Bernardo, M. J. Bayarri, J. O. Berger et al., Eds., vol. 8, pp. 3–23, Oxford University, 2007.

[36] C. M. Bishop, Pattern Recognition and Machine Learning, Volume 1, Springer, New York, NY, USA, 2006.

[37] D. M. Blei and J. D. Lafferty, “Dynamic topic models,” in Proceedings of the 23rd International Conference on Machine Learning (ICML ’06), pp. 113–120, ACM, June 2006.

[38] J. Foulds, L. Boyles, C. DuBois, P. Smyth, and M. Welling, “Stochastic collapsed variational Bayesian inference for latent dirichlet allocation,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 446–454, ACM, 2013.

[39] J. Pearl, Bayesian Networks, Department of Statistics, UCLA, 2011.

[40] I. Porteous, D. Newman, A. Ihler, A. Asuncion, P. Smyth, and M. Welling, “Fast collapsed gibbs sampling for latent dirichlet allocation,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’08), pp. 569–577, ACM, August 2008.


the working condition of its corresponding work cycle, especially the occurrence disciplines of events.

In the remainder of this section, we first introduce the WCM for learning the WMS for a group of given work cycles and then introduce the inference framework of the WCM.

4.1. The WCM. The WCM is a hierarchical generative model in which each event $e$ in a work cycle is associated with three latent variables: a working place $x$, a working date $y$, and a working mode $z$. These latent variables augment the $E$-dimensional vector $\mathbf{e}$ (indicating the values of all events in the event set $\mathbf{E}$) with three additional $E$-dimensional vectors $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$, indicating working place, working date, and working mode assignments for the $E$ events.

In our setting, the set of working places and the set of working dates of each work cycle are observed. This sets aside the issue of unobserved working places and working dates and avoids the need to define a prior on working places and working dates, which is outside the scope of our model. Each working place is associated with a multinomial distribution over working modes, and each working date is also associated with a multinomial distribution over working modes. Conditioned on the set of working places and the set of working dates, together with their distributions over working modes, the process by which the event sequence of a work cycle is simulated can be summarized as follows. First, a working place and a working date are each chosen uniformly at random for each event that will appear in the work cycle. Next, a working mode is sampled for each event, both from the distribution over working modes associated with the working place of that event and from the distribution over working modes associated with the working date of that event. Finally, the events themselves are sampled from the distribution over events associated with each working mode.

This simulating process can be expressed more formally by defining some of the other variables in the WCM. Assume we have $\Pi$ working modes. We can parameterize the multinomial distribution over working modes for each working place using a matrix $\Theta$ of size $\Pi \times P$, with elements $\theta_{\pi p}$ that stand for the probability of assigning working mode $\pi$ to an event occurring in working place $p$. Thus $\sum_{\pi=1}^{\Pi} \theta_{\pi p} = 1$, and for simplicity of notation we will drop the index $\pi$ when convenient and use $\theta_p$ to stand for the $p$th column of the matrix $\Theta$. Similarly, we use a matrix $\Delta$ of size $\Pi \times T$ to parameterize the multinomial distribution over working modes for each working date, where the elements $\delta_{\pi\tau}$ stand for the probability of assigning working mode $\pi$ to an event occurring on working date $\tau$. Thus $\sum_{\pi=1}^{\Pi} \delta_{\pi\tau} = 1$, and we will also drop the index $\pi$ when convenient and use $\delta_\tau$ to stand for the $\tau$th column of the matrix $\Delta$. The multinomial distributions over events associated with each working mode are parameterized by a matrix $\Phi$ of size $E \times \Pi$, with elements $\phi_{e\pi}$ that stand for the probability that event $e$ occurs in working mode $\pi$. Again, $\sum_{e=1}^{E} \phi_{e\pi} = 1$, and $\phi_\pi$ stands for the $\pi$th column of the matrix $\Phi$. The three families of multinomial distributions $\Theta$, $\Delta$, and $\Phi$ are assumed to be generated from symmetric Dirichlet priors with hyperparameters $\alpha$, $\gamma$, and $\beta$, respectively. In the results of this paper, we assume that these hyperparameters are fixed. This notation is summarized in the Notations section.

Figure 3: The graphic representation of WCM.

The sequential simulating procedure of first picking a working place and a working date, followed by picking a working mode and then simulating an event to occur in this working mode according to the probability distributions, leads to the following generative process:

(1) For each working place $p = 1, \ldots, P$, choose $\theta_p \sim \mathrm{Dirichlet}(\alpha)$;
    for each working date $\tau = 1, \ldots, T$, choose $\delta_\tau \sim \mathrm{Dirichlet}(\gamma)$;
    for each working mode $\pi = 1, \ldots, \Pi$, choose $\phi_\pi \sim \mathrm{Dirichlet}(\beta)$.

(2) For each work cycle $s = 1, \ldots, S$:
    given the vector of working places $\mathbf{p}_s$;
    given the vector of working dates $\boldsymbol{\tau}_s$;
    for each event $i = 1, \ldots, N_s$:
        conditioned on $\mathbf{p}_s$, choose working place $x_{si} \sim \mathrm{Uniform}(\mathbf{p}_s)$;
        conditioned on $\boldsymbol{\tau}_s$, choose working date $y_{si} \sim \mathrm{Uniform}(\boldsymbol{\tau}_s)$;
        conditioned on $x_{si}$ and $y_{si}$, choose working mode $z_{si} \sim \mathrm{Discrete}(\theta_{x_{si}}, \delta_{y_{si}})$;
        conditioned on $z_{si}$, choose event $e_{si} \sim \mathrm{Discrete}(\phi_{z_{si}})$.
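The generative process above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `simulate_work_cycles` and its arguments are hypothetical, and the way the place and date distributions are combined when drawing a working mode (a normalized elementwise product of $\theta_{x_{si}}$ and $\delta_{y_{si}}$) is an assumption suggested by the product $\theta_{\pi p}\delta_{\pi\tau}$ in the likelihood.

```python
import numpy as np

def simulate_work_cycles(cycles, n_places, n_dates, n_modes, n_events,
                         alpha, beta, gamma, seed=0):
    """Simulate event sequences following the two-step generative process.

    cycles: list of (places, dates, n_tokens); `places` and `dates` are the
            observed index lists p_s and tau_s of work cycle s.
    Returns the simulated sequences and the sampled Theta, Delta, Phi.
    """
    rng = np.random.default_rng(seed)
    # Step (1): draw parameters from symmetric Dirichlet priors.
    theta = rng.dirichlet(np.full(n_modes, alpha), size=n_places).T  # Pi x P
    delta = rng.dirichlet(np.full(n_modes, gamma), size=n_dates).T   # Pi x T
    phi = rng.dirichlet(np.full(n_events, beta), size=n_modes).T     # E x Pi

    sequences = []
    # Step (2): simulate every event token of every work cycle.
    for places, dates, n_tokens in cycles:
        events = []
        for _ in range(n_tokens):
            x = rng.choice(places)   # working place, uniform over p_s
            y = rng.choice(dates)    # working date, uniform over tau_s
            # Assumption: combine theta_x and delta_y by a normalized product.
            weights = theta[:, x] * delta[:, y]
            z = rng.choice(n_modes, p=weights / weights.sum())
            e = rng.choice(n_events, p=phi[:, z])  # event given working mode
            events.append(int(e))
        sequences.append(events)
    return sequences, theta, delta, phi
```

For example, `simulate_work_cycles([([0, 1], [2], 5)], 4, 3, 6, 10, 0.5, 0.1, 0.5)` simulates one work cycle with five events, two candidate working places, and one working date.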

The graphical model corresponding to this process is shown in Figure 3. Under this simulating process, each event is drawn independently when conditioned on $\Phi$ and its working mode, and each working mode is drawn independently when conditioned on $\Theta$ and $\Delta$ and on the working place and working date of its event. Because the work cycles are independent given the parameters, the probability of the event sequences $\mathbf{e}$ conditioned on $\Theta$, $\Delta$, and $\Phi$ (and implicitly on a fixed number of working modes $\Pi$) is

$$P(\mathbf{e} \mid \Phi, \Delta, \Theta, \mathbf{P}, \mathbf{T}) = \prod_{s=1}^{S} P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s). \tag{1}$$

With the help of (1), we can first obtain the probability of the event sequence in each work cycle, $\mathbf{e}_s$, by summing over the latent variables $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$, which gives (3). Consider

$$\begin{aligned}
P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{P}, \mathbf{T})
&= \prod_{i=1}^{N_s} P(e_{si} \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s) \\
&= \prod_{i=1}^{N_s} \sum_{\tau=1}^{T} \sum_{p=1}^{P} \sum_{\pi=1}^{\Pi} P(e_{si}, z_{si} = \pi, x_{si} = p, y_{si} = \tau \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s) \\
&= \prod_{i=1}^{N_s} \sum_{\tau=1}^{T} \sum_{p=1}^{P} \sum_{\pi=1}^{\Pi} P(e_{si} \mid z_{si} = \pi, \Phi)\, P(z_{si} = \pi \mid x_{si} = p, \Theta)\, P(z_{si} = \pi \mid y_{si} = \tau, \Delta)\, P(x_{si} = p \mid \mathbf{p}_s)\, P(y_{si} = \tau \mid \boldsymbol{\tau}_s),
\end{aligned} \tag{2}$$

$$P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{P}, \mathbf{T}) = \prod_{i=1}^{N_s} \frac{1}{P_s} \frac{1}{T_s} \sum_{p \in \mathbf{p}_s} \sum_{\tau \in \boldsymbol{\tau}_s} \sum_{\pi=1}^{\Pi} \phi_{e_{si}\pi}\, \theta_{\pi p}\, \delta_{\pi\tau}, \tag{3}$$

$$P(\mathbf{e} \mid \alpha, \beta, \gamma, \mathbf{P}, \mathbf{T}) = \int_{\Theta} \int_{\Delta} \int_{\Phi} P(\mathbf{e} \mid \Theta, \Delta, \Phi, \mathbf{P}, \mathbf{T})\, P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)\, d\Theta\, d\Delta\, d\Phi \tag{4}$$

$$= \int_{\Theta} \int_{\Delta} \int_{\Phi} \left[ \prod_{s=1}^{S} \prod_{i=1}^{N_s} \frac{1}{P_s} \frac{1}{T_s} \sum_{p \in \mathbf{p}_s} \sum_{\tau \in \boldsymbol{\tau}_s} \sum_{\pi=1}^{\Pi} \phi_{e_{si}\pi}\, \theta_{\pi p}\, \delta_{\pi\tau} \right] P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)\, d\Theta\, d\Delta\, d\Phi. \tag{5}$$

In (3), the factorization makes use of the conditional independence assumptions of the model. Meanwhile, the variables $\mathbf{x}$ and $\mathbf{y}$ are mutually independent. Equation (3) represents the probability of the events $\mathbf{e}_s$ in terms of the entries of the parameter matrices $\Theta$, $\Phi$, and $\Delta$ introduced above. The probability distribution over working place assignments, $P(x_{si} = p \mid \mathbf{p}_s)$, is assumed to be uniform over the elements of $\mathbf{p}_s$ and deterministic if $P_s = 1$. Similarly, the probability distribution over working date assignments, $P(y_{si} = \tau \mid \boldsymbol{\tau}_s)$, is assumed to be uniform over the elements of $\boldsymbol{\tau}_s$ and deterministic if $T_s = 1$. The probability distributions over working mode assignments, $P(z_{si} = \pi \mid x_{si} = p, \Theta)$ and $P(z_{si} = \pi \mid y_{si} = \tau, \Delta)$, are the multinomial distributions $\theta_p$ and $\delta_\tau$ in $\Theta$ and $\Delta$ that correspond to working place $p$ and working date $\tau$, respectively. The probability of an event given a working mode assignment, $P(e_{si} \mid z_{si} = \pi, \Phi)$, is the multinomial distribution $\phi_\pi$ in $\Phi$ that corresponds to working mode $\pi$.

In (4) and (5), we treat $\Theta$, $\Phi$, and $\Delta$ as random variables and compute the marginal probability of the data set by integrating them out. Here $P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta) = P(\Theta \mid \alpha)\, P(\Delta \mid \gamma)\, P(\Phi \mid \beta)$ is the product of the Dirichlet priors on $\Theta$, $\Delta$, and $\Phi$, respectively, as defined before.
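As a concrete reading of the per-work-cycle likelihood in (3), the following NumPy sketch evaluates $\log P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s)$ for given parameter matrices. The function name and argument layout are hypothetical. The sketch relies on the observation that the double sum over $p$ and $\tau$ factorizes for each working mode, so that $\sum_{p}\sum_{\tau}\theta_{\pi p}\delta_{\pi\tau} = (\sum_{p}\theta_{\pi p})(\sum_{\tau}\delta_{\pi\tau})$.

```python
import numpy as np

def work_cycle_log_likelihood(events, places, dates, phi, theta, delta):
    """Log of Eq. (3) for one work cycle.

    events: event indices e_si of the work cycle (length N_s)
    places, dates: index lists p_s and tau_s of the work cycle
    phi: E x Pi, theta: Pi x P, delta: Pi x T parameter matrices
    """
    events = np.asarray(events)
    # For each working mode, sum theta over p in p_s and delta over tau in tau_s.
    theta_sum = theta[:, places].sum(axis=1)   # length Pi
    delta_sum = delta[:, dates].sum(axis=1)    # length Pi
    # Uniform factors 1/P_s and 1/T_s from the place and date assignments.
    mode_weight = theta_sum * delta_sum / (len(places) * len(dates))
    # Per-token probability: sum over working modes of phi[e, pi] * weight[pi].
    token_prob = phi[events, :] @ mode_weight
    return float(np.log(token_prob).sum())
```

Working in log space avoids numerical underflow, which matters because a work cycle may contain hundreds of events.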

5. Inference of WCM from Data

The WCM contains three continuous random variables, $\Theta$, $\Delta$, and $\Phi$. Various approximate inference approaches have recently been proposed for estimating the posterior distribution of continuous random variables in hierarchical Bayesian models. In this paper, our inference method is Gibbs sampling [40], which is a special form of Markov chain Monte Carlo.

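Our target of estimation is to compute the posterior distribution $P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)$ (conditioning on the training data $D_{\mathrm{train}}$ is left implicit here). In order to sample the values of the distribution, we have to use the latent variables $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ to estimate the posterior distribution:

$$P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta) = \sum_{\mathbf{x}, \mathbf{y}, \mathbf{z}} P(\Theta, \Delta, \Phi \mid \mathbf{x}, \mathbf{y}, \mathbf{z}, \alpha, \gamma, \beta)\, P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid \alpha, \gamma, \beta). \tag{6}$$

The estimation process mainly involves two steps: first, we use Gibbs sampling to get an approximate posterior $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid \alpha, \gamma, \beta)$; second, $P(\Theta, \Delta, \Phi \mid \mathbf{x}, \mathbf{y}, \mathbf{z}, \alpha, \gamma, \beta)$ can be computed directly for each sample by exploiting the fact that the Dirichlet distribution is conjugate to the multinomial.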

5.1. Gibbs Sampling. Using Gibbs sampling, we can generate a sample from the joint distribution $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid D_{\mathrm{train}}, \alpha, \beta, \gamma)$ in two steps: first, sampling the working place assignment $x_{si}$, working date assignment $y_{si}$, and working mode assignment $z_{si}$ for an individual event $e_{si}$, conditioned on fixed assignments of working places, working dates, and working modes for all other events in the data set; second, repeating this process for each event. A single Gibbs sampling iteration consists of sequentially performing this sampling of working place, working date, and working mode assignments for each individual event in the data set.

$$\begin{aligned}
&P(x_{si} = p, y_{si} = \tau, z_{si} = \pi \mid e_{si} = e, \mathbf{x}_{-si}, \mathbf{y}_{-si}, \mathbf{z}_{-si}, \mathbf{e}_{-si}, \mathbf{P}, \mathbf{T}, \alpha, \beta, \gamma) \\
&\quad \propto \frac{C^{E\Pi}_{e\pi,-si} + \beta}{\sum_{e'} C^{E\Pi}_{e'\pi,-si} + E\beta} \cdot \frac{C^{\Pi P}_{\pi p,-si} + \alpha}{\sum_{\pi'} C^{\Pi P}_{\pi' p,-si} + \Pi\alpha} \cdot \frac{C^{\Pi T}_{\pi\tau,-si} + \gamma}{\sum_{\pi'} C^{\Pi T}_{\pi'\tau,-si} + \Pi\gamma}.
\end{aligned} \tag{7}$$

According to (1)∼(5), we can derive the basic equation needed for the Gibbs sampler, shown in (7). In (7), $C^{\Pi P}$ is the working-mode-by-working-place count matrix, where $C^{\Pi P}_{\pi p,-si}$ is the number of events assigned to working mode $\pi$ in working place $p$, excluding the assignment of event $e_{si}$. Similarly, $C^{\Pi T}$ is the working-mode-by-working-date count matrix, where $C^{\Pi T}_{\pi\tau,-si}$ is the number of events assigned to working mode $\pi$ on working date $\tau$, excluding the assignment of event $e_{si}$. Likewise, $C^{E\Pi}$ is the event-by-working-mode count matrix, where $C^{E\Pi}_{e\pi,-si}$ is the number of times the $e$th entry of the event set is assigned to working mode $\pi$, excluding the working mode assignment of event $e_{si}$. Each of the three factors is normalized so that it sums to one over the index of the corresponding Dirichlet-distributed vector: over events for $\phi_\pi$ and over working modes for $\theta_p$ and $\delta_\tau$. Meanwhile, $\mathbf{x}_{-si}$, $\mathbf{y}_{-si}$, $\mathbf{z}_{-si}$, and $\mathbf{e}_{-si}$ represent the vector of working place assignments, the vector of working date assignments, the vector of working mode assignments, and the vector of event observations in the data set, respectively, except for the $i$th event in the $s$th work cycle.

The main sampling steps are as follows. We first initialize the working place, working date, and working mode assignments $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ randomly. In each Gibbs sampling iteration, we sequentially draw the working mode, working place, and working date assignment of each event from the joint conditional distribution in (7). As the number of iterations increases, the Gibbs sampler approaches its stationary distribution, the posterior distribution $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid D_{\mathrm{train}}, \alpha, \beta, \gamma)$.
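The sampling steps above can be sketched as one in-place sweep over all event tokens. This is a minimal sketch under stated assumptions rather than the authors' code: the function name `gibbs_sweep` and the data layout are hypothetical, the candidate places and dates of a token are restricted to the observed sets $\mathbf{p}_s$ and $\boldsymbol{\tau}_s$ of its work cycle, and the place and date factors are normalized over working modes (the usual Dirichlet-multinomial form with denominators $\Pi\alpha$ and $\Pi\gamma$), which is an assumption of this sketch.

```python
import numpy as np

def gibbs_sweep(tokens, x, y, z, C_ep, C_pp, C_pt, alpha, beta, gamma, rng):
    """One in-place sweep of a collapsed Gibbs sampler in the spirit of Eq. (7).

    tokens: list of (event, places, dates) per event token, where `places`
            and `dates` are the candidate lists p_s and tau_s of its work cycle
    x, y, z: integer arrays with the current assignments of every token
    C_ep: E x Pi, C_pp: Pi x P, C_pt: Pi x T integer count matrices that
          are consistent with x, y, z on entry and on exit
    """
    n_events, n_modes = C_ep.shape
    for i, (e, places, dates) in enumerate(tokens):
        # Remove the current assignment of token i from all counts.
        C_ep[e, z[i]] -= 1
        C_pp[z[i], x[i]] -= 1
        C_pt[z[i], y[i]] -= 1
        # Event factor, one value per working mode.
        f_event = (C_ep[e] + beta) / (C_ep.sum(axis=0) + n_events * beta)
        # Place and date factors (assumed normalization over working modes).
        cp = C_pp[:, places]
        f_place = (cp + alpha) / (cp.sum(axis=0) + n_modes * alpha)
        ct = C_pt[:, dates]
        f_date = (ct + gamma) / (ct.sum(axis=0) + n_modes * gamma)
        # Joint weights over (working mode, working place, working date).
        w = f_event[:, None, None] * f_place[:, :, None] * f_date[:, None, :]
        flat = rng.choice(w.size, p=(w / w.sum()).ravel())
        k, a, b = np.unravel_index(flat, w.shape)
        z[i], x[i], y[i] = k, places[a], dates[b]
        # Add the new assignment back to the counts.
        C_ep[e, z[i]] += 1
        C_pp[z[i], x[i]] += 1
        C_pt[z[i], y[i]] += 1
```

Calling `gibbs_sweep` repeatedly (for example, a few hundred times) on randomly initialized assignments yields successive states of the sampler. Recomputing the column sums for every token keeps the sketch short; a practical implementation would maintain them incrementally.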

5.2. The Posterior Probability. Given $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$, $D_{\mathrm{train}}$, $\alpha$, $\beta$, and $\gamma$, computing the posterior distributions on $\Theta$, $\Delta$, and $\Phi$ is straightforward. Because the Dirichlet distribution is conjugate to the multinomial distribution, we get

$$\begin{aligned}
\phi_\pi \mid \mathbf{z}, \beta, D_{\mathrm{train}} &\sim \mathrm{Dirichlet}\left(C^{E\Pi}_{\pi} + \beta\right), \\
\theta_p \mid \mathbf{x}, \mathbf{z}, \alpha, D_{\mathrm{train}} &\sim \mathrm{Dirichlet}\left(C^{\Pi P}_{p} + \alpha\right), \\
\delta_\tau \mid \mathbf{y}, \mathbf{z}, \gamma, D_{\mathrm{train}} &\sim \mathrm{Dirichlet}\left(C^{\Pi T}_{\tau} + \gamma\right),
\end{aligned} \tag{8}$$

where $C^{E\Pi}_{\pi}$ represents the vector of counts of the number of times each event has been assigned to working mode $\pi$; $C^{\Pi P}_{p}$ and $C^{\Pi T}_{\tau}$ are defined analogously to $C^{E\Pi}_{\pi}$. Then we can evaluate the posterior mean of each element of $\Phi$, $\Theta$, and $\Delta$ as follows:

$$\begin{aligned}
E[\phi_{e\pi} \mid \mathbf{z}^k, \beta, D_{\mathrm{train}}] &= \frac{(C^{E\Pi}_{e\pi})^k + \beta}{\sum_{e'} (C^{E\Pi}_{e'\pi})^k + E\beta}, \\
E[\theta_{\pi p} \mid \mathbf{x}^k, \mathbf{z}^k, \alpha, D_{\mathrm{train}}] &= \frac{(C^{\Pi P}_{\pi p})^k + \alpha}{\sum_{\pi'} (C^{\Pi P}_{\pi' p})^k + \Pi\alpha}, \\
E[\delta_{\pi\tau} \mid \mathbf{y}^k, \mathbf{z}^k, \gamma, D_{\mathrm{train}}] &= \frac{(C^{\Pi T}_{\pi\tau})^k + \gamma}{\sum_{\pi'} (C^{\Pi T}_{\pi'\tau})^k + \Pi\gamma},
\end{aligned} \tag{9}$$

where $(C^{E\Pi})^k$ is the matrix of event-by-working-mode counts exhibited in $\mathbf{z}^k$, and $k$ refers to sample $k$ from the Gibbs sampler. These posterior means also provide point estimates for $\Phi$, $\Theta$, and $\Delta$ and correspond to the posterior predictive distributions for the next event from a working mode, the next working mode for a working place, and the next working mode for a working date, respectively.

Figure 4: The stream of the concrete in the concrete pump truck and the operation sequence of the concrete pump truck at runtime. (Diagram labels: hopper, transportation cylinder, stirring system, pumping system, landing leg system, cantilever system, concrete, specified location; legend: concrete stream, operation sequence, related system.)
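As an illustration of how such point estimates follow from the count matrices of a single Gibbs sample, the sketch below applies the standard posterior mean of a symmetric Dirichlet-multinomial model column by column. The function name `point_estimates` is hypothetical, and the place and date estimates are normalized over working modes, so that every column of each returned matrix sums to one; this normalization is an assumption of the sketch.

```python
import numpy as np

def point_estimates(C_ep, C_pp, C_pt, alpha, beta, gamma):
    """Posterior-mean estimates of Phi, Theta, Delta from one Gibbs sample.

    C_ep: E x Pi event-by-working-mode counts
    C_pp: Pi x P working-mode-by-working-place counts
    C_pt: Pi x T working-mode-by-working-date counts
    """
    n_events, n_modes = C_ep.shape
    # Each column is the mean of a Dirichlet posterior: (count + prior) / total.
    phi = (C_ep + beta) / (C_ep.sum(axis=0, keepdims=True) + n_events * beta)
    theta = (C_pp + alpha) / (C_pp.sum(axis=0, keepdims=True) + n_modes * alpha)
    delta = (C_pt + gamma) / (C_pt.sum(axis=0, keepdims=True) + n_modes * gamma)
    return phi, theta, delta
```

For instance, with counts `[[1, 0], [3, 2]]` for two events and two working modes and $\beta = 0.5$, the estimate for the first event in the first working mode is $(1 + 0.5)/(4 + 2 \cdot 0.5) = 0.3$.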

6. Experimental Evaluation

6.1. Data Preparation. We trained the WCM on a real-world data set collected from a well-known Chinese construction machinery manufacturer. The data set is a set of event sequence data from concrete pump trucks covering 6 months (from June 2012 to November 2012). This data set contains $S = 32{,}632$ work cycles, $P = 5$ different working places, $T = 6$ different working dates, a total of $N = 22{,}418{,}756$ event tokens, and an event set size of $E = 33$ unique events. The working date of each work cycle is its real working month, which means that the working date set is $\mathbf{T} = \{\text{Jun}, \text{Jul}, \text{Aug}, \text{Sep}, \text{Oct}, \text{Nov}\}$. Because the event sequence data are all collected in the Chinese Mainland, we divide the working places into 5 regions according to the administrative regions of China: Northern China, Northeastern China, Eastern China, Mid-Southern China, and Western China.

The concrete pump truck is a type of construction machinery: a truck fitted with a concrete pump. It alternates between two working statuses, traveling and pumping. In the pumping status, it pushes the concrete to the specified location. In the traveling status, it is just a truck. In the experiment, we mainly focus on events in the pumping status. Figure 4 shows the stream of the concrete in the concrete pump truck at runtime and the operation sequence of the different systems in the concrete pump truck. The concrete pump truck first switches to the pumping status and then unfolds and fixes the landing leg. Next, it unfolds the cantilever to the specified location. Afterwards, the concrete is poured into the hopper, and meanwhile the stirring system starts stirring the concrete. Finally, the pumping system starts pumping the concrete in the hopper to the specified location. When the pumping ends, the concrete pump truck stops the pumping system and then folds the cantilever and the landing leg successively. Table 1 shows the relations between systems and events in the concrete pump truck.

Table 1: Event set.

Event                                                    Abbr. Type             Related system
Stop pumping mandatorily                                 SPM   Alarm event      All
Reminder of concrete import                              RCI   Alarm event      Hopper
Concrete piston withdrawing                              CPW   Alarm event      Pumping system
Reminder of concrete cylinder water                      RCSW  Alarm event      Hopper
Swing cylinder initiate                                  SCI   Operation event  Pumping system
Stalling of engine                                       SoE   Alarm event      All
Alteration of operation mode (remote or close)           AOM   Operation event  Pumping system
Alteration of pump truck status (pumping or travelling)  APTS  Operation event  All
Control of pumping displacement                          CPD   Operation event  Pumping system
Transportation cylinder initiate                         TCI   Operation event  Pumping system
Manual control of master cylinder                        MCMC  Operation event  Pumping system
Manual control of swing cylinder                         MCSC  Operation event  Pumping system
Detection of system pressure                             DSP   Alarm event      Pumping system
Manual control of engine speed                           MCES  Operation event  Pumping system
High pressure mode initiate                              HPMI  Operation event  Pumping system
Warm-up initiate                                         WUI   Operation event  Pumping system
Water pump initiate                                      WPI   Operation event  Hopper
Concrete stirring initiate                               CSI   Operation event  Stirring system
Cantilever folding initiate                              CFI   Operation event  Cantilever system
Temperature control initiate                             TCI   Operation event  Pumping system
Cantilever movement                                      CM    Operation event  Cantilever system
Landing leg movement                                     LLM   Operation event  Landing leg system
Detection of oil pressure                                DOP   Alarm event      Pumping system
Landing leg folding                                      LLF   Operation event  Landing leg system
Rotary table movement                                    RTM   Operation event  Cantilever system
Oil pump initiate                                        OPI   Operation event  Pumping system
Energy accumulator initiate                              EAI   Operation event  Pumping system
Bypath valve initiate                                    BVI   Operation event  Pumping system
Concrete pumping initiate                                CPI   Operation event  Pumping system
Master cylinder initiate                                 MCI   Operation event  Pumping system
Cantilever shock absorbers initiate                      CSAI  Alarm event      Cantilever system
Initiate of system cooling                               ISC   Operation event  Pumping system
Hydraulic oil supplement                                 HOS   Operation event  Pumping system

Table 1 shows all the events in the event set. There are two types of events: alarm events and operation events. An alarm event reminds the operator that some emergency has happened. For example, the occurrence of event RCI reminds the operator to import concrete into the hopper. An alarm event is not a regular operation. An operation event is the real record of a regular operation in the concrete pump truck.

6.2. Analysis for Gibbs Sampling Using Perplexity. As mentioned earlier, in the experiments described in this paper we do not estimate the hyperparameters $\alpha$, $\beta$, and $\gamma$. Instead, they are fixed at $50/\Pi$, $0.01$, and $50/\Pi$, respectively. In this paper, we use the perplexity of the model on test work cycles to evaluate when the performance of the model begins to stabilize.

The perplexity of new unobserved work cycle 119904 thatcontains events e

119904and is conditioned on the working places

p119904and working dates 120591

119904of the work cycle is defined as

Perplexity (e119904

| p119904 120591119904) = exp(minus

log119875 (e119904

| p119904 120591119904)

119873119904

) (10)

where $P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)$ is the probability assigned by the WCM. To simplify notation here, we do not consider the explicit dependency on the hyperparameters. For multiple work cycles, we report the average perplexity over work cycles, defined as follows:

$$\text{Perplexity} = \frac{\sum_{s=1}^{S} \text{Perplexity}(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)}{S}. \tag{11}$$
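As a minimal sketch of (10) and (11), assuming the log-probability of each test work cycle has already been computed by the model, the two quantities can be evaluated as follows. The function names are illustrative and are not taken from the paper.

```python
import math


def work_cycle_perplexity(log_prob, n_events):
    """Perplexity of one work cycle, following (10): exp(-log P / N_s)."""
    if n_events <= 0:
        raise ValueError("a work cycle needs at least one event")
    return math.exp(-log_prob / n_events)


def average_perplexity(log_probs, event_counts):
    """Mean of the per-cycle perplexities over S work cycles, following (11)."""
    values = [
        work_cycle_perplexity(lp, n) for lp, n in zip(log_probs, event_counts)
    ]
    return sum(values) / len(values)
```

For instance, a model that spreads its probability uniformly over 4 events gives a perplexity of 4 on any work cycle, which matches the usual reading of perplexity as an effective number of equally likely events.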

The lower the perplexity, the better the performance of the model. We can obtain an approximate estimate of perplexity by averaging over multiple samples according to (9), calculated as follows:

$$P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s) = \frac{1}{K} \sum_{k=1}^{K} \prod_{i=1}^{N_s} \frac{1}{P_s T_s} \sum_{p \in \mathbf{p}_s,\, \tau \in \boldsymbol{\tau}_s,\, \pi} E\left[\theta_{\pi p}\, \delta_{\pi \tau}\, \phi_{e_{si} \pi} \mid \mathbf{x}^k, \mathbf{y}^k, \mathbf{z}^k\right]. \tag{12}$$

Figure 5: Perplexity as a function of iterations of the Gibbs sampler for a $\Pi = 200$ model. Each curve shows the perplexity obtained by averaging over a different number of samples $K$ ($K$ = 2, 4, 6, 8, and 10) over the range of sampling iterations. (Horizontal axis: iteration, 0 to 200; vertical axis: perplexity, 4500 to 5300; number of working modes $\Pi = 200$.)

Experimental results using different values for $K$ indicated that $K = 10$ samples is a reasonable choice to get a good approximation of the perplexity. Because of the exchangeability of the working modes, it is possible that quite different solutions of working modes are detected across different samples. In practice, however, we have found that the solutions of working modes are relatively stable across samples, with only a small subset of unique working modes appearing in any sample. Hence, we use the average perplexity values across samples in the experiment.
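The averaging over $K$ samples in (12) multiplies many small probabilities, so a direct implementation can underflow for long work cycles. The sketch below shows one common way to do the averaging in log space; it assumes the log-probability of the work cycle under each of the $K$ samples is already available, and the function name is illustrative rather than part of the paper.

```python
import math


def log_mean_exp(log_probs):
    """Return log((1/K) * sum_k exp(log_probs[k])) without underflow.

    Each entry is the log-probability of one work cycle under one Gibbs
    sample; the result is the log of the K-sample average in (12).
    """
    peak = max(log_probs)
    # Subtracting the largest value keeps every exponent at or below zero.
    total = sum(math.exp(lp - peak) for lp in log_probs)
    return peak + math.log(total / len(log_probs))
```

The returned value can be passed directly to the perplexity definition in (10) in place of $\log P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)$.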

Figure 5 illustrates the perplexity as a function of iterations of the Gibbs sampler for a $\Pi = 200$ model fitted to the data set. For every setting of the parameter $K$, the performance of the model trained using the Gibbs sampler stabilizes rather quickly: the perplexity values flatten out after about 100 iterations, at least in terms of perplexity on this data set.

6.3. The Number of Working Modes $\Pi$. Although the perplexity computation can be averaged over different Gibbs sampler runs, other applications of the model rely on the analysis of each working mode and are therefore based on the analysis of individual samples. Meanwhile, the setting of the parameter $\Pi$, which represents the number of working modes, is also determined according to the perplexity.

Figure 6: Perplexity as a function of the parameter $\Pi$ of the Gibbs sampler for $K = 10$ samples. (Horizontal axis: number of working modes $\Pi$, 0 to 300; vertical axis: perplexity, 4450 to 4950.)

Figure 6 illustrates the perplexity as a function of the parameter $\Pi$ for $K = 10$ Gibbs samples. The average perplexity over the data set decreases as the parameter $\Pi$ increases. In particular, the perplexity values flatten out after the parameter $\Pi$ is set to 200, which indicates that $\Pi = 200$ is a suitable setting for fitting the data set with the model.

6.4. Analysis of the WCM Results. To analyze the WCM results, we can use the point estimate of the WCM parameters to look at specific $\Theta$, $\Delta$, and $\Phi$ distributions and related quantities that can be derived from these parameters (such as the probability of a working place and a working date given a randomly selected event from a working mode). In the following results, we take a specific sample $\mathbf{x}^k$, $\mathbf{y}^k$, and $\mathbf{z}^k$ after 100 iterations from a single arbitrarily selected Gibbs run and then generate point estimates of $\Theta$, $\Delta$, and $\Phi$ using (9).

There are 200 working modes in total (parameter $\Pi = 200$). Each working mode, using a WMV, helps us to better understand the occurrences of events. For the sake of analysis, we list the highest probability working mode for each working place and each working date from the WCM in Table 2. For each such working mode, we list the top 10 events most likely to be generated, conditioned on both the working place and the working date. For example, in the working place of Northern China, for the most likely working mode (numbered 101 among the 200 working modes), the top 10 events (OPI, CM, EAI, RTM, BVI, SPM, AOM, APTS, CPD, and TCI) are most likely to occur in the working date of June.
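The kind of lookup behind a table of this form can be sketched as follows. This is only an illustration under stated assumptions: the point estimates are stored as nested lists `theta[mode][place]`, `delta[mode][date]`, and one column of $\Phi$ per working mode, and the score of a working mode for a given place and date is taken to be the product $\theta_{\pi p}\,\delta_{\pi\tau}$ that appears in (3). The paper does not state how the probability column is computed, so that choice and all names below are hypothetical.

```python
def most_likely_mode(theta, delta, place, date):
    """Index and score of the working mode maximizing theta * delta.

    theta[m][place] and delta[m][date] are point estimates; scoring by
    their product is an assumption made for this sketch.
    """
    scores = [theta[m][place] * delta[m][date] for m in range(len(theta))]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def top_events(phi_column, event_names, k=10):
    """Names of the k events with the highest probability in one mode."""
    ranked = sorted(
        zip(phi_column, event_names), key=lambda pair: pair[0], reverse=True
    )
    return [name for _, name in ranked[:k]]
```

Calling `most_likely_mode` once per (working place, working date) pair and then `top_events` on the selected mode yields one row per pair.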

Experimental results show that different working places have different working modes even for the same working date, and the same working place also has different working modes for different working dates. This indicates that the working mode is indeed related to the working place and the working date. Events related to the pumping system, such as OPI, MCI, and CPI, are most likely to occur in most working modes, which indicates that the working modes of the concrete pump truck are consistent with the actual situation. Meanwhile, events related to the cantilever system and the landing leg system, such as LLF and CFI, occur less often than events of the pumping system. Moreover, in summer (working date = June, July, and August), the alert event SPM is more likely to occur, which indicates that the concrete pump truck is more likely to fail in a hot climate. The operation event AOM is also more likely to occur, which indicates that the operators prefer to operate the concrete pump truck remotely.

Table 2: The highest probability working mode for each working place and each working date from the WCM.

Working date | Probability | Working mode | Events

Working place = Northern China
Jun | 0.0251 | 101 | OPI, CM, EAI, RTM, BVI, SPM, AOM, APTS, CPD, TCI
Jul | 0.0341 | 164 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, TCI
Aug | 0.0051 | 62 | LLF, CFI, APTS, CSI, AOM, ISC, RTM, SPM, CPD, TCI
Sep | 0.0342 | 12 | OPI, RTM, CM, CPI, BVI, LLF, SPM, AOM, APTS, CPD
Oct | 0.0351 | 49 | RTM, OPI, CM, BVI, MCI, SPM, AOM, APTS, CPD, TCI
Nov | 0.0353 | 129 | OPI, ISC, SPM, EAI, CSI, APTS, AOM, CPD, TCI, MCMC

Working place = Northeastern China
Jun | 0.0258 | 176 | OPI, SPM, EAI, HOS, BVI, MCI, AOM, APTS, CPD, TCI
Jul | 0.0263 | 29 | OPI, LLF, ISC, SPM, CFI, APTS, CPI, HOS, AOM, CPD
Aug | 0.0141 | 71 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, TCI
Sep | 0.0114 | 111 | RTM, BVI, OPI, CM, MCI, HOS, EAI, SPM, AOM, APTS
Oct | 0.0146 | 69 | ISC, LLF, CSI, AOM, APTS, OPI, CFI, SPM, CPD, TCI
Nov | 0.0257 | 93 | RTM, OPI, BVI, MCI, CM, CPI, SPM, AOM, APTS, CPD

Working place = Eastern China
Jun | 0.0279 | 177 | OPI, HOS, CPI, SPM, LLF, RTM, EAI, BVI, AOM, APTS
Jul | 0.0201 | 72 | OPI, EAI, CPI, SPM, MCI, RTM, HOS, AOM, APTS, CPD
Aug | 0.0277 | 87 | OPI, BVI, EAI, RTM, AOM, SPM, MCI, APTS, CPD, TCI
Sep | 0.0274 | 9 | OPI, EAI, BVI, RTM, HOS, SPM, AOM, APTS, CPD, TCI
Oct | 0.0214 | 191 | RTM, MCI, CPI, CM, EAI, OPI, HOS, SPM, AOM, APTS
Nov | 0.0255 | 170 | OPI, MCI, BVI, RTM, CPI, HOS, SPM, AOM, APTS, CPD

Working place = Mid-Southern China
Jun | 0.0122 | 74 | OPI, EAI, CSI, CPI, ISC, MCI, SPM, AOM, APTS, CPD
Jul | 0.0177 | 33 | OPI, CPI, CM, MCI, HOS, SPM, AOM, APTS, CPD, TCI
Aug | 0.0262 | 187 | HOS, MCI, CPI, OPI, EAI, BVI, CSI, SPM, AOM, APTS
Sep | 0.0205 | 104 | RTM, EAI, BVI, OPI, SPM, MCI, CFI, APTS, AOM, CPD
Oct | 0.0193 | 39 | OPI, HOS, BVI, CM, RTM, SPM, AOM, APTS, CPD, TCI
Nov | 0.0133 | 158 | OPI, BVI, RTM, MCI, CM, SPM, AOM, APTS, CPD, TCI

Working place = Western China
Jun | 0.0037 | 4 | OPI, RTM, BVI, CM, EAI, SPM, CPI, MCI, AOM, APTS
Jul | 0.0134 | 144 | HOS, MCI, CPI, OPI, CFI, EAI, SPM, AOM, APTS, CPD
Aug | 0.0126 | 126 | OPI, SPM, CM, BVI, AOM, LLF, APTS, CSI, CPD, TCI
Sep | 0.0122 | 88 | OPI, HOS, CPI, CM, LLF, AOM, CFI, MCI, BVI, SPM
Oct | 0.0104 | 37 | OPI, EAI, MCI, HOS, CSI, ISC, CFI, LLF, SPM, AOM
Nov | 0.0135 | 78 | OPI, HOS, RTM, BVI, CSI, EAI, MCI, APTS, AOM, SPM

Because the probability of a working mode reflects the probability of its occurrence, we can analyze the workloads of different working places in different working dates. According to the probability of the working mode in Table 2, the working modes in the working place of Eastern China are more likely to occur than the working modes in the working place of Western China. This indicates that the concrete pump trucks in Eastern China have heavier workloads than those in Western China. Meanwhile, the concrete pump trucks in the working date of June have heavier workloads than those in the working date of November. In general, we can analyze different working modes according to their probabilities.

6.5. Illustrative Applications for the WCM. In this section, we provide some illustrative examples of how the WCM can be used to answer different types of questions and prediction problems concerning working modes of the equipment.

6.5.1. Automated Detection for a New Work Cycle. In real cases, we would like to quickly assess working mode assignments for new work cycles not contained in the training data set, especially for the real-time event sequence flow.

Our automated detection strategy is to apply the Gibbs sampling algorithm only to the event tokens in the new work cycle, instead of rerunning the algorithm on the whole data set for every new work cycle. Afterwards, the event tokens in the new work cycle are quickly assigned to the most likely working places, working dates, and working modes. The main procedure is as follows: first, we assign events randomly to working places, working dates, and working modes; second, we sample new assignments of events by applying the Gibbs sampler only to the event tokens in the new work cycle, each time temporarily updating the count matrices $C^{E\Pi}$, $C^{\Pi P}$, and $C^{\Pi T}$ shown in (7).

Table 3 shows the occurrences of events for a new work cycle. After the sampling, the WCM has assigned each event to its most likely working mode. Table 3 lists the top 3 most likely working modes assigned to each event for the new work cycle. Note that each event is assigned to different working modes according to its occurrence count. According to (7), although the events of this new work cycle are assigned to different working modes, the work cycle as a whole is assigned to working mode number 107 with probability 0.0003. The top 10 most likely events in working mode number 107 are as follows:

RTM, CM, OPI, BVI, SPM, CPI, MCI, SCI, ISC, and SoE.

Compared with the real occurrences of events, the automated detection result for the new work cycle is indeed consistent with the actual situation.

6.5.2. Automated Detection of Anomalous Work Cycles. We illustrate in this section how our model could be useful for detecting anomalous work cycles. A work cycle assigned to a working mode with low probability is considered an anomalous work cycle.

We again take the work cycle shown in Table 3 as an example for the automated detection of an anomalous work cycle. The work cycle is assigned to working mode number 107 with probability 0.0003. Compared with most other working modes, working mode number 107 has a lower probability, so this work cycle is detected as an anomalous work cycle. The alert events SPM and SoE occur frequently both in the work cycle and in working mode number 107, which also indicates that this work cycle is anomalous. Meanwhile, we analyzed the real failure records and confirmed that the engine indeed failed frequently during this work cycle. In general, such anomalous work cycles can be detected automatically and efficiently with the help of the WCM.
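The decision rule described above can be sketched as a simple threshold test. The paper does not give a numerical cutoff, so the `threshold` argument, the dictionary layout, and the example identifiers below are assumptions made purely for illustration.

```python
def flag_anomalous(cycle_mode_probs, threshold):
    """Return ids of work cycles whose assigned-mode probability is low.

    cycle_mode_probs maps a work cycle id to the probability of the working
    mode it was assigned to; threshold is a hypothetical cutoff that would
    have to be chosen for the data set at hand.
    """
    return [
        cycle_id
        for cycle_id, prob in cycle_mode_probs.items()
        if prob < threshold
    ]
```

For example, with a hypothetical cutoff of 0.001, a work cycle assigned to a mode with probability 0.0003 would be flagged, while one assigned to a mode with probability 0.0279 would not.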

7. Conclusions and Future Work

The working condition model proposed in this paper provides a relatively simple probabilistic model for exploring the relationships between working place, working date, working mode, and events in a work cycle. This model provides significantly improved predictive power in terms of the analysis of working condition according to the event sequence data.

Table 3: Actual example of automated detection for a new work cycle. Each event is assigned to its most likely working mode according to its corresponding occurrence count. In the table, we list the top 3 most likely working modes for each event for the new work cycle.

Working date = Jun, working place = Eastern China

Event | Count | First | Second | Third
SPM | 72 | 107 | 181 | 112
AOM | 33 | 169 | 67 | 183
APTS | 23 | 90 | 15 | 76
CPD | 42 | 145 | 139 | 59
TCI | 2 | 118 | 134 | 112
MCMC | 0 | Null | Null | Null
MCSC | 0 | Null | Null | Null
DSP | 0 | Null | Null | Null
MCES | 0 | Null | Null | Null
HPMI | 0 | Null | Null | Null
WUI | 2 | 159 | 104 | 77
WPI | 23 | 54 | 175 | 71
CSI | 55 | 147 | 29 | 61
CFI | 25 | 2 | 132 | 100
TCI | 23 | 95 | 185 | 53
CM | 127 | 12 | 49 | 192
LLM | 55 | 189 | 114 | 23
RCI | 0 | Null | Null | Null
DOP | 0 | Null | Null | Null
LLF | 40 | 111 | 10 | 42
RTM | 297 | 191 | 104 | 52
CPW | 0 | Null | Null | Null
RCSW | 0 | Null | Null | Null
OPI | 95 | 177 | 176 | 101
EAI | 56 | 126 | 100 | 170
BVI | 77 | 177 | 53 | 146
CPI | 60 | 164 | 104 | 149
MCI | 60 | 177 | 175 | 162
SCI | 66 | 120 | 149 | 73
CSAI | 0 | Null | Null | Null
ISC | 51 | 68 | 149 | 23
HOS | 0 | Null | Null | Null
SoE | 33 | 119 | 112 | 107

Our future work mainly includes the optimization of the model and of the model training, as well as conducting experiments on different data sets. Furthermore, the further analysis of the anomalous work cycles detected by our model is also an interesting question.

Notations Associated with the WCM As Used in This Paper

$\mathbf{P}$: Working places of all the work cycles (set)
$\mathbf{T}$: Working dates of all the work cycles (set)
$\mathbf{p}_s$: Working places of the $s$th work cycle ($P_s$-dimensional vector)
$P_s$: Number of working places of the $s$th work cycle (scalar)
$\boldsymbol{\tau}_s$: Working dates of the $s$th work cycle ($T_s$-dimensional vector)
$T_s$: Number of working dates of the $s$th work cycle (scalar)
$P$: Number of working places (scalar)
$S$: Number of work cycles (scalar)
$T$: Number of working dates (scalar)
$N_s$: Number of events in the $s$th work cycle (scalar)
$N$: Number of events in all the event sequences (scalar)
$\Pi$: Number of working modes (scalar)
$E$: Number of events in the event set (scalar)
$\mathbf{e}_s$: Event sequence vector for the $s$th work cycle ($N_s$-dimensional vector)
$e_{si}$: $i$th event in the $s$th work cycle ($i$th component of vector $\mathbf{e}_s$)
$\mathbf{x}$: Working place assignments ($N$-dimensional vector)
$x_{si}$: Working place assignment for event $e_{si}$ ($i$th component of vector $\mathbf{x}_s$)
$\mathbf{y}$: Working date assignments ($N$-dimensional vector)
$y_{si}$: Working date assignment for event $e_{si}$ ($i$th component of vector $\mathbf{y}_s$)
$\mathbf{z}$: Working mode assignments ($N$-dimensional vector)
$z_{si}$: Working mode assignment for event $e_{si}$ ($i$th component of vector $\mathbf{z}_s$)
$\alpha$, $\beta$, $\gamma$: Dirichlet prior (scalar)
$\Phi$: Probabilities of events given working modes ($E \times \Pi$ matrix)
$\phi_\pi$: Probabilities of events given working mode $\pi$ ($E$-dimensional vector)
$\Theta$: Probabilities of working modes given working places ($\Pi \times P$ matrix)
$\theta_p$: Probabilities of working modes given working place $p$ ($\Pi$-dimensional vector)
$\Delta$: Probabilities of working modes given working dates ($\Pi \times T$ matrix)
$\delta_\tau$: Probabilities of working modes given working date $\tau$ ($\Pi$-dimensional vector)

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J. Holler, V. Tsiatsis, C. Mulligan, S. Avesand, S. Karnouskos, and D. Boyle, From Machine-to-Machine to the Internet of Things: Introduction to a New Age of Intelligence, Academic Press, 2014.

[2] C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos, “Sensing as a service model for smart cities supported by Internet of Things,” Transactions on Emerging Telecommunications Technologies, vol. 25, no. 1, pp. 81–93, 2014.

[3] R. F. Mesquita Brandao and J. A. Beleza Carvalho, “The importance of control monitoring systems in wind parks maintenance,” British Journal of Applied Science & Technology, vol. 4, no. 10, pp. 1461–1471, 2014.

[4] C. J. Crabtree, D. Zappala, and P. J. Tavner, “Survey of commercially available condition monitoring systems for wind turbines,” Tech. Rep., Durham University, 2014.

[5] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.

[6] S. Kandula, R. Mahajan, P. Verkaik, S. Agarwal, J. Padhye, and P. Bahl, “Detailed diagnosis in enterprise networks,” in Proceedings of the ACM SIGCOMM Conference on Data Communication (SIGCOMM ’09), vol. 39, pp. 243–254, ACM, August 2009.

[7] J.-G. Lou, Q. Fu, Y. Wang, and J. Li, “Mining dependency in distributed systems through unstructured logs analysis,” ACM SIGOPS Operating Systems Review, vol. 44, no. 1, pp. 91–96, 2010.

[8] C. Luo, J.-G. Lou, Q. Lin et al., “Correlating events with time series for incident diagnosis,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14), pp. 1583–1592, ACM, August 2014.

[9] J. Chen and R. Kumar, “Online failure diagnosis of stochastic discrete event systems,” in Proceedings of the IEEE Conference on Computer Aided Control System Design (CACSD ’13), pp. 194–199, IEEE, August 2013.

[10] J. Chen and R. Kumar, “Failure diagnosis of discrete-time stochastic systems subject to temporal logic correctness requirements,” in Proceedings of the 11th IEEE International Conference on Networking, Sensing and Control (ICNSC ’14), pp. 42–47, IEEE, April 2014.

[11] Business Process Model and Notation (BPMN) Version 2.0, OMG Specification, Object Management Group, 2011.

[12] F. Leymann, “Bpel vs. bpmn 2.0: should you care?” in Business Process Modeling Notation, pp. 8–13, Springer, Berlin, Germany, 2011.

[13] C. C. Aggarwal, Managing and Mining Sensor Data, Springer, 2013.

[14] N. H. Gehani, H. V. Jagadish, and O. Shmueli, “Composite event specification in active databases: model and implementation,” in Proceedings of the 18th VLDB Conference Vancouver (VLDB ’92), vol. 92, pp. 327–338, Citeseer, British Columbia, Canada, 1992.

[15] I. Davidson, S. Gilpin, and P. B. Walker, “Behavioral event data and their analysis,” Data Mining and Knowledge Discovery, vol. 25, no. 3, pp. 635–653, 2012.

[16] J. Han and M. Kamber, Data Mining, Southeast Asia Edition: Concepts and Techniques, Morgan Kaufmann, 2006.

[17] H. R. Motahari-Nezhad, R. Saint-Paul, F. Casati, and B. Benatallah, “Event correlation for process discovery from web service interaction logs,” The VLDB Journal, vol. 20, no. 3, pp. 417–444, 2011.

[18] F. Skopik and R. Fiedler, “Intrusion detection in distributed systems using fingerprinting and massive event correlation,” in GI-Jahrestagung, pp. 2240–2254, 2013.

[19] G. A. Wilkin, P. Eugster, and K. R. Jayaram, “Decentralized fault-tolerant event correlation,” ACM Transactions on Internet Technology, vol. 14, no. 1, article 5, 2014.

[20] H. Wei, “A correlation analysis method for network security events,” in Informatics and Management Science III, vol. 206 of Lecture Notes in Electrical Engineering, pp. 269–277, Springer, London, UK, 2013.

[21] W. Van Der Aalst, A. Adriansyah, A. K. A. de Medeiros et al., “Process mining manifesto,” in Business Process Management Workshops, pp. 169–194, Springer, Berlin, Germany, 2012.

[22] J. C. A. M. Buijs, B. F. van Dongen, and W. M. P. van der Aalst, “Mining configurable process models from collections of event logs,” in Business Process Management, pp. 33–48, Springer, 2013.

[23] A. Rebuge and D. R. Ferreira, “Business process analysis in healthcare environments: a methodology based on process mining,” Information Systems, vol. 37, no. 2, pp. 99–116, 2012.

[24] J. Wang, R. K. Wong, J. Ding, Q. Guo, and L. Wen, “On recommendation of process mining algorithms,” in Proceedings of the IEEE 19th International Conference on Web Services (ICWS ’12), pp. 311–318, IEEE, Honolulu, Hawaii, USA, June 2012.

[25] R. S. Mans, W. M. P. van der Aalst, and H. M. W. Verbeek, “Supporting process mining workflows with rapidprom,” in Proceedings of the Business Process Management Demo Sessions (BPMD ’14), vol. 1295, pp. 56–60, Eindhoven, The Netherlands, September 2014.

[26] C. Li, M. Reichert, and A. Wombacher, “Mining business process variants: challenges, scenarios, algorithms,” Data & Knowledge Engineering, vol. 70, no. 5, pp. 409–434, 2011.

[27] R. Accorsi, T. Stocker, and G. Muller, “On the exploitation of process mining for security audits: the process discovery case,” in Proceedings of the 28th Annual ACM Symposium on Applied Computing, pp. 1462–1468, ACM, March 2013.

[28] B.-J. Lee, S.-G. Park, K.-B. Min et al., “The relationship between working condition factors and well-being,” Annals of Occupational and Environmental Medicine, vol. 26, no. 1, article 34, 2014.

[29] J. Cohen, Statistical Power Analysis for the Behavioral Sciences, Routledge Academic, New York, NY, USA, 2013.

[30] P. Bahl, R. Chandra, A. Greenberg, S. Kandula, D. A. Maltz, and M. Zhang, “Towards highly reliable enterprise network services via inference of multi-level dependencies,” ACM SIGCOMM Computer Communication Review, vol. 37, no. 4, pp. 13–24, 2007.

[31] B. Rosner, Fundamentals of Biostatistics, Cengage Learning, 2010.

[32] A. Zimmermann, “Colored petri nets,” in Stochastic Discrete Event Systems: Modeling, Evaluation, Applications, pp. 99–124, Springer, 2008.

[33] A. Adriansyah, B. F. van Dongen, and W. M. P. van der Aalst, “Towards robust conformance checking,” in Business Process Management Workshops, vol. 66 of Lecture Notes in Business Information Processing, pp. 122–133, Springer, Berlin, Germany, 2011.

[34] M. Weidlich and M. Weske, Business Process Modeling Notation, Springer, Berlin, Germany, 2010.

[35] C. M. Bishop and J. Lasserre, “Generative or discriminative? Getting the best of both worlds,” in Bayesian Statistics, J. M. Bernardo, M. J. Bayarri, J. O. Berger et al., Eds., vol. 8, pp. 3–23, Oxford University, 2007.

[36] C. M. Bishop, Pattern Recognition and Machine Learning, Volume 1, Springer, New York, NY, USA, 2006.

[37] D. M. Blei and J. D. Lafferty, “Dynamic topic models,” in Proceedings of the 23rd International Conference on Machine Learning (ICML ’06), pp. 113–120, ACM, June 2006.

[38] J. Foulds, L. Boyles, C. DuBois, P. Smyth, and M. Welling, “Stochastic collapsed variational Bayesian inference for latent dirichlet allocation,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 446–454, ACM, 2013.

[39] J. Pearl, Bayesian Networks, Department of Statistics, UCLA, 2011.

[40] I. Porteous, D. Newman, A. Ihler, A. Asuncion, P. Smyth, and M. Welling, “Fast collapsed gibbs sampling for latent dirichlet allocation,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’08), pp. 569–577, ACM, August 2008.



With the help of (1), we can first obtain the probability of the event sequence in each work cycle $\mathbf{e}_s$ by summing over the latent variables $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ to get what is shown in (3). Consider

$$\begin{aligned} P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{P}, \mathbf{T}) &= \prod_{i=1}^{N_s} P(e_{si} \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s) \\ &= \prod_{i=1}^{N_s} \sum_{\tau=1}^{T} \sum_{p=1}^{P} \sum_{\pi=1}^{\Pi} P(e_{si}, z_{si} = \pi, x_{si} = p, y_{si} = \tau \mid \Phi, \Delta, \Theta, \mathbf{p}_s, \boldsymbol{\tau}_s) \\ &= \prod_{i=1}^{N_s} \sum_{\tau=1}^{T} \sum_{p=1}^{P} \sum_{\pi=1}^{\Pi} P(e_{si} \mid z_{si} = \pi, \Phi)\, P(z_{si} = \pi \mid x_{si} = p, \Theta)\, P(z_{si} = \pi \mid y_{si} = \tau, \Delta)\, P(x_{si} = p \mid \mathbf{p}_s)\, P(y_{si} = \tau \mid \boldsymbol{\tau}_s), \end{aligned} \tag{2}$$

$$P(\mathbf{e}_s \mid \Phi, \Delta, \Theta, \mathbf{P}, \mathbf{T}) = \prod_{i=1}^{N_s} \frac{1}{P_s} \frac{1}{T_s} \sum_{p \in \mathbf{p}_s} \sum_{\tau \in \boldsymbol{\tau}_s} \sum_{\pi=1}^{\Pi} \phi_{e_{si} \pi}\, \theta_{\pi p}\, \delta_{\pi \tau}, \tag{3}$$

$$P(\mathbf{e} \mid \alpha, \beta, \gamma, \mathbf{P}, \mathbf{T}) = \int_{\Theta} \int_{\Delta} \int_{\Phi} P(\mathbf{e} \mid \Theta, \Delta, \Phi, \mathbf{P}, \mathbf{T})\, P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)\, d\Theta\, d\Delta\, d\Phi \tag{4}$$

$$= \int_{\Theta} \int_{\Delta} \int_{\Phi} \left[ \prod_{i=1}^{N_s} \frac{1}{P_s} \frac{1}{T_s} \sum_{p \in \mathbf{p}_s} \sum_{\tau \in \boldsymbol{\tau}_s} \sum_{\pi=1}^{\Pi} \phi_{e_{si} \pi}\, \theta_{\pi p}\, \delta_{\pi \tau} \right] P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)\, d\Theta\, d\Delta\, d\Phi. \tag{5}$$

In (3), the factorization makes use of the conditional independence assumptions of the model. Meanwhile, the variables $\mathbf{x}$ and $\mathbf{y}$ are mutually stochastically independent. Equation (3) represents the probability of the events $\mathbf{e}$ in terms of the entries of the parameter matrices $\Theta$, $\Phi$, and $\Delta$ introduced above. The probability distribution over working place assignments, $P(x_{si} = p \mid \mathbf{p}_s)$, is assumed to be uniform over the elements of $\mathbf{p}_s$ and deterministic if $P_s = 1$. Similarly, the probability distribution over working date assignments, $P(y_{si} = \tau \mid \boldsymbol{\tau}_s)$, is assumed to be uniform over the elements of $\boldsymbol{\tau}_s$ and deterministic if $T_s = 1$. The probability distributions over working mode assignments, $P(z_{si} = \pi \mid x_{si} = p, \Theta)$ and $P(z_{si} = \pi \mid y_{si} = \tau, \Delta)$, are the multinomial distributions $\theta_p$ and $\delta_\tau$ in $\Theta$ and $\Delta$ that correspond to working place $p$ and working date $\tau$, respectively. The probability of an event given a working mode assignment, $P(e_{si} \mid z_{si} = \pi, \Phi)$, is the multinomial distribution $\phi_\pi$ in $\Phi$ that corresponds to working mode $\pi$.

In (4) and (5), we treat $\Theta$, $\Phi$, and $\Delta$ as random variables and compute the marginal probability of a corpus by integrating them out. $P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta) = P(\Theta \mid \alpha) P(\Delta \mid \gamma) P(\Phi \mid \beta)$ are the Dirichlet priors on $\Theta$, $\Delta$, and $\Phi$, respectively, as defined before.
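To make the structure of (3) concrete, the following sketch evaluates the log of the work cycle probability for fixed parameter matrices. It assumes the parameters are stored as nested lists indexed `phi[event][mode]`, `theta[mode][place]`, and `delta[mode][date]`, with events, places, and dates given as integer indices; this layout and the function name are choices made for the illustration only.

```python
import math


def work_cycle_log_likelihood(events, places, dates, phi, theta, delta):
    """log P(e_s | Phi, Delta, Theta, p_s, tau_s), following (3).

    events: event indices of the work cycle (length N_s)
    places: working place indices of the work cycle (length P_s)
    dates:  working date indices of the work cycle (length T_s)
    """
    n_modes = len(theta)
    # Uniform weights 1/P_s and 1/T_s over the cycle's places and dates.
    scale = 1.0 / (len(places) * len(dates))
    log_lik = 0.0
    for e in events:
        total = 0.0
        for p in places:
            for t in dates:
                for m in range(n_modes):
                    total += phi[e][m] * theta[m][p] * delta[m][t]
        log_lik += math.log(scale * total)
    return log_lik
```

Working in log space keeps the product over the $N_s$ events from underflowing when a work cycle contains many events.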

5. Inference of WCM from Data

The WCM contains three continuous random variables, $\Theta$, $\Delta$, and $\Phi$. Various approximate inference approaches have recently been proposed for estimating the posterior distribution for continuous random variables in hierarchical Bayesian models. In this paper, our inference method is Gibbs sampling [40], which is a special form of Markov chain Monte Carlo.

Our target of estimation is to compute the posterior distribution $P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta)$. In order to sample the values of the distribution, we have to use the latent variables $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ to estimate the posterior distribution:

$$P(\Theta, \Delta, \Phi \mid \alpha, \gamma, \beta) = \sum_{\mathbf{x}, \mathbf{y}, \mathbf{z}} P(\Theta, \Delta, \Phi \mid \mathbf{x}, \mathbf{y}, \mathbf{z}, \alpha, \gamma, \beta)\, P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid \alpha, \gamma, \beta). \tag{6}$$

The estimation process mainly involves two steps: first, we use Gibbs sampling to get an approximate posterior $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid \alpha, \gamma, \beta)$; second, $P(\Theta, \Delta, \Phi \mid \mathbf{x}, \mathbf{y}, \mathbf{z}, \alpha, \gamma, \beta)$ can be computed directly for each sample by exploiting the fact that the Dirichlet distribution is conjugate to the multinomial.

5.1. Gibbs Sampling. Using Gibbs sampling, we can generate a sample from the joint distribution $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid D^{\text{train}}, \alpha, \beta)$ in two steps: first, sampling the working place assignment $x_{si}$, working date assignment $y_{si}$, and working mode assignment $z_{si}$ for an individual event $e_{si}$, conditioned on fixed assignments of working places, working dates, and working modes for all other events in the data set; second, repeating this process for each event. A single Gibbs sampling iteration consists of sequentially performing this sampling of working place, working date, and working mode assignments for each individual event in the data set:

$$P(x_{si} = p, y_{si} = \tau, z_{si} = \pi \mid e_{si} = e, \mathbf{x}_{-si}, \mathbf{y}_{-si}, \mathbf{z}_{-si}, \mathbf{e}_{-si}, \mathbf{P}, \mathbf{T}, \alpha, \beta) \propto \frac{C^{E\Pi}_{e\pi,-si} + \beta}{\sum_{e'} C^{E\Pi}_{e'\pi,-si} + E\beta} \cdot \frac{C^{\Pi P}_{\pi p,-si} + \alpha}{\sum_{p'} C^{\Pi P}_{\pi p',-si} + P\alpha} \cdot \frac{C^{\Pi T}_{\pi\tau,-si} + \gamma}{\sum_{\tau'} C^{\Pi T}_{\pi\tau',-si} + T\gamma}. \tag{7}$$

According to (1)–(5), we can derive the basic equation needed for the Gibbs sampler, as shown in (7). In (7), $C^{\Pi P}$ is the working-mode-by-working-place count matrix, where $C^{\Pi P}_{\pi p,-si}$ is the number of events assigned to working mode $\pi$ in the working place $p$, excluding the working mode assignment of event $e_{si}$. Similarly, $C^{\Pi T}$ is the working-mode-by-working-date count matrix, where $C^{\Pi T}_{\pi\tau,-si}$ is the number of events assigned to working mode $\pi$ in the working date $\tau$, excluding the working mode assignment of event $e_{si}$. Likewise, $C^{E\Pi}$ is the event-by-working-mode count matrix, where $C^{E\Pi}_{e\pi,-si}$ is the number of times the $e$th entry in the event set is assigned to working mode $\pi$, excluding the working mode assignment of event $e_{si}$. Meanwhile, $\mathbf{x}_{-si}$, $\mathbf{y}_{-si}$, $\mathbf{z}_{-si}$, and $\mathbf{e}_{-si}$ represent the vector of working place assignments, the vector of working date assignments, the vector of working mode assignments, and the vector of event observations, respectively, for all events in the data set except the $i$th event in the $s$th work cycle.

The main sampling steps are as follows: we first initialize the working place, working date, and working mode assignments $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ randomly. In each Gibbs sampling iteration, we sequentially draw the working mode, working place, and working date assignment of each event from the joint conditional distribution in (7). As the number of iterations increases, the Gibbs sampler approaches its stationary distribution, the posterior distribution $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid D^{\text{train}}, \alpha, \beta)$.
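The sampling steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes each work cycle is a dictionary with integer-indexed `events`, `places`, and `dates`, and it uses the ratio form of (7), where the three denominator sums all equal the number of tokens currently assigned to the working mode. All function and key names are hypothetical.

```python
import random


def _update(state, e, triple, step):
    """Add or remove one token's (place, date, mode) assignment from the counts."""
    p, t, m = triple
    state["c_em"][e][m] += step
    state["c_mp"][m][p] += step
    state["c_mt"][m][t] += step
    state["mode_total"][m] += step


def initialize(cycles, n_events, n_modes, n_places, n_dates, rng):
    """Randomly assign a (place, date, mode) triple to every event token."""
    state = {
        "c_em": [[0] * n_modes for _ in range(n_events)],  # event x mode
        "c_mp": [[0] * n_places for _ in range(n_modes)],  # mode x place
        "c_mt": [[0] * n_dates for _ in range(n_modes)],   # mode x date
        "mode_total": [0] * n_modes,                       # tokens per mode
        "assign": [],
    }
    for cycle in cycles:
        row = []
        for e in cycle["events"]:
            triple = (rng.choice(cycle["places"]),
                      rng.choice(cycle["dates"]),
                      rng.randrange(n_modes))
            _update(state, e, triple, +1)
            row.append(triple)
        state["assign"].append(row)
    return state


def gibbs_sweep(cycles, state, alpha, beta, gamma, rng):
    """One Gibbs iteration: resample every token's triple in turn."""
    n_events = len(state["c_em"])
    n_modes = len(state["mode_total"])
    n_places = len(state["c_mp"][0])
    n_dates = len(state["c_mt"][0])
    for s, cycle in enumerate(cycles):
        for i, e in enumerate(cycle["events"]):
            # Exclude the current token from the counts (the "-si" terms).
            _update(state, e, state["assign"][s][i], -1)
            candidates, weights = [], []
            for p in cycle["places"]:
                for t in cycle["dates"]:
                    for m in range(n_modes):
                        n = state["mode_total"][m]
                        w = ((state["c_em"][e][m] + beta) / (n + n_events * beta)
                             * (state["c_mp"][m][p] + alpha) / (n + n_places * alpha)
                             * (state["c_mt"][m][t] + gamma) / (n + n_dates * gamma))
                        candidates.append((p, t, m))
                        weights.append(w)
            # Draw the new triple in proportion to the unnormalized weights.
            triple = rng.choices(candidates, weights=weights, k=1)[0]
            _update(state, e, triple, +1)
            state["assign"][s][i] = triple
```

Calling `gibbs_sweep` repeatedly on the state returned by `initialize` corresponds to running successive iterations of the sampler; restricting candidate places and dates to those of the current work cycle mirrors the uniform distributions over $\mathbf{p}_s$ and $\boldsymbol{\tau}_s$.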

5.2. The Posterior Probability. Given $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$, $D^{\text{train}}$, $\alpha$, $\beta$, and $\gamma$, computing posterior distributions on $\Theta$, $\Delta$, and $\Phi$ is straightforward. Because the Dirichlet distribution is conjugate to the multinomial distribution, we get

$$\begin{aligned} \phi_\pi \mid \mathbf{z}, \beta, D^{\text{train}} &\sim \text{Dirichlet}\left(C^{E\Pi}_{\pi} + \beta\right), \\ \theta_p \mid \mathbf{x}, \mathbf{z}, \alpha, D^{\text{train}} &\sim \text{Dirichlet}\left(C^{\Pi P}_{p} + \alpha\right), \\ \delta_\tau \mid \mathbf{y}, \mathbf{z}, \gamma, D^{\text{train}} &\sim \text{Dirichlet}\left(C^{\Pi T}_{\tau} + \gamma\right), \end{aligned} \tag{8}$$

where 119862119864Π

120587represents the vector of counts of the number

of times each event has been assigned to working mode120587 119862Π119875

119901and 119862

Π119879

120591are similar to 119862

119864Π

120587 Then we can evaluate

the posterior probability of each element of Θ Δ and Φ asfollows

$$
\begin{aligned}
E\left[\phi_{e\pi} \mid \mathbf{z}^{k}, \beta, D_{\mathrm{train}}\right] &= \frac{(C^{E\Pi}_{e\pi})^{k} + \beta}{\sum_{e'} (C^{E\Pi}_{e'\pi})^{k} + E\beta}, \\
E\left[\theta_{\pi p} \mid \mathbf{x}^{k}, \mathbf{z}^{k}, \alpha, D_{\mathrm{train}}\right] &= \frac{(C^{\Pi P}_{\pi p})^{k} + \alpha}{\sum_{\pi'} (C^{\Pi P}_{\pi' p})^{k} + \Pi\alpha}, \\
E\left[\delta_{\pi\tau} \mid \mathbf{y}^{k}, \mathbf{z}^{k}, \gamma, D_{\mathrm{train}}\right] &= \frac{(C^{\Pi T}_{\pi\tau})^{k} + \gamma}{\sum_{\pi'} (C^{\Pi T}_{\pi'\tau})^{k} + \Pi\gamma},
\end{aligned}
\tag{9}
$$

where $(C^{E\Pi})^{k}$ is the matrix of event-to-working-mode counts given by the assignments $\mathbf{z}^{k}$, and $k$ refers to sample $k$ from the Gibbs sampler. These posterior expectations also provide point estimates for $\Phi$, $\Theta$, and $\Delta$ and correspond to the posterior predictive distributions for the next event from a working mode, the next working mode for a working place, and the next working mode for a working date, respectively.

Figure 4: The stream of the concrete in the concrete pump truck and the operation sequence of the concrete pump truck at runtime. [Diagram labels: hopper, transportation cylinder, stirring system, pumping system, landing leg system, cantilever system, concrete, specified location; legend: concrete stream, operation sequence, related system.]
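The point estimates can be computed directly from the count matrices of one Gibbs sample. The sketch below is illustrative only: it assumes symmetric Dirichlet priors and normalizes each column of a count matrix so that it sums to one; the helper name `column_normalize` and the toy counts are hypothetical.

```python
def column_normalize(counts, prior):
    """Posterior mean of each column of a count matrix under a symmetric
    Dirichlet prior: (count + prior) / (column total + n_rows * prior)."""
    n_rows, n_cols = len(counts), len(counts[0])
    est = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        total = sum(counts[i][j] for i in range(n_rows)) + n_rows * prior
        for i in range(n_rows):
            est[i][j] = (counts[i][j] + prior) / total
    return est


# Hypothetical counts from one sample: 3 events x 2 modes and 2 modes x 2 places.
c_em = [[4, 0], [1, 2], [0, 3]]
c_mp = [[3, 2], [1, 4]]

phi = column_normalize(c_em, prior=0.01)    # phi[e][m]: event e given mode m
theta = column_normalize(c_mp, prior=25.0)  # theta[m][p]: mode m given place p
```

The same helper applies to the mode-by-date counts to obtain the estimate of $\Delta$.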

6. Experimental Evaluation

6.1. Data Preparation. We trained the WCM on a real-world data set collected from a well-known Chinese construction machinery manufacturer. The data set consists of event sequence data from concrete pump trucks over 6 months (June 2012 to November 2012). It contains $S = 32{,}632$ work cycles, $P = 5$ different working places, $T = 6$ different working dates, a total of $N = 22{,}418{,}756$ event tokens, and an event set of $E = 33$ unique events. The working date of each work cycle is its actual working month, so the working date set is $\mathcal{T} = \{\text{Jun}, \text{Jul}, \text{Aug}, \text{Sep}, \text{Oct}, \text{Nov}\}$. Because all the event sequence data were collected in the Chinese Mainland, we divide the working places into 5 regions according to the administrative regions of China: Northern China, Northeastern China, Eastern China, Mid-Southern China, and Western China.

The concrete pump truck is a type of construction machinery: a truck fitted with a concrete pump. It alternates between two working statuses, traveling and pumping. In the pumping status, it pushes the concrete to the specified location. In the traveling status, it is just a truck. In the experiment, we mainly focus on events in the pumping status. Figure 4 shows the stream of the concrete in the concrete pump truck at runtime and the operation sequence of the different systems in the concrete pump truck. The concrete pump truck first switches to the pumping status and then unfolds and fixes the landing leg. Next, it unfolds the cantilever to the specified location. Afterwards, the concrete is poured into the hopper while the stirring system starts stirring the concrete. Finally, the pumping system starts pumping the concrete in the hopper to the specified location. When the pumping ends, the concrete pump truck stops the pumping system and then folds the cantilever and the landing leg successively. Table 1 shows the relations between systems and events in the concrete pump truck.

Table 1: Event set.

| Event | Abbr. | Type | Related system |
|---|---|---|---|
| Stop pumping mandatorily | SPM | Alarm event | All |
| Reminder of concrete import | RCI | Alarm event | Hopper |
| Concrete piston withdrawing | CPW | Alarm event | Pumping system |
| Reminder of concrete cylinder water | RCSW | Alarm event | Hopper |
| Swing cylinder initiate | SCI | Operation event | Pumping system |
| Stalling of engine | SoE | Alarm event | All |
| Alteration of operation mode (remote or close) | AOM | Operation event | Pumping system |
| Alteration of pump truck status (pumping or travelling) | APTS | Operation event | All |
| Control of pumping displacement | CPD | Operation event | Pumping system |
| Transportation cylinder initiate | TCI | Operation event | Pumping system |
| Manual control of master cylinder | MCMC | Operation event | Pumping system |
| Manual control of swing cylinder | MCSC | Operation event | Pumping system |
| Detection of system pressure | DSP | Alarm event | Pumping system |
| Manual control of engine speed | MCES | Operation event | Pumping system |
| High pressure mode initiate | HPMI | Operation event | Pumping system |
| Warm-up initiate | WUI | Operation event | Pumping system |
| Water pump initiate | WPI | Operation event | Hopper |
| Concrete stirring initiate | CSI | Operation event | Stirring system |
| Cantilever folding initiate | CFI | Operation event | Cantilever system |
| Temperature control initiate | TCI | Operation event | Pumping system |
| Cantilever movement | CM | Operation event | Cantilever system |
| Landing leg movement | LLM | Operation event | Landing leg system |
| Detection of oil pressure | DOP | Alarm event | Pumping system |
| Landing leg folding | LLF | Operation event | Landing leg system |
| Rotary table movement | RTM | Operation event | Cantilever system |
| Oil pump initiate | OPI | Operation event | Pumping system |
| Energy accumulator initiate | EAI | Operation event | Pumping system |
| Bypath valve initiate | BVI | Operation event | Pumping system |
| Concrete pumping initiate | CPI | Operation event | Pumping system |
| Master cylinder initiate | MCI | Operation event | Pumping system |
| Cantilever shock absorbers initiate | CSAI | Alarm event | Cantilever system |
| Initiate of system cooling | ISC | Operation event | Pumping system |
| Hydraulic oil supplement | HOS | Operation event | Pumping system |

Table 1 lists all the events in the event set. There are two types of events: alarm events and operation events. An alarm event reminds the operator that some emergency has happened. For example, the occurrence of event RCI reminds the operator to import concrete into the hopper. An alarm event is not a regular operation. An operation event is the actual record of a regular operation in the concrete pump truck.

6.2. Analysis for Gibbs Sampling Using Perplexity. As mentioned earlier, in the experiments described in this paper we do not estimate the hyperparameters $\alpha$, $\beta$, and $\gamma$. Instead, they are fixed at $50/\Pi$, $0.01$, and $50/\Pi$, respectively. We use the perplexity of the model on test work cycles to determine when the performance of the model begins to stabilize.

The perplexity of a new, unobserved work cycle $s$ that contains events $\mathbf{e}_s$, conditioned on the working places $\mathbf{p}_s$ and working dates $\boldsymbol{\tau}_s$ of the work cycle, is defined as

$$
\mathrm{Perplexity}(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s) = \exp\left(-\frac{\log P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)}{N_s}\right),
\tag{10}
$$

where $P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)$ is the probability assigned by the WCM. To simplify notation, we omit the explicit dependency on the hyperparameters. For multiple work cycles, we report the average perplexity over work cycles, defined as follows:

$$
\mathrm{Perplexity} = \frac{\sum_{s=1}^{S} \mathrm{Perplexity}(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)}{S}.
\tag{11}
$$
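As a small worked illustration of (10) and (11), the following Python sketch computes the per-cycle and average perplexity from log-probabilities. The function names and the two toy work cycles are hypothetical; the log-probabilities would in practice come from the WCM.

```python
import math


def work_cycle_perplexity(log_prob, n_events):
    """Equation (10): exp(-log P(e_s | p_s, tau_s) / N_s)."""
    return math.exp(-log_prob / n_events)


def average_perplexity(cycles):
    """Equation (11): mean per-cycle perplexity over (log_prob, n_events) pairs."""
    return sum(work_cycle_perplexity(lp, n) for lp, n in cycles) / len(cycles)


# Two hypothetical work cycles: every event has probability 1/4 in the first
# cycle (10 events) and 1/8 in the second cycle (5 events).
cycles = [(10 * math.log(0.25), 10), (5 * math.log(0.125), 5)]
avg = average_perplexity(cycles)
```

In this toy case the per-cycle perplexities are 4 and 8, so a model that assigns probability $1/n$ to every event has perplexity $n$, and the average over the two cycles is 6.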

The lower the perplexity, the better the performance of the model. We can obtain an approximate estimate of the perplexity by averaging over multiple samples according to (9), calculated as follows:

$$
P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s) = \frac{1}{K} \sum_{k=1}^{K} \prod_{i=1}^{N_s} \left[ \frac{1}{P_s T_s} \sum_{p \in \mathbf{p}_s,\, \tau \in \boldsymbol{\tau}_s,\, \pi} E\left[\theta_{\pi p}\, \delta_{\pi\tau}\, \phi_{e_{si}\pi} \mid \mathbf{x}^{k}, \mathbf{y}^{k}, \mathbf{z}^{k}\right] \right].
\tag{12}
$$

Figure 5: Perplexity as a function of iterations of the Gibbs sampler for a model with $\Pi = 200$ working modes. Each curve shows the perplexity averaged over a different number of samples ($K$ = 2, 4, 6, 8, and 10). [Plot axes: iteration, 0 to 200; perplexity, 4500 to 5300.]
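The sample-averaged likelihood in (12) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes that the expectation of the product inside (12) is approximated by the product of the point estimates from (9), and the names `cycle_likelihood`, `theta`, `delta`, and `phi` as Python lists are hypothetical.

```python
def cycle_likelihood(events, places, dates, samples):
    """Approximate P(e_s | p_s, tau_s) as in (12).

    samples holds one (theta, delta, phi) triple per Gibbs sample k, indexed
    as theta[mode][place], delta[mode][date], and phi[event][mode].
    """
    total = 0.0
    for theta, delta, phi in samples:
        n_modes = len(theta)
        prob = 1.0
        for e in events:
            # Sum over all candidate places, dates, and working modes.
            mix = sum(theta[m][p] * delta[m][t] * phi[e][m]
                      for p in places for t in dates for m in range(n_modes))
            prob *= mix / (len(places) * len(dates))
        total += prob
    return total / len(samples)


# Toy check with 2 modes, 1 place, 1 date, 2 events (all values hypothetical).
theta = [[0.5], [0.5]]
delta = [[0.5], [0.5]]
phi = [[0.5, 0.5], [0.5, 0.5]]
likelihood = cycle_likelihood([0, 1], [0], [0], [(theta, delta, phi)] * 2)
```

For realistic work cycles with thousands of events, the product underflows in floating point, so an implementation would accumulate log-probabilities per sample and combine the samples with a log-sum-exp step.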

Experimental results using different values of $K$ indicated that $K = 10$ samples is a reasonable choice for obtaining a good approximation of the perplexity. Because the working modes are exchangeable, quite different working mode solutions could in principle be detected across different samples. In practice, however, we found that the working mode solutions are relatively stable across samples, with only a small subset of unique working modes appearing in any sample. Hence, we use the average perplexity across samples in the experiment.

Figure 5 shows the perplexity as a function of the number of Gibbs sampler iterations for a model with $\Pi = 200$ fitted to the data set. For every setting of the parameter $K$, the performance of the model stabilizes rather quickly, at least in terms of perplexity on the data set: the perplexity values flatten out after about 100 iterations of the Gibbs sampler.

6.3. The Number of Working Modes $\Pi$. Although the perplexity can be averaged over different Gibbs sampler runs, other applications of the model rely on the analysis of each working mode and are therefore based on the analysis of individual samples. The parameter $\Pi$, which represents the number of working modes, is also set according to the perplexity.

Figure 6: Perplexity as a function of the parameter $\Pi$ of the Gibbs sampler for $K = 10$ samples. [Plot axes: number of working modes $\Pi$, 0 to 300; perplexity, 4450 to 4950.]

Figure 6 shows the perplexity as a function of the parameter $\Pi$, averaged over $K = 10$ Gibbs samples. The average perplexity over the data set decreases as $\Pi$ increases, and the decrease flattens out once $\Pi$ reaches 200. We therefore set $\Pi = 200$ for this data set.
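One way to turn "the perplexity flattens out" into a concrete selection rule is sketched below. The rule, the function name `pick_num_modes`, the tolerance, and the perplexity values are all hypothetical; the values are not read from Figure 6 and only mimic a curve that levels off at 200 working modes.

```python
def pick_num_modes(perplexity_by_modes, rel_tol=0.005):
    """Return the smallest tested number of working modes after which the
    relative perplexity improvement to the next setting drops below rel_tol."""
    settings = sorted(perplexity_by_modes)
    for cur, nxt in zip(settings, settings[1:]):
        gain = (perplexity_by_modes[cur] - perplexity_by_modes[nxt]) / perplexity_by_modes[cur]
        if gain < rel_tol:
            return cur
    return settings[-1]


# Hypothetical average perplexities for several settings of the number of modes.
curve = {50: 4900.0, 100: 4700.0, 150: 4560.0, 200: 4480.0, 250: 4475.0, 300: 4472.0}
chosen = pick_num_modes(curve)
```

With these illustrative numbers the relative gain from 200 to 250 modes is about 0.1%, so the rule stops at 200.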

6.4. Analysis of the WCM Results. To analyze the WCM results, we can use the point estimates of the WCM parameters to examine specific $\Theta$, $\Delta$, and $\Phi$ distributions and related quantities that can be derived from these parameters (such as the probability of a working place and a working date given a randomly selected event from a working mode). In the following results, we take a specific sample $\mathbf{x}^{k}$, $\mathbf{y}^{k}$, and $\mathbf{z}^{k}$ after 100 iterations from a single, arbitrarily selected Gibbs run and then generate point estimates of $\Theta$, $\Delta$, and $\Phi$ using (9).

There are 200 working modes in total ($\Pi = 200$). Each working mode, represented by a WMV, helps us better understand the occurrences of events. For the analysis, Table 2 lists the highest-probability working mode for each working place and each working date from the WCM. For each such working mode, we list the 10 events most likely to be generated, conditioned on both the working place and the working date. For example, in the working place of Northern China and the working date of June, the most likely working mode is number 101 of the 200 working modes, and its top 10 events are OPI, SPM, EAI, HOS, BVI, MCI, AOM, APTS, CPD, and TCI.
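A lookup of this kind can be sketched from the point estimates. The sketch below makes an assumption that the text does not state: the most likely working mode for a place and date is taken to be the mode maximizing the product of the two point estimates. Function names and toy values are hypothetical.

```python
def most_likely_mode(theta, delta, place, date):
    """Mode maximizing theta[m][place] * delta[m][date] (assumed scoring rule)."""
    return max(range(len(theta)), key=lambda m: theta[m][place] * delta[m][date])


def top_events(phi, mode, event_names, n=10):
    """Names of the n events with the highest probability under the given mode."""
    ranked = sorted(range(len(event_names)), key=lambda e: phi[e][mode], reverse=True)
    return [event_names[e] for e in ranked[:n]]


# Toy estimates: 3 events, 2 modes, 1 place, 1 date (all values hypothetical).
event_names = ["OPI", "SPM", "CM"]
phi = [[0.6, 0.1], [0.3, 0.2], [0.1, 0.7]]   # phi[event][mode]
theta = [[0.2], [0.8]]                        # theta[mode][place]
delta = [[0.5], [0.5]]                        # delta[mode][date]

mode = most_likely_mode(theta, delta, place=0, date=0)
events = top_events(phi, mode, event_names, n=2)
```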

Experimental results show that different working places have different working modes even for the same working date, and that the same working place has different working modes for different working dates. This indicates that the working mode is indeed related to the working place and the working date. Events related to the pumping system, such as OPI, MCI, and CPI, are likely to occur in most working modes, which indicates that the working modes of the concrete pump truck are consistent with the actual situation. Meanwhile, events related to the cantilever system and the landing leg system, such as LLF and CFI, occur less often than events of the pumping system. Moreover, in the summer working dates (June, July, and August), the alarm event SPM is more likely to occur, which suggests that the concrete pump truck is more likely to fail in a hot climate. The operation event AOM is also likely to occur, which suggests that operators prefer to operate the concrete pump truck remotely.

Table 2: The highest probability working mode for each working place and each working date from the WCM.

| Working date | Probability | Working mode | Events |
|---|---|---|---|
| Working place = Northern China | | | |
| Jun | 0.0251 | 101 | OPI, CM, EAI, RTM, BVI, SPM, AOM, APTS, CPD, and TCI |
| Jul | 0.0341 | 164 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, and TCI |
| Aug | 0.0051 | 62 | LLF, CFI, APTS, CSI, AOM, ISC, RTM, SPM, CPD, and TCI |
| Sep | 0.0342 | 12 | OPI, RTM, CM, CPI, BVI, LLF, SPM, AOM, APTS, and CPD |
| Oct | 0.0351 | 49 | RTM, OPI, CM, BVI, MCI, SPM, AOM, APTS, CPD, and TCI |
| Nov | 0.0353 | 129 | OPI, ISC, SPM, EAI, CSI, APTS, AOM, CPD, TCI, and MCMC |
| Working place = Northeastern China | | | |
| Jun | 0.0258 | 176 | OPI, SPM, EAI, HOS, BVI, MCI, AOM, APTS, CPD, and TCI |
| Jul | 0.0263 | 29 | OPI, LLF, ISC, SPM, CFI, APTS, CPI, HOS, AOM, and CPD |
| Aug | 0.0141 | 71 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, and TCI |
| Sep | 0.0114 | 111 | RTM, BVI, OPI, CM, MCI, HOS, EAI, SPM, AOM, and APTS |
| Oct | 0.0146 | 69 | ISC, LLF, CSI, AOM, APTS, OPI, CFI, SPM, CPD, and TCI |
| Nov | 0.0257 | 93 | RTM, OPI, BVI, MCI, CM, CPI, SPM, AOM, APTS, and CPD |
| Working place = Eastern China | | | |
| Jun | 0.0279 | 177 | OPI, HOS, CPI, SPM, LLF, RTM, EAI, BVI, AOM, and APTS |
| Jul | 0.0201 | 72 | OPI, EAI, CPI, SPM, MCI, RTM, HOS, AOM, APTS, and CPD |
| Aug | 0.0277 | 87 | OPI, BVI, EAI, RTM, AOM, SPM, MCI, APTS, CPD, and TCI |
| Sep | 0.0274 | 9 | OPI, EAI, BVI, RTM, HOS, SPM, AOM, APTS, CPD, and TCI |
| Oct | 0.0214 | 191 | RTM, MCI, CPI, CM, EAI, OPI, HOS, SPM, AOM, and APTS |
| Nov | 0.0255 | 170 | OPI, MCI, BVI, RTM, CPI, HOS, SPM, AOM, APTS, and CPD |
| Working place = Mid-Southern China | | | |
| Jun | 0.0122 | 74 | OPI, EAI, CSI, CPI, ISC, MCI, SPM, AOM, APTS, and CPD |
| Jul | 0.0177 | 33 | OPI, CPI, CM, MCI, HOS, SPM, AOM, APTS, CPD, and TCI |
| Aug | 0.0262 | 187 | HOS, MCI, CPI, OPI, EAI, BVI, CSI, SPM, AOM, and APTS |
| Sep | 0.0205 | 104 | RTM, EAI, BVI, OPI, SPM, MCI, CFI, APTS, AOM, and CPD |
| Oct | 0.0193 | 39 | OPI, HOS, BVI, CM, RTM, SPM, AOM, APTS, CPD, and TCI |
| Nov | 0.0133 | 158 | OPI, BVI, RTM, MCI, CM, SPM, AOM, APTS, CPD, and TCI |
| Working place = Western China | | | |
| Jun | 0.0037 | 4 | OPI, RTM, BVI, CM, EAI, SPM, CPI, MCI, AOM, and APTS |
| Jul | 0.0134 | 144 | HOS, MCI, CPI, OPI, CFI, EAI, SPM, AOM, APTS, and CPD |
| Aug | 0.0126 | 126 | OPI, SPM, CM, BVI, AOM, LLF, APTS, CSI, CPD, and TCI |
| Sep | 0.0122 | 88 | OPI, HOS, CPI, CM, LLF, AOM, CFI, MCI, BVI, and SPM |
| Oct | 0.0104 | 37 | OPI, EAI, MCI, HOS, CSI, ISC, CFI, LLF, SPM, and AOM |
| Nov | 0.0135 | 78 | OPI, HOS, RTM, BVI, CSI, EAI, MCI, APTS, AOM, and SPM |

Because the probability of a working mode reflects the probability of its occurrence, we can analyze the workloads of different working places on different working dates. According to the probabilities of the working modes in Table 2, the working modes in the working place of Eastern China are more likely to occur than the working modes in the working place of Western China. This indicates that the concrete pump trucks in Eastern China have heavier workloads than those in Western China. Meanwhile, the concrete pump trucks in the working date of June have heavier workloads than those in the working date of November. In general, we can analyze different working modes according to their probabilities.
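The regional comparison can be checked with a short calculation on the Table 2 probabilities. Averaging the six monthly values per region is only one possible summary and is an assumption of this sketch; the paper does not state how the comparison was aggregated.

```python
from statistics import mean

# Probabilities of the most likely working mode, June to November, from Table 2.
table2 = {
    "Eastern China": [0.0279, 0.0201, 0.0277, 0.0274, 0.0214, 0.0255],
    "Western China": [0.0037, 0.0134, 0.0126, 0.0122, 0.0104, 0.0135],
}

avg = {place: mean(probs) for place, probs in table2.items()}
ratio = avg["Eastern China"] / avg["Western China"]
```

The mean probability is 0.0250 for Eastern China and about 0.0110 for Western China, a ratio of roughly 2.3, which agrees with the statement that working modes in Eastern China are more likely to occur.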

6.5. Illustrative Applications for the WCM. In this section, we provide some illustrative examples of how the WCM can be used to answer different types of questions and prediction problems concerning the working modes of the equipment.

6.5.1. Automated Detection for a New Work Cycle. In real cases, we would like to quickly assess working mode assignments for new work cycles not contained in the training data set, especially for real-time event sequence flows.

Our automated detection strategy is to run the Gibbs sampling algorithm only on the event tokens in the new work cycle, instead of rerunning the algorithm on all the data for every new work cycle. The event tokens in the new work cycle are then quickly assigned to the most likely working places, working dates, and working modes. The main procedure is as follows: first, we assign events randomly to working places, working dates, and working modes; second, we sample new assignments of events by applying the Gibbs sampler only to the event tokens in the new work cycle, each time temporarily updating the count matrices $C^{E\Pi}$, $C^{\Pi P}$, and $C^{\Pi T}$ used in (7).

Table 3 shows the occurrences of events for a new work cycle. After the sampling, the WCM has assigned each event to its most likely working mode; Table 3 lists the top 3 most likely working modes assigned to each event for the new work cycle. Note that each event is assigned to different working modes according to its occurrence count. According to (7), although the events of this new work cycle are assigned to different working modes, the work cycle as a whole is assigned to working mode number 107 with probability 0.0003. The top 10 most likely events in working mode number 107 are RTM, CM, OPI, BVI, SPM, CPI, MCI, SCI, ISC, and SoE.

Compared with the real occurrences of events, the automated detection result for the new work cycle is indeed consistent with the actual situation.

6.5.2. Automated Detection of Anomalous Work Cycles. In this section, we illustrate how our model can be used to detect anomalous work cycles. A work cycle assigned to a working mode with low probability is considered an anomalous work cycle.

We again take the work cycle shown in Table 3 as an example. The work cycle is assigned to working mode number 107 with probability 0.0003. Compared with most other working modes, working mode number 107 has a lower probability, so this work cycle is detected as an anomalous work cycle. The alarm events SPM and SoE occur frequently both in the work cycle and in working mode number 107, which also indicates that this work cycle is anomalous. We also analyzed the real failure records and confirmed that the engine indeed failed frequently during this work cycle. In general, such anomalous work cycles can be detected automatically and efficiently with the help of the WCM.
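The detection rule described above amounts to a threshold test on the probability of the assigned working mode. The sketch below is illustrative: the threshold value, the function name, and the cycle identifiers are hypothetical, and the paper does not specify a cutoff. The two probabilities are the ones quoted in the text (working mode 107) and in Table 2 (working mode 177, Eastern China, June).

```python
def flag_anomalous(cycle_assignments, threshold=0.001):
    """Flag each work cycle whose assigned working mode has a probability
    below the (hypothetical) threshold."""
    return {cycle: prob < threshold
            for cycle, (_mode, prob) in cycle_assignments.items()}


# cycle id -> (assigned working mode, probability of that working mode)
cycle_assignments = {
    "new_cycle": (107, 0.0003),
    "typical_cycle": (177, 0.0279),
}
flags = flag_anomalous(cycle_assignments)
```

In practice the threshold would have to be calibrated, for example against the distribution of working mode probabilities over the training work cycles or against recorded failures.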

Table 3: Actual example of automated detection for a new work cycle (working date = Jun, working place = Eastern China). Each event is assigned to its most likely working mode according to its corresponding occurrence count. The table lists the top 3 most likely working modes for each event in the new work cycle.

| Event | Count | First | Second | Third |
|---|---|---|---|---|
| SPM | 72 | 107 | 181 | 112 |
| AOM | 33 | 169 | 67 | 183 |
| APTS | 23 | 90 | 15 | 76 |
| CPD | 42 | 145 | 139 | 59 |
| TCI | 2 | 118 | 134 | 112 |
| MCMC | 0 | Null | Null | Null |
| MCSC | 0 | Null | Null | Null |
| DSP | 0 | Null | Null | Null |
| MCES | 0 | Null | Null | Null |
| HPMI | 0 | Null | Null | Null |
| WUI | 2 | 159 | 104 | 77 |
| WPI | 23 | 54 | 175 | 71 |
| CSI | 55 | 147 | 29 | 61 |
| CFI | 25 | 2 | 132 | 100 |
| TCI | 23 | 95 | 185 | 53 |
| CM | 127 | 12 | 49 | 192 |
| LLM | 55 | 189 | 114 | 23 |
| RCI | 0 | Null | Null | Null |
| DOP | 0 | Null | Null | Null |
| LLF | 40 | 111 | 10 | 42 |
| RTM | 297 | 191 | 104 | 52 |
| CPW | 0 | Null | Null | Null |
| RCSW | 0 | Null | Null | Null |
| OPI | 95 | 177 | 176 | 101 |
| EAI | 56 | 126 | 100 | 170 |
| BVI | 77 | 177 | 53 | 146 |
| CPI | 60 | 164 | 104 | 149 |
| MCI | 60 | 177 | 175 | 162 |
| SCI | 66 | 120 | 149 | 73 |
| CSAI | 0 | Null | Null | Null |
| ISC | 51 | 68 | 149 | 23 |
| HOS | 0 | Null | Null | Null |
| SoE | 33 | 119 | 112 | 107 |

7. Conclusions and Future Work

The working condition model proposed in this paper provides a relatively simple probabilistic model for exploring the relationships between working place, working date, working mode, and events in a work cycle. This model provides significantly improved predictive power in terms of the analysis of working condition according to the event sequence data.

Our future work mainly includes optimizing the model and its training and conducting experiments on different data sets. Further analysis of the anomalous work cycles detected by our model is also an interesting question.

Notations Associated with the WCM As Used in This Paper

$\mathcal{P}$: Working places of all the work cycles (set)
$\mathcal{T}$: Working dates of all the work cycles (set)
$\mathbf{p}_s$: Working places of the $s$th work cycle ($P_s$-dimensional vector)
$P_s$: Number of working places of the $s$th work cycle (scalar)
$\boldsymbol{\tau}_s$: Working dates of the $s$th work cycle ($T_s$-dimensional vector)
$T_s$: Number of working dates of the $s$th work cycle (scalar)
$P$: Number of working places (scalar)
$S$: Number of work cycles (scalar)
$T$: Number of working dates (scalar)
$N_s$: Number of events in the $s$th work cycle (scalar)
$N$: Number of events in all the event sequences (scalar)
$\Pi$: Number of working modes (scalar)
$E$: Number of events in the event set (scalar)
$\mathbf{e}_s$: Event sequence vector for the $s$th work cycle ($N_s$-dimensional vector)
$e_{si}$: $i$th event in the $s$th work cycle ($i$th component of vector $\mathbf{e}_s$)
$\mathbf{x}$: Working place assignments ($N$-dimensional vector)
$x_{si}$: Working place assignment for event $e_{si}$ ($i$th component of vector $\mathbf{x}_s$)
$\mathbf{y}$: Working date assignments ($N$-dimensional vector)
$y_{si}$: Working date assignment for event $e_{si}$ ($i$th component of vector $\mathbf{y}_s$)
$\mathbf{z}$: Working mode assignments ($N$-dimensional vector)
$z_{si}$: Working mode assignment for event $e_{si}$ ($i$th component of vector $\mathbf{z}_s$)
$\alpha$, $\beta$, $\gamma$: Dirichlet priors (scalars)
$\Phi$: Probabilities of events given working modes ($E \times \Pi$ matrix)
$\phi_{\pi}$: Probabilities of events given working mode $\pi$ ($E$-dimensional vector)
$\Theta$: Probabilities of working modes given working places ($\Pi \times P$ matrix)
$\theta_{p}$: Probabilities of working modes given working place $p$ ($\Pi$-dimensional vector)
$\Delta$: Probabilities of working modes given working dates ($\Pi \times T$ matrix)
$\delta_{\tau}$: Probabilities of working modes given working date $\tau$ ($\Pi$-dimensional vector)

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J Holler V Tsiatsis CMulligan S Avesand S Karnouskos andD Boyle From Machine-to-Machine to the Internet of ThingsIntroduction to a New Age of Intelligence Academic Press 2014

[2] C Perera A Zaslavsky P Christen and D GeorgakopoulosldquoSensing as a service model for smart cities supported by Inter-net of Thingsrdquo Transactions on Emerging TelecommunicationsTechnologies vol 25 no 1 pp 81ndash93 2014

[3] R F Mesquita Brandao and J A Beleza Carvalho ldquoTheimportance of control monitoring systems in wind parksmaintenancerdquo British Journal of Applied Science amp Technologyvol 4 no 10 pp 1461ndash1471 2014

[4] C J Crabtree D Zappala and P J Tavner ldquoSurvey of com-mercially available condition monitoring systems for windturbinesrdquo Tech Rep Durham University 2014

[5] D M Blei A Y Ng and M I Jordan ldquoLatent dirichletallocationrdquoThe Journal ofMachine Learning Research vol 3 no4-5 pp 993ndash1022 2003

[6] S Kandula R Mahajan P Verkaik S Agarwal J Padhyeand P Bahl ldquoDetailed diagnosis in enterprise networksrdquo inProceedings of the ACM SIGCOMM Conference on Data Com-munication (SIGCOMMrsquo09) vol 39 pp 243ndash254ACMAugust2009

[7] J-G Lou Q Fu Y Wang and J Li ldquoMining dependency indistributed systems through unstructured logs analysisrdquo ACMSIGOPSOperating Systems Review vol 44 no 1 pp 91ndash96 2010

[8] C Luo J-G Lou Q Lin et al ldquoCorrelating events with timeseries for incident diagnosisrdquo in Proceedings of the 20th ACMSIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo14) pp 1583ndash1592 ACM August 2014

[9] J Chen and R Kumar ldquoOnline failure diagnosis of stochasticdiscrete event systemsrdquo in Proceedings of the IEEE ConferenceonComputerAidedControl SystemDesign (CACSD rsquo13) pp 194ndash199 IEEE August 2013

[10] J Chen and R Kumar ldquoFailure diagnosis of discrete-timestochastic systems subject to temporal logic correctness require-mentsrdquo in Proceedings of the 11th IEEE International Conferenceon Networking Sensing and Control (ICNSC rsquo14) pp 42ndash47IEEE April 2014

[11] Business ProcessModel and Notation (BPMN) Version 20 OMGSpecification Object Management Group 2011

[12] F Leymann ldquoBpel vs bpmn 20 should you carerdquo in BusinessProcess Modeling Notation pp 8ndash13 Springer Berlin Germany2011

[13] C C Aggarwal Managing and Mining Sensor Data Springer2013

[14] N H Gehani H V Jagadish andO Shmueli ldquoComposite eventspecification in active databasesmodel and implementationrdquo inProceedings of the 18th VLDBConference Vancouver (VLDB rsquo92)vol 92 pp 327ndash338 Citeseer British Columbia Canada 1992

[15] I Davidson S Gilpin and P B Walker ldquoBehavioral event dataand their analysisrdquo Data Mining and Knowledge Discovery vol25 no 3 pp 635ndash653 2012

[16] J Han and M Kamber Data Mining Southeast Asia EditionConcepts and Techniques Morgan Kaufmann 2006

[17] H RMotahari-Nezhad R Saint-Paul F Casati and B Benatal-lah ldquoEvent correlation for process discovery from web serviceinteraction logsrdquoThe VLDB Journal vol 20 no 3 pp 417ndash4442011

Mathematical Problems in Engineering 13

[18] F Skopik and R Fiedler ldquoIntrusion detection in distributedsystems using fingerprinting and massive event correlationrdquo inGI-Jahrestagung pp 2240ndash2254 2013

[19] G A Wilkin P Eugster and K R Jayaram ldquoDecentralizedfault-tolerant event correlationrdquo ACM Transactions on InternetTechnology vol 14 no 1 article 5 2014

[20] H Wei ldquoA correlation analysis method for network securityeventsrdquo in Informatics and Management Science III vol 206 ofLecture Notes in Electrical Engineering pp 269ndash277 SpringerLondon UK 2013

[21] W Van Der Aalst A Adriansyah A K A de Medeiros etal ldquoProcess mining manifestordquo in Usiness Process ManagementWorkshops pp 169ndash194 Springer Berlin Germany 2012

[22] J C A M Buijs B F van Dongen and W M P van der AalstldquoMining configurable process models from collections of eventlogsrdquo inBusiness ProcessManagement pp 33ndash48 Springer 2013

[23] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[24] J Wang R K Wong J Ding Q Guo and L Wen ldquoOnrecommendation of process mining algorithmsrdquo in Proceedingsof the IEEE 19th International Conference onWeb Services (ICWSrsquo12) pp 311ndash318 IEEE Honolulu Hawaii USA June 2012

[25] R S Mans W M P van der Aalst and H M W VerbeekldquoSupporting process mining workflows with rapidpromrdquo inProceedings of the Business Process Management Demo Sessions(BPMD rsquo14) vol 1295 pp 56ndash60 Eindhoven The NetherlandsSeptember 2014

[26] C Li M Reichert and A Wombacher ldquoMining businessprocess variants challenges scenarios algorithmsrdquo Data ampKnowledge Engineering vol 70 no 5 pp 409ndash434 2011

[27] R Accorsi T Stocker and G Muller ldquoOn the exploitation ofprocess mining for security audits the process discovery caserdquoin Proceedings of the 28th Annual ACM Symposium on AppliedComputing pp 1462ndash1468 ACM March 2013

[28] B-J Lee S-G Park K-B Min et al ldquoThe relationship betweenworking condition factors and well-beingrdquo Annals of Occupa-tional and Environmental Medicine vol 26 no 1 article 342014

[29] J Cohen Statistical Power Analysis for the Behavioral SciencesRoutledge Academic New York NY USA 2013

[30] P Bahl R Chandra A Greenberg S Kandula D A Maltz andM Zhang ldquoTowards highly reliable enterprise network servicesvia inference of multi-level dependenciesrdquo ACM SIGCOMMComputer Communication Review vol 37 no 4 pp 13ndash24 2007

[31] B Rosner Fundamentals of Biostatistics Cengage Learning2010

[32] A Zimmermann ldquoColored petri netsrdquo in Stochastic DiscreteEvent Systems Modeling Evaluation Applications pp 99ndash124Springer 2008

[33] A Adriansyah B F van Dongen and W M P van der AalstldquoTowards robust conformance checkingrdquo in Business ProcessManagement Workshops vol 66 of Lecture Notes in BusinessInformation Processing pp 122ndash133 Springer Berlin Germany2011

[34] MWeidlich andMWeske Business Process Modeling NotationSpringer Berlin Germany 2010

[35] C M Bishop and J Lasserre ldquoGenerative or discriminativeGetting the best of both worldsrdquo in Bayesian Statistics J MBernardo M J Bayarri J O Berger et al Eds vol 8 pp 3ndash23 Oxford University 2007

[36] C M Bishop Pattern Recognition and Machine LearningVolume 1 Springer New York NY USA 2006

[37] D M Blei and J D Lafferty ldquoDynamic topic modelsrdquo inProceedings of the 23rd International Conference on MachineLearning (ICML rsquo06) pp 113ndash120 ACM June 2006

[38] J Foulds L Boyles C DuBois P Smyth and M WellingldquoStochastic collapsed variational Bayesian inference for latentdirichlet allocationrdquo in Proceedings of the 19th ACM SIGKDDInternational Conference on Knowledge Discovery and DataMining pp 446ndash454 ACM 2013

[39] J Pearl Bayesian Networks Department of Statistics UCLA2011

[40] I Porteous D Newman A Ihler A Asuncion P Smythand M Welling ldquoFast collapsed gibbs sampling for latentdirichlet allocationrdquo in Proceedings of the 14th ACM SIGKDDInternational Conference on Knowledge Discovery and DataMining (KDD rsquo08) pp 569ndash577 ACM August 2008

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article Modeling the Process of Event Sequence ...downloads.hindawi.com/journals/mpe/2015/693450.pdf · Research Article Modeling the Process of Event Sequence Data Generated

Mathematical Problems in Engineering 7

$C^{\Pi P}$ means the working-mode-by-working-place count matrix, where $C^{\Pi P}_{\pi p,-si}$ is the number of events assigned to working mode $\pi$ in the working place $p$, excluding the working mode assignment to event $e_{si}$. Similarly, $C^{\Pi T}$ means the working-mode-by-working-date count matrix, where $C^{\Pi T}_{\pi \tau,-si}$ is the number of events assigned to working mode $\pi$ in the working date $\tau$, excluding the working mode assignment to event $e_{si}$. Likewise, $C^{E\Pi}$ represents the event-by-working-mode count matrix, where $C^{E\Pi}_{e\pi,-si}$ is the number of events from the $e$th entry in the event set assigned to working mode $\pi$, excluding the working mode assignment to event $e_{si}$. Meanwhile, $\mathbf{x}_{-si}$, $\mathbf{y}_{-si}$, $\mathbf{z}_{-si}$, and $\mathbf{e}_{-si}$ represent the vector of working place assignments, the vector of working date assignments, the vector of working mode assignments, and the vector of event observations in the data set, respectively, each excluding the $i$th event in the $s$th work cycle.

The main sampling steps are as follows. We first initialize the working place, working date, and working mode assignments $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ randomly. In each Gibbs sampling iteration, we sequentially draw the working mode, working place, and working date assignment of the $i$th event from the joint conditional distribution in (7). As the number of iterations increases, the Gibbs sampler approaches its stationary distribution, the posterior distribution $P(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid D_{\text{train}}, \alpha, \beta, \gamma)$.
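The sampling steps above can be sketched in code. This is only an illustration under stated assumptions: equation (7) is not reproduced here, so the conditional below is an assumed author-topic-style product of three smoothed count ratios, and every name (`gibbs_sweep`, `C_EM`, `C_MP`, `C_MT`, `cycles`) is hypothetical. `M` stands for the number of working modes $\Pi$.

```python
import numpy as np

def gibbs_sweep(events, cycles, x, y, z, C_EM, C_MP, C_MT,
                alpha, beta, gamma, rng):
    """One pass over all event tokens, resampling (z, x, y) jointly per token.

    events[n] : event id of token n
    cycles[n] : (places, dates) index lists of the work cycle containing token n
    C_EM      : E x M event-by-mode counts (M = number of working modes)
    C_MP      : M x P mode-by-place counts
    C_MT      : M x T mode-by-date counts
    All arrays are updated in place.
    """
    E, M = C_EM.shape
    for n, e in enumerate(events):
        places, dates = cycles[n]
        # Remove token n from the counts (the "-si" exclusion).
        C_EM[e, z[n]] -= 1
        C_MP[z[n], x[n]] -= 1
        C_MT[z[n], y[n]] -= 1

        # ASSUMED form of the conditional: product of three smoothed ratios.
        f_event = (C_EM[e] + beta) / (C_EM.sum(axis=0) + E * beta)
        f_place = (C_MP[:, places] + alpha) / (C_MP[:, places].sum(axis=0) + M * alpha)
        f_date = (C_MT[:, dates] + gamma) / (C_MT[:, dates].sum(axis=0) + M * gamma)
        w = f_event[:, None, None] * f_place[:, :, None] * f_date[:, None, :]
        w = w.ravel() / w.sum()

        # Draw one (mode, place, date) triple and map it back to indices.
        k = rng.choice(w.size, p=w)
        m, pi, ti = np.unravel_index(k, (M, len(places), len(dates)))
        z[n], x[n], y[n] = m, places[pi], dates[ti]

        # Add the token back with its new assignment.
        C_EM[e, z[n]] += 1
        C_MP[z[n], x[n]] += 1
        C_MT[z[n], y[n]] += 1
```

Recomputing the column sums for every token keeps the sketch short; a practical sampler would maintain those totals incrementally.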

5.2. The Posterior Probability. Given $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$, $D_{\text{train}}$, $\alpha$, $\beta$, and $\gamma$, computing posterior distributions on $\Theta$, $\Delta$, and $\Phi$ is straightforward. Because the Dirichlet distribution is conjugate to the multinomial distribution, we get

$$
\begin{aligned}
\phi_{\pi} \mid \mathbf{z}, \beta, D_{\text{train}} &\sim \text{Dirichlet}\left(C^{E\Pi}_{\pi} + \beta\right), \\
\theta_{p} \mid \mathbf{x}, \mathbf{z}, \alpha, D_{\text{train}} &\sim \text{Dirichlet}\left(C^{\Pi P}_{p} + \alpha\right), \\
\delta_{\tau} \mid \mathbf{y}, \mathbf{z}, \gamma, D_{\text{train}} &\sim \text{Dirichlet}\left(C^{\Pi T}_{\tau} + \gamma\right),
\end{aligned}
\tag{8}
$$

where $C^{E\Pi}_{\pi}$ represents the vector of counts of the number of times each event has been assigned to working mode $\pi$; $C^{\Pi P}_{p}$ and $C^{\Pi T}_{\tau}$ are defined similarly. We can then evaluate the posterior probability of each element of $\Theta$, $\Delta$, and $\Phi$ as follows:

$$
\begin{aligned}
E\left[\phi_{e\pi} \mid \mathbf{z}^{k}, \beta, D_{\text{train}}\right] &= \frac{\left(C^{E\Pi}_{e\pi}\right)^{k} + \beta}{\sum_{e'} \left(C^{E\Pi}_{e'\pi}\right)^{k} + E\beta}, \\
E\left[\theta_{\pi p} \mid \mathbf{x}^{k}, \mathbf{z}^{k}, \alpha, D_{\text{train}}\right] &= \frac{\left(C^{\Pi P}_{\pi p}\right)^{k} + \alpha}{\sum_{\pi'} \left(C^{\Pi P}_{\pi' p}\right)^{k} + \Pi\alpha}, \\
E\left[\delta_{\pi \tau} \mid \mathbf{y}^{k}, \mathbf{z}^{k}, \gamma, D_{\text{train}}\right] &= \frac{\left(C^{\Pi T}_{\pi \tau}\right)^{k} + \gamma}{\sum_{\pi'} \left(C^{\Pi T}_{\pi' \tau}\right)^{k} + \Pi\gamma},
\end{aligned}
\tag{9}
$$

where $\left(C^{E\Pi}\right)^{k}$ is the matrix of event-to-working-mode assignment counts exhibited in $\mathbf{z}^{k}$, and $k$ refers to sample $k$ from the Gibbs sampler. These posterior probabilities also provide point estimates for $\Phi$, $\Theta$, and $\Delta$ and correspond to the posterior predictive distributions for the next event from a working mode, the next working mode for a working place, and the next working mode for a working date, respectively.

Figure 4: The stream of the concrete in the concrete pump truck and the operation sequence of the concrete pump truck at runtime. (Diagram elements: hopper, transportation cylinder, stirring system, pumping system, landing leg system, cantilever system, concrete, specified location. Legend: concrete stream, operation sequence, related system.)
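As a minimal sketch of how the point estimates in (9) could be computed from the count matrices of one Gibbs sample, the function below normalizes each smoothed count matrix column by column. The function and argument names are hypothetical, `M` stands for the number of working modes $\Pi$, and the normalization over working modes for $\Theta$ and $\Delta$ is assumed from the dimensions given in the notation list.

```python
import numpy as np

def point_estimates(C_EM, C_MP, C_MT, alpha, beta, gamma):
    """Posterior-mean estimates from the count matrices of one sample.

    C_EM : E x M event-by-mode counts
    C_MP : M x P mode-by-place counts
    C_MT : M x T mode-by-date counts
    Returns phi (E x M), theta (M x P), delta (M x T); every column sums to 1.
    """
    E, M = C_EM.shape
    phi = (C_EM + beta) / (C_EM.sum(axis=0, keepdims=True) + E * beta)
    theta = (C_MP + alpha) / (C_MP.sum(axis=0, keepdims=True) + M * alpha)
    delta = (C_MT + gamma) / (C_MT.sum(axis=0, keepdims=True) + M * gamma)
    return phi, theta, delta
```

Each column of `phi` is a distribution over events for one working mode, and each column of `theta` or `delta` is a distribution over working modes for one working place or working date.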

6. Experimental Evaluation

6.1. Data Preparation. We trained the WCM on a real-world data set collected from a well-known Chinese construction machinery manufacturer. The data set is a set of event sequence data from the concrete pump truck over 6 months (from June 2012 to November 2012). It contains $S = 32{,}632$ work cycles, $P = 5$ different working places, $T = 6$ different working dates, a total of $N = 22{,}418{,}756$ event tokens, and an event set of $E = 33$ unique events. The working date of each work cycle is its real working month, so the working date set is $\mathcal{T} = \{\text{Jun}, \text{Jul}, \text{Aug}, \text{Sep}, \text{Oct}, \text{Nov}\}$. Because the event sequence data were all collected in the Chinese Mainland, we divide the working places into 5 regions according to the administrative regions of China: Northern China, Northeastern China, Eastern China, Mid-Southern China, and Western China.

The concrete pump truck is a type of construction machinery: a truck fitted with a concrete pump. It alternates between two working statuses, traveling and pumping. In the pumping status, it pushes the concrete to the specified location. In the traveling status, it is just a truck. In the experiment, we mainly focus on events in the pumping status. Figure 4 shows the stream of the concrete in the concrete pump truck at runtime and the operation sequence of the different systems in the concrete pump truck. The concrete pump truck first switches to pumping status and then unfolds and fixes the landing leg. Next, it unfolds the cantilever to the specified location. Afterwards, the concrete is poured into the hopper while the stirring system starts stirring the concrete. Finally, the pumping system starts pumping the concrete in the hopper to the specified location. When the pumping ends, the concrete pump truck stops the pumping system and then folds the cantilever and the landing leg successively. Table 1 shows the relations between systems and events in the concrete pump truck.

Table 1: Event set.

| Event | Abbr. | Type | Related system |
|---|---|---|---|
| Stop pumping mandatorily | SPM | Alarm event | All |
| Reminder of concrete import | RCI | Alarm event | Hopper |
| Concrete piston withdrawing | CPW | Alarm event | Pumping system |
| Reminder of concrete cylinder water | RCSW | Alarm event | Hopper |
| Swing cylinder initiate | SCI | Operation event | Pumping system |
| Stalling of engine | SoE | Alarm event | All |
| Alteration of operation mode (remote or close) | AOM | Operation event | Pumping system |
| Alteration of pump truck status (pumping or travelling) | APTS | Operation event | All |
| Control of pumping displacement | CPD | Operation event | Pumping system |
| Transportation cylinder initiate | TCI | Operation event | Pumping system |
| Manual control of master cylinder | MCMC | Operation event | Pumping system |
| Manual control of swing cylinder | MCSC | Operation event | Pumping system |
| Detection of system pressure | DSP | Alarm event | Pumping system |
| Manual control of engine speed | MCES | Operation event | Pumping system |
| High pressure mode initiate | HPMI | Operation event | Pumping system |
| Warm-up initiate | WUI | Operation event | Pumping system |
| Water pump initiate | WPI | Operation event | Hopper |
| Concrete stirring initiate | CSI | Operation event | Stirring system |
| Cantilever folding initiate | CFI | Operation event | Cantilever system |
| Temperature control initiate | TCI | Operation event | Pumping system |
| Cantilever movement | CM | Operation event | Cantilever system |
| Landing leg movement | LLM | Operation event | Landing leg system |
| Detection of oil pressure | DOP | Alarm event | Pumping system |
| Landing leg folding | LLF | Operation event | Landing leg system |
| Rotary table movement | RTM | Operation event | Cantilever system |
| Oil pump initiate | OPI | Operation event | Pumping system |
| Energy accumulator initiate | EAI | Operation event | Pumping system |
| Bypath valve initiate | BVI | Operation event | Pumping system |
| Concrete pumping initiate | CPI | Operation event | Pumping system |
| Master cylinder initiate | MCI | Operation event | Pumping system |
| Cantilever shock absorbers initiate | CSAI | Alarm event | Cantilever system |
| Initiate of system cooling | ISC | Operation event | Pumping system |
| Hydraulic oil supplement | HOS | Operation event | Pumping system |

Table 1 shows all the events in the event set. There are two types of events: alarm events and operation events. An alarm event reminds the operator that some emergency has happened. For example, the occurrence of event RCI reminds the operator to import concrete into the hopper. An alarm event is not a regular operation. An operation event is the real record of a regular operation in the concrete pump truck.
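One possible in-memory representation of the event set is sketched below, using only a five-row excerpt of Table 1. The `Event` class, `EVENT_SET`, `EVENT_ID`, and `encode` are hypothetical names introduced for illustration. The sketch indexes events by full name because the abbreviation TCI appears twice in Table 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    name: str    # full event name from Table 1
    abbr: str    # abbreviation (not unique: TCI occurs twice)
    kind: str    # "alarm" or "operation"
    system: str  # related system

# Excerpt of Table 1 only; the full event set has 33 entries.
EVENT_SET = [
    Event("Stop pumping mandatorily", "SPM", "alarm", "All"),
    Event("Reminder of concrete import", "RCI", "alarm", "Hopper"),
    Event("Transportation cylinder initiate", "TCI", "operation", "Pumping system"),
    Event("Temperature control initiate", "TCI", "operation", "Pumping system"),
    Event("Concrete pumping initiate", "CPI", "operation", "Pumping system"),
]

# Integer ids keyed by full name, so the two TCI events stay distinct.
EVENT_ID = {ev.name: i for i, ev in enumerate(EVENT_SET)}

def encode(names):
    """Map a sequence of full event names to integer event ids."""
    return [EVENT_ID[name] for name in names]
```

An event sequence encoded this way is a plain list of integers, which is the form a count-based sampler would consume.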

6.2. Analysis for Gibbs Sampling Using Perplexity. As mentioned earlier, in the experiment described in this paper we do not estimate the hyperparameters $\alpha$, $\beta$, and $\gamma$. Instead, they are fixed at $50/\Pi$, $0.01$, and $50/\Pi$, respectively. In this paper, we use the perplexity of the model on test work cycles to evaluate when the performance of the model begins to stabilize.

The perplexity of a new, unobserved work cycle $s$ that contains events $\mathbf{e}_s$ and is conditioned on the working places $\mathbf{p}_s$ and working dates $\boldsymbol{\tau}_s$ of the work cycle is defined as

$$
\text{Perplexity}(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s) = \exp\left(-\frac{\log P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)}{N_s}\right),
\tag{10}
$$

where $P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)$ is the probability assigned by the WCM. To simplify notation, we do not show the explicit dependency on the hyperparameters here. For multiple work cycles, we report the average perplexity over work cycles, defined as follows:

$$
\text{Perplexity} = \frac{\sum_{s=1}^{S} \text{Perplexity}(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)}{S}.
\tag{11}
$$
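Definitions (10) and (11) translate directly into code. The sketch below assumes the log-probability of each work cycle is already available; the function names are hypothetical.

```python
import math

def cycle_perplexity(log_prob, n_events):
    """Perplexity of one work cycle, as in (10).

    log_prob : natural log of P(e_s | p_s, tau_s)
    n_events : number of events N_s in the work cycle
    """
    return math.exp(-log_prob / n_events)

def average_perplexity(log_probs, lengths):
    """Mean of the per-cycle perplexities over all work cycles, as in (11)."""
    values = [cycle_perplexity(lp, n) for lp, n in zip(log_probs, lengths)]
    return sum(values) / len(values)
```

As a sanity check on the definition, a model that assigns every one of the 33 events probability 1/33 has perplexity 33 on any work cycle.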

The lower the perplexity, the better the performance of the model. We can obtain an approximate estimate of the perplexity by averaging over multiple samples according to (9), calculated as follows:

$$
P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s) = \frac{1}{K} \sum_{k=1}^{K} \prod_{i=1}^{N_s} \frac{1}{P_s T_s} \sum_{p \in \mathbf{p}_s,\, \tau \in \boldsymbol{\tau}_s,\, \pi} E\left[\theta_{\pi p}\, \delta_{\pi \tau}\, \phi_{e_{si} \pi} \mid \mathbf{x}^{k}, \mathbf{y}^{k}, \mathbf{z}^{k}\right].
\tag{12}
$$

Figure 5: Perplexity as a function of the number of iterations of the Gibbs sampler for a $\Pi = 200$ model. Each curve shows the perplexity obtained by averaging over a different number of samples ($K = 2, 4, 6, 8, 10$). (x-axis: iteration, 0 to 200; y-axis: perplexity, 4500 to 5300.)

Experimental results using different values for $K$ indicated that $K = 10$ samples is a reasonable choice to get a good approximation of the perplexity. Because of the exchangeability of the working modes, it is possible that quite different solutions of working modes are detected across different samples. In practice, however, we have found that the solutions of working modes are relatively stable across samples, with only a small subset of unique working modes appearing in any sample. Hence, we use the average perplexity values across samples in the experiment.
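The sample-averaged probability in (12) can be sketched as follows, working in log space so that the product over a long work cycle does not underflow. The function name and the layout of `samples` (one `(phi, theta, delta)` tuple of point estimates per Gibbs sample) are assumptions for illustration; `M` stands for the number of working modes $\Pi$.

```python
import numpy as np

def log_prob_cycle(events, places, dates, samples):
    """Log of P(e_s | p_s, tau_s), averaged over K Gibbs samples as in (12).

    events  : event ids of the work cycle (length N_s)
    places  : working place indices of the work cycle (length P_s)
    dates   : working date indices of the work cycle (length T_s)
    samples : list of (phi, theta, delta) with shapes E x M, M x P, M x T
    """
    per_sample = []
    for phi, theta, delta in samples:
        # Sum theta over the cycle's places and delta over its dates, per mode.
        t = theta[:, places].sum(axis=1)   # shape (M,)
        d = delta[:, dates].sum(axis=1)    # shape (M,)
        # The triple sum over (place, date, mode) factorizes into a sum over modes.
        token_probs = phi[events] @ (t * d) / (len(places) * len(dates))
        per_sample.append(np.log(token_probs).sum())
    # log of the mean over samples, computed stably.
    return np.logaddexp.reduce(per_sample) - np.log(len(samples))
```

The returned value can be passed to the perplexity definition in (10) together with the number of events in the work cycle.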

Figure 5 illustrates the perplexity as a function of the number of iterations of the Gibbs sampler for a $\Pi = 200$ model fitted to the data set. The performance of the models (for different settings of the parameter $K$) trained using the Gibbs sampler stabilizes rather quickly, at least in terms of perplexity on the data set: the perplexity values flatten out after about 100 iterations of the Gibbs sampler.

6.3. The Number of Working Modes $\Pi$. Although the perplexity computation can be averaged over different Gibbs sampler runs, other applications of the model rely on the analysis of each working mode and are therefore based on the analysis of a single sample. Meanwhile, the parameter $\Pi$, which represents the number of working modes, is also set according to the perplexity.

Figure 6: Perplexity as a function of the parameter $\Pi$ of the Gibbs sampler for $K = 10$ samples. (x-axis: number of working modes $\Pi$, 0 to 300; y-axis: perplexity, 4450 to 4950.)

Figure 6 illustrates the perplexity as a function of the parameter $\Pi$, computed from $K = 10$ Gibbs samples. The average perplexity over the data set decreases as the parameter $\Pi$ increases. In particular, the perplexity values flatten out once $\Pi$ reaches 200, which indicates that $\Pi = 200$ is a suitable setting for fitting this data set.
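The paper reads the flattening point off the perplexity curve. One hypothetical way to automate that reading is to pick the first setting after which the relative improvement in perplexity drops below a tolerance. The function name and the `rel_tol` threshold are assumptions, not part of the WCM.

```python
def first_flat_setting(settings, perplexities, rel_tol=0.005):
    """Return the first setting whose successor improves perplexity by less
    than rel_tol (as a fraction of the current value).

    settings     : candidate values of the parameter, in increasing order
    perplexities : perplexity measured at each setting
    Falls back to the last setting if the curve never flattens.
    """
    for i in range(len(settings) - 1):
        gain = (perplexities[i] - perplexities[i + 1]) / perplexities[i]
        if gain < rel_tol:
            return settings[i]
    return settings[-1]
```

The tolerance trades model size against fit: a smaller `rel_tol` tends to select a larger number of working modes.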

6.4. Analysis of the WCM Results. To analyze the WCM results, we can use the point estimates of the WCM parameters to look at specific $\Theta$, $\Delta$, and $\Phi$ distributions and at related quantities that can be derived from these parameters (such as the probability of a working place and a working date given a randomly selected event from a working mode). In the following results, we take a specific sample $\mathbf{x}^{k}$, $\mathbf{y}^{k}$, and $\mathbf{z}^{k}$ after 100 iterations from a single, arbitrarily selected Gibbs run and then generate point estimates of $\Theta$, $\Delta$, and $\Phi$ using (9).

There are 200 working modes in total (parameter $\Pi = 200$). Each working mode, described using a WMV, helps us to better understand the occurrences of events. For the sake of analysis, we list the highest-probability working mode for each working place and each working date from the WCM in Table 2. For each of these working modes, we list the top 10 events most likely to be generated, conditioned on both the working place and the working date. For example, for the working place of Northern China and the working date of June, the most likely working mode is number 101 of the 200 working modes, and its top 10 events are OPI, CM, EAI, RTM, BVI, SPM, AOM, APTS, CPD, and TCI.
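The lookup described above can be sketched as follows. The paper does not state how the probability of a working mode given both a working place and a working date is computed, so the product of the two point estimates used here is an assumption, and the function name is hypothetical.

```python
import numpy as np

def top_mode_and_events(phi, theta, delta, place, date, n_top=10):
    """Most likely working mode for one (place, date) pair and its top events.

    phi   : E x M event-given-mode probabilities
    theta : M x P mode-given-place probabilities
    delta : M x T mode-given-date probabilities
    """
    # ASSUMED score: product of the place-based and date-based probabilities.
    score = theta[:, place] * delta[:, date]
    mode = int(np.argmax(score))
    # Event ids sorted by decreasing probability under the chosen mode.
    top_events = np.argsort(phi[:, mode])[::-1][:n_top]
    return mode, float(score[mode]), top_events.tolist()
```

Calling this for every working place and working date would produce one row per pair, in the same layout as Table 2.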

Table 2: The highest probability working mode for each working place and each working date from the WCM.

| Working place | Working date | Probability | Working mode | Events |
|---|---|---|---|---|
| Northern China | Jun | 0.0251 | 101 | OPI, CM, EAI, RTM, BVI, SPM, AOM, APTS, CPD, TCI |
| Northern China | Jul | 0.0341 | 164 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, TCI |
| Northern China | Aug | 0.0051 | 62 | LLF, CFI, APTS, CSI, AOM, ISC, RTM, SPM, CPD, TCI |
| Northern China | Sep | 0.0342 | 12 | OPI, RTM, CM, CPI, BVI, LLF, SPM, AOM, APTS, CPD |
| Northern China | Oct | 0.0351 | 49 | RTM, OPI, CM, BVI, MCI, SPM, AOM, APTS, CPD, TCI |
| Northern China | Nov | 0.0353 | 129 | OPI, ISC, SPM, EAI, CSI, APTS, AOM, CPD, TCI, MCMC |
| Northeastern China | Jun | 0.0258 | 176 | OPI, SPM, EAI, HOS, BVI, MCI, AOM, APTS, CPD, TCI |
| Northeastern China | Jul | 0.0263 | 29 | OPI, LLF, ISC, SPM, CFI, APTS, CPI, HOS, AOM, CPD |
| Northeastern China | Aug | 0.0141 | 71 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, TCI |
| Northeastern China | Sep | 0.0114 | 111 | RTM, BVI, OPI, CM, MCI, HOS, EAI, SPM, AOM, APTS |
| Northeastern China | Oct | 0.0146 | 69 | ISC, LLF, CSI, AOM, APTS, OPI, CFI, SPM, CPD, TCI |
| Northeastern China | Nov | 0.0257 | 93 | RTM, OPI, BVI, MCI, CM, CPI, SPM, AOM, APTS, CPD |
| Eastern China | Jun | 0.0279 | 177 | OPI, HOS, CPI, SPM, LLF, RTM, EAI, BVI, AOM, APTS |
| Eastern China | Jul | 0.0201 | 72 | OPI, EAI, CPI, SPM, MCI, RTM, HOS, AOM, APTS, CPD |
| Eastern China | Aug | 0.0277 | 87 | OPI, BVI, EAI, RTM, AOM, SPM, MCI, APTS, CPD, TCI |
| Eastern China | Sep | 0.0274 | 9 | OPI, EAI, BVI, RTM, HOS, SPM, AOM, APTS, CPD, TCI |
| Eastern China | Oct | 0.0214 | 191 | RTM, MCI, CPI, CM, EAI, OPI, HOS, SPM, AOM, APTS |
| Eastern China | Nov | 0.0255 | 170 | OPI, MCI, BVI, RTM, CPI, HOS, SPM, AOM, APTS, CPD |
| Mid-Southern China | Jun | 0.0122 | 74 | OPI, EAI, CSI, CPI, ISC, MCI, SPM, AOM, APTS, CPD |
| Mid-Southern China | Jul | 0.0177 | 33 | OPI, CPI, CM, MCI, HOS, SPM, AOM, APTS, CPD, TCI |
| Mid-Southern China | Aug | 0.0262 | 187 | HOS, MCI, CPI, OPI, EAI, BVI, CSI, SPM, AOM, APTS |
| Mid-Southern China | Sep | 0.0205 | 104 | RTM, EAI, BVI, OPI, SPM, MCI, CFI, APTS, AOM, CPD |
| Mid-Southern China | Oct | 0.0193 | 39 | OPI, HOS, BVI, CM, RTM, SPM, AOM, APTS, CPD, TCI |
| Mid-Southern China | Nov | 0.0133 | 158 | OPI, BVI, RTM, MCI, CM, SPM, AOM, APTS, CPD, TCI |
| Western China | Jun | 0.0037 | 4 | OPI, RTM, BVI, CM, EAI, SPM, CPI, MCI, AOM, APTS |
| Western China | Jul | 0.0134 | 144 | HOS, MCI, CPI, OPI, CFI, EAI, SPM, AOM, APTS, CPD |
| Western China | Aug | 0.0126 | 126 | OPI, SPM, CM, BVI, AOM, LLF, APTS, CSI, CPD, TCI |
| Western China | Sep | 0.0122 | 88 | OPI, HOS, CPI, CM, LLF, AOM, CFI, MCI, BVI, SPM |
| Western China | Oct | 0.0104 | 37 | OPI, EAI, MCI, HOS, CSI, ISC, CFI, LLF, SPM, AOM |
| Western China | Nov | 0.0135 | 78 | OPI, HOS, RTM, BVI, CSI, EAI, MCI, APTS, AOM, SPM |

Experimental results show that different working places have different working modes even for the same working date, and the same working place also has different working modes for different working dates. This indicates that the working mode is indeed related to the working place and the working date. Events related to the pumping system, such as OPI, MCI, and CPI, are the most likely to occur in most working modes, which indicates that the working modes of the concrete pump truck are consistent with the actual situation. Meanwhile, events related to the cantilever system and the landing leg system, such as LLF and CFI, occur less often than events of the pumping system. Moreover, in the summer working dates (June, July, and August), the alarm event SPM is more likely to occur, which suggests that the concrete pump truck is more likely to fail in a hot climate. The operation event AOM is also more likely to occur, which suggests that the operators prefer to operate the concrete pump truck remotely.

Because the probability of a working mode reflects the probability of its occurrence, we can analyze the workloads of different working places on different working dates. According to the working mode probabilities in Table 2, the working modes in the working place of Eastern China are more likely to occur than those in the working place of Western China. This indicates that the concrete pump trucks in Eastern China have heavier workloads than those in Western China. Meanwhile, the concrete pump trucks have heavier workloads in the working date of June than in the working date of November. In general, we can analyze different working modes according to their probabilities.

6.5. Illustrative Applications for the WCM. In this section, we provide some illustrative examples of how the WCM can be used to answer different types of questions and prediction problems concerning the working modes of the equipment.

6.5.1. Automated Detection for a New Work Cycle. In real cases, we would like to quickly assess working mode assignments for new work cycles not contained in the training data set, especially for the real-time event sequence flow.

Our automated detection strategy is to apply the Gibbs sampling algorithm only to the event tokens in the new work cycle, instead of rerunning the algorithm on the whole data set for every new work cycle. The event tokens in the new work cycle are then quickly assigned to the most likely working places, working dates, and working modes. The main procedure is as follows: first, we assign events randomly to working places, working dates, and working modes; second, we sample new assignments of events by applying the Gibbs sampler only to the event tokens in the new work cycle, each time temporarily updating the count matrices $C^{E\Pi}$, $C^{\Pi P}$, and $C^{\Pi T}$ shown in (7).

Table 3 shows the occurrences of events for a new work cycle. After the sampling, the WCM has assigned each event to its most likely working mode. Table 3 lists the top 3 most likely working modes assigned to each event for the new work cycle. Note that each event is assigned to different working modes according to its occurrence count. According to (7), although the events of this new work cycle are assigned to different working modes, the work cycle as a whole is assigned to working mode number 107 with probability 0.0003. The top 10 most likely events in working mode number 107 are as follows: RTM, CM, OPI, BVI, SPM, CPI, MCI, SCI, ISC, and SoE.

The automated detection result for the new work cycle is consistent with the actual situation when compared with the real occurrences of events.
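A per-event ranking like the one reported in Table 3 can be approximated without sampling. The sketch below is a simplified, deterministic stand-in for the Gibbs-based procedure: it holds the trained point estimates fixed and scores each working mode for each observed event. It ignores the occurrence count beyond distinguishing zero from nonzero, and all names are hypothetical.

```python
import numpy as np

def top_modes_per_event(event_counts, places, dates, phi, theta, delta, n_top=3):
    """Rank working modes for each event of a new work cycle.

    event_counts : dict mapping event id -> occurrence count in the work cycle
    places, dates: working place and working date indices of the work cycle
    phi, theta, delta : trained point estimates (E x M, M x P, M x T)
    Events that never occur get None, mirroring the Null rows of Table 3.
    """
    # Mode weight implied by the cycle's places and dates (uniform over each).
    prior = theta[:, places].mean(axis=1) * delta[:, dates].mean(axis=1)
    result = {}
    for event, count in event_counts.items():
        if count == 0:
            result[event] = None
            continue
        posterior = phi[event] * prior
        posterior = posterior / posterior.sum()
        result[event] = np.argsort(posterior)[::-1][:n_top].tolist()
    return result
```

Because the counts are not updated, this approximation is cheaper than the sampler but cannot capture interactions between events within the same work cycle.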

6.5.2. Automated Detection of Anomalous Work Cycles. In this section, we illustrate how our model can be used to detect anomalous work cycles. A work cycle assigned to a working mode with low probability is considered an anomalous work cycle.

We again take the work cycle shown in Table 3 as an example for the automated detection of an anomalous work cycle. The work cycle is assigned to working mode number 107 with probability 0.0003. Compared with most other working modes, working mode number 107 has a lower probability, so this work cycle is detected as an anomalous work cycle. The alarm events SPM and SoE occur frequently both in the work cycle and in working mode number 107, which also indicates that this work cycle is anomalous. We analyzed the real failure records and confirmed that the engine indeed failed frequently during this work cycle. In general, such anomalous work cycles can be detected automatically and efficiently with the help of the WCM.
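The paper does not give a numeric cutoff for a low probability. One hypothetical rule is to compare the probability of the assigned working mode against a low quantile of reference working mode probabilities, as sketched below. The function name and the `quantile` parameter are assumptions.

```python
def is_anomalous(mode_probability, reference_probabilities, quantile=0.05):
    """Flag a work cycle whose assigned working mode is unusually improbable.

    mode_probability        : probability of the working mode assigned to the cycle
    reference_probabilities : probabilities of working modes seen in normal operation
    quantile                : fraction of the reference values treated as the low tail
    """
    ranked = sorted(reference_probabilities)
    cutoff = ranked[int(quantile * (len(ranked) - 1))]
    return mode_probability < cutoff
```

With the Northern China probabilities from Table 2 as the reference, the cutoff is 0.0051, so the work cycle assigned to working mode number 107 with probability 0.0003 would be flagged.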

Table 3: Actual example of automated detection for a new work cycle (working date = Jun, working place = Eastern China). Each event is assigned to its most likely working mode according to its corresponding occurrence count. The table lists the top 3 most likely working modes for each event for the new work cycle.

| Event | Count | First | Second | Third |
|---|---|---|---|---|
| SPM | 72 | 107 | 181 | 112 |
| AOM | 33 | 169 | 67 | 183 |
| APTS | 23 | 90 | 15 | 76 |
| CPD | 42 | 145 | 139 | 59 |
| TCI | 2 | 118 | 134 | 112 |
| MCMC | 0 | Null | Null | Null |
| MCSC | 0 | Null | Null | Null |
| DSP | 0 | Null | Null | Null |
| MCES | 0 | Null | Null | Null |
| HPMI | 0 | Null | Null | Null |
| WUI | 2 | 159 | 104 | 77 |
| WPI | 23 | 54 | 175 | 71 |
| CSI | 55 | 147 | 29 | 61 |
| CFI | 25 | 2 | 132 | 100 |
| TCI | 23 | 95 | 185 | 53 |
| CM | 127 | 12 | 49 | 192 |
| LLM | 55 | 189 | 114 | 23 |
| RCI | 0 | Null | Null | Null |
| DOP | 0 | Null | Null | Null |
| LLF | 40 | 111 | 10 | 42 |
| RTM | 297 | 191 | 104 | 52 |
| CPW | 0 | Null | Null | Null |
| RCSW | 0 | Null | Null | Null |
| OPI | 95 | 177 | 176 | 101 |
| EAI | 56 | 126 | 100 | 170 |
| BVI | 77 | 177 | 53 | 146 |
| CPI | 60 | 164 | 104 | 149 |
| MCI | 60 | 177 | 175 | 162 |
| SCI | 66 | 120 | 149 | 73 |
| CSAI | 0 | Null | Null | Null |
| ISC | 51 | 68 | 149 | 23 |
| HOS | 0 | Null | Null | Null |
| SoE | 33 | 119 | 112 | 107 |

7. Conclusions and Future Work

The working condition model proposed in this paper provides a relatively simple probabilistic model for exploring the relationships between working place, working date, working mode, and events in a work cycle. This model provides significantly improved predictive power in terms of the analysis of working condition according to the event sequence data.

Our future work mainly includes optimizing the model and its training and conducting experiments on different data sets. Furthermore, the further analysis of the anomalous work cycles detected by our model is also an interesting question.

Notations Associated with the WCM As Used in This Paper

$\mathcal{P}$: Working places of all the work cycles (set)
$\mathcal{T}$: Working dates of all the work cycles (set)
$\mathbf{p}_s$: Working places of the $s$th work cycle ($P_s$-dimensional vector)
$P_s$: Number of working places of the $s$th work cycle (scalar)
$\boldsymbol{\tau}_s$: Working dates of the $s$th work cycle ($T_s$-dimensional vector)
$T_s$: Number of working dates of the $s$th work cycle (scalar)
$P$: Number of working places (scalar)
$S$: Number of work cycles (scalar)
$T$: Number of working dates (scalar)
$N_s$: Number of events in the $s$th work cycle (scalar)
$N$: Number of events in all the event sequences (scalar)
$\Pi$: Number of working modes (scalar)
$E$: Number of events in the event set (scalar)
$\mathbf{e}_s$: Event sequence vector for the $s$th work cycle ($N_s$-dimensional vector)
$e_{si}$: $i$th event in the $s$th work cycle ($i$th component of vector $\mathbf{e}_s$)
$\mathbf{x}$: Working place assignments ($N$-dimensional vector)
$x_{si}$: Working place assignment for event $e_{si}$ ($i$th component of vector $\mathbf{x}_s$)
$\mathbf{y}$: Working date assignments ($N$-dimensional vector)
$y_{si}$: Working date assignment for event $e_{si}$ ($i$th component of vector $\mathbf{y}_s$)
$\mathbf{z}$: Working mode assignments ($N$-dimensional vector)
$z_{si}$: Working mode assignment for event $e_{si}$ ($i$th component of vector $\mathbf{z}_s$)
$\alpha$, $\beta$, $\gamma$: Dirichlet priors (scalars)
$\Phi$: Probabilities of events given working modes ($E \times \Pi$ matrix)
$\phi_{\pi}$: Probabilities of events given working mode $\pi$ ($E$-dimensional vector)
$\Theta$: Probabilities of working modes given working places ($\Pi \times P$ matrix)
$\theta_{p}$: Probabilities of working modes given working place $p$ ($\Pi$-dimensional vector)
$\Delta$: Probabilities of working modes given working dates ($\Pi \times T$ matrix)
$\delta_{\tau}$: Probabilities of working modes given working date $\tau$ ($\Pi$-dimensional vector)

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J Holler V Tsiatsis CMulligan S Avesand S Karnouskos andD Boyle From Machine-to-Machine to the Internet of ThingsIntroduction to a New Age of Intelligence Academic Press 2014

[2] C Perera A Zaslavsky P Christen and D GeorgakopoulosldquoSensing as a service model for smart cities supported by Inter-net of Thingsrdquo Transactions on Emerging TelecommunicationsTechnologies vol 25 no 1 pp 81ndash93 2014

[3] R F Mesquita Brandao and J A Beleza Carvalho ldquoTheimportance of control monitoring systems in wind parksmaintenancerdquo British Journal of Applied Science amp Technologyvol 4 no 10 pp 1461ndash1471 2014

[4] C J Crabtree D Zappala and P J Tavner ldquoSurvey of com-mercially available condition monitoring systems for windturbinesrdquo Tech Rep Durham University 2014

[5] D M Blei A Y Ng and M I Jordan ldquoLatent dirichletallocationrdquoThe Journal ofMachine Learning Research vol 3 no4-5 pp 993ndash1022 2003

[6] S Kandula R Mahajan P Verkaik S Agarwal J Padhyeand P Bahl ldquoDetailed diagnosis in enterprise networksrdquo inProceedings of the ACM SIGCOMM Conference on Data Com-munication (SIGCOMMrsquo09) vol 39 pp 243ndash254ACMAugust2009

[7] J-G Lou Q Fu Y Wang and J Li ldquoMining dependency indistributed systems through unstructured logs analysisrdquo ACMSIGOPSOperating Systems Review vol 44 no 1 pp 91ndash96 2010

[8] C Luo J-G Lou Q Lin et al ldquoCorrelating events with timeseries for incident diagnosisrdquo in Proceedings of the 20th ACMSIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo14) pp 1583ndash1592 ACM August 2014

[9] J Chen and R Kumar ldquoOnline failure diagnosis of stochasticdiscrete event systemsrdquo in Proceedings of the IEEE ConferenceonComputerAidedControl SystemDesign (CACSD rsquo13) pp 194ndash199 IEEE August 2013

[10] J Chen and R Kumar ldquoFailure diagnosis of discrete-timestochastic systems subject to temporal logic correctness require-mentsrdquo in Proceedings of the 11th IEEE International Conferenceon Networking Sensing and Control (ICNSC rsquo14) pp 42ndash47IEEE April 2014

[11] Business ProcessModel and Notation (BPMN) Version 20 OMGSpecification Object Management Group 2011

[12] F Leymann ldquoBpel vs bpmn 20 should you carerdquo in BusinessProcess Modeling Notation pp 8ndash13 Springer Berlin Germany2011

[13] C C Aggarwal Managing and Mining Sensor Data Springer2013

[14] N H Gehani H V Jagadish andO Shmueli ldquoComposite eventspecification in active databasesmodel and implementationrdquo inProceedings of the 18th VLDBConference Vancouver (VLDB rsquo92)vol 92 pp 327ndash338 Citeseer British Columbia Canada 1992

[15] I Davidson S Gilpin and P B Walker ldquoBehavioral event dataand their analysisrdquo Data Mining and Knowledge Discovery vol25 no 3 pp 635ndash653 2012

[16] J Han and M Kamber Data Mining Southeast Asia EditionConcepts and Techniques Morgan Kaufmann 2006

[17] H RMotahari-Nezhad R Saint-Paul F Casati and B Benatal-lah ldquoEvent correlation for process discovery from web serviceinteraction logsrdquoThe VLDB Journal vol 20 no 3 pp 417ndash4442011

Mathematical Problems in Engineering 13

[18] F Skopik and R Fiedler ldquoIntrusion detection in distributedsystems using fingerprinting and massive event correlationrdquo inGI-Jahrestagung pp 2240ndash2254 2013

[19] G A Wilkin P Eugster and K R Jayaram ldquoDecentralizedfault-tolerant event correlationrdquo ACM Transactions on InternetTechnology vol 14 no 1 article 5 2014

[20] H Wei ldquoA correlation analysis method for network securityeventsrdquo in Informatics and Management Science III vol 206 ofLecture Notes in Electrical Engineering pp 269ndash277 SpringerLondon UK 2013

[21] W Van Der Aalst A Adriansyah A K A de Medeiros etal ldquoProcess mining manifestordquo in Usiness Process ManagementWorkshops pp 169ndash194 Springer Berlin Germany 2012

[22] J C A M Buijs B F van Dongen and W M P van der AalstldquoMining configurable process models from collections of eventlogsrdquo inBusiness ProcessManagement pp 33ndash48 Springer 2013

[23] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[24] J Wang R K Wong J Ding Q Guo and L Wen ldquoOnrecommendation of process mining algorithmsrdquo in Proceedingsof the IEEE 19th International Conference onWeb Services (ICWSrsquo12) pp 311ndash318 IEEE Honolulu Hawaii USA June 2012

[25] R S Mans W M P van der Aalst and H M W VerbeekldquoSupporting process mining workflows with rapidpromrdquo inProceedings of the Business Process Management Demo Sessions(BPMD rsquo14) vol 1295 pp 56ndash60 Eindhoven The NetherlandsSeptember 2014

[26] C Li M Reichert and A Wombacher ldquoMining businessprocess variants challenges scenarios algorithmsrdquo Data ampKnowledge Engineering vol 70 no 5 pp 409ndash434 2011

[27] R Accorsi T Stocker and G Muller ldquoOn the exploitation ofprocess mining for security audits the process discovery caserdquoin Proceedings of the 28th Annual ACM Symposium on AppliedComputing pp 1462ndash1468 ACM March 2013

[28] B-J Lee S-G Park K-B Min et al ldquoThe relationship betweenworking condition factors and well-beingrdquo Annals of Occupa-tional and Environmental Medicine vol 26 no 1 article 342014

[29] J Cohen Statistical Power Analysis for the Behavioral SciencesRoutledge Academic New York NY USA 2013

[30] P Bahl R Chandra A Greenberg S Kandula D A Maltz andM Zhang ldquoTowards highly reliable enterprise network servicesvia inference of multi-level dependenciesrdquo ACM SIGCOMMComputer Communication Review vol 37 no 4 pp 13ndash24 2007

[31] B Rosner Fundamentals of Biostatistics Cengage Learning2010

[32] A Zimmermann ldquoColored petri netsrdquo in Stochastic DiscreteEvent Systems Modeling Evaluation Applications pp 99ndash124Springer 2008

[33] A Adriansyah B F van Dongen and W M P van der AalstldquoTowards robust conformance checkingrdquo in Business ProcessManagement Workshops vol 66 of Lecture Notes in BusinessInformation Processing pp 122ndash133 Springer Berlin Germany2011

[34] MWeidlich andMWeske Business Process Modeling NotationSpringer Berlin Germany 2010

[35] C M Bishop and J Lasserre ldquoGenerative or discriminativeGetting the best of both worldsrdquo in Bayesian Statistics J MBernardo M J Bayarri J O Berger et al Eds vol 8 pp 3ndash23 Oxford University 2007

[36] C M Bishop Pattern Recognition and Machine LearningVolume 1 Springer New York NY USA 2006

[37] D M Blei and J D Lafferty ldquoDynamic topic modelsrdquo inProceedings of the 23rd International Conference on MachineLearning (ICML rsquo06) pp 113ndash120 ACM June 2006

[38] J Foulds L Boyles C DuBois P Smyth and M WellingldquoStochastic collapsed variational Bayesian inference for latentdirichlet allocationrdquo in Proceedings of the 19th ACM SIGKDDInternational Conference on Knowledge Discovery and DataMining pp 446ndash454 ACM 2013

[39] J Pearl Bayesian Networks Department of Statistics UCLA2011

[40] I Porteous D Newman A Ihler A Asuncion P Smythand M Welling ldquoFast collapsed gibbs sampling for latentdirichlet allocationrdquo in Proceedings of the 14th ACM SIGKDDInternational Conference on Knowledge Discovery and DataMining (KDD rsquo08) pp 569ndash577 ACM August 2008


8 Mathematical Problems in Engineering

Table 1: Event set.

| Event | Abbr. | Type | Related system |
|---|---|---|---|
| Stop pumping mandatorily | SPM | Alarm event | All |
| Reminder of concrete import | RCI | Alarm event | Hopper |
| Concrete piston withdrawing | CPW | Alarm event | Pumping system |
| Reminder of concrete cylinder water | RCSW | Alarm event | Hopper |
| Swing cylinder initiate | SCI | Operation event | Pumping system |
| Stalling of engine | SoE | Alarm event | All |
| Alteration of operation mode (remote or close) | AOM | Operation event | Pumping system |
| Alteration of pump truck status (pumping or travelling) | APTS | Operation event | All |
| Control of pumping displacement | CPD | Operation event | Pumping system |
| Transportation cylinder initiate | TCI | Operation event | Pumping system |
| Manual control of master cylinder | MCMC | Operation event | Pumping system |
| Manual control of swing cylinder | MCSC | Operation event | Pumping system |
| Detection of system pressure | DSP | Alarm event | Pumping system |
| Manual control of engine speed | MCES | Operation event | Pumping system |
| High pressure mode initiate | HPMI | Operation event | Pumping system |
| Warm-up initiate | WUI | Operation event | Pumping system |
| Water pump initiate | WPI | Operation event | Hopper |
| Concrete stirring initiate | CSI | Operation event | Stirring system |
| Cantilever folding initiate | CFI | Operation event | Cantilever system |
| Temperature control initiate | TCI | Operation event | Pumping system |
| Cantilever movement | CM | Operation event | Cantilever system |
| Landing leg movement | LLM | Operation event | Landing leg system |
| Detection of oil pressure | DOP | Alarm event | Pumping system |
| Landing leg folding | LLF | Operation event | Landing leg system |
| Rotary table movement | RTM | Operation event | Cantilever system |
| Oil pump initiate | OPI | Operation event | Pumping system |
| Energy accumulator initiate | EAI | Operation event | Pumping system |
| Bypath valve initiate | BVI | Operation event | Pumping system |
| Concrete pumping initiate | CPI | Operation event | Pumping system |
| Master cylinder initiate | MCI | Operation event | Pumping system |
| Cantilever shock absorbers initiate | CSAI | Alarm event | Cantilever system |
| Initiate of system cooling | ISC | Operation event | Pumping system |
| Hydraulic oil supplement | HOS | Operation event | Pumping system |

successively. Table 1 shows the relations between systems and events in the concrete pump truck.

Table 1 lists all the events in the event set. There are two types of events: alarm events and operation events. An alarm event reminds the operator that an emergency has occurred; for example, the occurrence of event RCI reminds the operator to import concrete into the hopper. An alarm event is not a regular operation. An operation event is the record of a regular operation on the concrete pump truck.

6.2. Analysis for Gibbs Sampling Using Perplexity. As mentioned earlier, in the experiment described in this paper we do not estimate the hyperparameters $\alpha$, $\beta$, and $\gamma$. Instead, they are fixed at $50/\Pi$, $0.01$, and $50/\Pi$, respectively. We use the perplexity of the model on test work cycles to evaluate when the performance of the model begins to stabilize.

The perplexity of a new, unobserved work cycle $s$ that contains events $\mathbf{e}_s$, conditioned on the working places $\mathbf{p}_s$ and working dates $\boldsymbol{\tau}_s$ of the work cycle, is defined as

$$\mathrm{Perplexity}(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s) = \exp\left(-\frac{\log P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)}{N_s}\right), \tag{10}$$

where $P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)$ is the probability assigned by the WCM. To simplify notation, we do not consider the explicit dependency on the hyperparameters here. For multiple work cycles, we report the average perplexity over work cycles, defined as follows:

$$\mathrm{Perplexity} = \frac{\sum_{s=1}^{S} \mathrm{Perplexity}(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s)}{S}. \tag{11}$$
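As a minimal sketch of how (10) and (11) can be evaluated, the following Python fragment takes the log-probability that a model assigns to each test work cycle and returns the per-cycle and average perplexity. The function names and the numeric inputs are illustrative assumptions and are not part of the WCM implementation used in the experiments.

```python
import math


def work_cycle_perplexity(log_prob, n_events):
    """Perplexity of one work cycle as in (10): exp(-log P / N_s)."""
    if n_events <= 0:
        raise ValueError("a work cycle must contain at least one event")
    return math.exp(-log_prob / n_events)


def average_perplexity(log_probs, event_counts):
    """Average perplexity over S work cycles as in (11)."""
    if not log_probs or len(log_probs) != len(event_counts):
        raise ValueError("need one log-probability and one event count per cycle")
    values = [work_cycle_perplexity(lp, n) for lp, n in zip(log_probs, event_counts)]
    return sum(values) / len(values)


# Hypothetical work cycle: 4 events, each assigned probability 1/8 by the model,
# so log P = 4 * log(1/8) and the perplexity is about 8.
example = work_cycle_perplexity(4 * math.log(1 / 8), 4)
```

A perplexity of about 8 in this toy case means the model is as uncertain as if it chose uniformly among 8 events at every position, which is why lower values indicate a better fit.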

The lower the perplexity, the better the performance of the model. We can obtain an approximate estimate of perplexity by averaging over multiple samples according to (9), calculated as follows:

$$P(\mathbf{e}_s \mid \mathbf{p}_s, \boldsymbol{\tau}_s) = \frac{1}{K}\sum_{k=1}^{K}\prod_{i=1}^{N_s}\frac{1}{P_s T_s}\sum_{p \in \mathbf{p}_s,\, \tau \in \boldsymbol{\tau}_s,\, \pi} E\left[\theta_{\pi p}\,\delta_{\pi \tau}\,\phi_{e_{si}\pi} \mid \mathbf{x}^k, \mathbf{y}^k, \mathbf{z}^k\right]. \tag{12}$$

Figure 5: Perplexity as a function of iterations of the Gibbs sampler for a $\Pi = 200$ model (the number of working modes is $\Pi = 200$). Each curve shows the averaged perplexity for a different setting of $K$ ($K$ = 2, 4, 6, 8, and 10) over a larger range of sampling iterations. Horizontal axis: iteration (0 to 200); vertical axis: perplexity (4500 to 5300).

Experimental results using different values for $K$ indicated that $K = 10$ samples is a reasonable choice for a good approximation of the perplexity. Because the working modes are exchangeable, quite different solutions of working modes may be detected across different samples. In practice, however, we have found that the solutions are relatively stable across samples, with only a small subset of unique working modes appearing in any one sample. Hence, we use the perplexity values averaged across samples in the experiment.
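The sample-averaged estimate in (12) can be sketched as follows. This is only an illustration under stated assumptions: each Gibbs sample is represented by point estimates of $\Theta$, $\Delta$, and $\Phi$ stored as nested lists, and the conditional expectation inside (12) is approximated by the product of these point estimates. The function name and the toy parameter values are hypothetical.

```python
def cycle_likelihood(events, places, dates, samples):
    """Approximate P(e_s | p_s, tau_s) by averaging over K samples as in (12).

    events  : event indices of the work cycle
    places  : working place indices of the work cycle
    dates   : working date indices of the work cycle
    samples : list of (theta, delta, phi) point estimates, one per sample, with
              theta[mode][place], delta[mode][date], phi[event][mode]
    """
    total = 0.0
    for theta, delta, phi in samples:
        n_modes = len(theta)
        prob = 1.0
        for event in events:
            mixture = 0.0
            for place in places:
                for date in dates:
                    for mode in range(n_modes):
                        mixture += theta[mode][place] * delta[mode][date] * phi[event][mode]
            # The factor 1 / (P_s * T_s) averages over the cycle's places and dates.
            prob *= mixture / (len(places) * len(dates))
        total += prob
    return total / len(samples)


# Toy setting (assumed values): 2 working modes, 1 place, 1 date, 2 event types.
theta = [[0.5], [0.5]]
delta = [[0.5], [0.5]]
phi = [[1.0, 0.0], [0.0, 1.0]]
estimate = cycle_likelihood([0, 1], [0], [0], [(theta, delta, phi)])
```

For long work cycles the product of per-event terms underflows quickly, so a practical implementation would likely accumulate logarithms within each sample before averaging.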

Figure 5 illustrates the perplexity as a function of iterations of the Gibbs sampler for a $\Pi = 200$ model fitted to the data set. The performance of the models (for different settings of the parameter $K$) trained using the Gibbs sampler stabilizes rather quickly, at least in terms of perplexity on the data set: the perplexity values flatten out after about 100 iterations.

6.3. The Number of Working Modes $\Pi$. Although the perplexity computation can be averaged over different Gibbs sampler runs, other applications of the model rely on the analysis of each working mode and are therefore based on individual samples. The parameter $\Pi$, which represents the number of working modes, is also set according to the perplexity.

Figure 6: Perplexity as a function of the parameter $\Pi$ of the Gibbs sampler for $K = 10$ samples. Horizontal axis: number of working modes $\Pi$ (0 to 300); vertical axis: perplexity (4450 to 4950).

Figure 6 illustrates the perplexity as a function of the parameter $\Pi$ for $K = 10$ Gibbs samples. The average perplexity over the data set decreases as $\Pi$ increases, which agrees with our analysis. In particular, the perplexity values flatten out once $\Pi$ reaches 200, so we set $\Pi = 200$ to fit the data set.
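One way to make the "flattening out" criterion explicit is to pick the smallest number of working modes beyond which the relative drop in perplexity falls below a tolerance. The sketch below is an assumption about how such a rule could be coded; the tolerance and the perplexity values are made up for illustration and are not read from Figure 6.

```python
def smallest_flat_setting(perplexity_by_modes, rel_tol=0.005):
    """Return the smallest setting after which perplexity improves by less than rel_tol."""
    settings = sorted(perplexity_by_modes)
    for current, following in zip(settings, settings[1:]):
        drop = perplexity_by_modes[current] - perplexity_by_modes[following]
        if drop / perplexity_by_modes[current] < rel_tol:
            return current
    # No flat region was found, so fall back to the largest setting tried.
    return settings[-1]


# Hypothetical curve with the same qualitative shape as described in the text.
curve = {50: 4900.0, 100: 4700.0, 150: 4580.0, 200: 4500.0, 250: 4495.0, 300: 4492.0}
chosen = smallest_flat_setting(curve)
```

With these assumed values the rule selects 200, since moving from 200 to 250 working modes reduces the perplexity by only about 0.1%.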

6.4. Analysis of the WCM Results. To analyze the WCM results, we can use point estimates of the WCM parameters to examine specific $\Theta$, $\Delta$, and $\Phi$ distributions and related quantities that can be derived from these parameters (such as the probability of a working place and a working date given a randomly selected event from a working mode). In the following results, we take a specific sample $\mathbf{x}^k$, $\mathbf{y}^k$, and $\mathbf{z}^k$ after 100 iterations from a single, arbitrarily selected Gibbs run and then generate point estimates of $\Theta$, $\Delta$, and $\Phi$ using (9).

There are 200 working modes in total ($\Pi = 200$). Each working mode, using a WMV, helps us to better understand the occurrences of events. For the sake of analysis, Table 2 lists the highest probability working mode for each working place and each working date from the WCM. For each such working mode, we list the top 10 events most likely to be generated, conditioned on both the working place and the working date. For example, for the working place of Northern China and the working date of June, the most likely working mode is number 101 of the 200 working modes, and its top 10 events are OPI, CM, EAI, RTM, BVI, SPM, AOM, APTS, CPD, and TCI.

Table 2: The highest probability working mode for each working place and each working date from the WCM.

| Working place | Working date | Probability | Working mode | Events |
|---|---|---|---|---|
| Northern China | Jun | 0.0251 | 101 | OPI, CM, EAI, RTM, BVI, SPM, AOM, APTS, CPD, and TCI |
| Northern China | Jul | 0.0341 | 164 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, and TCI |
| Northern China | Aug | 0.0051 | 62 | LLF, CFI, APTS, CSI, AOM, ISC, RTM, SPM, CPD, and TCI |
| Northern China | Sep | 0.0342 | 12 | OPI, RTM, CM, CPI, BVI, LLF, SPM, AOM, APTS, and CPD |
| Northern China | Oct | 0.0351 | 49 | RTM, OPI, CM, BVI, MCI, SPM, AOM, APTS, CPD, and TCI |
| Northern China | Nov | 0.0353 | 129 | OPI, ISC, SPM, EAI, CSI, APTS, AOM, CPD, TCI, and MCMC |
| Northeastern China | Jun | 0.0258 | 176 | OPI, SPM, EAI, HOS, BVI, MCI, AOM, APTS, CPD, and TCI |
| Northeastern China | Jul | 0.0263 | 29 | OPI, LLF, ISC, SPM, CFI, APTS, CPI, HOS, AOM, and CPD |
| Northeastern China | Aug | 0.0141 | 71 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, and TCI |
| Northeastern China | Sep | 0.0114 | 111 | RTM, BVI, OPI, CM, MCI, HOS, EAI, SPM, AOM, and APTS |
| Northeastern China | Oct | 0.0146 | 69 | ISC, LLF, CSI, AOM, APTS, OPI, CFI, SPM, CPD, and TCI |
| Northeastern China | Nov | 0.0257 | 93 | RTM, OPI, BVI, MCI, CM, CPI, SPM, AOM, APTS, and CPD |
| Eastern China | Jun | 0.0279 | 177 | OPI, HOS, CPI, SPM, LLF, RTM, EAI, BVI, AOM, and APTS |
| Eastern China | Jul | 0.0201 | 72 | OPI, EAI, CPI, SPM, MCI, RTM, HOS, AOM, APTS, and CPD |
| Eastern China | Aug | 0.0277 | 87 | OPI, BVI, EAI, RTM, AOM, SPM, MCI, APTS, CPD, and TCI |
| Eastern China | Sep | 0.0274 | 9 | OPI, EAI, BVI, RTM, HOS, SPM, AOM, APTS, CPD, and TCI |
| Eastern China | Oct | 0.0214 | 191 | RTM, MCI, CPI, CM, EAI, OPI, HOS, SPM, AOM, and APTS |
| Eastern China | Nov | 0.0255 | 170 | OPI, MCI, BVI, RTM, CPI, HOS, SPM, AOM, APTS, and CPD |
| Mid-Southern China | Jun | 0.0122 | 74 | OPI, EAI, CSI, CPI, ISC, MCI, SPM, AOM, APTS, and CPD |
| Mid-Southern China | Jul | 0.0177 | 33 | OPI, CPI, CM, MCI, HOS, SPM, AOM, APTS, CPD, and TCI |
| Mid-Southern China | Aug | 0.0262 | 187 | HOS, MCI, CPI, OPI, EAI, BVI, CSI, SPM, AOM, and APTS |
| Mid-Southern China | Sep | 0.0205 | 104 | RTM, EAI, BVI, OPI, SPM, MCI, CFI, APTS, AOM, and CPD |
| Mid-Southern China | Oct | 0.0193 | 39 | OPI, HOS, BVI, CM, RTM, SPM, AOM, APTS, CPD, and TCI |
| Mid-Southern China | Nov | 0.0133 | 158 | OPI, BVI, RTM, MCI, CM, SPM, AOM, APTS, CPD, and TCI |
| Western China | Jun | 0.0037 | 4 | OPI, RTM, BVI, CM, EAI, SPM, CPI, MCI, AOM, and APTS |
| Western China | Jul | 0.0134 | 144 | HOS, MCI, CPI, OPI, CFI, EAI, SPM, AOM, APTS, and CPD |
| Western China | Aug | 0.0126 | 126 | OPI, SPM, CM, BVI, AOM, LLF, APTS, CSI, CPD, and TCI |
| Western China | Sep | 0.0122 | 88 | OPI, HOS, CPI, CM, LLF, AOM, CFI, MCI, BVI, and SPM |
| Western China | Oct | 0.0104 | 37 | OPI, EAI, MCI, HOS, CSI, ISC, CFI, LLF, SPM, and AOM |
| Western China | Nov | 0.0135 | 78 | OPI, HOS, RTM, BVI, CSI, EAI, MCI, APTS, AOM, and SPM |

Experimental results show that different working places have different working modes even for the same working date, and the same working place has different working modes for different working dates. This indicates that the working mode is indeed related to the working place and working date. Events related to the pumping system, such as OPI, MCI, and CPI, are among the most likely events in most working modes, which is consistent with how the concrete pump truck actually operates. Meanwhile, events related to the cantilever system and landing leg system, such as LLF and CFI, occur less often than events of the pumping system. Moreover, in summer (working date = June, July, and August) the alarm event SPM is more likely to occur, which suggests that the concrete pump truck is more likely to fail in a hot climate. The operation event AOM is also likely to occur, which suggests that operators prefer to operate the concrete pump truck remotely.

Because the probability of a working mode reflects how likely it is to occur, we can analyze the workloads of different working places on different working dates. According to the probabilities of the working modes in Table 2, the working modes in Eastern China are more likely to occur than those in Western China. This indicates that the concrete pump trucks in Eastern China have heavier workloads than those in Western China. Meanwhile, the concrete pump trucks have heavier workloads in June than in November. In general, we can compare different working modes according to their probabilities.
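The comparison between Eastern and Western China can be checked with a small worked example that averages the Table 2 probabilities over the six working dates. Using the mean probability as a workload proxy is our simplifying assumption for this illustration; the variable and function names are hypothetical.

```python
from statistics import mean

# Probabilities of the highest probability working mode for Jun to Nov,
# copied from Table 2 for two of the five working places.
table2_probability = {
    "Eastern China": [0.0279, 0.0201, 0.0277, 0.0274, 0.0214, 0.0255],
    "Western China": [0.0037, 0.0134, 0.0126, 0.0122, 0.0104, 0.0135],
}


def mean_mode_probability(probabilities_by_place):
    """Average the listed working mode probabilities for each working place."""
    return {place: mean(values) for place, values in probabilities_by_place.items()}


workload_proxy = mean_mode_probability(table2_probability)
```

Under this proxy, Eastern China averages about 0.025 and Western China about 0.011, which is in line with the statement that working modes in Eastern China are more likely to occur.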

6.5. Illustrative Applications for the WCM. In this section, we provide some illustrative examples of how the WCM can be used to answer different types of questions and prediction problems concerning the working modes of the equipment.

6.5.1. Automated Detection for a New Work Cycle. In real cases, we would like to quickly assess working mode assignments for new work cycles that are not contained in the training data set, especially for a real-time event sequence flow.

Our automated detection strategy is to apply the Gibbs sampling algorithm only to the event tokens in the new work cycle, instead of rerunning the algorithm on the whole data set for every new work cycle. The event tokens in the new work cycle are then quickly assigned to their most likely working places, working dates, and working modes. The main procedure is as follows: first, we assign the events randomly to working places, working dates, and working modes; second, we sample new assignments of events by applying the Gibbs sampler only to the event tokens in the new work cycle, each time temporarily updating the count matrices $C^{E\Pi}$, $C^{\Pi P}$, and $C^{\Pi T}$ shown in (7).
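The two-step procedure above can be sketched in Python as follows. This is a hedged illustration rather than the implementation used in the experiments: since (7) is not reproduced in this section, the sampling weight below assumes the usual smoothed-count form for collapsed Gibbs sampling in models of this family, and the function name, argument layout, and toy counts are all hypothetical.

```python
import random


def fold_in_work_cycle(events, places, dates, c_em, c_mp, c_mt,
                       alpha, beta, gamma, sweeps=20, seed=0):
    """Sample (place, date, mode) assignments for the tokens of one new work cycle.

    c_em[event][mode], c_mp[mode][place], and c_mt[mode][date] are count matrices
    from training. They are copied, so the updates made here remain temporary.
    """
    rng = random.Random(seed)
    c_em = [row[:] for row in c_em]
    c_mp = [row[:] for row in c_mp]
    c_mt = [row[:] for row in c_mt]
    n_event_types, n_modes = len(c_em), len(c_mp)

    def update(event, place, date, mode, step):
        c_em[event][mode] += step
        c_mp[mode][place] += step
        c_mt[mode][date] += step

    def weight(event, place, date, mode):
        # ASSUMED form of the conditional in (7): a product of three smoothed ratios.
        mode_total = sum(c_em[e][mode] for e in range(n_event_types))
        place_total = sum(c_mp[m][place] for m in range(n_modes))
        date_total = sum(c_mt[m][date] for m in range(n_modes))
        return ((c_em[event][mode] + beta) / (mode_total + n_event_types * beta)
                * (c_mp[mode][place] + alpha) / (place_total + n_modes * alpha)
                * (c_mt[mode][date] + gamma) / (date_total + n_modes * gamma))

    candidates = [(p, t, m) for p in places for t in dates for m in range(n_modes)]

    # Step 1: assign every event token randomly to a place, a date, and a mode.
    assignments = []
    for event in events:
        choice = rng.choice(candidates)
        update(event, *choice, 1)
        assignments.append(choice)

    # Step 2: resample only the tokens of the new work cycle.
    for _ in range(sweeps):
        for i, event in enumerate(events):
            update(event, *assignments[i], -1)
            weights = [weight(event, *candidate) for candidate in candidates]
            assignments[i] = rng.choices(candidates, weights=weights, k=1)[0]
            update(event, *assignments[i], 1)
    return assignments


# Toy counts (assumed): 3 event types, 2 working modes, 2 places, 2 dates.
trained_c_em = [[5, 0], [4, 1], [0, 6]]
trained_c_mp = [[6, 3], [2, 5]]
trained_c_mt = [[7, 2], [3, 4]]
result = fold_in_work_cycle([0, 1, 0, 2], [0], [1],
                            trained_c_em, trained_c_mp, trained_c_mt,
                            alpha=25.0, beta=0.01, gamma=25.0)
```

Copying the count matrices keeps the trained model unchanged, which matches the idea that the counts are only temporarily updated while a new work cycle is processed.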

Table 3 shows the occurrences of events for a new work cycle. After the sampling, the WCM has assigned each event to its most likely working mode; Table 3 lists the top 3 most likely working modes assigned to each event for the new work cycle. Note that each event is assigned to different working modes according to its occurrence count. According to (7), although the events of this new work cycle are assigned to different working modes, the work cycle as a whole is assigned to working mode number 107 with probability 0.0003. The top 10 most likely events in working mode number 107 are as follows:

RTM, CM, OPI, BVI, SPM, CPI, MCI, SCI, ISC, and SoE.

Compared with the real occurrences of events, the automated detection result for the new work cycle is consistent with the actual situation.

6.5.2. Automated Detection of Anomalous Work Cycles. We illustrate in this section how our model can be used to detect anomalous work cycles. A work cycle assigned to a working mode with low probability is considered an anomalous work cycle.

We again take the work cycle shown in Table 3 as an example. The work cycle is assigned to working mode number 107 with probability 0.0003. Compared with most other working modes, working mode number 107 has a lower probability, so this work cycle is detected as anomalous. The alarm events SPM and SoE occur frequently both in the work cycle and in working mode number 107, which supports this detection. We also analyzed the real failure records and confirmed that the engine indeed failed frequently during this work cycle. In general, such anomalous work cycles can be detected automatically and efficiently with the help of the WCM.
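The detection rule can be written as a simple threshold test on the probability of the working mode assigned to each work cycle. The threshold value and the work cycle labels below are hypothetical choices made for illustration; the text only states that a low probability marks a work cycle as anomalous, not how low it must be.

```python
def flag_anomalous(cycle_mode_probability, threshold):
    """Return the work cycles whose assigned working mode probability is below threshold."""
    return sorted(cycle for cycle, probability in cycle_mode_probability.items()
                  if probability < threshold)


# 0.0003 is the probability reported for the example work cycle; the other two
# values are taken from Table 2 for contrast. The labels are made up.
assigned = {"cycle_a": 0.0279, "cycle_b": 0.0003, "cycle_c": 0.0141}
flagged = flag_anomalous(assigned, threshold=0.001)
```

With this assumed threshold, only the work cycle assigned to a working mode with probability 0.0003 is flagged.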

Table 3: Actual example of automated detection for a new work cycle (working date = Jun, working place = Eastern China). Each event is assigned to its most likely working mode according to its corresponding occurrence count. The table lists the top 3 most likely working modes for each event in the new work cycle.

| Event | Count | First | Second | Third |
|---|---|---|---|---|
| SPM | 72 | 107 | 181 | 112 |
| AOM | 33 | 169 | 67 | 183 |
| APTS | 23 | 90 | 15 | 76 |
| CPD | 42 | 145 | 139 | 59 |
| TCI | 2 | 118 | 134 | 112 |
| MCMC | 0 | Null | Null | Null |
| MCSC | 0 | Null | Null | Null |
| DSP | 0 | Null | Null | Null |
| MCES | 0 | Null | Null | Null |
| HPMI | 0 | Null | Null | Null |
| WUI | 2 | 159 | 104 | 77 |
| WPI | 23 | 54 | 175 | 71 |
| CSI | 55 | 147 | 29 | 61 |
| CFI | 25 | 2 | 132 | 100 |
| TCI | 23 | 95 | 185 | 53 |
| CM | 127 | 12 | 49 | 192 |
| LLM | 55 | 189 | 114 | 23 |
| RCI | 0 | Null | Null | Null |
| DOP | 0 | Null | Null | Null |
| LLF | 40 | 111 | 10 | 42 |
| RTM | 297 | 191 | 104 | 52 |
| CPW | 0 | Null | Null | Null |
| RCSW | 0 | Null | Null | Null |
| OPI | 95 | 177 | 176 | 101 |
| EAI | 56 | 126 | 100 | 170 |
| BVI | 77 | 177 | 53 | 146 |
| CPI | 60 | 164 | 104 | 149 |
| MCI | 60 | 177 | 175 | 162 |
| SCI | 66 | 120 | 149 | 73 |
| CSAI | 0 | Null | Null | Null |
| ISC | 51 | 68 | 149 | 23 |
| HOS | 0 | Null | Null | Null |
| SoE | 33 | 119 | 112 | 107 |

7. Conclusions and Future Work

The working condition model proposed in this paper provides a relatively simple probabilistic model for exploring the relationships between working place, working date, working mode, and events in a work cycle. This model provides significantly improved predictive power for the analysis of working condition based on event sequence data.

Our future work mainly includes the optimization of the model and of the model training, and conducting experiments on different data sets. Further analysis of the anomalous work cycles detected by our model is also an interesting question.

Notations Associated with the WCM As Used in This Paper

$\mathbf{P}$: Working places of all the work cycles (set)
$\mathbf{T}$: Working dates of all the work cycles (set)
$\mathbf{p}_s$: Working places of the $s$th work cycle ($P_s$-dimensional vector)
$P_s$: Number of working places of the $s$th work cycle (scalar)
$\boldsymbol{\tau}_s$: Working dates of the $s$th work cycle ($T_s$-dimensional vector)
$T_s$: Number of working dates of the $s$th work cycle (scalar)
$P$: Number of working places (scalar)
$S$: Number of work cycles (scalar)
$T$: Number of working dates (scalar)
$N_s$: Number of events in the $s$th work cycle (scalar)
$N$: Number of events in all the event sequences (scalar)
$\Pi$: Number of working modes (scalar)
$E$: Number of events in the event set (scalar)
$\mathbf{e}_s$: Event sequence vector for the $s$th work cycle ($N_s$-dimensional vector)
$e_{si}$: $i$th event in the $s$th work cycle ($i$th component of vector $\mathbf{e}_s$)
$\mathbf{x}$: Working place assignments ($N$-dimensional vector)
$x_{si}$: Working place assignment for event $e_{si}$ ($i$th component of vector $\mathbf{x}_s$)
$\mathbf{y}$: Working date assignments ($N$-dimensional vector)
$y_{si}$: Working date assignment for event $e_{si}$ ($i$th component of vector $\mathbf{y}_s$)
$\mathbf{z}$: Working mode assignments ($N$-dimensional vector)
$z_{si}$: Working mode assignment for event $e_{si}$ ($i$th component of vector $\mathbf{z}_s$)
$\alpha$, $\beta$, $\gamma$: Dirichlet prior (scalar)
$\Phi$: Probabilities of events given working modes ($E \times \Pi$ matrix)
$\phi_{\pi}$: Probabilities of events given working mode $\pi$ ($E$-dimensional vector)
$\Theta$: Probabilities of working modes given working places ($\Pi \times P$ matrix)
$\theta_p$: Probabilities of working modes given working place $p$ ($\Pi$-dimensional vector)
$\Delta$: Probabilities of working modes given working dates ($\Pi \times T$ matrix)
$\delta_{\tau}$: Probabilities of working modes given working date $\tau$ ($\Pi$-dimensional vector)

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J. Holler, V. Tsiatsis, C. Mulligan, S. Avesand, S. Karnouskos, and D. Boyle, From Machine-to-Machine to the Internet of Things: Introduction to a New Age of Intelligence, Academic Press, 2014.

[2] C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos, “Sensing as a service model for smart cities supported by Internet of Things,” Transactions on Emerging Telecommunications Technologies, vol. 25, no. 1, pp. 81–93, 2014.

[3] R. F. Mesquita Brandao and J. A. Beleza Carvalho, “The importance of control monitoring systems in wind parks maintenance,” British Journal of Applied Science & Technology, vol. 4, no. 10, pp. 1461–1471, 2014.

[4] C. J. Crabtree, D. Zappala, and P. J. Tavner, “Survey of commercially available condition monitoring systems for wind turbines,” Tech. Rep., Durham University, 2014.

[5] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.

[6] S. Kandula, R. Mahajan, P. Verkaik, S. Agarwal, J. Padhye, and P. Bahl, “Detailed diagnosis in enterprise networks,” in Proceedings of the ACM SIGCOMM Conference on Data Communication (SIGCOMM ’09), vol. 39, pp. 243–254, ACM, August 2009.

[7] J.-G. Lou, Q. Fu, Y. Wang, and J. Li, “Mining dependency in distributed systems through unstructured logs analysis,” ACM SIGOPS Operating Systems Review, vol. 44, no. 1, pp. 91–96, 2010.

[8] C. Luo, J.-G. Lou, Q. Lin, et al., “Correlating events with time series for incident diagnosis,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14), pp. 1583–1592, ACM, August 2014.

[9] J. Chen and R. Kumar, “Online failure diagnosis of stochastic discrete event systems,” in Proceedings of the IEEE Conference on Computer Aided Control System Design (CACSD ’13), pp. 194–199, IEEE, August 2013.

[10] J. Chen and R. Kumar, “Failure diagnosis of discrete-time stochastic systems subject to temporal logic correctness requirements,” in Proceedings of the 11th IEEE International Conference on Networking, Sensing and Control (ICNSC ’14), pp. 42–47, IEEE, April 2014.

[11] Business Process Model and Notation (BPMN) Version 2.0, OMG Specification, Object Management Group, 2011.

[12] F. Leymann, “Bpel vs. bpmn 2.0: should you care?” in Business Process Modeling Notation, pp. 8–13, Springer, Berlin, Germany, 2011.

[13] C. C. Aggarwal, Managing and Mining Sensor Data, Springer, 2013.

[14] N. H. Gehani, H. V. Jagadish, and O. Shmueli, “Composite event specification in active databases: model and implementation,” in Proceedings of the 18th VLDB Conference Vancouver (VLDB ’92), vol. 92, pp. 327–338, Citeseer, British Columbia, Canada, 1992.

[15] I. Davidson, S. Gilpin, and P. B. Walker, “Behavioral event data and their analysis,” Data Mining and Knowledge Discovery, vol. 25, no. 3, pp. 635–653, 2012.

[16] J. Han and M. Kamber, Data Mining, Southeast Asia Edition: Concepts and Techniques, Morgan Kaufmann, 2006.

[17] H. R. Motahari-Nezhad, R. Saint-Paul, F. Casati, and B. Benatallah, “Event correlation for process discovery from web service interaction logs,” The VLDB Journal, vol. 20, no. 3, pp. 417–444, 2011.

[18] F. Skopik and R. Fiedler, “Intrusion detection in distributed systems using fingerprinting and massive event correlation,” in GI-Jahrestagung, pp. 2240–2254, 2013.

[19] G. A. Wilkin, P. Eugster, and K. R. Jayaram, “Decentralized fault-tolerant event correlation,” ACM Transactions on Internet Technology, vol. 14, no. 1, article 5, 2014.

[20] H. Wei, “A correlation analysis method for network security events,” in Informatics and Management Science III, vol. 206 of Lecture Notes in Electrical Engineering, pp. 269–277, Springer, London, UK, 2013.

[21] W. Van Der Aalst, A. Adriansyah, A. K. A. de Medeiros, et al., “Process mining manifesto,” in Business Process Management Workshops, pp. 169–194, Springer, Berlin, Germany, 2012.

[22] J. C. A. M. Buijs, B. F. van Dongen, and W. M. P. van der Aalst, “Mining configurable process models from collections of event logs,” in Business Process Management, pp. 33–48, Springer, 2013.

[23] A. Rebuge and D. R. Ferreira, “Business process analysis in healthcare environments: a methodology based on process mining,” Information Systems, vol. 37, no. 2, pp. 99–116, 2012.

[24] J. Wang, R. K. Wong, J. Ding, Q. Guo, and L. Wen, “On recommendation of process mining algorithms,” in Proceedings of the IEEE 19th International Conference on Web Services (ICWS ’12), pp. 311–318, IEEE, Honolulu, Hawaii, USA, June 2012.

[25] R. S. Mans, W. M. P. van der Aalst, and H. M. W. Verbeek, “Supporting process mining workflows with rapidprom,” in Proceedings of the Business Process Management Demo Sessions (BPMD ’14), vol. 1295, pp. 56–60, Eindhoven, The Netherlands, September 2014.

[26] C. Li, M. Reichert, and A. Wombacher, “Mining business process variants: challenges, scenarios, algorithms,” Data & Knowledge Engineering, vol. 70, no. 5, pp. 409–434, 2011.

[27] R. Accorsi, T. Stocker, and G. Muller, “On the exploitation of process mining for security audits: the process discovery case,” in Proceedings of the 28th Annual ACM Symposium on Applied Computing, pp. 1462–1468, ACM, March 2013.

[28] B.-J. Lee, S.-G. Park, K.-B. Min, et al., “The relationship between working condition factors and well-being,” Annals of Occupational and Environmental Medicine, vol. 26, no. 1, article 34, 2014.

[29] J. Cohen, Statistical Power Analysis for the Behavioral Sciences, Routledge Academic, New York, NY, USA, 2013.

[30] P. Bahl, R. Chandra, A. Greenberg, S. Kandula, D. A. Maltz, and M. Zhang, “Towards highly reliable enterprise network services via inference of multi-level dependencies,” ACM SIGCOMM Computer Communication Review, vol. 37, no. 4, pp. 13–24, 2007.

[31] B. Rosner, Fundamentals of Biostatistics, Cengage Learning, 2010.

[32] A. Zimmermann, “Colored petri nets,” in Stochastic Discrete Event Systems: Modeling, Evaluation, Applications, pp. 99–124, Springer, 2008.

[33] A. Adriansyah, B. F. van Dongen, and W. M. P. van der Aalst, “Towards robust conformance checking,” in Business Process Management Workshops, vol. 66 of Lecture Notes in Business Information Processing, pp. 122–133, Springer, Berlin, Germany, 2011.

[34] M. Weidlich and M. Weske, Business Process Modeling Notation, Springer, Berlin, Germany, 2010.

[35] C. M. Bishop and J. Lasserre, “Generative or discriminative? Getting the best of both worlds,” in Bayesian Statistics, J. M. Bernardo, M. J. Bayarri, J. O. Berger, et al., Eds., vol. 8, pp. 3–23, Oxford University, 2007.

[36] C. M. Bishop, Pattern Recognition and Machine Learning, Volume 1, Springer, New York, NY, USA, 2006.

[37] D. M. Blei and J. D. Lafferty, “Dynamic topic models,” in Proceedings of the 23rd International Conference on Machine Learning (ICML ’06), pp. 113–120, ACM, June 2006.

[38] J. Foulds, L. Boyles, C. DuBois, P. Smyth, and M. Welling, “Stochastic collapsed variational Bayesian inference for latent dirichlet allocation,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 446–454, ACM, 2013.

[39] J. Pearl, Bayesian Networks, Department of Statistics, UCLA, 2011.

[40] I. Porteous, D. Newman, A. Ihler, A. Asuncion, P. Smyth, and M. Welling, “Fast collapsed gibbs sampling for latent dirichlet allocation,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’08), pp. 569–577, ACM, August 2008.


Mathematical Problems in Engineering 9

0 20 40 60 80 100 120 140 160 180 200

4500

4600

4700

4800

4900

5000

5100

5200

5300

Iteration

Perp

lexi

ty

K = 10

K = 8

K = 6

K = 4

K = 2

The number of working modes Π = 200

Figure 5 Perplexity as a function of iterations of the Gibbs samplerfor a Π = 200 model respectively Each curve shows the perplexityfromaveraging for different settings ofΠ but nowover a larger rangeof sampling iterations

by averaging over multiple samples according to (9) calcu-lated as follows

119875 (e119904

| p119904 120591119904)

=

1119870

119870

sum

119896=1

119873119904

prod

119894=1

1119875119904119879119904

sum

119901isinp119904120591isin120591119904120587

119864 [120579120587119901

120575120587120591

120601119890119904119894120587

| x119896 y119896 z119896]

(12)

Experimental results using different values for 119870 indicatedthat 119870 = 10 samples is a reasonable choice to get a goodapproximation of the perplexity Because of the exchangeabil-ity of the working modes it is possible that quite differentsolutions of working modes are detected across differentsamples In practice however we have also found thatthe solutions of working modes are relatively stable acrosssamples with only a small subset of unique working modesappearing in any sampleHencewe use the average perplexityvalues across samples in the experiment

Figure 5 illustrates the perplexity as a function of itera-tions of the Gibbs sampler for aΠ = 200model to fit the dataset respectively It appears from Figure 5 that performance ofmodels (for different settings of parameter 119870) trained usingthe Gibbs sampler appears to stabilize rather quickly (afterabout 100 iterations) at least in terms of perplexity on thedata set This indicates that the perplexity values flatten outafter a 100 or so iterations of the Gibbs sampler

63The Number ofWorkingModesΠ Although the perplex-ity computation is able to be averaged over different Gibbssampler runs other applications of the model rely on theanalysis of each working mode and are based on the analysisof each sample Meanwhile the setting of the parameter Π isalso determined according to the perplexity The parameterΠ represents the number of working modes

0 50 100 150 200 250 30044504500455046004650470047504800485049004950

Perp

lexi

ty

Perplexity

Number of working modes Π

K = 10 Gibbs samples

Figure 6 Perplexity as a function of the parameter Π of the Gibbssampler for 119870 = 10 samples

Figure 6 illustrates the perplexity as a function of theparameter Π in 119870 = 10 Gibbs samples Empirical settingsof the parameter Π show that the average perplexity overthe data set decreases with the increase of the parameterΠ Experimental results confirm that the average perplexityindeed decreases as we made analysis In particular theperplexity values flatten out after the parameter Π is set to200 This indicates that the parameter Π = 200 fits the dataset in the model

6.4. Analysis of the WCM Results. To analyze the WCM results, we can use the point estimates of the WCM parameters to look at specific $\Theta$, $\Delta$, and $\Phi$ distributions and at related quantities that can be derived from these parameters (such as the probability of a working place and a working date given a randomly selected event from a working mode). In the following results, we take a specific sample $\mathbf{x}^{k}$, $\mathbf{y}^{k}$, and $\mathbf{z}^{k}$ after 100 iterations from a single arbitrarily selected Gibbs run and then generate point estimates of $\Theta$, $\Delta$, and $\Phi$ using (9).

There are 200 working modes in total (parameter $\Pi = 200$). Each working mode, using a WMV, helps us to better understand the occurrences of events. For the sake of analysis, we list the highest probability working mode for each working place and each working date from the WCM in Table 2. For each working mode, we list the top 10 events most likely to be generated in that working mode, conditioned on both the working place and the working date. For example, in the working place of Northern China, for the most likely working mode (numbered 101 in the 200 working modes), the top 10 events (OPI, SPM, EAI, HOS, BVI, MCI, AOM, APTS, CPD, and TCI) are most likely to occur in the working date of June.
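The lookup described above can be sketched in a few lines. This is an illustration under a stated assumption: the score of a working mode for a given place and date is taken to be proportional to the product of the corresponding entries of $\Theta$ and $\Delta$, which may differ from the exact quantity the paper derives from (9). All names and numbers are hypothetical.

```python
def top_mode(theta_p, delta_t):
    """Index and normalized score of the most likely working mode for one place and one date.

    ASSUMPTION: score(mode) is proportional to theta_p[mode] * delta_t[mode].
    """
    scores = [a * b for a, b in zip(theta_p, delta_t)]
    total = sum(scores)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best] / total


def top_events(phi_mode, event_names, n=10):
    """Names of the n most probable events under one working mode."""
    order = sorted(range(len(phi_mode)), key=lambda e: phi_mode[e], reverse=True)
    return [event_names[e] for e in order[:n]]


# Hypothetical toy model with 3 working modes and 4 events.
theta_p = [0.2, 0.5, 0.3]          # working modes given one working place
delta_t = [0.6, 0.3, 0.1]          # working modes given one working date
phi_best = [0.4, 0.3, 0.2, 0.1]    # events given the selected working mode
names = ["OPI", "SPM", "AOM", "LLF"]

mode, prob = top_mode(theta_p, delta_t)
print(mode, round(prob, 3), top_events(phi_best, names, n=2))
```

Repeating this lookup for every combination of working place and working date would yield one row per combination, in the same layout as Table 2.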

Table 2: The highest probability working mode for each working place and each working date from the WCM.

| Working place | Working date | Probability | Working mode | Events |
|---|---|---|---|---|
| Northern China | Jun | 0.0251 | 101 | OPI, CM, EAI, RTM, BVI, SPM, AOM, APTS, CPD, TCI |
| Northern China | Jul | 0.0341 | 164 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, TCI |
| Northern China | Aug | 0.0051 | 62 | LLF, CFI, APTS, CSI, AOM, ISC, RTM, SPM, CPD, TCI |
| Northern China | Sep | 0.0342 | 12 | OPI, RTM, CM, CPI, BVI, LLF, SPM, AOM, APTS, CPD |
| Northern China | Oct | 0.0351 | 49 | RTM, OPI, CM, BVI, MCI, SPM, AOM, APTS, CPD, TCI |
| Northern China | Nov | 0.0353 | 129 | OPI, ISC, SPM, EAI, CSI, APTS, AOM, CPD, TCI, MCMC |
| Northeastern China | Jun | 0.0258 | 176 | OPI, SPM, EAI, HOS, BVI, MCI, AOM, APTS, CPD, TCI |
| Northeastern China | Jul | 0.0263 | 29 | OPI, LLF, ISC, SPM, CFI, APTS, CPI, HOS, AOM, CPD |
| Northeastern China | Aug | 0.0141 | 71 | OPI, CSI, RTM, CM, ISC, APTS, SPM, AOM, CPD, TCI |
| Northeastern China | Sep | 0.0114 | 111 | RTM, BVI, OPI, CM, MCI, HOS, EAI, SPM, AOM, APTS |
| Northeastern China | Oct | 0.0146 | 69 | ISC, LLF, CSI, AOM, APTS, OPI, CFI, SPM, CPD, TCI |
| Northeastern China | Nov | 0.0257 | 93 | RTM, OPI, BVI, MCI, CM, CPI, SPM, AOM, APTS, CPD |
| Eastern China | Jun | 0.0279 | 177 | OPI, HOS, CPI, SPM, LLF, RTM, EAI, BVI, AOM, APTS |
| Eastern China | Jul | 0.0201 | 72 | OPI, EAI, CPI, SPM, MCI, RTM, HOS, AOM, APTS, CPD |
| Eastern China | Aug | 0.0277 | 87 | OPI, BVI, EAI, RTM, AOM, SPM, MCI, APTS, CPD, TCI |
| Eastern China | Sep | 0.0274 | 9 | OPI, EAI, BVI, RTM, HOS, SPM, AOM, APTS, CPD, TCI |
| Eastern China | Oct | 0.0214 | 191 | RTM, MCI, CPI, CM, EAI, OPI, HOS, SPM, AOM, APTS |
| Eastern China | Nov | 0.0255 | 170 | OPI, MCI, BVI, RTM, CPI, HOS, SPM, AOM, APTS, CPD |
| Mid-Southern China | Jun | 0.0122 | 74 | OPI, EAI, CSI, CPI, ISC, MCI, SPM, AOM, APTS, CPD |
| Mid-Southern China | Jul | 0.0177 | 33 | OPI, CPI, CM, MCI, HOS, SPM, AOM, APTS, CPD, TCI |
| Mid-Southern China | Aug | 0.0262 | 187 | HOS, MCI, CPI, OPI, EAI, BVI, CSI, SPM, AOM, APTS |
| Mid-Southern China | Sep | 0.0205 | 104 | RTM, EAI, BVI, OPI, SPM, MCI, CFI, APTS, AOM, CPD |
| Mid-Southern China | Oct | 0.0193 | 39 | OPI, HOS, BVI, CM, RTM, SPM, AOM, APTS, CPD, TCI |
| Mid-Southern China | Nov | 0.0133 | 158 | OPI, BVI, RTM, MCI, CM, SPM, AOM, APTS, CPD, TCI |
| Western China | Jun | 0.0037 | 4 | OPI, RTM, BVI, CM, EAI, SPM, CPI, MCI, AOM, APTS |
| Western China | Jul | 0.0134 | 144 | HOS, MCI, CPI, OPI, CFI, EAI, SPM, AOM, APTS, CPD |
| Western China | Aug | 0.0126 | 126 | OPI, SPM, CM, BVI, AOM, LLF, APTS, CSI, CPD, TCI |
| Western China | Sep | 0.0122 | 88 | OPI, HOS, CPI, CM, LLF, AOM, CFI, MCI, BVI, SPM |
| Western China | Oct | 0.0104 | 37 | OPI, EAI, MCI, HOS, CSI, ISC, CFI, LLF, SPM, AOM |
| Western China | Nov | 0.0135 | 78 | OPI, HOS, RTM, BVI, CSI, EAI, MCI, APTS, AOM, SPM |

Experimental results show that different working places have different working modes even on the same working date, and that the same working place has different working modes for different working dates. This indicates that the working mode is indeed related to the working place and the working date. Events related to the pumping system, such as OPI, MCI, and CPI, are likely to occur in most working modes, which indicates that the working modes of the concrete pump truck are consistent with the actual situation. Meanwhile, events related to the cantilever system and the landing leg system, such as LLF and CFI, occur less often than events of the pumping system. Moreover, in the summer working dates (June, July, and August), the alert event SPM is more likely to occur, which indicates that the concrete pump truck is more likely to fail in a hot climate. The operation event AOM is also more likely to occur, which indicates that operators prefer to operate the concrete pump truck remotely.

Because the probability of a working mode reflects the probability of its occurrence, we can analyze the workloads of different working places on different working dates. According to the probabilities of the working modes in Table 2, the working modes in the working place of Eastern China are more likely to occur than the working modes in the working place of Western China. This indicates that the concrete pump trucks in Eastern China have heavier workloads than those in Western China. Meanwhile, the concrete pump trucks have heavier workloads in June than in November. In general, we can analyze different working modes according to their probabilities.
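The comparison between Eastern China and Western China can be checked with a short calculation. The sketch below uses the June to November probabilities listed in Table 2 for these two working places; taking the mean probability as a proxy for workload is an assumption made here for illustration, not a measure defined in the paper.

```python
# Probabilities of the highest probability working modes (Jun to Nov), from Table 2.
table2 = {
    "Eastern China": [0.0279, 0.0201, 0.0277, 0.0274, 0.0214, 0.0255],
    "Western China": [0.0037, 0.0134, 0.0126, 0.0122, 0.0104, 0.0135],
}

# ASSUMPTION: the mean probability over the six working dates stands in for workload.
mean_prob = {place: sum(values) / len(values) for place, values in table2.items()}

for place, value in mean_prob.items():
    print(f"{place}: {value:.4f}")
```

The mean for Eastern China (about 0.0250) is more than twice the mean for Western China (about 0.0110), which agrees with the workload ordering stated in the text.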

6.5. Illustrative Applications for the WCM. In this section, we provide some illustrative examples of how the WCM can be used to answer different types of questions and prediction problems concerning the working modes of the equipment.

6.5.1. Automated Detection for a New Work Cycle. In real cases, we would like to quickly assess working mode assignments for new work cycles not contained in the training data set, especially for the real-time event sequence flow.

Our automated detection strategy is to apply the Gibbs sampling algorithm only to the event tokens in the new work cycle, instead of rerunning the algorithm on the whole data set for every new work cycle. Afterwards, the event tokens in the new work cycle are quickly assigned to the most likely working places, working dates, and working modes. The main procedure is as follows: first, we assign events randomly to working places, working dates, and working modes; second, we sample new assignments of events by applying the Gibbs sampler only to the event tokens in the new work cycle, each time temporarily updating the count matrices $C^{E\Pi}$, $C^{\Pi P}$, and $C^{\Pi T}$ shown in (7).
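The two-step procedure can be sketched as a small fold-in sampler. This is a hedged illustration, not the authors' implementation: the conditional distribution below is an LDA-style stand-in assumed for the paper's (7), and the function name `fold_in`, the hyperparameter values, and the toy count matrices are all hypothetical.

```python
import random


def fold_in(events, places, dates, c_em, c_mp, c_mt,
            alpha=0.1, beta=0.01, gamma=0.1, iters=50, seed=0):
    """Sample (place, date, mode) assignments for the event tokens of one new work cycle.

    c_em[e][m], c_mp[m][p], and c_mt[m][t] are count matrices from training. They are
    copied first, so the trained counts are only updated temporarily.
    ASSUMPTION: the weights below are an LDA-style stand-in for the paper's (7).
    """
    rng = random.Random(seed)
    c_em = [row[:] for row in c_em]
    c_mp = [row[:] for row in c_mp]
    c_mt = [row[:] for row in c_mt]
    n_events, n_modes = len(c_em), len(c_mp)
    n_m = [sum(c_em[e][m] for e in range(n_events)) for m in range(n_modes)]
    n_p = {p: sum(c_mp[m][p] for m in range(n_modes)) for p in places}
    n_t = {t: sum(c_mt[m][t] for m in range(n_modes)) for t in dates}
    triples = [(p, t, m) for p in places for t in dates for m in range(n_modes)]

    def add(e, p, t, m, d):
        # Add d (+1 or -1) to every count touched by one token assignment.
        c_em[e][m] += d
        n_m[m] += d
        c_mp[m][p] += d
        n_p[p] += d
        c_mt[m][t] += d
        n_t[t] += d

    # Step 1: assign every event token at random.
    assign = [rng.choice(triples) for _ in events]
    for e, (p, t, m) in zip(events, assign):
        add(e, p, t, m, 1)

    # Step 2: resample only the tokens of the new work cycle.
    for _ in range(iters):
        for i, e in enumerate(events):
            add(e, *assign[i], -1)
            weights = [
                (c_em[e][m] + beta) / (n_m[m] + n_events * beta)
                * (c_mp[m][p] + alpha) / (n_p[p] + n_modes * alpha)
                * (c_mt[m][t] + gamma) / (n_t[t] + n_modes * gamma)
                for (p, t, m) in triples
            ]
            assign[i] = rng.choices(triples, weights=weights)[0]
            add(e, *assign[i], 1)
    return assign


# Hypothetical toy model: 3 event types, 2 working modes, 1 working place, 1 working date.
c_em = [[50, 0], [0, 50], [25, 25]]
c_mp = [[75], [75]]
c_mt = [[75], [75]]
result = fold_in([0, 0, 0, 2], places=[0], dates=[0], c_em=c_em, c_mp=c_mp, c_mt=c_mt)
print(result)
```

Because the trained counts dominate the few tokens of a single work cycle, a short run of this kind is much cheaper than retraining, which is the motivation given above for sampling only the new tokens.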

Table 3 shows the occurrences of events for a new work cycle. After the sampling, the WCM has assigned each event to its most likely working mode; Table 3 lists the top 3 most likely working modes assigned to each event for the new work cycle. Note that each event is assigned to different working modes according to its occurrence count. According to (7), although the events of this new work cycle are assigned to different working modes, the work cycle is assigned to working mode number 107 with probability 0.0003. The top 10 most likely events in working mode number 107 are as follows:

RTM, CM, OPI, BVI, SPM, CPI, MCI, SCI, ISC, and SoE.

The automated detection result for the new work cycle is indeed consistent with the actual situation when compared with the real occurrences of events.

6.5.2. Automated Detection of Anomalous Work Cycles. We illustrate in this section how our model can be useful for detecting anomalous work cycles. A work cycle assigned to a working mode with low probability is considered an anomalous work cycle.

We take the work cycle shown in Table 3 as an example of the automated detection of an anomalous work cycle. The work cycle is assigned to working mode number 107 with probability 0.0003. Compared with most other working modes, working mode number 107 has a lower probability, so this work cycle is detected as an anomalous work cycle. The alert events SPM and SoE occur frequently both in the work cycle and in working mode number 107, which also indicates that this work cycle is anomalous. Meanwhile, we analyzed the real failure records and confirmed that the engine indeed failed frequently during this work cycle. In general, such anomalous work cycles can be detected automatically and efficiently with the help of the WCM.
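The low-probability rule can be sketched as a simple cutoff test. The paper does not state a numeric threshold, so the quantile-based cutoff, the function name, and the choice of reference values (the Eastern China probabilities from Table 2) are assumptions made for illustration.

```python
def is_anomalous(mode_prob, reference_probs, quantile=0.05):
    """Flag a work cycle whose working mode probability lies below a low quantile
    of the reference probabilities (hypothetical rule; no threshold is given in the paper)."""
    ranked = sorted(reference_probs)
    cutoff = ranked[int(quantile * (len(ranked) - 1))]
    return mode_prob < cutoff


# Reference values: Eastern China probabilities from Table 2 (Jun to Nov).
reference = [0.0279, 0.0201, 0.0277, 0.0274, 0.0214, 0.0255]

print(is_anomalous(0.0003, reference))  # the work cycle assigned to working mode 107
print(is_anomalous(0.0255, reference))  # a typical value from Table 2
```

Under this rule the work cycle with probability 0.0003 is flagged, while a work cycle with a typical Table 2 probability is not.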

Table 3: Actual example of automated detection for a new work cycle (working date = Jun, working place = Eastern China). Each event is assigned to its most likely working mode according to its corresponding occurrence count. The table lists the top 3 most likely working modes for each event in the new work cycle.

| Event | Count | First | Second | Third |
|---|---|---|---|---|
| SPM | 72 | 107 | 181 | 112 |
| AOM | 33 | 169 | 67 | 183 |
| APTS | 23 | 90 | 15 | 76 |
| CPD | 42 | 145 | 139 | 59 |
| TCI | 2 | 118 | 134 | 112 |
| MCMC | 0 | Null | Null | Null |
| MCSC | 0 | Null | Null | Null |
| DSP | 0 | Null | Null | Null |
| MCES | 0 | Null | Null | Null |
| HPMI | 0 | Null | Null | Null |
| WUI | 2 | 159 | 104 | 77 |
| WPI | 23 | 54 | 175 | 71 |
| CSI | 55 | 147 | 29 | 61 |
| CFI | 25 | 2 | 132 | 100 |
| TCI | 23 | 95 | 185 | 53 |
| CM | 127 | 12 | 49 | 192 |
| LLM | 55 | 189 | 114 | 23 |
| RCI | 0 | Null | Null | Null |
| DOP | 0 | Null | Null | Null |
| LLF | 40 | 111 | 10 | 42 |
| RTM | 297 | 191 | 104 | 52 |
| CPW | 0 | Null | Null | Null |
| RCSW | 0 | Null | Null | Null |
| OPI | 95 | 177 | 176 | 101 |
| EAI | 56 | 126 | 100 | 170 |
| BVI | 77 | 177 | 53 | 146 |
| CPI | 60 | 164 | 104 | 149 |
| MCI | 60 | 177 | 175 | 162 |
| SCI | 66 | 120 | 149 | 73 |
| CSAI | 0 | Null | Null | Null |
| ISC | 51 | 68 | 149 | 23 |
| HOS | 0 | Null | Null | Null |
| SoE | 33 | 119 | 112 | 107 |

7. Conclusions and Future Work

The working condition model proposed in this paper provides a relatively simple probabilistic model for exploring the relationships between working place, working date, working mode, and events in a work cycle. This model provides significantly improved predictive power in terms of the analysis of working condition according to the event sequence data.

Our future work mainly includes the optimization of the model, the model training, and conducting experiments on different data sets. Furthermore, the further analysis of the anomalous work cycles detected by our model is also an interesting question.

Notations Associated with the WCM As Used in This Paper

| Symbol | Meaning | Type |
|---|---|---|
| $\mathbf{P}$ | Working places of all the work cycles | Set |
| $\mathbf{T}$ | Working dates of all the work cycles | Set |
| $\mathbf{p}_s$ | Working places of the $s$th work cycle | $P_s$-dimensional vector |
| $P_s$ | Number of working places of the $s$th work cycle | Scalar |
| $\boldsymbol{\tau}_s$ | Working dates of the $s$th work cycle | $T_s$-dimensional vector |
| $T_s$ | Number of working dates of the $s$th work cycle | Scalar |
| $P$ | Number of working places | Scalar |
| $S$ | Number of work cycles | Scalar |
| $T$ | Number of working dates | Scalar |
| $N_s$ | Number of events in the $s$th work cycle | Scalar |
| $N$ | Number of events in all the event sequences | Scalar |
| $\Pi$ | Number of working modes | Scalar |
| $E$ | Number of events in the event set | Scalar |
| $\mathbf{e}_s$ | Event sequence vector for the $s$th work cycle | $N_s$-dimensional vector |
| $e_{si}$ | $i$th event in the $s$th work cycle | $i$th component of vector $\mathbf{e}_s$ |
| $\mathbf{x}$ | Working place assignments | $N$-dimensional vector |
| $x_{si}$ | Working place assignment for event $e_{si}$ | $i$th component of vector $\mathbf{x}_s$ |
| $\mathbf{y}$ | Working date assignments | $N$-dimensional vector |
| $y_{si}$ | Working date assignment for event $e_{si}$ | $i$th component of vector $\mathbf{y}_s$ |
| $\mathbf{z}$ | Working mode assignments | $N$-dimensional vector |
| $z_{si}$ | Working mode assignment for event $e_{si}$ | $i$th component of vector $\mathbf{z}_s$ |
| $\alpha$, $\beta$, $\gamma$ | Dirichlet prior | Scalar |
| $\Phi$ | Probabilities of events given working modes | $E \times \Pi$ matrix |
| $\phi_\pi$ | Probabilities of events given working mode $\pi$ | $E$-dimensional vector |
| $\Theta$ | Probabilities of working modes given working places | $\Pi \times P$ matrix |
| $\theta_p$ | Probabilities of working modes given working place $p$ | $\Pi$-dimensional vector |
| $\Delta$ | Probabilities of working modes given working dates | $\Pi \times T$ matrix |
| $\delta_\tau$ | Probabilities of working modes given working date $\tau$ | $\Pi$-dimensional vector |

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J. Holler, V. Tsiatsis, C. Mulligan, S. Avesand, S. Karnouskos, and D. Boyle, From Machine-to-Machine to the Internet of Things: Introduction to a New Age of Intelligence, Academic Press, 2014.

[2] C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos, “Sensing as a service model for smart cities supported by Internet of Things,” Transactions on Emerging Telecommunications Technologies, vol. 25, no. 1, pp. 81–93, 2014.

[3] R. F. Mesquita Brandao and J. A. Beleza Carvalho, “The importance of control monitoring systems in wind parks maintenance,” British Journal of Applied Science & Technology, vol. 4, no. 10, pp. 1461–1471, 2014.

[4] C. J. Crabtree, D. Zappala, and P. J. Tavner, “Survey of commercially available condition monitoring systems for wind turbines,” Tech. Rep., Durham University, 2014.

[5] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.

[6] S. Kandula, R. Mahajan, P. Verkaik, S. Agarwal, J. Padhye, and P. Bahl, “Detailed diagnosis in enterprise networks,” in Proceedings of the ACM SIGCOMM Conference on Data Communication (SIGCOMM ’09), vol. 39, pp. 243–254, ACM, August 2009.

[7] J.-G. Lou, Q. Fu, Y. Wang, and J. Li, “Mining dependency in distributed systems through unstructured logs analysis,” ACM SIGOPS Operating Systems Review, vol. 44, no. 1, pp. 91–96, 2010.

[8] C. Luo, J.-G. Lou, Q. Lin et al., “Correlating events with time series for incident diagnosis,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14), pp. 1583–1592, ACM, August 2014.

[9] J. Chen and R. Kumar, “Online failure diagnosis of stochastic discrete event systems,” in Proceedings of the IEEE Conference on Computer Aided Control System Design (CACSD ’13), pp. 194–199, IEEE, August 2013.

[10] J. Chen and R. Kumar, “Failure diagnosis of discrete-time stochastic systems subject to temporal logic correctness requirements,” in Proceedings of the 11th IEEE International Conference on Networking, Sensing and Control (ICNSC ’14), pp. 42–47, IEEE, April 2014.

[11] Business Process Model and Notation (BPMN) Version 2.0, OMG Specification, Object Management Group, 2011.

[12] F. Leymann, “Bpel vs. bpmn 2.0: should you care?” in Business Process Modeling Notation, pp. 8–13, Springer, Berlin, Germany, 2011.

[13] C. C. Aggarwal, Managing and Mining Sensor Data, Springer, 2013.

[14] N. H. Gehani, H. V. Jagadish, and O. Shmueli, “Composite event specification in active databases: model and implementation,” in Proceedings of the 18th VLDB Conference Vancouver (VLDB ’92), vol. 92, pp. 327–338, Citeseer, British Columbia, Canada, 1992.

[15] I. Davidson, S. Gilpin, and P. B. Walker, “Behavioral event data and their analysis,” Data Mining and Knowledge Discovery, vol. 25, no. 3, pp. 635–653, 2012.

[16] J. Han and M. Kamber, Data Mining, Southeast Asia Edition: Concepts and Techniques, Morgan Kaufmann, 2006.

[17] H. R. Motahari-Nezhad, R. Saint-Paul, F. Casati, and B. Benatallah, “Event correlation for process discovery from web service interaction logs,” The VLDB Journal, vol. 20, no. 3, pp. 417–444, 2011.

[18] F. Skopik and R. Fiedler, “Intrusion detection in distributed systems using fingerprinting and massive event correlation,” in GI-Jahrestagung, pp. 2240–2254, 2013.

[19] G. A. Wilkin, P. Eugster, and K. R. Jayaram, “Decentralized fault-tolerant event correlation,” ACM Transactions on Internet Technology, vol. 14, no. 1, article 5, 2014.

[20] H. Wei, “A correlation analysis method for network security events,” in Informatics and Management Science III, vol. 206 of Lecture Notes in Electrical Engineering, pp. 269–277, Springer, London, UK, 2013.

[21] W. Van Der Aalst, A. Adriansyah, A. K. A. de Medeiros et al., “Process mining manifesto,” in Business Process Management Workshops, pp. 169–194, Springer, Berlin, Germany, 2012.

[22] J. C. A. M. Buijs, B. F. van Dongen, and W. M. P. van der Aalst, “Mining configurable process models from collections of event logs,” in Business Process Management, pp. 33–48, Springer, 2013.

[23] A. Rebuge and D. R. Ferreira, “Business process analysis in healthcare environments: a methodology based on process mining,” Information Systems, vol. 37, no. 2, pp. 99–116, 2012.

[24] J. Wang, R. K. Wong, J. Ding, Q. Guo, and L. Wen, “On recommendation of process mining algorithms,” in Proceedings of the IEEE 19th International Conference on Web Services (ICWS ’12), pp. 311–318, IEEE, Honolulu, Hawaii, USA, June 2012.

[25] R. S. Mans, W. M. P. van der Aalst, and H. M. W. Verbeek, “Supporting process mining workflows with rapidprom,” in Proceedings of the Business Process Management Demo Sessions (BPMD ’14), vol. 1295, pp. 56–60, Eindhoven, The Netherlands, September 2014.

[26] C. Li, M. Reichert, and A. Wombacher, “Mining business process variants: challenges, scenarios, algorithms,” Data & Knowledge Engineering, vol. 70, no. 5, pp. 409–434, 2011.

[27] R. Accorsi, T. Stocker, and G. Muller, “On the exploitation of process mining for security audits: the process discovery case,” in Proceedings of the 28th Annual ACM Symposium on Applied Computing, pp. 1462–1468, ACM, March 2013.

[28] B.-J. Lee, S.-G. Park, K.-B. Min et al., “The relationship between working condition factors and well-being,” Annals of Occupational and Environmental Medicine, vol. 26, no. 1, article 34, 2014.

[29] J. Cohen, Statistical Power Analysis for the Behavioral Sciences, Routledge Academic, New York, NY, USA, 2013.

[30] P. Bahl, R. Chandra, A. Greenberg, S. Kandula, D. A. Maltz, and M. Zhang, “Towards highly reliable enterprise network services via inference of multi-level dependencies,” ACM SIGCOMM Computer Communication Review, vol. 37, no. 4, pp. 13–24, 2007.

[31] B. Rosner, Fundamentals of Biostatistics, Cengage Learning, 2010.

[32] A. Zimmermann, “Colored petri nets,” in Stochastic Discrete Event Systems: Modeling, Evaluation, Applications, pp. 99–124, Springer, 2008.

[33] A. Adriansyah, B. F. van Dongen, and W. M. P. van der Aalst, “Towards robust conformance checking,” in Business Process Management Workshops, vol. 66 of Lecture Notes in Business Information Processing, pp. 122–133, Springer, Berlin, Germany, 2011.

[34] M. Weidlich and M. Weske, Business Process Modeling Notation, Springer, Berlin, Germany, 2010.

[35] C. M. Bishop and J. Lasserre, “Generative or discriminative? Getting the best of both worlds,” in Bayesian Statistics, J. M. Bernardo, M. J. Bayarri, J. O. Berger et al., Eds., vol. 8, pp. 3–23, Oxford University, 2007.

[36] C. M. Bishop, Pattern Recognition and Machine Learning, Volume 1, Springer, New York, NY, USA, 2006.

[37] D. M. Blei and J. D. Lafferty, “Dynamic topic models,” in Proceedings of the 23rd International Conference on Machine Learning (ICML ’06), pp. 113–120, ACM, June 2006.

[38] J. Foulds, L. Boyles, C. DuBois, P. Smyth, and M. Welling, “Stochastic collapsed variational Bayesian inference for latent dirichlet allocation,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 446–454, ACM, 2013.

[39] J. Pearl, Bayesian Networks, Department of Statistics, UCLA, 2011.

[40] I. Porteous, D. Newman, A. Ihler, A. Asuncion, P. Smyth, and M. Welling, “Fast collapsed gibbs sampling for latent dirichlet allocation,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’08), pp. 569–577, ACM, August 2008.


Page 10: Research Article Modeling the Process of Event Sequence ...downloads.hindawi.com/journals/mpe/2015/693450.pdf · Research Article Modeling the Process of Event Sequence Data Generated

10 Mathematical Problems in Engineering

Table 2 The highest probability working mode for each working place and each working date from the WCM

Working date Probability Working mode EventsWorking place = Northern China

Jun 00251 101 OPI CM EAI RTM BVI SPM AOM APTS CPD and TCIJul 00341 164 OPI CSI RTM CM ISC APTS SPM AOM CPD and TCIAug 00051 62 LLF CFI APTS CSI AOM ISC RTM SPM CPD and TCISep 00342 12 OPI RTM CM CPI BVI LLF SPM AOM APTS and CPDOct 00351 49 RTM OPI CM BVI MCI SPM AOM APTS CPD and TCINov 00353 129 OPI ISC SPM EAI CSI APTS AOM CPD TCI andMCMC

Working place = Northeastern ChinaJun 00258 176 OPI SPM EAI HOS BVI MCI AOM APTS CPD and TCIJul 00263 29 OPI LLF ISC SPM CFI APTS CPI HOS AOM and CPDAug 00141 71 OPI CSI RTM CM ISC APTS SPM AOM CPD and TCISep 00114 111 RTM BVI OPI CM MCI HOS EAI SPM AOM and APTSOct 00146 69 ISC LLF CSI AOM APTS OPI CFI SPM CPD and TCINov 00257 93 RTM OPI BVI MCI CM CPI SPM AOM APTS andCPD

Working place = Eastern ChinaJun 00279 177 OPI HOS CPI SPM LLF RTM EAI BVI AOM and APTSJul 00201 72 OPI EAI CPI SPM MCI RTM HOS AOM APTS and CPDAug 00277 87 OPI BVI EAI RTM AOM SPM MCI APTS CPD and TCISep 00274 9 OPI EAI BVI RTM HOS SPM AOM APTS CPD and TCIOct 00214 191 RTM MCI CPI CM EAI OPI HOS SPM AOM and APTSNov 00255 170 OPI MCI BVI RTM CPI HOS SPM AOM APTS and CPD

Working place = Mid-Southern ChinaJun 00122 74 OPI EAI CSI CPI ISC MCI SPM AOM APTS and CPDJul 00177 33 OPI CPI CM MCI HOS SPM AOM APTS CPD and TCIAug 00262 187 HOS MCI CPI OPI EAI BVI CSI SPM AOM and APTSSep 00205 104 RTM EAI BVI OPI SPM MCI CFI APTS AOM and CPDOct 00193 39 OPI HOS BVI CM RTM SPM AOM APTS CPD and TCINov 00133 158 OPI BVI RTM MCI CM SPM AOM APTS CPD and TCI

Working place = Western ChinaJun 00037 4 OPI RTM BVI CM EAI SPM CPI MCI AOM and APTSJul 00134 144 HOS MCI CPI OPI CFI EAI SPM AOM APTS and CPDAug 00126 126 OPI SPM CM BVI AOM LLF APTS CSI CPD and TCISep 00122 88 OPI HOS CPI CM LLF AOM CFI MCI BVI and SPMOct 00104 37 OPI EAI MCI HOS CSI ISC CFI LLF SPM and AOMNov 00135 78 OPI HOS RTM BVI CSI EAI MCI APTS AOM and SPM

as OPI MCI and CPI are most likely to occur in mostworking modes which indicates that the working modesof the concrete pump truck are consistent with the actualsituations Meanwhile events related with the cantileversystem and landing leg system such as LLF and CFI have lessoccurrences as compared with events of the pumping systemMoreover in the working date of summer (working date =June July and August) the alert event SPM is more likelyto occur which indicates that the concrete pump truck morelikely fails in the hot climate The operation event AOM ismore likely to occur which indicates that the operators preferto operate the concrete pump truck in the remote manner

Because the probability of working mode reflects theprobability of its occurrence we can analyze the workloads of different working places in different working dates

According to the probability of the working mode in Table 2we can find that the working modes in the working placeof Eastern China are more likely to occur than the workingmodes in the working place of Western China It indicatesthat the concrete pump trucks in the working place of EasternChina have more work loads than that in the working placeof Western China Meanwhile the concrete pump trucks inthe working date of June have more work loads than thatin the working date of November Generally we can analyzedifferent working modes according to the probability

65 Illustrative Applications for the WCM In this section weprovide some illustrative examples of how the WCM can beused to answer different types of questions and predictionproblems concerning working modes of the equipment

Mathematical Problems in Engineering 11

651 Automated Detection for a New Work Cycle In realcases we would like to quickly assess working mode assign-ments for new work cycles not contained in the training dataset especially for the real-time event sequence flow

Our automated detection strategy is to apply the Gibbssampling algorithm that runs only on the event tokens inthe new work cycle instead of rerunning the algorithm forevery new work cycle again Afterwards the event tokens inthe new work cycles are quickly assigned to the most likelyworking places working dates andworkingmodesThemainprocedure is as follows first we start by assigning eventsrandomly to working places working dates and workingmodes second we then sample new assignments of eventsby applying the Gibbs sampler only to the event tokens in thenew work cycle each time temporarily updating the countmatrices 119862

119864Π 119862Π119875 and 119862Π119879 shown in (7)

Table 3 shows the occurrences of events for a new workcycle After the sampling the WCM has assigned each eventto its most likely working mode Table 3 illustrates the top3 most likely working modes assigned to each event for thenew work cycle Note that each event is assigned to differentworkingmodes according to its occurrence count Accordingto (7) although events of this new work cycle are assigned todifferent workingmodes they are assigned to the number 107working mode with the probability 00003 The top 10 mostlikely events in the number 107 working mode are shown asfollows

RTM CM OPI BVI SPM CPI MCI SCI ISC andSoE

The automated detection result for the new work cycle isindeed consistent with the actual situations in comparisonwith the real occurrences of events

652 Automated Detection of Anomalous Work Cycles Weillustrate in this section how our model could be useful fordetecting anomalous work cycles A work cycle assigned toa working mode with low probability is considered as ananomalous work cycle

We also take the work cycle as an example for theautomated detection of an anomalous work cycle shownin Table 3 The work cycle is assigned to the number 107workingmodewith the probability 00003 As comparedwithmost of other working modes number 107 working modehas lower probability so this work cycle is detected as ananomalous work cycle The alert events SPM and SoE havefrequent occurrences both in the work cycle and in number107 working mode which indicates that this work cycle isan anomalous work cycle Meanwhile we analyzed the realfailure records and confirmed that the engine indeed failedfrequently during thiswork cycle Generally these anomalouswork cycles can be automatically detected efficiently with thehelp of the WCM

7 Conclusions and Future Work

The working condition model proposed in this paper pro-vides a relatively simple probabilistic model for exploring

Table 3 Actual example of automated detection for a new workcycle Each event is assigned to its most likely working modeaccording to its corresponding occurrence count In the table welist the top 3 most likely working modes for each event for the newwork cycle

Top 3 most likely working modesWorking date = Jun working place = Eastern China

Event Count First Second ThirdSPM 72 107 181 112AOM 33 169 67 183APTS 23 90 15 76CPD 42 145 139 59TCI 2 118 134 112MCMC 0 Null Null NullMCSC 0 Null Null NullDSP 0 Null Null NullMCES 0 Null Null NullHPMI 0 Null Null NullWUI 2 159 104 77WPI 23 54 175 71CSI 55 147 29 61CFI 25 2 132 100TCI 23 95 185 53CM 127 12 49 192LLM 55 189 114 23RCI 0 Null Null NullDOP 0 Null Null NullLLF 40 111 10 42RTM 297 191 104 52CPW 0 Null Null NullRCSW 0 Null Null NullOPI 95 177 176 101EAI 56 126 100 170BVI 77 177 53 146CPI 60 164 104 149MCI 60 177 175 162SCI 66 120 149 73CSAI 0 Null Null NullISC 51 68 149 23HOS 0 Null Null NullSoE 33 119 112 107

the relationships between working place working placeworking mode and events in a work cycle This modelprovides significantly improved predictive power in termsof the analysis of working condition according to the eventsequence data

Our future works mainly include the optimization of themodel the model training and the conduction experimentson different data sets Furthermore the further analysis of

12 Mathematical Problems in Engineering

the anomalous work cycles detected by our model is also aninteresting question

Notations Associated with the WCMAs Used in This Paper

P Working places of all the work cycles (set)T Working dates of all the work cycles (set)p119904 Working places of the 119904th work cycle

(119875119904-dimensional vector)

119875119904 Number of working places of the 119904th work

cycle (Scalar)120591119904 Working dates of the 119904th work cycle

(119879119904-dimensional vector)

119879119904 Number of working dates of the 119904th work

cycle (Scalar)119875 Number of working places (Scalar)119878 Number of work cycles (Scalar)119879 Number of working dates (Scalar)119873119904 Number of events in the 119904th work cycle

(Scalar)119873 Number of events in all the event

sequences (Scalar)Π Number of working modes (Scalar)119864 Number of events in the event set (Scalar)e119904 Event sequence vector for the 119904th work

cycle (119873119904-dimensional vector)

119890119904119894 119894th event in the 119904th work cycle (119894th

component of vector e119904)

x Working place assignments(119873-dimensional vector)

119909119904119894 Working place assignment for event 119890

119904119894

(119894th component of vector x119904)

y Working date assignments(119873-dimensional vector)

119910119904119894 Working date assignment for event 119890

119904119894(119894th

component of vector y119904)

z Working mode assignments(119873-dimensional vector)

119911119904119894 Working mode assignment for event 119890

119904119894

(119894th component of vector z119904)

120572 120573 120574 Dirichlet prior (Scalar)Φ Probabilities of events given working

modes (119864 times Π matrix)120601120587 Probabilities of events given working

mode 120587 (119864-dimensional vector)Θ Probabilities of working modes given

working places (Π times 119875 matrix)120579119901 Probabilities of working modes given

working place 119901 (Π-dimensional vector)Δ Probabilities of working modes given

working dates (Π times 119879 matrix)120575120591 Probabilities of working modes given

working dates 120591 (Π-dimensional vector)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] J Holler V Tsiatsis CMulligan S Avesand S Karnouskos andD Boyle From Machine-to-Machine to the Internet of ThingsIntroduction to a New Age of Intelligence Academic Press 2014

[2] C Perera A Zaslavsky P Christen and D GeorgakopoulosldquoSensing as a service model for smart cities supported by Inter-net of Thingsrdquo Transactions on Emerging TelecommunicationsTechnologies vol 25 no 1 pp 81ndash93 2014

[3] R F Mesquita Brandao and J A Beleza Carvalho ldquoTheimportance of control monitoring systems in wind parksmaintenancerdquo British Journal of Applied Science amp Technologyvol 4 no 10 pp 1461ndash1471 2014

[4] C J Crabtree D Zappala and P J Tavner ldquoSurvey of com-mercially available condition monitoring systems for windturbinesrdquo Tech Rep Durham University 2014

[5] D M Blei A Y Ng and M I Jordan ldquoLatent dirichletallocationrdquoThe Journal ofMachine Learning Research vol 3 no4-5 pp 993ndash1022 2003

[6] S Kandula R Mahajan P Verkaik S Agarwal J Padhyeand P Bahl ldquoDetailed diagnosis in enterprise networksrdquo inProceedings of the ACM SIGCOMM Conference on Data Com-munication (SIGCOMMrsquo09) vol 39 pp 243ndash254ACMAugust2009

[7] J-G Lou Q Fu Y Wang and J Li ldquoMining dependency indistributed systems through unstructured logs analysisrdquo ACMSIGOPSOperating Systems Review vol 44 no 1 pp 91ndash96 2010

[8] C Luo J-G Lou Q Lin et al ldquoCorrelating events with timeseries for incident diagnosisrdquo in Proceedings of the 20th ACMSIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo14) pp 1583ndash1592 ACM August 2014

[9] J Chen and R Kumar ldquoOnline failure diagnosis of stochasticdiscrete event systemsrdquo in Proceedings of the IEEE ConferenceonComputerAidedControl SystemDesign (CACSD rsquo13) pp 194ndash199 IEEE August 2013

[10] J Chen and R Kumar ldquoFailure diagnosis of discrete-timestochastic systems subject to temporal logic correctness require-mentsrdquo in Proceedings of the 11th IEEE International Conferenceon Networking Sensing and Control (ICNSC rsquo14) pp 42ndash47IEEE April 2014

[11] Business ProcessModel and Notation (BPMN) Version 20 OMGSpecification Object Management Group 2011

[12] F Leymann ldquoBpel vs bpmn 20 should you carerdquo in BusinessProcess Modeling Notation pp 8ndash13 Springer Berlin Germany2011

[13] C C Aggarwal Managing and Mining Sensor Data Springer2013

[14] N H Gehani H V Jagadish andO Shmueli ldquoComposite eventspecification in active databasesmodel and implementationrdquo inProceedings of the 18th VLDBConference Vancouver (VLDB rsquo92)vol 92 pp 327ndash338 Citeseer British Columbia Canada 1992

[15] I Davidson S Gilpin and P B Walker ldquoBehavioral event dataand their analysisrdquo Data Mining and Knowledge Discovery vol25 no 3 pp 635ndash653 2012

[16] J Han and M Kamber Data Mining Southeast Asia EditionConcepts and Techniques Morgan Kaufmann 2006

[17] H RMotahari-Nezhad R Saint-Paul F Casati and B Benatal-lah ldquoEvent correlation for process discovery from web serviceinteraction logsrdquoThe VLDB Journal vol 20 no 3 pp 417ndash4442011

Mathematical Problems in Engineering 13

[18] F Skopik and R Fiedler ldquoIntrusion detection in distributedsystems using fingerprinting and massive event correlationrdquo inGI-Jahrestagung pp 2240ndash2254 2013

[19] G A Wilkin P Eugster and K R Jayaram ldquoDecentralizedfault-tolerant event correlationrdquo ACM Transactions on InternetTechnology vol 14 no 1 article 5 2014

[20] H Wei ldquoA correlation analysis method for network securityeventsrdquo in Informatics and Management Science III vol 206 ofLecture Notes in Electrical Engineering pp 269ndash277 SpringerLondon UK 2013

[21] W Van Der Aalst A Adriansyah A K A de Medeiros etal ldquoProcess mining manifestordquo in Usiness Process ManagementWorkshops pp 169ndash194 Springer Berlin Germany 2012

[22] J C A M Buijs B F van Dongen and W M P van der AalstldquoMining configurable process models from collections of eventlogsrdquo inBusiness ProcessManagement pp 33ndash48 Springer 2013

[23] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[24] J Wang R K Wong J Ding Q Guo and L Wen ldquoOnrecommendation of process mining algorithmsrdquo in Proceedingsof the IEEE 19th International Conference onWeb Services (ICWSrsquo12) pp 311ndash318 IEEE Honolulu Hawaii USA June 2012

[25] R S Mans W M P van der Aalst and H M W VerbeekldquoSupporting process mining workflows with rapidpromrdquo inProceedings of the Business Process Management Demo Sessions(BPMD rsquo14) vol 1295 pp 56ndash60 Eindhoven The NetherlandsSeptember 2014

[26] C Li M Reichert and A Wombacher ldquoMining businessprocess variants challenges scenarios algorithmsrdquo Data ampKnowledge Engineering vol 70 no 5 pp 409ndash434 2011

[27] R Accorsi T Stocker and G Muller ldquoOn the exploitation ofprocess mining for security audits the process discovery caserdquoin Proceedings of the 28th Annual ACM Symposium on AppliedComputing pp 1462ndash1468 ACM March 2013

[28] B-J Lee S-G Park K-B Min et al ldquoThe relationship betweenworking condition factors and well-beingrdquo Annals of Occupa-tional and Environmental Medicine vol 26 no 1 article 342014

[29] J Cohen Statistical Power Analysis for the Behavioral SciencesRoutledge Academic New York NY USA 2013

[30] P Bahl R Chandra A Greenberg S Kandula D A Maltz andM Zhang ldquoTowards highly reliable enterprise network servicesvia inference of multi-level dependenciesrdquo ACM SIGCOMMComputer Communication Review vol 37 no 4 pp 13ndash24 2007

[31] B Rosner Fundamentals of Biostatistics Cengage Learning2010

[32] A Zimmermann ldquoColored petri netsrdquo in Stochastic DiscreteEvent Systems Modeling Evaluation Applications pp 99ndash124Springer 2008

[33] A Adriansyah B F van Dongen and W M P van der AalstldquoTowards robust conformance checkingrdquo in Business ProcessManagement Workshops vol 66 of Lecture Notes in BusinessInformation Processing pp 122ndash133 Springer Berlin Germany2011

[34] MWeidlich andMWeske Business Process Modeling NotationSpringer Berlin Germany 2010

[35] C M Bishop and J Lasserre ldquoGenerative or discriminativeGetting the best of both worldsrdquo in Bayesian Statistics J MBernardo M J Bayarri J O Berger et al Eds vol 8 pp 3ndash23 Oxford University 2007

[36] C M Bishop Pattern Recognition and Machine LearningVolume 1 Springer New York NY USA 2006

[37] D M Blei and J D Lafferty ldquoDynamic topic modelsrdquo inProceedings of the 23rd International Conference on MachineLearning (ICML rsquo06) pp 113ndash120 ACM June 2006

[38] J Foulds L Boyles C DuBois P Smyth and M WellingldquoStochastic collapsed variational Bayesian inference for latentdirichlet allocationrdquo in Proceedings of the 19th ACM SIGKDDInternational Conference on Knowledge Discovery and DataMining pp 446ndash454 ACM 2013

[39] J Pearl Bayesian Networks Department of Statistics UCLA2011

[40] I Porteous D Newman A Ihler A Asuncion P Smythand M Welling ldquoFast collapsed gibbs sampling for latentdirichlet allocationrdquo in Proceedings of the 14th ACM SIGKDDInternational Conference on Knowledge Discovery and DataMining (KDD rsquo08) pp 569ndash577 ACM August 2008


6.5.1. Automated Detection for a New Work Cycle. In real cases, we would like to quickly assess working mode assignments for new work cycles not contained in the training data set, especially for real-time event sequence flows.

Our automated detection strategy is to run the Gibbs sampling algorithm only on the event tokens in the new work cycle, rather than rerunning the full algorithm for every new work cycle. Afterwards, the event tokens in the new work cycle are quickly assigned to the most likely working places, working dates, and working modes. The main procedure is as follows: first, we assign events randomly to working places, working dates, and working modes; second, we sample new assignments of events by applying the Gibbs sampler only to the event tokens in the new work cycle, each time temporarily updating the count matrices $C^{E\Pi}$, $C^{\Pi P}$, and $C^{\Pi T}$ shown in (7).
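The two-step procedure above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: equation (7) is not reproduced in this section, so the update below assumes a standard collapsed-Gibbs form in the style of LDA and author-topic models; the working place and working date of the new cycle are treated as single known indices, so only working modes are resampled; and the function name `fold_in_new_cycle`, the array names, and the hyperparameter defaults are all hypothetical.

```python
import numpy as np

def fold_in_new_cycle(events, place, date, C_em, C_mp, C_mt,
                      alpha=0.1, beta=0.01, gamma=0.1,
                      n_sweeps=50, seed=0):
    """Sample working modes for the event tokens of one new work cycle.

    events      : event ids (ints in [0, E)) observed in the new cycle
    place, date : indices of the cycle's working place and date (assumed known)
    C_em        : (E, M) trained event-by-mode counts
    C_mp, C_mt  : (M, P) and (M, T) trained mode-by-place / mode-by-date counts
    The trained matrices are never modified; only temporary copies change.
    """
    rng = np.random.default_rng(seed)
    events = np.asarray(events, dtype=int)
    n_event_types, n_modes = C_em.shape

    # Temporary copies of the counts that the new cycle touches.
    c_em = C_em.astype(float)              # astype returns a copy
    c_m = c_em.sum(axis=0)                 # total tokens per mode
    c_mp = C_mp[:, place].astype(float)    # counts for this place only
    c_mt = C_mt[:, date].astype(float)     # counts for this date only

    def bump(e, k, delta):
        """Add or remove one token of event e assigned to mode k."""
        c_em[e, k] += delta
        c_m[k] += delta
        c_mp[k] += delta
        c_mt[k] += delta

    # Step 1: assign every new token to a random working mode.
    z = rng.integers(n_modes, size=len(events))
    for e, k in zip(events, z):
        bump(e, k, +1)

    # Step 2: Gibbs sweeps over the new tokens only.
    for _ in range(n_sweeps):
        for i, e in enumerate(events):
            bump(e, z[i], -1)              # take the token out of the counts
            # Assumed conditional (NOT the source's equation (7)):
            # P(event | mode) * affinity to place * affinity to date.
            p = ((c_em[e] + beta) / (c_m + n_event_types * beta)
                 * (c_mp + alpha) * (c_mt + gamma))
            z[i] = rng.choice(n_modes, p=p / p.sum())
            bump(e, z[i], +1)              # put it back under the new mode
    return z
```

Working on copies is one way to realize the "temporary" update of the count matrices: the trained model is left untouched, so the same function can be called for each incoming work cycle.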

Table 3 shows the occurrences of events for a new work cycle. After the sampling, the WCM has assigned each event to its most likely working mode; Table 3 lists the top 3 most likely working modes assigned to each event for the new work cycle. Note that each event is assigned to different working modes according to its occurrence count. According to (7), although the events of this new work cycle are assigned to different working modes, the work cycle as a whole is assigned to working mode number 107 with probability 0.0003. The top 10 most likely events in working mode number 107 are as follows:

RTM, CM, OPI, BVI, SPM, CPI, MCI, SCI, ISC, and SoE.

The automated detection result for the new work cycle is indeed consistent with the actual situation when compared with the real occurrences of events.
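Ranking the most likely events of a working mode, as done above for working mode number 107, amounts to sorting one column of the matrix Φ (probabilities of events given working modes). The sketch below assumes Φ is available as a NumPy array with one row per event and one column per working mode; the function name `top_events` and its arguments are hypothetical.

```python
import numpy as np

def top_events(phi, mode, event_names, k=10):
    """Return the k event names with the highest probability under one mode.

    phi         : (E, M) array; column m holds P(event | working mode m)
    mode        : column index of the working mode of interest
    event_names : list of E event labels, aligned with the rows of phi
    """
    phi = np.asarray(phi, dtype=float)
    order = np.argsort(phi[:, mode])[::-1][:k]   # indices, most likely first
    return [event_names[i] for i in order]
```

For example, with three events `["A", "B", "C"]` and a first column of `[0.1, 0.6, 0.3]`, `top_events(phi, 0, names, k=2)` returns `["B", "C"]`.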

6.5.2. Automated Detection of Anomalous Work Cycles. We illustrate in this section how our model can be useful for detecting anomalous work cycles. A work cycle assigned to a working mode with low probability is considered an anomalous work cycle.

We again take the work cycle shown in Table 3 as an example for the automated detection of an anomalous work cycle. The work cycle is assigned to working mode number 107 with probability 0.0003. Compared with most other working modes, working mode number 107 has a lower probability, so this work cycle is detected as an anomalous work cycle. The alert events SPM and SoE occur frequently both in the work cycle and in working mode number 107, which indicates that this work cycle is an anomalous work cycle. Meanwhile, we analyzed the real failure records and confirmed that the engine indeed failed frequently during this work cycle. Generally, such anomalous work cycles can be detected automatically and efficiently with the help of the WCM.
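The decision rule described above (flag a work cycle when its assigned working mode has a lower probability than most other working modes) can be sketched as a quantile test. The source states no numeric threshold, so the `quantile` cut-off below is a hypothetical parameter, as are the function and argument names.

```python
import numpy as np

def is_anomalous(mode_probs, assigned_mode, quantile=0.1):
    """Flag a work cycle whose assigned working mode is among the least probable.

    mode_probs    : (M,) probabilities of the working modes
    assigned_mode : index of the mode the work cycle was assigned to
    quantile      : hypothetical cut-off; modes strictly below this quantile
                    of mode_probs are treated as low-probability modes
    """
    mode_probs = np.asarray(mode_probs, dtype=float)
    cutoff = np.quantile(mode_probs, quantile)
    return bool(mode_probs[assigned_mode] < cutoff)
```

The strict inequality means that when all working modes are equally probable no cycle is flagged, which seems the safer default for an alerting rule.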

Table 3: Actual example of automated detection for a new work cycle. Each event is assigned to its most likely working mode according to its corresponding occurrence count. The table lists the top 3 most likely working modes (First, Second, Third) for each event in the new work cycle. Working date = Jun; working place = Eastern China.

| Event | Count | First | Second | Third |
| --- | --- | --- | --- | --- |
| SPM | 72 | 107 | 181 | 112 |
| AOM | 33 | 169 | 67 | 183 |
| APTS | 23 | 90 | 15 | 76 |
| CPD | 42 | 145 | 139 | 59 |
| TCI | 2 | 118 | 134 | 112 |
| MCMC | 0 | Null | Null | Null |
| MCSC | 0 | Null | Null | Null |
| DSP | 0 | Null | Null | Null |
| MCES | 0 | Null | Null | Null |
| HPMI | 0 | Null | Null | Null |
| WUI | 2 | 159 | 104 | 77 |
| WPI | 23 | 54 | 175 | 71 |
| CSI | 55 | 147 | 29 | 61 |
| CFI | 25 | 2 | 132 | 100 |
| TCI | 23 | 95 | 185 | 53 |
| CM | 127 | 12 | 49 | 192 |
| LLM | 55 | 189 | 114 | 23 |
| RCI | 0 | Null | Null | Null |
| DOP | 0 | Null | Null | Null |
| LLF | 40 | 111 | 10 | 42 |
| RTM | 297 | 191 | 104 | 52 |
| CPW | 0 | Null | Null | Null |
| RCSW | 0 | Null | Null | Null |
| OPI | 95 | 177 | 176 | 101 |
| EAI | 56 | 126 | 100 | 170 |
| BVI | 77 | 177 | 53 | 146 |
| CPI | 60 | 164 | 104 | 149 |
| MCI | 60 | 177 | 175 | 162 |
| SCI | 66 | 120 | 149 | 73 |
| CSAI | 0 | Null | Null | Null |
| ISC | 51 | 68 | 149 | 23 |
| HOS | 0 | Null | Null | Null |
| SoE | 33 | 119 | 112 | 107 |

7. Conclusions and Future Work

The working condition model proposed in this paper provides a relatively simple probabilistic model for exploring the relationships between working place, working date, working mode, and events in a work cycle. This model provides significantly improved predictive power in terms of the analysis of working condition according to the event sequence data.

Our future work mainly includes optimizing the model and its training and conducting experiments on different data sets. Furthermore, further analysis of the anomalous work cycles detected by our model is also an interesting question.

Notations Associated with the WCM As Used in This Paper

$\mathbf{P}$: Working places of all the work cycles (set)
$\mathbf{T}$: Working dates of all the work cycles (set)
$\mathbf{p}_s$: Working places of the $s$th work cycle ($P_s$-dimensional vector)
$P_s$: Number of working places of the $s$th work cycle (scalar)
$\tau_s$: Working dates of the $s$th work cycle ($T_s$-dimensional vector)
$T_s$: Number of working dates of the $s$th work cycle (scalar)
$P$: Number of working places (scalar)
$S$: Number of work cycles (scalar)
$T$: Number of working dates (scalar)
$N_s$: Number of events in the $s$th work cycle (scalar)
$N$: Number of events in all the event sequences (scalar)
$\Pi$: Number of working modes (scalar)
$E$: Number of events in the event set (scalar)
$\mathbf{e}_s$: Event sequence vector for the $s$th work cycle ($N_s$-dimensional vector)
$e_{si}$: $i$th event in the $s$th work cycle ($i$th component of vector $\mathbf{e}_s$)
$\mathbf{x}$: Working place assignments ($N$-dimensional vector)
$x_{si}$: Working place assignment for event $e_{si}$ ($i$th component of vector $\mathbf{x}_s$)
$\mathbf{y}$: Working date assignments ($N$-dimensional vector)
$y_{si}$: Working date assignment for event $e_{si}$ ($i$th component of vector $\mathbf{y}_s$)
$\mathbf{z}$: Working mode assignments ($N$-dimensional vector)
$z_{si}$: Working mode assignment for event $e_{si}$ ($i$th component of vector $\mathbf{z}_s$)
$\alpha, \beta, \gamma$: Dirichlet prior (scalar)
$\Phi$: Probabilities of events given working modes ($E \times \Pi$ matrix)
$\phi_\pi$: Probabilities of events given working mode $\pi$ ($E$-dimensional vector)
$\Theta$: Probabilities of working modes given working places ($\Pi \times P$ matrix)
$\theta_p$: Probabilities of working modes given working place $p$ ($\Pi$-dimensional vector)
$\Delta$: Probabilities of working modes given working dates ($\Pi \times T$ matrix)
$\delta_\tau$: Probabilities of working modes given working dates $\tau$ ($\Pi$-dimensional vector)
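As a reading aid for the matrix dimensions listed above: because each column of Φ, Θ, and Δ is a conditional probability distribution, the columns sum to one. These constraints are implied by the definitions rather than quoted from the source's equations, and the entry subscripts are a notational assumption.

```latex
\sum_{e=1}^{E} \Phi_{e\pi} = 1 \quad (\pi = 1,\dots,\Pi), \qquad
\sum_{\pi=1}^{\Pi} \Theta_{\pi p} = 1 \quad (p = 1,\dots,P), \qquad
\sum_{\pi=1}^{\Pi} \Delta_{\pi\tau} = 1 \quad (\tau = 1,\dots,T).
```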

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J Holler V Tsiatsis CMulligan S Avesand S Karnouskos andD Boyle From Machine-to-Machine to the Internet of ThingsIntroduction to a New Age of Intelligence Academic Press 2014

[2] C Perera A Zaslavsky P Christen and D GeorgakopoulosldquoSensing as a service model for smart cities supported by Inter-net of Thingsrdquo Transactions on Emerging TelecommunicationsTechnologies vol 25 no 1 pp 81ndash93 2014

[3] R F Mesquita Brandao and J A Beleza Carvalho ldquoTheimportance of control monitoring systems in wind parksmaintenancerdquo British Journal of Applied Science amp Technologyvol 4 no 10 pp 1461ndash1471 2014

[4] C J Crabtree D Zappala and P J Tavner ldquoSurvey of com-mercially available condition monitoring systems for windturbinesrdquo Tech Rep Durham University 2014

[5] D M Blei A Y Ng and M I Jordan ldquoLatent dirichletallocationrdquoThe Journal ofMachine Learning Research vol 3 no4-5 pp 993ndash1022 2003

[6] S Kandula R Mahajan P Verkaik S Agarwal J Padhyeand P Bahl ldquoDetailed diagnosis in enterprise networksrdquo inProceedings of the ACM SIGCOMM Conference on Data Com-munication (SIGCOMMrsquo09) vol 39 pp 243ndash254ACMAugust2009

[7] J-G Lou Q Fu Y Wang and J Li ldquoMining dependency indistributed systems through unstructured logs analysisrdquo ACMSIGOPSOperating Systems Review vol 44 no 1 pp 91ndash96 2010

[8] C Luo J-G Lou Q Lin et al ldquoCorrelating events with timeseries for incident diagnosisrdquo in Proceedings of the 20th ACMSIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo14) pp 1583ndash1592 ACM August 2014

[9] J Chen and R Kumar ldquoOnline failure diagnosis of stochasticdiscrete event systemsrdquo in Proceedings of the IEEE ConferenceonComputerAidedControl SystemDesign (CACSD rsquo13) pp 194ndash199 IEEE August 2013

[10] J Chen and R Kumar ldquoFailure diagnosis of discrete-timestochastic systems subject to temporal logic correctness require-mentsrdquo in Proceedings of the 11th IEEE International Conferenceon Networking Sensing and Control (ICNSC rsquo14) pp 42ndash47IEEE April 2014

[11] Business ProcessModel and Notation (BPMN) Version 20 OMGSpecification Object Management Group 2011

[12] F Leymann ldquoBpel vs bpmn 20 should you carerdquo in BusinessProcess Modeling Notation pp 8ndash13 Springer Berlin Germany2011

[13] C C Aggarwal Managing and Mining Sensor Data Springer2013

[14] N H Gehani H V Jagadish andO Shmueli ldquoComposite eventspecification in active databasesmodel and implementationrdquo inProceedings of the 18th VLDBConference Vancouver (VLDB rsquo92)vol 92 pp 327ndash338 Citeseer British Columbia Canada 1992

[15] I Davidson S Gilpin and P B Walker ldquoBehavioral event dataand their analysisrdquo Data Mining and Knowledge Discovery vol25 no 3 pp 635ndash653 2012

[16] J Han and M Kamber Data Mining Southeast Asia EditionConcepts and Techniques Morgan Kaufmann 2006

[17] H RMotahari-Nezhad R Saint-Paul F Casati and B Benatal-lah ldquoEvent correlation for process discovery from web serviceinteraction logsrdquoThe VLDB Journal vol 20 no 3 pp 417ndash4442011

Mathematical Problems in Engineering 13

[18] F Skopik and R Fiedler ldquoIntrusion detection in distributedsystems using fingerprinting and massive event correlationrdquo inGI-Jahrestagung pp 2240ndash2254 2013

[19] G A Wilkin P Eugster and K R Jayaram ldquoDecentralizedfault-tolerant event correlationrdquo ACM Transactions on InternetTechnology vol 14 no 1 article 5 2014

[20] H Wei ldquoA correlation analysis method for network securityeventsrdquo in Informatics and Management Science III vol 206 ofLecture Notes in Electrical Engineering pp 269ndash277 SpringerLondon UK 2013

[21] W Van Der Aalst A Adriansyah A K A de Medeiros etal ldquoProcess mining manifestordquo in Usiness Process ManagementWorkshops pp 169ndash194 Springer Berlin Germany 2012

[22] J C A M Buijs B F van Dongen and W M P van der AalstldquoMining configurable process models from collections of eventlogsrdquo inBusiness ProcessManagement pp 33ndash48 Springer 2013

[23] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[24] J Wang R K Wong J Ding Q Guo and L Wen ldquoOnrecommendation of process mining algorithmsrdquo in Proceedingsof the IEEE 19th International Conference onWeb Services (ICWSrsquo12) pp 311ndash318 IEEE Honolulu Hawaii USA June 2012

[25] R S Mans W M P van der Aalst and H M W VerbeekldquoSupporting process mining workflows with rapidpromrdquo inProceedings of the Business Process Management Demo Sessions(BPMD rsquo14) vol 1295 pp 56ndash60 Eindhoven The NetherlandsSeptember 2014

[26] C Li M Reichert and A Wombacher ldquoMining businessprocess variants challenges scenarios algorithmsrdquo Data ampKnowledge Engineering vol 70 no 5 pp 409ndash434 2011

[27] R Accorsi T Stocker and G Muller ldquoOn the exploitation ofprocess mining for security audits the process discovery caserdquoin Proceedings of the 28th Annual ACM Symposium on AppliedComputing pp 1462ndash1468 ACM March 2013

[28] B-J Lee S-G Park K-B Min et al ldquoThe relationship betweenworking condition factors and well-beingrdquo Annals of Occupa-tional and Environmental Medicine vol 26 no 1 article 342014

[29] J Cohen Statistical Power Analysis for the Behavioral SciencesRoutledge Academic New York NY USA 2013

[30] P Bahl R Chandra A Greenberg S Kandula D A Maltz andM Zhang ldquoTowards highly reliable enterprise network servicesvia inference of multi-level dependenciesrdquo ACM SIGCOMMComputer Communication Review vol 37 no 4 pp 13ndash24 2007

[31] B Rosner Fundamentals of Biostatistics Cengage Learning2010

[32] A Zimmermann ldquoColored petri netsrdquo in Stochastic DiscreteEvent Systems Modeling Evaluation Applications pp 99ndash124Springer 2008

[33] A Adriansyah B F van Dongen and W M P van der AalstldquoTowards robust conformance checkingrdquo in Business ProcessManagement Workshops vol 66 of Lecture Notes in BusinessInformation Processing pp 122ndash133 Springer Berlin Germany2011

[34] MWeidlich andMWeske Business Process Modeling NotationSpringer Berlin Germany 2010

[35] C M Bishop and J Lasserre ldquoGenerative or discriminativeGetting the best of both worldsrdquo in Bayesian Statistics J MBernardo M J Bayarri J O Berger et al Eds vol 8 pp 3ndash23 Oxford University 2007

[36] C M Bishop Pattern Recognition and Machine LearningVolume 1 Springer New York NY USA 2006

[37] D M Blei and J D Lafferty ldquoDynamic topic modelsrdquo inProceedings of the 23rd International Conference on MachineLearning (ICML rsquo06) pp 113ndash120 ACM June 2006

[38] J Foulds L Boyles C DuBois P Smyth and M WellingldquoStochastic collapsed variational Bayesian inference for latentdirichlet allocationrdquo in Proceedings of the 19th ACM SIGKDDInternational Conference on Knowledge Discovery and DataMining pp 446ndash454 ACM 2013

[39] J Pearl Bayesian Networks Department of Statistics UCLA2011

[40] I Porteous D Newman A Ihler A Asuncion P Smythand M Welling ldquoFast collapsed gibbs sampling for latentdirichlet allocationrdquo in Proceedings of the 14th ACM SIGKDDInternational Conference on Knowledge Discovery and DataMining (KDD rsquo08) pp 569ndash577 ACM August 2008

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 12: Research Article Modeling the Process of Event Sequence ...downloads.hindawi.com/journals/mpe/2015/693450.pdf · Research Article Modeling the Process of Event Sequence Data Generated

12 Mathematical Problems in Engineering

the anomalous work cycles detected by our model is also aninteresting question

Notations Associated with the WCMAs Used in This Paper

P Working places of all the work cycles (set)T Working dates of all the work cycles (set)p119904 Working places of the 119904th work cycle

(119875119904-dimensional vector)

119875119904 Number of working places of the 119904th work

cycle (Scalar)120591119904 Working dates of the 119904th work cycle

(119879119904-dimensional vector)

119879119904 Number of working dates of the 119904th work

cycle (Scalar)119875 Number of working places (Scalar)119878 Number of work cycles (Scalar)119879 Number of working dates (Scalar)119873119904 Number of events in the 119904th work cycle

(Scalar)119873 Number of events in all the event

sequences (Scalar)Π Number of working modes (Scalar)119864 Number of events in the event set (Scalar)e119904 Event sequence vector for the 119904th work

cycle (119873119904-dimensional vector)

119890119904119894 119894th event in the 119904th work cycle (119894th

component of vector e119904)

x Working place assignments(119873-dimensional vector)

119909119904119894 Working place assignment for event 119890

119904119894

(119894th component of vector x119904)

y Working date assignments(119873-dimensional vector)

119910119904119894 Working date assignment for event 119890

119904119894(119894th

component of vector y119904)

z Working mode assignments(119873-dimensional vector)

119911119904119894 Working mode assignment for event 119890

119904119894

(119894th component of vector z119904)

120572 120573 120574 Dirichlet prior (Scalar)Φ Probabilities of events given working

modes (119864 times Π matrix)120601120587 Probabilities of events given working

mode 120587 (119864-dimensional vector)Θ Probabilities of working modes given

working places (Π times 119875 matrix)120579119901 Probabilities of working modes given

working place 119901 (Π-dimensional vector)Δ Probabilities of working modes given

working dates (Π times 119879 matrix)120575120591 Probabilities of working modes given

working dates 120591 (Π-dimensional vector)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] J Holler V Tsiatsis CMulligan S Avesand S Karnouskos andD Boyle From Machine-to-Machine to the Internet of ThingsIntroduction to a New Age of Intelligence Academic Press 2014

[2] C Perera A Zaslavsky P Christen and D GeorgakopoulosldquoSensing as a service model for smart cities supported by Inter-net of Thingsrdquo Transactions on Emerging TelecommunicationsTechnologies vol 25 no 1 pp 81ndash93 2014

[3] R F Mesquita Brandao and J A Beleza Carvalho ldquoTheimportance of control monitoring systems in wind parksmaintenancerdquo British Journal of Applied Science amp Technologyvol 4 no 10 pp 1461ndash1471 2014

[4] C J Crabtree D Zappala and P J Tavner ldquoSurvey of com-mercially available condition monitoring systems for windturbinesrdquo Tech Rep Durham University 2014

[5] D M Blei A Y Ng and M I Jordan ldquoLatent dirichletallocationrdquoThe Journal ofMachine Learning Research vol 3 no4-5 pp 993ndash1022 2003

[6] S Kandula R Mahajan P Verkaik S Agarwal J Padhyeand P Bahl ldquoDetailed diagnosis in enterprise networksrdquo inProceedings of the ACM SIGCOMM Conference on Data Com-munication (SIGCOMMrsquo09) vol 39 pp 243ndash254ACMAugust2009

[7] J-G Lou Q Fu Y Wang and J Li ldquoMining dependency indistributed systems through unstructured logs analysisrdquo ACMSIGOPSOperating Systems Review vol 44 no 1 pp 91ndash96 2010

[8] C Luo J-G Lou Q Lin et al ldquoCorrelating events with timeseries for incident diagnosisrdquo in Proceedings of the 20th ACMSIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo14) pp 1583ndash1592 ACM August 2014

[9] J Chen and R Kumar ldquoOnline failure diagnosis of stochasticdiscrete event systemsrdquo in Proceedings of the IEEE ConferenceonComputerAidedControl SystemDesign (CACSD rsquo13) pp 194ndash199 IEEE August 2013

[10] J Chen and R Kumar ldquoFailure diagnosis of discrete-timestochastic systems subject to temporal logic correctness require-mentsrdquo in Proceedings of the 11th IEEE International Conferenceon Networking Sensing and Control (ICNSC rsquo14) pp 42ndash47IEEE April 2014

[11] Business ProcessModel and Notation (BPMN) Version 20 OMGSpecification Object Management Group 2011

[12] F Leymann ldquoBpel vs bpmn 20 should you carerdquo in BusinessProcess Modeling Notation pp 8ndash13 Springer Berlin Germany2011

[13] C C Aggarwal Managing and Mining Sensor Data Springer2013

[14] N H Gehani H V Jagadish andO Shmueli ldquoComposite eventspecification in active databasesmodel and implementationrdquo inProceedings of the 18th VLDBConference Vancouver (VLDB rsquo92)vol 92 pp 327ndash338 Citeseer British Columbia Canada 1992

[15] I Davidson S Gilpin and P B Walker ldquoBehavioral event dataand their analysisrdquo Data Mining and Knowledge Discovery vol25 no 3 pp 635ndash653 2012

[16] J Han and M Kamber Data Mining Southeast Asia EditionConcepts and Techniques Morgan Kaufmann 2006

[17] H RMotahari-Nezhad R Saint-Paul F Casati and B Benatal-lah ldquoEvent correlation for process discovery from web serviceinteraction logsrdquoThe VLDB Journal vol 20 no 3 pp 417ndash4442011

Mathematical Problems in Engineering 13

[18] F Skopik and R Fiedler ldquoIntrusion detection in distributedsystems using fingerprinting and massive event correlationrdquo inGI-Jahrestagung pp 2240ndash2254 2013

[19] G A Wilkin P Eugster and K R Jayaram ldquoDecentralizedfault-tolerant event correlationrdquo ACM Transactions on InternetTechnology vol 14 no 1 article 5 2014

[20] H Wei ldquoA correlation analysis method for network securityeventsrdquo in Informatics and Management Science III vol 206 ofLecture Notes in Electrical Engineering pp 269ndash277 SpringerLondon UK 2013

[21] W Van Der Aalst A Adriansyah A K A de Medeiros etal ldquoProcess mining manifestordquo in Usiness Process ManagementWorkshops pp 169ndash194 Springer Berlin Germany 2012

[22] J C A M Buijs B F van Dongen and W M P van der AalstldquoMining configurable process models from collections of eventlogsrdquo inBusiness ProcessManagement pp 33ndash48 Springer 2013

[23] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[24] J Wang R K Wong J Ding Q Guo and L Wen ldquoOnrecommendation of process mining algorithmsrdquo in Proceedingsof the IEEE 19th International Conference onWeb Services (ICWSrsquo12) pp 311ndash318 IEEE Honolulu Hawaii USA June 2012

[25] R S Mans W M P van der Aalst and H M W VerbeekldquoSupporting process mining workflows with rapidpromrdquo inProceedings of the Business Process Management Demo Sessions(BPMD rsquo14) vol 1295 pp 56ndash60 Eindhoven The NetherlandsSeptember 2014

[26] C Li M Reichert and A Wombacher ldquoMining businessprocess variants challenges scenarios algorithmsrdquo Data ampKnowledge Engineering vol 70 no 5 pp 409ndash434 2011

[27] R Accorsi T Stocker and G Muller ldquoOn the exploitation ofprocess mining for security audits the process discovery caserdquoin Proceedings of the 28th Annual ACM Symposium on AppliedComputing pp 1462ndash1468 ACM March 2013

[28] B-J Lee S-G Park K-B Min et al ldquoThe relationship betweenworking condition factors and well-beingrdquo Annals of Occupa-tional and Environmental Medicine vol 26 no 1 article 342014

[29] J Cohen Statistical Power Analysis for the Behavioral SciencesRoutledge Academic New York NY USA 2013

[30] P Bahl R Chandra A Greenberg S Kandula D A Maltz andM Zhang ldquoTowards highly reliable enterprise network servicesvia inference of multi-level dependenciesrdquo ACM SIGCOMMComputer Communication Review vol 37 no 4 pp 13ndash24 2007

[31] B. Rosner, Fundamentals of Biostatistics, Cengage Learning, 2010.

[32] A. Zimmermann, “Colored petri nets,” in Stochastic Discrete Event Systems: Modeling, Evaluation, Applications, pp. 99–124, Springer, 2008.

[33] A. Adriansyah, B. F. van Dongen, and W. M. P. van der Aalst, “Towards robust conformance checking,” in Business Process Management Workshops, vol. 66 of Lecture Notes in Business Information Processing, pp. 122–133, Springer, Berlin, Germany, 2011.

[34] M. Weidlich and M. Weske, Business Process Modeling Notation, Springer, Berlin, Germany, 2010.

[35] C. M. Bishop and J. Lasserre, “Generative or discriminative? Getting the best of both worlds,” in Bayesian Statistics, J. M. Bernardo, M. J. Bayarri, J. O. Berger et al., Eds., vol. 8, pp. 3–23, Oxford University, 2007.

[36] C. M. Bishop, Pattern Recognition and Machine Learning, Volume 1, Springer, New York, NY, USA, 2006.

[37] D. M. Blei and J. D. Lafferty, “Dynamic topic models,” in Proceedings of the 23rd International Conference on Machine Learning (ICML ’06), pp. 113–120, ACM, June 2006.

[38] J. Foulds, L. Boyles, C. DuBois, P. Smyth, and M. Welling, “Stochastic collapsed variational Bayesian inference for latent dirichlet allocation,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 446–454, ACM, 2013.

[39] J. Pearl, Bayesian Networks, Department of Statistics, UCLA, 2011.

[40] I. Porteous, D. Newman, A. Ihler, A. Asuncion, P. Smyth, and M. Welling, “Fast collapsed gibbs sampling for latent dirichlet allocation,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’08), pp. 569–577, ACM, August 2008.



[20] H Wei ldquoA correlation analysis method for network securityeventsrdquo in Informatics and Management Science III vol 206 ofLecture Notes in Electrical Engineering pp 269ndash277 SpringerLondon UK 2013

[21] W Van Der Aalst A Adriansyah A K A de Medeiros etal ldquoProcess mining manifestordquo in Usiness Process ManagementWorkshops pp 169ndash194 Springer Berlin Germany 2012

[22] J C A M Buijs B F van Dongen and W M P van der AalstldquoMining configurable process models from collections of eventlogsrdquo inBusiness ProcessManagement pp 33ndash48 Springer 2013

[23] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[24] J Wang R K Wong J Ding Q Guo and L Wen ldquoOnrecommendation of process mining algorithmsrdquo in Proceedingsof the IEEE 19th International Conference onWeb Services (ICWSrsquo12) pp 311ndash318 IEEE Honolulu Hawaii USA June 2012

[25] R S Mans W M P van der Aalst and H M W VerbeekldquoSupporting process mining workflows with rapidpromrdquo inProceedings of the Business Process Management Demo Sessions(BPMD rsquo14) vol 1295 pp 56ndash60 Eindhoven The NetherlandsSeptember 2014

[26] C Li M Reichert and A Wombacher ldquoMining businessprocess variants challenges scenarios algorithmsrdquo Data ampKnowledge Engineering vol 70 no 5 pp 409ndash434 2011

[27] R Accorsi T Stocker and G Muller ldquoOn the exploitation ofprocess mining for security audits the process discovery caserdquoin Proceedings of the 28th Annual ACM Symposium on AppliedComputing pp 1462ndash1468 ACM March 2013

[28] B-J Lee S-G Park K-B Min et al ldquoThe relationship betweenworking condition factors and well-beingrdquo Annals of Occupa-tional and Environmental Medicine vol 26 no 1 article 342014

[29] J Cohen Statistical Power Analysis for the Behavioral SciencesRoutledge Academic New York NY USA 2013

[30] P Bahl R Chandra A Greenberg S Kandula D A Maltz andM Zhang ldquoTowards highly reliable enterprise network servicesvia inference of multi-level dependenciesrdquo ACM SIGCOMMComputer Communication Review vol 37 no 4 pp 13ndash24 2007

[31] B Rosner Fundamentals of Biostatistics Cengage Learning2010

[32] A Zimmermann ldquoColored petri netsrdquo in Stochastic DiscreteEvent Systems Modeling Evaluation Applications pp 99ndash124Springer 2008

[33] A Adriansyah B F van Dongen and W M P van der AalstldquoTowards robust conformance checkingrdquo in Business ProcessManagement Workshops vol 66 of Lecture Notes in BusinessInformation Processing pp 122ndash133 Springer Berlin Germany2011

[34] MWeidlich andMWeske Business Process Modeling NotationSpringer Berlin Germany 2010

[35] C M Bishop and J Lasserre ldquoGenerative or discriminativeGetting the best of both worldsrdquo in Bayesian Statistics J MBernardo M J Bayarri J O Berger et al Eds vol 8 pp 3ndash23 Oxford University 2007

[36] C M Bishop Pattern Recognition and Machine LearningVolume 1 Springer New York NY USA 2006

[37] D M Blei and J D Lafferty ldquoDynamic topic modelsrdquo inProceedings of the 23rd International Conference on MachineLearning (ICML rsquo06) pp 113ndash120 ACM June 2006

[38] J Foulds L Boyles C DuBois P Smyth and M WellingldquoStochastic collapsed variational Bayesian inference for latentdirichlet allocationrdquo in Proceedings of the 19th ACM SIGKDDInternational Conference on Knowledge Discovery and DataMining pp 446ndash454 ACM 2013

[39] J Pearl Bayesian Networks Department of Statistics UCLA2011

[40] I Porteous D Newman A Ihler A Asuncion P Smythand M Welling ldquoFast collapsed gibbs sampling for latentdirichlet allocationrdquo in Proceedings of the 14th ACM SIGKDDInternational Conference on Knowledge Discovery and DataMining (KDD rsquo08) pp 569ndash577 ACM August 2008
