Research Article
Hidden Semi-Markov Models for Predictive Maintenance

Francesco Cartella,1 Jan Lemeire,1 Luca Dimiccoli,1 and Hichem Sahli1,2
1 Electronics and Informatics Department (ETRO), Vrije Universiteit Brussel (VUB), Pleinlaan 2, 1050 Brussels, Belgium
2 Interuniversity Microelectronics Center (IMEC), Kapeldreef 75, 3001 Leuven, Belgium

Correspondence should be addressed to Francesco Cartella; fcartell@etro.vub.ac.be

Received 9 October 2014; Accepted 28 December 2014

Academic Editor: Hang Xu

Copyright © 2015 Francesco Cartella et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Realistic prediction models are essential for condition monitoring and predictive maintenance of industrial machines. In this work we propose Hidden Semi-Markov Models (HSMMs) with (i) no constraints on the state duration density function and (ii) applicability to both continuous and discrete observations. To deal with such a type of HSMM, we also propose modifications to the learning, inference, and prediction algorithms. Finally, automatic model selection has been made possible using the Akaike Information Criterion. This paper describes the theoretical formalization of the model as well as several experiments performed on simulated and real data with the aim of methodology validation. In all performed experiments the model is able to correctly estimate the current state and to effectively predict the time to a predefined event with a low overall average absolute error. As a consequence, its applicability to real world settings can be beneficial, especially where the Remaining Useful Lifetime (RUL) of the machine is calculated in real time.
1. Introduction
Predictive models that are able to estimate the current condition and the Remaining Useful Lifetime of industrial equipment are of high interest, especially for manufacturing companies, which can use them to optimize their maintenance strategies. If we consider that maintenance accounts for one of the largest parts of the operational costs [1] and that the maintenance and operations departments often comprise about 30% of the manpower [2, 3], it is not difficult to estimate the economic advantages that such innovative techniques can bring to industry. Moreover, predictive maintenance, where the Remaining Useful Lifetime (RUL) of the machine is calculated in real time, has been proven to significantly outperform other maintenance strategies such as corrective maintenance [4]. In this work, RUL is defined as the time from the current moment until the system will fail [5]. Failure, in this context, is defined as a deviation of the delivered output of a machine from the specified service requirements [6] that necessitates maintenance.
Models like Support Vector Machines [7], Dynamic Bayesian Networks [8], clustering techniques [9], and data mining approaches [10] have been successfully applied to condition monitoring, RUL estimation, and predictive maintenance problems [11, 12]. State space models like Hidden Markov Models (HMMs) [13] are particularly suitable for industrial applications due to their ability to model the latent state, which represents the health condition of the machine.
Classical HMMs have been applied to condition assessment [14, 15]; however, their usage in predictive maintenance has not been effective, due to their intrinsic modeling of the state duration as a geometric distribution.
To overcome this drawback, a modified version of HMM which takes into account an estimate of the duration in each state has been proposed in the works of Tobon-Mejia et al. [16–19]. Thanks to the explicit state sojourn time modeling, it has been shown that it is possible to effectively estimate the RUL for industrial equipment. However, the drawback of their proposed HMM model is that the state duration is always assumed to be Gaussian distributed, and the duration parameters are estimated empirically from the Viterbi path of the HMM.
Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2015, Article ID 278120, 23 pages. http://dx.doi.org/10.1155/2015/278120

A complete specification of a duration model, together with a set of learning and inference algorithms, was first given by Ferguson [20]. In his work, Ferguson allowed the underlying stochastic process of the state to be a semi-Markov chain, instead of the simple Markov chain of an HMM. Such a model is referred to as a Hidden Semi-Markov Model (HSMM) [21]. HSMMs and explicit duration models have been proven beneficial for many applications [22–25]. A complete overview of different duration model classes has been made by Yu [26]. Most state duration models used in the literature are nonparametric discrete distributions [27–29]. As a consequence, the number of parameters that describe the model and that have to be estimated is high, and the learning procedure can therefore be computationally expensive for real, complex applications. Moreover, it is necessary to specify a priori the maximum duration allowed in each state.
To alleviate the high dimensionality of the parameter space, parametric duration models have been proposed. For example, Salfner [6] proposed a generic parametric continuous distribution to model the state sojourn time. However, in that model the observations are assumed to be discrete, and the model is applied to recognize failure-prone observation sequences. Using continuous observations, Azimi et al. [30–32] specified an HSMM with a parametric duration distribution belonging to the Gamma family and modeled the observation process by a Gaussian.
Inspired by the latter two approaches, in this work we propose a generic specification of a parametric HSMM in which no constraints are placed on the model of the state duration or on the observation processes. In our approach, the state duration is modeled as a generic parametric density function. The observations, on the other hand, can be modeled either as a discrete stochastic process or as a continuous mixture of Gaussians; the latter has been shown to approximate arbitrarily closely any finite continuous density function [33]. The proposed model can therefore be used in a wide range of applications and types of data. Moreover, in this paper we introduce a new and more effective estimator of the time spent by the system in a given state prior to the current time. To the best of our knowledge, apart from the above-cited works, the literature on HSMMs applied to prognosis and predictive maintenance for industrial machines is limited [34]. Hence, the present work aims to show the effectiveness of the proposed duration model in solving condition monitoring and RUL estimation problems.
When dealing with state space models, and in particular with HSMMs, one should define the number of states, the correct family of duration densities, and, in the case of continuous observations, the adequate number of Gaussian mixture components. Such parameters play a prominent role, since the right model configuration is essential to enable an accurate modeling of the dynamic pattern and the covariance structure of the observed time series. The estimation of a satisfactory model configuration is referred to as model selection in the literature.
While several state-of-the-art approaches use expert knowledge to get insight into the model structure [15, 35, 36], an automated methodology for model selection is often required. In the literature, model selection has been deeply studied for a wide range of models. Among the existing methodologies, information-based techniques have been extensively analyzed, with satisfactory results.
Although the Bayesian Information Criterion (BIC) is particularly appropriate for finite mixture models [37, 38], the Akaike Information Criterion (AIC) has been demonstrated to outperform BIC when applied to more complex models and when the sample size is limited [39, 40], which is the case for the target application of this paper.
In this work, AIC is used to estimate the correct model configuration, with the final goal of an automated HSMM model selection which exploits only the information available in the input data. While model selection techniques have been extensively used in the framework of Hidden Markov Models [41–43], to the best of our knowledge the present work is the first that proposes their application to duration models, and in particular to HSMMs.
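Concretely, AIC scores each candidate configuration as AIC = 2k − 2 ln L̂, where k is the number of free parameters and L̂ the maximized likelihood, and the configuration with the lowest score wins. A minimal selection loop follows; the log-likelihood values and the parameter-counting rule are hypothetical, for illustration only, not taken from the experiments of this paper:

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L-hat)."""
    return 2 * n_params - 2 * log_likelihood

def n_hsmm_params(n, n_symbols=4):
    """Toy parameter count for an n-state discrete HSMM (illustrative rule):
    zero-diagonal row-stochastic A0, 2 duration parameters per state
    (e.g. a Gamma shape and scale), and a row-stochastic emission matrix."""
    transition = n * (n - 2)          # n-1 nonzero entries per row, 1 constraint
    duration = 2 * n
    emission = n * (n_symbols - 1)
    return transition + duration + emission

# Hypothetical candidates: number of states -> maximized log-likelihood.
# In practice each value would come from training a candidate HSMM.
candidates = {2: -1520.4, 3: -1487.9, 4: -1485.1, 5: -1483.8}

best = min(candidates, key=lambda n: aic(candidates[n], n_hsmm_params(n)))
# With these toy numbers, the 3-state model wins: its small likelihood
# deficit is outweighed by the parameter penalty of the larger models.
```

Note how the raw likelihood alone would always pick the largest model; the 2k penalty is what makes the selection automatic.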
In summary, the present work contributes to condition monitoring, predictive maintenance, and RUL estimation problems by

(i) proposing a general Hidden Semi-Markov Model applicable to continuous or discrete observations, with no constraints on the density function used to model the state duration;

(ii) proposing a more effective estimator of the state duration variable $d_t(i)$, that is, the time spent by the system in the $i$th state prior to the current time $t$;

(iii) adapting the learning, inference, and prediction algorithms to the defined HSMM parameters and the proposed $d_t(i)$ estimator;

(iv) using the Akaike Information Criterion for automatic model selection.
The rest of the paper is organized as follows: in Section 2 we introduce the theory of the proposed HSMM, together with its learning, inference, and prediction algorithms. Section 3 gives a short theoretical overview of the Akaike Information Criterion. Section 4 presents the methodology used to estimate the Remaining Useful Lifetime using the proposed HSMM. In Section 5, experimental results are discussed. Conclusions and future research directions are given in Section 6.
2. Hidden Semi-Markov Models
Hidden Semi-Markov Models (HSMMs) introduce the concept of variable duration, which results in more accurate modeling power when the system being modeled shows a dependence on time.
In this section we give the specification of the proposed HSMM, for which we model the state duration with a parametric, state-dependent distribution. Compared to nonparametric modeling, this approach has two main advantages:

(i) the model is specified by a limited number of parameters; as a consequence, the learning procedure is computationally less expensive;

(ii) the model does not require a priori knowledge of the maximum sojourn time allowed in each state, since this is inherently learned through the duration distribution parameters.
2.1. Model Specification. A Hidden Semi-Markov Model is a doubly embedded stochastic model with an underlying stochastic process that is not observable (hidden) and can only be observed through another set of stochastic processes that produce the sequence of observations. An HSMM allows the underlying process to be a semi-Markov chain with a variable duration, or sojourn time, for each state. The key concept of HSMMs is that the semi-Markov property holds for this model: while in HMMs the Markov property implies that the value of the hidden state at time $t$ depends exclusively on its value at time $t-1$, in HSMMs the probability of transition from state $S_j$ to state $S_i$ at time $t$ depends on the duration spent in state $S_j$ prior to time $t$.

In the following, we denote the number of states in the model as $N$, the individual states as $S = \{S_1, \dots, S_N\}$, and the state at time $t$ as $s_t$. The semi-Markov property can be written as

$$P(s_{t+1} = S_i \mid s_t = S_j, \dots, s_1 = S_k) = P(s_{t+1} = S_i \mid s_t = S_j, d_t(j)), \quad 1 \le i, j, k \le N, \qquad (1)$$

where the duration variable $d_t(j)$ is defined as the time spent in state $S_j$ prior to time $t$.
Although the state duration is inherently discrete, in many studies [44, 45] it has been modeled with a continuous parametric density function. Similarly to the work of Azimi et al. [30–32], in this paper we use the discrete counterpart of the chosen parametric probability density function (pdf). With this approximation, if we denote the pdf of the sojourn time in state $S_i$ as $f(x; \theta_i)$, where $\theta_i$ represents the set of parameters of the pdf relative to the $i$th state, the probability that the system stays in state $S_i$ for exactly $d$ time steps can be calculated as $\int_{d-1}^{d} f(x; \theta_i)\,dx$. Considering the HSMM formulation, we can generally denote the state-dependent duration distributions by the set of their parameters relative to each state, $\Theta = \{\theta_1, \dots, \theta_N\}$.

Many related works on HSMMs [31, 32, 44, 45] consider $f(x; \theta_i)$ within the exponential family; in particular, Gamma distributions are often used in speech processing applications. In this work, we do not impose a type of distribution function to model the duration. The only requirement is that the duration should be modeled by a positive density function, since negative durations are physically meaningless.
HSMMs also require the definition of a "dynamic" transition matrix, as a consequence of the semi-Markov property. Differently from HMMs, in which a constant transition probability leads to a geometrically distributed state sojourn time, HSMMs explicitly define a transition matrix which, depending on the duration variable, has increasing probabilities of changing state as time goes on. For convenience, we specify the state duration variable in the form of a vector $\mathbf{d}_t$ with dimensions $N \times 1$ as

$$\mathbf{d}_t = \begin{cases} d_t(j) & \text{if } s_t = S_j, \\ 1 & \text{if } s_t \neq S_j. \end{cases} \qquad (2)$$

The quantity $d_t(j)$ can be easily calculated by induction from $d_{t-1}(j)$ as

$$d_t(j) = s_t(j) \cdot s_{t-1}(j) \cdot d_{t-1}(j) + 1, \qquad (3)$$

where $s_t(j)$ is 1 if $s_t = S_j$, and 0 otherwise.
If we assume that at time $t$ the system is in state $S_i$, we can formally define the duration-dependent transition matrix as $A_{\mathbf{d}_t} = [a_{ij}(\mathbf{d}_t)]$ with

$$a_{ij}(\mathbf{d}_t) = P(s_{t+1} = S_j \mid s_t = S_i, d_t(i)), \quad 1 \le i, j \le N. \qquad (4)$$
The specification of the model can be further simplified by observing that, at each time $t$, the matrix $A_{\mathbf{d}_t}$ can be decomposed into two terms: the recurrent and the nonrecurrent state transition probabilities.

The recurrent transition probabilities $P(\mathbf{d}_t) = [p_{ij}(\mathbf{d}_t)]$, which depend only on the duration vector $\mathbf{d}_t$ and the parameters $\Theta$, take into account the dynamics of the self-transition probabilities. The element $p_{ii}(\mathbf{d}_t)$ is defined as the probability of remaining in the current state at the next time step, given the duration spent in the current state prior to time $t$:

$$\begin{aligned}
p_{ii}(\mathbf{d}_t) &= P(s_{t+1} = S_i \mid s_t = S_i, d_t(i)) \\
&= P(s_{t+1} = S_i \mid s_t = S_i, s_{t-1} = S_i, \dots, s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \neq S_i) \\
&= \frac{P(s_{t+1} = S_i, s_t = S_i, \dots, s_{t-d_t(i)+2} = S_i \mid s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \neq S_i)}{P(s_t = S_i, s_{t-1} = S_i, \dots, s_{t-d_t(i)+2} = S_i \mid s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \neq S_i)}.
\end{aligned} \qquad (5)$$
The denominator in (5) can be expressed as $\sum_{k=1}^{\infty} P(s_{t+k} \neq S_i, s_{t+k-1} = S_i, \dots, s_{t-d_t(i)+2} = S_i \mid s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \neq S_i)$, which is the probability that the system at time $t$ has been staying in state $S_i$ for at least $d_t(i) - 1$ time units. The above expression is equivalent to $1 - F(d_t(i) - 1; \theta_i)$, where $F(\cdot; \theta_i)$ is the duration cumulative distribution function relative to the state $S_i$, that is, $F(d; \theta) = \int_{-\infty}^{d} f(x; \theta)\,dx$. As a consequence, from (5) we can define the recurrent transition probabilities as a diagonal matrix with dimensions $N \times N$ as

$$P(\mathbf{d}_t) = [p_{ij}(\mathbf{d}_t)] = \begin{cases} \dfrac{1 - F(d_t(i); \theta_i)}{1 - F(d_t(i) - 1; \theta_i)} & \text{if } i = j, \\[2mm] 0 & \text{if } i \neq j. \end{cases} \qquad (6)$$
The usage in (6) of the cumulative distribution functions, which tend to 1 as the duration tends to infinity, implies that the probability of self-transition decreases as the sojourn time increases, leading the model to always leave the current state as time approaches infinity.
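The survival-function ratio in (6) is easy to see in code. The sketch below assumes a Weibull sojourn distribution with shape > 1 (an increasing-hazard choice typical of aging equipment; the paper itself prescribes no particular family) and shows the self-transition probability decaying as the sojourn lengthens:

```python
import math

def weibull_survival(d, shape=2.0, scale=10.0):
    """S(d) = 1 - F(d) for an assumed Weibull sojourn-time distribution.
    shape > 1 gives an increasing hazard (an 'aging' state)."""
    if d <= 0:
        return 1.0
    return math.exp(-((d / scale) ** shape))

def self_transition(d, shape=2.0, scale=10.0):
    """p_ii(d) = (1 - F(d)) / (1 - F(d - 1)), as in (6)."""
    return weibull_survival(d, shape, scale) / weibull_survival(d - 1, shape, scale)

probs = [self_transition(d) for d in range(1, 31)]
# The self-transition probability decays as the sojourn lengthens, so the
# model eventually leaves the state, as noted in the text above.
assert all(probs[k] > probs[k + 1] for k in range(len(probs) - 1))
```

Had we used an exponential sojourn distribution instead, the ratio would be constant (the memoryless case), which is exactly the geometric-duration limitation of classical HMMs that HSMMs avoid.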
The nonrecurrent state transition probabilities $A^0 = [a^0_{ij}]$ rule the transitions between two different states. $A^0$ is an $N \times N$ matrix with diagonal elements equal to zero, defined as

$$a^0_{ij} = \begin{cases} 0 & \text{if } i = j, \\ P(s_{t+1} = S_j \mid s_t = S_i) & \text{if } i \neq j. \end{cases} \qquad (7)$$

$A^0$ must be specified as a stochastic matrix; that is, its elements have to satisfy the constraint $\sum_{j=1}^{N} a^0_{ij} = 1$ for all $i$.
As a consequence of the above decomposition, the dynamics of the underlying semi-Markov chain can be defined by specifying only the state-dependent duration parameters $\Theta$ and the nonrecurrent matrix $A^0$, since the model transition matrix can be calculated at each time $t$ using (6) and (7):

$$A_{\mathbf{d}_t} = P(\mathbf{d}_t) + (I - P(\mathbf{d}_t))\, A^0, \qquad (8)$$

where $I$ is the identity matrix. If we denote the elements of the dynamic transition matrix $A_{\mathbf{d}_t}$ as $a_{ij}(\mathbf{d}_t)$, the stochastic constraint $\sum_{j=1}^{N} a_{ij}(\mathbf{d}_t) = 1$, for all $i$ and $t$, is guaranteed by the fact that $P(\mathbf{d}_t)$ is a diagonal matrix and $A^0$ is a stochastic matrix.

For several applications, it is necessary to model an absorbing state, which in the case of industrial equipment corresponds to the "broken" or "failure" state. If we denote the absorbing state as $S_k$, with $k \in [1, N]$, we must fix the $k$th row of the nonrecurrent matrix $A^0$ to $a^0_{kk} = 1$ and $a^0_{ki} = 0$ for all $1 \le i \le N$ with $i \neq k$. By substituting such an $A^0$ matrix in (8), it is easy to show that the element $a_{kk}(\mathbf{d}_t) = 1$ and remains constant for all $t$, while the duration parameters $\theta_k$ have no influence on the absorbing state $S_k$. An example of absorbing state specification will be given in Section 5.

With respect to the input observation signals, in this work
we consider both continuous and discrete data, by adapting a suitable observation model depending on the nature of the observations. In particular, for the continuous case, we model the observations with a multivariate mixture of Gaussian distributions. This choice presents two main advantages: (i) a multivariate model allows dealing with multiple observations at the same time, which is often the case when modeling industrial equipment, since at each time step multiple sensors' measurements are available, and (ii) mixtures of Gaussians have been proven to closely approximate any finite continuous density function [33]. Formally, if we denote by $\mathbf{x}_t$ the observation vector at time $t$ and the generic observation vector being modeled as $\mathbf{x}$, the observation density for the $j$th state is represented by a finite mixture of $M$ Gaussians:

$$b_j(\mathbf{x}) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_{jm}, \mathbf{U}_{jm}), \quad 1 \le j \le N, \qquad (9)$$

where $c_{jm}$ is the mixture coefficient for the $m$th component in state $S_j$, which satisfies the stochastic constraints $\sum_{m=1}^{M} c_{jm} = 1$ for $1 \le j \le N$ and $c_{jm} \ge 0$ for $1 \le j \le N$, $1 \le m \le M$, while $\mathcal{N}$ is the Gaussian density with mean vector $\boldsymbol{\mu}_{jm}$ and covariance matrix $\mathbf{U}_{jm}$ for the $m$th mixture component in state $j$.
In the case of discrete data, we model the observations within each state with a nonparametric discrete probability distribution. In particular, if $L$ is the number of distinct observation symbols per state, and if we denote the symbols as $X = \{X_1, \dots, X_L\}$ and the observation at time $t$ as $x_t$, the observation symbol probability distribution can be defined as a matrix $B = [b_j(l)]$ of dimensions $N \times L$, where

$$b_j(l) = P[x_t = X_l \mid s_t = S_j], \quad 1 \le j \le N,\ 1 \le l \le L. \qquad (10)$$

Since the system in each state at each time step can emit one of the $L$ possible symbols, the matrix $B$ is stochastic; that is, it is constrained by $\sum_{l=1}^{L} b_j(l) = 1$ for all $1 \le j \le N$.
Finally, as in the case of HMMs, we specify the initial state distribution $\pi = \{\pi_i\}$, which defines the probability of the starting state as

$$\pi_i = P[s_1 = S_i], \quad 1 \le i \le N. \qquad (11)$$

From the above considerations, two different HSMM models can be considered. In the case of continuous observations, $\lambda = (A^0, \Theta, C, \mu, U, \pi)$, while in the case of discrete observations the HSMM is characterized by $\lambda = (A^0, \Theta, B, \pi)$. An example of a continuous HSMM with 3 states is shown in Figure 1.
2.2. Learning and Inference Algorithms. Let us denote the generic sequence of observations, being indiscriminately continuous vectors or discrete symbols, as $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$. In order to use the defined HSMM model in practice, similarly to the HMM, we need to solve three basic problems:

(1) Given the observation $\mathbf{x}$ and a model $\lambda$, calculate the probability that the sequence $\mathbf{x}$ has been generated by the model $\lambda$, that is, $P(\mathbf{x} \mid \lambda)$.

(2) Given the observation $\mathbf{x}$ and a model $\lambda$, calculate the state sequence $S = s_1 s_2 \cdots s_T$ which has most probably generated the sequence $\mathbf{x}$.

(3) Given the observation $\mathbf{x}$, find the parameters of the model $\lambda$ which maximize $P(\mathbf{x} \mid \lambda)$.

As in the case of HMMs, solving the above problems requires the forward-backward [13], decoding (Viterbi [46] and Forney [47]), and Expectation-Maximization [48] algorithms, which will be adapted to the HSMM introduced in Section 2.1. In the following, we also propose a more effective estimator of the state duration variable $d_t(i)$ defined in (2).
2.2.1. The Forward-Backward Algorithm. Given a generic sequence of observations $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$, the goal is to calculate the model likelihood, that is, $P(\mathbf{x} \mid \lambda)$. This quantity is useful for the training procedure, where the parameters that locally maximize the model likelihood are chosen, as well as for classification problems. The latter is the case in which the observation sequence $\mathbf{x}$ has to be mapped to one of a finite set of $C$ classes, represented by a set of HSMM parameters $L = \{\lambda_1, \dots, \lambda_C\}$; the class of $\mathbf{x}$ is chosen such that $\lambda(\mathbf{x}) = \arg\max_{\lambda \in L} P(\mathbf{x} \mid \lambda)$.

To calculate the model likelihood, we first define the forward variable at each time $t$ as

$$\alpha_t(i) = P(\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, s_t = S_i \mid \lambda), \quad 1 \le i \le N. \qquad (12)$$
[Figure 1: Graphical representation of an HSMM with three hidden states $S_1, S_2, S_3$, nonrecurrent transitions $a_{12}, a_{23}$, state-dependent observation probabilities $P(o \mid S_1), P(o \mid S_2), P(o \mid S_3)$, and sojourn probability densities $d_1(u), d_2(u), d_3(u)$ over time $u$.]
Contrarily to HMMs, for HSMMs the state duration must be taken into account in the forward variable calculation. Consequently, Yu [26] proposed the following inductive formula:

$$\alpha_t(j, d) = \sum_{d'=1}^{D} \sum_{i=1}^{N} \left( \alpha_{t-d'}(i, d')\, a^0_{ij}\, p_{jj}(d') \prod_{k=t-d+1}^{t} b_j(\mathbf{x}_k) \right), \quad 1 \le j \le N,\ 1 \le t \le T, \qquad (13)$$

that is, the sum of the probabilities of being in the current state $S_j$ (at time $t$) for the past $d'$ time units (with $1 \le d' \le D$, and $D$ the maximum allowed duration for each state), coming from all the possible previous states $S_i$, $1 \le i \le N$ and $i \neq j$.
The disadvantage of the above formulation is that, as discussed in the Introduction, the specification of the maximum duration $D$ represents a limitation to the generality of the model. Moreover, from (13) it is clear that the computation and memory complexities drastically increase with $D$, which can be very large in many applications, in particular for online failure prediction.
To alleviate this problem, Azimi et al. [30–32] introduced a new forward algorithm for HSMMs that, by keeping track of the estimated average state duration at each iteration, has a computational complexity comparable to the forward algorithm for HMMs [13]. However, the average state duration represents an approximation. Consequently, the forward algorithm of Azimi, compared with (13), pays the price of a lower precision in favor of an (indispensable) better computational efficiency.
To calculate the forward variable $\alpha_t(j)$ using Azimi's approach, the duration-dependent transition matrix defined in (8) is taken into consideration in the induction formula of (13), which becomes [30]

$$\alpha_t(j) = \left[ \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}(\bar{\mathbf{d}}_{t-1}) \right] b_j(\mathbf{x}_t). \qquad (14)$$
To calculate the above formula, the state duration of (2) must be estimated at each time $t$ by means of the variable $\bar{\mathbf{d}}_t = [\bar{d}_t(i)]$, defined as

$$\bar{d}_t(i) = E(d_t(i) \mid \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, s_t = S_i, \lambda), \quad 1 \le i \le N, \qquad (15)$$

where $E$ denotes the expected value. To calculate the above quantity, Azimi et al. [30–32] use the following formula:

$$\bar{\mathbf{d}}_t = \boldsymbol{\gamma}_{t-1} \odot \bar{\mathbf{d}}_{t-1} + 1, \qquad (16)$$

where $\odot$ represents the element-by-element product between two vectors, and the vector $\boldsymbol{\gamma}_t = [\gamma_t(i)]$ (the probability of being in state $S_i$ at time $t$, given the observation sequence and the model parameters), with dimensions $N \times 1$, is calculated in terms of $\alpha_t(i)$ as

$$\gamma_t(i) = P(s_t = S_i \mid \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, \lambda) = \frac{\alpha_t(i)}{\sum_{j=1}^{N} \alpha_t(j)}, \quad 1 \le i \le N. \qquad (17)$$
Equation (16) is based on the following induction formula [30–32], which rules the dynamics of the duration vector when the system's state is known:

$$d_t(i) = s_{t-1}(i) \cdot d_{t-1}(i) + 1, \qquad (18)$$

where, for each $t$, $s_t(i)$ is 1 if $s_t = S_i$, and 0 otherwise.

A simple example shows that (18) is incorrect: assuming an HSMM with three states and considering the state sequence $(S_1, S_1, S_2, \dots)$, the correct sequence of duration vectors is $\mathbf{d}_1 = [1, 1, 1]^T$, $\mathbf{d}_2 = [2, 1, 1]^T$, and $\mathbf{d}_3 = [1, 1, 1]^T$, where the superscript $T$ denotes vector transpose. If we apply (18), we obtain $\mathbf{d}_1 = [1, 1, 1]^T$, $\mathbf{d}_2 = [2, 1, 1]^T$, and $\mathbf{d}_3 = [3, 1, 1]^T$: the duration of $S_1$ keeps growing even though the system has left that state, which is in contradiction with the definition of the state duration vector given in (2).

To calculate the average state duration variable $\bar{d}_t(i)$, we propose a new induction formula that estimates, for each time $t$, the time spent in the $i$th state prior to $t$ as

$$\bar{d}_t(i) = P(s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \dots, \mathbf{x}_t) \cdot (\bar{d}_{t-1}(i) + 1) \qquad (19)$$

$$= \frac{a_{ii}(\bar{\mathbf{d}}_{t-1}) \cdot \alpha_{t-1}(i) \cdot b_i(\mathbf{x}_t)}{\alpha_t(i)} \cdot (\bar{d}_{t-1}(i) + 1), \quad 1 \le i \le N. \qquad (20)$$

The derivation of (20) is given in the Appendix. The intuition behind (19) is that the current average duration is the previous average duration plus one, weighted by the "amount" of the current state that was already in state $S_i$ in the previous step.
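The exact recursion (3) can be checked directly on the counterexample sequence. A minimal sketch in plain Python, with states indexed from 0 rather than 1:

```python
def duration_vectors(states, n_states):
    """Exact duration recursion (3):
    d_t(j) = s_t(j) * s_{t-1}(j) * d_{t-1}(j) + 1,
    where s_t(j) is the indicator of being in state j at time t."""
    d = [1] * n_states
    history = [list(d)]
    for t in range(1, len(states)):
        d = [(1 if states[t] == j else 0)
             * (1 if states[t - 1] == j else 0)
             * d[j] + 1
             for j in range(n_states)]
        history.append(list(d))
    return history

# Counterexample sequence (S1, S1, S2) from the text, states indexed from 0:
# the duration of the departed state correctly resets to 1 at the transition.
history = duration_vectors([0, 0, 1], 3)
assert history == [[1, 1, 1], [2, 1, 1], [1, 1, 1]]
```

Because (3) multiplies by both indicators $s_t(j)$ and $s_{t-1}(j)$, the accumulated duration is discarded as soon as either the current or the previous state differs from $S_j$, which is exactly the property the single-indicator shortcut criticized above lacks.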
Using the proposed (20), the forward algorithm can be specified as follows.

(1) Initialization, with $1 \le i \le N$:

$$\alpha_1(i) = \pi_i\, b_i(\mathbf{x}_1), \quad \bar{d}_1(i) = 1, \quad A_{\bar{\mathbf{d}}_1} = P(\bar{\mathbf{d}}_1) + (I - P(\bar{\mathbf{d}}_1))\, A^0, \qquad (21)$$

where $P(\bar{\mathbf{d}}_1)$ is estimated using (6).

(2) Induction, with $1 \le j \le N$ and $1 \le t \le T - 1$:

$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t) \right] b_j(\mathbf{x}_{t+1}), \qquad (22)$$

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1), \qquad (23)$$

$$A_{\bar{\mathbf{d}}_{t+1}} = P(\bar{\mathbf{d}}_{t+1}) + (I - P(\bar{\mathbf{d}}_{t+1}))\, A^0, \qquad (24)$$

where $a_{ij}(\bar{\mathbf{d}}_t)$ are the coefficients of the matrix $A_{\bar{\mathbf{d}}_t}$.

(3) Termination:

$$P(\mathbf{x} \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i). \qquad (25)$$
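The initialization-induction-termination loop above can be sketched end to end. The following is a minimal illustration for a discrete-observation HSMM with assumed toy parameters (a Weibull sojourn distribution and hypothetical $A^0$, $B$, and $\pi$ values, none of which come from the paper's experiments); it also rescales $\alpha$ at each step, a standard safeguard against numerical underflow that the algorithm statement leaves implicit:

```python
import numpy as np

# --- Toy 2-state discrete-observation HSMM (illustrative values only) ---
N = 2
A0 = np.array([[0.0, 1.0],            # nonrecurrent transitions, zero diagonal
               [1.0, 0.0]])
B = np.array([[0.8, 0.2],             # emission matrix b_j(l), rows stochastic
              [0.3, 0.7]])
pi = np.array([0.6, 0.4])
scale = np.array([8.0, 5.0])          # assumed Weibull duration scale per state
shape = 2.0                           # shape > 1: increasing hazard

def survival(d):
    """1 - F(d) for the assumed Weibull sojourn distributions."""
    return np.exp(-(np.maximum(d, 0.0) / scale) ** shape)

def A_of(dbar):
    """Dynamic transition matrix (8)/(24) from the current duration estimate."""
    P = np.diag(survival(dbar) / survival(dbar - 1.0))   # (6)
    return P + (np.eye(N) - P) @ A0

def forward(obs):
    """Forward pass (21)-(25) with the duration estimator (23);
    returns log P(x | lambda)."""
    alpha = pi * B[:, obs[0]]                            # (21)
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    dbar = np.ones(N)
    for t in range(1, len(obs)):
        A = A_of(dbar)
        b = B[:, obs[t]]
        new_alpha = (alpha @ A) * b                      # (22)
        dbar = (np.diag(A) * alpha * b / new_alpha) * (dbar + 1.0)  # (23)
        loglik += np.log(new_alpha.sum())                # factor of (25)
        alpha = new_alpha / new_alpha.sum()
    return loglik

loglik = forward([0, 0, 0, 1, 1, 1, 0])
```

Note that the ratio in (23) is invariant to the rescaling of $\alpha$, since the same scaled $\alpha_t$ appears in both numerator and denominator, so the duration estimate is unaffected by the underflow safeguard.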
Similar considerations as for the forward procedure can be made for the backward algorithm, which is implemented by defining the variable $\beta_t(i)$ as

$$\beta_t(i) = P(\mathbf{x}_{t+1} \mathbf{x}_{t+2} \cdots \mathbf{x}_T \mid s_t = S_i, \lambda), \quad 1 \le i \le N. \qquad (26)$$

Having estimated the dynamic transition matrix $A_{\bar{\mathbf{d}}_t}$ for each $1 \le t \le T$ using (24), the backward variable can be calculated inductively as follows.

(1) Initialization:

$$\beta_T(i) = 1, \quad 1 \le i \le N. \qquad (27)$$

(2) Induction:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \dots, 1,\ 1 \le i \le N. \qquad (28)$$

Although the variable $\beta_t(i)$ is not necessary for the calculation of the model likelihood, it will be useful in the parameter reestimation procedure, as explained in Section 2.2.3.
222 The Viterbi Algorithm The Viterbi algorithm [46 47](also known as decoding) allows determining the best statesequence corresponding to a given observation sequence
Formally given a sequence of observation x = x1x2sdot sdot sdot x
119879
the best state sequence 119878lowast = 119904lowast
1119904lowast
2sdot sdot sdot 119904
lowast
119879corresponding to x is
calculated by defining the variable 120575119905(119894) as
120575119905(119894) = max
11990411199042119904119905minus1
P (11990411199042 119904
119905= 119878
119894 x
1x2sdot sdot sdot x
119905| 120582) (29)
The procedure to recursively calculate the variable 120575119905(119894)
and to retrieve the target state sequence (ie the argumentswhich maximize the 120575
119905(119894)rsquos) for the proposed HSMM is a
straightforward extension of theViterbi algorithm forHMMs[13]The only change is the usage in the recursive calculationof 120575
119905(119894) of the dynamic transition matrix Ad
119905
= [119886119894119895(d
119905)]
calculated through (24) The Viterbi algorithm for the intro-duced parametric HSMMs can be summarized as follows
(1) initialization, with 1 ≤ i ≤ N:
\[
\delta_1(i) = \pi_i\, b_i(\mathbf{x}_1), \qquad \psi_1(i) = 0; \tag{30}
\]

(2) recursion, with 1 ≤ j ≤ N and 2 ≤ t ≤ T:
\[
\delta_t(j) = \max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}(\mathbf{d}_t)\right] b_j(\mathbf{x}_t), \tag{31}
\]
\[
\psi_t(j) = \arg\max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}(\mathbf{d}_t)\right]; \tag{32}
\]

(3) termination:
\[
P^* = \max_{1 \le i \le N} \left[\delta_T(i)\right], \tag{33}
\]
\[
s^*_T = \arg\max_{1 \le i \le N} \left[\delta_T(i)\right], \tag{34}
\]

where we keep track of the argument maximizing (31) using the vector ψ_t, which, tracked back, gives the desired best state sequence:
\[
s^*_t = \psi_{t+1}(s^*_{t+1}), \qquad t = T-1, T-2, \ldots, 1. \tag{35}
\]
Mathematical Problems in Engineering 7
2.2.3 The Training Algorithm. The training algorithm consists of estimating the model parameters from the observation data. As discussed in Section 2.1, a parametric HSMM is defined by λ = (A^0, Θ, C, μ, U, π) if the observations are continuous, or by λ = (A^0, Θ, B, π) if the observations are discrete. Given a generic observation sequence x = x_1 x_2 ⋯ x_T, referred to as the training set in the following, the training procedure consists of finding the model parameter set λ* which locally maximizes the model likelihood P(x | λ).

We use the modified Baum-Welch algorithm of Azimi et al. [30–32]. However, in our implementation we make no assumption on the density function used to model the state duration, and we consider both continuous and discrete observations.
Being a variant of the more general Expectation-Maximization (EM) algorithm, Baum-Welch is an iterative procedure consisting of two steps: (i) the expectation step, in which the forward and backward variables are calculated and the model likelihood is estimated, and (ii) the maximization step, in which the model parameters are updated and used in the next iteration. This process usually starts from a random guess of the model parameters λ_0 and is iterated until the likelihood function no longer improves between two consecutive iterations.
Similarly to HMMs, the reestimation formulas are derived by first introducing the variable ξ_t(i,j), which represents the probability of being in state S_i at time t and in state S_j at time t + 1, given the model and the observation sequence:

\[
\xi_t(i,j) = \mathbb{P}\left(s_t = S_i,\; s_{t+1} = S_j \mid \mathbf{x}, \lambda\right). \tag{36}
\]
However, in the HSMM case the variable ξ_t(i,j) takes into account the duration estimation performed in the forward algorithm (see (24)). Formulated in terms of the forward and backward variables, it is given by

\[
\begin{aligned}
\xi_t(i,j) &= \mathbb{P}\left(s_t = S_i,\; s_{t+1} = S_j \mid \mathbf{x}, \lambda\right)
= \frac{\mathbb{P}\left(s_t = S_i,\; s_{t+1} = S_j,\; \mathbf{x} \mid \lambda\right)}{\mathbb{P}\left(\mathbf{x} \mid \lambda\right)} \\
&= \frac{\alpha_t(i)\, a_{ij}(\mathbf{d}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}{\mathbb{P}\left(\mathbf{x} \mid \lambda\right)}
= \frac{\alpha_t(i)\, a_{ij}(\mathbf{d}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}(\mathbf{d}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}.
\end{aligned} \tag{37}
\]
From ξ_t(i,j) we can derive the quantity γ_t(i) (already defined in (17)), representing the probability of being in state S_i at time t, given the observation sequence and the model parameters:

\[
\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j). \tag{38}
\]
Finally, the reestimation formulas for the parameters π and A^0 are given by

\[
\bar{\pi}_i = \gamma_1(i), \tag{39}
\]
\[
\bar{a}^0_{ij} = \frac{\left(\sum_{t=1}^{T-1} \xi_t(i,j)\right) \odot G}{\sum_{j=1}^{N} \left(\sum_{t=1}^{T-1} \xi_t(i,j)\right) \odot G}, \tag{40}
\]
where G = [g_ij] is a square matrix of dimensions N × N, with g_ij = 0 for i = j and g_ij = 1 for i ≠ j, and ⊙ represents the element-by-element product between two matrices; Σ_{t=1}^{T−1} γ_t(i) is the expected number of transitions from state S_i, and Σ_{t=1}^{T−1} ξ_t(i,j) is the expected number of transitions from state S_i to state S_j.

Equation (39) represents the expected number of times that the model starts in state S_i, while (40) represents the expected number of transitions from state S_i to state S_j, with i ≠ j, over the total expected number of transitions from state S_i to any other state different from S_i.
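As a sketch, the reestimation of π and A^0 through (38)–(40) amounts to masking the diagonal with G and row-normalizing. The array layout below is illustrative, not the authors' code.

```python
import numpy as np

def reestimate_pi_A0(xi):
    """Reestimation of pi and A0 from xi_t(i,j), following (38)-(40).

    xi: (T-1, N, N) array, xi[t, i, j] = P(s_t = S_i, s_{t+1} = S_j | x, lambda).
    """
    N = xi.shape[1]
    gamma = xi.sum(axis=2)                        # (38): gamma_t(i) = sum_j xi_t(i,j)
    pi = gamma[0]                                 # (39)
    G = 1.0 - np.eye(N)                           # zero diagonal: recurrent transitions excluded
    num = xi.sum(axis=0) * G                      # expected i -> j transitions, i != j
    A0 = num / num.sum(axis=1, keepdims=True)     # (40): row-normalize
    return pi, A0
```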
Since the matrix A^0 is normalized, the stochastic constraints are satisfied at each iteration, that is, Σ_{j=1}^N ā^0_ij = 1 for each 1 ≤ i ≤ N, while the estimate of the prior probability π̄_i inherently sums up to 1 at each iteration, since it represents the expected frequency of state S_i at time t = 1, for each 1 ≤ i ≤ N.

With respect to the reestimation of the state duration parameters Θ, we first estimate the mean μ̄_{i,d} and the variance σ̄²_{i,d} of the ith state duration, for each 1 ≤ i ≤ N, from the forward and backward variables and the estimate of the state duration variable:

\[
\bar{\mu}_{i,d} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1,\, j \ne i}^{N} a_{ij}(d_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right) d_t(i)}{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1,\, j \ne i}^{N} a_{ij}(d_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right)}, \tag{41}
\]
\[
\bar{\sigma}^2_{i,d} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1,\, j \ne i}^{N} a_{ij}(d_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right) \left(d_t(i) - \bar{\mu}_{i,d}\right)^2}{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1,\, j \ne i}^{N} a_{ij}(d_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right)}, \tag{42}
\]
where (41) can be interpreted as the probability of a transition from state S_i to S_j, with i ≠ j, at time t, weighted by the duration of state S_i at t, giving the desired expected value; in (42) the same quantity is weighted by the squared distance of the duration at time t from its mean, giving the estimate of the variance.
Then the parameters of the desired duration distribution can be estimated from μ̄_{i,d} and σ̄²_{i,d}. For example, if a Gamma distribution with shape parameter ν and scale parameter η is chosen to model the state duration, the parameters ν_i and η_i, for each 1 ≤ i ≤ N, can be calculated as

\[
\nu_i = \frac{\bar{\mu}^2_{i,d}}{\bar{\sigma}^2_{i,d}}, \qquad \eta_i = \frac{\bar{\sigma}^2_{i,d}}{\bar{\mu}_{i,d}}.
\]
Concerning the observation parameters, once the modified forward and backward variables accounting for the state duration are defined as in (22) and (28), the reestimation formulas are the same as for Hidden Markov Models [13].

In particular, for continuous observations, the parameters of the Gaussian mixture defined in (9) are reestimated by first defining the probability of being in state S_j at time t, with the probability of the observation vector x_t evaluated by the kth mixture component, as

\[
\gamma_t(j,k) = \left[ \frac{\alpha_t(j)\,\beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)} \right] \cdot \left[ \frac{c_{jk}\,\mathcal{N}\!\left(\mathbf{x}_t;\, \mu_{jk},\, U_{jk}\right)}{\sum_{m=1}^{M} c_{jm}\,\mathcal{N}\!\left(\mathbf{x}_t;\, \mu_{jm},\, U_{jm}\right)} \right]. \tag{43}
\]
Using this quantity, the parameters c_jk, μ_jk, and U_jk are reestimated through the following formulas:

\[
\bar{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_t(j,m)}, \qquad
\bar{\mu}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, \mathbf{x}_t}{\sum_{t=1}^{T} \gamma_t(j,k)}, \qquad
\bar{U}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, \left(\mathbf{x}_t - \bar{\mu}_{jk}\right)\left(\mathbf{x}_t - \bar{\mu}_{jk}\right)^{T}}{\sum_{t=1}^{T} \gamma_t(j,k)}, \tag{44}
\]
where the superscript T denotes vector transpose.

For discrete observations, the reestimation formula for the observation matrix b_j(l) is

\[
\bar{b}_j(l) = \frac{\sum_{t=1,\; x_t = X_l}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \tag{45}
\]
where the quantity γ_t(j), which takes into account the duration-dependent forward variable α_t(j), is calculated through (17). The reader is referred to Rabiner's work [13] for the interpretation of the observation parameter reestimation formulas.
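A sketch of (45) follows; the array names are illustrative, and `gamma` stands for the duration-aware γ_t(j) of (17).

```python
import numpy as np

def reestimate_B(gamma, obs, L):
    """Reestimation of the discrete observation matrix following (45).

    gamma: (T, N) with gamma[t, j] = P(s_t = S_j | x, lambda).
    obs:   (T,) observed symbol indices in 0..L-1.
    Returns B with B[j, l] = bar{b}_j(l).
    """
    T, N = gamma.shape
    B = np.zeros((N, L))
    for l in range(L):
        # numerator: sum of gamma_t(j) over the times where x_t = X_l
        B[:, l] = gamma[obs == l].sum(axis=0)
    return B / gamma.sum(axis=0)[:, None]  # divide by sum_t gamma_t(j)
```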
3 AIC-Based Model Selection
In the framework of the proposed parametric HSMMs, the model selection procedure aims to select the optimal number of hidden states N, the right duration distribution family, and, in the case of mixture observation modeling, the number of Gaussian mixtures M to be used. In this work we make use of the Akaike Information Criterion (AIC). Indeed, it has been observed that, for complex models and a limited number of training observations, AIC represents a satisfactory methodology for model selection, outperforming other approaches such as the Bayesian Information Criterion.
In general, information criteria have a two-term structure: they trade off a measure of model fitness, based on the likelihood of the model, against a penalty term that accounts for model complexity. Usually, model complexity is measured in terms of the number of parameters to be estimated and the number of observations.
The Akaike Information Criterion is an estimate of the asymptotic value of the expected distance between the unknown true likelihood function of the data and the fitted likelihood function of the model. In particular, the AIC can be expressed as

\[
\text{AIC} = \frac{-\log L(\hat{\lambda}) + p}{T}, \tag{46}
\]

where L(λ̂) is the likelihood of the model with the estimated parameters, as defined in (25), p is the number of model parameters, and T is the length of the observed sequence. The best model is the one minimizing (46).
Concerning p, the number of parameters to be estimated for a parametric HSMM with N states is p = p_h + p_o, where p_h counts the parameters of the hidden state layer and p_o those of the observation layer.

In particular, p_h = (N − 1) + (N − 1) · N + z · N, where:

(i) N − 1 accounts for the prior probabilities π;
(ii) (N − 1) · N accounts for the nonrecurrent transition matrix A^0;
(iii) z · N accounts for the duration probability, z being the number of parameters θ of the duration distribution.

Concerning p_o, a distinction must be made between discrete and continuous observations:

(i) in the case of discrete observations with L possible observable values, p_o = (L − 1) · N, which accounts for the elements of the observation matrix B;
(ii) if the observations are continuous and a multivariate mixture of M Gaussians with O variates is used as observation model, p_o = [O · M · N] + [(O(O + 1)/2) · M · N] + [(M − 1) · N], where each term accounts, respectively, for the mean vectors μ, the covariance matrices U, and the mixture coefficients C.
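The parameter bookkeeping and the AIC of (46) can be sketched as follows (illustrative helper names; the continuous-observation count follows the three terms listed above, with full symmetric covariance matrices):

```python
def hsmm_num_params(N, z, discrete, L=None, M=None, O=None):
    """Parameter count p = p_h + p_o for the parametric HSMM, as in Section 3.

    N: number of states; z: duration-distribution parameters per state;
    L: alphabet size (discrete case); M: mixtures, O: variates (continuous case).
    """
    p_h = (N - 1) + (N - 1) * N + z * N
    if discrete:
        p_o = (L - 1) * N
    else:
        # mean vectors + symmetric covariances + mixture coefficients
        p_o = O * M * N + (O * (O + 1) // 2) * M * N + (M - 1) * N
    return p_h + p_o

def aic(log_likelihood, p, T):
    """AIC of (46): (-log L + p) / T; the best model minimizes this value."""
    return (-log_likelihood + p) / T
```

For example, a 5-state discrete HSMM with a 2-parameter duration density and L = 7 symbols has p_h = 4 + 20 + 10 = 34 and p_o = 30, so p = 64.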
4 Remaining Useful Lifetime Estimation
One of the most important advantages of the time modeling of HSMMs is the possibility to effectively face the prediction problem. The knowledge of the state duration distributions allows the estimation of the remaining time in a certain state and, in general, the prediction of the expected time D before entering a given state.
As already mentioned, an interesting application of the prediction problem is the Remaining Useful Lifetime (RUL) estimation of industrial equipment. Indeed, if each state of an HSMM is mapped to a different condition of an industrial machine, and if the state S_k that represents the failure condition is identified, at each moment the RUL can be defined as the expected time D to reach the failure state S_k. If we assume that the time to failure is a random variable D following a given probability density, we define the RUL at the current time t as

\[
\text{RUL}_t = \hat{D} = \mathbb{E}(D), \qquad s_{t+\hat{D}} = S_k, \quad s_{t+\hat{D}-1} = S_i, \quad 1 \le i, k \le N, \; i \ne k, \tag{47}
\]
where E denotes the expected value.

Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):

(1) estimation of the current state;
(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn times.
The estimation of the current state is performed via the Viterbi path, that is, the variable δ_t = [δ_t(i)]_{1≤i≤N} defined in (29). To correctly model the uncertainty of the current state estimate, we use the normalized variable δ̄_t(i), obtained as

\[
\bar{\delta}_t(i) = \max_{s_1, s_2, \ldots, s_{t-1}} \mathbb{P}\left(s_t = S_i \mid s_1 s_2 \cdots s_{t-1},\, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t,\, \lambda\right)
= \frac{\delta_t(i)}{\sum_{j=1}^{N} \delta_t(j)}, \qquad 1 \le i \le N, \tag{48}
\]

which is an estimate of the probability of being in state S_i at time t.

Together with the normalized variable δ̄_t(i), the maximum a posteriori estimate of the current state s*_t is taken into account, according to (34). If s*_t coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimate of the average remaining time in the current state, d_avg(s*_t), is calculated as

\[
d_{\text{avg}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} - d_t(i)\right) \odot \bar{\delta}_t(i), \tag{49}
\]
where μ_{d_i} denotes the expected value of the duration variable in state S_i, according to the duration distribution specified by the parameters θ_i. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration d_t(i) at time t from the expected sojourn time of state S_i, weighting the result by the uncertainty about the current state, δ̄_t(i), and finally summing up the contributions from all states.

In addition to the average remaining time, a lower and an upper bound can be calculated, based on the standard deviation σ_{d_i} of the duration distribution for state S_i:

\[
d_{\text{low}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} - \sigma_{d_i} - d_t(i)\right) \odot \bar{\delta}_t(i), \tag{50}
\]
\[
d_{\text{up}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} + \sigma_{d_i} - d_t(i)\right) \odot \bar{\delta}_t(i). \tag{51}
\]
Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimate:

\[
\bar{\delta}_{\text{next}} = \left[\bar{\delta}_{t+d}(i)\right]_{1 \le i \le N} = \left(\mathbf{A}^0\right)^{T} \cdot \bar{\delta}_t, \tag{52}
\]

while the maximum a posteriori estimate of the next state s*_next is calculated as

\[
s^*_{\text{next}} = s^*_{t+d} = \arg\max_{1 \le i \le N} \bar{\delta}_{t+d}(i). \tag{53}
\]
Again, if s*_{t+d} coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimate of the failure time is D_avg = d_avg(s*_t), calculated at the previous step, with the bound values D_low = d_low(s*_t) and D_up = d_up(s*_t). Otherwise, the estimate of the sojourn time of the next state is calculated as follows:

\[
d_{\text{avg}}(s^*_{t+d}) = \sum_{i=1}^{N} \mu_{d_i} \odot \bar{\delta}_{t+d}(i), \tag{54}
\]
\[
d_{\text{low}}(s^*_{t+d}) = \sum_{i=1}^{N} \left(\mu_{d_i} - \sigma_{d_i}\right) \odot \bar{\delta}_{t+d}(i), \tag{55}
\]
\[
d_{\text{up}}(s^*_{t+d}) = \sum_{i=1}^{N} \left(\mu_{d_i} + \sigma_{d_i}\right) \odot \bar{\delta}_{t+d}(i). \tag{56}
\]
This procedure is repeated until the failure state is encountered in the prediction of the next state. The RUL is then simply obtained by summing all the estimated remaining times in each intermediate state before encountering the failure state:

\[
D_{\text{avg}} = \sum d_{\text{avg}}, \tag{57}
\]
\[
D_{\text{low}} = \sum d_{\text{low}}, \tag{58}
\]
\[
D_{\text{up}} = \sum d_{\text{up}}. \tag{59}
\]
Finally, Algorithm 1 details the RUL estimation procedure described above.
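The iterative projection above can be sketched as follows. This is a minimal, non-authoritative illustration: it assumes a left-right model so the loop terminates, the array names are illustrative, and, matching (54)–(56), the elapsed-duration term d_t(i) is dropped once the projection moves past the current state.

```python
import numpy as np

def estimate_rul(delta_bar, d_t, mu_d, sigma_d, A0, failure_state):
    """Sketch of the RUL procedure of Section 4 / Algorithm 1.

    delta_bar:     (N,) normalized current-state probabilities, (48).
    d_t:           (N,) estimated elapsed state durations, (20).
    mu_d, sigma_d: (N,) duration means and standard deviations per state.
    A0:            (N, N) nonrecurrent transition matrix.
    Returns (D_avg, D_low, D_up) as in (57)-(59).
    """
    D_avg = D_low = D_up = 0.0
    s = int(delta_bar.argmax())                              # current MAP state, (34)
    d_hat = d_t.copy()                                       # elapsed duration, current state only
    while s != failure_state:
        D_avg += np.sum((mu_d - d_hat) * delta_bar)          # (49) / (54)
        D_low += np.sum((mu_d - sigma_d - d_hat) * delta_bar)  # (50) / (55)
        D_up  += np.sum((mu_d + sigma_d - d_hat) * delta_bar)  # (51) / (56)
        delta_bar = A0.T @ delta_bar                         # next-state probabilities, (52)
        s = int(delta_bar.argmax())                          # (53)
        d_hat = np.zeros_like(d_hat)                         # future states: no elapsed time yet
    return D_avg, D_low, D_up
```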
5 Experimental Results
To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and real data.
The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real-case data are monitoring data covering the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
Algorithm 1: The RUL estimation procedure.

(1) function RulEstimation(x_t, S_k)    ▷ x_t: the last observation acquired
(2)                                      ▷ S_k: the failure state
(3) Initialization:
(4)   D_avg ← 0
(5)   D_low ← 0
(6)   D_up ← 0
(7) Current state estimation:
(8)   Calculate δ̄_t       ▷ using (48)
(9)   Calculate s*_t       ▷ using (34)
(10)  Calculate d_t        ▷ using (20)
(11)  S ← s*_t
(12) Loop:
(13)  while S ≠ S_k do
(14)    Calculate d_avg    ▷ using (49) or (54)
(15)    Calculate d_low    ▷ using (50) or (55)
(16)    Calculate d_up     ▷ using (51) or (56)
(17)    D_avg ← D_avg + d_avg
(18)    D_low ← D_low + d_low
(19)    D_up ← D_up + d_up
(20)    Calculate δ̄_next   ▷ using (52)
(21)    S ← s*_next         ▷ using (53)
(22)  end while
(23)  return D_avg, D_low, D_up
5.1 Simulated Experiment. Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.
5.1.1 Data Generation. The industrial machine subject of these experiments has been modeled as a left-right parametric HSMM with N = 5 states, having state S_5 as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective for automatic model selection, online condition monitoring, and prediction problems. To this purpose, we divided the experiments into two cases, according to the nature of the observations, continuous or discrete.
For each of the continuous and discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains T = 650 observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and adapted to obtain an equivalent left-right parametric HSMM, as follows:
\[
\pi = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad
\mathbf{A}^0 = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix},
\]
\[
\Theta_{\mathcal{N}} = \left\{ \theta_1 = [100, 20],\; \theta_2 = [90, 15],\; \theta_3 = [100, 20],\; \theta_4 = [80, 25],\; \theta_5 = [200, 1] \right\},
\]
\[
\Theta_{G} = \left\{ \theta_1 = [500, 0.2],\; \theta_2 = [540, 0.1667],\; \theta_3 = [500, 0.2],\; \theta_4 = [256, 0.3125],\; \theta_5 = [800, 0.005] \right\},
\]
\[
\Theta_{W} = \left\{ \theta_1 = [102, 28],\; \theta_2 = [92, 29],\; \theta_3 = [102, 28],\; \theta_4 = [82, 20],\; \theta_5 = [200, 256] \right\}, \tag{60}
\]
where Θ_N, Θ_G, and Θ_W are the distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean μ_d and the variance σ²_d of the Gaussian distribution, the shape ν_d and the scale η_d of the Gamma distribution, and the scale a_d and the shape b_d of
(a) Example of simulated data for the continuous case: hidden state sequence, state duration, and observed signal. (b) Example of simulated data for the discrete case: hidden state sequence, state duration, and observed symbols.

Figure 2: The data generated with the parameters described in Section 5.1.1, both for the continuous and the discrete case.
the Weibull distribution. It must be noticed that, as explained in Section 2.1, for state S_5, being the absorbing state, the duration parameters θ_5 have no influence on the data, since once state S_5 is reached the system remains there forever.

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used, with the following parameters [15]:
\[
\mu_1 = \begin{bmatrix} 20 \\ 20 \end{bmatrix}, \quad
\mu_2 = \begin{bmatrix} 20 \\ 35 \end{bmatrix}, \quad
\mu_3 = \begin{bmatrix} 35 \\ 35 \end{bmatrix}, \quad
\mu_5 = \begin{bmatrix} 28 \\ 28 \end{bmatrix},
\]
\[
U_1 = \begin{bmatrix} 20 & 0 \\ 0 & 20 \end{bmatrix}, \quad
U_2 = \begin{bmatrix} 15 & 0 \\ 0 & 15 \end{bmatrix}, \quad
U_3 = \begin{bmatrix} 15 & -2 \\ -2 & 15 \end{bmatrix}, \quad
U_4 = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}, \quad
U_5 = \begin{bmatrix} 10 & 3 \\ 3 & 10 \end{bmatrix}, \tag{61}
\]
while for the discrete case, L = 7 distinct observation symbols have been taken into consideration, with the following observation probability distribution:

\[
B = \begin{bmatrix}
0.8 & 0.2 & 0 & 0 & 0 & 0 & 0 \\
0.1 & 0.8 & 0.1 & 0 & 0 & 0 & 0 \\
0 & 0.1 & 0.8 & 0.1 & 0 & 0 & 0 \\
0 & 0 & 0.1 & 0.7 & 0.1 & 0.1 & 0 \\
0 & 0 & 0 & 0.2 & 0.6 & 0.1 & 0.1
\end{bmatrix}. \tag{62}
\]
An example of the simulated data, both for the continuous and the discrete cases, is shown in Figure 2, where a Gaussian duration model has been used.
5.1.2 Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC for automatic model selection. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each with a different HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states N from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures M in the observation distribution from 1 to 4.
As accurate parameter initialization is crucial to obtain a good model fit [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters λ_0, has been executed for each of the considered HSMM structures. For each model structure, the AIC value, as defined in (46), has been evaluated. The final trained set of parameters λ* corresponding to the minimum AIC value has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.
The obtained results are shown in Figure 3 for both the continuous and discrete observation data. As can be noticed, in all 6 test cases of Figure 3 the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered the optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.
5.1.3 Condition Monitoring. The optimal parameters λ* obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all 6 considered HSMM configurations. In this experiment we simulate online monitoring by considering all the testing observations up to the current time, that is, x_1 x_2 ⋯ x_t. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state, s*_t = argmax_{1≤i≤N} [δ_t(i)], as specified in (34).
Figure 3: Akaike Information Criterion (AIC) values versus the number of states (2 to 8), for Gaussian, Gamma, and Weibull candidate duration distributions: (a) continuous data, Gaussian duration; (b) continuous data, Gamma duration; (c) continuous data, Weibull duration; (d) discrete data, Gaussian duration; (e) discrete data, Gamma duration; (f) discrete data, Weibull duration. AIC is effective for automatic model selection, since its minimum value recovers the number of states and the duration model used to generate the data.
An example of the condition monitoring experiment is shown in Figure 4, for both continuous and discrete observations. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second the state estimated by the Viterbi algorithm, and the third the observed time series.
Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of
Figure 4: Condition monitoring using the Viterbi path: true state sequence, estimated state sequence, and observations, with correct and wrong guesses marked. (a) State estimation for continuous data with a Gamma duration distribution (Viterbi path accuracy 0.985); (b) state estimation for discrete data with a Gaussian duration distribution (Viterbi path accuracy 0.992). HSMMs can effectively solve condition monitoring problems in time-dependent applications, due to their high accuracy in hidden state recognition.
the testing cases. The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.
5.1.4 Remaining Useful Lifetime Estimation. In this experimental phase we considered state S_5 as the failure state, together with the trained parameters λ* of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment in which we progressively consider the observations x_1 x_2 ⋯ x_t up to time t. When a new observation is acquired, after the current state probability δ̄_t(i) is estimated (Equation (48)), the average, upper, and lower RUL ((57), (58), and (59)) are calculated.
Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations with duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations with duration modeled by a Gamma distribution. From the figures one can notice that the average estimate, as well as the lower and upper bounds, converges to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty of the estimate decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and lower bounds become narrower as the failure state approaches, and the estimate becomes more precise, until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.
To quantitatively assess the performance of our methodology for RUL estimation, we considered, at each time t, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

\[
\text{APE}(t) = \left| \text{RUL}_{\text{real}}(t) - \widehat{\text{RUL}}(t) \right|, \tag{63}
\]

where RUL_real(t) is the (known) value of the RUL at time t, while RUL̂(t) is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

\[
\overline{\text{APE}} = \frac{\sum_{t=1}^{T} \text{APE}(t)}{T}, \tag{64}
\]

where T is the length of the testing signal. APE being a prediction error, values of (64) close to zero correspond to good predictive performance.
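Equations (63)-(64) reduce to a mean absolute error over the testing signal (the helper name is illustrative):

```python
import numpy as np

def average_ape(rul_real, rul_pred):
    """Average absolute prediction error of (63)-(64)."""
    ape = np.abs(np.asarray(rul_real, dtype=float)
                 - np.asarray(rul_pred, dtype=float))   # (63), per time step
    return ape.mean()                                   # (64)
```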
The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b), for the continuous and the discrete observation cases, respectively. As can be noticed, online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively faced with HSMMs, which achieve reliable estimation power with a small prediction error.
Finally, we tested our RUL estimation methodology using the state duration estimation of (16), introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), which report the prediction errors obtained for continuous and discrete observations, respectively.
Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms that of Azimi. This
Figure 5: True, upper, average, and lower RUL estimates over time: (a) continuous data with a Weibull duration distribution; (b) discrete data with a Gamma duration distribution. HSMMs effectively solve RUL estimation problems: the prediction converges to the actual RUL value, and its uncertainty decreases as the real failure time approaches.
Table 1: State recognition accuracy per test case, for the Gaussian, Gamma, and Weibull duration distributions: (a) continuous observations; (b) discrete observations.
is mainly due to the proposed average state duration estimator of (20), compared to that of Azimi, given by (16).
52 Real Data In this sectionwe apply the proposedHSMM-based approach for RUL estimation to a real case studyusing bearing monitoring data recorded during experimentscarried out on the Pronostia experimental platform andmadeavailable for the IEEE Prognostics and Health Management(PHM) 2012 challenge [49]The data correspond to normallydegraded bearings leading to cases which closely correspondto the industrial reality
The choice of testing the proposed methodology onbearings derives from two facts (i) bearings are the mostcritical components related to failures of rotating machines[50] and (ii) their monotonically increasing degradationpattern justifies the usage of left-right HSMMmodels
5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besancon, France), with the aim of collecting real data related to the accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51–59].
The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing
Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20). (a) APE of the RUL estimation for the continuous observation test cases. Columns: test case; APE (avg, up, low) for each duration distribution (Gaussian, Gamma, Weibull).
housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).
The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.
The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox, which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human-machine interface.
The load profiles part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.
Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32]. (a) APE of the RUL estimation for the continuous observation test cases. Columns: test case; APE (avg, up, low) for each duration distribution (Gaussian, Gamma, Weibull).
The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).
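As a quick arithmetic check, the stated snapshot sizes follow directly from the sampling rates (values as given above):

```python
# Pronostia acquisition settings as described in the text:
# vibration snapshots of 0.1 s sampled at 25.6 kHz, and temperature
# recorded continuously at 10 Hz.
vib_fs_hz = 25_600        # vibration sampling frequency (Hz)
vib_snapshot_s = 0.1      # snapshot length (s)
temp_fs_hz = 10           # temperature sampling frequency (Hz)

vib_samples = round(vib_fs_hz * vib_snapshot_s)  # samples per vibration snapshot
temp_samples_per_min = temp_fs_hz * 60           # temperature samples per minute

print(vib_samples)           # 2560
print(temp_samples_per_min)  # 600
```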
Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49]. Condition 1: 1800 rpm and 4000 N; Condition 2: 1650 rpm and 4200 N; Condition 3: 1500 rpm and 5000 N.
Figure 6: Global overview of the Pronostia experimental platform [19].
Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;

(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;

(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.
Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal exceeded 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.
Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).
As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].
In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots ($L = 2560$). Let $r_w(t)$ be the raw signal of the $w$th window; for each $w$ we estimate RMS as $x_{\mathrm{RMS},w} = \sqrt{(1/L)\sum_{t=1}^{L} r_w^2(t)}$ and kurtosis as $x_{\mathrm{KURT},w} = \bigl((1/L)\sum_{t=1}^{L}(r_w(t)-\bar{r}_w)^4\bigr) / \bigl((1/L)\sum_{t=1}^{L}(r_w(t)-\bar{r}_w)^2\bigr)^2$, where $\bar{r}_w$ is the mean of $r_w$. An example of feature extraction
for Bearing1_1 is shown in Figure 8. To assess the performance of the proposed HSMM after the model selection procedure, we implemented a leave-one-out cross-validation scheme: considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
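A minimal sketch of this windowed feature extraction, assuming NumPy (the function name is ours, not the paper's):

```python
import numpy as np

def window_features(raw, L=2560):
    """Split a raw vibration signal into non-overlapping windows of
    length L and compute the two time-domain features used above:
    RMS and (non-excess) kurtosis."""
    raw = np.asarray(raw, dtype=float)
    feats = []
    for w in range(len(raw) // L):
        r = raw[w * L:(w + 1) * L]
        rms = np.sqrt(np.mean(r ** 2))
        m = np.mean(r)
        # kurtosis: 4th central moment over the squared variance
        kurt = np.mean((r - m) ** 4) / np.mean((r - m) ** 2) ** 2
        feats.append((rms, kurt))
    return np.array(feats)

# Example on a pure sine (10 periods per window): RMS = 1/sqrt(2),
# kurtosis = 1.5 for a sinusoid.
feats = window_features(np.sin(2 * np.pi * np.arange(5120) / 256))
print(feats.shape)  # (2, 2)
```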
5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.
(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states N, from 2 to 6, and (iii) an increasing number of Gaussian mixtures M in the observation density, from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations λ0, on the data sets (Bearing1_1
Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].
Figure 8: Raw vibration data (a) versus RMS and kurtosis features (b) for Bearing1_1.
Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 Gaussian mixture for the observation density.
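The selection rule itself is simple: compute AIC = 2k − 2 ln L for every trained candidate and keep the minimum. A sketch, where the candidate structures, log-likelihoods, and parameter counts are invented for illustration (in the paper each candidate is first trained from 120 random initializations):

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical candidates (N states, duration family, M mixtures),
# each paired with an invented trained log-likelihood and its number
# of free parameters k.
candidates = {
    (3, "gamma",   1): (-1250.0, 21),
    (4, "weibull", 1): (-1180.0, 28),
    (5, "weibull", 2): (-1175.0, 47),
}

scores = {s: aic(ll, k) for s, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores[best])  # (4, 'weibull', 1) 2416.0
```

Note how the 5-state model has a slightly higher likelihood but loses on AIC: its extra parameters are not justified by the fit.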
(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross-validation scheme, by using for condition 1, at each iteration, Bearing1_i, 1 ≤ i ≤ 7, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters λ*_i were estimated for the ith testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state S_4 as the failure state. The same procedure has been performed for the bearings in condition 2. Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as
Figure 9: AIC values as a function of the number of states (from 2 to 6) for Gaussian, Gamma, and Weibull duration models: (a) AIC values for Condition 1; (b) AIC values for Condition 2. In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.
Figure 10: Online RUL estimation (true, upper, average, and lower RUL versus time, in seconds): (a) RUL estimation for Bearing1_7; (b) RUL estimation for Bearing2_6. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.
well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.
Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.
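Equation (64) is not reproduced in this excerpt; assuming the usual mean-absolute form of such a metric, it can be sketched as follows (function name and data are ours):

```python
def average_absolute_prediction_error(rul_pred, rul_true):
    """Average absolute prediction error over an online RUL trajectory:
    the mean of |predicted RUL - true RUL| over all prediction times
    (assumed form of the paper's equation (64), not reproduced here)."""
    assert len(rul_pred) == len(rul_true)
    return sum(abs(p - t) for p, t in zip(rul_pred, rul_true)) / len(rul_pred)

# Toy online trajectory: predictions tighten as failure approaches,
# but early, less accurate predictions still count in the average.
true_rul = [40, 30, 20, 10, 0]
pred_rul = [55, 38, 24, 11, 0]
print(average_absolute_prediction_error(pred_rul, true_rul))  # 5.6
```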
6 Conclusion and Future Work
In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.
The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required
Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.
parameters to estimate. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.
This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.
As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure, capable of adapting the model parameters to new conditions, will be considered in a future extension.
Appendix
In this appendix we give the derivation of the state duration variable introduced in (20) as

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1). \quad (A.1)$$
The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to current time $t$, assuming that the state at current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution

$$d_t(i) \sim f(d). \quad (A.2)$$
We can specify the probability that the system has been in state $i$ for $d$ time units prior to current time $t$, given the observations and the model parameters $\lambda$, and knowing that the current state is $i$, as

$$P(d_t(i) = d) = P(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t, \lambda). \quad (A.3)$$
We omit the conditioning on the model parameters $\lambda$ in the following equations, it being inherently implied. We are interested in deriving the estimator $\bar{d}$
because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, we can rewrite (A.13) as follows:
$$\bar{d}_{t+1}(i) = P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) \cdot (\bar{d}_t(i) + 1). \quad (A.15)$$
The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state $i$ in the previous step.
In order to transform (A.15) in terms of model parameters, for an easy numerical calculation of the induction for $\bar{d}$
The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as

$$P(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t) = \sum_{d_t} a_{ii}(d_t) \cdot P(d_t \mid \mathbf{x}_1, \ldots, \mathbf{x}_t) \approx a_{ii}(\bar{d}_t), \quad (A.20)$$
while the denominator of (A.19) can be expressed as follows:

$$P(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t) = \frac{P(\mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1})}{P(\mathbf{x}_1, \ldots, \mathbf{x}_t)} = \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}. \quad (A.21)$$
By substituting (A.20) and (A.21) in (A.19), we obtain

$$P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) = \frac{a_{ii}(\bar{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N} \alpha_{t+1}(i)}, \quad (A.22)$$
and then, by combining (A.22) and (A.16), we obtain

$$P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) = \frac{a_{ii}(\bar{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i) \sum_{i=1}^{N} \alpha_{t+1}(i)}. \quad (A.23)$$
Finally, by substituting (A.23) in (A.15) and considering that

$$\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)}, \quad (A.24)$$
we derive the induction formula for $\bar{d}_{t+1}(i)$ in terms of model parameters as

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1). \quad (A.25)$$
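For intuition, the update (A.25) can be exercised numerically alongside the forward recursion; a minimal sketch with scalar stand-ins for the forward variables α and the emission likelihood b (the function name is ours):

```python
def update_avg_duration(d_avg, a_ii, alpha_t, alpha_t1, b_x):
    """One step of induction (A.25): the new average duration in state i
    is (previous average + 1), weighted by the fraction of probability
    mass that stayed in state i, expressed via the forward variables
    alpha_t(i), alpha_{t+1}(i), the duration-dependent self-transition
    a_ii(d_t), and the emission likelihood b_i(x_{t+1})."""
    weight = a_ii * alpha_t * b_x / alpha_t1
    return weight * (d_avg + 1.0)

# When all probability mass stays in state i (weight = 1), the average
# duration simply increments by one time step:
print(update_avg_duration(4.0, a_ii=1.0, alpha_t=0.5, alpha_t1=0.2, b_x=0.4))  # 5.0
```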
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka and Kariya, Japan, October-November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines, prognostics, part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universitat zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Accountants, New York, NY, USA, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthen, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappe, E. Moulines, and T. Ryden, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and D. Juricic, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.
the underlying stochastic process of the state to be a semi-Markov chain, instead of the simple Markov chain of an HMM. Such a model is referred to as a Hidden Semi-Markov Model (HSMM) [21]. HSMMs and explicit duration models have been proven beneficial for many applications [22–25]. A complete overview of the different duration model classes has been made by Yu [26]. Most state duration models used in the literature are nonparametric discrete distributions [27–29]. As a consequence, the number of parameters that describe the model and that have to be estimated is high, and the learning procedure can therefore be computationally expensive for real, complex applications. Moreover, it is necessary to specify a priori the maximum duration allowed in each state.
To alleviate the high dimensionality of the parameter space, parametric duration models have been proposed. For example, Salfner [6] proposed a generic parametric continuous distribution to model the state sojourn time. However, in their model the observations have been assumed to be discrete, and the model was applied to recognize failure-prone observation sequences. Using continuous observations, Azimi et al. [30–32] specified an HSMM with a parametric duration distribution belonging to the Gamma family and modeled the observation process by a Gaussian.
Inspired by the latter two approaches, in this work we propose a generic specification of a parametric HSMM in which no constraints are made on the model of the state duration and on the observation processes. In our approach, the state duration is modeled as a generic parametric density function. On the other hand, the observations can be modeled either as a discrete stochastic process or as a continuous mixture of Gaussians; the latter has been shown to approximate arbitrarily closely any finite continuous density function [33]. The proposed model can thus be used in a wide range of applications and types of data. Moreover, in this paper we introduce a new and more effective estimator of the time spent by the system in a determinate state prior to the current time. To the best of our knowledge, apart from the above referred works, the literature on HSMMs applied to prognosis and predictive maintenance for industrial machines is limited [34]. Hence, the present work aims to show the effectiveness of the proposed duration model in solving condition monitoring and RUL estimation problems.
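The parameter-count advantage of a parametric duration density can be illustrated with a small sketch; the sojourn times below are invented, and method-of-moments fitting of a Gamma density stands in for the paper's EM-based learning:

```python
# Hypothetical sojourn times (in time steps) observed for one HSMM state.
durations = [8, 10, 11, 9, 12, 10, 13, 9, 11, 10]

n = len(durations)
mean = sum(durations) / n
var = sum((d - mean) ** 2 for d in durations) / n

# Parametric Gamma(k, theta) model by moment matching
# (mean = k * theta, var = k * theta**2): only 2 free parameters.
theta = var / mean
k = mean / theta

# A nonparametric discrete duration model instead needs one probability
# per possible duration, up to a preset maximum D_max.
D_max = 50
nonparametric_n_params = D_max - 1  # probabilities constrained to sum to 1

print(round(k, 1), round(theta, 3), nonparametric_n_params)  # 52.8 0.195 49
```

Two parameters versus dozens: this is the dimensionality reduction that makes the learning procedure cheaper, at the cost of assuming a distribution family (which the AIC-based selection described later chooses automatically).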
Dealing with state space models, and in particular with HSMMs, one should define the number of states, the correct family of duration density, and, in case of continuous observations, the adequate number of Gaussian mixtures. Such parameters play a prominent role, since the right model configuration is essential to enable an accurate modeling of the dynamic pattern and the covariance structure of the observed time series. The estimation of a satisfactory model configuration is referred to as model selection in the literature.
While several state-of-the-art approaches use expert knowledge to get insight into the model structure [15, 35, 36], an automated methodology for model selection is often required. In the literature, model selection has been deeply studied for a wide range of models. Among the existing methodologies, information-based techniques have been extensively analyzed, with satisfactory results.
Although the Bayesian Information Criterion (BIC) is particularly appropriate for finite mixture models [37, 38], the Akaike Information Criterion (AIC) has been demonstrated to outperform BIC when applied to more complex models and when the sample size is limited [39, 40], which is the case for the target application of this paper.
In this work, AIC is used to estimate the correct model configuration, with the final goal of automated HSMM model selection exploiting only the information available in the input data. While model selection techniques have been extensively used in the framework of Hidden Markov Models [41–43], to the best of our knowledge the present work is the first that proposes their application to duration models, and in particular to HSMMs.
In summary, the present work contributes to condition monitoring, predictive maintenance, and RUL estimation problems by:

(i) proposing a general Hidden Semi-Markov Model, applicable to continuous or discrete observations, with no constraints on the density function used to model the state duration;

(ii) proposing a more effective estimator of the state duration variable $d_t(i)$, that is, the time spent by the system in the $i$th state prior to the current time $t$;

(iii) adapting the learning, inference, and prediction algorithms considering the defined HSMM parameters and the proposed $d_t(i)$ estimator;

(iv) using the Akaike Information Criterion for automatic model selection.
The rest of the paper is organized as follows: in Section 2 we introduce the theory of the proposed HSMM, together with its learning, inference, and prediction algorithms. Section 3 gives a short theoretical overview of the Akaike Information Criterion. Section 4 presents the methodology used to estimate the Remaining Useful Lifetime using the proposed HSMM. In Section 5 experimental results are discussed. Conclusions and future research directions are given in Section 6.
2. Hidden Semi-Markov Models
Hidden Semi-Markov Models (HSMMs) introduce the concept of variable duration, which results in more accurate modeling power when the system being modeled shows a dependence on time.
In this section we give the specification of the proposed HSMM, for which we model the state duration with a parametric, state-dependent distribution. Compared to nonparametric modeling, this approach has two main advantages:

(i) the model is specified by a limited number of parameters; as a consequence, the learning procedure is computationally less expensive;

(ii) the model does not require a priori knowledge of the maximum sojourn time allowed in each state, this being inherently learnt through the duration distribution parameters.
Mathematical Problems in Engineering 3
2.1. Model Specification. A Hidden Semi-Markov Model is a doubly embedded stochastic model with an underlying stochastic process that is not observable (hidden) but can only be observed through another set of stochastic processes that produce the sequence of observations. An HSMM allows the underlying process to be a semi-Markov chain with a variable duration, or sojourn time, for each state. The key concept of HSMMs is that the semi-Markov property holds for this model: while in HMMs the Markov property implies that the value of the hidden state at time $t$ depends exclusively on its value at time $t-1$, in HSMMs the probability of transition from state $S_j$ to state $S_i$ at time $t$ depends on the duration spent in state $S_j$ prior to time $t$.

In the following we denote the number of states in the model as $N$, the individual states as $S = \{S_1, \ldots, S_N\}$, and the state at time $t$ as $s_t$. The semi-Markov property can be written as

$$P(s_{t+1} = S_i \mid s_t = S_j, \ldots, s_1 = S_k) = P(s_{t+1} = S_i \mid s_t = S_j, d_t(j)), \quad 1 \le i, j, k \le N, \tag{1}$$

where the duration variable $d_t(j)$ is defined as the time spent in state $S_j$ prior to time $t$.
Although the state duration is inherently discrete, in many studies [44, 45] it has been modeled with a continuous parametric density function. Similar to the work of Azimi et al. [30–32], in this paper we use the discrete counterpart of the chosen parametric probability density function (pdf). With this approximation, if we denote the pdf of the sojourn time in state $S_i$ as $f(x; \theta_i)$, where $\theta_i$ represents the set of parameters of the pdf relative to the $i$th state, the probability that the system stays in state $S_i$ for exactly $d$ time steps can be calculated as $\int_{d-1}^{d} f(x; \theta_i)\,dx$. Considering the HSMM formulation, we can generally denote the state-dependent duration distributions by the set of their parameters relative to each state, $\Theta = \{\theta_1, \ldots, \theta_N\}$.

Many related works on HSMMs [31, 32, 44, 45] consider $f(x; \theta_i)$ within the exponential family. In particular, Gamma distributions are often used in speech processing applications. In this work we do not impose a type of distribution function to model the duration. The only requirement is that the duration should be modeled with a positive-support function, negative durations being physically meaningless.
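As a small illustrative sketch (not the paper's code), the discretization above amounts to a difference of CDF values. An Erlang (integer-shape Gamma) sojourn density is assumed here purely for convenience, since its CDF has a closed form and needs no external libraries:

```python
# Sketch: discrete sojourn probabilities P(D = d) = F(d) - F(d-1) for an
# Erlang (integer-shape Gamma) duration density, an illustrative choice.
import math

def erlang_cdf(x, shape, scale):
    """CDF F(x; k, eta) of a Gamma distribution with integer shape k (Erlang)."""
    if x <= 0:
        return 0.0
    z = x / scale
    return 1.0 - math.exp(-z) * sum(z ** n / math.factorial(n) for n in range(shape))

def duration_pmf(d, shape, scale):
    """P(sojourn lasts exactly d steps) = integral of f(x; theta) over [d-1, d]."""
    return erlang_cdf(d, shape, scale) - erlang_cdf(d - 1, shape, scale)

# The discretized probabilities sum to 1 over all positive durations.
total = sum(duration_pmf(d, 3, 2.0) for d in range(1, 200))
print(round(total, 6))  # 1.0
```

Because the sum of the differences telescopes to the CDF itself, the discretized duration distribution is automatically a proper probability mass function on the positive integers.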
HSMMs also require the definition of a "dynamic" transition matrix as a consequence of the semi-Markov property. Differently from HMMs, in which a constant transition probability leads to a geometrically distributed state sojourn time, HSMMs explicitly define a transition matrix which, depending on the duration variable, has an increasing probability of changing state as time goes on. For convenience, we specify the state duration variable in the form of a vector $\mathbf{d}_t$ with dimensions $N \times 1$:

$$\mathbf{d}_t(j) = \begin{cases} d_t(j) & \text{if } s_t = S_j \\ 1 & \text{if } s_t \ne S_j \end{cases} \tag{2}$$

The quantity $d_t(j)$ can be calculated by induction from $d_{t-1}(j)$ as

$$d_t(j) = s_t(j) \cdot s_{t-1}(j) \cdot d_{t-1}(j) + 1, \tag{3}$$

where $s_t(j)$ is 1 if $s_t = S_j$ and 0 otherwise.
If we assume that at time $t$ the system is in state $S_i$, we can formally define the duration-dependent transition matrix as $A_{\mathbf{d}_t} = [a_{ij}(\mathbf{d}_t)]$ with

$$a_{ij}(\mathbf{d}_t) = P(s_{t+1} = S_j \mid s_t = S_i, d_t(i)), \quad 1 \le i, j \le N. \tag{4}$$

The specification of the model can be further simplified by observing that, at each time $t$, the matrix $A_{\mathbf{d}_t}$ can be decomposed into two terms: the recurrent and the nonrecurrent state transition probabilities.
The recurrent transition probabilities $P(\mathbf{d}_t) = [p_{ij}(\mathbf{d}_t)]$, which depend only on the duration vector $\mathbf{d}_t$ and the parameters $\Theta$, account for the dynamics of the self-transition probabilities. The probability of remaining in the current state at the next time step, given the duration spent in the current state prior to time $t$, is defined as

$$
\begin{aligned}
p_{ii}(\mathbf{d}_t) &= P(s_{t+1} = S_i \mid s_t = S_i, d_t(i)) \\
&= P(s_{t+1} = S_i \mid s_t = S_i, s_{t-1} = S_i, \ldots, s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \ne S_i) \\
&= \frac{P(s_{t+1} = S_i, s_t = S_i, \ldots, s_{t-d_t(i)+2} = S_i \mid s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \ne S_i)}{P(s_t = S_i, s_{t-1} = S_i, \ldots, s_{t-d_t(i)+2} = S_i \mid s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \ne S_i)}.
\end{aligned} \tag{5}
$$
The denominator in (5) can be expressed as $\sum_{k=1}^{\infty} P(s_{t+k} = S_i, s_{t+k-1} = S_i, \ldots, s_{t-d_t(i)+2} = S_i \mid s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \ne S_i)$, which is the probability that the system at time $t$ has been staying in state $S_i$ for at least $d_t(i) - 1$ time units. The above expression is equivalent to $1 - F(d_t(i) - 1; \theta_i)$, where $F(\cdot; \theta_i)$ is the duration cumulative distribution function relative to the state $S_i$, that is, $F(d; \theta) = \int_{-\infty}^{d} f(x; \theta)\,dx$. As a consequence, from (5) we can define the recurrent transition probabilities as a diagonal matrix with dimensions $N \times N$:

$$p_{ij}(\mathbf{d}_t) = \begin{cases} \dfrac{1 - F(d_t(i); \theta_i)}{1 - F(d_t(i) - 1; \theta_i)} & \text{if } i = j \\[2mm] 0 & \text{if } i \ne j \end{cases} \tag{6}$$
The cumulative distribution functions used in (6) tend to 1 as the duration tends to infinity; hence the probability of self-transition decreases as the sojourn time increases, leading the model to always leave the current state as time grows.
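To make (6) concrete, here is a small sketch using a Weibull duration CDF, an illustrative choice rather than one prescribed by the paper; its increasing hazard (for shape $k > 1$) makes the self-transition probability shrink with the sojourn time, exactly the behavior described above:

```python
# Sketch: self-transition probability p_ii(d) = (1 - F(d)) / (1 - F(d-1)) of (6),
# with a Weibull duration CDF F(x) = 1 - exp(-(x/eta)**k) as an illustrative choice.
import math

def p_self(d, k, eta):
    surv = lambda x: math.exp(-((x / eta) ** k)) if x > 0 else 1.0  # 1 - F(x)
    return surv(d) / surv(d - 1)

# With k > 1 (increasing hazard) the self-transition probability shrinks
# as the sojourn time grows, so the model eventually leaves the state.
probs = [p_self(d, 2.0, 5.0) for d in range(1, 11)]
print([round(p, 3) for p in probs])
```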
The nonrecurrent state transition probabilities $A^0 = [a^0_{ij}]$ rule the transitions between two different states. They are represented by an $N \times N$ matrix with the diagonal elements equal to zero, defined as

$$a^0_{ij} = \begin{cases} 0 & \text{if } i = j \\ P(s_{t+1} = S_j \mid s_t = S_i) & \text{if } i \ne j \end{cases} \tag{7}$$

$A^0$ must be specified as a stochastic matrix, that is, its elements have to satisfy the constraint $\sum_{j=1}^{N} a^0_{ij} = 1$ for all $i$.
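Combining the recurrent probabilities of (6) with a nonrecurrent matrix as in (7) yields the full duration-dependent transition matrix; a plain-Python sketch with illustrative 3-state numbers:

```python
# Sketch of the decomposition A_{d_t} = P(d_t) + (I - P(d_t)) A0,
# with P(d_t) diagonal; all numbers below are illustrative only.
def dynamic_transition(p_self, A0):
    """p_self[i] is the recurrent probability p_ii(d_t) from (6);
    A0 is the nonrecurrent matrix of (7): zero diagonal, stochastic rows."""
    N = len(p_self)
    return [[p_self[i] if i == j else (1.0 - p_self[i]) * A0[i][j]
             for j in range(N)]
            for i in range(N)]

A0 = [[0.0, 0.7, 0.3],
      [0.2, 0.0, 0.8],
      [0.5, 0.5, 0.0]]
p_self = [0.9, 0.6, 0.3]   # one recurrent probability per state
A = dynamic_transition(p_self, A0)
print([round(sum(row), 6) for row in A])  # [1.0, 1.0, 1.0]: rows stay stochastic
```

Since each row of $A^0$ sums to 1, each row of the combined matrix sums to $p_{ii} + (1 - p_{ii}) = 1$, so the dynamic matrix is stochastic by construction.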
As a consequence of the above decomposition, the dynamics of the underlying semi-Markov chain can be defined by specifying only the state-dependent duration parameters $\Theta$ and the nonrecurrent matrix $A^0$, since the model transition matrix can be calculated at each time $t$ using (6) and (7):

$$A_{\mathbf{d}_t} = P(\mathbf{d}_t) + (I - P(\mathbf{d}_t))\, A^0, \tag{8}$$

where $I$ is the identity matrix. If we denote the elements of the dynamic transition matrix $A_{\mathbf{d}_t}$ as $a_{ij}(\mathbf{d}_t)$, the stochastic constraint $\sum_{j=1}^{N} a_{ij}(\mathbf{d}_t) = 1$, for all $i$ and $t$, is guaranteed by the fact that $P(\mathbf{d}_t)$ is a diagonal matrix and $A^0$ is a stochastic
matrix.

For several applications it is necessary to model an absorbing state, which in the case of industrial equipment corresponds to the "broken" or "failure" state. If we denote the absorbing state as $S_k$, with $k \in [1, N]$, we must fix the $k$th row of the nonrecurrent matrix $A^0$ to $a^0_{kk} = 1$ and $a^0_{ki} = 0$ for all $1 \le i \le N$ with $i \ne k$. By substituting such an $A^0$ matrix in (8), it is easy to show that the element $a_{kk}(\mathbf{d}_t) = 1$ and remains constant for all $t$, while the duration parameters $\theta_k$ have no influence on the absorbing state $S_k$. An example of absorbing state specification will be given in Section 5.

With respect to the input observation signals, in this work
we consider both continuous and discrete data, adopting the suitable observation model depending on the nature of the observations. In particular, for the continuous case we model the observations with a multivariate mixture of Gaussian distributions. This choice presents two main advantages: (i) a multivariate model allows dealing with multiple observations at the same time, which is often the case when modeling industrial equipment, since at each time multiple sensor measurements are available; and (ii) mixtures of Gaussians have been proved to closely approximate any finite continuous density function [33]. Formally, if we denote by $\mathbf{x}_t$ the observation vector at time $t$ and the generic observation vector being modeled as $\mathbf{x}$, the observation density for the $j$th state is represented by a finite mixture of $M$ Gaussians:

$$b_j(\mathbf{x}) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_{jm}, \mathbf{U}_{jm}), \quad 1 \le j \le N, \tag{9}$$

where $c_{jm}$ is the mixture coefficient for the $m$th mixture in state $S_j$, which satisfies the stochastic constraints $\sum_{m=1}^{M} c_{jm} = 1$ for $1 \le j \le N$ and $c_{jm} \ge 0$ for $1 \le j \le N$, $1 \le m \le M$, while $\mathcal{N}$ is the Gaussian density with mean vector $\boldsymbol{\mu}_{jm}$ and covariance matrix $\mathbf{U}_{jm}$ for the $m$th mixture component in state $j$.
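A sketch of the mixture density (9), reduced to the univariate case for brevity; the weights, means, and variances below are illustrative, not fitted to any data:

```python
# Sketch: observation density b_j(x) = sum_m c_m N(x; mu_m, var_m), eq. (9),
# in the univariate case; the mixture parameters are made up for illustration.
import math

def gaussian(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def b(x, c, mu, var):
    """Mixture density for one state; the weights c must sum to 1."""
    return sum(cm * gaussian(x, m, v) for cm, m, v in zip(c, mu, var))

c, mu, var = [0.5, 0.3, 0.2], [0.0, 2.0, 5.0], [1.0, 0.5, 2.0]
# Being a convex combination of densities, b integrates to 1 (numerical check).
approx = sum(b(-8.0 + 0.01 * i, c, mu, var) for i in range(2200)) * 0.01
print(round(approx, 3))  # ≈ 1.0
```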
In case of discrete data, we model the observations within each state with a nonparametric discrete probability distribution. In particular, if $L$ is the number of distinct observation symbols per state, and if we denote the symbols as $X = \{X_1, \ldots, X_L\}$ and the observation at time $t$ as $x_t$, the observation symbol probability distribution can be defined as a matrix $B = [b_j(l)]$ of dimensions $N \times L$, where

$$b_j(l) = P[x_t = X_l \mid s_t = S_j], \quad 1 \le j \le N, \; 1 \le l \le L. \tag{10}$$

Since the system in each state at each time step emits one of the $L$ possible symbols, the matrix $B$ is stochastic, that is, it is constrained to $\sum_{l=1}^{L} b_j(l) = 1$ for all $1 \le j \le N$.
Finally, as in the case of HMMs, we specify the initial state distribution $\pi = \{\pi_i\}$, which defines the probability of the starting state:

$$\pi_i = P[s_1 = S_i], \quad 1 \le i \le N. \tag{11}$$

From the above considerations, two different HSMM models can be considered: in the case of continuous observations, $\lambda = (A^0, \Theta, C, \mu, U, \pi)$, while in the case of discrete observations the HSMM is characterized by $\lambda = (A^0, \Theta, B, \pi)$. An example of a continuous HSMM with 3 states is shown in Figure 1.
2.2. Learning and Inference Algorithms. Let us denote the generic sequence of observations, being indiscriminately continuous vectors or discrete symbols, as $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$. In order to use the defined HSMM model in practice, similarly to the HMM, we need to solve three basic problems:

(1) Given the observation $\mathbf{x}$ and a model $\lambda$, calculate the probability that the sequence $\mathbf{x}$ has been generated by the model $\lambda$, that is, $P(\mathbf{x} \mid \lambda)$.

(2) Given the observation $\mathbf{x}$ and a model $\lambda$, calculate the state sequence $S = s_1 s_2 \cdots s_T$ which has most probably generated the sequence $\mathbf{x}$.

(3) Given the observation $\mathbf{x}$, find the parameters of the model $\lambda$ which maximize $P(\mathbf{x} \mid \lambda)$.

As in the case of HMMs, solving the above problems requires the forward-backward [13], decoding (Viterbi [46] and Forney [47]), and Expectation-Maximization [48] algorithms, which will be adapted to the HSMM introduced in Section 2.1. In the following we also propose a more effective estimator of the state duration variable $d_t(i)$ defined in (2).
2.2.1. The Forward-Backward Algorithm. Given a generic sequence of observations $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$, the goal is to calculate the model likelihood, that is, $P(\mathbf{x} \mid \lambda)$. This quantity is useful for the training procedure, where the parameters that locally maximize the model likelihood are chosen, as well as for classification problems. The latter is the case in which the observation sequence $\mathbf{x}$ has to be mapped to one of a finite set of $C$ classes, represented by a set of HSMM parameters $L = \{\lambda_1, \ldots, \lambda_C\}$; the class of $\mathbf{x}$ is chosen such that $\lambda(\mathbf{x}) = \arg\max_{\lambda \in L} P(\mathbf{x} \mid \lambda)$.

To calculate the model likelihood, we first define the forward variable at each time $t$ as

$$\alpha_t(i) = P(\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, s_t = S_i \mid \lambda), \quad 1 \le i \le N. \tag{12}$$
[Figure 1: Graphical representation of an HSMM with three hidden states $S_1$, $S_2$, $S_3$, their transitions $a_{12}$, $a_{23}$, the state-dependent observation probabilities $P(o \mid S_1)$, $P(o \mid S_2)$, $P(o \mid S_3)$, and the sojourn-time distributions $d_1(u)$, $d_2(u)$, $d_3(u)$.]
Contrarily to HMMs, for HSMMs the state duration must be taken into account in the forward variable calculation. Consequently, Yu [26] proposed the following inductive formula:

$$\alpha_t(j, d) = \sum_{d'=1}^{D} \sum_{i=1}^{N} \alpha_{t-d'}(i, d')\, a^0_{ij}\, p_{jj}(d') \prod_{k=t-d+1}^{t} b_j(\mathbf{x}_k), \quad 1 \le j \le N, \; 1 \le t \le T, \tag{13}$$

that is, the sum of the probabilities of being in the current state $S_j$ (at time $t$) for the past $d'$ time units (with $1 \le d' \le D$ and $D$ the maximum allowed duration for each state), coming from all the possible previous states $S_i$, $1 \le i \le N$ and $i \ne j$.
The disadvantage of the above formulation is that, as discussed in the Introduction, the specification of the maximum duration $D$ represents a limitation to the model's generality. Moreover, from (13) it is clear that the computation and memory complexities drastically increase with $D$, which can be very large in many applications, in particular for online failure prediction.
To alleviate this problem, Azimi et al. [30–32] introduced a new forward algorithm for HSMMs that, by keeping track of the estimated average state duration at each iteration, has a computational complexity comparable to the forward algorithm for HMMs [13]. However, the average state duration is an approximation; consequently, Azimi's forward algorithm, compared with (13), pays the price of lower precision in exchange for an (indispensable) gain in computational efficiency.
To calculate the forward variable $\alpha_t(j)$ using Azimi's approach, the duration-dependent transition matrix defined in (8) is taken into consideration in the induction formula of (13), which becomes [30]

$$\alpha_t(j) = \left[ \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}(\bar{\mathbf{d}}_{t-1}) \right] b_j(\mathbf{x}_t). \tag{14}$$

To calculate the above formula, the average state duration of (2) must be estimated for each time $t$ by means of the variable $\bar{\mathbf{d}}_t = [\bar{d}_t(i)]$, defined as

$$\bar{d}_t(i) = E(d_t(i) \mid \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, s_t = S_i, \lambda), \quad 1 \le i \le N, \tag{15}$$

where $E$ denotes the expected value. To calculate this quantity, Azimi et al. [30–32] use the following formula:

$$\bar{\mathbf{d}}_t = \boldsymbol{\gamma}_{t-1} \odot \bar{\mathbf{d}}_{t-1} + 1, \tag{16}$$

where $\odot$ represents the element-by-element product between two matrices/vectors, and the vector $\boldsymbol{\gamma}_t = [\gamma_t(i)]$ (the probability of being in state $S_i$ at time $t$, given the observation sequence and the model parameters), with dimensions $N \times 1$, is calculated in terms of $\alpha_t(i)$ as

$$\gamma_t(i) = P(s_t = S_i \mid \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, \lambda) = \frac{\alpha_t(i)}{\sum_{j=1}^{N} \alpha_t(j)}, \quad 1 \le i \le N. \tag{17}$$
Equation (16) is based on the following induction formula [30–32], which rules the dynamics of the duration vector when the system's state is known:

$$d_t(i) = s_{t-1}(i) \cdot d_{t-1}(i) + 1, \tag{18}$$

where, for each $t$, $s_t(i)$ is 1 if $s_t = S_i$ and 0 otherwise.
A simple example shows that (18) is incorrect: assuming an HSMM with three states and considering the state sequence $(S_1, S_1, S_2, \ldots)$, the correct sequence of duration vectors is $\mathbf{d}_1 = [1, 1, 1]^T$, $\mathbf{d}_2 = [2, 1, 1]^T$, and $\mathbf{d}_3 = [1, 1, 1]^T$, where the superscript $T$ denotes vector transpose. If we apply (18), we instead obtain $\mathbf{d}_1 = [1, 1, 1]^T$, $\mathbf{d}_2 = [2, 1, 1]^T$, and $\mathbf{d}_3 = [3, 1, 1]^T$: the duration of $S_1$ keeps growing even though the system has left $S_1$, which is in contradiction with the definition of the state duration vector given in (2).

To calculate the average state duration variable $\bar{d}_t(i)$, we propose a new induction formula that estimates, for each time $t$, the time spent in the $i$th state prior to $t$:

$$\bar{d}_t(i) = P(s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t) \cdot (\bar{d}_{t-1}(i) + 1) \tag{19}$$

$$= \frac{a_{ii}(\bar{\mathbf{d}}_{t-1})\, \alpha_{t-1}(i)\, b_i(\mathbf{x}_t)}{\alpha_t(i)} \cdot (\bar{d}_{t-1}(i) + 1), \quad 1 \le i \le N. \tag{20}$$

The derivation of (20) is given in the Appendix. The intuition behind (19) is that the current average duration is the previous average duration plus one, weighted by the "amount" of the current state that was already in state $S_i$ at the previous step.
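The counterexample above can be checked mechanically. The following sketch compares the correct induction (3) with formula (18) on the state sequence $(S_1, S_1, S_2)$; state indices are 0-based:

```python
# Sketch comparing the correct duration induction (3),
#   d_t(j) = s_t(j) * s_{t-1}(j) * d_{t-1}(j) + 1,
# with the incorrect (18),
#   d_t(i) = s_{t-1}(i) * d_{t-1}(i) + 1,
# on the state sequence (S1, S1, S2).
def durations(states, n_states, use_eq3=True):
    d = [1] * n_states
    history = [d[:]]
    for t in range(1, len(states)):
        ind_t = [1 if states[t] == j else 0 for j in range(n_states)]
        ind_prev = [1 if states[t - 1] == j else 0 for j in range(n_states)]
        if use_eq3:
            d = [ind_t[j] * ind_prev[j] * d[j] + 1 for j in range(n_states)]
        else:
            d = [ind_prev[j] * d[j] + 1 for j in range(n_states)]
        history.append(d[:])
    return history

seq = [0, 0, 1]                          # S1, S1, S2 with 0-based indices
print(durations(seq, 3, use_eq3=True))   # [[1, 1, 1], [2, 1, 1], [1, 1, 1]]
print(durations(seq, 3, use_eq3=False))  # d_3(1) = 3: grows even after leaving S1
```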
Using the proposed (20), the forward algorithm can be specified as follows:

(1) Initialization, with $1 \le i \le N$:
$$\alpha_1(i) = \pi_i\, b_i(\mathbf{x}_1), \qquad \bar{d}_1(i) = 1, \qquad A_{\mathbf{d}_1} = P(\mathbf{d}_1) + (I - P(\mathbf{d}_1))\, A^0, \tag{21}$$
where $P(\mathbf{d}_1)$ is estimated using (6).

(2) Induction, with $1 \le j \le N$ and $1 \le t \le T - 1$:
$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t) \right] b_j(\mathbf{x}_{t+1}), \tag{22}$$
$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\, \alpha_t(i)\, b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1), \tag{23}$$
$$A_{\mathbf{d}_{t+1}} = P(\mathbf{d}_{t+1}) + (I - P(\mathbf{d}_{t+1}))\, A^0, \tag{24}$$
where the $a_{ij}(\bar{\mathbf{d}}_t)$ are the coefficients of the matrix $A_{\mathbf{d}_t}$.

(3) Termination:
$$P(\mathbf{x} \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i). \tag{25}$$
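As an illustrative sketch of the recursion (21)-(25) (not the authors' implementation), the following toy example uses a two-state, discrete-observation HSMM with Weibull sojourn densities; every number (initial distribution, nonrecurrent matrix, emissions, duration parameters) is made up for the demonstration:

```python
# Sketch: forward algorithm (21)-(25) for a 2-state discrete-observation HSMM.
# Weibull durations and all numeric parameters are illustrative assumptions.
import math

def surv(x, k, eta):
    """Weibull survival 1 - F(x); returns 1 for x <= 0."""
    return math.exp(-((x / eta) ** k)) if x > 0 else 1.0

def p_self(d, k, eta):
    """Recurrent probability of (6)."""
    return surv(d, k, eta) / surv(d - 1, k, eta)

def dyn_A(dbar, A0, dur):
    """Dynamic transition matrix of (8)/(24), row-stochastic by construction."""
    N = len(A0)
    return [[p_self(dbar[i], *dur[i]) if i == j
             else (1.0 - p_self(dbar[i], *dur[i])) * A0[i][j]
             for j in range(N)]
            for i in range(N)]

def forward(obs, pi, A0, B, dur):
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]        # eq. (21)
    dbar = [1.0] * N
    A = dyn_A(dbar, A0, dur)
    for t in range(1, len(obs)):
        new_alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                     for j in range(N)]                      # eq. (22)
        dbar = [A[i][i] * alpha[i] * B[i][obs[t]] / new_alpha[i] * (dbar[i] + 1.0)
                if new_alpha[i] > 0.0 else 1.0
                for i in range(N)]                           # eq. (23)
        alpha = new_alpha
        A = dyn_A(dbar, A0, dur)                             # eq. (24)
    return sum(alpha)                                        # eq. (25)

pi = [1.0, 0.0]
A0 = [[0.0, 1.0], [1.0, 0.0]]          # nonrecurrent matrix of (7)
B = [[0.8, 0.2], [0.1, 0.9]]           # discrete emissions, two symbols
dur = [(2.0, 4.0), (2.0, 6.0)]         # per-state Weibull (k, eta)
print(forward([0, 0, 0, 1, 1], pi, A0, B, dur))
```

Note how, unlike (13), the recursion visits each time step once and carries only the $N$-dimensional average duration vector, which is what makes the complexity comparable to the HMM forward algorithm.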
Similar considerations as for the forward procedure can be made for the backward algorithm, which is implemented by defining the variable $\beta_t(i)$ as

$$\beta_t(i) = P(\mathbf{x}_{t+1} \mathbf{x}_{t+2} \cdots \mathbf{x}_T \mid s_t = S_i, \lambda), \quad 1 \le i \le N. \tag{26}$$

Having estimated the dynamic transition matrix $A_{\mathbf{d}_t}$ for each $1 \le t \le T$ using (24), the backward variable can be calculated inductively as follows:

(1) Initialization:
$$\beta_T(i) = 1, \quad 1 \le i \le N. \tag{27}$$

(2) Induction:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1, \; 1 \le i \le N. \tag{28}$$
Although the variable $\beta_t(i)$ is not necessary for the calculation of the model likelihood, it will be useful in the parameter reestimation procedure, as explained in Section 2.2.3.
2.2.2. The Viterbi Algorithm. The Viterbi algorithm [46, 47] (also known as decoding) allows determining the best state sequence corresponding to a given observation sequence. Formally, given a sequence of observations $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$, the best state sequence $S^* = s^*_1 s^*_2 \cdots s^*_T$ corresponding to $\mathbf{x}$ is calculated by defining the variable $\delta_t(i)$ as

$$\delta_t(i) = \max_{s_1, s_2, \ldots, s_{t-1}} P(s_1 s_2 \cdots s_t = S_i, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t \mid \lambda). \tag{29}$$

The procedure to recursively calculate the variable $\delta_t(i)$ and to retrieve the target state sequence (i.e., the arguments which maximize the $\delta_t(i)$'s) for the proposed HSMM is a straightforward extension of the Viterbi algorithm for HMMs [13]. The only change is the usage, in the recursive calculation of $\delta_t(i)$, of the dynamic transition matrix $A_{\mathbf{d}_t} = [a_{ij}(\bar{\mathbf{d}}_t)]$ calculated through (24). The Viterbi algorithm for the introduced parametric HSMM can be summarized as follows:

(1) Initialization, with $1 \le i \le N$:
$$\delta_1(i) = \pi_i\, b_i(\mathbf{x}_1), \qquad \psi_1(i) = 0. \tag{30}$$

(2) Recursion, with $1 \le j \le N$ and $2 \le t \le T$:
$$\delta_t(j) = \max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij}(\bar{\mathbf{d}}_t) \right] b_j(\mathbf{x}_t), \tag{31}$$
$$\psi_t(j) = \arg\max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij}(\bar{\mathbf{d}}_t) \right]. \tag{32}$$

(3) Termination:
$$P^* = \max_{1 \le i \le N} [\delta_T(i)], \tag{33}$$
$$s^*_T = \arg\max_{1 \le i \le N} [\delta_T(i)], \tag{34}$$

where we keep track of the argument maximizing (31) using the vector $\psi_t$, which, tracked back, gives the desired best state sequence:

$$s^*_t = \psi_{t+1}(s^*_{t+1}), \quad t = T-1, T-2, \ldots, 1. \tag{35}$$
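The recursion (30)-(35) can be sketched as follows; for brevity the per-step dynamic matrices $A_{\mathbf{d}_t}$ of (24) are passed in precomputed, and here a single constant matrix with illustrative numbers stands in for them:

```python
# Sketch: Viterbi decoding (30)-(35); the dynamic matrices A_{d_t} are taken
# as precomputed inputs, and all numbers below are illustrative only.
def viterbi(obs, pi, B, A_seq):
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]        # eq. (30)
    psi = [[0] * N]
    for t in range(1, len(obs)):
        A = A_seq[t - 1]
        new_delta, back = [], []
        for j in range(N):
            scores = [delta[i] * A[i][j] for i in range(N)]
            best = max(range(N), key=lambda i: scores[i])
            new_delta.append(scores[best] * B[j][obs[t]])   # eq. (31)
            back.append(best)                                # eq. (32)
        delta = new_delta
        psi.append(back)
    path = [max(range(N), key=lambda i: delta[i])]           # eq. (34)
    for t in range(len(obs) - 1, 0, -1):                     # backtracking, (35)
        path.append(psi[t][path[-1]])
    return list(reversed(path))

pi = [0.6, 0.4]
B = [[0.9, 0.1], [0.2, 0.8]]
A = [[0.7, 0.3], [0.4, 0.6]]   # stand-in for A_{d_t}, kept constant for brevity
print(viterbi([0, 0, 1, 1], pi, B, [A, A, A]))  # [0, 0, 1, 1]
```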
2.2.3. The Training Algorithm. The training algorithm consists of estimating the model parameters from the observation data. As discussed in Section 2.1, a parametric HSMM is defined by $\lambda = (A^0, \Theta, C, \mu, U, \pi)$ if the observations are continuous, or $\lambda = (A^0, \Theta, B, \pi)$ if the observations are discrete. Given a generic observation sequence $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$, referred to as the training set in the following, the training procedure consists of finding the model parameter set $\lambda^*$ which locally maximizes the model likelihood $P(\mathbf{x} \mid \lambda)$.

We use the modified Baum-Welch algorithm of Azimi et al. [30–32]. However, in our implementation we make no assumption on the density function used to model the state duration, and we consider both continuous and discrete observations.

Being a variant of the more general Expectation-Maximization (EM) algorithm, Baum-Welch is an iterative procedure which consists of two steps: (i) the expectation step, in which the forward and backward variables are calculated and the model likelihood is estimated, and (ii) the maximization step, in which the model parameters are updated and used in the next iteration. This process usually starts from a random guess of the model parameters $\lambda_0$, and it is iterated until the likelihood function no longer improves between two consecutive iterations.
Similarly to HMMs, the reestimation formulas are derived by first introducing the variable $\xi_t(i,j)$, which represents the probability of being in state $S_i$ at time $t$ and in state $S_j$ at time $t+1$, given the model and the observation sequence:

$$\xi_t(i,j) = P(s_t = S_i, s_{t+1} = S_j \mid \mathbf{x}, \lambda). \tag{36}$$

In the HSMM case, however, the variable $\xi_t(i,j)$ incorporates the duration estimation performed in the forward algorithm (see (24)). Formulated in terms of the forward and backward variables, it is given by

$$\xi_t(i,j) = \frac{P(s_t = S_i, s_{t+1} = S_j, \mathbf{x} \mid \lambda)}{P(\mathbf{x} \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}. \tag{37}$$
From $\xi_t(i,j)$ we can derive the quantity $\gamma_t(i)$ (already defined in (17)), representing the probability of being in state $S_i$ at time $t$, given the observation sequence and the model parameters:

$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j). \tag{38}$$
Finally, the reestimation formulas for the parameters $\pi$ and $A^0$ are given by

$$\bar{\pi}_i = \gamma_1(i), \tag{39}$$

$$\bar{a}^0_{ij} = \frac{\left( \sum_{t=1}^{T-1} \xi_t(i,j) \right) \odot G}{\sum_{j=1}^{N} \left( \sum_{t=1}^{T-1} \xi_t(i,j) \right) \odot G}, \tag{40}$$

where $G = [g_{ij}]$ is a square matrix of dimensions $N \times N$ with $g_{ij} = 0$ for $i = j$ and $g_{ij} = 1$ for $i \ne j$, and $\odot$ represents the element-by-element product between two matrices; $\sum_{t=1}^{T-1} \gamma_t(i)$ is the expected number of transitions from state $S_i$, and $\sum_{t=1}^{T-1} \xi_t(i,j)$ is the expected number of transitions from state $S_i$ to state $S_j$.

Equation (39) represents the expected number of times that the model starts in state $S_i$, while (40) represents the expected number of transitions from state $S_i$ to state $S_j$, with $i \ne j$, over the total expected number of transitions from state $S_i$ to any state different from $S_i$.

The matrix $A^0$ being normalized, the stochastic constraints are satisfied at each iteration, that is, $\sum_{j=1}^{N} \bar{a}^0_{ij} = 1$ for each $1 \le i \le N$, while the estimated prior probability $\bar{\pi}_i$ inherently sums to 1 at each iteration, since it represents the expected frequency of state $S_i$ at time $t = 1$,
for each $1 \le i \le N$.

With respect to the reestimation of the state duration parameters $\Theta$, we first estimate the mean $\bar{\mu}_{i,d}$ and the variance $\bar{\sigma}^2_{i,d}$ of the $i$th state duration, for each $1 \le i \le N$, from the forward and backward variables and the estimate of the state duration variable:

$$\bar{\mu}_{i,d} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \ne i}^{N} a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right) \bar{d}_t(i)}{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \ne i}^{N} a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right)}, \tag{41}$$

$$\bar{\sigma}^2_{i,d} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \ne i}^{N} a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right) (\bar{d}_t(i) - \bar{\mu}_{i,d})^2}{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \ne i}^{N} a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right)}, \tag{42}$$

where (41) can be interpreted as the probability of a transition from state $S_i$ to $S_j$, with $i \ne j$, at time $t$, weighted by the duration of state $S_i$ at $t$, giving the desired expected value; in (42) the same quantity is weighted by the squared distance of the duration at time $t$ from its mean, giving the estimate of the variance.

The parameters of the desired duration distribution can then be estimated from $\bar{\mu}_{i,d}$ and $\bar{\sigma}^2_{i,d}$. For example, if a Gamma distribution with shape parameter $\nu$ and scale parameter $\eta$ is chosen to model the state duration, the parameters $\nu_i$ and $\eta_i$, for each $1 \le i \le N$, can be calculated as $\nu_i = \bar{\mu}^2_{i,d} / \bar{\sigma}^2_{i,d}$ and $\eta_i = \bar{\sigma}^2_{i,d} / \bar{\mu}_{i,d}$.
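The Gamma moment-matching step can be sketched directly; a Gamma$(\nu, \eta)$ distribution has mean $\nu\eta$ and variance $\nu\eta^2$, so the conversion recovers the estimated moments exactly:

```python
# Sketch: recovering Gamma shape nu and scale eta from the duration mean and
# variance reestimated in (41)-(42). The input moments are illustrative.
def gamma_from_moments(mean, var):
    nu = mean ** 2 / var   # shape
    eta = var / mean       # scale
    return nu, eta

nu, eta = gamma_from_moments(mean=12.0, var=9.0)
# A Gamma(nu, eta) has mean nu*eta and variance nu*eta**2,
# so the original moments are recovered exactly.
print(nu * eta, nu * eta ** 2)  # 12.0 9.0
```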
Concerning the observation parameters, once the modified forward and backward variables accounting for the state duration are defined as in (22) and (28), the reestimation formulas are the same as for Hidden Markov Models [13].

In particular, for continuous observations, the parameters of the Gaussian mixture defined in (9) are reestimated by first defining the joint probability of being in state $S_j$ at time $t$ and of the observation vector $\mathbf{x}_t$ being evaluated by the $k$th mixture component:

$$\gamma_t(j,k) = \left[ \frac{\alpha_t(j)\, \beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)} \right] \left[ \frac{c_{jk}\, \mathcal{N}(\mathbf{x}_t; \boldsymbol{\mu}_{jk}, \mathbf{U}_{jk})}{\sum_{m=1}^{M} c_{jm}\, \mathcal{N}(\mathbf{x}_t; \boldsymbol{\mu}_{jm}, \mathbf{U}_{jm})} \right]. \tag{43}$$

Using this quantity, the parameters $c_{jk}$, $\boldsymbol{\mu}_{jk}$, and $\mathbf{U}_{jk}$ are reestimated through the following formulas:

$$\bar{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_t(j,m)}, \qquad
\bar{\boldsymbol{\mu}}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, \mathbf{x}_t}{\sum_{t=1}^{T} \gamma_t(j,k)}, \qquad
\bar{\mathbf{U}}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, (\mathbf{x}_t - \bar{\boldsymbol{\mu}}_{jk})(\mathbf{x}_t - \bar{\boldsymbol{\mu}}_{jk})^T}{\sum_{t=1}^{T} \gamma_t(j,k)}, \tag{44}$$
where superscript 119879 denotes vector transposeFor discrete observations the reestimation formula for
the observation matrix 119887119895(119897) is
119887119895(119897) =
sum119879
119905=1with119909119905=119883119897
120574119905(119895)
sum119879
119905=1120574119905(119895)
(45)
where the quantity 120574119905(119895) which takes into account the dura-
tion dependent forward variable 120572119905(119895) is calculated through
(17)The reader is referred to Rabinerrsquos work [13] for the inter-
pretation on the observation parameters reestimation formu-las
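A minimal sketch of this M-step follows, simplified to scalar observations; `gamma_state` is a hypothetical array holding the duration-dependent state posterior (the first bracket of (43)), and all names are illustrative rather than the authors' implementation:

```python
import math

def gauss_pdf(x, mu, var):
    """Univariate Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def reestimate_mixture(gamma_state, obs, c, mu, var):
    """One reestimation pass for the mixture parameters of a single state j.

    gamma_state[t] -- P(state j at time t) from forward-backward (hypothetical input)
    obs[t]         -- scalar observations (1-D stand-in for x_t)
    c, mu, var     -- current mixture weights, means, variances (length M)
    """
    T, M = len(obs), len(c)
    # gamma_t(j, k): split the state posterior across mixture components,
    # as in the second bracket of (43).
    g = [[0.0] * M for _ in range(T)]
    for t in range(T):
        like = [c[k] * gauss_pdf(obs[t], mu[k], var[k]) for k in range(M)]
        z = sum(like)
        for k in range(M):
            g[t][k] = gamma_state[t] * like[k] / z
    # Reestimation formulas (44), written for scalar observations.
    denom = [sum(g[t][k] for t in range(T)) for k in range(M)]
    total = sum(denom)
    new_c = [denom[k] / total for k in range(M)]
    new_mu = [sum(g[t][k] * obs[t] for t in range(T)) / denom[k] for k in range(M)]
    new_var = [sum(g[t][k] * (obs[t] - new_mu[k]) ** 2 for t in range(T)) / denom[k]
               for k in range(M)]
    return new_c, new_mu, new_var
```

The multivariate case of (44) replaces the squared deviation by the outer product $(\mathbf{x}_t - \boldsymbol{\mu}_{jk})(\mathbf{x}_t - \boldsymbol{\mu}_{jk})^{T}$.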
3. AIC-Based Model Selection

In the framework of the proposed parametric HSMMs, the model selection procedure aims to select the optimal number of hidden states $N$, the right duration distribution family, and, in the case of mixture observation modeling, the number of Gaussian mixtures $M$ to be used. In this work we make use of the Akaike Information Criterion (AIC). Indeed, it has been shown that, in the case of complex models and in the presence of a limited number of training observations, AIC represents a satisfactory methodology for model selection, outperforming other approaches like the Bayesian Information Criterion.

In general, information criteria are represented as a two-term structure. They account for a compromise between a measure of model fitness, which is based on the likelihood of the model, and a penalty term, which takes into account the model complexity. Usually the model complexity is measured in terms of the number of parameters that have to be estimated and in terms of the number of observations.

The Akaike Information Criterion is an estimate of the asymptotic value of the expected distance between the unknown true likelihood function of the data and the fitted likelihood function of the model. In particular, the AIC can be expressed as

$$\mathrm{AIC} = \frac{-\log L(\hat{\lambda}) + p}{T}, \quad (46)$$

where $L(\hat{\lambda})$ is the likelihood of the model with the estimated parameters as defined in (25), $p$ is the number of model parameters, and $T$ is the length of the observed sequence. The best model is the one minimizing (46).
Concerning $p$, the number of parameters to be estimated for a parametric HSMM with $N$ states is $p = p_h + p_o$, where $p_h$ counts the parameters of the hidden states layer, while $p_o$ counts those of the observation layer.

In particular, $p_h = (N-1) + (N-1) \cdot N + z \cdot N$, where:

(i) $N-1$ accounts for the prior probabilities $\pi$;
(ii) $(N-1) \cdot N$ accounts for the nonrecurrent transition matrix $A_0$;
(iii) $z \cdot N$ accounts for the duration probability, $z$ being the number of parameters $\theta$ of the duration distribution.

Concerning $p_o$, a distinction must be made between discrete and continuous observations:

(i) in the case of discrete observations with $L$ possible observable values, $p_o = (L-1) \cdot N$, which accounts for the elements of the observation matrix $B$;
(ii) if the observations are continuous and a multivariate mixture of $M$ Gaussians with $O$ variates is used as observation model, $p_o = [M \cdot O + M \cdot O(O+1)/2 + (M-1)] \cdot N$, where each term accounts, respectively, for the mean vectors $\mu$, the covariance matrices $U$, and the mixture coefficients $C$.
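The parameter counting and the criterion of (46) can be sketched as below. The full-covariance term $M \cdot O(O+1)/2$ is a standard symmetric-matrix counting assumption (the corresponding terms are truncated in the source), and the function names are hypothetical:

```python
def hsmm_param_count(N, z, discrete, L=None, M=None, O=None):
    """Number of free parameters p = p_h + p_o of a parametric HSMM.

    N: number of states; z: parameters per duration distribution (e.g. 2
    for Gamma). discrete=True uses an L-symbol observation matrix;
    otherwise a mixture of M Gaussians with O variates (full covariances
    assumed) is counted.
    """
    p_h = (N - 1) + (N - 1) * N + z * N
    if discrete:
        p_o = (L - 1) * N
    else:
        p_o = (M * O + M * O * (O + 1) // 2 + (M - 1)) * N
    return p_h + p_o

def aic(log_likelihood, p, T):
    """Per-sample AIC of (46); the best model minimizes this value."""
    return (-log_likelihood + p) / T
```

For the simulated setting of Section 5.1.1 ($N = 5$, $z = 2$, $L = 7$ discrete symbols, or $M = 1$ bivariate Gaussian), this gives $p = 64$ and $p = 59$, respectively.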
4. Remaining Useful Lifetime Estimation

One of the most important advantages of the time modeling of HSMMs is the possibility to effectively face the prediction problem. The knowledge of the state duration distributions allows the estimation of the remaining time in a certain state and, in general, the prediction of the expected time $D$ before entering a determinate state.

As already mentioned, an interesting application of the prediction problem is the Remaining Useful Lifetime (RUL) estimation of industrial equipment. Indeed, if each state of an HSMM is mapped to a different condition of an industrial machine, and if the state $S_k$ that represents the failure condition is identified, at each moment the RUL can be defined as the expected time $D$ to reach the failure state $S_k$. If we assume that the time to failure is a random variable $D$ following a determinate probability density, we define the RUL at the current time $t$ as

$$\mathrm{RUL}_t = \hat{D} = \mathbb{E}\left(D \mid s_{t+D} = S_k,\ s_{t+D-1} = S_i\right), \quad 1 \le i, k \le N,\ i \ne k, \quad (47)$$

where $\mathbb{E}$ denotes the expected value.

Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):

(1) estimation of the current state;
(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn times.

The estimation of the current state is performed via the Viterbi path, that is, the variable $\delta_t = [\delta_t(i)]_{1 \le i \le N}$ defined in (29). To correctly model the uncertainty of the current state estimation, we use the normalized variable $\bar{\delta}_t(i)$, obtained as

$$\bar{\delta}_t(i) = \max_{s_1 s_2 \cdots s_{t-1}} \mathrm{P}\left(s_t = S_i \mid s_1 s_2 \cdots s_{t-1}, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, \lambda\right) = \frac{\delta_t(i)}{\sum_{j=1}^{N} \delta_t(j)}, \quad 1 \le i \le N, \quad (48)$$

which is an estimate of the probability of being in state $S_i$ at time $t$.

Together with the normalized variable $\bar{\delta}_t(i)$, the maximum a posteriori estimate of the current state $s^*_t$ is taken into account, according to (34). If $s^*_t$ coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimation of the average remaining time in the current state, $d_{\mathrm{avg}}(s^*_t)$, is calculated as

$$d_{\mathrm{avg}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} - d_t(i)\right) \odot \bar{\delta}_t(i), \quad (49)$$

where with $\mu_{d_i}$ we denote the expected value of the duration variable in state $S_i$, according to the duration distribution specified by the parameters $\theta_i$. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration $d_t(i)$ at time $t$ from the expected sojourn time of state $S_i$, weighting the result by the uncertainty about the current state $\bar{\delta}_t(i)$, and finally summing up all the contributions from each state.

In addition to the average remaining time, a lower and an upper bound value can be calculated, based on the standard deviation $\sigma_{d_i}$ of the duration distribution for state $S_i$:

$$d_{\mathrm{low}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} - \sigma_{d_i} - d_t(i)\right) \odot \bar{\delta}_t(i), \quad (50)$$

$$d_{\mathrm{up}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} + \sigma_{d_i} - d_t(i)\right) \odot \bar{\delta}_t(i). \quad (51)$$
Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimation, as follows:

$$\bar{\delta}_{\mathrm{next}} = \left[\bar{\delta}_{t+d}(i)\right]_{1 \le i \le N} = \left(A_0\right)^{T} \cdot \bar{\delta}_t, \quad (52)$$

while the maximum a posteriori estimate of the next state $s^*_{\mathrm{next}}$ is calculated as

$$s^*_{\mathrm{next}} = s^*_{t+d} = \operatorname*{argmax}_{1 \le i \le N} \bar{\delta}_{t+d}(i). \quad (53)$$

Again, if $s^*_{t+d}$ coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimation of the failure time is $D_{\mathrm{avg}} = d_{\mathrm{avg}}(s^*_t)$, calculated at the previous step, with the bound values $D_{\mathrm{low}} = d_{\mathrm{low}}(s^*_t)$ and $D_{\mathrm{up}} = d_{\mathrm{up}}(s^*_t)$. Otherwise, the estimation of the sojourn time of the next state is calculated as follows:

$$d_{\mathrm{avg}}(s^*_{t+d}) = \sum_{i=1}^{N} \mu_{d_i} \odot \bar{\delta}_{t+d}(i), \quad (54)$$

$$d_{\mathrm{low}}(s^*_{t+d}) = \sum_{i=1}^{N} \left(\mu_{d_i} - \sigma_{d_i}\right) \odot \bar{\delta}_{t+d}(i), \quad (55)$$

$$d_{\mathrm{up}}(s^*_{t+d}) = \sum_{i=1}^{N} \left(\mu_{d_i} + \sigma_{d_i}\right) \odot \bar{\delta}_{t+d}(i). \quad (56)$$
This procedure is repeated until the failure state is encountered in the prediction of the next state. The calculation of the RUL is then simply obtained by summing all the estimated remaining times in each intermediate state before encountering the failure state:

$$D_{\mathrm{avg}} = \sum d_{\mathrm{avg}}, \quad (57)$$

$$D_{\mathrm{low}} = \sum d_{\mathrm{low}}, \quad (58)$$

$$D_{\mathrm{up}} = \sum d_{\mathrm{up}}. \quad (59)$$

Finally, Algorithm 1 details the above described RUL estimation procedure.
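The projection loop of (49)-(59) can be sketched as follows, under the assumption that the failure state is reachable from every state (as in a left-right model); all names are hypothetical stand-ins for the quantities defined above:

```python
def estimate_rul(delta, d_t, mu_d, sigma_d, A0, failure_state):
    """Sketch of the online RUL procedure.

    delta[i]   -- normalized current-state probabilities, as in (48)
    d_t[i]     -- duration already spent in each state, as in (20)
    mu_d[i], sigma_d[i] -- duration mean / std. deviation per state
    A0         -- nonrecurrent transition matrix (rows sum to 1)
    Returns (D_avg, D_low, D_up) as in (57)-(59).
    """
    N = len(delta)
    s = max(range(N), key=lambda i: delta[i])  # MAP state, as in (34)
    if s == failure_state:
        return 0.0, 0.0, 0.0
    # Remaining time in the current state, (49)-(51).
    D_avg = sum((mu_d[i] - d_t[i]) * delta[i] for i in range(N))
    D_low = sum((mu_d[i] - sigma_d[i] - d_t[i]) * delta[i] for i in range(N))
    D_up = sum((mu_d[i] + sigma_d[i] - d_t[i]) * delta[i] for i in range(N))
    while True:
        # Propagate the state probabilities one transition ahead, (52)-(53).
        delta = [sum(A0[i][j] * delta[i] for i in range(N)) for j in range(N)]
        s = max(range(N), key=lambda i: delta[i])
        if s == failure_state:
            return D_avg, D_low, D_up
        # Full expected sojourn time of the projected state, (54)-(56).
        D_avg += sum(mu_d[i] * delta[i] for i in range(N))
        D_low += sum((mu_d[i] - sigma_d[i]) * delta[i] for i in range(N))
        D_up += sum((mu_d[i] + sigma_d[i]) * delta[i] for i in range(N))
```

For a three-state left-right chain with failure state 3, currently 10 time steps into a state with expected duration 100, the routine adds the residual time in the current state and the full expected sojourn of the intermediate state.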
5. Experimental Results

To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and on real data.

The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real case data are monitoring data related to the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
Algorithm 1: Online RUL estimation.

(1) function RulEstimation(x_t, S_k)    ⊳ x_t: the last observation acquired
(2)                                     ⊳ S_k: the failure state
(3)   Initialization:
(4)   D_avg ← 0
(5)   D_low ← 0
(6)   D_up ← 0
(7)   Current state estimation:
(8)   Calculate δ̄_t                    ⊳ using (48)
(9)   Calculate s*_t                    ⊳ using (34)
(10)  Calculate d_t                     ⊳ using (20)
(11)  S ← s*_t
(12)  Loop:
(13)  while S ≠ S_k do
(14)    Calculate d_avg                 ⊳ using (49) or (54)
(15)    Calculate d_low                 ⊳ using (50) or (55)
(16)    Calculate d_up                  ⊳ using (51) or (56)
(17)    D_avg ← D_avg + d_avg
(18)    D_low ← D_low + d_low
(19)    D_up ← D_up + d_up
(20)    Calculate δ̄_next and s*_next   ⊳ using (52) and (53)
(21)    S ← s*_next
(22)  end while
(23)  return D_avg, D_low, D_up
5.1. Simulated Experiment. Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.

5.1.1. Data Generation. The industrial machine subject of these experiments has been modeled as a left-right parametric HSMM with $N = 5$ states, having state $S_5$ as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity reasons, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective to solve automatic model selection, online condition monitoring, and prediction problems. For this purpose, we divided the experiments into two cases, according to the nature of the observations, continuous or discrete.

For each of the continuous and the discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains $T = 650$ observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and are adapted to obtain an equivalent left-right parametric HSMM, as follows:
$$\pi = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad A_0 = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix},$$

$$\Theta_{\mathcal{N}} = \left\{\theta_1 = [100, 20],\ \theta_2 = [90, 15],\ \theta_3 = [100, 20],\ \theta_4 = [80, 25],\ \theta_5 = [200, 1]\right\},$$

$$\Theta_{G} = \left\{\theta_1 = [500, 0.2],\ \theta_2 = [540, 0.1667],\ \theta_3 = [500, 0.2],\ \theta_4 = [256, 0.3125],\ \theta_5 = [800, 0.005]\right\},$$

$$\Theta_{W} = \left\{\theta_1 = [102, 28],\ \theta_2 = [92, 29],\ \theta_3 = [102, 28],\ \theta_4 = [82, 20],\ \theta_5 = [200, 256]\right\}, \quad (60)$$

where $\Theta_{\mathcal{N}}$, $\Theta_{G}$, and $\Theta_{W}$ are the different distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean $\mu_d$ and the variance $\sigma^2_d$ of the Gaussian distribution, the shape $\nu_d$ and the scale $\eta_d$ of the Gamma distribution, and the scale $a_d$ and the shape $b_d$ of
[Figure 2: The data generated with the parameters described in Section 5.1.1, for (a) the continuous case and (b) the discrete case. Each panel shows the hidden states sequence, the state duration, and the observed signal (continuous case) or observed symbols (discrete case).]
the Weibull distribution. It must be noticed that, as explained in Section 2.1, for state $S_5$, being the absorbing state, the duration parameters $\theta_5$ have no influence on the data, since once the state $S_5$ is reached the system will remain there forever.

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used, with the following parameters [15]:

$$\mu_1 = \begin{bmatrix} 20 \\ 20 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} 20 \\ 35 \end{bmatrix}, \quad \mu_3 = \begin{bmatrix} 35 \\ 35 \end{bmatrix}, \quad \mu_5 = \begin{bmatrix} 28 \\ 28 \end{bmatrix},$$

$$U_1 = \begin{bmatrix} 20 & 0 \\ 0 & 20 \end{bmatrix}, \quad U_2 = \begin{bmatrix} 15 & 0 \\ 0 & 15 \end{bmatrix}, \quad U_3 = \begin{bmatrix} 15 & -2 \\ -2 & 15 \end{bmatrix}, \quad U_4 = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}, \quad U_5 = \begin{bmatrix} 10 & 3 \\ 3 & 10 \end{bmatrix}, \quad (61)$$
while for the discrete case, $L = 7$ distinct observation symbols have been taken into consideration, with the following observation probability distribution:

$$B = \begin{bmatrix} 0.8 & 0.2 & 0 & 0 & 0 & 0 & 0 \\ 0.1 & 0.8 & 0.1 & 0 & 0 & 0 & 0 \\ 0 & 0.1 & 0.8 & 0.1 & 0 & 0 & 0 \\ 0 & 0 & 0.1 & 0.7 & 0.1 & 0.1 & 0 \\ 0 & 0 & 0 & 0.2 & 0.6 & 0.1 & 0.1 \end{bmatrix}. \quad (62)$$

An example of simulated data, both for the continuous and the discrete cases, is shown in Figure 2, where a Gaussian duration model has been used.
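A left-right generation process of this kind can be sketched as below; `duration_sampler` and `emission_sampler` are hypothetical callbacks standing in for draws from the chosen duration densities and from (61) or (62):

```python
import random

def simulate_hsmm(A0, duration_sampler, emission_sampler, failure_state, max_T=650):
    """Generate one state/observation sequence of length max_T from a
    left-right HSMM whose failure state is absorbing."""
    states, obs = [], []
    s = 0  # the chain starts in the first state (pi = [1, 0, 0, 0, 0])
    while len(states) < max_T:
        if s == failure_state:
            # Absorbing state: the system remains there forever.
            states.append(s)
            obs.append(emission_sampler(s))
            continue
        # Sojourn in state s for a sampled duration.
        for _ in range(min(duration_sampler(s), max_T - len(states))):
            states.append(s)
            obs.append(emission_sampler(s))
        # Draw the next state from row s of the nonrecurrent matrix A0.
        r, acc = random.random(), 0.0
        for j, p in enumerate(A0[s]):
            acc += p
            if r < acc:
                s = j
                break
    return states, obs
```

With a deterministic left-right transition matrix and fixed sojourn lengths the walk is fully predictable, which makes the sketch easy to check.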
5.1.2. Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving the automatic model selection problem. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states $N$ from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures $M$ in the observation distribution from 1 to 4.

As accurate parameter initialization is crucial for obtaining a good model fitting [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters $\lambda_0$, has been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters $\lambda^*$ corresponding to the minimum AIC value has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.

The obtained results are shown in Figure 3, for both the continuous and discrete observation data. As can be noticed, for all the 6 test cases of Figure 3, the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered as an optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states, the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.
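This selection loop can be sketched as follows; `score` is a hypothetical callback that trains one HSMM from one random initialization and returns its AIC value as in (46):

```python
def select_model(score, families, n_states, n_mix, n_restarts=40):
    """Grid search over HSMM structures, keeping for each structure the
    best of n_restarts random initializations and returning the overall
    AIC-minimizing (family, N, M) triple together with its AIC value."""
    best_key, best_val = None, float("inf")
    for fam in families:
        for N in n_states:
            for M in n_mix:
                val = min(score(fam, N, M, r) for r in range(n_restarts))
                if val < best_val:
                    best_key, best_val = (fam, N, M), val
    return best_key, best_val
```

In practice `score` would wrap one EM run; the sketch only fixes the search and retention logic described above.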
5.1.3. Condition Monitoring. The optimal parameters $\lambda^*$ obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all the 6 considered HSMM configurations. In this experiment we simulate online monitoring by considering all the testing observations up to the current time, that is, $\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t$. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state $s^*_t = \operatorname{argmax}_{1 \le i \le N} [\delta_t(i)]$, as specified in (34).
[Figure 3: Akaike Information Criterion (AIC) values versus the number of states (2 to 8), for Gaussian, Gamma, and Weibull duration models: (a) continuous data, Gaussian duration distribution; (b) continuous data, Gamma duration distribution; (c) continuous data, Weibull duration distribution; (d) discrete data, Gaussian duration distribution; (e) discrete data, Gamma duration distribution; (f) discrete data, Weibull duration distribution. AIC is effective for automatic model selection, since its minimum value indicates the same number of states and duration model used to generate the data.]
An example of execution of the condition monitoring experiment is shown in Figure 4, for both continuous and discrete observations. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, and the third display represents the observed time series.
Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of
[Figure 4: Condition monitoring using the Viterbi path: (a) state estimation for continuous data and Gamma duration distribution (Viterbi path accuracy 0.985); (b) state estimation for discrete data and Gaussian duration distribution (Viterbi path accuracy 0.992). Each panel shows the true state sequence, the Viterbi path with correct and wrong guesses, and the observations. HSMMs can be effective for condition monitoring in time-dependent applications due to their high accuracy in hidden state recognition.]
the testing cases. The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.
5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase we considered the state $S_5$ as the failure state, and the trained parameters $\lambda^*$ of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations $\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t$ up to time $t$. When a new observation is acquired, after the current state probability $\bar{\delta}_t(i)$ is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures one can notice that the average, as well as the lower and the upper bound estimations, converge to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and the lower bounds become narrower as the failure state approaches, and the estimation becomes more precise, until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.
To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time $t$, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

$$\mathrm{APE}(t) = \left|\mathrm{RUL}_{\mathrm{real}}(t) - \mathrm{RUL}(t)\right|, \quad (63)$$

where $\mathrm{RUL}_{\mathrm{real}}(t)$ is the (known) value of the RUL at time $t$, while $\mathrm{RUL}(t)$ is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

$$\overline{\mathrm{APE}} = \frac{\sum_{t=1}^{T} \mathrm{APE}(t)}{T}, \quad (64)$$

where $T$ is the length of the testing signal. APE being a prediction error, average values of (64) close to zero correspond to good predictive performance.
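The two metrics follow directly from (63) and (64); the list-based inputs below are hypothetical:

```python
def ape_series(rul_real, rul_pred):
    """Absolute prediction error at each time step, as in (63)."""
    return [abs(r - p) for r, p in zip(rul_real, rul_pred)]

def mean_ape(rul_real, rul_pred):
    """Average absolute prediction error over the test signal, as in (64)."""
    errs = ape_series(rul_real, rul_pred)
    return sum(errs) / len(errs)
```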
The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b), for the continuous and the discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively faced with HSMMs, which achieve a reliable estimation power with a small prediction error.

Finally, we tested our RUL estimation methodology using the state duration estimation of (16) introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), which report the prediction errors obtained for continuous and discrete observations, respectively.

Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi. This
[Figure 5: Remaining Useful Lifetime estimation for (a) continuous data and Weibull duration distribution and (b) discrete data and Gamma duration distribution, showing the true, upper, average, and lower RUL over time. HSMMs effectively solve RUL estimation problems: the prediction converges to the actual RUL value and its uncertainty decreases as the real failure time approaches.]
[Table 1: State recognition accuracy. (a) Continuous observations; rows: test cases, columns: Gaussian, Gamma, and Weibull duration distributions.]
is mainly due to the proposed average state duration estimator of (20), compared to the one of Azimi, given by (16).
5.2. Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comté Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besançon, France), with the aim of collecting real data related to the accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51–59].

The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing
[Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20). (a) APE of the RUL estimation for the continuous observation test cases; rows: test cases, columns: APE avg, APE up, and APE low for the Gaussian, Gamma, and Weibull duration distributions.]
housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

The platform is composed of three main parts, a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human machine interface.

The load profiles part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.
[Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32]. (a) APE of the RUL estimation for the continuous observation test cases; same layout as Table 2.]
The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
[Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49]. Condition 1: 1800 rpm and 4000 N; Condition 2: 1650 rpm and 4200 N; Condition 3: 1500 rpm and 5000 N.]
[Figure 6: Global overview of the Pronostia experimental platform [19].]
Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;

(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;

(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.
Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and with a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal exceeded 20 g; this moment was thus defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.
Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating conditions (i.e., Bearing1_1, Bearing1_2, …, Bearing1_7 and Bearing2_1, Bearing2_2, …, Bearing2_7).
As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].
In our experiments, we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots ($L = 2560$). Let $r_w(t)$ be the raw signal of the $w$th window; for each $w$ we estimate the RMS as

$$x^{\mathrm{RMS}}_w = \sqrt{\frac{1}{L} \sum_{t=1}^{L} r_w^2(t)}$$

and the kurtosis as

$$x^{\mathrm{KURT}}_w = \frac{(1/L) \sum_{t=1}^{L} \left(r_w(t) - \bar{r}_w\right)^4}{\left((1/L) \sum_{t=1}^{L} \left(r_w(t) - \bar{r}_w\right)^2\right)^2},$$

where $\bar{r}_w$ is the mean of $r_w$. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM after the model selection procedure, we implemented a leave-one-out cross validation scheme: by considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.
(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states $N$ from 2 to 6, and (iii) an increasing number of Gaussian mixtures $M$ in the observation density from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations $\lambda_0$, on the data sets (Bearing1_1, …).
[Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].]
[Figure 8: Raw vibration data r(t) (a) versus the RMS and kurtosis features extracted per window (b) for Bearing1_1.]
Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases the global minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and $M = 1$ Gaussian component in the observation density.
(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme, by using for condition 1, at each iteration, Bearing1_i, $1 \le i \le 7$, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters $\lambda^*_i$ were estimated for the $i$th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time $t$, the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state $S_4$ as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average, as
[Figure 9: AIC values versus the number of states (2 to 6) for the Gaussian, Gamma, and Weibull duration models: (a) AIC values for condition 1; (b) AIC values for condition 2.]
Figure 9: In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.
[Figure 10: true, upper, average, and lower RUL curves over time: (a) RUL estimation for Bearing1_7; (b) RUL estimation for Bearing2_6.]
Figure 10: By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.
well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty of the estimation decreases with time.
Concerning the quantitative estimation of the predictive performance, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.
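Equation (64) is not reproduced in this excerpt; as a sketch, the metric can be read as the mean absolute difference between the predicted and true RUL over a test history (the toy values below are illustrative):

```python
def average_absolute_error(predicted_rul, true_rul):
    """Average absolute prediction error over one test history:
    the mean of |predicted - true| RUL across all prediction times."""
    assert len(predicted_rul) == len(true_rul)
    return sum(abs(p, ) if False else abs(p - r) for p, r in zip(predicted_rul, true_rul)) / len(true_rul)

# A toy degradation history: true RUL decreases linearly to zero,
# early predictions are less accurate than late ones.
true_rul = [40, 30, 20, 10, 0]
pred_rul = [50, 33, 18, 11, 0]
print(average_absolute_error(pred_rul, true_rul))  # -> 3.2
```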
6 Conclusion and Future Work
In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.
Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required parameters to estimate. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.
This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a predetermined event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM, combined with the AIC, can be used in practice for condition monitoring and Remaining Useful Lifetime applications.
As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.
Appendix
In this appendix we give the derivation of the state duration variable introduced in (20) as
\[
\hat{d}_{t+1}(i) = \frac{a_{ii}(\hat{\mathbf{d}}_t)\,\alpha_t(i)\,b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left(\hat{d}_t(i) + 1\right). \tag{A1}
\]
The random variable \(d_t(i)\) has been defined in Section 2.1 as the duration spent in state \(i\) prior to the current time \(t\), assuming that the state at the current time \(t\) is \(i\); \(d_t(i)\) is sampled from an arbitrary distribution
\[
d_t(i) \sim f(d). \tag{A2}
\]
We can specify the probability that the system has been in state \(i\) for \(d\) time units prior to the current time \(t\), given the observations and the model parameters \(\lambda\), and knowing that the current state is \(i\), as
\[
P(d_t(i) = d) = P(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t, \lambda). \tag{A3}
\]
We omit the conditioning on the model parameters \(\lambda\) in the following equations, it being inherently implied. We are interested in deriving the estimator \(\hat{d}_{t+1}(i)\).
Because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time \(t\), we can rewrite (A13) as follows:
\[
\hat{d}_{t+1}(i) = P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) \cdot \left(\hat{d}_t(i) + 1\right). \tag{A15}
\]
The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted by the "amount" of the current state that was already in state \(i\) in the previous step.
In order to express (A15) in terms of model parameters, for an easy numerical calculation of the induction for \(\hat{d}_{t+1}(i)\),
The first probability in the numerator of (A19) is the state transition probability, which can be approximated by considering the average duration as
\[
P(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t)
= \sum_{\mathbf{d}_t} a_{ii}(\mathbf{d}_t)\, P(\mathbf{d}_t \mid \mathbf{x}_1, \ldots, \mathbf{x}_t)
\approx a_{ii}(\hat{\mathbf{d}}_t), \tag{A20}
\]
while the denominator of (A19) can be expressed as follows:
\[
P(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t)
= \frac{P(\mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1})}{P(\mathbf{x}_1, \ldots, \mathbf{x}_t)}
= \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}. \tag{A21}
\]
By substituting (A20) and (A21) in (A19), we obtain
\[
P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1})
= \frac{a_{ii}(\hat{\mathbf{d}}_t)\,\gamma_t(i)\,\sum_{i=1}^{N}\alpha_t(i)\, b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N} \alpha_{t+1}(i)}, \tag{A22}
\]
and then, by combining (A22) and (A16), we obtain
\[
P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1})
= \frac{a_{ii}(\hat{\mathbf{d}}_t)\,\gamma_t(i)\,\sum_{i=1}^{N}\alpha_t(i)\, b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i)\,\sum_{i=1}^{N}\alpha_{t+1}(i)}. \tag{A23}
\]
Finally, by substituting (A23) in (A15) and considering that
\[
\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)}, \tag{A24}
\]
we derive the induction formula for \(\hat{d}_{t+1}(i)\) in terms of model parameters as
\[
\hat{d}_{t+1}(i) = \frac{a_{ii}(\hat{\mathbf{d}}_t)\,\alpha_t(i)\,b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left(\hat{d}_t(i) + 1\right). \tag{A25}
\]
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry (Osaka and Kariya, Japan, October-November 1989), E. Oshima and C. van Rijn, Eds., pp. 195-198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229-240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469-489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines - prognostics - part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125-137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), Bari, Italy, June 2009, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291-296, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257-262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474-481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1-7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338-343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1-10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491-503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292-302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143-179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644-648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407-410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558-567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871-874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11-14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947-1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991-996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658-2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279-285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248-2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166-172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141-3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611-631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535-569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249-264, 1997.
[40] O. Lukočienė and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241-249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573-589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29-45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331-334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260-269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268-278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B, vol. 39, no. 1, pp. 1-38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "PRONOSTIA: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, parts I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853-872, 1985.
[51] P. Boškoski, M. Gašperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327-337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299-306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1-7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to PRONOSTIA's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451-1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5-8, pp. 1685-1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1-6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to PRONOSTIA's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1-4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157-162, 2013.
2.1. Model Specification. A Hidden Semi-Markov Model is a doubly embedded stochastic model with an underlying stochastic process that is not observable (hidden) but can only be observed through another set of stochastic processes that produce the sequence of observations. An HSMM allows the underlying process to be a semi-Markov chain, with a variable duration or sojourn time for each state. The key concept of HSMMs is that the semi-Markov property holds for this model: while in HMMs the Markov property implies that the value of the hidden state at time \(t\) depends exclusively on its value at time \(t-1\), in HSMMs the probability of transition from state \(S_j\) to state \(S_i\) at time \(t\) depends on the duration spent in state \(S_j\) prior to time \(t\).
In the following we denote the number of states in the model as \(N\), the individual states as \(S = \{S_1, \ldots, S_N\}\), and the state at time \(t\) as \(s_t\). The semi-Markov property can be written as
\[
P(s_{t+1} = S_i \mid s_t = S_j, \ldots, s_1 = S_k)
= P(s_{t+1} = S_i \mid s_t = S_j, d_t(j)), \quad 1 \le i, j, k \le N, \tag{1}
\]
where the duration variable \(d_t(j)\) is defined as the time spent in state \(S_j\) prior to time \(t\).
Although the state duration is inherently discrete, in many studies [44, 45] it has been modeled with a continuous parametric density function. Similar to the work of Azimi et al. [30-32], in this paper we use the discrete counterpart of the chosen parametric probability density function (pdf). With this approximation, if we denote the pdf of the sojourn time in state \(S_i\) as \(f(x; \theta_i)\), where \(\theta_i\) represents the set of parameters of the pdf relative to the \(i\)th state, the probability that the system stays in state \(S_i\) for exactly \(d\) time steps can be calculated as \(\int_{d-1}^{d} f(x; \theta_i)\,dx\). Considering the HSMM formulation, we can generally denote the state-dependent duration distributions by the set of their parameters relative to each state as \(\Theta = \{\theta_1, \ldots, \theta_N\}\).
Many related works on HSMMs [31, 32, 44, 45] consider \(f(x; \theta_i)\) within the exponential family; in particular, Gamma distributions are often used in speech processing applications. In this work we do not impose a type of distribution function to model the duration. The only requirement is that the duration should be modeled by a distribution with positive support, negative durations being physically meaningless.
HSMMs also require the definition of a "dynamic" transition matrix, as a consequence of the semi-Markov property. Differently from HMMs, in which a constant transition probability leads to a geometrically distributed state sojourn time, HSMMs explicitly define a transition matrix which, depending on the duration variable, has increasing probabilities of changing state as time goes on. For convenience, we specify the state duration variable in the form of a vector \(\mathbf{d}_t\) with dimensions \(N \times 1\) as
\[
\mathbf{d}_t(j) =
\begin{cases}
d_t(j) & \text{if } s_t = S_j,\\
1 & \text{if } s_t \neq S_j.
\end{cases} \tag{2}
\]
The quantity \(d_t(j)\) can be easily calculated by induction from \(d_{t-1}(j)\) as
\[
d_t(j) = s_t(j) \cdot s_{t-1}(j) \cdot d_{t-1}(j) + 1, \tag{3}
\]
where \(s_t(j)\) is 1 if \(s_t = S_j\), and 0 otherwise.
If we assume that at time \(t\) the system is in state \(S_i\), we can formally define the duration-dependent transition matrix as \(A_{\mathbf{d}_t} = [a_{ij}(\mathbf{d}_t)]\), with
\[
a_{ij}(\mathbf{d}_t) = P(s_{t+1} = S_j \mid s_t = S_i, d_t(i)), \quad 1 \le i, j \le N. \tag{4}
\]
The specification of the model can be further simplified by observing that, at each time \(t\), the matrix \(A_{\mathbf{d}_t}\) can be decomposed into two terms: the recurrent and the nonrecurrent state transition probabilities.
The recurrent transition probabilities \(P(\mathbf{d}_t) = [p_{ij}(\mathbf{d}_t)]\), which depend only on the duration vector \(\mathbf{d}_t\) and the parameters \(\Theta\), take into account the dynamics of the self-transition probabilities. Each entry is defined as the probability of remaining in the current state at the next time step, given the duration spent in the current state prior to time \(t\):
\[
\begin{aligned}
p_{ii}(\mathbf{d}_t) &= P(s_{t+1} = S_i \mid s_t = S_i, d_t(i)) \\
&= P(s_{t+1} = S_i \mid s_t = S_i, s_{t-1} = S_i, \ldots, s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \neq S_i) \\
&= \frac{P(s_{t+1} = S_i, s_t = S_i, \ldots, s_{t-d_t(i)+2} = S_i \mid s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \neq S_i)}
        {P(s_t = S_i, s_{t-1} = S_i, \ldots, s_{t-d_t(i)+2} = S_i \mid s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \neq S_i)}.
\end{aligned} \tag{5}
\]
The denominator in (5) can be expressed as \(\sum_{k=1}^{\infty} P(s_{t+k} \neq S_i, s_{t+k-1} = S_i, \ldots, s_{t-d_t(i)+2} = S_i \mid s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \neq S_i)\), which is the probability that the system at time \(t\) has been staying in state \(S_i\) for at least \(d_t(i) - 1\) time units. The above expression is equivalent to \(1 - F(d_t(i) - 1; \theta_i)\), where \(F(\cdot\,; \theta_i)\) is the duration cumulative distribution function relative to the state \(S_i\), that is, \(F(d; \theta) = \int_{-\infty}^{d} f(x; \theta)\,dx\). As a consequence, from (5) we can define the recurrent transition probabilities as a diagonal matrix with dimensions \(N \times N\) as
\[
P(\mathbf{d}_t) = [p_{ij}(\mathbf{d}_t)] =
\begin{cases}
\dfrac{1 - F(d_t(i); \theta_i)}{1 - F(d_t(i) - 1; \theta_i)} & \text{if } i = j,\\[2mm]
0 & \text{if } i \neq j.
\end{cases} \tag{6}
\]
The usage of the cumulative distribution functions in (6), which tend to 1 as the duration tends to infinity, implies that the probability of self-transition decreases as the sojourn time increases, leading the model to always leave the current state as time approaches infinity.
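To illustrate Eq. (6), a minimal sketch with a Weibull duration model (the shape and scale values are arbitrary; any duration CDF with positive support could be substituted):

```python
import math

def weibull_cdf(x, shape, scale):
    """F(x; k, lam) = 1 - exp(-(x/lam)^k) for x > 0, else 0."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0

def self_transition_prob(d, shape, scale):
    """Eq. (6): probability of staying in the current state at the next step,
    given a sojourn of d time units so far. The duration CDF replaces the
    constant self-transition probability of a plain HMM."""
    return (1.0 - weibull_cdf(d, shape, scale)) / (1.0 - weibull_cdf(d - 1, shape, scale))

probs = [self_transition_prob(d, shape=2.0, scale=5.0) for d in (1, 5, 10)]
print([round(p, 3) for p in probs])  # -> [0.961, 0.698, 0.468]
# The self-transition probability decays as the sojourn time grows.
```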
The nonrecurrent state transition probabilities \(A^0 = [a^0_{ij}]\) rule the transitions between two different states. \(A^0\) is represented by an \(N \times N\) matrix with the diagonal elements equal to zero, defined as
\[
A^0 = [a^0_{ij}] =
\begin{cases}
0 & \text{if } i = j,\\
P(s_{t+1} = S_j \mid s_t = S_i) & \text{if } i \neq j.
\end{cases} \tag{7}
\]
\(A^0\) must be specified as a stochastic matrix; that is, its elements have to satisfy the constraint \(\sum_{j=1}^{N} a^0_{ij} = 1\) for all \(i\).
As a consequence of the above decomposition, the dynamics of the underlying semi-Markov chain can be defined by specifying only the state-dependent duration parameters \(\Theta\) and the nonrecurrent matrix \(A^0\), since the model transition matrix can be calculated at each time \(t\) using (6) and (7):
\[
A_{\mathbf{d}_t} = P(\mathbf{d}_t) + (I - P(\mathbf{d}_t))\,A^0, \tag{8}
\]
where \(I\) is the identity matrix. If we denote the elements of the dynamic transition matrix \(A_{\mathbf{d}_t}\) as \(a_{ij}(\mathbf{d}_t)\), the stochastic constraint \(\sum_{j=1}^{N} a_{ij}(\mathbf{d}_t) = 1\) for all \(i\) and \(t\) is guaranteed by the fact that \(P(\mathbf{d}_t)\) is a diagonal matrix and \(A^0\) is a stochastic matrix.

For several applications it is necessary to model the
absorbing state, which in the case of industrial equipment corresponds to the "broken" or "failure" state. If we denote the absorbing state as \(S_k\), with \(k \in [1, N]\), we must fix the \(k\)th row of the nonrecurrent matrix \(A^0\) to be \(a^0_{kk} = 1\) and \(a^0_{ki} = 0\) for all \(1 \le i \le N\) with \(i \neq k\). By substituting such an \(A^0\) matrix in (8), it is easy to show that the element \(a_{kk}(\mathbf{d}_t) = 1\) and remains constant for all \(t\), while the duration probability parameters \(\theta_k\) have no influence for the absorbing state \(S_k\). An example of absorbing state specification will be given in Section 5.

With respect to the input observation signals, in this work
we consider both continuous and discrete data, by adapting the suitable observation model depending on the observation nature. In particular, for the continuous case we model the observations with a multivariate mixture of Gaussian distributions. This choice presents two main advantages: (i) a multivariate model allows dealing with multiple observations at the same time, which is often the case in industrial equipment modeling, since at each time multiple sensors' measurements are available; and (ii) mixtures of Gaussians have been proven to closely approximate any finite continuous density function [33]. Formally, if we denote by \(\mathbf{x}_t\) the observation vector at time \(t\) and the generic observation vector being modeled as \(\mathbf{x}\), the observation density for the \(j\)th state is represented by a finite mixture of \(M\) Gaussians:
\[
b_j(\mathbf{x}) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_{jm}, \mathbf{U}_{jm}), \quad 1 \le j \le N, \tag{9}
\]
where \(c_{jm}\) is the mixture coefficient for the \(m\)th mixture in state \(S_j\), which satisfies the stochastic constraints \(\sum_{m=1}^{M} c_{jm} = 1\) for \(1 \le j \le N\) and \(c_{jm} \ge 0\) for \(1 \le j \le N\), \(1 \le m \le M\), while \(\mathcal{N}\) is the Gaussian density with mean vector \(\boldsymbol{\mu}_{jm}\) and covariance matrix \(\mathbf{U}_{jm}\) for the \(m\)th mixture component in state \(j\).
In the case of discrete data, we model the observations within each state with a nonparametric discrete probability distribution. In particular, if \(L\) is the number of distinct observation symbols per state, and if we denote the symbols as \(X = \{X_1, \ldots, X_L\}\) and the observation at time \(t\) as \(x_t\), the observation symbol probability distribution can be defined as a matrix \(B = [b_j(l)]\) of dimensions \(N \times L\), where
\[
b_j(l) = P[x_t = X_l \mid s_t = S_j], \quad 1 \le j \le N,\; 1 \le l \le L. \tag{10}
\]
Since the system in each state at each time step can emit one of the \(L\) possible symbols, the matrix \(B\) is stochastic; that is, it is constrained to \(\sum_{l=1}^{L} b_j(l) = 1\) for all \(1 \le j \le N\).
Finally, as in the case of HMMs, we specify the initial state distribution \(\pi = \{\pi_i\}\), which defines the probability of the starting state as
\[
\pi_i = P[s_1 = S_i], \quad 1 \le i \le N. \tag{11}
\]
From the above considerations, two different HSMM models can be considered: in the case of continuous observations, \(\lambda = (A^0, \Theta, C, \mu, U, \pi)\), while in the case of discrete observations the HSMM is characterized by \(\lambda = (A^0, \Theta, B, \pi)\). An example of a continuous HSMM with 3 states is shown in Figure 1.
2.2. Learning and Inference Algorithms. Let us denote the generic sequence of observations, being indiscriminately continuous vectors or discrete symbols, as \(\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T\). In order to use the defined HSMM in practice, similarly to the HMM, we need to solve three basic problems:
(1) given the observation $\mathbf{x}$ and a model $\lambda$, calculate the probability that the sequence $\mathbf{x}$ has been generated by the model $\lambda$, that is, $P(\mathbf{x} \mid \lambda)$;
(2) given the observation $\mathbf{x}$ and a model $\lambda$, calculate the state sequence $S = s_1 s_2 \cdots s_T$ which has most probably generated the sequence $\mathbf{x}$;
(3) given the observation $\mathbf{x}$, find the parameters of the model $\lambda$ which maximize $P(\mathbf{x} \mid \lambda)$.
As in the case of HMMs, solving the above problems requires using the forward-backward [13], decoding (Viterbi [46] and Forney [47]), and Expectation-Maximization [48] algorithms, which will be adapted to the HSMM introduced in Section 2.1. In the following, we also propose a more effective estimator of the state duration variable $d_t(i)$ defined in (2).
2.2.1. The Forward-Backward Algorithm. Given a generic sequence of observations $\mathbf{x} = \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_T$, the goal is to calculate the model likelihood, that is, $P(\mathbf{x} \mid \lambda)$. This quantity is useful for the training procedure, where the parameters that locally maximize the model likelihood are chosen, as well as for classification problems. The latter is the case in which the observation sequence $\mathbf{x}$ has to be mapped to one of a finite set of $C$ classes, represented by a set of HSMM parameters $L = \{\lambda_1, \ldots, \lambda_C\}$. The class of $\mathbf{x}$ is chosen such that $\lambda(\mathbf{x}) = \arg\max_{\lambda \in L} P(\mathbf{x} \mid \lambda)$.

To calculate the model likelihood, we first define the forward variable at each time $t$ as

$$\alpha_t(i) = P(\mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t, s_t = S_i \mid \lambda), \quad 1 \le i \le N. \quad (12)$$
Mathematical Problems in Engineering 5
Figure 1: Graphical representation of an HSMM. [The figure shows three hidden states $S_1, S_2, S_3$ with transitions $a_{12}$ and $a_{23}$, the observation probabilities $P(o \mid S_1), P(o \mid S_2), P(o \mid S_3)$ of the observed signal, and the sojourn probabilities $d_1(u), d_2(u), d_3(u)$ over time $u$.]
Contrarily to HMMs, for HSMMs the state duration must be taken into account in the forward variable calculation. Consequently, Yu [26] proposed the following inductive formula:

$$\alpha_t(j, d) = \sum_{d'=1}^{D} \sum_{i=1}^{N} \left( \alpha_{t-d'}(i, d')\, a^0_{ij}\, p_j(d') \prod_{k=t-d+1}^{t} b_j(\mathbf{x}_k) \right), \quad 1 \le j \le N,\ 1 \le t \le T, \quad (13)$$
that is, the sum of the probabilities of being in the current state $S_j$ (at time $t$) for the past $d'$ time units (with $1 \le d' \le D$, and $D$ the maximum allowed duration for each state), coming from all the possible previous states $S_i$, $1 \le i \le N$ and $i \ne j$.
The disadvantage of the above formulation is that, as discussed in the Introduction, the specification of the maximum duration $D$ represents a limitation to the modeling generalization. Moreover, from (13) it is clear that the computation and memory complexities drastically increase with $D$, which can be very large in many applications, in particular for online failure prediction.
To alleviate this problem, Azimi et al. [30–32] introduced a new forward algorithm for HSMMs that, by keeping track of the estimated average state duration at each iteration, has a computational complexity comparable to the forward algorithm for HMMs [13]. However, the average state duration represents an approximation. Consequently, the forward algorithm of Azimi, compared with (13), pays the price of a lower precision in favor of an (indispensable) better computational efficiency.
To calculate the forward variable $\alpha_t(j)$ using Azimi's approach, the duration-dependent transition matrix defined in (8) is taken into consideration in the induction formula of (13), which becomes [30]

$$\alpha_t(j) = \left[ \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}(\bar{\mathbf{d}}_{t-1}) \right] b_j(\mathbf{x}_t). \quad (14)$$
To calculate the above formula, the average state duration of (2) must be estimated for each time $t$ by means of the variable $\bar{\mathbf{d}}_t = [\bar d_t(i)]$, defined as

$$\bar d_t(i) = E(d_t(i) \mid \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t, s_t = S_i, \lambda), \quad 1 \le i \le N, \quad (15)$$
where $E$ denotes the expected value. To calculate the above quantity, Azimi et al. [30–32] use the following formula:

$$\bar{\mathbf{d}}_t = \boldsymbol{\gamma}_{t-1} \odot \bar{\mathbf{d}}_{t-1} + 1, \quad (16)$$
where $\odot$ represents the element-by-element product between two matrices/vectors, and the vector $\boldsymbol{\gamma}_t = [\gamma_t(i)]$ (the probability of being in state $S_i$ at time $t$, given the observation sequence and the model parameters), with dimensions $N \times 1$, is calculated in terms of $\alpha_t(i)$ as

$$\gamma_t(i) = P(s_t = S_i \mid \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t, \lambda) = \frac{\alpha_t(i)}{\sum_{j=1}^{N} \alpha_t(j)}, \quad 1 \le i \le N. \quad (17)$$
Equation (16) is based on the following induction formula [30–32] that rules the dynamics of the duration vector when the system's state is known:

$$d_t(i) = s_{t-1}(i) \cdot d_{t-1}(i) + 1, \quad (18)$$

where, for each $t$, $s_t(i)$ is 1 if $s_t = S_i$, 0 otherwise.
A simple example shows that (18) is incorrect: assuming an HSMM with three states and considering the state sequence $(S_1, S_1, S_2, \ldots)$, the correct sequence of the duration vector is $\mathbf{d}_1 = [1\ 1\ 1]^T$, $\mathbf{d}_2 = [2\ 1\ 1]^T$, and $\mathbf{d}_3 = [1\ 1\ 1]^T$, where the superscript $T$ denotes vector transpose. If we apply (18), we obtain $\mathbf{d}_1 = [1\ 1\ 1]^T$, $\mathbf{d}_2 = [2\ 1\ 1]^T$, and $\mathbf{d}_3 = [1\ 2\ 1]^T$, which is in contradiction with the definition of the state duration vector given in (2).

To calculate the average state duration variable $\bar d_t(i)$, we propose a new induction formula that estimates, for each time $t$, the time spent in the $i$th state prior to $t$ as
$$\bar d_t(i) = P(s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t) \cdot (\bar d_{t-1}(i) + 1) \quad (19)$$

$$= \frac{a_{ii}(\bar{\mathbf{d}}_{t-1}) \cdot \alpha_{t-1}(i) \cdot b_i(\mathbf{x}_t)}{\alpha_t(i)} \cdot (\bar d_{t-1}(i) + 1), \quad 1 \le i \le N. \quad (20)$$

The derivation of (20) is given in the Appendix. The intuition behind (19) is that the current average duration is the previous average duration plus one, weighted with the "amount" of the current state that was already in state $S_i$ in the previous step.
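As an illustration, the soft update of (20) takes only a few lines of NumPy (a minimal sketch; the function and argument names are ours, and `A_dyn` stands for the duration-dependent transition matrix $A_{\bar{\mathbf{d}}_{t-1}}$):

```python
import numpy as np

def update_duration(d_prev, alpha_prev, alpha_cur, b_cur, A_dyn):
    """Eq. (20): the new average duration is the previous one plus one,
    weighted by P(s_{t-1} = S_i | s_t = S_i, x_1..x_t)."""
    # a_ii(d_{t-1}) * alpha_{t-1}(i) * b_i(x_t) / alpha_t(i), elementwise over i
    w = np.diag(A_dyn) * alpha_prev * b_cur / alpha_cur
    return w * (d_prev + 1.0)
```

Because the weight is a probability, the update shrinks the duration estimate toward zero whenever the state has most likely just been entered, which is exactly the behavior the hard recursion (18) fails to reproduce.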
Using the proposed (20), the forward algorithm can be specified as follows:

(1) initialization, with $1 \le i \le N$:
$$\alpha_1(i) = \pi_i b_i(\mathbf{x}_1), \qquad \bar d_1(i) = 1,$$
$$A_{\bar{\mathbf{d}}_1} = P(\bar{\mathbf{d}}_1) + (I - P(\bar{\mathbf{d}}_1)) A^0, \quad (21)$$
where $P(\bar{\mathbf{d}}_1)$ is estimated using (6);

(2) induction, with $1 \le j \le N$ and $1 \le t \le T-1$:
$$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t)\right] b_j(\mathbf{x}_{t+1}), \quad (22)$$
$$\bar d_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar d_t(i) + 1), \quad (23)$$
$$A_{\bar{\mathbf{d}}_{t+1}} = P(\bar{\mathbf{d}}_{t+1}) + (I - P(\bar{\mathbf{d}}_{t+1})) A^0, \quad (24)$$
where the $a_{ij}(\bar{\mathbf{d}}_t)$ are the coefficients of the matrix $A_{\bar{\mathbf{d}}_t}$;

(3) termination:
$$P(\mathbf{x} \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i). \quad (25)$$
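The three steps above can be sketched as follows (a minimal NumPy sketch with illustrative names; `stay_prob` stands in for the self-transition probabilities $P(\bar{\mathbf{d}}_t)$ of (6), which we assume are supplied, and in practice the $\alpha$'s should be rescaled at each step to avoid numerical underflow):

```python
import numpy as np

def forward_pass(X_probs, A0, pi, stay_prob):
    """Duration-dependent forward algorithm, Eqs. (21)-(25).
    X_probs[t, j] = b_j of the t-th observation (0-based time index);
    stay_prob(d) returns the N self-transition probabilities P(d)."""
    T, N = X_probs.shape
    alpha = np.zeros((T, N))
    d = np.ones(N)                            # d_1(i) = 1
    alpha[0] = pi * X_probs[0]                # Eq. (21)
    for t in range(T - 1):
        P = np.diag(stay_prob(d))
        A_dyn = P + (np.eye(N) - P) @ A0      # Eq. (24)
        alpha[t + 1] = (alpha[t] @ A_dyn) * X_probs[t + 1]   # Eq. (22)
        d = (np.diag(A_dyn) * alpha[t] * X_probs[t + 1]
             / alpha[t + 1]) * (d + 1.0)      # Eq. (23)
    return alpha, alpha[-1].sum()             # Eq. (25): P(x | lambda)
```

Note that, as in the text, the per-step cost is $O(N^2)$, independent of any maximum duration $D$.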
Similar considerations as for the forward procedure can be made for the backward algorithm, which is implemented by defining the variable $\beta_t(i)$ as

$$\beta_t(i) = P(\mathbf{x}_{t+1}\mathbf{x}_{t+2}\cdots\mathbf{x}_T \mid s_t = S_i, \lambda), \quad 1 \le i \le N. \quad (26)$$
Having estimated the dynamic transition matrix $A_{\bar{\mathbf{d}}_t}$ for each $1 \le t \le T$ using (24), the backward variable can be calculated inductively as follows:

(1) initialization:
$$\beta_T(i) = 1, \quad 1 \le i \le N; \quad (27)$$

(2) induction:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1,\ 1 \le i \le N. \quad (28)$$
Although the variable $\beta_t(i)$ is not necessary for the calculation of the model likelihood, it will be useful in the parameter reestimation procedure, as will be explained in Section 2.2.3.
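The backward recursion can be written compactly, assuming the dynamic matrices $A_{\bar{\mathbf{d}}_t}$ have been stored during the forward pass (a sketch; array names are ours):

```python
import numpy as np

def backward_pass(X_probs, A_dyn_seq):
    """Backward recursion of Eqs. (27)-(28); A_dyn_seq[t] is the dynamic
    transition matrix computed at step t of the forward pass."""
    T, N = X_probs.shape
    beta = np.ones((T, N))                                       # Eq. (27)
    for t in range(T - 2, -1, -1):
        beta[t] = A_dyn_seq[t] @ (X_probs[t + 1] * beta[t + 1])  # Eq. (28)
    return beta
```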
2.2.2. The Viterbi Algorithm. The Viterbi algorithm [46, 47] (also known as decoding) allows determining the best state sequence corresponding to a given observation sequence. Formally, given a sequence of observations $\mathbf{x} = \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_T$, the best state sequence $S^* = s^*_1 s^*_2 \cdots s^*_T$ corresponding to $\mathbf{x}$ is calculated by defining the variable $\delta_t(i)$ as

$$\delta_t(i) = \max_{s_1 s_2 \cdots s_{t-1}} P(s_1 s_2 \cdots s_{t-1}, s_t = S_i, \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t \mid \lambda). \quad (29)$$
The procedure to recursively calculate the variable $\delta_t(i)$ and to retrieve the target state sequence (i.e., the arguments which maximize the $\delta_t(i)$'s) for the proposed HSMM is a straightforward extension of the Viterbi algorithm for HMMs [13]. The only change is the usage, in the recursive calculation of $\delta_t(i)$, of the dynamic transition matrix $A_{\bar{\mathbf{d}}_t} = [a_{ij}(\bar{\mathbf{d}}_t)]$ calculated through (24). The Viterbi algorithm for the introduced parametric HSMMs can be summarized as follows:
(1) initialization, with $1 \le i \le N$:
$$\delta_1(i) = \pi_i b_i(\mathbf{x}_1), \qquad \psi_1(i) = 0; \quad (30)$$

(2) recursion, with $1 \le j \le N$ and $2 \le t \le T$:
$$\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}(\bar{\mathbf{d}}_t)]\, b_j(\mathbf{x}_t), \quad (31)$$
$$\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}(\bar{\mathbf{d}}_t)]; \quad (32)$$

(3) termination:
$$P^* = \max_{1 \le i \le N} [\delta_T(i)], \quad (33)$$
$$s^*_T = \arg\max_{1 \le i \le N} [\delta_T(i)], \quad (34)$$

where we keep track of the argument maximizing (31) using the vector $\boldsymbol{\psi}_t$, which, tracked back, gives the desired best state sequence:

$$s^*_t = \psi_{t+1}(s^*_{t+1}), \quad t = T-1, T-2, \ldots, 1. \quad (35)$$
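The recursion and backtracking above can be sketched as follows (illustrative names; `A_dyn_seq[t]` holds the dynamic matrix $A_{\bar{\mathbf{d}}_t}$ from the forward pass, and, as with the forward variable, a practical implementation would work in log space to avoid underflow):

```python
import numpy as np

def viterbi(X_probs, A_dyn_seq, pi):
    """Viterbi decoding with time-varying transitions, Eqs. (30)-(35)."""
    T, N = X_probs.shape
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * X_probs[0]                         # Eq. (30)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A_dyn_seq[t]  # delta_{t-1}(i) a_ij(d_t)
        psi[t] = scores.argmax(axis=0)                 # Eq. (32)
        delta[t] = scores.max(axis=0) * X_probs[t]     # Eq. (31)
    states = np.empty(T, dtype=int)
    states[-1] = delta[-1].argmax()                    # Eq. (34)
    for t in range(T - 2, -1, -1):
        states[t] = psi[t + 1, states[t + 1]]          # Eq. (35) backtracking
    return states, delta[-1].max()                     # P* of Eq. (33)
```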
2.2.3. The Training Algorithm. The training algorithm consists of estimating the model parameters from the observation data. As discussed in Section 2.1, a parametric HSMM is defined by $\lambda = (A^0, \Theta, C, \mu, U, \pi)$ if the observations are continuous, or $\lambda = (A^0, \Theta, B, \pi)$ if the observations are discrete. Given a generic observation sequence $\mathbf{x} = \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_T$, referred to as the training set in the following, the training procedure consists of finding the model parameter set $\lambda^*$ which locally maximizes the model likelihood $P(\mathbf{x} \mid \lambda)$.

We use the modified Baum-Welch algorithm of Azimi et al. [30–32]. However, in our implementation we make no assumption on the density function used to model the state duration, and we consider both continuous and discrete observations.

Being a variant of the more general Expectation-Maximization (EM) algorithm, Baum-Welch is an iterative procedure which consists of two steps: (i) the expectation step, in which the forward and backward variables are calculated and the model likelihood is estimated, and (ii) the maximization step, in which the model parameters are updated and used in the next iteration. This process usually starts from a random guess of the model parameters $\lambda^0$, and it is iterated until the likelihood function no longer improves between two consecutive iterations.
Similarly to HMMs, the reestimation formulas are derived by firstly introducing the variable $\xi_t(i,j)$, which represents the probability of being in state $S_i$ at time $t$ and in state $S_j$ at time $t+1$, given the model and the observation sequence:

$$\xi_t(i,j) = P(s_t = S_i, s_{t+1} = S_j \mid \mathbf{x}, \lambda). \quad (36)$$
However, in the HSMM case, the variable $\xi_t(i,j)$ considers the duration estimation performed in the forward algorithm (see (24)). Formulated in terms of the forward and backward variables, it is given by

$$\xi_t(i,j) = P(s_t = S_i, s_{t+1} = S_j \mid \mathbf{x}, \lambda) = \frac{P(s_t = S_i, s_{t+1} = S_j, \mathbf{x} \mid \lambda)}{P(\mathbf{x} \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}. \quad (37)$$
From $\xi_t(i,j)$ we can derive the quantity $\gamma_t(i)$ (already defined in (17)), representing the probability of being in state $S_i$ at time $t$, given the observation sequence and the model parameters:

$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j). \quad (38)$$
Finally, the reestimation formulas for the parameters $\pi$ and $A^0$ are given by

$$\pi_i = \gamma_1(i), \quad (39)$$

$$a^0_{ij} = \frac{\left(\sum_{t=1}^{T-1} \xi_t(i,j)\right) \odot G}{\sum_{j=1}^{N} \left(\sum_{t=1}^{T-1} \xi_t(i,j)\right) \odot G}, \quad (40)$$
where $G = [g_{ij}]$ is a square matrix of dimensions $N \times N$, with $g_{ij} = 0$ for $i = j$ and $g_{ij} = 1$ for $i \ne j$; $\odot$ represents the element-by-element product between two matrices; $\sum_{t=1}^{T-1} \gamma_t(i)$ is the expected number of transitions from state $S_i$; and $\sum_{t=1}^{T-1} \xi_t(i,j)$ is the expected number of transitions from state $S_i$ to state $S_j$.
Equation (39) represents the expected number of times that the model starts in state $S_i$, while (40) represents the expected number of transitions from state $S_i$ to state $S_j$, with $i \ne j$, over the total expected number of transitions from state $S_i$ to any state different from $S_i$.
The matrix $A^0$ being normalized, the stochastic constraints are satisfied at each iteration, that is, $\sum_{j=1}^{N} a^0_{ij} = 1$ for each $1 \le i \le N$, while the estimation of the prior probability $\pi_i$ inherently sums up to 1 at each iteration, since it represents the expected frequency of state $S_i$ at time $t = 1$, for each $1 \le i \le N$.

With respect to the reestimation of the state duration parameters $\Theta$, we first estimate the mean $\mu_{i,d}$ and the variance $\sigma^2_{i,d}$ of the $i$th state duration, for each $1 \le i \le N$, from the forward and backward variables and the estimation of the state duration variable:
$$\mu_{i,d} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1, j \ne i}^{N} a_{ij}(\bar d_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right) \bar d_t(i)}{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1, j \ne i}^{N} a_{ij}(\bar d_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right)}, \quad (41)$$
$$\sigma^2_{i,d} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1, j \ne i}^{N} a_{ij}(\bar d_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right) \left(\bar d_t(i) - \mu_{i,d}\right)^2}{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1, j \ne i}^{N} a_{ij}(\bar d_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right)}, \quad (42)$$
where (41) can be interpreted as the probability of a transition from state $S_i$ to $S_j$, with $i \ne j$, at time $t$, weighted by the duration of state $S_i$ at $t$, giving the desired expected value, while in (42) the same quantity is weighted by the squared distance of the duration at time $t$ from its mean, giving the estimation of the variance.
Then the parameters of the desired duration distribution can be estimated from $\mu_{i,d}$ and $\sigma^2_{i,d}$. For example, if a Gamma distribution with shape parameter $\nu$ and scale parameter $\eta$ is chosen to model the state duration, the parameters $\nu_i$ and $\eta_i$, for each $1 \le i \le N$, can be calculated as $\nu_i = \mu^2_{i,d}/\sigma^2_{i,d}$ and $\eta_i = \sigma^2_{i,d}/\mu_{i,d}$.
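This method-of-moments mapping is a one-liner; as a sanity check, plugging in the Gaussian moments $\mu_d = 100$, $\sigma^2_d = 20$ that appear later in Section 5.1.1 recovers the Gamma parameters $[500, 0.2]$ listed there:

```python
def gamma_from_moments(mu_d, var_d):
    """Gamma shape/scale from mean and variance of the state duration:
    nu = mu^2 / sigma^2, eta = sigma^2 / mu."""
    return mu_d ** 2 / var_d, var_d / mu_d

print(gamma_from_moments(100.0, 20.0))  # (500.0, 0.2)
```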
Concerning the observation parameters, once the modified forward and backward variables accounting for the state duration are defined as in (22) and (28), the reestimation formulas are the same as for Hidden Markov Models [13].

In particular, for continuous observations, the parameters of the Gaussian mixture defined in (9) are reestimated by firstly defining the probability of being in state $S_j$ at time $t$, with the probability of the observation vector $\mathbf{x}_t$ being evaluated by the $k$th mixture component, as

$$\gamma_t(j,k) = \left[\frac{\alpha_t(j)\, \beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)}\right] \cdot \left[\frac{c_{jk}\, \mathcal{N}(\mathbf{x}_t, \mu_{jk}, U_{jk})}{\sum_{m=1}^{M} c_{jm}\, \mathcal{N}(\mathbf{x}_t, \mu_{jm}, U_{jm})}\right]. \quad (43)$$
By using the former quantity, the parameters $c_{jk}$, $\mu_{jk}$, and $U_{jk}$ are reestimated through the following formulas:

$$c_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_t(j,m)},$$

$$\mu_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k) \cdot \mathbf{x}_t}{\sum_{t=1}^{T} \gamma_t(j,k)},$$

$$U_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k) \cdot (\mathbf{x}_t - \mu_{jk})(\mathbf{x}_t - \mu_{jk})^T}{\sum_{t=1}^{T} \gamma_t(j,k)}, \quad (44)$$
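For a single state $j$, the three updates of (44) can be sketched as follows (a vectorized sketch with names of our choosing; `gamma_jk` holds the responsibilities $\gamma_t(j,k)$ of (43)):

```python
import numpy as np

def reestimate_mixture(gamma_jk, X):
    """M-step updates of Eq. (44) for one state j.
    gamma_jk: (T, M) responsibilities gamma_t(j, k) from Eq. (43);
    X: (T, O) observation vectors."""
    w = gamma_jk.sum(axis=0)                 # per-component total mass
    c = w / w.sum()                          # mixture coefficients c_jk
    mu = (gamma_jk.T @ X) / w[:, None]       # mean vectors mu_jk
    M, O = mu.shape
    U = np.zeros((M, O, O))
    for k in range(M):
        diff = X - mu[k]                     # (x_t - mu_jk)
        U[k] = (gamma_jk[:, k, None] * diff).T @ diff / w[k]   # covariances U_jk
    return c, mu, U
```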
where the superscript $T$ denotes vector transpose.

For discrete observations, the reestimation formula for the observation matrix $b_j(l)$ is

$$b_j(l) = \frac{\sum_{t=1,\, x_t = X_l}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \quad (45)$$
where the quantity $\gamma_t(j)$, which takes into account the duration-dependent forward variable $\alpha_t(j)$, is calculated through (17). The reader is referred to Rabiner's work [13] for the interpretation of the observation parameter reestimation formulas.
3. AIC-Based Model Selection
In the framework of the proposed parametric HSMMs, the model selection procedure aims to select the optimal number of hidden states $N$, the right duration distribution family, and, in the case of mixture observation modeling, the number of Gaussian mixtures $M$ to be used. In this work, we make use of the Akaike Information Criterion (AIC). Indeed, it has been seen that, in the case of complex models and in the presence of a limited number of training observations, AIC represents a satisfactory methodology for model selection, outperforming other approaches like the Bayesian Information Criterion.
In general, information criteria are represented as a two-term structure. They account for a compromise between a measure of model fitness, which is based on the likelihood of the model, and a penalty term, which takes into account the model complexity. Usually, the model complexity is measured in terms of the number of parameters that have to be estimated and the number of observations.
The Akaike Information Criterion is an estimate of the asymptotic value of the expected distance between the unknown true likelihood function of the data and the fitted likelihood function of the model. In particular, the AIC can be expressed as

$$\mathrm{AIC} = \frac{-\log L(\hat\lambda) + p}{T}, \quad (46)$$

where $L(\hat\lambda)$ is the likelihood of the model with the estimated parameters, as defined in (25), $p$ is the number of model parameters, and $T$ is the length of the observed sequence. The best model is the one minimizing (46).
Concerning $p$, the number of parameters to be estimated for a parametric HSMM with $N$ states is $p = p_h + p_o$, where $p_h$ counts the parameters of the hidden-state layer, while $p_o$ counts those of the observation layer.

In particular, $p_h = (N-1) + (N-1) \cdot N + z \cdot N$, where

(i) $N-1$ accounts for the prior probabilities $\pi$;
(ii) $(N-1) \cdot N$ accounts for the nonrecurrent transition matrix $A^0$;
(iii) $z \cdot N$ accounts for the duration probabilities, $z$ being the number of parameters $\theta$ of the duration distribution.
Concerning $p_o$, a distinction must be made between discrete and continuous observations:

(i) in the case of discrete observations with $L$ possible observable values, $p_o = (L-1) \cdot N$, which accounts for the elements of the observation matrix $B$;
(ii) if the observations are continuous and a multivariate mixture of $M$ Gaussians with $O$ variates is used as observation model, $p_o = [O \cdot M \cdot N] + [\frac{O(O+1)}{2} \cdot M \cdot N] + [(M-1) \cdot N]$, where each term accounts, respectively, for the mean vectors $\mu$, the covariance matrices $U$, and the mixture coefficients $C$.
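The count above translates directly into a small helper (a sketch; the full-covariance term $O(O+1)/2$ per matrix in the continuous case is our reading of the counting described in the text):

```python
def hsmm_num_params(N, z, discrete, L=None, M=None, O=None):
    """Number of free parameters p = p_h + p_o entering the AIC of Eq. (46)."""
    p_h = (N - 1) + (N - 1) * N + z * N        # priors + A0 + duration params
    if discrete:
        p_o = (L - 1) * N                      # observation matrix B
    else:                                      # M-component, O-variate mixtures
        p_o = O * M * N + (O * (O + 1) // 2) * M * N + (M - 1) * N
    return p_h + p_o

def aic(log_likelihood, p, T):
    """Eq. (46): normalized AIC; the best model minimizes this value."""
    return (-log_likelihood + p) / T
```

For instance, the 5-state, 7-symbol discrete HSMM with a two-parameter duration density used in Section 5.1 has $p = 34 + 30 = 64$ free parameters.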
4. Remaining Useful Lifetime Estimation
One of the most important advantages of the time modeling of HSMMs is the possibility to effectively face the prediction problem. The knowledge of the state duration distributions allows the estimation of the remaining time in a certain state and, in general, the prediction of the expected time $D$ before entering a determinate state.

As already mentioned, an interesting application of the prediction problem is the Remaining Useful Lifetime (RUL) estimation of industrial equipment. Indeed, if each state of an HSMM is mapped to a different condition of an industrial machine, and if the state $S_k$ that represents the failure condition is identified, at each moment the RUL can be defined as the expected time $D$ to reach the failure state
$S_k$. If we assume that the time to failure is a random variable $D$ following a determinate probability density, we define the RUL at the current time $t$ as

$$\mathrm{RUL}_t = \bar D = E(D), \quad s_{t+\bar D} = S_k,\ s_{t+\bar D - 1} = S_i, \quad 1 \le i, k \le N,\ i \ne k, \quad (47)$$
where $E$ denotes the expected value.

Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):

(1) estimation of the current state;
(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn times.
The estimation of the current state is performed via the Viterbi path, that is, the variable $\boldsymbol{\delta}_t = [\delta_t(i)]_{1 \le i \le N}$ defined in (29). To correctly model the uncertainty of the current state estimation, we use the normalized variable $\bar\delta_t(i)$, obtained as

$$\bar\delta_t(i) = \max_{s_1 s_2 \cdots s_{t-1}} P(s_t = S_i \mid s_1 s_2 \cdots s_{t-1}, \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t, \lambda) = \frac{\delta_t(i)}{\sum_{j=1}^{N} \delta_t(j)}, \quad 1 \le i \le N, \quad (48)$$

that is, an estimate of the probability of being in state $S_i$ at time $t$.
119905(119894) the maxi-
mum a posteriori estimate of the current state 119904lowast
119905is taken
into account according to (34) If 119904lowast119905coincides with the failure
state the desired event is detected by the model and the timeto this event is obviously zero Otherwise an estimation ofthe average remaining time in the current state 119889avg(119904
lowast
119905) is
calculated as
$$d_{\mathrm{avg}}(s^*_t) = \sum_{i=1}^{N} (\mu_{d_i} - \bar d_t(i)) \odot \bar\delta_t(i), \quad (49)$$
where with $\mu_{d_i}$ we denote the expected value of the duration variable in state $S_i$, according to the duration distribution specified by the parameters $\theta_i$. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration $\bar d_t(i)$ at time $t$ from the expected sojourn time of state $S_i$, weighting the result by the uncertainty about the current state $\bar\delta_t(i)$, and finally summing up the contributions from each state.

In addition to the average remaining time, a lower and an upper bound value can be calculated, based on the standard deviation $\sigma_{d_i}$ of the duration distribution for state $S_i$:
$$d_{\mathrm{low}}(s^*_t) = \sum_{i=1}^{N} (\mu_{d_i} - \sigma_{d_i} - \bar d_t(i)) \odot \bar\delta_t(i), \quad (50)$$

$$d_{\mathrm{up}}(s^*_t) = \sum_{i=1}^{N} (\mu_{d_i} + \sigma_{d_i} - \bar d_t(i)) \odot \bar\delta_t(i). \quad (51)$$
Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimation, as follows:

$$\bar{\boldsymbol{\delta}}_{\mathrm{next}} = [\bar\delta_{t+d}(i)]_{1 \le i \le N} = (A^0)^T \cdot \bar{\boldsymbol{\delta}}_t, \quad (52)$$
while the maximum a posteriori estimate of the next state $s^*_{\mathrm{next}}$ is calculated as

$$s^*_{\mathrm{next}} = s^*_{t+d} = \arg\max_{1 \le i \le N} \bar\delta_{t+d}(i). \quad (53)$$
Again, if $s^*_{t+d}$ coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimation of the failure time is $D_{\mathrm{avg}} = d_{\mathrm{avg}}(s^*_t)$, calculated at the previous step, with the bound values $D_{\mathrm{low}} = d_{\mathrm{low}}(s^*_t)$ and $D_{\mathrm{up}} = d_{\mathrm{up}}(s^*_t)$. Otherwise, the estimation of the sojourn time of the next state is calculated as follows:
$$d_{\mathrm{avg}}(s^*_{t+d}) = \sum_{i=1}^{N} \mu_{d_i} \odot \bar\delta_{t+d}(i), \quad (54)$$

$$d_{\mathrm{low}}(s^*_{t+d}) = \sum_{i=1}^{N} (\mu_{d_i} - \sigma_{d_i}) \odot \bar\delta_{t+d}(i), \quad (55)$$

$$d_{\mathrm{up}}(s^*_{t+d}) = \sum_{i=1}^{N} (\mu_{d_i} + \sigma_{d_i}) \odot \bar\delta_{t+d}(i). \quad (56)$$
This procedure is repeated until the failure state is encountered in the prediction of the next state. The calculation of the RUL is then simply obtained by summing all the estimated remaining times in each intermediate state before encountering the failure state:

$$D_{\mathrm{avg}} = \sum d_{\mathrm{avg}}, \quad (57)$$

$$D_{\mathrm{low}} = \sum d_{\mathrm{low}}, \quad (58)$$

$$D_{\mathrm{up}} = \sum d_{\mathrm{up}}. \quad (59)$$
Finally, Algorithm 1 details the above-described RUL estimation procedure.
5. Experimental Results

To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and real data.

The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real-case data are monitoring data related to the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
(1) function RulEstimation($\mathbf{x}_t$, $S_k$)  ⊳ $\mathbf{x}_t$: the last observation acquired
(2)  ⊳ $S_k$: the failure state
(3) Initialization:
(4) $D_{\mathrm{avg}} \leftarrow 0$
(5) $D_{\mathrm{low}} \leftarrow 0$
(6) $D_{\mathrm{up}} \leftarrow 0$
(7) Current state estimation:
(8) Calculate $\bar{\boldsymbol{\delta}}_t$  ⊳ using (48)
(9) Calculate $s^*_t$  ⊳ using (34)
(10) Calculate $\bar{\mathbf{d}}_t$  ⊳ using (20)
(11) $S \leftarrow s^*_t$
(12) Loop:
(13) while $S \ne S_k$ do
(14)  Calculate $d_{\mathrm{avg}}$  ⊳ using (49) or (54)
(15)  Calculate $d_{\mathrm{low}}$  ⊳ using (50) or (55)
(16)  Calculate $d_{\mathrm{up}}$  ⊳ using (51) or (56)
(17)  $D_{\mathrm{avg}} \leftarrow D_{\mathrm{avg}} + d_{\mathrm{avg}}$
5.1. Simulated Experiment. Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.
5.1.1. Data Generation. The industrial machine subject of these experiments has been modeled as a left-right parametric HSMM with $N = 5$ states, having state $S_5$ as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity reasons, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective in solving automatic model selection, online condition monitoring, and prediction problems. To this purpose, we divided the experiments into two cases, according to the nature of the observations, being continuous or discrete.
For each of the continuous and discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains $T = 650$ observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and are adapted to obtain an equivalent left-right parametric HSMM, as follows:
$$\pi = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad A^0 = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix},$$

$$\Theta_{\mathcal{N}} = \{\theta_1 = [100, 20],\ \theta_2 = [90, 15],\ \theta_3 = [100, 20],\ \theta_4 = [80, 25],\ \theta_5 = [200, 1]\},$$

$$\Theta_{\mathcal{G}} = \{\theta_1 = [500, 0.2],\ \theta_2 = [540, 0.1667],\ \theta_3 = [500, 0.2],\ \theta_4 = [256, 0.3125],\ \theta_5 = [800, 0.005]\},$$

$$\Theta_{\mathcal{W}} = \{\theta_1 = [102, 2.8],\ \theta_2 = [92, 2.9],\ \theta_3 = [102, 2.8],\ \theta_4 = [82, 2.0],\ \theta_5 = [200, 25.6]\}, \quad (60)$$
where $\Theta_{\mathcal{N}}$, $\Theta_{\mathcal{G}}$, and $\Theta_{\mathcal{W}}$ are the distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean $\mu_d$ and the variance $\sigma^2_d$ of the Gaussian distribution, the shape $\nu_d$ and the scale $\eta_d$ of the Gamma distribution, and the scale $a_d$ and the shape $b_d$ of
Figure 2: The data generated with the parameters described in Section 5.1.1, both for the continuous and the discrete case. [(a) Example of simulated data for the continuous case: hidden state sequence, state duration, and observed signal. (b) Example of simulated data for the discrete case: hidden state sequence, state duration, and observed symbols.]
the Weibull distribution. It must be noticed that, as explained in Section 2.1, for state $S_5$, being the absorbing state, the duration parameters $\theta_5$ have no influence on the data, since once the state $S_5$ is reached the system will remain there forever.

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used, with the following parameters [15]:
$$\mu_1 = \begin{bmatrix} 20 \\ 20 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} 20 \\ 35 \end{bmatrix}, \quad \mu_3 = \begin{bmatrix} 35 \\ 35 \end{bmatrix}, \quad \mu_5 = \begin{bmatrix} 28 \\ 28 \end{bmatrix},$$

$$U_1 = \begin{bmatrix} 20 & 0 \\ 0 & 20 \end{bmatrix}, \quad U_2 = \begin{bmatrix} 15 & 0 \\ 0 & 15 \end{bmatrix}, \quad U_3 = \begin{bmatrix} 15 & -2 \\ -2 & 15 \end{bmatrix},$$

$$U_4 = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}, \quad U_5 = \begin{bmatrix} 10 & 3 \\ 3 & 10 \end{bmatrix}, \quad (61)$$
while for the discrete case, $L = 7$ distinct observation symbols have been taken into consideration, with the following observation probability distribution:

$$B = \begin{bmatrix}
0.8 & 0.2 & 0 & 0 & 0 & 0 & 0 \\
0.1 & 0.8 & 0.1 & 0 & 0 & 0 & 0 \\
0 & 0.1 & 0.8 & 0.1 & 0 & 0 & 0 \\
0 & 0 & 0.1 & 0.7 & 0.1 & 0.1 & 0 \\
0 & 0 & 0 & 0.2 & 0.6 & 0.1 & 0.1
\end{bmatrix}. \quad (62)$$
An example of the simulated data, both for the continuous and the discrete cases, is shown in Figure 2, where a Gaussian duration model has been used.
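One way such sequences can be generated is sketched below for the discrete case (names and the deterministic duration used in the check are ours; an actual run would draw sojourns from the chosen Gaussian, Gamma, or Weibull model of (60)):

```python
import numpy as np

def sample_hsmm(A0, pi, dur_sampler, B, T=650, rng=None):
    """Draw one discrete-observation sequence of length T from a left-right
    HSMM: sample a state, hold it for a sampled sojourn, emit symbols from
    the corresponding row of B, then transition according to A0."""
    rng = rng or np.random.default_rng(0)
    N = len(pi)
    states, obs = [], []
    s = rng.choice(N, p=pi)
    while len(obs) < T:
        dur = max(1, int(round(dur_sampler(s))))      # sojourn in state s
        for _ in range(min(dur, T - len(obs))):
            states.append(s)
            obs.append(rng.choice(B.shape[1], p=B[s]))
        s = rng.choice(N, p=A0[s])    # an absorbing row keeps s fixed
    return np.array(states), np.array(obs)
```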
5.1.2. Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving the automatic model selection problem. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states N from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures M in the observation distribution from 1 to 4.
As accurate parameter initialization is crucial for obtaining a good model fitting [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters λ_0, has been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters λ*, corresponding to the minimum AIC value, has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.
The obtained results are shown in Figure 3, for both the continuous and the discrete observation data. As can be noticed, for all the 6 test cases of Figure 3 the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered as an optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.
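The selection loop described above can be sketched as follows. This is a hedged, self-contained illustration of AIC-based family selection, not the paper's full HSMM learning: for brevity it compares two duration families with closed-form maximum-likelihood estimates (Gaussian and exponential) instead of the paper's Gaussian/Gamma/Weibull candidates.

```python
import numpy as np

# AIC = 2k - 2 ln(L_hat), minimized over candidate families; the data
# below are synthetic state durations drawn from an exponential law.
rng = np.random.default_rng(7)
durations = rng.exponential(scale=50.0, size=400)
n = durations.size

# Gaussian: MLEs are the sample mean and (biased) variance; k = 2.
mu, var = durations.mean(), durations.var()
ll_gauss = -0.5 * n * np.log(2 * np.pi * var) - 0.5 * n

# Exponential: MLE of the rate is 1 / sample mean; k = 1.
rate = 1.0 / durations.mean()
ll_exp = n * np.log(rate) - rate * durations.sum()

aic = {"Gaussian": 2 * 2 - 2 * ll_gauss, "Exponential": 2 * 1 - 2 * ll_exp}
best = min(aic, key=aic.get)
print(best)  # the generating (exponential) family attains the minimum AIC
```

In the paper's setting the same loop runs over all (family, N, M) combinations, each fitted from several random initializations, and the structure with the minimum AIC is retained.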
5.1.3. Condition Monitoring. The optimal parameters λ* obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all the 6 considered HSMM configurations. In this experiment, we simulate online monitoring by considering all the testing observations up to the current time, that is, x_1, x_2, …, x_t. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state s*_t = argmax_{1≤i≤N} [δ_t(i)], as specified in (34).
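A minimal sketch of the online estimate s*_t = argmax_i δ_t(i) is given below for a plain HMM; the paper's HSMM version additionally makes the self-transition probability a_ii depend on the duration already spent in the state, a dependency omitted here for brevity.

```python
import numpy as np

# Log-domain Viterbi recursion; after each new observation, the current
# state is estimated as the argmax of the delta vector.
def viterbi_current_state(A, B, pi, obs):
    """Return the most likely current state after each new observation."""
    with np.errstate(divide="ignore"):  # allow log(0) = -inf
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    delta = logpi + logB[:, obs[0]]
    estimates = [int(np.argmax(delta))]
    for x in obs[1:]:
        # delta_t(i) = max_j [delta_{t-1}(j) + log a_ji] + log b_i(x_t)
        delta = np.max(delta[:, None] + logA, axis=0) + logB[:, x]
        estimates.append(int(np.argmax(delta)))
    return estimates

A = np.array([[0.9, 0.1], [0.0, 1.0]])  # left-right, second state absorbing
B = np.array([[0.8, 0.2], [0.1, 0.9]])  # 2 observation symbols
pi = np.array([1.0, 0.0])
print(viterbi_current_state(A, B, pi, [0, 0, 1, 1, 1]))  # -> [0, 0, 0, 1, 1]
```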
Figure 3: Akaike Information Criterion (AIC) values versus the number of states (2 to 8) for the Gaussian, Gamma, and Weibull duration models: (a) continuous data and Gaussian duration distribution; (b) continuous data and Gamma duration distribution; (c) continuous data and Weibull duration distribution; (d) discrete data and Gaussian duration distribution; (e) discrete data and Gamma duration distribution; (f) discrete data and Weibull duration distribution. AIC is effective for automatic model selection, since its minimum value provides the same number of states and duration model used to generate the data.
An example of execution of the condition monitoring experiment is shown in Figure 4, for both continuous and discrete observations. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, and the third display represents the observed time series.

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of
Figure 4: Condition monitoring using the Viterbi path: (a) state estimation with the Viterbi path for continuous data and a Gamma duration distribution (accuracy 0.985); (b) state estimation with the Viterbi path for discrete data and a Gaussian duration distribution (accuracy 0.992). HSMMs can be effective in solving condition monitoring problems in time-dependent applications, due to their high accuracy in hidden state recognition.
the testing cases. The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.
5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase, we considered the state S_5 as the failure state and the trained parameters λ* of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations x_1, x_2, …, x_t up to time t. When a new observation is acquired, after the current state probability δ_t(i) is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.
Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures, one can notice that the average, as well as the lower and the upper bound estimations, converges to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and the lower bounds become narrower as the failure state approaches, and the estimation becomes more precise until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.
To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time t, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

APE(t) = |RUL_real(t) − RUL(t)|, (63)

where RUL_real(t) is the (known) value of the RUL at time t, while RUL(t) is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

APE_avg = (1/T) Σ_{t=1}^{T} APE(t), (64)

where T is the length of the testing signal. APE_avg being a prediction error, average values of (64) close to zero correspond to good predictive performances.
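The metric of (63)-(64) can be computed directly; the toy RUL trajectories below are illustrative.

```python
import numpy as np

# APE(t) = |RUL_real(t) - RUL_pred(t)|, averaged over the signal length T.
def average_ape(rul_real, rul_pred):
    ape = np.abs(np.asarray(rul_real, float) - np.asarray(rul_pred, float))
    return ape.mean()  # (1/T) * sum_t APE(t)

rul_real = [40, 30, 20, 10, 0]          # known RUL decreasing to failure
rul_pred = [55, 33, 22, 9, 0]           # model predictions
print(average_ape(rul_real, rul_pred))  # -> 4.2
```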
The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b), for the continuous and the discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively faced with HSMMs, which achieve a reliable estimation power with a small prediction error.

Finally, we tested our RUL estimation methodology using the state duration estimation of (16) introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), in which the prediction errors obtained for continuous and discrete observations, respectively, are reported.

Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi. This
Figure 5: HSMMs effectively solve RUL estimation problems: (a) Remaining Useful Lifetime estimation for continuous data and a Weibull duration distribution; (b) Remaining Useful Lifetime estimation for discrete data and a Gamma duration distribution. Each panel shows the true, upper, average, and lower RUL over time; the prediction converges to the actual RUL value and its uncertainty decreases as the real failure time approaches.
Table 1: State recognition accuracy. (a) Continuous observations. For each test case, the accuracy is reported for the Gaussian, Gamma, and Weibull duration distributions.
is mainly due to the proposed average state duration of (20), compared to the one of Azimi given by (16).
5.2. Real Data. In this section, we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comté Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr), Besançon, France, with the aim of collecting real data related to the accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostic, and prognostic [19, 51–59].

The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing
Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20). (a) APE of the RUL estimation for the continuous observation test cases. For each test case and each duration distribution (Gaussian, Gamma, Weibull), the average (APE avg), upper (APE up), and lower (APE low) errors are reported.
housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

The platform is composed of three main parts, a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox, which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human machine interface.

The load profile part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.
Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32]. (a) APE of the RUL estimation for the continuous observation test cases. For each test case and each duration distribution (Gaussian, Gamma, Weibull), the average (APE avg), upper (APE up), and lower (APE low) errors are reported.
The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49]. Condition 1: 1800 rpm and 4000 N; Condition 2: 1650 rpm and 4200 N; Condition 3: 1500 rpm and 5000 N.
Figure 6: Global overview of the Pronostia experimental platform [19].
Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;
(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;
(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.
Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal was higher than 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.
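The stopping rule described above (failure declared at the first snapshot whose vibration amplitude exceeds 20 g) can be sketched as follows; the helper `failure_snapshot` and the toy degradation signal are illustrative assumptions, not part of the challenge tooling.

```python
import numpy as np

def failure_snapshot(snapshots, threshold_g=20.0):
    """Index of the first snapshot whose peak |amplitude| exceeds the threshold."""
    for k, snap in enumerate(snapshots):
        if np.max(np.abs(snap)) > threshold_g:
            return k
    return None  # no failure observed within the test

rng = np.random.default_rng(1)
# toy degradation: vibration noise level grows with the snapshot index
# (snapshot layout: 2560 samples of 0.1 s, one snapshot every 10 s)
snapshots = [rng.normal(0, 1 + 0.5 * k, size=2560) for k in range(30)]
k_fail = failure_snapshot(snapshots)
print(k_fail, "-> failure time:", None if k_fail is None else 10 * k_fail, "s")
```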
Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments, by considering, respectively, the bearings relative to the first and the second operating conditions (i.e., Bearing1_1, Bearing1_2, …, Bearing1_7 and Bearing2_1, Bearing2_2, …, Bearing2_7).
As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all types of defects (balls, rings, and cage), resembling faithfully a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].
In our experiments, we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots (L = 2560). Let r_w(t) be the raw signal of the w-th window; for each w, we estimate the RMS as

x_RMS,w = √((1/L) Σ_{t=1}^{L} r_w²(t))

and the kurtosis as

x_KURT,w = ((1/L) Σ_{t=1}^{L} (r_w(t) − r̄_w)⁴) / ((1/L) Σ_{t=1}^{L} (r_w(t) − r̄_w)²)²,

where r̄_w is the mean of r_w. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM, after the model selection procedure, we implemented a leave-one-out cross validation scheme: considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
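The windowed feature extraction just described can be sketched directly from the definitions of x_RMS,w and x_KURT,w; the helper name `extract_features` and the toy signal are illustrative.

```python
import numpy as np

# RMS and (non-excess) kurtosis per window of L = 2560 samples,
# following the definitions given in the text.
def extract_features(raw, L=2560):
    n = len(raw) // L
    feats = []
    for w in range(n):
        r = raw[w * L:(w + 1) * L]
        rms = np.sqrt(np.mean(r ** 2))
        m = r.mean()
        kurt = np.mean((r - m) ** 4) / np.mean((r - m) ** 2) ** 2
        feats.append((rms, kurt))
    return np.array(feats)

rng = np.random.default_rng(0)
raw = rng.normal(0, 2.0, size=2560 * 3)  # three toy snapshots
feats = extract_features(raw)
print(feats.shape)  # (3, 2); Gaussian windows give kurtosis near 3
```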
5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.

(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states N from 2 to 6, and (iii) an increasing number of Gaussian mixtures M in the observation density from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations λ_0, on the data sets (Bearing1_1, …, Bearing1_7 and Bearing2_1, …, Bearing2_7).
Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].
Figure 8: Raw vibration data (a) versus RMS and kurtosis features (b) for Bearing1_1. The raw signal r(t) is split into windows 1, 2, …, n of length L, from which the feature vectors x_1, x_2, …, x_n are extracted.
Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases, the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 Gaussian mixture for the observation density.
(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme, by using, for condition 1, at each iteration, Bearing1_i, 1 ≤ i ≤ 7, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters λ*_i were estimated for the i-th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state S_4 as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as
Figure 9: AIC values versus the number of states (2 to 6) for the Gaussian, Gamma, and Weibull duration models: (a) AIC values for condition 1; (b) AIC values for condition 2. In both cases, the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.
Figure 10: (a) RUL estimation for Bearing1_7; (b) RUL estimation for Bearing2_6. Each panel shows the true, upper, average, and lower RUL over time. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.
well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.

Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric takes into account also the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.
6. Conclusion and Future Work

In this paper, we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time to event estimation.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required
Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.
parameters to estimate. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.

This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM, in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.
Appendix

In this appendix, we give the derivation of the state duration variable introduced in (20) as

d̄_{t+1}(i) = [a_ii(d̄_t) · α_t(i) · b_i(x_{t+1}) / α_{t+1}(i)] · (d̄_t(i) + 1). (A1)
The random variable d_t(i) has been defined in Section 2.1 as the duration spent in state i prior to the current time t, assuming that the state at the current time t is i; d_t(i) is sampled from an arbitrary distribution:

d_t(i) ∼ f(d). (A2)

We can specify the probability that the system has been in state i for d time units prior to the current time t, given the observations and the model parameters λ, and knowing that the current state is i, as

P(d_t(i) = d) = P(s_{t−d−1} ≠ S_i, s_{t−d} = S_i, …, s_{t−1} = S_i | s_t = S_i, x_1, …, x_t, λ). (A3)
We omit the conditioning on the model parameters λ in the following equations, it being inherently implied. We are interested in deriving the estimator d̄; because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time t, we can rewrite (A13) as follows:
d̄_{t+1}(i) = P(s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1}) · (d̄_t(i) + 1). (A15)
The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted by the "amount" of the current state that was already in state i in the previous step.

In order to transform (A15) in terms of model parameters, for an easy numerical calculation of the induction for d̄, we proceed as follows.
The first probability in the numerator of (A19) is the state transition, which can be approximated by considering the average duration as

P(s_{t+1} = S_i | s_t = S_i, x_1, …, x_t) = Σ_{d_t} a_ii(d_t) · P(d_t | x_1, …, x_t) ≈ a_ii(d̄_t), (A20)
while the denominator of (A19) can be expressed as follows:

P(x_{t+1} | x_1, …, x_t) = P(x_1, …, x_t, x_{t+1}) / P(x_1, …, x_t) = (Σ_{i=1}^{N} α_{t+1}(i)) / (Σ_{i=1}^{N} α_t(i)). (A21)
By substituting (A20) and (A21) in (A19), we obtain

P(s_t = S_i, s_{t+1} = S_i | x_1, …, x_{t+1}) = [a_ii(d̄_t) · γ_t(i) · Σ_{i=1}^{N} α_t(i) · b_i(x_{t+1})] / Σ_{i=1}^{N} α_{t+1}(i), (A22)
and then, by combining (A22) and (A16), we obtain

P(s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1}) = [a_ii(d̄_t) · γ_t(i) · Σ_{i=1}^{N} α_t(i) · b_i(x_{t+1})] / [γ_{t+1}(i) · Σ_{i=1}^{N} α_{t+1}(i)]. (A23)
Finally, by substituting (A23) in (A15) and considering that

γ_t(i) = α_t(i) / Σ_{i=1}^{N} α_t(i), (A24)
we derive the induction formula for d̄_{t+1}(i) in terms of model parameters as

d̄_{t+1}(i) = [a_ii(d̄_t) · α_t(i) · b_i(x_{t+1}) / α_{t+1}(i)] · (d̄_t(i) + 1). (A25)
Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.
References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry (Osaka and Kariya, Japan, October-November 1989), E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines—prognostics—part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), Bari, Italy, June 2009, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, 2009.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T H Lin and C M Dayton ldquoModel selection information cri-teria for non-nested latent class modelsrdquo Journal of Educationaland Behavioral Statistics vol 22 no 3 pp 249ndash264 1997
[40] O Lukociene and J K Vermunt ldquoDetermining the numberof components in mixture models for hierarchical datardquo inAdvances in Data Analysis Data Handling and Business Intel-ligence Studies in Classification Data Analysis and KnowledgeOrganization pp 241ndash249 Springer New York NY USA 2008
[41] O Cappe E Moulines and T Ryden Inference in HiddenMarkov Models Springer Series in Statistics Springer NewYork NY USA 2005
[42] I L MacDonald and W Zucchini Hidden Markov and OtherModels for Discrete-Valued Time Series Chapman amp HallCRC1997
Mathematical Problems in Engineering 23
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juricic, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.
represented by an $N \times N$ matrix with the diagonal elements equal to zero, defined as

$$A^0 = [a^0_{ij}] = \begin{cases} 0, & \text{if } i = j,\\ P(s_{t+1} = S_j \mid s_t = S_i), & \text{if } i \neq j. \end{cases} \quad (7)$$

$A^0$ must be specified as a stochastic matrix; that is, its elements have to satisfy the constraint $\sum_{j=1}^{N} a^0_{ij} = 1$ for all $i$.
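As a minimal illustration of these constraints, the following sketch builds a hypothetical 3-state nonrecurrent matrix (the numeric values are illustrative only, not taken from the paper) and checks the zero diagonal and the row-stochasticity:

```python
import numpy as np

# Hypothetical 3-state nonrecurrent matrix A0 of Equation (7):
# zero diagonal and stochastic rows (values are illustrative only).
A0 = np.array([
    [0.0, 0.7, 0.3],
    [0.4, 0.0, 0.6],
    [0.5, 0.5, 0.0],
])

assert np.allclose(np.diag(A0), 0.0)     # a0_ii = 0
assert np.allclose(A0.sum(axis=1), 1.0)  # sum_j a0_ij = 1 for every row i
```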
As a consequence of the above decomposition, the dynamics of the underlying semi-Markov chain can be defined by specifying only the state-dependent duration parameters $\Theta$ and the nonrecurrent matrix $A^0$, since the model transition matrix can be calculated at each time $t$ using (6) and (7):

$$A_{\mathbf{d}_t} = P(\mathbf{d}_t) + (I - P(\mathbf{d}_t)) A^0, \quad (8)$$

where $I$ is the identity matrix. If we denote the elements of the dynamic transition matrix $A_{\mathbf{d}_t}$ as $a_{ij}(\mathbf{d}_t)$, the stochastic constraint $\sum_{j=1}^{N} a_{ij}(\mathbf{d}_t) = 1$ for all $i$ and $t$ is guaranteed by the fact that $P(\mathbf{d}_t)$ is a diagonal matrix and $A^0$ is a stochastic matrix.

For several applications it is necessary to model an absorbing state, which in the case of industrial equipment corresponds to the "broken" or "failure" state. If we denote the absorbing state as $S_k$, with $k \in [1, N]$, we must fix the $k$th row of the nonrecurrent matrix $A^0$ to be $a^0_{kk} = 1$ and $a^0_{ki} = 0$ for all $1 \le i \le N$ with $i \neq k$. By substituting such an $A^0$ matrix in (8), it is easy to show that the element $a_{kk}(\mathbf{d}_t) = 1$ and remains constant for all $t$, while the duration probability parameters $\theta_k$ have no influence on the absorbing state $S_k$. An example of absorbing state specification will be given in Section 5.

With respect to the input observation signals, in this work we consider both continuous and discrete data, adapting a suitable observation model to the nature of the observations. In particular, for the continuous case we model the observations with a multivariate mixture of Gaussian distributions. This choice presents two main advantages: (i) a multivariate model allows dealing with multiple observations at the same time, which is often the case in industrial equipment modeling, since at each time multiple sensors' measurements are available, and (ii) mixtures of Gaussians have been proved to closely approximate any finite continuous density function [33]. Formally, if we denote by $\mathbf{x}_t$ the observation vector at time $t$, and the generic observation vector being modeled as $\mathbf{x}$, the observation density for the $j$th state is represented by a finite mixture of $M$ Gaussians:

$$b_j(\mathbf{x}) = \sum_{m=1}^{M} c_{jm} \, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_{jm}, U_{jm}), \quad 1 \le j \le N, \quad (9)$$
where $c_{jm}$ is the mixture coefficient for the $m$th mixture in state $S_j$, which satisfies the stochastic constraints $\sum_{m=1}^{M} c_{jm} = 1$ for $1 \le j \le N$ and $c_{jm} \ge 0$ for $1 \le j \le N$ and $1 \le m \le M$, while $\mathcal{N}$ is the Gaussian density with mean vector $\boldsymbol{\mu}_{jm}$ and covariance matrix $U_{jm}$ for the $m$th mixture component in state $j$.
In the case of discrete data, we model the observations within each state with a nonparametric discrete probability distribution. In particular, if $L$ is the number of distinct observation symbols per state, and if we denote the symbols as $X = \{X_1, \ldots, X_L\}$ and the observation at time $t$ as $x_t$, the observation symbol probability distribution can be defined as a matrix $B = [b_j(l)]$ of dimensions $N \times L$, where

$$b_j(l) = P[x_t = X_l \mid s_t = S_j], \quad 1 \le j \le N, \ 1 \le l \le L. \quad (10)$$

Since the system in each state at each time step can emit one of the $L$ possible symbols, the matrix $B$ is stochastic; that is, it is constrained to satisfy $\sum_{l=1}^{L} b_j(l) = 1$ for all $1 \le j \le N$.
Finally, as in the case of HMMs, we specify the initial state distribution $\pi = \{\pi_i\}$, which defines the probability of the starting state as

$$\pi_i = P[s_1 = S_i], \quad 1 \le i \le N. \quad (11)$$

From the above considerations, two different HSMM models can be considered: in the case of continuous observations, $\lambda = (A^0, \Theta, C, \mu, U, \pi)$, while in the case of discrete observations the HSMM is characterized by $\lambda = (A^0, \Theta, B, \pi)$. An example of a continuous HSMM with 3 states is shown in Figure 1.
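The mixture observation density of Equation (9) can be sketched directly; the sketch below evaluates $b_j(\mathbf{x})$ for one state with a toy 2-variate, 2-component mixture (all weights, means, and covariances are invented for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

def emission_density(x, c, mu, U):
    """Mixture-of-Gaussians observation density b_j(x) of Equation (9)
    for a single state j: c[m] are the mixture weights, mu[m] the mean
    vectors, U[m] the covariance matrices."""
    return sum(c[m] * multivariate_normal.pdf(x, mean=mu[m], cov=U[m])
               for m in range(len(c)))

# Toy 2-variate, 2-component mixture for one state (illustrative values).
c = np.array([0.6, 0.4])
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
U = [np.eye(2), 2.0 * np.eye(2)]
b = emission_density(np.array([0.5, -0.2]), c, mu, U)
assert 0.0 < b < 1.0
```

Since the weights sum to one and each component is a proper density, $b_j$ itself integrates to one over the observation space.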
2.2. Learning and Inference Algorithms. Let us denote the generic sequence of observations, being indiscriminately continuous vectors or discrete symbols, as $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$. In order to use the defined HSMM model in practice, similarly to the HMM, we need to solve three basic problems:

(1) Given the observation $\mathbf{x}$ and a model $\lambda$, calculate the probability that the sequence $\mathbf{x}$ has been generated by the model $\lambda$, that is, $P(\mathbf{x} \mid \lambda)$.
(2) Given the observation $\mathbf{x}$ and a model $\lambda$, calculate the state sequence $S = s_1 s_2 \cdots s_T$ which has most probably generated the sequence $\mathbf{x}$.
(3) Given the observation $\mathbf{x}$, find the parameters of the model $\lambda$ which maximize $P(\mathbf{x} \mid \lambda)$.

As in the case of HMMs, solving the above problems requires using the forward-backward [13], decoding (Viterbi [46] and Forney [47]), and Expectation-Maximization [48] algorithms, which will be adapted to the HSMM introduced in Section 2.1. In the following, we also propose a more effective estimator of the state duration variable $d_t(i)$ defined in (2).
2.2.1. The Forward-Backward Algorithm. Given a generic sequence of observations $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$, the goal is to calculate the model likelihood, that is, $P(\mathbf{x} \mid \lambda)$. This quantity is useful for the training procedure, where the parameters that locally maximize the model likelihood are chosen, as well as for classification problems. The latter is the case in which the observation sequence $\mathbf{x}$ has to be mapped to one of a finite set of $C$ classes, represented by a set of HSMM parameters $L = \{\lambda_1, \ldots, \lambda_C\}$. The class of $\mathbf{x}$ is chosen such that $\lambda(\mathbf{x}) = \arg\max_{\lambda \in L} P(\mathbf{x} \mid \lambda)$.

To calculate the model likelihood, we first define the forward variable at each time $t$ as

$$\alpha_t(i) = P(\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, s_t = S_i \mid \lambda), \quad 1 \le i \le N. \quad (12)$$
Figure 1: Graphical representation of an HSMM. [Figure: three hidden states S1, S2, S3 with transitions a12 and a23; each state has an observation probability P(o|S1), P(o|S2), P(o|S3) over the observed signal and a sojourn probability d1(u), d2(u), d3(u) over the time spent in the state.]
Contrary to HMMs, for HSMMs the state duration must be taken into account in the forward variable calculation. Consequently, Yu [26] proposed the following inductive formula:

$$\alpha_t(j, d) = \sum_{d'=1}^{D} \sum_{i=1}^{N} \left( \alpha_{t-d'}(i, d') \, a^0_{ij} \, p_{jj}(d') \prod_{k=t-d+1}^{t} b_j(\mathbf{x}_k) \right), \quad 1 \le j \le N, \ 1 \le t \le T, \quad (13)$$

that is, the sum of the probabilities of being in the current state $S_j$ (at time $t$) for the past $d'$ time units (with $1 \le d' \le D$ and $D$ the maximum allowed duration for each state), coming from all the possible previous states $S_i$, $1 \le i \le N$ and $i \neq j$.

The disadvantage of the above formulation is that, as discussed in the Introduction, the specification of the maximum duration $D$ represents a limitation to the modeling generalization. Moreover, from (13) it is clear that the computation and memory complexities drastically increase with $D$, which can be very large in many applications, in particular for online failure prediction.
To alleviate this problem, Azimi et al. [30–32] introduced a new forward algorithm for HSMMs that, by keeping track of the estimated average state duration at each iteration, has a computational complexity comparable to the forward algorithm for HMMs [13]. However, the average state duration represents an approximation. Consequently, the forward algorithm of Azimi, compared with (13), pays the price of a lower precision in favor of an (indispensable) better computational efficiency.

To calculate the forward variable $\alpha_t(j)$ using Azimi's approach, the duration-dependent transition matrix defined in (8) is taken into consideration in the induction formula of (13), which becomes [30]

$$\alpha_t(j) = \left[ \sum_{i=1}^{N} \alpha_{t-1}(i) \, a_{ij}(\mathbf{d}_{t-1}) \right] b_j(\mathbf{x}_t). \quad (14)$$
To calculate the above formula, the average state duration of (2) must be estimated for each time $t$ by means of the variable $\bar{\mathbf{d}}_t = [\bar{d}_t(i)]$, defined as

$$\bar{d}_t(i) = E(d_t(i) \mid \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, s_t = S_i, \lambda), \quad 1 \le i \le N, \quad (15)$$

where $E$ denotes the expected value. To calculate the above quantity, Azimi et al. [30–32] use the following formula:

$$\bar{\mathbf{d}}_t = \boldsymbol{\gamma}_{t-1} \odot \bar{\mathbf{d}}_{t-1} + 1, \quad (16)$$

where $\odot$ represents the element-by-element product between two matrices/vectors, and the vector $\boldsymbol{\gamma}_t = [\gamma_t(i)]$ (the probability of being in state $S_i$ at time $t$, given the observation sequence and the model parameters), with dimensions $N \times 1$, is calculated in terms of $\alpha_t(i)$ as

$$\gamma_t(i) = P(s_t = S_i \mid \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, \lambda) = \frac{\alpha_t(i)}{\sum_{j=1}^{N} \alpha_t(j)}, \quad 1 \le i \le N. \quad (17)$$
Equation (16) is based on the following induction formula [30–32], which rules the dynamics of the duration vector when the system's state is known:

$$d_t(i) = s_{t-1}(i) \cdot d_{t-1}(i) + 1, \quad (18)$$

where, for each $t$, $s_t(i)$ is 1 if $s_t = S_i$, and 0 otherwise.
A simple example shows that (18) is incorrect: assuming an HSMM with three states and considering the state sequence $(S_1, S_1, S_2, \ldots)$, the correct sequence of the duration vector is $\mathbf{d}_1 = [1\ 1\ 1]^T$, $\mathbf{d}_2 = [2\ 1\ 1]^T$, and $\mathbf{d}_3 = [1\ 1\ 1]^T$, where the superscript $T$ denotes vector transpose. If we apply (18), we obtain $\mathbf{d}_1 = [1\ 1\ 1]^T$, $\mathbf{d}_2 = [2\ 1\ 1]^T$, and $\mathbf{d}_3 = [1\ 2\ 1]^T$, which is in contradiction with the definition of the state duration vector given in (2).

To calculate the average state duration variable $\bar{d}_t(i)$, we propose a new induction formula that estimates, for each time $t$, the time spent in the $i$th state prior to $t$ as

$$\bar{d}_t(i) = P(s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t) \cdot (\bar{d}_{t-1}(i) + 1) \quad (19)$$

$$= \frac{a_{ii}(\mathbf{d}_{t-1}) \cdot \alpha_{t-1}(i) \cdot b_i(\mathbf{x}_t)}{\alpha_t(i)} \cdot (\bar{d}_{t-1}(i) + 1), \quad 1 \le i \le N. \quad (20)$$
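One step of the estimator in Equation (20) can be sketched as follows; the toy 2-state numbers are illustrative, and $\alpha_t$ is computed with the forward induction of Equation (22) so that the posterior weights stay in $(0, 1]$:

```python
import numpy as np

def update_duration(alpha_prev, alpha_curr, b_curr, A_d_prev, d_prev):
    """One step of the duration estimator of Equation (20): the posterior
    weight P(s_{t-1}=S_i | s_t=S_i, x_1..x_t) multiplies the incremented
    previous average duration. All arguments are (N,) vectors except
    A_d_prev, the (N, N) dynamic transition matrix at time t-1."""
    w = np.diag(A_d_prev) * alpha_prev * b_curr / alpha_curr
    return w * (d_prev + 1.0)

# Toy 2-state step (illustrative values only).
A_d = np.array([[0.9, 0.1], [0.2, 0.8]])
alpha_prev = np.array([0.6, 0.4])
b_curr = np.array([0.5, 0.5])
alpha_curr = (alpha_prev @ A_d) * b_curr        # Equation (22)
d_new = update_duration(alpha_prev, alpha_curr, b_curr, A_d, np.ones(2))
assert np.all(d_new > 0) and np.all(d_new <= 2.0 + 1e-12)
```

Because the weight is a probability, the updated duration never exceeds the previous duration plus one, and it shrinks toward a reset when a transition out of the state is likely.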
The derivation of (20) is given in the Appendix. The intuition behind (19) is that the current average duration is the previous average duration plus one, weighted with the "amount" of the current state that was already in state $S_i$ in the previous step.

Using the proposed (20), the forward algorithm can be specified as follows:

(1) initialization, with $1 \le i \le N$:

$$\alpha_1(i) = \pi_i \, b_i(\mathbf{x}_1), \quad \bar{d}_1(i) = 1, \quad A_{\mathbf{d}_1} = P(\mathbf{d}_1) + (I - P(\mathbf{d}_1)) A^0, \quad (21)$$

where $P(\mathbf{d}_1)$ is estimated using (6);

(2) induction, with $1 \le j \le N$ and $1 \le t \le T-1$:

$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij}(\mathbf{d}_t) \right] b_j(\mathbf{x}_{t+1}), \quad (22)$$

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\mathbf{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1), \quad (23)$$

$$A_{\mathbf{d}_{t+1}} = P(\mathbf{d}_{t+1}) + (I - P(\mathbf{d}_{t+1})) A^0, \quad (24)$$

where $a_{ij}(\mathbf{d}_t)$ are the coefficients of the matrix $A_{\mathbf{d}_t}$;

(3) termination:

$$P(\mathbf{x} \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i). \quad (25)$$
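The full recursion of Equations (21)-(25) can be sketched in Python. The self-persistence probabilities of Equation (6) lie outside this excerpt, so the sketch takes them as a given function of the average-duration vector; all numeric values are illustrative, and scaling is omitted, so the sketch is only safe for short sequences:

```python
import numpy as np

def forward(pi, A0, b, p_stay):
    """Forward pass of Equations (21)-(25). pi: (N,) initial distribution;
    A0: (N, N) nonrecurrent matrix; b: (T, N) emission likelihoods b_j(x_t);
    p_stay: function mapping the (N,) average-duration vector to the (N,)
    diagonal of P(d) (Equation (6), assumed given). Returns P(x | lambda)."""
    T, N = b.shape
    alpha = pi * b[0]                              # Equation (21)
    d = np.ones(N)
    for t in range(1, T):
        P = np.diag(p_stay(d))
        A_d = P + (np.eye(N) - P) @ A0             # Equation (24)
        alpha_new = (alpha @ A_d) * b[t]           # Equation (22)
        d = np.diag(A_d) * alpha * b[t] / alpha_new * (d + 1.0)  # Equation (23)
        alpha = alpha_new
    return alpha.sum()                             # Equation (25)

# With constant self-persistence the recursion reduces to a standard HMM
# forward pass, and the result is a valid probability for discrete emissions.
pi = np.array([0.5, 0.5])
A0 = np.array([[0.0, 1.0], [1.0, 0.0]])
b = np.array([[0.9, 0.2], [0.9, 0.2], [0.1, 0.8]])  # likelihoods of 3 symbols
like = forward(pi, A0, b, lambda d: np.full(2, 0.6))
assert 0.0 < like < 1.0
```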
Similar considerations as for the forward procedure can be made for the backward algorithm, which is implemented by defining the variable $\beta_t(i)$ as

$$\beta_t(i) = P(\mathbf{x}_{t+1} \mathbf{x}_{t+2} \cdots \mathbf{x}_T \mid s_t = S_i, \lambda), \quad 1 \le i \le N. \quad (26)$$

Having estimated the dynamic transition matrix $A_{\mathbf{d}_t}$ for each $1 \le t \le T$ using (24), the backward variable can be calculated inductively as follows:

(1) initialization:

$$\beta_T(i) = 1, \quad 1 \le i \le N; \quad (27)$$

(2) induction:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}(\mathbf{d}_t) \, b_j(\mathbf{x}_{t+1}) \, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1, \ 1 \le i \le N. \quad (28)$$

Although the variable $\beta_t(i)$ is not necessary for the calculation of the model likelihood, it will be useful in the parameter reestimation procedure, as will be explained in Section 2.2.3.
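The backward recursion of Equations (27)-(28) is a short sketch once the dynamic transition matrices from the forward pass are kept; the toy matrices below are illustrative:

```python
import numpy as np

def backward(A_d_seq, b):
    """Backward pass of Equations (27)-(28). A_d_seq[t] is the dynamic
    transition matrix A_{d_t} saved during the forward pass; b is (T, N)."""
    T, N = b.shape
    beta = np.ones((T, N))                     # Equation (27): beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A_d_seq[t] @ (b[t + 1] * beta[t + 1])  # Equation (28)
    return beta

A = np.array([[0.6, 0.4], [0.4, 0.6]])          # illustrative constant A_{d_t}
b = np.array([[0.9, 0.2], [0.9, 0.2], [0.1, 0.8]])
beta = backward([A, A, A], b)
assert np.allclose(beta[-1], 1.0)
```

A useful sanity check is the identity $P(\mathbf{x} \mid \lambda) = \sum_i \pi_i b_i(\mathbf{x}_1) \beta_1(i)$, which must agree with the forward termination of Equation (25).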
2.2.2. The Viterbi Algorithm. The Viterbi algorithm [46, 47] (also known as decoding) allows determining the best state sequence corresponding to a given observation sequence. Formally, given a sequence of observations $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$, the best state sequence $S^* = s^*_1 s^*_2 \cdots s^*_T$ corresponding to $\mathbf{x}$ is calculated by defining the variable $\delta_t(i)$ as

$$\delta_t(i) = \max_{s_1 s_2 \cdots s_{t-1}} P(s_1 s_2 \cdots s_t = S_i, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t \mid \lambda). \quad (29)$$

The procedure to recursively calculate the variable $\delta_t(i)$ and to retrieve the target state sequence (i.e., the arguments which maximize the $\delta_t(i)$'s) for the proposed HSMM is a straightforward extension of the Viterbi algorithm for HMMs [13]. The only change is the usage, in the recursive calculation of $\delta_t(i)$, of the dynamic transition matrix $A_{\mathbf{d}_t} = [a_{ij}(\mathbf{d}_t)]$ calculated through (24). The Viterbi algorithm for the introduced parametric HSMMs can be summarized as follows:

(1) initialization, with $1 \le i \le N$:

$$\delta_1(i) = \pi_i \, b_i(\mathbf{x}_1), \quad \psi_1(i) = 0; \quad (30)$$

(2) recursion, with $1 \le j \le N$ and $2 \le t \le T$:

$$\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}(\mathbf{d}_t)] \, b_j(\mathbf{x}_t), \quad (31)$$

$$\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}(\mathbf{d}_t)]; \quad (32)$$

(3) termination:

$$P^* = \max_{1 \le i \le N} [\delta_T(i)], \quad (33)$$

$$s^*_T = \arg\max_{1 \le i \le N} [\delta_T(i)], \quad (34)$$

where we keep track of the argument maximizing (31) using the vector $\psi_t$, which, tracked back, gives the desired best state sequence:

$$s^*_t = \psi_{t+1}(s^*_{t+1}), \quad t = T-1, T-2, \ldots, 1. \quad (35)$$
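The decoding steps above can be sketched as follows, assuming the time-varying matrices $A_{\mathbf{d}_t}$ have already been computed via Equation (24); log space is used only as a numerical convenience, and the toy model is invented for illustration:

```python
import numpy as np

def viterbi(pi, b, A_d_seq):
    """Viterbi decoding of Equations (30)-(35) with time-varying transition
    matrices. b is (T, N); A_d_seq holds the T-1 matrices A_{d_t}."""
    T, N = b.shape
    delta = np.log(pi) + np.log(b[0])                    # Equation (30)
    psi = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(A_d_seq[t - 1]) # delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)                   # Equation (32)
        delta = scores.max(axis=0) + np.log(b[t])        # Equation (31)
    path = [int(delta.argmax())]                         # Equation (34)
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))               # Equation (35)
    return path[::-1]

# Two states: state 0 favors symbol A, state 1 favors symbol B, so the best
# path for the observation sequence A, A, B, B is 0, 0, 1, 1.
A = np.array([[0.8, 0.2], [0.3, 0.7]])
b = np.array([[0.9, 0.2], [0.9, 0.2], [0.1, 0.8], [0.1, 0.8]])
assert viterbi(np.array([0.5, 0.5]), b, [A] * 3) == [0, 0, 1, 1]
```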
2.2.3. The Training Algorithm. The training algorithm consists of estimating the model parameters from the observation data. As discussed in Section 2.1, a parametric HSMM is defined by $\lambda = (A^0, \Theta, C, \mu, U, \pi)$ if the observations are continuous, or $\lambda = (A^0, \Theta, B, \pi)$ if the observations are discrete. Given a generic observation sequence $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$, referred to as the training set in the following, the training procedure consists of finding the model parameter set $\lambda^*$ which locally maximizes the model likelihood $P(\mathbf{x} \mid \lambda)$.

We use the modified Baum-Welch algorithm of Azimi et al. [30–32]. However, in our implementation we make no assumption on the density function used to model the state duration, and we consider both continuous and discrete observations.

Being a variant of the more general Expectation-Maximization (EM) algorithm, Baum-Welch is an iterative procedure which consists of two steps: (i) the expectation step, in which the forward and backward variables are calculated and the model likelihood is estimated, and (ii) the maximization step, in which the model parameters are updated and used in the next iteration. This process usually starts from a random guess of the model parameters $\lambda^0$, and it is iterated until the likelihood function does not improve between two consecutive iterations.

Similarly to HMMs, the reestimation formulas are derived by firstly introducing the variable $\xi_t(i,j)$, which represents the probability of being in state $S_i$ at time $t$ and in state $S_j$ at time $t+1$, given the model and the observation sequence:

$$\xi_t(i,j) = P(s_t = S_i, s_{t+1} = S_j \mid \mathbf{x}, \lambda). \quad (36)$$

However, in the HSMM case, the variable $\xi_t(i,j)$ considers the duration estimation performed in the forward algorithm (see Equation (24)). Formulated in terms of the forward and backward variables, it is given by

$$\xi_t(i,j) = \frac{P(s_t = S_i, s_{t+1} = S_j, \mathbf{x} \mid \lambda)}{P(\mathbf{x} \mid \lambda)} = \frac{\alpha_t(i) \, a_{ij}(\mathbf{d}_t) \, b_j(\mathbf{x}_{t+1}) \, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i) \, a_{ij}(\mathbf{d}_t) \, b_j(\mathbf{x}_{t+1}) \, \beta_{t+1}(j)}. \quad (37)$$
From $\xi_t(i,j)$ we can derive the quantity $\gamma_t(i)$ (already defined in (17)), representing the probability of being in state $S_i$ at time $t$, given the observation sequence and the model parameters:

$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j). \quad (38)$$
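The two E-step quantities of Equations (37)-(38) can be sketched together; the random inputs below merely stand in for forward/backward variables of compatible shape, since the normalization makes the identities hold regardless of the values:

```python
import numpy as np

def xi_gamma(alpha, beta, A_d_seq, b):
    """E-step quantities of Equations (37)-(38): xi[t, i, j] and gamma[t, i].
    alpha, beta, b are (T, N) arrays from the forward/backward passes;
    A_d_seq[t] is the dynamic transition matrix at time t."""
    T, N = b.shape
    xi = np.empty((T - 1, N, N))
    for t in range(T - 1):
        num = alpha[t][:, None] * A_d_seq[t] * (b[t + 1] * beta[t + 1])[None, :]
        xi[t] = num / num.sum()            # Equation (37)
    return xi, xi.sum(axis=2)              # Equation (38)

rng = np.random.default_rng(0)
alpha = rng.random((3, 2)) + 0.1           # placeholder forward variables
beta = rng.random((3, 2)) + 0.1            # placeholder backward variables
b = rng.random((3, 2)) + 0.1
A = np.array([[0.6, 0.4], [0.4, 0.6]])
xi, gamma = xi_gamma(alpha, beta, [A, A], b)
assert np.allclose(xi.sum(axis=(1, 2)), 1.0)   # each xi_t is a distribution
assert np.allclose(gamma.sum(axis=1), 1.0)     # each gamma_t sums to 1
```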
Finally, the reestimation formulas for the parameters $\pi$ and $A^0$ are given by

$$\pi_i = \gamma_1(i), \quad (39)$$

$$a^0_{ij} = \frac{\left( \sum_{t=1}^{T-1} \xi_t(i,j) \right) \odot G}{\sum_{j=1}^{N} \left( \sum_{t=1}^{T-1} \xi_t(i,j) \right) \odot G}, \quad (40)$$

where $G = [g_{ij}]$ is a square matrix of dimensions $N \times N$, with $g_{ij} = 0$ for $i = j$ and $g_{ij} = 1$ for $i \neq j$, and $\odot$ represents the element-by-element product between two matrices; $\sum_{t=1}^{T-1} \gamma_t(i)$ is the expected number of transitions from state $S_i$, and $\sum_{t=1}^{T-1} \xi_t(i,j)$ is the expected number of transitions from state $S_i$ to state $S_j$.

Equation (39) represents the expected number of times that the model starts in state $S_i$, while (40) represents the expected number of transitions from state $S_i$ to state $S_j$, with $i \neq j$, over the total expected number of transitions from state $S_i$ to any other state different from $S_i$.

The matrix $A^0$ being normalized, the stochastic constraints are satisfied at each iteration, that is, $\sum_{j=1}^{N} a^0_{ij} = 1$ for each $1 \le i \le N$, while the estimation of the prior probability $\pi_i$ inherently sums up to 1 at each iteration, since it represents the expected frequency in state $S_i$ at time $t = 1$, for each $1 \le i \le N$.

With respect to the reestimation of the state duration parameters $\Theta$, we first estimate the mean $\mu_{id}$ and the variance $\sigma^2_{id}$ of the $i$th state duration, for each $1 \le i \le N$, from the forward and backward variables and the estimation of the state duration variable:

$$\mu_{id} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \neq i}^{N} a_{ij}(\bar{d}_t(i)) \, b_j(\mathbf{x}_{t+1}) \, \beta_{t+1}(j) \right) \bar{d}_t(i)}{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \neq i}^{N} a_{ij}(\bar{d}_t(i)) \, b_j(\mathbf{x}_{t+1}) \, \beta_{t+1}(j) \right)}, \quad (41)$$
$$\sigma^2_{id} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \neq i}^{N} a_{ij}(\bar{d}_t(i)) \, b_j(\mathbf{x}_{t+1}) \, \beta_{t+1}(j) \right) (\bar{d}_t(i) - \mu_{id})^2}{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \neq i}^{N} a_{ij}(\bar{d}_t(i)) \, b_j(\mathbf{x}_{t+1}) \, \beta_{t+1}(j) \right)}, \quad (42)$$

where (41) can be interpreted as the probability of transition from state $S_i$ to $S_j$, with $i \neq j$, at time $t$, weighted by the duration of state $S_i$ at $t$, giving the desired expected value, while in (42) the same quantity is weighted by the squared distance of the duration at time $t$ from its mean, giving the estimation of the variance.
Then, the parameters of the desired duration distribution can be estimated from $\mu_{id}$ and $\sigma^2_{id}$. For example, if a Gamma distribution with shape parameter $\nu$ and scale parameter $\eta$ is chosen to model the state duration, the parameters $\nu_i$ and $\eta_i$, for each $1 \le i \le N$, can be calculated as $\nu_i = \mu^2_{id} / \sigma^2_{id}$ and $\eta_i = \sigma^2_{id} / \mu_{id}$.
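This method-of-moments step is easy to verify numerically; the duration mean and variance below are invented for illustration, and the shape/scale recovery is checked against the first two moments of the resulting Gamma distribution:

```python
from scipy.stats import gamma

# Method-of-moments recovery of the Gamma duration parameters:
# nu = mu^2 / sigma^2 (shape) and eta = sigma^2 / mu (scale).
mu_d, var_d = 12.0, 9.0    # illustrative estimated duration mean and variance
nu = mu_d ** 2 / var_d     # shape: 16.0
eta = var_d / mu_d         # scale: 0.75

dist = gamma(a=nu, scale=eta)
assert abs(dist.mean() - mu_d) < 1e-9   # a * scale = mu
assert abs(dist.var() - var_d) < 1e-9   # a * scale^2 = sigma^2
```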
Concerning the observation parameters, once the modified forward and backward variables accounting for the state duration are defined as in (22) and (28), the reestimation formulas are the same as for Hidden Markov Models [13].

In particular, for continuous observations, the parameters of the Gaussian mixture defined in (9) are reestimated by firstly defining the probability of being in state $S_j$ at time $t$, with the probability of the observation vector $\mathbf{x}_t$ evaluated by the $k$th mixture component, as

$$\gamma_t(j,k) = \left[ \frac{\alpha_t(j) \, \beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j) \, \beta_t(j)} \right] \cdot \left[ \frac{c_{jk} \, \mathcal{N}(\mathbf{x}_t; \boldsymbol{\mu}_{jk}, U_{jk})}{\sum_{m=1}^{M} c_{jm} \, \mathcal{N}(\mathbf{x}_t; \boldsymbol{\mu}_{jm}, U_{jm})} \right]. \quad (43)$$
By using the former quantity, the parameters $c_{jk}$, $\boldsymbol{\mu}_{jk}$, and $U_{jk}$ are reestimated through the following formulas:

$$c_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_t(j,m)},$$

$$\boldsymbol{\mu}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k) \cdot \mathbf{x}_t}{\sum_{t=1}^{T} \gamma_t(j,k)},$$

$$U_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k) \cdot (\mathbf{x}_t - \boldsymbol{\mu}_{jk})(\mathbf{x}_t - \boldsymbol{\mu}_{jk})^T}{\sum_{t=1}^{T} \gamma_t(j,k)}, \quad (44)$$

where the superscript $T$ denotes vector transpose.

For discrete observations, the reestimation formula for the observation matrix $b_j(l)$ is

$$b_j(l) = \frac{\sum_{t=1, \, x_t = X_l}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \quad (45)$$

where the quantity $\gamma_t(j)$, which takes into account the duration-dependent forward variable $\alpha_t(j)$, is calculated through (17). The reader is referred to Rabiner's work [13] for the interpretation of the observation parameters' reestimation formulas.
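The discrete reestimation of Equation (45) amounts to accumulating the state posteriors over the time steps at which each symbol was emitted; the sketch below uses a tiny hand-made posterior matrix (illustrative values only):

```python
import numpy as np

def reestimate_B(gamma, obs, L):
    """Discrete-observation reestimation of Equation (45):
    b_j(l) = (sum of gamma_t(j) over {t : x_t = X_l}) / (sum over all t).
    gamma is (T, N); obs holds symbol indices 0..L-1."""
    T, N = gamma.shape
    B = np.zeros((N, L))
    for l in range(L):
        B[:, l] = gamma[obs == l].sum(axis=0)
    return B / gamma.sum(axis=0)[:, None]

gamma = np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]])  # toy posteriors
obs = np.array([0, 0, 1])                               # symbol indices
B = reestimate_B(gamma, obs, L=2)
assert np.allclose(B.sum(axis=1), 1.0)   # each row of B is stochastic
```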
3. AIC-Based Model Selection

In the framework of the proposed parametric HSMMs, the model selection procedure aims to select the optimal number of hidden states $N$, the right duration distribution family, and, in the case of mixture observation modeling, the number of Gaussian mixtures $M$ to be used. In this work we make use of the Akaike Information Criterion (AIC). Indeed, it has been seen that in the case of complex models and in the presence of a limited number of training observations, AIC represents a satisfactory methodology for model selection, outperforming other approaches like the Bayesian Information Criterion.

In general, information criteria are represented as a two-term structure: they account for a compromise between a measure of model fitness, which is based on the likelihood of the model, and a penalty term, which takes into account the model complexity. Usually, the model complexity is measured in terms of the number of parameters that have to be estimated and in terms of the number of observations.

The Akaike Information Criterion is an estimate of the asymptotic value of the expected distance between the unknown true likelihood function of the data and the fitted likelihood function of the model. In particular, the AIC can be expressed as

$$\text{AIC} = \frac{-\log L(\hat{\lambda}) + p}{T}, \quad (46)$$

where $L(\hat{\lambda})$ is the likelihood of the model with the estimated parameters, as defined in (25), $p$ is the number of model parameters, and $T$ is the length of the observed sequence. The best model is the one minimizing equation (46).
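Equation (46) is a one-liner; the sketch below also shows how the penalty can overturn a better fit (the log-likelihoods and counts are invented for illustration):

```python
def aic(log_likelihood, p, T):
    """AIC of Equation (46): (-log L(lambda) + p) / T; among candidate
    models, the one with the smallest value is selected."""
    return (-log_likelihood + p) / T

# A better-fitting model can still lose if its parameter count is higher:
# (100 + 10) / 500 = 0.22 beats (95 + 40) / 500 = 0.27.
assert aic(-100.0, 10, 500) < aic(-95.0, 40, 500)
```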
Concerning p, the number of parameters to be estimated for a parametric HSMM with N states is p = p_h + p_o, where p_h are the parameters of the hidden states layer, while p_o are those of the observation layer.
In particular, p_h = (N − 1) + (N − 1)·N + z·N, where

(i) N − 1 accounts for the prior probabilities π;
(ii) (N − 1)·N accounts for the nonrecurrent transition matrix A0;
(iii) z·N accounts for the duration probability, z being the number of parameters θ of the duration distribution.

Concerning p_o, a distinction must be made between discrete and continuous observations:

(i) in the case of discrete observations with L possible observable values, p_o = (L − 1)·N, which accounts for the elements of the observation matrix B;
(ii) if the observations are continuous and a multivariate mixture of M Gaussians with O variates is used as observation model, p_o = [M·O + M·O² + (M − 1)]·N, where each term accounts, respectively, for the mean vectors μ, the covariance matrices U, and the mixture coefficients C.
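The parameter count and criterion above can be sketched as follows. This is a minimal illustration, not the authors' code; the continuous-case covariance term assumes full O×O covariance matrices, which the extracted text only partially specifies.

```python
def n_parameters(N, z, L=None, M=None, O=None):
    """Parameter count p = p_h + p_o for a parametric HSMM.

    N: number of hidden states, z: parameters per duration distribution,
    L: number of discrete symbols (discrete case),
    M, O: number of mixtures and variates (continuous case).
    """
    p_h = (N - 1) + (N - 1) * N + z * N            # priors + A0 + durations
    if L is not None:                              # discrete observations
        p_o = (L - 1) * N
    else:                                          # Gaussian-mixture observations
        p_o = (M * O + M * O * O + (M - 1)) * N    # means + full covariances + weights
    return p_h + p_o

def aic(log_likelihood, p, T):
    """AIC = -log L(lambda_hat) + p / T, as in (46); lower is better."""
    return -log_likelihood + p / T

# Discrete example: N = 5 states, Gaussian durations (z = 2), L = 7 symbols
p = n_parameters(N=5, z=2, L=7)
print(p)  # -> 64
```

For instance, with N = 5, z = 2, and L = 7, one gets p_h = 4 + 20 + 10 = 34 and p_o = 6·5 = 30, so p = 64.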
4 Remaining Useful Lifetime Estimation
One of the most important advantages of the explicit time modeling of HSMMs is the possibility to effectively address the prediction problem. Knowledge of the state duration distributions allows the estimation of the remaining time in a certain state and, in general, the prediction of the expected time D before entering a given state.
As already mentioned, an interesting application of the prediction problem is the Remaining Useful Lifetime (RUL) estimation of industrial equipment. Indeed, if each state of an HSMM is mapped to a different condition of an industrial machine, and if the state S_k that represents the failure condition is identified, at each moment the RUL can be defined as the expected time D to reach the failure state
Mathematical Problems in Engineering 9
S_k. If we assume that the time to failure is a random variable D following a determinate probability density, we define the RUL at the current time t as

RUL_t = D̂ = E(D | s_{t+D̂} = S_k, s_{t+D̂−1} = S_i), 1 ≤ i, k ≤ N, i ≠ k, (47)
where E denotes the expected value. Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):

(1) estimation of the current state;
(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn times.
The estimation of the current state is performed via the Viterbi path, that is, the variable δ_t = [δ_t(i)], 1 ≤ i ≤ N, defined in (29). To correctly model the uncertainty of the current state estimation, we use the normalized variable δ̄_t(i), obtained as

δ̄_t(i) = max_{s_1, s_2, …, s_{t−1}} P(s_t = S_i | s_1 s_2 ⋯ s_{t−1}, x_1 x_2 ⋯ x_t, λ)
        = δ_t(i) / Σ_{j=1}^{N} δ_t(j), 1 ≤ i ≤ N, (48)
which is an estimate of the probability of being in state S_i at time t.
Together with the normalized variable δ̄_t(i), the maximum a posteriori estimate of the current state s*_t is taken into account, according to (34). If s*_t coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimate of the average remaining time in the current state, d_avg(s*_t), is calculated as
d_avg(s*_t) = Σ_{i=1}^{N} (μ_{d_i} − d_t(i)) ⊙ δ̄_t(i), (49)
where μ_{d_i} denotes the expected value of the duration variable in state S_i, according to the duration distribution specified by the parameters θ_i. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration d_t(i) at time t from the expected sojourn time of state S_i, weighting the result by the uncertainty about the current state δ̄_t(i), and finally summing up the contributions from all states.
In addition to the average remaining time, a lower and an upper bound can be calculated, based on the standard deviation σ_{d_i} of the duration distribution for state S_i:
d_low(s*_t) = Σ_{i=1}^{N} (μ_{d_i} − σ_{d_i} − d_t(i)) ⊙ δ̄_t(i), (50)

d_up(s*_t) = Σ_{i=1}^{N} (μ_{d_i} + σ_{d_i} − d_t(i)) ⊙ δ̄_t(i). (51)
Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimate, as follows:
δ_next = [δ_{t+d}(i)]_{1≤i≤N} = (A0)^T · δ̄_t, (52)
while the maximum a posteriori estimate of the next state s*_next is calculated as

s*_next = s*_{t+d} = argmax_{1≤i≤N} δ_{t+d}(i). (53)
Again, if s*_{t+d} coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimate of the failure time is D_avg = d_avg(s*_t), calculated at the previous step, with the bound values D_low = d_low(s*_t) and D_up = d_up(s*_t). Otherwise, the estimation of the sojourn time of the next state is calculated as follows:
d_avg(s*_{t+d}) = Σ_{i=1}^{N} μ_{d_i} ⊙ δ_{t+d}(i), (54)

d_low(s*_{t+d}) = Σ_{i=1}^{N} (μ_{d_i} − σ_{d_i}) ⊙ δ_{t+d}(i), (55)

d_up(s*_{t+d}) = Σ_{i=1}^{N} (μ_{d_i} + σ_{d_i}) ⊙ δ_{t+d}(i). (56)
This procedure is repeated until the failure state is encountered in the prediction of the next state. The calculation of the RUL is then simply obtained by summing all the estimated remaining times in each intermediate state before encountering the failure state:
D_avg = Σ d_avg, (57)

D_low = Σ d_low, (58)

D_up = Σ d_up. (59)
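The projection procedure of (49)–(59) can be sketched as follows. This is a hedged NumPy sketch, not the authors' implementation; `estimate_rul` and its argument names are illustrative.

```python
import numpy as np

def estimate_rul(delta_bar, d_t, mu_d, sigma_d, A0, failure_state, max_steps=100):
    """Sketch of the RUL projection of (49)-(59).

    delta_bar: normalized current-state probabilities (48),
    d_t: estimated time already spent in each state (20),
    mu_d, sigma_d: mean/std of each state's duration distribution,
    A0: nonrecurrent transition matrix, failure_state: index of S_k.
    """
    if np.argmax(delta_bar) == failure_state:
        return 0.0, 0.0, 0.0                       # already in the failure state
    # Remaining time in the current state, weighted by state uncertainty (49)-(51)
    D_avg = np.sum((mu_d - d_t) * delta_bar)
    D_low = np.sum((mu_d - sigma_d - d_t) * delta_bar)
    D_up = np.sum((mu_d + sigma_d - d_t) * delta_bar)
    delta = delta_bar
    for _ in range(max_steps):
        delta = A0.T @ delta                       # next-state probabilities (52)
        if np.argmax(delta) == failure_state:      # failure predicted next: stop (53)
            break
        # Expected sojourn time of the projected state (54)-(56)
        D_avg += np.sum(mu_d * delta)
        D_low += np.sum((mu_d - sigma_d) * delta)
        D_up += np.sum((mu_d + sigma_d) * delta)
    return D_avg, D_low, D_up                      # (57)-(59)
```

For a left-right chain that is surely in state 1 with 10 time units already elapsed, the sketch accumulates the residual sojourn of state 1 plus the full expected sojourns of the intermediate states before the failure state.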
Finally, Algorithm 1 details the RUL estimation procedure described above.
5 Experimental Results
To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and real data.
The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real-case data are monitoring data related to the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
(1) function RulEstimation(x_t, S_k)    ⊳ x_t: the last observation acquired
(2)                                     ⊳ S_k: the failure state
(3) Initialization:
(4) D_avg ← 0
(5) D_low ← 0
(6) D_up ← 0
(7) Current state estimation:
(8) Calculate δ̄_t     ⊳ Using (48)
(9) Calculate s*_t     ⊳ Using (34)
(10) Calculate d_t     ⊳ Using (20)
(11) S ← s*_t
(12) Loop:
(13) while S ≠ S_k do
(14)   Calculate d_avg  ⊳ Using (49) or (54)
(15)   Calculate d_low  ⊳ Using (50) or (55)
(16)   Calculate d_up   ⊳ Using (51) or (56)
(17)   D_avg ← D_avg + d_avg
5.1 Simulated Experiment. Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.
5.1.1 Data Generation. The industrial machine subject of these experiments has been modeled as a left-right parametric HSMM with N = 5 states, having state S_5 as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity reasons, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective to solve automatic model selection, online condition monitoring, and prediction problems. For this purpose, we divided the experiments into two cases, according to the nature of the observations, being continuous or discrete.
For each of the continuous and discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains T = 650 observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and are adapted to obtain an equivalent left-right parametric HSMM, as follows:
π = [1, 0, 0, 0, 0]^T,

A0 =
[ 0 1 0 0 0
  0 0 1 0 0
  0 0 0 1 0
  0 0 0 0 1
  0 0 0 0 1 ],

Θ_N: θ_1 = [100, 20], θ_2 = [90, 15], θ_3 = [100, 20], θ_4 = [80, 25], θ_5 = [200, 1],

Θ_G: θ_1 = [500, 0.2], θ_2 = [540, 0.1667], θ_3 = [500, 0.2], θ_4 = [256, 0.3125], θ_5 = [800, 0.005],

Θ_W: θ_1 = [102, 2.8], θ_2 = [92, 2.9], θ_3 = [102, 2.8], θ_4 = [82, 2.0], θ_5 = [200, 2.56], (60)
where Θ_N, Θ_G, and Θ_W are the distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean μ_d and the variance σ²_d of the Gaussian distribution, the shape ν_d and the scale η_d of the Gamma distribution, and the scale a_d and the shape b_d of
Figure 2: The data generated with the parameters described in Section 5.1.1, both for the continuous case (a: hidden states sequence, state duration, observed signal) and the discrete case (b: hidden states sequence, state duration, observed symbols).
the Weibull distribution. It must be noticed that, as explained in Section 2.1, for state S_5, being the absorbing state, the duration parameters θ_5 have no influence on the data, since once state S_5 is reached the system will remain there forever.
Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used, with the following parameters [15]:
μ_1 = [20, 20]^T, μ_2 = [20, 35]^T, μ_3 = [35, 35]^T, μ_5 = [28, 28]^T,

U_1 = [ 20 0; 0 20 ], U_2 = [ 15 0; 0 15 ], U_3 = [ 15 −2; −2 15 ],
U_4 = [ 5 0; 0 5 ], U_5 = [ 10 3; 3 10 ], (61)
while for the discrete case, L = 7 distinct observation symbols have been taken into consideration, with the following observation probability distribution:

B =
[ 0.8 0.2 0   0   0   0   0
  0.1 0.8 0.1 0   0   0   0
  0   0.1 0.8 0.1 0   0   0
  0   0   0.1 0.7 0.1 0.1 0
  0   0   0   0.2 0.6 0.1 0.1 ]. (62)
An example of the simulated data, both for the continuous and the discrete cases, is shown in Figure 2, where a Gaussian duration model has been used.
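The generation process for the discrete case can be sketched as follows, under the assumption of Gaussian durations and the observation matrix B above. Variable names are illustrative, and the second entry of each Θ_N pair is treated as a variance, as stated in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Left-right HSMM of Section 5.1.1: Gaussian durations, 7 discrete symbols
A0 = np.array([[0, 1, 0, 0, 0],
               [0, 0, 1, 0, 0],
               [0, 0, 0, 1, 0],
               [0, 0, 0, 0, 1],
               [0, 0, 0, 0, 1]], dtype=float)
mu_d = np.array([100.0, 90.0, 100.0, 80.0, 200.0])        # duration means (Theta_N)
sd_d = np.sqrt(np.array([20.0, 15.0, 20.0, 25.0, 1.0]))   # variances -> std devs
B = np.array([[.8, .2, 0, 0, 0, 0, 0],
              [.1, .8, .1, 0, 0, 0, 0],
              [0, .1, .8, .1, 0, 0, 0],
              [0, 0, .1, .7, .1, .1, 0],
              [0, 0, 0, .2, .6, .1, .1]])

def simulate(T=650):
    states, obs, s = [], [], 0                  # pi puts all mass on the first state
    while len(states) < T:
        if s == 4:                              # absorbing failure state: stay until T
            d = T - len(states)
        else:                                   # sample a sojourn from the duration model
            d = max(1, int(round(rng.normal(mu_d[s], sd_d[s]))))
        for _ in range(min(d, T - len(states))):
            states.append(s)
            obs.append(rng.choice(7, p=B[s]))   # emit a symbol from row s of B
        s = rng.choice(5, p=A0[s])              # nonrecurrent transition
    return np.array(states), np.array(obs)

states, obs = simulate()
```

The state sequence is nondecreasing, as expected for a left-right model, and each emitted symbol is drawn from the row of B associated with the current state.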
5.1.2 Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving automatic model selection. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states N from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures M in the observation distribution from 1 to 4.
As accurate parameter initialization is crucial for obtaining a good model fitting [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters λ^0, has been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters λ*, corresponding to the minimum AIC value, has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.
The obtained results are shown in Figure 3 for both the continuous and discrete observation data. As can be noticed, for all the 6 test cases of Figure 3, the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered an optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.
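The sweep over model structures and random restarts can be sketched as follows. Here `fit_hsmm` is a placeholder for the EM learning procedure, which is not shown in this section; it is assumed to return the final log-likelihood and the parameter count of the fitted model.

```python
import math
import random

def select_model(train_data, fit_hsmm, T):
    """AIC-based model selection sweep over structures and random restarts.

    fit_hsmm(train_data, N, family) is a placeholder for the learning
    procedure and must return (log_likelihood, n_params).
    """
    best = (math.inf, None)
    for N in range(2, 9):                           # number of states, 2..8
        for family in ("gaussian", "gamma", "weibull"):
            for restart in range(40):               # 40 random initializations
                random.seed(restart)                # reproducible initialization
                loglik, p = fit_hsmm(train_data, N, family)
                aic = -loglik + p / T               # criterion (46)
                if aic < best[0]:
                    best = (aic, (N, family, restart))
    return best                                     # (best AIC, (N, family, restart))
```

With a well-behaved learner, the sweep recovers the structure whose fitted likelihood best offsets its parameter count, mirroring the experiment of Figure 3.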
5.1.3 Condition Monitoring. The optimal parameters λ* obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all the 6 considered HSMM configurations. In this experiment we simulate online monitoring by considering all the testing observations up to the current time, that is, x_1 x_2 ⋯ x_t. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state s*_t = argmax_{1≤i≤N}[δ_t(i)], as specified in (34).
Figure 3: Akaike Information Criterion (AIC) applied to continuous and discrete observation data, plotted against the number of states (2 to 8) for the Gaussian, Gamma, and Weibull duration distributions: (a) continuous data, Gaussian duration; (b) continuous data, Gamma duration; (c) continuous data, Weibull duration; (d) discrete data, Gaussian duration; (e) discrete data, Gamma duration; (f) discrete data, Weibull duration. AIC is effective for automatic model selection, since its minimum value recovers the same number of states and duration model used to generate the data.
An example of execution of the condition monitoring experiment is shown in Figure 4, for continuous and discrete observations, respectively. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, and the third display represents the observed time series.
Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of
Figure 4: Condition monitoring using the Viterbi path: (a) state estimation for continuous data and Gamma duration distribution (Viterbi path accuracy 0.985); (b) state estimation for discrete data and Gaussian duration distribution (Viterbi path accuracy 0.992). HSMMs can effectively solve condition monitoring problems in time-dependent applications, due to their high accuracy in hidden state recognition.
the testing cases. The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.
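The accuracy metric used here reduces to a simple element-wise comparison; a minimal sketch:

```python
import numpy as np

def state_accuracy(true_states, viterbi_states):
    """Fraction of time steps where the Viterbi estimate matches the true state."""
    true_states = np.asarray(true_states)
    viterbi_states = np.asarray(viterbi_states)
    return float(np.mean(true_states == viterbi_states))

print(state_accuracy([1, 1, 2, 3, 3], [1, 1, 2, 2, 3]))  # -> 0.8
```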
5.1.4 Remaining Useful Lifetime Estimation. In this experimental phase we considered the state S_5 as the failure state, and the trained parameters λ* of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations x_1 x_2 ⋯ x_t up to time t. When a new observation is acquired, after the current state probability δ̄_t(i) is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.
Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures, one can notice that the average, as well as the lower and upper bound estimations, converges to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and lower bounds become narrower as the failure state approaches, and the estimation becomes more precise, until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.
To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time t, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

APE(t) = |RUL_real(t) − RUL̂(t)|, (63)
where RUL_real(t) is the (known) value of the RUL at time t, while RUL̂(t) is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

APE̅ = (Σ_{t=1}^{T} APE(t)) / T, (64)
where T is the length of the testing signal. APE being a prediction error, average values of (64) close to zero correspond to good predictive performance.
The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b), for the continuous and discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively faced with HSMMs, which achieve a reliable estimation power with a small prediction error.
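Equations (63) and (64) amount to the following one-liner:

```python
import numpy as np

def mean_absolute_prediction_error(rul_real, rul_pred):
    """Average absolute prediction error of (63)-(64)."""
    ape = np.abs(np.asarray(rul_real, dtype=float) - np.asarray(rul_pred, dtype=float))  # (63)
    return float(ape.mean())                                                             # (64)

print(mean_absolute_prediction_error([300, 200, 100], [320, 190, 100]))  # -> 10.0
```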
Finally, we tested our RUL estimation methodology using the state duration estimation of (16), introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), in which the prediction errors obtained for continuous and discrete observations, respectively, are reported.
Comparing Tables 2 and 3, one can notice that the proposed RUL method outperforms the one of Azimi. This
Figure 5: HSMMs effectively solve RUL estimation problems: (a) Remaining Useful Lifetime estimation for continuous data and Weibull duration distribution; (b) Remaining Useful Lifetime estimation for discrete data and Gamma duration distribution. In both panels the true, upper, average, and lower RUL are plotted against time. The prediction converges to the actual RUL value, and its uncertainty decreases as the real failure time approaches.
Table 1: State recognition accuracy. (a) Continuous observations. Columns: test case; accuracy under the Gaussian, Gamma, and Weibull duration distributions.
is mainly due to the proposed average state duration of (20), compared to the one of Azimi, given by (16).
5.2 Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to industrial reality.
The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.
5.2.1 Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besancon, France), with the aim of collecting real data related to accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51–59].
The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing
Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20). (a) APE of the RUL estimation for the continuous observation test cases. Columns: test case; for each duration distribution (Gaussian, Gamma, Weibull), the average, upper, and lower APE.
housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).
The platform is composed of three main parts, a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.
The rotating part is composed of an asynchronous motor, which develops a power of 250 W, two shafts, and a gearbox that allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human-machine interface.
The load profile part applies a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.
Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16), introduced by Azimi et al. [30–32]. (a) APE of the RUL estimation for the continuous observation test cases. Columns: test case; for each duration distribution (Gaussian, Gamma, Weibull), the average, upper, and lower APE.
The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s, collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).
Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49]. Condition 1: 1800 rpm and 4000 N; Condition 2: 1650 rpm and 4200 N; Condition 3: 1500 rpm and 5000 N.
Figure 6: Global overview of the Pronostia experimental platform [19].
Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;
(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;
(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.
Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal exceeded 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.
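The 20 g stop criterion amounts to a threshold check over the vibration snapshots. The helper below is illustrative (the challenge organizers applied this criterion during the tests; it is not part of the proposed model):

```python
import numpy as np

def failure_time(snapshots, times, threshold_g=20.0):
    """Return the time stamp of the first vibration snapshot whose peak
    amplitude exceeds the stop criterion (20 g in the PHM 2012 tests)."""
    for t, snap in zip(times, snapshots):
        if np.max(np.abs(snap)) > threshold_g:
            return t
    return None  # no snapshot crossed the threshold
```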
Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments, considering, respectively, the bearings relative to the first and the second operating conditions (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).
As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].
In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots (L = 2560). Let r_w(t) be the raw signal of the w-th window; for each w we estimate the RMS as

x_RMS,w = sqrt((1/L) Σ_{t=1}^{L} r_w(t)²)

and the kurtosis as

x_KURT,w = ((1/L) Σ_{t=1}^{L} (r_w(t) − r̄_w)⁴) / ((1/L) Σ_{t=1}^{L} (r_w(t) − r̄_w)²)²,

where r̄_w is the mean of r_w. An example of feature extraction
for Bearing1_1 is shown in Figure 8.
To assess the performance of the proposed HSMM, after the model selection procedure we implemented a leave-one-out cross-validation scheme: considering conditions 1 and 2 separately, we performed the online RUL estimation for each of the 7 bearings, using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
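The windowed RMS and kurtosis features defined above can be computed as follows (a sketch with illustrative names):

```python
import numpy as np

def window_features(r, L=2560):
    """RMS and kurtosis over consecutive non-overlapping windows of L samples."""
    n = len(r) // L
    w = np.asarray(r[:n * L], dtype=float).reshape(n, L)
    rms = np.sqrt(np.mean(w ** 2, axis=1))                          # x_RMS,w
    c = w - w.mean(axis=1, keepdims=True)                           # r_w(t) - mean(r_w)
    kurt = np.mean(c ** 4, axis=1) / np.mean(c ** 2, axis=1) ** 2   # x_KURT,w
    return np.column_stack([rms, kurt])   # one (RMS, kurtosis) row per window
```

Note that this is the (non-excess) kurtosis, so a Gaussian window yields a value near 3, and sharp impulses, typical of bearing defects, push it higher.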
5.2.2 Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.
(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we evaluated several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states N from 2 to 6, and (iii) an increasing number of Gaussian mixtures M in the observation density from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations λ^0, on the data sets (Bearing1_1, …).
Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].
Figure 8: Raw vibration data r(t) (a) versus the RMS and kurtosis features x_1, x_2, ..., x_n extracted window by window (b) for Bearing1_1.
Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases, the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 Gaussian mixture for the observation density.
(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross-validation scheme: for condition 1, at each iteration, Bearing1_i, 1 ≤ i ≤ 7, was used as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters λ*_i were estimated for the i-th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state S_4 as the failure state. The same procedure has been performed for the bearings in condition 2.
Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as
Figure 9: AIC values for condition 1 (a) and condition 2 (b), plotted against the number of states (2 to 6) for the Gaussian, Gamma, and Weibull duration models. In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.
Figure 10: RUL estimation for Bearing1_7 (a) and Bearing2_6 (b), showing the true, upper, average, and lower RUL over time. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.
well as the lower and upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.
Concerning the quantitative estimation of the predictive performance, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations, and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, further decreasing to 14 minutes for condition 2.
6. Conclusion and Future Work
In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time to event estimation.
The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required
Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.
parameters to estimate. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.
This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model in a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.
As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.
Appendix
In this appendix we give the derivation of the state duration variable introduced in (20) as
$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left(\bar{d}_t(i) + 1\right). \quad \text{(A.1)}$$
The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at the current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution
$$d_t(i) \sim f(d). \quad \text{(A.2)}$$
We can specify the probability that the system has been in state $i$ for $d$ time units prior to the current time $t$, given the observations and the model parameters $\lambda$, and knowing that the current state is $i$, as
$$P\left(d_t(i) = d\right) = P\left(s_{t-d-1} \neq S_i,\ s_{t-d} = S_i,\ \ldots,\ s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t, \lambda\right). \quad \text{(A.3)}$$
We omit the conditioning on the model parameters $\lambda$ in the following equations, it being inherently implied. We are interested in deriving the estimator $\bar{d}$
Because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, we can rewrite (A.13) as follows:
$$\bar{d}_{t+1}(i) = P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \cdot \left(\bar{d}_t(i) + 1\right). \quad \text{(A.15)}$$
The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state $i$ in the previous step.
In order to transform (A.15) in terms of model parameters, for an easy numerical calculation of the induction for $\bar{d}$
The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as
$$P\left(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) = \sum_{d_t} a_{ii}(\mathbf{d}_t) \cdot P\left(d_t \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \approx a_{ii}(\bar{\mathbf{d}}_t) \quad \text{(A.20)}$$
while the denominator of (A.19) can be expressed as follows:
$$P\left(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right) = \frac{P\left(\mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1}\right)}{P\left(\mathbf{x}_1, \ldots, \mathbf{x}_t\right)} = \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}. \quad \text{(A.21)}$$
By substituting (A.20) and (A.21) in (A.19) we obtain
$$P\left(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N} \alpha_{t+1}(i)} \quad \text{(A.22)}$$
and then, by combining (A.22) and (A.16), we obtain
$$P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i) \sum_{i=1}^{N} \alpha_{t+1}(i)}. \quad \text{(A.23)}$$
Finally, by substituting (A.23) in (A.15) and considering that
$$\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)} \quad \text{(A.24)}$$
we derive the induction formula for $\bar{d}_{t+1}(i)$ in terms of model parameters as
$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left(\bar{d}_t(i) + 1\right). \quad \text{(A.25)}$$
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry (Osaka and Kariya, Japan, October-November 1989), E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines—prognostics—part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), Bari, Italy, June 2009, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Automatic Control, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data-driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "PRONOSTIA: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to PRONOSTIA's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to PRONOSTIA's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.
Contrarily to HMMs, for HSMMs the state duration must be taken into account in the forward variable calculation. Consequently, Yu [26] proposed the following inductive formula:
$$\alpha_t(j, d) = \sum_{d'=1}^{D} \sum_{i=1}^{N} \left( \alpha_{t-d'}(i, d')\, a^0_{ij}\, p_j(d') \prod_{k=t-d+1}^{t} b_j(\mathbf{x}_k) \right), \quad 1 \le j \le N,\ 1 \le t \le T, \quad (13)$$
that is, the sum of the probabilities of being in the current state $S_j$ (at time $t$) for the past $d'$ time units (with $1 \le d' \le D$ and $D$ the maximum allowed duration for each state), coming from all the possible previous states $S_i$, $1 \le i \le N$ and $i \ne j$.
The disadvantage of the above formulation is that, as discussed in the Introduction, the specification of the maximum duration $D$ represents a limitation to the modeling generalization. Moreover, from (13) it is clear that the computation and memory complexities drastically increase with $D$, which can be very large in many applications, in particular for online failure prediction.
To alleviate this problem, Azimi et al. [30–32] introduced a new forward algorithm for HSMMs that, by keeping track of the estimated average state duration at each iteration, has a computational complexity comparable to the forward algorithm for HMMs [13]. However, the average state duration represents an approximation. Consequently, the forward algorithm of Azimi, compared with (13), pays the price of a lower precision in favor of an (indispensable) better computational efficiency.
To calculate the forward variable $\alpha_t(j)$ using Azimi's approach, the duration-dependent transition matrix defined in (8) is taken into consideration in the induction formula of (13), which becomes [30]
$$\alpha_t(j) = \left[ \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}(\bar{\mathbf{d}}_{t-1}) \right] b_j(\mathbf{x}_t). \quad (14)$$
To calculate the above formula, the average state duration of (2) must be estimated for each time $t$ by means of the variable $\bar{\mathbf{d}}_t = [\bar{d}_t(i)]$, defined as
$$\bar{d}_t(i) = \mathrm{E}\left(d_t(i) \mid \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, s_t = S_i, \lambda\right), \quad 1 \le i \le N, \quad (15)$$
where $\mathrm{E}$ denotes the expected value. To calculate the above quantity, Azimi et al. [30–32] use the following formula:
$$\bar{\mathbf{d}}_t = \boldsymbol{\gamma}_{t-1} \odot \bar{\mathbf{d}}_{t-1} + \mathbf{1}, \quad (16)$$
where $\odot$ represents the element-by-element product between two matrices/vectors, and the vector $\boldsymbol{\gamma}_t = [\gamma_t(i)]$ (the probability of being in state $S_i$ at time $t$ given the observation sequence and the model parameters), with dimensions $N \times 1$, is calculated in terms of $\alpha_t(i)$ as
$$\gamma_t(i) = P\left(s_t = S_i \mid \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, \lambda\right) = \frac{\alpha_t(i)}{\sum_{j=1}^{N} \alpha_t(j)}, \quad 1 \le i \le N. \quad (17)$$
Equation (16) is based on the following induction formula [30–32] that rules the dynamics of the duration vector when the system's state is known:
$$d_t(i) = s_{t-1}(i) \cdot d_{t-1}(i) + 1, \quad (18)$$
where, for each $t$, $s_t(i)$ is 1 if $s_t = S_i$, and 0 otherwise.
A simple example shows that (18) is incorrect: assuming an HSMM with three states and considering the state sequence $(S_1, S_1, S_2, \ldots)$, the correct sequence of the duration vector is $\mathbf{d}_1 = [1\ 1\ 1]^T$, $\mathbf{d}_2 = [2\ 1\ 1]^T$, and $\mathbf{d}_3 = [1\ 1\ 1]^T$, where the superscript $T$ denotes vector transpose. If we apply (18), we obtain $\mathbf{d}_1 = [1\ 1\ 1]^T$, $\mathbf{d}_2 = [2\ 1\ 1]^T$, and $\mathbf{d}_3 = [1\ 2\ 1]^T$, which is in contradiction with the definition of the state duration vector given in (2).

To calculate the average state duration variable $\bar{d}_t(i)$, we propose a new induction formula that estimates, for each time $t$, the time spent in the $i$th state prior to $t$, as
$$\bar{d}_t(i) = P\left(s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \cdot \left(\bar{d}_{t-1}(i) + 1\right) \quad (19)$$
$$= \frac{a_{ii}(\bar{\mathbf{d}}_{t-1}) \cdot \alpha_{t-1}(i) \cdot b_i(\mathbf{x}_t)}{\alpha_t(i)} \cdot \left(\bar{d}_{t-1}(i) + 1\right), \quad 1 \le i \le N. \quad (20)$$
The derivation of (20) is given in the Appendix. The intuition behind (19) is that the current average duration is the previous average duration plus one, weighted with the "amount" of the current state that was already in state $S_i$ in the previous step.
Using the proposed (20), the forward algorithm can be specified as follows.

(1) Initialization, with $1 \le i \le N$:
$$\alpha_1(i) = \pi_i b_i(\mathbf{x}_1), \qquad \bar{d}_1(i) = 1, \qquad \mathbf{A}_{\bar{\mathbf{d}}_1} = P(\bar{\mathbf{d}}_1) + \left(\mathbf{I} - P(\bar{\mathbf{d}}_1)\right)\mathbf{A}^0, \quad (21)$$
where $P(\bar{\mathbf{d}}_1)$ is estimated using (6).

(2) Induction, with $1 \le j \le N$ and $1 \le t \le T-1$:
$$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t)\right] b_j(\mathbf{x}_{t+1}) \quad (22)$$
$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left(\bar{d}_t(i) + 1\right) \quad (23)$$
$$\mathbf{A}_{\bar{\mathbf{d}}_{t+1}} = P(\bar{\mathbf{d}}_{t+1}) + \left(\mathbf{I} - P(\bar{\mathbf{d}}_{t+1})\right)\mathbf{A}^0, \quad (24)$$
where $a_{ij}(\bar{\mathbf{d}}_t)$ are the coefficients of the matrix $\mathbf{A}_{\bar{\mathbf{d}}_t}$.

(3) Termination:
$$P(\mathbf{x} \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i). \quad (25)$$
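The steps above can be sketched in NumPy as follows. This is an illustrative implementation, not the authors' code: we assume the observation likelihoods $b_j(\mathbf{x}_t)$ are precomputed into an array, and a caller-supplied `survival` function plays the role of the duration-dependent self-transition probability $P(\bar{\mathbf{d}})$ of (6); all names are ours.

```python
import numpy as np

def transition_matrix(d_bar, A0, survival):
    """Duration-dependent transitions, as in (24): A(d_bar) = P + (I - P) A0,
    where P is diagonal and P[i, i] is the self-transition (survival)
    probability of state i given its current average duration d_bar[i]."""
    P = np.diag([survival(i, d) for i, d in enumerate(d_bar)])
    return P + (np.eye(len(d_bar)) - P) @ A0

def forward(obs_lik, A0, pi, survival):
    """Modified forward pass, steps (21)-(25); obs_lik[t, j] = b_j(x_t)."""
    T, N = obs_lik.shape
    alpha = np.zeros((T, N))
    alpha[0] = pi * obs_lik[0]                       # (21)
    d_bar = np.ones(N)                               # d_bar_1(i) = 1
    for t in range(T - 1):
        A = transition_matrix(d_bar, A0, survival)
        alpha[t + 1] = (alpha[t] @ A) * obs_lik[t + 1]           # (22)
        # (23): mass that stayed in state i, normalized by alpha_{t+1}(i)
        stay = np.diag(A) * alpha[t] * obs_lik[t + 1]
        d_bar = stay / np.maximum(alpha[t + 1], 1e-300) * (d_bar + 1.0)
    return alpha, alpha[-1].sum()                    # (25): P(x | lambda)
```

For long sequences the alphas should additionally be rescaled at each step to avoid numerical underflow, exactly as in the standard HMM forward algorithm.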
Similar considerations as for the forward procedure can be made for the backward algorithm, which is implemented by defining the variable $\beta_t(i)$ as
$$\beta_t(i) = P\left(\mathbf{x}_{t+1}\mathbf{x}_{t+2}\cdots\mathbf{x}_T \mid s_t = S_i, \lambda\right), \quad 1 \le i \le N. \quad (26)$$
Having estimated the dynamic transition matrix $\mathbf{A}_{\bar{\mathbf{d}}_t}$ for each $1 \le t \le T$ using (24), the backward variable can be calculated inductively as follows.
(1) Initialization:
$$\beta_T(i) = 1, \quad 1 \le i \le N. \quad (27)$$

(2) Induction:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1,\ 1 \le i \le N. \quad (28)$$
Although the variable $\beta_t(i)$ is not necessary for the calculation of the model likelihood, it will be useful in the parameter reestimation procedure, as will be explained in Section 2.2.3.
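As a companion to the forward sketch, the backward recursion (27)-(28) can be written as a short NumPy routine; again the names are illustrative, and we assume the dynamic matrices $\mathbf{A}_{\bar{\mathbf{d}}_t}$ were stored during the forward pass:

```python
import numpy as np

def backward(obs_lik, A_seq):
    """Backward pass (27)-(28); A_seq[t] is the dynamic matrix A(d_bar_t)
    already produced during the forward pass via (24), obs_lik[t, j] = b_j(x_t)."""
    T, N = obs_lik.shape
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                            # (27)
    for t in range(T - 2, -1, -1):
        beta[t] = A_seq[t] @ (obs_lik[t + 1] * beta[t + 1])   # (28)
    return beta
```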
2.2.2. The Viterbi Algorithm. The Viterbi algorithm [46, 47] (also known as decoding) allows determining the best state sequence corresponding to a given observation sequence. Formally, given a sequence of observations $\mathbf{x} = \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_T$, the best state sequence $S^* = s^*_1 s^*_2 \cdots s^*_T$ corresponding to $\mathbf{x}$ is calculated by defining the variable $\delta_t(i)$ as
$$\delta_t(i) = \max_{s_1 s_2 \cdots s_{t-1}} P\left(s_1 s_2 \cdots s_t = S_i, \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t \mid \lambda\right). \quad (29)$$
The procedure to recursively calculate the variable $\delta_t(i)$ and to retrieve the target state sequence (i.e., the arguments which maximize the $\delta_t(i)$'s) for the proposed HSMM is a straightforward extension of the Viterbi algorithm for HMMs [13]. The only change is the usage, in the recursive calculation of $\delta_t(i)$, of the dynamic transition matrix $\mathbf{A}_{\bar{\mathbf{d}}_t} = [a_{ij}(\bar{\mathbf{d}}_t)]$ calculated through (24). The Viterbi algorithm for the introduced parametric HSMMs can be summarized as follows.
(1) Initialization, with $1 \le i \le N$:
$$\delta_1(i) = \pi_i b_i(\mathbf{x}_1), \qquad \psi_1(i) = 0. \quad (30)$$

(2) Recursion, with $1 \le j \le N$ and $2 \le t \le T$:
$$\delta_t(j) = \max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}(\bar{\mathbf{d}}_t)\right] b_j(\mathbf{x}_t) \quad (31)$$
$$\psi_t(j) = \arg\max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}(\bar{\mathbf{d}}_t)\right]. \quad (32)$$

(3) Termination:
$$P^* = \max_{1 \le i \le N} \left[\delta_T(i)\right] \quad (33)$$
$$s^*_T = \arg\max_{1 \le i \le N} \left[\delta_T(i)\right], \quad (34)$$
where we keep track of the argument maximizing (31) using the vector $\psi_t$, which, tracked back, gives the desired best state sequence:
$$s^*_t = \psi_{t+1}(s^*_{t+1}), \quad t = T-1, T-2, \ldots, 1. \quad (35)$$
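As a sketch of steps (30)–(35), under the same assumptions as the forward sketch (precomputed observation likelihoods and stored dynamic transition matrices), the decoding can be written as:

```python
import numpy as np

def viterbi(obs_lik, A_seq, pi):
    """Viterbi decoding (30)-(35) with dynamic matrices A_seq[t] = A(d_bar_t)."""
    T, N = obs_lik.shape
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * obs_lik[0]                         # (30)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A_seq[t]      # delta_{t-1}(i) a_ij(d_bar_t)
        psi[t] = scores.argmax(axis=0)                 # (32)
        delta[t] = scores.max(axis=0) * obs_lik[t]     # (31)
    states = np.zeros(T, dtype=int)
    states[-1] = delta[-1].argmax()                    # (34)
    for t in range(T - 2, -1, -1):                     # (35): backtracking
        states[t] = psi[t + 1][states[t + 1]]
    return states, delta[-1].max()                     # best path and P* of (33)
```

As with the forward pass, log-space arithmetic is advisable in practice to avoid underflow.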
2.2.3. The Training Algorithm. The training algorithm consists of estimating the model parameters from the observation data. As discussed in Section 2.1, a parametric HSMM is defined by $\lambda = (\mathbf{A}^0, \Theta, C, \mu, U, \pi)$ if the observations are continuous, or $\lambda = (\mathbf{A}^0, \Theta, B, \pi)$ if the observations are discrete. Given a generic observation sequence $\mathbf{x} = \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_T$, referred to as the training set in the following, the training procedure consists of finding the model parameter set $\lambda^*$ which locally maximizes the model likelihood $P(\mathbf{x} \mid \lambda)$.
We use the modified Baum-Welch algorithm of Azimi et al. [30–32]. However, in our implementation we make no assumption on the density function used to model the state duration, and we consider both continuous and discrete observations.
Being a variant of the more general Expectation-Maximization (EM) algorithm, Baum-Welch is an iterative procedure which consists of two steps: (i) the expectation step, in which the forward and backward variables are calculated and the model likelihood is estimated, and (ii) the maximization step, in which the model parameters are updated and used in the next iteration. This process usually starts from a random guess of the model parameters $\lambda^0$, and it is iterated until the likelihood function does not improve between two consecutive iterations.
Similarly to HMMs, the reestimation formulas are derived by firstly introducing the variable $\xi_t(i, j)$, which represents the probability of being in state $S_i$ at time $t$ and in state $S_j$ at time $t+1$, given the model and the observation sequence, as
$$\xi_t(i, j) = P\left(s_t = S_i, s_{t+1} = S_j \mid \mathbf{x}, \lambda\right). \quad (36)$$
However, in the HSMM case the variable $\xi_t(i, j)$ considers the duration estimation performed in the forward algorithm (see Equation (24)). Formulated in terms of the forward and backward variables, it is given by
$$\xi_t(i, j) = P\left(s_t = S_i, s_{t+1} = S_j \mid \mathbf{x}, \lambda\right) = \frac{P\left(s_t = S_i, s_{t+1} = S_j, \mathbf{x} \mid \lambda\right)}{P(\mathbf{x} \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}. \quad (37)$$
From $\xi_t(i, j)$ we can derive the quantity $\gamma_t(i)$ (already defined in (17)), representing the probability of being in state $S_i$ at time $t$ given the observation sequence and the model parameters:
$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j). \quad (38)$$
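The E-step quantities (37)-(38) for a single time step reduce to one outer product and a normalization; the following NumPy sketch (names ours, not from the paper) makes this explicit:

```python
import numpy as np

def xi_gamma(alpha_t, beta_t1, A_t, b_t1):
    """xi_t(i, j) per (37) and gamma_t(i) per (38) for one time step.
    alpha_t: forward at t; beta_t1: backward at t+1;
    A_t: dynamic matrix A(d_bar_t); b_t1: observation likelihoods b_j(x_{t+1})."""
    xi = alpha_t[:, None] * A_t * b_t1[None, :] * beta_t1[None, :]
    xi /= xi.sum()                        # normalize by P(x | lambda)
    return xi, xi.sum(axis=1)             # gamma_t(i) = sum_j xi_t(i, j)
```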
Finally, the reestimation formulas for the parameters $\pi$ and $\mathbf{A}^0$ are given by
$$\pi_i = \gamma_1(i) \quad (39)$$
$$a^0_{ij} = \frac{\left(\sum_{t=1}^{T-1} \xi_t(i, j)\right) \odot G}{\sum_{j=1}^{N} \left(\sum_{t=1}^{T-1} \xi_t(i, j)\right) \odot G}, \quad (40)$$
where $G = [g_{ij}]$ is a square matrix of dimensions $N \times N$, with $g_{ij} = 0$ for $i = j$ and $g_{ij} = 1$ for $i \ne j$, and $\odot$ represents the element-by-element product between two matrices; $\sum_{t=1}^{T-1} \gamma_t(i)$ is the expected number of transitions from state $S_i$, and $\sum_{t=1}^{T-1} \xi_t(i, j)$ is the expected number of transitions from state $S_i$ to state $S_j$.

Equation (39) represents the expected number of times that the model starts in state $S_i$, while (40) represents the expected number of transitions from state $S_i$ to state $S_j$, with $i \ne j$, over the total expected number of transitions from state $S_i$ to any other state different from $S_i$.
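A minimal sketch of the M-step (39)-(40), assuming the per-step $\xi_t$ matrices of (37) have been stacked into one array (function and variable names are ours):

```python
import numpy as np

def reestimate_pi_A0(xi_seq):
    """M-step (39)-(40); xi_seq has shape (T-1, N, N). The off-diagonal
    mask G removes self-transitions, which are handled by the duration model."""
    N = xi_seq.shape[1]
    G = 1.0 - np.eye(N)                            # g_ij = 0 for i = j, 1 otherwise
    pi = xi_seq[0].sum(axis=1)                     # gamma_1(i), per (38)-(39)
    num = xi_seq.sum(axis=0) * G                   # expected i -> j counts, i != j
    A0 = num / num.sum(axis=1, keepdims=True)      # (40): rows sum to 1
    return pi, A0
```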
For the matrix $\mathbf{A}^0$, being normalized, the stochastic constraints are satisfied at each iteration, that is, $\sum_{j=1}^{N} a^0_{ij} = 1$ for each $1 \le i \le N$, while the estimation of the prior probability $\pi_i$ inherently sums up to 1 at each iteration, since it represents the expected frequency in state $S_i$ at time $t = 1$, for each $1 \le i \le N$.

With respect to the reestimation of the state duration parameters $\Theta$, firstly we estimate the mean $\mu_{id}$ and the variance $\sigma^2_{id}$ of the $i$th state duration, for each $1 \le i \le N$, from the forward and backward variables and the estimation of the state duration variable:
$$\mu_{id} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1, j \ne i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right) \bar{d}_t(i)}{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1, j \ne i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right)} \quad (41)$$
$$\sigma^2_{id} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1, j \ne i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right) \left(\bar{d}_t(i) - \mu_{id}\right)^2}{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1, j \ne i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right)}, \quad (42)$$
where (41) can be interpreted as the probability of transitionfrom state 119878
119894to 119878
119895with 119894 = 119895 at time 119905weighted by the duration
of state 119878119894at 119905 giving the desired expected value while in (42)
the same quantity is weighted by the squared distance of theduration at time 119905 from its mean giving the estimation of thevariance
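As an illustration, the moment estimates of (41) and (42) can be sketched in code. The array names (`alpha`, `beta`, `A0`, `b_next`, `d`) are illustrative placeholders for quantities assumed to be precomputed by the forward-backward pass, and the duration-dependent transition $a_{ij}(d_t(i))$ is simplified here to a plain transition entry:

```python
import numpy as np

def duration_moments(alpha, beta, A0, b_next, d):
    """Sketch of Eqs. (41)-(42): per-state duration mean and variance.

    alpha, beta : (T, N) forward/backward variables
    A0          : (N, N) non-recurrent transition matrix (zero diagonal)
    b_next      : (T-1, N) observation likelihoods b_j(x_{t+1})
    d           : (T, N) estimated state-duration variable d_t(i)
    """
    T, N = alpha.shape
    mu = np.zeros(N)
    var = np.zeros(N)
    for i in range(N):
        # w[t] = alpha_t(i) * sum_{j != i} a_ij * b_j(x_{t+1}) * beta_{t+1}(j)
        w = np.array([
            alpha[t, i] * sum(A0[i, j] * b_next[t, j] * beta[t + 1, j]
                              for j in range(N) if j != i)
            for t in range(T - 1)
        ])
        mu[i] = np.dot(w, d[:T - 1, i]) / w.sum()                   # Eq. (41)
        var[i] = np.dot(w, (d[:T - 1, i] - mu[i]) ** 2) / w.sum()   # Eq. (42)
    return mu, var
```

With uniform weights this reduces to the sample mean and variance of the estimated durations, which is a quick sanity check of the formulas.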
The parameters of the desired duration distribution can then be estimated from $\mu_{i,d}$ and $\sigma^2_{i,d}$. For example, if a Gamma distribution with shape parameter $\nu$ and scale parameter $\eta$ is chosen to model the state duration, the parameters $\nu_i$ and $\eta_i$, for each $1 \le i \le N$, can be calculated as $\nu_i = \mu^2_{i,d} / \sigma^2_{i,d}$ and $\eta_i = \sigma^2_{i,d} / \mu_{i,d}$.
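This Gamma moment-matching step is a one-liner; note that the implied Gamma mean $\nu\eta$ and variance $\nu\eta^2$ recover $\mu_{i,d}$ and $\sigma^2_{i,d}$ exactly:

```python
def gamma_from_moments(mu_d, var_d):
    """Moment matching for a Gamma duration model: shape nu = mu^2 / sigma^2
    and scale eta = sigma^2 / mu, so that nu * eta = mu and nu * eta^2 = sigma^2."""
    return mu_d ** 2 / var_d, var_d / mu_d
```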
8 Mathematical Problems in Engineering
Concerning the observation parameters, once the modified forward and backward variables accounting for the state duration are defined as in (22) and (28), the re-estimation formulas are the same as for Hidden Markov Models [13]. In particular, for continuous observations, the parameters of the Gaussian mixture defined in (9) are re-estimated by first defining the probability of being in state $S_j$ at time $t$, with the observation vector $\mathbf{x}_t$ evaluated by the $k$th mixture component, as

$$\gamma_t(j,k) = \left[\frac{\alpha_t(j)\,\beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}\right] \cdot \left[\frac{c_{jk}\,\mathcal{N}(\mathbf{x}_t;\, \boldsymbol{\mu}_{jk}, \mathbf{U}_{jk})}{\sum_{m=1}^{M} c_{jm}\,\mathcal{N}(\mathbf{x}_t;\, \boldsymbol{\mu}_{jm}, \mathbf{U}_{jm})}\right] \tag{43}$$
Using this quantity, the parameters $c_{jk}$, $\boldsymbol{\mu}_{jk}$, and $\mathbf{U}_{jk}$ are re-estimated through the following formulas:

$$c_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_t(j,m)}$$

$$\boldsymbol{\mu}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, \mathbf{x}_t}{\sum_{t=1}^{T} \gamma_t(j,k)}$$

$$\mathbf{U}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, (\mathbf{x}_t - \boldsymbol{\mu}_{jk})(\mathbf{x}_t - \boldsymbol{\mu}_{jk})^{T}}{\sum_{t=1}^{T} \gamma_t(j,k)} \tag{44}$$
where the superscript $T$ denotes vector transpose.

For discrete observations, the re-estimation formula for the observation matrix $b_j(l)$ is

$$\bar{b}_j(l) = \frac{\sum_{t=1,\, x_t = X_l}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)} \tag{45}$$
where the quantity $\gamma_t(j)$, which takes into account the duration-dependent forward variable $\alpha_t(j)$, is calculated through (17). The reader is referred to Rabiner's work [13] for the interpretation of the observation parameter re-estimation formulas.
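A vectorized sketch of the re-estimation step (43)-(44), under the assumption that the forward/backward variables and the per-component weighted Gaussian likelihoods $c_{jk}\mathcal{N}(\mathbf{x}_t; \boldsymbol{\mu}_{jk}, \mathbf{U}_{jk})$ have already been evaluated (all array names are illustrative):

```python
import numpy as np

def reestimate_mixture(alpha, beta, comp_pdf, x):
    """Sketch of Eqs. (43)-(44): re-estimate mixture weights, means, covariances.

    alpha, beta : (T, N) forward/backward variables
    comp_pdf    : (T, N, M) precomputed c_jk * N(x_t; mu_jk, U_jk)
    x           : (T, O) observation vectors
    """
    T, N, M = comp_pdf.shape
    # gamma_t(j,k): state posterior times within-state component posterior
    state_post = (alpha * beta) / (alpha * beta).sum(axis=1, keepdims=True)
    comp_post = comp_pdf / comp_pdf.sum(axis=2, keepdims=True)
    gamma = state_post[:, :, None] * comp_post                    # Eq. (43)

    denom = gamma.sum(axis=0)                                     # (N, M)
    c = denom / denom.sum(axis=1, keepdims=True)                  # weights
    mu = np.einsum('tnm,to->nmo', gamma, x) / denom[:, :, None]   # means
    diff = x[:, None, None, :] - mu[None, :, :, :]                # (T, N, M, O)
    U = np.einsum('tnm,tnmo,tnmp->nmop', gamma, diff, diff) / denom[:, :, None, None]
    return c, mu, U
```

With a single state and a single component this collapses to the sample mean and covariance of the observations, which matches the degenerate case of (44).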
3 AIC-Based Model Selection
In the framework of the proposed parametric HSMMs, the model selection procedure aims to select the optimal number of hidden states $N$, the appropriate duration distribution family, and, in the case of mixture observation modeling, the number of Gaussian mixture components $M$ to be used. In this work we make use of the Akaike Information Criterion (AIC). Indeed, it has been shown that, for complex models and in the presence of a limited number of training observations, AIC is a satisfactory methodology for model selection, outperforming other approaches such as the Bayesian Information Criterion.
In general, information criteria have a two-term structure: they trade off a measure of model fitness, based on the likelihood of the model, against a penalty term that accounts for the model complexity. Usually the model complexity is measured in terms of the number of parameters to be estimated and the number of observations.
The Akaike Information Criterion is an estimate of the asymptotic value of the expected distance between the unknown true likelihood function of the data and the fitted likelihood function of the model. In particular, the AIC can be expressed as

$$\mathrm{AIC} = \frac{-\log L(\hat{\lambda}) + p}{T} \tag{46}$$

where $L(\hat{\lambda})$ is the likelihood of the model with the estimated parameters, as defined in (25), $p$ is the number of model parameters, and $T$ is the length of the observed sequence. The best model is the one minimizing (46).
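In code, (46) is a one-liner; `log_likelihood` is assumed to come from the trained model's forward pass:

```python
def aic(log_likelihood, p, T):
    """AIC as in Eq. (46): (-log L + p) / T. The best model minimizes this."""
    return (-log_likelihood + p) / T
```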
Concerning $p$, the number of parameters to be estimated for a parametric HSMM with $N$ states is $p = p_h + p_o$, where $p_h$ counts the parameters of the hidden state layer, while $p_o$ counts those of the observation layer.

In particular, $p_h = (N-1) + (N-1) \cdot N + z \cdot N$, where:

(i) $N - 1$ accounts for the prior probabilities $\pi$;
(ii) $(N-1) \cdot N$ accounts for the non-recurrent transition matrix $\mathbf{A}^0$;
(iii) $z \cdot N$ accounts for the duration probabilities, $z$ being the number of parameters $\theta$ of the duration distribution.

Concerning $p_o$, a distinction must be made between discrete and continuous observations:

(i) in the case of discrete observations with $L$ possible observable values, $p_o = (L-1) \cdot N$, which accounts for the elements of the observation matrix $B$;
(ii) if the observations are continuous and a multivariate mixture of $M$ Gaussians with $O$ variates is used as observation model, $p_o = [O \cdot M \cdot N] + [\tfrac{1}{2}O(O+1) \cdot M \cdot N] + [(M-1) \cdot N]$, where each term accounts, respectively, for the mean vectors $\mu$, the covariance matrices $U$, and the mixture coefficients $C$.
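The parameter count can be sketched as a small helper. Note that the first two terms of the continuous-observation branch (means and full covariance matrices, with $O(O+1)/2$ free parameters per covariance) are a standard count and an assumption here, since that part of the printed formula is only partially legible:

```python
def hsmm_param_count(N, z, discrete, L=None, M=None, O=None):
    """Free parameters p = p_h + p_o of the parametric HSMM (a sketch).

    p_h: (N-1) priors + (N-1)*N non-recurrent transitions + z*N duration params.
    p_o: (L-1)*N for discrete observations, or mean + full-covariance +
         mixture-weight terms for an M-component, O-variate Gaussian mixture.
    """
    p_h = (N - 1) + (N - 1) * N + z * N
    if discrete:
        p_o = (L - 1) * N
    else:
        p_o = O * M * N + (O * (O + 1) // 2) * M * N + (M - 1) * N
    return p_h + p_o
```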
4 Remaining Useful Lifetime Estimation
One of the most important advantages of the explicit time modeling of HSMMs is the possibility to effectively address the prediction problem. Knowledge of the state duration distributions allows estimating the remaining time in a given state and, in general, predicting the expected time $D$ before entering a given state.

As already mentioned, an interesting application of the prediction problem is the Remaining Useful Lifetime (RUL) estimation of industrial equipment. Indeed, if each state of an HSMM is mapped to a different condition of an industrial machine, and if the state $S_k$ representing the failure condition is identified, then at each moment the RUL can be defined as the expected time $D$ to reach the failure state
$S_k$. If we assume that the time to failure is a random variable $D$ following a given probability density, we define the RUL at the current time $t$ as

$$\mathrm{RUL}_t = \bar{D} = \mathbb{E}\left(D \mid s_{t+D} = S_k,\ s_{t+D-1} = S_i\right), \quad 1 \le i, k \le N,\ i \ne k \tag{47}$$
where $\mathbb{E}$ denotes the expected value.

Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):

(1) estimation of the current state;
(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn times.
The estimation of the current state is performed via the Viterbi path, that is, the variable $\delta_t = [\delta_t(i)]_{1 \le i \le N}$ defined in (29). To correctly model the uncertainty of the current state estimate, we use the normalized variable $\bar{\delta}_t(i)$, obtained as

$$\bar{\delta}_t(i) = \max_{s_1 s_2 \cdots s_{t-1}} P\left(s_t = S_i \mid s_1 s_2 \cdots s_{t-1},\ \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t,\ \lambda\right) = \frac{\delta_t(i)}{\sum_{j=1}^{N} \delta_t(j)}, \quad 1 \le i \le N \tag{48}$$

which is an estimate of the probability of being in state $S_i$ at time $t$.

Together with the normalized variable $\bar{\delta}_t(i)$, the maximum a posteriori estimate of the current state, $s^*_t$, is taken into account according to (34). If $s^*_t$ coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimate of the average remaining time in the current state, $d_{\mathrm{avg}}(s^*_t)$, is calculated as

$$d_{\mathrm{avg}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} - d_t(i)\right) \odot \bar{\delta}_t(i) \tag{49}$$

where $\mu_{d_i}$ denotes the expected value of the duration variable in state $S_i$, according to the duration distribution specified by the parameters $\theta_i$. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration $d_t(i)$ at time $t$ from the expected sojourn time of state $S_i$, weighting the result by the uncertainty about the current state, $\bar{\delta}_t(i)$, and finally summing the contributions from all states.

In addition to the average remaining time, lower and upper bound values can be calculated based on the standard deviation $\sigma_{d_i}$ of the duration distribution of state $S_i$:

$$d_{\mathrm{low}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} - \sigma_{d_i} - d_t(i)\right) \odot \bar{\delta}_t(i) \tag{50}$$

$$d_{\mathrm{up}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} + \sigma_{d_i} - d_t(i)\right) \odot \bar{\delta}_t(i) \tag{51}$$
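Equations (48)-(51) amount to a normalization followed by three weighted averages; a minimal numpy sketch (array names are illustrative):

```python
import numpy as np

def current_state_remaining_time(delta_t, d_t, mu_d, sigma_d):
    """Sketch of Eqs. (48)-(51): normalize the Viterbi variable into a state
    posterior, then compute average / lower / upper remaining time in the
    current state. All inputs are length-N arrays."""
    delta_t, d_t, mu_d, sigma_d = (np.asarray(a, float)
                                   for a in (delta_t, d_t, mu_d, sigma_d))
    delta_bar = delta_t / delta_t.sum()                        # Eq. (48)
    d_avg = float(np.dot(mu_d - d_t, delta_bar))               # Eq. (49)
    d_low = float(np.dot(mu_d - sigma_d - d_t, delta_bar))     # Eq. (50)
    d_up = float(np.dot(mu_d + sigma_d - d_t, delta_bar))      # Eq. (51)
    return delta_bar, d_avg, d_low, d_up
```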
Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the non-recurrent transition matrix by the current state probability estimate:

$$\bar{\delta}_{\mathrm{next}} = \left[\bar{\delta}_{t+d}(i)\right]_{1 \le i \le N} = \left(\mathbf{A}^0\right)^{T} \cdot \bar{\delta}_t \tag{52}$$

while the maximum a posteriori estimate of the next state, $s^*_{\mathrm{next}}$, is calculated as

$$s^*_{\mathrm{next}} = s^*_{t+d} = \operatorname*{arg\,max}_{1 \le i \le N}\ \bar{\delta}_{t+d}(i) \tag{53}$$
Again, if $s^*_{t+d}$ coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimate of the failure time is $D_{\mathrm{avg}} = d_{\mathrm{avg}}(s^*_t)$, calculated at the previous step, with the bound values $D_{\mathrm{low}} = d_{\mathrm{low}}(s^*_t)$ and $D_{\mathrm{up}} = d_{\mathrm{up}}(s^*_t)$. Otherwise, the estimate of the sojourn time of the next state is calculated as follows:

$$d_{\mathrm{avg}}(s^*_{t+d}) = \sum_{i=1}^{N} \mu_{d_i} \odot \bar{\delta}_{t+d}(i) \tag{54}$$

$$d_{\mathrm{low}}(s^*_{t+d}) = \sum_{i=1}^{N} \left(\mu_{d_i} - \sigma_{d_i}\right) \odot \bar{\delta}_{t+d}(i) \tag{55}$$

$$d_{\mathrm{up}}(s^*_{t+d}) = \sum_{i=1}^{N} \left(\mu_{d_i} + \sigma_{d_i}\right) \odot \bar{\delta}_{t+d}(i) \tag{56}$$

This procedure is repeated until the failure state is encountered in the prediction of the next state. The RUL is then simply obtained by summing all the estimated remaining times in the intermediate states before encountering the failure state:

$$D_{\mathrm{avg}} = \sum d_{\mathrm{avg}} \tag{57}$$
$$D_{\mathrm{low}} = \sum d_{\mathrm{low}} \tag{58}$$
$$D_{\mathrm{up}} = \sum d_{\mathrm{up}} \tag{59}$$
Finally, Algorithm 1 details the RUL estimation procedure described above.
5 Experimental Results
To demonstrate the effectiveness of the proposed HSMM models, we performed a series of experiments on both simulated and real data.
The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real data are monitoring data covering the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
(1) function RulEstimation($\mathbf{x}_t$, $S_k$)  ⊳ $\mathbf{x}_t$: the last observation acquired
(2)   ⊳ $S_k$: the failure state
(3) Initialization:
(4)   $D_{\mathrm{avg}} \leftarrow 0$
(5)   $D_{\mathrm{low}} \leftarrow 0$
(6)   $D_{\mathrm{up}} \leftarrow 0$
(7) Current state estimation:
(8)   Calculate $\bar{\delta}_t$  ⊳ using (48)
(9)   Calculate $s^*_t$  ⊳ using (34)
(10)  Calculate $\mathbf{d}_t$  ⊳ using (20)
(11)  $S \leftarrow s^*_t$
(12) Loop:
(13)  while $S \ne S_k$ do
(14)    Calculate $d_{\mathrm{avg}}$  ⊳ using (49) or (54)
(15)    Calculate $d_{\mathrm{low}}$  ⊳ using (50) or (55)
(16)    Calculate $d_{\mathrm{up}}$  ⊳ using (51) or (56)
(17)    $D_{\mathrm{avg}} \leftarrow D_{\mathrm{avg}} + d_{\mathrm{avg}}$
5.1 Simulated Experiment. Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.
5.1.1 Data Generation. The industrial machine considered in these experiments has been modeled as a left-right parametric HSMM with $N = 5$ states, with state $S_5$ as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective in solving automatic model selection, online condition monitoring, and prediction problems. To this purpose, we divided the experiments into two cases, according to the observations being continuous or discrete.
For both the continuous and the discrete case, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series are used as training set and 10 series as testing set. Each time series contains $T = 650$ observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and adapted to obtain an equivalent left-right parametric HSMM, as follows:
$$\pi = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad \mathbf{A}^0 = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

$$\Theta_{\mathcal{N}} = \{\theta_1 = [100, 20],\ \theta_2 = [90, 15],\ \theta_3 = [100, 20],\ \theta_4 = [80, 25],\ \theta_5 = [200, 1]\}$$

$$\Theta_{G} = \{\theta_1 = [500, 0.2],\ \theta_2 = [540, 0.1667],\ \theta_3 = [500, 0.2],\ \theta_4 = [256, 0.3125],\ \theta_5 = [800, 0.005]\}$$

$$\Theta_{W} = \{\theta_1 = [102, 28],\ \theta_2 = [92, 29],\ \theta_3 = [102, 28],\ \theta_4 = [82, 20],\ \theta_5 = [200, 256]\} \tag{60}$$
where $\Theta_{\mathcal{N}}$, $\Theta_{G}$, and $\Theta_{W}$ are the distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean $\mu_d$ and the variance $\sigma^2_d$ of the Gaussian distribution, the shape $\nu_d$ and the scale $\eta_d$ of the Gamma distribution, and the scale $a_d$ and the shape $b_d$ of
Figure 2: The data generated with the parameters described in Section 5.1.1, both for the continuous case (a) and the discrete case (b); each panel shows the hidden state sequence, the state duration, and the observed signal or symbols.
the Weibull distribution. It must be noticed that, as explained in Section 2.1, for state $S_5$, being the absorbing state, the duration parameters $\theta_5$ have no influence on the data, since once state $S_5$ is reached the system remains there forever.

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used, with the following parameters [15]:
$$\mu_1 = \begin{bmatrix} 20 \\ 20 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} 20 \\ 35 \end{bmatrix}, \quad \mu_3 = \begin{bmatrix} 35 \\ 35 \end{bmatrix}, \quad \mu_5 = \begin{bmatrix} 28 \\ 28 \end{bmatrix}$$

$$U_1 = \begin{bmatrix} 20 & 0 \\ 0 & 20 \end{bmatrix}, \quad U_2 = \begin{bmatrix} 15 & 0 \\ 0 & 15 \end{bmatrix}, \quad U_3 = \begin{bmatrix} 15 & -2 \\ -2 & 15 \end{bmatrix}, \quad U_4 = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}, \quad U_5 = \begin{bmatrix} 10 & 3 \\ 3 & 10 \end{bmatrix} \tag{61}$$
while for the discrete case, $L = 7$ distinct observation symbols have been considered, with the following observation probability distribution:

$$B = \begin{bmatrix} 0.8 & 0.2 & 0 & 0 & 0 & 0 & 0 \\ 0.1 & 0.8 & 0.1 & 0 & 0 & 0 & 0 \\ 0 & 0.1 & 0.8 & 0.1 & 0 & 0 & 0 \\ 0 & 0 & 0.1 & 0.7 & 0.1 & 0.1 & 0 \\ 0 & 0 & 0 & 0.2 & 0.6 & 0.1 & 0.1 \end{bmatrix} \tag{62}$$
An example of the simulated data, for both the continuous and the discrete case, is shown in Figure 2, where a Gaussian duration model has been used.
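As an illustration, sampling a hidden state path from such a left-right HSMM with Gaussian durations can be sketched as follows (a simplified generator with illustrative names; durations are rounded to whole time steps):

```python
import numpy as np

def sample_left_right_hsmm(pi, A0, mu_d, sigma_d, T, seed=None):
    """Sample a hidden state path of length T from a left-right HSMM with
    Gaussian state durations; the last state is treated as absorbing."""
    rng = np.random.default_rng(seed)
    N = len(pi)
    states = []
    s = rng.choice(N, p=pi)
    while len(states) < T:
        if s == N - 1:                      # absorbing (failure) state
            states.extend([s] * (T - len(states)))
            break
        d = max(1, int(round(rng.normal(mu_d[s], sigma_d[s]))))
        states.extend([s] * d)              # stay in s for the sampled duration
        s = rng.choice(N, p=A0[s])          # non-recurrent transition
    return np.array(states[:T])
```

Observations would then be drawn per time step from the state-conditional distribution (the bivariate Gaussian of (61) or the discrete matrix $B$ of (62)).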
5.1.2 Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving automatic model selection. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each of them a series of learning procedures has been run, each with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states $N$ from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixture components $M$ in the observation distribution from 1 to 4.
As accurate parameter initialization is crucial for obtaining a good model fit [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters $\lambda^0$, has been executed for each of the considered HSMM structures. For each model structure, the AIC value defined in (46) has been evaluated. The final trained set of parameters $\lambda^*$, corresponding to the minimum AIC value, has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.
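The selection loop can be sketched as below; `train_fn` stands in for a hypothetical EM training routine, assumed to return the fitted log-likelihood and parameter count for a given structure and random initialization:

```python
import itertools

def select_model(train_fn, data, T, families=('gaussian', 'gamma', 'weibull'),
                 n_states=range(2, 9), n_mix=range(1, 5), restarts=40):
    """Sketch of the AIC-driven model selection of Section 5.1.2.
    train_fn(data, family, N, M, seed) -> (log_likelihood, n_params);
    the best structure is the one minimizing the AIC of Eq. (46)."""
    best = (float('inf'), None)
    for family, N, M in itertools.product(families, n_states, n_mix):
        for seed in range(restarts):        # random re-initializations
            logL, p = train_fn(data, family, N, M, seed)
            score = (-logL + p) / T         # Eq. (46)
            if score < best[0]:
                best = (score, (family, N, M, seed))
    return best
```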
The obtained results are shown in Figure 3, for both the continuous and discrete observation data. As can be noticed, for all 6 test cases of Figure 3, the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered the optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.
5.1.3 Condition Monitoring. The optimal parameters $\lambda^*$ obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all 6 considered HSMM configurations. In this experiment we simulate online monitoring by considering all the testing observations up to the current time, that is, $\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t$. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state $s^*_t = \operatorname*{arg\,max}_{1 \le i \le N} [\delta_t(i)]$, as specified in (34).
Figure 3: Akaike Information Criterion (AIC) applied to continuous and discrete observation data, as a function of the number of states (2 to 8) for the Gaussian, Gamma, and Weibull duration distributions: (a)-(c) AIC values for continuous data with Gaussian, Gamma, and Weibull duration distributions, respectively; (d)-(f) the corresponding AIC values for discrete data. AIC is effective for automatic model selection, since its minimum value recovers the number of states and the duration model used to generate the data.
An example of the condition monitoring experiment is shown in Figure 4, for continuous and discrete observations respectively. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second the state estimated by the Viterbi algorithm, and the third the observed time series.
Figure 4: Condition monitoring using the Viterbi path: (a) state estimation for continuous data with Gamma duration distribution (Viterbi path accuracy 0.985); (b) state estimation for discrete data with Gaussian duration distribution (Viterbi path accuracy 0.992). HSMMs can effectively solve condition monitoring problems in time-dependent applications, due to their high accuracy in hidden state recognition.
the testing cases. The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.
5.1.4 Remaining Useful Lifetime Estimation. In this experimental phase we considered state $S_5$ as the failure state, and the trained parameters $\lambda^*$ of Section 5.1.2, for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment in which we progressively consider the observations $\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t$ up to time $t$. When a new observation is acquired, after the current state probability $\bar{\delta}_t(i)$ is estimated (Equation (48)), the average, upper, and lower RUL ((57), (58), and (59)) are calculated.

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations with the duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations with the duration modeled by a Gamma distribution. From the figures one can notice that the average, as well as the lower and upper bound estimates, converges to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty of the estimate decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and lower bounds become narrower as the failure state approaches, and the estimate becomes more precise, until it converges to the actual RUL value with the prediction error tending to zero at the end of the evaluation.
To quantitatively assess the performance of our methodology for RUL estimation, we considered, at each time $t$, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

$$\mathrm{APE}(t) = \left| \mathrm{RUL}_{\mathrm{real}}(t) - \mathrm{RUL}(t) \right| \tag{63}$$

where $\mathrm{RUL}_{\mathrm{real}}(t)$ is the (known) value of the RUL at time $t$, while $\mathrm{RUL}(t)$ is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

$$\overline{\mathrm{APE}} = \frac{\sum_{t=1}^{T} \mathrm{APE}(t)}{T} \tag{64}$$

where $T$ is the length of the testing signal. APE being a prediction error, values of (64) close to zero correspond to good predictive performance.
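The metric of (63)-(64) is a plain mean absolute error over the testing signal:

```python
import numpy as np

def average_ape(rul_real, rul_pred):
    """Eqs. (63)-(64): average absolute prediction error of the RUL estimates,
    computed over the whole testing signal."""
    rul_real = np.asarray(rul_real, float)
    rul_pred = np.asarray(rul_pred, float)
    return float(np.mean(np.abs(rul_real - rul_pred)))
```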
The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b), for the continuous and discrete observation cases respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively addressed with HSMMs, which achieve reliable estimation power with a small prediction error.
Finally, we tested our RUL estimation methodology using the state duration estimation of (16) introduced by Azimi et al. [30-32]. The results are shown in Tables 3(a) and 3(b), which report the prediction errors obtained for continuous and discrete observations, respectively.
Comparing Tables 2 and 3, one can notice that the proposed RUL method outperforms that of Azimi. This
Figure 5: HSMMs effectively solve RUL estimation problems: (a) RUL estimation for continuous data with Weibull duration distribution; (b) RUL estimation for discrete data with Gamma duration distribution. Each plot shows the true RUL together with the average, upper, and lower RUL estimates over time. The prediction converges to the actual RUL value, and its uncertainty decreases as the real failure time approaches.
Table 1: State recognition accuracy.
(a) Continuous observations. For each test case, the accuracy under the Gaussian, Gamma, and Weibull duration distributions is reported.
is mainly due to the proposed average state duration estimator of (20), compared to that of Azimi, given by (16).
5.2 Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases that closely correspond to industrial reality.
The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the components most critically related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.
5.2.1 Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besancon, France), with the aim of collecting real data related to the accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51-59].
The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing
Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20).
(a) APE of the RUL estimation for the continuous observation test cases. For each test case, the average, upper, and lower APE values are reported for the Gaussian, Gamma, and Weibull duration distributions.
housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).
The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.
The rotating part is composed of an asynchronous motor, which develops a power of 250 W, two shafts, and a gearbox allowing the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human-machine interface.
The load profile part applies a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.
Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30-32].
(a) APE of the RUL estimation for the continuous observation test cases. For each test case, the average, upper, and lower APE values are reported for the Gaussian, Gamma, and Weibull duration distributions.
The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s, collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected every minute).
Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49]. Condition 1: 1800 rpm and 4000 N; Condition 2: 1650 rpm and 4200 N; Condition 3: 1500 rpm and 5000 N.
Figure 6: Global overview of the Pronostia experimental platform [19].
Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;
(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;
(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.

Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and with a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal exceeded 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.
Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).
As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), producing good estimates of the RUL is a difficult task [49].
In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, namely root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots ($L = 2560$). Let $r_w(t)$ be the raw signal of the $w$th window; for each $w$ we estimate the RMS as

$$x^{\mathrm{RMS}}_w = \sqrt{\frac{1}{L} \sum_{t=1}^{L} r_w^2(t)}$$

and the kurtosis as

$$x^{\mathrm{KURT}}_w = \frac{(1/L) \sum_{t=1}^{L} \left( r_w(t) - \bar{r}_w \right)^4}{\left( (1/L) \sum_{t=1}^{L} \left( r_w(t) - \bar{r}_w \right)^2 \right)^2}$$
where $\bar{r}_w$ is the mean of $r_w$. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM after the model selection procedure, we implemented a leave-one-out cross validation scheme: considering conditions 1 and 2 separately, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
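The windowed RMS and kurtosis features can be computed directly from the definitions above (a sketch; `raw` is the 1-D accelerometer signal, and trailing samples that do not fill a window are dropped):

```python
import numpy as np

def extract_features(raw, window=2560):
    """Per-window RMS and kurtosis, as used to preprocess the vibration signal."""
    n = len(raw) // window
    w = np.asarray(raw[:n * window], dtype=float).reshape(n, window)
    rms = np.sqrt((w ** 2).mean(axis=1))
    centered = w - w.mean(axis=1, keepdims=True)
    # kurtosis = fourth central moment / squared second central moment
    kurt = (centered ** 4).mean(axis=1) / (centered ** 2).mean(axis=1) ** 2
    return rms, kurt
```

Note that this is the raw (non-excess) kurtosis of the formula above, so a Gaussian window would yield a value near 3 rather than 0.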
5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and, secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.
(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (being Gaussian, Gamma, or Weibull), (ii) an increasing number of states N, from 2 to 6, and (iii) an increasing number of Gaussian mixtures M in the observation density, from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations λ^0, on the data sets (Bearing1_1, ...).
Mathematical Problems in Engineering 17

Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].

Figure 8: Raw vibration data (a) versus RMS and kurtosis features (b) for Bearing1_1.
Similarly to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases, the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and an M = 1 Gaussian mixture for the observation density.
(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross-validation scheme by using, for condition 1, at each iteration, Bearing1_i, 1 ≤ i ≤ 7, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters λ*_i were estimated for the i-th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state S_4 as the failure state. The same procedure has been performed for the bearings in condition 2.

Figure 9: AIC values for condition 1 (a) and condition 2 (b), obtained with Gaussian, Gamma, and Weibull duration models for 2 to 6 states. In both cases, the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.

Figure 10: RUL estimation for Bearing1_7 (a) and Bearing2_6 (b): true, average, upper, and lower RUL. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as well as the lower and upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty of the estimation decreases with time.
Concerning the quantitative estimation of the predictive performance, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.
6. Conclusion and Future Work
In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.
Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few parameters required to estimate. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.
This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.
As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure, capable of adapting the model parameters to new conditions, will be considered in a future extension.
Appendix
In this appendix we give the derivation of the state duration variable introduced in (20) as

\bar{d}_{t+1}(i) = \frac{a_{ii}(\mathbf{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1). \quad (A.1)
The random variable d_t(i) has been defined in Section 2.1 as the duration spent in state i prior to the current time t, assuming that the state at the current time t is i; d_t(i) is sampled from an arbitrary distribution

d_t(i) \sim f(d). \quad (A.2)
We can specify the probability that the system has been in state i for d time units prior to the current time t, given the observations and the model parameters λ, and knowing that the current state is i, as

P(d_t(i) = d) = P(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t, \lambda). \quad (A.3)
We omit the conditioning on the model parameters λ in the following equations, being inherently implied. We are interested in deriving the estimator \bar{d}_t(i); because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time t, we can rewrite (A.13) as follows:

\bar{d}_{t+1}(i) = P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) \cdot (\bar{d}_t(i) + 1). \quad (A.15)
The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state i in the previous step.
In order to transform (A.15) in terms of model parameters, for an easy numerical calculation of the induction for \bar{d}, ...

The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as
P(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t) = \sum_{d_t} a_{ii}(d_t) \cdot P(d_t \mid \mathbf{x}_1, \ldots, \mathbf{x}_t) \approx a_{ii}(\mathbf{d}_t), \quad (A.20)
while the denominator of (A.19) can be expressed as follows:

P(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t) = \frac{P(\mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1})}{P(\mathbf{x}_1, \ldots, \mathbf{x}_t)} = \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}. \quad (A.21)
By substituting (A.20) and (A.21) in (A.19), we obtain

P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) = \frac{a_{ii}(\mathbf{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N} \alpha_{t+1}(i)}, \quad (A.22)
and then, by combining (A.22) and (A.16), we obtain

P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) = \frac{a_{ii}(\mathbf{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i) \sum_{i=1}^{N} \alpha_{t+1}(i)}. \quad (A.23)
Finally, by substituting (A.23) in (A.15) and considering that

\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)}, \quad (A.24)

we derive the induction formula for \bar{d}_{t+1}(i) in terms of model parameters as

\bar{d}_{t+1}(i) = \frac{a_{ii}(\mathbf{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1). \quad (A.25)
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka and Kariya, Japan, October-November 1989, E. Oshima and C. van Rijn, Eds., pp. 195-198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling Industrial Maintenance Systems and the Effects of Automatic Condition Monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229-240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469-489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines - prognostics - part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-Based Failure Prediction: An Extended Hidden Markov Model Approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125-137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), Bari, Italy, June 2009, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291-296, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257-262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474-481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1-7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338-343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1-10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491-503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292-302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143-179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644-648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407-410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558-567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871-874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11-14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947-1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991-996, Rome, Italy, September 2003.
[31] M. Azimi, Data Transmission Schemes for a New Generation of Interactive Digital Television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658-2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279-285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248-2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166-172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141-3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611-631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535-569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249-264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241-249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573-589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29-45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331-334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260-269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268-278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B, vol. 39, no. 1, pp. 1-38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "PRONOSTIA: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853-872, 1985.
[51] P. Boškoski, M. Gašperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327-337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299-306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1-7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to PRONOSTIA's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451-1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5-8, pp. 1685-1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1-6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to PRONOSTIA's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1-4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157-162, 2013.
A simple example shows that (18) is incorrect: assuming an HSMM with three states and considering the state sequence (S_1, S_1, S_2, \ldots), the correct sequence of duration vectors is \mathbf{d}_1 = [1, 1, 1]^T, \mathbf{d}_2 = [2, 1, 1]^T, and \mathbf{d}_3 = [1, 1, 1]^T, where the superscript T denotes vector transpose. If we apply (18), we obtain \mathbf{d}_1 = [1, 1, 1]^T, \mathbf{d}_2 = [2, 1, 1]^T, and \mathbf{d}_3 = [1, 2, 1]^T, which is in contradiction with the definition of the state duration vector given in (2).

To calculate the average state duration variable \bar{d}_t(i), we propose a new induction formula that estimates, for each time t, the time spent in the i-th state prior to t as

\bar{d}_t(i) = P(s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t) \cdot (\bar{d}_{t-1}(i) + 1) \quad (19)

= \frac{a_{ii}(\mathbf{d}_{t-1}) \cdot \alpha_{t-1}(i) \cdot b_i(\mathbf{x}_t)}{\alpha_t(i)} \cdot (\bar{d}_{t-1}(i) + 1), \quad 1 \le i \le N. \quad (20)

The derivation of (20) is given in the Appendix. The intuition behind (19) is that the current average duration is the previous average duration plus one, weighted with the "amount" of the current state that was already in state S_i in the previous step.
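The exact duration-vector bookkeeping implied by (2), and illustrated by the three-state counterexample above, can be sketched as follows. This is our own minimal illustration (not the paper's code): the counter of the state the chain stays in is incremented, while every other entry is reset to 1.

```python
def duration_vectors(states, n_states):
    """Return the sequence of duration vectors d_1, d_2, ... for a given
    state sequence, following definition (2): d_t(i) counts the consecutive
    time units spent in state i up to time t."""
    d = [1] * n_states          # d_1 = [1, 1, ..., 1]^T
    out = [d.copy()]
    for prev, cur in zip(states, states[1:]):
        d = [1] * n_states      # every state not currently occupied resets to 1
        if cur == prev:
            d[cur] = out[-1][cur] + 1   # staying in the same state increments
        out.append(d)
    return out
```

Running it on the sequence (S_1, S_1, S_2) reproduces the correct vectors [1,1,1]^T, [2,1,1]^T, [1,1,1]^T from the counterexample, rather than the faulty [1,2,1]^T produced by (18).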
Using the proposed (20), the forward algorithm can be specified as follows.

(1) Initialization, with 1 ≤ i ≤ N:

\alpha_1(i) = \pi_i b_i(\mathbf{x}_1),
\bar{d}_1(i) = 1,
\mathbf{A}_{\mathbf{d}_1} = P(\mathbf{d}_1) + (\mathbf{I} - P(\mathbf{d}_1)) \mathbf{A}^0, \quad (21)

where P(\mathbf{d}_i) is estimated using (6).

(2) Induction, with 1 ≤ j ≤ N and 1 ≤ t ≤ T - 1:

\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) a_{ij}(\mathbf{d}_t) \right] b_j(\mathbf{x}_{t+1}), \quad (22)

\bar{d}_{t+1}(i) = \frac{a_{ii}(\mathbf{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1), \quad (23)

\mathbf{A}_{\mathbf{d}_{t+1}} = P(\mathbf{d}_{t+1}) + (\mathbf{I} - P(\mathbf{d}_{t+1})) \mathbf{A}^0, \quad (24)

where a_{ij}(\mathbf{d}_t) are the coefficients of the matrix \mathbf{A}_{\mathbf{d}_t}.

(3) Termination:

P(\mathbf{x} \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i). \quad (25)
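The recursion (21)-(25) can be sketched as below. This is a schematic NumPy version under stated assumptions: `b(j, x_t)` stands in for the observation density, and `P_of_d(d)` stands in for the diagonal matrix P(d) of (6), which is not reproduced here; A^0 is assumed to have zero diagonal, so that self-transitions come entirely from the duration term.

```python
import numpy as np

def forward(x, pi, A0, b, P_of_d):
    """Modified forward pass with duration-dependent transition matrices.
    Returns the alpha matrix (T x N) and the likelihood P(x | lambda)."""
    N, T = len(pi), len(x)
    alpha = np.zeros((T, N))
    d_bar = np.ones(N)                                        # d_1(i) = 1
    alpha[0] = pi * np.array([b(j, x[0]) for j in range(N)])
    # (21): dynamic transition matrix A_{d_1}
    Ad = P_of_d(d_bar) + (np.eye(N) - P_of_d(d_bar)) @ A0
    for t in range(T - 1):
        bx = np.array([b(j, x[t + 1]) for j in range(N)])
        alpha[t + 1] = (alpha[t] @ Ad) * bx                   # (22)
        # (23): induction on the average state durations
        d_bar = np.diag(Ad) * alpha[t] * bx / alpha[t + 1] * (d_bar + 1)
        # (24): refresh the transition matrix from the new durations
        Ad = P_of_d(d_bar) + (np.eye(N) - P_of_d(d_bar)) @ A0
    return alpha, alpha[-1].sum()                             # (25)
```

Note that each row of A_{d_t} = P(d_t) + (I - P(d_t))A^0 sums to one whenever A^0 is row-stochastic and P(d_t) is diagonal with entries in [0, 1], so the recursion stays a proper probability flow.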
Similar considerations as for the forward procedure can be made for the backward algorithm, which is implemented by defining the variable \beta_t(i) as

\beta_t(i) = P(\mathbf{x}_{t+1} \mathbf{x}_{t+2} \cdots \mathbf{x}_T \mid s_t = S_i, \lambda), \quad 1 \le i \le N. \quad (26)

Having estimated the dynamic transition matrix \mathbf{A}_{\mathbf{d}_t} for each 1 ≤ t ≤ T using (24), the backward variable can be calculated inductively as follows.

(1) Initialization:

\beta_T(i) = 1, \quad 1 \le i \le N. \quad (27)

(2) Induction:

\beta_t(i) = \sum_{j=1}^{N} a_{ij}(\mathbf{d}_t) b_j(\mathbf{x}_{t+1}) \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1, \quad 1 \le i \le N. \quad (28)

Although the variable \beta_t(i) is not necessary for the calculation of the model likelihood, it will be useful in the parameter reestimation procedure, as will be explained in Section 2.2.3.
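The backward pass (26)-(28) mirrors the forward one and can reuse the per-step transition matrices computed there. A minimal sketch (our own illustration; `Ad_list` is assumed to hold one A_{d_t} per time step, and `b(j, x_t)` the observation density):

```python
import numpy as np

def backward(T, N, Ad_list, b, x):
    """Backward recursion: beta[T-1] = 1 (eq. 27), then eq. (28) downwards."""
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        bx = np.array([b(j, x[t + 1]) for j in range(N)])
        # beta_t(i) = sum_j a_ij(d_t) b_j(x_{t+1}) beta_{t+1}(j)
        beta[t] = Ad_list[t] @ (bx * beta[t + 1])
    return beta
```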
2.2.2. The Viterbi Algorithm. The Viterbi algorithm [46, 47] (also known as decoding) allows determining the best state sequence corresponding to a given observation sequence. Formally, given a sequence of observations \mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T, the best state sequence S^* = s^*_1 s^*_2 \cdots s^*_T corresponding to \mathbf{x} is calculated by defining the variable \delta_t(i) as

\delta_t(i) = \max_{s_1, s_2, \ldots, s_{t-1}} P(s_1 s_2 \cdots s_t = S_i, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t \mid \lambda). \quad (29)

The procedure to recursively calculate the variable \delta_t(i), and to retrieve the target state sequence (i.e., the arguments which maximize the \delta_t(i)'s), for the proposed HSMM is a straightforward extension of the Viterbi algorithm for HMMs [13]. The only change is the usage, in the recursive calculation of \delta_t(i), of the dynamic transition matrix \mathbf{A}_{\mathbf{d}_t} = [a_{ij}(\mathbf{d}_t)] calculated through (24). The Viterbi algorithm for the introduced parametric HSMMs can be summarized as follows.

(1) Initialization, with 1 ≤ i ≤ N:

\delta_1(i) = \pi_i b_i(\mathbf{x}_1),
\psi_1(i) = 0. \quad (30)

(2) Recursion, with 1 ≤ j ≤ N and 2 ≤ t ≤ T:

\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i) a_{ij}(\mathbf{d}_t)] b_j(\mathbf{x}_t), \quad (31)

\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i) a_{ij}(\mathbf{d}_t)]. \quad (32)

(3) Termination:

P^* = \max_{1 \le i \le N} [\delta_T(i)], \quad (33)

s^*_T = \arg\max_{1 \le i \le N} [\delta_T(i)], \quad (34)

where we keep track of the argument maximizing (31) using the vector \psi_t, which, tracked back, gives the desired best state sequence:

s^*_t = \psi_{t+1}(s^*_{t+1}), \quad t = T-1, T-2, \ldots, 1. \quad (35)
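The decoding steps (30)-(35) can be sketched as follows. This is an illustrative NumPy version under stated assumptions: `Ad_list` holds the dynamic transition matrices A_{d_t} produced by the forward pass (here indexed so that `Ad_list[t-1]` feeds the transition into time t), and `b(j, x_t)` is the observation density.

```python
import numpy as np

def viterbi(x, pi, b, Ad_list):
    """Viterbi decoding with dynamic transition matrices in place of the
    single fixed transition matrix of a standard HMM."""
    T, N = len(x), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * np.array([b(j, x[0]) for j in range(N)])            # (30)
    for t in range(1, T):
        # scores[i, j] = delta_{t-1}(i) * a_ij(d_t)
        scores = delta[t - 1][:, None] * Ad_list[t - 1]
        psi[t] = scores.argmax(axis=0)                                   # (32)
        delta[t] = scores.max(axis=0) * np.array(
            [b(j, x[t]) for j in range(N)])                              # (31)
    states = [int(delta[-1].argmax())]                                   # (34)
    for t in range(T - 1, 0, -1):
        states.append(int(psi[t][states[-1]]))                           # (35)
    return states[::-1], delta[-1].max()                                 # (33)
```

In practice, the products in (31) underflow for long sequences, so an implementation would normally work with log-probabilities and replace the products by sums.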
2.2.3. The Training Algorithm. The training algorithm consists of estimating the model parameters from the observation data. As discussed in Section 2.1, a parametric HSMM is defined by λ = (A^0, Θ, C, μ, U, π) if the observations are continuous, or λ = (A^0, Θ, B, π) if the observations are discrete. Given a generic observation sequence \mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T, referred to as the training set in the following, the training procedure consists of finding the model parameter set λ* which locally maximizes the model likelihood P(\mathbf{x} \mid λ).

We use the modified Baum-Welch algorithm of Azimi et al. [30-32]. However, in our implementation we make no assumption on the density function used to model the state duration, and we consider both continuous and discrete observations.

Being a variant of the more general Expectation-Maximization (EM) algorithm, Baum-Welch is an iterative procedure which consists of two steps: (i) the expectation step, in which the forward and backward variables are calculated and the model likelihood is estimated, and (ii) the maximization step, in which the model parameters are updated and used in the next iteration. This process usually starts from a random guess of the model parameters λ^0, and it is iterated until the likelihood function does not improve between two consecutive iterations.
Similarly to HMMs, the reestimation formulas are derived by firstly introducing the variable \xi_t(i, j), which represents the probability of being in state S_i at time t and in state S_j at time t + 1, given the model and the observation sequence:

\xi_t(i, j) = P(s_t = S_i, s_{t+1} = S_j \mid \mathbf{x}, \lambda). \quad (36)

However, in the HSMM case, the variable \xi_t(i, j) considers the duration estimation performed in the forward algorithm (see Equation (24)). Formulated in terms of the forward and backward variables, it is given by

\xi_t(i, j) = P(s_t = S_i, s_{t+1} = S_j \mid \mathbf{x}, \lambda)
= \frac{P(s_t = S_i, s_{t+1} = S_j, \mathbf{x} \mid \lambda)}{P(\mathbf{x} \mid \lambda)}
= \frac{\alpha_t(i) a_{ij}(\mathbf{d}_t) b_j(\mathbf{x}_{t+1}) \beta_{t+1}(j)}{P(\mathbf{x} \mid \lambda)}
= \frac{\alpha_t(i) a_{ij}(\mathbf{d}_t) b_j(\mathbf{x}_{t+1}) \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i) a_{ij}(\mathbf{d}_t) b_j(\mathbf{x}_{t+1}) \beta_{t+1}(j)}. \quad (37)
From \xi_t(i, j) we can derive the quantity \gamma_t(i) (already defined in (17)), representing the probability of being in state S_i at time t, given the observation sequence and the model parameters:

\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j). \quad (38)
Finally, the reestimation formulas for the parameters π and A^0 are given by

\pi_i = \gamma_1(i), \quad (39)

a^0_{ij} = \frac{\left( \sum_{t=1}^{T-1} \xi_t(i, j) \right) \odot G}{\sum_{j=1}^{N} \left( \sum_{t=1}^{T-1} \xi_t(i, j) \right) \odot G}, \quad (40)

where G = [g_{ij}] is a square matrix of dimensions N × N, with g_{ij} = 0 for i = j and g_{ij} = 1 for i ≠ j, and ⊙ represents the element-by-element product between two matrices; \sum_{t=1}^{T-1} \gamma_t(i) is the expected number of transitions from state S_i, and \sum_{t=1}^{T-1} \xi_t(i, j) is the expected number of transitions from state S_i to state S_j.

Equation (39) represents the expected number of times that the model starts in state S_i, while (40) represents the expected number of transitions from state S_i to state S_j, with i ≠ j, over the total expected number of transitions from state S_i to any other state different from S_i.

For the matrix A^0, being normalized, the stochastic constraints are satisfied at each iteration, that is, \sum_{j=1}^{N} a^0_{ij} = 1 for each 1 ≤ i ≤ N, while the estimation of the prior probability \pi_i inherently sums up to 1 at each iteration, since it represents the expected frequency in state S_i at time t = 1, for each 1 ≤ i ≤ N.
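The maximization step (39)-(40) can be sketched as below; the masking matrix G simply zeroes out the diagonal before row normalization, so the reestimated A^0 carries no self-transitions (these are handled by the duration model). This is our own minimal NumPy illustration, assuming the per-step probabilities ξ_t(i, j) have already been computed as in (37).

```python
import numpy as np

def reestimate_A0(xi):
    """Reestimation of A0 per (40).  xi has shape (T-1, N, N), with
    xi[t, i, j] = xi_t(i, j); self-transitions are masked out by G and
    each row is renormalized to sum to one."""
    N = xi.shape[1]
    G = 1.0 - np.eye(N)                   # g_ij = 0 for i = j, 1 otherwise
    num = xi.sum(axis=0) * G              # expected i -> j transition counts
    return num / num.sum(axis=1, keepdims=True)
```

The companion update (39) is just the first-step occupancy, `pi = gamma[0]`, which needs no extra code.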
for each 1 le 119894 le 119873With respect to the reestimation of the state duration
parameters Θ firstly we estimate the mean 120583119894119889
and thevariance 1205902
119894119889of the 119894th state duration for each 1 le 119894 le 119873
from the forward and backward variables and the estimationof the state duration variable
\mu_{id} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1, j \neq i}^{N} a_{ij}(d_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right) d_t(i)}{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1, j \neq i}^{N} a_{ij}(d_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right)}   (41)
\sigma^2_{id} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1, j \neq i}^{N} a_{ij}(d_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right) (d_t(i) - \mu_{id})^2}{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1, j \neq i}^{N} a_{ij}(d_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right)}   (42)
where (41) can be interpreted as the probability of transition from state S_i to S_j, with i ≠ j, at time t, weighted by the duration of state S_i at t, giving the desired expected value, while in (42) the same quantity is weighted by the squared distance of the duration at time t from its mean, giving the estimation of the variance.
Then, the parameters of the desired duration distribution can be estimated from \mu_{id} and \sigma^2_{id}. For example, if a Gamma distribution with shape parameter ν and scale parameter η is chosen to model the state duration, the parameters ν_i and η_i, for each 1 ≤ i ≤ N, can be calculated as \nu_i = \mu^2_{id} / \sigma^2_{id} and \eta_i = \sigma^2_{id} / \mu_{id}.
Mathematical Problems in Engineering
Concerning the observation parameters, once the modified forward and backward variables accounting for the state duration are defined as in (22) and (28), the re-estimation formulas are the same as for Hidden Markov Models [13]. In particular, for continuous observations, the parameters of the Gaussian mixture defined in (9) are re-estimated by first defining the probability of being in state S_j at time t, with the probability of the observation vector x_t evaluated by the k-th mixture component, as
\gamma_t(j,k) = \left[\frac{\alpha_t(j)\, \beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)}\right] \cdot \left[\frac{c_{jk}\, \mathcal{N}(\mathbf{x}_t, \mu_{jk}, U_{jk})}{\sum_{m=1}^{M} c_{jm}\, \mathcal{N}(\mathbf{x}_t, \mu_{jm}, U_{jm})}\right]   (43)
By using the former quantity, the parameters c_{jk}, \mu_{jk}, and U_{jk} are re-estimated through the following formulas:

c_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_t(j,m)}

\mu_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k) \cdot \mathbf{x}_t}{\sum_{t=1}^{T} \gamma_t(j,k)}

U_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k) \cdot (\mathbf{x}_t - \mu_{jk})(\mathbf{x}_t - \mu_{jk})^T}{\sum_{t=1}^{T} \gamma_t(j,k)}   (44)
where superscript T denotes vector transpose.

For discrete observations, the re-estimation formula for the observation matrix b_j(l) is

b_j(l) = \frac{\sum_{t=1,\, x_t = X_l}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}   (45)

where the quantity \gamma_t(j), which takes into account the duration-dependent forward variable \alpha_t(j), is calculated through (17). The reader is referred to Rabiner's work [13] for the interpretation of the observation parameters re-estimation formulas.
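The per-state mixture updates of (43)-(44) reduce to weighted averages over the responsibilities. A minimal NumPy sketch for a single state j, assuming the responsibilities γ_t(j, k) are already available as an array (names and array layout are ours):

```python
import numpy as np

def reestimate_mixture(gamma_jk, X):
    """Sketch of (44) for one state j.
    gamma_jk: shape (T, M), gamma_jk[t, k] = gamma_t(j, k).
    X: shape (T, O), the observation vectors x_t."""
    T, M = gamma_jk.shape
    c = gamma_jk.sum(axis=0) / gamma_jk.sum()                # mixture weights c_jk
    mu = (gamma_jk.T @ X) / gamma_jk.sum(axis=0)[:, None]    # weighted means mu_jk
    U = []
    for k in range(M):
        d = X - mu[k]                                        # deviations from mu_jk
        U.append((gamma_jk[:, k, None] * d).T @ d / gamma_jk[:, k].sum())
    return c, mu, np.stack(U)                                # weighted covariances U_jk
```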
3 AIC-Based Model Selection
In the framework of the proposed parametric HSMMs, the model selection procedure aims to select the optimal number of hidden states N, the right duration distribution family, and, in the case of mixture observation modeling, the number of Gaussian mixtures M to be used. In this work we make use of the Akaike Information Criterion (AIC). Indeed, it has been observed that, in the case of complex models and in the presence of a limited number of training observations, AIC represents a satisfactory methodology for model selection, outperforming other approaches such as the Bayesian Information Criterion.

In general, information criteria have a two-term structure. They account for a compromise between a measure of model fitness, which is based on the likelihood of the model, and a penalty term, which takes into account the model complexity. Usually, the model complexity is measured in terms of the number of parameters that have to be estimated and the number of observations.
The Akaike Information Criterion is an estimate of the asymptotic value of the expected distance between the unknown true likelihood function of the data and the fitted likelihood function of the model. In particular, the AIC can be expressed as

\mathrm{AIC} = \frac{-\log L(\hat{\lambda}) + p}{T}   (46)

where L(\hat{\lambda}) is the likelihood of the model with the estimated parameters, as defined in (25), p is the number of model parameters, and T is the length of the observed sequence. The best model is the one minimizing (46).
Concerning p, the number of parameters to be estimated for a parametric HSMM with N states is p = p_h + p_o, where p_h are the parameters of the hidden states layer, while p_o are those of the observation layer.

In particular, p_h = (N − 1) + (N − 1) · N + z · N, where:

(i) N − 1 accounts for the prior probabilities π;
(ii) (N − 1) · N accounts for the nonrecurrent transition matrix A^0;
(iii) z · N accounts for the duration probability, z being the number of parameters θ of the duration distribution.

Concerning p_o, a distinction must be made between discrete and continuous observations:

(i) in the case of discrete observations with L possible observable values, p_o = (L − 1) · N, which accounts for the elements of the observation matrix B;
(ii) if the observations are continuous and a multivariate mixture of M Gaussians with O variates is used as observation model, p_o = [O \cdot M \cdot N] + [\frac{O(O+1)}{2} \cdot M \cdot N] + [(M − 1) \cdot N], where each term accounts, respectively, for the mean vectors μ, the covariance matrices U, and the mixture coefficients C.
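The parameter counting and the criterion (46) are straightforward to compute. The sketch below follows the accounting above; the helper names are ours, and the first two terms of the continuous-case p_o are reconstructed from the stated term-by-term accounting (mean vectors and covariance matrices), so treat them as an assumption:

```python
def hsmm_num_params(N, z, L=None, M=None, O=None):
    """p = p_h + p_o per Section 3. Pass L for discrete models,
    or M (mixtures) and O (variates) for continuous Gaussian-mixture models."""
    p_h = (N - 1) + (N - 1) * N + z * N        # priors + nonrecurrent A0 + durations
    if L is not None:                          # discrete observation matrix B
        p_o = (L - 1) * N
    else:                                      # means + covariances + mixture weights
        p_o = O * M * N + (O * (O + 1) // 2) * M * N + (M - 1) * N
    return p_h + p_o

def aic(log_likelihood, p, T):
    """AIC as defined in (46); the best model minimizes this value."""
    return (-log_likelihood + p) / T
```

For example, the discrete 5-state model of Section 5.1.1 (z = 2 duration parameters, L = 7 symbols) has p = 64 parameters.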
4 Remaining Useful Lifetime Estimation
One of the most important advantages of the time modeling of HSMMs is the possibility to effectively face the prediction problem. The knowledge of the state duration distributions allows the estimation of the remaining time in a certain state and, in general, the prediction of the expected time D before entering a given state.

As already mentioned, an interesting application of the prediction problem is the Remaining Useful Lifetime (RUL) estimation of industrial equipment. Indeed, if each state of an HSMM is mapped to a different condition of an industrial machine, and if the state S_k that represents the failure condition is identified, at each moment the RUL can be defined as the expected time D to reach the failure state
S_k. If we assume that the time to failure is a random variable D following a determinate probability density, we define the RUL at the current time t as

\mathrm{RUL}_t = \bar{D} = E\left(D \mid s_{t+\bar{D}} = S_k,\ s_{t+\bar{D}-1} = S_i\right), \quad 1 \le i, k \le N,\ i \neq k   (47)
where E denotes the expected value.

Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):

(1) estimation of the current state;
(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn times.
The estimation of the current state is performed via the Viterbi path, that is, the variable \delta_t = [\delta_t(i)]_{1 \le i \le N} defined in (29). To correctly model the uncertainty of the current state estimation, we use the normalized variable \bar{\delta}_t(i), obtained as

\bar{\delta}_t(i) = \max_{s_1 s_2 \cdots s_{t-1}} P\left(s_t = S_i \mid s_1 s_2 \cdots s_{t-1}, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, \lambda\right) = \frac{\delta_t(i)}{\sum_{j=1}^{N} \delta_t(j)}, \quad 1 \le i \le N   (48)
that is, an estimate of the probability of being in state S_i at time t.

Together with the normalized variable \bar{\delta}_t(i), the maximum a posteriori estimate of the current state s^*_t is taken into account, according to (34). If s^*_t coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimation of the average remaining time in the current state, d_avg(s^*_t), is calculated as

d_{\mathrm{avg}}(s^*_t) = \sum_{i=1}^{N} (\mu_{d_i} - d_t(i)) \odot \bar{\delta}_t(i)   (49)
where \mu_{d_i} denotes the expected value of the duration variable in state S_i, according to the duration distribution specified by the parameters \theta_i. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration d_t(i) at time t from the expected sojourn time of state S_i, weighting the result by the uncertainty about the current state \bar{\delta}_t(i), and finally summing up the contributions from each state.

In addition to the average remaining time, a lower and an upper bound value can be calculated, based on the standard deviation \sigma_{d_i} of the duration distribution for state S_i:
d_{\mathrm{low}}(s^*_t) = \sum_{i=1}^{N} (\mu_{d_i} - \sigma_{d_i} - d_t(i)) \odot \bar{\delta}_t(i)   (50)

d_{\mathrm{up}}(s^*_t) = \sum_{i=1}^{N} (\mu_{d_i} + \sigma_{d_i} - d_t(i)) \odot \bar{\delta}_t(i)   (51)
Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimate, as follows:

\bar{\delta}_{\mathrm{next}} = [\bar{\delta}_{t+d}(i)]_{1 \le i \le N} = (A^0)^T \cdot \bar{\delta}_t   (52)

while the maximum a posteriori estimate of the next state s^*_{\mathrm{next}} is calculated as

s^*_{\mathrm{next}} = s^*_{t+d} = \arg\max_{1 \le i \le N} \bar{\delta}_{t+d}(i)   (53)
Again, if s^*_{t+d} coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimation of the failure time is D_{\mathrm{avg}} = d_{\mathrm{avg}}(s^*_t), calculated at the previous step, with the bound values D_{\mathrm{low}} = d_{\mathrm{low}}(s^*_t) and D_{\mathrm{up}} = d_{\mathrm{up}}(s^*_t). Otherwise, the estimation of the sojourn time of the next state is calculated as follows:

d_{\mathrm{avg}}(s^*_{t+d}) = \sum_{i=1}^{N} \mu_{d_i} \odot \bar{\delta}_{t+d}(i)   (54)

d_{\mathrm{low}}(s^*_{t+d}) = \sum_{i=1}^{N} (\mu_{d_i} - \sigma_{d_i}) \odot \bar{\delta}_{t+d}(i)   (55)

d_{\mathrm{up}}(s^*_{t+d}) = \sum_{i=1}^{N} (\mu_{d_i} + \sigma_{d_i}) \odot \bar{\delta}_{t+d}(i)   (56)
This procedure is repeated until the failure state is encountered in the prediction of the next state. The calculation of the RUL is then simply obtained by summing all the estimated remaining times in each intermediate state before encountering the failure state:

D_{\mathrm{avg}} = \sum d_{\mathrm{avg}}   (57)

D_{\mathrm{low}} = \sum d_{\mathrm{low}}   (58)

D_{\mathrm{up}} = \sum d_{\mathrm{up}}   (59)

Finally, Algorithm 1 details the above-described RUL estimation procedure.
5 Experimental Results
To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and real data.

The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real case data are monitoring data related to the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
Algorithm 1: RUL estimation procedure.

(1) function RulEstimation(x_t, S_k)    ▷ x_t: the last observation acquired
(2)                                     ▷ S_k: the failure state
(3)  Initialization:
(4)    D_avg ← 0
(5)    D_low ← 0
(6)    D_up ← 0
(7)  Current state estimation:
(8)    Calculate \bar{\delta}_t         ▷ using (48)
(9)    Calculate s*_t                   ▷ using (34)
(10)   Calculate d_t                    ▷ using (20)
(11)   S ← s*_t
(12) Loop:
(13) while S ≠ S_k do
(14)   Calculate d_avg                  ▷ using (49) or (54)
(15)   Calculate d_low                  ▷ using (50) or (55)
(16)   Calculate d_up                   ▷ using (51) or (56)
(17)   D_avg ← D_avg + d_avg
(18)   D_low ← D_low + d_low
(19)   D_up ← D_up + d_up
(20)   Calculate \bar{\delta}_{t+d} and s*_{t+d}   ▷ using (52) and (53)
(21)   S ← s*_{t+d}
(22) end while
(23) return D_avg, D_low, D_up

(The listing is truncated at line (17) in the source transcript; lines (18)-(23) follow the procedure described in Section 4, per (52)-(59).)
5.1. Simulated Experiment. Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.
5.1.1. Data Generation. The industrial machine subject of these experiments has been modeled as a left-right parametric HSMM with N = 5 states, having state S_5 as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity reasons, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective in solving automatic model selection, online condition monitoring, and prediction problems. To this end, we divided the experiments into two cases, according to the nature of the observations, being continuous or discrete.

For each of the continuous and discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains T = 650 observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and are adapted to obtain an equivalent left-right parametric HSMM, as follows:
π = [1, 0, 0, 0, 0]^T

A^0 =
| 0 1 0 0 0 |
| 0 0 1 0 0 |
| 0 0 0 1 0 |
| 0 0 0 0 1 |
| 0 0 0 0 1 |

Θ_N: θ_1 = [100, 20], θ_2 = [90, 15], θ_3 = [100, 20], θ_4 = [80, 25], θ_5 = [200, 1]

Θ_G: θ_1 = [500, 0.2], θ_2 = [540, 0.1667], θ_3 = [500, 0.2], θ_4 = [256, 0.3125], θ_5 = [800, 0.005]

Θ_W: θ_1 = [102, 28], θ_2 = [92, 29], θ_3 = [102, 28], θ_4 = [82, 20], θ_5 = [200, 256]   (60)
where Θ_N, Θ_G, and Θ_W are the different distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean μ_d and the variance σ²_d of the Gaussian distribution, the shape ν_d and the scale η_d of the Gamma distribution, and the scale a_d and the shape b_d of the Weibull distribution. It must be noticed that, as explained in Section 2.1, for state S_5, being the absorbing state, the duration parameters θ_5 have no influence on the data, since once state S_5 is reached the system will remain there forever.

Figure 2: The data generated with the parameters described in Section 5.1.1, both for the continuous and the discrete case. (a) Example of simulated data for the continuous case (hidden states sequence, state duration, observed signal); (b) example of simulated data for the discrete case (hidden states sequence, state duration, observed symbols).

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used, with the following parameters [15]:
bivariate Gaussian distribution has been used with the fol-lowing parameters [15]
1205831= [
20
20] 120583
2= [
20
35] 120583
3= [
35
35]
1205835= [
28
28]
1198801= [
20 0
0 20] 119880
2= [
15 0
0 15] 119880
3= [
15 minus2
minus2 15]
1198804= [
5 0
0 5] 119880
5= [
10 3
3 10]
(61)
while for the discrete case, L = 7 distinct observation symbols have been taken into consideration, with the following observation probability distribution:

B =
| 0.8 0.2 0   0   0   0   0   |
| 0.1 0.8 0.1 0   0   0   0   |
| 0   0.1 0.8 0.1 0   0   0   |
| 0   0   0.1 0.7 0.1 0.1 0   |
| 0   0   0   0.2 0.6 0.1 0.1 |   (62)
An example of the simulated data, both for the continuous and the discrete case, is shown in Figure 2, where a Gaussian duration model has been used.
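For concreteness, the discrete-observation generator of this section can be sketched as follows (helper name and rounding convention are ours; durations are drawn from the Gamma parameters Θ_G of (60), and emissions from the matrix B of (62)):

```python
import numpy as np

rng = np.random.default_rng(42)

# Left-right transition matrix A0 and Gamma duration parameters from (60);
# state 5 (index 4) is absorbing, so its duration parameters are irrelevant.
A0 = np.array([[0, 1, 0, 0, 0],
               [0, 0, 1, 0, 0],
               [0, 0, 0, 1, 0],
               [0, 0, 0, 0, 1],
               [0, 0, 0, 0, 1]], dtype=float)
nu = np.array([500.0, 540.0, 500.0, 256.0])   # Gamma shapes (transient states)
eta = np.array([0.2, 0.1667, 0.2, 0.3125])    # Gamma scales
B = np.array([[0.8, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.1, 0.8, 0.1, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.1, 0.8, 0.1, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.1, 0.7, 0.1, 0.1, 0.0],
              [0.0, 0.0, 0.0, 0.2, 0.6, 0.1, 0.1]])  # emission matrix (62)

def simulate(T=650):
    """Draw one state/observation series of length T from the left-right HSMM."""
    states, obs = [], []
    s = 0                                     # pi puts all mass on state 1
    while len(obs) < T:
        # absorbing failure state: stay until the end of the series
        d = T - len(obs) if s == 4 else max(1, round(rng.gamma(nu[s], eta[s])))
        for _ in range(min(d, T - len(obs))):
            states.append(s)
            obs.append(rng.choice(7, p=B[s]))
        if s < 4:
            s = int(rng.choice(5, p=A0[s]))   # deterministic left-right step here
    return np.array(states), np.array(obs)
```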
5.1.2. Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving automatic model selection. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states N from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures M in the observation distribution from 1 to 4.

As accurate parameter initialization is crucial for obtaining a good model fitting [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters λ^0, has been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters λ*, corresponding to the minimum AIC value, has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.

The obtained results are shown in Figure 3, for both the continuous and discrete observation data. As can be noticed, for all the 6 test cases of Figure 3, the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered an optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states, the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.
5.1.3. Condition Monitoring. The optimal parameters λ* obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all the 6 considered HSMM configurations. In this experiment, we simulate online monitoring by considering all the testing observations up to the current time, that is, x_1 x_2 ⋯ x_t. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state s^*_t = \arg\max_{1 \le i \le N} [\delta_t(i)], as specified in (34).
Figure 3: Akaike Information Criterion (AIC) applied to continuous and discrete observation data. AIC is effective for automatic model selection, since its minimum value identifies the same number of states and duration model used to generate the data. (a) AIC values for continuous data and Gaussian duration distribution; (b) AIC values for continuous data and Gamma duration distribution; (c) AIC values for continuous data and Weibull duration distribution; (d) AIC values for discrete data and Gaussian duration distribution; (e) AIC values for discrete data and Gamma duration distribution; (f) AIC values for discrete data and Weibull duration distribution.
An example of execution of the condition monitoring experiment is shown in Figure 4, for both continuous and discrete observations. In Figure 4(a), the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, and the third display represents the observed time series.

Figure 4: Condition monitoring using the Viterbi path. HSMMs can be effective in solving condition monitoring problems in time-dependent applications, due to their high accuracy in hidden state recognition. (a) State estimation with the Viterbi path for continuous data and Gamma duration distribution (accuracy 0.985); (b) state estimation with the Viterbi path for discrete data and Gaussian duration distribution (accuracy 0.992).

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of the testing cases. The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.
5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase, we considered the state S_5 as the failure state, and the trained parameters λ* of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment, where we progressively consider the observations x_1 x_2 ⋯ x_t up to time t. When a new observation is acquired, after the current state probability \bar{\delta}_t(i) is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures, one can notice that the average, as well as the lower and the upper bound estimations, converge to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty of the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and the lower bounds become narrower as the failure state approaches, and the estimation becomes more precise, until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.
To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time t, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

\mathrm{APE}(t) = \left|\mathrm{RUL}_{\mathrm{real}}(t) - \overline{\mathrm{RUL}}(t)\right|   (63)

where RUL_real(t) is the (known) value of the RUL at time t, while \overline{\mathrm{RUL}}(t) is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

\overline{\mathrm{APE}} = \frac{\sum_{t=1}^{T} \mathrm{APE}(t)}{T}   (64)

where T is the length of the testing signal. APE being a prediction error, average values of (64) close to zero correspond to good predictive performances.
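Equations (63)-(64) amount to a mean absolute error over the prediction history; a one-function sketch (the function name is ours):

```python
import numpy as np

def average_ape(rul_real, rul_pred):
    """Average absolute prediction error over a test history, per (63)-(64)."""
    ape = np.abs(np.asarray(rul_real, dtype=float)
                 - np.asarray(rul_pred, dtype=float))  # (63), per time step
    return ape.mean()                                  # (64), averaged over T
```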
The results, for each of the 10 testing cases and the different HSMM configurations, are reported in Tables 2(a) and 2(b) for the continuous and the discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively faced with HSMMs, which achieve a reliable estimation power with a small prediction error.

Finally, we tested our RUL estimation methodology using the state duration estimation of (16), introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), in which, respectively, the prediction errors obtained for continuous and discrete observations are reported.
Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi. This is mainly due to the proposed average state duration estimator of (20), compared to the one of Azimi given by (16).

Figure 5: HSMMs effectively solve RUL estimation problems. The prediction converges to the actual RUL value, and its uncertainty decreases as the real failure time approaches. (a) Remaining Useful Lifetime estimation for continuous data and Weibull duration distribution (true, upper, average, and lower RUL); (b) Remaining Useful Lifetime estimation for discrete data and Gamma duration distribution (true, upper, average, and lower RUL).

Table 1: State recognition accuracy. (a) Continuous observations — columns: test case; duration distribution (Gaussian, Gamma, Weibull).
5.2. Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besancon, France), with the aim of collecting real data related to the accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51–59].
The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20). (a) APE of the RUL estimation for the continuous observation test cases — columns: test case; duration distribution (Gaussian, Gamma, Weibull); APE avg, APE up, APE low.
The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox, which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human machine interface.

The load profiles part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.
Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16), introduced by Azimi et al. [30–32]. (a) APE of the RUL estimation for the continuous observation test cases — columns: test case; duration distribution (Gaussian, Gamma, Weibull); APE avg, APE up, APE low.
The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s, collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49].

Condition 1: 1800 rpm and 4000 N | Condition 2: 1650 rpm and 4200 N | Condition 3: 1500 rpm and 5000 N
Figure 6: Global overview of the Pronostia experimental platform [19].
Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;
(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;
(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.
Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and with a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal exceeded 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.
Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).
As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].
In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots ($L = 2560$). Let $r_w(t)$ be the raw signal of the $w$th window; for each $w$ we estimate RMS as

$$x_w^{\mathrm{RMS}} = \sqrt{\frac{1}{L} \sum_{t=1}^{L} r_w^2(t)}$$

and kurtosis as

$$x_w^{\mathrm{KURT}} = \frac{(1/L) \sum_{t=1}^{L} \left( r_w(t) - \bar{r}_w \right)^4}{\left[ (1/L) \sum_{t=1}^{L} \left( r_w(t) - \bar{r}_w \right)^2 \right]^2},$$

where $\bar{r}_w$ is the mean of $r_w$. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM, after the model selection procedure we implemented a leave-one-out cross-validation scheme: considering conditions 1 and 2 separately, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
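The two windowed features above can be computed with a short routine; a minimal sketch (the window length $L = 2560$ matches the snapshot size used for the PHM 2012 data; the function name is ours):

```python
import numpy as np

def window_features(r, L=2560):
    """Split a raw vibration signal into non-overlapping windows of
    length L and compute the RMS and kurtosis of each window, as in
    the feature-extraction step described in the text."""
    n = len(r) // L
    rms, kurt = [], []
    for w in range(n):
        seg = r[w * L:(w + 1) * L]
        rms.append(np.sqrt(np.mean(seg ** 2)))
        m = seg.mean()
        # kurtosis = fourth central moment over squared second central moment
        kurt.append(np.mean((seg - m) ** 4) / np.mean((seg - m) ** 2) ** 2)
    return np.array(rms), np.array(kurt)
```

For a Gaussian signal the kurtosis of each window is close to 3, which makes departures from 3 a convenient degradation indicator.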
5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.
(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states N, from 2 to 6, and (iii) an increasing number of Gaussian mixtures M in the observation density, from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations λ_0, on the data sets (Bearing1_1
Mathematical Problems in Engineering 17
Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].
Figure 8: Raw vibration data (a) versus RMS and kurtosis features (b) for Bearing1_1.
Similarly to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases, the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 Gaussian in the observation density mixture.
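The grid search just described (three duration families, N from 2 to 6, M from 1 to 4, with random restarts, keeping the structure of minimum AIC) can be sketched generically. The `fit` callable below is a hypothetical stand-in for one HSMM training run; it is assumed to return the log-likelihood, the parameter count, and the sequence length that Equation (46) needs (the paper used 120 restarts; the default here is 1):

```python
import itertools

def select_hsmm_structure(fit, n_states_range=range(2, 7),
                          mixtures_range=range(1, 5),
                          duration_families=("gaussian", "gamma", "weibull"),
                          n_restarts=1):
    """Exhaustive search over HSMM structures, keeping the configuration
    with minimum AIC = (-log L + p) / T, as in Equation (46).
    `fit(family, n_states, n_mix, restart)` -> (logL, p, T) is assumed."""
    best = None
    for fam, N, M in itertools.product(duration_families,
                                       n_states_range, mixtures_range):
        for r in range(n_restarts):
            logL, p, T = fit(fam, N, M, r)
            aic = (-logL + p) / T
            if best is None or aic < best[0]:
                best = (aic, fam, N, M)
    return best
```

Because the criterion is evaluated per trained model, random restarts simply feed more candidates into the same minimization.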
(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross-validation scheme by using, for condition 1, at each iteration Bearing1_i, 1 ≤ i ≤ 7, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters λ*_i were estimated for the i-th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state S_4 as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as the time goes on. As it can be seen, the average as
(a) AIC values for Condition 1; (b) AIC values for Condition 2. Each panel plots the AIC of the Gaussian, Gamma, and Weibull duration models against the number of states (2 to 6).
Figure 9: In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.
(a) RUL estimation for Bearing1_7; (b) RUL estimation for Bearing2_6. Each panel plots the true RUL together with the upper, average, and lower RUL estimates over time.
Figure 10: By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.
well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.

Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As it can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric takes into account also the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.
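Assuming the metric of Equation (64) is the mean of |true RUL − estimated RUL| over all prediction times (a plain reading of the text; the exact equation is not reproduced in this excerpt), it can be computed as:

```python
def avg_abs_prediction_error(rul_true, rul_pred):
    """Average absolute prediction error over the prediction horizon:
    mean of |true RUL - estimated RUL| across all prediction times.
    Sketch of the metric referred to as Equation (64) in the text."""
    assert len(rul_true) == len(rul_pred)
    return sum(abs(a - b) for a, b in zip(rul_true, rul_pred)) / len(rul_true)
```

Averaging over the whole horizon deliberately includes the early-life predictions, which is why the reported errors are conservative.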
6 Conclusion and Future Work
In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.
The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required
Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.
parameters to estimate. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.
This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM, in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.
As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.
Appendix
In this appendix we give the derivation of the state duration variable introduced in (20) as

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left( \bar{d}_t(i) + 1 \right). \quad (A.1)$$
The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution:

$$d_t(i) \sim f(d). \quad (A.2)$$
We can specify the probability that the system has been in state $i$ for $d$ time units prior to the current time $t$, given the observations and the model parameters $\lambda$, and knowing that the current state is $i$, as

$$\mathrm{P}\left( d_t(i) = d \right) = \mathrm{P}\left( s_{t-d-1} \neq S_i,\ s_{t-d} = S_i,\ \ldots,\ s_{t-1} = S_i \mid s_t = S_i,\ \mathbf{x}_1, \ldots, \mathbf{x}_t,\ \lambda \right). \quad (A.3)$$
We omit the conditioning on the model parameters $\lambda$ in the following equations, it being inherently implied. We are interested in deriving the estimator $\bar{d}$; because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, we can rewrite (A.13) as follows:
$$\bar{d}_{t+1}(i) = \mathrm{P}\left( s_t = S_i \mid s_{t+1} = S_i,\ \mathbf{x}_1, \ldots, \mathbf{x}_{t+1} \right) \cdot \left( \bar{d}_t(i) + 1 \right). \quad (A.15)$$
The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state i in the previous step.
In order to transform (A.15) in terms of model parameters, for an easy numerical calculation of the induction for $\bar{d}$, we express each probability through the forward variables.
The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as

$$\mathrm{P}\left( s_{t+1} = S_i \mid s_t = S_i,\ \mathbf{x}_1, \ldots, \mathbf{x}_t \right) = \sum_{d_t} a_{ii}(d_t) \cdot \mathrm{P}\left( d_t \mid \mathbf{x}_1, \ldots, \mathbf{x}_t \right) \approx a_{ii}(\bar{\mathbf{d}}_t), \quad (A.20)$$
while the denominator of (A.19) can be expressed as follows:

$$\mathrm{P}\left( \mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t \right) = \frac{\mathrm{P}\left( \mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1} \right)}{\mathrm{P}\left( \mathbf{x}_1, \ldots, \mathbf{x}_t \right)} = \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}. \quad (A.21)$$
By substituting (A.20) and (A.21) in (A.19), we obtain

$$\mathrm{P}\left( s_t = S_i,\ s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1} \right) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N} \alpha_{t+1}(i)}, \quad (A.22)$$
and then, by combining (A.22) and (A.16), we obtain

$$\mathrm{P}\left( s_t = S_i \mid s_{t+1} = S_i,\ \mathbf{x}_1, \ldots, \mathbf{x}_{t+1} \right) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i) \sum_{i=1}^{N} \alpha_{t+1}(i)}. \quad (A.23)$$
Finally, by substituting (A.23) in (A.15) and considering that

$$\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)}, \quad (A.24)$$

we derive the induction formula for $\bar{d}_{t+1}(i)$ in terms of model parameters as

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left( \bar{d}_t(i) + 1 \right). \quad (A.25)$$
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka and Kariya, Japan, October–November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines—prognostics—part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), Bari, Italy, June 2009, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, 2009.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boškoski, M. Gašperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.
2.2.3. The Training Algorithm. The training algorithm consists of estimating the model parameters from the observation data. As discussed in Section 2.1, a parametric HSMM is defined by λ = (A^0, Θ, C, μ, U, π) if the observations are continuous, or λ = (A^0, Θ, B, π) if the observations are discrete. Given a generic observation sequence x = x_1 x_2 ⋯ x_T, referred to as the training set in the following, the training procedure consists of finding the model parameter set λ* which locally maximizes the model likelihood P(x | λ).
We use the modified Baum-Welch algorithm of Azimi et al. [30–32]. However, in our implementation we make no assumptions on the density function used to model the state duration, and we consider both continuous and discrete observations.
Being a variant of the more general Expectation-Maximization (EM) algorithm, Baum-Welch is an iterative procedure which consists of two steps: (i) the expectation step, in which the forward and backward variables are calculated and the model likelihood is estimated, and (ii) the maximization step, in which the model parameters are updated and used in the next iteration. This process usually starts from a random guess of the model parameters λ_0, and it is iterated until the likelihood function does not improve between two consecutive iterations.
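The iterate-until-no-improvement loop just described can be sketched generically; `e_step` and `m_step` are hypothetical placeholders for the model-specific forward-backward computation and parameter update:

```python
def baum_welch(e_step, m_step, lambda0, tol=1e-6, max_iter=100):
    """Skeleton of the Baum-Welch / EM iteration described above:
    alternate expectation (sufficient statistics + log-likelihood)
    and maximization (parameter update) until the likelihood stops
    improving by more than `tol`.
    e_step(params) -> (stats, log_likelihood)   [assumed signature]
    m_step(stats)  -> params                    [assumed signature]
    """
    params = lambda0
    prev = float("-inf")
    for _ in range(max_iter):
        stats, logL = e_step(params)
        if logL - prev < tol:     # no meaningful improvement: stop
            break
        prev = logL
        params = m_step(stats)
    return params, prev
```

Because EM only guarantees a local maximum, the random initializations λ_0 mentioned in the text are typically repeated and the best run kept.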
Similarly to HMMs, the re-estimation formulas are derived by firstly introducing the variable $\xi_t(i, j)$, which represents the probability of being in state $S_i$ at time $t$ and in state $S_j$ at time $t + 1$, given the model and the observation sequence:

$$\xi_t(i, j) = \mathrm{P}\left( s_t = S_i,\ s_{t+1} = S_j \mid \mathbf{x},\ \lambda \right). \quad (36)$$
However, in the HSMM case the variable $\xi_t(i, j)$ accounts for the duration estimation performed in the forward algorithm (see Equation (24)). Formulated in terms of the forward and backward variables, it is given by

$$\xi_t(i, j) = \mathrm{P}\left( s_t = S_i,\ s_{t+1} = S_j \mid \mathbf{x},\ \lambda \right) = \frac{\mathrm{P}\left( s_t = S_i,\ s_{t+1} = S_j,\ \mathbf{x} \mid \lambda \right)}{\mathrm{P}\left( \mathbf{x} \mid \lambda \right)} = \frac{\alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}. \quad (37)$$
From $\xi_t(i, j)$ we can derive the quantity $\gamma_t(i)$ (already defined in (17)), representing the probability of being in state $S_i$ at time $t$, given the observation sequence and the model parameters:

$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j). \quad (38)$$
Finally, the re-estimation formulas for the parameters $\pi$ and $A^0$ are given by

$$\pi_i = \gamma_1(i), \quad (39)$$

$$a^0_{ij} = \frac{\left( \sum_{t=1}^{T-1} \xi_t(i, j) \right) \odot G}{\sum_{j=1}^{N} \left( \sum_{t=1}^{T-1} \xi_t(i, j) \right) \odot G}, \quad (40)$$

where $G = [g_{ij}]$ is a square matrix of dimensions $N \times N$ with $g_{ij} = 0$ for $i = j$ and $g_{ij} = 1$ for $i \neq j$, and $\odot$ represents the element-by-element product between two matrices; $\sum_{t=1}^{T-1} \gamma_t(i)$ is the expected number of transitions from state $S_i$, and $\sum_{t=1}^{T-1} \xi_t(i, j)$ is the expected number of transitions from state $S_i$ to state $S_j$.
Equation (39) represents the expected number of times that the model starts in state S_i, while (40) represents the expected number of transitions from state S_i to state S_j, with i ≠ j, over the total expected number of transitions from state S_i to any other state different from S_i.
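The masking of recurrent transitions in Equation (40) can be sketched as follows, with `xi_sum` holding the ξ statistics accumulated over t (a sketch; names are ours):

```python
import numpy as np

def reestimate_a0(xi_sum):
    """Re-estimation of the non-recurrent transition matrix A0 (Eq. (40)):
    zero the diagonal of the accumulated xi statistics via the mask G
    (g_ij = 0 for i = j, 1 otherwise) and normalize each row over the
    remaining off-diagonal entries.
    xi_sum : sum over t of xi_t(i, j), shape (N, N)."""
    G = 1.0 - np.eye(xi_sum.shape[0])
    masked = xi_sum * G
    return masked / masked.sum(axis=1, keepdims=True)
```

The row normalization is exactly what makes the stochastic constraint Σ_j a⁰_ij = 1 hold at every iteration, as noted above.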
The matrix $A^0$ being normalized, the stochastic constraints are satisfied at each iteration, that is, $\sum_{j=1}^{N} a^0_{ij} = 1$ for each $1 \le i \le N$, while the estimation of the prior probability $\pi_i$ inherently sums up to 1 at each iteration, since it represents the expected frequency of being in state $S_i$ at time $t = 1$, for each $1 \le i \le N$.

With respect to the re-estimation of the state duration parameters $\Theta$, firstly we estimate the mean $\mu_{i,d}$ and the variance $\sigma^2_{i,d}$ of the $i$th state duration, for each $1 \le i \le N$, from the forward and backward variables and the estimation of the state duration variable:

$$\mu_{i,d} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \neq i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right) \bar{d}_t(i)}{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \neq i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right)}, \quad (41)$$
$$\sigma^2_{i,d} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \neq i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right) \left( \bar{d}_t(i) - \mu_{i,d} \right)^2}{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \neq i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right)}, \quad (42)$$
where (41) can be interpreted as the probability of transition from state S_i to S_j, with i ≠ j, at time t, weighted by the duration of state S_i at t, giving the desired expected value, while in (42) the same quantity is weighted by the squared distance of the duration at time t from its mean, giving the estimation of the variance.
Then, the parameters of the desired duration distribution can be estimated from $\mu_{i,d}$ and $\sigma^2_{i,d}$. For example, if a Gamma distribution with shape parameter $\nu$ and scale parameter $\eta$ is chosen to model the state duration, the parameters $\nu_i$ and $\eta_i$, for each $1 \le i \le N$, can be calculated as $\nu_i = \mu^2_{i,d} / \sigma^2_{i,d}$ and $\eta_i = \sigma^2_{i,d} / \mu_{i,d}$.
Concerning the observation parameters, once the modified forward and backward variables accounting for the state duration are defined as in (22) and (28), the re-estimation formulas are the same as for Hidden Markov Models [13].
In particular, for continuous observations, the parameters of the Gaussian mixture defined in (9) are re-estimated by firstly defining the probability of being in state $S_j$ at time $t$, with the probability of the observation vector $\mathbf{x}_t$ evaluated by the $k$th mixture component, as

$$\gamma_t(j, k) = \left[ \frac{\alpha_t(j)\, \beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)} \right] \cdot \left[ \frac{c_{jk}\, \mathcal{N}\!\left( \mathbf{x}_t;\ \mu_{jk},\ U_{jk} \right)}{\sum_{m=1}^{M} c_{jm}\, \mathcal{N}\!\left( \mathbf{x}_t;\ \mu_{jm},\ U_{jm} \right)} \right]. \quad (43)$$
By using the former quantity, the parameters $c_{jk}$, $\mu_{jk}$, and $U_{jk}$ are re-estimated through the following formulas:

$$c_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)}{\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_t(j, m)}, \qquad \mu_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k) \cdot \mathbf{x}_t}{\sum_{t=1}^{T} \gamma_t(j, k)}, \qquad U_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k) \cdot \left( \mathbf{x}_t - \mu_{jk} \right) \left( \mathbf{x}_t - \mu_{jk} \right)^T}{\sum_{t=1}^{T} \gamma_t(j, k)}, \quad (44)$$

where the superscript $T$ denotes vector transpose.

For discrete observations, the re-estimation formula for the observation matrix $b_j(l)$ is

$$b_j(l) = \frac{\sum_{t=1,\ x_t = X_l}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \quad (45)$$
where the quantity γ_t(j), which takes into account the duration-dependent forward variable α_t(j), is calculated through (17). The reader is referred to Rabiner's work [13] for the interpretation of the observation parameters re-estimation formulas.
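Equation (45) is a weighted relative frequency: expected time spent in state j while emitting symbol l, over expected total time in state j. A minimal sketch (names are ours):

```python
import numpy as np

def reestimate_discrete_B(gamma, obs, L):
    """Re-estimation of the discrete observation matrix (Eq. (45)).
    gamma : state posteriors gamma_t(j), shape (T, N)
    obs   : observed symbol indices x_t, shape (T,)
    L     : number of observable symbols
    Returns B with B[j, l] = b_j(l)."""
    T, N = gamma.shape
    B = np.zeros((N, L))
    for l in range(L):
        # numerator: posterior mass accumulated at times where x_t = X_l
        B[:, l] = gamma[obs == l].sum(axis=0)
    return B / gamma.sum(axis=0)[:, None]
```

Each row of the result sums to 1, since every time step contributes its posterior mass to exactly one symbol column.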
3 AIC-Based Model Selection
In the framework of the proposed parametric HSMMs, the model selection procedure aims to select the optimal number of hidden states N, the right duration distribution family, and, in the case of mixture observation modeling, the number of Gaussian mixtures M to be used. In this work we make use of the Akaike Information Criterion (AIC). Indeed, it has been seen that, in the case of complex models and in the presence of a limited number of training observations, AIC represents a satisfactory methodology for model selection, outperforming other approaches like the Bayesian Information Criterion.
In general information criteria are represented as a two-term structure They account for a compromise between
a measure of model fitness which is based on the likelihoodof the model and a penalty term which takes into accountthe model complexity Usually the model complexity ismeasured in terms of the number of parameters that have tobe estimated and in terms of the number of observations
The Akaike Information Criterion is an estimate of the asymptotic value of the expected distance between the unknown true likelihood function of the data and the fitted likelihood function of the model. In particular, the AIC can be expressed as

AIC = −log L(λ̂) + p / T   (46)

where L(λ̂) is the likelihood of the model with the estimated parameters λ̂ as defined in (25), p is the number of model parameters, and T is the length of the observed sequence. The best model is the one minimizing (46).
Concerning p, the number of parameters to be estimated for a parametric HSMM with N states is p = p_h + p_o, where p_h counts the parameters of the hidden states layer, while p_o counts those of the observation layer.

In particular, p_h = (N − 1) + (N − 1) · N + z · N, where:

(i) N − 1 accounts for the prior probabilities π;
(ii) (N − 1) · N accounts for the nonrecurrent transition matrix A0;
(iii) z · N accounts for the duration probabilities, z being the number of parameters θ of the duration distribution.

Concerning p_o, a distinction must be made between discrete and continuous observations:

(i) in the case of discrete observations with L possible observable values, p_o = (L − 1) · N, which accounts for the elements of the observation matrix B;
(ii) if the observations are continuous and a multivariate mixture of M Gaussians with O variates is used as observation model, p_o = [O · M · N] + [(O(O + 1)/2) · M · N] + [(M − 1) · N], where each term accounts, respectively, for the mean vectors μ, the covariance matrices U, and the mixture coefficients C.
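The parameter count and the AIC of (46) are easy to compute programmatically. The sketch below simply transcribes the counting rules above; the function and argument names are ours, not from the paper, and the continuous-case count mirrors the three terms (means, covariances, mixture weights) just listed:

```python
def hsmm_num_params(N, z, *, discrete_L=None, M=None, O=None):
    """p = p_h + p_o for a parametric HSMM with N states and a duration
    distribution having z parameters per state (e.g. z = 2 for Weibull)."""
    p_h = (N - 1) + (N - 1) * N + z * N          # priors + A0 + durations
    if discrete_L is not None:                   # discrete case: matrix B
        p_o = (discrete_L - 1) * N
    else:                                        # M-component, O-variate Gaussian mixture
        p_o = O * M * N + (O * (O + 1) // 2) * M * N + (M - 1) * N
    return p_h + p_o

def aic(log_likelihood, p, T):
    """AIC = -log L + p / T, Eq. (46); the best model minimizes this value."""
    return -log_likelihood + p / T

# e.g. N = 5 states, 2-parameter durations, L = 7 discrete symbols:
print(hsmm_num_params(5, 2, discrete_L=7))  # 4 + 20 + 10 + 30 = 64
```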
4. Remaining Useful Lifetime Estimation
One of the most important advantages of the time modeling of HSMMs is the possibility to effectively face the prediction problem. The knowledge of the state duration distributions allows the estimation of the remaining time in a certain state and, in general, the prediction of the expected time D before entering a given state.

As already mentioned, an interesting application of the prediction problem is the Remaining Useful Lifetime (RUL) estimation of industrial equipment. Indeed, if each state of an HSMM is mapped to a different condition of an industrial machine, and if the state S_k that represents the failure condition is identified, at each moment the RUL can be defined as the expected time D to reach the failure state S_k.
Mathematical Problems in Engineering 9
If we assume that the time to failure is a random variable D following a determinate probability density, we define the RUL at the current time t as

RUL_t = D̂ = E(D | s_{t+D̂} = S_k, s_{t+D̂−1} = S_i),   1 ≤ i, k ≤ N,  i ≠ k   (47)
where E denotes the expected value. Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):

(1) estimation of the current state;
(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn times.
The estimation of the current state is performed via the Viterbi path, that is, the variable δ_t = [δ_t(i)]_{1≤i≤N} defined in (29). To correctly model the uncertainty of the current state estimation, we use the normalized variable δ̄_t(i), obtained as

δ̄_t(i) = max_{s_1 s_2 ⋯ s_{t−1}} P(s_t = S_i | s_1 s_2 ⋯ s_{t−1}, x_1 x_2 ⋯ x_t, λ) = δ_t(i) / Σ_{j=1}^{N} δ_t(j),   1 ≤ i ≤ N   (48)
that is, an estimate of the probability of being in state S_i at time t.

Together with the normalized variable δ̄_t(i), the maximum a posteriori estimate of the current state s*_t is taken into account, according to (34). If s*_t coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimation of the average remaining time in the current state, d_avg(s*_t), is calculated as
d_avg(s*_t) = Σ_{i=1}^{N} (μ_{d_i} − d_t(i)) ⊙ δ̄_t(i)   (49)
where μ_{d_i} denotes the expected value of the duration variable in state S_i, according to the duration distribution specified by the parameters θ_i. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration d_t(i) at time t from the expected sojourn time of state S_i, weighting the result by the uncertainty about the current state δ̄_t(i), and finally summing up the contributions from all states.

In addition to the average remaining time, a lower and an upper bound can be calculated, based on the standard deviation σ_{d_i} of the duration distribution of state S_i:
d_low(s*_t) = Σ_{i=1}^{N} (μ_{d_i} − σ_{d_i} − d_t(i)) ⊙ δ̄_t(i)   (50)

d_up(s*_t) = Σ_{i=1}^{N} (μ_{d_i} + σ_{d_i} − d_t(i)) ⊙ δ̄_t(i)   (51)
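Equations (49)-(51) amount to an expectation of "expected sojourn minus elapsed duration" under the current state belief. A minimal numpy sketch, where the inputs stand in for μ_{d_i}, σ_{d_i}, d_t(i), and δ̄_t(i) and the numerical values are purely illustrative:

```python
import numpy as np

def remaining_time_current_state(mu_d, sigma_d, d_t, delta_bar):
    """Average, lower, and upper remaining time in the current state,
    Eqs. (49)-(51); all inputs are length-N arrays."""
    d_avg = np.sum((mu_d - d_t) * delta_bar)
    d_low = np.sum((mu_d - sigma_d - d_t) * delta_bar)
    d_up = np.sum((mu_d + sigma_d - d_t) * delta_bar)
    return d_avg, d_low, d_up

mu_d = np.array([100.0, 90.0, 100.0])   # expected sojourn time per state
sigma_d = np.array([4.5, 3.9, 4.5])     # duration standard deviations
d_t = np.array([30.0, 0.0, 0.0])        # estimated time already spent, Eq. (20)
delta_bar = np.array([0.9, 0.1, 0.0])   # normalized state belief, Eq. (48)
d_avg, d_low, d_up = remaining_time_current_state(mu_d, sigma_d, d_t, delta_bar)
print(d_avg)  # 0.9 * (100 - 30) + 0.1 * 90 = 72.0
```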
Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimate:

δ̄_next = [δ̄_{t+d}(i)]_{1≤i≤N} = (A0)^T · δ̄_t   (52)
while the maximum a posteriori estimate of the next state, s*_next, is calculated as

s*_next = s*_{t+d} = argmax_{1≤i≤N} δ̄_{t+d}(i)   (53)
Again, if s*_{t+d} coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimate of the failure time is D_avg = d_avg(s*_t), calculated at the previous step, with the bound values D_low = d_low(s*_t) and D_up = d_up(s*_t). Otherwise, the estimation of the sojourn time of the next state is calculated as follows:
d_avg(s*_{t+d}) = Σ_{i=1}^{N} μ_{d_i} ⊙ δ̄_{t+d}(i)   (54)

d_low(s*_{t+d}) = Σ_{i=1}^{N} (μ_{d_i} − σ_{d_i}) ⊙ δ̄_{t+d}(i)   (55)

d_up(s*_{t+d}) = Σ_{i=1}^{N} (μ_{d_i} + σ_{d_i}) ⊙ δ̄_{t+d}(i)   (56)
This procedure is repeated until the failure state is encountered in the prediction of the next state. The calculation of the RUL is then simply obtained by summing all the estimated remaining times in each intermediate state before encountering the failure state:

D_avg = Σ d_avg   (57)

D_low = Σ d_low   (58)

D_up = Σ d_up   (59)
Finally, Algorithm 1 details the above-described RUL estimation procedure.
5. Experimental Results
To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and real data.

The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real-case data are monitoring data related to the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
Algorithm 1: RUL estimation procedure.

(1) function RulEstimation(x_t, S_k)   ⊳ x_t: the last observation acquired
(2)                                    ⊳ S_k: the failure state
(3) Initialization:
(4)   D_avg ← 0
(5)   D_low ← 0
(6)   D_up ← 0
(7) Current state estimation:
(8)   Calculate δ̄_t   ⊳ Using (48)
(9)   Calculate s*_t   ⊳ Using (34)
(10)  Calculate d_t    ⊳ Using (20)
(11)  S ← s*_t
(12) Loop:
(13) while S ≠ S_k do
(14)   Calculate d_avg   ⊳ Using (49) or (54)
(15)   Calculate d_low   ⊳ Using (50) or (55)
(16)   Calculate d_up    ⊳ Using (51) or (56)
(17)   D_avg ← D_avg + d_avg
(18)   D_low ← D_low + d_low
(19)   D_up ← D_up + d_up
(20)   Calculate δ̄_{t+d}   ⊳ Using (52)
(21)   S ← s*_{t+d}         ⊳ Using (53)
(22) end while
(23) return D_avg, D_low, D_up
5.1. Simulated Experiment. Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.

5.1.1. Data Generation. The industrial machine subject of these experiments has been modeled as a left-right parametric HSMM with N = 5 states, having state S_5 as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity reasons, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective in solving automatic model selection, online condition monitoring, and prediction problems. For this purpose, we divided the experiments into two cases, according to the nature of the observations, being continuous or discrete.
For each of the continuous and discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains T = 650 observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and are adapted to obtain an equivalent left-right parametric HSMM, as follows:
π = [1, 0, 0, 0, 0]^T

A0 =
[0 1 0 0 0
 0 0 1 0 0
 0 0 0 1 0
 0 0 0 0 1
 0 0 0 0 1]

Θ_N = {θ_1 = [100, 20], θ_2 = [90, 15], θ_3 = [100, 20], θ_4 = [80, 25], θ_5 = [200, 1]}

Θ_G = {θ_1 = [500, 0.2], θ_2 = [540, 0.1667], θ_3 = [500, 0.2], θ_4 = [256, 0.3125], θ_5 = [800, 0.005]}

Θ_W = {θ_1 = [102, 28], θ_2 = [92, 29], θ_3 = [102, 28], θ_4 = [82, 20], θ_5 = [200, 256]}

(60)
where Θ_N, Θ_G, and Θ_W are the distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean μ_d and the variance σ²_d of the Gaussian distribution, the shape ν_d and the scale η_d of the Gamma distribution, and the scale a_d and the shape b_d of
Figure 2: The data generated with the parameters described in Section 5.1.1, for both the continuous case (panel (a): hidden states sequence, state duration, observed signal) and the discrete case (panel (b): hidden states sequence, state duration, observed symbols).
the Weibull distribution. It must be noticed that, as explained in Section 2.1, for state S_5, being the absorbing state, the duration parameters θ_5 have no influence on the data, since once state S_5 is reached the system will remain there forever.

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used, with the following parameters [15]:
μ_1 = [20, 20]^T,  μ_2 = [20, 35]^T,  μ_3 = [35, 35]^T,  μ_5 = [28, 28]^T

U_1 = [20 0; 0 20],  U_2 = [15 0; 0 15],  U_3 = [15 −2; −2 15],  U_4 = [5 0; 0 5],  U_5 = [10 3; 3 10]

(61)
while for the discrete case L = 7 distinct observation symbols have been taken into consideration, with the following observation probability distribution:
B =
[0.8 0.2 0   0   0   0   0
 0.1 0.8 0.1 0   0   0   0
 0   0.1 0.8 0.1 0   0   0
 0   0   0.1 0.7 0.1 0.1 0
 0   0   0   0.2 0.6 0.1 0.1]

(62)
An example of simulated data, both for the continuous and the discrete case, is shown in Figure 2, where a Gaussian duration model has been used.
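For reproduction purposes, such series can be generated in a few lines of numpy: sample a sojourn time for each visited state, emit symbols from the corresponding row of B, and advance through the left-right chain. This is a hedged sketch of the generation procedure described above; the `duration_sampler` callable and all names are ours, not the authors':

```python
import numpy as np

def simulate_left_right_hsmm(A0, duration_sampler, B, T=650, seed=0):
    """Generate one series from a left-right parametric HSMM: draw a
    duration for the current state, emit that many discrete symbols
    from row B[s], then move to the next state via A0. The absorbing
    last state keeps the chain there until time T."""
    rng = np.random.default_rng(seed)
    states, obs = [], []
    s = 0  # pi = [1, 0, ..., 0]: always start in the first state
    while len(obs) < T:
        d = max(1, int(round(duration_sampler(s, rng))))
        for _ in range(d):
            states.append(s)
            obs.append(rng.choice(len(B[s]), p=B[s]))
            if len(obs) == T:
                break
        s = rng.choice(len(B), p=A0[s])  # next state (deterministic for this A0)
    return np.array(states), np.array(obs)

# Gaussian durations with the means/variances of Theta_N above:
mu = [100, 90, 100, 80, 200]; var = [20, 15, 20, 25, 1]
sampler = lambda s, rng: rng.normal(mu[s], np.sqrt(var[s]))
A0 = np.array([[0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0],
               [0, 0, 0, 0, 1], [0, 0, 0, 0, 1]], dtype=float)
B = np.array([[.8, .2, 0, 0, 0, 0, 0], [.1, .8, .1, 0, 0, 0, 0],
              [0, .1, .8, .1, 0, 0, 0], [0, 0, .1, .7, .1, .1, 0],
              [0, 0, 0, .2, .6, .1, .1]])
states, obs = simulate_left_right_hsmm(A0, sampler, B, T=650)
print(len(obs), states[0], states[-1])
```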
5.1.2. Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving automatic model selection. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states N from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures M in the observation distribution from 1 to 4.

As accurate parameter initialization is crucial for obtaining a good model fitting [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters λ0, has been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters λ*, corresponding to the minimum AIC value, has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.

The obtained results are shown in Figure 3 for both the continuous and discrete observation data. As can be noticed, for all the 6 test cases of Figure 3 the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered an optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.
5.1.3. Condition Monitoring. The optimal parameters λ* obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all the 6 considered HSMM configurations. In this experiment we simulate online monitoring by considering all the testing observations up to the current time, that is, x_1 x_2 ⋯ x_t. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state s*_t = argmax_{1≤i≤N}[δ_t(i)], as specified in (34).
Figure 3: Akaike Information Criterion (AIC) values versus the number of states (2 to 8), for Gaussian, Gamma, and Weibull duration models: (a) continuous data, Gaussian duration distribution; (b) continuous data, Gamma duration distribution; (c) continuous data, Weibull duration distribution; (d) discrete data, Gaussian duration distribution; (e) discrete data, Gamma duration distribution; (f) discrete data, Weibull duration distribution. AIC is effective for automatic model selection, since its minimum value recovers the number of states and the duration model used to generate the data.
An example of execution of the condition monitoring experiment is shown in Figure 4, for continuous and discrete observations respectively. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, and the third display represents the observed time series.

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of
Figure 4: Condition monitoring using the Viterbi path: (a) state estimation for continuous data and Gamma duration distribution (Viterbi path accuracy 0.985); (b) state estimation for discrete data and Gaussian duration distribution (Viterbi path accuracy 0.992). HSMMs can be effective in solving condition monitoring problems in time-dependent applications, due to their high accuracy in hidden state recognition.
the testing cases. The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.
5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase, we considered the state S_5 as the failure state and used the trained parameters λ* of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations x_1 x_2 ⋯ x_t up to time t. When a new observation is acquired, after the current state probability δ̄_t(i) is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures, one can notice that the average as well as the lower and upper bound estimations converge to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty of the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and lower bounds become narrower as the failure state approaches, and the estimation becomes more precise until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.
To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time t, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

APE(t) = |RUL_real(t) − RUL(t)|   (63)
where RUL_real(t) is the (known) value of the RUL at time t, while RUL(t) is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

\overline{APE} = (Σ_{t=1}^{T} APE(t)) / T   (64)

where T is the length of the testing signal. Since \overline{APE} is a prediction error, average values of (64) close to zero correspond to good predictive performance.
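The metric of (63)-(64) is straightforward to compute; a small sketch with illustrative data:

```python
import numpy as np

def average_absolute_prediction_error(rul_real, rul_pred):
    """Eqs. (63)-(64): mean over time of |RUL_real(t) - RUL(t)|."""
    ape = np.abs(np.asarray(rul_real, float) - np.asarray(rul_pred, float))
    return ape.mean()

# a perfect predictor scores 0; a constant 10-step bias scores 10
true_rul = np.arange(100, 0, -1)
print(average_absolute_prediction_error(true_rul, true_rul))       # 0.0
print(average_absolute_prediction_error(true_rul, true_rul + 10))  # 10.0
```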
The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b), for the continuous and discrete observation cases respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively faced with HSMMs, which achieve a reliable estimation power with a small prediction error.
Finally, we tested our RUL estimation methodology using the state duration estimation of (16), introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), in which the prediction errors obtained for continuous and discrete observations are reported, respectively.

Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi. This
Figure 5: Remaining Useful Lifetime estimation (true, upper, average, and lower RUL versus time) for (a) continuous data with Weibull duration distribution and (b) discrete data with Gamma duration distribution. HSMMs effectively solve RUL estimation problems: the prediction converges to the actual RUL value, and its uncertainty decreases as the real failure time approaches.
Table 1: State recognition accuracy. (a) Continuous observations: test case versus duration distribution (Gaussian, Gamma, Weibull).
is mainly due to the proposed average state duration estimator of (20), compared to the one of Azimi given by (16).
5.2. Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besancon, France), with the aim of collecting real data related to the accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51–59].

The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing
Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20). (a) Continuous observation test cases: test case versus duration distribution (Gaussian, Gamma, Weibull), each with average, upper, and lower APE.
housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).
The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox, which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human-machine interface.

The load profile generation part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.
Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16), introduced by Azimi et al. [30–32]. (a) Continuous observation test cases: test case versus duration distribution (Gaussian, Gamma, Weibull), each with average, upper, and lower APE.
The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s, collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).
Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49]: Condition 1, 1800 rpm and 4000 N; Condition 2, 1650 rpm and 4200 N; Condition 3, 1500 rpm and 5000 N.
Figure 6: Global overview of the Pronostia experimental platform [19].
Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 N;
(ii) second operating condition: speed of 1650 rpm and load of 4200 N;
(iii) third operating condition: speed of 1500 rpm and load of 5000 N.

Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and with a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal exceeded 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.
Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).
As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].
In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, namely, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots (L = 2560). Let r_w(t) be the raw signal of the w-th window; for each w we estimate the RMS as

x_w^RMS = √((1/L) Σ_{t=1}^{L} r_w²(t))

and the kurtosis as

x_w^KURT = ((1/L) Σ_{t=1}^{L} (r_w(t) − r̄_w)⁴) / ((1/L) Σ_{t=1}^{L} (r_w(t) − r̄_w)²)²

where r̄_w is the mean of r_w. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM after the model selection procedure, we implemented a leave-one-out cross validation scheme: considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings, using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
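The two window-level features defined above can be sketched directly from their formulas; the window length L = 2560 matches one snapshot, and the function name is ours:

```python
import numpy as np

def extract_features(raw, L=2560):
    """Cut the raw vibration signal into non-overlapping windows of
    length L and compute, per window, the RMS and the kurtosis as
    defined in the text."""
    n = len(raw) // L
    rms, kurt = [], []
    for w in range(n):
        r = raw[w * L:(w + 1) * L]
        rms.append(np.sqrt(np.mean(r ** 2)))
        centered = r - r.mean()
        # fourth central moment divided by the squared second central moment
        kurt.append(np.mean(centered ** 4) / np.mean(centered ** 2) ** 2)
    return np.array(rms), np.array(kurt)

# sanity check on synthetic data: Gaussian noise has kurtosis close to 3
rng = np.random.default_rng(1)
rms, kurt = extract_features(rng.normal(0.0, 2.0, size=5 * 2560))
print(rms.shape, np.round(kurt.mean(), 1))
```

An impulsive bearing defect tends to raise the kurtosis well above the Gaussian baseline of 3, which is why this feature is commonly paired with RMS for degradation monitoring.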
5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.

(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states N from 2 to 6, and (iii) an increasing number of Gaussian mixtures M in the observation density from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations λ0, on the data sets (Bearing1_1, …).
Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].
Figure 8: Raw vibration data (a) versus the RMS and kurtosis features (b), extracted window by window, for Bearing1_1.
Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases, the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 Gaussian component for the observation density.
(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme, by using for condition 1, at each iteration, Bearing1_i, 1 ≤ i ≤ 7, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters λ*_i were estimated for the i-th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL, as specified in (57), (58), and (59) respectively, considering the state S_4 as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as
Figure 9: AIC values versus the number of states (2 to 6) for Gaussian, Gamma, and Weibull duration models: (a) Condition 1; (b) Condition 2. In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture component in the observation density.
[Figure 10: true, average, lower, and upper RUL versus time (s); (a) RUL estimation for Bearing1_7 (time axis up to 45,000 s); (b) RUL estimation for Bearing2_6 (time axis up to 19,000 s).]
Figure 10: By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.
well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.
Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.
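The error metric discussed above can be stated compactly. Equation (64) is not reproduced in this excerpt, so the sketch below assumes the usual reading of an average absolute prediction error, the mean of |true RUL − estimated RUL| over all prediction instants; the function name is illustrative.

```python
def average_absolute_prediction_error(true_rul, estimated_rul):
    """Mean absolute deviation between the true and estimated RUL series.

    Assumed reading of the paper's APE metric (its Equation (64) is not
    reproduced here): average |true - estimated| over all instants, so
    early, less accurate predictions are penalized as much as late ones.
    """
    assert len(true_rul) == len(estimated_rul) and true_rul
    return sum(abs(a - b) for a, b in zip(true_rul, estimated_rul)) / len(true_rul)

# Hourly predictions on a hypothetical run-to-failure test (hours):
true_rul = [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]
estimated = [6.5, 4.8, 3.1, 2.2, 0.9, 0.0]
ape = average_absolute_prediction_error(true_rul, estimated)
```

Because every instant is weighted equally, a model that is accurate only near failure still pays for its early mispredictions, which is the behaviour the paper's discussion relies on.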
6 Conclusion and Future Work
In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.
The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required
Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.
parameters to estimate. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.
This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model in a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.
As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure, capable of adapting the model parameters to new conditions, will be considered in a future extension.
Appendix
In this appendix we give the derivation of the state duration variable introduced in (20) as
$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\left(\bar{d}_t(i)+1\right). \tag{A.1}$$
The random variable d_t(i) has been defined in Section 2.1 as the duration spent in state i prior to the current time t, assuming that the state at the current time t is i; d_t(i) is sampled from an arbitrary distribution
$$d_t(i) \sim f(d). \tag{A.2}$$
We can specify the probability that the system has been in state i for d time units prior to the current time t, given the observations and the model parameters λ, and knowing that the current state is i, as
$$P\left(d_t(i)=d\right) = P\left(s_{t-d-1}\neq S_i,\ s_{t-d}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\ \mathbf{x}_1,\ldots,\mathbf{x}_t,\ \lambda\right). \tag{A.3}$$
We omit the conditioning on the model parameters λ in the following equations, it being inherently implied. We are interested in deriving the estimator d̄ […] because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time t, we can rewrite (A.13) as follows:
$$\bar{d}_{t+1}(i) = P\left(s_t=S_i \mid s_{t+1}=S_i,\ \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)\cdot\left(\bar{d}_t(i)+1\right). \tag{A.15}$$
The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state i in the previous step.
In order to transform (A.15) in terms of model parameters, for an easy numerical calculation of the induction for d̄ […]
The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as
$$P\left(s_{t+1}=S_i \mid s_t=S_i,\ \mathbf{x}_1,\ldots,\mathbf{x}_t\right) = \sum_{d_t} a_{ii}(d_t)\cdot P\left(d_t \mid \mathbf{x}_1,\ldots,\mathbf{x}_t\right) \approx a_{ii}(\bar{\mathbf{d}}_t), \tag{A.20}$$
while the denominator of (A.19) can be expressed as follows:
$$P\left(\mathbf{x}_{t+1}\mid\mathbf{x}_1,\ldots,\mathbf{x}_t\right) = \frac{P\left(\mathbf{x}_1,\ldots,\mathbf{x}_t,\mathbf{x}_{t+1}\right)}{P\left(\mathbf{x}_1,\ldots,\mathbf{x}_t\right)} = \frac{\sum_{i=1}^{N}\alpha_{t+1}(i)}{\sum_{i=1}^{N}\alpha_t(i)}. \tag{A.21}$$
By substituting (A.20) and (A.21) in (A.19) we obtain
$$P\left(s_t=S_i,\ s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N}\alpha_{t+1}(i)} \tag{A.22}$$
and then, by combining (A.22) and (A.16), we obtain
$$P\left(s_t=S_i \mid s_{t+1}=S_i,\ \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i)\cdot\sum_{i=1}^{N}\alpha_{t+1}(i)}. \tag{A.23}$$
Finally, by substituting (A.23) in (A.15) and considering that
$$\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N}\alpha_t(i)}, \tag{A.24}$$
we derive the induction formula for d̄_{t+1}(i) in terms of model parameters as
$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\left(\bar{d}_t(i)+1\right). \tag{A.25}$$
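The induction (A.25) is cheap to compute once the forward pass is available. A minimal sketch, where `a_recurrent[i]` stands for the self-transition probability a_ii already evaluated at the current average duration d̄_t; the helper name and list-based API are illustrative, not from the paper:

```python
def update_average_durations(d_prev, alpha_prev, alpha_curr, a_recurrent, b_next):
    """One step of (A.25) for all N states.

    d_prev[i]      -- current average duration estimate d_t(i)
    alpha_prev[i]  -- forward variable alpha_t(i)
    alpha_curr[i]  -- forward variable alpha_{t+1}(i)
    a_recurrent[i] -- self-transition a_ii evaluated at the average duration
    b_next[i]      -- observation likelihood b_i(x_{t+1})
    """
    return [a_recurrent[i] * alpha_prev[i] * b_next[i] / alpha_curr[i]
            * (d_prev[i] + 1.0)
            for i in range(len(d_prev))]
```

When the weighting factor equals one (the state mass is fully retained), the estimate simply grows by one time unit, matching the intuition given for (A.15).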
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry (Osaka and Kariya, Japan, October-November 1989), E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling Industrial Maintenance Systems and the Effects of Automatic Condition Monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines—prognostics—part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-Based Failure Prediction: An Extended Hidden Markov Model Approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), Bari, Italy, June 2009, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven
methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data Transmission Schemes for a New Generation of Interactive Digital Television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "PRONOSTIA: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boškoski, M. Gašperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to PRONOSTIA's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to PRONOSTIA's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.
Concerning the observation parameters, once the modified forward and backward variables accounting for the state duration are defined as in (22) and (28), the reestimation formulas are the same as for Hidden Markov Models [13].
In particular, for continuous observations, the parameters of the Gaussians' mixture defined in (9) are reestimated by firstly defining the probability of being in state S_j at time t, with the probability of the observation vector x_t evaluated by the k-th mixture component, as
$$\gamma_t(j,k) = \left[\frac{\alpha_t(j)\,\beta_t(j)}{\sum_{j=1}^{N}\alpha_t(j)\,\beta_t(j)}\right]\cdot\left[\frac{c_{jk}\,\mathcal{N}\!\left(\mathbf{x}_t;\boldsymbol{\mu}_{jk},\mathbf{U}_{jk}\right)}{\sum_{m=1}^{M}c_{jm}\,\mathcal{N}\!\left(\mathbf{x}_t;\boldsymbol{\mu}_{jm},\mathbf{U}_{jm}\right)}\right]. \tag{43}$$
By using the former quantity, the parameters c_{jk}, μ_{jk}, and U_{jk} are reestimated through the following formulas:
$$c_{jk} = \frac{\sum_{t=1}^{T}\gamma_t(j,k)}{\sum_{t=1}^{T}\sum_{m=1}^{M}\gamma_t(j,m)},\qquad
\boldsymbol{\mu}_{jk} = \frac{\sum_{t=1}^{T}\gamma_t(j,k)\cdot\mathbf{x}_t}{\sum_{t=1}^{T}\gamma_t(j,k)},\qquad
\mathbf{U}_{jk} = \frac{\sum_{t=1}^{T}\gamma_t(j,k)\cdot\left(\mathbf{x}_t-\boldsymbol{\mu}_{jk}\right)\left(\mathbf{x}_t-\boldsymbol{\mu}_{jk}\right)^{T}}{\sum_{t=1}^{T}\gamma_t(j,k)}, \tag{44}$$
where superscript T denotes vector transpose.
For discrete observations, the reestimation formula for the observation matrix b_j(l) is
$$\bar{b}_j(l) = \frac{\sum_{t=1,\ x_t=X_l}^{T}\gamma_t(j)}{\sum_{t=1}^{T}\gamma_t(j)}, \tag{45}$$
where the quantity γ_t(j), which takes into account the duration-dependent forward variable α_t(j), is calculated through (17). The reader is referred to Rabiner's work [13] for the interpretation of the observation parameters reestimation formulas.
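Equations (43) and (44) translate directly into code. A univariate sketch follows; the paper's mixtures are multivariate with covariance matrices U, so scalar variances are a simplifying assumption here, and the duration-dependent forward/backward variables are taken as given inputs:

```python
import math

def gauss(x, mu, var):
    """Univariate normal density."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def reestimate_mixture(alpha, beta, obs, c, mu, var):
    """One reestimation pass of Equations (43)-(44), univariate case.

    alpha[t][j], beta[t][j] -- duration-dependent forward/backward variables
    obs[t]                  -- scalar observations x_t
    c, mu, var              -- [j][k] mixture weights, means, variances
    """
    T, N, M = len(obs), len(c), len(c[0])
    # Equation (43): joint occupancy of state j and mixture component k
    gamma = [[[0.0] * M for _ in range(N)] for _ in range(T)]
    for t in range(T):
        norm = sum(alpha[t][j] * beta[t][j] for j in range(N))
        for j in range(N):
            post = alpha[t][j] * beta[t][j] / norm
            comp = [c[j][k] * gauss(obs[t], mu[j][k], var[j][k]) for k in range(M)]
            s = sum(comp)
            for k in range(M):
                gamma[t][j][k] = post * comp[k] / s
    # Equation (44): closed-form parameter updates
    new_c = [[sum(g[j][k] for g in gamma)
              / sum(g[j][m] for g in gamma for m in range(M))
              for k in range(M)] for j in range(N)]
    new_mu = [[sum(gamma[t][j][k] * obs[t] for t in range(T))
               / sum(g[j][k] for g in gamma)
               for k in range(M)] for j in range(N)]
    new_var = [[sum(gamma[t][j][k] * (obs[t] - new_mu[j][k]) ** 2 for t in range(T))
                / sum(g[j][k] for g in gamma)
                for k in range(M)] for j in range(N)]
    return new_c, new_mu, new_var
```

With a single state and a single component, the occupancies collapse to one and the updates reduce to the sample mean and variance, a quick sanity check on the formulas.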
3 AIC-Based Model Selection
In the framework of the proposed parametric HSMMs, the model selection procedure aims to select the optimal number of hidden states N, the right duration distribution family, and, in the case of mixture observation modeling, the number of Gaussian mixtures M to be used. In this work we make use of the Akaike Information Criterion (AIC). Indeed, it has been seen that, in the case of complex models and in the presence of a limited number of training observations, AIC represents a satisfactory methodology for model selection, outperforming other approaches like the Bayesian Information Criterion.
In general, information criteria are represented as a two-term structure. They account for a compromise between a measure of model fitness, which is based on the likelihood of the model, and a penalty term, which takes into account the model complexity. Usually the model complexity is measured in terms of the number of parameters that have to be estimated and in terms of the number of observations.
The Akaike Information Criterion is an estimate of the asymptotic value of the expected distance between the unknown true likelihood function of the data and the fitted likelihood function of the model. In particular, the AIC can be expressed as
$$\text{AIC} = \frac{-\log L(\hat{\lambda}) + p}{T}, \tag{46}$$
where L(λ̂) is the likelihood of the model with the estimated parameters, as defined in (25), p is the number of model parameters, and T is the length of the observed sequence. The best model is the one minimizing (46).
Concerning p, the number of parameters to be estimated for a parametric HSMM with N states is p = p_h + p_o, where p_h are the parameters of the hidden states layer, while p_o are those of the observation layer.
In particular, p_h = (N − 1) + (N − 1) · N + z · N, where
(i) N − 1 accounts for the prior probabilities π;
(ii) (N − 1) · N accounts for the nonrecurrent transition matrix A^0;
(iii) z · N accounts for the duration probability, z being the number of parameters θ of the duration distribution.
Concerning p_o, a distinction must be made between discrete and continuous observations:
(i) in the case of discrete observations with L possible observable values, p_o = (L − 1) · N, which accounts for the elements of the observation matrix B;
(ii) if the observations are continuous and a multivariate mixture of M Gaussians with O variates is used as observation model, p_o = [O · M · N] + [(O · (O + 1)/2) · M · N] + [(M − 1) · N], where each term accounts, respectively, for the mean vectors μ, the covariance matrices U, and the mixture coefficients C.
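The parameter count p and Equation (46) can be sketched as below. The first two continuous-observation terms (O·M·N means and O(O+1)/2·M·N full-covariance entries) are partly truncated in this transcript and reconstructed from the accompanying text, so treat them as an assumption; the candidate log-likelihoods in the usage example are hypothetical.

```python
def num_parameters(N, z, discrete=False, L=None, M=None, O=None):
    """Parameter count p = p_h + p_o for an N-state parametric HSMM.

    p_h: (N-1) priors + (N-1)*N non-recurrent transitions + z*N duration
    parameters (z per state).  p_o: (L-1)*N for discrete symbols, or --
    assuming full covariance matrices -- mean, covariance, and weight
    counts for an M-component, O-variate Gaussian mixture.
    """
    p_h = (N - 1) + (N - 1) * N + z * N
    if discrete:
        p_o = (L - 1) * N
    else:
        p_o = O * M * N + (O * (O + 1) // 2) * M * N + (M - 1) * N
    return p_h + p_o

def aic(log_likelihood, p, T):
    """Equation (46); the best model minimizes this value."""
    return (-log_likelihood + p) / T

# Hypothetical selection over the number of states N
# (z = 2 duration parameters per state, M = 1, bivariate observations O = 2):
loglik = {2: -1500.0, 3: -1450.0, 4: -1400.0}   # made-up training likelihoods
best_N = min(loglik, key=lambda N: aic(loglik[N], num_parameters(N, z=2, M=1, O=2), T=650))
```

The penalty grows quadratically in N through the transition matrix, so a larger model must buy a genuinely better likelihood to win the comparison.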
4 Remaining Useful Lifetime Estimation
One of the most important advantages of the time modeling of HSMMs is the possibility to effectively face the prediction problem. The knowledge of the state duration distributions allows the estimation of the remaining time in a certain state and, in general, the prediction of the expected time D before entering a determinate state.
As already mentioned, an interesting application of the prediction problem is the Remaining Useful Lifetime (RUL) estimation of industrial equipment. Indeed, if each state of an HSMM is mapped to a different condition of an industrial machine, and if the state S_k that represents the failure condition is identified, at each moment the RUL can be defined as the expected time D to reach the failure state S_k. If we assume that the time to failure is a random variable D following a determinate probability density, we define the RUL at the current time t as
$$\text{RUL}_t = \bar{D} = E(D):\quad s_{t+D}=S_k,\ s_{t+D-1}=S_i,\quad 1\le i,k\le N,\ i\neq k, \tag{47}$$
where E denotes the expected value.
Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):
(1) estimation of the current state;
(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn time.
The estimation of the current state is performed via the Viterbi path, that is, the variable δ_t = [δ_t(i)]_{1≤i≤N} defined in (29). To correctly model the uncertainty of the current state estimation, we use the normalized variable δ̄_t(i), obtained as
$$\bar{\delta}_t(i) = \max_{s_1,s_2,\ldots,s_{t-1}} P\left(s_t=S_i \mid s_1 s_2\cdots s_{t-1},\ \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t,\ \lambda\right) = \frac{\delta_t(i)}{\sum_{j=1}^{N}\delta_t(j)},\quad 1\le i\le N, \tag{48}$$
that is, an estimate of the probability of being in state S_i at time t.
Together with the normalized variable δ̄_t(i), the maximum a posteriori estimate of the current state, s*_t, is taken into account, according to (34). If s*_t coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimation of the average remaining time in the current state, d̄_avg(s*_t), is calculated as
$$\bar{d}_{\text{avg}}(s^*_t) = \sum_{i=1}^{N}\left(\mu_{d_i} - \bar{d}_t(i)\right)\odot\bar{\delta}_t(i), \tag{49}$$
where with μ_{d_i} we denote the expected value of the duration variable in state S_i, according to the duration distribution specified by the parameters θ_i. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration d̄_t(i) at time t from the expected sojourn time of state S_i, weighting the result using the uncertainty about the current state, δ̄_t(i), and finally summing up all the contributions from each state.
In addition to the average remaining time, a lower and an upper bound value can be calculated, based on the standard deviation σ_{d_i} of the duration distribution for state S_i:
$$\bar{d}_{\text{low}}(s^*_t) = \sum_{i=1}^{N}\left(\mu_{d_i} - \sigma_{d_i} - \bar{d}_t(i)\right)\odot\bar{\delta}_t(i), \tag{50}$$
$$\bar{d}_{\text{up}}(s^*_t) = \sum_{i=1}^{N}\left(\mu_{d_i} + \sigma_{d_i} - \bar{d}_t(i)\right)\odot\bar{\delta}_t(i). \tag{51}$$
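Equations (49)–(51) amount to a probability-weighted subtraction, reading ⊙ as elementwise multiplication. A minimal sketch (the list-based API is illustrative):

```python
def remaining_time_in_state(mu_d, sigma_d, d_est, delta_norm):
    """Average, lower, and upper remaining sojourn time, Equations (49)-(51).

    mu_d[i], sigma_d[i] -- mean and standard deviation of state i's duration
    d_est[i]            -- estimated duration already spent, d_t(i)
    delta_norm[i]       -- normalized current-state probability delta_t(i)
    """
    N = len(mu_d)
    d_avg = sum((mu_d[i] - d_est[i]) * delta_norm[i] for i in range(N))
    d_low = sum((mu_d[i] - sigma_d[i] - d_est[i]) * delta_norm[i] for i in range(N))
    d_up = sum((mu_d[i] + sigma_d[i] - d_est[i]) * delta_norm[i] for i in range(N))
    return d_avg, d_low, d_up
```

For instance, with the current state known to be state 0 (probability one), 60 of an expected 100 time units already spent, and a duration standard deviation of 4, the call returns (40, 36, 44).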
Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimation, as follows:
$$\bar{\delta}_{\text{next}} = \left[\bar{\delta}_{t+d}(i)\right]_{1\le i\le N} = \left(\mathbf{A}^0\right)^{T}\cdot\bar{\delta}_t, \tag{52}$$
while the maximum a posteriori estimate of the next state, s*_next, is calculated as
$$s^*_{\text{next}} = s^*_{t+d} = \arg\max_{1\le i\le N}\ \bar{\delta}_{t+d}(i). \tag{53}$$
Again, if s*_{t+d} coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimation of the failure time is D̄_avg = d̄_avg(s*_t), calculated at the previous step, with the bound values D̄_low = d̄_low(s*_t) and D̄_up = d̄_up(s*_t). Otherwise, the estimation of the sojourn time of the next state is calculated as follows:
$$\bar{d}_{\text{avg}}(s^*_{t+d}) = \sum_{i=1}^{N}\mu_{d_i}\odot\bar{\delta}_{t+d}(i), \tag{54}$$
$$\bar{d}_{\text{low}}(s^*_{t+d}) = \sum_{i=1}^{N}\left(\mu_{d_i}-\sigma_{d_i}\right)\odot\bar{\delta}_{t+d}(i), \tag{55}$$
$$\bar{d}_{\text{up}}(s^*_{t+d}) = \sum_{i=1}^{N}\left(\mu_{d_i}+\sigma_{d_i}\right)\odot\bar{\delta}_{t+d}(i). \tag{56}$$
This procedure is repeated until the failure state is encountered in the prediction of the next state. The calculation of the RUL is then simply obtained by summing all the estimated remaining times in each intermediate state before encountering the failure state:
$$\bar{D}_{\text{avg}} = \sum \bar{d}_{\text{avg}}, \tag{57}$$
$$\bar{D}_{\text{low}} = \sum \bar{d}_{\text{low}}, \tag{58}$$
$$\bar{D}_{\text{up}} = \sum \bar{d}_{\text{up}}. \tag{59}$$
Finally, Algorithm 1 details the above described RUL estimation procedure.
5 Experimental Results
To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and real data.
The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real case data are monitoring data related to the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
Algorithm 1: The RUL estimation procedure.
(1) function RulEstimation(x_t, S_k)    ⊳ x_t: the last observation acquired
(2)                                     ⊳ S_k: the failure state
(3) Initialization:
(4)   D̄_avg ← 0
(5)   D̄_low ← 0
(6)   D̄_up ← 0
(7) Current state estimation:
(8)   Calculate δ̄_t     ⊳ using (48)
(9)   Calculate s*_t    ⊳ using (34)
(10)  Calculate d̄_t     ⊳ using (20)
(11)  S ← s*_t
(12) Loop:
(13) while S ≠ S_k do
(14)   Calculate d̄_avg  ⊳ using (49) or (54)
(15)   Calculate d̄_low  ⊳ using (50) or (55)
(16)   Calculate d̄_up   ⊳ using (51) or (56)
(17)   D̄_avg ← D̄_avg + d̄_avg
5.1 Simulated Experiment. Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.
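A minimal sketch of such a generator for the hidden layer, assuming the left-right structure and the Gaussian duration parameters Θ_N listed in Section 5.1.1 (given there as mean/variance pairs); only the state path is produced, with observations omitted for brevity:

```python
import random

def simulate_left_right_hsmm(mu_d, var_d, T=650, seed=0):
    """Hidden-state path of a left-right HSMM with Gaussian durations.

    mu_d, var_d -- per-state duration mean and variance (Theta_N style);
    the last state is absorbing, so it simply fills the remaining steps.
    Returns a list of T 0-based state indices.
    """
    rng = random.Random(seed)
    N = len(mu_d)
    states = []
    s = 0
    while len(states) < T and s < N - 1:
        duration = max(1, round(rng.gauss(mu_d[s], var_d[s] ** 0.5)))
        states.extend([s] * duration)
        s += 1
    states.extend([N - 1] * (T - len(states)))  # absorbing failure state
    return states[:T]

# Gaussian duration parameters Theta_N from Equation (60): (mean, variance)
means = [100, 90, 100, 80, 200]
variances = [20, 15, 20, 25, 1]
path = simulate_left_right_hsmm(means, variances)
```

Since the first four mean durations total roughly 370 steps, a 650-step series spends its tail in the absorbing failure state, mirroring the run-to-failure behavior the experiment is designed to reproduce.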
5.1.1 Data Generation. The industrial machine subject of these experiments has been modeled as a left-right parametric HSMM with N = 5 states, having state S_5 as absorbing (failure) state. The choice of a left-right setting has been made for simplicity reasons, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective to solve automatic model selection, online condition monitoring, and prediction problems. For this purpose, we divided the experiments in two cases, according to the nature of the observations, being continuous or discrete.
For each of the continuous and the discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains T = 650 observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and are adapted to obtain an equivalent left-right parametric HSMM, as follows:
\pi = [1\ 0\ 0\ 0\ 0]^T

A_0 = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}

Θ_N = {θ_1 = [100, 20], θ_2 = [90, 15], θ_3 = [100, 20], θ_4 = [80, 25], θ_5 = [200, 1]}

Θ_G = {θ_1 = [500, 0.2], θ_2 = [540, 0.1667], θ_3 = [500, 0.2], θ_4 = [256, 0.3125], θ_5 = [800, 0.005]}

Θ_W = {θ_1 = [102, 2.8], θ_2 = [92, 2.9], θ_3 = [102, 2.8], θ_4 = [82, 2.0], θ_5 = [200, 2.56]}

(60)
where Θ_N, Θ_G, and Θ_W are the distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean μ_d and the variance σ_d^2 of the Gaussian distribution, the shape ν_d and the scale η_d of the Gamma distribution, and the scale a_d and the shape b_d of the Weibull distribution. It must be noticed that, as explained in Section 2.1, for state S_5, being the absorbing state, the duration parameters θ_5 have no influence on the data, since once the state S_5 is reached the system will remain there forever.

Mathematical Problems in Engineering 11

[Figure 2: The data generated with the parameters described in Section 5.1.1, both for the continuous and the discrete case. (a) Example of simulated data for the continuous case: hidden states sequence, state duration, and observed signal. (b) Example of simulated data for the discrete case: hidden states sequence, state duration, and observed symbols.]

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used, with the following parameters [15]:
\mu_1 = \begin{bmatrix}20\\20\end{bmatrix}, \quad \mu_2 = \begin{bmatrix}20\\35\end{bmatrix}, \quad \mu_3 = \begin{bmatrix}35\\35\end{bmatrix}, \quad \mu_5 = \begin{bmatrix}28\\28\end{bmatrix}

U_1 = \begin{bmatrix}20&0\\0&20\end{bmatrix}, \quad U_2 = \begin{bmatrix}15&0\\0&15\end{bmatrix}, \quad U_3 = \begin{bmatrix}15&-2\\-2&15\end{bmatrix}, \quad U_4 = \begin{bmatrix}5&0\\0&5\end{bmatrix}, \quad U_5 = \begin{bmatrix}10&3\\3&10\end{bmatrix}

(61)
while for the discrete case, L = 7 distinct observation symbols have been taken into consideration, with the following observation probability distribution:

B = \begin{bmatrix}
0.8 & 0.2 & 0 & 0 & 0 & 0 & 0 \\
0.1 & 0.8 & 0.1 & 0 & 0 & 0 & 0 \\
0 & 0.1 & 0.8 & 0.1 & 0 & 0 & 0 \\
0 & 0 & 0.1 & 0.7 & 0.1 & 0.1 & 0 \\
0 & 0 & 0 & 0.2 & 0.6 & 0.1 & 0.1
\end{bmatrix}

(62)
An example of simulated data, both for the continuous and the discrete cases, is shown in Figure 2, where a Gaussian duration model has been used.
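The left-right, duration-driven state sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' generator: `simulate_states` is a hypothetical helper, and the Gamma (shape, scale) pairs are the Θ_G values of (60) for the four transient states.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gamma duration parameters (shape nu_d, scale eta_d) per transient state,
# taken from Theta_G in (60); state 5 is absorbing and needs no duration.
gamma_params = [(500, 0.2), (540, 0.1667), (500, 0.2), (256, 0.3125)]

def simulate_states(T=650):
    """Left-right path: stay in each state for a Gamma-distributed duration,
    then move to the next; the absorbing failure state 5 fills the rest."""
    states = []
    for s, (shape, scale) in enumerate(gamma_params, start=1):
        d = max(1, int(round(rng.gamma(shape, scale))))
        states.extend([s] * d)
        if len(states) >= T:
            break
    states.extend([5] * (T - len(states)))  # remain in the failure state
    return states[:T]
```

With these parameters the expected time to failure is about 370 steps, so a T = 650 series always ends in the absorbing state.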
5.1.2. Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving the automatic model selection. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states N from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures M in the observation distribution from 1 to 4.

As accurate parameter initialization is crucial for obtaining a good model fitting [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters λ_0, has been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters λ*, corresponding to the minimum AIC value, has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.
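The selection step amounts to scoring each candidate structure and keeping the minimum-AIC one. A minimal sketch, assuming hypothetical (structure label, log-likelihood, parameter count) tuples produced by the learning runs:

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L; lower is better."""
    return 2 * n_params - 2 * log_likelihood

def select_structure(candidates):
    """candidates: iterable of (structure, log_likelihood, n_params) tuples.
    Return the structure label with the minimum AIC value."""
    return min(candidates, key=lambda c: aic(c[1], c[2]))[0]
```

A richer model always raises the likelihood, but the 2k penalty makes an over-sized structure lose to a well-fitting compact one.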
The obtained results are shown in Figure 3, for both the continuous and the discrete observation data. As can be noticed, for all the 6 test cases of Figure 3, the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered an optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.
5.1.3. Condition Monitoring. The optimal parameters λ* obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all the 6 considered HSMM configurations. In this experiment we simulate online monitoring by considering all the testing observations up to the current time, that is, x_1, x_2, ⋯, x_t. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state s*_t = argmax_{1 ≤ i ≤ N} [δ_t(i)], as specified in (34).
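For illustration, a standard HMM Viterbi decoder in the log domain is sketched below. It omits the explicit duration dependence of the recursion (48), so it is a simplified stand-in for the paper's duration-aware algorithm, not a reimplementation of it.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Plain-HMM Viterbi decoding (log domain): argmax state path for a
    discrete observation sequence; a simplification of recursion (48)/(34)."""
    pi, A, B = np.asarray(pi, float), np.asarray(A, float), np.asarray(B, float)
    N, T = len(pi), len(obs)
    with np.errstate(divide="ignore"):  # log(0) -> -inf for forbidden moves
        lpi, lA, lB = np.log(pi), np.log(A), np.log(B)
    delta = np.full((T, N), -np.inf)
    back = np.zeros((T, N), dtype=int)
    delta[0] = lpi + lB[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] + lA[:, j]
            back[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[back[t, j]] + lB[j, obs[t]]
    # backtrack the most likely state sequence
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

On a toy two-state left-right model, the decoder switches states exactly where the observation statistics change.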
[Figure 3: Akaike Information Criterion (AIC) values as a function of the number of states (from 2 to 8), for Gaussian, Gamma, and Weibull candidate duration distributions. (a) Continuous data, Gaussian duration distribution; (b) continuous data, Gamma duration distribution; (c) continuous data, Weibull duration distribution; (d) discrete data, Gaussian duration distribution; (e) discrete data, Gamma duration distribution; (f) discrete data, Weibull duration distribution. AIC is effective for automatic model selection, since its minimum value recovers the number of states and the duration model used to generate the data.]
An example of execution of the condition monitoring experiment is shown in Figure 4, for continuous and discrete observations, respectively. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, and the third display represents the observed time series.

[Figure 4: Condition monitoring using the Viterbi path. (a) State estimation for continuous data and a Gamma duration distribution (Viterbi path accuracy 0.985); (b) state estimation for discrete data and a Gaussian duration distribution (Viterbi path accuracy 0.992). Each panel shows the true state sequence, the estimated states (correct and wrong guesses), and the observations. HSMMs can be effective for condition monitoring in time-dependent applications, due to their high accuracy in hidden state recognition.]

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of the testing cases. The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.
5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase, we considered the state S_5 as the failure state and used the trained parameters λ* of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations x_1, x_2, ⋯, x_t up to time t. When a new observation is acquired, after the current state probability δ_t(i) is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures, one can notice that the average as well as the lower and the upper bound estimations converge to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and the lower bounds become narrower as the failure state approaches, and the estimation becomes more precise, until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.
To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time t, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

APE(t) = |RUL_real(t) − RUL(t)|,  (63)

where RUL_real(t) is the (known) value of the RUL at time t, while RUL(t) is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

\overline{APE} = \frac{\sum_{t=1}^{T} APE(t)}{T},  (64)

where T is the length of the testing signal. \overline{APE} being a prediction error, average values of (64) close to zero correspond to good predictive performances.
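The two error measures of (63) and (64) translate directly into code; a minimal sketch:

```python
def absolute_prediction_errors(rul_real, rul_pred):
    """Per-step APE(t) = |RUL_real(t) - RUL(t)|, as in (63)."""
    return [abs(r - p) for r, p in zip(rul_real, rul_pred)]

def average_ape(rul_real, rul_pred):
    """Average APE over the testing signal length T, as in (64)."""
    errs = absolute_prediction_errors(rul_real, rul_pred)
    return sum(errs) / len(errs)
```

Note that averaging over the whole signal, as the paper does, deliberately includes the less accurate early-life predictions in the score.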
The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b), for the continuous and the discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively addressed with HSMMs, which achieve a reliable estimation power with a small prediction error.

Finally, we tested our RUL estimation methodology using the state duration estimation of (16) introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), in which the prediction errors obtained for continuous and discrete observations, respectively, are reported. Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms that of Azimi. This is mainly due to the proposed average state duration estimator of (20), compared to the one of Azimi given by (16).

[Figure 5: HSMMs effectively solve RUL estimation problems. (a) RUL estimation for continuous data and a Weibull duration distribution; (b) RUL estimation for discrete data and a Gamma duration distribution. Each panel shows the true RUL together with the upper, average, and lower RUL estimates over time. The prediction converges to the actual RUL value, and its uncertainty decreases as the real failure time approaches.]

Table 1: State recognition accuracy. (a) Continuous observations. Rows: test case; columns: duration distribution (Gaussian, Gamma, Weibull).

Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20). (a) APE of the RUL estimation for the continuous observation test cases. Rows: test case; columns: for each duration distribution (Gaussian, Gamma, Weibull), the values APE_avg, APE_up, and APE_low.

Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32]. (a) APE of the RUL estimation for the continuous observation test cases. Rows: test case; columns: for each duration distribution (Gaussian, Gamma, Weibull), the values APE_avg, APE_up, and APE_low.

5.2. Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besancon, France), with the aim of collecting real data related to accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostic, and prognostic [19, 51–59].

The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human machine interface.

The load profiles part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.

The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).
Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (httpwwwfemto-stfrenResearch-departmentsAS2MResearch-groupsPHMIEEE-PHM-2012-Data-challengephp).
Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49]. Condition 1: 1800 rpm and 4000 N; Condition 2: 1650 rpm and 4200 N; Condition 3: 1500 rpm and 5000 N.

[Figure 6: Global overview of the Pronostia experimental platform [19].]
Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;
(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;
(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.

Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal was higher than 20 g; thus, this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.

Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, …, Bearing1_7 and Bearing2_1, Bearing2_2, …, Bearing2_7).

As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), resembling faithfully a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].
In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots (L = 2560). Let r_w(t) be the raw signal of the w-th window; for each w we estimate the RMS as

x_w^{RMS} = \sqrt{\frac{1}{L} \sum_{t=1}^{L} r_w^2(t)}

and the kurtosis as

x_w^{KURT} = \frac{(1/L) \sum_{t=1}^{L} (r_w(t) - \bar{r}_w)^4}{\left( (1/L) \sum_{t=1}^{L} (r_w(t) - \bar{r}_w)^2 \right)^2},

where \bar{r}_w is the mean of r_w. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM after the model selection procedure, we implemented a leave-one-out cross validation scheme: considering conditions 1 and 2 separately, we performed the online RUL estimation for each of the 7 bearings, using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
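The per-window RMS and kurtosis computation above can be sketched with NumPy; `window_features` is a hypothetical helper name, and non-overlapping windows of length L are assumed, as in the paper's snapshots.

```python
import numpy as np

def window_features(signal, L=2560):
    """Split the raw signal into non-overlapping windows of length L and
    compute, per window, the RMS and the kurtosis defined in the text."""
    n = len(signal) // L
    rms, kurt = [], []
    for w in range(n):
        r = np.asarray(signal[w * L:(w + 1) * L], dtype=float)
        rms.append(np.sqrt(np.mean(r ** 2)))       # x_w^RMS
        c = r - r.mean()
        kurt.append(np.mean(c ** 4) / np.mean(c ** 2) ** 2)  # x_w^KURT
    return rms, kurt
```

For a square wave alternating between +1 and −1 both features equal exactly 1, which makes a convenient sanity check.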
5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.

(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (being Gaussian, Gamma, or Weibull), (ii) an increasing number of states N from 2 to 6, and (iii) an increasing number of Gaussian mixtures M in the observation density from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations λ_0, on the data sets (Bearing1_1, …, Bearing1_7 and Bearing2_1, …, Bearing2_7).
[Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].]

[Figure 8: Raw vibration data (a) versus extracted RMS and kurtosis features (b) for Bearing1_1. The raw signal r(t) is split into windows of length L; one RMS value and one kurtosis value (x_1, x_2, …, x_n) are extracted per window.]
Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 Gaussian mixture for the observation density.
(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme by using, for condition 1, at each iteration Bearing1_i, 1 ≤ i ≤ 7, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters λ*_i were estimated for the i-th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state S_4 as the failure state. The same procedure has been performed for the bearings in condition 2.

[Figure 9: AIC values for condition 1 (a) and condition 2 (b), computed for Gaussian, Gamma, and Weibull duration models with a number of states from 2 to 6. In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.]

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.

[Figure 10: RUL estimation for Bearing1_7 (a) and Bearing2_6 (b): true RUL with upper, average, and lower estimates. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.]

Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability of the training set durations and the fact that the performance metric takes into account also the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.
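The leave-one-out protocol used above can be sketched generically; `train_fn` and `eval_fn` are hypothetical placeholders for HSMM training and for the per-bearing average APE evaluation.

```python
def leave_one_out(histories, train_fn, eval_fn):
    """Hold out each bearing history in turn: train a model on the remaining
    histories, then evaluate it on the held-out one; return all scores."""
    errors = []
    for i, test in enumerate(histories):
        train = histories[:i] + histories[i + 1:]
        model = train_fn(train)
        errors.append(eval_fn(model, test))
    return errors
```

With 7 bearings per condition this yields 7 models and 7 error scores, exactly the per-bearing entries reported in Table 5.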
6. Conclusion and Future Work

In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required parameters to estimate.

Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.
This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM, in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a given event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM, combined with AIC, can be used in practice for condition monitoring and Remaining Useful Lifetime applications.
As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure, capable of adapting the model parameters to new conditions, will be considered in a future extension.
Appendix

In this appendix we give the derivation of the state duration variable introduced in (20) as

\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1).  (A.1)

The random variable d_t(i) has been defined in Section 2.1 as the duration spent in state i prior to the current time t, assuming that the state at the current time t is i; d_t(i) is sampled from an arbitrary distribution:

d_t(i) \sim f(d).  (A.2)

We can specify the probability that the system has been in state i for d time units prior to the current time t, given the observations and the model parameters λ, and knowing that the current state is i, as

P(d_t(i) = d) = P(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t, \lambda).  (A.3)

We omit the conditioning on the model parameters λ in the following equations, it being inherently implied. We are interested in deriving the estimator \bar{d}. Because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time t, we can rewrite (A.13) as follows:

\bar{d}_{t+1}(i) = P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) \cdot (\bar{d}_t(i) + 1).  (A.15)

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state i in the previous step.

In order to transform (A.15) in terms of model parameters, for an easy numerical calculation of the induction for \bar{d}: the first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as

P(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t) = \sum_{d_t} a_{ii}(\mathbf{d}_t) \cdot P(\mathbf{d}_t \mid \mathbf{x}_1, \ldots, \mathbf{x}_t) \approx a_{ii}(\bar{\mathbf{d}}_t),  (A.20)

while the denominator of (A.19) can be expressed as follows:

P(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t) = \frac{P(\mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1})}{P(\mathbf{x}_1, \ldots, \mathbf{x}_t)} = \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}.  (A.21)

By substituting (A.20) and (A.21) in (A.19), we obtain

P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N} \alpha_{t+1}(i)},  (A.22)

and then, by combining (A.22) and (A.16), we obtain

P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i) \sum_{i=1}^{N} \alpha_{t+1}(i)}.  (A.23)

Finally, by substituting (A.23) in (A.15), and considering that

\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)},  (A.24)

we derive the induction formula for \bar{d}_{t+1}(i) in terms of model parameters as

\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1).  (A.25)
Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] L Solomon ldquoEssential elements of maintenance improvementprogramsrdquo in Proceedings of the IFAC Workshopon ProductionControl in the Process Industry Osaka Japan and Kariya JapanOctober-November 1989 E Oshima and C van Rijn Eds pp195ndash198 Pergamon Press Oxford UK 1989
[2] T HonkanenModelling industrial maintenance systems and theeffects of automatic condition monitoring [PhD dissertation]Helsinki University of Technology Information and ComputerSystems in Automation 2004
[3] R Dekker ldquoApplications of maintenance optimization modelsa review and analysisrdquo Reliability Engineering amp System Safetyvol 51 no 3 pp 229ndash240 1996
[4] H Wang ldquoA survey of maintenance policies of deterioratingsystemsrdquo European Journal of Operational Research vol 139 no3 pp 469ndash489 2002
[5] AFNOR ldquoCondition monitoring and diagnostics of ma-chinesmdashprognosticsmdashpart 1 generalguidelinesrdquo Tech Rep NFISO 13381-1 2005
[6] F Salfner Event-based failure prediction an extended hiddenmarkov model approach [PhD thesis] Humboldt-Universitatzu Berlin Germany 2008
[7] C Domeniconi C-S Perng R Vilalta and S Ma ldquoA clas-sification approachfor prediction of target events in temporalsequencesrdquo in Proceedings of the 6th European Conference onPrinciples of Data Mining and Knowledge Discovery (PKDDrsquo02) pp 125ndash137 Springer LondonUK 2002 httpdlacmorgcitationcfmid=645806670309
[8] K Medjaher J-Y Moya and N Zerhouni ldquoFailure prognosticby using dynamic Bayesian networksrdquo in Dependable Controlof Discrete Systems 2nd IFACWorkshop on Dependable Controlof Discrete Systems (DCDS rsquo09) June 2009 Bari Italy MP Fanti and M Dotoli Eds vol 1 pp 291ndash296 Interna-tional Federation of Accountants New York NY USA 2009httphalarchives-ouvertesfrhal-00402938en
[9] A Sfetsos ldquoShort-term load forecasting with a hybrid cluster-ing algorithmrdquo IEE Proceedings Generation Transmission andDistribution vol 150 no 3 pp 257ndash262 2003
[10] R Vilalta and S Ma ldquoPredicting rare events in temporaldomainsrdquo in Proceedings of the 2nd IEEE International Confer-ence on Data Mining (ICDM rsquo02) pp 474ndash481 December 2002
[11] E Sutrisno H Oh A S S Vasan and M Pecht ldquoEstimationof remaining useful life of ball bearings using data driven
22 Mathematical Problems in Engineering
methodologiesrdquo in Proceedings of the IEEE Conference onPrognostics and HealthManagement (PHM rsquo12) pp 1ndash7 DenverColo USA June 2012
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthen, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappe, E. Moulines, and T. Ryden, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, Part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juricic, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.
$S_k$. If we assume that the time to failure is a random variable $D$ following a determinate probability density, we define the RUL at the current time $t$ as

$$\mathrm{RUL}_t = \hat{D} = \mathbb{E}(D), \quad s_{t+\hat{D}} = S_k,\; s_{t+\hat{D}-1} = S_i, \quad 1 \le i, k \le N,\; i \ne k, \tag{47}$$
where $\mathbb{E}$ denotes the expected value.

Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):

(1) estimation of the current state;
(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn time.
The estimation of the current state is performed via the Viterbi path, that is, the variable $\delta_t = [\delta_t(i)]_{1 \le i \le N}$ defined in (29). To correctly model the uncertainty of the current state estimation, we use the normalized variable $\hat{\delta}_t(i)$, obtained as

$$\hat{\delta}_t(i) = \max_{s_1 s_2 \cdots s_{t-1}} P\left(s_t = S_i \mid s_1 s_2 \cdots s_{t-1}, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, \lambda\right) = \frac{\delta_t(i)}{\sum_{j=1}^{N} \delta_t(j)}, \quad 1 \le i \le N, \tag{48}$$
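Computationally, the normalization in (48) is a single vector operation. A minimal sketch (the helper name is hypothetical, and the unnormalized Viterbi variables are assumed to be already computed):

```python
import numpy as np

def normalize_delta(delta_t):
    """Normalize the Viterbi variables delta_t(i) into a probability
    distribution over the N states, as in (48)."""
    delta_t = np.asarray(delta_t, dtype=float)
    return delta_t / delta_t.sum()

# Unnormalized Viterbi scores for a hypothetical N = 3 model
delta_hat = normalize_delta([0.2, 0.1, 0.1])
# delta_hat == [0.5, 0.25, 0.25], which sums to one
```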
which is an estimate of the probability of being in state $S_i$ at time $t$.

Together with the normalized variable $\hat{\delta}_t(i)$, the maximum a posteriori estimate of the current state $s^*_t$ is taken into account, according to (34). If $s^*_t$ coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimation of the average remaining time in the current state, $d_{\mathrm{avg}}(s^*_t)$, is calculated as

$$d_{\mathrm{avg}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} - d_t(i)\right) \odot \hat{\delta}_t(i), \tag{49}$$
where with $\mu_{d_i}$ we denote the expected value of the duration variable in state $S_i$, according to the duration distribution specified by the parameters $\theta_i$. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration $d_t(i)$ at time $t$ from the expected sojourn time of state $S_i$, weighting the result by the uncertainty about the current state $\hat{\delta}_t(i)$, and finally summing up the contributions from all states.

In addition to the average remaining time, a lower and an upper bound can be calculated based on the standard deviation $\sigma_{d_i}$ of the duration distribution for state $S_i$:

$$d_{\mathrm{low}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} - \sigma_{d_i} - d_t(i)\right) \odot \hat{\delta}_t(i), \tag{50}$$

$$d_{\mathrm{up}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} + \sigma_{d_i} - d_t(i)\right) \odot \hat{\delta}_t(i). \tag{51}$$
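Equations (49)-(51) translate almost one-to-one into array operations. A sketch under assumed variable names (mu_d and sigma_d hold the per-state duration means and standard deviations, d_t the estimated state durations, and delta_hat the normalized state probabilities):

```python
import numpy as np

def remaining_time_current_state(mu_d, sigma_d, d_t, delta_hat):
    """Average, lower, and upper estimates of the remaining sojourn time
    in the current state, as in (49)-(51): the residual expected duration
    of each state, weighted by the normalized state probability."""
    mu_d = np.asarray(mu_d, float)
    sigma_d = np.asarray(sigma_d, float)
    d_t = np.asarray(d_t, float)
    delta_hat = np.asarray(delta_hat, float)
    d_avg = np.sum((mu_d - d_t) * delta_hat)
    d_low = np.sum((mu_d - sigma_d - d_t) * delta_hat)
    d_up = np.sum((mu_d + sigma_d - d_t) * delta_hat)
    return d_avg, d_low, d_up

# Two-state example: duration means 100 and 90, standard deviations 20
# and 15, elapsed durations 40 and 10, state probabilities 0.8 and 0.2
d_avg, d_low, d_up = remaining_time_current_state([100, 90], [20, 15],
                                                  [40, 10], [0.8, 0.2])
# (d_avg, d_low, d_up) == (64.0, 45.0, 83.0)
```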
Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimate, as follows:

$$\hat{\delta}_{\mathrm{next}} = \left[\hat{\delta}_{t+d}(i)\right]_{1 \le i \le N} = \left(A^0\right)^T \cdot \hat{\delta}_t, \tag{52}$$

while the maximum a posteriori estimate of the next state $s^*_{\mathrm{next}}$ is calculated as

$$s^*_{\mathrm{next}} = s^*_{t+d} = \operatorname*{arg\,max}_{1 \le i \le N} \hat{\delta}_{t+d}(i). \tag{53}$$
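In code, the projection step of (52)-(53) reduces to a single matrix-vector product; a sketch with a hypothetical 3-state nonrecurrent matrix:

```python
import numpy as np

# Hypothetical 3-state left-right nonrecurrent transition matrix (rows sum to one)
A0 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 1.0]])
delta_hat = np.array([0.7, 0.3, 0.0])   # normalized current-state probabilities

delta_next = A0.T @ delta_hat           # (52): push the probability mass one transition ahead
s_next = int(np.argmax(delta_next))     # (53): MAP estimate of the next state
# delta_next == [0.0, 0.7, 0.3], so s_next == 1
```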
Again, if $s^*_{t+d}$ coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimate of the failure time is $D_{\mathrm{avg}} = d_{\mathrm{avg}}(s^*_t)$, calculated at the previous step, with the bound values $D_{\mathrm{low}} = d_{\mathrm{low}}(s^*_t)$ and $D_{\mathrm{up}} = d_{\mathrm{up}}(s^*_t)$. Otherwise, the estimation of the sojourn time of the next state is calculated as follows:

$$d_{\mathrm{avg}}(s^*_{t+d}) = \sum_{i=1}^{N} \mu_{d_i} \odot \hat{\delta}_{t+d}(i), \tag{54}$$

$$d_{\mathrm{low}}(s^*_{t+d}) = \sum_{i=1}^{N} \left(\mu_{d_i} - \sigma_{d_i}\right) \odot \hat{\delta}_{t+d}(i), \tag{55}$$

$$d_{\mathrm{up}}(s^*_{t+d}) = \sum_{i=1}^{N} \left(\mu_{d_i} + \sigma_{d_i}\right) \odot \hat{\delta}_{t+d}(i). \tag{56}$$
This procedure is repeated until the failure state is encountered in the prediction of the next state. The calculation of the RUL is then simply obtained by summing all the estimated remaining times in each intermediate state before encountering the failure state:

$$D_{\mathrm{avg}} = \sum d_{\mathrm{avg}}, \tag{57}$$

$$D_{\mathrm{low}} = \sum d_{\mathrm{low}}, \tag{58}$$

$$D_{\mathrm{up}} = \sum d_{\mathrm{up}}. \tag{59}$$
Finally, Algorithm 1 details the RUL estimation procedure described above.
5 Experimental Results
To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and real data.

The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real case data are monitoring data related to the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
Algorithm 1: RUL estimation.

(1) function RulEstimation(x_t, S_k)    ▷ x_t: the last observation acquired
(2)                                     ▷ S_k: the failure state
(3) Initialization:
(4)   D_avg ← 0
(5)   D_low ← 0
(6)   D_up ← 0
(7) Current state estimation:
(8)   Calculate δ̂_t    ▷ using (48)
(9)   Calculate s*_t    ▷ using (34)
(10)  Calculate d_t     ▷ using (20)
(11)  S ← s*_t
(12) Loop:
(13)  while S ≠ S_k do
(14)    Calculate d_avg    ▷ using (49) or (54)
(15)    Calculate d_low    ▷ using (50) or (55)
(16)    Calculate d_up     ▷ using (51) or (56)
(17)    D_avg ← D_avg + d_avg
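The loop of Algorithm 1 can be sketched end-to-end as follows. This is a simplified transcription under assumed variable names, not the authors' code; the accumulation of D_low and D_up mirrors that of D_avg, and the next-state projection follows (52)-(53):

```python
import numpy as np

def estimate_rul(delta_hat, d_t, mu_d, sigma_d, A0, failure_state, max_steps=1000):
    """Accumulate the expected remaining sojourn times as in (49)-(59),
    projecting the state distribution forward with (52) until the MAP
    state coincides with the failure state."""
    delta_hat = np.asarray(delta_hat, float)
    mu_d = np.asarray(mu_d, float)
    sigma_d = np.asarray(sigma_d, float)
    A0 = np.asarray(A0, float)
    elapsed = np.asarray(d_t, float)     # duration already spent (current state only)
    D_avg = D_low = D_up = 0.0
    for _ in range(max_steps):
        if int(np.argmax(delta_hat)) == failure_state:
            break                        # failure state reached: stop projecting
        D_avg += np.sum((mu_d - elapsed) * delta_hat)
        D_low += np.sum((mu_d - sigma_d - elapsed) * delta_hat)
        D_up += np.sum((mu_d + sigma_d - elapsed) * delta_hat)
        delta_hat = A0.T @ delta_hat     # next-state probabilities, as in (52)
        elapsed = np.zeros_like(mu_d)    # future states start from zero elapsed time
    return D_avg, D_low, D_up

# Toy 3-state left-right chain with state index 2 as the failure state
rul = estimate_rul([1, 0, 0], [10, 0, 0], [50, 40, 0], [5, 4, 0],
                   [[0, 1, 0], [0, 0, 1], [0, 0, 1]], failure_state=2)
# rul == (80.0, 71.0, 89.0)
```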
5.1 Simulated Experiment

Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.
5.1.1 Data Generation

The industrial machine subject of these experiments has been modeled as a left-right parametric HSMM with $N = 5$ states, having state $S_5$ as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity reasons, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective in solving automatic model selection, online condition monitoring, and prediction problems. To this purpose, we divided the experiments into two cases, according to the nature of the observations, being continuous or discrete.

For each of the continuous and the discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains $T = 650$ observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and are adapted to obtain an equivalent left-right parametric HSMM, as follows:
$$\pi = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad A^0 = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix},$$

$$\Theta_{\mathcal{N}} = \left\{\theta_1 = [100, 20],\; \theta_2 = [90, 15],\; \theta_3 = [100, 20],\; \theta_4 = [80, 25],\; \theta_5 = [200, 1]\right\},$$

$$\Theta_{\mathcal{G}} = \left\{\theta_1 = [500, 0.2],\; \theta_2 = [540, 0.1667],\; \theta_3 = [500, 0.2],\; \theta_4 = [256, 0.3125],\; \theta_5 = [800, 0.005]\right\},$$

$$\Theta_{\mathcal{W}} = \left\{\theta_1 = [102, 2.8],\; \theta_2 = [92, 2.9],\; \theta_3 = [102, 2.8],\; \theta_4 = [82, 2.0],\; \theta_5 = [200, 25.6]\right\}, \tag{60}$$
where $\Theta_{\mathcal{N}}$, $\Theta_{\mathcal{G}}$, and $\Theta_{\mathcal{W}}$ are the distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean $\mu_d$ and the variance $\sigma^2_d$ of the Gaussian distribution, the shape $\nu_d$ and the scale $\eta_d$ of the Gamma distribution, and the scale $a_d$ and the shape $b_d$ of the Weibull distribution. It must be noticed that, as explained in Section 2.1, for state $S_5$, being the absorbing state, the duration parameters $\theta_5$ have no influence on the data, since once the state $S_5$ is reached the system will remain there forever.

Figure 2: The data generated with the parameters described in Section 5.1.1, for both the continuous and the discrete case. (a) Example of simulated data for the continuous case (hidden states sequence, state duration, observed signal); (b) example of simulated data for the discrete case (hidden states sequence, state duration, observed symbols).

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used with the following parameters [15]:
$$\mu_1 = \begin{bmatrix} 20 \\ 20 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} 20 \\ 35 \end{bmatrix}, \quad \mu_3 = \begin{bmatrix} 35 \\ 35 \end{bmatrix}, \quad \mu_5 = \begin{bmatrix} 28 \\ 28 \end{bmatrix},$$

$$U_1 = \begin{bmatrix} 20 & 0 \\ 0 & 20 \end{bmatrix}, \quad U_2 = \begin{bmatrix} 15 & 0 \\ 0 & 15 \end{bmatrix}, \quad U_3 = \begin{bmatrix} 15 & -2 \\ -2 & 15 \end{bmatrix}, \quad U_4 = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}, \quad U_5 = \begin{bmatrix} 10 & 3 \\ 3 & 10 \end{bmatrix}, \tag{61}$$
while for the discrete case, $L = 7$ distinct observation symbols have been taken into consideration, with the following observation probability distribution:

$$B = \begin{bmatrix} 0.8 & 0.2 & 0 & 0 & 0 & 0 & 0 \\ 0.1 & 0.8 & 0.1 & 0 & 0 & 0 & 0 \\ 0 & 0.1 & 0.8 & 0.1 & 0 & 0 & 0 \\ 0 & 0 & 0.1 & 0.7 & 0.1 & 0.1 & 0 \\ 0 & 0 & 0 & 0.2 & 0.6 & 0.1 & 0.1 \end{bmatrix}. \tag{62}$$
An example of simulated data, for both the continuous and the discrete cases, is shown in Figure 2, where a Gaussian duration model has been used.
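To make the generative setup concrete, the following sketch samples a state sequence and discrete observations from the left-right HSMM above, using the emission matrix $B$ of (62) and the Gaussian duration means and variances of $\Theta_{\mathcal{N}}$. It is a simplified generator for illustration, not the exact procedure used in the paper, and all variable names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Nonrecurrent left-right transition matrix A0 and emission matrix B of (62)
A0 = np.array([[0, 1, 0, 0, 0],
               [0, 0, 1, 0, 0],
               [0, 0, 0, 1, 0],
               [0, 0, 0, 0, 1],
               [0, 0, 0, 0, 1]], dtype=float)
B = np.array([[0.8, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.1, 0.8, 0.1, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.1, 0.8, 0.1, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.1, 0.7, 0.1, 0.1, 0.0],
              [0.0, 0.0, 0.0, 0.2, 0.6, 0.1, 0.1]])
mu_d = [100.0, 90.0, 100.0, 80.0]     # Gaussian duration means (states S_1-S_4)
var_d = [20.0, 15.0, 20.0, 25.0]      # Gaussian duration variances

T = 650
states = []
state = 0                             # pi puts all initial mass on S_1
while len(states) < T and state < 4:
    # Sample an integer sojourn time from the state's Gaussian duration model
    dur = max(1, int(round(rng.normal(mu_d[state], np.sqrt(var_d[state])))))
    states += [state] * dur
    state = int(rng.choice(5, p=A0[state]))
states = (states + [4] * T)[:T]       # absorbing failure state S_5 fills the tail
obs = [int(rng.choice(7, p=B[s])) for s in states]
```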
5.1.2 Training and Model Selection

The goal of this experimental phase is to test the effectiveness of the AIC in solving automatic model selection. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states $N$ from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures $M$ in the observation distribution, from 1 to 4.

As accurate parameter initialization is crucial for obtaining a good model fit [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters $\lambda^0$, has been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters $\lambda^*$, corresponding to the minimum AIC value, has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.

The obtained results are shown in Figure 3 for both the continuous and discrete observation data. As can be noticed, for all the 6 test cases of Figure 3, the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered the optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.
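Once a learning routine is available, the AIC-based selection reduces to a small loop over candidate structures. A sketch using the standard definition AIC = 2k − 2 ln L̂; the fitted log-likelihoods below are made-up placeholders for illustration, not results from the paper:

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln(L_hat)."""
    return 2 * n_params - 2 * log_likelihood

def select_model(candidates):
    """Pick the candidate with the lowest AIC.
    `candidates` maps a model label to (log_likelihood, n_params)."""
    scores = {label: aic(ll, k) for label, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical fitted scores per (number of states, duration family)
best, scores = select_model({
    ("N=4", "Gamma"):    (-1520.3, 40),
    ("N=5", "Gamma"):    (-1480.1, 52),
    ("N=5", "Gaussian"): (-1495.7, 52),
    ("N=6", "Gamma"):    (-1478.9, 66),
})
# best == ("N=5", "Gamma"): more states barely improve the likelihood
# but pay a larger parameter penalty
```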
5.1.3 Condition Monitoring

The optimal parameters $\lambda^*$ obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all the 6 considered HSMM configurations. In this experiment, we simulate online monitoring by considering all the testing observations up to the current time, that is, $\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t$. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state $s^*_t = \operatorname*{arg\,max}_{1 \le i \le N} [\delta_t(i)]$, as specified in (34).
Figure 3: Akaike Information Criterion (AIC) applied to continuous and discrete observation data, for a number of states from 2 to 8. (a)-(c) AIC values for continuous data with Gaussian, Gamma, and Weibull duration distributions, respectively; (d)-(f) AIC values for discrete data with Gaussian, Gamma, and Weibull duration distributions, respectively. AIC is effective for automatic model selection, since its minimum value identifies the same number of states and duration model used to generate the data.
An example of execution of the condition monitoring experiment is shown in Figure 4, for both continuous and discrete observations. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM and the second display represents the estimated state from the Viterbi algorithm, while the third display represents the observed time series.

Figure 4: Condition monitoring using the Viterbi path. HSMMs can be effective in solving condition monitoring problems in time-dependent applications, due to their high accuracy in hidden state recognition. (a) State estimation with the Viterbi path for continuous data and Gamma duration distribution (accuracy 0.985); (b) state estimation with the Viterbi path for discrete data and Gaussian duration distribution (accuracy 0.992).

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of the testing cases. The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.
5.1.4 Remaining Useful Lifetime Estimation

In this experimental phase, we considered the state $S_5$ as the failure state, and the trained parameters $\lambda^*$ of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations $\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t$ up to time $t$. When a new observation is acquired, after the current state probability $\hat{\delta}_t(i)$ is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures, one can notice that the average, as well as the lower and the upper bound estimations, converges to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and the lower bounds become narrower as the failure state approaches, and the estimation becomes more precise, until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.
To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time $t$, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

$$\mathrm{APE}(t) = \left| \mathrm{RUL}_{\mathrm{real}}(t) - \widehat{\mathrm{RUL}}(t) \right|, \tag{63}$$

where $\mathrm{RUL}_{\mathrm{real}}(t)$ is the (known) value of the RUL at time $t$, while $\widehat{\mathrm{RUL}}(t)$ is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

$$\overline{\mathrm{APE}} = \frac{\sum_{t=1}^{T} \mathrm{APE}(t)}{T}, \tag{64}$$

where $T$ is the length of the testing signal. APE being a prediction error, values of (64) close to zero correspond to good predictive performance.
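The error measures (63) and (64) are straightforward to implement; a minimal sketch:

```python
def ape(rul_real, rul_pred):
    """Absolute prediction error at each time step, as in (63)."""
    return [abs(r - p) for r, p in zip(rul_real, rul_pred)]

def mean_ape(rul_real, rul_pred):
    """Average absolute prediction error over the whole signal, as in (64)."""
    errors = ape(rul_real, rul_pred)
    return sum(errors) / len(errors)

# A RUL counting down from 4, and a slightly biased prediction
real = [4, 3, 2, 1, 0]
pred = [5, 3, 1, 1, 0]
# mean_ape(real, pred) == (1 + 0 + 1 + 0 + 0) / 5 == 0.4
```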
The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b), for the continuous and the discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively faced with HSMMs, which achieve a reliable estimation power with a small prediction error.

Finally, we tested our RUL estimation methodology using the state duration estimation of (16), introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), in which the prediction errors obtained for continuous and discrete observations, respectively, are reported.
Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi. This is mainly due to the proposed average state duration estimator of (20), compared to the one of Azimi, given by (16).

Figure 5: HSMMs effectively solve RUL estimation problems. The prediction (average, upper, and lower RUL) converges to the actual RUL value, and its uncertainty decreases as the real failure time approaches. (a) Remaining Useful Lifetime estimation for continuous data and Weibull duration distribution; (b) Remaining Useful Lifetime estimation for discrete data and Gamma duration distribution.

Table 1: State recognition accuracy per test case and duration distribution (Gaussian, Gamma, Weibull): (a) continuous observations; (b) discrete observations.
5.2 Real Data

In this section, we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.
5.2.1 Data Description

Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besancon, France), with the aim of collecting real data related to accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostic, and prognostic [19, 51–59].

The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20): (a) continuous observation test cases; (b) discrete observation test cases (average, upper, and lower APE per test case and duration distribution: Gaussian, Gamma, Weibull).
The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox, which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human-machine interface.

The load profiles part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.
Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32]: (a) continuous observation test cases; (b) discrete observation test cases (average, upper, and lower APE per test case and duration distribution: Gaussian, Gamma, Weibull).
The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s, collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
16 Mathematical Problems in Engineering
Table 4 Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset[49]
Condition 1 Condition 2 Condition 31800 rpm and 4000N 1650 rpm and 4200N 1500 rpm and 5000N
Figure 6: Global overview of the Pronostia experimental platform [19].
Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;

(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;

(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.
Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal exceeded 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.
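The stopping rule above (declare failure at the first snapshot whose vibration amplitude exceeds 20 g) can be sketched as follows; the function name and array layout (one array of samples per snapshot) are our own illustration, not the challenge's code:

```python
import numpy as np

def failure_snapshot(snapshots, threshold_g=20.0):
    """Index of the first snapshot whose peak absolute vibration amplitude
    exceeds the threshold (the declared failure time), or None if the
    bearing never fails within the recorded data."""
    for k, snap in enumerate(snapshots):
        if np.max(np.abs(snap)) > threshold_g:
            return k
    return None
```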
Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating conditions (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).
As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), producing good estimates of the RUL is a difficult task [49].
In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, namely, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots (L = 2560). Let r_w(t) be the raw signal of the w-th window; for each w we estimate

x^{\mathrm{RMS}}_w = \sqrt{\frac{1}{L}\sum_{t=1}^{L} r_w^2(t)},

x^{\mathrm{KURT}}_w = \frac{\frac{1}{L}\sum_{t=1}^{L}\left(r_w(t) - \bar{r}_w\right)^4}{\left(\frac{1}{L}\sum_{t=1}^{L}\left(r_w(t) - \bar{r}_w\right)^2\right)^2},

where \bar{r}_w is the mean of r_w. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM, after
the model selection procedure, we implemented a leave-one-out cross-validation scheme: considering conditions 1 and 2 separately, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. As in the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
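The windowed RMS and kurtosis extraction described above can be made concrete with a short NumPy sketch (the function name is ours; non-overlapping windows of length L are assumed):

```python
import numpy as np

def window_features(r, L=2560):
    """RMS and kurtosis of each non-overlapping window of length L of
    the raw signal r, following the per-window definitions above."""
    n = len(r) // L
    rms = np.empty(n)
    kurt = np.empty(n)
    for w in range(n):
        seg = np.asarray(r[w * L:(w + 1) * L], dtype=float)
        rms[w] = np.sqrt(np.mean(seg ** 2))
        c = seg - seg.mean()                      # centered window
        kurt[w] = np.mean(c ** 4) / np.mean(c ** 2) ** 2
    return rms, kurt
```

For pure Gaussian noise the kurtosis estimate hovers around 3, which is the usual baseline against which impulsive bearing faults stand out.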
5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.
(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states N from 2 to 6, and (iii) an increasing number of Gaussian mixtures M in the observation density, from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations λ0, on the data sets (Bearing1_1, ...
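The structure search just described can be sketched as a grid over (duration family, N, M) with random restarts, keeping the minimum-AIC fit per structure; `aic_of` below is a hypothetical stand-in for one full EM training followed by an AIC evaluation, and all names are ours:

```python
import itertools
import numpy as np

def select_structure(aic_of, families=("gaussian", "gamma", "weibull"),
                     n_states=range(2, 7), n_mixtures=range(1, 5),
                     n_restarts=5, seed=0):
    """Grid search over HSMM structures: for every combination of duration
    family, number of states N, and number of mixtures M, keep the best
    (minimum-AIC) of several randomly initialized trainings, then return
    the overall best structure as (aic, family, N, M)."""
    rng = np.random.default_rng(seed)
    best = None
    for fam, n, m in itertools.product(families, n_states, n_mixtures):
        aic = min(aic_of(fam, n, m, int(rng.integers(1 << 30)))
                  for _ in range(n_restarts))
        if best is None or aic < best[0]:
            best = (aic, fam, n, m)
    return best
```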
Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].
Figure 8: Raw vibration data r(t) (a) versus the RMS and kurtosis features x_1, x_2, ..., x_n extracted per window (b), for Bearing1_1.
Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and an M = 1 Gaussian mixture for the observation density.
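For reference, the comparison relies on the standard Akaike Information Criterion, AIC = 2k − 2 ln L̂, with k the number of free parameters and L̂ the maximized likelihood; a minimal helper, assuming the paper's Equation (46) is this standard definition:

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln(L_hat).
    Lower values indicate a better fit/complexity trade-off."""
    return 2 * n_params - 2 * log_likelihood
```

Only differences in AIC between candidate models fitted on the same data are meaningful; the absolute value carries no information.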
(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross-validation scheme by using, for condition 1, at each iteration Bearing1_i, 1 ≤ i ≤ 7, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters λ*_i were estimated for the i-th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL as specified in (57), (58), and (59), respectively, considering the state S_4 as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as
Figure 9: AIC values for (a) condition 1 and (b) condition 2, for Gaussian, Gamma, and Weibull duration models and a number of states from 2 to 6. In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.
Figure 10: RUL estimation (true, average, upper, and lower RUL) for (a) Bearing1_7 and (b) Bearing2_6. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.
well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty of the estimation decreases with time.
Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability of the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1 and further decreases to 14 minutes for condition 2.
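A plausible reading of the average absolute prediction error of Equation (64), which is not reproduced in this excerpt, is the mean absolute deviation between the predicted and the true RUL over all prediction instants:

```python
def average_absolute_prediction_error(rul_pred, rul_true):
    """Mean of |predicted RUL - true RUL| over all prediction instants
    (our reading of the paper's Equation (64); same time unit as the RULs)."""
    assert len(rul_pred) == len(rul_true) and rul_pred
    return sum(abs(p - t) for p, t in zip(rul_pred, rul_true)) / len(rul_pred)
```

Because every prediction instant contributes equally, early-life predictions (where little evidence is available) weigh on the score as much as late-life ones, which is the point made in the paragraph above.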
6. Conclusion and Future Work
In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.
The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required
Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.
parameters to estimate. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.
This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a predefined event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.
As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure, capable of adapting the model parameters to new conditions, will be considered in a future extension.
Appendix
In this appendix we give the derivation of the state duration variable introduced in (20) as

\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left(\bar{d}_t(i) + 1\right). \quad (A.1)

The random variable d_t(i) has been defined in Section 2.1 as the duration spent in state i prior to the current time t, assuming that the state at the current time t is i; d_t(i) is sampled from an arbitrary distribution

d_t(i) \sim f(d). \quad (A.2)

We can specify the probability that the system has been in state i for d time units prior to the current time t, given the observations and the model parameters λ, and knowing that the current state is i, as

P\left(d_t(i) = d\right) = P\left(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \dots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \dots, \mathbf{x}_t, \lambda\right). \quad (A.3)
We omit the conditioning on the model parameters λ in the following equations, being inherently implied. We are interested in deriving the estimator \bar{d} because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time t; we can rewrite (A.13) as follows:

\bar{d}_{t+1}(i) = P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \dots, \mathbf{x}_{t+1}\right) \cdot \left(\bar{d}_t(i) + 1\right). \quad (A.15)
The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state i in the previous step.
In order to express (A.15) in terms of model parameters, for an easy numerical calculation of the induction for \bar{d}, we proceed as follows. The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as

P\left(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \dots, \mathbf{x}_t\right) = \sum_{d_t} a_{ii}(d_t) \cdot P\left(d_t \mid \mathbf{x}_1, \dots, \mathbf{x}_t\right) \approx a_{ii}(\bar{d}_t), \quad (A.20)
while the denominator of (A.19) can be expressed as follows:

P\left(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \dots, \mathbf{x}_t\right) = \frac{P\left(\mathbf{x}_1, \dots, \mathbf{x}_t, \mathbf{x}_{t+1}\right)}{P\left(\mathbf{x}_1, \dots, \mathbf{x}_t\right)} = \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}. \quad (A.21)
By substituting (A.20) and (A.21) in (A.19), we obtain

P\left(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \dots, \mathbf{x}_{t+1}\right) = \frac{a_{ii}(\bar{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N} \alpha_{t+1}(i)}, \quad (A.22)
and then, by combining (A.22) and (A.16), we obtain

P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \dots, \mathbf{x}_{t+1}\right) = \frac{a_{ii}(\bar{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i) \cdot \sum_{i=1}^{N} \alpha_{t+1}(i)}. \quad (A.23)
Finally, by substituting (A.23) in (A.15) and considering that

\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)}, \quad (A.24)

we derive the induction formula for \bar{d}_{t+1}(i) in terms of model parameters as

\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left(\bar{d}_t(i) + 1\right). \quad (A.25)
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka and Kariya, Japan, October-November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines—prognostics—part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), Bari, Italy, June 2009, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Automatic Control, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data-driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukočienė and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boškoski, M. Gašperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.
(14) Calculate d̄_avg ▷ Using (49) or (54)
(15) Calculate d̄_low ▷ Using (50) or (55)
(16) Calculate d̄_up ▷ Using (51) or (56)
(17) D_avg ← D_avg + d̄_avg
5.1. Simulated Experiment. Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.
5.1.1. Data Generation. The industrial machine subject of these experiments has been modeled as a left-right parametric HSMM with N = 5 states, having state S_5 as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity reasons, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective in solving automatic model selection, online condition monitoring, and prediction problems. For this purpose we divided the experiments into two cases, according to the nature of the observations, being continuous or discrete.

For each of the continuous and the discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains T = 650 observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and are adapted to obtain an equivalent left-right parametric HSMM as follows:
\pi = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
A_0 = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix},

\Theta_{\mathcal{N}} = \{\theta_1 = [100, 20], \theta_2 = [90, 15], \theta_3 = [100, 20], \theta_4 = [80, 25], \theta_5 = [200, 1]\},

\Theta_{\mathcal{G}} = \{\theta_1 = [500, 0.2], \theta_2 = [540, 0.1667], \theta_3 = [500, 0.2], \theta_4 = [256, 0.3125], \theta_5 = [800, 0.005]\},

\Theta_{\mathcal{W}} = \{\theta_1 = [102, 28], \theta_2 = [92, 29], \theta_3 = [102, 28], \theta_4 = [82, 20], \theta_5 = [200, 256]\},

(60)
where \Theta_{\mathcal{N}}, \Theta_{\mathcal{G}}, and \Theta_{\mathcal{W}} are the different distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean \mu_d and the variance \sigma_d^2 of the Gaussian distribution, the shape \nu_d and the scale \eta_d of the Gamma distribution, and the scale a_d and the shape b_d of
Figure 2: The data generated with the parameters described in Section 5.1.1, both for (a) the continuous case and (b) the discrete case; each panel shows the hidden state sequence, the state durations, and the observed signal or observed symbols.
the Weibull distribution. It must be noticed that, as explained in Section 2.1, for state S_5, being the absorbing state, the duration parameters \theta_5 have no influence on the data, since once state S_5 is reached the system will remain there forever.

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used with the following parameters [15]:
\mu_1 = \begin{bmatrix} 20 \\ 20 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} 20 \\ 35 \end{bmatrix}, \quad \mu_3 = \begin{bmatrix} 35 \\ 35 \end{bmatrix}, \quad \mu_5 = \begin{bmatrix} 28 \\ 28 \end{bmatrix},

U_1 = \begin{bmatrix} 20 & 0 \\ 0 & 20 \end{bmatrix}, \quad U_2 = \begin{bmatrix} 15 & 0 \\ 0 & 15 \end{bmatrix}, \quad U_3 = \begin{bmatrix} 15 & -2 \\ -2 & 15 \end{bmatrix}, \quad U_4 = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}, \quad U_5 = \begin{bmatrix} 10 & 3 \\ 3 & 10 \end{bmatrix},

(61)
while for the discrete case, L = 7 distinct observation symbols have been taken into consideration, with the following observation probability distribution:

B = \begin{bmatrix}
0.8 & 0.2 & 0 & 0 & 0 & 0 & 0 \\
0.1 & 0.8 & 0.1 & 0 & 0 & 0 & 0 \\
0 & 0.1 & 0.8 & 0.1 & 0 & 0 & 0 \\
0 & 0 & 0.1 & 0.7 & 0.1 & 0.1 & 0 \\
0 & 0 & 0 & 0.2 & 0.6 & 0.1 & 0.1
\end{bmatrix}.

(62)
An example of the simulated data, both for the continuous and the discrete cases, is shown in Figure 2, where a Gaussian duration model has been used.
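To make the generation procedure concrete, the following sketch (our own, not the authors' code) draws integer state durations from the three parametric families of (60), using the parameterizations stated above (Gaussian [mean, variance], Gamma [shape, scale], Weibull [scale, shape]), and emits discrete symbols from the matrix B of (62):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_duration(family, theta):
    """Draw one integer state duration from the families of (60)."""
    if family == "gaussian":
        mu, var = theta
        d = rng.normal(mu, np.sqrt(var))
    elif family == "gamma":
        k, scale = theta
        d = rng.gamma(k, scale)
    elif family == "weibull":
        a, b = theta
        d = a * rng.weibull(b)   # numpy draws the unit-scale Weibull
    else:
        raise ValueError(family)
    return max(1, int(round(d)))  # durations are positive integers

# Discrete observation matrix B of (62): one row per state, L = 7 symbols.
B = np.array([
    [0.8, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.8, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.7, 0.1, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.2, 0.6, 0.1, 0.1],
])

def emit_symbol(state):
    """Sample one of the 7 symbols (0-based) for the given state."""
    return int(rng.choice(7, p=B[state]))
```

With the \theta_1 parameters of any of the three families, the sampled durations concentrate around 100 time units, consistent with the matched means noted above.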
5.1.2. Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving automatic model selection. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states N from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures M in the observation distribution, from 1 to 4.
As accurate parameter initialization is crucial for obtaining a good model fit [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters λ0, has been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters λ* corresponding to the minimum AIC value has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.
The obtained results are shown in Figure 3 for both the continuous and the discrete observation data. As can be noticed, for all the 6 test cases of Figure 3, the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered an optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.
5.1.3. Condition Monitoring. The optimal parameters λ* obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases for all the 6 considered HSMM configurations. In this experiment we simulate online monitoring by considering all the testing observations up to the current time, that is, x_1, x_2, ..., x_t. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state s*_t = argmax_{1≤i≤N} [δ_t(i)], as specified in (34).
Figure 3: Akaike Information Criterion (AIC) applied to continuous and discrete observation data, plotted against the number of states (from 2 to 8) for the Gaussian, Gamma, and Weibull duration distributions: (a) continuous data, Gaussian duration distribution; (b) continuous data, Gamma duration distribution; (c) continuous data, Weibull duration distribution; (d) discrete data, Gaussian duration distribution; (e) discrete data, Gamma duration distribution; (f) discrete data, Weibull duration distribution. AIC is effective for automatic model selection, since its minimum value provides the same number of states and duration model used to generate the data.
An example of execution of the condition monitoring experiment is shown in Figure 4 for both continuous and discrete observations. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, and the third display represents the observed time series.
Knowing the true state sequence we calculated theaccuracy defined as the percentage of correctly estimatedstates over the total length of the state sequence for each of
Figure 4: Condition monitoring using the Viterbi path: (a) state estimation with the Viterbi path for continuous data and Gamma duration distribution (accuracy 0.985); (b) state estimation with the Viterbi path for discrete data and Gaussian duration distribution (accuracy 0.992). Each panel shows the true state sequence, the estimated Viterbi path with correct and wrong guesses marked, and the observations. HSMMs can be effective to solve condition monitoring problems in time-dependent applications due to their high accuracy in hidden state recognition.
the testing cases. The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.
5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase, we considered the state S5 as the failure state and the trained parameters λ* of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations x1 x2 ⋯ xt up to time t. When a new observation is acquired, after the current state probability δt(i) is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.
Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures, one can notice that the average as well as the lower and upper bound estimations converge to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and lower bounds become narrower as the failure state approaches, and the estimation becomes more precise until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.
To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time t, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

$$\text{APE}(t) = \left|\text{RUL}_{\text{real}}(t) - \text{RUL}(t)\right| \tag{63}$$

where RUL_real(t) is the (known) value of the RUL at time t, while RUL(t) is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

$$\overline{\text{APE}} = \frac{\sum_{t=1}^{T}\text{APE}(t)}{T} \tag{64}$$

where T is the length of the testing signal. APE being a prediction error, average values of (64) close to zero correspond to good predictive performance.
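Equations (63) and (64) translate directly into code; the sketch below is a straightforward transcription, with the function and argument names chosen here for illustration:

```python
import numpy as np

def average_ape(rul_real, rul_pred):
    """Average absolute prediction error over the testing signal:
    APE(t) = |RUL_real(t) - RUL(t)|, averaged over t = 1..T."""
    ape = np.abs(np.asarray(rul_real, dtype=float) -
                 np.asarray(rul_pred, dtype=float))
    return float(ape.mean())

# Toy example: five time steps of true vs. predicted RUL.
print(average_ape([5, 4, 3, 2, 1], [6, 4, 2, 2, 1]))  # 0.4
```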
The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b) for the continuous and discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively addressed with HSMMs, which achieve a reliable estimation power with a small prediction error.
Finally, we tested our RUL estimation methodology using the state duration estimation of (16) introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), in which the prediction errors obtained for continuous and discrete observations, respectively, are reported.
Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi. This is mainly due to the proposed average state duration of (20) compared to the one of Azimi given by (16).

Figure 5: HSMMs effectively solve RUL estimation problems: (a) Remaining Useful Lifetime estimation for continuous data and Weibull duration distribution; (b) Remaining Useful Lifetime estimation for discrete data and Gamma duration distribution. Each panel plots the true, upper, average, and lower RUL against time. The prediction converges to the actual RUL value and its uncertainty decreases as the real failure time approaches.

Table 1: State recognition accuracy. (a) Continuous observations. Columns: test case; duration distribution (Gaussian, Gamma, Weibull).
5.2. Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.
The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.
5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comté Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besançon, France), with the aim of collecting real data related to accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51–59].
The Pronostia platform makes it possible to perform run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing
Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20). (a) APE of the RUL estimation for the continuous observation test cases. Columns: test case; duration distribution (Gaussian, Gamma, Weibull), each with average, upper, and lower APE.
housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).
The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.
The rotating part is composed of an asynchronous motor, which develops a power of 250 W, two shafts, and a gearbox which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human-machine interface.
The load profile generation part applies a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.
Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32]. (a) APE of the RUL estimation for the continuous observation test cases. Columns: test case; duration distribution (Gaussian, Gamma, Weibull), each with average, upper, and lower APE.
The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).
Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49]. Condition 1: 1800 rpm and 4000 N; Condition 2: 1650 rpm and 4200 N; Condition 3: 1500 rpm and 5000 N.
Figure 6: Global overview of the Pronostia experimental platform [19].
Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;

(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;

(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.
Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal exceeded 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.
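The stopping rule described above, declaring failure at the first sample whose amplitude exceeds 20 g, is simple to express in code; the function name and threshold default below are ours, chosen for illustration:

```python
import numpy as np

def failure_index(vibration, threshold_g=20.0):
    """Return the index of the first vibration sample whose absolute
    amplitude exceeds the failure threshold, or None if none does."""
    exceed = np.flatnonzero(np.abs(np.asarray(vibration, dtype=float)) > threshold_g)
    return int(exceed[0]) if exceed.size else None

print(failure_index([0.5, 3.0, -21.2, 35.0]))  # 2
print(failure_index([0.5, 3.0]))               # None
```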
Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, …, Bearing1_7 and Bearing2_1, Bearing2_2, …, Bearing2_7).
As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure that occurred is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].
In our experiments, we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, namely root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots ($L = 2560$). Let $r_w(t)$ be the raw signal of the $w$th window; for each $w$ we estimate the RMS as

$$x^{\text{RMS}}_{w} = \sqrt{\frac{1}{L}\sum_{t=1}^{L} r_w^2(t)}$$

and the kurtosis as

$$x^{\text{KURT}}_{w} = \frac{(1/L)\sum_{t=1}^{L}\left(r_w(t) - \bar{r}_w\right)^4}{\left((1/L)\sum_{t=1}^{L}\left(r_w(t) - \bar{r}_w\right)^2\right)^2}$$

where $\bar{r}_w$ is the mean of $r_w$. An example of feature extraction
for Bearing1_1 is shown in Figure 8. To assess the performance of the proposed HSMM after the model selection procedure, we implemented a leave-one-out cross validation scheme: considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
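The windowed RMS and kurtosis features above translate into a short vectorized sketch (the function name is ours, and the window length defaults to the snapshot length L = 2560):

```python
import numpy as np

def window_features(raw, L=2560):
    """Split a raw vibration signal into non-overlapping windows of
    length L and compute per-window RMS and kurtosis, as in the text."""
    raw = np.asarray(raw, dtype=float)
    n = len(raw) // L
    windows = raw[: n * L].reshape(n, L)
    rms = np.sqrt(np.mean(windows ** 2, axis=1))
    centered = windows - windows.mean(axis=1, keepdims=True)
    kurt = np.mean(centered ** 4, axis=1) / np.mean(centered ** 2, axis=1) ** 2
    return rms, kurt
```

For a zero-mean square wave alternating between +1 and −1, both features equal 1, which makes a convenient sanity check of the implementation.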
5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: first, we applied model selection in order to determine an optimal model structure, and second, we estimated the RUL of the bearings. The full procedure is detailed in the following.
(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states N from 2 to 6, and (iii) an increasing number of Gaussian mixtures M in the observation density from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings corresponding to 120 random initializations λ0 on the data sets (Bearing1_1, …).
Figure 7: A tested bearing before and after the experiment with its recorded vibration signal [49].

Figure 8: Raw vibration data (a) versus RMS and kurtosis features (b) for Bearing1_1. The raw signal r(t) is split into windows 1, 2, …, n−1, n, from which the feature vectors x1, x2, …, xn are extracted.
Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases, the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 Gaussian mixture for the observation density.
(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme, by using for condition 1, at each iteration, Bearing1_i, 1 ≤ i ≤ 7, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters λ*_i were estimated for the ith testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL as specified in (57), (58), and (59), respectively, considering the state S4 as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as the time goes on. As can be seen, the average as
Figure 9: AIC values for (a) condition 1 and (b) condition 2, plotted against the number of states (from 2 to 6) for the Gaussian, Gamma, and Weibull duration models. In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.

Figure 10: RUL estimation for (a) Bearing1_7 and (b) Bearing2_6, plotting the true, upper, average, and lower RUL against time (in seconds). By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.
well as the lower and the upper bound estimations convergeto the real RUL as the real failure time approaches and theuncertainty about the estimation decreases with time
Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, further decreasing to 14 minutes for condition 2.
6 Conclusion and Future Work
In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.
The proposed HSMM models the state duration distri-bution with a parametric density allowing a less computa-tionally expensive learning procedure due to the few required
Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.
parameters to estimate. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.
This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a given event, identified as the entry of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.
As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.
Appendix
In this appendix we give the derivation of the state duration variable introduced in (20) as

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\,\alpha_t(i)\,b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\left(\bar{d}_t(i) + 1\right) \tag{A1}$$
The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at the current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution:

$$d_t(i) \sim f(d) \tag{A2}$$
We can specify the probability that the system has been in state $i$ for $d$ time units prior to the current time $t$, given the observations and the model parameters $\lambda$, and knowing that the current state is $i$, as

$$\mathbb{P}\left(d_t(i) = d\right) = \mathbb{P}\left(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t, \lambda\right) \tag{A3}$$
We omit the conditioning on the model parameters $\lambda$ in the following equations, it being implied. We are interested in deriving the estimator $\bar{d}$; because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, we can rewrite (A13) as follows:
$$\bar{d}_{t+1}(i) = \mathbb{P}\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)\cdot\left(\bar{d}_t(i) + 1\right) \tag{A15}$$
The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted by the "amount" of the current state that was already in state $i$ in the previous step.
In order to transform (A15) in terms of model parameters, for an easy numerical calculation of the induction for $\bar{d}$, …
The first probability in the numerator of (A19) is the state transition, which can be approximated by considering the average duration as

$$\mathbb{P}\left(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) = \sum_{d_t} a_{ii}(\mathbf{d}_t)\,\mathbb{P}\left(\mathbf{d}_t \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \approx a_{ii}(\bar{\mathbf{d}}_t) \tag{A20}$$
while the denominator of (A19) can be expressed as follows:

$$\mathbb{P}\left(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right) = \frac{\mathbb{P}\left(\mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1}\right)}{\mathbb{P}\left(\mathbf{x}_1, \ldots, \mathbf{x}_t\right)} = \frac{\sum_{i=1}^{N}\alpha_{t+1}(i)}{\sum_{i=1}^{N}\alpha_t(i)} \tag{A21}$$
By substituting (A20) and (A21) in (A19), we obtain

$$\mathbb{P}\left(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\,\gamma_t(i)\,\sum_{i=1}^{N}\alpha_t(i)\,b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N}\alpha_{t+1}(i)} \tag{A22}$$
and then, by combining (A22) and (A16), we obtain

$$\mathbb{P}\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\,\gamma_t(i)\,\sum_{i=1}^{N}\alpha_t(i)\,b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i)\,\sum_{i=1}^{N}\alpha_{t+1}(i)} \tag{A23}$$
Finally, by substituting (A23) in (A15) and considering that

$$\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N}\alpha_t(i)} \tag{A24}$$
we derive the induction formula for $\bar{d}_{t+1}(i)$ in terms of model parameters as

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\,\alpha_t(i)\,b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\left(\bar{d}_t(i) + 1\right) \tag{A25}$$
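Numerically, the induction step (A25) is a single multiply-accumulate per state; a minimal sketch (the variable names are ours) makes the "grow by one, weighted by state persistence" reading explicit:

```python
def update_avg_duration(d_bar, a_ii, alpha_t, b_next, alpha_next):
    """One step of the induction (A25): the new average duration in
    state i is the old one plus 1, weighted by how much of the current
    state's probability mass persisted from state i at time t."""
    return (a_ii * alpha_t * b_next / alpha_next) * (d_bar + 1.0)

# If the state persists with certainty, i.e. a_ii * alpha_t * b_next
# equals alpha_next, the average duration simply grows by one.
print(update_avg_duration(3.0, 0.9, 0.5, 0.2, 0.9 * 0.5 * 0.2))  # 4.0
```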
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka and Kariya, Japan, October–November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines—prognostics—part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), Bari, Italy, June 2009, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Accountants, New York, NY, USA, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juricic, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.
Figure 2: The data generated with the parameters described in Section 5.1.1, both for the continuous case (a) and the discrete case (b). Each example shows the hidden state sequence, the state durations, and the observed values or symbols.
the Weibull distribution. It must be noticed that, as explained in Section 2.1, state $S_5$ being the absorbing state, the duration parameters $\theta_5$ have no influence on the data, since once the state $S_5$ is reached the system will remain there forever.

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used, with the following parameters [15]:

$$\mu_1 = \begin{bmatrix} 20 \\ 20 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} 20 \\ 35 \end{bmatrix}, \quad \mu_3 = \begin{bmatrix} 35 \\ 35 \end{bmatrix}, \quad \mu_5 = \begin{bmatrix} 28 \\ 28 \end{bmatrix},$$

$$U_1 = \begin{bmatrix} 20 & 0 \\ 0 & 20 \end{bmatrix}, \quad U_2 = \begin{bmatrix} 15 & 0 \\ 0 & 15 \end{bmatrix}, \quad U_3 = \begin{bmatrix} 15 & -2 \\ -2 & 15 \end{bmatrix},$$

$$U_4 = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}, \quad U_5 = \begin{bmatrix} 10 & 3 \\ 3 & 10 \end{bmatrix} \tag{61}$$
while for the discrete case, $L = 7$ distinct observation symbols have been taken into consideration, with the following observation probability distribution:

$$B = \begin{bmatrix}
0.8 & 0.2 & 0 & 0 & 0 & 0 & 0 \\
0.1 & 0.8 & 0.1 & 0 & 0 & 0 & 0 \\
0 & 0.1 & 0.8 & 0.1 & 0 & 0 & 0 \\
0 & 0 & 0.1 & 0.7 & 0.1 & 0.1 & 0 \\
0 & 0 & 0 & 0.2 & 0.6 & 0.1 & 0.1
\end{bmatrix} \tag{62}$$
An example of simulated data, both for the continuous and the discrete cases, is shown in Figure 2, where a Gaussian duration model has been used.
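The generative process behind such simulations can be sketched as follows. This is a minimal illustration of sampling a left-right HSMM trajectory (draw a duration, emit that many observations, move to the next state); the duration and observation parameters below are placeholders, not the values of (61).

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder left-right HSMM: Gaussian durations, 1-D Gaussian observations.
dur_mean = [80, 120, 90, 100]        # duration mean per state (toy values)
dur_std  = [10, 15, 10, 12]          # duration std per state
obs_mean = [20.0, 35.0, 28.0, 50.0]  # observation mean per state
obs_std  = [2.0, 2.0, 2.0, 2.0]

states, obs = [], []
for i in range(len(dur_mean)):
    # Sample an explicit state duration, then emit that many observations.
    d = max(1, int(rng.normal(dur_mean[i], dur_std[i])))
    states += [i] * d
    obs += list(rng.normal(obs_mean[i], obs_std[i], size=d))

print(len(states), len(obs))  # one state label and one observation per step
```

A discrete-observation variant would simply replace the Gaussian emission with a draw from the corresponding row of the matrix $B$ in (62).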
5.1.2. Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving the automatic model selection problem. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states $N$ from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures $M$ in the observation distribution from 1 to 4.

As accurate parameter initialization is crucial for obtaining a good model fit [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters $\lambda^0$, has been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters $\lambda^*$, corresponding to the minimum AIC value, has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.

The obtained results are shown in Figure 3 for both the continuous and discrete observation data. As can be noticed, for all the 6 test cases of Figure 3, the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered an optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states, the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.
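The selection loop described above amounts to fitting each candidate structure and keeping the one with the lowest $\mathrm{AIC} = 2k - 2\ln L$, where $k$ is the number of free parameters and $\ln L$ the maximized log-likelihood. A minimal sketch with hypothetical fitted values (the log-likelihoods and parameter counts below are illustrative, not the paper's):

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L; lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fitted candidates: (description, log-likelihood, free params).
candidates = [
    ("4 states, Gaussian durations", -1210.5, 28),
    ("5 states, Gamma durations",    -1150.2, 36),
    ("6 states, Gamma durations",    -1148.9, 45),
]

best = min(candidates, key=lambda c: aic(c[1], c[2]))
print(best[0])  # the 5-state model wins: the 6-state model's slightly
                # better fit does not pay for its extra parameters
```

This is exactly the flattening effect visible in Figure 3: beyond the true number of states, the likelihood gain is small while the penalty term keeps growing.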
5.1.3. Condition Monitoring. The optimal parameters $\lambda^*$ obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all the 6 considered HSMM configurations. In this experiment we simulate online monitoring by considering all the testing observations up to the current time, that is, $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_t$. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state $s_t^* = \arg\max_{1 \le i \le N} [\delta_t(i)]$, as specified in (34).
Figure 3: Akaike Information Criterion (AIC) values as a function of the number of states (2 to 8), for Gaussian, Gamma, and Weibull duration distributions: (a) continuous data, Gaussian duration distribution; (b) continuous data, Gamma duration distribution; (c) continuous data, Weibull duration distribution; (d) discrete data, Gaussian duration distribution; (e) discrete data, Gamma duration distribution; (f) discrete data, Weibull duration distribution. AIC is effective for automatic model selection, since its minimum value recovers the number of states and the duration model used to generate the data.
An example of execution of the condition monitoring experiment is shown in Figure 4 for both continuous and discrete observations. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, and the third display represents the observed time series.

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of
Figure 4: Condition monitoring using the Viterbi path: (a) state estimation for continuous data and Gamma duration distribution (Viterbi path accuracy 0.985); (b) state estimation for discrete data and Gaussian duration distribution (Viterbi path accuracy 0.992). Each panel shows the true state sequence, the estimated state sequence, and the observations. HSMMs can be effective for solving condition monitoring problems in time-dependent applications, due to their high accuracy in hidden state recognition.
the testing cases. The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.
5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase we considered the state $S_5$ as the failure state, and the trained parameters $\lambda^*$ of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_t$ up to time $t$. When a new observation is acquired, after the current state probability $\delta_t(i)$ is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations with duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations with duration modeled by a Gamma distribution. From the figures, one can notice that the average, as well as the lower and the upper bound estimations, converges to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty of the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and lower bounds become narrower as the failure state approaches, and the estimation becomes more precise, until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.

To quantitatively estimate the performance of our methodology for RUL estimation, we considered, at each time $t$, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

$$\text{APE}(t) = \left|\text{RUL}_{\text{real}}(t) - \text{RUL}(t)\right| \tag{63}$$

where $\text{RUL}_{\text{real}}(t)$ is the (known) value of the RUL at time $t$, while $\text{RUL}(t)$ is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

$$\overline{\text{APE}} = \frac{\sum_{t=1}^{T} \text{APE}(t)}{T} \tag{64}$$

where $T$ is the length of the testing signal. APE being a prediction error, average values of (64) close to zero correspond to good predictive performance.
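Equations (63) and (64) are straightforward to compute; a minimal sketch with toy RUL values:

```python
import numpy as np

def average_ape(rul_real, rul_pred):
    """Average absolute prediction error (Eq. (64)): the mean over t of
    |RUL_real(t) - RUL(t)|; values near zero mean good predictions."""
    rul_real = np.asarray(rul_real, dtype=float)
    rul_pred = np.asarray(rul_pred, dtype=float)
    return float(np.mean(np.abs(rul_real - rul_pred)))

# Toy example: the true RUL decreases linearly; the predictor is off early
# on and converges to the truth as failure approaches, as in Figure 5.
true_rul = [100, 90, 80, 70, 60]
pred_rul = [120, 95, 82, 70, 60]
print(average_ape(true_rul, pred_rul))  # (20 + 5 + 2 + 0 + 0) / 5 = 5.4
```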
The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b), for the continuous and the discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively addressed with HSMMs, which achieve a reliable estimation power with a small prediction error.

Finally, we tested our RUL estimation methodology using the state duration estimation of (16), introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), which report the prediction errors obtained for continuous and discrete observations, respectively.

Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi. This
Figure 5: Remaining Useful Lifetime estimation for (a) continuous data with Weibull duration distribution and (b) discrete data with Gamma duration distribution. Each plot shows the true, upper, average, and lower RUL over time. HSMMs effectively solve RUL estimation problems: the prediction converges to the actual RUL value and its uncertainty decreases as the real failure time approaches.
Table 1: State recognition accuracy. (a) Continuous observations. Rows: test case; columns: duration distribution (Gaussian, Gamma, Weibull).
is mainly due to the proposed average state duration estimator of (20), compared to the one of Azimi, given by (16).
5.2. Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comté Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besançon, France), with the aim of collecting real data related to the accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51–59].

The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing
Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20). (a) APE of the RUL estimation for the continuous observation test cases. Rows: test case; columns: for each duration distribution (Gaussian, Gamma, Weibull), the average, upper, and lower APE.
housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox, which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human-machine interface.

The load profile generation part applies a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.
Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16), introduced by Azimi et al. [30–32]. (a) APE of the RUL estimation for the continuous observation test cases. Rows: test case; columns: for each duration distribution (Gaussian, Gamma, Weibull), the average, upper, and lower APE.
The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s, collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49]. Condition 1: 1800 rpm and 4000 N; Condition 2: 1650 rpm and 4200 N; Condition 3: 1500 rpm and 5000 N.
Figure 6: Global overview of the Pronostia experimental platform [19].
Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;
(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;
(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.
Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal exceeded 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.

Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].
In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, namely root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots ($L = 2560$). Let $r_w(t)$ be the raw signal of the $w$th window; for each $w$ we estimate RMS as

$$x_w^{\text{RMS}} = \sqrt{\frac{1}{L}\sum_{t=1}^{L} r_w^2(t)}$$

and kurtosis as

$$x_w^{\text{KURT}} = \frac{\frac{1}{L}\sum_{t=1}^{L}\left(r_w(t) - \bar{r}_w\right)^4}{\left(\frac{1}{L}\sum_{t=1}^{L}\left(r_w(t) - \bar{r}_w\right)^2\right)^2}$$

where $\bar{r}_w$ is the mean of $r_w$. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM after the model selection procedure, we implemented a leave-one-out cross validation scheme: considering conditions 1 and 2 separately, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
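The windowed RMS and kurtosis extraction described above can be sketched as follows; the function name `window_features` and the toy signal are our own, and the kurtosis here is the non-excess (Pearson) definition used in the formulas above:

```python
import numpy as np

def window_features(raw, L=2560):
    """Split the raw vibration signal into windows of length L and compute,
    per window, the RMS and the (non-excess) kurtosis features."""
    n = len(raw) // L
    rms, kurt = [], []
    for w in range(n):
        r = np.asarray(raw[w * L:(w + 1) * L], dtype=float)
        rms.append(np.sqrt(np.mean(r ** 2)))       # sqrt((1/L) sum r^2)
        c = r - r.mean()                           # centered window
        kurt.append(np.mean(c ** 4) / np.mean(c ** 2) ** 2)
    return np.array(rms), np.array(kurt)

# Sanity check on synthetic data: a zero-mean Gaussian signal with std 2
# should give RMS near 2 and kurtosis near 3 in every window.
rng = np.random.default_rng(1)
rms, kurt = window_features(rng.normal(0.0, 2.0, size=5 * 2560))
print(rms.shape, kurt.shape)  # (5,) (5,)
```

On real bearing data, the kurtosis rises sharply when impulsive fault-related impacts appear, which is what makes it a common degradation indicator.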
5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.

(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states $N$ from 2 to 6, and (iii) an increasing number of Gaussian mixtures $M$ in the observation density from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations $\lambda^0$, on the data sets (Bearing1_1, ...).
Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].
Figure 8: Raw vibration data (a) versus the extracted RMS and kurtosis features (b) for Bearing1_1. The raw signal $r(t)$ is split into windows of length $L$, and one feature vector $\mathbf{x}_w$ is computed per window.
Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases, the global minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and $M = 1$ Gaussian mixture for the observation density.

(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme, using for condition 1, at each iteration, Bearing1_i, $1 \le i \le 7$, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters $\lambda^*_i$ were estimated for the $i$th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time $t$, the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state $S_4$ as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as
Figure 9: AIC values for Condition 1 (a) and Condition 2 (b), computed for Gaussian, Gamma, and Weibull duration models with 2 to 6 states. In both cases, the minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and $M = 1$ mixture in the observation density.
Figure 10: RUL estimation for Bearing1_7 (a) and Bearing2_6 (b), showing the true, upper, average, and lower RUL over time. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.
well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty of the estimation decreases with time.

Concerning the quantitative estimation of the predictive performance, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability of the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, further decreasing to 14 minutes for condition 2.
6. Conclusion and Future Work
In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.
The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required
Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.
parameters to estimate. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.
This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM, in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a given event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM, combined with AIC, can be used in practice for condition monitoring and Remaining Useful Lifetime applications.
As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.
Appendix
In this appendix we give the derivation of the state duration variable introduced in (20) as

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left(\bar{d}_t(i) + 1\right) \tag{A.1}$$
The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at the current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution:

$$d_t(i) \sim f(d) \tag{A.2}$$

We can specify the probability that the system has been in state $i$ for $d$ time units prior to the current time $t$, given the observations and the model parameters $\lambda$, and knowing that the current state is $i$, as

$$P(d_t(i) = d) = P(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \dots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \dots, \mathbf{x}_t, \lambda) \tag{A.3}$$
We omit the conditioning on the model parameters $\lambda$ in the following equations, it being inherently implied. We are interested in deriving the estimator $\bar{d}$. Because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, we can rewrite (A.13) as follows:

$$\bar{d}_{t+1}(i) = P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \dots, \mathbf{x}_{t+1}) \cdot \left(\bar{d}_t(i) + 1\right) \tag{A.15}$$

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted by the "amount" of the current state that was already in state $i$ in the previous step.
In order to transform (A.15) in terms of model parameters, for an easy numerical calculation of the induction for $\bar{d}$, we proceed as follows. The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as

$$P(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \dots, \mathbf{x}_t) = \sum_{d_t} a_{ii}(d_t) \cdot P(d_t \mid \mathbf{x}_1, \dots, \mathbf{x}_t) \approx a_{ii}(\bar{d}_t) \tag{A.20}$$
while the denominator of (A.19) can be expressed as follows:

$$P(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \dots, \mathbf{x}_t) = \frac{P(\mathbf{x}_1, \dots, \mathbf{x}_t, \mathbf{x}_{t+1})}{P(\mathbf{x}_1, \dots, \mathbf{x}_t)} = \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)} \tag{A.21}$$
By substituting (A.20) and (A.21) in (A.19), we obtain

$$P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \dots, \mathbf{x}_{t+1}) = \frac{a_{ii}(\bar{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N} \alpha_{t+1}(i)} \tag{A.22}$$
and then, by combining (A.22) and (A.16), we obtain

$$P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \dots, \mathbf{x}_{t+1}) = \frac{a_{ii}(\bar{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i) \cdot \sum_{i=1}^{N} \alpha_{t+1}(i)} \tag{A.23}$$
Finally, by substituting (A.23) in (A.15) and considering that

$$\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)} \tag{A.24}$$

we derive the induction formula for $\bar{d}_{t+1}(i)$ in terms of model parameters as

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left(\bar{d}_t(i) + 1\right) \tag{A.25}$$
Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry (Osaka and Kariya, Japan, October–November 1989), E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines—prognostics—part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), Bari, Italy, June 2009, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, 2009.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven
22 Mathematical Problems in Engineering
methodologiesrdquo in Proceedings of the IEEE Conference onPrognostics and HealthManagement (PHM rsquo12) pp 1ndash7 DenverColo USA June 2012
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data Transmission Schemes for a New Generation of Interactive Digital Television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthen, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappe, E. Moulines, and T. Ryden, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and D. Juricic, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.
[Figure 3 shows six plots of AIC value versus number of states (2 to 8), each comparing Gaussian, Gamma, and Weibull duration models: (a) AIC values for continuous data and Gaussian duration distribution; (b) continuous data and Gamma duration distribution; (c) continuous data and Weibull duration distribution; (d) discrete data and Gaussian duration distribution; (e) discrete data and Gamma duration distribution; (f) discrete data and Weibull duration distribution.]

Figure 3: Akaike Information Criterion (AIC) applied to continuous and discrete observation data. AIC is effective for automatic model selection, since its minimum value identifies the same number of states and duration model used to generate the data.
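The AIC-based selection loop behind Figure 3 can be sketched as follows. This is only an illustration: the candidate structures are assumed to have been trained already, and the (log-likelihood, free-parameter count) pairs below are made-up placeholders for the output of the paper's EM learning routine.

```python
# Hedged sketch of AIC-based model selection: AIC = 2k - 2 ln L_hat,
# minimized over candidate (duration family, number of states) structures.
def aic(log_likelihood, n_params):
    return 2.0 * n_params - 2.0 * log_likelihood

# Made-up scores standing in for trained candidate models:
# key = (duration family, number of states), value = (log-likelihood, k).
candidates = {
    ("gamma", 3): (-1210.5, 17),
    ("gamma", 4): (-1180.2, 23),
    ("weibull", 4): (-1195.7, 23),
}
best = min(candidates, key=lambda k: aic(*candidates[k]))
# `best` is the structure whose AIC is minimal.
```

With these placeholder scores, the Gamma model with 4 states wins, mirroring how the minimum of the AIC curves in Figure 3 picks out the generating configuration.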
An example of execution of the condition monitoring experiment is shown in Figure 4 for both continuous and discrete observations. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, and the third display represents the observed time series.
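A toy generator in the spirit of these simulations could look as follows. All distribution parameters here are made up for illustration; the paper's exact simulation setup is not reproduced, only the semi-Markov idea of explicit per-state sojourn durations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative left-right semi-Markov simulation: each state's sojourn time
# is drawn from a Gamma distribution, then Gaussian observations are emitted
# for the whole sojourn. All parameters below are assumptions, not the paper's.
shapes = [20, 30, 25, 15, 10]           # Gamma shape per state (assumed)
scales = [5, 4, 4, 3, 2]                # Gamma scale per state (assumed)
means = [0.0, 1.0, 2.5, 4.0, 6.0]       # emission mean per state (assumed)
stds = [0.3] * 5                        # emission std per state (assumed)

states, obs = [], []
for i in range(5):                      # visit states S1..S5 in order
    d = max(1, int(rng.gamma(shapes[i], scales[i])))   # sojourn duration
    states += [i] * d
    obs += list(rng.normal(means[i], stds[i], size=d))
```

The resulting `states` sequence stays in each state for a random, duration-distributed number of steps, which is exactly what distinguishes an HSMM trace from an ordinary geometric-duration HMM trace.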
[Figure 4 shows, for each case, three displays over 700 time steps: the true state sequence, the Viterbi-decoded state sequence with correct and wrong guesses marked, and the observed time series: (a) state estimation with the Viterbi path for continuous data and Gamma duration distribution, Viterbi path accuracy 0.985; (b) state estimation with the Viterbi path for discrete data and Gaussian duration distribution, Viterbi path accuracy 0.992.]

Figure 4: Condition monitoring using the Viterbi path. HSMMs can be effective in solving condition monitoring problems in time-dependent applications, due to their high accuracy in hidden state recognition.

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of the testing cases. The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.
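The accuracy figure reported above is a simple agreement ratio between the true and decoded state sequences; a minimal sketch:

```python
# State-recognition accuracy: fraction of time steps where the decoded
# (e.g., Viterbi) state matches the true hidden state.
def accuracy(true_states, decoded_states):
    hits = sum(t == d for t, d in zip(true_states, decoded_states))
    return hits / len(true_states)

acc = accuracy([1, 1, 2, 2, 3], [1, 1, 2, 3, 3])  # -> 0.8
```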
5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase, we considered the state $S_5$ as the failure state and the trained parameters $\lambda^*$ of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_t$, up to time $t$. When a new observation is acquired, after the current state probability $\delta_t(i)$ is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.
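The idea behind the average, upper, and lower RUL for a left-right model can be sketched as follows. This is NOT the paper's Equations (57)-(59), only the underlying intuition: the expected RUL from state $i$ is the expected remaining sojourn in $i$ plus the mean sojourns of the states between $i$ and the failure state, and the bounds widen it by a multiple of the duration standard deviations. All numeric parameters are assumptions.

```python
import numpy as np

# Assumed per-state duration statistics for a 4-state left-right model,
# with state index 3 as the failure state (illustrative values only).
mean_dur = np.array([100.0, 80.0, 60.0, 40.0])
std_dur = np.array([10.0, 8.0, 6.0, 4.0])
failure_state = 3

def rul_bounds(state_probs, elapsed_in_state, n_sigma=2.0):
    """Return (lower, average, upper) RUL given current state probabilities
    and the time already spent in the current state."""
    avg = up = low = 0.0
    for i, p in enumerate(state_probs):
        if i >= failure_state:          # already failed: contributes no RUL
            continue
        remaining = max(mean_dur[i] - elapsed_in_state, 0.0)
        future = mean_dur[i + 1:failure_state].sum()
        spread = n_sigma * np.sqrt((std_dur[i:failure_state] ** 2).sum())
        avg += p * (remaining + future)
        up += p * (remaining + future + spread)
        low += p * max(remaining + future - spread, 0.0)
    return low, avg, up
```

For example, being certainly in state 2 of 4 after 20 time units yields an average RUL of the remaining sojourn plus the next state's mean duration, with symmetric bounds around it, qualitatively reproducing the narrowing funnel seen in Figure 5.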
Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures, one can notice that the average, as well as the lower and the upper bound estimations, converge to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and the lower bounds become narrower as the failure state approaches, and the estimation becomes more precise until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.
To quantitatively estimate the performance of our methodology for the RUL estimation, we considered at each time $t$ the absolute prediction error (APE) between the real RUL and the predicted value, defined as

$$\text{APE}(t) = \left|\text{RUL}_{\text{real}}(t) - \text{RUL}(t)\right|, \quad (63)$$

where $\text{RUL}_{\text{real}}(t)$ is the (known) value of the RUL at time $t$, while $\text{RUL}(t)$ is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

$$\overline{\text{APE}} = \frac{\sum_{t=1}^{T} \text{APE}(t)}{T}, \quad (64)$$

where $T$ is the length of the testing signal. $\overline{\text{APE}}$ being an average prediction error, values of (64) close to zero correspond to good predictive performance.
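The two error metrics (63) and (64) translate directly into code; a minimal sketch:

```python
# Per-step absolute prediction error (63) and its average (64) over the
# whole testing signal; inputs are equal-length sequences of RUL values.
def ape(rul_real, rul_pred):
    return [abs(r - p) for r, p in zip(rul_real, rul_pred)]

def mean_ape(rul_real, rul_pred):
    errs = ape(rul_real, rul_pred)
    return sum(errs) / len(errs)
```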
The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b) for the continuous and the discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively addressed with HSMMs, which achieve a reliable estimation power with a small prediction error.
Finally, we tested our RUL estimation methodology using the state duration estimation of (16) introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), in which, respectively, the prediction errors obtained for continuous and discrete observations are reported.
Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi. This is mainly due to the proposed average state duration of (20), compared to the one of Azimi given by (16).

[Figure 5 shows two plots of RUL versus time (time 0 to 350, RUL 0 to 400), each with the true RUL, upper RUL, average RUL, and lower RUL: (a) Remaining Useful Lifetime estimation for continuous data and Weibull duration distribution; (b) Remaining Useful Lifetime estimation for discrete data and Gamma duration distribution.]

Figure 5: HSMMs effectively solve RUL estimation problems. The prediction converges to the actual RUL value, and its uncertainty decreases as the real failure time approaches.

Table 1: State recognition accuracy. ((a) Continuous observations; rows: test case; columns: duration distribution - Gaussian, Gamma, Weibull.)
5.2. Real Data. In this section, we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besancon, France), with the aim of collecting real data related to the accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostic, and prognostic [19, 51–59].
The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20). ((a) APE of the RUL estimation for the continuous observation test cases; rows: test case; columns: duration distribution - Gaussian, Gamma, Weibull - each with average, upper, and lower APE.)
The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox, which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human machine interface.

The load profile part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.
Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32]. ((a) APE of the RUL estimation for the continuous observation test cases; rows: test case; columns: duration distribution - Gaussian, Gamma, Weibull - each with average, upper, and lower APE.)
The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s, collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).
Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49]. (Columns: Condition 1, 1800 rpm and 4000 N; Condition 2, 1650 rpm and 4200 N; Condition 3, 1500 rpm and 5000 N.)
Figure 6: Global overview of the Pronostia experimental platform [19].
Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;

(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;

(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.
Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm with a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal became higher than 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.
Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).
As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), resembling faithfully a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].
In our experiments, we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots ($L = 2560$). Let $r_w(t)$ be the raw signal of the $w$th window; for each $w$, we estimate the RMS as

$$x^{\text{RMS}}_w = \sqrt{\frac{1}{L}\sum_{t=1}^{L} r_w^2(t)}$$

and the kurtosis as

$$x^{\text{KURT}}_w = \frac{(1/L)\sum_{t=1}^{L}\left(r_w(t)-\bar{r}_w\right)^4}{\left((1/L)\sum_{t=1}^{L}\left(r_w(t)-\bar{r}_w\right)^2\right)^2},$$

where $\bar{r}_w$ is the mean of $r_w$. An example of feature extraction
for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM after the model selection procedure, we implemented a leave-one-out cross validation scheme: considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
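The windowed feature extraction described above translates directly into a short routine; a sketch with window length L = 2560 as in the text:

```python
import numpy as np

# Windowed RMS and kurtosis features, matching the formulas for x_w^RMS and
# x_w^KURT: split the raw vibration signal into windows of L samples and
# compute both statistics per window (trailing partial window is dropped).
def extract_features(raw, L=2560):
    n = len(raw) // L
    windows = np.asarray(raw[:n * L]).reshape(n, L)
    rms = np.sqrt((windows ** 2).mean(axis=1))
    centered = windows - windows.mean(axis=1, keepdims=True)
    kurt = (centered ** 4).mean(axis=1) / (centered ** 2).mean(axis=1) ** 2
    return rms, kurt
```

Note that this is the Pearson (non-excess) kurtosis, so Gaussian noise yields values near 3, consistent with the flat early portion of the kurtosis trace in Figure 8.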
5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.
(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (being Gaussian, Gamma, or Weibull), (ii) an increasing number of states $N$ from 2 to 6, and (iii) an increasing number of Gaussian mixtures $M$ in the observation density, from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations $\lambda^0$, on the data sets (Bearing1_1
Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].
[Figure 8 shows the raw vibration signal $r(t)$ split into windows 1 to $n$, from which the features $x_1, x_2, \ldots, x_n$ are extracted: (a) the raw vibration data; (b) the extracted RMS and kurtosis time series over the 28030 s test.]

Figure 8: Raw vibration data (a) versus RMS and kurtosis features (b) for Bearing1_1.
Similarly to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases, the global minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and $M = 1$ Gaussian mixture for the observation density.
(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme, by using for condition 1, at each iteration, Bearing1_i, $1 \le i \le 7$, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters $\lambda^*_i$ were estimated for the $i$th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate at each time $t$ the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state $S_4$ as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as the time goes on. As it can be seen, the average as
[Figure 9 shows the AIC value versus the number of states (2 to 6) for the Gaussian, Gamma, and Weibull duration models: (a) AIC values for condition 1; (b) AIC values for condition 2.]

Figure 9: In both cases, the minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and $M = 1$ mixture in the observation density.
[Figure 10 shows the true, upper, average, and lower RUL over time: (a) RUL estimation for Bearing1_7 (up to 22590 s); (b) RUL estimation for Bearing2_6 (up to 7010 s).]

Figure 10: By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.
well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.
Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As it can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.
6. Conclusion and Future Work

In this paper, we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time to event estimation.
The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few parameters required to be estimated. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.

Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.
This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM, in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a given event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.
As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure, capable of adapting the model parameters to new conditions, will be considered in a future extension.
Appendix
In this appendix, we give the derivation of the state duration variable introduced in (20) as

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left(\bar{d}_t(i) + 1\right). \quad \text{(A.1)}$$
The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution

$$d_t(i) \sim f(d). \quad \text{(A.2)}$$
We can specify the probability that the system has been in state $i$ for $d$ time units prior to the current time $t$, given the observations and the model parameters $\lambda$, and knowing that the current state is $i$, as

$$P\bigl(d_t(i) = d\bigr) = P\bigl(s_{t-d-1} \neq S_i,\, s_{t-d} = S_i,\, \ldots,\, s_{t-1} = S_i \mid s_t = S_i,\, \mathbf{x}_1 \cdots \mathbf{x}_t,\, \lambda\bigr). \tag{A.3}$$
We omit the conditioning on the model parameters $\lambda$ in the following equations, it being inherently implied. We are interested in deriving the estimator $\bar{d}$.

Because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, we can rewrite (A.13) as follows:
$$\bar{d}_{t+1}(i) = P\bigl(s_t = S_i \mid s_{t+1} = S_i,\, \mathbf{x}_1 \cdots \mathbf{x}_{t+1}\bigr) \cdot \bigl(\bar{d}_t(i) + 1\bigr). \tag{A.15}$$
The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state $i$ in the previous step.
In order to transform (A.15) in terms of model parameters, for an easy numerical calculation of the induction for $\bar{d}$,
The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as

$$P\bigl(s_{t+1} = S_i \mid s_t = S_i,\, \mathbf{x}_1 \cdots \mathbf{x}_t\bigr) = \sum_{d_t} a_{ii}(d_t)\, P\bigl(d_t \mid \mathbf{x}_1 \cdots \mathbf{x}_t\bigr) \approx a_{ii}(\bar{d}_t), \tag{A.20}$$
while the denominator of (A.19) can be expressed as follows:

$$P\bigl(\mathbf{x}_{t+1} \mid \mathbf{x}_1 \cdots \mathbf{x}_t\bigr) = \frac{P\bigl(\mathbf{x}_1 \cdots \mathbf{x}_t\, \mathbf{x}_{t+1}\bigr)}{P\bigl(\mathbf{x}_1 \cdots \mathbf{x}_t\bigr)} = \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}. \tag{A.21}$$
By substituting (A.20) and (A.21) in (A.19) we obtain

$$P\bigl(s_t = S_i,\, s_{t+1} = S_i \mid \mathbf{x}_1 \cdots \mathbf{x}_{t+1}\bigr) = \frac{a_{ii}(\bar{d}_t)\,\gamma_t(i)\,\sum_{i=1}^{N}\alpha_t(i)\, b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N}\alpha_{t+1}(i)}, \tag{A.22}$$
and then, by combining (A.22) and (A.16), we obtain

$$P\bigl(s_t = S_i \mid s_{t+1} = S_i,\, \mathbf{x}_1 \cdots \mathbf{x}_{t+1}\bigr) = \frac{a_{ii}(\bar{d}_t)\,\gamma_t(i)\,\sum_{i=1}^{N}\alpha_t(i)\, b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i)\,\sum_{i=1}^{N}\alpha_{t+1}(i)}. \tag{A.23}$$
Finally, by substituting (A.23) in (A.15) and considering that

$$\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N}\alpha_t(i)}, \tag{A.24}$$
we derive the induction formula for $\bar{d}_{t+1}(i)$ in terms of model parameters as

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{d}_t)\,\alpha_t(i)\, b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\,\bigl(\bar{d}_t(i)+1\bigr). \tag{A.25}$$
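As a concrete illustration, the induction (A.25) amounts to a one-line per-state update. The sketch below is not the authors' implementation; the array names (`alpha_t`, `a_ii`, `b_x`) and the two-state numbers are hypothetical, standing in for quantities produced by a forward pass computed elsewhere.

```python
import numpy as np

def update_avg_duration(d_avg, alpha_t, alpha_t1, a_ii, b_x):
    """One step of the induction (A.25), vectorized over states:
    d_avg_new[i] = a_ii[i] * alpha_t[i] * b_x[i] / alpha_t1[i] * (d_avg[i] + 1),
    where a_ii[i] is the self-transition probability evaluated at the current
    average duration and b_x[i] is the emission likelihood of the new observation."""
    return a_ii * alpha_t * b_x / alpha_t1 * (d_avg + 1.0)

# Hypothetical two-state example: when the forward mass stays in each state
# (alpha_t1 == a_ii * alpha_t * b_x), the average duration grows by exactly one.
d_avg = np.array([2.0, 5.0])
alpha_t = np.array([0.5, 0.3])
a_ii = np.array([0.8, 0.9])
b_x = np.array([0.4, 0.2])
alpha_t1 = a_ii * alpha_t * b_x
print(update_avg_duration(d_avg, alpha_t, alpha_t1, a_ii, b_x))  # [3. 6.]
```

In the degenerate case where the state does not change, the estimator simply counts time steps, which matches the intuition stated after (A.15).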
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka, Japan and Kariya, Japan, October–November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling Industrial Maintenance Systems and the Effects of Automatic Condition Monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines - prognostics - part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-Based Failure Prediction: An Extended Hidden Markov Model Approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, 2009.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data Transmission Schemes for a New Generation of Interactive Digital Television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juricic, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.
[Figure 4: Condition monitoring using the Viterbi path. (a) State estimation with the Viterbi path for continuous data and a Gamma duration distribution; (b) state estimation with the Viterbi path for discrete data and a Gaussian duration distribution. The panels show the true state sequence, the decoded Viterbi path (annotated accuracy 0.992) with correct and wrong guesses, and the observations. HSMMs can be effective in solving condition monitoring problems in time-dependent applications due to their high accuracy in hidden state recognition.]
the testing cases. The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.
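The accuracy reported here is simply the fraction of time steps whose Viterbi-decoded state matches the true hidden state. A minimal sketch, with invented state sequences in place of the simulated data:

```python
def state_accuracy(true_states, viterbi_path):
    """Fraction of time steps where the Viterbi-decoded state
    matches the true hidden state."""
    assert len(true_states) == len(viterbi_path)
    hits = sum(t == v for t, v in zip(true_states, viterbi_path))
    return hits / len(true_states)

# Invented sequences: 3 of 4 states are recovered correctly.
print(state_accuracy([1, 2, 2, 3], [1, 2, 3, 3]))  # 0.75
```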
5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase, we considered the state $S_5$ as the failure state and used the trained parameters $\lambda^*$ of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_t$ up to time $t$. When a new observation is acquired, after the current state probability $\delta_t(i)$ is estimated (Equation (48)), the calculation of the average, upper, and lower RUL (Equations (57), (58), and (59)) is performed.
Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures, one can notice that the average as well as the lower and upper bound estimations converge to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and lower bounds become narrower as the failure state approaches, and the estimation becomes more precise until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.
To quantitatively estimate the performance of our methodology for the RUL estimation, we considered at each time $t$ the absolute prediction error (APE) between the real RUL and the predicted value, defined as

$$\mathrm{APE}(t) = \left|\mathrm{RUL}_{\mathrm{real}}(t) - \mathrm{RUL}(t)\right|, \tag{63}$$

where $\mathrm{RUL}_{\mathrm{real}}(t)$ is the (known) value of the RUL at time $t$, while $\mathrm{RUL}(t)$ is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

$$\overline{\mathrm{APE}} = \frac{\sum_{t=1}^{T}\mathrm{APE}(t)}{T}, \tag{64}$$

where $T$ is the length of the testing signal. $\overline{\mathrm{APE}}$ being a prediction error, average values of (64) close to zero correspond to good predictive performances.
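Equations (63) and (64) amount to a per-step absolute error and its mean over the test signal; a minimal sketch with made-up RUL sequences:

```python
def ape(rul_real, rul_pred):
    """Absolute prediction error at each time step, as in (63)."""
    return [abs(r - p) for r, p in zip(rul_real, rul_pred)]

def mean_ape(rul_real, rul_pred):
    """Average absolute prediction error over the test signal, as in (64)."""
    errors = ape(rul_real, rul_pred)
    return sum(errors) / len(errors)

# Made-up example: the true RUL counts down 10, 5, 0;
# the predictions are off by 2, 1, and 0 time units.
print(mean_ape([10, 5, 0], [8, 6, 0]))  # 1.0
```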
The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b) for the continuous and the discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively addressed with HSMMs, which achieve a reliable estimation power with a small prediction error.

Finally, we tested our RUL estimation methodology using the state duration estimation of (16) introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), in which the prediction errors obtained for continuous and discrete observations, respectively, are reported.

Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi. This
[Figure 5: HSMMs effectively solve RUL estimation problems; the prediction converges to the actual RUL value and its uncertainty decreases as the real failure time approaches. (a) Remaining Useful Lifetime estimation for continuous data and a Weibull duration distribution; (b) Remaining Useful Lifetime estimation for discrete data and a Gamma duration distribution. Each plot shows the true, upper, average, and lower RUL over time.]
[Table 1: State recognition accuracy, reported per test case and duration distribution (Gaussian, Gamma, Weibull); (a) continuous observations.]
is mainly due to the proposed average state duration estimator of (20), compared to the one of Azimi given by (16).
5.2. Real Data. In this section, we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comté Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr), Besançon, France, with the aim of collecting real data related to the accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostic, and prognostic [19, 51–59].
The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

[Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20); (a) APE for the continuous observation test cases, reported per test case and duration distribution (Gaussian, Gamma, Weibull) as average, upper, and lower APE.]
The platform is composed of three main parts, a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor which develops a power equal to 250 W, two shafts, and a gearbox which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human machine interface.

The load profiles part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.
[Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32]; (a) APE for the continuous observation test cases, reported per test case and duration distribution (Gaussian, Gamma, Weibull) as average, upper, and lower APE.]
The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
16 Mathematical Problems in Engineering
[Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49]. Condition 1: 1800 rpm and 4000 N; Condition 2: 1650 rpm and 4200 N; Condition 3: 1500 rpm and 5000 N.]
[Figure 6: Global overview of the Pronostia experimental platform [19].]
Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 N;
(ii) second operating condition: speed of 1650 rpm and load of 4200 N;
(iii) third operating condition: speed of 1500 rpm and load of 5000 N.
Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and with a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal exceeded 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.

Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].
In our experiments, we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, namely root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots ($L = 2560$). Let $r_w(t)$ be the raw signal of the $w$th window; for each $w$ we estimate the RMS as

$$x^{\mathrm{RMS}}_w = \sqrt{\frac{1}{L}\sum_{t=1}^{L} r_w^2(t)}$$

and the kurtosis as

$$x^{\mathrm{KURT}}_w = \frac{\frac{1}{L}\sum_{t=1}^{L}\bigl(r_w(t)-\bar{r}_w\bigr)^4}{\Bigl(\frac{1}{L}\sum_{t=1}^{L}\bigl(r_w(t)-\bar{r}_w\bigr)^2\Bigr)^2},$$

where $\bar{r}_w$ is the mean of $r_w$. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM after the model selection procedure, we implemented a leave-one-out cross-validation scheme: considering conditions 1 and 2 separately, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
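The windowed RMS and kurtosis features can be sketched as follows. `window_features` is a hypothetical helper, not the authors' code, and the toy alternating signal stands in for real accelerometer data:

```python
import numpy as np

def window_features(raw, L=2560):
    """Split a raw vibration signal into non-overlapping windows of length L
    and compute the per-window RMS and (non-excess) kurtosis defined above."""
    n = len(raw) // L
    windows = np.asarray(raw[: n * L], dtype=float).reshape(n, L)
    rms = np.sqrt(np.mean(windows ** 2, axis=1))
    centered = windows - windows.mean(axis=1, keepdims=True)
    kurt = np.mean(centered ** 4, axis=1) / np.mean(centered ** 2, axis=1) ** 2
    return rms, kurt

# Toy signal: an alternating +1/-1 window has zero mean, RMS 1, and kurtosis 1.
rms, kurt = window_features([1, -1, 1, -1], L=4)
print(rms[0], kurt[0])  # 1.0 1.0
```

Note that this uses the Pearson (non-excess) kurtosis of the formula above, so a Gaussian window would score close to 3 rather than 0.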
5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.
(A) HSMM Structure. To determine an HSMM structure appropriate for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states $N$, from 2 to 6, and (iii) an increasing number of Gaussian mixtures $M$ in the observation density, from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings corresponding to 120 random initializations $\lambda_0$ on the data sets (Bearing1_1,
[Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].]
[Figure 8: Raw vibration data (a) versus the extracted RMS and kurtosis features (b) for Bearing1_1. The raw signal $r(t)$ is split into windows 1 to $n$, from which the feature sequence $x_1, x_2, \ldots, x_n$ is extracted.]
Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases, the global minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and an $M = 1$ Gaussian mixture for the observation density.
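The selection rule, keeping the configuration that minimizes $\mathrm{AIC} = 2k - 2\ln L$ over all candidate structures, can be sketched as below. The candidate tuples (name, log-likelihood, parameter count) are invented for illustration and do not come from the paper's experiments:

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def select_model(candidates):
    """candidates: iterable of (name, log_likelihood, n_params) tuples.
    Returns the name of the configuration with the minimum AIC."""
    return min(candidates, key=lambda c: aic(c[1], c[2]))[0]

# Invented candidates: the extra parameters of N=5 are not paid back
# by its small likelihood gain, so N=4 wins (AICs: 260, 228, 236).
candidates = [("N=3", -120.0, 10), ("N=4", -100.0, 14), ("N=5", -98.0, 20)]
print(select_model(candidates))  # N=4
```

In the paper, the same comparison is repeated over every combination of duration family, number of states, and number of mixtures, each fitted from multiple random initializations.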
(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross-validation scheme by using for condition 1, at each iteration, Bearing1_i, $1 \le i \le 7$, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters $\lambda^*_i$ were estimated for the $i$th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate at each time $t$ the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state $S_4$ as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as the time goes on. As can be seen, the average as
[Figure 9: AIC values versus the number of states (2 to 6) for the Gaussian, Gamma, and Weibull duration models; (a) AIC values for condition 1, (b) AIC values for condition 2. In both cases the minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and $M = 1$ mixture in the observation density.]
[Figure 10: (a) RUL estimation for Bearing1_7; (b) RUL estimation for Bearing2_6. Each plot shows the true, upper, average, and lower RUL over time. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.]
well as the lower and the upper bound estimations convergeto the real RUL as the real failure time approaches and theuncertainty about the estimation decreases with time
Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.
6 Conclusion and Future Work
In this paper we introduced an approach based on HiddenSemi-Markov Models (HSMM) and Akaike InformationCriteria (AIC) to perform (i) automatic model selection (ii)online condition monitoring and (iii) online time to eventestimation
The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few parameters required to be estimated.

Mathematical Problems in Engineering 19

Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.
This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.
Appendix
In this appendix we give the derivation of the state duration variable introduced in (20) as

\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1)    (A1)
The random variable d_t(i) has been defined in Section 2.1 as the duration spent in state i prior to the current time t, assuming that the state at the current time t is i; d_t(i) is sampled from an arbitrary distribution:

d_t(i) \sim f(d)    (A2)
We can specify the probability that the system has been in state i for d time units prior to the current time t, given the observations and the model parameters \lambda, and knowing that the current state is i, as

P(d_t(i) = d) = P(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t, \lambda)    (A3)
We omit the conditioning on the model parameters \lambda in the following equations, being inherently implied. We are interested in deriving the estimator \bar{d}

because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time t, we can rewrite (A13) as follows:
\bar{d}_{t+1}(i) = P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) \cdot (\bar{d}_t(i) + 1)    (A15)
The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state i in the previous step.
In order to transform (A15) in terms of model parameters, for an easy numerical calculation of the induction for \bar{d}
The first probability in the numerator of (A19) is the state transition, which can be approximated by considering the average duration as

P(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t) = \sum_{d_t} a_{ii}(d_t) \cdot P(d_t \mid \mathbf{x}_1, \ldots, \mathbf{x}_t) \approx a_{ii}(\bar{d}_t)    (A20)
while the denominator of (A19) can be expressed as follows:

P(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t) = \frac{P(\mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1})}{P(\mathbf{x}_1, \ldots, \mathbf{x}_t)} = \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}    (A21)
By substituting (A20) and (A21) in (A19) we obtain

P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) = \frac{a_{ii}(\bar{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N} \alpha_{t+1}(i)}    (A22)
and then, by combining (A22) and (A16), we obtain

P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) = \frac{a_{ii}(\bar{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i) \cdot \sum_{i=1}^{N} \alpha_{t+1}(i)}    (A23)
Finally, by substituting (A23) in (A15) and considering that

\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)}    (A24)
we derive the induction formula for \bar{d}_{t+1}(i) in terms of model parameters as

\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1)    (A25)
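As a numerical illustration, the recursion (A25) can be implemented directly from the forward variables. The Weibull-survival-based self-transition below is an assumption for illustration only (the paper defines a_ii(d̄_t) in the main text), and all parameter values are made up:

```python
import math

def weibull_survival(d, shape=1.5, scale=10.0):
    """S(d) = P(duration > d) for an illustrative Weibull duration model."""
    return math.exp(-((d / scale) ** shape))

def a_ii(d_bar, shape=1.5, scale=10.0):
    """Duration-dependent self-transition: probability of staying in state i
    one more step, given it has already lasted d_bar steps."""
    return weibull_survival(d_bar + 1.0, shape, scale) / weibull_survival(d_bar, shape, scale)

def duration_update(d_bar, alpha_t_i, alpha_t1_i, b_i_x):
    """One step of (A25):
    d_{t+1}(i) = a_ii(d_t) * alpha_t(i) * b_i(x_{t+1}) / alpha_{t+1}(i) * (d_t(i) + 1)."""
    return a_ii(d_bar) * alpha_t_i * b_i_x / alpha_t1_i * (d_bar + 1.0)

# If the forward mass and the emission term barely change between steps,
# the expected duration grows by a bit less than one step per iteration,
# because the survival-based self-transition factor is slightly below 1.
d = 0.0
for _ in range(5):
    d = duration_update(d, alpha_t_i=0.4, alpha_t1_i=0.4, b_i_x=1.0)
```

This matches the intuition stated after (A15): the previous average duration plus one, weighted by how likely the state persisted.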
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry (Osaka and Kariya, Japan, October-November 1989), E. Oshima and C. van Rijn, Eds., pp. 195-198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229-240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469-489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines-prognostics-part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125-137, Springer, London, UK, 2002.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), Bari, Italy, June 2009, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291-296, International Federation of Automatic Control, 2009.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257-262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474-481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1-7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338-343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1-10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491-503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292-302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143-179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644-648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407-410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558-567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '97), vol. 2, pp. 871-874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11-14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947-1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991-996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658-2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279-285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248-2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166-172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141-3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611-631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthen, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535-569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249-264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241-249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573-589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29-45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331-334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260-269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268-278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B, vol. 39, no. 1, pp. 1-38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853-872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327-337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299-306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1-7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451-1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5-8, pp. 1685-1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1-6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1-4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157-162, 2013.
[Figure 5: True, average, lower, and upper RUL estimates over time. (a) Remaining Useful Lifetime estimation for continuous data and a Weibull duration distribution; (b) Remaining Useful Lifetime estimation for discrete data and a Gamma duration distribution.]

Figure 5: HSMMs effectively solve RUL estimation problems. The prediction converges to the actual RUL value, and its uncertainty decreases as the real failure time approaches.
Table 1: State recognition accuracy. (a) Continuous observations; columns report, per test case, the accuracy under Gaussian, Gamma, and Weibull duration distributions.

is mainly due to the proposed average state duration of (20) compared to the one of Azimi given by (16).
5.2. Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besancon, France), with the aim of collecting real data related to the accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostic, and prognostic [19, 51-59].

The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing
Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20). (a) APE of the RUL estimation for the continuous observation test cases; per test case and duration distribution (Gaussian, Gamma, Weibull), the columns report the APE of the average, upper, and lower RUL.
housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

The platform is composed of three main parts, a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox, which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human-machine interface.

The load profiles part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.
Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30-32]. (a) APE of the RUL estimation for the continuous observation test cases; per test case and duration distribution (Gaussian, Gamma, Weibull), the columns report the APE of the average, upper, and lower RUL.
The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
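The acquisition settings above are internally consistent, as a quick check shows (values taken from the description; this is only a sanity check of the stated figures, not part of the method):

```python
snapshot_seconds = 0.1      # each vibration snapshot lasts 0.1 s
vibration_rate_hz = 25_600  # 25.6 kHz accelerometer sampling rate
temperature_rate_hz = 10    # temperature channel sampling rate

samples_per_snapshot = round(snapshot_seconds * vibration_rate_hz)
temperature_samples_per_minute = temperature_rate_hz * 60

# 2560 vibration samples per snapshot; 600 temperature samples per minute.
print(samples_per_snapshot, temperature_samples_per_minute)
```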
Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49].

Condition 1: 1800 rpm and 4000 N; Condition 2: 1650 rpm and 4200 N; Condition 3: 1500 rpm and 5000 N.
Figure 6: Global overview of the Pronostia experimental platform [19].
Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;
(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;
(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.
Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal was higher than 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.

Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure that occurred is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].
In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots (L = 2560). Let r_w(t) be the raw signal of the w-th window; for each w we estimate the RMS as

x_{RMS,w} = \sqrt{\frac{1}{L} \sum_{t=1}^{L} r_w^2(t)}

and the kurtosis as

x_{KURT,w} = \frac{(1/L) \sum_{t=1}^{L} (r_w(t) - \bar{r}_w)^4}{\left((1/L) \sum_{t=1}^{L} (r_w(t) - \bar{r}_w)^2\right)^2},

where \bar{r}_w is the mean of r_w. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM after
the model selection procedure, we implemented a leave-one-out cross-validation scheme: considering conditions 1 and 2 separately, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
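The windowed RMS/kurtosis preprocessing described above can be sketched as follows (a minimal sketch; `window_features` is a hypothetical helper, and the synthetic Gaussian signal stands in for the real accelerometer channel):

```python
import math
import random

def window_features(r, L=2560):
    """Split the raw signal r into consecutive non-overlapping windows of
    length L and compute (RMS, kurtosis) per window, as described above."""
    feats = []
    for start in range(0, len(r) - L + 1, L):
        w = r[start:start + L]
        mean = sum(w) / L
        rms = math.sqrt(sum(x * x for x in w) / L)
        m2 = sum((x - mean) ** 2 for x in w) / L  # biased variance
        m4 = sum((x - mean) ** 4 for x in w) / L  # fourth central moment
        feats.append((rms, m4 / (m2 ** 2)))       # non-excess (Pearson) kurtosis
    return feats

random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(3 * 2560)]
features = window_features(signal)  # three (RMS, kurtosis) pairs
```

For a Gaussian signal the kurtosis hovers around 3; impulsive bearing faults push it well above that, which is why it is a standard degradation indicator.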
5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: first, we applied model selection in order to determine an optimal model structure, and second, we estimated the RUL of the bearings. The full procedure is detailed in the following.
(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states N, from 2 to 6, and (iii) an increasing number of Gaussian mixtures M in the observation density, from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings corresponding to 120 random initializations λ0 on the data sets (Bearing1_1
Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].
[Figure 8 shows the raw vibration signal r(t) segmented into windows 1 to n, feeding the feature extraction that yields x_1, x_2, ..., x_n; the extracted RMS and kurtosis features are plotted over time.]

Figure 8: Raw vibration data (a) versus RMS and kurtosis features (b) for Bearing1_1.
Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 Gaussian mixture for the observation density.
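The selection criterion itself is simple to state in code (a sketch; AIC = -2 log L + 2k as in standard usage, with the candidate fits below made up for illustration):

```python
def aic(log_likelihood, n_free_params):
    """Akaike Information Criterion: lower values indicate a better
    trade-off between goodness of fit and model complexity."""
    return -2.0 * log_likelihood + 2.0 * n_free_params

def best_structure(candidates):
    """candidates: (label, log_likelihood, n_free_params) per trained HSMM.
    Returns the label with minimal AIC, mirroring the grid search over
    duration family, number of states N and number of mixtures M."""
    return min(candidates, key=lambda c: aic(c[1], c[2]))[0]

# Hypothetical fits: extra parameters buy little additional likelihood,
# so the more parsimonious structure wins.
candidates = [
    ("Weibull, N=4, M=1", -1210.0, 30),
    ("Weibull, N=5, M=2", -1205.0, 55),
    ("Gaussian, N=4, M=1", -1260.0, 30),
]
```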
(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross-validation scheme, using for condition 1 at each iteration Bearing1_i, 1 ≤ i ≤ 7, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters λ*_i were estimated for the i-th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL as specified in (57), (58), and (59), respectively, considering the state S_4 as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as
well as the lower and upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty of the estimation decreases with time.
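A generic way to obtain three such curves is to summarize a predictive distribution of the time to the failure state by its mean and two quantiles (a stand-in sketch only; the paper's exact estimators are Equations (57)-(59), and the toy samples here are made up):

```python
def rul_bounds(remaining_time_samples, lower_q=0.05, upper_q=0.95):
    """Summarize sampled times to the failure state as
    (lower, average, upper) RUL estimates via quantiles and the mean."""
    s = sorted(remaining_time_samples)
    n = len(s)
    lo = s[min(n - 1, int(lower_q * n))]
    hi = s[min(n - 1, int(upper_q * n))]
    return lo, sum(s) / n, hi

# Toy predictive samples: remaining time uniformly spread over 1..100 s.
lo, avg, hi = rul_bounds(list(range(1, 101)))
```

As more observations arrive the predictive spread narrows, so the lower and upper curves close in on the average, matching the behaviour visible in Figure 10.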
119873
119894=1120572119905(119894) sdot 119887
119894(x
119905+1)
sum119873
119894=1120572119905+1
(119894)
(A22)
and then by combining (A22) and (A16) we obtain
P (119904119905= 119878
119894| 119904
119905+1= 119878
119894 x
1 x
119905+1)
=119886119894119894(d
119905) sdot 120574
119905(119894) sdot sum
119873
119894=1120572119905(119894) sdot 119887
119894(x
119905+1)
120574119905+1
(119894) sum119873
119894=1120572119905+1
(119894)
(A23)
Finally by substituting (A23) in (A15) and considering that
120574119905(119894) =
120572119905(119894)
sum119873
119894=1120572119905(119894)
(A24)
we derive the induction formula for 119889119905+1
(119894) in terms of modelparameters as
119889119905+1
(119894) =119886119894119894(d
119905) sdot 120572
119905(119894) sdot 119887
119894(x
119905+1)
120572119905+1
(119894)sdot (119889
119905(119894) + 1) (A25)
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka, Japan, and Kariya, Japan, October–November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines—prognostics—part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Accountants, New York, NY, USA, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juricic, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.
housing a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox, which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human-machine interface.

The load profiles part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.
Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32]. (a) APE of the RUL estimation for the continuous observation test cases: for each test case, the average, upper, and lower APE under Gaussian, Gamma, and Weibull duration distributions.
The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).
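As a quick sanity check of these acquisition figures, the per-snapshot and per-minute sample counts follow directly from the sampling rates (a minimal sketch; the variable names are ours, not from the paper):

```python
# Acquisition parameters reported for the Pronostia measurement part.
vib_fs_hz = 25_600            # vibration sampling frequency: 25.6 kHz
snapshot_s = 0.1              # each vibration snapshot lasts 0.1 s
samples_per_snapshot = round(vib_fs_hz * snapshot_s)   # 2560 samples

temp_fs_hz = 10               # temperature sampling frequency: 10 Hz
temp_samples_per_minute = temp_fs_hz * 60              # 600 samples
```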
Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
16 Mathematical Problems in Engineering
Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49]: Condition 1 (1800 rpm and 4000 N), Condition 2 (1650 rpm and 4200 N), and Condition 3 (1500 rpm and 5000 N).
Figure 6: Global overview of the Pronostia experimental platform [19].
Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:
(i) first operating condition: speed of 1800 rpm and load of 4000 N;
(ii) second operating condition: speed of 1650 rpm and load of 4200 N;
(iii) third operating condition: speed of 1500 rpm and load of 5000 N.
Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal exceeded 20 g; thus this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.
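The stopping criterion above amounts to scanning the vibration record for the first sample whose amplitude exceeds the 20 g threshold (a sketch; `failure_time` and its inputs are our own naming, not from the paper):

```python
def failure_time(amplitudes, threshold_g=20.0):
    """Return the index of the first sample whose absolute amplitude
    exceeds the stop threshold (20 g in the PHM 2012 tests), or None
    if the threshold is never crossed."""
    for t, a in enumerate(amplitudes):
        if abs(a) > threshold_g:
            return t
    return None
```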
Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, …, Bearing1_7 and Bearing2_1, Bearing2_2, …, Bearing2_7).
As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].
In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots ($L = 2560$). Let $r_w(t)$ be the raw signal of the $w$th window; for each $w$ we estimate RMS as
$$x^{\mathrm{RMS}}_{w} = \sqrt{\frac{1}{L}\sum_{t=1}^{L} r_w^2(t)}$$
and kurtosis as
$$x^{\mathrm{KURT}}_{w} = \frac{\frac{1}{L}\sum_{t=1}^{L}\left(r_w(t)-\bar r_w\right)^4}{\left(\frac{1}{L}\sum_{t=1}^{L}\left(r_w(t)-\bar r_w\right)^2\right)^2},$$
where $\bar r_w$ is the mean of $r_w$. An example of feature extraction for Bearing1_1 is shown in Figure 8. To assess the performance of the proposed HSMM after
the model selection procedure, we implemented a leave-one-out cross-validation scheme: considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
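The leave-one-out scheme can be sketched as follows, with `train_hsmm` and `estimate_rul` as hypothetical stand-ins for the paper's learning and prediction procedures, and the average absolute prediction error computed per held-out bearing:

```python
def leave_one_out_ape(histories, train_hsmm, estimate_rul):
    """For each bearing history, train on the remaining ones and score
    the online RUL predictions with the average absolute prediction
    error over the bearing's lifetime."""
    errors = []
    for i, test in enumerate(histories):
        model = train_hsmm(histories[:i] + histories[i + 1:])
        T = len(test)
        # at each time t the true RUL is T - 1 - t; predict from the
        # observations collected so far (online setting)
        ape = sum(abs(estimate_rul(model, test[:t + 1]) - (T - 1 - t))
                  for t in range(T)) / T
        errors.append(ape)
    return errors
```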
5.2.2 Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.
(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (being Gaussian, Gamma, or Weibull), (ii) an increasing number of states $N$ from 2 to 6, and (iii) an increasing number of Gaussian mixtures $M$ in the observation density, from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings corresponding to 120 random initializations $\lambda_0$ on the data sets (Bearing1_1
Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].
Figure 8: Raw vibration data (a) versus RMS and kurtosis features (b) for Bearing1_1.
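The windowed RMS and kurtosis extraction described above can be computed as follows (a minimal sketch; `window_features` is our own helper name, not from the paper):

```python
import math

def window_features(r, L=2560):
    """Split signal r into consecutive non-overlapping windows of
    length L and compute per-window RMS and (Pearson) kurtosis."""
    rms, kurt = [], []
    for w in range(len(r) // L):
        x = r[w * L:(w + 1) * L]
        mean = sum(x) / L
        m2 = sum((v - mean) ** 2 for v in x) / L   # variance
        m4 = sum((v - mean) ** 4 for v in x) / L   # fourth central moment
        rms.append(math.sqrt(sum(v * v for v in x) / L))
        kurt.append(m4 / (m2 * m2))
    return rms, kurt
```

A convenient check: for a pure sine of amplitude $A$ sampled over whole periods, the RMS is $A/\sqrt 2$ and the kurtosis is 1.5.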
Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases the global minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and $M = 1$ Gaussian mixture for the observation density.
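The model selection loop can be sketched as a grid search that keeps the structure with the lowest AIC ($\mathrm{AIC} = 2k - 2\ln\hat L$), where `fit` and `aic` are hypothetical stand-ins for the paper's EM learning and the AIC of (46):

```python
from itertools import product

def select_structure(fit, aic, n_init=120):
    """Grid search over duration family, number of states N (2..6),
    and number of Gaussian mixtures M (1..4); for each structure take
    the best of several random initializations; return the structure
    with the lowest AIC."""
    best = None
    for family, n, m in product(("gaussian", "gamma", "weibull"),
                                range(2, 7), range(1, 5)):
        score = min(aic(fit(family, n, m, seed)) for seed in range(n_init))
        if best is None or score < best[0]:
            best = (score, family, n, m)
    return best
```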
(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross-validation scheme: for condition 1, at each iteration Bearing1_i, $1 \le i \le 7$, was used as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters $\lambda^{*}_{i}$ were estimated for the $i$th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time $t$, the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state $S_4$ as the failure state. The same procedure has been performed for the bearings in condition 2.
Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as
Figure 9: AIC values for Condition 1 (a) and Condition 2 (b), for the Gaussian, Gamma, and Weibull duration models with 2 to 6 states. In both cases the minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and $M = 1$ mixture in the observation density.
Figure 10: RUL estimation for Bearing1_7 (a) and Bearing2_6 (b), showing the true, average, upper, and lower RUL over time. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.
well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.
Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.
6 Conclusion and Future Work
In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.
The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required
Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.
parameters to estimate. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.
This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.
As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.
Appendix

In this appendix we give the derivation of the state duration variable introduced in (20) as
$$\bar d_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf d}_t)\,\alpha_t(i)\,b_i(\mathbf x_{t+1})}{\alpha_{t+1}(i)}\,\bigl(\bar d_t(i)+1\bigr). \tag{A1}$$
The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at the current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution
$$d_t(i) \sim f(d). \tag{A2}$$
We can specify the probability that the system has been in state $i$ for $d$ time units prior to the current time $t$, given the observations and the model parameters $\lambda$, and knowing that the current state is $i$, as
$$P\bigl(d_t(i)=d\bigr) = P\bigl(s_{t-d-1}\neq S_i,\ s_{t-d}=S_i,\ \ldots,\ s_{t-1}=S_i \mid s_t=S_i,\ \mathbf x_1,\ldots,\mathbf x_t,\ \lambda\bigr). \tag{A3}$$
We omit the conditioning on the model parameters $\lambda$ in the following equations, being inherently implied. We are interested in deriving the estimator $\bar d$

because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$; we can rewrite (A13) as follows:
$$\bar d_{t+1}(i) = P\bigl(s_t=S_i \mid s_{t+1}=S_i,\ \mathbf x_1,\ldots,\mathbf x_{t+1}\bigr)\cdot\bigl(\bar d_t(i)+1\bigr). \tag{A15}$$
The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state $i$ in the previous step.

In order to transform (A15) in terms of model parameters for an easy numerical calculation of the induction for $\bar d$
The first probability in the numerator of (A19) is the state transition, which can be approximated by considering the average duration as
$$P\bigl(s_{t+1}=S_i \mid s_t=S_i,\ \mathbf x_1,\ldots,\mathbf x_t\bigr) = \sum_{d_t} a_{ii}(d_t)\, P\bigl(d_t \mid \mathbf x_1,\ldots,\mathbf x_t\bigr) \approx a_{ii}(\bar{\mathbf d}_t), \tag{A20}$$
while the denominator of (A19) can be expressed as follows:
$$P\bigl(\mathbf x_{t+1} \mid \mathbf x_1,\ldots,\mathbf x_t\bigr) = \frac{P\bigl(\mathbf x_1,\ldots,\mathbf x_t,\mathbf x_{t+1}\bigr)}{P\bigl(\mathbf x_1,\ldots,\mathbf x_t\bigr)} = \frac{\sum_{i=1}^{N}\alpha_{t+1}(i)}{\sum_{i=1}^{N}\alpha_t(i)}. \tag{A21}$$
By substituting (A20) and (A21) in (A19), we obtain
$$P\bigl(s_t=S_i,\ s_{t+1}=S_i \mid \mathbf x_1,\ldots,\mathbf x_{t+1}\bigr) = \frac{a_{ii}(\bar{\mathbf d}_t)\,\gamma_t(i)\,\sum_{i=1}^{N}\alpha_t(i)\, b_i(\mathbf x_{t+1})}{\sum_{i=1}^{N}\alpha_{t+1}(i)}, \tag{A22}$$
and then, by combining (A22) and (A16), we obtain
$$P\bigl(s_t=S_i \mid s_{t+1}=S_i,\ \mathbf x_1,\ldots,\mathbf x_{t+1}\bigr) = \frac{a_{ii}(\bar{\mathbf d}_t)\,\gamma_t(i)\,\sum_{i=1}^{N}\alpha_t(i)\, b_i(\mathbf x_{t+1})}{\gamma_{t+1}(i)\,\sum_{i=1}^{N}\alpha_{t+1}(i)}. \tag{A23}$$
Finally, by substituting (A23) in (A15) and considering that
$$\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N}\alpha_t(i)}, \tag{A24}$$
we derive the induction formula for $\bar d_{t+1}(i)$ in terms of model parameters as
$$\bar d_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf d}_t)\,\alpha_t(i)\,b_i(\mathbf x_{t+1})}{\alpha_{t+1}(i)}\,\bigl(\bar d_t(i)+1\bigr). \tag{A25}$$
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka and Kariya, Japan, October-November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines – prognostics – part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.

22 Mathematical Problems in Engineering

[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boškoski, M. Gašperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.
Figure 6: Global overview of the Pronostia experimental platform [19].
Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 N;
(ii) second operating condition: speed of 1650 rpm and load of 4200 N;
(iii) third operating condition: speed of 1500 rpm and load of 5000 N.
Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and under a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal exceeded 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.
Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).
As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure that will occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].
In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots (L = 2560). Let r_w(t) be the raw signal of the w-th window; for each w we estimate the RMS as

  x_w^{RMS} = \sqrt{\frac{1}{L} \sum_{t=1}^{L} r_w^2(t)}

and the kurtosis as

  x_w^{KURT} = \frac{(1/L) \sum_{t=1}^{L} (r_w(t) - \bar{r}_w)^4}{\left( (1/L) \sum_{t=1}^{L} (r_w(t) - \bar{r}_w)^2 \right)^2},

where \bar{r}_w is the mean of r_w. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM after the model selection procedure, we implemented a leave-one-out cross-validation scheme: considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
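The windowed RMS and kurtosis extraction described above can be sketched as follows. This is a minimal NumPy-based illustration run on a synthetic test signal, not the authors' implementation; only the window length L = 2560 is taken from the text:

```python
import numpy as np

def extract_features(raw: np.ndarray, L: int = 2560):
    """Split a raw vibration signal into windows of length L and
    compute the RMS and (Pearson) kurtosis of each window."""
    n_windows = len(raw) // L
    rms, kurt = [], []
    for w in range(n_windows):
        r = raw[w * L:(w + 1) * L]
        rms.append(np.sqrt(np.mean(r ** 2)))  # x_w^RMS
        m = r.mean()
        # x_w^KURT: fourth central moment over squared variance
        kurt.append(np.mean((r - m) ** 4) / np.mean((r - m) ** 2) ** 2)
    return np.array(rms), np.array(kurt)

# Example on a synthetic signal: standard Gaussian noise has
# RMS close to 1 and kurtosis close to 3 in every window
rng = np.random.default_rng(0)
signal = rng.normal(0.0, 1.0, 2560 * 10)
rms, kurt = extract_features(signal)
```

For a degrading bearing, impulsive defect activity drives the kurtosis well above the Gaussian baseline of 3, which is why it is a common condition indicator alongside the RMS energy.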
5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.
(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states N from 2 to 6, and (iii) an increasing number of Gaussian mixtures M in the observation density, from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings corresponding to 120 random initializations λ_0 on the data sets (Bearing1_1, ...).
Figure 7: A tested bearing before and after the experiment (a, b), with its recorded vibration signal [49].
Figure 8: Raw vibration data r(t) (a) versus the extracted RMS and kurtosis features (b) for Bearing1_1; the raw signal is split into windows 1, ..., n, and each window w yields a feature vector x_w.
Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 Gaussian mixture component for the observation density.
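The structure search just described can be sketched as follows. The parameter-count helper and the `train` callable are illustrative assumptions (in the paper, each structure is scored by the AIC of Equation (46) after 120 randomly initialized learnings); only the candidate grid matches the text:

```python
import math
from itertools import product

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = -2 ln L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params

def n_hsmm_params(N: int, M: int, dur_params: int, obs_dim: int = 2) -> int:
    """Rough free-parameter count of a left-right HSMM with N states,
    M Gaussian mixture components per state, and `dur_params`
    duration parameters per state (2 for Gaussian/Gamma/Weibull).
    This counting is an assumption for illustration."""
    transitions = N - 1                    # left-right: one forward transition per state
    durations = N * dur_params
    emissions = N * M * (2 * obs_dim + 1)  # means, variances, mixture weights
    return transitions + durations + emissions

def select_model(train, families=("gaussian", "gamma", "weibull"),
                 states=range(2, 7), mixtures=range(1, 5)):
    """Return the (family, N, M) combination minimizing the AIC.
    `train` returns the maximized log-likelihood for a structure."""
    best, best_aic = None, math.inf
    for fam, N, M in product(families, states, mixtures):
        ll = train(fam, N, M)
        score = aic(ll, n_hsmm_params(N, M, dur_params=2))
        if score < best_aic:
            best, best_aic = (fam, N, M), score
    return best, best_aic
```

The penalty term 2k makes richer structures (more states or mixture components) win only when they buy a sufficiently large likelihood gain, which is what drives the selection toward the compact 4-state Weibull model here.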
(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross-validation scheme: for condition 1, at each iteration Bearing1_i, 1 ≤ i ≤ 7, was used as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters λ*_i were estimated for the i-th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL as specified in (57), (58), and (59), respectively, considering the state S_4 as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as well as the lower and upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty of the estimation decreases with time.

Figure 9: AIC values as a function of the number of states (from 2 to 6) for the Gaussian, Gamma, and Weibull duration models, for (a) condition 1 and (b) condition 2. In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.

Figure 10: RUL estimation for (a) Bearing1_7 and (b) Bearing2_6, showing the true, upper, average, and lower RUL. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.
Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability of the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, decreasing further to 14 minutes for condition 2.
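Equation (64) is not reproduced in this excerpt; assuming it takes the usual form, the mean of |true RUL − estimated RUL| over all prediction instants, the metric can be sketched as:

```python
def average_absolute_prediction_error(rul_true, rul_est):
    """Average absolute prediction error (APE): mean of
    |true RUL - estimated RUL| over all prediction instants.
    Assumes Eq. (64) of the paper takes this standard form."""
    if len(rul_true) != len(rul_est):
        raise ValueError("sequences must have equal length")
    return sum(abs(a - b) for a, b in zip(rul_true, rul_est)) / len(rul_true)

# The true RUL decreases linearly to zero as the failure time approaches
true_rul = [40, 30, 20, 10, 0]
est_rul = [50, 28, 21, 6, 0]
ape = average_absolute_prediction_error(true_rul, est_rul)  # (10+2+1+4+0)/5 = 3.4
```

Because the average runs over the whole test history, early-life predictions, which are naturally the least accurate, weigh on the score as much as the late ones, which explains why the reported hour-scale averages coexist with much smaller best-case errors.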
6. Conclusion and Future Work

In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.
Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few parameters that need to be estimated. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.
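As an illustration of this flexibility, any parametric duration density can be discretized into a pmf over integer durations and plugged into the HSMM recursions unchanged; the Weibull choice and the parameter values below are placeholders, not the paper's fitted values:

```python
import math

def weibull_cdf(x, shape, scale):
    """CDF of the Weibull distribution, F(x) = 1 - exp(-(x/scale)^shape)."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0

def duration_pmf(cdf, max_d, **params):
    """Discretize a parametric duration density into a pmf over integer
    durations 1..max_d: p(d) = F(d) - F(d-1), renormalized over the
    truncated support. Swapping `cdf` (Weibull, Gamma, Gaussian, ...)
    changes the duration model without touching the other algorithms."""
    probs = [cdf(d, **params) - cdf(d - 1, **params) for d in range(1, max_d + 1)]
    total = sum(probs)
    return [p / total for p in probs]

# Placeholder parameters: a Weibull state-duration model peaking near d = 7-8
pmf = duration_pmf(weibull_cdf, max_d=50, shape=2.0, scale=10.0)
```

Each duration family costs only its own small set of parameters per state (two for Gaussian, Gamma, and Weibull), which is the source of the learning-cost advantage over nonparametric duration tables of length max_d.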
This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a given event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with the AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.
As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.
Appendix
In this appendix we give the derivation of the state duration variable introduced in (20) as

  \bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1).   (A1)
The random variable d_t(i) has been defined in Section 2.1 as the duration spent in state i prior to the current time t, assuming that the state at the current time t is i; d_t(i) is sampled from an arbitrary distribution

  d_t(i) \sim f(d).   (A2)
We can specify the probability that the system has been in state i for d time units prior to the current time t, given the observations and the model parameters λ, and knowing that the current state is i, as

  P(d_t(i) = d) = P(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t, \lambda).   (A3)
We omit the conditioning on the model parameters λ in the following equations, it being inherently implied. We are interested in deriving the estimator \bar{d}_{t+1}(i). Since the forward variable \alpha_t(i) represents the sum of the probabilities of all the possible combinations of state sequences up to the current time t, we can rewrite (A13) as follows:
  \bar{d}_{t+1}(i) = P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) \cdot (\bar{d}_t(i) + 1).   (A15)
The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted by the "amount" of the current state that was already in state i at the previous step.
In order to transform (A15) in terms of model parameters, for an easy numerical calculation of the induction for \bar{d}_{t+1}(i), ...
The first probability in the numerator of (A19) is the state transition, which can be approximated by considering the average duration as

  P(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t) = \sum_{d_t} a_{ii}(d_t) \cdot P(d_t \mid \mathbf{x}_1, \ldots, \mathbf{x}_t) \approx a_{ii}(\bar{d}_t),   (A20)
while the denominator of (A19) can be expressed as follows:

  P(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t) = \frac{P(\mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1})}{P(\mathbf{x}_1, \ldots, \mathbf{x}_t)} = \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}.   (A21)
By substituting (A20) and (A21) in (A19) we obtain

  P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) = \frac{a_{ii}(\bar{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N} \alpha_{t+1}(i)},   (A22)
and then by combining (A22) and (A16) we obtain

  P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) = \frac{a_{ii}(\bar{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i) \cdot \sum_{i=1}^{N} \alpha_{t+1}(i)}.   (A23)
Finally, by substituting (A23) in (A15) and considering that

  \gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)},   (A24)
we derive the induction formula for \bar{d}_{t+1}(i) in terms of model parameters as

  \bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1).   (A25)
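A single step of the induction (A25) can be sketched as follows; the quantities passed in (self-transition probabilities, forward variables, emission likelihoods) are toy placeholders rather than learned model parameters:

```python
def update_avg_duration(d_bar, alpha_t, alpha_t1, a_self, b_next):
    """One step of the induction (A25): for each state i,
        d_bar_{t+1}(i) = a_ii(d_bar_t) * alpha_t(i) * b_i(x_{t+1})
                         / alpha_{t+1}(i) * (d_bar_t(i) + 1).
    a_self[i] is the self-transition probability a_ii evaluated at the
    current average duration; b_next[i] is the emission likelihood of
    the next observation; alpha_t / alpha_t1 are the forward variables."""
    return [
        a_self[i] * alpha_t[i] * b_next[i] / alpha_t1[i] * (d_bar[i] + 1.0)
        for i in range(len(d_bar))
    ]

# Sanity check: when a_ii * alpha_t(i) * b_i(x_{t+1}) exactly accounts for
# alpha_{t+1}(i), i.e. all the forward mass of state i comes from its own
# self-transition, the average duration increments by exactly one
d_bar_next = update_avg_duration(
    d_bar=[3.0, 0.0],
    alpha_t=[0.2, 0.1],
    alpha_t1=[0.9 * 0.2 * 0.7, 0.5 * 0.1 * 0.3],
    a_self=[0.9, 0.5],
    b_next=[0.7, 0.3],
)  # -> [4.0, 1.0]
```

This matches the stated intuition: the previous average duration plus one, weighted by the fraction of the current state's probability mass that was already in state i at the previous step.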
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] L Solomon ldquoEssential elements of maintenance improvementprogramsrdquo in Proceedings of the IFAC Workshopon ProductionControl in the Process Industry Osaka Japan and Kariya JapanOctober-November 1989 E Oshima and C van Rijn Eds pp195ndash198 Pergamon Press Oxford UK 1989
[2] T HonkanenModelling industrial maintenance systems and theeffects of automatic condition monitoring [PhD dissertation]Helsinki University of Technology Information and ComputerSystems in Automation 2004
[3] R Dekker ldquoApplications of maintenance optimization modelsa review and analysisrdquo Reliability Engineering amp System Safetyvol 51 no 3 pp 229ndash240 1996
[4] H Wang ldquoA survey of maintenance policies of deterioratingsystemsrdquo European Journal of Operational Research vol 139 no3 pp 469ndash489 2002
[5] AFNOR ldquoCondition monitoring and diagnostics of ma-chinesmdashprognosticsmdashpart 1 generalguidelinesrdquo Tech Rep NFISO 13381-1 2005
[6] F Salfner Event-based failure prediction an extended hiddenmarkov model approach [PhD thesis] Humboldt-Universitatzu Berlin Germany 2008
[7] C Domeniconi C-S Perng R Vilalta and S Ma ldquoA clas-sification approachfor prediction of target events in temporalsequencesrdquo in Proceedings of the 6th European Conference onPrinciples of Data Mining and Knowledge Discovery (PKDDrsquo02) pp 125ndash137 Springer LondonUK 2002 httpdlacmorgcitationcfmid=645806670309
[8] K Medjaher J-Y Moya and N Zerhouni ldquoFailure prognosticby using dynamic Bayesian networksrdquo in Dependable Controlof Discrete Systems 2nd IFACWorkshop on Dependable Controlof Discrete Systems (DCDS rsquo09) June 2009 Bari Italy MP Fanti and M Dotoli Eds vol 1 pp 291ndash296 Interna-tional Federation of Accountants New York NY USA 2009httphalarchives-ouvertesfrhal-00402938en
[9] A Sfetsos ldquoShort-term load forecasting with a hybrid cluster-ing algorithmrdquo IEE Proceedings Generation Transmission andDistribution vol 150 no 3 pp 257ndash262 2003
[10] R Vilalta and S Ma ldquoPredicting rare events in temporaldomainsrdquo in Proceedings of the 2nd IEEE International Confer-ence on Data Mining (ICDM rsquo02) pp 474ndash481 December 2002
[11] E Sutrisno H Oh A S S Vasan and M Pecht ldquoEstimationof remaining useful life of ball bearings using data driven
22 Mathematical Problems in Engineering
methodologiesrdquo in Proceedings of the IEEE Conference onPrognostics and HealthManagement (PHM rsquo12) pp 1ndash7 DenverColo USA June 2012
[12] K Goebel B Saha and A Saxena ldquoA comparison of threedata-driven techniques for prognosticsrdquo in Proceedings of the62nd Meeting of the Society for Machinery Failure PreventionTechnology April 2008
[13] L R Rabiner ldquoTutorial on hiddenMarkov models and selectedapplications in speech recognitionrdquo Proceedings of the IEEE vol77 no 2 pp 257ndash286 1989
[14] F Cartella T Liu S Meganck J Lemeire and H Sahli ldquoOnlineadaptive learning of left-right continuous HMM for bearingscondition assessmentrdquo Journal of Physics Conference Seriesvol 364 Article ID 012031 2012 httpiopscienceioporg1742-65963641012031
[15] S Lee L Li and J Ni ldquoOnline degradation assessment andadaptive fault detection usingmodified hidden markov modelrdquoJournal of Manufacturing Science and Engineering vol 132 no2 Article ID 021010 11 pages 2010
[16] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoA mixture of gaussians hiddenmarkov model for failure diag-nostic and prognosticrdquo in Proceedings of the IEEE InternationalConference on Automation Science and Engineering (CASE rsquo10)pp 338ndash343 Toronto Canada August 2010
[17] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoEstimation of the remaining useful life by using waveletpacket decomposition and HMMsrdquo in Proceedings of the IEEEAerospace Conference (AERO rsquo11) pp 1ndash10 IEEE ComputerSociety AIAA Big Sky Mont USA March 2011
[18] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoA data-driven failure prognostics method based on mixtureof gaussians hidden markov modelsrdquo IEEE Transactions onReliability vol 61 no 2 pp 491ndash503 2012
[19] K Medjaher D A Tobon-Mejia and N Zerhouni ldquoRemaininguseful life estimation of critical components with applicationto bearingsrdquo IEEE Transactions on Reliability vol 61 no 2 pp292ndash302 2012
[20] J Ferguson ldquoVariable duration models for speechrdquo in Proceed-ings of the Symposium on the Application of Hidden MarkovModels to Text and Speech pp 143ndash179 October 1980
[21] K P Murphy ldquoHidden semi-Markov models (hsmms)rdquo TechRep University of British Columbia 2002 httpwwwcsubccasimmurphyk
[22] A Kundu T Hines J Phillips B D Huyck and L C VanGuilder ldquoArabic handwriting recognition using variable dura-tion HMMrdquo in Proceedings of the 9th International Conferenceon Document Analysis and Recognition (ICDAR rsquo07) vol 2 pp644ndash648 IEEE Washington DC USA September 2007
[23] M T Johnson ldquoCapacity and complexity of HMM durationmodeling techniquesrdquo IEEE Signal Processing Letters vol 12 no5 pp 407ndash410 2005
[24] J-T Chien and C-H Huang ldquoBayesian learning of speechduration modelsrdquo IEEE Transactions on Speech and AudioProcessing vol 11 no 6 pp 558ndash567 2003
[25] K Laurila ldquoNoise robust speech recognition with state durationconstraintsrdquo in Proceedings of the IEEE International Conferenceon Acoustics Speech and Signal Processing (ICASSP rsquo97) vol 2pp 871ndash874 IEEE Computer Society Munich Germany April1997
[27] S-Z Yu and H Kobayashi ldquoAn efficient forward-backwardalgorithm for an explicit-duration hiddenMarkovmodelrdquo IEEESignal Processing Letters vol 10 no 1 pp 11ndash14 2003
[28] S-Z Yu and H Kobayashi ldquoPractical implementation of anefficient forward-backward algorithm for an explicit-durationhiddenMarkov modelrdquo IEEE Transactions on Signal Processingvol 54 no 5 pp 1947ndash1951 2006
[29] N Wang S-D Sun Z-Q Cai S Zhang and C SayginldquoA hidden semi-markov model with duration-dependent statetransition probabilities for prognosticsrdquoMathematical Problemsin Engineering vol 2014 Article ID 632702 10 pages 2014
[30] M Azimi P Nasiopoulos and R K Ward ldquoOnline identifica-tion of hidden semi-Markov modelsrdquo in Proceedings of the 3rdInternational Symposium on Image and Signal Processing andAnalysis (ISPA rsquo03) vol 2 pp 991ndash996 Rome Italy September2003
[31] M Azimi Data transmission schemes for a new generation ofinteractive digital television [PhD dissertation] Department ofElectrical and Computer EngineeringTheUniversity of BritishColumbia Vancouver Canada 2008
Mathematical Problems in Engineering 23
Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and an M = 1 Gaussian mixture for the observation density.
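The AIC-based selection described above reduces to fitting each candidate configuration and keeping the one with the smallest criterion value. A minimal sketch follows; the candidate configurations, log-likelihoods, and parameter counts are hypothetical placeholders, since the paper's HSMM learning procedure is not reproduced here.

```python
def aic(log_likelihood: float, n_params: float) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood

# Hypothetical fitted candidates: (number of states N, duration family,
# mixtures M) mapped to (log-likelihood, number of free parameters).
# The values are illustrative, not taken from the paper.
candidates = {
    (3, "gamma",   1): (-1250.4, 21),
    (4, "weibull", 1): (-1180.2, 28),
    (4, "gauss",   2): (-1178.9, 40),
    (5, "weibull", 1): (-1179.5, 37),
}

scores = {cfg: aic(ll, k) for cfg, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # -> (4, 'weibull', 1), the global minimum AIC configuration
```

Note how the extra parameters of the richer candidates ((4, "gauss", 2) and (5, "weibull", 1)) outweigh their marginal likelihood gains, which is exactly the over-parameterization penalty AIC is meant to apply.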
(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross-validation scheme, using for condition 1, at each iteration, Bearing1_i, 1 ≤ i ≤ 7, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters λ*_i were estimated for the i-th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state S_4 as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As it can be seen, the average as
Figure 9: AIC values as a function of the number of states (2 to 6) for the Gaussian, Gamma, and Weibull duration models: (a) Condition 1; (b) Condition 2. In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.
Figure 10: True, average, lower, and upper RUL over time: (a) RUL estimation for Bearing1_7; (b) RUL estimation for Bearing2_6. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.
well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.
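The leave-one-out evaluation loop described above can be sketched as follows. The `train_hsmm` and `estimate_rul` functions are hypothetical stubs standing in for the paper's HSMM learning and RUL-prediction algorithms; only the cross-validation and progressive-observation structure is illustrated.

```python
# Sketch of the leave-one-out scheme for condition 1: for each bearing,
# train on the other six and predict the tested bearing's RUL online.

def train_hsmm(training_runs):
    # Placeholder: would return the learned parameters lambda*_i.
    return {"n_sequences": len(training_runs)}

def estimate_rul(model, observations_so_far):
    # Placeholder: would return (average, lower, upper) RUL at time t,
    # as in Equations (57)-(59) of the paper.
    n = len(observations_so_far)
    return (n, 0, 2 * n)

# Toy observation sequences of different lengths, one per bearing.
bearings = {f"Bearing1_{i}": [0.1] * (5 + i) for i in range(1, 8)}

predictions = {}
for test_name, test_run in bearings.items():
    training = [run for name, run in bearings.items() if name != test_name]
    model = train_hsmm(training)                       # lambda*_i
    history, per_time = [], []
    for obs in test_run:                               # progressive collection
        history.append(obs)
        per_time.append(estimate_rul(model, history))  # avg/lower/upper RUL
    predictions[test_name] = per_time

print(len(predictions))  # 7 tested bearings
```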
Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, further decreasing to 14 minutes for condition 2.
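The performance metric used above is the mean of the absolute differences between the estimated and the true RUL over all prediction times. A minimal sketch, assuming the series are aligned in time (the exact form of the paper's Equation (64) may differ in detail):

```python
def average_absolute_prediction_error(true_rul, predicted_rul):
    """Mean absolute difference between predicted and true RUL,
    taken over all time steps where a prediction was made."""
    if len(true_rul) != len(predicted_rul):
        raise ValueError("series must be aligned in time")
    return sum(abs(p - t) for t, p in zip(true_rul, predicted_rul)) / len(true_rul)

# True RUL decreases to zero; early predictions are the least accurate,
# which inflates the average, as noted in the text.
true_rul = [4000, 3000, 2000, 1000, 0]
predicted = [5200, 3500, 2100, 1050, 0]
print(average_absolute_prediction_error(true_rul, predicted))  # 370.0
```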
6 Conclusion and Future Work
In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time to event estimation.
Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required parameters to estimate. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.
This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.
As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.
Appendix
In this appendix we give the derivation of the state duration variable introduced in (20) as
$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\,\alpha_t(i)\,b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\,\bigl(\bar{d}_t(i)+1\bigr). \tag{A1}$$

The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at the current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution
$$d_t(i) \sim f(d). \tag{A2}$$

We can specify the probability that the system has been in state $i$ for $d$ time units prior to the current time $t$, given the observations and the model parameters $\lambda$, and knowing that the current state is $i$, as
$$\mathrm{P}\bigl(d_t(i)=d\bigr) = \mathrm{P}\bigl(s_{t-d-1}\neq S_i,\ s_{t-d}=S_i,\ \ldots,\ s_{t-1}=S_i \mid s_t=S_i,\ \mathbf{x}_1,\ldots,\mathbf{x}_t,\ \lambda\bigr). \tag{A3}$$

We omit the conditioning on the model parameters $\lambda$ in the following equations, as it is inherently implied. We are interested in deriving the estimator $\bar{d}$.

Because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, we can rewrite (A13) as follows:
$$\bar{d}_{t+1}(i) = \mathrm{P}\bigl(s_t=S_i \mid s_{t+1}=S_i,\ \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\,\bigl(\bar{d}_t(i)+1\bigr). \tag{A15}$$

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted by the "amount" of the current state that was already in state $i$ in the previous step.

In order to transform (A15) in terms of model parameters, for an easy numerical calculation of the induction for $\bar{d}_{t+1}(i)$, we proceed as follows. The first probability in the numerator of (A19) is the state transition, which can be approximated by considering the average duration as
$$\mathrm{P}\bigl(s_{t+1}=S_i \mid s_t=S_i,\ \mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) = \sum_{d_t} a_{ii}(d_t)\,\mathrm{P}\bigl(d_t \mid \mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) \approx a_{ii}(\bar{\mathbf{d}}_t), \tag{A20}$$
while the denominator of (A19) can be expressed as follows:
$$\mathrm{P}\bigl(\mathbf{x}_{t+1} \mid \mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) = \frac{\mathrm{P}(\mathbf{x}_1,\ldots,\mathbf{x}_t,\mathbf{x}_{t+1})}{\mathrm{P}(\mathbf{x}_1,\ldots,\mathbf{x}_t)} = \frac{\sum_{i=1}^{N}\alpha_{t+1}(i)}{\sum_{i=1}^{N}\alpha_t(i)}. \tag{A21}$$

By substituting (A20) and (A21) in (A19) we obtain
$$\mathrm{P}\bigl(s_t=S_i,\ s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\,\gamma_t(i)\,\sum_{i=1}^{N}\alpha_t(i)\,b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N}\alpha_{t+1}(i)}, \tag{A22}$$
and then, by combining (A22) and (A16), we obtain
$$\mathrm{P}\bigl(s_t=S_i \mid s_{t+1}=S_i,\ \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\,\gamma_t(i)\,\sum_{i=1}^{N}\alpha_t(i)\,b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i)\,\sum_{i=1}^{N}\alpha_{t+1}(i)}. \tag{A23}$$

Finally, by substituting (A23) in (A15) and considering that
$$\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N}\alpha_t(i)}, \tag{A24}$$
we derive the induction formula for $\bar{d}_{t+1}(i)$ in terms of model parameters as
$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\,\alpha_t(i)\,b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\,\bigl(\bar{d}_t(i)+1\bigr). \tag{A25}$$
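Once the forward variables are available, the induction (A25) is a one-line update per state. A numeric sketch with illustrative values, assuming the forward variable has already been propagated to time t+1:

```python
def duration_update(d_bar, a_ii, alpha_t, b_next, alpha_next):
    """One induction step of (A25):
    d_{t+1}(i) = a_ii * alpha_t(i) * b_i(x_{t+1}) / alpha_{t+1}(i) * (d_t(i) + 1).
    The factor multiplying (d_t(i) + 1) is the probability that the chain
    was already in state i at time t, given that it is in state i at t+1."""
    weight = a_ii * alpha_t * b_next / alpha_next
    return weight * (d_bar + 1.0)

# Illustrative numbers: the self-transition mass dominates alpha_{t+1}(i),
# so the expected duration grows by almost one full time unit.
d_next = duration_update(d_bar=3.0, a_ii=0.9, alpha_t=0.5,
                         b_next=0.2, alpha_next=0.1)
print(round(d_next, 6))  # 3.6
```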
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry (Osaka and Kariya, Japan, October-November 1989), E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines—prognostics—part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Automatic Control, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "PRONOSTIA: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boškoski, M. Gašperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to PRONOSTIA's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to PRONOSTIA's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.
22 Mathematical Problems in Engineering
methodologiesrdquo in Proceedings of the IEEE Conference onPrognostics and HealthManagement (PHM rsquo12) pp 1ndash7 DenverColo USA June 2012
[12] K Goebel B Saha and A Saxena ldquoA comparison of threedata-driven techniques for prognosticsrdquo in Proceedings of the62nd Meeting of the Society for Machinery Failure PreventionTechnology April 2008
[13] L R Rabiner ldquoTutorial on hiddenMarkov models and selectedapplications in speech recognitionrdquo Proceedings of the IEEE vol77 no 2 pp 257ndash286 1989
[14] F Cartella T Liu S Meganck J Lemeire and H Sahli ldquoOnlineadaptive learning of left-right continuous HMM for bearingscondition assessmentrdquo Journal of Physics Conference Seriesvol 364 Article ID 012031 2012 httpiopscienceioporg1742-65963641012031
[15] S Lee L Li and J Ni ldquoOnline degradation assessment andadaptive fault detection usingmodified hidden markov modelrdquoJournal of Manufacturing Science and Engineering vol 132 no2 Article ID 021010 11 pages 2010
[16] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoA mixture of gaussians hiddenmarkov model for failure diag-nostic and prognosticrdquo in Proceedings of the IEEE InternationalConference on Automation Science and Engineering (CASE rsquo10)pp 338ndash343 Toronto Canada August 2010
[17] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoEstimation of the remaining useful life by using waveletpacket decomposition and HMMsrdquo in Proceedings of the IEEEAerospace Conference (AERO rsquo11) pp 1ndash10 IEEE ComputerSociety AIAA Big Sky Mont USA March 2011
[18] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoA data-driven failure prognostics method based on mixtureof gaussians hidden markov modelsrdquo IEEE Transactions onReliability vol 61 no 2 pp 491ndash503 2012
[19] K Medjaher D A Tobon-Mejia and N Zerhouni ldquoRemaininguseful life estimation of critical components with applicationto bearingsrdquo IEEE Transactions on Reliability vol 61 no 2 pp292ndash302 2012
[20] J Ferguson ldquoVariable duration models for speechrdquo in Proceed-ings of the Symposium on the Application of Hidden MarkovModels to Text and Speech pp 143ndash179 October 1980
[21] K P Murphy ldquoHidden semi-Markov models (hsmms)rdquo TechRep University of British Columbia 2002 httpwwwcsubccasimmurphyk
[22] A Kundu T Hines J Phillips B D Huyck and L C VanGuilder ldquoArabic handwriting recognition using variable dura-tion HMMrdquo in Proceedings of the 9th International Conferenceon Document Analysis and Recognition (ICDAR rsquo07) vol 2 pp644ndash648 IEEE Washington DC USA September 2007
[23] M T Johnson ldquoCapacity and complexity of HMM durationmodeling techniquesrdquo IEEE Signal Processing Letters vol 12 no5 pp 407ndash410 2005
[24] J-T Chien and C-H Huang ldquoBayesian learning of speechduration modelsrdquo IEEE Transactions on Speech and AudioProcessing vol 11 no 6 pp 558ndash567 2003
[25] K Laurila ldquoNoise robust speech recognition with state durationconstraintsrdquo in Proceedings of the IEEE International Conferenceon Acoustics Speech and Signal Processing (ICASSP rsquo97) vol 2pp 871ndash874 IEEE Computer Society Munich Germany April1997
[27] S-Z Yu and H Kobayashi ldquoAn efficient forward-backwardalgorithm for an explicit-duration hiddenMarkovmodelrdquo IEEESignal Processing Letters vol 10 no 1 pp 11ndash14 2003
[28] S-Z Yu and H Kobayashi ldquoPractical implementation of anefficient forward-backward algorithm for an explicit-durationhiddenMarkov modelrdquo IEEE Transactions on Signal Processingvol 54 no 5 pp 1947ndash1951 2006
[29] N Wang S-D Sun Z-Q Cai S Zhang and C SayginldquoA hidden semi-markov model with duration-dependent statetransition probabilities for prognosticsrdquoMathematical Problemsin Engineering vol 2014 Article ID 632702 10 pages 2014
[30] M Azimi P Nasiopoulos and R K Ward ldquoOnline identifica-tion of hidden semi-Markov modelsrdquo in Proceedings of the 3rdInternational Symposium on Image and Signal Processing andAnalysis (ISPA rsquo03) vol 2 pp 991ndash996 Rome Italy September2003
[31] M Azimi Data transmission schemes for a new generation ofinteractive digital television [PhD dissertation] Department ofElectrical and Computer EngineeringTheUniversity of BritishColumbia Vancouver Canada 2008
[32] M Azimi P Nasiopoulos and R K Ward ldquoOffline andonline identification of hidden semi-Markov modelsrdquo IEEETransactions on Signal Processing vol 53 no 8 pp 2658ndash26632005
[33] J Q Li and A R Barron ldquoMixture density estimationrdquo inAdvances in Neural Information Processing Systems 12 pp 279ndash285 MIT Press Boston Mass USA 1999
[34] M Dong and D He ldquoA segmental hidden semi-Markov model(HSMM)-based diagnostics and prognostics framework andmethodologyrdquoMechanical Systems and Signal Processing vol 21no 5 pp 2248ndash2266 2007
[35] T Liu J Chen and G Dong ldquoApplication of continuous hidemarkov model to bearing performance degradation assess-mentrdquo in Proceedings of the 24th International Congress onConditionMonitoring andDiagnostics EngineeringManagement(COMADEM rsquo11) pp 166ndash172 2011
[36] H Ocak and K A Loparo ldquoA new bearing fault detectionand diagnosis scheme based onhidden markov modeling ofvibration signalsrdquo in Proceedings of the IEEE InternationalConference on Acoustics Speech andSignal Processing pp 3141ndash3144 IEEE Computer Society Washington DC USA 2001
[37] C Fraley and A E Raftery ldquoModel-based clustering discrimi-nant analysis and density estimationrdquo Journal of the AmericanStatistical Association vol 97 no 458 pp 611ndash631 2002
[38] K L Nylund T Asparouhov and B O Muthen ldquoDeciding onthe number of classes in latent class analysis and growthmixturemodeling aMonte Carlo simulation studyrdquo Structural EquationModeling vol 14 no 4 pp 535ndash569 2007
[39] T H Lin and C M Dayton ldquoModel selection information cri-teria for non-nested latent class modelsrdquo Journal of Educationaland Behavioral Statistics vol 22 no 3 pp 249ndash264 1997
[40] O Lukociene and J K Vermunt ldquoDetermining the numberof components in mixture models for hierarchical datardquo inAdvances in Data Analysis Data Handling and Business Intel-ligence Studies in Classification Data Analysis and KnowledgeOrganization pp 241ndash249 Springer New York NY USA 2008
[41] O Cappe E Moulines and T Ryden Inference in HiddenMarkov Models Springer Series in Statistics Springer NewYork NY USA 2005
[42] I L MacDonald and W Zucchini Hidden Markov and OtherModels for Discrete-Valued Time Series Chapman amp HallCRC1997
Mathematical Problems in Engineering 23
[43] R J MacKay ldquoEstimating the order of a hiddenMarkovmodelrdquoThe Canadian Journal of Statistics vol 30 no 4 pp 573ndash5892002
[44] S E Levinson ldquoContinuously variable duration hiddenMarkovmodels for automatic speechrecognitionrdquo Computer Speech andLanguage vol 1 no 1 pp 29ndash45 1986
[45] C D Mitchell and L H Jamieson ldquoModeling duration in ahidden Markov model with the exponential familyrdquo in Pro-ceedings of the IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP rsquo93) vol 2 pp 331ndash334Minneapolis Minn USA April 1993
[46] A Viterbi ldquoError bounds for convolutional codes and anasymptotically optimum decoding algorithmrdquo IEEE Transac-tions on Information Theory vol 13 no 2 pp 260ndash269 2006
[47] GD Forney Jr ldquoThe viterbi algorithmrdquoProceedings of the IEEEvol 61 no 3 pp 268ndash278 1973
[48] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the em algorithmrdquo Journalof The Royal Statistical Society Series B vol 39 no 1 pp 1ndash381977
[49] P Nectoux R Gouriveau K Medjaher et al ldquoPronostia anexperimental platform for bearings accelerated life testrdquo inProceedings of the IEEE International Conference on Prognosticsand Health Management Denver Colo USA 2012
[50] P OrsquoDonnell ldquoReport of large motor reliability survey ofindustrial and commercial installations part I and IIrdquo IEEETransactions on Industry Applications vol 21 no 4 pp 853ndash8721985
[51] P Boskoski M Gasperin D Petelin and ETH Juricic ldquoBearingfault prognostics using Renyi entropy based features and Gaus-sian process modelsrdquoMechanical Systems and Signal Processingvol 52-53 pp 327ndash337 2015
[52] B Chouri F Montero M Tabaa and A Dandache ldquoResidualuseful life estimation based on stable distribution feature extrac-tion and SVM classifierrdquo Journal of Theoretical and AppliedInformation Technology vol 55 no 3 pp 299ndash306 2013
[53] K Javed R Gouriveau N Zerhouni and P Nectoux ldquoAfeature extraction procedure basedon trigonometric functionsand cumulative descriptors to enhance prognostics modelingrdquoin Proceedings of the IEEE Conference on Prognostics and HealthManagement (PHM rsquo13) pp 1ndash7 June 2013
[54] K Medjaher N Zerhouni and J Baklouti ldquoData-driven prog-nostics based on health indicatorconstruction application topronostiarsquos datardquo in Proceedings of the 12th European ControlConference (ECC rsquo13) pp 1451ndash1456 Zurich Switzerland July2013
[55] A Mosallam K Medjaher and N Zerhouni ldquoNonparametrictime series modelling for industrial prognostics and healthmanagementrdquo International Journal of AdvancedManufacturingTechnology vol 69 no 5ndash8 pp 1685ndash1699 2013
[56] S Porotsky and Z Bluvband ldquoRemaining useful life estimationfor systems with non-trendability behaviourrdquo in Proceedings ofthe IEEE Conference on Prognostics and Health Management(PHM rsquo12) pp 1ndash6 Denver Colo USA June 2012
[57] L Serir E Ramasso and N Zerhouni ldquoAn evidential evolvingprognostic approach and itsapplication to pronostias datastreamsrdquo in Annual Conference of the Prognostics and HealthManagement Society p 9 2012
[58] F SloukiaM El Aroussi HMedromi andMWahbi ldquoBearingsprognostic using mixtureof gaussians hidden Markov model
and support vector machinerdquo in Proceedings of the ACS Inter-national Conference on Computer Systems and Applications(AICCSA rsquo13) pp 1ndash4 May 2013
[59] B Zhang L Zhang and J Xu ldquoRemaining useful life predic-tion for rolling element bearing based on ensemble learningrdquoChemical Engineering Transactions vol 33 pp 157ndash162 2013
parameters to estimate. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the use of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of applications can be modeled with the proposed methodology.
Through experiments performed on simulated data, this paper shows that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM, in terms of the number of states and the duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a given event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM, combined with AIC, can be used in practice for condition monitoring and Remaining Useful Lifetime applications.
As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.
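To make the AIC-based model selection concrete, the following minimal sketch scores a few candidate HSMM configurations and picks the one with the lowest AIC. The log-likelihood values and the parameter-counting rule are invented for the example (they stand in for whatever a real fit routine would return), and are not taken from the paper's experiments.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def n_params_hsmm(n_states: int, n_mix: int = 1) -> int:
    """Rough, hypothetical parameter count for a left-right HSMM with
    Gaussian emissions and a 2-parameter duration density per state."""
    transitions = n_states - 1        # left-right transition structure
    durations = 2 * n_states          # e.g. Gamma shape/scale per state
    emissions = 2 * n_states * n_mix  # mean + variance per state
    return transitions + durations + emissions

# Hypothetical maximized log-likelihoods for models with N = 2..5 states.
candidates = {2: -1520.3, 3: -1480.1, 4: -1475.8, 5: -1474.9}

scores = {n: aic(ll, n_params_hsmm(n)) for n, ll in candidates.items()}
best = min(scores, key=scores.get)  # number of states with lowest AIC
```

Note how the penalty term 2k lets a slightly worse likelihood win when it comes with markedly fewer parameters, which is exactly the trade-off the automatic model selection exploits.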
Appendix
In this appendix we give the derivation of the state duration variable introduced in (20) as

\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1).   (A1)
The random variable d_t(i) has been defined in Section 2.1 as the duration spent in state i prior to the current time t, assuming that the state at the current time t is i; d_t(i) is sampled from an arbitrary distribution:

d_t(i) \sim f(d).   (A2)

We can specify the probability that the system has been in state i for d time units prior to the current time t, given the observations and the model parameters \lambda, and knowing that the current state is i, as

P(d_t(i) = d) = P(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t, \lambda).   (A3)
We omit the conditioning on the model parameters \lambda in the following equations, it being implicitly assumed. We are interested in deriving the estimator \bar{d}

Because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time t, we can rewrite (A13) as follows:
\bar{d}_{t+1}(i) = P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) \cdot (\bar{d}_t(i) + 1).   (A15)
The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted by the "amount" of the current state that was already in state i at the previous step.
In order to express (A15) in terms of the model parameters, for an easy numerical calculation of the induction for \bar{d}
The first probability in the numerator of (A19) is the state transition, which can be approximated by considering the average duration as

P(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t) = \sum_{d_t} a_{ii}(d_t) \cdot P(d_t \mid \mathbf{x}_1, \ldots, \mathbf{x}_t) \approx a_{ii}(\bar{d}_t),   (A20)
while the denominator of (A19) can be expressed as follows:

P(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t) = \frac{P(\mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1})}{P(\mathbf{x}_1, \ldots, \mathbf{x}_t)} = \frac{\sum_{j=1}^{N} \alpha_{t+1}(j)}{\sum_{j=1}^{N} \alpha_t(j)}.   (A21)
By substituting (A20) and (A21) in (A19), we obtain

P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) = \frac{a_{ii}(\bar{d}_t) \cdot \gamma_t(i) \cdot \sum_{j=1}^{N} \alpha_t(j) \cdot b_i(\mathbf{x}_{t+1})}{\sum_{j=1}^{N} \alpha_{t+1}(j)},   (A22)
and then, by combining (A22) and (A16), we obtain

P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) = \frac{a_{ii}(\bar{d}_t) \cdot \gamma_t(i) \cdot \sum_{j=1}^{N} \alpha_t(j) \cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i) \cdot \sum_{j=1}^{N} \alpha_{t+1}(j)}.   (A23)
Finally, by substituting (A23) in (A15) and considering that

\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{j=1}^{N} \alpha_t(j)},   (A24)

we derive the induction formula for \bar{d}_{t+1}(i) in terms of model parameters as

\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1).   (A25)
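For readers who want to turn the final induction (A25) into code, the following minimal sketch updates the average-duration estimates for all states at once. The forward variables, self-transition probabilities, and emission likelihoods below are hand-picked placeholder numbers, not outputs of the paper's algorithms; in a real implementation they would come from the forward recursion.

```python
import numpy as np

def update_avg_duration(d_bar, alpha_t, alpha_t1, a_ii, b_x1):
    """One step of the induction (A25):
    d_bar_{t+1}(i) = a_ii(d_bar_t) * alpha_t(i) * b_i(x_{t+1}) / alpha_{t+1}(i)
                     * (d_bar_t(i) + 1).
    All arguments are length-N arrays indexed by the state i."""
    # The fraction acts as P(s_t = S_i | s_{t+1} = S_i, x_{1:t+1}),
    # i.e. the "amount" of state i carried over from the previous step.
    weight = a_ii * alpha_t * b_x1 / alpha_t1
    return weight * (d_bar + 1.0)

# Toy two-state example with hypothetical values.
d_bar    = np.array([2.0, 1.0])   # average durations d_bar_t(i)
alpha_t  = np.array([0.6, 0.4])   # forward variables at time t
a_ii     = np.array([0.8, 0.7])   # self-transitions a_ii(d_bar_t)
b_x1     = np.array([0.9, 0.2])   # emission likelihoods b_i(x_{t+1})
alpha_t1 = np.array([0.54, 0.2])  # forward variables at time t+1

d_next = update_avg_duration(d_bar, alpha_t, alpha_t1, a_ii, b_x1)
```

When the carry-over weight is close to 1 (the model is confident it stayed in state i), the estimate grows by roughly one time unit per step, matching the intuition given after (A15).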
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka and Kariya, Japan, October-November 1989, E. Oshima and C. van Rijn, Eds., pp. 195-198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229-240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469-489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines-prognostics-part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125-137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291-296, International Federation of Automatic Control, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257-262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474-481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1-7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338-343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1-10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491-503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292-302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143-179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644-648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407-410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558-567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871-874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11-14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947-1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991-996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658-2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279-285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248-2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166-172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141-3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611-631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535-569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249-264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241-249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573-589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29-45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331-334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260-269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268-278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1-38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "PRONOSTIA: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853-872, 1985.
[51] P. Boškoski, M. Gašperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327-337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299-306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1-7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to PRONOSTIA's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451-1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5-8, pp. 1685-1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1-6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to PRONOSTIA's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1-4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157-162, 2013.
because it represents the sum of the probabilities of all thepossible combinations of state sequences up to the currenttime 119905 we can rewrite (A13) as follows
119889119905+1
(119894) = P (119904119905= 119878
119894| 119904
119905+1= 119878
119894 x
1 x
119905+1) sdot (119889
119905(119894) + 1)
(A15)
The intuition behind the latter induction formula is thatthe current average duration is the previous average durationplus 1 weighted with the ldquoamountrdquo of the current state thatwas already in state 119894 in the previous step
In order to transform (A15) in terms ofmodel parametersfor an easy numerical calculation of the induction for 119889
The first probability in the numerator of (A19) is the statetransition which can be approximated by considering theaverage duration as
P (119904119905+1
= 119878119894| 119904
119905= 119878
119894 x
1 x
119905)
= sum
119889119905
119886119894119894(d
119905) sdot P (119889
119905| x
1 x
119905)
asymp 119886119894119894(d
119905)
(A20)
while the denominator of (A19) can be expressed as follows
P (x119905+1
| x1 x
119905) =
P (x1 x
119905 x
119905+1)
P (x1 x
119905)
=sum
119873
119894=1120572119905+1
(119894)
sum119873
119894=1120572119905(119894)
(A21)
By substituting (A20) and (A21) in (A19) we obtain
P (119904119905= 119878
119894 119904
119905+1= 119878
119894| x
1 x
119905+1)
=119886119894119894(d
119905) sdot 120574
119905(119894) sdot sum
119873
119894=1120572119905(119894) sdot 119887
119894(x
119905+1)
sum119873
119894=1120572119905+1
(119894)
(A22)
and then by combining (A22) and (A16) we obtain
P (119904119905= 119878
119894| 119904
119905+1= 119878
119894 x
1 x
119905+1)
=119886119894119894(d
119905) sdot 120574
119905(119894) sdot sum
119873
119894=1120572119905(119894) sdot 119887
119894(x
119905+1)
120574119905+1
(119894) sum119873
119894=1120572119905+1
(119894)
(A23)
Finally by substituting (A23) in (A15) and considering that
120574119905(119894) =
120572119905(119894)
sum119873
119894=1120572119905(119894)
(A24)
we derive the induction formula for 119889119905+1
(119894) in terms of modelparameters as
119889119905+1
(119894) =119886119894119894(d
119905) sdot 120572
119905(119894) sdot 119887
119894(x
119905+1)
120572119905+1
(119894)sdot (119889
119905(119894) + 1) (A25)
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka and Kariya, Japan, October-November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines — prognostics — part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), Bari, Italy, June 2009, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "PRONOSTIA: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boškoski, M. Gašperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to PRONOSTIA's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to PRONOSTIA's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.
The first probability in the numerator of (A19) is the statetransition which can be approximated by considering theaverage duration as
P (119904119905+1
= 119878119894| 119904
119905= 119878
119894 x
1 x
119905)
= sum
119889119905
119886119894119894(d
119905) sdot P (119889
119905| x
1 x
119905)
asymp 119886119894119894(d
119905)
(A20)
while the denominator of (A19) can be expressed as follows
P (x119905+1
| x1 x
119905) =
P (x1 x
119905 x
119905+1)
P (x1 x
119905)
=sum
119873
119894=1120572119905+1
(119894)
sum119873
119894=1120572119905(119894)
(A21)
By substituting (A20) and (A21) in (A19) we obtain
P (119904119905= 119878
119894 119904
119905+1= 119878
119894| x
1 x
119905+1)
=119886119894119894(d
119905) sdot 120574
119905(119894) sdot sum
119873
119894=1120572119905(119894) sdot 119887
119894(x
119905+1)
sum119873
119894=1120572119905+1
(119894)
(A22)
and then by combining (A22) and (A16) we obtain
P (119904119905= 119878
119894| 119904
119905+1= 119878
119894 x
1 x
119905+1)
=119886119894119894(d
119905) sdot 120574
119905(119894) sdot sum
119873
119894=1120572119905(119894) sdot 119887
119894(x
119905+1)
120574119905+1
(119894) sum119873
119894=1120572119905+1
(119894)
(A23)
Finally by substituting (A23) in (A15) and considering that
120574119905(119894) =
120572119905(119894)
sum119873
119894=1120572119905(119894)
(A24)
we derive the induction formula for 119889119905+1
(119894) in terms of modelparameters as
119889119905+1
(119894) =119886119894119894(d
119905) sdot 120572
119905(119894) sdot 119887
119894(x
119905+1)
120572119905+1
(119894)sdot (119889
119905(119894) + 1) (A25)
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] L Solomon ldquoEssential elements of maintenance improvementprogramsrdquo in Proceedings of the IFAC Workshopon ProductionControl in the Process Industry Osaka Japan and Kariya JapanOctober-November 1989 E Oshima and C van Rijn Eds pp195ndash198 Pergamon Press Oxford UK 1989
[2] T HonkanenModelling industrial maintenance systems and theeffects of automatic condition monitoring [PhD dissertation]Helsinki University of Technology Information and ComputerSystems in Automation 2004
[3] R Dekker ldquoApplications of maintenance optimization modelsa review and analysisrdquo Reliability Engineering amp System Safetyvol 51 no 3 pp 229ndash240 1996
[4] H Wang ldquoA survey of maintenance policies of deterioratingsystemsrdquo European Journal of Operational Research vol 139 no3 pp 469ndash489 2002
[5] AFNOR ldquoCondition monitoring and diagnostics of ma-chinesmdashprognosticsmdashpart 1 generalguidelinesrdquo Tech Rep NFISO 13381-1 2005
[6] F Salfner Event-based failure prediction an extended hiddenmarkov model approach [PhD thesis] Humboldt-Universitatzu Berlin Germany 2008
[7] C Domeniconi C-S Perng R Vilalta and S Ma ldquoA clas-sification approachfor prediction of target events in temporalsequencesrdquo in Proceedings of the 6th European Conference onPrinciples of Data Mining and Knowledge Discovery (PKDDrsquo02) pp 125ndash137 Springer LondonUK 2002 httpdlacmorgcitationcfmid=645806670309
[8] K Medjaher J-Y Moya and N Zerhouni ldquoFailure prognosticby using dynamic Bayesian networksrdquo in Dependable Controlof Discrete Systems 2nd IFACWorkshop on Dependable Controlof Discrete Systems (DCDS rsquo09) June 2009 Bari Italy MP Fanti and M Dotoli Eds vol 1 pp 291ndash296 Interna-tional Federation of Accountants New York NY USA 2009httphalarchives-ouvertesfrhal-00402938en
[9] A Sfetsos ldquoShort-term load forecasting with a hybrid cluster-ing algorithmrdquo IEE Proceedings Generation Transmission andDistribution vol 150 no 3 pp 257ndash262 2003
[10] R Vilalta and S Ma ldquoPredicting rare events in temporaldomainsrdquo in Proceedings of the 2nd IEEE International Confer-ence on Data Mining (ICDM rsquo02) pp 474ndash481 December 2002
[11] E Sutrisno H Oh A S S Vasan and M Pecht ldquoEstimationof remaining useful life of ball bearings using data driven
22 Mathematical Problems in Engineering
methodologiesrdquo in Proceedings of the IEEE Conference onPrognostics and HealthManagement (PHM rsquo12) pp 1ndash7 DenverColo USA June 2012
[12] K Goebel B Saha and A Saxena ldquoA comparison of threedata-driven techniques for prognosticsrdquo in Proceedings of the62nd Meeting of the Society for Machinery Failure PreventionTechnology April 2008
[13] L R Rabiner ldquoTutorial on hiddenMarkov models and selectedapplications in speech recognitionrdquo Proceedings of the IEEE vol77 no 2 pp 257ndash286 1989
[14] F Cartella T Liu S Meganck J Lemeire and H Sahli ldquoOnlineadaptive learning of left-right continuous HMM for bearingscondition assessmentrdquo Journal of Physics Conference Seriesvol 364 Article ID 012031 2012 httpiopscienceioporg1742-65963641012031
[15] S Lee L Li and J Ni ldquoOnline degradation assessment andadaptive fault detection usingmodified hidden markov modelrdquoJournal of Manufacturing Science and Engineering vol 132 no2 Article ID 021010 11 pages 2010
[16] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoA mixture of gaussians hiddenmarkov model for failure diag-nostic and prognosticrdquo in Proceedings of the IEEE InternationalConference on Automation Science and Engineering (CASE rsquo10)pp 338ndash343 Toronto Canada August 2010
[17] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoEstimation of the remaining useful life by using waveletpacket decomposition and HMMsrdquo in Proceedings of the IEEEAerospace Conference (AERO rsquo11) pp 1ndash10 IEEE ComputerSociety AIAA Big Sky Mont USA March 2011
[18] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoA data-driven failure prognostics method based on mixtureof gaussians hidden markov modelsrdquo IEEE Transactions onReliability vol 61 no 2 pp 491ndash503 2012
[19] K Medjaher D A Tobon-Mejia and N Zerhouni ldquoRemaininguseful life estimation of critical components with applicationto bearingsrdquo IEEE Transactions on Reliability vol 61 no 2 pp292ndash302 2012
[20] J Ferguson ldquoVariable duration models for speechrdquo in Proceed-ings of the Symposium on the Application of Hidden MarkovModels to Text and Speech pp 143ndash179 October 1980
[21] K P Murphy ldquoHidden semi-Markov models (hsmms)rdquo TechRep University of British Columbia 2002 httpwwwcsubccasimmurphyk
[22] A Kundu T Hines J Phillips B D Huyck and L C VanGuilder ldquoArabic handwriting recognition using variable dura-tion HMMrdquo in Proceedings of the 9th International Conferenceon Document Analysis and Recognition (ICDAR rsquo07) vol 2 pp644ndash648 IEEE Washington DC USA September 2007
[23] M T Johnson ldquoCapacity and complexity of HMM durationmodeling techniquesrdquo IEEE Signal Processing Letters vol 12 no5 pp 407ndash410 2005
[24] J-T Chien and C-H Huang ldquoBayesian learning of speechduration modelsrdquo IEEE Transactions on Speech and AudioProcessing vol 11 no 6 pp 558ndash567 2003
[25] K Laurila ldquoNoise robust speech recognition with state durationconstraintsrdquo in Proceedings of the IEEE International Conferenceon Acoustics Speech and Signal Processing (ICASSP rsquo97) vol 2pp 871ndash874 IEEE Computer Society Munich Germany April1997
[27] S-Z Yu and H Kobayashi ldquoAn efficient forward-backwardalgorithm for an explicit-duration hiddenMarkovmodelrdquo IEEESignal Processing Letters vol 10 no 1 pp 11ndash14 2003
[28] S-Z Yu and H Kobayashi ldquoPractical implementation of anefficient forward-backward algorithm for an explicit-durationhiddenMarkov modelrdquo IEEE Transactions on Signal Processingvol 54 no 5 pp 1947ndash1951 2006
[29] N Wang S-D Sun Z-Q Cai S Zhang and C SayginldquoA hidden semi-markov model with duration-dependent statetransition probabilities for prognosticsrdquoMathematical Problemsin Engineering vol 2014 Article ID 632702 10 pages 2014
[30] M Azimi P Nasiopoulos and R K Ward ldquoOnline identifica-tion of hidden semi-Markov modelsrdquo in Proceedings of the 3rdInternational Symposium on Image and Signal Processing andAnalysis (ISPA rsquo03) vol 2 pp 991ndash996 Rome Italy September2003
[31] M Azimi Data transmission schemes for a new generation ofinteractive digital television [PhD dissertation] Department ofElectrical and Computer EngineeringTheUniversity of BritishColumbia Vancouver Canada 2008
[32] M Azimi P Nasiopoulos and R K Ward ldquoOffline andonline identification of hidden semi-Markov modelsrdquo IEEETransactions on Signal Processing vol 53 no 8 pp 2658ndash26632005
[33] J Q Li and A R Barron ldquoMixture density estimationrdquo inAdvances in Neural Information Processing Systems 12 pp 279ndash285 MIT Press Boston Mass USA 1999
[34] M Dong and D He ldquoA segmental hidden semi-Markov model(HSMM)-based diagnostics and prognostics framework andmethodologyrdquoMechanical Systems and Signal Processing vol 21no 5 pp 2248ndash2266 2007
[35] T Liu J Chen and G Dong ldquoApplication of continuous hidemarkov model to bearing performance degradation assess-mentrdquo in Proceedings of the 24th International Congress onConditionMonitoring andDiagnostics EngineeringManagement(COMADEM rsquo11) pp 166ndash172 2011
[36] H Ocak and K A Loparo ldquoA new bearing fault detectionand diagnosis scheme based onhidden markov modeling ofvibration signalsrdquo in Proceedings of the IEEE InternationalConference on Acoustics Speech andSignal Processing pp 3141ndash3144 IEEE Computer Society Washington DC USA 2001
[37] C Fraley and A E Raftery ldquoModel-based clustering discrimi-nant analysis and density estimationrdquo Journal of the AmericanStatistical Association vol 97 no 458 pp 611ndash631 2002
[38] K L Nylund T Asparouhov and B O Muthen ldquoDeciding onthe number of classes in latent class analysis and growthmixturemodeling aMonte Carlo simulation studyrdquo Structural EquationModeling vol 14 no 4 pp 535ndash569 2007
[39] T H Lin and C M Dayton ldquoModel selection information cri-teria for non-nested latent class modelsrdquo Journal of Educationaland Behavioral Statistics vol 22 no 3 pp 249ndash264 1997
[40] O Lukociene and J K Vermunt ldquoDetermining the numberof components in mixture models for hierarchical datardquo inAdvances in Data Analysis Data Handling and Business Intel-ligence Studies in Classification Data Analysis and KnowledgeOrganization pp 241ndash249 Springer New York NY USA 2008
[41] O Cappe E Moulines and T Ryden Inference in HiddenMarkov Models Springer Series in Statistics Springer NewYork NY USA 2005
[42] I L MacDonald and W Zucchini Hidden Markov and OtherModels for Discrete-Valued Time Series Chapman amp HallCRC1997
Mathematical Problems in Engineering 23
[43] R J MacKay ldquoEstimating the order of a hiddenMarkovmodelrdquoThe Canadian Journal of Statistics vol 30 no 4 pp 573ndash5892002
[44] S E Levinson ldquoContinuously variable duration hiddenMarkovmodels for automatic speechrecognitionrdquo Computer Speech andLanguage vol 1 no 1 pp 29ndash45 1986
[45] C D Mitchell and L H Jamieson ldquoModeling duration in ahidden Markov model with the exponential familyrdquo in Pro-ceedings of the IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP rsquo93) vol 2 pp 331ndash334Minneapolis Minn USA April 1993
[46] A Viterbi ldquoError bounds for convolutional codes and anasymptotically optimum decoding algorithmrdquo IEEE Transac-tions on Information Theory vol 13 no 2 pp 260ndash269 2006
[47] GD Forney Jr ldquoThe viterbi algorithmrdquoProceedings of the IEEEvol 61 no 3 pp 268ndash278 1973
[48] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the em algorithmrdquo Journalof The Royal Statistical Society Series B vol 39 no 1 pp 1ndash381977
[49] P Nectoux R Gouriveau K Medjaher et al ldquoPronostia anexperimental platform for bearings accelerated life testrdquo inProceedings of the IEEE International Conference on Prognosticsand Health Management Denver Colo USA 2012
[50] P OrsquoDonnell ldquoReport of large motor reliability survey ofindustrial and commercial installations part I and IIrdquo IEEETransactions on Industry Applications vol 21 no 4 pp 853ndash8721985
[51] P Boskoski M Gasperin D Petelin and ETH Juricic ldquoBearingfault prognostics using Renyi entropy based features and Gaus-sian process modelsrdquoMechanical Systems and Signal Processingvol 52-53 pp 327ndash337 2015
[52] B Chouri F Montero M Tabaa and A Dandache ldquoResidualuseful life estimation based on stable distribution feature extrac-tion and SVM classifierrdquo Journal of Theoretical and AppliedInformation Technology vol 55 no 3 pp 299ndash306 2013
[53] K Javed R Gouriveau N Zerhouni and P Nectoux ldquoAfeature extraction procedure basedon trigonometric functionsand cumulative descriptors to enhance prognostics modelingrdquoin Proceedings of the IEEE Conference on Prognostics and HealthManagement (PHM rsquo13) pp 1ndash7 June 2013
[54] K Medjaher N Zerhouni and J Baklouti ldquoData-driven prog-nostics based on health indicatorconstruction application topronostiarsquos datardquo in Proceedings of the 12th European ControlConference (ECC rsquo13) pp 1451ndash1456 Zurich Switzerland July2013
[55] A Mosallam K Medjaher and N Zerhouni ldquoNonparametrictime series modelling for industrial prognostics and healthmanagementrdquo International Journal of AdvancedManufacturingTechnology vol 69 no 5ndash8 pp 1685ndash1699 2013
[56] S Porotsky and Z Bluvband ldquoRemaining useful life estimationfor systems with non-trendability behaviourrdquo in Proceedings ofthe IEEE Conference on Prognostics and Health Management(PHM rsquo12) pp 1ndash6 Denver Colo USA June 2012
[57] L Serir E Ramasso and N Zerhouni ldquoAn evidential evolvingprognostic approach and itsapplication to pronostias datastreamsrdquo in Annual Conference of the Prognostics and HealthManagement Society p 9 2012
[58] F SloukiaM El Aroussi HMedromi andMWahbi ldquoBearingsprognostic using mixtureof gaussians hidden Markov model
and support vector machinerdquo in Proceedings of the ACS Inter-national Conference on Computer Systems and Applications(AICCSA rsquo13) pp 1ndash4 May 2013
[59] B Zhang L Zhang and J Xu ldquoRemaining useful life predic-tion for rolling element bearing based on ensemble learningrdquoChemical Engineering Transactions vol 33 pp 157ndash162 2013
methodologiesrdquo in Proceedings of the IEEE Conference onPrognostics and HealthManagement (PHM rsquo12) pp 1ndash7 DenverColo USA June 2012
[12] K Goebel B Saha and A Saxena ldquoA comparison of threedata-driven techniques for prognosticsrdquo in Proceedings of the62nd Meeting of the Society for Machinery Failure PreventionTechnology April 2008
[13] L R Rabiner ldquoTutorial on hiddenMarkov models and selectedapplications in speech recognitionrdquo Proceedings of the IEEE vol77 no 2 pp 257ndash286 1989
[14] F Cartella T Liu S Meganck J Lemeire and H Sahli ldquoOnlineadaptive learning of left-right continuous HMM for bearingscondition assessmentrdquo Journal of Physics Conference Seriesvol 364 Article ID 012031 2012 httpiopscienceioporg1742-65963641012031
[15] S Lee L Li and J Ni ldquoOnline degradation assessment andadaptive fault detection usingmodified hidden markov modelrdquoJournal of Manufacturing Science and Engineering vol 132 no2 Article ID 021010 11 pages 2010
[16] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoA mixture of gaussians hiddenmarkov model for failure diag-nostic and prognosticrdquo in Proceedings of the IEEE InternationalConference on Automation Science and Engineering (CASE rsquo10)pp 338ndash343 Toronto Canada August 2010
[17] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoEstimation of the remaining useful life by using waveletpacket decomposition and HMMsrdquo in Proceedings of the IEEEAerospace Conference (AERO rsquo11) pp 1ndash10 IEEE ComputerSociety AIAA Big Sky Mont USA March 2011
[18] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoA data-driven failure prognostics method based on mixtureof gaussians hidden markov modelsrdquo IEEE Transactions onReliability vol 61 no 2 pp 491ndash503 2012
[19] K Medjaher D A Tobon-Mejia and N Zerhouni ldquoRemaininguseful life estimation of critical components with applicationto bearingsrdquo IEEE Transactions on Reliability vol 61 no 2 pp292ndash302 2012
[20] J Ferguson ldquoVariable duration models for speechrdquo in Proceed-ings of the Symposium on the Application of Hidden MarkovModels to Text and Speech pp 143ndash179 October 1980
[21] K P Murphy ldquoHidden semi-Markov models (hsmms)rdquo TechRep University of British Columbia 2002 httpwwwcsubccasimmurphyk
[22] A Kundu T Hines J Phillips B D Huyck and L C VanGuilder ldquoArabic handwriting recognition using variable dura-tion HMMrdquo in Proceedings of the 9th International Conferenceon Document Analysis and Recognition (ICDAR rsquo07) vol 2 pp644ndash648 IEEE Washington DC USA September 2007
[23] M T Johnson ldquoCapacity and complexity of HMM durationmodeling techniquesrdquo IEEE Signal Processing Letters vol 12 no5 pp 407ndash410 2005
[24] J-T Chien and C-H Huang ldquoBayesian learning of speechduration modelsrdquo IEEE Transactions on Speech and AudioProcessing vol 11 no 6 pp 558ndash567 2003
[25] K Laurila ldquoNoise robust speech recognition with state durationconstraintsrdquo in Proceedings of the IEEE International Conferenceon Acoustics Speech and Signal Processing (ICASSP rsquo97) vol 2pp 871ndash874 IEEE Computer Society Munich Germany April1997
[27] S-Z Yu and H Kobayashi ldquoAn efficient forward-backwardalgorithm for an explicit-duration hiddenMarkovmodelrdquo IEEESignal Processing Letters vol 10 no 1 pp 11ndash14 2003
[28] S-Z Yu and H Kobayashi ldquoPractical implementation of anefficient forward-backward algorithm for an explicit-durationhiddenMarkov modelrdquo IEEE Transactions on Signal Processingvol 54 no 5 pp 1947ndash1951 2006
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthen, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappe, E. Moulines, and T. Ryden, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.